lrm character and rlm character throw exception #119
When the code parses HTML that contains an lrm or rlm character reference, like:
June, 2016
the program raises an IndexError exception.
I traced this bug to handle_charref, which passes the converted character to handle_data.
handle_data may match against the first element of the data (data[0]), but the lrm and rlm characters are defined as '' (empty).
So when the program indexes the first element of lrm or rlm character data, the string is empty and the lookup in the following condition fails:
+++++++++++++++++++++++++++++++++++
elif (self.preceding_stressed
and re.match(r'[^\s.!?]', data[0])
and not hn(self.current_tag)
and self.current_tag not in ['a', 'code', 'pre']):
+++++++++++++++++++++++++++++++++++
This is the traceback:
Traceback (most recent call last):
  File "get_email.py", line 37, in <module>
    text = h.handle(mail_content_string)  # convert HTML format to Markdown format
  File "/data/vijay/emailcont_venv/lib/python3.4/site-packages/html2text/__init__.py", line 149, in handle
    self.feed(data)
  File "/data/vijay/emailcont_venv/lib/python3.4/site-packages/html2text/__init__.py", line 146, in feed
    HTMLParser.HTMLParser.feed(self, data)
  File "/usr/lib64/python3.4/html/parser.py", line 165, in feed
    self.goahead(0)
  File "/usr/lib64/python3.4/html/parser.py", line 268, in goahead
    self.handle_charref(name)
  File "/data/vijay/emailcont_venv/lib/python3.4/site-packages/html2text/__init__.py", line 186, in handle_charref
    self.handle_data(self.charref(c), True)
  File "/data/vijay/emailcont_venv/lib/python3.4/site-packages/html2text/__init__.py", line 802, in handle_data
    and re.match(r'[^\s.!?]', data[0])
IndexError: string index out of range
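
The failing check can be reproduced without html2text. Below is a minimal sketch, assuming (as described above) that the lrm and rlm characters are replaced by an empty string. The function names are hypothetical and are not part of html2text, and the guard shown is only one possible fix, not necessarily the one the project should adopt.

```python
import re


def first_char_matches_unsafe(data):
    # Mirrors the failing condition: indexing data[0] raises IndexError
    # when data is the empty string.
    return bool(re.match(r'[^\s.!?]', data[0]))


def first_char_matches_safe(data):
    # Hypothetical guard: re.match anchors at the start of the string,
    # so matching against the whole string tests the same first character
    # and simply returns no match when data is empty.
    return bool(re.match(r'[^\s.!?]', data))
```

With this guard an empty replacement no longer raises; the condition just evaluates to false and the data is handled like any other non-matching text. An explicit `if data` check before the index would work as well.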