Pdfminer master #2

josiahkhor · 2022-07-22T05:23:59Z

Pull request

Thanks for improving pdfminer.six! Please include the following information to
help us discuss and merge this PR:

A description of why this PR is needed. What does it fix? What does it
improve?
A summary of the things that this PR changes.
Reference the issues that this PR fixes (use the fixes #(issue nr) syntax).
If this PR does not fix any issue, create the issue first and mention that
you are willing to work on it.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. Include an example pdf if you have one.

Checklist

I have added tests that prove my fix is effective or that my feature
works
I have added docstrings to newly created methods and classes
I have optimized the code at least one time after creating the initial
version
I have updated the README.md or I have verified that this
is not necessary
I have updated the readthedocs documentation or I
verified that this is not necessary
I have added a concise human-readable description of the change to
CHANGELOG.md

* Remove unused sortedcontainers package * Fix changelog format * Fix a link to the PR * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

* Fix for when 'trailer' is indented Closes pdfminer#214 * Address CR comments - strip line after parsing * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

…on 3.9 (pdfminer#522) Closes pdfminer#503

) This reverts commit ec223d1.

…cking the value of 'DW2' (pdfminer#529) Closes pdfminer#518 * Fix TypeError: cannot unpack non-iterable PDFObjRef object, when unpacking the value of 'DW2'. An error occurs when the 'DW2' key contains a PDFObjRef object instead of a list of int values, e.g. 'DW2': <PDFObjRef:152>. To solve this issue, we use the resolve1() function. See: pdfminer#518 * Updated CHANGELOG * Update CHANGELOG.md Co-authored-by: Dimitrios TSOLAKIDIS <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>
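The DW2 fix comes down to resolving an indirect reference before unpacking it. The following is a minimal sketch of that idea using stand-in classes rather than pdfminer.six's own `PDFObjRef` and `resolve1()`; the names `FakeObjRef`, `resolve_ref`, and `default_vertical_metrics` are hypothetical, and the fallback `[880, -1000]` is the default the PDF specification gives for DW2.

```python
from typing import Any, Dict, Tuple


class FakeObjRef:
    """Hypothetical stand-in for an indirect reference such as <PDFObjRef:152>."""

    def __init__(self, objects: Dict[int, Any], objid: int) -> None:
        self.objects = objects
        self.objid = objid

    def resolve(self) -> Any:
        return self.objects[self.objid]


def resolve_ref(value: Any) -> Any:
    """Follow indirect references until a direct object is reached."""
    while isinstance(value, FakeObjRef):
        value = value.resolve()
    return value


def default_vertical_metrics(spec: Dict[str, Any]) -> Tuple[int, int]:
    # Unpacking spec["DW2"] directly raises TypeError when the value is a
    # reference, so resolve it to the underlying list first.
    vy, w = resolve_ref(spec.get("DW2", [880, -1000]))
    return vy, w


objects = {152: [900, -1100]}
print(default_vertical_metrics({"DW2": FakeObjRef(objects, 152)}))
print(default_vertical_metrics({}))
```

The same pattern applies to any dictionary value that a PDF producer may store either inline or as an indirect object.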

… removing the try-except check that writes a unicode character to the stream (pdfminer#523) Closes pdfminer#191 * Remove support for non-standard output streams that are not binary by removing the try-except check that writes a unicode character to the stream * Add docstring * Fix flake8

…with the error: [Errno 13] Permission denied) (pdfminer#484) Closes pdfminer#469 * Issue pdfminer#469 is fixed * one extra comment to code is added * TemporaryFilePath context manager is added to facilitate tests * flake8 complaints fixed * Update docs of tempfilepath.py * Fix flake8 Co-authored-by: Pieter Marsman <[email protected]>

…Trusty Tahr to Focal Fossa (pdfminer#585) * Update .travis.yml * Also change 3.9-dev to 3.9 because that is now supported by travis

* tox: use Python 3.9 final * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

* Fix .paint_path handling of single line segments
  - Fixes typo ("ml" should have been "mlh")
  - Removes if-statement that required individual line segments to be strictly horizontal or vertical.
* Treat 'ml'-shape paths as lines not curves. Although 'mlh' is the canonical implementation for a single line segment, 'ml' is fairly common. Adds tests and sample PDF.
* Fix trailing whitespace
* Fix point-extraction from Bézier path commands. This commit corrects the manner in which "pts" are extracted from Bézier path commands. See Table 4.9 of PDF reference manual, and new comments in code for details. Previously, depending on the command (c, v, or y), the code was extracting some combination of control points (not on curve) and the actual points-on-curve. This commit also refactors .paint_path, so that apply_matrix_pt is only called in one place, and to treat the "h" command in a manner more consistent with other path commands.
* Add comments to test_paint_path_quadrilaterals
* Parse rect-forming mllll paths as rects not curves. Now that .paint_path has been refactored, adding support for rect-forming mllll paths requires no extra code, beyond a minor tweak to the relevant elif statement.
* One changelog line with ref to mr
* Remove PDFLayoutAnalyzer._create_curve because implementation has become trivial due to refactoring
* Extract variables from if statement to make it easier to read
* Optimize imports order
* Trigger travis build
* Revert "Trigger travis build". This reverts commit 41c0518
* Update travis badge
* Update travis badge
Co-authored-by: Pieter Marsman <[email protected]>
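The point-extraction rule described in this commit can be illustrated with a short sketch. It is not the code from `converter.py`; the helper names `on_curve_points` and `classify` are hypothetical, and paths are assumed to be lists of tuples such as `("m", x, y)` that begin with an `m` segment. For `m`, `l`, `c`, `v`, and `y`, the last two operands are the point the segment ends on, so control points are skipped simply by ignoring everything before them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
PathSegment = Sequence  # e.g. ("m", 0, 0) or ("c", x1, y1, x2, y2, x3, y3)


def on_curve_points(path: List[PathSegment]) -> List[Point]:
    """Return the point each segment ends on, ignoring Bezier control points."""
    points: List[Point] = []
    for segment in path:
        if segment[0] == "h":
            # "h" closes the subpath by returning to its starting point.
            points.append(points[0])
        else:
            # For m, l, c, v and y the final two operands are the end point.
            points.append((segment[-2], segment[-1]))
    return points


def classify(path: List[PathSegment]) -> str:
    """Label a path as "line", "rect" or "curve" from its operator shape."""
    shape = "".join(segment[0] for segment in path)
    pts = on_curve_points(path)
    if shape in ("ml", "mlh"):
        return "line"
    if shape in ("mlllh", "mllll") and pts[4] == pts[0]:
        (x0, y0), (x1, y1), (x2, y2), (x3, y3) = pts[:4]
        axis_aligned = (x0 == x1 and y1 == y2 and x2 == x3 and y3 == y0) or (
            y0 == y1 and x1 == x2 and y2 == y3 and x3 == x0
        )
        if axis_aligned:
            return "rect"
    return "curve"


print(on_curve_points([("m", 0, 0), ("c", 1, 1, 2, 2, 3, 0)]))
print(classify([("m", 0, 0), ("l", 0, 2), ("l", 3, 2), ("l", 3, 0), ("h",)]))
```

Because every operator is reduced to its end point in one place, any coordinate transform would only need to be applied to the returned list once.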

* Fix for when trailer is indented * Store stripped line * This commit breaks things... * Or maybe this one breaks things? * Remove commented code because no longer used. * Add CHANGELOG.md * Add poetry venv management files to gitignore since I started using poetry to manage the python envs for this project Co-authored-by: Pieter Marsman <[email protected]>

…dfminer#537) * Added support for Paeth PNG filter compression (predictor value = 4) * Use `above` and `upper_left` as in the pseudo code * Refactor: use variable names that are very close to the pseudo code and add pieces of the docs to show what is going on. * Fix line length issues * Add line about compressions to README.md * Fix merge conflict on readme * Fix bug in filter type Up * Make if-else consistent Co-authored-by: Eduardo Gonzalez Lopez de Murillas <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>
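For reference, the Paeth predictor mentioned above follows the pseudo code in the PNG specification. The sketch below is an independent illustration, not the code merged in this PR; `undo_paeth` is a hypothetical helper that assumes the filter-type byte has already been stripped from the row and that `previous` is the already-decoded row above (all zeros for the first row).

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Pick whichever neighbour is closest to left + above - upper_left."""
    p = left + above - upper_left
    pa = abs(p - left)
    pb = abs(p - above)
    pc = abs(p - upper_left)
    if pa <= pb and pa <= pc:
        return left
    elif pb <= pc:
        return above
    else:
        return upper_left


def undo_paeth(filtered: bytes, previous: bytes, bytes_per_pixel: int) -> bytes:
    """Reverse the Paeth filter for one row of image data."""
    out = bytearray()
    for i, value in enumerate(filtered):
        # Neighbours that fall outside the row are treated as zero.
        left = out[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        above = previous[i]
        upper_left = previous[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        out.append((value + paeth_predictor(left, above, upper_left)) & 0xFF)
    return bytes(out)


print(paeth_predictor(50, 60, 40))
print(undo_paeth(bytes([1, 2, 3]), bytes(3), 1))
```

The tie-breaking order (left, then above, then upper left) matters: encoder and decoder must agree on it for the round trip to be lossless.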

…ser (pdfminer#574) * check obj type * update changelog * Update CHANGELOG.md * fix the bug * fix condition * update changelog * update changelog again * update changelog * update Co-authored-by: Pieter Marsman <[email protected]> Co-authored-by: Tony Tong <[email protected]>

* Fix typos and possible mistakes. * Revert two edits based on discussion in pdfminer#579 Revert the two changes based on our discussion. I read the documentation and had a glimpse at the default code. Perhaps the confusion was caused by the figure that shows the Char Margin (M) and the Word Margin (W). Clearly, M is smaller than W in absolute terms, but as mentioned, they are both relative numbers. Maybe it is useful to point that out in the figure, but I am not sure how best to do it. Another option is to use a name like `min_char_margin_threshold` or similar, in the hope that it is easier to understand. Just some thoughts! * Triggering travis again Co-authored-by: Pieter Marsman <[email protected]>

Fixes pdfminer#566 * try to fix issue of some Chinese characters cannot be extracted correctly (pdfminer#566). * format code to pass flake8 check. * fix typo and refer to issue 593. Co-authored-by: huan_cheng <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>

Co-authored-by: Pieter Marsman <[email protected]>

…iner#600) * Fix an error when dumping a TOC * Fix a bug that a TOC title variable is a bytes type * Update CHANGELOG.md * Update CHANGELOG.md * Rename e() to escape() and merge two isinstance() checks Co-authored-by: Pieter Marsman <[email protected]>

The canonical home of the documentation framework has moved from documentation.divio.com to https://diataxis.fr.

* add missing import for extract_text_to_fp * Replace testsetup with visible imports in documentation * Remove obsolete check for python version; python 2 is not supported anymore * (Unrelated to this MR) Remove sys from converter.py * Optimize imports * (Unrelated to this MR) fix line length error Co-authored-by: Pieter Marsman <[email protected]>

…output of TagExtractor * high_level: emit diagnostic for bad output_type * TagExtractor: eliminate runtime error. This does not make it usable, but will satisfy my curiosity. * Use if-elif-else structure * Fix pycharm spacing warning * Rename _write_outfp to _write * Properly format tag names and tag values. Using utils.make_compat_str() such that the tag value is always a string. * Update CHANGELOG.md * Fix flake8 errors Co-authored-by: Pieter Marsman <[email protected]>

* Fix typos in converting_pdf_to_text.rst * The word "pdfminer.six" should not be split across a newline; otherwise the renderer treats it as two separate words and displays them apart. * Trim redundant spaces Co-authored-by: Pieter Marsman <[email protected]>

* feat: Add support for ISO 32000-2 AES256 encryption * feat: Applies review suggestions

…ry, escaped \r\n should be removed (pdfminer#616)
* detect TextIOWrapper as non-binary
* I don't understand the CHANGELOG.md format, hope this is good enough
* Delete \\\r\n in Literal Strings (ref. section 7.3.4.2 of PDF32000_2008)
* Keep Travis CI happy
* Added test
* Remove pdfminer/Changelog
* Prettify _parse_string_1
* Add CHANGELOG.md
* Satisfy flake8
* Update CHANGELOG.md
* Use logging.Logger.warning instead of warnings.warn in most cases, following the official Python guidance that warnings.warn is directed at _developers_, not users
* (pdfdocument.py) remove declarations of PDFTextExtractionNotAllowedWarning, PDFNoValidXRefWarning
* (pdfpage.py) Don't import warnings, don't use PDFTextExtractionNotAllowedWarning
* (tools/dumppdf.py) Don't import warnings, don't use PDFNoValidXRefWarning
* (tests/test_tools_dumppdf.py) Don't import warnings, check for logging.WARN rather than PDFNoValidXRefWarning
* get name right
* make flake8 happy
* Revert "make flake8 happy". This reverts commit 4592769.
* Revert "get name right". This reverts commit 80091ea.
* Revert "Use logging.Logger.warning instead of warning.warn in most cases, following". This reverts commit 3c1e3d6.
* Revert "Merge branch 'preferLoggingToWarning' into hst". This reverts commit 9d9d139, reversing changes made to 80091ea.
* Revert "Revert "Merge branch 'preferLoggingToWarning' into hst"". This reverts commit b3da219.
Co-authored-by: Henry S. Thompson <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>
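The literal-string rule referenced above (a backslash immediately followed by an end-of-line marker is a line continuation and produces no output) can be sketched as follows. This is an illustrative standalone function, not the parser change in `_parse_string_1`; the name `strip_line_continuations` is hypothetical, and the input is assumed to be the raw bytes between the parentheses of a literal string.

```python
def strip_line_continuations(body: bytes) -> bytes:
    """Drop backslash + end-of-line pairs while leaving other escapes alone."""
    out = bytearray()
    i = 0
    while i < len(body):
        byte = body[i : i + 1]
        if byte != b"\\":
            out += byte
            i += 1
            continue
        following = body[i + 1 : i + 2]
        if following == b"\r":
            # Backslash + CR, optionally followed by LF: drop all of it.
            i += 3 if body[i + 2 : i + 3] == b"\n" else 2
        elif following == b"\n":
            # Backslash + LF: drop both bytes.
            i += 2
        else:
            # Any other escape (including an escaped backslash) is kept as-is.
            out += byte + following
            i += 2
    return bytes(out)


print(strip_line_continuations(b"These \\\r\ntwo strings \\\nare the same."))
```

Scanning escape by escape, rather than using a plain substring replace, keeps an escaped backslash that happens to precede a real newline from being misread as a continuation.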

Squashed commit of the following: commit fa229f7 Merge: eaab3c6 c3e3499 Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 20:33:06 2021 -0700 Merge branch 'develop' into mypy (and fixed types) commit eaab3c6 Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 20:00:45 2021 -0700 reformat all multi-line function defs to one-arg-per-line commit 3fe2b69 Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 15:58:48 2021 -0700 ccitt nit -- avoid casting needlessly commit 15983d8 Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 15:58:36 2021 -0700 tweak CHANGELOG commit 13dc0ba Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 15:43:46 2021 -0700 add failing tests for dumppdf crash commit 6b509c5 Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 15:24:23 2021 -0700 ccitt: apply misc PR feedback commit feb031b Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 15:18:26 2021 -0700 add missing None return type to all __init__ methods commit c0d62d6 Author: Andrew Baumann <[email protected]> Date: Mon Sep 6 15:13:08 2021 -0700 minor cleanup, remove a few more Any types commit b52a059 Author: Andrew Baumann <[email protected]> Date: Sun Sep 5 22:37:28 2021 -0700 tighten up types, avoid Any in favour of explicit casts commit e58fd48 Author: Andrew Baumann <[email protected]> Date: Sun Sep 5 14:10:49 2021 -0700 annotate ccitt.py, and fix one definite bug (array.tostring was renamed tobytes) commit 6052906 Author: Andrew Baumann <[email protected]> Date: Sat Sep 4 22:37:38 2021 -0700 python 3.7 back-compat commit 4dbcf87 Author: Andrew Baumann <[email protected]> Date: Sat Sep 4 22:32:43 2021 -0700 annotate pdfminer.jbig2 commit 0d40b7c Author: Andrew Baumann <[email protected]> Date: Sat Sep 4 22:31:33 2021 -0700 annotate pdf2txt.py commit 5f82eb4 Author: Andrew Baumann <[email protected]> Date: Sat Sep 4 09:16:31 2021 -0700 cleanup: make Plane generic commit 624fc92 Author: Andrew Baumann <[email protected]> Date: Fri 
Sep 3 23:16:51 2021 -0700 bluntly ignore calls to cryptography.hazmat commit 96b2043 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 23:01:06 2021 -0700 finish annotating, and disallow_untyped_defs for pdfminer.* _except_ ccitt and jbig2 commit 0ab5863 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 21:51:56 2021 -0700 annotate pdffont commit 4b689f1 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 18:30:02 2021 -0700 annotate a couple more scripts; document sketchy code commit 291981f Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 15:02:01 2021 -0700 pacify flake8 commit 45d2ce9 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 14:31:48 2021 -0700 annotate dumppdf, and comment likely bugs commit 7278d83 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 13:49:58 2021 -0700 enable mypy on tests and tools, fix one implicit reexport bug commit 4a83166 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 13:25:59 2021 -0700 pdfdocument: per dumppdf.py, get_dest accepts either bytes or str commit 43701e1 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 13:25:00 2021 -0700 layout: LAParams.boxes_flow may be None commit 164f816 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 09:45:09 2021 -0700 add whitespace, pacify flake8 commit 893b9fb Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 09:40:33 2021 -0700 support old Python without typing.Protocol commit dc24508 Author: Andrew Baumann <[email protected]> Date: Fri Sep 3 09:12:03 2021 -0700 Move "# type: ignore" comments to fix mypy on Python < 3.8 The placement of these comments got more flexible in 3.8 due to python/mypy#1032 Satisfying older Python and fitting in flake8's 79-character line limit was quite a challenge! 
commit da03afe Author: Andrew Baumann <[email protected]> Date: Thu Sep 2 22:59:58 2021 -0700 fix text output from HTMLConverter commit 5401276 Author: Andrew Baumann <[email protected]> Date: Thu Sep 2 22:40:22 2021 -0700 annotate high_level.py and the immediately-reachable internal APIs (mostly converters) commit cc49051 Author: Andrew Baumann <[email protected]> Date: Thu Sep 2 17:04:35 2021 -0700 * expand and improve annotations in cmap, encryption/decompression and fonts * disallow untyped calls; this way, we have a core set of typed code that can grow over time (just not for ccitt, because there's a ton of work lurking there) * expand "typing: none" comments to suppress a specific error code commit 92df54b Author: Andrew Baumann <[email protected]> Date: Wed Sep 1 20:50:59 2021 -0700 update CHANGELOG commit f72aaea Merge: ff787a9 8ea9f10 Author: Andrew Baumann <[email protected]> Date: Wed Sep 1 20:47:03 2021 -0700 Merge branch 'develop' into mypy commit ff787a9 Author: Andrew Baumann <[email protected]> Date: Sat Aug 21 21:46:14 2021 -0700 be more precise about types on ps/pdf stacks, remove most of the Any annotations commit be15501 Author: Andrew Baumann <[email protected]> Date: Sat Aug 21 10:13:58 2021 -0700 silence missing imports, (maybe?) 
hook to tox commit ff4b6a9 Author: Andrew Baumann <[email protected]> Date: Fri Aug 20 22:49:06 2021 -0700 turn on more strict checks, and untangle the layout mess with generics Status: $ mypy pdfminer pdfminer/ccitt.py:565: error: Cannot find implementation or library stub for module named "pygame" pdfminer/ccitt.py:565: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports pdfminer/pdfdocument.py:7: error: Skipping analyzing "cryptography.hazmat.backends": found module but no type hints or library stubs pdfminer/pdfdocument.py:8: error: Skipping analyzing "cryptography.hazmat.primitives.ciphers": found module but no type hints or library stubs pdfminer/pdfdevice.py:191: error: Argument 1 to "write" of "IO" has incompatible type "str"; expected "bytes" pdfminer/image.py:84: error: Cannot find implementation or library stub for module named "PIL" Found 5 errors in 4 files (checked 27 source files) pdfdevice.py:191 appears to be a real bug commit 5c9c0b1 Author: Andrew Baumann <[email protected]> Date: Fri Aug 20 17:22:41 2021 -0700 finish annotating layout commit 0e6871c Author: Andrew Baumann <[email protected]> Date: Fri Aug 20 16:54:46 2021 -0700 general progress on annotations * finish utils * annotate more of pdfinterp, pdfdevice * document reason for # type: ignore comments * fix cyclic imports * satisfy flake8 commit 17d59f4 Author: Andrew Baumann <[email protected]> Date: Thu Aug 19 21:38:50 2021 -0700 WIP on type annotations With the possible exception of psparser.py, this is far from complete. 
$ mypy pdfminer pdfminer/ccitt.py:565: error: Cannot find implementation or library stub for module named "pygame" pdfminer/ccitt.py:565: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports pdfminer/pdfdocument.py:7: error: Skipping analyzing "cryptography.hazmat.backends": found module but no type hints or library stubs pdfminer/pdfdocument.py:8: error: Skipping analyzing "cryptography.hazmat.primitives.ciphers": found module but no type hints or library stubs pdfminer/image.py:84: error: Cannot find implementation or library stub for module named "PIL"

…mypy, nosetest and sphinx in their own environment on cicd (pdfminer#677) * Improve tox.ini by running flake8, mypy, nosetests and sphinx in their own environment. Improves isolation. Dependencies of one package won't influence the next. This should fail for the current setup with typing-extensions. * Try to fix actually running tox tests on travis * Use recent tox * Fix using Literal[False] for open_filename. None has the same truth value as False, and therefore it does not matter. * Replace typing_extensions.Literal by the type of the literal * Add line to CHANGELOG.md

Fixes pdfminer#625 * add support for Identity-H/V cmap fonts * format code to pass flake8 check * Remove indent * Remove indent * Use isinstance instead of type check * Use or instead of any * Use str in variable, instead of str.find() * Fix mypy error: add typing annotations to get_unichr() * Fix type of PDFCIDFont. Can be any type of CMapBase. This is a quick fix, the entire cmap structure does not have proper inheritance. * Added line to CHANGELOG.md * Add separate class for IdentityUnicodeMap * Remove ABC from CmapBase * Remove ABC from CmapBase * Remove blank line Co-authored-by: huan_cheng <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>

@zegrep

…ner#637)
* Attempt to handle decompression error on some broken PDF files. From time to time we come across files in which no text is detected, while readers like evince read the PDF fine. After some digging, it turned out that this is because the PDF includes some badly compressed data. This may be fixed by decompressing byte by byte and ignoring the error on the last check bytes (arbitrarily found to be the last 3). This has been largely inspired by py-pdf/pypdf#422 and the test file has been taken from there, so credits to @zegrep.
* Use a warning instead of raising an exception when a zlib error is detected before the CRC checksum.
* Add line to CHANGELOG.md
* Only try decompressing if not in strict mode
* Change error into warning because warnings.warn needs a subclass of Warning
Co-authored-by: Sylvain Thénault <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>
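The byte-by-byte recovery strategy described in this commit can be sketched with the standard `zlib` module. This is an illustration of the approach, not necessarily the exact code that was merged; the function name `decompress_lenient` and the log message are assumptions, while the cut-off of 3 trailing bytes is taken from the commit message.

```python
import logging
import zlib

logger = logging.getLogger(__name__)


def decompress_lenient(data: bytes) -> bytes:
    """Decompress one byte at a time, keeping whatever was decoded before an error."""
    decompressor = zlib.decompressobj()
    result = bytearray()
    consumed = 0
    try:
        for consumed in range(len(data)):
            result += decompressor.decompress(data[consumed : consumed + 1])
    except zlib.error:
        # An error within the final bytes is treated as a bad checksum and
        # ignored; anything earlier means real data was lost, so report it.
        if consumed < len(data) - 3:
            logger.warning("Data loss while decompressing corrupted data")
    return bytes(result)


original = b"pdfminer " * 50
compressed = zlib.compress(original)
# Flip the last byte so the trailing checksum no longer matches.
broken = compressed[:-1] + bytes([compressed[-1] ^ 0xFF])
recovered = decompress_lenient(broken)
print(recovered == original)
```

Feeding single bytes is slow, which is one reason to use a routine like this only as a fallback after a normal `zlib.decompress` call has failed and strict mode is off.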

* array.array.tostring -> array.array.tobytes. The tostring method has been deprecated since Python 3.2 and was removed altogether in 3.9. In Python 3.2 the method was renamed to "tobytes". Will close pdfminer#641 * changelog entry * test for tobytes * Fix CHANGELOG.md * Update CHANGELOG.md to PR that I can push on * Simplify tests Co-authored-by: Forest Gregg <[email protected]>
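A small illustration of the rename, using only the standard library (the variable names are arbitrary):

```python
from array import array

bits = array("B", [0x50, 0x44, 0x46])

# array.tostring() no longer exists on Python 3.9+; tobytes() is the
# replacement and has been available since Python 3.2.
raw = bits.tobytes()
print(raw)
```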

…se to pytest (pdfminer#704) * Replace tox with nox * Replace travis with github actions * Fix pytest, mypy and flake8 errors * Add pytest. * Run on all commits * Remove nose * Speedup slow tests to save GitHub actions minutes * Added line to CHANGELOG.md * Fix line too long in pdfdocument.py * Update .github/workflows/actions.yml Co-authored-by: Jake Stockwin <[email protected]> * Improve actions.yml * Fix error with nox name for mypy * Add names for jobs * Replace nose.raises with pytest.raises Co-authored-by: Jake Stockwin <[email protected]>

* `log.info` changed to `log.debug` in six files * Fix indentation * Remove from CHANGELOG.md since no functionality has changed Co-authored-by: Pedro Nunes <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>

* Check blackness in github actions * Blacken code * Update github action names * Add contributing guidelines on using black * Add to checklist for PR

* Raise specific warning if Pillow cannot be imported * Improve error message * Update docs * Update CHANGELOG.md * Update pdfminer/image.py Co-authored-by: Jake Stockwin <[email protected]> Co-authored-by: Jake Stockwin <[email protected]>

* Adding in checks for spurious lines that contain either only spaces or new line characters * Added spurious lines check and unit tests * Updated CHANGELOG.md with changes * Simplify code * Simplify code * Simplify code * Remove changes to lines that are not actually changed * Format import * Improve CHANGELOG.md * Improve CHANGELOG.md * Fix cicd * Blacken Co-authored-by: Pieter Marsman <[email protected]>

…r#727) * Add github action for releasing to pypi if git tag is added. * Checkout code and fix typos. * Replace end with fi * Strictly numeric version for testing. * Remove obsolete Make commands for publishing * Also create GitHub release * Update pdfminer/__init__.py Co-authored-by: Jake Stockwin <[email protected]> * Remove test pypi release * Use maintained github action for releasing * Change tag format for versions * Undo commenting pypi publishing * Remove develop branch, since that will be removed in favor of adding tags for releases. * Change version regex Co-authored-by: Jake Stockwin <[email protected]>

* Convert fontname to str if it is bytes * Add CHANGELOG.md

…#733) * Raise KeyError when name in name2unicode is not of type str * Add CHANGELOG.md

…ys set (pdfminer#732) * Fix log.debug statement in lzw.py by ensuring that self.table is always set. * Add CHANGELOG.md

* Log warning and continue gracefully if errors in cmap * Fix nox testing * Also log warning if cid range is larger than actual code * Format with black * Add docstring * Add CHANGELOG.md * Restore running cmapdb.py directly

pdfminer#737) * Refactor ImageWriter and add method for exporting an image from bytes. E.g. when FlateDecode just results in a list of RGB bytes. * Added docstrings * Add CHANGELOG.md * Run black * Run black

* Use charset-normalizer instead of chardet * Ignore charset_normalizer type stub * Add CHANGELOG.md

* Ignore path constructors that do not begin with m. Per PDF Reference Section 4.4.1, "path construction operators may be invoked in any sequence, but the first one invoked must be m or re to begin a new subpath." Since pdfminer.six already converts all `re` (rectangle) operators to their equivalent `mlllh` representation, paths ingested by `.paint_path(...)` that do not begin with the `m` operator are invalid. In addition to the advantage of hewing to the PDF Reference, this change also avoids the `ValueError: not enough values to unpack (expected 2, got 1)` error raised by the `pts = [apply_matrix_pt(self.ctm, pt) for pt in raw_pts]` line in `converter.py` when parsing PDFs that (erroneously) include `("h",)` paths.
* Update CHANGELOG.md
Co-authored-by: Pieter Marsman <[email protected]>
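The validity rule quoted from the PDF Reference can be expressed as a simple guard. This is a sketch under the assumption that paths are lists of tuples such as `("m", x, y)`; the name `is_drawable_path` is hypothetical and is not taken from `converter.py`.

```python
from typing import List, Sequence


def is_drawable_path(path: List[Sequence]) -> bool:
    """A path is only usable if it is non-empty and starts with the "m" operator."""
    return bool(path) and path[0][0] == "m"


paths = [
    [("m", 0, 0), ("l", 1, 1)],  # valid: begins with a move-to
    [("h",)],  # invalid: closes a subpath that was never started
    [],  # invalid: nothing to draw
]
drawable = [path for path in paths if is_drawable_path(path)]
print(drawable)
```

Filtering such paths up front means later code can rely on every segment it unpacks carrying at least one coordinate pair before any `h`.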

Using an upper bound for dependency versions on a library is a source of trouble for users. Let's not do it as it makes pdfminer wreak havoc downstream. Signed-off-by: Philippe Ombredanne <[email protected]>

* Fix Sphinx warnings howto/acro_forms.rst:4: WARNING: Title underline too short. howto/acro_forms.rst:81: WARNING: Bullet list ends without a blank line; unexpected unindent. howto/acro_forms.rst:88: WARNING: Bullet list ends without a blank line; unexpected unindent. howto/acro_forms.rst:122: WARNING: Bullet list ends without a blank line; unexpected unindent. tutorial/extract_pages.rst:6: WARNING: Failed to create a cross reference. A title or caption not found: api_extract_pages * Fix documenting pdf2txt.py reference/commandline.rst:12: ERROR: Module "tools.pdf2txt" has no attribute "maketheparser" Incorrect argparse :module: or :func: values? * Add CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

…t documented. Also deprecate usage of scripts that are only there for testing purposes. (pdfminer#756) * Deprecate usage of `if __name__ == "__main__"` in scripts that are not documented. Also deprecate usage of scripts that are only there for testing purposes. * Add CHANGELOG.md * Cleanup CHANGELOG.md * Cleanup CHANGELOG.md * Undo deleting conf_glyphlist.py and conf_afm.py and add a deprecation warning instead

* Issue pdfminer#720 resolve1 when getting the default width. * Add CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

…. (pdfminer#774) * Fix crash with unencrypted metadata values (pdfminer#766). * Explicitly check for length * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

…#768) * Ignore null characters in PSBaseParser. Previously, null characters were encoded as PSKeyword tokens. This caused issue pdfminer#617: pdfdevice.py would attempt to decode the null-character PSKeyword where it expects a byte string, causing pdfminer.six to crash. As null characters are superfluous within PSBaseParser, ignore them. * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

* Install typing_extensions on Python 3.6 and 3.7 * Add CHANGELOG.md * Black setup.py

* Run black locally with nox * Update contributor instructions * Fix workflow

…o pdfminer-master # Conflicts: # pdfminer/fontmetrics.py

estshorter and others added 30 commits October 24, 2020 15:55

Remove unused dependency on sortedcontainers (pdfminer#525)

61300ee

* Remove unused sortedcontainers package * Fix changelog format * Fix a link to the PR * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

Fix for when 'trailer' is indented (pdfminer#513)

ec223d1

* Fix for when 'trailer' is indented Closes pdfminer#214 * Address CR comments - strip line after parsing * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

Remove explicit support for Python 3.4 and 3.5, adding tests for pyth…

875e530

…on 3.9 (pdfminer#522) Closes pdfminer#503

Revert "Fix for when 'trailer' is indented (pdfminer#513)" (pdfminer#534

178a831

) This reverts commit ec223d1.

Correct typo's and syntax errors from README.md (pdfminer#538)

f389b97

Fix cryptography build in travis cicd by upgrading distribution from …

761410e

…Trusty Tahr to Focal Fossa (pdfminer#585) * Update .travis.yml * Also change 3.9-dev to 3.9 because that is now supported by travis

Use python3.9 in tox config

22f9052

* tox: use Python 3.9 final * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

Fix 594 use null id when encrypted but no id given (pdfminer#595)

a70f088

Co-authored-by: Pieter Marsman <[email protected]>

Updated link to Diátaxis documentation website (pdfminer#606)

1d33c02

The canonical home of the documentation framework has moved from documentation.divio.com to https://diataxis.fr.

Add support for ISO 32000-2 AES256 encryption (pdfminer#614)

c3e3499

* feat: Add support for ISO 32000-2 AES256 encryption * feat: Applies review suggestions

Bump version to 20211012

da5b968

pietermarsman and others added 29 commits February 2, 2022 22:24

Update actions.yml so that it will run for all PR's

81f873e

Update README.md batch for Continuous integration

2254306

Changed log.info to log.debug in six files (pdfminer#690)

830acff

* `log.info` changed to `log.debug` in six files * Fix indentation * Remove from CHANGELOG.md since no functionality has changed Co-authored-by: Pedro Nunes <[email protected]> Co-authored-by: Pieter Marsman <[email protected]>

Check blackness in github actions (pdfminer#711)

b9a8920

* Check blackness in github actions * Blacken code * Update github action names * Add contributing guidelines on using black * Add to checklist for PR

Bump version

c2e516d

Fix github actions tag regex

a2e1d6a

Fix github actions tag regex

ae7f315

Convert fontname to str if it is bytes in HTMLConverter (pdfminer#734)

e27cd54

* Convert fontname to str if it is bytes * Add CHANGELOG.md

Raise KeyError when name in name2unicode is not of type str (pdfminer…

782368b

…#733) * Raise KeyError when name in name2unicode is not of type str * Add CHANGELOG.md

Fix log.debug statement in lzw.py by ensuring that self.table is alwa…

13021c9

…ys set (pdfminer#732) * Fix log.debug statement in lzw.py by ensuring that self.table is always set. * Add CHANGELOG.md

Refactor ImageWriter and add method for exporting an image from bytes. (

617e4c8

pdfminer#737) * Refactor ImageWriter and add method for exporting an image from bytes. E.g. when FlateDecode just results in a list of RGB bytes. * Added docstrings * Add CHANGELOG.md * Run black * Run black

Use charset-normalizer instead of chardet (pdfminer#744)

1bf3c42

* Use charset-normalizer instead of chardet * Ignore charset_normalizer type stub * Add CHANGELOG.md

Bump version 20220506 & fix small issue with types

e19aea9

Remove upper version bounds (pdfminer#755)

7f97e26

Using an upper bound for dependency versions on a library is a source of trouble for users. Let's not do it as it makes pdfminer wreak havoc downstream. Signed-off-by: Philippe Ombredanne <[email protected]>

Update CHANGELOG.md for pdfminer#755

0b09d5f

Fix TypeError when getting default width of font (pdfminer#772)

1044fc0

* Issue pdfminer#720 resolve1 when getting the default width. * Add CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

Fix ValueError with unencrypted metadata values (Fixes pdfminer#766)…

f63e9fb

…. (pdfminer#774) * Fix crash with unencrypted metadata values (pdfminer#766). * Explicitly check for length * Update CHANGELOG.md Co-authored-by: Pieter Marsman <[email protected]>

Install typing_extensions on Python 3.6 and 3.7 (pdfminer#775)

4733eb3

* Install typing_extensions on Python 3.6 and 3.7 * Add CHANGELOG.md * Black setup.py

Run black locally with nox (pdfminer#776)

8f52578

* Run black locally with nox * Update contributor instructions * Fix workflow

Merge branch 'master' of https://github.com/pdfminer/pdfminer.six int…

697e7a9

…o pdfminer-master # Conflicts: # pdfminer/fontmetrics.py

josiahkhor merged commit 778c8c8 into master Jul 22, 2022
