
Pdfminer master #2

Merged
merged 72 commits into from
Jul 22, 2022

josiahkhor
Owner

Pull request

Thanks for improving pdfminer.six! Please include the following information to
help us discuss and merge this PR:

  • A description of why this PR is needed. What does it fix? What does it
    improve?
  • A summary of the things that this PR changes.
  • Reference the issues that this PR fixes (use the fixes #(issue nr) syntax).
    If this PR does not fix any issue, create the issue first and mention that
    you are willing to work on it.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. Include an example pdf if you have one.

Checklist

  • I have added tests that prove my fix is effective or that my feature
    works
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial
    version
  • I have updated the README.md or verified that this
    is not necessary
  • I have updated the readthedocs documentation or I
    verified that this is not necessary
  • I have added a concise human-readable description of the change to
    CHANGELOG.md

estshorter and others added 30 commits October 24, 2020 15:55
* Remove unused sortedcontainers package

* Fix changelog format

* Fix a link to the PR

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
* Fix for when 'trailer' is indented

Closes pdfminer#214

* Address CR comments - strip line after parsing

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
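A minimal sketch of why stripping each line helps, assuming a simplified classic xref table handed over as a list of byte lines. The function name `read_xref_table` is hypothetical; this is not pdfminer's parser:

```python
from typing import Dict, Iterable, Optional, Tuple


def read_xref_table(lines: Iterable[bytes]) -> Dict[int, Tuple[int, int]]:
    """Map object numbers to (byte offset, generation) for in-use entries.

    Every line is stripped before it is inspected, so an indented
    b"   trailer" still ends the table.
    """
    offsets: Dict[int, Tuple[int, int]] = {}
    objid: Optional[int] = None
    it = iter(lines)
    if next(it).strip() != b"xref":
        raise ValueError("expected 'xref' keyword")
    for raw in it:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(b"trailer"):
            break
        fields = line.split()
        if len(fields) == 2:
            # Subsection header: first object number and entry count.
            objid = int(fields[0])
            continue
        if objid is None or len(fields) != 3:
            raise ValueError("malformed xref line: %r" % line)
        pos, gen, kind = fields
        if kind == b"n":  # 'n' = in use, 'f' = free
            offsets[objid] = (int(pos), int(gen))
        objid += 1
    return offsets
```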
…cking the value of 'DW2' (pdfminer#529)

Closes pdfminer#518 

* Fix TypeError: cannot unpack non-iterable PDFObjRef object, when unpacking the value of 'DW2'

An error occurred when the 'DW2' key contains a PDFObjRef object instead of a list of int values, e.g. 'DW2': <PDFObjRef:152>.
To solve this issue, we utilise the resolve1() function

See: pdfminer#518

* Updated CHANGELOG

* Update CHANGELOG.md

Co-authored-by: Dimitrios TSOLAKIDIS <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
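A minimal sketch of the fix: resolve the value before unpacking it. `ObjRef` is a hypothetical stand-in for `PDFObjRef`, and this `resolve1` only mirrors the idea of pdfminer's function. `[880, -1000]` is the default value the PDF specification gives for `DW2`:

```python
from typing import Any, Dict, Tuple


class ObjRef:
    """Hypothetical stand-in for an indirect object reference (like PDFObjRef)."""

    def __init__(self, target: Any) -> None:
        self._target = target

    def resolve(self) -> Any:
        return self._target


def resolve1(x: Any) -> Any:
    """Follow references until a direct object is reached (sketch)."""
    while isinstance(x, ObjRef):
        x = x.resolve()
    return x


def get_dw2(spec: Dict[str, Any]) -> Tuple[int, int]:
    # Resolve *before* unpacking; unpacking an ObjRef directly raises
    # "TypeError: cannot unpack non-iterable ObjRef object".
    vy, w = resolve1(spec.get("DW2", [880, -1000]))  # PDF default for DW2
    return vy, w
```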
… removing the try-except check that writes a unicode character to the stream (pdfminer#523)

Closes pdfminer#191 

* Remove support for non-standard output streams that are not binary by removing the try-except check that writes a unicode character to the stream

* Add docstring

* Fix flake8
…with the error: [Errno 13] Permission denied) (pdfminer#484)

Closes pdfminer#469

* Issue pdfminer#469 is fixed

* one extra comment to code is added

* TemporaryFilePath context manager is added to facilitate tests

* flake8 complaints fixed

* Update docs of tempfilepath.py

* Fix flake8

Co-authored-by: Pieter Marsman <[email protected]>
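A sketch of such a context manager, assuming the goal is a path that other code can reopen. On Windows, a `NamedTemporaryFile` that is still open cannot be opened a second time by name, which is one source of `[Errno 13] Permission denied`. This `TemporaryFilePath` is an illustration, not the helper in `tempfilepath.py`:

```python
import os
import tempfile
from typing import Any, Optional


class TemporaryFilePath:
    """Yield the path of a closed temporary file; delete it on exit (sketch)."""

    def __init__(
        self,
        suffix: str = "",
        prefix: str = "tmp",
        dir: Optional[str] = None,
        delete: bool = True,
    ) -> None:
        self.suffix = suffix
        self.prefix = prefix
        self.dir = dir
        self.delete = delete
        self.name = ""

    def __enter__(self) -> str:
        fd, self.name = tempfile.mkstemp(
            suffix=self.suffix, prefix=self.prefix, dir=self.dir
        )
        # Close our handle right away so the caller can open the path freely.
        os.close(fd)
        return self.name

    def __exit__(self, *exc_info: Any) -> None:
        if self.delete:
            try:
                os.remove(self.name)
            except FileNotFoundError:
                pass  # the code under test may already have removed it
```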
…Trusty Tahr to Focal Fossa (pdfminer#585)

* Update .travis.yml

* Also change 3.9-dev to 3.9 because that is now supported by travis
* tox: use Python 3.9 final

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
* Fix .paint_path handling of single line segments

- Fixes typo ("ml" should have been "mlh")

- Removes if-statement that required individual line segments to be
  strictly horizontal or vertical.

* Treat 'ml'-shape paths as lines not curves

Although 'mlh' is the canonical implementation of a single line
segment, 'ml' is fairly common.

Adds tests and sample PDF.

* Fix trailing whitespace

* Fix point extraction from Bézier path commands

This commit corrects the manner in which "pts" are extracted from Bézier
path commands. See Table 4.9 of the PDF reference manual, and the new comments
in the code for details. Previously, depending on the command (c, v, or y),
the code was extracting some combination of control points (not on the
curve) and actual points on the curve.

This commit also refactors .paint_path, so that apply_matrix_pt is only
called in one place, and to treat the "h" command in a manner more
consistent with other path commands.

* Add comments to test_paint_path_quadrilaterals

* Parse rect-forming mllll paths as rects not curves

Now that .paint_path has been refactored, adding support for
rect-forming mllll paths requires no extra code, beyond a minor tweak to
the relevant elif statement.

* One changelog line with ref to mr

* Remove PDFLayoutAnalyzer._create_curve because implementation has become trivial due to refactoring

* Extract variables from if statement to make it easier to read

* Optimize imports order

* Trigger travis build

* Revert "Trigger travis build"

This reverts commit 41c0518

* Update travis badge

* Update travis badge

Co-authored-by: Pieter Marsman <[email protected]>
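A rough sketch of the shape logic described in these commits, assuming a path is a list of tuples like `("l", x, y)`. `on_curve_points` and `classify_path` are hypothetical helpers, not pdfminer's `.paint_path`. The sketch keeps only the final coordinate pair of `c`, `v` and `y` (the on-curve end point), treats `h` as a return to the start point, and reports a rect only for closed, axis-aligned `mlllh`/`mllll` paths:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple  # e.g. ("m", 0, 0), ("l", 10, 0), ("c", x1, y1, x2, y2, x3, y3), ("h",)


def on_curve_points(path: Sequence[Segment]) -> List[Point]:
    """Return the on-curve end point of every path operator (sketch)."""
    pts: List[Point] = []
    for op, *args in path:
        if op in ("m", "l"):
            pts.append((args[0], args[1]))
        elif op in ("c", "v", "y"):
            # c: x1 y1 x2 y2 x3 y3 | v: x2 y2 x3 y3 | y: x1 y1 x3 y3
            # Only the last pair (x3, y3) lies on the curve; the others
            # are control points.
            pts.append((args[-2], args[-1]))
        elif op == "h" and pts:
            # Close the subpath by returning to its start (single subpath assumed).
            pts.append(pts[0])
    return pts


def classify_path(path: Sequence[Segment]) -> str:
    shape = "".join(seg[0] for seg in path)
    pts = on_curve_points(path)
    if shape in ("ml", "mlh"):
        return "line"
    if shape in ("mlllh", "mllll"):
        (x0, y0), (x1, y1), (x2, y2), (x3, y3), last = pts
        is_closed = last == pts[0]
        is_axis_aligned = (x0 == x1 and y1 == y2 and x2 == x3 and y3 == y0) or (
            y0 == y1 and x1 == x2 and y2 == y3 and x3 == x0
        )
        if is_closed and is_axis_aligned:
            return "rect"
    return "curve"
```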
* Fix for when trailer is indented

* Store stripped line

* This commit breaks things...

* Or maybe this one breaks things?

* Remove commented code because no longer used.

* Add CHANGELOG.md

* Add poetry venv management files to gitignore since I started using poetry to manage the python envs for this project

Co-authored-by: Pieter Marsman <[email protected]>
…dfminer#537)

* Added support for Paeth PNG filter compression (predictor value = 4)

* Use `above` and `upper_left` as in the pseudo code

* Refactor: use variable names that are very close to the pseudo code and add pieces of the docs to show what is going on.

* Fix line length issues

* Add line about compressions to README.md

* Fix merge conflict on readme

* Fix bug in filter type Up

* Make if-else consistent

Co-authored-by: Eduardo Gonzalez Lopez de Murillas <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
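For reference, a sketch of PNG row unfiltering as described in the PNG specification, including Up (filter type 2) and the Paeth predictor (filter type 4). `unfilter_row` is a hypothetical helper; it assumes the filter-type byte has already been split off each row and that `bpp` (bytes per complete pixel) is known. It is not pdfminer's implementation:

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Pick whichever neighbour is closest to left + above - upper_left."""
    p = left + above - upper_left
    pa, pb, pc = abs(p - left), abs(p - above), abs(p - upper_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left


def unfilter_row(filter_type: int, row: bytes, prev: bytes, bpp: int) -> bytes:
    """Undo one PNG filter; ``prev`` is the previous *unfiltered* row."""
    out = bytearray(len(row))
    for i, raw in enumerate(row):
        left = out[i - bpp] if i >= bpp else 0
        above = prev[i]
        upper_left = prev[i - bpp] if i >= bpp else 0
        if filter_type == 0:  # None
            pred = 0
        elif filter_type == 1:  # Sub
            pred = left
        elif filter_type == 2:  # Up
            pred = above
        elif filter_type == 3:  # Average
            pred = (left + above) // 2
        elif filter_type == 4:  # Paeth
            pred = paeth_predictor(left, above, upper_left)
        else:
            raise ValueError("unsupported PNG filter type %d" % filter_type)
        out[i] = (raw + pred) & 0xFF  # arithmetic is modulo 256
    return bytes(out)
```

The first row of an image is unfiltered against a row of zero bytes, e.g. `unfilter_row(4, row, bytes(len(row)), bpp)`.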
…ser (pdfminer#574)

* check obj type

* update changelog

* Update CHANGELOG.md

* fix the bug

* fix condition

* update changelog

* update changelog again

* update changelog

* update

Co-authored-by: Pieter Marsman <[email protected]>
Co-authored-by: Tony Tong <[email protected]>
* Fix typos and possible mistakes.

* Revert two edits based on discussion in pdfminer#579

Revert the two changes based on our discussion. 

I read the documentation and had a glimpse at the default code. And perhaps the confusion was caused by the figure that shows the Char Margin (M) and the Word Margin (W). Clearly, M is smaller than W in absolute terms, but as mentioned, they are both relative numbers.

Maybe it is useful to point that out in the figure but I am not sure how best to do it. 

Another option is to use something like `min_char_margin_threshold` or similar, in the hope that it is easier to understand. Just some thoughts!

* Triggering travis again

Co-authored-by: Pieter Marsman <[email protected]>
Fixes pdfminer#566 

* Try to fix an issue where some Chinese characters could not be extracted
correctly (pdfminer#566).

* format code to pass flake8 check.

* fix typo and refer to issue 593.

Co-authored-by: huan_cheng <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
…iner#600)

* Fix an error when dumping a TOC

* Fix a bug where a TOC title variable is of type bytes

* Update CHANGELOG.md

* Update CHANGELOG.md

* Rename e() to escape() and merge two isinstance() checks

Co-authored-by: Pieter Marsman <[email protected]>
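A sketch of an escape helper that accepts both `bytes` and `str`, assuming bytes titles are decoded as Latin-1 and that the goal is to replace risky characters with numeric character references. The pattern is illustrative, not copied from dumppdf:

```python
import re
from typing import Union

# Control characters, XML-special characters, parentheses, quotes,
# backslash and code points 0x7F-0xFF.
ESC_PAT = re.compile(r'[\000-\037&<>()"\042\047\134\177-\377]')


def escape(s: Union[str, bytes]) -> str:
    """Return ``s`` as str with risky characters replaced by &#NNN; references."""
    us = str(s, "latin-1") if isinstance(s, bytes) else s
    return ESC_PAT.sub(lambda m: "&#%d;" % ord(m.group(0)), us)
```

A single `isinstance` check up front means everything after it works on `str` only.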
The canonical home of the documentation framework has moved
from documentation.divio.com to https://diataxis.fr.
* add missing import for extract_text_to_fp

* Replace testsetup with visible imports in documentation

* Remove obsolete check for python version; python 2 is not supported anymore

* (Unrelated to this MR) Remove sys from converter.py

* Optimize imports

* (Unrelated to this MR) fix line length error

Co-authored-by: Pieter Marsman <[email protected]>
…output of TagExtractor

* high_level: emit diagnostic for bad output_type

* TagExtractor: eliminate runtime error

This does not make it usable, but will satisfy my curiosity.

* Use if-elif-else structure

* Fix pycharm spacing warning

* Rename _write_outfp to _write

* Properly format tag names and tag values. Using utils.make_compat_str() such that the tag value is always a string.

* Update CHANGELOG.md

* Fix flake8 errors

Co-authored-by: Pieter Marsman <[email protected]>
* Fix typos in converting_pdf_to_text.rst

* The word "pdfminer.six" should not be split by a newline; otherwise the renderer treats it as two separate words and displays them separately.

* Trim redundant spaces

Co-authored-by: Pieter Marsman <[email protected]>
* feat: Add support for ISO 32000-2 AES256 encryption

* feat: Applies review suggestions
…ry, escaped \r\n should be removed (pdfminer#616)

* detect TextIOWrapper as non-binary

* I don't understand the CHANGELOG.md format, hope this is good enough

* Delete \\\r\n in Literal Strings (ref. section 7.3.4.2 of PDF32000_2008)

* Keep Travis CI happy

* Added test

* Remove pdfminer/Changelog

* Prettify _parse_string_1

* Add CHANGELOG.md

* Satisfy flake8

* Update CHANGELOG.md

* Use logging.Logger.warning instead of warnings.warn in most cases, following
  the official Python guidance that warnings.warn is directed at _developers_,
  not users

  * (pdfdocument.py) Remove declarations of PDFTextExtractionNotAllowedWarning
    and PDFNoValidXRefWarning

  * (pdfpage.py) Don't import warnings, don't use PDFTextExtractionNotAllowedWarning

  * (tools/dumppdf.py) Don't import warnings, don't use PDFNoValidXRefWarning

  * (tests/test_tools_dumppdf.py) Don't import warnings, check for logging.WARN
    rather than PDFNoValidXRefWarning

* get name right

* make flake8 happy

* Revert "make flake8 happy"

This reverts commit 4592769.

* Revert "get name right"

This reverts commit 80091ea.

* Revert "Use logging.Logger.warning instead of warning.warn in most cases, following"

This reverts commit 3c1e3d6.

* Revert "Merge branch 'preferLoggingToWarning' into hst"

This reverts commit 9d9d139, reversing
changes made to 80091ea.

* Revert "Revert "Merge branch 'preferLoggingToWarning' into hst""

This reverts commit b3da219.

Co-authored-by: Henry S. Thompson <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
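A sketch of decoding the body of a literal string (the bytes between the outer parentheses) per section 7.3.4.2 of PDF32000_2008, where a backslash followed by an end-of-line marker is a line continuation and is dropped. `unescape_literal` is hypothetical, and it leaves out the separate rule that unescaped end-of-line markers are normalized to `\n`:

```python
# Single-character escapes from section 7.3.4.2 of PDF32000_2008.
SIMPLE_ESCAPES = {
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t", ord("b"): b"\b",
    ord("f"): b"\f", ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
}


def unescape_literal(body: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c != 0x5C or i + 1 == n:  # not a backslash (or a lone trailing one)
            out.append(c)
            i += 1
            continue
        nxt = body[i + 1]
        i += 2
        if nxt in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[nxt]
        elif nxt == 0x0D:  # backslash + CR or CR LF: line continuation, dropped
            if i < n and body[i] == 0x0A:
                i += 1
        elif nxt == 0x0A:  # backslash + LF: line continuation, dropped
            pass
        elif 0x30 <= nxt <= 0x37:  # octal escape \d, \dd or \ddd
            end = i
            while end < n and end - i < 2 and 0x30 <= body[end] <= 0x37:
                end += 1
            out.append(int(body[i - 1 : end], 8) & 0xFF)
            i = end
        else:  # unknown escape: the backslash is ignored
            out.append(nxt)
    return bytes(out)
```

Scanning escapes left to right (rather than a single regex substitution) keeps an escaped backslash followed by a real newline, `\\` + LF, from being mistaken for a line continuation.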
Squashed commit of the following:

commit fa229f7
Merge: eaab3c6 c3e3499
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 20:33:06 2021 -0700

    Merge branch 'develop' into mypy (and fixed types)

commit eaab3c6
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 20:00:45 2021 -0700

    reformat all multi-line function defs to one-arg-per-line

commit 3fe2b69
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 15:58:48 2021 -0700

    ccitt nit -- avoid casting needlessly

commit 15983d8
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 15:58:36 2021 -0700

    tweak CHANGELOG

commit 13dc0ba
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 15:43:46 2021 -0700

    add failing tests for dumppdf crash

commit 6b509c5
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 15:24:23 2021 -0700

    ccitt: apply misc PR feedback

commit feb031b
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 15:18:26 2021 -0700

    add missing None return type to all __init__ methods

commit c0d62d6
Author: Andrew Baumann <[email protected]>
Date:   Mon Sep 6 15:13:08 2021 -0700

    minor cleanup, remove a few more Any types

commit b52a059
Author: Andrew Baumann <[email protected]>
Date:   Sun Sep 5 22:37:28 2021 -0700

    tighten up types, avoid Any in favour of explicit casts

commit e58fd48
Author: Andrew Baumann <[email protected]>
Date:   Sun Sep 5 14:10:49 2021 -0700

    annotate ccitt.py, and fix one definite bug (array.tostring was renamed tobytes)

commit 6052906
Author: Andrew Baumann <[email protected]>
Date:   Sat Sep 4 22:37:38 2021 -0700

    python 3.7 back-compat

commit 4dbcf87
Author: Andrew Baumann <[email protected]>
Date:   Sat Sep 4 22:32:43 2021 -0700

    annotate pdfminer.jbig2

commit 0d40b7c
Author: Andrew Baumann <[email protected]>
Date:   Sat Sep 4 22:31:33 2021 -0700

    annotate pdf2txt.py

commit 5f82eb4
Author: Andrew Baumann <[email protected]>
Date:   Sat Sep 4 09:16:31 2021 -0700

    cleanup: make Plane generic

commit 624fc92
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 23:16:51 2021 -0700

    bluntly ignore calls to cryptography.hazmat

commit 96b2043
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 23:01:06 2021 -0700

    finish annotating, and disallow_untyped_defs for pdfminer.* _except_ ccitt and jbig2

commit 0ab5863
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 21:51:56 2021 -0700

    annotate pdffont

commit 4b689f1
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 18:30:02 2021 -0700

    annotate a couple more scripts; document sketchy code

commit 291981f
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 15:02:01 2021 -0700

    pacify flake8

commit 45d2ce9
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 14:31:48 2021 -0700

    annotate dumppdf, and comment likely bugs

commit 7278d83
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 13:49:58 2021 -0700

    enable mypy on tests and tools, fix one implicit reexport bug

commit 4a83166
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 13:25:59 2021 -0700

    pdfdocument: per dumppdf.py, get_dest accepts either bytes or str

commit 43701e1
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 13:25:00 2021 -0700

    layout: LAParams.boxes_flow may be None

commit 164f816
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 09:45:09 2021 -0700

    add whitespace, pacify flake8

commit 893b9fb
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 09:40:33 2021 -0700

    support old Python without typing.Protocol

commit dc24508
Author: Andrew Baumann <[email protected]>
Date:   Fri Sep 3 09:12:03 2021 -0700

    Move "# type: ignore" comments to fix mypy on Python < 3.8

    The placement of these comments got more flexible in 3.8 due to
    python/mypy#1032

    Satisfying older Python and fitting in flake8's 79-character line
    limit was quite a challenge!

commit da03afe
Author: Andrew Baumann <[email protected]>
Date:   Thu Sep 2 22:59:58 2021 -0700

    fix text output from HTMLConverter

commit 5401276
Author: Andrew Baumann <[email protected]>
Date:   Thu Sep 2 22:40:22 2021 -0700

    annotate high_level.py and the immediately-reachable internal APIs (mostly converters)

commit cc49051
Author: Andrew Baumann <[email protected]>
Date:   Thu Sep 2 17:04:35 2021 -0700

     * expand and improve annotations in cmap, encryption/decompression and fonts
     * disallow untyped calls; this way, we have a core set of
       typed code that can grow over time
       (just not for ccitt, because there's a ton of work lurking there)
     * expand "typing: none" comments to suppress a specific error code

commit 92df54b
Author: Andrew Baumann <[email protected]>
Date:   Wed Sep 1 20:50:59 2021 -0700

    update CHANGELOG

commit f72aaea
Merge: ff787a9 8ea9f10
Author: Andrew Baumann <[email protected]>
Date:   Wed Sep 1 20:47:03 2021 -0700

    Merge branch 'develop' into mypy

commit ff787a9
Author: Andrew Baumann <[email protected]>
Date:   Sat Aug 21 21:46:14 2021 -0700

    be more precise about types on ps/pdf stacks, remove most of the Any annotations

commit be15501
Author: Andrew Baumann <[email protected]>
Date:   Sat Aug 21 10:13:58 2021 -0700

    silence missing imports, (maybe?) hook to tox

commit ff4b6a9
Author: Andrew Baumann <[email protected]>
Date:   Fri Aug 20 22:49:06 2021 -0700

    turn on more strict checks, and untangle the layout mess with generics

    Status:
    $ mypy pdfminer
    pdfminer/ccitt.py:565: error: Cannot find implementation or library stub for module named "pygame"
    pdfminer/ccitt.py:565: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    pdfminer/pdfdocument.py:7: error: Skipping analyzing "cryptography.hazmat.backends": found module but no type hints or library stubs
    pdfminer/pdfdocument.py:8: error: Skipping analyzing "cryptography.hazmat.primitives.ciphers": found module but no type hints or library stubs
    pdfminer/pdfdevice.py:191: error: Argument 1 to "write" of "IO" has incompatible type "str"; expected "bytes"
    pdfminer/image.py:84: error: Cannot find implementation or library stub for module named "PIL"
    Found 5 errors in 4 files (checked 27 source files)

    pdfdevice.py:191 appears to be a real bug

commit 5c9c0b1
Author: Andrew Baumann <[email protected]>
Date:   Fri Aug 20 17:22:41 2021 -0700

    finish annotating layout

commit 0e6871c
Author: Andrew Baumann <[email protected]>
Date:   Fri Aug 20 16:54:46 2021 -0700

    general progress on annotations
     * finish utils
     * annotate more of pdfinterp, pdfdevice
     * document reason for # type: ignore comments
     * fix cyclic imports
     * satisfy flake8

commit 17d59f4
Author: Andrew Baumann <[email protected]>
Date:   Thu Aug 19 21:38:50 2021 -0700

    WIP on type annotations

    With the possible exception of psparser.py, this is far from complete.

    $ mypy pdfminer
    pdfminer/ccitt.py:565: error: Cannot find implementation or library stub for module named "pygame"
    pdfminer/ccitt.py:565: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    pdfminer/pdfdocument.py:7: error: Skipping analyzing "cryptography.hazmat.backends": found module but no type hints or library stubs
    pdfminer/pdfdocument.py:8: error: Skipping analyzing "cryptography.hazmat.primitives.ciphers": found module but no type hints or library stubs
    pdfminer/image.py:84: error: Cannot find implementation or library stub for module named "PIL"
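To illustrate what "make Plane generic" buys under mypy, a sketch of a container parameterized by its element type. `Box`, `Char` and `SimplePlane` are hypothetical, and the naive linear scan stands in for whatever spatial index the real `Plane` uses:

```python
from typing import Generic, Iterator, List, Tuple, TypeVar

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


class Box:
    def __init__(self, bbox: Rect) -> None:
        self.bbox = bbox


class Char(Box):
    def __init__(self, bbox: Rect, text: str) -> None:
        super().__init__(bbox)
        self.text = text


T = TypeVar("T", bound=Box)


class SimplePlane(Generic[T]):
    """Container whose lookups return the element type it was built with."""

    def __init__(self) -> None:
        self._objs: List[T] = []

    def add(self, obj: T) -> None:
        self._objs.append(obj)

    def find(self, bbox: Rect) -> Iterator[T]:
        """Yield stored objects whose bbox overlaps ``bbox`` (linear scan)."""
        x0, y0, x1, y1 = bbox
        for obj in self._objs:
            ox0, oy0, ox1, oy1 = obj.bbox
            if ox1 <= x0 or x1 <= ox0 or oy1 <= y0 or y1 <= oy0:
                continue
            yield obj


plane: "SimplePlane[Char]" = SimplePlane()
plane.add(Char((0, 0, 10, 10), "a"))
plane.add(Char((20, 20, 30, 30), "b"))
# mypy infers Char for each hit, so .text type-checks without a cast.
hits = [c.text for c in plane.find((5, 5, 15, 15))]
```

Without the type parameter, `find` could only promise a plain `Box`, forcing casts at every call site.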
…mypy, nosetest and sphinx in their own environment on cicd (pdfminer#677)

* Improve tox.ini by running flake8, mypy, nosetests and sphinx in their own environments.

Improves isolation. Dependencies of one package won't influence the next.

This should fail for the current setup with typing-extensions.

* Try to fix actually running tox tests on travis

* Use recent tox

* Fix using Literal[False] for open_filename.

None has the same truth value as False, and therefore it does not matter.

* Replace typing_extensions.Literal by the type of the literal

* Add line to CHANGELOG.md
Fixes pdfminer#625 

* add support for Identity-H/V cmap fonts

* format code to pass flake8 check

* Remove indent

* Remove indent

* Use isinstance instead of type check

* Use or instead of any

* Use str in variable, instead of str.find()

* Fix mypy error: add typing annotations to get_unichr()

* Fix type of PDFCIDFont. Can be any type of CMapBase.

This is a quick fix, the entire cmap structure does not have proper inheritance.

* Added line to CHANGELOG.md

* Add separate class for IdentityUnicodeMap

* Remove ABC from CmapBase

* Remove ABC from CmapBase

* Remove blank line

Co-authored-by: huan_cheng <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
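A sketch of the Identity-H/Identity-V mapping, assuming 2-byte big-endian codes whose CID equals the code value, and assuming an Identity ToUnicode map interprets the CID directly as a Unicode code point. Both class names are hypothetical:

```python
import struct
from typing import Tuple


class IdentityCMapSketch:
    """Identity-H / Identity-V: each 2-byte code maps to the CID of equal value."""

    def __init__(self, vertical: bool = False) -> None:
        self.vertical = vertical  # only the writing mode differs between H and V

    def decode(self, code: bytes) -> Tuple[int, ...]:
        n = len(code) // 2  # a trailing odd byte is ignored
        return struct.unpack(">%dH" % n, code[: n * 2])


class IdentityUnicodeMapSketch:
    """Assumed behaviour: the CID is used directly as a Unicode code point."""

    def get_unichr(self, cid: int) -> str:
        return chr(cid)
```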
…ner#637)

* Attempt to handle decompression error on some broken PDF files

From time to time we come across files where no text is detected, while readers
like evince read the PDF fine. After some digging, it turned out this is because
the PDF includes some badly compressed data. This may be fixed by decompressing
byte by byte and ignoring the error on the last check bytes (arbitrarily found to
be the last 3).

This has been largely inspired by py-pdf/pypdf#422
and the test file has been taken from there, so credits to @zegrep.

* Use a warning instead of raising an exception

when a zlib error is detected before the trailing checksum.

* Add line to CHANGELOG.md

* Only try decompressing if not in strict mode

* Change error into warning because warnings.warn needs a subclass of Warning

Co-authored-by: Sylvain Thénault <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
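A sketch of the byte-by-byte fallback described above, using only `zlib.decompressobj`. `decompress_lenient` and its `tail` parameter are hypothetical names; tolerating errors in the last 3 bytes follows the arbitrary threshold from the commit message:

```python
import logging
import zlib

logger = logging.getLogger(__name__)


def decompress_lenient(data: bytes, tail: int = 3) -> bytes:
    """Inflate ``data``; if it is corrupt, keep whatever decodes before the error."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        pass
    d = zlib.decompressobj()
    out = bytearray()
    for i in range(len(data)):
        try:
            # Feed one byte at a time so all output produced before the
            # failing byte has already been collected.
            out += d.decompress(data[i : i + 1])
        except zlib.error:
            if i < len(data) - tail:
                # The error is not confined to the last `tail` bytes.
                logger.warning("Data loss while decompressing corrupted data")
            break
    return bytes(out)
```

A stream whose deflate data is intact but whose trailing checksum is damaged still yields its full content; errors earlier in the stream are logged and the partial output is returned.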
* array.array.tostring -> array.array.tobytes

The tostring method has been deprecated since Python 3.2 and was
removed altogether in 3.9. In Python 3.2 the method was renamed
to "tobytes"

Will close pdfminer#641

* changelog entry

* test for tobytes

* Fix CHANGELOG.md

* Update CHANGELOG.md to PR that I can push on

* Simplify tests

Co-authored-by: Forest Gregg <[email protected]>
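A minimal example of the replacement API (standard library only):

```python
import array

row = array.array("B", [0xFF, 0x00, 0x80])
data = row.tobytes()  # Python 3.2+; array.tostring() was removed in Python 3.9

restored = array.array("B")
restored.frombytes(data)  # counterpart of the removed fromstring()
```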
pietermarsman and others added 29 commits February 2, 2022 22:24
…se to pytest (pdfminer#704)

* Replace tox with nox

* Replace travis with github actions

* Fix pytest, mypy and flake8 errors

* Add pytest.

* Run on all commits

* Remove nose

* Speedup slow tests to save GitHub actions minutes

* Added line to CHANGELOG.md

* Fix line too long in pdfdocument.py

* Update .github/workflows/actions.yml

Co-authored-by: Jake Stockwin <[email protected]>

* Improve actions.yml

* Fix error with nox name for mypy

* Add names for jobs

* Replace nose.raises with pytest.raises

Co-authored-by: Jake Stockwin <[email protected]>
* `log.info` changed to `log.debug` in six files

* Fix indentation

* Remove from CHANGELOG.md since no functionality has changed

Co-authored-by: Pedro Nunes <[email protected]>
Co-authored-by: Pieter Marsman <[email protected]>
* Check blackness in github actions

* Blacken code

* Update github action names

* Add contributing guidelines on using black

* Add to checklist for PR
* Raise specific warning if Pillow cannot be imported

* Improve error message

* Update docs

* Update CHANGELOG.md

* Update pdfminer/image.py

Co-authored-by: Jake Stockwin <[email protected]>

Co-authored-by: Jake Stockwin <[email protected]>
* Adding in checks for spurious lines that contain either only spaces or new line characters

* Added spurious lines check and unit tests

* Updated CHANGELOG.md with changes

* Simplify code

* Simplify code

* Simplify code

* Remove changes to lines that are not actually changed

* Format import

* Improve CHANGELOG.md

* Improve CHANGELOG.md

* Fix cicd

* Blacken

Co-authored-by: Pieter Marsman <[email protected]>
…r#727)

* Add github action for releasing to pypi if git tag is added.

* Checkout code and fix typos.

* Replace end with fi

* Strictly numeric version for testing.

* Remove obsolete Make commands for publishing

* Also create GitHub release

* Update pdfminer/__init__.py

Co-authored-by: Jake Stockwin <[email protected]>

* Remove test pypi release

* Use maintained github action for releasing

* Change tag format for versions

* Undo commenting pypi publishing

* Remove develop branch, since it will be removed in favor of adding tags for releases.

* Change version regex

Co-authored-by: Jake Stockwin <[email protected]>
* Convert fontname to str if it is bytes

* Add CHANGELOG.md
…#733)

* Raise KeyError when name in name2unicode is not of type str

* Add CHANGELOG.md
…ys set (pdfminer#732)

* Fix log.debug statement in lzw.py by ensuring that self.table is always set.

* Add CHANGELOG.md
* Log warning and continue gracefully if errors in cmap

* Fix nox testing

* Also log warning if cid range is larger than actual code

* Format with black

* Add docstring

* Add CHANGELOG.md

* Restore running cmapdb.py directly
pdfminer#737)

* Refactor ImageWriter and add method for exporting an image from bytes.

E.g. when FlateDecode just results in a list of RGB bytes.

* Added docstrings

* Add CHANGELOG.md

* Run black

* Run black
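When a filter already yields raw RGB samples, writing them out can be as simple as prefixing an image header. A sketch that swaps in the binary PPM (P6) format to avoid a Pillow dependency; `rgb_to_ppm` is hypothetical and assumes 8 bits per channel, row-major samples, and no row padding:

```python
def rgb_to_ppm(width: int, height: int, rgb: bytes) -> bytes:
    """Wrap raw 8-bit RGB samples in a binary PPM (P6) header."""
    expected = width * height * 3
    if len(rgb) != expected:
        raise ValueError(
            "expected %d bytes of RGB data, got %d" % (expected, len(rgb))
        )
    return b"P6\n%d %d\n255\n" % (width, height) + rgb
```

The resulting bytes can be written to a `.ppm` file, which many image viewers open directly.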
* Use charset-normalizer instead of chardet

* Ignore charset_normalizer type stub

* Add CHANGELOG.md
* Ignore path constructors that do not begin with `m`

Per PDF Reference Section 4.4.1, "path construction operators may be
invoked in any sequence, but the first one invoked must be m or re to
begin a new subpath." Since pdfminer.six already converts all `re`
(rectangle) operators to their equivalent `mlllh` representation, paths
ingested by `.paint_path(...)` that do not begin with the `m` operator
are invalid.

In addition to the advantage of hewing to the PDF Reference, this change
also avoids the `ValueError: not enough values to unpack (expected 2,
got 1)` error raised by the ` pts = [apply_matrix_pt(self.ctm, pt) for
pt in raw_pts]` line in `converter.py` when parsing PDFs that
(erroneously) include `("h",)` paths.

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
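A sketch of the rule quoted above, assuming a path is a list of operator tuples and that `re` has already been rewritten as `mlllh`. `valid_subpaths` is a hypothetical helper that drops any segments appearing before the first `m`:

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple  # e.g. ("m", 0, 0), ("l", 1, 1), ("h",)


def valid_subpaths(path: Sequence[Segment]) -> List[List[Segment]]:
    """Split a path at each 'm' and drop segments that precede the first 'm'."""
    subpaths: List[List[Segment]] = []
    current: Optional[List[Segment]] = None
    for seg in path:
        if seg[0] == "m":
            if current:
                subpaths.append(current)
            current = [seg]
        elif current is not None:
            current.append(seg)
        # else: an operator before any 'm' is invalid and is skipped
    if current:
        subpaths.append(current)
    return subpaths
```

A bare `("h",)` path then yields no subpaths at all, instead of reaching point conversion with too few coordinates.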
Using an upper bound for dependency versions in a library
is a source of trouble for users.
Let's not do it, as it makes pdfminer wreak havoc downstream.

Signed-off-by: Philippe Ombredanne <[email protected]>
* Fix Sphinx warnings

howto/acro_forms.rst:4: WARNING: Title underline too short.
howto/acro_forms.rst:81: WARNING: Bullet list ends without a blank line; unexpected unindent.
howto/acro_forms.rst:88: WARNING: Bullet list ends without a blank line; unexpected unindent.
howto/acro_forms.rst:122: WARNING: Bullet list ends without a blank line; unexpected unindent.
tutorial/extract_pages.rst:6: WARNING: Failed to create a cross reference. A title or caption not found: api_extract_pages

* Fix documenting pdf2txt.py

reference/commandline.rst:12: ERROR: Module "tools.pdf2txt" has no attribute "maketheparser"
Incorrect argparse :module: or :func: values?

* Add CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
…t documented. Also deprecate usage of scripts that are only there for testing purposes. (pdfminer#756)

* Deprecate usage of `if __name__ == "__main__"` in scripts that are not documented. Also deprecate usage of scripts that are only there for testing purposes.

* Add CHANGELOG.md

* Cleanup CHANGELOG.md

* Cleanup CHANGELOG.md

* Undo deleting conf_glyphlist.py and conf_afm.py and add a deprecation warning instead
* Issue pdfminer#720

Use resolve1 when getting the default width.

* Add CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
…. (pdfminer#774)

* Fix crash with unencrypted metadata values (pdfminer#766).

* Explicitly check for length

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
…#768)

* Ignore null characters in PSBaseParser

Previously, null characters were tokenized as PSKeyword tokens. This caused
issue pdfminer#617: pdfdevice.py would attempt to decode the null-character
PSKeyword where it expects a byte string, causing pdfminer.six to crash.

As null characters are superfluous within PSBaseParser, ignore them.

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
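ISO 32000-1 (Table 1) lists NUL (0x00) among the PDF white-space characters, so treating it as a separator is one way to ignore it. A minimal sketch, assuming a hypothetical `split_on_whitespace` tokenizer that does not handle delimiter characters such as `/`, `[` or `(`:

```python
from typing import List

# PDF white-space characters per ISO 32000-1, Table 1: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def split_on_whitespace(data: bytes) -> List[bytes]:
    """Split content-stream bytes on PDF white space, so NUL never becomes a token."""
    tokens: List[bytes] = []
    current = bytearray()
    for byte in data:  # iterating over bytes yields ints
        if byte in PDF_WHITESPACE:
            if current:
                tokens.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens
```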
* Install typing_extensions on Python 3.6 and 3.7

* Add CHANGELOG.md

* Black setup.py
* Run black locally with nox

* Update contributor instructions

* Fix workflow
…o pdfminer-master

# Conflicts:
#	pdfminer/fontmetrics.py
josiahkhor merged commit 778c8c8 into master on Jul 22, 2022