This repository was archived by the owner on Nov 3, 2023. It is now read-only.

Can't read text in parsed PDFs (weird encoding) #57

@AndreaCogliati

I'm using ILPDFKit to extract text from PDF files generated by another iOS app, and I'm running into problems with the text encoding in certain files. I convert the Contents stream of an ILPDFPage into a string, then look for BT / ET pairs to extract the text.

For instance, one file contains the following text stream:

BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET

From this I can easily extract the string Last Name.
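For reference, the extraction step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the ILPDFKit code actually used; the function names `parse_literal_string` and `extract_text_runs` are hypothetical, and the sketch assumes the content stream is already decompressed and contains only literal `( ... )` strings (hex strings and fonts are ignored).

```python
import re

def parse_literal_string(data, start):
    """Parse a PDF literal string beginning at data[start] == '('.

    Returns (decoded_text, index_after_closing_paren).
    """
    simple = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}
    out = []
    depth = 1
    i = start + 1
    while i < len(data):
        ch = data[i]
        if ch == "\\":
            if i + 1 >= len(data):
                break
            nxt = data[i + 1]
            if nxt in simple:
                out.append(simple[nxt])
                i += 2
            elif nxt in "01234567":
                # Octal escape: one to three digits.
                j, digits = i + 1, ""
                while j < len(data) and len(digits) < 3 and data[j] in "01234567":
                    digits += data[j]
                    j += 1
                out.append(chr(int(digits, 8) & 0xFF))
                i = j
            elif nxt == "\n":
                i += 2  # backslash-newline is a line continuation
            else:
                out.append(nxt)  # \( \) \\ and unknown escapes
                i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated literal string")

def extract_text_runs(content):
    """Return the concatenated string operands of each BT ... ET block.

    Simplification: a regex finds the blocks, so a string that itself
    contains the token ET would confuse it.
    """
    runs = []
    for block in re.findall(r"\bBT\b(.*?)\bET\b", content, flags=re.DOTALL):
        parts, i = [], 0
        while i < len(block):
            if block[i] == "(":
                text, i = parse_literal_string(block, i)
                parts.append(text)
            else:
                i += 1
        runs.append("".join(parts))
    return runs

sample = "BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET"
print(extract_text_runs(sample))
```

Note that this returns the raw character codes inside the strings. Those happen to read as text only when the font's encoding coincides with ASCII.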

In another file (which has the same general format as the previous file, and which renders correctly on screen), I see the following string instead:

BT 0.03260000 Tc 7 0 0 7 0 0 Tm /TT2 1 Tf [ (!\"#) 4 ($%&\"\'\\() ] TJ ET

Why do I see those weird characters instead of the text Last Name? What am I doing wrong?

The only apparent difference between the two files is that one was created on iOS 9 and the other on iOS 10.
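One observation that may help narrow this down (a reading of the two samples above, not something confirmed in this thread): the two streams also select different fonts (/Tc1 versus /TT2), and the bytes in the second stream line up one-to-one with the characters of Last Name. That pattern is consistent with font-specific character codes rather than ASCII, in which case the real table would live in the /TT2 font dictionary (its Encoding or ToUnicode entry), not in the content stream. The sketch below only infers a table from the single known string. It assumes the `\"`, `\'` and `\\` in the pasted line are escaping added when the string was printed, so that the underlying PDF strings are `(!"#)` and `($%&"'\()`. The names `derive_mapping` and `decode` are hypothetical.

```python
# Character codes from the second sample, after undoing the assumed
# print-time escaping and the PDF escape \( -> (
raw_codes = "!\"#" + "$%&\"'("
known_text = "Last Name"

def derive_mapping(codes, text):
    """Build a code -> character table from one aligned sample.

    Raises ValueError if a code would have to map to two different
    characters, i.e. if the substitution hypothesis does not hold.
    """
    if len(codes) != len(text):
        raise ValueError("samples are not the same length")
    mapping = {}
    for code, char in zip(codes, text):
        if mapping.setdefault(code, char) != char:
            raise ValueError(f"code {code!r} maps inconsistently")
    return mapping

def decode(codes, mapping):
    """Translate codes through the table; unknown codes become U+FFFD."""
    return "".join(mapping.get(c, "\ufffd") for c in codes)

mapping = derive_mapping(raw_codes, known_text)
for code, char in mapping.items():
    print(f"0x{ord(code):02X} -> {char!r}")
print(decode(raw_codes, mapping))
```

A table inferred this way covers only the characters that appear in the known sample, so it is a diagnostic aid rather than a decoder. Reading the font's own mapping is the general route.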
