Can't read text in parsed PDFs (weird encoding) #57

I'm using ILPDFKit to extract text from PDF files generated by another iOS app, and I'm running into encoding problems with the text in certain files. I convert the `Contents` stream of an ILPDFPage into a string, then look for BT / ET pairs to extract the text.

For instance, one file contains the following text stream:

`BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET`

from which I can easily extract the string `Last Name`.
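For reference, the extraction step can be sketched roughly as follows. This is a minimal Python illustration of the BT / ET approach described above, not the actual app code (which works on the ILPDFKit stream in Objective-C); the function names `extract_text_runs` and `unescape_pdf_string` are made up for the example. It assumes literal `( ... )` strings without nested unescaped parentheses, and that no string contains the word `ET`.

```python
import re

# Single-character escapes allowed inside PDF literal strings.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}


def unescape_pdf_string(raw: str) -> str:
    """Resolve backslash escapes inside a PDF literal string (hypothetical helper)."""
    def repl(match: re.Match) -> str:
        token = match.group(1)
        if token[0] in "01234567":      # octal character code, e.g. \050
            return chr(int(token, 8) & 0xFF)
        if token == "\n":               # backslash + newline is a line continuation
            return ""
        return _ESCAPES.get(token, token)  # \( \) \\ and unknown escapes keep the char
    return re.sub(r"\\([0-7]{1,3}|.)", repl, raw, flags=re.S)


def extract_text_runs(content: str) -> list[str]:
    """Return the concatenated string operands of each BT ... ET block."""
    runs = []
    for block in re.findall(r"\bBT\b(.*?)\bET\b", content, flags=re.S):
        # A literal string is parenthesised; a backslash escapes the next character.
        literals = re.findall(r"\((?:\\.|[^\\()])*\)", block, flags=re.S)
        runs.append("".join(unescape_pdf_string(s[1:-1]) for s in literals))
    return runs


stream = "BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET"
print(extract_text_runs(stream))  # ['Last Name']
```

The numbers between the strings in the `TJ` array (the `4` here) are kerning adjustments, so the sketch simply skips them and joins the string pieces.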

In another file (which has the same general format as the previous one and renders correctly on screen), I see the following string instead:

`BT 0.03260000 Tc 7 0 0 7 0 0 Tm /TT2 1 Tf [ (!\"#) 4 ($%&\"\'\\() ] TJ ET`

Why do I see those weird characters instead of the text `Last Name`? What am I doing wrong?

The only apparent difference between the two files is that one was created on iOS 9 and the other on iOS 10.
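One observation that may help narrow this down: the odd bytes do not look random. Assuming the backslashes before `"` and `'` are just escaping from how the string was printed, and that `\(` is the PDF escape for a literal `(`, the raw codes are `!"#` followed by `$%&"'(`. That is exactly what `Last Name` becomes if each distinct character is given the next code, starting at 0x21, in order of first use. The Python sketch below checks this; the function names are hypothetical and it only tests the hypothesis, it is not a general decoder.

```python
def first_use_codes(text: str, start: int = 0x21) -> str:
    """Give each distinct character the next code, in order of first appearance."""
    table: dict[str, str] = {}
    encoded = []
    for ch in text:
        if ch not in table:
            table[ch] = chr(start + len(table))
        encoded.append(table[ch])
    return "".join(encoded)


def build_decode_table(observed: str, expected: str) -> dict[str, str]:
    """Map observed codes to expected characters, failing on any inconsistency."""
    if len(observed) != len(expected):
        raise ValueError("length mismatch")
    table: dict[str, str] = {}
    for code, ch in zip(observed, expected):
        if table.setdefault(code, ch) != ch:
            raise ValueError(f"code {code!r} maps to two different characters")
    return table


# Raw codes from the second stream, under the escaping assumption stated above.
OBSERVED = '!"#' + '$%&"\'('
EXPECTED = "Last Name"

print(first_use_codes(EXPECTED) == OBSERVED)  # True
table = build_decode_table(OBSERVED, EXPECTED)
print("".join(table[c] for c in OBSERVED))    # Last Name
```

If this reading is right, the font `/TT2` would be using its own code assignment rather than a standard encoding, which would be consistent with an embedded subset font. In PDFs the mapping from such codes back to Unicode normally lives in the font dictionary's `ToUnicode` CMap, when one is present; whether ILPDFKit exposes that is something I have not confirmed.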
