Convert dangerous PDF files to clean PDF files #27

Open
bofrese opened this issue Sep 6, 2020 · 1 comment

Comments

@bofrese
Contributor

bofrese commented Sep 6, 2020

The detection of whether PDF files are dangerous may produce too many false positives, as it simply checks whether the PDF has an /OpenAction entry.

It should be possible to convert a PDF to a clean PDF via pdf2ps and back...
pdf2ps input.pdf - | ps2pdf - output.pdf

Of course, we only need to do this if the PDF has been detected as dangerous.
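
A rough sketch of how that "only convert when flagged" flow could look in Python. Everything here is an assumption for illustration: the function names (`has_open_action`, `clean_pdf`, `sanitize_if_dangerous`) are hypothetical and not taken from the project, the detection is a naive byte scan rather than whatever check the project actually uses, and the conversion just shells out to the same `pdf2ps` / `ps2pdf` pair as the one-liner above (both must be on PATH).

```python
import shutil
import subprocess
from pathlib import Path


def has_open_action(pdf_bytes: bytes) -> bool:
    """Naive check: does the raw file contain an /OpenAction key?

    Heuristic only. It misses keys inside compressed object streams
    or keys written with #xx name escapes.
    """
    return b"/OpenAction" in pdf_bytes


def clean_pdf(src: Path, dst: Path, timeout: int = 120) -> bool:
    """Run the equivalent of `pdf2ps src - | ps2pdf - dst`.

    Returns True on success, False if a tool is missing, fails, or times out.
    The intermediate PostScript is held in memory, which is fine for a
    sketch but may be too much for very large files.
    """
    if shutil.which("pdf2ps") is None or shutil.which("ps2pdf") is None:
        return False
    try:
        to_ps = subprocess.run(
            ["pdf2ps", str(src), "-"], capture_output=True, timeout=timeout
        )
        if to_ps.returncode != 0:
            return False
        to_pdf = subprocess.run(
            ["ps2pdf", "-", str(dst)],
            input=to_ps.stdout,
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return to_pdf.returncode == 0 and dst.exists()


def sanitize_if_dangerous(src: Path, dst: Path) -> str:
    """Copy unflagged PDFs as-is; round-trip flagged ones through PostScript."""
    if not has_open_action(src.read_bytes()):
        shutil.copyfile(src, dst)
        return "copied"
    return "cleaned" if clean_pdf(src, dst) else "failed"
```

Returning "failed" instead of raising leaves it to the caller to decide what to do with a flagged file that could not be converted, for example keep treating it as dangerous.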

I got the idea for how to clean PDFs from here: https://security.stackexchange.com/questions/171716/how-to-know-if-a-pdf-file-is-infected

I currently do not have the time to do this myself, so I leave the note here in the hope that somebody else will have the time :-)

@Rafiot
Member

Rafiot commented Sep 6, 2020

So I tried things similar to that, but it comes with a bunch of issues:

  • it simply fails on a lot of legitimate PDFs
  • it makes it impossible to copy text out of the PDFs, which is often a requirement

In theory, turning a PDF into PDF/A should/could remove active content from the file, making it saner. The problem is that PDF to PDF/A converters are not exactly working either (or at least they weren't the last time I checked).
