This repository has been archived by the owner on Dec 18, 2019. It is now read-only.
Jerome Flesch edited this page Jan 13, 2015 · 65 revisions

How to scan a lot of documents without waiting for the page orientation detection and the OCR?

  1. In the settings dialog, disable the OCR (OCR Lang: disabled)
  2. Scan all your documents
  3. Page orientation detection will be disabled (it depends on the OCR), so you may have to fix the orientation manually
  4. Application Menu (or menu "Document" in 0.1) -> "Advanced" -> "Redo OCR on all the documents"

How to import many PDF documents in one shot?

  1. Create a directory
  2. Put all your PDFs in it
  3. In Paperwork, in the sub-menu next to the "Print" button -> "Import file(s)"
  4. Select the directory containing all the PDFs
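
If your PDFs are scattered across several folders, steps 1 and 2 can be scripted. The following is only a minimal sketch and is not part of Paperwork: the function name `collect_pdfs` and both directory arguments are hypothetical. It copies (does not move) every PDF found under a source folder into one flat directory, renaming on name clashes.

```python
import shutil
from pathlib import Path


def collect_pdfs(source_dir, import_dir):
    """Copy every PDF found under source_dir into the flat directory import_dir.

    Hypothetical helper, not part of Paperwork. Returns the list of copies made.
    """
    source = Path(source_dir).resolve()
    target = Path(import_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    # sorted() lists all candidates before anything is copied, so files
    # created by this loop are never picked up again.
    for pdf in sorted(source.rglob("*")):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        if target in pdf.parents:
            # Already inside the import directory: nothing to do.
            continue
        destination = target / pdf.name
        counter = 1
        # Avoid overwriting a file with the same name from another folder.
        while destination.exists():
            destination = target / f"{pdf.stem}_{counter}{pdf.suffix}"
            counter += 1
        shutil.copy2(pdf, destination)
        copied.append(destination)
    return copied
```

For example, `collect_pdfs("/path/to/scans", "/path/to/to_import")` (both paths are placeholders) would fill the directory that you then select in step 4.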

Where are Paperwork files located?

By default:

  • Configuration : ~/.config/paperwork.conf
  • Index : ~/.local/share/paperwork
  • Documents : ~/papers

The index is always updated based on the documents. When Paperwork starts, it uses the modification time of each file to detect changes to the documents.
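
To illustrate the general idea of detecting changes from modification times, here is a minimal sketch. It is not Paperwork's actual code: the function names `snapshot_mtimes` and `diff_snapshots` are hypothetical, and how Paperwork stores the previous state is an assumption left out of the example.

```python
import os


def snapshot_mtimes(root):
    """Map each file under root (relative path) to its modification time in ns."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[os.path.relpath(path, root)] = os.stat(path).st_mtime_ns
    return mtimes


def diff_snapshots(old, new):
    """Compare two snapshots; return (added, removed, changed) sorted path lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A file present in both snapshots is "changed" if its mtime differs.
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

A program following this approach would keep the snapshot from its previous run, take a new one at startup, and re-index only the paths returned by `diff_snapshots`.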

How are the documents stored?

See the page describing the work directory organisation

How to uninstall Paperwork?

If you installed Paperwork manually:

sudo pip uninstall paperwork
sudo pip uninstall pyocr
sudo pip uninstall pyinsane

(on some systems, the command is named python-pip instead of pip)

Note that other dependencies are installed along with Paperwork. However, pip cannot automatically detect and remove unused dependencies. This is why you should use your distribution's package(s) if possible.

Why did you do X instead of Y?

Variant: Why haven't you implemented X yet?

Because the SABDFL (me) said so :-)

If you want something changed or improved, your options are:

  • To open a ticket. Try to explain how doing things your way would concretely benefit Paperwork, its users, or even better, the SABDFL. Please provide realistic examples and use cases. Even better: make me want to do it (see issue #356 for a very good example).
  • To send a patch. However, please also open a ticket first, so we can discuss the numerous use cases you're going to break.

Let's be honest: I'm not going to do anything just because it looks better to you. I'm also not going to do anything just to satisfy a weird use case that only concerns you. So please do your best to be convincing without being annoying :-)

Also, please keep in mind that I'm doing this in my free time. In other words, I have a very limited amount of time I can spend on Paperwork. So weird or crazy (but valid) features may be delayed from version to version until the end of time.

Why can't X be configured?

Because if we added all the options everyone wants, the settings dialog would look like the space shuttle's control panel. I'm not going to design a crazy GUI like Eclipse's.

However, in the future, there may be hidden settings in the configuration file to accommodate weird requirements.

How can I get statistics regarding my documents?

Statistics are fun. Unfortunately, they are not really helpful here, so there is nothing in the GUI to display them. However, there is a script:

$ git clone https://github.com/jflesch/paperwork.git
$ cd paperwork
$ scripts/stats.py
(...)
Statistics
==========
Total number of documents: 965
Total number of pages: 1801
Total number of words: 1875857
Total words len: 8702511
Total number of unique words: 56680
===
Maximum number of pages in one document: 75
Maximum word length: 179
Average word length: 4.639219
Average number of words per page: 1041.564131
Average number of words per document: 1943.893264
Average number of pages per document: 1.866321
Average number of unique words per document: 724.718135
Average accuracy of label prediction (global): 95%
Average accuracy of label prediction (positive): 64%
Average accuracy of label prediction (negative): 97%
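
Several of the averages above are plain ratios of the totals printed in the same output. As a quick worked check (a sketch written for this page, not taken from scripts/stats.py; the dictionary keys and the function name `averages` are hypothetical):

```python
# Totals copied from the sample output above.
totals = {
    "documents": 965,
    "pages": 1801,
    "words": 1875857,
    "word_chars": 8702511,  # "Total words len"
}


def averages(t):
    """Derive the ratio-based averages from the totals."""
    return {
        "word_length": t["word_chars"] / t["words"],
        "words_per_page": t["words"] / t["pages"],
        "words_per_document": t["words"] / t["documents"],
        "pages_per_document": t["pages"] / t["documents"],
    }


for name, value in averages(totals).items():
    print("%s: %.6f" % (name, value))
```

The printed values should agree with the corresponding "Average ..." lines of the sample output. The average number of unique words per document and the label prediction accuracies cannot be derived from these totals alone, so they are not reproduced here.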