@pdfliberation

PDF Liberation

A commons for the work of liberating data from PDF files

Popular repositories

  1. knowledge Public

    A place to collect and share knowledge about liberating data from PDFs

    Shell 53 7

  2. whatwordwhere Public

    Forked from jsfenfen/whatwordwhere

    Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the hOCR standard.

    Python 22 5

  3. pdf-hackathon Public

    Resources related to the PDF Liberation hackathon

    12 11

  4. pdf_table_extraction Public

    Experimenting with pdf2text and Python pdf-table-extract

    JavaScript 11 3

  5. Jersey-City-Budget-PDF-Liberation Public

    This project will liberate data from PDF files found at http://www.cityofjerseycity.com/pub-info.aspx?id=2430 and create .csv and .json files to be uploaded to https://data.openjerseycity.org/…

    Python 6 1

  6. financial_disclosure_scraping Public

    (DC team) experimenting with available options for extracting info from PFDs

    Python 4 2

Repositories

Showing 10 of 20 repositories
  • knowledge Public

    A place to collect and share knowledge about liberating data from PDFs

    Shell 53 Unlicense 7 1 1 Updated Jan 30, 2022
  • python-hocrgeo Public

    Python tool for converting hOCR files to geographic file formats

    Python 4 BSD-3-Clause 0 3 0 Updated Aug 14, 2014
  • USAID-DEC Public Forked from dbarlett/USAID-DEC

    Data from the United States Agency for International Development (USAID) Development Experience Clearinghouse (DEC).

    1 6 0 0 Updated Apr 7, 2014
  • python-popplergeo Public

    Package to convert pdftotext bbox XHTML output to GeoJSON

    1 MIT 0 3 0 Updated Feb 23, 2014
  • whatwordwhere Public Forked from jsfenfen/whatwordwhere

    Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the hOCR standard.

    Python 22 16 0 0 Updated Feb 23, 2014
  • OCRToolkit Public Forked from opensecrets/OCRToolkit

    Tools for working with Optical Character Recognition output

    Python 4 BSD-3-Clause 6 0 0 Updated Feb 17, 2014
  • amnestydata Public

    Amnesty International Torture data

    Java 3 0 0 0 Updated Feb 9, 2014
  • Jersey-City-Budget-PDF-Liberation Public

    This project will liberate data from PDF files found at http://www.cityofjerseycity.com/pub-info.aspx?id=2430 and create .csv and .json files to be uploaded to https://data.openjerseycity.org/dataset/jersey-city-2013-budget-adopted-spending

    Python 6 1 0 0 Updated Jan 25, 2014
  • pdfliberation.github.io Public

    Homepage for this organization

    CSS 2 0 0 0 Updated Jan 24, 2014
  • NYCEDCprosedatascraper Public

    This uses regular expressions (in PHP, but any language would work) to get data from the NYC EDC newsletters

    PHP 1 MIT 0 0 0 Updated Jan 22, 2014
