DDEx Project

Extracting data independently of file formats

DDEx (Document Data Extractor) is a framework that lets applications transparently open documents and extract their content, regardless of file format.

We are working to provide support for:

OLE2 file formats [.doc, .xls, .ppt]
OOXML file formats [.docx, .xlsx, .pptx]
ODF file formats [.odt, .ods, .odp]
CSV
PDF
Google Docs (minimal support)

Goal, Challenges, Differentials

DDEx is based on the Builder design pattern and can be easily extended to support other formats. It aims to decouple content extraction from content processing: DDEx handles the diversity of file formats, and applications access a document's content independently of its format.
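To make the Builder idea concrete, here is a minimal sketch of how extraction can be kept apart from processing. It is not DDEx's actual API: `ContentBuilder`, `Extractor`, `ToyMarkupExtractor`, and `OutlineBuilder` are hypothetical names, and the toy "format" (lines starting with `# ` are headings) only stands in for a real parser.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderSketch {

    /** Hypothetical builder: receives content events and knows nothing about file formats. */
    interface ContentBuilder {
        void heading(String text);
        void paragraph(String text);
    }

    /** Hypothetical director: understands one format and drives any builder. */
    interface Extractor {
        void extract(String rawDocument, ContentBuilder builder);
    }

    /** Toy stand-in for a real format parser: lines starting with "# " are headings. */
    static class ToyMarkupExtractor implements Extractor {
        @Override
        public void extract(String rawDocument, ContentBuilder builder) {
            for (String line : rawDocument.split("\n")) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                if (trimmed.startsWith("# ")) {
                    builder.heading(trimmed.substring(2).trim());
                } else {
                    builder.paragraph(trimmed);
                }
            }
        }
    }

    /** One possible product: a flat list of labelled content items. */
    static class OutlineBuilder implements ContentBuilder {
        private final List<String> items = new ArrayList<>();

        @Override
        public void heading(String text) {
            items.add("H:" + text);
        }

        @Override
        public void paragraph(String text) {
            items.add("P:" + text);
        }

        List<String> result() {
            return items;
        }
    }

    /** Wires a director to a builder; swapping either side leaves the other untouched. */
    static List<String> outlineOf(String rawDocument) {
        OutlineBuilder builder = new OutlineBuilder();
        new ToyMarkupExtractor().extract(rawDocument, builder);
        return builder.result();
    }

    public static void main(String[] args) {
        System.out.println(outlineOf("# Title\nFirst paragraph.\n\nSecond paragraph."));
    }
}
```

In this arrangement, supporting a new format means adding another `Extractor`, while a new way of consuming content means adding another `ContentBuilder`; neither change touches the other side.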

DDEx sits between multiple APIs (such as Apache POI and ODFDOM) and offers a common interface over them. It encapsulates the format-specific extraction, so applications can use a document's content in other contexts without depending on the format.
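The common-interface idea can be sketched as a small lookup that hands the application one adapter type no matter which format family a file belongs to. This is an illustration under assumptions, not DDEx's real code: `FormatAdapter` and `adapterFor` are hypothetical names, selecting by file extension is an assumed strategy, and the adapters here only report their family instead of delegating to Apache POI or ODFDOM.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FacadeSketch {

    /** Hypothetical common interface; a real adapter would wrap one third-party API. */
    interface FormatAdapter {
        String family();
    }

    private static final Map<String, FormatAdapter> BY_EXTENSION = new HashMap<>();

    /** Registers one adapter for every extension of a format family. */
    private static void register(String family, String... extensions) {
        FormatAdapter adapter = () -> family;
        for (String ext : extensions) {
            BY_EXTENSION.put(ext, adapter);
        }
    }

    static {
        // Format families and extensions as listed in this README.
        register("OLE2", "doc", "xls", "ppt");
        register("OOXML", "docx", "xlsx", "pptx");
        register("ODF", "odt", "ods", "odp");
        register("CSV", "csv");
        register("PDF", "pdf");
    }

    /** Picks the adapter from the file extension (an assumed selection strategy). */
    static FormatAdapter adapterFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        FormatAdapter adapter = BY_EXTENSION.get(ext);
        if (adapter == null) {
            throw new IllegalArgumentException("Unsupported format: " + fileName);
        }
        return adapter;
    }

    public static void main(String[] args) {
        System.out.println(adapterFor("report.DOCX").family());
        System.out.println(adapterFor("budget.ods").family());
    }
}
```

The calling code only ever sees `FormatAdapter`, so the choice of underlying library stays an internal detail of each adapter.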

Who is using DDEx?

DDEx was born in academia and ended up being used by other Ph.D. and MSc students in their research. DDEx is also used by other projects and is associated with academic publications, such as:

Source repository: matheusmota/ddex
