Become a sponsor to Adrien Barbaresi
Hi there! 👋
As the creator of several popular open-source projects, I rely on the support of sponsors to keep improving and expanding them for the benefit of everyone.
My current focus:
- Developing software solutions and resources (Trafilatura, Simplemma, Courlan)
- Actively maintaining packages (Htmldate, Py3langid)
- Publishing a list of natural language processing resources for German (German-NLP)
By supporting me, you help maintain and enhance popular packages with millions of downloads, ensuring their growth, robustness and accessibility for R&D teams, IT professionals, and specialists worldwide, such as those training large language models.
Featured work
- adbar/trafilatura
  Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
  Python 3,715
- adbar/German-NLP
  Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
- adbar/courlan
  Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
  Python 127
- adbar/htmldate
  Fast and robust date extraction from web pages, with Python or on the command-line
  Python 122
- adbar/simplemma
  Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
  Python 146
- adbar/py3langid
  Faster, modernized fork of the language identification tool langid.py
  Python 49