This web scraper extracts Odia-language data from websites and collects it for further analysis and processing. It is written in Python and uses several libraries to fetch, parse, and store the extracted data.
- Extract data from multiple websites by providing a list of URLs or using a sitemap.
- Handle different types of documents, including PDF, TXT, and DOCX.
- Export the extracted data in various formats, such as JSONL (JSON Lines) or text files (.TXT), for easy storage and analysis.
- Handle errors gracefully and provide informative messages in case of unsuccessful extractions.
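
The features above can be illustrated with a minimal sketch that takes a list of URLs, writes one JSON object per line (JSONL), and records failures instead of aborting. This is not the project's actual implementation: the names `fetch_text` and `scrape_to_jsonl` and the `url`/`text` record fields are assumptions made for illustration, and only the Python standard library is used.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as UTF-8 (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # errors="replace" keeps the run going if a page has stray bytes.
        return response.read().decode("utf-8", errors="replace")


def scrape_to_jsonl(
    urls: Iterable[str],
    out_path: str,
    fetch: Callable[[str], str] = fetch_text,
) -> Dict[str, str]:
    """Write one JSON record per successfully fetched URL.

    Returns a mapping of failed URL -> error message, so the caller can
    report unsuccessful extractions without losing the successful ones.
    """
    failures: Dict[str, str] = {}
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            try:
                text = fetch(url)
            except (OSError, ValueError) as exc:
                # URLError and HTTPError are subclasses of OSError;
                # ValueError covers malformed URLs.
                failures[url] = str(exc)
                continue
            record = {"url": url, "text": text}
            # ensure_ascii=False stores Odia script as readable UTF-8
            # rather than \uXXXX escape sequences.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return failures
```

Calling `scrape_to_jsonl(["https://example.com"], "output.jsonl")` would write the results to `output.jsonl` and return a dictionary of any URLs that could not be fetched. Passing the fetch function as a parameter is a design choice for this sketch: it lets the export and error-handling logic be exercised without network access.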
This web scraper is inspired by and built upon various open-source libraries and tutorials available on the web. We thank the contributors of those projects for their valuable work.
For any issues, suggestions, or contributions, please contact OdiagenAI at [email protected]. Feel free to submit bug reports or feature requests on the repository's issue tracker.
Happy scraping!