The Organization Information Fetcher is a Python-based application designed to gather, clean, and store detailed information about various organizations. It uses web crawling and structured data extraction techniques to compile comprehensive company profiles.
- Web Crawling: Automatically searches the web for company information.
- Data Cleaning: Ensures the gathered data is structured and complete.
- Data Storage: Saves the cleaned data into CSV files for further analysis.
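The cleaning and storage steps listed above could look roughly like the following sketch. It is only an illustration under stated assumptions: the field names (`name`, `website`, `industry`) and the helpers `clean_record` and `write_csv` are hypothetical and are not taken from the project's code.

```python
import csv
import io

# Hypothetical field names; the project's real schema may differ.
FIELDS = ["name", "website", "industry"]


def clean_record(raw):
    """Trim whitespace and replace missing or None values with an empty string."""
    return {field: str(raw.get(field) or "").strip() for field in FIELDS}


def write_csv(records, handle):
    """Write cleaned records to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(clean_record(record))


if __name__ == "__main__":
    # Write to an in-memory buffer so the example leaves no files behind.
    buffer = io.StringIO()
    write_csv([{"name": "  Acme Corp ", "website": None}], buffer)
    print(buffer.getvalue())
```

To write to disk instead, pass a handle opened with `open("organizations.csv", "w", newline="", encoding="utf-8")`; the file name here is likewise only an example.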
- Clone the repository:
    git clone https://github.com/SamlRx/organization_information_fetcher.git
    cd organization_information_fetcher
- Install uv if not already installed:
    pip install uv
- Create a virtual environment:
    uv venv
- Run the application:
    python src/organization_information_fetcher_app/main.py
- To run tests:
    pytest
- Create a .env file in the root directory of the project.
- Add the following environment variable to the .env file:
    # .env
    MISTRAL_API_KEY=YOUR_MISTRAL_API_KEY
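How the application reads this variable is not described here. As a minimal sketch using only the standard library, the key could be looked up as shown below; the helpers `load_env_file` and `get_mistral_api_key` are hypothetical names, and the project may well rely on a library such as python-dotenv instead.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_mistral_api_key(path=".env"):
    """Return the key from the process environment, falling back to the .env file."""
    key = os.environ.get("MISTRAL_API_KEY") or load_env_file(path).get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set")
    return key
```

Checking the process environment first means a key exported in the shell or a CI system takes precedence over the local file.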
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
For any inquiries, please contact SamlRx.