This project is a web scraper that fetches property details from a website and uses RabbitMQ for message queuing. The project is structured using the SOLID principles and uses Puppeteer for web scraping.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Docker and Docker Compose
- Clone the repository:
git clone https://github.com/man0l/imot-scraper.git
cd imot-scraper
- Build the Docker images:
docker-compose build
- Run the migrations:
docker-compose run web_scraper_consumer npx sequelize-cli db:migrate --migrations-path ./src/migrations/ --models-path ./src/models/ --config ./src/config/db.json
- To start the RabbitMQ server, publisher, and consumer:
docker-compose up
The property_type_publisher.js script automatically publishes property type URLs to RabbitMQ, and the main.js script consumes those URLs and scrapes the property details.
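The publish/consume flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the queue name `property_type_urls`, the function names `publishUrls` and `consumeUrls`, and the `scrape` callback are all hypothetical. The `channel` argument is assumed to be an amqplib channel, or any object exposing the same methods.

```javascript
// Hypothetical queue name; the real project may use a different one.
const QUEUE = 'property_type_urls';

// Publisher side: enqueue each URL as a persistent message.
// `channel` is assumed to behave like an amqplib channel.
async function publishUrls(channel, urls, queue = QUEUE) {
  await channel.assertQueue(queue, { durable: true });
  for (const url of urls) {
    channel.sendToQueue(queue, Buffer.from(url), { persistent: true });
  }
  return urls.length;
}

// Consumer side: pass each URL to `scrape` (a hypothetical async function),
// acknowledge on success, and reject without requeueing on failure.
async function consumeUrls(channel, scrape, queue = QUEUE) {
  await channel.assertQueue(queue, { durable: true });
  await channel.prefetch(1); // handle one message at a time
  return channel.consume(queue, async (msg) => {
    if (msg === null) return; // the broker cancelled this consumer
    try {
      await scrape(msg.content.toString());
      channel.ack(msg);
    } catch (err) {
      channel.nack(msg, false, false); // allUpTo=false, requeue=false
    }
  });
}
```

With amqplib, a channel would typically be obtained by calling `connect(url)` and then `createChannel()` on the resulting connection. Whether failed URLs should be dropped, requeued, or routed to a dead-letter queue is a design choice that this sketch does not settle.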
You can view the logs for each service in the Docker Compose output.
You can access the RabbitMQ Management interface at http://host.docker.internal:15672. The default username and password are both guest.
You can also connect to the RabbitMQ server directly on the same host, using the AMQP port: host.docker.internal:5672
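As a rough illustration of how a client might build a connection string from the host, port, and default guest credentials mentioned above, the helper below assembles an AMQP URL. The function name `buildAmqpUrl` and its option names are hypothetical and not part of this project.

```javascript
// Hypothetical helper: builds an AMQP connection URL.
// Defaults mirror the host, port, and guest credentials given above.
function buildAmqpUrl({
  host = 'host.docker.internal',
  port = 5672,
  user = 'guest',
  password = 'guest',
} = {}) {
  // Percent-encode the credentials so special characters stay valid in a URL.
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `amqp://${auth}@${host}:${port}`;
}

console.log(buildAmqpUrl());
// prints "amqp://guest:guest@host.docker.internal:5672"
```

The resulting string can be passed to an AMQP client library's connect call. The guest credentials are only suitable for local development.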
- Docker - Containerization platform
- Node.js - JavaScript runtime
- Puppeteer - Headless browser for web scraping
- RabbitMQ - Open source message broker
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License. See the LICENSE.md file for details.