This GitHub repository provides an implementation of a web scraper that uses any pre-trained open-source LLM from Ollama.
The project is built with the help of the following tools:
- ✅ uv
- ✅ Unit tests
- ✅ CI: GitHub Actions
- ✅ Open-source pre-trained LLM from Ollama
- ✅ FastAPI
- ✅ Docker
Key features:
- ✅ Use an open-source LLM to parse car descriptions from any website (for further information, please see the documentation).
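
As a rough illustration of the parsing feature above, the sketch below builds a request body for Ollama's `POST /api/generate` endpoint and reads the model's JSON answer back out of a non-streaming reply. It is not the project's actual implementation: the field names in `CAR_FIELDS`, the prompt wording, and both helper functions are hypothetical and only assume that the scraper has already extracted plain text from a page.

```python
import json

# Hypothetical field names, chosen for illustration only;
# the project's real schema may differ.
CAR_FIELDS = ("make", "model", "year", "price")


def build_generate_payload(page_text: str, model: str = "gemma3:1b") -> dict:
    """Build a request body for Ollama's POST /api/generate endpoint."""
    prompt = (
        "Extract the car details from the text below. "
        f"Reply with a JSON object with the keys {', '.join(CAR_FIELDS)}. "
        "Use null for anything that is not stated.\n\n"
        f"Text:\n{page_text}"
    )
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",  # ask Ollama to constrain the output to valid JSON
        "stream": False,   # return one complete reply instead of chunks
    }


def parse_generate_response(body: dict) -> dict:
    """Pull the model's JSON answer out of a non-streaming /api/generate reply."""
    raw = json.loads(body["response"])  # the generated text lives in "response"
    if not isinstance(raw, dict):
        raw = {}  # the model returned valid JSON, but not an object
    # Keep only the expected keys and fill in missing ones with None.
    return {field: raw.get(field) for field in CAR_FIELDS}
```

The payload could then be sent with any HTTP client to `http://127.0.0.1:11434/api/generate`, and the decoded reply passed to `parse_generate_response`. A small model such as gemma3:1b may still return incomplete fields, which is why missing keys are filled with `None` here.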
Due to time and resource constraints, the following features are left as future work:
- ⬜ Integration tests
- ⬜ LLM inference on GPU(s) (I don't currently have a GPU machine or cluster available to explore this)
Interesting future directions:
- ⬜ Parse customer reviews and ratings to learn about demand and market trends
- ⬜ Parse promotions and incentives, which can help car dealers learn the marketing and sales strategies of competing dealers
To build and run the app, Docker is all you need! Please follow the steps below:
- Create a shared network for the 2 Docker containers to communicate
docker network create llm_scraper_host
- Start the Ollama container, published on host port 11434
docker run --network llm_scraper_host -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
- Run the Ollama model
docker exec -it ollama ollama run gemma3:1b
Since I am building this app using only a CPU, the lightweight model gemma3:1b was selected due to resource constraints.
- Build the Docker image for the LLM web scraper app as follows
docker build -t llm-scraper-app .
- Then run the container
docker run -p 8000:8000 -e LLM_MODEL_NAME=gemma3:1b -e OLLAMA_HOST=11434 --network llm_scraper_host llm-scraper-app
The app will be available at http://127.0.0.1:8000/.
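
Before sending requests to the app, it can be worth confirming that the Ollama container has actually stored the model. The sketch below is one possible check, not part of the project: it queries Ollama's `GET /api/tags` endpoint, which lists locally available models, and looks for the model name. The function names are hypothetical, and the base URL assumes the port mapping from the steps above.

```python
import json
import urllib.request


def model_is_available(tags: dict, model_name: str) -> bool:
    """Return True if model_name appears in an Ollama /api/tags reply."""
    return any(m.get("name") == model_name for m in tags.get("models", []))


def fetch_ollama_tags(base_url: str = "http://127.0.0.1:11434") -> dict:
    """GET /api/tags, which lists the models stored locally by Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With both containers running, `model_is_available(fetch_ollama_tags(), "gemma3:1b")` should return `True` once the model has been pulled. Keeping the lookup separate from the network call makes it easy to exercise with a canned reply.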
To learn about available features and how to use them, please see Features.md