ntgbaoo/Web-scraper-with-LLM

LLM for web scraper

Unit Tests · Linting · Docker Build · Mypy Type Checking

Overview

This GitHub repository provides an implementation of a web scraper that can use any pre-trained open-source LLM served by Ollama.

Key Features and Tooling

This project is built with the help of the following tools:

Key features:

  • ✅ Uses an open-source LLM to parse car descriptions from any website (for further information, please see the documentation here).
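To try LLM-based parsing of a car description by hand, you can send a request straight to the Ollama container. The script below is a hypothetical sketch, not this repository's implementation: it assumes port 11434 is published to the host as in the installation steps, and the prompt wording, the extracted fields and the `OLLAMA_URL` variable are illustrative assumptions. It targets Ollama's `/api/generate` endpoint, where `"format": "json"` asks the model for JSON output and `"stream": false` returns a single reply object instead of a stream.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: ask an Ollama server to turn a free-text car
# description into JSON. OLLAMA_URL, the prompt text and the field names
# are assumptions for illustration, not taken from this repository.

# Escape backslashes, double quotes, newlines and tabs so a string can be
# embedded inside a JSON string literal. Quoted replacements keep bash from
# treating backslashes or '&' specially.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# Print the JSON request body for Ollama's /api/generate endpoint.
# $1 = model name, $2 = car description.
build_parse_request() {
  local model=$1 description=$2
  local prompt="Extract make, model, year and price from this car description. Answer in JSON. Description: ${description}"
  printf '{"model":"%s","prompt":"%s","format":"json","stream":false}' \
    "$(json_escape "$model")" "$(json_escape "$prompt")"
}

# Send the request. The model's JSON answer is returned as a string in the
# "response" field of Ollama's reply.
parse_car_description() {
  curl -s "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_parse_request "${LLM_MODEL_NAME:-gemma3:1b}" "$1")"
}
```

After sourcing the script, run for example `parse_car_description '2019 Honda Civic LX, 42,000 miles, $17,500'`. A 1B model may not always return exactly the requested fields, so treat the result as best effort.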

Due to time and resource constraints, the following features are left as future work:

  • ⬜ Integration tests
  • ⬜ LLM inference on GPU(s) (since I don't have a GPU machine or cluster available to explore this at the moment)

Interesting future directions:

  • ⬜ Parse customer reviews and ratings to learn about demand and market trends
  • ⬜ Parse promotions and incentives, which can help a car dealer learn the marketing and sales strategies of competing dealers

Installation

To build and run the app, Docker is all you need! Please follow the steps below:

  1. Create a shared network so the two Docker containers can communicate:
docker network create llm_scraper_host
  2. Run the Ollama container on that network, publishing port 11434 and storing models in the ollama volume:
docker run --network llm_scraper_host -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  3. Pull and run the model inside the Ollama container (this opens an interactive prompt; type /bye to leave it):
docker exec -it ollama ollama run gemma3:1b

Since I am building this app with only a CPU, the lightweight gemma3:1b model is used due to resource constraints.

  4. From the repository root, build the LLM web scraper app Docker image:
docker build -t llm-scraper-app .
  5. Then run the app container on the same network:
docker run -p 8000:8000 -e LLM_MODEL_NAME=gemma3:1b -e OLLAMA_HOST=11434 --network llm_scraper_host llm-scraper-app

The app will be available at http://127.0.0.1:8000/.
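Before opening the app, it can help to confirm that both containers are ready. The following helper is a sketch under assumptions and is not part of this repository: it uses the host ports published by the commands above (11434 for Ollama, 8000 for the app) and Ollama's `GET /api/tags` endpoint, which lists pulled models by `name`; the function names `model_available`, `wait_for_url` and `check_stack` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical readiness check for the two containers started above.

# Succeed if the /api/tags JSON passed as $1 lists a model named $2.
# Assumes compact JSON such as {"models":[{"name":"gemma3:1b",...}]}.
model_available() {
  printf '%s' "$1" | grep -qF "\"name\":\"$2\""
}

# Poll a URL until it answers with any HTTP response, trying up to $2 times
# (default 30) with $3 seconds (default 2) between attempts.
wait_for_url() {
  local url=$1 tries=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= tries; i++)); do
    # Without -f, curl exits 0 for any HTTP status, non-zero if it cannot connect.
    curl -s -o /dev/null "$url" && return 0
    sleep "$delay"
  done
  return 1
}

# Check Ollama, the pulled model ($1, default gemma3:1b) and the app, in order.
check_stack() {
  local model=${1:-gemma3:1b}
  wait_for_url http://localhost:11434/api/tags \
    || { echo "Ollama is not reachable on port 11434" >&2; return 1; }
  model_available "$(curl -s http://localhost:11434/api/tags)" "$model" \
    || { echo "model $model has not been pulled yet" >&2; return 1; }
  wait_for_url http://127.0.0.1:8000/ \
    || { echo "app is not reachable on port 8000" >&2; return 1; }
  echo "Ollama, $model and the app are ready"
}
```

For example, save it as `check_stack.sh` (a hypothetical file name) and run `source check_stack.sh && check_stack gemma3:1b`. The app check only confirms that port 8000 answers HTTP, because the status code returned by `/` is not documented here.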

To learn about the available features and how to use them, please see Features.md.

About

Implementation of a web scraper with an LLM
