NodeJS TypeScript Playwright MongoDB Redis ESLint Docker

GitHub topics scraper with Playwright

A microservice that crawls and scrapes GitHub repositories for a specific topic (e.g. climatechange). This project is part of my final project for the University of Helsinki's Full Stack Open course.

The service is subscribed to a Redis pub/sub message channel and starts a new scraping process whenever a message is received.
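The subscribe-and-scrape flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the `PubSubClient` interface, the `ScrapeFn` type, the `parseTopic` and `startListening` functions, and the idea that each message carries a plain topic name are all hypothetical. The interface only models a subscribe call that takes a channel and a listener, so a real Redis client would need to be adapted to it.

```typescript
// Hypothetical minimal pub/sub client shape (an assumption, not the project's API).
interface PubSubClient {
  subscribe(channel: string, listener: (message: string) => void): Promise<void>;
}

// Hypothetical scrape function: receives a topic and resolves when the run is done.
type ScrapeFn = (topic: string) => Promise<void>;

// Normalise an incoming message into a topic name, or return null if it is unusable.
// Assumed topic format: lowercase letters, digits and hyphens, not starting with a hyphen.
function parseTopic(message: string): string | null {
  const topic = message.trim().toLowerCase();
  return /^[a-z0-9][a-z0-9-]*$/.test(topic) ? topic : null;
}

// Subscribe to a channel and start one scraping run per valid message.
async function startListening(
  client: PubSubClient,
  channel: string,
  scrape: ScrapeFn,
): Promise<void> {
  await client.subscribe(channel, (message) => {
    const topic = parseTopic(message);
    if (topic === null) {
      console.warn(`Ignoring invalid message: ${JSON.stringify(message)}`);
      return;
    }
    // Failures are logged so that one failed run does not stop the listener.
    scrape(topic).catch((err) => console.error(`Scrape of "${topic}" failed:`, err));
  });
}
```

Keeping the client behind a small interface means the message handling can be exercised with a fake client, without a running Redis instance.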

The microservice stores the results in a MongoDB Atlas database. The complete result is also written to local .json and .csv files.

For each repository found, the scraping process returns the following data:

  • owner
  • name
  • URL
  • number of stars
  • description
  • list of repository topics
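As an illustration of the fields listed above, one scraped repository could be modelled and serialised like this. It is a sketch only: the `RepoRecord` interface, its field names, the `toCsv` and `toJson` helpers, and the choice to join topics with a semicolon are assumptions, not the format the project actually writes.

```typescript
// Hypothetical record shape; field names are assumptions based on the list above.
interface RepoRecord {
  owner: string;
  name: string;
  url: string;
  stars: number;
  description: string;
  topics: string[];
}

// Quote a value for CSV: wrap it in double quotes and double any embedded quotes.
function csvField(value: string | number): string {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Serialise records to CSV text with a header row; topics are joined with ";".
function toCsv(records: RepoRecord[]): string {
  const header = ["owner", "name", "url", "stars", "description", "topics"];
  const rows = records.map((r) =>
    [r.owner, r.name, r.url, r.stars, r.description, r.topics.join(";")]
      .map(csvField)
      .join(","),
  );
  return [header.join(","), ...rows].join("\n");
}

// Serialise the same records to pretty-printed JSON.
function toJson(records: RepoRecord[]): string {
  return JSON.stringify(records, null, 2);
}
```

Quoting every field keeps descriptions that contain commas or quotes from breaking the column layout.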

Work hours

The approximate work hours spent developing the project are listed in


Run npm install

Configure secret/environment variables

  • In the root folder, create a .env file with the following keys:
MONGO_URL = 'mongodb+srv://fullstack:[email protected]/repos?retryWrites=true&w=majority'
REDIS_URL = 'redis://'
  • Set sensitive data as secrets with commands:
    fly secrets set MONGO_URL='mongodb+srv://fullstack:[email protected]/repos?retryWrites=true&w=majority'
    fly secrets set REDIS_URL='redis://'
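A service configured this way typically checks both variables at startup. The following is a minimal sketch, assuming the variables have already been loaded into the environment (for example by a .env loader or by Fly secrets). The `ServiceConfig` interface and the `requireVar` and `loadConfig` functions are hypothetical names, and the URL scheme checks are an assumption rather than something the project is known to do.

```typescript
// Hypothetical config shape holding the two connection strings from .env.
interface ServiceConfig {
  mongoUrl: string;
  redisUrl: string;
}

// Read a required variable, failing fast with a clear message if it is missing or blank.
function requireVar(env: Record<string, string | undefined>, key: string): string {
  const value = env[key]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Build the config from an environment-like object (e.g. process.env in Node.js).
function loadConfig(env: Record<string, string | undefined>): ServiceConfig {
  const mongoUrl = requireVar(env, "MONGO_URL");
  const redisUrl = requireVar(env, "REDIS_URL");
  // Assumed sanity checks on the URL schemes; adjust if other schemes are in use.
  if (!/^mongodb(\+srv)?:\/\//.test(mongoUrl)) {
    throw new Error("MONGO_URL must start with mongodb:// or mongodb+srv://");
  }
  if (!/^rediss?:\/\//.test(redisUrl)) {
    throw new Error("REDIS_URL must start with redis:// or rediss://");
  }
  return { mongoUrl, redisUrl };
}
```

Calling `loadConfig(process.env)` once at startup would surface a missing or malformed secret immediately, instead of at the first database or Redis call.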


npm run build to compile the TypeScript .ts files located in /src
npm start to run the compiled files located in the ./build folder in dev mode
npm run dev to run the TypeScript files on the fly, reloading whenever something changes

Deploy to

Check secrets: fly secrets list

Deploy to Fly: fly deploy or npm run deploy

Scale Fly app to 0 machines (stopped): fly scale count 0

Scale Fly app back to 1 machine: fly scale count 1

Show list of Fly apps currently deployed: fly apps list

Show logs from all machines (or filter by id with the -i flag): fly logs

Restart machine: fly machine restart


A Docker image is used to deploy this microservice.
The same image can also be built and run on its own for debugging.

Build Docker image: docker build . -t scraper

Run Docker image: docker run --env MONGO_URL='MONGO_URL_in_.ENV_FILE' --env REDIS_URL='REDIS_URL_in_.ENV_FILE' scraper

List all containers: docker ps -a
Restart a container: docker restart [container-id]
Follow container logs: docker logs --follow [container-id]

Reference: Docker best practices


Print list of all commits to a .txt file (Docs)

git log --reverse --pretty=format:'| %as | 1 | %s |' > log.txt


MongoDB Atlas

Connect via web app

Redis Cloud

Connect via web app

Connect via terminal

Use the Connect button in the web app, which provides a command like this: redis-cli -u redis://

Once you are connected, list the active pub/sub channels with: PUBSUB CHANNELS
