WMDD4950 - Shell Script Project

Repository for a web crawler that visits up to a user-specified number of sites using the breadth-first search (BFS) algorithm. Written in Bash (v4.0+).

Starting from https://en.wikipedia.org/wiki/Cloud_computing, the script crawls wiki pages in BFS order, skipping pages it has already visited, and saves each page to disk. It then processes the saved files, extracts the words in each one, and builds an index: for every file, an alphabetically sorted list in which each line holds a word and the number of times that word appears in that file.
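The BFS traversal described above can be sketched as follows. This is a minimal illustration, not the code in crawler.sh: the names bfs_crawl, get_links and GRAPH are hypothetical, and get_links reads from a small in-memory graph so the sketch runs offline. A real crawler might obtain links with something like `lynx -dump -listonly`, which is an assumption about how Lynx could be used here.

```shell
#!/usr/bin/env bash
# Hypothetical in-memory "web": each page maps to the pages it links to.
declare -A GRAPH=(
  [A]="B C"
  [B]="A D"
  [C]="D E"
  [D]="A"
  [E]=""
)

# get_links PAGE: print the outgoing links of PAGE, one per line.
# Stand-in for fetching a page and extracting its links.
get_links() {
  local link
  for link in ${GRAPH[$1]:-}; do
    printf '%s\n' "$link"
  done
}

# bfs_crawl START LIMIT: visit at most LIMIT pages in breadth-first order,
# never visiting the same page twice.
bfs_crawl() {
  local start=$1 limit=$2
  local -a queue=("$start")        # FIFO queue of pages to visit
  local -A seen=(["$start"]=1)     # pages already queued or visited
  local head=0 visited=0 page link

  while (( head < ${#queue[@]} && visited < limit )); do
    page=${queue[head]}
    head=$((head + 1))
    echo "$page"                   # a real crawler would save the page here
    visited=$((visited + 1))
    while IFS= read -r link; do
      [[ -z $link || -n ${seen[$link]:-} ]] && continue
      seen[$link]=1
      queue+=("$link")
    done < <(get_links "$page")
  done
}

bfs_crawl A 4   # prints A, B, C, D on separate lines
```

The associative array used for the visited set is one reason a script like this would need Bash 4.0 or newer.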

Once the index has been created, a second script takes a word as input and outputs the total number of times that word appears across all your files, plus the count for each file that contains it.
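The indexing and lookup steps can be sketched with standard text tools. The function names build_index and count_word, the `.idx` file suffix, the `word count` line layout and the lowercasing of words are all assumptions made for illustration; the actual scripts may differ.

```shell
#!/usr/bin/env bash
# build_index FILE: print "word count" lines, sorted alphabetically by word.
build_index() {
  tr -cs '[:alpha:]' '\n' < "$1" |   # split on non-letters, one word per line
    tr '[:upper:]' '[:lower:]' |     # treat "Cloud" and "cloud" as the same word
    sed '/^$/d' |                    # drop empty lines
    LC_ALL=C sort | uniq -c |        # count identical words
    awk '{ print $2, $1 }'           # reorder to "word count"
}

# count_word WORD INDEX...: print the count per index file, then the total.
count_word() {
  local word=$1
  shift
  awk -v w="$word" '
    $1 == w { printf "%s: %d\n", FILENAME, $2; total += $2 }
    END     { printf "total: %d\n", total }
  ' "$@"
}

# Small demonstration on two throwaway files.
workdir=$(mktemp -d)
printf 'Cloud computing is cloud.\n' > "$workdir/page1.txt"
printf 'The cloud scales.\n' > "$workdir/page2.txt"
for f in "$workdir"/page*.txt; do
  build_index "$f" > "${f%.txt}.idx"
done
count_word cloud "$workdir"/*.idx   # last line printed is "total: 3"
rm -rf "$workdir"
```

Keeping one sorted index per file means a lookup only has to scan the small index files rather than the saved pages themselves.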

Requirements

Bash version 4.0 or newer
Lynx browser (installation guide: https://www.tecmint.com/command-line-web-browsers/)
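Both requirements can be checked before running the scripts. The sketch below is an optional pre-flight check, not part of the project; the helper names check_bash_version and have_command are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when the running shell is Bash 4 or newer.
check_bash_version() {
  (( BASH_VERSINFO[0] >= 4 ))
}

# have_command NAME: succeeds when NAME is found on the PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

if check_bash_version; then
  echo "Bash $BASH_VERSION: OK"
else
  echo "Bash 4.0 or newer is required" >&2
fi

if have_command lynx; then
  echo "lynx: OK"
else
  echo "lynx not found; install it first" >&2
fi
```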

How to Use

Download the two Bash files, crawler.sh and count.sh.
Run the crawler: ./crawler.sh nnn (nnn is the maximum number of sites to crawl; if omitted, 150 is assumed).
Wait for it to finish. This may take some time, depending on how many sites you chose.
crawler.sh will ask whether you want to delete the temporary files. Choose your answer.
Run the counter: ./count.sh www (www is the word you want to search for and count).
count.sh will generate a result.txt file and ask whether you want to display its contents. Choose your answer.
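The default of 150 sites mentioned in the crawler step can be expressed with Bash's `${1:-default}` expansion. The following is a sketch of how such an optional argument might be handled; the function name parse_limit and the validation rule (positive integers only) are assumptions, not necessarily what crawler.sh does.

```shell
#!/usr/bin/env bash
# parse_limit [N]: print N, or 150 when no argument is given.
# Rejects anything that is not a positive integer (assumed behaviour).
parse_limit() {
  local limit=${1:-150}
  if [[ ! $limit =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: crawler.sh [number-of-sites]" >&2
    return 1
  fi
  echo "$limit"
}

parse_limit       # prints 150
parse_limit 20    # prints 20
```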
