WEB_CRAWLING

First, install the required packages:

pip install requests
pip install beautifulsoup4
pip install pymupdf

Then run main.py with the following command:

python main.py   # script to run all the code

To run the crawler and converter scripts separately, run the following commands in succession:

python crawler.py      # script to download the PDF files
python conversion.py   # script to convert all the PDF files to .txt and .xml
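
For readers who want a feel for what the download step might involve, here is a minimal sketch using requests and beautifulsoup4. It is not the code in crawler.py: the function names find_pdf_links and download_pdfs, the "pdfs" output folder, and the rule "any link ending in .pdf" are assumptions made for illustration only.

```python
from pathlib import Path
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def find_pdf_links(html, base_url):
    """Return absolute URLs of all links in `html` that point to a .pdf file."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page they were found on.
        url = urljoin(base_url, anchor["href"])
        # Ignore any query string when checking the file extension.
        path_part = url.split("?")[0].lower()
        if path_part.endswith(".pdf") and url not in links:
            links.append(url)
    return links


def download_pdfs(page_url, out_dir="pdfs"):
    """Download every PDF linked from `page_url` into `out_dir` (hypothetical folder)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    saved = []
    for url in find_pdf_links(response.text, page_url):
        name = url.split("?")[0].rsplit("/", 1)[-1]
        pdf = requests.get(url, timeout=30)
        pdf.raise_for_status()
        target = out / name
        target.write_bytes(pdf.content)
        saved.append(target)
    return saved
```

Keeping the link extraction separate from the network calls makes it possible to check the parsing logic on a saved HTML page without downloading anything.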

Check the folder for the generated files
