Commit

Minor update explaining we do not provide 3GB of pdfs
cvanlabe committed Sep 16, 2022
1 parent 2f4f3fb commit f5ed3f0
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion UNDataScraping/README.md
@@ -2,4 +2,5 @@
This folder is used by the program that builds the dataset from scratch. It scrapes the un.org website, and retrieves the html pages holding the yearly overview, and the meeting and resolution transcripts.
They are stored in the `scratch/`.

-The data used to build the current dataset are included for reference.
+The data used to build the current dataset are included for reference. We did not include the > 3GB of pdf text transcripts.
+Those can be retrieved by running the `load_unsc_meeting_data_to_db.py` script.
