BORA (Boletín Oficial de la República Argentina) crawler implemented using Scrapy.
BORA is the Official Gazette of Argentina, where the government publishes public and legal notices, including company incorporations and changes to company structure and shareholders.
More details can be found in this article.
This crawler saves the following information for each notice:
- id: Notice ID in the BORA website
- company: Name of the company
- date: Date of publication
- type: Type of publication, e.g. company constitution, company modification, etc.
- content: Text of publication
The content field holds unstructured text and must be processed further in order to extract data.
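As a starting point for that processing, the sketch below loads the crawled items and groups them by type, collapsing whitespace in the content along the way. It assumes the output file is a JSON array of objects carrying the five fields listed above. The helper names (normalize, load_notices, group_by_type) are hypothetical and not part of this project.

```python
import json
import re
from collections import defaultdict


def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text or "").strip()


def load_notices(path):
    """Load the crawled items (assumed to be a JSON array of objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def group_by_type(notices):
    """Group notices by their `type` field, normalizing `content`."""
    groups = defaultdict(list)
    for notice in notices:
        # Copy the notice so the caller's data is left untouched.
        cleaned = dict(notice, content=normalize(notice.get("content")))
        groups[notice.get("type")].append(cleaned)
    return dict(groups)
```

For example, group_by_type(load_notices("items_bora.json")) would return a dict mapping each publication type to its list of notices. Extracting structured data from the content itself (shareholders, capital, addresses) would still need rules specific to each type.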
To run the spider and save the crawled items in JSON use:
scrapy crawl bora -o items_bora.json -a start_date=YYYY-mm-dd -a end_date=YYYY-mm-dd
start_date and end_date are optional; they default to 2011-01-01 and the current date, respectively.
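Scrapy passes -a arguments to the spider as strings, so the date range has to be parsed and defaulted somewhere. The sketch below shows one way this could be done. It is not taken from the spider's source: the function name parse_date_range is hypothetical, and the check that start_date does not come after end_date is an added assumption.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"          # matches the YYYY-mm-dd command-line format
DEFAULT_START = date(2011, 1, 1)  # default start date stated in this README


def parse_date_range(start_date=None, end_date=None):
    """Return (start, end) as date objects, applying the documented defaults."""
    if start_date:
        start = datetime.strptime(start_date, DATE_FORMAT).date()
    else:
        start = DEFAULT_START
    if end_date:
        end = datetime.strptime(end_date, DATE_FORMAT).date()
    else:
        end = date.today()
    # Assumption: an inverted range is treated as a user error.
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return start, end
```

Under these assumptions, parse_date_range("2015-03-01", "2015-03-31") yields the two corresponding dates, and a malformed value such as "2015-3-xx" raises ValueError from strptime.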
When deploying to Scrapinghub, make sure you use the scrapy stack, as explained here, in order to avoid SSL errors.
Distributed under the MIT License. See LICENSE file for further details.