@maxachis
Collaborator

Create source collector main prototype.

Two components in the original issue were not fulfilled:

Eliminate existing duplicate URLs from the original batch through a combination of calls to the Data Sources App Database (via the PDAP API Client) and the internal Collector Manage Database

  • This has been partially addressed: deduplication occurs against the internal Collector Manage Database, but the Data Sources App Database is not yet called.
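
To illustrate the part that is addressed, here is a minimal sketch of batch deduplication against a set of already-known URLs. It is not the code in this PR: `normalize_url`, `deduplicate_urls`, and the idea of passing the known URLs in as a plain iterable are all assumptions made for illustration, and the real lookup goes through the internal database.

```python
from typing import Iterable


def normalize_url(url: str) -> str:
    """Hypothetical normalization: trim whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def deduplicate_urls(batch: Iterable[str], known_urls: Iterable[str]) -> list[str]:
    """Return URLs from `batch` that are neither already known nor repeated
    earlier in the same batch, preserving the original order.

    `known_urls` stands in for whatever the internal database returns;
    how those URLs are actually queried is outside this sketch.
    """
    seen = {normalize_url(u) for u in known_urls}
    unique = []
    for url in batch:
        key = normalize_url(url)
        if key in seen:
            continue  # duplicate of a known URL or of an earlier batch entry
        seen.add(key)
        unique.append(url)
    return unique
```

Once the Data Sources App Database is also queried, its URLs could be merged into `known_urls` before the batch is filtered, so the filtering step itself would not need to change.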

A Command Line Interface (CLI) for managing commands to the system (possibly a precursor to a full-fledged API)

  • This was started but has been largely abandoned, and remnants will likely be phased out in a later build. It is secondary to the development of the API.

* Create explicit "main" method
* Extract logic to functions
* Add detailed docstrings
* Add some comments
* Create explicit "main" method
* Extract logic to functions
* Add detailed docstrings
* Add some comments and TODOs
* Add detailed docstrings
* Add some comments and TODOs
* Create explicit main function and `__main__` section
* Add detailed docstrings
* Add some comments and TODOs
* Extract logic from `muck_get.py` and `download_muckrock_foia.py`
* Create constants for the base Muckrock API URL and the FOIA extension of the base URL
* Extract logic for loading from and saving to JSON files into separate functions
* Add TODOs
* Extract `muck_get.py` logic to FOIA searcher
* Remove deprecated `download_muckrock_foia.py`
* Create MuckrockFetcher base class
* Implement in FOIAFetcher
* Create JurisdictionFetcher and AgencyFetcher
* Replace relevant logic in `generate_detailed_muckrock_csv.py`
* Create Enum Class
* Simplify Agency Info data creation
* Extract logic to separate functions
* Create Enum Class
* Simplify Agency Info data creation
* Extract logic to separate functions
* Create LoopFetcher classes
* Implement in `get_allegheny_foias`
* Create SQLClient classes
* Add custom exception handling to MuckrockFetcher
* Clean up comments
* Extract some logic to separate functions.
* Create FOIA DB Searcher class, incorporate into module
* Extract logic to functions
* Move all class files into `/classes` module
* Create Example Collector/Preprocessor
* Finish AutoGoogler Collector/Preprocessor
* Create two integration tests
@maxachis
Collaborator Author

maxachis commented Jan 6, 2025

Closing in favor of #129

@maxachis maxachis closed this Jan 6, 2025
@maxachis maxachis deleted the mc_122_source_collector_main branch April 17, 2025 19:40