This project builds an ETL pipeline on top of open-access museum collection datasets. The focus is on ingesting the raw data and transforming it into a form that is easy to query and reuse.
The goal is a reproducible pipeline that makes the collections easier to understand and analyze. It should be able to answer questions such as:
- Which materials are most common?
- How are object dates distributed?
- Which records are missing required fields?
- Which artists appear under multiple spellings?
- Which images have no recorded dimensions?
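
As a rough illustration of what "easy to query" could look like, the sketch below loads a few records into an in-memory SQLite table and answers each question above with a small query. Everything specific here is an assumption made for the example: the sample rows, the field names (`object_id`, `artist`, `material`, `year`, `image_url`, `width_cm`, `height_cm`), the choice of required fields, and the accent- and case-folding rule used to group artist spellings. A real dataset will need its own schema and matching rules.

```python
import sqlite3
import unicodedata

# Hypothetical sample rows; the field names are assumptions, not a real museum schema.
RAW_RECORDS = [
    {"object_id": "1", "title": "Vase", "artist": "Émile Gallé", "material": "Glass",
     "year": 1890, "image_url": "a.jpg", "width_cm": None, "height_cm": None},
    {"object_id": "2", "title": "Bowl", "artist": "Emile Galle", "material": "glass ",
     "year": 1895, "image_url": None, "width_cm": 12.0, "height_cm": 8.0},
    {"object_id": "3", "title": None, "artist": "Anonymous", "material": "Bronze",
     "year": 1710, "image_url": "c.jpg", "width_cm": 30.0, "height_cm": 45.0},
    {"object_id": "4", "title": "Lamp", "artist": "Émile Gallé", "material": "Glass",
     "year": None, "image_url": "d.jpg", "width_cm": 20.0, "height_cm": None},
]
REQUIRED_FIELDS = ("object_id", "title", "artist")  # assumed for this sketch


def normalize_name(name):
    """Fold accents, case, and extra whitespace so spelling variants share one key."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def transform(record):
    """Return a cleaned copy: trimmed lowercase material plus a folded artist key."""
    row = dict(record)
    row["material"] = row["material"].strip().lower() if row["material"] else None
    row["artist_key"] = normalize_name(row["artist"]) if row["artist"] else None
    return row


def load(records):
    """Create an in-memory table and insert the transformed records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (object_id TEXT, title TEXT, artist TEXT, artist_key TEXT,"
        " material TEXT, year INTEGER, image_url TEXT, width_cm REAL, height_cm REAL)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (:object_id, :title, :artist, :artist_key,"
        " :material, :year, :image_url, :width_cm, :height_cm)",
        [transform(r) for r in records],
    )
    return conn


def top_materials(conn):
    return conn.execute(
        "SELECT material, COUNT(*) AS n FROM objects WHERE material IS NOT NULL"
        " GROUP BY material ORDER BY n DESC, material"
    ).fetchall()


def dates_by_century(conn):
    # Integer division buckets each year by the first year of its hundred-year span.
    return conn.execute(
        "SELECT (year / 100) * 100 AS century_start, COUNT(*) FROM objects"
        " WHERE year IS NOT NULL GROUP BY century_start ORDER BY century_start"
    ).fetchall()


def missing_required(conn):
    condition = " OR ".join(f"{field} IS NULL" for field in REQUIRED_FIELDS)
    rows = conn.execute(f"SELECT object_id FROM objects WHERE {condition} ORDER BY object_id")
    return [row[0] for row in rows]


def artist_spelling_variants(conn):
    return conn.execute(
        "SELECT artist_key, COUNT(DISTINCT artist) FROM objects"
        " WHERE artist_key IS NOT NULL GROUP BY artist_key"
        " HAVING COUNT(DISTINCT artist) > 1 ORDER BY artist_key"
    ).fetchall()


def images_without_dimensions(conn):
    rows = conn.execute(
        "SELECT object_id FROM objects WHERE image_url IS NOT NULL"
        " AND (width_cm IS NULL OR height_cm IS NULL) ORDER BY object_id"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = load(RAW_RECORDS)
    print("Top materials:", top_materials(conn))
    print("Dates by century:", dates_by_century(conn))
    print("Missing required fields:", missing_required(conn))
    print("Artist spelling variants:", artist_spelling_variants(conn))
    print("Images without dimensions:", images_without_dimensions(conn))
```

Keeping the transform step separate from the load step means the same cleaning rules can be rerun against a fresh download, which is one way to keep the pipeline reproducible. The accent-folding key only catches spellings that differ by diacritics, case, or spacing; variants such as reordered or abbreviated names would need fuzzier matching.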