This Scrapy project helps find albums released under a Creative Commons (CC) license.
- crawls a music tag until a given number of new CC albums has been found
- stores all crawled albums in a CouchDB database
- includes all track information (length, mp3-url, license per track)
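As a rough illustration of what a stored album could look like, here is a minimal sketch of a document with per-track information. The class names, field names, and the choice of the album URL as document ID are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Track:
    # Field names are hypothetical; they mirror the track information listed above.
    title: str
    length_seconds: float
    mp3_url: str
    license: str


@dataclass
class Album:
    url: str
    title: str
    artist: str
    tracks: List[Track] = field(default_factory=list)


def album_to_doc(album: Album) -> dict:
    """Convert an album into a JSON-serialisable dict usable as a CouchDB document."""
    doc = asdict(album)  # nested Track dataclasses become plain dicts
    doc["_id"] = album.url  # assumption: the album URL serves as the document ID
    return doc
```

A dict like this can be serialised with `json.dumps` and written to CouchDB over its HTTP API; using a stable ID such as the album URL would also make it easy to tell whether an album has been crawled before.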
- Install Scrapy ( https://docs.scrapy.org/en/latest/intro/install.html )
- Install CouchDB ( https://docs.couchdb.org/en/stable/install/index.html ) (currently, a local database on the default port is assumed)
- Adjust the tag to crawl (currently: edit bandcamp/spiders/tags.py and change the variable "tag"; this will be moved to a command-line argument)
- Start the crawler in the base directory:
scrapy crawl tags
- Enable CORS in CouchDB
- Start the frontend with:
ng serve
I will implement this...
Shows all new results in a list view. Maybe include the "featured track" so you can quickly preview the album without opening the album page.
You can mark an album as "Done".
You can export a formatted metadata string for the album, including title, artist and license (so I can use it for show notes or as a reference when giving CC credits).
Maybe build something quick using Angular ... or even a more specialized toolbox with more data-view related features.
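The exported metadata string mentioned above could be produced along these lines. This is only a sketch in Python (the crawler's language); the function name, the parameter names, and the exact wording of the credit line are assumptions, and the frontend may end up formatting it differently.

```python
def format_credit(title: str, artist: str, license_name: str, url: str = "") -> str:
    """Build a one-line credit containing title, artist and license."""
    credit = f'"{title}" by {artist}, licensed under {license_name}'
    if url:
        # The album URL is optional; append it in parentheses when given.
        credit += f" ({url})"
    return credit


if __name__ == "__main__":
    print(format_credit("Demo", "Someone", "CC BY 4.0"))
```

Keeping the formatting in one small function makes it easy to adjust the wording later without touching the list view.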
Adjustable:
- which tag to crawl?
- how many new CC albums should be found before the crawler terminates?
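One possible way to make both settings adjustable is to use Scrapy's spider arguments: values given as `-a name=value` on the command line are passed to the spider's constructor as string keyword arguments. The sketch below only shows the parsing and the stop check; the argument names `tag` and `max_new_albums`, their defaults, and the helper names are hypothetical and not part of the current project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlOptions:
    tag: str
    max_new_albums: int


def parse_crawl_options(tag: str = "ambient", max_new_albums: str = "10") -> CrawlOptions:
    """Validate spider arguments; values from `-a` arrive as strings."""
    limit = int(max_new_albums)  # raises ValueError for non-numeric input
    if limit < 1:
        raise ValueError("max_new_albums must be at least 1")
    tag = tag.strip().lower()
    if not tag:
        raise ValueError("tag must not be empty")
    return CrawlOptions(tag=tag, max_new_albums=limit)


def limit_reached(new_cc_albums_found: int, options: CrawlOptions) -> bool:
    """Return True once enough new CC albums have been found."""
    return new_cc_albums_found >= options.max_new_albums
```

With names like these, the crawler could be started as `scrapy crawl tags -a tag=ambient -a max_new_albums=20`, and the spider could raise `scrapy.exceptions.CloseSpider` from a callback once `limit_reached` returns true.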