Script and library that reads sitemap URLs, converts them into objects, and can export them as CSV or JSON.
Handle sitemaps according to: https://www.sitemaps.org/protocol.html
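
The protocol defines two document types: a sitemap index (root element `sitemapindex`, listing further sitemaps) and a URL set (root element `urlset`, listing page URLs). The following is a minimal standard-library sketch of telling them apart. It is only an illustration of the protocol, not this library's implementation; `classify_sitemap` and `EXAMPLE_URLSET` are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_text):
    """Return ("index", locs) for a sitemap index or ("urlset", locs) for a URL set.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports tags as "{namespace}name"; keep only the local name.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag == "sitemapindex":
        kind, path = "index", "sm:sitemap/sm:loc"
    elif tag == "urlset":
        kind, path = "urlset", "sm:url/sm:loc"
    else:
        raise ValueError(f"unexpected root element: {tag}")
    # Collect the text of every <loc> entry that is not empty.
    locs = [el.text.strip() for el in root.findall(path, NS) if el.text]
    return kind, locs


# Hand-written sample document, not real data.
EXAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2005-01-01</lastmod></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

kind, locs = classify_sitemap(EXAMPLE_URLSET)
print(kind, locs)
```
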
pip install site-map-parser
smapper $url > /tmp/data.csv
Logs are written to ~/sitemap_run.log
| Argument | Options | Default | Information | 
|---|---|---|---|
| -h | N/A | N/A | Prints help for the arguments |
| url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required - sitemap data to retrieve |
| -l, --log | CRITICAL, ERROR, WARNING, INFO or DEBUG | INFO | Log level; logs to sitemapper_run.log in install folder |
| -e, --exporter | csv or json | csv | Export format of the data |
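
If the script needs to be driven from Python rather than a shell, the arguments in the table can be assembled into a command line. This is a rough sketch that assumes `smapper` is on the `PATH` and writes its export to standard output (as the redirect in the usage example suggests); `build_smapper_command` and `run_smapper` are hypothetical helper names, not part of the package.

```python
import subprocess

LOG_LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")
EXPORTERS = ("csv", "json")


def build_smapper_command(url, exporter="csv", log_level="INFO"):
    """Build the argument list for the smapper script (hypothetical helper)."""
    if exporter not in EXPORTERS:
        raise ValueError(f"exporter must be one of {EXPORTERS}")
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be one of {LOG_LEVELS}")
    return ["smapper", "-e", exporter, "-l", log_level, url]


def run_smapper(url, **options):
    """Run smapper and return whatever it printed (assumed to be the export)."""
    result = subprocess.run(
        build_smapper_command(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `run_smapper("http://www.example.com", exporter="json")` would run `smapper -e json -l INFO http://www.example.com`.
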
from sitemapparser import SiteMapParser
sm = SiteMapParser('http://www.example.com')    # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.get_sitemaps() # returns iterator of sitemapper.Sitemap instances
else:
    urls = sm.get_urls()         # returns iterator of sitemapper.Url instances

Two exporters are available: csv and json
from sitemapparser.exporters import CSVExporter
# sm set as per earlier library usage example
csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())

from sitemapparser.exporters import JSONExporter
# sm set as per earlier library usage example
json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
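
Both exporters follow the same pattern, so saving the result to a file can be wrapped in one helper. This is a sketch under the assumption that `export_sitemaps()` and `export_urls()` return strings (the examples above print them directly); `export_to_file` is a hypothetical name, not part of the library.

```python
def export_to_file(sm, exporter, path):
    """Write the exporter's output to `path`; return False if there is nothing to export.

    Hypothetical helper: `sm` is a SiteMapParser-like object and `exporter`
    is a CSVExporter or JSONExporter built from it.
    """
    if sm.has_sitemaps():
        data = exporter.export_sitemaps()
    elif sm.has_urls():
        data = exporter.export_urls()
    else:
        return False
    # newline="" keeps any line endings produced by the exporter unchanged.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(data)  # assumes the export is a str
    return True
```

With `sm` and `json_exporter` set up as above, a call such as `export_to_file(sm, json_exporter, "/tmp/data.json")` would write the JSON export to that path.
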