Is your feature request related to a problem? Please describe.
When the sitemap generator builds its sitemap index file, it already has everything needed to create the sources section of a gleaner configuration file, such as:
gleaner:
  runid: iow # this will be the bucket the output is placed in...
  summon: true # do we want to visit the web sites and pull down the files
  mill: false
context:
  cache: true
contextmaps:
- prefix: "https://schema.org/"
  file: "/home/vagrant/conf/jsonldcontext.jsonld" # wget http://schema.org/docs/jsonldcontext.jsonld
- prefix: "http://schema.org/"
  file: "/home/vagrant/conf/jsonldcontext.jsonld" # wget http://schema.org/docs/jsonldcontext.jsonld
summoner:
  after: "" # "21 May 20 10:00 UTC"
  mode: full # full || diff: If diff compare what we have currently in gleaner to sitemap, get only new, delete missing
  threads: 2
  delay: # milliseconds (1000 = 1 second) to delay between calls (will FORCE threads to 1)
  headless: http://localhost:9222 # URL for headless see docs/headless
millers:
  graph: true
sources:
- active: 'true'
  domain: https://pids.geoconnex.dev
  headless: 'false'
  name: refgages0
  pid: https://gleaner.io/genid/geoconnex
  propername: refgages0
  sourcetype: sitemap
  url: https://pids.geoconnex.dev/sitemap/ref/gages/gages__0.xml
- active: 'true'
  domain: https://pids.geoconnex.dev
  headless: 'false'
  name: refmainstems
  pid: https://gleaner.io/genid/geoconnex
  propername: refmainstems
  sourcetype: sitemap
  url: https://pids.geoconnex.dev/sitemap/ref/mainstems/mainstems__0.xml
- active: 'true'
  domain: https://pids.geoconnex.dev
  headless: 'false'
  name: dams0
  pid: https://gleaner.io/genid/geoconnex
  propername: dams0
  sourcetype: sitemap
  url: https://pids.geoconnex.dev/sitemap/ref/dams/dams__0.xml
- active: 'true'
  domain: https://pids.geoconnex.dev
  headless: 'false'
  name: cdss0
  pid: https://gleaner.io/genid/geoconnex
  propername: cdss0
  sourcetype: sitemap
  url: https://pids.geoconnex.dev/sitemap/cdss/co_gages__0.xml
- active: 'true'
  domain: https://pids.geoconnex.dev
  headless: 'false'
  name: nmwdist0
  pid: https://gleaner.io/genid/geoconnex
  propername: nmwdist0
  sourcetype: sitemap
  url: https://pids.geoconnex.dev/sitemap/nmwdi/st/nmwdi-st__0.xml
Describe the solution you'd like
It would be nice to be able to generate this section, either as a separate step or as a function of the existing sitemap generator workflow.
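To make the request concrete, here is a rough sketch of what such a step could look like, written as a standalone Python script rather than as part of the sitemap generator. It reads a sitemap index, takes each `<loc>` entry, and prints a sources block in the same shape as the example above. The function names (`source_name`, `sources_from_index`, `to_yaml`) are hypothetical, and the naming rule (directory segments after `sitemap` plus the chunk number) is only an assumption: it yields `refgages0` and `nmwdist0`, but it would not reproduce hand-picked names such as `dams0` or `refmainstems`.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap and sitemap index files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def source_name(sitemap_url):
    """Hypothetical naming rule: directory segments after 'sitemap' + chunk number."""
    parts = [p for p in urlparse(sitemap_url).path.split("/") if p]
    if not parts:
        return ""
    if "sitemap" in parts:
        parts = parts[parts.index("sitemap") + 1:]
    dirs, filename = parts[:-1], parts[-1]
    match = re.search(r"__(\d+)\.xml$", filename)
    chunk = match.group(1) if match else ""
    return "".join(dirs) + chunk


def sources_from_index(index_xml, pid):
    """Build one source entry per <sitemap><loc> found in a sitemap index."""
    root = ET.fromstring(index_xml)
    sources = []
    for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        name = source_name(url)
        sources.append({
            "active": "true",
            "domain": f"{parsed.scheme}://{parsed.netloc}",
            "headless": "false",
            "name": name,
            "pid": pid,
            "propername": name,
            "sourcetype": "sitemap",
            "url": url,
        })
    return sources


def to_yaml(sources):
    """Emit the entries as a YAML 'sources' list without needing a YAML library."""
    lines = ["sources:"]
    for src in sources:
        for i, (key, value) in enumerate(src.items()):
            # Quote boolean-looking strings, matching the example config.
            if value in ("true", "false"):
                value = f"'{value}'"
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys
    # Usage: python make_sources.py sitemap_index.xml > sources.yaml
    with open(sys.argv[1], encoding="utf-8") as fh:
        entries = sources_from_index(fh.read(), "https://gleaner.io/genid/geoconnex")
    print(to_yaml(entries))
```

The output could then be pasted under the rest of the gleaner configuration. If the sitemap generator already knows a preferred name for each sitemap, passing that through would be better than deriving it from the URL.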
Describe alternatives you've considered
Ideally, gleaner would be able to parse a sitemap index file itself to create the source entries. Failing that, being able to copy and paste a generated configuration would be a step up from manual entry, especially as the list of sources we are crawling grows.