- Wikidata 2017: Internet Archive
- Wikidata 2018: Internet Archive
Both versions were hosted on Blazegraph on a local server to analyze SPARQL schema coverage changes over time.
- Bio2RDF SPARQL Endpoint: https://Bio2RDF.org/sparql/
The query logs were retrieved from multiple sources:
- Linked SPARQL Queries Dataset (LSQ) 2.0:
  - SPARQL Endpoint: https://lsq.data.dice-research.org/sparql
  - This dataset contains queries from 24 datasets, including 23 Bio2RDF datasets and one Wikidata dataset.
- Bio2RDF Query Logs:
  - Dumontier Lab Repository: https://download.dumontierlab.com/Bio2RDF/logs/
- Wikidata Query Logs:
  - International Center for Computational Logic (ICCL): https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en
  - This dataset includes all queries and organic queries for Interval 1 and Interval 7.
To calculate SPARQL Schema Coverage (SC), use the following steps:
- Extract all schema elements by running the code in the `KG-Schema-extractors` folder.
- Extract used schema elements from SPARQL query logs by running the code in the `Schema-coverage-method` folder.
- Compute SC (%) using the formula:

  $$ SC\,(\%) = \left( \frac{USE}{TSE} \right) \times 100 $$

  where:
  - TSE (Total Schema Elements): All distinct types and predicates in the KG.
  - USE (Used Schema Elements): The subset of schema elements found in user SPARQL queries.
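
The formula above can be illustrated with a minimal Python sketch. This is not the code from the repository folders: the function names `extract_iris` and `schema_coverage`, the `example.org` IRIs, and the regex-based extraction are all assumptions made for illustration. In particular, the regex only finds full IRIs written as `<...>` and ignores prefixed names, so a real pipeline would need a proper SPARQL parser.

```python
import re

# Assumption: schema elements appear in queries as full IRIs in angle brackets.
# Prefixed names (e.g. wdt:P31) are NOT handled by this naive pattern.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def extract_iris(query: str) -> set[str]:
    """Collect every full IRI written as <...> in a SPARQL query string."""
    return set(IRI_PATTERN.findall(query))


def schema_coverage(total_schema_elements: set[str], queries: list[str]) -> float:
    """Return SC (%) = (USE / TSE) * 100 for a set of schema elements and a query log."""
    if not total_schema_elements:
        raise ValueError("TSE is empty; SC is undefined")
    mentioned: set[str] = set()
    for query in queries:
        mentioned |= extract_iris(query)
    # USE: only IRIs that are actually schema elements of the KG count as used.
    used = mentioned & total_schema_elements
    return len(used) / len(total_schema_elements) * 100


# Hypothetical toy schema (TSE) with four types and predicates.
tse = {
    "http://example.org/Person",
    "http://example.org/City",
    "http://example.org/name",
    "http://example.org/knows",
}

# Hypothetical query log: the second query uses an IRI outside the schema.
queries = [
    "SELECT ?s WHERE { ?s a <http://example.org/Person> ; <http://example.org/name> ?n }",
    "SELECT ?s WHERE { ?s <http://example.org/other> ?o }",
]

sc = schema_coverage(tse, queries)
print(f"SC = {sc:.1f}%")  # SC = 50.0%
```

Here two of the four schema elements occur in the log, so SC is 50%. Intersecting with TSE keeps IRIs that are not schema elements from inflating USE.
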
To perform the usage pattern analysis as proposed in the paper, run the code in the `KG-Usage-analysis` folder.
The generated usage metadata for Bio2RDF and Wikidata KGs can be found in the `generated-usage-metadata` folder.