Knowledge graph usage metadata: Insights from SPARQL log analysis

Datasets

Wikidata 2017 (Internet Archive): https://archive.org/download/wikibase-wikidatawiki-20170821
Wikidata 2018 (Internet Archive): https://archive.org/download/wikibase-wikidatawiki-20180205

Both Wikidata versions were loaded into Blazegraph on a local server to analyze how SPARQL schema coverage changes over time.
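As a rough illustration of how schema elements might be listed from such a local instance, the sketch below builds two SPARQL-protocol requests, one for distinct predicates and one for distinct types. It is not the repository's extractor code. The endpoint URL is an assumption (the default of a stock Blazegraph install), and the helper names `build_request` and `run_select` are hypothetical. For Wikidata, class membership is mostly expressed through `wdt:P31` rather than `rdf:type`, so the type query would likely need adapting.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default endpoint of a local Blazegraph install; adjust to your setup.
ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"

# Distinct predicates used anywhere in the graph.
PREDICATES_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"

# Distinct classes, taken here as the objects of rdf:type ("a").
TYPES_QUERY = "SELECT DISTINCT ?t WHERE { ?s a ?t }"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query, var, endpoint=ENDPOINT):
    """Run a SELECT query and return the set of values bound to `var`."""
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        data = json.load(resp)
    return {row[var]["value"] for row in data["results"]["bindings"]}
```

With a running endpoint, `run_select(PREDICATES_QUERY, "p") | run_select(TYPES_QUERY, "t")` would give one candidate set of schema elements. On a full Wikidata dump these unrestricted queries may be slow or time out.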

The query logs were retrieved from multiple sources:

Linked SPARQL Queries Dataset (LSQ) 2.0:
- SPARQL Endpoint: https://lsq.data.dice-research.org/sparql
- This dataset contains queries from 24 datasets, including 23 Bio2RDF datasets and one Wikidata dataset.
Bio2RDF Query Logs:
- Dumontier Lab Repository: https://download.dumontierlab.com/Bio2RDF/logs/
Wikidata Query Logs:
- International Center for Computational Logic (ICCL): https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en
- This dataset includes all queries and organic queries for Interval 1 and Interval 7.
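Before any analysis, the raw log files have to be turned into plain query strings. The sketch below shows one way this could look for the Wikidata logs. It assumes a tab-separated file with a header row and a URL-encoded query in the first column. The column names and the sample row are made up for illustration and should be checked against the actual files. `read_queries` is a hypothetical helper, not part of the repository.

```python
import csv
import io
import urllib.parse


def read_queries(lines):
    """Yield decoded SPARQL query strings from tab-separated log lines.

    Assumption: the first column holds a URL-encoded query and the
    input starts with a header row.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    for row in reader:
        if row and row[0]:
            yield urllib.parse.unquote_plus(row[0])


# Made-up sample standing in for a real log file.
sample = io.StringIO(
    "anonymizedQuery\ttimestamp\tsourceCategory\tuser_agent\n"
    "SELECT+%3Fs+WHERE+%7B+%3Fs+wdt%3AP31+wd%3AQ5+%7D"
    "\t2017-06-12 00:00:00\torganic\tbrowser\n"
)
queries = list(read_queries(sample))
```

For a real file, an open file handle (for example from `gzip.open(path, "rt")`) can be passed in place of `sample`.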

To calculate SPARQL Schema Coverage (SC), use the following steps:

1. Extract all schema elements by running the code in the KG-Schema-extractors folder.
2. Extract the used schema elements from the SPARQL query logs by running the code in the Schema-coverage-method folder.
3. Compute SC (%) using the formula:
\[ \mathrm{SC}\,(\%) = \left( \frac{\mathrm{USE}}{\mathrm{TSE}} \right) \times 100 \]
where:
- TSE (Total Schema Elements): All distinct types and predicates in the KG.
- USE (Used Schema Elements): The subset of schema elements found in user SPARQL queries.
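The steps above can be sketched end to end on toy data. The code below is a simplified stand-in, not the code in the Schema-coverage-method folder. It finds IRIs and prefixed names in each query with regular expressions, keeps those that appear in the known schema set, and then applies the SC formula. A real implementation would more likely use a proper SPARQL parser. All function names and the `example.org` data are hypothetical.

```python
import re

# PREFIX declarations, e.g. "PREFIX ex: <http://example.org/>".
PREFIX_RE = re.compile(r"PREFIX\s+([\w-]*):\s*<([^<>\s]+)>", re.IGNORECASE)
# Full IRIs in angle brackets.
IRI_RE = re.compile(r"<([^<>\s]+)>")
# Prefixed names such as ex:Person (simplified, not the full SPARQL grammar).
PNAME_RE = re.compile(r"\b([A-Za-z][\w-]*):([\w-]+)")


def terms_in_query(query):
    """Return the IRIs mentioned in a query, with declared prefixes expanded."""
    prefixes = dict(PREFIX_RE.findall(query))
    body = PREFIX_RE.sub(" ", query)       # drop the declarations themselves
    terms = set(IRI_RE.findall(body))
    rest = IRI_RE.sub(" ", body)           # avoid matching inside full IRIs
    for prefix, local in PNAME_RE.findall(rest):
        if prefix in prefixes:             # undeclared prefixes are ignored
            terms.add(prefixes[prefix] + local)
    return terms


def used_schema_elements(queries, schema_elements):
    """USE: schema elements that occur in at least one query."""
    used = set()
    for query in queries:
        used |= terms_in_query(query) & schema_elements
    return used


def schema_coverage(use, tse):
    """SC (%) = |USE| / |TSE| * 100."""
    if not tse:
        raise ValueError("TSE must not be empty")
    return 100.0 * len(use & tse) / len(tse)


# Toy schema and queries, made up for illustration.
tse = {
    "http://example.org/Person",
    "http://example.org/name",
    "http://example.org/knows",
    "http://example.org/age",
}
queries = [
    "PREFIX ex: <http://example.org/> "
    "SELECT ?n WHERE { ?s a ex:Person ; ex:name ?n }",
    "SELECT ?o WHERE { ?s <http://example.org/knows> ?o }",
]
use = used_schema_elements(queries, tse)
sc = schema_coverage(use, tse)
```

Here three of the four schema elements occur in the queries, so `sc` is 75.0. The regular expressions do not skip string literals or comments, which is acceptable for a sketch because only terms that also occur in TSE are counted.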

To perform the usage pattern analysis as proposed in the paper, run the code in the KG-Usage-analysis folder.

The generated usage metadata for Bio2RDF and Wikidata KGs can be found in the generated-usage-metadata folder.
