ERA-SHACL-Benchmark

This is a real-data benchmark of SHACL engines for performance and conformance. It is based on a real use case in the railway domain, comprising the railway infrastructure information (RINF) of the whole European railway network.

The shapes test suite

From an analysis of the whole set of SHACL shapes, multiple test cases were extracted as unit tests of different SHACL Core and SHACL-SPARQL features.

This test suite was formulated following the W3C SHACL test suite shapes. It consists of a subset of the SHACL Core and SHACL-SPARQL shapes built for validating the ERA knowledge graph (the actual shapes and variations of them), together with real failing triples extracted from the graph and synthetic violations that complement the suite to test the engines' capabilities. Although the suite does not cover all cases of the official test suite, it introduces new real-world cases that have not been tested before.

The generated validation reports are the assets used to assess each engine's conformance, including correctness and completeness.

The performance benchmark

The approximately 55 million triples of the knowledge graph are split into three subsets spanning three orders of magnitude and validated against three subsets of SHACL shapes, totalling nine experiment combinations per engine.

Each tested engine was configured to run in memory, as recommended by its documentation. To ensure reproducibility and portability, each library was wrapped in a CLI application and packaged in a Docker image. Loading and validation times were measured inside each engine's own process, using the standard time-measurement libraries of the programming language each library is written in.
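As an illustration of the pattern (not the repository's actual wrapper code), a wrapper written in Python could time the two phases in-process like this; each wrapper applies the same pattern with its own language's standard timing facilities:

    import time

    start = time.perf_counter()
    # ... load the data and shapes graphs into memory here ...
    load_time = time.perf_counter() - start

    start = time.perf_counter()
    # ... run the engine's SHACL validation here ...
    validation_time = time.perf_counter() - start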

The report quality

To assess comprehensiveness, a set of SHACL shapes was formulated so that the generated reports themselves can be validated against the mandatory report structure defined by the SHACL specification.
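As a minimal sketch of the idea (the namespace and the choice of constraint are illustrative, not the repository's actual meta-shapes), such a shape can target the reports themselves and require the mandatory sh:conforms flag:

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <http://example.org/meta#> .    # hypothetical namespace

    # Every sh:ValidationReport node must carry exactly one boolean sh:conforms.
    ex:ReportShape a sh:NodeShape ;
        sh:targetClass sh:ValidationReport ;
        sh:property [
            sh:path sh:conforms ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
            sh:datatype xsd:boolean ;
        ] .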

Usage instructions

  1. Clone this repository.
  2. Execute the get_data.sh script to process and store the knowledge graph subsets and shapes for the benchmark.
  3. Make sure you have Docker installed with admin access (or execute everything with sudo).
  4. Build the images using the build_images.sh script available in the engines folder.
  5. Run the test suite script run_test.sh to generate the test reports.
  6. Execute the script run_benchmark.sh to run the benchmark.
  7. Generate the tables and figures using the scripts provided in the analysis folder.
  8. Finally, check the reports' compliance by running the report.py script in the analysis folder.

How to use this benchmark for testing other engines

If you want to test an additional engine or a different version of one, create a simple CLI app following the requirements described below and build it into a Docker image. Then customise the benchmark script, adding the new image to the list of engines to be tested in the corresponding line.
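The benchmark itself is driven by shell scripts; purely as an illustration of the invocation pattern an added image must support (the image name and mounted paths here are made up), each engine container is run with the three file paths described below, and the times it prints are scraped from standard output:

    import os
    import re
    import subprocess

    # Run one hypothetical engine image on one data/shapes pair, mounting a
    # local data folder into the container, and capture what it prints.
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{os.getcwd()}/data:/data",
        "my-engine:latest",                    # hypothetical image name
        "/data/data.ttl", "/data/shapes.ttl", "/data/report.ttl",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # Scrape the two time lines the CLI app is required to print (see below).
    load_time = float(re.search(r"Load time: ([0-9.]+)", out).group(1))
    validation_time = float(re.search(r"Validation time: ([0-9.]+)", out).group(1))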

CLI application requirements

  • Positional arguments: the app should parse three positional arguments, in this order: the path to a single Turtle file containing the data triples, the path to a single Turtle file containing the shapes triples, and the path where the validation report Turtle file will be written.
  • In addition, the load and validation time measurements should be printed to the command line as follows, so that the benchmark script can capture them:
    Load time: 0.258
    Validation time: 0.039
  • Time should be measured in seconds and printed without units.
  • Measuring memory consumption is not required, as the benchmark script takes the peak from a sampled memory profile of the running Docker container.

Feel free to use any of the other engines' image code as a reference for a correct app implementation.
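For instance, a minimal Python wrapper meeting these requirements could be built on pySHACL; this is a sketch under the assumption that pySHACL's default settings are acceptable for the engine being tested, not the repository's own wrapper:

    import sys
    import time

    from pyshacl import validate
    from rdflib import Graph

    # Three positional arguments: data file, shapes file, report output path.
    data_path, shapes_path, report_path = sys.argv[1:4]

    # Load both graphs into memory and print the elapsed seconds without units.
    start = time.perf_counter()
    data = Graph().parse(data_path, format="turtle")
    shapes = Graph().parse(shapes_path, format="turtle")
    print(f"Load time: {time.perf_counter() - start:.3f}")

    # Validate, print the elapsed seconds, then write the report as Turtle.
    start = time.perf_counter()
    conforms, report_graph, _ = validate(data, shacl_graph=shapes)
    print(f"Validation time: {time.perf_counter() - start:.3f}")

    report_graph.serialize(destination=report_path, format="turtle")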
