Interactive Spark Notebooks for running SANSA-Examples. In this repository you will find a docker-compose.yml for running a Hadoop/Spark cluster locally. The cluster also includes Hue for browsing HDFS and copying files to it. The notebooks are created and run using Apache Zeppelin.
- Docker Engine >= 1.13.0
- docker-compose >= 1.10.0
- Around 10 GB of disk space for Docker images
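The version requirements above can be checked with a short script. This is a minimal sketch, not part of the repository: the helper names `version_ge` and `check_tool` are made up for illustration, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: compare installed Docker tooling against the minimum versions listed above.
# version_ge and check_tool are hypothetical helpers, not part of this repository.

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME INSTALLED REQUIRED: prints a one-line verdict.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: ok (>= $3)"
  else
    echo "$1 $2: too old (need >= $3)"
  fi
}

# Only query tools that are actually on the PATH.
if command -v docker >/dev/null 2>&1; then
  check_tool "Docker Engine" "$(docker version --format '{{.Server.Version}}' 2>/dev/null)" "1.13.0"
fi
if command -v docker-compose >/dev/null 2>&1; then
  check_tool "docker-compose" "$(docker-compose version --short 2>/dev/null)" "1.10.0"
fi
```

If the Docker daemon is not running, the engine version comes back empty and the script reports it as too old, so start Docker before running the check.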
After installing Docker, add yourself to the docker group (%username% is your username) and log in again:
sudo usermod -aG docker %username%
This lets you run docker commands without the sudo prefix, which is necessary for running the make targets.
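To confirm that the group change is active in the current session, something like the following may help. It is a sketch only: `list_has_word` is a hypothetical helper, and the check relies on `id -nG` printing the current user's group names separated by spaces.

```shell
#!/usr/bin/env bash
# Sketch: check whether the current login session already has the docker group.
# list_has_word is a hypothetical helper, not part of this repository.

# list_has_word WORD LIST: succeeds when WORD is one of the space-separated items in LIST.
list_has_word() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if list_has_word docker "$(id -nG)"; then
  echo "docker group is active in this session"
else
  echo "docker group is not active yet; log out and back in"
fi
```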
Get the SANSA Examples jar file (requires wget):
make
Start the cluster (this downloads the BDE Docker images, which will take a while):
make up
When start-up is done you will be able to access the following interfaces:
- http://localhost:8080/ (Spark Master)
- http://localhost:8088/home (Hue HDFS Filebrowser)
- http://localhost/ (Zeppelin)
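Because start-up takes a while, it can be convenient to poll the three interfaces listed above until they respond. The sketch below assumes `curl` is installed (the repository itself only mentions wget); `wait_for_url` and `wait_for_stack` are hypothetical helper names, and the retry counts are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Sketch: poll the web interfaces until they answer. Assumes curl is installed.
# wait_for_url and wait_for_stack are hypothetical helpers, not part of this repository.

# wait_for_url URL [TRIES] [DELAY]: poll URL until it answers or TRIES run out.
wait_for_url() {
  target=$1
  tries=${2:-30}
  delay=${3:-5}
  attempt=0
  while [ "$attempt" -lt "$tries" ]; do
    # Any HTTP response counts as "up"; --max-time bounds each attempt to 5 seconds.
    if curl --silent --output /dev/null --max-time 5 "$target"; then
      echo "up: $target"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "not reachable: $target"
  return 1
}

# wait_for_stack: check Spark Master, Hue and Zeppelin in turn, stopping at the first failure.
wait_for_stack() {
  for u in http://localhost:8080/ http://localhost:8088/home http://localhost/; do
    wait_for_url "$u" || return 1
  done
}
```

After sourcing these functions, calling `wait_for_stack` once `make up` has returned reports each interface as it becomes reachable.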
To load the data into your cluster, run:
make load-data
Then open Zeppelin, choose any available notebook, and run it.
To restart Zeppelin without restarting the whole stack:
make restart
Stop the whole stack:
make down
It is also possible to execute the applications from the command line. Get the SANSA-Examples jar and start the cluster if you have not already done so:
make
make up
make load-data
Then you can execute any of the following commands to run the examples from the command line:
make cli-triples-reader
make cli-triple-ops
make cli-triples-writer
make cli-pagerank
make cli-rdf-stats
make cli-inferencing
make cli-sparklify
make cli-owl-reader-manchester
make cli-owl-reader-functional
make cli-owl-dataset-reader-manchester
make cli-owl-dataset-reader-functional
make cli-clustering
make cli-rule-mining
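To run all of the command-line examples above in sequence, a small loop is enough. This is a sketch under assumptions: `run_examples` is a hypothetical helper, the `MAKE` variable is introduced here only so the command can be swapped out for a dry run, and the script is expected to be run from the repository root after `make`, `make up` and `make load-data`.

```shell
#!/usr/bin/env bash
# Sketch: run every cli-* target listed above, stopping at the first failure.
# run_examples and the MAKE override are hypothetical, not part of this repository.

run_examples() {
  for t in \
    cli-triples-reader cli-triple-ops cli-triples-writer \
    cli-pagerank cli-rdf-stats cli-inferencing cli-sparklify \
    cli-owl-reader-manchester cli-owl-reader-functional \
    cli-owl-dataset-reader-manchester cli-owl-dataset-reader-functional \
    cli-clustering cli-rule-mining
  do
    # ${MAKE:-make} defaults to make; set MAKE=echo to print the targets instead.
    "${MAKE:-make}" "$t" || { echo "failed: $t" >&2; return 1; }
  done
}
```

Calling `MAKE=echo run_examples` prints the target names without running anything, which is a cheap way to check the list before starting the real jobs with `run_examples`.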
We always welcome new contributors to the project! Please see our contribution guide for more details on how to get started contributing to SANSA.