make load-data apparently not loading data to cluster #20

@arminus

This is the output when running make load-data (I ran make beforehand; a 384.4 MB sansa-examples-spark.jar is present in examples/jars, and my setup appears to be running fine):

make load-data
docker run -it --rm -v /home/www/bde/SANSA-Notebooks/sansa-notebooks/examples/data:/data --net spark-net -e "CORE_CONF_fs_defaultFS=hdfs://namenode:8020" bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8 hdfs dfs -copyFromLocal /data /data
Configuring core
 - Setting fs.defaultFS=hdfs://namenode:8020
Configuring hdfs
 - Setting dfs.namenode.name.dir=file:///hadoop/dfs/name
Configuring yarn
Configuring httpfs
Configuring kms
Configuring for multihomed network
docker exec -it namenode hdfs dfs -ls /data
Found 1 items
drwxr-xr-x   - root supergroup          0 2021-04-16 13:45 /data/data

This was the second time I ran make load-data. Besides apparently not uploading any data, the second run created another data directory nested inside /data (the /data/data entry in the listing).
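The nesting is consistent with how hdfs dfs -copyFromLocal treats a destination directory that already exists: like cp -r, it places the source directory inside it. A minimal local sketch of that behaviour follows, using plain cp -r on a scratch directory as a stand-in for HDFS. The scratch paths are invented for illustration, and the empty source directory is an assumption about the state of examples/data at the time.

```shell
#!/bin/sh
# Local analogy only: cp -r stands in for `hdfs dfs -copyFromLocal`.
# All paths below are scratch paths invented for this illustration.
scratch=$(mktemp -d)
mkdir -p "$scratch/local/data"   # empty source dir (assumed state of examples/data)
mkdir -p "$scratch/hdfs"         # stands in for the HDFS root

# First run: the destination does not exist yet, so it is created as a copy.
cp -r "$scratch/local/data" "$scratch/hdfs/data"

# Second run: the destination now exists, so the source lands inside it.
cp -r "$scratch/local/data" "$scratch/hdfs/data"

# Show the resulting directory tree (includes a nested data/data).
find "$scratch/hdfs" -type d | sort
```

Under that reading, removing /data on HDFS before re-running (hdfs dfs -rm -r /data) would likely avoid the extra level.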

Navigating to http://localhost:8088/filebrowser/#/data, I can see the nested data directory but nothing else.

-> I unpacked sansa-examples-spark.jar into examples/data so that make load-data picks up the data, but that step seems to be missing from one of the build targets.

Related to that, the Zeppelin RDF notebook references the file hdfs://namenode:8020/data/rdf.nt, but that file is not present in sansa-examples-spark.jar, so I wonder whether some other issue is at play here.
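One way to double-check which data files the jar actually ships is to list its entries and search for the file name. A small sketch, assuming a POSIX shell with awk; has_entry is a hypothetical helper name, and the listing can come from jar tf or unzip -Z1, whichever is installed:

```shell
#!/bin/sh
# has_entry NAME (hypothetical helper): reads an archive listing on stdin,
# one path per line, and succeeds if any path's last component equals NAME.
has_entry() {
    awk -F/ -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

# Intended use (not run here, since it needs the jar from examples/jars):
#   jar tf examples/jars/sansa-examples-spark.jar | has_entry rdf.nt \
#       && echo "rdf.nt is in the jar" || echo "rdf.nt is not in the jar"
```

Matching on the last path component means the check works wherever the file sits inside the archive, without also matching names that merely end in rdf.nt.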

As a side note, copying the data now seems to run forever (on a reasonably fast Linux box).
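One possible explanation, offered as a guess rather than a diagnosis: unpacking the whole jar into examples/data may have put a very large number of small files there (compiled classes, for example), and HDFS uploads tend to be slow per file. A rough sketch for checking the file count before loading; count_files is a hypothetical helper:

```shell
#!/bin/sh
# count_files DIR (hypothetical helper): print how many regular files live
# below DIR. File names containing newlines would be over-counted.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Intended use: count_files examples/data
```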
