Taking forever while loading data from compressed files #2

@ayush47-github

Hi,
I am having an issue loading data from compressed files: the import runs for a very long time (I left it overnight) without completing. It appears to be stuck at "Streaming records...". Shell output and other relevant information are included below:

$ ./import.sh ~/Downloads/darpa-tc/ 0.0.0.0 4712 -v
Consuming from file /home/user/Downloads/darpa-tc/
Using additional arguments 0.0.0.0
Using outfile from second arg: 4712
log4j: Level token is [INFO].
log4j: Category org.apache.spark.repl.SparkIMain$exprTyper set to INFO
log4j: Handling log4j.additivity.org.apache.spark.repl.SparkIMain$exprTyper=[null]
log4j: Finished configuring.
INFO DASImporter: Using input directory: /home/user/Downloads/darpa-tc/
INFO DASImporter: Using schema file: TCCDMDatum.avsc
INFO DASImporter: Using destination host: 0.0.0.0:4712
INFO DASImporter: Verbose: false
INFO DASImporter: Parsing file /home/user/Downloads/darpa-tc/ta1-cadets-1-e5-official-2.bin.1.gz
INFO DASImporter: Streaming records...

$ sudo docker compose ls
NAME STATUS CONFIG FILES
tc_data_visualization_tool running(1) /home/user/Downloads/TC_Data_Visualization_Tool/docker-compose.yml

$ java --version
openjdk 11.0.16 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

$ uname -a
Linux data-server 5.15.0-33-generic #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
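
One way to narrow down where the import stalls at "Streaming records..." is to check the two ends separately: whether the compressed input decompresses cleanly, and whether anything accepts connections at the destination. The following is a minimal sketch only, not part of the importer. The helper names `gzip_is_readable` and `port_is_open` are hypothetical, and it assumes the destination shown in the log (0.0.0.0:4712) is a plain TCP listener. It checks only the gzip layer, not the records inside the file.

```python
import gzip
import socket
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error.

    Hypothetical helper: validates the gzip container only, not the
    records stored inside it.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end so truncation or corruption is detected.
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: assumes the destination is a TCP listener.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Path and address are taken from the log output above.
    path = "/home/user/Downloads/darpa-tc/ta1-cadets-1-e5-official-2.bin.1.gz"
    print("gzip readable:", gzip_is_readable(path))
    print("destination reachable:", port_is_open("0.0.0.0", 4712))
```

If the file reads cleanly but the destination is not reachable, the stall may be on the receiving side rather than in decompression. This is only a guess until both checks are run.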

Any help would be greatly appreciated! Thanks.
