TPC-H inside of a Docker Container
The version of the tpc-h generator is the version 2.4.0 and its coming from the revision 2.18.0
docker build --tag databloom/tpch:latest .
docker pull ghcr.io/databloom-ai/tpch-docker:main
the data will be allocated in path where you are executing the TPCH
docker run -it -v "$(pwd)":/data ghcr.io/databloom-ai/tpch-docker:main -h
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
[-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]
Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f -- force. Overwrite existing files
-h -- display this message
-q -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v -- enable VERBOSE mode
Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c -- generate cutomers ONLY
-T l -- generate nation/region ONLY
-T L -- generate lineitem ONLY
-T n -- generate nation ONLY
-T o -- generate orders/lineitem ONLY
-T O -- generate orders ONLY
-T p -- generate parts/partsupp ONLY
-T P -- generate parts ONLY
-T r -- generate region ONLY
-T s -- generate suppliers ONLY
-T S -- generate partsupp ONLY
To generate the SF=1 (1GB), validation database population, use:
dbgen -vf -s 1
To generate updates for a SF=1 (1GB), use:
dbgen -v -U 1 -s 1
To execute a shell inside of the docker images
docker run -it --entrypoint /bin/bash ghcr.io/databloom-ai/tpch-docker:main