This project provides a setup for loading OMOP CDM data into both ClickHouse and PostgreSQL databases.
- Docker and Docker Compose
- Python 3.8+ (for running the ID rewriting script)
# Start both ClickHouse and PostgreSQL
docker-compose up -d
# Check if services are running
docker-compose ps

The data loading process is automatic when the containers start:
- ClickHouse: Data is loaded automatically via the clickhouse-init.xml configuration
- PostgreSQL: Data is loaded automatically via the init.sql initialization script
# Connect to ClickHouse
docker exec -it omop_clickhouse clickhouse-client --database=omop
# Or connect from host
clickhouse-client --host=localhost --port=9000 --database=omop

# Connect to PostgreSQL
docker exec -it omop_postgres psql -U omop_user -d omop
# Or connect from host
psql -h localhost -p 5432 -U omop_user -d omop

If you want to rewrite IDs to use smaller sequential integers while maintaining referential integrity:
python3 rewrite_ids.py

This will create a new directory omop_data_csv_rewritten/ with the processed files.
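
As an illustration of what rewriting IDs while maintaining referential integrity can look like, here is a minimal sketch. It is not the code in rewrite_ids.py; the helper names `build_id_map` and `rewrite_column` and the sample rows are hypothetical. The idea is to build one mapping per key (here `person_id`) and apply the same mapping to every table that references it, so that foreign keys keep pointing at the same rows.

```python
def build_id_map(original_ids):
    """Assign 1, 2, 3, ... to each distinct ID in first-seen order."""
    id_map = {}
    for original in original_ids:
        if original not in id_map:
            id_map[original] = len(id_map) + 1
    return id_map


def rewrite_column(rows, column, id_map):
    """Return copies of the rows with `column` replaced via id_map."""
    return [{**row, column: id_map[row[column]]} for row in rows]


# Tiny in-memory stand-ins for two tables (hypothetical sample data).
person = [{"person_id": "9000000017"}, {"person_id": "9000000042"}]
visits = [
    {"visit_occurrence_id": "555001", "person_id": "9000000042"},
    {"visit_occurrence_id": "555002", "person_id": "9000000017"},
]

# One map for person_id, reused for the primary key and the foreign key.
person_map = build_id_map(row["person_id"] for row in person)
new_person = rewrite_column(person, "person_id", person_map)
new_visits = rewrite_column(visits, "person_id", person_map)
```

Because both tables go through the same `person_map`, a visit that referenced person 9000000042 before the rewrite references that same person's new ID afterwards.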
If you need to shift dates in the OMOP data:
python3 shift_omop_dates.py

Both databases use the standard OMOP CDM schema with the following main tables:
- person - Patient demographics
- visit_occurrence - Healthcare visits
- condition_occurrence - Diagnoses
- drug_exposure - Medications
- procedure_occurrence - Procedures
- measurement - Lab results and measurements
- observation - Clinical observations
- death - Mortality data
- And many more...
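
Several of these tables carry date columns (for example `visit_start_date` and `visit_end_date` in visit_occurrence), which is what the date-shifting step mentioned earlier operates on. The sketch below shows one simple way such a shift could work. It is an assumption for illustration, not the logic of shift_omop_dates.py: it applies a fixed offset in days to ISO-formatted date strings, and the helper names `shift_date` and `shift_row` are hypothetical.

```python
from datetime import date, timedelta


def shift_date(value, days):
    """Shift an ISO date string (YYYY-MM-DD) by `days`; empty values pass through."""
    if not value:
        return value
    return (date.fromisoformat(value) + timedelta(days=days)).isoformat()


def shift_row(row, date_columns, days):
    """Return a copy of the row with each listed date column shifted."""
    shifted = dict(row)
    for column in date_columns:
        shifted[column] = shift_date(row[column], days)
    return shifted


# Hypothetical visit_occurrence row.
visit = {
    "visit_occurrence_id": "1",
    "visit_start_date": "2020-01-31",
    "visit_end_date": "2020-02-02",
}
shifted = shift_row(visit, ["visit_start_date", "visit_end_date"], days=-30)
# shifted["visit_start_date"] is "2020-01-01"; the visit still lasts two days.
```

Using the same offset for every date in a row keeps intervals between events unchanged, which is usually the property a date shift needs to preserve.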
ClickHouse:
- Port: 8123 (HTTP), 9000 (Native)
- Database: omop
- User: default
- Password: default

PostgreSQL:
- Port: 5432
- Database: omop
- User: omop_user
- Password: omop_password
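
Before connecting with a client, it can help to confirm that the ports listed above are reachable from the host. The following is a minimal sketch using only the Python standard library; the function names `port_is_open` and `report` are hypothetical, and it assumes the containers publish their ports on localhost.

```python
import socket

# Ports taken from the connection details above.
SERVICES = {
    "ClickHouse HTTP": 8123,
    "ClickHouse native": 9000,
    "PostgreSQL": 5432,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host="localhost"):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}
```

Calling `report()` returns a dictionary such as one entry per service with a boolean value. An open port only shows that something is listening; it does not verify credentials or that the data load has finished.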
# Check ClickHouse logs
docker-compose logs clickhouse
# Check PostgreSQL logs
docker-compose logs postgres

# Restart all services
docker-compose restart
# Restart specific service
docker-compose restart postgres

# Rebuild and restart
docker-compose down
docker-compose build --no-cache
docker-compose up -d

The OMOP data should be placed in the omop_data_csv/ directory as gzipped CSV files with the following naming convention:
- person.csv.gz
- visit_occurrence.csv.gz
- condition_occurrence.csv.gz
- etc.
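
To check that a data directory follows this naming convention before starting the containers, a short script along these lines may be useful. It is a sketch under assumptions: the helper names are hypothetical, and it assumes each file is comma-delimited, UTF-8 encoded, and starts with a header row.

```python
import csv
import gzip
from pathlib import Path

SUFFIX = ".csv.gz"


def table_name(path):
    """Derive the table name from a file name, e.g. person.csv.gz -> person."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"{name} does not follow the <table>{SUFFIX} convention")
    return name[: -len(SUFFIX)]


def read_header(path):
    """Return the first row of a gzipped CSV file as a list of column names."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def list_tables(data_dir="omop_data_csv"):
    """Return the sorted table names found in the data directory."""
    return sorted(table_name(p) for p in Path(data_dir).glob("*" + SUFFIX))
```

For example, `list_tables()` run from the project root would return the names of the tables found in omop_data_csv/, and `read_header("omop_data_csv/person.csv.gz")` would return that file's column names.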
See LICENSE file for details.