-
Notifications
You must be signed in to change notification settings - Fork 1
Configuration
a.metwalli edited this page Feb 6, 2026
·
1 revision
This page is a practical summary. The full reference is in docs/config.md and the example config is example/config.yml.
A tiny, readable example for onboarding:
version: "0.1"
report:
path: "report"
entities:
- name: "animal"
source:
format: "csv"
path: "example/in/animal"
sink:
write_mode: "append"
accepted:
format: "parquet"
path: "example/out/accepted/animal"
rejected:
format: "csv"
path: "example/out/rejected/animal"
policy:
severity: "reject"
schema:
normalize_columns:
enabled: true
strategy: "snake_case"
columns:
- name: "animal_id"
type: "string"
nullable: false
unique: true
- name: "species"
type: "string"
- name: "age"
type: "int64"For a complete, commented config that covers domains, storages, cloud paths, write modes, mismatch policies, and multiple formats, see:
docs/wiki/Full-Config-Example.md
-
source: input format, path, and options. -
sink: accepted/rejected outputs and write mode. -
policy:warn,reject, orabort. -
schema: column types, nullability, unique checks, optional normalization.
write_mode controls how accepted and rejected datasets are written:
-
overwrite(default): replaces existing dataset parts. -
append: adds new dataset parts.
The mode is defined at sink.write_mode and applies to both accepted and rejected outputs.
Cloud storage definitions live under storages. Use source.storage and sink.*.storage to select a client. See docs/support-matrix.md for supported formats and cloud behavior.