Commit db13875

add all other configs for now

1 parent 892b5e7 commit db13875

File tree

2 files changed (+354, -4 lines)

docs/website/docs/reference/configuration.md

Lines changed: 338 additions & 0 deletions

@@ -864,3 +864,341 @@ None
* **`url`** - _str_ <br />
* **`api_key`** - _str_ <br />
* **`additional_headers`** - _typing.Dict[str, str]_ <br />

## All other Configurations

### BaseConfiguration
None

### ConfigProvidersConfiguration
None

* **`enable_airflow_secrets`** - _bool_ <br />
* **`enable_google_secrets`** - _bool_ <br />
* **`airflow_secrets`** - _[VaultProviderConfiguration](#vaultproviderconfiguration)_ <br />
* **`google_secrets`** - _[VaultProviderConfiguration](#vaultproviderconfiguration)_ <br />

### ConfigSectionContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`pipeline_name`** - _str_ <br />
* **`sections`** - _typing.Tuple[str, ...]_ <br />
* **`merge_style`** - _typing.Callable[[dlt.common.configuration.specs.config_section_context.ConfigSectionContext, dlt.common.configuration.specs.config_section_context.ConfigSectionContext], NoneType]_ <br />
* **`source_state_key`** - _str_ <br />

### ContainerInjectableContext
Base class for all configurations that may be injected from a Container. An injectable configuration is called a context.

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context

### CsvFormatConfiguration
None

* **`delimiter`** - _str_ <br />
* **`include_header`** - _bool_ <br />
* **`quoting`** - _quote_all | quote_needed_ <br />
* **`on_error_continue`** - _bool_ <br />
* **`encoding`** - _str_ <br />

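These options tune the csv file writer. A minimal sketch of setting them in `config.toml`, assuming the `[normalize.data_writer]` section dlt uses for file-format options (values are illustrative):

```toml
# sketch: csv writer options; illustrative values
[normalize.data_writer]
delimiter = "|"
include_header = false
quoting = "quote_all"
```
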
### DBTRunnerConfiguration
None

* **`package_location`** - _str_ <br />
* **`package_repository_branch`** - _str_ <br />
* **`package_repository_ssh_key`** - _str_ <br />
* **`package_profiles_dir`** - _str_ <br />
* **`package_profile_name`** - _str_ <br />
* **`auto_full_refresh_when_out_of_sync`** - _bool_ <br />
* **`package_additional_vars`** - _typing.Mapping[str, typing.Any]_ <br />
* **`runtime`** - _[RuntimeConfiguration](#runtimeconfiguration)_ <br />

### DestinationCapabilitiesContext
Injectable destination capabilities required by many pipeline stages, e.g. normalize.

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`preferred_loader_file_format`** - _jsonl | typed-jsonl | insert_values | parquet | csv | reference | model_ <br />
* **`supported_loader_file_formats`** - _typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']]_ <br />
* **`loader_file_format_selector`** - _class 'dlt.common.destination.capabilities.LoaderFileFormatSelector'_ <br /> Callable that adapts `preferred_loader_file_format` and `supported_loader_file_formats` at runtime.
* **`preferred_table_format`** - _iceberg | delta | hive | native_ <br />
* **`supported_table_formats`** - _typing.Sequence[typing.Literal['iceberg', 'delta', 'hive', 'native']]_ <br />
* **`type_mapper`** - _typing.Type[dlt.common.destination.capabilities.DataTypeMapper]_ <br />
* **`recommended_file_size`** - _int_ <br /> Recommended file size in bytes when writing extract/load files
* **`preferred_staging_file_format`** - _jsonl | typed-jsonl | insert_values | parquet | csv | reference | model_ <br />
* **`supported_staging_file_formats`** - _typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']]_ <br />
* **`format_datetime_literal`** - _typing.Callable[..., str]_ <br />
* **`escape_identifier`** - _typing.Callable[[str], str]_ <br />
* **`escape_literal`** - _typing.Callable[[typing.Any], typing.Any]_ <br />
* **`casefold_identifier`** - _typing.Callable[[str], str]_ <br /> Casing function applied by the destination to represent case insensitive identifiers.
* **`has_case_sensitive_identifiers`** - _bool_ <br /> Tells if destination supports case sensitive identifiers
* **`decimal_precision`** - _typing.Tuple[int, int]_ <br />
* **`wei_precision`** - _typing.Tuple[int, int]_ <br />
* **`max_identifier_length`** - _int_ <br />
* **`max_column_identifier_length`** - _int_ <br />
* **`max_query_length`** - _int_ <br />
* **`is_max_query_length_in_bytes`** - _bool_ <br />
* **`max_text_data_type_length`** - _int_ <br />
* **`is_max_text_data_type_length_in_bytes`** - _bool_ <br />
* **`supports_transactions`** - _bool_ <br />
* **`supports_ddl_transactions`** - _bool_ <br />
* **`naming_convention`** - _str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | class 'module'_ <br />
* **`alter_add_multi_column`** - _bool_ <br />
* **`supports_create_table_if_not_exists`** - _bool_ <br />
* **`supports_truncate_command`** - _bool_ <br />
* **`schema_supports_numeric_precision`** - _bool_ <br />
* **`timestamp_precision`** - _int_ <br />
* **`max_rows_per_insert`** - _int_ <br />
* **`insert_values_writer_type`** - _str_ <br />
* **`supports_multiple_statements`** - _bool_ <br />
* **`supports_clone_table`** - _bool_ <br /> Destination supports CREATE TABLE ... CLONE ... statements
* **`max_table_nesting`** - _int_ <br /> Allows a destination to overwrite max_table_nesting from source
* **`supported_merge_strategies`** - _typing.Sequence[typing.Literal['delete-insert', 'scd2', 'upsert']]_ <br />
* **`merge_strategies_selector`** - _class 'dlt.common.destination.capabilities.MergeStrategySelector'_ <br />
* **`supported_replace_strategies`** - _typing.Sequence[typing.Literal['truncate-and-insert', 'insert-from-staging', 'staging-optimized']]_ <br />
* **`replace_strategies_selector`** - _class 'dlt.common.destination.capabilities.ReplaceStrategySelector'_ <br />
* **`max_parallel_load_jobs`** - _int_ <br /> The destination can set the maximum number of parallel load jobs being executed
* **`loader_parallelism_strategy`** - _parallel | table-sequential | sequential_ <br /> The destination can override the parallelism strategy
* **`max_query_parameters`** - _int_ <br /> The maximum number of parameters that can be supplied in a single parametrized query
* **`supports_native_boolean`** - _bool_ <br /> The destination supports a native boolean type, otherwise bool columns are usually stored as integers
* **`supports_nested_types`** - _bool_ <br /> Tells if destination can write nested types, currently only destinations storing parquet are supported
* **`enforces_nulls_on_alter`** - _bool_ <br /> Tells if destination enforces null constraints when adding NOT NULL columns to existing tables
* **`sqlglot_dialect`** - _str_ <br /> The SQL dialect used by sqlglot to transpile a query to match the destination syntax.

### FilesystemConfiguration
A configuration defining filesystem location and access credentials.

When the configuration is resolved, `bucket_url` is used to extract a protocol and request the corresponding credentials class:
* s3
* gs, gcs
* az, abfs, adl, abfss, azure
* file, memory
* gdrive
* sftp

* **`bucket_url`** - _str_ <br />
* **`credentials`** - _[AwsCredentials](#awscredentials) | [GcpServiceAccountCredentials](#gcpserviceaccountcredentials) | [AzureCredentialsWithoutDefaults](#azurecredentialswithoutdefaults) | [AzureServicePrincipalCredentialsWithoutDefaults](#azureserviceprincipalcredentialswithoutdefaults) | [AzureCredentials](#azurecredentials) | [AzureServicePrincipalCredentials](#azureserviceprincipalcredentials) | [GcpOAuthCredentials](#gcpoauthcredentials) | [SFTPCredentials](#sftpcredentials)_ <br />
* **`read_only`** - _bool_ <br /> Indicates read-only filesystem access. Will enable caching
* **`kwargs`** - _typing.Dict[str, typing.Any]_ <br /> Additional arguments passed to the fsspec constructor, e.g. `dict(use_ssl=True)` for s3fs
* **`client_kwargs`** - _typing.Dict[str, typing.Any]_ <br /> Additional arguments passed to the underlying fsspec native client, e.g. `dict(verify="public.crt")` for botocore
* **`deltalake_storage_options`** - _typing.Dict[str, typing.Any]_ <br />
* **`deltalake_configuration`** - _typing.Dict[str, typing.Optional[str]]_ <br />

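For example, pointing the filesystem destination at S3 with explicit credentials; a minimal sketch for `secrets.toml`, with placeholder bucket and keys:

```toml
# sketch: S3 filesystem destination; placeholder values
[destination.filesystem]
bucket_url = "s3://my-bucket/data"  # the s3 protocol selects AwsCredentials

[destination.filesystem.credentials]
aws_access_key_id = "AKIA..."      # placeholder
aws_secret_access_key = "..."      # placeholder
```
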
### Incremental
Adds incremental extraction for a resource by storing a cursor value in persistent state.

The cursor could for example be a timestamp for when the record was created, and you can use this to load only
new records created since the last run of the pipeline.

To use this, the resource function should have an argument either type annotated with `Incremental` or with a default `Incremental` instance.
For example:

>>> @dlt.resource(primary_key='id')
>>> def some_data(created_at=dlt.sources.incremental('created_at', '2023-01-01T00:00:00Z')):
>>>    yield from request_data(created_after=created_at.last_value)

When the resource has a `primary_key` specified, it is used to deduplicate overlapping items with the same cursor value.

Alternatively you can use this class as a transform step and add it to any resource. For example:
>>> @dlt.resource
>>> def some_data():
>>>     last_value = dlt.sources.incremental.from_existing_state("some_data", "item.ts")
>>>     ...
>>>
>>> r = some_data().add_step(dlt.sources.incremental("item.ts", initial_value=now, primary_key="delta"))
>>> info = p.run(r, destination="duckdb")

Args:
    cursor_path: The name or a JSON path to a cursor field. Uses the same names of fields as in your JSON document, before they are normalized to store in the database.
    initial_value: Optional value used for `last_value` when no state is available, e.g. on the first run of the pipeline. If not provided, `last_value` will be `None` on the first run.
    last_value_func: Callable used to determine which cursor value to save in state. It is called with a list of the stored state value and all cursor vals from currently processing items. Default is `max`.
    primary_key: Optional primary key used to deduplicate data. If not provided, a primary key defined by the resource will be used. Pass a tuple to define a compound key. Pass an empty tuple to disable unique checks.
    end_value: Optional value used to load a limited range of records between `initial_value` and `end_value`.
        Use in conjunction with `initial_value`, e.g. to load records from a given month: `incremental(initial_value="2022-01-01T00:00:00Z", end_value="2022-02-01T00:00:00Z")`.
        Note: when this is set, the incremental filtering is stateless and `initial_value` always supersedes any previous incremental value in state.
    row_order: Declares that the data source returns rows in descending (desc) or ascending (asc) order as defined by `last_value_func`. If the row order is known, the Incremental class
        is able to stop requesting new rows by closing the pipe generator. This prevents getting more data from the source. Defaults to None, which means that
        the row order is not known.
    allow_external_schedulers: If set to True, allows dlt to look for external schedulers from which it will take `initial_value` and `end_value`, resulting in loading only
        the specified range of data. Currently the Airflow scheduler is detected: `data_interval_start` and `data_interval_end` are taken from the context and passed to the Incremental class.
        The values passed explicitly to Incremental will be ignored.
        Note that if a logical "end date" is present, then `end_value` will also be set, which means that resource state is not used and exactly this range of dates will be loaded.
    on_cursor_value_missing: Specify what happens when the cursor_path does not exist in a record or a record has `None` at the cursor_path: raise, include, exclude.
    lag: Optional value used to define a lag or attribution window. For datetime cursors, this is interpreted as seconds. For other types, it uses the + or - operator depending on the last_value_func.
    range_start: Decide whether the incremental filtering range is `open` or `closed` on the start value side. Default is `closed`.
        Setting this to `open` means that items with the same cursor value as the last value from the previous run (or `initial_value`) are excluded from the result.
        The `open` range disables deduplication logic, so it can serve as an optimization when you know cursors don't overlap between pipeline runs.
    range_end: Decide whether the incremental filtering range is `open` or `closed` on the end value side. Default is `open` (exact `end_value` is excluded).
        Setting this to `closed` means that items with the exact same cursor value as the `end_value` are included in the result.

* **`cursor_path`** - _str_ <br />
* **`initial_value`** - _typing.Any_ <br />
* **`end_value`** - _typing.Any_ <br />
* **`row_order`** - _asc | desc_ <br />
* **`allow_external_schedulers`** - _bool_ <br />
* **`on_cursor_value_missing`** - _raise | include | exclude_ <br />
* **`lag`** - _float_ <br />
* **`range_start`** - _open | closed_ <br />
* **`range_end`** - _open | closed_ <br />

### ItemsNormalizerConfiguration
None

* **`add_dlt_id`** - _bool_ <br /> When true, items to be normalized will have `_dlt_id` column added with a unique ID for each row.
* **`add_dlt_load_id`** - _bool_ <br /> When true, items to be normalized will have `_dlt_load_id` column added with the current load ID.

### LanceDBClientOptions
None

* **`max_retries`** - _int_ <br /> `EmbeddingFunction` class wraps the calls for source and query embedding

### LoadPackageStateInjectableContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`storage`** - _class 'dlt.common.storages.load_package.PackageStorage'_ <br />
* **`load_id`** - _str_ <br />

### LoadStorageConfiguration
None

* **`load_volume_path`** - _str_ <br />
* **`delete_completed_jobs`** - _bool_ <br />

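A minimal sketch of cleaning up completed job files, assuming the `[load]` section these options resolve from:

```toml
# sketch: remove job files once a load package completes
[load]
delete_completed_jobs = true
```
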
### LoaderConfiguration
None

* **`pool_type`** - _process | thread | none_ <br /> Type of pool to run; must be set in derived configs
* **`start_method`** - _str_ <br /> Start method for the pool (typically process). None means the system default
* **`workers`** - _int_ <br /> How many parallel loads can be executed
* **`run_sleep`** - _float_ <br /> How long to sleep between runs with workload, in seconds
* **`parallelism_strategy`** - _parallel | table-sequential | sequential_ <br /> Which parallelism strategy to use at load time
* **`raise_on_failed_jobs`** - _bool_ <br /> When True, raises immediately on terminally failed jobs
* **`raise_on_max_retries`** - _int_ <br /> When greater than 0, raises when a job reaches this number of retries
* **`truncate_staging_dataset`** - _bool_ <br />

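These options control load-stage parallelism and failure handling; a sketch for `config.toml` with illustrative values:

```toml
# sketch: four parallel load jobs, fail fast on terminal job errors
[load]
workers = 4
raise_on_failed_jobs = true
```
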
### NormalizeConfiguration
None

* **`pool_type`** - _process | thread | none_ <br /> Type of pool to run; must be set in derived configs
* **`start_method`** - _str_ <br /> Start method for the pool (typically process). None means the system default
* **`workers`** - _int_ <br /> How many threads/processes in the pool
* **`run_sleep`** - _float_ <br /> How long to sleep between runs with workload, in seconds
* **`destination_capabilities`** - _[DestinationCapabilitiesContext](#destinationcapabilitiescontext)_ <br />
* **`json_normalizer`** - _[ItemsNormalizerConfiguration](#itemsnormalizerconfiguration)_ <br />
* **`parquet_normalizer`** - _[ItemsNormalizerConfiguration](#itemsnormalizerconfiguration)_ <br />
* **`model_normalizer`** - _[ItemsNormalizerConfiguration](#itemsnormalizerconfiguration)_ <br />

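The normalize stage exposes the same pool options; for example, a sketch enabling three parallel workers:

```toml
# sketch: normalize in 3 parallel processes
[normalize]
workers = 3
```
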
### NormalizeStorageConfiguration
None

* **`normalize_volume_path`** - _str_ <br />

### ParquetFormatConfiguration
None

* **`flavor`** - _str_ <br />
* **`version`** - _str_ <br />
* **`data_page_size`** - _int_ <br />
* **`timestamp_timezone`** - _str_ <br />
* **`row_group_size`** - _int_ <br />
* **`coerce_timestamps`** - _s | ms | us | ns_ <br />
* **`allow_truncated_timestamps`** - _bool_ <br />

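A sketch of tuning the parquet writer in `config.toml`, again under the data writer section (values are illustrative):

```toml
# sketch: parquet writer options; illustrative values
[normalize.data_writer]
flavor = "spark"
version = "2.4"
data_page_size = 1048576
timestamp_timezone = "Europe/Berlin"
```
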
### PipelineContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context

### PoolRunnerConfiguration
None

* **`pool_type`** - _process | thread | none_ <br /> Type of pool to run; must be set in derived configs
* **`start_method`** - _str_ <br /> Start method for the pool (typically process). None means the system default
* **`workers`** - _int_ <br /> How many threads/processes in the pool
* **`run_sleep`** - _float_ <br /> How long to sleep between runs with workload, in seconds

### QdrantClientOptions
None

* **`port`** - _int_ <br />
* **`grpc_port`** - _int_ <br />
* **`prefer_grpc`** - _bool_ <br />
* **`https`** - _bool_ <br />
* **`prefix`** - _str_ <br />
* **`timeout`** - _int_ <br />
* **`host`** - _str_ <br />

### RuntimeConfiguration
None

* **`pipeline_name`** - _str_ <br />
* **`sentry_dsn`** - _str_ <br />
* **`slack_incoming_hook`** - _str_ <br />
* **`dlthub_telemetry`** - _bool_ <br />
* **`dlthub_telemetry_endpoint`** - _str_ <br />
* **`dlthub_telemetry_segment_write_key`** - _str_ <br />
* **`log_format`** - _str_ <br />
* **`log_level`** - _str_ <br />
* **`request_timeout`** - _float_ <br /> Timeout for http requests
* **`request_max_attempts`** - _int_ <br /> Max retry attempts for http clients
* **`request_backoff_factor`** - _float_ <br /> Multiplier applied to exponential retry delay for http requests
* **`request_max_retry_delay`** - _float_ <br /> Maximum delay between http request retries
* **`config_files_storage_path`** - _str_ <br /> Platform connection
* **`dlthub_dsn`** - _str_ <br />

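Runtime options live under the `[runtime]` section; a minimal sketch with a placeholder Sentry DSN:

```toml
# sketch: logging and telemetry knobs; placeholder DSN
[runtime]
log_level = "INFO"
dlthub_telemetry = false
sentry_dsn = "https://<key>@sentry.io/<project>"
```
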
### SchemaConfiguration
None

* **`naming`** - _str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | class 'module'_ <br />
* **`json_normalizer`** - _typing.Dict[str, typing.Any]_ <br />
* **`allow_identifier_change_on_table_with_data`** - _bool_ <br />
* **`use_break_path_on_normalize`** - _bool_ <br /> Post 1.4.0, allows table and column names that contain table separators

### SchemaStorageConfiguration
None

* **`schema_volume_path`** - _str_ <br />
* **`import_schema_path`** - _str_ <br />
* **`export_schema_path`** - _str_ <br />
* **`external_schema_format`** - _json | yaml_ <br />
* **`external_schema_format_remove_defaults`** - _bool_ <br />

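Schema import/export paths can be set in `config.toml`; a sketch assuming the top-level `import_schema_path`/`export_schema_path` keys from dlt's schema import/export guide (paths are placeholders):

```toml
# sketch: export and import schema folders; placeholder paths
export_schema_path = "schemas/export"
import_schema_path = "schemas/import"
```
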
### SourceInjectableContext
A context containing the source schema, present when a dlt.resource decorated function is executed

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`source`** - _class 'dlt.extract.source.DltSource'_ <br />

### SourceSchemaInjectableContext
A context containing the source schema, present when a dlt.source/resource decorated function is executed

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`schema`** - _class 'dlt.common.schema.schema.Schema'_ <br />

### StateInjectableContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`state`** - _class 'dlt.common.pipeline.TPipelineState'_ <br />

### TransformationConfiguration
Configuration for a transformation

* **`buffer_max_items`** - _int_ <br />

### VaultProviderConfiguration
None

* **`only_secrets`** - _bool_ <br />
* **`only_toml_fragments`** - _bool_ <br />
* **`list_secrets`** - _bool_ <br />
