Commit db13875

add all other configs for now

1 parent 892b5e7 commit db13875

File tree

2 files changed (+354, -4 lines)

docs/website/docs/reference/configuration.md

Lines changed: 338 additions & 0 deletions

@@ -864,3 +864,341 @@ None
* **`url`** - _str_ <br />
* **`api_key`** - _str_ <br />
* **`additional_headers`** - _typing.Dict[str, str]_ <br />

## All other Configurations

### BaseConfiguration
None

### ConfigProvidersConfiguration
None

* **`enable_airflow_secrets`** - _bool_ <br />
* **`enable_google_secrets`** - _bool_ <br />
* **`airflow_secrets`** - _[VaultProviderConfiguration](#vaultproviderconfiguration)_ <br />
* **`google_secrets`** - _[VaultProviderConfiguration](#vaultproviderconfiguration)_ <br />

### ConfigSectionContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`pipeline_name`** - _str_ <br />
* **`sections`** - _typing.Tuple[str, ...]_ <br />
* **`merge_style`** - _typing.Callable[[dlt.common.configuration.specs.config_section_context.ConfigSectionContext, dlt.common.configuration.specs.config_section_context.ConfigSectionContext], NoneType]_ <br />
* **`source_state_key`** - _str_ <br />

### ContainerInjectableContext
Base class for all configurations that may be injected from a Container. An injectable configuration is called a context.

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context

### CsvFormatConfiguration
None

* **`delimiter`** - _str_ <br />
* **`include_header`** - _bool_ <br />
* **`quoting`** - _quote_all | quote_needed_ <br />
* **`on_error_continue`** - _bool_ <br />
* **`encoding`** - _str_ <br />

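These options tune the csv file writer. A minimal sketch of setting them in `config.toml`, assuming the `[normalize.data_writer]` section dlt uses for file-format options (values are illustrative):

```toml
# sketch: csv writer options; illustrative values
[normalize.data_writer]
delimiter = "|"
include_header = false
quoting = "quote_all"
```
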
### DBTRunnerConfiguration
None

* **`package_location`** - _str_ <br />
* **`package_repository_branch`** - _str_ <br />
* **`package_repository_ssh_key`** - _str_ <br />
* **`package_profiles_dir`** - _str_ <br />
* **`package_profile_name`** - _str_ <br />
* **`auto_full_refresh_when_out_of_sync`** - _bool_ <br />
* **`package_additional_vars`** - _typing.Mapping[str, typing.Any]_ <br />
* **`runtime`** - _[RuntimeConfiguration](#runtimeconfiguration)_ <br />

### DestinationCapabilitiesContext
Injectable destination capabilities required by many pipeline stages, e.g. normalize.

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`preferred_loader_file_format`** - _jsonl | typed-jsonl | insert_values | parquet | csv | reference | model_ <br />
* **`supported_loader_file_formats`** - _typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']]_ <br />
* **`loader_file_format_selector`** - _class 'dlt.common.destination.capabilities.LoaderFileFormatSelector'_ <br /> Callable that adapts `preferred_loader_file_format` and `supported_loader_file_formats` at runtime.
* **`preferred_table_format`** - _iceberg | delta | hive | native_ <br />
* **`supported_table_formats`** - _typing.Sequence[typing.Literal['iceberg', 'delta', 'hive', 'native']]_ <br />
* **`type_mapper`** - _typing.Type[dlt.common.destination.capabilities.DataTypeMapper]_ <br />
* **`recommended_file_size`** - _int_ <br /> Recommended file size in bytes when writing extract/load files
* **`preferred_staging_file_format`** - _jsonl | typed-jsonl | insert_values | parquet | csv | reference | model_ <br />
* **`supported_staging_file_formats`** - _typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']]_ <br />
* **`format_datetime_literal`** - _typing.Callable[..., str]_ <br />
* **`escape_identifier`** - _typing.Callable[[str], str]_ <br />
* **`escape_literal`** - _typing.Callable[[typing.Any], typing.Any]_ <br />
* **`casefold_identifier`** - _typing.Callable[[str], str]_ <br /> Casing function applied by the destination to represent case insensitive identifiers.
* **`has_case_sensitive_identifiers`** - _bool_ <br /> Tells if destination supports case sensitive identifiers
* **`decimal_precision`** - _typing.Tuple[int, int]_ <br />
* **`wei_precision`** - _typing.Tuple[int, int]_ <br />
* **`max_identifier_length`** - _int_ <br />
* **`max_column_identifier_length`** - _int_ <br />
* **`max_query_length`** - _int_ <br />
* **`is_max_query_length_in_bytes`** - _bool_ <br />
* **`max_text_data_type_length`** - _int_ <br />
* **`is_max_text_data_type_length_in_bytes`** - _bool_ <br />
* **`supports_transactions`** - _bool_ <br />
* **`supports_ddl_transactions`** - _bool_ <br />
* **`naming_convention`** - _str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | class 'module'_ <br />
* **`alter_add_multi_column`** - _bool_ <br />
* **`supports_create_table_if_not_exists`** - _bool_ <br />
* **`supports_truncate_command`** - _bool_ <br />
* **`schema_supports_numeric_precision`** - _bool_ <br />
* **`timestamp_precision`** - _int_ <br />
* **`max_rows_per_insert`** - _int_ <br />
* **`insert_values_writer_type`** - _str_ <br />
* **`supports_multiple_statements`** - _bool_ <br />
* **`supports_clone_table`** - _bool_ <br /> Destination supports CREATE TABLE ... CLONE ... statements
* **`max_table_nesting`** - _int_ <br /> Allows a destination to overwrite max_table_nesting from source
* **`supported_merge_strategies`** - _typing.Sequence[typing.Literal['delete-insert', 'scd2', 'upsert']]_ <br />
* **`merge_strategies_selector`** - _class 'dlt.common.destination.capabilities.MergeStrategySelector'_ <br />
* **`supported_replace_strategies`** - _typing.Sequence[typing.Literal['truncate-and-insert', 'insert-from-staging', 'staging-optimized']]_ <br />
* **`replace_strategies_selector`** - _class 'dlt.common.destination.capabilities.ReplaceStrategySelector'_ <br />
* **`max_parallel_load_jobs`** - _int_ <br /> The destination can set the maximum number of parallel load jobs being executed
* **`loader_parallelism_strategy`** - _parallel | table-sequential | sequential_ <br /> The destination can override the parallelism strategy
* **`max_query_parameters`** - _int_ <br /> The maximum number of parameters that can be supplied in a single parametrized query
* **`supports_native_boolean`** - _bool_ <br /> The destination supports a native boolean type, otherwise bool columns are usually stored as integers
* **`supports_nested_types`** - _bool_ <br /> Tells if destination can write nested types, currently only destinations storing parquet are supported
* **`enforces_nulls_on_alter`** - _bool_ <br /> Tells if destination enforces null constraints when adding NOT NULL columns to existing tables
* **`sqlglot_dialect`** - _str_ <br /> The SQL dialect used by sqlglot to transpile a query to match the destination syntax.

### FilesystemConfiguration
A configuration defining filesystem location and access credentials.

When the configuration is resolved, `bucket_url` is used to extract a protocol and request the corresponding credentials class:
* s3
* gs, gcs
* az, abfs, adl, abfss, azure
* file, memory
* gdrive
* sftp

* **`bucket_url`** - _str_ <br />
* **`credentials`** - _[AwsCredentials](#awscredentials) | [GcpServiceAccountCredentials](#gcpserviceaccountcredentials) | [AzureCredentialsWithoutDefaults](#azurecredentialswithoutdefaults) | [AzureServicePrincipalCredentialsWithoutDefaults](#azureserviceprincipalcredentialswithoutdefaults) | [AzureCredentials](#azurecredentials) | [AzureServicePrincipalCredentials](#azureserviceprincipalcredentials) | [GcpOAuthCredentials](#gcpoauthcredentials) | [SFTPCredentials](#sftpcredentials)_ <br />
* **`read_only`** - _bool_ <br /> Indicates read-only filesystem access. Will enable caching
* **`kwargs`** - _typing.Dict[str, typing.Any]_ <br /> Additional arguments passed to the fsspec constructor, e.g. `dict(use_ssl=True)` for s3fs
* **`client_kwargs`** - _typing.Dict[str, typing.Any]_ <br /> Additional arguments passed to the underlying fsspec native client, e.g. `dict(verify="public.crt")` for botocore
* **`deltalake_storage_options`** - _typing.Dict[str, typing.Any]_ <br />
* **`deltalake_configuration`** - _typing.Dict[str, typing.Optional[str]]_ <br />

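For example, pointing the filesystem destination at S3 with explicit credentials; a minimal sketch for `secrets.toml`, with placeholder bucket and keys:

```toml
# sketch: S3 filesystem destination; placeholder values
[destination.filesystem]
bucket_url = "s3://my-bucket/data"  # the s3 protocol selects AwsCredentials

[destination.filesystem.credentials]
aws_access_key_id = "AKIA..."      # placeholder
aws_secret_access_key = "..."      # placeholder
```
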
### Incremental
Adds incremental extraction for a resource by storing a cursor value in persistent state.

The cursor could for example be a timestamp for when the record was created, and you can use this to load only
new records created since the last run of the pipeline.

To use this, the resource function should have an argument either type annotated with `Incremental` or with a default `Incremental` instance.
For example:

>>> @dlt.resource(primary_key='id')
>>> def some_data(created_at=dlt.sources.incremental('created_at', '2023-01-01T00:00:00Z')):
>>>    yield from request_data(created_after=created_at.last_value)

When the resource has a `primary_key` specified, it is used to deduplicate overlapping items with the same cursor value.

Alternatively you can use this class as a transform step and add it to any resource. For example:
>>> @dlt.resource
>>> def some_data():
>>>     last_value = dlt.sources.incremental.from_existing_state("some_data", "item.ts")
>>>     ...
>>>
>>> r = some_data().add_step(dlt.sources.incremental("item.ts", initial_value=now, primary_key="delta"))
>>> info = p.run(r, destination="duckdb")

Args:
    cursor_path: The name or a JSON path to a cursor field. Uses the same names of fields as in your JSON document, before they are normalized to store in the database.
    initial_value: Optional value used for `last_value` when no state is available, e.g. on the first run of the pipeline. If not provided, `last_value` will be `None` on the first run.
    last_value_func: Callable used to determine which cursor value to save in state. It is called with a list of the stored state value and all cursor vals from currently processing items. Default is `max`.
    primary_key: Optional primary key used to deduplicate data. If not provided, a primary key defined by the resource will be used. Pass a tuple to define a compound key. Pass an empty tuple to disable unique checks.
    end_value: Optional value used to load a limited range of records between `initial_value` and `end_value`.
        Use in conjunction with `initial_value`, e.g. to load records from a given month: `incremental(initial_value="2022-01-01T00:00:00Z", end_value="2022-02-01T00:00:00Z")`.
        Note: when this is set, the incremental filtering is stateless and `initial_value` always supersedes any previous incremental value in state.
    row_order: Declares that the data source returns rows in descending (desc) or ascending (asc) order as defined by `last_value_func`. If the row order is known, the Incremental class
        is able to stop requesting new rows by closing the pipe generator. This prevents getting more data from the source. Defaults to None, which means that
        the row order is not known.
    allow_external_schedulers: If set to True, allows dlt to look for external schedulers from which it will take `initial_value` and `end_value`, resulting in loading only
        the specified range of data. Currently the Airflow scheduler is detected: `data_interval_start` and `data_interval_end` are taken from the context and passed to the Incremental class.
        The values passed explicitly to Incremental will be ignored.
        Note that if a logical "end date" is present, then `end_value` will also be set, which means that resource state is not used and exactly this range of dates will be loaded.
    on_cursor_value_missing: Specify what happens when the cursor_path does not exist in a record or a record has `None` at the cursor_path: raise, include, exclude.
    lag: Optional value used to define a lag or attribution window. For datetime cursors, this is interpreted as seconds. For other types, it uses the + or - operator depending on the last_value_func.
    range_start: Decide whether the incremental filtering range is `open` or `closed` on the start value side. Default is `closed`.
        Setting this to `open` means that items with the same cursor value as the last value from the previous run (or `initial_value`) are excluded from the result.
        The `open` range disables deduplication logic, so it can serve as an optimization when you know cursors don't overlap between pipeline runs.
    range_end: Decide whether the incremental filtering range is `open` or `closed` on the end value side. Default is `open` (exact `end_value` is excluded).
        Setting this to `closed` means that items with the exact same cursor value as the `end_value` are included in the result.

* **`cursor_path`** - _str_ <br />
* **`initial_value`** - _typing.Any_ <br />
* **`end_value`** - _typing.Any_ <br />
* **`row_order`** - _asc | desc_ <br />
* **`allow_external_schedulers`** - _bool_ <br />
* **`on_cursor_value_missing`** - _raise | include | exclude_ <br />
* **`lag`** - _float_ <br />
* **`range_start`** - _open | closed_ <br />
* **`range_end`** - _open | closed_ <br />

### ItemsNormalizerConfiguration
None

* **`add_dlt_id`** - _bool_ <br /> When true, items to be normalized will have `_dlt_id` column added with a unique ID for each row.
* **`add_dlt_load_id`** - _bool_ <br /> When true, items to be normalized will have `_dlt_load_id` column added with the current load ID.

### LanceDBClientOptions
None

* **`max_retries`** - _int_ <br /> `EmbeddingFunction` class wraps the calls for source and query embedding

### LoadPackageStateInjectableContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`storage`** - _class 'dlt.common.storages.load_package.PackageStorage'_ <br />
* **`load_id`** - _str_ <br />

### LoadStorageConfiguration
None

* **`load_volume_path`** - _str_ <br />
* **`delete_completed_jobs`** - _bool_ <br />

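A minimal sketch of cleaning up completed job files, assuming the `[load]` section these options resolve from:

```toml
# sketch: remove job files once a load package completes
[load]
delete_completed_jobs = true
```
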
### LoaderConfiguration
None

* **`pool_type`** - _process | thread | none_ <br /> Type of pool to run; must be set in derived configs
* **`start_method`** - _str_ <br /> Start method for the pool (typically process). None means the system default
* **`workers`** - _int_ <br /> How many parallel loads can be executed
* **`run_sleep`** - _float_ <br /> How long to sleep between runs with workload, in seconds
* **`parallelism_strategy`** - _parallel | table-sequential | sequential_ <br /> Which parallelism strategy to use at load time
* **`raise_on_failed_jobs`** - _bool_ <br /> When True, raises immediately on terminally failed jobs
* **`raise_on_max_retries`** - _int_ <br /> When greater than 0, raises when a job reaches this number of retries
* **`truncate_staging_dataset`** - _bool_ <br />

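These options control load-stage parallelism and failure handling; a sketch for `config.toml` with illustrative values:

```toml
# sketch: four parallel load jobs, fail fast on terminal job errors
[load]
workers = 4
raise_on_failed_jobs = true
```
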
### NormalizeConfiguration
None

* **`pool_type`** - _process | thread | none_ <br /> Type of pool to run; must be set in derived configs
* **`start_method`** - _str_ <br /> Start method for the pool (typically process). None means the system default
* **`workers`** - _int_ <br /> How many threads/processes in the pool
* **`run_sleep`** - _float_ <br /> How long to sleep between runs with workload, in seconds
* **`destination_capabilities`** - _[DestinationCapabilitiesContext](#destinationcapabilitiescontext)_ <br />
* **`json_normalizer`** - _[ItemsNormalizerConfiguration](#itemsnormalizerconfiguration)_ <br />
* **`parquet_normalizer`** - _[ItemsNormalizerConfiguration](#itemsnormalizerconfiguration)_ <br />
* **`model_normalizer`** - _[ItemsNormalizerConfiguration](#itemsnormalizerconfiguration)_ <br />

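The normalize stage exposes the same pool options; for example, a sketch enabling three parallel workers:

```toml
# sketch: normalize in 3 parallel processes
[normalize]
workers = 3
```
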
### NormalizeStorageConfiguration
None

* **`normalize_volume_path`** - _str_ <br />

### ParquetFormatConfiguration
None

* **`flavor`** - _str_ <br />
* **`version`** - _str_ <br />
* **`data_page_size`** - _int_ <br />
* **`timestamp_timezone`** - _str_ <br />
* **`row_group_size`** - _int_ <br />
* **`coerce_timestamps`** - _s | ms | us | ns_ <br />
* **`allow_truncated_timestamps`** - _bool_ <br />

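A sketch of tuning the parquet writer in `config.toml`, again under the data writer section (values are illustrative):

```toml
# sketch: parquet writer options; illustrative values
[normalize.data_writer]
flavor = "spark"
version = "2.4"
data_page_size = 1048576
timestamp_timezone = "Europe/Berlin"
```
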
### PipelineContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context

### PoolRunnerConfiguration
None

* **`pool_type`** - _process | thread | none_ <br /> Type of pool to run; must be set in derived configs
* **`start_method`** - _str_ <br /> Start method for the pool (typically process). None means the system default
* **`workers`** - _int_ <br /> How many threads/processes in the pool
* **`run_sleep`** - _float_ <br /> How long to sleep between runs with workload, in seconds

### QdrantClientOptions
None

* **`port`** - _int_ <br />
* **`grpc_port`** - _int_ <br />
* **`prefer_grpc`** - _bool_ <br />
* **`https`** - _bool_ <br />
* **`prefix`** - _str_ <br />
* **`timeout`** - _int_ <br />
* **`host`** - _str_ <br />

### RuntimeConfiguration
None

* **`pipeline_name`** - _str_ <br />
* **`sentry_dsn`** - _str_ <br />
* **`slack_incoming_hook`** - _str_ <br />
* **`dlthub_telemetry`** - _bool_ <br />
* **`dlthub_telemetry_endpoint`** - _str_ <br />
* **`dlthub_telemetry_segment_write_key`** - _str_ <br />
* **`log_format`** - _str_ <br />
* **`log_level`** - _str_ <br />
* **`request_timeout`** - _float_ <br /> Timeout for http requests
* **`request_max_attempts`** - _int_ <br /> Max retry attempts for http clients
* **`request_backoff_factor`** - _float_ <br /> Multiplier applied to exponential retry delay for http requests
* **`request_max_retry_delay`** - _float_ <br /> Maximum delay between http request retries
* **`config_files_storage_path`** - _str_ <br /> Platform connection
* **`dlthub_dsn`** - _str_ <br />

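Runtime options live under the `[runtime]` section; a minimal sketch with a placeholder Sentry DSN:

```toml
# sketch: logging and telemetry knobs; placeholder DSN
[runtime]
log_level = "INFO"
dlthub_telemetry = false
sentry_dsn = "https://<key>@sentry.io/<project>"
```
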
### SchemaConfiguration
None

* **`naming`** - _str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | class 'module'_ <br />
* **`json_normalizer`** - _typing.Dict[str, typing.Any]_ <br />
* **`allow_identifier_change_on_table_with_data`** - _bool_ <br />
* **`use_break_path_on_normalize`** - _bool_ <br /> Post 1.4.0, allows table and column names that contain table separators

### SchemaStorageConfiguration
None

* **`schema_volume_path`** - _str_ <br />
* **`import_schema_path`** - _str_ <br />
* **`export_schema_path`** - _str_ <br />
* **`external_schema_format`** - _json | yaml_ <br />
* **`external_schema_format_remove_defaults`** - _bool_ <br />

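Schema import/export paths can be set in `config.toml`; a sketch assuming the top-level `import_schema_path`/`export_schema_path` keys from dlt's schema import/export guide (paths are placeholders):

```toml
# sketch: export and import schema folders; placeholder paths
export_schema_path = "schemas/export"
import_schema_path = "schemas/import"
```
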
### SourceInjectableContext
A context containing the source schema, present when a dlt.resource decorated function is executed

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`source`** - _class 'dlt.extract.source.DltSource'_ <br />

### SourceSchemaInjectableContext
A context containing the source schema, present when a dlt.source/resource decorated function is executed

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`schema`** - _class 'dlt.common.schema.schema.Schema'_ <br />

### StateInjectableContext
None

* **`in_container`** - _bool_ <br /> Current container, if None then not injected
* **`extras_added`** - _bool_ <br /> Tells if extras were already added to this context
* **`state`** - _class 'dlt.common.pipeline.TPipelineState'_ <br />

### TransformationConfiguration
Configuration for a transformation

* **`buffer_max_items`** - _int_ <br />

### VaultProviderConfiguration
None

* **`only_secrets`** - _bool_ <br />
* **`only_toml_fragments`** - _bool_ <br />
* **`list_secrets`** - _bool_ <br />
