From 7e184af051152c7f3c5e66879033ff66ab510b2a Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 12:45:29 +0100 Subject: [PATCH 01/31] systemd timers --- documentation/deployment/systemd.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/documentation/deployment/systemd.md b/documentation/deployment/systemd.md index 2fde94897..b635c68fe 100644 --- a/documentation/deployment/systemd.md +++ b/documentation/deployment/systemd.md @@ -141,3 +141,30 @@ is activated or de-activated. You also do not need to apply `sudo` to make changes to the services. Consistent with the examples on this page, we recommend scoped users. + + +## Daily timers + +If running QuestDB on a `systemd` based Linux (for example, `Ubuntu`) you may find that, by default, there are a number of daily upgrade timers enabled. + +When executed, these tasks restart `systemd` services, which can cause interruptions to QuestDB. It will appear +that QuestDB restarted with no errors or apparent trigger. + +To resolve it, either: + +- Force services to be listed for restart, but not restarted automatically. + - Modify `/etc/needrestart/needrestart.conf` to contain `$nrconf{restart} = 'l'`. +- Disable the auto-upgrade services entirely: + +```bash +sudo systemctl disable --now apt-daily-upgrade.timer +sudo systemctl disable --now apt-daily.timer +sudo systemctl disable --now unattended-upgrades.service +``` + + +You can check the status of the timers using: + +```bash +systemctl list-timers --all | grep apt +``` \ No newline at end of file From 99431d52f90a56eaa29ffbe9223197d00fa9125e Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 13:05:23 +0100 Subject: [PATCH 02/31] add note about aggregation functions to help robots --- documentation/deployment/systemd.md | 1 - .../reference/function/aggregation.md | 26 +++++++++++++++++++ 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/documentation/deployment/systemd.md b/documentation/deployment/systemd.md index b635c68fe..7473faad2 100644 --- a/documentation/deployment/systemd.md +++ b/documentation/deployment/systemd.md @@ -162,7 +162,6 @@ sudo systemctl disable --now apt-daily.timer sudo systemctl disable --now unattended-upgrades.service ``` - You can check the status of the timers using: ```bash diff --git a/documentation/reference/function/aggregation.md b/documentation/reference/function/aggregation.md index 4c5c3f314..b90a4c9da 100644 --- a/documentation/reference/function/aggregation.md +++ b/documentation/reference/function/aggregation.md @@ -7,6 +7,32 @@ description: Aggregate functions reference documentation. This page describes the available functions to assist with performing aggregate calculations. + +:::note + +QuestDB does not using aggregate functions as arguments to other functions. 
For example, this is not allowed: + +```questdb-sql +SELECT datediff('d', min(timestamp), max(timestmap)) FROM trades; +``` + +You can work around this limitation by using CTEs or subqueries: + +```questdb-sql +-- CTE +WITH minmax AS ( + SELECT min(timestamp) as min_date, max(timestamp) as max_date FROM trades +) +SELECT datediff('d', min_date, max_date) FROM minmax; + +-- Subquery +SELECT datediff('d', min_date, max_date) FROM ( + SELECT min(timestamp) as min_date, max(timestamp) as max_date FROM trades +); +``` + +::: + ## approx_count_distinct `approx_count_distinct(column_name, precision)` - estimates the number of From 4ace1049bd3307ee8b3e7acc744dd9813a3f1a49 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 13:19:00 +0100 Subject: [PATCH 03/31] add the error message specifically --- documentation/reference/function/aggregation.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/documentation/reference/function/aggregation.md b/documentation/reference/function/aggregation.md index b90a4c9da..2d7470c9d 100644 --- a/documentation/reference/function/aggregation.md +++ b/documentation/reference/function/aggregation.md @@ -16,6 +16,10 @@ QuestDB does not using aggregate functions as arguments to other functions. For SELECT datediff('d', min(timestamp), max(timestmap)) FROM trades; ``` +Running it will result in the following error: + +`Aggregate function cannot be passed as an argument` + You can work around this limitation by using CTEs or subqueries: ```questdb-sql From d1752764b7e71b0f5440522e2ee1f14a8c4dfe4d Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 17:00:50 +0100 Subject: [PATCH 04/31] ilp notes --- documentation/clients/java_ilp.md | 58 +++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index 4655e173e..052bf3474 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -168,6 +168,46 @@ There are three ways to create a client instance: // ... } ``` + +## Configuring multiple urls + +:::note + +This feature requires QuestDB OSS 9.1.0+ or Enterprise 3.0.4+. + +::: + +The ILP client can be configured with multiple _possible_ endpoints to send your data to. Only one will be sent to at +any one time. + +To configure this feature, simply provide multiple `addr` entries. For example: + + +```java +try (Sender sender = Sender.fromConfig("http::addr=localhost:9000;addr=localhost:9999;")) { + // ... +} +``` + +On initialisation, if `protocol_version=auto`, the sender will identify the first instance that is writeable. Then it will _stick_ to this instance and write +any subsequent data to it. + +In the event that the instance becomes unavailable for writes, the client will retry the other possible endpoints, and when it finds +a new writeable instance, will _stick_ to it instead. This unvailability is characterised by failures to connect or locate the instance, +or the instance returning an error code due to it being read-only. + +By configuring multiple addresses, you can continue allowing you to continue to capture data if your primary instance +fails, without having to reconfigure the clients. This backup instance can be hot or cold, and so long as it is assigned a known address, it will be written to as soon as it is started. 
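For illustration, a minimal sketch pairing multiple `addr` entries with a longer `retry_timeout`, so the client keeps retrying long enough for a cold backup to come up. The hostnames, ports, and the 60-second timeout are placeholders, not recommendations:

```java
// Sketch: primary at localhost:9000, backup at localhost:9999.
// retry_timeout is in milliseconds; 60000 assumes a slow-to-start backup instance.
try (Sender sender = Sender.fromConfig(
        "http::addr=localhost:9000;addr=localhost:9999;retry_timeout=60000;")) {
    // Rows go to whichever instance is currently writeable.
    sender.table("trades")
          .symbol("symbol", "BTC-USD")
          .doubleColumn("price", 39269.98)
          .atNow();
    sender.flush();
}
```
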
+ +Enterprise users can leverage this feature to transparently handle replication failover, without the need to introduce a load-balancer or +reconfigure clients. + +:::tip + +You may wish to increase the value of `retry_timeout` if you expect your backup instance to take a large amount of time to become writeable. + +::: + ## General usage pattern @@ -289,6 +329,13 @@ closing the client. ## Error handling + +:::note + +If you have configured multiple addresses, retries will be run against different instances. + +::: + HTTP automatically retries failed, recoverable requests: network errors, some server errors, and timeouts. Non-recoverable errors include invalid data, authentication errors, and other client-side errors. @@ -318,6 +365,17 @@ With TCP transport, you don't have this option. If you get an exception, you can't continue with the same client instance, and don't have insight into which rows were accepted by the server. +:::caution + +Error handling behaviour changed with the release of QuestDB 9.1.0. + +Previously, failing all retries would cause the code to except and release the buffered data. + +Now the buffer will not be released. If you wish to re-use the same sender with fresh data, you must call the +new `reset()` function. + +::: + ## Designated timestamp considerations The concept of [designated timestamp](/docs/concept/designated-timestamp/) is From 86d99755bc60681ae33c3bfb5f7664bad52371d0 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 18:06:32 +0100 Subject: [PATCH 05/31] scrap potentially vestigial window function --- documentation/reference/function/window.md | 44 ---------------------- 1 file changed, 44 deletions(-) diff --git a/documentation/reference/function/window.md b/documentation/reference/function/window.md index 8daf7489f..b394efc91 100644 --- a/documentation/reference/function/window.md +++ b/documentation/reference/function/window.md @@ -128,50 +128,6 @@ SELECT FROM trades; ``` -## first_not_null_value() - -In the context of window functions, `first_not_null_value(value)` returns the first non-null value in the set of rows defined by the window frame. - -**Arguments:** - -- `value`: Any numeric value. - -**Return value:** - -- The first non-null occurrence of `value` for the rows in the window frame. Returns `NaN` if no non-null values are found. - -**Description** - -When used as a window function, `first_not_null_value()` operates on a "window" of rows defined by the `OVER` clause. The rows in this window are determined by the `PARTITION BY`, `ORDER BY`, and frame specification components of the `OVER` clause. - -The `first_not_null_value()` function respects the frame clause, meaning it only includes rows within the specified frame in the calculation. The result is a separate value for each row, based on the corresponding window of rows. - -Unlike `first_value()`, this function skips null values and returns the first non-null value it encounters in the window frame. This is particularly useful when dealing with sparse data or when you want to ignore null values in your analysis. - -Note that the order of rows in the result set is not guaranteed to be the same with each execution of the query. To ensure a consistent order, use an `ORDER BY` clause outside of the `OVER` clause. 
- -**Syntax:** - -```questdb-sql title="first_not_null_value() syntax" -first_not_null_value(value) OVER (window_definition) -``` - -**Example:** - -```questdb-sql title="first_not_null_value() example" demo -SELECT - symbol, - price, - timestamp, - first_not_null_value(price) OVER ( - PARTITION BY symbol - ORDER BY timestamp - ROWS BETWEEN 3 PRECEDING AND CURRENT ROW - ) AS first_valid_price -FROM trades; -``` - - ## first_value() In the context of window functions, `first_value(value)` calculates the first From 9815e154c2b2dfdf652ed7610fce3c0e8057b657 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 18:23:20 +0100 Subject: [PATCH 06/31] IN list --- documentation/configuration-utils/_cairo.config.json | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/documentation/configuration-utils/_cairo.config.json b/documentation/configuration-utils/_cairo.config.json index 3e0aaec6e..f3d6b1e7f 100644 --- a/documentation/configuration-utils/_cairo.config.json +++ b/documentation/configuration-utils/_cairo.config.json @@ -295,6 +295,10 @@ "default": "false", "description": "Sets debug flag for JIT compilation. When enabled, assembly will be printed into `stdout`." }, + "cairo.sql.jit.max.in.list.size.threshold": { + "default": "10", + "description": "Controls whether or not JIT compilation will be used for a query that uses the IN predicate. If the IN list is longer than this threshold, JIT compilation will be cancelled." + }, "cairo.sql.jit.bind.vars.memory.page.size": { "default": "4K", "description": "Sets the memory page size for storing bind variable values for JIT compiled filter." From db0ce1705d6c0824195f99c87941f388b716af7e Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 18:23:23 +0100 Subject: [PATCH 07/31] mode --- .../reference/function/aggregation.md | 45 ++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/documentation/reference/function/aggregation.md b/documentation/reference/function/aggregation.md index 2d7470c9d..7ebe67c30 100644 --- a/documentation/reference/function/aggregation.md +++ b/documentation/reference/function/aggregation.md @@ -10,7 +10,7 @@ calculations. :::note -QuestDB does not using aggregate functions as arguments to other functions. For example, this is not allowed: +QuestDB does not support using aggregate functions as arguments to other functions. For example, this is not allowed: ```questdb-sql SELECT datediff('d', min(timestamp), max(timestmap)) FROM trades; @@ -816,6 +816,49 @@ FROM (SELECT rnd_double() a FROM long_sequence(100)); | :--------------- | | 49.5442334742831 | +## mode + +`mode(value)` - calculates the mode (most frequent) value out of a particular dataset. + +For `mode(B)`, if there are an equal number of `true` and `false` values, `true` will be returned as a tie-breaker. + +For other modes, if there are equal mode values, the returned value will be whichever the code identifies first. + +To make the result deterministic, you must enforce an underlying sort order. + +#### Parameters + +- `value` - one of (LONG, DOUBLE, BOOLEAN, STRING, VARCHAR, SYMBOL) + +#### Return value + +Return value type is the same as the type of the input `value`. 
+ + +#### Examples + +With this dataset: + +| symbol | value | +|-----------|-------| +| A | alpha | +| A | alpha | +| A | alpha | +| A | omega | +| B | beta | +| B | beta | +| B | gamma | + +```questdb-sql +SELECT symbol, mode(value) as mode FROM dataset; +``` + +| symbol | mode | +|--------|-------| +| A | alpha | +| B | beta | + + ## stddev / stddev_samp `stddev_samp(value)` - Calculates the sample standard deviation of a set of From 8b6a9bc420c2ea95d39e9965f868430b112400a9 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 18:35:08 +0100 Subject: [PATCH 08/31] robot one shot --- documentation/reference/api/rest.md | 67 +++++- documentation/reference/function/meta.md | 44 ++++ documentation/reference/sql/copy.md | 183 +++++++++++++++- static/images/docs/diagrams/.railroad | 6 +- static/images/docs/diagrams/copy.svg | 256 ++++++++++++++++------- 5 files changed, 470 insertions(+), 86 deletions(-) diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md index 184408294..bfb9dbf89 100644 --- a/documentation/reference/api/rest.md +++ b/documentation/reference/api/rest.md @@ -590,16 +590,34 @@ returned in a tabular form to be saved and reused as opposed to JSON. `/exp` is expecting an HTTP GET request with following parameters: -| Parameter | Required | Description | -| :-------- | :------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `query` | Yes | URL encoded query text. It can be multi-line. | -| `limit` | No | Paging opp parameter. For example, `limit=10,20` will return row numbers 10 through to 20 inclusive and `limit=20` will return first 20 rows, which is equivalent to `limit=0,20`. `limit=-20` will return the last 20 rows. | -| `nm` | No | `true` or `false`. Skips the metadata section of the response when set to `true`. | +| Parameter | Required | Description | +| :--------------------- | :------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `query` | Yes | URL encoded query text. It can be multi-line. | +| `limit` | No | Paging opp parameter. For example, `limit=10,20` will return row numbers 10 through to 20 inclusive and `limit=20` will return first 20 rows, which is equivalent to `limit=0,20`. `limit=-20` will return the last 20 rows. | +| `nm` | No | `true` or `false`. Skips the metadata section of the response when set to `true`. | +| `fmt` | No | Export format. Valid values: `parquet`. When set to `parquet`, exports data in Parquet format instead of CSV. | + +#### Parquet Export Parameters + +When `fmt=parquet`, the following additional parameters are supported: + +| Parameter | Required | Default | Description | +| :--------------------- | :------- | :----------- | :-------------------------------------------------------------------------------------------------------------- | +| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. | +| `compression_codec` | No | `ZSTD` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, or `LZ4_RAW`. | +| `compression_level` | No | Codec-dependent | Compression level (codec-specific). 
Higher values = better compression but slower. | +| `row_group_size` | No | `100000` | Number of rows per Parquet row group. | +| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). | +| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | +| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). | +| `raw_array_encoding` | No | `true` | Use raw encoding for arrays: `true` or `false`. | The parameters must be URL encoded. ### Examples +#### CSV Export (default) + Considering the query: ```shell @@ -620,6 +638,45 @@ A HTTP status code of `200` is returned with the following response body: 200501BS00005,"2005-01-10T00:00:00.000Z",21:13 ``` +#### Parquet Export + +Export query results to Parquet format: + +```shell +curl -G \ + --data-urlencode "query=SELECT * FROM trades WHERE timestamp IN today()" \ + --data-urlencode "fmt=parquet" \ + http://localhost:9000/exp > trades_today.parquet +``` + +#### Parquet Export with Custom Options + +Export with custom compression and partitioning: + +```shell +curl -G \ + --data-urlencode "query=SELECT * FROM trades" \ + --data-urlencode "fmt=parquet" \ + --data-urlencode "partition_by=DAY" \ + --data-urlencode "compression_codec=ZSTD" \ + --data-urlencode "compression_level=9" \ + --data-urlencode "row_group_size=1000000" \ + http://localhost:9000/exp > trades.parquet +``` + +#### Parquet Export with LZ4 Compression + +Export with LZ4_RAW compression for faster export: + +```shell +curl -G \ + --data-urlencode "query=SELECT symbol, price, amount FROM trades WHERE timestamp > dateadd('h', -1, now())" \ + --data-urlencode "fmt=parquet" \ + --data-urlencode "compression_codec=LZ4_RAW" \ + --data-urlencode "compression_level=3" \ + http://localhost:9000/exp > recent_trades.parquet +``` + ## Error responses ### Malformed queries diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index 963981f8a..6893f06dd 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -589,6 +589,50 @@ If you want to re-read metadata for all user tables, simply use an asterisk: SELECT hydrate_table_metadata('*'); ``` +## copy_export_log + +`copy_export_log()` or `sys.copy_export_log` returns the export log for `COPY TO` operations. + +**Arguments:** + +- `copy_export_log()` does not require arguments. 
+ +**Return value:** + +Returns metadata on `COPY TO` export operations for the last three days, including columns such as: + +- `ts` - timestamp of the log event +- `id` - export identifier that can be used to track export progress +- `table` - source table name (or 'query' for subquery exports) +- `destination` - destination directory path for the export +- `format` - export format (currently only 'PARQUET') +- `status` - event status: 'started', 'finished', 'failed', or 'cancelled' +- `message` - error message when status is 'failed' +- `rows_exported` - total number of exported rows (shown in final log row) +- `partition` - partition name for partitioned exports (null for non-partitioned) + +**Examples:** + +```questdb-sql +SELECT * FROM copy_export_log(); +``` + +| ts | id | table | destination | format | status | message | rows_exported | partition | +| --------------------------- | ---------------- | ------ | ------------- | ------- | -------- | ------- | ------------- | ---------- | +| 2024-10-01T14:23:15.123456Z | 7f3a9c2e1b456789 | trades | trades_export | PARQUET | started | | 0 | null | +| 2024-10-01T14:25:42.987654Z | 7f3a9c2e1b456789 | trades | trades_export | PARQUET | finished | | 1000000 | null | + +```questdb-sql title="Track specific export" +SELECT * FROM copy_export_log() WHERE id = '7f3a9c2e1b456789'; +``` + +```questdb-sql title="View recent failed exports" +SELECT ts, table, destination, message +FROM copy_export_log() +WHERE status = 'failed' +ORDER BY ts DESC; +``` + ## flush_query_cache() `flush_query_cache' invalidates cached query execution plans. diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md index c7f359ea0..7194c1122 100644 --- a/documentation/reference/sql/copy.md +++ b/documentation/reference/sql/copy.md @@ -23,6 +23,13 @@ following impact: ## Description +The `COPY` command has two modes of operation: + +1. **Import mode**: `COPY table_name FROM 'file.csv'` - Copies data from a delimited text file into QuestDB +2. **Export mode**: `COPY table_name TO 'output_directory'` or `COPY (query) TO 'output_directory'` - Exports table or query results to Parquet format + +### Import Mode + Copies tables from a delimited text file saved in the defined root directory into QuestDB. `COPY` has the following import modes: @@ -56,6 +63,22 @@ request(s) will be rejected. `COPY '' CANCEL` cancels the copying operation defined by the import `id`, while an import is taking place. +### Export Mode + +Exports data from a table or query result set to Parquet format. The export is performed asynchronously and non-blocking, allowing writes to continue during the export process. + +**Key features:** + +- Export entire tables or query results +- Configurable Parquet export options (compression, row group size, etc.) +- Non-blocking exports - writes continue during export +- Supports partitioned exports matching table partitioning +- Configurable size limits + +**Export directory:** + +The export destination is relative to `cairo.sql.copy.root` (defaults to `root_directory/export`). You can configure this through the [configuration settings](/docs/configuration/). + ### Root directory `COPY` requires a defined root directory where CSV files are saved and copied @@ -90,10 +113,13 @@ the `/Users` tree and set the root directory accordingly. 
::: -### Log table +### Log tables + +`COPY` generates log tables tracking operations: -`COPY` generates a log table,`sys.text_import_log`, tracking `COPY` operation -for the last three days with the following information: +#### Import log: `sys.text_import_log` + +Tracks `COPY FROM` (import) operations for the last three days with the following information: | Column name | Data type | Notes | | ------------- | --------- | ----------------------------------------------------------------------------- | @@ -130,8 +156,30 @@ Log table row retention is configurable through `COPY` returns `id` value from `sys.text_import_log` to track the import progress. +#### Export log: `sys.copy_export_log` + +Tracks `COPY TO` (export) operations for the last three days with the following information: + +| Column name | Data type | Notes | +| ------------- | --------- | ----------------------------------------------------------------------------- | +| ts | timestamp | The log event timestamp | +| id | string | Export id | +| table | symbol | Source table name (or 'query' for subquery exports) | +| destination | symbol | The destination directory path | +| format | symbol | Export format (currently only 'PARQUET') | +| status | symbol | The event status: started, finished, failed, cancelled | +| message | string | The error message when status is failed | +| rows_exported | long | The total number of exported rows (shown in final log row) | +| partition | symbol | Partition name for partitioned exports (null for non-partitioned) | + +Log table row retention is configurable through `cairo.sql.copy.log.retention.days` setting, and is three days by default. + +`COPY TO` returns an `id` value from `sys.copy_export_log` to track the export progress. + ## Options +### Import Options (COPY FROM) + - `HEADER true/false`: When `true`, QuestDB automatically assumes the first row is a header. Otherwise, schema recognition is used to determine whether the first row is used as header. The default setting is `false`. @@ -150,8 +198,25 @@ progress. - `ABORT`: Abort whole import on first error, and restore the pre-import table status +### Export Options (COPY TO) + +All export options are specified using the `WITH` clause after the `TO` destination path. + +- `FORMAT PARQUET`: Specifies Parquet as the export format (currently the only supported format). Default: `PARQUET`. +- `PARTITION_BY `: Partition the export by time unit. Valid values: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `YEAR`. Default: matches the source table's partitioning, or `NONE` for queries. +- `SIZE_LIMIT `: Maximum size for export files. Supports units like `10MB`, `1GB`, etc. When exceeded, a new file is created. Default: unlimited. +- `COMPRESSION_CODEC `: Parquet compression algorithm. Valid values: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`. Default: `ZSTD`. +- `COMPRESSION_LEVEL `: Compression level (codec-specific). Higher values mean better compression but slower speed. Default: varies by codec. +- `ROW_GROUP_SIZE `: Number of rows per Parquet row group. Larger values improve compression but increase memory usage. Default: `100000`. +- `DATA_PAGE_SIZE `: Size of data pages within row groups in bytes. Default: `1048576` (1MB). +- `STATISTICS_ENABLED true/false`: Enable Parquet column statistics for better query performance. Default: `true`. +- `PARQUET_VERSION `: Parquet format version. Valid values: `1` (v1.0) or `2` (v2.0). Default: `2`. 
+- `RAW_ARRAY_ENCODING true/false`: Use raw encoding for arrays (more efficient for numeric arrays). Default: `true`. + ## Examples +### Import Examples + For more details on parallel import, please also see [Importing data in bulk via CSV](/docs/guides/import-csv/#import-csv-via-copy-sql). @@ -194,3 +259,115 @@ SELECT * FROM 'sys.text_import_log' WHERE id = '55ca24e5ba328050' LIMIT -1; | ts | id | table | file | phase | status | message | rows_handled | rows_imported | errors | | :-------------------------- | ---------------- | ------- | ----------- | ----- | --------- | ---------------------------------------------------------- | ------------ | ------------- | ------ | | 2022-08-03T14:04:42.268502Z | 55ca24e5ba328050 | weather | weather.csv | null | cancelled | import cancelled [phase=partition_import, msg=`Cancelled`] | 0 | 0 | 0 | + +### Export Examples + +#### Export entire table to Parquet + +Export a complete table to Parquet format: + +```questdb-sql title="Export table to Parquet" +COPY trades TO 'trades_export' WITH FORMAT PARQUET; +``` + +Returns an export ID: + +| id | +| ---------------- | +| 7f3a9c2e1b456789 | + +Track export progress: + +```questdb-sql +SELECT * FROM sys.copy_export_log WHERE id = '7f3a9c2e1b456789'; +``` + +#### Export query results to Parquet + +Export the results of a query: + +```questdb-sql title="Export filtered data" +COPY (SELECT * FROM trades WHERE timestamp IN today() AND symbol = 'BTC-USD') +TO 'btc_today' +WITH FORMAT PARQUET; +``` + +#### Export with partitioning + +Export data partitioned by day: + +```questdb-sql title="Export with daily partitions" +COPY trades TO 'trades_daily' +WITH FORMAT PARQUET +PARTITION BY DAY; +``` + +This creates separate Parquet files for each day's data in subdirectories named by date. + +#### Export with custom Parquet options + +Configure compression, row group size, and other Parquet settings: + +```questdb-sql title="Export with custom compression" +COPY trades TO 'trades_compressed' +WITH + FORMAT PARQUET + COMPRESSION_CODEC ZSTD + COMPRESSION_LEVEL 9 + ROW_GROUP_SIZE 1000000 + DATA_PAGE_SIZE 2097152; +``` + +#### Export with size limits + +Limit export file size to create multiple files: + +```questdb-sql title="Export with 1GB file size limit" +COPY trades TO 'trades_chunked' +WITH + FORMAT PARQUET + SIZE_LIMIT 1GB; +``` + +When the export exceeds 1GB, QuestDB creates multiple numbered files: `trades_chunked_0.parquet`, `trades_chunked_1.parquet`, etc. 
+ +#### Export aggregated data + +Export aggregated results for analysis: + +```questdb-sql title="Export OHLCV data" +COPY ( + SELECT + timestamp, + symbol, + first(price) AS open, + max(price) AS high, + min(price) AS low, + last(price) AS close, + sum(amount) AS volume + FROM trades + WHERE timestamp > dateadd('d', -7, now()) + SAMPLE BY 1h +) +TO 'ohlcv_7d' +WITH FORMAT PARQUET; +``` + +#### Monitor export status + +Check all recent exports: + +```questdb-sql title="View export history" +SELECT ts, table, destination, status, rows_exported +FROM sys.copy_export_log +WHERE ts > dateadd('d', -1, now()) +ORDER BY ts DESC; +``` + +Sample output: + +| ts | table | destination | status | rows_exported | +| --------------------------- | ------ | ---------------- | -------- | ------------- | +| 2024-10-01T14:23:15.123456Z | trades | trades_export | finished | 1000000 | +| 2024-10-01T13:45:22.654321Z | query | btc_today | finished | 45672 | +| 2024-10-01T12:30:11.987654Z | trades | trades_daily | finished | 1000000 | diff --git a/static/images/docs/diagrams/.railroad b/static/images/docs/diagrams/.railroad index 6f011c785..1fc0854fc 100644 --- a/static/images/docs/diagrams/.railroad +++ b/static/images/docs/diagrams/.railroad @@ -138,7 +138,11 @@ case ::= 'CASE' ('WHEN' condition 'THEN' value)* ( | 'ELSE' value ) 'END' copy - ::= 'COPY' (id 'CANCEL' | tableName 'FROM' fileName (| 'WITH' (| 'HEADER' (true|false) |'TIMESTAMP' columnName | 'DELIMITER' delimiter | 'FORMAT' format | |'PARTITION BY' ('NONE'|'YEAR'|'MONTH'|'DAY'|'HOUR') | 'ON ERROR' ('SKIP_ROW'|'SKIP_COLUMN'|'ABORT')) )) + ::= 'COPY' ( + id 'CANCEL' + | tableName 'FROM' fileName ('WITH' ('HEADER' (true|false) | 'TIMESTAMP' columnName | 'DELIMITER' delimiter | 'FORMAT' format | 'PARTITION BY' ('NONE'|'YEAR'|'MONTH'|'DAY'|'HOUR') | 'ON ERROR' ('SKIP_ROW'|'SKIP_COLUMN'|'ABORT')))? + | (tableName | '(' selectQuery ')') 'TO' destinationPath ('WITH' ('FORMAT' 'PARQUET' | 'PARTITION BY' ('NONE'|'HOUR'|'DAY'|'WEEK'|'MONTH'|'YEAR') | 'SIZE_LIMIT' sizeValue | 'COMPRESSION_CODEC' ('UNCOMPRESSED'|'SNAPPY'|'GZIP'|'LZ4'|'ZSTD'|'LZ4_RAW') | 'COMPRESSION_LEVEL' number | 'ROW_GROUP_SIZE' number | 'DATA_PAGE_SIZE' number | 'STATISTICS_ENABLED' (true|false) | 'PARQUET_VERSION' ('1'|'2') | 'RAW_ARRAY_ENCODING' (true|false)))? 
+ ) createTableTimestamp ::= 'CREATE' someCreateTableStatement 'timestamp' '(' columnName ')' diff --git a/static/images/docs/diagrams/copy.svg b/static/images/docs/diagrams/copy.svg index c06a2091e..6796d7378 100644 --- a/static/images/docs/diagrams/copy.svg +++ b/static/images/docs/diagrams/copy.svg @@ -1,4 +1,4 @@ - + - - - - - COPY - - - id - - CANCEL - - - tableName - - FROM - - - fileName - - WITH - - - HEADER - - - true - - - false - - TIMESTAMP - - - columnName - - DELIMITER - - - delimiter - - FORMAT - - - format - - PARTITION BY - - - NONE - - - YEAR - - - MONTH - - - DAY - - - HOUR - - - ON ERROR - - - SKIP_ROW - - - SKIP_COLUMN - - - ABORT - - - + + + + + COPY + + + id + + CANCEL + + + tableName + + FROM + + + fileName + + WITH + + + HEADER + + + true + + + false + + TIMESTAMP + + + columnName + + DELIMITER + + + delimiter + + FORMAT + + + format + + PARTITION BY + + + NONE + + + YEAR + + + MONTH + + + DAY + + + HOUR + + + ON ERROR + + + SKIP_ROW + + + SKIP_COLUMN + + + ABORT + + + tableName + + ( + + + selectQuery + + ) + + + TO + + + destinationPath + + WITH + + + FORMAT + + + PARQUET + + + PARTITION BY + + + NONE + + + HOUR + + + DAY + + + WEEK + + + MONTH + + + YEAR + + + SIZE_LIMIT + + + sizeValue + + COMPRESSION_CODEC + + + UNCOMPRESSED + + + SNAPPY + + + GZIP + + + LZ4 + + + ZSTD + + + LZ4_RAW + + + COMPRESSION_LEVEL + + + ROW_GROUP_SIZE + + + DATA_PAGE_SIZE + + + number + + STATISTICS_ENABLED + + + RAW_ARRAY_ENCODING + + + true + + + false + + PARQUET_VERSION + + + 1 + + + 2 + + + \ No newline at end of file From c2221b16dc883e9d8d5416f1769ee36079d5c9a3 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 1 Oct 2025 18:57:31 +0100 Subject: [PATCH 09/31] some edits --- documentation/reference/api/rest.md | 33 +++-- documentation/reference/sql/copy.md | 190 ++++++++++++-------------- static/images/docs/diagrams/.railroad | 2 +- 3 files changed, 105 insertions(+), 120 deletions(-) diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md index bfb9dbf89..0e350a830 100644 --- a/documentation/reference/api/rest.md +++ b/documentation/reference/api/rest.md @@ -590,27 +590,27 @@ returned in a tabular form to be saved and reused as opposed to JSON. `/exp` is expecting an HTTP GET request with following parameters: -| Parameter | Required | Description | -| :--------------------- | :------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `query` | Yes | URL encoded query text. It can be multi-line. | -| `limit` | No | Paging opp parameter. For example, `limit=10,20` will return row numbers 10 through to 20 inclusive and `limit=20` will return first 20 rows, which is equivalent to `limit=0,20`. `limit=-20` will return the last 20 rows. | -| `nm` | No | `true` or `false`. Skips the metadata section of the response when set to `true`. | -| `fmt` | No | Export format. Valid values: `parquet`. When set to `parquet`, exports data in Parquet format instead of CSV. | +| Parameter | Required | Description | +|:----------|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `query` | Yes | URL encoded query text. It can be multi-line. 
| +| `limit` | No | Paging opp parameter. For example, `limit=10,20` will return row numbers 10 through to 20 inclusive and `limit=20` will return first 20 rows, which is equivalent to `limit=0,20`. `limit=-20` will return the last 20 rows. | +| `nm` | No | `true` or `false`. Skips the metadata section of the response when set to `true`. | +| `fmt` | No | Export format. Valid values: `parquet`, `csv`. When set to `parquet`, exports data in Parquet format instead of CSV. | #### Parquet Export Parameters When `fmt=parquet`, the following additional parameters are supported: -| Parameter | Required | Default | Description | -| :--------------------- | :------- | :----------- | :-------------------------------------------------------------------------------------------------------------- | -| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. | -| `compression_codec` | No | `ZSTD` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, or `LZ4_RAW`. | -| `compression_level` | No | Codec-dependent | Compression level (codec-specific). Higher values = better compression but slower. | -| `row_group_size` | No | `100000` | Number of rows per Parquet row group. | -| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). | -| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | -| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). | -| `raw_array_encoding` | No | `true` | Use raw encoding for arrays: `true` or `false`. | +| Parameter | Required | Default | Description | +|:---------------------|:---------|:----------------|:----------------------------------------------------------------------------------------------------| +| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. | +| `compression_codec` | No | `LZ4_RAW` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`, `BROTLI`, `LZO`. | +| `compression_level` | No | Codec-dependent | Compression level (codec-specific). Higher values = better compression but slower. | +| `row_group_size` | No | `100000` | Number of rows per Parquet row group. | +| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). | +| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | +| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). | +| `raw_array_encoding` | No | `true` | Use raw encoding for arrays: `true` or `false`. | The parameters must be URL encoded. @@ -673,7 +673,6 @@ curl -G \ --data-urlencode "query=SELECT symbol, price, amount FROM trades WHERE timestamp > dateadd('h', -1, now())" \ --data-urlencode "fmt=parquet" \ --data-urlencode "compression_codec=LZ4_RAW" \ - --data-urlencode "compression_level=3" \ http://localhost:9000/exp > recent_trades.parquet ``` diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md index 7194c1122..288ca94f5 100644 --- a/documentation/reference/sql/copy.md +++ b/documentation/reference/sql/copy.md @@ -25,10 +25,10 @@ following impact: The `COPY` command has two modes of operation: -1. **Import mode**: `COPY table_name FROM 'file.csv'` - Copies data from a delimited text file into QuestDB -2. **Export mode**: `COPY table_name TO 'output_directory'` or `COPY (query) TO 'output_directory'` - Exports table or query results to Parquet format +1. 
**Import mode**: `COPY table_name FROM 'file.csv'`, copying data from a delimited text file into QuestDB. +2. **Export mode**: `COPY table_name TO 'output_directory'` or `COPY (query) TO 'output_directory'`, exporting data to Parquet files. -### Import Mode +### Import mode (COPY-FROM) Copies tables from a delimited text file saved in the defined root directory into QuestDB. `COPY` has the following import modes: @@ -55,7 +55,7 @@ into QuestDB. `COPY` has the following import modes: :::note -`COPY` takes up all the available resources. While one import is running, new +Parallel `COPY` takes up all the available resources. While one import is running, new request(s) will be rejected. ::: @@ -63,23 +63,7 @@ request(s) will be rejected. `COPY '' CANCEL` cancels the copying operation defined by the import `id`, while an import is taking place. -### Export Mode - -Exports data from a table or query result set to Parquet format. The export is performed asynchronously and non-blocking, allowing writes to continue during the export process. - -**Key features:** - -- Export entire tables or query results -- Configurable Parquet export options (compression, row group size, etc.) -- Non-blocking exports - writes continue during export -- Supports partitioned exports matching table partitioning -- Configurable size limits - -**Export directory:** - -The export destination is relative to `cairo.sql.copy.root` (defaults to `root_directory/export`). You can configure this through the [configuration settings](/docs/configuration/). - -### Root directory +### Import root `COPY` requires a defined root directory where CSV files are saved and copied from. A CSV file must be saved to the root directory before starting the `COPY` @@ -87,7 +71,7 @@ operation. There are two root directories to be defined: - `cairo.sql.copy.root` is used for storing regular files to be imported. By default, it points to the `root_directory/import` directory. This allows you to drop a CSV - file into the `import` directory and start the import operation. + file into the `import` directory and start the import operation. - `cairo.sql.copy.work.root` is used for storing temporary files like indexes or temporary partitions. Unless otherwise specified, it points to the `root_directory/tmp` directory. @@ -113,13 +97,9 @@ the `/Users` tree and set the root directory accordingly. ::: -### Log tables - -`COPY` generates log tables tracking operations: +### Logs -#### Import log: `sys.text_import_log` - -Tracks `COPY FROM` (import) operations for the last three days with the following information: +`COPY-FROM` reports its progress through a system table, `sys.text_import_log`. This contains the following information: | Column name | Data type | Notes | | ------------- | --------- | ----------------------------------------------------------------------------- | @@ -136,49 +116,27 @@ Tracks `COPY FROM` (import) operations for the last three days with the followin | | | The counters are shown in the final log row for the given import | | errors | long | The number of errors for the given phase | -\* Available phases for parallel import are: - -- setup -- boundary_check -- indexing -- partition_import -- symbol_table_merge -- update_symbol_keys -- build_symbol_index -- move_partitions -- attach_partitions -- analyze_file_structure -- cleanup -Log table row retention is configurable through -`cairo.sql.copy.log.retention.days` setting, and is three days by default. 
+**Parallel import phases** + - setup + - boundary_check + - indexing + - partition_import + - symbol_table_merge + - update_symbol_keys + - build_symbol_index + - move_partitions + - attach_partitions + - analyze_file_structure + - cleanup -`COPY` returns `id` value from `sys.text_import_log` to track the import -progress. +The retention for this table is configured using the `cairo.sql.copy.log.retention.days` setting, and is three days by default. -#### Export log: `sys.copy_export_log` +`COPY` returns an `id` value, which can be correlated with `sys.text_import_log` to track the import progress. -Tracks `COPY TO` (export) operations for the last three days with the following information: +### Options -| Column name | Data type | Notes | -| ------------- | --------- | ----------------------------------------------------------------------------- | -| ts | timestamp | The log event timestamp | -| id | string | Export id | -| table | symbol | Source table name (or 'query' for subquery exports) | -| destination | symbol | The destination directory path | -| format | symbol | Export format (currently only 'PARQUET') | -| status | symbol | The event status: started, finished, failed, cancelled | -| message | string | The error message when status is failed | -| rows_exported | long | The total number of exported rows (shown in final log row) | -| partition | symbol | Partition name for partitioned exports (null for non-partitioned) | - -Log table row retention is configurable through `cairo.sql.copy.log.retention.days` setting, and is three days by default. - -`COPY TO` returns an `id` value from `sys.copy_export_log` to track the export progress. - -## Options - -### Import Options (COPY FROM) +These options are provided as key-value pairs after the `WITH` keyword. - `HEADER true/false`: When `true`, QuestDB automatically assumes the first row is a header. Otherwise, schema recognition is used to determine whether the @@ -192,30 +150,13 @@ Log table row retention is configurable through `cairo.sql.copy.log.retention.da - `DELIMITER`: Default setting is `,`. - `PARTITION BY`: Partition unit. - `ON ERROR`: Define responses to data parsing errors. The valid values are: - - `SKIP_ROW`: Skip the entire row - - `SKIP_COLUMN`: Skip column and use the default value (`null` for nullable - types, `false` for boolean, `0` for other non-nullable types) - - `ABORT`: Abort whole import on first error, and restore the pre-import table - status - -### Export Options (COPY TO) - -All export options are specified using the `WITH` clause after the `TO` destination path. - -- `FORMAT PARQUET`: Specifies Parquet as the export format (currently the only supported format). Default: `PARQUET`. -- `PARTITION_BY `: Partition the export by time unit. Valid values: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `YEAR`. Default: matches the source table's partitioning, or `NONE` for queries. -- `SIZE_LIMIT `: Maximum size for export files. Supports units like `10MB`, `1GB`, etc. When exceeded, a new file is created. Default: unlimited. -- `COMPRESSION_CODEC `: Parquet compression algorithm. Valid values: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`. Default: `ZSTD`. -- `COMPRESSION_LEVEL `: Compression level (codec-specific). Higher values mean better compression but slower speed. Default: varies by codec. -- `ROW_GROUP_SIZE `: Number of rows per Parquet row group. Larger values improve compression but increase memory usage. Default: `100000`. -- `DATA_PAGE_SIZE `: Size of data pages within row groups in bytes. 
Default: `1048576` (1MB). -- `STATISTICS_ENABLED true/false`: Enable Parquet column statistics for better query performance. Default: `true`. -- `PARQUET_VERSION `: Parquet format version. Valid values: `1` (v1.0) or `2` (v2.0). Default: `2`. -- `RAW_ARRAY_ENCODING true/false`: Use raw encoding for arrays (more efficient for numeric arrays). Default: `true`. - -## Examples + - `SKIP_ROW`: Skip the entire row + - `SKIP_COLUMN`: Skip column and use the default value (`null` for nullable + types, `false` for boolean, `0` for other non-nullable types) + - `ABORT`: Abort whole import on first error, and restore the pre-import table + status -### Import Examples +### Examples For more details on parallel import, please also see [Importing data in bulk via CSV](/docs/guides/import-csv/#import-csv-via-copy-sql). @@ -260,7 +201,59 @@ SELECT * FROM 'sys.text_import_log' WHERE id = '55ca24e5ba328050' LIMIT -1; | :-------------------------- | ---------------- | ------- | ----------- | ----- | --------- | ---------------------------------------------------------- | ------------ | ------------- | ------ | | 2022-08-03T14:04:42.268502Z | 55ca24e5ba328050 | weather | weather.csv | null | cancelled | import cancelled [phase=partition_import, msg=`Cancelled`] | 0 | 0 | 0 | -### Export Examples + +### Export mode (COPY-TO) + +Exports data from a table or query result set to Parquet format. The export is performed asynchronously and non-blocking, allowing writes to continue during the export process. + +**Key features:** + +- Export entire tables or query results +- Configurable Parquet export options (compression, row group size, etc.) +- Non-blocking exports - writes continue during export +- Supports partitioned exports matching table partitioning +- Configurable size limits + +### Export root + +The export destination is relative to `cairo.sql.copy.export.root` (defaults to `root_directory/export`). You can configure this through the [configuration settings](/docs/configuration/). + +### Logs + +`COPY-TO` reports its progress through a system table, `sys.copy_export_log`. This contains the following information: + + +| Column name | Data type | Notes | +|--------------------|-----------|-------------------------------------------------------------------| +| ts | timestamp | The log event timestamp | +| id | string | Export id | +| table_name | symbol | Source table name (or 'query' for subquery exports) | +| export_path | symbol | The destination directory path | +| num_exported_files | int | The number of files exported | +| phase | symbol | The export execution phase | +| status | symbol | The event status: started, finished, failed, cancelled | +| message | VARCHAR | Information about the current phase/step | +| errors | long | Error code(s) | + +Log table row retention is configurable through `cairo.sql.copy.log.retention.days` setting, and is three days by default. + +`COPY TO` returns an `id` value from `sys.copy_export_log` to track the export progress. + +### Options + +All export options are specified using the `WITH` clause after the `TO` destination path. + +- `FORMAT PARQUET`: Specifies Parquet as the export format (currently the only supported format). Default: `PARQUET`. +- `PARTITION_BY `: Partition the export by time unit. Valid values: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `YEAR`. Default: matches the source table's partitioning, or `NONE` for queries. +- `COMPRESSION_CODEC `: Parquet compression algorithm. Valid values: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`. 
Default: `LZ4_RAW`. +- `COMPRESSION_LEVEL `: Compression level (codec-specific). Higher values mean better compression but slower speed. Default: varies by codec. +- `ROW_GROUP_SIZE `: Number of rows per Parquet row group. Larger values improve compression but increase memory usage. Default: `100000`. +- `DATA_PAGE_SIZE `: Size of data pages within row groups in bytes. Default: `1048576` (1MB). +- `STATISTICS_ENABLED true/false`: Enable Parquet column statistics for better query performance. Default: `true`. +- `PARQUET_VERSION `: Parquet format version. Valid values: `1` (v1.0) or `2` (v2.0). Default: `2`. +- `RAW_ARRAY_ENCODING true/false`: Use raw encoding for arrays (compatibility for parquet readers). Default: `true`. + +## Examples #### Export entire table to Parquet @@ -282,6 +275,8 @@ Track export progress: SELECT * FROM sys.copy_export_log WHERE id = '7f3a9c2e1b456789'; ``` +This will copy all of the partitions and convert them individually to parquet. + #### Export query results to Parquet Export the results of a query: @@ -292,6 +287,8 @@ TO 'btc_today' WITH FORMAT PARQUET; ``` +This will export the result set to a single parquet file. + #### Export with partitioning Export data partitioned by day: @@ -318,18 +315,7 @@ WITH DATA_PAGE_SIZE 2097152; ``` -#### Export with size limits - -Limit export file size to create multiple files: - -```questdb-sql title="Export with 1GB file size limit" -COPY trades TO 'trades_chunked' -WITH - FORMAT PARQUET - SIZE_LIMIT 1GB; -``` - -When the export exceeds 1GB, QuestDB creates multiple numbered files: `trades_chunked_0.parquet`, `trades_chunked_1.parquet`, etc. +This allows you to tune each export request to your particular needs. #### Export aggregated data diff --git a/static/images/docs/diagrams/.railroad b/static/images/docs/diagrams/.railroad index 1fc0854fc..ad80c73f4 100644 --- a/static/images/docs/diagrams/.railroad +++ b/static/images/docs/diagrams/.railroad @@ -141,7 +141,7 @@ copy ::= 'COPY' ( id 'CANCEL' | tableName 'FROM' fileName ('WITH' ('HEADER' (true|false) | 'TIMESTAMP' columnName | 'DELIMITER' delimiter | 'FORMAT' format | 'PARTITION BY' ('NONE'|'YEAR'|'MONTH'|'DAY'|'HOUR') | 'ON ERROR' ('SKIP_ROW'|'SKIP_COLUMN'|'ABORT')))? - | (tableName | '(' selectQuery ')') 'TO' destinationPath ('WITH' ('FORMAT' 'PARQUET' | 'PARTITION BY' ('NONE'|'HOUR'|'DAY'|'WEEK'|'MONTH'|'YEAR') | 'SIZE_LIMIT' sizeValue | 'COMPRESSION_CODEC' ('UNCOMPRESSED'|'SNAPPY'|'GZIP'|'LZ4'|'ZSTD'|'LZ4_RAW') | 'COMPRESSION_LEVEL' number | 'ROW_GROUP_SIZE' number | 'DATA_PAGE_SIZE' number | 'STATISTICS_ENABLED' (true|false) | 'PARQUET_VERSION' ('1'|'2') | 'RAW_ARRAY_ENCODING' (true|false)))? + | (tableName | '(' selectQuery ')') 'TO' destinationPath ('WITH' ('FORMAT' 'PARQUET' | 'PARTITION_BY' ('NONE'|'HOUR'|'DAY'|'WEEK'|'MONTH'|'YEAR') | 'COMPRESSION_CODEC' ('UNCOMPRESSED'|'SNAPPY'|'GZIP'|'LZ4'|'ZSTD'|'LZ4_RAW'|'BROTLI'|'LZO') | 'COMPRESSION_LEVEL' number | 'ROW_GROUP_SIZE' number | 'DATA_PAGE_SIZE' number | 'STATISTICS_ENABLED' (true|false) | 'PARQUET_VERSION' ('1'|'2') | 'RAW_ARRAY_ENCODING' (true|false)))? 
) createTableTimestamp From 86903dab900a3701025ee9bad06d6a8ca1e3e250 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 8 Oct 2025 11:07:52 +0100 Subject: [PATCH 10/31] show columns --- documentation/reference/sql/show.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/documentation/reference/sql/show.md b/documentation/reference/sql/show.md index 245b653d8..463a546fe 100644 --- a/documentation/reference/sql/show.md +++ b/documentation/reference/sql/show.md @@ -52,17 +52,17 @@ SHOW TABLES; ### SHOW COLUMNS -```questdb-sql -SHOW COLUMNS FROM my_table; -``` - -| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | designated | -| ------ | --------- | ------- | ------------------ | ------------ | -------------- | ---------- | -| symb | SYMBOL | true | 1048576 | false | 256 | false | -| price | DOUBLE | false | 0 | false | 0 | false | -| ts | TIMESTAMP | false | 0 | false | 0 | true | -| s | STRING | false | 0 | false | 0 | false | +```questdb-sql title="show columns" demo +SHOW COLUMNS FROM trades; +``` +| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | symbolTableSize | designated | upsertKey | +| --------- | --------- | ------- | ------------------ | ------------ | -------------- | --------------- | ---------- | --------- | +| symbol | SYMBOL | false | 0 | true | 256 | 42 | false | false | +| side | SYMBOL | false | 0 | true | 256 | 2 | false | false | +| price | DOUBLE | false | 0 | false | 0 | 0 | false | false | +| amount | DOUBLE | false | 0 | false | 0 | 0 | false | false | +| timestamp | TIMESTAMP | false | 0 | false | 0 | 0 | true | false | ### SHOW CREATE TABLE From cc8b400b6d9ea928c523db4d44c65f0ae789d29d Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Wed, 8 Oct 2025 12:05:37 +0100 Subject: [PATCH 11/31] asof first pass --- documentation/concept/sql-optimizer-hints.md | 84 ++++++++++++++++---- documentation/reference/sql/asof-join.md | 1 + 2 files changed, 68 insertions(+), 17 deletions(-) diff --git a/documentation/concept/sql-optimizer-hints.md b/documentation/concept/sql-optimizer-hints.md index 492988721..0f26b456f 100644 --- a/documentation/concept/sql-optimizer-hints.md +++ b/documentation/concept/sql-optimizer-hints.md @@ -28,28 +28,25 @@ Hints are designed to be a safe optimization mechanism: ----- -## Binary Search Optimizations and Hints +## Time-series JOIN hints Since QuestDB 9.0.0, QuestDB's optimizer defaults to using a binary search-based strategy for **`ASOF JOIN`** and **`LT JOIN`** (Less Than Join) queries that have a filter on the right-hand side (the joined or lookup table). This approach is generally faster as it avoids a full table scan. However, for some specific data distributions and filter conditions, the previous strategy of performing a parallel full -table scan can be more performant. For these cases, QuestDB provides hints to *avoid* the default binary search. +table scan can be more performant. For these cases, QuestDB provides hints to modify the default search strategy. -### AVOID\_ASOF\_BINARY\_SEARCH and AVOID\_LT\_BINARY\_SEARCH +The `asof`-prefixed hints will also apply to `lt` joins. 
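For example, the `asof`-prefixed hints described below attach to an `LT JOIN` in exactly the same way (a sketch; the `orders` and `md` aliases are illustrative):

```questdb-sql
SELECT /*+ asof_linear_search(orders md) */
  orders.timestamp, orders.symbol, orders.price
FROM orders
LT JOIN md ON (symbol);
```
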
-These hints instruct the optimizer to revert to the pre-9.0 execution strategy for `ASOF JOIN` and `LT JOIN` queries, +### asof_linear_search(left_table right_table) + +This hint instructs the optimizer to revert to the pre-9.0 execution strategy for `ASOF JOIN` and `LT JOIN` queries, respectively. This older strategy involves performing a full parallel scan on the joined table to apply filters *before* executing the join. -- `AVOID_ASOF_BINARY_SEARCH(left_table_alias right_table_alias)`: Use for **`ASOF JOIN`** queries. -- `AVOID_LT_BINARY_SEARCH(table_alias)`: Use for **`LT JOIN`** queries. - - - -```questdb-sql title="Avoiding binary search for an ASOF join" -SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */ +```questdb-sql title="Using linear search for an ASOF join" +SELECT /*+ asof_linear_search(orders md) */ orders.ts, orders.price, md.md_ts, md.bid, md.ask FROM orders ASOF JOIN ( @@ -68,20 +65,20 @@ The **default strategy (binary search)** works as follows: evaluating the filter condition until a match is found. -The **hinted strategy (`AVOID_..._BINARY_SEARCH`)** forces this plan: +The hinted strategy forces this plan: 1. Apply the filter to the *entire* joined table in parallel. 2. Join the filtered (and now much smaller) result set to the main table. -#### When to use the AVOID hints +#### When to use it -You should only need these hints in a specific scenario: when the filter on your joined table is **highly selective**. +You should only need this hint in a specific scenario: when the filter on your joined table is **highly selective**. A filter is considered highly selective if it eliminates a very large percentage of rows (e.g., more than 95%). In this situation, the hinted strategy can be faster because: @@ -95,6 +92,53 @@ scan may have to check many rows before finding one that satisfies the filter co For most other cases, especially with filters that have low selectivity or when the joined table data is not in memory ("cold"), the default binary search is significantly faster as it minimizes I/O operations. +### `asof_index_search(left_table right_table)` + +This hint instructs the optimizer to use a symbol's index to skip over any time partitions where the symbol does not appear. + +In partitions where the symbol does appear, there will still be some scanning to locate the matching rows. + +`asof_index_search(left_table_alias right_table_alias)` + +```questdb-sql title="Using index search for an ASOF join" +SELECT /*+ asof_index_search(orders md) */ + orders.timestamp, orders.symbol, orders.price +FROM orders +ASOF JOIN (md) ON (symbol); +``` + +#### When to use it + +This hint can be effective when your symbol is rare, meaning the index is highly selective, rarely appearing in any of +your partitions. + +If the symbol appears frequently, then this hint may cause a slower execution plan than the default. + + +### `asof_memoized_search` + +This hint instructs the optimizer to memoize (remember) rows it has previously seen, and use this information to avoid +repeated re-scanning of data. + +Imagine a linear scan. For each symbol, we must scan forward to find the next available row. This symbol could be far away. +When the matching row is located, we store it, pick the next symbol, and repeat this scan. This causes repeated re-reading of data. + +Instead, the query engine will check each row for a matching symbol, recording the locations. Then when the symbol is next +processed, the memoized rows are checked (look-ahead) and the cursor skips forward. 
+
+```questdb-sql title="Using memoized search for an ASOF join"
+SELECT /*+ asof_memoized_search(orders md) */
+    orders.timestamp, orders.symbol, orders.price
+FROM orders
+ASOF JOIN (md) ON (symbol);
+```
+
+#### When to use it
+
+If your table has a very skewed symbol distribution, this hint can dramatically speed up the query. A typical skew
+would be a few symbols with very large row counts, and many symbols with very small row counts. This hint works well
+for Zipfian-distributed data.
+
-----

### Execution Plan Observation
@@ -133,10 +177,10 @@ SelectedRecord

#### Hinted Execution Plan (Full Scan)

-When you use the `AVOID_ASOF_BINARY_SEARCH` hint, the plan changes.
+When you use the `asof_linear_search` hint, the plan changes.

-```questdb-sql title="Observing execution plan with the AVOID hint" demo
-EXPLAIN SELECT /*+ AVOID_ASOF_BINARY_SEARCH(core_price market_data) */
+```questdb-sql title="Observing execution plan with the linear search hint" demo
+EXPLAIN SELECT /*+ asof_linear_search(core_price market_data) */
 *
FROM core_price
ASOF JOIN market_data
@@ -161,3 +205,9 @@ SelectedRecord
                Frame forward scan on: market_data
```

+## Deprecated hints
+
+- `avoid_asof_binary_search`
+  - superseded by `asof_linear_search`
+- `avoid_lt_binary_search`
+  - superseded by `asof_linear_search`
\ No newline at end of file
diff --git a/documentation/reference/sql/asof-join.md b/documentation/reference/sql/asof-join.md
index d17158b2a..cf0f41bc4 100644
--- a/documentation/reference/sql/asof-join.md
+++ b/documentation/reference/sql/asof-join.md
@@ -87,6 +87,7 @@ FROM
 ```
+ | timestamp | symbol | best_bid_price | | --------------------------- | ------ | -------------- | | 2025-09-16T14:00:00.006068Z | USDJPY | 145.67 | From 8ebd42c8b70cd5bbb04911d1bf3349c5b9f88cfa Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Mon, 27 Oct 2025 14:12:26 +0000 Subject: [PATCH 12/31] correct table --- documentation/reference/function/meta.md | 44 +++++++++++++----------- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index 6893f06dd..3b846a959 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -603,36 +603,40 @@ Returns metadata on `COPY TO` export operations for the last three days, includi - `ts` - timestamp of the log event - `id` - export identifier that can be used to track export progress -- `table` - source table name (or 'query' for subquery exports) -- `destination` - destination directory path for the export -- `format` - export format (currently only 'PARQUET') -- `status` - event status: 'started', 'finished', 'failed', or 'cancelled' -- `message` - error message when status is 'failed' -- `rows_exported` - total number of exported rows (shown in final log row) -- `partition` - partition name for partitioned exports (null for non-partitioned) +- `table_name` - source table name (or 'query' for subquery exports) +- `export_path` - destination directory path for the export +- `phase` - progress markers for each export step +- `status` - event status for each phase, for example 'started', 'finished' +- `message` - additional text (important for error rows) +- `errors` - error number or flag **Examples:** ```questdb-sql -SELECT * FROM copy_export_log(); +COPY trades TO 'trades' WITH FORMAT PARQUET; ``` -| ts | id | table | destination | format | status | message | rows_exported | partition | -| --------------------------- | ---------------- | ------ | ------------- | ------- | -------- | ------- | ------------- | ---------- | -| 2024-10-01T14:23:15.123456Z | 7f3a9c2e1b456789 | trades | trades_export | PARQUET | started | | 0 | null | -| 2024-10-01T14:25:42.987654Z | 7f3a9c2e1b456789 | trades | trades_export | PARQUET | finished | | 1000000 | null | +| id | +|------------------| +| 38b2b45f28aa822e | -```questdb-sql title="Track specific export" -SELECT * FROM copy_export_log() WHERE id = '7f3a9c2e1b456789'; -``` +Checking the log: -```questdb-sql title="View recent failed exports" -SELECT ts, table, destination, message -FROM copy_export_log() -WHERE status = 'failed' -ORDER BY ts DESC; +```questdb-sql +SELECT * FROM copy_export_log() WHERE id = '38b2b45f28aa822e'; ``` +| ts | id | table_name | export_path | num_exported_files | phase | status | message | errors | +|-----------------------------|------------------|------------|---------------------------------|--------------------|-----------------------|----------|---------|--------| +| 2025-10-27T14:07:20.513119Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | started | queued | 0 | +| 2025-10-27T14:07:20.541779Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | finished | 0 | +| 2025-10-27T14:07:20.542552Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | started | null | 0 | +| 2025-10-27T14:07:20.658111Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | finished | null | 0 | +| 2025-10-27T14:07:20.658185Z | 38b2b45f28aa822e | trades | null | null | move_files | 
started | null | 0 | +| 2025-10-27T14:07:20.670200Z | 38b2b45f28aa822e | trades | null | null | move_files | finished | null | 0 | +| 2025-10-27T14:07:20.670414Z | 38b2b45f28aa822e | trades | ///export/trades/ | 26 | success | finished | null | 0 | + + ## flush_query_cache() `flush_query_cache' invalidates cached query execution plans. From 3ba5815194d44106a3a5b68f418c33eb564b5e13 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Mon, 27 Oct 2025 15:04:00 +0000 Subject: [PATCH 13/31] doc fixes --- documentation/reference/api/rest.md | 21 +++++++++++---------- documentation/reference/function/meta.md | 9 +++++---- 2 files changed, 16 insertions(+), 14 deletions(-) diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md index 0e350a830..534a34f29 100644 --- a/documentation/reference/api/rest.md +++ b/documentation/reference/api/rest.md @@ -601,16 +601,17 @@ returned in a tabular form to be saved and reused as opposed to JSON. When `fmt=parquet`, the following additional parameters are supported: -| Parameter | Required | Default | Description | -|:---------------------|:---------|:----------------|:----------------------------------------------------------------------------------------------------| -| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. | -| `compression_codec` | No | `LZ4_RAW` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`, `BROTLI`, `LZO`. | -| `compression_level` | No | Codec-dependent | Compression level (codec-specific). Higher values = better compression but slower. | -| `row_group_size` | No | `100000` | Number of rows per Parquet row group. | -| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). | -| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | -| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). | -| `raw_array_encoding` | No | `true` | Use raw encoding for arrays: `true` or `false`. | +| Parameter | Required | Default | Description | +|:---------------------|:---------|:----------|:----------------------------------------------------------------------------------------------------| +| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. | +| `compression_codec` | No | `ZSTD` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`, `BROTLI`, `LZO`. | +| `compression_level` | No | `9` | Compression level (codec-specific). Higher values = better compression but slower. | +| `row_group_size` | No | `100000` | Number of rows per Parquet row group. | +| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). | +| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | +| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). | +| `raw_array_encoding` | No | `true` | Use raw encoding for arrays: `true` or `false`. | +| `rmode` | No | `false` | Set HTTP response mode: `nodelay` or not sent | The parameters must be URL encoded. 
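+
+As a sketch, these options can be combined on a single `/exp` request. The `trades` table here is assumed to exist,
+and `curl -G --data-urlencode` takes care of the URL encoding:
+
+```bash
+curl -G "http://localhost:9000/exp" \
+  --data-urlencode "query=SELECT * FROM trades" \
+  --data-urlencode "fmt=parquet" \
+  --data-urlencode "compression_codec=ZSTD" \
+  --data-urlencode "compression_level=9" \
+  --output trades.parquet
+```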
diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index 3b846a959..5629f2790 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -589,22 +589,23 @@ If you want to re-read metadata for all user tables, simply use an asterisk: SELECT hydrate_table_metadata('*'); ``` -## copy_export_log +## sys.copy_export_log -`copy_export_log()` or `sys.copy_export_log` returns the export log for `COPY TO` operations. +`sys.copy_export_log` is a pseudo-table containing the export log for `COPY TO` operations. **Arguments:** -- `copy_export_log()` does not require arguments. +- `sys.copy_export_log` does not require arguments. **Return value:** -Returns metadata on `COPY TO` export operations for the last three days, including columns such as: +Returns metadata on `COPY TO` export operations for the last three days, including the columns: - `ts` - timestamp of the log event - `id` - export identifier that can be used to track export progress - `table_name` - source table name (or 'query' for subquery exports) - `export_path` - destination directory path for the export +- `num_exported_files` - how many output files were written - `phase` - progress markers for each export step - `status` - event status for each phase, for example 'started', 'finished' - `message` - additional text (important for error rows) From 43011872adc9cf44bd379f9fa4623a8b2165d6d1 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Mon, 27 Oct 2025 15:11:39 +0000 Subject: [PATCH 14/31] simplify path --- documentation/reference/function/meta.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index 5629f2790..38240d942 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -627,15 +627,15 @@ Checking the log: SELECT * FROM copy_export_log() WHERE id = '38b2b45f28aa822e'; ``` -| ts | id | table_name | export_path | num_exported_files | phase | status | message | errors | -|-----------------------------|------------------|------------|---------------------------------|--------------------|-----------------------|----------|---------|--------| -| 2025-10-27T14:07:20.513119Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | started | queued | 0 | -| 2025-10-27T14:07:20.541779Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | finished | 0 | -| 2025-10-27T14:07:20.542552Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | started | null | 0 | -| 2025-10-27T14:07:20.658111Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | finished | null | 0 | -| 2025-10-27T14:07:20.658185Z | 38b2b45f28aa822e | trades | null | null | move_files | started | null | 0 | -| 2025-10-27T14:07:20.670200Z | 38b2b45f28aa822e | trades | null | null | move_files | finished | null | 0 | -| 2025-10-27T14:07:20.670414Z | 38b2b45f28aa822e | trades | ///export/trades/ | 26 | success | finished | null | 0 | +| ts | id | table_name | export_path | num_exported_files | phase | status | message | errors | +|-----------------------------|------------------|------------|--------------------------|--------------------|-----------------------|----------|---------|--------| +| 2025-10-27T14:07:20.513119Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | started | queued | 0 | +| 2025-10-27T14:07:20.541779Z | 
38b2b45f28aa822e | trades | null | null | wait_to_run | finished | 0 | +| 2025-10-27T14:07:20.542552Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | started | null | 0 | +| 2025-10-27T14:07:20.658111Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | finished | null | 0 | +| 2025-10-27T14:07:20.658185Z | 38b2b45f28aa822e | trades | null | null | move_files | started | null | 0 | +| 2025-10-27T14:07:20.670200Z | 38b2b45f28aa822e | trades | null | null | move_files | finished | null | 0 | +| 2025-10-27T14:07:20.670414Z | 38b2b45f28aa822e | trades | //export/trades/ | 26 | success | finished | null | 0 | ## flush_query_cache() From b9ad967a464f1a7647ae68b98329b43223e2f98a Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Mon, 27 Oct 2025 15:49:19 +0000 Subject: [PATCH 15/31] see if this fixes --- documentation/reference/function/meta.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index 38240d942..d21505600 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -627,15 +627,15 @@ Checking the log: SELECT * FROM copy_export_log() WHERE id = '38b2b45f28aa822e'; ``` -| ts | id | table_name | export_path | num_exported_files | phase | status | message | errors | -|-----------------------------|------------------|------------|--------------------------|--------------------|-----------------------|----------|---------|--------| -| 2025-10-27T14:07:20.513119Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | started | queued | 0 | -| 2025-10-27T14:07:20.541779Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | finished | 0 | -| 2025-10-27T14:07:20.542552Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | started | null | 0 | -| 2025-10-27T14:07:20.658111Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | finished | null | 0 | -| 2025-10-27T14:07:20.658185Z | 38b2b45f28aa822e | trades | null | null | move_files | started | null | 0 | -| 2025-10-27T14:07:20.670200Z | 38b2b45f28aa822e | trades | null | null | move_files | finished | null | 0 | -| 2025-10-27T14:07:20.670414Z | 38b2b45f28aa822e | trades | //export/trades/ | 26 | success | finished | null | 0 | +| ts | id | table_name | export_path | num_exported_files | phase | status | message | errors | +|-----------------------------|------------------|------------|--------------------------------|--------------------|-----------------------|----------|---------|--------| +| 2025-10-27T14:07:20.513119Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | started | queued | 0 | +| 2025-10-27T14:07:20.541779Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | finished | 0 | +| 2025-10-27T14:07:20.542552Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | started | null | 0 | +| 2025-10-27T14:07:20.658111Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | finished | null | 0 | +| 2025-10-27T14:07:20.658185Z | 38b2b45f28aa822e | trades | null | null | move_files | started | null | 0 | +| 2025-10-27T14:07:20.670200Z | 38b2b45f28aa822e | trades | null | null | move_files | finished | null | 0 | +| 2025-10-27T14:07:20.670414Z | 38b2b45f28aa822e | trades | /<dbroot>/export/trades/ | 26 | success | finished | null | 0 | ## flush_query_cache() From 4e48bcd087fe799440f06c844bb8c097daccdc3d Mon Sep 17 00:00:00 
2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 10:08:27 +0000 Subject: [PATCH 16/31] Apply suggestions from code review --- documentation/reference/sql/copy.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md index 288ca94f5..0005f69cb 100644 --- a/documentation/reference/sql/copy.md +++ b/documentation/reference/sql/copy.md @@ -28,7 +28,7 @@ The `COPY` command has two modes of operation: 1. **Import mode**: `COPY table_name FROM 'file.csv'`, copying data from a delimited text file into QuestDB. 2. **Export mode**: `COPY table_name TO 'output_directory'` or `COPY (query) TO 'output_directory'`, exporting data to Parquet files. -### Import mode (COPY-FROM) +## Import mode (COPY-FROM) Copies tables from a delimited text file saved in the defined root directory into QuestDB. `COPY` has the following import modes: @@ -202,7 +202,7 @@ SELECT * FROM 'sys.text_import_log' WHERE id = '55ca24e5ba328050' LIMIT -1; | 2022-08-03T14:04:42.268502Z | 55ca24e5ba328050 | weather | weather.csv | null | cancelled | import cancelled [phase=partition_import, msg=`Cancelled`] | 0 | 0 | 0 | -### Export mode (COPY-TO) +## Export mode (COPY-TO) Exports data from a table or query result set to Parquet format. The export is performed asynchronously and non-blocking, allowing writes to continue during the export process. From 2e411706286b25483c26f29f15022c8df0e8e599 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 11:26:32 +0000 Subject: [PATCH 17/31] add note about rw requirement --- documentation/reference/api/rest.md | 8 ++++++++ documentation/reference/sql/copy.md | 8 ++++++++ 2 files changed, 16 insertions(+) diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md index 534a34f29..8c706b549 100644 --- a/documentation/reference/api/rest.md +++ b/documentation/reference/api/rest.md @@ -599,6 +599,14 @@ returned in a tabular form to be saved and reused as opposed to JSON. #### Parquet Export Parameters +:::warning + +Parquet exports currently require writing interim data to disk, and therefore must be run on **read-write instances only**. + +This limitation will be removed in future. + +::: + When `fmt=parquet`, the following additional parameters are supported: | Parameter | Required | Default | Description | diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md index 288ca94f5..f0f8ea7e2 100644 --- a/documentation/reference/sql/copy.md +++ b/documentation/reference/sql/copy.md @@ -216,6 +216,14 @@ Exports data from a table or query result set to Parquet format. The export is p ### Export root +:::warning + +Parquet exports currently require writing interim data to disk, and therefore must be run on **read-write instances only**. + +This limitation will be removed in future. + +::: + The export destination is relative to `cairo.sql.copy.export.root` (defaults to `root_directory/export`). You can configure this through the [configuration settings](/docs/configuration/). 
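+
+For example, a minimal sketch of overriding the export root in `server.conf` (the path shown is illustrative):
+
+```shell title="Example"
+cairo.sql.copy.export.root=/var/lib/questdb/export
+```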
### Logs From 9375dc740006258b44c6d7a9b93341bcb555b224 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 14:00:50 +0000 Subject: [PATCH 18/31] addressing comments --- .../configuration-utils/_cairo.config.json | 24 +++++++++++ .../_parquet-export.config.json | 6 +++ documentation/configuration.md | 43 ++++++++++++++++--- documentation/reference/api/rest.md | 22 +++++----- .../reference/function/aggregation.md | 43 ------------------- documentation/reference/function/meta.md | 25 +++++++---- documentation/reference/sql/copy.md | 38 ++++++++++------ 7 files changed, 119 insertions(+), 82 deletions(-) create mode 100644 documentation/configuration-utils/_parquet-export.config.json diff --git a/documentation/configuration-utils/_cairo.config.json b/documentation/configuration-utils/_cairo.config.json index 4611663ee..ac40302db 100644 --- a/documentation/configuration-utils/_cairo.config.json +++ b/documentation/configuration-utils/_cairo.config.json @@ -466,5 +466,29 @@ "cairo.partition.encoder.parquet.raw.array.encoding.enabled": { "default": "false", "description": "determines whether to export arrays in QuestDB-native binary format (true, less compatible) or Parquet-native format (false, more compatible)." + }, + "cairo.partition.encoder.parquet.version": { + "default": 1, + "description": "Output parquet version to use for parquet-encoded partitions. Can be 1 or 2." + }, + "cairo.partition.encoder.parquet.statistics.enabled": { + "default": true, + "description": "Controls whether or not statistics are included in parquet-encoded partitions." + }, + "cairo.partition.encoder.parquet.compression.codec": { + "default": "ZSTD", + "description": "Sets the default compression codec for parquet-encoded partitions. Alternatives include `LZ4_RAW`, `SNAPPY`." + }, + "cairo.partition.encoder.parquet.compression.level": { + "default": "9 (ZSTD), 0 (otherwise)", + "description": "Sets the default compression level for parquet-encoded partitions. Dependent on underlying compression codec." + }, + "cairo.partition.encoder.parquet.row.group.size": { + "default": "100000", + "description": "Sets the default row-group size for parquet-encoded partitions." + }, + "cairo.partition.encoder.parquet.data.page.size": { + "default": "1048576", + "description": "Sets the default page size for parquet-encoded partitions." } } diff --git a/documentation/configuration-utils/_parquet-export.config.json b/documentation/configuration-utils/_parquet-export.config.json new file mode 100644 index 000000000..d885dafe5 --- /dev/null +++ b/documentation/configuration-utils/_parquet-export.config.json @@ -0,0 +1,6 @@ +{ + "cairo.sql.copy.export.root": { + "default": "export", + "description": "Root directory for parquet exports via `COPY-TO` SQL This path must not overlap with other directory (e.g. db, conf) of running instance, otherwise export may delete or overwrite existing files. Relative paths are resolved against the server root directory." 
+ }
+}
\ No newline at end of file
diff --git a/documentation/configuration.md b/documentation/configuration.md
index b84261150..4b050cff3 100644
--- a/documentation/configuration.md
+++ b/documentation/configuration.md
@@ -10,6 +10,7 @@ import cairoConfig from "./configuration-utils/\_cairo.config.json"
 import parallelSqlConfig from "./configuration-utils/\_parallel-sql.config.json"
 import walConfig from "./configuration-utils/\_wal.config.json"
 import csvImportConfig from "./configuration-utils/\_csv-import.config.json"
+import parquetExportConfig from "./configuration-utils/\_parquet-export.config.json"
 import postgresConfig from "./configuration-utils/\_postgres.config.json"
 import tcpConfig from "./configuration-utils/\_tcp.config.json"
 import udpConfig from "./configuration-utils/\_udp.config.json"
@@ -168,17 +169,19 @@ applying WAL data to the table storage:

-### CSV import
+### COPY settings
+
+#### Import

This section describes configuration settings for using `COPY` to import large
-CSV files.
+CSV files, or to export Parquet files.

-Settings for `COPY`:
+Settings for `COPY FROM` (import):

-#### CSV import configuration for Docker
+**CSV import configuration for Docker**

For QuestDB instances using Docker:
@@ -222,6 +225,36 @@ Where:

It is important that the two path are identical
(`/var/lib/questdb/questdb_import` in the example).

+
+### Export
+
+
+
+Parquet export is also generally impacted by query execution and parquet conversion parameters.
+
+If not overridden, the following default settings will be used.
+
+
+
+
+
+
 ### Parallel SQL execution

This section describes settings that can affect the level of parallelism during
diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md
index 8c706b549..f97490ced 100644
--- a/documentation/reference/api/rest.md
+++ b/documentation/reference/api/rest.md
@@ -609,17 +609,17 @@ When `fmt=parquet`, the following additional parameters are supported:

-| Parameter | Required | Default | Description |
-|:---------------------|:---------|:----------|:----------------------------------------------------------------------------------------------------|
-| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. |
-| `compression_codec` | No | `ZSTD` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`, `BROTLI`, `LZO`. |
-| `compression_level` | No | `9` | Compression level (codec-specific). Higher values = better compression but slower. |
-| `row_group_size` | No | `100000` | Number of rows per Parquet row group. |
-| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). |
-| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. |
-| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). |
-| `raw_array_encoding` | No | `true` | Use raw encoding for arrays: `true` or `false`. |
-| `rmode` | No | `false` | Set HTTP response mode: `nodelay` or not sent |
+| Parameter | Required | Default | Description |
+|:---------------------|:---------|:----------|:-------------------------------------------------------------------------------------------------------------------|
+| `partition_by` | No | `NONE` | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. |
+| `compression_codec` | No | `ZSTD` | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`, `BROTLI`, `LZO`. 
| +| `compression_level` | No | `9` | Compression level (codec-specific). Higher values = better compression but slower. | +| `row_group_size` | No | `100000` | Number of rows per Parquet row group. | +| `data_page_size` | No | `1048576` | Size of data pages in bytes (default 1MB). | +| `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | +| `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). | +| `raw_array_encoding` | No | `false` | Use raw encoding for arrays: `true` (lighter-weight, less compatible) or `false` (heavier-weight, more compatible) | +| `rmode` | No | `false` | Set HTTP response mode: `nodelay` or not sent | The parameters must be URL encoded. diff --git a/documentation/reference/function/aggregation.md b/documentation/reference/function/aggregation.md index 794161ff0..2919641f8 100644 --- a/documentation/reference/function/aggregation.md +++ b/documentation/reference/function/aggregation.md @@ -923,49 +923,6 @@ FROM (SELECT rnd_double() a FROM long_sequence(100)); | :--------------- | | 49.5442334742831 | -## mode - -`mode(value)` - calculates the mode (most frequent) value out of a particular dataset. - -For `mode(B)`, if there are an equal number of `true` and `false` values, `true` will be returned as a tie-breaker. - -For other modes, if there are equal mode values, the returned value will be whichever the code identifies first. - -To make the result deterministic, you must enforce an underlying sort order. - -#### Parameters - -- `value` - one of (LONG, DOUBLE, BOOLEAN, STRING, VARCHAR, SYMBOL) - -#### Return value - -Return value type is the same as the type of the input `value`. - - -#### Examples - -With this dataset: - -| symbol | value | -|-----------|-------| -| A | alpha | -| A | alpha | -| A | alpha | -| A | omega | -| B | beta | -| B | beta | -| B | gamma | - -```questdb-sql -SELECT symbol, mode(value) as mode FROM dataset; -``` - -| symbol | mode | -|--------|-------| -| A | alpha | -| B | beta | - - ## stddev / stddev_samp `stddev_samp(value)` - Calculates the sample standard deviation of a set of diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index d21505600..c005fd3ba 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -601,15 +601,22 @@ SELECT hydrate_table_metadata('*'); Returns metadata on `COPY TO` export operations for the last three days, including the columns: -- `ts` - timestamp of the log event -- `id` - export identifier that can be used to track export progress -- `table_name` - source table name (or 'query' for subquery exports) -- `export_path` - destination directory path for the export +- `ts` - timestamp of the log event. +- `id` - export identifier that can be used to track export progress. +- `table_name` - source table name (or 'query' for subquery exports). +- `export_path` - destination directory path for the export. - `num_exported_files` - how many output files were written -- `phase` - progress markers for each export step -- `status` - event status for each phase, for example 'started', 'finished' -- `message` - additional text (important for error rows) -- `errors` - error number or flag +- `phase` - progress markers for each export step. The phases are: + - `wait_to_run`: queued for execution. + - `populating_temp_table`: building temporary table with materialized query result. + - `converting_partitions`: converting temporary table partitions to parquet. 
+ - `move_files`: copying converted files to export directory. + - `dropping_temp_table`: cleaning up temporary data. + - `sending_data`: streaming network response. + - `success`: completion of export task. +- `status` - event status for each phase, for example 'started', 'finished'. +- `message` - additional text (important for error rows). +- `errors` - error number or flag. **Examples:** @@ -624,7 +631,7 @@ COPY trades TO 'trades' WITH FORMAT PARQUET; Checking the log: ```questdb-sql -SELECT * FROM copy_export_log() WHERE id = '38b2b45f28aa822e'; +SELECT * FROM sys.copy_export_log WHERE id = '38b2b45f28aa822e'; ``` | ts | id | table_name | export_path | num_exported_files | phase | status | message | errors | diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md index b3737769c..28c6ceaf7 100644 --- a/documentation/reference/sql/copy.md +++ b/documentation/reference/sql/copy.md @@ -231,18 +231,26 @@ The export destination is relative to `cairo.sql.copy.export.root` (defaults to `COPY-TO` reports its progress through a system table, `sys.copy_export_log`. This contains the following information: -| Column name | Data type | Notes | -|--------------------|-----------|-------------------------------------------------------------------| -| ts | timestamp | The log event timestamp | -| id | string | Export id | -| table_name | symbol | Source table name (or 'query' for subquery exports) | -| export_path | symbol | The destination directory path | -| num_exported_files | int | The number of files exported | -| phase | symbol | The export execution phase | -| status | symbol | The event status: started, finished, failed, cancelled | -| message | VARCHAR | Information about the current phase/step | -| errors | long | Error code(s) | - +| Column name | Data type | Notes | +|--------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------| +| ts | timestamp | The log event timestamp | +| id | string | Export id | +| table_name | symbol | Source table name (or 'query' for subquery exports) | +| export_path | symbol | The destination directory path | +| num_exported_files | int | The number of files exported | +| phase | symbol | The export execution phase: none, wait_to_run, populating_temp_table, converting_partitions, move_files, dropping_temp_table, sending_data, success | +| status | symbol | The event status: started, finished, failed, cancelled | +| message | VARCHAR | Information about the current phase/step | +| errors | long | Error code(s) | + +- `wait_to_run`: queued for execution. + - `populating_temp_table`: building temporary table with materialized query result. + - `converting_partitions`: converting temporary table partitions to parquet. + - `move_files`: copying converted files to export directory. + - `dropping_temp_table`: cleaning up temporary data. + - `sending_data`: streaming network response. + - `success`: completion of export task. + - Log table row retention is configurable through `cairo.sql.copy.log.retention.days` setting, and is three days by default. `COPY TO` returns an `id` value from `sys.copy_export_log` to track the export progress. @@ -283,7 +291,9 @@ Track export progress: SELECT * FROM sys.copy_export_log WHERE id = '7f3a9c2e1b456789'; ``` -This will copy all of the partitions and convert them individually to parquet. 
+This will copy all of the partitions from `trades`, and convert them individually to parquet. + +If partitioning of `NONE` is used, then a single parquet file will be generated instead. #### Export query results to Parquet @@ -304,7 +314,7 @@ Export data partitioned by day: ```questdb-sql title="Export with daily partitions" COPY trades TO 'trades_daily' WITH FORMAT PARQUET -PARTITION BY DAY; +PARTITION_BY DAY; ``` This creates separate Parquet files for each day's data in subdirectories named by date. From d879175c9f20ec8498537fabddf8aec5e2591dff Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 14:01:00 +0000 Subject: [PATCH 19/31] scrap meta --- documentation/reference/function/meta.md | 56 ------------------------ 1 file changed, 56 deletions(-) diff --git a/documentation/reference/function/meta.md b/documentation/reference/function/meta.md index c005fd3ba..963981f8a 100644 --- a/documentation/reference/function/meta.md +++ b/documentation/reference/function/meta.md @@ -589,62 +589,6 @@ If you want to re-read metadata for all user tables, simply use an asterisk: SELECT hydrate_table_metadata('*'); ``` -## sys.copy_export_log - -`sys.copy_export_log` is a pseudo-table containing the export log for `COPY TO` operations. - -**Arguments:** - -- `sys.copy_export_log` does not require arguments. - -**Return value:** - -Returns metadata on `COPY TO` export operations for the last three days, including the columns: - -- `ts` - timestamp of the log event. -- `id` - export identifier that can be used to track export progress. -- `table_name` - source table name (or 'query' for subquery exports). -- `export_path` - destination directory path for the export. -- `num_exported_files` - how many output files were written -- `phase` - progress markers for each export step. The phases are: - - `wait_to_run`: queued for execution. - - `populating_temp_table`: building temporary table with materialized query result. - - `converting_partitions`: converting temporary table partitions to parquet. - - `move_files`: copying converted files to export directory. - - `dropping_temp_table`: cleaning up temporary data. - - `sending_data`: streaming network response. - - `success`: completion of export task. -- `status` - event status for each phase, for example 'started', 'finished'. -- `message` - additional text (important for error rows). -- `errors` - error number or flag. 
-
-
-**Examples:**
-
-```questdb-sql
-COPY trades TO 'trades' WITH FORMAT PARQUET;
-```
-
-| id |
-|------------------|
-| 38b2b45f28aa822e |
-
-Checking the log:
-
-```questdb-sql
-SELECT * FROM sys.copy_export_log WHERE id = '38b2b45f28aa822e';
-```
-
-| ts | id | table_name | export_path | num_exported_files | phase | status | message | errors |
-|-----------------------------|------------------|------------|--------------------------------|--------------------|-----------------------|----------|---------|--------|
-| 2025-10-27T14:07:20.513119Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | started | queued | 0 |
-| 2025-10-27T14:07:20.541779Z | 38b2b45f28aa822e | trades | null | null | wait_to_run | finished | 0 |
-| 2025-10-27T14:07:20.542552Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | started | null | 0 |
-| 2025-10-27T14:07:20.658111Z | 38b2b45f28aa822e | trades | null | null | converting_partitions | finished | null | 0 |
-| 2025-10-27T14:07:20.658185Z | 38b2b45f28aa822e | trades | null | null | move_files | started | null | 0 |
-| 2025-10-27T14:07:20.670200Z | 38b2b45f28aa822e | trades | null | null | move_files | finished | null | 0 |
-| 2025-10-27T14:07:20.670414Z | 38b2b45f28aa822e | trades | /<dbroot>/export/trades/ | 26 | success | finished | null | 0 |
-
-
 ## flush_query_cache()

 `flush_query_cache' invalidates cached query execution plans.

From e6c5b1ec2ea3df60ccef79afb9196fc715af1cd4 Mon Sep 17 00:00:00 2001
From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com>
Date: Tue, 28 Oct 2025 17:38:12 +0000
Subject: [PATCH 20/31] add example

---
 documentation/reference/sql/copy.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md
index 28c6ceaf7..c46f80e2d 100644
--- a/documentation/reference/sql/copy.md
+++ b/documentation/reference/sql/copy.md
@@ -317,7 +317,26 @@ WITH FORMAT PARQUET
 PARTITION_BY DAY;
 ```

-This creates separate Parquet files for each day's data in subdirectories named by date.
+The underlying table does not need to be partitioned in advance. Likewise, you can output query results as partitions:
+
+```questdb-sql title="Export queries with partitions"
+COPY (
+    SELECT generate_series as date
+    FROM generate_series('2025-01-01', '2025-02-01', '1d')
+)
+TO 'dates'
+WITH FORMAT PARQUET
+PARTITION_BY DAY;
+```
+
+This creates separate Parquet files for each day's data in subdirectories named by date. For example:
+
+- export
+  - dates
+    - 2025-01-01.parquet
+    - 2025-01-02.parquet
+    - 2025-01-03.parquet
+    - ...
#### Export with custom Parquet options From f4aef9b6252f98763849db048f8952e486de27b3 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 17:40:42 +0000 Subject: [PATCH 21/31] add tip reference --- documentation/guides/export-parquet.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/documentation/guides/export-parquet.md b/documentation/guides/export-parquet.md index 18bb245fc..80965c129 100644 --- a/documentation/guides/export-parquet.md +++ b/documentation/guides/export-parquet.md @@ -12,6 +12,7 @@ There are three ways of converting or exporting data to Parquet: * [Export query as file via REST](#export-query-as-file-via-rest) * [Export query as files via COPY](#export-query-as-files-via-copy) + * * [In-place conversion](#in-place-conversion) ## Data Compression @@ -47,6 +48,13 @@ To export a query as a file, you can use either the `/exp` REST API endpoint or ### Export query as file via REST + +:::tip + +See also the [/exp documentation](/documentation/reference/api/rest.md). + +::: + You can use the same parameters as when doing a [CSV export](/docs/reference/api/rest/#exp---export-data), only passing `parquet` as the `fmt` parameter value. ``` @@ -73,6 +81,13 @@ start DuckDB and execute: ### Export query as files via COPY + +:::tip + +See also the [COPY-TO documentation](/documentation/reference/sql/copy.md). + +::: + If you prefer to export data via SQL, or if you want to export asynchronously, you can use the `COPY` command from the web console, from any pgwire-compliant client, or using the [`exec` endpoint](/docs/reference/api/rest/#exec---execute-queries) of the REST API. From 2b9a22451eee01e4509d75ba72bc8df55b61ca30 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 17:42:12 +0000 Subject: [PATCH 22/31] probably fixing links --- documentation/guides/export-parquet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/documentation/guides/export-parquet.md b/documentation/guides/export-parquet.md index 80965c129..08d44e0de 100644 --- a/documentation/guides/export-parquet.md +++ b/documentation/guides/export-parquet.md @@ -51,7 +51,7 @@ To export a query as a file, you can use either the `/exp` REST API endpoint or :::tip -See also the [/exp documentation](/documentation/reference/api/rest.md). +See also the [/exp documentation](/docs/reference/api/rest.md). ::: @@ -84,7 +84,7 @@ start DuckDB and execute: :::tip -See also the [COPY-TO documentation](/documentation/reference/sql/copy.md). +See also the [COPY-TO documentation](/docs/reference/sql/copy.md). ::: From 603b526c00a6e7907cf489422d0867a79e03bbb3 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 17:44:36 +0000 Subject: [PATCH 23/31] nix rmode, leave it undocumented --- documentation/reference/api/rest.md | 1 - 1 file changed, 1 deletion(-) diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md index f97490ced..a12755e67 100644 --- a/documentation/reference/api/rest.md +++ b/documentation/reference/api/rest.md @@ -619,7 +619,6 @@ When `fmt=parquet`, the following additional parameters are supported: | `statistics_enabled` | No | `true` | Enable Parquet column statistics: `true` or `false`. | | `parquet_version` | No | `2` | Parquet format version: `1` (v1.0) or `2` (v2.0). 
| | `raw_array_encoding` | No | `false` | Use raw encoding for arrays: `true` (lighter-weight, less compatible) or `false` (heavier-weight, more compatible) | -| `rmode` | No | `false` | Set HTTP response mode: `nodelay` or not sent | The parameters must be URL encoded. From 4bb6fa8c4bf4cd23d3c6ecca883eb485f2a299ff Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:25:41 +0000 Subject: [PATCH 24/31] fix links --- documentation/guides/export-parquet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/documentation/guides/export-parquet.md b/documentation/guides/export-parquet.md index 08d44e0de..b037dfd5c 100644 --- a/documentation/guides/export-parquet.md +++ b/documentation/guides/export-parquet.md @@ -51,7 +51,7 @@ To export a query as a file, you can use either the `/exp` REST API endpoint or :::tip -See also the [/exp documentation](/docs/reference/api/rest.md). +See also the [/exp documentation](/docs/reference/api/rest). ::: @@ -84,7 +84,7 @@ start DuckDB and execute: :::tip -See also the [COPY-TO documentation](/docs/reference/sql/copy.md). +See also the [COPY-TO documentation](/docs/reference/sql/copy). ::: From 626c30c58feca480dcf15b2a738f4920dfaefdea Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:30:06 +0000 Subject: [PATCH 25/31] fix acorn --- documentation/configuration.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/documentation/configuration.md b/documentation/configuration.md index 4b050cff3..d2543bb31 100644 --- a/documentation/configuration.md +++ b/documentation/configuration.md @@ -240,16 +240,16 @@ Parquet export is also generally impacted by query execution and parquet convers If not overridden, the following default setting will be used. From 175be6897a284418725e7f14802250b9d17b4ad7 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:37:56 +0000 Subject: [PATCH 26/31] fix acorn maybe --- documentation/configuration-utils/_cairo.config.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/documentation/configuration-utils/_cairo.config.json b/documentation/configuration-utils/_cairo.config.json index ac40302db..24ae49061 100644 --- a/documentation/configuration-utils/_cairo.config.json +++ b/documentation/configuration-utils/_cairo.config.json @@ -468,11 +468,11 @@ "description": "determines whether to export arrays in QuestDB-native binary format (true, less compatible) or Parquet-native format (false, more compatible)." }, "cairo.partition.encoder.parquet.version": { - "default": 1, + "default": "1", "description": "Output parquet version to use for parquet-encoded partitions. Can be 1 or 2." }, "cairo.partition.encoder.parquet.statistics.enabled": { - "default": true, + "default": "true", "description": "Controls whether or not statistics are included in parquet-encoded partitions." 
}, "cairo.partition.encoder.parquet.compression.codec": { From 304e664c2e0ca342a54af72ee16ca981ddeb7e23 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:43:05 +0000 Subject: [PATCH 27/31] definitely fixed --- documentation/configuration.md | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/documentation/configuration.md b/documentation/configuration.md index d2543bb31..524a91996 100644 --- a/documentation/configuration.md +++ b/documentation/configuration.md @@ -181,7 +181,7 @@ Settings for `COPY FROM` (import): + Parquet export is also generally impacted by query execution and parquet conversion parameters. From d3028d4062c03647be8d315df3f6aeebe5ea59da Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:52:03 +0000 Subject: [PATCH 28/31] fixes --- .../_parquet-export.config.json | 2 +- documentation/configuration.md | 2 +- documentation/guides/export-parquet.md | 14 +------------- documentation/guides/import-csv.md | 2 +- documentation/reference/sql/copy.md | 2 +- 5 files changed, 5 insertions(+), 17 deletions(-) diff --git a/documentation/configuration-utils/_parquet-export.config.json b/documentation/configuration-utils/_parquet-export.config.json index d885dafe5..df5fb5e44 100644 --- a/documentation/configuration-utils/_parquet-export.config.json +++ b/documentation/configuration-utils/_parquet-export.config.json @@ -1,6 +1,6 @@ { "cairo.sql.copy.export.root": { "default": "export", - "description": "Root directory for parquet exports via `COPY-TO` SQL This path must not overlap with other directory (e.g. db, conf) of running instance, otherwise export may delete or overwrite existing files. Relative paths are resolved against the server root directory." + "description": "Root directory for parquet exports via `COPY-TO` SQL. This path must not overlap with other directory (e.g. db, conf) of running instance, otherwise export may delete or overwrite existing files. Relative paths are resolved against the server root directory." } } \ No newline at end of file diff --git a/documentation/configuration.md b/documentation/configuration.md index 524a91996..89efedb58 100644 --- a/documentation/configuration.md +++ b/documentation/configuration.md @@ -226,7 +226,7 @@ It is important that the two path are identical (`/var/lib/questdb/questdb_import` in the example). -### Export +#### Export diff --git a/documentation/guides/export-parquet.md b/documentation/guides/export-parquet.md index b037dfd5c..7060c86d5 100644 --- a/documentation/guides/export-parquet.md +++ b/documentation/guides/export-parquet.md @@ -12,7 +12,6 @@ There are three ways of converting or exporting data to Parquet: * [Export query as file via REST](#export-query-as-file-via-rest) * [Export query as files via COPY](#export-query-as-files-via-copy) - * * [In-place conversion](#in-place-conversion) ## Data Compression @@ -34,24 +33,14 @@ You can override these defaults when [exporting via COPY](#export-query-as-files ## Export queries as files -:::warning -Exporting as files is right now available on a development branch: [https://github.com/questdb/questdb/pull/6008](https://github.com/questdb/questdb/pull/6008). -If you want to test this feature, you need to clone and compile the branch. - -The code is functional, but it is just lacking fuzzy tests and documentation. 
We should be able to include this in a -release soon enough, but for exporting it is safe to just checkout the development branch, compile, and start QuestDB -pointing to the target jar. -::: - To export a query as a file, you can use either the `/exp` REST API endpoint or the `COPY` command. ### Export query as file via REST - :::tip -See also the [/exp documentation](/docs/reference/api/rest). +See also the [/exp documentation](/docs/reference/api/rest/#exp---export-data). ::: @@ -78,7 +67,6 @@ start DuckDB and execute: select * from read_parquet('~/tmp/exp.parquet'); ``` - ### Export query as files via COPY diff --git a/documentation/guides/import-csv.md b/documentation/guides/import-csv.md index 062e5a6fd..8b0d14a9c 100644 --- a/documentation/guides/import-csv.md +++ b/documentation/guides/import-csv.md @@ -127,7 +127,7 @@ csvstack *.csv > singleFile.csv #### Configure `COPY` -- Enable `COPY` and [configure](/docs/configuration/#csv-import) the `COPY` +- Enable `COPY` and [configure](/docs/configuration/#copy-settings) the `COPY` directories to suit your server. - `cairo.sql.copy.root` must be set for `COPY` to work. diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md index c46f80e2d..b192b7713 100644 --- a/documentation/reference/sql/copy.md +++ b/documentation/reference/sql/copy.md @@ -77,7 +77,7 @@ operation. There are two root directories to be defined: `root_directory/tmp` directory. Use the [configuration keys](/docs/configuration/) to edit these properties in -[`COPY` configuration settings](/docs/configuration/#csv-import): +[`COPY` configuration settings](/docs/configuration/#copy-settings): ```shell title="Example" cairo.sql.copy.root=/Users/UserName/Desktop From c53cf80833529f18f7e6efa78efc567f27c4a203 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:52:43 +0000 Subject: [PATCH 29/31] fixes --- documentation/guides/export-parquet.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/documentation/guides/export-parquet.md b/documentation/guides/export-parquet.md index 7060c86d5..d26aa23e1 100644 --- a/documentation/guides/export-parquet.md +++ b/documentation/guides/export-parquet.md @@ -64,7 +64,7 @@ to point DuckDB to the example file exported in the previous example, you could start DuckDB and execute: ``` - select * from read_parquet('~/tmp/exp.parquet'); +select * from read_parquet('~/tmp/exp.parquet'); ``` ### Export query as files via COPY @@ -84,13 +84,13 @@ or using the [`exec` endpoint](/docs/reference/api/rest/#exec---execute-queries) You can export a query: ``` - COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET; +COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET; ``` Or you can export a whole table: ``` - COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET; +COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET; ``` @@ -109,7 +109,6 @@ If you want to monitor the export process, you can issue a call like this: SELECT * FROM 'sys.copy_export_log' WHERE id = '45ba24e5ba338099'; ``` - While it is running, export can be cancelled with: ``` From 2f6bc1ec3913aab9c4b8dd8cfa26692455a91dea Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 28 Oct 2025 18:59:02 +0000 Subject: [PATCH 30/31] cleanup --- documentation/configuration.md | 3 --- documentation/reference/sql/copy.md | 8 -------- 2 
files changed, 11 deletions(-)

diff --git a/documentation/configuration.md b/documentation/configuration.md
index 89efedb58..6fc267b67 100644
--- a/documentation/configuration.md
+++ b/documentation/configuration.md
@@ -247,9 +247,6 @@ If not overridden, the following default settings will be used.
 ]} />

-
-
-
 ### Parallel SQL execution

 This section describes settings that can affect the level of parallelism during
diff --git a/documentation/reference/sql/copy.md b/documentation/reference/sql/copy.md
index b192b7713..74aac097e 100644
--- a/documentation/reference/sql/copy.md
+++ b/documentation/reference/sql/copy.md
@@ -243,14 +243,6 @@ The export destination is relative to `cairo.sql.copy.export.root` (defaults to
 | message | VARCHAR | Information about the current phase/step |
 | errors | long | Error code(s) |

-- `wait_to_run`: queued for execution.
-  - `populating_temp_table`: building temporary table with materialized query result.
-  - `converting_partitions`: converting temporary table partitions to parquet.
-  - `move_files`: copying converted files to export directory.
-  - `dropping_temp_table`: cleaning up temporary data.
-  - `sending_data`: streaming network response.
-  - `success`: completion of export task.
-
-
 Log table row retention is configurable through `cairo.sql.copy.log.retention.days` setting, and is three days by default.

 `COPY TO` returns an `id` value from `sys.copy_export_log` to track the export progress.

From 1e223d222961a8e2af6334e37c74a328e6bda947 Mon Sep 17 00:00:00 2001
From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com>
Date: Tue, 28 Oct 2025 19:36:15 +0000
Subject: [PATCH 31/31] cleanup

---
 documentation/reference/api/rest.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/documentation/reference/api/rest.md b/documentation/reference/api/rest.md
index a12755e67..e0318ede2 100644
--- a/documentation/reference/api/rest.md
+++ b/documentation/reference/api/rest.md
@@ -20,8 +20,7 @@ off-the-shelf HTTP clients. It provides a simple way to interact with QuestDB
 and is compatible with most programming languages. API functions are fully keyed
 on the URL and they use query parameters as their arguments.

-The Web Console[Web Console](/docs/web-console/) is the official Web client
-relying on the REST API.
+The [Web Console](/docs/web-console/) is the official Web client for QuestDB and relies on the REST API.

 **Available methods**