diff --git a/docs/src/user-docs/index.md b/docs/src/user-docs/index.md
index 724d4ee26f..c2ea60207a 100644
--- a/docs/src/user-docs/index.md
+++ b/docs/src/user-docs/index.md
@@ -50,6 +50,7 @@ Reference docs like format specifications, etc.
 :caption: Quick start
 
 quick-start/index
+quick-start/text-v-json
 quick-start/clp-json
 quick-start/clp-text
 :::
diff --git a/docs/src/user-docs/quick-start/clp-json.md b/docs/src/user-docs/quick-start/clp-json.md
index 7f080a3d87..1f2702070a 100644
--- a/docs/src/user-docs/quick-start/clp-json.md
+++ b/docs/src/user-docs/quick-start/clp-json.md
@@ -45,7 +45,7 @@ sbin/compress.sh --timestamp-key '' [ ...]
 
 * `` are paths to JSON log files or directories containing such files.
 * Each JSON log file should contain each log event as a
-  [separate JSON object](./index.md#clp-json), i.e., *not* as an array.
+  [separate JSON object](./text-v-json.md#clp-json), i.e., *not* as an array.
 
 The compression script will output the compression ratio of each dataset you compress, or you can
 use the UI to view overall statistics.
diff --git a/docs/src/user-docs/quick-start/index.md b/docs/src/user-docs/quick-start/index.md
index cfe7fea552..bcc7be0481 100644
--- a/docs/src/user-docs/quick-start/index.md
+++ b/docs/src/user-docs/quick-start/index.md
@@ -49,68 +49,19 @@ install or upgrade it by following the instructions for your OS.
 
 There are two flavors of CLP:
 
-* **[clp-json](#clp-json)** for compressing and searching **JSON** logs.
-* **[clp-text](#clp-text)** for compressing and searching **unstructured text** logs.
+* **`clp-json`** for compressing and searching **JSON** logs.
+* **`clp-text`** for compressing and searching **unstructured text** logs.
 
 :::{note}
 Both flavors contain the same binaries but are configured with different values for the
 `package.storage_engine` key in the package's config file (`etc/clp-config.yml`).
 :::
 
-### clp-json
-
-The JSON flavor of CLP is appropriate for JSON logs, where each log event is an independent JSON
-object. For example:
-
-```json lines
-{
-  "t": {
-    "$date": "2023-03-21T23:46:37.392"
-  },
-  "ctx": "conn11",
-  "msg": "Waiting for write concern."
-}
-{
-  "t": {
-    "$date": "2023-03-21T23:46:37.392"
-  },
-  "msg": "Set last op to system time"
-}
-```
-
-The log file above contains two log events represented by two JSON objects printed one after the
-other. Whitespace is ignored, so the log events could also appear with no newlines and indentation.
-
-If you're using JSON logs, download and extract the `clp-json` release from the
-[Releases][clp-releases] page, then proceed to the [clp-json quick-start](./clp-json.md) guide.
-
-### clp-text
-
-The text flavor of CLP is appropriate for unstructured text logs, where each log event contains a
-timestamp and may span one or more lines.
-
-:::{note}
-If your logs don't contain timestamps or CLP can't automatically parse the timestamps in your logs,
-it will treat each line as an independent log event.
-:::
-
-For example:
-
-```text
-2015-03-23T15:50:17.926Z INFO container_1 Transitioned from ALLOCATED to ACQUIRED
-2015-03-23T15:50:17.927Z ERROR Scheduler: Error trying to assign container token
-java.lang.IllegalArgumentException: java.net.UnknownHostException: i-e5d112ea
-    at org.apache.hadoop.security.buildTokenService(SecurityUtil.java:374)
-    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
-Caused by: java.net.UnknownHostException: i-e5d112ea
-    ... 17 more
-```
-
-The log file above contains two log events, both beginning with a timestamp. The first is a single
-line, while the second contains multiple lines.
+Download and extract your chosen flavor from the [Releases][clp-releases] page, and then proceed to
+the [appropriate quick-start guide](#using-clp).
 
-If you're using unstructured text logs, download and extract the `clp-text` release from the
-[Releases][clp-releases] page, then proceed to the [clp-text quick-start](./clp-text.md) guide.
+If you're having trouble selecting which flavor will work best for you, or you'd like to compare the
+capabilities of the two flavors, check out the [clp-text vs. clp-json](./text-v-json.md) page.
 
 ---
 
diff --git a/docs/src/user-docs/quick-start/text-v-json.md b/docs/src/user-docs/quick-start/text-v-json.md
new file mode 100644
index 0000000000..c7ff06a4d8
--- /dev/null
+++ b/docs/src/user-docs/quick-start/text-v-json.md
@@ -0,0 +1,115 @@
+# clp-text vs. clp-json
+
+CLP comes in two flavors:
+
+* **[clp-json](#clp-json)** for compressing and searching **JSON** logs.
+* **[clp-text](#clp-text)** for compressing and searching **unstructured text** logs.
+
+:::{note}
+Both flavors contain the same binaries but are configured with different values for the
+`package.storage_engine` key in the package's config file (`etc/clp-config.yml`).
+:::
+
+[Table 1](#table-1) compares the capabilities and limitations of the two flavors.
+
+(table-1)=
+:::{card}
+
+
+|Capability|`clp-text`|`clp-json`|
+|---|:---:|:---:|
+|Compression of unstructured text logs|||
+|Compression of JSON logs|1||
+|Compression of CLP IR files|||
+|Compression of CLP KV-IR files|||
+|Command line search|||
+|WebUI search|||
+|Decompression|||
+|Automatic timestamp parsing|2|2, 3|
+|Preservation of time zone information|4|4|
+|Retention control|||
+|Archive management|||
+|Dataset management|||
+|S3 support|||
+|Multi-node deployment|||
+|CLP + Presto integration|||
+|Parallel compression|||
+
++++
+**Table 1**: The capabilities and limitations of CLP's two flavors.
+
+1) `clp-text` can compress and search JSON logs as if they were unstructured text, but it cannot
+   query individual fields.
+2) Timestamp parsing is limited to specific supported formats: see
+   [clp-text timestamp formats][ts-text] and [clp-json timestamp formats][ts-json] for more details.
+3) Timestamps are parsed automatically as long as the timestamp key for the logs is provided at
+   compression time using the `--timestamp-key` flag.
+4) We hope to introduce support for the preservation of time zone information in a future update
+   (see [issue #1290](https://github.com/y-scope/clp/issues/1290)).
+:::
+
+## clp-json
+
+The JSON flavor of CLP is appropriate for JSON logs, where each log event is an independent JSON
+object. For example:
+
+```json lines
+{
+  "t": {
+    "$date": "2023-03-21T23:46:37.392"
+  },
+  "ctx": "conn11",
+  "msg": "Waiting for write concern."
+}
+{
+  "t": {
+    "$date": "2023-03-21T23:46:37.392"
+  },
+  "msg": "Set last op to system time"
+}
+```
+
+The log file above contains two log events represented by two JSON objects printed one after the
+other. Whitespace is ignored, so the log events could also appear without newlines or indentation.
+
+If you're using JSON logs, download and extract the `clp-json` release from the
+[Releases][clp-releases] page, then proceed to the [clp-json quick-start](./clp-json.md) guide.
+
+## clp-text
+
+The text flavor of CLP is appropriate for unstructured text logs, where each log event contains a
+timestamp and may span one or more lines.
+
+:::{note}
+If your logs don't contain timestamps or CLP can't automatically parse the timestamps in your logs,
+it will treat each line as an independent log event.
+:::
+
+For example:
+
+```text
+2015-03-23T15:50:17.926Z INFO container_1 Transitioned from ALLOCATED to ACQUIRED
+2015-03-23T15:50:17.927Z ERROR Scheduler: Error trying to assign container token
+java.lang.IllegalArgumentException: java.net.UnknownHostException: i-e5d112ea
+    at org.apache.hadoop.security.buildTokenService(SecurityUtil.java:374)
+    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
+Caused by: java.net.UnknownHostException: i-e5d112ea
+    ... 17 more
+```
+
+The log file above contains two log events, both beginning with a timestamp. The first is a single
+line, while the second contains multiple lines.
+
+If you're using unstructured text logs, download and extract the `clp-text` release from the
+[Releases][clp-releases] page, then proceed to the [clp-text quick-start](./clp-text.md) guide.
+
+[clp-releases]: https://github.com/y-scope/clp/releases
+
+[ts-text]: https://github.com/y-scope/clp/blob/main/components/core/src/clp/TimestampPattern.cpp#L120
+
+[ts-json]: https://github.com/y-scope/clp/blob/main/components/core/src/clp_s/TimestampPattern.cpp#L210
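The `clp-json` section of the new `text-v-json.md` page says each log event is an independent JSON object and that whitespace between objects is ignored. The `clp-json.md` guide adds that events must *not* be wrapped in an array. The Python sketch below illustrates both points. It splits a file of back-to-back JSON objects into events, and it rewrites an array-style file as one object per line before compression. The helpers `iter_json_events` and `array_to_json_lines` are hypothetical and not part of CLP, which does its own parsing.

```python
import json
from typing import Any, Iterator


def iter_json_events(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value in `text` (hypothetical helper, not CLP's parser)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace (spaces, newlines, indentation) between consecutive objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        # raw_decode parses one JSON value starting at `pos` and returns where it ended.
        event, pos = decoder.raw_decode(text, pos)
        yield event


def array_to_json_lines(array_text: str) -> str:
    """Rewrite a JSON array of log events as one JSON object per line (hypothetical helper)."""
    events = json.loads(array_text)
    if not isinstance(events, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(json.dumps(event) + "\n" for event in events)
```

Applied to the page's two-object example, `iter_json_events` yields two events whether or not the objects are pretty-printed. This mirrors the page's statement that whitespace is ignored.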
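The `clp-text` section describes events that begin with a timestamp and may span several lines. Its note adds a fallback: when no timestamp can be parsed, each line is treated as its own event. The Python sketch below approximates that rule under two explicit assumptions:

* A new event starts only at a line beginning with an ISO-8601 timestamp such as `2015-03-23T15:50:17.926Z`.
* Lines before the first timestamped line are kept as individual events.

CLP's real parser recognizes many more formats (see the `ts-text` link). Treat `group_events` and `TIMESTAMP_PREFIX` as hypothetical illustrations, not CLP's implementation.

```python
import re

# Assumption: only an ISO-8601 timestamp at the very start of a line begins a new event.
# CLP supports many other timestamp formats; this pattern is purely illustrative.
TIMESTAMP_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")


def group_events(lines: list[str]) -> list[str]:
    """Group raw lines into log events (hypothetical sketch, not CLP's parser)."""
    events: list[str] = []
    seen_timestamp = False
    for line in lines:
        if TIMESTAMP_PREFIX.match(line):
            # A timestamped line always starts a new event.
            seen_timestamp = True
            events.append(line)
        elif seen_timestamp:
            # Continuation line (e.g. a stack trace): append it to the current event.
            events[-1] += "\n" + line
        else:
            # No timestamp seen yet: treat each line as an independent event.
            events.append(line)
    return events
```

On the page's seven-line example, this yields two events: the first is a single line and the second spans the remaining six lines. A file with no recognizable timestamps comes back one event per line.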