
## Compressing logs from S3

To compress logs from S3, use the `sbin/compress-from-s3.sh` script. The script supports two modes
of operation:

* [**s3-object** mode](#s3-object-compression-mode): Compress S3 objects specified by their full
S3 URLs.
* [**s3-key-prefix** mode](#s3-key-prefix-compression-mode): Compress all S3 objects under a given
S3 key prefix.

### `s3-object` compression mode

The `s3-object` mode allows you to specify individual S3 objects to compress by using their full
URLs. To use this mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the
fields in angle brackets (`<>`) with the appropriate values:

```bash
sbin/compress-from-s3.sh \
  --timestamp-key <timestamp-key> \
  --dataset <dataset-name> \
  s3-object \
  <object-url> [<object-url> ...]
```

* `<object-url>` is a URL identifying the S3 object to compress. It can be written in either of two
  formats:
  * `https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>`
  * `https://s3.<region-code>.amazonaws.com/<bucket-name>/<object-key>`
* The fields in `<object-url>` are as follows:
  * `<bucket-name>` is the name of the S3 bucket containing your logs.
  * `<region-code>` is the AWS region [code][aws-region-codes] for the S3 bucket containing your
    logs.
  * `<object-key>` is the [object key][aws-s3-object-key] of the log file object you wish to
    compress.

:::{warning}
There must be no duplicate object keys across all `<object-url>` arguments.
:::

* For a description of other fields, see the [clp-json quick-start
guide](../quick-start/clp-json.md#compressing-json-logs).
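The two `<object-url>` formats differ only in where the bucket name appears. AWS calls them virtual-hosted-style and path-style URLs. The Bash sketch below shows how both forms are built from a bucket name, region code and object key. The function names `s3_virtual_hosted_url` and `s3_path_style_url` are hypothetical helpers, not part of CLP. The sketch assumes the object key needs no percent-encoding.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of CLP) that build an S3 object URL in either
# of the two formats accepted by `sbin/compress-from-s3.sh`.
# Assumption: the object key contains no characters that need percent-encoding.

# Virtual-hosted-style: https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>
s3_virtual_hosted_url() {
  local bucket="$1" region="$2" key="$3"
  printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$bucket" "$region" "$key"
}

# Path-style: https://s3.<region-code>.amazonaws.com/<bucket-name>/<object-key>
s3_path_style_url() {
  local bucket="$1" region="$2" key="$3"
  printf 'https://s3.%s.amazonaws.com/%s/%s\n' "$region" "$bucket" "$key"
}

s3_virtual_hosted_url my-bucket us-east-2 logs/app/a.jsonl
# prints: https://my-bucket.s3.us-east-2.amazonaws.com/logs/app/a.jsonl
s3_path_style_url my-bucket us-east-2 logs/app/a.jsonl
# prints: https://s3.us-east-2.amazonaws.com/my-bucket/logs/app/a.jsonl
```

Both URLs identify the same object, so either can be passed as an `<object-url>` argument.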

Instead of specifying input object URLs explicitly in the command, you may specify them in a text
file and then pass the file into the command using the `--inputs-from` flag, like so:

```bash
sbin/compress-from-s3.sh \
--timestamp-key <timestamp-key> \
--dataset <dataset-name> \
s3-object \
--inputs-from <input-file>
```

* `<input-file>` is a path to a text file containing one S3 object URL **per line**. The URLs must
follow the same format as described above for `<object-url>`.
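Duplicate object keys are not allowed, so it can help to check an input file before submitting it. The Bash sketch below is a hypothetical pre-flight check; `s3_url_to_key` and `find_duplicate_keys` are illustrative names, not part of CLP. It extracts the object key from each URL in either format and prints any key that appears more than once. It assumes well-formed `https://` URLs and a bucket that is not literally named `s3`, because the check tells the two formats apart by whether the host starts with `s3.`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of CLP) for an `--inputs-from` file.

# Prints the object key of an S3 URL written in either accepted format.
# Assumes a well-formed https:// URL and a bucket not literally named "s3".
s3_url_to_key() {
  local rest="${1#https://}"   # <host>/<path>
  local host="${rest%%/*}"     # text before the first '/'
  local path="${rest#*/}"      # text after the first '/'
  if [[ "$host" == s3.* ]]; then
    # Path-style: <path> is <bucket-name>/<object-key>
    printf '%s\n' "${path#*/}"
  else
    # Virtual-hosted-style: <path> is the object key itself
    printf '%s\n' "$path"
  fi
}

# Reads one URL per line from the given file (skipping blank lines) and prints
# every object key that occurs more than once. Output is empty if none repeat.
find_duplicate_keys() {
  local url
  while IFS= read -r url || [[ -n "$url" ]]; do
    [[ -n "$url" ]] && s3_url_to_key "$url"
  done < "$1" | sort | uniq -d
}

# Example: the first two URLs name the same key in different formats.
inputs="$(mktemp)"
printf '%s\n' \
  "https://my-bucket.s3.us-east-2.amazonaws.com/logs/a.jsonl" \
  "https://s3.us-east-2.amazonaws.com/my-bucket/logs/a.jsonl" \
  "https://my-bucket.s3.us-east-2.amazonaws.com/logs/b.jsonl" > "$inputs"
find_duplicate_keys "$inputs"   # prints: logs/a.jsonl
rm -f "$inputs"
```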

:::{note}
The `s3-object` mode requires the input object keys to share a non-empty common prefix. If the input
object keys do not share a common prefix, they will be rejected and no compression job will be
created. This limitation will be addressed in a future release.
:::
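To see in advance whether a set of object keys meets this requirement, you can compute their longest common prefix. The Bash sketch below does this. `longest_common_prefix` is a hypothetical helper, not part of CLP. It compares keys character by character, and CLP's own check may differ in details. An empty result suggests the keys would be rejected.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CLP): reads object keys, one per line, from
# stdin and prints their longest character-by-character common prefix.
longest_common_prefix() {
  local prefix key i
  # The first key is the initial candidate prefix.
  IFS= read -r prefix || [[ -n "$prefix" ]] || return 0
  while IFS= read -r key || [[ -n "$key" ]]; do
    i=0
    # Advance while both strings still have characters and they match.
    while (( i < ${#prefix} && i < ${#key} )) &&
          [[ "${prefix:i:1}" == "${key:i:1}" ]]; do
      i=$((i + 1))
    done
    prefix="${prefix:0:i}"   # shrink the candidate to the matching part
  done
  printf '%s\n' "$prefix"
}

# Example: both keys start with "logs/app/", so the prefix is non-empty.
printf '%s\n' logs/app/a.jsonl logs/app/b.jsonl | longest_common_prefix
# prints: logs/app/
```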

### `s3-key-prefix` compression mode

The `s3-key-prefix` mode allows you to compress all objects under a given S3 key prefix. To use this
mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the fields in angle
brackets (`<>`) with the appropriate values:

```bash
sbin/compress-from-s3.sh \
--timestamp-key <timestamp-key> \
--dataset <dataset-name> \
s3-key-prefix \
<key-prefix-url>
```

* `<key-prefix-url>` is a URL identifying the S3 key prefix to compress. It can be written in either
  of two formats:
  * `https://<bucket-name>.s3.<region-code>.amazonaws.com/<key-prefix>`
  * `https://s3.<region-code>.amazonaws.com/<bucket-name>/<key-prefix>`
* The fields in `<key-prefix-url>` are as follows:
  * `<bucket-name>` is the name of the S3 bucket containing your logs.
  * `<region-code>` is the AWS region [code][aws-region-codes] for the S3 bucket containing your
    logs.
  * `<key-prefix>` is the prefix of all logs you wish to compress and must begin with the
    `<all-logs-prefix>` value from the [compression IAM policy][compression-iam-policy].

Every object whose key begins with `<key-prefix>` is compressed. For example, given two objects
`logs/syslog` and `logs/syslog.1`, the key prefix `logs/syslog` matches both. To compress only
specific objects, use [`s3-object` mode](#s3-object-compression-mode) instead.
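The requirement on `<key-prefix>` can be checked before building the URL. The Bash sketch below builds a `<key-prefix-url>` in the first (virtual-hosted-style) format, but only if the key prefix starts with the `<all-logs-prefix>` value from the IAM policy. `make_key_prefix_url` and its argument order are hypothetical. The sketch only mirrors the requirement documented above and is not CLP's own validation logic.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CLP): builds a <key-prefix-url> in the
# virtual-hosted-style format after checking that <key-prefix> begins with the
# <all-logs-prefix> value from the compression IAM policy.
make_key_prefix_url() {
  local bucket="$1" region="$2" all_logs_prefix="$3" key_prefix="$4"
  # The quoted part is matched literally; the trailing * matches anything.
  if [[ "$key_prefix" != "$all_logs_prefix"* ]]; then
    printf 'error: "%s" does not begin with "%s"\n' \
      "$key_prefix" "$all_logs_prefix" >&2
    return 1
  fi
  printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$bucket" "$region" "$key_prefix"
}

# Example: the policy's <all-logs-prefix> is "logs/", so "logs/app/" is allowed.
make_key_prefix_url my-bucket us-east-2 logs/ logs/app/
# prints: https://my-bucket.s3.us-east-2.amazonaws.com/logs/app/
```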

:::{note}
`s3-key-prefix` mode only accepts a single `<key-prefix-url>` argument. This limitation will be
addressed in a future release.
:::

[add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console
[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability
[aws-s3-object-key]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
[compression-iam-policy]: ./object-storage-config.md#configuration-for-compression