docs(clp-package): Rewrite S3 log compression guide to reflect new API and script features. #1510
Merged: LinZhihao-723 merged 9 commits into y-scope:main from LinZhihao-723:compress-from-s3-doc, Nov 1, 2025
9 commits
1e7fc13 Draft docs. (LinZhihao-723)
1f28d2b Merge branch 'main' into compress-from-s3-doc (LinZhihao-723)
f23ab47 Apply suggestions from code review (LinZhihao-723)
7e36987 Address code review comments with nit modification. (LinZhihao-723)
d3451cd Merge branch 'oss-main' into compress-from-s3-doc (LinZhihao-723)
bebd046 Direct-commit nit comments (quinntaylormitchell)
5a33db9 Apply suggestions from code review (LinZhihao-723)
aa94167 Partially apply Kirk's comments with Quinn's suggestions. (LinZhihao-723)
a792512 Merge branch 'main' into compress-from-s3-doc (LinZhihao-723)
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -5,35 +5,97 @@ should be able to use CLP as described in the [clp-json quick-start guide](../qu | |||||||||
|
|
||||||||||
| ## Compressing logs from S3 | ||||||||||
|
|
||||||||||
| To compress logs from S3, use the `sbin/compress.sh` script as follows, replacing the fields in | ||||||||||
| angle brackets (`<>`) with the appropriate values: | ||||||||||
| To compress logs from S3, use the `sbin/compress-from-s3.sh` script. The script supports two modes | ||||||||||
| of operation: | ||||||||||
|
|
||||||||||
| * [**s3-object** mode](#s3-object-compression-mode): Compress S3 objects specified by their full | ||||||||||
| S3 URLs. | ||||||||||
| * [**s3-key-prefix** mode](#s3-key-prefix-compression-mode): Compress all S3 objects under a given | ||||||||||
| S3 key prefix. | ||||||||||
|
|
||||||||||
| ### `s3-object` compression mode | ||||||||||
|
|
||||||||||
| The `s3-object` mode allows you to specify individual S3 objects to compress by using their full | ||||||||||
| URLs. To use this mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the | ||||||||||
| fields in angle brackets (`<>`) with the appropriate values: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| sbin/compress.sh \ | ||||||||||
| sbin/compress-from-s3.sh \ | ||||||||||
| --timestamp-key <timestamp-key> \ | ||||||||||
| <url> | ||||||||||
| --dataset <dataset-name> \ | ||||||||||
| s3-object \ | ||||||||||
| <object-url> [<object-url> ...] | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| * `<url>` is a URL identifying the logs to compress. It can have one of two formats: | ||||||||||
| * `https://<bucket-name>.s3.<region-code>.amazonaws.com/<prefix>` | ||||||||||
| * `https://s3.<region-code>.amazonaws.com/<bucket-name>/<prefix>` | ||||||||||
| * The fields in `<url>` are as follows: | ||||||||||
| * `<object-url>` is a URL identifying the S3 object to compress. It can be written in either of two | ||||||||||
| formats: | ||||||||||
| * `https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>` | ||||||||||
| * `https://s3.<region-code>.amazonaws.com/<bucket-name>/<object-key>` | ||||||||||
| * The fields in `<object-url>` are as follows: | ||||||||||
quinntaylormitchell marked this conversation as resolved.
|
||||||||||
| * `<bucket-name>` is the name of the S3 bucket containing your logs. | ||||||||||
| * `<region-code>` is the AWS region [code][aws-region-codes] for the S3 bucket containing your | ||||||||||
| logs. | ||||||||||
| * `<prefix>` is the prefix of all logs you wish to compress and must begin with the | ||||||||||
| `<all-logs-prefix>` value from the [compression IAM policy][compression-iam-policy]. | ||||||||||
| * `<object-key>` is the [object key][aws-s3-object-key] of the log file object you wish to | ||||||||||
| compress. | ||||||||||
|
|
||||||||||
| :::{warning} | ||||||||||
| There must be no duplicate object keys across all `<object-url>` arguments. | ||||||||||
| ::: | ||||||||||
|
|
||||||||||
|
|
||||||||||
| * For a description of other fields, see the [clp-json quick-start | ||||||||||
| guide](../quick-start/clp-json.md#compressing-json-logs). | ||||||||||
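As an illustration of the two `<object-url>` formats described above, the following sketch builds both forms from a bucket name, region code, and object key. The function names and the sample bucket, region, and key are hypothetical; they are not part of CLP. The sketch assumes the key needs no URL encoding.

```bash
#!/usr/bin/env bash
# Build an S3 object URL in either of the two documented formats.
# The bucket, region, and key values used below are hypothetical.

# Format 1: https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>
virtual_hosted_url() {
    local bucket="$1" region="$2" key="$3"
    printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$bucket" "$region" "$key"
}

# Format 2: https://s3.<region-code>.amazonaws.com/<bucket-name>/<object-key>
path_style_url() {
    local bucket="$1" region="$2" key="$3"
    printf 'https://s3.%s.amazonaws.com/%s/%s\n' "$region" "$bucket" "$key"
}

# Print the same (hypothetical) object in both formats.
virtual_hosted_url my-log-bucket us-east-2 logs/app/2025-01-01.jsonl
path_style_url my-log-bucket us-east-2 logs/app/2025-01-01.jsonl
```

Either printed URL could then be passed as an `<object-url>` argument after `s3-object`.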
|
|
||||||||||
| Instead of specifying input object URLs explicitly in the command, you may specify them in a text | ||||||||||
| file and then pass the file into the command using the `--inputs-from` flag, like so: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| sbin/compress-from-s3.sh \ | ||||||||||
| --timestamp-key <timestamp-key> \ | ||||||||||
| --dataset <dataset-name> \ | ||||||||||
| s3-object \ | ||||||||||
| --inputs-from <input-file> | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| * `<input-file>` is a path to a text file containing one S3 object URL **per line**. The URLs must | ||||||||||
| follow the same format as described above for `<object-url>`. | ||||||||||
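A minimal sketch of preparing such an input file, assuming a hypothetical bucket and object keys. Because duplicate object keys are not allowed, the sketch also checks the file for repeated lines before use. The helper `find_duplicate_urls` is illustrative only. It compares whole lines, so it would not catch the same key written once in each of the two URL formats.

```bash
#!/usr/bin/env bash
# Write a hypothetical inputs file (one S3 object URL per line) and check it
# for repeated lines before handing it to --inputs-from.

# Print every line that appears more than once in the given file.
find_duplicate_urls() {
    sort "$1" | uniq -d
}

inputs_file="$(mktemp)"
cat > "$inputs_file" <<'EOF'
https://my-log-bucket.s3.us-east-2.amazonaws.com/logs/app/2025-01-01.jsonl
https://my-log-bucket.s3.us-east-2.amazonaws.com/logs/app/2025-01-02.jsonl
EOF

duplicates="$(find_duplicate_urls "$inputs_file")"
if [ -n "$duplicates" ]; then
    echo "Duplicate URLs found:" >&2
    echo "$duplicates" >&2
else
    echo "No duplicate URLs in $inputs_file"
fi
```

If the check passes, the file path can be supplied as `--inputs-from "$inputs_file"`.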
|
|
||||||||||
LinZhihao-723 marked this conversation as resolved.
|
||||||||||
| :::{note} | ||||||||||
| Compressing from S3 only supports a single URL but will compress any logs that have the given | ||||||||||
| prefix. | ||||||||||
| The `s3-object` mode requires the input object keys to share a non-empty common prefix. If the input | ||||||||||
| object keys do not share a common prefix, they will be rejected and no compression job will be | ||||||||||
| created. This limitation will be addressed in a future release. | ||||||||||
| ::: | ||||||||||
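To check the common-prefix requirement before submitting a job, something like the following sketch may help. It computes the longest common prefix of a set of object keys character by character. Whether CLP itself compares keys character-wise or at `/` boundaries is not stated here, so treat that as an assumption. The function name and sample keys are hypothetical.

```bash
#!/usr/bin/env bash
# Compute the longest common prefix (character-wise) of a set of object keys.
# An empty result means the keys share no common prefix.

common_prefix() {
    local prefix="$1" key
    shift
    for key in "$@"; do
        # Drop trailing characters from the candidate prefix until the
        # current key starts with it (or the candidate becomes empty).
        while [ "${key#"$prefix"}" = "$key" ] && [ -n "$prefix" ]; do
            prefix="${prefix%?}"
        done
    done
    printf '%s\n' "$prefix"
}

# Hypothetical object keys.
prefix="$(common_prefix logs/app/2025-01-01.jsonl logs/app/2025-01-02.jsonl)"
if [ -z "$prefix" ]; then
    echo "Keys share no common prefix" >&2
else
    echo "Common prefix: $prefix"
fi
```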
|
|
||||||||||
| ### `s3-key-prefix` compression mode | ||||||||||
|
|
||||||||||
| The `s3-key-prefix` mode allows you to compress all objects under a given S3 key prefix. To use this | ||||||||||
| mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the fields in angle | ||||||||||
| brackets (`<>`) with the appropriate values: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| sbin/compress-from-s3.sh \ | ||||||||||
| --timestamp-key <timestamp-key> \ | ||||||||||
| --dataset <dataset-name> \ | ||||||||||
| s3-key-prefix \ | ||||||||||
| <key-prefix-url> | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| If you wish to compress a single log file, specify the entire path to the log file. However, if | ||||||||||
| that log file's path is a prefix of another log file's path, then both log files will be compressed | ||||||||||
| (e.g., with two files "logs/syslog" and "logs/syslog.1", a prefix like "logs/syslog" will cause | ||||||||||
| both logs to be compressed). This limitation will be addressed in a future release. | ||||||||||
| * `<key-prefix-url>` is a URL identifying the S3 key prefix to compress. It can be written in either | ||||||||||
| of two formats: | ||||||||||
|
Review comment on lines +83 to +84: same comment as above re. "either"
||||||||||
| * `https://<bucket-name>.s3.<region-code>.amazonaws.com/<key-prefix>` | ||||||||||
| * `https://s3.<region-code>.amazonaws.com/<bucket-name>/<key-prefix>` | ||||||||||
| * The fields in `<key-prefix-url>` are as follows: | ||||||||||
| * `<bucket-name>` is the name of the S3 bucket containing your logs. | ||||||||||
| * `<region-code>` is the AWS region [code][aws-region-codes] for the S3 bucket containing your | ||||||||||
| logs. | ||||||||||
| * `<key-prefix>` is the prefix of all logs you wish to compress and must begin with the | ||||||||||
| `<all-logs-prefix>` value from the [compression IAM policy][compression-iam-policy]. | ||||||||||
|
|
||||||||||
| :::{note} | ||||||||||
| `s3-key-prefix` mode only accepts a single `<key-prefix-url>` argument. This limitation will be | ||||||||||
| addressed in a future release. | ||||||||||
| ::: | ||||||||||
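Since only one `<key-prefix-url>` is accepted per call, one possible workaround is to run the script once per prefix. The sketch below only prints the commands (a dry run) instead of running them. It assumes that a separate compression job per prefix is acceptable. The function name, timestamp key, dataset name, and URLs are hypothetical.

```bash
#!/usr/bin/env bash
# Print one compress-from-s3.sh invocation per key-prefix URL, since
# s3-key-prefix mode takes a single <key-prefix-url> per call.
# This is a dry run: the commands are printed, not executed.

print_prefix_commands() {
    local timestamp_key="$1" dataset="$2" url
    shift 2
    for url in "$@"; do
        printf 'sbin/compress-from-s3.sh --timestamp-key %s --dataset %s s3-key-prefix %s\n' \
            "$timestamp_key" "$dataset" "$url"
    done
}

# Hypothetical timestamp key, dataset name, and key-prefix URLs.
print_prefix_commands timestamp my-dataset \
    https://my-log-bucket.s3.us-east-2.amazonaws.com/logs/app-a/ \
    https://my-log-bucket.s3.us-east-2.amazonaws.com/logs/app-b/
```

Review the printed commands before running them from the CLP package directory.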
|
|
||||||||||
| [add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console | ||||||||||
| [aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability | ||||||||||
| [aws-s3-object-key]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html | ||||||||||
| [compression-iam-policy]: ./object-storage-config.md#configuration-for-compression | ||||||||||
I suggested changing to "either" because it more readily indicates that the user can write the URL in either one of the two formats. Having just "one of two" implies that only one of the formats will be valid in each use case, and that the user has to figure out which one to use.