Configuring log collection JSON parsing

You can configure the Fluentd log collector to determine if a log message is in JSON format and merge the message into the JSON payload document posted to Elasticsearch. This feature is disabled by default.

You can enable or disable this feature by editing the MERGE_JSON_LOG environment variable in the fluentd daemonset.

Important

Enabling this feature comes with risks, including:

Possible log loss due to Elasticsearch rejecting documents due to inconsistent type mappings.
Potential buffer storage leak caused by rejected message cycling.
Overwrite of data for field with same names.

The features in this topic should be used by only experienced Fluentd and Elasticsearch users.

Prerequisites

Set cluster logging to the unmanaged state.

Procedure

Use the following command to enable this feature:

oc set env ds/fluentd MERGE_JSON_LOG=true (1)

Set this to false to disable this feature or true to enable this feature.

Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING

If you set the MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING enviroment variables to true, you might receive an Elasticsearch 400 error. The error occurs because when`MERGE_JSON_LOG=true`, Fluentd adds fields with data types other than string. When you set CDM_UNDEFINED_TO_STRING=true, Fluentd attempts to add those fields as a string value resulting in the Elasticsearch 400 error. The error clears when the indices roll over for the next day.

When Fluentd rolls over the indices for the next day’s logs, it will create a brand new index. The field definitions are updated and you will not get the 400 error.

Records that have hard errors, such as schema violations, corrupted data, and so forth, cannot be retried. The log collector sends the records for error handling. If you add a <label @ERROR> section to your Fluentd config, as the last <label>, you can handle these records as needed.

For example:

data:
  fluent.conf:

....

    <label @ERROR>
      <match **>
        @type file
        path /var/log/fluent/dlq
        time_slice_format %Y%m%d
        time_slice_wait 10m
        time_format %Y%m%dT%H%M%S%z
        compress gzip
      </match>
    </label>

This section writes error records to the Elasticsearch dead letter queue (DLQ) file. See the fluentd documentation for more information about the file output.

Then you can edit the file to clean up the records manually, edit the file to use with the Elasticsearch /_bulk index API and use cURL to add those records. For more information on Elasticsearch Bulk API, see the Elasticsearch documentation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cluster-logging-collector-json.adoc

cluster-logging-collector-json.adoc

Configuring log collection JSON parsing

Files

cluster-logging-collector-json.adoc

Latest commit

History

cluster-logging-collector-json.adoc

File metadata and controls

Configuring log collection JSON parsing