Add info to date processor docs

PeteGillinElastic · PeteGillinElastic · commit cbc8f5d03b0a · 2025-04-26T17:53:03.000+01:00
diff --git a/docs/reference/enrich-processor/date-processor.md b/docs/reference/enrich-processor/date-processor.md
@@ -6,7 +6,6 @@ mapped_pages:
 
 # Date processor [date-processor]
 
-
 Parses dates from fields, and then uses the date or timestamp as the timestamp for the document. By default, the date processor adds the parsed date as a new field called `@timestamp`. You can specify a different field by setting the `target_field` configuration parameter. Multiple date formats are supported as part of the same date processor definition. They will be used sequentially to attempt parsing the date field, in the same order they were defined as part of the processor definition.
 
 $$$date-options$$$
@@ -16,7 +15,7 @@ $$$date-options$$$
 | `field` | yes | - | The field to get the date from. |
 | `target_field` | no | @timestamp | The field that will hold the parsed date. |
 | `formats` | yes | - | An array of the expected date formats. Can be a [java time pattern](/reference/elasticsearch/mapping-reference/mapping-date-format.md) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. |
-| `timezone` | no | UTC | The timezone to use when parsing the date. Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). |
+| `timezone` | no | UTC | The default timezone used by the processor (see below). Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). |
 | `locale` | no | ENGLISH | The locale to use when parsing the date, relevant when parsing month names or week days. Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). |
 | `output_format` | no | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | The format to use when writing the date to `target_field`. Must be a valid [java time pattern](/reference/elasticsearch/mapping-reference/mapping-date-format.md). |
 | `description` | no | - | Description of the processor. Useful for describing the purpose of the processor or its configuration. |
@@ -25,6 +24,14 @@ $$$date-options$$$
 | `on_failure` | no | - | Handle failures for the processor. See [Handling pipeline failures](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures). |
 | `tag` | no | - | Identifier for the processor. Useful for debugging and metrics. |
 
+The `timezone` option can affect the behavior of the processor in two ways:
+ - If the string being parsed matches a format representing a local date-time, such as `yyyy-MM-dd HH:mm:ss`, it will be assumed to be in the timezone specified by this option. This is not applicable if the string matches a format representing a zoned date-time, such as `yyyy-MM-dd HH:mm:ss zzz`: in that case, the timezone parsed from the string will be used. It is also not applicable if the string matches an absolute time format, such as `epoch_millis`.
+ - The date-time will be converted into the timezone given by this option before it is formatted and written into the target field. This is not applicable if the `output_format` is an absolute time format such as `epoch_millis`.
+
+::::{warning}
+We recommend avoiding the use of short abbreviations for timezone names, since they can be ambiguous. For example, under certain circumstances, one JDK might interpret `PST` as `America/Tijuana`, i.e. Pacific (Standard) Time, while another JDK might interpret it as `Asia/Manila`, i.e. Philippine Standard Time. If your input data contains such abbreviations, you should convert them using your own knowledge of what each abbreviation means in your data before parsing them. See below for an example. This does not apply to UTC, which is always safe to use.
+::::
+
 Here is an example that adds the parsed date to the `timestamp` field based on the `initial_date` field:
 
 ```js
@@ -62,3 +69,47 @@ The `timezone` and `locale` processor parameters are templated. This means that
 }
 ```
 
+In the example below, the `message` field in the input is expected to be a string formed of a local date-time in `yyyyMMddHHmmss` format, a timezone abbreviated to one of `PST`, `CET`, or `JST` representing Pacific, Central European, or Japan time, and a body. This field is split up with a `grok` processor, then the timezone abbreviations are converted into standard full names with a `script` processor, then the date-time is parsed with a `date` processor, and finally the unwanted fields are discarded with a `remove` processor.
+
+```js
+{
+  "description" : "...",
+  "processors": [
+    {
+      "grok": {
+        "field": "message",
+        "patterns": ["%{DATESTAMP_EVENTLOG:local_date_time} %{TZ:short_tz} %{GREEDYDATA:body}"],
+        "pattern_definitions": {
+          "TZ": "[A-Z]{3}"
+        }
+      }
+    },
+    {
+      "script": {
+        "source": "ctx['full_tz'] = params['tz_map'][ctx['short_tz']]",
+        "params": {
+          "tz_map": {
+            "PST": "America/Los_Angeles",
+            "CET": "Europe/Amsterdam",
+            "JST": "Asia/Tokyo"
+          }
+        }
+      }
+    },
+    {
+      "date": {
+        "field": "local_date_time",
+        "formats": ["yyyyMMddHHmmss"],
+        "timezone": "{{{full_tz}}}"
+      }
+    },
+    {
+      "remove": {
+        "field": ["message", "local_date_time", "short_tz", "full_tz"]
+      }
+    }
+  ]
+}
+```
+
+With that pipeline, a `message` field with the value `20250102123456 PST Hello world` will result in a `@timestamp` field with the value `2025-01-02T12:34:56.000-08:00` and a `body` field with the value `Hello world`. (Note: A `@timestamp` field will normally be mapped to a `date` type, and therefore it will be indexed as an integer representing milliseconds since the epoch, although the original format and timezone may be preserved in the `_source`.)
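
For anyone checking the expected output while reviewing, the pipeline's logic can be approximated outside Elasticsearch. The following is only a rough sketch in Python, not how the processors are implemented: `simulate_pipeline`, `TZ_MAP`, and `MESSAGE_RE` are hypothetical names, the regular expression is an assumed stand-in for the grok pattern, and the standard-library `zoneinfo` module stands in for the timezone handling of the `date` processor.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo

# Same abbreviation-to-zone mapping as the `tz_map` params of the script processor.
TZ_MAP = {
    "PST": "America/Los_Angeles",
    "CET": "Europe/Amsterdam",
    "JST": "Asia/Tokyo",
}

# Assumed stand-in for the grok pattern: 14 digits, a 3-letter zone, then the body.
MESSAGE_RE = re.compile(
    r"^(?P<local_date_time>\d{14}) (?P<short_tz>[A-Z]{3}) (?P<body>.*)$"
)


def simulate_pipeline(message: str) -> dict:
    """Approximate the grok -> script -> date -> remove steps for one message."""
    match = MESSAGE_RE.match(message)
    if match is None:
        raise ValueError(f"message does not match the expected layout: {message!r}")

    # "script" step: map the abbreviation to a full zone name.
    # An unknown abbreviation raises KeyError in this sketch.
    full_tz = TZ_MAP[match["short_tz"]]

    # "date" step: parse the local date-time, then interpret it in that zone.
    local = datetime.strptime(match["local_date_time"], "%Y%m%d%H%M%S")
    zoned = local.replace(tzinfo=ZoneInfo(full_tz))

    # "remove" step: only the timestamp and body are kept.
    return {
        "@timestamp": zoned.isoformat(timespec="milliseconds"),
        "body": match["body"],
    }
```

Calling `simulate_pipeline("20250102123456 PST Hello world")` yields a `@timestamp` of `2025-01-02T12:34:56.000-08:00` and a `body` of `Hello world`, which matches the values quoted in the added documentation.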