Ingest pipeline processors: Syntax for explicit access of fields with dots #125841
Labels
:Data Management/Ingest Node
>enhancement
Team:Data Management
Description
Context
It's not uncommon for documents to have fields with dots in their names.
When specifying a field in a processor (e.g. grok, rename or others), it's currently not possible to target these fields, because dots are always interpreted as nested objects.
{ "grok": { "field": "nested.a.b.c" }}
will only work on
{ "nested": { "a": { "b": { "c": "This is a test" } } } }
This is especially relevant for OTel data and the streams project, which plans to transform all incoming data to match the OTel format.
Solution
A new syntax should be introduced to allow accessing these fields in all processors. Dots are interpreted as nested objects except when enclosed in [' and '].
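As a minimal sketch of how such paths might be split into keys, the following Python function treats dots as separators except inside [' and ']. It also handles a backslash inside the quotes by taking the next character literally. The names `parse_field_path` and `get_field`, the exact escape rule, and the sample paths are assumptions for illustration; they are not taken from the POC.

```python
from typing import Any, List


def parse_field_path(path: str) -> List[str]:
    """Split a field path into keys.

    Dots separate nested keys, except inside ['...'], where the quoted text
    is one key. Inside the quotes, a backslash makes the next character
    literal (assumed escape rule).
    """
    keys: List[str] = []
    buf: List[str] = []
    i, n = 0, len(path)
    while i < n:
        if path.startswith("['", i):
            if buf:  # flush a plain key written directly before the bracket
                keys.append("".join(buf))
                buf = []
            i += 2
            quoted: List[str] = []
            while True:
                if i >= n:
                    raise ValueError(f"unterminated [' in {path!r}")
                if path[i] == "\\" and i + 1 < n:
                    quoted.append(path[i + 1])  # escaped character
                    i += 2
                elif path.startswith("']", i):
                    i += 2
                    break
                else:
                    quoted.append(path[i])
                    i += 1
            keys.append("".join(quoted))
            if i < n and path[i] == ".":  # skip the dot after a bracket
                i += 1
        elif path[i] == ".":
            keys.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(path[i])
            i += 1
    if buf:
        keys.append("".join(buf))
    return keys


def get_field(doc: dict, path: str) -> Any:
    """Walk `doc` along the parsed keys; raises KeyError if a key is missing."""
    current: Any = doc
    for key in parse_field_path(path):
        current = current[key]
    return current


print(parse_field_path("nested.a.b.c"))     # ['nested', 'a', 'b', 'c']
print(parse_field_path("nested['a.b.c']"))  # ['nested', 'a.b.c']
print(get_field({"nested": {"a.b.c": "This is a test"}}, "nested['a.b.c']"))  # This is a test
```

Under these assumptions, a processor configured with the field `nested['a.b.c']` would target the dotted key directly, while `nested.a.b.c` keeps its current nested meaning.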
It's possible to escape quotes within the quotes using \ to still access field names with brackets in them.
Open questions
How does this syntax interact with mustache templates, which are supported in some cases? For the scope of the observability team, it would be OK not to support them initially; this could be added later on.
Breaking change
This feature constitutes a change of behavior: using [' followed by '] in a field name specified in an ingest pipeline is currently allowed, and these are treated as regular characters. However, such cases are expected to be very rare.
Draft for breaking change proposal: https://github.com/elastic/dev/issues/3091
Why not dot_expander?
The dot_expander processor addresses a similar need by normalizing the data instead of allowing the user to specify the difference. However, it has some downsides that are unacceptable in some cases.
References
POC: #125566
Discussion: https://github.com/elastic/streams-program/discussions/224