Ingest pipeline processors: Syntax for explicit access of fields with dots #125841

Open
flash1293 opened this issue Mar 28, 2025 · 1 comment · May be fixed by #125566
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team


flash1293 commented Mar 28, 2025

Description

Context

It's not uncommon for documents to have fields with dots in them:

{
  "nested": {
     "a.b.c": "This is a test"
  }
}

When specifying a field in a processor (e.g. grok, rename or others), it's currently not possible to target these fields, because dots are always interpreted as nested objects. { "grok": { "field": "nested.a.b.c" }} will only work on { "nested": { "a": { "b": { "c": "This is a test" } } } }.
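To make the limitation concrete, here is a minimal Python sketch of path resolution that splits on every dot. It only models the behavior described above and is not the actual Elasticsearch implementation; `resolve_dotted` is a hypothetical helper name.

```python
def resolve_dotted(doc, path):
    """Resolve a field path by treating every dot as a step into a nested object.

    Illustrative model only, not Elasticsearch code.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current


nested = {"nested": {"a": {"b": {"c": "This is a test"}}}}
dotted = {"nested": {"a.b.c": "This is a test"}}

print(resolve_dotted(nested, "nested.a.b.c"))  # This is a test
print(resolve_dotted(dotted, "nested.a.b.c"))  # None
```

With this strategy no path string can reach the `a.b.c` key in the second document.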

This is especially relevant for OTel data and for the Streams project, which plans to transform all incoming data to match the OTel format.

Solution

A new syntax should be introduced to allow accessing these fields in all processors. Dots are interpreted as nested objects except when enclosed in [' and ']:

{ "grok": { "field": "nested['a.b.c']" }}

Some examples:

"resource.attributes['bar.foo']" // matches {"resource": {"attributes": {"bar.foo": "…"}}}
"['resource']['attributes']['bar.foo']" // same as above
"resource.attributes.bar.foo" // matches {"resource": {"attributes": {"bar": {"foo": "…"}}}}
"['resource']['attributes']['bar']['foo']" // matches {"resource": {"attributes": {"bar": {"foo": "…"}}}}
"['resource.attributes']['bar.foo']" // matches {"resource.attributes": {"bar.foo": "…"}}
"['resource.attributes.bar.foo']" // matches {"resource.attributes.bar.foo": "…"}

Quotes inside a quoted segment can be escaped with \, which makes it possible to access field names that themselves contain brackets and quotes:

my['weird[\'fieldname\']'] // matches { "my": { "weird['fieldname']": "..." } }
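A rough Python sketch of how such paths could be split into segments is shown below. It is an illustration of the proposed syntax under a few assumptions, not the POC implementation: the names `parse_field_path` and `get_field` are hypothetical, a backslash inside a quoted segment is assumed to escape whichever character follows it, and a dot directly after a `']` is assumed to be an optional separator.

```python
def parse_field_path(path):
    """Split a field path into segments (sketch of the proposed syntax).

    Dots separate segments, except inside ['...'], where the quoted text is
    taken literally. Inside quotes, a backslash escapes the next character
    (an assumption of this sketch).
    """
    segments = []
    pending = []  # characters of the current unquoted segment
    i, n = 0, len(path)
    while i < n:
        if path.startswith("['", i):
            if pending:  # flush e.g. "attributes" in attributes['bar.foo']
                segments.append("".join(pending))
                pending = []
            i += 2
            quoted = []
            while True:
                if i >= n:
                    raise ValueError("unterminated [' in field path")
                if path[i] == "\\" and i + 1 < n:
                    quoted.append(path[i + 1])  # escaped character, kept as-is
                    i += 2
                elif path.startswith("']", i):
                    i += 2
                    break
                else:
                    quoted.append(path[i])
                    i += 1
            segments.append("".join(quoted))
            if i < n and path[i] == ".":
                i += 1  # optional dot between ['a'] and the next segment
        elif path[i] == ".":
            segments.append("".join(pending))
            pending = []
            i += 1
        else:
            pending.append(path[i])
            i += 1
    if pending:
        segments.append("".join(pending))
    return segments


def get_field(doc, path):
    """Walk a document along the parsed segments; None if the path is missing."""
    current = doc
    for key in parse_field_path(path):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


print(parse_field_path("resource.attributes['bar.foo']"))
# ['resource', 'attributes', 'bar.foo']
print(get_field({"nested": {"a.b.c": "This is a test"}}, "nested['a.b.c']"))
# This is a test
```

Error handling for malformed paths (for example an empty segment in `a..b`) is deliberately left out; how those cases should behave is not specified here.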

Open questions

How does this syntax interact with Mustache templates, which are supported in some cases? For the scope of the observability team, it would be OK not to support them initially; support could be added later.

Breaking change

This feature constitutes a change of behavior: a field name specified in an ingest pipeline may currently contain [' followed by '], and these are treated as regular characters. However, such cases are expected to be very rare.
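As a rough aid for auditing existing pipelines, the condition above can be expressed as a simple string check. This is a hypothetical helper (`changes_meaning` is not part of any Elasticsearch API), and it only covers the case named above; how unbalanced brackets would be handled by the new syntax is not modeled.

```python
def changes_meaning(path):
    """Return True if a field path contains [' followed later by '].

    Such a path is treated as literal characters today but would be read as
    a quoted segment under the proposed syntax. Hypothetical audit helper.
    """
    start = path.find("['")
    return start != -1 and path.find("']", start + 2) != -1


print(changes_meaning("nested['a.b.c']"))  # True
print(changes_meaning("nested.a.b.c"))     # False
```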

Draft for breaking change proposal: https://github.com/elastic/dev/issues/3091

Why not dot_expander?

The dot_expander processor addresses a similar need by normalizing the data rather than letting the user distinguish dotted field names from nested ones. However, it has downsides that are unacceptable in some cases:

  • A prefix of a dotted field name cannot also hold a primitive value (a shape that is common in OTel data):
{
  "host": "abc",
  "host.name": "def"  // can't be dot-expanded without breaking host
}
  • Collisions are possible when the same path exists in both nested and dotted form:
{
  "host": { "name": "abc" },
  "host.name": "def"
}
  • Different from OTTL, which allows this style of access
  • Changes the shape of the data, which loses information: after expansion it is impossible to tell dotted field names from nested field names
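The first, second and last points can be reproduced with a naive expansion routine. The Python sketch below is purely illustrative: `expand_dots` is a hypothetical function, it is not the dot_expander implementation, and how the real processor reports or resolves such conflicts is not modeled here.

```python
def expand_dots(doc):
    """Naively expand top-level dotted keys into nested objects (illustration only)."""
    result = {}
    for key, value in doc.items():
        parts = key.split(".")
        target = result
        for part in parts[:-1]:
            existing = target.get(part)
            if existing is None:
                existing = target[part] = {}
            elif not isinstance(existing, dict):
                # e.g. "host" is already a string, so "host.name" has nowhere to go
                raise ValueError(f"cannot expand {key!r}: {part!r} holds a primitive value")
            target = existing
        leaf = parts[-1]
        if leaf in target:
            # e.g. {"host": {"name": ...}} and "host.name" both claim the same slot
            raise ValueError(f"cannot expand {key!r}: collides with an existing field")
        target[leaf] = value
    return result


# Information loss: both inputs expand to the same shape.
print(expand_dots({"a.b": 1}) == expand_dots({"a": {"b": 1}}))  # True

for doc in ({"host": "abc", "host.name": "def"},
            {"host": {"name": "abc"}, "host.name": "def"}):
    try:
        expand_dots(doc)
    except ValueError as err:
        print(err)
```

Both example documents from the list fail to expand in this sketch, while the bracket syntax could address `host` and `['host.name']` separately without touching the document.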

References

POC: #125566
Discussion: https://github.com/elastic/streams-program/discussions/224

@flash1293 flash1293 added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement needs:triage Requires assignment of a team area label labels Mar 28, 2025
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Mar 28, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)
