@susan-shu-c susan-shu-c commented Sep 18, 2025

1. What does this PR do?

2. Which ECS fields are affected/introduced?

| Field | Type | Description / Usage |
| --- | --- | --- |
| gen_ai.system_instructions | flattened | The system message or instructions provided to the GenAI model separately from the chat history. |
| gen_ai.input.messages | nested | The chat history provided to the model as an input. |
| gen_ai.output.messages | nested | Messages returned by the model where each message represents a specific model response (choice, candidate). |
| gen_ai.tool.definitions | nested | The list of source system tool definitions available to the GenAI agent or model. |
| gen_ai.tool.call.arguments | flattened | Parameters passed to the tool call. |
| gen_ai.tool.call.result | flattened | The result returned by the tool call (if any and if execution was successful). |

Changes based on OTel:

3. Why is this change necessary?

4. Have you added/updated documentation?

YES / NO / N/A

5. Have you built ECS and committed any newly generated files?

YES / NO

6. Have you run the ECS validation tests locally?

YES / NO

7. Anything else for the reviewers?

Looking for feedback

[Edit: see comment]

For the fields where it would be more useful to keep the associations and have more cases for searching, I changed the field type to nested, and for those that don't need the associations and probably don't need nested searching, I changed them to flattened.

Most of the fields are JSON objects or lists of JSON objects. For fields whose content could be very long (input.messages, output.messages), I initially proposed the flattened type because of cost.

From the docs for the nested type:

When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Instead, consider using the flattened data type, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the flattened data type for this use case is a better option.

As I am not a subject matter expert on the field types and their efficiency, I am looking for additional feedback or comments.
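For reference, the types listed in the table above could be expressed as an Elasticsearch mapping roughly as follows. This is a minimal sketch: `FIELD_TYPES` and `build_properties` are hypothetical names, not part of the ECS tooling, and sub-field mappings inside the nested fields are omitted.

```python
# Field types as listed in the PR description.
FIELD_TYPES = {
    "gen_ai.system_instructions": "flattened",
    "gen_ai.input.messages": "nested",
    "gen_ai.output.messages": "nested",
    "gen_ai.tool.definitions": "nested",
    "gen_ai.tool.call.arguments": "flattened",
    "gen_ai.tool.call.result": "flattened",
}


def build_properties(field_types):
    """Turn dotted field names into a nested Elasticsearch `properties` tree."""
    root = {}
    for dotted_name, field_type in field_types.items():
        *parents, leaf = dotted_name.split(".")
        node = root
        for part in parents:
            # Each parent becomes an object field with its own `properties`.
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[leaf] = {"type": field_type}
    return root


# Body that could be sent when creating an index (hypothetical usage).
mapping = {"mappings": {"properties": build_properties(FIELD_TYPES)}}
```

Both `nested` and `flattened` are standard Elasticsearch field types, so the only decision encoded here is which fields get which type.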



Documentation changes preview: https://docs-v3-preview.elastic.dev/elastic/ecs/pull/2532/reference/


github-actions bot commented Sep 18, 2025

🔍 Preview links for changed docs

| [ECS](/reference/ecs-ecs.md) | Meta-information specific to ECS. |
| [ELF Header](/reference/ecs-elf.md) | These fields contain Linux Executable Linkable Format (ELF) metadata. |
| [Email](/reference/ecs-email.md) | Describes an email transaction. |
| [Entity](/reference/ecs-entity.md) | Fields to describe various types of entities across IT environments. |
Member Author

@susan-shu-c susan-shu-c Oct 2, 2025

I've rebased off main and then generated the files; I'm not sure why these changes are still showing up.

Contributor

There was a bug, unrelated to this PR, in how the entity fields are generated. The fix for that isn't merged yet.

Don't worry about these changes in your PR for now; they'll go away once the other fix is merged.

Member Author

Is this done, so that I can regenerate the documents? cc @trisch-me

description: The system message or instructions provided to the GenAI model separately from the chat history.
example: TODO
level: extended
beta: This field reuse is beta and subject to change.
Contributor

Why is it a field reuse?

Member Author

I used the wording from @mjwolf for the initial batch of fields: #2475 (comment)
However, let me know if this needs an update.

Contributor

Sorry, I was incorrect before. The text should be "This field is beta and subject to change." for all the beta fields here.

Member Author

Changed all occurrences, thanks for catching that

Contributor

@trisch-me trisch-me left a comment

As stage 2 is the final stage, please update all examples and generate all fields.

flash1293 commented Oct 8, 2025

The main difference to keep in mind is that flattened does not retain the association between the properties within each object of an array, while nested does. Taking an example from the OTel repo:

[
  {
    "role": "system",
    "parts": [
      {
        "type": "text",
        "content": "You are a helpful bot"
      }
    ]
  },
  {
    "role": "user",
    "parts": [
      {
        "type": "text",
        "content": "Tell me a joke about OpenTelemetry"
      }
    ]
  }
]

With flattened, it would not be possible to query for something like "the message with the system role contains the text helpful bot", because the data is stored like this:

{
  "role": ["system",  "user"],
  "parts.content": [
    "You are a helpful bot",
    "Tell me a joke about OpenTelemetry"
  ]
}

The association between the role field and the parts.content field is lost. In the example it looks like it would be possible to just look at the index of the value in the array, but that doesn't work in practice for multiple reasons.

So whether it should be flattened or nested really depends on what we want to do with this data later on.
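The loss of association can be illustrated with a toy model in plain Python. It does not call Elasticsearch, and substring matching stands in for real query semantics; `flatten`, `flattened_match`, and `nested_match` are hypothetical helpers written only for this illustration.

```python
from collections import defaultdict

# The example messages from the OTel repo quoted above.
messages = [
    {"role": "system", "parts": [{"type": "text", "content": "You are a helpful bot"}]},
    {"role": "user", "parts": [{"type": "text", "content": "Tell me a joke about OpenTelemetry"}]},
]


def flatten(value, prefix="", out=None):
    """Collect every leaf value under its dotted path, discarding which object it came from."""
    if out is None:
        out = defaultdict(list)
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        for item in value:
            flatten(item, prefix, out)
    else:
        out[prefix].append(value)
    return out


def flattened_match(docs, role, text):
    """Both conditions are checked against the pooled values, so they can match different messages."""
    flat = flatten(docs)
    return role in flat["role"] and any(text in c for c in flat["parts.content"])


def nested_match(docs, role, text):
    """Both conditions must hold within the same message."""
    return any(
        m["role"] == role and any(text in p["content"] for p in m["parts"])
        for m in docs
    )


# "user" never said "helpful bot", yet the flattened model reports a match.
false_positive = flattened_match(messages, "user", "helpful bot")
correct_answer = nested_match(messages, "user", "helpful bot")
```

Here `false_positive` is `True` and `correct_answer` is `False`, which is the kind of cross-object match that nested avoids.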

I'm not an expert on the genai stuff, not sure what to do about it...

@susan-shu-c
Member Author

@flash1293 thanks a lot for explaining the tradeoffs. For the fields where it would be more useful to keep the associations and have more cases for searching, I changed the field type to nested, and for those that don't need the associations and probably don't need nested searching, I changed them to flattened. Really appreciate the help.
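For the fields mapped as nested, a query that keeps the role and content together might look like the sketch below. It builds a standard Elasticsearch `nested` query body as a Python dict. The helper name `nested_message_query` and the sub-field paths `role` and `parts.content` are assumptions taken from the OTel example in this thread, not confirmed mappings, and the `term` clause assumes `role` is indexed as a keyword.

```python
def nested_message_query(path, role, text):
    """Build a query matching messages under `path` whose role and content match together."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            # Exact match on the role of one message.
                            {"term": {f"{path}.role": role}},
                            # Full-text match on the content of that same message.
                            {"match": {f"{path}.parts.content": text}},
                        ]
                    }
                },
            }
        }
    }


# Hypothetical usage: find documents where the system message mentions "helpful bot".
query_body = nested_message_query("gen_ai.input.messages", "system", "helpful bot")
```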

gen_ai.tool.definitions | (Looking for feedback) nested | (Part of invoke_agent span) The list of source system tool definitions available to the GenAI agent or model.
gen_ai.tool.call.arguments | (Looking for feedback) nested | (Part of OTel execute_tool span) Parameters passed to the tool call.
gen_ai.tool.call.result | (Looking for feedback) nested | (Part of OTel execute_tool span) The result returned by the tool call (if any and if execution was successful).
gen_ai.tool.call.result | (Looking for feedback) flattened | (Part of OTel execute_tool span) The result returned by the tool call (if any and if execution was successful).
Contributor

You can take out (Looking for feedback) if the types have been decided.

Member Author

Hi, apologies, I didn't realize this file had to be updated as well. Let me go and refresh all these files so that they are all up to date

Member Author

Updated

@flash1293

Thanks @susan-shu-c - side note, do we have cases of existing nested and flattened ECS fields already?

@susan-shu-c
Member Author

Hi @flash1293 there are a few:

Flattened:
elf.exports
log.syslog.structured_data

Nested:
elf.sections
threat.enrichments

@susan-shu-c
Member Author

susan-shu-c commented Oct 10, 2025

Thanks for the comments all. I've updated the following:

  1. Cleaned up the rfcs/text/0052-gen_ai-additional-fields.md file to bring it in line with rfcs/text/0052/gen_ai.yaml, and removed unused comments.
  2. Updated schemas/gen_ai.yml, which Michael said the published schema will be taken from.

However, when I try to run make clean generate experimental, I am getting some unexpected behavior: many .md files in docs are being deleted. I am trying to resolve this with @mjwolf.

I am also getting a failure in the tests:

  File "/code/ecs/scripts/generators/otel.py", line 156, in __set_stability
    otel['stability'] = self.attributes[get_otel_attribute_name(field_details, otel)]['stability']
                        ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'gen_ai.output.messages'
make: *** [generator] Error 1

OTel reference: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/#gen-ai-output-messages
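One way to narrow down a `KeyError` like this is to list which mapped OTel attribute names are absent from the attribute table the generator has loaded. The sketch below is hypothetical and does not use the real `otel.py` internals: `find_missing_otel_attributes` and the sample data are made up for illustration. The only thing taken from the traceback is that the table is indexed by attribute name and each entry has a `stability` key.

```python
def find_missing_otel_attributes(ecs_to_otel, otel_attributes):
    """Return ECS fields whose mapped OTel attribute is absent from the loaded table."""
    return sorted(
        ecs_field
        for ecs_field, otel_name in ecs_to_otel.items()
        if otel_name not in otel_attributes
    )


# Made-up stand-in for the loaded attribute table; the stability value is illustrative.
otel_attributes = {
    "gen_ai.input.messages": {"stability": "development"},
}

# Made-up stand-in for the ECS field to OTel attribute mapping.
ecs_to_otel = {
    "gen_ai.input.messages": "gen_ai.input.messages",
    "gen_ai.output.messages": "gen_ai.output.messages",
}

missing = find_missing_otel_attributes(ecs_to_otel, otel_attributes)
```

With this sample data, `missing` contains only `gen_ai.output.messages`. If a check like this flags the field against the real table, the loaded attribute definitions may simply not include it yet, though that is an assumption rather than a confirmed cause.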


4 participants