Proposal: Inclusion of Trace Object & Profile #1243

Open
wants to merge 8 commits into
base: main

@pladamgregory pladamgregory commented Nov 5, 2024


Proposal: Inclusion of Trace Object & Profile

Description: This proposal introduces the concept of traces to the OCSF schema. The goal is to enhance OCSF to cover observability data for distributed traces.

Traces contain vital information about the flow of requests through a distributed system, including a unique trace ID, individual span IDs, timestamps for start and end, duration, and metadata such as service names and error details. This data provides a comprehensive view of how requests are processed, revealing performance metrics and service dependencies. Traces are useful for performance monitoring, as they help identify bottlenecks and slow operations. They also facilitate root cause analysis by allowing developers to pinpoint issues and optimize the overall system for improved reliability and user experience.

To support the proposal, here is how the example would be modeled for a purchase transaction trace. It illustrates how each span and event would be structured using OCSF:

Example Trace: Purchase Transaction Trace

1. User Service Span

  • Span Name: User Authentication
  • Service: User Service
  • Duration: 10ms
  • Events:
    • start_auth: Marks when authentication started
    • db_query: Records time spent querying the user database
    • auth_success: Indicates successful authentication

2. Order Service Span

  • Span Name: Create Order
  • Service: Order Service
  • Parent Span: User Authentication (order creation requires user authentication)
  • Duration: 50ms
  • Events:
    • validate_cart: Checks if all items in the cart are available
    • calculate_total: Calculates the total price
    • order_created: Confirms that the order was created in the system

3. Payment Service Span

  • Span Name: Process Payment
  • Service: Payment Service
  • Parent Span: Create Order
  • Duration: 100ms
  • Events:
    • start_payment: Marks the initiation of the payment process
    • payment_gateway_call: Time spent calling an external payment gateway
    • payment_success: Confirms successful payment processing

4. Inventory Service Span

  • Span Name: Update Inventory
  • Service: Inventory Service
  • Parent Span: Create Order
  • Duration: 30ms
  • Events:
    • inventory_lock: Temporarily locks inventory items
    • update_db: Updates inventory database to reflect items sold
    • inventory_release: Releases inventory lock

5. Notification Service Span

  • Span Name: Send Confirmation Email
  • Service: Notification Service
  • Parent Span: Create Order
  • Duration: 20ms
  • Events:
    • email_generated: Generates the email content
    • email_sent: Confirms the email was sent to the user

Summary of Trace

  • Trace: Purchase Item
  • Flow: User Authentication → Create Order → Process Payment → Update Inventory → Send Confirmation Email
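
As a rough illustration of the span hierarchy listed above, the following Python sketch models the five spans with their parent links and prints them as an indented tree. The `Span` dataclass and its field names are hypothetical, chosen only for this example; they are not part of the proposed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    name: str
    service: str
    duration_ms: int
    parent: Optional[str] = None  # name of the parent span; None marks the root


# The five spans from the purchase transaction example.
SPANS = [
    Span("User Authentication", "User Service", 10),
    Span("Create Order", "Order Service", 50, parent="User Authentication"),
    Span("Process Payment", "Payment Service", 100, parent="Create Order"),
    Span("Update Inventory", "Inventory Service", 30, parent="Create Order"),
    Span("Send Confirmation Email", "Notification Service", 20, parent="Create Order"),
]


def children_of(parent: Optional[str]) -> list:
    """Names of the spans whose parent is `parent`, in declaration order."""
    return [s.name for s in SPANS if s.parent == parent]


def render(parent: Optional[str] = None, depth: int = 0) -> list:
    """Return the span tree as indented lines, depth-first."""
    lines = []
    for name in children_of(parent):
        lines.append("  " * depth + name)
        lines.extend(render(name, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render()))
```

Going by the parent links, Process Payment, Update Inventory, and Send Confirmation Email are siblings under Create Order rather than steps in a strictly linear chain.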

OCSF Model (Table)

| Action | Description | Event Class | Profile Type | Trace ID | Span ID |
| --- | --- | --- | --- | --- | --- |
| start_auth | Marks when authentication started | 3002 | Trace Profile | Trace_001 | Span_001 |
| db_query | Records time spent querying the user database | 6005 | Trace Profile | Trace_001 | Span_002 |
| auth_success | Indicates successful authentication | 3002 | Trace Profile | Trace_001 | Span_003 |
| validate_cart | Checks if all items in the cart are available | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_004 |
| calculate_total | Calculates the total price | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_005 |
| order_created | Confirms that the order was created in the system | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_006 |
| start_payment | Marks the initiation of the payment process | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_007 |
| payment_gateway_call | Time spent calling an external payment gateway | 6003 | Trace Profile | Trace_001 | Span_008 |
| payment_success | Confirms successful payment processing | 6003 | Trace Profile | Trace_001 | Span_009 |
| inventory_lock | Temporarily locks inventory items | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_010 |
| update_db | Updates inventory database to reflect items sold | 6005 | Trace Profile | Trace_001 | Span_011 |
| inventory_release | Releases inventory lock | 6009 (New Application Execution Activity) | Trace Profile | Trace_001 | Span_012 |
| email_generated | Generates the email content | 4009 | Trace Profile | Trace_001 | Span_013 |
| email_sent | Confirms the email was sent to the user | 4009 | Trace Profile | Trace_001 | Span_014 |

New trace_info Object & Profile

Trace Object: Defines key application trace information for trace events (included via the traces profile).

{
  "caption": "Trace",
  "description": "The trace object contains information about distributed traces, which are critical to observability and describe how requests move through a system, capturing each step's timing and status.",
  "extends": "object",
  "name": "trace",
  "attributes": {
    "uid": {
      "description": "The unique identifier of the trace used in distributed systems and microservices architecture to track and correlate requests across various components of an application.",
      "requirement": "required"
    },
    "span": {
      "description": "The attributes associated with a span within a distributed trace.",
      "requirement": "optional"
    },
    "service": {
      "description": "Identifies the service or component generating the trace.",
      "requirement": "optional"
    },
    "status_code": {
      "description": "Indicates whether the operations in the trace were successful, failed, or had an error, aiding in pinpointing issues.",
      "requirement": "optional"
    },
    "start_time": {
      "description": "The start timestamp of the trace, essential for identifying latency and performance bottlenecks.",
      "requirement": "optional"
    },
    "end_time": {
      "description": "The end timestamp of the trace, essential for identifying latency and performance bottlenecks.",
      "requirement": "optional"
    },
    "duration": {
      "description": "The trace duration, the amount of time the trace covers from <code>start_time</code> to <code>end_time</code> in milliseconds.",
      "requirement": "optional"
    }
  }
}

New Trace Attributes: Enum of key application Trace Information for trace events.

"trace": {
  "caption": "Trace",
  "description": "The attributes associated with an event containing trace data.",
  "type": "trace"
},
"span": {
  "caption": "Span",
  "description": "The attributes associated with an event containing span data.",
  "type": "span"
},

New Span Object Attributes: Enum of key application Trace Information for trace events.

{
  "caption": "Span",
  "description": "The attributes associated with an event containing span data.",
  "extends": "object",
  "name": "span",
  "attributes": {
    "uid": {
      "description": "The unique identifier of the span used in distributed systems and microservices architecture to track and correlate requests across various components of an application.",
      "requirement": "required"
    },
    "service": {
      "description": "Identifies the service or component creating the span, which helps track its path through a distributed system.",
      "requirement": "optional"
    },
    "operation": {
      "description": "Describes the actions performed in a span, such as API requests, database queries, or computations.",
      "requirement": "optional",
      "is_array": true
    },
    "parent_span": {
      "description": "The parent span of this span object. It is recommended to only populate this field for the first span object, to prevent deep nesting.",
      "requirement": "optional"
    },
    "start_time": {
      "description": "The start timestamp of the span, essential for identifying latency and performance bottlenecks.",
      "requirement": "optional"
    },
    "end_time": {
      "description": "The end timestamp of the span, essential for identifying latency and performance bottlenecks.",
      "requirement": "optional"
    },
    "duration": {
      "description": "The span duration, the amount of time the span covers from <code>start_time</code> to <code>end_time</code> in milliseconds.",
      "requirement": "optional"
    },
    "status_code": {
      "description": "Indicates whether the operations in the span were successful, failed, or had an error, aiding in pinpointing issues.",
      "requirement": "optional"
    }
  }
}
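
A minimal sketch of how a producer might check the `required` markers in the trace and span definitions above before emitting an event. The function name `missing_required` is an assumption for illustration; actual OCSF validation tooling may work differently.

```python
def missing_required(trace: dict) -> list:
    """List the required attributes that are absent or empty.

    Per the definitions above, `uid` is the only required attribute on
    both the trace object and the (optional) nested span object.
    """
    problems = []
    if not trace.get("uid"):
        problems.append("trace.uid")
    span = trace.get("span")
    if span is not None and not span.get("uid"):
        problems.append("trace.span.uid")
    return problems


if __name__ == "__main__":
    # A span without a uid is flagged; the trace uid is present.
    print(missing_required({"uid": "Trace_001", "span": {"service": "User Service"}}))
```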

Traces profile

{
  "description": "The Traces Profile extends the OCSF framework to capture and standardize observability events, specifically targeting trace-level data. This profile enables integration and normalization of distributed tracing information, allowing OCSF events to retain essential trace context such as trace IDs, span relationships, and service dependencies.",
  "meta": "profile",
  "caption": "Traces",
  "name": "traces",
  "annotations": {
    "group": "primary"
  },
  "attributes": {
    "trace": {
      "description": "The trace object contains information about distributed traces, which are critical to observability and describe how requests move through a system, capturing each step's timing and status.",
      "requirement": "recommended"
    }
  }
}
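
To make the profile's effect concrete, here is a hedged sketch of tagging an existing event with trace context. It assumes the profile is advertised through the event's `metadata.profiles` list under the name `traces`, following the usual OCSF convention for profiles; the helper `with_trace_profile` is hypothetical.

```python
import copy
from typing import Optional


def with_trace_profile(event: dict, trace_uid: str, span_uid: Optional[str] = None) -> dict:
    """Return a copy of `event` carrying a `trace` object and the profile name."""
    tagged = copy.deepcopy(event)  # leave the caller's event untouched
    trace = {"uid": trace_uid}
    if span_uid is not None:
        trace["span"] = {"uid": span_uid}
    tagged["trace"] = trace
    # Assumption: applied profiles are listed under metadata.profiles.
    profiles = tagged.setdefault("metadata", {}).setdefault("profiles", [])
    if "traces" not in profiles:
        profiles.append("traces")
    return tagged


if __name__ == "__main__":
    auth_event = {"class_uid": 3002, "activity_id": 1}
    print(with_trace_profile(auth_event, "Trace_001", "Span_001"))
```

The event class itself is unchanged; the profile only adds the `trace` attribute alongside whatever the class already defines.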

Reverting span operation back to string in span.json

Signed-off-by: Adam Gregory <[email protected]>
objects/span.json (outdated)
"description": "The unique identifier of the trace used in distributed systems and microservices architecture to track and correlate requests across various components of an application.",
"requirement": "required"
},
"span": {
Contributor

@davemcatcisco davemcatcisco Nov 7, 2024

Does every trace have just a single top-level span? In your example, your Purchase Item trace seems to contain five spans. Is it assumed that there will be a single top-level root span that all of these will refer to as their parent_span? What I'm asking, I guess, is whether this should be an array of spans.

I'm a little bit confused too about how and where child spans would be represented. If a trace decomposes into one or more spans, and if each span can be further decomposed, then does it not make sense for the whole thing to be a sort of recursive structure? e.g.

{
  "spans": [
    {
      "service": {},
      "operation": {},
      "spans": [
        {
          "service": {},
          "operation": {}
        },
        {
          "service": {},
          "operation": {},
          "spans": [
            {
              "service": {},
              "operation": {}
            }
          ]
        }
      ]
    },
    {
      "service": {},
      "operation": {},
      "spans": [
        {
          "service": {},
          "operation": {}
        }
      ]
    }
  ]
}

It's possible I'm completely missing the point here!
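
For comparison, the sketch below shows how a flat list of spans that each reference a parent could be folded into the nested `spans` shape asked about here. The `parent_uid` key is a simplifying assumption (the proposal uses a `parent_span` object), and `nest_spans` is a hypothetical helper.

```python
from collections import defaultdict


def nest_spans(flat_spans: list) -> dict:
    """Fold spans that point at a parent uid into a recursive 'spans' tree."""
    by_parent = defaultdict(list)
    for span in flat_spans:
        # Spans without a parent_uid are treated as top-level (key None).
        by_parent[span.get("parent_uid")].append(span)

    def build(parent_uid):
        nodes = []
        for span in by_parent.get(parent_uid, []):
            node = {k: v for k, v in span.items() if k != "parent_uid"}
            children = build(span["uid"])
            if children:
                node["spans"] = children
            nodes.append(node)
        return nodes

    return {"spans": build(None)}


if __name__ == "__main__":
    flat = [
        {"uid": "auth"},
        {"uid": "order", "parent_uid": "auth"},
        {"uid": "payment", "parent_uid": "order"},
    ]
    print(nest_spans(flat))
```

Both shapes carry the same information; the flat form keeps each record small, while the nested form requires the whole trace to be known before anything can be emitted.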

Author

I think within each span there might be several events, each of which should be represented by an entire OCSF class. For example, a span might contain a change freeze event, a database update event, and a change unfreeze event. Each of these will be represented as its own OCSF record, hence the correlation of one span to one event in the context of OCSF.

Author

Please let me know if this explanation is satisfactory.

Contributor

I think within each span, there might be several events which should be represented by the entire ocsf class. For example a span might contain [multiple events]

Sorry, I'm not really understanding. According to the way you're proposing to set it up in the schema, a span would fall within a trace, and a trace would fall within an HTTP Activity or API Activity event. So I don't understand how a span can contain events.

Author

I think you need to think about spans and traces differently from many of the other OCSF attributes. You can think of a trace as a transaction of transactions and a span as a transaction of related events.

I think the issue with the trace/span paradigm is that we may be viewing traces as events, but traces are not events; they are essentially a recording of metadata associated with related events.

The trace and span objects are essentially a way of “tagging” the event with the span and trace information it is associated with.

Because of how the ontology of OCSF is designed, the event cannot represent a span or trace but rather carries forward the relevant metadata associated with it. Traces and spans relate events to one another, but they are not events in and of themselves and therefore cannot be represented as an OCSF class, for example.

By using the trace profile we can ensure the metadata of the relevant associated traces/spans is preserved; that is the goal here. Spans are within traces and events are within spans, but the trace and span objects represent that metadata for the event in which the objects exist.
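
One way to picture this tagging model: each OCSF event carries its own copy of the trace and span identifiers, and a consumer rebuilds the grouping afterwards. The sketch below assumes events are plain dictionaries shaped like the proposed `trace` object; `group_by_trace` and the `message` values are hypothetical.

```python
from collections import defaultdict


def group_by_trace(events: list) -> dict:
    """Return {trace_uid: {span_uid: [events...]}} for events tagged with a trace."""
    grouped = defaultdict(lambda: defaultdict(list))
    for event in events:
        trace = event.get("trace")
        if not trace:
            continue  # event does not use the profile
        span_uid = (trace.get("span") or {}).get("uid")
        grouped[trace["uid"]][span_uid].append(event)
    # Convert to plain dicts so callers do not create keys by accident.
    return {trace_uid: dict(spans) for trace_uid, spans in grouped.items()}


if __name__ == "__main__":
    events = [
        {"message": "change freeze", "trace": {"uid": "T1", "span": {"uid": "S1"}}},
        {"message": "database update", "trace": {"uid": "T1", "span": {"uid": "S1"}}},
        {"message": "change unfreeze", "trace": {"uid": "T1", "span": {"uid": "S1"}}},
        {"message": "unrelated event"},
    ]
    for trace_uid, spans in group_by_trace(events).items():
        for span_uid, members in spans.items():
            print(trace_uid, span_uid, [e["message"] for e in members])
```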

Contributor

I appreciate the effort that has gone into that explanation but I'm afraid it's simply too abstract for somebody who isn't familiar with the trace & span terminology. I really don't want to take up any more of your time (or mine) than is necessary. So unless you can exemplify the above with a concrete example that would help me to understand the concepts, I'm going to have to drop out of reviewing this and leave it to folks with a better grasp of this area.

Contributor

But in your Trace example in the PR desc, it appears each trace will have multiple spans? Is that not the case?

Frankly, I am in the same boat as @davemcatcisco here. What would truly help is an example OCSF event that uses the proposed updates. The end goal is to show how an OCSF API Activity event can be augmented with observability/trace information using these new structures, and then what it all means.

I am missing this contextual info, which I need to properly review the modeling aspects in the PR.

Contributor

Thanks, @floydtree. I'm a little relieved to know it's not just me. I have a tendency sometimes in situations like this to wonder, "am I just too stupid to understand this?"

So, @pladamgregory, there are now two reviewers suggesting that a worked example might be the best way to explain these concepts clearly. Is that something you could do?

Author

@pladamgregory pladamgregory Nov 12, 2024

The following is a simple implementation of how this would work using OCSF auth + database query over 3 spans within a trace. @davemcatcisco @floydtree Please see below:

[
  {
    "time": 1731414896,
    "activity_id": 6,
    "activity": "Preauth",
    "user": {
      "username": "john.doe"
    },
    "type_uid": 300206,
    "category_uid": 3,
    "trace": {
      "uid": "Trace_001",
      "span": {
        "uid": "Span_001",
        "service": "User Service",
        "operation": [
          "User Authentication"
        ],
        "start_time": "2024-11-12T12:00:00Z",
        "end_time": "2024-11-12T12:00:10Z",
        "duration": "10ms"
      },
      "service": "User Service",
      "status_code": "Success",
      "start_time": "2024-11-12T12:00:00Z",
      "end_time": "2024-11-12T12:01:00Z",
      "duration": "1min"
    }
  },
  {
    "time": 1731414896,
    "activity_id": 1,
    "type_uid": 600501,
    "category_uid": 6,
    "actor": {
      "name": "JohnDoe",
      "role": "User",
      "process": "example_process"
    },
    "database": {
      "name": "SQLdatabase",
      "uid": "bc6e9d20-a125-11ef-91f2-0242ac110007",
      "type_id": 1
    },
    "query_info": {
      "query_string": "GET user.name"
    },
    "src_endpoint": {
      "ip": "192.168.1.1",
      "port": 443
    },
    "trace": {
      "uid": "Trace_001",
      "span": {
        "uid": "Span_002",
        "service": "User Service",
        "operation": [
          "User Authentication",
          "DB Query"
        ],
        "start_time": "2024-11-12T12:00:10Z",
        "end_time": "2024-11-12T12:00:30Z",
        "duration": "20ms"
      },
      "service": "User Service",
      "status_code": "Success",
      "start_time": "2024-11-12T12:00:00Z",
      "end_time": "2024-11-12T12:01:00Z",
      "duration": "1min"
    }
  },
  {
    "time": 1731414896,
    "activity_id": 1,
    "activity": "Logon",
    "user": {
      "username": "john.doe"
    },
    "type_uid": 300201,
    "category_uid": 3,
    "trace": {
      "uid": "Trace_001",
      "span": {
        "uid": "Span_003",
        "service": "User Service",
        "operation": [
          "Auth Success"
        ],
        "start_time": "2024-11-12T12:00:10Z",
        "end_time": "2024-11-12T12:00:30Z",
        "duration": "20ms"
      },
      "service": "User Service",
      "status_code": "Success",
      "start_time": "2024-11-12T12:00:00Z",
      "end_time": "2024-11-12T12:01:00Z",
      "duration": "1min"
    }
  }
]
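
Since the span object defines `duration` as the time between `start_time` and `end_time` in milliseconds, a consumer could derive it rather than rely on a separately supplied value. The sketch below is one way to do that, assuming ISO 8601 UTC timestamps with a trailing `Z` as in the example; `span_duration_ms` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def _parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def span_duration_ms(span: dict) -> int:
    """Whole milliseconds between a span's start_time and end_time."""
    elapsed = _parse_utc(span["end_time"]) - _parse_utc(span["start_time"])
    return elapsed // timedelta(milliseconds=1)


if __name__ == "__main__":
    first_span = {
        "uid": "Span_001",
        "start_time": "2024-11-12T12:00:00Z",
        "end_time": "2024-11-12T12:00:10Z",
    }
    print(span_duration_ms(first_span))
```

Applied to the first span in the example (12:00:00Z to 12:00:10Z), this yields 10000, i.e. ten seconds, which differs from the "10ms" stated there.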

dictionary.json (outdated)
objects/span.json (outdated)
Contributor

Does this make more sense to be called "Observability" Profile instead?

Author

I think trace is ideal since the metric component of observability will likely have to be a class in and of itself, likely within the discovery category.

Signed-off-by: Adam Gregory <[email protected]>