This repository was archived by the owner on Apr 2, 2024. It is now read-only.

Introduce materialization for operations #247

Open · wants to merge 1 commit into base: develop

Conversation

@cevian (Contributor) commented May 8, 2022

This commit introduces a materialization to record
the parent-child relation between operations. Namely, for
a span of operation P that is the parent of a span with
operation C, we record the 10-minute bucketed relation:

  • bucket time
  • id of operation P
  • id of operation C
  • count()

This gives us a way to quickly aggregate the service map.
The materialization is done like a continuous aggregate, but
we couldn't use continuous aggregates since they don't support
self-joins.
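
Roughly, the per-bucket aggregation looks like the sketch below. This is reconstructed from the fragments quoted in the review; parent.operation_id and the GROUP BY are inferred, and the real job restricts each pass to a single bucket behind the watermark:

INSERT INTO _ps_trace.operation_materialization
SELECT
    public.time_bucket('10 minutes', parent.start_time) as bucket,
    parent.operation_id as parent_operation_id,
    child.operation_id as child_operation_id,
    count(*) as cnt
FROM
    _ps_trace.span as parent
INNER JOIN
    _ps_trace.span as child ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)
GROUP BY 1, 2, 3;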

This materialization supports real-time aggregation, where the
materialization is combined with a freshly-calculated aggregation
of the not-yet materialized head of the data. This is available
in the _ps_trace.operation_stats view.
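
A minimal sketch of what such a real-time view could look like, assuming the _ps_trace.materialize_watermark() function and the column names quoted in the review below; the actual view definition in the commit may differ:

CREATE VIEW _ps_trace.operation_stats AS
SELECT parent_start_time, parent_operation_id, child_operation_id, cnt
FROM _ps_trace.operation_materialization
WHERE parent_start_time < COALESCE(_ps_trace.materialize_watermark(), '-infinity'::pg_catalog.timestamptz)
UNION ALL
SELECT
    public.time_bucket('10 minutes', parent.start_time) as parent_start_time,
    parent.operation_id as parent_operation_id,
    child.operation_id as child_operation_id,
    count(*) as cnt
FROM _ps_trace.span as parent
INNER JOIN _ps_trace.span as child
    ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)
WHERE parent.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::pg_catalog.timestamptz)
GROUP BY 1, 2, 3;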

A limitation of this approach is that we don't support invalidations,
so if a new span comes in for a time period that has already been
materialized, it is never corrected. Currently we materialize
data that is no newer than 10 minutes prior to the most
recent data.
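
For illustration, the ceiling up to which data gets materialized could be computed along these lines inside the materialization procedure (a sketch only; apart from _global_end, the names are not taken from the commit):

SELECT public.time_bucket('10 minutes', max(start_time) - interval '10 minutes')
  INTO _global_end
  FROM _ps_trace.span;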

A few assumptions are made in the materialization:

  • That a child span's start_time is >= the parent span's
    start_time. This seems obvious, but I couldn't find
    it documented in the spec.
  • A child span's start_time is < the parent span's start_time + 1 hour.
    This is more questionable, but I think it's ok if we
    document the limitation (see the sketch after this list).
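
For example, the two assumptions let the self-join be bounded in time, so a given bucket of parents only needs to scan a limited range of child spans. An illustrative predicate, not necessarily the exact one used in the commit:

SELECT count(*)
FROM _ps_trace.span as parent
INNER JOIN _ps_trace.span as child
    ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)
WHERE child.start_time >= parent.start_time                       -- assumption 1
  AND child.start_time <  parent.start_time + interval '1 hour';  -- assumption 2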

The background job to materialize this data is set up to be separate
from the rest of the jobs for isolation and also because we always
want to run this every 10 minutes and don't want parallelism.
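
For example, assuming the scheduling goes through TimescaleDB's public.add_job() (an assumption on my part; the procedure name below is illustrative):

SELECT public.add_job('_ps_trace.run_operation_materialization', interval '10 minutes');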

Fixed parameters we may want to make configurable later:

  • Size of bucket (currently 10 minutes)
  • Size aggregated in one loop/txn (1 bucket)
  • Invalidation window (10 minutes)
  • Chunk size of the materialized hypertable (7 days)

@pgnickb (Contributor) commented May 9, 2022

A child span's start_time is < parent span's start_time + 1 hour.
This is more questionable. But I think it's ok if we
document the limitation.

Fixed parameters we may want to make configurable later:

Perhaps adding the max parent age here is worth it as well. We never know what the end system is going to be like. I reckon that on a system where a delta of >1h is the case, it won't be as significant a performance penalty either.


--correctness check
SELECT * FROM real_view
EXCEPT

Neat :-)

_query_start := clock_timestamp ();
INSERT INTO _ps_trace.operation_materialization
SELECT
public.time_bucket(_bucket, parent.start_time) bucket,

Suggested change
public.time_bucket(_bucket, parent.start_time) bucket,
public.time_bucket(_bucket, parent.start_time) as bucket,

child.operation_id as child_operation_id,
count(*) as cnt
FROM
_ps_trace.span parent

Suggested change
_ps_trace.span parent
_ps_trace.span as parent

FROM
_ps_trace.span parent
INNER JOIN
_ps_trace.span child ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)

Suggested change
_ps_trace.span child ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)
_ps_trace.span as child ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)


--we materialize one bucket at a time until we are done. We do this so that
--we can make steady progress even if we are pretty far behind.
WHILE (_start + _bucket) <= _global_end LOOP

I'm not certain this should be <= and not just <. Can you please elaborate on this?

INNER JOIN
_ps_trace.span child ON (parent.span_id = child.parent_span_id AND parent.trace_id = child.trace_id)
WHERE
parent.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::timestamp with time zone)

Suggested change
parent.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::timestamp with time zone)
parent.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::pg_catalog.timestamptz)

WHERE
parent.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::timestamp with time zone)
--assumes the child span starts on or after the parent starts
AND child.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::timestamp with time zone)

Suggested change
AND child.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::timestamp with time zone)
AND child.start_time >= COALESCE(_ps_trace.materialize_watermark(), '-infinity'::pg_catalog.timestamptz)

WHERE parent_start_time < COALESCE(_ps_trace.materialize_watermark(), '-infinity'::timestamp with time zone)
UNION ALL
SELECT
public.time_bucket('10 minute', parent.start_time) parent_start_time,

Perhaps it's worth putting the interval into an inlinable config function.
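
One way this could look (the function name is illustrative; a simple IMMUTABLE SQL function like this gets inlined/constant-folded by the planner):

CREATE OR REPLACE FUNCTION _ps_trace.operation_materialization_bucket_width()
    RETURNS interval
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
AS $$ SELECT interval '10 minutes' $$;

-- usage:
--   public.time_bucket(_ps_trace.operation_materialization_bucket_width(), parent.start_time)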

child_operation_id bigint,
cnt bigint,
--todo what else,
UNIQUE(parent_start_time, parent_operation_id, child_operation_id)

Perhaps worth upgrading to a primary key to enforce NOT NULLs.
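
For illustration, with the column types inferred from the fragment above, the suggested change would look roughly like this (a primary key makes all three key columns implicitly NOT NULL):

CREATE TABLE _ps_trace.operation_materialization (
    parent_start_time timestamptz,
    parent_operation_id bigint,
    child_operation_id bigint,
    cnt bigint,
    PRIMARY KEY (parent_start_time, parent_operation_id, child_operation_id)
);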

--keep this separate from the other jobs for independence and also since the right
--schedule here is always "every 10 min". We never want this to run in parallel so
--it's always just one job making progress. This job is usually pretty quiet so enable
--logging by default (should chirp once every 10 min).

Not sure I agree with this. It is quiet indeed, but what benefit does it provide? Perhaps it's better to expose the duration (or the end_time) in the job table (_ps_trace.operation_materialization) so that the user can assess the performance change of this over time and consider changing the bucket size.
If this is a file log entry then I'd see this as yet another line I wouldn't care about, since a single entry doesn't give you any information at all.

@JamesGuthrie removed their request for review August 2, 2022 07:00
@jgpruitt removed their request for review January 31, 2023 21:08