Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add global histogram metrics for received message sizes per-protocol #12087

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gomoripeti
Copy link
Contributor

Proposed Changes

Keep track of message size distribution received via AMQP 0.9.1, AMQP 1.0 and MQTT protocols. This is useful information to set an appropriate max_message_size config or otherwise tune RabbitMQ or the clients.

This metric is bumped right after message size check happens.
A separate module is added to hold the set of seshat counters of the histogram buckets, in order to groupd them better together as opposed to just adding the additional counters in the per-protocol global counters list.

Partially implements #11637

TODO: document new prometheus metrics

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally with my changes
  • If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution
you did and what alternatives you considered, etc.

@lukebakken lukebakken self-assigned this Aug 22, 2024
@gomoripeti
Copy link
Contributor Author

  • I used make and forgot to update any bazel file with the new module, trying to look what needs to be updated
  • I did not add the metric to the stream protocol. in some edge cases its not obvious how to get message size eg in compressed sub-batches (one option is to count it as MessageCount number of UncompressedSize / MessageCount messages, or just skip such a sub-batch)
  • because of the native amqp 1.0 rewrite this PR probably cannot be automatically backported to 3.13.x but Im willing to do it manually after it is accepted

@mergify mergify bot added the bazel label Aug 22, 2024
Copy link
Member

@ansd ansd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expose a histogram, I'm in favour of using a Prometheus histogram data type instead of counter data type.

This PR uses counter
~/workspace/rabbitmq-server$ curl -s localhost:15692/metrics | grep rabbitmq_global_message_size
# TYPE rabbitmq_global_message_size_16KiB counter
# HELP rabbitmq_global_message_size_16KiB Total number of messages received from publishers with a size =< 16KiB
rabbitmq_global_message_size_16KiB{protocol="amqp091"} 0
rabbitmq_global_message_size_16KiB{protocol="amqp10"} 0
rabbitmq_global_message_size_16KiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_16KiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_16KiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_16KiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_1MiB counter
# HELP rabbitmq_global_message_size_1MiB Total number of messages received from publishers with a size =< 1MiB
rabbitmq_global_message_size_1MiB{protocol="amqp091"} 0
rabbitmq_global_message_size_1MiB{protocol="amqp10"} 0
rabbitmq_global_message_size_1MiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_1MiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_1MiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_1MiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_512MiB counter
# HELP rabbitmq_global_message_size_512MiB Total number of messages received from publishers with a size =< 512MiB
rabbitmq_global_message_size_512MiB{protocol="amqp091"} 0
rabbitmq_global_message_size_512MiB{protocol="amqp10"} 0
rabbitmq_global_message_size_512MiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_512MiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_512MiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_512MiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_64B counter
# HELP rabbitmq_global_message_size_64B Total number of messages received from publishers with a size =< 64B
rabbitmq_global_message_size_64B{protocol="amqp091"} 0
rabbitmq_global_message_size_64B{protocol="amqp10"} 0
rabbitmq_global_message_size_64B{protocol="mqtt310"} 0
rabbitmq_global_message_size_64B{protocol="mqtt311"} 0
rabbitmq_global_message_size_64B{protocol="mqtt50"} 0
rabbitmq_global_message_size_64B{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_16MiB counter
# HELP rabbitmq_global_message_size_16MiB Total number of messages received from publishers with a size =< 16MiB
rabbitmq_global_message_size_16MiB{protocol="amqp091"} 0
rabbitmq_global_message_size_16MiB{protocol="amqp10"} 0
rabbitmq_global_message_size_16MiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_16MiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_16MiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_16MiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_256MiB counter
# HELP rabbitmq_global_message_size_256MiB Total number of messages received from publishers with a size =< 256MiB
rabbitmq_global_message_size_256MiB{protocol="amqp091"} 0
rabbitmq_global_message_size_256MiB{protocol="amqp10"} 0
rabbitmq_global_message_size_256MiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_256MiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_256MiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_256MiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_4MiB counter
# HELP rabbitmq_global_message_size_4MiB Total number of messages received from publishers with a size =< 4MiB
rabbitmq_global_message_size_4MiB{protocol="amqp091"} 0
rabbitmq_global_message_size_4MiB{protocol="amqp10"} 0
rabbitmq_global_message_size_4MiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_4MiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_4MiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_4MiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_1KiB counter
# HELP rabbitmq_global_message_size_1KiB Total number of messages received from publishers with a size =< 1KiB
rabbitmq_global_message_size_1KiB{protocol="amqp091"} 0
rabbitmq_global_message_size_1KiB{protocol="amqp10"} 0
rabbitmq_global_message_size_1KiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_1KiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_1KiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_1KiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_4KiB counter
# HELP rabbitmq_global_message_size_4KiB Total number of messages received from publishers with a size =< 4KiB
rabbitmq_global_message_size_4KiB{protocol="amqp091"} 0
rabbitmq_global_message_size_4KiB{protocol="amqp10"} 0
rabbitmq_global_message_size_4KiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_4KiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_4KiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_4KiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_64MiB counter
# HELP rabbitmq_global_message_size_64MiB Total number of messages received from publishers with a size =< 64MiB
rabbitmq_global_message_size_64MiB{protocol="amqp091"} 0
rabbitmq_global_message_size_64MiB{protocol="amqp10"} 0
rabbitmq_global_message_size_64MiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_64MiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_64MiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_64MiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_256KiB counter
# HELP rabbitmq_global_message_size_256KiB Total number of messages received from publishers with a size =< 256KiB
rabbitmq_global_message_size_256KiB{protocol="amqp091"} 0
rabbitmq_global_message_size_256KiB{protocol="amqp10"} 0
rabbitmq_global_message_size_256KiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_256KiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_256KiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_256KiB{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_256B counter
# HELP rabbitmq_global_message_size_256B Total number of messages received from publishers with a size =< 256B
rabbitmq_global_message_size_256B{protocol="amqp091"} 0
rabbitmq_global_message_size_256B{protocol="amqp10"} 0
rabbitmq_global_message_size_256B{protocol="mqtt310"} 0
rabbitmq_global_message_size_256B{protocol="mqtt311"} 0
rabbitmq_global_message_size_256B{protocol="mqtt50"} 0
rabbitmq_global_message_size_256B{protocol="stream"} 0
# TYPE rabbitmq_global_message_size_64KiB counter
# HELP rabbitmq_global_message_size_64KiB Total number of messages received from publishers with a size =< 64KiB
rabbitmq_global_message_size_64KiB{protocol="amqp091"} 0
rabbitmq_global_message_size_64KiB{protocol="amqp10"} 0
rabbitmq_global_message_size_64KiB{protocol="mqtt310"} 0
rabbitmq_global_message_size_64KiB{protocol="mqtt311"} 0
rabbitmq_global_message_size_64KiB{protocol="mqtt50"} 0
rabbitmq_global_message_size_64KiB{protocol="stream"} 0
I'd prefer a histogram something along these lines
# TYPE rabbitmq_global_message_size_bytes histogram
HELP rabbitmq_global_message_size_bytes Message Size

rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="256"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="1024"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="4096"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="16384"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="65536"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="262144"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="4194304"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="16777216"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="25165824"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="268435456"} 0
rabbitmq_global_message_size_bytes_count{protocol="amqp091"} 0
rabbitmq_global_message_size_bytes_sum{protocol="amqp091"} 0

rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="256"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="1024"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="4096"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="16384"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="65536"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="262144"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="4194304"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="16777216"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="25165824"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp10",le="268435456"} 0
rabbitmq_global_message_size_bytes_count{protocol="amqp10"} 0
rabbitmq_global_message_size_bytes_sum{protocol="amqp10"} 0

This PR also doesn't name metrics correctly because it uses names such as rabbitmq_global_message_size_64MiB.
According to https://prometheus.io/docs/practices/naming/#metric-names:

A metric name should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers).

@ansd
Copy link
Member

ansd commented Aug 23, 2024

As a bonus, it might be worth even trying out experimental native Prometheus histograms because we want exactly these properties:

  • Native histogram bucket boundaries are calculated by a formula that depends on the scale (resolution) of the native histogram, and are not user defined. The calculation produces exponentially increasing bucket boundaries.
  • Native histogram bucket boundaries might change (widen) dynamically if the observations result in too many buckets.
  • An instance of a native histogram metric only requires a single time series, because the buckets, sum of observations, and the count of observations are stored in a single data type called native histogram rather than in separate time series using the float data type. Thus, there are no _bucket, _sum, and _count series. There is only time series.

Anyway, for now we can start with the current histogram Prometheus data type.
We could still use Erlang counters internally. However we should expose a histogram via the Prometheus histogram data type instead of counter data type.

@gomoripeti
Copy link
Contributor Author

Thank you @ansd for the valuable feedback, I din't know about the histogram prometheus type

@gomoripeti
Copy link
Contributor Author

reading up on native histograms I found:

Native histograms don’t have a textual presentation at the moment on the application’s /metrics endpoint, thus Prometheus negotiates a Protobuf protocol transfer in this case.

I guess the fact that native histograms can only be exposed via protobuf is a blocker for now. But interesting topic for the future.

@gomoripeti
Copy link
Contributor Author

Updated to expose prometheus classic histograms.
(seshat is replaced by directly calling counters, but maybe it could be generalized to add histogram support to seshat. Currently many things are hardcoded and static eg the bucket limits)

(I realize the bellow example could go into a unit test, will add on Monday)

1> rabbit_global_counters:message_size(amqp091, 100).
ok
2> rabbit_global_counters:message_size(amqp091, 60_000).
ok
3> rabbit_global_counters:overview().
#{[{protocol,amqp091}] =>
      #{consumers => 0,messages_received_total => 0,
        messages_received_confirm_total => 0,
        messages_routed_total => 0,
        messages_unroutable_dropped_total => 0,
        messages_unroutable_returned_total => 0,
        messages_confirmed_total => 0,publishers => 0,
        message_size_bytes =>
            #{64 => 0,256 => 1,1024 => 0,4096 => 0,16384 => 0,65536 => 1,
              262144 => 0,1048576 => 0,4194304 => 0,16777216 => 0,
              67108864 => 0,268435456 => 0,infinity => 0,sum => 60100}},
...
4> rabbit_global_counters:prometheus_format().
...
  message_size_bytes =>
      #{type => histogram,
        values =>
             [...
              {[{protocol,amqp091}],
              [{64,0},
               {256,1},
               {1024,1},
               {4096,1},
               {16384,1},
               {65536,2},
               {262144,2},
               {1048576,2},
               {4194304,2},
               {16777216,2},
               {67108864,2},
               {268435456,2},
               {infinity,2}],
              2,60100},
...
                              ],
        help => "Size of messages received from publishers"},
...
curl -s localhost:15692/metrics|grep message_size
# TYPE rabbitmq_global_message_size_bytes histogram
# HELP rabbitmq_global_message_size_bytes Size of messages received from publishers
...
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="256"} 1
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="1024"} 1
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="4096"} 1
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="16384"} 1
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="65536"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="262144"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="1048576"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="4194304"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="16777216"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="67108864"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="268435456"} 2
rabbitmq_global_message_size_bytes_bucket{protocol="amqp091",le="+Inf"} 2
rabbitmq_global_message_size_bytes_count{protocol="amqp091"} 2
rabbitmq_global_message_size_bytes_sum{protocol="amqp091"} 60100

@gomoripeti
Copy link
Contributor Author

rebased to latest main and added a unit test suite. let me know if there is anything else I can do with the PR.

@ansd
Copy link
Member

ansd commented Sep 19, 2024

Thank you @gomoripeti.
This PR needs to run bazel run gazelle. Since I don't have permissions to push to this PR branch, I squashed your commits into one commit and created #12342. Let's see whether tests pass on that new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants