Add counter telemetry (WIP) #2741

Open: wants to merge 1 commit into base branch factors

me-diru
Contributor

@me-diru me-diru commented Aug 21, 2024

Fixes #2564

I think this captures metrics for the LLM and the key-value store.

@me-diru
Contributor Author

me-diru commented Aug 21, 2024

Captured the LLM model and prompt information, plus the key used in key-value store get and set calls.

Not sure if the key is accessible when querying in Prometheus/Grafana.
Screenshot from 2024-08-21 11-50-55

cc: @calebschoepp

@@ -92,6 +92,9 @@ impl key_value::HostStore for KeyValueDispatch {
        store: Resource<key_value::Store>,
        key: String,
    ) -> Result<Result<Option<Vec<u8>>, Error>> {
        // Log key value host component get feature
Collaborator

I like the intention behind these comments, but I think they're needlessly verbose. Directly reading spin_telemetry::counter is pretty intuitive as to what it's doing.

@@ -92,6 +92,9 @@ impl key_value::HostStore for KeyValueDispatch {
        store: Resource<key_value::Store>,
        key: String,
    ) -> Result<Result<Option<Vec<u8>>, Error>> {
        // Log key value host component get feature
        spin_telemetry::counter!(spin.key_value_get = 1, key = key);
Collaborator

These should all probably be monotonic_counters because they're monotonically increasing.
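To illustrate the distinction the reviewer is drawing, here is a minimal stdlib-only sketch of a counter that can only grow. It does not reproduce the spin_telemetry macros; `MonotonicCounter`, `KEY_VALUE_GET`, and the `get` function below are hypothetical stand-ins chosen for this example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a monotonic counter instrument: it can only grow.
struct MonotonicCounter {
    total: AtomicU64,
}

impl MonotonicCounter {
    const fn new() -> Self {
        Self { total: AtomicU64::new(0) }
    }

    /// Add an unsigned delta; there is deliberately no way to subtract.
    fn add(&self, delta: u64) {
        self.total.fetch_add(delta, Ordering::Relaxed);
    }

    fn value(&self) -> u64 {
        self.total.load(Ordering::Relaxed)
    }
}

// Hypothetical counter for key-value `get` calls (assumed name).
static KEY_VALUE_GET: MonotonicCounter = MonotonicCounter::new();

/// Hypothetical host function: records one call, then would do the lookup.
fn get(key: &str) -> Option<Vec<u8>> {
    KEY_VALUE_GET.add(1);
    let _ = key; // the actual store lookup is elided in this sketch
    None
}

fn main() {
    get("a");
    get("b");
    get("a");
    println!("key_value_get total: {}", KEY_VALUE_GET.value());
}
```

A "number of calls" metric never needs to go down, which is why a monotonic instrument fits; a plain up/down counter would suit something like "open connections" instead.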

@@ -102,6 +105,9 @@ impl key_value::HostStore for KeyValueDispatch {
        key: String,
        value: Vec<u8>,
    ) -> Result<Result<(), Error>> {
        // Log key value host component set feature
        spin_telemetry::counter!(spin.key_value_set = 1, key = key);
Collaborator

It occurs to me that you're adding all this to the host components, which we're going to be deleting very soon. This should all probably be added to the factors, which means this would have to be a PR against the factors branch.

spin_telemetry::counter!(
    spin.llm_infer = 1,
    model_name = model,
    prompt_given = prompt
Collaborator

I don't think we want this as an attribute. The prompt could be very large and it is also likely to be high cardinality which is not a good fit for a metric attribute.
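The cardinality concern can be made concrete with a small sketch. It assumes the common model in which each distinct combination of metric name and attribute values becomes its own time series. `Registry`, the model name, and the sample prompts are all hypothetical and exist only for this illustration.

```rust
use std::collections::{BTreeMap, HashMap};

/// One time series is identified by the metric name plus its full attribute set.
type SeriesKey = (String, BTreeMap<String, String>);

/// Hypothetical in-memory metric store, used only to count distinct series.
#[derive(Default)]
struct Registry {
    series: HashMap<SeriesKey, u64>,
}

impl Registry {
    /// Increment the series selected by `name` and `attrs`, creating it if new.
    fn increment(&mut self, name: &str, attrs: &[(&str, &str)]) {
        let attrs: BTreeMap<String, String> = attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        *self.series.entry((name.to_string(), attrs)).or_insert(0) += 1;
    }

    fn series_count(&self) -> usize {
        self.series.len()
    }
}

fn main() {
    // Illustrative prompts; real prompts are effectively unbounded.
    let prompts = ["Summarize this", "Translate that", "Write a poem"];

    let mut by_model = Registry::default();
    let mut by_prompt = Registry::default();
    for prompt in prompts {
        by_model.increment("spin.llm_infer", &[("model_name", "example-model")]);
        by_prompt.increment(
            "spin.llm_infer",
            &[("model_name", "example-model"), ("prompt_given", prompt)],
        );
    }

    println!("model only: {} series", by_model.series_count());
    println!("with prompt: {} series", by_prompt.series_count());
}
```

With only the model name as an attribute, every call lands in the same series; adding the prompt creates one series per distinct prompt, so the series count grows with traffic rather than with configuration.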

me-diru force-pushed the OTel-metric branch 2 times, most recently from cd4491a to ea08fb6 (August 22, 2024 21:40)
@itowlson
Contributor

@me-diru is this intended to land in the factors branch, or should it be rebased off main? Currently, merging this would merge a whole bunch of unrelated factors stuff too.

@calebschoepp
Collaborator

@me-diru is this intended to land in the factors branch, or should it be rebased off main? Currently, merging this would merge a whole bunch of unrelated factors stuff too.

This should land on factors (or main once factors merges in there).

@me-diru me-diru changed the base branch from main to factors August 22, 2024 22:07
@me-diru
Contributor Author

me-diru commented Aug 22, 2024

I changed the base to factors, and it should only reflect my code changes. Thanks for checking in @itowlson !

@me-diru
Contributor Author

me-diru commented Aug 22, 2024

@calebschoepp
I am not sure how to test the telemetry metrics in factors. For llm-compute, I think I am capturing it in the right place. However, when I run the build using

4318/v1/traces OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318/v1/metrics ../../../spin/target/debug/spin up --runtime-config-file ../runtime-config.toml

it gives me this error:

Error: unused runtime config key(s): llm_compute

I tried to run factor_test.rs for factor-llm, but it gave me another error:

error[E0277]: the trait bound `url::Url: Deserialize<'_>` is not satisfied
  --> crates/factor-llm/src/spin.rs:91:17
   |
91 | #[derive(Debug, serde::Deserialize)]
   |                 ^^^^^^^^^^^^^^^^^^ the trait `Deserialize<'_>` is not implemented for `url::Url`
   |

Which I think satisfies

On the other hand, for factor-key-value, I think it is utilizing the spin-key-value crate to do the set and get functions? So I guess the monotonic counters for key-value should suffice.

Would be great to have your input

@calebschoepp
Collaborator

@me-diru I still see 3 commits that aren't yours that you'll want to take out of this diff.

@itowlson
Contributor

itowlson commented Aug 22, 2024

@me-diru

the trait Deserialize<'_> is not implemented for url::Url

You might need to enable the serde feature for the url crate. https://docs.rs/url/latest/url/#feature-serde

(This is a common pattern in Rust utility crates, where it's useful for people to be able to serialise the types, but they don't want to force a heavyweight serde dependency on people who just want to sling URLs or times or whatever.)
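As a rough sketch of what enabling that feature looks like in a crate's Cargo.toml (the version numbers below are illustrative assumptions, not taken from this PR; match whatever the workspace already pins):

```toml
[dependencies]
# Versions are illustrative; use the ones the workspace already depends on.
serde = { version = "1", features = ["derive"] }
# The `serde` feature makes `url::Url` implement Serialize and Deserialize.
url = { version = "2", features = ["serde"] }
```

Declaring the feature in the crate that derives `Deserialize` on a struct containing a `Url` means the crate no longer relies on some other crate in the build happening to turn it on.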

@me-diru
Contributor Author

me-diru commented Aug 22, 2024

You might need to enable the serde feature for the url crate. https://docs.rs/url/latest/url/#feature-serde

That did the trick, tests passed :D

I am just curious how the tests passed before, though 😅 In the current case of factor-llm, we don't have to deserialize the Url?

@itowlson
Contributor

@me-diru It will work if any crate in the build turns the feature on. This can cause surprises when you use your crate in a slightly different build context: the other crate that happened to make things work is no longer there, and boom, your code stops compiling.

This is a significant pain point for features but I gather there is not much that can be done.

@calebschoepp
Collaborator

Are you still having errors running it @me-diru? I would need to see the runtime config you're using to help more.

@me-diru
Contributor Author

me-diru commented Aug 23, 2024

Are you still having errors running it @me-diru? I would need to see the runtime config you're using to help more.

Yes, it's still happening. When I run the same command with the latest Spin CLI release (2.7), it works fine.

Anonymized runtime-config file:

[llm_compute]
type = "remote_http"
url = "<URL>"
auth_token = "<AUTH-TOKEN>"

I checked with a spin binary built from the factors branch, and the same error occurs. I don't think the metrics code is causing this one.

Maybe @lann could give more insight

@lann
Collaborator

lann commented Aug 26, 2024

☝️ This hopefully fixed it.

Signed-off-by: Rohit Dandamudi <[email protected]>
@me-diru
Contributor Author

me-diru commented Sep 5, 2024

@calebschoepp I think it's now capturing the metrics in the new factors code!
image

@calebschoepp
Collaborator

Sweet, @me-diru is this ready for a final review?

Successfully merging this pull request may close these issues.

Record OTel metrics in host components
4 participants