Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add synchronous logger for telemetry #2432

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Add synchronous logger for telemetry #2432

wants to merge 20 commits into from

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Mar 5, 2025

Changes

This PR adds a synchronous telemetry logger for the CLI with a max timeout of 3 seconds. Due to the 3-second timeout configuration, this is only meant to be used in long-running commands.

This is a short-term solution. Eventually, we'd like to transition to a daemon process to upload the telemetry logs to amortise the costs of configuring authentication and maintaining a warm pool of HTTP connections, as well as a better UX for the end user.

Note that users can set the DATABRICKS_CLI_DISABLE_TELEMETRY environment variable to disable telemetry collection.

Why

To collect telemetry, which was previously inaccessible to us, and answer questions like which templates customers like to use and which DABs features would be safe to deprecate.

Tests

Unit and acceptance tests.

Also manually verified that the telemetry upload works:

(artifact-playground) ➜  cli git:(sync-logger) cli selftest send-telemetry --debug
15:58:20 Info: start pid=40386 version=0.0.0-dev+a2825ca89a23 args="cli, selftest, send-telemetry, --debug"
15:58:20 Debug: Loading DEFAULT profile from /Users/shreyas.goenka/.databrickscfg pid=40386 sdk=true
15:58:20 Info: completed execution pid=40386 exit_code=0
15:58:21 Debug: POST /telemetry-ext
> {
>   "items": null,
>   "protoLogs": [
>     "{\"frontend_log_event_id\":\"82d29b3a-d5ff-48f3-8a21-dae6e08d2999\",\"entry\":{\"databricks_cli_log\":{\"... (232 more bytes)",
>     "{\"frontend_log_event_id\":\"d6be8220-7db8-45d9-97d6-4c09c25e2664\",\"entry\":{\"databricks_cli_log\":{\"... (232 more bytes)"
>   ],
>   "uploadTime": 1741186700967
> }
< HTTP/2.0 200 OK
< {
<   "errors": null,
<   "numProtoSuccess": 2,
<   "numSuccess": 0
< } pid=40386 sdk=true

@@ -7,8 +7,10 @@ import (
"github.com/google/uuid"
)

var cmdExecId = uuid.New().String()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a global now. It should be initialized once per command and can be set on the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #2439. Will update this PR once the other one is merged.

protoLogs[i] = string(b)
}

apiClient, err := client.New(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new client means we cannot reuse existing connections. Can this function take a workspace client directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Anytime we need to make a bespoke API call, we rely on creating a new client.

I can look into whether we can reuse the existing connections as a followup optimization.

"numProtoSuccess": 2
}
'''
DelaySeconds = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add env var DATABRICKS_CLI_TELEMETRY_TIMEOUT and set it to 0.01 in tests so that we don't have to wait 3 seconds here?

Copy link
Contributor Author

@shreyas-goenka shreyas-goenka Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests run in parallel anyway so it does not add latency.

Not configuring the timeout makes the test more representative of the real behaviour, but I'm fine with making it configurable with an environment variable if you still recommend doing so.

cmd/root/root.go Outdated
ExitCode: int64(exitCode),
})

err := telemetry.Upload(ctx, ConfigUsed(ctx))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not hide SetExecutionContext/HasLogs/env var checking inside telemetry packages? So the code would look like this:

telemetry.Upload(ctx, ... ExecutionContext{...}) // updates logs with execution context, checks env var if upload is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved SetExecutionContext inside and the env var code inside. We cannot move HasLogs because not every command would configure a config in the context, and for those cases ConfigUsed(ctx)) would panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work on a followup PR that consolidates ConfigUsed and other clients in the libs/command library. Ref: #2439

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #2440 is merged, we can get rid of HasLogs as well.

github-merge-queue bot pushed a commit that referenced this pull request Mar 6, 2025
## Changes
This PR also starts storing the DBR versions in the context.

Go patch file used:
```
@@
var x expression
@@
-dbr.MockRuntime(x, true)
+dbr.MockRuntime(x, dbr.Environment{IsDbr: true, Version: "15.4"})

@@
var x expression
@@
-dbr.MockRuntime(x, false)
+dbr.MockRuntime(x, dbr.Environment{})
```

ref: https://github.com/uber-go/gopatch

## Why
This localised all DBR version accesses to `libs/dbr`. Relevant comment:
#2432 (comment)

## Tests
Exiting tests are modified.
@shreyas-goenka shreyas-goenka requested review from pietern and denik March 6, 2025 13:44
github-merge-queue bot pushed a commit that referenced this pull request Mar 6, 2025
## Changes
This PR moves `ConfigUsed` from the root package to `libs/command`.

## Why
Having the ConfigUsed function in the root package is a problem because
that means we cannot use that function from outside the `root` package
since doing so often leads to an import cycle (because `root` imports
everything implicitly).

Moving it to a separate package that consolidates the read interaction
and solves the import cycle issue. Example where this would have
simplified code:
#2432 (comment)

I'd like to send PRs to do the same for the workspace client and account
client as well. I'll wait however until this one is merged incase people
have concerns about the approach here.

## Tests
Existing tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants