[DOCS-12615] Data Observability reorg #33006
base: master
Conversation
Created DOCS-12912 for docs review.
janine-c left a comment:
This was an ASTOUNDING amount of work, Rosa, well done!! Thanks for your patience with the review. All of my comments are really tiny and could easily be punted to a fast-follow, since I know your PMs want to get this released soon 🙂
Resolved (outdated) review threads:
- content/en/data_observability/integrations/warehouse/databricks.md
- content/en/data_observability/integrations/warehouse/snowflake.md
- content/en/database_monitoring/setup_documentdb/amazon_documentdb.md
Co-authored-by: Janine Chan <[email protected]>
…g/documentation into rtrieu/docs-12615-update-data-obs
Resolved (outdated) review thread: content/en/data_observability/quality_monitoring/openlineage.md

    @@ -1,44 +1,36 @@
    ---
    title: Data Observability

@kevinzenghu this page needs to be updated to explain the suite-level overview plus what you get in Quality and Jobs. Right now it mostly indexes on Quality.
check out this commit: 36aea7d#diff-962ee3dc7e8fcec564f2759e0b618688f539ccacafd059f1a0d1cb1a9eff1c35
@warrierr and I caught up on organization, specifically on whether to have a separate Integrations section. Originally I imagined a separate Integrations section would reduce redundancy across SKUs (especially as we introduce more) and encourage the cross-product suite message. But I'm convinced now that it's secondary to having a simple and clear onboarding experience for specific products. We will probably have to change things in the future, but for now (and we're sorry for the late structure change @Rosa Trieu 😬) how about we go with this change? I'll also leave this as a comment in the PR and change the doc.

    </div>

    <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/dbtcore">

I see we don't have two different logos for dbt Core and dbt Cloud, which is confusing. Can we combine this into one dbt page instead of two, with different setup tabs per platform (like we have for Airflow)? So there would be a "dbt Core" setup tab and a "dbt Cloud" one. The overview and subsequent sections are similar between the two anyway.
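
A minimal sketch of what the combined page's setup section could look like, assuming the tabs shortcode already used elsewhere in these docs; the section placement and tab contents are placeholders:

```md
<!-- Sketch only: a single dbt setup page with one tab per platform -->
## Setup

{{< tabs >}}
{{% tab "dbt Core" %}}
<!-- dbt Core installation steps (moved from the current dbt Core page) -->
{{% /tab %}}

{{% tab "dbt Cloud" %}}
<!-- dbt Cloud installation steps (moved from the current dbt Cloud page) -->
{{% /tab %}}
{{< /tabs >}}
```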

      identifier: jobs_monitoring_airflow
      parent: data_jobs
      weight: 200000
    - name: dbt Core

Let's just make this a single dbt page with setup for both Core and Cloud (see the other comment for more details).

    - name: Spark on Amazon EMR
      url: data_observability/jobs_monitoring/emr
      identifier: jobs_monitoring_emr
      parent: transformation_integrations

Suggested change:

    - parent: transformation_integrations
    + parent: data_jobs


      identifier: jobs_monitoring_dataproc
      parent: data_jobs
      weight: 700000
    - name: Custom Jobs

Suggested change:

    - - name: Custom Jobs
    + - name: Custom Jobs (OpenLineage)


    ## Further reading

    {{< partial name="whats-next/whats-next.html" >}}

Suggested dbt further reading: https://www.datadoghq.com/blog/understanding-dbt/
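
A minimal sketch of what that entry could look like in the dbt page's front matter, assuming the `link`/`tag`/`text` fields used by `further_reading` elsewhere in these docs; the `text` value is a placeholder and should match the blog post's actual title:

```yaml
further_reading:
  - link: "https://www.datadoghq.com/blog/understanding-dbt/"
    tag: "Blog"
    text: "Understanding dbt"   # placeholder title
```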

    aliases:
      - /data_jobs/databricks
    further_reading:
      - link: '/data_jobs'

Let's remove the preview callout for Databricks serverless on this page
We can also make the first sentence on this page: "Data Jobs Monitoring gives visibility into the performance and reliability of your Databricks jobs and workflows running on clusters or serverless compute."

    ---

    - {{< callout url="#" btn_hidden="true" header="Data Jobs Monitoring for Apache Airflow is in Preview" >}}
    + {{< callout url="#" btn_hidden="true" header="Data Jobs Monitoring for Apache Airflow is in preview" >}}

Let's remove the preview callout from this page as part of the GA

    </div>
    <div class="row row-cols-1 row-cols-md-4 g-2 g-xl-3 justify-content-sm-center">

    <div class="col">

@rtrieu can we reorganize this setup section now that we have more technologies that aren't Spark? The logos don't make that clear. Specifically, I'm thinking:

Data Jobs Monitoring supports multiple job technologies. To get started, select your technology and follow the installation instructions:
- Logo list of Databricks, Airflow, dbt

Apache Spark jobs on the following platforms:
- Logo list of Kubernetes, EMR, Google Dataproc

(Rough markup sketch below.)
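
A minimal sketch of that layout, reusing the row/column card markup already on the page; the hrefs and card contents are placeholders and would need to match the real page paths and logo images:

```html
<!-- Sketch only: group non-Spark technologies and Spark platforms into two card rows -->
<p>Data Jobs Monitoring supports multiple job technologies. To get started, select your technology and follow the installation instructions:</p>
<div class="row row-cols-1 row-cols-md-4 g-2 g-xl-3 justify-content-sm-center">
  <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/databricks">Databricks</a>
  </div>
  <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/airflow">Apache Airflow</a>
  </div>
  <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/dbt">dbt</a>
  </div>
</div>

<p>Apache Spark jobs on the following platforms:</p>
<div class="row row-cols-1 row-cols-md-4 g-2 g-xl-3 justify-content-sm-center">
  <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/kubernetes">Spark on Kubernetes</a>
  </div>
  <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/emr">Spark on Amazon EMR</a>
  </div>
  <div class="col">
    <a class="card h-100" href="/data_observability/jobs_monitoring/dataproc">Spark on Google Dataproc</a>
  </div>
</div>
```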

    - [OpenLineage Python client (HTTP transport)](#option-2-openlineage-python-client-http-transport)
    - [OpenLineage Python client (Datadog transport)](#option-3-openlineage-python-client-datadog-transport)

    ## Option 1: Direct HTTP with curl



What does this PR do? What is the motivation?
Reorg of existing Data Jobs docs.
See also: #33006 (comment)
Merge instructions
Merge readiness:
For Datadog employees:
Your branch name MUST follow the `<name>/<description>` convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links. If your branch doesn't follow this format, rename it or create a new branch and PR.
[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.
Additional notes