
DOCS-3198: offline data pipelines, SDK docs, hot data store fixes #4440

Open
wants to merge 4 commits into
base: main

Conversation

nathan-contino
Member

  • adds information about offline data pipelines (docs/data-ai/data/data-pipelines.md)
    • previously had no page in the docs besides some minimal CLI doc discussing pipelines; this introduces that missing page
    • includes examples in all supported languages (Python, Go, TypeScript) for basic data pipeline tasks
      • note that Go snippets link directly to Go API reference -- see SDK docs notes for more info
  • updates generated SDK documentation to include Python and TypeScript data pipelines APIs (no Flutter yet)
    • no Go because our data page doesn't seem to have or support any Go snippets (I'm guessing there's an out-of-scope story here)
  • yanked hot data store out into its own page (docs/data-ai/data/hot-data-store.md), since:
    • it has a lot in common with data pipelines, which could easily lead to user confusion
    • it was very buried (increasing the likelihood of user confusion)
    • the recent API improvements for data pipelines broke our existing hot data store examples
  • slight reorder of 'Advanced data capture and sync configurations' since some short-but-useful sections were buried all the way at the end of a very long page of complex, niche examples
  • note that the alias for hot data store doesn't work -- leaving it for now in the hopes that someone can suggest a better alternative for relocating a single section of a still-existing page to another page


netlify bot commented Jul 2, 2025

Deploy Preview for viam-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | b475d83 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/viam-docs/deploys/68659cd6aee1e900084fe33f |
| 😎 Deploy Preview | https://deploy-preview-4440--viam-docs.netlify.app |
Lighthouse
1 path audited
Performance: 53 (🔴 down 3 from production)
Accessibility: 100 (no change from production)
Best Practices: 100 (no change from production)
SEO: 92 (no change from production)
PWA: 70 (no change from production)

@viambot viambot added the safe to build This pull request is marked safe to build from a trusted zone label Jul 2, 2025
Collaborator

@JessamyT JessamyT left a comment


Started reviewing, but this is a philosophical one about the extent to which we want to redundantly document the APIs and CLI commands. It feels like a new precedent, since each item (for example, "Delete a pipeline") has a 1:1 matching section in the API docs as well as an example on the CLI page, so I'll leave it to Naomi to review.

@@ -7,7 +7,7 @@
| [`TabularDataBySQL`](/dev/reference/apis/data-client/#tabulardatabysql) | Obtain unified tabular data and metadata, queried with SQL. Make sure your API key has permissions at the organization level in order to use this. |
| [`TabularDataByMQL`](/dev/reference/apis/data-client/#tabulardatabymql) | Obtain unified tabular data and metadata, queried with MQL. |
| [`BinaryDataByFilter`](/dev/reference/apis/data-client/#binarydatabyfilter) | Retrieve optionally filtered binary data from Viam. |
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from Viam by `BinaryID`. |
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from the Viam by `BinaryID`. |
Collaborator


Suggested change
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from the Viam by `BinaryID`. |
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from Viam by `BinaryID`. |

Can you also update static/include/app/apis/overrides/protos/data.BinaryDataByIDs.md to fix this?

Viam stores the output of these pipelines in a cache so that you can access complex aggregation results more efficiently.
When late-arriving data syncs to Viam, pipelines automatically re-run to keep summaries accurate.

For example, you could use a data pipeline to pre-calculate results like "average temperature per hour".
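As a hedged sketch of what that pre-calculation could look like, here are MQL stages (expressed as plain Python dicts, the form the PR's snippets pass to `bson.encode`) for an hourly average temperature. The field names `component_name`, `time_received`, and `data.readings.temperature` are illustrative assumptions, not confirmed by this PR:

```python
# Sketch of MQL stages a pipeline could run to pre-calculate
# "average temperature per hour". Field names are assumptions.
hourly_avg_temperature = [
    # Keep only readings from the temperature sensor component.
    {"$match": {"component_name": "temperature-sensor"}},
    {
        "$group": {
            # Truncate each reading's timestamp to the hour, so all
            # readings within the same hour collapse into one summary
            # document per pipeline run.
            "_id": {"$dateTrunc": {"date": "$time_received", "unit": "hour"}},
            "avg_temp": {"$avg": "$data.readings.temperature"},
        }
    },
]
```

Each stage dict would then be `bson.encode`d and passed when creating the pipeline.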
Collaborator


Suggested change
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour".
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour."

or better yet

Suggested change
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour".
For example, you could use a data pipeline to pre-calculate results such as average temperature per hour.

@@ -2587,7 +2622,7 @@ User-defined metadata is billed as data.

**Parameters:**

- `robot_id` ([str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)) (required): The ID of the robot with which to associate the user-defined metadata. You can obtain your robot ID from your machine's page.
- `robot_id` ([str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)) (required): The ID of the robot with which to associate the user-defined metadata. You can obtain your robot ID from the machine page.
Collaborator


This seems less helpful

@@ -0,0 +1 @@
Get the configuration for multiple data pipelines.
Collaborator

@JessamyT JessamyT Jul 2, 2025


Suggested change
Get the configuration for multiple data pipelines.
Get a list of configurations of all data pipelines for an organization.

Member


Maybe "for" instead of "of": Get a list of configurations for all data pipelines for an organization.

Collaborator


Sure -- edited my suggestion!

@@ -0,0 +1 @@
Get the configuration of a data pipeline.
Collaborator


Consider linking to your new page on [data pipeline]

@JessamyT
Collaborator

JessamyT commented Jul 2, 2025

When you address the merge conflicts with /generated/app.md, note that we changed a couple things manually in #4431 but I created this upstream PR to get them to stick.

Member

@vijayvuyyuru vijayvuyyuru left a comment


Thanks Nathan!


## Query

Queries typically execute on blog storage.
Member


Suggested change
Queries typically execute on blog storage.
Queries typically execute on blob storage.


### Query limitations

You cannot use the following MongoDB aggregation operators when querying your hot data store:
Member


So this is true for both standard storage and hot data. But there are a lot more operators we don't support. This link contains the list of the ones we allow: https://github.com/viamrobotics/app/blob/e706a2e3ea57a252f102b37e0ab2b9d6eeed51e0/datamanagement/tabular_data_by_query.go#L64

@@ -23,3 +23,8 @@
| [`ConfigureDatabaseUser`](/dev/reference/apis/data-client/#configuredatabaseuser) | Configure a database user for the Viam organization’s MongoDB Atlas Data Federation instance. |
| [`AddBinaryDataToDatasetByIDs`](/dev/reference/apis/data-client/#addbinarydatatodatasetbyids) | Add the `BinaryData` to the provided dataset. |
| [`RemoveBinaryDataFromDatasetByIDs`](/dev/reference/apis/data-client/#removebinarydatafromdatasetbyids) | Remove the BinaryData from the provided dataset. |
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration of a data pipeline. |
Member


Suggested change
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration of a data pipeline. |
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration for a data pipeline. |

| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration of a data pipeline. |
| [`ListDataPipelines`](/dev/reference/apis/data-client/#listdatapipelines) | Get the configuration for multiple data pipelines. |
| [`CreateDataPipeline`](/dev/reference/apis/data-client/#createdatapipeline) | Create a data pipeline. |
| [`DeleteDataPipeline`](/dev/reference/apis/data-client/#deletedatapipeline) | Delete a data pipeline. |
Member


(And all of its data)

| [`ListDataPipelines`](/dev/reference/apis/data-client/#listdatapipelines) | Get the configuration for multiple data pipelines. |
| [`CreateDataPipeline`](/dev/reference/apis/data-client/#createdatapipeline) | Create a data pipeline. |
| [`DeleteDataPipeline`](/dev/reference/apis/data-client/#deletedatapipeline) | Delete a data pipeline. |
| [`ListDataPipelineRuns`](/dev/reference/apis/data-client/#listdatapipelineruns) | List the statuses of individual executions of a data pipeline. |
Member


Not just status. So maybe List the individual executions of a data pipeline? Something like that?

@@ -0,0 +1 @@
List the statuses of individual executions of a data pipeline.
Member


same comment

@@ -0,0 +1 @@
Get the configuration for multiple data pipelines.
Member


Maybe "for" instead of "of": Get a list of configurations for all data pipelines for an organization.

{{% /tab %}}
{{< /tabs >}}

### Update a pipeline
Member


I'm tempted to not include this in the documentation. We don't want people changing pipeline schedules or queries after a pipeline has started inserting query results. Might end up just being the name that we allow them to update. Is it ok to leave this out?


### Disable a pipeline

Disabling a data pipeline lets you pause data pipeline execution without fully deleting the pipeline configuration from your organization.
Member


Maybe note that any time windows that pass while the pipeline is disabled will not contain data and will not be backfilled if the pipeline is enabled again

{{% /tab %}}
{{< /tabs >}}

### Delete a pipeline
Member


Deleting a pipeline will also delete its run history and the pipeline results collection.

{{< tabs >}}
{{% tab name="Python" %}}

Use [`DataClient.ListDataPipelineRuns`](/dev/reference/apis/data-client/#listdatapipelineruns) to view the statuses of past executions of a pipeline:
Member


I wouldn't say past. A run might not be complete yet. Also like I mentioned elsewhere, shows more than status

bson.encode({"$match": {"component_name": "temperature-sensor"}}),
bson.encode({
"$group": {
"_id": "$location_id",
Member


Do not do this. In fact, I think we should probably call out that you should not specify an id if your last stage is a group stage unless the id is guaranteed to be unique for every pipeline run. Otherwise there will be duplicate id errors, and only the first pipeline result will save successfully
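A hedged sketch of the reviewer's point, reusing the field names from the snippet above: including the time window in the `$group` `_id` makes the id unique per pipeline run, so successive runs insert fresh documents instead of colliding with earlier results. The `time_received` field and hourly window are assumptions for illustration:

```python
# Sketch: make the final $group stage's _id unique per run by combining
# the grouping key with the run's time window. A constant or reused _id
# (e.g. just "$location_id") would produce duplicate-key errors, and
# only the first run's results would save successfully.
group_stage = {
    "$group": {
        "_id": {
            "location_id": "$location_id",
            # Assumed field: the start of the window the run covers.
            "window_start": {
                "$dateTrunc": {"date": "$time_received", "unit": "hour"}
            },
        },
        "avg_temp": {"$avg": "$data.readings.temperature"},
    }
}
```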

@npentrel npentrel self-requested a review July 4, 2025 14:42
@npentrel
Collaborator

npentrel commented Jul 4, 2025

(I'll hold off on review until comments are resolved)

6 participants