DOCS-3198: offline data pipelines, SDK docs, hot data store fixes #4440
✅ Deploy Preview for viam-docs ready!
Started reviewing, but this is a philosophical one about the extent to which we want to redundantly document the APIs and CLI commands. It feels like a new precedent: for each item (for example, "Delete a pipeline") there's a 1:1 matching section in the API docs as well as an example on the CLI page. I'll leave this to Naomi to review.
@@ -7,7 +7,7 @@ | |||
| [`TabularDataBySQL`](/dev/reference/apis/data-client/#tabulardatabysql) | Obtain unified tabular data and metadata, queried with SQL. Make sure your API key has permissions at the organization level in order to use this. | | |||
| [`TabularDataByMQL`](/dev/reference/apis/data-client/#tabulardatabymql) | Obtain unified tabular data and metadata, queried with MQL. | | |||
| [`BinaryDataByFilter`](/dev/reference/apis/data-client/#binarydatabyfilter) | Retrieve optionally filtered binary data from Viam. | | |||
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from Viam by `BinaryID`. | | |||
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from the Viam by `BinaryID`. | |
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from the Viam by `BinaryID`. | | |
| [`BinaryDataByIDs`](/dev/reference/apis/data-client/#binarydatabyids) | Retrieve binary data from Viam by `BinaryID`. | |
Can you also update `static/include/app/apis/overrides/protos/data.BinaryDataByIDs.md` to fix this?
Viam stores the output of these pipelines in a cache so that you can access complex aggregation results more efficiently. | ||
When late-arriving data syncs to Viam, pipelines automatically re-run to keep summaries accurate. | ||
|
||
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour". |
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour". | |
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour." |
or better yet
For example, you could use a data pipeline to pre-calculate results like "average temperature per hour". | |
For example, you could use a data pipeline to pre-calculate results such as average temperature per hour. |
@@ -2587,7 +2622,7 @@ User-defined metadata is billed as data. | |||
|
|||
**Parameters:** | |||
|
|||
- `robot_id` ([str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)) (required): The ID of the robot with which to associate the user-defined metadata. You can obtain your robot ID from your machine's page. | |||
- `robot_id` ([str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)) (required): The ID of the robot with which to associate the user-defined metadata. You can obtain your robot ID from the machine page. |
This seems less helpful
@@ -0,0 +1 @@ | |||
Get the configuration for multiple data pipelines. |
Get the configuration for multiple data pipelines. | |
Get a list of configurations of all data pipelines for an organization. |
Maybe "for" instead of "of": Get a list of configurations for all data pipelines for an organization.
Sure--edited my suggestion!
@@ -0,0 +1 @@ | |||
Get the configuration of a data pipeline. |
Consider linking to your new page on [data pipeline]
When you address the merge conflicts with /generated/app.md, note that we changed a couple things manually in #4431 but I created this upstream PR to get them to stick. |
Thanks Nathan!
|
||
## Query | ||
|
||
Queries typically execute on blog storage. |
Queries typically execute on blog storage. | |
Queries typically execute on blob storage. |
|
||
### Query limitations | ||
|
||
You cannot use the following MongoDB aggregation operators when querying your hot data store: |
This is true for both standard storage and the hot data store, but there are many more operators we don't support. This link contains the list of the ones we allow: https://github.com/viamrobotics/app/blob/e706a2e3ea57a252f102b37e0ab2b9d6eeed51e0/datamanagement/tabular_data_by_query.go#L64
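A rough sketch of what documenting an allowlist (rather than a denylist) implies for users: a pipeline can be checked stage by stage before it is submitted. The `ALLOWED_STAGES` set below is a placeholder containing only the two stages that appear in this PR's example, not the real list from the linked source, and `unsupported_stages` is a hypothetical helper.

```python
# Placeholder allowlist: only the two stages used in this PR's example.
# The authoritative list lives in the linked tabular_data_by_query.go file.
ALLOWED_STAGES = {"$match", "$group"}


def unsupported_stages(pipeline, allowed=ALLOWED_STAGES):
    """Return the stage operators in `pipeline` that are not in `allowed`.

    Each stage is a dict with a single top-level key naming the operator,
    for example {"$match": {...}}.
    """
    return [op for stage in pipeline for op in stage if op not in allowed]


pipeline = [
    {"$match": {"component_name": "temperature-sensor"}},
    {"$exampleStage": {}},  # made-up operator name, used only for illustration
]
rejected = unsupported_stages(pipeline)
```

Here `rejected` holds the operators a user would need to remove or replace before running the query.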
@@ -23,3 +23,8 @@ | |||
| [`ConfigureDatabaseUser`](/dev/reference/apis/data-client/#configuredatabaseuser) | Configure a database user for the Viam organization’s MongoDB Atlas Data Federation instance. | | |||
| [`AddBinaryDataToDatasetByIDs`](/dev/reference/apis/data-client/#addbinarydatatodatasetbyids) | Add the `BinaryData` to the provided dataset. | | |||
| [`RemoveBinaryDataFromDatasetByIDs`](/dev/reference/apis/data-client/#removebinarydatafromdatasetbyids) | Remove the BinaryData from the provided dataset. | | |||
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration of a data pipeline. | |
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration of a data pipeline. | | |
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration for a data pipeline. | |
| [`GetDataPipeline`](/dev/reference/apis/data-client/#getdatapipeline) | Get the configuration of a data pipeline. | | ||
| [`ListDataPipelines`](/dev/reference/apis/data-client/#listdatapipelines) | Get the configuration for multiple data pipelines. | | ||
| [`CreateDataPipeline`](/dev/reference/apis/data-client/#createdatapipeline) | Create a data pipeline. | | ||
| [`DeleteDataPipeline`](/dev/reference/apis/data-client/#deletedatapipeline) | Delete a data pipeline. | |
(And all of its data)
| [`ListDataPipelines`](/dev/reference/apis/data-client/#listdatapipelines) | Get the configuration for multiple data pipelines. | | ||
| [`CreateDataPipeline`](/dev/reference/apis/data-client/#createdatapipeline) | Create a data pipeline. | | ||
| [`DeleteDataPipeline`](/dev/reference/apis/data-client/#deletedatapipeline) | Delete a data pipeline. | | ||
| [`ListDataPipelineRuns`](/dev/reference/apis/data-client/#listdatapipelineruns) | List the statuses of individual executions of a data pipeline. | |
Not just status. So maybe `List the individual executions of a data pipeline`? Something like that?
@@ -0,0 +1 @@ | |||
List the statuses of individual executions of a data pipeline. |
same comment
@@ -0,0 +1 @@ | |||
Get the configuration for multiple data pipelines. |
Maybe "for" instead of "of": Get a list of configurations for all data pipelines for an organization.
{{% /tab %}} | ||
{{< /tabs >}} | ||
|
||
### Update a pipeline |
I'm tempted to not include this in the documentation. We don't want people changing pipeline schedules or queries after a pipeline has started inserting query results. Might end up just being the name that we allow them to update. Is it ok to leave this out?
|
||
### Disable a pipeline | ||
|
||
Disabling a data pipeline lets you pause data pipeline execution without fully deleting the pipeline configuration from your organization. |
Maybe note that any time windows that pass while the pipeline is disabled will not contain data, and will not be backfilled if the pipeline is enabled again.
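To make the no-backfill behavior concrete, here is a small simulation. It is only a sketch under a stated assumption: a run fires at the end of each schedule window, and a window is skipped permanently if the pipeline is disabled at that instant. `simulate_runs` and its parameters are hypothetical and are not part of any Viam SDK.

```python
from datetime import datetime, timedelta


def simulate_runs(start, end, step, disabled_from, disabled_until):
    """Return (window_start, ran) pairs for each full window in [start, end).

    ASSUMPTION: a run fires at the end of its window and is skipped for good
    if the pipeline is disabled at that moment. Skipped windows are never
    revisited once the pipeline is enabled again.
    """
    results = []
    window_start = start
    while window_start + step <= end:
        window_end = window_start + step
        ran = not (disabled_from <= window_end < disabled_until)
        results.append((window_start, ran))
        window_start = window_end
    return results


start = datetime(2025, 1, 1, 0, 0)
runs = simulate_runs(
    start,
    start + timedelta(hours=6),
    timedelta(hours=1),
    disabled_from=start + timedelta(hours=2, minutes=30),
    disabled_until=start + timedelta(hours=4, minutes=30),
)
# Windows whose results are missing and stay missing after re-enabling.
gaps = [window.hour for window, ran in runs if not ran]
```

With an hourly schedule and the pipeline disabled from 02:30 to 04:30, the windows starting at 02:00 and 03:00 end up with no results under this model.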
{{% /tab %}} | ||
{{< /tabs >}} | ||
|
||
### Delete a pipeline |
Deleting a pipeline will also delete its run history and the pipeline results collection.
{{< tabs >}} | ||
{{% tab name="Python" %}} | ||
|
||
Use [`DataClient.ListDataPipelineRuns`](/dev/reference/apis/data-client/#listdatapipelineruns) to view the statuses of past executions of a pipeline: |
I wouldn't say "past": a run might not be complete yet. Also, as I mentioned elsewhere, this shows more than status.
bson.encode({"$match": {"component_name": "temperature-sensor"}}), | ||
bson.encode({ | ||
"$group": { | ||
"_id": "$location_id", |
Do not do this. In fact, I think we should probably call out that you should not specify an `_id` if your last stage is a group stage, unless the `_id` is guaranteed to be unique for every pipeline run. Otherwise there will be duplicate `_id` errors, and only the first pipeline result will save successfully.
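One possible way to avoid the problem, sketched with plain Python dicts: follow the final `$group` with a `$project` stage that moves the group key into a named field and drops `_id`. This assumes that `$project` is permitted in data pipelines and that the results collection assigns a fresh `_id` when none is supplied; neither is confirmed in this thread. The helper `drop_group_id`, the field name `group_key`, and the `$data.readings.temperature` path are all hypothetical.

```python
def drop_group_id(pipeline):
    """Return a copy of `pipeline` whose output documents carry no explicit `_id`.

    If the last stage is a `$group`, append a `$project` that copies the group
    key into `group_key`, keeps every accumulator field, and suppresses `_id`.
    Pipelines that do not end in `$group` are returned unchanged.
    """
    if not pipeline or "$group" not in pipeline[-1]:
        return list(pipeline)
    projection = {"_id": 0, "group_key": "$_id"}
    for field in pipeline[-1]["$group"]:
        if field != "_id":
            projection[field] = 1
    return list(pipeline) + [{"$project": projection}]


pipeline = [
    {"$match": {"component_name": "temperature-sensor"}},
    {
        "$group": {
            "_id": "$location_id",  # repeats on every run, hence the collision
            "avg_temp": {"$avg": "$data.readings.temperature"},  # hypothetical path
        }
    },
]
fixed = drop_group_id(pipeline)
```

Each stage in `fixed` could then be passed through `bson.encode` as in the original example. The group key is still available to readers of the results as `group_key`, but it no longer acts as the document's primary key.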
(I'll hold off on review until comments are resolved)
docs/data-ai/data/data-pipelines.md
)docs/data-ai/data/hot-data-store.md
), since: