DataBricks - Python - Data Asset Generation

This tool is designed to create a connection between Code Ocean and your SQL Warehouse in Databricks, submit a query, and create a data asset. The query is performed by a separate capsule (Databricks - Python - Data Connector).

Configuration

Generate API Token

In Code Ocean

Go to your user profile

Go to Access Tokens

Create an API token with full read and write access to Capsules and Data Assets.

Update API Secret to point to the correct API token.

Point to Query capsule

To use the "Databricks - Python - Data Connector" capsule in your environment, you will need to update the metadata ID that this capsule points to.

Browse to the "Databricks - Python - Data Connector" capsule in your environment.
Go to the "Metadata" tab.

Copy the metadata ID from the Query capsule.

In the config.sh file in this capsule, edit "databricks_query" to match the metadata ID of your Query capsule.

Get Parameter Information from Databricks

In your Databricks workspace, go to SQL Warehouse, select your warehouse, and open Connection Details. Use the following values as inputs for this capsule:

Hostname
HTTPPath

In your Databricks account, go to Data, select the dataset you wish to query, and note its "Catalog" name:

Catalog
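The three values above map onto the keyword arguments of `databricks.sql.connect()` from the Databricks SQL Connector for Python (see Source). The following is a minimal sketch, not the code used by the Query capsule: the helper names `build_connection_kwargs` and `run_query` are hypothetical, and it assumes you authenticate with a Databricks personal access token, which this README does not describe.

```python
def build_connection_kwargs(hostname, http_path, access_token,
                            catalog="hive_metastore"):
    """Collect keyword arguments for databricks.sql.connect().

    Hypothetical helper: the parameter names mirror the capsule inputs
    (Hostname, HTTPPath, Catalog); access_token is an assumption.
    """
    hostname = hostname.strip()
    # Accept a pasted URL and reduce it to a bare hostname.
    for prefix in ("https://", "http://"):
        if hostname.startswith(prefix):
            hostname = hostname[len(prefix):]
    hostname = hostname.rstrip("/")
    http_path = http_path.strip()
    if not hostname:
        raise ValueError("Hostname is empty")
    if not http_path:
        raise ValueError("HTTPPath is empty")
    return {
        "server_hostname": hostname,
        "http_path": http_path,
        "access_token": access_token,
        "catalog": catalog,
    }


def run_query(query, **connection_kwargs):
    """Run a query and return (column names, rows)."""
    # Imported lazily so build_connection_kwargs() works even when
    # databricks-sql-connector is not installed.
    from databricks import sql

    with sql.connect(**connection_kwargs) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            columns = [col[0] for col in cursor.description]
            return columns, cursor.fetchall()
```

With a valid token, `run_query("SELECT * FROM default.diamonds LIMIT 2", **build_connection_kwargs(hostname, http_path, token))` would return the column names and the fetched rows.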

App Panel Parameters

Hostname

Workspace hostname, see [Get Parameter Information from Databricks](#get-parameter-information-from-databricks) [default: dbc-2a6017bc-079e.cloud.databricks.com]

HTTPPath

HTTP path of the SQL warehouse, see [Get Parameter Information from Databricks](#get-parameter-information-from-databricks) [default: /sql/1.0/warehouses/0c7a8dff9ad0e63c]

Catalog

SQL warehouse catalog, see [Get Parameter Information from Databricks](#get-parameter-information-from-databricks) [default: hive_metastore]

SQL Query

SQL Query to execute. This should be a "SELECT" statement pulling data from the warehouse. [default: SELECT * FROM default.diamonds LIMIT 2]
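Since the query is expected to be a "SELECT" statement, a simple pre-check can reject anything else before it is submitted. The sketch below is illustrative only and is not part of the capsule; `is_select_statement` is a hypothetical name, and the check is deliberately crude: it does not parse SQL, it rejects queries that begin with `WITH`, and it will also reject a valid query whose string literals contain `;` or `--`.

```python
import re


def is_select_statement(query):
    """Return True if the text is a single statement starting with SELECT."""
    # Drop line comments and block comments before inspecting the text.
    stripped = re.sub(r"--[^\n]*", "", query)
    stripped = re.sub(r"/\*.*?\*/", "", stripped, flags=re.DOTALL)
    # Allow one optional trailing semicolon.
    stripped = stripped.strip().rstrip(";").strip()
    # Any remaining semicolon suggests more than one statement.
    if ";" in stripped:
        return False
    return re.match(r"select\b", stripped, flags=re.IGNORECASE) is not None
```

The default query, `SELECT * FROM default.diamonds LIMIT 2`, passes this check, while a `DROP TABLE` statement or two statements joined by `;` do not.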

Output File Name

Name for the output file, without the extension [default: output]

Output Format

Data format to output. [default: csv]
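To show how Output File Name and Output Format combine, here is a minimal sketch that writes query results with Python's standard `csv` module. It is an assumption about how such a step could work, not the capsule's actual code; `write_table`, `DELIMITERS`, and the `directory` argument are hypothetical, and only the csv and tsv formats mentioned in this README are handled.

```python
import csv
from pathlib import Path

# Delimiter for each supported output format.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def write_table(columns, rows, output_name="output", output_format="csv",
                directory="."):
    """Write a header row and data rows to <output_name>.<output_format>."""
    if output_format not in DELIMITERS:
        raise ValueError(f"Unsupported output format: {output_format!r}")
    path = Path(directory) / f"{output_name}.{output_format}"
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=DELIMITERS[output_format])
        writer.writerow(columns)
        writer.writerows(rows)
    return path
```

With the defaults, the function writes `output.csv` in the current directory; passing `output_format="tsv"` switches both the extension and the delimiter.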

Data Asset Name

Name for the output Data Asset [default: MyDataAsset]

Folder Name

This is where the data will be found when it is attached to a capsule. Note, this can be edited after attaching. [default: Mount]

Output

A data asset containing the table returned by your SQL query, in .csv or .tsv format.

Source

https://docs.databricks.com/dev-tools/python-sql-connector.html

Code Ocean is a cloud-based computational platform that aims to make it easy for researchers to share, discover, and run code.
