Credits to [email protected] and [email protected] for the notebook logic behind the database initialization steps. This architecture will be deployed:

On your local machine, inside the adb-external-hive-metastore folder:
1. Clone the tf_azure_deployment repository to your local machine.
2. Supply your terraform.tfvars file to overwrite the default values as needed. See the inputs section below for optional/required variables. A sample terraform.tfvars is sketched after this list.
3. For the variables in step 2, db_username and db_password can also be supplied as environment variables: Terraform automatically looks for environment variables named TF_VAR_xxxxx.

       export TF_VAR_db_username=yoursqlserveradminuser
       export TF_VAR_db_password=yoursqlserveradminpassword

4. Initialize Terraform and apply to deploy the resources:

       terraform init
       terraform apply
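For reference, here is a minimal terraform.tfvars sketch. All values are placeholders to adjust for your environment; only db_username and db_password have no defaults, and they can be omitted entirely if you export them as TF_VAR_ environment variables instead:

```hcl
# Placeholder values -- adjust to your environment.
db_username = "yoursqlserveradminuser"
db_password = "yoursqlserveradminpassword"

# Optional overrides; defaults are listed in the inputs table below.
rglocation       = "southeastasia"
workspace_prefix = "adb"
dbfs_prefix      = "dbfs"
cold_start       = true
```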
Step 4 completes 99% of the setup automatically. The last 1% is to manually trigger the deployed job once.
Go to the Databricks workspace - Jobs - and run the auto-deployed job only once; this initializes the database with the metastore schema.
Then you can verify in a notebook:
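As a minimal sketch (the database and table names below are arbitrary examples, not part of the module), creating and listing a table from a notebook on the configured cluster confirms the external metastore is wired up:

```python
# Arbitrary names, used only as a smoke test of the external metastore.
spark.sql("CREATE DATABASE IF NOT EXISTS metastore_smoke_test")
spark.sql("CREATE TABLE IF NOT EXISTS metastore_smoke_test.example (id INT) USING DELTA")

# Both objects should now be listed via the external Hive metastore.
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES IN metastore_smoke_test").show()
```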
We can also check inside the SQL database (the metastore itself) to confirm that the cluster is linked to the external Hive metastore and that the table is registered there:
Now you can configure all other clusters to use this external metastore, using the same Spark conf and environment variables as the cold-start cluster. A sketch of that configuration follows.
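For illustration only: the exact values (JDBC URL, credentials, Hive version, and jar location) are produced by this deployment and already set on the cold-start cluster, so copy them from there. A typical external-Hive-metastore Spark conf against Azure SQL follows this pattern, with placeholders for anything environment-specific:

```
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<your-sql-server>.database.windows.net:1433;database=<your-metastore-db>
spark.hadoop.javax.jdo.option.ConnectionUserName <db_username>
spark.hadoop.javax.jdo.option.ConnectionPassword <db_password>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.sql.hive.metastore.version <hive-version-used-by-this-deployment>
spark.sql.hive.metastore.jars <path-to-hive-jars-downloaded-to-dbfs-by-the-cold-start-cluster>
```

In practice, prefer referencing the username and password through Databricks secrets rather than putting them in the Spark conf as plain text.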
To export table DDLs from an existing metastore, refer to this tutorial: https://kb.databricks.com/metastore/create-table-ddl-for-metastore.html
# Export one "your_file_name_<db>.ddl" file per database, containing the
# CREATE TABLE statement for every table in that database.
for db in spark.catalog.listDatabases():
    with open("your_file_name_{}.ddl".format(db.name), "w") as f:
        for t in spark.catalog.listTables(db.name):
            ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name))
            f.write(ddl.first()[0])
            f.write("\n")
This module creates:
- Resource group with random prefix
- Tags, including Owner, which is taken from `az account show --query user`
- VNet with public and private subnet
- Databricks workspace
- External Hive Metastore for ADB workspace
- Private endpoint connection to external metastore
Requirements:

| Name | Version |
|------|---------|
| azurerm | =2.83.0 |
| databricks | 0.3.10 |

Providers:

| Name | Version |
|------|---------|
| azurerm | 2.83.0 |
| databricks | 0.3.10 |
| external | 2.1.0 |
| random | 3.1.0 |
No modules.
Inputs:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| cold_start | If true, will spin up a cluster to download hive jars to dbfs | bool | true | no |
| db_password | Database administrator password | string | n/a | yes |
| db_username | Database administrator username | string | n/a | yes |
| dbfs_prefix | n/a | string | "dbfs" | no |
| no_public_ip | n/a | bool | true | no |
| private_subnet_endpoints | n/a | list | [] | no |
| rglocation | n/a | string | "southeastasia" | no |
| spokecidr | n/a | string | "10.179.0.0/20" | no |
| sqlvnetcidr | n/a | string | "10.178.0.0/20" | no |
| workspace_prefix | n/a | string | "adb" | no |
Outputs:

| Name | Description |
|------|-------------|
| arm_client_id | n/a |
| arm_subscription_id | n/a |
| arm_tenant_id | n/a |
| azure_region | n/a |
| databricks_azure_workspace_resource_id | n/a |
| resource_group | n/a |
| workspace_url | n/a |