
# ADB workspace with external Hive metastore

Credits to [email protected] and [email protected] for the notebook logic behind the database initialization steps. This module deploys an Azure Databricks workspace backed by an external Hive metastore hosted in Azure SQL Database, reached through a private endpoint.

## Get started

On your local machine, inside the adb-external-hive-metastore folder:

1. Clone the tf_azure_deployment repository to your local machine.

2. Supply your own `terraform.tfvars` file to override the default values as needed. See the Inputs section below for the optional and required variables.

3. For the `db_username` and `db_password` variables in step 2, you can also use environment variables: Terraform automatically picks up environment variables named `TF_VAR_<variable_name>`.

   ```bash
   export TF_VAR_db_username=yoursqlserveradminuser
   export TF_VAR_db_password=yoursqlserveradminpassword
   ```

4. Initialize Terraform and apply to deploy the resources:

   ```bash
   terraform init
   terraform apply
   ```

Step 4 completes about 99% of the deployment automatically. The remaining 1% is to trigger the deployed job manually, once.

In the Databricks workspace, open Jobs and run the auto-deployed job exactly once; this initializes the database with the Hive metastore schema.

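If you prefer not to click through the UI, the job can also be triggered once from the Databricks CLI, assuming the CLI is configured against this workspace (take the job id from the Jobs page or from `databricks jobs list`):

```bash
# One-off trigger of the metastore setup job; <job-id> is a placeholder.
databricks jobs run-now --job-id <job-id>
```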

Then you can verify the setup in a notebook:

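For example, a minimal smoke test (the database and table names here are illustrative, not part of the module):

```python
# Objects created on this cluster should now be registered
# in the external Hive metastore, not the workspace-managed one.
spark.sql("CREATE DATABASE IF NOT EXISTS metastore_check")
spark.sql("CREATE TABLE IF NOT EXISTS metastore_check.smoke_test (id INT) USING delta")
spark.sql("SHOW TABLES IN metastore_check").show()
```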

We can also check inside the SQL database (the metastore) that the cluster is successfully linked to the external Hive metastore and that the table is registered there:

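The standard Hive metastore schema tracks databases and tables in the `DBS` and `TBLS` tables, so a query like the following, run against the metastore database (assuming the default `dbo` schema), lists what has been registered:

```sql
-- Databases and tables registered in the external Hive metastore
SELECT d.NAME AS database_name, t.TBL_NAME AS table_name
FROM dbo.TBLS AS t
JOIN dbo.DBS AS d ON t.DB_ID = d.DB_ID;
```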

Now you can configure all other clusters to use this external metastore by applying the same Spark conf and environment variables as the cold-start cluster.
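For reference, a cluster wired to an external Hive metastore carries Spark conf entries along these lines. This is a sketch, not the exact generated config: the secret scope and key names follow the resources this module creates (`kv`, `hiveurl`, `hiveuser`, `hivepwd`), while the metastore version and the DBFS jars path are placeholders, so copy the exact values from the deployed cold-start cluster or the `metastoreinit` global init script:

```
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.hadoop.javax.jdo.option.ConnectionURL {{secrets/kv/hiveurl}}
spark.hadoop.javax.jdo.option.ConnectionUserName {{secrets/kv/hiveuser}}
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/kv/hivepwd}}
spark.sql.hive.metastore.version <your-hive-version>
spark.sql.hive.metastore.jars /dbfs/<path-to-downloaded-hive-jars>/*
```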

## Notes: migrating from an existing managed metastore to the external metastore

Refer to this tutorial: https://kb.databricks.com/metastore/create-table-ddl-for-metastore.html. The snippet below, run on the workspace that still uses the managed metastore, exports a `CREATE TABLE` DDL file per database:

```python
# Run on the workspace that still uses the managed metastore.
# Writes one .ddl file per database, containing a CREATE TABLE
# statement for every table in that database.
dbs = spark.catalog.listDatabases()
for db in dbs:
    with open("your_file_name_{}.ddl".format(db.name), "w") as f:
        for t in spark.catalog.listTables(db.name):
            ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name))
            f.write(ddl.first()[0])
            f.write("\n")
```
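To finish the migration, replay the exported DDL on a cluster attached to the external metastore. A minimal sketch, assuming an exported file that contains a single `CREATE TABLE` statement (files covering several tables must be split into individual statements first):

```python
# Hypothetical file name produced by the export step above.
with open("your_file_name_default.ddl") as f:
    spark.sql(f.read())  # re-registers the table in the external metastore
```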

This module creates:

- Resource group with a random prefix
- Tags, including Owner, which is taken from `az account show --query user`
- VNet with public and private subnets
- Databricks workspace
- External Hive metastore for the ADB workspace
- Private endpoint connection to the external metastore

## Requirements

| Name | Version |
|------|---------|
| azurerm | = 2.83.0 |
| databricks | 0.3.10 |

## Providers

| Name | Version |
|------|---------|
| azurerm | 2.83.0 |
| databricks | 0.3.10 |
| external | 2.1.0 |
| random | 3.1.0 |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| azurerm_databricks_workspace.this | resource |
| azurerm_key_vault.akv1 | resource |
| azurerm_key_vault_access_policy.example | resource |
| azurerm_key_vault_secret.hivepwd | resource |
| azurerm_key_vault_secret.hiveurl | resource |
| azurerm_key_vault_secret.hiveuser | resource |
| azurerm_mssql_database.sqlmetastore | resource |
| azurerm_mssql_server.metastoreserver | resource |
| azurerm_mssql_server_extended_auditing_policy.mssqlpolicy | resource |
| azurerm_mssql_virtual_network_rule.sqlservervnetrule | resource |
| azurerm_network_security_group.this | resource |
| azurerm_private_dns_zone.dnsmetastore | resource |
| azurerm_private_dns_zone_virtual_network_link.metastorednszonevnetlink | resource |
| azurerm_private_endpoint.sqlserverpe | resource |
| azurerm_resource_group.this | resource |
| azurerm_storage_account.sqlserversa | resource |
| azurerm_subnet.plsubnet | resource |
| azurerm_subnet.private | resource |
| azurerm_subnet.public | resource |
| azurerm_subnet.sqlsubnet | resource |
| azurerm_subnet_network_security_group_association.private | resource |
| azurerm_subnet_network_security_group_association.public | resource |
| azurerm_virtual_network.sqlvnet | resource |
| azurerm_virtual_network.this | resource |
| databricks_cluster.coldstart | resource |
| databricks_global_init_script.metastoreinit | resource |
| databricks_job.metastoresetup | resource |
| databricks_notebook.ddl | resource |
| databricks_secret_scope.kv | resource |
| random_string.naming | resource |
| azurerm_client_config.current | data source |
| databricks_current_user.me | data source |
| databricks_node_type.smallest | data source |
| databricks_spark_version.latest_lts | data source |
| external_external.me | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| cold_start | if true, will spin up a cluster to download hive jars to dbfs | bool | true | no |
| db_password | Database administrator password | string | n/a | yes |
| db_username | Database administrator username | string | n/a | yes |
| dbfs_prefix | n/a | string | "dbfs" | no |
| no_public_ip | n/a | bool | true | no |
| private_subnet_endpoints | n/a | list | [] | no |
| rglocation | n/a | string | "southeastasia" | no |
| spokecidr | n/a | string | "10.179.0.0/20" | no |
| sqlvnetcidr | n/a | string | "10.178.0.0/20" | no |
| workspace_prefix | n/a | string | "adb" | no |

## Outputs

| Name | Description |
|------|-------------|
| arm_client_id | n/a |
| arm_subscription_id | n/a |
| arm_tenant_id | n/a |
| azure_region | n/a |
| databricks_azure_workspace_resource_id | n/a |
| resource_group | n/a |
| workspace_url | n/a |