Summary
The ADBC BigQuery driver currently supports discovery of projects, datasets, tables (including views, materialized views, external tables, clones, snapshots), columns, and primary/foreign key constraints. However, BigQuery has additional object types and metadata that data development tools need — routines (functions and procedures), ML models, partition details, DDL, reverse foreign keys, and richer table/dataset properties. This issue requests expanding metadata discovery to cover these gaps.
We build a collaborative data operations platform and currently use the BigQuery JDBC driver supplemented by INFORMATION_SCHEMA queries and the Google Cloud BigQuery Java API to extract metadata. We are evaluating the ADBC driver as a modern Arrow-native replacement but cannot adopt it without coverage for the object categories listed below.
Each section describes what we need, why it matters, what metadata properties are required, and how the data can be obtained from BigQuery.
Table of Contents
- Stored Procedures
- Scalar Functions (UDFs)
- Table Functions
- Routine Parameters
- ML Models
- Partition Metadata
- DDL / CREATE Statement Retrieval
- Reverse Foreign Keys (Referenced By)
- Extended Table Statistics
- Dataset Properties
- Extended External Table Properties
- Extended Materialized View Properties
- Table Options (Labels, Expiration, Encryption)
- Row-Level Security Policies
- Column-Level Security (Policy Tags)
- Authorized Views & Datasets
- Search Indexes
1. Stored Procedures
Motivation
Stored procedures are a core part of BigQuery's programmability model. They support SQL and JavaScript, can accept IN/OUT/INOUT parameters, and are used extensively for ETL orchestration and data transformations. Data development tools need to enumerate them for catalog browsing, autocomplete, and documentation.
Required Metadata
| Property |
Description |
Source Column |
| Routine catalog |
Project containing the procedure |
routine_catalog |
| Routine schema |
Dataset containing the procedure |
routine_schema |
| Routine name |
Procedure identifier |
routine_name |
| Routine type |
PROCEDURE |
routine_type |
| Routine body |
SQL or EXTERNAL |
routine_body |
| External language |
JAVASCRIPT (if external) |
external_language |
| DDL |
Complete CREATE PROCEDURE statement |
ddl |
| Description |
User-provided description |
From ROUTINE_OPTIONS |
| Is deterministic |
Whether the procedure is deterministic |
is_deterministic |
| Created |
Creation timestamp |
created |
| Last altered |
Last modification timestamp |
last_altered |
| Security type |
INVOKER or DEFINER |
security_type |
How to Obtain
SELECT
r.routine_catalog,
r.routine_schema,
r.routine_name,
r.routine_type,
r.routine_body,
r.external_language,
r.is_deterministic,
r.security_type,
r.created,
r.last_altered,
r.ddl
FROM
`project`.`dataset`.INFORMATION_SCHEMA.ROUTINES r
WHERE
r.routine_type = 'PROCEDURE'
ORDER BY
r.routine_name;
Options (description, library URIs) are available from:
SELECT option_name, option_value
FROM `project`.`dataset`.INFORMATION_SCHEMA.ROUTINE_OPTIONS
WHERE specific_name = 'procedure_name';
2. Scalar Functions (UDFs)
Motivation
User-defined scalar functions are used extensively in BigQuery for reusable transformations. Tools need to discover them for autocomplete, signature help, and catalog browsing.
Required Metadata
| Property |
Description |
Source Column |
| Routine catalog |
Project containing the function |
routine_catalog |
| Routine schema |
Dataset containing the function |
routine_schema |
| Routine name |
Function identifier |
routine_name |
| Routine type |
FUNCTION |
routine_type |
| Return data type |
Function return type |
data_type |
| Routine body |
SQL or EXTERNAL |
routine_body |
| External language |
JAVASCRIPT (if external) |
external_language |
| DDL |
Complete CREATE FUNCTION statement |
ddl |
| Description |
User-provided description |
From ROUTINE_OPTIONS |
| Is deterministic |
Whether the function is deterministic |
is_deterministic |
| Created |
Creation timestamp |
created |
| Last altered |
Last modification timestamp |
last_altered |
How to Obtain
SELECT
r.routine_catalog,
r.routine_schema,
r.routine_name,
r.routine_type,
r.data_type,
r.routine_body,
r.routine_definition,
r.external_language,
r.is_deterministic,
r.created,
r.last_altered,
r.ddl
FROM
`project`.`dataset`.INFORMATION_SCHEMA.ROUTINES r
WHERE
r.routine_type = 'FUNCTION'
ORDER BY
r.routine_name;
3. Table Functions
Motivation
Table-valued functions return tabular results and are used in FROM clauses. They need to be distinguished from scalar functions because they have a result column schema rather than a single return type.
Required Metadata
Same as scalar functions (section 2), plus:
| Property |
Description |
| Return columns |
List of result column names and types |
How to Obtain
-- Identify table functions
SELECT *
FROM `project`.`dataset`.INFORMATION_SCHEMA.ROUTINES r
WHERE r.routine_type = 'TABLE FUNCTION';
Return column schema is embedded in the ddl column and can also be inferred from INFORMATION_SCHEMA.PARAMETERS entries with ordinal_position = 0 or from the routine definition.
4. Routine Parameters
Motivation
Parameter metadata enables signature help, call template generation, and documentation. Without parameter names and types, tools can only show the routine name.
Required Metadata
| Property |
Description |
Source Column |
| Routine name |
Parent routine |
specific_name |
| Parameter name |
Parameter identifier |
parameter_name |
| Ordinal position |
1-based position |
ordinal_position |
| Parameter mode |
IN, OUT, INOUT |
parameter_mode |
| Data type |
BigQuery type name |
data_type |
| Is result |
Whether this describes the return type |
parameter_mode = 'OUT' and ordinal_position = 0 |
How to Obtain
SELECT
p.specific_catalog,
p.specific_schema,
p.specific_name,
p.ordinal_position,
p.parameter_name,
p.parameter_mode,
p.data_type
FROM
`project`.`dataset`.INFORMATION_SCHEMA.PARAMETERS p
WHERE
p.ordinal_position > 0 -- exclude return type entry
ORDER BY
p.specific_name,
p.ordinal_position;
5. ML Models
Motivation
BigQuery ML is a core BigQuery feature that allows users to create, train, and deploy machine learning models using SQL. Models are first-class dataset objects. Tools need to enumerate them for catalog browsing and display model properties.
Required Metadata
| Property |
Description |
Source |
| Model name |
Model identifier |
model_name |
| Model type |
LINEAR_REG, LOGISTIC_REG, KMEANS, BOOSTED_TREE, DNN_CLASSIFIER, ARIMA_PLUS, etc. |
model_type |
| Description |
User-provided description |
description |
| Created |
Creation timestamp |
creation_time |
| Last modified |
Last modification timestamp |
modified_time |
| Expiration |
Model expiration timestamp |
model_expiration_time |
| Labels |
Key-value labels |
labels |
| Feature columns |
Input feature column names and types |
feature_columns |
| Label columns |
Target label columns |
label_columns |
| Training runs |
List of training executions with timestamps |
training_runs |
| Location |
GCP region |
location |
| Encryption |
KMS key configuration |
encryption_configuration |
| Default trial ID |
Default trial for hyperparameter tuning |
default_trial_id |
How to Obtain
Option A: INFORMATION_SCHEMA
SELECT
model_name,
model_type,
schema_name,
creation_time,
modified_time,
model_expiration_time,
description,
labels,
feature_columns,
label_columns,
training_runs,
location,
encryption_configuration,
default_trial_id
FROM
`project`.`dataset`.INFORMATION_SCHEMA.ML_MODELS;
Option B: BigQuery API
GET https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}/models/{model}
The API returns the full model resource including training runs, evaluation metrics, and hyperparameters.
Note
ML models are not traditional tables and don't fit naturally into the GetObjects table hierarchy. Consider either:
- Including them as a special table type (e.g., "MODEL")
- Providing a separate
GetModels() method
- Extending
GetObjects with a new sibling to db_schema_tables
6. Partition Metadata
Motivation
BigQuery tables are commonly partitioned by time or integer range. Users need per-partition statistics for:
- Understanding data distribution and freshness
- Writing partition-pruning queries
- Monitoring storage costs (active vs long-term tiers)
- Identifying stale or oversized partitions
Required Metadata
| Property |
Description |
Source Column |
| Table name |
Parent table |
table_name |
| Partition ID |
Partition identifier (e.g., "20240101", "NULL", "UNPARTITIONED") |
partition_id |
| Total rows |
Row count in partition |
total_rows |
| Total logical bytes |
Logical storage size |
total_logical_bytes |
| Total billable bytes |
Billable storage size |
total_billable_bytes |
| Last modified time |
Last data modification |
last_modified_time |
| Storage tier |
ACTIVE or LONG_TERM |
storage_tier |
How to Obtain
SELECT
p.table_name,
p.partition_id,
p.total_rows,
p.total_logical_bytes,
p.total_billable_bytes,
p.last_modified_time,
p.storage_tier
FROM
`project`.`dataset`.INFORMATION_SCHEMA.PARTITIONS p
WHERE
p.table_name = 'table_name'
ORDER BY
p.partition_id;
Aggregated Table Statistics from Partitions
SELECT
p.table_name,
COUNT(p.partition_id) AS partition_count,
SUM(p.total_rows) AS total_rows,
SUM(p.total_logical_bytes) AS total_logical_bytes,
SUM(p.total_billable_bytes) AS total_billable_bytes,
MAX(p.last_modified_time) AS last_modified_time
FROM
`project`.`dataset`.INFORMATION_SCHEMA.PARTITIONS p
GROUP BY
p.table_name;
7. DDL / CREATE Statement Retrieval
Motivation
DDL captures the complete definition of any BigQuery object — tables, views, materialized views, external tables, functions, procedures, and models. It is essential for:
- Schema documentation and version control
- Environment migration (dev to staging to production)
- Object recreation and cloning
- Understanding full object configuration (partitioning, clustering, options)
Required Metadata
| Property |
Description |
| Object name |
Fully qualified object name |
| Object type |
TABLE, VIEW, MATERIALIZED VIEW, FUNCTION, PROCEDURE, etc. |
| DDL text |
Complete CREATE statement |
How to Obtain
BigQuery uniquely provides DDL as a column in INFORMATION_SCHEMA:
-- Tables, views, materialized views, external tables
SELECT table_name, table_type, ddl
FROM `project`.`dataset`.INFORMATION_SCHEMA.TABLES;
-- Routines (functions, procedures)
SELECT routine_name, routine_type, ddl
FROM `project`.`dataset`.INFORMATION_SCHEMA.ROUTINES;
Suggested Approach
The simplest integration is to include the ddl column from existing INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.ROUTINES queries and expose it as an additional field in GetObjects (e.g., table_ddl in TABLE_SCHEMA).
8. Reverse Foreign Keys (Referenced By)
Motivation
When inspecting a table, users need to see not only which tables it references (foreign keys) but also which tables reference it (reverse foreign keys / "referenced by"). This is essential for impact analysis — understanding what downstream tables would be affected by changes.
Required Metadata
| Property |
Description |
| Referencing table |
Table that holds the foreign key |
| FK constraint name |
Foreign key constraint name |
| Referencing columns |
Columns in the referencing table |
| Referenced columns |
Columns in this table (the target) |
How to Obtain
SELECT
constr.table_name AS referencing_table,
constr.constraint_name,
col.column_name AS referencing_column,
pk_col.column_name AS referenced_column
FROM
`project`.`dataset`.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE u
INNER JOIN `project`.`dataset`.INFORMATION_SCHEMA.TABLE_CONSTRAINTS constr
ON constr.constraint_name = u.constraint_name
AND constr.constraint_type = 'FOREIGN KEY'
INNER JOIN `project`.`dataset`.INFORMATION_SCHEMA.KEY_COLUMN_USAGE col
ON col.constraint_name = constr.constraint_name
INNER JOIN `project`.`dataset`.INFORMATION_SCHEMA.KEY_COLUMN_USAGE pk_col
ON pk_col.table_name = 'referenced_table_name'
AND pk_col.ordinal_position = col.position_in_unique_constraint
WHERE
TRIM(u.table_name) = 'referenced_table_name'
ORDER BY
constr.constraint_name,
col.ordinal_position;
Note
This uses the same INFORMATION_SCHEMA tables already queried for forward foreign keys. It could be exposed as an additional constraint type or as a separate field on the table.
9. Extended Table Statistics
Motivation
The driver currently exposes num_rows from table metadata. Tools need richer storage statistics for cost monitoring, capacity planning, and performance optimization.
Required Metadata
| Property |
Description |
Source |
| Total rows |
Row count |
PARTITIONS aggregate or num_rows |
| Total logical bytes |
Logical storage size |
PARTITIONS aggregate |
| Total billable bytes |
Billable storage size |
PARTITIONS aggregate |
| Total physical bytes |
Physical storage including compression |
TABLE_STORAGE |
| Active logical bytes |
Recently modified data (<90 days) |
TABLE_STORAGE |
| Long-term logical bytes |
Data unmodified >90 days |
TABLE_STORAGE |
| Time travel physical bytes |
Time travel window storage |
TABLE_STORAGE |
| Partition count |
Number of partitions |
PARTITIONS aggregate |
| Last modified time |
Most recent data change |
PARTITIONS aggregate |
How to Obtain
-- Aggregated from PARTITIONS (most accurate row counts)
SELECT
table_name,
COUNT(*) AS partition_count,
SUM(total_rows) AS total_rows,
SUM(total_logical_bytes) AS total_logical_bytes,
SUM(total_billable_bytes) AS total_billable_bytes,
MAX(last_modified_time) AS last_modified_time
FROM `project`.`dataset`.INFORMATION_SCHEMA.PARTITIONS
GROUP BY table_name;
-- Detailed storage breakdown
SELECT
table_name,
total_logical_bytes,
active_logical_bytes,
long_term_logical_bytes,
total_physical_bytes,
active_physical_bytes,
long_term_physical_bytes,
time_travel_physical_bytes
FROM `project`.`dataset`.INFORMATION_SCHEMA.TABLE_STORAGE;
ADBC Spec Alignment
The ADBC GetStatistics() API (since v1.1.0) defines standard statistic keys for row count (key 6), null count (key 5), distinct count (key 1), etc. Row count and byte sizes would map naturally to this API.
10. Dataset Properties
Motivation
Datasets (schemas) in BigQuery carry important configuration metadata — location, default expiration, labels, access controls, and encryption settings. Tools need this for environment documentation and administrative workflows.
Required Metadata
| Property |
Description |
| Dataset name |
Dataset identifier |
| Description |
User-provided description |
| Location |
GCP region (e.g., US, EU, us-central1) |
| Creation time |
When the dataset was created |
| Last modified time |
Last configuration change |
| Default table expiration |
Auto-delete interval for new tables |
| Default partition expiration |
Auto-delete interval for partitions |
| Labels |
Key-value labels |
| Default collation |
Default collation for new string columns |
| Max time travel hours |
Time travel window (48-168 hours) |
| Storage billing model |
LOGICAL or PHYSICAL |
| Is case insensitive |
Case sensitivity setting |
How to Obtain
Option A: INFORMATION_SCHEMA
SELECT *
FROM `project`.INFORMATION_SCHEMA.SCHEMATA
WHERE schema_name = 'dataset_name';
SELECT *
FROM `project`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
WHERE schema_name = 'dataset_name';
Option B: BigQuery API
GET https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}
The API returns the complete dataset resource including access controls and encryption configuration.
11. Extended External Table Properties
Motivation
External tables in BigQuery connect to data stored outside BigQuery (Cloud Storage, Google Sheets, Bigtable, etc.). Tools need to display the source configuration so users understand where data comes from and how it's parsed.
Required Metadata
| Property |
Description |
| Source format |
AVRO, CSV, DATASTORE_BACKUP, GOOGLE_SHEETS, JSON, ORC, PARQUET, ICEBERG, DELTA |
| Source URIs |
Array of Cloud Storage URIs or Sheet URLs |
| Compression |
GZIP, SNAPPY, etc. |
| Encoding |
UTF-8, ISO-8859-1 |
| Field delimiter |
CSV field delimiter |
| Skip leading rows |
Number of header rows to skip |
| Quote character |
CSV quote character |
| Allow jagged rows |
Whether ragged CSV rows are allowed |
| Allow quoted newlines |
Whether newlines in quoted fields are allowed |
| Max bad records |
Error tolerance threshold |
| Null marker |
String representing NULL |
| Hive partition URI prefix |
Hive-style partition detection |
| Require hive partition filter |
Whether partition filter is required |
| Sheet range |
Google Sheets range (if applicable) |
| JSON extension |
GEOJSON, etc. |
| Enable logical types |
Whether Avro/Parquet logical types are used |
| Decimal target types |
How decimals are mapped |
How to Obtain
SELECT
t.table_name,
o.option_name,
o.option_value
FROM
`project`.`dataset`.INFORMATION_SCHEMA.TABLES t
JOIN `project`.`dataset`.INFORMATION_SCHEMA.TABLE_OPTIONS o
ON o.table_name = t.table_name
WHERE
t.table_type = 'EXTERNAL'
ORDER BY
t.table_name, o.option_name;
Key option names include: format, uris, compression, encoding, field_delimiter, skip_leading_rows, quote, allow_jagged_rows, allow_quoted_newlines, max_bad_records, null_marker, hive_partition_uri_prefix, require_hive_partition_filter, sheet_range, json_extension, enable_logical_types, decimal_target_types.
12. Extended Materialized View Properties
Motivation
Materialized views in BigQuery auto-refresh and can significantly reduce query costs. Tools need to display refresh configuration and status.
Required Metadata
| Property |
Description |
Source |
| View definition |
The SELECT query |
INFORMATION_SCHEMA.MATERIALIZED_VIEWS.definition |
| Enable refresh |
Whether auto-refresh is on |
TABLE_OPTIONS enable_refresh |
| Refresh interval |
Minutes between refreshes |
TABLE_OPTIONS refresh_interval_minutes |
| Last refresh time |
When the MV was last refreshed |
INFORMATION_SCHEMA.MATERIALIZED_VIEWS.last_refresh_time |
| Refresh watermark |
Freshness of the MV data |
INFORMATION_SCHEMA.MATERIALIZED_VIEWS.refresh_watermark |
How to Obtain
SELECT
table_name,
definition,
last_refresh_time,
refresh_watermark
FROM
`project`.`dataset`.INFORMATION_SCHEMA.MATERIALIZED_VIEWS;
Combined with TABLE_OPTIONS for refresh configuration.
13. Table Options (Labels, Expiration, Encryption)
Motivation
BigQuery tables carry rich configuration via options. Tools need to display these for documentation, compliance auditing, and operational awareness.
Required Metadata
| Property |
Description |
Option Name |
| Description |
Table description |
description |
| Expiration timestamp |
When the table auto-deletes |
expiration_timestamp |
| Partition expiration days |
When partitions auto-delete |
partition_expiration_days |
| Require partition filter |
Whether queries must filter on partition |
require_partition_filter |
| KMS key name |
Cloud KMS encryption key |
kms_key_name |
| Friendly name |
Display name |
friendly_name |
| Labels |
Key-value labels |
labels |
| Default rounding mode |
Rounding for NUMERIC/BIGNUMERIC |
default_rounding_mode |
| Max staleness |
BI Engine staleness tolerance |
max_staleness |
How to Obtain
SELECT
table_name,
option_name,
option_type,
option_value
FROM
`project`.`dataset`.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE
table_name = 'table_name';
14. Row-Level Security Policies
Motivation
BigQuery supports row-level access policies that filter rows based on the querying user's identity. Tools that display security configuration need to surface these policies.
Required Metadata
| Property |
Description |
Source Column |
| Policy name |
Row policy identifier |
row_access_policy_name |
| Table name |
Table the policy applies to |
table_name |
| Filter predicate |
SQL expression controlling access |
filter_predicate |
| Grantees |
Users/groups/domains granted access |
grantee_list |
| Created |
Creation timestamp |
creation_time |
How to Obtain
SELECT
row_access_policy_name,
table_catalog,
table_schema,
table_name,
filter_predicate,
grantee_list,
creation_time
FROM
`project`.`dataset`.INFORMATION_SCHEMA.ROW_ACCESS_POLICIES;
15. Column-Level Security (Policy Tags)
Motivation
BigQuery supports column-level security via Data Catalog policy tags. Columns can be tagged with policies that restrict access. Tools should indicate which columns have security policies applied.
Required Metadata
| Property |
Description |
Source Column |
| Table name |
Parent table |
table_name |
| Column name |
Protected column |
column_name |
| Policy tag |
Data Catalog policy tag resource name |
policy_tag |
How to Obtain
SELECT
table_name,
column_name,
field_path,
policy_tag
FROM
`project`.`dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
policy_tag IS NOT NULL;
16. Authorized Views & Datasets
Motivation
BigQuery supports authorized views and authorized datasets — views/datasets that can access data in other datasets regardless of the querying user's permissions. This is a key security pattern. Tools should indicate which views are authorized.
Required Metadata
For authorized views on a dataset:
| Property |
Description |
| Source dataset |
The dataset granting access |
| Authorized view |
Project.dataset.view that has been authorized |
For authorized datasets:
| Property |
Description |
| Source dataset |
The dataset granting access |
| Authorized dataset |
Project.dataset that has been authorized |
How to Obtain
This metadata is available via the BigQuery API on the dataset resource's access field:
GET https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}
The response includes an access array with entries of type view (for authorized views) and dataset (for authorized datasets).
17. Search Indexes
Motivation
BigQuery supports search indexes for efficient text search across STRING, JSON, and ARRAY columns. Tools should display which tables have search indexes and their configuration.
Required Metadata
| Property |
Description |
Source Column |
| Index name |
Search index identifier |
index_name |
| Table name |
Indexed table |
table_name |
| Index columns |
Columns included in the index |
index_columns |
| Index type |
SEARCH |
index_type |
| Index status |
ACTIVE, PENDING, DISABLED, etc. |
index_status |
| Created |
Creation timestamp |
creation_time |
| Last modified |
Last modification timestamp |
last_modified_time |
| Coverage percentage |
Percentage of table data indexed |
coverage_percentage |
How to Obtain
SELECT
table_name,
index_name,
index_columns,
index_type,
index_status,
creation_time,
last_modified_time,
coverage_percentage
FROM
`project`.`dataset`.INFORMATION_SCHEMA.SEARCH_INDEXES;
Summary Priority Table
| # |
Category |
BigQuery Source |
Impact |
| 1 |
Stored Procedures |
INFORMATION_SCHEMA.ROUTINES |
High — core SQL development |
| 2 |
Scalar Functions (UDFs) |
INFORMATION_SCHEMA.ROUTINES |
High — autocomplete, documentation |
| 3 |
Table Functions |
INFORMATION_SCHEMA.ROUTINES |
High — SQL development |
| 4 |
Routine Parameters |
INFORMATION_SCHEMA.PARAMETERS |
High — signature help |
| 5 |
ML Models |
INFORMATION_SCHEMA.ML_MODELS / BigQuery API |
Medium — BQML users |
| 6 |
Partition Metadata |
INFORMATION_SCHEMA.PARTITIONS |
High — cost & performance |
| 7 |
DDL Retrieval |
INFORMATION_SCHEMA.TABLES.ddl / ROUTINES.ddl |
High — migration, documentation |
| 8 |
Reverse Foreign Keys |
INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE |
Medium — impact analysis |
| 9 |
Extended Table Statistics |
INFORMATION_SCHEMA.PARTITIONS / TABLE_STORAGE |
Medium — cost monitoring |
| 10 |
Dataset Properties |
INFORMATION_SCHEMA.SCHEMATA / BigQuery API |
Medium — admin context |
| 11 |
External Table Properties |
INFORMATION_SCHEMA.TABLE_OPTIONS |
Medium — data source config |
| 12 |
Materialized View Properties |
INFORMATION_SCHEMA.MATERIALIZED_VIEWS |
Medium — refresh status |
| 13 |
Table Options |
INFORMATION_SCHEMA.TABLE_OPTIONS |
Medium — configuration display |
| 14 |
Row-Level Security Policies |
INFORMATION_SCHEMA.ROW_ACCESS_POLICIES |
Low — security audit |
| 15 |
Column-Level Security |
INFORMATION_SCHEMA.COLUMN_FIELD_PATHS |
Low — security audit |
| 16 |
Authorized Views & Datasets |
BigQuery API (dataset access field) |
Low — security audit |
| 17 |
Search Indexes |
INFORMATION_SCHEMA.SEARCH_INDEXES |
Low — index awareness |
We're happy to provide further details on any of these categories.
Summary
The ADBC BigQuery driver currently supports discovery of projects, datasets, tables (including views, materialized views, external tables, clones, snapshots), columns, and primary/foreign key constraints. However, BigQuery has additional object types and metadata that data development tools need — routines (functions and procedures), ML models, partition details, DDL, reverse foreign keys, and richer table/dataset properties. This issue requests expanding metadata discovery to cover these gaps.
We build a collaborative data operations platform and currently use the BigQuery JDBC driver supplemented by
INFORMATION_SCHEMAqueries and the Google Cloud BigQuery Java API to extract metadata. We are evaluating the ADBC driver as a modern Arrow-native replacement but cannot adopt it without coverage for the object categories listed below.Each section describes what we need, why it matters, what metadata properties are required, and how the data can be obtained from BigQuery.
Table of Contents
1. Stored Procedures
Motivation
Stored procedures are a core part of BigQuery's programmability model. They support SQL and JavaScript, can accept IN/OUT/INOUT parameters, and are used extensively for ETL orchestration and data transformations. Data development tools need to enumerate them for catalog browsing, autocomplete, and documentation.
Required Metadata
routine_catalogroutine_schemaroutine_nameroutine_typeroutine_bodyexternal_languageddlis_deterministiccreatedlast_alteredsecurity_typeHow to Obtain
Options (description, library URIs) are available from:
2. Scalar Functions (UDFs)
Motivation
User-defined scalar functions are used extensively in BigQuery for reusable transformations. Tools need to discover them for autocomplete, signature help, and catalog browsing.
Required Metadata
routine_catalogroutine_schemaroutine_nameroutine_typedata_typeroutine_bodyexternal_languageddlis_deterministiccreatedlast_alteredHow to Obtain
3. Table Functions
Motivation
Table-valued functions return tabular results and are used in FROM clauses. They need to be distinguished from scalar functions because they have a result column schema rather than a single return type.
Required Metadata
Same as scalar functions (section 2), plus:
How to Obtain
Return column schema is embedded in the
ddlcolumn and can also be inferred fromINFORMATION_SCHEMA.PARAMETERSentries withordinal_position = 0or from the routine definition.4. Routine Parameters
Motivation
Parameter metadata enables signature help, call template generation, and documentation. Without parameter names and types, tools can only show the routine name.
Required Metadata
specific_nameparameter_nameordinal_positionparameter_modedata_typeparameter_mode = 'OUT'andordinal_position = 0How to Obtain
5. ML Models
Motivation
BigQuery ML is a core BigQuery feature that allows users to create, train, and deploy machine learning models using SQL. Models are first-class dataset objects. Tools need to enumerate them for catalog browsing and display model properties.
Required Metadata
model_namemodel_typedescriptioncreation_timemodified_timemodel_expiration_timelabelsfeature_columnslabel_columnstraining_runslocationencryption_configurationdefault_trial_idHow to Obtain
Option A: INFORMATION_SCHEMA
Option B: BigQuery API
The API returns the full model resource including training runs, evaluation metrics, and hyperparameters.
Note
ML models are not traditional tables and don't fit naturally into the
GetObjectstable hierarchy. Consider either:GetModels()methodGetObjectswith a new sibling todb_schema_tables6. Partition Metadata
Motivation
BigQuery tables are commonly partitioned by time or integer range. Users need per-partition statistics for:
Required Metadata
table_namepartition_idtotal_rowstotal_logical_bytestotal_billable_byteslast_modified_timestorage_tierHow to Obtain
Aggregated Table Statistics from Partitions
7. DDL / CREATE Statement Retrieval
Motivation
DDL captures the complete definition of any BigQuery object — tables, views, materialized views, external tables, functions, procedures, and models. It is essential for:
Required Metadata
How to Obtain
BigQuery uniquely provides DDL as a column in
INFORMATION_SCHEMA:Suggested Approach
The simplest integration is to include the
ddlcolumn from existingINFORMATION_SCHEMA.TABLESandINFORMATION_SCHEMA.ROUTINESqueries and expose it as an additional field inGetObjects(e.g.,table_ddlinTABLE_SCHEMA).8. Reverse Foreign Keys (Referenced By)
Motivation
When inspecting a table, users need to see not only which tables it references (foreign keys) but also which tables reference it (reverse foreign keys / "referenced by"). This is essential for impact analysis — understanding what downstream tables would be affected by changes.
Required Metadata
How to Obtain
Note
This uses the same
INFORMATION_SCHEMAtables already queried for forward foreign keys. It could be exposed as an additional constraint type or as a separate field on the table.9. Extended Table Statistics
Motivation
The driver currently exposes
num_rowsfrom table metadata. Tools need richer storage statistics for cost monitoring, capacity planning, and performance optimization.Required Metadata
PARTITIONSaggregate ornum_rowsPARTITIONSaggregatePARTITIONSaggregateTABLE_STORAGETABLE_STORAGETABLE_STORAGETABLE_STORAGEPARTITIONSaggregatePARTITIONSaggregateHow to Obtain
ADBC Spec Alignment
The ADBC
GetStatistics()API (since v1.1.0) defines standard statistic keys for row count (key 6), null count (key 5), distinct count (key 1), etc. Row count and byte sizes would map naturally to this API.10. Dataset Properties
Motivation
Datasets (schemas) in BigQuery carry important configuration metadata — location, default expiration, labels, access controls, and encryption settings. Tools need this for environment documentation and administrative workflows.
Required Metadata
How to Obtain
Option A: INFORMATION_SCHEMA
Option B: BigQuery API
The API returns the complete dataset resource including access controls and encryption configuration.
11. Extended External Table Properties
Motivation
External tables in BigQuery connect to data stored outside BigQuery (Cloud Storage, Google Sheets, Bigtable, etc.). Tools need to display the source configuration so users understand where data comes from and how it's parsed.
Required Metadata
How to Obtain
Key option names include:
format,uris,compression,encoding,field_delimiter,skip_leading_rows,quote,allow_jagged_rows,allow_quoted_newlines,max_bad_records,null_marker,hive_partition_uri_prefix,require_hive_partition_filter,sheet_range,json_extension,enable_logical_types,decimal_target_types.12. Extended Materialized View Properties
Motivation
Materialized views in BigQuery auto-refresh and can significantly reduce query costs. Tools need to display refresh configuration and status.
Required Metadata
INFORMATION_SCHEMA.MATERIALIZED_VIEWS.definitionenable_refreshrefresh_interval_minutesINFORMATION_SCHEMA.MATERIALIZED_VIEWS.last_refresh_timeINFORMATION_SCHEMA.MATERIALIZED_VIEWS.refresh_watermarkHow to Obtain
Combined with TABLE_OPTIONS for refresh configuration.
13. Table Options (Labels, Expiration, Encryption)
Motivation
BigQuery tables carry rich configuration via options. Tools need to display these for documentation, compliance auditing, and operational awareness.
Required Metadata
descriptionexpiration_timestamppartition_expiration_daysrequire_partition_filterkms_key_namefriendly_namelabelsdefault_rounding_modemax_stalenessHow to Obtain
14. Row-Level Security Policies
Motivation
BigQuery supports row-level access policies that filter rows based on the querying user's identity. Tools that display security configuration need to surface these policies.
Required Metadata
row_access_policy_nametable_namefilter_predicategrantee_listcreation_timeHow to Obtain
15. Column-Level Security (Policy Tags)
Motivation
BigQuery supports column-level security via Data Catalog policy tags. Columns can be tagged with policies that restrict access. Tools should indicate which columns have security policies applied.
Required Metadata
table_namecolumn_namepolicy_tagHow to Obtain
16. Authorized Views & Datasets
Motivation
BigQuery supports authorized views and authorized datasets — views/datasets that can access data in other datasets regardless of the querying user's permissions. This is a key security pattern. Tools should indicate which views are authorized.
Required Metadata
For authorized views on a dataset:
For authorized datasets:
How to Obtain
This metadata is available via the BigQuery API on the dataset resource's
accessfield:The response includes an
accessarray with entries of typeview(for authorized views) anddataset(for authorized datasets).17. Search Indexes
Motivation
BigQuery supports search indexes for efficient text search across STRING, JSON, and ARRAY columns. Tools should display which tables have search indexes and their configuration.
Required Metadata
index_nametable_nameindex_columnsindex_typeindex_statuscreation_timelast_modified_timecoverage_percentageHow to Obtain
Summary Priority Table
INFORMATION_SCHEMA.ROUTINESINFORMATION_SCHEMA.ROUTINESINFORMATION_SCHEMA.ROUTINESINFORMATION_SCHEMA.PARAMETERSINFORMATION_SCHEMA.ML_MODELS/ BigQuery APIINFORMATION_SCHEMA.PARTITIONSINFORMATION_SCHEMA.TABLES.ddl/ROUTINES.ddlINFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGEINFORMATION_SCHEMA.PARTITIONS/TABLE_STORAGEINFORMATION_SCHEMA.SCHEMATA/ BigQuery APIINFORMATION_SCHEMA.TABLE_OPTIONSINFORMATION_SCHEMA.MATERIALIZED_VIEWSINFORMATION_SCHEMA.TABLE_OPTIONSINFORMATION_SCHEMA.ROW_ACCESS_POLICIESINFORMATION_SCHEMA.COLUMN_FIELD_PATHSINFORMATION_SCHEMA.SEARCH_INDEXESWe're happy to provide further details on any of these categories.