Feature Request: Comprehensive Metadata Discovery for Data Development Tools

## Summary

The ADBC BigQuery driver currently supports discovery of projects, datasets, tables (including views, materialized views, external tables, clones, snapshots), columns, and primary/foreign key constraints. However, BigQuery has additional object types and metadata that data development tools need — routines (functions and procedures), ML models, partition details, DDL, reverse foreign keys, and richer table/dataset properties. This issue requests expanding metadata discovery to cover these gaps.

We build a collaborative data operations platform and currently use the BigQuery JDBC driver supplemented by `INFORMATION_SCHEMA` queries and the Google Cloud BigQuery Java API to extract metadata. We are evaluating the ADBC driver as a modern Arrow-native replacement but cannot adopt it without coverage for the object categories listed below.

Each section describes what we need, why it matters, what metadata properties are required, and how the data can be obtained from BigQuery.

---

## Table of Contents

1. [Stored Procedures](#1-stored-procedures)
2. [Scalar Functions (UDFs)](#2-scalar-functions-udfs)
3. [Table Functions](#3-table-functions)
4. [Routine Parameters](#4-routine-parameters)
5. [ML Models](#5-ml-models)
6. [Partition Metadata](#6-partition-metadata)
7. [DDL / CREATE Statement Retrieval](#7-ddl--create-statement-retrieval)
8. [Reverse Foreign Keys (Referenced By)](#8-reverse-foreign-keys-referenced-by)
9. [Extended Table Statistics](#9-extended-table-statistics)
10. [Dataset Properties](#10-dataset-properties)
11. [Extended External Table Properties](#11-extended-external-table-properties)
12. [Extended Materialized View Properties](#12-extended-materialized-view-properties)
13. [Table Options (Labels, Expiration, Encryption)](#13-table-options-labels-expiration-encryption)
14. [Row-Level Security Policies](#14-row-level-security-policies)
15. [Column-Level Security (Policy Tags)](#15-column-level-security-policy-tags)
16. [Authorized Views & Datasets](#16-authorized-views--datasets)
17. [Search Indexes](#17-search-indexes)

---

## 1. Stored Procedures

### Motivation

Stored procedures are a core part of BigQuery's programmability model. They support SQL and JavaScript, can accept IN/OUT/INOUT parameters, and are used extensively for ETL orchestration and data transformations. Data development tools need to enumerate them for catalog browsing, autocomplete, and documentation.

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Routine catalog | Project containing the procedure | `routine_catalog` |
| Routine schema | Dataset containing the procedure | `routine_schema` |
| Routine name | Procedure identifier | `routine_name` |
| Routine type | PROCEDURE | `routine_type` |
| Routine body | SQL or EXTERNAL | `routine_body` |
| External language | JAVASCRIPT (if external) | `external_language` |
| DDL | Complete CREATE PROCEDURE statement | `ddl` |
| Description | User-provided description | From ROUTINE_OPTIONS |
| Is deterministic | Whether the procedure is deterministic | `is_deterministic` |
| Created | Creation timestamp | `created` |
| Last altered | Last modification timestamp | `last_altered` |
| Security type | INVOKER or DEFINER | `security_type` |

### How to Obtain

```sql
SELECT
    r.routine_catalog,
    r.routine_schema,
    r.routine_name,
    r.routine_type,
    r.routine_body,
    r.external_language,
    r.is_deterministic,
    r.security_type,
    r.created,
    r.last_altered,
    r.ddl
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.ROUTINES r
WHERE
    r.routine_type = 'PROCEDURE'
ORDER BY
    r.routine_name;
```

Options (description, library URIs) are available from:
```sql
SELECT option_name, option_value
FROM `project`.`dataset`.INFORMATION_SCHEMA.ROUTINE_OPTIONS
WHERE specific_name = 'procedure_name';
```

---

## 2. Scalar Functions (UDFs)

### Motivation

User-defined scalar functions are used extensively in BigQuery for reusable transformations. Tools need to discover them for autocomplete, signature help, and catalog browsing.

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Routine catalog | Project containing the function | `routine_catalog` |
| Routine schema | Dataset containing the function | `routine_schema` |
| Routine name | Function identifier | `routine_name` |
| Routine type | FUNCTION | `routine_type` |
| Return data type | Function return type | `data_type` |
| Routine body | SQL or EXTERNAL | `routine_body` |
| External language | JAVASCRIPT (if external) | `external_language` |
| DDL | Complete CREATE FUNCTION statement | `ddl` |
| Description | User-provided description | From ROUTINE_OPTIONS |
| Is deterministic | Whether the function is deterministic | `is_deterministic` |
| Created | Creation timestamp | `created` |
| Last altered | Last modification timestamp | `last_altered` |

### How to Obtain

```sql
SELECT
    r.routine_catalog,
    r.routine_schema,
    r.routine_name,
    r.routine_type,
    r.data_type,
    r.routine_body,
    r.routine_definition,
    r.external_language,
    r.is_deterministic,
    r.created,
    r.last_altered,
    r.ddl
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.ROUTINES r
WHERE
    r.routine_type = 'FUNCTION'
ORDER BY
    r.routine_name;
```

---

## 3. Table Functions

### Motivation

Table-valued functions return tabular results and are used in FROM clauses. They need to be distinguished from scalar functions because they have a result column schema rather than a single return type.

### Required Metadata

Same as scalar functions (section 2), plus:

| Property | Description |
|---|---|
| Return columns | List of result column names and types |

### How to Obtain

```sql
-- Identify table functions
SELECT *
FROM `project`.`dataset`.INFORMATION_SCHEMA.ROUTINES r
WHERE r.routine_type = 'TABLE FUNCTION';
```

Return column schema is embedded in the `ddl` column and can also be inferred from `INFORMATION_SCHEMA.PARAMETERS` entries with `ordinal_position = 0` or from the routine definition.

---

## 4. Routine Parameters

### Motivation

Parameter metadata enables signature help, call template generation, and documentation. Without parameter names and types, tools can only show the routine name.

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Routine name | Parent routine | `specific_name` |
| Parameter name | Parameter identifier | `parameter_name` |
| Ordinal position | 1-based position | `ordinal_position` |
| Parameter mode | IN, OUT, INOUT | `parameter_mode` |
| Data type | BigQuery type name | `data_type` |
| Is result | Whether this describes the return type | `parameter_mode = 'OUT'` and `ordinal_position = 0` |

### How to Obtain

```sql
SELECT
    p.specific_catalog,
    p.specific_schema,
    p.specific_name,
    p.ordinal_position,
    p.parameter_name,
    p.parameter_mode,
    p.data_type
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.PARAMETERS p
WHERE
    p.ordinal_position > 0  -- exclude return type entry
ORDER BY
    p.specific_name,
    p.ordinal_position;
```

---

## 5. ML Models

### Motivation

BigQuery ML is a core BigQuery feature that allows users to create, train, and deploy machine learning models using SQL. Models are first-class dataset objects. Tools need to enumerate them for catalog browsing and display model properties.

### Required Metadata

| Property | Description | Source |
|---|---|---|
| Model name | Model identifier | `model_name` |
| Model type | LINEAR_REG, LOGISTIC_REG, KMEANS, BOOSTED_TREE, DNN_CLASSIFIER, ARIMA_PLUS, etc. | `model_type` |
| Description | User-provided description | `description` |
| Created | Creation timestamp | `creation_time` |
| Last modified | Last modification timestamp | `modified_time` |
| Expiration | Model expiration timestamp | `model_expiration_time` |
| Labels | Key-value labels | `labels` |
| Feature columns | Input feature column names and types | `feature_columns` |
| Label columns | Target label columns | `label_columns` |
| Training runs | List of training executions with timestamps | `training_runs` |
| Location | GCP region | `location` |
| Encryption | KMS key configuration | `encryption_configuration` |
| Default trial ID | Default trial for hyperparameter tuning | `default_trial_id` |

### How to Obtain

**Option A: INFORMATION_SCHEMA**
```sql
SELECT
    model_name,
    model_type,
    schema_name,
    creation_time,
    modified_time,
    model_expiration_time,
    description,
    labels,
    feature_columns,
    label_columns,
    training_runs,
    location,
    encryption_configuration,
    default_trial_id
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.ML_MODELS;
```

**Option B: BigQuery API**
```
GET https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}/models/{model}
```

The API returns the full model resource including training runs, evaluation metrics, and hyperparameters.

### Note

ML models are not traditional tables and don't fit naturally into the `GetObjects` table hierarchy. Consider either:
1. Including them as a special table type (e.g., "MODEL")
2. Providing a separate `GetModels()` method
3. Extending `GetObjects` with a new sibling to `db_schema_tables`

---

## 6. Partition Metadata

### Motivation

BigQuery tables are commonly partitioned by time or integer range. Users need per-partition statistics for:
- Understanding data distribution and freshness
- Writing partition-pruning queries
- Monitoring storage costs (active vs long-term tiers)
- Identifying stale or oversized partitions

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Table name | Parent table | `table_name` |
| Partition ID | Partition identifier (e.g., "20240101", "__NULL__", "__UNPARTITIONED__") | `partition_id` |
| Total rows | Row count in partition | `total_rows` |
| Total logical bytes | Logical storage size | `total_logical_bytes` |
| Total billable bytes | Billable storage size | `total_billable_bytes` |
| Last modified time | Last data modification | `last_modified_time` |
| Storage tier | ACTIVE or LONG_TERM | `storage_tier` |

### How to Obtain

```sql
SELECT
    p.table_name,
    p.partition_id,
    p.total_rows,
    p.total_logical_bytes,
    p.total_billable_bytes,
    p.last_modified_time,
    p.storage_tier
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.PARTITIONS p
WHERE
    p.table_name = 'table_name'
ORDER BY
    p.partition_id;
```

### Aggregated Table Statistics from Partitions

```sql
SELECT
    p.table_name,
    COUNT(p.partition_id) AS partition_count,
    SUM(p.total_rows) AS total_rows,
    SUM(p.total_logical_bytes) AS total_logical_bytes,
    SUM(p.total_billable_bytes) AS total_billable_bytes,
    MAX(p.last_modified_time) AS last_modified_time
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.PARTITIONS p
GROUP BY
    p.table_name;
```

---

## 7. DDL / CREATE Statement Retrieval

### Motivation

DDL captures the complete definition of any BigQuery object — tables, views, materialized views, external tables, functions, procedures, and models. It is essential for:
- Schema documentation and version control
- Environment migration (dev to staging to production)
- Object recreation and cloning
- Understanding full object configuration (partitioning, clustering, options)

### Required Metadata

| Property | Description |
|---|---|
| Object name | Fully qualified object name |
| Object type | TABLE, VIEW, MATERIALIZED VIEW, FUNCTION, PROCEDURE, etc. |
| DDL text | Complete CREATE statement |

### How to Obtain

BigQuery uniquely provides DDL as a column in `INFORMATION_SCHEMA`:

```sql
-- Tables, views, materialized views, external tables
SELECT table_name, table_type, ddl
FROM `project`.`dataset`.INFORMATION_SCHEMA.TABLES;

-- Routines (functions, procedures)
SELECT routine_name, routine_type, ddl
FROM `project`.`dataset`.INFORMATION_SCHEMA.ROUTINES;
```

### Suggested Approach

The simplest integration is to include the `ddl` column from existing `INFORMATION_SCHEMA.TABLES` and `INFORMATION_SCHEMA.ROUTINES` queries and expose it as an additional field in `GetObjects` (e.g., `table_ddl` in `TABLE_SCHEMA`).

---

## 8. Reverse Foreign Keys (Referenced By)

### Motivation

When inspecting a table, users need to see not only which tables it references (foreign keys) but also which tables reference it (reverse foreign keys / "referenced by"). This is essential for impact analysis — understanding what downstream tables would be affected by changes.

### Required Metadata

| Property | Description |
|---|---|
| Referencing table | Table that holds the foreign key |
| FK constraint name | Foreign key constraint name |
| Referencing columns | Columns in the referencing table |
| Referenced columns | Columns in this table (the target) |

### How to Obtain

```sql
SELECT
    constr.table_name AS referencing_table,
    constr.constraint_name,
    col.column_name AS referencing_column,
    pk_col.column_name AS referenced_column
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE u
    INNER JOIN `project`.`dataset`.INFORMATION_SCHEMA.TABLE_CONSTRAINTS constr
        ON constr.constraint_name = u.constraint_name
        AND constr.constraint_type = 'FOREIGN KEY'
    INNER JOIN `project`.`dataset`.INFORMATION_SCHEMA.KEY_COLUMN_USAGE col
        ON col.constraint_name = constr.constraint_name
    INNER JOIN `project`.`dataset`.INFORMATION_SCHEMA.KEY_COLUMN_USAGE pk_col
        ON pk_col.table_name = 'referenced_table_name'
        AND pk_col.ordinal_position = col.position_in_unique_constraint
WHERE
    TRIM(u.table_name) = 'referenced_table_name'
ORDER BY
    constr.constraint_name,
    col.ordinal_position;
```

### Note

This uses the same `INFORMATION_SCHEMA` tables already queried for forward foreign keys. It could be exposed as an additional constraint type or as a separate field on the table.

---

## 9. Extended Table Statistics

### Motivation

The driver currently exposes `num_rows` from table metadata. Tools need richer storage statistics for cost monitoring, capacity planning, and performance optimization.

### Required Metadata

| Property | Description | Source |
|---|---|---|
| Total rows | Row count | `PARTITIONS` aggregate or `num_rows` |
| Total logical bytes | Logical storage size | `PARTITIONS` aggregate |
| Total billable bytes | Billable storage size | `PARTITIONS` aggregate |
| Total physical bytes | Physical storage including compression | `TABLE_STORAGE` |
| Active logical bytes | Recently modified data (<90 days) | `TABLE_STORAGE` |
| Long-term logical bytes | Data unmodified >90 days | `TABLE_STORAGE` |
| Time travel physical bytes | Time travel window storage | `TABLE_STORAGE` |
| Partition count | Number of partitions | `PARTITIONS` aggregate |
| Last modified time | Most recent data change | `PARTITIONS` aggregate |

### How to Obtain

```sql
-- Aggregated from PARTITIONS (most accurate row counts)
SELECT
    table_name,
    COUNT(*) AS partition_count,
    SUM(total_rows) AS total_rows,
    SUM(total_logical_bytes) AS total_logical_bytes,
    SUM(total_billable_bytes) AS total_billable_bytes,
    MAX(last_modified_time) AS last_modified_time
FROM `project`.`dataset`.INFORMATION_SCHEMA.PARTITIONS
GROUP BY table_name;

-- Detailed storage breakdown
SELECT
    table_name,
    total_logical_bytes,
    active_logical_bytes,
    long_term_logical_bytes,
    total_physical_bytes,
    active_physical_bytes,
    long_term_physical_bytes,
    time_travel_physical_bytes
FROM `project`.`dataset`.INFORMATION_SCHEMA.TABLE_STORAGE;
```

### ADBC Spec Alignment

The ADBC `GetStatistics()` API (since v1.1.0) defines standard statistic keys for row count (key 6), null count (key 5), distinct count (key 1), etc. Row count and byte sizes would map naturally to this API.

---

## 10. Dataset Properties

### Motivation

Datasets (schemas) in BigQuery carry important configuration metadata — location, default expiration, labels, access controls, and encryption settings. Tools need this for environment documentation and administrative workflows.

### Required Metadata

| Property | Description |
|---|---|
| Dataset name | Dataset identifier |
| Description | User-provided description |
| Location | GCP region (e.g., US, EU, us-central1) |
| Creation time | When the dataset was created |
| Last modified time | Last configuration change |
| Default table expiration | Auto-delete interval for new tables |
| Default partition expiration | Auto-delete interval for partitions |
| Labels | Key-value labels |
| Default collation | Default collation for new string columns |
| Max time travel hours | Time travel window (48-168 hours) |
| Storage billing model | LOGICAL or PHYSICAL |
| Is case insensitive | Case sensitivity setting |

### How to Obtain

**Option A: INFORMATION_SCHEMA**
```sql
SELECT *
FROM `project`.INFORMATION_SCHEMA.SCHEMATA
WHERE schema_name = 'dataset_name';

SELECT *
FROM `project`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
WHERE schema_name = 'dataset_name';
```

**Option B: BigQuery API**
```
GET https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}
```

The API returns the complete dataset resource including access controls and encryption configuration.

---

## 11. Extended External Table Properties

### Motivation

External tables in BigQuery connect to data stored outside BigQuery (Cloud Storage, Google Sheets, Bigtable, etc.). Tools need to display the source configuration so users understand where data comes from and how it's parsed.

### Required Metadata

| Property | Description |
|---|---|
| Source format | AVRO, CSV, DATASTORE_BACKUP, GOOGLE_SHEETS, JSON, ORC, PARQUET, ICEBERG, DELTA |
| Source URIs | Array of Cloud Storage URIs or Sheet URLs |
| Compression | GZIP, SNAPPY, etc. |
| Encoding | UTF-8, ISO-8859-1 |
| Field delimiter | CSV field delimiter |
| Skip leading rows | Number of header rows to skip |
| Quote character | CSV quote character |
| Allow jagged rows | Whether ragged CSV rows are allowed |
| Allow quoted newlines | Whether newlines in quoted fields are allowed |
| Max bad records | Error tolerance threshold |
| Null marker | String representing NULL |
| Hive partition URI prefix | Hive-style partition detection |
| Require hive partition filter | Whether partition filter is required |
| Sheet range | Google Sheets range (if applicable) |
| JSON extension | GEOJSON, etc. |
| Enable logical types | Whether Avro/Parquet logical types are used |
| Decimal target types | How decimals are mapped |

### How to Obtain

```sql
SELECT
    t.table_name,
    o.option_name,
    o.option_value
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.TABLES t
    JOIN `project`.`dataset`.INFORMATION_SCHEMA.TABLE_OPTIONS o
        ON o.table_name = t.table_name
WHERE
    t.table_type = 'EXTERNAL'
ORDER BY
    t.table_name, o.option_name;
```

Key option names include: `format`, `uris`, `compression`, `encoding`, `field_delimiter`, `skip_leading_rows`, `quote`, `allow_jagged_rows`, `allow_quoted_newlines`, `max_bad_records`, `null_marker`, `hive_partition_uri_prefix`, `require_hive_partition_filter`, `sheet_range`, `json_extension`, `enable_logical_types`, `decimal_target_types`.

---

## 12. Extended Materialized View Properties

### Motivation

Materialized views in BigQuery auto-refresh and can significantly reduce query costs. Tools need to display refresh configuration and status.

### Required Metadata

| Property | Description | Source |
|---|---|---|
| View definition | The SELECT query | `INFORMATION_SCHEMA.MATERIALIZED_VIEWS.definition` |
| Enable refresh | Whether auto-refresh is on | TABLE_OPTIONS `enable_refresh` |
| Refresh interval | Minutes between refreshes | TABLE_OPTIONS `refresh_interval_minutes` |
| Last refresh time | When the MV was last refreshed | `INFORMATION_SCHEMA.MATERIALIZED_VIEWS.last_refresh_time` |
| Refresh watermark | Freshness of the MV data | `INFORMATION_SCHEMA.MATERIALIZED_VIEWS.refresh_watermark` |

### How to Obtain

```sql
SELECT
    table_name,
    definition,
    last_refresh_time,
    refresh_watermark
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.MATERIALIZED_VIEWS;
```

Combined with TABLE_OPTIONS for refresh configuration.

---

## 13. Table Options (Labels, Expiration, Encryption)

### Motivation

BigQuery tables carry rich configuration via options. Tools need to display these for documentation, compliance auditing, and operational awareness.

### Required Metadata

| Property | Description | Option Name |
|---|---|---|
| Description | Table description | `description` |
| Expiration timestamp | When the table auto-deletes | `expiration_timestamp` |
| Partition expiration days | When partitions auto-delete | `partition_expiration_days` |
| Require partition filter | Whether queries must filter on partition | `require_partition_filter` |
| KMS key name | Cloud KMS encryption key | `kms_key_name` |
| Friendly name | Display name | `friendly_name` |
| Labels | Key-value labels | `labels` |
| Default rounding mode | Rounding for NUMERIC/BIGNUMERIC | `default_rounding_mode` |
| Max staleness | BI Engine staleness tolerance | `max_staleness` |

### How to Obtain

```sql
SELECT
    table_name,
    option_name,
    option_type,
    option_value
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE
    table_name = 'table_name';
```

---

## 14. Row-Level Security Policies

### Motivation

BigQuery supports row-level access policies that filter rows based on the querying user's identity. Tools that display security configuration need to surface these policies.

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Policy name | Row policy identifier | `row_access_policy_name` |
| Table name | Table the policy applies to | `table_name` |
| Filter predicate | SQL expression controlling access | `filter_predicate` |
| Grantees | Users/groups/domains granted access | `grantee_list` |
| Created | Creation timestamp | `creation_time` |

### How to Obtain

```sql
SELECT
    row_access_policy_name,
    table_catalog,
    table_schema,
    table_name,
    filter_predicate,
    grantee_list,
    creation_time
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.ROW_ACCESS_POLICIES;
```

---

## 15. Column-Level Security (Policy Tags)

### Motivation

BigQuery supports column-level security via Data Catalog policy tags. Columns can be tagged with policies that restrict access. Tools should indicate which columns have security policies applied.

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Table name | Parent table | `table_name` |
| Column name | Protected column | `column_name` |
| Policy tag | Data Catalog policy tag resource name | `policy_tag` |

### How to Obtain

```sql
SELECT
    table_name,
    column_name,
    field_path,
    policy_tag
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
    policy_tag IS NOT NULL;
```

---

## 16. Authorized Views & Datasets

### Motivation

BigQuery supports authorized views and authorized datasets — views/datasets that can access data in other datasets regardless of the querying user's permissions. This is a key security pattern. Tools should indicate which views are authorized.

### Required Metadata

For authorized views on a dataset:

| Property | Description |
|---|---|
| Source dataset | The dataset granting access |
| Authorized view | Project.dataset.view that has been authorized |

For authorized datasets:

| Property | Description |
|---|---|
| Source dataset | The dataset granting access |
| Authorized dataset | Project.dataset that has been authorized |

### How to Obtain

This metadata is available via the BigQuery API on the dataset resource's `access` field:

```
GET https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}
```

The response includes an `access` array with entries of type `view` (for authorized views) and `dataset` (for authorized datasets).

---

## 17. Search Indexes

### Motivation

BigQuery supports search indexes for efficient text search across STRING, JSON, and ARRAY columns. Tools should display which tables have search indexes and their configuration.

### Required Metadata

| Property | Description | Source Column |
|---|---|---|
| Index name | Search index identifier | `index_name` |
| Table name | Indexed table | `table_name` |
| Index columns | Columns included in the index | `index_columns` |
| Index type | SEARCH | `index_type` |
| Index status | ACTIVE, PENDING, DISABLED, etc. | `index_status` |
| Created | Creation timestamp | `creation_time` |
| Last modified | Last modification timestamp | `last_modified_time` |
| Coverage percentage | Percentage of table data indexed | `coverage_percentage` |

### How to Obtain

```sql
SELECT
    table_name,
    index_name,
    index_columns,
    index_type,
    index_status,
    creation_time,
    last_modified_time,
    coverage_percentage
FROM
    `project`.`dataset`.INFORMATION_SCHEMA.SEARCH_INDEXES;
```

---

## Summary Priority Table

| # | Category | BigQuery Source | Impact |
|---|---|---|---|
| 1 | Stored Procedures | `INFORMATION_SCHEMA.ROUTINES` | High — core SQL development |
| 2 | Scalar Functions (UDFs) | `INFORMATION_SCHEMA.ROUTINES` | High — autocomplete, documentation |
| 3 | Table Functions | `INFORMATION_SCHEMA.ROUTINES` | High — SQL development |
| 4 | Routine Parameters | `INFORMATION_SCHEMA.PARAMETERS` | High — signature help |
| 5 | ML Models | `INFORMATION_SCHEMA.ML_MODELS` / BigQuery API | Medium — BQML users |
| 6 | Partition Metadata | `INFORMATION_SCHEMA.PARTITIONS` | High — cost & performance |
| 7 | DDL Retrieval | `INFORMATION_SCHEMA.TABLES.ddl` / `ROUTINES.ddl` | High — migration, documentation |
| 8 | Reverse Foreign Keys | `INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE` | Medium — impact analysis |
| 9 | Extended Table Statistics | `INFORMATION_SCHEMA.PARTITIONS` / `TABLE_STORAGE` | Medium — cost monitoring |
| 10 | Dataset Properties | `INFORMATION_SCHEMA.SCHEMATA` / BigQuery API | Medium — admin context |
| 11 | External Table Properties | `INFORMATION_SCHEMA.TABLE_OPTIONS` | Medium — data source config |
| 12 | Materialized View Properties | `INFORMATION_SCHEMA.MATERIALIZED_VIEWS` | Medium — refresh status |
| 13 | Table Options | `INFORMATION_SCHEMA.TABLE_OPTIONS` | Medium — configuration display |
| 14 | Row-Level Security Policies | `INFORMATION_SCHEMA.ROW_ACCESS_POLICIES` | Low — security audit |
| 15 | Column-Level Security | `INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` | Low — security audit |
| 16 | Authorized Views & Datasets | BigQuery API (dataset access field) | Low — security audit |
| 17 | Search Indexes | `INFORMATION_SCHEMA.SEARCH_INDEXES` | Low — index awareness |

We're happy to provide further details on any of these categories.


Property	Description	Source Column
Routine catalog	Project containing the procedure	`routine_catalog`
Routine schema	Dataset containing the procedure	`routine_schema`
Routine name	Procedure identifier	`routine_name`
Routine type	PROCEDURE	`routine_type`
Routine body	SQL or EXTERNAL	`routine_body`
External language	JAVASCRIPT (if external)	`external_language`
DDL	Complete CREATE PROCEDURE statement	`ddl`
Description	User-provided description	From ROUTINE_OPTIONS
Is deterministic	Whether the procedure is deterministic	`is_deterministic`
Created	Creation timestamp	`created`
Last altered	Last modification timestamp	`last_altered`
Security type	INVOKER or DEFINER	`security_type`

Property	Description	Source Column
Routine catalog	Project containing the function	`routine_catalog`
Routine schema	Dataset containing the function	`routine_schema`
Routine name	Function identifier	`routine_name`
Routine type	FUNCTION	`routine_type`
Return data type	Function return type	`data_type`
Routine body	SQL or EXTERNAL	`routine_body`
External language	JAVASCRIPT (if external)	`external_language`
DDL	Complete CREATE FUNCTION statement	`ddl`
Description	User-provided description	From ROUTINE_OPTIONS
Is deterministic	Whether the function is deterministic	`is_deterministic`
Created	Creation timestamp	`created`
Last altered	Last modification timestamp	`last_altered`

Property	Description	Source Column
Routine name	Parent routine	`specific_name`
Parameter name	Parameter identifier	`parameter_name`
Ordinal position	1-based position	`ordinal_position`
Parameter mode	IN, OUT, INOUT	`parameter_mode`
Data type	BigQuery type name	`data_type`
Is result	Whether this describes the return type	`parameter_mode = 'OUT'` and `ordinal_position = 0`

Property	Description	Source
Model name	Model identifier	`model_name`
Model type	LINEAR_REG, LOGISTIC_REG, KMEANS, BOOSTED_TREE, DNN_CLASSIFIER, ARIMA_PLUS, etc.	`model_type`
Description	User-provided description	`description`
Created	Creation timestamp	`creation_time`
Last modified	Last modification timestamp	`modified_time`
Expiration	Model expiration timestamp	`model_expiration_time`
Labels	Key-value labels	`labels`
Feature columns	Input feature column names and types	`feature_columns`
Label columns	Target label columns	`label_columns`
Training runs	List of training executions with timestamps	`training_runs`
Location	GCP region	`location`
Encryption	KMS key configuration	`encryption_configuration`
Default trial ID	Default trial for hyperparameter tuning	`default_trial_id`

Property	Description	Source Column
Table name	Parent table	`table_name`
Partition ID	Partition identifier (e.g., "20240101", "NULL", "UNPARTITIONED")	`partition_id`
Total rows	Row count in partition	`total_rows`
Total logical bytes	Logical storage size	`total_logical_bytes`
Total billable bytes	Billable storage size	`total_billable_bytes`
Last modified time	Last data modification	`last_modified_time`
Storage tier	ACTIVE or LONG_TERM	`storage_tier`

Property	Description
Object name	Fully qualified object name
Object type	TABLE, VIEW, MATERIALIZED VIEW, FUNCTION, PROCEDURE, etc.
DDL text	Complete CREATE statement

Property	Description
Referencing table	Table that holds the foreign key
FK constraint name	Foreign key constraint name
Referencing columns	Columns in the referencing table
Referenced columns	Columns in this table (the target)

Property	Description	Source
Total rows	Row count	`PARTITIONS` aggregate or `num_rows`
Total logical bytes	Logical storage size	`PARTITIONS` aggregate
Total billable bytes	Billable storage size	`PARTITIONS` aggregate
Total physical bytes	Physical storage including compression	`TABLE_STORAGE`
Active logical bytes	Recently modified data (<90 days)	`TABLE_STORAGE`
Long-term logical bytes	Data unmodified >90 days	`TABLE_STORAGE`
Time travel physical bytes	Time travel window storage	`TABLE_STORAGE`
Partition count	Number of partitions	`PARTITIONS` aggregate
Last modified time	Most recent data change	`PARTITIONS` aggregate

Property	Description
Dataset name	Dataset identifier
Description	User-provided description
Location	GCP region (e.g., US, EU, us-central1)
Creation time	When the dataset was created
Last modified time	Last configuration change
Default table expiration	Auto-delete interval for new tables
Default partition expiration	Auto-delete interval for partitions
Labels	Key-value labels
Default collation	Default collation for new string columns
Max time travel hours	Time travel window (48-168 hours)
Storage billing model	LOGICAL or PHYSICAL
Is case insensitive	Case sensitivity setting

Property	Description
Source format	AVRO, CSV, DATASTORE_BACKUP, GOOGLE_SHEETS, JSON, ORC, PARQUET, ICEBERG, DELTA
Source URIs	Array of Cloud Storage URIs or Sheet URLs
Compression	GZIP, SNAPPY, etc.
Encoding	UTF-8, ISO-8859-1
Field delimiter	CSV field delimiter
Skip leading rows	Number of header rows to skip
Quote character	CSV quote character
Allow jagged rows	Whether ragged CSV rows are allowed
Allow quoted newlines	Whether newlines in quoted fields are allowed
Max bad records	Error tolerance threshold
Null marker	String representing NULL
Hive partition URI prefix	Hive-style partition detection
Require hive partition filter	Whether partition filter is required
Sheet range	Google Sheets range (if applicable)
JSON extension	GEOJSON, etc.
Enable logical types	Whether Avro/Parquet logical types are used
Decimal target types	How decimals are mapped

Property	Description	Source
View definition	The SELECT query	`INFORMATION_SCHEMA.MATERIALIZED_VIEWS.definition`
Enable refresh	Whether auto-refresh is on	TABLE_OPTIONS `enable_refresh`
Refresh interval	Minutes between refreshes	TABLE_OPTIONS `refresh_interval_minutes`
Last refresh time	When the MV was last refreshed	`INFORMATION_SCHEMA.MATERIALIZED_VIEWS.last_refresh_time`
Refresh watermark	Freshness of the MV data	`INFORMATION_SCHEMA.MATERIALIZED_VIEWS.refresh_watermark`

Property	Description	Option Name
Description	Table description	`description`
Expiration timestamp	When the table auto-deletes	`expiration_timestamp`
Partition expiration days	When partitions auto-delete	`partition_expiration_days`
Require partition filter	Whether queries must filter on partition	`require_partition_filter`
KMS key name	Cloud KMS encryption key	`kms_key_name`
Friendly name	Display name	`friendly_name`
Labels	Key-value labels	`labels`
Default rounding mode	Rounding for NUMERIC/BIGNUMERIC	`default_rounding_mode`
Max staleness	BI Engine staleness tolerance	`max_staleness`

Property	Description	Source Column
Policy name	Row policy identifier	`row_access_policy_name`
Table name	Table the policy applies to	`table_name`
Filter predicate	SQL expression controlling access	`filter_predicate`
Grantees	Users/groups/domains granted access	`grantee_list`
Created	Creation timestamp	`creation_time`

Property	Description
Source dataset	The dataset granting access
Authorized view	Project.dataset.view that has been authorized

Property	Description
Source dataset	The dataset granting access
Authorized dataset	Project.dataset that has been authorized

Property	Description	Source Column
Index name	Search index identifier	`index_name`
Table name	Indexed table	`table_name`
Index columns	Columns included in the index	`index_columns`
Index type	SEARCH	`index_type`
Index status	ACTIVE, PENDING, DISABLED, etc.	`index_status`
Created	Creation timestamp	`creation_time`
Last modified	Last modification timestamp	`last_modified_time`
Coverage percentage	Percentage of table data indexed	`coverage_percentage`

#	Category	BigQuery Source	Impact
1	Stored Procedures	`INFORMATION_SCHEMA.ROUTINES`	High — core SQL development
2	Scalar Functions (UDFs)	`INFORMATION_SCHEMA.ROUTINES`	High — autocomplete, documentation
3	Table Functions	`INFORMATION_SCHEMA.ROUTINES`	High — SQL development
4	Routine Parameters	`INFORMATION_SCHEMA.PARAMETERS`	High — signature help
5	ML Models	`INFORMATION_SCHEMA.ML_MODELS` / BigQuery API	Medium — BQML users
6	Partition Metadata	`INFORMATION_SCHEMA.PARTITIONS`	High — cost & performance
7	DDL Retrieval	`INFORMATION_SCHEMA.TABLES.ddl` / `ROUTINES.ddl`	High — migration, documentation
8	Reverse Foreign Keys	`INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE`	Medium — impact analysis
9	Extended Table Statistics	`INFORMATION_SCHEMA.PARTITIONS` / `TABLE_STORAGE`	Medium — cost monitoring
10	Dataset Properties	`INFORMATION_SCHEMA.SCHEMATA` / BigQuery API	Medium — admin context
11	External Table Properties	`INFORMATION_SCHEMA.TABLE_OPTIONS`	Medium — data source config
12	Materialized View Properties	`INFORMATION_SCHEMA.MATERIALIZED_VIEWS`	Medium — refresh status
13	Table Options	`INFORMATION_SCHEMA.TABLE_OPTIONS`	Medium — configuration display
14	Row-Level Security Policies	`INFORMATION_SCHEMA.ROW_ACCESS_POLICIES`	Low — security audit
15	Column-Level Security	`INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`	Low — security audit
16	Authorized Views & Datasets	BigQuery API (dataset access field)	Low — security audit
17	Search Indexes	`INFORMATION_SCHEMA.SEARCH_INDEXES`	Low — index awareness

Feature Request: Comprehensive Metadata Discovery for Data Development Tools #118

Description

Summary

Table of Contents

1. Stored Procedures

Motivation

Required Metadata

How to Obtain

2. Scalar Functions (UDFs)

Motivation

Required Metadata

How to Obtain

3. Table Functions

Motivation

Required Metadata

How to Obtain

4. Routine Parameters

Motivation

Required Metadata

How to Obtain

5. ML Models

Motivation

Required Metadata

How to Obtain

Note

6. Partition Metadata

Motivation

Required Metadata

How to Obtain

Aggregated Table Statistics from Partitions

7. DDL / CREATE Statement Retrieval

Motivation

Required Metadata

How to Obtain

Suggested Approach

8. Reverse Foreign Keys (Referenced By)

Motivation

Required Metadata

How to Obtain

Note

9. Extended Table Statistics

Motivation

Required Metadata

How to Obtain

ADBC Spec Alignment

10. Dataset Properties

Motivation

Required Metadata

How to Obtain

11. Extended External Table Properties

Motivation

Required Metadata

How to Obtain

12. Extended Materialized View Properties

Motivation

Required Metadata

How to Obtain

13. Table Options (Labels, Expiration, Encryption)

Motivation

Required Metadata

How to Obtain

14. Row-Level Security Policies

Motivation

Required Metadata

How to Obtain

15. Column-Level Security (Policy Tags)

Motivation

Required Metadata

How to Obtain

16. Authorized Views & Datasets

Motivation

Required Metadata

How to Obtain

17. Search Indexes

Motivation

Required Metadata

How to Obtain

Summary Priority Table

Metadata

Metadata