diff --git a/skills/cloud/bigquery-ai-ml/SKILL.md b/skills/cloud/bigquery-ai-ml/SKILL.md index 97d4bca46d..5fb0017125 100644 --- a/skills/cloud/bigquery-ai-ml/SKILL.md +++ b/skills/cloud/bigquery-ai-ml/SKILL.md @@ -5,14 +5,14 @@ metadata: description: >- Leverages BigQuery's built-in machine learning and GenAI capabilities for advanced data analytics. Use when you need to write SQL queries - that perform time-series forecasting, detect outliers, or leverage generative AI capabilities in BigQuery. + that perform time-series forecasting, detect outliers, find key drivers, or leverage generative AI capabilities in BigQuery. --- # BigQuery AI & ML BigQuery integrates with Vertex AI to provide powerful machine learning and generative AI capabilities directly within SQL queries using built-in functions -like `AI.FORECAST`, `AI.DETECT_ANOMALIES`, and `AI.GENERATE`. +like `AI.FORECAST`, `AI.KEY_DRIVERS`, `AI.DETECT_ANOMALIES`, and `AI.GENERATE`. ## Reference Directory @@ -25,6 +25,9 @@ like `AI.FORECAST`, `AI.DETECT_ANOMALIES`, and `AI.GENERATE`. - [AI Generate](references/ai_generate.md): General-purpose text and content generation using Gemini models. +- [AI Key Drivers](references/ai_key_drivers.md): Automatically identify + dimensional segments most responsible for driving changes in a metric. + ## Related Skills - [BigQuery Basics Skill](../bigquery-basics): diff --git a/skills/cloud/bigquery-ai-ml/references/ai_key_drivers.md b/skills/cloud/bigquery-ai-ml/references/ai_key_drivers.md new file mode 100644 index 0000000000..5f9e7fff58 --- /dev/null +++ b/skills/cloud/bigquery-ai-ml/references/ai_key_drivers.md @@ -0,0 +1,97 @@ +# BigQuery AI.Key_Drivers + +`AI.KEY_DRIVERS` automatically identifies the key dimensional segments most +responsible for driving changes in a specified metric between a defined interest +group and a reference group. + +## Syntax Reference + +```sql +SELECT + * +FROM + AI.KEY_DRIVERS( + { TABLE TABLE | (QUERY_STATEMENT) }, + metric_col => 'METRIC_COL', + dimension_cols => DIMENSION_COLS, + interest_label_col => 'INTEREST_LABEL_COL' + [, min_apriori_support => MIN_APRIORI_SUPPORT] + [, top_k => TOP_K] + [, enable_pruning => ENABLE_PRUNING] + ) +``` + +### Input Arguments + +Argument | Requirement | Type | Description +:------------------------ | :----------- | :-------------- | :---------- +**`input_data`** | **Required** | | The source table or subquery containing the data to analyze. +**`metric_col`** | **Required** | `STRING` | Metric column name. Must be of type: INT64, NUMERIC, BIGNUMERIC, or FLOAT64. +**`interest_label_col`** | **Required** | `STRING` | Boolean column name: `TRUE` for interest group, `FALSE` for reference group. +**`dimension_cols`** | **Required** | `ARRAY` | 1-12 dimension columns (INT64, BOOL, STRING); cannot be `metric_col` or `interest_label_col`. +**`min_apriori_support`** | Optional | `FLOAT64` | Minimum apriori support threshold [0, 1] for output segments. Default: 0.1. Cannot be used with `top_k`. +**`top_k`** | Optional | `INT64` | Return top k insights [1, 1M] by apriori support. If unset, uses `min_apriori_support=0.1`. Cannot be used with `min_apriori_support`. +**`enable_pruning`** | Optional | `BOOL` | If `TRUE` (default), redundant insights are pruned. If `FALSE`, all insights meeting thresholds are returned. Two segments are redundant if two conditions are met: 1) their metric values are equal 2) The dimensions and corresponding values of one row are a subset of the dimensions and corresponding values of the other. In this case, the row with more dimensions (the more descriptive row) is kept. + +### Output Schema + +Returns a `STRUCT` with the following fields: + +Column Name | Type | Description +:----------------------------------- | :-------------- | :---------- +**`drivers`** | `ARRAY` | Provides a list of drivers, or dimension values of interest, which describes each of the segments. +**`metric_interest`** | `NUMERIC` | The sum of the metric_column for the data in the interest segment. +**`metric_reference`** | `NUMERIC` | The sum of the metric_column for data in the reference segment. +**`difference`** | `NUMERIC` | The difference between the interest and reference metric values for a segment. +**`relative_difference`** | `NUMERIC` | The relative change of a segment, calculated as the difference divided by the reference metric value. +**`unexpected_difference`** | `NUMERIC` | Measures deviation of segment from the rest of the population's growth. Calculated as: (segment relative_difference - complement relative_difference) * segment reference metric. +**`relative_unexpected_difference`** | `NUMERIC` | The unexpected_difference divided by the expected interest metric value for a segment. +**`contribution`** | `NUMERIC` | Contains the absolute value of the difference value: `ABS(difference)`. +**`apriori_support`** | `NUMERIC` | Segment size relative to the total population (filters small segments). + +## Examples + +### Identifying Key Drivers in 2024 H2 Liquor Sales + +```sql +WITH InputData AS ( + SELECT + sale_dollars, + city, + category_name, + vendor_name, + (date > '2024-07-01') AS IS_H2 + FROM `bigquery-public-data.iowa_liquor_sales.sales` + WHERE EXTRACT(YEAR FROM DATE) = 2024 +) +SELECT * +FROM AI.KEY_DRIVERS( + TABLE InputData, + metric_col => 'sale_dollars', + dimension_cols => ['city', 'vendor_name', 'category_name'], + interest_label_col => 'IS_H2', + min_apriori_support => 0 +); +``` + +### Finding Top 5 Key Drivers in Bottle Costs + +```sql +SELECT * +FROM AI.KEY_DRIVERS( + ( + SELECT + state_bottle_cost, + city, + category_name, + (EXTRACT(MONTH FROM date) >= 7) AS is_h2 + FROM `bigquery-public-data.iowa_liquor_sales.sales` + WHERE EXTRACT(YEAR FROM date) = 2024 + ), + metric_col => 'state_bottle_cost', + dimension_cols => ['city', 'category_name'], + interest_label_col => 'is_h2', + top_k => 5 +); +``` +