google · copybara-service · Jun 22, 2026
diff --git a/skills/cloud/datalineage-bigquery-asset-impact-analysis/SKILL.md b/skills/cloud/datalineage-bigquery-asset-impact-analysis/SKILL.md
@@ -0,0 +1,129 @@
+---
+name: datalineage-bigquery-asset-impact-analysis
+description: >-
+  Performs a deep impact analysis for a broken, stale, or modified BigQuery data asset.
+  Identifies all downstream tables, dashboards, and processes that will be affected (the blast radius).
+---
+
+# BigQuery Asset Impact Analysis
+
+This skill guides the agent in performing a downstream impact analysis (blast
+radius assessment) when a BigQuery table or view is reported as broken, stale,
+or missing, or when a user is planning maintenance and wants to know the
+consequences of modifying or pausing updates to an asset.
+
+It relies primarily on the **Google Cloud Data Lineage (Dataplex) MCP Server**
+to discover relationships between assets.
+
+## Prerequisites
+
+This skill requires access to the Google Cloud Data Lineage API and an active
+client connection to the Data Lineage MCP Server. For detailed connection
+configurations and tool schemas, refer to [MCP Usage](references/mcp-usage.md).
+
+## Analysis Workflow
+
+### 1. Resolve the Asset Fully Qualified Name (FQN)
+
+*   Ensure you have the correct FQN format for the BigQuery asset:
+    *   *Format:* `bigquery:{project_id}.{dataset_id}.{table_or_view_id}`
+    *   *Example:* `bigquery:my-prod-project.analytics.orders`
+*   If the user provides a partial path (e.g., just `dataset.table`), attempt to
+    infer the active project from the environment or ask the user for
+    clarification.
+
+### 2. Determine Locations and Parent Path
+
+Identify the locations to search and construct the Data Lineage API request:
+
+*   **Discover Asset Location**: Run
+    `bq show --format=json {project_id}:{dataset_id}` and extract the `location`
+    field (e.g., `us-central1` or `us`). If location discovery fails due to
+    permissions or missing tools, prompt the user for the dataset's location.
+*   **Set Parent Path**: Set the `parent` path to match the location where your
+    Data Lineage MCP server is configured (e.g.,
+    `projects/{project_id}/locations/{mcp_server_location}`).
+*   **Configure Search Scope**: Include the discovered asset location in the
+    `locations` array of the payload (e.g., `["us-central1"]` or
+    `["us", "us-central1"]`).
+
+### 3. Retrieve the Downstream Lineage Graph
+
+Call the `DataLineageServer:search_lineage` tool to fetch downstream
+relationships.
+
+*   **Direction**: Set to `DOWNSTREAM`.
+*   **Search Parameters**: Use `max_depth = 10` and `max_process_per_link = 5`
+    as robust defaults.
+
+### 4. Identify the Blast Radius
+
+Traverse the returned lineage links to build the impact graph:
+
+*   **Affected Assets**: The `target` of each link represents a downstream asset
+    that depends on your source asset.
+*   **Transform Processes**: Inspect the `processes` field on each link. This
+    identifies the ETL pipelines, BigQuery Views, or Scheduled Queries that
+    propagate the data.
+*   **Direct vs. Indirect Impact**:
+    *   **Direct Impact (Depth 1)**: Assets directly consuming the source asset.
+        If a link has `dependency_type: EXACT_COPY`, mark the target as
+        "Directly Stale / Identical Copy".
+    *   **Indirect Impact (Depth > 1)**: Assets further down the stream that
+        will experience cascading stale data or failures.
+
+### 5. Summarize and Format the Output
+
+Present your findings clearly to the user using the following structure:
+
+1.  **Executive Summary**: State the total number of downstream assets affected
+    and the maximum depth of the impact.
+2.  **Critical Path**: Highlight high-priority downstream assets (e.g., assets
+    containing "prod", "dashboard", "reporting", or "master" in their names).
+3.  **Blast Radius Table**: A clean markdown table listing the dependencies. You MUST include all columns:
+
+    | Downstream Asset | Transform Process | Depth | Impact Type |
+    | :--- | :--- | :--- | :--- |
+    | `bigquery:project.dataset.table` | `projects/p/locations/l/processes/proc` | 1 | Direct |
+    | `bigquery:project.dataset.view` | `projects/p/locations/l/processes/view` | 2 | Indirect |
+4.  **Analysis Metadata**: Provide transparency on the parameters and boundaries
+    of your search so the user can choose to expand them:
+    *   **Locations Searched**: `{list_of_locations_queried}`
+    *   **Parent Location**: `{parent_path}`
+    *   **Depth Limit**: `{max_depth}`
+    *   **Process per Link Limit**: `{max_process_per_link}`
+    *   *Tip for User*: Let the user know they can ask to rerun the analysis
+        with additional locations or a larger depth limit.
+
+## Crucial Constraints & Guardrails
+
+1.  **Interpret Empty Responses Correctly**:
+    *   An empty response from `search_lineage` indicates that no lineage links
+        are currently recorded for this asset in the queried locations. Do not
+        assume the tool is broken or delayed; report the lack of dependencies
+        clearly and proceed.
+2.  **Strictly Banned Bypasses**:
+    *   Always use the `DataLineageServer:search_lineage` MCP tool. Do not
+        fall back to `gcloud alpha datalineage` or raw `curl` API calls; these
+        fail in the agent sandbox due to authentication and mTLS
+        restrictions.
+3.  **Verify Asset Existence First**:
+    *   If `bq show` indicates the source dataset or table does not exist, stop
+        and report this directly to the user. Do not guess alternative table
+        names unless the user explicitly instructs you to do so.
+4.  **No Output Shortcutting or Hallucinated Artifacts**:
+    *   Present the complete downstream blast radius table directly in your
+        final response. Do not tell the user you created a separate markdown
+        file or artifact containing the details unless you actually ran a
+        file-writing tool to create it.
+
+## Reference Directory
+
+-   [MCP Usage](references/mcp-usage.md): Using the Google Cloud Data Lineage
+    remote MCP server and tool preferences.
+
+## External Documentation
+
+-   [Google Cloud Dataplex Data Lineage Documentation](https://cloud.google.com/dataplex/docs/data-lineage)
+-   [Use the Data Lineage MCP server](https://docs.cloud.google.com/dataplex/docs/use-lineage-mcp)
+-   [Dataplex Data Lineage API Reference](https://cloud.google.com/dataplex/docs/reference/data-lineage/rest)
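
The workflow described in SKILL.md above (resolve the FQN, build the request, traverse links, render the table) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: the helper names (`to_fqn`, `build_request`, `blast_radius`, `to_markdown`), the flattened link shape (`source`, `target`, `processes` keys), and the `root_fqn` payload key are hypothetical. Only `parent`, `locations`, `direction`, `max_depth`, and `max_process_per_link` are taken from the skill text, and the real `search_lineage` schema may differ. The sketch also ignores `dependency_type`.

```python
from collections import deque


def to_fqn(path, default_project=None):
    """Normalize 'dataset.table' or 'project.dataset.table' to a bigquery: FQN."""
    parts = path.removeprefix("bigquery:").split(".")
    if len(parts) == 2:
        if default_project is None:
            raise ValueError("project is ambiguous; ask the user to clarify")
        parts = [default_project, *parts]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"cannot resolve an FQN from {path!r}")
    return "bigquery:" + ".".join(parts)


def build_request(fqn, project_id, server_location, asset_locations):
    """Assemble a request dict using the defaults named in the skill."""
    return {
        "parent": f"projects/{project_id}/locations/{server_location}",
        "locations": list(asset_locations),
        "direction": "DOWNSTREAM",
        "max_depth": 10,
        "max_process_per_link": 5,
        "root_fqn": fqn,  # hypothetical key name; consult the tool schema
    }


def blast_radius(root, links):
    """Breadth-first walk from root; returns (asset, processes, depth, impact) rows."""
    children = {}
    for link in links:
        children.setdefault(link["source"], []).append(link)
    depth = {root: 0}
    rows = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for link in children.get(node, []):
            target = link["target"]
            if target in depth:
                continue  # already reached at an equal or smaller depth
            depth[target] = depth[node] + 1
            impact = "Direct" if depth[target] == 1 else "Indirect"
            processes = ", ".join(link.get("processes", []))
            rows.append((target, processes, depth[target], impact))
            queue.append(target)
    return rows


def to_markdown(rows):
    """Render rows as the four-column blast radius table."""
    lines = [
        "| Downstream Asset | Transform Process | Depth | Impact Type |",
        "| :--- | :--- | :--- | :--- |",
    ]
    for asset, processes, level, impact in rows:
        cell = f"`{processes}`" if processes else "(none recorded)"
        lines.append(f"| `{asset}` | {cell} | {level} | {impact} |")
    return "\n".join(lines)


# Illustrative data only; these links are made up for the example.
root = to_fqn("analytics.orders", default_project="my-prod-project")
links = [
    {"source": root,
     "target": "bigquery:my-prod-project.analytics.orders_daily",
     "processes": ["projects/p/locations/l/processes/proc"]},
    {"source": "bigquery:my-prod-project.analytics.orders_daily",
     "target": "bigquery:my-prod-project.reporting.orders_view",
     "processes": ["projects/p/locations/l/processes/view"]},
]
print(to_markdown(blast_radius(root, links)))
```

Computing depth by traversal rather than reading it from the response keeps the sketch independent of whether the tool reports depth itself. An empty `links` list yields only the two header lines, which matches the guardrail about reporting a lack of dependencies rather than treating it as an error.
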
diff --git a/skills/cloud/datalineage-bigquery-asset-impact-analysis/references/mcp-usage.md b/skills/cloud/datalineage-bigquery-asset-impact-analysis/references/mcp-usage.md
@@ -0,0 +1,40 @@
+# Google Cloud Data Lineage MCP Usage
+
+The Data Lineage service is supported by a remote Model Context Protocol (MCP)
+server that provides structured tools for discovering relationships between data
+assets.
+
+## MCP Tools for Data Lineage
+
+-   **search_lineage**: Performs a breadth-first search (upstream or downstream)
+    to retrieve lineage links for an asset identified by its Fully Qualified
+    Name (FQN). Supports Column-Level Lineage (CLL).
+
+### Tool Preference Hierarchy
+
+*   Default: Data Lineage MCP tool (`DataLineageServer:search_lineage`) > `bq`
+    CLI > `gcloud` CLI.
+*   Note: `gcloud` does NOT support lineage link searches in its standard or
+    beta tracks. Always prefer the MCP tool.
+
+## Setup Instructions
+
+To connect to the Data Lineage MCP server, see
+[Use the Data Lineage MCP server](https://docs.cloud.google.com/dataplex/docs/use-lineage-mcp).
+
+For client configuration, add the following block to your agent's MCP
+configuration file (e.g., `mcp_config.json`):
+
+```json
+{
+  "mcpServers": {
+    "DataLineageServer": {
+      "serverUrl": "https://datalineage.googleapis.com/mcp",
+      "authProviderType": "google_credentials",
+      "headers": {
+        "x-goog-user-project": "<GCP_PROJECT_ID>"
+      }
+    }
+  }
+}
+```
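
If the configuration block in mcp-usage.md needs to be generated per project rather than edited by hand, a small script along these lines may help. It is a sketch only: `build_mcp_config` is a hypothetical helper, and the keys and values are copied from the JSON in the diff above, with only the project ID substituted.

```python
import json


def build_mcp_config(project_id):
    """Return the MCP client config from mcp-usage.md with the project ID filled in."""
    if not project_id or project_id.startswith("<"):
        raise ValueError("replace the <GCP_PROJECT_ID> placeholder with a real project ID")
    return {
        "mcpServers": {
            "DataLineageServer": {
                "serverUrl": "https://datalineage.googleapis.com/mcp",
                "authProviderType": "google_credentials",
                "headers": {"x-goog-user-project": project_id},
            }
        }
    }


# Example project ID reused from the SKILL.md FQN example.
print(json.dumps(build_mcp_config("my-prod-project"), indent=2))
```

Rejecting the unreplaced placeholder up front avoids writing a config that would only fail later at request time.
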