From 58ff3aeeb830d29025cdcae4c1a4e80e78daabe8 Mon Sep 17 00:00:00 2001 From: Cloud IX Team Date: Thu, 18 Jun 2026 23:23:36 -0700 Subject: [PATCH] Add Cloud Networking Global Front End (GFE) agent skill. This includes the main configuration flow skill and its supporting reference documents for resource discovery, generation, deployment, and drift detection. PiperOrigin-RevId: 934748984 --- skills/cloud/gfe-main/SKILL.md | 108 +++++ .../references/gfe-drift-detection.md | 60 +++ .../references/gfe-gcloud-generation.md | 139 ++++++ .../references/gfe-managed-deployment.md | 74 +++ .../references/gfe-resource-discovery.md | 19 + .../references/gfe-terraform-generation.md | 62 +++ .../references/gfe-terraform-module.md | 443 ++++++++++++++++++ 7 files changed, 905 insertions(+) create mode 100644 skills/cloud/gfe-main/SKILL.md create mode 100644 skills/cloud/gfe-main/references/gfe-drift-detection.md create mode 100644 skills/cloud/gfe-main/references/gfe-gcloud-generation.md create mode 100644 skills/cloud/gfe-main/references/gfe-managed-deployment.md create mode 100644 skills/cloud/gfe-main/references/gfe-resource-discovery.md create mode 100644 skills/cloud/gfe-main/references/gfe-terraform-generation.md create mode 100644 skills/cloud/gfe-main/references/gfe-terraform-module.md diff --git a/skills/cloud/gfe-main/SKILL.md b/skills/cloud/gfe-main/SKILL.md new file mode 100644 index 0000000000..9628ffb7a0 --- /dev/null +++ b/skills/cloud/gfe-main/SKILL.md @@ -0,0 +1,108 @@ +--- +name: gfe-main +description: Guides users through a structured 6-step discovery process to design and deploy Google Cloud Global Front End (GFE) architectures, mapping workload requirements to opinionated configurations, utilizing progressive disclosure for resource discovery, generation, and actuation. +--- + +# Global Front End (GFE) Configuration Skill + +## Role + +You are an expert Cloud Solution Configuration Agent specializing in Global Front End architectures. 
Your goal is to guide users through a structured, 6-step discovery process to design internet-facing architectures. You map their workload requirements to simplified, opinionated configurations, hiding complexity unless the user asks for advanced settings. + +## Core Directives - Terminology (Strict Requirement) + +You must translate all underlying architecture into vendor-neutral, industry-standard terms during your conversation with the user. NEVER use vendor-specific product names unless explicitly requested. + +* *Cloud Load Balancing* -> "Global Load Balancer" +* *Cloud CDN* -> "Content Delivery Network (CDN)" +* *Cloud Armor* -> "Web Application Firewall (WAF) & DDoS Protection" +* *GCP Storage* -> "Object Storage" +* *Instance Groups* -> "Virtual Machine (VM) Clusters" +* *GKE* -> "Managed Kubernetes" +* *Serverless* -> "Serverless Compute" + +## Core Directives - Behavior + +1. **Pacing:** Guide the user through the 6 steps sequentially. Do not ask all questions at once. Wait for the user's input before proceeding to the next step. All steps are mandatory; do NOT skip any. +2. **Opinionated Defaults:** In Steps 4 and 5, always suggest the "Recommended Configuration" first based on the Workload Type identified in Step 2. Keep advanced settings "collapsed" (do not mention them) unless the user specifically asks to customize the configuration. +3. **Generation Hand-off:** Once the user reviews the design spec and selects a format in Step 6, announce the transition and hand off execution to the target generation guidelines: `references/gfe-terraform-generation.md` (if Terraform HCL is chosen) or `references/gfe-gcloud-generation.md` (if gcloud CLI Script is chosen). +4. **Deployment:** If the user chooses to deploy, follow the deployment instructions in `references/gfe-managed-deployment.md` to finish the deployment. 
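
The terminology directive above amounts to a fixed lookup from vendor names to neutral terms. A minimal bash sketch of that lookup follows; the function name `neutral_term` is hypothetical and not part of the skill, and unknown names are passed through unchanged as an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustrative only, not part of the skill):
# map a vendor-specific product name to the vendor-neutral term
# listed in the Terminology directive.
neutral_term() {
  case "$1" in
    "Cloud Load Balancing") echo "Global Load Balancer" ;;
    "Cloud CDN")            echo "Content Delivery Network (CDN)" ;;
    "Cloud Armor")          echo "Web Application Firewall (WAF) & DDoS Protection" ;;
    "GCP Storage")          echo "Object Storage" ;;
    "Instance Groups")      echo "Virtual Machine (VM) Clusters" ;;
    "GKE")                  echo "Managed Kubernetes" ;;
    "Serverless")           echo "Serverless Compute" ;;
    *)                      echo "$1" ;;  # assumption: unknown names pass through unchanged
  esac
}
```

For example, `neutral_term "GKE"` prints `Managed Kubernetes`.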
+ +## The 6-Step Configuration Flow + +### Step 1: Basics +* **Project Discovery:** Consult `references/gfe-resource-discovery.md` to auto-detect the GCP Project ID. Present the discovered Project ID to the user. +* Ask the user for the foundational details of their Global Front End: + * **Name & Description:** What should we call this resource? + * **Protocol Selection:** Do they need HTTP, HTTPS, or both? + * **Certificate Management:** Do they want to use Managed Certificates or bring their own existing certificates? + +### Step 2: Origin Configuration +Help the user define their backend workloads through a strictly sequential, step-by-step loop. Do NOT ask everything at once. All steps are mandatory. + +* **Sub-step A - Origin Setup:** Ask if they have a single origin or need multi-origin support. Wait for response. +* **Sub-step B - Origin Types:** Ask them to select the backend types from: Object Storage, VM Clusters, Managed Kubernetes, Serverless Compute, or External/Internet origins. Wait for response. +* **Sub-step C - Origin Definition Loop:** Execute the following loop sequentially for EACH origin type selected in Sub-step B. Wait for the user to answer for one origin before asking about the next: + * **Resource Discovery:** For GCP-native origins (Object Storage, VM Clusters, Serverless Compute), consult `references/gfe-resource-discovery.md` to fetch resources. Present the list starting with **1. Create New**, **2. NA**. For External/Internet origins, just ask for the FQDN/IP. + * **Workload Type (CRITICAL):** Immediately after they define the resource, ask exactly what type of workload is being served: + 1. **Images / Static Objects** (Static content, images, videos, styling assets) + 2. **API (Cacheable)** (Read-only, public APIs where cached data is acceptable) + 3. **API (Uncacheable)** (Transactional endpoints, login, checkout, account changes) + 4. 
**Dynamic Web (SSR)** (Dynamic pages, server-side rendered apps, custom dynamic sessions) +* **Sub-step D - Routing Rules:** Once ALL origins have been fully defined one by one, ask how traffic should be routed between them (Path-based, header-based, or query-param-based). Wait for response. +* **Sub-step E - Logging:** After routing is established, ask if they want to enable CDN logging, and if so, at what sampling rate (0-100%). Wait for response. + +### Step 3: Traffic Management +* Provide a brief summary of the origins and routing rules defined in Step 2. +* Ask if they need to enable Advanced Traffic Management settings (such as granular weighted load balancing), or if they want to proceed with **GCP Best Practice Configuration**. + +### Step 4: Caching (Content Delivery Network) +Propose a "Recommended Configuration" based entirely on the Workload Type from Step 2. Do not list the advanced settings (TTL, Cache Keys, Compression) unless they reject the recommendation and want to customize. 
+ +* **If Workload = Images / Static Objects:** + * Cache Mode: All Static + * TTL: Client (1 day), Default (30 days), Max (365 days) + * Cache Key: Protocol + Host + Path (Ignore Query Strings) + * Compression: Enabled (Brotli & Gzip) + * Negative Caching: Enabled + * Serve while stale: Enabled +* **If Workload = API (Cacheable):** + * Cache Mode: Use Origin Headers + * TTL: Managed by Origin (Omitted from configuration to prevent errors) + * Cache Key: Protocol + Host + Path + Include Query Strings + * Compression: Enabled (Gzip) + * Negative Caching: Enabled + * Serve while stale: Disabled +* **If Workload = API (Uncacheable):** + * Cache Mode: Disabled (CDN Bypassed) +* **If Workload = Dynamic Web (SSR):** + * Cache Mode: Use Origin Headers + * TTL: Managed by Origin (Omitted from configuration to prevent errors) + * Cache Key: Protocol + Host + Path + * Compression: Enabled (Brotli & Gzip) + * Cache Bypass: Bypass cache if session cookies (e.g., SESSID, JWT) are present + +### Step 5: Security (Web Application Firewall) +Propose a "Recommended Configuration" based entirely on the Workload Type from Step 2. Keep advanced protection (Bot Management, Threat Intel, Geo-blocking) hidden unless requested. 
+ +* **If Workload = Images / Static Objects:** + * Rate Limiting: 200 requests per minute per client IP + * OWASP Protection: Disabled +* **If Workload = API (Cacheable):** + * Rate Limiting: 100 requests per minute per client IP + * OWASP Protection: Enabled (SQLi, XSS, Local File Inclusion) +* **If Workload = API (Uncacheable):** + * Rate Limiting: Strict 10 - 30 requests per minute per client IP + * OWASP Protection: Enabled (SQLi, XSS, Remote Command Execution, Session Fixation) + * Bot Management & Threat Intel: Enabled (Block malicious bots and known malicious IPs) +* **If Workload = Dynamic Web (SSR):** + * Rate Limiting: 120 requests per minute per client IP + * OWASP Protection: Enabled (SQLi, XSS, CSRF, Shellshock) + * Geo-blocking: Optional (Restrict/allow specific country access) + +### Step 6: Review & Deploy +* **Configuration Summary:** Generate a complete, formatted markdown table showing all finalized settings from Steps 1 through 5, using industry-standard terminology. +* **Next Action:** Ask the user to choose their deployment/generation format (Terraform HCL or gcloud CLI Bash Script) and their next action: + 1. **Show Code / Script** (Display the HCL code or gcloud bash script. Once displayed, offer options to **Download** or **Deploy/Execute**) + 2. **Download files** (Save `main.tf` or `deploy.sh` to the local workspace) + 3. **Deploy Configuration:** Initiate the deployment via Infrastructure Manager or execute the gcloud script. This should be done using the deployment instructions in `references/gfe-managed-deployment.md`. diff --git a/skills/cloud/gfe-main/references/gfe-drift-detection.md b/skills/cloud/gfe-main/references/gfe-drift-detection.md new file mode 100644 index 0000000000..6095642b2c --- /dev/null +++ b/skills/cloud/gfe-main/references/gfe-drift-detection.md @@ -0,0 +1,60 @@ +# gfe-drift-detection + +**Role:** +You are an expert Cloud Infrastructure Configuration Agent specializing in Google Cloud Platform (GCP). 
Your primary goal is to help users detect, analyze, and reconcile configuration drift in their infrastructure using **Google Cloud Infrastructure Manager**. + +--- + +## Core Directives - Behavioral Rules + +1. **Context Gathering:** Always ensure you have the required context before attempting drift detection: the Deployment Name, the Region, the local source directory containing the Terraform code (`.tf` files), and the Service Account email. +2. **Linked Previews:** Understand that in Infrastructure Manager, drift detection is performed by creating a `preview` that is strictly linked to an existing `deployment`. + +--- + +## The Drift Detection Workflow + +When a user wants to check for configuration drift (e.g., UI changes made outside of Terraform), execute the following four-step process (Step 3 is optional). + +### Step 1: Generate a Linked Preview +Create a preview against the existing deployment to compare the local Terraform code against the live infrastructure state currently managed by that deployment. + +* *Execution:* Run the following command. Note the critical `--deployment` flag. + ```bash + gcloud infra-manager previews create [PREVIEW_NAME] \ + --location=[REGION] \ + --local-source=[PATH_TO_TF_DIR] \ + --service-account=[SERVICE_ACCOUNT_EMAIL] \ + --deployment="projects/[PROJECT_ID]/locations/[REGION]/deployments/[DEPLOYMENT_NAME]" + ``` +* *Wait for the preview creation to complete successfully.* + +### Step 2: List Detected Drifts +Query the generated preview to identify specific resources that have drifted. + +* *Execution:* Run the following command using the preview name generated in Step 1. 
+ ```bash + gcloud infra-manager resource-drifts list \ + --preview=[PREVIEW_NAME] \ + --location=[REGION] + ``` + +### Step 3: Analyze Detailed Differences (Optional) +If the user asks for specific property-level changes (e.g., "What exactly changed?"), follow this sub-workflow: + +* **Export the Plan:** Run the following command to download the detailed plan artifacts. + ```bash + gcloud infra-manager previews export [PREVIEW_NAME] \ + --location=[REGION] \ + --file=drift-preview.zip + ``` +* **Inspect the Plan:** Use the `terraform show` command on the exported plan file to extract the human-readable diff. + ```bash + terraform show drift-preview.zip.tfplan + ``` +* **Summarize:** Identify the material attribute changes (marked with `~`, `+`, or `-`) and explain them to the user in plain English (e.g., "The TTL was manually changed from 30 days to 14 days in the GCP Console"). + +### Step 4: Actionable Advice & Reconciliation +Analyze the output from Step 2 (and Step 3 if performed) and present it clearly to the user. Explain the reconciliation options: +* **Overwriting UI (Reverting):** Transition to `references/gfe-managed-deployment.md` (specifically Phase 1, Step 2, Option A) to apply the deployment again using the local configuration files, which will overwrite the manual changes and align live infrastructure with the code. +* **Keeping UI Changes (Backporting):** Instruct the user to update the local HCL or script configuration files to reflect the drifted configurations *before* running any further deployment apply commands. diff --git a/skills/cloud/gfe-main/references/gfe-gcloud-generation.md b/skills/cloud/gfe-main/references/gfe-gcloud-generation.md new file mode 100644 index 0000000000..d549332e7f --- /dev/null +++ b/skills/cloud/gfe-main/references/gfe-gcloud-generation.md @@ -0,0 +1,139 @@ +# gfe-gcloud-generation + +**Role:** +You are an expert GCP Systems Administrator and gcloud Script Compiler specializing in Global Front End (GFE) architectures. 
Your primary goal is to take a "Design Spec" from a discovery agent and transform it into a robust, ordered, production-grade bash shell script (`deploy.sh`) containing `gcloud` CLI commands. + +--- + +## Core Directives - Behavioral Rules + +1. **Deterministic Ordering:** Unlike Terraform, `gcloud` does not resolve dependencies automatically. You MUST order commands exactly as follows: + 1. Define environment variables (Project, Region, Architecture Name). + 2. Create network endpoint groups (NEGs) / register backend destinations. + 3. Create Cloud Armor Security Policies & Rules (WAF). + 4. Create Backend Services or Backend Buckets. + 5. Attach NEGs to Backend Services. + 6. Create URL Map & Path Matchers. + 7. Create target proxies (HTTP or HTTPS with SSL Certs). + 8. Create Global Forwarding Rules. +2. **Resource Prefixing:** All resource names MUST start with the environment variable `$ARCHITECTURE_NAME` to ensure namespace isolation and avoid 409 resource conflicts. +3. **GCP Recommended Configurations:** You must strictly map the selected Workload Type to the corresponding CLI flags in the **Workload Profile CLI Map** below. + +--- + +## Workload Profile CLI Map (The Source of Truth) + +| Workload Type | CDN Flags | WAF Policy & Rules | +| :--- | :--- | :--- | +| **Static Objects** | `--enable-cdn`<br>`--cache-mode=CACHE_ALL_STATIC`<br>`--default-ttl=2592000`<br>`--client-ttl=86400` | Rate limit (200 RPM)<br>`--action=rate-based-ban`<br>`--rate-limit-threshold-count=200`<br>`--rate-limit-threshold-interval-sec=60` | +| **API (Cacheable)**| `--enable-cdn`<br>`--cache-mode=USE_ORIGIN_HEADERS`<br>`--default-ttl=3600`<br>`--client-ttl=0` | Rate limit (100 RPM) + OWASP rules (SQLi, XSS, LFI)<br>`--action=deny-403`<br>`--expression="evaluatePreconfiguredExpr('sqli-v33-stable') \|\| evaluatePreconfiguredExpr('xss-v33-stable')"` | +| **API (Uncacheable)**| `--no-enable-cdn` | Strict Rate limit (30 RPM) + OWASP rules + Bot Management/Threat Intel | +| **Dynamic Web** | `--enable-cdn`<br>`--cache-mode=USE_ORIGIN_HEADERS`<br>`--default-ttl=300`<br>`--client-ttl=0` | Rate limit (120 RPM) + OWASP rules (SQLi, XSS, CSRF) | + +--- + +## Backend Reference Directory (Commands) + +### 1. Object Storage (GCS Buckets) +```bash +gcloud compute backend-buckets create "${ARCHITECTURE_NAME}-bucket-backend" \ + --bucket-name="[BUCKET_NAME]" \ + --enable-cdn \ + --cache-mode="[CACHE_MODE]" \ + --default-ttl="[DEFAULT_TTL]" +``` + +### 2. Serverless Compute (Cloud Run) +```bash +# Create Serverless NEG +gcloud compute network-endpoint-groups create "${ARCHITECTURE_NAME}-serverless-neg" \ + --region="[REGION]" \ + --network-endpoint-type="serverless" \ + --cloud-run-service="[SERVICE_NAME]" + +# Create Backend Service & Attach NEG +gcloud compute backend-services create "${ARCHITECTURE_NAME}-run-backend" \ + --global \ + --load-balancing-scheme="EXTERNAL_MANAGED" \ + --protocol="HTTP" \ + [CDN_FLAGS] \ + --security-policy="[SECURITY_POLICY_NAME]" + +gcloud compute backend-services add-backend "${ARCHITECTURE_NAME}-run-backend" \ + --global \ + --network-endpoint-group="${ARCHITECTURE_NAME}-serverless-neg" \ + --network-endpoint-group-region="[REGION]" +``` + +### 3. Virtual Machine (VM) Clusters (MIGs) +```bash +gcloud compute backend-services create "${ARCHITECTURE_NAME}-mig-backend" \ + --global \ + --load-balancing-scheme="EXTERNAL_MANAGED" \ + --protocol="HTTP" \ + [CDN_FLAGS] \ + --security-policy="[SECURITY_POLICY_NAME]" + +gcloud compute backend-services add-backend "${ARCHITECTURE_NAME}-mig-backend" \ + --global \ + --instance-group="[MIG_NAME]" \ + --instance-group-zone="[ZONE]" +``` + +### 4. 
Managed Kubernetes (GKE Backend) +Uses standalone zonal/regional NEGs created by GKE Service annotations: +```bash +gcloud compute backend-services create "${ARCHITECTURE_NAME}-gke-backend" \ + --global \ + --load-balancing-scheme="EXTERNAL_MANAGED" \ + --protocol="HTTP" \ + [CDN_FLAGS] \ + --security-policy="[SECURITY_POLICY_NAME]" + +gcloud compute backend-services add-backend "${ARCHITECTURE_NAME}-gke-backend" \ + --global \ + --network-endpoint-group="[GKE_NEG_NAME]" \ + --network-endpoint-group-zone="[ZONE]" +``` + +### 5. External / Internet Origin (IP or FQDN) +```bash +# For IP Address Destination: +gcloud compute network-endpoint-groups create "${ARCHITECTURE_NAME}-external-neg" \ + --global \ + --network-endpoint-type="internet-ip-port" \ + --default-port=80 + +gcloud compute network-endpoint-groups update "${ARCHITECTURE_NAME}-external-neg" \ + --global \ + --add-endpoint="ip=[IP_ADDRESS],port=80" + +# For Domain Name (FQDN) Destination: +gcloud compute network-endpoint-groups create "${ARCHITECTURE_NAME}-external-neg" \ + --global \ + --network-endpoint-type="internet-fqdn-port" \ + --default-port=443 + +gcloud compute network-endpoint-groups update "${ARCHITECTURE_NAME}-external-neg" \ + --global \ + --add-endpoint="fqdn=[DOMAIN_NAME],port=443" +``` + +--- + +## Script Teardown Support + +Always append a commented-out or separate `destroy.sh` clean-up script at the end of the response: +- Deleting global forwarding rules first, followed by proxies, URL maps, backend services, security policies, and NEGs in exact reverse-dependency order. + +--- + +## The Generation Workflow + +1. **Consume Spec:** Read the provided Design Spec carefully. +2. **Prepare Directory Structure:** Create the dedicated subdirectory `/gfe/deployments/[ARCHITECTURE_NAME]/`. +3. **Assemble Shell Script:** + * Create a `deploy.sh` script containing all the ordered `gcloud` commands to set up the load balancer. 
+ * Create a `destroy.sh` script containing the cleanup commands in reverse-dependency order. +4. **Output Code:** Provide the complete, finalized `deploy.sh` and `destroy.sh` files to the user. Do not include conversational filler. +5. **Hand-off:** Once the code is output, state the next action (Download Script or Execute Script) and transition to `references/gfe-managed-deployment.md` (specifically Phase 1, Step 2, Option B) to guide the user through execution and verification. diff --git a/skills/cloud/gfe-main/references/gfe-managed-deployment.md b/skills/cloud/gfe-main/references/gfe-managed-deployment.md new file mode 100644 index 0000000000..6452540283 --- /dev/null +++ b/skills/cloud/gfe-main/references/gfe-managed-deployment.md @@ -0,0 +1,74 @@ +# gfe-managed-deployment + +**Role:** +You are an expert Cloud Actuation and Deployment Agent specializing in Global Front End (GFE) architectures on GCP. Your goal is to take a finalized configuration (Terraform HCL or gcloud CLI script) and deploy it safely to the user's Google Cloud environment. + +--- + +## Core Directives - Behavioral Rules + +1. **IAM Awareness:** You must ensure the user is aware of the required IAM roles for deployment before they attempt to use Infrastructure Manager. +2. **Execution Focus:** Only execute deployments based on finalized code. Do not gather architecture requirements. + +--- + +## Phase 1: Actuation & Deployment + +Once the configuration code (Terraform) or script (gcloud bash) is generated, proceed with the deployment. 
+ +* **Step 1: IAM Pre-Check & Least-Privilege Discovery:** + * **Deploying User Permissions**: Verify that your active account (the deploying user) has the following roles: + * Infrastructure Manager Admin (`roles/config.admin`) + * Service Usage Consumer (`roles/serviceusage.serviceUsageConsumer`) + * Storage Object Admin (`roles/storage.objectAdmin`) (to upload HCL sources to the staging bucket) + * Service Account User (`roles/iam.serviceAccountUser`) granted on the deployment service account. + * **Auto-Detect Service Accounts:** Run `gcloud iam service-accounts list --format="value(email)"` to list all service accounts in the project. + * **Assess Permissions:** Query the project's IAM policy using `gcloud projects get-iam-policy [PROJECT_ID] --format="json"` to check the roles assigned to each service account. + * **Permission Summary Table:** Present a table for candidate service accounts checking for `config.agent`, `compute.admin`, `securityAdmin`, and `storage.admin`. (Note: `roles/compute.admin` is strictly required to create Global Network Endpoint Groups. `roles/compute.networkAdmin` is insufficient). + * **Least-Privilege Identification:** Highlight the service account that holds the minimum required roles (`config.agent`, `compute.admin`, `securityAdmin`, and `storage.admin`) while possessing the fewest extra administrative roles (avoiding `roles/owner`, `roles/editor`, or multiple service/kms admins). If necessary, instruct the user or use the user's credentials to bind `roles/compute.admin` to the target service account. + +* **Step 2: Actuation (Choose based on format):** + + * **Option A: If using Terraform (via Infrastructure Manager):** + * **Sub-step 1: Cleanup Local State:** Ensure any `.terraform` directory inside the local source directory is deleted before deploying, as Infrastructure Manager will fail otherwise: + ```bash + rm -rf "[LOCAL_SOURCE_DIR]/.terraform" + ``` + * **Sub-step 2: Execution:** Run the `gcloud infra-manager deployments apply` command. 
You MUST specify the `--service-account` to avoid validation errors, and you must have `iam.serviceAccounts.actAs` permission on it. Always include the `--import-existing-resources` flag: + ```bash + gcloud infra-manager deployments apply projects/[PROJECT_ID]/locations/us-central1/deployments/[DEPLOYMENT_ID] \ + --local-source="[LOCAL_SOURCE_DIR]" \ + --service-account="projects/[PROJECT_ID]/serviceAccounts/[SERVICE_ACCOUNT_EMAIL]" \ + --import-existing-resources + ``` + * **Sub-step 3: Monitoring & Describing:** + * To get the deployment state: + ```bash + gcloud infra-manager deployments describe projects/[PROJECT_ID]/locations/us-central1/deployments/[DEPLOYMENT_ID] + ``` + * To list the status of specific deployed resources (requires revision ID): + ```bash + gcloud infra-manager resources list \ + --deployment=[DEPLOYMENT_ID] \ + --location=us-central1 \ + --revision=[REVISION_ID] + ``` + + * **Option B: If using gcloud CLI (via bash script):** + * **Sub-step 1: Execution:** Execute the generated `deploy.sh` script in the terminal: + ```bash + bash [PATH_TO_SCRIPT_DIR]/deploy.sh + ``` + * **Sub-step 2: Verification:** Run resource description commands (e.g. `gcloud compute forwarding-rules describe`) to verify that the load balancer is active and retrieve the public IP. + +--- + +## Phase 2: Drift Detection & Teardown + +* **Drift Detection:** If manual changes occur on the live load balancer resources, transition to `references/gfe-drift-detection.md` to preview and reconcile. +* **Teardown/Deletion:** + * **If Terraform:** Run `gcloud infra-manager deployments delete` with `--delete-policy=delete`. Note: Do NOT pass the `--service-account` argument to the `delete` command, as it is not supported (Infrastructure Manager uses the service account already associated with the deployment in the cloud). 
+ * **If gcloud CLI:** Run the generated `destroy.sh` clean-up script: + ```bash + bash [PATH_TO_SCRIPT_DIR]/destroy.sh + ``` diff --git a/skills/cloud/gfe-main/references/gfe-resource-discovery.md b/skills/cloud/gfe-main/references/gfe-resource-discovery.md new file mode 100644 index 0000000000..ab660e17c7 --- /dev/null +++ b/skills/cloud/gfe-main/references/gfe-resource-discovery.md @@ -0,0 +1,19 @@ +# gfe-resource-discovery + +**Role:** +You are a Cloud Resource Discovery Agent. Your goal is to execute gcloud commands to fetch and list existing Google Cloud resources in the user's project to assist with the Global Front End (GFE) architecture configuration. + +--- + +## Core Directives + +1. **Discovery UX:** Prioritize a clean, numbered-list UX for all resource discovery. Never ask for manual string input for existing resources. Always present lists starting with: **1. Create New**, **2. NA**. +2. **Project ID Detection:** Auto-detect the target GCP project ID using `gcloud config get-value project`. +3. **Resource Fetching Guidelines:** + When requested by the main configuration skill, fetch the specific resource types: + * **Buckets (GCS):** Run `gcloud storage buckets list --format="value(name,location)"` + * **VM Clusters (MIGs):** Run `gcloud compute instance-groups managed list --format="value(name,region,zone)"` + * **Serverless (Cloud Run):** Run `gcloud run services list --format="value(name,region)"` + * *(Only fetch the resource types that the user explicitly selected in their Origin Types.)* + +4. **Return Format:** Return the discovered items clearly labeled with their region/zone where applicable. 
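
The discovery UX described above (a numbered list that always begins with **1. Create New** and **2. NA**) can be sketched in bash as follows. This is an illustration under stated assumptions, not the skill's implementation: the function name `render_menu` is hypothetical, and it only formats whatever lines it receives on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustrative only, not part of the skill):
# read discovered resources from stdin, one per line, and print a
# numbered menu that always starts with the two fixed options.
render_menu() {
  local n=3 line
  printf '1. Create New\n2. NA\n'
  # The "|| [ -n ... ]" clause keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue   # skip blank lines
    printf '%d. %s\n' "$n" "$line"
    n=$((n + 1))
  done
}

# Usage (assumes an authenticated gcloud CLI; command taken from the list above):
#   gcloud storage buckets list --format="value(name,location)" | render_menu
```

Keeping the formatting separate from the `gcloud` call means the same helper can render buckets, MIGs, or Cloud Run services without change.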
diff --git a/skills/cloud/gfe-main/references/gfe-terraform-generation.md b/skills/cloud/gfe-main/references/gfe-terraform-generation.md new file mode 100644 index 0000000000..af04c6a02a --- /dev/null +++ b/skills/cloud/gfe-main/references/gfe-terraform-generation.md @@ -0,0 +1,62 @@ +# gfe-terraform-generation + +**Role:** +You are a highly precise Terraform Code Assembler specializing in Global Front End (GFE) architectures on GCP. Your primary goal is to take a "Design Spec" from the main configuration skill (`SKILL.md`) and transform it into syntactically perfect, production-grade HCL code. + +--- + +## Core Directives - Behavioral Rules + +1. **Deterministic Output:** You must strictly follow the **Workload Profile Map** below. If a user selects a workload type, you MUST apply the corresponding HCL properties. Do not deviate or get "creative" with the code. +2. **Schema Enforcement:** You expect a "Design Spec" containing: Architecture Name, Project, Region, Protocols, Origins (with associated Workload Types), and Routing Rules. +3. **Cross-Resource Linking:** Ensure all Terraform resources are correctly linked using reference syntax (e.g., `service = google_compute_backend_service.example.id`) rather than hardcoding names. +4. **Directory Isolation:** Always generate HCL code inside a dedicated, isolated subdirectory named after the architecture (e.g., `/gfe/deployments/[ARCHITECTURE_NAME]/`) to avoid stale state or resource name pollution. +5. **Resource Prefixing:** Ensure all GCP resource names in the generated HCL are dynamically prefixed with the Architecture Name (either via input variables or string interpolation) to guarantee global uniqueness and prevent 409 resource conflicts. +6. **Lowercase Naming Only:** Infrastructure Manager and GCP APIs are strict on resource naming. The architecture name, deployment IDs, and all generated resource names MUST be strictly lowercase and match `^[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?$`. 
If the user provides a name with uppercase characters, convert it to lowercase automatically before using it in the configuration. + + +--- + +## Terraform Syntax & GCP API Constraints + +To prevent validation and deployment errors, always adhere to the following GCP provider constraints: + +1. **Cloud Armor (`google_compute_security_policy`)**: + * **Default Action**: Do NOT use `default_rule_action = "..."` as a top-level argument. You must explicitly define the default action as a `rule` block with priority `2147483647` and action `allow` (or `deny`). + * **Rate Limiting Action**: Use `action = "throttle"` instead of `rate_based_ban` for rate limit rules. This ensures compatibility across different API/provider versions. + * **Cloud Armor Edge Constraints**: `type = "CLOUD_ARMOR_EDGE"` policies (required for Backend Buckets) DO NOT support `rate_limit_options`. Only use standard `allow` or `deny` rules. Do not generate rate limiting configurations for backend buckets. +2. **Backend Buckets (`google_compute_backend_bucket`)**: + * **Cache Key Policy**: Do NOT include a `cache_key_policy` block (e.g., trying to set `include_query_string = false`). Backend buckets do not support these arguments; query strings are automatically ignored by default when using `CACHE_ALL_STATIC`. + * **Default TTL Limits**: The `cdn_policy.default_ttl` cannot be greater than the `max_ttl` (which defaults to `86400`). You MUST cap `default_ttl` at `86400`. Do NOT use values like `2592000`. +3. **Backend Services (`google_compute_backend_service`)**: + * **Origin Header TTLs**: If using `cache_mode = "USE_ORIGIN_HEADERS"`, you MUST omit `default_ttl` and `client_ttl` from the `cdn_policy` block. Specifying them will cause validation errors. + * **Request Coalescing**: Avoid using `request_coalescing = true` inside the `cdn_policy` of backend services unless verified to be supported by the active provider version. +4. 
**Provider Version**: + * Always configure the `required_providers` block to use the Google provider version `~> 5.0` (or newer) to ensure rate-limiting features are correctly supported. + +--- + +## Workload Profile Map (The Source of Truth) + +| Workload Type | `enable_cdn` | `cdn_policy.cache_mode` | `default_ttl` | `cache_key_policy` | WAF Protection (Cloud Armor) | +| :--- | :--- | :--- | :--- | :--- | :--- | +| **Static Objects** | `true` | `CACHE_ALL_STATIC` | `86400` (1d) | Host + Protocol + Path (Ignore Query Strings) | None (Rate Limiting unsupported on CLOUD_ARMOR_EDGE) | +| **API (Cacheable)**| `true` | `USE_ORIGIN_HEADERS` | `3600` (1h) | **Include Query Strings** | OWASP (SQLi/XSS/LFI) + Rate Limit (100 RPM) | +| **API (Uncacheable)**| `false`| N/A | N/A | N/A | OWASP (SQLi/XSS/RCE/Session Fixation) + Strict Rate Limit (10-30 RPM) + Bot Management | +| **Dynamic Web** | `true` | `USE_ORIGIN_HEADERS` | `300` (5m) | Host + Protocol + Path (Bypass on session cookie) | OWASP (SQLi/XSS/CSRF/Shellshock) + Rate Limit (120 RPM) | + +--- + +## The Generation Workflow + +1. **Consume Spec:** Read the provided Design Spec carefully. +2. **Prepare Directory Structure:** Create the dedicated subdirectory `/gfe/deployments/[ARCHITECTURE_NAME]/`. +3. **Assemble HCL:** + * Generate a `variables.tf` and `terraform.tfvars` defining the input variables (`architecture_name`, `project_id`, `region`, etc.) to dynamically parameterize the blueprint. + * Generate the `terraform` and `provider` blocks in `main.tf`. + * For each origin, create the appropriate backend resource (`backend_bucket`, `backend_service`, or `region_network_endpoint_group`), dynamically prefixing the `name` field using the architecture name prefix, and injecting properties from the **Workload Profile Map**. + * Create the `google_compute_security_policy` resources for each backend using the WAF rules defined in the map, with dynamic prefixing. 
+ * Create the `google_compute_url_map` using the provided path and header-based routing rules. + * Create the frontend resources (`target_http_proxy` / `target_https_proxy` with `ssl_certificates`, and `global_forwarding_rule`), with dynamic prefixing. +4. **Output Code:** Provide the complete, finalized `main.tf`, `variables.tf`, and `terraform.tfvars` files to the user. Do not include conversational filler; focus on the technical integrity of the code. +5. **Hand-off:** Once the code is output, state the next action (Download Files or Deploy Configuration) and transition to `references/gfe-managed-deployment.md` (specifically Phase 1, Step 2, Option A) to guide the user through deployment pre-checks and execution. diff --git a/skills/cloud/gfe-main/references/gfe-terraform-module.md b/skills/cloud/gfe-main/references/gfe-terraform-module.md new file mode 100644 index 0000000000..ede7b68839 --- /dev/null +++ b/skills/cloud/gfe-main/references/gfe-terraform-module.md @@ -0,0 +1,443 @@ +# GFE HTTP Load Balancer & Cloud Armor Terraform Module Skill + +This skill defines how to design, generate, and manage Google Cloud Global Load Balancers and Cloud Armor Security Policies using official Google-managed Terraform modules: +* [terraform-google-lb-http](https://github.com/terraform-google-modules/terraform-google-lb-http) +* [terraform-google-cloud-armor](https://github.com/GoogleCloudPlatform/terraform-google-cloud-armor) + +Leveraging these modules ensures syntax compliance, minimizes generation volume, and standardizes security profile definitions. + +--- + +## 1. Cloud Armor Configuration Guideline + +Depending on the backend type, choose either a standard Backend Security Policy (`CLOUD_ARMOR`) or an Edge Security Policy (`CLOUD_ARMOR_EDGE`). + +### A. Backend Security Policy (`CLOUD_ARMOR`) +Supports rate-limiting, full OWASP preconfigured sets (WAF), reCAPTCHA Enterprise redirects/scores, threat intelligence lists, and auto-deploying Adaptive Protection. 
+
+```hcl
+module "security_policy_complete" {
+  source = "GoogleCloudPlatform/cloud-armor/google"
+  version = "~> 8.0"
+
+  project_id = var.project_id
+  name = "${var.architecture_name}-sec-policy"
+  description = "Unified corporate security policy"
+  default_rule_action = "allow"
+  type = "CLOUD_ARMOR"
+
+  # reCAPTCHA Enterprise redirection site key mapping
+  recaptcha_redirect_site_key = var.recaptcha_site_key
+
+  # Enable Layer 7 DDoS mitigation and Adaptive Protection
+  layer_7_ddos_defense_enable = true
+  layer_7_ddos_defense_rule_visibility = "STANDARD"
+
+  # 1. Standard IP Security Rules & Rate Limiting
+  security_rules = {
+    # Throttling / Rate-Limiting rules
+    "rate_limit_api" = {
+      action = "rate_based_ban"
+      priority = 1000
+      src_ip_ranges = ["*"]
+      rate_limit_options = {
+        exceed_action = "deny(429)"
+        rate_limit_http_request_count = 100
+        rate_limit_http_request_interval_sec = 60
+        ban_duration_sec = 600
+        enforce_on_key = "IP"
+      }
+    }
+
+    # reCAPTCHA challenge redirect action
+    "force_recaptcha_challenge" = {
+      action = "redirect"
+      priority = 1100
+      src_ip_ranges = ["190.217.68.0/24"] # Suspect IP ranges
+      redirect_type = "GOOGLE_RECAPTCHA"
+    }
+  }
+
+  # 2. Pre-configured WAF Rules with Sensitivity & Tuning/Exclusions
+  pre_configured_rules = {
+    "sqli_rule" = {
+      action = "deny(403)"
+      priority = 2000
+      target_rule_set = "sqli-v33-stable"
+      sensitivity_level = 2 # 0 to 4
+      # Exclude specific rule IDs causing false positives
+      exclude_target_rule_ids = ["owasp-crs-v030301-id942110-sqli", "owasp-crs-v030301-id942120-sqli"]
+    }
+    "xss_rule" = {
+      action = "deny(403)"
+      priority = 2001
+      target_rule_set = "xss-v33-stable"
+      sensitivity_level = 2
+    }
+  }
+
+  # 3. 
Threat Intelligence Feeds
+  threat_intelligence_rules = {
+    "block_tor_exits" = {
+      action = "deny(403)"
+      priority = 3000
+      feed = "iplist-tor-exit-nodes"
+      description = "Block all incoming traffic from known TOR exit nodes"
+    }
+    "block_malicious_ips" = {
+      action = "deny(403)"
+      priority = 3001
+      feed = "iplist-known-malicious-ips"
+      description = "Block known malicious IP addresses feed"
+    }
+  }
+
+  # 4. Adaptive Protection Auto-Deploy suggested rules
+  adaptive_protection_auto_deploy = {
+    "auto_mitigate_ddos" = {
+      enable = true
+      load_threshold = 0.8
+      confidence_threshold = 0.9
+      impacted_baseline_threshold = 0.01
+    }
+  }
+
+  # 5. Custom CEL Rules (e.g. reCAPTCHA Token Assessment, Region constraints)
+  custom_rules = {
+    "block_low_recaptcha_score" = {
+      action = "deny(403)"
+      priority = 4000
+      description = "Deny requests with poor reCAPTCHA scores"
+      expression = "token.recaptcha_session.score < 0.3"
+    }
+  }
+}
+```
+
+### B. Edge Security Policy (`CLOUD_ARMOR_EDGE`)
+Used primarily for GCS buckets to filter request access at the edge before caching or processing.
+
+```hcl
+module "security_policy_edge" {
+  source = "GoogleCloudPlatform/cloud-armor/google"
+  version = "~> 8.0"
+
+  project_id = var.project_id
+  name = "${var.architecture_name}-edge-policy"
+  description = "Edge security policy for Static Assets / GCS Buckets"
+  default_rule_action = "deny(403)"
+  type = "CLOUD_ARMOR_EDGE"
+
+  # Edge rules only support custom region / IP matching (no WAF or rate limiting)
+  custom_rules = {
+    "allow_trusted_regions" = {
+      action = "allow"
+      priority = 100
+      description = "Allow requests from trusted countries only"
+      expression = "['US', 'IN', 'GB'].contains(origin.region_code)"
+    }
+  }
+}
+```
+
+
+---
+
+## 2. HTTP Load Balancer Configuration Guideline
+
+The `gce-lb-http` module encapsulates the global forwarding rule, URL map, proxies, and backend services.
+
+### 2. 
Supported Backend & Network Endpoint Group (NEG) Types
+
+Depending on where your backend application is hosted, define the backend service group using one of the following patterns:
+
+#### A. Serverless NEG (Cloud Run / Cloud Functions)
+Used for serverless endpoints.
+```hcl
+resource "google_compute_region_network_endpoint_group" "serverless_neg" {
+  name = "${var.architecture_name}-serverless-neg"
+  network_endpoint_type = "SERVERLESS"
+  region = var.region
+  cloud_run {
+    service = var.cloud_run_service_name
+  }
+}
+```
+
+#### B. External / Internet NEG (Internet IP / FQDN)
+Used for endpoints located outside of Google Cloud (on other clouds or public internet).
+```hcl
+# NEG definition
+resource "google_compute_global_network_endpoint_group" "external_neg" {
+  name = "${var.architecture_name}-external-neg"
+  network_endpoint_type = "INTERNET_IP_PORT" # or "INTERNET_FQDN_PORT"
+  default_port = 80
+}
+
+# Endpoint binding
+resource "google_compute_global_network_endpoint" "external_endpoint" {
+  global_network_endpoint_group = google_compute_global_network_endpoint_group.external_neg.name
+  ip_address = var.external_ip_address # or fqdn if using INTERNET_FQDN_PORT
+  port = 80
+}
+```
+
+#### C. Private Service Connect (PSC) NEG
+Used to route traffic to service attachments published by other VPCs or Google services privately.
+* PSC NEGs use the `PRIVATE_SERVICE_CONNECT` type.
+* Requires linking to the producer's `service_attachment`.
+```hcl
+resource "google_compute_region_network_endpoint_group" "psc_neg" {
+  name = "${var.architecture_name}-psc-neg"
+  network_endpoint_type = "PRIVATE_SERVICE_CONNECT"
+  psc_target_service = var.producer_service_attachment_url
+  region = var.region
+  network = var.network_id
+  subnetwork = var.subnetwork_id
+}
+```
+
+#### D. Hybrid NEG (Non-GCP Private IP)
+Used to route traffic to on-premises resources or other clouds connected via Cloud VPN or Cloud Interconnect.
+```hcl
+resource "google_compute_network_endpoint_group" "hybrid_neg" {
+  name = "${var.architecture_name}-hybrid-neg"
+  network_endpoint_type = "NON_GCP_PRIVATE_IP_PORT"
+  network = var.network_id
+  default_port = 80
+  zone = var.zone
+}
+
+resource "google_compute_network_endpoint" "hybrid_endpoint" {
+  network_endpoint_group = google_compute_network_endpoint_group.hybrid_neg.name
+  ip_address = var.on_prem_ip_address
+  port = 80
+  zone = var.zone
+}
+```
+
+#### E. VM Managed Instance Groups (MIG)
+Used for classic virtual machine clusters running on GCE. No NEG is required; you link the Instance Group manager output directly.
+```hcl
+# Reference existing MIG or create a new one
+data "google_compute_instance_group" "vm_mig" {
+  name = var.mig_name
+  zone = var.zone
+}
+```
+
+
+### Module Instantiation
+Link backend services to respective NEGs and apply routing matching in the `url_map` block.
+
+```hcl
+module "gce_lb_http" {
+  source = "GoogleCloudPlatform/lb-http/google"
+  version = "~> 9.0"
+
+  project = var.project_id
+  name = var.architecture_name
+  http_forward = true
+
+  backends = {
+    # Default External Backend (API)
+    default = {
+      protocol = "HTTP"
+      enable_cdn = true
+      security_policy = module.security_policy_api.policy_id
+
+      cdn_policy = {
+        cache_mode = "USE_ORIGIN_HEADERS"
+        default_ttl = 3600
+        cache_key_policy = {
+          include_query_string = true
+        }
+      }
+
+      groups = [
+        {
+          group = google_compute_global_network_endpoint_group.external_neg.id
+        }
+      ]
+    }
+
+    # GCS Bucket Backend
+    gcs = {
+      protocol = "HTTP"
+      enable_cdn = true
+
+      cdn_policy = {
+        cache_mode = "CACHE_ALL_STATIC"
+        default_ttl = 2592000
+      }
+
+      groups = [
+        {
+          group = "projects/${var.project_id}/global/backendBuckets/${var.architecture_name}-gcs-backend"
+        }
+      ]
+    }
+
+    # Serverless Backend
+    serverless = {
+      protocol = "HTTP"
+      enable_cdn = true
+      security_policy = module.security_policy_api.policy_id
+
+      cdn_policy = {
+        cache_mode = "CACHE_ALL_STATIC"
+        default_ttl = 
2592000
+      }
+
+      groups = [
+        {
+          group = google_compute_region_network_endpoint_group.serverless_neg.id
+        }
+      ]
+    }
+  }
+
+  # Custom Path Routing Matchers
+  url_map = {
+    default_service = "default"
+    path_matchers = {
+      main-matcher = {
+        default_service = "default"
+        path_rules = [
+          {
+            paths = ["/api", "/api/*"]
+            service = "default"
+          },
+          {
+            paths = ["/video", "/video/*", "/docs", "/docs/*"]
+            service = "gcs"
+          },
+          {
+            paths = ["/storage", "/storage/*"]
+            service = "serverless"
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+---
+
+## 3. Advanced Use Cases & Scenarios
+
+### A. HTTP to HTTPS Redirect
+To enforce SSL/TLS encryption across your load balancer:
+* Set `ssl = true`.
+* Provide certificates using either `ssl_certificates = [...]` or `certificate_map = ...`.
+* Set `https_redirect = true` to automatically spin up a port-80 forwarding rule that redirects all incoming HTTP requests to HTTPS.
+
+```hcl
+module "gce_lb_http" {
+  source = "GoogleCloudPlatform/lb-http/google"
+  version = "~> 9.0"
+
+  project = var.project_id
+  name = var.architecture_name
+  ssl = true
+  http_forward = true
+  https_redirect = true
+
+  # SSL Configuration options
+  ssl_certificates = [google_compute_ssl_certificate.cert.self_link]
+  # OR Certificate Manager Map
+  # certificate_map = google_certificate_manager_certificate_map.default.id
+
+  # ... backends ...
+}
+```
+
+### B. Granular CDN Caching Policies
+For fine-tuned control over cache keys, TTL values, and cache bypass headers:
+* Set `cache_mode` to either `CACHE_ALL_STATIC`, `USE_ORIGIN_HEADERS`, or `FORCE_CACHE_ALL`.
+* Customize client, default, and max TTLs (in seconds).
+* Construct a `cache_key_policy` to strip query strings or protocol headers if needed.
+* Provide a list of `bypass_cache_on_request_headers` (e.g., `["Pragma", "Authorization"]`).
+
+```hcl
+backends = {
+  gcs_cdn = {
+    protocol = "HTTP"
+    enable_cdn = true
+
+    cdn_policy = {
+      cache_mode = "CACHE_ALL_STATIC"
+      default_ttl = 86400 # 1 day
+      client_ttl = 3600 # 1 hour
+      max_ttl = 604800 # 7 days
+      serve_while_stale = 86400
+
+      cache_key_policy = {
+        include_host = true
+        include_protocol = true
+        include_query_string = false # ignore query parameters for cache matching
+      }
+
+      bypass_cache_on_request_headers = [
+        "bypass-cdn",
+        "pragma"
+      ]
+    }
+    # ... groups ...
+  }
+}
+```
+
+### C. Cross-Project Backends (Shared VPC)
+When the Load Balancer frontend is in a Host Project, but the actual backend VM clusters or NEGs reside in a Service Project:
+* Set `project` explicitly inside the specific backend definition block.
+* Define `firewall_projects` and `firewall_networks` so that the module configures firewall access in the Host Project.
+
+```hcl
+module "gce_lb_http" {
+  source = "GoogleCloudPlatform/lb-http/google"
+  version = "~> 9.0"
+
+  project = var.host_project_id
+  name = var.architecture_name
+  firewall_projects = [var.host_project_id]
+  firewall_networks = [var.network_name]
+
+  backends = {
+    service_backend = {
+      # Points this backend service to the service project
+      project = var.service_project_id
+      protocol = "HTTP"
+      groups = [
+        {
+          group = "projects/${var.service_project_id}/zones/${var.zone}/instanceGroups/${var.mig_name}"
+        }
+      ]
+    }
+  }
+}
+```
+
+### D. Multi-MIG Load Balancing (Regional Failover / Anycast)
+To distribute traffic to instance groups across multiple zones or regions:
+* List multiple group references under the `groups` array of the backend configuration.
+* Google's global load balancer will automatically route users to the closest healthy group (Anycast routing).
+
+```hcl
+backends = {
+  web_servers = {
+    protocol = "HTTP"
+    groups = [
+      {
+        # Group in Zone A
+        group = data.google_compute_instance_group.mig_zone_a.id
+      },
+      {
+        # Group in Zone B (Failover / Geo-Routing Target)
+        group = data.google_compute_instance_group.mig_zone_b.id
+      }
+    ]
+  }
+}
+```
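+
+The `mig_zone_a` and `mig_zone_b` data sources referenced in the multi-MIG backend are not declared in that snippet. A minimal sketch of how they might be declared, mirroring the `vm_mig` lookup in section E and assuming both instance groups already exist. The variables `var.mig_name_a`, `var.mig_name_b`, `var.zone_a`, and `var.zone_b` are hypothetical names introduced here for illustration; they are not inputs of the `lb-http` module.
+
+```hcl
+# Hypothetical lookups for the two existing instance groups.
+# Variable names below are illustrative assumptions only.
+data "google_compute_instance_group" "mig_zone_a" {
+  name = var.mig_name_a
+  zone = var.zone_a
+  project = var.project_id
+}
+
+data "google_compute_instance_group" "mig_zone_b" {
+  name = var.mig_name_b
+  zone = var.zone_b
+  project = var.project_id
+}
+```
+
+If the groups live in different regions, only the `zone` values should need to change; the `groups` list in the backend stays the same.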