docs(spec): private networking for host: microsoft.foundry #8690

Open
m5i-work wants to merge 9 commits into main from m5i/foundry-private-network-spec

Conversation

@m5i-work m5i-work commented Jun 17, 2026

Member

Resolves #8165

Summary

Doc-only PR. Adds docs/specs/foundry-private-network/spec.md, a technical design spec for declarative private networking on the host: microsoft.foundry service in azure.yaml. No product code changes — the spec is the implementation contract; code PRs follow.

The unified azure.yaml design (§1.4) explicitly defers VNet binding (a Foundry Account setting) to the built-in Bicep work. This spec is that path: a flat network: block the existing bicep-less synthesizer reads to provision a network-bound account.

What changed in this update

Stakeholders approved the final azure.yaml surface. This PR now reflects that shape:

  • network: present means private data plane always (publicNetworkAccess: Disabled + account private endpoint). Omit network: for public.
  • peSubnet is required whenever network: is declared.
  • agentSubnet presence derives egress:
    • present ⇒ BYO/customer subnet (useMicrosoftManagedNetwork: false)
    • absent ⇒ Microsoft-managed egress (useMicrosoftManagedNetwork: true)
  • isolationMode is valid only when agentSubnet is absent.
  • prefix presence controls subnet create vs reference:
    • present ⇒ create subnet
    • absent ⇒ reference existing subnet
  • dns is optional; omitted means create/link zones, while dns.resourceGroup references central/shared zones.
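
As an illustration only, the presence-based rules above can be sketched in Python. This is not the synthesizer's code: `NetworkPlan`, `derive_plan`, and the plan's field names are hypothetical, and the input is assumed to be the already-parsed `network:` mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkPlan:
    """Hypothetical summary of what a network: block implies."""
    public_network_access: str
    use_microsoft_managed_network: bool
    create_agent_subnet: Optional[bool]  # None when no agentSubnet is declared
    create_pe_subnet: bool
    dns_mode: str  # "create" or "reference"


def derive_plan(network: Optional[dict]) -> Optional[NetworkPlan]:
    """Derive provisioning intent purely from which keys are present."""
    if network is None:
        return None  # network: omitted means a public account

    agent = network.get("agentSubnet")
    pe = network["peSubnet"]  # required whenever network: is declared
    dns = network.get("dns")

    return NetworkPlan(
        public_network_access="Disabled",
        # agentSubnet absent => Microsoft-managed egress
        use_microsoft_managed_network=agent is None,
        # prefix present => create the subnet, absent => reference an existing one
        create_agent_subnet=None if agent is None else "prefix" in agent,
        create_pe_subnet="prefix" in pe,
        # dns.resourceGroup => reference central/shared zones
        dns_mode="reference" if dns and dns.get("resourceGroup") else "create",
    )
```

The point of the sketch is that no `mode` enum is needed: every decision falls out of key presence.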

What the spec covers

  • Flat network: surface: agentSubnet, peSubnet, dns, and managed-egress isolationMode; no mode: byo | managed enum.
  • Primitives from the service team's templates — what azd keeps, drops (managed stores, caphost wiring, tool subnet, UAMI, CMK), and defers (ACR).
  • Additive synthesizer / main.bicep changes (high level), including regenerating the precompiled main.arm.json fallback.
  • Validation: peSubnet required, subnet VNet/name required, CIDR validation, same-VNet v1 constraint, isolationMode only for managed egress, brownfield precedence (endpoint: wins).
  • Update behavior — re-provision is incremental, but topology changes are limited by service mutability; tightening managed isolation is supported, relaxing/switching egress is not a v1 contract.
  • ACR private-networking RoI decision — BYO image (--image) in v1; defer local-build-into-private-ACR.
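
For reviewers who want a concrete picture of the static validation rules listed above, here is a minimal Python sketch. It is an assumption-laden illustration, not the spec's pipeline: `validate_network` is a hypothetical name, the error strings are invented, and `isolationMode` is assumed to sit directly under `network:`.

```python
import ipaddress
from typing import List


def validate_network(network: dict) -> List[str]:
    """Return human-readable errors for a parsed network: mapping."""
    errors: List[str] = []
    pe = network.get("peSubnet")
    agent = network.get("agentSubnet")

    if pe is None:
        errors.append("peSubnet is required when network: is declared")

    for key, subnet in (("peSubnet", pe), ("agentSubnet", agent)):
        if subnet is None:
            continue
        for field in ("vnet", "name"):
            if not subnet.get(field):
                errors.append(f"{key}.{field} is required")
        prefix = subnet.get("prefix")
        if prefix is not None:
            try:
                # strict=True also rejects prefixes with host bits set
                ipaddress.ip_network(str(prefix), strict=True)
            except ValueError:
                errors.append(f"{key}.prefix is not a valid CIDR: {prefix}")

    # v1 constraint: both subnets must live in the same VNet
    if pe and agent and pe.get("vnet") != agent.get("vnet"):
        errors.append("agentSubnet and peSubnet must be in the same VNet in v1")

    # isolationMode only applies to Microsoft-managed egress
    if agent is not None and "isolationMode" in network:
        errors.append("isolationMode is valid only when agentSubnet is absent")

    return errors
```

Collecting all errors instead of failing on the first one lets a user fix the whole block in one pass.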

Relationship to in-flight work

Lands as a follow-on into the huimiu/foundry-azure-yaml feature branch after the synthesizer (#8643) merges. Rides ServiceConfig.AdditionalProperties, so no core change beyond the schema slice. The BYO-image flow and the merged remote-build skip for VNet-injected accounts keep ACR off the critical path. See §3 for the in-flight work overview.

Out of scope

Public opt-out under network:, cross-VNet v1 topologies, BYO stores + capability-host wiring, tool subnet, UAMI, CMK, and local-build-into-private-ACR are explicitly excluded from v1.

Add a technical design spec for a declarative network: block on the
host: microsoft.foundry service in azure.yaml. Covers the byo/managed
VNet surface, the primitives captured from the service team's standard
network-secured agent template, the additive synthesizer/main.bicep
changes, validation, brownfield precedence, and the ACR-private-networking
RoI decision (BYO image in v1, defer local-build-into-private-ACR).

Doc-only; no product code changes. Lands as a follow-on to the bicep-less
provisioning and unified azure.yaml efforts.
Comment thread docs/specs/foundry-private-network/spec.md Outdated
Comment thread docs/specs/foundry-private-network/spec.md Outdated
Comment thread docs/specs/foundry-private-network/spec.md Outdated
- Problem: shorten the §1.4 quote; add 'Why the block sits on the service'
  justifying the service-level (project-shaped) declaration vs the account-level
  VNet binding, including 1:1 account/project and multi-project behavior.
- §3: replace the per-PR table with a brief overview + a tree of the major
  in-flight efforts.
- Remove the §5 primitives table; retain the template-15 link (and sibling
  specs) under a new References section. Preserve the load-bearing account-flip
  and DNS-zone facts in the synthesizer section. Renumber §6-§10 → §5-§9.
Comment thread docs/specs/foundry-private-network/spec.md
Comment thread docs/specs/foundry-private-network/spec.md
- Note the Solution YAML is one illustrative scenario (BYO VNet), not the full
  schema; point to §4 for the field reference.
- Rewrite Scope to high-level features, dropping ARM/yaml field-level detail
  (networkInjections scenario, DNS zone names, --image).
@m5i-work m5i-work marked this pull request as ready for review June 17, 2026 06:36
Copilot AI review requested due to automatic review settings June 17, 2026 06:36

Copilot AI left a comment

Contributor


Pull request overview

Adds a new design spec documenting a proposed network: block for host: microsoft.foundry services in azure.yaml, intended as an implementation contract for follow-on code changes to the Foundry bicep-less synthesizer.

Changes:

  • Introduces a new technical design spec describing the network: schema surface (byo vs managed) and tri-state create/reference semantics.
  • Documents intended synthesizer/template impacts (new modules, parameters, main.arm.json regeneration) and validation/brownfield precedence.
  • Captures v1 scope decisions (BYO image for ACR story; defers private ACR build path) plus telemetry/docs follow-ups and open questions.

Comment thread docs/specs/foundry-private-network/spec.md Outdated
Comment thread docs/specs/foundry-private-network/spec.md Outdated
Comment thread docs/specs/foundry-private-network/spec.md Outdated
Comment thread docs/specs/foundry-private-network/spec.md Outdated
Comment thread docs/specs/foundry-private-network/spec.md Outdated
github-actions Bot commented Jun 17, 2026


📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @RickWinter or @kristenwomack if you'd like to discuss prioritization.

m5i-work added 5 commits June 17, 2026 14:48
Join hard-wrapped prose paragraphs and list items into single lines so the
rendered text reflows to the reader's width. Code blocks, the PR-stack tree,
YAML examples, and tables are preserved verbatim.

The unify-azure-yaml and bicepless-foundry specs are not yet in the repo
(they live in their own docs PRs), so the relative links would break on
merge. Point them at the source PRs (#8590, #8577) instead. Add the GitHub
handles hund030/huimiu to the cspell ignore directive.

- Drop the §N prefixes so all section titles are consistently unnumbered;
  convert internal cross-references to section names (external §1.4/§2.1
  references to the unify spec are unchanged).
- Remove the Open questions list and retitle the section to 'Telemetry and
  docs'.

Replace the terse schema snippet with the fully-commented version that
documents mode/byo/managed/dns inline, including the subnet tri-state rules
and the DNS create-vs-reference behavior.

Avoid implying the literal 192.168.x.x prefixes are required; the surface
section documents the schema, so use the <CIDR> placeholder notation there.
The Solution section keeps concrete values as an illustrative scenario.

Replace the earlier mode/byo/managed network model with the approved flat
network block:
- network present always means private data plane
- peSubnet is required and drives the account private endpoint
- agentSubnet presence derives BYO vs Microsoft-managed egress
- isolationMode is valid only for managed egress
- subnet prefix presence controls create vs reference
- dns references central/private zones when specified

Also document update constraints and remove stale PR dependency language.

@jongio jongio left a comment

Member


Spec is clean. The egress derivation from agentSubnet presence (instead of a mode enum) is a good call; fewer states, less validation surface.

One gap worth noting in the Update behavior section: the table documents unsupported topology mutations (BYO ↔ managed switch, relaxing isolation mode), but the validation pipeline section only covers structural/static checks against the current azure.yaml. The failure path for these mutations isn't specified:

  • Does azd detect the mutation by comparing config against deployed state and fail pre-synthesis?
  • Or does it pass through to ARM and let the Foundry RP reject it?

If the intent is "let ARM handle it in v1," a one-liner in the Update section noting that azd doesn't validate topology transitions against live state (so errors come from the RP, not azd) would save implementers a design question and set user expectations.

Otherwise this is ready to go as an implementation contract.

@m5i-work

Member Author

VNet spec update needed after #8779

#8779 is merged, so the private networking spec should use the split resource-service shape as the baseline. The VNet feature is not shipped yet, so we should not preserve network: on azure.ai.agent or microsoft.foundry service hosts.

Please update this spec PR (no force-push required from me) as follows:

Contract

network: belongs only on the host: azure.ai.project service:

infra:
  provider: microsoft.foundry

services:
  my-agent:
    host: azure.ai.agent
    uses:
      - ai-project
    image: myprivacr.azurecr.io/agents/my-agent:v1

  ai-project:
    host: azure.ai.project
    deployments:
      - name: gpt-4.1-mini
        model: { format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14" }
        sku: { name: GlobalStandard, capacity: 10 }
    network:
      agentSubnet:
        vnet: ${AZURE_VNET_ID}
        name: agent-subnet
        prefix: 192.168.10.0/24
      peSubnet:
        vnet: ${AZURE_VNET_ID}
        name: pe-subnet
        prefix: 192.168.11.0/24
      dns:
        resourceGroup: rg-private-dns
        subscription: ${AZURE_DNS_SUBSCRIPTION_ID}

infra.provider: microsoft.foundry remains the provisioning provider dispatch key. host: microsoft.foundry should not be the documented VNet service shape.

Specific spec edits

  • Retitle from Private networking for host: microsoft.foundry to Private networking for host: azure.ai.project.
  • Rewrite “Why the block sits on the service” to say azure.ai.project is the project/account provisioning boundary. azure.ai.agent owns agent deploy config and depends on the project via uses:.
  • Change all YAML examples from monolithic microsoft.foundry to split azure.ai.agent + azure.ai.project.
  • In Scope/Out of scope, explicitly say network: on azure.ai.agent and microsoft.foundry is not part of the shipped VNet contract.
  • In the azure.yaml surface section, say network: is a sibling of endpoint: and deployments: on host: azure.ai.project, not a sibling of agents: on the old monolithic service.
  • In synthesizer/provider sections, replace “no structural provider/eject change” with: provisioning and eject must discover the single azure.ai.project service and synthesize endpoint:, deployments:, and network: from that service.
  • In brownfield, say endpoint: on the azure.ai.project service wins and network: is ignored with a warning.
  • In telemetry, say provision.network_mode is derived from services.<project>.network where host: azure.ai.project.
  • In docs/env-var wording, keep AZURE_VNET_ID and AZURE_DNS_SUBSCRIPTION_ID as example user-chosen ${VAR} names, not fixed env vars read by azd.

I opened the code/docs PR for this reparenting here: #8809

@jongio jongio left a comment

Member


The spec documents network: on host: microsoft.foundry, but #8779 merged the split resource-service shape. All examples, the title, and the "Why the block sits on the service" rationale need updating for host: azure.ai.project before this can merge. Your reparenting plan in the comment covers the right set of edits.

Two items for the reparented version (see inline comments for details):

  1. The update behavior section lists unsupported topology mutations but doesn't specify who enforces them (azd pre-synthesis check vs. ARM/RP rejection). Clarifying this prevents an implementer design question.

  2. The brownfield section could note the diagnostic gap when endpoint: points to a network-secured account: connectivity errors won't mention private networking as the cause.

| Relax managed `isolationMode` or disable managed network after enabling it | Not supported by the service. Create a new environment/account instead. |
| Switch BYO egress ↔ managed egress after agents/capability host exist | Not a v1 contract. Create a new environment/account instead. |

The docs should present network topology changes as provision-time configuration, with update support limited to safe/idempotent re-provisioning and service-supported tightening.

Member


This says "update support limited to safe/idempotent re-provisioning and service-supported tightening" but doesn't say who enforces the "not supported" transitions in rows 4-5 of the table. If azd doesn't validate topology transitions against live state (letting ARM/the RP fail), say so explicitly here. If azd should detect and reject pre-synthesis, that's a design requirement the synthesizer implementation needs to cover.
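
If the answer turns out to be pre-synthesis detection, the check for the BYO ↔ managed row could be as small as the following Python sketch. Everything here is hypothetical: `egress_kind` and `check_transition` are illustrative names, and the "deployed" input assumes azd can recover the previously applied `network:` shape, which the spec does not currently state.

```python
from typing import Optional


def egress_kind(network: Optional[dict]) -> str:
    """Classify a network: mapping by the presence of agentSubnet."""
    if network is None:
        return "public"
    return "byo" if "agentSubnet" in network else "managed"


def check_transition(deployed: Optional[dict], desired: Optional[dict]) -> Optional[str]:
    """Return an error message for an unsupported egress switch, else None."""
    before, after = egress_kind(deployed), egress_kind(desired)
    if {before, after} == {"byo", "managed"}:
        return (
            f"switching egress {before} -> {after} is not a v1 contract; "
            "create a new environment/account instead"
        )
    return None
```

The isolation-mode row is left out on purpose: ordering "tighter" versus "looser" modes needs the service's list of values, which this thread does not give.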


`endpoint:` on the service already short-circuits synthesis — the synthesizer returns `ErrEndpointBrownfield` and the provider connects to the existing project without provisioning. A network-secured account reached this way is **already** network-bound by whoever created it.

Therefore, when `endpoint:` is present, `network:` is **ignored** (the account's network posture is fixed and not azd's to change). This is documented as explicit precedence: `endpoint:` wins, and a project that wants azd to manage its network posture must be greenfield (no `endpoint:`). If both are present, azd warns that `network:` has no effect in brownfield mode.
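
The precedence rule can be illustrated with a short Python sketch. It is not the synthesizer (which the text above says returns `ErrEndpointBrownfield`); `resolve_mode` and the warning text are hypothetical, and `service` is assumed to be the parsed service mapping from azure.yaml.

```python
import warnings


def resolve_mode(service: dict) -> str:
    """Decide brownfield vs greenfield; endpoint: always wins over network:."""
    if service.get("endpoint"):
        if "network" in service:
            # network: is ignored, so tell the user rather than failing
            warnings.warn(
                "network: has no effect in brownfield mode; endpoint: takes precedence"
            )
        return "brownfield"
    return "greenfield"
```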

Member


When the brownfield account has publicNetworkAccess: Disabled and the developer's machine isn't on the VNet, azd will fail with a connectivity error that won't mention private networking as the cause. Worth noting whether v1 should detect this condition (e.g., check the account's publicNetworkAccess property after resolving endpoint:) and surface a diagnostic hint, or whether that's deferred.
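
A rough Python sketch of what such a hint could look like, assuming azd already holds the account's properties as a mapping after resolving `endpoint:` (how it would fetch them is out of scope here). `private_network_hint` and the message wording are hypothetical.

```python
from typing import Optional


def private_network_hint(account_properties: dict) -> Optional[str]:
    """Return a diagnostic hint when the account's data plane is private."""
    if account_properties.get("publicNetworkAccess") == "Disabled":
        return (
            "The account has publicNetworkAccess: Disabled; connectivity "
            "failures may mean this machine is not on the account's VNet."
        )
    return None
```

The hint would be appended to the connectivity error rather than replacing it, so the original failure detail stays visible.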

Development

Successfully merging this pull request may close these issues.

Allow azd ai agent to use a private-VNet Foundry account (provision new or reuse an existing account)

4 participants