docs(spec): private networking for host: microsoft.foundry #8690
m5i-work wants to merge 9 commits into
Add a technical design spec for a declarative network: block on the host: microsoft.foundry service in azure.yaml. Covers the byo/managed VNet surface, the primitives captured from the service team's standard network-secured agent template, the additive synthesizer/main.bicep changes, validation, brownfield precedence, and the ACR-private-networking RoI decision (BYO image in v1, defer local-build-into-private-ACR). Doc-only; no product code changes. Lands as a follow-on to the bicep-less provisioning and unified azure.yaml efforts.
- Problem: shorten the §1.4 quote; add 'Why the block sits on the service' justifying the service-level (project-shaped) declaration vs the account-level VNet binding, including 1:1 account/project and multi-project behavior.
- §3: replace the per-PR table with a brief overview + a tree of the major in-flight efforts.
- Remove the §5 primitives table; retain the template-15 link (and sibling specs) under a new References section. Preserve the load-bearing account-flip and DNS-zone facts in the synthesizer section. Renumber §6-§10 → §5-§9.
- Note the Solution YAML is one illustrative scenario (BYO VNet), not the full schema; point to §4 for the field reference.
- Rewrite Scope to high-level features, dropping ARM/yaml field-level detail (networkInjections scenario, DNS zone names, --image).
Pull request overview
Adds a new design spec documenting a proposed network: block for host: microsoft.foundry services in azure.yaml, intended as an implementation contract for follow-on code changes to the Foundry bicep-less synthesizer.
Changes:
- Introduces a new technical design spec describing the `network:` schema surface (`byo` vs `managed`) and tri-state create/reference semantics.
- Documents intended synthesizer/template impacts (new modules, parameters, `main.arm.json` regeneration) and validation/brownfield precedence.
- Captures v1 scope decisions (BYO image for ACR story; defers private ACR build path) plus telemetry/docs follow-ups and open questions.
📋 Prioritization Note
Thanks for the contribution! The linked issue isn't in the current milestone yet.
Join hard-wrapped prose paragraphs and list items into single lines so the rendered text reflows to the reader's width. Code blocks, the PR-stack tree, YAML examples, and tables are preserved verbatim.
- Drop the §N prefixes so all section titles are consistently unnumbered; convert internal cross-references to section names (external §1.4/§2.1 references to the unify spec are unchanged).
- Remove the Open questions list and retitle the section to 'Telemetry and docs'.
Replace the terse schema snippet with the fully-commented version that documents mode/byo/managed/dns inline, including the subnet tri-state rules and the DNS create-vs-reference behavior.
Avoid implying the literal 192.168.x.x prefixes are required; the surface section documents the schema, so use the <CIDR> placeholder notation there. The Solution section keeps concrete values as an illustrative scenario.
Replace the earlier mode/byo/managed network model with the approved flat network block:
- network present always means private data plane
- peSubnet is required and drives the account private endpoint
- agentSubnet presence derives BYO vs Microsoft-managed egress
- isolationMode is valid only for managed egress
- subnet prefix presence controls create vs reference
- dns references central/private zones when specified

Also document update constraints and remove stale PR dependency language.
jongio left a comment
Spec is clean. The egress derivation from agentSubnet presence (instead of a mode enum) is a good call; fewer states, less validation surface.
One gap worth noting in the Update behavior section: the table documents unsupported topology mutations (BYO ↔ managed switch, relaxing isolation mode), but the validation pipeline section only covers structural/static checks against the current azure.yaml. The failure path for these mutations isn't specified:
- Does azd detect the mutation by comparing config against deployed state and fail pre-synthesis?
- Or does it pass through to ARM and let the Foundry RP reject it?
If the intent is "let ARM handle it in v1," a one-liner in the Update section noting that azd doesn't validate topology transitions against live state (so errors come from the RP, not azd) would save implementers a design question and set user expectations.
Otherwise this is ready to go as an implementation contract.
VNet spec update needed after #8779
#8779 is merged, so the private networking spec should use the split resource-service shape as the baseline. The VNet feature is not shipped yet, so we should not preserve
Please update this spec PR (no force-push required from me) as follows:
Contract
infra:
  provider: microsoft.foundry
services:
  my-agent:
    host: azure.ai.agent
    uses:
      - ai-project
    image: myprivacr.azurecr.io/agents/my-agent:v1
  ai-project:
    host: azure.ai.project
    deployments:
      - name: gpt-4.1-mini
        model: { format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14" }
        sku: { name: GlobalStandard, capacity: 10 }
    network:
      agentSubnet:
        vnet: ${AZURE_VNET_ID}
        name: agent-subnet
        prefix: 192.168.10.0/24
      peSubnet:
        vnet: ${AZURE_VNET_ID}
        name: pe-subnet
        prefix: 192.168.11.0/24
      dns:
        resourceGroup: rg-private-dns
        subscription: ${AZURE_DNS_SUBSCRIPTION_ID}
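For readers skimming the contract, here is a rough sketch of how a parsed `network:` block like the one above could be interpreted under the flat-block rules described in this PR. It is illustrative only and not the synthesizer's code: `derive_network_plan` and the shape of the returned dictionary are hypothetical, and the mapping "`prefix` present means create, absent means reference" is an assumption about the direction of the create-vs-reference rule.

```python
from typing import Any, Dict


def derive_network_plan(network: Dict[str, Any]) -> Dict[str, Any]:
    """Derive egress mode and create/reference actions from a parsed network: block.

    Hypothetical helper; the returned dict is an illustrative shape only.
    """
    if "peSubnet" not in network:
        raise ValueError("network.peSubnet is required when network: is declared")

    agent_subnet = network.get("agentSubnet")
    byo_egress = agent_subnet is not None  # agentSubnet presence derives BYO egress

    def subnet_action(subnet: Dict[str, Any]) -> str:
        # Assumption: a prefix means "create this subnet", no prefix means
        # "reference an existing subnet".
        return "create" if "prefix" in subnet else "reference"

    dns = network.get("dns") or {}
    plan: Dict[str, Any] = {
        # network: present always means a private data plane.
        "publicNetworkAccess": "Disabled",
        "useMicrosoftManagedNetwork": not byo_egress,
        "peSubnet": subnet_action(network["peSubnet"]),
        # dns.resourceGroup references central zones; omitted dns creates/links zones.
        "dns": "reference" if dns.get("resourceGroup") else "create",
    }
    if byo_egress:
        plan["agentSubnet"] = subnet_action(agent_subnet)
    return plan
```

With the contract above, this sketch would report BYO egress, both subnets as `create`, and DNS as `reference`; dropping `agentSubnet` flips it to Microsoft-managed egress.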
Specific spec edits
I opened the code/docs PR for this reparenting here: #8809
jongio left a comment
The spec documents network: on host: microsoft.foundry, but #8779 merged the split resource-service shape. All examples, the title, and the "Why the block sits on the service" rationale need updating for host: azure.ai.project before this can merge. Your reparenting plan in the comment covers the right set of edits.
Two items for the reparented version (see inline comments for details):
- The update behavior section lists unsupported topology mutations but doesn't specify who enforces them (azd pre-synthesis check vs. ARM/RP rejection). Clarifying this prevents an implementer design question.
- The brownfield section could note the diagnostic gap when `endpoint:` points to a network-secured account: connectivity errors won't mention private networking as the cause.
| Relax managed `isolationMode` or disable managed network after enabling it | Not supported by the service. Create a new environment/account instead. |
| Switch BYO egress ↔ managed egress after agents/capability host exist | Not a v1 contract. Create a new environment/account instead. |

The docs should present network topology changes as provision-time configuration, with update support limited to safe/idempotent re-provisioning and service-supported tightening.
This says "update support limited to safe/idempotent re-provisioning and service-supported tightening" but doesn't say who enforces the "not supported" transitions in rows 4-5 of the table. If azd doesn't validate topology transitions against live state (letting ARM/the RP fail), say so explicitly here. If azd should detect and reject pre-synthesis, that's a design requirement the synthesizer implementation needs to cover.
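To make the first option concrete, a pre-synthesis check might look roughly like the sketch below. Everything here is hypothetical: the spec does not say how azd would read deployed state, so `deployed` is simply assumed to be a previously recorded `network:` block, and only the BYO ↔ managed switch is covered. Relaxing `isolationMode` is left out because the ordering of isolation modes is not stated in this thread.

```python
from typing import Any, Dict, List, Optional


def egress_kind(network: Optional[Dict[str, Any]]) -> str:
    """Classify a network: block as 'public', 'byo', or 'managed' egress."""
    if network is None:
        return "public"  # no network: block means a public data plane
    return "byo" if network.get("agentSubnet") else "managed"


def unsupported_transitions(
    deployed: Optional[Dict[str, Any]],
    desired: Optional[Dict[str, Any]],
) -> List[str]:
    """Return human-readable problems for topology changes azd would reject.

    Hypothetical check: `deployed` stands in for whatever live or recorded
    state an implementation would actually compare against.
    """
    problems: List[str] = []
    before, after = egress_kind(deployed), egress_kind(desired)
    if {before, after} == {"byo", "managed"}:
        problems.append(
            f"switching egress from {before} to {after} is not a v1 contract; "
            "create a new environment/account instead"
        )
    return problems
```

If the intent is instead to let the RP reject these transitions, none of this is needed and a one-line note in the Update section is enough.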
`endpoint:` on the service already short-circuits synthesis: the synthesizer returns `ErrEndpointBrownfield` and the provider connects to the existing project without provisioning. A network-secured account reached this way is **already** network-bound by whoever created it.

Therefore, when `endpoint:` is present, `network:` is **ignored** (the account's network posture is fixed and not azd's to change). This is documented as explicit precedence: `endpoint:` wins, and a project that wants azd to manage its network posture must be greenfield (no `endpoint:`). If both are present, azd warns that `network:` has no effect in brownfield mode.
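The precedence rule quoted above can be summarized in a few lines. This is a minimal sketch under stated assumptions, not the provider's implementation: `resolve_provisioning_mode` is a hypothetical name, the service is assumed to be an already parsed mapping, and warnings are returned as plain strings rather than routed through azd's real output path.

```python
from typing import Any, Dict, List, Tuple


def resolve_provisioning_mode(service: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Apply the documented precedence: endpoint: wins over network:.

    Returns the mode ('brownfield' or 'greenfield') and any warnings.
    """
    warnings: List[str] = []
    if service.get("endpoint"):
        if "network" in service:
            # network: is ignored because the existing account's posture is fixed.
            warnings.append("network: has no effect in brownfield mode (endpoint: is set)")
        return "brownfield", warnings
    return "greenfield", warnings
```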
When the brownfield account has publicNetworkAccess: Disabled and the developer's machine isn't on the VNet, azd will fail with a connectivity error that won't mention private networking as the cause. Worth noting whether v1 should detect this condition (e.g., check the account's publicNetworkAccess property after resolving endpoint:) and surface a diagnostic hint, or whether that's deferred.
Resolves #8165
Summary
Doc-only PR. Adds `docs/specs/foundry-private-network/spec.md`, a technical design spec for declarative private networking on the `host: microsoft.foundry` service in `azure.yaml`. No product code changes: the spec is the implementation contract; code PRs follow.

The unified `azure.yaml` design (§1.4) explicitly defers VNet binding (a Foundry Account setting) to the built-in Bicep work. This spec is that path: a flat `network:` block the existing bicep-less synthesizer reads to provision a network-bound account.

What changed in this update
Stakeholders approved the final `azure.yaml` surface. This PR now reflects that shape:
- `network:` present means private data plane always (`publicNetworkAccess: Disabled` + account private endpoint). Omit `network:` for public.
- `peSubnet` is required whenever `network:` is declared.
- `agentSubnet` presence derives egress:
  - present: BYO egress (`useMicrosoftManagedNetwork: false`)
  - absent: Microsoft-managed egress (`useMicrosoftManagedNetwork: true`)
- `isolationMode` is valid only when `agentSubnet` is absent.
- `prefix` presence controls subnet create vs reference.
- `dns` is optional; omitted means create/link zones, while `dns.resourceGroup` references central/shared zones.

What the spec covers
- `network:` surface: `agentSubnet`, `peSubnet`, `dns`, and managed-egress `isolationMode`; no `mode: byo | managed` enum.
- Synthesizer and `main.bicep` changes (high level), including regenerating the precompiled `main.arm.json` fallback.
- Validation: `peSubnet` required, subnet VNet/name required, CIDR validation, same-VNet v1 constraint, `isolationMode` only for managed egress, brownfield precedence (`endpoint:` wins).
- ACR: BYO image (`--image`) in v1; defer local-build-into-private-ACR.

Relationship to in-flight work
Lands as a follow-on into the `huimiu/foundry-azure-yaml` feature branch after the synthesizer (#8643) merges. Rides `ServiceConfig.AdditionalProperties`, so no core change beyond the schema slice. The BYO-image flow and the merged remote-build skip for VNet-injected accounts keep ACR off the critical path. See §3 for the in-flight work overview.

Out of scope
Public opt-out under `network:`, cross-VNet v1 topologies, BYO stores + capability-host wiring, tool subnet, UAMI, CMK, and local-build-into-private-ACR are explicitly excluded from v1.
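As a rough illustration of the static validation rules listed in this description (required `peSubnet`, required subnet VNet/name, CIDR validation, the same-VNet v1 constraint, and `isolationMode` only for managed egress), the checks could be sketched as below. This is an assumption-laden sketch, not the spec's validation pipeline: `validate_network` is a hypothetical name, `isolationMode` is assumed to sit directly under `network:`, the VNet comparison is a plain string comparison, and the error strings are invented for the example.

```python
import ipaddress
from typing import Any, Dict, List


def validate_network(network: Dict[str, Any]) -> List[str]:
    """Run structural checks on a parsed network: block and collect errors."""
    errors: List[str] = []
    pe = network.get("peSubnet")
    agent = network.get("agentSubnet")

    if not pe:
        errors.append("network.peSubnet is required")

    for key, subnet in (("peSubnet", pe), ("agentSubnet", agent)):
        if not subnet:
            continue
        for field in ("vnet", "name"):
            if not subnet.get(field):
                errors.append(f"network.{key}.{field} is required")
        prefix = subnet.get("prefix")
        if prefix is not None:
            try:
                # strict=True rejects prefixes with host bits set, e.g. 10.0.0.1/24.
                ipaddress.ip_network(str(prefix), strict=True)
            except ValueError:
                errors.append(f"network.{key}.prefix is not a valid CIDR: {prefix!r}")

    # Same-VNet constraint for v1 (assumed to be a simple ID comparison).
    if pe and agent and pe.get("vnet") and agent.get("vnet") and pe["vnet"] != agent["vnet"]:
        errors.append("network.agentSubnet and network.peSubnet must be in the same VNet in v1")

    # isolationMode only applies to managed egress, i.e. when agentSubnet is absent.
    if agent and "isolationMode" in network:
        errors.append("network.isolationMode is valid only when agentSubnet is absent")

    return errors
```

Collecting every error in one pass, rather than failing on the first, is a design choice made here for the example so a user could fix all `azure.yaml` problems in a single edit.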