Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 15697Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 15697" |
…219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
d224200 to
4956a53
Compare
The controller depends on IAzureProvisioningOptionsManager which is only registered in run mode. Moving the controller registration inside the run-mode block fixes the DI resolution failure in publish/test scenarios. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r fallback AzureProvisioner is resolved in all modes as an eventing subscriber but depends on AzureProvisioningController. Register the controller and a no-op IAzureProvisioningOptionsManager unconditionally so DI succeeds in publish/test mode. In run mode, RunModeProvisioningContextProvider overrides the no-op via AddSingleton (registered before TryAdd). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
|
|
@copilot fix the failing test |
I investigated all Azure test classes locally and all tests pass (BicepProvisionerTests, EnvironmentResourceExtensionsTests, ResourcePreparerTests, ProvisioningContextProviderTests, DeployerTests, BicepResourceTests — 178+ tests total). I was unable to access the CI logs to identify the specific failing test due to DNS restrictions in this environment. The previous two commits (95026c9, 7bbaf1f) addressed DI resolution failures for Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Remove explicit References.Add for role assignment resources — the dependency is already discovered through bicep template parameters. The extra reference caused duplicate parameter keys in the manifest. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
🎬 CLI E2E Test Recordings — 52 recordings uploaded (commit View recordings
📹 Recordings uploaded automatically from CI run #23757560248 |
|
/deployment-test |
|
🚀 Deployment tests starting on PR #15697... This will deploy to real Azure infrastructure. Results will be posted here when complete. |
|
❌ Deployment E2E Tests failed — 21 passed, 8 failed, 0 cancelled View test results and recordings
|
karolz-ms
left a comment
There was a problem hiding this comment.
Provided some comments--hope this helps.
The hardest part of writing a Kubernetes-like controller is dealing both model changes and real-world changes. Each can drift independently, and each might be inconsistent with the request you have just dequeued (e.g. you might receive a startup request for a resource that is already running etc.) But I think the AzureProvisioningController is fortunately much simpler: the model is static, and the primary goal of the controller is to serialize asynchronous operations. I have no major concerns over this approach here.
| return false; | ||
| } | ||
|
|
||
| private async Task<object?> EnqueueOperationAsync( |
There was a problem hiding this comment.
The name of this method seems misleading. I would expect it puts something in the operations queue, then returns. In fact, it enqueues the item and then waits for the operation to complete. The name should reflect the second part.
|
|
||
| internal static ImmutableArray<EnvironmentCommandDefinition> EnvironmentCommandDefinitions { get; } = | ||
| [ | ||
| new( |
There was a problem hiding this comment.
Do you think all these commands could be implemented in more OO-way, maybe even 😬 DI-injected?
| return false; | ||
| } | ||
|
|
||
| private async Task<object?> EnqueueOperationAsync( |
There was a problem hiding this comment.
You could also consider making this truly just "enqueue opteration", and then RunOperationAsync() could wait on the operation completion task and return the result.
| { | ||
| using var periodic = new PeriodicTimer(DriftCheckInterval, timeProvider); | ||
|
|
||
| while (await periodic.WaitForNextTickAsync(stoppingToken).ConfigureAwait(false)) |
There was a problem hiding this comment.
Not sure if this makes a difference here, but for periodic work I usually schedule a single-period wait after the periodic job completes. This way the break between jobs is constant and not dependent on how long each instance of a job ran.
|
|
||
| private void EnsureOperationLoopStarted() | ||
| { | ||
| if (Interlocked.CompareExchange(ref _operationLoopStarted, 1, 0) == 0) |
There was a problem hiding this comment.
We now have a small utility for this 😄 https://github.com/microsoft/aspire/blob/main/src/Aspire.Hosting/Utils/ConcurrencyUtils.cs#L10
|
|
||
| foreach (var resource in GetProvisionableAzureResources(model)) | ||
| { | ||
| if (!ShouldCheckForDrift(resource.Resource) || |
There was a problem hiding this comment.
Should we do this in parallel?
| CollectProvisioningDependencies(dependencies, parameter); | ||
| } | ||
|
|
||
| foreach (var reference in resource.References) |
There was a problem hiding this comment.
Do we also need to wait for references of references etc? (transitive closure)
Can GetResourceDependenciesAsync() help here?
| CollectProvisioningDependencies(dependencies, reference); | ||
| } | ||
|
|
||
| foreach (var dependency in dependencies) |
There was a problem hiding this comment.
There is probably a way to use Task.WhenAll() here instead of this loop. Apart from being more concise, the provisioning will fail faster (on first failed dependency).
| dependencies.Add(azureResource); | ||
| } | ||
|
|
||
| if (value is IValueWithReferences valueWithReferences) |
There was a problem hiding this comment.
Again, GetResourceDependenciesAsync() is probably what you want.
| IconVariant.Regular, | ||
| IsHighlighted: true), | ||
| new( | ||
| AzureEnvironmentCommand.ChangeAzureContext, |
There was a problem hiding this comment.
Maybe this is not the right place, but we should think about timeouts for different kinds of provisioning operations. E.g. it probably does not make sense to let resetting the provisioning state get stuck for minutes.
Description
This PR introduces
AzureProvisioningController, a serialized control loop that coordinates all run-mode Azure provisioning operations. It replaces the inline provisioning logic that previously lived inAzureProvisionerwith a channel-based queue that serializes startup provisioning, dashboard commands, CLI commands, and background drift detection through a single processing loop.Controller architecture
The controller uses a
Channel<QueuedOperation>with a single reader. Every operation — provision, reprovision, reset, change-location, change-context, delete, drift-check — is modeled as a typed intent record that gets enqueued and processed one at a time. This eliminates races between concurrent dashboard commands, CLI commands, and the periodic drift monitor.Within a provisioning pass, individual resources fan out concurrently but are ordered by dependency. Each resource gets a per-resource
ProvisioningTaskCompletionSourcethat downstream resources await before starting their own deployment. The TCS is completed through exactly two paths (CompleteProvisioning/FailProvisioning), so dependents unblock as soon as their prerequisites finish rather than waiting for the entire batch.What the provisioning stack can do now
Resource commands (per-resource):
Environment commands (all resources):
Background drift detection:
Azure resource metadata:
azure.subscription.id,azure.resource.group,azure.tenant.id,azure.tenant.domain,azure.location, andresource.source(full ARM deployment id)Location overrides:
InvalidResourceLocationconflictsOther changes
BicepProvisioner— hardened checksum reuse validation, unified Azure identity metadata across fresh/cached paths, predicted deployment-id stamping for failed resourcesRunModeProvisioningContextProvider— refactored Azure context acquisition and interactive promptingAzureResourcePreparer— wires per-resource commands into the app modelAzureProvisioningControllerin run mode (fixes DI failures in publish/test scenarios)Test coverage
Checklist
<remarks />and<code />elements on your triple slash comments?aspire.devissue: