
License validation still deadlocks under cold-start thread-pool starvation in 16.1.1 (Task.Run sync-over-async, follow-up to #4612) #4640

Description

@matthoneycutt-sm

TL;DR

The Task.Run(...).GetAwaiter().GetResult() sync-over-async wrapper added in #4613 (shipped in 16.1.1) does not eliminate the license-validation deadlock. Under cold-start thread-pool starvation it makes the deadlock more likely: the validation now needs a free pool thread to complete, and the calling thread blocks waiting for it while holding the DI singleton-build lock. We captured a full production memory dump on 16.1.1 showing the whole app convoyed behind that one lock with the CPU idle.

Environment

  • AutoMapper: 16.1.1 (includes "Better async handling of license validation"; fixes #4612 #4613)
  • License: valid commercial key configured via cfg.LicenseKey
  • Runtime: ASP.NET Core API on .NET, Azure App Service (IIS in-process / w3wp)
  • DI container: Lamar. IMapper / MapperConfiguration registered as singletons built lazily on first resolve
  • Config size: large, with 1,100+ mapped types and ~615 custom mapping classes built via reflection during MapperConfiguration construction

Relationship to #4612 / #4613

#4612 reported AutoMapper.dll getting locked during Web Deploy on 16.0.0. You traced it to sync-over-async in LicenseAccessor and switched to Task.Run in #4613, shipped in 16.1.1. This issue reports that the Task.Run change reduced the problem but didn't eliminate it. Since #4612 is closed and locked, I'm filing a new issue for the same root cause.

The current 16.1.1 implementation in src/AutoMapper/Licensing/LicenseAccessor.cs:

var validateResult = Task.Run(() => handler.ValidateTokenAsync(licenseKey, parms))
    .GetAwaiter()
    .GetResult();

together with Current => _license ??= Initialize();, where Initialize() calls ValidateKey during the first MapperConfiguration construction.

Expected behavior

A cold-starting instance with a valid commercial license takes traffic without wedging, regardless of how much concurrent load arrives before the MapperConfiguration singleton is built.

Actual behavior

When a deploy cold-starts every instance simultaneously straight into live traffic, all instances wedge at once: CPU sits idle (~12%) while every request thread is parked waiting on the singleton-build lock, which is held by the one thread blocked inside ValidateKey. The instances recover only after a long delay (slow pool thread injection) or an instance recycle.

Why Task.Run(...).GetResult() still deadlocks under pool starvation

  1. Every inbound request resolves IMapper, forcing the lazy singleton build of MapperConfiguration on first hit under Lamar's singleton-build lock.
  2. That build calls LicenseAccessor.ValidateKey, which does Task.Run(() => handler.ValidateTokenAsync(...)).GetAwaiter().GetResult().
  3. Task.Run queues the validation work onto the thread pool and .GetResult() blocks the calling thread until that work item completes, all while the calling thread holds the singleton-build lock.
  4. During cold start the pool is already saturated by the inbound request burst (every request is parked needing IMapper), with no idle threads and only slow hill-climbing thread injection. The queued Task.Run work item therefore can't be scheduled.
  5. The lock holder stays blocked until the pool eventually injects a thread that can run the work item, and every other request convoys behind the singleton lock in the meantime. The whole process stalls.

The Task.Run form is actually worse than a plain in-place .GetResult() here. It introduces a dependency on pool-thread availability for the very work whose result is blocking a pool thread, which is a thread-pool-starvation deadlock. The blocking call sits under a process-wide singleton lock, so it takes the whole app down rather than one request.
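To make steps 3 to 5 concrete, here is a stdlib-only C# toy model of the mechanism, not a repro of AutoMapper or Lamar. It is a sketch under stated assumptions: a fixed-size FIFO worker pool stands in for the .NET thread pool and deliberately never injects threads (the real pool does, slowly, which is why instances eventually recover), a plain lock stands in for Lamar's singleton-build lock, and the blocking wait has a timeout only so the demo terminates (the real GetResult() has none). BuildUnderBurst, RunOnPool, poolSize and burst are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(BuildUnderBurst(poolSize: 4, burst: 4, TimeSpan.FromMilliseconds(500))); // False
Console.WriteLine(BuildUnderBurst(poolSize: 5, burst: 4, TimeSpan.FromSeconds(5)));        // True

// Toy model (hypothetical): returns true if the queued "validation" finished while the
// builder still held the singleton lock, false if the builder's wait timed out.
static bool BuildUnderBurst(int poolSize, int burst, TimeSpan timeout)
{
    // Fixed-size FIFO pool standing in for a saturated thread pool; it never adds threads.
    var queue = new BlockingCollection<Action>();
    for (int i = 0; i < poolSize; i++)
    {
        var worker = new Thread(() => { foreach (var work in queue.GetConsumingEnumerable()) work(); })
        {
            IsBackground = true
        };
        worker.Start();
    }

    // Analogue of Task.Run: queue work and return a Task that completes once a worker runs it.
    Task<T> RunOnPool<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        queue.Add(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    using var allArrived = new CountdownEvent(burst);
    var singletonLock = new object(); // stands in for the DI container's singleton-build lock
    bool? validatedInTime = null;     // set once, by whichever request builds the "singleton"
    var requests = new Task[burst];

    for (int i = 0; i < burst; i++)
    {
        requests[i] = RunOnPool(() =>
        {
            // The whole request burst occupies workers before the singleton exists.
            allArrived.Signal();
            allArrived.Wait();
            lock (singletonLock)
            {
                if (validatedInTime is null)
                {
                    // Sync-over-async under the lock: queue validation onto the same pool, then block.
                    Task<string> validation = RunOnPool(() => "license ok");
                    validatedInTime = validation.Wait(timeout);
                }
            }
            return 0;
        });
    }

    Task.WaitAll(requests);
    queue.CompleteAdding(); // let the workers drain the remaining queue and exit
    return validatedInTime == true;
}
```

With poolSize equal to burst, every worker is holding or waiting on the lock when the validation is queued, so nothing can run it and the builder's wait times out. With one spare worker the same code completes immediately, which mirrors why the wedge doesn't reproduce when the singleton is built with pool threads to spare.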

Stack trace (verbatim from clrstack, lock holder, top-down)

System.Threading.Monitor.Wait(System.Object, Int32)
System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken)
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(...)
System.Runtime.CompilerServices.TaskAwaiter`1[[System.__Canon, System.Private.CoreLib]].GetResult()
AutoMapper.Licensing.LicenseAccessor.ValidateKey(System.String)
AutoMapper.Licensing.LicenseAccessor.Initialize()
AutoMapper.Licensing.LicenseAccessor.get_Current()
AutoMapper.MapperConfiguration..ctor(AutoMapper.MapperConfigurationExpression, Microsoft.Extensions.Logging.ILoggerFactory)
Microsoft.Extensions.DependencyInjection.ServiceCollectionExtensions+<>c.<AddAutoMapperClasses>b__10_4(System.IServiceProvider)
Lamar.IoC.Resolvers.SingletonLambdaResolver`2[...].Build(Lamar.IoC.Scope)
Lamar.IoC.Resolvers.SingletonResolver`1[...].Resolve(Lamar.IoC.Scope)
Lamar.IoC.Scope.GetInstance(System.Type)
... ASP.NET Core MVC controller activation ...

The other ~524 threads are all blocked one level up in Lamar...SingletonResolver.Resolve / Lamar.IoC.Scope.GetInstance, waiting on the same lock.

Dump evidence (1.4 GB w3wp full dump, 16.1.1)

  • 557 managed threads.
  • Thread pool: Workers Total 526, Running 526, Idle 0, Min 100, Max 32767. CPU is at 12%: every worker is occupied, yet almost none is doing work.
  • Exactly one contended lock: SyncBlock Index 57, MonitorHeld 1049 (1 owner + 524 waiters), owned by the single thread above.
  • Convoy counts (threads whose stack contains the frame):
    • Lamar...SingletonResolver: 525
    • Lamar.IoC.Scope.GetInstance: 525
    • <AddAutoMapperClasses> lambda: 526
    • LicenseAccessor: 3, ValidateKey: 1, Monitor.Wait: 4

One thread is building the singleton (blocked in ValidateKey), and 524 are waiting on its lock.

Requested fix

License validation appears to be local JWT validation with no network I/O, so it shouldn't need to be async at all on this path. Any of these would address it:

  1. Use the synchronous validation path. JsonWebTokenHandler has a synchronous ValidateToken. You noted in #4612 (AutoMapper 16.0.0 – Web Deploy sometimes fails with locked AutoMapper.dll) that it's marked obsolete, but for local-only validation a synchronous call removes the sync-over-async hazard entirely. It is preferable to Task.Run(...).GetResult(), which can't make forward progress under pool starvation.
  2. Move validation off the construction / singleton-lock path. Validate lazily or deferred so it never runs the blocking wait while holding the DI singleton-build lock.
  3. Document this footgun and a recommended warm-up. Forcing an eager build is already possible (resolve IMapper at startup before the app takes traffic, which is the workaround we used). The gap is guidance: nothing warns that the default lazy-singleton build runs license validation under the DI container's build lock, or recommends warming it on startup when you're on a lazy-resolving container like Lamar. A short note in the licensing/registration docs would save the next team this dump.
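Until a fix ships, the warm-up workaround from option 3 looks roughly like this in a minimal-hosting Program.cs. It is a sketch, assuming the AutoMapper 16.x, Lamar.Microsoft.DependencyInjection and ASP.NET Core packages are referenced; the license key string and typeof(Program) as the profile-scan marker are placeholders, and your existing AddAutoMapper registration should be used in place of the one shown.

```csharp
// Program.cs: sketch of the eager warm-up (assumed packages: AutoMapper 16.x,
// Lamar.Microsoft.DependencyInjection, ASP.NET Core). Key and marker type are placeholders.
using AutoMapper;
using Lamar.Microsoft.DependencyInjection;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
builder.Host.UseLamar(); // lazy-resolving container, as in the report

builder.Services.AddAutoMapper(cfg => cfg.LicenseKey = "<your license key>", typeof(Program));
builder.Services.AddControllers();

var app = builder.Build();

// Warm-up: build MapperConfiguration (and run license validation) on the startup thread,
// while the pool is idle and no request is holding or waiting on the singleton-build lock.
app.Services.GetRequiredService<IMapper>();

app.MapControllers();
app.Run(); // the server starts accepting requests only after the warm-up has returned
```

The point of the design is ordering, not speed: the blocking validation still happens, but it runs before any request can pile up behind the container's lock.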

On reproduction

This is a cold-start + immediate-load race: it does not reproduce when the singleton is built quietly with pool threads to spare, so we don't have a minimal deterministic Gist. What we do have is the full 16.1.1 production dump above, which pins the mechanism. I can share more from the dump (SOS output, the full clrstack -all). And if you put a candidate fix in a pre-release, we can run it in our environment and report back whether the cold-start wedge still happens, since our deploys reproduce it reliably.
