Fix deadlock in dd-task-scheduler #10096
base: master
// Load JFR Handlers class early, if present (it has been moved and renamed in JDK23+).
// This prevents a deadlock. See PROF-13025.
try {
  Class.forName("jdk.jfr.events.Handlers");
We touch this class only when JFR is available, so loading the Handlers class should not disturb anything.
However, smap entry events are disabled by default, yet we are still seeing these issues. I suspect that we initialize the support regardless of whether it is enabled. Could you check the PROFILING_SMAP_COLLECTION_ENABLED setting before merging this PR?
This seems to be slightly more complex. The smap event is registered in datadog.trace.bootstrap.Agent, where we probably don't have easy access to PROFILING_SMAP_COLLECTION_ENABLED (not sure), and the same issue might also affect other events. I filed PROF-13213 for a deeper investigation. I would merge this PR as-is. Is that OK?
LGTM!
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 6 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.082 s) : 0, 1081716
Total [baseline] (8.765 s) : 0, 8764931
Agent [candidate] (1.09 s) : 0, 1090032
Total [candidate] (8.808 s) : 0, 8807646
section iast
Agent [baseline] (1.23 s) : 0, 1230255
Total [baseline] (9.45 s) : 0, 9450437
Agent [candidate] (1.231 s) : 0, 1231407
Total [candidate] (9.513 s) : 0, 9513442
gantt
title insecure-bank - break down per module: candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.194 ms) : 0, 1194
crashtracking [candidate] (1.201 ms) : 0, 1201
BytebuddyAgent [baseline] (649.735 ms) : 0, 649735
BytebuddyAgent [candidate] (655.361 ms) : 0, 655361
GlobalTracer [baseline] (281.599 ms) : 0, 281599
GlobalTracer [candidate] (283.977 ms) : 0, 283977
AppSec [baseline] (32.372 ms) : 0, 32372
AppSec [candidate] (32.566 ms) : 0, 32566
Debugger [baseline] (67.764 ms) : 0, 67764
Debugger [candidate] (67.824 ms) : 0, 67824
Remote Config [baseline] (651.031 µs) : 0, 651
Remote Config [candidate] (672.405 µs) : 0, 672
Telemetry [baseline] (9.06 ms) : 0, 9060
Telemetry [candidate] (9.085 ms) : 0, 9085
Flare Poller [baseline] (3.75 ms) : 0, 3750
Flare Poller [candidate] (3.778 ms) : 0, 3778
section iast
crashtracking [baseline] (1.219 ms) : 0, 1219
crashtracking [candidate] (1.204 ms) : 0, 1204
BytebuddyAgent [baseline] (797.141 ms) : 0, 797141
BytebuddyAgent [candidate] (797.58 ms) : 0, 797580
GlobalTracer [baseline] (256.873 ms) : 0, 256873
GlobalTracer [candidate] (257.043 ms) : 0, 257043
IAST [baseline] (27.128 ms) : 0, 27128
IAST [candidate] (27.135 ms) : 0, 27135
AppSec [baseline] (35.485 ms) : 0, 35485
AppSec [candidate] (35.527 ms) : 0, 35527
Debugger [baseline] (64.482 ms) : 0, 64482
Debugger [candidate] (65.011 ms) : 0, 65011
Remote Config [baseline] (576.234 µs) : 0, 576
Remote Config [candidate] (588.615 µs) : 0, 589
Telemetry [baseline] (8.361 ms) : 0, 8361
Telemetry [candidate] (8.418 ms) : 0, 8418
Flare Poller [baseline] (3.438 ms) : 0, 3438
Flare Poller [candidate] (3.478 ms) : 0, 3478
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.083 s) : 0, 1083315
Total [baseline] (10.945 s) : 0, 10944743
Agent [candidate] (1.083 s) : 0, 1082663
Total [candidate] (10.859 s) : 0, 10858558
section appsec
Agent [baseline] (1.266 s) : 0, 1265549
Total [baseline] (11.156 s) : 0, 11156354
Agent [candidate] (1.266 s) : 0, 1265663
Total [candidate] (11.17 s) : 0, 11170336
section iast
Agent [baseline] (1.226 s) : 0, 1225814
Total [baseline] (11.193 s) : 0, 11192946
Agent [candidate] (1.226 s) : 0, 1225944
Total [candidate] (11.194 s) : 0, 11193584
section profiling
Agent [baseline] (1.204 s) : 0, 1204112
Total [baseline] (10.965 s) : 0, 10964741
Agent [candidate] (1.219 s) : 0, 1218988
Total [candidate] (11.045 s) : 0, 11045026
gantt
title petclinic - break down per module: candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.205 ms) : 0, 1205
BytebuddyAgent [baseline] (649.782 ms) : 0, 649782
BytebuddyAgent [candidate] (649.662 ms) : 0, 649662
GlobalTracer [baseline] (282.516 ms) : 0, 282516
GlobalTracer [candidate] (281.857 ms) : 0, 281857
AppSec [baseline] (32.294 ms) : 0, 32294
AppSec [candidate] (32.489 ms) : 0, 32489
Debugger [baseline] (68.464 ms) : 0, 68464
Debugger [candidate] (68.254 ms) : 0, 68254
Remote Config [baseline] (668.768 µs) : 0, 669
Remote Config [candidate] (649.949 µs) : 0, 650
Telemetry [baseline] (9.033 ms) : 0, 9033
Telemetry [candidate] (9.166 ms) : 0, 9166
Flare Poller [baseline] (3.803 ms) : 0, 3803
Flare Poller [candidate] (3.909 ms) : 0, 3909
section appsec
crashtracking [baseline] (1.21 ms) : 0, 1210
crashtracking [candidate] (1.195 ms) : 0, 1195
BytebuddyAgent [baseline] (690.238 ms) : 0, 690238
BytebuddyAgent [candidate] (690.333 ms) : 0, 690333
GlobalTracer [baseline] (259.575 ms) : 0, 259575
GlobalTracer [candidate] (259.343 ms) : 0, 259343
IAST [baseline] (24.518 ms) : 0, 24518
IAST [candidate] (24.474 ms) : 0, 24474
AppSec [baseline] (174.416 ms) : 0, 174416
AppSec [candidate] (173.89 ms) : 0, 173890
Debugger [baseline] (66.144 ms) : 0, 66144
Debugger [candidate] (67.053 ms) : 0, 67053
Remote Config [baseline] (743.29 µs) : 0, 743
Remote Config [candidate] (746.015 µs) : 0, 746
Telemetry [baseline] (9.158 ms) : 0, 9158
Telemetry [candidate] (9.147 ms) : 0, 9147
Flare Poller [baseline] (3.987 ms) : 0, 3987
Flare Poller [candidate] (4.042 ms) : 0, 4042
section iast
crashtracking [baseline] (1.224 ms) : 0, 1224
crashtracking [candidate] (1.201 ms) : 0, 1201
BytebuddyAgent [baseline] (792.639 ms) : 0, 792639
BytebuddyAgent [candidate] (792.396 ms) : 0, 792396
GlobalTracer [baseline] (255.354 ms) : 0, 255354
GlobalTracer [candidate] (255.17 ms) : 0, 255170
IAST [baseline] (26.985 ms) : 0, 26985
IAST [candidate] (26.902 ms) : 0, 26902
AppSec [baseline] (35.493 ms) : 0, 35493
AppSec [candidate] (35.625 ms) : 0, 35625
Debugger [baseline] (66.149 ms) : 0, 66149
Debugger [candidate] (66.603 ms) : 0, 66603
Remote Config [baseline] (578.467 µs) : 0, 578
Remote Config [candidate] (595.638 µs) : 0, 596
Telemetry [baseline] (8.471 ms) : 0, 8471
Telemetry [candidate] (8.523 ms) : 0, 8523
Flare Poller [baseline] (3.529 ms) : 0, 3529
Flare Poller [candidate] (3.514 ms) : 0, 3514
section profiling
crashtracking [baseline] (1.191 ms) : 0, 1191
crashtracking [candidate] (1.201 ms) : 0, 1201
BytebuddyAgent [baseline] (701.371 ms) : 0, 701371
BytebuddyAgent [candidate] (711.156 ms) : 0, 711156
GlobalTracer [baseline] (220.342 ms) : 0, 220342
GlobalTracer [candidate] (223.832 ms) : 0, 223832
AppSec [baseline] (32.232 ms) : 0, 32232
AppSec [candidate] (32.963 ms) : 0, 32963
Debugger [baseline] (68.477 ms) : 0, 68477
Debugger [candidate] (68.826 ms) : 0, 68826
Remote Config [baseline] (617.885 µs) : 0, 618
Remote Config [candidate] (631.826 µs) : 0, 632
Telemetry [baseline] (8.875 ms) : 0, 8875
Telemetry [candidate] (8.971 ms) : 0, 8971
Flare Poller [baseline] (3.771 ms) : 0, 3771
Flare Poller [candidate] (3.856 ms) : 0, 3856
ProfilingAgent [baseline] (97.8 ms) : 0, 97800
ProfilingAgent [candidate] (97.606 ms) : 0, 97606
Profiling [baseline] (98.371 ms) : 0, 98371
Profiling [candidate] (98.178 ms) : 0, 98178
Load
Summary: Found 0 performance improvements and 3 performance regressions! Performance is the same for 17 metrics, 16 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section baseline
no_agent (1.205 ms) : 1193, 1217
. : milestone, 1205,
iast (3.32 ms) : 3274, 3367
. : milestone, 3320,
iast_FULL (6.033 ms) : 5972, 6093
. : milestone, 6033,
iast_GLOBAL (3.696 ms) : 3648, 3744
. : milestone, 3696,
profiling (2.039 ms) : 2020, 2058
. : milestone, 2039,
tracing (1.778 ms) : 1763, 1793
. : milestone, 1778,
section candidate
no_agent (1.2 ms) : 1188, 1211
. : milestone, 1200,
iast (3.268 ms) : 3225, 3311
. : milestone, 3268,
iast_FULL (5.846 ms) : 5788, 5905
. : milestone, 5846,
iast_GLOBAL (3.68 ms) : 3625, 3734
. : milestone, 3680,
profiling (2.035 ms) : 2015, 2054
. : milestone, 2035,
tracing (1.815 ms) : 1800, 1831
. : milestone, 1815,
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section baseline
no_agent (16.883 ms) : 16721, 17045
. : milestone, 16883,
appsec (18.339 ms) : 18151, 18527
. : milestone, 18339,
code_origins (17.77 ms) : 17595, 17944
. : milestone, 17770,
iast (17.659 ms) : 17484, 17834
. : milestone, 17659,
profiling (18.661 ms) : 18474, 18847
. : milestone, 18661,
tracing (17.675 ms) : 17499, 17850
. : milestone, 17675,
section candidate
no_agent (18.398 ms) : 18202, 18595
. : milestone, 18398,
appsec (19.498 ms) : 19304, 19693
. : milestone, 19498,
code_origins (18.167 ms) : 17983, 18351
. : milestone, 18167,
iast (18.307 ms) : 18123, 18491
. : milestone, 18307,
profiling (18.377 ms) : 18194, 18560
. : milestone, 18377,
tracing (17.812 ms) : 17637, 17987
. : milestone, 17812,
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section baseline
no_agent (14.965 s) : 14965000, 14965000
. : milestone, 14965000,
appsec (14.424 s) : 14424000, 14424000
. : milestone, 14424000,
iast (17.981 s) : 17981000, 17981000
. : milestone, 17981000,
iast_GLOBAL (18.115 s) : 18115000, 18115000
. : milestone, 18115000,
profiling (14.683 s) : 14683000, 14683000
. : milestone, 14683000,
tracing (14.839 s) : 14839000, 14839000
. : milestone, 14839000,
section candidate
no_agent (15.648 s) : 15648000, 15648000
. : milestone, 15648000,
appsec (14.649 s) : 14649000, 14649000
. : milestone, 14649000,
iast (18.148 s) : 18148000, 18148000
. : milestone, 18148000,
iast_GLOBAL (18.238 s) : 18238000, 18238000
. : milestone, 18238000,
profiling (14.618 s) : 14618000, 14618000
. : milestone, 14618000,
tracing (14.808 s) : 14808000, 14808000
. : milestone, 14808000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.57.0-SNAPSHOT~6bdc537296, baseline=1.57.0-SNAPSHOT~5c5592a6f6
dateFormat X
axisFormat %s
section baseline
no_agent (1.478 ms) : 1466, 1489
. : milestone, 1478,
appsec (2.461 ms) : 2410, 2513
. : milestone, 2461,
iast (2.221 ms) : 2156, 2286
. : milestone, 2221,
iast_GLOBAL (2.271 ms) : 2205, 2336
. : milestone, 2271,
profiling (2.076 ms) : 2023, 2128
. : milestone, 2076,
tracing (2.048 ms) : 1997, 2099
. : milestone, 2048,
section candidate
no_agent (1.479 ms) : 1467, 1490
. : milestone, 1479,
appsec (2.468 ms) : 2416, 2520
. : milestone, 2468,
iast (2.212 ms) : 2148, 2276
. : milestone, 2212,
iast_GLOBAL (2.267 ms) : 2202, 2332
. : milestone, 2267,
profiling (2.077 ms) : 2025, 2129
. : milestone, 2077,
tracing (2.059 ms) : 2008, 2110
. : milestone, 2059,
What Does This Do
This is a workaround for a potential deadlock in JFR. The deadlock has been reported upstream and a fix has been proposed there, but we should also fix it in our own code for cases where it runs on an unpatched JDK.
Motivation
This change fixes a deadlock in the profiling agent.
Additional Notes
We load the JFR Handlers class early. This prevents the deadlock that can happen when we call EventType.getEventType() from the SmapEventFactory static initializer. That JFR code first takes the Utils class lock and later tries to acquire the Handlers class initialization lock. If another thread is inside the Handlers static initializer at that moment, it will in turn try to acquire the Utils class lock, and the two threads deadlock. Running the Handlers static initializer early avoids that scenario by ensuring that the Handlers class is fully initialized before we call into the JFR code that takes the Utils lock.
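The ordering idea can be illustrated with a minimal, single-threaded sketch. It is not the JFR code: `UTILS_LOCK`, `HandlersStandIn` and `registerEvent` are hypothetical stand-ins for the Utils class lock, the Handlers class and the JFR call path. It only shows that the single-argument `Class.forName` runs a class's static initializer, so the initializer has already completed by the time the lock is taken.

```java
import java.util.ArrayList;
import java.util.List;

public class EagerInitSketch {
  // Records the order in which things happen.
  static final List<String> EVENTS = new ArrayList<>();

  // Hypothetical stand-in for the Utils class lock.
  static final Object UTILS_LOCK = new Object();

  // Hypothetical stand-in for the Handlers class.
  static class HandlersStandIn {
    static {
      EVENTS.add("HandlersStandIn initialized");
    }

    static void touch() {}
  }

  // Stand-in for the code path that holds the lock and then needs the class.
  static void registerEvent() {
    synchronized (UTILS_LOCK) {
      EVENTS.add("inside lock");
      // Without the eager step in run(), class initialization would be
      // triggered here, while the lock is held.
      HandlersStandIn.touch();
    }
  }

  static List<String> run() throws ClassNotFoundException {
    // Single-argument Class.forName loads AND initializes the class.
    // The class literal on its own does not trigger initialization.
    Class.forName(HandlersStandIn.class.getName());
    registerEvent();
    return EVENTS;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run());
  }
}
```

Because the static initializer runs before the synchronized block is entered, no thread following this path ever waits for class initialization while holding the lock, which is the property the workaround relies on.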
The problem is only present in JDKs < 22; starting with JDK22, the Handlers class has been renamed and moved. That is why the class is loaded reflectively, and any errors while loading it are silently ignored.
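A tolerant reflective load of this kind might look like the sketch below. The helper name `initializeIfPresent` and the choice of caught exception types are assumptions made for illustration; the PR snippet only shows the `Class.forName("jdk.jfr.events.Handlers")` call inside a `try` block.

```java
public class OptionalClassInit {
  // Hypothetical helper: loads and initializes the named class if it exists.
  // Returns false instead of failing when the class is missing or cannot be
  // linked, which is the expected outcome on JDKs where it was moved.
  static boolean initializeIfPresent(String className) {
    try {
      Class.forName(className);
      return true;
    } catch (ClassNotFoundException | LinkageError ignored) {
      // Nothing to do: on such a JDK there is nothing to preload.
      return false;
    }
  }

  public static void main(String[] args) {
    // The result for the JFR class depends on the JDK the sketch runs on.
    System.out.println(initializeIfPresent("jdk.jfr.events.Handlers"));
    System.out.println(initializeIfPresent("no.such.Clazz"));
  }
}
```

Referencing the class by name keeps the agent free of a compile-time dependency on a JDK-internal class that may not exist at run time.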
Contributor Checklist
Assign the type: and (comp: or inst:) labels in addition to any useful labels
Don't use close, fix or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue
Jira ticket: PROF-13025