Skip to content

8260555: Change the default TIMEOUT_FACTOR from 4 to 1 #26749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

lkorinth
Copy link
Contributor

@lkorinth lkorinth commented Aug 12, 2025

This changes the timeout factor from 4 to 1. Most of the changes add timeouts to individual test cases so that I am able to run them with a timeout factor of 0.7 (some margin to the checked in factor of one)

In addition to changing the timeout factor, I am also using a library call to parse the timeout factor from the Java properties (I cannot use the library function everywhere as jtreg does not allow me to add @library notations to non test case files).

My approach has been to run all tests, and afterwards updating those that fail due to the timeout factor. The amount of updated test cases is huge, and my strategy has been to quadruple the timeout if I could not directly see that less was needed. In a few places, I have added a bit more timeout so that it will work with the 0.7 timeout factor.

These fixes have been created when I have ploughed through test cases:
JDK-8352719: Add an equals sign to the modules statement
JDK-8352709: Remove bad timing annotations from WhileOpTest.java
JDK-8352074: Test MemoryLeak.java seems not to test what it is supposed to test
CODETOOLS-7903937: JTREG uses timeout factor on socket timeout but not on KEEPALIVE
CODETOOLS-7903961: Make default timeout configurable

After the review, I will update the copyrights.

I have run testing tier1-8. The last time with a timeout factor of 1 instead of 0.7.

I got 4 timing related faults:

  1. runtime/Thread/TestThreadDumpMonitorContention.java
    This is probably: https://bugs.openjdk.org/browse/JDK-8361370
  2. sun/tools/jhsdb/BasicLauncherTest.java
    I am unsure about this one, it has not failed on my runs before, even with a timeout factor of 0.7, maybe I was unlucky.
  3. gc/stress/TestReclaimStringsLeaksMemory.java
    I updated this to 480 seconds, I finish this fairly fast (~14s) on my not very fast computer, but the Macs that fail are old x86-based ones.
  4. sun/security/ssl/X509KeyManager/CertChecking.java
    This is a new test that I got on last rebase. I have added a timeout of 480 to it.

In addition to these four tests, I have another one "java/lang/ThreadLocal/MemoryLeak.java" that earlier failed with a timeout factor of 0.7 but did not fail in the last run. I will not update that test case, because the extra time spent is strange and should be looked at. I have created JDK-8352074 on that test case, and I will probably create another bug describing the timeout I got.

From the review of the cancelled "8356171: Increase timeout for test cases as preparation for change of default timeout factor", I have taken a few actions:

  1. in testing(md|html): interpreted mode -> forced compilation mode
  2. in MTTest.java: changed 1200 -> 400 (was 300 to begin with)

I am now re-running tier 1-8.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8260555: Change the default TIMEOUT_FACTOR from 4 to 1 (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26749/head:pull/26749
$ git checkout pull/26749

Update a local copy of the PR:
$ git checkout pull/26749
$ git pull https://git.openjdk.org/jdk.git pull/26749/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26749

View PR using the GUI difftool:
$ git pr show -t 26749

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26749.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 12, 2025

👋 Welcome back lkorinth! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 12, 2025

@lkorinth This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8260555: Change the default TIMEOUT_FACTOR from 4 to 1

Reviewed-by: ihse, sspitsyn, lmesnik

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 138 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Aug 12, 2025

@lkorinth The following labels will be automatically applied to this pull request:

  • build
  • client
  • compiler
  • core-libs
  • hotspot
  • i18n
  • javadoc
  • net
  • nio
  • security
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Aug 12, 2025

Webrevs

@plummercj
Copy link
Contributor

sun/tools/jhsdb/BasicLauncherTest.java
I am unsure about this one, it has not failed on my runs before, even with a timeout factor of 0.7, maybe I was unlucky.

@lkorinth Can you send me a link to the failure?

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the Serviceability related tweaks and I'm okay with them in general.
But I'm curious if you do not see any timeouts with this anymore.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 13, 2025
@lkorinth
Copy link
Contributor Author

sun/tools/jhsdb/BasicLauncherTest.java
I am unsure about this one, it has not failed on my runs before, even with a timeout factor of 0.7, maybe I was unlucky.

@lkorinth Can you send me a link to the failure?

I sent it to you on email.

@lkorinth
Copy link
Contributor Author

I've reviewed the Serviceability related tweaks and I'm okay with them in general. But I'm curious if you do not see any timeouts with this anymore.

I only got the four timeouts described in the description, I got a few other failures as well that was not timeout related. I sent you a link to the test results in an email.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 13, 2025
@dougxc
Copy link
Member

dougxc commented Aug 14, 2025

I have run testing tier1-8. The last time with a timeout factor of 1 instead of 0.7.

Would you mind also running tier9 to avoid surprises there.

@lkorinth
Copy link
Contributor Author

Added three new /timeout=480 after the last run.

@@ -936,7 +936,8 @@ define SetupRunJtregTestBody
JTREG_ALL_OPTIONS := $$(JTREG_JAVA_OPTIONS) $$(JTREG_VM_OPTIONS)

JTREG_AUTO_PROBLEM_LISTS :=
JTREG_AUTO_TIMEOUT_FACTOR := 4
# Please reach consensus before changing this. It was not easy changing it to a `1`.
JTREG_AUTO_TIMEOUT_FACTOR := 1
Copy link
Member

@sendaoYan sendaoYan Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the default value of JTREG_AUTO_TIMEOUT_FACTOR set to 1 by default, then the value of JTREG_AUTO_TIMEOUT_FACTOR when run with -Xcomp should be change from 10 to 2.5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me if the author meant this to be 2.5 more than normal or 10 more than JTREG default, or a multiplier that seems to work. It does not bother me more if it is a 10 then a 2.5, as it needs to have a value that is not the multiplicative identity value. I will not change this, the change I have made is already large enough and I want this to be integrated ASAP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also something that can be changed later, in a follow up fix.

Copy link
Member

@sendaoYan sendaoYan Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take test test/hotspot/jtreg/compiler/arraycopy/stress/TestStressArrayCopy.java as example.
If there is a bug in jvm with -Xcomp option which will cause this test run time outed. Before this PR, it will take 7200*10 seconds to run this test finish and report time outed failure. But after this PR, it will take 28800*10 seconds to run this test finish and then report timed out failure. I think the 28800*10 senonds is too long and it's unacceptable.

Copy link
Member

@dholmes-ora dholmes-ora Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DELETED - I confused the timeout with the timeout-factor. New comment below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think 4x longer timeouts for -Xcomp is unreasonable. I also do not want to make this huge change even bigger. If you would like to change it after the integration I think that would be valuable --- though my guess is that it could be quite a lot of work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sendaoYan this PR changes the default timeoutFactor and so also has to change quite a number of implicit and explicit timeouts. But that doesn't mean that test configs that already set their own timeoutFactor should adjust by the same factor! That just doesn't work for any test with an implicit default timeout.

Copy link
Member

@sendaoYan sendaoYan Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this PR change the default timeoutFactor when the tested JVM options do not contains '-Xcomp', and at the same time also multiplies 4 of the timeout value defined in some tests.

So after this PR, the tests which the timeout value has been multiplied 4 will have more timeout value, when the tested JVM options contains '-Xcomp'.

I do agree this change, what I mean is this change has some side effect.

If you would like to change it after the integration I think that would be valuable --- though my guess is that it could be quite a lot of work.

I think I can try it in a new PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to warn you before you put too much energy into it. Changing the -Xcomp timeout factor might have even bigger impact than my change. Also, I have no idea how well that flag is tested in open testing. That is, your change might look good for you --- but might cause havoc for companies doing more extensive testing.

I have still not received green light for integrating my change, because extensive testing is still being run (and other teams are evaluating). I advise against changing the flag. When I evaluate the benefit for the default timeout, it was mainly not the timeout in itself that was the problem, but the fact that most people have no idea that the timeout factor is applied and thus can not create or debug tests in a good way. I hope this helps you, and does not come out as too negative. I just feel that I have put too much energy into this, and I do not hope that struggle for you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lkorinth Thanks for your advice sincerely. I think you are right, we need more evaluate cautiously before start to change the timeoutFactor for -Xcomp.

@lkorinth
Copy link
Contributor Author

I have run testing tier1-8. The last time with a timeout factor of 1 instead of 0.7.

Would you mind also running tier9 to avoid surprises there.

I had problems doing this, and I just want to say that I have not run tier9 (I have talked to Douglas off-list).

@lkorinth
Copy link
Contributor Author

If there are no mayor objections, I will update the copyrights before I leave work today.

@dfuch
Copy link
Member

dfuch commented Aug 18, 2025

Hi Leo, I played a bit with your changes and I observed intermittent timeout failures for the following tests:

	java/net/httpclient/CancelledResponse.java
	java/net/httpclient/whitebox/FlowTestDriver.java

The first test failing more frequently. Could you please add /timeout=480 to these two tests as well?

@RogerRiggs
Copy link
Contributor

Generally, changes with this many changed files are broken down into smaller PRs to make the review easier and more conclusive. In this case, hotspot, jdk, and langtools might have been good groupings. The reviews could be done in parallel and committed with the final change to jtreg when they are all approved.

@lkorinth
Copy link
Contributor Author

Generally, changes with this many changed files are broken down into smaller PRs to make the review easier and more conclusive. In this case, hotspot, jdk, and langtools might have been good groupings. The reviews could be done in parallel and committed with the final change to jtreg when they are all approved.

Noted. I did it so reviewers could se the change "as a whole". Feel free to review a part of the change!

Copy link
Member

@lmesnik lmesnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this! I think the changes are good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 18, 2025
@lkorinth
Copy link
Contributor Author

Hi Leo, I played a bit with your changes and I observed intermittent timeout failures for the following tests:

	java/net/httpclient/CancelledResponse.java
	java/net/httpclient/whitebox/FlowTestDriver.java

The first test failing more frequently. Could you please add /timeout=480 to these two tests as well?

Fixed! Thanks for helping!

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 18, 2025
@dfuch
Copy link
Member

dfuch commented Aug 18, 2025

Thanks! Changes to JNDI / net / httpclient LGTM.

@dholmes-ora
Copy link
Member

By rough count there are 1300 tests that have an explicit timeout set. This change would reduce the actual applied timeout to a quarter of what was previously used, yet you have only had to bump the timeout value for a fraction of the tests - which I find somewhat (pleasantly) surprising. It may be that many of these timeouts stem from a time when we had much much slower machines, so a refresh might not be a bad thing. It will take some time to see the full effects of this change though.

@lkorinth
Copy link
Contributor Author

By rough count there are 1300 tests that have an explicit timeout set. This change would reduce the actual applied timeout to a quarter of what was previously used, yet you have only had to bump the timeout value for a fraction of the tests - which I find somewhat (pleasantly) surprising. It may be that many of these timeouts stem from a time when we had much much slower machines, so a refresh might not be a bad thing. It will take some time to see the full effects of this change though.

Thanks David! And as you said, a few more adjustments will probably be needed when we have run this for a while.

@magicus
Copy link
Member

magicus commented Aug 20, 2025

It seems to me like there is a need to automatically collect normal test run times automatically, so we can match the timeout given to any individual test with the normal execution time. After all, the purpose of any timeout on tests is to allow the test to execute normally, but not wait too long in case of a problem that causes the test to take too long (or forever) to run.

I realize that this is highly hardware dependent, but test times tend to be Pareto distributed, so a very quick test maybe takes 1 second on fast machines and 10 on slow, and very slow tests maybe take 15 minutes on fast machines and 40 minutes on slow. In the first case, anything above 15 seconds is probably sus, and in the second case, anything above 60 is probably not good either (I'm just adding 50% to the max time).

Right now it seems engineers is spending their valuable time giving guesstimates for each individual test. That seems like poorly used time and resources.

Copy link
Member

@magicus magicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build changes look good now. Thanks!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 20, 2025
@dholmes-ora
Copy link
Member

I realize that this is highly hardware dependent, but test times tend to be Pareto distributed, so a very quick test maybe takes 1 second on fast machines and 10 on slow,

@magicus unfortunately that is often not the case in practice. We can see many tests that normally run very quickly but occasionally run very slow - minutes versus seconds.

Right now it seems engineers is spending their valuable time giving guesstimates for each individual test. That seems like poorly used time and resources.

For this exercise existing explicit timeout values need to be multiplied by 4 to keep the same absolute timeout value. In addition a number of tests that use the implicit default timeout (120*4) now need an explicit timeout (480 generally).

@magicus
Copy link
Member

magicus commented Aug 21, 2025

@magicus unfortunately that is often not the case in practice. We can see many tests that normally run very quickly but occasionally run very slow - minutes versus seconds.

That is surprising and a bit disappointing. I guess it is not worth the effort to try and figure out why this is the case; it could probably vary from test to test and be difficult to track for little gain. Still, it makes you wonder.

@@ -46,7 +46,7 @@
* @test
* @bug 8204454 8359053
* @summary URL.getContent() should return SoundClip for supported formats
* @run main/othervm -Xmx128m AudioContentHandlers
* @run main/othervm/timeout=480 -Xmx128m AudioContentHandlers
*/
Copy link
Contributor

@prrace prrace Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at our CI and this test has run 80,000 times and only 10 of those have gone > 120 seconds (and only 2 > 145 seconds)
Perhaps I'd see similar for other tests. But I need to hear test-specific reasons for the test-specific boost of 4x from what I think (120) is the default to 480.
Otherwise I'd prefer no change, or a small change, by maybe 1.5x not 4x, and we'll adjust the test when we see evidence that it is not enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prrace the change maintains the same absolute timeout value for those tests. Before the default of 120 was multiplied by the timeoutFactor of 4 to given 480. Now the value 480 is multiplied by the timeoutFactor of 1 to give 480. And IIRC Leo only did that for tests that demonstrated a timeout with the new default settings (120*1). It is not practical for Leo to investigate every changed test to see if it could get away with a value between 120 and 480. The change just maintains the status quo. Test owners are free to investigate further if they think it worth fine tuning these values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.