Conformance and Functional Tests Failing Inconsistently #3433
Comments
After a couple of days investigating the conformance tests, my findings were limited and I could not pinpoint the cause of the flaky runs. Here are some details:

I did about 50-60 local test runs with variations including keeping the same NGF instance between runs, deleting and restarting the NGF instance between runs, experimental tests on/off, and NGINX OSS or Plus. All local runs passed.

Compiling information from 15 or so pipeline runs, it seems that Kubernetes version, NGINX OSS or Plus, and experimental tests on/off did not have a major influence on whether a conformance run succeeded. Slightly more pipeline runs with experimental tests on failed; however, I think that could just be because they run more tests, which increases the likelihood of a flaky failure.

There was no pattern of a specific conformance test case failing among the failed runs. Some of the errors I saw:
Some pipeline runs would pass completely, so I am setting the current failure rate of a conformance pipeline run at 1/12 ~= 8%. There is currently some thought that a bug fix on the NGINX Agent side might resolve some of these issues.
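To make the failure-rate reasoning concrete, here is a minimal sketch in Python. It assumes every test case flakes independently with the same small probability, which is a simplification and not something measured in this investigation. The per-test flake rate and the test counts are hypothetical values chosen only for illustration; only the 1/12 figure comes from the pipeline runs.

```python
def run_failure_probability(per_test_flake_rate: float, num_tests: int) -> float:
    """Probability that at least one of num_tests independent tests flakes."""
    return 1.0 - (1.0 - per_test_flake_rate) ** num_tests


# Observed pipeline failure rate from the runs above: 1 failed run out of 12.
observed_rate = 1 / 12
print(f"observed run failure rate: {observed_rate:.1%}")

# Hypothetical numbers (assumptions, not measurements): the same per-test
# flake rate applied to a smaller and a larger suite.
hypothetical_flake_rate = 0.001
for hypothetical_test_count in (80, 100):
    probability = run_failure_probability(hypothetical_flake_rate, hypothetical_test_count)
    print(f"{hypothetical_test_count} tests -> {probability:.2%} chance of a failed run")
```

Under that assumption a suite with more tests fails more often even when nothing else changes, which is consistent with runs that enable experimental tests failing slightly more often.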
Note: On local and pipeline runs with NGINX Plus, the conformance test still passes even if these errors exist:
Example job that has those errors, and yet still passes: https://github.com/nginx/nginx-gateway-fabric/actions/runs/15473165211/job/43562767814
After investigating for 2 days, I was not able to pinpoint exactly why the functional tests are failing. I ran NGF manually with both OSS and Plus many times to reproduce the issue locally, but could not get the tests that failed in the pipeline to fail locally. Some of the pipeline test failures were:
Upon further investigation using print statements in the pipeline, the graceful recovery tests fail due to:
I have opened a PR to ignore the upstream error message to avoid issues. Once we have a bug fix from the Agent team, we can stop ignoring this error message and re-verify whether the issue still exists.
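As a rough illustration of what ignoring a known error message during a log check can look like, here is a minimal sketch. It is written in Python for brevity and is not the code from the PR. The pattern text and the function name `unexpected_errors` are hypothetical placeholders, since the actual upstream error message is not reproduced in this thread.

```python
import re

# Hypothetical placeholder: stands in for the known upstream error message.
IGNORED_PATTERNS = [re.compile(r"known upstream error placeholder")]


def unexpected_errors(log_lines, ignored=IGNORED_PATTERNS):
    """Return log lines that mention 'error' and match none of the ignored patterns."""
    return [
        line
        for line in log_lines
        if "error" in line.lower()
        and not any(pattern.search(line) for pattern in ignored)
    ]


# Made-up log lines, for illustration only.
sample_logs = [
    "info: reload complete",
    "error: known upstream error placeholder",
    "error: something else went wrong",
]
print(unexpected_errors(sample_logs))
```

Keeping the ignored messages in one list makes it easy to delete the entry again once the Agent fix lands and then re-run the tests to see whether the failure persists.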
Both conformance and functional tests are failing inconsistently. We need to investigate whether there is a common root cause among the failures. We see timeouts and many other types of failures.
Acceptance
Possible Causes