Skip to content

Conversation

damanm24
Copy link
Contributor

@damanm24 damanm24 commented Sep 12, 2025

#1937 introduced new vmm_tests which seem to be flaky as they will occasionally timeout. Upon further investigation, the following message can be observed in the petri logs: hv_vmbus: probe failed for device 766e96f8-2ceb-437e-afe3-a93169e48a7b. It's likely that openhcl is busy with other tasks (such as writing to com3) so it is unable to handle the vmbus probe request from vtl0. This PR attempts to fix that by only emitting high severity logs once kmsg is available.

@damanm24 damanm24 requested a review from a team as a code owner September 12, 2025 23:25
Copy link

Some(client_ref) => {
tracing::info!("Agent is online now");
let sh = client_ref.unix_shell();
pipette_client::cmd!(sh, "dmesg -n 3").run().await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to do any system level filtering like this, instead we can alter the value for OPENVMM_LOG that we pass to the command line during vm construction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, we probably shouldn't be doing this in tests anyways. If a test is too noisy we should lower the levels of individual events from debug to trace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i disagree - i think we want com3 output, but we want to lower the output we spit out once we have better channels available (like kmsg). com3 is really really slow, so we shouldn't rely on verbose output there of any kind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that this is because early output (kernel/bootshim) we have no other mechanism. we could also perhaps look at virtio serial for both as well in a future change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we'd need to handle reboots here too.

self.wait_for_agent().await
let client = self.wait_for_agent().await?;

match self.openhcl_diag() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will refactor the duplicated code, but I wanted to get two things done first:

  1. List of test failures
  2. Feedback on the nature of these changes, to see if I'm on the right track.

Copy link

Copy link

@damanm24 damanm24 changed the title wip: petri: add ability to lower severity of com3 emitted logs, for tests that print too much to com3 petri: add ability to lower severity of com3 emitted logs, for tests that print too much to com3 Sep 18, 2025
Copy link

@damanm24 damanm24 changed the title petri: add ability to lower severity of com3 emitted logs, for tests that print too much to com3 petri: lower severity of com3 emitted logs for all vmm_tests Sep 26, 2025
@damanm24 damanm24 merged commit a1ac181 into microsoft:main Sep 26, 2025
29 checks passed
@tjones60 tjones60 added the backport_2505 Change should be backported to the release/2505 branch label Sep 30, 2025
smalis-msft pushed a commit that referenced this pull request Oct 7, 2025
#2003 Introduced a change to potentially fix a flaky vmm_test related to
validating mnf usage in guests. However, it looks like the test
flakiness has not been resolved. I am disabling the test for now to
unblock contributors while I have a proper chance to investigate the
issue.

---------

Co-authored-by: Daman Mulye <[email protected]>
tjones60 added a commit to tjones60/openvmm that referenced this pull request Oct 14, 2025
tjones60 added a commit that referenced this pull request Oct 14, 2025
#2167)

…(#2003)"

This reverts commit a1ac181.

This change restores COM3 logs until we can figure out a way to switch
over to diag client once it is available.
@chris-oo chris-oo removed the backport_2505 Change should be backported to the release/2505 branch label Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants