Contributor

@last-genius last-genius commented Nov 26, 2025

Implements an optimization similar to the "hybrid" mode in vhd-tool: when exporting to qcow from raw, if the VDI is backed by a QCOW file, read its header, determine the allocated clusters, and only export these. This allows skipping over zero clusters in a sparse disk.

Unlike vhd-tool, however, this is implemented in a modular way - qcow-stream-tool gets a new read_headers command that outputs the list of allocated clusters (and other info) in JSON format, which allows it to be consumed by the Python qcow-to-stdout script (and by vhd-tool in future stages of this work, see below).
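As a rough illustration of how a Python consumer might use that JSON, the sketch below parses a header document and derives the cluster size from it. The field names `cluster_bits` and `data_clusters` appear in the review snippets of this PR; `virtual_size` is an assumption, and the sample payload is invented for the example rather than taken from real qcow-stream-tool output.

```python
import json

# Hypothetical example payload; the real read_headers output may differ.
SAMPLE_HEADER = (
    '{"virtual_size": 1048576, "cluster_bits": 16, "data_clusters": [0, 3, 7]}'
)


def parse_qcow_header(text):
    """Return (virtual_size, cluster_size, sorted cluster indices)."""
    header = json.loads(text)
    try:
        virtual_size = header["virtual_size"]
        # The cluster size is a power of two, stored as its exponent.
        cluster_size = 1 << header["cluster_bits"]
        clusters = sorted(header["data_clusters"])
    except KeyError as e:
        raise RuntimeError(f"Incomplete JSON - missing value for {e}") from e
    return virtual_size, cluster_size, clusters


virtual_size, cluster_size, clusters = parse_qcow_header(SAMPLE_HEADER)
# Upper bound on the bytes that need to be read for a sparse export.
allocated_bytes = len(clusters) * cluster_size
```

With this payload, only 3 of the 16 clusters of the 1 MiB virtual disk would need to be read.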

This is the first step of improving handling of sparse VDIs in xapi. I've got the rest working, but I'll be opening PRs step-by-step for the following once this PR gets merged:

  1. vhd-tool gets a read_headers command that outputs the list of allocated blocks as well
  2. stream_vdi uses read_headers for both VHD and QCOW to avoid reading zero blocks on XVA export (greatly speeds up handling of sparse disks and avoids issues with timeouts)
  3. vhd-tool and qcow-to-stdout can read headers of the opposite format, allowing faster export of sparse VDIs backed by a different format.

Best reviewed by commit.

The read_headers command returns info on the allocated clusters as JSON.

Signed-off-by: Andrii Sultanov <[email protected]>
@last-genius last-genius force-pushed the asv/qcow-read-headers branch from 87a2aec to b3903b5 Compare November 26, 2025 15:54
elif nonzero_clusters is not None:
    if diff_file_name:
        if diff_virtual_size is None or diff_nonzero_clusters is None:
            sys.exit("[Error] QCOW headers for the diff file were not provided.")
Member

Do the other components handle this error gracefully? Maybe this case should instead fall through to the else branch used when diff_file_name is not provided, so that the export still works, albeit more slowly.

Contributor Author

IMO it's better to report this error than to suffer a silent performance hit: diffing between a QCOW-backed VDI and a non-QCOW-backed VDI shouldn't happen and would indicate that something has gone wrong.

    args.cluster_size = source_cluster_size
    nonzero_clusters = json_header['data_clusters']
except KeyError as e:
    raise RuntimeError(f'Incomplete JSON - missing value for {str(e)}')
Member

Similarly here: it's probably worth logging, but not erroring out, when this runs on customers' machines.
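A minimal sketch of the log-and-fall-back behaviour suggested in this comment. This is not what the PR does (the snippet above raises RuntimeError). The helper name `clusters_or_none` and the logger name are hypothetical; returning None is assumed to tell the caller to read the whole raw disk.

```python
import json
import logging

log = logging.getLogger("qcow-export-sketch")  # hypothetical logger name


def clusters_or_none(header_text):
    """Return the allocated-cluster list, or None if the header is unusable.

    None signals the caller to fall back to reading the whole raw disk.
    """
    try:
        header = json.loads(header_text)
        return list(header["data_clusters"])
    except (ValueError, KeyError, TypeError) as e:
        # ValueError covers malformed JSON (json.JSONDecodeError subclasses it).
        log.warning("QCOW header unusable (%r); falling back to a full read", e)
        return None
```

The trade-off is the one discussed in this thread: the export keeps working, but a broken header now only shows up in the logs and as a slower export.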

@psafont
Member

psafont commented Nov 27, 2025

There are two warnings raised by reviewdog; could you see to them? Otherwise I have no objections.

On export, instead of reading the whole raw disk, consult the JSON (if
provided), and only allocate the clusters that are present in the table.

This is analogous to vhd-tool's handling of export, and greatly speeds up
handling of sparse disks.
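The idea can be sketched as follows, under stated assumptions: the function name, its parameters, and the in-memory demo disk are hypothetical, and the sketch only covers the reading side. The actual script also has to emit a valid qcow2 stream, which is omitted here.

```python
import io


def read_allocated_clusters(src, clusters, cluster_size, virtual_size):
    """Yield (offset, data) for each listed cluster, skipping everything else."""
    for index in sorted(clusters):
        offset = index * cluster_size
        # The last cluster may be cut short by the end of the disk.
        length = min(cluster_size, virtual_size - offset)
        if length <= 0:
            continue  # index lies beyond the end of the disk
        src.seek(offset)
        yield offset, src.read(length)


# Demo: a 14-byte "disk" with 4-byte clusters; only clusters 1 and 3 hold data.
disk = io.BytesIO(b"\0\0\0\0" + b"AAAA" + b"\0\0\0\0" + b"BB")
chunks = list(read_allocated_clusters(disk, [3, 1], cluster_size=4, virtual_size=14))
```

Clusters 0 and 2 are never read, which is where the speed-up on sparse disks would come from.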

Signed-off-by: Andrii Sultanov <[email protected]>
Take the expected driver type as a parameter, to allow this helper to be used
by qcow code as well.

Signed-off-by: Andrii Sultanov <[email protected]>
Pass the JSON output of read_headers into qcow2-to-stdout to handle the export
further.

Signed-off-by: Andrii Sultanov <[email protected]>
@last-genius last-genius force-pushed the asv/qcow-read-headers branch from b3903b5 to db9c477 Compare November 27, 2025 16:54
…ters

Translates JSON from qcow-stream-tool to OCaml types.

This is currently unused, but will be used in stream_vdi and vhd_tool_wrapper
in the future.

Signed-off-by: Andrii Sultanov <[email protected]>
@last-genius last-genius force-pushed the asv/qcow-read-headers branch from db9c477 to 92f4def Compare November 27, 2025 17:09
let clusters =
  List.map
    (fun (_, virt_address) ->
      let ( |> ) = Int64.shift_right_logical in
Contributor

( >> ) would be more conventional, especially when |> is used frequently for something else.

]
in
let json_string = Yojson.to_string json in
let* () = Lwt_io.printf "%s" json_string in
Contributor

@lindig lindig Dec 2, 2025

Why not Lwt_io.print without a format string?

let json = Yojson.Basic.from_channel ~buf ~fname:"qcow_header.json" ic in
In_channel.close ic ;
let cluster_size =
  1 lsl Yojson.Basic.Util.(member "cluster_bits" json |> to_int)
Contributor

Why not define (<<)?
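For reference, the shift arithmetic discussed in these comments can be written out in Python, where `<<` and `>>` are built in. The sketch assumes that a virtual address is turned into a cluster index by shifting right by `cluster_bits`; the OCaml snippet above does not show the shift amount, so that part is an assumption, and the values are examples only.

```python
cluster_bits = 16  # example value, giving 64 KiB clusters

# Same computation as `1 lsl cluster_bits` in the OCaml snippet.
cluster_size = 1 << cluster_bits


def cluster_index(virt_address, cluster_bits):
    """Map a byte address to the index of the cluster containing it."""
    return virt_address >> cluster_bits


def cluster_offset(index, cluster_bits):
    """Map a cluster index back to the byte address where it starts."""
    return index << cluster_bits
```

Every address inside a cluster maps to the same index, and shifting that index back left gives the start of the cluster.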
