-
Notifications
You must be signed in to change notification settings - Fork 296
qcow: Only process allocated clusters on export from raw #6769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
qcow: Only process allocated clusters on export from raw #6769
Conversation
It returns info on the allocated clusters in a JSON. Signed-off-by: Andrii Sultanov <[email protected]>
87a2aec to
b3903b5
Compare
| elif nonzero_clusters is not None: | ||
| if diff_file_name: | ||
| if diff_virtual_size is None or diff_nonzero_clusters is None: | ||
| sys.exit("[Error] QCOW headers for the diff file were not provided.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the other components handle this error gracefully? Maybe this case should be treated in the else where diff_file_name is not provided instead to be able to still work, albeit in a slower way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's better to report this error rather than suffer a silent performance hit - diffing between a qcow-backed vdi and a non-qcow backed vdi shouldn't happen and would indicate something going wrong
python3/libexec/qcow2-to-stdout.py
Outdated
| args.cluster_size = source_cluster_size | ||
| nonzero_clusters = json_header['data_clusters'] | ||
| except KeyError as e: | ||
| raise RuntimeError(f'Incomplete JSON - missing value for {str(e)}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar way here, it's probably worth logging, but not erroring out when it's running on customer's machines.
|
There are two warning raised by reviewdog, could you see to them? Otherwise I have no objections |
On export, instead of reading the whole raw disk, consult the JSON (if provided), and only allocate the clusters that are present in the table. This is analogous to vhd-tool's handling of export, and greatly speeds up handling of sparse disks. Signed-off-by: Andrii Sultanov <[email protected]>
Take the expected driver type as a parameter, to allow this helper to be used by qcow code as well. Signed-off-by: Andrii Sultanov <[email protected]>
Pass the JSON output of read_headers into qcow2-to-stdout to handle the export further. Signed-off-by: Andrii Sultanov <[email protected]>
b3903b5 to
db9c477
Compare
…ters Translates JSON from qcow-stream-tool to OCaml types. This is currently unused, but will be used in stream_vdi and vhd_tool_wrapper in the future. Signed-off-by: Andrii Sultanov <[email protected]>
db9c477 to
92f4def
Compare
| let clusters = | ||
| List.map | ||
| (fun (_, virt_address) -> | ||
| let ( |> ) = Int64.shift_right_logical in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( >>) would be more conventional, especially when |> is used frequently for something else.
| ] | ||
| in | ||
| let json_string = Yojson.to_string json in | ||
| let* () = Lwt_io.printf "%s" json_string in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Lwt_io.print without a format string?
| let json = Yojson.Basic.from_channel ~buf ~fname:"qcow_header.json" ic in | ||
| In_channel.close ic ; | ||
| let cluster_size = | ||
| 1 lsl Yojson.Basic.Util.(member "cluster_bits" json |> to_int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not define (<<)?
Implements an optimization similar to the "hybrid" mode in
vhd-tool: when exporting to qcow from raw, if the VDI is backed by a QCOW file, read its header, determine the allocated clusters, and only export these. This allows skipping over zero clusters in a sparse disk.Unlike
vhd-tool, however, this is implemented in a modular way -qcow-stream-toolgets a newread_headerscommand that outputs the list of allocated clusters (and other info) in JSON format, which allows it to be consumed by the Pythonqcow-to-stdoutscript (and byvhd-toolin future stages of this work, see below).This is the first step of improving handling of sparse VDIs in xapi. I've got the rest working, but I'll be opening PRs step-by-step for the following once this PR gets merged:
vhd-toolgets aread_headerscommand outputting list of allocated blocks as wellstream_vdiusesread_headersfor both VHD and QCOW to avoid reading zero blocks on XVA export (greatly speeds up handling of sparse disks and avoids issues with timeouts)vhd-toolandqcow-to-stdoutcan read headers of the opposite format, allowing faster export of sparse VDIs backed by a different format.Best reviewed by commit.