Extract VM images #16

Open
pombredanne opened this issue Feb 11, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@pombredanne
Member

We should be able to extract VMDK, VDI and QCOW VM images, as well as ext2, ext3 and ext4 filesystem images (and ideally squashfs too?)

@pombredanne pombredanne added the enhancement New feature or request label Feb 11, 2021
@pombredanne
Member Author

Note that one immediate application would be in the scancode.io rootfs pipeline.

@pombredanne
Member Author

I played with a few tools and one shines brightly: libguestfs by @rwmjones, at https://libguestfs.org/
It works beautifully using its tar-out command to export a filesystem as a tarball.

pombredanne added a commit that referenced this issue Apr 6, 2021
This is a two-step extraction: libguestfs first exports the FS to a tarball,
which extractcode then extracts normally (hence dealing with links, device
files and other permission oddities as a side effect).

We support VDI (VirtualBox), VMDK (VMware) and QCOW2 (QEMU).

Signed-off-by: Philippe Ombredanne <[email protected]>
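
The two-step extraction described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the extractcode implementation: it assumes the `guestfish` command-line tool from libguestfs is installed, and the function names `build_guestfish_command`, `extract_tarball` and `extract_vm_image` are hypothetical. The second step here uses the standard-library `tarfile` module and simply skips links and device files, whereas extractcode applies its own tar handling.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def build_guestfish_command(image_path, tarball_path):
    """Return a guestfish command line that dumps the image's root FS to a tarball."""
    return [
        "guestfish",
        "--ro",                # open the disk image read-only
        "--add", str(image_path),
        "--inspector",         # detect and mount the guest filesystems
        "tar-out", "/", str(tarball_path),
    ]


def extract_tarball(tarball_path, target_dir):
    """Extract only regular files and directories, skipping links and devices.

    Return the list of extracted member names.
    """
    target_dir = Path(target_dir).resolve()
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            if not (member.isfile() or member.isdir()):
                continue  # symlinks, hardlinks, device files, FIFOs
            destination = (target_dir / member.name).resolve()
            if destination != target_dir and target_dir not in destination.parents:
                continue  # refuse paths that escape the target directory
            if member.isdir():
                destination.mkdir(parents=True, exist_ok=True)
            else:
                destination.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src, open(destination, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            extracted.append(member.name)
    return extracted


def extract_vm_image(image_path, target_dir):
    """Step 1: image to tarball with guestfish. Step 2: tarball to directory."""
    tarball_path = Path(target_dir).with_suffix(".tar")
    subprocess.run(build_guestfish_command(image_path, tarball_path), check=True)
    return extract_tarball(tarball_path, target_dir)
```

Going through an intermediate tarball means the second step never touches the guest filesystem directly, so special files can be filtered or handled with ordinary archive logic.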
pombredanne added a commit that referenced this issue Apr 22, 2021
pombredanne added a commit that referenced this issue Jun 1, 2021
@tdruez

tdruez commented Jun 1, 2021

The --all-formats option is required for the new extraction features but is not documented.

More importantly, all_formats=False was added as an argument of the extract_file function but is not used; see https://github.com/nexB/extractcode/blob/main/src/extractcode/extract.py#L230

Also, why would we want such an option in the first place?

pombredanne added a commit that referenced this issue Jun 2, 2021
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jun 2, 2021
- This is to extract a single archive file of any supported format
  non-recursively.
- Also apply minor formatting and refactoring for readability
- Improve docstrings
- Add tests

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jun 2, 2021