
Sparse file in archive triggers "OSError: [Errno 28] No space left on device: '...' " during scan #32

Open
goekDil opened this issue Sep 29, 2021 · 8 comments

Comments


goekDil commented Sep 29, 2021

Description

moby-20.10.5.zip

After scanning the folder 'moby-20.10.5', I ran into the following error:

  • (screenshot from docker logs, not preserved)

There should be enough space left on the device.

How To Reproduce

Scan the folder 'moby-20.10.5' with the following function:

(screenshot of the scan function, not preserved)

Note that max_depth=0, so there is no depth limit.

System configuration

For bug reports, it really helps us to know:

  • What OS are you running on? (Windows/MacOS/Linux)
    Windows/docker, as well as Linux/docker
  • What version of scancode-toolkit was used to generate the scan file?
    21.8.4
  • What installation method was used to install/run scancode? (pip/source download/other)
    pip with Python 3.6
goekDil changed the title "OSError: [Errno 28]" No space left on device: '...' during scan "OSError: [Errno 28] No space left on device: '...' " during scan Sep 29, 2021
@pombredanne
Member

And I had not seen your attachment! So I am good with the code link you provided.

So I think this is a sparse tarball issue: the file is 60GB when extracted but only 5KB unextracted, and it is not even compressed.
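As a rough illustration of that size mismatch, here is a minimal sketch that uses only Python's standard `tarfile` module (not extractcode or ScanCode) to compare an archive's on-disk size with the total size its members declare. It assumes that, as in CPython, `TarInfo.size` holds the full logical size for sparse members; `tar_expansion` is a hypothetical helper name.

```python
import os
import tarfile


def tar_expansion(path):
    """Return (archive_bytes, declared_bytes) for the tar archive at path.

    archive_bytes is the archive's size on disk; declared_bytes is the sum of
    TarInfo.size over regular-file members. Assumption: CPython's tarfile
    reports the full logical size for sparse members, so a small archive
    holding a sparse file can declare a very large total.
    """
    archive_bytes = os.path.getsize(path)
    declared_bytes = 0
    with tarfile.open(path) as tar:
        for member in tar:
            # isfile() is also True for GNU sparse members.
            if member.isfile():
                declared_bytes += member.size
    return archive_bytes, declared_bytes
```

A declared total many orders of magnitude above the archive size hints that extraction may need far more disk space than the archive itself suggests.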

Can you test scanning that one single 5KB file?
https://github.com/moby/moby/blob/363e9a88a11be517d9e8c65c998ff56f774eb4dc/vendor/archive/tar/testdata/gnu-sparse-big.tar

And can you detail what pre-processing you apply to this? Did you call extractcode first?
Is there a place where I can see your code? I am always interested in how ScanCode is integrated!

@pombredanne
Member

Note that this could be related to aboutcode-org/scancode-toolkit#2431, where @Angi2412 and @avishmehta68710 both mentioned having a similar issue, or at least this comment aboutcode-org/scancode-toolkit#2431 (comment) references the same code:

I could observe the same behaviour and error message with this file: moby v20.10.5.

@goekDil
Author

goekDil commented Sep 30, 2021

Yes, I apply extractcode first! Unfortunately, there is no place where you can see my code, but I can provide a minimal example, minimalexample.py:

  from scancode import cli
  from extractcode import extract
  
  def main():
      # Recursively extract the archive; extract() yields events as it works.
      extract_log = extract.extract(location='gnu-sparse-big.tar', recurse=True)
      for e in extract_log:
          print(e)
  
      # Scan the extracted directory for licenses and copyrights.
      rc, results = cli.run_scan(
          'gnu-sparse-big.tar-extract', license=True, copyright=True,
          return_results=True, processes=14, verbose=True, quiet=False,
          timeout=46800)
  
      print(rc, results)
  
  if __name__ == "__main__":
      main()

As you proposed, I tested the single 5KB file and the exact same error occurs:

(screenshot from docker logs, not preserved; Linux and Windows with docker)

Depending on the machine I work on, I either:

  • experience the same issue as in "Timeout error despite high timeout" scancode-toolkit#2431 (Linux with docker, or running minimalexample.py as a standalone Python script)
  • or get the "OSError: [Errno 28] No space left on device" error mentioned above (Windows with docker, as well as Linux with docker)

@pombredanne
Member

So this is a sparse file issue. The short-term workaround may be to ignore the gnu-sparse-big.tar file entirely.
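One way to apply that workaround without hard-coding a filename is to flag tar archives that contain sparse members before extracting or scanning them. This is a minimal sketch using the standard library's `tarfile` module and its `TarInfo.issparse()` method; `has_sparse_member` and `find_sparse_tarballs` are hypothetical helpers, not part of extractcode or ScanCode.

```python
import os
import tarfile


def has_sparse_member(path):
    """Return True if tarfile reports any sparse member in the archive at path."""
    try:
        with tarfile.open(path) as tar:
            return any(member.issparse() for member in tar)
    except tarfile.TarError:
        # Not a readable tar archive: nothing to flag.
        return False


def find_sparse_tarballs(root):
    """Walk root and return sorted paths of tar archives with sparse members."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if tarfile.is_tarfile(path) and has_sparse_member(path):
                flagged.append(path)
    return sorted(flagged)
```

The returned paths could then be left out of the extraction and scan steps, for instance by moving them aside before running extractcode.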

@goekDil
Author

goekDil commented Oct 8, 2021

Thank you very much! After further analysis, I also came to the conclusion that this is a sparse file issue.

goekDil closed this as completed Oct 8, 2021
@pombredanne
Member

I may still want to keep this open for now and transfer the issue to extractcode, as we may want special processing for sparse files.
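A sketch of what such special processing might look like, again using only Python's `tarfile` and `shutil` modules rather than extractcode's own code: skip sparse members, and compare the declared size of the remaining regular files with the free space reported by `shutil.disk_usage`, so the failure becomes an early, explicit `ENOSPC` instead of a disk filling up mid-extraction. `extract_without_sparse` and its `safety_factor` parameter are hypothetical.

```python
import errno
import os
import shutil
import tarfile


def extract_without_sparse(archive, dest, safety_factor=1.0):
    """Extract the non-sparse members of a tar archive into dest.

    Returns the names of skipped sparse members. Raises OSError(ENOSPC)
    before extracting any member if the declared size of the kept regular
    files, multiplied by safety_factor, exceeds the free space for dest.
    """
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive) as tar:
        members = tar.getmembers()
        skipped = [m.name for m in members if m.issparse()]
        keep = [m for m in members if not m.issparse()]

        # Sum the sizes tarfile reports for the regular files we will write.
        needed = sum(m.size for m in keep if m.isfile())
        free = shutil.disk_usage(dest).free
        if needed * safety_factor > free:
            raise OSError(errno.ENOSPC,
                          f"need about {needed} bytes, {free} free", dest)

        # Use the safer "data" extraction filter where this Python offers it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=keep, **extra)
    return skipped
```

The size check is approximate: it ignores per-file block overhead on the target filesystem, which is what `safety_factor` is meant to absorb.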

pombredanne reopened this Oct 8, 2021
pombredanne changed the title "OSError: [Errno 28] No space left on device: '...' " during scan Sparse file in archive triggers "OSError: [Errno 28] No space left on device: '...' " during scan Oct 8, 2021
pombredanne transferred this issue from aboutcode-org/scancode-toolkit Oct 8, 2021
@pombredanne
Member

Done... now in extractcode!

pombredanne pushed a commit that referenced this issue Oct 8, 2021
Check for deps in local thirdparty directory #31