Recently I found that tarfile may generate a file that differs slightly from the one made by GNU Tar (https://www.gnu.org/software/tar/), especially when a path name is longer than 100 bytes.
Here's the test code:
# py-tar.py
import tarfile, io

memory_file = io.BytesIO()
tar_obj = tarfile.open(name=None, mode="w", fileobj=memory_file, format=tarfile.GNU_FORMAT)
tar_info = tarfile.TarInfo("abcdef" * 20)
tar_info.type = tarfile.DIRTYPE
tar_info.mode = 0o755
tar_info.mtime = 1609459200  # UTC 2021-01-01
tar_info.uid = 1000
tar_info.gid = 1000
tar_info.uname = "ubuntu"
tar_info.gname = "ubuntu"
tar_obj.addfile(tar_info, None)
tar_obj.close()
memory_file.seek(0)
binary_data = memory_file.read()
# import binascii
# hex_data = binascii.hexlify(binary_data)
# sep = 16
# for i in range(0, len(hex_data), sep * 2):
#     part = hex_data[i:i + sep * 2]
#     print(*(part[i:i+2].decode() for i in range(0, len(part), 2)), binary_data[i//2:][:sep], sep=" ")
with open("py.tar", "wb") as fp:
    fp.write(binary_data)
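For context, the 100-byte threshold matters because a longer name does not fit in the header's name field, so in GNU format tarfile writes an extra header block before the real one. The following is a minimal sketch of how one might check that; the helper name `first_header_block` is made up for illustration, and the offsets assume the classic 512-byte tar header layout (name at bytes 0 to 99, type flag at byte 156).

```python
import io
import tarfile


def first_header_block(name: str) -> bytes:
    """Write one directory entry in GNU format and return the first 512-byte block.

    Hypothetical helper for illustration only; not part of tarfile.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.type = tarfile.DIRTYPE
        tar.addfile(info)
    return buf.getvalue()[:512]


short = first_header_block("abcdef" * 10)  # 60 bytes: fits in the name field
long_ = first_header_block("abcdef" * 20)  # 120 bytes: does not fit

# Print the name field (NUL padding stripped) and the one-byte type flag.
print(short[:100].rstrip(b"\0"), short[156:157])
print(long_[:100].rstrip(b"\0"), long_[156:157])
```

For the 120-byte name, the first block should carry the placeholder name `././@LongLink` with type flag `L`, followed by the real name as payload and then the ordinary directory header.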
As a result, comparing the generated py.tar with gnu.tar shows a difference:
So I wonder whether Python could align this detail of tarfile.GNU_FORMAT with the behavior of GNU tar?
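To narrow down which header fields differ between two archives, one option is to compare them block by block and field by field. This is only a rough sketch: `FIELDS`, `diff_blocks`, and `make_archive` are names invented here, the offsets assume the classic 512-byte header layout, and every differing block is interpreted as a header (so a differing payload block would be reported under header field names too).

```python
import io
import tarfile

# (field, offset, length) in the classic 512-byte tar header layout.
# "magic" here covers the 8 bytes of magic plus version.
FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 8),
    ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
]


def diff_blocks(a: bytes, b: bytes):
    """Return (block_index, field, value_a, value_b) for each differing field."""
    diffs = []
    for off in range(0, max(len(a), len(b)), 512):
        block_a, block_b = a[off:off + 512], b[off:off + 512]
        if block_a == block_b:
            continue
        for field, start, length in FIELDS:
            va = block_a[start:start + length]
            vb = block_b[start:start + length]
            if va != vb:
                diffs.append((off // 512, field, va, vb))
    return diffs


def make_archive(mode: int) -> bytes:
    """Build a tiny in-memory GNU-format archive with one directory entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo("example-dir")
        info.type = tarfile.DIRTYPE
        info.mode = mode
        tar.addfile(info)
    return buf.getvalue()


# Demo on two archives that differ only in the directory mode.
for block, field, va, vb in diff_blocks(make_archive(0o755), make_archive(0o700)):
    print(block, field, va, vb)
```

For the files in this report, the same function could be applied to the contents of py.tar and gnu.tar read in binary mode.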
BTW, the main branch of CPython has a similar Lib/tarfile.py and should show the same difference in behavior. Here's my environment (I'm on Ubuntu 24.04):
$ LANG=C tar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by John Gilmore and Jay Fenlason.
$ LANG=C python --version
Python 3.12.3
Bug report
Bug description:
Commands used to create gnu.tar with GNU tar and py.tar with the script above:
mkdir -m 755 abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef
tar cf gnu.tar --sort=name --owner=ubuntu:1000 --group=ubuntu:1000 --mtime='UTC 2021-01-01' abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef/
python ./py-tar.py
CPython versions tested on:
3.12
Operating systems tested on:
Linux
Linked PRs
tarfile.py#_create_gnu_long_header to align with GNU Tar #130820