Align with GNU Tar when a file name is too long #130819

Open
gdh1995 opened this issue Mar 4, 2025 · 0 comments
Labels
type-bug An unexpected behavior, bug, or error

Bug report

Bug description:

I recently found that tarfile may generate an archive slightly different from the one made by GNU Tar (https://www.gnu.org/software/tar/), especially when a path name is longer than 100 bytes.
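
For context, the 100-byte limit is the size of the name field in a tar header block. The following minimal sketch (the helper name `first_header_name` is made up for illustration) shows the point at which tarfile, with GNU_FORMAT, starts emitting an extra `././@LongLink` header in front of the real member header. It uses empty regular files so that no trailing slash is appended to the name.

```python
import io
import tarfile


def first_header_name(member_name: str) -> str:
    """Return the name field of the first 512-byte header block tarfile writes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # A default TarInfo is an empty regular file, so no data is needed.
        tar.addfile(tarfile.TarInfo(member_name))
    header = buf.getvalue()[:512]
    # The name field occupies the first 100 bytes and is NUL-padded.
    return header[:100].rstrip(b"\0").decode("ascii")


# A 100-byte name still fits into the name field of the regular header.
print(first_header_name("a" * 100) == "a" * 100)
# A 101-byte name is preceded by a GNU long-name header.
print(first_header_name("a" * 101))
```

Only the extra long-name header is relevant to this report; the member header that follows it carries the truncated name.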

Here's the test code:

# py-tar.py
import tarfile, io
memory_file = io.BytesIO()
tar_obj = tarfile.open(name=None, mode="w", fileobj=memory_file, format=tarfile.GNU_FORMAT)
tar_info = tarfile.TarInfo("abcdef" * 20)
tar_info.type = tarfile.DIRTYPE
tar_info.mode = 0o755
tar_info.mtime = 1609459200  # UTC 2021-01-01
tar_info.uid = 1000
tar_info.gid = 1000
tar_info.uname = "ubuntu"
tar_info.gname = "ubuntu"
tar_obj.addfile(tar_info, None)
tar_obj.close()
memory_file.seek(0)
binary_data = memory_file.read()
# import binascii
# hex_data = binascii.hexlify(binary_data)
# sep = 16
# for i in range(0, len(hex_data), sep * 2):
#     part = hex_data[i:i + sep * 2]
#     print(*(part[i:i+2].decode() for i in range(0, len(part), 2)), binary_data[i//2:][:sep], sep=" ")
with open("py.tar", "wb") as fp:
    fp.write(binary_data)

# shell
mkdir -m 755 abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef
tar cf gnu.tar --sort=name --owner=ubuntu:1000 --group=ubuntu:1000 --mtime='UTC 2021-01-01' abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef/
python ./py-tar.py

Comparing the generated py.tar and gnu.tar then shows a difference like this:

[Image: diff between py.tar and gnu.tar]
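
The comparison can also be made field by field instead of on raw bytes. Below is a minimal sketch, assuming the standard ustar/GNU header layout; `FIELDS`, `parse_header` and `diff_first_headers` are illustrative names, not part of tarfile. It only looks at the first 512-byte block of each archive, which for a long name is the `././@LongLink` header.

```python
import io
import tarfile

# (field name, offset, length) within a 512-byte tar header block.
# "magic" covers the magic and version bytes together (8 bytes).
FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 8),
    ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8),
]


def parse_header(block: bytes) -> dict:
    """Split one header block into raw field values with trailing NULs removed."""
    return {name: block[off:off + size].rstrip(b"\0") for name, off, size in FIELDS}


def diff_first_headers(a: bytes, b: bytes) -> dict:
    """Map each differing field to its (a, b) values for the first header block."""
    ha, hb = parse_header(a[:512]), parse_header(b[:512])
    return {key: (ha[key], hb[key]) for key in ha if ha[key] != hb[key]}


# Self-contained demo: dump the first header tarfile writes for a long directory name.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    info = tarfile.TarInfo("abcdef" * 20)
    info.type = tarfile.DIRTYPE
    tar.addfile(info)
for key, value in parse_header(buf.getvalue()[:512]).items():
    print(f"{key:9} {value!r}")
```

With both archives on disk, `diff_first_headers(open("py.tar", "rb").read(), open("gnu.tar", "rb").read())` would list the fields that differ in the first block.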

So I wonder whether Python could align this detail of tarfile.GNU_FORMAT with the behavior of GNU tar?

BTW, here's my environment (I'm on Ubuntu 24.04). The main branch of CPython has a similar Lib/tarfile.py, so it should show the same behavior difference:

$ LANG=C tar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
$ LANG=C python --version
Python 3.12.3

CPython versions tested on:

3.12

Operating systems tested on:

Linux
