Skip to content

package: skip unpack errors on paths excluded by manifest #19324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

ianic
Copy link
Contributor

@ianic ianic commented Mar 16, 2024

The purpose of this is to prepare the ground for package manager related changes like:
#17463 and #18089.

I found that there is are same logic implemented in both tar and git unpacking.

Tar pipeToFileSystem and git checkout are both: collecting file system errors,
writing that errors to error bundle.
Recursive directory copy and pipeToFileSystem are both creating Dir when
writing file while Dir still doesn't exist.

Adding something new requires changes in a few places.

This collects file system logic in a single place and leaves tar/git with their
primary job: decoding tarball or git pack.

I'll mark this draft hoping to get some kind of go/no go to implementing fixes
to the above issues on top of this.

@ianic ianic marked this pull request as draft March 16, 2024 20:00
Copy link
Contributor

@ianprime0509 ianprime0509 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth (I'm not on the core team, just a contributor), this approach seems reasonable to me. Using an iterator interface in the package unpacking logic definitely seems nicer than requiring each file tree structure to expose its own interface to "hook" into the extraction logic using Diagnostics.

I'll be happy to help with any support needed on the Git side of things for this PR, such as the iterator approach I mentioned in my other comment.

Comment on lines 85 to 149
// Same interface as std.fs.Dir.createFile, symLink
const inf = struct {
parent: *Self,

pub fn makePath(_: @This(), _: []const u8) !void {}

pub fn createFile(t: @This(), sub_path: []const u8, _: fs.File.CreateFlags) !fs.File {
return (try t.parent.createFile(sub_path)) orelse error.Skip;
}

pub fn symLink(t: @This(), target_path: []const u8, sym_link_path: []const u8, _: fs.Dir.SymLinkFlags) !void {
try t.parent.symLink(target_path, sym_link_path);
}
}{ .parent = self };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why this was introduced, but I think a better long-term solution to solve this use-case would be for git.zig to expose an iterator over tree entries and use that from the unpacking code. That would avoid the use of anytype and error.Skip. Since git.zig isn't a public API and is used only for package fetching, the checkout function could be completely removed in favor of the tree iterator design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

That was also my thinking but since checkout is used in test and the main function I decided for the first step to make just minimal changes so those two can be mostly unchanged.
Iterator which returns all entries, and does not require a recursive call for each directory, will be nice to use from the outside.
That error.Skip is a really ugly hack.

@andrewrk
Copy link
Member

andrewrk commented Mar 17, 2024

I think both ways of doing it are reasonable. To me it is
AAA BBB CCC vs
ABC ABC ABC.

I think one or the other can be better depending on the problem, and in general beginner software developers have a bias towards the first one while intermediate programmers have a bias towards the second one.

For this part of the codebase, I did consider both options and reasoned that, while the former does involve having similar logic implemented multiple times, it makes up for it in the sense that you can more easily read the code for a particular fetch mechanism, and perhaps optimize it better.

I made this mistake in the backend of the compiler - I tried to do ABC ABC ABC with respect to CPU architecture, but it turned out to be untenable, and now it has been reworked to the AAA BBB CCC pattern.

It's difficult to tell objectively which way is better. I will say that I do have a preference for status quo.

@ianic
Copy link
Contributor Author

ianic commented Mar 19, 2024

Fixes #18089 (checks the remaining 2 items).

Before this any unpack error will be raised after the unpack step. Now this is postponed until we get the manifest. Manifest is then used to filter errors for the files which are not included.

There are two types of unpack errors:

  • files with the same name but different casing, fails on non case sensitive file systems (Windows, macOS)
  • symlinks, creation fails on Windows if they are disabled

Here is example which has both:

sample
├── build.zig
├── build.zig.zon
├── dir1
│   └── dir2
│       └── dir3
│           ├── file1
│           ├── file2
│           └── file3
├── dir4
│   └── dir5
│       └── dir6
│           ├── file1 -> ../../../dir1/dir2/dir3/file1
│           └── file2_symlink -> ../../../dir1/dir2/dir3/file2
├── dir7
│   └── dir8
│       └── dir9
│           ├── file4
│           └── File4
└── src
    ├── main.zig
    └── root.zig

It is packed in sample.tar, and once more in sample_src.tar where build.zig.zon has only src in paths (excluded all problematic folders):

    .paths = .{
        "src",
    },

On Linux both tars get unpacked. On Windows and macOS both fail before
this changes and the second succeeds after this change.

Windows (with disabled symlinks), before both failing:

C:\Users\ianic\code\zig-bin\zig.exe build --global-cache-dir zig-global-cache-release
C:\Users\ianic\code\issues\fetch\build.zig.zon:78:20: error: unable to unpack tarball
            .url = "http://10.211.55.26:8000/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
note: unable to create symlink from 'dir4/dir5/dir6/file1' to '../../../dir1/dir2/dir3/file1': AccessDenied
note: unable to create symlink from 'dir4/dir5/dir6/file2_symlink' to '../../../dir1/dir2/dir3/file2': AccessDenied
C:\Users\ianic\code\issues\fetch\build.zig.zon:85:20: error: unable to unpack tarball
            .url = "http://10.211.55.26:8000/sample_src.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
note: unable to create symlink from 'dir4/dir5/dir6/file1' to '../../../dir1/dir2/dir3/file1': AccessDenied
note: unable to create symlink from 'dir4/dir5/dir6/file2_symlink' to '../../../dir1/dir2/dir3/file2': AccessDenied
PS C:\Users\ianic\code\issues\fetch> C:\Users\ianic\code\zig-bin\bin\zig.exe build --global-cache-dir zig-global-cache-release

after, just first is failing:

C:\Users\ianic\code\zig-io\build\stage3\bin\zig.exe build --global-cache-dir zig-global-cache-master
C:\Users\ianic\code\issues\fetch\build.zig.zon:78:20: error: unable to unpack
            .url = "http://10.211.55.26:8000/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
note: unable to create symlink from 'dir4/dir5/dir6/file1' to '../../../dir1/dir2/dir3/file1': AccessDenied
note: unable to create symlink from 'dir4/dir5/dir6/file2_symlink' to '../../../dir1/dir2/dir3/file2': AccessDenied

macOS, before both failing:

Fetch Packages [45/40] /Users/ianic/code/zig/issues/fetch/build.zig.zon:75:20: error: unable to unpack tarball
            .url = "file:///Users/ianic/code/zig/issues/fetch/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
/Users/ianic/code/zig/issues/fetch/build.zig.zon:80:20: error: unable to unpack tarball
            .url = "file:///Users/ianic/code/zig/issues/fetch/sample_src.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

after, just first is failing:

Fetch Packages [41/40] /Users/ianic/code/zig/issues/fetch/build.zig.zon:75:20: error: unable to unpack
            .url = "file:///Users/ianic/code/zig/issues/fetch/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists

I checked that fetch behavior is unchanged for a list of hopefully representative packages.

@ianic ianic marked this pull request as ready for review March 19, 2024 21:19
@ianic ianic changed the title package: refactor tar/git unpack package: skip unpack errors on paths excluded by manifest Mar 19, 2024
ianic added 16 commits March 22, 2024 21:26
Tar pipeToFileSystem and git checkout have repeating logic: collecting
diagnostic on errors, writing that errors to error bundle. Recurisive
directory copy and pipeToFileSystem are both creating dir when writing
file while dir still don't exists. Implementing new logic, thinking
about file executable bit, requires changes on few places.
This collects file handlling logic on single place and leaves other
places with their job: unpacking tarball or git pack.
Just refactoring to put similar code closer.
Logic moved to Unpack.
Diagnostic error collection has been moved to Unpack. Mode does not have
implementation, and after this changes implementation will be in Unpack.
This error type is ignored during tar unpacking.
Option to remove errors from paths not included in package.
Prior to
[this](ziglang@ef9966c#diff-08c935ef8c633bb630641d44230597f1cff5afb0e736d451e2ba5569fa53d915R805)
commit tar was not a valid extension. After that this one is valid case.
Before this any unpack error will be raised after the unpack step. Now
this is postponed until we get manifest, and if then use included
folders from manifest to filter errors which are not for the files from
included folders.

I made a folder which has symlinks and file names with same names in
different casing.
```
sample
├── build.zig
├── build.zig.zon
├── dir1
│   └── dir2
│       └── dir3
│           ├── file1
│           ├── file2
│           └── file3
├── dir4
│   └── dir5
│       └── dir6
│           ├── file1 -> ../../../dir1/dir2/dir3/file1
│           └── file2_symlink -> ../../../dir1/dir2/dir3/file2
├── dir7
│   └── dir8
│       └── dir9
│           ├── file4
│           └── File4
└── src
    ├── main.zig
    └── root.zig
```

Then pack this into sample.tar and another one sample_src.tar where
build.zig.zon has only src in paths (excluded all problematic folders):
```Zig
    .paths = .{
        "src",
    },
```

On Linux both tars get unpacked. On Windows and macOS both fail before
this changes and second succeds after this change.

Windows (with disabled symlinks):
```
C:\Users\ianic\code\zig-bin\zig.exe build --global-cache-dir zig-global-cache-release
C:\Users\ianic\code\issues\fetch\build.zig.zon:78:20: error: unable to unpack tarball
            .url = "http://10.211.55.26:8000/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
note: unable to create symlink from 'dir4/dir5/dir6/file1' to '../../../dir1/dir2/dir3/file1': AccessDenied
note: unable to create symlink from 'dir4/dir5/dir6/file2_symlink' to '../../../dir1/dir2/dir3/file2': AccessDenied
C:\Users\ianic\code\issues\fetch\build.zig.zon:85:20: error: unable to unpack tarball
            .url = "http://10.211.55.26:8000/sample_src.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
note: unable to create symlink from 'dir4/dir5/dir6/file1' to '../../../dir1/dir2/dir3/file1': AccessDenied
note: unable to create symlink from 'dir4/dir5/dir6/file2_symlink' to '../../../dir1/dir2/dir3/file2': AccessDenied
PS C:\Users\ianic\code\issues\fetch> C:\Users\ianic\code\zig-bin\bin\zig.exe build --global-cache-dir zig-global-cache-release

C:\Users\ianic\code\zig-io\build\stage3\bin\zig.exe build --global-cache-dir zig-global-cache-master
C:\Users\ianic\code\issues\fetch\build.zig.zon:78:20: error: unable to unpack
            .url = "http://10.211.55.26:8000/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
note: unable to create symlink from 'dir4/dir5/dir6/file1' to '../../../dir1/dir2/dir3/file1': AccessDenied
note: unable to create symlink from 'dir4/dir5/dir6/file2_symlink' to '../../../dir1/dir2/dir3/file2': AccessDenied
```

macOS
```
Fetch Packages [45/40] /Users/ianic/code/zig/issues/fetch/build.zig.zon:75:20: error: unable to unpack tarball
            .url = "file:///Users/ianic/code/zig/issues/fetch/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists
/Users/ianic/code/zig/issues/fetch/build.zig.zon:80:20: error: unable to unpack tarball
            .url = "file:///Users/ianic/code/zig/issues/fetch/sample_src.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fetch Packages [41/40] /Users/ianic/code/zig/issues/fetch/build.zig.zon:75:20: error: unable to unpack
            .url = "file:///Users/ianic/code/zig/issues/fetch/sample.tar",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: unable to create file 'dir7/dir8/dir9/file4': PathAlreadyExists

```
In pipeToFileSystem when option mode_mode option is set on posix file systems.
Not used from Unpack any more.
Unpack tar without removing leading root folder. Then find package root
in unpacked tmp folder.

Fixes ziglang#17779
@ianic
Copy link
Contributor Author

ianic commented Mar 22, 2024

Adding logic from another PR: #19111 which fixes #17779.
Tar unpacking was removing the leading root folder. But there are packages without that leading folder that all fail with BadFileName. When one component is stripped from files in the root that leaves us with the empty file name.
We are now packing without removing the root folder and then deciding whether to skip a single root folder.

This now fixes 0.12 milestone package manager related issues: #17779 and #18089.
It also fixes #17460.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to tell objectively which way is better. I will say that I do have a preference for status quo.

I'm going to strengthen my stance on this. Sorry to do this after you have put so much work into it. But as the person who ultimately must maintain this code, I do want to keep my preference here. I don't want this restructuring to unify the git unpacking and archive unpacking. I liked it better before, when there were separate unpack implementations that had a set of well-defined tasks they were supposed to do, even though it was possible for them to get out of sync.

@ianic
Copy link
Contributor Author

ianic commented Mar 26, 2024

Sorry to do this after you have put so much work into it.

No problem.
Thanks for feedback.
Will close this and try some other approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants