Rewrite cp #3323

Draft
wants to merge 1 commit into
base: main

apostasie
Contributor

@apostasie apostasie commented Aug 17, 2024

The motivation for this is to address all issues (and variants not listed as individual tickets) in:

The fundamental problem with our current implementation of cp is that the logic to figure out whether a destination is read-only is incomplete.
One of the key reasons is that we do not follow symlinks, or relative paths, inside a container destination. For example, if we are copying to a path located in a read-only volume, we stop there, while the path may very well resolve to a completely different volume (or the rootfs).
Furthermore, there are other logic issues involving a read-only rootfs, leading to situations where we DO copy when we should not, and conversely, where we do NOT copy when we should.
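
To make the read-only question concrete, here is a minimal sketch of the kind of check involved. It is not the code in this PR. It assumes the destination has already been resolved to a clean, absolute, symlink-free container path, and that the container mounts are available as a list of destinations with a read-only flag. The `mountInfo` type and the `isReadOnly` helper are hypothetical names used for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// mountInfo is a hypothetical, simplified view of a container mount.
type mountInfo struct {
	Destination string // absolute path inside the container
	ReadOnly    bool
}

// isReadOnly reports whether resolved (assumed to be a clean, absolute,
// symlink-free container path) lives on a read-only location.
// The deepest mount containing the path wins; if no mount contains it,
// the path is on the rootfs and rootfsReadOnly applies.
func isReadOnly(resolved string, mounts []mountInfo, rootfsReadOnly bool) bool {
	readOnly := rootfsReadOnly
	longest := -1
	for _, m := range mounts {
		dest := strings.TrimSuffix(m.Destination, "/")
		if resolved == dest || strings.HasPrefix(resolved, dest+"/") {
			if len(dest) > longest {
				longest = len(dest)
				readOnly = m.ReadOnly
			}
		}
	}
	return readOnly
}

func main() {
	mounts := []mountInfo{
		{Destination: "/data", ReadOnly: true},
		{Destination: "/data/rw", ReadOnly: false},
	}
	fmt.Println(isReadOnly("/data/file", mounts, false))    // true
	fmt.Println(isReadOnly("/data/rw/file", mounts, false)) // false
	fmt.Println(isReadOnly("/etc/hosts", mounts, true))     // true
}
```

The point of the sketch is the ordering: the check is only meaningful on the fully resolved path, since a path that lexically starts inside a read-only volume may end up somewhere else once links are followed.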

There is also a fair amount of duplication, and finally, test isolation was problematic.

For this PR, the highlights are, for the testing part:

  • tests rewritten from scratch to cleanly separate test cases and test rig (should make adding more test cases easier)
  • expanded test-suite to add more cases
  • better isolation
  • besides the "regular" expanded tests, added a "TestAcidCopy" test meant to reproduce corner/complicated cases and other "special" conditions that do not fit in the normal test rig

For the code part:

  • provide a mechanism to fully resolve a container path while on the host when all we have is the mounted root snapshot
  • provide a range of more expressive and informative errors for the user to diagnose issues
  • should address all link resolution, relative paths and read-only concerns

Also note this is superseding #3275

I appreciate this is a somewhat sizable PR (although a very large part of it, about 1500 lines, is just tests).

If there is a different (simpler) approach to this problem - that would pass the test-suite - I am happy to re-evaluate of course.

@apostasie apostasie force-pushed the b-dev-cp branch 4 times, most recently from f1662be to a486b0a Compare August 17, 2024 22:48
options.SrcPath,
options.GOptions.Snapshotter,
options.FollowSymLink)
options)
Contributor Author

Simplifying signature

pkg/testutil/testutil.go (outdated, resolved)
@apostasie apostasie force-pushed the b-dev-cp branch 8 times, most recently from 5ff18ff to 7d17a10 Compare August 21, 2024 20:06
err = errors.Join(errors.New("unable to retrieve containers with error"), err)
} else {
err = fmt.Errorf("no container found for: %s", options.ContainerReq)
}
Contributor Author

Make sure we report the right error message to the user.

if err := os.MkdirAll(dstFull, 0o755); err != nil {
return err

// XXX FIXME: this seems wrong. What about ownership? We could be doing that inside a container
Contributor Author

I am not touching this right now.
This is pre-existing code.
Am I misreading this, or is this a problem when doing that in a container target? (e.g., shouldn't we do that as part of the nsenter/re-exec routine?)

resp, err := client.SnapshotService(snapshotter).Mounts(ctx, snapKey)
if err != nil {
return "", nil, err
}

Contributor Author

I know you don't like these, Akihiro, but would you reconsider? It does make the code much easier to read for these old eyes here :-)

@@ -0,0 +1,436 @@
/*
Contributor Author

This is the gist of it: it provides (proper) symlink and path resolution.

@@ -0,0 +1,24 @@
/*
Contributor Author

If one day we support this on Windows, we will need the Windows version of volumeNameLen.

// FIXME: the following will break the test (anything that will evaluate on the shell, obviously):
// - `
// - $a, ${a}, etc
complexify = "" // = "-~a0-_.(){}[]*#! \"'∞"
Contributor Author

Useful for stress-testing the rig when modifying it. Deactivated by default as it makes things much harder to read.

@apostasie apostasie changed the title [WIP] Rewrite cp Rewrite cp Aug 21, 2024
@apostasie apostasie marked this pull request as ready for review August 21, 2024 21:06
@apostasie
Contributor Author

CI is green.

PTAL at your convenience.

@apostasie
Contributor Author

Rebased and comments addressed.

pkg/testutil/testutil.go (outdated, resolved)
},
}
count, err := walker.Walk(ctx, options.ContainerReq)

if count < 1 {
err = fmt.Errorf("could not find container: %s, with error: %w", options.ContainerReq, err)
if count == -1 {
Member

I don't quite follow this change. What's the issue with the current logic? (also I don't think we should panic here).

Contributor Author

The current logic (count < 1) treats two unrelated conditions identically:

  • count == 0, which means we found no container
  • count == -1, which means we errored

https://github.com/containerd/nerdctl/blob/main/pkg/idutil/containerwalker/containerwalker.go#L47-L74

Contributor Author

About the panic, it would literally never happen (looking at the code in ^).
Let me know if you feel this is still something we should change.

if err != nil {
return err
return errors.Join(ErrShouldNeverHappen, err)
Member

Why is this an ErrShouldNeverHappen error?

Contributor Author

@apostasie apostasie Aug 25, 2024

The only situation I can think of where you would get an error here is if the container that was still live at the time we called CopyFiles has since been deleted.
As mentioned above, while this is not impossible, it is incredibly unlikely.

You are right though that the probability is not zero, and I can wrap that in another condition instead.

return errors.Join(fpErr, err)
}

return errors.Join(fmt.Errorf("failed to execute %v", tarXCmd.Args), err)
Member

we can just use fmt.Errorf("failed to execute %v: %w", tarXCmd.Args, err)

Contributor Author

I think errors.Join is better.

With fmt.Errorf:

  • we get a variety of formats across the codebase ("failed: %w", "failed (%w)", "failed with %w", "%w (hint: foo bla)")
  • because of ^, things are less readable, as errors are clumped together with little indication of which comes first

err = mount.All(resp, tempDir)
if err != nil {
return "", nil, fmt.Errorf("failed to mount snapshot with error %s", err.Error())
return "", nil, errors.Join(errors.New("failed to mount snapshot"), err)
Member

fmt.Errorf("failed to mount snapshot: %w", err)

Contributor Author

(same as above)

@apostasie
Contributor Author

Thanks a lot for the ongoing review, @djdongjin!
Let me know your thoughts in the individual comments - will look again tomorrow.

@apostasie apostasie force-pushed the b-dev-cp branch 4 times, most recently from aac449f to 5f70ba4 Compare September 3, 2024 03:09
@apostasie
Contributor Author

Rebased.
Pending CI.

@apostasie
Contributor Author

apostasie commented Sep 3, 2024

Obviously, the move to sub-packages is further exposing problems in our tests (chiefly because the parallel pool is greatly reduced - split in 15 buckets - which significantly increases the risk of a negative interaction, and the chance of cross-package leftovers (15x)).

The introduction of a lot more / longer cp tests with this PR is likely tripping unrelated tests - and there seems to be something wrong with the encrypt tests (starting with the fact that they are pruning), and even with the docker compat suite (with prune again).

That doesn't change things for this PR fundamentally, as the problem is not the code in here (which is very limited in its impact) - but obviously I'll send a couple of other PRs first to fix some of our test issues so this one gets green.

@apostasie
Contributor Author

Pending #3402

@apostasie
Contributor Author

Reverting to draft until I figure out what's going on with testing.

Signed-off-by: apostasie <[email protected]>
@apostasie
Contributor Author

Will wait for new tooling framework to land to update tests and get some clarity on what's going on.

3 participants