Better errors from runc init
#4928
@kolyshkin The extra patch (the one not present in the other mentioned PRs) LGTM. But would that print the libcrypto issue? I mean, is the Go panic forwarded? This panic you posted in this issue, for example: #4916 (comment)

It seems packages.microsoft.com is down now, so I can't easily test it myself (yeah, I'm sending some messages, but they are probably aware already :)). If you still have that install handy, it would be great if you could test it :)
In case early stage of runc init (nsenter) fails for some reason, it logs error(s) with FATAL log level, via bail().

The runc init log is read by a parent (runc create/run/exec) and is logged via normal logrus mechanism, which is all fine and dandy, except when `runc init` fails, we return the error from the parent (which is usually not too helpful, for example):

    runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

Now, the actual underlying error is from runc init and it was logged earlier; here's how full runc output looks like:

    FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device
    FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state: Success
    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

The problem is, upper level runtimes tend to ignore everything except the last line from runc, and thus error reported by e.g. docker is not very helpful.

This patch tries to improve the situation by collecting FATAL errors from runc init and appending those to the error returned (instead of logging). With it, the above error will look like this:

    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state: Success

Yes, it is long and ugly, but at least the upper level runtime will report it.

Signed-off-by: Kir Kolyshkin <[email protected]>
Alas, no. This PR is about the C code of [...]. You can emulate the libcrypto error by adding a "panic" call into [...]
This currently includes #4930 (and serves as a test for it). Draft until that one is merged.

This currently includes #4951 and is therefore a draft until #4951 is merged.

Inspired by the discussion in #4905.

In case early stage of runc init (nsenter) fails for some reason, it
logs error(s) with FATAL log level, via bail().
The runc init log is read by a parent (runc create/run/exec) and is
logged via normal logrus mechanism, which is all fine and dandy, except
when `runc init` fails, we return the error from the parent (which is
usually not too helpful, for example):

    runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

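Now, the actual underlying error is from runc init and it was logged
earlier; here's what the full runc output looks like:

    FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device
    FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state: Success
    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF
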
The problem is, upper level runtimes tend to ignore everything except
the last line from runc, and thus error reported by e.g. docker is not
very helpful.
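This patch tries to improve the situation by collecting FATAL errors
from runc init and appending those to the error returned (instead of
logging). With it, the above error will look like this:

    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state: Success
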
Yes, it is long and ugly, but at least the upper level runtime will
report it.
Fixes: #4905