Skip to content

Conversation

@prandla
Copy link
Member

@prandla prandla commented Aug 26, 2025

  • Use /box instead of /tmp as the sandbox's home dir.
  • Remove allow_writing_only; it's impossible to enforce with the new design.
  • Remove "secure commands" mechanism, run all commands in sandbox. (I think it was mostly useful earlier to run unzip, by default with allow_writing_only it couldn't unzip properly. But none of the languages even run unzip anymore.)
  • Simplify cleanup logic a bit, instead of keeping the sandbox dir, we archive the sandbox in case of failure. Also remove the keep_sandbox flag. (TODO: write the tool that unzips an archived sandbox for debugging. It can take the archive hash as an argument and load it from the file storage automatically.)
  • Remove Job.keep_sandbox field, from what I can tell this was already unused.
  • Remove sandbox.max_file_size option: It breaks dotnet, and it doesn't enforce any cap on the total size of /box.
  • And finally, make CMS set isolate's --quota.

Creating automated tests to verify the quota behavior was nontrivial: setting up a quota-enabled filesystem is actually somewhat tricky (or at least poorly documented). Initially I tested locally with a tmpfs as the box_root; this required hacking isolate to use quotactl_fd instead of quotactl. (the former is a much nicer interface, but it's undocumented. I need to make a bug report to man-pages to get it documented :) )

Eventually I figured out that with an ext4 filesystem on a loop device, it's also possible to use the old quotactl, and thus the current release of isolate, when running tests. We apparently need tmpfs quotas on github actions. New Isolate release uses quotactl_fd exclusively.

Also, since the quota setting is completely global, I had to set it to a somewhat high value (or well, I don't actually know how big is required, but since it's also used during compilation i set it to 64MB). And then I had to make the loop filesystem quite large also...

I also wrote some documentation about quotas in the sample config file (should this be moved to the documentation instead?). The conditions for how to configure the filesystem for quota support were found by pure trial and error :), but during it I discovered some combinations where isolate runs successfully with the --quota setting, but the quota is silently not enforced. This seems a bit scary, perhaps we should explicitly warn against this...

Closes #1242, closes #1005, closes #916, closes #309, maybe a few more :)

@prandla prandla requested a review from gollux August 26, 2025 07:56
@prandla
Copy link
Member Author

prandla commented Aug 26, 2025

Oh, i just noticed the error in the CI logs (which was actually there earlier too, i just missed it...)

"mount: /var/lib/isolate: mount(2) system call failed: No such process."

I have no idea what this means. It works on my machine, and I don't have any ubuntu machines to test on...

@prandla
Copy link
Member Author

prandla commented Aug 26, 2025

well great. github actions is running kernel 6.11.0-1018-azure, which simply does not have quota support compiled in (neither built-in nor as a kernel module). (this wasn't easy to even verify because they also disabled /proc/config.gz.)

So, time to go spelunking for a kernel that supports it i guess???

@prandla
Copy link
Member Author

prandla commented Aug 26, 2025

Okay, further experimentation shows that while there is no way to get a more suitable kernel, there is a way to use quotas: the lack of quota support i mentioned earlier turned out to be a lack of support for the ext4 quota file format. tmpfs quotas are supported though.

This still means we need a new isolate release though, because the only way to access tmpfs quotas is via quotactl_fd. I hacked it together here: prandla/isolate@8c4f994 but you can do a nicer implementation that checks for #ifdef SYS_quotactl_fd if you want.

* remove the "secure commands" mechanism, run all commands in isolate
* remove the allow_writing_only mechanism (to be replaced with proper FS
  quotas)
* simplify the cleanup() logic, always delete sandboxes after we are
  finished using them (sandbox archive works well enough for debugging
  failures)
This behavior is allowed now.
@prandla prandla force-pushed the isolate-box-dir branch 2 times, most recently from d449c51 to 2a147ab Compare August 31, 2025 08:04
@prandla
Copy link
Member Author

prandla commented Aug 31, 2025

Rewrote the description of quotas in cms.sample.toml a bit. (Is that too big of a wall of text? Should it maybe be moved to the documentation instead?) Also removed max_file_size (see commit message).

Getting tests to pass on Github Actions is still blocked on the Isolate PR linked above. All tests pass on my machine.

It breaks dotnet, and it was not secure anyways; quotas are a better
replacement.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant