Raw file pread from multiple processes #9239
nickva added a commit to nickva/otp that referenced this issue Dec 26, 2024
At the C API level pread [1] can be called from multiple threads. In Erlang, however, all the calls must be made from a single owner process. In a busy concurrent environment that means all operations for a file descriptor must be serialized, and they can pile up in the controller's mailbox, waiting for potentially slower operations to complete.

To fix it, add the ability to make raw concurrent pread calls from Erlang as well. File descriptor lifetime is still tied to the owner process, so it gets closed when the owner dies. Other processes are only allowed to make pread calls; all other operations still require going through the owner.

Other file layers, like compression and delayed IO, keep the previous behavior. They have their own `get_fd_data/1` functions per layer which check controller ownership. Concurrent preads are not allowed in those layers.

In unix_prim_file.c the seek+read fallback would have required exposing a flag in Erlang in order to keep the old behavior, since another process could see the temporarily changed position. However, before adding a new flag I looked at where pread might not be supported, and it seems most Unix-like OSes since SunOS 5 (Solaris 2.5) and AT&T Sys V Rel4 (so all modern BSDs) have it. So perhaps it's safe to remove the fallback altogether and simplify the code? As a precaution I kept a configure check with an early failure and a clear message about it.

This necessitates updating preloaded beam files, so an OTP team member would have to take over the PR, if the idea is acceptable to start with. I didn't commit the beam files, so any CI tests might not run either?

Issue: erlang#9239

[1] https://www.man7.org/linux/man-pages/man2/pread.2.html
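To illustrate why a seek+read fallback is a problem once more than one caller shares a descriptor, here is a minimal C sketch. It is not the actual code from unix_prim_file.c: `pread_fallback`, `observe_position`, the `in_window` hook and `fallback_window_demo` are hypothetical names made up for this example, and the temp file under `/tmp` is an assumption. The hook stands in for "another process" that happens to look at the descriptor while the position is temporarily moved.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Position seen by a simulated "other user" of the descriptor. */
static off_t observed_pos = -1;

static void observe_position(int fd) {
    observed_pos = lseek(fd, 0, SEEK_CUR);
}

/* Hypothetical pread emulation: save position, seek, read, restore.
 * in_window (may be NULL) runs while the shared position is moved. */
ssize_t pread_fallback(int fd, void *buf, size_t n, off_t off,
                       void (*in_window)(int)) {
    assert(buf != NULL);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved < 0) return -1;
    if (lseek(fd, off, SEEK_SET) < 0) return -1;
    if (in_window) in_window(fd);   /* anyone sharing fd sees `off` here */
    ssize_t r = read(fd, buf, n);
    int read_errno = errno;
    if (lseek(fd, saved, SEEK_SET) < 0) return -1;
    errno = read_errno;
    return r;
}

/* Returns 0 if the fallback leaked the temporary position while
 * a real pread left the shared position untouched. */
int fallback_window_demo(void) {
    char path[] = "/tmp/pread_fallback_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (write(fd, "0123456789", 10) != 10) { close(fd); return -1; }

    /* After the write the shared position is at end of file (10). */
    char buf[4] = {0};
    ssize_t r = pread_fallback(fd, buf, 3, 2, observe_position);
    int ok = r == 3 && memcmp(buf, "234", 3) == 0;
    ok = ok && observed_pos == 2;              /* temporary position leaked */
    ok = ok && lseek(fd, 0, SEEK_CUR) == 10;   /* restored afterwards */

    memset(buf, 0, sizeof buf);
    r = pread(fd, buf, 3, 5);                  /* real pread: no seek at all */
    ok = ok && r == 3 && memcmp(buf, "567", 3) == 0;
    ok = ok && lseek(fd, 0, SEEK_CUR) == 10;

    close(fd);
    return ok ? 0 : -1;
}
```

Calling `fallback_window_demo()` from a small `main` should return 0 on a system with a native `pread`. In a real concurrent setting the window is hit by timing rather than by a hook, which is what makes the fallback unsafe to share.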
Is your feature request related to a problem? Please describe.
I would like to make plain `pread` calls using raw file handles from non-owner processes.

In CouchDB we use a single raw file handle per database file. We open files in append-only + read mode and don't use any extra layers ('read-ahead', 'compression', etc.), just plain raw fds. Then we issue either `pread` or `write` API calls. Write calls append to the end of the file (appending headers, copy-on-write btree nodes, etc). Reads are just `pread` calls at essentially random file offsets. Since all the operations are serialized via a single owner process, a high rate of writes could block reads. It would be beneficial if we could issue `pread` calls from non-owner processes, so that reads wouldn't wait on potentially long-running write calls to complete.

It seems that, at least at the POSIX file descriptor level, `pread` calls are thread-safe and can be used from concurrent threads.
From the pread(2) man page:
In Erlang it's currently not possible to do that, and the pread operations fail when called from a non-owner process.
Looking through various forums and discussions online, it seemed at first like it would be doable in Erlang as well: https://erlangforums.com/t/share-the-same-file-handle-between-multiple-erlang-processes/3039 but after trying it out on the latest OTP version, it's still not possible.
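For reference, the POSIX-level behavior described above can be sketched in C as follows. This is only an illustration under assumptions: the names `reader`, `reader_arg` and `concurrent_pread_demo` are invented here, the temp file lives under `/tmp`, and the program may need to be built with `-pthread` on some systems. Several threads share one descriptor and each reads its own chunk with `pread`, without any locking and without disturbing the shared file position.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_READERS 4
#define CHUNK 8   /* "chunk-N\n" is exactly 8 bytes for N in 0..9 */

struct reader_arg {
    int fd;            /* descriptor shared by all readers */
    off_t offset;      /* where this reader should read */
    char buf[CHUNK];
    ssize_t nread;
};

static void *reader(void *p) {
    struct reader_arg *a = p;
    assert(a != NULL);
    /* pread takes an explicit offset and does not touch the fd position. */
    a->nread = pread(a->fd, a->buf, CHUNK, a->offset);
    return NULL;
}

/* Returns 0 if every reader got the chunk at its own offset and the
 * shared file position was left where the writer put it. */
int concurrent_pread_demo(void) {
    char path[] = "/tmp/pread_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);

    char text[32];
    for (int i = 0; i < NUM_READERS; i++) {
        snprintf(text, sizeof text, "chunk-%d\n", i);
        if (write(fd, text, CHUNK) != CHUNK) { close(fd); return -1; }
    }
    off_t before = lseek(fd, 0, SEEK_CUR);   /* end of the written data */

    pthread_t tids[NUM_READERS];
    struct reader_arg args[NUM_READERS];
    int started = 0;
    for (int i = 0; i < NUM_READERS; i++) {
        args[i].fd = fd;
        args[i].offset = (off_t)i * CHUNK;
        args[i].nread = -1;
        if (pthread_create(&tids[i], NULL, reader, &args[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);

    int ok = (started == NUM_READERS);
    for (int i = 0; ok && i < NUM_READERS; i++) {
        snprintf(text, sizeof text, "chunk-%d\n", i);
        ok = args[i].nread == CHUNK && memcmp(args[i].buf, text, CHUNK) == 0;
    }
    ok = ok && lseek(fd, 0, SEEK_CUR) == before;   /* position unchanged */

    close(fd);
    return ok ? 0 : -1;
}
```

The request in this issue is essentially to allow the Erlang equivalent: non-owner processes playing the role of the reader threads, while the owner process keeps doing the appends.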
Describe the solution you'd like
Have the ability to issue pread calls on plain raw file handles from non-owner processes. This seems to be possible on Linux, and on Windows as well via the "overlapped" option https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile?redirectedfrom=MSDN#syntax, though I am not entirely sure here as I am not very familiar with the Windows API.
Describe alternatives you've considered
Memory mapping using the emmap library could be an option. That means we'd have to map the whole file to memory and constantly unmap / remap it as it's being appended to. But, since the OS file API allows preads from different threads, perhaps it would be possible to bring that feature to Erlang as well.
Also, at first sight, it may seem mmap-ing should be "faster", but that's not necessarily true, at least based on a benchmark someone ran for Rust a few years ago: https://internals.rust-lang.org/t/introduce-write-at-write-all-at-read-at-read-exact-at-on-windows/19649/23
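To make the unmap / remap cost of the mmap alternative concrete, here is a rough C sketch of what a reader of an append-only file would have to do after each append. It does not use or describe the emmap library's API; `mapped_file`, `remap_to_current_size` and `mmap_remap_demo` are hypothetical names for this example only, and the temp file under `/tmp` is an assumption.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct mapped_file {
    int fd;
    void *addr;   /* NULL when nothing is mapped */
    size_t len;   /* length of the current mapping */
};

/* Drop the old mapping (if any) and map the file at its current size.
 * Returns 0 on success, -1 on error. */
int remap_to_current_size(struct mapped_file *m) {
    assert(m != NULL);
    struct stat st;
    if (fstat(m->fd, &st) < 0) return -1;
    if (m->addr != NULL) {
        if (munmap(m->addr, m->len) < 0) return -1;
        m->addr = NULL;
        m->len = 0;
    }
    if (st.st_size == 0) return 0;   /* mmap rejects a zero length */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, m->fd, 0);
    if (p == MAP_FAILED) return -1;
    m->addr = p;
    m->len = (size_t)st.st_size;
    return 0;
}

/* Returns 0 if appended data only becomes reachable after a remap. */
int mmap_remap_demo(void) {
    char path[] = "/tmp/mmap_demo_XXXXXX";
    struct mapped_file m = { mkstemp(path), NULL, 0 };
    if (m.fd < 0) return -1;
    unlink(path);

    int ok = write(m.fd, "head", 4) == 4
          && remap_to_current_size(&m) == 0
          && m.len == 4 && memcmp(m.addr, "head", 4) == 0;

    /* Append more data: the existing mapping still covers only 4 bytes. */
    ok = ok && write(m.fd, "node", 4) == 4;
    ok = ok && remap_to_current_size(&m) == 0
            && m.len == 8 && memcmp(m.addr, "headnode", 8) == 0;

    if (m.addr != NULL) munmap(m.addr, m.len);
    close(m.fd);
    return ok ? 0 : -1;
}
```

Every append forces a `munmap` plus `mmap` round trip before the new bytes can be read through the mapping, and concurrent readers would have to coordinate around the moment the old address becomes invalid. A plain `pread` at an explicit offset avoids both.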