-
Notifications
You must be signed in to change notification settings - Fork 15k
[lld][MachO] Follow-up to use madvise() for threaded file page-in. #157917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-support @llvm/pr-subscribers-lld-macho Author: John Holdsworth (johnno1962) ChangesFurther to #147134 (comment), switch to use the madvise() api to page in mmap'd files. Full diff: https://github.com/llvm/llvm-project/pull/157917.diff 1 Files Affected:
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 3db638e1ead96..2635c82a53448 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -53,6 +53,8 @@
#include "llvm/TextAPI/Architecture.h"
#include "llvm/TextAPI/PackedVersion.h"
+#include <sys/mman.h>
+
using namespace llvm;
using namespace llvm::MachO;
using namespace llvm::object;
@@ -334,11 +336,10 @@ class SerialBackgroundQueue {
// This code forces the page-ins on multiple threads so
// the process is not stalled waiting on disk buffer i/o.
void multiThreadedPageInBackground(DeferredFiles &deferred) {
- static const size_t pageSize = Process::getPageSizeEstimate();
static const size_t largeArchive = 10 * 1024 * 1024;
#ifndef NDEBUG
using namespace std::chrono;
- std::atomic_int numDeferedFilesTouched = 0;
+ std::atomic_int numDeferedFilesAdvised = 0;
static std::atomic_uint64_t totalBytes = 0;
auto t0 = high_resolution_clock::now();
#endif
@@ -349,13 +350,11 @@ void multiThreadedPageInBackground(DeferredFiles &deferred) {
return;
#ifndef NDEBUG
totalBytes += buff.size();
- numDeferedFilesTouched += 1;
+ numDeferedFilesAdvised += 1;
#endif
- // Reference all file's mmap'd pages to load them into memory.
- for (const char *page = buff.data(), *end = page + buff.size(); page < end;
- page += pageSize)
- LLVM_ATTRIBUTE_UNUSED volatile char t = *page;
+ // Advise that mmap'd files should be loaded into memory.
+ madvise((void *)buff.data(), buff.size(), MADV_WILLNEED);
};
#if LLVM_ENABLE_THREADS
{ // Create scope for waiting for the taskGroup
@@ -376,7 +375,7 @@ void multiThreadedPageInBackground(DeferredFiles &deferred) {
auto dt = high_resolution_clock::now() - t0;
if (Process::GetEnv("LLD_MULTI_THREAD_PAGE"))
llvm::dbgs() << "multiThreadedPageIn " << totalBytes << "/"
- << numDeferedFilesTouched << "/" << deferred.size() << "/"
+ << numDeferedFilesAdvised << "/" << deferred.size() << "/"
<< duration_cast<milliseconds>(dt).count() / 1000. << "\n";
#endif
}
|
dc9ac78
to
0687bce
Compare
0687bce
to
65e433c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
✅ With the latest revision this PR passed the C/C++ code formatter. |
01fd8d8
to
1f188be
Compare
1f188be
to
59e77f5
Compare
d653ae5
to
298380e
Compare
5be953a
to
b87e01e
Compare
b87e01e
to
64d5be0
Compare
1636d63
to
11711bb
Compare
I've included changes to two additional files to counter the performance regression of 2 seconds from 85cd3d9 for the linker when using the --worker-threads option due to not mmap()ing many input files. Binary object and archive files don't need to be null terminated. This seemed the least intrusive change. |
The one in |
@drodriguez I wondered about .tbd files. Basically, the new code in the commit I mention, prevents most input files from using mmap() to solve an obscure clang problem on Linux. I'm open to suggestions about what to do about this. Edit:
This is an incorrect assumption see below.. |
241b686
to
884b771
Compare
llvm/lib/Support/MemoryBuffer.cpp
Outdated
// don't use mmap in that case (unless it is object or archive file). | ||
if (!RequiresNullTerminator || *Result->getBufferEnd() == '\0' || | ||
StringRef(Filename.str()).ends_with(".o") || | ||
StringRef(Filename.str()).ends_with(".a")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I feel about leaking object file logic into MemoryBuffer.cpp
. Could we just use RequiresNullTerminator = false
in getFile()
for object files (or however this code is called)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, this version is awful but at least the fix is located close to the change that caused the problem. Have a look at the commit to see the previous version which passed in RequiresNullTerminator = false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas how we should handle this @zygoloid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we be setting RequiresNullTerminator
when mapping a .o
or .a
file? That sounds like a bug in the caller to me.
Can you explain how it prevents most input files from using mmap? I would expect that for most input files either the file is binary (in which case |
The default value for RequiresNullTerminator in the header for MemoryBuffer::getFile(...) is true when not specified. |
Can we fix that? That really seems like the wrong default, given that -- as we've discovered -- guaranteeing a null terminator is not free. Failing to provide the requested guarantee in some cases seems like the wrong answer. |
This reverts commit 884b771.
@zygoloid I found the existing discussion about why not only Linux in the comments of #152595. Makes sense. The problem is that the default of |
Why? They'd only take the slow path if their size is a multiple of the page size or they hit an OS bug. |
Object files are a multiple of the page size no? And what about source files that happen to have a length which is a multiple of the page size? |
If so, any code opening them as a memory buffer really shouldn't be passing |
@zygoloid I cannot find an online link, but for example, in my macOS 15.6.1,
And there's no @johnno1962 : I have checked the YAML/JSON parsers for the tbd and they seem to check for the length of the files and not blindly searching for the end of the string. I cannot say those are the only textual files that are processed (linker scripts files and similar might be accepted?). If you have executed the test suite and nothing pops, maybe this is correct, but the situation described in #152595 is not something that can be replicated easily in the tests, so this might be introducing problems. |
FWIW Object files seem to have a length which is a multiple of 8. |
It may still be needed in some cases even without the Linux bug. For example, if another process (say, a text editor) mmaps the file and writes into the slack bytes, then clang is executed while that other process still has the file open, clang's mmap might see those non-zero bytes, because it will share the pages. It should be easy enough to write a test program to see if that happens in practice. (Does the OS zero the slack bytes when the last page is first mapped, or each time it's mapped?) But to reiterate, the problem here is that people are incorrectly passing |
Co-authored-by: James Henderson <[email protected]>
So I guess the current version of this PR is as good as it gets for now. Probably, the default argument should be removed from MemoryBuffer::getFile altogether. It is referenced in 245 places (about half tests). There's also the preceding argument static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFile(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true, bool IsVolatile = false,
std::optional<Align> Alignment = std::nullopt);
ErrorOr<std::unique_ptr<MemoryBuffer>>
MemoryBuffer::getFile(const Twine &Filename, bool IsText,
bool RequiresNullTerminator, bool IsVolatile,
std::optional<Align> Alignment) {
return getFileAux<MemoryBuffer>(Filename, /*MapSize=*/-1, /*Offset=*/0,
IsText, RequiresNullTerminator, IsVolatile,
Alignment);
}
template <typename MB>
static ErrorOr<std::unique_ptr<MB>>
getFileAux(const Twine &Filename, uint64_t MapSize, uint64_t Offset,
bool IsText, bool RequiresNullTerminator, bool IsVolatile,
std::optional<Align> Alignment) {
Expected<sys::fs::file_t> FDOrErr = sys::fs::openNativeFileForRead(
Filename, IsText ? sys::fs::OF_TextWithCRLF : sys::fs::OF_None); Looks like this would be a difficult signature to modify as it is also referenced (using the default) 54 times in the swift project. One migration path would be to create a new API without the default e.g. getFileBuffer and people can slowly switch to that. |
0b53f29
to
80767a2
Compare
@zygoloid, just an update on what I'm trying with recent commits. I'm seeking to verify the follow change to your change:
... along with some other minor changes to the signatures of functions internal to MemoryBuffer.cpp to marshal the IsText value through to getOpenFileImpl(). The difference is IsText has a default argument value of false whereas the default value for RequiresNullTerminator is true when you don't specify a value which seems to be the common case. I appreciate this is a minor fib for the code but there don't seem to be many other solutions given the constraints of a well used entry point. I don't imagine many source buffers that require null termination are not also being specified as text in practice and if that assumption didn't hold the worst case is to revert to the behaviour before Aug 13th. How would you feel about me making this change? There were Ci problems but they fixed themselves. If you're wondering how I'm testing/proving this change I have a trial link of the largest .dylib in Chrome which always takes 14 seconds using
P.S. I worked out where the missing 2 seconds are actually going. It's not because i/o is dropping down from mmap() to a file read at all but the very act of checking for the terminating byte waits for a synchronous page read, single threaded for the 22,000 input files being opened in the example link. .1ms per random access read from secondary storage seems credible.
|
27f3d37
to
f2654ae
Compare
Further to #147134 (comment), switch to use the madvise() api to page in mmap'd files.