8365526: Crash with null Symbol passed to SystemDictionary::resolve_or_null #28438
tkrodriguez left a comment
Thanks for tracking this down! Looks good.
// Only add a new resolution error if one hasn't been found for this constant pool index. In this case,
// resolution succeeded but there's an error in this nest host.
assert(pool->resolved_klass_at(which) != nullptr, "klass is should be resolved if there is no entry");
ResolutionErrorTable::add_entry(pool, which, message);
I might be inclined to swap the cases.
if (entry == nullptr) {
...
} else if (entry->nest_host_error() == nullptr) {
...
}
Is there ever a situation where replacing an entry in ResolutionErrorTable is correct? Maybe there should be a check for that somewhere?
Yes, this reorganization would look nicer.
No, there's never a situation where replacing an entry in the ResolutionErrorTable is correct, because HashTable::put() leaks the value that it replaces. I've been testing an assert for this.
In general, put() can leak the replaced value. I ran a test and this is the only place where we currently leak, but I think we should fix that separately.
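The leak described here can be reproduced in a small standalone model. This is only a sketch: `Entry`, `Table`, and `leaked_after_overwrite` are hypothetical names, and `std::unordered_map` stands in for the HotSpot hash table, which is assumed to store raw owning pointers and to overwrite on `put()` without freeing the old value.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Counts live Entry objects so a leak shows up as a non-zero difference.
static int g_live_entries = 0;

// Hypothetical stand-in for ResolutionErrorEntry.
struct Entry {
  std::string message;
  explicit Entry(std::string m) : message(std::move(m)) { ++g_live_entries; }
  ~Entry() { --g_live_entries; }
};

// Hypothetical table holding raw owning pointers.
class Table {
  std::unordered_map<int, Entry*> _map;
 public:
  // Overwrites any existing value without deleting it, so the old Entry leaks.
  void put(int key, Entry* e) { _map[key] = e; }

  // Frees only the entries that are still reachable through the map.
  ~Table() {
    for (auto& kv : _map) delete kv.second;
  }
};

// Returns how many Entry objects outlive the table after one overwrite.
inline int leaked_after_overwrite() {
  int before = g_live_entries;
  {
    Table t;
    t.put(1, new Entry("resolution error"));
    t.put(1, new Entry("nest host error"));  // first Entry is now unreachable
  }
  return g_live_entries - before;
}
```

In this model the first entry is never freed, which is the kind of condition an assert inside `put()` could catch.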
bool created = false;
_resolution_error_table->put_if_absent(key, entry, &created);
assert(created, "should be created not updated");
The above 3 lines can be replaced with _resolution_error_table->put_when_absent(key, entry).
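To illustrate why the two forms are interchangeable here, the following sketch models both operations on a toy table. `MiniTable` is hypothetical and is not the HotSpot `HashTable`; it assumes, as the comment above implies, that `put_when_absent` treats a pre-existing key as a programming error.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical miniature table, for illustration only.
class MiniTable {
  std::unordered_map<int, std::string> _map;
 public:
  // Inserts only if the key is absent; *created reports which case happened.
  std::string* put_if_absent(int key, const std::string& value, bool* created) {
    auto result = _map.emplace(key, value);
    *created = result.second;
    return &result.first->second;
  }

  // Same insertion, but the "must be absent" precondition is checked inside.
  void put_when_absent(int key, const std::string& value) {
    bool inserted = _map.emplace(key, value).second;
    assert(inserted && "key must not already be present");
    (void)inserted;  // avoids an unused-variable warning when asserts are off
  }

  // Returns a pointer to the stored value, or nullptr if the key is absent.
  const std::string* find(int key) const {
    auto it = _map.find(key);
    return it == _map.end() ? nullptr : &it->second;
  }
};
```

The caller-side flag plus assert collapses into a single call, and neither form replaces an existing value.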
// still want to add the error message for the higher-level access checks to report. We should
// only reach here under the same error condition, so we can ignore the potential race with setting
// the message. If we see it is already set then we can ignore it.
entry->set_nest_host_error(message);
Existing -- shouldn't we free the old entry->_nest_host_error?
Also, there's a related memory leak here:
// Add entry to resolution error table to record the error when the first
// attempt to resolve a reference to a class has failed.
void SystemDictionary::add_resolution_error(const constantPoolHandle& pool, int which,
Symbol* error, const char* message,
Symbol* cause, const char* cause_msg) {
{
MutexLocker ml(Thread::current(), SystemDictionary_lock);
ResolutionErrorEntry* entry = ResolutionErrorTable::find_entry(pool, which);
if (entry == nullptr) {
ResolutionErrorTable::add_entry(pool, which, error, message, cause, cause_msg);
} else {
// message and cause_msg are leaked <<<<<<<<<<
}
}
}
In the SystemDictionary case, we're fine. You wouldn't think so, but we are. That's because message and cause_msg are resource allocated, and those strings are strdup'ed in the constructor of the table entry. InstanceKlass::nest_host has a memory leak though, because ResolutionErrorEntry does take ownership of the underlying string pointer, so we have this:
const char* msg = ss.as_string(true /* on C-heap */);
constantPoolHandle cph(THREAD, constants());
SystemDictionary::add_nest_host_error(cph, _nest_host_index, msg);
// ... down the callstack we go, reaching the constructor call:
ResolutionErrorEntry *entry = new ResolutionErrorEntry(message);
ResolutionErrorEntry(const char* message):
_error(nullptr),
_message(nullptr),
_cause(nullptr),
_cause_msg(nullptr),
_nest_host_error(message) {} // <-- Noooo
As opposed to the other constructor, which looks like this:
// This is the call to the constructor this time:
ResolutionErrorEntry *entry = new ResolutionErrorEntry(error, message, cause, cause_msg);
ResolutionErrorEntry::ResolutionErrorEntry(Symbol* error, const char* message,
Symbol* cause, const char* cause_msg):
_error(error),
_message(message != nullptr ? os::strdup(message) : nullptr),
_cause(cause),
_cause_msg(cause_msg != nullptr ? os::strdup(cause_msg) : nullptr),
_nest_host_error(nullptr) {
Symbol::maybe_increment_refcount(_error);
Symbol::maybe_increment_refcount(_cause);
}
This is actually pretty bad :-/, I'd really appreciate it if we could make these types of bugs a bit more shallow at the time of writing them.
Maybe it'd be nice to have a type that tells the reader that an object doesn't intend to free a received pointer on its destruction? This is a very small sketch of something illustrating kind of what I mean:
template<typename T>
using Borrow = T*;
template<typename T>
using Own = T*;
// "I'll take a string, but I don't intend to be responsible for freeing it"
const char* os::strdup(Borrow<const char>, MemTag) { /* ... */}
class SystemDictionary {
Own<const char> _message; // I own this, and so I intend to free it when I'm destroyed
Own<const char> _cause_msg; // Same here
// "I'll take a message and a cause_msg, and I won't be responsible for freeing it"
void SystemDictionary::add_resolution_error(const constantPoolHandle& pool, int which,
Symbol* error, Borrow<const char> message,
Symbol* cause, Borrow<const char> cause_msg) :
// Reader meant to think: Wait, we're assigning a Borrow to an Own directly? Seems wrong.
_message(message),
// Reader meant to think: Aah, we're making a copy to get ownership
_cause_msg(os::strdup(cause_msg))
{
/* ... */
}
};
This won't produce any compiler errors for us in case of incorrect usage, but it will be a sign to the reader that SystemDictionary doesn't intend to clean up message or cause_msg, and that the writer actually thought about the possibility of a leak from these strings.
I'm not suggesting this is what we add, I'm just saying that clearly we can communicate more in the code than we currently do.
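For reference, here is a compilable variant of the same idea. It is a sketch only: `dup_string` is a hypothetical stand-in for os::strdup built on malloc, and `ErrorEntry` is a hypothetical owner type, not ResolutionErrorEntry. As in the sketch above, the aliases are plain pointers, so the compiler enforces nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Annotation-only aliases: both are just T* to the compiler.
template <typename T> using Borrow = T*;
template <typename T> using Own = T*;

// Hypothetical helper: borrows src and returns a malloc'ed copy the caller owns.
inline Own<char> dup_string(Borrow<const char> src) {
  if (src == nullptr) return nullptr;
  std::size_t len = std::strlen(src) + 1;
  char* copy = static_cast<char*>(std::malloc(len));
  if (copy != nullptr) std::memcpy(copy, src, len);
  return copy;
}

// Hypothetical entry type that owns its message and frees it on destruction.
class ErrorEntry {
  Own<char> _message;
 public:
  // Takes a Borrow, so it copies in order to obtain an Own.
  explicit ErrorEntry(Borrow<const char> message)
      : _message(dup_string(message)) {}
  ~ErrorEntry() { std::free(_message); }

  // Copying would lead to a double free, so it is disabled.
  ErrorEntry(const ErrorEntry&) = delete;
  ErrorEntry& operator=(const ErrorEntry&) = delete;

  // Hands out a borrowed view; the entry keeps ownership.
  Borrow<const char> message() const { return _message; }
};
```

Because the entry copies what it borrows, the caller stays responsible for its own buffer and can modify or free it without affecting the entry.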
I made a ticket for this: 8372373
if (entry == nullptr) {
  // Only add a new resolution error if one hasn't been found for this constant pool index. In this case,
  // resolution succeeded but there's an error in this nest host.
  assert(pool->resolved_klass_at(which) != nullptr, "klass is should be resolved if there is no entry");
typo: "klass should be resolved ..."
The VM was crashing because the constant pool couldn't find the resolution error in the error field of the ResolutionErrorEntry.
There are two uses of ResolutionErrorEntry in the ResolutionErrorTable. The key to this table is {ConstantPool, cp-index}. In this crash, multiple threads were racing to record nest_host_errors in the case where resolution failed, so there was already a ResolutionErrorEntry in the table for the constant pool resolution failure. The 'if' case of add_nest_host_error checks whether there's already a nest_host_error, assuming it's the same error; the 'else' case then unconditionally added a ResolutionErrorEntry with just the nest host message. Calling HashTable::put() with this entry overwrote the entry holding the constant pool resolution error, i.e. the other fields. The crash happened in ConstantPool::throw_resolution_error() because the error field was overwritten (and leaked too).
Added a null check before calling ResolutionErrorTable::add_entry. Also added an assert that we only add a resolution error entry for nest host errors when resolution succeeded, since in the failure case there will always already be a ResolutionErrorEntry for the failing constant pool and cp index, and we don't want to overwrite it.
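The overwrite and the fix can be modeled in a few lines of standalone code. This is a simplified sketch, not the HotSpot code: `Entry`, `Key`, `Table`, and both functions are hypothetical, an empty string models a null field, `std::map` stands in for the ResolutionErrorTable, and the calls are assumed to be serialized by a lock so that the second racing thread is simply the second call.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical entry: an empty string models a null field.
struct Entry {
  std::string error;            // set when constant pool resolution failed
  std::string nest_host_error;  // set when nest host validation failed
};

using Key = std::pair<int, int>;  // hypothetical {pool id, cp index}
using Table = std::map<Key, Entry>;

// Old shape: the else branch also runs when an entry that already has a
// nest host error exists, replacing it and dropping the error field.
inline void add_nest_host_error_old(Table& t, Key k, const std::string& msg) {
  auto it = t.find(k);
  if (it != t.end() && it->second.nest_host_error.empty()) {
    it->second.nest_host_error = msg;
  } else {
    t[k] = Entry{"", msg};  // models put(): overwrites the whole entry
  }
}

// Fixed shape: add a new entry only when none exists; otherwise set the
// message if it is unset and leave everything else alone.
inline void add_nest_host_error_fixed(Table& t, Key k, const std::string& msg) {
  auto it = t.find(k);
  if (it == t.end()) {
    t.emplace(k, Entry{"", msg});
  } else if (it->second.nest_host_error.empty()) {
    it->second.nest_host_error = msg;
  }
}
```

With an existing resolution-failure entry, two calls to the old shape leave the error field empty, while the fixed shape keeps it intact.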
Tested with submitted reproducer and tier1-4.
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28438/head:pull/28438
$ git checkout pull/28438
Update a local copy of the PR:
$ git checkout pull/28438
$ git pull https://git.openjdk.org/jdk.git pull/28438/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28438
View PR using the GUI difftool:
$ git pr show -t 28438
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28438.diff