@coleenp
Contributor

@coleenp coleenp commented Nov 20, 2025

The VM was crashing because the constant pool couldn't find the resolution error in the error field of the ResolutionErrorEntry.

ResolutionErrorEntry has two uses in the ResolutionErrorTable, whose key is {ConstantPool, cp-index}. In this crash, multiple threads were racing to record nest_host_errors after resolution had failed, so the table already held a ResolutionErrorEntry for the constant pool resolution failure. The 'if' case of add_nest_host_error checks whether there is already a nest_host_error, assuming it is the same error, but the 'else' case unconditionally added a ResolutionErrorEntry containing only the nest host message. Calling HashTable::put() with that message-only entry overwrote the entry holding the constant pool resolution error, i.e. its other fields. The crash happened in ConstantPool::throw_resolution_error() because the error field had been overwritten (and the replaced entry leaked too).

Add a null check before calling ResolutionErrorTable::add_entry. Also add an assert that we only add a new entry for a nest host error when resolution succeeded: when resolution fails there is always already a ResolutionErrorEntry for the failing constant pool and cp index, and we don't want to overwrite it.
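
The overwrite described above can be modelled outside the VM. The following is only a sketch based on my reading of the description, not the HotSpot code: `Entry`, `Key`, `Table` and both functions are hypothetical stand-ins, a `std::map` replaces the real hash table, and the two racing threads are simulated by calling the function twice in a row.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for ResolutionErrorEntry, reduced to two fields.
struct Entry {
  std::string error;            // constant pool resolution error (empty = none)
  std::string nest_host_error;  // nest host message (empty = none)
};

// Key is {constant pool, cp index}; an int stands in for the ConstantPool*.
using Key = std::pair<int, int>;
using Table = std::map<Key, Entry>;

// Assumed shape of the old logic: set the message on an existing entry that
// has none, otherwise store a message-only entry even if the key is present.
void add_nest_host_error_old(Table& table, Key key, const std::string& message) {
  auto it = table.find(key);
  if (it != table.end() && it->second.nest_host_error.empty()) {
    it->second.nest_host_error = message;
  } else {
    table[key] = Entry{"", message};  // clobbers the 'error' field if key exists
  }
}

// Assumed shape of the fix: only insert when no entry exists for the key.
void add_nest_host_error_fixed(Table& table, Key key, const std::string& message) {
  auto it = table.find(key);
  if (it == table.end()) {
    bool inserted = table.emplace(key, Entry{"", message}).second;
    assert(inserted && "should be created, not updated");
    (void)inserted;
  } else if (it->second.nest_host_error.empty()) {
    it->second.nest_host_error = message;
  }
  // Otherwise the message is already set and the existing entry is left alone.
}
```

In this model the second call to `add_nest_host_error_old` for the same key wipes the `error` field that the first resolution failure recorded, while `add_nest_host_error_fixed` leaves it intact.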

Tested with submitted reproducer and tier1-4.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365526: Crash with null Symbol passed to SystemDictionary::resolve_or_null (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28438/head:pull/28438
$ git checkout pull/28438

Update a local copy of the PR:
$ git checkout pull/28438
$ git pull https://git.openjdk.org/jdk.git pull/28438/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28438

View PR using the GUI difftool:
$ git pr show -t 28438

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28438.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 20, 2025

👋 Welcome back coleenp! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 20, 2025

@coleenp This change is no longer ready for integration - check the PR body for details.

@openjdk

openjdk bot commented Nov 20, 2025

@coleenp The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 20, 2025
@mlbridge

mlbridge bot commented Nov 20, 2025

Webrevs

Contributor

@tkrodriguez tkrodriguez left a comment

Thanks for tracking this down! Looks good.

// Only add a new resolution error if one hasn't been found for this constant pool index. In this case,
// resolution succeeded but there's an error in this nest host.
assert(pool->resolved_klass_at(which) != nullptr, "klass is should be resolved if there is no entry");
ResolutionErrorTable::add_entry(pool, which, message);
Contributor

I might be inclined to swap the cases.

if (entry == nullptr) {
    ...
} else if (entry->nest_host_error() == nullptr) {
    ...
}

Is there ever a situation where replacing an entry in ResolutionErrorTable is correct? Maybe there should be a check for that somewhere?

Contributor Author

Yes, this reorganization would look nicer.
No, there's never a situation where replacing an entry in the ResolutionErrorTable is correct, because HashTable::put() leaks the value that it replaces. I've been testing an assert for this.
In general this function can leak the value, but I ran a test and this is the only place where we currently leak. I think we should fix that separately.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 21, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 21, 2025
Comment on lines +88 to +90
bool created = false;
_resolution_error_table->put_if_absent(key, entry, &created);
assert(created, "should be created not updated");
Member

The above 3 lines can be replaced with _resolution_error_table->put_when_absent(key, entry).
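
For readers unfamiliar with the three insertion flavours being discussed, here is a rough model of their semantics on top of `std::unordered_map`. The function names mirror the ones mentioned in this thread, but the bodies and exact signatures are my assumptions for illustration and are not HotSpot's hashtable implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Table = std::unordered_map<int, std::string>;

// Models put(): overwrites any existing value; returns true if the key was new.
bool put(Table& t, int key, const std::string& value) {
  return t.insert_or_assign(key, value).second;
}

// Models put_if_absent(): keeps an existing value, reports via *created
// whether a new entry was made, and returns a pointer to the stored value.
std::string* put_if_absent(Table& t, int key, const std::string& value, bool* created) {
  auto result = t.try_emplace(key, value);
  *created = result.second;
  return &result.first->second;
}

// Models put_when_absent(): the caller promises the key is absent, and the
// promise is checked with an assert instead of being reported back.
void put_when_absent(Table& t, int key, const std::string& value) {
  bool created = t.try_emplace(key, value).second;
  assert(created && "key must not already be present");
  (void)created;
}
```

Under this model, `put_if_absent` followed by `assert(created, ...)` and a single `put_when_absent` call check the same condition, which is the equivalence the suggestion relies on.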

// still want to add the error message for the higher-level access checks to report. We should
// only reach here under the same error condition, so we can ignore the potential race with setting
// the message. If we see it is already set then we can ignore it.
entry->set_nest_host_error(message);
Member

Existing -- shouldn't we free the old entry->_nest_host_error?

Also, there's a related memory leak here:

// Add entry to resolution error table to record the error when the first
// attempt to resolve a reference to a class has failed.
void SystemDictionary::add_resolution_error(const constantPoolHandle& pool, int which,
                                            Symbol* error, const char* message,
                                            Symbol* cause, const char* cause_msg) {
  {
    MutexLocker ml(Thread::current(), SystemDictionary_lock);
    ResolutionErrorEntry* entry = ResolutionErrorTable::find_entry(pool, which);
    if (entry == nullptr) {
      ResolutionErrorTable::add_entry(pool, which, error, message, cause, cause_msg);
    } else {
        // message and cause_msg are leaked <<<<<<<<<<
    }
  }
}


Contributor

In the SystemDictionary case, we're fine. You wouldn't think so, but we are. That's because message and cause_msg are resource allocated, and those strings are strdup:ed in the constructor of the table entry. InstanceKlass::nest_host has a memory leak though, because in that path ResolutionErrorEntry does take ownership of the underlying string pointer, so we have this:

      const char* msg = ss.as_string(true /* on C-heap */);
      constantPoolHandle cph(THREAD, constants());
      SystemDictionary::add_nest_host_error(cph, _nest_host_index, msg);
      // ... down the callstack we go, reaching the constructor call:
       ResolutionErrorEntry *entry = new ResolutionErrorEntry(message);
       ResolutionErrorEntry(const char* message):
        _error(nullptr),
        _message(nullptr),
        _cause(nullptr),
        _cause_msg(nullptr),

        _nest_host_error(message) {} // <-- Noooo

As opposed to the other constructor, which looks like this:

// This is the call to the constructor this time:
  ResolutionErrorEntry *entry = new ResolutionErrorEntry(error, message, cause, cause_msg);
ResolutionErrorEntry::ResolutionErrorEntry(Symbol* error, const char* message,
                                           Symbol* cause, const char* cause_msg):
        _error(error),
        _message(message != nullptr ? os::strdup(message) : nullptr),
        _cause(cause),
        _cause_msg(cause_msg != nullptr ? os::strdup(cause_msg) : nullptr),
        _nest_host_error(nullptr) {

  Symbol::maybe_increment_refcount(_error);
  Symbol::maybe_increment_refcount(_cause);
}

This is actually pretty bad :-/, I'd really appreciate it if we could make these types of bugs a bit more shallow at the time of writing them.
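
To make the difference between the two constructors concrete, here is a stripped-down sketch of the two conventions. `CopyingEntry`, `AdoptingEntry` and `dup_cstr` are hypothetical names, and plain `malloc`/`free` stand in for the C-heap allocation calls; none of this is the actual ResolutionErrorEntry code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Returns a malloc'ed copy of s that the caller must free.
char* dup_cstr(const char* s) {
  assert(s != nullptr);
  std::size_t n = std::strlen(s) + 1;
  char* copy = static_cast<char*>(std::malloc(n));
  if (copy != nullptr) std::memcpy(copy, s, n);
  return copy;
}

// Convention 1: the entry copies the string. The caller keeps ownership of
// its argument, so passing a temporary or resource-allocated buffer is safe.
class CopyingEntry {
  char* _message;
 public:
  explicit CopyingEntry(const char* message)
      : _message(message != nullptr ? dup_cstr(message) : nullptr) {}
  ~CopyingEntry() { std::free(_message); }
  CopyingEntry(const CopyingEntry&) = delete;
  CopyingEntry& operator=(const CopyingEntry&) = delete;
  const char* message() const { return _message; }
};

// Convention 2: the entry adopts the pointer. The caller must pass a
// malloc'ed string and must free it itself on every path where no entry
// ends up being constructed, otherwise the string leaks.
class AdoptingEntry {
  char* _message;
 public:
  explicit AdoptingEntry(char* heap_message) : _message(heap_message) {}
  ~AdoptingEntry() { std::free(_message); }
  AdoptingEntry(const AdoptingEntry&) = delete;
  AdoptingEntry& operator=(const AdoptingEntry&) = delete;
  const char* message() const { return _message; }
};
```

Nothing in the two constructor signatures as quoted above tells the caller which convention applies, which is the readability problem being raised here.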

Maybe it'd be nice to have a type that tells the reader that an object doesn't intend to free a received pointer on its destruction? This is a very small sketch of something illustrating kind of what I mean:

template<typename T>
using Borrow = T*;
template<typename T>
using Own = T*;

// "I'll take a string, but I don't intend to be responsible for freeing it"
const char* os::strdup(Borrow<const char>, MemTag) { /* ... */}

class SystemDictionary {
  Own<const char> _message; // I own this, and so I intend to free it when I'm destroyed
  Own<const char> _cause_msg; // Same here

  // "I'll take a message and a cause_msg, and I won't be responsible for freeing it"
  void SystemDictionary::add_resolution_error(const constantPoolHandle& pool, int which,
                                            Symbol* error, Borrow<const char> message,
                                            Symbol* cause, Borrow<const char> cause_msg) : 
  // Reader meant to think: Wait, we're assigning a Borrow to an Own directly? Seems wrong.
    _message(message),
// Reader meant to think: Aah, we're making a copy to get ownership
   _cause_msg(os::strdup(cause_msg)) 
{
  /* ... */
}
};

This won't give us any compiler errors in case of incorrect usage, but it will signal to the reader that SystemDictionary doesn't intend to clean up message or cause_msg, and that the writer actually thought about the possibility of a leak from these strings.

I'm not suggesting this is what we add, I'm just saying that clearly we can communicate more in the code than we currently do.
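
For anyone who wants to experiment with the idea, here is a variant of the sketch that actually compiles. It is only an illustration under my own assumptions: `ErrorRecord` and `dup_string` are hypothetical, `malloc`/`free` stand in for the VM's allocation calls, and the aliases remain pure documentation with no compiler enforcement.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

template <typename T>
using Borrow = T*;  // "I will not free this"
template <typename T>
using Own = T*;     // "I am responsible for freeing this"

// Takes a borrowed string and returns a malloc'ed copy owned by the caller.
Own<char> dup_string(Borrow<const char> s) {
  assert(s != nullptr);
  std::size_t n = std::strlen(s) + 1;
  Own<char> copy = static_cast<char*>(std::malloc(n));
  if (copy != nullptr) std::memcpy(copy, s, n);
  return copy;
}

// Hypothetical record type: borrows its constructor arguments and owns copies.
class ErrorRecord {
  Own<char> _message;    // freed in the destructor
  Own<char> _cause_msg;  // freed in the destructor
 public:
  ErrorRecord(Borrow<const char> message, Borrow<const char> cause_msg)
      : _message(dup_string(message)),      // copy made to obtain ownership
        _cause_msg(dup_string(cause_msg)) {}
  ~ErrorRecord() {
    std::free(_message);
    std::free(_cause_msg);
  }
  ErrorRecord(const ErrorRecord&) = delete;
  ErrorRecord& operator=(const ErrorRecord&) = delete;

  // Accessors hand out borrowed views; callers must not free them.
  Borrow<const char> message() const { return _message; }
  Borrow<const char> cause_msg() const { return _cause_msg; }
};
```

Because `Borrow<T>` and `Own<T>` are both plain aliases for `T*`, assigning one to the other still compiles; a wrapper type would be needed to turn the convention into a checked rule.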

Contributor

I made a ticket for this: 8372373

if (entry == nullptr) {
// Only add a new resolution error if one hasn't been found for this constant pool index. In this case,
// resolution succeeded but there's an error in this nest host.
assert(pool->resolved_klass_at(which) != nullptr, "klass is should be resolved if there is no entry");
Member

typo: "klass should be resolved ..."

Labels

hotspot-runtime
rfr (Pull request is ready for review)

Development

Successfully merging this pull request may close these issues.

4 participants