[Modules] Record whether VarDecl initializers contain side effects #143739

Open. hnrklssn wants to merge 1 commit into main from remove-deserialize-assert-upstream.

hnrklssn
Member

This assert was reportedly added to "Defensively ensure that GetExternalDeclStmt protects itself from nested deserialization". In the tests for swiftlang/swift#81859 I was able to trigger this assert without nested deserialization. In FinishedDeserializing we have this:

void ASTReader::FinishedDeserializing() {
  assert(NumCurrentElementsDeserializing &&
         "FinishedDeserializing not paired with StartedDeserializing");
  if (NumCurrentElementsDeserializing == 1) {
    // We decrease NumCurrentElementsDeserializing only after pending actions
    // are finished, to avoid recursively re-calling finishPendingActions().
    finishPendingActions();
  }
  --NumCurrentElementsDeserializing;

where NumCurrentElementsDeserializing is clearly 1 when calling finishPendingActions. Through this we end up in loadDeclUpdateRecords, which has:

    // Check if this decl was interesting to the consumer. If we just loaded
    // the declaration, then we know it was interesting and we skip the call
    // to isConsumerInterestedIn because it is unsafe to call in the
    // current ASTReader state.
    bool WasInteresting = Record.JustLoaded || isConsumerInterestedIn(D);

In this case Record.JustLoaded is false, so we end up calling isConsumerInterestedIn.
In isConsumerInterestedIn we have:

  // An ImportDecl or VarDecl imported from a module map module will get
  // emitted when we import the relevant module.
  if (isPartOfPerModuleInitializer(D)) {
    auto *M = D->getImportedOwningModule();
    if (M && M->isModuleMapModule() &&
        getContext().DeclMustBeEmitted(D))
      return false;
  }

In DeclMustBeEmitted we have:

  // Variables that have initialization with side-effects are required.
  if (VD->getInit() && VD->getInit()->HasSideEffects(*this) &&
      // We can get a value-dependent initializer during error recovery.
      (VD->getInit()->isValueDependent() || !VD->evaluateValue()))
    return true;

In VarDecl::getInit we have:

  auto *Eval = getEvaluatedStmt();

  return cast<Expr>(Eval->Value.get(
      Eval->Value.isOffset() ? getASTContext().getExternalSource() : nullptr));

which ends up calling ASTReader::GetExternalDeclStmt.

At first I considered whether bool WasInteresting = Record.JustLoaded || isConsumerInterestedIn(D); should guard against calling isConsumerInterestedIn in even more cases, but I didn't know what that would be. Instead I tried removing the assert, and my test passed, so I assume calling isConsumerInterestedIn was safe in this case.
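The call chain described above can be illustrated with a minimal, self-contained sketch. It is not clang's code: `MiniReader`, `loadLazyStmt`, and `addPendingAction` are hypothetical stand-ins, and only the counter logic is modeled on the `FinishedDeserializing` excerpt. It shows why a lazy load reached from a pending action sees a counter of 1 even though no nested deserialization was started by the caller.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reader's bookkeeping; not clang's ASTReader.
class MiniReader {
public:
  void started() { ++NumCurrentElementsDeserializing; }

  // Same shape as the FinishedDeserializing excerpt: pending actions run
  // while the counter is still 1, and only then is it decremented.
  void finished() {
    assert(NumCurrentElementsDeserializing &&
           "finished() not paired with started()");
    if (NumCurrentElementsDeserializing == 1)
      finishPendingActions();
    --NumCurrentElementsDeserializing;
  }

  // Stand-in for GetExternalDeclStmt. Instead of asserting, it records and
  // returns whether the "not already deserializing" precondition held.
  bool loadLazyStmt() {
    bool PreconditionHeld = (NumCurrentElementsDeserializing == 0);
    PreconditionChecks.push_back(PreconditionHeld);
    return PreconditionHeld;
  }

  void addPendingAction(std::function<void()> Action) {
    Pending.push_back(std::move(Action));
  }

  std::vector<bool> PreconditionChecks;

private:
  void finishPendingActions() {
    // Take the queue first so an action could enqueue more work safely.
    auto Actions = std::move(Pending);
    Pending.clear();
    for (auto &Action : Actions)
      Action();
  }

  unsigned NumCurrentElementsDeserializing = 0;
  std::vector<std::function<void()>> Pending;
};

// Returns whether the precondition held when the lazy load was reached
// from a pending action (the situation described in this PR).
bool preconditionHeldDuringPendingAction() {
  MiniReader R;
  R.addPendingAction([&R] { R.loadLazyStmt(); });
  R.started();
  R.finished();
  return R.PreconditionChecks.at(0);
}
```

In this sketch a direct call to `loadLazyStmt()` on a fresh reader satisfies the precondition, while the same call made from a pending action does not, which matches the assert failure reported above.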

rdar://153085264

@llvmbot added the clang and clang:modules labels on Jun 11, 2025
@llvmbot
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Henrik G. Olsson (hnrklssn)

Full diff: https://github.com/llvm/llvm-project/pull/143739.diff

1 Files Affected:

  • (modified) clang/lib/Serialization/ASTReader.cpp (-2)
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 70b54b7296882..aaa64b06e1cee 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -8310,8 +8310,6 @@ Stmt *ASTReader::GetExternalDeclStmt(uint64_t Offset) {
     Error(std::move(Err));
     return nullptr;
   }
-  assert(NumCurrentElementsDeserializing == 0 &&
-         "should not be called while already deserializing");
   Deserializing D(this);
   return ReadStmtFromStream(*Loc.F);
 }

@ChuanqiXu9
Member

I think we shouldn't remove the assertion. Your test passes with the removal of the assertion since the initializers are not complex. So it ends quickly. But if it is a complex initialization which triggers more deserialization, I feel it will be problematic.

I think the point is in DeclMustBeEmitted, this should be a "pure" method but it triggers deserialization.

I think, the proper solution may be:

  1. When we write a VarDecl, use a bit to record whether the var decl has an initialization with side effects.
  2. When we read a var decl with the above information, let's record it in a set in ASTReader.
  3. When we decide if a VarDecl needs to be emitted in DeclMustBeEmitted, let's look it up in the above set.
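A rough sketch of these three steps, under simplifying assumptions: the record layout, `ToyVarDecl`, `ToyReader`, and `writeVarDecl` are all hypothetical and do not reflect clang's actual serialization format. The point is only that the side-effect flag travels with the declaration record, so the reader can answer the query without touching the initializer.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified declaration: an ID plus the one fact
// that the writer already knows about the initializer.
struct ToyVarDecl {
  std::uint64_t ID;
  bool InitHasSideEffects;
};

// Step 1: when writing, store the flag as one bit in the record.
std::vector<std::uint64_t> writeVarDecl(const ToyVarDecl &VD) {
  std::uint64_t Flag = VD.InitHasSideEffects ? 1 : 0;
  return {VD.ID, Flag};
}

class ToyReader {
public:
  // Step 2: when reading the declaration record, remember flagged
  // declarations in a set. The initializer itself is never read here.
  void readVarDecl(const std::vector<std::uint64_t> &Record) {
    assert(Record.size() == 2 && "unexpected toy record layout");
    if (Record[1] & 1)
      InitSideEffectVars.insert(Record[0]);
  }

  // Step 3: the emission decision looks the declaration up in the set.
  bool hasInitializerWithSideEffects(std::uint64_t ID) const {
    return InitSideEffectVars.count(ID) != 0;
  }

private:
  std::unordered_set<std::uint64_t> InitSideEffectVars;
};
```

A set is assumed here because, as noted in the discussion, most declarations are not expected to need the flag; only flagged ones take up space in the reader.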

@hnrklssn
Member Author

Thanks for providing context! I'm not very familiar with this part of clang. I have a few clarifying questions: DeclMustBeEmitted is part of ASTContext rather than ASTReader, so unless I missed something it wouldn't have access to this set. We could check the "has initializer with side effect" set in ASTReader before the call to DeclMustBeEmitted, but unless we removed that piece of code from DeclMustBeEmitted it would still result in the initializer being deserialized, right? There are other callers that would not necessarily have an ASTReader in context, so I'm not sure how to go about that.

Actually, looking at the callers, one of them is isRequiredDecl, which is used to determine whether decls go into the EagerlyDeserializedDecls record. I'm not fully clear on how eagerly they are deserialized, but if the initializer is also eagerly deserialized, then maybe the ASTReader could skip calling DeclMustBeEmitted for VarDecls that aren't fully deserialized, since they would have been in the EagerlyDeserializedDecls record if DeclMustBeEmitted returned true.

Another option would be to set some bits in VarDecl::NonParmVarDeclBits rather than creating a set in ASTReader.
WDYT?

@ChuanqiXu9
Member

> Thanks for providing context! I'm not very familiar with this part of clang. I have a few clarifying questions: DeclMustBeEmitted is part of ASTContext rather than ASTReader, so unless I missed something it wouldn't have access to this set. We could check the "has initializer with side effect" set in ASTReader before the call to DeclMustBeEmitted, but unless we removed that piece of code from DeclMustBeEmitted it would still result in the initializer being deserialized, right? There are other callers that would not necessarily have an ASTReader in context, so I'm not sure how to go about that.
>
> Actually, looking at the callers, one of them is isRequiredDecl, which is used to determine whether decls go into the EagerlyDeserializedDecls record. I'm not fully clear on how eagerly they are deserialized, but if the initializer is also eagerly deserialized, then maybe the ASTReader could skip calling DeclMustBeEmitted for VarDecls that aren't fully deserialized, since they would have been in the EagerlyDeserializedDecls record if DeclMustBeEmitted returned true.

Semantically it is better to do this in DeclMustBeEmitted since this is what we're deciding.

We can relate ASTContext and ASTReader by ExternalASTSource generally. We can add an interface in ExternalASTSource and call it in ASTContext and implement it in ASTReader.
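As a sketch of that layering, assuming invented names throughout: `ToyExternalSource` stands in for ExternalASTSource, `ToyReader` for ASTReader, `ToyContext` for ASTContext, and the method name `hasInitializerWithSideEffects` is hypothetical. The emission check here is also reduced to the side-effect question alone, which the real DeclMustBeEmitted is not.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical declaration: either the initializer is already in memory,
// or it still lives in the serialized module.
struct ToyVarDecl {
  bool InitLoaded = false;
  bool LoadedInitHasSideEffects = false;
};

// Stand-in for the external-source interface the context already knows.
class ToyExternalSource {
public:
  virtual ~ToyExternalSource() = default;
  // Default for sources that carry no such information.
  virtual bool hasInitializerWithSideEffects(const ToyVarDecl *) const {
    return false;
  }
};

// Stand-in for the reader: answers from data recorded while reading.
class ToyReader : public ToyExternalSource {
public:
  void noteSideEffects(const ToyVarDecl *VD) { InitSideEffectVars.insert(VD); }
  bool hasInitializerWithSideEffects(const ToyVarDecl *VD) const override {
    return InitSideEffectVars.count(VD) != 0;
  }

private:
  std::unordered_set<const ToyVarDecl *> InitSideEffectVars;
};

// Stand-in for the context: it never sees the reader type, only the
// interface, so callers without a reader keep working.
class ToyContext {
public:
  explicit ToyContext(const ToyExternalSource *Source = nullptr)
      : Source(Source) {}

  bool declMustBeEmitted(const ToyVarDecl *VD) const {
    assert(VD && "expected a declaration");
    if (VD->InitLoaded)
      return VD->LoadedInitHasSideEffects; // inspect the in-memory initializer
    // Initializer not loaded: ask the source instead of deserializing it.
    return Source && Source->hasInitializerWithSideEffects(VD);
  }

private:
  const ToyExternalSource *Source;
};
```

With this shape the decision stays in the context, as suggested, while the knowledge stays in the reader; a context constructed without a source simply gets the default answer.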

> Another option would be to set some bits in VarDecl::NonParmVarDeclBits rather than creating a set in ASTReader. WDYT?

Generally we don't do this, since it will add additional cost for non-module users. Although it is cheaper and simpler for module users, generally we prefer to give non-module users higher precedence.

@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch from f5b43df to d239fee Compare June 16, 2025 14:24
@llvmbot added the clang:frontend label on Jun 16, 2025
@hnrklssn changed the title from "[ASTReader] Remove assert in GetExternalDeclStmt" to "[Modules] Record whether VarDecl initializers contain side effects" on Jun 16, 2025
github-actions bot commented Jun 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch from d239fee to a8042b8 Compare June 16, 2025 14:44
@hnrklssn
Member Author

@ChuanqiXu9 I pushed a new fix based on your feedback. Please let me know what you think. I don't really know how to test this, so if you think it needs testing I'm open to suggestions.

@hnrklssn hnrklssn requested a review from benlangmuir June 16, 2025 14:47
@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch from a8042b8 to 44e965e Compare June 16, 2025 14:48
@hnrklssn hnrklssn requested a review from Bigcheese June 16, 2025 15:45
@cyndyishida cyndyishida requested a review from vsapsai June 16, 2025 16:00
Collaborator

@benlangmuir benlangmuir left a comment

I think this new function should be called from ASTContext::DeclMustBeEmitted, no?

@@ -1442,6 +1442,10 @@ class ASTReader
const StringRef &operator*() && = delete;
};

/// VarDecls with initializers containing side effects must be emitted,
/// but DeclMustBeEmitted is not allowed to deserialize the initializer.
llvm::SmallPtrSet<Decl *, 16> InitSideEffectVars;
Collaborator

Not a big deal, but do we have any kind of data informing us how many of these we typically see?

Member Author

No idea!

@ChuanqiXu9
Member

The direction meets my expectation. I think you already have an existing test for swift. Maybe you can try to reduce a lit test from it.

@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch from 44e965e to 8658e3e Compare June 17, 2025 14:13
@hnrklssn
Member Author

> The direction meets my expectation. I think you already have an existing test for swift. Maybe you can try to reduce a lit test from it.

Yeah, I've tried reducing a lit test, but my reduced case doesn't trigger the original assert because the initializer is already deserialized. I'm trying to figure out what the difference is.

@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch 2 times, most recently from 2c488b2 to 89179c0 Compare June 18, 2025 15:10
@hnrklssn
Member Author

I've now managed to create a reduced test case that triggers the original assert but no longer does with the current patch.

@hnrklssn
Member Author

> I think this new function should be called from ASTContext::DeclMustBeEmitted, no?

Of course, yes! Fixed now.

@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch from 89179c0 to 3f29c7c Compare June 19, 2025 13:26
Calling `DeclMustBeEmitted` should not lead to more deserialization, as
it may occur before previous deserialization has finished.
When passed a `VarDecl` with an initializer however, `DeclMustBeEmitted`
needs to know whether that initializer contains side effects. When the
`VarDecl` is deserialized but the initializer is not, this triggers
deserialization of the initializer. To avoid this we add a bit to the
serialization format for `VarDecl`s, indicating whether its initializer
contains side effects or not, so that the `ASTReader` can query this
information directly without deserializing the initializer.

rdar://153085264
@hnrklssn hnrklssn force-pushed the remove-deserialize-assert-upstream branch from 3f29c7c to eacaac0 Compare June 19, 2025 13:30
Labels
clang:frontend (Language frontend issues, e.g. anything involving "Sema"), clang:modules (C++20 modules and Clang Header Modules), clang (Clang issues not falling into any other category)