Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add id dispenser for numerical yul node ids #15838

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

clonker
Copy link
Member

@clonker clonker commented Feb 7, 2025

Depends on #15823

@clonker clonker added the has dependencies The PR depends on other PRs that must be merged first label Feb 7, 2025
Comment on lines 70 to 73
// this can be replaced by the actually used ids in the provided block once the AST uses ids instead of YulString
std::set<NodeId> usedIds = ranges::views::iota(static_cast<size_t>(0), m_mapping.size() + m_offset) | ranges::to<std::set<NodeId>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version where AST contains numerical ids:

std::set<NodeId> usedIds = NameCollector(_root).names();
// add ghosts to usedIds as they're not referenced in the regular ast
for (size_t i = 0; i < m_mapping.size(); ++i)
if (m_mapping[i] == ASTNodeRegistry::ghostId())
usedIds.insert(i + m_offset);

@clonker clonker requested a review from blishko February 7, 2025 11:12
@clonker clonker force-pushed the add_node_id_dispenser branch 2 times, most recently from 10dde18 to 0076c71 Compare February 10, 2025 07:45
@clonker clonker force-pushed the add_yul_name_label_registry branch 4 times, most recently from 5ade700 to 4602bf7 Compare February 10, 2025 15:49
@clonker clonker force-pushed the add_node_id_dispenser branch from 0076c71 to 26ea8ed Compare February 10, 2025 16:42
@clonker clonker force-pushed the add_yul_name_label_registry branch 4 times, most recently from 0fdc32e to 5bc5d98 Compare February 13, 2025 10:53
@clonker clonker force-pushed the add_node_id_dispenser branch from 26ea8ed to 96f8138 Compare February 13, 2025 11:11
@clonker clonker force-pushed the add_yul_name_label_registry branch 2 times, most recently from 837871f to 43a00c6 Compare February 13, 2025 11:22
@clonker clonker force-pushed the add_node_id_dispenser branch from 96f8138 to 1ae159b Compare February 13, 2025 11:26
@clonker clonker force-pushed the add_yul_name_label_registry branch 2 times, most recently from a6ae0c7 to b0f5c6b Compare February 14, 2025 13:11
@clonker clonker force-pushed the add_node_id_dispenser branch from 1ae159b to 37112b5 Compare February 14, 2025 13:47
@clonker clonker force-pushed the add_yul_name_label_registry branch from b0f5c6b to 2d04f51 Compare February 14, 2025 14:18
@clonker clonker force-pushed the add_node_id_dispenser branch from 37112b5 to ada52c0 Compare February 14, 2025 14:30
@clonker clonker force-pushed the add_yul_name_label_registry branch 3 times, most recently from 44cace1 to f330e45 Compare February 19, 2025 12:57
@clonker clonker force-pushed the add_node_id_dispenser branch 2 times, most recently from b514841 to 9f54785 Compare February 20, 2025 08:14
@cameel cameel force-pushed the add_yul_name_label_registry branch from f330e45 to 4d96648 Compare March 1, 2025 00:11
Base automatically changed from add_yul_name_label_registry to develop March 1, 2025 05:29
@clonker clonker force-pushed the add_node_id_dispenser branch from 9f54785 to f50d498 Compare March 4, 2025 09:12
@clonker clonker removed the has dependencies The PR depends on other PRs that must be merged first label Mar 4, 2025
/// Reserved labels, equipped with the transparent less comparison operator to be able to handle string_view.
std::set<std::string, std::less<>> m_reservedLabels;
/// Offset by which LabelIDs must be shifted to be used in the dispenser's `m_idToLabelMapping`
size_t m_offset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to suggest a clearer name for this, but actually I think it would it be much more self-explanatory to just use ASTLabelRegistry::maxID() directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, absolutely. Thanks for the suggestion!

ASTLabelRegistry const& labels() const { return m_labels; }

/// Spawns a new LabelID which depends on a parent LabelID that will be used for its string representation.
LabelID newID(LabelID _parent = 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should say that the parent label must not be an unused one.

Same for resolveParentLabelID().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for generateNewLabels() we should mention that the resulting registry will never contain any unused IDs. They will always have new labels generated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for generateNewLabels() we should mention that the resulting registry will never contain any unused IDs. They will always have new labels generated.

It will have unused IDs. Right now I am not compressing the id range but it's append-only. As ids grow, so does the number of unused ones in the lower ranges. I thought about compressing them to the used ones, which would mean potentially remapping existing IDs in the AST and make generateNewLabels return a fresh AST. We could still do that if we figure it's a performance bottleneck or we waste too much memory. But personally I don't think so.

Comment on lines 76 to 82
bool LabelIDDispenser::ghost(LabelID const _id) const
{
yulAssert(_id < m_idToLabelMapping.size() + m_offset, "ID exceeds bounds.");
if (_id >= m_offset)
return m_idToLabelMapping[_id - m_offset] == ASTLabelRegistry::ghostLabelIndex();

return m_labels.ghost(_id);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function will return false if _id's parent is a ghost. Is that intentional? If not, then we should assert against that possibility in newID(). Note that currently newID() only rejects IDs which are ghosts themselves (due to range assert in resolveParentLabelID()).

newID()'s docstring should also document how it handles ghosts in either form.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in my mind one should only use newGhost to generate new ghosts. Deriving labels from ghosts has the potential to break things in an intricate fashion. Definitely not intended. I'm adding an assert + documentation of this.

}


ASTLabelRegistry LabelIDDispenser::generateNewLabels(Block const&, Dialect const& _dialect) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't use the _block yet, IMO it would be best not add this parameter for now. I'd rather see the places that need to pass it in modified in the same PR that will implement the functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair:) I thought it would be easier this way but if you prefer it without the parameter then I'll remove it for now.


#include <libyul/optimiser/LabelIDDispenser.h>

#include <libyul/optimiser/NameCollector.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class does not really seem to use anything from NameCollector.h. Or did I miss something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't miss anything, it's in the same vein as the block argument: to be used once fully integrated. I'll remove the include for now.

auto const parentLabelID = resolveParentLabelID(id);

auto const originalLabelIndex = m_labels.idToLabelIndex(parentLabelID);
auto const& originalLabel = originalLabels[originalLabelIndex];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a string view, not a string. Is there any point in taking a reference to it?

Suggested change
auto const& originalLabel = originalLabels[originalLabelIndex];
std::string_view const originalLabel = originalLabels[originalLabelIndex];

BTW, this yet another good example of how auto obfuscates things that would have been obvious otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a string :) case in point, though!

auto const& parentLabel = originalLabels[parentLabelIndex];

std::string generatedLabel = parentLabel;
size_t suffix = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small optimization we could do here would be to remember the counter value we stopped at when we last visited the same parent.

Not sure if it's worth it in that form, because it would require us to keep track of counters for all labels from the original registry, but a simplified version would be to just use a single counter for all labels. That could be achieved simply by moving the suffix variable initialization out of the loop. The downside being that numbers used in names would have more digits than they could otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! The memory impact isn't too big, just a bunch of size_ts.

Comment on lines 38 to 41
/// A set of reserved labels may be provided, which is excluded when generating new labels. If a reserved label
/// already appears in the label registry and is used as-is in the AST, it will be reused despite it being
/// provided here.
explicit LabelIDDispenser(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One important thing to mention is that the original labels will be preserved even if they're not valid Yul identifiers.

And for generateNewLabels() I'd add that labels are guaranteed to be valid and not reserved IFF the original registry satisfies this condition. And it's guaranteed that at least no additional invalid/reserved names will be introduced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, noted these things.

Comment on lines 91 to 95
if (usedIDs.empty())
return {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will never be satisfied, because ASTLabelRegistry always contains at least one label (the empty one) and therefore usedIDs will always contain 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another consequence of including 0 in usedIDs is that you'll try to generate a new label for it (you also mark it as reused), which will overwrite idToLabelMap[0]. I think that'll actually trip an assertion in ASTLabelRegistry constructor, because it requires that the label at 0 is "".

On the other hand the "" you put in labels will go unreferenced (unless there are unused IDs in the original registry).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will never be satisfied, because ASTLabelRegistry always contains at least one label (the empty one) and therefore usedIDs will always contain 0.

Right. Although this is a bit temporary and to be replaced with whatever the NameCollector yields on an input block in which it may happen to have an empty usedIDs set. I have removed the condition for now so it can be re-introduced when properly integrating the block input.

m_reservedLabels(_reservedLabels.begin(), _reservedLabels.end()),
m_offset(_labels.maxID() + 1)
{
m_reservedLabels += std::set{""};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even necessary to reserve ""? As far as I can tell, m_reservedLabels just prevents a label from being reused in generateNewLabels(), but you set reusedLabels[0] = true there, so it won't hit the reserved name check anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I don't think it is necessary and having fewer reserved labels is good for performance. :)

@clonker clonker force-pushed the add_node_id_dispenser branch 4 times, most recently from 0d84ba2 to d2d9d5b Compare March 11, 2025 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants