Skip to content

Conversation

@ggiraldez
Copy link
Contributor

@ggiraldez ggiraldez commented Dec 3, 2025

This PR defines an AST API as a façade over the last IR language currently available (ir2_flat_contracts) that allows opaque access to the IR nodes and the semantic information computed in the frontend passes.

  • From the SemanticAnalysis object that is obtainable from the CompilationUnit it's possible to retrieve the root of the AST of each file in the unit get_file_ast_root(), which returns ast::SourceUnit.
  • Navigation from the SourceUnit down through the intermediate AST nodes, down to the terminal nodes.
  • Non-terminal nodes are wrapped in objects which contain a reference to the IR node and the SemanticAnalysis.
  • Terminal nodes are not wrapped by default except for Identifier nodes, which allow attempting to resolve to a Definition object.
  • A Visitor trait can be implemented and used via accept_source_unit (and related functions) to traverse the wrapped AST.
  • Non-terminal nodes define methods to access the child nodes.
  • Certain non-terminal nodes also define other methods to access other semantic information (eg. SourceUnit defines contracts() to retrieve the list of contracts defined in the file)

@changeset-bot
Copy link

changeset-bot bot commented Dec 3, 2025

⚠️ No Changeset found

Latest commit: b039fbe

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@ggiraldez ggiraldez marked this pull request as ready for review December 17, 2025 23:27
@ggiraldez ggiraldez requested review from a team as code owners December 17, 2025 23:27
define_variant!(YulVariable, Identifier);
}

include!("ast_nodes.generated.rs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused by the structure/imports until I found this include!() here.
I suggest using the same pattern in other IRs, by renaming ast_nodes to nodes, and the new nodes here to node_extensions. In that case, ast/mod.rs can simply pub use nodes::* and pub use node_extensions::* to achieve the same result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest splitting the new types/extensions into type-specific modules under ast/node_extensions/*, as I expect them to only grow in size, and it would make reviewing them much easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I thought it was not possible to extend the implementations (defining new impl blocks) for types in a different module, but I was mistaken. I'll change the structure of this code.

}

impl IdentifierStruct {
fn create(ir_node: &Rc<TerminalNode>, semantic: &Rc<SemanticAnalysis>) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be pub(crate)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an oversight.

.unwrap()
}

define_variant!(Constant);
Copy link
Contributor

@OmarTawfik OmarTawfik Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the extra SemanticAnalysis field for every def in the compilation, this struct/define_variant!() pattern makes it harder to match on the type, resorting to the added is_/as_ extensions.

I wonder if we should make it a proper enum instead? Each *Definition type contains a SemanticAnalysis object already.

enum Definition {
    Contract(ContractDefinition),
    Enum(EnumDefinition),
    // etc...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest adding a sync marker between the variants of this Definition and binder::Definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try changing this to an enum. Seeing the code in perspective it makes perfect sense.

}
}

fn internal_definition(&self) -> &binder::Definition {
Copy link
Contributor

@OmarTawfik OmarTawfik Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this PR adds definitions, but not references.
Is this planned for a future PR?

Copy link
Contributor Author

@ggiraldez ggiraldez Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References would need to be added to the the Definition (or each *Definition AST node). It's not added yet because we don't have the inverse mapping computed yet on the SemanticAnalysis object, but that's easy to do. I think that can go into this PR. I'll add that.

let binder::Definition::Contract(contract) = definition else {
return None;
};
if definition.identifier().unparse() == name {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking this PR:

I notice in many places in the backend we are using unparse() (which clones the string value) just to get a reference to check/compare. I wonder if we should use/add a fn TerminalNode.value() -> &str instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add on this, we should intern all of these strings so these comparisons are just over ids (numbers).

Copy link
Contributor Author

@ggiraldez ggiraldez Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of interning, which will also further improve performance of the binder, as there's a big chunk of time spent in hashing identifiers for lookup. This should be done in the parser though.

@@ -0,0 +1,6549 @@
// This file is generated automatically by infrastructure scripts. Please don't edit by hand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we measured the impact of adding this additional Rc<SemanticAnalysis> to every single node in the tree?
I like the new API, and that people don't have to see/interact with the SemanticAnalysis object at all, but I want to make sure that the increase in memory/allocation time is acceptable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an issue for me right now is that we're cloning everything by default, so if a user wants to traverse the AST they will create a huge amount of (probably short lived) Rcs to it. The API looks almost like a typed cursor, so I wonder if maybe the methods should consume self rather than borrow it:

    pub fn members(self) -> SourceUnitMembersStruct {
        SourceUnitMembersStruct::create(
            &self.ir_node.members,
            self.semantic,
        ))
    }

This way, if a user actually needs to keep the reference they can cheaply clone it (we should #[derive(Clone)])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, whenever we start looking into allocators we'll probably bring in lifetimes in, so a lot of this Rc<SemanticAnalysis> should become &'arena SemanticAnalysis

Copy link
Contributor Author

@ggiraldez ggiraldez Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding a Rc<SemanticAnalysis> to each IR node be even possible? I think there is a circular dependency there, and the borrow checker will not like that. Nevermind this, I read the opening comment wrong.

Re: the use of Rc, maybe we could be using value structs directly for Rust code, but once we cross into Wasm territory we need to keep track of the memory somehow, and our wrappers will use Rcs anyway. The AST is a façade/cursor API so maybe it's not that impactful?

let unit = build_compilation_unit()?;
let semantic_analysis = unit.semantic_analysis();
assert_eq!(2, semantic_analysis.files().len());
assert_eq!(3, semantic_analysis.files().len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this test to sample.rs as well, eliminating this module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions/suggestions. Thanks!

Copy link
Contributor

@teofr teofr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments and questions

@@ -0,0 +1,6549 @@
// This file is generated automatically by infrastructure scripts. Please don't edit by hand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an issue for me right now is that we're cloning everything by default, so if a user wants to traverse the AST they will create a huge amount of (probably short lived) Rcs to it. The API looks almost like a typed cursor, so I wonder if maybe the methods should consume self rather than borrow it:

    pub fn members(self) -> SourceUnitMembersStruct {
        SourceUnitMembersStruct::create(
            &self.ir_node.members,
            self.semantic,
        ))
    }

This way, if a user actually needs to keep the reference they can cheaply clone it (we should #[derive(Clone)])

@@ -0,0 +1,6549 @@
// This file is generated automatically by infrastructure scripts. Please don't edit by hand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, whenever we start looking into allocators we'll probably bring in lifetimes in, so a lot of this Rc<SemanticAnalysis> should become &'arena SemanticAnalysis

let binder::Definition::Contract(contract) = definition else {
return None;
};
if definition.identifier().unparse() == name {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add on this, we should intern all of these strings so these comparisons are just over ids (numbers).

@OmarTawfik OmarTawfik self-requested a review December 20, 2025 07:27
Also constructs the this (reverse) mapping after resolving all references.
@ggiraldez ggiraldez requested a review from teofr December 30, 2025 18:49
Copy link
Contributor

@teofr teofr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a small comment, but LGTM

definitions.insert(*node_id, vec![*reference_id]);
}
}
Resolution::Ambiguous(_) | Resolution::BuiltIn(_) | Resolution::Unresolved => {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small note, we may be interested in computing references for BuiltIns as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants