Implement AST annotations #256

Open
edsko opened this issue Nov 6, 2024 · 5 comments

@edsko
Collaborator

edsko commented Nov 6, 2024

We should be able to handle annotations in the AST, such as source spans, additional information for structs (offsets, field widths, etc., as provided by Clang), or type information (for type inference of macros). Perhaps the following rather simple approach would be sufficient:

data Pass = ...

type Annot :: Symbol -> Pass -> Type
type family Annot con p -- open type family

type Expr :: Pass -> Type
data Expr p
  = Con1 ( Annot "Con1" p ) A B
  | Con2 ( Annot "Con2" p ) C
  ... -- NB: no extension constructor
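
To make the shape of this proposal concrete, here is a minimal compilable sketch. The pass names (`Parsed`, `Checked`), the `Span` payload, and the constructors `Lit` and `Neg` are assumptions made up for illustration; none of them come from hs-bindgen.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Hypothetical passes; the issue leaves the real list open.
data Pass = Parsed | Checked

-- Open type family: one instance per (constructor name, pass) pair.
type Annot :: Symbol -> Pass -> Type
type family Annot con p

-- Hypothetical annotation payload: a source position (line, column).
data Span = Span Int Int
  deriving (Eq, Show)

-- Toy expression type standing in for the real AST.
type Expr :: Pass -> Type
data Expr p
  = Lit (Annot "Lit" p) Int
  | Neg (Annot "Neg" p) (Expr p)

type instance Annot "Lit" 'Parsed  = ()
type instance Annot "Neg" 'Parsed  = ()
type instance Annot "Lit" 'Checked = Span
type instance Annot "Neg" 'Checked = Span

-- A pass is an ordinary function between two indexings of the AST.
-- This toy pass attaches the same span to every node.
annotate :: Span -> Expr 'Parsed -> Expr 'Checked
annotate s (Lit () n) = Lit s n
annotate s (Neg () e) = Neg s (annotate s e)

-- Consumers that ignore annotations work at any pass.
eval :: Expr p -> Int
eval (Lit _ n) = n
eval (Neg _ e) = negate (eval e)
```

Loaded in GHCi, `eval (annotate (Span 1 1) (Neg () (Lit () 3)))` should give the same result as evaluating the unannotated tree, since `eval` never inspects the annotation field.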
@edsko edsko added this to the 1: `Storable` instances milestone Nov 6, 2024
@edsko
Collaborator Author

edsko commented Nov 6, 2024

Alternatives would be to have a different type family for each constructor, or to use an open sum type approach, e.g.

data ExprCon = Con1 | Con2

type Content :: Pass -> k -> Type
data family Content p con
data    instance Content P Con1 = MkCon1 A B
newtype instance Content P Con2 = MkCon2 C

data WithAnnot p con = Annot { annotation :: !( Annot p con ), content :: !( Content p con ) }

type Expr p = VariantF ( WithAnnot p ) '[ Con1, Con2 ]

but that seems a bit too heavy-weight.
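
For comparison, the first alternative mentioned above (a separate type family per constructor) could look roughly like the sketch below. The family names `AnnotCon1` and `AnnotCon2`, the single `Parsed` pass, and the use of `Int`, `Bool` and `String` in place of the placeholder payloads `A`, `B` and `C` are all assumptions for illustration.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Hypothetical single pass; the real list of passes is still open.
data Pass = Parsed

-- One open type family per constructor instead of one Symbol-indexed family.
type AnnotCon1 :: Pass -> Type
type family AnnotCon1 p

type AnnotCon2 :: Pass -> Type
type family AnnotCon2 p

-- Int, Bool and String stand in for the placeholder payloads A, B and C.
type Expr :: Pass -> Type
data Expr p
  = Con1 (AnnotCon1 p) Int Bool
  | Con2 (AnnotCon2 p) String

type instance AnnotCon1 'Parsed = ()
type instance AnnotCon2 'Parsed = ()

-- Consumers that ignore annotations work at any pass.
conName :: Expr p -> String
conName (Con1 _ _ _) = "Con1"
conName (Con2 _ _)   = "Con2"
```

The trade-off, as far as I can tell: a misspelled family name is a scope error at the declaration, whereas a misspelled `Symbol` in the single-family design only leaves a stuck type family application that is reported where the annotation is used. The cost is one extra declaration per constructor.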

@TravisCardwell
Collaborator

I am working on implementing the design in the first comment, using a symbol-indexed open type family with passes defined using a sum type.

I do not yet have a clear view of what passes we might have. To start with, I just defined a single pass, tentatively called Parsed.

We need to decide what parts of the AST we would like to annotate. Here is what I have so far:

  • Newtype
  • NewtypeField
  • Struct
  • StructField

Should other parts of the AST be annotated?

In this initial implementation, I just set all annotations to (). The code is changed with minimal modifications to get it to compile, and we can format it nicely when adding actual annotations.

Regarding module organization, the annotation types (just type family instances since everything is currently ()) are currently all defined in HsBindgen.Hs.AST. I imagine that we may want to use multiple modules when the implementation increases in size.

I have not updated the tests yet. A number of them fail because the AST is now pretty-printed with annotations.
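
As a rough illustration of the state described above, the sketch below uses heavily simplified stand-ins for `Struct` and `StructField` (the real definitions in `HsBindgen.Hs.AST` carry more fields), a single `Parsed` pass, and `()` for every annotation. The field names are assumptions. It also shows why derived `Show` output changes: the annotation field becomes part of the rendered value.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Single pass, tentatively called Parsed as in the comment above.
data Pass = Parsed

type Annot :: Symbol -> Pass -> Type
type family Annot con p

-- Simplified stand-ins for the real AST types (field names are assumptions).
data StructField p = StructField
  { fieldAnn  :: Annot "StructField" p
  , fieldName :: String
  }

data Struct p = Struct
  { structAnn    :: Annot "Struct" p
  , structName   :: String
  , structFields :: [StructField p]
  }

-- All annotations are () for now.
type instance Annot "Struct"      'Parsed = ()
type instance Annot "StructField" 'Parsed = ()

-- Show cannot be derived in a plain deriving clause because the field
-- types mention a type family; the required constraints are spelled out.
deriving instance Show (Annot "StructField" p) => Show (StructField p)
deriving instance
  (Show (Annot "Struct" p), Show (Annot "StructField" p)) => Show (Struct p)

example :: Struct 'Parsed
example = Struct () "point" [StructField () "x", StructField () "y"]
```

With these definitions, `show example` includes `structAnn = ()` and `fieldAnn = ()`, which is the kind of change that makes golden tests based on pretty-printed ASTs fail until they are regenerated.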

I am pushing the current state to the ast-annotations branch in case anybody wants to look at it. Please do not hesitate to let me know of any corrections or suggestions.

@TravisCardwell
Collaborator

Here is an overview of the data flow:

graph TD
  SRC@{ shape: doc, label: "C Source"}
  LL("Low-level libclang types")
  C("C AST types")
  CIR("C IR types")
  HS("Haskell AST types")
  BC("Backend common types")
  TH("Template Haskell types")
  PP("Preprocessor types")
  DST@{ shape: doc, label: "Haskell Source"}

  SRC-- parsed to     -->LL
  LL--  translated to -->C
  C--   translated to -->CIR
  CIR-- translated to -->HS
  HS--  translated to -->BC
  BC--  translated to -->TH
  BC--  translated to -->PP
  PP--  rendered to   -->DST

  click LL "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen-libclang/src/HsBindgen/Clang/LowLevel/Core.hs" "HsBindgen.Clang.LowLevel.Core"
  click C "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/C/AST.hs" "HsBindgen.C.AST"
  click HS "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Hs/AST.hs" "HsBindgen.Hs.AST"
  click BC "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Backend/Common.hs" "HsBindgen.Backend.Common"
  click TH "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Backend/TH.hs" "HsBindgen.Backend.TH"
  click PP "https://github.com/well-typed/hs-bindgen/blob/main/hs-bindgen/src/HsBindgen/Backend/PP.hs" "HsBindgen.Backend.PP"

Use of a simplified C IR is discussed in #253. From this discussion, my understanding is that the C AST will be transformed to separate C IR types. Perhaps neither C AST nor C IR types need annotations. I imagine that all multi-pass processing will occur on the Haskell AST only. I do not yet have a clear view of what passes we might have, though.

The term "annotations" has a connotation of "extra" information, but I think we should be clear that it is used for including information whose type varies depending on the pass, regardless of whether it is "extra" or not.

  • LINE pragmas may be generated (Include LINE pragmas in generated output? #74). (I imagine that this may be enabled/disabled via configuration.) The required source location information is retrieved from libclang extents, and it will need to be passed from the C AST to the backend types. The type does not vary, so this is probably best included directly in the types, not in annotations.

  • Tool decisions may be output in comments (Explain tool decisions in generated output #23). (I imagine that this may be enabled/disabled via configuration.) I imagine that this may be implemented using a type like [ToolDecision] throughout all the types. This type does not vary, so this is probably best included directly in the types, not in annotations.

  • Documentation must be translated from C/Doxygen to Haskell/Haddock syntax (Include Haddocks for exported (low-level) bindings #26). If this translation is context-free, perhaps the high-level documentation types (defined in HsBindgen.Clang.HighLevel.Documentation) will be passed from the C AST to the backend types. In the preprocessor backend, they can be translated to Haskell documentation strings that include Haddock documentation syntax and are formatted with appropriate indentation and line length. In the Template Haskell backend, they can be translated to Haskell documentation strings that do not include Haddock documentation syntax, to be added in specific locations by module finalizers. With this design, the type does not vary, so it is probably best included directly in the types, not in annotations.

  • Import resolution is done differently in the different backends. The Template Haskell backend references names directly, imported in our backend implementation. The preprocessor backend resolves names using our own code, specifying the module a name is imported from and whether the import should be qualified. Imports can optionally specify aliases, and the type of name (identifier or operator) determines how names are pretty-printed. All of this is specific to the preprocessor backend. I do not think that import resolution should be done earlier, as it would complicate the design and implementation.
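
To illustrate the "include it directly in the types" option from the bullets above, here is a small sketch in which source location and tool decisions are ordinary record fields whose types are the same in every pass. All names (`SourceLoc`, `ToolDecision`, `Field`, `linePragma`, `decisionComments`) are hypothetical and not taken from hs-bindgen.

```haskell
-- Hypothetical source location, as it might be derived from a libclang extent.
data SourceLoc = SourceLoc
  { locFile   :: FilePath
  , locLine   :: Int
  , locColumn :: Int
  }
  deriving (Eq, Show)

-- Hypothetical record of a decision made by the tool, as free-form text.
newtype ToolDecision = ToolDecision String
  deriving (Eq, Show)

-- The extra information lives in plain fields: no type family is involved,
-- because the field types do not depend on the pass.
data Field = Field
  { fieldName      :: String
  , fieldLoc       :: SourceLoc
  , fieldDecisions :: [ToolDecision]
  }
  deriving (Eq, Show)

-- Render a GHC LINE pragma pointing back at the C source.
linePragma :: SourceLoc -> String
linePragma loc =
  "{-# LINE " ++ show (locLine loc) ++ " " ++ show (locFile loc) ++ " #-}"

-- Render tool decisions as Haskell line comments.
decisionComments :: Field -> [String]
decisionComments f = [ "-- " ++ d | ToolDecision d <- fieldDecisions f ]
```

A backend that has LINE pragmas or decision comments disabled in its configuration can simply ignore these fields; nothing about the AST's type changes between passes.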

@phadej
Collaborator

phadej commented Nov 11, 2024

I agree with @TravisCardwell if the TL;DR is that for every need we identified so far the cleaner and simpler solution is "This [extra info] type does not vary, so this is probably best included directly in the [structure] types".

And YAGNI for any not-yet-identified needs.

@edsko
Collaborator Author

edsko commented Nov 12, 2024

I don't mind delaying until we have a concrete use case, but one such use case is the results of type inference from @sheaf's type checker.

TravisCardwell added a commit that referenced this issue Dec 4, 2024
We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.

(Cherry-picked from `source-info` for experimentation)