@ghost ghost commented Mar 27, 2025

This PR adds basic support for Typst. It is rudimentary for now and still has many areas to improve. Since I'm still learning LPeg, I'd love to hear any thoughts or suggestions you have!
Typst Reference

Preview from the vis editor: [image]

@ghost ghost marked this pull request as draft March 27, 2025 11:40
@orbitalquark
Owner

Thanks for starting on this! When you think it's ready, request a review and I'll have a look. Also, please give a link to Typst documentation.

@ghost
Author

ghost commented Mar 27, 2025

Thanks, will surely do

I know this is off-topic, but I have a question about a rule I have:

local method = lex:tag("UNKNOWN", '#' * lexer.word * '.') * (lex:tag(lexer.FUNCTION_METHOD, lexer.word) * P('('))
lex:add_rule('func_method', method)

This is supposed to match the following expression:
#dict.len()
It does, but I don't want to tag #dict. as anything for now (not even UNKNOWN). I only want to use it to match the whole expression and tag just len, similar to the non-input-consuming operators (predicates) in LPeg. When I try to use those, the expression no longer matches.

@ghost
Author

ghost commented Mar 28, 2025

This is what I achieved as of now

2025-03-28_17-27-55_focused_window

Some things still don't work correctly because I don't know enough LPeg yet. One example is embedded parsing: notice that the if (in the right pane) inside the #for block is not tagged as a keyword.

Typst's syntax crate (includes lexer, parser...)

@orbitalquark
Owner

> Thanks, will surely do
>
> I know this is off-topic, but I have a question about a rule I have:
>
> local method = lex:tag("UNKNOWN", '#' * lexer.word * '.') * (lex:tag(lexer.FUNCTION_METHOD, lexer.word) * P('('))
> lex:add_rule('func_method', method)
>
> This is supposed to match the following expression: #dict.len(). It does, but I don't want to tag #dict. as anything for now (not even UNKNOWN). I only want to use it to match the whole expression and tag just len, similar to the non-input-consuming operators (predicates) in LPeg. When I try to use those, the expression no longer matches.

You always have to tag matched text with something in order to move on and tag stuff that follows. You can use the lexer.DEFAULT tag for the stuff you don't care about.
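
A minimal sketch of that suggestion, reusing the rule from the question. It assumes the usual lexer setup in which `lex` is the lexer object and `P` is `lpeg.P`; it has not been run against the PR's code, and the trailing look-ahead is an optional variation rather than part of the advice above.

```lua
-- Sketch only: assumes `lexer`, `lex`, and `P` are set up as in the question.
-- Tag the '#dict.' prefix with the neutral DEFAULT tag instead of "UNKNOWN",
-- so the match can move on and only the method name gets FUNCTION_METHOD.
local prefix = lex:tag(lexer.DEFAULT, '#' * lexer.word * '.')
local name = lex:tag(lexer.FUNCTION_METHOD, lexer.word)

-- #P('(') is a look-ahead: it requires '(' to follow but does not consume it,
-- leaving the parenthesis for a later rule (for example an operator rule).
local method = prefix * name * #P('(')
lex:add_rule('func_method', method)
```

The look-ahead is safe here because it sits at the end of the rule: everything the rule actually consumes is still tagged.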

2ta00ha3 added 5 commits March 31, 2025 19:00
@ghost ghost marked this pull request as ready for review March 31, 2025 23:05
@ghost
Author

ghost commented Apr 9, 2025

@orbitalquark, I think the work is ready for a review. I added comments in the code about things I wasn't sure how to do, so I’d love your feedback on those and any other suggestions you have.

Owner

@orbitalquark orbitalquark left a comment

Thanks for your work on this! It looks really promising. As I've noted in a comment below, you've partially solved the "lexer embedded in itself" problem, and I am eager to iterate on it. In the meantime, I've left some comments and suggestions.

This is not a thorough review, so I may have more to say in a subsequent look-over, but it's a start. Thanks again!

EDIT: I realize I haven't addressed any of your inline code comments. Sorry about that. I will do so in a subsequent review. Hopefully there's enough for you to work on until then.

lexers/typst.lua Outdated
@@ -0,0 +1,125 @@
local lexer = require('lexer')
Owner

Modern lexers use local lexer = lexer now.

lexers/typst.lua Outdated
@@ -0,0 +1,125 @@
local lexer = require('lexer')
local token = lexer.token
Owner

Modern lexers don't need this anymore.
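
For reference, a sketch of what the modern preamble tends to look like. The comment rule is a hypothetical placeholder to show `lex:tag()` taking over from the older `lexer.token()` helper; it is not one of the PR's rules.

```lua
-- Sketch only: Scintillua provides `lexer` and `lpeg` in the lexer's
-- environment, so no require() call is needed.
local lexer = lexer
local P, S = lpeg.P, lpeg.S

local lex = lexer.new(...)

-- lex:tag() is used where older lexers called lexer.token().
-- Hypothetical placeholder rule: a '//' line comment.
lex:add_rule('comment', lex:tag(lexer.COMMENT, lexer.to_eol('//')))

return lex
```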

lexers/typst.lua Outdated
Comment on lines 80 to 81
local embed_start = lex:tag('emb_tag', start)
local embed_end = lexer:tag('emb_tag', S('}'))
Owner

I would suggest using lexer.EMBEDDED tag names so they are styled correctly.
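
Something along these lines, as a sketch. `start` is the PR's own start pattern, which is not shown in this thread. Both lines call `lex:tag`; the second line of the diff calls `lexer:tag`, which looks like a typo, though that is an assumption.

```lua
-- Sketch only: `lex`, `S`, and `start` come from the PR's lexer file.
-- Using the predefined EMBEDDED tag lets themes style the delimiters.
local embed_start = lex:tag(lexer.EMBEDDED, start)
local embed_end = lex:tag(lexer.EMBEDDED, S('}'))
```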

lexers/typst.lua Outdated
}
end

local emb_lex = lexer.new('scripting')
Owner

I would use a more descriptive name like 'typst_scripting'.

lexers/typst.lua Outdated
Comment on lines 83 to 100
local function add_rules(lexer_obj, pre)
local rules = build_rules(pre)
lexer_obj:add_rule('header', rules.header)
lexer_obj:add_rule('field', rules.field)
lexer_obj:add_rule('function', rules.mod_func + rules.func)
lexer_obj:add_rule('method', rules.method)
lexer_obj:add_rule('label', rules.label + rules.label_two)
lexer_obj:add_rule('code', lex:tag(lexer.CODE, rules.code))
lexer_obj:add_rule('string', lex:tag(lexer.STRING, rules.string))
lexer_obj:add_rule('link', lex:tag(lexer.LINK, rules.link))
lexer_obj:add_rule('math', lex:tag('environment.math', rules.math))
lexer_obj:add_rule('keyword', rules.keyword)
lexer_obj:add_rule('identifier', rules.iden)
--lexer_obj:add_rule('number', lex:tag(lexer.NUMBER, rules.numeric_value))
lexer_obj:add_rule('list', rules.list)
lexer_obj:add_rule('comment', rules.comment)
lexer_obj:add_rule('operator', rules.operator)
end
Owner

I feel like we can bring the contents of build_rules() into this function. When trying to read this function, I find myself constantly scrolling up to see how each pattern is defined. Making changes would be a bit difficult.

Comment on lines +10 to +11
lex:add_rule('bold', bold)
lex:add_rule('italic', italic)
Owner

Is there a reason to keep these out of build_rules()?

Author

I've tried to keep build_rules generic, so it contains only the rules shared by scripting mode and text mode. Bold and italic must be applied only in text mode. For example:

#let countwords(s) = {
  if s == "" {
    return 0 * 0 * 5
  }
}

In the above example, if we kept bold/italic in build_rules, the lexer would tag the 0 in 0 * 0 * 5 as bold, which is not correct. In scripting mode, plain text must be enclosed in brackets, so italic/bold should be applied only in that case. For example:

#let somefn(word) = {
   let text = [This is a *plain* text];
}
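
A rough sketch of that split. The bold and italic patterns below are hypothetical stand-ins, since the PR's actual definitions are not shown in this thread; `lex` is the text-mode lexer and `emb_lex` the scripting-mode one.

```lua
-- Hypothetical patterns: '*...*' for bold and '_..._' for italic.
local bold = lex:tag(lexer.BOLD, lexer.range('*'))
local italic = lex:tag(lexer.ITALIC, lexer.range('_'))

-- Markup-only rules are added to the text-mode lexer only. emb_lex never
-- receives them, so `0 * 0 * 5` in scripting mode is not tagged as bold.
lex:add_rule('bold', bold)
lex:add_rule('italic', italic)
```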

Comment on lines +105 to +109
add_rules(emb_lex, '')

lex:embed(emb_lex, embed_start, embed_end)

add_rules(lex, '#')
Owner

Okay, I'm completely blown away. You've partially solved the "language embedded in itself" problem. I would really like to use something like

local self = lexer.load('typst', 'typst_scripting')
lex:embed(self, start_rule, end_rule)

You cannot do this now because of an endless loop, but I would totally work on a fix to try and make this viable.



lex:set_word_list(lexer.KEYWORD, {
'if', 'else', 'for', 'while', 'let', 'set', 'import', 'include', 'return',
Owner

@orbitalquark orbitalquark Apr 9, 2025

It's a bit of a nitpick for now, but in my brief testing,

#while n < 10 {
  n = (n * 2) - 1
  (n,)
}

the #while was not highlighting as a keyword.

I just thought I'd make a note.

EDIT: I saw you already noted something similar in an earlier comment, so this is just another instance to report.

@orbitalquark
Owner

orbitalquark commented Apr 9, 2025

Sorry to commit directly to your PR. I was trying to figure out how to show a proof-of-concept for how we can embed this lexer within itself: 1a3f6f9. I thought I'd be on my own branch or something, but I guess not. I'm not very well versed with how git and GitHub work together.

Sorry about all the formatting changes. I forgot to turn off my autoformatter before I started making changes. Hopefully you get the idea.

@ghost
Author

ghost commented Apr 9, 2025

Thanks so much for the notes. I'll try to apply the required changes. As for the embedding mode, I'll look into it more later and try to find a way to fix the unbalanced brackets/curly braces and the recursive tagging in embedding mode.

P.S.: The docs are very helpful and straightforward, probably some of the best I've come across in a long time, since the BSDs' manpages. Usually I'd spam some LLM with the man page or the docs and ask it what I want (I'm looking at you, GNU manpages), but the Scintillua API docs had exactly what I needed.

@mcepl

mcepl commented Jun 5, 2025

> Thanks for starting on this! When you think it's ready, request a review and I'll have a look. Also, please give a link to Typst documentation.

This is probably what @orbitalquark was after, isn’t it? https://typst.app/docs/reference/syntax/

@mcepl

mcepl commented Jun 5, 2025

I wonder whether we shouldn’t have a library of test documents. I have converted my last sermon slides to Typst (and I am not completely persuaded that it is easier to use than normal LaTeX Beamer, but that isn’t the point here), and I don’t see much in the results.

Ezra 9–10_ Spiritual Reformation.typ.gz

@ghost
Author

ghost commented Jun 12, 2025

One of the main limitations of the current lexer implementation is the lack of support for recursive lexer embedding, for example handling transitions between text mode and embedded scripting mode. The areas where this limitation appears are marked with 'EMBED' in the screenshot:
image

@ghost ghost closed this by deleting the head repository Aug 6, 2025
This pull request was closed.