-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrote the LaTeX parser to be more flexible #10
base: master
Are you sure you want to change the base?
Conversation
If no more review comments, then I'm merging. |
Only had time for a very brief review. Can look again this weekend. |
This comment here as it may get some discussion: The former will have more performant processing of error cases, since they are not exceptional. But it will need more work to maintain, as results are harder to compose. It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results. The alternative is an option, where failure to find things gives a None, and where invalid syntax gives an exception. I think invalid syntax can be treated as exceptional, so we can make our lives easier by avoiding Result. Any other opinions? |
In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures. Names should be informative and their should be xml docs but it's only possible to name and describe objects once they have been simplified enough to be comprehensible. I will try to do some constructive work in this direction this weekend. |
Possibly useful? |
I read about the slowness of computation expressions. I wouldn't use them for Results.
Will simplify the code a bit. |
I have thought more about Results and also considered a simple issue in another repo.
I believe exceptions is better, because the inner functions are simplified to have types which return A parsing function routine which for example looks for a certain pattern should return None if it fails to find the pattern; these are not exceptional. A parsing function which is used when a certain pattern is known to be implied should throw an exception on failure. For example after seeing The global algorithm can catch failures and either return Error("invalid LaTeX") or fail with an InvalidLatexException. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
loop [] | ||
|
||
let mapFoldResult f (state:'State) (list: 'T list) : Result<'Result list * 'State, 'Error> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs XML comment.
} | ||
|
||
|
||
[<Struct>] type internal Read = Until of char | UntilRightDelimiter | OneArgument | All |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XML doc for this type?
let (|CommandName|) cs = match cs with | ||
| c::cs when not <| isAlphabet c -> string c, cs | ||
| PartitionAlphabets (ab, cs) -> System.String.Concat ab, skipSpaces cs | ||
let rec read tableEnv until cs list = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type signature of read is very complex. From the name and the type signature (and failing that an xml doc) you should be able to work our what any function does.
| PartitionAlphabets (ab, cs) -> System.String.Concat ab, skipSpaces cs | ||
let rec read tableEnv until cs list = | ||
///Reads a delimiter | ||
let readDelimiter cs = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is almost comprehensible.
Why does it return a Delimiter * char list instead of just a Delimiter?
If it returned a Delimiter then the type signature and name would indicate what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes a list of to-be-read characters, slices out a delimiter, returns it along with the to-be-read characters after the delimiter.
MathDisplay/MathAtom/LaTeX.fs
Outdated
| '['::cs -> readArgUntil' readOptionalArgUntil id (Until ']') argDict argDict.AddOptional tableEnv cs | ||
| _ -> Ok (tableEnv, ValueNone, cs) | ||
|
||
let rec replaceArguments atom state = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaceArguments is obviously a key function but has an obfuscated type signature. The same goes for the inner functions replace1 and replace2.
I've resolved the AliasDictionary issues (simplification and documentation). |
Now |
Why does a user need to interact with an alias dictionary? If there is a need for extension in future it can be done but it is comprehensible now, as simple as possible for what it is doing, documented, and definitely provides the functionality we are using at the moment (or else the project wouldn't compile). |
For example, |
I removed as much as possible while keeping the main project buildable. Ok do do one or two extra things like expose the keys or remove private on the dictionaries as needed in future. But we shouldn't add them until then. It's important to keep good discipline. I think the file went from 100 lines to 50. Maybe in future it goes back up to 55 if we need something extra. |
While I agree on YANGI in general, I think all collection types should at least act like collections, have collection methods, and implement collection interfaces. Also, why make Delimiter a DU which limits its possible values? It also kind of repeats information in |
Not all interfaces; otherwise it gets too large to maintain (as the previous code was). We should consider this an internal type, and add these things as needed. The correct API if we do add them is not obvious because there are two dictionaries. For example
There are a limited number of delimiters so representing these as a limited number of values is better. Easier to understand code with a finite number of values than infinite number of possible values of which most are invalid. There is the option of adding an |
This idea of yours is interesting: let Commands =
[ ["frac"], Fraction (Argument 1, Argument 2, Center, Center, ValueNone)
["sqrt"], Radical (Argument_Optional (1, Row []), Argument 1)
|> AliasDictionary While we may not be able to do exactly this (since it requires usage of an undisplayable internal Latex-specific Argument atom), we could adjust it like this: [<Struct>][<RequireQualifiedAccess>]
type Command =
| Frac
/// A square root or nth root
| Sqrt
| Color
let Commands =
[ ["frac"], Command.Frac
["sqrt"], Command.Sqrt
["color"], Command.Color]
|> AliasDictionary
let parseColor:string -> System.Drawing.Color = failwith "not implemented"
let rec parseCommand(command:Command,
optionals:string[],
args:string[],
parseAtom:string->MathAtom) =
let require (n:int) =
if args.Length <> n
then failwith <| (string n) + " inputs required by " + (Commands.GetPrimaryAlias command)
let maxOptional (n:int) =
if optionals.Length > n
then failwith <| "At most " + (string n) + " optional inputs allowed in " + (Commands.GetPrimaryAlias command)
let getOptional n =
if n < optionals.Length then Some(optionals.[n]) else None
match command with
| Command.Frac ->
require 2
Fraction(parseAtom args.[0], parseAtom args.[1], ValueNone, ValueNone, ValueNone)
| Command.Sqrt ->
require 1
maxOptional 1
Radical(getOptional 0 |> Option.map parseAtom, parseAtom args.[0])
| Command.Color ->
require 2
Colored(parseAtom args.[1], parseColor args.[0]) This also works with things to be parsed that are not Atoms (e.g. Color as shown). |
It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath. ["binom"], Delimited (Delimiter.LeftBracket, Frac (Argument 1, Argument 2, Center, Center, ValueNone), Delimiter.RightBracket)
["TeX"], Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman) |
Regarding Result vs Exception: MathAtom/MathDisplay.Tests.Benchmarks/Result_ExceptionBenchmark.fs Lines 43 to 48 in a0eac56
While Result is 2.7 times slower than Exception when the result is Ok, Exception is 1055.5 times slower than Result when the result is Error. |
This will probably be very noticeable when using the Example app which displays user-inputted valid/invalid LaTeX real-time. |
Those examples easy with the draft scheme I gave. | Command.Binom ->
require 2
Delimited (Delimiter.LeftBracket, Frac (parseAtom args.[0], parseAtom args.[1], Center, Center, ValueNone), Delimiter.RightBracket)
| Command.TeX ->
require 0
Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)
Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic? |
Now we need to define new
That is the purpose of |
The purpose of AliasDictionary is to convert multiple alias strings into the things they represent and to give the primary alias of a represented object. A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.
This does more than lexing. |
What I'm trying to recreate with
It's just some sort of pattern matching that converts between LaTeX strings and MathAtoms. |
Some miscommunication. I am asking whether lexing of LaTeX is possible. |
Lexing itself is possible, just that I don't see a reason for the extra step. |
Two way pattern matching: I'm skeptical that will work. I think we need to get this repo into a clean state with comprehensible code and any attempts like that can branch off as experiments. Don't want to tackle extremely difficult stuff before we have good basics in place. |
I think I will need to implement a simple parser. |
Serialization of some simple mathatoms into latex will have some utility. And some sharing of data, especially names and aliases, could be shared between parser and serializer. But it's important not to couple parsing to a complex system to create a difficult coupling between parser and serializer and mathatom itself. That will prevent getting master into any sort of reasonable shape in the near future. |
I would like this pr to go into master which means keeping it clean. I will write a parser but am travelling for next 2 weeks (including HK, could meet there). Anything to do with Serialization (beyond aliases and a list of commands) shouldn't go in this pr. Once we have a good pr to merge then master will be in a good state to branch off from with feature branches. |
When will you be in HK? |
No description provided.