-
Notifications
You must be signed in to change notification settings - Fork 45
Preserve content indent in multiline Patterns #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
What would you do with weird =
Some text that
dedents on its way. |
Great question, thanks. It would be a syntax error, raised in |
From a big-picture POV, I think we should only do this if we can avoid introducing syntax errors. I think it's a good achievement that we stop kicking people in the face for them just typing ahead with 0.7, and I'd like us to be consistent about that. That is, I think we should continue to support multi-line text with leading inline, and also weird indention patterns. I think that's more valuable than a mark-up free indention logic. Technically, I wanna throw in that look = mommy,
\tI can have leading tabs
or not. is another interesting challenge to the algorithm. With |
Do you have a proposal how to achieve this? |
Sadly, I don't have a proposal on how to create markup-free indention. Rambling a bit: I guess the only hard blocker is leading-tabs intermixed with spaces, 'cause there's just no way to tell if a tab is any particular number of spaces. In a scenario of consistent indention, we could skip the leading indention of the non-inline lines. Now, in a case of inconsistent indention (mix of space and tab), switching this off would be weird and somewhat an element of surprise, I think. Syntax error would be worse. We could also reduce inline white-space to just space, and put the tab character into the bucket of all the other chars. But that's a major religious debate, and probably also surprising and confusing. I'm not thrilled by any of my ideas here. .... |
Switching back to bigger picture. I can see how there are some applications where indention matters. Say, a markdown document. But then, I think that doing markdown paragraphs is probably not a nice experience with Fluent to begin with, regardless of the indention issue. This proposal does have down-sides for the applications where Fluent currently is good at, say regular HTML fragments, which do white-space normalization during layout. Right now, you're allowed to get indention off there. Keeping that seems like a bigger win for the applications that we know of. For the unknown unknowns, I'd suggest to WONTFIX this. I'd be open to re-open this at the point where we're faced with a concrete project, that benefits from Fluent, and that shows how to develop this feature in a way that serves both the existing and the future customers. I think that's OK for the things we're doing now, and will be OK for them most likely if we revisit this. |
Isn't this what the current behavior (Syntax 0.6) is? Which part of the indent would be kept in the AST as localizer-originating? As far as your other tab-vs-spaces questions go, I look up to how Python deals with inconsistent indentation characters: it doesn't allow them. I'm OK with a syntax errors in such cases.
I don't understand how this proposal has down-sides for an environment which does normalization anyways. Would you mind rephrasing? |
The balance that I'm trying to strike here is between being lax to input and being predictable in what the result is. The goal is to allow localizers more control over what comes out from In the following snippets I'm going to use a regular space ( Perhaps a few more examples will help the duscussion. The base use-case is this: multiline-text =
<p>
████Hello.
</p> In this example, the localizer-originating indent of The following examples of inconsistent indentation would be illegal. This is fine because most editors can maintain the same level of indentation when pressing Enter. You'd have to try hard to cause a syntax error. # Error: inconsistent indentation.
multiline-text =
<p>
████Hello.
⭾</p>
# Error: inconsistent indentation.
multiline-text =
<p>
Hello.
</p> Note that by making them illegal right now, we reserve the possibility of making it valid syntax in a future milestone. Perhaps lines indented by fewer blanks than the first line should still count as valid? We don't have to make this call now, which is a good thing for 0.7 and our ability to ship it. If the first line of a multiline pattern starts with a tab, then all other lines have to start with tabs as well: multiline-text =
⭾<p>
⭾████Hello.
⭾</p> Furthermore, mixing inline patterns and block patterns would be illegal in 0.7. # Error: mixing inline and block patterns
mixed-text = Starts inline
and continues as a block. Again, by forbidding this right now we make it possible to allow this in the future if we see people making this mistake often. Maybe the behavior should be the current one: no indents are stored in the AST in this case. I don't know, but that's OK because we don't have to decide right now. All these changes are good for making the grammar less ambiguous and for giving more control to the localizer. As a reminder, here's the summary of the goals of Syntax 0.7:
I consider the fact that multiline-text =
<p>
████Hello.
</p> parses and serializes back to multiline-text =
<p>
Hello.
</p> a significant surprise. We need to fix this. If we're going to wontfix this issue, I expect us to find another solution to this problem. |
Serialization of white-space is issue #121. If that's important to fix, let's re-triage it.
I think that less ambiguity is actually counter-acting your stated goals for 0.7. In fact, the white-space relaxation is actually allowing more ways to express a valid fluent message. Allowing that for technical syntax on the one hand, and at the same time introduce barriers in regular text is contradictory in my mind. |
#121 is overkill for what I'm trying to achieve here. It doesn't really help because it doesn't make the whitespace part of the value in the AST. Serialization is just one use-case. I think it's important to store some whitespace in the AST as part of values because it allows consumers of the AST to interpret it to their needs. This is exactly the same reason why I changed my mind in #122.
The whitespace relaxation proposal in #87 is directly in line with the "reduce surprises" goal. |
I like this proposal a lot. It follows my natural instincts when I read the messages and I agree that dedented text should throw an error (for now at least). The only change I'd suggest is that:
should work. It's perfectly easy to find the initial-indent on line two and use it as the default indent for the remaining lines, just like in all other cases. |
There are three reasons to forbid it:
I suggest we forbid this syntax now as a way to test the hypothesis about the parsing speed-up and to simplify the EBNF and the parser implementations. If the feedback from localizers indicates that they trip over this frequently, I'll be very open to allowing this in the future. |
I'd like to make a decision here when I'm back from vacation mid-next week. It looks like there are three areas of discussion:
|
I and zibi both voiced strong objections to adding syntax errors for this feature. If we're fine removing them in 0.8, we should push the whole issue to 0.8, and fix this once, and fix it right. stas raised that we could get a performance optimization from not allowing multi-line text and inline text to share lexical structure. I don't see that in the python impl I'm toying around with. Counter-proposal is:
This should be very easy to implement in zero-copy parsers, and there are interesting algorithmic approaches to other parsers. Notably, we can speculatively discard the indent of the first follow line, and re-indent the previous follow-line starts when we find a lesser indent. In the common cases, this will be O(1), and in pathological examples O(n^2), with n being the number of lines. |
The perf speedups is a nice-to-have side effect. The main reason is I gave is:
So far, I've only seen @zbraniecki address this with:
I disagree with implementing this approach right now. I'm open to discussing adding it later, once we have some experience and data to work with.
I disagree. By restricting right now we can ensure that no rogue FTL file written between now and 0.8 will stand in the way of relaxation. I also prefer an iterative approach over the fix this once, and fix it right one.
That's not how I read @zbraniecki's comment:
My counter-counter-proposal (which is my initial proposal with your suggestion to remove tabs):
|
Based on the conversation from today, I created a gist with the remaining item: https://gist.github.com/zbraniecki/5848e7142cf9090edce5d48f80ad9b1f For anyone following this thread, please, help us make the decision! :) |
I'm going to move this issue to 0.8. My goal of splitting the scope between 0.7 and a later milestone was to make it possible to ship a good basic version of this sooner rather than later. A prolonged discussion about this idea defeats its original purpose. In talking to @Pike earlier today, a new edge-case also came up which will require some further thought and design (I'm using # The first line's indent is 0. The greater common indent is 0.
# Should all 4 spaces at the beginning of the "at your disposal"
# line be part of the value? Even if at least one of them is required
# by the syntax?
credits-available =
{ $num ->
[one] One
*[other] {$num}
} available
····at your disposal. I think we need to decide if indent before placeables should count for the purpose of computing the localizer-intended indent. We'll also need to ask the same question for blank lines and lines consisting entirely of spaces. These might turn out to be simple questions, but in the interest of getting 0.7 ready, let's consider them holistically in 0.8. I no longer believe that the version I proposed for 0.7 is safe enough to push for it. |
Leaving this here for when it's time to do 0.8: |
I took your branch and started working on it to get it ready for a PR. The major missing piece right now is including the indent of the first line of block patterns in the dedentation logic, too. Right now, the parser skips the entire blank space before the first character of the block pattern and any indent of the first line is lost in the process. (By block pattern I mean a pattern which starts on a line below the identifier.) # We need to preserve the ···· indent, rather than
# skip it because it precedes a Pattern.
message =
····Foo
Bar I'll work on this tomorrow. |
YAML is terrible in many ways, including multi-line strings. But a single thing I like there is that the whole multi-line string behaviour is defined at the beginning without later touching the value itself. Maybe I am completely off, but could Fluent use a similar notation (but different semantics)? What about something like this: Having
Having
This way strings could probably contain blank lines, despite it won't be very visible without an editor visualizing the spaces.
I am not sure about lines with trailing spaces and if you want to solve them too. If there could be some another (optional) modifier to specify the line trailing spaces behaviour, explicitly including them in the string (like
|
Thanks for the ideas, @MikkCZ! We actually looked at YAML's multiline handling in #122, although in a slightly different context. Before I filed this proposal, I considered extending the syntax of multiline patterns; it was very similar to your examples, actually :) There were a number of reasons why I didn't go with this approach. The sigils, like Patterns can also be values of attributes and variants, which themselves can be indented for clarity. This makes it a bit hard to know how much of the indentation of the first line is an artifact of where the pattern is in the message. # "······Foo\n··Bar\nBaz"?
message =
.attr = >
······Foo
········Bar
······Baz I also didn't want to give any special treatment to leading and trailing whitespace. It's usually a sign of a problem in the software if the translation needs it. It's most commonly abused to concatenate messages in the code rather than in Fluent. Having a dedicated syntax for it would legitimize its use. Furthermore, if absolutely required, the whitespace can be included in the value explicitly via a # "Foo\nBar··"
message =
Foo
Bar{" "} I admit it's a bit awkward for leading and trailing newlines, though. # "\n\nFoo\n\n"
message = {""}
··Foo
··{""} Hopefully, the shouldn't be translations which needs this. And if in the future we see use-cases for making this easier to write, we can try to find a different solution. Lastly, I was hoping to find a solution which wouldn't require memorizing which sigil does what. The YAML example is actually very telling. While it all makes sense, it's also just very complex and I image users need to refer to the manual often to get it right!
@Pike described the current proposal as the "ruler rule" ;) Imagine holding a ruler in your hand and putting it up vertically against the monitor. Then start moving it to the right until you reach the first character on any of the lines of the block pattern. Any extra indentation on other lines should be preserved. This solution allows to use complex HTML markup in translations and preserve its indentation. It also works well if the translation is to be interpreted as Markdown. Do you see other use-cases where it might not be enough? Let me know what you think and thanks again for taking the time to share your thoughts! |
Regarding the case I mentioned in #162 (comment): # Exhibit A.
# Value is "····Foo\nBar".
message =
····Foo
Bar I remember an idea from a year ago to explicitly forbid leading whitespace in
This proposal changes this assumption. @Pike, do you anticipate any troubles because of that? One that I can think of is that serializing the value of # Exhibit B.
# Value is "Foo\nBar".
message = ····Foo
Bar However, I think our current implementations protect us from this. Here's my thinking:
|
Oh, wait, maybe we don't actually need to solve the
That's not a missing piece, it's a feature! To preserve the leading indent one needs to use a # Formats to "····Foo\nBar".
message =
{"····"}Foo
Bar |
@Pike would you like to open a PR based on your branch? |
After giving it more thought, I'm still torn but I'm leaning towards giving the leading block indent priority over the trim-leading-blanks behavior. If the localizer decides to use multiline patterns, it's because they care about the formatting for some reason. It may be pure aesthetics or maybe the requirements of the environment (e.g. Markdown's indented code blocks). In either case, indentation becomes part of the authored content and should be preserved. I'm struggling to come up with good examples, but I'd expect the following Fluent messages to follow the ruler rule rather than have the the leading blanks trimmed: lipsum =
Lorem ipsum dolor sit amet enim. Etiam ullamcorper.
Suspendisse a pellentesque dui, non felis. Maecenas
malesuada elit lectus felis, malesuada ultricies. Curabitur
et ligula. Ut molestie a, ultricies porta urna. Vestibulum
commodo volutpat a, convallis ac, laoreet enim.
Phasellus fermentum in, dolor. Pellentesque facilisis.
Nulla imperdiet sit amet magna. Vestibulum dapibus,
mauris nec malesuada fames ac turpis velit, rhoncus
eu, luctus et interdum adipiscing wisi. for-of-loop =
for (let element of elements) {
doSomething(element);
}
The above snippet is an example of a `for-of` loop is JavaScript.
It can be used to iterate over a collection of items which implements
the `[Symbol.iterator]()` method. This creates a refactoring hazard when a two-line pattern which preserved the blanks in front of the first line is converted to a single-line pattern. # "····Foo\nBar"
foo =
Foo
Bar # "Foo Bar"
foo =
Foo Bar However, I'd argue that if the leading indent of a multiline pattern is indeed important, it's because of the same reason that would make it look bad as a single-line pattern anyways. And so the risk of making a mistake when refactoring is low. Case in point, Foo Bar is not something that I'd choose to write as multiline in the first place nor something that should have its leading whitespace preserved. |
In Syntax 0.6 all leading indentation on each line of multiline Patterns is removed in the AST. The following Fluent code:
multi = Lorem ipsum dolor sit amet enim. Etiam ullamcorper.
parses as:
Problem statement
The indentation on each line conflates two orthogonal purposes:
In Syntax 0.6, the only way for the localizer to have their indent preserved is to use
StringLiterals
with explicit whitespace in each line:This might be an acceptable solution to one-off situations, but it doesn't scale for multiline patterns, where
multi > 2
. One use-case for such patterns which require custom indentation is translations which define some multiline HTML to be used by a markup overlay mechanism.Prior work
|
) be placed at the beginning of each line of the string literal. I don't think this approach would be right for Fluent. In Fluent, typing text should be the simplest thing to do. Consequently, text should have the sweetest syntax, i.e. no syntax at all. If anything, I'd consider using|
for everything else except multiline text, but that's a different issue.TextElement
) doesn't have any syntax at all and a similar approach is not possible.Proposal
I propose that the width of the indentation of the first line of a Pattern be considered the indent of the entire Pattern across all its lines. Any extra leading whitespace should be included in the AST as part of the Pattern.
In other words, the following example:
multi = Lorem ipsum dolor sit amet enim. Etiam ullamcorper.
will parse as:
In the rare cases when the first line of the Pattern needs to keep its indent, a
StringLiteral
should be used to 1) make it explicit and 2) define the indent required by the syntax. This is consistent with how creating Patterns with leading whitespace works right now. Without the explicitStringLiteral
, all leading and trailing whitespace is trimmed during parsing.parses as:
Requirements
For this to work, we'll need to forbid multiline Patterns which start on the same line as the identifier or the variant key.
above-snippet-fixed = A multiline Pattern must start on a new line. Now the syntax-required indentation is unambiguous and any extra leading whitespace will be kept in the AST.
The text was updated successfully, but these errors were encountered: