Skip to content

Preserve content indent in multiline Patterns #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stasm opened this issue Aug 3, 2018 · 26 comments
Closed

Preserve content indent in multiline Patterns #162

stasm opened this issue Aug 3, 2018 · 26 comments
Labels

Comments

@stasm
Copy link
Contributor

stasm commented Aug 3, 2018

In Syntax 0.6 all leading indentation on each line of multiline Patterns is removed in the AST. The following Fluent code:

multi =
    Lorem ipsum dolor
      sit amet enim.
        Etiam ullamcorper.

parses as:

{
    "type": "TextElement",
    "value": "Lorem ipsum dolor\nsit amet enim.\nEtiam ullamcorper.",
}

Problem statement

The indentation on each line conflates two orthogonal purposes:

  1. Fluent requires all text on a new line to be indented. Thanks to this indentation Fluent knows this is text content.
  2. The author of the example chose to indent line 2 and line 3 relative to line 1. The indentation is arbitrary and on top of what Fluent requires. It can be considered part of the content, just as the fact that this text is formatted over three separate lines is.

In Syntax 0.6, the only way for the localizer to have their indent preserved is to use StringLiterals with explicit whitespace in each line:

multi =
    Lorem ipsum dolor
    {"  "}sit amet enim.
    {"    "}Etiam ullamcorper.

This might be an acceptable solution to one-off situations, but it doesn't scale for multiline patterns, where multi > 2. One use-case for such patterns which require custom indentation is translations which define some multiline HTML to be used by a markup overlay mechanism.

Prior work

  • Python's textwrap.dedent. It is a runtime solution which isn't suitable to our needs of performing this work during the AST construction.
  • Scala's stripMargin. It requires that a special character (by default |) be placed at the beginning of each line of the string literal. I don't think this approach would be right for Fluent. In Fluent, typing text should be the simplest thing to do. Consequently, text should have the sweetest syntax, i.e. no syntax at all. If anything, I'd consider using | for everything else except multiline text, but that's a different issue.
  • Swift's closing quotation marks approach. This takes advantage of the fact that string literals in Swift have delimiters. Again, in Fluent, the main form in which text exists (TextElement) doesn't have any syntax at all and a similar approach is not possible.

Proposal

I propose that the width of the indentation of the first line of a Pattern be considered the indent of the entire Pattern across all its lines. Any extra leading whitespace should be included in the AST as part of the Pattern.

In other words, the following example:

multi =
    Lorem ipsum dolor
      sit amet enim.
        Etiam ullamcorper.

will parse as:

{
    "type": "TextElement",
    "value": "Lorem ipsum dolor\n  sit amet enim.\n    Etiam ullamcorper.",
}

In the rare cases when the first line of the Pattern needs to keep its indent, a StringLiteral should be used to 1) make it explicit and 2) define the indent required by the syntax. This is consistent with how creating Patterns with leading whitespace works right now. Without the explicit StringLiteral, all leading and trailing whitespace is trimmed during parsing.

multi =
    {"    "}Lorem ipsum dolor
      sit amet enim.
        Etiam ullamcorper.

parses as:

{
    "type": "Placeable",
    "expression": {
        "type": "StringLiteral",
        "value": "    "
    }
},
{
    "type": "TextElement",
    "value": "Lorem ipsum dolor\n  sit amet enim.\n    Etiam ullamcorper.",
}

Requirements

For this to work, we'll need to forbid multiline Patterns which start on the same line as the identifier or the variant key.

now-forbidden = A Pattern which starts inline
    but continues as multiline. There's no indent
    on its first line whatsoever so it's not clear how
    much indent should be considered as required
    by the syntax and how much as indent to be
    kept in the AST.
above-snippet-fixed =
    A multiline Pattern must start on a new line.
    Now the syntax-required indentation is unambiguous
            and any extra leading whitespace will be kept
            in the AST.
@Pike
Copy link
Contributor

Pike commented Aug 3, 2018

What would you do with

weird =
    Some text that
  dedents on its way.

@stasm
Copy link
Contributor Author

stasm commented Aug 3, 2018

Great question, thanks. It would be a syntax error, raised in abstract.mjs. On the implementation level, the grammar defined in the EBNF would preserve all indents and then the FTL.Pattern constructor in abstract.mjs would verify that all lines have at least the indent of the first line.

@Pike
Copy link
Contributor

Pike commented Aug 8, 2018

From a big-picture POV, I think we should only do this if we can avoid introducing syntax errors.

I think it's a good achievement that we stop kicking people in the face for them just typing ahead with 0.7, and I'd like us to be consistent about that. That is, I think we should continue to support multi-line text with leading inline, and also weird indention patterns.

I think that's more valuable than a mark-up free indention logic.

Technically, I wanna throw in that

look = mommy,
\tI can have leading tabs
  or not.

is another interesting challenge to the algorithm. With \t being an actual tab character.

@stasm
Copy link
Contributor Author

stasm commented Aug 8, 2018

That is, I think we should continue to support multi-line text with leading inline, and also weird indention patterns.

Do you have a proposal how to achieve this?

@Pike
Copy link
Contributor

Pike commented Aug 8, 2018

Sadly, I don't have a proposal on how to create markup-free indention. Rambling a bit:

I guess the only hard blocker is leading-tabs intermixed with spaces, 'cause there's just no way to tell if a tab is any particular number of spaces.

In a scenario of consistent indention, we could skip the leading indention of the non-inline lines.

Now, in a case of inconsistent indention (mix of space and tab), switching this off would be weird and somewhat an element of surprise, I think. Syntax error would be worse.

We could also reduce inline white-space to just space, and put the tab character into the bucket of all the other chars. But that's a major religious debate, and probably also surprising and confusing.

I'm not thrilled by any of my ideas here. ....

@Pike
Copy link
Contributor

Pike commented Aug 8, 2018

Switching back to bigger picture.

I can see how there are some applications where indention matters. Say, a markdown document. But then, I think that doing markdown paragraphs is probably not a nice experience with Fluent to begin with, regardless of the indention issue.

This proposal does have down-sides for the applications where Fluent currently is good at, say regular HTML fragments, which do white-space normalization during layout. Right now, you're allowed to get indention off there.

Keeping that seems like a bigger win for the applications that we know of.

For the unknown unknowns, I'd suggest to WONTFIX this. I'd be open to re-open this at the point where we're faced with a concrete project, that benefits from Fluent, and that shows how to develop this feature in a way that serves both the existing and the future customers. I think that's OK for the things we're doing now, and will be OK for them most likely if we revisit this.

@stasm
Copy link
Contributor Author

stasm commented Aug 8, 2018

In a scenario of consistent indention, we could skip the leading indention of the non-inline lines.

Isn't this what the current behavior (Syntax 0.6) is? Which part of the indent would be kept in the AST as localizer-originating?

As far as your other tab-vs-spaces questions go, I look up to how Python deals with inconsistent indentation characters: it doesn't allow them. I'm OK with a syntax errors in such cases.

This proposal does have down-sides for the applications where Fluent currently is good at, say regular HTML fragments, which do white-space normalization during layout. Right now, you're allowed to get indention off there.

I don't understand how this proposal has down-sides for an environment which does normalization anyways. Would you mind rephrasing?

@stasm
Copy link
Contributor Author

stasm commented Aug 8, 2018

The balance that I'm trying to strike here is between being lax to input and being predictable in what the result is. The goal is to allow localizers more control over what comes out from format() as well as how multiline messages are serialized back to text. Consider that the Syntax 0.6 approach results in the AST losing all data about indents and consequently—in all multiline being serialized back as a uniformly-indented block.

In the following snippets I'm going to use a regular space ( ) to mean the indent required by the syntax, and to denote the indent originating from the localizer. I'm also using for a tab.

Perhaps a few more examples will help the duscussion. The base use-case is this:

multiline-text =
    <p>
    ████Hello.
    </p>

In this example, the localizer-originating indent of ████ will be stored in the AST. This allows better serialization without hurting any of the current applications of Fluent (which are mostly HTML-based). It also opens up new possibilities for environments which will want to look at the whitespace in translations. These are currently at a disadvantage due to the fact that we strip all indents and only preserve line breaks. One of the design principles of Fluent is the principle of least power. Preserving more information about how the localizer laid out the translation follows this principle.

The following examples of inconsistent indentation would be illegal. This is fine because most editors can maintain the same level of indentation when pressing Enter. You'd have to try hard to cause a syntax error.

# Error: inconsistent indentation.
multiline-text =
    <p>
    ████Hello.
⭾</p>

# Error: inconsistent indentation.
multiline-text =
    <p>
  Hello.
    </p>

Note that by making them illegal right now, we reserve the possibility of making it valid syntax in a future milestone. Perhaps lines indented by fewer blanks than the first line should still count as valid? We don't have to make this call now, which is a good thing for 0.7 and our ability to ship it.

If the first line of a multiline pattern starts with a tab, then all other lines have to start with tabs as well:

multiline-text =
⭾<p>
⭾████Hello.
⭾</p>

Furthermore, mixing inline patterns and block patterns would be illegal in 0.7.

# Error: mixing inline and block patterns
mixed-text = Starts inline
    and continues as a block.

Again, by forbidding this right now we make it possible to allow this in the future if we see people making this mistake often. Maybe the behavior should be the current one: no indents are stored in the AST in this case. I don't know, but that's OK because we don't have to decide right now.

All these changes are good for making the grammar less ambiguous and for giving more control to the localizer. As a reminder, here's the summary of the goals of Syntax 0.7:

This milestone is focused on reducing the surprises of the syntax and making it easier to learn just by looking at FTL files and trying to discover syntax patterns from them.

I consider the fact that

multiline-text =
    <p>
    ████Hello.
    </p>

parses and serializes back to

multiline-text =
    <p>
    Hello.
    </p>

a significant surprise. We need to fix this. If we're going to wontfix this issue, I expect us to find another solution to this problem.

@Pike
Copy link
Contributor

Pike commented Aug 9, 2018

Serialization of white-space is issue #121. If that's important to fix, let's re-triage it.

All these changes are good for making the grammar less ambiguous and for giving more control to the localizer. As a reminder, here's the summary of the goals of Syntax 0.7:

This milestone is focused on reducing the surprises of the syntax and making it easier to learn just by looking at FTL files and trying to discover syntax patterns from them.

I think that less ambiguity is actually counter-acting your stated goals for 0.7. In fact, the white-space relaxation is actually allowing more ways to express a valid fluent message.

Allowing that for technical syntax on the one hand, and at the same time introduce barriers in regular text is contradictory in my mind.

@stasm
Copy link
Contributor Author

stasm commented Aug 9, 2018

Serialization of white-space is issue #121. If that's important to fix, let's re-triage it.

#121 is overkill for what I'm trying to achieve here. It doesn't really help because it doesn't make the whitespace part of the value in the AST. Serialization is just one use-case. I think it's important to store some whitespace in the AST as part of values because it allows consumers of the AST to interpret it to their needs. This is exactly the same reason why I changed my mind in #122.

I think that less ambiguity is actually counter-acting your stated goals for 0.7. In fact, the white-space relaxation is actually allowing more ways to express a valid fluent message.

The whitespace relaxation proposal in #87 is directly in line with the "reduce surprises" goal.

@zbraniecki
Copy link
Collaborator

I like this proposal a lot. It follows my natural instincts when I read the messages and I agree that dedented text should throw an error (for now at least).

The only change I'd suggest is that:

now-forbidden = A Pattern which starts inline
    but continues as multiline. There's no indent
    on its first line whatsoever so it's not clear how
    much indent should be considered as required
    by the syntax and how much as indent to be
    kept in the AST.

should work. It's perfectly easy to find the initial-indent on line two and use it as the default indent for the remaining lines, just like in all other cases.
I'd don't see a reason to forbid it.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

It's perfectly easy to find the initial-indent on line two and use it as the default indent for the remaining lines, just like in all other cases. I'd don't see a reason to forbid it.

There are three reasons to forbid it:

  1. It's an exception to the rule which says that the indent of the first line defines the indent to be stripped away. I'd prefer the first version of this feature to ship without exceptions.
  2. We can forbid it now and safely decide to allow it later. The act of forbidding is forward-compatible.
  3. It will speed up parsing. The parser can safely assume that a character on the same line as the id means that the entire pattern is contained on this line. This affects 100% of strings currently in gecko-strings (verified with rg -g "*.ftl" "^\s+[^ .*\[{}]"). To be fair, however, I should say that we have been recommending to not use block patterns at all for now, since we weren't sure what changes Syntax 0.7 would bring. Right now, there are 68 messages in gecko-strings with values of length over 70 characters ( rg -g "*.ftl" "=.{70,}"|wc -l). That's just 5% of all FTL strings there.

I suggest we forbid this syntax now as a way to test the hypothesis about the parsing speed-up and to simplify the EBNF and the parser implementations. If the feedback from localizers indicates that they trip over this frequently, I'll be very open to allowing this in the future.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

I'd like to make a decision here when I'm back from vacation mid-next week. It looks like there are three areas of discussion:

  1. Inconsistent indent character - i.e. what should happen when tab indents are mixed with space indents. According to the proposal, the kind of indent used on the first line would win. Alternatively, @Pike suggested to forbid tabs in Use only spaces for syntax whitespace, drop tab for that #165.

  2. Inconsistent amount of indents - i.e. what should happen when the first line is indented by 4 spaces and the following one by 2 or 3. According to the proposal, the parser would produce a syntax error because of insufficient indent on the second line. An alternative solution might be to calculate the maximum common amount of indent for all lines. This alternative behavior could also be safely added in 0.8.

  3. Mixing inline and block patterns - i.e. should patterns be allowed to start on the same line as the identifier and then continue on the next line? The main driver for forbidding it is the fact that the first line of such pattern is by definition inline and doesn't have any indent. This in turn makes deciding how much indent to strip tricky, or at least not obvious.

@Pike
Copy link
Contributor

Pike commented Aug 14, 2018

I and zibi both voiced strong objections to adding syntax errors for this feature.

If we're fine removing them in 0.8, we should push the whole issue to 0.8, and fix this once, and fix it right.

stas raised that we could get a performance optimization from not allowing multi-line text and inline text to share lexical structure. I don't see that in the python impl I'm toying around with.

Counter-proposal is:

  1. Push this to 0.8
  2. Disallow TAB as indention character (maybe already in 0.7)
  3. In text_cont, require ' '+ as indention
  4. In AST generation, find the greatest common sequence of spaces, and drop them from the lines.

This should be very easy to implement in zero-copy parsers, and there are interesting algorithmic approaches to other parsers. Notably, we can speculatively discard the indent of the first follow line, and re-indent the previous follow-line starts when we find a lesser indent. In the common cases, this will be O(1), and in pathological examples O(n^2), with n being the number of lines.

@stasm
Copy link
Contributor Author

stasm commented Aug 16, 2018

stas raised that we could get a performance optimization from not allowing multi-line text and inline text to share lexical structure. I don't see that in the python impl I'm toying around with.

The perf speedups is a nice-to-have side effect. The main reason is I gave is:

The main driver for forbidding it is the fact that the first line of such pattern is by definition inline and doesn't have any indent. This in turn makes deciding how much indent to strip tricky, or at least not obvious.

So far, I've only seen @zbraniecki address this with:

It's perfectly easy to find the initial-indent on line two and use it as the default indent for the remaining lines

I disagree with implementing this approach right now. I'm open to discussing adding it later, once we have some experience and data to work with.

If we're fine removing them in 0.8, we should push the whole issue to 0.8, and fix this once, and fix it right.

I disagree. By restricting right now we can ensure that no rogue FTL file written between now and 0.8 will stand in the way of relaxation. I also prefer an iterative approach over the fix this once, and fix it right one.

I and zibi both voiced strong objections to adding syntax errors for this feature.

That's not how I read @zbraniecki's comment:

It follows my natural instincts when I read the messages and I agree that dedented text should throw an error (for now at least).

My counter-counter-proposal (which is my initial proposal with your suggestion to remove tabs):

  1. Forbid \t as syntax indent in 0.7.
  2. Forbid mixing inline and block patterns in 0.7.
  3. Forbid lines with less than the indent established by the first line in 0.7.
  4. Monitor feedback from developers and localizers.
  5. Lift restrictions 2 and 3 if needed in 0.8 or post-1.0.

@zbraniecki
Copy link
Collaborator

Based on the conversation from today, I created a gist with the remaining item: https://gist.github.com/zbraniecki/5848e7142cf9090edce5d48f80ad9b1f

For anyone following this thread, please, help us make the decision! :)

@stasm
Copy link
Contributor Author

stasm commented Aug 22, 2018

I'm going to move this issue to 0.8. My goal of splitting the scope between 0.7 and a later milestone was to make it possible to ship a good basic version of this sooner rather than later. A prolonged discussion about this idea defeats its original purpose.

In talking to @Pike earlier today, a new edge-case also came up which will require some further thought and design (I'm using · for a single space):

# The first line's indent is 0. The greater common indent is 0.
# Should all 4 spaces at the beginning of the "at your disposal"
# line be part of the value? Even if at least one of them is required
# by the syntax?
credits-available =
{ $num ->
    [one] One
   *[other] {$num}
} available
····at your disposal.

I think we need to decide if indent before placeables should count for the purpose of computing the localizer-intended indent. We'll also need to ask the same question for blank lines and lines consisting entirely of spaces. These might turn out to be simple questions, but in the interest of getting 0.7 ready, let's consider them holistically in 0.8. I no longer believe that the version I proposed for 0.7 is safe enough to push for it.

@Pike
Copy link
Contributor

Pike commented Aug 24, 2018

Leaving this here for when it's time to do 0.8:

master...Pike:dedent-localizer-whitespace

@stasm stasm added the syntax label Oct 16, 2018
@stasm
Copy link
Contributor Author

stasm commented Oct 18, 2018

I took your branch and started working on it to get it ready for a PR. The major missing piece right now is including the indent of the first line of block patterns in the dedentation logic, too. Right now, the parser skips the entire blank space before the first character of the block pattern and any indent of the first line is lost in the process.

(By block pattern I mean a pattern which starts on a line below the identifier.)

# We need to preserve the ···· indent, rather than
# skip it because it precedes a Pattern.
message =
    ····Foo
    Bar

I'll work on this tomorrow.

@stasm stasm changed the title Preserve localizer-intended indent in multiline Patterns Preserve content indent in multiline Patterns Oct 19, 2018
@MikkCZ
Copy link

MikkCZ commented Oct 20, 2018

YAML is terrible in many ways, including multi-line strings. But a single thing I like there is that the whole multi-line string behaviour is defined at the beginning without later touching the value itself. Maybe I am completely off, but could Fluent use a similar notation (but different semantics)?

What about something like this:

Having | modifier after the identifier to take the first line as a base and indent other lines relatively.

# "Foo\n··Bar\nBaz"
message = |
··Foo
····Bar
··Baz

Having > modifier after the identifier to include the first line indentation in the string, but indent other lines relatively.

# "··Foo\n··Bar\nBaz"
message = >
··Foo
····Bar
··Baz

This way strings could probably contain blank lines, despite it won't be very visible without an editor visualizing the spaces.

# "\nFoo\n"
message = |
··
··Foo
··

I am not sure about lines with trailing spaces and if you want to solve them too. If there could be some another (optional) modifier to specify the line trailing spaces behaviour, explicitly including them in the string (like < ?).

# "Foo\nBar··"
message = |<
  Foo
  Bar··

@stasm
Copy link
Contributor Author

stasm commented Oct 23, 2018

Thanks for the ideas, @MikkCZ! We actually looked at YAML's multiline handling in #122, although in a slightly different context. Before I filed this proposal, I considered extending the syntax of multiline patterns; it was very similar to your examples, actually :)

There were a number of reasons why I didn't go with this approach. The sigils, like | or > would need to made special. Right now they are just regular text characters. I'd be reluctant to add more specials, tbh. In fact, the motivation behind #123 is to limit them to the absolute minimum (i.e., just the braces, {}).

Patterns can also be values of attributes and variants, which themselves can be indented for clarity. This makes it a bit hard to know how much of the indentation of the first line is an artifact of where the pattern is in the message.

# "······Foo\n··Bar\nBaz"?
message =
    .attr = >
······Foo
········Bar
······Baz

I also didn't want to give any special treatment to leading and trailing whitespace. It's usually a sign of a problem in the software if the translation needs it. It's most commonly abused to concatenate messages in the code rather than in Fluent. Having a dedicated syntax for it would legitimize its use. Furthermore, if absolutely required, the whitespace can be included in the value explicitly via a StringLiteral:

# "Foo\nBar··"
message =
  Foo
  Bar{"  "}

I admit it's a bit awkward for leading and trailing newlines, though. StringLiterals can't have newlines in them, so this uses an empty StringLiteral to start a Pattern on the same line as the identifier, and to end it two lines below Foo:

# "\n\nFoo\n\n"
message = {""}

··Foo

··{""}

Hopefully, the shouldn't be translations which needs this. And if in the future we see use-cases for making this easier to write, we can try to find a different solution.

Lastly, I was hoping to find a solution which wouldn't require memorizing which sigil does what. The YAML example is actually very telling. While it all makes sense, it's also just very complex and I image users need to refer to the manual often to get it right!

                     ~ How YAML Handles Multiline Text ~

                      >     |            "     '     >-     >+     |-     |+
-------------------------|------|-----|-----|-----|------|------|------|------  
Trailing spaces   | Kept | Kept |     |     |     | Kept | Kept | Kept | Kept
Single newline => | _    | \n   | _   | _   | _   | _    |  _   | \n   | \n
Double newline => | \n   | \n\n | \n  | \n  | \n  | \n   |  \n  | \n\n | \n\n
Final newline  => | \n   | \n   |     |     |     |      |  \n  |      | \n
Final dbl nl's => |      |      |     |     |     |      | Kept |      | Kept  
In-line newlines  | No   | No   | No  | \n  | No  | No   | No   | No   | No
Spaceless newlines| No   | No   | No  | \   | No  | No   | No   | No   | No 
Single quote      | '    | '    | '   | '   | ''  | '    | '    | '    | '
Double quote      | "    | "    | "   | \"  | "   | "    | "    | "    | "
Backslash         | \    | \    | \   | \\  | \   | \    | \    | \    | \
" #", ": "        | Ok   | Ok   | No  | Ok  | Ok  | Ok   | Ok   | Ok   | Ok
Can start on same | No   | No   | Yes | Yes | Yes | No   | No   | No   | No
line as key       |

@Pike described the current proposal as the "ruler rule" ;) Imagine holding a ruler in your hand and putting it up vertically against the monitor. Then start moving it to the right until you reach the first character on any of the lines of the block pattern. Any extra indentation on other lines should be preserved.

This solution allows to use complex HTML markup in translations and preserve its indentation. It also works well if the translation is to be interpreted as Markdown. Do you see other use-cases where it might not be enough?

Let me know what you think and thanks again for taking the time to share your thoughts!

@stasm
Copy link
Contributor Author

stasm commented Oct 25, 2018

Regarding the case I mentioned in #162 (comment):

# Exhibit A.
# Value is "····Foo\nBar".
message =
    ····Foo
    Bar

I remember an idea from a year ago to explicitly forbid leading whitespace in TextElements in the serializer. In the bug, I said:

[I]t is not possible for an AST which is a result of parsing to contain Patterns whose first element is a TextElement with leading whitespace.

This proposal changes this assumption. @Pike, do you anticipate any troubles because of that? One that I can think of is that serializing the value of ····Foo\nBar naively will result in the ···· indent to be placed on the same line as the identifier, which means it will be lost to trimming:

# Exhibit B.
# Value is "Foo\nBar".
message = ····Foo
    Bar

However, I think our current implementations protect us from this. Here's my thinking:

  1. The current JS/Python serializer is already more of a pretty-printer: it formats according to what we consider the good style for writing Fluent. It doesn't serialize Patterns naively; instead it detects multiline text and always serializes it starting from a new line.

  2. If we ever make the serializer produce exactly the same formatting as the file the AST was parsed from (we'll need whitespace in the AST; see Store top-level whitespace in the AST #116, Parse whitespace inside of entries #121), then this will be automatically solved by this very fact. The value parsed from Exhibit A will always serialize back to the same formatting and the problem illustrated by Exhibit B will never occur.

@stasm
Copy link
Contributor Author

stasm commented Oct 25, 2018

Oh, wait, maybe we don't actually need to solve the ····Foo\nBar problem at all! Because it's not a problem, really. To quote my comment from last week (#162 (comment)):

The major missing piece right now is including the indent of the first line of block patterns in the dedentation logic, too. Right now, the parser skips the entire blank space before the first character of the block pattern and any indent of the first line is lost in the process.

That's not a missing piece, it's a feature! To preserve the leading indent one needs to use a StringLiteral, which is exactly how it's done everywhere else!

# Formats to "····Foo\nBar".
message =
    {"····"}Foo
    Bar

@stasm
Copy link
Contributor Author

stasm commented Oct 25, 2018

@Pike would you like to open a PR based on your branch?

@stasm
Copy link
Contributor Author

stasm commented Oct 29, 2018

After giving it more thought, I'm still torn but I'm leaning towards giving the leading block indent priority over the trim-leading-blanks behavior.

If the localizer decides to use multiline patterns, it's because they care about the formatting for some reason. It may be pure aesthetics or maybe the requirements of the environment (e.g. Markdown's indented code blocks). In either case, indentation becomes part of the authored content and should be preserved.

I'm struggling to come up with good examples, but I'd expect the following Fluent messages to follow the ruler rule rather than have the the leading blanks trimmed:

lipsum =
        Lorem ipsum dolor sit amet enim. Etiam ullamcorper.
    Suspendisse a pellentesque dui, non felis. Maecenas 
    malesuada elit lectus felis, malesuada ultricies. Curabitur
    et ligula. Ut molestie a, ultricies porta urna. Vestibulum
    commodo volutpat a, convallis ac, laoreet enim.
        Phasellus fermentum in, dolor. Pellentesque facilisis.
    Nulla imperdiet sit amet magna. Vestibulum dapibus,
    mauris nec malesuada fames ac turpis velit, rhoncus
    eu, luctus et interdum adipiscing wisi. 
for-of-loop =

        for (let element of elements) {
            doSomething(element);
        }

    The above snippet is an example of a `for-of` loop is JavaScript.
    It can be used to iterate over a collection of items which implements
    the `[Symbol.iterator]()` method.

This creates a refactoring hazard when a two-line pattern which preserved the blanks in front of the first line is converted to a single-line pattern.

# "····Foo\nBar"
foo =
        Foo
    Bar
# "Foo Bar"
foo =
        Foo Bar

However, I'd argue that if the leading indent of a multiline pattern is indeed important, it's because of the same reason that would make it look bad as a single-line pattern anyways. And so the risk of making a mistake when refactoring is low. Case in point, Foo Bar is not something that I'd choose to write as multiline in the first place nor something that should have its leading whitespace preserved.

@stasm
Copy link
Contributor Author

stasm commented Oct 30, 2018

I've opened a WIP PR in #197, based on @Pike's branch mentioned before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants