Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🪲 Fix keyword translation when keywords are substrings of each other #6121

Merged
merged 6 commits into from
Jan 22, 2025

Conversation

rix0rrr
Copy link
Collaborator

@rix0rrr rix0rrr commented Jan 22, 2025

In #6118, we discovered that when the source language keywword is a substring of the target language keyword, we make a mistake in the string replacement.

This happens because we do the following:

  • First: parse the entire program
  • Then: for every line containing a keyword, do a separate search and replace on the line. This makes us match the Catalan keyword a where the source line actually contained the English keyword at.

In this PR, I'm changing the logic a little: as result of parsing, we already know the location in the line where the keyword occurs, so we can just immediately replace that part of the line with the new keyword.

Two complications that make this slightly less straightforward than it sounds (which are probably the reason why we are doing the additional search&replace in the first place):

  • Our grammar rules match whitespace as part of the keyword token, but the whitespace needs to remain in place.
    • Solution: find the whitespace in the matched token and only substitute the rest of the token.
  • String substitutions may change the length of the string, and therefore invalidate all the indexes of the parse tree that follow it.
    • Solution: first collect a list of all subsitutions, then apply them in back-to-front order. That way, all yet-to-be-processed indexes are never disturbed by a substitution.

Fixes #6118.

In #6118, we discovered that when the source language keywword is a
substring of the target language keyword, we make a mistake in the
string replacement.

This happens because we do the following:

- First: parse the entire program
- Then: for every line containing a keyword, do a separate search and
  replace on the line.

In this PR, I'm changing the logic a little: as part of parsing, we
already know the location in the line where the keyword occurs, so we
can just immediately replace that part of the line with the new keyword.

Two complications that make this slightly less straightforward than it
sounds:

- Our grammar rules match whitespace as part of the keyword token, but
  the whitespace needs to remain in place.
    - Solution: find the whitespace in the matched token and only
      substitute the rest of the token.
- String substitutions may change the length of the string, and
  therefore invalidate all the indexes of the parse tree that follow
  it.
    - Solution: first collect a list of all subsitutions, then apply
      them in back-to-front order. That way, all yet-to-be-processed
      indexes are never disturbed by a substitution.

Fixes #6118.
Copy link
Contributor

mergify bot commented Jan 22, 2025

Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit f8003e0 into main Jan 22, 2025
11 checks passed
@mergify mergify bot deleted the keyword-translation-catalan branch January 22, 2025 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

🪲 Keyword translation fails for Catalan: "at random" -> "att random"
2 participants