Skip to content

All backends: platform encoding errors bypass transpiled UnicodeDecodeError handling #334

@ldayton

Description

@ldayton

Summary

Python code that handles invalid UTF-8 raises UnicodeDecodeError, which callers can catch. Tongues transpiles the explicit raise UnicodeDecodeError(...) calls correctly, but platform string/encoding operations in the target language throw different native exceptions that bypass the transpiled error handling.

Reproduction

Python:

try:
    result = parse(invalid_utf8_input)
except (ParseError, UnicodeDecodeError):
    # handled

In JS, the equivalent code catches ParseError and the transpiled UnicodeDecodeError class, but Node's TextDecoder throws a native TypeError: The encoded data was not valid for encoding utf-8 which isn't an instance of the transpiled UnicodeDecodeError.

Impact

7 test failures across JS, Perl, and Ruby backends in the Parable transpilation. All involve test inputs containing intentionally invalid UTF-8 byte sequences (from bash's multibyte character tests).

Possible fixes

  1. Wrap platform encoding operations: when transpiling bytes.decode("utf-8") or equivalent, emit a try/catch that converts the native encoding error to the transpiled UnicodeDecodeError
  2. Normalize in the catch site: when transpiling except UnicodeDecodeError, also catch the platform's native encoding error type (JS TypeError, Ruby Encoding::InvalidByteSequenceError, Perl Encode::...)

Discovered via ldayton/Parable#413.

Metadata

Metadata

Assignees

No one assigned

    Labels

    backendbugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions