Fix/haskell string escapes #54

Open
theeldermillenial wants to merge 14 commits into OpShin:master from theeldermillenial:fix/haskell-string-escapes

@theeldermillenial

Changes

We replace ast.literal_eval() with a custom parser, implement PlutusV3 trailing-byte rejection to conform to Haskell validation, and resolve a number of TODOs with minor fixes.

Rationale: ast.literal_eval() parses Python string literals, so it doesn't support Haskell's \DDD (decimal) and \oOOO (octal) escape sequences. Since UPLC string literals follow Haskell conventions (not Python's), a purpose-built decoder is more correct. The custom _decode_haskell_string() handles all standard escapes (\n, \t, \xHH, etc.) plus the Haskell-specific ones. No functionality is lost: ast.literal_eval() features like \N{UNICODE NAME} are Python-only and don't appear in UPLC.

This fixes Haskell compatibility problems in string parsing and script deserialization, plus a few minor issues.

  1. Haskell string escape sequences

The UPLC parser now handles Haskell's \DDD (decimal) and \oOOO (octal) string escape sequences, which Python doesn't natively support. ast.literal_eval() is replaced with a custom _decode_haskell_string() that handles all Haskell escape formats.

Before: (con string "\83\o143") → "\83\o143" (literals kept)
After: (con string "\83\o143") → "Sc" (correctly expanded)

This fixes the string-04 Plutus conformance test case.
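To illustrate how such a decoder can work, here is a minimal sketch written independently of the PR (it is not the PR's _decode_haskell_string()). It assumes the surrounding quotes have already been stripped and covers single-character escapes, \DDD, \oOOO, variable-length \x hex escapes, and Haskell's \& empty escape; ASCII mnemonics such as \NUL, control escapes like \^A, and string gaps are deliberately left out. The name decode_haskell_string is hypothetical.

```python
# Single-character escapes accepted in Haskell string literals.
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", "\\": "\\", '"': '"', "'": "'",
}
DEC_DIGITS = "0123456789"
OCT_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def _take_digits(s: str, i: int, digits: str) -> int:
    """Return the index just past the longest run of `digits` starting at s[i]."""
    j = i
    while j < len(s) and s[j] in digits:
        j += 1
    return j


def decode_haskell_string(body: str) -> str:
    """Decode the body of a Haskell-style string literal (quotes already stripped)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "&":
            # \& produces nothing; it only terminates a preceding numeric escape
            i += 2
        elif esc in DEC_DIGITS:
            # \DDD: decimal code point, consuming every following digit
            j = _take_digits(body, i + 1, DEC_DIGITS)
            out.append(chr(int(body[i + 1 : j], 10)))
            i = j
        elif esc in ("o", "x"):
            # \oOOO (octal) or \xHH... (hex), again consuming every valid digit
            digits, base = (OCT_DIGITS, 8) if esc == "o" else (HEX_DIGITS, 16)
            j = _take_digits(body, i + 2, digits)
            if j == i + 2:
                raise ValueError(f"\\{esc} escape without digits")
            out.append(chr(int(body[i + 2 : j], base)))
            i = j
        else:
            raise ValueError(f"unsupported escape \\{esc}")
    return "".join(out)


print(decode_haskell_string(r"\83\o143"))  # Sc
```

Because numeric escapes consume the longest run of valid digits, \& is what lets a literal digit follow one: r"\x41\&1" decodes to "A1", whereas r"\x411" would be a single code point.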

  2. Strict mode for PlutusV3 trailing bytes rejection

unflatten() now accepts strict=True, which rejects programs with trailing data after the flat encoding. The Conway-era PlutusV3 deserializer requires strict mode: any extra bytes after the flat-encoded program must cause deserialization failure. PlutusV1/V2 remain lenient (default strict=False).

```python
# V3 strict: rejects trailing bytes
program = unflatten(script_cbor, strict=True)

# V1/V2 lenient: ignores trailing bytes (default)
program = unflatten(script_cbor)
```

Added has_trailing_data() to UplcDeserializer and a finalize() call in unflatten() to check for remaining bits.
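As a rough illustration of such a trailing-data check, the sketch below works over a string of bits and assumes flat's padding rule: after the program come zero or more 0 bits followed by a single 1 bit, ending on a byte boundary. BitReader and check_program_end are hypothetical stand-ins, not the UplcDeserializer API.

```python
class BitReader:
    """Hypothetical reader over a string of '0'/'1' characters."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read_bit(self) -> int:
        if self.pos >= len(self.bits):
            raise ValueError("unexpected end of input")
        bit = int(self.bits[self.pos])
        self.pos += 1
        return bit

    def finalize(self) -> None:
        """Consume flat padding: zero or more 0 bits, then a single 1 bit."""
        while self.read_bit() == 0:
            pass
        if self.pos % 8 != 0:
            raise ValueError("padding does not end on a byte boundary")

    def has_trailing_data(self) -> bool:
        return self.pos < len(self.bits)


def check_program_end(reader: BitReader, strict: bool) -> None:
    """Call after the program itself has been decoded from `reader`."""
    if not strict:
        return  # lenient (V1/V2-style): whatever follows the program is ignored
    reader.finalize()
    if reader.has_trailing_data():
        raise ValueError("trailing data after flat-encoded program")
```

For example, a 3-bit program followed by the padding "00001" fills exactly one byte and passes in strict mode; appending another byte makes strict mode raise while lenient mode still accepts it.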

  3. Minor fixes
  • SECP256k1 length checks: verify_ecdsa_secp256k1 validates pubkey (33 bytes), signature (64 bytes), and message (32 bytes); verify_schnorr_secp256k1 validates pubkey (32 bytes) and signature (64 bytes). This matches Haskell's size validation.
  • Zero-cost builtin error: budget_cost_of_op_on_model() now raises RuntimeError for builtins not in the cost model instead of silently returning Budget(0, 0). This prevents free evaluation of unknown builtins.
  • File extension bug: load_network_config() checked file.suffix == "json", but Path.suffix returns ".json" (with the leading dot), so the check never matched. It now compares against ".json".
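The last two fixes can be sketched together. Budget, is_json_config, and budget_for_builtin are hypothetical stand-ins; a real cost model derives costs from argument sizes rather than storing one fixed Budget per builtin, so this only shows the "fail loudly on a missing entry" behaviour and the suffix comparison.

```python
from pathlib import Path
from typing import Dict, NamedTuple


class Budget(NamedTuple):
    """Hypothetical stand-in for an execution budget."""
    cpu: int
    memory: int


def is_json_config(file: Path) -> bool:
    # Path.suffix includes the leading dot, so compare against ".json", not "json"
    return file.suffix == ".json"


def budget_for_builtin(cost_model: Dict[str, Budget], builtin: str) -> Budget:
    # Raise instead of defaulting to Budget(0, 0), so an unknown builtin
    # can never be evaluated for free.
    try:
        return cost_model[builtin]
    except KeyError:
        raise RuntimeError(f"no cost model entry for builtin {builtin!r}") from None
```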

Test results

  • 3,997 acceptance tests pass (1 pre-existing failure: missing libsecp256k1)
  • 3 new tests for string escape handling
  • All 1,246 string-specific acceptance tests pass

theeldermillenial and others added 12 commits March 22, 2026 09:59
Replaced python_ast.literal_eval() with custom _decode_haskell_string()
that handles \DDD (decimal), \oOOO (octal), \xHH (hex), \uHHHH,
\UHHHHHHHH, and standard single-char escapes. 3997 acceptance tests
pass (1 pre-existing failure: missing libsecp256k1).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unflatten() now accepts strict=True which rejects programs with trailing
data after the flat encoding. PlutusV3 requires strict deserialization
(Conway-era tightening). PlutusV1/V2 remain lenient (default).

Added has_trailing_data() to UplcDeserializer and finalize() call in
unflatten() to check for remaining bits after read_program().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… bug

- ECDSA: validate pubkey 33 bytes, sig 64 bytes, msg 32 bytes
- Schnorr: validate pubkey 32 bytes, sig 64 bytes
- machine.py: raise RuntimeError for unknown builtins instead of Budget(0,0)
- cost_model.py: fix file.suffix == "json" → ".json"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pip install uplc[crypto] installs pysecp256k1 and pyblst for full
cryptographic builtin support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plutus conformance suite uses 'boolean' (alias for 'bool') and 'array'
(alias for 'list'). Added to all three constanttype productions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…as int

- (con unit 0) now works (value ignored for unit type)
- (con bool 0) now works (0=False, nonzero=True)
- (con integer <bytes>) now works (int.from_bytes conversion)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PlutusVersionEnforcer was called during parse(), rejecting
case/constr terms in program version 1.0.0. The version restriction
belongs at the flat serialization level, not the textual parser.
The Haskell evaluator accepts case/constr in any version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
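As an illustration of the kind of check PlutusVersionEnforcer performs, the sketch below walks a toy term tree and rejects case/constr nodes when the program version is below 1.1.0, the UPLC version that introduced them. Term, Case, Constr, and enforce_version are hypothetical and much simpler than uplc's actual AST and visitor.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Term:
    """Toy AST node (hypothetical); real UPLC terms carry more structure."""
    children: List["Term"] = field(default_factory=list)


class Case(Term):
    pass


class Constr(Term):
    pass


# case/constr (sums of products) require UPLC program version 1.1.0
SOP_MIN_VERSION: Tuple[int, int, int] = (1, 1, 0)


def enforce_version(version: Tuple[int, int, int], term: Term) -> None:
    """Raise if the term tree uses constructs unavailable in `version`."""
    if version < SOP_MIN_VERSION and isinstance(term, (Case, Constr)):
        raise ValueError(
            f"{type(term).__name__} requires program version >= 1.1.0, got {version}"
        )
    for child in term.children:
        enforce_version(version, child)
```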
FrameCases now maps built-in types to constructor tags for case matching:
- BuiltinBool: False→tag 0, True→tag 1
- BuiltinUnit: tag 0
- BuiltinInteger: tag N
- BuiltinPair: tag 0 with [left, right] fields
- BuiltinList: empty→tag 0, non-empty→tag 1 with [head, tail]

Fixes 10 constant-case conformance test failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
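The tag mapping listed in this commit message can be sketched as a small dispatch function. Plain Python values stand in for the builtin constants (bool, None for unit, a 2-tuple for a pair, a list for a list), and builtin_case_tag and select_branch are hypothetical names. Integers are left out because a later commit in this PR removes case-on-integer.

```python
from typing import Any, Callable, List, Sequence, Tuple


def builtin_case_tag(value: Any) -> Tuple[int, List[Any]]:
    """Return (constructor tag, fields) for a builtin value, per the listed mapping."""
    if isinstance(value, bool):
        return (1 if value else 0), []  # False -> tag 0, True -> tag 1
    if value is None:
        return 0, []  # unit -> tag 0
    if isinstance(value, tuple) and len(value) == 2:
        return 0, [value[0], value[1]]  # pair -> tag 0 with [left, right]
    if isinstance(value, list):
        if not value:
            return 0, []  # empty list -> tag 0
        return 1, [value[0], value[1:]]  # non-empty -> tag 1 with [head, tail]
    raise TypeError(f"case is not supported on {type(value).__name__}")


def select_branch(value: Any, branches: Sequence[Callable[..., Any]]) -> Any:
    """Pick the branch for the value's tag and apply it to the fields."""
    tag, fields = builtin_case_tag(value)
    if tag >= len(branches):
        raise IndexError(f"no branch for tag {tag}")
    return branches[tag](*fields)
```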
The strict length checks (pubkey 33, sig 64, msg 32 for ECDSA; pubkey
32, sig 64 for Schnorr) were too restrictive. The Haskell Plutus spec
uses varying encodings and the conformance tests pass different sizes.
Let pysecp256k1 validate the inputs instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@nielstron nielstron left a comment

Thanks for your submission! Please add test cases that cover all changed/added functionality (for example the array and boolean builtins). Please also see the individual comments.

```python
try:
    tks = l.lex(s)
    program = p.parse(tks)
    PlutusVersionEnforcer().visit(program)
```
Contributor

I think removing this will incorrectly lead to accepting invalid programs when specifying the Plutus version. This would lead to incorrect behavior when testing, e.g., Plutus V1 transactions.

Author

You're right. I've restored PlutusVersionEnforcer in parse() and the UnsupportedTerm handler. The version enforcement at parse time is important for testing versioned programs. I've also added strict mode support to unflatten() as a separate enforcement point for flat-decoded programs (needed for PlutusV3 Conway-era strictness), but that's additive, not a replacement.

```python
        f"Parsing failed, invalid production: {e.message}",
        (filename, e.source_pos.lineno, e.source_pos.colno, source),
    ) from None
except UnsupportedTerm as e:
```
Contributor

Is this never invoked? I think there should be an error rather than returning a potentially invalid program.

Author

Restored: the UnsupportedTerm handler is back. It was incorrectly removed as part of a broader refactor. The error path is needed when PlutusVersionEnforcer rejects terms during parsing.

@nielstron
Contributor

And please make sure to format the files using the formatter specified in the pre-commit config.

Addresses review feedback from nielstron on PR OpShin#54:

1. Restore PlutusVersionEnforcer in parse() — version enforcement at
   the textual parser level is important for testing PlutusV1/V2 scripts
2. Restore UnsupportedTerm exception handler in parse()
3. Remove boolean keyword alias (Haskell only accepts 'bool', not 'boolean')
4. Revert type coercions (int→bool, bytes→int, permissive unit) —
   Haskell uses strict type-directed parsing per PlutusCore.Parser.Builtin
5. Remove case-on-integer (Integer is not a SOP type per Plutus spec;
   conformance test case-5 expects evaluation failure)
6. Add 13 new tests covering: array keyword, strict mode, case on
   bool/unit/list, zero-cost builtin error, cost_model fix, Schnorr msg
7. Format all files with black pre-commit formatter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
python-secp256k1-cardano (C bindings) is already a hard dependency.
The [crypto] extras had pysecp256k1>=0.14.0 (pure Python, different
package from PyPI) which conflicts — installing both corrupts the
pysecp256k1 namespace. Also removed duplicate pyblst from [crypto]
since it's already a hard dep.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants