
fix(cli): sanitize surrogate characters in _on_text_changed paste fallback#9158

Closed
XiaoXiao0221 wants to merge 2 commits into NousResearch:main from XiaoXiao0221:fix/cli-paste-surrogate-crash


@XiaoXiao0221
Contributor

What does this PR do?

Fix UnicodeEncodeError when pasting text containing lone surrogates (e.g. from Google Docs / Word clipboard) in terminals that do not support Bracketed Paste mode.

handle_paste (BracketedPaste) was already sanitized in PR #8980. The _on_text_changed fallback path that fires when bracketed paste is unavailable was missing the same sanitization, causing:

Exception 'utf-8' codec can't encode characters in position 1828-1829: surrogates not allowed
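Below is a minimal standalone reproduction of this error. It assumes the pasted text reaches `Path.write_text()` with an explicit UTF-8 encoding. The helper name `try_write`, the temp file, and the sample strings are made up for illustration. It also shows that BMP punctuation such as smart quotes and em-dashes encodes fine. Only an unpaired surrogate triggers the failure:

```python
import tempfile
from pathlib import Path
from typing import Optional


def try_write(text: str) -> Optional[str]:
    """Write text to a temp file as UTF-8; return the error message, if any."""
    with tempfile.TemporaryDirectory() as tmp:
        paste_file = Path(tmp) / "paste.txt"  # hypothetical paste file name
        try:
            paste_file.write_text(text, encoding="utf-8")
        except UnicodeEncodeError as exc:
            return str(exc)
    return None


# Smart quotes and an em-dash are ordinary BMP characters: no error.
print(try_write("\u201cquoted\u201d \u2014 text"))  # None
# A lone high surrogate (half of an emoji's UTF-16 pair) cannot be encoded.
print(try_write("emoji half: \ud83d"))  # ...: surrogates not allowed
```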

Related Issue

Follow-up to #8980.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • cli.py: Added _sanitize_surrogates() call before paste_file.write_text() in the _on_text_changed fallback paste handler
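The PR does not show the body of `_sanitize_surrogates()`. The sketch below is one plausible approach, not the project's actual helper. `sanitize_surrogates` and `write_paste_fallback` are hypothetical names. The sanitizer round-trips the text through UTF-16 with the `surrogatepass` error handler. That recombines a surrogate pair that arrived as two separate code points into the real character and turns any unpaired surrogate into U+FFFD. Only then does the fallback write the file as UTF-8:

```python
from pathlib import Path


def sanitize_surrogates(text: str) -> str:
    """Hypothetical stand-in for cli.py's _sanitize_surrogates().

    'surrogatepass' lets surrogate code points be encoded as raw UTF-16 units;
    decoding then rejoins valid high/low pairs and replaces strays with U+FFFD.
    """
    raw = text.encode("utf-16-le", "surrogatepass")
    return raw.decode("utf-16-le", "replace")


def write_paste_fallback(text: str, paste_file: Path) -> None:
    """Sketch of the fallback path: sanitize first, then write as UTF-8."""
    paste_file.write_text(sanitize_surrogates(text), encoding="utf-8")


# ascii() keeps the printed output ASCII-only, whatever the console encoding.
print(ascii(sanitize_surrogates("ok \ud83d\ude00")))  # 'ok \U0001f600'
print(ascii(sanitize_surrogates("bad \ud83d!")))      # 'bad \ufffd!'
```

A simpler regex that replaces every code point in `\ud800`-`\udfff` would also avoid the crash. However, it would turn a split emoji into two replacement characters instead of restoring it.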

How to Test

  1. Use a terminal without Bracketed Paste support (or simulate by pasting via a non-bracketed method)
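  2. Paste text copied from Google Docs or Word containing special characters (e.g. em-dashes, smart quotes, or emoji; of these, only characters outside the Basic Multilingual Plane, such as emoji, are carried as UTF-16 surrogate pairs)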
  3. Verify no UnicodeEncodeError is raised and text is correctly saved to the paste file
  4. Alternatively, copy text containing lone surrogates and paste it into the hermes CLI

Checklist

  • 🐛 Bug fix
  • No unrelated changes mixed in (diff hygiene verified)
  • My commit messages follow Conventional Commits (fix(cli):)
  • Tested on Windows

…tirith binary

- Add Windows platform detection in _detect_target() (pc-windows-msvc)
- Use .zip archive for Windows targets instead of .tar.gz
- Implement zipfile extraction for Windows binary (tirith.exe)
- Fix path resolution bug where src_base ignored nested zip members
- Skip chmod +x on Windows (not supported)
- Add unit test script for cross-platform archive handling

Fixes: WinError 2 file not found when tirith auto-installs on Windows
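The tirith commit's diff is not part of this page. The following is a rough sketch of the Windows handling its message describes. Every name here (`detect_target`, `archive_suffix`, `extract_binary`, `make_executable`) and the non-Windows target triples are assumptions, not tirith's or this repo's actual code. Matching zip members by their final path component is one way to avoid the nested-member path bug the commit mentions:

```python
import platform
import stat
import zipfile
from pathlib import Path, PurePosixPath
from typing import Optional


def detect_target(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Hypothetical target-triple detection covering only common platforms."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Windows reports AMD64/ARM64; normalize to Rust-style arch names.
    arch = {"amd64": "x86_64", "arm64": "aarch64"}.get(machine, machine)
    if system == "Windows":
        return f"{arch}-pc-windows-msvc"
    if system == "Darwin":
        return f"{arch}-apple-darwin"
    return f"{arch}-unknown-linux-gnu"  # assumption for all other systems


def is_windows(target: str) -> bool:
    return target.endswith("-pc-windows-msvc")


def archive_suffix(target: str) -> str:
    """Windows builds ship as .zip; everything else as .tar.gz."""
    return ".zip" if is_windows(target) else ".tar.gz"


def extract_binary(zip_path: Path, dest_dir: Path, name: str = "tirith.exe") -> Path:
    """Extract `name` from a zip even if it is nested under a folder."""
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            # Zip member names use "/" separators, so compare the last component.
            if PurePosixPath(member).name == name:
                out = dest_dir / name
                out.write_bytes(zf.read(member))
                return out
    raise FileNotFoundError(f"{name} not found in {zip_path}")


def make_executable(path: Path, target: str) -> None:
    """Add POSIX execute bits; skipped on Windows, where they have no effect."""
    if is_windows(target):
        return
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```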
…lback

The handle_paste function (BracketedPaste mode) was already sanitized
in PR NousResearch#8980. The _on_text_changed fallback path that fires when
bracketed paste is not available also writes pasted text to a file
without sanitizing, causing UnicodeEncodeError when text contains lone
surrogates from Word/Google Docs clipboard paste.

Add _sanitize_surrogates() call before write_text() in the fallback
handler, matching the same protection already applied to the primary
handle_paste path.
@XiaoXiao0221
Contributor Author

Closing this in favor of #9015. #9015 now carries the clean fallback-path fix on top of current main, with regression coverage and without the unrelated tirith changes.
