bug: ctrl+w (deleteWordBackward) doesn't recognize word boundaries between CJK and ASCII characters #596

@mkusaka

Description

When mixing CJK (Japanese/Chinese/Korean) characters with ASCII text, ctrl+w / deleteWordBackward deletes both together instead of treating them as separate words.

Steps to Reproduce

  1. Type mixed text like 日本語abc or テストtest
  2. Place cursor at the end
  3. Press ctrl+w

Expected Behavior

Only abc (or test) should be deleted, stopping at the CJK-ASCII boundary.

This is the expected behavior per Unicode UAX #29 (Text Segmentation), where ideographic characters are treated as individual word units by default, which creates implicit boundaries between them and adjacent Latin text.

Actual Behavior

The entire string 日本語abc is deleted at once.

Root Cause

Looking at utf8.zig, the findWrapBreaks function only recognizes these as word boundaries:

  • ASCII: spaces, tabs, punctuation (-, /, ., etc.), brackets
  • Unicode: various space characters (NBSP, ideographic space, etc.)

CJK characters and script transitions are not considered word boundaries.

Suggested Fix

Consider implementing UAX #29 compliant word boundary detection, or at minimum:

  1. Treat each CJK character as a word boundary (per UAX #29 default)
  2. Treat script transitions (e.g., Han → Latin, Hiragana → Latin) as word boundaries
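
As a rough illustration of option 2, here is a minimal TypeScript sketch of script-transition boundaries. It is not the opentui implementation and does not mirror findWrapBreaks in utf8.zig; the names `classify`, `wordStartBefore`, and `deleteWordBackward` are hypothetical, and the four character classes are an assumption chosen for illustration. It is also much coarser than full UAX #29 segmentation.

```typescript
// Hypothetical sketch: these names are illustrative, not opentui APIs.
type CharClass = "space" | "cjk" | "word" | "other";

const SPACE = /\s/;
// Assumption: "CJK" here means the Han, Hiragana, Katakana and Hangul scripts.
const CJK = new RegExp(
  "[\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}]",
  "u",
);
// Any other letter, digit, or underscore counts as an ordinary word character.
const WORD = new RegExp("[\\p{L}\\p{N}_]", "u");

function classify(ch: string): CharClass {
  if (SPACE.test(ch)) return "space";
  if (CJK.test(ch)) return "cjk";
  if (WORD.test(ch)) return "word";
  return "other";
}

// Returns the index (in code points) where the word before `cursor` starts.
function wordStartBefore(chars: string[], cursor: number): number {
  let i = cursor;
  // Skip whitespace directly before the cursor.
  while (i > 0 && classify(chars[i - 1]) === "space") i--;
  if (i === 0) return 0;
  // Walk back over one run of the same class; a class change is a boundary.
  const cls = classify(chars[i - 1]);
  while (i > 0 && classify(chars[i - 1]) === cls) i--;
  return i;
}

// Deletes the word before the cursor, assuming the cursor is at the end.
function deleteWordBackward(text: string): string {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  return chars.slice(0, wordStartBefore(chars, chars.length)).join("");
}

console.log(deleteWordBackward("日本語abc"));
console.log(deleteWordBackward("テストtest"));
```

With this sketch, a second ctrl+w would remove the whole CJK run (日本語) at once. Option 1 would instead stop after a single CJK character, for example by returning `i - 1` when the class at the cursor is `"cjk"`.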

Environment

  • OS: macOS
  • Terminal: iTerm2
  • opencode version: 1.1.36 (discovered while using opencode)
  • opentui version: 0.1.75 (@opentui/core, @opentui/solid)
