Skip to content

Conversation

@gaborbernat
Copy link
Contributor

@gaborbernat gaborbernat commented Nov 13, 2025

Universal Ctags crashed with assertion failure in vStringPutImpl() when encountering files with UTF-16 encoding. The assertion c >= 0 && c <= 0xff failed because ctags expected all characters to fit within single byte range, but UTF-16 files contain multi-byte sequences that violate this assumption.

This fix adds:

  • Detection of UTF-16 BOM (both LE and BE) in file reading
  • Automatic conversion from UTF-16 to UTF-8 using iconv when UTF-16 is detected
  • Force memory stream processing for UTF-16 files to enable conversion
  • Test cases for both UTF-16 LE and BE files

Resolves issue #4342

Signed-off-by: Bernát Gábor [email protected]

…n failures

Universal Ctags crashed with assertion failure in vStringPutImpl() when
encountering files with UTF-16 encoding. The assertion `c >= 0 && c <= 0xff`
failed because ctags expected all characters to fit within single byte range,
but UTF-16 files contain multi-byte sequences that violate this assumption.

This fix adds:
- Detection of UTF-16 BOM (both LE and BE) in file reading
- Automatic conversion from UTF-16 to UTF-8 using iconv when UTF-16 is detected
- Force memory stream processing for UTF-16 files to enable conversion
- Test cases for both UTF-16 LE and BE files

Resolves issue universal-ctags#4342

Signed-off-by: Bernát Gábor <[email protected]>
@gaborbernat gaborbernat force-pushed the fix-utf16-encoding-crash branch from 55764aa to c041872 Compare November 13, 2025 08:18
@codecov
Copy link

codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.88%. Comparing base (d48558f) to head (fa7d3e7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4347   +/-   ##
=======================================
  Coverage   85.87%   85.88%           
=======================================
  Files         252      252           
  Lines       62597    62631   +34     
=======================================
+ Hits        53755    53789   +34     
  Misses       8842     8842           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

gaborbernat and others added 3 commits November 13, 2025 06:45
Enables test execution for existing UTF-16 test files by adding the
required args.ctags configuration file. This ensures the UTF-16 LE and
UTF-16 BE files are processed during test runs, improving code coverage
for the UTF-16 to UTF-8 conversion functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Adds test for UTF-16 conversion failure path using malformed UTF-16 data
with invalid surrogate sequences. This triggers the iconv() failure path
and tests the fallback mechanism that preserves original data when
UTF-16 to UTF-8 conversion fails.

This ensures 100% coverage of the UTF-16 conversion error handling code
including the eFree(converted_data) cleanup logic.

Signed-off-by: Bernát Gábor <[email protected]>
Adds specific test for UTF-16 Big Endian BOM detection (FE FF) to ensure
complete coverage of line 899: (bom[0] == 0xFE && bom[1] == 0xFF).

This test completes 100% coverage of all UTF-16 BOM detection paths
including both LE (FF FE) and BE (FE FF) byte order markers.

Signed-off-by: Bernát Gábor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant