Skip to content

Conversation

@indutny
Copy link
Member

@indutny indutny commented Nov 4, 2025

src/node_modules.cc needs to be consistent with src/node_file.cc in how it translates the utf8 strings to std::wstring otherwise we might end up in situation where we can read the source code of imported package from disk, but fail to recognize that it is an ESM (or CJS) and cause runtime errors. This type of error is possible on Windows when the path contains unicode characters and "Language for non-Unicode programs" is set to "Chinese (Traditional, Taiwan)".

See: #58768

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Nov 4, 2025
@indutny indutny force-pushed the fix/esm-cp-utf8 branch 2 times, most recently from 13d23c9 to 440a6be Compare November 4, 2025 20:59
@indutny
Copy link
Member Author

indutny commented Nov 4, 2025

cc @joyeecheung

@indutny
Copy link
Member Author

indutny commented Nov 4, 2025

cc @nodejs/cpp-reviewers

`src/node_modules.cc` needs to be consistent with `src/node_file.cc` in
how it translates the utf8 strings to `std::wstring` otherwise we might
end up in situation where we can read the source code of imported
package from disk, but fail to recognize that it is an ESM (or CJS) and
cause runtime errors. This type of error is possible on Windows when the
path contains unicode characters and "Language for non-Unicode programs"
is set to "Chinese (Traditional, Taiwan)".

See: nodejs#58768
Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have test for this?

@indutny
Copy link
Member Author

indutny commented Nov 4, 2025

@RafaelGSS it is a good question, and I'd like to answer yes, but I'm afraid this is complicated to reproduce as it requires changing "Language for non-Unicode programs" to non-english locale so that GetACP() returns some other value. It is pretty straightforward to test manually on Windows, though:

  1. Create a folder named ”炭治郎” (yay, Demon Slayer reference) and cd into it
  2. npm init -y
  3. npm install pino
  4. Run node -p "require('internal/test/binding').internalBinding('modules').getNearestParentPackageJSON(require('path').join(process.cwd(), 'node_modules', 'pino', 'bin.js'))"
  5. Verify that it doesn't print undefined
  6. Change "Language for non-Unicode programs" to "Chinese (Traditional, Taiwan)"
  7. Run node -p ... again and verify that the result is the same

@indutny
Copy link
Member Author

indutny commented Nov 4, 2025

I feel like this should be the safe change, though, since it moves to match src/node_file.cc.

@indutny
Copy link
Member Author

indutny commented Nov 4, 2025

Unrelated to above a backport to 22-x-y would be appreciated! (Although, the patch can't be a plain cherry-pick since there are two conversion callsites in 22-x-y)

@codecov
Copy link

codecov bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.56%. Comparing base (71da1a8) to head (297b2b9).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/node_modules.cc 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #60575      +/-   ##
==========================================
+ Coverage   88.54%   88.56%   +0.01%     
==========================================
  Files         704      704              
  Lines      208072   208092      +20     
  Branches    40076    40084       +8     
==========================================
+ Hits       184241   184293      +52     
+ Misses      15884    15831      -53     
- Partials     7947     7968      +21     
Files with missing lines Coverage Δ
src/compile_cache.cc 80.78% <ø> (ø)
src/node_file.cc 75.60% <100.00%> (ø)
src/util-inl.h 81.57% <100.00%> (+0.37%) ⬆️
src/util.h 89.71% <ø> (ø)
src/node_modules.cc 76.16% <88.88%> (+0.17%) ⬆️

... and 44 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

return nullptr;
if (is_permissions_enabled) {
std::string generic_path;
#ifdef _WIN32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that the amount of ifdefs sprinkled everywhere is a bit cluttering; it looks like all new usages of these are on a std::filesystem::path, so technically they are just repeating what PathToString() does, I think we can just put PathToString to util.h and use that instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think usage of generic_string vs string complicates it a bit, but let me see what I can do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about db29a1d? I tried to unify things as much as I could.


#ifdef _WIN32
std::wstring wide_path = ConvertToWideString(path_value_str, GetACP());
std::wstring wide_path = ConvertToWideString(path_value_str, CP_UTF8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here - I think we can just create StringToPath and let BufferValueToPath use it, and here it can just use StringToPath

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by db29a1d. I also dropped the code_point argument from the util method to make sure we don't use GetACP() by accident.

@joyeecheung
Copy link
Member

FWIW it doesn't look like GHA supports locale setting https://github.com/orgs/community/discussions/68929 and I doubt that we can afford one more Windows setup on Jenkins. I don't have a Windows machine with me right now but I can confirm #60575 (comment) is generally how you reproduce this types of issues on Windows (it's relatively common to see programs misbehave on CJK locale, but crashing is unusual).

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a suggestion

Co-authored-by: Joyee Cheung <[email protected]>
@indutny
Copy link
Member Author

indutny commented Nov 6, 2025

Let me know if anything else is needed before the merge! (It sounds like all LGTMs that this PR got a stale now because of addressing PR feedback so I can't merge it).

cc @joyeecheung

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Nov 6, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Nov 6, 2025
@nodejs-github-bot
Copy link
Collaborator

CI: https://ci.nodejs.org/job/node-test-pull-request/70075/

@indutny
Copy link
Member Author

indutny commented Nov 6, 2025

Looks like CI is green, whew 🥲

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants