perf: faster utf8<->utf16 conversion on Windows #4549

Merged Dec 1, 2024

aras-p (Contributor) commented Dec 1, 2024

Description

OIIO 2.3.13 with PR #3307 changed MultiByteToWideChar/WideCharToMultiByte usage to C++11 <codecvt> functionality, but that has two issues:

  1. it is way slower, primarily due to locale object access (on the Visual C++ STL implementation in VS2022 at least). Since the primary use case of these conversions is on Windows, maybe it is better to use a fast code path.

  2. the whole <codecvt> machinery is deprecated as of C++17 across the board, and will be removed in C++26. I've kept the existing functions in there since otherwise it would have been an API break, but really maybe with OIIO 3.0 they should have been un-exposed. Too late now though :(
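
For illustration, the kind of fast path described above can be sketched as follows. This is a minimal sketch, not the code from this PR: the name `sketch_utf8_to_utf16` is hypothetical, and the non-Windows branch is a simplified stand-in decoder that assumes well-formed UTF-8 (it is there only so the example builds everywhere). Only the `_WIN32` branch reflects the `MultiByteToWideChar` approach discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#    include <windows.h>
#endif

// Hypothetical name for illustration; this is not OIIO's actual function.
std::wstring sketch_utf8_to_utf16(const std::string& utf8)
{
    std::wstring out;
    if (utf8.empty())
        return out;  // MultiByteToWideChar fails on a zero-length input
#ifdef _WIN32
    // Fast path: ask the OS for the required length, then convert.
    const int len = static_cast<int>(utf8.size());
    const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
    if (n > 0) {
        out.resize(static_cast<std::size_t>(n));
        const int written = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len,
                                                &out[0], n);
        assert(written == n);
        (void)written;
    }
#else
    // Stand-in decoder (assumption: input is well-formed UTF-8).
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char c = static_cast<unsigned char>(utf8[i++]);
        unsigned long cp = 0xFFFD;  // replacement char for a bad lead byte
        int extra = 0;
        if (c < 0x80) {
            cp = c;
        } else if ((c >> 5) == 0x06) {
            cp = c & 0x1F;
            extra = 1;
        } else if ((c >> 4) == 0x0E) {
            cp = c & 0x0F;
            extra = 2;
        } else if ((c >> 3) == 0x1E) {
            cp = c & 0x07;
            extra = 3;
        }
        // Fold in the continuation bytes, 6 payload bits each.
        for (; extra > 0 && i < utf8.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        if (cp >= 0x10000) {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
#endif
    return out;
}
```

The two-call pattern (length query with a null output buffer, then the real conversion) avoids guessing a buffer size and involves no locale objects, which is the overhead point 1 attributes to the <codecvt> route.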

Tests

Performance numbers: doing ImageInput::create() on 1138 files that are not images at all (so OIIO in turn tries all the input plugins on them). Ryzen 5950X, VS2022, Windows:

  • utf8_to_utf16 3851ms -> 21ms
  • utf16_to_utf8 1055ms -> 4ms
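
A rough way to collect timings of this kind is sketched below. This is an assumption about how one could measure, not the harness used for the numbers above: `time_conversions_ms` is a hypothetical helper, and the callback stands in for whichever conversion function is under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: runs `fn` once per input and returns elapsed
// wall-clock milliseconds. `checksum` accumulates the callback results so
// the compiler cannot discard the work being timed.
double time_conversions_ms(
    const std::vector<std::string>& inputs,
    const std::function<std::size_t(const std::string&)>& fn,
    std::size_t& checksum)
{
    assert(fn && "callback must be set");
    const auto t0 = std::chrono::steady_clock::now();
    for (const auto& s : inputs)
        checksum += fn(s);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A caller would pass a lambda that performs one conversion and returns, say, the length of the converted string, then compare the totals for the old and new code paths on the same inputs.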

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable. (Check if there is no
    need to update the documentation, for example if this is a bug fix that
    doesn't change the API.)
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.

aras-p (Contributor, Author) commented Dec 1, 2024

The one CI failure fails at cmake setup time with `Could NOT find pystring (missing: pystring_LIBRARY) (found suitable version "1.1.4", minimum required is "1.1.4")` which is... confusing.

lgritz (Collaborator) commented Dec 1, 2024

I have seen that from time to time recently... but not always, since we obviously do have passing tests most of the time. I'm not quite sure what's going on there, but it's obviously unrelated to your patch. I will poke it to make it run the CI again for that job, but even if that doesn't pass, I won't hold this up.

OIIO itself doesn't use pystring; that's being pulled in by the OpenColorIO build.

lgritz (Collaborator) left a comment

LGTM, and thanks for the fix! It's shocking that there isn't a stable, portable, performant utf8<->utf16 conversion that's part of the C++ standard that we can rely on and that doesn't completely change every couple of standard revisions.

I have, frankly, wondered if we should just get rid of all the wchar stuff entirely from OIIO, and put the responsibility on Windows programmers to convert to utf8 before calling OIIO API calls. We added the wchar ones to try to make it convenient for Windows users, but sheesh, C++ sure isn't making it easy on us.
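
If conversion were left to callers as floated above, a Windows program would turn its wide strings into UTF-8 itself before calling a UTF-8-only API. A minimal sketch of that caller-side step follows. The name `sketch_utf16_to_utf8` is hypothetical, and the non-Windows branch is a simplified stand-in encoder included only so the example builds everywhere; it does not reject unpaired surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#    include <windows.h>
#endif

// Hypothetical caller-side helper; not part of OIIO.
std::string sketch_utf16_to_utf8(const std::wstring& wide)
{
    std::string out;
    if (wide.empty())
        return out;
#ifdef _WIN32
    // Query the required byte count, then convert. For CP_UTF8 the last
    // two arguments must be null.
    const int len = static_cast<int>(wide.size());
    const int n = WideCharToMultiByte(CP_UTF8, 0, wide.data(), len, nullptr, 0,
                                      nullptr, nullptr);
    if (n > 0) {
        out.resize(static_cast<std::size_t>(n));
        const int written = WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                                                &out[0], n, nullptr, nullptr);
        assert(written == n);
        (void)written;
    }
#else
    for (std::size_t i = 0; i < wide.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            const unsigned long lo = static_cast<unsigned long>(wide[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
#endif
    return out;
}
```

The resulting `std::string` would then be handed to whichever UTF-8 entry point the caller needs; the trade-off is that every Windows client carries a helper like this instead of the library providing it once.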

lgritz (Collaborator) commented Dec 1, 2024

Yeah, just rerunning the failed test with no other modifications... and it succeeded this time. Something odd is going on with OpenColorIO or pystring itself, maybe? But I'm not sure why it's nondeterministic.

lgritz merged commit 05f5f59 into AcademySoftwareFoundation:main on Dec 1, 2024
28 checks passed
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Dec 1, 2024
…ation#4549)

OIIO 2.3.13 with PR AcademySoftwareFoundation#3307 changed
MultiByteToWideChar/WideCharToMultiByte usage to C++11 <codecvt>
functionality, but that has two issues:

1) it is *way* slower, primarily due to locale object access (on Visual
C++ STL implementation in VS2022 at least). Since primary use case of
these conversions is on Windows, maybe it is better to use a fast code
path.

2) whole of <codecvt> machinery is deprecated with C++17 across the
board, and will be removed in C++26. I've kept the existing functions in
there since otherwise it would have been an API break, but really maybe
with OIIO 3.0 they should have been un-exposed. Too late now though :(

## Tests

Performance numbers: doing ImageInput::create() on 1138 files where they
are not images at all (so OIIO in turn tries all the input plugins on
them). Ryzen 5950X, VS2022, Windows:

- utf8_to_utf16 3851ms -> 21ms
- utf16_to_utf8 1055ms -> 4ms

Signed-off-by: Aras Pranckevicius <[email protected]>
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Dec 9, 2024
…ation#4549)
