perf: faster utf8<->utf16 conversion on Windows
OIIO 2.3.13 with PR #3307 changed MultiByteToWideChar/WideCharToMultiByte
usage to C++11 <codecvt> functionality, but that has two issues:

1) It is *way* slower, primarily due to locale object access (at least
in the Visual C++ STL implementation in VS2022). Since the primary use
case of these conversions is on Windows, maybe it is better to use a
fast code path there.

2) The whole <codecvt> machinery is deprecated across the board as of
C++17, and will be removed in C++26. I've kept the existing functions
since removing them would have been an API break, but really, maybe
OIIO should never have exposed them. Too late now, though :(
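Because the non-Windows code path still relies on the deprecated <codecvt> facilities mentioned in point 2, one possible replacement is a hand-written decoder. What follows is a minimal sketch under stated assumptions, not OIIO code: `utf8_to_utf16_portable` is a hypothetical name, and replacing each byte that cannot start a valid sequence with U+FFFD is an assumed error policy (the existing functions instead return an empty string when the <codecvt> conversion throws).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of OIIO: decodes UTF-8 into UTF-16 without
// the deprecated <codecvt> facilities. Assumed error policy: each byte that
// cannot begin a valid sequence is replaced with U+FFFD.
std::u16string utf8_to_utf16_portable(std::string_view in)
{
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static constexpr char32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };

    std::u16string out;
    out.reserve(in.size());  // UTF-16 never needs more units than UTF-8 bytes
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80) {
            cp = c; len = 1;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F; len = 2;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F; len = 3;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07; len = 4;
        } else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(u'\uFFFD');
            ++i;
            continue;
        }

        // All continuation bytes must be present and of the form 10xxxxxx.
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        if (ok && (cp < min_for_len[len] || cp > 0x10FFFF
                   || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (!ok) {
            out.push_back(u'\uFFFD');
            ++i;  // resynchronize on the next byte
            continue;
        }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split a supplementary-plane code point into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For example, the four-byte sequence F0 9F 98 80 (U+1F600) decodes to the surrogate pair D83D DE00, while the overlong pair C0 80 yields two U+FFFD units.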

Performance numbers: doing ImageInput::create() on 1138 files that are
not images at all (so OIIO in turn tries all the input plugins on
them). Ryzen 5950X, VS2022, Windows:

- utf8_to_utf16 3851ms -> 21ms
- utf16_to_utf8 1055ms -> 4ms

Signed-off-by: Aras Pranckevicius <[email protected]>
aras-p committed Dec 1, 2024
1 parent 9ee71c4 commit 44fb8b2
Showing 1 changed file (src/libutil/strutil.cpp) with 36 additions and 7 deletions.
@@ -29,6 +29,9 @@ OIIO_PRAGMA_WARNING_POP
#if defined(__APPLE__) || defined(__FreeBSD__)
# include <xlocale.h>
#endif
#ifdef _WIN32
# include <windows.h>
#endif

#include <OpenImageIO/dassert.h>
#include <OpenImageIO/string_view.h>
@@ -961,6 +964,17 @@ Strutil::replace(string_view str, string_view pattern, string_view replacement,
std::wstring
Strutil::utf8_to_utf16wstring(string_view str) noexcept
{
#ifdef _WIN32
    // UTF8<->UTF16 conversions are primarily needed on Windows, so use the
    // fastest option (C++11 <codecvt> is many times slower due to locale
    // access overhead, and is deprecated starting with C++17).
    std::wstring result;
    result.resize(
        MultiByteToWideChar(CP_UTF8, 0, str.data(), str.length(), NULL, 0));
    MultiByteToWideChar(CP_UTF8, 0, str.data(), str.length(), result.data(),
                        (int)result.size());
    return result;
#else
    try {
        OIIO_PRAGMA_WARNING_PUSH
        OIIO_CLANG_PRAGMA(GCC diagnostic ignored "-Wdeprecated-declarations")
@@ -970,13 +984,25 @@ Strutil::utf8_to_utf16wstring(string_view str) noexcept
    } catch (const std::exception&) {
        return std::wstring();
    }
#endif
}



std::string
Strutil::utf16_to_utf8(const std::wstring& str) noexcept
{
#ifdef _WIN32
    // UTF8<->UTF16 conversions are primarily needed on Windows, so use the
    // fastest option (C++11 <codecvt> is many times slower due to locale
    // access overhead, and is deprecated starting with C++17).
    std::string result;
    result.resize(WideCharToMultiByte(CP_UTF8, 0, str.data(), str.length(),
                                      NULL, 0, NULL, NULL));
    WideCharToMultiByte(CP_UTF8, 0, str.data(), str.length(), &result[0],
                        (int)result.size(), NULL, NULL);
    return result;
#else
    try {
        OIIO_PRAGMA_WARNING_PUSH
        OIIO_CLANG_PRAGMA(GCC diagnostic ignored "-Wdeprecated-declarations")
@@ -986,29 +1012,32 @@ Strutil::utf16_to_utf8(const std::wstring& str) noexcept
    } catch (const std::exception&) {
        return std::string();
    }
#endif
}



std::string
Strutil::utf16_to_utf8(const std::u16string& str) noexcept
{
#ifdef _WIN32
    std::string result;
    result.resize(WideCharToMultiByte(CP_UTF8, 0, (const WCHAR*)str.data(),
                                      str.length(), NULL, 0, NULL, NULL));
    WideCharToMultiByte(CP_UTF8, 0, (const WCHAR*)str.data(), str.length(),
                        &result[0], (int)result.size(), NULL, NULL);
    return result;
#else
    try {
        OIIO_PRAGMA_WARNING_PUSH
        OIIO_CLANG_PRAGMA(GCC diagnostic ignored "-Wdeprecated-declarations")
        // There is a bug in MSVS 2017 causing an unresolved symbol if char16_t is used (see https://stackoverflow.com/a/35103224)
#if defined _MSC_VER && _MSC_VER >= 1900 && _MSC_VER < 1930
        std::wstring_convert<std::codecvt_utf8_utf16<int16_t>, int16_t> convert;
        auto p = reinterpret_cast<const int16_t*>(str.data());
        return convert.to_bytes(p, p + str.size());
#else
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.to_bytes(str);
#endif
        OIIO_PRAGMA_WARNING_POP
    } catch (const std::exception&) {
        return std::string();
    }
#endif
}
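The Windows branches above call WideCharToMultiByte twice: once with a null output buffer and zero size to learn the required length, then again to fill the resized string. For the non-Windows branch, which still uses the deprecated std::wstring_convert, a hand-written encoder could instead reserve a worst-case bound (3 bytes per UTF-16 code unit) and append in a single pass. The sketch below is an illustration under stated assumptions, not OIIO code: `utf16_to_utf8_portable` is a hypothetical helper, and substituting U+FFFD for unpaired surrogates is an assumed error policy.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of OIIO: encodes UTF-16 as UTF-8 without
// <codecvt>. Assumed error policy: unpaired surrogates become U+FFFD.
std::string utf16_to_utf8_portable(std::u16string_view in)
{
    std::string out;
    out.reserve(in.size() * 3);  // worst case: 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired high or low surrogate
        }

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Reserving the upper bound trades a little memory for avoiding the separate size-query pass; whether that is faster than the Win32 calls on Windows has not been measured here.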


