bytes: replacement bytes implementation for libc++18 #23072

dotnwat · 2024-08-27T04:27:51Z

libc++ >= v18 has deprecated (to be removed in v19) char_traits for T other than char (and some other types, like wchar_t). our bytes implementation uses T=uint8_t, and because seastar::sstring interoperates with std::string/std::string_view, we encounter the deprecation.

this PR introduces a new bytes implementation that wraps a seastar::sstring<char>, and casts back and forth between pointers as needed at the interface level to provide the illusion of uint8_t storage.
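To make the wrapping idea concrete, here is a minimal sketch. It is not the code from this PR: `bytes_sketch` and `cast_up` are hypothetical names (the diff under review does show a `cast_down` helper going the other way), and `std::string` stands in for `seastar::sstring` so the example compiles without Seastar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical wrapper: storage is a char string, the interface speaks uint8_t.
class bytes_sketch {
public:
    using value_type = std::uint8_t;

    explicit bytes_sketch(std::size_t n) : _data(n, '\0') {}

    std::size_t size() const { return _data.size(); }

    // Pointers are converted at the interface boundary only.
    value_type* data() { return cast_up(&_data[0]); }
    const value_type* data() const { return cast_up(_data.data()); }

    value_type& operator[](std::size_t i) {
        assert(i < size());
        return data()[i];
    }
    const value_type& operator[](std::size_t i) const {
        assert(i < size());
        return data()[i];
    }

private:
    static value_type* cast_up(char* p) {
        return reinterpret_cast<value_type*>(p);
    }
    static const value_type* cast_up(const char* p) {
        return reinterpret_cast<const value_type*>(p);
    }

    std::string _data; // stand-in for seastar::sstring
};
```

A caller would write `bytes_sketch b(4); b[0] = 200;` and read the value back as a `uint8_t`, never seeing the `char` storage underneath.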

after the conversion to sstring, we recognize that we now control the bytes interface, and use this to reduce the scope by, for example, removing the char* converting constructor, among a couple other interface clean-ups.

Backports Required

Release Notes

none

src/v/bytes/bytes.h

dotnwat · 2024-08-27T19:47:12Z

~~I will probably switch this over to use abseil inlinedvector after examining it i think we can achieve a similar uninitialized allocation optimization.~~

vbotbuildovich · 2024-08-28T02:58:36Z

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53648#01919687-a202-4ac3-8e9b-332abfcecaef

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54467#0191ecb4-4b10-4abb-95f9-108199563778

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54467#0191edb6-6c7f-4622-ae75-732df0996cf2

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54499#0191f739-8cf9-417c-8997-287c31d17996

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54905#01921b57-4157-4d81-9700-0e4b9b9330bb

dotnwat · 2024-09-13T17:27:00Z

src/v/security/jwt.h

@@ -537,7 +539,10 @@ class verifier {
        auto second_dot = jose_enc[0].length() + 1 + jose_enc[1].length();
        auto msg = sv.substr(0, second_dot);
        if (!verifier->second.verify(
-              detail::char_view_cast<bytes_view::value_type>(msg), signature)) {


@BenPope @michael-redpanda bytes_view is no longer a basic_string_view

rockwotj

nice

travisdowns · 2024-09-14T02:26:46Z

src/v/bytes/bytes.h

+
+    static const char* cast_down(const uint8_t* p) {
+        // NOLINTNEXTLINE
+        return reinterpret_cast<const char*>(p);


I am pretty sure this general approach is UB: i.e., accessing an array of char (which is what ss::string contains) through a uint8_t * is UB, since they are different types. Except in very limited cases, you can't convert a pointer of one type to a pointer to an unrelated type and then access through it (and there are even fewer cases when the pointers are to arrays).

That said, it's probably the type of UB that maybe works in practice?

Interesting, thanks for the UB call out. TBH I thought that as long as the string is treated as a bag of bytes it was ok. I'll do some investigation. It would be nice if everything is above board!

So often in serialization/deserialization, though, we have a bag of bytes (e.g. char*) with a particular encoding which we can use reinterpret_cast to access. Is it that reinterpret_cast is always UB?

> So often in serialization/deserialization, though, we have a bag of bytes (e.g. char*) with a particular encoding which we can use reinterpret_cast to access. Is it that reinterpret_cast is always UB?

Not exactly: it's accessing an object of type T through a pointer to type U that is UB; simply doing the cast itself isn't UB. Of course, getting a U * which actually points to a T may require something like a reinterpret_cast (though there are other ways too: C-style cast, 2x static_cast through void *, memcpy etc).

These are the so-called "strict aliasing" rules.

> we have a bag of bytes

There is an exception for char, but it kind of only works one way. For any type, you can inspect and write its bytes using char *. So the following (access int through a char pointer) is valid:

    int some_int = 5;
    char * as_bytes = reinterpret_cast<char *>(&some_int);
    printf("byte 2 is %d", as_bytes[2]);

but the reverse (access char through int pointer) is not:

    char some_chars[] = {1, 2, 3, 4};
    int * as_int = reinterpret_cast<int *>(&some_chars);
    printf("four bytes as int: %d", *as_int);

In this case we are at least sometimes doing the "reverse" (disallowed) case because ss::string puts chars into the array and then we access them as uint8_t.
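For the int case there is a well-defined alternative to the cast: copy the bytes into an object of the target type with `std::memcpy`. A small sketch follows; `load_u32` and `store_u32` are hypothetical helper names, not something from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a 32-bit value out of a char buffer without viewing the buffer as int.
std::uint32_t load_u32(const char* buf, std::size_t len) {
    assert(len >= sizeof(std::uint32_t));
    std::uint32_t v;
    std::memcpy(&v, buf, sizeof(v)); // byte-wise copy, no aliasing involved
    return v;
}

// Write a 32-bit value into a char buffer the same way.
void store_u32(char* buf, std::size_t len, std::uint32_t v) {
    assert(len >= sizeof(v));
    std::memcpy(buf, &v, sizeof(v));
}
```

Compilers typically turn a fixed-size `memcpy` like this into a plain load or store, so the defined version usually costs nothing extra.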

However, it seems quite unlikely this UB will bite us in practice:

The aliasing exception above for char applies to all "character types", which includes unsigned char too. uint8_t can in principle be a different type from char/unsigned char (i.e., not a character type) but in practice it is unsigned char, so in fact the aliasing exception applies. It's still "weird" that ss::string is treating it as char and the wrapper as unsigned char, but this is in the realm of unspecified behavior (it depends on the representation of char and uchar, but we know they are the same), not undefined behavior.

Even though the aliasing exception is "one way" per above, compilers have a pretty hard time applying that aspect, and so IME as long as one of the types is a char type, the compiler is not going to do tricky aliasing-related optimizations even when you do the "reverse" (disallowed) case. I couldn't come up with any example where it does, anyway.

Godbolt:

https://godbolt.org/z/jYjPq4rPo

add_int shows that strict aliasing is used by the optimizer: i0 + ints[0] is effectively collapsed to 2 * ints[0] (i.e., ints is read only once) even though there is an intervening write to the shorts array: the compiler knows shorts can't alias ints because they are different types. add_char shows the opposite: the optimization is not applied because at least one side (in this case both) is a character type, so the aliasing exception applies.
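The source behind the Godbolt link is not reproduced in this thread, so the following is only a reconstruction of what `add_int` and `add_char` might look like based on the description above; the parameter names and function bodies are assumptions.

```cpp
#include <cassert>

// int vs short: the store through `shorts` cannot legally modify an int,
// so an optimizer may reuse i0 instead of reloading ints[0].
int add_int(int* ints, short* shorts) {
    assert(ints != nullptr && shorts != nullptr);
    int i0 = ints[0];
    shorts[0] = 1;
    return i0 + ints[0];
}

// char vs unsigned char: both are character types, so the store through `b`
// may modify a[0] and the second read has to be performed again.
int add_char(char* a, unsigned char* b) {
    assert(a != nullptr && b != nullptr);
    int c0 = a[0];
    b[0] = 1;
    return c0 + a[0];
}
```

Calling `add_char` with both pointers referring to the same byte is well defined and shows the reload: the first read sees the old value and the second sees the 1 that was just written.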

dotnwat · 2024-09-15T18:36:53Z

@travisdowns @StephanDollberg @rockwotj in the interest of avoiding the UB concern entirely (but like travis mentioned, perhaps it's UB that we are ok with), i added a new commit that implements bytes in terms of absl::InlinedVector, and created a benchmark for a few cases:

initialized_later
zero initialization
append

bytes.*: abseil version
sstring.*: sstring version

as expected, initialized_later is faster with sstring because the abseil interface doesn't offer the option to skip initialization (so in this benchmark the abseil version falls back to zero initialization).

roughly, up to 128K sizes, we'd pay about a microsecond of overhead. presumably we could also go hunting around the code base for usages of bytes() that are in a hot path and change how the bytes type is used if they show up in a profile?

14: test                              iterations      median         mad         min         max      allocs       tasks        inst
14: bytes.initialized_later_0          459500000     1.373ns     0.000ns     1.373ns     1.375ns       0.000       0.000        17.3
14: sstring.initialized_later_0        503104000     1.189ns     0.000ns     1.187ns     1.190ns       0.000       0.000        10.3
14: bytes.initialized_later_10         290141000     2.604ns     0.001ns     2.603ns     2.604ns       0.000       0.000        36.3
14: sstring.initialized_later_10       489875000     1.206ns     0.002ns     1.205ns     1.208ns       0.000       0.000        10.3
14: bytes.initialized_later_100        106797000     8.209ns     0.000ns     8.208ns     8.210ns       1.000       0.000       143.3
14: sstring.initialized_later_100      118622000     7.262ns     0.002ns     7.257ns     7.266ns       1.000       0.000       123.3
14: bytes.initialized_later_1000        74471000     9.398ns     0.013ns     9.385ns     9.428ns       1.000       0.000       172.3
14: sstring.initialized_later_1000      89729000     7.098ns     0.001ns     7.097ns     7.100ns       1.000       0.000       123.3
14: bytes.initialized_later_10000       10988000    60.754ns     0.007ns    60.747ns    60.777ns       1.000       0.000       424.3
14: sstring.initialized_later_10000     26059000     7.074ns     0.002ns     7.072ns     7.077ns       1.000       0.000       123.3
14: bytes.initialized_later_100000       1095000   615.564ns     0.075ns   615.475ns   618.526ns       1.000       0.000      3389.3
14: sstring.initialized_later_100000     3005000    28.151ns     0.082ns    28.070ns    28.300ns       1.000       0.000       630.3
14: bytes.initialized_zero_0           463474000     1.361ns     0.000ns     1.361ns     1.362ns       0.000       0.000        17.3
14: sstring.initialized_zero_0         170729000     5.054ns     0.000ns     5.053ns     5.056ns       0.000       0.000        29.3
14: bytes.initialized_zero_10          306899000     2.421ns     0.000ns     2.420ns     2.421ns       0.000       0.000        36.3
14: sstring.initialized_zero_10        169572000     5.056ns     0.001ns     5.055ns     5.059ns       0.000       0.000        29.3
14: bytes.initialized_zero_100         102871000     8.575ns     0.000ns     8.572ns     8.576ns       1.000       0.000       143.3
14: sstring.initialized_zero_100       104785000     8.371ns     0.004ns     8.364ns     8.375ns       1.000       0.000       138.3
14: bytes.initialized_zero_1000         74622000     9.385ns     0.001ns     9.383ns     9.387ns       1.000       0.000       172.3
14: sstring.initialized_zero_1000       74918000     9.383ns     0.001ns     9.374ns     9.388ns       1.000       0.000       167.3
14: bytes.initialized_zero_10000        10982000    60.763ns     0.003ns    60.756ns    60.772ns       1.000       0.000       424.3
14: sstring.initialized_zero_10000      10971000    60.759ns     0.002ns    60.748ns    60.766ns       1.000       0.000       419.3
14: bytes.initialized_zero_100000        1095000   615.436ns     0.028ns   615.382ns   615.481ns       1.000       0.000      3389.3
14: sstring.initialized_zero_100000      1095000   616.402ns     0.088ns   616.314ns   616.564ns       1.000       0.000      3383.3
14: bytes.append_0                     276549000     2.812ns     0.000ns     2.812ns     2.814ns       0.000       0.000        57.3
14: sstring.append_0                    83581000    11.155ns     0.000ns    11.155ns    11.166ns       0.000       0.000        87.3
14: bytes.append_10                     94539000     9.884ns     0.001ns     9.879ns     9.885ns       0.000       0.000       127.3
14: sstring.append_10                   71173000    13.316ns     0.001ns    13.310ns    13.316ns       0.000       0.000       121.3
14: bytes.append_100                    48183000    19.572ns     0.007ns    19.564ns    19.585ns       2.000       0.000       410.3
14: sstring.append_100                  49985000    18.837ns     0.002ns    18.830ns    18.842ns       2.000       0.000       336.3
14: bytes.append_1000                   18448000    51.085ns     0.064ns    50.851ns    51.148ns       2.000       0.000       724.3
14: sstring.append_1000                 21044000    43.598ns     0.051ns    43.546ns    43.729ns       2.000       0.000       492.3
14: bytes.append_10000                   2714000   337.452ns     0.048ns   337.404ns   337.749ns       2.000       0.000      3898.3
14: sstring.append_10000                 3532000   253.486ns     0.004ns   253.457ns   253.490ns       2.000       0.000      1868.3
14: bytes.append_100000                   261000     3.533us     0.165ns     3.533us     3.533us       2.000       0.000     33147.3
14: sstring.append_100000                 316000     2.874us     0.108ns     2.873us     2.874us       2.000       0.000     13213.3
14: Test Exit code 0
1/1 Test #14: bytes_bench_rpbench ..............   Passed  216.32 sec
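
The gap between the `initialized_later` and `initialized_zero` rows comes down to whether the buffer is written before the caller touches it. As an illustration in plain C++ (this is not the benchmark code from the PR, and the function names are hypothetical), the two allocation styles differ by a single pair of parentheses:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

using buffer = std::unique_ptr<std::uint8_t[]>;

// Value-initialized: the trailing () zeroes all n bytes up front.
buffer make_zeroed(std::size_t n) {
    return buffer(new std::uint8_t[n]());
}

// Default-initialized: no bytes are written. The contents are indeterminate
// and must not be read until the caller has filled them in.
buffer make_uninitialized(std::size_t n) {
    return buffer(new std::uint8_t[n]);
}

// Typical "initialized later" use: allocate, then write every byte once.
buffer make_filled(std::size_t n, std::uint8_t value) {
    buffer b = make_uninitialized(n);
    std::fill_n(b.get(), n, value);
    assert(n == 0 || b[n - 1] == value);
    return b;
}
```

With `make_zeroed` followed by a fill, every byte is written twice, which is the extra work a container pays when it has no way to skip initialization.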

rockwotj · 2024-09-16T00:28:51Z

As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.

dotnwat · 2024-09-16T00:35:46Z

> As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.

yeh. that interface was rejected for std::vector and i think also std::inplace_vector for some reason.

rockwotj · 2024-09-16T00:55:12Z

Yeah bypassing default constructors and such can be tricky. It's easier to explain with raw bytes.

StephanDollberg · 2024-09-16T08:45:52Z

/microbench

StephanDollberg · 2024-09-16T12:48:39Z

Instruction and alloc count diffs from other microbenches:

Performance changes detected in 9 tests
storage_rpbench_reducer_bench.compaction_key_reducer_test: inst -> +2.39%
heartbeat_bench_rpbench_fixture.test_old_hb_reply: inst -> +11.18%
heartbeat_bench_rpbench_fixture.test_old_hb_reply: allocs -> +0.00%
heartbeat_bench_rpbench_fixture.test_old_hb_request: inst -> +11.10%
crypto_bench_rpbench_openssl_perf_test.md5_1k: inst -> +0.14%
crypto_bench_rpbench_openssl_perf_test.sha256_1k: inst -> +0.04%
crypto_bench_rpbench_openssl_perf_test.sha512_1k: inst -> +0.06%
crypto_bench_fips_rpbench_openssl_perf_test.md5_1k: inst -> +0.14%
crypto_bench_fips_rpbench_openssl_perf_test.sha256_1k: inst -> +0.04%
crypto_bench_fips_rpbench_openssl_perf_test.sha512_1k: inst -> +0.06%

dotnwat · 2024-09-16T15:07:58Z

> Instruction and alloc count diffs from other microbenches:

that seems unsurprising with abseil's inlined vector. what's the threshold for concern?

StephanDollberg · 2024-09-16T15:30:07Z

There is no general policy, will always have to go on a case by case basis.

This looks fine to me as well given the only major change is in the old heartbeats which shouldn't have any major usage anymore.

travisdowns · 2024-09-16T19:40:38Z

@rockwotj wrote:

> As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.

Well it doesn't really allow you to do the "created but uninitialized" thing, I don't think:

> If any of the following conditions is satisfied, the behavior is undefined:
>
> ...
>
> Any character in range [p, p + r) has an indeterminate value.

So it's saying it's UB to (for example), pass in an op which simply returns count but does not write the corresponding chars, which is how you'd emulate allocate-but-not-init.

Still, even if you adhere to this rule this interface can replace some of the reasons you want uninit storage in the first place.

Of course, breaking this rule seems quite unlikely to be punished in practice.
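For reference, a conforming use of `resize_and_overwrite` writes every character it reports as valid. The sketch below assumes a hypothetical helper name, `fill_string`, and falls back to `assign` when the standard library does not advertise the C++23 feature, so it compiles either way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Build a string of n copies of `fill`, writing each byte exactly once
// when resize_and_overwrite is available.
std::string fill_string(std::size_t n, char fill) {
    std::string s;
#if defined(__cpp_lib_string_resize_and_overwrite)
    s.resize_and_overwrite(n, [fill](char* p, std::size_t count) {
        std::memset(p, fill, count); // every byte in [p, p + count) is written
        return count;                // the returned value becomes the new size
    });
#else
    s.assign(n, fill); // pre-C++23 fallback
#endif
    assert(s.size() == n);
    return s;
}
```

Returning `count` without the `memset` is the "allocate but don't initialize" emulation discussed above, and it is exactly what the quoted rule makes undefined.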

rockwotj

Overall the code LGTM, just one thing that the new struct is 8 bytes bigger and IDK if that's OK.

src/v/bytes/bytes.h

dotnwat · 2024-09-20T18:38:39Z

> Overall the code LGTM, just one thing that the new struct is 8 bytes bigger and IDK if that's OK.

yeh, also noticed this. i don't think it matters, and i'm not sure where the existing bytes_inline_size comes from. it was probably an educated guess.

tbh i'd like to teach iobuf SSO and get rid of bytes type entirely.

dotnwat · 2024-09-20T18:40:39Z

thanks for the review @rockwotj. i have a merge conflict, and a few things to cleanup. should be able to get this posted again next week.

travisdowns · 2024-09-21T12:20:28Z

I agree that the 8 bytes increase is probably OK.

The bytes_view is no longer a string_view type. Instead of wiring into the char_view_cast in jwt.h there is only one place where conversion is needed so it's done explicitly there. Signed-off-by: Noah Watkins <[email protected]>

This is necessary for using libc++18, where char_traits<T> is deprecated for all types other than char (and a couple other types). So instead we wrap an absl::InlinedVector and expose it with the same interface. Signed-off-by: Noah Watkins <[email protected]>

Signed-off-by: Noah Watkins <[email protected]>

This constructor had already been involved in a reversal of parameters mistake. redpanda-data@49fef40 Signed-off-by: Noah Watkins <[email protected]>

Although common in tests, there are very few places where a bytes() object is constructed from a string. Having a converting constructor for a string literal throws away some of the strong type benefits of the bytes object. So we replace it with a bytes::from_string factory. Signed-off-by: Noah Watkins <[email protected]>

Signed-off-by: Noah Watkins <[email protected]>

Avoids the use of append(pointer,1) for adding a single element to the bytes vector. This is also a useful interface because it can be used with things like std::back_inserter. Signed-off-by: Noah Watkins <[email protected]>

All of the remaining instances of bytes::append are just longer forms of bytes::from_string factory. Signed-off-by: Noah Watkins <[email protected]>

Signed-off-by: Noah Watkins <[email protected]>
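
The commit messages above describe replacing the string-literal converting constructor with a `bytes::from_string` factory and adding `push_back`. A rough sketch of that interface shape follows. `bytes_sketch` is a hypothetical class backed by `std::vector` rather than the abseil container, and the exact signatures in `bytes.h` are not shown in this thread, so treat them as assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

class bytes_sketch {
public:
    using value_type = std::uint8_t; // needed by std::back_inserter

    bytes_sketch() = default;

    // Named factory: conversion from text has to be spelled out by the caller.
    static bytes_sketch from_string(const std::string& s) {
        bytes_sketch b;
        b._data.reserve(s.size());
        for (char c : s) {
            b._data.push_back(static_cast<value_type>(c));
        }
        return b;
    }

    // Single-element append, replacing append(pointer, 1).
    void push_back(value_type v) { _data.push_back(v); }

    std::size_t size() const { return _data.size(); }

    value_type operator[](std::size_t i) const {
        assert(i < size());
        return _data[i];
    }

private:
    std::vector<value_type> _data;
};
```

Because the class exposes `value_type` and `push_back`, it works with `std::back_inserter`, for example `std::copy(first, last, std::back_inserter(b))`.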

github-actions bot added the area/redpanda label Aug 27, 2024

bashtanov self-requested a review August 27, 2024 08:43

dotnwat force-pushed the bytes branch from d3b9e72 to dc002f2 Compare August 27, 2024 23:53

dotnwat marked this pull request as draft August 27, 2024 23:53

dotnwat force-pushed the bytes branch from dc002f2 to c7b63a9 Compare September 11, 2024 20:43

github-actions bot added the area/build label Sep 11, 2024

dotnwat force-pushed the bytes branch from c7b63a9 to c4fa013 Compare September 12, 2024 00:37

github-actions bot added the area/wasm WASM Data Transforms label Sep 12, 2024

dotnwat force-pushed the bytes branch from c4fa013 to 67c3f9b Compare September 13, 2024 17:06

dotnwat marked this pull request as ready for review September 13, 2024 17:07

dotnwat requested a review from michael-redpanda as a code owner September 13, 2024 17:07

dotnwat requested review from travisdowns and StephanDollberg September 13, 2024 17:08

dotnwat force-pushed the bytes branch from 67c3f9b to 4f19ad4 Compare September 13, 2024 17:26

rockwotj previously approved these changes Sep 13, 2024

travisdowns reviewed Sep 14, 2024

dotnwat dismissed rockwotj’s stale review via 081abb3 September 15, 2024 18:27

redpanda-data deleted a comment from vbotbuildovich Sep 16, 2024

rockwotj mentioned this pull request Sep 17, 2024

bazel: PGO for bazel clang toolchain #23142

Merged

7 tasks

rockwotj reviewed Sep 20, 2024

rockwotj previously approved these changes Sep 20, 2024

dotnwat added 11 commits September 21, 2024 08:41

security: avoid bytes_view to string_view conversion

31afcaa

The bytes_view is no longer a string_view type. Instead of wiring into the char_view_cast in jwt.h there is only one place where conversion is needed so it's done explicitly there. Signed-off-by: Noah Watkins <[email protected]>

bytes: clean up output operator no std

9bb2f26

Signed-off-by: Noah Watkins <[email protected]>

bytes: use more efficient iobuf::from factory

8016ba6

Signed-off-by: Noah Watkins <[email protected]>

bytes: add initialized_zero constructor tag

524ac8b

This constructor had already been involved in a reversal of parameters mistake. redpanda-data@49fef40 Signed-off-by: Noah Watkins <[email protected]>

bytes: remove bytes::operator+=

ff83042

Signed-off-by: Noah Watkins <[email protected]>

bytes: remove unused constructor

f376201

Signed-off-by: Noah Watkins <[email protected]>

bytes: add push_back interface

749d649

Avoids the use of append(pointer,1) for adding a single element to the bytes vector. This is also a useful interface because it can be used with things like std::back_inserter. Signed-off-by: Noah Watkins <[email protected]>

bytes: remove append interface

6849d4f

All of the remaining instances of bytes::append are just longer forms of bytes::from_string factory. Signed-off-by: Noah Watkins <[email protected]>

chore: apply clang-format

453a3c0

Signed-off-by: Noah Watkins <[email protected]>
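
The `initialized_zero` tag commit in the list above addresses a common pitfall: when a constructor takes a size and a fill value that are both integers, swapped arguments still compile. The thread does not show the original mistake or the final signature, so the sketch below is only an illustration with hypothetical names (`bytes_sketch`, backed by `std::vector`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class bytes_sketch {
public:
    // Empty tag type: selects the zero-filling constructor by name.
    struct initialized_zero {};

    // Only a size is accepted, so there is no (size, value) pair to swap.
    bytes_sketch(initialized_zero, std::size_t n) : _data(n, 0) {}

    std::size_t size() const { return _data.size(); }

    std::uint8_t operator[](std::size_t i) const {
        assert(i < size());
        return _data[i];
    }

private:
    std::vector<std::uint8_t> _data;
};
```

Compare this with `std::vector<std::uint8_t>(0, 16)`, which compiles without complaint but yields an empty vector rather than 16 zero bytes.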

dotnwat dismissed rockwotj’s stale review via 453a3c0 September 22, 2024 18:22

dotnwat force-pushed the bytes branch from 081abb3 to 453a3c0 Compare September 22, 2024 18:22

dotnwat requested review from travisdowns and rockwotj September 23, 2024 14:56

rockwotj approved these changes Sep 23, 2024

dotnwat merged commit 040a655 into redpanda-data:dev Sep 23, 2024
19 checks passed
