
bytes: replacement bytes implementation for libc++18 #23072

Merged
merged 11 commits into from
Sep 23, 2024

Conversation

dotnwat
Member

@dotnwat dotnwat commented Aug 27, 2024

libc++ >= v18 has deprecated (to be removed in v19) char_traits for T other than char (and some other types, like wchar_t). our bytes implementation uses T=uint8_t, and because seastar::sstring interoperates with std::string/std::string_view, we encounter the deprecation.

this PR introduces a new bytes implementation that wraps a seastar::sstring<char>, and casts back and forth between pointers as needed at the interface level to provide the illusion of uint8_t storage.
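
A minimal sketch of the wrapping idea described above, assuming std::string as a stand-in for seastar::sstring. The class name char_backed_bytes and its members are hypothetical and are not taken from the PR; only the two pointer-cast helpers mirror what the description mentions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch: a byte container that stores char internally and
// presents uint8_t at its interface by casting pointers at the boundary.
class char_backed_bytes {
public:
    char_backed_bytes(const std::uint8_t* p, std::size_t n)
      : data_(cast_down(p), n) {}

    // Callers see uint8_t even though the storage is char.
    const std::uint8_t* data() const { return cast_up(data_.data()); }
    std::size_t size() const { return data_.size(); }
    std::uint8_t operator[](std::size_t i) const {
        return static_cast<std::uint8_t>(data_[i]);
    }

private:
    static const char* cast_down(const std::uint8_t* p) {
        return reinterpret_cast<const char*>(p);
    }
    static const std::uint8_t* cast_up(const char* p) {
        return reinterpret_cast<const std::uint8_t*>(p);
    }

    std::string data_; // assumption: stands in for seastar::sstring
};
```

The casts are confined to two private helpers, so the rest of the interface never mentions char.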

after the conversion to sstring, we recognize that we now control the bytes interface, and use this to reduce the scope by, for example, removing the char* converting constructor, among a couple other interface clean-ups.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@bashtanov bashtanov self-requested a review August 27, 2024 08:43
@dotnwat
Member Author

dotnwat commented Aug 27, 2024

I will probably switch this over to use abseil's InlinedVector. After examining it, I think we can achieve a similar uninitialized allocation optimization.

@@ -537,7 +539,10 @@ class verifier {
auto second_dot = jose_enc[0].length() + 1 + jose_enc[1].length();
auto msg = sv.substr(0, second_dot);
if (!verifier->second.verify(
detail::char_view_cast<bytes_view::value_type>(msg), signature)) {
Member Author

@dotnwat dotnwat Sep 13, 2024


@BenPope @michael-redpanda bytes_view is no longer a basic_string_view

rockwotj
rockwotj previously approved these changes Sep 13, 2024
Contributor

@rockwotj rockwotj left a comment


nice


static const char* cast_down(const uint8_t* p) {
// NOLINTNEXTLINE
return reinterpret_cast<const char*>(p);
Member


I am pretty sure this general approach is UB: i.e., accessing an array of char (which is what ss::sstring contains) through a uint8_t * is UB, since they are different types. Except in very limited cases you can't convert a pointer of one type to a pointer to an unrelated type and then access through it (and even fewer cases when the pointers are to arrays).

That said, it's probably the type of UB that maybe works in practice?

Member Author

@dotnwat dotnwat Sep 14, 2024


Interesting, thanks for the UB call out. TBH I thought that as long as the string is treated as a bag of bytes it was ok. I'll do some investigation. It would be nice if everything is above board!

So often in serialization/deserialization, though, we have a bag of bytes (e.g. char*) with a particular encoding which we can use reinterpret_cast to access. Is it that reinterpret_cast is always UB?

Member

@travisdowns travisdowns Sep 16, 2024


> So often in serialization/deserialization, though, we have a bag of bytes (e.g. char*) with a particular encoding which we can use reinterpret_cast to access. Is it that reinterpret_cast is always UB?

Not exactly: it's accessing an object of type T through a pointer to type U that is UB; simply doing the cast itself isn't UB. Of course, getting a U * which actually points to a T may require something like a reinterpret_cast (though there are other ways too: C-style cast, 2x static_cast through void *, memcpy etc).

These are the so-called "strict aliasing" rules.

> we have a bag of bytes

There is an exception for char, but it kind of only works one way. For any type, you can inspect and write its bytes using char *. So the following (accessing an int through a char pointer) is valid:

int some_int = 5;
char * as_bytes = reinterpret_cast<char *>(&some_int);
printf("byte 2 is %d", as_bytes[2]);

but the reverse (access char through int pointer) is not:

char some_chars[] = {1, 2, 3, 4};
int * as_int = reinterpret_cast<int *>(&some_chars);
printf("four bytes as int: %d", *as_int);

In this case we are at least sometimes doing the "reverse" (disallowed) case because ss::sstring puts chars into the array, then we access them as uint8_t.
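
For the disallowed direction, a commonly used alternative that stays within the aliasing rules is to copy the bytes into an object of the target type. The following is a minimal sketch, not code from this PR; the helper name load_u32 is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit value out of a char buffer without dereferencing a
// uint32_t* that actually points at char objects. memcpy copies the
// object representation, which is well defined for trivially copyable
// types. The caller must supply at least sizeof(std::uint32_t) bytes.
std::uint32_t load_u32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```

Compilers typically lower a fixed-size memcpy like this to a plain load, so the well-defined form usually costs nothing at runtime.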

However, it seems quite unlikely this UB will bite us in practice:

  1. The aliasing exception above for char applies to all "character types", which includes unsigned char too. uint8_t can in principle be a different type from char/unsigned char (i.e., not a character type) but in practice it is unsigned char, so in fact the aliasing exception applies. It's still "weird" that ss::sstring is treating it as char and the wrapper as unsigned char, but this is in the realm of unspecified behavior (depends on the representation of char and uchar but we know they are the same), not undefined behavior.

  2. Even though the aliasing exception is "one way" per above, compilers have a pretty hard time applying that aspect and so IME as long as one of the types is a char type, the compiler is not going to do tricky aliasing-related optimizations even when you do the "reverse" (disallowed) case. I couldn't come up with any example where it does, anyway.
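
The assumption in point 1, that uint8_t is the same type as unsigned char, can be checked at compile time. This is a small sketch under that assumption; the function name uint8_is_unsigned_char is hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

// Fails to compile on a platform where uint8_t is not unsigned char,
// i.e. where the character-type aliasing exception would not cover it.
static_assert(std::is_same<std::uint8_t, unsigned char>::value,
              "uint8_t is expected to be unsigned char");

// The same check as a value, for use in ordinary code or tests.
constexpr bool uint8_is_unsigned_char() {
    return std::is_same<std::uint8_t, unsigned char>::value;
}
```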

Godbolt:

https://godbolt.org/z/jYjPq4rPo

add_int shows that strict aliasing is used by the optimizer: i0 + ints[0] is effectively collapsed to 2 * ints[0] (i.e., ints is read only once) even though there is an intervening write to the shorts array: the compiler knows shorts can't alias ints because they are different types. add_char shows the opposite: the optimization is not applied because at least one side (in this case both) is a character type, so the aliasing exception applies.
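
The code behind the link is not reproduced in this thread. The following is a hypothetical reconstruction that matches the description above; the exact signatures and bodies are assumptions.

```cpp
// ints and shorts have different, non-character types, so the compiler
// may assume the store through shorts does not modify ints[0] and may
// reuse the first load (effectively returning 2 * ints[0]).
int add_int(int* ints, short* shorts) {
    int i0 = ints[0];
    shorts[0] = 1;
    return i0 + ints[0];
}

// Both parameters are character types, so the store through b may
// modify a[0]; the compiler has to reload a[0] after the write.
int add_char(char* a, unsigned char* b) {
    int c0 = a[0];
    b[0] = 1;
    return c0 + a[0];
}
```

Calling add_char with both pointers referring to the same byte is well defined and shows the reload: with a[0] initially 20 the result is 21, not 40.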

@dotnwat
Member Author

dotnwat commented Sep 15, 2024

@travisdowns @StephanDollberg @rockwotj in the interest of avoiding the UB concern entirely (but like travis mentioned, perhaps it's UB that we are ok with), i added a new commit that implements bytes in terms of absl::InlinedVector, and created a benchmark for a few cases:

  1. initialized_later
  2. zero initialization
  3. append

bytes.*: abseil version
sstring.*: sstring version

as expected, initialized_later is faster with sstring because the abseil interface doesn't offer the option to skip initialization (in this benchmark it is zero initialization).

roughly, up to 128K sizes, we'd pay ~microsecond of overhead. presumably we could also go hunting around the code base for usages of bytes() that are in a hot path and change how the bytes type is used if they show up in a profile?

14: test                              iterations      median         mad         min         max      allocs       tasks        inst
14: bytes.initialized_later_0          459500000     1.373ns     0.000ns     1.373ns     1.375ns       0.000       0.000        17.3
14: sstring.initialized_later_0        503104000     1.189ns     0.000ns     1.187ns     1.190ns       0.000       0.000        10.3
14: bytes.initialized_later_10         290141000     2.604ns     0.001ns     2.603ns     2.604ns       0.000       0.000        36.3
14: sstring.initialized_later_10       489875000     1.206ns     0.002ns     1.205ns     1.208ns       0.000       0.000        10.3
14: bytes.initialized_later_100        106797000     8.209ns     0.000ns     8.208ns     8.210ns       1.000       0.000       143.3
14: sstring.initialized_later_100      118622000     7.262ns     0.002ns     7.257ns     7.266ns       1.000       0.000       123.3
14: bytes.initialized_later_1000        74471000     9.398ns     0.013ns     9.385ns     9.428ns       1.000       0.000       172.3
14: sstring.initialized_later_1000      89729000     7.098ns     0.001ns     7.097ns     7.100ns       1.000       0.000       123.3
14: bytes.initialized_later_10000       10988000    60.754ns     0.007ns    60.747ns    60.777ns       1.000       0.000       424.3
14: sstring.initialized_later_10000     26059000     7.074ns     0.002ns     7.072ns     7.077ns       1.000       0.000       123.3
14: bytes.initialized_later_100000       1095000   615.564ns     0.075ns   615.475ns   618.526ns       1.000       0.000      3389.3
14: sstring.initialized_later_100000     3005000    28.151ns     0.082ns    28.070ns    28.300ns       1.000       0.000       630.3
14: bytes.initialized_zero_0           463474000     1.361ns     0.000ns     1.361ns     1.362ns       0.000       0.000        17.3
14: sstring.initialized_zero_0         170729000     5.054ns     0.000ns     5.053ns     5.056ns       0.000       0.000        29.3
14: bytes.initialized_zero_10          306899000     2.421ns     0.000ns     2.420ns     2.421ns       0.000       0.000        36.3
14: sstring.initialized_zero_10        169572000     5.056ns     0.001ns     5.055ns     5.059ns       0.000       0.000        29.3
14: bytes.initialized_zero_100         102871000     8.575ns     0.000ns     8.572ns     8.576ns       1.000       0.000       143.3
14: sstring.initialized_zero_100       104785000     8.371ns     0.004ns     8.364ns     8.375ns       1.000       0.000       138.3
14: bytes.initialized_zero_1000         74622000     9.385ns     0.001ns     9.383ns     9.387ns       1.000       0.000       172.3
14: sstring.initialized_zero_1000       74918000     9.383ns     0.001ns     9.374ns     9.388ns       1.000       0.000       167.3
14: bytes.initialized_zero_10000        10982000    60.763ns     0.003ns    60.756ns    60.772ns       1.000       0.000       424.3
14: sstring.initialized_zero_10000      10971000    60.759ns     0.002ns    60.748ns    60.766ns       1.000       0.000       419.3
14: bytes.initialized_zero_100000        1095000   615.436ns     0.028ns   615.382ns   615.481ns       1.000       0.000      3389.3
14: sstring.initialized_zero_100000      1095000   616.402ns     0.088ns   616.314ns   616.564ns       1.000       0.000      3383.3
14: bytes.append_0                     276549000     2.812ns     0.000ns     2.812ns     2.814ns       0.000       0.000        57.3
14: sstring.append_0                    83581000    11.155ns     0.000ns    11.155ns    11.166ns       0.000       0.000        87.3
14: bytes.append_10                     94539000     9.884ns     0.001ns     9.879ns     9.885ns       0.000       0.000       127.3
14: sstring.append_10                   71173000    13.316ns     0.001ns    13.310ns    13.316ns       0.000       0.000       121.3
14: bytes.append_100                    48183000    19.572ns     0.007ns    19.564ns    19.585ns       2.000       0.000       410.3
14: sstring.append_100                  49985000    18.837ns     0.002ns    18.830ns    18.842ns       2.000       0.000       336.3
14: bytes.append_1000                   18448000    51.085ns     0.064ns    50.851ns    51.148ns       2.000       0.000       724.3
14: sstring.append_1000                 21044000    43.598ns     0.051ns    43.546ns    43.729ns       2.000       0.000       492.3
14: bytes.append_10000                   2714000   337.452ns     0.048ns   337.404ns   337.749ns       2.000       0.000      3898.3
14: sstring.append_10000                 3532000   253.486ns     0.004ns   253.457ns   253.490ns       2.000       0.000      1868.3
14: bytes.append_100000                   261000     3.533us     0.165ns     3.533us     3.533us       2.000       0.000     33147.3
14: sstring.append_100000                 316000     2.874us     0.108ns     2.873us     2.874us       2.000       0.000     13213.3
14: Test Exit code 0
1/1 Test #14: bytes_bench_rpbench ..............   Passed  216.32 sec
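
The gap between the initialized_later and initialized_zero rows comes down to whether the buffer is zero-filled before the caller writes to it. The following plain C++ sketch illustrates that distinction only; it uses neither sstring nor abseil, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Value-initialization: `new T[n]()` zero-fills the array, so the cost
// grows with n even if the caller overwrites every byte afterwards.
std::unique_ptr<std::uint8_t[]> alloc_zeroed(std::size_t n) {
    return std::unique_ptr<std::uint8_t[]>(new std::uint8_t[n]());
}

// Default-initialization: `new T[n]` leaves the bytes indeterminate.
// The caller must write each byte before reading it.
std::unique_ptr<std::uint8_t[]> alloc_for_overwrite(std::size_t n) {
    return std::unique_ptr<std::uint8_t[]>(new std::uint8_t[n]);
}
```

A container that only offers the first form pays for the fill on every sized construction, which is consistent with the per-size growth seen in the bytes.initialized_later rows.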

@rockwotj
Contributor

As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.

@dotnwat
Member Author

dotnwat commented Sep 16, 2024

> As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.

yeh. that interface was rejected for std::vector and i think also std::inplace_vector for some reason.

@rockwotj
Contributor

Yeah bypassing default constructors and such can be tricky. It's easier to explain with raw bytes.

@redpanda-data redpanda-data deleted a comment from vbotbuildovich Sep 16, 2024
@StephanDollberg
Member

/microbench

@StephanDollberg
Member

Instruction and alloc count diffs from other microbenches:

Performance changes detected in 9 tests
storage_rpbench_reducer_bench.compaction_key_reducer_test: inst -> +2.39%
heartbeat_bench_rpbench_fixture.test_old_hb_reply: inst -> +11.18%
heartbeat_bench_rpbench_fixture.test_old_hb_reply: allocs -> +0.00%
heartbeat_bench_rpbench_fixture.test_old_hb_request: inst -> +11.10%
crypto_bench_rpbench_openssl_perf_test.md5_1k: inst -> +0.14%
crypto_bench_rpbench_openssl_perf_test.sha256_1k: inst -> +0.04%
crypto_bench_rpbench_openssl_perf_test.sha512_1k: inst -> +0.06%
crypto_bench_fips_rpbench_openssl_perf_test.md5_1k: inst -> +0.14%
crypto_bench_fips_rpbench_openssl_perf_test.sha256_1k: inst -> +0.04%
crypto_bench_fips_rpbench_openssl_perf_test.sha512_1k: inst -> +0.06%

@dotnwat
Member Author

dotnwat commented Sep 16, 2024

> Instruction and alloc count diffs from other microbenches:

that seems unsurprising with abseil's inlined vector. what's the threshold for concern?

@StephanDollberg
Member

There is no general policy; we will always have to go on a case-by-case basis.

This looks fine to me as well given the only major change is in the old heartbeats which shouldn't have any major usage anymore.

@travisdowns
Member

travisdowns commented Sep 16, 2024

@rockwotj wrote:

> As a side note, it looks like std::string now (as of C++23) has the ability to be created but uninitialized with resize_and_overwrite. It's been around in libc++ for a while.

Well it doesn't really allow you to do the "created but uninitialized" thing, I don't think:

> If any of the following conditions is satisfied, the behavior is undefined:
>
> ...
>
>   • Any character in range [p, p + r) has an indeterminate value.

So it's saying it's UB to (for example), pass in an op which simply returns count but does not write the corresponding chars, which is how you'd emulate allocate-but-not-init.

Still, even if you adhere to this rule this interface can replace some of the reasons you want uninit storage in the first place.

Of course, breaking this rule seems quite unlikely to be punished in practice.
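
A minimal sketch of using resize_and_overwrite while adhering to the rule quoted above: the callback writes every byte in the range it reports, so no character is left indeterminate. The helper name make_filled is hypothetical, and the fallback branch is an assumption added so the sketch also builds with standard libraries that lack the C++23 feature.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Build a string of n bytes, each set to `fill`. With
// resize_and_overwrite the library does not need to zero the buffer
// before the callback fills it.
std::string make_filled(std::size_t n, char fill) {
    std::string s;
#ifdef __cpp_lib_string_resize_and_overwrite
    s.resize_and_overwrite(n, [fill](char* p, std::size_t count) {
        std::memset(p, fill, count); // write all of [p, p + count)
        return count;                // final length of the string
    });
#else
    // Fallback: resize value-initializes, then the bytes are overwritten.
    s.resize(n);
    std::memset(&s[0], fill, n);
#endif
    return s;
}
```

This covers the common case where the buffer is filled right away. It does not emulate allocate-now, fill-later, which is the case the quoted rule makes undefined.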

Contributor

@rockwotj rockwotj left a comment


Overall the code LGTM, just one thing that the new struct is 8 bytes bigger and IDK if that's OK.

@dotnwat
Member Author

dotnwat commented Sep 20, 2024

> Overall the code LGTM, just one thing that the new struct is 8 bytes bigger and IDK if that's OK.

yeh, also noticed this. i don't think it matters, and i'm not sure where the existing bytes_inline_size comes from. it was probably an educated guess.

tbh i'd like to teach iobuf SSO and get rid of bytes type entirely.

rockwotj
rockwotj previously approved these changes Sep 20, 2024
@dotnwat
Member Author

dotnwat commented Sep 20, 2024

thanks for the review @rockwotj. i have a merge conflict, and a few things to cleanup. should be able to get this posted again next week.

@travisdowns
Member

I agree that the 8 bytes increase is probably OK.

The bytes_view is no longer a string_view type. Instead of wiring into
the char_view_cast in jwt.h there is only one place where conversion is
needed so it's done explicitly there.

Signed-off-by: Noah Watkins <[email protected]>
This is necessary for using libc++18 where char_traits<T> is deprecated
for all types other than char (and a couple other types). So instead we
wrap an absl::InlinedVector and expose it with the same interface.

Signed-off-by: Noah Watkins <[email protected]>
This constructor had already been involved in a reversal of parameters
mistake. redpanda-data@49fef40

Signed-off-by: Noah Watkins <[email protected]>
Although common in tests, there are very few places where a bytes()
object is constructed from a string. Having a converting constructor for
a string literal throws away some of the strong type benefits of the
bytes object. So we replace it with a bytes::from_string factory.

Signed-off-by: Noah Watkins <[email protected]>
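
A minimal sketch of the named-factory pattern this commit message describes, assuming std::vector as the backing store. The class name bytes_sketch is hypothetical, and the const std::string& parameter is an assumption rather than the signature used in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the bytes type. There is no constructor
// taking a string, so `bytes_sketch b = "abc";` does not compile.
class bytes_sketch {
public:
    bytes_sketch() = default;

    // Named factory: the string-to-bytes conversion is spelled out at
    // the call site instead of happening implicitly.
    static bytes_sketch from_string(const std::string& s) {
        bytes_sketch b;
        b.data_.assign(s.begin(), s.end());
        return b;
    }

    std::size_t size() const { return data_.size(); }
    std::uint8_t operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<std::uint8_t> data_;
};
```
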
Avoids the use of append(pointer,1) for adding a single element to the
bytes vector. This is also a useful interface because it can be used
with things like std::back_inserter.

Signed-off-by: Noah Watkins <[email protected]>
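
A minimal sketch of why push_back matters for std::back_inserter: the adaptor only needs the container to expose value_type and push_back. The names byte_buffer and copy_into are hypothetical, and std::vector stands in for the real storage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical byte container exposing the two members that
// std::back_insert_iterator relies on: value_type and push_back.
class byte_buffer {
public:
    using value_type = std::uint8_t;

    void push_back(value_type v) { data_.push_back(v); }
    std::size_t size() const { return data_.size(); }
    value_type operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<value_type> data_;
};

// Appends element by element through the generic inserter, with no
// append(pointer, 1) calls at the call site.
byte_buffer copy_into(const std::vector<std::uint8_t>& src) {
    byte_buffer out;
    std::copy(src.begin(), src.end(), std::back_inserter(out));
    return out;
}
```
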
All of the remaining instances of bytes::append are just longer forms of
bytes::from_string factory.

Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>


@dotnwat dotnwat merged commit 040a655 into redpanda-data:dev Sep 23, 2024
19 checks passed