-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes: replacement bytes implementation for libc++18 #23072
Conversation
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53648#01919687-a202-4ac3-8e9b-332abfcecaef ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54467#0191ecb4-4b10-4abb-95f9-108199563778 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54467#0191edb6-6c7f-4622-ae75-732df0996cf2 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54499#0191f739-8cf9-417c-8997-287c31d17996 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54905#01921b57-4157-4d81-9700-0e4b9b9330bb |
@@ -537,7 +539,10 @@ class verifier { | |||
auto second_dot = jose_enc[0].length() + 1 + jose_enc[1].length(); | |||
auto msg = sv.substr(0, second_dot); | |||
if (!verifier->second.verify( | |||
detail::char_view_cast<bytes_view::value_type>(msg), signature)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenPope @michael-redpanda bytes_view is no longer a basic_string_view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
src/v/bytes/bytes.h
Outdated
|
||
static const char* cast_down(const uint8_t* p) { | ||
// NOLINTNEXTLINE | ||
return reinterpret_cast<const char*>(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure this general approach is UB: i.e., accessing an array of char
(which is what ss::string
contains) through a uint8_t *
is UB, since they are different types. Except in very limited cases you can't convert a pointer of one type to a pointer to an unrelated type and then access though it (and even fewer cases when the pointers are to arrays).
That said, it's probably the type of UB that maybe works in practice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for the UB call out. TBH I thought that as long as the string is treated as a bag of bytes it was ok. I'll do some investigation. It would be nice if everything is above board!
So often in serialization/deserialization, though, we have a bag of bytes (e.g. char*) with a particular encoding which we can use reinterpret_cast to access. Is it that reinterpret_cast is always UB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So often in serialization/deserialization, though, we have a bag of bytes (e.g. char*) with a particular encoding which we can use reinterpret_cast to access. Is it that reinterpret_cast is always UB?
Not exactly, it's accessing an object of type T
thought a pointer to type U
that is UB, simply doing the cast itself isn't UB. Of course, to get a U *
which actually points to T
may require something like a reinterpret cast (though there are other ways too: C-style cast, 2x static_cast though void *
, memcpy etc).
This is the so-called "strict aliasing" rules.
we have a bag of bytes
There is an exception for char
, but it kind of only works one way. For any type, you can inspect and write it's bytes using char *
. So the following (access int
through a char
pointer) is valid:
int some_int = 5;
char * as_bytes = reinterpret_cast<char *>(&some_int);
printf("byte 2 is %d", as_bytes[2]);
but the reverse (access char through int pointer) is not:
char some_chars[] = {1, 2, 3, 4};
int * as_int = reinterpret_cast<int *>(&some_chars);
printf("four bytes as int: %d", *as_int);
In this case we are least sometimes doing the "reverse" (disallowed) case because ss::string
puts chars into the array then we access them as uint8_t
.
However, it seems quite unlikely this UB will bite use in practice:
-
The aliasing exception above for
char
applies to all "character types", which includesunsigned char
too. uint8_t can in principle be a different type fromchar
/unsigned char
(i.e., not a character type) but in practice it isunsigned char
, so in fact the aliasing exception applies. It's still "weird" thatss::string
is treating it aschar
and the wrapper asunsigned char
, but this in the realm of unspecified behavior (depends on the representation of char and uchar but we know they are the same), not undefined behavior. -
Even though the aliasing except is "one way" per above, compilers have a pretty hard time applying that aspect and so IME as long as one of the types is a char type, it's not going to do tricky aliasing related optimizations even when you do the "reverse" (disallowed) case. I couldn't come up with any example where it does, anyway.
Godbot:
https://godbolt.org/z/jYjPq4rPo
add_int
shows that strict aliasing is used by the optimizer: i0 + ints[0]
is effectively collapsed to 2 * ints[0]
(i.e., ints
is read only once) even though there is an intervening write of the shorts
array: the compiler knows shorts
can't alias ints
because the they different types. add_char
shows the opposite: the optimization is not applied because at least one side (in this case both) are character-types, so the aliasing exception applies.
@travisdowns @StephanDollberg @rockwotj in the interest of avoiding the UB concern entirely (but like travis mentioned, perhaps its UB that we are ok with), i added a new commit that implements bytes in terms of absl::inlined_vector, and created a benchmark for a few cases:
as expected, initialized_later is faster with sstring because abseil interface doesn't offer the option to skip initialization (in this benchmark it is zero initialization). roughly, up to 128K sizes, we'd pay ~microsecond of overhead. presumably we could also go hunting around the code base for usages of bytes() that are in a hot path and change how the bytes type is used if they show up in a profile?
|
As a side note, it looks like |
yeh. that interface was rejected for std::vector and i think also std::inplace_vector for some reason. |
Yeah bypassing default constructors and such can be tricky. It's easier to explain with raw bytes. |
/microbench |
Instruction and alloc count diffs from other microbenches:
|
that seems unsurprising with abseil's inlined vector. what's the threshold for concern? |
There is no general policy, will always have to go on a case by case basis. This looks fine to me as well given the only major change is in the old heartbeats which shouldn't have any major usage anymore. |
@rockwotj wrote:
Well it doesn't really allow you to do the "created but uninitialized" thing, I don't think:
So it's saying it's UB to (for example), pass in an Still, even if you adhere to this rule this interface can replace some of the reasons you want uninit storage in the first place. Of course, breaking this rule seems quite unlikely to be punished in practice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the code LGTM, just one thing that the new struct is 8 bytes bigger and IDK if that's OK.
yeh, also noticed this. i don't think it matters, and i'm not sure where the existing bytes_inline_size comes from. it was probably an educated guess. tbh i'd like to teach iobuf SSO and get rid of bytes type entirely. |
thanks for the review @rockwotj. i have a merge conflict, and a few things to cleanup. should be able to get this posted again next week. |
I agree that the 8 bytes increase is probably OK. |
The bytes_view is no longer a string_view type. Instead of wiring into the char_view_cast in jwt.h there is only one place where conversion is needed so its done explicitly there. Signed-off-by: Noah Watkins <[email protected]>
This is necessary for using libc++18 when type_traits<T> is deprecated for all types other than char (and a couple other types). So instead we wrap an absl::inlined_vector and expose it with the same interface. Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
This constructor had already been involved in a reversal of parameters mistake. redpanda-data@49fef40 Signed-off-by: Noah Watkins <[email protected]>
Although common in tests, there are very few places where a bytes() object is constructed from a string. Having a converting constructor for a string literal throws away some of the strong type benefits of the bytes object. So we replace it with a bytes::from_string factory. Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
Avoids the use of append(pointer,1) for adding a single element to the bytes vector. This is also a useful interface because it can be used with things like std::back_inserter. Signed-off-by: Noah Watkins <[email protected]>
All of the remaining instances of bytes::append are just longer forms of bytes::from_string factory. Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
libc++ >= v18 have deprecated (to be removed in v19) char_traits for T other than char (and some other types, like wchar). our bytes implementation is uses T=uint8_t and because seastar::sstring interoperates with std::string/std::string_view, we encounter the deprecation.
this PR introduces a new bytes implementation that wraps a
seastar::sstring<char>
, and casts back and forth between pointers as needed at the interface level to provide the illusion of uint8_t storage.after the conversion to sstring, we recognize that we now control the bytes interface, and use this to reduce the scope by, for example, removing the char* converting constructor, among a couple other interface clean-ups.
Backports Required
Release Notes