Combine CharBuffer/StringBuffer classes, and use in more places #106596
- Made StringBuffer character type templated, to support both char and char32_t buffers
- Trimmed down implementation to focus on only appending one character at a time (other methods were unused)
- Replaced CharBuffer class from file_access.cpp, and renamed StringBuffer to CharBuffer
- Updated JSON parsing and String uri encoding/decoding to use CharBuffer
Thanks! De-duplicating code makes sense to me, and CharBuffer and StringBuffer seem like a good fit in principle.
However, converting CharBuffer to String by parsing and copying it again, as in String(get_terminated_buffer), seems like a design flaw, since it's guaranteed to be slow. If we're re-designing this class, this should definitely be addressed.
Here's an idea:
Instead of LocalVector<T>, we store in T, either String or CharString.
Appending using String interface methods would introduce quite a lot of copy-on-write overhead, so CharBuffer will have to cheat a little bit and abuse ptr() with const_cast<T *>(ptr()) to access the private pointer, to skip CoW checks. This would require it to check capacity, so we'd need to expose capacity() from CowData (and String / CharStringT), so that CharBuffer can check for the need to resize by itself. All this obviously means it assumes sole ownership of the backing array. This is fine as long as the backing String / CharStringT is not handed out.
The type can then have a function like T finalize() in which it hands out the backing String / CharStringT, and resets its own state (so that it doesn't refer to the same String).
I think this should eliminate the copy on finalization, and improve overhead.
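To make the ownership hand-off concrete, here is a minimal sketch of the finalize() idea. It is not Godot code: std::basic_string stands in for String / CharStringT, and the class name OwningCharBuffer is hypothetical. It only models "the buffer solely owns the backing string until it hands it out"; the const_cast / capacity() part of the proposal has no equivalent here because std::basic_string is not copy-on-write.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the proposed design; std::basic_string<T>
// plays the role of Godot's String / CharStringT.
template <typename T>
class OwningCharBuffer {
    std::basic_string<T> _backing; // Solely owned until finalize() hands it out.

public:
    // Appends a single character directly to the backing string.
    void append(T p_char) {
        _backing.push_back(p_char);
    }

    size_t length() const { return _backing.size(); }

    // Hands out the backing string by moving it, instead of building a
    // second string from a terminated buffer, and resets the buffer so it
    // no longer refers to the storage that was handed out.
    std::basic_string<T> finalize() {
        std::basic_string<T> result = std::move(_backing);
        _backing.clear(); // Moved-from strings are valid but unspecified; empty it explicitly.
        return result;
    }
};
```

After finalize() the buffer is empty and can be reused, which mirrors the "resets its own state" requirement above.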
Let me know what you think :)
Do you mean use only ...
My initial feeling about replacing the ...
Regarding the ...
Keep the fixed array buffer!
I don't think it should involve much additional complexity, but I guess we'll find out!
That's true, I guess we don't expect the dynamic buffer to be used often.
Yes, something like ...
I did some benchmarking by changing the ...

    var d := {
        "key1": 1,
        "key2": "long string".repeat(1000),
        "key3": 1.2345,
        "key4": true
    }

On ...

    using StringType = std::conditional_t<std::is_same_v<T, char32_t>, String, CharStringT<T>>;
    ...
    const StringType as_string() {
        // Null-terminate in place, then copy the contents (terminator included) into a new string.
        T *current_buffer = _get_current_buffer();
        current_buffer[_length] = '\0';
        StringType result;
        result.resize(_length + 1);
        T *result_ptr = result.ptrw();
        memcpy(result_ptr, current_buffer, (_length + 1) * sizeof(T));
        return result;
    }

This raises the question - do we want ...
Changing to use ...

    void _reserve(size_t p_size) {
        if (p_size <= _capacity) {
            return;
        }
        // True when the content still lives in the fixed buffer and must be moved after growing.
        bool copy = !_dynamic_buffer && _length > 0;
        if (!copy) {
            _string.set_size(_length + 1);
        }
        _string.resize(p_size);
        _capacity = _string.capacity();
        _dynamic_buffer = _string.ptrw();
        if (copy) {
            memcpy(_dynamic_buffer, _fixed_buffer, _length * sizeof(T));
        }
    }

Summary of times/string validation details:
The performance difference was larger than I was expecting. I think at this point the big questions are what we want to do with string validation, and how much we want to expose/rely on the CowData implementation.
Thanks for all those extensive tests! I think they show the potential for a high performance API like this :)
Practically, we probably want a similar API like ...
What do you mean by this? I'd hope the coupling would be relatively low, except for the ...
Side ramble: Actually, thinking about this now, this whole ...
This change makes JSON.parse_string around 50% faster, and uri encode/decode methods around 2x faster, compared with gdscript below. ConfigFile.load and FileAccess.get_line are also included below just to check that using CharBuffer<> isn't slower than the previous implementations.
old:
new:
This change started out with looking into how to combine StringBuffer/StringBuilder, and I came across #77158, which was looking into a similar change. After trying out some ideas and seeing some of the benchmarks there, it felt like while the two classes are similar, they have distinct goals: StringBuffer is more focused on reducing/eliminating allocations for small strings, and StringBuilder is more focused on handling longer strings. Rather than trying to come up with an approach that works well for both cases, this PR attempts to trim down StringBuffer (now CharBuffer) to focus on the case of building small strings out of individual characters, and reuse the class in a few more places that can benefit from it. Going this route would also hopefully make optimizing StringBuilder easier in the future, without having to worry about trying to keep it fast in the cases where CharBuffer would be a better fit.
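For readers unfamiliar with the pattern, the "small strings out of individual characters" case can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: the class name SmallCharBuffer, the FIXED_SIZE parameter, and the use of std::vector / std::basic_string in place of Godot's containers are all hypothetical. The idea is that characters go into a fixed inline array first, and heap storage is only touched once that array is full.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of a templated character buffer with fixed inline
// storage and a heap fallback; works for char and char32_t alike.
template <typename T, size_t FIXED_SIZE = 64>
class SmallCharBuffer {
    T _fixed[FIXED_SIZE];    // Inline storage, no allocation for short strings.
    std::vector<T> _dynamic; // Used only once the fixed array is full.
    size_t _length = 0;

public:
    // Appends one character, spilling to the heap on the first overflow.
    void append(T p_char) {
        if (_dynamic.empty() && _length < FIXED_SIZE) {
            _fixed[_length++] = p_char;
            return;
        }
        if (_dynamic.empty()) {
            // First overflow: carry the inline content over to the heap.
            _dynamic.assign(_fixed, _fixed + _length);
        }
        _dynamic.push_back(p_char);
        _length++;
    }

    bool uses_heap() const { return !_dynamic.empty(); }
    size_t length() const { return _length; }

    // Copies the current content into a string of the matching character type.
    std::basic_string<T> as_string() const {
        const T *src = _dynamic.empty() ? _fixed : _dynamic.data();
        return std::basic_string<T>(src, _length);
    }
};
```

With a buffer such as SmallCharBuffer<char, 4>, appending "abcd" stays in the inline array, and the fifth character is what triggers the move to heap storage.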