Combine CharBuffer/StringBuffer classes, and use in more places #106596


Open

wants to merge 1 commit into master
Conversation

aaronp64
Contributor

  • Made StringBuffer character type templated, to support both char and char32_t buffers
  • Trimmed down implementation to focus on only appending one character at a time (other methods were unused)
  • Replaced CharBuffer class from file_access.cpp, and renamed StringBuffer to CharBuffer
  • Updated JSON parsing and String uri encoding/decoding to use CharBuffer

This makes JSON.parse_string around 50% faster and the uri encode/decode methods around 2x faster, measured with the GDScript below. ConfigFile.load and FileAccess.get_line are also included to check that using CharBuffer<> isn't slower than the previous implementations.

func _ready() -> void:
	run_test("json_parse")
	run_test("uri_encode_decode")
	run_test("load_config")
	run_test("read_lines")
	
func run_test(p_name : String):
	var start := Time.get_ticks_msec()
	call(p_name)
	var end := Time.get_ticks_msec()
	print("%s: %dms" % [p_name, end - start])

func json_parse():
	var d := {
		"key1": 1,
		"key2": "string value",
		"key3": 1.2345,
		"key4": true
	}
	var str := JSON.stringify(d)
	for i in 100000:
		var d2 = JSON.parse_string(str)

func uri_encode_decode():
	var query_params := "is%3Apr+is%3Aopen+label%3Atopic%3Arendering"
	for i in 200000:
		query_params.uri_decode().uri_encode()
		
func load_config():
	for i in 5000:
		var config := ConfigFile.new()
		config.load("res://icon.svg.import")

func read_lines():
	for i in 5000:
		var file_access := FileAccess.open("res://icon.svg.import", FileAccess.READ)
		while !file_access.eof_reached():
			file_access.get_line()

old:

json_parse: 254ms
uri_encode_decode: 417ms
load_config: 456ms
read_lines: 389ms

new:

json_parse: 156ms
uri_encode_decode: 182ms
load_config: 436ms
read_lines: 388ms

This change started with looking into how to combine StringBuffer/StringBuilder, and I came across #77158, which explored a similar change. After trying out some ideas and seeing some of the benchmarks there, it felt like the two classes, while similar, have distinct goals: StringBuffer is more focused on reducing/eliminating allocations for small strings, while StringBuilder is more focused on handling longer strings. Rather than trying to come up with an approach that works well for both cases, this PR trims down StringBuffer (now CharBuffer) to focus on the case of building small strings out of individual characters, and reuses the class in a few more places that can benefit from it. Going this route should also make optimizing StringBuilder easier in the future, without having to worry about keeping it fast in the cases where CharBuffer would be a better fit.

@aaronp64 aaronp64 requested a review from a team as a code owner May 19, 2025 14:06
Member

@Ivorforce Ivorforce left a comment


Thanks! De-duplicating code makes sense to me, and CharBuffer and StringBuffer seem like a good fit in principle.

However, converting CharBuffer to String by copying it again via String(get_terminated_buffer()) seems like a design flaw, since it's guaranteed to be slow. If we're re-designing this class, this should definitely be addressed.

Here's an idea:

Instead of LocalVector<T>, we store the data in a String or CharString (depending on T).
Appending through String's interface methods would introduce quite a lot of copy-on-write overhead, so CharBuffer will have to cheat a little and abuse const_cast<T *>(ptr()) to access the private pointer, skipping CoW checks. This requires it to check capacity itself, so we'd need to expose capacity() from CowData (and from String / CharStringT) so that CharBuffer can decide when to resize on its own. All this obviously means it assumes sole ownership of the backing array, which is fine as long as the backing String / CharStringT is not handed out.

The type can then have a function like T finalize() in which it hands out the backing String / CharStringT, and resets its own state (so that it doesn't refer to the same String).

I think this should eliminate the copy on finalization, and improve overhead.
Let me know what you think :)
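The finalize() idea above can be sketched in isolation. This is a minimal, hypothetical illustration using std::u32string as a stand-in for Godot's String (the real design would additionally bypass CoW via const_cast on ptr() and a capacity() check, which a std type doesn't need); none of these names are actual PR code.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: hand out the backing string without a copy,
// then reset state so the buffer no longer refers to the same storage.
class CharBufferSketch {
	std::u32string backing; // owned exclusively until finalize()

public:
	void append(char32_t c) { backing.push_back(c); }

	// finalize() is free (a move) if the backing buffer already exists.
	std::u32string finalize() {
		std::u32string out = std::move(backing);
		backing.clear(); // buffer stays valid and reusable afterwards
		return out;
	}
};
```

The key property is that the move in finalize() replaces the terminating copy that String(get_terminated_buffer()) performs today.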

@aaronp64
Contributor Author

Do you mean use only String/CharStringT, or keep the fixed array buffer, and only replace the LocalVector part? I think keeping the array is useful, as there are some cases where we're parsing numbers, and no allocations are done at all.

My initial feeling about replacing the LocalVector is the extra complexity would outweigh the performance gains, though I can try it out and compare. The cases where the LocalVector (or String/CharStringT) actually gets used should be rare - if it's happening enough to show up as a bottleneck, the FIXED_BUFFER_SIZE should probably be changed, or just not use CharBuffer. Though I guess when string types are being read in, we can't really know the length for sure. The choice of LocalVector was based on just using the fixed buffer in the common case, and being simple and "good enough" for when the string is longer than expected.
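The fixed-array-first design described above (no allocation at all for short strings such as parsed numbers, with a growable fallback only on overflow) can be sketched roughly as follows. This is a hedged stand-alone illustration, with std::vector standing in for LocalVector; the buffer size and all names are illustrative, not the PR's actual values.

```cpp
#include <string>
#include <vector>

// Illustrative sketch: append to a stack array until it overflows,
// then migrate once into a dynamic buffer (std::vector here, in place
// of LocalVector).
template <typename T, size_t FIXED_BUFFER_SIZE = 16>
class FixedFirstBuffer {
	T fixed[FIXED_BUFFER_SIZE];
	std::vector<T> dynamic; // used only after the fixed array overflows
	size_t length = 0;

public:
	void append(T c) {
		if (dynamic.empty() && length < FIXED_BUFFER_SIZE) {
			fixed[length++] = c; // common case: no allocation
			return;
		}
		if (dynamic.empty()) {
			// First overflow: copy the fixed contents across once.
			dynamic.assign(fixed, fixed + length);
		}
		dynamic.push_back(c);
		length++;
	}

	std::basic_string<T> as_string() const {
		const T *ptr = dynamic.empty() ? fixed : dynamic.data();
		return std::basic_string<T>(ptr, length);
	}

	bool allocated() const { return !dynamic.empty(); }
};
```

As long as FIXED_BUFFER_SIZE covers the common case, the dynamic path is rarely taken and its exact growth strategy matters little.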

Regarding the finalize method - the current CharBuffer implementation allows getting the current state of the string, then continuing to append to it, though this is currently unused. I was going to say finalize would make that case slower, but I think avoiding the copy when getting the first string state would cancel that out. It would be another step to handle that though if we want to support that case.

@Ivorforce
Member

> Do you mean use only String/CharStringT, or keep the fixed array buffer, and only replace the LocalVector part? I think keeping the array is useful, as there are some cases where we're parsing numbers, and no allocations are done at all.

Keep the fixed array buffer!

> My initial feeling about replacing the LocalVector is the extra complexity would outweigh the performance gains, though I can try it out and compare.

I don't think it should involve much additional complexity, but I guess we'll find out!

> The cases where the LocalVector (or String/CharStringT) actually gets used should be rare - if it's happening enough to show up as a bottleneck, the FIXED_BUFFER_SIZE should probably be changed, or just not use CharBuffer. Though I guess when string types are being read in, we can't really know the length for sure. The choice of LocalVector was based on just using the fixed buffer in the common case, and being simple and "good enough" for when the string is longer than expected.

That's true, I guess we don't expect the dynamic buffer to be used often.

> Regarding the finalize method - the current CharBuffer implementation allows getting the current state of the string, then continuing to append to it, though this is currently unused. I was going to say finalize would make that case slower, but I think avoiding the copy when getting the first string state would cancel that out. It would be another step to handle that though if we want to support that case.

Yes, something like copy() should be exactly as fast as what you have now, while finalize() should be free (if the backing buffer already exists). It should be a net win in any case.

@aaronp64
Contributor Author

I did some benchmarking by changing the json_parse dictionary to the below, to test what happens with longer strings:

	var d := {
		"key1": 1,
		"key2": "long string".repeat(1000),
		"key3": 1.2345,
		"key4": true
	}

On master, this took 13061ms. With the current PR code, it took 2258ms. Some of the time taken with the current PR code is due to String::append_utf32 validating each character - adding the below and using as_string() instead of get_terminated_buffer() reduced the time to 1810ms:

	using StringType = std::conditional_t<std::is_same_v<T, char32_t>, String, CharStringT<T>>;
	...
	const StringType as_string() {
		T *current_buffer = _get_current_buffer();
		current_buffer[_length] = '\0';

		StringType result;
		result.resize(_length + 1);
		T *result_ptr = result.ptrw();
		memcpy(result_ptr, current_buffer, (_length + 1) * sizeof(T));
		return result;
	}

This raises the question - do we want CharBuffer to validate the string, have it as an option, or no validation? The original StringBuffer didn't validate, while the json code updated by this PR did validate. The current PR code validates through String(str.get_terminated_buffer()).

Changing to use String/CharStringT inside CharBuffer reduces the time further to 1561ms. In addition to the changes you mentioned in your earlier comment, I also added CowData::set_size to manually set the size, without risking resize zeroing out/overwriting the characters. I think the current implementation doesn't actually require that, between p_ensure_zero and std::is_trivially_constructible_v<T> checks, but I'm not sure how much we want to rely on CowData implementation details from CharBuffer. There's also an expectation that CowData::resize will always choose capacity based on some growth factor, and not allocate an exact amount. If we resize with a growth factor from CharBuffer, we would have to resize down to the final length at the end, which could cause the same copy anyway. At least, this is based on what I currently have for _reserve when trying to implement this:

	void _reserve(size_t p_size) {
		if (p_size <= _capacity) {
			return;
		}

		bool copy = !_dynamic_buffer && _length > 0;
		if (!copy) {
			_string.set_size(_length + 1);
		}
		_string.resize(p_size);
		_capacity = _string.capacity();
		_dynamic_buffer = _string.ptrw();
		if (copy) {
			memcpy(_dynamic_buffer, _fixed_buffer, _length * sizeof(T));
		}
	}

Summary of times/string validation details:

| | master | current PR | PR with as_string | PR with String inside CharBuffer |
| --- | --- | --- | --- | --- |
| json_parse time | 13061ms | 2258ms | 1810ms | 1561ms |
| string validated? | no when StringBuffer was used, yes for other char32_t/String cases | yes | no | no |

The performance difference was larger than I was expecting. I think at this point the big questions are what we want to do with string validation, and how much we want to expose/rely on cowdata implementation.

@Ivorforce
Member

Thanks for all those extensive tests! I think they show the potential for a high performance API like this :)

> I think at this point the big questions are what we want to do with string validation

Practically, we probably want an API similar to String's, where you can append either fast (i.e. unchecked) or with some format in mind (e.g. append_utf8).
I'm not sure what the best approach is to avoid duplicating code. Perhaps we can start by doing no validation whatsoever, and making certain that users of CharBuffer only append valid string parts.
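The checked/unchecked split could look roughly like the sketch below. This is a hypothetical stand-alone illustration (std::u32string stands in for the backing storage, and the validity rule shown is the standard Unicode scalar-value check: no surrogates, nothing above 0x10FFFF); it is not code from the PR, which currently validates via String(str.get_terminated_buffer()).

```cpp
#include <string>

// Hypothetical sketch of the two append flavors: a fast path for
// callers that guarantee validity, and a checked path that rejects
// invalid Unicode scalar values.
class CheckedBuffer {
	std::u32string data;

public:
	// Fast path: caller promises the character is already valid.
	void append_unchecked(char32_t c) { data.push_back(c); }

	// Checked path: returns false (and appends nothing) on invalid input.
	bool append_checked(char32_t c) {
		const bool surrogate = c >= 0xD800 && c <= 0xDFFF;
		if (c > 0x10FFFF || surrogate) {
			return false;
		}
		data.push_back(c);
		return true;
	}

	const std::u32string &str() const { return data; }
};
```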

> how much we want to expose/rely on cowdata implementation.

What do you mean by this? I'd hope the coupling would be relatively low, except for the const_cast on ptr and the capacity() call to ensure no size differences.

Side ramble: thinking about this now, this whole CharBuffer idea ties in pretty well with another concern I had recently, where you may occasionally want high-performance single-reference access to a CowData type temporarily. For example, #105264 recently used a LocalVector as a temporary buffer, while switching to Vector later. While that was the right decision at the time, I still think it's kind of silly that it requires a finalizing copy.
Anyway, enough about that. It probably makes sense to move forwards with this PR for now, and keep the above mentioned ideas in mind for possible future work.
