Skip to content

Conversation

@krakow10
Copy link
Contributor

@krakow10 krakow10 commented May 20, 2025

Closes #515.

Here's a draft of what it could look like to drop the ustr sledgehammer in favour of fine-grained memory control. This addresses all my concerns in #515, and is the culmination of my exploration efforts there. There is only enough code here to run rbx_binary benches, but the rest of the code for rbx_xml, rbx_util, rbx_reflector, etc. can be adapted from my raw-str branch krakow10/rbx-dom@1482a07...raw-str, which is where most of this code came from as well.

Here are just the commits for this PR on top of the five prerequisite PR dependencies:
branch krakow10/rbx-dom@prereq...lifetimes

Overview:

  • Drop Ustr completely in favour of &'a str for properties and class names
  • Add lifetimes to WeakDom and Instance for properties
  • Introduce DecodeOptions to rbx_binary with an optional string interner

Performance:
Deserialization is within 2%, but Serialization performance seems to be significantly (40%) slower, which I don't understand why. Performance vs master branch is now at +17% for both serialization and deserialization.

There are three supported use cases:

  1. Static lifetime. Ignore or deny invalid properties, get WeakDom<'static>:
let dom = rbx_binary::from_reader(reader, DecodeOptions::ignore_unknown())?;
let dom = rbx_binary::from_reader(reader, DecodeOptions::error_on_unknown())?;
  1. File lifetime. Allow invalid properties by referencing DecompressedFile binding, required to outlive WeakDom<'dom>:
let file = DecompressedFile::from_reader(reader)?;
let dom = file.deserialize(DecodeOptions::read_unknown())?;
  1. Custom lifetime. Allow invalid properties using a custom string interner, enabling the DecompressedFile data to be immediately dropped to free its memory, also freeing the interner's memory when it is dropped:
let host = bumpalo::Bump::new();
let mut index = HashSet::new();
let interner = |str: &str|{
	match index.get(str) {
		Some(interned) => interned,
		None => {
			// intern a new string
			let interned = host.alloc_str(str) as &str;
			index.insert(interned);
			interned
		}
	}
};
let dom = rbx_binary::from_reader(reader, DecodeOptions::read_unknown_with(interner))?;

@Dekkonot
Copy link
Member

I haven't commented on this because I was curious what the results would be. I felt it unlikely that it would improve things much, but I as curious.

Given that the results are unimpressive or just downright bad, I am going to put a pin in this and say thank you for the exploration but we aren't interested in doing this. We just adopted Ustr, and it's a much better solution than attaching lifetimes to everything just from a DX standpoint.

The memory footgun you ran into with the string internment is something we can work to make more obvious or safer (I would be open to a method for Instance that tried to get a property with a string but did not allocate any new Ustrs). But we are not dropping Ustr.

@krakow10
Copy link
Contributor Author

Given that the results are unimpressive or just downright bad

Is this referring to the serialization performance dropping or the developer experience when managing lifetimes? I assume the latter. I figured as much and was prepared for this to be rejected, but I thought it was worth a shot with how far it's come. Thanks for taking a look.

@Dekkonot
Copy link
Member

Is this referring to the serialization performance dropping or the developer experience when managing lifetimes?

Both! If it doesn't make a difference for deserializing, make serialization performance worse, and makes the DX worse I think it's obvious that we can't do it.

@krakow10
Copy link
Contributor Author

krakow10 commented May 22, 2025

I expect that the performance issue can be addressed, but exposing lifetimes seemed like a dealbreaker which is why I didn't look into the performance issue. I was never doing this for speed, it was for memory control. The next best thing to dropping ustr is replacing rbx_dom_weak with rbx_dom_strong, and I've already put in some work there #92 to generate the types. I can't seem to get rbx-dom out of my head, so see you around!

@krakow10
Copy link
Contributor Author

krakow10 commented May 30, 2025

The performance issue affects the serializer in general, and is addressed in #544 and added to the dependencies of this PR. I wanted to prove that the performance issue was not caused by these changes, it was merely magnifying an existing bottleneck in the serializer. Here is the performance uplift for this PR from rebasing onto #544:
image

After rebasing on top of #544, this PR is +17% faster than the master branch in both serialization and deserialization. However, it is -18% behind #544 in serialization without this on top, so there may be another serializer improvement waiting to be found. I tested against both master branch and the prereq branch, this PR got +17% vs master and +15% vs prereqs for "Miner's Haven/Deserialize". Not sure why there was no deserialization uplift was when I was testing before, but we take those.

Now dropping ustr is a pure performance win and memory control benefit compared to master branch, so if the performance was the dealbreaker then that is no longer an issue.

@krakow10
Copy link
Contributor Author

I've been considering whether to put more effort into this, but I think my effort is better spent on #92. Closing in favour of working on that.

@krakow10 krakow10 closed this Dec 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ustr Concerns & Footguns

2 participants