-
Notifications
You must be signed in to change notification settings - Fork 245
Implement Syncback to support converting Roblox files to a Rojo project #937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit 03330cf.
kennethloeffler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on rojo syncback --list output -- it's confusing that something like
rojo syncback --input game.rbxl --dry-run --list | grep InterestingFiledoesn't work, and doubly confusing that why is not documented anywhere.
All other output seems informational/diagnostic, so I think those other messages should be written to stderr. That way stdout remains clear for any "real" output of the command, and normal redirection and piping will just work.
The --list output format is also rather unwieldy, because the user must either retain state or search backwards to determine whether lines represent writes or removals. I think I'd prefer a format more like this, where the first column is whether the line represents a write or removal, and the second column is the path:
Writing /path/of/an/addition
Writing /path/of/another/addition
Removing /path/of/a/removal
Removing /path/of/another/removal
Pipe output from `--list` to stdout
Accept new snapshots
|
Good call on making |
f01b603 to
84bf14c
Compare
Well, the time has come. Syncback is real. Over 300 commits and 9 months later, here we are.
I'd like to apologize. There are a lot of files in this pull request and most of them are for tests. I went back and forth about whether they should be included in the this PR or added in a second one, but ultimately I decided that someone would have to go through them either way so I may as well include them now. If you want them removed so that the code is easier to review, I can do so.
This is a general tech overview of what's changed, what's added, and how it works. It is't a detailed tech dive because I'm a firm believer in reading code, but this should serve as a primer for reading the code.
Overview
Syncback is a feature designed to solve the longstanding problem of converting Roblox files into Rojo projects. It works by accepting a Rojo project file and a Roblox file, then iterating through the file to match Instances to the Rojo project. Then, every Instance that's different has a middleware assigned to it that then has a method for converting the Instance back into the file type the middleware represents.
The middleware used is the old one (if an Instance is already present in the Rojo project) or a reasonable default depending upon the class of the Instance.
The path an Instance is written to is dependent upon where it is in project tree. So if you had a project like this:
{ "tree": { "DataModel": { "ReplicatedStorage": { "$path": "src/Shared" } } } }All children of
ReplicatedStoragewould be placed undersrc/Shared. If one or more of those children became a directory (such as aModuleScriptwith children becoming a directory withinit.luauin it), then descendants of that Instance would be placed under that directory. This is recursive for every path that exists within a project file.A direct consequence of this is that the tree of a project file controls syncback almost directly.
Change Summary
The following changes were made to Rojo to help support syncback. It does not cover syncback-specific additions, those will be later.
create_dirandcreate_dir_allfunctions tomemofsInstigatingSource::ProjectNodewas changed to be a struct variant instead of a tuple variantlibrojoBTreeMapinstead ofHashMapfor properties and attributesexamplefield now has an alias namedexamplesCow<str>instead of just&strinitfiles) now useMiddlewareas their backingrelevant_pathsof directories now only contains actually existent pathsMiddlewareis now stored in the metadata forInstanceMiddlewarenow displays in the web uiOf all of these, I have an explanation for all of them except for changing what type of variant
InstigatingSource::ProjectNodeis. In my defense, that change as made 7 months ago and it probably made sense at the time.You may have concerns with the
BTreeMapchange. I do too. I don't know what the performance hit of using aBTreeMapis at scale because I don't have any tests that uses projects, json models, or metadata files enough to stress test it. The testing I did seems to indicate it has a minimal impact because of how infrequently they're used though; at most it adds a bit of startup time for the server. A change of this nature is necessary in some capacity because the order of these maps has to be deterministic for syncback to function well.I'm happy to answer any other questions you have on the changes.
Syncback Additions
There are three main parts to syncback:
SyncbackSnapshotand the relatedSyncbackReturnstructssyncbackmethod forMiddlewareitemsThroughout the code for syncback you will see that I've used the terms 'old' and 'new'. In all cases, the term 'old' refers to the Rojo project and the term 'new' refers to the Roblox file. So e.g. the 'new dom' refers to the DOM from the Roblox file. I can change this if you'd like, but it's pervasive throughout the code and I think it's cleaner than 'project' and 'roblox' or whatever.
A
SyncbackSnapshotcontains data for performing syncback on a specific Instance. It's utilized both by the syncback method for eachMiddlewareas well as the core syncback loop. You'll want to read the code for exact details, but in summary these snapshots contain data that's global for syncback (things like a reference to a VFS) and data for syncing back a specific Instance from the Roblox file. This includes a path to that it should be written to and the Instance it corresponds to from the Rojo file.A
SyncbackReturncontains the actual results of running syncback on aSyncbackSnapshot. It's returned from the syncback methods for eachMiddleware, as you may expect. It includes aFsSnapshotthat contains the serialized form of the Instance (including any meta.json files), a set ofSyncbackSnapshotsto run syncback over (this is used by directories and projects), and a list of paths that should be removed by Syncback. These are consumed by the core syncback loop to know what changes to make to the file system as well as to continue the syncback process.The bulk of the work is done in the individual syncback methods for each Middleware. These are stored in the (now-misnamed)
snapshot_middlewaresubmodules along with the snapshot functions for each middleware. They accept aSyncbackSnapshotand return aSyncbackReturn. Using the information contained within the passedSyncbackSnapshot, eachMiddlewarereserializes the provided Instance and if necessary returns a list of other Instances that need to have syncback ran on them. The vast majority ofMiddlewarevariants have a very simple syncback implementation. Directories and project files are the exception, so they're documented a bit more below.The core loop is essentially a queue of
SyncbackSnapshots that's extended by pulling the returned snapshots out of eachSyncbackReturn. The loop does some comparisons to skip identical Instances, checks that a given snapshot is allowed to be written (it may be blocked by one of the syncback rules), and then determines aMiddlewareto use for each snapshot before running the syncback method for theMiddleware.The core loop assumes that the root of the project file and the root of the Roblox file are the same thing, so the loop is able to get started by constructing a
SyncbackSnapshotthat contains the roots.This sounds simple because it is. By delegating most of the logic to the individual middleware implementations, the core design of syncback is relatively straightforward to follow.
Instance Matching
Syncback for a directory or project file has to perform matching between the Rojo project and Roblox file rather than relying upon
SyncbackSnapshotlike other middleware. This is because they need to generate the snapshots for their children and project nodes respectively. For directories, this also allows children that are present on the Rojo project but not in the Roblox file to be removed from the file system, which is an important quality of life feature for syncback.Instance matching is done by name, since the file system requires unique names unlike Roblox. This runs into some problems with generic Instance names (such as 'Tree' or 'Part') so it isn't perfect. Rojo's project format also doesn't support multiple models having the same name but both being in the same place on the file system. Unless we want to adopt a method of this nature (multiroot models, multiple paths for a folder, or whatever), matching by name is going to have to work.
This is a surprisingly complex problem but both
DirandProjectmiddlewares have a fair few comments describing the implementation as a result. Thus, I will simply encourage you to read the implementation and then ask me questions when you have them. It will be easier that way.Instance Comparisons
Syncback's core loop compare Instances from the
SyncbackSnapshotand skips if them and all of their descendants are identical. This is done for UX more than anything, because it makes for a bad experience if files are rewritten when they don't have to be. Syncback can't just naively skip if two Instances are the same (identical name, classname, and properties map) because it's is designed such that parents initiate syncback for their children.To quickly tell if two Instances are identical, there's a variety of ways you could handle it. The easiest one is to just compare two Instances' names, class name, and properties. This works well, but it doesn't work for comparing subtrees. To do that, you'd potentially have to compare infinite Instances from each tree and match them by name and class and... Well, it gets out of hand. So, rather than doing that syncback needed a way to quickly and efficiently compare entire subtrees, regardless of the number of descendants they had.
This problem was solved by hashing and comparing the hashes of two different Instances. At a glance, this doesn't seem to solve the descendant comparison problem, but by making it so that each parent incorporates the hashes of its children, it makes it so that hashes are only identical if and only if their descendants are also identical. All it takes to accomplish this is by hashing a DOM depth-first and then making it so that a parent incorporates the hashes of its children during the hashing process. This has the consequence of making it so you have to frontload calculating hashes, but it's relatively quick so it isn't a big deal. Syncback precomputes hashes before the core loop even begins so that there's only an initial overhead.
The current hashing algorithm used is Blake3. There's no real reasoning behind this except that Blake3 is used by
rbx-typesalready for hashingSharedString, so it doesn't add a direct dependency. Really the only requirement is that the hash be reasonably quick (not cryptographic), and to that end something like XXH3 or GxHash may be more appropriate, but that's mostly hypothetical. The hashing process is by no means a bottleneck for syncback.An interesting note is the prospect of hashing floating point numbers. As most Roblox data types use floats, it's inevitable that we run into imprecisions. To try to mitigate this, floats are rounded somewhat (literally via:
((x * 10.0).round() / 10.0)) before hashing, but that only solves minor imprecisions. I do not have a solution for larger ones. In practice, this has proven to not be a big deal for Adopt Me and other tests, but it's still worth noting as a potential problem.