-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[taskchampion] Drop old versions on the sync server #3
Comments
I'd like to take a crack at this. I want to make sure I have the right understanding of what's wanted, though. Currently, a snapshot is an opaque, possibly encrypted, blob of data understood by the client, but not the server. The snapshot contains the current state of the user's tasks at the time of snapshotting, i.e. all non-deleted tasks and associated data, not including their modification history. A "version" a set of modifications to the task database, e.g. "two tasks were added, one was delete, one was modified"? The goal is to periodically delete versions that are "too old" -- possibly as defined in the server config -- if they are also older than the most recent snapshot. If "too old" is two weeks and the most recent snapshot was one week ago, one week of prior versions would be retained. Looking at the sqlite storage backend as a reference, I am thinking:
Questions:
|
You've got the context pretty much correct. The snapshot is always encrypted, and also contains tasks of status Deleted. One piece that's missing, and that guides setting the parameters, a client can only sync if there are enough versions on the server, where "sync" means to combine local changes on the client with changes on the server. If it can't do that, then it just downloads the latest snapshot and any local changes on the client are lost. "Enough versions" is a little complicated. The client stores the ID of the last version it saw on the server. When it begins a sync operation, the child of that version must still exist on the server. Support for this already exists (at least in Taskchampion; I don't recall if the "hey, you need to re-initialize" message is implemented in Taskwarrior). Deleting on sync seems reasonable. If it becomes too much load, we could make it probabalistic, with the probability a tunable parameter. |
Here's a high-level proposal:
Does that make sense? It seems like this path is simple and lets us postpone some of the more difficult questions until step (3) while still solving most of the use cases. |
I think there may be some confusion about "client" in this case. A "client id" represents a task database on the server, and is the same for all replicas connected to that task database. For example, my laptop, desktop, and phone would all use the same client id. I think what you mean by "..of any client" is really "..of any replica", for example the oldest Also note that encryption keys are per-client, not per-replica. That is, my laptop, desktop, and phone all share the same encryption key. We should have a process to handle a compromised key, but I think it would involve migrating all replicas to a new client_id (and new encryption key). How would users generate replica id's? If they use the same replica id on two replicas, what happens? What if we didn't store replica_id's, and just implemented a variant of (3) -- deleting versions when they are old enough that every actively-used client should have seen them (on the scale of months). Would that be simpler for the user? Would it be less effective? |
Ah, I was in fact under the impression that devices (replicas) were clients with their own encryption keys. My whole idea was based on thinking the server already knew the state of the replicas. I'll have to give some more thought to whether maintaining that data is worth the trouble.
This doesn't strike me as a great security story. Maybe it's daunting but would it be worth considering per-replica keys? I would argue that most people don't roll keys but they do roll devices. I recall at least one person reporting that their team uses bugwarrior to aggregate their issues into a single synchronized taskwarrior database. I believe the entire team would be one "client" under the new model, so when somebody leaves the company everybody on the team would need to roll the key.
One option would be to reject the initial sync if it's not unique. Another option would be to not have human-readable uuid's and allow the human-readable device name to be a duplicate. This seems to be the way chat clients like Signal or Element work, though I've never like that UX myself because it's impossible to know what device you're verifying/revoking without having the device in order to check it's key.
I don't see a strategy like this having so much consensus. How eagerly you want potentially lossy cleanup operations to run depends on a combination of the storage space you have available, the rate at which you're adding task data, and the value of your task data. For example, a user adding their tasks manually is going to value their data a lot more than somebody using taskwarrior strictly as an issue aggregator with bugwarrior. The simplest thing for the user is if taskwarrior could mostly clean up after itself without the possibility of loss and the user could configure more aggressive cleanup if and when they encounter a need to do so. |
Let's split off the issue of key rotation. I don't know of a simple mechanism for each replica to have a different key but still be able to exchange information with other replicas. I think it's impossible to clean up anything and ensure no data loss: a replica may just stop sync'ing, and it's impossible for the server to know if that replica will come back someday or is gone forever. You suggested addressing this by deleting a replica when its last sync was "old enough", and I think that strategy would make a lot of sense for a user: imagine I dug an old laptop out of the closet and turned it on for the first time in over a year. As a user, I want that laptop to get back up to date, but I don't need any task updates that were sitting on that laptop, un-sync'd for over a year, to appear -- the utility of those changes has long ago faded. The expiration duration can be configurable, so a server with limited space might expire versions after a month, for example. My variant of (3) accomplishes a similar thing (if a replica doesn't sync for a long time, it will have to reset to a snapshot), but without storing replica_id's. |
Under my proposed model, replica's would know when all the replica's had last been updated. An alternative to auto-deletion would be to notify users when a certain threshold of age is reached, either by adding an urgent task or a separate UI mechanism.
Imagine I'm using the sync server mostly for backup or I stop regularly using two devices and become a single-device user. For some reason the sync server stops working -- maybe the configuration is broken or the service daemon didn't restart it when it crashed, etc. I don't notice that the sync server is down for over a year, but when I fix it everything I've done for the last year on my laptop gets deleted.
I don't think your variant is unworkable, but it seems like there might be a number of user stories we need to take into account and come up with solutions for. From an ease-of-use perspective, it seems like with your variant users will need to consider and decide what kind of user they are at the time they set up the task server. This involves not only acquiring some understanding of how task synchronization works but also predicting what kind of user they will remain in the future. With my proposal, users can "lazily" learn about this issue when it confronts them as an actual problem -- either running out of storage space or getting warned that a device hasn't been seen for a long time. And they choose whether to manually address the issue by deleting the offending replica or whether to automate this process for the future. Many users will never be confronted with it at all. Another issue is that the sync server is designed to support multiple clients and it seems like all clients will have to have the same configuration. From this "hosted taskwarrior" perspective, it seems like the server would want the ability to delete old versions based solely on the total size of a client's database. Maybe that would be better split off into a separate issue. My concern though is that your "simple" variant will provide a sub-optimal but "good enough" solution to all the user stories, complacency will set in, and we may not consider a more nuanced solution that might be easier to implement at an earlier stage of development. |
We should definitely have a user consent prompt before resetting local state to a snapshot, so everything on your laptop wouldn't be deleted automatically! I think this is currently an error message suggesting deleting the local task DB. That UX could be better, but at least the consequences are clear and the user can choose. Do you want to make a more precise model of your proposal? My concerns are in the details, so having those details drawn out would help. |
Here's a more detailed and updated breakdown of my proposal above: Step 1: guaranteed-lossless version deletions on sync
{
purge_versions();
Ok(
...
async fn purge_versions() -> None {
for (uuid) in versions.sort_by(oldest) {
// We can't delete this version if it's newer than the newest replica.
if uuid == snapshots.sort_by(newest).version_id {
// Technically this version isn't needed but this is probably the simplest way to set a stopping condition without a more robust tracking of timestamps.
break;
}
// We can't delete this version if it's needed by a replica.
for (_, base_version) in replicas {
if server.get_child_version(base_version) == uuid {
// This is the next version needed by this replica.
break;
}
}
// Otherwise, delete it.
versions.delete(uuid);
}
} Step 2: add a command to manually delete a replica
# TODO Change the data model to make this more performant!
fn approximate_version_age(base_version: UUID) -> Timestamp {
let found_op = false;
for (op) in Txn.operations() {
if op.uuid == base_version {
found_op = true;
}
if found_op && let ReplicaOp::Update(update) = op {
update.timestamp
}
}
}
Step 3: automatic replica deletion based on age
for (replica_uuid, age) in list_replicas() {
if age > config.get("replica_max_age") {
delete_replica(replica_uuid);
}
} |
Nice! Can you add some detail on how replica UUIDs are registered, and how old replicas are expired? |
I edited my comment to specify that replicas generate their own UUID on initial sync. There isn't really a special "registration" story on the server side because the server will store anything it's given in a |
Thanks!
|
Key revocation for the purposes of authorization (a separate concern from encryption) is a real concern but maybe should be spun off into a separate issue that can be fixed later? I don't think this makes the situation worse than present -- if an attacker gets the client_id and encryption_secret they can add as many replicas as they want. And it would be nice to have a better idea where GothenburgBitFactory/taskchampion#371 is going because that would also change the way these id's/keys are used anyway and could make signatures an easy solution.
Yes, that's correct!
Yeah, I figured there were a few different ways we could change the data model to make that performant and that you'd have a better idea which was best. Regardless of the implementation, the timestamp needs to be tied to the timestamp of the version itself, not the time when the replica received it. For example, a replica on a poor internet connection might only receive a few very old operations now and then but be very far behind -- we shouldn't give it a timestamp of "today" just because it received a version today. Maybe the simplest approach would be to add a |
Sorry, I was indeed crossing two different issues there. Still, worth considering the overlap! What's the motivation to tie a version's timestamp to the timestamp of an operation within it? Is there an advantage over simply adding a timestamp for the version when the version is created? |
The motivation of the current proposal is that it puts all the work on the client side without adding complexity to the overall system, but I think we'd agree that there are better options. I'll post an updated proposal momentarily. |
Phase One: guaranteed-lossless version deletions on sync
{
purge_versions();
Ok(
...
async fn purge_versions() -> None {
for (uuid) in versions.sort_by(oldest) {
// We can't delete this version if it's newer than the newest replica.
if uuid == snapshots.sort_by(newest).version_id {
// Technically this version isn't needed but this is probably the simplest way to set a stopping condition without a more robust tracking of timestamps.
break;
}
// We can't delete this version if it's needed by a replica.
for (_, base_version) in replicas {
if server.get_child_version(base_version) == uuid {
// This is the next version needed by this replica.
break;
}
}
// Otherwise, delete it.
versions.delete(uuid);
}
} Phase Two: add a command to list replicas and the timestamp of their base_version
Phase Three: add a command to manually delete a replica
Phase Four: automatic replica deletion based on age
for (replica_uuid, age) in list_replicas() {
if age > config.get("replica_max_age") {
delete_replica(replica_uuid);
}
} |
Apologies for the wait while I found some time to consider this. I think this sounds like a good model! Getting version timestamps from the We could support this for the cloud-storage-backed model, too, by for example storing each replica's latest version in an object named Anyway, this is all just thinking about how this fits into other ideas. The proposal sounds good :) |
I updated the summary of this issue with the latest status -- generalizing a little to cover cloud storage as well as the sync server. |
(Summary updated Jan 2024)
The sync model results in a linked sequence of versions on the server (whether taskchampion-sync-server or on a cloud service). To avoid that history growing forever, there's some cleanup (in
taskchampion/taskchampion/src/server/cloud/server.rs
for cloud services) that applies a date-based heuristic to delete old versions. That heuristic is trading use of extra space against the possibility that a replica that hasn't synced in a loooong time will find itself unable to sync, because the last version it saw has been deleted.The proposal here involves tracking all replicas and the last version they saw. The downside to this proposal is that it adds some required UI for users to manage replicas.
See GothenburgBitFactory/taskchampion#311 for more background.
The text was updated successfully, but these errors were encountered: