✨ Track record/account delete and update data in subject status #2804

Merged: 13 commits merged into ozone-account-events from ozone-track-record-delete-and-update on Oct 30, 2024

@foysalit foysalit commented Sep 9, 2024

This PR adds three new properties to the moderation subject status: recordUpdatedAt, recordDeletedAt, and recordStatus. They are populated as account/record events are received, and they let moderators query subject statuses by delete/update time range and by record status.
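To make the intended use concrete, here is a minimal in-memory sketch of the kind of query the new properties enable. Only recordUpdatedAt, recordDeletedAt, recordStatus, and the recordDeletedAfter parameter are named in this PR; the SubjectStatus and StatusFilter shapes, the recordDeletedBefore parameter, and the matchesFilter helper are assumptions for illustration, not the actual ozone query code.

```typescript
// Hypothetical, trimmed-down subject status; only the three record* fields
// are named in this PR, everything else is assumed for illustration.
type SubjectStatus = {
  subject: string
  recordUpdatedAt?: string // ISO 8601 datetime
  recordDeletedAt?: string // ISO 8601 datetime
  recordStatus?: string
}

// Hypothetical filter shape, not the actual query parameters.
type StatusFilter = {
  recordDeletedAfter?: string
  recordDeletedBefore?: string
  recordStatus?: string
}

function matchesFilter(row: SubjectStatus, filter: StatusFilter): boolean {
  if (filter.recordStatus !== undefined && row.recordStatus !== filter.recordStatus) {
    return false
  }
  const after = filter.recordDeletedAfter
  const before = filter.recordDeletedBefore
  if (after === undefined && before === undefined) return true
  // A subject with no (or an unparseable) deletion time cannot fall in a range.
  const deletedAt = row.recordDeletedAt ? Date.parse(row.recordDeletedAt) : NaN
  if (isNaN(deletedAt)) return false
  if (after !== undefined && deletedAt <= Date.parse(after)) return false
  if (before !== undefined && deletedAt >= Date.parse(before)) return false
  return true
}

const sampleRows: SubjectStatus[] = [
  { subject: 'did:example:alice', recordStatus: 'deleted', recordDeletedAt: '2024-10-05T12:00:00.000Z' },
  { subject: 'did:example:bob', recordStatus: 'deleted', recordDeletedAt: '2024-08-01T00:00:00.000Z' },
  { subject: 'did:example:carol', recordStatus: 'deactivated' },
]

// Subjects deleted during October 2024.
const deletedInOctober = sampleRows.filter((row) =>
  matchesFilter(row, {
    recordStatus: 'deleted',
    recordDeletedAfter: '2024-10-01T00:00:00.000Z',
    recordDeletedBefore: '2024-11-01T00:00:00.000Z',
  }),
)
console.log(deletedInOctober.map((row) => row.subject))
```

Timestamps are parsed with Date.parse rather than compared as strings, so the sketch does not depend on every datetime using the same precision or offset.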

type: 'string',
format: 'datetime',
description:
'Last update timestamp of the record the subject is associated with',
Contributor Author

I feel like the language here is quite convoluted. Would love suggestions to make it clearer.

Collaborator

Perhaps something along the lines of: "Timestamp at which the record or account was last updated."

@@ -11127,7 +11145,7 @@ export const schemaDict = {
type: 'object',
description:
'Logs account status related events on a repo subject. Normally captured by automod from the firehose and emitted to ozone for historical tracking.',
- required: ['timestamp', 'status', 'active'],
+ required: ['timestamp', 'active'],
Contributor Author

TODO: if this PR doesn't make it into the main branch, we'd wanna port this change over to the parent PR.

// When deactivated accounts are re-activated, we receive the event with just the active flag set to true
// so we want to make sure that the recordStatus is not set to an outdated value
if (currentStatus?.recordStatus !== 'active' && event.meta?.active) {
status.recordStatus = 'active'
Contributor Author

The default for recordStatus is null; maybe we should fall back to that? Although setting it to 'active' may be helpful if we want to see which accounts were reactivated.

Collaborator

I'd recommend reflecting the exact hosting status here (where active is either null or 'active', slight preference for the latter), and then if we want to keep track of reactivations we do so with a separate field such as reactivatedAt. That allows the status to not be dependent on past statuses, and if we want to know if something is active we don't need to check for multiple different values.

Contributor Author

Yeah, the null value basically means "we haven't tracked any hosting status for this", so I think the safer option is to check for the actual deactivated value of the status.
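The approach discussed in this thread could be sketched roughly as follows: the status is derived from the incoming event alone, and reactivation is tracked in a separate reactivatedAt field. The HostingState and AccountEventMeta shapes and the nextHostingState helper are hypothetical names for illustration, and the 'unknown' fallback is an assumption about how unrecognized values might be handled.

```typescript
type HostingStatus =
  | 'active'
  | 'takendown'
  | 'suspended'
  | 'deleted'
  | 'deactivated'
  | 'unknown'

// Hypothetical stored state for an account subject.
type HostingState = {
  status: HostingStatus
  updatedAt: string
  reactivatedAt?: string
}

// Hypothetical, simplified account event metadata.
type AccountEventMeta = { active: boolean; status?: string }

const INACTIVE_STATUSES: readonly string[] = [
  'takendown',
  'suspended',
  'deleted',
  'deactivated',
]

function nextHostingState(
  current: HostingState | undefined,
  meta: AccountEventMeta,
  timestamp: string,
): HostingState {
  // The status depends only on the event, never on the previous status.
  let status: HostingStatus = 'unknown'
  if (meta.active) {
    status = 'active'
  } else if (meta.status !== undefined && INACTIVE_STATUSES.indexOf(meta.status) !== -1) {
    status = meta.status as HostingStatus
  }

  const next: HostingState = { status, updatedAt: timestamp }
  // Keep any earlier reactivation time unless this event is a new reactivation.
  const previousReactivation = current ? current.reactivatedAt : undefined
  if (status === 'active' && current !== undefined && current.status === 'deactivated') {
    next.reactivatedAt = timestamp
  } else if (previousReactivation !== undefined) {
    next.reactivatedAt = previousReactivation
  }
  return next
}

const deactivatedState = nextHostingState(
  undefined,
  { active: false, status: 'deactivated' },
  '2024-09-01T00:00:00.000Z',
)
const reactivatedState = nextHostingState(
  deactivatedState,
  { active: true },
  '2024-09-10T00:00:00.000Z',
)
console.log(reactivatedState.status, reactivatedState.reactivatedAt)
```

With this shape, "is the account active?" is a single comparison against 'active', and "was it ever reactivated?" is a check on reactivatedAt.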


if (event.action === 'tools.ozone.moderation.defs#accountEvent') {
const status: Partial<ModerationSubjectStatusRow> = {
recordStatus: `${event.meta?.status}`,
Collaborator

Based on the types here, as written this could end up containing the string 'undefined'.
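A small sketch of the pitfall, assuming loosely typed event metadata: a template literal stringifies whatever it is given, so a missing or non-string value becomes the literal text 'undefined' or 'true'. The EventMeta type and both helper names are hypothetical; a typeof guard is one possible way to avoid the coercion.

```typescript
// Hypothetical shape: metadata values are loosely typed.
type EventMeta = Record<string, string | boolean | undefined>

// Mirrors the pattern under review: always yields a string, even for
// missing or non-string values.
function coerceWithTemplate(meta?: EventMeta): string {
  return `${meta?.status}`
}

// Guarded alternative: only accept real strings, otherwise fall back to null.
function readStatus(meta?: EventMeta): string | null {
  const value = meta?.status
  return typeof value === 'string' ? value : null
}

console.log(coerceWithTemplate({})) // logs the string "undefined"
console.log(readStatus({})) // logs null
console.log(readStatus({ status: 'deactivated' })) // logs "deactivated"
```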

}

if (event.meta?.tombstone) {
status.recordStatus = 'tombstoned'
Collaborator

I see a couple values for recordStatus in here that aren't reflected in the lexicon.

Collaborator

(P.S. is this separate from 'deleted'? Historically we have used the term "tombstoned" in a couple different ways, and I'm not sure which one is meant here.)

Contributor Author

To my understanding, the tombstoned event is only emitted on identity events, and it is "technically" equivalent to "deleted" but not exactly the same thing.

Comment on lines 134 to 148
"recordUpdatedAt": {
"type": "string",
"format": "datetime",
"description": "Last update timestamp of the record the subject is associated with"
},
"recordDeletedAt": {
"type": "string",
"format": "datetime",
"description": "Timestamp referencing when the record the subject is associated with was deleted"
},
"recordStatus": {
"type": "string",
"description": "Status of the record the subject is associated with. Statuses are different when the subject references an account vs. a record",
"knownValues": ["takendown", "suspended", "deleted", "deactivated"]
},
Collaborator

These names work alright if the subject is a record, but if the subject is a repository or chat message, using the term "record" here doesn't seem like the best fit. Would there be any issue with using something like subjectUpdatedAt/subjectDeletedAt instead?

Contributor Author

I guess I was using record as a generic term here, without the atproto sense of records in mind. I'm not sure about the subject prefix either: since we already have an updatedAt column on subject, we would end up with subject.updatedAt and subject.subjectUpdatedAt, which feels odd. I think I'm on board with the term hosting, just from a technical standpoint.

Comment on lines 144 to 148
"recordStatus": {
"type": "string",
"description": "Status of the record the subject is associated with. Statuses are different when the subject references an account vs. a record",
"knownValues": ["takendown", "suspended", "deleted", "deactivated"]
},
Collaborator

This is quite a complex field; I think there are a few potential issues with it:

  1. It's called "record status" but it may contain the hosting status of an account.
  2. The hosting statuses of records and accounts are mixed together, which could make it seem like a record can be marked as takendown here, which isn't the case.
  3. Potential confusion between the ozone status and the hosting status: the terms "takendown" and "suspended" are also terms used by ozone overlapping with the takendown and suspendedUntil fields.

Here's one idea: we could combine the three new fields into something more like this (using typescript just to illustrate the concept):

type SubjectStatusView = {
  // ...
  hosting?: AccountHosting | RecordHosting
}

type AccountHosting = {
  $type: 'tools.ozone.moderation.defs#accountHosting',
  status: 'active' | 'takendown' | 'suspended' | 'deleted' | 'deactivated'
  createdAt: Date
  updatedAt: Date
  deletedAt?: Date
  suspendedAt?: Date
  deactivatedAt?: Date
  takendownAt?: Date
  reactivatedAt?: Date
}

type RecordHosting = {
  $type: 'tools.ozone.moderation.defs#recordHosting'
  status: 'active' | 'deleted'
  createdAt: Date
  updatedAt: Date
  deletedAt?: Date
}

This also gives us a way to make certain fields required together if we want, e.g. if hosting is present then hosting.createdAt can always be present too.
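As a rough illustration of how a consumer might use such a union, the sketch below narrows on $type so that account-only fields are reachable only in the account branch. The two types are trimmed copies of the proposal above; the describeHosting helper and the sample value are hypothetical.

```typescript
type AccountHosting = {
  $type: 'tools.ozone.moderation.defs#accountHosting'
  status: 'active' | 'takendown' | 'suspended' | 'deleted' | 'deactivated'
  createdAt: Date
  updatedAt: Date
  deletedAt?: Date
  deactivatedAt?: Date
  reactivatedAt?: Date
}

type RecordHosting = {
  $type: 'tools.ozone.moderation.defs#recordHosting'
  status: 'active' | 'deleted'
  createdAt: Date
  updatedAt: Date
  deletedAt?: Date
}

// Hypothetical helper: summarize the hosting state of a subject.
function describeHosting(hosting?: AccountHosting | RecordHosting): string {
  if (!hosting) return 'hosting status not tracked'
  if (hosting.$type === 'tools.ozone.moderation.defs#accountHosting') {
    // Narrowed to AccountHosting: deactivatedAt is available here.
    if (hosting.status === 'deactivated' && hosting.deactivatedAt) {
      return `account deactivated at ${hosting.deactivatedAt.toISOString()}`
    }
    return `account is ${hosting.status}`
  }
  // Narrowed to RecordHosting: only 'active' and 'deleted' are possible.
  if (hosting.deletedAt) {
    return `record deleted at ${hosting.deletedAt.toISOString()}`
  }
  return `record is ${hosting.status}`
}

const sampleRecordHosting: RecordHosting = {
  $type: 'tools.ozone.moderation.defs#recordHosting',
  status: 'deleted',
  createdAt: new Date('2024-09-01T00:00:00.000Z'),
  updatedAt: new Date('2024-10-01T00:00:00.000Z'),
  deletedAt: new Date('2024-10-01T00:00:00.000Z'),
}
console.log(describeHosting(sampleRecordHosting))
```

Because the record variant has no takendown or suspended status, the type checker rejects code that treats a record as taken down, which addresses the second concern in the list above.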

Contributor Author

this is nice!

@@ -28,6 +28,31 @@
"format": "datetime",
"description": "Search subjects reviewed after a given timestamp"
},
"recordDeletedAfter": {
Collaborator

My qualm with these fields is just that they are named "record" when they really may apply to any subject, e.g. an account rather than a record.

Comment on lines 4 to 15
await db.schema
.alterTable('moderation_subject_status')
.addColumn('recordStatus', 'varchar')
.execute()
await db.schema
.alterTable('moderation_subject_status')
.addColumn('recordDeletedAt', 'varchar')
.execute()
await db.schema
.alterTable('moderation_subject_status')
.addColumn('recordUpdatedAt', 'varchar')
.execute()
Collaborator

I'd recommend distinguishing these as e.g. hostingStatus, hostingDeletedAt, etc.

Comment on lines 142 to 144
const timestamp = event.meta?.timestamp
? `${event.meta?.timestamp}`
: event.createdAt
Collaborator

The types here are a little funky to wrangle. To avoid ending up with a string like 'true' here, could we go with something more explicit like:

Suggested change
- const timestamp = event.meta?.timestamp
- ? `${event.meta?.timestamp}`
- : event.createdAt
+ const timestamp = typeof event.meta?.timestamp === 'string'
+ ? event.meta.timestamp
+ : event.createdAt

Comment on lines 668 to 671
"status": {
"type": "string",
"knownValues": ["takendown", "suspended", "deleted", "deactivated"]
},
Collaborator

From a hosting perspective, individual records can't be deactivated, takendown, or suspended; only accounts can. They can be deleted, though.

Collaborator

Ah, I see the values I'd expect reflected in the implementation, so this probably just needs to be updated.

Comment on lines 638 to 641
"status": {
"type": "string",
"knownValues": ["takendown", "suspended", "deleted", "deactivated"]
},
Collaborator

I see the value unknown used in the implementation; it may be worth adding it here and to the same field in #recordHosting.

Collaborator

@devinivy devinivy left a comment

Nice! Let's just add a changeset here before merge.

@foysalit
Contributor Author

This is going to be merged into #2661, and I'll add the changeset there since that's the one landing on main.

@foysalit foysalit merged commit b9bb009 into ozone-account-events Oct 30, 2024
10 checks passed
@foysalit foysalit deleted the ozone-track-record-delete-and-update branch October 30, 2024 21:43
foysalit added a commit that referenced this pull request Nov 7, 2024
* ✨ Add events for account and record update/delete/deactivation

* ✨ Add handle change event

* ✨ Reduce account events to 2 types and record events to 1

* ✨ Store metadata from account, identity and record events

* ✨ Add created event for record

* ✨ Add ndd the new events to allowed types in emitEvent

* ✨ Use string value for record op and add tombstone flag to identity event

* ✨ Add active flag on account events

* ✨ Change accountStatus -> status to match with firehose event

* ✨ Make active flag required

* 🚨 fix prettier style issue

* ✨ Track record/account delete and update data in subject status (#2804)

* ✨ Store deleted/updated event data in subject_status

* 🐛 Fix query for recordDeletedAt and recordUpdatedAt

* ✨ Add tombstoned status

* ✨ Move from record to hosting term

* ✅ Add tests for hosting params

* ✨ Update lexicons for hostingStatuses

* ✅ Update snapshots

* ✅ Update snapshots

* ✅ Update snapshots

* ✨ Adjust hosting statuses

* 📝 Add changeset