feat: add zod to degreeworks parsing#333

Open
iAmbrosial wants to merge 30 commits into main from add-zod-to-degreeworks-parsing
Conversation

@iAmbrosial
Contributor

Description

Adds Zod runtime validation to all DegreeWorks API responses in the degreeworks scraper. Previously, API responses were cast directly to TypeScript types with no runtime checks. This meant malformed or unexpected responses would silently fail or lead to unclear downstream errors. We are thus introducing schemas for all response shapes and integrating them into their respective clients.
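To illustrate the difference between a cast and a runtime check, here is a minimal sketch. The `programSchema` shape and both function names are hypothetical and are not taken from the scraper; only the general Zod calls (`z.object`, `safeParse`, `z.infer`) are real.

```typescript
import { z } from "zod";

// Hypothetical response shape, for illustration only.
const programSchema = z.object({
  code: z.string(),
  name: z.string(),
});
type Program = z.infer<typeof programSchema>;

// Before: the cast is erased at compile time, so nothing is checked at runtime.
function parseUnchecked(json: unknown): Program {
  return json as Program;
}

// After: the schema is checked at runtime and a bad shape fails loudly.
function parseChecked(json: unknown): Program {
  const result = programSchema.safeParse(json);
  if (!result.success) {
    throw new Error(`Unexpected response shape: ${result.error.message}`);
  }
  return result.data;
}
```

With the cast, a malformed object is accepted and only causes trouble wherever its fields are first used; with `safeParse`, the failure surfaces at the point of parsing.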

Every variant of the Rule type has a ruleType field containing a unique string literal. To replicate this behaviour in an equivalent Zod schema, the ruleTypeSchema is defined as a discriminated union. Since discriminated unions cannot take intersections, every rule schema extends ruleBaseSchema to include the label field – reducing repetition if another rule type ever needs to be added.
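The pattern described above might look roughly like the following sketch. Only `ruleBaseSchema`, the `ruleType` discriminator, and the `label` field come from this description; the two variants, their literal values, and the `credits` field are assumptions made up for illustration.

```typescript
import { z } from "zod";

// Shared fields live in one base object schema.
const ruleBaseSchema = z.object({ label: z.string() });

// Hypothetical variants: each extends the base and adds its own literal ruleType.
const courseRuleSchema = ruleBaseSchema.extend({
  ruleType: z.literal("Course"),
  credits: z.number(),
});
const markerRuleSchema = ruleBaseSchema.extend({
  ruleType: z.literal("Marker"),
});

// The union picks the variant to validate against by looking at ruleType.
const exampleRuleSchema = z.discriminatedUnion("ruleType", [
  courseRuleSchema,
  markerRuleSchema,
]);
type ExampleRule = z.infer<typeof exampleRuleSchema>;
```

Because each variant is a plain object schema produced by `.extend()`, it can be passed to `z.discriminatedUnion` directly, and adding another rule type only means writing one more `.extend()` call.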

The DWAuditErrorResponse type describes values that can never occur (i.e., its fields cannot hold a valid value), so validating the error response with Zod would be meaningless. We therefore define only a DWAuditOKSchema and validate the parsed results directly against it in the DegreeWorks client.
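A minimal sketch of validating only the OK shape follows. `DWAuditOKSchema`, the `blockArray` field, and the "Unexpected audit response shape" message are named in this PR; the contents of each block and the `parseAudit` helper are hypothetical.

```typescript
import { z } from "zod";

// Assumed shape: only blockArray is known from this PR; the block fields are made up.
const DWAuditOKSchema = z.object({
  blockArray: z.array(z.object({ title: z.string() })),
});
type DWAuditOKResponse = z.infer<typeof DWAuditOKSchema>;

// Hypothetical helper: anything that is not a valid OK response is rejected,
// so no separate schema for the error response is needed.
function parseAudit(raw: unknown): DWAuditOKResponse {
  const result = DWAuditOKSchema.safeParse(raw);
  if (!result.success) {
    throw new Error("Unexpected audit response shape");
  }
  return result.data;
}
```

Under this sketch, a response object that lacks `blockArray` is rejected at the parse step rather than being passed further into the scraper.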

Related Issue

#275

Motivation and Context

We are adding Zod validation to DegreeWorks response parsing, and validating data before it is stored, so that we can check our assumptions going forward. Without runtime validation, unexpected API response shapes and silently sparse data have caused issues in this area. Zod validation makes these failures visible immediately.

How Has This Been Tested?

Ran the DegreeWorks Scraper for the 2025-2026 catalog year. The majority of undergraduate and graduate programs and all cached specializations parsed successfully. There were two cases of "Unexpected audit response shape" due to response objects lacking a blockArray field, but there seems to be no clear reason as to why other than DegreeWorks-side errors. The one previously unclear ruleType was confirmed (hopefully) and added to ruleMarkerSchema.

Screenshots (if appropriate):

N/A

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code involves a change to the database schema.
  • My code requires a change to the documentation.

@iAmbrosial iAmbrosial requested a review from laggycomputer March 7, 2026 00:25
Member

@laggycomputer laggycomputer left a comment

tbh if the parse runs fine then it's probably fine

@laggycomputer laggycomputer linked an issue Mar 7, 2026 that may be closed by this pull request
Member

@laggycomputer laggycomputer left a comment

aight very cool good typescript

may want to hold off merging until after #277

also merge main

Collaborator

@sanskarm7 sanskarm7 left a comment

tests well and adds good observability to failed audits

well done bill :D

lgtm!!

Member

@laggycomputer laggycomputer left a comment

It seems we're now duplicating schemas and types, which is quite fragile. Can you remove every type which is now superseded by a Zod schema (or make them aliases to a z.infer)? I'm seeing type WithClause and withClauseSchema, for example.
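For reference, aliasing a type to `z.infer` looks roughly like this. `withClauseSchema` and `WithClause` are the names mentioned in the comment above, but the fields shown here are hypothetical placeholders rather than the real shape.

```typescript
import { z } from "zod";

// Hypothetical fields; the real withClauseSchema is defined in the PR's code.
const withClauseSchema = z.object({
  code: z.string(),
  operator: z.string(),
  valueList: z.array(z.string()),
});

// The type is derived from the schema, so the two cannot drift apart.
type WithClause = z.infer<typeof withClauseSchema>;

// The inferred type behaves like a hand-written one at use sites.
const exampleClause: WithClause = {
  code: "DWCREDITS",
  operator: ">=",
  valueList: ["4"],
};
```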

@iAmbrosial
Contributor Author

It seems we're now duplicating schemas and types, which is quite fragile. Can you remove every type which is now superseded by a Zod schema (or make them aliases to a z.infer)? I'm seeing type WithClause and withClauseSchema, for example.

I removed certain duplicate types, but had to keep the Rule type and its constituent types explicitly defined rather than inferred. This is because ruleSchema references itself via z.lazy(), which creates a circular inference; the types involved seemingly cannot rely on further circular references and have to stay explicitly defined. Not sure if there are other ways around this.
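A small sketch of the situation described above, assuming a two-variant rule tree: the recursive type is written by hand and the schema is annotated with `z.ZodType<...>` so that TypeScript does not have to infer a type that refers to itself. All names and fields here (`ExampleRule`, `"Course"`, `"Group"`, `ruleArray`) are hypothetical, and this is not necessarily how the PR structures its schemas.

```typescript
import { z } from "zod";

// Hand-written recursive type: a Group rule contains more rules.
type ExampleRule =
  | { ruleType: "Course"; label: string }
  | { ruleType: "Group"; label: string; ruleArray: ExampleRule[] };

const baseSchema = z.object({ label: z.string() });

const courseSchema = baseSchema.extend({ ruleType: z.literal("Course") });

// z.lazy defers the lookup of exampleRuleSchema until validation time,
// so referring to it before its declaration is safe here.
const groupSchema = baseSchema.extend({
  ruleType: z.literal("Group"),
  ruleArray: z.lazy(() => z.array(exampleRuleSchema)),
});

// The explicit annotation breaks the circular inference.
const exampleRuleSchema: z.ZodType<ExampleRule> = z.discriminatedUnion(
  "ruleType",
  [courseSchema, groupSchema],
);
```

In this sketch only the recursive type needs to be written out by hand; non-recursive schemas such as `courseSchema` can still have their types derived with `z.infer`.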

Member

@laggycomputer laggycomputer left a comment

tiny stuff

@laggycomputer
Member

Not sure if there are other ways around this.

https://zod.dev/api#circularity-errors

@iAmbrosial
Contributor Author

Not sure if there are other ways around this.

https://zod.dev/api#circularity-errors

Since ruleSchema is a discriminated union that includes many schemas, the type annotation that would be needed might be enormous: it would require Zod types for each schema and for every reference to ruleSchema within those nested schemas.

Development

Successfully merging this pull request may close these issues.

Zod in DegreeWorks scraping

3 participants