laggycomputer
left a comment
tbh if the parse runs fine then it's probably fine
apps/data-pipeline/degreeworks-scraper/src/components/DegreeworksClient.ts
sanskarm7
left a comment
tests well and adds good observability to failed audits
well done bill :D
lgtm!!
laggycomputer
left a comment
It seems we're now duplicating schemas and types, which is quite fragile. Can you remove every type which is now superseded by a Zod schema (or make them aliases to a z.infer)? I'm seeing type WithClause and withClauseSchema, for example.
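A minimal sketch of the aliasing suggested above, assuming Zod. The fields of withClauseSchema here are hypothetical stand-ins, not the scraper's actual definition. Deriving the type with z.infer keeps the schema as the single source of truth, so a field change updates both the runtime check and the static type.

```typescript
import { z } from "zod";

// Hypothetical fields; the real withClauseSchema in the scraper may differ.
const withClauseSchema = z.object({
  code: z.string(),
  operator: z.string(),
  valueList: z.array(z.string()),
});

// The type is derived from the schema instead of being declared separately,
// so the two cannot drift apart.
type WithClause = z.infer<typeof withClauseSchema>;

const clause: WithClause = withClauseSchema.parse({
  code: "EXAMPLE",
  operator: "=",
  valueList: ["value"],
});
console.log(clause.valueList.length); // 1
```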
I removed the duplicated types where possible, but had to keep the Rule type and its constituent types explicitly defined rather than inferred. ruleSchema references itself via z.lazy(), and that circular reference seemingly prevents z.infer from producing the type, so these types have to be written out explicitly instead of relying on further circular inference. Not sure if there are other ways around this.
Because ruleSchema is a discriminated union over many schemas, annotating it with its full Zod type instead would likely be enormous: the annotation would have to spell out the Zod type of every member schema and of every nested reference to ruleSchema.
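A minimal sketch of the workaround described above, assuming Zod (the pattern is written to work in v3 and v4). CourseRule and GroupRule are a hypothetical, heavily simplified pair of variants; the real DegreeWorks Rule has many more variants and fields. The recursive types are written by hand and the schema is annotated with z.ZodType<Rule>, since TypeScript cannot infer a type for a schema that refers to itself. The recursion goes through z.lazy(), and each variant extends a shared base object schema, so the discriminated union only receives plain object schemas.

```typescript
import { z } from "zod";

// Hypothetical, simplified rule shapes, written explicitly rather than inferred.
type RuleBase = { label: string };
type CourseRule = RuleBase & { ruleType: "Course"; courses: string[] };
type GroupRule = RuleBase & { ruleType: "Group"; ruleArray: Rule[] };
type Rule = CourseRule | GroupRule;

// Shared fields live on one base object schema; each variant extends it.
const ruleBaseSchema = z.object({ label: z.string() });

const courseRuleSchema = ruleBaseSchema.extend({
  ruleType: z.literal("Course"),
  courses: z.array(z.string()),
});

const groupRuleSchema = ruleBaseSchema.extend({
  ruleType: z.literal("Group"),
  // z.lazy defers the lookup, so ruleSchema can be referenced before it is initialized.
  ruleArray: z.array(z.lazy(() => ruleSchema)),
});

// The explicit annotation supplies the type TypeScript cannot infer for a self-referential schema.
const ruleSchema: z.ZodType<Rule> = z.discriminatedUnion("ruleType", [
  courseRuleSchema,
  groupRuleSchema,
]);

// Usage: a nested group parses; an unknown ruleType anywhere in the tree is rejected.
const ok = ruleSchema.safeParse({
  label: "Lower-division requirements",
  ruleType: "Group",
  ruleArray: [{ label: "Calculus", ruleType: "Course", courses: ["MATH 2A"] }],
});
const bad = ruleSchema.safeParse({
  label: "Lower-division requirements",
  ruleType: "Group",
  ruleArray: [{ label: "Mystery", ruleType: "Unknown" }],
});
console.log(ok.success, bad.success); // true false
```

The hand-written types stay small because only the recursive part needs them; non-recursive leaf schemas can still be aliased with z.infer.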
Description
Adds Zod runtime validation to all DegreeWorks API responses in the degreeworks scraper. Previously, API responses were cast directly to TypeScript types with no runtime checks. This meant malformed or unexpected responses would silently fail or lead to unclear downstream errors. We are thus introducing schemas for all response shapes and integrating them into their respective clients.
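A minimal sketch of replacing a cast with runtime validation, assuming Zod. DWAuditOKSchema here is a hypothetical, heavily trimmed stand-in that only checks for a blockArray of titled blocks, and parseAuditResponse is an illustrative name rather than the client's actual method; only the "Unexpected audit response shape" message comes from this PR. A cast with as performs no runtime check, whereas safeParse returns either the typed data or an error listing which fields failed.

```typescript
import { z } from "zod";

// Hypothetical, trimmed-down stand-in for the real DWAuditOKSchema.
const DWAuditOKSchema = z.object({
  blockArray: z.array(z.object({ title: z.string() })),
});
type DWAuditOK = z.infer<typeof DWAuditOKSchema>;

// Before: `return json as DWAuditOK;` compiled fine but checked nothing at runtime.
// After: the response is validated, so a malformed shape fails loudly at the boundary.
function parseAuditResponse(json: unknown): DWAuditOK {
  const result = DWAuditOKSchema.safeParse(json);
  if (!result.success) {
    throw new Error(`Unexpected audit response shape: ${result.error.message}`);
  }
  return result.data;
}

const good = parseAuditResponse({ blockArray: [{ title: "Major Requirements" }] });
console.log(good.blockArray.length); // 1

try {
  parseAuditResponse({}); // no blockArray field at all
} catch (e) {
  console.log((e as Error).message.startsWith("Unexpected audit response shape")); // true
}
```

Note that z.object strips unknown keys by default, so the returned data contains only the fields the schema describes.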
Every variant of the Rule type has a ruleType field containing a unique string literal. To replicate this behaviour in an equivalent Zod schema, the ruleTypeSchema is defined as a discriminated union. Since discriminated unions cannot take intersections, every rule schema extends ruleBaseSchema to include the label field, reducing repetition if another rule type ever needs to be added.
The DWAuditErrorResponse type denotes values that never occur (i.e., it is impossible for this field to have a valid value). It is thus nonsensical to validate the error response type with Zod, and we instead only create a DWAuditOKSchema and then directly validate the parsed results against that in the DegreeWorks client.
Related Issue
#275
Motivation and Context
We are adding Zod validation to DegreeWorks response parsing and before storing data, so that we can check our assumptions going forward. Without runtime validation, unexpected API response shapes and silently sparse data have caused issues in this area. Zod validation makes these failures visible immediately.
How Has This Been Tested?
Ran the DegreeWorks scraper for the 2025-2026 catalog year. Most undergraduate and graduate programs, and all cached specializations, parsed successfully. Two audits failed with "Unexpected audit response shape" because the response objects lacked a blockArray field; there seems to be no clear reason for this other than a DegreeWorks-side error. The one previously unclear ruleType was (tentatively) confirmed and added to ruleMarkerSchema.
Screenshots (if appropriate):
N/A
Types of changes
Checklist: