-
Notifications
You must be signed in to change notification settings - Fork 14
SPIKE: Project "type" and number data model #5726
Description
Why are we doing this?
Spike to investigate a possible data model for Project number and "type" (name TBD). This can simplify some of the data development work
We currently represent Project.number as a character field. We expect to have several types of projects, with project IDs that also include information about the type. We expect the IDs to look something like "POS-2025-2026" or "FOO-1234". The alphabetic part represents the type and the number is some semi-arbitrary sequence identifier (tho it may also encode information like a year). This can cause issues, see below.
Instead, we could define "type" of project as a choice field/enum, and the number as an integer field, rather than a charfield as we have it now.
That normalises the data model, and representing numbers as numbers can be simpler for application code (sort, search) and helpful for the database to process. We would have to change the uniqueness requirement to be on category+number together and have some derived model methods or fields for converting to and from the text format for each type. What we display to users need not change. We can still use our new regex in the forms with a little extra code there.
category legacy, number 123 => "123"
category NHSE-approved, number 20262001 => "POS-2026-2001"
category data development, number 9045 => "DATA-9045"
Projects have long had the number field (called "Project ID" in the UI) as a semi-stable non-pk identifier for projects in Job Server and beyond. The manual projects initiative added a new number format (POS-2026-2001), converting number to a character field, and adding regex validation and a fairly complex sort. Data development work wants to add a new format and project "types". There may be more types and formats in future.
Issues:
- denormalised char number+type both have semantic info about project type and must be kept consistent;
- the application and downstream (Grafana) may want to slice data by any of type, number (as int), full ID string;
- decomposing and recomposing number in many places adds complexity (and maybe a performance hit? possibly negligible);
- charfield may represent large numbers and we have to reason about how to make that work well with the database layer;
How will we know when it's done?
We have an outline or proposal for what to do, pros and cons, maybe a prototype or draft.
What are we doing?
- Proposal document
- Draft PR to prototype it identify what I am missing...