SPIKE: Project "type" and number data model

# Why are we doing this?

Spike to investigate a possible data model for Project number and "type" (name TBD). This can simplify some of the data development work 

We currently represent `Project.number` as a character field. We expect to have several types of projects, with project IDs that also include information about the type. We expect the IDs to look something like "POS-2025-2026" or "FOO-1234". The alphabetic part represents the type and the number is some semi-arbitrary sequence identifier (tho it may also encode information like a year). This can cause issues, see below.

Instead, we could define "type" of project as a choice field/enum, and the number as an integer field, rather than a charfield as we have it now.
That normalises the data model, and representing numbers as numbers can be simpler for application code (sort, search) and helpful for the database to process. We would have to change the uniqueness requirement to be on category+number together and have some derived model methods or fields for converting to and from the text format for each type. What we display to users need not change. We can still use our new regex in the forms with a little extra code there.

```
category legacy, number 123 => "123"
category NHSE-approved, number 20262001 => "POS-2026-2001"
category data development, number 9045 => "DATA-9045"
```

Projects have long had the `number` field (called "Project ID" in the UI) as a semi-stable non-pk identifier for projects in Job Server and beyond. The [manual projects initiative](https://github.com/opensafely-core/job-server/issues?q=label%3A%22initiative%3Amanual-projects%22) added a new `number` format (POS-2026-2001), converting `number` to a character field, and adding regex validation and a fairly complex sort. Data development work wants to [add a new format](https://github.com/opensafely-core/job-server/issues/5701) and [project "types"](https://github.com/opensafely-core/job-server/issues/5683). There may be more types and formats in future.

Issues:
- denormalised char number+type both have semantic info about project type and must be kept consistent;
- the application and downstream (Grafana) may want to slice data by any of type, number (as int), full ID string;
- decomposing and recomposing number in many places adds complexity (and maybe a performance hit? possibly negligible);
- charfield may represent large numbers and we have to reason about how to make that work well with the database layer;

[Slack thread with discussion.](https://bennettoxford.slack.com/archives/C069SADHP1Q/p1773912703577749?thread_ts=1773831375.923219&cid=C069SADHP1Q)

# How will we know when it's done?

We have an outline or proposal for what to do, pros and cons, maybe a prototype or draft. 

# What are we doing?

- [Proposal document](https://docs.google.com/document/d/1NYZCFARvXpGFCo1w8sx-7170hhMxPeRXV8YPhs-SYcg/edit?usp=sharing)
- Draft PR to prototype it identify what I am missing...  

---

[Defining delivery tasks guidance](https://bennett.wiki/tech-group/delivery-process/task-definition/)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPIKE: Project "type" and number data model #5726

Why are we doing this?

How will we know when it's done?

What are we doing?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

SPIKE: Project "type" and number data model #5726

Description

Why are we doing this?

How will we know when it's done?

What are we doing?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions