Data Model: Soulmates MVP

Feature: 001-prd-md Date: 2025-10-02 Storage: Appwrite Database (NoSQL document collections)

Collections Overview

erDiagram
    USERS ||--o{ TRAITS : has
    USERS ||--o{ INTERESTS : has
    USERS ||--o{ VALUES : has
    USERS ||--o{ RESPONSES : submits
    USERS ||--o| LOCATIONS : has
    USERS ||--o{ NOTIFICATION_SCHEDULES : has
    QUESTIONS ||--o{ RESPONSES : answered_by
    USERS ||--o{ MATCHES : similar_to

1. Users Collection

Collection ID: users Description: User profiles with anonymous identity and consent preferences

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated by Appwrite Auth
`handle`	string	✓	unique	3-20 chars, alphanumeric+underscore	User-chosen pseudonym
`avatar_id`	string	✓	-	Appwrite Storage file ID	Reference to avatar image
`created_at`	datetime	✓	✓	ISO 8601	Account creation timestamp
`country`	string	✓	✓	ISO 3166-1 alpha-2	User's country (for analytics, not discovery)
`age_band`	string	✓	-	enum: "18-24", "25-34", "35-44", "45+"	Age bracket (not exact age)
`visibility`	string	✓	✓	enum: "hidden", "heatmap", "coarse_pin"	Discovery visibility mode
`radius_km`	integer	✓	-	5-200	Discovery radius in km
`consent`	object	✓	-	See below	Granular consent flags
`notification_windows`	array[string]	✓	-	["HH:MM-HH:MM"]	Time windows for notifications
`onboarding_completed`	boolean	✓	✓	-	Has user finished onboarding?
`last_active_at`	datetime	✓	✓	ISO 8601	Last app interaction

Consent Object Schema

{
  "profiling": {
    "granted": true,
    "timestamp": "2025-10-02T12:00:00Z"
  },
  "location": {
    "granted": false,
    "timestamp": "2025-10-02T12:00:00Z"
  },
  "notifications": {
    "granted": true,
    "timestamp": "2025-10-02T12:00:00Z"
  },
  "sensitive_questions": {
    "granted": false,
    "categories": [],
    "timestamp": "2025-10-02T12:00:00Z"
  }
}

Indexes

handle (unique)
visibility + country (compound, for discovery queries)
created_at (for cohort analysis)
last_active_at (for inactive user cleanup)

Validation Rules

handle: /^[a-zA-Z0-9_]{3,20}$/
notification_windows: Max 3 windows, each <12 hours duration
radius_km: 5 ≤ value ≤ 200

2. Questions Collection

Collection ID: questions Description: Profiling questions with metadata for adaptive selection

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Question ID (e.g., "q_47a")
`topic`	string	✓	✓	dot-notation	Category (e.g., "music.clubbing")
`dimension`	string	✓	✓	enum	Trait dimension measured
`type`	string	✓	-	enum: "likert_5", "choice", "multi_choice", "slider"	Answer format
`text`	string	✓	-	10-200 chars	Question text
`answers`	array[string]	✓	-	Matches type	Answer options
`info_gain`	float	✓	✓	0.0-1.0	Expected information gain score
`safety`	object	✓	-	See below	Safety filter metadata
`why`	string	✓	-	50-300 chars	Explanation for "Why this question?"
`active`	boolean	✓	✓	-	Is question in rotation?
`created_at`	datetime	✓	-	ISO 8601	When question was added
`source`	string	✓	-	enum: "seed", "llm_generated", "manual"	Origin of question

Dimension Enum Values

"Openness", "Extraversion", "Conscientiousness", "Agreeableness", "SensationSeeking", "RoutineVsNovelty", "SocialEnergy", "CreativeFocus"

Safety Object Schema

{
  "sensitive": false,
  "categories": [],  // If sensitive=true: ["health", "religion", "politics", "sexual_orientation"]
  "requires_opt_in": false
}

Indexes

dimension + active (compound, for question selection)
info_gain desc (for prioritization)
topic (for filtering)

3. Responses Collection

Collection ID: responses Description: User answers to questions

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated
`user_id`	string	✓	✓	FK users.$id	Respondent
`question_id`	string	✓	✓	FK questions.$id	Question answered
`timestamp`	datetime	✓	✓	ISO 8601	When answered
`answer`	object	✓	-	See below	Selected answer
`time_to_answer_ms`	integer	✓	-	>0	Time from shown to submitted
`session_id`	string	✓	-	UUID v4	App session ID (for analytics)

Answer Object Schema

{
  "value": 4,           // For likert/slider: numeric 0-4 or 0-100
  "text": "Agree",      // For choice: selected option text
  "indices": [0, 2]     // For multi_choice: selected option indices
}

Indexes

user_id + timestamp desc (compound, for user history)
question_id (for question analytics)
user_id + question_id (compound unique, prevent duplicate answers)

4. Traits Collection

Collection ID: traits Description: User trait scores derived from responses

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated
`user_id`	string	✓	✓	FK users.$id	User
`dimension`	string	✓	✓	enum (same as questions)	Trait dimension
`score`	float	✓	-	0.0-1.0	Normalized trait score
`confidence`	float	✓	-	0.0-1.0	Confidence level (based on # answers)
`updated_at`	datetime	✓	✓	ISO 8601	Last recalculation

Indexes

user_id + dimension (compound unique, one score per dimension per user)
updated_at (for incremental updates)

Calculation Logic

Initial confidence = 0.0 (no answers)
Confidence after N answers: min(1.0, N / 10) (full confidence at 10+ answers)
Score: weighted average of answers mapping to 0.0-1.0 scale

5. Interests Collection

Collection ID: interests Description: User interest tags with weights

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated
`user_id`	string	✓	✓	FK users.$id	User
`tag`	string	✓	✓	lowercase, no spaces	Interest tag (e.g., "hiking")
`weight`	float	✓	-	0.0-1.0	Strength of interest
`source`	string	✓	-	enum: "explicit", "inferred"	How tag was derived

Indexes

user_id + tag (compound unique)
tag (for tag popularity analytics)

6. Values Collection

Collection ID: values Description: User value tags with weights

Fields

Same structure as Interests collection, different semantic meaning.

Example values: "sustainability", "ambition", "autonomy", "community", "creativity"

7. Locations Collection

Collection ID: locations Description: Coarse user locations (geohash level 5)

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated
`user_id`	string	✓	unique	FK users.$id	User (one location per user)
`coarse_cell`	string	✓	✓	geohash length 5	~5km precision (e.g., "u33d8")
`country`	string	✓	✓	ISO 3166-1 alpha-2	Country code
`updated_at`	datetime	✓	✓	ISO 8601	Last location update

Indexes

user_id (unique)
coarse_cell + updated_at (compound, for proximity queries)
country (for country-level analytics)

Privacy Rules

Only created if user consent.location.granted = true
Deleted immediately if consent revoked
Max update frequency: 1/hour (prevent tracking)

8. Matches Collection

Collection ID: matches Description: Cached similarity scores between users

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated
`user_id`	string	✓	✓	FK users.$id	User A
`other_id`	string	✓	✓	FK users.$id	User B
`score`	float	✓	✓	0.0-1.0	Similarity score
`band`	string	✓	✓	enum: "very_similar", "similar", "some_overlap"	Display category
`shared_traits`	array[object]	✓	-	Max 3	Top shared dimensions
`computed_at`	datetime	✓	✓	ISO 8601	Cache timestamp

Shared Traits Schema

[
  {
    "dimension": "Openness",
    "score_a": 0.85,
    "score_b": 0.88,
    "delta": 0.03
  }
]

Indexes

user_id + score desc (compound, for "most similar to me" queries)
other_id (for bidirectional lookup)
user_id + other_id (compound unique, prevent duplicate pairs)
computed_at (for cache expiration cleanup)

Cache Invalidation

TTL: 24 hours (matches recomputed daily)
Invalidate immediately if either user updates profile significantly

9. Notification Schedules Collection

Collection ID: notification_schedules Description: User notification preferences and engagement metrics

Fields

Field	Type	Required	Indexed	Validation	Notes
`$id`	string	✓	PK	UUID v4	Auto-generated
`user_id`	string	✓	unique	FK users.$id	User
`last_sent_at`	datetime	-	✓	ISO 8601	Last notification sent
`next_eligible_at`	datetime	✓	✓	ISO 8601	Earliest next send time (2hr cooldown)
`bandit_state`	object	✓	-	See below	Thompson sampling state
`metrics`	object	✓	-	See below	Engagement metrics

Bandit State Schema

{
  "arms": [
    {"hour": 9, "alpha": 2, "beta": 1},   // 9am: 2 opens, 1 ignore
    {"hour": 12, "alpha": 1, "beta": 2},  // 12pm: 1 open, 2 ignores
    {"hour": 18, "alpha": 5, "beta": 1}   // 6pm: 5 opens, 1 ignore
  ]
}

Metrics Schema

{
  "notifications_sent": 42,
  "notifications_opened": 28,
  "open_rate": 0.67,
  "avg_time_to_open_ms": 120000,
  "last_7_days_opens": 5
}

Indexes

user_id (unique)
next_eligible_at (for scheduler queries)

State Transitions

User Lifecycle

[New User]
  → onboarding_completed=false
  → (complete onboarding)
  → onboarding_completed=true
  → (answer 10 questions)
  → traits.confidence ≥ 0.3
  → (opt into location)
  → locations record created
  → (discovery enabled)
  → matches computed

Question Lifecycle

[Created]
  → source="seed|llm_generated|manual", active=false
  → (safety review)
  → active=true
  → (shown to users, analytics)
  → info_gain updated
  → (poor engagement)
  → active=false

Match Lifecycle

[No match record]
  → (user profiles updated)
  → similarity-matcher Function runs
  → score computed, band assigned
  → match record created
  → (24 hours pass)
  → match deleted, recomputed on next run

Collection Relationships

One-to-Many:

users → traits (1:N, max 8 traits per user)
users → interests (1:N, ~5-20 per user)
users → values (1:N, ~3-10 per user)
users → responses (1:N, unbounded growth)
questions → responses (1:N, unbounded growth)

One-to-One:

users → locations (1:1, nullable if consent.location=false)
users → notification_schedules (1:1)

Many-to-Many:

users ↔ users via matches (N:M, cached similarity graph)

Data Retention

Collection	Retention Policy
users	Until account deletion
questions	Indefinite (seed questions), 90 days (LLM-generated if inactive)
responses	Until account deletion, or 2 years if user inactive
traits	Recomputed from responses, deleted with user
interests/values	Deleted with user
locations	Deleted immediately if consent revoked or user deleted
matches	TTL 24 hours (rolling cache)
notification_schedules	Deleted with user

Security & Privacy Rules

All collections follow Appwrite permission model:

// Example: responses collection
{
  "read": ["user:{user_id}"],           // Users can only read own responses
  "create": ["user:{user_id}"],         // Users can only create own responses
  "update": [],                         // No updates (immutable after creation)
  "delete": ["user:{user_id}"]          // Users can delete own responses
}

Special cases:

questions: read=["role:all"], write=["role:admin"]
matches: read=["user:{user_id}", "user:{other_id}"], write=["role:function"]
locations: read=["role:function"], write=["user:{user_id}"] (not publicly readable)

FilesExpand file tree

data-model.md

Latest commit

History

data-model.md

File metadata and controls

Data Model: Soulmates MVP

Collections Overview

1. Users Collection

Fields

Consent Object Schema

Indexes

Validation Rules

2. Questions Collection

Fields

Dimension Enum Values

Safety Object Schema

Indexes

3. Responses Collection

Fields

Answer Object Schema

Indexes

4. Traits Collection

Fields

Indexes

Calculation Logic

5. Interests Collection

Fields

Indexes

6. Values Collection

Fields

7. Locations Collection

Fields

Indexes

Privacy Rules

8. Matches Collection

Fields

Shared Traits Schema

Indexes

Cache Invalidation

9. Notification Schedules Collection

Fields

Bandit State Schema

Metrics Schema

Indexes

State Transitions

User Lifecycle

Question Lifecycle

Match Lifecycle

Collection Relationships

Data Retention

Security & Privacy Rules