Data Model: Soulmates MVP

Feature: 001-prd-md
Date: 2025-10-02
Storage: Appwrite Database (NoSQL document collections)

Collections Overview

erDiagram
    USERS ||--o{ TRAITS : has
    USERS ||--o{ INTERESTS : has
    USERS ||--o{ VALUES : has
    USERS ||--o{ RESPONSES : submits
    USERS ||--o| LOCATIONS : has
    USERS ||--o{ NOTIFICATION_SCHEDULES : has
    QUESTIONS ||--o{ RESPONSES : answered_by
    USERS ||--o{ MATCHES : similar_to

1. Users Collection

Collection ID: users
Description: User profiles with anonymous identity and consent preferences

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated by Appwrite Auth
handle string unique 3-20 chars, alphanumeric+underscore User-chosen pseudonym
avatar_id string - Appwrite Storage file ID Reference to avatar image
created_at datetime ISO 8601 Account creation timestamp
country string ISO 3166-1 alpha-2 User's country (for analytics, not discovery)
age_band string - enum: "18-24", "25-34", "35-44", "45+" Age bracket (not exact age)
visibility string enum: "hidden", "heatmap", "coarse_pin" Discovery visibility mode
radius_km integer - 5-200 Discovery radius in km
consent object - See below Granular consent flags
notification_windows array[string] - ["HH:MM-HH:MM"] Time windows for notifications
onboarding_completed boolean - Has user finished onboarding?
last_active_at datetime ISO 8601 Last app interaction

Consent Object Schema

{
  "profiling": {
    "granted": true,
    "timestamp": "2025-10-02T12:00:00Z"
  },
  "location": {
    "granted": false,
    "timestamp": "2025-10-02T12:00:00Z"
  },
  "notifications": {
    "granted": true,
    "timestamp": "2025-10-02T12:00:00Z"
  },
  "sensitive_questions": {
    "granted": false,
    "categories": [],
    "timestamp": "2025-10-02T12:00:00Z"
  }
}

Indexes

  • handle (unique)
  • visibility + country (compound, for discovery queries)
  • created_at (for cohort analysis)
  • last_active_at (for inactive user cleanup)

Validation Rules

  • handle: /^[a-zA-Z0-9_]{3,20}$/
  • notification_windows: Max 3 windows, each <12 hours duration
  • radius_km: 5 ≤ value ≤ 200
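
The three rules above can be checked before a document is written. The following is a minimal sketch, not the project's actual validator: the function names are hypothetical, and two details are assumptions the rules do not state, namely that a window whose end precedes its start wraps past midnight and that a zero-length window is rejected.

```python
import re

HANDLE_RE = re.compile(r"[a-zA-Z0-9_]{3,20}")
WINDOW_RE = re.compile(r"([01]\d|2[0-3]):([0-5]\d)-([01]\d|2[0-3]):([0-5]\d)")


def window_minutes(window: str) -> int:
    """Duration of an "HH:MM-HH:MM" window in minutes.

    Assumption: an end time earlier than the start time wraps past midnight.
    """
    m = WINDOW_RE.fullmatch(window)
    if m is None:
        raise ValueError(f"malformed window: {window!r}")
    h1, m1, h2, m2 = (int(g) for g in m.groups())
    return ((h2 * 60 + m2) - (h1 * 60 + m1)) % (24 * 60)


def validate_user_fields(handle: str, radius_km: int, notification_windows: list[str]) -> list[str]:
    """Return the names of the fields that violate the validation rules."""
    errors = []
    if not HANDLE_RE.fullmatch(handle):
        errors.append("handle")
    if not 5 <= radius_km <= 200:
        errors.append("radius_km")
    try:
        # Max 3 windows, each shorter than 12 hours (zero-length rejected by assumption).
        windows_ok = len(notification_windows) <= 3 and all(
            0 < window_minutes(w) < 12 * 60 for w in notification_windows
        )
    except ValueError:
        windows_ok = False
    if not windows_ok:
        errors.append("notification_windows")
    return errors
```

For example, `validate_user_fields("night_owl", 25, ["09:00-12:00", "22:00-02:00"])` returns an empty list, while a 13-hour window such as `"08:00-21:00"` is reported under `notification_windows`.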

2. Questions Collection

Collection ID: questions
Description: Profiling questions with metadata for adaptive selection

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Question ID (e.g., "q_47a")
topic string dot-notation Category (e.g., "music.clubbing")
dimension string enum Trait dimension measured
type string - enum: "likert_5", "choice", "multi_choice", "slider" Answer format
text string - 10-200 chars Question text
answers array[string] - Matches type Answer options
info_gain float 0.0-1.0 Expected information gain score
safety object - See below Safety filter metadata
why string - 50-300 chars Explanation for "Why this question?"
active boolean - Is question in rotation?
created_at datetime - ISO 8601 When question was added
source string - enum: "seed", "llm_generated", "manual" Origin of question

Dimension Enum Values

"Openness", "Extraversion", "Conscientiousness", "Agreeableness", "SensationSeeking", "RoutineVsNovelty", "SocialEnergy", "CreativeFocus"

Safety Object Schema

{
  "sensitive": false,
  "categories": [],  // If sensitive=true: ["health", "religion", "politics", "sexual_orientation"]
  "requires_opt_in": false
}
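
One way the safety object could be combined with the user's consent object is sketched below. The function name is hypothetical, and the gating rule is an assumption: the document does not say how `consent.sensitive_questions.categories` is interpreted, so this sketch treats it as the list of categories the user has opted into.

```python
def can_show_question(safety: dict, consent: dict) -> bool:
    """Decide whether a question may be shown, given its safety metadata.

    Assumptions (not stated in the schema):
    - questions that are not sensitive, or do not require opt-in, are always shown;
    - consent["sensitive_questions"]["categories"] lists the opted-in categories.
    """
    if not safety.get("sensitive", False) or not safety.get("requires_opt_in", False):
        return True
    opt_in = consent.get("sensitive_questions", {})
    if not opt_in.get("granted", False):
        return False
    # Every category on the question must be covered by the user's opt-in.
    return set(safety.get("categories", [])) <= set(opt_in.get("categories", []))
```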

Indexes

  • dimension + active (compound, for question selection)
  • info_gain desc (for prioritization)
  • topic (for filtering)

3. Responses Collection

Collection ID: responses
Description: User answers to questions

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated
user_id string FK users.$id Respondent
question_id string FK questions.$id Question answered
timestamp datetime ISO 8601 When answered
answer object - See below Selected answer
time_to_answer_ms integer - >0 Time from shown to submitted
session_id string - UUID v4 App session ID (for analytics)

Answer Object Schema

{
  "value": 4,           // For likert_5: 0-4; for slider: 0-100
  "text": "Agree",      // For choice: selected option text
  "indices": [0, 2]     // For multi_choice: selected option indices
}
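
Because the answer object carries a different key per question type, a per-type check is useful before storing a response. This is a sketch under the assumption that each type uses only its own key (`value`, `text`, or `indices`); the function name and the rule that `multi_choice` needs at least one distinct index are illustrative, not taken from the schema.

```python
def is_valid_answer(qtype: str, answer: dict, options: list[str]) -> bool:
    """Check an answer object against the question type and its answer options."""
    if qtype == "likert_5":
        v = answer.get("value")
        return isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 4
    if qtype == "slider":
        v = answer.get("value")
        return isinstance(v, (int, float)) and not isinstance(v, bool) and 0 <= v <= 100
    if qtype == "choice":
        return answer.get("text") in options
    if qtype == "multi_choice":
        idx = answer.get("indices")
        return (
            isinstance(idx, list)
            and len(idx) > 0
            and all(isinstance(i, int) and 0 <= i < len(options) for i in idx)
            and len(set(idx)) == len(idx)  # no repeated selections
        )
    return False  # unknown question type
```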

Indexes

  • user_id + timestamp desc (compound, for user history)
  • question_id (for question analytics)
  • user_id + question_id (compound unique, prevent duplicate answers)

4. Traits Collection

Collection ID: traits
Description: User trait scores derived from responses

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated
user_id string FK users.$id User
dimension string enum (same as questions) Trait dimension
score float - 0.0-1.0 Normalized trait score
confidence float - 0.0-1.0 Confidence level (based on # answers)
updated_at datetime ISO 8601 Last recalculation

Indexes

  • user_id + dimension (compound unique, one score per dimension per user)
  • updated_at (for incremental updates)

Calculation Logic

  • Initial confidence = 0.0 (no answers)
  • Confidence after N answers in a dimension: min(1.0, N / 10), so full confidence is reached at 10 or more answers
  • Score: weighted average of the dimension's answers, each mapped to the 0.0-1.0 scale
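
The calculation logic can be sketched as follows. The confidence formula is taken directly from the rules above; the weighting scheme is not specified in this document, so equal weights are assumed by default and the `weights` parameter is a placeholder for whatever the real scorer uses. The likert mapping (`value / 4`) is likewise an assumption.

```python
from typing import Optional


def trait_confidence(n_answers: int) -> float:
    """Confidence grows linearly and saturates at 10 answers."""
    return min(1.0, n_answers / 10)


def normalize_likert(value: int) -> float:
    """Map a likert_5 value (0-4) onto the 0.0-1.0 scale (assumed mapping)."""
    return value / 4


def trait_score(normalized: list[float], weights: Optional[list[float]] = None) -> Optional[float]:
    """Weighted average of answers already mapped to 0.0-1.0.

    Returns None when there are no answers. Equal weights are an assumption.
    """
    if not normalized:
        return None
    if weights is None:
        weights = [1.0] * len(normalized)
    if len(weights) != len(normalized) or sum(weights) <= 0:
        raise ValueError("weights must match answers and sum to a positive value")
    return sum(a * w for a, w in zip(normalized, weights)) / sum(weights)
```

With three answers in a dimension, `trait_confidence(3)` gives 0.3, which is the threshold used in the user lifecycle further down.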

5. Interests Collection

Collection ID: interests
Description: User interest tags with weights

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated
user_id string FK users.$id User
tag string lowercase, no spaces Interest tag (e.g., "hiking")
weight float - 0.0-1.0 Strength of interest
source string - enum: "explicit", "inferred" How tag was derived

Indexes

  • user_id + tag (compound unique)
  • tag (for tag popularity analytics)

6. Values Collection

Collection ID: values
Description: User value tags with weights

Fields

Same fields and indexes as the Interests collection; the tags describe what the user values rather than what they enjoy.

Example values: "sustainability", "ambition", "autonomy", "community", "creativity"


7. Locations Collection

Collection ID: locations
Description: Coarse user locations (geohash level 5)

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated
user_id string unique FK users.$id User (one location per user)
coarse_cell string geohash length 5 ~5km precision (e.g., "u33d8")
country string ISO 3166-1 alpha-2 Country code
updated_at datetime ISO 8601 Last location update

Indexes

  • user_id (unique)
  • coarse_cell + updated_at (compound, for proximity queries)
  • country (for country-level analytics)

Privacy Rules

  • Only created if user consent.location.granted = true
  • Deleted immediately if consent revoked
  • Max update frequency: 1/hour (prevent tracking)
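
To illustrate how a coordinate becomes a `coarse_cell` and how the privacy rules gate a write, here is a self-contained sketch. `geohash_encode` implements the standard geohash bit-interleaving; `may_write_location` is a hypothetical helper that applies the consent check and the one-update-per-hour limit. Where this check would run (client, Function, or database rule) is not specified in this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet
MIN_UPDATE_INTERVAL = timedelta(hours=1)


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a coordinate as a geohash, alternating longitude and latitude bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # five bits per base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def may_write_location(consent: dict, last_updated_at: Optional[datetime], now: datetime) -> bool:
    """Allow a write only with location consent and at most once per hour."""
    if not consent.get("location", {}).get("granted", False):
        return False
    if last_updated_at is None:
        return True
    return now - last_updated_at >= MIN_UPDATE_INTERVAL
```

Truncating to five characters is what provides the coarse precision: every point inside the same cell produces the same `coarse_cell` value.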

8. Matches Collection

Collection ID: matches
Description: Cached similarity scores between users

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated
user_id string FK users.$id User A
other_id string FK users.$id User B
score float 0.0-1.0 Similarity score
band string enum: "very_similar", "similar", "some_overlap" Display category
shared_traits array[object] - Max 3 Top shared dimensions
computed_at datetime ISO 8601 Cache timestamp

Shared Traits Schema

[
  {
    "dimension": "Openness",
    "score_a": 0.85,
    "score_b": 0.88,
    "delta": 0.03
  }
]
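
A possible way to build `shared_traits` from two users' trait scores is shown below. The selection rule is an assumption: the schema only says "Top shared dimensions", so this sketch ranks dimensions by the smallest absolute difference and keeps at most three. The function name is hypothetical.

```python
def top_shared_traits(traits_a: dict[str, float], traits_b: dict[str, float], limit: int = 3) -> list[dict]:
    """Return up to `limit` dimensions where the two users' scores are closest."""
    shared = []
    for dimension in traits_a.keys() & traits_b.keys():
        score_a, score_b = traits_a[dimension], traits_b[dimension]
        shared.append({
            "dimension": dimension,
            "score_a": score_a,
            "score_b": score_b,
            # Rounded so stored deltas do not carry floating-point noise.
            "delta": round(abs(score_a - score_b), 2),
        })
    # Smallest delta first; the dimension name breaks ties deterministically.
    shared.sort(key=lambda t: (t["delta"], t["dimension"]))
    return shared[:limit]
```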

Indexes

  • user_id + score desc (compound, for "most similar to me" queries)
  • other_id (for bidirectional lookup)
  • user_id + other_id (compound unique, prevent duplicate pairs)
  • computed_at (for cache expiration cleanup)

Cache Invalidation

  • TTL: 24 hours (matches recomputed daily)
  • Invalidate immediately if either user updates profile significantly
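
The two invalidation rules reduce to a small staleness check. This is a sketch: `is_match_stale` and `parse_iso` are hypothetical names, and what counts as a "significant" profile update is left to the caller as a boolean flag because this document does not define it.

```python
from datetime import datetime, timedelta, timezone

MATCH_TTL = timedelta(hours=24)


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-10-02T12:00:00Z"."""
    # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_match_stale(computed_at: str, now: datetime, profile_changed: bool = False) -> bool:
    """A cached match is stale after 24 hours or after a significant profile update."""
    return profile_changed or now - parse_iso(computed_at) >= MATCH_TTL
```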

9. Notification Schedules Collection

Collection ID: notification_schedules
Description: User notification preferences and engagement metrics

Fields

Field Type Required Indexed Validation Notes
$id string PK UUID v4 Auto-generated
user_id string unique FK users.$id User
last_sent_at datetime - ISO 8601 Last notification sent
next_eligible_at datetime ISO 8601 Earliest next send time (2hr cooldown)
bandit_state object - See below Thompson sampling state
metrics object - See below Engagement metrics

Bandit State Schema

{
  "arms": [
    {"hour": 9, "alpha": 2, "beta": 1},   // 9am: 2 opens, 1 ignore
    {"hour": 12, "alpha": 1, "beta": 2},  // 12pm: 1 open, 2 ignores
    {"hour": 18, "alpha": 5, "beta": 1}   // 6pm: 5 opens, 1 ignore
  ]
}
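
Thompson sampling over this state can be sketched with the standard library alone: draw one sample from Beta(alpha, beta) for each arm and send at the hour with the highest draw. The function names are hypothetical. Note that `random.betavariate` requires both parameters to be greater than zero, so this sketch assumes stored counts are always positive (for example by starting each arm at alpha = beta = 1, which is an assumption, not something the schema states).

```python
import random


def choose_hour(bandit_state: dict, rng: random.Random) -> int:
    """Pick the send hour whose Beta(alpha, beta) sample is largest."""
    best_hour, best_draw = None, -1.0
    for arm in bandit_state["arms"]:
        draw = rng.betavariate(arm["alpha"], arm["beta"])
        if draw > best_draw:
            best_hour, best_draw = arm["hour"], draw
    if best_hour is None:
        raise ValueError("bandit_state has no arms")
    return best_hour


def record_outcome(bandit_state: dict, hour: int, opened: bool) -> None:
    """Update the arm for `hour`: an open increments alpha, an ignore increments beta."""
    for arm in bandit_state["arms"]:
        if arm["hour"] == hour:
            arm["alpha" if opened else "beta"] += 1
            return
    raise KeyError(f"no arm for hour {hour}")
```

Because each choice is a random draw, hours with few observations are still tried occasionally, while hours with a strong open history are chosen most of the time.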

Metrics Schema

{
  "notifications_sent": 42,
  "notifications_opened": 28,
  "open_rate": 0.67,
  "avg_time_to_open_ms": 120000,
  "last_7_days_opens": 5
}
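
The derived fields can be computed as in the sketch below: `open_rate` from the two counters, and `next_eligible_at` from `last_sent_at` plus the 2-hour cooldown noted in the fields table. Rounding to two decimals matches the example value above; the function names and the choice of 0.0 for a user with no sends are assumptions.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(hours=2)


def open_rate(notifications_sent: int, notifications_opened: int) -> float:
    """Share of sent notifications that were opened, rounded to two decimals."""
    if notifications_sent == 0:
        return 0.0  # assumption: no sends means a rate of 0.0 rather than null
    return round(notifications_opened / notifications_sent, 2)


def next_eligible_at(last_sent_at: datetime) -> datetime:
    """Earliest time another notification may be sent after the cooldown."""
    return last_sent_at + COOLDOWN
```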

Indexes

  • user_id (unique)
  • next_eligible_at (for scheduler queries)

State Transitions

User Lifecycle

[New User]
  → onboarding_completed=false
  → (complete onboarding)
  → onboarding_completed=true
  → (answer 10 questions)
  → traits.confidence ≥ 0.3
  → (opt into location)
  → locations record created
  → (discovery enabled)
  → matches computed
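
The lifecycle above can be read as a derived state rather than a stored field. The sketch below is illustrative only: the stage names are hypothetical, and it assumes the `traits.confidence ≥ 0.3` step is satisfied as soon as any one dimension reaches the threshold, which this document does not state.

```python
CONFIDENCE_THRESHOLD = 0.3


def user_stage(onboarding_completed: bool, trait_confidences: list[float], has_location: bool) -> str:
    """Derive a lifecycle stage from the user's documents (stage names are hypothetical)."""
    if not onboarding_completed:
        return "onboarding"
    # Assumption: one dimension at or above the threshold is enough.
    if not trait_confidences or max(trait_confidences) < CONFIDENCE_THRESHOLD:
        return "profiling"
    if not has_location:
        return "profiled"
    return "discoverable"
```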

Question Lifecycle

[Created]
  → source="seed|llm_generated|manual", active=false
  → (safety review)
  → active=true
  → (shown to users, analytics)
  → info_gain updated
  → (poor engagement)
  → active=false

Match Lifecycle

[No match record]
  → (user profiles updated)
  → similarity-matcher Function runs
  → score computed, band assigned
  → match record created
  → (24 hours pass)
  → match deleted, recomputed on next run

Collection Relationships

One-to-Many:

  • users → traits (1:N, max 8 traits per user)
  • users → interests (1:N, ~5-20 per user)
  • users → values (1:N, ~3-10 per user)
  • users → responses (1:N, unbounded growth)
  • questions → responses (1:N, unbounded growth)

One-to-One:

  • users → locations (1:1, absent if consent.location.granted=false)
  • users → notification_schedules (1:1)

Many-to-Many:

  • users ↔ users via matches (N:M, cached similarity graph)

Data Retention

Collection | Retention Policy
users | Until account deletion
questions | Indefinite (seed questions), 90 days (LLM-generated if inactive)
responses | Until account deletion, or 2 years if user inactive
traits | Recomputed from responses, deleted with user
interests/values | Deleted with user
locations | Deleted immediately if consent revoked or user deleted
matches | TTL 24 hours (rolling cache)
notification_schedules | Deleted with user

Security & Privacy Rules

All collections follow the Appwrite permission model:

// Example: responses collection
{
  "read": ["user:{user_id}"],           // Users can only read own responses
  "create": ["user:{user_id}"],         // Users can only create own responses
  "update": [],                         // No updates (immutable after creation)
  "delete": ["user:{user_id}"]          // Users can delete own responses
}

Special cases:

  • questions: read=["role:all"], write=["role:admin"]
  • matches: read=["user:{user_id}", "user:{other_id}"], write=["role:function"]
  • locations: read=["role:function"], write=["user:{user_id}"] (not publicly readable)
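
The per-document permission sets above can be generated from the owning user IDs. This sketch only mirrors the notation used in this document (`user:{user_id}`, `role:function`); it does not call the Appwrite SDK, and the exact permission strings a given Appwrite version expects may differ. The function names are hypothetical.

```python
def response_permissions(user_id: str) -> dict[str, list[str]]:
    """Owner-only access; the empty update list keeps responses immutable."""
    owner = f"user:{user_id}"
    return {"read": [owner], "create": [owner], "update": [], "delete": [owner]}


def match_permissions(user_id: str, other_id: str) -> dict[str, list[str]]:
    """Both users in the pair may read; only the Function may write."""
    return {
        "read": [f"user:{user_id}", f"user:{other_id}"],
        "write": ["role:function"],
    }
```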