Commit b248798

docs(security): land RLS-lockdown / storage / realtime-(a) remediation plans (#231)
Carved from the #232 draft so the planning docs live in main (the SQL stays in #232 as the applied-to-prod record, not merged). The RLS lockdown is already applied to production (anon locked out, confirmed). #230 references the realtime-jwt-bridge design doc landed here.
1 parent 10e0f74 commit b248798

3 files changed

Lines changed: 234 additions & 0 deletions

docs/security/realtime-jwt-bridge-design.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# Realtime option (a): JWT-mint bridge (design, #231)

**Status: DESIGN for review. Build AFTER the RLS lockdown lands.** This is the
piece that restores `room_messages` realtime under RLS. Not built yet.

## Goal
Keep live chat working once RLS is enabled, **without** the public anon key:
the realtime client authenticates as the logged-in user via a Supabase JWT, and
an RLS policy scopes delivery to the user's room memberships.

## Why a JWT is needed
Sapling uses its **own HMAC session** (`sapling_session`), not Supabase Auth, so
`auth.uid()` is empty and RLS can't identify the user. We bridge by minting a
Supabase-format JWT for the same user and handing it to the realtime client.

## Components

### 1. Mint a Supabase JWT (backend)
At login (and on refresh), the backend mints a short-lived JWT signed with the
**Supabase JWT secret** (legacy HS256; the same secret behind the anon key):
```
claims: { sub: <user_id>, role: "authenticated", aud: "authenticated", exp: now+1h, iat: now }
```
- New endpoint, e.g. `GET /api/auth/realtime-token` (auth-gated by the existing
  session): returns `{ token, expires_at }`, minted from the still-valid 30-day
  session.
- New env `SUPABASE_JWT_SECRET` (from Supabase → Settings → API → JWT secret).
  Add it to `validate_config()` (#174) as required outside local.
- Note: Supabase is migrating to **asymmetric signing keys**. If this project is
  on the new keys, mint/verify with the project's signing key instead of an
  HS256 shared secret; confirm which at build time.

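The minting step could look roughly like the sketch below. It is illustrative only: `mint_realtime_token` and its parameters are hypothetical names, it covers the legacy HS256 case only, and it assumes the JWT secret is used as a raw UTF-8 string (worth confirming at build time). It uses only the standard library so the signing steps are visible; a JWT library would do equally well.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def mint_realtime_token(user_id: str, jwt_secret: str,
                        ttl_seconds: int = 3600,
                        now: Optional[int] = None) -> dict:
    """Return {"token", "expires_at"} for an HS256 JWT with the claims above.

    Hypothetical helper: the name and signature are assumptions, not
    existing Sapling code.
    """
    issued_at = int(time.time()) if now is None else now
    expires_at = issued_at + ttl_seconds
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "sub": user_id,
        "role": "authenticated",
        "aud": "authenticated",
        "iat": issued_at,
        "exp": expires_at,
    }
    # header.payload, each part compact JSON then base64url-encoded
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    # Assumption: the secret is the raw string from the dashboard, not base64.
    signature = hmac.new(jwt_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return {"token": f"{signing_input}.{_b64url(signature)}",
            "expires_at": expires_at}
```

The endpoint handler would call this with the user id resolved from the existing session and return the dict as JSON. Passing `now` explicitly keeps the function deterministic under test.
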
### 2. RLS SELECT policy on room_messages (membership-scoped)
```sql
CREATE POLICY room_messages_member_read ON public.room_messages
  FOR SELECT TO authenticated
  USING (EXISTS (
    SELECT 1 FROM public.room_members m
    WHERE m.room_id = room_messages.room_id AND m.user_id = auth.uid()
  ));
```
With the lockdown's RLS enabled and the JWT setting `auth.uid()`, an
`authenticated` subscriber receives changes **only for rooms they belong to**;
the client-side `room_id` filter stops being the only gate. (`authenticated`
still holds the table GRANT SELECT, which the lockdown intentionally left.)

### 3. Realtime delivery
Two options, smallest first:
- **(i) postgres_changes + the RLS policy above (recommended).** Realtime
  evaluates the subscriber's RLS on each change, so the existing
  `Social.tsx` subscription keeps working but is now membership-scoped. Minimal
  client change: set the JWT (below). `room_messages` is already in the
  `supabase_realtime` publication.
- **(ii) Private channels (Realtime Authorization).** Mark the channel
  `{ config: { private: true } }` and add a policy on `realtime.messages` for
  the topic. More robust/explicit but a larger client rework. Defer unless (i)
  proves insufficient.

### 4. Client wiring (frontend)
- After login, fetch the realtime token and apply it:
  `getSupabase().realtime.setAuth(token)` (and pass it when (re)creating the
  client). The client stops relying on the anon key for authorization.
- **JWT refresh (the main complexity):** the session is **30 days** but the
  Supabase JWT is **~1 hour**. Add a refresh loop (re-fetch the token shortly
  before `expires_at` and call `setAuth` again), or the subscription drops when
  the JWT expires. Handle: tab wake from sleep, network reconnect, and a failed
  refresh (fall back to REST-only, which the #230 display fix already supports).

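The timing rule behind the refresh loop is small enough to pin down separately. The sketch below expresses it in Python purely to illustrate the arithmetic; the real loop would live in the TypeScript client, and `seconds_until_refresh` and the 60-second lead are assumptions rather than decided values.

```python
import time
from typing import Optional


def seconds_until_refresh(expires_at: int,
                          now: Optional[float] = None,
                          lead_seconds: int = 60) -> float:
    """How long to wait before re-fetching the realtime token.

    Hypothetical helper. Returns 0.0 when the refresh point has already
    passed, which covers a tab waking from sleep after the JWT expired:
    the caller should refresh immediately.
    """
    current = time.time() if now is None else now
    return max(0.0, float(expires_at - lead_seconds) - current)
```

Recomputing the delay from `expires_at` on every wake or reconnect, instead of trusting a timer set at login, is what makes the sleep and reconnect cases fall out of the same rule.
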
## Sequencing
1. RLS lockdown (separate, urgent): breaks anon realtime (accepted).
2. This bridge: restores realtime for authenticated users, membership-scoped.

## Effort estimate
Moderate. Backend JWT-mint endpoint + env wiring (small); RLS SELECT policy
(small); client setAuth (small); **JWT refresh loop + reconnect handling
(the real work)**. Reuses the existing Supabase realtime architecture: no
backend fan-out/broker, no client rebuild (contrast option (c)).

## Optional follow-up
If live reactions are wanted back (the dead `room_reactions` subscription is
being removed), publish `room_reactions` to `supabase_realtime` and add the same
membership-scoped SELECT policy. Until then, reactions update on
load/refresh via REST.

docs/security/rls-lockdown-plan.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# Project-wide RLS lockdown: apply & verification plan (#231)

**Status: APPLIED to production 2026-06-13.** Anon is confirmed locked out
(direct REST calls to `users`/`oauth_tokens`/`user_roles`/etc. now return
`permission denied`, SQLSTATE 42501). This doc is the record of what was applied
and how it was verified. The SQL scripts (`rls_lockdown.sql` apply,
`rls_lockdown_rollback.sql` emergency revert) live in **PR #232** as the applied
record and are intentionally NOT merged to `main` (the change went straight to
prod; nothing re-runs them from the repo).

## Why this is safe for the backend
The backend authenticates to Supabase with `SUPABASE_SERVICE_KEY` → the
`service_role`, which has **`rolbypassrls = true`** (verified live:
`SELECT rolname, rolbypassrls FROM pg_roles` → `service_role=t`, `anon=f`,
`authenticated=f`). RLS does not apply to roles with `rolbypassrls` set, so
**every backend query keeps working unchanged**. RLS only constrains
`anon`/`authenticated`, which is exactly the public-anon-key path we're closing.

## Expected breakage (accepted)
Anon realtime on `room_messages` stops delivering once RLS is on / anon DML is
revoked. This stays broken until the **option (a)** JWT bridge lands
(`docs/security/realtime-jwt-bridge-design.md`). Per decision, the full-DB
exposure outranks live chat updates. The #230 display fix already re-fetches via
the (service-role) REST endpoint, so chat still works on load/refresh; only the
live push is paused.

## Test-first on a branch (if available)
Supabase branching wasn't reachable via the MCP for this project (`list_branches`
errored), so it may be on a plan/permission that doesn't expose it. If you have
branching:
1. Create a dev branch in the dashboard.
2. Run `rls_lockdown.sql` against the branch.
3. Run the verification below pointed at the branch.
4. Merge the branch (or apply the same SQL to prod) once green.

If branching is unavailable: apply to prod during a low-traffic window with
`rls_lockdown_rollback.sql` open and ready. The change is transactional
(`BEGIN/COMMIT`) and fast (DDL only, no table rewrites).

## Pre-apply snapshot (record for diffing)
```sql
SELECT count(*) FILTER (WHERE relrowsecurity) AS rls_on,
       count(*) FILTER (WHERE NOT relrowsecurity) AS rls_off
FROM pg_class WHERE relnamespace='public'::regnamespace AND relkind='r';
-- expected before: rls_on=2, rls_off=38
```

## Apply
Run `backend/db/security/rls_lockdown.sql`.

## Post-apply verification checklist
1. **RLS now on for all public tables:**
   ```sql
   SELECT count(*) FILTER (WHERE NOT relrowsecurity) AS still_off
   FROM pg_class WHERE relnamespace='public'::regnamespace AND relkind='r';
   -- expect: still_off = 0
   ```
2. **anon has no table DML left:**
   ```sql
   SELECT count(*) AS anon_grants
   FROM information_schema.role_table_grants
   WHERE table_schema='public' AND grantee='anon'
     AND privilege_type IN ('SELECT','INSERT','UPDATE','DELETE');
   -- expect: anon_grants = 0
   ```
3. **anon is blocked at the REST endpoint** (the actual exposure): with the
   public anon key,
   ```
   curl -s -o /dev/null -w "%{http_code}\n" \
     "https://jxqcmjqtjlpuxfrxmrdv.supabase.co/rest/v1/users?select=id&limit=1" \
     -H "apikey: <ANON_KEY>" -H "Authorization: Bearer <ANON_KEY>"
   ```
   Expect **401** with a `permission denied` error (or, at most, an empty
   `[]`), never a row. Repeat for `user_roles`, `oauth_tokens`, `messages`.
4. **Backend still works (service_role):**
   - `cd backend && python -m pytest tests/ -q` (suite is hermetic; sanity only).
   - Hit live read + write endpoints against the target DB and confirm normal
     behavior, e.g. `GET /api/auth/me` (read), a calendar/gradebook create
     (write), a notes save. All should succeed exactly as before (service_role
     bypasses RLS).
5. **Realtime is paused (expected):** open a room. Messages still load and
   refresh via REST; live push is down until option (a). No errors beyond the
   subscription returning nothing.

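Step 3 of the checklist can be scripted so every sensitive table is probed the same way. The following is a minimal sketch under stated assumptions: `build_anon_probe`, `is_locked_out`, and `probe_table` are hypothetical helpers, and treating 403 as a pass alongside the 401 named above is an extra assumption, not something the plan specifies.

```python
import json
import urllib.error
import urllib.request


def build_anon_probe(base_url: str, table: str, anon_key: str) -> urllib.request.Request:
    """Build the same request as the curl in step 3, for any table."""
    url = f"{base_url.rstrip('/')}/rest/v1/{table}?select=id&limit=1"
    return urllib.request.Request(url, headers={
        "apikey": anon_key,
        "Authorization": f"Bearer {anon_key}",
    })


def is_locked_out(status: int, body: str) -> bool:
    """True for a denial (401, or 403 by assumption) or an empty array; False for rows."""
    if status in (401, 403):
        return True
    if status == 200:
        try:
            return json.loads(body) == []
        except json.JSONDecodeError:
            return False
    return False


def probe_table(base_url: str, table: str, anon_key: str) -> bool:
    """Send the probe and classify the response. Performs a real HTTP call."""
    request = build_anon_probe(base_url, table, anon_key)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return is_locked_out(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we check.
        return is_locked_out(err.code, err.read().decode("utf-8", "replace"))
```

Running `probe_table` over `users`, `user_roles`, `oauth_tokens`, and `messages` and failing on any `False` gives a repeatable version of the manual check.
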
## Rollback
If something critical breaks: run
`backend/db/security/rls_lockdown_rollback.sql` (re-grants anon, disables RLS on
the 38 tables that had it off before). ⚠️ This restores the insecure state;
re-apply the lockdown + option (a) as soon as the issue is understood.

## Follow-ups (not in this script)
- `authenticated` keeps its grants (RLS-with-no-policy denies it today); option
  (a) adds membership-scoped policies for it on `room_messages`.
- Storage hardening is a separate track (`docs/security/storage-hardening-plan.md`).
- The 2 already-RLS tables (`achievement_cosmetics`, `achievement_triggers`)
  have RLS on but **no policies**; confirm nothing legitimately reads them via
  anon (the backend uses service_role, so it's unaffected).

docs/security/storage-hardening-plan.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Storage hardening: PR plan (#231)

**Status: DRAFT plan for review. Nothing applied.**

## Live findings (Sapling prod, read-only)
Buckets that actually exist (3):

| Bucket | `public` | Written by | Read by | Issue |
|---|---|---|---|---|
| `issues-media-files` (issue-report screenshots) | **true** | frontend **anon key** (`ReportIssueFlow.tsx`) | `getPublicUrl` (public) | anon upload + public read |
| `application_resumes` (résumés) | **true** | backend service key (`careers.py`) | `getPublicUrl` (public) | **résumé PII publicly readable** |
| `avatars` | true | backend service key (`storage_service.py`) | public `<img>` | intended public read |

`storage.objects` policies: `"Allow public read"` (SELECT, `{public}`, `issues-media-files`) and **`"Allow uploads"` (INSERT, `{public}`, no bucket/auth restriction)** → anyone can upload to **any** bucket, unauthenticated, unbounded (no size limit on `issues-media-files`/`application_resumes`).

Note: `chat-images` and `cosmetic-assets` referenced in code **do not exist**, so those upload paths are dead (separate cleanup; not a live exposure).

## Target state
All storage writes go through the **backend (service_role)**; private buckets are read via **backend-generated signed URLs**; only `avatars` stays public-read. After this, there are **no anon/public storage policies**, so the anon storage surface is gone.

| Bucket | public | upload path | read path |
|---|---|---|---|
| `issues-media-files` | **false** | new backend endpoint (multipart → service-key upload), reusing `request_limits.read_within_limit` + content-type allowlist (the #220/#229 pattern) | backend signed URL (admin view) |
| `application_resumes` | **false** | already backend (`careers.py`) | backend signed URL (admin view) |
| `avatars` | true | already backend | public (unchanged) |

## Changes

### SQL (review before applying)
```sql
BEGIN;
UPDATE storage.buckets SET public = false WHERE id IN ('issues-media-files','application_resumes');
DROP POLICY IF EXISTS "Allow uploads" ON storage.objects;     -- kills the global public INSERT
DROP POLICY IF EXISTS "Allow public read" ON storage.objects; -- issues-media-files public read
COMMIT;
```
No new storage.objects policies are needed: backend uploads/reads use `service_role` (bypasses storage RLS). `avatars` stays `public=true` so its objects remain readable without a policy.

### Backend
- New `POST /api/issue-reports/screenshot` (auth-gated via `get_session_user_id`): accepts the file, validates type+size with the shared `request_limits` helpers, uploads to `issues-media-files` with the service key (mirror `careers._upload_resume`), returns the storage path (not a public URL).
- Signed-URL helper for private buckets (admin views of screenshots/résumés): backend issues a short-TTL signed URL via the storage REST API with the service key.

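One possible shape for the signed-URL helper is sketched below. The function names are hypothetical, and the endpoint path, the `expiresIn` field, and the relative `signedURL` response field reflect my understanding of the Supabase Storage REST API; they should be checked against the Storage API reference before this is built.

```python
import json
import urllib.parse
import urllib.request


def build_sign_request(supabase_url: str, service_key: str, bucket: str,
                       object_path: str, expires_in: int = 300) -> urllib.request.Request:
    """Build the POST that asks Storage for a short-TTL signed URL.

    Hypothetical helper; endpoint shape is an assumption to verify.
    """
    quoted_path = urllib.parse.quote(object_path, safe="/")
    url = f"{supabase_url.rstrip('/')}/storage/v1/object/sign/{bucket}/{quoted_path}"
    body = json.dumps({"expiresIn": expires_in}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST", headers={
        "Authorization": f"Bearer {service_key}",
        "apikey": service_key,
        "Content-Type": "application/json",
    })


def signed_url_from_response(supabase_url: str, payload: dict) -> str:
    """Turn the relative `signedURL` in the response into an absolute URL."""
    return f"{supabase_url.rstrip('/')}/storage/v1{payload['signedURL']}"
```

Keeping request construction separate from the network call lets the helper be unit-tested without touching Storage; the admin endpoint would send the request with the backend's existing HTTP client and return only the resulting URL to the browser.
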
### Frontend
- `ReportIssueFlow.tsx`: stop using the anon `supabase.storage` client; POST the screenshot to the new backend endpoint. Removes a direct anon-key path (also shrinks the #231 surface).
- Admin résumé/screenshot views: fetch signed URLs from the backend instead of assuming public URLs.

## Verification
- `storage.buckets`: `issues-media-files` and `application_resumes` show `public=false`; `avatars` stays `true`.
- `pg_policies` (schema `storage`): the two `{public}` policies are gone.
- Anon upload attempt → denied. Public URL to a private-bucket object → 400/403; signed URL → 200.
- Issue-report flow and résumé upload still work end-to-end via the backend; avatars still render.

## Priority
`application_resumes` (résumé PII, publicly readable) is **equal priority** to the screenshots bucket: both flip to private first, and the global public-INSERT policy is dropped in the same change.
