feat(middleware): add rate_limit by hidetzu · Pull Request #10 · hidetzu/prism-api

hidetzu · 2026-04-11T22:01:01Z

Summary

PR 4/9 of the Phase 2 series. Adds the second defensive middleware: per-IP token-bucket rate limiting backed by `golang.org/x/time/rate`. Each unique client IP gets its own `*rate.Limiter`, and requests over the limit are rejected with `429` and `response.CodeRateLimited` before reaching the handler.

Also includes a small internal refactor: the `clientIP` helper moves from `logging.go` into a dedicated `clientip.go` file so both `logging.go` and `ratelimit.go` can share it without drift. Behavior is unchanged.

Not wired into `app.New()`. Wiring happens in PR 6 along with `body_limit` and `concurrency_limit`.

Key design decisions

Keying by `clientIP`, not `RemoteAddr`

Fly.io terminates TLS at its edge, so every request arriving at the machine shares the same RemoteAddr (the edge proxy's internal IP). Without X-Forwarded-For keying, every client would share a single limiter, and one busy caller could exhaust the budget for everyone else.

clientIP prefers the first entry of X-Forwarded-For and falls back to RemoteAddr only when no XFF header is present. The rate limit tests include a KeysByXForwardedForNotRemoteAddr case that explicitly simulates the Fly edge scenario.

rpm → tokens per second conversion

rate.Limit is tokens per second. The middleware converts from rpm via rate.Limit(float64(rpm) / 60.0). For the Phase 1 config defaults (RATE_LIMIT_RPM=10, RATE_LIMIT_BURST=20), that's 0.167 tokens/sec with a 20-token burst.

`rpm <= 0` disables the limiter

Matches the convention established by body_limit. Useful for tests and opt-in no-limit local dev.

Unbounded map growth is accepted for Phase 2

Per the Phase 2 instruction §14, eviction of stale per-IP entries is not required in this phase. The map will grow over process lifetime. For short-lived Fly.io machines (auto-stop after idle) this is a non-issue in practice; a background janitor can be added in a later phase if long-lived deployments see problematic growth. The design comment in ratelimit.go flags this explicitly.

`clientIP` refactor

Moved from logging.go into a dedicated clientip.go file to avoid duplication. Both callers live in the same middleware package so the helper stays package-private (not exported). The logging tests still pass against the moved definition — the refactor is mechanical.

Test plan

7 test cases in `ratelimit_test.go`:

| Test | Verifies |
| --- | --- |
| `AllowsWithinBurst` | 3 requests with `burst=3` all pass |
| `RejectsBeyondBurstReturns429` | 3rd request with `burst=2` returns 429 with standard error body (`code=rate_limited`, non-empty message, correct Content-Type) |
| `PerIPIsIndependent` | IP A exhausts its burst; IP B still has full burst |
| `KeysByXForwardedForNotRemoteAddr` | Two clients sharing one `RemoteAddr` but with different XFF headers are keyed independently (Fly edge scenario) |
| `RecoversOverTime` | 600 rpm + burst 1: exhaust, reject, sleep 150ms, recover. Verifies the limiter is time-based |
| `DisabledWhenRPMZero` | `RateLimit(0, ...)` → handler runs 5/5 times |
| `NegativeRPMDisabled` | `RateLimit(-1, ...)` → handler runs |

Local checks

`go vet ./...`: clean
`go test ./... -race -count=1`: all pass (existing + 7 new + 2 body_limit + 2 logging)
`golangci-lint run`: 0 issues
`go mod tidy`: leaves go.mod / go.sum clean (CI tidy check will pass)

Dependencies added

golang.org/x/time v0.15.0

Per `docs/development_rules.md` §3 this is an explicitly-allowed dependency for rate limiting.

Phase 2 context

| # | PR | Status |
| --- | --- | --- |
| 1 | response codes | merged |
| 2 | validation helpers | merged |
| 3 | body_limit middleware | merged |
| 4 | rate_limit middleware (this PR) | open |
| 5 | concurrency_limit middleware | blocked on 1 |
| 6 | wire defensive middleware + E2E test | blocked on 3, 4, 5 |
| 7 | analyze usecase adapter for pkg/prism | blocked on 1 |
| 8 | /v1/analyze handler and routing | blocked on 1, 2, 7 |
| 9 | /v1/prompt endpoint | blocked on 1, 2, 7 |

Intended to be squash-merged.

Second of the three defensive middleware. Enforces a per-IP token-bucket rate limit using golang.org/x/time/rate. Each unique client IP gets its own *rate.Limiter configured with rpm requests per minute and a burst capacity of burst tokens; requests over the limit are rejected with 429 and response.CodeRateLimited before the handler runs.

Keying uses the same clientIP helper that logging already uses, so rate limiting honors X-Forwarded-For (Fly.io edge proxy) and falls back to RemoteAddr. To keep the two consumers in sync, the helper is moved out of logging.go into a dedicated clientip.go file; both logging.go and ratelimit.go now reference the same package-internal clientIP function. Behavior is unchanged; the logging tests still pass against the new location.

rpm <= 0 disables the limiter entirely (returns a pass-through middleware), matching the convention established by body_limit.

Memory note: the per-IP limiter map grows unbounded over process lifetime. Phase 2 explicitly accepts this (Phase 2 instruction §14 says eviction is not required); a background janitor will be added in a later phase if long-lived deployments see meaningful growth.

Not yet wired into app.New(); the full defensive stack (body_limit + rate_limit + concurrency_limit) will land together in a later PR along with an end-to-end test.

Tests cover within-burst passthrough, beyond-burst rejection with the standard error body, per-IP independence, X-Forwarded-For keying (important for the Fly.io edge shared-proxy case), time-based recovery with a 150 ms sleep, and the disabled paths for zero and negative rpm.

hidetzu merged commit 0529de1 into main Apr 11, 2026
2 checks passed

hidetzu merged 1 commit into main from feat/middleware-rate-limit
