This directory contains conformance tests that validate Agent-Diff API replicas against their real-world production counterparts. The tests compare **response schema/shape**, **status codes**, **error semantics**, **mutation behavior**, and **pagination** — not exact values, since IDs and timestamps naturally differ between environments.
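Comparing structure rather than values can be sketched as a recursive diff. It checks keys and value types and ignores the values themselves. This is a minimal illustration, not the suite's own helper. `shape_diff` is a hypothetical name, both arguments are assumed to be already-decoded JSON, and list elements are assumed to be homogeneous, so only the first element of each list is compared.

```python
from typing import Any


def shape_diff(prod: Any, replica: Any, path: str = "$") -> list[str]:
    """Hypothetical helper: list structural differences between two decoded
    JSON values, ignoring the values themselves (IDs, timestamps, ...)."""
    if isinstance(prod, dict) and isinstance(replica, dict):
        diffs: list[str] = []
        for key in sorted(prod.keys() | replica.keys()):
            if key not in replica:
                diffs.append(f"{path}.{key}: missing in replica")
            elif key not in prod:
                diffs.append(f"{path}.{key}: extra in replica")
            else:
                diffs.extend(shape_diff(prod[key], replica[key], f"{path}.{key}"))
        return diffs
    if isinstance(prod, list) and isinstance(replica, list):
        # Assumption: lists are homogeneous, so the first elements stand in
        # for the whole list. Empty lists carry no element shape to compare.
        if prod and replica:
            return shape_diff(prod[0], replica[0], f"{path}[0]")
        return []
    # Scalars (and mismatched container kinds) compare by type only.
    if type(prod) is not type(replica):
        return [f"{path}: {type(prod).__name__} != {type(replica).__name__}"]
    return []


# Same shape, different IDs and timestamps: no differences reported.
print(shape_diff(
    {"id": "abc", "created": "2024-01-01T00:00:00Z"},
    {"id": "xyz", "created": "2025-06-30T12:00:00Z"},
))  # prints []
```

Comparing only the first list element keeps the check cheap on long list responses. A stricter variant could walk every element, at the cost of noisier reports when production data is heterogeneous.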
## What Existed Before
Prior to this expansion, conformance tests existed for Box, Calendar, and Linear as production parity tests, and for Slack as docs-golden (replica-only) tests. Coverage was uneven:
- **Extended error handling**: Invalid time ranges (end before start), missing required fields, deleting a non-existent calendar, listing events for a non-existent calendar, and an ACL rule with an invalid role
- **Pagination parity**: Events and CalendarList with `maxResults=1`, following `nextPageToken` across pages
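Following `nextPageToken` with `maxResults=1` can be sketched as a loop that keeps requesting pages until the token disappears and checks each page's size along the way. The `fetch_page(token)` callable is a hypothetical stand-in for whatever client call issues the Events or CalendarList request with `maxResults` and `pageToken` set. Only the `items` and `nextPageToken` response fields come from the Google Calendar API; everything else is an assumption for illustration.

```python
from typing import Any, Callable, Optional

Page = dict[str, Any]


def collect_items(
    fetch_page: Callable[[Optional[str]], Page],
    max_results: int = 1,
    max_pages: int = 100,
) -> list[Any]:
    """Follow nextPageToken until it is absent, checking every page size.

    `fetch_page(token)` is a hypothetical callable that performs one list
    request with maxResults=max_results and pageToken=token (None for the
    first page) and returns the decoded JSON body.
    """
    items: list[Any] = []
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        batch = page.get("items", [])
        if len(batch) > max_results:
            raise AssertionError(
                f"page returned {len(batch)} items; maxResults was {max_results}"
            )
        items.extend(batch)
        token = page.get("nextPageToken")
        if not token:  # the last page omits nextPageToken
            return items
    raise AssertionError(f"no final page within {max_pages} requests")


# Usage with canned pages standing in for a real client:
pages = {
    None: {"items": [{"id": "a"}], "nextPageToken": "t1"},
    "t1": {"items": [{"id": "b"}], "nextPageToken": "t2"},
    "t2": {"items": [{"id": "c"}]},
}
print([event["id"] for event in collect_items(pages.__getitem__)])  # prints ['a', 'b', 'c']
```

Running the same loop against production and the replica yields two item sequences whose per-item shapes can then be compared. The `max_pages` cap keeps a replica that never drops the token from hanging the test.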
### Expanded: Linear (`test_linear_parity_comprehensive.py`)
Added three new test sections:
- **Error response parity**: Non-existent issue by UUID, mutation with an invalid team ID, and a malformed UUID. Validates that both environments return errors for the same inputs.
- **Pagination parity**: `issues(first:1)` and `issues(last:1)` `pageInfo` shape, and cursor-based pagination following
- **Earlier fixes**: Removed three invalid test cases that exercised replica extensions not present in production (the `labels.none` and `comments.none` filters, and stricter validation of a missing title than production applies)
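Cursor-based pagination against Linear's GraphQL API can be sketched in the same spirit. The sketch requests `issues(first: 1, after: $after)`, checks that `pageInfo` carries the requested keys, and follows `endCursor` while `hasNextPage` is true. The `run_query(query, variables)` callable is a hypothetical helper assumed to return the `data` object of a GraphQL response. The set of `pageInfo` fields checked is an assumption based on the Relay connection convention.

```python
from typing import Any, Callable, Optional

ISSUES_QUERY = """
query Issues($after: String) {
  issues(first: 1, after: $after) {
    nodes { id }
    pageInfo { hasNextPage hasPreviousPage startCursor endCursor }
  }
}
"""

# Relay-style PageInfo fields requested above; both environments should return all of them.
PAGE_INFO_KEYS = {"hasNextPage", "hasPreviousPage", "startCursor", "endCursor"}

RunQuery = Callable[[str, dict[str, Any]], dict[str, Any]]


def collect_issue_ids(run_query: RunQuery, max_pages: int = 100) -> list[str]:
    """Walk issues(first: 1) by following endCursor while hasNextPage is true.

    `run_query(query, variables)` is a hypothetical helper returning the
    `data` object of a GraphQL response.
    """
    ids: list[str] = []
    after: Optional[str] = None
    for _ in range(max_pages):
        connection = run_query(ISSUES_QUERY, {"after": after})["issues"]
        page_info = connection["pageInfo"]
        if set(page_info) != PAGE_INFO_KEYS:
            raise AssertionError(f"unexpected pageInfo keys: {sorted(page_info)}")
        if len(connection["nodes"]) > 1:
            raise AssertionError("first: 1 returned more than one node")
        ids.extend(node["id"] for node in connection["nodes"])
        if not page_info["hasNextPage"]:
            return ids
        after = page_info["endCursor"]
    raise AssertionError(f"no final page within {max_pages} requests")
```

Because IDs differ between environments, a parity test would compare the number of pages walked and the `pageInfo` shape rather than the collected IDs themselves.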

Retained as a complementary replica-only validation layer (22 tests). These run without API credentials and validate response shapes against documented Slack API contracts.

Across all four services, the following core API behaviors are confirmed to match production:
- **Response schema/shape parity**: All CRUD operations (create, read, update, delete) return structurally identical responses in the replicas and the production APIs. Field names, nesting, types, and list structures match.
- **Error code parity**: Replicas return the same error codes as production for invalid inputs: `404` for non-existent resources, `400` for malformed requests, and `channel_not_found` / `user_not_found` / `no_text` / `message_not_found` for Slack-specific errors.
- **Pagination behavior**: Cursor-based (Slack, Linear) and token-based (Calendar) pagination produce structurally identical responses. Page sizes are respected and continuation tokens work correctly.
- **Mutation semantics**: Create, update, and delete operations produce equivalent state changes and response shapes across all services.
- **GraphQL schema fidelity** (Linear): Introspection comparison confirms that query/mutation fields, input types, and object types are aligned between production and replica on all benchmark-relevant surfaces.
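Error code parity can be reduced to comparing a small signature per response. The signature is the HTTP status code plus, for Slack, the `error` string that the Web API returns alongside `"ok": false`. The sketch below assumes responses are available as `(status, decoded_body)` pairs. `error_signature` and `assert_error_parity` are hypothetical names, not part of the suite.

```python
from typing import Any, Optional

Signature = tuple[int, Optional[str]]


def error_signature(status: int, body: Any) -> Signature:
    """Reduce an error response to what the parity check compares.

    Slack Web API failures carry {"ok": false, "error": "<code>"}, so the error
    string is included; for other services only the HTTP status is kept.
    """
    if isinstance(body, dict) and body.get("ok") is False:
        return status, body.get("error")
    return status, None


def assert_error_parity(prod: tuple[int, Any], replica: tuple[int, Any]) -> None:
    """Hypothetical check: both (status, body) pairs must share one signature."""
    prod_sig, replica_sig = error_signature(*prod), error_signature(*replica)
    if prod_sig != replica_sig:
        raise AssertionError(f"production {prod_sig} != replica {replica_sig}")


# Both calls report channel_not_found, so the check passes silently.
assert_error_parity(
    (200, {"ok": False, "error": "channel_not_found"}),
    (200, {"ok": False, "error": "channel_not_found"}),
)
```

Error messages are deliberately left out of the signature, since human-readable text can legitimately differ between a replica and production.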
### Minor Issues Identified
The expanded test suite identified a small number of minor discrepancies, none of which affect benchmark scoring or the validity of reported results. These will be addressed before publication:
- **Calendar**: The replica accepts events whose end time is before their start time, whereas Google Calendar returns HTTP 400. This is an input validation gap: the replica processes the request rather than rejecting it. Separately, four event list responses are missing computed fields that Google injects server-side. Neither issue affects the benchmark, because no benchmark task depends on time-range validation rejection or on these specific computed fields.
- **Linear**: Schema introspection detects two fields recently added to Linear's production API (`activity` and `hasSharedUsers` on `IssueFilter`) that the replica does not yet implement. These are new Linear features not used by any benchmark task.
- **Box**: One edge case in collection operations differs from production; it does not affect any benchmark task.
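The Linear introspection check can be approximated by asking each endpoint for the input fields of a type and diffing the name sets. The query uses the standard GraphQL `__type(name:)` introspection field. `missing_in_replica` is a hypothetical helper that takes the `data` object from each response. It returns the fields production exposes but the replica does not.

```python
from typing import Any

# Standard GraphQL introspection: list the input fields of a named type.
INPUT_FIELDS_QUERY = """
query InputFields($name: String!) {
  __type(name: $name) { inputFields { name } }
}
"""


def input_field_names(data: dict[str, Any]) -> set[str]:
    """Extract input field names from the `data` object of INPUT_FIELDS_QUERY."""
    type_info = data.get("__type")
    if type_info is None:  # the type does not exist in this schema
        return set()
    return {field["name"] for field in type_info.get("inputFields") or []}


def missing_in_replica(prod_data: dict[str, Any], replica_data: dict[str, Any]) -> list[str]:
    """Hypothetical helper: input fields production exposes that the replica lacks."""
    return sorted(input_field_names(prod_data) - input_field_names(replica_data))
```

Run with `{"name": "IssueFilter"}` as the variables against both endpoints, this kind of diff is how additions such as `activity` and `hasSharedUsers` would surface. Swapping the operands gives the reverse check for replica-only extensions.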
## How to Run
```bash
# All conformance tests
pytest -m conformance -v

# Individual services (production parity; requires API credentials)