 - **Vector store size impact varies by model**: GPT-4.1 series shows minimal latency impact across vector store sizes, while GPT-5 series shows significant increases.
@@ -241,10 +235,6 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
|| Extra Large (105 MB) | 0.636 | 0.528 | 0.528 | 0.528 |

 **Key Insights:**

 - **Best Performance**: gpt-5-mini consistently achieves the highest ROC AUC scores across all vector store sizes (0.909-0.939)
-- **Best Latency**: gpt-4.1-nano shows the most consistent and lowest latency across all scales (4,171-4,809ms P50) but shows poor performance
+- **Best Latency**: gpt-4.1-mini shows the most consistent and lowest latency across all scales (6,661-7,374ms P50) while maintaining solid accuracy
 - **Most Stable**: gpt-4.1-mini (default) maintains relatively stable performance across vector store sizes with good accuracy-latency balance
 - **Scale Sensitivity**: gpt-5 shows the most variability in performance across vector store sizes, with performance dropping significantly at larger scales
 - **Performance vs Scale**: Most models show decreasing performance as vector store size increases, with gpt-5-mini being the most resilient
@@ -271,4 +257,4 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
 - **Signal-to-noise ratio degradation**: Larger vector stores contain more irrelevant documents that may not be relevant to the specific factual claims being validated
 - **Semantic search limitations**: File search retrieves semantically similar documents, but with a large diverse knowledge source, these may not always be factually relevant
 - **Document quality matters more than quantity**: The relevance and accuracy of documents is more important than the total number of documents
-- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe
+- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe