feat: v15 AI Infrastructure — provider abstraction, split models, latency, local models, AA benchmarks by seanthimons · Pull Request #162 · seanthimons/serapeum

seanthimons · 2026-03-18T19:44:29Z

Summary

Complete v15 AI Infrastructure milestone: 5 phases, 14 commits, 215 new tests.

Set A — Retrieval & Usage Tracking

OA Usage Tracking — API key support, usage log table, Cost Tracker section with budget badge and 90% warning toast
Retrieval Quality — Split VSS/BM25 with RRF fusion, RAG-Fusion query reformulation, contextual chunk headers, stale index detection

Set B — AI Infrastructure (5 phases)

Phase 1: Provider Abstraction — Unified provider_chat_completion/provider_get_embeddings interface, all 15+ call sites migrated, duration_ms timing on every call
Phase 2: Latency Tracking — Migration 012, latency queries with p50/p95 percentiles, Cost Tracker latency section with sparkline trend
Phase 3: Split Models — 3-slot routing (fast/quality/embedding) via COST_OPERATION_META slots and resolve_model_for_operation(), Settings UI with 3 dropdowns, chat_model → quality_model migration
Phase 4: Provider Management — Migration 013 (providers table), CRUD with default-provider protection, Settings UI for add/edit/delete/test providers, embedding dimension detection and mismatch warning, stale index extension
Phase 5: AA Integration — R/api_artificialanalysis.R with fetch/load/cache, bundled snapshot (14 models), model ID matching (manual + fuzzy), enriched model picker labels (Q:score, tok/s, $/M), smart defaults algorithm, model info panel enrichment

Key files

File	Role
`R/api_provider.R`	Provider abstraction layer + slot resolution
`R/api_artificialanalysis.R`	AA benchmarks client
`R/cost_tracking.R`	Operation metadata with slots, latency queries
`R/mod_settings.R`	3-slot UI, providers section, AA section
`R/mod_cost_tracker.R`	Latency analytics UI
`migrations/012_*.sql`	duration_ms column
`migrations/013_*.sql`	providers table

Test plan

215 new automated tests pass (41 provider + 59 cost/latency + 78 slot/CRUD/stale + 37 AA)
Shiny smoke test passes after every phase
Manual: Settings page renders 3 model dropdowns correctly
Manual: Add/test/delete a custom provider (e.g., Ollama)
Manual: Latency section appears in Cost Tracker after an LLM call
Manual: AA benchmark data shows in model picker labels
Manual: Embedding dimension warning appears when switching models

…nd migration nudge

REVERSION POINT: Phase 1 of v15 Set A complete. Safe to revert to this commit if Phase 2 (Cost Tracker UI + Sidebar Badge) introduces issues.

- Add migration 011: oa_usage_log table for tracking API credit usage
- Add perform_oa_request() centralized wrapper with header parsing
- Replace all req_perform() calls with perform_oa_request()
- Add parse_oa_usage_headers() and log_oa_usage() functions
- Add openalex_api_key to Settings UI, effective config, and env var support
- Add migration nudge banner (dismissible) for email-only users
- Add should_show_oa_migration_nudge() helper
- Add 43 unit tests across test-oa-usage-tracking.R and test-oa-migration.R

Resolves groundwork for #157

…d toast warning

REVERSION POINT: Phase 2 of v15 Set A complete. Safe to revert to this commit if Phase 3 (Split VSS/BM25 with RRF Fusion) introduces issues.

- Add get_oa_daily_usage() and get_oa_usage_history() query functions
- Add oa_budget_percentage(), oa_budget_color(), oa_toast_should_fire() helpers
- Add OA usage value box to Cost Tracker tab (daily budget, requests, credit usage)
- Add sidebar OA budget badge with green/yellow/red color tiers
- Add one-time-per-day toast notification at >= 90% budget consumption
- Badge and tracker section hidden for polite-pool users (no API key)
- Add openalex_search, openalex_fetch, openalex_topics, query_reformulation to COST_OPERATION_META

Continues #157
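The budget helpers listed in this commit can be pictured with a minimal sketch. The function names below carry a `_sketch` suffix to mark them as illustrative, not the PR's code. The 90% toast threshold and the green/yellow/red tiers come from the commit message; the 70% yellow cutoff and the exact tier boundaries are assumptions.

```r
# Percentage of the daily budget consumed; NA when no usable limit is known.
oa_budget_percentage_sketch <- function(used, limit) {
  if (is.null(limit) || is.na(limit) || limit <= 0) return(NA_real_)
  100 * used / limit
}

# Map a percentage to a badge color. The 70% cutoff is an assumption.
oa_budget_color_sketch <- function(pct) {
  if (is.na(pct)) return("green")
  if (pct >= 90) "red" else if (pct >= 70) "yellow" else "green"
}

# Fire at most once per day, and only at >= 90% consumption.
# `last_fired_date` is the stored "YYYY-MM-DD" string, or NULL if never fired.
oa_toast_should_fire_sketch <- function(pct, last_fired_date,
                                        today = format(Sys.time(), "%Y-%m-%d", tz = "UTC")) {
  if (is.na(pct) || pct < 90) return(FALSE)
  is.null(last_fired_date) || is.na(last_fired_date) || last_fired_date != today
}

pct <- oa_budget_percentage_sketch(used = 950, limit = 1000)
color <- oa_budget_color_sketch(pct)
```

Passing `today` as an argument keeps the once-per-day check testable without touching the system clock.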

…tract title bug

REVERSION POINT: Phase 3 of v15 Set A complete. Safe to revert to this commit if Phase 4 (Query Reformulation) introduces issues.

- Add rrf_merge() function implementing Reciprocal Rank Fusion (k=60)
- Replace single ragnar_retrieve() with split ragnar_retrieve_vss() + ragnar_retrieve_bm25()
- Add enrich_retrieval_results() shared metadata parser
- Fix #159: abstract chunks now show actual paper titles from DB instead of "[Abstract]"
- retrieve_with_ragnar() now accepts multiple queries (prep for Phase 4 RAG-Fusion)
- Pass con to retrieve_with_ragnar() for abstract title lookup
- Add 15 unit tests for RRF merge algorithm

Resolves #159, continues #12, #48
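For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the scoring rule with k = 60. Each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in. The function name and the chunk IDs are hypothetical; the PR's actual rrf_merge() signature is not shown on this page.

```r
# Fuse several ranked ID lists (best first) into one score-sorted vector.
rrf_merge_sketch <- function(ranked_lists, k = 60) {
  scores <- numeric(0)
  for (ids in ranked_lists) {
    for (rank in seq_along(ids)) {
      id <- ids[[rank]]
      prev <- if (id %in% names(scores)) scores[[id]] else 0
      scores[id] <- prev + 1 / (k + rank)
    }
  }
  sort(scores, decreasing = TRUE)
}

vss_ids  <- c("chunk_a", "chunk_b", "chunk_c")  # vector-similarity ranking
bm25_ids <- c("chunk_b", "chunk_d", "chunk_a")  # keyword ranking
fused <- rrf_merge_sketch(list(vss_ids, bm25_ids))
```

Because only ranks enter the formula, the two retrievers' raw scores never need to be put on a common scale.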

REVERSION POINT: Phase 4 of v15 Set A complete. Safe to revert to this commit if Phase 5 (Contextual Chunk Headers) introduces issues.

- Add generate_query_variants() for LLM-powered query expansion (3 variants)
- Add parse_query_variants() supporting numbered, dashed, and plain line formats
- Wire reformulation into search_chunks_hybrid() with config/session params
- Log reformulation cost as "query_reformulation" operation
- Add Settings toggle: "Query Reformulation (RAG-Fusion)" (enabled by default)
- When disabled, falls back to single-query retrieval (still with RRF)
- Add 7 unit tests for query variant parsing

Continues #12, #48, #142
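One way to parse LLM output that may arrive as a numbered list, a dashed list, or plain lines is sketched below. This is an illustration under assumptions, not the PR's parse_query_variants(): the function name, the exact list markers accepted, and the de-duplication step are all hypothetical.

```r
# Split model output into one query per line and strip leading list markers.
parse_query_variants_sketch <- function(text) {
  lines <- trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]
  # Remove "1.", "2)", "-" or "*" markers when followed by whitespace.
  lines <- sub("^([0-9]+[.)]|[-*])[[:space:]]+", "", lines)
  unique(lines[nzchar(lines)])
}

raw <- "1. foo bar\n2) baz\n- qux\nplain line\n\n"
variants <- parse_query_variants_sketch(raw)
```

Requiring whitespace after the marker avoids mangling a query that merely starts with a number such as "3.5".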

REVERSION POINT: Phase 5 of v15 Set A complete. Safe to revert to this commit if subsequent changes introduce issues.

- Add prepend_contextual_header() to prefix chunks with [Paper Title] or [Title | Section: X]
- Modify chunk_with_ragnar() to accept optional paper_title parameter
- Prepend paper titles to abstract chunks during rebuild (replaces old [Abstract] approach)
- Use filename (minus extension) as document chunk header during rebuild
- Add RAGNAR_INDEX_SCHEMA_VERSION (v2) for tracking chunk format changes
- Add is_ragnar_store_stale() and mark_ragnar_store_current() for stale detection
- Mark stores as current (v2) after successful rebuild
- Add 17 unit tests for headers, backward compatibility, and stale detection

Completes v15 Set A. Resolves #12, #48. Continues #142, #157.
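The two header shapes named above, [Paper Title] and [Title | Section: X], can be illustrated with a short sketch. The function name, the argument names, and the newline separating header from chunk text are assumptions; only the bracketed formats come from the commit message.

```r
# Prefix a chunk with a bracketed title header, adding the section when known.
prepend_contextual_header_sketch <- function(chunk, title, section = NULL) {
  header <- if (is.null(section) || !nzchar(section)) {
    sprintf("[%s]", title)
  } else {
    sprintf("[%s | Section: %s]", title, section)
  }
  paste0(header, "\n", chunk)
}

abstract_chunk <- prepend_contextual_header_sketch("We study X.", "A Paper on X")
section_chunk  <- prepend_contextual_header_sketch("We measured Y.", "A Paper on X", "Methods")
```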

…endpoints

Create R/api_provider.R with unified interface (provider_chat_completion, provider_get_embeddings, provider_list_models, provider_check_health) that wraps any OpenAI-compatible LLM endpoint with automatic duration_ms timing and NULL usage token handling. Big-bang migration of all 15+ call sites across rag.R, slides.R, mod_query_builder.R, _ragnar.R, db.R, mod_document_notebook.R, mod_search_notebook.R, and mod_slides.R from direct chat_completion/get_embeddings calls to the provider layer.

Key changes:
- All LLM calls now route through provider_chat_completion/provider_get_embeddings
- duration_ms captured for every call via proc.time()
- log_cost() accepts optional duration_ms parameter (forward-compat with Phase 2)
- estimate_cost() returns $0 for local models (is_local=TRUE) instead of DEFAULT_PRICING
- NULL usage tokens default to 0 (graceful handling for local models)
- mirai workers source api_provider.R and config.R for async reindex tasks

Tests cover: provider config creation, usage normalization (NULL/partial), config bridge, health check (offline), cost estimation (local vs cloud), and log_cost with optional duration_ms parameter. Also adds brainstorm and plan documents for v15 Set B.
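The timing and usage-normalization behavior described in this commit might look roughly like the sketch below. All names are hypothetical stand-ins, and `fake_local_call` replaces a real HTTP request so the example runs offline. The use of proc.time() and the default of 0 for missing token counts follow the commit message.

```r
# Wrap any list-returning call and attach elapsed wall-clock time in milliseconds.
with_duration_sketch <- function(fn, ...) {
  start <- proc.time()[["elapsed"]]
  result <- fn(...)
  result$duration_ms <- round((proc.time()[["elapsed"]] - start) * 1000)
  result
}

# Local servers may omit the usage block entirely; treat missing counts as 0.
normalize_usage_sketch <- function(usage) {
  zero_if_null <- function(x) if (is.null(x)) 0 else x
  list(
    prompt_tokens     = zero_if_null(usage$prompt_tokens),
    completion_tokens = zero_if_null(usage$completion_tokens),
    total_tokens      = zero_if_null(usage$total_tokens)
  )
}

fake_local_call <- function() list(content = "ok", usage = NULL)
result <- with_duration_sketch(fake_local_call)
usage  <- normalize_usage_sketch(result$usage)
```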

… Tracker UI

Phase 2 of v15 Set B AI Infrastructure:

- Migration 012: add duration_ms column to cost_log
- 4 latency query functions (by model, by operation, trend, summary) with p50/p95 percentiles and NULL-safe handling
- Latency accordion section in Cost Tracker with value box, per-model table, per-operation table, and sparkline trend
- 22 new tests covering all latency queries and edge cases
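As a rough illustration of the p50/p95 summary with NULL-safe handling, the sketch below computes the percentiles in plain R. The PR's queries presumably run against the cost_log table in the database; doing it in memory, the function name, and the returned list shape are assumptions made for this example.

```r
# Summarize call latencies, ignoring rows logged before duration_ms existed (NA).
latency_summary_sketch <- function(duration_ms) {
  x <- duration_ms[!is.na(duration_ms)]
  if (length(x) == 0) {
    return(list(n = 0L, p50 = NA_real_, p95 = NA_real_))
  }
  q <- stats::quantile(x, probs = c(0.5, 0.95), names = FALSE)
  list(n = length(x), p50 = q[1], p95 = q[2])
}

summary_stats <- latency_summary_sketch(c(100, NA, 200, 300))
```

Note that stats::quantile() interpolates linearly by default, so its p95 can differ slightly from a database percentile function that uses another method.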

…operation-based resolution

Phase 3 of v15 Set B AI Infrastructure:

- Add slot field to all 17 COST_OPERATION_META entries (fast/quality/embedding/NA)
- Add resolve_model_for_operation() for centralized model routing
- Settings UI: 3 dropdowns (quality, fast optional, embedding) replacing 2
- Fast slot falls back to quality model when not configured
- Runtime migration: chat_model → quality_model for existing users
- Migrate all call sites across rag.R, mod_slides.R, mod_query_builder.R, mod_document_notebook.R, mod_search_notebook.R, db.R
- 51 new tests for slot resolution, fallback, migration, COST_OPERATION_META validation
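The slot routing and the fast-to-quality fallback can be sketched as follows. The slot assignments in the lookup table are hypothetical (the page does not show which operation maps to which slot), as are the function name, the model names, and the choice to return NULL for operations with an NA slot.

```r
# Hypothetical operation-to-slot table; NA marks operations that use no model.
OPERATION_SLOTS_SKETCH <- c(
  chat                = "quality",
  query_reformulation = "fast",
  embedding           = "embedding",
  openalex_search     = NA
)

# Resolve the configured model for an operation, falling back from fast to quality.
resolve_model_sketch <- function(operation, models, slots = OPERATION_SLOTS_SKETCH) {
  if (!operation %in% names(slots)) stop("Unknown operation: ", operation)
  slot <- slots[[operation]]
  if (is.na(slot)) return(NULL)
  model <- models[[slot]]
  if (slot == "fast" && (is.null(model) || !nzchar(model))) {
    model <- models[["quality"]]
  }
  model
}

models <- list(quality = "q-model", embedding = "e-model")  # fast slot left unset
fallback_model <- resolve_model_sketch("query_reformulation", models)
```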

…etection, and stale index extension

Phase 4 of v15 Set B AI Infrastructure:

- Migration 013: providers table with OpenRouter seeded as default
- Provider CRUD: save/get/delete with default-provider protection
- Settings UI: Providers section with add/edit/delete/test connection modals
- Embedding dimension detection via known table + provider probe fallback
- Stale index detection extended to check embedding model mismatch
- is_local_provider() helper for zero-cost detection
- get_all_available_models() for multi-provider model aggregation
- 27 new tests for CRUD, upsert, dimension detection, stale index, local provider
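The "known table + provider probe fallback" idea for embedding dimensions could be sketched like this. The table contents, the function name, and the `probe` callback (a stand-in for embedding a short string through the provider and returning the vector) are assumptions; the PR's actual table and probe are not shown on this page.

```r
# Small illustrative lookup of published embedding sizes, keyed by model ID.
KNOWN_EMBEDDING_DIMS_SKETCH <- c(
  "openai/text-embedding-3-small" = 1536L,
  "openai/text-embedding-3-large" = 3072L
)

# Look the model up first; otherwise embed a test string and count the values.
detect_embedding_dim_sketch <- function(model, probe = NULL,
                                        known = KNOWN_EMBEDDING_DIMS_SKETCH) {
  if (model %in% names(known)) return(known[[model]])
  if (is.function(probe)) {
    vec <- tryCatch(probe(model), error = function(e) NULL)
    if (!is.null(vec)) return(length(vec))
  }
  NA_integer_
}

fake_probe <- function(model) numeric(384)  # pretends the provider returned 384 floats
dim_known  <- detect_embedding_dim_sketch("openai/text-embedding-3-small")
dim_probed <- detect_embedding_dim_sketch("local/custom-embedder", probe = fake_probe)
```

Comparing the detected dimension against the one stored with an existing index is what makes a mismatch warning possible when the embedding model changes.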

… matching, and smart defaults

Phase 5 of v15 Set B AI Infrastructure:

- New R/api_artificialanalysis.R: fetch/load/cache AA model data
- Bundled snapshot: 14 models with quality, speed, price data
- Model ID matching: manual mapping (18 entries) + fuzzy normalization
- Model picker enrichment: shows Q:score, tok/s, $/M when AA data available
- Smart defaults: cheapest competent for fast, smartest affordable for quality
- Settings UI: Model Benchmarks section with refresh button and AA API key
- Model info panel enriched with quality, coding, speed, TTFT from AA
- DB caching for refreshed AA data with proper JSON round-trip
- 37 new tests covering data loading, matching, enrichment, smart defaults, caching
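The smart-defaults rule ("cheapest competent for fast, smartest affordable for quality") can be illustrated with a small sketch. The thresholds `min_quality` and `max_price`, the column names, and the benchmark rows are all made up for this example; the PR's actual cutoffs are not stated on this page.

```r
# Hypothetical benchmark rows; IDs, scores, and prices are illustrative only.
models <- data.frame(
  id          = c("model-a", "model-b", "model-c", "model-d"),
  quality     = c(30, 50, 70, 90),
  price_per_m = c(0.1, 0.5, 3, 20),
  stringsAsFactors = FALSE
)

pick_smart_defaults_sketch <- function(models, min_quality = 40, max_price = 5) {
  pick <- function(df, idx_fn, col) {
    if (nrow(df) == 0) return(NA_character_)
    df$id[idx_fn(df[[col]])]
  }
  competent  <- models[models$quality >= min_quality, , drop = FALSE]
  affordable <- models[models$price_per_m <= max_price, , drop = FALSE]
  list(
    fast    = pick(competent, which.min, "price_per_m"),  # cheapest competent
    quality = pick(affordable, which.max, "quality")      # smartest affordable
  )
}

defaults <- pick_smart_defaults_sketch(models)
```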

- README: add AI Infrastructure section (3-slot routing, multi-provider, AA benchmarks, smart defaults), latency tracking, new files in tree
- TODO: mark all v15 issues complete (#157, #144, #142, #48, #8)

Copilot

Pull request overview

This PR completes the v15 AI Infrastructure milestone by introducing a provider abstraction layer (to support OpenRouter + local OpenAI-compatible endpoints), adding split-model routing (fast/quality/embedding), integrating Artificial Analysis benchmark enrichment, and expanding telemetry with latency + OpenAlex usage tracking.

Changes:

Added provider abstraction (provider_chat_completion / provider_get_embeddings), provider CRUD (DB-backed), and model slot routing.
Added latency persistence + analytics (migration + queries + Cost Tracker UI).
Added retrieval quality upgrades (query reformulation parsing, RRF merge, contextual chunk headers) and bundled AA benchmark snapshot + mapping.

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 10 comments.

File	Description
`R/api_provider.R`	Introduces provider abstraction, local/provider helpers, slot resolution, and model aggregation.
`R/api_openalex.R`	Adds OA usage header parsing + centralized request wrapper.
`R/cost_tracking.R`	Adds slots to operation metadata, latency queries, OA usage queries, and cost logging w/ duration.
`R/_ragnar.R`	Switches embedding to provider layer; adds contextual headers + stale-index schema tracking; adds RRF retrieval.
`R/db.R`	Updates hybrid search to use provider embeddings + query reformulation variants + RRF retrieval.
`R/mod_cost_tracker.R`	Adds latency section UI + OpenAlex usage section UI.
`app.R`	Adds OA sidebar badge rendering logic.
`migrations/011_*.sql`, `migrations/012_*.sql`, `migrations/013_*.sql`	Adds OA usage log table, latency column, providers table.
`tests/testthat/*`	Adds unit/integration coverage for RRF merge, query reformulation parsing, OA usage tracking, providers, AA, latency.
`README.md`, `TODO.md`, `docs/plans/`, `docs/brainstorms/`	Updates documentation/planning to reflect v15 milestone completion.

R/cost_tracking.R

+  today <- as.character(Sys.Date())
+  last_fired <- tryCatch(get_db_setting(con, "oa_toast_last_fired_date"), error = function(e) NULL)
+
+  if (!is.null(last_fired) && last_fired == today) return(FALSE)
+
+  TRUE


R/api_provider.R

+#' Build provider config from effective_config
+#'
+#' Extracts the OpenRouter API key from the effective_config list
+#' and returns a provider_config. This bridges the existing settings
+#' system with the provider layer.
+#'
+#' @param config effective_config list (from mod_settings_server)
+#' @return provider_config for OpenRouter
+provider_from_config <- function(config) {
+  api_key <- get_setting(config, "openrouter", "api_key")
+  openrouter_provider(api_key)


R/api_openalex.R

+  body <- tryCatch({
+    perform_oa_request(req, con = NULL, operation = "search")
  }, error = function(e) {
    stop_api_error(e, "OpenAlex")
  })

-  body <- resp_body_json(resp)
  parse_search_response(body)


migrations/011_create_oa_usage_log.sql

+-- Track OpenAlex API usage from response headers
+-- Supports the new freemium API key model (Feb 2026)
+CREATE TABLE IF NOT EXISTS oa_usage_log (
+  id VARCHAR PRIMARY KEY DEFAULT (gen_random_uuid()::VARCHAR),


R/api_provider.R

+provider_list_models <- function(provider) {
+  req <- build_provider_request(provider, "models")
+
+  resp <- tryCatch({
+    req_perform(req)
+  }, error = function(e) {
+    return(data.frame(id = character(), name = character(), stringsAsFactors = FALSE))
+  })


R/rag.R

+    cost <- estimate_cost(model,
+                          result$usage$prompt_tokens %||% 0,
+                          result$usage$completion_tokens %||% 0)
+    log_cost(con, "query_reformulation", model,
+             result$usage$prompt_tokens %||% 0,
+             result$usage$completion_tokens %||% 0,
+             result$usage$total_tokens %||% 0,
+             cost, session_id,
+             duration_ms = result$duration_ms)


R/_ragnar.R


+    # Mark store schema as current
+    if (!is.null(con)) {
+      mark_ragnar_store_current(con, notebook_id)


R/_ragnar.R

+  # Require provider with API key (needed for embed function)
+  if (is.null(provider) || is.null(provider$api_key) || nchar(provider$api_key) == 0) {
+    stop("Provider with API key required to create/open ragnar store for embedding")


R/db.R

    # Attach embed function for query vectorization (ragnar_retrieve needs it)
-    if (!is.null(store) && !is.null(api_key) && nchar(api_key) > 0) {
-      store@embed <- make_embed_function(api_key, embed_model)
+    has_provider <- !is.null(provider) && !is.null(provider$api_key) && nchar(provider$api_key) > 0


R/mod_search_notebook.R

@@ -531,7 +533,7 @@ mod_search_notebook_server <- function(id, con, notebook_id, config, notebook_re
        )
        result
      }, notebook_id = notebook_id, documents = documents, abstracts = abstracts,
-         api_key = api_key, embed_model = embed_model, interrupt_flag = interrupt_flag,
+         provider = provider, embed_model = embed_model, interrupt_flag = interrupt_flag,
         progress_file = progress_file, app_dir = app_dir)


…back

provider_from_config() now resolves the default provider from the DB providers table instead of always returning OpenRouter. This unblocks local models (Ollama, LM Studio) for all LLM operations.

- provider_from_config: add con param, resolve DB default, fall back to OpenRouter
- build_provider_request: handle NA api_key from DuckDB (not just NULL)
- get_ragnar_store: remove hard api_key requirement, allow local providers
- search_chunks_hybrid: simplify has_provider guard to !is.null(provider)
- estimate_cost: pass is_local flag at all 11 call sites in rag.R
- mark_ragnar_store_current: pass embed_model for mismatch detection
- async reindex: plumb db_path through mirai worker for stale-index metadata
- OA usage: use UTC dates for daily usage/toast dedup
- migration 011: fix gen_random_uuid() to DuckDB-native uuid()
- api_key guards: update mod_query_builder, mod_search_notebook to allow local providers
- Add 23 integration tests verified against live LM Studio (Gemma 3 + nomic embeddings)
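The NA-versus-NULL distinction matters because a database column read back into R yields NA for a missing key, and `nchar(NA)` does not behave like `nchar("")`. A minimal sketch of a guard that treats NULL, NA, and the empty string alike follows. The helper names are hypothetical, and omitting the Authorization header for keyless local providers is an assumption about how such a request might be built.

```r
# TRUE only for a single, non-missing, non-empty key.
has_api_key_sketch <- function(api_key) {
  !is.null(api_key) && length(api_key) == 1 && !is.na(api_key) && nzchar(api_key)
}

# Build auth headers only when a usable key exists (local servers often need none).
build_auth_headers_sketch <- function(api_key) {
  if (has_api_key_sketch(api_key)) {
    list(Authorization = paste("Bearer", api_key))
  } else {
    list()
  }
}

headers_local <- build_auth_headers_sketch(NA_character_)
headers_cloud <- build_auth_headers_sketch("sk-example")
```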

- cost_tracking.R: merge HEAD's slot field with integration's refiner_eval entry
- api_openalex.R: keep perform_oa_request() (v15 usage tracking), wire through perform_openalex() for verbose logging
- TODO.md: combine completed items from both branches

- Add bounds checks on choices/data arrays in provider API responses
- Add tryCatch in make_embed_function for provider error context
- Add removeModal() before showModal() in slide generation to prevent modal stacking
- Wrap resp_body_json in tryCatch in fetch_aa_models for non-JSON fallback

seanthimons added 14 commits March 17, 2026 13:55

docs: mark all v15 Set A plan tasks as complete

9542b18

docs: mark all Phase 1 provider abstraction tasks as complete

c462c3b

docs: update README and TODO for v15 AI Infrastructure completion

a98aa63

Copilot AI review requested due to automatic review settings March 18, 2026 19:44

Copilot AI reviewed Mar 18, 2026

seanthimons added 2 commits March 19, 2026 14:38

docs: add local LLM setup instructions for LM Studio and Ollama

d680449

seanthimons changed the base branch from main to integration March 20, 2026 17:53

seanthimons added 2 commits March 20, 2026 13:56

seanthimons merged commit 4e299ca into integration Mar 20, 2026

seanthimons deleted the v15-ai-infrastructure branch March 22, 2026 16:42

seanthimons merged 18 commits into integration from v15-ai-infrastructure
