66 changes: 49 additions & 17 deletions apps/web/src/app/api/extract-events/route.ts
@@ -1,5 +1,5 @@
import OpenAI from 'openai';

import { Type } from '@google/genai';
import { NextResponse } from 'next/server';
import { getGeminiClient, hasGeminiKey } from '@/lib/gemini-client';

@@ -49,6 +49,43 @@ const extractionSchema = {
additionalProperties: false,
};

// Gemini responseSchema using @google/genai Type system
const geminiResponseSchema = {
type: Type.OBJECT,
properties: {
events: {
type: Type.ARRAY,
items: {
type: Type.OBJECT,
properties: {
type: { type: Type.STRING, enum: ['action', 'topic', 'insight', 'tool', 'resource'] },
title: { type: Type.STRING },
description: { type: Type.STRING },
timestamp: { type: Type.STRING, nullable: true },
priority: { type: Type.STRING, enum: ['high', 'medium', 'low'] },
},
required: ['type', 'title', 'description', 'priority'],
Copilot AI (Feb 28, 2026):

The Gemini schema allows timestamp to be omitted (required excludes it), but the OpenAI path always includes timestamp (nullable) and the prompt examples expect it. This leads to inconsistent response shapes depending on provider/model. Consider either adding timestamp to the Gemini schema required list (keeping it nullable) or updating the prompt/consumers to treat it as truly optional everywhere.

Suggested change:
- required: ['type', 'title', 'description', 'priority'],
+ required: ['type', 'title', 'description', 'timestamp', 'priority'],
},
},
actions: {
type: Type.ARRAY,
items: {
type: Type.OBJECT,
properties: {
title: { type: Type.STRING },
description: { type: Type.STRING },
category: { type: Type.STRING, enum: ['setup', 'build', 'deploy', 'learn', 'research', 'configure'] },
estimatedMinutes: { type: Type.NUMBER, nullable: true },
},
required: ['title', 'description', 'category'],
},
},
summary: { type: Type.STRING },
topics: { type: Type.ARRAY, items: { type: Type.STRING } },
},
required: ['events', 'actions', 'summary', 'topics'],
};
Contributor comment on lines +53 to +87 (severity: medium):

For improved type safety and consistency with gemini-video-analyzer.ts, consider adding `as const` to all required arrays within this schema definition. This makes the types stricter and prevents accidental modification.

For example:

// ...
        required: ['type', 'title', 'description', 'priority'] as const,
// ...
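A minimal standalone illustration of the reviewer's `as const` suggestion (sketch only, not code from this PR):

```typescript
// With `as const`, the required list is inferred as a readonly tuple instead
// of string[], so mutation becomes a compile-time error.
const requiredLoose = ['type', 'title'];           // inferred as string[]
const requiredStrict = ['type', 'title'] as const; // readonly ["type", "title"]

requiredLoose.push('extra'); // compiles: string[] is mutable
// requiredStrict.push('extra'); // type error: no `push` on a readonly tuple
```

The same annotation also keeps enum-like literal types narrow, which is why the reviewer points to gemini-video-analyzer.ts as the precedent.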

const SYSTEM_PROMPT = `You are an expert content analyst. Extract structured data from video transcripts.
Be specific and practical — no vague or generic items.
For events: classify type (action/topic/insight/tool/resource) and priority (high/medium/low).
@@ -91,16 +128,17 @@ async function extractWithOpenAI(trimmed: string, videoTitle?: string, videoUrl?
async function extractWithGemini(trimmed: string, videoTitle?: string, videoUrl?: string) {
const ai = getGeminiClient();
const response = await ai.models.generateContent({
- model: 'gemini-2.0-flash',
+ model: 'gemini-3-pro-preview',
contents: `${SYSTEM_PROMPT}\n\n${buildUserPrompt(trimmed, videoTitle, videoUrl)}`,
config: {
temperature: 0.3,
responseMimeType: 'application/json',
responseSchema: geminiResponseSchema,
tools: [{ googleSearch: {} }],
Copilot AI comment on lines +131 to 137 (Feb 28, 2026):

extractWithGemini() is called when you already have transcript text, but it still enables googleSearch and uses the more expensive gemini-3-pro-preview to support schema+tool together. If search grounding isn't needed for transcript-only extraction (the prompt doesn't instruct it), consider removing the googleSearch tool in this path so you can use a cheaper model (or keep Pro only for the direct videoUrl search path).

Suggested change:
- model: 'gemini-3-pro-preview',
- contents: `${SYSTEM_PROMPT}\n\n${buildUserPrompt(trimmed, videoTitle, videoUrl)}`,
- config: {
-   temperature: 0.3,
-   responseMimeType: 'application/json',
-   responseSchema: geminiResponseSchema,
-   tools: [{ googleSearch: {} }],
+ model: 'gemini-1.5-flash-latest',
+ contents: `${SYSTEM_PROMPT}\n\n${buildUserPrompt(trimmed, videoTitle, videoUrl)}`,
+ config: {
+   temperature: 0.3,
+   responseMimeType: 'application/json',
+   responseSchema: geminiResponseSchema,
},
});
- const text = (response.text ?? '').trim();
- const cleaned = text.replace(/^```(?:json)?\s*\n?/i, '').replace(/\n?```\s*$/i, '');
- return JSON.parse(cleaned);
+ const text = response.text ?? '';
+ return JSON.parse(text);
Contributor comment on lines +140 to +141 (severity: high):

JSON.parse('') will throw an error if the API returns an empty string. To make this more robust, it's safer to parse a fallback empty object, a pattern already used in gemini-video-analyzer.ts.

Suggested change:
- const text = response.text ?? '';
- return JSON.parse(text);
+ const text = response.text || '{}';
+ return JSON.parse(text);
}
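An alternative to the silent `'{}'` fallback, sketched here as a hypothetical helper (the name `parseModelJson` and its error messages are not from this codebase), is to fail loudly on empty or non-JSON output instead of letting `JSON.parse('')` throw a bare SyntaxError:

```typescript
// Hypothetical helper (not part of this PR): defensively parse model output.
// Empty or non-JSON responses produce a descriptive error rather than a
// cryptic SyntaxError from JSON.parse('').
function parseModelJson<T>(text: string | undefined, context: string): T {
  const trimmed = (text ?? '').trim();
  if (!trimmed) {
    throw new Error(`${context}: model returned an empty response`);
  }
  try {
    return JSON.parse(trimmed) as T;
  } catch {
    throw new Error(`${context}: model response was not valid JSON`);
  }
}
```

A call site could then read `return parseModelJson(response.text, 'extract-events');`, assuming callers are prepared to catch the thrown error.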

export async function POST(request: Request) {
@@ -146,29 +184,23 @@ export async function POST(request: Request) {
try {
const ai = getGeminiClient();
const response = await ai.models.generateContent({
- model: 'gemini-2.5-flash',
+ model: 'gemini-3-pro-preview',
contents: `${SYSTEM_PROMPT}\n\nAnalyze this YouTube video and extract structured data.
Use your Google Search tool to find the video's transcript, description, and chapter content.

Video URL: ${videoUrl}
${videoTitle ? `Video Title: ${videoTitle}` : ''}
Contributor comment on lines 191 to 192 (severity: security-medium):

The videoUrl and videoTitle parameters from the request body are directly embedded into the LLM prompt without sanitization. An attacker can provide a malicious URL or title containing instructions to manipulate the LLM's behavior. Since the LLM has access to the googleSearch tool, this could be used to perform arbitrary searches or exfiltrate information through search queries.
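One mitigation, sketched below as a hypothetical helper (`extractYouTubeVideoId` is not a function in this repo), is to reduce the attacker-controlled URL to a validated 11-character video ID before any prompt construction, rejecting everything else:

```typescript
// Hypothetical hardening sketch, not code from this PR: only a canonical
// 11-character YouTube video ID survives validation, so free-form URL text
// (and any injected instructions it carries) never reaches the LLM prompt.
function extractYouTubeVideoId(videoUrl: string): string | null {
  const match = videoUrl.match(
    /(?:[?&]v=|youtu\.be\/)([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/
  );
  return match ? match[1] : null;
}
```

The route could then return 400 on a null result and build the prompt from `https://www.youtube.com/watch?v=${id}` only; the free-text videoTitle would still need separate escaping or omission.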
- Extract events, actions, summary, and topics from the actual video content found via search.
- Respond with ONLY valid JSON matching this structure:
- {
- "events": [{"type": "action|topic|insight|tool|resource", "title": "...", "description": "...", "timestamp": "02:15" or null, "priority": "high|medium|low"}],
- "actions": [{"title": "...", "description": "...", "category": "setup|build|deploy|learn|research|configure", "estimatedMinutes": number or null}],
- "summary": "2-3 sentence summary",
- "topics": ["topic1", "topic2"]
- }`,
+ Extract events, actions, summary, and topics from the actual video content found via search.`,
config: {
temperature: 0.3,
responseMimeType: 'application/json',
responseSchema: geminiResponseSchema,
tools: [{ googleSearch: {} }],
},
});
- const text = (response.text ?? '').trim();
- const cleaned = text.replace(/^```(?:json)?\s*\n?/i, '').replace(/\n?```\s*$/i, '');
- parsed = JSON.parse(cleaned);
+ const text = response.text ?? '';
+ parsed = JSON.parse(text);
Contributor comment on lines +202 to +203 (severity: high):

JSON.parse('') will throw an error if the API returns an empty string. To make this more robust, it's safer to parse a fallback empty object, a pattern already used in gemini-video-analyzer.ts.

Suggested change:
- const text = response.text ?? '';
- parsed = JSON.parse(text);
+ const text = response.text || '{}';
+ parsed = JSON.parse(text);
provider = 'gemini-search';
} catch (e) {
console.warn('Gemini direct video extraction failed:', e);
2 changes: 1 addition & 1 deletion apps/web/src/app/api/transcribe/route.ts
@@ -106,7 +106,7 @@ export async function POST(request: Request) {
const metadataContext = metadata ? formatMetadataAsContext(metadata) : '';

const result = await ai.models.generateContent({
- model: 'gemini-2.5-flash',
+ model: 'gemini-3-pro-preview',
contents: `You are a video transcription assistant with access to Google Search.
Copilot AI comment on lines 108 to 110 (Feb 28, 2026):

Switching transcription/search grounding from gemini-2.5-flash to gemini-3-pro-preview may significantly increase latency and cost for this endpoint, even though it doesn't use responseSchema controlled generation. If the model change is only needed for the responseSchema+googleSearch combination, consider keeping Flash here (or making the model configurable via an env var) to avoid an avoidable operational regression.
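The env-var option raised in the comment above could look like the following sketch (`GEMINI_TRANSCRIBE_MODEL` and the Flash fallback are illustrative assumptions, not names from this repo):

```typescript
// Sketch of an env-configurable model choice; the variable name and the
// fallback value are assumptions for illustration only.
function resolveTranscribeModel(env: Record<string, string | undefined>): string {
  const configured = env.GEMINI_TRANSCRIBE_MODEL?.trim();
  return configured ? configured : 'gemini-2.5-flash';
}
```

The call site would pass `process.env` and use the result as the `model` argument, keeping the Pro model opt-in via deployment config rather than hard-coded.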
For the following YouTube video, use your googleSearch tool to find the ACTUAL transcript,
160 changes: 122 additions & 38 deletions apps/web/src/lib/gemini-video-analyzer.ts
@@ -5,10 +5,11 @@
* transcripts, descriptions, chapters, and metadata from YouTube videos.
* Based on the UVAI PK=998 implementation pattern.
*
- * NOTE: Vertex AI does NOT support responseSchema (controlled generation)
- * combined with googleSearch tool. JSON structure is enforced via prompt.
+ * Uses gemini-3-pro-preview which supports responseSchema + googleSearch
+ * together (older models like gemini-2.5-flash do not).
*/

import { Type } from '@google/genai';
import { getGeminiClient } from './gemini-client';

export interface VideoAnalysisResult {
@@ -31,56 +32,140 @@
topics: string[];
architectureCode: string;
ingestScript: string;
e22Snippets: {
title: string;
description: string;
code: string;
language: string;
}[];
}

/**
* Gemini response schema using the @google/genai Type system.
* Matches the UVAI PK=998 structured output requirements exactly.
*/
const responseSchema = {
type: Type.OBJECT,
properties: {
title: { type: Type.STRING },
summary: { type: Type.STRING },
transcript: {
type: Type.ARRAY,
items: {
type: Type.OBJECT,
properties: {
start: { type: Type.NUMBER, description: 'Seconds from video start' },
duration: { type: Type.NUMBER },
text: { type: Type.STRING },
},
required: ['start', 'duration', 'text'] as const,
},
},
events: {
type: Type.ARRAY,
items: {
type: Type.OBJECT,
properties: {
timestamp: { type: Type.NUMBER },
label: { type: Type.STRING },
description: { type: Type.STRING },
codeMapping: {
type: Type.STRING,
description: 'One-line code implementation of the action',
},
cloudService: { type: Type.STRING },
},
required: ['timestamp', 'label', 'description', 'codeMapping', 'cloudService'] as const,
},
},
actions: {
type: Type.ARRAY,
items: {
type: Type.OBJECT,
properties: {
title: { type: Type.STRING },
description: { type: Type.STRING },
category: {
type: Type.STRING,
enum: ['setup', 'build', 'deploy', 'learn', 'research', 'configure'],
},
estimatedMinutes: { type: Type.NUMBER, nullable: true },
},
required: ['title', 'description', 'category'] as const,
},
},
topics: { type: Type.ARRAY, items: { type: Type.STRING } },
architectureCode: { type: Type.STRING },
ingestScript: { type: Type.STRING },
e22Snippets: {
type: Type.ARRAY,
items: {
type: Type.OBJECT,
properties: {
title: { type: Type.STRING },
description: { type: Type.STRING },
code: { type: Type.STRING },
language: { type: Type.STRING },
},
required: ['title', 'description', 'code', 'language'] as const,
},
},
},
required: [
'title',
'summary',
'transcript',
'events',
'actions',
'topics',
'architectureCode',
'ingestScript',
'e22Snippets',
] as const,
};

/**
* Build the agentic system instruction for the Gemini model.
* Implements the Think → Act → Observe → Map loop from PK=998.
*/
function buildSystemInstruction(videoUrl: string): string {
const videoId = videoUrl.match(/[?&]v=([^&]+)/)?.[1] || videoUrl;
return `You are the Agentic Video Intelligence Engine.

MISSION:
1. WATCH the video at ${videoUrl} by searching for its transcript, technical documentation,
channel description, and chapter markers using your googleSearch tool.
2. THINK: Analyze the sequence of technical events described in the transcript and description.
Pay special attention to chapter markers — they indicate the video creator's own breakdown
of the content structure.
3. ACT: Reconstruct the timeline and generate actionable tasks that mirror the video content.
1. WATCH the video (Video ID: ${videoId}) by searching for its transcript, technical documentation,
Contributor comment on lines +132 to +136 (severity: security-medium):

The videoUrl parameter is used to construct the systemInstruction for the Gemini model. Specifically, the videoId (which can be the full videoUrl if the regex doesn't match) is embedded in the system instruction. This allows for prompt injection, where an attacker can provide a malicious URL to manipulate the LLM's behavior.
and chapter markers using your googleSearch tool.
2. THINK: Analyze the sequence of technical events described in the transcript.
3. ACT: Reconstruct the timeline and generate Python 'ingest.py' logic that mimics
the data patterns discussed in the video.
4. OBSERVE & MAP: Extract specific "Action Events" from the video and provide a direct
code mapping for each.
"E22 Mapping" (code logic) for each.

DATA STRUCTURE REQUIREMENTS:
- title: Accurate video title from search results.
- summary: A high-level technical executive summary.
- transcript: An array of {start, duration, text} reconstructed from grounding.
Use chapter timestamps and description content if a full transcript is unavailable.
Each entry should cover a meaningful segment (30-120 seconds).
- events: 3-5 key technical milestones with timestamp, label, description, and codeMapping.
- actions: 3-8 concrete tasks a developer/learner should DO after watching.
- topics: Key topics and technologies covered.
- architectureCode: A Markdown-formatted cloud architecture blueprint.
- ingestScript: A robust, modular Python script using Playwright for high-density ingestion.
- e22Snippets: 3-5 production-ready code snippets for E22 cloud solutions.

IMPORTANT RULES:
- Use your googleSearch tool to find the ACTUAL content. Search for the video URL,
the video title, and related terms.
STRICT RULE: NO MOCK DATA. Only use what is found via search grounding.
- Use your googleSearch tool to find the ACTUAL content.
- The video creator often provides detailed descriptions with chapter breakdowns.
USE that metadata — it is high-quality structured content.
- If a spoken transcript is not available, reconstruct content from the description,
chapters, comments, and related articles found via search.
- NO MOCK DATA. Only use what is found via search grounding.
- Be thorough — capture every key point, technical detail, and actionable insight.

You MUST respond with ONLY valid JSON (no markdown fences, no extra text) matching this exact structure:
{
"title": "Accurate video title",
"summary": "2-3 sentence technical executive summary",
"transcript": [
{"start": 0, "duration": 60, "text": "segment text covering 30-120 seconds each"}
],
"events": [
{"timestamp": 0, "label": "Event Name", "description": "What happened", "codeMapping": "one-line code", "cloudService": "relevant service"}
],
"actions": [
{"title": "Task title", "description": "What to do", "category": "setup|build|deploy|learn|research|configure", "estimatedMinutes": 15}
],
"topics": ["topic1", "topic2"],
"architectureCode": "markdown architecture overview or empty string",
"ingestScript": "Python script or empty string"
}`;
- Be thorough — capture every key point, technical detail, and actionable insight.`;
}

/**
* Executes a deep agentic analysis of a YouTube video using Gemini + Google Search.
* Uses gemini-3-pro-preview with responseSchema + googleSearch (PK=998 pattern).
* This is a single API call that handles both transcription AND extraction.
*/
export async function analyzeVideoWithGemini(
@@ -91,17 +176,16 @@
const systemInstruction = buildSystemInstruction(videoUrl);

const response = await ai.models.generateContent({
- model: 'gemini-2.5-flash',
+ model: 'gemini-3-pro-preview',
contents: `Perform Agentic Grounding for Video: ${videoUrl}`,
config: {
systemInstruction,
responseMimeType: 'application/json',
responseSchema,
tools: [{ googleSearch: {} }],
Copilot AI (Feb 28, 2026):

generateContent no longer sets a temperature here (it was previously set elsewhere in the codebase for similar calls). With tool use + long structured outputs, the default temperature can increase variance and failure rate. Consider explicitly setting temperature (and any other stability-related config like topP) to keep outputs deterministic and reduce retries/cost.

Suggested change:
- tools: [{ googleSearch: {} }],
+ tools: [{ googleSearch: {} }],
+ temperature: 0.2,
+ topP: 0.8,
temperature: 0.3,
},
Contributor comment on lines 181 to 186 (severity: medium):

The temperature setting is omitted here, while other Gemini API calls in the codebase specify it (e.g., 0.3 in extract-events/route.ts, 0.2 in transcribe/route.ts). To ensure consistent and predictable model behavior for structured data extraction, it's recommended to explicitly set a temperature here as well. A low value like 0.3 would be consistent with other similar calls.

    config: {
      systemInstruction,
      responseMimeType: 'application/json',
      responseSchema,
      tools: [{ googleSearch: {} }],
      temperature: 0.3,
    },
});

- const resultText = (response.text || '').trim();
- // Strip markdown code fences if present
- const cleaned = resultText.replace(/^```(?:json)?\s*\n?/i, '').replace(/\n?```\s*$/i, '');
- return JSON.parse(cleaned) as VideoAnalysisResult;
+ const resultText = response.text || '{}';
+ return JSON.parse(resultText) as VideoAnalysisResult;
Copilot AI comment on lines +189 to +190 (Feb 28, 2026):

response.text || '{}' silently turns an empty/blocked model response into an object missing required fields, but the function still casts it to VideoAnalysisResult. This can propagate undefined values to callers and make failures hard to diagnose. Prefer throwing a descriptive error when response.text is empty/non-JSON (or validate required keys before returning).

Suggested change:
- const resultText = response.text || '{}';
- return JSON.parse(resultText) as VideoAnalysisResult;
+ const rawText = response.text;
+ if (!rawText || !rawText.trim()) {
+   throw new Error('Gemini video analysis returned empty response text');
+ }
+ let parsed: unknown;
+ try {
+   parsed = JSON.parse(rawText);
+ } catch {
+   throw new Error('Gemini video analysis returned non-JSON response text');
+ }
+ const result = parsed as Partial<VideoAnalysisResult> | null;
+ if (
+   !result ||
+   typeof result.title !== 'string' ||
+   typeof result.summary !== 'string' ||
+   !Array.isArray(result.transcript) ||
+   !Array.isArray(result.events) ||
+   !Array.isArray(result.actions) ||
+   !Array.isArray(result.topics) ||
+   typeof result.architectureCode !== 'string' ||
+   typeof result.ingestScript !== 'string' ||
+   !Array.isArray(result.e22Snippets)
+ ) {
+   throw new Error('Gemini video analysis response is missing required fields');
+ }
+ return result as VideoAnalysisResult;
}