Merged
81 changes: 61 additions & 20 deletions apps/web/src/app/api/extract-events/route.ts
@@ -150,38 +150,79 @@ export async function POST(request: Request) {
try {
const { transcript, videoTitle, videoUrl } = await request.json();

if (!transcript || typeof transcript !== 'string') {
// Accept either transcript text OR videoUrl for direct Gemini analysis
if ((!transcript || typeof transcript !== 'string') && !videoUrl) {
return NextResponse.json(
{ error: 'transcript (string) is required' },
{ error: 'transcript (string) or videoUrl is required' },
{ status: 400 }
);
}

const trimmed = transcript.slice(0, 8000);
let parsed;
let provider = 'openai';

// Try OpenAI first, fall back to Gemini on quota/auth errors
if (process.env.OPENAI_API_KEY) {
try {
parsed = await extractWithOpenAI(trimmed, videoTitle, videoUrl);
} catch (err) {
const msg = err instanceof Error ? err.message : '';
if ((msg.includes('429') || msg.includes('quota') || msg.includes('rate')) && process.env.GEMINI_API_KEY) {
console.warn('OpenAI quota hit, falling back to Gemini');
parsed = await extractWithGemini(trimmed, videoTitle, videoUrl);
provider = 'gemini';
} else {
throw err;
// If we have transcript text, use the existing extraction logic
if (transcript && typeof transcript === 'string' && transcript.length > 50) {
const trimmed = transcript.slice(0, 8000);

if (process.env.OPENAI_API_KEY) {
try {
parsed = await extractWithOpenAI(trimmed, videoTitle, videoUrl);
} catch (err) {
const msg = err instanceof Error ? err.message : '';
if ((msg.includes('429') || msg.includes('quota') || msg.includes('rate')) && process.env.GEMINI_API_KEY) {
console.warn('OpenAI quota hit, falling back to Gemini');
parsed = await extractWithGemini(trimmed, videoTitle, videoUrl);
provider = 'gemini';
} else {
throw err;
}
}
} else if (process.env.GEMINI_API_KEY) {
parsed = await extractWithGemini(trimmed, videoTitle, videoUrl);
provider = 'gemini';
}
} else if (process.env.GEMINI_API_KEY) {
parsed = await extractWithGemini(trimmed, videoTitle, videoUrl);
provider = 'gemini';
} else {
}

// If no transcript but have videoUrl + Gemini, do direct video analysis via Google Search
if (!parsed && videoUrl && process.env.GEMINI_API_KEY) {
try {
const ai = getGemini();
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
Contributor comment (high):

The model name gemini-2.5-flash appears to be incorrect and will likely cause the API call to fail. The current flash model is named gemini-1.5-flash-latest. It's advisable to use a constant for model names to ensure consistency and avoid such errors, as this typo is present in multiple files.

Suggested change:
- model: 'gemini-2.5-flash',
+ model: 'gemini-1.5-flash-latest',
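As the comment notes, a shared constants module would prevent this class of typo. A minimal sketch (the module path and exported names are assumptions, not existing project code):

```typescript
// Hypothetical shared module, e.g. apps/web/src/lib/models.ts, so a model
// rename touches one file instead of every route that calls the APIs.
export const GEMINI_FLASH_MODEL = 'gemini-1.5-flash-latest';
export const OPENAI_MINI_MODEL = 'gpt-4o-mini';
```

Routes would then import these constants instead of repeating string literals.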

contents: `${SYSTEM_PROMPT}\n\nAnalyze this YouTube video and extract structured data.
Use your Google Search tool to find the video's transcript, description, and chapter content.

Video URL: ${videoUrl}
${videoTitle ? `Video Title: ${videoTitle}` : ''}

Extract events, actions, summary, and topics from the actual video content found via search.
Respond with ONLY valid JSON matching this structure:
{
"events": [{"type": "action|topic|insight|tool|resource", "title": "...", "description": "...", "timestamp": "02:15" or null, "priority": "high|medium|low"}],
"actions": [{"title": "...", "description": "...", "category": "setup|build|deploy|learn|research|configure", "estimatedMinutes": number or null}],
"summary": "2-3 sentence summary",
"topics": ["topic1", "topic2"]
}`,
Comment on lines +193 to +206
Contributor comment (security-medium):

Untrusted user input (videoUrl and videoTitle) is directly embedded into the prompt for direct video analysis. This allows for prompt injection attacks that could manipulate the LLM's behavior or its use of the googleSearch tool.
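One possible mitigation (a sketch, not part of this PR): normalize the user-supplied URL to a canonical YouTube watch link before interpolating it into any prompt, so arbitrary attacker text cannot ride along. The accepted hostnames and the 11-character video-ID rule are assumptions about the app's needs:

```typescript
// Reduce a raw user-supplied string to a canonical watch URL, or reject it.
// Only the video ID survives, so nothing else can reach the prompt.
function normalizeYouTubeUrl(raw: string): string | null {
  let u: URL;
  try { u = new URL(raw); } catch { return null; }
  const host = u.hostname.replace(/^www\./, '');
  let id: string | null = null;
  if (host === 'youtube.com' || host === 'm.youtube.com') id = u.searchParams.get('v');
  else if (host === 'youtu.be') id = u.pathname.slice(1);
  if (!id || !/^[A-Za-z0-9_-]{11}$/.test(id)) return null;
  return `https://www.youtube.com/watch?v=${id}`;
}
```

The same idea applies to videoTitle: prefer re-fetching the title from a trusted metadata source over trusting the client-supplied value.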

config: {
temperature: 0.3,
responseMimeType: 'application/json',
responseSchema: geminiResponseSchema,
Comment on lines +209 to +210
Copilot AI comment (Feb 28, 2026):
The same incompatibility exists here: responseMimeType: 'application/json', responseSchema: geminiResponseSchema, and tools: [{ googleSearch: {} }] are all set in the same call. Gemini does not allow combining structured JSON output (responseSchema) with grounding tools (googleSearch) in the same request — this will cause a runtime API error. Either remove responseSchema/responseMimeType and parse the free-text response, or remove googleSearch and supply the transcript text directly.

Suggested change:
- responseMimeType: 'application/json',
- responseSchema: geminiResponseSchema,

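If the schema is dropped in favor of free-text output as the comment suggests, the JSON can still be recovered defensively. A sketch (the fence-stripping heuristic is an assumption, not existing project code):

```typescript
// Extract a JSON object from a model's free-text reply: strip an optional
// Markdown code fence, then parse the outermost {...} span.
function parseJsonLoose(text: string): unknown {
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = (fenced ? fenced[1] : text).trim();
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) throw new Error('no JSON object found in response');
  return JSON.parse(candidate.slice(start, end + 1));
}
```

This keeps the googleSearch grounding while tolerating replies that wrap the JSON in prose or fences.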
tools: [{ googleSearch: {} }],
},
});
const text = response.text ?? '';
parsed = JSON.parse(text);
provider = 'gemini-search';
} catch (e) {
console.warn('Gemini direct video extraction failed:', e);
}
}

if (!parsed) {
return NextResponse.json({
success: false,
error: 'No AI API key configured. Set OPENAI_API_KEY or GEMINI_API_KEY.',
error: 'No AI API key configured or all extraction attempts failed. Set GEMINI_API_KEY.',
Copilot AI comment (Feb 28, 2026):
The error message says "Set GEMINI_API_KEY", but /api/extract-events still supports OPENAI_API_KEY as a working provider for the transcript-based path. A user who only has OPENAI_API_KEY configured and provides a videoUrl without a transcript will see this misleading error. The message should mention both API keys.

Suggested change:
- error: 'No AI API key configured or all extraction attempts failed. Set GEMINI_API_KEY.',
+ error: 'No AI API key configured or all extraction attempts failed. Set GEMINI_API_KEY and/or OPENAI_API_KEY.',
data: { events: [], actions: [], summary: '', topics: [] },
});
}
146 changes: 73 additions & 73 deletions apps/web/src/app/api/transcribe/route.ts
@@ -1,6 +1,7 @@
import OpenAI from 'openai';
import { GoogleGenAI } from '@google/genai';
import { NextResponse } from 'next/server';
import { fetchYouTubeMetadata, formatMetadataAsContext } from '@/lib/youtube-metadata';
Copilot AI comment (Feb 28, 2026):
The JSDoc comment for POST /api/transcribe still lists the strategy order as it existed before this PR: "2. OpenAI Responses API with web_search" and "3. Gemini fallback". After this change, the order is reversed — Gemini is now strategy 2 (primary) and OpenAI is strategy 3 (fallback). The comment is now incorrect and will mislead future developers.


let _openai: OpenAI | null = null;
function getOpenAI() {
@@ -93,59 +94,42 @@ export async function POST(request: Request) {
}
}

// Strategy 2: OpenAI Responses API with web_search
if (url && !audioUrl && process.env.OPENAI_API_KEY) {
// Fetch YouTube metadata (description, chapters, title) — used by strategies below
let metadata: Awaited<ReturnType<typeof fetchYouTubeMetadata>> = null;
if (url) {
try {
const response = await getOpenAI().responses.create({
model: 'gpt-4o-mini',
instructions: `You are a video content transcription assistant.
Given a YouTube URL, use web search to find the video's transcript or detailed content.
Return the full transcript text if available, or a detailed content summary.
Be thorough — capture all key points, quotes, and technical details.`,
tools: [{ type: 'web_search' as const }],
input: `Find and return the full transcript or detailed content of this video: ${url}`,
});

const text = response.output_text || '';

if (text.length > 100) {
return NextResponse.json({
success: true,
transcript: text,
source: 'openai-web-search',
wordCount: text.split(/\s+/).length,
});
}
} catch (e) {
console.warn('OpenAI web_search transcript failed:', e);
metadata = await fetchYouTubeMetadata(url);
} catch {
console.log('YouTube metadata fetch failed, continuing without');
}
}

// Strategy 3: Gemini with direct YouTube URL processing + Google Search grounding
// Strategy 2: Gemini with Google Search grounding (PRIMARY for YouTube)
// Uses Google Search to find actual transcript content, descriptions, and chapters
if (url && !audioUrl && process.env.GEMINI_API_KEY) {
try {
const ai = getGemini();
const metadataContext = metadata ? formatMetadataAsContext(metadata) : '';

const result = await ai.models.generateContent({
model: 'gemini-2.0-flash',
contents: [
{
role: 'user',
parts: [
{
fileData: {
mimeType: 'video/*',
fileUri: url,
},
},
{
text: 'Provide a complete, detailed transcript of this video. ' +
'Include all spoken content verbatim. ' +
'Include timestamps where possible in [MM:SS] format. ' +
'Be thorough and comprehensive — capture every key point, quote, and technical detail.',
},
],
},
],
model: 'gemini-2.5-flash',
Contributor comment (high):

The model name gemini-2.5-flash is not a valid model identifier and will cause this API call to fail. Please update it to a correct model name, such as gemini-1.5-flash-latest, to ensure the primary transcription strategy functions correctly.

Suggested change:
- model: 'gemini-2.5-flash',
+ model: 'gemini-1.5-flash-latest',

contents: `You are a video transcription assistant with access to Google Search.

For the following YouTube video, use your googleSearch tool to find the ACTUAL transcript,
description, and chapter content. The video creator often provides detailed descriptions
with chapter breakdowns — USE that metadata as high-quality structured content.

${metadataContext ? `KNOWN VIDEO METADATA:\n${metadataContext}\n` : ''}
Video URL: ${url}

INSTRUCTIONS:
1. Search for the video's transcript using Google Search.
2. If a spoken transcript is available, return it verbatim.
3. If not, reconstruct detailed content from the description, chapters, comments,
and related articles found via search.
4. Be thorough — capture ALL key points, technical details, quotes, and actionable insights.
5. Include timestamps in [MM:SS] format where possible.
6. Do NOT return generic advice like "click Show Transcript" — return actual content.`,
Comment on lines +116 to +132
Contributor comment (security-medium):

The url parameter and metadata fetched from YouTube (which can be attacker-controlled) are directly concatenated into LLM prompts. This poses a risk of prompt injection, allowing an attacker to manipulate the transcription process or the LLM's behavior.

config: {
temperature: 0.2,
tools: [{ googleSearch: {} }],
@@ -157,40 +141,56 @@ Be thorough — capture all key points, quotes, and technical details.`,
return NextResponse.json({
success: true,
transcript: text,
source: 'gemini-video',
source: 'gemini-search',
wordCount: text.split(/\s+/).length,
metadata: metadata ? {
title: metadata.title,
channel: metadata.channel,
chapters: metadata.chapters,
} : undefined,
});
}
} catch (e) {
console.warn('Gemini video URL processing failed, trying text fallback:', e);

// Fallback: text-based Gemini with Google Search grounding
try {
const ai = getGemini();
const result = await ai.models.generateContent({
model: 'gemini-2.0-flash',
contents: `You are a video content transcription assistant. ` +
`For the following YouTube video URL, provide a detailed transcript or content summary. ` +
`Include all key points, technical details, quotes, and actionable insights. ` +
`Be thorough and comprehensive.\n\nVideo URL: ${url}`,
config: {
temperature: 0.2,
tools: [{ googleSearch: {} }],
},
});
const text = result.text ?? '';
console.warn('Gemini Google Search transcript failed:', e);
}
}

if (text.length > 100) {
return NextResponse.json({
success: true,
transcript: text,
source: 'gemini',
wordCount: text.split(/\s+/).length,
});
}
} catch (e2) {
console.warn('Gemini text fallback also failed:', e2);
// Strategy 3: OpenAI Responses API with web_search (fallback)
if (url && !audioUrl && process.env.OPENAI_API_KEY) {
try {
const metadataContext = metadata ? formatMetadataAsContext(metadata) : '';

const response = await getOpenAI().responses.create({
model: 'gpt-4o-mini',
instructions: `You are a video content transcription assistant.
Given a YouTube URL, use web search to find the video's ACTUAL transcript or detailed content.
Return the full transcript text if available. If not, provide a comprehensive content summary
based on the video's description, chapters, and any available reviews or summaries.
Do NOT return instructions on how to find a transcript — return the actual content.
Be thorough — capture all key points, quotes, technical details, and chapter breakdowns.`,
tools: [{ type: 'web_search' as const }],
input: `Find and return the full transcript or detailed content of this video: ${url}
${metadataContext ? `\nKNOWN METADATA:\n${metadataContext}` : ''}`,
Comment on lines +172 to +173
Contributor comment (security-medium):

Untrusted user input (url) and metadata are concatenated into the OpenAI prompt, leading to a potential prompt injection vulnerability.

});

const text = response.output_text || '';

// Reject results that are just instructions rather than actual content
const isGarbage = text.toLowerCase().includes('click show transcript') ||
text.toLowerCase().includes('click on the three dots') ||
text.toLowerCase().includes('steps to find') ||
(text.length < 300 && text.includes('transcript'));
Comment on lines +179 to +182
Contributor comment (medium):

The logic to detect 'garbage' responses is a great addition for robustness. However, the current implementation is a bit difficult to read and maintain as a single long boolean expression. Refactoring this to use an array of substrings would make it cleaner and easier to update in the future.

        const garbageSubstrings = [
          'click show transcript',
          'click on the three dots',
          'steps to find',
        ];
        const lowerCaseText = text.toLowerCase();
        const isGarbage = garbageSubstrings.some(s => lowerCaseText.includes(s)) ||
          (text.length < 300 && lowerCaseText.includes('transcript'));


if (text.length > 100 && !isGarbage) {
return NextResponse.json({
success: true,
transcript: text,
source: 'openai-web-search',
wordCount: text.split(/\s+/).length,
});
}
} catch (e) {
console.warn('OpenAI web_search transcript failed:', e);
}
}

63 changes: 52 additions & 11 deletions apps/web/src/app/api/video/route.ts
@@ -1,5 +1,6 @@
import { NextResponse } from 'next/server';
import { publishEvent, EventTypes } from '@/lib/cloudevents';
import { analyzeVideoWithGemini } from '@/lib/gemini-video-analyzer';

// Backend URL with validation - skip if not a valid URL
const rawBackendUrl = process.env.BACKEND_URL || '';
@@ -112,18 +113,59 @@ export async function POST(request: Request) {
}
}

// ── Strategy 2: Frontend-only pipeline ──
// Works on Vercel without the Python backend by chaining the serverless
// /api/transcribe and /api/extract-events routes directly.
// ── Strategy 2: Gemini Agentic Analysis (primary frontend strategy) ──
// Uses Google Search grounding to retrieve transcripts, descriptions,
// and chapter data directly — no separate transcribe/extract steps needed.
if (process.env.GEMINI_API_KEY) {
try {
await publishEvent(EventTypes.TRANSCRIPT_STARTED, { url, strategy: 'gemini-agentic' }, url);
const startTime = Date.now();
const analysis = await analyzeVideoWithGemini(url, process.env.GEMINI_API_KEY);
Comment on lines +119 to +123
Copilot AI comment (Feb 28, 2026):
The analyzeVideoWithGemini call has no timeout. Unlike Strategy 1 (which wraps the backend fetch in a 15-second AbortController timeout), the Gemini agentic call can take arbitrarily long — especially because it involves multiple internal Google Search round-trips. Vercel serverless functions have execution limits (typically 10-60 seconds depending on the plan). If the Gemini call runs long, the serverless function will be killed mid-execution with a 504/FUNCTION_INVOCATION_TIMEOUT error, rather than gracefully falling back to Strategy 3. A timeout wrapping this call (e.g., Promise.race with an AbortSignal-based timeout) should be added so the fallback chain is triggered cleanly instead.
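One way to implement the suggested timeout (a sketch; the helper name and the 30-second budget are assumptions, not part of this PR):

```typescript
// Race a promise against a timer so slow upstream calls fail fast and the
// caller can fall through to the next strategy instead of being killed by the
// platform's function-execution limit.
function withTimeout<T>(promise: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage inside Strategy 2:
// const analysis = await withTimeout(
//   analyzeVideoWithGemini(url, process.env.GEMINI_API_KEY),
//   30_000,
//   'gemini-agentic',
// );
```

A rejected timeout lands in the existing catch block, so the fallback chain triggers cleanly.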

const elapsed = Date.now() - startTime;

await publishEvent(EventTypes.PIPELINE_COMPLETED, {
strategy: 'gemini-agentic',
success: true,
transcriptSegments: analysis.transcript?.length || 0,
events: analysis.events?.length || 0,
}, url);

// Use trusted backend origin instead of deriving from potentially user-controlled request data
const origin = BACKEND_URL;
return NextResponse.json({
id: `vid_${Date.now().toString(36)}`,
Contributor comment (medium):
Using Date.now().toString(36) for generating an ID is not guaranteed to be unique, which could lead to collisions if the endpoint is called in rapid succession. For generating unique identifiers, it's more robust to use crypto.randomUUID(), which is already used elsewhere in the project for CloudEvents.

Suggested change:
- id: `vid_${Date.now().toString(36)}`,
+ id: `vid_${crypto.randomUUID()}`,

status: 'complete',
processing_time_ms: elapsed,
result: {
success: true,
insights: {
summary: analysis.summary,
actions: analysis.actions?.map((a) => a.title) || [],
topics: analysis.topics || [],
sentiment: 'Neutral',
},
transcript_segments: analysis.transcript?.length || 0,
transcript_source: 'gemini-agentic',
agents_used: ['gemini-agentic-engine'],
errors: [],
raw_response: {
title: analysis.title,
transcript: analysis.transcript,
events: analysis.events,
actions: analysis.actions,
architectureCode: analysis.architectureCode,
ingestScript: analysis.ingestScript,
},
},
});
} catch (e) {
console.warn('Gemini agentic analysis failed, falling back to transcribe chain:', e);
}
}

// Step 1: Get transcript
// ── Strategy 3: Frontend-only transcribe → extract chain (fallback) ──
let transcript = '';
let transcriptSource = 'none';
try {
await publishEvent(EventTypes.TRANSCRIPT_STARTED, { url, strategy: 'frontend' }, url);
await publishEvent(EventTypes.TRANSCRIPT_STARTED, { url, strategy: 'frontend-chain' }, url);
const baseUrl = getBaseUrl(request);
const transcribeRes = await fetch(`${baseUrl}/api/transcribe`, {
Contributor comment (security-medium):
The getBaseUrl function derives the base URL for internal API calls from the request.url, which is influenced by the user-controlled Host header. An attacker can manipulate the Host header to redirect internal fetch calls to an arbitrary external server, potentially leading to SSRF or the exfiltration of sensitive data (like the url or transcript).
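A sketch of a fix for the Host-header issue: resolve the base URL from trusted configuration rather than from the request. The APP_BASE_URL variable name is an assumption; VERCEL_URL is the deployment hostname Vercel injects at runtime:

```typescript
// Prefer an explicitly configured origin, fall back to the platform's
// deployment hostname, and only use localhost in development. Never trust
// the incoming Host header for internal fetches.
function getTrustedBaseUrl(): string {
  if (process.env.APP_BASE_URL) return process.env.APP_BASE_URL.replace(/\/+$/, '');
  if (process.env.VERCEL_URL) return `https://${process.env.VERCEL_URL}`;
  return 'http://localhost:3000';
}
```

With this in place, the internal `/api/transcribe` and `/api/extract-events` calls cannot be redirected by a spoofed Host header.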

method: 'POST',
@@ -140,7 +182,6 @@ export async function POST(request: Request) {
console.error('Transcript extraction failed:', e);
}

// Step 2: Extract events + insights from transcript
let extraction: { events?: Array<{ type: string; title: string; description?: string; timestamp?: string; priority?: string }>; actions?: Array<{ title: string }>; summary?: string; topics?: string[] } = {};
if (transcript) {
try {
@@ -165,7 +206,7 @@

await publishEvent(
hasResults ? EventTypes.PIPELINE_COMPLETED : EventTypes.PIPELINE_FAILED,
{ strategy: 'frontend', success: hasResults, transcriptSource },
{ strategy: 'frontend-chain', success: hasResults, transcriptSource },
url,
);

@@ -176,15 +217,15 @@
result: {
success: hasResults,
insights: {
summary: extraction.summary || (hasResults ? 'Transcript extracted successfully' : 'Could not extract transcript — configure OPENAI_API_KEY or GEMINI_API_KEY'),
summary: extraction.summary || (hasResults ? 'Transcript extracted successfully' : 'Could not extract transcript — configure GEMINI_API_KEY'),
Copilot AI comment (Feb 28, 2026):
The GEMINI_API_KEY environment variable is now the primary/required key for all frontend strategies (Strategy 2 in /api/video, primary strategy in /api/transcribe, and fallback in /api/extract-events), but it is absent from apps/web/.env.example. New developers or those setting up the environment won't know they need to set it, which will silently degrade the primary pipeline to the fallback chain. GEMINI_API_KEY should be added to .env.example with an appropriate placeholder and comment.

actions: extraction.actions?.map((a) => a.title) || [],
topics: extraction.topics || [],
sentiment: 'Neutral',
},
transcript_segments: 0,
transcript_source: transcriptSource,
agents_used: ['frontend-pipeline'],
errors: hasResults ? [] : ['Backend unavailable and transcript extraction failed'],
errors: hasResults ? [] : ['All strategies failed — ensure GEMINI_API_KEY is set'],
Copilot AI comment (Feb 28, 2026):
The error message says "ensure GEMINI_API_KEY is set", but the Strategy 3 fallback chain (/api/transcribe/api/extract-events) also uses OPENAI_API_KEY as a valid fallback provider. A user who only has OPENAI_API_KEY set will see this message even though their setup is partially functional for this fallback path. The message should mention both keys: e.g. "All strategies failed — ensure GEMINI_API_KEY or OPENAI_API_KEY is set".

raw_response: {
transcript: { text: transcript },
extraction,