Skip to content

Conversation

mwitiderrick
Copy link
Contributor

This PR adds an end-to-end example demonstrating using Qdrant for semantic search over YouTube video transcripts combined with OpenAI to generate new video ideas based on past content.

Specifically, it includes:

✅ Setup of a video_transcripts collection in Qdrant with vector embeddings (text-embedding-ada-002) and metadata payloads (e.g., transcript, user_id).

embed_and_store() function to generate embeddings for transcripts and store them with metadata in Qdrant.

search_similar_transcripts() function to semantically search for transcripts similar to a query text, filtering by user ID.

generate_video_idea() function that uses OpenAI Chat Completions API to propose a new video idea based on retrieved similar transcripts.

✨ Why This is Useful:

Demonstrates a practical use case of combining vector search (Qdrant) with language generation (OpenAI).

Shows how to store text and metadata together with embeddings for richer retrieval.

Provides a real-world example relevant to creators, marketers, and content platforms.

🔧 Technologies Used:

@kacperlukawski kacperlukawski requested a review from Copilot April 29, 2025 10:24
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces an end‑to‑end example demonstrating semantic search over YouTube video transcripts with Qdrant and idea generation using OpenAI. Key changes include:

  • Adding helper functions (e.g., embed_and_store, search_similar_transcripts, generate_video_idea) for handling embeddings, storage, and idea generation.
  • Implementing new API endpoints and management commands for YouTube processing and periodic task scheduling.
  • Expanding documentation (README.MD) to guide setup and usage.

Reviewed Changes

Copilot reviewed 71 out of 73 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
video-generation/backend/api/youtube_utils.py Adds YouTube authentication, transcript embedding, and storage functions using Qdrant and OpenAI.
video-generation/backend/api/views.py Provides API endpoints for user API keys, task handling, and video generation requests.
video-generation/backend/api/urls.py Registers new API endpoints for task management and video processing.
video-generation/backend/api/transcription.py Implements audio transcription using OpenAI Whisper API.
video-generation/backend/api/tests.py Placeholder for tests.
video-generation/backend/api/redis_client.py Sets up Redis client for task status.
video-generation/backend/api/qdrant_utils.py Adds functions for semantic search and video idea generation via Qdrant and OpenAI.
video-generation/backend/api/models.py Placeholder for Django models.
video-generation/backend/api/management/commands/schedule_tasks.py Schedules periodic tasks for video creation and vector DB updates.
video-generation/backend/api/management/commands/run_youtube_process.py Provides CLI command to trigger YouTube processing tasks.
video-generation/backend/api/management/commands/create_qdrant_collection.py Command to ensure Qdrant collection exists before processing.
video-generation/backend/api/apps.py Standard Django app configuration.
video-generation/README.MD Updates documentation for project setup, usage, and API integration.
Files not reviewed (2)
  • video-generation/backend/.gitignore: Language not supported
  • video-generation/backend/Dockerfile: Language not supported
Comments suppressed due to low confidence (1)

video-generation/backend/api/views.py:25

  • The task 'test_celery_task' is referenced without being imported or defined; ensure it is correctly imported or updated to the appropriate task.
task = test_celery_task.delay(2, 3)




response = openai.chat.completions.create(
Copy link

Copilot AI Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call 'openai.chat.completions.create' appears to be incorrect; update it to 'openai.ChatCompletion.create' as per the OpenAI API specification.

Suggested change
response = openai.chat.completions.create(
response = openai.ChatCompletion.create(

Copilot uses AI. Check for mistakes.

Copy link
Member

@kacperlukawski kacperlukawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwitiderrick Thanks a lot for contributing this application! I'm not merging it, as I don't think it belongs in the "examples" category. Please let me clarify that a bit. An example for us is a digestible piece presenting how to use a certain Qdrant functionality. However, this seems to be a fully-fledged application that requires multiple systems to run, so not that many people will be able to test it on their own.

Keeping this application in a separate repository would make sense, as we do with all the other demos. A running version should also be hosted somewhere to attract interest.

I left some minor comments, but in general, I think the idea for the app is really neat! The app looks like any standard Django application.

Thanks again for doing this effort!

Comment on lines +90 to +138
qdrant = QdrantClient(url=os.getenv("QDRANT_HOST"),prefer_grpc=False )

def ensure_qdrant_collection():
if not qdrant.collection_exists("video_transcripts"):
qdrant.create_collection(
collection_name="video_transcripts",
vectors_config=VectorParams(
size=1536,
distance=Distance.COSINE
)
)


def embed_and_store(user, text, metadata):
logger.info(f"[🔑] Starting embed_and_store for user {user.id} with metadata: {metadata}")

try:
client = OpenAI(api_key=user.openai_api_key_decrypted)
logger.info("[🧠] Initialized OpenAI client.")
except Exception as e:
logger.exception("[❌] Failed to initialize OpenAI client.")
raise e

try:
response = client.embeddings.create(
input=[text],
model="text-embedding-ada-002"
)
embedding = response.data[0].embedding
logger.info("[✅] Embedding successfully created.")
except Exception as e:
logger.exception("[❌] Failed to generate embedding.")
raise e

try:
point_id = str(uuid.uuid4())
logger.info(f"[🆔] Generated UUID: {point_id}")

point = PointStruct(id=point_id, vector=embedding, payload=metadata)
logger.info("[📦] PointStruct created.")

qdrant.upsert("video_transcripts", [point])
logger.info(f"[📤] Upserted into Qdrant with point ID {point_id}")

return point_id

except Exception as e:
logger.exception("[❌] Failed to upsert into Qdrant.")
raise e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these functions do not belong here and should be put in qdrant_utils.py instead. I struggled a bit with finding them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that file required at all?

Comment on lines +11 to +24
def encrypt_value(value):
if not value:
return None
f = get_fernet()
return f.encrypt(value.encode()).decode()

def decrypt_value(value):
if not value:
return None
try:
f = get_fernet()
return f.decrypt(value.encode()).decode()
except InvalidToken:
return "[DECRYPTION_FAILED]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that you implemented this! Many people will store everything in plaintext.

Comment on lines +81 to +90
class Video(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="videos")
title = models.CharField(max_length=255)
description = models.TextField()
video_url = models.URLField()
created_at = models.DateTimeField(auto_now_add=True)

def __str__(self):
return self.title No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit confusing that the Django app is called users and there are some other models here. First of all, I checked the api app, but there were no models at all, which I found quite intriguing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants