YouTube Video Transcript Search and Idea Generation with Qdrant and OpenAI #71
Conversation
Pull Request Overview
This PR introduces an end‑to‑end example demonstrating semantic search over YouTube video transcripts with Qdrant and idea generation using OpenAI. Key changes include:
- Adding helper functions (e.g., embed_and_store, search_similar_transcripts, generate_video_idea) for handling embeddings, storage, and idea generation.
- Implementing new API endpoints and management commands for YouTube processing and periodic task scheduling.
- Expanding documentation (README.MD) to guide setup and usage.
Reviewed Changes
Copilot reviewed 71 out of 73 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
video-generation/backend/api/youtube_utils.py | Adds YouTube authentication, transcript embedding, and storage functions using Qdrant and OpenAI. |
video-generation/backend/api/views.py | Provides API endpoints for user API keys, task handling, and video generation requests. |
video-generation/backend/api/urls.py | Registers new API endpoints for task management and video processing. |
video-generation/backend/api/transcription.py | Implements audio transcription using OpenAI Whisper API. |
video-generation/backend/api/tests.py | Placeholder for tests. |
video-generation/backend/api/redis_client.py | Sets up Redis client for task status. |
video-generation/backend/api/qdrant_utils.py | Adds functions for semantic search and video idea generation via Qdrant and OpenAI. |
video-generation/backend/api/models.py | Placeholder for Django models. |
video-generation/backend/api/management/commands/schedule_tasks.py | Schedules periodic tasks for video creation and vector DB updates. |
video-generation/backend/api/management/commands/run_youtube_process.py | Provides CLI command to trigger YouTube processing tasks. |
video-generation/backend/api/management/commands/create_qdrant_collection.py | Command to ensure Qdrant collection exists before processing. |
video-generation/backend/api/apps.py | Standard Django app configuration. |
video-generation/README.MD | Updates documentation for project setup, usage, and API integration. |
Files not reviewed (2)
- video-generation/backend/.gitignore: Language not supported
- video-generation/backend/Dockerfile: Language not supported
Comments suppressed due to low confidence (1)
video-generation/backend/api/views.py:25
- The task 'test_celery_task' is referenced without being imported or defined; ensure it is correctly imported or updated to the appropriate task.
task = test_celery_task.delay(2, 3)
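For context, `delay()` is how Celery queues a task asynchronously; a minimal sketch of what the referenced task and its import might look like (the module path `api/tasks.py` and the task body are assumptions, not taken from the PR):

```python
# api/tasks.py (hypothetical module path)
from celery import shared_task

@shared_task
def test_celery_task(x, y):
    # Trivial body, useful only for checking that the Celery worker is wired up.
    return x + y

# api/views.py -- the import Copilot flags as missing:
# from .tasks import test_celery_task
# task = test_celery_task.delay(2, 3)  # enqueues the task and returns an AsyncResult
```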
```python
response = openai.chat.completions.create(
```

Copilot AI · Apr 29, 2025

The call 'openai.chat.completions.create' appears to be incorrect; update it to 'openai.ChatCompletion.create' as per the OpenAI API specification.

```diff
- response = openai.chat.completions.create(
+ response = openai.ChatCompletion.create(
```
Co-authored-by: Copilot <[email protected]>
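For reference, which of the two calls is correct depends on the installed OpenAI Python SDK version; a minimal sketch of both forms (the model name and messages are placeholders, not taken from the PR):

```python
# openai < 1.0 (legacy SDK): module-level ChatCompletion
import openai

openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a video idea."}],
)

# openai >= 1.0 (current SDK): client object with chat.completions
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a video idea."}],
)
```

Note that `openai.ChatCompletion.create` was removed in openai >= 1.0, so whether this suggestion applies depends on which SDK version the project pins.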
@mwitiderrick Thanks a lot for contributing this application! I'm not merging it, as I don't think it belongs in the "examples" category. Please let me clarify that a bit. An example for us is a digestible piece presenting how to use a certain Qdrant functionality. However, this seems to be a fully-fledged application that requires multiple systems to run, so not that many people will be able to test it on their own.
Keeping this application in a separate repository would make sense, as we do with all the other demos. A running version should also be hosted somewhere to attract interest.
I left some minor comments, but in general, I think the idea for the app is really neat! The app looks like any standard Django application.
Thanks again for putting in this effort!
```python
# Imports implied by this excerpt (not shown in the diff hunk)
import os
import uuid
import logging

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

logger = logging.getLogger(__name__)

qdrant = QdrantClient(url=os.getenv("QDRANT_HOST"), prefer_grpc=False)


def ensure_qdrant_collection():
    if not qdrant.collection_exists("video_transcripts"):
        qdrant.create_collection(
            collection_name="video_transcripts",
            vectors_config=VectorParams(
                size=1536,
                distance=Distance.COSINE
            )
        )


def embed_and_store(user, text, metadata):
    logger.info(f"[🔑] Starting embed_and_store for user {user.id} with metadata: {metadata}")

    try:
        client = OpenAI(api_key=user.openai_api_key_decrypted)
        logger.info("[🧠] Initialized OpenAI client.")
    except Exception as e:
        logger.exception("[❌] Failed to initialize OpenAI client.")
        raise e

    try:
        response = client.embeddings.create(
            input=[text],
            model="text-embedding-ada-002"
        )
        embedding = response.data[0].embedding
        logger.info("[✅] Embedding successfully created.")
    except Exception as e:
        logger.exception("[❌] Failed to generate embedding.")
        raise e

    try:
        point_id = str(uuid.uuid4())
        logger.info(f"[🆔] Generated UUID: {point_id}")

        point = PointStruct(id=point_id, vector=embedding, payload=metadata)
        logger.info("[📦] PointStruct created.")

        qdrant.upsert("video_transcripts", [point])
        logger.info(f"[📤] Upserted into Qdrant with point ID {point_id}")

        return point_id

    except Exception as e:
        logger.exception("[❌] Failed to upsert into Qdrant.")
        raise e
```
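A quick usage sketch of the function above; the call site and variable names are illustrative, while the metadata keys follow the example payload in the PR description:

```python
# Hypothetical call site, e.g. inside a Celery task or a view.
point_id = embed_and_store(
    user=request.user,
    text=transcript_text,
    metadata={"user_id": request.user.id, "transcript": transcript_text},
)
```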
I think these functions do not belong here and should be put in `qdrant_utils.py` instead. I struggled a bit with finding them.
Is that file required at all?
```python
# Import implied by this excerpt (not shown in the diff hunk)
from cryptography.fernet import InvalidToken

# get_fernet() is presumably defined elsewhere in this module
# (a sketch of a typical implementation follows this excerpt).

def encrypt_value(value):
    if not value:
        return None
    f = get_fernet()
    return f.encrypt(value.encode()).decode()


def decrypt_value(value):
    if not value:
        return None
    try:
        f = get_fernet()
        return f.decrypt(value.encode()).decode()
    except InvalidToken:
        return "[DECRYPTION_FAILED]"
```
I love that you implemented this! Many people will store everything in plaintext.
```python
# Imports implied by this excerpt (not shown in the diff hunk);
# `User` is presumably the project's user model, defined or imported
# earlier in this module.
import uuid

from django.db import models


class Video(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="videos")
    title = models.CharField(max_length=255)
    description = models.TextField()
    video_url = models.URLField()
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title
```
It is a bit confusing that the Django app is called `users` and there are some other models here. First of all, I checked the `api` app, but there were no models at all, which I found quite intriguing.
This PR adds an end-to-end example demonstrating how to use Qdrant for semantic search over YouTube video transcripts, combined with OpenAI to generate new video ideas based on past content.

Specifically, it includes:

- ✅ Setup of a `video_transcripts` collection in Qdrant with vector embeddings (`text-embedding-ada-002`) and metadata payloads (e.g., `transcript`, `user_id`).
- ✅ `embed_and_store()` function to generate embeddings for transcripts and store them with metadata in Qdrant.
- ✅ `search_similar_transcripts()` function to semantically search for transcripts similar to a query text, filtering by user ID.
- ✅ `generate_video_idea()` function that uses the OpenAI Chat Completions API to propose a new video idea based on retrieved similar transcripts (a rough sketch of these two helpers appears at the end of this description).

✨ Why This is Useful:
- Demonstrates a practical use case of combining vector search (Qdrant) with language generation (OpenAI).
- Shows how to store text and metadata together with embeddings for richer retrieval.
- Provides a real-world example relevant to creators, marketers, and content platforms.
🔧 Technologies Used:

- Qdrant Client
- OpenAI Embedding API
- OpenAI Chat Completions API
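For readers who want a feel for how the two retrieval/generation helpers fit together without opening the diff, here is a rough sketch under stated assumptions: the collection name and payload keys follow the description above, while the client setup, prompts, model names, and exact function signatures in `qdrant_utils.py` may differ.

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = OpenAI()                                   # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")  # URL is illustrative


def search_similar_transcripts(user_id, query_text, limit=5):
    # Embed the query with the same model used at ingestion time.
    query_vector = client.embeddings.create(
        input=[query_text], model="text-embedding-ada-002"
    ).data[0].embedding
    # Restrict results to the requesting user's transcripts via a payload filter.
    hits = qdrant.search(
        collection_name="video_transcripts",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
        ),
        limit=limit,
    )
    return [hit.payload.get("transcript", "") for hit in hits]


def generate_video_idea(user_id, topic):
    # Feed the retrieved transcripts to the Chat Completions API as context.
    context = "\n\n".join(search_similar_transcripts(user_id, topic))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=[
            {"role": "system", "content": "You suggest new YouTube video ideas."},
            {
                "role": "user",
                "content": f"Past transcripts:\n{context}\n\nPropose a new video idea about: {topic}",
            },
        ],
    )
    return response.choices[0].message.content
```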