AWS Lambda unable to get transcript? #375

Closed
julian998-dot opened this issue Jan 24, 2025 · 2 comments

Comments

@julian998-dot

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Steps to reproduce the behavior:

  1. Lambda Image public.ecr.aws/lambda/python:3.11.2025.01.13.14
  2. Code Below
  3. Call The function

What code / cli command are you executing?

transcript = YouTubeTranscriptApi.get_transcript(video_id)

Which Python version are you using?

Python 3.11.11

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.6.3

Expected behavior

Get the transcripts of some videos

For example: I expected to receive the English transcript

Actual behaviour

Locally everything works perfectly!
I use the YouTube API to search for some videos and get the IDs to pass to the library, and that works.
But when I deploy an image to an AWS Lambda function with Docker, it just doesn't work. All the videos that work locally now show:

Could not retrieve a transcript for the video https://www.youtube.com/watch?v=ym30IDwQ5LI! This is most likely caused by:
Subtitles are disabled for this video
If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
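To narrow down which failure the library is reporting in an environment like Lambda, it can help to log the exception class together with the first line of the message instead of a generic string. The sketch below is only an illustration: `describe_failure` is a hypothetical helper, not part of youtube-transcript-api, and the exception in the demo is a stand-in for whatever `get_transcript` raises.

```python
def describe_failure(video_id, exc):
    """Return a one-line summary of an exception for log output.

    Hypothetical helper for illustration; it only inspects the exception
    object and does not call youtube-transcript-api.
    """
    # str(exc) can span several lines, so keep only the first non-empty one.
    lines = [line.strip() for line in str(exc).splitlines() if line.strip()]
    first_line = lines[0] if lines else "(no message)"
    return f"{video_id}: {type(exc).__name__}: {first_line}"


if __name__ == "__main__":
    # Stand-in exception; in the handler this would be the caught error.
    demo = RuntimeError("Could not retrieve a transcript\nSubtitles are disabled")
    print(describe_failure("ym30IDwQ5LI", demo))
```

Calling this from the `except` branches would show in the CloudWatch logs whether every video fails with the same exception class.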

This happens for every video. I tried a proxy (both public and private) and even a VPN, but the result is the same.
I don't get it: I can use the YouTube API for search from AWS, but transcript requests get blocked when they come from AWS?
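The code further down does not show how the proxy was passed, so here is a minimal sketch of one way to do it with the 0.6.x releases, where `get_transcript` accepts a `proxies` argument in the dict format used by `requests`. The names `build_proxies`, `fetch_transcript`, and the `TRANSCRIPT_PROXY_URL` environment variable are assumptions made for this example, and whether a particular proxy avoids the failure is not something the sketch can demonstrate.

```python
import os


def build_proxies(proxy_url):
    """Build a requests-style proxies dict, or None when no proxy is set."""
    if not proxy_url:
        return None
    # The same proxy endpoint is used for both schemes.
    return {"http": proxy_url, "https": proxy_url}


def fetch_transcript(video_id, proxy_url=None):
    """Fetch a transcript, optionally through a proxy (makes a network call)."""
    # Imported here so build_proxies can be used without the library installed.
    from youtube_transcript_api import YouTubeTranscriptApi

    return YouTubeTranscriptApi.get_transcript(
        video_id, proxies=build_proxies(proxy_url)
    )


if __name__ == "__main__":
    # TRANSCRIPT_PROXY_URL is a hypothetical environment variable name.
    url = os.environ.get("TRANSCRIPT_PROXY_URL")
    print(build_proxies(url))
```

In the Lambda handler this would replace the direct `get_transcript` call, with the proxy URL supplied through the function's environment configuration.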

Please help!

This is the code I'm using.

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api import TranscriptsDisabled, VideoUnavailable, NoTranscriptFound
from googleapiclient.discovery import build
import json
from tqdm import tqdm

YOUTUBE_API_KEY = 'YT_API_KEY'  


# Lambda function
def lambda_handler(event, context):
    search_results = search_videos("TED Talks", max_results=10)
    transcripts = []
    for video_id, video_title, published_at, channel_title in tqdm(search_results, desc="Processing videos"):
        transcript = get_transcript(video_id)
        # get_transcript returns an empty value on failure, so skip those videos.
        if not transcript:
            continue
        transcripts.append(process_transcript(transcript))
    # Guard against an empty list so the sample lookup cannot raise IndexError.
    sample = transcripts[-1][:30] + '...' if transcripts else ''
    return {
        "statusCode": 200,
        "body": json.dumps({
            "transcripts": len(transcripts),
            "sample": sample
        })
    }

def search_videos(query, max_results=5):
    youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)

    request = youtube.search().list(
        part="snippet",
        q=query,
        type="video",
        order="date",
        maxResults=max_results,
        videoCaption="closedCaption"  # Only videos with subtitles
    )
    response = request.execute()

    videos = []
    for item in response['items']:
        video_id = item['id']['videoId']
        video_title = item['snippet']['title']
        published_at = item['snippet']['publishedAt']
        channel_title = item['snippet']['channelTitle']
        videos.append((video_id, video_title, published_at, channel_title))

    return videos



def get_transcript(video_id):
    try:
        return YouTubeTranscriptApi.get_transcript(video_id)
    except (TranscriptsDisabled, VideoUnavailable, NoTranscriptFound) as e:
        # Include the exception class so the log shows which failure occurred.
        print(f"Error: no subtitles ({type(e).__name__}): {e}")
        return []
    except Exception as e:
        print(f"Unexpected error with proxy: {e}")
        return []

def process_transcript(transcript):
    return " ".join([item['text'] for item in transcript])

if __name__ == '__main__':
    print(lambda_handler('', ''))



Thanks for your help!

@SeyBoo

SeyBoo commented Jan 26, 2025

Same error on Cloud Run.

@jdepoix
Owner

jdepoix commented Jan 27, 2025

duplicate of #303

@jdepoix jdepoix closed this as completed Jan 27, 2025