AWS Lambda unable to get transcript? #375

julian998-dot · 2025-01-24T22:56:07Z


To Reproduce

Steps to reproduce the behavior:

1. Use the Lambda image public.ecr.aws/lambda/python:3.11.2025.01.13.14
2. Deploy the code below
3. Call the function

What code / cli command are you executing?

transcript = YouTubeTranscriptApi.get_transcript(video_id)

Which Python version are you using?

Python 3.11.11

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.6.3

Expected behavior

Get the transcripts of some videos.

For example: I expected to receive the English transcript.

Actual behaviour

Locally, everything works perfectly!
I use the YouTube API to search for videos and get the IDs to pass to the library, and that works.
But when I deploy an image to an AWS Lambda function with Docker, it doesn't work. All the videos that work locally now show:

Could not retrieve a transcript for the video https://www.youtube.com/watch?v=ym30IDwQ5LI! This is most likely caused by:
Subtitles are disabled for this video
If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!

This happens for every video. I tried proxies, both public and private, and even a VPN, but the result is the same.
I don't get it: I can use the YouTube API for search from AWS, but transcript requests get blocked when they come from AWS?
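For reference, this is roughly how a proxy can be passed to the library. It is a minimal sketch, assuming the 0.6.x `proxies` keyword of `get_transcript`, which takes a requests-style mapping. The helper names `build_proxies` and `fetch_with_proxy` and the injectable `fetch` argument are hypothetical and only there so the wiring can be checked without network access.

```python
def build_proxies(proxy_url):
    """Return a requests-style proxies mapping, or None when no proxy is set."""
    if not proxy_url:
        return None
    # The same proxy is used for both schemes.
    return {"http": proxy_url, "https": proxy_url}


def fetch_with_proxy(video_id, proxy_url=None, fetch=None):
    """Fetch a transcript, routing the request through proxy_url if given.

    `fetch` is a hypothetical injection point: any callable accepting
    (video_id, proxies=...) works, which allows testing without the network.
    """
    if fetch is None:
        # Imported lazily so this module loads even where the library is absent.
        from youtube_transcript_api import YouTubeTranscriptApi
        fetch = YouTubeTranscriptApi.get_transcript
    return fetch(video_id, proxies=build_proxies(proxy_url))


# Example call (needs the library, network access, and a real proxy URL):
# fetch_with_proxy("ym30IDwQ5LI", proxy_url="http://user:pass@host:8080")
```

Whether a given proxy actually helps depends on whether its exit IP is itself blocked, which this sketch cannot tell you.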

Please help!

This is the code I'm using.

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api._errors import TranscriptsDisabled, VideoUnavailable, NoTranscriptFound
from googleapiclient.discovery import build
import json
from tqdm import tqdm

YOUTUBE_API_KEY = 'YT_API_KEY'  


# Lambda function
def lambda_handler(event, context):
    search_results = search_videos("TED Talks", max_results=10)
    transcripts = []
    for video_id, video_title, published_at, channel_title in tqdm(search_results, desc="Processing videos"):
        try:
            transcript = get_transcript(video_id)
            processed_transcript = process_transcript(transcript)
            transcripts.append(processed_transcript)
        except NoTranscriptFound:
            pass
    # Guard against an empty result list before taking a sample.
    sample = transcripts[-1][:30] + '...' if transcripts else ''
    return {
        "statusCode": 200,
        "body": json.dumps({
            "transcripts": len(transcripts),
            "sample": sample
        })
    }

def search_videos(query, max_results=5):
    youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)

    request = youtube.search().list(
        part="snippet",
        q=query,
        type="video",
        order="date",
        maxResults=max_results,
        videoCaption="closedCaption"  # Only videos with captions
    )
    response = request.execute()

    videos = []
    for item in response['items']:
        video_id = item['id']['videoId']
        video_title = item['snippet']['title']
        published_at = item['snippet']['publishedAt']
        channel_title = item['snippet']['channelTitle']
        videos.append((video_id, video_title, published_at, channel_title))

    return videos



def get_transcript(video_id):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return transcript
    except (TranscriptsDisabled, VideoUnavailable, NoTranscriptFound) as e:
        print("Error: no subtitles:", e)
        return ''
    except Exception as e:
        print(f"Unexpected error with proxy: {e}")
        return ''

def process_transcript(transcript):
    return " ".join([item['text'] for item in transcript])

if __name__ == '__main__':
    print(lambda_handler('', ''))
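Note that `get_transcript` above swallows every exception and returns an empty string, so the handler still answers with status 200 even when every video fails. A minimal sketch of keeping failures separate from successes, which makes the Lambda output easier to diagnose, could look like this. `collect_transcripts` and its `fetch` parameter are hypothetical names for illustration; in the real handler `fetch` would be `YouTubeTranscriptApi.get_transcript`.

```python
def collect_transcripts(video_ids, fetch):
    """Split video ids into joined transcripts and recorded failures.

    `fetch` is any callable that takes a video id and returns a list of
    dicts with a 'text' key, raising an exception on failure.
    """
    transcripts, failures = {}, {}
    for video_id in video_ids:
        try:
            items = fetch(video_id)
        except Exception as exc:  # broad on purpose: this is a diagnostic sketch
            # Keep the exception type so blocked requests are visible in the response.
            failures[video_id] = f"{type(exc).__name__}: {exc}"
            continue
        transcripts[video_id] = " ".join(item["text"] for item in items)
    return transcripts, failures
```

Returning both mappings in the response body shows at a glance whether every video failed with the same error.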

Thanks for your help!


SeyBoo · 2025-01-26T19:00:31Z

Same error on cloud run

jdepoix · 2025-01-27T09:17:46Z

duplicate of #303

jdepoix closed this as completed Jan 27, 2025
