TranscriptsDisabled But it's not disabled (works locally, fails on Cloud machine) #303

Open
atoonk opened this issue Jul 17, 2024 · 164 comments

Comments

@atoonk

atoonk commented Jul 17, 2024

To Reproduce

using youtube-transcript-api-0.6.2:

cat test.py 
from youtube_transcript_api import YouTubeTranscriptApi

print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))

outputs:

python3 ./test.py 
Traceback (most recent call last):
  File "/root/border0-plugin/./test.py", line 3, in <module>
    print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 137, in get_transcript
    return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
    return TranscriptListFetcher(http_client).fetch(video_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
    self._extract_captions_json(self._fetch_video_html(video_id), video_id),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
    raise TranscriptsDisabled(video_id)
youtube_transcript_api._errors.TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=w8rYQ40C9xo! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!

What code / cli command are you executing?

I am running

from youtube_transcript_api import YouTubeTranscriptApi
print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))

Which Python version are you using?

Python 3.11.6

Which version of youtube-transcript-api are you using?

youtube-transcript-api-0.6.2

Expected behavior

Describe what you expected to happen.
I expected to receive the English transcript.
I can see it in the browser (see screenshot: Screenshot 2024-07-17 at 2 56 23 PM).

Actual behaviour

Traceback (most recent call last):
  File "/root/border0-plugin/./test.py", line 3, in <module>
    print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 137, in get_transcript
    return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
    return TranscriptListFetcher(http_client).fetch(video_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
    self._extract_captions_json(self._fetch_video_html(video_id), video_id),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
    raise TranscriptsDisabled(video_id)
youtube_transcript_api._errors.TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=w8rYQ40C9xo! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
@Ibrahim-Faisal15

Ibrahim-Faisal15 commented Jul 17, 2024

Yes, the issue is valid, but it does not seem to occur with the link that YouTube gives us via the Share button.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

Hi @atoonk, do you only have this issue with this specific video, or all videos you are trying to retrieve? I can retrieve the subtitles for that video without any issues, which usually means that you are being rate-limited by YouTube (which would also mean that this should happen for all videos).

@SKVNDR

SKVNDR commented Jul 18, 2024

Hi @jdepoix, I encountered the same problem yesterday with every video I tried. Although I don't use the API frequently, I do access it a few times per day. I hope it's not some new restriction from YouTube. I experienced the same problem as @atoonk, and the issue is still present today.

Thanks a lot for your quick response and for this amazing tool; I really like it.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

Hi @SKVNDR, then you're most definitely being blocked by YouTube. The only way to work around this is to change your IP address in any way (VPN, proxy, or assign a new IP if possible).

@fleerdayo

I can confirm that YouTube is most likely blocking =/
It works from my local dev environment but not in production, all else being equal.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

If you're running your code on a cloud machine, it could be that (depending on your setup) you're getting assigned an IP from a pool that is shared with other machines. So the IP you're using could potentially be blocked without you doing anything. YouTube could also generally blacklist certain IPs that are known to belong to cloud providers (just a guess, I don't know if they actually do that!).

@atoonk
Author

atoonk commented Jul 18, 2024

Ah yes, I tried it from my laptop at home and it works fine now. And indeed, it affected all videos, which is why I thought it was a bug or new behaviour in the YT API.
So, I guess YouTube blocked me (this was on a DigitalOcean machine). Bummer, gotta find a way around that. Any docs on the rate-limit numbers or when folks get added? I only run this once every few weeks and only for a dozen videos or so. So I'm a bit surprised I was blocked. Unless it's all of DigitalOcean.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

Since this is not an official API, there unfortunately is no information on rate limits and when or for how long you will get blocked. People have been reporting different things, so I don't feel like it is consistent either.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

I will pin this issue and leave it open, since there are issues being opened due to this all the time.
Feel free to discuss workarounds and share your experience with YouTube's blocking heuristics, but be aware that there is no proper fix here and probably never will be. That's the nature of using an unofficial API, unfortunately.

@SKVNDR

SKVNDR commented Jul 18, 2024

Same for me. I use a droplet on DigitalOcean, and YouTube probably blocked the IP from there, but using a proxy fixed the issue...

@auspy

auspy commented Jul 18, 2024

Same for me. I use a droplet on DigitalOcean, and YouTube probably blocked the IP from there, but using a proxy fixed the issue...

How did you set up the proxy? Can you share the code? Did you use a free proxy or a paid one, and how did you obtain it?

@SKVNDR

SKVNDR commented Jul 18, 2024

Hi @auspy,

from youtube_transcript_api import YouTubeTranscriptApi  
YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "https://user:pass@domain:port"})

I'm using a paid proxy from smartproxy.com with the "Residential" offer.
There are probably other better proxies available; I chose this one randomly.

@atoonk
Author

atoonk commented Jul 18, 2024

confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my digital ocean droplet to my local laptop. https://docs.border0.com/docs/expose-a-http-proxy
which will allow you to expose a proxy on localhost and have it egress on a separate machine (in my case my laptop)

transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})

I can make a more detailed quick video if folks are interested in how to use that.

@yourdesigncoza

Having the exact same issue, & also using DigitalOcean droplet

@auspy

auspy commented Jul 19, 2024

confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my digital ocean droplet to my local laptop. https://docs.border0.com/docs/expose-a-http-proxy which will allow you to expose a proxy on localhost and have it egress on a separate machine (in my case my laptop)

transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})

I can make a more detailed quick video if folks are interested in how to use that.

sure would love a video on it. drop the link here

@auspy

auspy commented Jul 19, 2024

Hi @auspy,

from youtube_transcript_api import YouTubeTranscriptApi  
YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "https://user:pass@domain:port"})

I'm using a paid proxy from smartproxy.com with the "Residential" offer. There are probably other better proxies available; I chose this one randomly.

thank you for sharing. this surely looks like a cheap option but I was looking for something free. don't want to pay in initial stages of my project.

@yourdesigncoza

yourdesigncoza commented Jul 19, 2024

confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my digital ocean droplet to my local laptop. https://docs.border0.com/docs/expose-a-http-proxy which will allow you to expose a proxy on localhost and have it egress on a separate machine (in my case my laptop)

transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})

I can make a more detailed quick video if folks are interested in how to use that.

sure would love a video on it. drop the link here

@auspy Would Appreciate a vid. or just more info. ::: I'm all new to proxies etc. seems most info. online is kinda for the more experienced :::

@williamtkelley

Just ran across this issue today, glad I found this thread. I too am on Digital Ocean, running my code in a Docker container. Getting transcripts runs fine locally, but not on DO.

I would appreciate the video mentioned above, as proxies are new to me. If I use my localhost as a proxy, it means I need to leave the machine running 24/7 right? I mean, I guess that's obvious.

@tuganbaev

Yep, same with me. It looks like YouTube blocked many DO servers at once: I didn't make that many requests and I'm banned.

@alimbekovKZ

I also use a DigitalOcean droplet; I think they block IPs from DO servers. I'm now using Google Cloud Functions.

@BenjaminKobjolke

I can confirm that it is a problem with DigitalOcean servers being blocked.
Using a proxy is the solution.

@alimbekovKZ

Now this error also occurs in Google Cloud Functions.

@ethan-0l

ethan-0l commented Aug 5, 2024

Blocked from dedicated OVH too

@atikinkoon

Has anyone faced the same issue on PythonAnywhere?

@0xRaduan

0xRaduan commented Aug 7, 2024

faced the same issue today in aws ec2

@june-zeroxflow

same issue today on aws lambda

Repository owner deleted a comment from hatemmezlini Jan 17, 2025
@GourangPatidar

Ah yes, I tried it from my laptop at home and it works fine now. And indeed, it affected all videos, which is why I thought it was a bug or new behaviour in the YT API. So, I guess YouTube blocked me (this was on a DigitalOcean machine). Bummer, gotta find a way around that. Any docs on the rate-limit numbers or when folks get added? I only run this once every few weeks and only for a dozen videos or so. So I'm a bit surprised I was blocked. Unless it's all of DigitalOcean.

Are you able to run this on AWS Lambda? How did you do it?

@rdodev

rdodev commented Jan 23, 2025

Seems APIfy just increased the price of their transcript actor dramatically. So it's not viable even for hobby/personal use, since it's a $19.99/mo flat fee.

@Bootcody

My teammate, @wnd180, and I discovered that AWS IP blocking caused an issue. We tested running the server on Google Cloud Platform (GCP), and it was successful! I recommend deploying the server on GCP. 🙌

Sadly, this approach is only feasible if you’re in the early stages of development or using tools(like docker..) that facilitate migration. 🥲

Just tested by setting up a GCP Func (deployed in a US region). This doesn't work, if it was ever working...

@rdodev

rdodev commented Jan 26, 2025

My teammate, @wnd180, and I discovered that AWS IP blocking caused an issue. We tested running the server on Google Cloud Platform (GCP), and it was successful! I recommend deploying the server on GCP. 🙌
Sadly, this approach is only feasible if you’re in the early stages of development or using tools(like docker..) that facilitate migration. 🥲

Just tested by setting up a GCP Func (deployed in a US region). This doesn't work, if it was ever working...

Yeah, it seems YT is very aggressive with IP blacklists. Wouldn't be surprised if they do the same to proxies, too. What we need is a good way to authenticate so that requests for transcripts aren't anonymous.

@Bootcody

My teammate, @wnd180, and I discovered that AWS IP blocking caused an issue. We tested running the server on Google Cloud Platform (GCP), and it was successful! I recommend deploying the server on GCP. 🙌
Sadly, this approach is only feasible if you’re in the early stages of development or using tools(like docker..) that facilitate migration. 🥲

Just tested by setting up a GCP Func (deployed in a US region). This doesn't work, if it was ever working...

Yeah, it seems YT is very aggressive with IP blacklists. Wouldn't be surprised if they do the same to proxies, too. What we need is a good way to authenticate so that requests for transcripts aren't anonymous.

Funny thing: some videos work with proxies just fine.. but some popular ones don't work with a single proxy I use from a list of 20..

But yet, there is no official lib or API from YT itself for this. So not sure where you'd like to authenticate to?
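
Trying a video against every proxy in a list, as described in this comment, could be sketched as below. This is only an illustration: `fetch_with_fallback` is a hypothetical helper, not part of youtube-transcript-api, and it takes the fetch function as an argument so that it makes no assumptions about the library beyond the `proxies=` keyword used earlier in this thread.

```python
def fetch_with_fallback(fetch, video_id, proxy_configs):
    """Try each proxies dict in turn and return the first successful result.

    `fetch` is any callable accepting (video_id, proxies=...), for example
    YouTubeTranscriptApi.get_transcript as used earlier in this thread.
    """
    last_error = None
    for proxies in proxy_configs:
        try:
            return fetch(video_id, proxies=proxies)
        except Exception as error:  # e.g. TranscriptsDisabled when the IP is blocked
            last_error = error
    # Every proxy failed (or the list was empty): surface the last error as the cause.
    raise RuntimeError(
        f"all {len(proxy_configs)} proxies failed for {video_id}"
    ) from last_error
```

It could be called as `fetch_with_fallback(YouTubeTranscriptApi.get_transcript, video_id, proxy_configs)`. If a video is blocked through every proxy in the list, as reported above for some popular videos, this still ends in an error.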

@rdodev

rdodev commented Jan 26, 2025

Yeah, it seems YT is very aggressive with IP blacklists. Wouldn't be surprised if they do the same to proxies, too. What we need is a good way to authenticate so that requests for transcripts aren't anonymous.

Funny thing: some videos work with proxies just fine.. but some popular ones don't work with a single proxy I use from a list of 20..

But yet, there is no official lib or API from YT itself for this. So not sure where you'd like to authenticate to?

I thought if you made the request with the correct token, YT wouldn't block the transcript request. I imagine that's what APIFy does behind the scenes.

@Bootcody

Yeah, it seems YT is very aggressive with IP blacklists. Wouldn't be surprised if they do the same to proxies, too. What we need is a good way to authenticate so that requests for transcripts aren't anonymous.

Funny thing: some videos work with proxies just fine.. but some popular ones don't work with a single proxy I use from a list of 20..
But yet, there is no official lib or API from YT itself for this. So not sure where you'd like to authenticate to?

I thought if you made the request with the correct token, YT wouldn't block the transcript request. I imagine that's what APIFy does behind the scenes.

No no, I tested running the scraper from GCP, as was claimed above. It doesn't work; it gets blocked just like on any other cloud or hosting provider.

@kevin-weitgenant

Has anyone tried the YouTube Data API to retrieve YouTube subtitles? What are the pros and cons? There is a daily quota of 1000. How many requests can you fit in the daily quota?

Late reply, but I tried it. Waste of time...

You need to set up OAuth, and after that, to my surprise, it seems that only videos that have third-party collaboration enabled are the ones whose subtitles you can download.

I think that is quite a small percentage of videos. I wasn't able to download any subtitles.

But I was able to get the transcript list with the YouTube API, only that.

@timelinedr

Went down a rabbit hole and ended up here. Thanks for these solutions. I am now having success using Smart Proxy.

Repository owner deleted a comment from whitewidow0 Feb 5, 2025
Repository owner deleted a comment from GourangPatidar Feb 5, 2025
@JamieWells1

I deployed on Render. Same problem here. Proxying through smart proxy fixed it. Although smart proxy residential plan starts at $7 per GB.

Hi, I'm also using Render and smartproxy. How did you get it working? I'm currently using the proxy and still getting the same error. Code is here: main.py

Repository owner deleted a comment from OpeyemiSanusi Feb 7, 2025
Repository owner deleted a comment from jjzhuo Feb 7, 2025
Repository owner deleted a comment from harrycarson Feb 7, 2025
@samyogdhital

I am also facing the same error when using this library from Hugging Face.
Is there any other solution besides running your own proxy?

@JamieWells1

JamieWells1 commented Feb 7, 2025

Yep, managed to fix it. Unfortunately it does require proxies. The error was an SSL authentication error: if you haven't done so already, generate a new password for your proxies. The default password was stale and I just had to put in a new one.

If it's a personal project, you can get 1GB of request bandwidth from smartproxy for $7, which will last you for ages.

@brannonwinn

I've tried with Webshare and smartproxy and still have no luck getting it to work...for those of you using smartproxy was there anything in particular that you had to do?

@JamieWells1

JamieWells1 commented Feb 8, 2025

  1. Include both http and https in the proxies parameter
  2. Reset the initial password that smartproxy gives you for your proxies and use the new one
  3. Rotate your requests between different ports

What's the error you're getting?
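
The three tips in this comment can be sketched roughly as follows. This is a minimal sketch, not provider documentation: the host `proxy.example.com`, the port list, and the helpers `build_proxies` and `next_proxies` are hypothetical placeholders. Only the `proxies={...}` argument to `get_transcript` comes from this thread.

```python
from itertools import cycle

# Hypothetical placeholders: substitute the values from your proxy dashboard.
PROXY_HOST = "proxy.example.com"
PROXY_PORTS = [10001, 10002, 10003]


def build_proxies(user, password, host, port):
    """Return a proxies dict that covers both http and https (tip 1)."""
    # Assumption: the proxy endpoint itself is reached over plain http.
    url = f"http://{user}:{password}@{host}:{port}"
    return {"http": url, "https": url}


# Cycle through the ports so consecutive requests use different endpoints (tip 3).
_ports = cycle(PROXY_PORTS)


def next_proxies(user, password):
    """Build the proxies dict for the next port in the rotation.

    Pass the regenerated password here, not the initial default one (tip 2).
    """
    return build_proxies(user, password, PROXY_HOST, next(_ports))
```

Each request could then be made as `YouTubeTranscriptApi.get_transcript(video_id, proxies=next_proxies(user, password))`, mirroring the calls shown earlier in the thread.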

@brannonwinn

  1. Include both http and https in the proxies parameter
  2. Reset the initial password that smartproxy gives you for your proxies and use the new one
  3. Rotate your requests between different ports

What's the error you're getting?

That worked! Thanks a ton!

@weltitob

I've bought a residential proxy on smartproxy and yet it doesn't work for me. Please help.

@JamieWells1

If you've tried all of the above and it still doesn't work, set the session type to Rotating instead of Sticky.

@pedrocarnevale

With smartproxy I'm getting: Max retries exceeded with url: (...) (Caused by ProxyError('Unable to connect to proxy', OSError('Tunnel connection failed: 407 Proxy Authentication Required')))
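
A 407 means the proxy rejected the credentials it was sent. Besides the stale default password mentioned earlier in the thread, one possible cause (only a guess for this particular case) is a username or password containing characters such as `@` or `:`, which break the proxy URL unless they are percent-encoded. A minimal standard-library sketch follows; `proxy_url` is a hypothetical helper, and the host, port, and credentials are placeholders.

```python
from urllib.parse import quote


def proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode "/" as well as "@" and ":".
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{password_enc}@{host}:{port}"


# Placeholder values for illustration only.
url = proxy_url("user", "p@ss:word", "proxy.example.com", 10001)
proxies = {"http": url, "https": url}
```

The resulting `proxies` dict can be passed to `get_transcript` in the same way as in the earlier comments.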
