video disabled due to region lock shows Transcript/subtitle disabled. #213

michaelthwan · 2023-06-28T06:07:06Z

To Reproduce

Steps to reproduce the behavior:

What code / cli command are you executing?

A user tried extracting the transcript of this video:
https://www.youtube.com/watch?v=kZsVStYdmws
This video is available only in some regions (e.g. Hong Kong, Taiwan) and not in others (e.g. United States).
It therefore works locally (Hong Kong), but after deployment to a US server it shows:
Subtitles are disabled for this video

The following code reproduces it. It works through a VPN in the HK region, but not from the US:

from youtube_transcript_api import YouTubeTranscriptApi
video_id = "kZsVStYdmws"
YouTubeTranscriptApi.list_transcripts(video_id)

Which Python version are you using?

Python 3.10.8

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.6.0

Expected behavior

Describe what you expected to happen.
I think it is fine that a region where the video is disabled cannot fetch the transcript, but the exception is confusing: I had to troubleshoot for a while to understand why it happened.

This is potentially because the code falls through to the raise TranscriptsDisabled part, so adding one more exception check there might help.

    def _extract_captions_json(self, html, video_id):
        splitted_html = html.split('"captions":')

        if len(splitted_html) <= 1:
            if video_id.startswith('http://') or video_id.startswith('https://'):
                raise InvalidVideoId(video_id)
            if 'class="g-recaptcha"' in html:
                raise TooManyRequests(video_id)
            if '"playabilityStatus":' not in html:
                raise VideoUnavailable(video_id)

            # Here, add a new exception check

            raise TranscriptsDisabled(video_id)

Actual behaviour

It shows "Subtitles are disabled for this video" in regions where the video is blocked, even though subtitles are enabled.

michaelthwan · 2023-06-28T06:09:03Z

I will respect whether you fix it or not. Thanks for handling

jdepoix · 2023-06-28T07:34:21Z

Hi @michaelthwan,
thank you for reporting. I agree: this is not something we can do anything about, but a more descriptive error message would be nice. I am currently a bit short on time to implement this myself, but I will put it on the list and contributions will be very much welcome! 😊

crhowell · 2023-07-11T06:05:58Z

@jdepoix I finally had some downtime, so I am taking a look at this issue.

YouTube still reports this error as "Video unavailable" for the main reason, but adds subreason text that reads: The uploader has not made this video available in your country

In the browser, instead of the video, a region lock gives us a black background with white text showing:

Video unavailable
The uploader has not made this video available in your country

In the HTML we end up with this to search against

"playabilityStatus":{"status":"UNPLAYABLE","reason":"Video unavailable","errorScreen":{"playerErrorMessageRenderer":{"subreason":{"runs":[{"text":"The uploader has not made this video available in your country"}]}

We could add a new error class such as this, to keep it somewhat in line with what's in YouTube's response:

# file: youtube_transcript_api/_errors.py

class VideoUnplayable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'The video has not been made available in your country'

Though it would mean another exact string match against the HTML, such as

def _extract_captions_json(self, html, video_id):
    splitted_html = html.split('"captions":')
    
    if len(splitted_html) <= 1:
        if video_id.startswith('http://') or video_id.startswith('https://'):
            raise InvalidVideoId(video_id)
        if 'class="g-recaptcha"' in html:
            raise TooManyRequests(video_id)
        if '"playabilityStatus":' not in html:
            raise VideoUnavailable(video_id)     
        
        # add something like this
        if 'The uploader has not made this video available in your country' in html:
            raise VideoUnplayable(video_id)

It's a little fragile, but I think you've said before that this entire API is technically unofficial and could break at any time anyway. Let me know what you think. I could open a PR for this and probably add a test case or two while I have some downtime.

crhowell · 2023-07-11T06:29:45Z

@jdepoix Interestingly enough, we could add an age-related error class as well. It seems we could get around the age restriction entirely, since a transcript can be pulled regardless of whether you are logged in, but that would require adding logic based on my findings in #110. Until that workaround is implemented, we could at least throw an appropriate error in much the same way as for the country/region lock, since the HTML to match on lives in the same spot and looks like this:

"playabilityStatus":{"status":"LOGIN_REQUIRED","reason":"Sign in to confirm your age","errorScreen":{"playerErrorMessageRenderer":{"subreason":{"runs":[{"text":"This video may be inappropriate for some users."}]}

This would let us also sign off #111 until a workaround is implemented.

jdepoix · 2023-07-23T13:02:13Z

Hi @crhowell, thanks for looking into this and sorry for the late reply!
It looks like the data in "playabilityStatus" could generally be useful for providing more helpful exceptions and error messages! We could add an exception type for each status (LoginRequired, VideoUnplayable) which renders playabilityStatus.reason and playabilityStatus.errorScreen.playerErrorMessageRenderer.subreason.runs as part of the error message. However, just looking for a natural-language string in the HTML is definitely too fragile, as it will probably be in a different language depending on the locale. But isn't this part of the JSON we are parsing in json.loads(splitted_html[1].split(',"videoDetails')[0].replace('\n', '')) anyway? In that case we could just check the status and throw the corresponding exception, passing in the reason/subreason. If it is not part of the JSON we are currently parsing, I guess we should find a way to parse it, since everything else will be very fragile.
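
A minimal sketch of the status-to-exception idea described above. The classes here are stand-ins written for illustration, not the library's real exceptions, and only the two status values quoted in this thread are mapped. The nesting of reason and subreason follows the HTML fragments quoted earlier; everything else is an assumption.

```python
# Stand-in classes for illustration; they are NOT the library's exceptions.
class PlayabilityError(Exception):
    def __init__(self, video_id, reason=None, subreason=None):
        self.video_id = video_id
        self.reason = reason
        self.subreason = subreason
        details = " - ".join(part for part in (reason, subreason) if part)
        super().__init__(f"{video_id}: {details or 'unknown playability problem'}")


class LoginRequired(PlayabilityError):
    pass


class VideoUnplayable(PlayabilityError):
    pass


# Only the two status values quoted in this thread are mapped.
STATUS_TO_ERROR = {
    "LOGIN_REQUIRED": LoginRequired,
    "UNPLAYABLE": VideoUnplayable,
}


def error_for_status(video_id, playability_status):
    """Return an exception instance for a known status, else None."""
    error_cls = STATUS_TO_ERROR.get(playability_status.get("status"))
    if error_cls is None:
        return None  # the caller falls back to its existing behaviour
    runs = (
        playability_status.get("errorScreen", {})
        .get("playerErrorMessageRenderer", {})
        .get("subreason", {})
        .get("runs", [])
    )
    # Join the text of all runs into one subreason string.
    subreason = "".join(run.get("text", "") for run in runs)
    return error_cls(video_id, playability_status.get("reason"), subreason)
```

Because the lookup keys on the status code rather than on the displayed text, it should not depend on the locale of the response; the reason and subreason are only passed through for the message.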

crhowell · 2023-07-23T20:53:35Z

@jdepoix Well, the logic in there branches on whether or not splitted_html has an index 1.

We split the HTML on captions with html.split('"captions":'). If that list has a length less than or equal to 1, we will ALWAYS raise an exception and json.loads never runs.

Otherwise, if the list has more than one element, we do try to parse the element at index 1.

But for these specific errors, from what I've inspected via a debug breakpoint, the list has only one element, so we never reach the json.loads side of the branch. We always raise the exception, which leaves us back with the fragile "in html" check.

Let me include a snippet of the full function logic

def _extract_captions_json(self, html, video_id):
    splitted_html = html.split('"captions":')
    if len(splitted_html) <= 1:
        if video_id.startswith('http://') or video_id.startswith('https://'):
            raise InvalidVideoId(video_id)
        if 'class="g-recaptcha"' in html:
            raise TooManyRequests(video_id)
        if '"playabilityStatus":' not in html:
            raise VideoUnavailable(video_id)
        # NOTE: this is where we end up for the errors in this issue.
        raise TranscriptsDisabled(video_id)

    captions_json = json.loads(
        splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
    ).get('playerCaptionsTracklistRenderer')
    if captions_json is None:
        raise TranscriptsDisabled(video_id)

    if 'captionTracks' not in captions_json:
        raise NoTranscriptAvailable(video_id)

    return captions_json

Update
Confirmed that for both the age-restricted video and the country/region-locked video, len(splitted_html) is 1.
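
To illustrate the branch described above, here is a small sketch of the split check on its own. The two page strings are hypothetical, heavily shortened stand-ins and not real YouTube responses.

```python
def has_captions_marker(html):
    """Mirror the check above: True only if '"captions":' occurs in the page."""
    return len(html.split('"captions":')) > 1


# Hypothetical, heavily shortened page fragments.
blocked_page = '{"playabilityStatus":{"status":"UNPLAYABLE"},"videoDetails":{}}'
normal_page = (
    '{"playabilityStatus":{"status":"OK"},'
    '"captions":{"playerCaptionsTracklistRenderer":{}},"videoDetails":{}}'
)

print(has_captions_marker(blocked_page))  # False
print(has_captions_marker(normal_page))   # True
```

When the marker is missing, str.split returns a single-element list, so the function always takes the raising branch and never reaches json.loads.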

michaelthwan · 2023-07-25T02:51:09Z

You guys are very helpful. Thank you @crhowell @jdepoix

jdepoix · 2023-07-26T17:19:28Z

Hi @crhowell, yeah, that makes sense, but this should be solvable 😊

if len(splitted_html) <= 1:
    if video_id.startswith('http://') or video_id.startswith('https://'):
        raise InvalidVideoId(video_id)
    if 'class="g-recaptcha"' in html:
        raise TooManyRequests(video_id)
    splitted_html = html.split('"playabilityStatus":')
    if len(splitted_html) <= 1:
        raise VideoUnavailable(video_id)

    playability_status_json = json.loads(
        splitted_html[1].split(',"WHAT_EVER_THE_NEXT_PROPERTY_IS')[0].replace('\n', '')
    )

    # ... handle playability_status_json ...

    # fallback if we don't know the status
    raise TranscriptsDisabled(video_id)
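
One possible way to sidestep the unknown next property, sketched here as an alternative and not as what the library does: the standard library's json.JSONDecoder.raw_decode parses a single JSON value from the start of a string and ignores whatever follows. The function name is hypothetical, and the sketch assumes the marker is directly followed by a well-formed JSON object.

```python
import json


def extract_playability_status(html):
    """Return the object following '"playabilityStatus":', or None if absent."""
    marker = '"playabilityStatus":'
    _, found, rest = html.partition(marker)
    if not found:
        return None
    # raw_decode parses one JSON value and ignores the trailing text,
    # so the name of the next property does not need to be known.
    status_json, _end = json.JSONDecoder().raw_decode(rest.lstrip())
    return status_json
```

For example, with a shortened, made-up page string:

```python
page = (
    'var data = {"playabilityStatus":{"status":"LOGIN_REQUIRED",'
    '"reason":"Sign in to confirm your age"},"nextProperty":{}};'
)
print(extract_playability_status(page)["status"])  # LOGIN_REQUIRED
```

If the text after the marker is not valid JSON, raw_decode raises json.JSONDecodeError, which the caller would still need to handle.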

crhowell · 2023-07-26T20:38:00Z

@jdepoix I can throw together a first-pass PR for this; I have a partial solution already. I'll test it against the age/region error cases as well as the valid working cases, so we can see what kind of "reason" shows up when everything is working fine and transcripts are retrievable.

I'll tag you for review once it is submitted.

Update
PR #219

Note that this PR is a quick first pass. It is worth testing against more video IDs; I am sure there are some edge cases and more "status" values that we could turn into custom errors.

I did a little bit of testing. Let me know what you do or don't like and we can tweak it as necessary. I still need to add a few tests for the helpers, so coverage dropped a tiny bit.

mihailmariusiondev · 2024-10-21T19:10:01Z

I'm experiencing the same problem with this library. My setup:

VPS hosted on Hetzner (German IP): Unable to retrieve transcripts
Local machine (Spain IP): Can download transcripts without any problem

This suggests that the issue is indeed related to region-based restrictions or how YouTube is responding to requests from different geographic locations. The current error message ("Subtitles are disabled for this video") is misleading and made troubleshooting difficult.

I'm looking forward to the implementation of a more descriptive error handling system that can differentiate between truly disabled subtitles and region-based restrictions. This would greatly improve the user experience and make it easier to diagnose and handle these issues in our applications.

In the meantime, is there a recommended workaround for handling region-locked videos? Would using a proxy or VPN be a viable solution for production environments facing this issue?

jdepoix · 2024-10-21T19:23:05Z

Hi @mihailmariusiondev, this is most likely not an issue of your region being blocked, but of your cloud provider's IP being blocked. Have a look at #303 to find out more about this issue.

jdepoix added the enhancement New feature or request label Jun 28, 2023

Repository owner deleted a comment from hatemmezlini Feb 7, 2025
