video disabled due to region lock shows Transcript/subtitle disabled. #213
Comments
I will respect whether you fix it or not. Thanks for handling |
Hi @michaelthwan, |
@jdepoix I finally had some downtime, so I'm taking a look at this issue. As far as YouTube is concerned, the main reason for this error is still "Video unavailable", but it has subreason text that displays
In the browser, instead of the video loading, a region lock gives us a black background with white text showing:
In the HTML we end up with this to search against
We could add a new error class such as this, to keep it roughly in line with what's in YouTube's response:

    # file: youtube_transcript_api/_errors.py
    class VideoUnplayable(CouldNotRetrieveTranscript):
        CAUSE_MESSAGE = 'The video has not been made available in your country'

Though it would be another search for an exact string match against the HTML:

    def _extract_captions_json(self, html, video_id):
        splitted_html = html.split('"captions":')
        if len(splitted_html) <= 1:
            if video_id.startswith('http://') or video_id.startswith('https://'):
                raise InvalidVideoId(video_id)
            if 'class="g-recaptcha"' in html:
                raise TooManyRequests(video_id)
            if '"playabilityStatus":' not in html:
                raise VideoUnavailable(video_id)
            # add something like this
            if 'The uploader has not made this video available in your country' in html:
                raise VideoUnplayable(video_id)

It's a little fragile, but I think you've said before that technically this entire API is unofficial and could break at any time anyway. Let me know what you think. I could PR this in and probably add a test case or two while I have some downtime. |
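The proposed check can be tried in isolation. The following is a minimal sketch, not the library's actual code: the exception classes are simplified stand-ins for the ones in `youtube_transcript_api/_errors.py`, `raise_for_missing_captions` and `error_name` are hypothetical helpers, and the HTML fragments are made up and contain only the marker string quoted above.

```python
# Simplified stand-ins for the library's exception classes (assumption:
# the real classes have a richer constructor and message format).
class CouldNotRetrieveTranscript(Exception):
    CAUSE_MESSAGE = ''

    def __init__(self, video_id):
        self.video_id = video_id
        super().__init__('{}: {}'.format(video_id, self.CAUSE_MESSAGE))


class TranscriptsDisabled(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'Subtitles are disabled for this video'


class VideoUnplayable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'The video has not been made available in your country'


REGION_LOCK_MARKER = 'The uploader has not made this video available in your country'


def raise_for_missing_captions(html, video_id):
    """Hypothetical helper: choose an error when '"captions":' is absent."""
    if REGION_LOCK_MARKER in html:
        raise VideoUnplayable(video_id)
    # Fallback, as in the existing function.
    raise TranscriptsDisabled(video_id)


def error_name(html):
    """Return the name of the exception class raised for the given HTML."""
    try:
        raise_for_missing_captions(html, 'kZsVStYdmws')
    except CouldNotRetrieveTranscript as error:
        return type(error).__name__


# Made-up HTML fragments, for illustration only.
region_locked_html = '<div>' + REGION_LOCK_MARKER + '</div>'
plain_html = '<div>no captions here</div>'

print(error_name(region_locked_html))
print(error_name(plain_html))
```

The marker is an exact string match, so the sketch shares the fragility mentioned above: any change to YouTube's wording would send region-locked videos back to the fallback error.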
@jdepoix Interestingly enough, we could also add an age-related error class. It seems we could get around the age restriction when retrieving a transcript, since you can pull a transcript regardless of whether you are logged in or not. Doing that would require adding logic around my findings in #110. But until we have that workaround implemented, we could at least throw an appropriate error in much the same way as for the country/region lock, since the HTML to match on for that lives in the same spot and looks like this.
This would let us also sign off #111 until a workaround is implemented. |
Hi @crhowell, thanks for looking into this and sorry for the late reply! |
@jdepoix Well, it's branched logic in there based on whether or not
Basically if we split the html
Otherwise, that means if we have more than 1 index position in our list, we do try to parse the 1st index position. But in our case for these specific errors, from what I've inspected via debug
Let me include a snippet of the full function logic:

    def _extract_captions_json(self, html, video_id):
        splitted_html = html.split('"captions":')
        if len(splitted_html) <= 1:
            if video_id.startswith('http://') or video_id.startswith('https://'):
                raise InvalidVideoId(video_id)
            if 'class="g-recaptcha"' in html:
                raise TooManyRequests(video_id)
            if '"playabilityStatus":' not in html:
                raise VideoUnavailable(video_id)
            # NOTE: this is where we end up for the errors in this issue.
            raise TranscriptsDisabled(video_id)
        captions_json = json.loads(
            splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
        ).get('playerCaptionsTracklistRenderer')
        if captions_json is None:
            raise TranscriptsDisabled(video_id)
        if 'captionTracks' not in captions_json:
            raise NoTranscriptAvailable(video_id)
        return captions_json

Update |
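To make the fall-through concrete, here is a small sketch that mirrors only the branch order of the function quoted above. It is an illustration under assumptions: `error_branch` is a hypothetical helper that returns the exception class instead of raising it, the two classes are empty stand-ins, the URL and reCAPTCHA checks are left out, and the sample string is a made-up fragment (the `"UNPLAYABLE"` value in particular is assumed).

```python
class VideoUnavailable(Exception):
    """Stand-in for the library's exception of the same name."""


class TranscriptsDisabled(Exception):
    """Stand-in for the library's exception of the same name."""


def error_branch(html):
    """Return the exception class the quoted branch would raise, or None."""
    if len(html.split('"captions":')) > 1:
        return None  # captions JSON present, so the error branch is skipped
    if '"playabilityStatus":' not in html:
        return VideoUnavailable
    # Nothing inspects the playability status itself, so every remaining
    # case is reported as disabled transcripts.
    return TranscriptsDisabled


# Made-up fragment: a playabilityStatus block but no "captions" key, which
# is the situation described in this thread for region-locked videos.
region_locked_html = '{"playabilityStatus":{"status":"UNPLAYABLE"},"videoDetails":{}}'

print(error_branch(region_locked_html).__name__)
```

Because the page still contains `"playabilityStatus":`, the `VideoUnavailable` check passes and the region-locked case lands on the `TranscriptsDisabled` fallback.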
Hi @crhowell, yeah, that makes sense, but this should be solvable 😊

    if len(splitted_html) <= 1:
        if video_id.startswith('http://') or video_id.startswith('https://'):
            raise InvalidVideoId(video_id)
        if 'class="g-recaptcha"' in html:
            raise TooManyRequests(video_id)
        splitted_html = html.split('"playabilityStatus":')
        if len(splitted_html) <= 1:
            raise VideoUnavailable(video_id)
        playability_status_json = json.loads(
            splitted_html[1].split(',"WHAT_EVER_THE_NEXT_PROPERTY_IS')[0].replace('\n', '')
        )
        # ... handle playability_status_json ...
        # fallback if we don't know the status
        raise TranscriptsDisabled(video_id)

|
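One way to sidestep the `WHAT_EVER_THE_NEXT_PROPERTY_IS` placeholder is to let the JSON parser find the end of the object itself. The sketch below swaps the split-based approach for the standard library's `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and ignores the text after it. This is an alternative technique, not what the snippet above does; `extract_playability_status` is a hypothetical helper, and the sample payload is a made-up fragment whose `status` and `reason` values are assumptions.

```python
import json


def extract_playability_status(html):
    """Return the playabilityStatus object as a dict, or None if absent."""
    marker = '"playabilityStatus":'
    start = html.find(marker)
    if start == -1:
        return None
    index = start + len(marker)
    # raw_decode does not skip leading whitespace, so do that by hand.
    while index < len(html) and html[index].isspace():
        index += 1
    # Parse exactly one JSON value; whatever follows it is left untouched,
    # so the name of the next property does not need to be known.
    status, _end = json.JSONDecoder().raw_decode(html, index)
    return status


# Made-up fragment, for illustration only.
sample_html = (
    '{"playabilityStatus":'
    '{"status":"UNPLAYABLE","reason":"Video unavailable"},'
    '"videoDetails":{"videoId":"kZsVStYdmws"}}'
)

status = extract_playability_status(sample_html)
print(status)
```

A caller could then branch on the parsed fields and fall back to `TranscriptsDisabled` for unknown statuses, as suggested above. If the text after the marker is not valid JSON, `raw_decode` raises `json.JSONDecodeError`, which a real implementation would need to handle.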
@jdepoix I can throw an initial-pass PR together for this; I have a partial solution already. I'll test it against the age/region error cases as well as the valid working cases, so we can see what kind of "reason" shows up when everything is working fine and transcripts are retrievable. I'll tag you for review on it once submitted. Update Note, this PR is a quick first pass at it. It's worth testing against more video IDs; I am sure there are some edge cases and more "status" values we might be able to add as custom errors. I did a little bit of testing. Let me know what you do or don't like and we can tweak it as necessary. I still need to add a few tests for the helpers, so coverage dropped a tiny bit. |
I'm experiencing the same problem with this library. My setup:
This suggests that the issue is indeed related to region-based restrictions or how YouTube is responding to requests from different geographic locations. The current error message ("Subtitles are disabled for this video") is misleading and made troubleshooting difficult. I'm looking forward to the implementation of a more descriptive error handling system that can differentiate between truly disabled subtitles and region-based restrictions. This would greatly improve the user experience and make it easier to diagnose and handle these issues in our applications. In the meantime, is there a recommended workaround for handling region-locked videos? Would using a proxy or VPN be a viable solution for production environments facing this issue? |
Hi @mihailmariusiondev, this is most likely not an issue of your region being blocked, but of your cloud provider's IP being blocked. Have a look at #303 to find out more about this issue. |
To Reproduce
Steps to reproduce the behavior:
What code / cli command are you executing?
A user tried extracting this video
https://www.youtube.com/watch?v=kZsVStYdmws
This video is available in only some regions (e.g. Hong Kong, Taiwan) but not for the others (e.g. United States).
Therefore, it works locally (Hong Kong), but after deployment (to a US server) it shows
Subtitles are disabled for this video
This code can reproduce that: it works when using a VPN for the HK region, but doesn't work for the US.
Which Python version are you using?
Python 3.10.8
Which version of youtube-transcript-api are you using?
youtube-transcript-api 0.6.0
Expected behavior
Describe what you expected to happen.
I think it is okay that a region where the video is disabled cannot fetch the transcript, but the exception is confusing, and I spent a while troubleshooting to understand why it happened.
Potentially, it is because it entered the
raise TranscriptsDisabled
part. Therefore, maybe adding one more exception handling case would help.
Actual behaviour
It shows
Subtitles are disabled for this video
for a region where the video is disabled, even though subtitles are enabled.