fix: Implement retry logic for YouTube transcript fetching and fix URL decoding issue #1035

Open. Wants to merge 5 commits into main.

iw4p commented Feb 18, 2025

Hi! I had a problem with the YouTube part, especially when pasting a URL into the terminal (macOS: zsh, bash; Linux: bash).
Also, the README doesn't mention YouTube at all, so I added a short note until the docs are complete.

What’s Changed:

This PR introduces several important updates to improve the reliability and functionality of the YouTube transcript fetching process and URL handling:

  1. Retry Logic for YouTube Transcript Fetching:

    • I've added a retry mechanism around the YouTube transcript fetching operation. This helps to handle intermittent failures or network issues more gracefully by retrying the operation a few times before failing.
  2. Fixed URL Decoding Issue:

    • YouTube URLs containing shell escape characters (like \? and \=) were not processed correctly, especially when pasted into a terminal. URLs are now decoded with urllib.parse.unquote(), so URLs like https://www.youtube.com/watch\?v\=videoID are handled properly.
  3. Improved Metadata and Description Extraction:

    • I’ve also improved the logic for extracting metadata and descriptions from YouTube pages. This makes the extraction process more reliable, particularly when dealing with different YouTube page layouts.
  4. Error Handling Improvements:

    • Enhanced error handling for the YouTube transcript fetching process, so the system can recover better from failures or missing data.
  5. Refactored _findKey Function:

    • The _findKey function has been refactored to simplify its code and make it more efficient by using json.items() for dictionary iteration instead of a more complex recursive method.
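For readers unfamiliar with this kind of helper, a nested-key lookup along the lines of item 5 might look like the sketch below. This is not the code from the PR: the name `find_key`, its signature, and the sample `data` are hypothetical. The sketch also assumes the lookup still has to descend into nested values, with `dict.items()` only simplifying the iteration at each level.

```python
from typing import Any, Optional


def find_key(data: Any, key: str) -> Optional[Any]:
    """Return the first value stored under `key` in nested dicts/lists (depth-first)."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                return v
            # Not a match at this level: look inside the value.
            found = find_key(v, key)
            if found is not None:
                return found
    elif isinstance(data, list):
        for item in data:
            found = find_key(item, key)
            if found is not None:
                return found
    return None


# Hypothetical sample data, not taken from a real YouTube page.
data = {"a": [{"b": {"description": "hello"}}]}
print(find_key(data, "description"))
```

One design consequence of this sketch: a key whose stored value is `None` is indistinguishable from a missing key.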

Why This Change is Needed:

  • Reliability: The retry mechanism will improve the reliability of fetching transcripts, which can fail due to network issues or API rate limiting.
  • Correct URL Processing: With the URL decoding fix, users can now paste URLs directly from the terminal without worrying about escape sequences, ensuring URLs are parsed correctly.
  • Better Metadata Handling: The improvements to metadata and description extraction will ensure that we get more accurate data from YouTube pages.
  • Resiliency: The improved error handling will help the application deal with temporary issues without failing entirely, making the process more robust.
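As an illustration of the first two points, here is a minimal sketch of URL cleanup followed by a generic retry wrapper. It is not the code from this PR: `normalize_pasted_url`, `with_retries`, and their parameters are hypothetical names chosen for the example. Note that `urllib.parse.unquote()` only decodes percent-encoding (such as `%3F`), so the sketch assumes the backslash escapes some shells insert have to be stripped in a separate step.

```python
import time
from urllib.parse import unquote


def normalize_pasted_url(url: str) -> str:
    """Undo percent-encoding and shell-style backslash escapes in a pasted URL."""
    url = unquote(url)  # decodes %3F, %3D, ...; leaves backslashes untouched
    # Assumption: only these three escapes matter for YouTube watch URLs.
    return url.replace("\\?", "?").replace("\\=", "=").replace("\\&", "&")


def with_retries(func, *args, retries: int = 3, delay: float = 2.0, **kwargs):
    """Call func(*args, **kwargs), making up to `retries` attempts in total."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
            if attempt < retries:
                time.sleep(delay)  # fixed pause between attempts
    raise last_error


print(normalize_pasted_url("https://www.youtube.com/watch\\?v\\=videoID"))
```

A real implementation would probably catch only the exceptions the transcript library raises for transient failures, and might back off exponentially rather than sleeping for a fixed delay.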

iw4p commented Feb 18, 2025

@microsoft-github-policy-service agree
