Metadata tasks created for videos that won't be downloaded #156

Closed
DCKcode opened this issue Sep 19, 2021 · 4 comments · Fixed by #787

Comments

@DCKcode

DCKcode commented Sep 19, 2021

Thanks for making this app! I'm really loving this concept and hope to see this evolve further.

I was setting up TubeSync on my home server and adding a bunch of channels to it. (Okay, I enthusiastically added over 25 channels to it.) I'd be interested in using TubeSync to keep track of favorite YT channels from now on, so I tried configuring TubeSync to not download any old videos. I don't want a sync of the entire channel - just a sync of the channel from now on. For every channel I added, I set a download cap of "1 week".

While no videos older than one week were downloaded, the task queue was still completely flooded with tasks to fetch metadata. Since I added some channels with thousands of videos, the backlog filled up with metadata fetches that weren't necessary.

To reproduce:

Expected behavior:

  • Tasks are only added for videos published within the last week

Actual behavior:

  • Tasks are added for all videos in a channel's history
  • Logs indicate TubeSync explores a channel's entire history as if no download cap is set; it just stops at downloading videos older than one week when it gets to the download task
  • This results in hundreds or thousands more tasks than necessary
@meeb
Owner

meeb commented Sep 20, 2021

Thanks for the detailed issue description. This is a byproduct of the way TubeSync indexes channels, which is to do a "fast" (or flat) scan of the channel just to get video IDs. That scan doesn't return much beyond the video ID, title, playlist or channel name and a few other details. The fast/flat scan is the check TubeSync runs on a timer to detect new videos. Fast scans, unfortunately, don't contain the publish date of the video. As there's no information on the publish date, tasks are then scheduled to fetch the video metadata, which does contain the publish date.

Obviously, as you've reported, this creates a metadata task for every video, even ones that will be skipped, because TubeSync doesn't know it can skip them until it downloads the metadata for the video.
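
To make the behaviour described above concrete, here is a minimal Python sketch. It is not TubeSync's actual code: `FlatEntry`, `plan_metadata_tasks` and `within_cap` are hypothetical names used only for illustration. It shows why a scan that returns no publish date forces one metadata task per video before any download cap can be applied.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class FlatEntry:
    """Hypothetical stand-in for one result of a fast/flat scan."""
    video_id: str
    title: str
    published: Optional[date] = None  # unknown until metadata is fetched


def plan_metadata_tasks(entries: List[FlatEntry]) -> List[str]:
    """Return the IDs that need a metadata fetch: every entry with no date."""
    return [e.video_id for e in entries if e.published is None]


def within_cap(entry: FlatEntry, today: date, cap_days: int) -> bool:
    """The download cap can only be checked once the publish date is known."""
    if entry.published is None:
        raise ValueError("publish date unknown; metadata must be fetched first")
    return entry.published >= today - timedelta(days=cap_days)


if __name__ == "__main__":
    # A channel with 1000 videos, none of which carry a publish date yet.
    channel = [FlatEntry(f"vid{i}", f"Video {i}") for i in range(1000)]
    print(len(plan_metadata_tasks(channel)), "metadata tasks scheduled")
```

In this sketch the cap filter cannot run until each entry has a date, so the number of metadata tasks scales with the size of the channel rather than with the number of recent videos.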

I'm looking into whether there are any "limit by publish age" flags you could manually set for playlists (for example, here's a playlist URL but only show me anything newer than 14 days), but I've not found anything reliable or suitable yet.

The alternative is to have a "use RSS feed" option for channels and then only download new media as it's released, ignoring all older media. However, that wouldn't obey the current download cap age either.

From my own usage it does tend to sort itself out after a few hours, although of course the larger the channels you add the greater the potential problem might be.

Work is ongoing to find a sensible solution here.

@DCKcode
Author

DCKcode commented Sep 27, 2021

TubeSync never managed to sort itself out for the roughly 30 channels I added. The RSS solution seems like a great way to go about this - even as an additional data source just to quickly determine what the latest videos are. I've managed to set up a scheduled shell script that replicates most of what I want out of TubeSync by just processing the RSS feeds for channels.
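
As a rough illustration of the RSS approach (this is not the shell script mentioned above, which was not posted), the Python sketch below parses a channel feed and keeps only recent video IDs. It assumes the Atom layout YouTube serves at `https://www.youtube.com/feeds/videos.xml?channel_id=...`, with a `yt:videoId` and a `published` element per entry. `recent_video_ids` and `SAMPLE_FEED` are hypothetical names, and the sample data is made up. The feed only lists a limited number of recent uploads, so it suits a "from now on" sync rather than a full channel sync.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import List

# XML namespaces assumed to be used by the channel feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}

# Made-up sample data; a real script would download the feed instead.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>newvideo001</yt:videoId>
    <published>2021-09-25T12:00:00+00:00</published>
  </entry>
  <entry>
    <yt:videoId>oldvideo002</yt:videoId>
    <published>2021-08-01T12:00:00+00:00</published>
  </entry>
</feed>"""


def recent_video_ids(feed_xml: str, now: datetime, max_age: timedelta) -> List[str]:
    """Return IDs of feed entries published within max_age of now."""
    cutoff = now - max_age
    recent = []
    for entry in ET.fromstring(feed_xml).findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", namespaces=NS)
        published = entry.findtext("atom:published", namespaces=NS)
        if not video_id or not published:
            continue  # skip malformed entries rather than guessing
        if datetime.fromisoformat(published) >= cutoff:
            recent.append(video_id)
    return recent


if __name__ == "__main__":
    now = datetime(2021, 9, 27, tzinfo=timezone.utc)
    print(recent_video_ids(SAMPLE_FEED, now, timedelta(days=7)))
```

Because the feed carries a publish date for each entry, the age filter runs without any per-video metadata fetch.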

I understand why you've taken this approach. If this is to be analogous to being a Sonarr for YouTube, then you'd want to be able to sync entire channels.

However, the way I view YouTube, I'm definitely not looking to sync down all thousands of past videos of my subscriptions; I'm just interested in automatically downloading their latest ones. Unlike Sonarr and Radarr, YouTube subscriptions generally don't require you to go far back into the past, at least not for me. So having a solution optimized to just sync down a few recent videos of subscriptions is what I really need.

@meeb
Owner

meeb commented Sep 28, 2021

Thanks for the feedback. I am looking into the RSS feed option; it just doesn't fit that neatly into the way channels are indexed right now, so it's a bit more work than I originally thought it might be. TubeSync doesn't need to download thousands of videos, but it does need to get the metadata for thousands of videos at the moment. Ideally there would be some sort of query string for channels to only return a subset of the channel's videos by date, which would solve this. Alas, I've not found it yet, if it exists.

@tcely tcely moved this to Untriaged in Status Feb 24, 2025
@tcely tcely added this to Status Feb 24, 2025
@tcely
Contributor

tcely commented Feb 27, 2025

This has been greatly improved with recent changes.

The RSS reading is being tracked in:

More work on this can be done by passing a date to yt-dlp, which might limit how large a playlist TubeSync has to process. Something like this was suggested in #48.
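
One way to picture "passing a date to yt-dlp" is the small Python sketch below. It turns a download cap in days into a `--dateafter YYYYMMDD` argument, which is an existing yt-dlp command-line option. The helper name `dateafter_args` is hypothetical, and this is not necessarily how TubeSync wires the date through. Whether the filter actually shrinks the work depends on the upload date being available when the playlist is listed.

```python
from datetime import date, timedelta
from typing import List


def dateafter_args(today: date, cap_days: int) -> List[str]:
    """Build yt-dlp CLI arguments that reject videos older than the cap."""
    cutoff = today - timedelta(days=cap_days)
    # yt-dlp accepts absolute dates in YYYYMMDD form for --dateafter.
    return ["--dateafter", cutoff.strftime("%Y%m%d")]


if __name__ == "__main__":
    # Example: a one-week cap evaluated on a fixed date.
    print(["yt-dlp"] + dateafter_args(date(2025, 2, 27), 7))
```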

The daterange option was added in:

@tcely tcely moved this from Untriaged to Todo in Status Feb 27, 2025
@tcely tcely moved this from Todo to In Progress in Status Feb 27, 2025
@meeb meeb closed this as completed in #787 Feb 27, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Status Feb 27, 2025
Labels
None yet
Projects
Status: Done

3 participants