
Extract keyframes from a video using Katna #2

Open
dennyabrain opened this issue Sep 12, 2020 · 2 comments

@dennyabrain (Contributor)

Instead of extracting one random keyframe from a video, we should try using the Katna library to extract keyframes.
In my MVP I dropped support for Katna (implemented by @anushree-a) because it was taking prohibitively long to complete on long files. But that could have been an error on my part, so this needs further investigation.

Intended Outcome

When a video is posted to the server, it should download the video, extract x keyframes from it, and index them.
We also need to think about what happens for long files, and decide which edge cases we are willing to let go.

@duggalsu

duggalsu commented Oct 5, 2020

I checked the Katna keyframe detection workflow. Here is how it works:

  • The video is divided into equally spaced segments starting from the beginning, and keyframes are extracted from each segment.
  • The number of segments is either the number of desired keyframes given as input or a function of the number of available CPUs.
  • The number of selected keyframes depends on
    • the number of available CPUs and
    • the brightness and contrast of the selected frames.
      Hence the number and timing of keyframe output is not reproducible across videos of different lengths or across different machines.
  • The processing time itself depends on the number of available CPU cores, so timing is also not reproducible across machines.
  • Katna appears to select the method for keyframe detection automatically.
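The segmentation step described above can be sketched roughly like this (a minimal illustration of my reading of the workflow, not Katna's actual code; the function name and parameters are hypothetical, and the CPU count is passed explicitly to make the machine dependence visible):

```python
def plan_segments(duration_s: float, n_keyframes: int, n_cpus: int):
    """Divide a video of duration_s seconds into equally spaced segments,
    one per desired keyframe but capped by the number of available CPUs.
    Returns a list of (start, end) times in seconds.

    Because n_cpus varies by machine, the segment plan (and hence the
    keyframes extracted per segment) is not reproducible across machines.
    """
    n_segments = min(n_keyframes, n_cpus)
    seg_len = duration_s / n_segments
    return [(i * seg_len, (i + 1) * seg_len) for i in range(n_segments)]

# e.g. a 19 s video, 15 requested frames, 4 cores -> 4 segments of 4.75 s
segments = plan_segments(19.0, 15, 4)
```

This also makes the reproducibility problem concrete: the same video with the same requested frame count produces a different segment plan on an 8-core machine than on a 4-core one.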

It took an average of

  • 6.4 seconds for a 19-second video, requesting 15, 10, or 5 frames, on a 4-core CPU
  • 140 seconds for a 3 min 18 s video, requesting 15 frames, on a 4-core CPU

https://github.com/scikit-image/scikit-image/blob/master/doc/source/user_guide/video.rst

MoviePy invokes FFmpeg through a subprocess, pipes the decoded video from FFmpeg into RAM, and reads it out. This approach is straightforward, but it can be brittle, and it is not workable for large videos that exceed available RAM. It works on all platforms as long as FFmpeg is installed.

Since MoviePy does not link against FFmpeg's underlying libraries, it is easier to install but about half as fast.
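To see why decoding into RAM breaks down for large videos, a back-of-the-envelope calculation helps (illustrative helper, not part of any library):

```python
def decoded_size_bytes(width: int, height: int, fps: float,
                       duration_s: float, bytes_per_pixel: int = 3) -> int:
    """Approximate memory needed to hold a fully decoded raw RGB video:
    one frame is width * height * 3 bytes, times fps frames per second,
    times the duration in seconds."""
    return int(width * height * bytes_per_pixel * fps * duration_s)

# A 10-minute 1080p clip at 30 fps decodes to roughly 112 GB of raw RGB,
# far beyond typical RAM:
size = decoded_size_bytes(1920, 1080, 30, 600)
```

This is why the pipe-into-RAM approach only works for short clips or when frames are consumed and discarded as a stream.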

I am encountering a multiprocessing bug in Katna with Python 3.6, 3.7, and 3.8, where a process lock does not get released (or is not found) after extracting the first set of frames. I did not test older Python versions.

Katna does not seem to have good load balancing: a long video keeps processing in a single process, and shorter videos do not use all CPU cores equally. I need to test this further.
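For contrast, one simple way to get even load across cores is to split each video into per-segment tasks and distribute those across a fixed worker pool, instead of pinning a whole video to one process. A minimal scheduling sketch (hypothetical, not Katna's code; it only plans the assignment, without spawning processes):

```python
from itertools import cycle

def assign_segments(segments: list, n_workers: int) -> list:
    """Round-robin segments across workers so that one long video
    (many segments) does not stay pinned to a single process."""
    buckets = [[] for _ in range(n_workers)]
    for worker, seg in zip(cycle(range(n_workers)), segments):
        buckets[worker].append(seg)
    return buckets

# 10 segments over 4 workers -> bucket sizes 3, 3, 2, 2
buckets = assign_segments(list(range(10)), 4)
```

The buckets could then be handed to a `multiprocessing.Pool`, which would keep all cores busy regardless of how segments map to source videos.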

Katna depends on MoviePy, which does not work with newer Python versions and does not execute consistently even with Python 3.6.

Unless these issues are fixed, Katna does not seem stable for production use.

@dennyabrain (Contributor, Author)

Thank you @duggalsu for evaluating this. If we take a step back and re-evaluate, our main problem statement is to extract keyframes from a video. I am guessing it's unlikely we'll find one library that takes care of all use cases, so maybe a good approach is to enumerate the use cases and implement a separate strategy for each?

This is not an exhaustive list, but we did a similar enumeration of use cases for another project, and some parameters we identified were video length, file size, frame rate, frame resolution, and format (AVI, MP4, MOV, FLV). These are the things that could affect which mechanism we use to extract frames. What makes this slightly more complex is that it's never just one parameter that decides the algorithm: for instance, a short video with high frame resolution might need as much memory as a long video with low frame resolution.

So the solution we decided on was to start by supporting a limited set of use cases and respond with a NOT_SUPPORTED message for ALL use cases that our system doesn't handle.

With that in mind, our primary use cases are

  1. amateur videos taken on smartphones and uploaded to social media (short length, 10-60 seconds; medium-high resolution)
  2. well-produced vlogs and YouTube videos (5-10 minutes; medium-high resolution)

We can deprioritize very long videos and other use cases for now.
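The dispatch described above could be sketched as a small gate in front of the extraction pipeline (the function name, strategy labels, and thresholds below are all illustrative placeholders, not decided values):

```python
def choose_strategy(duration_s: float, height: int) -> str:
    """Map an incoming video to one of the two supported use cases,
    rejecting everything else with NOT_SUPPORTED.

    Thresholds are hypothetical stand-ins for the parameters discussed
    above (video length and frame resolution); file size, frame rate,
    and format checks could be added the same way."""
    if duration_s <= 60 and height >= 480:
        return "SHORT_SOCIAL_VIDEO"   # use case 1: smartphone clips
    if duration_s <= 600 and height >= 480:
        return "PRODUCED_VLOG"        # use case 2: vlogs / YouTube videos
    return "NOT_SUPPORTED"

strategy = choose_strategy(30, 720)
```

Keeping the gate as a pure function like this makes it cheap to extend with more parameters later without touching the extraction code itself.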
