Show % of file/time/similar and/or expose as metric? #83
Comments
I have vague plans of doing something along those lines. FFmpeg does actually support sending progress information in a parseable format, and Media::Convert (the part of SReview that integrates with ffmpeg) has support for parsing that information and passing it on to a callback routine. This is exercised in the test suite, so we know it works, but it is not hooked up in SReview yet.

It sounds straightforward to 'just' implement it, but every time I think about it I get lost in minutiae: you want progress information for the whole script, not for a single ffmpeg command, and there can be dozens of those in a single script. Should I weight the progress of a single command against its expected runtime, or maybe not? Maybe I should just enumerate all the commands, but then future changes become a pain. Maybe I should create a separate object to keep track of what needs to be done, so it's automatic; but what needs to be done is dynamic, based on the results of some of the commands, so you can't know it all beforehand, which means that object needs loops and conditionals, and... OMG, it's becoming a DSL; maybe I should just stop. But yeah, eventually I'll get over myself and just do it. Patches welcome, I guess 😉
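For reference, here is a minimal sketch (in Python rather than the Perl SReview is written in) of the ffmpeg side of this: ffmpeg's `-progress` option writes key=value status lines to a pipe, with each update block terminated by a `progress=` line. The command and callback below are placeholders, not SReview code.

```python
# Minimal sketch: run ffmpeg with "-progress pipe:1" and hand each parsed
# update block to a callback, similar in spirit to what Media::Convert
# does on the Perl side.
import subprocess

def transcode_with_progress(cmd, on_progress):
    # -progress and -nostats are global options, so insert them right
    # after the program name.
    full_cmd = cmd[:1] + ["-progress", "pipe:1", "-nostats"] + cmd[1:]
    proc = subprocess.Popen(full_cmd, stdout=subprocess.PIPE, text=True)
    update = {}
    for line in proc.stdout:
        key, _, value = line.strip().partition("=")
        update[key] = value
        # ffmpeg ends every block of status lines with "progress=continue",
        # or "progress=end" for the final block.
        if key == "progress":
            on_progress(dict(update))
            update.clear()
    return proc.wait()

# Hypothetical usage: print the output timestamp and encoding speed
# for each progress update.
transcode_with_progress(
    ["ffmpeg", "-y", "-i", "input.mkv", "output.webm"],
    lambda u: print(u.get("out_time"), u.get("speed")),
)
```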
Also, I don't want to stop at progress information; I also want to implement better error handling, so that if an ffmpeg command aborts, we update the state of the talk to a failed state and know that it happened. And while we're at it, we might want to capture stderr and stdout too and put them in the database, so we can keep track of things more easily. Etc. This all gets very hairy and needs a lot of design work.
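A rough sketch of that error-handling idea, with `store_logs` and `mark_talk_failed` as made-up placeholders for whatever the SReview database layer would actually provide:

```python
import subprocess

def run_step(talk_id, cmd, db):
    # Capture both output streams so they can be stored for later debugging.
    result = subprocess.run(cmd, capture_output=True, text=True)
    db.store_logs(talk_id, cmd, result.stdout, result.stderr)  # assumed API
    if result.returncode != 0:
        # Flip the talk into a failed state instead of silently moving on.
        db.mark_talk_failed(talk_id)  # assumed API
        raise RuntimeError(f"step failed for talk {talk_id}: {cmd[0]}")
```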
I'd advise doing the most stupid thing that works. Per step in a job, export:
* the time taken by each past step
* the time spent in the current step so far
* an estimate for the future steps, NaN if unknown
We can already do decent math on that in Prometheus. Errors and everything else: shove it into Loki. We can't see it in the UI then, but we can detect it, alert on it, and show a dashboard of what still needs to be done. In other words: get the data out into systems designed to handle such cases, and you don't have to build the logic yourself. (A rough sketch of the metrics side follows below.)
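For concreteness, a sketch of what exporting those per-step timings could look like with the Python prometheus_client library; the metric names and labels are invented for illustration, not an existing SReview interface:

```python
import math
import time
from prometheus_client import Gauge, start_http_server

# Invented metric names; adjust to taste.
step_seconds = Gauge(
    "sreview_step_seconds", "Wall-clock time per job step",
    ["talk", "step"],
)
estimated_remaining = Gauge(
    "sreview_estimated_remaining_seconds",
    "Estimated time for the remaining steps; NaN if unknown",
    ["talk"],
)

start_http_server(9123)  # scrape endpoint; port chosen arbitrarily

def run_steps(talk, steps):
    # We don't know the remaining time up front, so export NaN as suggested.
    estimated_remaining.labels(talk=talk).set(math.nan)
    for name, step_fn in steps:
        start = time.monotonic()
        step_fn()
        step_seconds.labels(talk=talk, step=name).set(time.monotonic() - start)
```

Prometheus can then do the aggregation (per-step averages, predicted totals) with ordinary PromQL, which is the point of getting the data out of SReview.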
Can we transcode quicker? That would remove the need for this metric.
Sure. Transcoding is always a tradeoff between transcode quality (reflected in file size and number of artifacts) and transcode time: transcoding for longer gets you better results (smaller files, fewer artifacts). I think the current settings are reasonable, but we can definitely revisit them. The "vmaf" link that was provided in a different issue should also be useful; we don't use that currently, and we probably should, but I need to figure out the best way to do so first.
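(For the record, one way to get a VMAF score, assuming an ffmpeg build with libvmaf enabled; the file names are placeholders:)

```python
import subprocess

# libvmaf takes the distorted file as the first input and the reference
# as the second; the pooled score is logged to stderr.
result = subprocess.run(
    ["ffmpeg", "-i", "transcoded.webm", "-i", "original.mkv",
     "-lavfi", "libvmaf", "-f", "null", "-"],
    capture_output=True, text=True,
)
print(next(l for l in result.stderr.splitlines() if "VMAF score" in l))
```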
No, it wouldn't :)
To expand on this a bit: SReview is optimized to maximize throughput, not the speed of a single encode. This is also why I prefer to have a queue slot per CPU, rather than only going up to a load of 3/4ths, as we do now (the difference is very clear in Grafana, fwiw). With the Hetzner nodes, we had about 3 times the CPUs that we have now, and with that we managed to transcode somewhat over 300 videos in roughly 16 hours. So with a third of the CPUs, we should be able to transcode roughly 100 videos in 16 hours with what we have left, which I think is more than plenty.
You are right. I do want to have the throughput discussion, but I'll open another issue for that.
If we get any progress information from e.g. ffmpeg, it might be nice to expose it as a metric. That way we could predict runtimes, which would help with e.g. downscaling the cluster.