Show % of file/time/similar and/or expose as metric? #83
Comments
I have vague plans of doing something along those lines. FFmpeg does actually support sending progress information in a parseable format, and Media::Convert (the part of SReview that integrates with ffmpeg) has support for parsing that information and passing it on to a callback routine. This is exercised in the test suite, so we know it works, but it is not hooked up in SReview yet.

It sounds straightforward to 'just' implement it, but every time I think about it I get lost in minutiae: you want progress information for the whole script, not for a single ffmpeg command, and there can be dozens of those in a single script. Should I weight the progress of a single command against its expected runtime, or maybe not? Maybe I should just enumerate all the commands, but then future changes become a pain. Maybe I should create a separate object to keep track of what needs to be done, so it's automatic; but what needs to be done is dynamic, based on the results of some of the commands, so you can't know it all beforehand, which means that object needs loops and conditionals, and... OMG, it's becoming a DSL; maybe I should just stop. But yeah, eventually I'll get over myself and just do it. Patches welcome, I guess 😉
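For reference, here is a minimal sketch (in Python rather than the Perl SReview is written in) of the ffmpeg side of this: ffmpeg's `-progress` option writes key=value status lines to a pipe, with each update block terminated by a `progress=` line. The command and callback below are placeholders, not SReview code.

```python
# Minimal sketch: run ffmpeg with "-progress pipe:1" and hand each parsed
# update block to a callback, similar in spirit to what Media::Convert
# does on the Perl side.
import subprocess

def transcode_with_progress(cmd, on_progress):
    # -progress and -nostats are global options, so insert them right
    # after the program name.
    full_cmd = cmd[:1] + ["-progress", "pipe:1", "-nostats"] + cmd[1:]
    proc = subprocess.Popen(full_cmd, stdout=subprocess.PIPE, text=True)
    update = {}
    for line in proc.stdout:
        key, _, value = line.strip().partition("=")
        update[key] = value
        # ffmpeg ends every block of status lines with "progress=continue",
        # or "progress=end" for the final block.
        if key == "progress":
            on_progress(dict(update))
            update.clear()
    return proc.wait()

# Hypothetical usage: print the output timestamp and encoding speed
# for each progress update.
transcode_with_progress(
    ["ffmpeg", "-y", "-i", "input.mkv", "output.webm"],
    lambda u: print(u.get("out_time"), u.get("speed")),
)
```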
Also, I don't want to stop at progress information; I also want to implement better error handling, so that if an ffmpeg command aborts, we update the state of the talk to a failed state and know that it happened. And while we're at it, we might want to capture stderr and stdout too and put them in the database, so we can keep track of things more easily. Etc. This all gets very hairy and needs a lot of design work.
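A rough sketch of that error-handling idea, with `store_logs` and `mark_talk_failed` as made-up placeholders for whatever the SReview database layer would actually provide:

```python
import subprocess

def run_step(talk_id, cmd, db):
    # Capture both output streams so they can be stored for later debugging.
    result = subprocess.run(cmd, capture_output=True, text=True)
    db.store_logs(talk_id, cmd, result.stdout, result.stderr)  # assumed API
    if result.returncode != 0:
        # Flip the talk into a failed state instead of silently moving on.
        db.mark_talk_failed(talk_id)  # assumed API
        raise RuntimeError(f"step failed for talk {talk_id}: {cmd[0]}")
```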
I'd advise doing the most stupid thing that works. Per step in a job, export:
* the time taken by each past step
* the time spent in the current step so far
* an estimate for the future steps, NaN if unknown
We can already do decent math on that in Prometheus. Errors and everything else: shove it into Loki. We can't see it in the UI then, but we can detect it, alert on it, and show a dashboard of what still needs to be done. In other words: get the data out into systems designed to handle such cases, and you don't have to build the logic yourself. (A rough sketch of the metrics side follows below.)
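For concreteness, a sketch of what exporting those per-step timings could look like with the Python prometheus_client library; the metric names and labels are invented for illustration, not an existing SReview interface:

```python
import math
import time
from prometheus_client import Gauge, start_http_server

# Invented metric names; adjust to taste.
step_seconds = Gauge(
    "sreview_step_seconds", "Wall-clock time per job step",
    ["talk", "step"],
)
estimated_remaining = Gauge(
    "sreview_estimated_remaining_seconds",
    "Estimated time for the remaining steps; NaN if unknown",
    ["talk"],
)

start_http_server(9123)  # scrape endpoint; port chosen arbitrarily

def run_steps(talk, steps):
    # We don't know the remaining time up front, so export NaN as suggested.
    estimated_remaining.labels(talk=talk).set(math.nan)
    for name, step_fn in steps:
        start = time.monotonic()
        step_fn()
        step_seconds.labels(talk=talk, step=name).set(time.monotonic() - start)
```

Prometheus can then do the aggregation (per-step averages, predicted totals) with ordinary PromQL, which is the point of getting the data out of SReview.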
Can we transcode quicker? That would remove the need for this metric.
Sure. Transcoding is always a tradeoff between transcode quality (reflected in file size and number of artifacts) and transcode time: transcoding for longer gets you better results (smaller files, fewer artifacts). I think the current settings are reasonable, but we can definitely revisit them. The "vmaf" link that was provided in a different issue should also be useful; we don't use that currently, and we probably should, but I need to figure out the best way to do so first.
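(For the record, one way to get a VMAF score, assuming an ffmpeg build with libvmaf enabled; the file names are placeholders:)

```python
import subprocess

# libvmaf takes the distorted file as the first input and the reference
# as the second; the pooled score is logged to stderr.
result = subprocess.run(
    ["ffmpeg", "-i", "transcoded.webm", "-i", "original.mkv",
     "-lavfi", "libvmaf", "-f", "null", "-"],
    capture_output=True, text=True,
)
print(next(l for l in result.stderr.splitlines() if "VMAF score" in l))
```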
No, it wouldn't :)
To expand on this a bit: SReview is optimized to maximize throughput, not the speed of a single encode. This is also why I prefer to have a queue slot per CPU, rather than only going up to a load of 3/4ths, as we do now (the difference is very clear in Grafana, fwiw). With the Hetzner nodes, we had about 3 times the CPUs that we have now, and with that we managed to transcode somewhat over 300 videos in roughly 16 hours. So with a third of the CPUs, we should be able to transcode roughly 100 videos in 16 hours with what we have left, which I think is more than plenty.
You are right. I do want to have the throughput discussion, but I'll open another issue for that.
If we get any progress information from e.g. ffmpeg, it might be nice to expose it as a metric. That way we could predict runtimes, which would help with e.g. downscaling the cluster.