
Sanitize videos with Dangerzone #1030

Open
apyrgio opened this issue Dec 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

apyrgio commented Dec 13, 2024

Proposed representations

Sanitizing a video means representing the video and audio streams in the simplest possible format, with no metadata or headers.

For videos, that would be a sequence of images. In FFmpeg terms, this is called rawvideo, and is very similar to the way we handle RGB pixel streams right now. In order to decode it, we'd need to specify the frame rate, pixel format (rgb24 is the simplest one), and video size (width x height). In Dangerzone, we standardize on RGB image formats, and can already pass the width/height of each page.
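To get a feel for what rawvideo means in storage terms: rgb24 stores 3 bytes per pixel, so the raw frame size is simply width × height × 3. Using the 960x540 @ 25 fps sample from the commands below:

```shell
# rgb24 stores 3 bytes per pixel, so one raw frame is width * height * 3 bytes.
# The numbers assume the 960x540 @ 25 fps sample used in this issue.
echo $((960 * 540 * 3))       # bytes per frame: 1555200 (~1.5 MiB)
echo $((960 * 540 * 3 * 25))  # bytes per second: 38880000 (~37 MiB/s)
```

This is why raw intermediate files blow up so quickly compared to the compressed input.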

For audio, that would be a sequence of audio samples. For mono streams, this sequence is linear, whereas for stereo streams, it alternates between the left and right channels. In FFmpeg terms, this is represented by the pcm_s16le audio codec by default.
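The raw audio data rate is just as easy to reason about: pcm_s16le stores 2 bytes per sample, and stereo interleaves 2 channels. At the 44.1 kHz rate used in the commands below:

```shell
# pcm_s16le: 2 bytes per sample, 2 channels, 44100 samples per second.
echo $((44100 * 2 * 2))  # bytes per second of raw stereo audio: 176400
```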

As for re-encoding the video, it seems that the best way to move forward is to use the H.265 codec and the MP4 container format.

Commands

Sample file (240KiB)

Get video info (width, height, frame rate) (runs in sandbox)

$ ffprobe -v error -select_streams v:0 -show_entries stream=width,height,r_frame_rate -of default=noprint_wrappers=1 sample.mp4
width=960
height=540
r_frame_rate=25/1
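Since ffprobe emits key=value pairs with this output format, the values can be captured into shell variables for the later encode step. A minimal sketch, run here against the sample output above rather than a live ffprobe invocation:

```shell
# Sketch: turn ffprobe's key=value lines into shell variables.
# The probe output is inlined for illustration; in practice it would come
# from the `ffprobe ... sample.mp4` command shown above.
probe_output='width=960
height=540
r_frame_rate=25/1'
eval "$probe_output"
echo "${width}x${height} @ ${r_frame_rate} fps"  # 960x540 @ 25/1 fps
```

Note that `eval` on untrusted ffprobe output would be unsafe inside the sandbox boundary; a real implementation should parse the lines explicitly instead.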

Decode video to raw format (runs in sandbox)

ffmpeg -i sample.mp4 -c:v rawvideo -pix_fmt rgb24 out_video.rgb

Note

For the sample clip I posted above, the raw video took 55MiB of storage space.
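That figure is consistent with back-of-the-envelope arithmetic: at 960x540, each rgb24 frame is 1,555,200 bytes (~1.5 MiB), so 55 MiB corresponds to roughly 37 frames, i.e. about a second and a half of footage at 25 fps (approximate, since the clip's exact frame count isn't shown here):

```shell
# Approximate frame count implied by 55 MiB of rgb24 data at 960x540.
echo $((55 * 1024 * 1024 / (960 * 540 * 3)))  # ~37 frames (~1.5 s at 25 fps)
```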

Decode audio to raw format (runs in sandbox)

ffmpeg -i sample.mp4 -ar 44100 -ac 2 -c:a pcm_s16le -f s16le out_audio.raw

Encode raw audio and video streams to H.265

ffmpeg -f s16le -ar 44100 -ac 2 -i out_audio.raw -f rawvideo -pix_fmt rgb24 -s 960x540 -r 25 -i out_video.rgb -c:v libx265 -c:a aac out.mp4

Note

The output file is 337K. So, similarly to images, it seems that the video gets inflated as well. We can compress it further by tweaking the crf, pix_fmt, and preset values, but that can be done at a later stage.
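For reference, that later tweaking could look something like the following. This is a sketch only: the crf/preset values are illustrative and not tuned, and yuv420p (chroma-subsampled) is an assumption based on what most players expect, not something settled in this issue.

```shell
# Sketch: same encode as above, with illustrative compression knobs.
# -crf 28 trades quality for size, -preset slow spends more CPU time for a
# smaller file, and -pix_fmt yuv420p subsamples chroma for compatibility.
ffmpeg -f s16le -ar 44100 -ac 2 -i out_audio.raw \
       -f rawvideo -pix_fmt rgb24 -s 960x540 -r 25 -i out_video.rgb \
       -c:v libx265 -crf 28 -preset slow -pix_fmt yuv420p -c:a aac out.mp4
```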

@apyrgio apyrgio added the enhancement New feature or request label Dec 13, 2024
@apyrgio apyrgio changed the title Sanitized videos with Dangerzone Sanitize videos with Dangerzone Dec 13, 2024
apyrgio commented Dec 13, 2024

The above is a way to perform the sanitization manually, using intermediate files. We don't want that, as these files can get very large, very quickly. In our case, we can take advantage of the stdout and stderr of the Dangerzone container, and pipe the video and audio streams through them.

Here's a proof-of-concept using named pipes (Linux only) to show that this is indeed possible:

#!/bin/bash

set -ex

# Clean up any leftovers from a previous run.
rm -f video_pipe audio_pipe out.mp4

# Create one named pipe per stream, so the raw data never hits the disk.
mkfifo video_pipe
mkfifo audio_pipe

# Decode the audio and video streams to their raw representations in the
# background, writing each one to its pipe.
ffmpeg -i sample.mp4 -ar 44100 -ac 2 -c:a pcm_s16le -f s16le pipe:1 > audio_pipe &
ffmpeg -i sample.mp4 -c:v rawvideo -pix_fmt rgb24 -f rawvideo pipe:1 > video_pipe &

# Re-encode both raw streams from the pipes into the final MP4.
ffmpeg -f s16le -ar 44100 -ac 2 -i audio_pipe -f rawvideo -pix_fmt rgb24 -s 960x540 -r 25 -i video_pipe -c:v libx265 -c:a aac out.mp4

legoktm commented Dec 13, 2024

As for re-encoding the video, it seems that the best way to move forward is to use the H.265 codec, and the MP4 container format.

I would avoid H.265 (aka HEVC) for now; it still doesn't have wide platform support, most notably no Firefox support; see https://caniuse.com/?search=h265. I also don't think it provides any real advantage for the DZ use case over H.264, which has much broader support.

But is there a specific reason to go with mp4 in the first place? It's still a patent-encumbered format until theoretically 2030 (per Wikipedia).

I think we should also consider the VP8/VP9 codecs with a WebM container. Is allowing people to pick between the two an option? Or using whatever the input format was (assuming ffmpeg supports encoding it)?
