
Sanitize videos with Dangerzone #1030

Open
apyrgio opened this issue Dec 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

apyrgio commented Dec 13, 2024

Proposed representations

Sanitizing a video means representing the video and audio streams in the simplest possible format, with no metadata or headers.

For videos, that would be a sequence of images. In FFmpeg terms, this is called rawvideo, and is very similar to the way we handle RGB pixel streams right now. In order to decode it, we'd need to specify the frame rate, pixel format (rgb24 is the simplest one), and video size (width x height). In Dangerzone, we standardize on RGB image formats, and can already pass the width/height of each page.
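To get a feel for what rawvideo means in storage terms: rgb24 stores 3 bytes per pixel, so the raw frame size is simply width × height × 3. Using the 960x540 @ 25 fps sample from the commands below:

```shell
# rgb24 stores 3 bytes per pixel, so one raw frame is width * height * 3 bytes.
# The numbers assume the 960x540 @ 25 fps sample used in this issue.
echo $((960 * 540 * 3))       # bytes per frame: 1555200 (~1.5 MiB)
echo $((960 * 540 * 3 * 25))  # bytes per second: 38880000 (~37 MiB/s)
```

This is why raw intermediate files blow up so quickly compared to the compressed input.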

For audio, that would be a sequence of audio samples. For mono streams, this sequence is linear, whereas for stereo streams, it alternates between the left and right channels. In FFmpeg terms, this is represented by the pcm_s16le audio codec by default.
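The raw audio data rate is just as easy to reason about: pcm_s16le stores 2 bytes per sample, and stereo interleaves 2 channels. At the 44.1 kHz rate used in the commands below:

```shell
# pcm_s16le: 2 bytes per sample, 2 channels, 44100 samples per second.
echo $((44100 * 2 * 2))  # bytes per second of raw stereo audio: 176400
```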

As for re-encoding the video, it seems that the best way to move forward is to use the H.265 codec and the MP4 container format.

Commands

Sample file (240KiB)

Get video info (width, height, frame rate) (runs in sandbox)

$ ffprobe -v error -select_streams v:0 -show_entries stream=width,height,r_frame_rate -of default=noprint_wrappers=1 sample.mp4
width=960
height=540
r_frame_rate=25/1
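Since ffprobe emits key=value pairs with this output format, the values can be captured into shell variables for the later encode step. A minimal sketch, run here against the sample output above rather than a live ffprobe invocation:

```shell
# Sketch: turn ffprobe's key=value lines into shell variables.
# The probe output is inlined for illustration; in practice it would come
# from the `ffprobe ... sample.mp4` command shown above.
probe_output='width=960
height=540
r_frame_rate=25/1'
eval "$probe_output"
echo "${width}x${height} @ ${r_frame_rate} fps"  # 960x540 @ 25/1 fps
```

Note that `eval` on untrusted ffprobe output would be unsafe inside the sandbox boundary; a real implementation should parse the lines explicitly instead.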

Decode video to raw format (runs in sandbox)

ffmpeg -i sample.mp4 -c:v rawvideo -pix_fmt rgb24 out_video.rgb

Note

For the sample clip I posted above, the raw video took 55MiB of storage space.
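That figure is consistent with back-of-the-envelope arithmetic: at 960x540, each rgb24 frame is 1,555,200 bytes (~1.5 MiB), so 55 MiB corresponds to roughly 37 frames, i.e. about a second and a half of footage at 25 fps (approximate, since the clip's exact frame count isn't shown here):

```shell
# Approximate frame count implied by 55 MiB of rgb24 data at 960x540.
echo $((55 * 1024 * 1024 / (960 * 540 * 3)))  # ~37 frames (~1.5 s at 25 fps)
```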

Decode audio to raw format (runs in sandbox)

ffmpeg -i sample.mp4 -ar 44100 -ac 2 -c:a pcm_s16le -f s16le out_audio.raw

Encode raw audio and video streams to H.265

ffmpeg -f s16le -ar 44100 -ac 2 -i out_audio.raw -f rawvideo -pix_fmt rgb24 -s 960x540 -r 25 -i out_video.rgb -c:v libx265 -c:a aac out.mp4

Note

The output file is 337K. So, similarly to images, it seems that the video gets inflated as well. We can compress it further by tweaking the crf, pix_fmt, and preset values, but that can be done at a later stage.
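For reference, that later tweaking could look something like the following. This is a sketch only: the crf/preset values are illustrative and not tuned, and yuv420p (chroma-subsampled) is an assumption based on what most players expect, not something settled in this issue.

```shell
# Sketch: same encode as above, with illustrative compression knobs.
# -crf 28 trades quality for size, -preset slow spends more CPU time for a
# smaller file, and -pix_fmt yuv420p subsamples chroma for compatibility.
ffmpeg -f s16le -ar 44100 -ac 2 -i out_audio.raw \
       -f rawvideo -pix_fmt rgb24 -s 960x540 -r 25 -i out_video.rgb \
       -c:v libx265 -crf 28 -preset slow -pix_fmt yuv420p -c:a aac out.mp4
```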

@apyrgio apyrgio added the enhancement New feature or request label Dec 13, 2024
@apyrgio apyrgio changed the title Sanitized videos with Dangerzone Sanitize videos with Dangerzone Dec 13, 2024
apyrgio commented Dec 13, 2024

The above is a way to perform the sanitization manually, using intermediate files. We don't want that, as these files can get very large, very quickly. In our case, we can take advantage of the stdout and stderr of the Dangerzone container, and pipe the video and audio streams through them.

Here's a proof-of-concept using named pipes (Linux only) to show that this is indeed possible:

#!/bin/bash

set -ex

# Clean up any leftovers from a previous run.
rm -f video_pipe audio_pipe out.mp4

# Create one named pipe per stream, so the raw data never hits the disk.
mkfifo video_pipe
mkfifo audio_pipe

# Decode the audio and video streams to their raw representations in the
# background, writing each one to its pipe.
ffmpeg -i sample.mp4 -ar 44100 -ac 2 -c:a pcm_s16le -f s16le pipe:1 > audio_pipe &
ffmpeg -i sample.mp4 -c:v rawvideo -pix_fmt rgb24 -f rawvideo pipe:1 > video_pipe &

# Re-encode both raw streams from the pipes into the final MP4.
ffmpeg -f s16le -ar 44100 -ac 2 -i audio_pipe -f rawvideo -pix_fmt rgb24 -s 960x540 -r 25 -i video_pipe -c:v libx265 -c:a aac out.mp4

legoktm commented Dec 13, 2024

As for re-encoding the video, it seems that the best way to move forward is to use the H.265 codec, and the MP4 container format.

I would avoid H.265 (aka HEVC) for now; it still doesn't have wide platform support, most notably no Firefox support; see https://caniuse.com/?search=h265. I also don't think it provides any real advantage for the DZ use case over H.264, which has much broader support.

But is there a specific reason to go with mp4 in the first place? It's still a patent-encumbered format until theoretically 2030 (per Wikipedia).

I think we should also consider the VP8/VP9 codecs with a WebM container. Is allowing people to pick between the two an option? Or using whatever the input format was (assuming ffmpeg supports encoding it)?
