Add research.md: FFmpeg DNN call chain analysis and SR model integration guide by Copilot · Pull Request #1 · hongbolv/FFmpeg

Copilot · 2026-03-24T23:48:56Z

Comprehensive documentation tracing the full call path libavfilter → DNN Interface → OpenVINO Backend → OpenVINO C Library and summarizing steps to integrate new image super-resolution models.

What's covered

Layered architecture — 5-layer design from video filters (vf_dnn_processing.c) through dnn_filter_common, dnn_interface, backend implementations, down to OpenVINO C API
Complete call chain — End-to-end ASCII diagram showing every function call for an SR model inference, from ff_dnn_init() → dnn_load_model_ov() → ov_core_create() through to infer_completion_callback() → ff_proc_from_dnn_to_frame()
OpenVINO C API catalog — All 40+ ov_* functions used by the backend, grouped by category (core, model, preprocessing, inference, tensor, port/shape)
Async execution mechanism — Triple-queue architecture (task_queue / lltask_queue / request_queue) and producer-consumer pattern with SafeQueue
Key data structures — DNNModule, DNNModel, DnnContext, OVModel, OVRequestItem, TaskItem, DNNData
I/O image copy analysis — Complete inventory of all sws_scale() and av_image_copy_plane() operations across dnn_io_proc.c, vf_dnn_processing.c, and dnn_backend_openvino.c with line numbers, trigger conditions, and a typical SR copy chain diagram
Data copy necessity analysis — Classification of each copy operation as always-required (format conversion, normalization), conditionally-triggered (UV planes for YUV formats, NCHW layout conversions), or potentially-avoidable (GRAYF32 zero-copy, OpenVINO remote tensors), with per-configuration copy count comparison table
FFmpeg vs OpenVINO preprocessing comparison — Detailed analysis of which preprocessing is already delegated to OpenVINO's ov_preprocess_* API (~40%: normalization, element type conversion, layout declaration), which remains FFmpeg-only (color space conversion, UV chroma handling, packed↔planar, resize), redundancy in layout conversion (both FFmpeg manual transpose and OpenVINO pipeline), and potential optimization directions
Tile-based SR analysis — Documents that FFmpeg currently has no tile-based SR support, with code evidence, workaround solutions (external crop, input_resizable, Python overlap-fusion script), and future improvement directions
SR model integration guide — 7-step process: model conversion to IR, parameter identification, preprocessing config (scale/mean/layout), dnn_processing filter usage, optional custom pre/post-proc, build, and validation
Concrete examples — FFmpeg commands for SRCNN, EDSR, Real-ESRGAN (with full model preparation steps from PyTorch → ONNX → OpenVINO IR), SwinIR with correct options
Existing SR examples inventory — Catalog of SR resources already in the FFmpeg codebase: the dedicated vf_sr.c filter (SRCNN/ESPCN), doc/filters.texi usage examples with dnn_processing, external model training repos, and comparison between the specialized sr filter and the generic dnn_processing approach
Real-ESRGAN x4plus complete integration guide — End-to-end practice guide (§13) referencing Real-ESRGAN and OpenVINO docs, covering: RRDBNet architecture details and all model variants, environment setup, three model conversion methods (PyTorch→ONNX→IR, Python API, direct PyTorch→IR), FFmpeg call chain analysis with parameter-to-code mapping, usage commands for five scenarios (image/video/dynamic size/GPU/YUV), complete data flow diagram, performance optimization (FP16 quantization, nireq tuning, memory consumption table), anime model variant integration (x4plus_anime_6B, realesr-animevideov3), end-to-end validation with PyTorch reference comparison, and a one-shot complete workflow

Example usage from the guide

# RGB 3-channel SR model (Real-ESRGAN 4x) via OpenVINO
ffmpeg -i input.mp4 -vf \
  "format=rgb24,\
   dnn_processing=dnn_backend=openvino:\
   model=realesrgan_4x.xml:\
   input=input:output=output:\
   layout=nchw:scale=255:\
   input_resizable=1:async=1:nireq=2" \
  -y output.mp4
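The filter string above is long enough that it can be easier to assemble programmatically. The following is a minimal sketch of one way to do that; `build_dnn_filter` is a hypothetical helper written for illustration (it is not part of FFmpeg or the guide), and the option names and values are simply those shown in the command above.

```python
def build_dnn_filter(model, options):
    """Join dnn_processing options into one -vf filter string.

    Hypothetical helper for illustration; option names are taken
    verbatim from the example command, not validated against FFmpeg.
    """
    parts = ["dnn_backend=openvino", "model=" + model]
    parts += ["{}={}".format(key, value) for key, value in options.items()]
    return "format=rgb24,dnn_processing=" + ":".join(parts)


# A dict is used because "async" is a reserved word in Python and
# cannot be passed as a keyword argument.
OPTIONS = {
    "input": "input",
    "output": "output",
    "layout": "nchw",
    "scale": 255,
    "input_resizable": 1,
    "async": 1,
    "nireq": 2,
}

vf = build_dnn_filter("realesrgan_4x.xml", OPTIONS)
cmd = ["ffmpeg", "-i", "input.mp4", "-vf", vf, "-y", "output.mp4"]
print(" ".join(cmd))
```

Passing `cmd` as a list to `subprocess.run` would avoid shell quoting and line-continuation concerns altogether.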

Documentation-only change — no code modifications.


hongbolv · 2026-03-25T01:17:29Z

research.md

+
+---
+
+## 9. Asynchronous execution mechanism


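@copilot For large images, is there currently any support for tiled (block-wise) super-resolution?

FFmpeg does not currently support tile-based super-resolution. None of the DNN code contains tile, block, patch, or overlap logic; DFT_PROCESS_FRAME is defined as "process the whole frame", and activate() calls ff_dnn_execute_model() once per frame.

The input_resizable option reshapes the model's input dimensions; it does not tile. batch_size is already restricted to 1 in the OpenVINO 2.0 backend.

Added section 10, "Tile-based SR for large images", in f94cbc3. It covers the code evidence, workarounds (external tiling with FFmpeg's crop filter, processing the full frame directly with input_resizable, and a Python script that tiles with overlap blending), and possible future improvements.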
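As a rough illustration of the external tiling workaround, the sketch below computes overlapping crop boxes for a frame and maps each box to its position in the upscaled output. It is not the script from research.md; the function names, the 256-pixel tile size, and the 16-pixel overlap are assumptions chosen for the example, and the blending of overlapping regions is left out.

```python
def tile_starts(size, tile, overlap):
    """Start offsets along one axis so tiles of length `tile` cover `size`."""
    if size <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, size - tile, step))
    starts.append(size - tile)  # final tile sits flush with the far edge
    return starts


def tile_boxes(width, height, tile=256, overlap=16):
    """Return (x0, y0, x1, y1) crop boxes that cover a width x height frame."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def upscale_box(box, factor=4):
    """Map an input-space box to the matching region of the upscaled output."""
    return tuple(v * factor for v in box)


boxes = tile_boxes(600, 300)
print(len(boxes), boxes[0], boxes[-1])
# 6 (0, 0, 256, 256) (344, 44, 600, 300)
```

Because the last tile on each axis is pushed flush with the edge, it can overlap its neighbour by more than `overlap` pixels; every tile keeps the same size, which suits a model with a fixed input shape.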

hongbolv · 2026-03-25T01:18:59Z

research.md

+```
+
+#### Real-ESRGAN (RGB 3-channel, 4x upscaling)
+


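@copilot Is there an integrated Real-ESRGAN implementation?

FFmpeg has no dedicated Real-ESRGAN implementation. Searching the entire codebase for "esrgan" returns matches only in research.md.

Real-ESRGAN is loaded through the generic dnn_processing filter and follows exactly the same code path as EDSR, SwinIR, and similar models (dnn_load_model_ov() → ov_core_read_model() → generic inference flow). Users have to convert the Real-ESRGAN model to OpenVINO IR format themselves.

Expanded the Real-ESRGAN example (section 11.5) in f94cbc3 with the full model preparation steps (PyTorch → ONNX → OpenVINO IR) and usage notes.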

hongbolv · 2026-03-25T01:20:11Z

research.md

+---
+
+## 8. Data flow and I/O processing
+


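@copilot Where in the I/O processing do image copies happen?

Added section 8.5, "Image copies in I/O processing", in f94cbc3. It lists every image copy operation file by file:

dnn_io_proc.c: 8 sws_scale() calls + 2 av_image_copy_plane() calls:

ff_proc_from_frame_to_dnn(): L251 (RGB→GBRP), L273 (format conversion), L280 (GRAYF32 pass-through), L306 (Y channel to float)

ff_proc_from_dnn_to_frame(): L101 (RGB format conversion), L131 (GBRP→RGB), L140 (GRAYF32 pass-through), L166 (GRAYF32→GRAY8)

vf_dnn_processing.c: up to 5 operations:

L305 av_frame_copy_props() (frame properties, not pixels)

L233 av_image_copy_plane() ×2 (direct UV copy)

L238-244 sws_scale() ×2-3 (UV scaling, SR case)

dnn_backend_openvino.c: ov_tensor_data() is a zero-copy memory mapping; the actual copies are triggered indirectly through callbacks into the dnn_io_proc functions listed above.

A typical YUV420P SR path totals about 5 data copies (1 properties copy + 2 format conversions + 2 UV scalings).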

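@copilot Are the video data copies mandatory, or only needed under specific configurations?

Most of the data copies are architecturally required (format conversion plus normalization), but how many occur depends on the configuration:

Always required: av_frame_copy_props() (frame properties) and the UINT8↔FLOAT type conversion done by sws_scale() (the model requires float input).

Configuration-dependent:

UV copy/scaling: triggered only for planar YUV formats; skipped entirely for RGB/GRAY formats

NCHW packed↔planar: triggered only with layout=nchw; layout=nhwc saves 1-2 copies

UV scaling vs. direct copy: in SR the dimensions change, so sws_scale() is used; in non-SR cases (e.g. denoising) the dimensions stay the same, so av_image_copy_plane() is used

Configuration comparison: RGB24+NHWC needs the fewest (2 pixel copies); YUV420P+SR needs 4; RGB24+NCHW needs 4.

There is currently no zero-copy path: even the GRAYF32 pass-through case still does a memory copy with av_image_copy_plane(), and FFmpeg does not use OpenVINO remote tensors (ov_remote_tensor).

Added §8.5.5 with the detailed analysis in 90fa59a.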
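To make the NCHW packed↔planar item concrete, here is a small sketch of the rearrangement that layout=nchw implies: interleaved samples (R,G,B,R,G,B,...) are rewritten as one contiguous plane per channel, which necessarily touches every sample once. FFmpeg does this in C; the Python function below is a hypothetical illustration that keeps plain channel order and ignores details such as stride padding and the G,B,R plane order of GBRP.

```python
def packed_to_planar(pixels, width, height, channels=3):
    """Rearrange HWC-interleaved samples into CHW planes (one full copy)."""
    plane = width * height
    if len(pixels) != plane * channels:
        raise ValueError("buffer size does not match width*height*channels")
    out = [0] * len(pixels)
    for i in range(plane):            # pixel index in raster order
        for c in range(channels):     # channel within the pixel
            out[c * plane + i] = pixels[i * channels + c]
    return out


def planar_to_packed(planes, width, height, channels=3):
    """Inverse rearrangement, as needed on the output side."""
    plane = width * height
    if len(planes) != plane * channels:
        raise ValueError("buffer size does not match width*height*channels")
    out = [0] * len(planes)
    for i in range(plane):
        for c in range(channels):
            out[i * channels + c] = planes[c * plane + i]
    return out


packed = [1, 2, 3, 4, 5, 6]           # two RGB pixels: (1,2,3) and (4,5,6)
print(packed_to_planar(packed, width=2, height=1))
# [1, 4, 2, 5, 3, 6]
```

With layout=nhwc the model consumes the interleaved buffer as it is, which is why that configuration avoids this pass on both input and output.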

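@copilot OpenVINO also has a preprocessing module. Does the FFmpeg layer still need these steps?

The FFmpeg backend already uses OpenVINO's preprocessing module: normalization (scale/mean) and the layout declaration (NHWC→NCHW) are delegated to ov_preprocess_preprocess_steps_scale/mean/convert_element_type() (dnn_backend_openvino.c L612-750), which accounts for roughly 40% of the preprocessing work.

However, sws_scale() in the FFmpeg layer still cannot be dropped, for these reasons:

Color space conversion (YUV→RGB, NV12→RGB): the FFmpeg backend does not delegate color space conversion to OpenVINO's preprocessing pipeline

UV chroma plane copy/scaling: entirely outside the scope of DNN inference, so only FFmpeg can manage it

Resize: the FFmpeg backend does not call ov_preprocess_preprocess_steps_resize(); UV scaling is handled by copy_uv_planes()

There is one redundancy: layout conversion is performed twice, once in dnn_io_proc.c (manual packed→planar buffer rearrangement) and once in the OpenVINO pipeline (NHWC→NCHW declaration). This is a potential optimization target.

Added §8.5.6 with the detailed comparison in 3c0b33b.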
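The normalization that is delegated to OpenVINO is simple arithmetic. The sketch below shows what scale=255 does to 8-bit samples, assuming the mean is subtracted first and the result is then divided by the scale (the order OpenVINO documents for its mean and scale steps). The function names and the clamping on the way back to 8 bits are assumptions for illustration, not FFmpeg code.

```python
def normalize(samples, scale=255.0, mean=0.0):
    """uint8 samples -> float model input: (x - mean) / scale."""
    return [(s - mean) / scale for s in samples]


def denormalize(values, scale=255.0, mean=0.0):
    """Float model output -> uint8 samples, clamped to [0, 255]."""
    out = []
    for v in values:
        p = int(round(v * scale + mean))
        out.append(max(0, min(255, p)))  # SR outputs can overshoot slightly
    return out


print(normalize([0, 255]))                 # [0.0, 1.0]
print(denormalize([0.0, 1.0, 1.2, -0.1]))  # [0, 255, 255, 0]
```

With scale=255 and mean=0 an 8-bit frame maps to the [0, 1] range, which is why the example command in the PR description passes scale=255 for Real-ESRGAN.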


hongbolv · 2026-03-25T02:48:11Z

research.md

+| 10 | ✅ Device selection | Choose CPU/GPU/VPU according to the hardware |
+
+### 11.5 Common SR model integration examples
+


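@copilot Are there currently any examples of SR model integration in FFmpeg?

The FFmpeg codebase contains the following SR-related examples:

vf_sr.c: dedicated SR filter with built-in support for SRCNN and ESPCN (TensorFlow backend only)

doc/filters.texi L12252-12265: complete dnn_processing command examples for SRCNN/ESPCN (processing the Y channel of YUV420P)

External training repository: the documentation references https://github.com/XueweiMeng/sr/tree/sr_dnn_native, which provides model training and .pb export scripts

However, there are no bundled model files (users must prepare their own) and no FATE tests, and the existing examples cover only TensorFlow with classic models. Modern models such as Real-ESRGAN and SwinIR require dnn_processing with the OpenVINO backend.

Added §11.6 with the detailed summary in b37c641.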

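@copilot Using https://github.com/xinntao/Real-ESRGAN and https://docs.openvino.ai/ as references, implement the integration of Real-ESRGAN x4plus.

Added §13, "Real-ESRGAN x4plus complete integration guide", in caf8c2e. It contains:

Model overview (§13.1): RRDBNet architecture parameters and a comparison table of all model variants

Environment setup (§13.2): PyTorch + OpenVINO + FFmpeg build environment

Model conversion (§13.3): 3 conversion methods (ONNX→IR, Python API, direct PyTorch conversion), with complete export scripts and verification code

FFmpeg call chain analysis (§13.4): the complete call path for Real-ESRGAN through dnn_processing, with a table mapping parameters to code locations

Usage commands (§13.5): 5 scenarios (image SR, asynchronous video pipeline, dynamic size, GPU inference, YUV input)

Data flow diagram (§13.6): the complete data copy path from input frame to output frame under RGB24+NCHW

Performance optimization (§13.7): FP16 quantization, nireq tuning, memory consumption table

Model variant integration (§13.8): export and usage of x4plus_anime_6B and realesr-animevideov3

End-to-end validation (§13.9): single-frame validation plus a script comparing against PyTorch reference output

Complete one-shot workflow (§13.10): 6 commands from downloading the weights to FFmpeg output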
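For the reference comparison in §13.9, one common way to quantify how closely the FFmpeg output matches the PyTorch output is PSNR. The sketch below is not the script from research.md; it is a minimal stdlib-only illustration that assumes both frames are already available as flat sequences of 8-bit samples with identical dimensions.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


ref = [10, 20, 30, 40]
out = [11, 21, 31, 41]        # every sample off by one
print(round(psnr(ref, out), 2))
# 48.13
```

Small differences between the two pipelines are expected because of rounding and FP16 conversion, so what PSNR counts as a pass is a judgment call for the person validating.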


Copilot · 2026-03-25T03:08:13Z

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

docs.openvino.ai
- Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Initial plan

6a1f5f2

Copilot AI assigned Copilot and hongbolv Mar 24, 2026

Copilot started work on behalf of hongbolv March 24, 2026 23:49

Add comprehensive research.md documenting FFmpeg DNN module call chai…

bb136ec

…n and SR model integration guide Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/804dfbc5-4ad0-468f-9050-140dd13640bd

Copilot AI changed the title ~~[WIP] Add research documentation for FFMPEG and image SR model integration~~ Add research.md: FFmpeg DNN call chain analysis and SR model integration guide Mar 24, 2026

Copilot AI requested a review from hongbolv March 24, 2026 23:58

Copilot finished work on behalf of hongbolv March 24, 2026 23:58

hongbolv reviewed Mar 25, 2026


Copilot started work on behalf of hongbolv March 25, 2026 01:20

Copilot AI and others added 2 commits March 25, 2026 01:28

Add sections on tile-based SR analysis, I/O image copy details, and R…

f94cbc3

…eal-ESRGAN integration notes Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/f1de5a73-82a9-40cd-81e0-4fb022693840

Fix subsection numbering (10.3.x → 11.3.x) and split Python code into…

90b07c4

… separate block Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/f1de5a73-82a9-40cd-81e0-4fb022693840

Copilot AI requested a review from hongbolv March 25, 2026 01:30

Copilot finished work on behalf of hongbolv March 25, 2026 01:30

Copilot started work on behalf of hongbolv March 25, 2026 01:34

Add section 8.5.5: analysis of which data copies are necessary vs. co…

90fa59a

…nditional Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/0329e3b6-6dc6-4a1a-b799-7619a4f752f4

Copilot finished work on behalf of hongbolv March 25, 2026 01:37

Copilot started work on behalf of hongbolv March 25, 2026 01:45

Add section 8.5.6: FFmpeg vs OpenVINO preprocessing overlap analysis

3c0b33b

Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/a001f4d6-aaa8-4fc4-853c-83c66d33c429

Copilot finished work on behalf of hongbolv March 25, 2026 01:50

hongbolv reviewed Mar 25, 2026


Copilot started work on behalf of hongbolv March 25, 2026 02:48

Add section 11.6: inventory of existing SR examples in FFmpeg codebase

b37c641

Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/0042692c-58c9-4142-8b8c-ff7009cea3f4

Copilot AI requested a review from hongbolv March 25, 2026 02:55

Copilot finished work on behalf of hongbolv March 25, 2026 02:55

Copilot started work on behalf of hongbolv March 25, 2026 02:59

Add section 13: comprehensive Real-ESRGAN x4plus integration guide

caf8c2e

Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/7ba31ca1-b1b2-4ea9-8d26-b9d571bf8755

Copilot finished work on behalf of hongbolv March 25, 2026 03:08



2 participants
