Add research.md: FFmpeg DNN call chain analysis and SR model integration guide #1

Draft
Copilot wants to merge 8 commits into master from copilot/research-ffmpeg-image-sr-model

Copilot AI commented Mar 24, 2026

Comprehensive documentation tracing the full call path libavfilter → DNN Interface → OpenVINO Backend → OpenVINO C Library and summarizing steps to integrate new image super-resolution models.

What's covered

  • Layered architecture — 5-layer design from video filters (vf_dnn_processing.c) through dnn_filter_common, dnn_interface, backend implementations, down to OpenVINO C API
  • Complete call chain: End-to-end ASCII diagram showing every function call for an SR model inference, from ff_dnn_init() → dnn_load_model_ov() → ov_core_create() through to infer_completion_callback() → ff_proc_from_dnn_to_frame()
  • OpenVINO C API catalog — All 40+ ov_* functions used by the backend, grouped by category (core, model, preprocessing, inference, tensor, port/shape)
  • Async execution mechanism — Triple-queue architecture (task_queue / lltask_queue / request_queue) and producer-consumer pattern with SafeQueue
  • Key data structures: DNNModule, DNNModel, DnnContext, OVModel, OVRequestItem, TaskItem, DNNData
  • I/O image copy analysis — Complete inventory of all sws_scale() and av_image_copy_plane() operations across dnn_io_proc.c, vf_dnn_processing.c, and dnn_backend_openvino.c with line numbers, trigger conditions, and a typical SR copy chain diagram
  • Data copy necessity analysis — Classification of each copy operation as always-required (format conversion, normalization), conditionally-triggered (UV planes for YUV formats, NCHW layout conversions), or potentially-avoidable (GRAYF32 zero-copy, OpenVINO remote tensors), with per-configuration copy count comparison table
  • FFmpeg vs OpenVINO preprocessing comparison — Detailed analysis of which preprocessing is already delegated to OpenVINO's ov_preprocess_* API (~40%: normalization, element type conversion, layout declaration), which remains FFmpeg-only (color space conversion, UV chroma handling, packed↔planar, resize), redundancy in layout conversion (both FFmpeg manual transpose and OpenVINO pipeline), and potential optimization directions
  • Tile-based SR analysis — Documents that FFmpeg currently has no tile-based SR support, with code evidence, workaround solutions (external crop, input_resizable, Python overlap-fusion script), and future improvement directions
  • SR model integration guide — 7-step process: model conversion to IR, parameter identification, preprocessing config (scale/mean/layout), dnn_processing filter usage, optional custom pre/post-proc, build, and validation
  • Concrete examples — FFmpeg commands for SRCNN, EDSR, Real-ESRGAN (with full model preparation steps from PyTorch → ONNX → OpenVINO IR), SwinIR with correct options
  • Existing SR examples inventory — Catalog of SR resources already in the FFmpeg codebase: the dedicated vf_sr.c filter (SRCNN/ESPCN), doc/filters.texi usage examples with dnn_processing, external model training repos, and comparison between the specialized sr filter and the generic dnn_processing approach
  • Real-ESRGAN x4plus complete integration guide — End-to-end practice guide (§13) referencing Real-ESRGAN and OpenVINO docs, covering: RRDBNet architecture details and all model variants, environment setup, three model conversion methods (PyTorch→ONNX→IR, Python API, direct PyTorch→IR), FFmpeg call chain analysis with parameter-to-code mapping, usage commands for five scenarios (image/video/dynamic size/GPU/YUV), complete data flow diagram, performance optimization (FP16 quantization, nireq tuning, memory consumption table), anime model variant integration (x4plus_anime_6B, realesr-animevideov3), end-to-end validation with PyTorch reference comparison, and a one-shot complete workflow

Example usage from the guide

# RGB 3-channel SR model (Real-ESRGAN 4x) via OpenVINO
ffmpeg -i input.mp4 -vf \
  "format=rgb24,\
   dnn_processing=dnn_backend=openvino:\
   model=realesrgan_4x.xml:\
   input=input:output=output:\
   layout=nchw:scale=255:\
   input_resizable=1:async=1:nireq=2" \
  -y output.mp4
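
As a rough illustration of how the option string in the command above is put together, the following Python sketch assembles the same filter argument from a dictionary. The helper names `build_dnn_filter` and `build_command` are hypothetical and not part of FFmpeg. The option names and values are copied from the command above, and the sketch only builds the argument list without running anything.

```python
def build_dnn_filter(model_path, options):
    """Join dnn_processing options into one filter argument string.

    `options` is a plain dict because `async` is a reserved word in Python
    and cannot be passed as a keyword argument.
    """
    parts = ["dnn_backend=openvino", f"model={model_path}"]
    parts += [f"{key}={value}" for key, value in options.items()]
    # format=rgb24 converts frames to packed RGB before the DNN filter.
    return "format=rgb24,dnn_processing=" + ":".join(parts)


def build_command(src, dst, vf):
    """Return the ffmpeg argument list; the command is not executed here."""
    return ["ffmpeg", "-i", src, "-vf", vf, "-y", dst]


# Option values copied from the example command.
options = {
    "input": "input",
    "output": "output",
    "layout": "nchw",
    "scale": 255,
    "input_resizable": 1,
    "async": 1,
    "nireq": 2,
}
vf = build_dnn_filter("realesrgan_4x.xml", options)
command = build_command("input.mp4", "output.mp4", vf)
```

The resulting list can be handed to `subprocess.run(command)` on a machine where FFmpeg was built with OpenVINO support.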

Documentation-only change — no code modifications.


…n and SR model integration guide

Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/804dfbc5-4ad0-468f-9050-140dd13640bd
Copilot AI changed the title [WIP] Add research documentation for FFMPEG and image SR model integration Add research.md: FFmpeg DNN call chain analysis and SR model integration guide Mar 24, 2026
Copilot AI requested a review from hongbolv March 24, 2026 23:58

---

## 9. Asynchronous execution mechanism
Owner

@copilot For super resolution on large images, is there currently any support for tile-based (block-wise) super resolution?

Author

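FFmpeg currently does not support tile-based super resolution. None of the DNN-related code contains tile/block/patch/overlap logic; DFT_PROCESS_FRAME is defined as "process the whole frame", and activate() calls ff_dnn_execute_model() once per frame.

The input_resizable option reshapes the model's input dimensions; it does not tile. batch_size is already restricted to 1 in the OpenVINO 2.0 backend.

Section 10, "Tile-based SR analysis for large images", was added in f94cbc3. It contains the code evidence, workarounds (external tiling with the FFmpeg crop filter, direct processing with input_resizable, a Python script that tiles and fuses the overlaps), and potential future improvements.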
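
To make the "tile and fuse the overlaps" workaround more concrete, here is a minimal pure-Python sketch of the idea. It is not the script from research.md: the function names, the tile and overlap sizes, and the plain averaging of overlapping pixels are all assumptions chosen for illustration, and `upscale_nearest` is only a stand-in for a real SR model call.

```python
def plan_tiles(width, height, tile=256, overlap=16):
    """Return (x0, y0, x1, y1) boxes that cover the frame with overlapping tiles."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the far edge
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def upscale_nearest(patch, factor):
    """Stand-in for the SR model: nearest-neighbour upscaling of a 2D list."""
    out = []
    for row in patch:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def tiled_sr(image, factor, tile, overlap, model=upscale_nearest):
    """Run `model` tile by tile and average the results where tiles overlap."""
    height, width = len(image), len(image[0])
    acc = [[0.0] * (width * factor) for _ in range(height * factor)]
    hits = [[0] * (width * factor) for _ in range(height * factor)]
    for x0, y0, x1, y1 in plan_tiles(width, height, tile, overlap):
        patch = [row[x0:x1] for row in image[y0:y1]]
        result = model(patch, factor)
        for dy, out_row in enumerate(result):
            for dx, value in enumerate(out_row):
                acc[y0 * factor + dy][x0 * factor + dx] += value
                hits[y0 * factor + dy][x0 * factor + dx] += 1
    # Every output pixel is covered at least once, so hits is never zero.
    return [[a / h for a, h in zip(acc_row, hit_row)]
            for acc_row, hit_row in zip(acc, hits)]
```

Plain averaging keeps the sketch short; a production script would more likely use feathered (distance-weighted) blending so that seams between tiles fade gradually.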

```

#### Real-ESRGAN (RGB, 3 channels, 4x upscaling)

Owner

@copilot Is there an integrated Real-ESRGAN implementation?

Author

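FFmpeg has no dedicated Real-ESRGAN implementation. Searching the entire codebase for "esrgan" returns matches only in research.md.

Real-ESRGAN is loaded through the generic dnn_processing filter and follows exactly the same code path as EDSR, SwinIR, and similar models (dnn_load_model_ov() → ov_core_read_model() → generic inference flow). Users must convert the Real-ESRGAN model to OpenVINO IR format themselves.

The Real-ESRGAN example (Section 11.5) was extended in f94cbc3 with the complete model preparation steps (PyTorch → ONNX → OpenVINO IR) and usage notes.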

---

## 8. Data flow and I/O processing

Owner

@copilot Where in the I/O processing do image copies occur?

Author

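Section 8.5, "Image copies in I/O processing", was added in f94cbc3. It lists every image copy operation file by file:

dnn_io_proc.c: 8 sws_scale() sites + 2 av_image_copy_plane() sites

  • ff_proc_from_frame_to_dnn(): L251 (RGB→GBRP), L273 (format conversion), L280 (GRAYF32 passthrough), L306 (Y channel to float)
  • ff_proc_from_dnn_to_frame(): L101 (RGB format conversion), L131 (GBRP→RGB), L140 (GRAYF32 passthrough), L166 (GRAYF32→GRAY8)

vf_dnn_processing.c: up to 5 sites:

  • L305 av_frame_copy_props() (properties, not pixels)
  • L233 av_image_copy_plane() ×2 (direct UV copy)
  • L238-244 sws_scale() ×2-3 (UV scaling, SR case)

dnn_backend_openvino.c: ov_tensor_data() is a zero-copy memory mapping; the actual copies are triggered indirectly, through callbacks into the dnn_io_proc functions above.

A typical YUV420P SR path totals about 5 data copies (1 property copy + 2 format conversions + 2 UV scalings).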

Owner

@copilot Are the video data copies mandatory, or are they only needed under specific configurations?

Author

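Most of the data copies are architecturally required (format conversion + normalization), but the number of copies depends on the configuration:

Always required: av_frame_copy_props() (frame properties) and the UINT8↔FLOAT type conversion done by sws_scale() (the model requires float input).

Configuration-dependent:

  • UV copy/scaling: triggered only for planar YUV formats; skipped entirely for RGB/GRAY formats
  • NCHW packed↔planar: triggered only with layout=nchw; layout=nhwc saves 1-2 copies
  • UV scaling vs. direct copy: in the SR case the size changes and sws_scale() is used; in non-SR cases (such as denoising) the size stays the same and av_image_copy_plane() is used

Configuration comparison: RGB24+NHWC needs the fewest (2 pixel copies), YUV420P+SR needs 4, and RGB24+NCHW needs 4.

There is currently no zero-copy path: even the GRAYF32 passthrough case still copies memory with av_image_copy_plane(), and FFmpeg does not use OpenVINO remote tensors (ov_remote_tensor) either.

A detailed analysis was added as §8.5.5 in 90fa59a.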
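
The configuration-dependent handling of the chroma planes described in this reply can be summarized as a small decision function. This is only a Python paraphrase of the rules stated here, not a transcription of vf_dnn_processing.c. The function name `uv_plane_action` is hypothetical, and the format set is an illustrative subset assumed for the example.

```python
# Illustrative subset of planar YUV pixel formats (assumption for this sketch).
PLANAR_YUV = {"yuv420p", "yuv422p", "yuv444p"}


def uv_plane_action(pix_fmt, in_size, out_size):
    """Return which operation handles the U/V planes for one frame.

    in_size and out_size are (width, height) tuples of the input and output frame.
    """
    if pix_fmt not in PLANAR_YUV:
        # RGB and GRAY formats have no separate chroma planes to carry over.
        return "skip"
    if in_size == out_size:
        # Same-size models (for example denoising): chroma is copied as-is.
        return "av_image_copy_plane"
    # SR models change the frame size, so chroma has to be rescaled.
    return "sws_scale"
```

For example, a 4x SR run on yuv420p input maps to `sws_scale`, while the same model fed rgb24 frames skips the chroma step entirely.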

Owner

@copilot OpenVINO also has a preprocessing module. Does the FFmpeg layer still need these steps?

Author

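The FFmpeg backend already uses OpenVINO's preprocessing module: normalization (scale/mean) and the layout declaration (NHWC→NCHW) are delegated to ov_preprocess_preprocess_steps_scale/mean/convert_element_type() (dnn_backend_openvino.c L612-750), which accounts for roughly 40% of the preprocessing work.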

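But sws_scale() in the FFmpeg layer still cannot be dropped, for these reasons:

  1. Color space conversion (YUV→RGB, NV12→RGB): the FFmpeg backend does not set up any color conversion step in the OpenVINO preprocessing pipeline, so this work stays in sws_scale()
  2. UV chroma plane copy/scaling: entirely outside the scope of DNN inference, so only FFmpeg can manage it
  3. Resize: the FFmpeg backend does not call ov_preprocess_preprocess_steps_resize(); UV scaling is handled by copy_uv_planes()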

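One redundancy exists: layout conversion is performed twice, once in dnn_io_proc.c (manual packed→planar buffer rearrangement) and once in the OpenVINO pipeline (NHWC→NCHW declaration). This is a potential optimization target.

A detailed comparison was added as §8.5.6 in 3c0b33b.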
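
For readers unfamiliar with the two preprocessing steps discussed here, the sketch below shows in plain Python what a packed→planar rearrangement and a scale/mean normalization do to a pixel buffer. It is an illustration only and reflects neither the FFmpeg nor the OpenVINO implementation. The function names are hypothetical, and subtracting the mean before dividing by the scale is an assumption about the order.

```python
def hwc_to_chw(pixels, height, width, channels=3):
    """Rearrange a flat packed (HWC) buffer into planar (CHW) order."""
    if len(pixels) != height * width * channels:
        raise ValueError("buffer size does not match the given dimensions")
    return [pixels[(y * width + x) * channels + c]
            for c in range(channels)
            for y in range(height)
            for x in range(width)]


def normalize(values, scale=255.0, mean=0.0):
    """Apply (value - mean) / scale to every element (assumed order)."""
    return [(value - mean) / scale for value in values]


# A 2x2 RGB image stored packed: R0 G0 B0 R1 G1 B1 ...
packed = list(range(12))
planar = hwc_to_chw(packed, height=2, width=2)
model_input = normalize(planar, scale=255.0)
```

With scale=255, as in the example command, 8-bit pixel values end up in the range 0.0 to 1.0, which is the input range many SR models expect.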

Copilot AI and others added 2 commits March 25, 2026 01:28
…eal-ESRGAN integration notes

Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/f1de5a73-82a9-40cd-81e0-4fb022693840
… separate block

Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/f1de5a73-82a9-40cd-81e0-4fb022693840
| 10 | ✅ Device selection | Choose CPU/GPU/VPU based on the hardware |

### 11.5 Integration examples for common SR models

Owner

@copilot Does FFmpeg currently include any examples of SR model integration?

Author

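The FFmpeg codebase contains the following SR-related examples:

  1. vf_sr.c: a dedicated SR filter with built-in support for SRCNN and ESPCN (TensorFlow backend only)
  2. doc/filters.texi L12252-12265: complete SRCNN/ESPCN command examples for the dnn_processing filter (processing the Y channel of YUV420P)
  3. External training repository: the documentation references https://github.com/XueweiMeng/sr/tree/sr_dnn_native, which provides model training and .pb export scripts

No model files are bundled (users must prepare their own), there are no FATE tests, and the existing examples cover only TensorFlow and classic models. Modern models such as Real-ESRGAN and SwinIR require dnn_processing with the OpenVINO backend.

A detailed summary was added as §11.6 in b37c641.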

Owner

Author

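§13, "Real-ESRGAN x4plus complete integration guide", was added in caf8c2e. It contains:

  • Model overview (§13.1): RRDBNet architecture parameters and a comparison table of all model variants
  • Environment setup (§13.2): PyTorch + OpenVINO + FFmpeg build environment
  • Model conversion (§13.3): 3 conversion methods (ONNX→IR, Python API, direct PyTorch conversion), with complete export scripts and verification code
  • FFmpeg call chain analysis (§13.4): the complete call path of Real-ESRGAN through dnn_processing, with a table mapping parameters to code locations
  • Usage commands (§13.5): 5 scenarios covering image SR, asynchronous video pipeline, dynamic size, GPU inference, and YUV input
  • Data flow diagram (§13.6): the complete data copy path from input frame to output frame under RGB24+NCHW
  • Performance optimization (§13.7): FP16 quantization, nireq tuning, memory consumption table
  • Model variant integration (§13.8): export and usage of x4plus_anime_6B and realesr-animevideov3
  • End-to-end validation (§13.9): single-frame validation + comparison script against PyTorch reference output
  • Complete one-shot workflow (§13.10): a 6-step command sequence from downloading the weights to FFmpeg output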
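
As an illustration of what a reference comparison like the one in §13.9 could compute, the sketch below implements PSNR over two flat pixel lists in pure Python. The choice of PSNR is an assumption, since this comment does not say which metric the guide's script uses, and the function name `psnr` is hypothetical.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

In practice the two lists would hold the decoded FFmpeg output frame and the PyTorch output for the same input frame, flattened in the same channel order.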

Author

Copilot AI commented Mar 25, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • docs.openvino.ai
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:
