Add research.md: FFmpeg DNN call chain analysis and SR model integration guide #1
…n and SR model integration guide Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/804dfbc5-4ad0-468f-9050-140dd13640bd
| ## 9. Asynchronous execution mechanism |
@copilot For super resolution on large images, is there currently any support for tile-based (block-wise) super resolution?
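FFmpeg does not currently support tile-based super resolution. None of the DNN code contains tile/block/patch/overlap logic, DFT_PROCESS_FRAME is defined as "process the whole frame", and activate() calls ff_dnn_execute_model() once per frame.
The input_resizable option reshapes the model's input dimensions; it does not tile. batch_size is limited to 1 in the OpenVINO 2.0 backend.
In f94cbc3 I added a new section 10, "Tile-based SR processing analysis for large images", covering the code evidence, workarounds (external tiling with the FFmpeg crop filter, processing the full frame directly via input_resizable, a Python script that tiles and blends the overlaps), and potential future improvements.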
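The external tiling workaround described above can be sketched as follows. This is only an illustration, not code from FFmpeg or from research.md: `tile_starts`, `tile_boxes`, `crop_filter`, the 256-pixel tile size and the 16-pixel overlap are all assumptions made for the example. The only FFmpeg-specific part is the `crop=w:h:x:y` filter syntax. Blending the upscaled tiles back together is deliberately left out.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length),
    with neighbouring tiles sharing at least `overlap` pixels."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width, height, tile=256, overlap=16):
    """List of (x, y, w, h) crop rectangles covering the whole frame."""
    boxes = []
    for y in tile_starts(height, tile, overlap):
        for x in tile_starts(width, tile, overlap):
            boxes.append((x, y, min(tile, width), min(tile, height)))
    return boxes


def crop_filter(box):
    """Format one rectangle as an FFmpeg crop filter string."""
    x, y, w, h = box
    return f"crop={w}:{h}:{x}:{y}"


boxes = tile_boxes(600, 300)
filters = [crop_filter(b) for b in boxes]
```

Each string in `filters` could be placed in front of the SR filter in a separate FFmpeg invocation, one per tile. After upscaling by a factor s, a tile cropped at (x, y) belongs at (s·x, s·y) in the output, and the overlapping bands are where a blending weight would be applied.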
| #### Real-ESRGAN (RGB, 3 channels, 4x upscaling) |
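FFmpeg has no dedicated Real-ESRGAN implementation. Searching the whole codebase for "esrgan" only returns hits in research.md.
Real-ESRGAN is loaded through the generic dnn_processing filter and follows exactly the same code path as EDSR, SwinIR, and similar models (dnn_load_model_ov() → ov_core_read_model() → generic inference flow). Users must convert the Real-ESRGAN model to OpenVINO IR format themselves.
In f94cbc3 I expanded the Real-ESRGAN example (section 11.5) with the full model preparation steps (PyTorch → ONNX → OpenVINO IR) and usage notes.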
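Because the model goes through the generic filter, invoking it comes down to building a dnn_processing filter string. The helper below is a minimal sketch of that, assuming an already converted IR file; the file names and the tensor names `input` and `output` are placeholders that depend on how the model was exported, and the function names are hypothetical. Only the `dnn_backend`, `model`, `input` and `output` options of dnn_processing are used.

```python
def dnn_processing_filter(model, input_name, output_name, backend="openvino"):
    """Build a dnn_processing filter string from its basic options."""
    opts = [
        ("dnn_backend", backend),
        ("model", model),
        ("input", input_name),
        ("output", output_name),
    ]
    return "dnn_processing=" + ":".join(f"{key}={value}" for key, value in opts)


def ffmpeg_sr_command(src, dst, model, input_name, output_name):
    """Argument list for a single-image SR run on an RGB model."""
    # format=rgb24 converts the frame to packed RGB before the DNN filter.
    vf = "format=rgb24," + dnn_processing_filter(model, input_name, output_name)
    return ["ffmpeg", "-i", src, "-vf", vf, dst]


# All names below are placeholders for illustration only.
cmd = ffmpeg_sr_command("in.png", "out.png", "realesrgan.xml", "input", "output")
```

The list form can be handed to `subprocess.run(cmd, check=True)` without shell quoting. A model path containing `:` would need escaping in the filter string, which this sketch does not attempt.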
| ## 8. Data flow and I/O handling |
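In f94cbc3 I added a new section 8.5, "Image copies in I/O processing in detail", listing every image copy operation file by file:
dnn_io_proc.c: 8 sws_scale() calls + 2 av_image_copy_plane() calls:
- ff_proc_from_frame_to_dnn(): L251 (RGB→GBRP), L273 (format conversion), L280 (GRAYF32 pass-through), L306 (Y channel to float)
- ff_proc_from_dnn_to_frame(): L101 (RGB format conversion), L131 (GBRP→RGB), L140 (GRAYF32 pass-through), L166 (GRAYF32→GRAY8)
vf_dnn_processing.c: up to 5 sites:
- L305: av_frame_copy_props() (properties, not pixels)
- L233: av_image_copy_plane() ×2 (direct UV copy)
- L238-244: sws_scale() ×2-3 (UV scaling, SR case)
dnn_backend_openvino.c: ov_tensor_data() is a zero-copy memory mapping; the actual copies are triggered indirectly through callbacks into the dnn_io_proc functions above.
A typical YUV420P SR path totals about 5 data copies (1 property copy + 2 format conversions + 2 UV scalings).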
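To see why the UV planes need their own scaling step in the SR case, the sketch below computes YUV420P plane sizes before and after a 4x upscale. It is an illustration rather than FFmpeg code: the function names are hypothetical, and rounding odd dimensions up assumes the usual 4:2:0 chroma subsampling.

```python
def yuv420p_plane_sizes(width, height):
    """Return the (width, height) of the Y, U and V planes of a YUV420P frame."""
    chroma_w = -(-width // 2)   # ceil(width / 2)
    chroma_h = -(-height // 2)  # ceil(height / 2)
    return {
        "Y": (width, height),
        "U": (chroma_w, chroma_h),
        "V": (chroma_w, chroma_h),
    }


def uv_action(in_size, out_size):
    """Hypothetical helper: plain copy when sizes match, scaling otherwise."""
    return "copy" if in_size == out_size else "scale"


src = yuv420p_plane_sizes(1920, 1080)
dst = yuv420p_plane_sizes(1920 * 4, 1080 * 4)
action = uv_action(src["U"], dst["U"])
```

When only the Y plane goes through the model, the input chroma planes no longer match the output frame, so they have to be resized separately. A same-size task such as denoising would take the plain copy branch instead.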
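Most of the data copies are architecturally required (format conversion + normalization), but the number of copies depends on the configuration:
Always required: av_frame_copy_props() (frame properties) and the UINT8↔FLOAT type conversion in sws_scale() (the model requires float input).
Configuration-dependent:
- UV copy/scaling: triggered only for planar YUV formats; RGB/GRAY formats skip it entirely
- NCHW packed↔planar: triggered only by layout=nchw; layout=nhwc saves 1-2 copies
- UV scaling vs. direct copy: SR changes the frame size and goes through sws_scale(); non-SR use (e.g. denoising) keeps the size and uses av_image_copy_plane()
Configuration comparison: RGB24+NHWC needs the fewest (2 pixel copies), YUV420P+SR needs 4, and RGB24+NCHW needs 4.
There is currently no zero-copy path: even the GRAYF32 pass-through case still copies memory with av_image_copy_plane(), and FFmpeg does not use OpenVINO remote tensors (ov_remote_tensor) either.
In 90fa59a I added §8.5.5 with a detailed analysis.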
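The packed↔planar step is easier to picture with a toy buffer. The following is a conceptual sketch of the rearrangement only, not the FFmpeg implementation; the function names are made up for the example, and the planes are kept in plain channel order for simplicity.

```python
def packed_to_planar(pixels, channels=3):
    """Reorder interleaved samples (R0 G0 B0 R1 G1 B1 ...) into
    one contiguous plane per channel (R0 R1 ... G0 G1 ... B0 B1 ...)."""
    if len(pixels) % channels:
        raise ValueError("buffer length must be a multiple of the channel count")
    return [value for c in range(channels) for value in pixels[c::channels]]


def planar_to_packed(planes, channels=3):
    """Inverse of packed_to_planar: interleave the planes again."""
    if len(planes) % channels:
        raise ValueError("buffer length must be a multiple of the channel count")
    count = len(planes) // channels
    return [planes[c * count + i] for i in range(count) for c in range(channels)]


packed = [1, 2, 3, 4, 5, 6]          # two RGB pixels, NHWC-style order
planar = packed_to_planar(packed)    # NCHW-style order
restored = planar_to_packed(planar)
```

Every sample is read and written once in each direction, which is why an NCHW model costs an extra pass over the pixel data on both the input and the output side compared with NHWC.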
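The FFmpeg backend already uses OpenVINO's preprocessing module: normalization (scale/mean) and the layout declaration (NHWC→NCHW) are delegated to ov_preprocess_preprocess_steps_scale/mean/convert_element_type() (dnn_backend_openvino.c L612-750), which accounts for roughly 40% of the preprocessing work.
However, sws_scale() in the FFmpeg layer still cannot simply be dropped, because:
- Color space conversion (YUV→RGB, NV12→RGB): the FFmpeg backend does not set up a color conversion step in the OpenVINO pipeline. Note that the OpenVINO preprocessing API does offer ov_preprocess_preprocess_steps_convert_color(), so this is a limitation of the current integration rather than of OpenVINO.
- UV chroma plane copy/scaling: entirely outside the scope of DNN inference, so only FFmpeg can handle it
- Resize: the FFmpeg backend does not call ov_preprocess_preprocess_steps_resize(); UV scaling is handled by copy_uv_planes()
There is one redundancy: layout conversion is performed twice, once in dnn_io_proc.c (manual packed→planar buffer rearrangement) and once in the OpenVINO pipeline (NHWC→NCHW declaration), which is a potential optimization target.
In 3c0b33b I added §8.5.6 with a detailed comparison.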
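The arithmetic behind the delegated normalization is small enough to write out. This sketch assumes the mean is subtracted first and the result is then divided by the scale, which is how the OpenVINO mean and scale steps behave when applied in that order. The function names and the 0..255 clamp on the way back are assumptions made for the example, not FFmpeg code.

```python
def normalize(samples, mean=0.0, scale=1.0):
    """Convert integer samples to floats: (x - mean) / scale."""
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return [(float(s) - mean) / scale for s in samples]


def denormalize(values, mean=0.0, scale=1.0):
    """Inverse mapping back to 8-bit samples, clamped to 0..255."""
    out = []
    for v in values:
        sample = int(round(v * scale + mean))
        out.append(max(0, min(255, sample)))
    return out


floats = normalize([0, 255], mean=0.0, scale=255.0)
restored = denormalize(floats, mean=0.0, scale=255.0)
```

With mean 0 and scale 255, 8-bit input maps onto the 0.0 to 1.0 range that many SR models expect. A model trained on 0 to 255 floats would keep scale at 1.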
…eal-ESRGAN integration notes Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/f1de5a73-82a9-40cd-81e0-4fb022693840
… separate block Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/f1de5a73-82a9-40cd-81e0-4fb022693840
…nditional Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/0329e3b6-6dc6-4a1a-b799-7619a4f752f4
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/a001f4d6-aaa8-4fc4-853c-83c66d33c429
| 10 | ✅ Device selection | Choose CPU/GPU/VPU based on the hardware |
| ### 11.5 Common SR model integration examples |
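The FFmpeg codebase contains the following SR-related examples:
- vf_sr.c: the dedicated SR filter, with built-in support for SRCNN and ESPCN (TensorFlow backend only)
- doc/filters.texi L12252-12265: complete dnn_processing command examples for SRCNN/ESPCN (processing the Y channel of YUV420P)
- External training repository: the documentation references https://github.com/XueweiMeng/sr/tree/sr_dnn_native, which provides model training and .pb export scripts
However, there are no bundled model files (users must prepare their own) and no FATE tests, and the existing examples cover only TensorFlow with classic models. Modern models such as Real-ESRGAN and SwinIR require dnn_processing with the OpenVINO backend.
In b37c641 I added §11.6 with a detailed summary.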
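In caf8c2e I added §13, "Real-ESRGAN x4plus: complete integration walkthrough", which includes:
- Model overview (§13.1): RRDBNet architecture parameters and a comparison table of all model variants
- Environment setup (§13.2): PyTorch + OpenVINO + FFmpeg build environment
- Model conversion (§13.3): 3 conversion methods (ONNX→IR, Python API, direct PyTorch conversion), with complete export scripts and validation code
- FFmpeg call chain analysis (§13.4): the complete call path of Real-ESRGAN through dnn_processing, plus a table mapping parameters to code locations
- Usage commands (§13.5): 5 scenarios covering image SR, asynchronous video pipeline, dynamic sizes, GPU inference, and YUV input
- Data flow diagram (§13.6): the complete data copy path from input frame to output frame under RGB24+NCHW
- Performance optimization (§13.7): FP16 quantization, nireq tuning, and a memory consumption table
- Model variant integration (§13.8): exporting and using x4plus_anime_6B and realesr-animevideov3
- End-to-end validation (§13.9): single-frame validation plus a comparison script against PyTorch reference output
- One-shot complete workflow (§13.10): 6 commands from downloading the weights to FFmpeg output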
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/0042692c-58c9-4142-8b8c-ff7009cea3f4
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com> Agent-Logs-Url: https://github.com/hongbolv/FFmpeg/sessions/7ba31ca1-b1b2-4ea9-8d26-b9d571bf8755
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
Warning: Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
Comprehensive documentation tracing the full call path libavfilter → DNN Interface → OpenVINO Backend → OpenVINO C Library and summarizing steps to integrate new image super-resolution models.
What's covered
- … vf_dnn_processing.c) through dnn_filter_common, dnn_interface, backend implementations, down to OpenVINO C API
- ff_dnn_init() → dnn_load_model_ov() → ov_core_create() through to infer_completion_callback() → ff_proc_from_dnn_to_frame()
- ov_* functions used by the backend, grouped by category (core, model, preprocessing, inference, tensor, port/shape)
- … task_queue/lltask_queue/request_queue) and producer-consumer pattern with SafeQueue
- DNNModule, DNNModel, DnnContext, OVModel, OVRequestItem, TaskItem, DNNData
- sws_scale() and av_image_copy_plane() operations across dnn_io_proc.c, vf_dnn_processing.c, and dnn_backend_openvino.c with line numbers, trigger conditions, and a typical SR copy chain diagram
- … ov_preprocess_* API (~40%: normalization, element type conversion, layout declaration), which remains FFmpeg-only (color space conversion, UV chroma handling, packed↔planar, resize), redundancy in layout conversion (both FFmpeg manual transpose and OpenVINO pipeline), and potential optimization directions
- … input_resizable, Python overlap-fusion script), and future improvement directions
- … scale/mean/layout), dnn_processing filter usage, optional custom pre/post-proc, build, and validation
- vf_sr.c filter (SRCNN/ESPCN), doc/filters.texi usage examples with dnn_processing, external model training repos, and comparison between the specialized sr filter and the generic dnn_processing approach
- … x4plus_anime_6B, realesr-animevideov3), end-to-end validation with PyTorch reference comparison, and a one-shot complete workflow
Example usage from the guide
Documentation-only change — no code modifications.