电影元数据和多语言字幕功能改进

改进概述

我们对电影字幕处理系统进行了以下关键改进：

增强了从文件路径识别电影名称的能力
- 现在会分析完整路径中的信息，而不仅仅是文件名
- 考虑父目录结构作为电影名称识别的重要线索
- 对中文、日文等非英语电影名称的识别能力显著提升
优化了 DeepSeek AI 的元数据生成 prompt
- 更详细、更结构化的 prompt 设计
- 针对非英语电影添加了更多指导信息
- 强调 IMDb ID 的重要性和正确格式
- 降低 temperature 参数 (0.2) 以获得更稳定的结果
改进了 VideoProcessor 中的处理逻辑
- 使用 MovieNameRecognizer 作为首选电影名称识别方法
- 保留原有方法作为备选，确保系统稳定性
- 从目录名和文件名中提取年份信息以提高准确性
增强多语言字幕支持
- 自动检测电影原始语言
- 为电影提供原始语言和英语双语字幕
- 优化字幕文件命名方式，以语言代码作为后缀

技术细节

电影名称识别改进

def recognize_movie_name(self, filename):
    # 提取完整路径信息
    full_path = filename

    # 分析父目录结构以获取电影线索
    parent_dirs = []
    if os.path.sep in filename:
        parent_dir = os.path.dirname(filename)
        path_parts = parent_dir.split(os.path.sep)
        # 获取最后两级目录名称（如果有）
        parent_dirs = path_parts[-2:] if len(path_parts) >= 2 else path_parts

        # 提取文件名
        filename = os.path.basename(filename)

    # 构造改进的 prompt，包含路径信息
    prompt = f"""I need to identify the official movie name from a video file.
Full path: '{full_path}'
Filename: '{name_without_ext}'
"""

    # 添加父目录信息
    if parent_dirs:
        parent_dirs_str = " > ".join(parent_dirs)
        prompt += f"Parent directories: '{parent_dirs_str}'\n"

DeepSeek 元数据生成 Prompt 优化

prompt = f"""Please generate comprehensive and accurate metadata for the movie "{movie_name}".
"""
if year:
    prompt += f"This movie was released in the year {year}.\n"

prompt += """
I need the most accurate information possible, especially for movies from China, Japan, or other non-English speaking countries.

Return the information as a well-structured JSON object with the following fields:
- title: The official English title
- original_title: The original title in the movie's native language
- release_date: in YYYY-MM-DD format
- runtime: in minutes
- overview: A concise 2-3 sentence plot summary
- genres: Array of genres
- director: Name of the director
- actors: Array of main cast members (top 5-8 actors)
- language: Primary language of the film
- country: Country of origin
- imdb_id: The IMDb ID (e.g., "tt0111161") - this is EXTREMELY important for subtitle searches
- imdb_rating: Numerical IMDb rating if known
"""

测试结果

我们使用多个测试案例验证了改进的有效性，包括：

对日本动画电影的测试：
- 《龙猫》(My Neighbor Totoro)
- 《千与千寻》(Spirited Away)
测试结果显示系统能够：
- 从路径中正确识别电影名称
- 推断电影的原始语言（如日语）
- 生成准确的元数据，包括 IMDb ID、原始标题和英文标题
- 下载对应的多语言字幕

后续改进方向

继续优化元数据识别准确性：
- 考虑添加更多的电影数据库 API 作为后备
- 改进对特殊字符和非常规命名的处理
增强用户界面功能：
- 允许用户在 Web UI 中手动调整识别结果
- 提供更直观的多语言字幕选择界面
扩展语言支持：
- 添加对更多语言的支持
- 优化小语种电影的识别和处理
提高字幕匹配质量：
- 实现更智能的字幕评分和选择机制
- 开发字幕同步和校正功能

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

电影元数据和多语言字幕功能改进

改进概述

技术细节

电影名称识别改进

DeepSeek 元数据生成 Prompt 优化

测试结果

后续改进方向

FilesExpand file tree

README_IMPROVEMENTS.md

Latest commit

History

README_IMPROVEMENTS.md

File metadata and controls

电影元数据和多语言字幕功能改进

改进概述

技术细节

电影名称识别改进

DeepSeek 元数据生成 Prompt 优化

测试结果

后续改进方向