Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

功能请求:实现歌单内歌曲搜索 #865

Open
mokurin000 opened this issue Aug 13, 2024 · 2 comments
Open

功能请求:实现歌单内歌曲搜索 #865

mokurin000 opened this issue Aug 13, 2024 · 2 comments
Labels
FEP feeluown enhancement proposal

Comments

@mokurin000
Copy link
Contributor

mokurin000 commented Aug 13, 2024

简介与背景

如题,目前fuo未提供于ui或rpc接口( https://github.com/cosven/feeluownx 需要)搜索播放列表中特定歌曲的功能

鉴于直接进行关键词搜索较容易匹配失败,可能需要尝试使用模糊匹配算法

方案概述

  • 基于Dice-Sørensen 模糊匹配搜索
  • 使用标准库 diffllib (对乱序支持不好)

经过一些尝试,推荐做法是以匹配结果的最多 k 个高可能结果按顺序返回(+ 数量可设置)
另外可以把字符串处理前统一为小写简体,后者依赖opencc

may block cosven/feeluownx#4

另附:

from difflib import SequenceMatcher
from sys import argv
from typing import List
from collections.abc import Hashable


def std_distance(s1: str, s2: str):
    s = SequenceMatcher(isjunk=lambda c: c.isspace(), a=s1, b=s2)
    return s.ratio()


def distance(x: List[Hashable], y: List[Hashable]) -> float:
    wx = [tuple(x[i : i + 2]) for i in range(len(x) - 1)]
    wy = [tuple(y[i : i + 2]) for i in range(len(y) - 1)]

    nx = len(wx)
    ny = len(wy)

    hash_set = set(wx)

    length = 0

    for w in wy:
        if w in hash_set:
            length += 2

    if nx + ny == 0:
        return 0.0

    return length / (nx + ny)


if __name__ == "__main__":
    print(f'{"Std difflib":16}', std_distance(argv[1], argv[2]))
    print(f'{"Dice-Sorensen":16}', distance(argv[1], argv[2]))
@mokurin000 mokurin000 added the FEP feeluown enhancement proposal label Aug 13, 2024
@cosven
Copy link
Member

cosven commented Aug 14, 2024

好奇上述算法和标准库里面的这个是类似的么?
https://docs.python.org/3/library/difflib.html#difflib.get_close_matches

@mokurin000
Copy link
Contributor Author

mokurin000 commented Aug 14, 2024

好奇上述算法和标准库里面的这个是类似的么? https://docs.python.org/3/library/difflib.html#difflib.get_close_matches

标准库的匹配算法看了下是:
https://github.com/python/cpython/blob/05fc4d758aa9b731c722e6c1c18e1f5e54597393/Lib/difflib.py#L421-L490

看起来是不同算法,不过我还没测它的ratio,我试试

不过 difflib 里面是更复杂的匹配次数,最后还是 匹配次数*2/(两者长度和),Dice-Sorensen 里面的匹配就是2元素滑动窗口取集合,另一个滑动窗口迭代计数

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FEP feeluown enhancement proposal
Projects
None yet
Development

No branches or pull requests

2 participants