all semantic search items will have a score of 0 #17

Open
johnqiuwan opened this issue Jul 31, 2024 · 8 comments

Comments

@johnqiuwan

I followed the docs to run the realtime search query. Setup went smoothly, and the query runs without errors.

However, I noticed that every item returned by the semantic search query has a score of 0.

Is that normal?

@vladvildanov
Collaborator

@johnqiuwan Thanks for reaching out! Could you provide more details on the problem and a reproducible code example?

@johnqiuwan
Author

Thank you for the quick reply!

Sample code that runs the semantic search:

<?php

namespace App\Services;

use RedisVentures\RedisVl\Vectorizer\Factory;
use RedisVentures\RedisVl\VectorHelper;
use RedisVentures\RedisVl\Query\VectorQuery;
use RedisVentures\RedisVl\Index\SearchIndex;
use Predis\Client;



class VectorQueryService
{
    protected $factory;
    protected $vectorProvider;
    protected $vectorHelper;
    protected $index;
    public function __construct()
    {
        //
        $this->factory = new Factory();
        $this->vectorProvider =
            $this->factory->createVectorizer('openai', env('TEXT_EMBEDDING_MODEL'));
        $this->vectorHelper = new VectorHelper();

        $this->index = new SearchIndex(new Client(), $this->schema());

        $this->index->create();
    }

    private function schema()
    {
        $schema = [
            'index' => [
                'name' => 'idx:product',
                'prefix' => 'laravel_hemes_database_product_by_id:',
                'storage_type' => 'json',
            ],
            'fields' => [
                'id' => [
                    // 'path' => '$.id',
                    'type' => 'numeric',
                ],
                'description' => [
                    // 'path' => '$.description',
                    'type' => 'text',
                ],
                'vector' => [
                    // 'path' => '$.description_embeddings',
                    'type' => 'vector',
                    'dims' => 1536,
                    'datatype' => 'float32',
                    'algorithm' => 'flat',
                    'distance_metric' => 'cosine'
                ],
                'image' => [
                    'type' => 'tag'
                ],
                'slug' => [
                    // 'path' => '$.slug',
                    'type' => 'tag',
                ],
                'product_name_text' => [
                    // 'path' => '$.product_name',
                    'type' => 'text',
                ],
                'price' => [
                    // 'path' => '$.price',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'current_price' => [
                    // 'path' => '$.price',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'created' => [
                    // 'path' => '$.created_at',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'variant_options' => [
                    // 'path' => '$.variant_options',
                    'type' => 'tag',
                ],
                'model' => [
                    // 'path' => '$.product_specifications.model',
                    'type' => 'text',
                ],
                'category' => [
                    //'path' => '$.product_specifications.category',
                    'type' => 'tag',
                ],
                'manufactory' => [
                    // 'path' => '$.product_specifications.manufactory[*]',
                    'type' => 'tag',
                ],
            ],
        ];
        return $schema;
    }

    public function embed($text)
    {
        $embedding = $this->vectorProvider->embed($text);
        $embedding = $embedding['data'][0]['embedding'];

        if (!is_array($embedding)) {
            $embedding = [$embedding];
        }
        return $embedding;
    }

    public function query($embedding)
    {
        // $embedding = [VectorHelper::toBytes($embedding)];

        $query = new VectorQuery($embedding, 'vector', ['id', 'description', 'product_name_text', 'variant_options', 'model', 'category', 'manufactory', 'price', 'current_price',  'slug', 'image'], 10, true, 3);

        return $this->index->query($query);
    }

    public function processResult($result)
    {
        return collect($result)->map(function ($product, $key) {
            return collect($product)->transform(function ($value) {
                return json_decode($value, true);
            });
        })->values()->toArray();
    }

    public function resultDto($result)
    {

        return collect($result)->map(function ($product, $key) {
            $product['id'] = $product['id'][0];
            $product['description'] = $product['description'][0];
            $product['product_name'] = $product['product_name_text'][0];
            $product['slug'] = $product['slug'][0];
            return $product;
        })->toArray();
    }
}

Context:

  1. Using RedisJSON to store the embedding data
  2. Using the OpenAI text-embedding-3-small model to generate the embeddings (dimension 1536)
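
For readers reproducing this setup: the schema above declares a 1536-dimension float32 vector, and the sample code mentions VectorHelper::toBytes for turning an embedding into bytes. The following is a minimal standalone sketch of that kind of conversion, not the library's implementation. The function names packFloat32 and assertDims are hypothetical, and little-endian byte order is an assumption that holds on typical hosts.

```php
<?php

declare(strict_types=1);

/**
 * Pack a list of floats into a little-endian float32 binary string.
 * Hypothetical helper for illustration only.
 */
function packFloat32(array $vector): string
{
    // 'g' = 32-bit float, little-endian byte order.
    return pack('g*', ...$vector);
}

/**
 * Hypothetical guard: fail early if the embedding length does not
 * match the 'dims' value declared in the index schema.
 */
function assertDims(array $vector, int $dims): void
{
    if (count($vector) !== $dims) {
        throw new InvalidArgumentException(
            sprintf('Expected %d dimensions, got %d', $dims, count($vector))
        );
    }
}

$embedding = array_fill(0, 1536, 0.5); // stand-in for a real embedding
assertDims($embedding, 1536);
$blob = packFloat32($embedding);

echo strlen($blob), PHP_EOL; // 6144 = 1536 floats * 4 bytes each
```

A length check like this can rule out a dimension mismatch between the embedding model and the index before looking at scoring.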

Already checked:

  1. The RedisJSON index was created successfully
  2. The embedding data is stored successfully in RedisJSON
  3. There are no errors when performing the search

Problem:
All returned items have a score of 0

Expected behavior
The scores should not all be 0

Versions:

  • Predis: 2.2
  • PHP 8.1
  • Redis Stack 7.2.4
  • macOS

Additional context
If the vector value is updated in RedisJSON, the search results update accordingly. The search itself seems to work; only the scores are all 0.

@johnqiuwan
Author

Are there any updates on this, @vladvildanov? Thank you.

@vladvildanov
Collaborator

@johnqiuwan By default, Redis calculates scores based on term frequency and the term's occurrences in the document. Could you try the other scorers available by default in Redis? It feels like something related to the server side.

https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/scoring/
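
For context, here is a rough sketch of what a raw KNN query against Redis can look like. One possible reading of the symptom (an assumption, not something confirmed in this thread) is that the text scorers rank documents by term matches, while a pure KNN query contains no text terms and instead exposes each hit's distance under an alias. The helper buildKnnSearchArgs and the parameter name vec are hypothetical; only the FT.SEARCH keywords are Redis syntax.

```php
<?php

declare(strict_types=1);

/**
 * Build the argument list for a raw FT.SEARCH KNN query.
 * buildKnnSearchArgs() is a hypothetical helper, not part of any library.
 */
function buildKnnSearchArgs(
    string $index,
    string $vectorField,
    string $blob,
    int $k,
    array $returnFields,
    string $scoreAlias = 'vector_score'
): array {
    // The distance of each hit is exposed under $scoreAlias.
    $query = sprintf('(*)=>[KNN %d @%s $vec AS %s]', $k, $vectorField, $scoreAlias);
    $fields = array_merge($returnFields, [$scoreAlias]);

    return array_merge(
        ['FT.SEARCH', $index, $query],
        ['PARAMS', '2', 'vec', $blob],
        ['RETURN', (string) count($fields)],
        $fields,
        ['SORTBY', $scoreAlias, 'ASC'],
        ['DIALECT', '2']
    );
}

// Two-dimensional stand-in vector, packed as little-endian float32.
$args = buildKnnSearchArgs('idx:product', 'vector', pack('g*', 0.1, 0.2), 3, ['id', 'slug']);

echo $args[2], PHP_EOL; // (*)=>[KNN 3 @vector $vec AS vector_score]
```

With Predis, an argument list like this could presumably be sent through the client's executeRaw() method to compare the server's raw reply with what the library returns.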

@johnqiuwan
Author

Thank you for the update! I have read the doc you linked, but I am still not sure why all returned items have a score of 0. It looks like a bug to me, yet the results do change when the embedding is updated. This is kind of strange to me, lol

I appreciate your time and your amazing work.

@vladvildanov
Collaborator

Thank you so much! Let me know if you find something or feel free to contribute 👌

@tfortin

tfortin commented Dec 11, 2024

Hello there,

I'm facing the same issue, with pretty much the same schema as yours, @johnqiuwan. As you suggested, @vladvildanov, I tried forcing a scorer other than the default TFIDF, but I'm still getting 0 as the result. BTW, this could be an idea for improving your library: add a scorer parameter to the VectorQuery class and then, in getSearchArguments():

        if ($this->scorer) {
            $searchArguments = $searchArguments->scorer($this->scorer);
        }

It would allow users to set a different scorer.
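
To make the suggestion concrete, here is a self-contained sketch of the same idea. ScorerAwareQuery is a hypothetical stand-in that builds a plain argument array; it is not the library's VectorQuery and does not use the Predis argument builder. SCORER and WITHSCORES are FT.SEARCH keywords, and BM25 is one of the scorer names Redis ships with.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a query class with an optional scorer.
 * It is NOT the library's VectorQuery; it only illustrates the idea.
 */
final class ScorerAwareQuery
{
    public function __construct(
        private string $queryString,
        private ?string $scorer = null
    ) {
    }

    /** Return the FT.SEARCH arguments that follow the index name. */
    public function getSearchArguments(): array
    {
        $arguments = [$this->queryString, 'WITHSCORES'];

        // Only emit SCORER when the caller asked for one, so the
        // server default stays in effect otherwise.
        if ($this->scorer !== null) {
            $arguments[] = 'SCORER';
            $arguments[] = $this->scorer;
        }

        return $arguments;
    }
}

$default = new ScorerAwareQuery('@description:shoes');
$bm25 = new ScorerAwareQuery('@description:shoes', 'BM25');

echo implode(' ', $default->getSearchArguments()), PHP_EOL; // @description:shoes WITHSCORES
echo implode(' ', $bm25->getSearchArguments()), PHP_EOL;    // @description:shoes WITHSCORES SCORER BM25
```

Keeping the scorer nullable means existing callers see no change in the generated arguments.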

Anyway, as a workaround I was able to retrieve the vector_score in the returnFields:

        $query = new VectorQuery(
            $embedding,
            'sentence_embedding',
            ['id', 'sentence', 'vector_score'], // Vector score is added here
            scorer: 'BM25', // This would be nice
        );

The difference is that the lower the vector_score, the closer the sentence is to the query.
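
If a "higher is better" number is easier to work with, the distance can be converted. This sketch assumes the cosine distance metric (as in the schema earlier in this thread), for which Redis reports a distance equal to 1 minus the cosine similarity. The function names and the row shape are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Convert a cosine distance (0 = same direction, 2 = opposite)
 * into a cosine similarity in the range [-1, 1].
 */
function cosineDistanceToSimilarity(float $distance): float
{
    return 1.0 - $distance;
}

/**
 * Sort rows by 'vector_score' ascending (closest first) and add a
 * 'similarity' key. The row shape is an assumption for illustration.
 */
function rankByVectorScore(array $rows): array
{
    usort(
        $rows,
        fn (array $a, array $b): int => (float) $a['vector_score'] <=> (float) $b['vector_score']
    );

    return array_map(function (array $row): array {
        $row['similarity'] = cosineDistanceToSimilarity((float) $row['vector_score']);
        return $row;
    }, $rows);
}

// Made-up sample rows; the scores are strings here, as replies often are.
$rows = [
    ['id' => 'b', 'vector_score' => '0.75'],
    ['id' => 'a', 'vector_score' => '0.25'],
];

foreach (rankByVectorScore($rows) as $row) {
    printf("%s %.2f\n", $row['id'], $row['similarity']);
}
// prints:
// a 0.75
// b 0.25
```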

Hope this helps! And thanks for your work, @vladvildanov

@vladvildanov
Collaborator

@tfortin Thank you for your feedback! I will take a look soon.
