please adapt download_embeddings to support 'file://' url scheme #54

@EricDeveaud

Hello,

Could you please adapt download_embeddings to also support the file:// URL scheme?

This would allow us to provide centralized embeddings files on our cluster, avoiding the download time.

Something in this spirit:

    log.info("Downloading embeddings from %s → %s", url, dest)

    if url.startswith("file://"):
        # Local file: stream it to dest instead of going through requests
        import shutil
        from urllib.request import urlopen
        with urlopen(url) as resp, open(dest, "wb") as f:
            shutil.copyfileobj(resp, f)
        return dest
    else:
        # Otherwise download from the given URL
        with requests.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()

Problem: this still duplicates the file for each user / base_directory.

I would prefer main.py to handle the embeddings data file directly.

Something like:

    # New: get the file name from the URL
    url = conf["embeddings_url"]
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme == "file":
        # Local file: use it in place, no copy into embeddings_dir
        tar_path = parsed.path
        if not os.path.exists(tar_path):
            raise FileNotFoundError(f"Embeddings file not found: {tar_path}")
    else:
        filename = os.path.basename(parsed.path)
        tar_path = os.path.join(embeddings_dir, filename)

        logger.info(f"Downloading reference embeddings to {tar_path}...")
        download_embeddings(url, tar_path)

    logger.info("Loading embeddings into the database...")
    load_dump_to_db(tar_path, conf)
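
The file:// branch could also be factored into a small helper, which keeps main.py short and makes the check easy to test. This is only a sketch: `resolve_local_embeddings` is a hypothetical name that does not exist in the project, and it assumes a missing local file should be reported as an error rather than silently falling back to a download. It uses `urllib.request.url2pathname` so that percent-encoded characters in the URL are decoded.

```python
import os
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_local_embeddings(url: str) -> Optional[str]:
    """Return the local path for a file:// URL, or None for any other scheme.

    Hypothetical helper for illustration; not part of the project.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        # Not a local file: the caller should download as before.
        return None
    # url2pathname decodes percent-escapes such as %20.
    path = url2pathname(parsed.path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Embeddings file not found: {path}")
    return path
```

In main.py the caller would use the returned path as tar_path when it is not None, and otherwise fall back to download_embeddings as today.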

Maybe there is something I missed regarding the embeddings file needed in base_directory.

Let me know if this sounds acceptable and which method you prefer. I will then propose a PR.

See you later!

Eric
