unknown encoding #51

Closed
yangwendy opened this issue Mar 6, 2023 · 5 comments
Comments

@yangwendy

I installed tiktoken with pip on Python 3.10 and ran:

import tiktoken
enc = tiktoken.encoding_for_model("text-davinci-003")

This raises ValueError: Unknown encoding p50k_base

assert ENCODING_CONSTRUCTORS is not None
59 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 60 raise ValueError(f"Unknown encoding {encoding_name}")

@hauntsaninja
Collaborator

hauntsaninja commented Mar 6, 2023

This is not enough information to reproduce the problem. Could you run these commands and paste the full output:

python --version
python -c 'import platform; print(platform.platform())'
python -m venv env
source env/bin/activate
env/bin/python -m pip install wheel
env/bin/python -m pip install tiktoken
env/bin/python -c 'import tiktoken; print(tiktoken.get_encoding("gpt2"))'
env/bin/python -c 'import site; import os; print(os.listdir(site.getsitepackages()[0]))'
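For anyone who cannot easily run shell commands (for example inside a notebook), roughly the same details can be gathered from Python itself. This is only a sketch: the helper name `collect_env_info` and the dictionary keys are made up for illustration and are not part of tiktoken. It does not replace the fresh-venv steps above, which are what actually isolate the install.

```python
import importlib.metadata
import os
import platform
import site


def collect_env_info(package: str = "tiktoken") -> dict:
    """Gather interpreter, platform, and site-packages details for a bug report.

    Hypothetical helper for illustration; not part of tiktoken.
    """
    # Installed version of the package, or None if it is not installed.
    try:
        package_version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        package_version = None

    # site.getsitepackages() can be missing in some older virtualenv setups.
    get_dirs = getattr(site, "getsitepackages", None)
    site_dirs = get_dirs() if get_dirs is not None else []
    first_dir = site_dirs[0] if site_dirs else None

    # List the first site-packages directory, if it actually exists on disk.
    if first_dir and os.path.isdir(first_dir):
        entries = sorted(os.listdir(first_dir))
    else:
        entries = []

    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "package_version": package_version,
        "site_packages_dir": first_dir,
        "site_packages_entries": entries,
    }


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

In the listing, the entries to look for are `tiktoken` and `tiktoken_ext`; if the second one is missing, an "Unknown encoding" error is the expected symptom.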

@eddir

eddir commented Mar 6, 2023

This happens to me too when I package my application into an exe with pyinstaller. It seems the tiktoken_ext module is unavailable after packaging.
I tried adding this module explicitly, but then I got a different error: "FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\eddir\PycharmProjects\Deus\output\main\blobfile\VERSION'".

PyInstaller args:
pyinstaller --noconfirm --onedir --windowed --collect-all "tiktoken_ext" "C:/Users/eddir/PycharmProjects/Deus/main.py"

It looks like the import mechanism for encodings is a bit confusing.
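A quick way to check whether the plugin package is visible at all, both in the normal environment and inside the packaged app, is to scan the `tiktoken_ext` namespace package directly. The sketch below assumes tiktoken finds its encodings by iterating over the submodules of `tiktoken_ext`, which the "Plugins found: [...]" message later in this thread suggests. The helper name `find_plugin_modules` is hypothetical, and it uses only the standard library.

```python
import importlib
import importlib.util
import pkgutil


def find_plugin_modules(namespace: str = "tiktoken_ext") -> list[str]:
    """Return the submodules discoverable under a (namespace) package.

    Hypothetical diagnostic helper; returns [] if the package is not importable.
    """
    try:
        spec = importlib.util.find_spec(namespace)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return []  # the package is not on sys.path at all

    package = importlib.import_module(namespace)
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []  # a plain module, not a package

    # Same idea as a plugin registry: list every module under the package.
    return sorted(m.name for m in pkgutil.iter_modules(search_path, namespace + "."))


if __name__ == "__main__":
    print(find_plugin_modules())
```

An empty list inside the bundled executable, but a non-empty one in the development environment, would point at the packaging step rather than at tiktoken itself.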

@hauntsaninja
Collaborator

Thanks, there's another issue in which people are talking about pyinstaller: #43

If anyone has an issue that does not involve pyinstaller, please run the commands in #51 (comment) and paste the full log.

@hauntsaninja
Collaborator

OP hasn't responded, so closing. #43 is the right issue for discussing pyinstaller.

hauntsaninja closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 13, 2023
@rishabhgupta93

I am getting a similar error while loading an encoding.

The code snippet is as follows:

import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler
enc = tiktoken.get_encoding("WhereIsAI/UAE-Large-V1")
token_counter = TokenCountingHandler(tokenizer= enc.encode)

The error I am getting is as follows:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[20], line 3
1 import tiktoken
2 from llama_index.callbacks import CallbackManager, TokenCountingHandler
----> 3 enc = tiktoken.get_encoding("WhereIsAI/UAE-Large-V1")
4 token_counter = TokenCountingHandler(tokenizer= enc.encode)

File f:\pycharmprojects\llamaindex\venv\lib\site-packages\tiktoken\registry.py:68, in get_encoding(encoding_name)
65 assert ENCODING_CONSTRUCTORS is not None
67 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 68 raise ValueError(
69 f"Unknown encoding {encoding_name}. Plugins found: {_available_plugin_modules()}"
70 )
72 constructor = ENCODING_CONSTRUCTORS[encoding_name]
73 enc = Encoding(**constructor())

ValueError: Unknown encoding WhereIsAI/UAE-Large-V1. Plugins found: ['tiktoken_ext.openai_public']
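Note that this case differs from the original report: `tiktoken_ext.openai_public` was found, so the plugin lookup worked, but "WhereIsAI/UAE-Large-V1" looks like a Hugging Face model ID rather than a tiktoken encoding name such as "gpt2" or "cl100k_base". One way to guard against this is to check the requested name against the names tiktoken reports before calling `get_encoding`. The helper `pick_encoding_name` below is a hypothetical sketch, and the "cl100k_base" fallback is an assumption chosen for illustration.

```python
def pick_encoding_name(requested, available, fallback="cl100k_base"):
    """Return `requested` if it is a known encoding name, else `fallback`.

    Hypothetical helper; `available` is expected to be the list of names
    that tiktoken reports, e.g. from tiktoken.list_encoding_names().
    """
    if requested in available:
        return requested
    if fallback in available:
        return fallback
    raise ValueError(
        f"Neither {requested!r} nor fallback {fallback!r} is in {sorted(available)}"
    )


# Intended usage (requires tiktoken to be installed):
#
#   import tiktoken
#   name = pick_encoding_name("WhereIsAI/UAE-Large-V1", tiktoken.list_encoding_names())
#   enc = tiktoken.get_encoding(name)
```

Token counts from a fallback encoding will generally not match those of the embedding model's own tokenizer, so this gives at best an approximation.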

4 participants