When ESBMC output is too big, a TPM Rate Limit error occurs. #93
@Yiannis128 Here's how to use it:

```python
import os

from litellm import Router

model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}])

print(response)
```
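Since the issue here is TPM throttling, it may also help that Router can be given per-deployment rate limits so traffic is steered away from a deployment that is near its quota. A minimal sketch, assuming the `tpm`/`rpm` deployment keys described in litellm's router documentation; the numbers are illustrative:

```python
model_list = [{
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-v-2",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    },
    "tpm": 240_000,  # illustrative tokens-per-minute quota for this deployment
    "rpm": 1_800,    # illustrative requests-per-minute quota for this deployment
}]
```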
Hi, thanks for the suggestion. Before I look at this, I would like to ask: do you have a Hugging Face model uploaded? I already have Hugging Face model support, so if you do, it will be much easier to implement. I will still look into it if you don't.
Yes, we support Hugging Face LLMs - are you trying to load-balance between Hugging Face endpoints?
No, I only asked because I already have an interface for adding text-generation-inference-compatible models through Hugging Face, so it's great that you do! Is this an alternative to LangChain? Could you tell me what advantages it has? LangChain had some limitations when I integrated it (not sure about now), so I'm weighing the costs and benefits of switching :)
When ESBMC produces an output (counterexample) that is too big, the resulting prompt exceeds the token limit of the LLM. Currently, due to the switch to LangChain, we no longer measure the token count or check whether the limit has been exceeded. When the error occurs, LangChain reports it; see the example below.
Example (from the FormAI dataset, FormAI_92991.c):
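Until token accounting is reinstated, here is a minimal sketch of what the missing pre-flight check could look like, assuming `tiktoken` is available; `MAX_TOKENS`, `count_tokens`, and `truncate_counterexample` are hypothetical names, and the context size and truncation strategy are illustrative assumptions rather than the project's actual design:

```python
# Hypothetical pre-flight check for oversized ESBMC counterexamples;
# the context window below is an assumed value, not the project's setting.
import tiktoken

MAX_TOKENS = 16_385  # assumed context window of the target model

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Approximate token count (ignores per-message chat overhead)."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def truncate_counterexample(text: str, budget: int,
                            model: str = "gpt-3.5-turbo") -> str:
    """Keep only the first `budget` tokens of an oversized counterexample."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    return text if len(tokens) <= budget else encoding.decode(tokens[:budget])

counterexample = "..."  # ESBMC counterexample text
if count_tokens(counterexample) > MAX_TOKENS:
    counterexample = truncate_counterexample(counterexample, MAX_TOKENS)
```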