
[Bug]: Using Mistral model through Azure Endpoint #2828

Open
mpalaourg opened this issue May 30, 2024 · 5 comments
Labels
0.2 (Issues which are related to the pre 0.4 codebase), needs-triage

Comments

@mpalaourg

Describe the bug

Hello everyone. I am trying to use models deployed in Azure AI Studio, using the Azure Endpoint (not Azure OpenAI).

I have deployed the models, and from my understanding I need to set llm_config to contain the api_key and base_url of the specific Azure resource. In this case I am using a serverless endpoint that hosts a Mistral-large model.
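
Roughly, my config looks like this (placeholder values instead of my real ones, so the exact shape may differ slightly):

llm_config = {
    "config_list": [
        {
            "model": "mistral-large",
            "api_key": "<serverless-endpoint-key>",  # key of the Azure AI Studio deployment
            "base_url": "https://<deployment-name>.<region>.models.ai.azure.com",
        }
    ]
}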

I am 99% confident that the connection between AutoGen and the Azure endpoint is working, because I get a reply back from the model when I use last_msg as the summary_method, and I was also able to have it call a tool.

From the error message, I figured out that a message with role "assistant" ends up being sent to the model as the last message, which is not supported. This is built somewhere in client.py. Is there a hacky way I can get around this?
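
One workaround I am considering (untested sketch, based on the callable summary_method signature visible in the traceback below) is to pass a callable that re-sends the whole transcript as a single user message, so the request never ends with an assistant turn:

def summary_as_user_message(sender, recipient, summary_args):
    # Use the LLM-backed agent's client (here the recipient is the Mistral-backed agent).
    client = (recipient or sender).client
    # Rebuild the transcript between the two agents as plain text.
    transcript = "\n".join(
        f"{m.get('role', '')}: {m.get('content', '')}"
        for m in sender.chat_messages[recipient]
    )
    prompt = (summary_args or {}).get("summary_prompt", "Summarize the conversation.")
    # Send everything as one user message so the last role is "user", not "assistant".
    response = client.create(
        messages=[{"role": "user", "content": f"{transcript}\n\n{prompt}"}]
    )
    return client.extract_text_or_completion_object(response)[0]

which could then be passed as summary_method=summary_as_user_message in initiate_chat.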

Steps to reproduce

You will need a model deployed in Azure AI Studio (or, I guess, just any Mistral model).

Then, using two agents, a simple initiate_chat with summary_method="reflection_with_llm" will show the behavior I describe; the agents are sketched below.
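
Roughly, the two agents look like this (minimal sketch; my real agents have more configuration, and llm_config is the one shown above):

import autogen  # pyautogen==0.2.26

joker_agent = autogen.AssistantAgent(
    name="JokerAgent",
    system_message="You are a comedian. Reply TERMINATE when you are done.",
    llm_config=llm_config,
)
user_proxy_agent = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

The initiate_chat call itself is shown under "Expected Behavior" below.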

Model Used

mistral-large deployed on Azure AI Studio

Expected Behavior

When I am using a gpt model as a foundation model, I can set summary_method and summary_args to get a summarized version of the chat between agents.

For example

user_query = "Tell me a joke"
summary_prompt = f"Return the answer to {user_query}. If the intended request is NOT properly addressed, please point\
 it out and provide a summarized takeaway from the conversation. Do not add any introductory phrases. "

chat_res = user_proxy_agent.initiate_chat(
    joker_agent,
    clear_history=False,
    message=user_query,
    summary_method="reflection_with_llm",
    summary_args={ "summary_prompt": summary_prompt},
    max_turns=3
)

And a summary is returned.
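
The summary is then read from the returned chat result:

print(chat_res.summary)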

Screenshots and logs

The conversation that took place:

UserProxy (to JokerAgent):

Tell me a joke

--------------------------------------------------------------------------------
JokerAgent (to UserProxy):

I'm glad you found me helpful, and I'd be happy to share another joke with you:

Why did the tomato turn red?

Because it saw the salad dressing!

I hope you enjoyed that one. If you have any more questions or need assistance with anything else, please don't hesitate to ask.

TERMINATE

--------------------------------------------------------------------------------

But then I got this error message:

---------------------------------------------------------------------------
APIStatusError                            Traceback (most recent call last)
Cell In[5], line 5
      1 user_query = "Tell me a joke"
      2 summary_prompt = f"Return the answer to {user_query}. If the intended request is NOT properly addressed, please point\
      3  it out and provide a summarized takeaway from the conversation. Do not add any introductory phrases. "
----> 5 chat_res = user_proxy_agent.initiate_chat(
      6     joker_agent,
      7     clear_history=False,
      8     message=user_query,
      9     summary_method="reflection_with_llm",
     10     summary_args={ "summary_prompt": summary_prompt},
     11     max_turns=3
     12 )
     13 print(f"User Question: {user_query}\n\nAgent Answer: {chat_res.summary}")

File ..\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py:992, in ConversableAgent.initiate_chat(self, recipient, clear_history, silent, cache, max_turns, summary_method, summary_args, message, **kwargs)
    990         msg2send = self.generate_init_message(message, **kwargs)
    991     self.send(msg2send, recipient, silent=silent)
--> 992 summary = self._summarize_chat(
    993     summary_method,
    994     summary_args,
    995     recipient,
    996     cache=cache,
    997 )
    998 for agent in [self, recipient]:
    999     agent.client_cache = agent.previous_cache

File ..\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py:1113, in ConversableAgent._summarize_chat(self, summary_method, summary_args, recipient, cache)
   1110     summary_method = self._last_msg_as_summary
   1112 if isinstance(summary_method, Callable):
-> 1113     summary = summary_method(self, recipient, summary_args)
   1114 else:
   1115     raise ValueError(
   1116         "If not None, the summary_method must be a string from [`reflection_with_llm`, `last_msg`] or a callable."
   1117     )

File ..\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py:1146, in ConversableAgent._reflection_with_llm_as_summary(sender, recipient, summary_args)
   1144 agent = sender if recipient is None else recipient
   1145 try:
-> 1146     summary = sender._reflection_with_llm(prompt, msg_list, llm_agent=agent, cache=summary_args.get("cache"))
   1147 except BadRequestError as e:
   1148     warnings.warn(
   1149         f"Cannot extract summary using reflection_with_llm: {e}. Using an empty str as summary.", UserWarning
   1150     )

File ..\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py:1179, in ConversableAgent._reflection_with_llm(self, prompt, messages, llm_agent, cache)
   1177 else:
   1178     raise ValueError("No OpenAIWrapper client is found.")
-> 1179 response = self._generate_oai_reply_from_client(llm_client=llm_client, messages=messages, cache=cache)
   1180 return response

File ..\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py:1319, in ConversableAgent._generate_oai_reply_from_client(self, llm_client, messages, cache)
   1316         all_messages.append(message)
   1318 # TODO: #1143 handle token limit exceeded error
-> 1319 response = llm_client.create(
   1320     context=messages[-1].pop("context", None),
   1321     messages=all_messages,
   1322     cache=cache,
   1323 )
   1324 extracted_response = llm_client.extract_text_or_completion_object(response)[0]
   1326 if extracted_response is None:

File ..\.venv\lib\site-packages\autogen\oai\client.py:638, in OpenAIWrapper.create(self, **config)
    636 try:
    637     request_ts = get_current_ts()
--> 638     response = client.create(params)
    639 except APITimeoutError as err:
    640     logger.debug(f"config {i} timed out", exc_info=True)

File ..\.venv\lib\site-packages\autogen\oai\client.py:285, in OpenAIClient.create(self, params)
    283     params = params.copy()
    284     params["stream"] = False
--> 285     response = completions.create(**params)
    287 return response

File ..\.venv\lib\site-packages\openai\_utils\_utils.py:275, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
    273             msg = f"Missing required argument: {quote(missing[0])}"
    274     raise TypeError(msg)
--> 275 return func(*args, **kwargs)

File ..\.venv\lib\site-packages\openai\resources\chat\completions.py:667, in Completions.create(self, messages, model, frequency_penalty, function_call, functions, logit_bias, logprobs, max_tokens, n, presence_penalty, response_format, seed, stop, stream, temperature, tool_choice, tools, top_logprobs, top_p, user, extra_headers, extra_query, extra_body, timeout)
    615 @required_args(["messages", "model"], ["messages", "model", "stream"])
    616 def create(
    617     self,
   (...)
    665     timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    666 ) -> ChatCompletion | Stream[ChatCompletionChunk]:
--> 667     return self._post(
    668         "/chat/completions",
    669         body=maybe_transform(
    670             {
    671                 "messages": messages,
    672                 "model": model,
    673                 "frequency_penalty": frequency_penalty,
    674                 "function_call": function_call,
    675                 "functions": functions,
    676                 "logit_bias": logit_bias,
    677                 "logprobs": logprobs,
    678                 "max_tokens": max_tokens,
    679                 "n": n,
    680                 "presence_penalty": presence_penalty,
    681                 "response_format": response_format,
    682                 "seed": seed,
    683                 "stop": stop,
    684                 "stream": stream,
    685                 "temperature": temperature,
    686                 "tool_choice": tool_choice,
    687                 "tools": tools,
    688                 "top_logprobs": top_logprobs,
    689                 "top_p": top_p,
    690                 "user": user,
    691             },
    692             completion_create_params.CompletionCreateParams,
    693         ),
    694         options=make_request_options(
    695             extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
    696         ),
    697         cast_to=ChatCompletion,
    698         stream=stream or False,
    699         stream_cls=Stream[ChatCompletionChunk],
    700     )

File ..\.venv\lib\site-packages\openai\_base_client.py:1233, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1219 def post(
   1220     self,
   1221     path: str,
   (...)
   1228     stream_cls: type[_StreamT] | None = None,
   1229 ) -> ResponseT | _StreamT:
   1230     opts = FinalRequestOptions.construct(
   1231         method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1232     )
-> 1233     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File ..\.venv\lib\site-packages\openai\_base_client.py:922, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    913 def request(
    914     self,
    915     cast_to: Type[ResponseT],
   (...)
    920     stream_cls: type[_StreamT] | None = None,
    921 ) -> ResponseT | _StreamT:
--> 922     return self._request(
    923         cast_to=cast_to,
    924         options=options,
    925         stream=stream,
    926         stream_cls=stream_cls,
    927         remaining_retries=remaining_retries,
    928     )

File ..\.venv\lib\site-packages\openai\_base_client.py:1013, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
   1010         err.response.read()
   1012     log.debug("Re-raising status error")
-> 1013     raise self._make_status_error_from_response(err.response) from None
   1015 return self._process_response(
   1016     cast_to=cast_to,
   1017     options=options,
   (...)
   1020     stream_cls=stream_cls,
   1021 )

APIStatusError: Error code: 424 - {'object': 'Error', 'message': 'Expected last role to be one of: [tool, user] but got assistant', 'type': 'invalid_request_error', 'code': 3230}

Additional Information

pyautogen==0.2.26
python 3.10.11

@mpalaourg mpalaourg added the bug label May 30, 2024
@gagb
Collaborator

gagb commented Aug 28, 2024

@mpalaourg -- do you still have this bug? @marklysze might have helpful suggestions per his experience with using different models.

@marklysze
Contributor

@mpalaourg, if you are still having this issue, can you please provide your joker_agent's config (without keys)?

This issue relates to the order of roles in the messages, which needs to be adjusted for Mistral (and other non-OpenAI inference). If it's still an issue, one possibility is updating the Mistral client class to support a connection to Azure's serverless endpoints; a rough sketch of the role adjustment is below.
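
The general shape of that adjustment would be something like the following (rough sketch, not the actual client code):

def ensure_last_role_is_user(messages):
    # Mistral's serving API rejects requests whose final message has role "assistant",
    # so append a neutral user turn when that happens.
    if messages and messages[-1].get("role") == "assistant":
        return messages + [{"role": "user", "content": "Please continue."}]
    return messages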

@rysweet rysweet added 0.2 Issues which are related to the pre 0.4 codebase needs-triage labels Oct 2, 2024
@fniedtner fniedtner removed the bug label Oct 24, 2024
@saad-vb

saad-vb commented Oct 25, 2024

The conversation is probably something picked up from the cache. The problem probably lies in how the LLM is deployed on Azure's endpoints. I tried a Meta-Llama-3-70B-Instruct deployment over an Azure serverless endpoint with the following config and got an "openai.NotFoundError: NOT FOUND" error.

{
    "model": "Meta-Llama-3-70B-Instruct-mq",
    "api_key": <api_key>,
    "base_url": "https://<deployment_name>.eastus.models.ai.azure.com",
    "api_type": "azure",
    "api_version": "8"
}

The code works fine with a regular Azure OpenAI config:

{
    "model": "gpt-4o",
    "api_type": "azure",
    "api_key": <api_key>,
    "base_url": "https://<deployment_name>.openai.azure.com",
    "api_version": "2024-08-01-preview"
}

@JMLX42
Contributor

JMLX42 commented Jan 14, 2025

Same problem using LiteLLM + mistral-large-latest

Here is a curl equivalent of the query that autogen does:

curl -X POST \
https://api.mistral.ai/v1/chat/completions \
-H 'Authorization: Bearer <the-api-key>' \
-H 'Content-Type: application/json' \
-d '{
    "model": "mistral-large-latest",
    "messages": [
        {
            "content": "You are a GitLab assistant: your purpose is to help users discuss a specific GitLab issue.",
            "role": "system"
        },
        {
            "content": "Please handle the following todo: GitLab instance URL: https://gitlab.com/api/v4\n\n Todo ID: 493871141\n Todo action: directly_addressed\n Todo state: pending\n Todo target ID: 25\n Todo target type: Issue\n Todo target URL: https://gitlab.com/lx-industries/wally-the-wobot/tests/repl-tests/-/issues/25#note_2296444245\n\n Project ID: 45010942\n Project name: LX Industries / Wally The Wobot / tests / REPL Tests\n Project path: lx-industries/wally-the-wobot/tests/repl-tests\n Project default branch: main\n Project description: \n",
            "role": "user"
        },
        {
            "content": "Please reply to the user.",
            "role": "user"
        },
        {
            "tool_calls": [
                {
                    "id": "iRvM4muS4",
                    "function": {
                        "arguments": "{\"todo_id\": 493871141, \"project_id\": 45010942, \"target_url\": \"https://gitlab.com/lx-industries/wally-the-wobot/tests/repl-tests/-/issues/25#note_2296444245\", \"target_type\": \"Issue\", \"target_id\": 25}",
                        "name": "get_todo_discussion_id"
                    },
                    "type": "function"
                }
            ],
            "role": "assistant"
        },
        {
            "content": "e7764e059fad9a55ff30dbd4b2bf108b5205e486",
            "role": "tool",
            "tool_call_id": "iRvM4muS4"
        },
        {
            "content": "[{\"name\": \"list_issue_notes\", \"arguments\": {\"project_id\": 45010942, \"issue_iid\": 25, \"discussion_id\": \"e7764e059fad9a55ff30dbd4b2bf108b5205e486\"}}]",
            "role": "assistant"
        }
    ]
}'

I get the following error:

An error occurred: litellm.BadRequestError: MistralException - Error code: 400 - {'object': 'error', 'message': 'Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant', 'type': 'invalid_request_error', 'param': None, 'code': None}

With the relevant message being Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant.

How I run LiteLLM:

compose.yml

---
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-v1.58.1@sha256:0bd93bb9062e4cb004c8f85c5eb8bf0469f1830f8c888f0f1b1f196d2747774e
    volumes:
      - ./config.yml:/app/config.yml:ro
    ports:
      - 4000:4000
    command: ["--config", "/app/config.yml", "--detailed_debug"]

config.yml

---

model_list: 
  - model_name: mistral-large-latest
    litellm_params:
      model: mistral/mistral-large-latest
      api_base: https://api.mistral.ai/v1/
      api_key: the-api-key
    model_info:
      id: mistral-large-latest
      max_tokens: 131072

litellm_settings:
  drop_params: true

general_settings:

Then, in my app:

model_client = OpenAIChatCompletionClient(
    model="mistral-large-latest",
    api_key="notneeded",
    base_url="http://0.0.0.0:4000",
    model_capabilities={
        "json_output": False,
        "vision": False,
        "function_calling": True,
    },
)
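
To take autogen out of the picture, the same 400 should be reproducible with a plain openai client pointed at the LiteLLM proxy (untested sketch, same placeholder values as above):

from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:4000", api_key="notneeded")
# Ending the message list with an "assistant" turn should trigger the same
# "Expected last role User or Tool ... but got assistant" error from Mistral.
response = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "Tell me a joke."},
        {"role": "assistant", "content": "Why did the tomato turn red?"},
    ],
)
print(response.choices[0].message.content)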

Update: the problem does not exist with the OpenAI API (https://api.openai.com/v1/chat/completions with model = gpt-4o).

@ekzhu
Collaborator

ekzhu commented Jan 14, 2025

Related #5044
