If cleanup runs while MCP client session.call_tool is in progress, call_tool blocks permanently. #1560

@gdisk

Description

For availability reasons, after my MCP client connects to the server, a separate thread polls whether the client is still working. When it detects that the client cannot retrieve tools, I manually run cleanup and then reconnect. With the following sequence of operations, however, the client's call_tool call blocks permanently:

  1. The MCP client successfully connects to the server.
  2. The MCP client calls call_tool, which takes a long time to execute (assume 20 seconds).
  3. The health check thread detects that the client cannot retrieve the tool list and starts cleanup.
  4. The health check thread completes cleanup, and the MCP server subprocess terminates (verified with ps).
  5. The call_tool from step 2 stays blocked forever without raising any exception (unless a timeout is set).
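
To make the sequence concrete, here is a toy model in plain asyncio of how such a hang can arise. It is not the SDK's code, and I have not confirmed that the SDK behaves this way internally; ToySession, on_peer_closed and the other names are made up for illustration. The assumption is that a pending call waits on a future that only the read loop (or a "peer closed" handler) ever resolves, so cancelling the read loop locally leaves the call waiting.

```python
import asyncio
from typing import Dict, Optional, Tuple


class ToySession:
    """Toy request/response session (hypothetical, NOT the MCP SDK)."""

    def __init__(self) -> None:
        self.pending: Dict[int, asyncio.Future] = {}
        self.reader: Optional[asyncio.Task] = None

    async def _read_loop(self) -> None:
        # Stand-in for reading responses from the server's stdout.
        await asyncio.sleep(3600)

    def start(self) -> None:
        self.reader = asyncio.create_task(self._read_loop())

    async def call(self, request_id: int) -> str:
        # The caller waits on a future that someone else must resolve.
        fut = asyncio.get_running_loop().create_future()
        self.pending[request_id] = fut
        return await fut

    def on_peer_closed(self) -> None:
        # Path taken when the server dies: fail every pending call.
        for fut in self.pending.values():
            if not fut.done():
                fut.set_exception(ConnectionError("connection closed"))

    async def cleanup(self) -> None:
        # Local cleanup: stops the reader but never touches self.pending.
        self.reader.cancel()
        await asyncio.gather(self.reader, return_exceptions=True)


async def demo() -> Tuple[str, str]:
    # Case A: local cleanup while a call is in flight (steps 2 to 5).
    session = ToySession()
    session.start()
    call = asyncio.create_task(session.call(1))
    await asyncio.sleep(0)  # let the call register its future
    await session.cleanup()
    done, _ = await asyncio.wait({call}, timeout=0.1)
    case_a = "done" if done else "still pending"
    call.cancel()
    await asyncio.gather(call, return_exceptions=True)

    # Case B: the peer closes instead (server process killed manually).
    session = ToySession()
    session.start()
    call = asyncio.create_task(session.call(2))
    await asyncio.sleep(0)
    session.on_peer_closed()
    try:
        await call
        case_b = "returned"
    except ConnectionError as exc:
        case_b = str(exc)
    await session.cleanup()
    return case_a, case_b


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the fix would be for cleanup() to fail the pending futures the same way on_peer_closed() does.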

Additional notes:

  1. If steps 3 and 4 are replaced by manually killing the MCP server subprocess, call_tool fails immediately with the error "connection closed."
  2. For availability reasons, I cannot drop the health check.
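
Until this is fixed, one possible client-side workaround is to track in-flight calls and abort them right before cleanup. The following is a minimal sketch in plain asyncio, assuming the calls and the health check run on the same event loop; InFlightCalls and slow_tool are hypothetical names, and slow_tool only stands in for session.call_tool("test", {}). I have not verified how cancelling a real call_tool interacts with the SDK's internals.

```python
import asyncio
from typing import Any, Coroutine, Set


class InFlightCalls:
    """Tracks running calls so a health check can abort them before cleanup."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()
        self._aborted: Set[asyncio.Task] = set()

    async def run(self, coro: Coroutine[Any, Any, Any]) -> Any:
        # Wrap the call in its own task so it can be cancelled from outside.
        task = asyncio.ensure_future(coro)
        self._tasks.add(task)
        try:
            return await task
        except asyncio.CancelledError:
            if task in self._aborted:
                # Aborted by cancel_all(): report a normal error instead.
                raise ConnectionError(
                    "call aborted: session is being cleaned up"
                ) from None
            raise  # the caller itself was cancelled
        finally:
            self._tasks.discard(task)
            self._aborted.discard(task)

    async def cancel_all(self) -> None:
        # Mark and cancel every in-flight call, then wait for them to finish.
        tasks = list(self._tasks)
        self._aborted.update(tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def slow_tool() -> str:
    # Stand-in for a long-running session.call_tool("test", {}).
    await asyncio.sleep(20)
    return "hello world"


async def demo() -> str:
    calls = InFlightCalls()
    caller = asyncio.create_task(calls.run(slow_tool()))
    await asyncio.sleep(0.01)  # the call is now in flight
    await calls.cancel_all()   # health check: abort calls, then clean up
    try:
        return await caller
    except ConnectionError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this pattern, the tool caller would wrap its request as calls.run(client.session.call_tool("test", {})), and the health check would await calls.cancel_all() before client.cleanup(), so the blocked caller gets an exception instead of hanging.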

Example Code

server.py

# server.py

import asyncio
from mcp.server.fastmcp import FastMCP

# Create server
mcp = FastMCP("Test Server")

@mcp.tool()
async def test() -> str:
    # simulate a tool that blocks for 20 seconds
    await asyncio.sleep(20)
    return "hello world"

if __name__ == "__main__":
    mcp.run(transport="stdio")

client.py

import asyncio
import logging
from contextlib import AsyncExitStack
from datetime import timedelta
from typing import Optional

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

logging.basicConfig(
    level=logging.DEBUG, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)


class MCPClient:

    def __init__(self, params: dict):
        self.params = params
        self.session: Optional[ClientSession] = None
        self.exit_stack = AsyncExitStack()

    async def connect(self):
        server_params = StdioServerParameters(**self.params)
        stdio_transport = await self.exit_stack.enter_async_context(
            stdio_client(server_params)
        )
        stdio, write = stdio_transport
        self.session = await self.exit_stack.enter_async_context(
            ClientSession(stdio, write)
        )

        await self.session.initialize()

    async def cleanup(self):
        await self.exit_stack.aclose()


async def test_call_tool(client: MCPClient):
    await asyncio.sleep(5)
    try:
        tools = await client.session.list_tools()
        logger.debug(f"[call_tool] tools count: {len(tools.tools)}")
        # With a timeout set, the call does not hang forever:
        # result = await client.session.call_tool("test", {}, read_timeout_seconds=timedelta(seconds=60))

        # Without a timeout, this call blocks forever:
        result = await client.session.call_tool("test", {})

        logger.debug(f"[call_tool] result text length: {len(result.content[0].text)}")
    except asyncio.CancelledError:
        logger.debug("[test_call_tool] cancelled")
    except Exception as e:
        logger.debug(f"test_call_tool failed: {e}")


async def test_check_connection(client: MCPClient):
    try:
        # init connection
        await client.connect()
        logger.debug("[check_connection] connect completed")

        # list tools
        tools = await client.session.list_tools()
        logger.debug(f"[check_connection] tools count: {len(tools.tools)}")

        # wait until call_tool has started
        await asyncio.sleep(10)

        # simulate cleanup while call_tool is running
        await client.cleanup()
        logger.debug("[check_connection] cleanup completed")

        # simulate reconnect
        await client.connect()
        logger.debug("[check_connection] reconnect completed")
    except asyncio.CancelledError:
        logger.debug("[test_check_connection] cancelled")
    except Exception as e:
        logger.debug(f"[test_check_connection] failed: {e}")


async def main():
    params = {
        "command": "python",
        "args": ["server.py"],
    }
    client = MCPClient(params)
    try:
        tasks = [test_call_tool(client), test_check_connection(client)]
        await asyncio.gather(*tasks)
        logger.debug("[main] test completed")
    except Exception as e:
        logger.debug(f"[main] test failed: {e}")
    except asyncio.CancelledError:
        logger.debug("[main] test cancelled")
    finally:
        logger.debug("[main] cleanup")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

console output

2025-10-31 11:58:14,924 - __main__ - DEBUG - [check_connection] connect completed
2025-10-31 11:58:14,926 - __main__ - DEBUG - [check_connection] tools count: 1
2025-10-31 11:58:18,828 - __main__ - DEBUG - [call_tool] tools count: 1
2025-10-31 11:58:26,945 - __main__ - DEBUG - [check_connection] cleanup completed
2025-10-31 11:58:28,071 - __main__ - DEBUG - [check_connection] reconnect completed

Python & MCP Python SDK

Python: 3.12
MCP Python SDK: 1.20.0
