# GLM5: Timeouts and rate limit errors #450

## Problem

Users are hitting frequent read timeouts (60 s) and HTTP 429 (Too Many Requests) rate limit errors when running GLM5 inference against `cloud-api.near.ai`.

### Error Details

```
HTTPSConnectionPool(host='cloud-api.near.ai', port=443): Read timed out. (read timeout=60)
```

```
Rate limiting (429): 429 Client Error: Too Many Requests for url: https://cloud-api.near.ai/v1/chat/completions
```
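Until the limits or capacity change, affected callers could soften both errors on the client side by retrying with backoff. The following is only a minimal sketch, not part of the service or of any existing client: `call_with_backoff` and its parameters are hypothetical names, and it assumes the API may (but need not) send a `Retry-After` header with its 429 responses, which the logs above do not confirm.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type


def call_with_backoff(
    send: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
):
    """Call send() until it yields a non-429 response or attempts run out.

    send() must return an object with .status_code and .headers
    (for example a requests.Response). Hypothetical helper.
    """
    for attempt in range(1, max_attempts + 1):
        retry_after: Optional[float] = None
        try:
            response = send()
        except timeout_errors:
            # Read timeout: re-raise only once the attempts are exhausted.
            if attempt == max_attempts:
                raise
        else:
            # Anything other than 429 goes straight back to the caller,
            # as does the final 429 once the attempts are exhausted.
            if response.status_code != 429 or attempt == max_attempts:
                return response
            # Assumption: the server may hint at a wait time in seconds.
            header = response.headers.get("Retry-After")
            if header is not None and header.isdigit():
                retry_after = float(header)
        # Exponential backoff with full jitter, unless the server gave a hint.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(retry_after if retry_after is not None else random.uniform(0, ceiling))


# Possible usage with the `requests` library (payload and headers are placeholders):
#   call_with_backoff(
#       lambda: requests.post(url, json=payload, headers=headers, timeout=60),
#       timeout_errors=(requests.exceptions.ReadTimeout,),
#   )
```

Jitter is used so that many clients throttled at the same moment do not all retry in lockstep. Retries do not add capacity, so this would only complement the server-side changes tracked in this issue.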

### Source

- Slack: https://jasnahworkspace.slack.com/archives/C09GV3MTGFK/p1771535121703859
- Datadog logs: https://us3.datadoghq.com/logs?query=env%3Aprod%20service%3Acloud-api%20status%3A%28warn%20OR%20error%29%20%40fields.message%3A%22Organization%20concurrent%20request%20limit%20exceeded%20for%20model%22&agg_m=count&agg_m_source=base&agg_t=count&clustering_pattern_field_path=message&cols=host%2Cservice&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=desc&viz=stream&from_ts=1771363463621&to_ts=1771536263621&live=true

## Proposed Resolution

1. **Relax rate limits** for GLM5 endpoints
2. **Add more GLM5 instances** to handle current demand
