Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
async_polling_example.py	async_polling_example.py

Inference.net Async API - Python Samples

Sample scripts demonstrating how to use the inference.net Asynchronous Inference API for cost-effective batch processing.

Overview

The inference.net Async API allows you to submit inference requests that complete within 24-72 hours (but generally finish much faster, often within a few minutes) at significantly reduced costs. This is ideal for:

Large-scale content generation
Batch document processing
Non-urgent data analysis
Cost-sensitive workloads

Prerequisites

Python 3.10+
uv package manager
An inference.net API key

Installing uv

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv

Quick Start

Set your API key:

export INFERENCE_API_KEY=your-api-key-here

Run the polling example:

uv run async_polling_example.py

Configuration

Edit the configuration section at the top of async_polling_example.py:

NUM_REQUESTS = 10           # Number of requests to submit (max 50)
MODEL_ID = "your-model-id"  # Model to use for inference
POLL_INTERVAL_SECONDS = 2   # Seconds between polling attempts
MAX_POLL_ATTEMPTS = 120     # Maximum polling attempts

Examples

Polling Example (`async_polling_example.py`)

Demonstrates submitting a group of inference requests, polling for results, and displaying a comprehensive summary.

Features:

Configurable number of requests (up to 50)
Progress bar during polling
Request-response correlation via custom IDs
Comprehensive summary with success rates, performance metrics, and token usage

Sample Output:

================================================================================
 INFERENCE.NET ASYNC GROUP API - POLLING EXAMPLE
================================================================================

  Configuration:
    • Requests to submit: 10
    • Model: inference-net/load-test
    • Poll interval: 2s

  Submitting 10 Requests
  --------------------------

    req-001: What is the capital of France?
    req-002: What is 2 + 2?
    req-003: Name one planet in our solar system.
    req-004: What color is the sky on a clear day?
    req-005: How many legs does a spider have?
    ... and 5 more

  Submitting to Group API...
  ✓ Group created: 6Z6oLaIo0PFleu2Wc1LIg
  ✓ Group size: 10

  Polling for Results
  -----------------------

  [  1] |░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░| 0/10 (0%) (⏳ 0 running, 5 queued)
  [  2] |████████████████████████░░░░░░| 8/10 (80%) (⏳ 2 running, 0 queued)
  [  3] |██████████████████████████████| 10/10 (100%)

================================================================================
 GROUP GENERATION SUMMARY
================================================================================

  Success Rate
  ----------------
  ✓ |████████████████████████████████████████| 100.0%

      Successful:    10
      Failed:         0
      Total:         10

  Performance Metrics
  -----------------------
      Total wall-clock time:      12.75s
      Avg generation time:       2803.1ms
      Min generation time:        955.0ms
      Max generation time:       4556.0ms
      Throughput:                  0.78 req/s

  Token Usage
  ---------------
      Prompt tokens:            186
      Completion tokens:      1,000
      Total tokens:           1,186
      Avg completion/req:     100.0

  Request-Response Correlation
  --------------------------------

  ID         Status     Question                       Response
  ---------- ---------- ------------------------------ -------------------------
  req-001    ✓ Success  What is the capital of Fr...   The capital of France...
  req-002    ✓ Success  What is 2 + 2?                 2 + 2 equals 4.
  req-003    ✓ Success  Name one planet in our so...   Mars is a planet in...
  ...

================================================================================
 Generation complete!
================================================================================

How Request-Response Correlation Works

Each request includes a metadata.custom_id field that persists through the async processing pipeline:

{
    "model": "your-model",
    "messages": [...],
    "metadata": {"custom_id": "req-001"}  # Your tracking ID
}

When retrieving results, the custom_id is available in the response, allowing you to match responses back to your original requests:

# In the generation response:
generation["request"]["metadata"]["custom_id"]  # "req-001"

API Endpoints Used

Endpoint	Method	Description
`/v1/async/group/chat/completions`	POST	Submit a group of chat completion requests
`/v1/async/group/{groupId}/generations`	GET	Retrieve results for a group

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Inference.net Async API - Python Samples

Overview

Prerequisites

Installing uv

Quick Start

Configuration

Examples

Polling Example (`async_polling_example.py`)

How Request-Response Correlation Works

API Endpoints Used

Group API Limits

Documentation

License

FilesExpand file tree

async-with-group

Directory actions

More options

Directory actions

More options

Latest commit

History

async-with-group

Folders and files

parent directory

README.md

Inference.net Async API - Python Samples

Overview

Prerequisites

Installing uv

Quick Start

Configuration

Examples

Polling Example (async_polling_example.py)

How Request-Response Correlation Works

API Endpoints Used

Group API Limits

Documentation

License

Polling Example (`async_polling_example.py`)