Sample scripts demonstrating how to use the inference.net Asynchronous Inference API for cost-effective batch processing.
The inference.net Async API lets you submit inference requests at significantly reduced cost. Requests are guaranteed to complete within 24-72 hours, though most finish much sooner, often within a few minutes. This repository contains two approaches for working with the async API:
| Approach | Description | Best For |
|---|---|---|
| Single Request | Submit requests individually, poll each separately | Streaming, fine-grained control |
| Group API | Submit up to 50 requests as a batch | Batch processing, simpler tracking |
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ SINGLE REQUEST APPROACH                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ Request 1 ──► /v1/async/chat/completions ──► ID-1 ──► Poll /generation/1    │
│ Request 2 ──► /v1/async/chat/completions ──► ID-2 ──► Poll /generation/2    │
│ Request 3 ──► /v1/async/chat/completions ──► ID-3 ──► Poll /generation/3    │
│                                                                             │
│ • Each request tracked independently                                        │
│ • Poll each generation ID separately                                        │
│ • Process results as they complete                                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
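The single-request flow can be sketched in a few lines of Python using only the standard library. The base URL, payload shape, and response field names (`id`, `status`) are assumptions for illustration, not confirmed details; see the example scripts in this repository for working code.

```python
# Hedged sketch of the single-request flow (stdlib only).
# BASE_URL and the "id"/"status" response fields are assumptions,
# not confirmed by the API documentation.
import json
import os
import time
import urllib.request

BASE_URL = "https://api.inference.net/v1"  # assumed base URL


def _call(method, path, payload=None):
    """Minimal authenticated JSON call; reads the API key lazily from the env."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode() if payload is not None else None,
        method=method,
        headers={
            "Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_terminal(status):
    """True once a generation has finished (status values assumed)."""
    return status in ("completed", "failed")


def submit(messages, model):
    """Submit one async chat completion; returns its generation ID."""
    body = _call("POST", "/async/chat/completions",
                 {"model": model, "messages": messages})
    return body["id"]


def poll(generation_id, interval=5.0):
    """Poll one generation ID until it reaches a terminal status."""
    while True:
        body = _call("GET", f"/generation/{generation_id}")
        if is_terminal(body.get("status", "")):
            return body
        time.sleep(interval)
```

Each submitted request is tracked and polled independently, which is what makes this approach suit per-request retry logic and processing results as they complete.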
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ GROUP API APPROACH                                                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ ┌─────────────┐                                                             │
│ │ Request 1   │                                                             │
│ │ Request 2   │──► /v1/async/group/chat/completions ──► Group ID           │
│ │ Request 3   │                                            │                │
│ └─────────────┘                                            ▼                │
│                                          Poll /group/{id}/generations       │
│                                                            │                │
│ • Single API call for multiple requests                    ▼                │
│ • One group ID tracks all requests          ┌─────────────────────────┐     │
│ • Get all results in one response           │  All results together   │     │
│                                             └─────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘
```
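The Group API flow can be sketched similarly (standard library only). The base URL, the `requests` envelope field, and the `groupId`/`generations`/`status` response fields are assumptions for illustration, not confirmed field names:

```python
# Hedged sketch of the Group API flow (stdlib only).
# BASE_URL, the "requests" envelope field, and the "groupId"/"generations"/
# "status" response fields are assumptions, not confirmed by the API docs.
import json
import os
import time
import urllib.request

BASE_URL = "https://api.inference.net/v1"  # assumed base URL
MAX_GROUP_SIZE = 50  # groups are capped at 50 requests


def chunk(items, size=MAX_GROUP_SIZE):
    """Split a longer request list into groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def all_done(generations):
    """True once every generation in the group has finished (statuses assumed)."""
    return all(g.get("status") in ("completed", "failed") for g in generations)


def _call(method, path, payload=None):
    """Minimal authenticated JSON call; reads the API key lazily from the env."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode() if payload is not None else None,
        method=method,
        headers={
            "Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def submit_group(chat_requests):
    """Submit up to 50 chat-completion requests as one group; returns the group ID."""
    body = _call("POST", "/async/group/chat/completions",
                 {"requests": chat_requests})
    return body["groupId"]


def poll_group(group_id, interval=10.0):
    """Poll the group endpoint until every generation has finished."""
    while True:
        body = _call("GET", f"/async/group/{group_id}/generations")
        generations = body.get("generations", [])
        if generations and all_done(generations):
            return generations
        time.sleep(interval)
```

With more than 50 requests, `chunk` shows one way to split the workload into multiple groups, each with its own group ID to poll.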
| Feature | Single Request | Group API |
|---|---|---|
| Max requests | Unlimited | 50 per group |
| Submission | Individual API calls | Single batch call |
| Tracking | One ID per request | One ID for entire group |
| Polling | Poll each request | Poll once for all |
| Results | Process as completed | Get all at once |
| Complexity | More code, more control | Simpler, less flexible |
**Use the single request approach when:**

- Requests arrive over time (streaming/real-time)
- You need different retry logic per request
- You want to process results immediately as each completes
- You're integrating with existing per-request workflows
- You need more than 50 requests
**Use the Group API when:**

- You have a batch of related requests ready to submit
- You want simpler code with less polling logic
- You're processing up to 50 requests together
- You want to track completion of an entire batch
- You plan to use webhook notifications (coming soon)
**Prerequisites:**

- Python 3.10+
- uv package manager
- An inference.net API key
Install uv:

```sh
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv
```

Set your API key:

```sh
export INFERENCE_API_KEY=your-api-key-here
```

Run either example:

```sh
# Single request approach
uv run async/async_polling_example.py

# Group API approach
uv run async-with-group/async_polling_example.py
```

Repository layout:

```
python-async-samples/
├── README.md                     # This file
├── async/                        # Single request approach
│   ├── README.md                 # Detailed documentation
│   └── async_polling_example.py  # Example script
└── async-with-group/             # Group API approach
    ├── README.md                 # Detailed documentation
    └── async_polling_example.py  # Example script
```
**Single request approach:**

| Endpoint | Method | Description |
|---|---|---|
| `/v1/async/chat/completions` | POST | Submit a single async request |
| `/v1/generation/{id}` | GET | Get result for a generation |
**Group API:**

| Endpoint | Method | Description |
|---|---|---|
| `/v1/async/group/chat/completions` | POST | Submit a group of requests |
| `/v1/async/group/{groupId}/generations` | GET | Get all results for a group |
License: MIT