api_huggingface

stable

Comprehensive Rust client for the HuggingFace Inference API, with Router API support for Pro-plan models.

🎯 Architecture: Stateless HTTP Client

This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:

  • Direct HTTP calls to the HuggingFace Inference API
  • In-memory operation state only (resets on restart)
  • No external storage dependencies (databases, files, caches)
  • No configuration persistence beyond environment variables

This keeps deployments lightweight and container-friendly: there is no storage to provision, migrate, or back up.

🏛️ Governing Principle: "Thin Client, Rich API"

Expose all server-side functionality transparently while maintaining zero client-side intelligence or automatic behaviors.

Key principles:

  • API Transparency: One-to-one mapping with HuggingFace API endpoints
  • Zero Automatic Behavior: No implicit decision-making or magic thresholds
  • Explicit Control: Developer decides when, how, and why operations occur
  • Configurable Reliability: Enterprise features available through explicit configuration

Scope

In Scope

  • Text generation via Router API (Llama-3, Mistral, Kimi-K2)
  • Embeddings generation with similarity calculations
  • Model discovery and status checking
  • Streaming responses (SSE)
  • Vision APIs (classification, detection, captioning)
  • Audio APIs (ASR, TTS, classification, transformation)
  • Enterprise reliability (circuit breaker, rate limiting, failover, health checks)
  • Synchronous API wrapper
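The failover item in scope above can be illustrated with one simple strategy. This readme does not name the crate's four failover strategies, so the sketch below is an assumption. A hypothetical `call_with_failover` helper tries endpoints in priority order, returns the first success, and otherwise keeps every error. The request is a caller-supplied closure, so no HTTP library is assumed.

```rust
/// Hypothetical priority-order failover (not the crate's API; one of many possible strategies).
/// Tries `request` against each endpoint in order and returns the first success,
/// or the full list of (endpoint, error) pairs if every endpoint fails.
fn call_with_failover< T, E, F >( endpoints : &[ &str ], mut request : F ) -> Result< T, Vec< ( String, E ) > >
where
  F : FnMut( &str ) -> Result< T, E >,
{
  let mut errors = Vec::new();
  for &endpoint in endpoints
  {
    match request( endpoint )
    {
      Ok( response ) => return Ok( response ),
      // Remember which endpoint failed and why, then move on to the next one.
      Err( e ) => errors.push( ( endpoint.to_string(), e ) ),
    }
  }
  Err( errors )
}
```

Strict ordering always favours the primary endpoint. A round-robin or latency-based strategy would change only the order in which the loop visits endpoints.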

Out of Scope

  • Model training (inference only)
  • File upload/download (text-based API interactions only)
  • Custom model hosting (HuggingFace hosted models only)
  • GraphQL support (REST API only)

Features

Core Capabilities:

  • Router API for Pro plan models (OpenAI-compatible format)
  • Text generation with streaming support
  • Embeddings with similarity calculations
  • Model availability checking

Multimodal Features:

  • Vision: Image classification, object detection, captioning
  • Audio: Speech recognition, text-to-speech, classification

Enterprise Reliability:

  • Circuit breaker pattern for failure detection
  • Rate limiting with token bucket algorithm
  • Multi-endpoint failover (4 strategies)
  • Background health checks
  • Dynamic configuration with runtime updates
  • Performance metrics tracking
  • LRU caching with TTL
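The circuit breaker listed above usually follows a well-known pattern. After a run of consecutive failures the breaker opens and rejects calls immediately. Once a cooldown elapses it lets one trial call through (half-open), then closes on success or reopens on failure. The sketch below shows that state machine under stated assumptions: a consecutive-failure threshold and caller-supplied time in seconds. `State` and `CircuitBreaker` are hypothetical names, and the crate's actual thresholds and API may differ.

```rust
/// States of a hypothetical circuit breaker.
#[ derive( Debug, Clone, Copy, PartialEq ) ]
enum State
{
  Closed,
  Open { since : f64 },
  HalfOpen,
}

/// Hypothetical breaker; not the crate's type.
struct CircuitBreaker
{
  state : State,
  consecutive_failures : u32,
  failure_threshold : u32,
  cooldown_secs : f64,
}

impl CircuitBreaker
{
  fn new( failure_threshold : u32, cooldown_secs : f64 ) -> Self
  {
    Self { state : State::Closed, consecutive_failures : 0, failure_threshold, cooldown_secs }
  }

  /// Returns true if a request may be sent at time `now` (seconds).
  fn allow_request( &mut self, now : f64 ) -> bool
  {
    match self.state
    {
      State::Closed => true,
      State::Open { since } if now - since >= self.cooldown_secs =>
      {
        // Cooldown over: let a single trial request through.
        self.state = State::HalfOpen;
        true
      }
      State::Open { .. } => false,
      // A trial request is already in flight; reject until it reports back.
      State::HalfOpen => false,
    }
  }

  /// Records the outcome of a request that was allowed through.
  fn record( &mut self, success : bool, now : f64 )
  {
    if success
    {
      self.consecutive_failures = 0;
      self.state = State::Closed;
      return;
    }
    self.consecutive_failures += 1;
    // A failed trial reopens immediately; otherwise open once the threshold is reached.
    if self.state == State::HalfOpen || self.consecutive_failures >= self.failure_threshold
    {
      self.state = State::Open { since : now };
    }
  }
}
```

Passing time in explicitly keeps the logic deterministic. A real client would derive `now` from `std::time::Instant`.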

Installation

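Add to your Cargo.toml (the examples below also need tokio, because they use #[ tokio::main ]):

[dependencies]
api_huggingface = "0.2.0"
tokio = { version = "1", features = [ "macros", "rt-multi-thread" ] }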

Quick Start

Basic Usage

use api_huggingface::
{
  Client,
  environment::HuggingFaceEnvironmentImpl,
  components::{ input::InferenceParameters, models::Models },
  secret::Secret,
};

#[ tokio::main ]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let api_key = Secret::load_from_env( "HUGGINGFACE_API_KEY" )?;
  let env = HuggingFaceEnvironmentImpl::build( api_key, None )?;
  let client = Client::build( env )?;

  let params = InferenceParameters::new()
    .with_temperature( 0.7 )
    .with_max_new_tokens( 100 );

  let response = client.inference()
    .create_with_parameters
    (
      "What is the capital of France?",
      Models::llama_3_3_70b_instruct(),
      params
    )
    .await?;

  println!( "Response: {:?}", response );
  Ok( () )
}

Embeddings with Similarity

use api_huggingface::{ Client, environment::HuggingFaceEnvironmentImpl, secret::Secret };

#[ tokio::main ]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let api_key = Secret::load_from_env( "HUGGINGFACE_API_KEY" )?;
  let env = HuggingFaceEnvironmentImpl::build( api_key, None )?;
  let client = Client::build( env )?;

  let embedding1 = client.embeddings().create( "Hello world" ).await?;
  let embedding2 = client.embeddings().create( "Hi there" ).await?;

  // Similarity calculations are provided by the `embeddings-similarity` feature flag
  let similarity = client.embeddings().similarity( &embedding1, &embedding2 )?;
  println!( "Similarity: {:.4}", similarity );

  Ok( () )
}
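The similarity call above returns a single score. This readme does not state which metric the crate uses. Assuming it is cosine similarity, the usual choice for sentence-transformer embeddings, the score is the dot product divided by the product of the vector norms. A std-only sketch with a hypothetical `cosine_similarity` helper:

```rust
/// Hypothetical helper, not part of api_huggingface.
/// Cosine similarity between two embedding vectors; `None` when the lengths
/// differ, the vectors are empty, or either vector has zero norm.
fn cosine_similarity( a : &[ f32 ], b : &[ f32 ] ) -> Option< f32 >
{
  if a.len() != b.len() || a.is_empty()
  {
    return None;
  }
  // Accumulate the dot product and both squared norms in one pass.
  let mut dot = 0.0_f32;
  let mut norm_a = 0.0_f32;
  let mut norm_b = 0.0_f32;
  for ( x, y ) in a.iter().zip( b.iter() )
  {
    dot += x * y;
    norm_a += x * x;
    norm_b += y * y;
  }
  if norm_a == 0.0 || norm_b == 0.0
  {
    return None;
  }
  Some( dot / ( norm_a.sqrt() * norm_b.sqrt() ) )
}
```

Returning `Option` instead of `NaN` makes the degenerate cases explicit, in line with the "no magic" principle above.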

Authentication

Option 1: Workspace Secret (Recommended)

Create secret/-secrets.sh in your workspace root:

#!/bin/bash
export HUGGINGFACE_API_KEY="hf_your-key-here"

Option 2: Environment Variable

export HUGGINGFACE_API_KEY="hf_your-key-here"

Get your API key from huggingface.co/settings/tokens.
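Both options put `HUGGINGFACE_API_KEY` into the environment, where `Secret::load_from_env` reads it. Its exact resolution and validation rules are not documented here. As an illustration only, the std-only sketch below reads the variable and parses an `export KEY="value"` line of the kind shown in `secret/-secrets.sh`. Both function names are hypothetical, and the `hf_` prefix check is an assumption based on the example key format.

```rust
use std::env;

/// Hypothetical helper: extracts the value from a line such as
/// `export HUGGINGFACE_API_KEY="hf_..."`. Returns `None` for other lines.
fn parse_export_line( line : &str, name : &str ) -> Option< String >
{
  let rest = line.trim().strip_prefix( "export " )?;
  let ( key, value ) = rest.split_once( '=' )?;
  if key.trim() != name
  {
    return None;
  }
  // Drop one pair of surrounding double quotes, if present.
  let value = value.trim();
  let value = value
    .strip_prefix( '"' )
    .and_then( | v | v.strip_suffix( '"' ) )
    .unwrap_or( value );
  Some( value.to_string() )
}

/// Hypothetical helper: reads the key from the environment and applies an
/// assumed sanity check that it looks like a Hugging Face token (`hf_` prefix).
fn api_key_from_env( name : &str ) -> Result< String, String >
{
  let key = env::var( name ).map_err( | e | format!( "{name}: {e}" ) )?;
  if !key.starts_with( "hf_" )
  {
    return Err( format!( "{name} does not look like a Hugging Face token" ) );
  }
  Ok( key )
}
```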

Feature Flags

Core Features

  • default - Core async inference and embeddings
  • inference - Text generation API
  • embeddings - Embeddings generation
  • models - Model discovery and status

Streaming and Processing

  • inference-streaming - SSE streaming support
  • embeddings-similarity - Similarity calculations
  • embeddings-batch - Batch processing
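With `inference-streaming`, responses arrive as Server-Sent Events. On OpenAI-compatible endpoints such as the Router API, each event is typically a `data: {json}` line, and the stream ends with `data: [DONE]`. The crate handles this parsing itself. The sketch below only illustrates the framing. `SseEvent`, `parse_sse_line` and `collect_payloads` are hypothetical, JSON decoding is left out, and the sketch assumes one `data:` line per event.

```rust
/// One classified SSE line (hypothetical type for illustration).
#[ derive( Debug, PartialEq ) ]
enum SseEvent< 'a >
{
  /// Raw JSON payload of a `data:` line, not yet decoded.
  Data( &'a str ),
  /// The `data: [DONE]` sentinel used by OpenAI-compatible streams.
  Done,
}

/// Classifies one line; blank separators, `:` comments and fields such as
/// `event:` or `id:` yield `None`.
fn parse_sse_line( line : &str ) -> Option< SseEvent< '_ > >
{
  let payload = line.strip_prefix( "data:" )?;
  // The SSE format allows one optional space after the colon.
  let payload = payload.strip_prefix( ' ' ).unwrap_or( payload );
  if payload == "[DONE]"
  {
    Some( SseEvent::Done )
  }
  else
  {
    Some( SseEvent::Data( payload ) )
  }
}

/// Collects the data payloads of a complete SSE body, stopping at `[DONE]`.
fn collect_payloads( body : &str ) -> Vec< &str >
{
  let mut out = Vec::new();
  for line in body.lines()
  {
    match parse_sse_line( line )
    {
      Some( SseEvent::Data( p ) ) => out.push( p ),
      Some( SseEvent::Done ) => break,
      None => {}
    }
  }
  out
}
```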

Enterprise Reliability

  • circuit-breaker - Failure detection and recovery
  • rate-limiting - Token bucket rate limiting
  • failover - Multi-endpoint failover
  • health-checks - Background health monitoring
  • dynamic-config - Runtime configuration
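The `rate-limiting` feature is described as a token bucket. A bucket holds up to `capacity` tokens and refills at a fixed rate. Each request spends one token or is refused. The sketch below assumes time is passed in as seconds so the logic stays deterministic. `TokenBucket` and its methods are hypothetical, not the crate's API.

```rust
/// Hypothetical token bucket; the caller supplies time in seconds.
struct TokenBucket
{
  capacity : f64,
  refill_per_sec : f64,
  tokens : f64,
  last : f64,
}

impl TokenBucket
{
  /// Creates a bucket that starts full at time `now`.
  fn new( capacity : f64, refill_per_sec : f64, now : f64 ) -> Self
  {
    Self { capacity, refill_per_sec, tokens : capacity, last : now }
  }

  /// Adds the tokens earned since the last refill, capped at capacity.
  /// Timestamps older than the last one are ignored.
  fn refill( &mut self, now : f64 )
  {
    if now > self.last
    {
      let earned = ( now - self.last ) * self.refill_per_sec;
      self.tokens = ( self.tokens + earned ).min( self.capacity );
      self.last = now;
    }
  }

  /// Spends one token if available; `false` means the caller should wait.
  fn try_acquire( &mut self, now : f64 ) -> bool
  {
    self.refill( now );
    if self.tokens >= 1.0
    {
      self.tokens -= 1.0;
      true
    }
    else
    {
      false
    }
  }
}
```

Capacity bounds the burst size, while the refill rate bounds the sustained request rate.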

Client Enhancements

  • sync - Synchronous API wrappers
  • caching - LRU caching with TTL
  • performance-metrics - Request tracking
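The `caching` feature combines LRU eviction with a time-to-live. In the sketch below, entries expire `ttl` seconds after insertion, and a full cache evicts its least recently used entry. It pairs a `HashMap` with a recency `VecDeque`, so reordering is O(n). That is acceptable for small caches, but a production LRU would use a linked structure. `TtlLruCache` is hypothetical, and time is passed in explicitly.

```rust
use std::collections::{ HashMap, VecDeque };
use std::hash::Hash;

/// Hypothetical LRU cache with per-entry TTL (not the crate's type).
struct TtlLruCache< K, V >
{
  capacity : usize,
  ttl : f64,
  entries : HashMap< K, ( V, f64 ) >, // value and insertion time
  recency : VecDeque< K >,            // front = least recently used
}

impl< K : Hash + Eq + Clone, V : Clone > TtlLruCache< K, V >
{
  fn new( capacity : usize, ttl : f64 ) -> Self
  {
    Self { capacity, ttl, entries : HashMap::new(), recency : VecDeque::new() }
  }

  /// Moves `key` to the most-recently-used end.
  fn touch( &mut self, key : &K )
  {
    if let Some( pos ) = self.recency.iter().position( | k | k == key )
    {
      self.recency.remove( pos );
    }
    self.recency.push_back( key.clone() );
  }

  fn get( &mut self, key : &K, now : f64 ) -> Option< V >
  {
    let ( value, inserted ) = self.entries.get( key )?.clone();
    if now - inserted >= self.ttl
    {
      // Expired: drop the entry as if it had never been cached.
      self.entries.remove( key );
      self.recency.retain( | k | k != key );
      return None;
    }
    self.touch( key );
    Some( value )
  }

  fn put( &mut self, key : K, value : V, now : f64 )
  {
    if !self.entries.contains_key( &key ) && self.entries.len() >= self.capacity
    {
      // Full: evict the least recently used entry.
      if let Some( oldest ) = self.recency.pop_front()
      {
        self.entries.remove( &oldest );
      }
    }
    self.touch( &key );
    self.entries.insert( key, ( value, now ) );
  }
}
```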

Presets

  • full - All features enabled
  • integration - Enables integration tests against the real API

Testing

Test Coverage

  • Unit and integration tests
  • No-mocking policy: tests run against the real HuggingFace API rather than mocks
  • Run the integration tests with cargo test --features integration (requires HUGGINGFACE_API_KEY)

Supported Models

Router API (Pro Plan)

| Model | Provider | Capabilities |
|---|---|---|
| moonshotai/Kimi-K2-Instruct-0905 | groq | Chat completions |
| meta-llama/Llama-3.1-8B-Instruct | various | Chat completions |
| mistralai/Mistral-7B-Instruct | various | Chat completions |
| codellama/CodeLlama-34b-Instruct | various | Code generation |

Legacy Inference API (Free Tier)

| Model | Task |
|---|---|
| facebook/bart-large-cnn | Summarization |
| gpt2 | Text generation |
| sentence-transformers/all-MiniLM-L6-v2 | Embeddings |

Documentation

Dependencies

  • reqwest: HTTP client with async support
  • tokio: Async runtime
  • serde: Serialization/deserialization
  • workspace_tools: Secret management
  • error_tools: Unified error handling
  • secrecy: Secure credential handling

All dependencies are workspace-managed, so their versions stay consistent across crates.

Contributing

  1. Follow established patterns in existing code
  2. Use 2-space indentation consistently
  3. Add tests for new functionality
  4. Update documentation for public APIs
  5. Ensure zero clippy warnings: cargo clippy -- -D warnings
  6. Follow zero-tolerance mock policy (real API integration only)
  7. Follow the "Thin Client, Rich API" principle

License

MIT

Responsibility

This section documents all files and directories in the crate root, ensuring Complete Entity Coverage per organizational_principles.rulebook.md.

| Path | Purpose |
|---|---|
| src/ | Source code implementation: client, API methods, error handling |
| tests/ | Comprehensive test suite with NO MOCKING policy |
| examples/ | API usage examples demonstrating Hugging Face API features |
| docs/ | Technical documentation organized in design collections (api/, operation/) |
| Cargo.toml | Crate metadata and dependency configuration |
| readme.md | Crate overview, quick start, API documentation, and this Responsibility Table |
| license | MIT license text |
| .config | Configuration directory |
| task/ | Work-item tracking and decision registry |
