diff --git a/.cursor/rules/google-style-guide.mdc b/.cursor/rules/google-style-guide.mdc new file mode 100644 index 00000000..87764bcf --- /dev/null +++ b/.cursor/rules/google-style-guide.mdc @@ -0,0 +1,86 @@ +--- +description: +globs: +alwaysApply: true +--- +--- +description: | + Enforce Google's Developer Style Guide principles for technical documentation. + These rules guide the AI to create clear, consistent, and user-friendly documentation. +globs: + - "*.md" + - "*.mdx" + - "*.txt" +--- + +# Google Developer Style Guide for Technical Documentation + +## Document Structure +- Always use sentence case for all Markdown headings (e.g., '# This is a heading' not '# This Is A Heading'). +- Begin each main section with a brief one or two sentence overview that summarizes the section's content. +- Organize content into logical sections with clear and concise headings and subheadings. +- Structure the documentation in a hierarchical manner, using heading levels (# for main titles, ## for sections, ### for subsections). + +## Lists and Formatting +- Use Markdown numbered lists (1., 2., etc.) for sequential steps or ordered procedures. +- Use Markdown unordered lists (-, *, etc.) for collections of related items that don't have a specific order. +- Format code-related text using Markdown code blocks with the appropriate language identifier for syntax highlighting: + + ```python + def example_function(): + return "Hello, world!" + ``` +- Format UI elements such as button labels and menu items using bold Markdown syntax (**UI Element**). +- Use italic text (*text*) sparingly, primarily for emphasis, terms, or book titles. +- Present pairs of related data (like terms and definitions) using description lists or bold terms followed by their explanations. +- Use unambiguous date formatting, preferably YYYY-MM-DD. + +## Language and Tone +- Always address the reader using the second person pronoun "you" instead of "we" or "us". +- Prefer active voice in sentences. For example, instead of "The file was saved by the system," write "The system saved the file." +- Maintain a friendly, conversational, and helpful tone, similar to explaining a concept to a colleague. +- Use standard American English spelling and punctuation consistently. +- Avoid highly technical jargon without providing clear explanations or definitions. +- Be mindful of using idioms or culturally specific references that might not be universally understood. +- Avoid unnecessary repetition of adjectives and adverbs. +- Write in a clear, concise, and factual manner, avoiding overly casual or promotional language. + +## Links and References +- When creating hyperlinks using Markdown, ensure the link text clearly describes the target page (e.g., [Learn more about the API](mdc:url)). +- Prioritize linking to official documentation, well-established technical websites, or academic resources. +- For fundamental concepts crucial to understanding the current topic, provide a brief explanation within the documentation rather than immediately linking externally. +- Reserve external links for more detailed or supplementary information. + +## Code Examples +- Always enclose code examples in Markdown code blocks using triple backticks (```) and specify the programming language. +- Precede every code block with a brief paragraph explaining its context and purpose. +- Follow the code block with an explanation of its key parts and expected output. 
+- Provide substantial, real-world code examples that demonstrate complete or significant functionality rather than isolated snippets. +- If the code example pertains to a specific file or directory, mention its location relative to the project root. + +## Images and Diagrams +- When including images or diagrams, use Markdown image syntax and provide descriptive alt text: ![Alt text describing the image](mdc:image.png) +- Prefer PNG format for diagrams and illustrations, and WebP format for other images where appropriate. +- Ensure all images serve a purpose and enhance understanding of the content. + +## Warnings, Notes, and Important Information +- Format warnings using Markdown blockquotes with a clear prefix: + > :::warning + + This action cannont be undone. + + ::: + +- Format notes using Markdown blockquotes: + > :::warning + + Additional configuration may be required for custom installations. + + ::: +- Keep warning, note, and important information messages brief and to the point, focusing on essential information. + +## Step-by-Step Instructions +- Present step-by-step instructions using Markdown numbered lists. +- Begin each step with a clear action verb (e.g., "Click", "Open", "Enter"). +- Ensure each step represents a single, actionable task. +- Provide sufficient detail for the target audience to understand and execute each action without requiring additional assumptions. \ No newline at end of file diff --git a/docs/sdks/go/overview.md b/docs/sdks/go/overview.md index 647cafe0..85e8ecca 100644 --- a/docs/sdks/go/overview.md +++ b/docs/sdks/go/overview.md @@ -1,43 +1,39 @@ --- title: Overview sidebar_position: 1 +description: "Get started with RunPod Go SDK for building web applications, server-side implementations, and automating tasks. Learn how to install, configure, and secure your API key." --- -Get started with setting up your RunPod projects using Go. -Whether you're building web applications, server-side implementations, or automating tasks, the RunPod Go SDK provides the tools you need. -This guide outlines the steps to get your development environment ready and integrate RunPod into your Go projects. +This guide helps you set up and use the RunPod Go SDK in your projects. You'll learn how to install the SDK, configure your environment, and integrate RunPod into your Go applications. ## Prerequisites -Before you begin, ensure that you have the following: +Before you begin, ensure you have: -- Go installed on your machine (version 1.16 or later) -- A RunPod account with an API key and Endpoint Id +- Go 1.16 or later installed +- A RunPod account with an API key and endpoint ID -## Install the RunPod SDK {#install} +## Install the SDK -Before integrating RunPod into your project, you'll need to install the SDK. +To install the RunPod SDK in your project: -To install the RunPod SDK, run the following `go get` command in your project directory. +1. Run this command in your project directory: + ```bash + go get github.com/runpod/go-sdk + ``` -```command -go get github.com/runpod/go-sdk -``` - -This command installs the `runpod-sdk` package. -Then run the following command to install the dependencies: - -```command -go mod tidy -``` - -For more details about the package, visit the [Go package page](https://pkg.go.dev/github.com/runpod/go-sdk/pkg/sdk) or the [GitHub repository](https://github.com/runpod/go-sdk). +2. 
Install dependencies: + ```bash + go mod tidy + ``` -## Add your API key +For more details, visit: +- [Go package documentation](https://pkg.go.dev/github.com/runpod/go-sdk/pkg/sdk) +- [GitHub repository](https://github.com/runpod/go-sdk) -To use the RunPod SDK in your project, you first need to import it and configure it with your API key and endpoint ID. Ensure these values are securely stored, preferably as environment variables. +## Configure your environment -Below is a basic example of how to initialize and use the RunPod SDK in your Go project. +Set up your API key and endpoint ID in your Go application: ```go func main() { @@ -54,21 +50,20 @@ func main() { } ``` -This snippet demonstrates how to import the SDK, initialize it with your API key, and reference a specific endpoint using its ID. - -### Secure your API key +## Secure your API key -When working with the RunPod SDK, it's essential to secure your API key. -Storing the API key in environment variables is recommended, as shown in the initialization example. This method keeps your key out of your source code and reduces the risk of accidental exposure. +Always store your API key securely: -:::note +- Use environment variables (recommended) +- Avoid storing keys in source code +- Use secure secrets management solutions -Use environment variables or secure secrets management solutions to handle sensitive information like API keys. +> **Note:** Never commit API keys to version control. Use environment variables or secure secrets management solutions to handle sensitive information. -::: +## Next steps -For more information, see the following: +For more information, see: -- [RunPod SDK Go Package](https://pkg.go.dev/github.com/runpod/go-sdk/pkg/sdk) -- [RunPod GitHub Repository](https://github.com/runpod/go-sdk) -- [Endpoints](/sdks/go/endpoints) +- [Endpoints documentation](endpoints.md) +- [Go package documentation](https://pkg.go.dev/github.com/runpod/go-sdk/pkg/sdk) +- [GitHub repository](https://github.com/runpod/go-sdk) diff --git a/docs/sdks/graphql/configurations.md b/docs/sdks/graphql/configurations.md index 4a67215a..db962e48 100644 --- a/docs/sdks/graphql/configurations.md +++ b/docs/sdks/graphql/configurations.md @@ -4,50 +4,62 @@ sidebar_position: 1 description: "Configure your environment with essential arguments: containerDiskInGb, dockerArgs, env, imageName, name, and volumeInGb, to ensure correct setup and operation of your container." --- -For details on queries, mutations, fields, and inputs, see the [RunPod GraphQL Spec](https://graphql-spec.runpod.io/). +This guide explains the essential configuration arguments for your RunPod environment. For complete API details, see the [RunPod GraphQL Spec](https://graphql-spec.runpod.io/). -When configuring your environment, certain arguments are essential to ensure the correct setup and operation. Below is a detailed overview of each required argument: +## Required arguments -### `containerDiskInGb` +The following arguments are required for proper container setup and operation: + +### Container disk size + +`containerDiskInGb` specifies the container's disk size in gigabytes: -- **Description**: Specifies the size of the disk allocated for the container in gigabytes. This space is used for the operating system, installed applications, and any data generated or used by the container. - **Type**: Integer -- **Example**: `10` for a 10 GB disk size. 
+- **Example**: `10` for a 10 GB disk +- **Use**: Operating system, applications, and container data -### `dockerArgs` +### Docker arguments + +`dockerArgs` overrides the container's start command: -- **Description**: If specified, overrides the [container start command](https://docs.docker.com/engine/reference/builder/#cmd). If this argument is not provided, it will rely on the start command provided in the docker image. - **Type**: String -- **Example**: `sleep infinity` to run the container in the background. +- **Example**: `sleep infinity` for background operation +- **Use**: Custom container startup behavior + +### Environment variables - +`env` sets container environment variables: -### `env` +- **Type**: Dictionary/Object +- **Example**: `{"DATABASE_URL": "postgres://user:password@localhost/dbname"}` +- **Use**: Application configuration and credentials -- **Description**: A set of environment variables to be set within the container. These can configure application settings, external service credentials, or any other configuration data required by the software running in the container. -- **Type**: Dictionary or Object -- **Example**: `{"DATABASE_URL": "postgres://user:password@localhost/dbname"}`. +### Docker image -### `imageName` +`imageName` specifies the container image: -- **Description**: The name of the Docker image to use for the container. This should include the repository name and tag, if applicable. - **Type**: String -- **Example**: `"nginx:latest"` for the latest version of the Nginx image. +- **Example**: `"nginx:latest"` +- **Use**: Container base image and version -### `name` +### Container name + +`name` identifies your container instance: -- **Description**: The name assigned to the container instance. This name is used for identification and must be unique within the context it's being used. - **Type**: String -- **Example**: `"my-app-container"`. +- **Example**: `"my-app-container"` +- **Use**: Container identification and management + +### Persistent volume -### `volumeInGb` +`volumeInGb` defines persistent storage size: -- **Description**: Defines the size of an additional persistent volume in gigabytes. This volume is used for storing data that needs to persist between container restarts or redeployments. - **Type**: Integer -- **Example**: `5` for a 5GB persistent volume. +- **Example**: `5` for 5GB storage +- **Use**: Data persistence between restarts + +## Optional arguments -Ensure that these arguments are correctly specified in your configuration to avoid errors during deployment. +Additional configuration options may be available for specific use cases. See the [RunPod GraphQL Spec](https://graphql-spec.runpod.io/) for details. -Optional arguments may also be available, providing additional customization and flexibility for your setup. +> **Note:** Ensure all required arguments are correctly specified to avoid deployment errors. diff --git a/docs/sdks/javascript/overview.md b/docs/sdks/javascript/overview.md index 5cb33fc1..1fc9fb51 100644 --- a/docs/sdks/javascript/overview.md +++ b/docs/sdks/javascript/overview.md @@ -4,31 +4,36 @@ sidebar_position: 1 description: "Get started with RunPod JavaScript SDK, a tool for building web apps, server-side implementations, and automating tasks. Learn how to install, integrate, and secure your API key for seamless development." --- -Get started with setting up your RunPod projects using JavaScript. 
Whether you're building web applications, server-side implementations, or automating tasks, the RunPod JavaScript SDK provides the tools you need. -This guide outlines the steps to get your development environment ready and integrate RunPod into your JavaScript projects. +This guide helps you set up and use the RunPod JavaScript SDK in your projects. You'll learn how to install the SDK, configure your environment, and integrate RunPod into your JavaScript applications. -## Install the RunPod SDK +## Install the SDK -Before integrating RunPod into your project, you'll need to install the SDK. -Using Node.js and npm (Node Package Manager) simplifies this process. -Ensure you have Node.js and npm installed on your system before proceeding. +To use the RunPod SDK in your project: -To install the RunPod SDK, run the following npm command in your project directory. +1. Ensure you have Node.js and npm installed on your system +2. Run one of these commands in your project directory: -```command -npm install --save runpod-sdk -# or -yarn add runpod-sdk -``` + ```bash + npm install --save runpod-sdk + # or + yarn add runpod-sdk + ``` + +This installs the `runpod-sdk` package and adds it to your project's `package.json` dependencies. + +For more details, visit: +- [npm package page](https://www.npmjs.com/package/runpod-sdk) +- [GitHub repository](https://github.com/runpod/js-sdk) -This command installs the `runpod-sdk` package and adds it to your project's `package.json` dependencies. -For more details about the package, visit the [npm package page](https://www.npmjs.com/package/runpod-sdk) or the [GitHub repository](https://github.com/runpod/js-sdk). +## Configure your environment -## Add your API key +To use the RunPod SDK, you need to: -To use the RunPod SDK in your project, you first need to import it and configure it with your API key and endpoint ID. Ensure these values are securely stored, preferably as environment variables. +1. Import the SDK +2. Configure it with your API key and endpoint ID +3. Store sensitive information securely -Below is a basic example of how to initialize and use the RunPod SDK in your JavaScript project. +Here's how to initialize the SDK: ```javascript const { RUNPOD_API_KEY, ENDPOINT_ID } = process.env; @@ -38,22 +43,22 @@ const runpod = runpodSdk(RUNPOD_API_KEY); const endpoint = runpod.endpoint(ENDPOINT_ID); ``` -This snippet demonstrates how to import the SDK, initialize it with your API key, and reference a specific endpoint using its ID. -Remember, the RunPod SDK uses the ES Module (ESM) system and supports asynchronous operations, making it compatible with modern JavaScript development practices. +The SDK uses ES Modules (ESM) and supports asynchronous operations for modern JavaScript development. -### Secure your API key +## Secure your API key -When working with the RunPod SDK, it's essential to secure your API key. -Storing the API key in environment variables is recommended, as shown in the initialization example. This method keeps your key out of your source code and reduces the risk of accidental exposure. +Always store your API key securely: -:::note +- Use environment variables (recommended) +- Avoid storing keys in source code +- Use secure secrets management solutions -Use environment variables or secure secrets management solutions to handle sensitive information like API keys. +> **Note:** Never commit API keys to version control. Use environment variables or secure secrets management solutions to handle sensitive information. 
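A quick way to confirm the configuration works is to call the endpoint you just initialized. The snippet below is a minimal sketch and not part of the original page: it assumes the `health()` and `runSync()` methods exposed by the `runpod-sdk` package and uses a hypothetical `prompt` input field, so adapt the payload to whatever your handler expects.

```javascript
import runpodSdk from "runpod-sdk";

const { RUNPOD_API_KEY, ENDPOINT_ID } = process.env;

const runpod = runpodSdk(RUNPOD_API_KEY);
const endpoint = runpod.endpoint(ENDPOINT_ID);

// Check worker availability before sending work (assumed SDK method).
const health = await endpoint.health();
console.log(health);

// Send a request and wait for the result (assumed SDK method).
// The shape of `input` depends entirely on your endpoint's handler.
const result = await endpoint.runSync({
  input: { prompt: "Hello, world!" },
});
console.log(result);
```

If your jobs run longer than a synchronous call allows, the SDK's asynchronous request methods are likely a better fit; see the endpoints documentation linked in the next steps.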
-::: +## Next steps -For more information, see the following: +For more information, see: -- [RunPod SDK npm Package](https://www.npmjs.com/package/runpod-sdk) -- [RunPod GitHub Repository](https://github.com/runpod/js-sdk) -- [Endpoints](/sdks/javascript/endpoints) +- [Endpoints documentation](endpoints.md) +- [npm package documentation](https://www.npmjs.com/package/runpod-sdk) +- [GitHub repository](https://github.com/runpod/js-sdk) diff --git a/docs/sdks/overview.md b/docs/sdks/overview.md index 40e5e0bd..b9d33789 100644 --- a/docs/sdks/overview.md +++ b/docs/sdks/overview.md @@ -1,42 +1,74 @@ --- title: Overview -description: "Unlock serverless functionality with RunPod SDKs, enabling developers to create custom logic, simplify deployments, and programmatically manage infrastructure, including Pods, Templates, and Endpoints." +description: "Learn how to use RunPod SDKs to build, deploy, and manage AI applications. Find solutions for common use cases and get started quickly with your preferred programming language." sidebar_position: 1 --- -RunPod SDKs provide developers with tools to use the RunPod API for creating serverless functions and managing infrastructure. -They enable custom logic integration, simplify deployments, and allow for programmatic infrastructure management. +This guide helps you use RunPod SDKs to build and manage AI applications. Choose your preferred programming language and follow the guides that match your goals. -## Interacting with Serverless Endpoints +## Quick start -Once deployed, serverless functions is exposed as an Endpoints, you can allow external applications to interact with them through HTTP requests. +Get started quickly with your preferred programming language: -#### Interact with Serverless Endpoints: +- [Python SDK](python/overview.md) - Best for AI/ML applications +- [JavaScript SDK](javascript/overview.md) - Ideal for web applications +- [Go SDK](go/overview.md) - Great for high-performance services +- [GraphQL API](graphql/configurations.md) - Direct API access -Your Serverless Endpoints works similarly to an HTTP request. -You will need to provide an Endpoint Id and a reference to your API key to complete requests. +## Common use cases -## Infrastructure management +### Build AI applications +- [Create serverless endpoints](python/endpoints.md) +- [Deploy ML models](python/apis.md) +- [Monitor application performance](python/structured-logging.md) -The RunPod SDK facilitates the programmatic creation, configuration, and management of various infrastructure components, including Pods, Templates, and Endpoints. +### Manage infrastructure +- [Set up GPU instances](python/apis.md#list-available-gpus) +- [Configure templates](python/apis.md#create-templates) +- [Scale resources](python/apis.md#create-endpoints) -### Managing Pods +### Monitor and debug +- [Track application logs](python/structured-logging.md) +- [Monitor performance](python/apis.md) +- [Debug issues](python/structured-logging.md#log-levels) -Pods are the fundamental building blocks in RunPod, representing isolated environments for running applications. +## Choose your SDK -#### Manage Pods: +Each SDK is optimized for different use cases: -1. **Create a Pod**: Use the SDK to instantiate a new Pod with the desired configuration. -2. **Configure the Pod**: Adjust settings such as GPU, memory allocation, and network access according to your needs. -3. **Deploy Applications**: Deploy your applications or services within the Pod. -4. 
**Monitor and scale**: Utilize the SDK to monitor Pod performance and scale resources as required. +### Python SDK +Best for: +- AI/ML model deployment +- Data processing pipelines +- Scientific computing +- Quick prototyping -### Manage Templates and Endpoints +### JavaScript SDK +Best for: +- Web applications +- Frontend integrations +- Browser-based tools +- Node.js services -Templates define the base environment for Pods, while Endpoints enable external access to services running within Pods. +### Go SDK +Best for: +- High-performance services +- Microservices +- CLI tools +- System utilities -#### Use Templates and Endpoints: +### GraphQL API +Best for: +- Custom integrations +- Direct API access +- Complex queries +- Real-time updates -1. **Create a Template**: Define a Template that specifies the base configuration for Pods. -2. **Instantiate Pods from Templates**: Use the Template to create Pods with a consistent environment. -3. **Expose Services via Endpoints**: Configure Endpoints to allow external access to applications running in Pods. +## Next steps + +1. Choose your preferred programming language +2. Follow the quick start guide +3. Explore use case examples +4. Build your application + +> **Note:** All SDKs provide similar core functionality. Choose based on your team's expertise and project requirements. diff --git a/docs/sdks/python/_loggers.md b/docs/sdks/python/_loggers.md deleted file mode 100644 index 36fd1fdc..00000000 --- a/docs/sdks/python/_loggers.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: Loggers -description: "Enable efficient application monitoring and debugging with RunPod's structured logging interface, simplifying issue identification and resolution, and ensuring smooth operation." ---- - -Logging is essential for insight into your application's performance and health. -It facilitates quick identification and resolution of issues, ensuring smooth operation. - -Because of this, RunPod provides a structured logging interface, simplifying application monitoring and debugging, for your Handler code. - -To setup logs, instantiate the `RunPodLogger()` module. - -```python -import runpod - -log = runpod.RunPodLogger() -``` - -Then set the log level. -In the following example, there are two logs levels being set. - -```python -import runpod -import os - -log = runpod.RunPodLogger() - - -def handler(job): - try: - job_input = job["input"] - log.info("Processing job input") - - name = job_input.get("name", "World") - log.info("Processing completed successfully") - - return f"Hello, {name}!" - except Exception as e: - # Log the exception with an error level log - log.error(f"An error occurred: {str(e)}") - return "An error occurred during processing." - - -runpod.serverless.start({"handler": handler}) -``` - -## Log levels - -RunPod provides a logging interface with types you're already familiar with. - -The following provides a list of log levels you can set inside your application. - -- `debug`: For in-depth troubleshooting. Use during development to track execution flow. -- `info`: (default) Indicates normal operation. Confirms the application is running as expected. -- `warn`: Alerts to potential issues. Signals unexpected but non-critical events. -- `error`: Highlights failures. Marks inability to perform a function, requiring immediate attention. 
diff --git a/docs/sdks/python/apis.md b/docs/sdks/python/apis.md index b93edef0..677a860f 100644 --- a/docs/sdks/python/apis.md +++ b/docs/sdks/python/apis.md @@ -8,13 +8,11 @@ description: "Learn how to manage computational resources with the RunPod API, i import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; -This document outlines the core functionalities provided by the RunPod API, including how to interact with Endpoints, manage Templates, and list available GPUs. -These operations let you dynamically manage computational resources within the RunPod environment. +This guide explains how to use the RunPod API to manage computational resources. You'll learn how to work with endpoints, templates, and GPUs programmatically. -## Get Endpoints +## Get endpoints -To retrieve a comprehensive list of all available endpoint configurations within RunPod, you can use the `get_endpoints()` function. -This function returns a list of endpoint configurations, allowing you to understand what's available for use in your projects. +Retrieve a list of all available endpoint configurations: ```python import runpod @@ -22,17 +20,14 @@ import os runpod.api_key = os.getenv("RUNPOD_API_KEY") -# Fetching all available endpoints +# Get all available endpoints endpoints = runpod.get_endpoints() - -# Displaying the list of endpoints print(endpoints) ``` -## Create Template +## Create templates -Templates in RunPod serve as predefined configurations for setting up environments efficiently. -The `create_template()` function facilitates the creation of new templates by specifying a name and a Docker image. +Templates define predefined configurations for your environments. Create a new template: @@ -44,14 +39,14 @@ import os runpod.api_key = os.getenv("RUNPOD_API_KEY") try: - # Creating a new template with a specified name and Docker image - new_template = runpod.create_template(name="test", image_name="runpod/base:0.1.0") - - # Output the created template details + # Create a new template + new_template = runpod.create_template( + name="test", + image_name="runpod/base:0.1.0" + ) print(new_template) except runpod.error.QueryError as err: - # Handling potential errors during template creation print(err) print(err.query) ``` @@ -77,11 +72,9 @@ except runpod.error.QueryError as err: -## Create Endpoint +## Create endpoints -Creating a new endpoint with the `create_endpoint()` function. -This function requires you to specify a `name` and a `template_id`. -Additional configurations such as GPUs, number of Workers, and more can also be specified depending your requirements. 
+Create a new endpoint using a template: @@ -93,28 +86,25 @@ import os runpod.api_key = os.getenv("RUNPOD_API_KEY") try: - # Creating a template to use with the new endpoint + # Create a template first new_template = runpod.create_template( - name="test", image_name="runpod/base:0.4.4", is_serverless=True + name="test", + image_name="runpod/base:0.4.4", + is_serverless=True ) - - # Output the created template details print(new_template) - # Creating a new endpoint using the previously created template + # Create an endpoint using the template new_endpoint = runpod.create_endpoint( name="test", template_id=new_template["id"], gpu_ids="AMPERE_16", workers_min=0, - workers_max=1, + workers_max=1 ) - - # Output the created endpoint details print(new_endpoint) except runpod.error.QueryError as err: - # Handling potential errors during endpoint creation print(err) print(err.query) ``` @@ -153,9 +143,9 @@ except runpod.error.QueryError as err: -## Get GPUs +## List available GPUs -For understanding the computational resources available, the `get_gpus()` function lists all GPUs that can be allocated to endpoints in RunPod. This enables optimal resource selection based on your computational needs. +Get information about available GPUs: @@ -167,10 +157,8 @@ import os runpod.api_key = os.getenv("RUNPOD_API_KEY") -# Fetching all available GPUs +# Get all available GPUs gpus = runpod.get_gpus() - -# Displaying the GPUs in a formatted manner print(json.dumps(gpus, indent=2)) ``` @@ -189,17 +177,15 @@ print(json.dumps(gpus, indent=2)) "displayName": "A100 SXM 80GB", "memoryInGb": 80 } - // Additional GPUs omitted for brevity ] ``` -## Get GPU by Id +## Get GPU details -Use `get_gpu()` and pass in a GPU Id to retrieve details about a specific GPU model by its ID. -This is useful when understanding the capabilities and costs associated with various GPU models. +Retrieve detailed information about a specific GPU: @@ -211,9 +197,9 @@ import os runpod.api_key = os.getenv("RUNPOD_API_KEY") -gpus = runpod.get_gpu("NVIDIA A100 80GB PCIe") - -print(json.dumps(gpus, indent=2)) +# Get details for a specific GPU +gpu = runpod.get_gpu("NVIDIA A100 80GB PCIe") +print(json.dumps(gpu, indent=2)) ``` @@ -244,7 +230,6 @@ print(json.dumps(gpus, indent=2)) ``` - -Through these functionalities, the RunPod API enables efficient and flexible management of computational resources, catering to a wide range of project requirements. +> **Note:** The API provides flexible resource management options. Choose configurations that best match your project requirements. diff --git a/docs/sdks/python/overview.md b/docs/sdks/python/overview.md index 8e699cc6..9cca4877 100644 --- a/docs/sdks/python/overview.md +++ b/docs/sdks/python/overview.md @@ -1,148 +1,73 @@ --- title: Overview sidebar_position: 1 -description: "Get started with setting up your RunPod projects using Python. Learn how to install the RunPod SDK, create a Python virtual environment, and configure your API key for access to the RunPod platform." +description: "Get started with RunPod Python SDK for building AI applications, deploying ML models, and managing computational resources. Learn how to set up your environment and start building." --- -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; +This guide helps you use the RunPod Python SDK to build AI applications and manage computational resources. You'll learn how to set up your environment and start building with Python. -Get started with setting up your RunPod projects using Python. 
-Depending on the specific needs of your project, there are various ways to interact with the RunPod platform. -This guide provides an approach to get you up and running. +## Quick start -## Install the RunPod SDK +1. Set up your Python environment: + ```bash + python3 -m venv env + source env/bin/activate # On macOS/Linux + # or + env\Scripts\activate # On Windows + ``` -Create a Python virtual environment to install the RunPod SDK library. -Virtual environments allow you to manage dependencies for different projects separately, avoiding conflicts between project requirements. +2. Install the SDK: + ```bash + python -m pip install runpod + ``` -To get started, install setup a virtual environment then install the RunPod SDK library. +3. Configure your API key: + ```python + import runpod + import os - - + runpod.api_key = os.getenv("RUNPOD_API_KEY") + ``` -Create a Python virtual environment with [venv](https://docs.python.org/3/library/venv.html): +## Common use cases - ```command - python3 -m venv env - source env/bin/activate - ``` +### Deploy ML models +- [Create serverless endpoints](endpoints.md) +- [Configure GPU resources](apis.md#list-available-gpus) +- [Monitor model performance](structured-logging.md) - - +### Build AI applications +- [Set up development environment](apis.md#create-templates) +- [Deploy applications](apis.md#create-endpoints) +- [Track application logs](structured-logging.md) -Create a Python virtual environment with [venv](https://docs.python.org/3/library/venv.html): +### Manage resources +- [Configure GPU instances](apis.md#list-available-gpus) +- [Set up templates](apis.md#create-templates) +- [Scale endpoints](apis.md#create-endpoints) - ```command - python -m venv env - env\Scripts\activate - ``` +## Key features - - +### Serverless deployment +- Deploy ML models as serverless endpoints +- Automatic scaling based on demand +- Pay-per-use pricing model -Create a Python virtual environment with [venv](https://docs.python.org/3/library/venv.html): +### Resource management +- GPU instance configuration +- Template-based deployment +- Resource monitoring - ```command - python3 -m venv env - source env/bin/activate - ``` +### Monitoring and logging +- Structured logging interface +- Performance tracking +- Error handling - - +## Next steps -To install the SDK, run the following command from the terminal. +1. [Set up your environment](apis.md) +2. [Deploy your first model](endpoints.md) +3. [Monitor your application](structured-logging.md) +4. [Scale your resources](apis.md#create-endpoints) -```command -python -m pip install runpod -``` - - - -You should have the RunPod SDK installed and ready to use. - -## Get RunPod SDK version - -To ensure you've setup your RunPod SDK in Python, choose from one of the following methods to print the RunPod Python SDK version to your terminal. - - - - - Run the following command using pip to get the RunPod SDK version. - - ```command - pip show runpod - ``` - - You should see something similar to the following output. - - ```command - runpod==1.6.1 - ``` - - - - - Run the following command from your terminal to get the RunPod SDK version. - - ```command - python3 -c "import runpod; print(runpod.__version__)" - ``` - - - - - To ensure you've setup your installation correctly, get the RunPod SDK version. - Create a new file called `main.py`. - Add the following to your Python file and execute the script. 
- - ```python - import runpod - - version = runpod.version.get_version() - - print(f"RunPod version number: {version}") - ``` - - You should see something similar to the following output. - - ```text - RunPod version number: 1.X.0 - ``` - - - - -You can find the latest version of the RunPod Python SDK on [GitHub](https://github.com/runpod/runpod-python/releases). - -Now that you've installed the RunPod SDK, add your API key. - -## Add your API key - -Set `api_key` and reference its variable in your Python application. -This authenticates your requests to the RunPod platform and allows you to access the [RunPod API](/sdks/python/apis). - -```python -import runpod -import os - -runpod.api_key = os.getenv("RUNPOD_API_KEY") -``` - -:::note - -It's recommended to use environment variables to set your API key. -You shouldn't load your API key directly into your code. - -For these examples, the API key loads from an environment variable called `RUNPOD_API_KEY`. - -::: - -Now that you've have the RunPod Python SDK installed and configured, you can start using the RunPod platform. - -For more information, see: - -- [APIs](/sdks/python/apis) -- [Endpoints](/sdks/python/endpoints) +> **Note:** The Python SDK is optimized for AI/ML applications. Use it for model deployment, data processing, and scientific computing. diff --git a/docs/sdks/python/structured-logging.md b/docs/sdks/python/structured-logging.md new file mode 100644 index 00000000..ca9cddac --- /dev/null +++ b/docs/sdks/python/structured-logging.md @@ -0,0 +1,96 @@ +--- +title: Structured logging +description: "Monitor and debug your applications with RunPod's structured logging interface. Track performance, identify issues, and gain insights into your running serverless functions." +--- + +# Loggers + +RunPod's structured logging interface helps you monitor and debug your applications. This guide shows you how to set up and use the RunPod logger to track performance metrics, identify issues, and ensure smooth operation of your deployments. 
+ +## Quick start + +### Initialize the logger + +```python +from runpod.serverless.logging import RunPodLogger + +# Initialize the logger +logger = RunPodLogger() +``` + +### Using the logger in a handler function + +```python +from runpod.serverless.logging import RunPodLogger + +logger = RunPodLogger() + +def handler(event): + logger.info("Processing request") + + try: + # Your logic here + input_data = event["input"] + logger.debug(f"Received input: {input_data}") + + # Process data + result = process_data(input_data) + + logger.info("Request processed successfully") + return result + except Exception as e: + logger.error(f"Error processing request: {str(e)}") + return {"error": str(e)} +``` + +## Log levels + +RunPod logger supports different log levels for various situations: + +- **Debug**: Detailed information useful for debugging + ```python + logger.debug("Loading model with parameters", extra={"model_size": "7B", "quantization": "4bit"}) + ``` + +- **Info**: General information about the application's operation + ```python + logger.info("Request processing started") + ``` + +- **Warning**: Potential issues that aren't errors but might need attention + ```python + logger.warn("Memory usage above 80%", extra={"memory_used": "12.8GB", "memory_total": "16GB"}) + ``` + +- **Error**: Errors that allow the application to continue running + ```python + logger.error("Failed to process input", extra={"error_type": "ValueError", "input_id": "123"}) + ``` + +## Best practices + +For effective logging: + +1. **Use appropriate log levels**: Reserve debug for development information, info for normal operations, warnings for potential issues, and errors for actual problems. + +2. **Include context**: Add relevant data using the `extra` parameter to make logs more useful. + ```python + logger.info("Generated image", extra={"dimensions": "1024x1024", "generation_time": "2.3s"}) + ``` + +3. **Log beginning and end of key operations**: This helps track execution flow and identify where issues occur. + ```python + logger.info("Starting model inference") + # ... inference code ... + logger.info("Model inference completed", extra={"inference_time": elapsed_time}) + ``` + +4. **Include error details**: When catching exceptions, include specific error information. + ```python + try: + # Operation that might fail + except Exception as e: + logger.error(f"Operation failed: {str(e)}", extra={"error_type": type(e).__name__}) + ``` + +> **Note**: The default log level is `info`. You can adjust this based on your debugging needs and production requirements. diff --git a/docs/serverless/build/first-endpoint.md b/docs/serverless/build/first-endpoint.md new file mode 100644 index 00000000..342f537c --- /dev/null +++ b/docs/serverless/build/first-endpoint.md @@ -0,0 +1,185 @@ +--- +title: Create your first endpoint +description: "Build and deploy your own custom serverless endpoint on RunPod. Learn to set up your environment, create a handler function, and deploy your container to RunPod Serverless." +sidebar_position: 1 +--- + +# Create your first endpoint + +In this guide, you'll learn how to build and deploy a custom serverless endpoint that can process any type of data. + +## Prerequisites + +Before you begin, make sure you have: + +- A RunPod account ([Sign up here](https://www.runpod.io/console/serverless)) +- Docker installed on your machine ([Get Docker](https://docs.docker.com/get-docker/)) +- Python 3.10 or later +- Basic Python knowledge + +## Step 1: Set up your project + +1. 
Create a new directory for your project: + +```bash +mkdir my-runpod-endpoint +cd my-runpod-endpoint +``` + +2. Create a Python virtual environment: + +```bash +# Create the virtual environment +python -m venv venv + +# Activate it (macOS/Linux) +source venv/bin/activate + +# Or on Windows +# venv\Scripts\activate +``` + +3. Install the RunPod Python SDK: + +```bash +pip install runpod +``` + +## Step 2: Create a handler function + +Create a file named `handler.py` with this basic template: + +```python +import runpod + +def handler(event): + """ + This function is called when a request is sent to your endpoint. + """ + # Get the input from the request + job_input = event["input"] + + # Process the input + # Replace this with your actual processing logic + result = { + "message": f"Received input: {job_input}", + "processed": True, + "timestamp": runpod.utils.get_utc_timestamp() + } + + # Return the result + return result + +# Start the serverless function +runpod.serverless.start({"handler": handler}) +``` + +## Step 3: Create a test input file + +Create a file named `test_input.json` to test your handler locally: + +```json +{ + "input": { + "text": "Hello, RunPod!", + "parameter": 42 + } +} +``` + +## Step 4: Test your handler locally + +Run your handler locally to make sure it works: + +```bash +python handler.py +``` + +You should see output that looks like: + +``` +--- Starting Serverless Worker | Version X.X.X --- +INFO | Using test_input.json as job input. +DEBUG | Retrieved local job: {'input': {'text': 'Hello, RunPod!', 'parameter': 42}, 'id': 'local_test'} +INFO | local_test | Started. +DEBUG | local_test | Handler output: {'message': "Received input: {'text': 'Hello, RunPod!', 'parameter': 42}", 'processed': True, 'timestamp': '2023-08-01T15:30:45Z'} +INFO | Job local_test completed successfully. +INFO | Job result: {'output': {'message': "Received input: {'text': 'Hello, RunPod!', 'parameter': 42}", 'processed': True, 'timestamp': '2023-08-01T15:30:45Z'}} +INFO | Local testing complete, exiting. +``` + +## Step 5: Create a Dockerfile + +Create a `Dockerfile` to package your handler: + +```dockerfile +FROM python:3.10-slim + +WORKDIR /app + +# Install dependencies +RUN pip install --no-cache-dir runpod + +# Copy handler code +COPY handler.py /app/ +COPY test_input.json /app/ + +# Start the handler +CMD ["python", "-u", "handler.py"] +``` + +## Step 6: Build and push your Docker image + +1. Build your Docker image: + +```bash +docker build -t username/my-runpod-endpoint:latest . +``` + +Replace `username` with your Docker Hub username or container registry prefix. + +2. Push your image to Docker Hub or your container registry: + +```bash +docker push username/my-runpod-endpoint:latest +``` + +## Step 7: Deploy to RunPod + +1. Go to the [RunPod Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Enter your Docker image URL +4. Configure your endpoint: + - **Name**: Choose a descriptive name + - **GPU Type**: Select a GPU type (or CPU) + - **Min/Max Workers**: Set scaling parameters + - **Idle Timeout**: How long to keep workers running after inactivity + +5. 
Click "Deploy" + +## Step 8: Test your endpoint + +Once deployed, you can test your endpoint using the RunPod console or with a curl request: + +```bash +curl -X POST "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "input": { + "text": "Hello from API request!", + "parameter": 100 + } + }' +``` + +Replace `YOUR_ENDPOINT_ID` and `YOUR_API_KEY` with your actual values. + +## Next steps + +Now that you've created your first endpoint, you might want to: + +- [Learn about handler functions](/docs/serverless/build/handler-functions) - More advanced handler patterns +- [Build a custom worker](/docs/serverless/build/custom-workers) - Create workers with custom dependencies +- [Start from a template](/docs/serverless/build/from-template) - Use starter templates for different use cases +- [Configure autoscaling](/docs/serverless/manage/scaling) - Optimize for performance and cost \ No newline at end of file diff --git a/docs/serverless/examples/text-generation.md b/docs/serverless/examples/text-generation.md new file mode 100644 index 00000000..978bf7a4 --- /dev/null +++ b/docs/serverless/examples/text-generation.md @@ -0,0 +1,303 @@ +--- +title: Text generation +description: "Build a text generation API with large language models on RunPod Serverless. This complete guide covers setup, deployment, optimization, and integration." +--- + +# Text generation with LLMs + +This guide shows you how to build and deploy a text generation API using large language models (LLMs) on RunPod Serverless. + +## Overview + +You'll learn how to: +1. Set up a text generation endpoint +2. Configure for optimal performance and cost +3. Send requests and process responses +4. Integrate with your applications + +## Prerequisites + +- A RunPod account with serverless access +- Basic understanding of Python and Docker +- Familiarity with LLMs (optional) + +## Option 1: Use quick deploy (easiest) + +RunPod offers pre-configured endpoints for popular LLMs: + +1. Go to the [RunPod Console](https://www.runpod.io/console/serverless) +2. Click "Quick Deploy" +3. Select a text generation model (Llama, Mistral, etc.) +4. Configure GPU and worker settings +5. 
Deploy + +## Option 2: Build a custom endpoint + +For more flexibility, you can create a custom endpoint: + +### Step 1: Create a handler with vLLM + +Create a file named `handler.py`: + +```python +import runpod +import os +from vllm import LLM, SamplingParams + +# Initialize the model (runs once when worker starts) +def init_model(): + global model + model = LLM( + model="meta-llama/Llama-3-8b-chat-hf", # Replace with your preferred model + tensor_parallel_size=1, # Adjust based on GPU type + trust_remote_code=True + ) + return model + +# Initialize model globally +model = init_model() + +# Define sampling parameters +default_params = SamplingParams( + temperature=0.7, + top_p=0.95, + max_tokens=512 +) + +def handler(event): + """ + Handle inference requests + """ + try: + # Get input from request + job_input = event["input"] + prompt = job_input.get("prompt", "") + system_prompt = job_input.get("system_prompt", "You are a helpful AI assistant.") + + # Get custom generation parameters or use defaults + params = job_input.get("params", {}) + sampling_params = SamplingParams( + temperature=params.get("temperature", default_params.temperature), + top_p=params.get("top_p", default_params.top_p), + max_tokens=params.get("max_tokens", default_params.max_tokens) + ) + + # Format prompt for chat + formatted_prompt = f"[INST] <>\n{system_prompt}\n<>\n\n{prompt} [/INST]" + + # Generate text + outputs = model.generate([formatted_prompt], sampling_params) + + # Format output + generated_text = outputs[0].outputs[0].text + + return { + "generated_text": generated_text, + "model": "meta-llama/Llama-3-8b-chat-hf", + "usage": { + "prompt_tokens": len(prompt.split()), + "completion_tokens": len(generated_text.split()), + "total_tokens": len(prompt.split()) + len(generated_text.split()) + } + } + + except Exception as e: + return {"error": str(e)} + +# Start the serverless function +runpod.serverless.start({"handler": handler}) +``` + +### Step 2: Create a Dockerfile + +Create a `Dockerfile`: + +```dockerfile +FROM runpod/pytorch:2.2.0-py3.10-cuda12.1.0-devel + +WORKDIR /app + +# Install dependencies +RUN pip install --no-cache-dir runpod vllm transformers accelerate + +# Copy handler code +COPY handler.py . + +# Set environment variables +ENV HUGGING_FACE_HUB_TOKEN="your_hf_token" # Replace with your token +ENV RUNPOD_VLLM_MODEL="meta-llama/Llama-3-8b-chat-hf" + +# Start the handler +CMD ["python", "-u", "handler.py"] +``` + +### Step 3: Build and push the image + +```bash +docker build -t your-username/llm-endpoint:latest . +docker push your-username/llm-endpoint:latest +``` + +### Step 4: Deploy the endpoint + +1. Go to the RunPod Serverless console +2. Create a new endpoint with your image +3. Select an appropriate GPU (A10G, A100, etc.) +4. Configure workers based on expected traffic +5. 
Deploy + +## Sending requests + +Send requests to your endpoint: + +```python +import requests +import json + +# Replace with your endpoint ID and API key +ENDPOINT_ID = "your-endpoint-id" +API_KEY = "your-api-key" + +def generate_text(prompt, system_prompt="You are a helpful AI assistant.", **params): + url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run" + + payload = { + "input": { + "prompt": prompt, + "system_prompt": system_prompt, + "params": params + } + } + + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {API_KEY}" + } + + response = requests.post(url, headers=headers, data=json.dumps(payload)) + return response.json() + +# Example usage +result = generate_text( + "Explain quantum computing in simple terms.", + temperature=0.5, + max_tokens=300 +) + +print(result) +``` + +## Performance optimization + +### Model size vs. performance + +| Model Size | GPU Recommendation | Throughput | Latency | +|------------|-------------------|------------|---------| +| 7B-8B | L4, RTX 4090 | Medium-High | Low | +| 13B-14B | A10G, A6000 | Medium | Medium | +| 30B-70B | A100 40GB/80GB | Low-Medium | Medium-High | + +### Quantization + +Add quantization to reduce memory usage and increase throughput: + +```python +# Modify the init_model function +def init_model(): + global model + model = LLM( + model="meta-llama/Llama-3-8b-chat-hf", + tensor_parallel_size=1, + trust_remote_code=True, + quantization="awq" # Use AWQ quantization + ) + return model +``` + +### Caching + +Enable caching to improve performance for repeated queries: + +```python +# Modify the init_model function +def init_model(): + global model + model = LLM( + model="meta-llama/Llama-3-8b-chat-hf", + tensor_parallel_size=1, + trust_remote_code=True, + cache_size=1024 # Cache up to 1024 requests + ) + return model +``` + +## Monitoring and scaling + +### Configure optimal scaling + +For text generation endpoints: + +- **For experimentation**: Min 0, Max 1-2, Idle 60s +- **For production**: Min 1, Max 5+, Idle 300s + +### Monitor performance + +Check your endpoint metrics to: +- Track usage patterns +- Identify bottlenecks +- Optimize cost + +## Integration examples + +### Web application + +```javascript +async function generateText() { + const prompt = document.getElementById('prompt').value; + + const response = await fetch('https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + input: { + prompt: prompt + } + }) + }); + + const data = await response.json(); + document.getElementById('result').innerText = data.output.generated_text; +} +``` + +### Async processing for long tasks + +For long-running generation, use the async API: + +```python +# Submit job +response = requests.post( + f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run", + headers=headers, + json=payload +) +job_id = response.json()["id"] + +# Check status and get result when done +status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}" +while True: + status = requests.get(status_url, headers=headers).json() + if status["status"] == "COMPLETED": + result = status["output"] + break + time.sleep(1) +``` + +## Next steps + +- [Explore image generation](/docs/serverless/examples/image-generation) +- [Learn about chaining endpoints](/docs/serverless/examples/chaining-endpoints) +- [Optimize costs](/docs/serverless/manage/optimize) \ No newline at end of file diff --git 
a/docs/serverless/get-started.md b/docs/serverless/get-started.md index bbea805a..cba51c72 100644 --- a/docs/serverless/get-started.md +++ b/docs/serverless/get-started.md @@ -1,132 +1,201 @@ --- -title: "Get started with Endpoints" +title: "Step-by-step guide" sidebar_position: 2 -description: Master the art of building Docker images, deploying Serverless endpoints, and sending requests with this comprehensive guide, covering prerequisites, RunPod setup, and deployment steps. +description: "Follow this detailed step-by-step guide to build and deploy a custom serverless endpoint on RunPod. Set up your development environment, create a handler, and deploy your application." --- -## Build a Serverless Application on RunPod +# Building a custom endpoint: Step-by-step -Follow these steps to set up a development environment, create a handler file, test it locally, and build a Docker image for deployment: +This comprehensive guide walks you through creating and deploying a custom serverless endpoint on RunPod from scratch. We'll build a simple application that you can later adapt for your specific needs. -1. Create a Python virtual environment and install RunPod SDK +## Prerequisites -```bash -# 1. Create a Python virtual environment -python3 -m venv venv +Before you begin, make sure you have: +- A RunPod account ([Sign up here](https://www.runpod.io/console/serverless)) +- Docker installed on your machine ([Get Docker](https://docs.docker.com/get-docker/)) +- Python 3.10 or later +- Basic understanding of Python and Docker -# 2. Activate the virtual environment -# On macOS/Linux: +## Step 1: Set up your development environment -source venv/bin/activate +1. Create a new directory for your project: + ```bash + mkdir my-serverless-app + cd my-serverless-app + ``` -# On Windows: -venv\Scripts\activate +2. Create and activate a Python virtual environment: + ```bash + # Create virtual environment + python3 -m venv venv -# 3. Install the RunPod SDK -pip install runpod -``` + # Activate it (macOS/Linux) + source venv/bin/activate + # OR (Windows) + venv\Scripts\activate + ``` + +3. Install the RunPod SDK: + ```bash + pip install runpod + ``` + +## Step 2: Create your handler function -2. Create the handler file (rp_handler.py): +Create a file named `handler.py` with this basic template: ```python import runpod -import time def handler(event): - input = event['input'] - instruction = input.get('instruction') - seconds = input.get('seconds', 0) - - # Placeholder for a task; replace with image or text generation logic as needed - time.sleep(seconds) - result = instruction.replace(instruction.split()[0], 'created', 1) - - return result - -if __name__ == '__main__': - runpod.serverless.start({'handler': handler}) + """ + Process requests sent to your serverless endpoint. + This function is invoked when a request is sent to your endpoint. + """ + try: + # Get input from the request + job_input = event["input"] + + # Process the input (customize this for your needs) + # This is where your application logic goes + result = { + "message": f"Processed input: {job_input}", + "status": "success", + "timestamp": runpod.utils.get_utc_timestamp() + } + + # Return the result + return result + + except Exception as e: + # Return error information if something goes wrong + return {"error": str(e)} + +# Start the serverless function +runpod.serverless.start({"handler": handler}) ``` -3. 
Create a test_input.json file in the same folder: +## Step 3: Create a test input file -```python +Create a file named `test_input.json` to test your handler locally: + +```json { "input": { - "instruction": "create a image", - "seconds": 15 + "message": "Hello, RunPod!", + "parameter": 42 } } ``` -4. Test the handler code locally: +## Step 4: Test locally -```python -python3 rp_handler.py +Run your handler locally to ensure it works correctly: -# You should see an output like this: ---- Starting Serverless Worker | Version 1.7.0 --- -INFO | Using test_input.json as job input. -DEBUG | Retrieved local job: {'input': {'instruction': 'create a image', 'seconds': 15}, 'id': 'local_test'} -INFO | local_test | Started. -DEBUG | local_test | Handler output: created a image -DEBUG | local_test | run_job return: {'output': 'created a image'} -INFO | Job local_test completed successfully. -INFO | Job result: {'output': 'created a image'} -INFO | Local testing complete, exiting. +```bash +python handler.py ``` -5. Create a Dockerfile: - -```docker -FROM python:3.10-slim - -WORKDIR / -RUN pip install --no-cache-dir runpod -COPY rp_handler.py / +You should see output similar to: -# Start the container -CMD ["python3", "-u", "rp_handler.py"] ``` - -6. Build and push your Docker image - -```command -docker build --platform linux/amd64 --tag /: . +--- Starting Serverless Worker | Version X.X.X --- +INFO | Using test_input.json as job input. +DEBUG | Retrieved local job: {'input': {'message': 'Hello, RunPod!', 'parameter': 42}, 'id': 'local_test'} +INFO | local_test | Started. +DEBUG | local_test | Handler output: {'message': "Processed input: {'message': 'Hello, RunPod!', 'parameter': 42}", 'status': 'success', 'timestamp': '2023-08-01T15:30:45Z'} +INFO | Job local_test completed successfully. +INFO | Local testing complete, exiting. ``` -7. Push to your container registry: +## Step 5: Containerize your application + +1. Create a `Dockerfile`: + ```dockerfile + FROM python:3.10-slim + + WORKDIR /app + + # Install dependencies + RUN pip install --no-cache-dir runpod + + # Copy your handler code + COPY handler.py /app/ + COPY test_input.json /app/ + + # Start the handler + CMD ["python", "-u", "handler.py"] + ``` + +2. Build your Docker image: + ```bash + docker build --platform linux/amd64 -t your-username/serverless-app:latest . + ``` + + Replace `your-username` with your Docker Hub username or registry prefix. + +3. Test your container locally: + ```bash + docker run your-username/serverless-app:latest + ``` + +4. Push to Docker Hub or your registry: + ```bash + docker push your-username/serverless-app:latest + ``` + +## Step 6: Deploy to RunPod + +1. Go to the [RunPod Serverless Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Enter your Docker image URL (e.g., `your-username/serverless-app:latest`) +4. Configure your endpoint: + - **Name**: Choose a descriptive name for your endpoint + - **GPU Type**: Select the appropriate GPU (or CPU) based on your needs + - **Worker Count**: Set to 0 for scale-to-zero or 1+ to keep workers warm + - **Max Workers**: Set the maximum number of concurrent workers + - **Idle Timeout**: How long to keep workers alive after finishing a job + - **Flash Boot**: Enable for faster cold starts (if needed) + +5. 
Click "Deploy" to create your endpoint + +## Step 7: Test your endpoint + +Once deployed, you can test your endpoint using the RunPod console or with curl: -```command -docker push /: +```bash +curl -X POST "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "input": { + "message": "Hello from API request!", + "parameter": 100 + } + }' ``` -:::note +Replace `YOUR_ENDPOINT_ID` and `YOUR_API_KEY` with your actual values. -When building your docker image, you might need to specify the platform you are building for. -This is important when you are building on a machine with a different architecture than the one you are deploying to. +## Step 8: Monitor and adjust -When building for RunPod providers use `--platform=linux/amd64`. +1. Check the logs and metrics in the RunPod console +2. Adjust worker count and idle timeout based on your observed traffic patterns +3. Update your endpoint as needed by pushing new Docker images -::: +## Next steps -Alternatively, you can clone our [worker-basic](https://github.com/runpod-workers/worker-basic) repository to quickly build a Docker image and push it to your container registry for a faster start. +Now that you've deployed your custom endpoint, you can: -Now that you've pushed your container registry, you're ready to deploy your Serverless Endpoint to RunPod. +- Add more complex processing logic to your handler function +- Integrate with machine learning models or other libraries +- Set up CI/CD for automated deployments +- Connect your endpoint to your applications -## Deploy a Serverless Endpoint +For advanced usage, explore: -This step will walk you through deploying a Serverless Endpoint to RunPod. You can refer to this walkthrough to deploy your own custom Docker image. +- [Worker development](workers/overview.md) +- [Endpoint management](endpoints/manage-endpoints.md) +- [Configure autoscaling](manage/scaling.md) - +> **Pro tip**: For local development, you can use our [example repository](https://github.com/runpod-workers/worker-basic) as a starting point. diff --git a/docs/serverless/index.md b/docs/serverless/index.md new file mode 100644 index 00000000..5950cdd1 --- /dev/null +++ b/docs/serverless/index.md @@ -0,0 +1,96 @@ +--- +title: RunPod Serverless +description: "Deploy, scale, and manage AI applications with RunPod Serverless. Build custom endpoints or use pre-configured models with pay-per-second pricing." +sidebar_position: 1 +slug: /serverless +--- + +# RunPod Serverless + +Deploy and scale AI applications without managing infrastructure. RunPod Serverless handles the complexity, so you can focus on building. + +## Choose your path + +
+- **🚀 I want to deploy quickly**: Deploy pre-built AI models in minutes without writing code.
+- **🧩 I want to build custom**: Build and deploy custom AI applications with your code.
+- **⚙️ I'm managing endpoints**: Scale, monitor, and optimize your serverless endpoints.
+- **🔍 I need specific information**: Find technical details and reference documentation.
+ +## Key features + +- **Pay-per-second pricing**: Only pay for what you use +- **Automatic scaling**: Scale to meet demand, then down to zero when idle +- **Global availability**: Deploy your endpoints close to your users +- **Custom containers**: Use any Docker container with your preferred framework +- **Fast cold starts**: Start processing in seconds, not minutes +- **Comprehensive monitoring**: Track usage, performance, and costs + +## When to use Serverless + +| Use case | Description | +|----------|-------------| +| AI inference | Deploy models that handle real-time prediction requests | +| Batch processing | Process large datasets with parallel workers | +| API backends | Build scalable APIs that handle variable workloads | +| Periodic tasks | Schedule jobs that run on a regular basis | + +## Get started + +Ready to deploy your first serverless endpoint? Choose one of these guides: + +- [Deploy a pre-built model](/docs/serverless/quick-start/deploy-models) - Get started in minutes +- [Build a custom endpoint](/docs/serverless/build/first-endpoint) - Create your own application +- [Set up GitHub integration](/docs/serverless/github-integration) - Connect to your GitHub repository \ No newline at end of file diff --git a/docs/serverless/manage/scaling.md b/docs/serverless/manage/scaling.md new file mode 100644 index 00000000..3089107f --- /dev/null +++ b/docs/serverless/manage/scaling.md @@ -0,0 +1,130 @@ +--- +title: Configure autoscaling +description: "Learn how to optimize the performance and cost of your RunPod serverless endpoints through effective autoscaling configuration. Find strategies for different workload patterns." +sidebar_position: 1 +--- + +# Configure autoscaling + +RunPod's serverless platform provides powerful autoscaling capabilities that allow your endpoints to dynamically adjust to changing workloads. This guide will help you configure autoscaling for optimal performance and cost efficiency. + +## How autoscaling works + +RunPod Serverless uses a queue-based system to manage requests and automatically scale workers: + +1. **Requests enter the queue**: When requests are sent to your endpoint, they enter a queue +2. **Workers process the queue**: Available workers pull jobs from the queue +3. **Automatic scaling**: When the queue builds up, RunPod spawns more workers (up to your max) +4. **Scale down**: When workers are idle for the specified timeout period, they are shut down + +## Key autoscaling parameters + +| Parameter | Description | Default | Recommendation | +|-----------|-------------|---------|---------------| +| **Min Workers** | Minimum number of workers to keep running | 0 | Set to 1+ for low latency | +| **Max Workers** | Maximum number of workers to scale to | 1 | Set based on peak demand | +| **Idle Timeout** | How long to keep inactive workers (seconds) | 30 | 60-300 for balance | +| **Flash Boot** | Pre-warm workers for faster startup | Off | Enable for latency-sensitive apps | + +## Configuring autoscaling + +1. **Navigate to your endpoint** in the RunPod console +2. Click **Edit** next to your endpoint +3. 
Adjust the following settings: + +### Min Workers + +Set the minimum number of workers to keep running at all times: + +- **0 workers**: Scale to zero when idle (lowest cost, highest cold start latency) +- **1+ workers**: Always keep some workers running (higher cost, instant responses) + +### Max Workers + +Set the maximum number of workers that can run concurrently: + +- Set this based on your expected peak demand +- Consider your budget and quota limits +- Higher values allow for better handling of traffic spikes + +### Idle Timeout + +Configure how long (in seconds) to keep inactive workers running: + +- **Low values (30-60s)**: More aggressive scale down, lower costs +- **High values (5-30min)**: More stable availability, lower cold starts, higher costs + +## Choosing the right configuration + +### For low cost, occasional use + +``` +Min Workers: 0 +Max Workers: 1-3 +Idle Timeout: 30-60 seconds +``` + +Best for: Personal projects, testing, infrequent usage + +### For balanced performance and cost + +``` +Min Workers: 1 +Max Workers: 5-10 +Idle Timeout: 2-5 minutes +``` + +Best for: Production applications with moderate, variable traffic + +### For high-performance applications + +``` +Min Workers: 2+ +Max Workers: 20+ +Idle Timeout: 5-10 minutes +Flash Boot: Enabled +``` + +Best for: Production applications with high traffic or strict latency requirements + +## Monitoring autoscaling performance + +1. Navigate to your endpoint in the RunPod console +2. Click the **Metrics** tab to view: + - Active workers over time + - Queue depth + - Request latency + - Worker utilization + +Use these metrics to fine-tune your autoscaling configuration. + +## Advanced strategies + +### For batch processing + +If you're running batch jobs where throughput is more important than latency: + +- Set a higher **Max Workers** to process more jobs in parallel +- Consider a lower **Idle Timeout** to reduce costs between batches +- Use the [RunPod Sync API](/docs/serverless/reference/api) for coordinating batch jobs + +### For consistent low latency + +If your application requires consistently low latency: + +- Set **Min Workers** to at least 1 to avoid cold starts +- Enable **Flash Boot** to pre-warm additional workers +- Consider using [Multiple GPU types](/docs/serverless/reference/configurations) for cost optimization + +## Common pitfalls + +- **Setting Max Workers too low**: Can cause queue buildup during traffic spikes +- **Setting Min Workers too high**: Increases costs unnecessarily during low-traffic periods +- **Setting Idle Timeout too low**: Can cause thrashing (constant scaling up and down) +- **Not accounting for initialization time**: Some models take time to load, adjust accordingly + +## Next steps + +- [Monitor performance](/docs/serverless/manage/monitoring) - Track your endpoint's metrics +- [Optimize resources](/docs/serverless/manage/optimize) - Further tune for cost and performance +- [Manage jobs](/docs/serverless/manage/jobs) - Learn how to manage individual job requests \ No newline at end of file diff --git a/docs/serverless/overview.md b/docs/serverless/overview.md index fd21c676..69fb90c6 100644 --- a/docs/serverless/overview.md +++ b/docs/serverless/overview.md @@ -1,32 +1,98 @@ --- -title: Overview -description: "Scale machine learning workloads with RunPod Serverless, offering flexible GPU computing for AI inference, training, and general compute, with pay-per-second pricing and fast deployment options for custom endpoints and handler functions." 
-sidebar_position: 1 +title: Concepts overview +description: "Learn the core concepts of RunPod Serverless - how it works, key components, and fundamental architecture. Understand workers, endpoints, and the request lifecycle." +sidebar_position: 4 --- -RunPod offers Serverless GPU and CPU computing for AI inference, training, and general compute, allowing users to pay by the second for their compute usage. -This flexible platform is designed to scale dynamically, meeting the computational needs of AI workloads from the smallest to the largest scales. +# RunPod Serverless concepts -You can use the following methods: +This guide explains the core concepts of RunPod Serverless to help you understand how the platform works and how different components interact with each other. -- Handler Functions: Bring your own functions and run in the cloud. -- Quick Deploy: Quick deploys are pre-built custom endpoints of the most popular AI models. +## Architecture overview -## Why RunPod Serverless? +RunPod Serverless operates as a managed container platform that automatically handles scaling, infrastructure, and resource allocation. Here's how it works: -You should choose RunPod Serverless instances for the following reasons: +1. **Containers**: Your code runs inside Docker containers with the resources you specify (GPU, memory, etc.) +2. **Request queue**: API requests are placed in a queue specific to your endpoint +3. **Workers**: Container instances process jobs from the queue in parallel +4. **Auto-scaling**: Workers are dynamically created or destroyed based on demand -- **AI Inference:** Handle millions of inference requests daily and can be scaled to handle billions, making it an ideal solution for machine learning inference tasks. This allows users to scale their machine learning inference while keeping costs low. -- **Autoscale:** Dynamically scale workers from 0 to 100 on the Secure Cloud platform, which is highly available and distributed globally. This provides users with the computational resources exactly when needed. -- **AI Training:** Machine learning training tasks that can take up to 12 hours. GPUs can be spun up per request and scaled down once the task is done, providing a flexible solution for AI training needs. -- **Container Support:** Bring any Docker container to RunPod. Both public and private image repositories are supported, allowing users to configure their environment exactly how they want. -- **3s Cold-Start:** To help reduce cold-start times, RunPod proactively pre-warms workers. The total start time will vary based on the runtime, but for stable diffusion, the total start time is 3 seconds cold-start plus 5 seconds runtime. -- **Metrics and Debugging:** Transparency is vital in debugging. RunPod provides access to GPU, CPU, Memory, and other metrics to help users understand their computational workloads. Full debugging capabilities for workers through logs and SSH are also available, with a web terminal for even easier access. -- **Webhooks:** Users can leverage webhooks to get data output as soon as a request is done. Data is pushed directly to the user's Webhook API, providing instant access to results. + -RunPod Serverless are not just for AI Inference and Training. -They're also great for a variety of other use cases. -Feel free to use them for tasks like rendering, molecular dynamics, or any other computational task that suits your needs. +## Core components + +### Endpoints + +An endpoint is the public REST API URL that serves your application. 
It has: + +- A unique ID and API key for authentication +- Configuration for scaling, resources, and networking +- A job queue that manages all incoming requests + +### Workers + +Workers are container instances that process jobs from the queue: + +- Each worker runs your container image +- Workers can be scaled from 0 to many instances +- Workers are isolated for security and performance + +### Handler functions + +A handler function is the code that processes each request: + +```python +def handler(event): + # Process input data + job_input = event["input"] + + # Perform work here + result = process_data(job_input) + + # Return output + return result +``` + +### Jobs + +Each request to your endpoint creates a job: + +- Jobs have unique IDs for tracking +- Jobs can be synchronous (wait for result) or asynchronous (get result later) +- Jobs have states (PENDING, IN_PROGRESS, COMPLETED, etc.) + +## Request lifecycle + +1. **Request submission**: A client sends a POST request to your endpoint's URL +2. **Queue entry**: The request becomes a job in the queue +3. **Worker assignment**: A worker picks up the job when available +4. **Processing**: Your handler function processes the job +5. **Response**: The result is returned to the client + +## Scalability concepts + +### Min/Max workers + +- **Min workers**: The minimum number of workers to keep running (even when idle) +- **Max workers**: The maximum number of concurrent workers to scale to + +### Idle timeout + +The time (in seconds) a worker remains active after finishing a job before shutting down. + +### Flash boot + +A feature that pre-warms workers to reduce cold start latency. + +## Next steps + +Now that you understand the core concepts: + +1. [Deploy a pre-built model](quick-start/deploy-models.md) for a quick start +2. [Create your first endpoint](build/first-endpoint.md) to build something custom +3. [Learn about handler functions](workers/handlers/overview.md) for advanced development + +> **Pro tip**: For optimal performance and cost-efficiency, tailor your worker count and idle timeout to your workload patterns. + + +4. **Configure your endpoint** + + Set your endpoint configurations: + + - **Name**: Give your endpoint a descriptive name + - **GPU Type**: Select the GPU type (A100, A6000, etc.) + - **Worker Count**: How many concurrent workers to run + - **Idle Timeout**: How long to keep workers warm after inactivity + +5. **Deploy** + + Click "Deploy" to create your endpoint. The deployment typically takes 1-2 minutes. + +6. **Get your endpoint ID and API Key** + + Once deployed, note your Endpoint ID and ensure you have your API key (available in your account settings). + +## Testing your model + +### Using the web interface + +1. Navigate to your endpoint in the RunPod console +2. Click "Test" in the sidebar +3. Enter your test input in the provided JSON editor +4. Click "Run" to see the results + +### Using curl + +```bash +curl -X POST "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "input": { + "prompt": "a photo of an astronaut riding a horse on mars" + } + }' +``` + +Replace `YOUR_ENDPOINT_ID` and `YOUR_API_KEY` with your actual values. 
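+
+The `/run` route is asynchronous: it returns a job ID that you poll at `/status/YOUR_JOB_ID` until the result is ready. For quick interactive tests, RunPod also exposes a synchronous `/runsync` route that waits for the job to finish and returns the output in a single response. A minimal sketch using the same placeholder values:
+
+```bash
+curl -X POST "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer YOUR_API_KEY" \
+  -d '{
+    "input": {
+      "prompt": "a photo of an astronaut riding a horse on mars"
+    }
+  }'
+```
+
+Because `/runsync` holds the connection open while the job runs, reserve it for short-running jobs and use `/run` with status polling for anything longer.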
+ +## Example: Stable Diffusion XL + +Here's an example input for Stable Diffusion XL: + +```json +{ + "input": { + "prompt": "A serene mountain landscape with a lake, photorealistic, 8k", + "negative_prompt": "blurry, distorted, low quality", + "num_inference_steps": 30, + "guidance_scale": 7.5 + } +} +``` + +## Next steps + +- Customize model parameters - Adjust settings for better results +- Integrate with your application - Connect your model to your app +- Monitor performance - Track usage and optimize costs \ No newline at end of file diff --git a/docs/serverless/reference/configurations.md b/docs/serverless/reference/configurations.md new file mode 100644 index 00000000..2f08f905 --- /dev/null +++ b/docs/serverless/reference/configurations.md @@ -0,0 +1,142 @@ +--- +title: Configuration options +description: "Complete reference guide for RunPod Serverless configuration options. Learn about all available settings for endpoints, workers, networking, storage, and advanced features." +sidebar_position: 1 +--- + +# Configuration options + +This reference guide details all available configuration options for RunPod Serverless endpoints. + +## Endpoint configurations + +### Basic settings + +| Setting | Description | Default | Notes | +|---------|-------------|---------|-------| +| **Name** | Identifier for your endpoint | - | Alphanumeric, hyphens allowed | +| **Image** | Container image URL | - | Public or private registry | +| **GPU Type** | Hardware type for compute | - | See [GPU options](#gpu-options) | +| **Worker Count** | Minimum workers to keep warm | 0 | Range: 0-1000 | +| **Max Workers** | Maximum concurrent workers | 1 | Range: 1-1000 | +| **Idle Timeout** | Seconds before scaling down | 30 | Range: 10-3600 | +| **Flash Boot** | Pre-warm workers | Disabled | Reduces cold start latency | + +### Network settings + +| Setting | Description | Default | Notes | +|---------|-------------|---------|-------| +| **Public Network** | Allow internet access | Enabled | Required for most use cases | +| **VPC** | Virtual Private Cloud | None | For secure networking | +| **Custom Domains** | Custom endpoint URL | None | Requires SSL certificate | + +### Storage settings + +| Setting | Description | Default | Notes | +|---------|-------------|---------|-------| +| **Volume Size** | Additional storage (GB) | 0 | Range: 0-1000 | +| **Volume Type** | Storage performance tier | Standard | Standard or SSD | +| **Persistent Storage** | Keep data between runs | Disabled | Higher cost, data preservation | + +### Advanced settings + +| Setting | Description | Default | Notes | +|---------|-------------|---------|-------| +| **Container Concurrency** | Jobs per worker | 1 | Range: 1-10 | +| **Memory** | RAM per worker (GB) | Auto | Based on GPU/CPU | +| **vCPUs** | CPU cores per worker | Auto | Based on GPU type | +| **Environment Variables** | Container environment | None | Format: KEY=value | +| **Secrets** | Encrypted environment vars | None | For sensitive data | +| **Container Template ID** | Template reference | None | For template deployment | + +## GPU options + +### NVIDIA GPUs + +| GPU | vCPUs | Memory | Best for | +|-----|-------|--------|----------| +| **A100 80GB** | 30 | 80GB | Large LLMs, multi-modal models | +| **A100 40GB** | 24 | 40GB | Most LLMs, large diffusion models | +| **A10G** | 12 | 24GB | Medium models, production workloads | +| **L4** | 8 | 24GB | Efficient, cost-effective inference | +| **A6000** | 14 | 48GB | Research, 3D rendering | +| **A5000** | 12 | 24GB | Computer 
vision, mixed workloads | +| **A4000** | 10 | 16GB | Smaller models, cost effective | +| **RTX 4090** | 12 | 24GB | Fast cost-effective inference | +| **RTX 3090** | 10 | 24GB | Good balance for most models | + +### CPU only + +| CPU | vCPUs | Memory | Best for | +|-----|-------|--------|----------| +| **4 Core** | 4 | 16GB | API servers, orchestration | +| **8 Core** | 8 | 32GB | Data processing, medium workloads | +| **16 Core** | 16 | 64GB | Heavy CPU computation | + +## Environment variables + +You can set these environment variables to configure worker behavior: + +| Variable | Description | Default | +|----------|-------------|---------| +| `RUNPOD_WEBHOOK_URL` | URL to receive job completion webhooks | None | +| `RUNPOD_WEBHOOK_SECRET` | Secret for webhook authentication | None | +| `RUNPOD_ENDPOINT_ID` | Endpoint identifier (auto-set) | - | +| `RUNPOD_API_KEY` | For API actions within container | None | +| `RUNPOD_LOG_LEVEL` | Logging verbosity (DEBUG,INFO,WARN,ERROR) | INFO | +| `RUNPOD_TIMEOUT_GRACE` | Grace period before timeout (seconds) | 30 | +| `RUNPOD_TRUSTED_ORIGINS` | CORS origins for direct access | None | + +## Endpoint template format + +If you're using CI/CD or Infrastructure as Code, here's the JSON schema for endpoint configuration: + +```json +{ + "name": "my-endpoint", + "image": "username/image:tag", + "gpu": "A10G", + "minWorkers": 0, + "maxWorkers": 5, + "idleTimeout": 60, + "flashBoot": false, + "network": { + "publicNetwork": true, + "vpc": null, + "domains": [] + }, + "storage": { + "size": 20, + "type": "ssd", + "persistent": false + }, + "advanced": { + "containerConcurrency": 1, + "memory": null, + "vcpu": null, + "envVars": { + "KEY1": "value1", + "KEY2": "value2" + }, + "secrets": { + "API_KEY": "secret_value" + } + } +} +``` + +## Configuration best practices + +- **Start with minimal resources** and scale up as needed +- **Test locally** before deployment to ensure container works +- **Use environment variables** for configuration that changes between environments +- **Use secrets** for sensitive information like API keys +- **Consider persistent storage** for large models or datasets +- **Monitor metrics** to fine-tune your configuration + +## Related guides + +- [Configure autoscaling](/docs/serverless/manage/scaling) - Optimize performance and cost +- [Optimize resources](/docs/serverless/manage/optimize) - Fine-tune resource usage +- [API reference](/docs/serverless/reference/api) - Programmatically manage endpoints +- [Troubleshooting guide](/docs/serverless/reference/troubleshooting) - Solve common issues \ No newline at end of file diff --git a/docs/serverless/reference/troubleshooting.md b/docs/serverless/reference/troubleshooting.md new file mode 100644 index 00000000..ca1daa63 --- /dev/null +++ b/docs/serverless/reference/troubleshooting.md @@ -0,0 +1,226 @@ +--- +title: Troubleshooting guide +description: "Comprehensive guide to troubleshooting common issues with RunPod Serverless. Find solutions for deployment problems, runtime errors, performance issues, and more." +--- + +# Troubleshooting guide + +This guide helps you diagnose and resolve common issues with RunPod Serverless. 
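+
+Before working through the sections below, a quick first check is the endpoint's `/health` route, which reports current worker and job counts. A minimal sketch, substituting your own endpoint ID and API key:
+
+```bash
+curl -H "Authorization: Bearer YOUR_API_KEY" \
+  "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/health"
+```
+
+If the response shows no available workers or a growing number of queued jobs, start with the deployment and scaling sections; otherwise the problem is more likely in your handler code or request format.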
+ +## Deployment issues + +### Endpoint failed to deploy + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| "Failed to pull image" | Image doesn't exist or private | Verify image name, check registry permissions | +| "GPU not available" | GPU type unavailable/quota limit | Try different GPU type or contact support | +| "Container exited with code 1" | Container startup error | Check container logs for errors | +| "Network error" | Network configuration issue | Check network settings, VPC configurations | + +#### Checking container logs: + +1. Go to your endpoint in the RunPod console +2. Click on the "Logs" tab +3. Look for error messages during container startup + +### Container startup issues + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| "ImportError: No module named X" | Missing dependency | Add dependency to Dockerfile | +| "CUDA error: out of memory" | Model too large for GPU | Use larger GPU or optimize model | +| "Permission denied" | File permissions issue | Fix permissions in Dockerfile or in handler | + +#### Example: Adding dependencies + +```dockerfile +# Add missing dependencies +RUN pip install --no-cache-dir missing-package +``` + +## Runtime errors + +### Request failures + +| Error code | Description | Possible solution | +|------------|-------------|-------------------| +| 400 | Bad request format | Check input JSON format | +| 401 | Unauthorized | Verify API key is correct and active | +| 404 | Endpoint not found | Verify endpoint ID is correct | +| 429 | Rate limit exceeded | Reduce request rate or increase quota | +| 500 | Server error | Check endpoint logs for errors | +| 503 | Service unavailable | Endpoint may be overloaded or down | + +### Common handler errors + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| "KeyError" | Missing required input field | Check input validation in handler | +| "CUDA out of memory" | Input too large or memory leak | Optimize memory usage, batch processing | +| "Timeout error" | Processing took too long | Optimize handler, increase timeout, disable flash boot | + +#### Example: Input validation + +```python +def handler(event): + job_input = event.get("input", {}) + + # Validate required fields + if "text" not in job_input: + return {"error": "Missing required field 'text'"} + + # Process valid input + # ... +``` + +## Performance issues + +### Slow cold start + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| First request takes >30s | Large container/model | Use smaller model, optimize container, enable flash boot | +| Inconsistent cold starts | Resource contention | Increase min workers to avoid cold starts | + +#### Optimizing cold starts: + +1. Minimize container size +2. Load models on demand rather than at startup +3. Use quantized models +4. Enable flash boot option +5. 
Increase min workers to 1+ to keep container warm + +### High latency + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| All requests slow | Inefficient handler code | Profile and optimize handler | +| Intermittent slowdowns | Worker overload | Check metrics, adjust container concurrency | +| Queue delays | Not enough workers | Increase max workers, adjust scaling | + +## Scaling issues + +### Not scaling up + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| Requests queuing | Max workers too low | Increase max workers | +| Workers not spawning | Resource quota reached | Request quota increase | +| Slow scaling | GPU availability | Try different GPU types | + +### Excessive scaling + +| Symptom | Possible cause | Solution | +|---------|---------------|----------| +| Too many idle workers | Min workers too high | Reduce min workers | +| High costs | Idle timeout too long | Reduce idle timeout | +| Workers scaling but idle | Handler not processing queue | Check handler implementation | + +## Logs and monitoring + +### Accessing logs + +1. Go to your endpoint in the RunPod console +2. Click on the "Logs" tab +3. Select a worker ID to view specific worker logs +4. Use the search function to filter logs + +### Setting log level + +Set the `RUNPOD_LOG_LEVEL` environment variable to one of: +- `DEBUG` - Verbose debugging information +- `INFO` - Standard operational information (default) +- `WARNING` - Only warnings and errors +- `ERROR` - Only errors + +```dockerfile +ENV RUNPOD_LOG_LEVEL=DEBUG +``` + +## Debugging strategies + +### Local testing + +Test your handler locally before deployment: + +```bash +python handler.py +``` + +This runs the handler with your `test_input.json` file. + +### SSH into worker + +For complex issues, you can SSH into a running worker: + +1. Go to your endpoint in the RunPod console +2. Click "SSH" next to an active worker +3. Use the web terminal to investigate issues + +Common debug commands: +```bash +# Check container logs +cat /var/log/runpod/worker.log + +# Check system resources +nvidia-smi +free -h +df -h + +# Check running processes +ps aux +``` + +### Testing API requests + +Use curl to test your endpoint directly: + +```bash +curl -X POST https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "input": { + "test": "value" + } + }' +``` + +## Common fixes + +### Memory issues + +1. **Use streaming responses** for large outputs +2. **Process data in batches** instead of all at once +3. **Clean up unused variables** to free memory +4. **Use lower precision** (fp16 or int8) for models +5. **Add swap space** in your Dockerfile: + +```dockerfile +# Add swap space for extra memory +RUN fallocate -l 4G /swapfile && \ + chmod 600 /swapfile && \ + mkswap /swapfile && \ + swapon /swapfile +``` + +### Network issues + +1. **Increase request timeout** for long-running operations +2. **Add retry logic** in your client applications +3. **Check firewall settings** if making external API calls +4. **Use VPC** for secure internal communication + +## Getting help + +If you've tried the solutions above and still have issues: + +1. **Check documentation** for updates and known issues +2. **Join the RunPod Discord** for community support +3. 
**Open a support ticket** with detailed information: + - Endpoint ID + - Container logs + - Steps to reproduce the issue + - Error messages + - Screenshots if applicable \ No newline at end of file diff --git a/docs/serverless/workers/overview.md b/docs/serverless/workers/overview.md index 3d825453..20572e7c 100644 --- a/docs/serverless/workers/overview.md +++ b/docs/serverless/workers/overview.md @@ -1,22 +1,112 @@ --- -title: Overview +title: Workers overview sidebar_position: 1 -description: "RunPod is a cloud-based platform for managed function execution, offering fully managed infrastructure, automatic scaling, flexible language support, and seamless integration, allowing developers to focus on code and deploy it easily." +description: "Technical reference for RunPod workers. Learn how workers function, their core components, and implementation details for building serverless applications." --- -Workers run your code in the cloud. +# Workers overview -### Key characteristics +Workers are the foundation of RunPod's serverless platform, executing code in response to API requests and automatically scaling based on demand. This guide provides a technical reference for implementing and working with RunPod workers. -- **Fully Managed Execution**: RunPod takes care of the underlying infrastructure, so your code runs whenever it's triggered, without any server setup or maintenance. -- **Automatic Scaling**: The platform scales your functions up or down based on the workload, ensuring efficient resource usage. -- **Flexible Language Support**: RunPod SDK supports various programming languages, allowing you to write functions in the language you're most comfortable with. -- **Seamless Integration**: Once your code is uploaded, RunPod provides an Endpoint, making it easy to integrate your Handler Functions into any part of your application. +## Core components -## Get started +A RunPod worker consists of these essential components: -To start using RunPod Workers: +### Handler function -1. **Write your function**: Code your Handler Functions in a supported language. -2. **Deploy to RunPod**: Upload your Handler Functions to RunPod. -3. **Integrate and Execute**: Use the provided Endpoint to integrate with your application. +The handler function is the entry point for all requests: + +```python +def handler(event): + """Process incoming requests""" + # Extract input data + job_input = event["input"] + + # Process the input + result = process_data(job_input) + + # Return response + return result +``` + +### Container environment + +Workers run in Docker containers with: +- Pre-configured runtime environment +- GPU acceleration (when needed) +- Network access for external API calls +- Isolated execution environment + +### Configuration options + +Workers can be configured with: +- Container image +- Hardware resources (CPU, RAM, GPU) +- Environment variables +- Network settings +- Scaling parameters + +## Worker lifecycle + +1. **Initialization**: When a worker starts, it loads dependencies and models +2. **Request handling**: The worker processes incoming jobs from the queue +3. **Response delivery**: Results are returned to the client +4. **Scaling**: Workers are created or destroyed based on queue depth +5. 
**Termination**: Workers shut down after the idle timeout + +## Implementation patterns + +### Synchronous processing + +The default pattern processes each request and returns results immediately: + +```python +def handler(event): + # Process input and return result directly + return process_data(event["input"]) +``` + +### Asynchronous processing + +For long-running tasks, process jobs asynchronously: + +```python +def handler(event): + # Start processing in background + process_id = start_background_task(event["input"]) + + # Return job ID for status tracking + return {"process_id": process_id, "status": "processing"} +``` + +### Batch processing + +Process multiple items in a single request: + +```python +def handler(event): + # Process a batch of items + items = event["input"]["items"] + results = [process_item(item) for item in items] + + return {"results": results} +``` + +## Development guidelines + +For optimal worker performance: + +1. **Minimize initialization time**: Load models and dependencies at startup +2. **Optimize memory usage**: Release resources when not needed +3. **Handle errors gracefully**: Return clear error information +4. **Validate input data**: Check required parameters +5. **Implement proper logging**: Use structured logging for troubleshooting + +## Related resources + +- [Handler functions documentation](handlers/overview.md) +- [Development guide](development/overview.md) +- [Deployment options](deploy/deploy.md) +- [Environment variables](development/environment-variables.md) + +For step-by-step tutorials on building workers, see the [Workers tutorials section](/docs/tutorials/workers/overview). diff --git a/docs/serverless/workers/specialized/stable-diffusion.md b/docs/serverless/workers/specialized/stable-diffusion.md new file mode 100644 index 00000000..f965a00c --- /dev/null +++ b/docs/serverless/workers/specialized/stable-diffusion.md @@ -0,0 +1,198 @@ +--- +title: Stable Diffusion worker +description: "Deploy and customize Stable Diffusion image generation models on RunPod using our optimized worker implementation. Generate high-quality images with simple API calls." +sidebar_position: 1 +--- + +# Stable Diffusion worker + +The RunPod Stable Diffusion worker provides an optimized implementation of Stable Diffusion models for image generation. This worker allows you to deploy and interact with Stable Diffusion models through a simple API interface. + +## Features + +- Support for multiple Stable Diffusion versions (XL, 2.1, 1.5) +- Optimized inference for faster generation +- Advanced sampling techniques (DDIM, DPM-Solver, etc.) +- Support for various generation parameters (guidance scale, steps, etc.) +- Image-to-image and inpainting capabilities +- ControlNet and LoRA adaptation support + +## Quick deployment + +### Option 1: Deploy from template + +1. Go to the [RunPod Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Select "Stable Diffusion XL" from the templates +4. Configure your endpoint settings: + - GPU Type: A10G or higher recommended + - Worker Count: 0 (scale to zero) or 1+ for immediate availability + - Max Workers: Based on your expected load +5. 
Click "Deploy" + +### Option 2: Use custom Docker image + +```bash +docker pull runpod/stable-diffusion:latest +``` + +## API usage + +### Text-to-image generation + +```json +{ + "input": { + "prompt": "A photorealistic cat astronaut floating in space, 4k, detailed", + "negative_prompt": "deformed, ugly, bad anatomy", + "width": 1024, + "height": 1024, + "num_inference_steps": 30, + "guidance_scale": 7.5, + "seed": 42 + } +} +``` + +### Image-to-image generation + +```json +{ + "input": { + "prompt": "A castle in a magical forest", + "negative_prompt": "deformed, ugly, bad anatomy", + "image": "https://example.com/input-image.jpg", + "strength": 0.75, + "num_inference_steps": 30, + "guidance_scale": 7.5 + } +} +``` + +## Configuration + +### Environment variables + +| Variable | Description | Default | +|----------|-------------|---------| +| `SD_MODEL` | Model version to use | `stabilityai/stable-diffusion-xl-base-1.0` | +| `PRECISION` | Inference precision | `fp16` | +| `ENABLE_XFORMERS` | Enable memory efficient attention | `true` | +| `MAX_QUEUE_SIZE` | Maximum queue size | `100` | +| `MAX_BATCH_SIZE` | Maximum batch size | `4` | + +### Model options + +| Model ID | Description | VRAM Required | +|----------|-------------|--------------| +| `stabilityai/stable-diffusion-xl-base-1.0` | SDXL base model | 16GB+ | +| `stabilityai/stable-diffusion-2-1-base` | SD 2.1 base model | 8GB+ | +| `runwayml/stable-diffusion-v1-5` | SD 1.5 | 6GB+ | + +## Performance considerations + +- A10G GPUs can process approximately 2-3 SDXL images per minute at 1024x1024 +- For higher throughput, consider using A100 GPUs or increasing max_workers +- Reducing resolution and inference steps can significantly improve throughput +- Using SD 2.1 or 1.5 instead of SDXL can double throughput with slightly lower quality + +## Advanced usage + +### Using ControlNet + +```json +{ + "input": { + "prompt": "A fantasy landscape with mountains", + "controlnet": { + "type": "canny", + "image": "https://example.com/canny-image.jpg", + "conditioning_scale": 0.8 + } + } +} +``` + +### Using LoRA models + +```json +{ + "input": { + "prompt": "A portrait in the style of ", + "lora": { + "model_id": "path/to/lora", + "weight": 0.8 + } + } +} +``` + +## Examples + +### Basic image generation + +```python +import requests +import json +import base64 +from PIL import Image +import io + +API_KEY = "YOUR_API_KEY" +ENDPOINT_ID = "YOUR_ENDPOINT_ID" + +def generate_image(prompt): + url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run" + + payload = { + "input": { + "prompt": prompt, + "negative_prompt": "ugly, deformed, bad anatomy, blurry", + "num_inference_steps": 30, + "guidance_scale": 7.5, + "width": 1024, + "height": 1024 + } + } + + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {API_KEY}" + } + + response = requests.post(url, headers=headers, json=payload) + response_json = response.json() + + # Get job ID + job_id = response_json.get("id") + + # Check status and get result when complete + status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}" + + while True: + status_response = requests.get(status_url, headers=headers) + status_data = status_response.json() + + if status_data.get("status") == "COMPLETED": + # Get the base64 image data + image_data = status_data.get("output", [])[0] + # Convert base64 to image + image = Image.open(io.BytesIO(base64.b64decode(image_data))) + # Save the image + image.save("generated_image.png") + print("Image generated successfully!") + break + + 
time.sleep(2) # Poll every 2 seconds + +# Generate an image +generate_image("A photorealistic cat astronaut floating in space, 4k, detailed") +``` + +## Next steps + +- [Explore vLLM workers](/docs/serverless/workers/vllm/overview) for text generation +- [Learn about text-to-speech workers](/docs/serverless/workers/specialized/tts) for audio generation +- [Configure autoscaling](/docs/serverless/manage/scaling) to optimize cost and performance + +> **Pro tip**: For AI application development, consider using a combination of different worker types to build a complete pipeline (e.g., text generation → image generation → speech synthesis). \ No newline at end of file diff --git a/docs/serverless/workers/specialized/tts.md b/docs/serverless/workers/specialized/tts.md new file mode 100644 index 00000000..5dd99507 --- /dev/null +++ b/docs/serverless/workers/specialized/tts.md @@ -0,0 +1,222 @@ +--- +title: Text-to-Speech worker +description: "Deploy advanced text-to-speech models on RunPod to convert text into natural-sounding speech. Use a variety of voices, languages, and styles with simple API calls." +sidebar_position: 2 +--- + +# Text-to-Speech worker + +The RunPod Text-to-Speech (TTS) worker provides high-quality speech synthesis capabilities powered by cutting-edge models. This worker allows you to convert text into natural-sounding speech with various voices and styles. + +## Features + +- High-quality natural-sounding speech synthesis +- Support for multiple TTS models (XTTS, Bark, Coqui) +- Voice cloning capabilities (from audio samples) +- Multiple language support +- Voice emotion and style control +- Background noise and music mixing options + +## Quick deployment + +### Option 1: Deploy from template + +1. Go to the [RunPod Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Select "XTTS" from the templates +4. Configure your endpoint settings: + - GPU Type: L4 or RTX 4000 series recommended + - Worker Count: 0 (scale to zero) or 1+ for immediate availability + - Max Workers: Based on your expected load +5. Click "Deploy" + +### Option 2: Use custom Docker image + +```bash +docker pull runpod/tts:latest +``` + +## API usage + +### Basic text-to-speech + +```json +{ + "input": { + "text": "Hello, this is a test of the text to speech system. It sounds quite natural.", + "voice": "female_01", + "language": "en", + "speed": 1.0 + } +} +``` + +### Voice cloning + +```json +{ + "input": { + "text": "This is a custom voice saying hello to everyone listening.", + "voice_sample_url": "https://example.com/voice-sample.mp3", + "language": "en", + "speed": 1.0 + } +} +``` + +### Styled speech + +```json +{ + "input": { + "text": "Breaking news! 
Scientists have made an incredible discovery!", + "voice": "male_02", + "style": "newsreader", + "emotion": "excited", + "language": "en" + } +} +``` + +## Configuration + +### Environment variables + +| Variable | Description | Default | +|----------|-------------|---------| +| `TTS_MODEL` | TTS model to use | `XTTS-v2` | +| `DEFAULT_VOICE` | Default voice if not specified | `female_01` | +| `DEFAULT_LANGUAGE` | Default language code | `en` | +| `ENABLE_VOICE_CLONING` | Enable voice cloning feature | `true` | +| `MAX_TEXT_LENGTH` | Maximum input text length | `1000` | + +### Model options + +| Model ID | Description | VRAM Required | +|----------|-------------|--------------| +| `XTTS-v2` | High-quality multilingual TTS | 8GB+ | +| `Bark` | Versatile TTS with ambient sounds | 10GB+ | +| `Coqui-TTS` | Lightweight TTS model | 4GB+ | + +## Voice options + +| Voice ID | Description | Languages | +|----------|-------------|-----------| +| `female_01` | Professional female voice | en, es, fr | +| `female_02` | Young female voice | en, de, fr | +| `male_01` | Deep male voice | en, es, de | +| `male_02` | Professional male voice | en, fr, it | +| `child_01` | Child voice | en | + +## Performance considerations + +- L4 GPUs can generate speech at approximately 20-30x realtime (20-30 seconds of audio per second) +- A single worker can handle multiple concurrent requests +- For high-throughput applications, consider using multiple workers +- Text length directly impacts generation time +- Voice cloning requires additional processing time (5-10 seconds per request) + +## Advanced usage + +### Background music mixing + +```json +{ + "input": { + "text": "Welcome to our podcast about science and technology.", + "voice": "male_01", + "background_music": { + "url": "https://example.com/background-music.mp3", + "volume": 0.2 + } + } +} +``` + +### Multi-voice conversations + +```json +{ + "input": { + "conversation": [ + {"speaker": "female_01", "text": "Hello, how are you today?"}, + {"speaker": "male_01", "text": "I'm doing great, thanks for asking!"}, + {"speaker": "female_01", "text": "Wonderful to hear that. 
Let's get started."} + ], + "language": "en" + } +} +``` + +## Examples + +### Python client example + +```python +import requests +import base64 +import io +from pydub import AudioSegment +from pydub.playback import play + +API_KEY = "YOUR_API_KEY" +ENDPOINT_ID = "YOUR_ENDPOINT_ID" + +def generate_speech(text, voice="female_01", language="en"): + url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run" + + payload = { + "input": { + "text": text, + "voice": voice, + "language": language + } + } + + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {API_KEY}" + } + + response = requests.post(url, headers=headers, json=payload) + response_json = response.json() + + # Get job ID + job_id = response_json.get("id") + + # Check status and get result when complete + status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}" + + while True: + status_response = requests.get(status_url, headers=headers) + status_data = status_response.json() + + if status_data.get("status") == "COMPLETED": + # Get the base64 audio data + audio_data = status_data.get("output", {}).get("audio_base64") + + # Convert base64 to audio + audio_bytes = base64.b64decode(audio_data) + audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3") + + # Save the audio + audio.export("generated_speech.mp3", format="mp3") + print("Speech generated successfully!") + + # Play the audio + play(audio) + break + + time.sleep(1) # Poll every second + +# Generate speech +generate_speech("Hello, this is a test of the text to speech system. It sounds quite natural.") +``` + +## Next steps + +- [Explore vLLM workers](/docs/serverless/workers/vllm/overview) for text generation +- [Learn about Stable Diffusion workers](/docs/serverless/workers/specialized/stable-diffusion) for image generation +- [Configure autoscaling](/docs/serverless/manage/scaling) to optimize cost and performance + +> **Pro tip**: Combine TTS with other workers to create end-to-end applications, such as text generation → speech synthesis or text → image → video narration pipelines. \ No newline at end of file diff --git a/docs/tutorials/workers/custom-worker.md b/docs/tutorials/workers/custom-worker.md new file mode 100644 index 00000000..fc7a7f22 --- /dev/null +++ b/docs/tutorials/workers/custom-worker.md @@ -0,0 +1,721 @@ +--- +title: Build a custom worker from scratch +description: "Learn how to create a custom RunPod worker from scratch. This step-by-step guide covers setting up the project structure, implementing the handler, building the container, and deploying to RunPod Serverless." +sidebar_position: 4 +--- + +# Building a custom worker from scratch + +In this tutorial, you'll learn how to build a custom RunPod worker completely from scratch. Rather than starting with an existing template, we'll walk through creating each component of a worker, helping you understand the architecture and customize it for your specific needs. 
+ +## Prerequisites + +- RunPod account with serverless access +- Docker installed locally +- Basic understanding of Python and Docker +- Git for version control (optional) + +## Step 1: Set up your project structure + +First, let's create a directory structure for our worker: + +```bash +mkdir -p my-custom-worker/{src,builder} +cd my-custom-worker +``` + +Now, create the basic files for your worker: + +```bash +touch handler.py +touch Dockerfile +touch README.md +touch src/__init__.py +touch src/rp_handler.py +touch builder/requirements.txt +``` + +## Step 2: Define your worker's function + +For this tutorial, we'll create an image processing worker that can perform basic operations like resizing, cropping, and applying filters. Let's start by defining the requirements: + +```bash +# Add these to builder/requirements.txt +runpod==1.2.0 +Pillow==10.0.0 +numpy==1.24.3 +requests==2.30.0 +``` + +Now, let's implement the core functionality in `src/image_processor.py`: + +```bash +touch src/image_processor.py +``` + +Edit the file with the following content: + +```python +import os +from io import BytesIO +import base64 +from PIL import Image, ImageFilter, ImageOps, ImageEnhance + +class ImageProcessor: + """ + Class for handling image processing operations + """ + + @staticmethod + def load_image(image_data=None, image_path=None): + """Load an image from data or path""" + if image_data: + return Image.open(BytesIO(image_data)) + elif image_path: + return Image.open(image_path) + else: + raise ValueError("Either image_data or image_path must be provided") + + @staticmethod + def resize_image(image, width=None, height=None, maintain_aspect=True): + """Resize an image to specified dimensions""" + if width is None and height is None: + return image + + if maintain_aspect: + if width and height: + return image.thumbnail((width, height)) + elif width: + ratio = width / image.width + height = int(image.height * ratio) + return image.resize((width, height), Image.LANCZOS) + elif height: + ratio = height / image.height + width = int(image.width * ratio) + return image.resize((width, height), Image.LANCZOS) + else: + if width is None: + width = image.width + if height is None: + height = image.height + return image.resize((width, height), Image.LANCZOS) + + @staticmethod + def crop_image(image, left, top, right, bottom): + """Crop image to specified coordinates""" + return image.crop((left, top, right, bottom)) + + @staticmethod + def apply_filter(image, filter_name): + """Apply a filter to the image""" + filters = { + "blur": ImageFilter.BLUR, + "sharpen": ImageFilter.SHARPEN, + "contour": ImageFilter.CONTOUR, + "edge_enhance": ImageFilter.EDGE_ENHANCE, + "emboss": ImageFilter.EMBOSS, + "smooth": ImageFilter.SMOOTH, + "grayscale": "grayscale" + } + + if filter_name not in filters: + raise ValueError(f"Filter '{filter_name}' not supported. 
Available filters: {list(filters.keys())}") + + if filter_name == "grayscale": + return ImageOps.grayscale(image) + else: + return image.filter(filters[filter_name]) + + @staticmethod + def adjust_image(image, brightness=None, contrast=None, saturation=None): + """Adjust image properties""" + result = image + + if brightness is not None: + enhancer = ImageEnhance.Brightness(result) + result = enhancer.enhance(brightness) + + if contrast is not None: + enhancer = ImageEnhance.Contrast(result) + result = enhancer.enhance(contrast) + + if saturation is not None: + enhancer = ImageEnhance.Color(result) + result = enhancer.enhance(saturation) + + return result + + @staticmethod + def image_to_base64(image, format="JPEG", quality=85): + """Convert image to base64 string""" + buffer = BytesIO() + image.save(buffer, format=format, quality=quality) + img_str = base64.b64encode(buffer.getvalue()).decode('utf-8') + return f"data:image/{format.lower()};base64,{img_str}" + + @staticmethod + def process_image(input_data): + """Process an image based on input parameters""" + try: + # Extract image data - either from URL or base64 + image_data = input_data.get("image", {}) + + if "url" in image_data: + import requests + response = requests.get(image_data["url"], stream=True) + image = ImageProcessor.load_image(image_data=response.content) + elif "base64" in image_data: + # Handle potential base64 prefixes + base64_data = image_data["base64"] + if "base64," in base64_data: + base64_data = base64_data.split("base64,")[1] + + image = ImageProcessor.load_image(image_data=base64.b64decode(base64_data)) + else: + raise ValueError("Image source not specified. Provide 'url' or 'base64'") + + # Get operations to perform + operations = input_data.get("operations", []) + + # Apply each operation in sequence + for operation in operations: + op_type = operation.get("type") + params = operation.get("params", {}) + + if op_type == "resize": + width = params.get("width") + height = params.get("height") + maintain_aspect = params.get("maintain_aspect", True) + image = ImageProcessor.resize_image(image, width, height, maintain_aspect) + + elif op_type == "crop": + left = params.get("left", 0) + top = params.get("top", 0) + right = params.get("right", image.width) + bottom = params.get("bottom", image.height) + image = ImageProcessor.crop_image(image, left, top, right, bottom) + + elif op_type == "filter": + filter_name = params.get("name") + if not filter_name: + raise ValueError("Filter name not specified") + image = ImageProcessor.apply_filter(image, filter_name) + + elif op_type == "adjust": + brightness = params.get("brightness") + contrast = params.get("contrast") + saturation = params.get("saturation") + image = ImageProcessor.adjust_image(image, brightness, contrast, saturation) + + else: + raise ValueError(f"Unknown operation type: {op_type}") + + # Get output format + output_format = input_data.get("output", {}) + format_type = output_format.get("format", "JPEG") + quality = output_format.get("quality", 85) + + # Convert to base64 for output + result_base64 = ImageProcessor.image_to_base64(image, format=format_type, quality=quality) + + return { + "success": True, + "processed_image": result_base64, + "metadata": { + "width": image.width, + "height": image.height, + "format": format_type + } + } + + except Exception as e: + return { + "success": False, + "error": str(e) + } +``` + +## Step 3: Implement the handler + +Now, let's implement the RunPod handler in `src/rp_handler.py`: + +```python +from 
.image_processor import ImageProcessor + +def handler(event): + """ + This is the handler function that will be called by the serverless. + """ + try: + # Get job input + job_input = event.get("input", {}) + + # Process the image + result = ImageProcessor.process_image(job_input) + + # Return the result + return result + + except Exception as e: + # Return error information + return { + "success": False, + "error": str(e) + } +``` + +And in the main `handler.py` file: + +```python +#!/usr/bin/env python3 +import runpod +import os +import sys + +# Add the src directory to the path so we can import our modules +sys.path.append(os.path.dirname(os.path.realpath(__file__))) + +from src.rp_handler import handler + +# Start the serverless function +runpod.serverless.start({"handler": handler}) +``` + +## Step 4: Create the Dockerfile + +Now, let's create a Docker container for our worker: + +```dockerfile +FROM python:3.9-slim + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + ffmpeg \ + libsm6 \ + libxext6 \ + && rm -rf /var/lib/apt/lists/* + +# Set working directory +WORKDIR /app + +# Install Python dependencies +COPY builder/requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +# Copy the application +COPY . . + +# Run the worker +CMD ["python", "-u", "handler.py"] +``` + +## Step 5: Create a README + +Let's add documentation in `README.md`: + +```markdown +# Custom Image Processing Worker for RunPod + +This worker provides image processing capabilities including resizing, cropping, filtering, and adjustments. + +## Input Format + +The worker accepts the following input structure: + +```json +{ + "image": { + "url": "https://example.com/image.jpg" + // or + "base64": "base64_encoded_image_data" + }, + "operations": [ + { + "type": "resize", + "params": { + "width": 800, + "height": 600, + "maintain_aspect": true + } + }, + { + "type": "filter", + "params": { + "name": "sharpen" + } + } + ], + "output": { + "format": "JPEG", + "quality": 85 + } +} +``` + +## Operations + +The following operations are supported: + +1. **resize** - Resize an image + - `width`: Target width in pixels + - `height`: Target height in pixels + - `maintain_aspect`: Boolean to maintain aspect ratio + +2. **crop** - Crop an image + - `left`: Left coordinate + - `top`: Top coordinate + - `right`: Right coordinate + - `bottom`: Bottom coordinate + +3. **filter** - Apply a filter + - `name`: One of "blur", "sharpen", "contour", "edge_enhance", "emboss", "smooth", "grayscale" + +4. **adjust** - Adjust image properties + - `brightness`: Float value (1.0 is original, 0.0 is black, 2.0 is twice as bright) + - `contrast`: Float value (1.0 is original, 0.0 is gray, 2.0 is twice the contrast) + - `saturation`: Float value (1.0 is original, 0.0 is grayscale, 2.0 is twice the saturation) + +## Output + +The worker returns: + +```json +{ + "success": true, + "processed_image": "base64_encoded_result_image", + "metadata": { + "width": 800, + "height": 600, + "format": "JPEG" + } +} +``` + +## Error Handling + +In case of errors: + +```json +{ + "success": false, + "error": "Error message" +} +``` +``` + +## Step 6: Build and test locally + +Let's build and test our worker locally: + +1. Build the Docker image: + +```bash +docker build -t my-custom-worker:latest . +``` + +2. 
Create a test file named `test_input.json`: + +```json +{ + "input": { + "image": { + "url": "https://images.unsplash.com/photo-1682687220063-4742bd7fd538" + }, + "operations": [ + { + "type": "resize", + "params": { + "width": 800, + "height": 600, + "maintain_aspect": true + } + }, + { + "type": "filter", + "params": { + "name": "sharpen" + } + }, + { + "type": "adjust", + "params": { + "brightness": 1.2, + "contrast": 1.1 + } + } + ], + "output": { + "format": "JPEG", + "quality": 90 + } + } +} +``` + +3. Run the Docker container and test the worker: + +```bash +docker run -it --rm -v $(pwd)/test_input.json:/app/test_input.json my-custom-worker:latest python -c "import json; from src.rp_handler import handler; print(json.dumps(handler(json.load(open('test_input.json'))), indent=2))" +``` + +## Step 7: Push to Docker Hub + +After testing, push your worker to Docker Hub: + +```bash +# Log in to Docker Hub +docker login + +# Tag your image +docker tag my-custom-worker:latest yourusername/my-custom-worker:latest + +# Push to Docker Hub +docker push yourusername/my-custom-worker:latest +``` + +## Step 8: Deploy to RunPod + +1. Go to the [RunPod Serverless Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Enter your Docker image URL (e.g., `yourusername/my-custom-worker:latest`) +4. Configure your endpoint settings: + - GPU Type: CPU is sufficient for basic image processing, but choose a GPU if needed + - Worker Count: 0 (scale to zero) or 1+ to keep warm + - Max Workers: Set based on expected load +5. Click "Deploy" + +## Step 9: Test your deployed endpoint + +Use this Python code to test your endpoint: + +```python +import requests +import json +import time +import base64 +from PIL import Image +from io import BytesIO + +API_KEY = "YOUR_API_KEY" +ENDPOINT_ID = "YOUR_ENDPOINT_ID" + +def process_image(image_path=None, image_url=None, operations=None, output=None): + url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run" + + if operations is None: + operations = [ + { + "type": "resize", + "params": { + "width": 800, + "maintain_aspect": True + } + } + ] + + if output is None: + output = { + "format": "JPEG", + "quality": 85 + } + + # Prepare payload + payload = { + "input": { + "image": {}, + "operations": operations, + "output": output + } + } + + # Add image source (file or URL) + if image_path: + with open(image_path, "rb") as f: + img_base64 = base64.b64encode(f.read()).decode('utf-8') + payload["input"]["image"]["base64"] = img_base64 + elif image_url: + payload["input"]["image"]["url"] = image_url + else: + raise ValueError("Either image_path or image_url must be provided") + + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {API_KEY}" + } + + # Submit job + response = requests.post(url, headers=headers, json=payload) + response_json = response.json() + + job_id = response_json.get("id") + status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}" + + # Poll for job completion + while True: + status_response = requests.get(status_url, headers=headers) + status_data = status_response.json() + + if status_data.get("status") == "COMPLETED": + return status_data.get("output") + elif status_data.get("status") == "FAILED": + return {"error": "Job failed", "details": status_data} + + time.sleep(1) + +# Example usage with URL +result = process_image( + image_url="https://images.unsplash.com/photo-1682687220063-4742bd7fd538", + operations=[ + { + "type": "resize", + "params": { + "width": 800, + "maintain_aspect": True + } + }, 
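+        # Operations run in the order listed; more steps (crop, adjust, ...) can be chained here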
+ { + "type": "filter", + "params": { + "name": "sharpen" + } + } + ] +) + +if result.get("success"): + # Display or save the image + image_data = result["processed_image"].split("base64,")[1] + image_bytes = base64.b64decode(image_data) + + # Open the image + image = Image.open(BytesIO(image_bytes)) + + # Save to file + image.save("processed_image.jpg") + print(f"Image processed successfully: {result['metadata']}") +else: + print(f"Error: {result.get('error')}") +``` + +## Advanced improvements + +Here are some ways to enhance your worker: + +### 1. Add batch processing + +Modify the handler to accept multiple images in a batch: + +```python +def handler(event): + try: + job_input = event.get("input", {}) + + # Check if this is a batch job + if "images" in job_input: + results = [] + + for image_job in job_input["images"]: + # Process each image + result = ImageProcessor.process_image(image_job) + results.append(result) + + return { + "success": True, + "results": results + } + else: + # Single image processing + return ImageProcessor.process_image(job_input) + + except Exception as e: + return { + "success": False, + "error": str(e) + } +``` + +### 2. Add image analysis + +Extend the processor to include analysis features: + +```python +@staticmethod +def analyze_image(image): + """Analyze image properties""" + import numpy as np + + # Convert to numpy array + img_array = np.array(image) + + # Calculate histogram + hist_r = np.histogram(img_array[..., 0], bins=256, range=(0, 256))[0] if image.mode in ('RGB', 'RGBA') else [] + hist_g = np.histogram(img_array[..., 1], bins=256, range=(0, 256))[0] if image.mode in ('RGB', 'RGBA') else [] + hist_b = np.histogram(img_array[..., 2], bins=256, range=(0, 256))[0] if image.mode in ('RGB', 'RGBA') else [] + + # Calculate average brightness + if image.mode == 'L': + brightness = float(np.mean(img_array)) + else: + brightness = float(np.mean(0.299 * img_array[..., 0] + 0.587 * img_array[..., 1] + 0.114 * img_array[..., 2])) + + return { + "size": (image.width, image.height), + "mode": image.mode, + "format": image.format, + "brightness": brightness, + "histograms": { + "r": hist_r.tolist() if len(hist_r) > 0 else [], + "g": hist_g.tolist() if len(hist_g) > 0 else [], + "b": hist_b.tolist() if len(hist_b) > 0 else [] + } + } +``` + +### 3. 
Add caching
+
+Implement a simple caching mechanism for frequently processed images:
+
+```python
+import hashlib
+import json
+
+# Add to ImageProcessor class
+image_cache = {}
+
+@staticmethod
+def get_cache_key(input_data):
+    """Generate a cache key from input data"""
+    return hashlib.md5(json.dumps(input_data, sort_keys=True).encode()).hexdigest()
+
+@staticmethod
+def process_image_with_cache(input_data):
+    """Process image with caching"""
+    cache_key = ImageProcessor.get_cache_key(input_data)
+
+    # Check if result is in cache
+    if cache_key in ImageProcessor.image_cache:
+        print("Using cached result")
+        return ImageProcessor.image_cache[cache_key]
+
+    # Process image
+    result = ImageProcessor.process_image(input_data)
+
+    # Store in cache (with a limit)
+    if len(ImageProcessor.image_cache) >= 100:
+        # Evict the oldest entry to keep the cache at or below 100 items
+        ImageProcessor.image_cache.pop(next(iter(ImageProcessor.image_cache)))
+
+    ImageProcessor.image_cache[cache_key] = result
+    return result
+```
+
+## Next steps
+
+- Add more advanced image processing operations like object detection or segmentation
+- Integrate with computer vision libraries like OpenCV
+- Add support for WebP, AVIF, and other modern image formats
+- Implement more sophisticated caching and optimization
+- Create a simple web interface for testing your worker
+- Set up monitoring and logging for production use
+- Share your worker with the RunPod community
+
+> **Pro tip**: Consider adding a simple health check endpoint to your worker to make sure it's running correctly and to monitor resource usage.
\ No newline at end of file
diff --git a/docs/tutorials/workers/index.md b/docs/tutorials/workers/index.md
new file mode 100644
index 00000000..d0a8f3a2
--- /dev/null
+++ b/docs/tutorials/workers/index.md
@@ -0,0 +1,52 @@
+---
+title: Worker Tutorials
+description: "Learn how to create and customize specialized RunPod workers. Follow step-by-step guides to build, deploy, and optimize workers for various AI workloads."
+sidebar_position: 1
+---
+
+# Worker Tutorials
+
+These tutorials will guide you through creating and customizing specialized RunPod workers. Each tutorial provides detailed instructions and examples to help you build robust, high-performance workers for your specific use cases.
+ +## What you'll learn + +- How to set up your development environment for creating workers +- Best practices for building efficient and scalable workers +- Techniques for customizing and optimizing workers for specific tasks +- Methods for testing and deploying your workers to production + +## Available tutorials + +| Tutorial | Description | +|----------|-------------| +| [Build a Custom Worker from Scratch](./custom-worker.md) | Create a custom worker completely from scratch with a focus on image processing | +| [Create a vLLM Worker](./vllm-worker.md) | Build a worker for serving Large Language Models efficiently using vLLM | +| [Build a Whisper STT Worker](./whisper-worker.md) | Create a speech-to-text worker using OpenAI's Whisper model | +| [Custom Stable Diffusion Worker](/docs/serverless/workers/specialized/stable-diffusion) | Deploy and customize a Stable Diffusion image generation worker | +| [Text-to-Speech Worker](/docs/serverless/workers/specialized/tts) | Build a worker for converting text into natural-sounding speech | + +## Prerequisites + +Before starting these tutorials, you should have: + +- A RunPod account with serverless access +- Basic understanding of Docker and containerization +- Familiarity with Python programming +- Docker installed on your local machine for development and testing + +## Getting help + +If you encounter any issues while following these tutorials, you can: + +- Check the [RunPod documentation](https://docs.runpod.io) for additional information +- Join the [RunPod Discord community](https://discord.gg/runpod) for help from other users +- Contact [RunPod support](https://runpod.io/contact) for assistance + +## Next steps + +After completing these tutorials, consider: + +1. Exploring other worker templates in the [RunPod Workers GitHub repository](https://github.com/runpod-workers) +2. Combining multiple workers to create more complex AI pipelines +3. Sharing your custom worker with the RunPod community +4. Setting up monitoring and observability for your deployed workers \ No newline at end of file diff --git a/docs/tutorials/workers/overview.md b/docs/tutorials/workers/overview.md new file mode 100644 index 00000000..eb535909 --- /dev/null +++ b/docs/tutorials/workers/overview.md @@ -0,0 +1,76 @@ +--- +title: Workers overview +description: "Learn about RunPod workers and how they power serverless AI deployments. Understand worker architecture, capabilities, and implementation approaches." +sidebar_position: 1 +--- + +# Workers overview + +RunPod workers are the fundamental building blocks of serverless deployments on the RunPod platform. They provide a containerized, scalable approach to running AI workloads on demand. This overview explains what workers are, how they function, and the different types available. + +## What are workers? + +Workers are containerized applications designed to perform specific tasks when invoked through the RunPod API. Each worker: + +- Runs in its own isolated environment (Docker container) +- Processes jobs from a queue +- Returns results through a standardized API response +- Can scale horizontally based on demand + +Workers follow a handler-based architecture, where a central function processes incoming requests, performs computations, and returns results. + +## Worker architecture + +A typical RunPod worker consists of these components: + +1. **Handler function**: The entry point that receives requests and returns responses +2. **Core logic**: The specific implementation of the worker's functionality +3. 
**Input validation**: Ensures requests contain the necessary parameters +4. **Output formatting**: Structures the results in a standardized way +5. **Error handling**: Manages exceptions and provides helpful error messages + +## Types of workers + +RunPod supports several types of workers: + +### By template + +- **Pre-built workers**: Ready-to-use workers for common AI tasks (vLLM, Stable Diffusion, Whisper) +- **Template workers**: Starting points for custom development +- **Custom workers**: Fully customized implementations for specific needs + +### By capability + +- **Inference workers**: Deploy AI models for inference (text generation, image generation) +- **Processing workers**: Transform or analyze data (image processing, audio transcription) +- **Utility workers**: Provide support functions (file conversion, data extraction) + +## Worker development approaches + +There are multiple ways to develop workers for RunPod: + +1. **Start from scratch**: Build a worker completely from the ground up +2. **Customize a template**: Modify an existing template to meet your needs +3. **Adapt an existing model**: Package a pre-trained model into a worker +4. **Combine workers**: Create pipelines of workers that feed into each other + +## When to use workers + +Workers are ideal for: + +- Deploying machine learning models for inference +- Processing data asynchronously +- Handling computationally intensive tasks +- Building scalable AI applications +- Creating APIs for AI functionality + +## Next steps + +The tutorials in this section will guide you through creating different types of workers: + +- Building a custom worker from scratch +- Creating a vLLM worker for text generation +- Implementing a Whisper worker for speech-to-text +- Customizing specialized workers like Stable Diffusion and TTS + +Each tutorial provides step-by-step instructions and code examples to help you build and deploy your own workers on RunPod. \ No newline at end of file diff --git a/docs/tutorials/workers/vllm-worker.md b/docs/tutorials/workers/vllm-worker.md new file mode 100644 index 00000000..d2899939 --- /dev/null +++ b/docs/tutorials/workers/vllm-worker.md @@ -0,0 +1,476 @@ +--- +title: Create a vLLM worker +description: "Learn how to create, customize, and deploy a vLLM worker for serving Large Language Models efficiently on RunPod. Follow step-by-step instructions to build your own LLM inference endpoint." +sidebar_position: 2 +--- + +# Creating a custom vLLM worker + +In this tutorial, you'll learn how to create and deploy a custom vLLM worker on RunPod. vLLM is a high-performance library for LLM inference that provides significant speedups over traditional methods. We'll walk through setting up your development environment, customizing a vLLM worker, and deploying it as a serverless endpoint. + +## Prerequisites + +- RunPod account with serverless access +- Docker installed locally +- Basic understanding of Python and Docker +- GitHub account (optional, for storing your worker code) + +## Step 1: Set up your development environment + +1. Clone the RunPod vLLM worker template: + +```bash +git clone https://github.com/runpod-workers/worker-vllm.git +cd worker-vllm +``` + +2. Create a Python virtual environment: + +```bash +python -m venv venv +source venv/bin/activate # On Windows: venv\Scripts\activate +``` + +3. 
Install development dependencies: + +```bash +pip install -r requirements.txt +``` + +## Step 2: Understand the worker structure + +The template worker includes these key files: + +- `handler.py`: The main worker logic that processes requests +- `Dockerfile`: Instructions for building the worker container +- `builder/requirements.txt`: Python dependencies for the worker +- `predict.py`: Core inference logic using vLLM + +## Step 3: Customize the worker for your model + +Let's modify the worker to use a specific LLM model and add custom parameters. + +1. Open `predict.py` and modify the model initialization: + +```python +import os +from vllm import LLM, SamplingParams + +# Get model ID from environment variable or use default +MODEL_ID = os.environ.get("MODEL_ID", "meta-llama/Llama-3-8b-instruct") + +# Initialize the model with specific settings +def init_model(): + # Automatically determine tensor parallelism based on available GPUs + gpu_count = os.environ.get("GPU_COUNT", 1) + + model = LLM( + model=MODEL_ID, + tensor_parallel_size=int(gpu_count), + trust_remote_code=True, + # Add custom vLLM options here + dtype="bfloat16", # Use bfloat16 for better efficiency + gpu_memory_utilization=0.8, # Control memory usage + enforce_eager=False, # Use CUDA graphs for optimization + ) + return model +``` + +2. Create a more sophisticated handler in `handler.py`: + +```python +import runpod +import os +import predict +from vllm import SamplingParams + +# Initialize the model globally +model = predict.init_model() + +# Define default sampling parameters +default_params = SamplingParams( + temperature=0.7, + top_p=0.95, + max_tokens=512 +) + +def format_chat_prompt(messages): + """Format messages into a prompt format that the model understands.""" + formatted = "" + + # Different models have different chat templates + # This example uses Llama-3 style formatting + for msg in messages: + role = msg.get("role", "user") + content = msg.get("content", "") + + if role == "system": + formatted += f"<|system|>\n{content}\n" + elif role == "user": + formatted += f"<|user|>\n{content}\n" + elif role == "assistant": + formatted += f"<|assistant|>\n{content}\n" + + # Add final assistant prompt to indicate it's the model's turn + formatted += "<|assistant|>\n" + + return formatted + +def handler(event): + """Handle inference requests.""" + try: + job_input = event["input"] + + # Get prompt - either as raw text or chat format + if "messages" in job_input: + prompt = format_chat_prompt(job_input["messages"]) + else: + prompt = job_input.get("prompt", "") + + # Get custom parameters or use defaults + params = job_input.get("parameters", {}) + + # Create sampling parameters + sampling_params = SamplingParams( + temperature=params.get("temperature", default_params.temperature), + top_p=params.get("top_p", default_params.top_p), + top_k=params.get("top_k", 50), + max_tokens=params.get("max_tokens", default_params.max_tokens), + presence_penalty=params.get("presence_penalty", 0.0), + frequency_penalty=params.get("frequency_penalty", 0.0), + stop=params.get("stop", None) + ) + + # Generate text + outputs = model.generate([prompt], sampling_params) + + # Format output + generated_text = outputs[0].outputs[0].text + + # Count tokens for usage tracking + prompt_tokens = len(model.tokenizer.encode(prompt)) + completion_tokens = len(model.tokenizer.encode(generated_text)) + + return { + "generated_text": generated_text, + "model": os.environ.get("MODEL_ID", "meta-llama/Llama-3-8b-instruct"), + "usage": { + "prompt_tokens": 
prompt_tokens, + "completion_tokens": completion_tokens, + "total_tokens": prompt_tokens + completion_tokens + } + } + + except Exception as e: + return {"error": str(e)} + +# Start the serverless worker +runpod.serverless.start({"handler": handler}) +``` + +3. Update the Dockerfile to customize the environment: + +```dockerfile +FROM runpod/base:0.4.0-cuda12.1.0 + +# Set environment variables +ENV MODEL_ID="meta-llama/Llama-3-8b-instruct" +ENV HUGGING_FACE_HUB_TOKEN="YOUR_HF_TOKEN" # Set your token here or via RunPod +ENV PORT=8000 +ENV WORKSPACE="/workspace" + +# Install dependencies +WORKDIR ${WORKSPACE} +COPY builder/requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt +RUN pip install --no-cache-dir flash-attn --no-build-isolation + +# Copy source code +COPY . . + +# vLLM writes model files to this directory +VOLUME /root/.cache/huggingface + +# Pre-download model weights +RUN python -c "from huggingface_hub import snapshot_download; snapshot_download('${MODEL_ID}')" + +# Run the worker +CMD python -u handler.py +``` + +## Step 4: Test the worker locally + +1. Create a test input file named `test_input.json`: + +```json +{ + "input": { + "messages": [ + { + "role": "system", + "content": "You are a helpful AI assistant." + }, + { + "role": "user", + "content": "Explain quantum computing in simple terms." + } + ], + "parameters": { + "temperature": 0.7, + "max_tokens": 500 + } + } +} +``` + +2. Run the worker locally: + +```bash +python handler.py +``` + +## Step 5: Build and push the Docker image + +1. Build the Docker image: + +```bash +docker build -t your-username/vllm-worker:latest . +``` + +2. Push to Docker Hub: + +```bash +docker push your-username/vllm-worker:latest +``` + +## Step 6: Deploy to RunPod + +1. Go to the [RunPod Serverless Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Enter your Docker image URL +4. Configure your endpoint settings: + - GPU Type: A10G or better recommended for LLMs + - Worker Count: 0 (scale to zero) or 1+ to keep warm + - Max Workers: Set based on expected load + - Advanced Settings: Add your Hugging Face token as a secret +5. Click "Deploy" + +## Step 7: Test your deployed endpoint + +Use this Python code to test your endpoint: + +```python +import requests +import json +import time + +API_KEY = "YOUR_API_KEY" +ENDPOINT_ID = "YOUR_ENDPOINT_ID" + +def generate_text(messages): + url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run" + + payload = { + "input": { + "messages": messages, + "parameters": { + "temperature": 0.7, + "max_tokens": 500 + } + } + } + + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {API_KEY}" + } + + response = requests.post(url, headers=headers, json=payload) + response_json = response.json() + + job_id = response_json.get("id") + status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}" + + while True: + status_response = requests.get(status_url, headers=headers) + status_data = status_response.json() + + if status_data.get("status") == "COMPLETED": + return status_data.get("output") + + time.sleep(1) + +# Example conversation +messages = [ + {"role": "system", "content": "You are a helpful AI assistant."}, + {"role": "user", "content": "Explain quantum computing in simple terms."} +] + +result = generate_text(messages) +print(result["generated_text"]) +``` + +## Adding OpenAI compatibility (Optional) + +To make your vLLM worker compatible with the OpenAI API format: + +1. 
Create a new file `openai_compat.py`: + +```python +import json +from fastapi import FastAPI, Request +from fastapi.responses import StreamingResponse +import asyncio +import predict +from vllm import SamplingParams +import uvicorn +import time + +app = FastAPI() +model = predict.init_model() + +@app.post("/v1/chat/completions") +async def chat_completions(request: Request): + data = await request.json() + + # Extract request parameters + messages = data.get("messages", []) + model_name = data.get("model", "runpod-model") + temperature = data.get("temperature", 0.7) + top_p = data.get("top_p", 0.95) + max_tokens = data.get("max_tokens", 512) + stream = data.get("stream", False) + + # Format messages into prompt + prompt = "" + for msg in messages: + role = msg.get("role", "user") + content = msg.get("content", "") + + if role == "system": + prompt += f"<|system|>\n{content}\n" + elif role == "user": + prompt += f"<|user|>\n{content}\n" + elif role == "assistant": + prompt += f"<|assistant|>\n{content}\n" + + prompt += "<|assistant|>\n" + + # Set up sampling parameters + sampling_params = SamplingParams( + temperature=temperature, + top_p=top_p, + max_tokens=max_tokens + ) + + # Generate response + start_time = time.time() + outputs = model.generate([prompt], sampling_params) + end_time = time.time() + + generated_text = outputs[0].outputs[0].text + + # Count tokens + input_tokens = len(model.tokenizer.encode(prompt)) + output_tokens = len(model.tokenizer.encode(generated_text)) + + # Format response to match OpenAI API + completion_id = f"cmpl-{int(time.time())}" + response_data = { + "id": completion_id, + "object": "chat.completion", + "created": int(time.time()), + "model": model_name, + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": generated_text + }, + "finish_reason": "stop" + } + ], + "usage": { + "prompt_tokens": input_tokens, + "completion_tokens": output_tokens, + "total_tokens": input_tokens + output_tokens + } + } + + if stream: + # Implement streaming logic + pass + + return response_data + +if __name__ == "__main__": + uvicorn.run(app, host="0.0.0.0", port=8000) +``` + +2. Update your Dockerfile to include FastAPI and run the compatibility server: + +```dockerfile +FROM runpod/base:0.4.0-cuda12.1.0 + +# Set environment variables +ENV MODEL_ID="meta-llama/Llama-3-8b-instruct" +ENV HUGGING_FACE_HUB_TOKEN="YOUR_HF_TOKEN" +ENV PORT=8000 +ENV WORKSPACE="/workspace" +ENV ENABLE_OPENAI_API="true" + +# Install dependencies +WORKDIR ${WORKSPACE} +COPY builder/requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt +RUN pip install --no-cache-dir flash-attn --no-build-isolation +RUN pip install --no-cache-dir fastapi uvicorn + +# Copy source code +COPY . . + +# vLLM writes model files to this directory +VOLUME /root/.cache/huggingface + +# Pre-download model weights +RUN python -c "from huggingface_hub import snapshot_download; snapshot_download('${MODEL_ID}')" + +# Start either the RunPod handler or OpenAI-compatible API server +CMD if [ "$ENABLE_OPENAI_API" = "true" ]; then python -u openai_compat.py; else python -u handler.py; fi +``` + +## Advanced optimizations + +For even better performance, consider these optimizations: + +1. **Quantization**: Add quantization to save memory and improve throughput: + +```python +# In predict.py +model = LLM( + model=MODEL_ID, + tensor_parallel_size=int(gpu_count), + trust_remote_code=True, + dtype="bfloat16", + quantization="awq", # Use AWQ quantization +) +``` + +2. 
**Continuous Batching**: Utilize vLLM's continuous batching by adjusting container concurrency: + +``` +# In RunPod endpoint configuration +Container Concurrency: 8 +``` + +3. **PagedAttention**: vLLM already uses PagedAttention, but ensure your model is compatible for best results. + +## Next steps + +- Explore other LLM optimizations like KV caching and speculative decoding +- Add additional API endpoints for specific use cases +- Fine-tune your own models and deploy them using this worker +- Set up monitoring and alerting for your endpoint + +> **Pro tip**: When working with large models, start with a smaller variant (e.g., 7B parameters) to test your setup before scaling to larger models. \ No newline at end of file diff --git a/docs/tutorials/workers/whisper-worker.md b/docs/tutorials/workers/whisper-worker.md new file mode 100644 index 00000000..7eb14cfb --- /dev/null +++ b/docs/tutorials/workers/whisper-worker.md @@ -0,0 +1,550 @@ +--- +title: Build a Whisper STT worker +description: "Create a customized speech-to-text worker using OpenAI's Whisper model on RunPod. Learn how to build, deploy, and optimize your own STT solution for accurate audio transcription." +sidebar_position: 3 +--- + +# Building a Whisper speech-to-text worker + +In this tutorial, you'll learn how to create a custom speech-to-text (STT) worker using OpenAI's Whisper model on RunPod. We'll walk through setting up your development environment, customizing the worker for specific needs, and deploying it as a serverless endpoint. + +## Prerequisites + +- RunPod account with serverless access +- Docker installed locally +- Basic understanding of Python and Docker +- Audio files for testing (WAV, MP3, FLAC, etc.) + +## Step 1: Set up your development environment + +1. Clone the RunPod Whisper worker template: + +```bash +git clone https://github.com/runpod-workers/worker-whisper.git +cd worker-whisper +``` + +2. Create a Python virtual environment: + +```bash +python -m venv venv +source venv/bin/activate # On Windows: venv\Scripts\activate +``` + +3. Install development dependencies: + +```bash +pip install -r requirements.txt +``` + +## Step 2: Understand the worker structure + +The template worker includes these key files: + +- `handler.py`: The main worker logic that processes requests +- `Dockerfile`: Instructions for building the worker container +- `src/rp_handler.py`: Core inference logic using Whisper +- `src/whisper_processor.py`: Audio processing and transcription logic + +## Step 3: Customize the worker for your use case + +Let's modify the worker to use a specific Whisper model and add custom parameters. + +1. 
Open `src/whisper_processor.py` and modify the model initialization: + +```python +import os +import torch +import whisper +from whisper.utils import get_writer + +class WhisperProcessor: + def __init__(self): + # Get model size from environment variable or use default + model_size = os.environ.get("WHISPER_MODEL", "medium") + + # Configure device and compute type + self.device = "cuda" if torch.cuda.is_available() else "cpu" + self.compute_type = os.environ.get("COMPUTE_TYPE", "float16") + + # Define valid options + self.valid_languages = whisper.tokenizer.LANGUAGES + self.valid_tasks = ["transcribe", "translate"] + + # Load the model - this will download if not cached + print(f"Loading Whisper {model_size} model...") + self.model = whisper.load_model( + model_size, + device=self.device, + download_root="/runpod-volume/whisper-models" + ) + print(f"Model loaded on {self.device} using {self.compute_type}") + + def process_audio(self, audio_path, options=None): + if options is None: + options = {} + + # Set default options + task = options.get("task", "transcribe") + language = options.get("language", None) + temperature = options.get("temperature", 0.0) + initial_prompt = options.get("initial_prompt", None) + word_timestamps = options.get("word_timestamps", False) + output_format = options.get("output_format", "all") + + # Validate options + if task not in self.valid_tasks: + raise ValueError(f"Task must be one of {self.valid_tasks}") + + if language is not None and language not in self.valid_languages: + raise ValueError(f"Language not supported. Must be one of {list(self.valid_languages.keys())}") + + # Prepare transcription options + transcribe_options = { + "task": task, + "temperature": temperature, + "initial_prompt": initial_prompt, + "word_timestamps": word_timestamps + } + + if language: + transcribe_options["language"] = language + + # Run transcription + print(f"Processing audio file: {audio_path}") + result = self.model.transcribe( + audio_path, + **transcribe_options + ) + + # Format the output based on user preference + if output_format == "text": + return {"text": result["text"]} + elif output_format == "srt": + writer = get_writer("srt", ".") + srt_content = writer(result, audio_path) + return {"text": result["text"], "srt": srt_content} + elif output_format == "vtt": + writer = get_writer("vtt", ".") + vtt_content = writer(result, audio_path) + return {"text": result["text"], "vtt": vtt_content} + else: # "all" + return result +``` + +2. Update the handler in `src/rp_handler.py`: + +```python +import os +import time +import base64 +import tempfile +import traceback +from .download import download_file_from_url +from .whisper_processor import WhisperProcessor + +# Initialize the processor +processor = WhisperProcessor() + +def handler(job): + ''' + Handler function for processing speech-to-text jobs + ''' + job_input = job["input"] + + # Start timing for performance metrics + start_time = time.time() + + try: + # Check for audio input (required) + if "audio" not in job_input: + return {"error": "Audio input is required. 
Provide a URL or base64 encoded audio"} + + # Get processing options + options = job_input.get("options", {}) + + # Create temp directory for audio file + with tempfile.TemporaryDirectory() as temp_dir: + audio_path = os.path.join(temp_dir, "audio_input") + + # Handle audio input - either URL or base64 + if "url" in job_input["audio"]: + audio_url = job_input["audio"]["url"] + print(f"Downloading audio from: {audio_url}") + download_file_from_url(audio_url, audio_path) + elif "base64" in job_input["audio"]: + audio_data = job_input["audio"]["base64"] + + # Handle potential prefixes in base64 data + if "," in audio_data: + audio_data = audio_data.split(",")[1] + + with open(audio_path, "wb") as f: + f.write(base64.b64decode(audio_data)) + else: + return {"error": "Invalid audio input. Provide either 'url' or 'base64'"} + + # Process the audio file + result = processor.process_audio(audio_path, options) + + # Calculate processing time + processing_time = time.time() - start_time + + # Add metadata to the result + result_with_metadata = { + "transcription": result, + "metadata": { + "model": os.environ.get("WHISPER_MODEL", "medium"), + "processing_time": processing_time + } + } + + return result_with_metadata + + except Exception as e: + error_traceback = traceback.format_exc() + print(f"Error processing audio: {str(e)}\n{error_traceback}") + return { + "error": str(e), + "traceback": error_traceback + } +``` + +3. Update the Dockerfile to customize the environment: + +```dockerfile +FROM runpod/base:0.4.0-cuda11.8.0 + +# Set environment variables +ENV WHISPER_MODEL="medium" +ENV COMPUTE_TYPE="float16" +ENV PYTHONUNBUFFERED=1 + +WORKDIR /app + +# Install ffmpeg for audio processing +RUN apt-get update && apt-get install -y ffmpeg && \ + rm -rf /var/lib/apt/lists/* + +# Copy and install requirements +COPY builder/requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +# Copy all files +COPY . . + +# Create volume for model caching +VOLUME /runpod-volume/whisper-models + +# Pre-download the model +RUN python -c "import whisper; whisper.load_model('${WHISPER_MODEL}')" + +# Start the worker +CMD ["python", "-u", "handler.py"] +``` + +## Step 4: Test the worker locally + +1. Create a test input file named `test_input.json`: + +```json +{ + "input": { + "audio": { + "url": "https://storage.googleapis.com/aai-web-samples/instrumental_speech.wav" + }, + "options": { + "task": "transcribe", + "language": "en", + "word_timestamps": true, + "output_format": "all" + } + } +} +``` + +2. Run the worker locally: + +```bash +python -m handler +``` + +## Step 5: Build and push the Docker image + +1. Build the Docker image: + +```bash +docker build -t your-username/whisper-worker:latest . +``` + +2. Push to Docker Hub: + +```bash +docker push your-username/whisper-worker:latest +``` + +## Step 6: Deploy to RunPod + +1. Go to the [RunPod Serverless Console](https://www.runpod.io/console/serverless) +2. Click "New Endpoint" +3. Enter your Docker image URL +4. Configure your endpoint settings: + - GPU Type: T4 or better recommended + - Worker Count: 0 (scale to zero) or 1+ to keep warm + - Max Workers: Set based on expected load + - Advanced Settings: Set environment variables if needed +5. 
Click "Deploy" + +## Step 7: Test your deployed endpoint + +Use this Python code to test your endpoint: + +```python +import requests +import json +import time +import base64 + +API_KEY = "YOUR_API_KEY" +ENDPOINT_ID = "YOUR_ENDPOINT_ID" + +def transcribe_audio(audio_file_path=None, audio_url=None, options=None): + url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run" + + if options is None: + options = { + "task": "transcribe", + "language": "en", + "output_format": "all" + } + + # Prepare payload + payload = { + "input": { + "audio": {}, + "options": options + } + } + + # Add audio source (file or URL) + if audio_file_path: + with open(audio_file_path, "rb") as f: + audio_base64 = base64.b64encode(f.read()).decode('utf-8') + payload["input"]["audio"]["base64"] = audio_base64 + elif audio_url: + payload["input"]["audio"]["url"] = audio_url + else: + raise ValueError("Either audio_file_path or audio_url must be provided") + + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {API_KEY}" + } + + # Submit job + response = requests.post(url, headers=headers, json=payload) + response_json = response.json() + + job_id = response_json.get("id") + status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}" + + # Poll for job completion + while True: + status_response = requests.get(status_url, headers=headers) + status_data = status_response.json() + + if status_data.get("status") == "COMPLETED": + return status_data.get("output") + elif status_data.get("status") == "FAILED": + return {"error": "Job failed", "details": status_data} + + time.sleep(2) + +# Example usage with URL +result = transcribe_audio( + audio_url="https://storage.googleapis.com/aai-web-samples/instrumental_speech.wav", + options={ + "task": "transcribe", + "language": "en", + "word_timestamps": True + } +) + +print(result["transcription"]["text"]) + +# Example with local file +# result = transcribe_audio( +# audio_file_path="path/to/your/audio.mp3", +# options={"task": "transcribe"} +# ) +``` + +## Batch processing multiple files + +For batch processing, you can create a script to handle multiple files: + +```python +import os +import json +import time +from concurrent.futures import ThreadPoolExecutor +from transcribe import transcribe_audio # Import the function from above + +def process_directory(directory_path, output_directory, max_workers=3): + """Process all audio files in a directory""" + + # Create output directory if it doesn't exist + os.makedirs(output_directory, exist_ok=True) + + # Get all audio files + audio_files = [ + os.path.join(directory_path, f) for f in os.listdir(directory_path) + if f.endswith(('.mp3', '.wav', '.flac', '.m4a', '.ogg')) + ] + + print(f"Found {len(audio_files)} audio files to process") + + # Process files in parallel + with ThreadPoolExecutor(max_workers=max_workers) as executor: + futures = [] + + for audio_file in audio_files: + base_name = os.path.basename(audio_file) + output_file = os.path.join(output_directory, f"{os.path.splitext(base_name)[0]}.json") + + # Skip if already processed + if os.path.exists(output_file): + print(f"Skipping {base_name} - already processed") + continue + + # Submit transcription job + future = executor.submit( + process_file, + audio_file, + output_file + ) + futures.append(future) + + # Wait for all jobs to complete + for i, future in enumerate(futures): + try: + result = future.result() + print(f"Completed {i+1}/{len(futures)}: {result}") + except Exception as e: + print(f"Error in job {i+1}/{len(futures)}: {str(e)}") 
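+
+# Note: max_workers sets how many transcription jobs are in flight at once; each
+# thread blocks on its RunPod job until it completes, so keep this at or below
+# your endpoint's max worker count to avoid long queue times.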
+ +def process_file(audio_file, output_file): + """Process a single audio file and save results""" + try: + # Get filename for logging + filename = os.path.basename(audio_file) + + print(f"Processing {filename}...") + result = transcribe_audio( + audio_file_path=audio_file, + options={ + "task": "transcribe", + "output_format": "all" + } + ) + + # Save the result + with open(output_file, 'w') as f: + json.dump(result, f, indent=2) + + return f"{filename} -> {output_file}" + + except Exception as e: + print(f"Error processing {audio_file}: {str(e)}") + return f"Failed: {audio_file} - {str(e)}" + +# Example usage +if __name__ == "__main__": + process_directory( + directory_path="./audio_files", + output_directory="./transcripts", + max_workers=3 + ) +``` + +## Advanced customizations + +### 1. Fine-tuning for domain-specific audio + +If you're working with specialized vocabulary (medical, legal, etc.), you can improve transcription by providing initial prompts: + +```python +options = { + "task": "transcribe", + "language": "en", + "initial_prompt": "The following is a medical consultation about cardiovascular disease. Technical terms include: myocardial infarction, atherosclerosis, ventricular tachycardia." +} +``` + +### 2. Custom vocabulary support + +Add custom vocabulary support by extending the processor: + +```python +def add_custom_vocabulary(self, text, custom_vocab): + """Add custom vocabulary terms to improve recognition.""" + for term, phonetic in custom_vocab.items(): + # Simple find-and-replace for corrections + text = text.replace(phonetic, term) + return text + +# Later in process_audio: +if "custom_vocabulary" in options: + result["text"] = self.add_custom_vocabulary( + result["text"], + options["custom_vocabulary"] + ) +``` + +### 3. Audio enhancement with noise reduction + +Add preprocessing with noise reduction: + +```python +def preprocess_audio(self, input_path, output_path): + """Apply noise reduction to improve audio quality.""" + import ffmpeg + + try: + # Apply noise reduction filter using ffmpeg + ( + ffmpeg + .input(input_path) + .audio + .filter('afftdn') # FFT-based denoiser + .output(output_path) + .run(quiet=True, overwrite_output=True) + ) + return True + except Exception as e: + print(f"Audio preprocessing failed: {str(e)}") + return False +``` + +### 4. Enable model parallelism for larger models + +For the large model, enable model parallelism: + +```python +# In WhisperProcessor.__init__ +num_gpus = torch.cuda.device_count() +if num_gpus > 1 and model_size == "large": + # Enable model parallelism for large model + print(f"Enabling model parallelism across {num_gpus} GPUs") + self.model.to(device=self.device) +``` + +## Next steps + +- Integrate with speech diarization to identify different speakers +- Add support for timestamp-based video captioning +- Implement a webhook system for async job completion notifications +- Create a simple web UI for uploading and transcribing files +- Explore using different Whisper models for various accuracy/speed tradeoffs + +> **Pro tip**: For long audio files, consider adding an option to split the audio into smaller chunks before processing, which can improve accuracy and reduce memory usage for extended recordings. 
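+
+As a concrete starting point for that chunking approach, here is a minimal client-side sketch. It reuses the `transcribe_audio` helper from Step 7 and assumes the `pydub` package (plus a local `ffmpeg` install) is available on your machine; the 10-minute chunk length is an illustrative value, not a tuned recommendation.
+
+```python
+# chunk_and_transcribe.py - illustrative sketch; assumes `pip install pydub` and ffmpeg on PATH
+import os
+import tempfile
+
+from pydub import AudioSegment
+
+from transcribe import transcribe_audio  # the helper defined in Step 7
+
+
+def transcribe_long_audio(audio_path, chunk_minutes=10, options=None):
+    """Split a long recording into fixed-length chunks and transcribe each one in order."""
+    audio = AudioSegment.from_file(audio_path)
+    chunk_ms = chunk_minutes * 60 * 1000
+    texts = []
+
+    with tempfile.TemporaryDirectory() as temp_dir:
+        for index, start in enumerate(range(0, len(audio), chunk_ms)):
+            # Export the next chunk to a temporary WAV file
+            chunk_path = os.path.join(temp_dir, f"chunk_{index}.wav")
+            audio[start:start + chunk_ms].export(chunk_path, format="wav")
+
+            # Each chunk becomes its own endpoint job; results are joined in order
+            result = transcribe_audio(audio_file_path=chunk_path, options=options)
+            texts.append(result["transcription"]["text"].strip())
+
+    return " ".join(texts)
+
+
+if __name__ == "__main__":
+    print(transcribe_long_audio("long_recording.mp3"))
+```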
\ No newline at end of file diff --git a/sidebars.js b/sidebars.js index c091a7ad..3d2f7445 100644 --- a/sidebars.js +++ b/sidebars.js @@ -18,13 +18,232 @@ module.exports = { { type: "category", label: "Serverless", + link: { + type: "doc", + id: "serverless/index", + }, items: [ { - type: "autogenerated", - dirName: "serverless", + type: "category", + label: "Quick start", + items: [ + { + type: "doc", + id: "serverless/quick-start/deploy-models", + label: "Deploy popular models" + }, + { + type: "doc", + id: "serverless/github-integration", + label: "GitHub integration" + }, + { + type: "doc", + id: "serverless/quick-deploys", + label: "Quick deploys" + }, + { + type: "doc", + id: "serverless/overview", + label: "Concepts overview" + } + ] + }, + { + type: "category", + label: "Build custom", + items: [ + { + type: "doc", + id: "serverless/build/first-endpoint", + label: "Create your first endpoint" + }, + { + type: "doc", + id: "serverless/get-started", + label: "Step-by-step guide" + }, + { + type: "doc", + id: "serverless/workers/overview", + label: "Worker overview" + } + ] + }, + { + type: "category", + label: "Manage & optimize", + items: [ + { + type: "doc", + id: "serverless/manage/scaling", + label: "Configure autoscaling" + }, + { + type: "category", + label: "Endpoint management", + items: [ + { + type: "doc", + id: "serverless/endpoints/overview", + label: "Endpoints overview" + }, + { + type: "doc", + id: "serverless/endpoints/get-started", + label: "Get started with endpoints" + }, + { + type: "doc", + id: "serverless/endpoints/send-requests", + label: "Send requests" + }, + { + type: "doc", + id: "serverless/endpoints/job-operations", + label: "Job operations" + }, + { + type: "doc", + id: "serverless/endpoints/manage-endpoints", + label: "Manage endpoints" + } + ] + } + ] + }, + { + type: "category", + label: "Use cases & examples", + items: [ + { + type: "doc", + id: "serverless/examples/text-generation", + label: "Text generation" + } + ] }, + { + type: "category", + label: "Reference", + items: [ + { + type: "doc", + id: "serverless/reference/configurations", + label: "Configuration options" + }, + { + type: "doc", + id: "serverless/reference/troubleshooting", + label: "Troubleshooting guide" + }, + { + type: "doc", + id: "serverless/references/endpoint-configurations", + label: "Endpoint configurations" + }, + { + type: "doc", + id: "serverless/references/job-states", + label: "Job states" + }, + { + type: "doc", + id: "serverless/references/operations", + label: "API operations" + } + ] + } ], }, + { + type: "category", + label: "Workers", + items: [ + { + type: "doc", + id: "serverless/workers/overview", + label: "Overview" + }, + { + type: "category", + label: "Handler functions", + items: [ + { + type: "autogenerated", + dirName: "serverless/workers/handlers", + } + ] + }, + { + type: "category", + label: "Development", + items: [ + { + type: "autogenerated", + dirName: "serverless/workers/development", + } + ] + }, + { + type: "category", + label: "Deployment", + items: [ + { + type: "autogenerated", + dirName: "serverless/workers/deploy", + } + ] + }, + { + type: "category", + label: "vLLM workers", + items: [ + { + type: "doc", + id: "serverless/workers/vllm/overview", + label: "Overview" + }, + { + type: "doc", + id: "serverless/workers/vllm/get-started", + label: "Get started" + }, + { + type: "doc", + id: "serverless/workers/vllm/openai-compatibility", + label: "OpenAI compatibility" + }, + { + type: "doc", + id: 
"serverless/workers/vllm/configurable-endpoints", + label: "Configurable endpoints" + }, + { + type: "doc", + id: "serverless/workers/vllm/environment-variables", + label: "Environment variables" + } + ] + }, + { + type: "category", + label: "Specialized workers", + items: [ + { + type: "doc", + id: "serverless/workers/specialized/stable-diffusion", + label: "Stable Diffusion" + }, + { + type: "doc", + id: "serverless/workers/specialized/tts", + label: "Text-to-Speech" + } + ] + } + ] + }, { type: "category", label: "Pods", @@ -144,6 +363,27 @@ module.exports = { }, ], }, + { + type: "category", + label: "Workers", + items: [ + { + type: "doc", + id: "tutorials/workers/overview", + label: "Overview" + }, + { + type: "doc", + id: "tutorials/workers/vllm-worker", + label: "Create a vLLM worker" + }, + { + type: "doc", + id: "tutorials/workers/custom-worker", + label: "Build a custom worker" + } + ], + }, { type: "category", label: "Pods",