
Restructure Serverless docs and rewrite /serverless/endpoints/ section #227


Merged 32 commits on Apr 18, 2025.

Commits
186bf0c
Restructure serverless docs
muhsinking Apr 16, 2025
dd9a726
Fix redirects
muhsinking Apr 16, 2025
877339f
Fix all links
muhsinking Apr 16, 2025
516f48f
Fix redirects
muhsinking Apr 16, 2025
8dcf2d0
Fix redirects v2
muhsinking Apr 16, 2025
e8c7ec6
Redirect test v3
muhsinking Apr 16, 2025
2b70abf
Add more redirects
muhsinking Apr 16, 2025
ffa615f
Add comments to redirects, reorder categories, fix case in misc files
muhsinking Apr 16, 2025
7b1ff97
Fix cases in glossary
muhsinking Apr 16, 2025
2294f04
Remove redirects.json
muhsinking Apr 16, 2025
9c078c5
Fix capitalization of "endpoints"
muhsinking Apr 16, 2025
2252864
Fix case of headers and sidebar
muhsinking Apr 16, 2025
57591e0
Glossary: Move links out of header
muhsinking Apr 16, 2025
8e2924f
Update serverless doc titles
muhsinking Apr 16, 2025
2e32353
Move reference -> endpoints, many updates to endpoints docs
muhsinking Apr 17, 2025
368b593
Rewrite endpoints overview
muhsinking Apr 17, 2025
7cddd4f
Rewrite manage-endpoints
muhsinking Apr 17, 2025
7a75c01
Rename send requests
muhsinking Apr 17, 2025
60b672b
Update post-review
muhsinking Apr 17, 2025
129c51d
Rewrite endpoint configurations
muhsinking Apr 17, 2025
e875836
Revert change in pagination
muhsinking Apr 17, 2025
d11c2c7
Rewrite job-operations.md, merge/delete operations.md
muhsinking Apr 17, 2025
3edeb21
Small fix in job-operations
muhsinking Apr 17, 2025
6605dc7
Rewrite send-requests.md
muhsinking Apr 18, 2025
f84c364
rename job-operations -> operations, update links, lint
muhsinking Apr 18, 2025
548f4a2
Fix broken link
muhsinking Apr 18, 2025
a72bde4
Fix broken link
muhsinking Apr 18, 2025
b84bb8d
Remove stream from execution modes
muhsinking Apr 18, 2025
932e802
Rewrite overview without bullet points
muhsinking Apr 18, 2025
04fe143
Fix link
muhsinking Apr 18, 2025
d95ec7d
Update formatting
muhsinking Apr 18, 2025
26aa47e
Update for review
muhsinking Apr 18, 2025
14 changes: 7 additions & 7 deletions docs/api/api-endpoints.md
@@ -1,6 +1,6 @@
---
title: "API Endpoints"
description: "Unlock the power of RunPod's API Endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
title: "API endpoints"
description: "Unlock the power of RunPod's API endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
sidebar_position: 1
---

@@ -13,22 +13,22 @@ We don't keep your inputs or outputs longer than that to protect your privacy!

:::

API Endpoints are Endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
These Endpoints are available to all users.
API endpoints are endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
These endpoints are available to all users.

## Overview

The API Endpoint implementation works asynchronously as well as synchronously.

Let's take a look at the differences between the two implementations.

### Asynchronous Endpoints
### Asynchronous endpoints

Asynchronous endpoints are useful for long-running jobs that you don't want to wait for. You can submit a job and then check back later to see if it's done.
When you fire an Asynchronous request with the API Endpoint, your input parameters are sent to our endpoint and you immediately get a response with a unique job ID.
You can then query the response by passing the job ID to the status endpoint. The status endpoint will give you the job results when completed.

### Synchronous Endpoints
### Synchronous endpoints

Synchronous endpoints are useful for short-running jobs that you want to wait for.
You can submit a job and get the results back immediately.
@@ -137,4 +137,4 @@ Exceeding limits returns a `429` error.
`/run` - 1000 requests/10s, max 200 concurrent
`/runsync` - 2000 requests/10s, max 400 concurrent

For more information, see [Job operations](/serverless/references/operations).
For more information, see [Job operations](/serverless/endpoints/operations).
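The asynchronous (`/run` + `/status`) and synchronous (`/runsync`) flows described in this file can be sketched as follows. This is a minimal sketch using only the Python standard library; the endpoint ID and API key are placeholders you supply, and error handling and retries are omitted:

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"

def endpoint_url(endpoint_id, path):
    """Build the URL for an endpoint operation: run, runsync, or status/<job_id>."""
    return f"{API_BASE}/{endpoint_id}/{path}"

def _call(url, api_key, payload=None):
    """POST a JSON-wrapped input (or GET when payload is None) and decode the reply."""
    data = json.dumps({"input": payload}).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_async(endpoint_id, api_key, payload):
    """Submit via /run: returns immediately with a unique job ID to poll on /status."""
    return _call(endpoint_url(endpoint_id, "run"), api_key, payload)["id"]

def check_status(endpoint_id, api_key, job_id):
    """Query /status with the job ID; the reply includes the output once completed."""
    return _call(endpoint_url(endpoint_id, f"status/{job_id}"), api_key)

def run_sync(endpoint_id, api_key, payload):
    """Submit via /runsync: blocks until the job result is available."""
    return _call(endpoint_url(endpoint_id, "runsync"), api_key, payload)
```

Note that the `/run` and `/runsync` rate limits quoted above apply per user, so a polling loop around `check_status` should back off rather than poll in a tight loop.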
79 changes: 9 additions & 70 deletions docs/glossary.md
@@ -19,77 +19,15 @@ A [worker](./serverless/workers/overview.md) is a single compute resource that p

## Endpoint

An Endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.
An endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.

## Handler

A Handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.
A handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.

## Serverless [SDK](https://github.com/runpod/runpod-python?tab=readme-ov-file#--serverless-worker-sdk)
## Serverless SDK

A Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function’s result back to the serverless system.

## Endpoint Settings

### Idle Timeout

The amount of time a worker remains running after completing its current request. During this period, the worker stays active, continuously checking the queue for new jobs, and continues to incur charges. If no new requests arrive within this time, the worker will go to sleep.

Default: 5 seconds

### Execution Timeout

The maximum time a job can run before the system terminates the worker. This prevents “bad” jobs from running indefinitely and draining your credit.

You can disable this setting, but we highly recommend keeping it enabled. The default maximum value is 24 hours, but if you need a longer duration, you can use job TTL to override it.

Default: 600 seconds (10 minutes)

### Job [TTL](/serverless/endpoints/send-requests#execution-policies) (Time-To-Live)

Defines the maximum time a job can remain in the queue before it's automatically terminated. This parameter ensures that jobs don't stay in the queue indefinitely. You should set this if your job runs longer than 24 hours or if you want to remove job data as soon as it is finished.

Minimum value: 10,000 milliseconds (10 seconds)
Default value: 86,400,000 milliseconds (24 hours)
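The TTL and execution timeout above are per-request settings passed in the request body's `policy` field (see the send-requests page linked above). A minimal sketch of such a payload, with a hypothetical input:

```python
import json

# Hypothetical request payload; policy values are in milliseconds.
payload = {
    "input": {"prompt": "hello"},
    "policy": {
        "executionTimeout": 600000,  # terminate the job after 10 minutes of execution
        "ttl": 86400000,             # drop the job after 24 hours (the documented default)
    },
}

body = json.dumps(payload)  # serialized body for a /run request
```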

### Flashboot

FlashBoot is RunPod’s magic solution for reducing the average cold-start times on your endpoint. It works probabilistically. When your endpoint has consistent traffic, your workers have a higher chance of benefiting from FlashBoot for faster spin-ups. However, if your endpoint isn’t receiving frequent requests, FlashBoot has fewer opportunities to optimize performance. There’s no additional cost associated with FlashBoot.

### Scale Type

- Queue Delay scaling strategy adjusts worker numbers based on request wait times. With zero workers initially, the first request adds one worker. Subsequent requests add workers only after waiting in the queue for the defined number of delay seconds.
- Request Count scaling strategy adjusts worker numbers according to total requests in the queue and in progress. It automatically adds workers as the number of requests increases, ensuring tasks are handled efficiently.
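The two strategies can be illustrated with a toy calculation. This is an illustrative sketch only, not RunPod's actual autoscaler logic; the threshold and requests-per-worker ratio are made-up parameters:

```python
def workers_queue_delay(oldest_wait_seconds, delay_threshold, current_workers):
    """Queue delay: the first request starts one worker; afterwards a worker is
    added only once a queued request has waited past the configured delay."""
    if current_workers == 0:
        return 1
    if oldest_wait_seconds >= delay_threshold:
        return current_workers + 1
    return current_workers

def workers_request_count(total_requests, requests_per_worker=4):
    """Request count: scale workers with the total of queued plus in-progress
    requests, at an assumed ratio of requests handled per worker."""
    return -(-total_requests // requests_per_worker)  # ceiling division
```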

### Expose HTTP/TCP Ports

We allow direct communication with your worker using its public IP and port. This is especially useful for real-time applications that require minimal latency. Check out this [WebSocket example](https://github.com/runpod-workers/worker-websocket) to see how it works!

## Endpoint Metrics

### Requests

Displays the total number of requests received by your endpoint, along with the number of completed, failed, and retried requests.

### Execution Time

Displays the P70, P90, and P98 execution times for requests on your endpoint. These percentiles help analyze execution time distribution and identify potential performance bottlenecks.

### Delay Time

Delay time is the duration a request spends waiting in the queue before being picked up by a worker. Displays the P70, P90, and P98 delay times for requests on your endpoint. These percentiles help assess whether your endpoint is scaling efficiently.

### Cold Start Time

Cold start time measures how long it takes to wake up a worker. This includes the time needed to start the container, load the model into GPU VRAM, and get the worker ready to process a job. Displays the P70, P90, and P98 cold start times for your endpoint.

### Cold Start Count

Displays the number of cold starts your endpoint has during a given period. The fewer, the better, as fewer cold starts mean faster response times.

### Webhook Request Responses

Displays the number of webhook requests sent and their corresponding responses, including success and failure counts.
The [Serverless SDK](https://github.com/runpod/runpod-python) is a Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function's result back to the serverless system.

# Pod

@@ -101,13 +39,14 @@ GPU instances that run in T3/T4 data centers, providing high reliability and sec

GPU instances connect individual compute providers to consumers through a vetted, secure peer-to-peer system.

## Datacenter

A data center is a secure location where RunPod's cloud computing services, such as Secure Cloud and GPU Instances, are hosted. These data centers are equipped with redundancy and data backups to ensure the safety and reliability of your data.

## Data center

A data center is a secure location where RunPod's cloud computing services, such as GPU instances and storage instances, are hosted. These data centers are equipped with redundant power, multiple ISP connections, and data backups to ensure the safety and reliability of your compute services and data.

## GPU Instance

GPU Instance is a container-based GPU instance that you can deploy.

## GPU instance

A GPU instance is a container-based compute resource that you can deploy.

These instances spin up in seconds using both public and private repositories.
They are available in two different types:

2 changes: 1 addition & 1 deletion docs/hosting/burn-testing.md
@@ -1,5 +1,5 @@
---
title: "Burn Testing"
title: "Burn testing"
description: "Before listing a machine on the RunPod platform, thoroughly test it with a burn test, verifying memory, CPU, and disk capabilities, and ensure compatibility with popular templates by self-renting the machine after verifying its performance."
---

6 changes: 5 additions & 1 deletion docs/hosting/partner-requirements.md
@@ -1,4 +1,8 @@
# RunPod Secure Cloud Partner Requirements - Release 2025
---
title: "Secure Cloud partner requirements"
---

# RunPod Secure Cloud partner requirements (2025)

# Introduction

2 changes: 1 addition & 1 deletion docs/integrations/mods/mods.md
@@ -29,7 +29,7 @@ To start using Mods, follow these step-by-step instructions:

```yml
runpod:
# https://docs.runpod.io/serverless/workers/vllm/openai-compatibility
# https://docs.runpod.io/serverless/vllm/openai-compatibility
base-url: https://api.runpod.ai/v2/${YOUR_ENDPOINT}/openai/v1
api-key:
api-key-env: RUNPOD_API_KEY
2 changes: 1 addition & 1 deletion docs/overview.md
@@ -30,7 +30,7 @@ Use Serverless to:
### Get started with Serverless

- [Build your first Serverless app](/serverless/get-started)
- [Run any LLM as an endpoint using vLLM workers](/serverless/workers/vllm/get-started)
- [Run any LLM as an endpoint using vLLM workers](/serverless/vllm/get-started)
- [Tutorial: Create a Serverless endpoint with Stable Diffusion](/tutorials/serverless/gpu/run-your-first)

## Pods
8 changes: 4 additions & 4 deletions docs/references/troubleshooting/storage-full.md
@@ -1,12 +1,12 @@
---
title: "Storage Full"
title: "Storage full"
id: "storage-full"
description: "This document provides guidance for troubleshooting full storage, which may occur when users generate many files, transfer files, or perform other storage-intensive tasks."
---

Storage can fill up when users generate many files, transfer files, or perform other storage-intensive tasks. This document provides guidance to help you troubleshoot a full disk.

## Check Disk Usage
## Check disk usage

When you encounter a full storage condition, the first step is to check your container’s disk usage. You can use the `df -h` command to display a summary of disk usage.

@@ -34,7 +34,7 @@ tmpfs 252G 0 252G 0% /sys/firmware
tmpfs 252G 0 252G 0% /sys/devices/virtual/powercap
```

## Key Areas to Check
## Key areas to check

**Container Disk Usage**: The primary storage area for your container is mounted on the `overlay` filesystem. This indicates the container’s root directory.

@@ -62,7 +62,7 @@ root@9b8e325167b2:/# find /workspace -type f -exec du -h {} + | sort -rh | head
512 /workspace/a.txt
```

## Removing Files and Directories
## Removing files and directories

Once you’ve identified large files or directories that are no longer needed, you can remove them to free up space.
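A sketch of that cleanup flow, using a scratch directory in place of `/workspace` so it is safe to run anywhere:

```shell
# Make a scratch directory with a throwaway 10 MB file.
mkdir -p /tmp/workspace-demo
dd if=/dev/zero of=/tmp/workspace-demo/big.bin bs=1M count=10 2>/dev/null

# List the largest files, as shown earlier for /workspace.
find /tmp/workspace-demo -type f -exec du -h {} + | sort -rh | head -n 10

# Remove a single file, then the whole directory tree, to free the space.
rm /tmp/workspace-demo/big.bin
rm -rf /tmp/workspace-demo
```

Double-check paths before running `rm -rf`; files removed this way are not recoverable.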

8 changes: 4 additions & 4 deletions docs/references/troubleshooting/troubleshooting-502-errors.md
@@ -1,12 +1,12 @@
---
title: "502 Errors"
title: "502 errors"
id: "troubleshooting-502-errors"
description: "Troubleshoot 502 errors in your deployed pod by checking GPU attachment, pod logs, and official template instructions to resolve issues and enable seamless access."
---

502 errors can occur when users attempt to access a program on a specific port of a deployed pod and the program isn't running or has encountered an error. This document provides guidance to help you troubleshoot this error.

### Check Your Pod's GPU
### Check your Pod's GPU

The first step to troubleshooting a 502 error is to check whether your pod has a GPU attached.

@@ -18,7 +18,7 @@ If a GPU is attached, you will see it under the Pods screen (e.g. 1 x A6000). If

![](/img/docs/fb4c0dd-image.png)

### Check Your Pod's Logs
### Check your Pod's logs

After confirming that your pod has a GPU attached, the next step is to check your pod's logs for any errors.

@@ -27,7 +27,7 @@ After confirming that your pod has a GPU attached, the next step is to check you
2. ![](/img/docs/3500eba-image.png)\
**Look for errors**: Browse through the logs to find any error messages that may provide clues about why you're experiencing a 502 error.

### Verify Additional Steps for Official Templates
### Verify additional steps for official templates

In some cases, for our official templates, the user interface does not work right away and may require additional steps to be performed by the user.

2 changes: 1 addition & 1 deletion docs/sdks/javascript/endpoints.md
@@ -626,7 +626,7 @@ console.log(result);
</TabItem>
</Tabs>

For more information, see [Execution policy](/serverless/endpoints/job-operations).
For more information, see [Execution policy](/serverless/endpoints/operations).

## Purge queue

@@ -1,6 +1,6 @@
{
"label": "Development",
"position": 6,
"position": 10,
"link": {
"type": "generated-index",
"description": "Learn to develop your application."
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,11 +1,11 @@
---
title: "Local Server Flags"
title: "Local server flags"
description: "A comprehensive guide to all flags available when starting your RunPod local server for endpoint testing"
sidebar_position: 1
---

When developing RunPod serverless functions, it's crucial to test them thoroughly before deployment.
The RunPod SDK provides a powerful local testing environment that allows you to simulate your serverless endpoints right on your development machine.
When developing RunPod Serverless functions, it's crucial to test them thoroughly before deployment.
The RunPod SDK provides a powerful local testing environment that allows you to simulate your Serverless endpoints right on your development machine.
This local server eliminates the need for constant Docker container rebuilds, uploads, and endpoint updates during the development and testing phase.

To facilitate this local testing environment, the RunPod SDK offers a variety of flags that allow you to customize your setup.
@@ -20,7 +20,7 @@ By using these flags, you can create a local environment that closely mimics the

This guide provides a comprehensive overview of all available flags, their purposes, and how to use them effectively in your local testing workflow.

## Basic Usage
## Basic usage

To start your local server with additional flags, use the following format:

@@ -30,7 +30,7 @@ python your_function.py [flags]

Replace `your_function.py` with the name of your Python file containing the RunPod handler.

## Available Flags
## Available flags

### --rp_serve_api

@@ -138,6 +138,6 @@ python main.py --rp_serve_api \

This command starts the local server on port `8080` with 4 concurrent workers, sets the log level to `DEBUG`, and provides test input data.

These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your serverless functions are robust and ready for deployment to the RunPod cloud.
These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your Serverless functions are robust and ready for deployment to the RunPod cloud.

For more detailed information on each flag and advanced usage scenarios, refer to the individual tutorials in this documentation.
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/serverless/endpoints/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Endpoints",
"position": 5,
"position": 6,
"link": {
"type": "generated-index",
"description": "Learn how to customize the serverless functions used in your applications."