Commit 875bee4

muhsinking authored and DeJayDev committed
Restructure Serverless docs and rewrite /serverless/endpoints/ section (#227)
1 parent 8536493 commit 875bee4

57 files changed, +1281 −945 lines changed

docs/api/api-endpoints.md

+7-7
@@ -1,6 +1,6 @@
 ---
-title: "API Endpoints"
-description: "Unlock the power of RunPod's API Endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
+title: "API endpoints"
+description: "Unlock the power of RunPod's API endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
 sidebar_position: 1
 ---
@@ -13,22 +13,22 @@ We don't keep your inputs or outputs longer than that to protect your privacy!

 :::

-API Endpoints are Endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
-These Endpoints are available to all users.
+API endpoints are endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
+These endpoints are available to all users.

 ## Overview

 The API Endpoint implementation works asynchronously as well as synchronous.

 Let's take a look at the differences between the two different implementations.

-### Asynchronous Endpoints
+### Asynchronous endpoints

 Asynchronous endpoints are useful for long-running jobs that you don't want to wait for. You can submit a job and then check back later to see if it's done.
 When you fire an Asynchronous request with the API Endpoint, your input parameters are sent to our endpoint and you immediately get a response with a unique job ID.
 You can then query the response by passing the job ID to the status endpoint. The status endpoint will give you the job results when completed.

-### Synchronous Endpoints
+### Synchronous endpoints

 Synchronous endpoints are useful for short-running jobs that you want to wait for.
 You can submit a job and get the results back immediately.
@@ -137,4 +137,4 @@ Exceeding limits returns a `429` error.
 `/run` - 1000 requests/10s, max 200 concurrent
 `/runsync` - 2000 requests/10s, max 400 concurrent

-For more information, see [Job operations](/serverless/references/operations).
+For more information, see [Job operations](/serverless/endpoints/operations).
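A minimal Python sketch of the asynchronous flow this file describes (submit to `/run`, then poll `/status/{job_id}`). The transport callables are injected so the sketch runs without a network; the response fields (`id`, `status`, `COMPLETED`, `output`) reflect RunPod's job schema but should be treated as assumptions here, and `endpoint_id` is a placeholder:

```python
import time

API_BASE = "https://api.runpod.ai/v2/{endpoint_id}"


def run_async(post, get, endpoint_id, payload, poll_interval=2.0, timeout=60.0):
    """Submit a job to /run, then poll /status/{job_id} until it completes.

    `post(url, json_body)` and `get(url)` are injected callables (e.g. thin
    wrappers around requests.post/requests.get) that return parsed JSON dicts.
    """
    base = API_BASE.format(endpoint_id=endpoint_id)
    job = post(f"{base}/run", payload)  # immediate response with a unique job ID
    job_id = job["id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get(f"{base}/status/{job_id}")
        if status.get("status") == "COMPLETED":
            return status["output"]
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not complete within {timeout}s")
```

The synchronous variant described above is simpler: a single `post` to `/runsync` that blocks until the result is ready.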

docs/glossary.md

+9-70
@@ -19,77 +19,15 @@ A [worker](./serverless/workers/overview.md) is a single compute resource that p

 ## Endpoint

-An Endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.
+An endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.

 ## Handler

-A Handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.
+A handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.

-## Serverless [SDK](https://github.com/runpod/runpod-python?tab=readme-ov-file#--serverless-worker-sdk)
+## Serverless SDK

-A Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function’s result back to the serverless system.
-
-## Endpoint Settings
-
-### Idle Timeout
-
-The amount of time a worker remains running after completing its current request. During this period, the worker stays active, continuously checking the queue for new jobs, and continues to incur charges. If no new requests arrive within this time, the worker will go to sleep.
-
-Default: 5 seconds
-
-### Execution Timeout
-
-The maximum time a job can run before the system terminates the worker. This prevents “bad” jobs from running indefinitely and draining your credit.
-
-You can disable this setting, but we highly recommend keeping it enabled. The default maximum value is 24 hours, but if you need a longer duration, you can use job TTL to override it.
-
-Default: 600 seconds (10 minutes)
-
-### Job [TTL](/serverless/endpoints/send-requests#execution-policies)(Time-To-Live)
-
-Defines the maximum time a job can remain in the queue before it's automatically terminated. This parameter ensures that jobs don't stay in the queue indefinitely. You should set this if your job runs longer than 24 hours or if you want to remove job data as soon as it is finished.
-
-Minimum value: 10,000 milliseconds (10 seconds)
-Default value: 86,400,000 milliseconds (24 hours)
-
-### Flashboot
-
-FlashBoot is RunPod’s magic solution for reducing the average cold-start times on your endpoint. It works probabilistically. When your endpoint has consistent traffic, your workers have a higher chance of benefiting from FlashBoot for faster spin-ups. However, if your endpoint isn’t receiving frequent requests, FlashBoot has fewer opportunities to optimize performance. There’s no additional cost associated with FlashBoot.
-
-### Scale Type
-
-- Queue Delay scaling strategy adjusts worker numbers based on request wait times. With zero workers initially, the first request adds one worker. Subsequent requests add workers only after waiting in the queue for the defined number of delay seconds.
-- Request Count scaling strategy adjusts worker numbers according to total requests in the queue and in progress. It automatically adds workers as the number of requests increases, ensuring tasks are handled efficiently.
-
-### Expose HTTP/TCP Ports
-
-We allow direct communication with your worker using its public IP and port. This is especially useful for real-time applications that require minimal latency. Check out this [WebSocket example](https://github.com/runpod-workers/worker-websocket) to see how it works!
-
-## Endpoint Metrics
-
-### Requests
-
-Displays the total number of requests received by your endpoint, along with the number of completed, failed, and retried requests.
-
-### Execution Time
-
-Displays the P70, P90, and P98 execution times for requests on your endpoint. These percentiles help analyze execution time distribution and identify potential performance bottlenecks.
-
-### Delay Time
-
-Delay time is the duration a request spends waiting in the queue before being picked up by a worker. Displays the P70, P90, and P98 delay times for requests on your endpoint. These percentiles help assess whether your endpoint is scaling efficiently.
-
-### Cold Start Time
-
-Cold start time measures how long it takes to wake up a worker. This includes the time needed to start the container, load the model into GPU VRAM, and get the worker ready to process a job. Displays the P70, P90, and P98 cold start times for your endpoint.
-
-### Cold Start Count
-
-Displays the number of cold starts your endpoint has during a given period. The fewer, the better, as fewer cold starts mean faster response times.
-
-### WebhookRequest Responses
-
-Displays the number of webhook requests sent and their corresponding responses, including success and failure counts.
+The [Serverless SDK](https://github.com/runpod/runpod-python) is a Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function's result back to the serverless system.

 # Pod

@@ -101,13 +39,14 @@ GPU instances that run in T3/T4 data centers, providing high reliability and sec

 GPU instances connect individual compute providers to consumers through a vetted, secure peer-to-peer system.

-## Datacenter
+## Data center
+
+A data center is a secure location where RunPod's cloud computing services, such as GPU instances and storage instances, are hosted. These data centers are equipped with redundant power, multiple ISP connections, and data backups to ensure the safety and reliability of your compute services and data.

-A data center is a secure location where RunPod's cloud computing services, such as Secure Cloud and GPU Instances, are hosted. These data centers are equipped with redundancy and data backups to ensure the safety and reliability of your data.
+## GPU instance

-## GPU Instance
+A GPU instance is a container-based compute resource that you can deploy.

-GPU Instance is a container-based GPU instance that you can deploy.
 These instances spin up in seconds using both public and private repositories.
 They are available in two different types:

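The glossary's handler and Serverless SDK entries map to just a few lines of code. A minimal sketch — the `prompt` field and the uppercasing are illustrative, not part of any RunPod contract:

```python
def handler(job):
    # The serverless system delivers the request body as job["input"].
    prompt = job["input"].get("prompt", "")
    # Whatever you return becomes the job's output.
    return {"output": prompt.upper()}


# In a deployed worker the Serverless SDK wires this up (commented out so the
# sketch runs without the runpod package installed):
# import runpod
# runpod.serverless.start({"handler": handler})
```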

docs/hosting/burn-testing.md

+1-1
@@ -1,5 +1,5 @@
 ---
-title: "Burn Testing"
+title: "Burn testing"
 description: "Before listing a machine on the RunPod platform, thoroughly test it with a burn test, verifying memory, CPU, and disk capabilities, and ensure compatibility with popular templates by self-renting the machine after verifying its performance."
 ---


docs/hosting/partner-requirements.md

+5-1
@@ -1,4 +1,8 @@
-# RunPod Secure Cloud Partner Requirements - Release 2025
+---
+title: "Secure Cloud partner requirements"
+---
+
+# RunPod Secure Cloud partner requirements (2025)

 # Introduction

docs/integrations/mods/mods.md

+1-1
@@ -29,7 +29,7 @@ To start using Mods, follow these step-by-step instructions:

 ```yml
 runpod:
-  # https://docs.runpod.io/serverless/workers/vllm/openai-compatibility
+  # https://docs.runpod.io/serverless/vllm/openai-compatibility
   base-url: https://api.runpod.ai/v2/${YOUR_ENDPOINT}/openai/v1
   api-key:
   api-key-env: RUNPOD_API_KEY

docs/overview.md

+1-1
@@ -30,7 +30,7 @@ Use Serverless to:
 ### Get started with Serverless

 - [Build your first Serverless app](/serverless/get-started)
-- [Run any LLM as an endpoint using vLLM workers](/serverless/workers/vllm/get-started)
+- [Run any LLM as an endpoint using vLLM workers](/serverless/vllm/get-started)
 - [Tutorial: Create a Serverless endpoint with Stable Diffusion](/tutorials/serverless/gpu/run-your-first)

 ## Pods

docs/references/troubleshooting/storage-full.md

+4-4
@@ -1,12 +1,12 @@
 ---
-title: "Storage Full"
+title: "Storage full"
 id: "storage-full"
 description: "This document provides guidance to troubleshoot the storage full, which may occur when users generate many files, transfer files, or perform other storage-intensive tasks."
 ---

 Storage full can occur when users generate many files, transfer files, or perform other storage-intensive tasks. This document provides guidance to help you troubleshoot this.

-## Check Disk Usage
+## Check disk usage

 When encountering a storage full, the first step is to check your container’s disk usage. You can use the `df -h` command to display a summary of disk usage.

@@ -34,7 +34,7 @@ tmpfs 252G 0 252G 0% /sys/firmware
 tmpfs 252G 0 252G 0% /sys/devices/virtual/powercap
 ```

-## Key Areas to Check
+## Key areas to check

 **Container Disk Usage**: The primary storage area for your container is mounted on the `overlay` filesystem. This indicates the container’s root directory.

@@ -62,7 +62,7 @@ root@9b8e325167b2:/# find /workspace -type f -exec du -h {} + | sort -rh | head
 512 /workspace/a.txt
 ```

-## Removing Files and Directories
+## Removing files and directories

 Once you’ve identified large files or directories that are no longer needed, you can remove them to free up space.

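The `find … -exec du -h {} + | sort -rh | head` pipeline in this file can also be done programmatically, which is handy in a container missing those utilities. A minimal standard-library sketch:

```python
import os


def largest_files(root, top_n=10):
    """Walk `root` and return the top_n largest files as (size_bytes, path)
    pairs, sorted largest first — a stand-in for the shell pipeline above."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top_n]
```

Pointing it at `/workspace` (or `/`) shows where the container disk is going before you start removing files.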

docs/references/troubleshooting/troubleshooting-502-errors.md

+4-4
@@ -1,12 +1,12 @@
 ---
-title: "502 Errors"
+title: "502 errors"
 id: "troubleshooting-502-errors"
 description: "Troubleshoot 502 errors in your deployed pod by checking GPU attachment, pod logs, and official template instructions to resolve issues and enable seamless access."
 ---

 502 errors can occur when users attempt to access a program running on a specific port of a deployed pod and the program isn't running or has encountered an error. This document provides guidance to help you troubleshoot this error.

-### Check Your Pod's GPU
+### Check your Pod's GPU

 The first step to troubleshooting a 502 error is to check whether your pod has a GPU attached.

@@ -18,7 +18,7 @@ If a GPU is attached, you will see it under the Pods screen (e.g. 1 x A6000). If

 ![](/img/docs/fb4c0dd-image.png)

-### Check Your Pod's Logs
+### Check your Pod's logs

 After confirming that your pod has a GPU attached, the next step is to check your pod's logs for any errors.

@@ -27,7 +27,7 @@ After confirming that your pod has a GPU attached, the next step is to check you
 2. ![](/img/docs/3500eba-image.png)\
 **Look for errors**: Browse through the logs to find any error messages that may provide clues about why you're experiencing a 502 error.

-### Verify Additional Steps for Official Templates
+### Verify additional steps for official templates

 In some cases, for our official templates, the user interface does not work right away and may require additional steps to be performed by the user.

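Since a 502 usually means nothing is listening on the exposed port, a quick programmatic check run from inside the pod can confirm this. This is a sketch, not part of RunPod's tooling:

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if something accepts TCP connections on host:port.
    If this fails inside the pod, the proxy's 502 is expected."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_listening("127.0.0.1", 8888)` tells you whether the program behind the 502 is actually up on port 8888.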

docs/sdks/javascript/endpoints.md

+1-1
@@ -626,7 +626,7 @@ console.log(result);
 </TabItem>
 </Tabs>

-For more information, see [Execution policy](/serverless/endpoints/job-operations).
+For more information, see [Execution policy](/serverless/endpoints/operations).

 ## Purge queue


docs/serverless/workers/development/_category_.json renamed to docs/serverless/development/_category_.json

+1-1
@@ -1,6 +1,6 @@
 {
   "label": "Development",
-  "position": 6,
+  "position": 10,
   "link": {
     "type": "generated-index",
     "description": "Learn to develop your application."

docs/serverless/workers/development/overview.md renamed to docs/serverless/development/overview.md

+6-6
@@ -1,11 +1,11 @@
 ---
-title: "Local Server Flags"
+title: "Local server flags"
 description: "A comprehensive guide to all flags available when starting your RunPod local server for endpoint testing"
 sidebar_position: 1
 ---

-When developing RunPod serverless functions, it's crucial to test them thoroughly before deployment.
-The RunPod SDK provides a powerful local testing environment that allows you to simulate your serverless endpoints right on your development machine.
+When developing RunPod Serverless functions, it's crucial to test them thoroughly before deployment.
+The RunPod SDK provides a powerful local testing environment that allows you to simulate your Serverless endpoints right on your development machine.
 This local server eliminates the need for constant Docker container rebuilds, uploads, and endpoint updates during the development and testing phase.

 To facilitate this local testing environment, the RunPod SDK offers a variety of flags that allow you to customize your setup.
@@ -20,7 +20,7 @@ By using these flags, you can create a local environment that closely mimics the

 This guide provides a comprehensive overview of all available flags, their purposes, and how to use them effectively in your local testing workflow.

-## Basic Usage
+## Basic usage

 To start your local server with additional flags, use the following format:

@@ -30,7 +30,7 @@ python your_function.py [flags]

 Replace `your_function.py` with the name of your Python file containing the RunPod handler.

-## Available Flags
+## Available flags

 ### --rp_serve_api

@@ -138,6 +138,6 @@ python main.py --rp_serve_api \

 This command starts the local server on port `8080` with 4 concurrent workers, sets the log level to `DEBUG`, and provides test input data.

-These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your serverless functions are robust and ready for deployment to the RunPod cloud.
+These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your Serverless functions are robust and ready for deployment to the RunPod cloud.

 For more detailed information on each flag and advanced usage scenarios, refer to the individual tutorials in this documentation.
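The flag summary in this file ("port `8080` with 4 concurrent workers, log level `DEBUG`") suggests a typical invocation. In the sketch below, only `--rp_serve_api` appears verbatim in this excerpt; the other flag names are assumptions based on the RunPod SDK and should be checked against the full flag list:

```python
# main.py — started locally with something like:
#
#   python main.py --rp_serve_api --rp_api_port 8080 --rp_api_concurrency 4 \
#       --rp_log_level DEBUG
#
# (only --rp_serve_api is shown verbatim in this guide; the other flag names
# are assumptions and should be verified against the SDK's flag list)


def handler(job):
    # The local server routes test requests to this function, just as the
    # cloud would; the "name" field is illustrative only.
    name = job["input"].get("name", "world")
    return {"greeting": f"Hello, {name}!"}


# The SDK call that the local server wraps (commented out so the sketch runs
# without the runpod package installed):
# import runpod
# runpod.serverless.start({"handler": handler})
```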

docs/serverless/endpoints/_category_.json

+1-1
@@ -1,6 +1,6 @@
 {
   "label": "Endpoints",
-  "position": 5,
+  "position": 6,
   "link": {
     "type": "generated-index",
     "description": "Learn how to customize the serverless functions used by in your applications."

0 commit comments
