docs/api/api-endpoints.md (+7 -7)

```diff
@@ -1,6 +1,6 @@
 ---
-title: "API Endpoints"
-description: "Unlock the power of RunPod's API Endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
+title: "API endpoints"
+description: "Unlock the power of RunPod's API endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
 sidebar_position: 1
 ---
@@ -13,22 +13,22 @@ We don't keep your inputs or outputs longer than that to protect your privacy!
 :::

-API Endpoints are Endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
-These Endpoints are available to all users.
+API endpoints are endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
+These endpoints are available to all users.

 ## Overview

 The API endpoint implementation works asynchronously as well as synchronously.

 Let's take a look at the differences between the two implementations.

-### Asynchronous Endpoints
+### Asynchronous endpoints

 Asynchronous endpoints are useful for long-running jobs that you don't want to wait for. You can submit a job and then check back later to see if it's done.
 When you fire an asynchronous request with the API endpoint, your input parameters are sent to our endpoint and you immediately get a response with a unique job ID.
 You can then query the response by passing the job ID to the status endpoint. The status endpoint will give you the job results when completed.
```
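The asynchronous flow described in this hunk — submit to `/run`, receive a job ID, poll the status endpoint — can be sketched in Python. This is a minimal illustration, not the docs' own example: the API key, endpoint ID, and input payload are placeholders, and it assumes the `https://api.runpod.ai/v2/{endpoint_id}` URL shape used by RunPod's hosted endpoints.

```python
import time

import requests

API_KEY = "YOUR_API_KEY"          # placeholder: a RunPod API key
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder: the endpoint's ID
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job; /run returns immediately with a unique job ID.
submitted = requests.post(
    f"{BASE_URL}/run",
    headers=HEADERS,
    json={"input": {"prompt": "a photo of a cat"}},  # invented example input
)
job_id = submitted.json()["id"]

# Poll the status endpoint until the job finishes, then read the output.
while True:
    status = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)  # pause between polls rather than hammering the endpoint

print(status.get("output"))
```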
```diff
-### Synchronous Endpoints
+### Synchronous endpoints

 Synchronous endpoints are useful for short-running jobs that you want to wait for.
 You can submit a job and get the results back immediately.
```
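For comparison, a synchronous call hits `/runsync` and blocks until the result is ready — again a sketch under the same placeholder assumptions as above:

```python
import requests

API_KEY = "YOUR_API_KEY"          # placeholder, as above
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder, as above

# /runsync blocks until the job finishes and returns the output directly.
response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "a photo of a cat"}},  # invented example input
)
print(response.json().get("output"))
```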
```diff
@@ -137,4 +137,4 @@ Exceeding limits returns a `429` error.
 `/run` - 1000 requests/10s, max 200 concurrent
 `/runsync` - 2000 requests/10s, max 400 concurrent

-For more information, see [Job operations](/serverless/references/operations).
+For more information, see [Job operations](/serverless/endpoints/operations).
```
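Since exceeding the limits above returns a `429` error, a client can back off and retry. A minimal sketch — the retry count and exponential backoff policy here are our assumptions, not something the docs prescribe:

```python
import time

import requests


def post_with_backoff(url, headers, payload, retries=5):
    """POST, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between retries
    raise RuntimeError("still rate-limited after retries")
```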
docs/glossary.md (+9 -70)

```diff
@@ -19,77 +19,15 @@ A [worker](./serverless/workers/overview.md) is a single compute resource that p

 ## Endpoint

-An Endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.
+An endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.

 ## Handler

-A Handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.
+A handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.

 ## Serverless SDK

-A Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function’s result back to the serverless system.
-
-## Endpoint Settings
-
-### Idle Timeout
-
-The amount of time a worker remains running after completing its current request. During this period, the worker stays active, continuously checking the queue for new jobs, and continues to incur charges. If no new requests arrive within this time, the worker will go to sleep.
-
-Default: 5 seconds
-
-### Execution Timeout
-
-The maximum time a job can run before the system terminates the worker. This prevents “bad” jobs from running indefinitely and draining your credit.
-
-You can disable this setting, but we highly recommend keeping it enabled. The default maximum value is 24 hours, but if you need a longer duration, you can use job TTL to override it.
-
-### Job TTL
-
-Defines the maximum time a job can remain in the queue before it's automatically terminated. This parameter ensures that jobs don't stay in the queue indefinitely. You should set this if your job runs longer than 24 hours or if you want to remove job data as soon as it is finished.
-
-Minimum value: 10,000 milliseconds (10 seconds)
-Default value: 86,400,000 milliseconds (24 hours)
-
-### Flashboot
-
-FlashBoot is RunPod’s magic solution for reducing the average cold-start times on your endpoint. It works probabilistically. When your endpoint has consistent traffic, your workers have a higher chance of benefiting from FlashBoot for faster spin-ups. However, if your endpoint isn’t receiving frequent requests, FlashBoot has fewer opportunities to optimize performance. There’s no additional cost associated with FlashBoot.
-
-### Scale Type
-
-- Queue Delay scaling strategy adjusts worker numbers based on request wait times. With zero workers initially, the first request adds one worker. Subsequent requests add workers only after waiting in the queue for the defined number of delay seconds.
-- Request Count scaling strategy adjusts worker numbers according to total requests in the queue and in progress. It automatically adds workers as the number of requests increases, ensuring tasks are handled efficiently.
-
-### Expose HTTP/TCP Ports
-
-We allow direct communication with your worker using its public IP and port. This is especially useful for real-time applications that require minimal latency. Check out this [WebSocket example](https://github.com/runpod-workers/worker-websocket) to see how it works!
-
-## Endpoint Metrics
-
-### Requests
-
-Displays the total number of requests received by your endpoint, along with the number of completed, failed, and retried requests.
-
-### Execution Time
-
-Displays the P70, P90, and P98 execution times for requests on your endpoint. These percentiles help analyze execution time distribution and identify potential performance bottlenecks.
-
-### Delay Time
-
-Delay time is the duration a request spends waiting in the queue before being picked up by a worker. Displays the P70, P90, and P98 delay times for requests on your endpoint. These percentiles help assess whether your endpoint is scaling efficiently.
-
-### Cold Start Time
-
-Cold start time measures how long it takes to wake up a worker. This includes the time needed to start the container, load the model into GPU VRAM, and get the worker ready to process a job. Displays the P70, P90, and P98 cold start times for your endpoint.
-
-### Cold Start Count
-
-Displays the number of cold starts your endpoint has during a given period. The fewer, the better, as fewer cold starts mean faster response times.
-
-### WebhookRequest Responses
-
-Displays the number of webhook requests sent and their corresponding responses, including success and failure counts.
+The [Serverless SDK](https://github.com/runpod/runpod-python) is a Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function's result back to the serverless system.
```
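To make the Handler and Serverless SDK entries concrete, a minimal handler might look like the sketch below. The greeting logic is invented for illustration; `runpod.serverless.start` is the SDK's entry point for serving a handler, but check the behavior against the SDK version you use.

```python
import runpod  # the Serverless SDK: pip install runpod


def handler(job):
    """Receives a submitted job, processes it, and returns the output."""
    # job["input"] carries whatever payload was submitted to the endpoint.
    name = job["input"].get("name", "world")
    # The returned value is what the status endpoint reports as the job output.
    return {"greeting": f"Hello, {name}!"}


# Hands the handler to the serverless system, which invokes it per request.
runpod.serverless.start({"handler": handler})
```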
```diff

 # Pod
@@ -101,13 +39,14 @@ GPU instances that run in T3/T4 data centers, providing high reliability and sec

 GPU instances connect individual compute providers to consumers through a vetted, secure peer-to-peer system.

-## Datacenter
+## Data center

-A data center is a secure location where RunPod's cloud computing services, such as Secure Cloud and GPU Instances, are hosted. These data centers are equipped with redundancy and data backups to ensure the safety and reliability of your data.
+A data center is a secure location where RunPod's cloud computing services, such as GPU instances and storage instances, are hosted. These data centers are equipped with redundant power, multiple ISP connections, and data backups to ensure the safety and reliability of your compute services and data.

-##GPU Instance
+## GPU instance

-GPU Instance is a container-based GPU instance that you can deploy.
+A GPU instance is a container-based compute resource that you can deploy.
 These instances spin up in seconds using both public and private repositories.
```
docs/hosting/burn-testing.md (+1 -1)

```diff
@@ -1,5 +1,5 @@
 ---
-title: "Burn Testing"
+title: "Burn testing"
 description: "Before listing a machine on the RunPod platform, thoroughly test it with a burn test, verifying memory, CPU, and disk capabilities, and ensure compatibility with popular templates by self-renting the machine after verifying its performance."
```
docs/references/troubleshooting/storage-full.md (+4 -4)

````diff
@@ -1,12 +1,12 @@
 ---
-title: "Storage Full"
+title: "Storage full"
 id: "storage-full"
 description: "This document provides guidance for troubleshooting full storage, which may occur when users generate many files, transfer files, or perform other storage-intensive tasks."
 ---

 Full storage can occur when users generate many files, transfer files, or perform other storage-intensive tasks. This document provides guidance to help you troubleshoot it.

-## Check Disk Usage
+## Check disk usage

 When you encounter full storage, the first step is to check your container’s disk usage. You can use the `df -h` command to display a summary of disk usage.

 **Container Disk Usage**: The primary storage area for your container is mounted on the `overlay` filesystem. This indicates the container’s root directory.
@@ -62,7 +62,7 @@ root@9b8e325167b2:/# find /workspace -type f -exec du -h {} + | sort -rh | head
 512	/workspace/a.txt
 ```

-## Removing Files and Directories
+## Removing files and directories

 Once you’ve identified large files or directories that are no longer needed, you can remove them to free up space.
````
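As a companion to the shell commands shown in this doc, the cleanup step can also be scripted. A small Python sketch, where `/workspace/old-checkpoints` is an invented example path — substitute whatever your disk-usage check identified:

```python
import os
import shutil

# Invented example path — replace with the file or directory you identified.
target = "/workspace/old-checkpoints"

if os.path.isfile(target):
    os.remove(target)      # delete a single file
elif os.path.isdir(target):
    shutil.rmtree(target)  # delete a whole directory tree; irreversible
```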
docs/references/troubleshooting/troubleshooting-502-errors.md (+4 -4)

```diff
@@ -1,12 +1,12 @@
 ---
-title: "502 Errors"
+title: "502 errors"
 id: "troubleshooting-502-errors"
 description: "Troubleshoot 502 errors in your deployed pod by checking GPU attachment, pod logs, and official template instructions to resolve issues and enable seamless access."
 ---

 502 errors can occur when users attempt to access a program running on a specific port of a deployed pod and the program isn't running or has encountered an error. This document provides guidance to help you troubleshoot this error.

-### Check Your Pod's GPU
+### Check your Pod's GPU

 The first step to troubleshooting a 502 error is to check whether your pod has a GPU attached.
@@ -18,7 +18,7 @@ If a GPU is attached, you will see it under the Pods screen (e.g. 1 x A6000). If

 

-### Check Your Pod's Logs
+### Check your Pod's logs

 After confirming that your pod has a GPU attached, the next step is to check your pod's logs for any errors.
@@ -27,7 +27,7 @@ After confirming that your pod has a GPU attached, the next step is to check you
 2.\
 **Look for errors**: Browse through the logs to find any error messages that may provide clues about why you're experiencing a 502 error.

-### Verify Additional Steps for Official Templates
+### Verify additional steps for official templates

 In some cases, for our official templates, the user interface does not work right away and may require additional steps to be performed by the user.
```
```diff
 This command starts the local server on port `8080` with 4 concurrent workers, sets the log level to `DEBUG`, and provides test input data.

-These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your serverless functions are robust and ready for deployment to the RunPod cloud.
+These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your Serverless functions are robust and ready for deployment to the RunPod cloud.

 For more detailed information on each flag and advanced usage scenarios, refer to the individual tutorials in this documentation.
```
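The command this hunk refers to is not captured in the view above. A hypothetical reconstruction, assuming the local-test flags provided by the runpod Python SDK (`--rp_serve_api`, `--rp_api_port`, `--rp_api_concurrency`, `--rp_log_level`, `--test_input`) and an invented `handler.py` filename — verify the flag names against your SDK version:

```sh
# Hypothetical reconstruction; the actual command is elided from this diff.
python handler.py --rp_serve_api \
  --rp_api_port 8080 \
  --rp_api_concurrency 4 \
  --rp_log_level DEBUG \
  --test_input '{"input": {"prompt": "hello"}}'
```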
0 commit comments