
Restructure Serverless docs and rewrite /serverless/endpoints/ section #227


Merged 32 commits on Apr 18, 2025.

Commits
186bf0c
Restructure serverless docs
muhsinking Apr 16, 2025
dd9a726
Fix redirects
muhsinking Apr 16, 2025
877339f
Fix all links
muhsinking Apr 16, 2025
516f48f
Fix redirects
muhsinking Apr 16, 2025
8dcf2d0
Fix redirects v2
muhsinking Apr 16, 2025
e8c7ec6
Redirect test v3
muhsinking Apr 16, 2025
2b70abf
Add more redirects
muhsinking Apr 16, 2025
ffa615f
Add comments to redirects, reorder categories, fix case in misc files
muhsinking Apr 16, 2025
7b1ff97
Fix cases in glossary
muhsinking Apr 16, 2025
2294f04
Remove redirects.json
muhsinking Apr 16, 2025
9c078c5
Fix capitalization of "endpoints"
muhsinking Apr 16, 2025
2252864
Fix case of headers and sidebar
muhsinking Apr 16, 2025
57591e0
Glossary: Move links out of header
muhsinking Apr 16, 2025
8e2924f
Update serverless doc titles
muhsinking Apr 16, 2025
2e32353
Move reference -> endpoints, many updates to endpoints docs
muhsinking Apr 17, 2025
368b593
Rewrite endpoints overview
muhsinking Apr 17, 2025
7cddd4f
Rewrite manage-endpoints
muhsinking Apr 17, 2025
7a75c01
Rename send requests
muhsinking Apr 17, 2025
60b672b
Update post-review
muhsinking Apr 17, 2025
129c51d
Rewrite endpoint configurations
muhsinking Apr 17, 2025
e875836
Revert change in pagination
muhsinking Apr 17, 2025
d11c2c7
Rewrite job-operations.md, merge/delete operations.md
muhsinking Apr 17, 2025
3edeb21
Small fix in job-operations
muhsinking Apr 17, 2025
6605dc7
Rewrite send-requests.md
muhsinking Apr 18, 2025
f84c364
rename job-operations -> operations, update links, lint
muhsinking Apr 18, 2025
548f4a2
Fix broken link
muhsinking Apr 18, 2025
a72bde4
Fix broken link
muhsinking Apr 18, 2025
b84bb8d
Remove stream from execution modes
muhsinking Apr 18, 2025
932e802
Rewrite overview without bullet points
muhsinking Apr 18, 2025
04fe143
Fix link
muhsinking Apr 18, 2025
d95ec7d
Update formatting
muhsinking Apr 18, 2025
26aa47e
Update for review
muhsinking Apr 18, 2025
14 changes: 7 additions & 7 deletions docs/api/api-endpoints.md
@@ -1,6 +1,6 @@
---
title: "API Endpoints"
description: "Unlock the power of RunPod's API Endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
title: "API endpoints"
description: "Unlock the power of RunPod's API endpoints, manage models without managing pods, and retrieve results via the status endpoint within 30 minutes for privacy protection; rate limits enforced per user."
sidebar_position: 1
---

@@ -13,22 +13,22 @@ We don't keep your inputs or outputs longer than that to protect your privacy!

:::

API Endpoints are Endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
These Endpoints are available to all users.
API endpoints are endpoints managed by RunPod that you can use to interact with your favorite models without managing the pods yourself.
These endpoints are available to all users.

## Overview

The API Endpoint implementation works asynchronously as well as synchronously.

Let's take a look at the differences between the two implementations.

### Asynchronous Endpoints
### Asynchronous endpoints

Asynchronous endpoints are useful for long-running jobs that you don't want to wait for. You can submit a job and then check back later to see if it's done.
When you fire an Asynchronous request with the API Endpoint, your input parameters are sent to our endpoint and you immediately get a response with a unique job ID.
You can then query the response by passing the job ID to the status endpoint. The status endpoint will give you the job results when completed.

### Synchronous Endpoints
### Synchronous endpoints

Synchronous endpoints are useful for short-running jobs that you want to wait for.
You can submit a job and get the results back immediately.
@@ -137,4 +137,4 @@ Exceeding limits returns a `429` error.
`/run` - 1000 requests/10s, max 200 concurrent
`/runsync` - 2000 requests/10s, max 400 concurrent

For more information, see [Job operations](/serverless/references/operations).
For more information, see [Job operations](/serverless/endpoints/operations).
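The asynchronous (`/run` + `/status`) and synchronous (`/runsync`) flows described in this file can be sketched as follows. This is a minimal sketch using only the Python standard library; the endpoint ID and API key are placeholders you supply, and error handling and retries are omitted:

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"

def endpoint_url(endpoint_id, path):
    """Build the URL for an endpoint operation: run, runsync, or status/<job_id>."""
    return f"{API_BASE}/{endpoint_id}/{path}"

def _call(url, api_key, payload=None):
    """POST a JSON-wrapped input (or GET when payload is None) and decode the reply."""
    data = json.dumps({"input": payload}).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_async(endpoint_id, api_key, payload):
    """Submit via /run: returns immediately with a unique job ID to poll on /status."""
    return _call(endpoint_url(endpoint_id, "run"), api_key, payload)["id"]

def check_status(endpoint_id, api_key, job_id):
    """Query /status with the job ID; the reply includes the output once completed."""
    return _call(endpoint_url(endpoint_id, f"status/{job_id}"), api_key)

def run_sync(endpoint_id, api_key, payload):
    """Submit via /runsync: blocks until the job result is available."""
    return _call(endpoint_url(endpoint_id, "runsync"), api_key, payload)
```

Note that the `/run` and `/runsync` rate limits quoted above apply per user, so a polling loop around `check_status` should back off rather than poll in a tight loop.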
79 changes: 9 additions & 70 deletions docs/glossary.md
@@ -19,77 +19,15 @@ A [worker](./serverless/workers/overview.md) is a single compute resource that p

## Endpoint

An Endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.
An endpoint refers to a specific REST API (URL) provided by RunPod that your applications or services can interact with. These endpoints enable standard functionality for submitting jobs and retrieving their outputs.

## Handler

A Handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.
A handler is a function you create that takes in submitted inputs, processes them (like generating images, text, or audio), and returns the final output.

## Serverless [SDK](https://github.com/runpod/runpod-python?tab=readme-ov-file#--serverless-worker-sdk)
## Serverless SDK

A Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function’s result back to the serverless system.

## Endpoint Settings

### Idle Timeout

The amount of time a worker remains running after completing its current request. During this period, the worker stays active, continuously checking the queue for new jobs, and continues to incur charges. If no new requests arrive within this time, the worker will go to sleep.

Default: 5 seconds

### Execution Timeout

The maximum time a job can run before the system terminates the worker. This prevents “bad” jobs from running indefinitely and draining your credit.

You can disable this setting, but we highly recommend keeping it enabled. The default maximum value is 24 hours, but if you need a longer duration, you can use job TTL to override it.

Default: 600 seconds (10 minutes)

### Job [TTL](/serverless/endpoints/send-requests#execution-policies) (Time-To-Live)

Defines the maximum time a job can remain in the queue before it's automatically terminated. This parameter ensures that jobs don't stay in the queue indefinitely. You should set this if your job runs longer than 24 hours or if you want to remove job data as soon as it is finished.

Minimum value: 10,000 milliseconds (10 seconds)
Default value: 86,400,000 milliseconds (24 hours)
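The TTL and execution timeout above are per-request settings passed in the request body's `policy` field (see the send-requests page linked above). A minimal sketch of such a payload, with a hypothetical input:

```python
import json

# Hypothetical request payload; policy values are in milliseconds.
payload = {
    "input": {"prompt": "hello"},
    "policy": {
        "executionTimeout": 600000,  # terminate the job after 10 minutes of execution
        "ttl": 86400000,             # drop the job after 24 hours (the documented default)
    },
}

body = json.dumps(payload)  # serialized body for a /run request
```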

### Flashboot

FlashBoot is RunPod’s magic solution for reducing the average cold-start times on your endpoint. It works probabilistically. When your endpoint has consistent traffic, your workers have a higher chance of benefiting from FlashBoot for faster spin-ups. However, if your endpoint isn’t receiving frequent requests, FlashBoot has fewer opportunities to optimize performance. There’s no additional cost associated with FlashBoot.

### Scale Type

- Queue Delay scaling strategy adjusts worker numbers based on request wait times. With zero workers initially, the first request adds one worker. Subsequent requests add workers only after waiting in the queue for the defined number of delay seconds.
- Request Count scaling strategy adjusts worker numbers according to total requests in the queue and in progress. It automatically adds workers as the number of requests increases, ensuring tasks are handled efficiently.
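The two strategies can be illustrated with a toy calculation. This is an illustrative sketch only, not RunPod's actual autoscaler logic; the threshold and requests-per-worker ratio are made-up parameters:

```python
def workers_queue_delay(oldest_wait_seconds, delay_threshold, current_workers):
    """Queue delay: the first request starts one worker; afterwards a worker is
    added only once a queued request has waited past the configured delay."""
    if current_workers == 0:
        return 1
    if oldest_wait_seconds >= delay_threshold:
        return current_workers + 1
    return current_workers

def workers_request_count(total_requests, requests_per_worker=4):
    """Request count: scale workers with the total of queued plus in-progress
    requests, at an assumed ratio of requests handled per worker."""
    return -(-total_requests // requests_per_worker)  # ceiling division
```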

### Expose HTTP/TCP Ports

We allow direct communication with your worker using its public IP and port. This is especially useful for real-time applications that require minimal latency. Check out this [WebSocket example](https://github.com/runpod-workers/worker-websocket) to see how it works!

## Endpoint Metrics

### Requests

Displays the total number of requests received by your endpoint, along with the number of completed, failed, and retried requests.

### Execution Time

Displays the P70, P90, and P98 execution times for requests on your endpoint. These percentiles help analyze execution time distribution and identify potential performance bottlenecks.

### Delay Time

Delay time is the duration a request spends waiting in the queue before being picked up by a worker. Displays the P70, P90, and P98 delay times for requests on your endpoint. These percentiles help assess whether your endpoint is scaling efficiently.

### Cold Start Time

Cold start time measures how long it takes to wake up a worker. This includes the time needed to start the container, load the model into GPU VRAM, and get the worker ready to process a job. Displays the P70, P90, and P98 cold start times for your endpoint.

### Cold Start Count

Displays the number of cold starts your endpoint has during a given period. The fewer, the better, as fewer cold starts mean faster response times.

### Webhook Request Responses

Displays the number of webhook requests sent and their corresponding responses, including success and failure counts.
The [Serverless SDK](https://github.com/runpod/runpod-python) is a Python package used when creating a handler function. This package helps your code receive requests from our serverless system, triggers your handler function to execute, and returns the function's result back to the serverless system.

# Pod

@@ -101,13 +39,14 @@ GPU instances that run in T3/T4 data centers, providing high reliability and sec

GPU instances connect individual compute providers to consumers through a vetted, secure peer-to-peer system.

## Datacenter

A data center is a secure location where RunPod's cloud computing services, such as Secure Cloud and GPU Instances, are hosted. These data centers are equipped with redundancy and data backups to ensure the safety and reliability of your data.

## Data center

A data center is a secure location where RunPod's cloud computing services, such as GPU instances and storage instances, are hosted. These data centers are equipped with redundant power, multiple ISP connections, and data backups to ensure the safety and reliability of your compute services and data.

## GPU Instance

GPU Instance is a container-based GPU instance that you can deploy.

## GPU instance

A GPU instance is a container-based compute resource that you can deploy.

These instances spin up in seconds using both public and private repositories.
They are available in two different types:

2 changes: 1 addition & 1 deletion docs/hosting/burn-testing.md
@@ -1,5 +1,5 @@
---
title: "Burn Testing"
title: "Burn testing"
description: "Before listing a machine on the RunPod platform, thoroughly test it with a burn test, verifying memory, CPU, and disk capabilities, and ensure compatibility with popular templates by self-renting the machine after verifying its performance."
---

6 changes: 5 additions & 1 deletion docs/hosting/partner-requirements.md
@@ -1,4 +1,8 @@
# RunPod Secure Cloud Partner Requirements - Release 2025
---
title: "Secure Cloud partner requirements"
---

# RunPod Secure Cloud partner requirements (2025)

# Introduction

2 changes: 1 addition & 1 deletion docs/integrations/mods/mods.md
@@ -29,7 +29,7 @@ To start using Mods, follow these step-by-step instructions:

```yml
runpod:
# https://docs.runpod.io/serverless/workers/vllm/openai-compatibility
# https://docs.runpod.io/serverless/vllm/openai-compatibility
base-url: https://api.runpod.ai/v2/${YOUR_ENDPOINT}/openai/v1
api-key:
api-key-env: RUNPOD_API_KEY
2 changes: 1 addition & 1 deletion docs/overview.md
@@ -30,7 +30,7 @@ Use Serverless to:
### Get started with Serverless

- [Build your first Serverless app](/serverless/get-started)
- [Run any LLM as an endpoint using vLLM workers](/serverless/workers/vllm/get-started)
- [Run any LLM as an endpoint using vLLM workers](/serverless/vllm/get-started)
- [Tutorial: Create a Serverless endpoint with Stable Diffusion](/tutorials/serverless/gpu/run-your-first)

## Pods
8 changes: 4 additions & 4 deletions docs/references/troubleshooting/storage-full.md
@@ -1,12 +1,12 @@
---
title: "Storage Full"
title: "Storage full"
id: "storage-full"
description: "This document provides guidance for troubleshooting full storage, which may occur when users generate many files, transfer files, or perform other storage-intensive tasks."
---

Storage can fill up when users generate many files, transfer files, or perform other storage-intensive tasks. This document provides guidance to help you troubleshoot a full disk.

## Check Disk Usage
## Check disk usage

When you encounter a full storage condition, the first step is to check your container’s disk usage. You can use the `df -h` command to display a summary of disk usage.

@@ -34,7 +34,7 @@ tmpfs 252G 0 252G 0% /sys/firmware
tmpfs 252G 0 252G 0% /sys/devices/virtual/powercap
```

## Key Areas to Check
## Key areas to check

**Container Disk Usage**: The primary storage area for your container is mounted on the `overlay` filesystem. This indicates the container’s root directory.

@@ -62,7 +62,7 @@ root@9b8e325167b2:/# find /workspace -type f -exec du -h {} + | sort -rh | head
512 /workspace/a.txt
```

## Removing Files and Directories
## Removing files and directories

Once you’ve identified large files or directories that are no longer needed, you can remove them to free up space.
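A sketch of that cleanup flow, using a scratch directory in place of `/workspace` so it is safe to run anywhere:

```shell
# Make a scratch directory with a throwaway 10 MB file.
mkdir -p /tmp/workspace-demo
dd if=/dev/zero of=/tmp/workspace-demo/big.bin bs=1M count=10 2>/dev/null

# List the largest files, as shown earlier for /workspace.
find /tmp/workspace-demo -type f -exec du -h {} + | sort -rh | head -n 10

# Remove a single file, then the whole directory tree, to free the space.
rm /tmp/workspace-demo/big.bin
rm -rf /tmp/workspace-demo
```

Double-check paths before running `rm -rf`; files removed this way are not recoverable.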

8 changes: 4 additions & 4 deletions docs/references/troubleshooting/troubleshooting-502-errors.md
@@ -1,12 +1,12 @@
---
title: "502 Errors"
title: "502 errors"
id: "troubleshooting-502-errors"
description: "Troubleshoot 502 errors in your deployed pod by checking GPU attachment, pod logs, and official template instructions to resolve issues and enable seamless access."
---

502 errors can occur when users attempt to access a program on a specific port of a deployed pod and the program isn't running or has encountered an error. This document provides guidance to help you troubleshoot this error.

### Check Your Pod's GPU
### Check your Pod's GPU

The first step to troubleshooting a 502 error is to check whether your pod has a GPU attached.

@@ -18,7 +18,7 @@ If a GPU is attached, you will see it under the Pods screen (e.g. 1 x A6000). If

![](/img/docs/fb4c0dd-image.png)

### Check Your Pod's Logs
### Check your Pod's logs

After confirming that your pod has a GPU attached, the next step is to check your pod's logs for any errors.

@@ -27,7 +27,7 @@ After confirming that your pod has a GPU attached, the next step is to check you
2. ![](/img/docs/3500eba-image.png)\
**Look for errors**: Browse through the logs to find any error messages that may provide clues about why you're experiencing a 502 error.

### Verify Additional Steps for Official Templates
### Verify additional steps for official templates

In some cases, for our official templates, the user interface does not work right away and may require additional steps to be performed by the user.

2 changes: 1 addition & 1 deletion docs/sdks/javascript/endpoints.md
@@ -626,7 +626,7 @@ console.log(result);
</TabItem>
</Tabs>

For more information, see [Execution policy](/serverless/endpoints/job-operations).
For more information, see [Execution policy](/serverless/endpoints/operations).

## Purge queue

@@ -1,6 +1,6 @@
{
"label": "Development",
"position": 6,
"position": 10,
"link": {
"type": "generated-index",
"description": "Learn to develop your application."
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,11 +1,11 @@
---
title: "Local Server Flags"
title: "Local server flags"
description: "A comprehensive guide to all flags available when starting your RunPod local server for endpoint testing"
sidebar_position: 1
---

When developing RunPod serverless functions, it's crucial to test them thoroughly before deployment.
The RunPod SDK provides a powerful local testing environment that allows you to simulate your serverless endpoints right on your development machine.
When developing RunPod Serverless functions, it's crucial to test them thoroughly before deployment.
The RunPod SDK provides a powerful local testing environment that allows you to simulate your Serverless endpoints right on your development machine.
This local server eliminates the need for constant Docker container rebuilds, uploads, and endpoint updates during the development and testing phase.

To facilitate this local testing environment, the RunPod SDK offers a variety of flags that allow you to customize your setup.
@@ -20,7 +20,7 @@ By using these flags, you can create a local environment that closely mimics the

This guide provides a comprehensive overview of all available flags, their purposes, and how to use them effectively in your local testing workflow.

## Basic Usage
## Basic usage

To start your local server with additional flags, use the following format:

@@ -30,7 +30,7 @@ python your_function.py [flags]

Replace `your_function.py` with the name of your Python file containing the RunPod handler.

## Available Flags
## Available flags

### --rp_serve_api

@@ -138,6 +138,6 @@ python main.py --rp_serve_api \

This command starts the local server on port `8080` with 4 concurrent workers, sets the log level to `DEBUG`, and provides test input data.

These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your serverless functions are robust and ready for deployment to the RunPod cloud.
These flags provide powerful tools for customizing your local testing environment. By using them effectively, you can simulate various scenarios, debug issues, and ensure your Serverless functions are robust and ready for deployment to the RunPod cloud.

For more detailed information on each flag and advanced usage scenarios, refer to the individual tutorials in this documentation.
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/serverless/endpoints/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Endpoints",
"position": 5,
"position": 6,
"link": {
"type": "generated-index",
"description": "Learn how to customize the serverless functions used in your applications."