diff --git a/docs/finetuning/http-api-reference.mdx b/docs/finetuning/http-api-reference.mdx
index f5d2747..f105daf 100644
--- a/docs/finetuning/http-api-reference.mdx
+++ b/docs/finetuning/http-api-reference.mdx
@@ -35,7 +35,7 @@ Create a new finetune.
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
 | name | string | yes | Unique name for this finetune (alphanumeric, hyphens, underscores) |
-| rank | integer | yes | LoRA rank: 8, 16, 24, or 32 (must be multiple of 8; higher = more capacity, but slower) |
+| rank | integer | yes | LoRA rank: 8, 16, 24, or 32 (must be multiple of 8; higher = more capacity, but slower training) |
 
 ### Response
 
@@ -94,11 +94,13 @@ Get details for a specific finetune.
 
 ```json
 {
-  "finetune_id": "01HXYZ...",
-  "name": "my-finetune",
-  "rank": 32,
-  "created_at_ms": 1736937000000,
-  "updated_at_ms": 1736937000000
+  "finetune": {
+    "finetune_id": "01HXYZ...",
+    "name": "my-finetune",
+    "rank": 32,
+    "created_at_ms": 1736937000000,
+    "updated_at_ms": 1736937000000
+  }
 }
 ```
 
@@ -179,7 +181,7 @@ Generate rollouts for a training or evaluation request.
   "settings": {
     "temperature": 1.0,
     "top_p": 1.0,
-    "max_objects": 50
+    "max_tokens": 128
   }
 }
 ```
@@ -201,7 +203,8 @@ Generate rollouts for a training or evaluation request.
   "settings": {
     "temperature": 1.0,
     "top_p": 1.0,
-    "max_objects": 50
+    "max_tokens": 256,
+    "max_objects": 8
   }
 }
 ```
@@ -219,10 +222,8 @@ Generate rollouts for a training or evaluation request.
 |-------|------|---------|-------------|
 | temperature | number | 1.0 | Controls randomness (0 = deterministic) |
 | top_p | number | 1.0 | Nucleus sampling threshold |
-| max_tokens | integer | 128 | Maximum output length (query only) |
-| max_objects | integer | 50 | Maximum returned points or objects (point and detect only) |
-
-For `point` and `detect`, the output token budget is derived from `max_objects`. Requests to this finetuning API do not accept `max_tokens` for those skills; use `max_objects` instead.
+| max_tokens | integer | 128 for query/point, 256 for detect | Maximum output length |
+| max_objects | integer | — | Maximum objects to detect (detect only) |
 
 ### Ground truth
 
@@ -256,8 +257,6 @@ All coordinates are normalized to 0–1.
 
 `ground_truth` is only supported for `point` and `detect` on `/rollouts`.
 
-For query SFT, use `POST /train_step` with `mode: "sft"` and `targets`.
-
 ### Response
 
 ```json
@@ -266,17 +265,11 @@ For query SFT, use `POST /train_step` with `mode: "sft"` and `targets`.
   "rollouts": [
     {
       "skill": "detect",
-      "finish_reason": "stop",
       "output": {
         "objects": [
           { "x_min": 0.12, "y_min": 0.22, "x_max": 0.39, "y_max": 0.58 }
         ]
-      },
-      "answer_tokens": [],
-      "thinking_tokens": [],
-      "has_answer_separator": false,
-      "coords": [],
-      "sizes": []
+      }
     }
   ],
   "rewards": [0.8]
@@ -289,20 +282,7 @@ For query SFT, use `POST /train_step` with `mode: "sft"` and `targets`.
 | rollouts | array | Generated rollouts |
 | rewards | array \| null | Computed rewards if ground truth was provided |
 
-Each rollout also includes the following training fields:
-
-| Field | Type | Description |
-|-------|------|-------------|
-| skill | string | Skill for this rollout (`query`, `point`, or `detect`) |
-| finish_reason | string | Generation stop reason |
-| output | object | Skill output |
-| answer_tokens | array | Answer token IDs used for training |
-| thinking_tokens | array | Reasoning token IDs used for training |
-| has_answer_separator | boolean | Whether an answer separator token was used |
-| coords | array | Coordinate token data used for training |
-| sizes | array | Size token data used for training |
-
-Pass these fields back unchanged when sending RL rollouts to `POST /train_step`.
+Rollout objects may include training metadata such as `answer_tokens`, `thinking_tokens`, `has_answer_separator`, `coords`, and `sizes`. If you use them for RL training, pass them back unchanged to `POST /train_step`.
 
 ### Rollout output by skill
 
@@ -372,9 +352,31 @@ Apply one training step using RL or SFT.
 ```json
 {
   "mode": "rl",
-  "request": { ... },
-  "rollouts": [ ... ],
-  "rewards": [0.8, 0.3, 0.6, 0.5]
+  "request": {
+    "finetune_id": "01HXYZ...",
+    "num_rollouts": 2,
+    "request": {
+      "skill": "query",
+      "question": "What color is the car?",
+      "image_url": "data:image/jpeg;base64,...",
+      "reasoning": false
+    }
+  },
+  "rollouts": [
+    {
+      "skill": "query",
+      "output": {
+        "answer": "red"
+      }
+    },
+    {
+      "skill": "query",
+      "output": {
+        "answer": "dark red"
+      }
+    }
+  ],
+  "rewards": [0.8, 0.6]
 }
 ```
 
@@ -387,12 +389,7 @@ Apply one training step using RL or SFT.
     "skill": "query",
     "question": "What country is this?",
     "image_url": "data:image/jpeg;base64,...",
-    "reasoning": false,
-    "settings": {
-      "temperature": 0.0,
-      "top_p": 1.0,
-      "max_tokens": 16
-    }
+    "reasoning": false
   },
   "targets": [
     {
@@ -407,162 +404,102 @@ Each group:
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
 | mode | string | yes | Training mode, either `"rl"` or `"sft"` |
-| request | object | yes | Rollouts request for RL, or skill request for SFT |
+| request | object | yes | For RL, the full request body previously sent to `POST /rollouts`; for SFT, the skill request |
 | rollouts | array | no | Required when `mode="rl"`; rollout objects from `/rollouts` (pass unchanged) |
 | rewards | array | no | Required when `mode="rl"`; reward for each rollout (same length and order) |
 | targets | array | no | Required when `mode="sft"`; target objects for supervised training |
 
 You can mix skills across groups in the same train step.
 
-For query SFT:
-- `targets` entries use `answer`
-- `reasoning` is required when `request.reasoning=true`
-- omit `reasoning` when `request.reasoning=false`
-
-For point SFT:
-- each target must include either `points` or `boxes`
+### Targets
 
-For detect SFT:
-- each target must include `boxes`
+| Skill | Type | Required | Description |
+|-------|------|----------|-------------|
+| `query` | object | `answer` | `reasoning` is required when `request.reasoning=true`; omit it when `request.reasoning=false` |
+| `point` | object | `points` or `boxes` | Include exactly one of `points` or `boxes` |
+| `detect` | object | `boxes` | One or more target boxes |
 
-### Response
+**Query**
 
 ```json
-{
-  "step": 101,
-  "applied": true,
-  "kl": 0.12,
-  "router_kl": 0.03,
-  "grad_norm": 2.5,
-  "reward_mean": 0.58,
-  "reward_std": 0.19
-}
+[
+  {
+    "answer": "United States"
+  }
+]
 ```
 
-| Field | Type | Description |
-|-------|------|-------------|
-| step | integer | Training step after this request |
-| applied | boolean | Whether the step was applied |
-| kl | number \| null | RL KL term |
-| router_kl | number \| null | Router KL regularizer |
-| grad_norm | number \| null | Gradient norm for the step |
-| sft_loss | number \| null | SFT loss when SFT groups are included |
-| reward_mean | number \| null | Mean reward across RL groups |
-| reward_std | number \| null | Reward standard deviation across RL rollouts |
-
----
-
-## POST /finetunes/:finetuneId/metrics
-
-Log user-defined metrics for a specific training step, such as eval scores.
-
-### Request
+If `request.reasoning=true`, include `reasoning`:
 
 ```json
-{
-  "step": 100,
-  "metrics": {
-    "eval/country_match": 0.63,
-    "eval/token_f1": 0.64
+[
+  {
+    "answer": "United States",
+    "reasoning": "The road markings and signs match the US."
   }
-}
+]
 ```
 
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| step | integer | yes | Training step the metrics belong to |
-| metrics | object | yes | Metric name to numeric value map |
-
-### Constraints
-
-- Include between 1 and 100 metrics per request.
-- Metric names must match `[A-Za-z0-9_/-]+`.
-- Do not include `sys/` or `usr/` prefixes in metric names.
-- Metric values must be finite numbers.
-
-### Response
+**Point**
 
 ```json
-{
-  "ok": true,
-  "step": 100,
-  "logged_count": 2
-}
+[
+  {
+    "points": [
+      { "x": 0.52, "y": 0.31 }
+    ]
+  }
+]
 ```
 
-| Field | Type | Description |
-|-------|------|-------------|
-| ok | boolean | Whether the metrics were accepted |
-| step | integer | Training step the metrics were logged for |
-| logged_count | integer | Number of metric entries logged |
-
----
+Point targets can also use `boxes`:
 
-## GET /finetunes/:finetuneId/train_logs
-
-List recorded train log files for a finetune.
+```json
+[
+  {
+    "boxes": [
+      { "x_min": 0.45, "y_min": 0.22, "x_max": 0.58, "y_max": 0.39 }
+    ]
+  }
+]
+```
 
-### Query parameters
+**Detect**
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| limit | integer | 20 | Maximum results per page (1–100) |
-| cursor | string | — | Pagination cursor from the previous response |
+```json
+[
+  {
+    "boxes": [
+      { "x_min": 0.12, "y_min": 0.22, "x_max": 0.39, "y_max": 0.58 },
+      { "x_min": 0.51, "y_min": 0.18, "x_max": 0.74, "y_max": 0.49 }
+    ]
+  }
+]
+```
 
 ### Response
 
 ```json
 {
-  "steps": [
-    {
-      "id": "000000123-rl=4-sft=8.json",
-      "step": 123,
-      "metadata": {
-        "rl": "4",
-        "sft": "8"
-      }
-    }
-  ],
-  "next_cursor": "000000123-rl=4-sft=8.json",
-  "has_more": true
+  "step": 101,
+  "applied": true,
+  "kl": 0.12,
+  "reward_mean": 0.58,
+  "reward_std": 0.19,
+  "sft_loss": 1.23
 }
 ```
 
 | Field | Type | Description |
 |-------|------|-------------|
-| steps | array | Train log summaries |
-| next_cursor | string \| null | Cursor for the next page |
-| has_more | boolean | Whether more results are available |
-
-Each `steps` entry includes:
-
-| Field | Type | Description |
-|-------|------|-------------|
-| id | string | Train log file ID |
-| step | integer | Training step number |
-| metadata | object | Metadata parsed from the train log file name |
-
-Results are returned in reverse chronological order.
-
----
-
-## GET /finetunes/:finetuneId/train_logs/:step
-
-Get the raw JSON train log for a specific train log file ID.
-
-Use the `id` returned by `GET /finetunes/:finetuneId/train_logs`, for example:
-
-```
-000000123-rl=4-sft=8.json
-```
-
-### Response
-
-Returns the stored JSON train log body.
+| step | integer | Training step after this request |
+| applied | boolean | Whether the step was applied |
+| kl | number \| null | RL KL term |
+| reward_mean | number \| null | Mean reward across RL groups |
+| reward_std | number \| null | Reward standard deviation across RL rollouts |
+| sft_loss | number \| null | SFT loss when SFT groups are included |
 
-**Constraints:**
-- Returns 400 if `:step` is not a valid train log file ID.
-- Returns 404 if the finetune or train log does not exist.
+Additional training metrics may also be returned.
 
 ---
 
@@ -651,30 +588,6 @@ Deleting the latest checkpoint will prevent resuming training for that finetune.
 
 ---
 
-## GET /finetunes/:finetuneId/checkpoints/:step/download
-
-Get a presigned URL for downloading a saved checkpoint.
-
-### Response
-
-```json
-{
-  "url": "https://...",
-  "expires_in": 3600
-}
-```
-
-| Field | Type | Description |
-|-------|------|-------------|
-| url | string | Presigned download URL |
-| expires_in | integer | URL lifetime in seconds |
-
-**Constraints:**
-- Returns 404 if the finetune or checkpoint does not exist.
-- Returns 410 if the checkpoint has expired or been deleted.
-
----
-
 ## Using finetuned models for inference
 
 Once you have saved a checkpoint, you can use it for inference by specifying the `model` parameter in your API requests.
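
Note on the updated RL `/train_step` contract in this diff: `request` now carries the full body previously sent to `POST /rollouts`, the rollout objects are passed back unchanged, and `rewards` must match `rollouts` in length and order. The following is a minimal sketch of how a client might assemble that body. The helper name `build_rl_train_step` is hypothetical and not part of the API, falling back to the `rewards` returned by `/rollouts` is an assumption, and no HTTP call is made.

```python
from typing import Any, Dict, List, Optional


def build_rl_train_step(
    rollouts_request: Dict[str, Any],
    rollouts_response: Dict[str, Any],
    rewards: Optional[List[float]] = None,
) -> Dict[str, Any]:
    """Assemble a mode="rl" body for POST /train_step (hypothetical helper).

    rollouts_request: the full body previously sent to POST /rollouts.
    rollouts_response: the parsed JSON response from POST /rollouts.
    rewards: one reward per rollout; if omitted, this sketch assumes the
        rewards computed by /rollouts (present when ground truth was sent).
    """
    rollouts = rollouts_response["rollouts"]
    if rewards is None:
        rewards = rollouts_response.get("rewards")
    if rewards is None:
        raise ValueError('rewards are required when mode="rl"')
    if len(rewards) != len(rollouts):
        raise ValueError(
            "rewards must have the same length and order as rollouts: "
            f"got {len(rewards)} rewards for {len(rollouts)} rollouts"
        )
    return {
        "mode": "rl",
        "request": rollouts_request,  # full /rollouts request body
        "rollouts": rollouts,         # passed back unchanged
        "rewards": list(rewards),
    }


# Example values copied from the RL request example in this diff.
example_request = {
    "finetune_id": "01HXYZ...",
    "num_rollouts": 2,
    "request": {
        "skill": "query",
        "question": "What color is the car?",
        "image_url": "data:image/jpeg;base64,...",
        "reasoning": False,
    },
}
example_response = {
    "rollouts": [
        {"skill": "query", "output": {"answer": "red"}},
        {"skill": "query", "output": {"answer": "dark red"}},
    ],
    "rewards": None,  # no ground truth for query, so rewards are supplied by the caller
}
example_body = build_rl_train_step(example_request, example_response, rewards=[0.8, 0.6])
```

The length check mirrors the "same length and order" requirement in the field table; the rollout list is reused as-is, so any training metadata the server attached to each rollout is returned without modification.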