From 22398732ceef61f0cadea67d2af6dcf4a93e8a9d Mon Sep 17 00:00:00 2001 From: Wuying Created Local Users Date: Mon, 9 Feb 2026 16:30:02 +0800 Subject: [PATCH 1/4] docs: enhance fine-tuning documentation with dataset formats and LoRA usage guide - Add detailed dataset format examples (instruction/input/output, chat messages, simple text) - Add dataset preparation guidelines - Add direct TEE upload option (--dataset-path) as alternative to 0G Storage - Add comprehensive "Using the Fine-tuned Model" section: - How to download base model (HuggingFace or 0G Storage) - How to load LoRA adapter with base model - How to run inference with fine-tuned model - How to merge and save full model locally - Add Python package requirements table Co-authored-by: Cursor --- .../compute-network/fine-tuning.md | 227 +++++++++++++++++- 1 file changed, 224 insertions(+), 3 deletions(-) diff --git a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md index aebb13d..2b28d56 100644 --- a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md +++ b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md @@ -112,17 +112,66 @@ Please download the parameter file template for the model you wish to fine-tune ### Prepare Your Data -Please download the dataset format specification and verification script from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) to make sure your generated dataset complies with the requirements. +Your dataset should be in JSONL format. Each line is a JSON object representing one training example. + +#### Supported Dataset Formats + +**Format 1: Instruction-Input-Output** +```json +{"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"} +{"instruction": "Translate to French", "input": "Good morning", "output": "Bonjour"} +{"instruction": "Summarize the text", "input": "Long article...", "output": "Brief summary"} +``` + +**Format 2: Chat Messages** +```json +{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "2+2 equals 4."}]} +{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there! How can I help you?"}]} +``` + +**Format 3: Simple Text (for text completion)** +```json +{"text": "The quick brown fox jumps over the lazy dog."} +{"text": "Machine learning is a subset of artificial intelligence."} +``` + +#### Dataset Guidelines + +- **Minimum examples**: At least 10 examples recommended for meaningful fine-tuning +- **Quality**: Ensure examples are accurate and representative of your use case +- **Consistency**: Use the same format throughout the dataset +- **Encoding**: UTF-8 encoding required + +You can also download the dataset format specification and verification script from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) to validate your dataset. ### Upload Dataset +You have two options for providing your dataset: + +#### Option A: Upload to 0G Storage (Recommended for large datasets) + ```bash # Upload to 0G Storage 0g-compute-cli fine-tuning upload --data-path # Output: Root hash: 0xabc123... (save this!) ``` -> Record the root hash of the dataset; they will be needed in later steps. +> Record the root hash of the dataset; it will be needed when creating the task. + +#### Option B: Direct Upload to TEE (Simpler for small datasets) + +You can skip the 0G Storage upload and provide the dataset directly when creating a task: + +```bash +0g-compute-cli fine-tuning create-task \ + --provider \ + --model \ + --dataset-path \ + --config-path \ + --data-size +``` + +This method uploads the dataset directly to the TEE (Trusted Execution Environment) during task creation. Use `--dataset-path` instead of `--dataset` when using this option. ### Calculate Dataset Size @@ -139,6 +188,8 @@ After uploading the dataset to storage, you can calculate its size by running th After calculating the dataset size, you can create a task by running the following command: +#### Using 0G Storage (with root hash) + ```bash 0g-compute-cli fine-tuning create-task \ --provider \ @@ -148,13 +199,25 @@ After calculating the dataset size, you can create a task by running the followi --data-size ``` +#### Using Direct Upload (with local file) + +```bash +0g-compute-cli fine-tuning create-task \ + --provider \ + --model \ + --dataset-path \ + --config-path \ + --data-size +``` + **Parameters:** | Parameter | Description | |-----------|-------------| | `--provider` | Address of the service provider | | `--model` | Name of the pretrained model | -| `--dataset` | Root hash of the dataset on 0G Storage | +| `--dataset` | Root hash of the dataset on 0G Storage (use this OR `--dataset-path`) | +| `--dataset-path` | Path to local dataset file for direct TEE upload (use this OR `--dataset`) | | `--config-path` | Path to the parameter file | | `--data-size` | Size of the dataset | | `--gas-price` | Gas price (optional) | @@ -260,6 +323,164 @@ The above command performs the following operations: **Note:** The decrypted result will be saved as a zip file. Ensure that the `` ends with .zip (e.g., model_output.zip). After downloading, unzip the file to access the decrypted model. +### Extract LoRA Adapter + +After decryption, unzip the model to access the LoRA adapter files: + +```bash +unzip model_output.zip -d ./lora_adapter/ +``` + +The extracted folder will contain: + +``` +lora_adapter/ +├── output_model/ +│ ├── adapter_config.json # LoRA configuration +│ ├── adapter_model.safetensors # LoRA weights +│ ├── tokenizer.json # Tokenizer +│ ├── tokenizer_config.json +│ └── README.md +``` + +## Using the Fine-tuned Model + +After fine-tuning, you receive a **LoRA adapter** (Low-Rank Adaptation), not a full model. To use it, you need to: + +1. Download the base model +2. Load the LoRA adapter on top of the base model +3. Run inference + +### Step 1: Download Base Model + +Download the same base model that was used for fine-tuning: + +**Option A: From HuggingFace (Recommended)** +```bash +# Install huggingface-cli if not already installed +pip install huggingface_hub + +# Download the model +huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./base_model +``` + +**Option B: From 0G Storage** +```bash +# Use the model root hash from the task details +0g-compute-cli fine-tuning download \ + --data-path ./base_model \ + --data-root +``` + +### Step 2: Load LoRA with Base Model + +Use the following Python code to combine the LoRA adapter with the base model: + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer +from peft import PeftModel +import torch + +# Paths +base_model_path = "./base_model" # or "Qwen/Qwen2.5-0.5B-Instruct" +lora_adapter_path = "./lora_adapter/output_model" + +# Load tokenizer +tokenizer = AutoTokenizer.from_pretrained(lora_adapter_path) + +# Load base model +base_model = AutoModelForCausalLM.from_pretrained( + base_model_path, + torch_dtype=torch.bfloat16, + device_map="auto" +) + +# Load LoRA adapter +model = PeftModel.from_pretrained(base_model, lora_adapter_path) + +# For inference (optional: merge for faster inference) +# model = model.merge_and_unload() + +print("Model loaded successfully!") +``` + +### Step 3: Run Inference + +```python +def generate_response(prompt, max_new_tokens=100): + messages = [{"role": "user", "content": prompt}] + + # Apply chat template + text = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True + ) + + # Tokenize + inputs = tokenizer(text, return_tensors="pt").to(model.device) + + # Generate + outputs = model.generate( + **inputs, + max_new_tokens=max_new_tokens, + do_sample=True, + temperature=0.7, + top_p=0.9 + ) + + # Decode + response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True) + return response + +# Example usage +response = generate_response("Hello, how are you?") +print(response) +``` + +### Optional: Merge and Save Full Model + +If you want to create a standalone model without needing to load the adapter separately: + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer +from peft import PeftModel +import torch + +# Load base model and LoRA +base_model = AutoModelForCausalLM.from_pretrained( + "Qwen/Qwen2.5-0.5B-Instruct", + torch_dtype=torch.bfloat16, + device_map="auto" +) +model = PeftModel.from_pretrained(base_model, "./lora_adapter/output_model") + +# Merge LoRA weights into base model +merged_model = model.merge_and_unload() + +# Save the merged model +merged_model.save_pretrained("./merged_model") +tokenizer = AutoTokenizer.from_pretrained("./lora_adapter/output_model") +tokenizer.save_pretrained("./merged_model") + +print("Merged model saved to ./merged_model") +``` + +### Requirements + +Install the required Python packages: + +```bash +pip install torch transformers peft accelerate +``` + +| Package | Minimum Version | Purpose | +|---------|-----------------|---------| +| `torch` | >= 2.0 | Deep learning framework | +| `transformers` | >= 4.40.0 | Model loading and inference | +| `peft` | >= 0.10.0 | LoRA adapter support | +| `accelerate` | >= 0.27.0 | Device management | + ### Account Management For comprehensive account management, including viewing balances, managing sub-accounts, and handling refunds, see [Account Management](./account-management). From 3b0a20b0a61264ea289816f1fff78dbfc52436fe Mon Sep 17 00:00:00 2001 From: Wuying Created Local Users Date: Tue, 10 Feb 2026 17:56:27 +0800 Subject: [PATCH 2/4] docs: remove 0G Storage references from fine-tuning guide 0G Storage is not part of the current fine-tuning flow. Removed: - Upload to 0G Storage option (Option A) - Calculate Dataset Size section - 0G Storage-based task creation (--dataset, --data-size) - Download base model from 0G Storage (Option B) - Download Data command from Other Commands - Updated descriptions to reflect TEE-based direct upload flow Co-authored-by: Cursor --- .../compute-network/fine-tuning.md | 99 +++---------------- 1 file changed, 13 insertions(+), 86 deletions(-) diff --git a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md index 2b28d56..4861b2e 100644 --- a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md +++ b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md @@ -144,70 +144,16 @@ Your dataset should be in JSONL format. Each line is a JSON object representing You can also download the dataset format specification and verification script from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) to validate your dataset. -### Upload Dataset - -You have two options for providing your dataset: - -#### Option A: Upload to 0G Storage (Recommended for large datasets) - -```bash -# Upload to 0G Storage -0g-compute-cli fine-tuning upload --data-path - -# Output: Root hash: 0xabc123... (save this!) -``` -> Record the root hash of the dataset; it will be needed when creating the task. - -#### Option B: Direct Upload to TEE (Simpler for small datasets) - -You can skip the 0G Storage upload and provide the dataset directly when creating a task: - -```bash -0g-compute-cli fine-tuning create-task \ - --provider \ - --model \ - --dataset-path \ - --config-path \ - --data-size -``` - -This method uploads the dataset directly to the TEE (Trusted Execution Environment) during task creation. Use `--dataset-path` instead of `--dataset` when using this option. - -### Calculate Dataset Size - -After uploading the dataset to storage, you can calculate its size by running the following command: - -```bash -0g-compute-cli fine-tuning calculate-token \ - --model \ - --dataset-path \ - --provider -``` - ### Create Task -After calculating the dataset size, you can create a task by running the following command: - -#### Using 0G Storage (with root hash) - -```bash -0g-compute-cli fine-tuning create-task \ - --provider \ - --model \ - --dataset \ - --config-path \ - --data-size -``` - -#### Using Direct Upload (with local file) +Create a fine-tuning task by providing your dataset and configuration file. The dataset will be uploaded directly to the TEE (Trusted Execution Environment) and the fee will be automatically calculated by the broker based on the actual token count. ```bash 0g-compute-cli fine-tuning create-task \ --provider \ --model \ - --dataset-path \ - --config-path \ - --data-size + --dataset-path \ + --config-path ``` **Parameters:** @@ -216,10 +162,8 @@ After calculating the dataset size, you can create a task by running the followi |-----------|-------------| | `--provider` | Address of the service provider | | `--model` | Name of the pretrained model | -| `--dataset` | Root hash of the dataset on 0G Storage (use this OR `--dataset-path`) | -| `--dataset-path` | Path to local dataset file for direct TEE upload (use this OR `--dataset`) | +| `--dataset-path` | Path to local dataset file (JSONL format) | | `--config-path` | Path to the parameter file | -| `--data-size` | Size of the dataset | | `--gas-price` | Gas price (optional) | The output will be like: @@ -264,8 +208,8 @@ The output will be like: **Field Descriptions:** - **ID**: Unique identifier for your fine-tuning task -- **Pre-trained Model Hash**: Storage reference for the base model being fine-tuned -- **Dataset Hash**: Storage reference for your training dataset +- **Pre-trained Model Hash**: Hash identifier for the base model being fine-tuned +- **Dataset Hash**: Hash identifier for your training dataset - **Training Params**: Configuration parameters used during fine-tuning - **Fee (neuron)**: Total cost for the fine-tuning task - **Progress**: Task status. Possible values are Init, SettingUp, SetUp, Training, Trained, Delivering, Delivered, UserAcknowledged, Finished, Failed. These represent the following states, respectively: @@ -273,10 +217,10 @@ The output will be like: - `SettingUp`: Provider is preparing the environment to run the task - `SetUp`: Provider is ready to start training the model - `Training`: Provider is training the model - - `Trained`: provider has finished the training - - `Delivering`: Provider is uploading the fine-tuning result to storage - - `Delivered`: provider has uploaded the fine-tuning result - - `UserAcknowledged`: User has confirmed the result is downloadable + - `Trained`: Provider has finished the training + - `Delivering`: Provider is preparing the fine-tuning result + - `Delivered`: Fine-tuning result is ready for download + - `UserAcknowledged`: User has confirmed the result is downloaded - `Finished`: Task is completed - `Failed`: Task failed @@ -299,7 +243,7 @@ Training model for task beb6f0d8-4660-4c62-988d-00246ce913d2 completed successfu ### Confirm Task Result -Use the [Check Task](#monitor-progress) command to view task status. When the status changes to `Delivered`, it indicates that the provider has completed the fine-tuning task and uploaded the result to storage. The corresponding root hash has also been saved to the contract. You can download the model with the following command; CLI will download the model based on the root hash submitted by the provider. If the download is successful, CLI updates the contract information to confirm the model is downloaded. +Use the [Check Task](#monitor-progress) command to view task status. When the status changes to `Delivered`, it indicates that the provider has completed the fine-tuning task and the result is ready. You can download and acknowledge the model with the following command: ```bash 0g-compute-cli fine-tuning acknowledge-model --provider --task-id --data-path @@ -309,7 +253,7 @@ Use the [Check Task](#monitor-progress) command to view task status. When the st ### Decrypt Model -The provider will check the contract to verify if the user has confirmed the download, enabling the provider to settle fees successfully on the contract subsequently. Once the provider confirms the download, it uploads the key required for decryption to the contract, encrypted with the user's public key, and collects the fee. You can again use the `get-task` command to view the task status. When the status changes to `Finished`, it means the provider has uploaded the key. At this point, you can decrypt the model with the following command: +After acknowledging the model, the provider settles the fees and uploads the decryption key to the contract (encrypted with your public key). Use the `get-task` command to check the task status. When the status changes to `Finished`, you can decrypt the model: ```bash 0g-compute-cli fine-tuning decrypt-model --provider --task-id --encrypted-model --output @@ -353,9 +297,8 @@ After fine-tuning, you receive a **LoRA adapter** (Low-Rank Adaptation), not a f ### Step 1: Download Base Model -Download the same base model that was used for fine-tuning: +Download the same base model that was used for fine-tuning from HuggingFace: -**Option A: From HuggingFace (Recommended)** ```bash # Install huggingface-cli if not already installed pip install huggingface_hub @@ -364,14 +307,6 @@ pip install huggingface_hub huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./base_model ``` -**Option B: From 0G Storage** -```bash -# Use the model root hash from the task details -0g-compute-cli fine-tuning download \ - --data-path ./base_model \ - --data-root -``` - ### Step 2: Load LoRA with Base Model Use the following Python code to combine the LoRA adapter with the base model: @@ -507,14 +442,6 @@ You can view the list of tasks submitted to a specific provider using the follow 0g-compute-cli fine-tuning list-tasks --provider ``` -#### Download Data - -You can download previously uploaded datasets using the command below: - -```bash -0g-compute-cli fine-tuning download --data-path --data-root -``` - #### Cancel a Task You can cancel a task before it starts running using the following command: From 475ea19e6d907a8a9347430d65276510f3721ee9 Mon Sep 17 00:00:00 2001 From: Wuying Created Local Users Date: Wed, 11 Feb 2026 22:18:08 +0800 Subject: [PATCH 3/4] docs: update fine-tuning guide to reflect actual tested workflow Key changes based on end-to-end testing with Qwen2.5-0.5B and Qwen3-32B: - Add --service fine-tuning to transfer-fund (prevents MinimumDepositRequired) - Restore Upload Dataset section with 0G Storage upload flow - Document both --dataset (hash) and --dataset-path (file) for create-task - Update fee description: auto-calculated by token count, not byte count - Add Qwen2.5-0.5B-Instruct and Qwen3-32B to model list - Add model name format warning (no Qwen/ prefix) - Add training config example with JSON tips - Fix status descriptions (Delivering = encrypting + uploading to 0G Storage) - Add settlement wait note before decrypt (~1 min after acknowledge) - Add --data-path directory tip for acknowledge-model - Remove base model 0G Storage download option (addresses reviewer comment) - Add troubleshooting for common errors (MinimumDepositRequired, settlement timing, JSON parsing) - Keep LoRA usage guide from original PR Co-authored-by: Cursor --- .../compute-network/fine-tuning.md | 190 ++++++++++++++---- 1 file changed, 154 insertions(+), 36 deletions(-) diff --git a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md index 4861b2e..e417997 100644 --- a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md +++ b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md @@ -46,9 +46,14 @@ The Fine-tuning CLI requires an account to pay for service fees via the 0G Compu 0g-compute-cli deposit --amount 3 # Transfer funds to a provider for fine-tuning -0g-compute-cli transfer-fund --provider --amount 1 +# IMPORTANT: You must specify --service fine-tuning, otherwise funds go to the inference sub-account +0g-compute-cli transfer-fund --provider --amount 2 --service fine-tuning ``` +:::tip +If you see `MinimumDepositRequired` when creating a task, it means you haven't transferred funds to the provider's **fine-tuning** sub-account. Make sure to include `--service fine-tuning` in the `transfer-fund` command. +::: + ### List Providers ```bash 0g-compute-cli fine-tuning list-providers @@ -59,18 +64,11 @@ The output will be like: │ Provider 1 │ 0xf07240Efa67755B5311bc75784a061eDB47165Dd │ ├──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤ │ Available │ ✓ │ -├──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤ -│ Price Per Byte in Dataset (0G) │ 0.000000000000000001 │ -├──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤ -│ Provider 2 │ ...... │ -├──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤ -│ ...... │ ...... │ └──────────────────────────────────────────────────┴──────────────────────────────────────────────────┘ ``` -- **Provider x:** The address of the provider. The address of the official provider is ```0xf07240Efa67755B5311bc75784a061eDB47165Dd```. -- **Available:** Indicates if the provider is available. If ```✓```, the provider is available. If ```✗```, the provider is occupied. -- **Price Per Byte in Dataset (0G):** The service fee charged by the provider. The fee is currently based on the byte count of the dataset. Future versions may charge more accurately based on the token count of the dataset. +- **Provider x:** The address of the provider. +- **Available:** Indicates if the provider is available. If `✓`, the provider is available. If `✗`, the provider is occupied. ### List Models @@ -89,21 +87,42 @@ These are standard models available across all providers: | Model Name | Type | Description | |------------|------|-------------| -| `distilbert-base-uncased` | Text Classification | DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. More details: [HuggingFace](https://huggingface.co/distilbert/distilbert-base-uncased) | +| `distilbert-base-uncased` | Text Classification | DistilBERT model, smaller and faster than BERT. More details: [HuggingFace](https://huggingface.co/distilbert/distilbert-base-uncased) | +| `Qwen2.5-0.5B-Instruct` | Causal LM | Qwen 2.5 instruction-tuned model (0.5B parameters). More details: [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | +| `Qwen3-32B` | Causal LM | Qwen 3 large language model (32B parameters). More details: [HuggingFace](https://huggingface.co/Qwen/Qwen3-32B) | The output consists of two main sections: -- **Predefined Models:** These are models that are provided by the system as predefined options. They are typically built-in, curated, and maintained to ensure quality, reliability, and broad applicability across common use cases. +- **Predefined Models:** Models provided by the system as predefined options. They are built-in, curated, and maintained to ensure quality and reliability. -- **Provider's Model:** These models are offered by external service providers. Providers may customize or fine-tune models to address specific needs, industries, or advanced use cases. The availability and quality of these models may vary depending on the provider. +- **Provider's Model:** Models offered by external service providers. Providers may customize or fine-tune models to address specific needs. -*Note:* We currently offer the models listed above as presets. You can choose one of these models for fine-tuning. More models will be provided in future versions. +:::caution Model Name Format +Use model names **without** the `Qwen/` prefix when specifying the `--model` parameter. For example: +- ✅ `--model "Qwen2.5-0.5B-Instruct"` +- ❌ `--model "Qwen/Qwen2.5-0.5B-Instruct"` +::: ### Prepare Configuration File Please download the parameter file template for the model you wish to fine-tune from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) and modify it according to your needs. +Example training configuration (`config.json`): +```json +{ + "neftune_noise_alpha": 5, + "num_train_epochs": 1, + "per_device_train_batch_size": 2, + "learning_rate": 0.0002, + "max_steps": 3 +} +``` + +:::tip +Use decimal notation for `learning_rate` (e.g., `0.0002` instead of `2e-4`). Some JSON parsers may not accept scientific notation. +::: + *Note:* For custom models provided by third-party Providers, you can download the usage template including instructions on how to construct the dataset and training configuration using the following command: ```bash @@ -144,9 +163,40 @@ Your dataset should be in JSONL format. Each line is a JSON object representing You can also download the dataset format specification and verification script from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) to validate your dataset. +### Upload Dataset + +Upload your dataset to 0G Storage. The returned root hash will be used when creating a task. + +```bash +0g-compute-cli fine-tuning upload --data-path +``` + +Output: +```bash +Root hash: 0xabc123... +``` + +> **Save the root hash** — you will need it in the next step. + ### Create Task -Create a fine-tuning task by providing your dataset and configuration file. The dataset will be uploaded directly to the TEE (Trusted Execution Environment) and the fee will be automatically calculated by the broker based on the actual token count. +Create a fine-tuning task. The fee will be **automatically calculated** by the broker based on the actual token count of your dataset. + +**Option A: Using dataset root hash (recommended)** + +If you already uploaded your dataset with the `upload` command: + +```bash +0g-compute-cli fine-tuning create-task \ + --provider \ + --model \ + --dataset \ + --config-path +``` + +**Option B: Using local dataset file** + +The CLI will automatically upload the dataset to 0G Storage and create the task in one step: ```bash 0g-compute-cli fine-tuning create-task \ @@ -161,9 +211,10 @@ Create a fine-tuning task by providing your dataset and configuration file. The | Parameter | Description | |-----------|-------------| | `--provider` | Address of the service provider | -| `--model` | Name of the pretrained model | -| `--dataset-path` | Path to local dataset file (JSONL format) | -| `--config-path` | Path to the parameter file | +| `--model` | Name of the pretrained model (without `Qwen/` prefix) | +| `--dataset` | Root hash of the dataset on 0G Storage (Option A) | +| `--dataset-path` | Path to local dataset file — mutually exclusive with `--dataset` (Option B) | +| `--config-path` | Path to the training configuration file | | `--gas-price` | Gas price (optional) | The output will be like: @@ -171,7 +222,8 @@ The output will be like: ```bash Verify provider... Provider verified -Creating task... +Creating task (fee will be calculated automatically)... +Fee will be automatically calculated by the broker based on actual token count Created Task ID: 6b607314-88b0-4fef-91e7-43227a54de57 ``` @@ -200,7 +252,7 @@ The output will be like: ├───────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────┤ │ Training Params │ {......} │ ├───────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────┤ -│ Fee (neuron) │ 179668154 │ +│ Fee (neuron) │ 82 │ ├───────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────┤ │ Progress │ Delivered │ └───────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────┘ @@ -209,19 +261,19 @@ The output will be like: **Field Descriptions:** - **ID**: Unique identifier for your fine-tuning task - **Pre-trained Model Hash**: Hash identifier for the base model being fine-tuned -- **Dataset Hash**: Hash identifier for your training dataset +- **Dataset Hash**: Hash identifier for your training dataset (0G Storage root hash) - **Training Params**: Configuration parameters used during fine-tuning -- **Fee (neuron)**: Total cost for the fine-tuning task -- **Progress**: Task status. Possible values are Init, SettingUp, SetUp, Training, Trained, Delivering, Delivered, UserAcknowledged, Finished, Failed. These represent the following states, respectively: +- **Fee (neuron)**: Total cost for the fine-tuning task (automatically calculated based on token count) +- **Progress**: Task status. Possible values are: - `Init`: Task submitted - - `SettingUp`: Provider is preparing the environment to run the task - - `SetUp`: Provider is ready to start training the model + - `SettingUp`: Provider is preparing the environment (downloading dataset, etc.) + - `SetUp`: Provider is ready to start training - `Training`: Provider is training the model - - `Trained`: Provider has finished the training - - `Delivering`: Provider is preparing the fine-tuning result + - `Trained`: Provider has finished training + - `Delivering`: Provider is encrypting and uploading the model to 0G Storage - `Delivered`: Fine-tuning result is ready for download - - `UserAcknowledged`: User has confirmed the result is downloaded - - `Finished`: Task is completed + - `UserAcknowledged`: User has downloaded and confirmed the result + - `Finished`: Provider has settled fees and shared decryption key — task is completed - `Failed`: Task failed ### View Task Logs @@ -241,22 +293,35 @@ Step: 0, Logs: {'loss': ..., 'accuracy': ...} Training model for task beb6f0d8-4660-4c62-988d-00246ce913d2 completed successfully ``` -### Confirm Task Result +### Download and Acknowledge Model -Use the [Check Task](#monitor-progress) command to view task status. When the status changes to `Delivered`, it indicates that the provider has completed the fine-tuning task and the result is ready. You can download and acknowledge the model with the following command: +Use the [Check Task](#monitor-progress) command to view task status. When the status changes to `Delivered`, the provider has completed fine-tuning and the encrypted model is ready. Download and acknowledge the model: ```bash -0g-compute-cli fine-tuning acknowledge-model --provider --task-id --data-path +0g-compute-cli fine-tuning acknowledge-model \ + --provider \ + --task-id \ + --data-path ``` +The CLI will automatically download the encrypted model from 0G Storage. If 0G Storage download fails, it will fall back to downloading directly from the provider's TEE. + +:::tip +`--data-path` can be either a file path or a directory. If you provide a directory, the CLI will automatically create a file named `model_.bin` inside it. +::: + **Note:** The model file downloaded with the above command is encrypted, and additional steps are required for decryption. ### Decrypt Model -After acknowledging the model, the provider settles the fees and uploads the decryption key to the contract (encrypted with your public key). Use the `get-task` command to check the task status. When the status changes to `Finished`, you can decrypt the model: +After acknowledging the model, the provider automatically settles the fees and uploads the decryption key to the contract (encrypted with your public key). Use the `get-task` command to check the task status. **When the status changes to `Finished`**, you can decrypt the model: ```bash -0g-compute-cli fine-tuning decrypt-model --provider --task-id --encrypted-model --output +0g-compute-cli fine-tuning decrypt-model \ + --provider \ + --task-id \ + --encrypted-model \ + --output ``` The above command performs the following operations: @@ -265,6 +330,10 @@ The above command performs the following operations: - Decrypts the key using the user's private key - Decrypts the model with the decrypted key +:::caution Wait for Settlement +After `acknowledge-model`, the provider needs about **1 minute** to settle fees and upload the decryption key. If you decrypt too early (status is still `UserAcknowledged` instead of `Finished`), you may see an error like `second arg must be public key`. Simply wait and retry. +::: + **Note:** The decrypted result will be saved as a zip file. Ensure that the `` ends with .zip (e.g., model_output.zip). After downloading, unzip the file to access the decrypted model. ### Extract LoRA Adapter @@ -303,7 +372,7 @@ Download the same base model that was used for fine-tuning from HuggingFace: # Install huggingface-cli if not already installed pip install huggingface_hub -# Download the model +# Download the model (use the full HuggingFace model name with Qwen/ prefix) huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./base_model ``` @@ -434,6 +503,22 @@ Quick CLI commands: ### Other Commands +#### Upload Dataset Separately + +You can upload a dataset to 0G Storage before creating a task: + +```bash +0g-compute-cli fine-tuning upload --data-path +``` + +#### Download Data + +You can download previously uploaded datasets from 0G Storage: + +```bash +0g-compute-cli fine-tuning download --data-path --data-root +``` + #### View Task List You can view the list of tasks submitted to a specific provider using the following command: @@ -454,6 +539,17 @@ You can cancel a task before it starts running using the following command: ## Troubleshooting +
+Error: MinimumDepositRequired + +This means the provider's fine-tuning sub-account has insufficient funds. Make sure to include `--service fine-tuning` when transferring funds: + +```bash +0g-compute-cli transfer-fund --provider --amount 2 --service fine-tuning +``` + +
+
Error: Provider busy @@ -468,6 +564,28 @@ The provider is processing another task. Options: Add more funds: ```bash -0g-compute-cli deposit --amount 0.1 +0g-compute-cli deposit --amount 3 +0g-compute-cli transfer-fund --provider --amount 2 --service fine-tuning +``` +
+ +
+Error: "second arg must be public key" when decrypting + +This means the provider hasn't finished settlement yet. Wait about 1 minute after `acknowledge-model`, then check the task status: + +```bash +0g-compute-cli fine-tuning get-task --provider --task ``` + +When `Progress` shows `Finished`, retry the `decrypt-model` command. +
+ +
+Error: "Unexpected non-whitespace character after JSON" when creating task + +Check your training configuration JSON file: +- Ensure valid JSON format +- Use decimal notation for numbers (e.g., `0.0002` instead of `2e-4`) +- Verify no trailing commas
From 350f04bb9f2c0bcc745284d31fe93d4ebe91c480 Mon Sep 17 00:00:00 2001 From: Ravenyjh Date: Thu, 12 Feb 2026 17:10:20 +0800 Subject: [PATCH 4/4] fix --- .../compute-network/fine-tuning.md | 260 ++++++++++++++---- 1 file changed, 212 insertions(+), 48 deletions(-) diff --git a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md index e417997..6b6049b 100644 --- a/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md +++ b/docs/developer-hub/building-on-0g/compute-network/fine-tuning.md @@ -6,7 +6,7 @@ sidebar_position: 5 # Fine-tuning -Customize AI models with your own data using 0G's distributed GPU network (currently available on testnet only). +Customize AI models with your own data using 0G's distributed GPU network. ## Quick Start @@ -23,12 +23,10 @@ pnpm install @0glabs/0g-serving-broker -g #### Choose Network ```bash -# Setup network (fine-tuning currently supports testnet only) +# Setup network 0g-compute-cli setup-network ``` -**Important**: Fine-tuning services are currently available on **testnet only**. Mainnet support will be added in future releases. - #### Login with Wallet Enter your wallet private key when prompted. ```bash @@ -61,7 +59,7 @@ If you see `MinimumDepositRequired` when creating a task, it means you haven't t The output will be like: ```bash ┌──────────────────────────────────────────────────┬──────────────────────────────────────────────────┐ -│ Provider 1 │ 0xf07240Efa67755B5311bc75784a061eDB47165Dd │ +│ Provider 1 │ 0x940b4a101CaBa9be04b16A7363cafa29C1660B0d │ ├──────────────────────────────────────────────────┼──────────────────────────────────────────────────┤ │ Available │ ✓ │ └──────────────────────────────────────────────────┴──────────────────────────────────────────────────┘ @@ -85,11 +83,10 @@ The CLI displays two categories of models: predefined models available across al #### Predefined Models These are standard models available across all providers: -| Model Name | Type | Description | -|------------|------|-------------| -| `distilbert-base-uncased` | Text Classification | DistilBERT model, smaller and faster than BERT. More details: [HuggingFace](https://huggingface.co/distilbert/distilbert-base-uncased) | -| `Qwen2.5-0.5B-Instruct` | Causal LM | Qwen 2.5 instruction-tuned model (0.5B parameters). More details: [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | -| `Qwen3-32B` | Causal LM | Qwen 3 large language model (32B parameters). More details: [HuggingFace](https://huggingface.co/Qwen/Qwen3-32B) | +| Model Name | Type | Price per Million Tokens | Description | +|------------|------|--------------------------|-------------| +| `Qwen2.5-0.5B-Instruct` | Causal LM | 0.5 0G | Qwen 2.5 instruction-tuned model (0.5B parameters). More details: [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | +| `Qwen3-32B` | Causal LM | 4 0G | Qwen 3 large language model (32B parameters). More details: [HuggingFace](https://huggingface.co/Qwen/Qwen3-32B) | @@ -106,9 +103,11 @@ Use model names **without** the `Qwen/` prefix when specifying the `--model` par ::: ### Prepare Configuration File -Please download the parameter file template for the model you wish to fine-tune from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) and modify it according to your needs. -Example training configuration (`config.json`): +Use the standard configuration template below and **only modify the parameter values** as needed. Do not add additional parameters. + +#### Standard Configuration Template + ```json { "neftune_noise_alpha": 5, @@ -119,8 +118,32 @@ Example training configuration (`config.json`): } ``` -:::tip -Use decimal notation for `learning_rate` (e.g., `0.0002` instead of `2e-4`). Some JSON parsers may not accept scientific notation. +:::caution Important Configuration Rules +1. **Use the template above** - Copy the entire template +2. **Only modify parameter values** - Do not add or remove parameters +3. **Use decimal notation** - Write `0.0002` instead of `2e-4` for `learning_rate` + +**Common mistakes to avoid:** +- ❌ Adding extra parameters (e.g., `"fp16": true`, `"bf16": false`) +- ❌ Removing existing parameters +- ❌ Using scientific notation like `2e-4` +::: + +#### Adjustable Parameters + +You can modify these parameter values based on your training needs: + +| Parameter | Description | Notes | +|-----------|-------------|-------| +| `neftune_noise_alpha` | Noise injection for fine-tuning | 0-10 (0 = disabled), typical: 5 | +| `num_train_epochs` | Number of complete passes through the dataset | Positive integer, typical: 1-3 for fine-tuning | +| `per_device_train_batch_size` | Training batch size | 1-4, reduce to 1 if out of memory | +| `learning_rate` | Learning rate (use decimal notation) | 0.00001-0.001, typical: 0.0002 | +| `max_steps` | Maximum training steps | -1 (use epochs) or positive integer | + +:::tip GPU Memory Management +- If you encounter out-of-memory errors, **reduce batch size to 1** +- The provider automatically handles mixed precision training with `bf16` ::: *Note:* For custom models provided by third-party Providers, you can download the usage template including instructions on how to construct the dataset and training configuration using the following command: @@ -131,7 +154,7 @@ Use decimal notation for `learning_rate` (e.g., `0.0002` instead of `2e-4`). Som ### Prepare Your Data -Your dataset should be in JSONL format. Each line is a JSON object representing one training example. +Your dataset must be in **JSONL format** with a **`.jsonl` file extension**. Each line is a JSON object representing one training example. #### Supported Dataset Formats @@ -156,53 +179,50 @@ Your dataset should be in JSONL format. Each line is a JSON object representing #### Dataset Guidelines +- **File format**: Must be a `.jsonl` file (JSONL format) - **Minimum examples**: At least 10 examples recommended for meaningful fine-tuning - **Quality**: Ensure examples are accurate and representative of your use case - **Consistency**: Use the same format throughout the dataset - **Encoding**: UTF-8 encoding required -You can also download the dataset format specification and verification script from the [releases page](https://github.com/0gfoundation/0g-serving-broker/releases) to validate your dataset. - -### Upload Dataset - -Upload your dataset to 0G Storage. The returned root hash will be used when creating a task. - -```bash -0g-compute-cli fine-tuning upload --data-path -``` - -Output: -```bash -Root hash: 0xabc123... -``` - -> **Save the root hash** — you will need it in the next step. - ### Create Task Create a fine-tuning task. The fee will be **automatically calculated** by the broker based on the actual token count of your dataset. -**Option A: Using dataset root hash (recommended)** +**Option A: Using local dataset file (Recommended)** -If you already uploaded your dataset with the `upload` command: +The CLI will automatically upload the dataset to 0G Storage and create the task in one step: ```bash 0g-compute-cli fine-tuning create-task \ --provider \ --model \ - --dataset \ + --dataset-path \ --config-path ``` -**Option B: Using local dataset file** +**Option B: Using dataset root hash** -The CLI will automatically upload the dataset to 0G Storage and create the task in one step: +If you prefer to upload the dataset separately first, or need to reuse the same dataset: + +1. Upload your dataset to 0G Storage: + +```bash +0g-compute-cli fine-tuning upload --data-path +``` + +Output: +```bash +Root hash: 0xabc123... +``` + +2. Create the task using the root hash: ```bash 0g-compute-cli fine-tuning create-task \ --provider \ --model \ - --dataset-path \ + --dataset \ --config-path ``` @@ -212,8 +232,8 @@ The CLI will automatically upload the dataset to 0G Storage and create the task |-----------|-------------| | `--provider` | Address of the service provider | | `--model` | Name of the pretrained model (without `Qwen/` prefix) | -| `--dataset` | Root hash of the dataset on 0G Storage (Option A) | -| `--dataset-path` | Path to local dataset file — mutually exclusive with `--dataset` (Option B) | +| `--dataset-path` | Path to local dataset file — automatically uploads to 0G Storage (Option A) | +| `--dataset` | Root hash of the dataset on 0G Storage — mutually exclusive with `--dataset-path` (Option B) | | `--config-path` | Path to the training configuration file | | `--gas-price` | Gas price (optional) | @@ -229,6 +249,41 @@ Created Task ID: 6b607314-88b0-4fef-91e7-43227a54de57 *Note:* When creating a task for the same provider, you must wait for the previous task to be completed (status `Finished`) before creating a new task. If the provider is currently running other tasks, you will be prompted to choose between adding your task to the waiting queue or canceling the request. +### Fee Calculation + +The fine-tuning service fee is **automatically calculated** based on your dataset size and training configuration. The fee consists of two components: + +#### Formula + +``` +Total Fee = Training Fee + Storage Reserve Fee +``` + +Where: +- **Training Fee** = `(tokenSize / 1,000,000) × pricePerMillionTokens × trainEpochs` +- **Storage Reserve Fee** = Fixed amount based on model size + +#### Components Explained + +| Component | Description | +|-----------|-------------| +| `tokenSize` | Total number of tokens in your dataset (automatically counted) | +| `pricePerMillionTokens` | Price per million tokens (model-specific, see [Predefined Models](#predefined-models)) | +| `trainEpochs` | Number of training epochs (from your config) | +| `Storage Reserve Fee` | Fixed fee to reserve storage for the fine-tuned model:
• Qwen3-32B (~900 MB LoRA): 0.09 0G
• Qwen2.5-0.5B-Instruct (~100 MB LoRA): 0.01 0G | + +#### Example + +For a dataset with 10,000 tokens, trained for 3 epochs on Qwen2.5-0.5B-Instruct: +- Price per million tokens = 0.5 0G (see [Predefined Models](#predefined-models)) +- Training Fee = (10,000 / 1,000,000) × 0.5 × 3 = 0.015 0G +- Storage Reserve Fee = 0.01 0G (for Qwen2.5-0.5B-Instruct) +- **Total Fee = 0.025 0G** + +:::tip +The actual fee is calculated during the setup phase after your dataset is analyzed. You can view the final fee using the [`get-task`](#monitor-progress) command before training begins. +::: + ### Monitor Progress You can monitor the progress of your task by running the following command: @@ -301,13 +356,36 @@ Use the [Check Task](#monitor-progress) command to view task status. When the st 0g-compute-cli fine-tuning acknowledge-model \ --provider \ --task-id \ - --data-path + --data-path ``` The CLI will automatically download the encrypted model from 0G Storage. If 0G Storage download fails, it will fall back to downloading directly from the provider's TEE. -:::tip -`--data-path` can be either a file path or a directory. If you provide a directory, the CLI will automatically create a file named `model_.bin` inside it. +:::danger 48-Hour Deadline +**You must download and acknowledge the model within 48 hours after the task status changes to `Delivered`.** + +If you fail to acknowledge within 48 hours: +- The provider will **force settlement** automatically +- You will **lose access to the fine-tuned model** +- **30% of the total task fee** will be deducted as compensation for the provider's compute resources + +**Action required:** Monitor your task status and download promptly when it reaches `Delivered`. +::: + +:::caution File Path Required +`--data-path` **must be a file path**, not a directory. + +**Example:** +```bash +0g-compute-cli fine-tuning acknowledge-model \ + --provider \ + --task-id 0e91ef3d-ac0d-422e-a38c-9d42a28c4412 \ + --data-path /workspace/output/encrypted_model.bin +``` +::: + +:::tip Data Integrity Verification +The `acknowledge-model` command performs automatic data integrity verification to ensure the downloaded model matches the root hash that the provider submitted to the blockchain contract. This guarantees you receive the authentic model without corruption or tampering. ::: **Note:** The model file downloaded with the above command is encrypted, and additional steps are required for decryption. @@ -320,10 +398,20 @@ After acknowledging the model, the provider automatically settles the fees and u 0g-compute-cli fine-tuning decrypt-model \ --provider \ --task-id \ - --encrypted-model \ + --encrypted-model \ --output ``` +**Example:** +```bash +# Use the same file path you specified in acknowledge-model +0g-compute-cli fine-tuning decrypt-model \ + --provider \ + --task-id 0e91ef3d-ac0d-422e-a38c-9d42a28c4412 \ + --encrypted-model /workspace/output/encrypted_model.bin \ + --output /workspace/output/model_output.zip +``` + The above command performs the following operations: - Gets the encrypted key from the contract uploaded by the provider @@ -372,13 +460,18 @@ Download the same base model that was used for fine-tuning from HuggingFace: # Install huggingface-cli if not already installed pip install huggingface_hub -# Download the model (use the full HuggingFace model name with Qwen/ prefix) +# For Qwen2.5-0.5B-Instruct huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./base_model + +# For Qwen3-32B (requires ~65GB disk space) +# huggingface-cli download Qwen/Qwen3-32B --local-dir ./base_model ``` ### Step 2: Load LoRA with Base Model -Use the following Python code to combine the LoRA adapter with the base model: +Use the following Python code to combine the LoRA adapter with the base model. + +**For Qwen2.5-0.5B-Instruct:** ```python from transformers import AutoModelForCausalLM, AutoTokenizer @@ -402,12 +495,56 @@ base_model = AutoModelForCausalLM.from_pretrained( # Load LoRA adapter model = PeftModel.from_pretrained(base_model, lora_adapter_path) -# For inference (optional: merge for faster inference) -# model = model.merge_and_unload() +print("Model loaded successfully!") +``` + +**For Qwen3-32B (requires 40GB+ VRAM):** + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer +from peft import PeftModel +import torch + +# Paths +base_model_path = "./base_model" # or "Qwen/Qwen3-32B" +lora_adapter_path = "./lora_adapter/output_model" + +# Load tokenizer +tokenizer = AutoTokenizer.from_pretrained(lora_adapter_path) + +# Load base model with optimizations for large models +base_model = AutoModelForCausalLM.from_pretrained( + base_model_path, + torch_dtype=torch.float16, # Use fp16 to reduce memory + device_map="auto", # Automatically distribute across GPUs + low_cpu_mem_usage=True, # Reduce CPU memory usage during loading + trust_remote_code=True # Required for some Qwen models +) + +# Load LoRA adapter +model = PeftModel.from_pretrained(base_model, lora_adapter_path) print("Model loaded successfully!") ``` +:::tip Memory Optimization for Large Models +If you encounter out-of-memory errors with Qwen3-32B, you can use quantization: + +```python +# 8-bit quantization (requires bitsandbytes) +from transformers import BitsAndBytesConfig + +quantization_config = BitsAndBytesConfig(load_in_8bit=True) + +base_model = AutoModelForCausalLM.from_pretrained( + base_model_path, + quantization_config=quantization_config, + device_map="auto", + trust_remote_code=True +) +``` +::: + ### Step 3: Run Inference ```python @@ -474,10 +611,29 @@ print("Merged model saved to ./merged_model") Install the required Python packages: +#### For GPU Environments (Recommended) + +If you have an NVIDIA GPU, install PyTorch with CUDA support. **Important:** Match the CUDA version to your environment. + +```bash +# For CUDA 12.1 (check your CUDA version with: nvidia-smi) +pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 + +# For CUDA 11.8 +pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 + +# Install other ML libraries +pip install transformers peft accelerate +``` + +#### For CPU-Only Environments + ```bash pip install torch transformers peft accelerate ``` +#### Package Requirements + | Package | Minimum Version | Purpose | |---------|-----------------|---------| | `torch` | >= 2.0 | Deep learning framework | @@ -485,6 +641,14 @@ pip install torch transformers peft accelerate | `peft` | >= 0.10.0 | LoRA adapter support | | `accelerate` | >= 0.27.0 | Device management | +:::tip Verify GPU Support +After installation, verify that PyTorch can detect your GPU: +```bash +python3 -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available())" +``` +If `CUDA available: False`, you may need to reinstall PyTorch with the correct CUDA version. +::: + ### Account Management For comprehensive account management, including viewing balances, managing sub-accounts, and handling refunds, see [Account Management](./account-management).