Commit f366928

[Function optimization] revert gpt run_pretrain to enable tipc (PaddlePaddle#4702)
* revert gpt tipc
* update test_gpt & run_pretrain modeling init
* update gpt ci-case
* revert gpt2 train_infer_python.txt file
* update pytest run command in ci-case
* update run_pretrain_static
1 parent b7cccc5 commit f366928

6 files changed: +299 -449 lines changed

model_zoo/gpt/README.md

Lines changed: 29 additions & 27 deletions
@@ -87,22 +87,21 @@ mv gpt_en_dataset_300m_idx.npz ./data
 
 ```shell
 CUDA_VISIBLE_DEVICES=0 python run_pretrain.py \
-    --model_name_or_path gpt2-en \
-    --input_dir ./data \
-    --output_dir ./output_dir/pretrain \
-    --weight_decay 0.01 \
-    --max_steps 500000 \
-    --save_steps 100000 \
-    --device gpu \
-    --warmup_steps 320000 \
-    --warmup_ratio 0.01 \
-    --mirco_batch_size 4 \
-    --eval_steps 100 \
-    --do_train true \
-    --do_predict true
+    --model_type gpt \
+    --model_name_or_path gpt2-en \
+    --input_dir "./data"\
+    --output_dir "output"\
+    --weight_decay 0.01\
+    --grad_clip 1.0\
+    --max_steps 500000\
+    --save_steps 100000\
+    --decay_steps 320000\
+    --warmup_rate 0.01\
+    --micro_batch_size 4\
+    --device gpu
 ```
 
-The parameters in the configuration file are described below
+The parameters are described below
 - `model_name_or_path` The model to train, or a checkpoint from a previous training run.
 - `input_dir` The input files. A directory may be given, in which case all files in that directory are included.
 - `output_dir` The output location.
@@ -113,28 +112,31 @@ CUDA_VISIBLE_DEVICES=0 python run_pretrain.py \
 - `mirco_batch_size` The training batch size.
 - `device` The training device.
 
+Users can also train directly with the provided shell script: `sh scripts/run.sh`.
+
 #### Single machine, multiple GPUs
 
 Likewise, the following command runs training on eight GPUs:
 
 ```shell
 unset CUDA_VISIBLE_DEVICES
 python -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_pretrain.py \
-    --model_name_or_path gpt2-en \
-    --input_dir ./data \
-    --output_dir ./output_dir/pretrain \
-    --weight_decay 0.01 \
-    --max_steps 500000 \
-    --save_steps 100000 \
-    --device gpu \
-    --warmup_steps 320000 \
-    --warmup_ratio 0.01 \
-    --mirco_batch_size 8 \
-    --eval_steps 100 \
-    --do_train true \
-    --do_predict true
+    --model_type gpt \
+    --model_name_or_path gpt2-en \
+    --input_dir "./data"\
+    --output_dir "output"\
+    --weight_decay 0.01\
+    --grad_clip 1.0\
+    --max_steps 500000\
+    --save_steps 100000\
+    --decay_steps 320000\
+    --warmup_rate 0.01\
+    --micro_batch_size 4\
+    --device gpu
 ```
 
+Users can also train directly with the provided shell script: `sh scripts/run_multi.sh`.
+
 ### Model evaluation
 
 We provide evaluation scripts for two datasets, [WikiText](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip) and [LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl). Start the evaluation with the following command:
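
The flag values in the README commands above imply a few concrete step counts. The Python sketch below works them out under two assumptions that this diff does not confirm: that the warmup length is `warmup_rate * decay_steps`, and that a checkpoint is written every `save_steps` steps up to `max_steps`. The helper names `warmup_steps` and `checkpoint_steps` are hypothetical and exist only for illustration.

```python
from typing import List

# Values copied from the single-card command in the README diff.
flags = {
    "max_steps": 500000,
    "save_steps": 100000,
    "decay_steps": 320000,
    "warmup_rate": 0.01,
}


def warmup_steps(decay_steps: int, warmup_rate: float) -> int:
    """Assumed relation: warmup covers a fraction of the decay window."""
    return round(decay_steps * warmup_rate)


def checkpoint_steps(max_steps: int, save_steps: int) -> List[int]:
    """Assumed behaviour: one checkpoint every save_steps steps."""
    return list(range(save_steps, max_steps + 1, save_steps))


if __name__ == "__main__":
    print(warmup_steps(flags["decay_steps"], flags["warmup_rate"]))
    print(checkpoint_steps(flags["max_steps"], flags["save_steps"]))
```

If those assumptions hold, the documented settings would give 3200 warmup steps and five checkpoints over the run.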

model_zoo/gpt/args.py

Lines changed: 4 additions & 2 deletions
@@ -11,13 +11,12 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from __future__ import annotations
 
 import argparse
 
 import paddle
 
-from paddlenlp.utils.log import logger  # noqa: E402
+from paddlenlp.utils.log import logger
 
 
 def str2bool(v):
@@ -169,6 +168,9 @@ def parse_args(MODEL_CLASSES):
     parser.add_argument(
         "--device", type=str, default="gpu", choices=["cpu", "gpu", "xpu", "npu"], help="select cpu, gpu, xpu devices."
     )
+    parser.add_argument(
+        "--lr_decay_style", type=str, default="cosine", choices=["cosine", "none"], help="Learning rate decay style."
+    )
     parser.add_argument(
         "-p",
         "--profiler_options",

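The diff adds `--lr_decay_style` but does not show how the training script consumes it. As a minimal sketch, assuming `"none"` means a constant learning rate and `"cosine"` means linear warmup followed by cosine decay, the flag could be dispatched as below. Only the `--lr_decay_style` definition is taken from the diff. The `learning_rate` helper, the `--max_lr` and `--min_lr` defaults, and the warmup formula are assumptions for illustration, not the repository's implementation.

```python
import argparse
import math


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Copied from the diff: argparse rejects any value outside `choices`.
    parser.add_argument(
        "--lr_decay_style", type=str, default="cosine", choices=["cosine", "none"], help="Learning rate decay style."
    )
    # Hypothetical defaults, chosen only so the sketch runs on its own.
    parser.add_argument("--max_lr", type=float, default=1e-4)
    parser.add_argument("--min_lr", type=float, default=1e-5)
    parser.add_argument("--decay_steps", type=int, default=320000)
    parser.add_argument("--warmup_rate", type=float, default=0.01)
    return parser


def learning_rate(step: int, args: argparse.Namespace) -> float:
    """Return the learning rate at `step` (assumed semantics, see lead-in)."""
    if args.lr_decay_style == "none":
        return args.max_lr  # constant rate, no schedule
    warmup = args.warmup_rate * args.decay_steps
    if warmup > 0 and step < warmup:
        return args.max_lr * step / warmup  # linear warmup from 0
    if step >= args.decay_steps:
        return args.min_lr  # floor after the decay window ends
    progress = (step - warmup) / (args.decay_steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return args.min_lr + (args.max_lr - args.min_lr) * cosine


if __name__ == "__main__":
    args = build_parser().parse_args([])
    for step in (0, 1600, 3200, 160000, 320000):
        print(step, learning_rate(step, args))
```

Because the option uses `choices`, a value such as `--lr_decay_style linear` would make `argparse` exit with a usage error before training starts.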