v0.3.01
🎉 What's Changed
New Features:
- Added retry_on_http_error and retry_openai_api decorators with a backoff strategy to automatically retry online LLM calls
- feat(dataset): Add an option to include time information in training data.
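
For readers unfamiliar with the pattern, the retry-with-backoff idea behind the new decorators can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names retry_with_backoff and TransientHTTPError and the parameters max_retries, base_delay, max_delay and sleep are all hypothetical, and the real signatures of retry_on_http_error and retry_openai_api may differ.

```python
import functools
import random
import time


class TransientHTTPError(Exception):
    """Hypothetical stand-in for a retryable HTTP failure (e.g. 429 or 503)."""


def retry_with_backoff(exceptions, max_retries=3, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Hypothetical decorator: retry the wrapped call with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    # The delay doubles on each attempt, is capped at
                    # max_delay, and gets up to 10% random jitter added.
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

A function decorated with @retry_with_backoff(TransientHTTPError) would be called up to max_retries + 1 times before the last error is re-raised. The sleep parameter is injectable only so the behaviour can be checked without actually waiting.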
Enhancements:
- performance(PII): add batch PII detection to improve performance
- refactor: unify the separator for combined (group) content to \n
- feat(PII): enhance PII detection for Chinese
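
These notes do not say how batch PII detection is implemented. As a rough sketch only, assuming a regex-based detector, batching could mean compiling the patterns once and scanning a whole list of texts in a single call. The names PII_PATTERNS and detect_pii_batch are hypothetical, and the two patterns (email address, mainland-China mobile number) are illustrative, not the project's rule set.

```python
import re

# Hypothetical patterns, compiled once and reused for every text in the batch.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 11 digits starting with 1 and a second digit of 3-9,
    # not embedded in a longer run of digits.
    "cn_mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def detect_pii_batch(texts):
    """Return, for each input text, a list of (label, matched_string) pairs."""
    results = []
    for text in texts:
        found = []
        for label, pattern in PII_PATTERNS.items():
            found.extend((label, m.group()) for m in pattern.finditer(text))
        results.append(found)
    return results
```

The result list has one entry per input text, in input order, so callers can map findings back to their source records.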
Fixes:
- Fix the deepspeed version issue (#184)
- fix(dataset): fix missing images in data processing results
Tests:
- Add a test for PII filtering in dataset generation
- Refactor test fixtures in test_full_pipe; add setup_data_environment and assertions for blocked words and image tags
Full Changelog: v0.3.0...v0.3.01