Releases: crawlab-team/crawlab
v0.6.3-dev
What's Changed
- chore: update version by @tikazyq in #1330
- feat: added export spider files by @tikazyq in #1332
- fix: unable to terminate process when cancelling a task by @tikazyq in #1334
- Develop by @tikazyq in #1341
- chore: updated deps by @tikazyq in #1342
- fix: unordered columns of export by @tikazyq in #1343
- feat: added priority in spider and schedule by @tikazyq in #1344
- Updated README by @tikazyq in #1345
- feat: auto cleanup tasks over 30 days ago by @tikazyq in #1346
- chore: updated deps by @tikazyq in #1351
- chore: updated deps by @tikazyq in #1352
- chore(deps): bump google.golang.org/grpc from 1.42.0 to 1.53.0 in /backend by @dependabot in #1355
- Develop by @tikazyq in #1359
- fix(sec): upgrade github.com/gin-gonic/gin to 1.9.1 by @chncaption in #1357
New Contributors
- @chncaption made their first contribution in #1357
Full Changelog: v0.6.2...v0.6.3-dev
v0.6.3
Crawlab v0.6.3 Official Release
Overview
Crawlab v0.6.3 is the latest iteration of Crawlab v0.6.x, bringing a series of improvements, including bug fixes and feature optimizations.
Changelog
Bug Fixes
- Unable to terminate processes when canceling a task
- Error in Git code pulling
- Inconsistent order in the export list
- Unable to reset pending task status during restart
- Unable to cancel tasks when a node goes offline
- Unable to load node data in the crawler list
- Exported data garbled
Feature Optimizations
- Crawler file export
- Adjusted task retrieval time to 1 second
- Exception handling when FileDriver is closed
- Automatic cleaning of tasks older than 30 days
- Homepage data query optimization
- Message notification optimization
- Upgraded Gin version
- Added support for MatterMost message notification
- Added priority to crawlers and scheduled tasks
- Front-end loading performance optimization
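The automatic cleanup of tasks older than 30 days can be pictured with a minimal sketch. This is not Crawlab's actual implementation: the `Task` type, the `cleanup_old_tasks` helper, and the in-memory list are hypothetical stand-ins for whatever storage the backend uses. Only the 30-day threshold comes from the notes above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Retention window taken from the release notes; everything else is illustrative.
RETENTION = timedelta(days=30)


@dataclass
class Task:
    """Hypothetical task record with only the fields this sketch needs."""
    task_id: str
    created_at: datetime


def cleanup_old_tasks(tasks, now=None):
    """Return the tasks created within the retention window, dropping older ones."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [t for t in tasks if t.created_at >= cutoff]


if __name__ == "__main__":
    now = datetime(2023, 6, 1, tzinfo=timezone.utc)
    tasks = [
        Task("recent", now - timedelta(days=5)),
        Task("stale", now - timedelta(days=45)),
    ]
    kept = cleanup_old_tasks(tasks, now=now)
    print([t.task_id for t in kept])  # prints ['recent']
```

Passing `now` explicitly keeps the helper deterministic, which makes the retention boundary easy to test.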
Community
If you find Crawlab helpful for your daily development or company, please consider giving it a star on GitHub. If you encounter any issues, feel free to raise them as issues on GitHub. Additionally, you are welcome to contribute to Crawlab development. You can also join the Crawlab technical discussion group on WeChat by adding tikazyq1 to exchange ideas and experiences with other developers regarding technical development and deployment.
References
- Official Website: https://www.crawlab.cn
- Documentation: https://docs.crawlab.cn
- GitHub Repository: https://github.com/crawlab-team/crawlab
- Demo: https://demo.crawlab.cn/
v0.6.2
Web Crawler Management Platform Crawlab v0.6.2 Official Release
Overview
Crawlab v0.6.2 is the latest iteration of Crawlab v0.6.x, bringing a series of improvements, including bug fixes, feature enhancements, and enhanced functionality for environment variables.
Changelog
Bug Fixes
- Unexpected database connection
- Task execution command not effective
- Inconsistent task restart execution command
- Inconsistent task queue
- Unable to update Cron expression
- Tasks continue to run after node is disabled
- Task ID not returned
- Unable to download JSON data
- Unable to start SeaweedFS
- Incorrect MaxRunners setting for tasks
- Duplication not effective
- Task stuck due to long logs
- Unable to monitor data source status
- Git pull error
- Application crash when spider does not exist
- Dataset issues
Feature Enhancements
- Default display of latest list data
- Close Runner after task completion
- Batch logging
- Configure log TTL
- More data sources
- Environment variables
- Remove unnecessary buttons
Community
If you find Crawlab helpful for your daily development or your company, please consider starring it on GitHub. If you encounter any issues, feel free to raise them as issues on GitHub. Additionally, you're welcome to contribute to the development of Crawlab. You can also join the Crawlab technical discussion group by adding WeChat account tikazyq1, where you can discuss technical development and deployment with other developers.
References
- Official Website: https://www.crawlab.cn
- Documentation: https://docs.crawlab.cn
- GitHub: https://github.com/crawlab-team/crawlab
- Demo: https://demo.crawlab.cn/
v0.6.1
What's Changed
- Bump eventsource from 1.1.0 to 1.1.1 in /frontend by @dependabot in #1115
- Develop by @tikazyq in #1117
- Develop by @tikazyq in #1119
- Develop by @tikazyq in #1122
- Develop by @tikazyq in #1124
- Develop by @tikazyq in #1127
- Develop by @tikazyq in #1145
- Develop by @tikazyq in #1148
- fix(mem): fixed memory leak issue resulted by log collector by @tikazyq in #1157
- build(golang/dockerfile): rm unused tar by @ma-pony in #1165
- Develop by @tikazyq in #1169
- Develop by @tikazyq in #1170
- Develop by @tikazyq in #1185
- Develop by @tikazyq in #1191
- Develop by @tikazyq in #1205
- Develop by @tikazyq in #1208
- fix(git): unable to pull code from remote by @tikazyq in #1211
- Develop by @tikazyq in #1216
- feat(filter): updated backend deps by @tikazyq in #1227
- Develop by @tikazyq in #1229
- fix(ui): code highlight issue by @tikazyq in #1230
- Develop by @tikazyq in #1239
- fix(performance): fixed memory leak issue by @tikazyq in #1240
- fix(docker): api address incorrect issue by @tikazyq in #1241
- Develop by @tikazyq in #1245
- chore: updated deps by @tikazyq in #1249
- Develop by @tikazyq in #1256
- fix(doc): fix some broken links by @Codingendless in #1269
- chore: updated deps by @tikazyq in #1276
New Contributors
- @ma-pony made their first contribution in #1165
- @Codingendless made their first contribution in #1269
Full Changelog: v0.6.0...v0.6.1
v0.6.0-1
What's Changed
- Bump eventsource from 1.1.0 to 1.1.1 in /frontend by @dependabot in #1115
- Develop by @tikazyq in #1117
- Develop by @tikazyq in #1119
- Develop by @tikazyq in #1122
- Develop by @tikazyq in #1124
- Develop by @tikazyq in #1127
- Develop by @tikazyq in #1145
- Develop by @tikazyq in #1148
- fix(mem): fixed memory leak issue resulted by log collector by @tikazyq in #1157
- build(golang/dockerfile): rm unused tar by @ma-pony in #1165
- Develop by @tikazyq in #1169
- Develop by @tikazyq in #1170
- Develop by @tikazyq in #1185
- Develop by @tikazyq in #1191
- Develop by @tikazyq in #1205
- Develop by @tikazyq in #1208
- fix(git): unable to pull code from remote by @tikazyq in #1211
- Develop by @tikazyq in #1216
New Contributors
Full Changelog: v0.6.0...v0.6.0-1
v0.6.0
Change Log (v0.6.0)
Overview
As a major release, v0.6.0 consists of a number of large changes to enhance the performance, scalability, robustness and usability of Crawlab. This beta version is theoretically more robust than older versions, mainly in task execution, file synchronization and node management, yet we still recommend that users run thorough tests with various samples.
Enhancements
Backend
- File Synchronization. Migrated file sync from MongoDB GridFS to SeaweedFS for better stability and robustness.
- Node Communication. Migrated node communication from Redis-based RPC to gRPC. Worker nodes indirectly interact with MongoDB by making gRPC calls to the master node.
- Task Queue. Migrated task queue from Redis list to MongoDB collection to allow more flexibility (e.g. priority queue).
- Logging. Migrated logging storage system to SeaweedFS to resolve performance issue in MongoDB.
- SDK Integration. Migrated results data ingestion from native SDK to task handler side.
- Task Related. Abstracted task-related logic into Task Scheduler, Task Handler and Task Runners to increase decoupling and improve scalability and maintainability.
- Componentization. Introduced DI (dependency injection) framework and componentized modules, services and sub-systems.
- Plugin Framework. Crawlab Plugin Framework (CPF) has been released. See more info [here](https://docs.crawlab.cn/en/guide/plugin/).
- Git Integration. Git integration is implemented as a built-in feature.
- Scrapy Integration. Scrapy integration is implemented as a plugin [spider-assistant](https://docs.crawlab.cn/en/guide/plugin/plugin-spider-assistant).
- Dependency Integration. Dependency integration is implemented as a plugin [dependency](https://docs.crawlab.cn/en/guide/plugin/plugin-dependency).
- Notifications. Notifications feature is implemented as a plugin [notification](https://docs.crawlab.cn/en/guide/plugin/plugin-notification).
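The Task Queue item above mentions that moving away from a plain list allows a priority queue. A minimal in-memory sketch of that idea follows. It is an illustration only, not Crawlab's MongoDB-backed queue: the `PriorityTaskQueue` class, the "lower number is served first" convention, and the default priority of 5 are all assumptions made for the example.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Hypothetical in-memory queue: a lower priority number is served first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among tasks of equal priority.
        self._counter = itertools.count()

    def push(self, task_id, priority=5):
        """Enqueue a task id with an (assumed) integer priority."""
        heapq.heappush(self._heap, (priority, next(self._counter), task_id))

    def pop(self):
        """Remove and return the next task id, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, task_id = heapq.heappop(self._heap)
        return task_id


if __name__ == "__main__":
    queue = PriorityTaskQueue()
    queue.push("crawl-news")            # default priority
    queue.push("crawl-urgent", priority=1)
    queue.push("crawl-blog")            # default priority
    print([queue.pop() for _ in range(3)])
```

A strict FIFO list can only return tasks in arrival order, whereas here the urgent task is returned first and the two default-priority tasks keep their arrival order.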
Frontend
- Vue 3. Migrated to latest version of frontend framework Vue 3 to support more advanced features such as composition API and TypeScript.
- UI Framework. Moved from Vue-Element-Admin to the Vue 3-based UI framework Element-Plus for more flexibility and functionality.
- Advanced File Editor. Support more advanced file editor features including drag-and-drop copying/moving files, renaming, deleting, file editing, code highlight, nav tabs, etc.
- Customizable Table. Support more advanced built-in operations such as columns adjustment, batch operation, searching, filtering, sorting, etc.
- Nav Tabs. Support multiple nav tabs for viewing different pages.
- Batch Creation. Support batch creating objects including spiders, projects, schedules, etc.
- Detail Navigation. Sidebar navigation in detail pages.
- Enhanced Dashboard. More stats charts in home page dashboard.
Miscellaneous
- Documentation Site. Upgraded [documentation site](https://docs.crawlab.cn/en).
- Official Plugins. Allow users to install [official plugins](https://docs.crawlab.cn/en/guide/plugin/) on Crawlab web UI.
v0.6.0-beta.20211224
Change Log (v0.6.0-beta.20211224)
Overview
This is the third beta release for the next major version v0.6.0. With more features and optimizations coming in, the official release of v0.6.0 is approaching.
Enhancement
- Internationalization. Support Chinese.
- CLI Upload Spider. #1020
- Official Plugins. Allow users to install official plugins on Crawlab web UI.
- More Documentation. Added documentation for plugins and CLI.
Bug Fixes
TODOs
- Associated Tasks. There will be main tasks and their sub-tasks if task mode is "all nodes" or "selected nodes".
- Crontab Editor. Frontend component that visualizes crontab editing.
- Results Deduplication.
- Environment Variables.
- Frontend Utility Enhancement. Advanced features such as saved table customization.
- Log Auto Cleanup.
- More Documentation.
- E2E Tests.
- Frontend Output File Size Optimization.
What Next
The next version could be the official release of v0.6.0, but this is not yet determined. More tests will be run against the current beta version to ensure robustness and production-ready deployment.
v0.6.0-beta.20211120
Change Log (v0.6.0-beta.20211120)
Overview
This is the second beta release for the next major version v0.6.0. With more features and optimizations coming in, the official release of v0.6.0 is approaching.
Enhancement
Backend
- Plugin Framework. Crawlab Plugin Framework (CPF) has been released. See more info here.
- Git Integration. Git integration is implemented as a built-in feature.
- Scrapy Integration. Scrapy integration is implemented as a plugin spider-assistant.
- Dependency Integration. Dependency integration is implemented as a plugin dependency.
- Notifications. Notifications feature is implemented as a plugin notification.
- Documentation Site. Set up documentation site.
Frontend
- Bug Fixing.
TODOs
- Associated Tasks. There will be main tasks and their sub-tasks if task mode is "all nodes" or "selected nodes".
- Crontab Editor. Frontend component that visualizes crontab editing.
- Results Deduplication.
- Environment Variables.
- Internationalization. Support Chinese.
- Frontend Utility Enhancement. Advanced features such as saved table customization.
- Log Auto Cleanup.
- More Documentation.
What Next
The next version could be the official release of v0.6.0, but this is not yet determined. More tests will be run against the current beta version to ensure robustness and production-ready deployment.
v0.6.0-beta.20210803
Change Log (v0.6.0-beta.20210803)
Overview
This is the beta release for the next major version v0.6.0. It is recommended NOT to use it in production, as it is not fully tested and thus not stable enough. Furthermore, more features, including those not ready in the beta release (e.g. Git, Scrapy, Notification), are planned to be integrated into the live version in the form of plugins.
Enhancement
As a major release, v0.6 (including beta versions) consists of a number of large changes to enhance the performance, scalability, robustness and usability of Crawlab. This beta version is theoretically more robust than older versions, mainly in task execution, file synchronization and node management, yet we still recommend that users run thorough tests with various samples.
Backend
- File Synchronization. Migrated file sync from MongoDB GridFS to SeaweedFS for better stability and robustness.
- Node Communication. Migrated node communication from Redis-based RPC to gRPC. Worker nodes indirectly interact with MongoDB by making gRPC calls to the master node.
- Task Queue. Migrated task queue from Redis list to MongoDB collection to allow more flexibility (e.g. priority queue).
- Logging. Migrated logging storage system to SeaweedFS to resolve performance issue in MongoDB.
- SDK Integration. Migrated results data ingestion from native SDK to task handler side.
- Task Related. Abstracted task-related logic into Task Scheduler, Task Handler and Task Runners to increase decoupling and improve scalability and maintainability.
- Componentization. Introduced DI (dependency injection) framework and componentized modules, services and sub-systems.
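The dependency injection mentioned in the last item above can be illustrated in general terms. The sketch below shows plain constructor injection and is not taken from the Crawlab code base: `LogStore`, `InMemoryLogStore`, and `TaskRunner` are hypothetical names chosen for the example, and the real backend may wire its components quite differently.

```python
from typing import List, Protocol, Tuple


class LogStore(Protocol):
    """Hypothetical interface a task runner depends on."""

    def write(self, task_id: str, line: str) -> None: ...


class InMemoryLogStore:
    """Illustrative implementation; a real one might write to external storage."""

    def __init__(self) -> None:
        self.lines: List[Tuple[str, str]] = []

    def write(self, task_id: str, line: str) -> None:
        self.lines.append((task_id, line))


class TaskRunner:
    """Receives its log store from the caller instead of constructing one."""

    def __init__(self, log_store: LogStore) -> None:
        self._log_store = log_store

    def run(self, task_id: str) -> None:
        self._log_store.write(task_id, "started")
        # The actual work of the task would happen here.
        self._log_store.write(task_id, "finished")


if __name__ == "__main__":
    store = InMemoryLogStore()
    TaskRunner(store).run("task-1")
    print(store.lines)
```

Because `TaskRunner` only depends on the `LogStore` interface, the storage component can be swapped (for example, in tests) without changing the runner, which is the decoupling that componentization aims for.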
Frontend
- Vue 3. Migrated to latest version of frontend framework Vue 3 to support more advanced features such as composition API and TypeScript.
- UI Framework. Moved from Vue-Element-Admin to the Vue 3-based UI framework Element-Plus for more flexibility and functionality.
- Advanced File Editor. Support more advanced file editor features including drag-and-drop copying/moving files, renaming, deleting, file editing, code highlight, nav tabs, etc.
- Customizable Table. Support more advanced built-in operations such as columns adjustment, batch operation, searching, filtering, sorting, etc.
- Nav Tabs. Support multiple nav tabs for viewing different pages.
- Batch Creation. Support batch creating objects including spiders, projects, schedules, etc.
- Detail Navigation. Sidebar navigation in detail pages.
- Enhanced Dashboard. More stats charts in home page dashboard.
TODOs
Because this is a beta release, some existing useful features such as Git and Scrapy integration may not be available. However, we are trying to include them in the official v0.6.0 release, as some of their core functionality is already in the code base, and we will add them to the stable version only once they are fully tested.
- Plugin Framework. Advanced features will exist in the form of plugins, or pluggable modules.
- Git Integration. To be included as a plugin.
- Scrapy Integration. To be included as a plugin.
- Notifications. To be included as a plugin.
- Associated Tasks. There will be main tasks and their sub-tasks if task mode is "all nodes" or "selected nodes".
- Crontab Editor. Frontend component that visualizes crontab editing.
- Results Deduplication.
- Environment Variables.
- Internationalization. Support Chinese.
- Frontend Utility Enhancement. Advanced features such as saved table customization.
- Log Auto Cleanup.
- Documentation.
What Next
This beta release is only a preview and a testing ground for the core functionalities in Crawlab v0.6. We therefore invite you to download it and run more tests. The official release is expected to be ready after major issues from the beta version are sorted out and the Plugin Framework and other key features are developed and fully tested. With that in mind, a second beta version before the main release is also possible.
v0.5.1
Features / Enhancement
- Added error message details.
- Added Golang programming language support.
- Added web driver installation scripts for Chrome Driver and Firefox.
- Support system tasks. A "system task" is similar to a normal spider task; it allows users to view logs of general tasks such as installing languages.
- Changed the method of installing languages from RPC to system tasks.
Bug Fixes
- Fixed 500 error on first repo download in Spider Market page. #808
- Fixed some translation issues.
- Fixed 500 error in task detail page. #810
- Fixed password reset issue. #811
- Fixed unable to download CSV issue. #812
- Fixed unable to install node.js issue. #813
- Fixed disabled status for batch adding schedules. #814