JZHub is an open-source data hub designed to help researchers, data scientists, and open science enthusiasts manage, version, and publish scientific data. Built on JZFS (a Git-like version control file system) and integrated with Resource Hub for models, workflows, storage, and computation, JZHub provides a flexible platform for collaborative data workflows. Whether you're creating citable data papers, sharing datasets, or tracking experimental changes, JZHub streamlines the process with modern tools and automation.
JZHub offers a range of capabilities for researchers and data scientists, inspired by public dataset hubs, LLM hubs, and AI hubs. Here’s what JZHub provides:
- Data Versioning and Collaboration: Track changes to datasets, models, and documents using JZFS, which provides Git-like versioning for collaborative research workflows and team-based data management. This mirrors data hubs’ focus on versioning and collaboration, as seen in platforms like GitHub.
- Hosting and Publishing: Store and share data and models in a hub, ensuring persistent, accessible, and citable outputs for open science, with cross-domain interoperability.
- Metadata Management and Discovery: Automatically generate and manage metadata for datasets and models, with search and discovery features to enhance data reuse and accessibility.
- LLM Integration: Leverage large language models (e.g., Deepseek) for automated content generation (e.g., data papers, blogs, documentation) and retrieval-augmented generation (RAG) for querying data.
- Model Hosting and Fine-Tuning: Host and fine-tune LLMs or AI models, with integration for on-device deployment via cloud or decentralized storage, supporting scalable AI research.
- Security, Compliance, and Governance: Offer access controls, compliance monitoring, and data protection features for sensitive research data, ensuring trust in multi-institutional collaborations.
JZHub’s use cases are designed to support researchers in open science and AI-driven research. Here’s how JZHub can be used:
- Data Paper Creation and Publication: Generate citable data papers from datasets using LLMs, and publish them with datasets as IPLD products or other formats for open science and reproducibility.
- Collaborative Research Workflows: Track experiment data or model versions across distributed teams, and share securely via decentralized storage, enhancing multi-institutional research.
- AI-Driven Insights and Reporting: Use RAG to query datasets/models for insights, with LLM-generated summaries for reports, supporting data-driven decision-making in research.
- Decentralized Model Deployment: Host fine-tuned LLMs or AI models for on-device use in research applications, enabling innovation in resource-constrained environments.
- Compliance and Governance for Sensitive Data: Manage sensitive data with access controls and compliance monitoring, ensuring ethical use in collaborative open science projects.
To deploy the system to your server, see this repository for help:
https://github.com/GitDataAI/jzhub
Clone the JZLab repository to your server:
git clone git@github.com:GitDataAI/jzlab.git
Before you run the project for the first time, run the following command to install the packages listed in package.json:
npm install
After the installation completes, run the following command to start the project:
npm run dev
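The clone, install, and start steps above can be wrapped in a small script that first checks for the tools they rely on. This is a minimal sketch, not part of the project: the function names `missing_tools` and `setup_jzlab` are hypothetical, the clone address is assumed to be the standard GitHub SSH form (which needs an SSH key registered with GitHub), and the checkout directory is assumed to be `jzlab`.

```shell
#!/usr/bin/env sh

# Print the name of every given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Clone the repository, install dependencies, and start the project.
# Stops at the first step that fails.
setup_jzlab() {
  missing=$(missing_tools git node npm)
  if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are joined on one line.
    echo "Missing required tools:" $missing >&2
    return 1
  fi

  # Assumed clone address and directory name (see the note above).
  git clone git@github.com:GitDataAI/jzlab.git || return 1

  # Run the npm steps in a subshell so the caller's working
  # directory is left unchanged.
  (
    cd jzlab || exit 1
    npm install && npm run dev
  )
}
```

To use it, save the functions to a file, source that file in your shell, and call `setup_jzlab`. Nothing runs until the function is called, so sourcing the file is safe on a machine where the tools are not yet installed.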
We welcome contributions! Please read our contribution guidelines. Here’s how to get started:
- Fork the repo.
- Create a feature branch (git checkout -b feature/my-feature).
- Submit a pull request.
Planned work:
- Actions support.
- UI enhancements.
- Integration with LLM.
Thanks goes to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!
Dual-licensed under MIT + Apache 2.0