DataSeek

A dataset miner and manager that automates DeepSeek with Selenium.

why?

Datasets are one of the most important factors in LM (language model) development.

A dataset with well-chosen examples and the right size makes for a good LM. Not too large, not too small: it should fit the model's size (parameter count).

Making language models can seem very hard. But in fact, they are just math, trained on your dataset. The hard parts are:

Computational power to train them
Creating a perfect dataset

Creating a dataset manually would take years. This is why DataSeek exists.

demo & guide

video

Demo video coming soon.

how to use?

from source

Clone the repository

git clone https://github.com/MYusufY/dataseek.git
cd dataseek

Launch it

pip install -r requirements.txt
python3 main.py

Enter your system prompt
- The system prompt is how you describe to DeepSeek what kind of dataset you want DataSeek to generate.
- Include details such as the desired number of outputs per interval, the output format, and the style.
- You can see some examples in the examples folder.
Enter your example base JSON dataset (optional)
- If you enter or import an existing JSON dataset, its last 30 examples are sent to DeepSeek right after the system prompt, so DeepSeek has a better idea of the format and the dataset.
- This slightly improves performance. You can give a few examples or a whole dataset. New examples are added on top of it, so you do not start from scratch.
- It's completely optional.
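
As a rough illustration of the base dataset step, here is a minimal Python sketch of picking the last 30 examples and placing them after the system prompt. It is not DataSeek's actual code: the function names, the `input`/`output` keys, and the way the prompt and examples are joined into one text block are all assumptions made for this example. It also assumes the dataset is a JSON array of objects.

```python
import json

SEED_COUNT = 30  # the README states that the last 30 examples are sent


def seed_examples(dataset, count=SEED_COUNT):
    """Return the last `count` examples (hypothetical helper).

    A dataset shorter than `count` is returned whole.
    """
    return dataset[-count:]


def build_seed_message(system_prompt, dataset):
    """Place the seed examples right after the system prompt (assumed layout)."""
    examples = seed_examples(dataset)
    if not examples:
        # No base dataset was given, so only the system prompt is sent.
        return system_prompt
    return system_prompt + "\n\nExisting examples:\n" + json.dumps(examples, indent=2)


def extend_dataset(dataset, new_examples):
    """Add newly generated examples on top of the base dataset."""
    return dataset + list(new_examples)


# Usage with a made-up base dataset of 40 examples.
base = [{"input": f"question {i}", "output": f"answer {i}"} for i in range(40)]
seeds = seed_examples(base)
print(len(seeds), seeds[0]["input"])  # 30 question 10
```

Slicing from the end keeps the most recently added examples, which matches the idea of continuing an existing dataset rather than starting over.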

from releases

DataSeek will be released as a standalone app soon, for Linux, macOS and (maybe) Windows. If you want to, you can open an issue to make this process faster!

Disclaimer

This repository is for research purposes only. I am not responsible for misuse. Please do not use it in production!

Contact & Support

📧 [email protected]
☕ Buy me a coffee

Thanks — hope this helps!

