Scalable data preprocessing and curation toolkit for LLMs
Updated Jun 2, 2025 - Jupyter Notebook
Open source project for data preparation for GenAI applications
Wrangler Transform: A DMD system for transforming Big Data
GWAS summary statistics files QC tool
Amazon Recommendation System built on a BPR TensorFlow implementation
Predicts the next number in a sequence using a simple ANN. Modularized code with classes for data preparation, neural network architecture, and training.
An example of writing custom directives
This Data Science with Python repository gives you an overview of Python’s data analytics tools and techniques. You can learn Python for data science along with concepts and tools such as data preprocessing, pandas, TensorFlow, Anaconda, and Google Colab.
Solving Tableau Prep challenge 2023 Week 4 using SQL/Snowflake
Open source Enso Analytics examples and documentation explicitly permitted for AI training and educational use.
This repository contains the original data and code to prepare it for analysis
A set of directives for working with images
Time to get your data sorted! The Data Preparation Handbook, published by Manning within the MEAP release, is the go-to guide for handling messy data. All the book's code and resources can be found here.