Skip to content

Commit

Permalink
Merge pull request #846 from agoyal26/resources_update
Browse files Browse the repository at this point in the history
Updated Resources webpage with latest talks and links
  • Loading branch information
touma-I authored Dec 3, 2024
2 parents 4171dfa + 9fc6d5b commit 19600d9
Showing 1 changed file with 29 additions and 5 deletions.
34 changes: 29 additions & 5 deletions resources.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
# New Features & Enhancements

- Support for Docling 2.0 added to DPK in [pdf2parquet](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet/python) transform. The new updates allow DPK users to ingest other type of documents, e.g. MS Word, MS Powerpoint, Images, Markdown, Asciidocs, etc.
- Released [Web2parquet](https://github.com/IBM/data-prep-kit/tree/dev/transforms/universal/web2parquet) transform for crawling the web.

# Data Prep Kit Resources

## 📄 Papers
Expand All @@ -7,24 +12,43 @@
3. [Scaling Granite Code Models to 128K Context](https://arxiv.org/abs/2407.13739)


## 🎤 Talks
## 🎤 External Events and Showcase

1. **"Building Successful LLM Apps: The Power of high quality data"** - [Video](https://www.youtube.com/watch?v=u_2uiZBBVIE) | [Slides](https://www.slideshare.net/slideshow/data_prep_techniques_challenges_methods-pdf-a190/271527890)
2. **"Hands on session for fine tuning LLMs"** - [Video](https://www.youtube.com/watch?v=VEHIA3E64DM)
3. **"Build your own data preparation module using data-prep-kit"** - [Video](https://www.youtube.com/watch?v=0WUMG6HIgMg)
4. **"Data Prep Kit: A Comprehensive Cloud-Native Toolkit for Scalable Data Preparation in GenAI App"** - [Video](https://www.youtube.com/watch?v=WJ147TGULwo) | [Slides](https://ossaidevjapan24.sched.com/event/1jKBm)
5. **"RAG with Data Prep Kit" Workshop** @ Mountain View, CA, USA ** - [info](https://github.com/sujee/data-prep-kit-examples/blob/main/events/2024-09-21__RAG-workshop-data-riders.md)
6. **Tech Educator summit** [IBM CSR Event](https://www.linkedin.com/posts/aanchalaggarwal_github-ibmdata-prep-kit-open-source-project-activity-7254062098295472128-OA_x?utm_source=share&utm_medium=member_desktop)
7. **Talk and Hands on session** at [MIT Bangalore](https://www.linkedin.com/posts/saptha-surendran-71a4a0ab_ibmresearch-dataprepkit-llms-activity-7261987741087801346-h0no?utm_source=share&utm_medium=member_desktop)
8. **PyData NYC 2024** - [90 mins Tutorial](https://nyc2024.pydata.org/cfp/talk/AWLTZP/)
9. **Open Source AI** [Demo Night](https://lu.ma/oss-ai?tk=A8BgIt)
10. [**Data Exchange Podcast with Ben Lorica**](https://thedataexchange.media/ibm-data-prep-kit/)
11. Unstructured Data Meetup - SF, NYC, Silicon Valley
12. IBM TechXchange Las Vegas
13. Open Source [**RAG Pipeline workshop**](https://www.linkedin.com/posts/sujeemaniyam_dataprepkit-workshop-llm-activity-7256176802383986688-2UKc?utm_source=share&utm_medium=member_desktop) with Data Prep Kit at TechEquity's AI Summit in Silicon Valley
14. **Data Science Dojo Meetup** - [video](https://datasciencedojo.com/tutorial/data-preparation-toolkit/)
15. [**DPK tutorial and hands on session at IIIT Delhi**](https://www.linkedin.com/posts/cai-iiitd-97a6a4232_datascience-datapipelines-machinelearning-activity-7263121565125349376-FG8E?utm_source=share&utm_medium=member_desktop)


## Example Code
Find example code in readme section of each tranform and some sample jupyter notebooks for getting started [**here**](examples/notebooks)

## Blogs / Tutorials

- [**IBM Developer Blog**](https://developer.ibm.com/blogs/awb-unleash-potential-llms-data-prep-kit/)
- [**Introductory Blog on DPK**](https://www.linkedin.com/pulse/unleashing-potential-large-language-models-through-data-aanchal-goyal-fgtff)
- [**DPK Header Cleanser Module Blog by external contributor**](https://www.linkedin.com/pulse/enhancing-data-quality-developing-header-cleansing-tool-kalathiya-i1ohc/?trackingId=6iAeBkBBRrOLijg3LTzIGA%3D%3D)


## Workshops
# Relevant online communities

- **2024-09-21: "RAG with Data Prep Kit" Workshop** @ Mountain View, CA, USA - [info](https://github.com/sujee/data-prep-kit-examples/blob/main/events/2024-09-21__RAG-workshop-data-riders.md)
- [**Data Prep Kit Discord Channel**](https://discord.com/channels/1276554812359442504/1303454647427661866)
- [**DPK is now listed in Github Awesome-LLM under LLM Data section**](https://github.com/Hannibal046/Awesome-LLM)
- [**DPK is now up for access via IBM Skills Build Download**](https://academic.ibm.com/a2mt/downloads/artificial_intelligence#/)
- [**DPK added to the Application Hub of “AI Sustainability Catalog”**](https://enterprise-neurosystem.github.io/Sustainability-Catalog/)

## Discord
## We Want Your Feedback!
Feel free to contribute to discussions or create a new one to share your [feedback](https://github.com/IBM/data-prep-kit/discussions)

- [**Data Prep Kit Discord Channel**](https://discord.com/channels/1276554812359442504/1286046139921207476)

0 comments on commit 19600d9

Please sign in to comment.