Merge pull request #718 from sujee/intro-example1
Intro example 1
Showing 11 changed files with 10,992 additions and 0 deletions.
@@ -0,0 +1,10 @@
output*/

## File system artifacts
.directory
.DS_Store


## Python output
__pycache__
.ipynb_checkpoints/
@@ -0,0 +1,36 @@
# Data Prep Kit Introduction

This example demonstrates several features of Data Prep Kit.

## Running the code

The code can be run in either of two ways:

1. Google Colab: very easy to run; no local setup needed.
2. Your local Python environment. A quick guide follows; instructions for the latest version are [here](../../../README.md#-getting-started).

```bash
conda create -n data-prep-kit -y python=3.11
conda activate data-prep-kit

# install the following in the 'data-prep-kit' environment
pip3 install data-prep-toolkit==0.2.1
pip3 install data-prep-toolkit-transforms==0.2.1
pip3 install data-prep-toolkit-transforms-ray==0.2.1
pip3 install jupyterlab ipykernel ipywidgets

# install a custom kernel
# Important: use this kernel when running the example notebooks!
python -m ipykernel install --user --name=data-prep-kit --display-name "dataprepkit"

# start JupyterLab from this environment and run the notebooks
jupyter lab
```
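
After a local install, it can be useful to confirm that the environment actually holds the pinned packages before opening the notebooks. The following is a minimal sketch, not part of Data Prep Kit: the helper name `check_packages` and the `EXPECTED` table are illustrative assumptions, and the check relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the three toolkit packages, pinned to 0.2.1 as in the setup guide.
EXPECTED = {
    "data-prep-toolkit": "0.2.1",
    "data-prep-toolkit-transforms": "0.2.1",
    "data-prep-toolkit-transforms-ray": "0.2.1",
}


def check_packages(expected, get_version=version):
    """Return {package: (wanted, found_or_None)} for every mismatch.

    `get_version` is injectable so the check can be exercised without
    the packages being installed (hypothetical helper, not a DPK API).
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    mismatches = check_packages(EXPECTED)
    if not mismatches:
        print("All pinned packages are installed.")
    for name, (wanted, found) in mismatches.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run it with the `data-prep-kit` environment active (or from a notebook cell using the custom kernel); an empty result means the installed versions match the pins.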

## Intro

This notebook demonstrates processing PDFs through the following pipeline:

`PDFs ---> text ---> chunks ---> exact dedupe ---> fuzzy dedupe ---> embeddings`

[python version](dpk_intro_1_python.ipynb) | [ray version](dpk_intro_1_ray.ipynb)
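
To make the stages of that pipeline concrete, here is a toy, standard-library-only sketch of the steps after text extraction. It does not use Data Prep Kit's transforms and is not how the notebooks implement them: every function name, the word-based chunking, the Jaccard-on-shingles similarity, and the hashed bag-of-words "embedding" are assumptions chosen purely for illustration. PDF-to-text extraction is omitted; the input is assumed to be already-extracted text.

```python
import hashlib


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def exact_dedupe(chunks):
    """Drop chunks whose SHA-256 digest has already been seen (order preserved)."""
    seen, unique = set(), []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fuzzy_dedupe(chunks, threshold=0.8):
    """Keep a chunk only if it is less than `threshold` similar to every kept chunk."""
    kept, kept_shingles = [], []
    for chunk in chunks:
        current = shingles(chunk)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(chunk)
            kept_shingles.append(current)
    return kept


def toy_embedding(text, dims=8):
    """Hashed bag-of-words vector, L2-normalised. A stand-in for a real model."""
    vec = [0.0] * dims
    for word in text.lower().split():
        idx = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[idx] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def run_pipeline(documents, max_words=50, threshold=0.8):
    """text -> chunks -> exact dedupe -> fuzzy dedupe -> (chunk, embedding) pairs."""
    chunks = [c for doc in documents for c in chunk_text(doc, max_words)]
    chunks = exact_dedupe(chunks)
    chunks = fuzzy_dedupe(chunks, threshold)
    return [(c, toy_embedding(c)) for c in chunks]
```

The ordering mirrors the pipeline line: exact dedupe runs first because hashing is cheap and removes identical chunks before the pairwise similarity comparison, which in this naive form is quadratic in the number of kept chunks.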