Commit 0630dc0

remove spacy dependency since it's no longer needed
1 parent ae12ba2 commit 0630dc0

File tree

5 files changed: +0 additions, −11 deletions

.github/workflows/main.yml

Lines changed: 0 additions & 2 deletions
@@ -23,8 +23,6 @@ jobs:
           pip install --upgrade pip setuptools
           pip install -r requirements_test.txt
           pip install pytest
-      - name: Download spaCy model
-        run: python -m spacy download en_core_web_sm
       - name: Run tests
         run: |
           pytest tests/test*.py

CONTRIBUTING.md

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@ Firstly, clone the repository where we store our database data and schema. Insta
 git clone https://github.com/defog-ai/defog-data.git
 cd defog-data
 pip install -r requirements.txt
-python -m spacy download en_core_web_sm
 pip install -e .
 ```

README.md

Lines changed: 0 additions & 3 deletions
@@ -25,7 +25,6 @@ Firstly, clone the repository where we store our database data and schema. Insta
 git clone https://github.com/defog-ai/defog-data.git
 cd defog-data
 pip install -r requirements.txt
-python -m spacy download en_core_web_sm
 pip install -e .
 ```

@@ -106,8 +105,6 @@ If you have a private dataset that you do not want to make publicly available bu
 - Begin by creating a separate git repository for your private data, that has a `setup.py` file, similar to [defog-data](https://github.com/defog-ai/defog-data).
 - Create the metadata and data files, and import them into your database. This is to allow our evaluation framework to run the generated queries with some actual data. You can refer to `defog-data`'s [metadata objects](https://github.com/defog-ai/defog-data/blob/main/defog_data/metadata.py) for the schema, and [setup.sh](https://github.com/defog-ai/defog-data/blob/main/setup.sh) as an example on how import the data into your database. We do not prescribe any specific folder structure, and leave it to you to decide how you want to organize your data, so long as you can import it into your database easily.
 - To use our metadata pruning utilities, you would need to have the following defined:
-  - A way to load your embeddings. In our case, we call a function [load_embeddings](https://github.com/defog-ai/defog-data/blob/db8c3d4c4004144d2b3ff5a2701529f5545f520f/defog_data/supplementary.py#L85) from `defog-data`'s supplementary module to load a dictionary of database name to a tuple of the 2D embedding matrix (num examples x embedding dimension) and the associated text metadata for each row/example. If you would like to see how we generate this tuple, you may refer to [generate_embeddings](https://github.com/defog-ai/defog-data/blob/main/defog_data/supplementary.py#L11) in the `defog-data` repository.
-  - A way to load columns associated with various named entities. In our case, we call a dictionary [columns_ner](https://github.com/defog-ai/defog-data/blob/db8c3d4c4004144d2b3ff5a2701529f5545f520f/defog_data/supplementary.py#L106) of database name to a nested dictionary that maps each named entity type to a list of column metadata strings that are associated with that named entity type. You can refer to the raw data for an example of how we generate this dictionary.
   - A way to define joinable columns between tables. In our case, we call a dictionary [columns_join](https://github.com/defog-ai/defog-data/blob/db8c3d4c4004144d2b3ff5a2701529f5545f520f/defog_data/supplementary.py#L233) of database name to a nested dictionary of table tuples to column name tuples. You can refer to the raw data for an example of how we generate this dictionary.

 Once all of the 3 above steps have completed, you would need to
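The bullet that survives this hunk describes `columns_join` as a nested dictionary of database name to table-name pairs to joinable column-name pairs. Below is a minimal sketch of that shape only; the database, table, and column names are hypothetical placeholders, not values from `defog-data`, and the lookup helper is illustrative rather than part of the repository's API.

```python
# Hypothetical illustration of the columns_join shape described above:
# database name -> {(table_a, table_b): [(table_a.column, table_b.column), ...]}.
# None of the names below come from defog-data.
from typing import Dict, List, Tuple

ColumnsJoin = Dict[str, Dict[Tuple[str, str], List[Tuple[str, str]]]]

columns_join: ColumnsJoin = {
    "example_db": {
        ("orders", "customers"): [("orders.customer_id", "customers.id")],
        ("orders", "products"): [("orders.product_id", "products.id")],
    }
}


def joinable_columns(db_name: str, table_a: str, table_b: str) -> List[Tuple[str, str]]:
    """Return the column pairs that join two tables, checking both orderings."""
    per_db = columns_join.get(db_name, {})
    return per_db.get((table_a, table_b)) or per_db.get((table_b, table_a)) or []


if __name__ == "__main__":
    # Prints [('orders.customer_id', 'customers.id')]
    print(joinable_columns("example_db", "customers", "orders"))
```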

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@ pytest
 pyyaml
 sentence-transformers
 snowflake-connector-python
-spacy
 sqlalchemy
 tiktoken
 together

requirements_test.txt

Lines changed: 0 additions & 4 deletions
@@ -3,11 +3,7 @@ numpy
 openai
 pandas
 psycopg2-binary
-# pysqlite3
-sentence_transformers
 snowflake-connector-python
-spacy==3.7.2
 sqlalchemy
 sqlglot
-torch
 tqdm
