[ENH] add example of forking using github repos #4413
"metadata": {},
"source": [
"## Forking\n",
"ChromaDB now supports forking. Below is an example using forking to chunk and embed a github repo, fork off of the collection for a new github branch, and apply diffs to the new branch."
ChromaDB -> Chroma
Also worth mentioning this is only supported in Distributed Chroma / Chroma Cloud.
"source": [
"\n",
"PY_LANGUAGE = get_language(\"python\")\n",
"REPO_OWNER = \"jairad26\"\n",
bit strange to have your personal repo here
"REPO_OWNER = \"jairad26\"\n",
"REPO_NAME = \"Django-WebApp\"\n",
"EXISTING_BRANCH = \"main\"\n",
"NEW_BRANCH = \"test1\"\n",
could we call this demo etc instead?
"cell_type": "markdown",
"metadata": {},
"source": [
"### Chunker and Github Helpers\n",
Can we collapse / hide this region by default?
" \"file_path\": file_path,\n",
" \"size\": chunk[\"size\"],\n",
" # \"node_types\": str([n[\"type\"] for n in chunk[\"nodes\"]]),\n",
" # \"node_names\": str([n[\"name\"] for n in chunk[\"nodes\"] if n[\"name\"]]),\n",
there is a bunch of commented out code here?
"name": "stdout",
"output_type": "stream",
"text": [
" metadata document distance\n",
Can we make this output more legible?
Description of changes
This PR adds an example notebook that demonstrates forking with GitHub repos and branches. It builds a custom chunker that uses tree-sitter to produce embeddable chunks of code, and a GitHub helper that pulls file info and tracks diffs between branches. The notebook then uses Chroma Cloud to index a GitHub repo into a collection. For a given branch, it forks the collection, computes the diff between the original branch and the new branch, deletes the chunks in the fork that belong to the changed files, and then rechunks and reinserts those files.
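The fork-then-patch flow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: `InMemoryCollection` is a hypothetical stand-in for a collection (it is not the Chroma client API), `chunk_by_blank_lines` is a toy replacement for the tree-sitter chunker, and `changed_files` stands in for whatever the GitHub helper reports as the diff between branches.

```python
import copy
from typing import Callable, Dict, List, Optional


class InMemoryCollection:
    """Hypothetical stand-in for a collection; NOT the Chroma client API."""

    def __init__(self, name: str, records: Optional[Dict[str, dict]] = None):
        self.name = name
        self.records: Dict[str, dict] = records or {}

    def add(self, ids: List[str], documents: List[str], metadatas: List[dict]) -> None:
        for id_, doc, meta in zip(ids, documents, metadatas):
            self.records[id_] = {"document": doc, "metadata": meta}

    def delete_file(self, file_path: str) -> None:
        # Drop every chunk whose metadata points at the given file.
        self.records = {
            id_: rec
            for id_, rec in self.records.items()
            if rec["metadata"]["file_path"] != file_path
        }

    def fork(self, new_name: str) -> "InMemoryCollection":
        # Deep copy so later edits to the fork leave the source untouched.
        return InMemoryCollection(new_name, copy.deepcopy(self.records))


def chunk_by_blank_lines(source: str) -> List[str]:
    # Toy chunker (assumption): the notebook uses tree-sitter instead.
    return [part.strip() for part in source.split("\n\n") if part.strip()]


def index_file(
    collection: InMemoryCollection,
    file_path: str,
    source: str,
    chunker: Callable[[str], List[str]] = chunk_by_blank_lines,
) -> None:
    chunks = chunker(source)
    collection.add(
        ids=[f"{file_path}:{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"file_path": file_path, "size": len(c)} for c in chunks],
    )


def fork_for_branch(
    base: InMemoryCollection,
    branch: str,
    changed_files: Dict[str, Optional[str]],
) -> InMemoryCollection:
    """changed_files maps path -> new contents, or None if the file was deleted."""
    forked = base.fork(f"{base.name}-{branch}")
    for file_path, new_source in changed_files.items():
        # Remove stale chunks for the file, then rechunk and reinsert if it still exists.
        forked.delete_file(file_path)
        if new_source is not None:
            index_file(forked, file_path, new_source)
    return forked


main = InMemoryCollection("repo-main")
index_file(main, "app.py", "def a():\n    return 1\n\ndef b():\n    return 2")
index_file(main, "util.py", "def helper():\n    pass")

demo = fork_for_branch(
    main,
    "demo",
    {"app.py": "def a():\n    return 10", "util.py": None},
)
```

Only the files that differ between branches are rechunked, so the cost of indexing a new branch scales with the size of the diff rather than the size of the repo, and the base collection stays unchanged.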
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?