-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update indexing notebook #73
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look at the changes. Imo, we still need to define which components we are transforming into lightweight components and which we keep as reusable.
I would keep the chunking component and the write to Weaviate as reusable. Any arguments for why we should implement these as custom components?
The aggregate_eval_results
and text_cleaning
seem to be good examples for custom components. I would only convert them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mrchtr!
The not so nice thing about having these components in separate files, is that we still implement them as if they are reusable. We are still wrapping a not-so-nice interface around existing libraries, while I would just "hardcode" the arguments used into the components since they're lightweight. This would simplify the component a lot, but probably works better if the components are inline.
CC: @GeorgesLorre
@@ -1,3 +1,4 @@ | |||
fondant==0.9.0 | |||
fondant[component,aws,azure,gcp,docker]==0.10.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we might need this in the other use cases as well. Didn't run into them, but might be because I already had the dependencies in my environment. Will try to re-run with a clean one.
requirements.txt
Outdated
notebook==7.0.6 | ||
weaviate-client==3.25.3 | ||
langchain==0.0.329 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this here if it's in the extra_requires
of the component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we do have langchain as extra requirement. But at the moment we are not installing the extra requirements in the local environment. When I am in the notebook, and want to test the class implementation, I will get an exception that the dependency is missing. I think this is something we should takle in #830
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are not doing the same for eg. Ragas?
Since we are not actually executing the components in the notebook at this moment, I would leave them out of the requirements.txt.
Thx @mrchtr, looking good. But I do tend to agree here that is almost to nicely abstracted now, they are reusable components with a decorator on top now. I don't feel like we get any of the lightweight component benefits. I would really try to dumb them down by defining them inline (in the notebook). Removing optional functionality. Hardcoding most of the arguments and by doing so removing most of the docstrings. This should make them more compact and understandable at a glance. Let me know if this does not make sense. |
requirements.txt
Outdated
notebook==7.0.6 | ||
weaviate-client==3.25.3 | ||
langchain==0.0.329 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are not doing the same for eg. Ragas?
Since we are not actually executing the components in the notebook at this moment, I would leave them out of the requirements.txt.
35ae09f
to
0ccfe45
Compare
Alright, reduced the notebooks to a minimum. |
I'm a fan of having both in our use cases, the notebooks as explanation, and the python files to actually build on. On the other hand, I see the duplication we'd be introducing. WDYT @GeorgesLorre? |
Updated indexing notebook.
For now we're using the version from this PR to allow using resuable components in notebooks.
Future Todos:
\n
. Requires further investigation. (This PR)