feat(project)✨: Enhance Fast-SeqFunc with CLI, embedding, and model functionalities #1
Merged
ericmjl (Owner) commented on Mar 23, 2025
- Added CLI commands for training, predicting, and comparing embeddings.
- Implemented core functionalities for sequence embedding and model training.
- Updated README with detailed usage instructions and examples.
- Updated the environment name from 'testing' to 'tests' in the pr-tests.yaml file.
- Ensured the workflow aligns with the expected configuration for the setup-pixi action.

…sign details
- Added detailed API reference to the documentation.
- Included a comprehensive design document outlining the architecture and components.
- Updated the index page with an overview and quickstart guide.

…ile hash.
- Fixed the formatting of the dependencies list in pyproject.toml.
- Updated the hash in pixi.lock to reflect the changes.

…eling
- Introduced a new example script for demonstrating basic usage of the fast-seqfunc library.
- Enhanced the pre-commit configuration to exclude specific files from checks.
- Updated the core library to include additional functionality for model evaluation and saving.
- Added a notebook for interactive exploration of sequence-function modeling.

- Introduced API reference detailing core functions and classes.
- Added a tutorial for sequence classification tasks.
- Updated the index with an overview and roadmap.
- Included a quickstart guide for new users.

… items
- Introduced a new 'roadmap.md' file to the documentation.
- Outlined current and future development goals for the project.
- Included details on features like custom alphabets, auto-inferred alphabets, and ONNX model integration.

…ated implementation
- Removed tests for deprecated parameters and methods.
- Updated test cases to reflect changes in the OneHotEmbedder class.
- Renamed test methods to match updated functionality.

- Introduced a comprehensive design document outlining the implementation of custom alphabets in fast-seqfunc.
- Detailed the creation of an Alphabet class to handle tokenization and mapping for various sequence types.
- Provided examples and integration strategies for using the new functionality in existing workflows.
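
To make the tokenization-and-mapping idea concrete, here is a minimal sketch of what such an alphabet could look like. It is not the fast-seqfunc `Alphabet` class: the name `SimpleAlphabet`, its constructor arguments, and its methods are assumptions made for illustration, and it only handles single-character tokens.

```python
from typing import Dict, List, Sequence


class SimpleAlphabet:
    """Hypothetical stand-in for an alphabet class (not the library's API)."""

    def __init__(self, tokens: Sequence[str], gap: str = "-") -> None:
        # Preserve the given order, drop duplicates, and append the gap token.
        ordered = list(dict.fromkeys(list(tokens) + [gap]))
        self.tokens: List[str] = ordered
        self.gap = gap
        self.token_to_idx: Dict[str, int] = {t: i for i, t in enumerate(ordered)}

    def tokenize(self, sequence: str) -> List[str]:
        # Assumption: one character per token (DNA, RNA, protein letters).
        return list(sequence)

    def encode(self, sequence: str) -> List[int]:
        # Map each token to its integer index; unknown tokens raise KeyError.
        return [self.token_to_idx[t] for t in self.tokenize(sequence)]

    def decode(self, indices: Sequence[int]) -> str:
        # Inverse of encode for single-character tokens.
        return "".join(self.tokens[i] for i in indices)


dna = SimpleAlphabet("ACGT")
encoded = dna.encode("GATTACA")
print(encoded)
print(dna.decode(encoded))
```

Multi-character tokens (for example modified residues) would need a different `tokenize` step, such as splitting on a delimiter; that is left out of this sketch.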
…n capabilities
- Implemented synthetic data generation functions for various tasks.
- Added visualization and model training scripts for synthetic datasets.
- Enhanced CLI and test coverage for new synthetic data features.
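
As an illustration of the kind of synthetic task such generators can produce, the sketch below builds random DNA sequences labelled with their GC fraction. The function name `generate_gc_dataset` and its parameters are hypothetical and are not taken from the fast-seqfunc code; the actual tasks in the PR may differ.

```python
import random
from typing import List, Tuple


def generate_gc_dataset(
    n: int, length: int, seed: int = 0
) -> List[Tuple[str, float]]:
    """Return n (sequence, gc_fraction) pairs of fixed length (illustrative only)."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    data = []
    for _ in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        # The regression target is a known function of the sequence,
        # so a model's fit can be checked against ground truth.
        gc = (seq.count("G") + seq.count("C")) / length
        data.append((seq, gc))
    return data


dataset = generate_gc_dataset(n=5, length=20, seed=42)
for seq, target in dataset:
    print(seq, round(target, 2))
```

Because the target is computed directly from the sequence, a dataset like this is useful for smoke-testing a training pipeline: a model that cannot recover GC content from one-hot features points to a bug rather than a hard problem.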
…dding and custom gap characters
- Implemented padding for sequences of different lengths with configurable gap characters.
- Updated the Alphabet and OneHotEmbedder classes to handle padding and truncation.
- Enhanced tests to cover new padding functionality and edge cases.
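
The padding and truncation behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the `OneHotEmbedder` implementation: the helper names `pad_sequences` and `one_hot`, the `max_length` parameter, and the flattened output layout are all hypothetical.

```python
from typing import List, Optional


def pad_sequences(
    sequences: List[str], max_length: Optional[int] = None, gap: str = "-"
) -> List[str]:
    """Truncate or right-pad every sequence to a common length."""
    # Default to the longest sequence when no explicit length is given.
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    # str.ljust requires a single-character fill, so gap must be one character.
    return [s[:target].ljust(target, gap) for s in sequences]


def one_hot(sequence: str, alphabet: str) -> List[int]:
    """Flattened one-hot vector of length len(sequence) * len(alphabet)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    flat = [0] * (len(sequence) * len(alphabet))
    for pos, ch in enumerate(sequence):
        flat[pos * len(alphabet) + index[ch]] = 1
    return flat


padded = pad_sequences(["ACG", "A", "ACGTT"], max_length=4)
print(padded)
print(one_hot("A-", "ACGT-"))
```

Treating the gap character as an ordinary alphabet token, as done here, keeps every padded position encodable; an alternative design would encode gaps as all-zero rows.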
- Updated the `predict` function to include an optional confidence score output.
- Modified the CLI to use the new `save_model` function for saving models.
- Enhanced type annotations and documentation for improved clarity.

- Updated the predict_cmd function to utilize model_info for predictions.
- Modified the compare_embeddings function to extract and use model components from model_info.
- Replaced direct model usage with evaluate_model for test data evaluation.

…th outputs
- Updated the transform method to return variable-length outputs when padding is disabled.
- Modified the fit_transform method to align with the updated transform method.
- Removed an unused test case from the synthetic data tests.
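
The difference between padded and unpadded output can be shown with a small sketch. The `transform` function below is a hypothetical free function with an assumed `pad` flag; the real method's signature and return type are not shown in this PR text.

```python
from typing import List


def transform(
    sequences: List[str], alphabet: str = "ACGT-", pad: bool = True, gap: str = "-"
) -> List[List[int]]:
    """One-hot encode sequences; rows share a length only when pad=True."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    if pad:
        # Right-pad to the longest sequence so every row has the same width.
        width = max(len(s) for s in sequences)
        sequences = [s.ljust(width, gap) for s in sequences]
    encoded = []
    for s in sequences:
        vec = [0] * (len(s) * len(alphabet))
        for pos, ch in enumerate(s):
            vec[pos * len(alphabet) + index[ch]] = 1
        encoded.append(vec)
    return encoded


with_padding = transform(["ACG", "A"], pad=True)
without_padding = transform(["ACG", "A"], pad=False)
print([len(v) for v in with_padding])
print([len(v) for v in without_padding])
```

With padding disabled the result is ragged, so it cannot be stacked into a single rectangular array; downstream code has to handle each row separately or pad later.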
- Updated the CLI options to remove 'multi-class' as a valid model type.
- Adjusted related test cases to reflect the updated model type options.