feat(project)✨: Enhance Fast-SeqFunc with CLI, embedding, and model functionalities #1

Merged: ericmjl merged 17 commits into main on Mar 25, 2025

@ericmjl (Owner) commented on Mar 23, 2025

  • Added CLI commands for training, predicting, and comparing embeddings.
  • Implemented core functionalities for sequence embedding and model training.
  • Updated README with detailed usage instructions and examples.

ericmjl added 17 commits March 22, 2025 23:46
…unctionalities

- Added CLI commands for training, predicting, and comparing embeddings.
- Implemented core functionalities for sequence embedding and model training.
- Updated README with detailed usage instructions and examples.
- Updated the environment name from 'testing' to 'tests' in the pr-tests.yaml file.
- Ensured the workflow aligns with the expected configuration for the setup-pixi action.
…sign details

- Added detailed API reference to the documentation.
- Included a comprehensive design document outlining the architecture and components.
- Updated the index page with an overview and quickstart guide.
…ile hash.

- Fixed the formatting of the dependencies list in pyproject.toml.
- Updated the hash in pixi.lock to reflect the changes.
…eling

- Introduced a new example script for demonstrating basic usage of the fast-seqfunc library.
- Enhanced the pre-commit configuration to exclude specific files from checks.
- Updated the core library to include additional functionality for model evaluation and saving.
- Added a notebook for interactive exploration of sequence-function modeling.
- Introduced API reference detailing core functions and classes.
- Added a tutorial for sequence classification tasks.
- Updated the index with an overview and roadmap.
- Included a quickstart guide for new users.
… items

- Introduced a new 'roadmap.md' file to the documentation.
- Outlined current and future development goals for the project.
- Included details on features like custom alphabets, auto-inferred alphabets, and ONNX model integration.
…ated implementation

- Removed tests for deprecated parameters and methods.
- Updated test cases to reflect changes in the OneHotEmbedder class.
- Renamed test methods to match updated functionality.
- Introduced a comprehensive design document outlining the implementation of custom alphabets in fast-seqfunc.
- Detailed the creation of an Alphabet class to handle tokenization and mapping for various sequence types.
- Provided examples and integration strategies for using the new functionality in existing workflows.
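As a rough illustration of what an alphabet abstraction for tokenization and mapping might look like, here is a minimal sketch. The class name `SketchAlphabet` and its `encode`/`decode` methods are hypothetical and are not taken from the fast-seqfunc code; it assumes single-character tokens only.

```python
from typing import Dict, List


class SketchAlphabet:
    """Hypothetical token-to-index mapping; not the fast-seqfunc Alphabet API."""

    def __init__(self, tokens: str) -> None:
        # Preserve first-seen order and drop duplicate characters.
        self.tokens: List[str] = list(dict.fromkeys(tokens))
        self.token_to_idx: Dict[str, int] = {
            tok: i for i, tok in enumerate(self.tokens)
        }

    def encode(self, sequence: str) -> List[int]:
        # Fail loudly on characters outside the alphabet.
        unknown = sorted(set(sequence) - set(self.tokens))
        if unknown:
            raise ValueError(f"Characters not in alphabet: {unknown}")
        return [self.token_to_idx[ch] for ch in sequence]

    def decode(self, indices: List[int]) -> str:
        # Inverse of encode: map indices back to characters.
        return "".join(self.tokens[i] for i in indices)


dna = SketchAlphabet("ACGT")
encoded = dna.encode("GATTACA")
decoded = dna.decode(encoded)
```

The same class would serve protein or RNA sequences by passing a different token string, which is presumably the kind of reuse a custom-alphabet design aims for.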
…n capabilities

- Implemented synthetic data generation functions for various tasks.
- Added visualization and model training scripts for synthetic datasets.
- Enhanced CLI and test coverage for new synthetic data features.
…dding and custom gap characters

- Implemented padding for sequences of different lengths with configurable gap characters.
- Updated the Alphabet and OneHotEmbedder classes to handle padding and truncation.
- Enhanced tests to cover new padding functionality and edge cases.
- Updated the `predict` function to include an optional confidence score output.
- Modified the CLI to use the new `save_model` function for saving models.
- Enhanced type annotations and documentation for improved clarity.
- Updated the predict_cmd function to utilize model_info for predictions.
- Modified the compare_embeddings function to extract and use model components from model_info.
- Replaced direct model usage with evaluate_model for test data evaluation.
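The padding with a configurable gap character mentioned earlier in this list can be sketched as follows. This is an assumption-laden illustration in plain Python: the function names `pad_sequences` and `one_hot`, and the `gap_character` and `max_length` parameters, are hypothetical and may not match the actual `Alphabet` or `OneHotEmbedder` signatures.

```python
from typing import List, Optional


def pad_sequences(
    sequences: List[str],
    gap_character: str = "-",
    max_length: Optional[int] = None,
) -> List[str]:
    """Right-pad (or truncate) sequences to a common length. Hypothetical helper."""
    if len(gap_character) != 1:
        raise ValueError("gap_character must be a single character")
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    # Truncate sequences longer than target, pad shorter ones with the gap character.
    return [s[:target].ljust(target, gap_character) for s in sequences]


def one_hot(sequence: str, alphabet: str) -> List[int]:
    """Flattened one-hot encoding: one block of len(alphabet) entries per position."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    flat = [0] * (len(sequence) * len(alphabet))
    for pos, ch in enumerate(sequence):
        flat[pos * len(alphabet) + index[ch]] = 1
    return flat


padded = pad_sequences(["ACG", "A", "ACGTT"], gap_character="-", max_length=4)
# The gap character is treated as one more symbol in the alphabet.
vectors = [one_hot(seq, "ACGT-") for seq in padded]
```

Treating the gap as an ordinary alphabet symbol keeps every encoded vector the same width, which is what a downstream tabular model needs.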
…th outputs

- Updated the transform method to return variable-length outputs when padding is disabled.
- Modified the fit_transform method to align with the updated transform method.
- Removed an unused test case from the synthetic data tests.
- Updated the CLI options to remove 'multi-class' as a valid model type.
- Adjusted related test cases to reflect the updated model type options.
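The variable-length behavior described for the transform method above could look roughly like this. The function `transform_sketch` and its `pad` flag are hypothetical stand-ins, not the real `OneHotEmbedder.transform` signature; the sketch only shows how disabling padding leaves encoded outputs with differing lengths.

```python
from typing import List


def transform_sketch(
    sequences: List[str],
    alphabet: str,
    pad: bool = True,
    gap_character: str = "-",
) -> List[List[int]]:
    """Flattened one-hot encoding, with optional padding. Hypothetical helper."""
    # Make sure the gap character has its own slot in the encoding.
    symbols = alphabet if gap_character in alphabet else alphabet + gap_character
    index = {ch: i for i, ch in enumerate(symbols)}
    if pad:
        width = max(len(s) for s in sequences)
        sequences = [s.ljust(width, gap_character) for s in sequences]
    encoded = []
    for seq in sequences:
        flat = [0] * (len(seq) * len(symbols))
        for pos, ch in enumerate(seq):
            flat[pos * len(symbols) + index[ch]] = 1
        encoded.append(flat)
    return encoded


ragged = transform_sketch(["ACG", "A"], "ACGT", pad=False)
uniform = transform_sketch(["ACG", "A"], "ACGT", pad=True)
ragged_lengths = [len(v) for v in ragged]
uniform_lengths = [len(v) for v in uniform]
```

With padding disabled the result is a list of vectors of different lengths rather than a rectangular matrix, so callers would need to handle it as ragged data.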
@ericmjl merged commit 7fabe4e into main on Mar 25, 2025
5 checks passed