feat(fast_seqfunc)✨: Add support for custom alphabets and integer sequences #5

Merged: 10 commits, Mar 26, 2025

ericmjl (Owner) commented Mar 26, 2025

  • Introduced the Alphabet class for handling custom sequence alphabets.
  • Enhanced OneHotEmbedder to support custom alphabets.
  • Added synthetic data generation for integer sequences.
  • Implemented tests for new features and functionalities.
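
To make the first two bullets concrete, here is a minimal sketch of how an alphabet abstraction can feed a one-hot encoder. It is not the fast_seqfunc implementation: the `Alphabet` class below, its `delimiter` argument, and the `one_hot` helper are hypothetical names chosen for illustration, and the code uses plain Python lists rather than arrays.

```python
from typing import List, Optional, Sequence


class Alphabet:
    """Hypothetical alphabet mapping tokens to integer indices.

    Illustrative only; not the fast_seqfunc implementation.
    """

    def __init__(self, tokens: Sequence[str], delimiter: Optional[str] = None):
        # Preserve order while dropping duplicate tokens.
        self.tokens: List[str] = list(dict.fromkeys(tokens))
        self.delimiter = delimiter
        self.token_to_idx = {tok: i for i, tok in enumerate(self.tokens)}

    @property
    def size(self) -> int:
        return len(self.tokens)

    def tokenize(self, sequence: str) -> List[str]:
        # With a delimiter, "0,12,3" splits into ["0", "12", "3"];
        # without one, every character is its own token.
        if self.delimiter is not None:
            return sequence.split(self.delimiter)
        return list(sequence)

    def encode(self, sequence: str) -> List[int]:
        return [self.token_to_idx[tok] for tok in self.tokenize(sequence)]

    def decode(self, indices: Sequence[int]) -> str:
        joiner = self.delimiter if self.delimiter is not None else ""
        return joiner.join(self.tokens[i] for i in indices)


def one_hot(sequence: str, alphabet: Alphabet) -> List[List[int]]:
    """Return one row per token, with a single 1 in the token's column."""
    rows = []
    for idx in alphabet.encode(sequence):
        row = [0] * alphabet.size
        row[idx] = 1
        rows.append(row)
    return rows


# Character-level alphabet and a comma-delimited integer alphabet.
dna = Alphabet("ACGT")
ints = Alphabet([str(i) for i in range(20)], delimiter=",")
```

The delimiter is what lets multi-digit integers such as `12` count as a single token; without it, `"12"` would be read as the two tokens `1` and `2`.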

ericmjl added 10 commits March 26, 2025 14:07
…uences

- Introduced the Alphabet class for handling custom sequence alphabets.
- Enhanced OneHotEmbedder to support custom alphabets.
- Added synthetic data generation for integer sequences.
- Implemented tests for new features and functionalities.
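
For readers unfamiliar with synthetic sequence-function data, the sketch below shows one way integer sequences with a known target could be generated. The function name, its parameters, and the additive target are assumptions made for illustration; the commit does not state how the project's generator works.

```python
import random
from typing import List, Tuple


def generate_integer_sequences(
    n_sequences: int,
    length: int,
    max_value: int,
    delimiter: str = ",",
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Return (sequence, target) pairs with a simple additive target."""
    rng = random.Random(seed)
    # One random weight per position; the target is a weighted sum of the values.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    data = []
    for _ in range(n_sequences):
        values = [rng.randint(0, max_value) for _ in range(length)]
        target = sum(w * v for w, v in zip(weights, values))
        data.append((delimiter.join(str(v) for v in values), target))
    return data


dataset = generate_integer_sequences(n_sequences=5, length=4, max_value=9)
```

Seeding a local `random.Random` keeps the dataset reproducible without touching the global random state.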
…eling

- Introduced a new example script demonstrating sequence-function modeling with mixed amino acids.
- The script includes data generation, model training, and prediction functionalities.
- Provides visualization and evaluation of model performance.
…dling and compatibility

- Introduced getter and setter for the alphabet property to enhance encapsulation.
- Updated tests to align with the new alphabet handling logic.
- Refactored test cases to dynamically calculate expected dimensions.
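
The getter/setter pattern described in this commit can be sketched as follows. The `Embedder` class and its attribute names are hypothetical and stand in for whatever the project actually uses; the point is only that routing assignment through a setter lets derived state be rebuilt, so tests can compute expected dimensions from the alphabet rather than hard-coding them.

```python
from typing import Dict, List, Optional


class Embedder:
    """Hypothetical embedder; names are illustrative, not fast_seqfunc's API."""

    def __init__(self, alphabet: Optional[List[str]] = None):
        self._alphabet: Optional[List[str]] = None
        self._token_to_idx: Dict[str, int] = {}
        if alphabet is not None:
            self.alphabet = alphabet  # goes through the setter below

    @property
    def alphabet(self) -> Optional[List[str]]:
        return self._alphabet

    @alphabet.setter
    def alphabet(self, tokens: List[str]) -> None:
        if len(set(tokens)) != len(tokens):
            raise ValueError("alphabet tokens must be unique")
        self._alphabet = list(tokens)
        # Rebuild the lookup so it never goes stale after reassignment.
        self._token_to_idx = {tok: i for i, tok in enumerate(self._alphabet)}

    @property
    def embedding_width(self) -> int:
        # Derived from the current alphabet instead of being hard-coded.
        return len(self._token_to_idx)


embedder = Embedder(list("ACGT"))
width_before = embedder.embedding_width
embedder.alphabet = list("ACGT-")
width_after = embedder.embedding_width
```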
…lass

- Updated tests to dynamically verify alphabet size and token mappings.
- Improved parameterized tests for encoding and decoding sequences.
- Added checks for sequence padding and truncation scenarios.
- Adjusted expected embedding shapes in tests to account for the updated token type count.
- Modified assertions to validate the new token type structure, including gap values and characters.
- Ensured all test cases align with the revised alphabet size and token handling.
- Updated the GitHub Actions workflow to include a test matrix for fast and slow tests.
- Introduced a new pytest marker for slow tests in the configuration.
- Added slow test markers to relevant test cases in the test suite.
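
The padding and truncation scenarios mentioned above can be illustrated with a small helper. This is a sketch under assumptions: the function name is hypothetical, `-` is assumed as the gap character, and padding is assumed to happen on the right; the commit messages do not specify any of these.

```python
from typing import List


def pad_or_truncate(tokens: List[str], max_length: int, gap: str = "-") -> List[str]:
    """Force a token list to exactly max_length, padding on the right with gap."""
    if len(tokens) >= max_length:
        # Too long (or exact): keep only the first max_length tokens.
        return tokens[:max_length]
    # Too short: append gap tokens until the target length is reached.
    return tokens + [gap] * (max_length - len(tokens))


padded = pad_or_truncate(list("ACG"), 5)
truncated = pad_or_truncate(list("ACGTAC"), 4)
```

If the gap token is part of the alphabet, it adds one column to a one-hot embedding, which is one reason expected embedding shapes in tests can change when token handling is revised.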
…els directly

- Updated the prediction function to directly use scikit-learn models if available.
- Simplified the test cases to focus on embedding and serialization without PyCaret dependencies.
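
A rough sketch of what "use the model directly" can look like: if the object exposes a scikit-learn-style `predict(X)` method, call it on the embedded features. The `predict` function, `RowSumModel`, and `length_embed` below are hypothetical stand-ins so the example runs without scikit-learn or PyCaret installed; they are not taken from core.py.

```python
from typing import Any, Callable, List, Sequence


def predict(
    model: Any,
    embed: Callable[[Sequence[str]], List[List[float]]],
    sequences: Sequence[str],
) -> List[float]:
    """Embed sequences, then call the model's own predict method."""
    features = embed(sequences)
    # scikit-learn estimators expose predict(X); use it directly when present.
    if hasattr(model, "predict"):
        return list(model.predict(features))
    raise TypeError(f"{type(model).__name__} has no predict method")


class RowSumModel:
    """Stand-in with a scikit-learn-style predict(X) method."""

    def predict(self, X: List[List[float]]) -> List[float]:
        return [sum(row) for row in X]


def length_embed(sequences: Sequence[str]) -> List[List[float]]:
    # Toy embedding: one feature per sequence (its length).
    return [[float(len(s))] for s in sequences]


preds = predict(RowSumModel(), length_embed, ["ACG", "AC"])
```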
…ction logic for improved maintainability

- Unified test execution logic in CI workflows to reduce redundancy.
- Simplified prediction logic in core.py by consolidating conditional branches.
…flow

- Removed manual test output capturing and exit code handling.
- Utilized pytest's cache for counting test results.
- Streamlined the test summary creation process.
- Updated the test result parsing logic to extract counts from pytest output.
- Replaced direct cache file parsing with output analysis for better reliability.
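
Extracting counts from pytest output, as the last two bullets describe, can be sketched with a regular expression over the final summary line. The function name and sample output are illustrative assumptions, and the workflow itself may do this in shell rather than Python. The sketch relies on pytest's default summary line, which is wrapped in `=` characters and ends with the run time.

```python
import re
from typing import Dict

_COUNT = re.compile(
    r"(\d+) (passed|failed|skipped|deselected|xfailed|xpassed|errors?|warnings?)"
)


def parse_pytest_summary(output: str) -> Dict[str, int]:
    """Extract outcome counts from the last summary line of pytest output."""
    counts: Dict[str, int] = {}
    for line in reversed(output.splitlines()):
        # The summary line is wrapped in '=' and contains "in <time>s".
        if line.startswith("=") and " in " in line:
            for number, outcome in _COUNT.findall(line):
                counts[outcome] = int(number)
            break
    return counts


sample_output = (
    "collected 15 items\n"
    "tests/test_example.py ....F..........\n"
    "========= 1 failed, 12 passed, 2 skipped in 3.45s ========="
)
summary = parse_pytest_summary(sample_output)
```

Outcomes that do not occur are absent from the result, so callers should read counts with `summary.get("failed", 0)` rather than indexing directly.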
@ericmjl ericmjl merged commit 3bb4822 into main Mar 26, 2025
6 checks passed