Add JUnit 5 support and initial unit test for Tokenizer #26
Pull Request Overview
This PR integrates JUnit 5 into the project and adds the first unit tests for the Tokenizer
class to establish a testing foundation.
- Added JUnit 5 dependency in `pom.xml`
- Created `TokenizerTest.java` with basic tokenization tests
Reviewed Changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `pom.xml` | Added JUnit 5 dependency for unit testing support |
| `src/test/java/com/example/tokenizer/impl/TokenizerTest.java` | Introduced initial test class covering core tokenizer methods |
Comments suppressed due to low confidence (1)
src/test/java/com/example/tokenizer/impl/TokenizerTest.java:39
- [nitpick] Add tests for edge cases (e.g., empty or null inputs) to `encodeOrdinary` to ensure the tokenizer handles them without errors.
void testEncodeOrdinary() {
Thank you for your contribution @ayush0407
I have a few comments:
- the changes in the pom file are missing
- we need some instructions on how to run the tests
- we need to wait for #17 to be merged first, as it refactors the Tokenizer class somewhat to support multiple tokenizers
I'll fix the rest by EOD today. Thanks for the suggestion.
Hi @mikepapadim, the review changes are completed.
Hello @ayush0407, the PR is still missing instructions on how to run the tests and the expected output, as well as a test for the Mistral tokenizer. Could you add these? Thanks.
Sure.
…to add-junit5-tokenizer-test
Hi @mikepapadim, please review the updated PR.
Thank you @ayush0407
Nice work, @ayush0407! 👍
To make the JUnit support more complete and meaningful, I suggest the following improvements:
- Add unit tests for `LlamaTokenizer` methods.
- Include more targeted checks, prioritizing the `encode` and `decode` methods for both `LlamaTokenizer` and `MistralTokenizer`.
This will help ensure correctness and consistency across tokenizer implementations.
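One way to get consistent coverage across implementations is to run the same round-trip check (decode of encode returns the original text) against each tokenizer. The sketch below is only an illustration: `ToyTokenizer`, `CharCodeTokenizer`, and `OffsetTokenizer` are hypothetical stand-ins, not the project's `LlamaTokenizer` or `MistralTokenizer`, whose real method signatures may differ. It uses plain Java so it runs without the test dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration; the project's Tokenizer API may differ.
interface ToyTokenizer {
    List<Integer> encode(String text);
    String decode(List<Integer> tokens);
}

// Stand-in for one implementation: one token per character, id = char code.
final class CharCodeTokenizer implements ToyTokenizer {
    public List<Integer> encode(String text) {
        List<Integer> tokens = new ArrayList<>();
        for (char c : text.toCharArray()) {
            tokens.add((int) c);
        }
        return tokens;
    }

    public String decode(List<Integer> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int t : tokens) {
            sb.append((char) t);
        }
        return sb.toString();
    }
}

// Stand-in for a second implementation: same scheme, ids shifted by a fixed offset.
final class OffsetTokenizer implements ToyTokenizer {
    private static final int OFFSET = 1000;

    public List<Integer> encode(String text) {
        List<Integer> tokens = new ArrayList<>();
        for (char c : text.toCharArray()) {
            tokens.add(c + OFFSET);
        }
        return tokens;
    }

    public String decode(List<Integer> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int t : tokens) {
            sb.append((char) (t - OFFSET));
        }
        return sb.toString();
    }
}

final class RoundTripSketch {
    // Shared check applied to every implementation: decode(encode(x)) must equal x.
    static boolean roundTrips(ToyTokenizer tokenizer, String text) {
        return text.equals(tokenizer.decode(tokenizer.encode(text)));
    }

    public static void main(String[] args) {
        List<ToyTokenizer> all = List.of(new CharCodeTokenizer(), new OffsetTokenizer());
        for (ToyTokenizer t : all) {
            System.out.println(t.getClass().getSimpleName()
                    + " round-trips: " + roundTrips(t, "Hello, world!"));
        }
    }
}
```

In an actual JUnit 5 test class, the same idea would likely become one test method per implementation (or a parameterized test) that asserts `assertEquals(text, tokenizer.decode(tokenizer.encode(text)))`.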
assertNotNull(tokens);
assertFalse(tokens.isEmpty());
I would expect a check against the expected result rather than only checking that the result is not null and not empty.
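To illustrate the difference, here is a minimal sketch using a hypothetical toy tokenizer with an assumed two-word vocabulary (it is not the project's `Tokenizer`, and the ids are made up). A non-null, non-empty check accepts a wrong token sequence, while a comparison against the exact expected ids rejects it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy tokenizer used only for illustration; NOT the project's Tokenizer.
final class ExpectedResultSketch {

    // Assumed fixed vocabulary: each known word maps to one token id.
    static final Map<String, Integer> VOCAB = Map.of("hello", 1, "world", 2);
    static final int UNKNOWN = 0;

    // Splits on whitespace and looks each word up in the vocabulary.
    static List<Integer> encode(String text) {
        List<Integer> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(VOCAB.getOrDefault(word, UNKNOWN));
            }
        }
        return tokens;
    }

    // Weak check: passes for any non-empty output, even a wrong one.
    static boolean weakCheck(List<Integer> tokens) {
        return tokens != null && !tokens.isEmpty();
    }

    // Strong check: passes only when the exact expected ids come back in order.
    static boolean strongCheck(List<Integer> tokens, List<Integer> expected) {
        return expected.equals(tokens);
    }

    public static void main(String[] args) {
        List<Integer> expected = List.of(1, 2);
        List<Integer> actual = encode("hello world");
        List<Integer> wrong = List.of(2, 1); // simulates output from a buggy tokenizer

        System.out.println("weak check on wrong output:   " + weakCheck(wrong));
        System.out.println("strong check on wrong output: " + strongCheck(wrong, expected));
        System.out.println("strong check on real output:  " + strongCheck(actual, expected));
    }
}
```

In a JUnit 5 test the strong check would typically be written as `assertEquals(expected, tokens)`, with the expected ids taken from a known-good reference for the model's vocabulary.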
@Test
void testRegexPatternReturnsNull() {
    assertNull(tokenizer.regexPattern());
}
regexPattern() should not return null
This TokenizerTest covers the static `replaceControlCharacters()` methods, which are implemented in the Tokenizer interface, not in LlamaTokenizer. For this JUnit support to make sense, I would suggest adding coverage for LlamaTokenizer as well.
sure
This PR adds initial testing support to the project by integrating JUnit 5 and creating the first unit test for the `Tokenizer` class.
Changes:
- `pom.xml`
- `TokenizerTest.java` in `src/test/java/llama`
This sets the foundation for adding more tests across the codebase.