Add JUnit 5 support and initial unit test for Tokenizer #26

ayush0407 · 2025-06-15T08:34:47Z

This PR adds initial testing support to the project by integrating JUnit 5 and creating the first unit test for the Tokenizer class.

Changes:

Added JUnit 5 dependency in pom.xml
Created TokenizerTest.java in src/test/java/llama
Added a simple test to validate the tokenization of a basic prompt

This sets the foundation for adding more tests across the codebase.

CLAassistant · 2025-06-15T08:34:52Z

All committers have signed the CLA.

Copilot

Pull Request Overview

This PR integrates JUnit 5 into the project and adds the first unit tests for the Tokenizer class to establish a testing foundation.

Added JUnit 5 dependency in pom.xml
Created TokenizerTest.java with basic tokenization tests

Reviewed Changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

File	Description
pom.xml	Added JUnit 5 dependency for unit testing support
src/test/java/com/example/tokenizer/impl/TokenizerTest.java	Introduced initial test class covering core tokenizer methods

Comments suppressed due to low confidence (1)

src/test/java/com/example/tokenizer/impl/TokenizerTest.java:39

[nitpick] Add tests for edge cases (e.g., empty or null inputs) to encodeOrdinary to ensure the tokenizer handles them without errors.

void testEncodeOrdinary() {
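
A minimal sketch of what such edge-case checks could look like. It is dependency-free so it runs on its own; in the PR these checks would be JUnit 5 assertions inside TokenizerTest. The class `EdgeCaseSketch` and its toy `encodeOrdinary` (one id per code point) are hypothetical stand-ins, and the contract shown (empty input yields no tokens, null is rejected with a NullPointerException) is an assumption, not the project's documented behavior.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in, NOT the project's tokenizer: one token id per code point.
final class EdgeCaseSketch {

    // Assumed contract: null is rejected explicitly, empty text yields no tokens.
    static List<Integer> encodeOrdinary(String text) {
        Objects.requireNonNull(text, "text must not be null");
        return text.codePoints().boxed().collect(Collectors.toList());
    }

    // Returns true when a null input fails with the assumed exception type.
    static boolean rejectsNull() {
        try {
            encodeOrdinary(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Empty input: expect an empty token list rather than an error.
        System.out.println("empty input ok: " + encodeOrdinary("").isEmpty());
        // Null input: expect a well-defined failure rather than undefined behavior.
        System.out.println("null rejected: " + rejectsNull());
    }
}
```

Whichever behavior the real tokenizer chooses for null, the point of the test is to pin that choice down so a later refactoring cannot change it silently.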


mikepapadim

Thank you for your contribution @ayush0407

I have a few comments:

The changes to the pom file are missing.
We need instructions on how to run the tests.
We need to wait for #17 ([model] Add support for Mistral models) to be merged first, since it refactors the Tokenizer class to support multiple tokenizers.

ayush0407 · 2025-06-16T09:42:58Z

Sure, I forgot to push the pom file with the JUnit dependency.

ayush0407 · 2025-06-16T09:46:47Z

I'll fix the rest by EOD today. Thanks for the suggestion.

ayush0407 · 2025-06-16T17:15:11Z

Hi @mikepapadim, Review changes completed.

mikepapadim · 2025-06-18T11:55:10Z

Hello @ayush0407, the PR is still missing instructions on how to run the tests and the expected output, as well as a test for the Mistral tokenizer. Could you add these? Thanks.

ayush0407 · 2025-06-19T07:26:59Z

sure.

ayush0407 · 2025-06-19T13:41:46Z

Hi @mikepapadim, please review the updated PR.

mikepapadim · 2025-06-19T13:44:55Z

Thank you @ayush0407

orionpapadakis

Nice work, @ayush0407! 👍

To make the JUnit support more complete and meaningful, I suggest the following improvements:

Add unit tests for LlamaTokenizer methods.
Include more targeted checks, prioritizing the encode and decode methods for both LlamaTokenizer and MistralTokenizer.

This will help ensure correctness and consistency across tokenizer implementations.
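
One way to sketch the suggested encode/decode coverage is a round-trip check that is applied to every implementation through a shared interface. Everything below is a hypothetical stand-in: `SketchTokenizer` and the two toy implementations are invented for illustration, and the real LlamaTokenizer and MistralTokenizer signatures and constructors are not shown in this thread, so they may differ. The sketch is dependency-free; in the PR the `roundTrips` check would become a JUnit 5 assertion.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class RoundTripSketch {

    // Hypothetical minimal contract; the project's Tokenizer interface may differ.
    interface SketchTokenizer {
        List<Integer> encode(String text);
        String decode(List<Integer> tokens);
    }

    // Toy implementation A: one token per Unicode code point.
    static final class CodePointTokenizer implements SketchTokenizer {
        public List<Integer> encode(String text) {
            return text.codePoints().boxed().collect(Collectors.toList());
        }
        public String decode(List<Integer> tokens) {
            StringBuilder sb = new StringBuilder();
            for (int id : tokens) {
                sb.appendCodePoint(id);
            }
            return sb.toString();
        }
    }

    // Toy implementation B: one token per UTF-8 byte.
    static final class Utf8ByteTokenizer implements SketchTokenizer {
        public List<Integer> encode(String text) {
            List<Integer> ids = new ArrayList<>();
            for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
                ids.add(b & 0xFF); // map the signed byte to 0..255
            }
            return ids;
        }
        public String decode(List<Integer> tokens) {
            byte[] bytes = new byte[tokens.size()];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) tokens.get(i).intValue();
            }
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // The shared check: decoding the encoded text must reproduce the input.
    static boolean roundTrips(SketchTokenizer tokenizer, String text) {
        return tokenizer.decode(tokenizer.encode(text)).equals(text);
    }

    public static void main(String[] args) {
        String prompt = "Hello, world!";
        System.out.println(roundTrips(new CodePointTokenizer(), prompt));
        System.out.println(roundTrips(new Utf8ByteTokenizer(), prompt));
    }
}
```

Running the same check against each implementation is what gives the consistency guarantee: the two tokenizers produce different ids, but both must satisfy the same contract.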

orionpapadakis · 2025-06-19T13:51:42Z

src/test/java/com/example/tokenizer/impl/MistralTokenizerTest.java

+        assertNotNull(tokens);
+        assertFalse(tokens.isEmpty());


I would expect a check against the expected result, rather than only checking that the tokens are non-null and non-empty.
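
To illustrate why the weaker check is not enough, here is a small dependency-free sketch. The vocabulary, the toy `encode`, and the deliberately broken `brokenEncode` are all invented for this example and have nothing to do with the real Mistral vocabulary. In JUnit 5 the exact check would typically be written with `assertEquals(expected, tokens)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ExpectedResultSketch {

    // Hypothetical vocabulary, invented for this example only.
    static final Map<String, Integer> VOCAB = new HashMap<>();
    static {
        VOCAB.put("hello", 1);
        VOCAB.put("world", 2);
    }

    // Toy encoder: split on whitespace, unknown words map to id 0.
    static List<Integer> encode(String text) {
        List<Integer> ids = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            ids.add(VOCAB.getOrDefault(word, 0));
        }
        return ids;
    }

    // Deliberately broken encoder: always emits a single unknown id.
    static List<Integer> brokenEncode(String text) {
        return Arrays.asList(0);
    }

    // Weak check, as in the reviewed test: only non-null and non-empty.
    static boolean weakCheck(List<Integer> tokens) {
        return tokens != null && !tokens.isEmpty();
    }

    // Stronger check: compare against the exact expected ids.
    static boolean exactCheck(List<Integer> tokens, List<Integer> expected) {
        return expected.equals(tokens);
    }

    public static void main(String[] args) {
        List<Integer> expected = Arrays.asList(1, 2);
        // The broken encoder still passes the weak check...
        System.out.println(weakCheck(brokenEncode("hello world")));            // true
        // ...but fails the comparison against the expected ids.
        System.out.println(exactCheck(brokenEncode("hello world"), expected)); // false
        System.out.println(exactCheck(encode("hello world"), expected));       // true
    }
}
```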

orionpapadakis · 2025-06-19T13:54:52Z

src/test/java/com/example/tokenizer/impl/MistralTokenizerTest.java

+    @Test
+    void testRegexPatternReturnsNull() {
+        assertNull(tokenizer.regexPattern());
+    }


regexPattern() should not return null

orionpapadakis · 2025-06-19T14:04:42Z

src/test/java/com/example/tokenizer/impl/TokenizerTest.java

This TokenizerTest covers the static replaceControlCharacters() methods, which are implemented in the Tokenizer interface, not in LlamaTokenizer. For this JUnit support to be meaningful, I suggest adding coverage for LlamaTokenizer as well.

test: add JUnit 5 support and initial Tokenizer test

ea7a3a2

ayush0407 mentioned this pull request Jun 15, 2025

Add Junit5 and a set of initial tests #14

Open

test: add JUnit 5 support and initial Tokenizer test

a883978

mikepapadim requested review from Copilot, orionpapadakis and mikepapadim June 16, 2025 07:53

Copilot AI reviewed Jun 16, 2025

mikepapadim reviewed Jun 16, 2025

mikepapadim assigned ayush0407 Jun 16, 2025

mikepapadim added the testing label Jun 16, 2025

test: Code review changes done.

69dee5f

ayush0407 added 2 commits June 19, 2025 17:36

Merge branch 'main' of https://github.com/ayush0407/GPULlama3.java into add-junit5-tokenizer-test

6b36927

test: Code review changes done.

08addb6

orionpapadakis requested changes Jun 19, 2025
