Fix Issue #211: Improved Embedding Performance by Handling Base64 Encoding #303
yoshioterada wants to merge 2 commits into openai:main
First commit to fix Issue openai#211
This commit includes the fix described in Issue openai#211.
* Addressed the issue where Base64 encoding could not be handled.
* Improved performance by using Base64 encoding by default.
Dear @TomerAberbach-san, if you have time, could you please review my PR?
Some high-level thoughts:
@JsonDeserialize(using = EmbeddingValueDeserializer::class)
class EmbeddingValue(
    var base64Embedding: Optional<String> = Optional.empty(),
    floatEmbedding: Optional<MutableList<Double>> = Optional.empty(),
I suggest replacing all the Optional type declarations in this class with Kotlin's nullable types, which are built into the language and are more idiomatic and efficient.
E.g.
var base64Embedding: String? = null,
...
@essien and @TomerAberbach, thank you so much. I will update the code.
@yoshioterada, just an FYI that @essien is not a maintainer on this repo.
I don't agree with his suggestion, because this library is meant to be consumed by Java users (hence OpenAI Java), so we should expose optional fields as Optional instead of as nullable types.
Thanks for the feedback, @TomerAberbach. I understand the concern about maintaining Java compatibility. However, I noticed that this repo already uses Kotlin's nullable types quite extensively (e.g., in HttpRequest, SseMessage, etc.). Given that context, my suggestion aligns with current conventions in the repo and avoids introducing Optional, which is redundant and less idiomatic in Kotlin.
For now, I will close this pull request and create a new one.
Overview
This commit includes the fix described in Issue #211.
Detail
This pull request introduces several changes to the Embedding class and related components in the openai-java-core package. The primary goal is to enhance the handling of embedding vectors by supporting both float lists and Base64-encoded strings. The most important changes include the introduction of the EmbeddingValue class, modifications to the Embedding class to use EmbeddingValue, and updates to the deserialization logic.
Enhancements to embedding handling:
* openai-java-core/src/main/kotlin/com/openai/models/embeddings/Embedding.kt: Modified the Embedding class to use EmbeddingValue instead of List<Double> for embedding vectors. This includes changes to the constructor, builder, and relevant methods.
Introduction of EmbeddingValue class:
* openai-java-core/src/main/kotlin/com/openai/models/embeddings/EmbeddingValue.kt: Added a new class EmbeddingValue to represent embedding vectors, which can be either a list of floats or a Base64-encoded string. This class includes methods for converting between these representations.
Deserialization improvements:
* openai-java-core/src/main/kotlin/com/openai/models/embeddings/EmbeddingValueDeserializer.kt: Introduced a custom deserializer EmbeddingValueDeserializer to handle the deserialization of EmbeddingValue objects from JSON, supporting both float arrays and Base64 strings.
Default encoding format:
* openai-java-core/src/main/kotlin/com/openai/models/embeddings/EmbeddingCreateParams.kt: Set the default EncodingFormat to BASE64 for performance improvements.
Test updates:
* openai-java-core/src/test/kotlin/com/openai/models/embeddings/CreateEmbeddingResponseTest.kt and openai-java-core/src/test/kotlin/com/openai/models/embeddings/EmbeddingTest.kt: Updated test cases to accommodate the changes in the Embedding class and the introduction of EmbeddingValue.
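To illustrate the kind of conversion that EmbeddingValue is described as performing between a Base64 string and a list of floats, here is a minimal standalone sketch. It is not the PR's code: the class name Base64EmbeddingSketch and its methods are hypothetical, and it assumes the Base64 payload is a packed array of little-endian 32-bit floats.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical helper for illustration only; not part of the PR or of openai-java.
public final class Base64EmbeddingSketch {
    private Base64EmbeddingSketch() {}

    // Assumption: the payload is a packed array of little-endian 32-bit floats.
    public static List<Double> decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("Payload length is not a multiple of 4 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        List<Double> values = new ArrayList<>(bytes.length / Float.BYTES);
        while (buffer.hasRemaining()) {
            // Each 4-byte group is read as a float and widened to double.
            values.add((double) buffer.getFloat());
        }
        return values;
    }

    // Inverse direction: narrows each double to a float, so precision beyond 32 bits is lost.
    public static String encode(List<Double> values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.size() * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (double value : values) {
            buffer.putFloat((float) value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }
}
```

Under these assumptions, the Base64 form carries 4 bytes per dimension (about 5.3 characters after encoding), which is typically more compact than a JSON array of decimal numbers. A round trip through encode and decode returns the original values only when they are exactly representable as 32-bit floats.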
With this PR, calling code looks like the following Java code.