
Conversation

@OwenDavisBC commented on Oct 17, 2025:

Fixes #510: Gemini thoughts not correctly accumulated when streaming enabled.

@OwenDavisBC marked this pull request as ready for review on October 17, 2025 16:21
@OwenDavisBC force-pushed the ISSUE-510 branch 2 times, most recently from bbf6d0c to 33ac74f on October 20, 2025 17:01
@Poggecci self-requested a review on October 21, 2025 17:15
@Poggecci (Contributor) left a comment:

Hey Owen. Thank you for the contribution!

This PR seems to both resolve the thoughts accumulation issue and change what content we send in our requests to the model. Is there a reason you've coupled these two changes? The original rationale behind stripping thoughts was a naive form of context management (although the relevance of this depends entirely on how the GenAI SDK handles the thoughts we provide), but since we've had this behavior since release, a change to it merits its own PR.

Happy to approve if just the accumulation is kept in this PR or if we have some further discussion on the matter.
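
For context on what "stripping thoughts" means here: each part of a content entry can be flagged as model thought output, and stripping removes those flagged parts before the request is sent. Below is a minimal, self-contained sketch of that idea; SimplePart and SimpleContent are hypothetical stand-ins for the GenAI SDK's Part and Content types, not code from this repository.

import java.util.List;
import java.util.stream.Collectors;

public class StripThoughtsSketch {

  // Stand-in for a Part: some text plus a flag saying whether it is model "thought" output.
  record SimplePart(String text, boolean thought) {}

  // Stand-in for a Content: a role plus its parts.
  record SimpleContent(String role, List<SimplePart> parts) {}

  // Drop every thought part from every content entry. A content whose parts were all
  // thoughts is left with an empty part list, which is the situation discussed below.
  static List<SimpleContent> stripThoughts(List<SimpleContent> contents) {
    return contents.stream()
        .map(c -> new SimpleContent(
            c.role(),
            c.parts().stream().filter(p -> !p.thought()).collect(Collectors.toList())))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<SimpleContent> history = List.of(
        new SimpleContent("user", List.of(new SimplePart("What is 2 + 2?", false))),
        new SimpleContent("model", List.of(
            new SimplePart("Let me add the numbers.", true),  // thought-only part
            new SimplePart("2 + 2 = 4", false))));
    System.out.println(stripThoughts(history));
  }
}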

    } else {
      if (accumulatedThoughtText.length() > 0
          && GeminiUtil.shouldEmitAccumulatedText(currentProcessedLlmResponse)) {
        LlmResponse aggregatedTextResponse =
@Poggecci (inline review comment): aggregatedThoughtResponse here?
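
For readers following along, here is a rough sketch of the accumulation pattern the snippet above implements, reduced to plain strings: thought text from streamed chunks is buffered and flushed as one aggregated response once non-thought output arrives. The Chunk type and the flush condition are simplified assumptions, not the PR's actual LlmResponse handling.

import java.util.ArrayList;
import java.util.List;

public class ThoughtAccumulationSketch {

  // Simplified stand-in for a streamed partial response.
  record Chunk(String text, boolean thought) {}

  static List<String> process(List<Chunk> chunks) {
    List<String> emitted = new ArrayList<>();
    StringBuilder accumulatedThoughtText = new StringBuilder();
    for (Chunk chunk : chunks) {
      if (chunk.thought()) {
        accumulatedThoughtText.append(chunk.text());  // keep buffering thought text
      } else {
        if (accumulatedThoughtText.length() > 0) {
          // flush the buffered thoughts as a single aggregated response
          emitted.add("[thought] " + accumulatedThoughtText);
          accumulatedThoughtText.setLength(0);
        }
        emitted.add(chunk.text());
      }
    }
    if (accumulatedThoughtText.length() > 0) {
      emitted.add("[thought] " + accumulatedThoughtText);  // flush trailing thoughts at stream end
    }
    return emitted;
  }

  public static void main(String[] args) {
    System.out.println(process(List.of(
        new Chunk("Thinking about", true),
        new Chunk(" the answer...", true),
        new Chunk("The answer is 4.", false))));
  }
}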

@OwenDavisBC (Author) replied:

@Poggecci thank you for taking a look. The issue with stripping thoughts is that it will ultimately lead to empty parts being sent to the Gemini API now that we are accumulating thought-only parts. I can add some more processing to remove those from the LLM request here:

  public Flowable<LlmResponse> generateContent(LlmRequest llmRequest, boolean stream) {
    llmRequest = GeminiUtil.prepareGenenerateContentRequest(llmRequest, !apiClient.vertexAI());
    GenerateContentConfig config = llmRequest.config().orElse(null);
    String effectiveModelName = llmRequest.model().orElse(model());
    logger.trace("Request Contents: {}", llmRequest.contents());
    logger.trace("Request Config: {}", config);
    if (stream) {
      logger.debug("Sending streaming generateContent request to model {}", effectiveModelName);
      CompletableFuture<ResponseStream<GenerateContentResponse>> streamFuture =
          apiClient.async.models.generateContentStream(
              effectiveModelName, llmRequest.contents(), config);
      return Flowable.defer(
          () ->
              processRawResponses(
                  Flowable.fromFuture(streamFuture).flatMapIterable(iterable -> iterable)));
    } else {
      logger.debug("Sending generateContent request to model {}", effectiveModelName);
      return Flowable.fromFuture(
          apiClient
              .async
              .models
              .generateContent(effectiveModelName, llmRequest.contents(), config)
              .thenApplyAsync(LlmResponse::create));
    }
  }
However, that would not match what I see in adk-python.
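
As a sketch of the extra processing mentioned above (again with hypothetical stand-in types rather than the SDK's Content and Part), the idea would be to drop any content entry whose part list ends up empty after thought parts are stripped, so that no empty parts reach the API:

import java.util.List;
import java.util.stream.Collectors;

public class DropEmptyContentsSketch {

  record SimplePart(String text, boolean thought) {}
  record SimpleContent(String role, List<SimplePart> parts) {}

  // Remove content entries that no longer carry any parts after thought stripping.
  static List<SimpleContent> dropEmptyContents(List<SimpleContent> contents) {
    return contents.stream()
        .filter(c -> !c.parts().isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<SimpleContent> afterStripping = List.of(
        new SimpleContent("user", List.of(new SimplePart("What is 2 + 2?", false))),
        new SimpleContent("model", List.of()));  // all parts were thoughts and got stripped
    System.out.println(dropEmptyContents(afterStripping));
  }
}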

In adk-python, if I look at the pre-processing that happens before the generate_content_stream call (https://github.com/google/adk-python/blob/4a842c5a1334c3ee01406f796651299589fe12ab/src/google/adk/models/google_llm.py#L149-L154):

    if stream:
      responses = await self.api_client.aio.models.generate_content_stream(
          model=llm_request.model,
          contents=llm_request.contents,
          config=llm_request.config,
      )

I see nothing removing thoughts in _preprocess_request (https://github.com/google/adk-python/blob/4a842c5a1334c3ee01406f796651299589fe12ab/src/google/adk/models/google_llm.py#L300-L326):

  async def _preprocess_request(self, llm_request: LlmRequest) -> None:

    if self._api_backend == GoogleLLMVariant.GEMINI_API:
      # Using API key from Google AI Studio to call model doesn't support labels.
      if llm_request.config:
        llm_request.config.labels = None

      if llm_request.contents:
        for content in llm_request.contents:
          if not content.parts:
            continue
          for part in content.parts:
            # Create copies to avoid mutating the original objects
            if part.inline_data:
              part.inline_data = copy.copy(part.inline_data)
              _remove_display_name_if_present(part.inline_data)
            if part.file_data:
              part.file_data = copy.copy(part.file_data)
              _remove_display_name_if_present(part.file_data)

    # Initialize config if needed
    if llm_request.config and llm_request.config.tools:
      # Check if computer use is configured
      for tool in llm_request.config.tools:
        if isinstance(tool, types.Tool) and tool.computer_use:
          llm_request.config.system_instruction = None
          await self._adapt_computer_use_tool(llm_request)

@OwenDavisBC force-pushed the ISSUE-510 branch 7 times, most recently from 516d136 to a3f0560 on October 27, 2025 17:12
@OwenDavisBC requested a review from @Poggecci on October 28, 2025 21:55
@OwenDavisBC force-pushed the ISSUE-510 branch 4 times, most recently from c1754f1 to eb26d88 on November 4, 2025 16:28