Conversation

@pulpdrew (Contributor) commented Oct 2, 2025

Summary

Closes HDX-2310

This PR implements chunking of chart queries to improve performance of charts on large data sets and long time ranges. Recent data is loaded first, then older data is loaded one chunk at a time until the full chart date range has been queried.

Screen.Recording.2025-10-03.at.1.11.09.PM.mov

Performance Impacts

Expectations

This change is intended to improve performance in a few ways:

  1. Queries over long time ranges are now much less likely to time out, since the range is chunked into several smaller queries
  2. Average memory usage should decrease, since the total result size and number of rows being read are smaller
  3. Perceived latency of queries over long date ranges is likely to decrease, because charts begin rendering the most recent data as soon as the first chunk returns, rather than only after the entire date range has been queried. However, total latency to display results for the entire date range is likely to increase, since each additional chunk adds round-trip network latency.

Measured Results

Overall, the results match the expectations outlined above.

  • Total latency changed by between ~-4% and ~+25%
  • Average memory usage decreased by between 18% and 80%

Scenarios and data

In each of the following tests:

  1. Queries were run 5 times before starting to measure, to ensure the data was in the filesystem cache.
  2. Queries were then run 3 times. The results shown are the median result from the 3 runs.

Scenario: Log Search Histogram in Staging V2, 2 Day Range, No Filter

|          | Total Latency | Memory Usage (Avg) | Memory Usage (Max) | Chunk Count |
|----------|---------------|--------------------|--------------------|-------------|
| Original | 5.36          | 409.23 MiB         | 409.23 MiB         | 1           |
| Chunked  | 5.14          | 83.06 MiB          | 232.69 MiB         | 4           |

Scenario: Log Search Histogram in Staging V2, 14 Day Range, No Filter

|          | Total Latency | Memory Usage (Avg) | Memory Usage (Max) | Chunk Count |
|----------|---------------|--------------------|--------------------|-------------|
| Original | 26.56         | 383.63 MiB         | 383.63 MiB         | 1           |
| Chunked  | 33.08         | 130.00 MiB         | 241.21 MiB         | 16          |

Scenario: Chart Explorer Line Chart with p90 and p99 trace durations, Staging V2 Traces, Filtering for "GET" spans, 7 Day range

|          | Total Latency | Memory Usage (Avg) | Memory Usage (Max) | Chunk Count |
|----------|---------------|--------------------|--------------------|-------------|
| Original | 2.79          | 346.12 MiB         | 346.12 MiB         | 1           |
| Chunked  | 3.26          | 283.00 MiB         | 401.38 MiB         | 9           |

Implementation Notes

When is chunking used? Chunking is used when all of the following are true:
  1. granularity and timestampValueExpression are defined in the config. This ensures that the query is already being bucketed. Without bucketing, chunking would break aggregation queries, since groups can span multiple chunks.
  2. dateRange is defined in the config. Without a date range, we'd need an unbounded set of chunks or the start and end chunks would have to be unbounded at their start and end, respectively.
  3. The config is not a metrics query. Metrics queries have complex logic which we want to avoid breaking with the initial delivery of this feature.
  4. The consumer of useQueriedChartConfig does not pass the disableQueryChunking: true option. This option is provided to disable chunking when necessary.
How are time windows chosen?
  1. First, the windows are generated as in the existing search chunking feature (e.g., 6 hours back, 6 hours back, 12 hours back, 24 hours back, ...)
  2. Then, the start and end of each window is aligned to the start of a time bucket that depends on the "granularity" of the chart.
  3. The first and last windows are shortened or extended so that the combined date range of all of the windows matches the start and end of the original config.
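The three steps above can be sketched roughly as follows. This is a hypothetical illustration, not the actual implementation: the function and parameter names, the base window size, and the exact doubling progression are all assumptions.

```typescript
// Sketch only: names and the doubling progression are assumptions.
type Window = { start: number; end: number }; // ms since epoch

const alignDown = (t: number, granularityMs: number): number =>
  Math.floor(t / granularityMs) * granularityMs;

function makeChunkWindows(
  start: number,
  end: number,
  granularityMs: number,
  baseMs: number, // e.g. 6 hours
): Window[] {
  const windows: Window[] = [];
  let cursor = end; // walk backwards from the most recent edge
  let size = baseMs;
  while (cursor > start) {
    // Align each boundary down to a granularity bucket so that no
    // aggregation bucket is split across two chunks.
    let winStart = alignDown(cursor - size, granularityMs);
    if (winStart >= cursor) winStart = cursor - granularityMs; // guarantee progress
    // Clamp the oldest window to the original config's start.
    windows.push({ start: Math.max(winStart, start), end: cursor });
    cursor = Math.max(winStart, start);
    if (windows.length >= 2) size *= 2; // 6h, 6h, 12h, 24h, ...
  }
  return windows;
}
```

With a 2-day range, hourly granularity, and a 6-hour base, this sketch produces 4 contiguous windows (6h + 6h + 12h + 24h), consistent with the chunk count in the 2-day scenario above.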
Which order are the chunks queried in?

Chunks are queried sequentially, most-recent first, due to the expectation that more recent data is typically more important to the user. Unlike with useOffsetPaginatedSearch, we are not paginating the data beyond the chunks, and all data is typically displayed together, so there is no need to support "ascending" order.
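The sequential, most-recent-first behavior can be sketched as an async generator (hypothetical names; not the actual implementation): each chunk is fetched only after the previous one resolves, and a snapshot of the accumulated data is surfaced after every chunk so the chart can render progressively.

```typescript
// Sketch only: sequential most-recent-first chunk execution.
async function* streamChunks<R, T>(
  windows: R[], // ordered most-recent first
  fetchChunk: (range: R) => Promise<T[]>,
): AsyncGenerator<{ data: T[]; isComplete: boolean }> {
  let acc: T[] = [];
  for (let i = 0; i < windows.length; i++) {
    const rows = await fetchChunk(windows[i]); // one round trip per chunk
    acc = [...rows, ...acc]; // older chunks are prepended, keeping chronological order
    yield { data: acc, isComplete: i === windows.length - 1 };
  }
}
```

Prepending each (older) chunk mirrors the reducer shape in the review below: since chunks arrive most-recent first, the accumulated array stays in chronological order.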

Does this improve client-side caching behavior?

One theoretical way in which query chunking could improve performance is by enabling client-side caching of individual chunks, which could then be re-used if the same query is run over a longer time range.

Unfortunately, when using streamedQuery, react-query stores the entire time range as one item in the cache, so individual chunks or "pages" are not re-used by other queries.

We could accomplish this improvement by using useQueries instead of streamedQuery or useInfiniteQuery. In that case, we'd treat each chunk as its own query. This would require a number of changes:

  1. Our query key would have to include the chunk's window duration
  2. We'd need some hacky way of making the useQueries requests fire in sequence. This can be done using enabled but requires some additional state to figure out whether the previous query is done.
  3. We'd need to emulate the return value of a useQuery using the useQueries result, or update consumers.
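Point 2 could be sketched as a small pure helper (hypothetical; not part of this PR): derive each query's `enabled` flag from whether every earlier chunk has settled, so the useQueries requests fire one at a time.

```typescript
// Hypothetical helper for sequencing useQueries requests: chunk i is enabled
// only once every earlier chunk has settled (success or error).
function chunkEnabledFlags(settled: boolean[]): boolean[] {
  return settled.map((_, i) => settled.slice(0, i).every(Boolean));
}
```

Each flag would then be passed as `enabled` in the corresponding useQueries entry, with the extra "settled" state tracked alongside the query results.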


changeset-bot bot commented Oct 2, 2025

🦋 Changeset detected

Latest commit: d854c1f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages:

| Name          | Type  |
|---------------|-------|
| @hyperdx/app  | Patch |
| @hyperdx/api  | Patch |



vercel bot commented Oct 2, 2025

The latest updates on your projects.

| Project            | Deployment | Preview | Comments | Updated (UTC)        |
|--------------------|------------|---------|----------|----------------------|
| hyperdx-v2-oss-app | Ready      | Preview | Comment  | Oct 10, 2025 7:33pm  |


github-actions bot commented Oct 2, 2025

E2E Test Results

All tests passed • 25 passed • 3 skipped • 221s

| Status     | Count |
|------------|-------|
| ✅ Passed  | 25    |
| ❌ Failed  | 0     |
| ⚠️ Flaky   | 0     |
| ⏭️ Skipped | 3     |


"@microsoft/fetch-event-source": "^2.0.1",
"@tabler/icons-react": "^3.5.0",
"@tanstack/react-query": "^5.56.2",
"@tanstack/react-query": "^5.90.2",
@pulpdrew (Contributor, Author) commented:

Upgraded to pull in the new experimental streamedQuery function.

If we want to avoid relying on the experimental feature, this could be implemented more verbosely with useInfiniteQuery.

@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from e077e57 to 5620ca8 Compare October 3, 2025 15:09
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from a3b22e9 to 2d0e0b7 Compare October 3, 2025 20:01
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 2d0e0b7 to 6229d52 Compare October 6, 2025 18:40
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 6229d52 to 3e73289 Compare October 6, 2025 18:49
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 943ddb8 to 3e899c8 Compare October 6, 2025 19:17
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 3e899c8 to 76d2753 Compare October 6, 2025 20:48
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 76d2753 to ec976e5 Compare October 7, 2025 14:45
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from ec976e5 to ca69e95 Compare October 7, 2025 16:23
@pulpdrew pulpdrew changed the title Drew/paginated chart queries feat: Implement query chunking for charts Oct 7, 2025
@pulpdrew pulpdrew marked this pull request as ready for review October 7, 2025 17:03
@pulpdrew pulpdrew requested review from a team and dhable and removed request for a team October 7, 2025 17:06
@pulpdrew pulpdrew changed the title feat: Implement query chunking for charts perf: Implement query chunking for charts Oct 7, 2025
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from ca69e95 to 61afdb4 Compare October 9, 2025 15:34

claude bot commented Oct 9, 2025

PR Review: Query Chunking for Charts

Overview

This PR implements progressive query chunking for chart queries to improve performance on large datasets and long time ranges. The implementation is well-thought-out with comprehensive testing and clear documentation.

✅ Strengths

1. Excellent Test Coverage

  • 940 lines of comprehensive unit tests covering edge cases
  • Tests for alignment, window generation, chunking behavior, loading states
  • Good use of mocking and async testing patterns

2. Smart Performance Trade-offs

  • Clear documentation of performance expectations vs measured results
  • Recent data loads first (most relevant to users)
  • Prevents timeouts on large queries
  • Reduces memory usage significantly (18-80% reduction)

3. Safe Defaults

  • Only enables chunking when safe (has granularity, dateRange, timestampValueExpression)
  • Explicitly disables for metrics queries (avoiding complex edge cases)
  • Provides disableQueryChunking escape hatch

4. Good State Management

  • isComplete flag clearly indicates when all chunks are loaded
  • Proper distinction between isPending, isFetching, and isLoading
  • Uses TanStack Query's streamedQuery appropriately

🔍 Issues & Suggestions

1. Critical: Package Version Consistency ⚠️

// packages/app/package.json
"@tanstack/react-query": "^5.90.2",  // Updated
"@tanstack/react-query-devtools": "^5.56.2",  // Not updated

Issue: Mismatched versions between react-query and react-query-devtools could cause runtime issues.
Recommendation: Update devtools to match: "@tanstack/react-query-devtools": "^5.90.2"

2. Potential Bug: Infinite Loop Risk ⚠️

In DBTimeChart.tsx:223, the onError callback is called on every render when there's an error:

if (query.isError && options?.onError) {
  options.onError(query.error);
}

Issue: This runs on every render, not just when error state changes. Could cause performance issues or infinite loops if onError triggers a re-render.
Recommendation: Wrap the callback in a useEffect keyed on the error state. (Note: the per-query onError option was removed in TanStack Query v5, so the callback cannot simply be moved into the query options.)

useEffect(() => {
  if (query.isError && options?.onError) {
    options.onError(query.error);
  }
}, [query.isError, query.error, options]);

3. Edge Case: Empty Window Array

In getGranularityAlignedTimeWindows (line 105-106):

if (!windows.length || alignedStart < windows[windows.length - 1].dateRange[0]) {
  windows.push({...});
}

Potential Issue: The !windows.length check should ensure the first window is always added, but the logic could be clearer.
Recommendation: Consider simplifying:

const shouldAddWindow = windows.length === 0 || 
  alignedStart < windows[windows.length - 1].dateRange[0];
if (shouldAddWindow) {
  windows.push({...});
}

4. Type Safety: Union Discriminator

In useChartConfig.tsx:55-71, the shouldUseChunking type guard is good, but could be more explicit:

const shouldUseChunking = (
  config: ChartConfigWithOptDateRange,
): config is ChartConfigWithDateRange & {
  granularity: string;
} => {
  if (!isUsingGranularity(config)) return false;
  if (!config.dateRange) return false;
  if (isMetricChartConfig(config)) return false;
  return true;
};

Suggestion: Consider adding explicit type assertions where shouldUseChunking is used to help TypeScript narrow types more effectively.

5. Memory Consideration: Data Accumulation

The reducer in useQueriedChartConfig accumulates all chunks:

reducer: (acc, { chunk, isComplete }) => {
  return {
    data: [...(chunk.data || []), ...(acc?.data || [])],
    meta: chunk.meta,
    rows: (acc?.rows || 0) + (chunk.rows || 0),
    isComplete,
  };
},

Consideration: For very long time ranges with many chunks, this could still accumulate significant data. Consider documenting maximum expected chunks or adding a safeguard.
Recommendation: Add a comment about expected max chunks, or consider a warning if chunk count exceeds expectations.

6. Code Duplication in PatternTable.tsx

const isLoading = isTotalCountLoading || !isTotalCountComplete || isGroupedPatternsLoading;

Minor: The isTotalCountComplete check is added but the relationship between isTotalCountLoading and isTotalCountComplete could be clearer in documentation.

7. Missing Error Boundary

The chunking implementation doesn't have explicit error recovery if a chunk fails mid-stream.
Question: What happens if chunk 2 of 5 fails? Does the user see partial data or an error state?
Recommendation: Document the error behavior and consider adding retry logic per chunk or a "partial data available" indicator.

🎯 Recommendations

High Priority

  1. Fix package version mismatch for @tanstack/react-query-devtools
  2. Fix onError callback to prevent potential infinite loops
  3. Document error handling for failed chunks

Medium Priority

  1. Add JSDoc comments to getGranularityAlignedTimeWindows explaining the alignment logic
  2. Consider adding telemetry to track actual chunking performance in production
  3. Add integration test with failing chunk to verify error handling

Low Priority

  1. Consider extracting window generation constants to config
  2. Add performance budget documentation (max chunks, max total rows)

📊 Performance Analysis

The measured results align well with expectations:

  • ✅ Total latency impact acceptable (-4% to +25%)
  • ✅ Memory reduction significant (18-80%)
  • ✅ User experience improved (progressive loading)

The trade-off of slightly increased total latency for much better perceived performance is appropriate for this use case.

🏗️ Architecture

The implementation follows HyperDX patterns well:

  • ✅ Proper use of TanStack Query patterns
  • ✅ Separation of concerns (window generation, query execution, state management)
  • ✅ TypeScript type safety with proper guards
  • ✅ Follows existing hook patterns

📝 Documentation

  • ✅ Excellent PR description with detailed rationale
  • ✅ Implementation notes cover key decisions
  • ✅ Performance data included
  • ⚠️ Could benefit from inline JSDoc for complex functions

Overall Assessment

This is a well-designed feature with excellent testing. The main concerns are the package version mismatch and the onError callback pattern. Once those are addressed, this should be safe to merge.

Recommendation: Request changes for the two high-priority issues, then approve after fixes.


Review generated by Claude Code 🤖
