-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
If, immediately after starting Cube, I run a query that triggers a JOIN and uses a segment with a FILTER_PARAMS, future queries using that segment will also have that JOIN even if they don't need it.
If I restart Cube then immediately perform a query with that segment that does not need a JOIN, future queries using that segment will behave correctly (those that need a JOIN have it, those that don't need one don't).
Note:
- If the query uses filters instead of a segment, it seem to work as expected.
- If the segment does not use
FILTER_PARAMS, it works as expected.
To Reproduce
Note
Make sure to run the queries quickly after starting Cube, as the queries triggered in the background by the scheduler can influence the results.
- Start Cube.js
- Send a first query:
{ "dimensions": ["Tickets.ticketId", "Messages.ticketId"], "segments": ["Tickets.closedTickets"], "measures": ["Tickets.ticketCount"] } - There is a
LEFT JOINbetweenTicketsandMessagesas expected. - Send a second query:
{ "dimensions": ["Tickets.ticketId"], "segments": ["Tickets.closedTickets"], "measures": ["Tickets.ticketCount"] } - There is still get a
LEFT JOINwhich is unnecessary and gives worse performance.
Expected behavior
The second query should not have the LEFT JOIN, which is the case if the queries are run in the opposite order.
Minimally reproducible Cube Schema
Note
A full reproducible example repository can be found here: https://github.com/Spark-NF/cubejs-additional-join
It contains the reproduction steps, a docker compose file, this cube config, as well as the outputs of the queries.
cube("Tickets", {
sql: `
SELECT
ticket_id AS ticketId,
max(ticket_closed_datetime) AS closedDatetime,
FROM data_tickets
GROUP BY ticketId
`,
segments: {
closedTickets: {
sql: `${FILTER_PARAMS.Tickets.closedDatetime.filter(`closedDatetime`)}`,
},
},
measures: {
ticketCount: {
type: "count",
sql: `${CUBE}.ticketId`,
},
},
dimensions: {
ticketId: {
sql: `${CUBE}.ticketId`,
type: "number",
primaryKey: true,
shown: true,
},
},
joins: {
Messages: {
relationship: "one_to_one",
sql: `${CUBE}.ticketId = ${Messages}.ticketId`,
},
},
});
cube("Messages", {
sql: `
SELECT
ticket_id AS ticketId,
count(ticket_message_id) AS messagesCount,
FROM data_messages
GROUP BY ticketId
`,
dimensions: {
ticketId: {
sql: `${CUBE}.ticketId`,
type: "number",
primaryKey: true,
shown: true,
},
},
});Version:
Starting 0.35.80, still present on 1.3.62
Additional context
- The issue seem to be present since fix(schema-compiler): propagate FILTER_PARAMS from view to inner cube's SELECT #8466
- Adding
this.collectAllMemberNames();in theBaseQueryconstructor just before building the join hints seem to solve the issue - When the queries are run in the "wrong" order, the cache key
['collectFrom', 'collectJoinHintsFromMembers', 'Tickets.closedTickets']will contain['Tickets', 'Tickets', 'Tickets', 'Tickets', 'Tickets', 'Messages', 'Messages'](for both queries), causing the join in future queries using that segment - When the queries are run in the "correct" order, the cache key
['collectFrom', 'collectJoinHintsFromMembers', 'Tickets.closedTickets']will contain['Tickets', 'Tickets', 'Tickets', 'Tickets', 'Tickets'](for both queries)