Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-31636 Memory leak in Sys_var_gtid_binlog_state::do_check() #3523

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

DaveGosselin-MariaDB
Copy link
Member

@DaveGosselin-MariaDB DaveGosselin-MariaDB commented Sep 17, 2024

Move memory allocations performed during Sys_var_gtid_binlog_state::do_check to Sys_var_gtid_binlog_state::global_update where they will be freed before the latter method returns.

@DaveGosselin-MariaDB DaveGosselin-MariaDB self-assigned this Sep 17, 2024
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

sql/sys_vars.cc Outdated
{
if (!var->save_result_exists)
return;
auto data= (gtid_binlog_state_data *)var->save_result.ptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptr is allocated in do_check() somewhat quite surprisingly 'cos it all looks to me this could be done inside
`var->update().

I'd consider that first. If turns true indeed, we naturally shall carefully check all existing class' do_check() (to my browsing there's none but ours replication's).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, though it logically makes sense for the actual validation of the input value to be in do_check(). Sys_var_gtid_slave_pos has a slightly different, yet still consistent to check semantics pattern, which in Sys_var_gtid_slave_pos::do_check() validates the value and saves the string on thd, and then uses that string in Sys_var_gtid_slave_pos::global_update() in a load. Can we follow that pattern here? (Though @DaveGosselin-MariaDB I recall in our discussion you mentioned trying to save something on the thread, and it not working. Is this what you tried?).

Generally speaking though, I' say whichever pattern we pick, Sys_var_gtid_binlog_state and Sys_var_gtid_slave_pos should probably follow the same pattern in the source code for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Brandon, I wasn't saving something on the thread but rather marking variables such that an on_error phase could be executed over all affected variables so that they may handle any cleanup in their own implementation-dependent way (see 90eab64 )

Copy link
Contributor

@andrelkin andrelkin Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnestere, a good reminder point Sys_var_gtid_slave_pos::do_check!
Indeed both variables need be consistent wrt do_check() actions.
They both allocate and are different in the style of allocation. Sys_var_gtid_slave_pos 's

if (!(var->save_result.string_value.str=
       thd->strmake(res->ptr(), res->length())))
 {
   my_error(ER_OUT_OF_RESOURCES, MYF(0));
   return true;
 }

is comparable
with a block the current patch relocated.

I suggest we do the same. res would be computed in Sys_var_gtid_slave_pos::global_update as defined here res= var->value->val_str(&str).
@DaveGosselin-MariaDB, your thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bnestere and @andrelkin! Sys_var_gtid_slave_pos::do_check leaks memory in the sense that the result of strmake will live on the THD memroot indefinitely in the case of a failure like the one in the attached test. Consequently, it seems that making Sys_var_gtid_binlog_state behave similarly won't fix the leak but will hide it instead. For the handle_* replication threads, the memroot's lifetime extends beyond the current query, correct? If so, we could potentially have leaks build up for the lifetime of the process, not just for the query's lifetime.
I think we could either (1) deliver my changes from 90eab64 or (2) keep the current changes and modify Sys_var_gtid_slave_pos::do_check to avoid leaking memory in the case of a failure by applying to it the same idea that we applied to Sys_var_gtid_binlog_state::do_check in the current commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great point, we probably shouldn't hide the leak either. I think the on_error() approach offers a good amount of flexibility. Though before we make a decision, I'll throw a third option into the mix: 3) add a MEM_ROOT to set_var so we can allocate within the context of one SET (and automatically free it in set_var::~set_var())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree (1) is more flexible. (2) is just simpler as so far it has been sufficient. I would pick (2) to hold on with (1) until any first incident that leaves no other, incl (1), choices is met.
To (3), I am reluctant. Indeed the THD::main_mem_root that is used here gets reset at the end of the statement (incl a SET one with a failing sub-SET).

I would propose (4) to keep (2) fool-proof with consting do_check( ..., const set_var *var) the 2nd arg. But I am not really proposing, as the signature change looks super-pervasive.
Also this measure of course would not block altering other objects (in replication uses case mysql_bin_log or rpl_slave_state could become victims of that). That's why I think such altering actions should be a part of global_update() :s rather than a part of (1) 's on_error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep (2) but without adding the const then the change as it is should be fine to merge because of my reasoning from #3523 (comment) . If that's the case, then let's deliver this change as it is. We could create a ticket to address broader architectural changes as a separate effort.

Copy link
Member Author

@DaveGosselin-MariaDB DaveGosselin-MariaDB Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I see I have an OK from Andrei in a separate comment on this PR, can I take that to mean that it's OK to push?). #3523 (review)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amen!

Copy link
Contributor

@andrelkin andrelkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dave for a good piece. It all seems to provide a solution. However let's look at the issue at a different angle I am suggesting in a note.

@DaveGosselin-MariaDB
Copy link
Member Author

Thanks Dave for a good piece. It all seems to provide a solution. However let's look at the issue at a different angle I am suggesting in a note.

Thank you Andrei 😄 I will investigate the other angle.

Copy link
Contributor

@andrelkin andrelkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I am thinking though about Sys_var_gtid_slave_pos::do_check() that @bnestere mentions.

@montywi
Copy link
Contributor

montywi commented Sep 23, 2024 via email

Move memory allocations performed during Sys_var_gtid_binlog_state::do_check
to Sys_var_gtid_binlog_state::global_update where they will be freed before
the latter method returns.
@DaveGosselin-MariaDB DaveGosselin-MariaDB merged commit f7c5182 into 10.5 Sep 25, 2024
8 of 11 checks passed
@DaveGosselin-MariaDB DaveGosselin-MariaDB deleted the 10.5-mdev-31636 branch September 25, 2024 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants