Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-36297: Ability to backup selected partitions only #3891

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

slawomir-pryczek
Copy link

@slawomir-pryczek slawomir-pryczek commented Mar 13, 2025

Description

This code adds ability to backup selected partition(s) by mariadb-dump by using --partition-separator=@ and then specifying partition name in table's name, eg. mytable1@p0,p1. There should be no side effects, support for selecting partition(s) is disabled by default.
./mariadb-dump -umyuser -pmypass -h127.0.0.1 mydatabase _test1@p0,p1 _test2@p0 --partition-separator=@

Release Notes

Ability to backup specific partitions with mariadb-dump by using --partition-separator and specifying partition list in table's name

How can this PR be tested?

Won't affect mariadb-server, will try to write test case if that gets accepted.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@CLAassistant
Copy link

CLAassistant commented Mar 13, 2025

CLA assistant check
All committers have signed the CLA.

@slawomir-pryczek slawomir-pryczek marked this pull request as ready for review March 13, 2025 19:40
@slawomir-pryczek slawomir-pryczek marked this pull request as draft March 13, 2025 19:50
@slawomir-pryczek slawomir-pryczek marked this pull request as ready for review March 13, 2025 20:03
@svoj svoj added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 13, 2025
@kayofontinhas
Copy link

kayofontinhas commented Mar 14, 2025

IMHO I think that we can improve this command line arguments, maybe we can set one option to tell mysqldump that we will dump using partitions, and we can set a default separator and a optional parameter to set the separator.

maybe something like this

  1. Default case (@ as default separator)

./mariadb-dump -umyuser -pmypass -h127.0.0.1 mydatabase _test1@p0,p1 _test2@p0 --dump-using-partitions

  1. Improved (setting a optional separator)

./mariadb-dump -umyuser -pmypass -h127.0.0.1 mydatabase _test1$p0,p1 _test2$p0 --dump-using-partitions --partition-separator=$

@cvicentiu cvicentiu changed the title Ability to backup selected partitions only MDEV-36297: Ability to backup selected partitions only Mar 14, 2025
@cvicentiu
Copy link
Member

Hi @slawomir-pryczek!

Thank you for the contribution. Looking at the code briefly, it looks ok. I could not find a corresponding MDEV for this feature request so I created a new one, MDEV-36297. If there is one already, please let me know and I will mark this one as dupicate.

However, we need to get consensus whether this is the preferred approach for the CLI interface. My other thought is something along the lines of:

mariadb-dump -u<user> -p<pass> -h<host> database table1 --part p1,p2,p3 table2 --part p1,p2,p3

This would not require the use of a special string separator.

I also am thinking in how this interacts with --tables and --databases switches. With --databases there should be no partition specification support (either with my suggestion or yours).

@vaintroub @vuvova I'd like your opinion on the user experience. Do you think that the proposed approach in the PR is ok? Do you like my suggestion better or do you have an alternative suggestion?

Let's get consensus on the UX first, then I can review the code accordingly.

@cvicentiu
Copy link
Member

Another note is that Oracle separates partitions with :.

Perhaps writing tables like so:

table_1:part_1,part_2, without allowing a custom separator is more "familiar" across the industry?

@vaintroub
Copy link
Member

vaintroub commented Mar 14, 2025

My preference would be

  • no separator option
  • --partitions boolean option that applies to all databases and tables
  • individual partitions are backed up using "table1#p#part_name1", . "table1#p#part_nameN". That is, multiple entries for the same table, with fixed #p# used as separator ( or we can decide on# but once decision is made, stick to it, it is fixed separator). No new --parts position-dependent option.
  • This all must either work well with mariadb-import --dir/--tab , or else partitions should be rejected by mariadb-dump --dir/--tab
    (it would be nice of course to make use of --parallel, so it makes sense to make it work with "multi-file" output)

@vuvova
Copy link
Member

vuvova commented Mar 14, 2025

Make sure it works with all possible table names. If you use : as a separator — it shouldn't break if a table name includes :. If you're using # — ditto.

@cvicentiu
Copy link
Member

cvicentiu commented Mar 14, 2025

Thank you @vaintroub @vuvova for the comments.

I think @vaintroub suggestions make the most sense. I have a preference for : as opposed to #p# as the separator. @vaintroub are you strongly against the use of :, or see any other problems that : could cause? To me that reads much easier.

So I formalize the spec as:

  1. --partitions option to allow for specifying individual partitions in tables. Without it, table1:p1 is considered as a full table name.
  2. Define each partition separately, with full table name prefix:
    mariadb-dump --partitions db table1:p1 table1:p2 table1:p3 or
    mariadb-dump --partitions db table1#p#p1 table1#p#p2 table1#p#p3
  3. For the first iteration, to minimise scope, let's disallow --partitions with --dir and --tab options.
  4. Ensure mariadb-dump works with table names including the separator as part of their name. This needs to be covered by test cases.

@slawomir-pryczek Would you like to take a stab at implementing things this way?

@slawomir-pryczek
Copy link
Author

Hi Everyone, thanks for the feedback. @cvicentiu yes, i'll wait untill spec is completed and start implementing it.

As for the issues with backing up tables with separator in the name, we could escape colon by using double colon, if colon is picked for separator. That should be simple to use and don't break anything in "normal" mode while everything will work in partition mode.

As for doing each partition/table pair separately in command line, that should work. Will have to just backup all partitions at once, and put it in single file because of drop/create table blocks would make this kind of backp unusable and only single partition would be restored (if we have separated files for each partition). If we just backup each selected partition at once and keep it in single place all should be fine.

So maybe this way will work if we have colon in the name for partition mode some::table::name:partition1. So we say in partition mode all colons in table names should be replaced with double colon...

@vaintroub
Copy link
Member

@cvicentiu : The ":" separator is fine on the command line . But I still prefer '#' , because it can be used in filenames, and ':' not necessarily .
Let's imagine, mariadb-dump --dir is supported, and you'd now have to have a file-per-partition, and you'd need to name this file and pass it into SELECT INTO OUTFILE. Can you name it "table:part.txt"? Not on Windows. Can you name it "table#part.txt" . Yes, everywhere. If you stick with colon on the command line, you'd have to invent another naming scheme for files on disk. IF you used "#" or "#p#", you would not have to.
So while I do not have a very strong opinion, I lean into other separator that is not listed as problematic character in https://en.wikipedia.org/wiki/Filename#Problematic_characters

@slawomir-pryczek
Copy link
Author

Yes that's true, so we can maybe have # as partition separator and ## for escaping "#" char in table names. If that sounds good i'll start implementing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.
Development

Successfully merging this pull request may close these issues.

7 participants