Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NC | Online Upgrade Process #8326

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

romayalon
Copy link
Contributor

@romayalon romayalon commented Sep 3, 2024

Explain the changes

High Level -
This PR is responsible for the core logic of the online upgrade of NooBaa NC, according to the design the online upgrade algorithm is as follows -

1. Backup config dir
2. Iterate hosts, upgrade RPM and restart service
    2.1. restart updates system.json host's source code version
4. Backup config dir
5. noobaa-cli upgrade start --expected_version 5.18.0 
    4.1. check should_upgrade()
    4.2. verification -
           4.2.1. Running CLI's host's package.json version is the same as the expected version
           4.2.2. All hosts in system.json were upgraded to the host's version.
    4.3. start - update system.json that phase = locked, in_progress_upgrade details
    4.4. run upgrade scripts
    4.5. finish - update system.json phase unlocked, in_progress_upgrade → upgrade_history

Details -

  1. CLI changes -
    1.1. upgrade.js - Instead of throwing NotImplemented we now call nc_upgrade_manager.upgrade_config_dir() that executes step 4 of the upgrade algorithm.
    1.2. added new flags for upgrade start - --skip_verification, --expected_version, --custom_upgrade_scripts_dir.
    1.3. Updated help, constants and validations files of the cli to support the new flags.

  2. Service changes -

  • nsfs.js - on init_nc_system() if config_directory information is missing, update the system.json with the hosts config directory information.

  • noobaa.service - remove RPM upgrade from the PreExec, moved it to nsfs.js.

  • ConfigFS -
    a. Added this.config_dir_version = '1.0.0'.
    b. Added a new function called - throw_if_config_dir_locked() that compares the config_dir version on system.json and this.config_dir_version, and if they do not match - throw a BAD REQUEST error. this function is called from all config_fs functions that do updates to the config directory, like create/update/delete_account_config_file(), create/update/delete_bucket_config_file().
    c. Added system.json related functions to config_fs, like - create/update_system_config_file() etc.

  • Upgrade Manager -
    a. Created 2 new files - upgrade_utils.js and nc_upgrade_manager.js.
    b. upgrade_utils - moved utils functions that are being used by both upgrade_manager.js and nc_upgrade_manager.js.
    c. nc_upgrade_manager - created update_rpm_upgrade() and upgrade_config_dir(), config_directory_defaults(), _verify_config_dir_upgrade(), _update_config_dir_upgrade_start(), _run_nc_upgrade_scripts(), _update_config_dir_upgrade_finish() functions that implement step 4 of the online upgrade algorithm.

Issues: Fixed #xxx / Gap #xxx

  1. Docs - will be added at the end.
  2. more integration tests
  3. --backup-config-dir-completed TODO?

Testing Instructions:

  1. sudo jest --testRegex=jest_tests/test_nc_upgrade_manager
  • Doc added/updated
  • Tests added

@nimrod-becker nimrod-becker mentioned this pull request Sep 4, 2024
2 tasks
@romayalon romayalon force-pushed the romy-online-upgrade-process branch 3 times, most recently from 4321d6a to 62d12f5 Compare September 8, 2024 08:30
@dannyzaken
Copy link
Contributor

  1. CLI changes -
    1.1. upgrade.js - Instead of throwing NotImplemented we now call nc_upgrade_manager.upgrade_config_dir() that executes step 4 of the upgrade algorithm.
    1.2. added new flags for upgrade start - --force, --expected_version, --custom_upgrade_scripts_dir.

Consider changing the --force flag to something more specific like --skip-verification

@dannyzaken
Copy link
Contributor

dannyzaken commented Sep 10, 2024

  1. noobaa-cli upgrade start --expected_version 5.18.0
    4.1. check should_upgrade()
    4.2. verification -
    4.2.1. Running CLI's host's package.json version is the same as the expected version
    4.2.2. All hosts in system.json were upgraded to the host's version.
    4.3. start - update system.json that phase = locked, in_progress_upgrade details
    4.4. run upgrade scripts

not specific to this PR, but the upgrade scripts must be reentrant in case the upgrade is restarted for some reason

src/cmd/nsfs.js Outdated
}
}
});
await config_fs.create_system_config_file(JSON.stringify(updated_system_json));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably be an update operation and not create

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is probably also relevant in this case, since all hosts will attempt to update the config dir concurrently

src/sdk/config_fs.js Outdated Show resolved Hide resolved
async _throw_if_config_dir_locked() {
const system_data = await this.get_system_config_file({silent_if_missing: true});
if (!system_data) return;
if (this.config_dir_version !== system_data.config_directory.config_dir_version) {
Copy link
Contributor

@dannyzaken dannyzaken Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is also related to the meaning of the config_dir_version. It should probably be defined better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used version_compare, is this what you meant?

const this_upgrade = { timestamp: Date.now(), from_version, to_version };
system_data[hostname].current_version = to_version;
system_data[hostname]?.upgrade_history?.successful_upgrades.unshift(this_upgrade);
await config_fs.update_system_json_with_retries(JSON.stringify(system_data));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update issue is relevant here too

@dannyzaken
Copy link
Contributor

1. Backup config dir
2. Iterate hosts, upgrade RPM and restart service
    2.1. restart updates system.json host's source code version
4. Backup config dir
5. noobaa-cli upgrade start --expected_version 5.18.0 
    4.1. check should_upgrade()
    4.2. verification -
           4.2.1. Running CLI's host's package.json version is the same as the expected version
           4.2.2. All hosts in system.json were upgraded to the host's version.
    4.3. start - update system.json that phase = locked, in_progress_upgrade details
    4.4. run upgrade scripts
    4.5. finish - update system.json phase unlocked, in_progress_upgrade → upgrade_history

regarding the verification step 4.2.2 - we discussed in previous meetings that we might want to prevent older nodes from running on the data, instead of being cluster aware. one option we raised was to move the config dir to a new location.

@romayalon romayalon force-pushed the romy-online-upgrade-process branch 6 times, most recently from 9622d3d to 1a3514e Compare October 14, 2024 14:30
@romayalon romayalon merged commit c9876e6 into noobaa:master Oct 14, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants