Node.js script tries to run mongodump (nonexistent in app container) #220

Open
eecavanna opened this issue Jul 2, 2024 · 7 comments
Labels: backlog (backlogged to be reprioritized in the future), bug (Something isn't working), jetstream2 ✈️ (Issue related to deploying NMDC EDGE to the Jetstream2 platform)

Comments

@eecavanna
Collaborator

eecavanna commented Jul 2, 2024

The web server application tries to run mongodump.

const cmd = `mongodump --db ${config.DATABASE.NAME} --out ${config.DATABASE.BACKUP_DIR}/db-backup_${dateStringWithTime}`;

The container in which the web app is running does not include mongodump.

Here's a snippet from the logs of the app container on Jetstream2:

// $ docker compose logs app | grep 'mongodump'

exouser-app-1  | 2024-07-01 01:00:00 info: 	mongodump --db nmdcedge --out /project/io/db/db-backup_2024-07-01:01:00
exouser-app-1  | 2024-07-01 01:00:00 error: 	Command failed: mongodump --db nmdcedge --out /project/io/db/db-backup_2024-07-01:01:00
exouser-app-1  | /bin/sh: mongodump: not found
exouser-app-1  | 2024-07-01 01:00:00 error: 	/bin/sh: mongodump: not found
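One way to surface this earlier would be to check whether the executable can be started before scheduling the backup. The following is only a minimal sketch, not the project's code: `isCommandAvailable` is a hypothetical helper name, and it assumes that running the tool with `--version` is harmless.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper (not part of the NMDC EDGE codebase).
// Returns true if `command` could be started at all. When the executable
// is missing, spawnSync reports it via `result.error` (code "ENOENT").
function isCommandAvailable(command) {
  const result = spawnSync(command, ["--version"], { stdio: "ignore" });
  return !result.error;
}

// Example: warn once instead of failing on every scheduled run.
if (!isCommandAvailable("mongodump")) {
  console.warn("mongodump is not available in this container; skipping DB backups.");
}
```

A check like this would not fix the missing binary; it would only make the failure explicit at startup rather than at 01:00 each day.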
@eecavanna eecavanna added the bug Something isn't working label Jul 2, 2024
@eecavanna eecavanna added the jetstream2 ✈️ Issue related to deploying NMDC EDGE to the Jetstream2 platform label Jul 2, 2024
@eecavanna
Collaborator Author

Looks to me like that command dumps the nmdcedge database to a directory (within some fixed base directory) named db-backup_YYYY-MM-DD:HH:mm (e.g. db-backup_2024-12-25:12:59).

One concern I have about that is that colons in a file or directory name can cause problems on some operating systems (for example, Windows doesn't allow them). I'd recommend using underscores, hyphens, or nothing; e.g. db-backup_20241225_1259.
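As a rough illustration of the colon-free naming suggested above, here is a minimal sketch. The helper name `formatBackupTimestamp` is hypothetical, and it assumes local time is what the backup name should reflect.

```javascript
// Hypothetical helper: formats a Date as YYYYMMDD_HHmm using local time,
// so the resulting name contains no colons.
function formatBackupTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  const datePart = `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
  const timePart = `${pad(date.getHours())}${pad(date.getMinutes())}`;
  return `${datePart}_${timePart}`;
}

// Month is zero-based in the Date constructor, so 11 means December.
const example = formatBackupTimestamp(new Date(2024, 11, 25, 12, 59));
console.log(`db-backup_${example}`); // db-backup_20241225_1259
```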

@eecavanna
Collaborator Author

The file webapp/server/cronserver.js has the following two jobs (among others) defined in it:

//backup nmdcedge DB every day at 10pm
cron.schedule(config.CRON.SCHEDULES.DATABASE_BACKUP_CREATOR, function () {
  dbBackup();
});
//delete older DB backups every day at 12am
cron.schedule(config.CRON.SCHEDULES.DATABASE_BACKUP_PRUNER, function () {
  dbBackupClean();
});

The server's config.js file has these default schedules for those jobs:

DATABASE_BACKUP_CREATOR: process.env.CRON_DATABASE_BACKUP_CREATOR_SCHEDULE || "0 1 * * *",
DATABASE_BACKUP_PRUNER: process.env.CRON_DATABASE_BACKUP_PRUNER_SCHEDULE || "0 2 * * *",

In English, those schedules are:

| Job | cron | English |
| --- | --- | --- |
| Create | `0 1 * * *` | At 01:00 |
| Prune | `0 2 * * *` | At 02:00 |
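To cross-check the translations above (and the code comments in cronserver.js, which mention 10pm and 12am rather than 01:00 and 02:00), a small sketch like the following could be used. `describeDailyCron` is a hypothetical helper that only understands expressions of the form `<minute> <hour> * * *`; it is not a general cron parser.

```javascript
// Hypothetical helper: describes a five-field cron expression of the form
// "<minute> <hour> * * *". Returns null for anything more complex.
function describeDailyCron(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 fields, got ${fields.length}`);
  }
  const [minute, hour, ...rest] = fields;
  const isPlainNumber = (value) => /^\d+$/.test(value);
  const isDaily = rest.every((field) => field === "*");
  if (!isDaily || !isPlainNumber(minute) || !isPlainNumber(hour)) {
    return null;
  }
  const pad = (value) => value.padStart(2, "0");
  return `At ${pad(hour)}:${pad(minute)}`;
}

console.log(describeDailyCron("0 1 * * *")); // At 01:00
console.log(describeDailyCron("0 2 * * *")); // At 02:00
```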

@eecavanna eecavanna changed the title Node.js script tries to run mongodump Node.js script tries to run mongodump (nonexistent in app container) Aug 10, 2024
@eecavanna
Collaborator Author

eecavanna commented Aug 10, 2024

For the mongo container, we could create a file named schedule_backups_cronjob.sh in /docker-entrypoint-initdb.d and populate it with the following:

# Install dependencies.
apt update
apt install -y cron nano

# Add the job description to the cron table.
echo '0 1 * * *    mongodump --authenticationDatabase admin --username' "${MONGO_INITDB_ROOT_USERNAME}" '--password' "${MONGO_INITDB_ROOT_PASSWORD}" '--db nmdcedge --out "/tmp/db-backup_`date +\%Y\%m\%d_\%H\%M`" >> /tmp/backups_cronjob.log 2>&1' | crontab -

# Start the cron service (it is not started automatically inside the container).
service cron start

Replace /tmp with the path to the directory in which we want the backups to be created.

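The container will execute that script when it initializes a fresh instance (that is, on first startup with an empty data directory), according to the "Initializing a fresh instance" section of the container image docs. It will not run again on later restarts of an already-initialized instance.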

@ssarrafan

@eecavanna who is this assigned to? Is it being worked on this week?

@eecavanna
Collaborator Author

Hi @ssarrafan, it isn't assigned to anyone currently. I'll assign it to myself. I don't expect to work on it any more this week. I'll move it to the next sprint, as I do plan to continue working on it within the next couple of weeks. At some point (depending upon what the remainder of my digging uncovers), I may reassign it or spin part of it off into a separate ticket.

@ssarrafan

@eecavanna would you be ok with me moving anything that isn't Berkeley refactor to the next sprint? Like this issue; it seems like it could wait.

@eecavanna
Collaborator Author

Hi @ssarrafan, yes on this particular issue. I'll move it now.

There are things I want to do that aren't Berkeley Schema Roll Out-related, next Wednesday-Friday.

@eecavanna eecavanna added the backlog backlogged to be reprioritized in the future label Nov 9, 2024
@eecavanna eecavanna removed the status in EDGE Board Nov 9, 2024
@eecavanna eecavanna removed the status in Jetstream Migration Nov 9, 2024