@@ -11,9 +11,20 @@ While in maintenance mode, a Datanode does not accept new writes but may still s
The Datanode transitions through the following operational states during maintenance:

1. **IN_SERVICE**: The Datanode is fully operational and participating in data writes and reads.
2. **ENTERING_MAINTENANCE**: The Datanode is transitioning into maintenance mode. New writes will be avoided. The SCM monitors the Datanode until it meets all safety criteria before allowing it to fully enter maintenance.
3. **IN_MAINTENANCE**: The Datanode is in maintenance mode. Data will not be written to it. If the Datanode remains in this state beyond the configured maintenance window, its data will start to be replicated to other Datanodes to ensure data durability.
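
As a rough illustration of the three states listed above, the following Python sketch models which states accept new writes and when replication would begin. It is not Ozone code: the names `OpState`, `accepts_new_writes`, and `needs_replication`, and the use of hours as the time unit, are hypothetical and chosen only for this example.

```python
from enum import Enum


class OpState(Enum):
    """Operational states named in the list above (hypothetical model)."""
    IN_SERVICE = "IN_SERVICE"
    ENTERING_MAINTENANCE = "ENTERING_MAINTENANCE"
    IN_MAINTENANCE = "IN_MAINTENANCE"


def accepts_new_writes(state: OpState) -> bool:
    # Only a fully operational Datanode takes new writes; both
    # maintenance-related states avoid them.
    return state is OpState.IN_SERVICE


def needs_replication(state: OpState, hours_in_maintenance: float,
                      window_hours: float) -> bool:
    # Assumption: replication starts only once a node in IN_MAINTENANCE
    # has exceeded its configured maintenance window.
    return (state is OpState.IN_MAINTENANCE
            and hours_in_maintenance > window_hours)
```

For example, `needs_replication(OpState.IN_MAINTENANCE, 25, 24)` is true under these assumptions, while the same call with `OpState.ENTERING_MAINTENANCE` is false.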

### Transition Criteria (ENTERING_MAINTENANCE to IN_MAINTENANCE)

A Datanode will remain in the `ENTERING_MAINTENANCE` state until the SCM (Storage Container Manager) verifies the following safety conditions:

* **Pipeline Closure**: All open Ratis and EC pipelines on the Datanode must be successfully closed. This ensures no active write operations are interrupted.
* **Datanode Acknowledgment**: The Datanode must confirm it has received the maintenance command and persisted the "Entering Maintenance" state to its local disk. This prevents state confusion if the Datanode is rebooted.
* **Sufficient Replication (Data Safety)**: The SCM verifies that every container stored on the Datanode has enough healthy copies elsewhere in the cluster to remain safe while the node is offline.
    * **Ratis (3-way)**: By default, at least 2 replicas must remain online on other healthy Datanodes (configurable via `hdds.scm.replication.maintenance.replica.minimum`).
    * **Erasure Coding (EC)**: By default, the cluster must maintain at least `Data Shards + 1` available shards elsewhere (configurable via `hdds.scm.replication.maintenance.remaining.redundancy`). For example, in an RS(6,3) policy, at least 7 shards must be online.
* **Health Check**: Every container on the node must be in a stable state (e.g., `CLOSED` or `QUASI_CLOSED`). If a container is under-replicated or "unclosed," the SCM will block the transition and trigger background replication to create new copies on other nodes until the safety threshold is met.
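
The replication thresholds above reduce to simple arithmetic, sketched below in Python. This is an illustrative model, not the SCM implementation: all function names are hypothetical, and it assumes the EC requirement is computed as data shards plus the configured remaining redundancy, which matches the `Data Shards + 1` default described above.

```python
def ratis_replication_ok(online_replicas_elsewhere: int,
                         replica_minimum: int = 2) -> bool:
    # replica_minimum stands in for
    # hdds.scm.replication.maintenance.replica.minimum (default 2).
    return online_replicas_elsewhere >= replica_minimum


def ec_required_shards(data_shards: int, remaining_redundancy: int = 1) -> int:
    # remaining_redundancy stands in for
    # hdds.scm.replication.maintenance.remaining.redundancy (default 1).
    # Assumption: required shards = data shards + remaining redundancy.
    return data_shards + remaining_redundancy


def ec_replication_ok(data_shards: int, available_shards_elsewhere: int,
                      remaining_redundancy: int = 1) -> bool:
    required = ec_required_shards(data_shards, remaining_redundancy)
    return available_shards_elsewhere >= required


def can_enter_maintenance(pipelines_closed: bool, state_persisted: bool,
                          container_checks: list) -> bool:
    # Every criterion must hold: pipelines closed, state persisted by the
    # Datanode, and every container healthy and sufficiently replicated.
    return pipelines_closed and state_persisted and all(container_checks)


# RS(6,3): 6 data shards, so 7 shards must stay online by default.
print(ec_required_shards(6))  # 7
```

With these definitions, an RS(6,3) container that has only 6 shards available elsewhere would fail `ec_replication_ok`, so the node would stay in `ENTERING_MAINTENANCE` until another shard is reconstructed.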

## Command Line Usage

To place a Datanode into maintenance mode, use the `ozone admin datanode maintenance` command. You can optionally specify a duration for the maintenance period; if you omit it, a configurable default duration is used.