ZNS support #1298
base: develop
Commits: 2a360e4, 4010610, d79cc74, 7b8dac8, 2c3f6e8, a997bfb, 3ad7a61, 5e320a3, 5995d31, 0e10011, 344c3fb
```
@@ -26,6 +26,7 @@ mkShell {
  buildInputs = [
    autoconf
    automake
    btrfs-progs
    clang
    cowsay
    docker
```
`@@ -0,0 +1,25 @@`
# Zoned Storage Support
Mayastor supports zoned storage in the form of PCIe ZNS devices and zoned SPDK uring block devices.

## Overview
Zoned storage is a class of storage that divides its address space into zones. Each zone carries a sequential write constraint: writes can only be issued at the zone's write pointer, which advances with each successful write. Once the zone's capacity is reached, the device controller transitions the zone to the 'Full' state, and the zone cannot be rewritten until it is explicitly reset by the host. Zoned storage is currently available in the form of SMR HDDs and ZNS SSDs; this proposal focuses on ZNS SSDs.
For more information about zoned storage, see [zonedstorage.io](https://zonedstorage.io/).
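The sequential write constraint can be made concrete with a small in-memory model. This is a minimal sketch under stated assumptions, not Mayastor or SPDK code: the `Zone`, `ZoneState` and `WriteError` names are hypothetical, addresses are counted in logical blocks, and the state set is reduced to `Empty`, `Open` and `Full` (the ZNS specification defines further states such as Closed, Read Only and Offline).

```rust
/// Simplified zone states (hypothetical; the ZNS spec defines more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneState {
    Empty,
    Open,
    Full,
}

#[derive(Debug, PartialEq, Eq)]
enum WriteError {
    /// The write did not start at the current write pointer.
    NotAtWritePointer { expected: u64, got: u64 },
    /// The zone is Full and must be reset before it can be written again.
    ZoneFull,
    /// The write would cross the end of the zone's writable capacity.
    ExceedsCapacity,
}

/// Hypothetical in-memory model of a single zone, addressed in logical blocks.
#[derive(Debug)]
struct Zone {
    start: u64,     // first LBA of the zone
    capacity: u64,  // number of writable blocks in the zone
    write_ptr: u64, // next LBA that may be written
    state: ZoneState,
}

impl Zone {
    fn new(start: u64, capacity: u64) -> Self {
        Zone { start, capacity, write_ptr: start, state: ZoneState::Empty }
    }

    /// Accepts a write only at the write pointer and advances it on success.
    fn write(&mut self, lba: u64, blocks: u64) -> Result<(), WriteError> {
        if self.state == ZoneState::Full {
            return Err(WriteError::ZoneFull);
        }
        if lba != self.write_ptr {
            return Err(WriteError::NotAtWritePointer { expected: self.write_ptr, got: lba });
        }
        let end = self.start + self.capacity;
        if self.write_ptr + blocks > end {
            return Err(WriteError::ExceedsCapacity);
        }
        self.write_ptr += blocks;
        // Reaching capacity moves the zone to Full, mimicking the controller.
        self.state = if self.write_ptr == end { ZoneState::Full } else { ZoneState::Open };
        Ok(())
    }

    /// Rewinds the write pointer; the only way to make a Full zone writable again.
    fn reset(&mut self) {
        self.write_ptr = self.start;
        self.state = ZoneState::Empty;
    }
}

fn main() {
    let mut zone = Zone::new(0, 8);
    zone.write(0, 4).unwrap();
    // A write that skips ahead of the write pointer is rejected.
    println!("{:?}", zone.write(6, 2));
    zone.write(4, 4).unwrap();
    println!("{:?}", zone.state);
    zone.reset();
    println!("{:?}", zone.state);
}
```

Running it prints `Err(NotAtWritePointer { expected: 4, got: 6 })`, then `Full` once the zone reaches capacity, and `Empty` after the reset.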

Zoned Namespace (ZNS) NVMe SSDs are defined by an NVMe command set (see the 'NVM Express Zoned Namespace Command Set Specification' in the [NVMe Command Set Specifications](https://nvmexpress.org/developers/nvme-command-set-specifications/)) and have been supported by Linux since kernel v5.9. SPDK has supported zoned storage since v20.10.
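On Linux, a host can check whether a block device is zoned through the `queue/zoned` sysfs attribute, which reports `none`, `host-aware` or `host-managed`; ZNS namespaces are exposed as `host-managed`. The sketch below assumes that interface. The function names and the device name `nvme0n2` are hypothetical, and Mayastor itself may detect zoned devices differently (for example through SPDK's bdev layer).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Zoned models reported by /sys/block/<dev>/queue/zoned.
#[derive(Debug, PartialEq, Eq)]
enum ZonedModel {
    NotZoned,
    HostAware,
    HostManaged,
}

/// Parses the contents of the sysfs attribute (trailing newline allowed).
fn parse_zoned_model(s: &str) -> Option<ZonedModel> {
    match s.trim() {
        "none" => Some(ZonedModel::NotZoned),
        "host-aware" => Some(ZonedModel::HostAware),
        "host-managed" => Some(ZonedModel::HostManaged),
        _ => None,
    }
}

/// Reads the zoned model of a block device such as "nvme0n2" (hypothetical name).
fn zoned_model(dev: &str) -> io::Result<Option<ZonedModel>> {
    let path = Path::new("/sys/block").join(dev).join("queue/zoned");
    let contents = fs::read_to_string(path)?;
    Ok(parse_zoned_model(&contents))
}

fn main() {
    match zoned_model("nvme0n2") {
        Ok(Some(model)) => println!("nvme0n2: {:?}", model),
        Ok(None) => println!("nvme0n2: unrecognized zoned model"),
        Err(e) => println!("could not read sysfs: {e}"),
    }
}
```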

Because ZNS SSDs align their flash media with zones, no on-device garbage collection is needed. Compared with conventional SSDs, this results in better throughput, predictable latency and higher capacity per dollar, because over-provisioning and DRAM for page mapping are no longer required.

The concept of ZNS SSDs and their advantages are discussed in depth in the paper ['ZNS: Avoiding the Block Interface Tax for Flash-based SSDs'](https://www.usenix.org/conference/atc21/presentation/bjorling).

[RocksDB](https://github.com/facebook/rocksdb) and [TerarkDB](https://github.com/bytedance/terarkdb) are examples of applications integrated end to end with zoned storage through [ZenFS](https://github.com/westerndigitalcorporation/zenfs).
POSIX file systems such as f2fs and btrfs also support zoned devices.

## Requirements for Mayastor
Initially, ZNS support in Mayastor targets the non-replicated volume I/O path with volume partitioning disabled.
Replication and volume partitioning can be addressed later, as those features require special care with regard to the sequential write constraint and the device's maximum active zones and maximum open zones limits.
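The maximum open zones and maximum active zones limits can be illustrated with simple bookkeeping: in a ZNS device, open zones count against both limits, while closed zones still count as active. The sketch below is hypothetical (`ZoneLimits` and its methods are not Mayastor APIs) and simplified: it only models opening empty zones, closing open zones and an open zone becoming Full. It also assumes the ZNS encoding of the Maximum Active Resources (MAR) and Maximum Open Resources (MOR) fields as 0's-based values where `FFFFFFFFh` means "no limit"; this conversion appears to be what the `nvmet` patch listed under Prerequisites corrects.

```rust
/// Converts a 0's-based MAR/MOR field into a zone count.
/// Assumption: 0xFFFF_FFFF means "no limit".
fn limit_from_zero_based(raw: u32) -> Option<u32> {
    if raw == u32::MAX { None } else { Some(raw + 1) }
}

#[derive(Debug, PartialEq, Eq)]
enum ZoneLimitError {
    TooManyOpen,
    TooManyActive,
}

/// Hypothetical bookkeeping of open and active (open + closed) zones.
struct ZoneLimits {
    max_open: Option<u32>,
    max_active: Option<u32>,
    open: u32,
    closed: u32,
}

impl ZoneLimits {
    fn new(max_open: Option<u32>, max_active: Option<u32>) -> Self {
        ZoneLimits { max_open, max_active, open: 0, closed: 0 }
    }

    fn active(&self) -> u32 {
        self.open + self.closed
    }

    /// Opening an empty zone consumes both an open and an active resource.
    fn open_empty_zone(&mut self) -> Result<(), ZoneLimitError> {
        if self.max_active.map_or(false, |m| self.active() >= m) {
            return Err(ZoneLimitError::TooManyActive);
        }
        if self.max_open.map_or(false, |m| self.open >= m) {
            return Err(ZoneLimitError::TooManyOpen);
        }
        self.open += 1;
        Ok(())
    }

    /// Closing frees an open resource, but the zone stays active.
    fn close_zone(&mut self) {
        if self.open > 0 {
            self.open -= 1;
            self.closed += 1;
        }
    }

    /// An open zone that becomes Full releases its open and active resources.
    fn finish_open_zone(&mut self) {
        if self.open > 0 {
            self.open -= 1;
        }
    }
}

fn main() {
    // Raw MOR = 1 and MAR = 2 encode at most 2 open and 3 active zones.
    let mut limits = ZoneLimits::new(limit_from_zero_based(1), limit_from_zero_based(2));
    limits.open_empty_zone().unwrap();
    limits.open_empty_zone().unwrap();
    println!("{:?}", limits.open_empty_zone()); // open limit reached
    limits.close_zone();
    limits.open_empty_zone().unwrap(); // 2 open + 1 closed = 3 active
    limits.close_zone();
    println!("{:?}", limits.open_empty_zone()); // active limit reached
    limits.finish_open_zone();
    println!("active zones: {}", limits.active());
}
```

The two limits are checked separately because closing a zone frees an open resource without freeing an active one, which is why a replicated or partitioned nexus sharing one device would need extra coordination.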

The NexusChild of a non-replicated Nexus should accept ZNS NVMe devices via the PCIe URI scheme as well as zoned SPDK uring devices via the uring URI scheme. Such a child automatically results in a zoned nexus, which is exposed to the user either as a raw zoned NVMe-oF target or formatted with btrfs.
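The child-selection rule can be expressed as a validation step. This sketch is hypothetical: `validate_zoned_nexus` and `ZonedNexusError` are not Mayastor functions, the caller is assumed to have already determined that the child device is zoned, and the example URIs (`pcie:///0000:01:00.0`, `uring:///dev/nvme0n2`) only illustrate the `pcie` and `uring` schemes named in the text; the exact URI formats Mayastor accepts may differ.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ZonedNexusError {
    /// Only a single (non-replicated) child is supported initially.
    WrongChildCount(usize),
    /// Only pcie:// and uring:// children may be zoned.
    UnsupportedScheme(String),
    /// The child string has no "<scheme>://" prefix.
    InvalidUri(String),
}

/// Returns the scheme of a URI such as "uring:///dev/nvme0n2".
fn scheme(uri: &str) -> Option<&str> {
    uri.split_once("://").map(|(s, _)| s).filter(|s| !s.is_empty())
}

/// Validates a nexus whose children are already known to be zoned devices.
fn validate_zoned_nexus(children: &[&str]) -> Result<(), ZonedNexusError> {
    if children.len() != 1 {
        return Err(ZonedNexusError::WrongChildCount(children.len()));
    }
    let uri = children[0];
    match scheme(uri) {
        Some("pcie") | Some("uring") => Ok(()),
        Some(other) => Err(ZonedNexusError::UnsupportedScheme(other.to_string())),
        None => Err(ZonedNexusError::InvalidUri(uri.to_string())),
    }
}

fn main() {
    for children in [
        vec!["pcie:///0000:01:00.0"],
        vec!["uring:///dev/nvme0n2", "uring:///dev/nvme1n2"],
    ] {
        println!("{:?} -> {:?}", children, validate_zoned_nexus(&children));
    }
}
```

Rejecting more than one child mirrors the stated scope: replication is deferred because multiple children would have to keep their write pointers and zone resources in sync.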

## Prerequisites
- Linux kernel v5.15.68 or higher, which includes the patch [nvmet: fix mar and mor off-by-one errors](https://lore.kernel.org/lkml/[email protected]/)
- SPDK 23.01, which includes [ZNS support for NVMe-oF](https://review.spdk.io/gerrit/c/spdk/spdk/+/16044/7)
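A startup or deployment check could verify the kernel prerequisite by parsing the running kernel release. This is a minimal sketch assuming the release string starts with `major.minor[.patch]`, as in `/proc/sys/kernel/osrelease` on Linux; the function names are hypothetical. A version comparison is only a heuristic, since distribution kernels may backport the `nvmet` fix to older version numbers. The SPDK 23.01 requirement is not checked here.

```rust
use std::fs;

/// Minimum kernel version from the prerequisites above.
const MIN_KERNEL: (u32, u32, u32) = (5, 15, 68);

/// Parses the leading "major.minor.patch" of a kernel release string,
/// e.g. "5.15.68-1-default" -> (5, 15, 68). A missing patch level counts as 0.
fn parse_kernel_version(release: &str) -> Option<(u32, u32, u32)> {
    // Keep only the leading run of digits and dots.
    let numeric: &str = release
        .split(|c: char| !(c.is_ascii_digit() || c == '.'))
        .next()?;
    let mut parts = numeric.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next().flatten().unwrap_or(0);
    Some((major, minor, patch))
}

/// Tuples compare lexicographically, so (5, 16, 0) > (5, 15, 68).
fn kernel_meets_minimum(release: &str) -> bool {
    parse_kernel_version(release).map_or(false, |v| v >= MIN_KERNEL)
}

fn main() {
    match fs::read_to_string("/proc/sys/kernel/osrelease") {
        Ok(release) => {
            let release = release.trim();
            println!("{release} meets minimum: {}", kernel_meets_minimum(release));
        }
        Err(e) => println!("could not read kernel release: {e}"),
    }
}
```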
```
@@ -98,6 +98,7 @@ async-process = { version = "1.8.1" }
rstack = { version = "0.3.3" }
tokio-stream = "0.1.14"
rustls = "0.21.12"
jemalloc-sys = "0.5.2+5.3.0-patched"

devinfo = { path = "../utils/dependencies/devinfo" }
jsonrpc = { path = "../jsonrpc"}
```

Review comment on `jemalloc-sys`: What is this used for? A patched version makes me wonder whether this is stable enough.

Reply: Thank you for mentioning this! Can you suggest a nicer way to do this? :) @hrudaya21 Unfortunately, there is no convenient way to allocate this struct within SPDK.
Review comment: Please add some doc comments explaining what this nullblk is. For example, does it just ignore writes and return zeroes?