We're going to journey back in time a little bit. This is more of a traditional storage technology than the previous talk, which was about data lakehouses. NFS has been part of the storage world since the mid-80s. It's a very venerable technology. A lot of people understand it and deploy it. My responsibility is to act as a steward of the code base in the Linux kernel that implements an NFS server, NFS being a client-server protocol in the classic sense of that term. I just want to put this slide up to point out that, yes, I do work for Oracle. But no, I'm not going to talk about Oracle products today. You can ask questions, and what I know about them I can probably share, because I don't know a lot about them. But I'm an upstream Linux guy, so that's what I'm going to talk about.
So I sort of broke down the talk into these four pieces. I may or may not stick to this, and depending on whatever questions or other interaction you might want to have with me, we can take this in any direction you want. Although I do reserve the right to tell you to come talk to me afterward in order to keep this on schedule.
So, some of you may be familiar with the NFS server in Linux. It's changed significantly in the last two or three years. And in fact, I don't think I've ever given a talk like this at this event. So "reintroducing" may be for you, or it may be for me, who knows? I've been the co-maintainer of the NFS server in Linux for the last three or four years, I guess since around the time of COVID. Yeah, I guess 2020, 21. I joined when Bruce Fields was a maintainer, and now he's in semi-retirement. He's playing in a rock band, believe it or not. That's a true story. And now my co-maintainer is Jeff Layton, who's got a very broad array of experience in Linux file systems. He's worked on the VFS and in Ceph and in other places, as well as being in and out of the NFS community for the last 10 or 15 years. He's got quite a lot of experience. I'm also very ably aided by a team of reviewers. I'm very much appreciative of review and contributions. This code base would not be what it is today without the contributions and expertise of those folks.
So, what does this thing do? We support all versions of NFS today, from version 2, from, as I said, the 1980s, all the way up to 4.2 and newer extensions. The 4.2 features that might interest you are READ_PLUS, which is a mechanism that NFS added recently to do sparse reading. So you can get a read result back that says the range you asked for is not allocated on disk, so just fill it in with zeros, and it doesn't have to send all that data over the network. It just sends a little packet that says it's all zeros. Or it can get more complicated than that, of course, but in our implementation, we only send either data or zeros. Data, or a hole. ALLOCATE, which is kind of the opposite. It's basically a WRITE SAME, for you SCSI folks. Copy offload, that's also something SCSI has, where you can point the server at a couple of files and say move this range of bytes in that file to this range in this other file. The two files can be on different servers. So that's kind of an interesting technology, and it has a lot of evolving to do. I'll talk about it in a little more detail in a few slides. This thing called security labels: basically this is SELinux labeling, more or less standardized and put into NFS. We support that. I think we're one of the only server implementations that supports it. Of the extensions I mentioned, the newer-than-NFS-v4.2 extensions, the big one is probably going to be extended attribute support. That was kind of a schlep just because it was very politically contentious. Linux has this thing called extended attributes. Solaris has this thing called named attributes, and they are not the same, even though they're both called attributes. So named attributes are part of the NFS v4 protocol as it was specified way back in the early 2000s. It did not have support for extended attributes as Linux knows them until recently, and now we support that.
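To make the sparse-read idea concrete, here is a minimal toy model in Python of a read reply made of data and hole segments. It is only a sketch under stated assumptions: the `BLOCK` size, the `Data` and `Hole` classes, and the `read_plus` function are hypothetical names for illustration, and nothing here is the nfsd implementation or the on-the-wire encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

BLOCK = 4096  # assumed allocation unit for this toy model


@dataclass
class Data:
    offset: int
    payload: bytes


@dataclass
class Hole:
    offset: int
    length: int


def read_plus(blocks: Dict[int, bytes], offset: int, length: int) -> List[Union[Data, Hole]]:
    """Describe [offset, offset + length) as Data and Hole segments.

    `blocks` maps a block index to BLOCK bytes for allocated blocks.
    A missing index means the block is unallocated (a hole).
    """
    if offset % BLOCK or length % BLOCK:
        raise ValueError("toy model only handles block-aligned ranges")
    segments: List[Union[Data, Hole]] = []
    for index in range(offset // BLOCK, (offset + length) // BLOCK):
        pos = index * BLOCK
        last = segments[-1] if segments else None
        if index in blocks:
            if isinstance(last, Data):
                last.payload += blocks[index]  # extend the current data run
            else:
                segments.append(Data(pos, blocks[index]))
        else:
            if isinstance(last, Hole):
                last.length += BLOCK  # extend the current hole
            else:
                segments.append(Hole(pos, BLOCK))
    return segments


def payload_bytes(segments: List[Union[Data, Hole]]) -> int:
    """Bytes of file data that actually have to cross the network."""
    return sum(len(s.payload) for s in segments if isinstance(s, Data))
```

For a four-block range where only the first and last blocks are allocated, the reply is one data segment, one hole covering two blocks, and another data segment, so only half of the range travels as file data.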
And so the user extended attributes in local file systems on the Linux server are visible and can be updated by NFS v4.2 clients. Let's talk about transports. I think we're one of the few NFS implementations in the industry that can support RDMA transports, and not just InfiniBand, but all of them. Any fabric that Linux supports, we can support with the Linux NFS server. So that's RoCE, InfiniBand, well, I mean, that's on the slide. We also have two software RDMA emulation drivers in Linux. These aren't part of nfsd per se, but nfsd can use them. One is software-emulated iWARP, the siw driver. So anybody who's got a standard Ethernet card in their Linux NFS server can use software iWARP just to try it out, kick the tires. It can also talk to hardware iWARP cards. And the other one is software-emulated RoCE. Same deal for that. You have a standard Ethernet card in your machine, and it will talk to other software-emulated RoCE, or it will talk to hardware RoCE. I mean, obviously, they're open standards. And I want to mention the file systems that we support, the local file system types. So you may be familiar with Ganesha, which is a user-space NFS server. Ganesha is great if you want to store your data on S3, for example, or in other file systems where the driver is probably not in the kernel already. Lustre, for example, is not in the kernel. I don't believe it's in the kernel anymore. So you would use a user-space NFS server like Ganesha for that kind of application. But for any in-kernel file system in Linux, like the ones I mentioned here, XFS, Btrfs, tmpfs, ext4, this is the way you want to go. It's very good in terms of performance and stability. I also mentioned NFS re-export here. What that means is that the NFS server in Linux can act as a caching NFS server: it can mount a remote NFS server and re-export those files locally.
You can even choose to set up FS-Cache on the NFS server so that you can store a whole bunch of remote files on that server and access them with local latencies from your local NFS client. Kind of a spiffy feature. I mentioned some other file system choices you could make here. FUSE is kind of interesting in the sense that it allows the NFS server to access file systems that are implemented in user space. Not terribly performant, but if you need the capability, sometimes that's good enough. We also support ZFS, or "Zed-FS" if you're not American. ZFS is not an official part of the Linux kernel, but there are ports of it to Linux, and it will work with nfsd.
Then I'd like to tout a few additional capabilities. We're kind of proud of the support for Kerberos. It's pretty extensive. I'm going to give a few more details in a slide or two. NFS v4 referrals: if you're familiar with SMB, that has a capability where the server can tell a client, 'Oh, I don't have that anymore, that piece of data that you want, that file that you want, go over here and look at this other server.' NFS v4 also has that capability, and nfsd in Linux supports it. We do support some forms of pNFS, not all of them. The most complete support we have is for pNFS block, where nfsd in Linux acts as a metadata server, and the data that the client's accessing is actually stored and accessed via one of the block protocols like NVMe or iSCSI. The NVMe support was added very recently, contributed by Christoph Hellwig. I think it's appearing in 6.11, I believe, which was just released yesterday. He wrote the spec for that, RFC 9561, and he contributed the support for it, and in fact, it was not very many lines of code, and I don't think any lines of code in nfsd had to be changed to support it, so kudos to him. We do support some observability. People may be familiar with dprintk. We kind of don't like dprintk anymore because of all the fancy journaling support in Linux now that will squelch any loud chatter from any subsystem. dprintk tends to get pretty noisy in the system log, so it will get squelched very quickly, and that makes the usefulness of dprintk pretty minimal. So over the years we've been switching over to ftrace and tracepoints, the more modern facility. We like those a lot. I won't belabor this point too much, and if you're interested in learning more about it, come talk to me after the talk.
Our ACL support, access control list support, is POSIX only because that's what our local file systems support, but NFS v4 doesn't have the ability to expose POSIX ACLs in their native form to NFS clients yet. I will get to that in a few slides. Anyway, what NFS v4 clients see is NFS v4 ACLs, which are kind of like Windows ACLs. What's stored on the NFS server is a POSIX ACL. So that's kind of a lossy translation because POSIX ACLs are not nearly as rich as Windows ACLs are. There's a long story there that I won't belabor, but that's what we support, POSIX ACLs.
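As a rough illustration of why that translation is lossy, here is a deliberately simplified Python sketch that maps one NFS v4 ACE permission mask down to POSIX `rwx` bits and back. The mask values are the ACE4 constants from the NFS v4 specification; the two mapping functions are hypothetical and are not the algorithm nfsd actually uses.

```python
# A subset of NFS v4 ACE mask bits (values as defined in RFC 7530).
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020
ACE4_DELETE = 0x00010000
ACE4_WRITE_ACL = 0x00040000

WRITE_BITS = ACE4_WRITE_DATA | ACE4_APPEND_DATA


def nfs4_mask_to_posix(mask: int) -> str:
    """Collapse an NFS v4 mask to POSIX 'rwx' (simplified, illustrative)."""
    r = "r" if mask & ACE4_READ_DATA else "-"
    # Assumption for this toy: 'w' only when both write bits are granted.
    w = "w" if mask & WRITE_BITS == WRITE_BITS else "-"
    x = "x" if mask & ACE4_EXECUTE else "-"
    return r + w + x


def posix_to_nfs4_mask(perms: str) -> int:
    """Expand POSIX 'rwx' back into an NFS v4 mask (simplified)."""
    mask = 0
    if perms[0] == "r":
        mask |= ACE4_READ_DATA
    if perms[1] == "w":
        mask |= WRITE_BITS
    if perms[2] == "x":
        mask |= ACE4_EXECUTE
    return mask


def round_trip(mask: int) -> int:
    """What a client reads back after the server stored a POSIX ACL."""
    return posix_to_nfs4_mask(nfs4_mask_to_posix(mask))
```

A client that grants read, append-only, delete, and write-ACL rights gets back only read: the finer-grained bits have no POSIX equivalent in this model, so they do not survive the round trip.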
As promised, Kerberos. Kerberos is... Oh, I see a question. I see a question here.
Yes, there is. We could repeat the question. Thank you for the reminder. The question is, 'ZFS, even though it's not a supported file system in Linux, it does support NFS v4 ACLs. Could we support that in nfsd?' The answer is, we could. We could do that. There is no plumbing through the VFS layer in Linux for NFS v4 ACLs. So, you know, it's not as easy as, 'oh, just start using ZFS and the NFS v4 support will appear.' So, we'd have to do some work for that. But that's not beyond the pale. We could do that. I think, you know, there certainly is a low hum of interest in true NFS v4 ACL support at all times. I mean, we hear that buzz, that request comes frequently. And, you know, the story is, we don't have support in the local file systems. We don't have support in the VFS. And if we had those two things, yeah, we could make that happen.
Anything we can do to support, to revive this, to support NFS v4 ACLs on ZFS? From your perspective?
Well, I just explained it. If we have VFS support, then any local file system like ZFS, yeah, we could do it.
What do we have to do to make it happen?
Well, I don't have specifics, but you need to do the political work to address support for NFS v4 ACLs in the VFS layer. I don't think that we can do that with the POSIX ACL APIs that exist now. Yes, maybe I should take the liberty I explained earlier and say, please come talk to me after the talk. Or what Bruce always used to say is, 'Patches are welcome.'
Yeah, you know, I'm rather agnostic about whether that support is appropriate or not. So I'm not going to argue with you. I think that would be interesting for a large class of users. And so, you know, as a co-maintainer of nfsd, I wouldn't say no to it.
So, over the last couple of years, we decided that it was time to go further than just deprecating, and actually remove support from the kernel for the DES-based and triple-DES-based encryption types. So the types that are listed here in the first bullet, in the light blue color, are all gone in recent versions of the upstream kernel. You can't pick them, even if you have user-space support for them. The remaining encryption types from the traditional implementation are the two SHA-1 based types from RFC 3962. So those are in there. There is a deprecation schedule for those, not ours. I think it's from, I want to say, the U.S. government. I think by 2030, those have to be gone as well. So all of that stuff can be built out of the kernel now if a distribution chooses. What's coming in its place?
I implemented support for Camellia and for the RFC 8009 encryption types. The 8009 encryption types are required to support FIPS 140. So that brings NFS into that world. It wasn't there previously because none of the Kerberos encryption types that we had support for were FIPS compliant. So that's a new thing, and I think people will like to see that. I also took the time to add KUnit tests. So basically all the test vectors that the RFCs provide are now coded up and built into the SunRPC Kerberos implementation, and you can just run those as KUnit tests. I've been talking with David Howells about how we're going to support acceleration on platforms whose ISAs have acceleration for the encryption types that I've mentioned so far. What's interesting about the new 8009 types is that they're AEAD types, which means that the checksum computation and the encryption go together, and so they will see a lot of positive impact from hardware acceleration. We're kind of excited about that possibility. We don't have any plan on the ground to implement it, but we've been talking about it, I think for the last two years, to see what we can do to maneuver ourselves into a position to support this, because I'm a big proponent of encryption on the wire.
And speaking of which, I wrote RFC 9289 with my co-author Trond Myklebust, and actually it was invented here at this conference in 2018 or 19, because we were sort of looking at what Microsoft was doing with SMB at that time. I think they were talking about SMB with TLS and maybe with QUIC, and I thought, why can't we do that with NFS? And there was a senior head very close to me at that time, Lars Eggert, who said, 'Well, if you want to do NFS with QUIC, you need to start with NFS with TLS, because you're going to have to specify all that stuff anyway for QUIC, and why not do it for TLS, because that's the technology that's here now and that you can use.' So I said, okay, I'll do that. So there are two service levels. One is encryption only, that is, there's no authentication of the client or server to each other, and the other one is encryption with mutual peer authentication. That is, of course, the much preferred and more secure mode of operation, because, I mean, who cares if you encrypt stuff if you don't know who you're talking to? You could be talking to a very bad actor even though you're encrypted. A man-in-the-middle attack is still very possible. So you want to use mutual peer authentication, of course. Right now, support exists for X.509 certificates on both ends. We would like to do pre-shared keys as well, but I haven't actually gotten to that. But one of the important things here is the ability for us to use hardware offload, because as everyone knows, encryption has a palpable latency and compute cost. So if we can offload encryption and decryption onto another processor, that means that the host CPU's memory cache and processing power are not being wasted on encryption and decryption. It's being done in the network interface card before it even reaches the host. But of course, it also works with the kernel software crypto, so either way works fine.
There's been a lot of talk in the Linux community about memory safety. The Rust folks are gaining confidence about bringing Rust into the Linux kernel as a way of re-engineering some of the kernel architecture, APIs and whatnot, in a memory-safe language. I think that could be a benefit to SunRPC, the lower levels of NFS. So I've been working on a number of ways of trying to enable Rust without actually getting my fingers into the Rust pie. One of the first steps has been to replace the old-school C macro based XDR encoders and decoders with the newer XDR stream utilities. But the next step is going to be actually writing a tool that can read an XDR specification from an RFC and generate code: C code, or Rust code, or Go, or whatever you want. And I've actually done that. That's where I've been spending my time for the last two or three months, building a Python tool that does just that. I'm happy to talk about that for a long time. If you'd like more details, come and see me after the talk.
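For readers who haven't looked at XDR, the following hand-written Python sketch shows the kind of encoder and decoder pair such a generator would have to emit for two basic XDR types. It follows the XDR rules (big-endian 32-bit integers, variable-length opaque data padded to a multiple of four bytes), but the function names are hypothetical, and this is not output from the tool mentioned in the talk.

```python
import struct
from typing import Tuple


def encode_uint32(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def encode_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque: length, bytes, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def decode_uint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, new position)."""
    (value,) = struct.unpack_from(">I", buf, pos)
    return value, pos + 4


def decode_opaque(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Return (data, new position), skipping the padding."""
    length, pos = decode_uint32(buf, pos)
    data = buf[pos:pos + length]
    if len(data) != length:
        raise ValueError("buffer too short for opaque data")
    pad = (4 - length % 4) % 4
    return data, pos + length + pad


# Hypothetical record for illustration: struct { unsigned int id; opaque name<>; }
def encode_record(ident: int, name: bytes) -> bytes:
    return encode_uint32(ident) + encode_opaque(name)


def decode_record(buf: bytes) -> Tuple[int, bytes]:
    ident, pos = decode_uint32(buf, 0)
    name, _ = decode_opaque(buf, pos)
    return ident, name
```

Every field has a fixed, mechanical encoding, which is what makes generating this code from a specification practical, and why bounds checks like the one in `decode_opaque` are a natural fit for a memory-safe language.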
Another thing I've been working on is continuous integration for upstream kernel code. Bruce had this. He basically had a machine in his basement that ran a few tests once in a while, or maybe on a nightly basis. And my feeling was, that's great, but Bruce, A, you're retired. And B, I would like to see this done in public, so that not only do we have public results, but if the tests and the test framework are open source, other people can clone the tests and run them themselves, or they can contribute. So to that end, we've adopted Luis Chamberlain's kdevops. It's something that he built while he was at SUSE several years ago, and it's an open source framework for running tests against file systems. It looks like the URL is right there on the slide, so if you're curious about kdevops and what it is, you can go look at the URL. But what it is in a nutshell is some Ansible and libvirt, so it creates guests, or it uses Terraform to set up cloud shapes, so that you can get a repeatable environment in which to run tests. It's under heavy evolution right now because a lot of people are starting to notice that it's useful. I believe Btrfs and XFS are both file systems that are starting to use it.
We've spent the last year or so building NFS-specific workflows for it. So before, it was just xfstests, and now we've got pynfs, which is a Python-based synthetic client that can drive server unit testing; the Linux Test Project; NFStest, which comes from Jorge Mora at NetApp; and a workflow that runs the Git regression tests on NFS. So it's been a hefty amount of work that we put into this, and I now run this nightly, and so do several others. So we've got a lot of testing going on, a lot more than we used to. But this is in no way complete. We know we've got testing gaps, things like Kerberos and TLS and pNFS. There is some RDMA testing in there using the aforementioned software iWARP emulation. But, you know, obviously CI is something that's very resource intensive, and we need a lot more CPUs.
I thought I'd also mention that I'm regularly testing the long-term stable kernels. Right now, the long-term stable kernels are 5.15, 5.10, 5.4, 6.1, 6.6, and 4.19. These are maintained by the community for a long time, like five or six years after their first release by Linus. We didn't use to pay this kind of attention to the LTS kernels in the NFS community, but I've started to because it's pretty apparent that Linux distributions like to use the LTS kernels. My company uses 5.15 and 5.4. I believe Ubuntu is on 5.10, and that's used in Amazon's cloud. So we really need to have a good level of confidence that nfsd is working well in those LTS kernels. And so I run all of the CI tests, the workflows that I just mentioned, nightly on the LTS kernels to make sure that they're in reasonable shape.
So what's been happening very recently in terms of new features? Interesting. I'm glad that you asked that. Some of you may know that nfsd has supported read delegation, NFS v4 read delegation, for quite some time. Probably since it supported NFS v4, so that would be almost 20 years. But it never supported write delegation. Write delegation is basically a promise by the server to tell the client if there are any other clients who want access to a file. Until that happens, the client that holds the write delegation is free to handle application open requests locally. It doesn't have to go over the wire. That's a pretty hefty performance improvement. It also enables the client to handle byte-range lock requests locally: because it has a delegation, it knows there are no other clients, so it doesn't have to go through the server and ask. We also added support for the CB_GETATTR callback, which allows the server to ask clients that hold delegations, "Oh, by the way, has the size of this file changed locally for you? If so, I can hand that size to other clients that might be asking about it without recalling your delegation." So that's kind of a performance enhancement. The bullet point that starts "Note that" is kind of an important one, and some people don't really realize this about write delegation. There's a POSIX rule about mtime: if an application does a stat, the mtime in the stat has to reflect the most recent write to the file. That means that if you've got two applications running on a client, one of which is writing to a file, and the other one comes along and stats that file, and the client is still caching those writes, so they haven't been sent to the server, then the server's mtime is out of date. And so the client is forced to flush all those writes back to the server and then do the GETATTR to get the mtime. That's an enormous performance overhead.
People don't expect it. The classic problem report is, "Oh, I've got these applications that are writing to files and I did an ls -l in that directory and it took forever." Well, this is why. Before the client can do a GETATTR on that file, it has to flush the writes back. We're hoping that there's enough new mechanism in NFS v4 and the extensions that are coming soon that we can avoid having to flush dirty data back to the server to get an updated mtime. I will get to that in a couple of slides because that's interesting work.
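The flush-before-stat behavior described above can be sketched with a small toy model in Python. This is not client or server code; `Server`, `Client`, and the integer clock are hypothetical stand-ins that only show why a `stat` on a file with cached writes turns into a burst of WRITEs followed by a GETATTR.

```python
class Server:
    """Toy server: its local file system owns the mtime."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.mtime = 0
        self.clock = 0  # stand-in for the server's clock

    def write(self, payload: bytes) -> None:
        self.clock += 1
        self.data += payload
        self.mtime = self.clock  # mtime changes only when a write lands here

    def getattr(self) -> dict:
        return {"size": len(self.data), "mtime": self.mtime}


class Client:
    """Toy client that caches writes until something forces a flush."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.dirty = []  # cached writes not yet sent to the server
        self.writes_sent = 0

    def write(self, payload: bytes) -> None:
        self.dirty.append(payload)  # fast: no network traffic

    def stat(self) -> dict:
        # The server's mtime is stale while writes are cached, so flush first.
        for payload in self.dirty:
            self.server.write(payload)
            self.writes_sent += 1
        self.dirty.clear()
        return self.server.getattr()
```

In this model the cost of `stat` grows with the amount of dirty data, which matches the "ls -l took forever" report.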
We're also working on directory delegation. That's basically a read delegation for directories. So a client can go, 'I need to create ten files.' And it can ask the server, 'Is there anybody else looking at this directory?' And the server can say, 'No.' And so the client doesn't have to synchronously push all of those directory entry creation operations to the server. So it can go a little faster, and there's also a change notification mechanism built into this. This is already implemented in FreeBSD, and we have been working with the FreeBSD author to implement something that's going to interoperate with them. This is specified in NFS v4.1. So it's not something new and crazy. Go ahead.
He's asking about when the client has to update mtime. The server's local file system is the one that controls mtime. That's why the client has to push all those writes back to the server, and then the server can report the mtime, because those writes have to hit the server's local file system. Does that address your question? Yeah, so until the client sends the writes back to the server, the mtime will be... Right. That's why it has to flush those. But hold that thought.
Sort of a change of topic here: not a lot of people know about pNFS and the support for the parallel NFS block layout in nfsd, but it's got it, and it's had it for ten years. We're recognizing the stunning and callous lack of documentation around this, and we're also adding it to the CI workflows that I mentioned before, because I think this is probably a very interesting performance play: the clients can basically do I/O directly to a SCSI LUN using pNFS block. nfsd has a simple implementation, that is, it doesn't do RAID or striping or anything like that, which is in the specification, but which we didn't implement because it's a little more tricky. I think we were initially interested in just being able to put a file system on a SCSI LUN and having clients access it directly via iSCSI operations, and that works pretty well. I've been testing it. We're about to write some documentation to explain to people how to set it up because it's a little tricky. And as I mentioned before, you can do this with NVMe now as well. We think that's a pretty important feature.
Another thing that we did over the last 18 months or so is something called courteous lease extension. Basically, when a network partition occurs between a client and a server, the server gives the client up to 90 seconds to contact it to keep its leases on open and lock state active. And if the server doesn't hear from a client, it will purge that state. It will just toss it out, and it will assume that the client has crashed. Well, a network partition doesn't mean that the client has crashed. It just means that it can't talk to the server. So throwing out open and lock state is a bit draconian, and it is actually pretty disastrous in some cases. So what we do now is, instead of tossing it out at that point, we put the lease information that the server holds into an intermediate state where it stays almost indefinitely, unless another client comes in and wants to open one of the files, and then it's thrown out. There isn't a lot of sharing between clients in NFS. It can happen, but a lot of use cases are covered by just leaving that state alone on the server until the client is able to reconnect. So we think this is a pretty interesting feature.
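The lease handling just described is essentially a three-state machine. Here is a minimal Python sketch of it, assuming hypothetical names (`LeaseState`, `ClientLease`, `open_file`) and ignoring share modes, locks, and everything else a real server tracks. It is a model of the idea, not of the nfsd code.

```python
from enum import Enum

LEASE_SECONDS = 90  # the lease period mentioned in the talk


class LeaseState(Enum):
    ACTIVE = "active"
    COURTESY = "courtesy"  # lease lapsed, but state is kept for now
    EXPIRED = "expired"    # state has been purged


class ClientLease:
    def __init__(self, name: str, now: float) -> None:
        self.name = name
        self.last_renewal = now
        self.state = LeaseState.ACTIVE
        self.open_files = set()

    def renew(self, now: float) -> None:
        """The client got through to the server again."""
        if self.state is LeaseState.EXPIRED:
            raise RuntimeError("state was purged; client must start over")
        self.last_renewal = now
        self.state = LeaseState.ACTIVE

    def tick(self, now: float) -> None:
        """Periodic lease check (toy version)."""
        if self.state is LeaseState.ACTIVE and now - self.last_renewal > LEASE_SECONDS:
            # The older behavior would purge here; now the state is parked.
            self.state = LeaseState.COURTESY


def open_file(path: str, opener: ClientLease, all_clients: list) -> None:
    """Another client's open purges any courtesy client holding the file."""
    for lease in all_clients:
        if lease is opener:
            continue
        if lease.state is LeaseState.COURTESY and path in lease.open_files:
            lease.state = LeaseState.EXPIRED
            lease.open_files.clear()
    opener.open_files.add(path)
```

A partitioned client that comes back before anyone else touches its files simply renews and carries on; it only loses state when another client actually wants one of those files.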
My co-maintainer Jeff has been working on something called multigrain timestamps. One of the problems that we have is that local file systems on Linux don't always store fully fine-grained timestamps in their atime, mtime, and ctime. That's a problem because several changes to a file can happen close enough together that the timestamp granularity doesn't show that the file has actually changed more than once. That can lead to some races where a client doesn't invalidate its cache and so it has a stale version of the data in the file, and bad things happen. The reason why file systems don't update timestamps with that kind of granularity, of course, is that it's expensive. It's a write to durable storage, or at least to your journal, every time a timestamp is updated. So the compromise has been to take fine-grained timestamps only in the cases where we're sure that somebody else is actually looking at the timestamp. That sounds crazy, but it actually works. There are some details that he is still chasing down. For example, you have to maintain a floor timestamp for the file system to make sure that when you're doing a make, the source and the object file timestamps don't go out of order, because that can happen if you're on a big system with lots of CPUs and lots of different time sources. Anyway, we think this is an exciting feature because it's going to make the caching that's done by NFS clients more resilient. And that's about all I can say about that, because he's done the work and I didn't, and it's pretty impressive work, but I don't understand it.
Oh, yeah, there. So it's explained on the slide. Oh, and as the bottom bullet point says, one of the requests when we did this work, especially from Dave Chinner, the former maintainer of XFS, was that we don't want to have to change any code in the file systems to do this. And so Jeff did it all in the VFS. So all file systems that can will get support for this.
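Since the speaker leaves the mechanism mostly to the slide, here is a loose Python sketch of the "only pay for a fine-grained timestamp when somebody has looked" idea. It is an assumption-laden toy: the `Inode` class, the `COARSE_NS` tick, and the `queried` flag are illustrative names, the file-system-wide floor mentioned in the talk is not modeled, and the real VFS logic differs in detail.

```python
COARSE_NS = 4_000_000  # assumed coarse clock tick (4 ms), purely illustrative


class Inode:
    def __init__(self) -> None:
        self.mtime_ns = 0
        self.queried = False  # has anyone read the timestamp since the last update?

    def getattr(self) -> int:
        """Someone (for example an NFS client) looks at the timestamp."""
        self.queried = True
        return self.mtime_ns

    def touch(self, now_ns: int) -> None:
        """The file changed at time now_ns."""
        coarse = now_ns - now_ns % COARSE_NS
        # Cheap coarse stamp unless an observer could otherwise miss this change.
        stamp = now_ns if self.queried else coarse
        # Never let the timestamp move backwards.
        self.mtime_ns = max(self.mtime_ns, stamp)
        self.queried = False
```

Two changes inside one coarse tick look identical when nobody is watching, but once a timestamp has been handed out, the next change is stamped finely enough that the observer can tell the file changed.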
So what do we think is coming in the next year to 18 months?
A lot of new administrative features. Jeff has been building a netlink-based administrative interface for nfsd. The old one is in the proc file system and is way outdated. It's like 30 years old. And we have lots of new things we want to add to the kernel, of course, and changing proc is a nightmare, mostly politics. Oh, we're working on a dynamic thread pool sizing feature where the thread pool can grow and shrink, depending on the workload on the server. Naturally, that's a lot more complicated than it sounds, but some of those patches went into 6.11. Some are going to come in 6.12. Our support for NFS v4 referrals is just a little lacking around the edges. One of the problems is IPv6 addresses, and we don't like the idea of having to support those in exports, which is how referrals are set up now. So we're going to move to an nfsref command-line tool. This is exactly what Solaris does. You basically give the tool an NFS URL and say, you know, point clients at this URL, which can specify an IP address or a domain name, and a port, and a path name on that server.
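To show why a URL is a convenient way to carry a referral target, including IPv6 literals, here is a small Python sketch that splits an `nfs://` URL into host, port, and path using the standard library. The `parse_nfs_url` helper and the example URLs are hypothetical; the syntax the nfsref tool will actually accept is not stated in the talk.

```python
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_NFS_PORT = 2049  # the well-known NFS port


def parse_nfs_url(url: str) -> Tuple[str, int, str]:
    """Split an nfs:// URL into (host, port, path)."""
    parts = urlsplit(url)
    if parts.scheme != "nfs":
        raise ValueError(f"not an NFS URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing server in {url!r}")
    # urlsplit strips the brackets that a URL requires around IPv6 literals.
    return parts.hostname, parts.port or DEFAULT_NFS_PORT, parts.path or "/"
```

For example, `parse_nfs_url("nfs://[2001:db8::1]:20049/export/home")` separates the IPv6 address from the port without any ambiguity about which colons belong to which, which is the awkward part of squeezing such addresses into a line-oriented exports entry.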
I mentioned COPY offload before. I found some problems in it that sort of made me a little sick to my stomach. They're security-related problems, so I had to turn it off in upstream. But I'm working feverishly with the person who authored this code to rectify these problems. I mentioned some of them here. I don't have to go into a lot of detail unless people are really curious. But the upshot is that asynchronous copy completion was signaled only with a CB_OFFLOAD callback. And I've talked to folks in the SCSI world, and they're like, 'You can't do it that way.' Basically, that's not reliable enough to support offloaded copies. You have to have a way of polling for completion. And NFS has one. It's called OFFLOAD_STATUS. But the Linux NFS client doesn't support it yet. So I've written it, and we're working on getting that into the Linux kernel.
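The reliability argument can be illustrated with a toy simulation in Python: a client that relies only on a completion callback is stuck if that callback is lost, while a client that also polls still learns the result. `CopyJob` and `client_wait` are hypothetical names for this sketch and do not correspond to real client or server interfaces.

```python
from typing import Tuple


class CopyJob:
    """Toy server-side asynchronous copy."""

    def __init__(self, total: int, chunk: int) -> None:
        self.total = total
        self.chunk = chunk
        self.copied = 0

    def tick(self) -> None:
        """Some time passes and the server copies another chunk."""
        self.copied = min(self.total, self.copied + self.chunk)

    @property
    def done(self) -> bool:
        return self.copied >= self.total

    def offload_status(self) -> Tuple[int, bool]:
        """What a polling client would learn: bytes copied so far, and completion."""
        return self.copied, self.done


def client_wait(job: CopyJob, callback_lost: bool, max_ticks: int = 100) -> Tuple[str, int]:
    """Wait for the copy; return how completion was learned and the byte count."""
    for _ in range(max_ticks):
        job.tick()
        if job.done and not callback_lost:
            return "callback", job.copied
        copied, complete = job.offload_status()  # fallback: ask the server
        if complete:
            return "poll", copied
    raise TimeoutError("copy did not complete")
```

With `callback_lost=True`, the only reason `client_wait` ever returns is the polling step; remove it and the loop runs until the timeout.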
There's a new draft standard called delstid that allows you to do the same kind of delegation tricks that you do with data, read and write, and with directories, with some attributes. So if you remember the thing that we were talking about before, about mtimes, basically it says, okay, for the time being, Mr. Client, you can manage the mtime all you want, and you can update it as often as you need to. You have to have a write delegation to do this, and when you hand the write delegation back, tell me what the mtime ended up being, and I will put that in the file. So that obviates the need for having to push all those writes back to the server in order to get an up-to-date mtime. Basically, the client can do it itself. There's another feature in here. Today, when the server hands out a delegation, it returns an open stateid and a delegation stateid. With this extension...
Can you speak up, please? Are we trusting the client's clock to set that mtime? Yes, we are. Yes, no. You get the whole... mtime. The client gets the whole mtime. Yes.
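A small Python model may help show what changes when the client is allowed to manage the mtime under a write delegation. The `ServerFile` and `ClientDelegation` classes are hypothetical, the timestamps are plain numbers supplied by the caller, and the sketch takes the speaker's answer at face value: the server accepts whatever mtime the client hands back.

```python
from dataclasses import dataclass


@dataclass
class ServerFile:
    """Toy server-side view of one file."""
    mtime: float = 0.0
    delegated: bool = False

    def grant_write_delegation(self) -> "ClientDelegation":
        if self.delegated:
            raise RuntimeError("write delegation already outstanding")
        self.delegated = True
        return ClientDelegation(self, self.mtime)


class ClientDelegation:
    """Client-side state while it holds the write delegation."""

    def __init__(self, server_file: ServerFile, mtime: float) -> None:
        self.server_file = server_file
        self.mtime = mtime
        self.round_trips = 0

    def local_write(self, client_now: float) -> None:
        # The client stamps the write with its own clock.
        self.mtime = client_now

    def stat_mtime(self) -> float:
        # Answered locally: no flush and no GETATTR needed.
        return self.mtime

    def give_back(self) -> None:
        # On delegation return, report the final mtime to the server.
        self.round_trips += 1
        self.server_file.mtime = self.mtime
        self.server_file.delegated = False
```

While the delegation is held, `stat_mtime` never touches the server; the server's copy of the mtime only catches up when `give_back` runs.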
As I was mentioning, today, when an OPEN returns a delegation, you get back an open stateid and a delegation stateid, and this draft allows the server to return one or the other, because generally what happens is, if it returns a delegation stateid, a read or write delegation, the client's going to go, 'Well, I don't need to keep the open stateid around. Here it is back.' So let's just get rid of that little piece of work. The server says, okay, I'm returning an open stateid, no delegation, or I'm returning a read or write delegation, not both. A lot of that's coming in 6.12. I just got a problem report with one of these things, and I need to revert some of this, so it's going to be a little more delayed for some of it. But the client already supports this stuff.
So I mentioned there's going to be more yummy stuff about support for access control lists. And here it is. Basically, for as long as NFSv4 has been around, we've had NFSv4 ACLs, which are richer than POSIX ACLs but not compatible. So now it's been proposed, hey, what if we add support for POSIX ACLs in NFSv4 as well? So we'll have the ability for clients to ask, you know, 'What do you have on this file, Mr. Server? Do you have an NFSv4 ACL or a POSIX ACL?' And the server gets to say, 'Oh, the file system supports only POSIX, so here's your POSIX ACL.' Or, 'The file system supports NFSv4 ACLs, so here's your NFSv4 ACL.' Go ahead.
He wanted me to comment on the demand, in the Linux world, for support for POSIX ACLs. It's more than a low hum. People ask for it all the time. It's a real pain in the butt because, as I mentioned, the translation between NFSv4 ACLs and POSIX ACLs is lossy. So a client might set a POSIX ACL, and it has to be translated into an NFSv4 ACL and sent to the server. The server then translates that back to store it in the file system, because the file system only supports POSIX ACLs. And reading it has to come all the way back through two translations again.
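To see why a translation like this can lose information, here is a deliberately simplified Python sketch. The mapping table is a toy assumption made for illustration, not the algorithm the Linux server uses; only the permission names (READ_DATA, WRITE_DATA, APPEND_DATA, EXECUTE) are borrowed from NFSv4 ACE mask bits. The point is that NFSv4 permissions are finer-grained than POSIX `rwx`, so a trip through POSIX can drop bits.

```python
# Toy mapping (an assumption for illustration): each POSIX permission letter
# stands for a set of finer-grained NFSv4 permission bits.
POSIX_TO_NFS4 = {
    "r": {"READ_DATA"},
    "w": {"WRITE_DATA", "APPEND_DATA"},
    "x": {"EXECUTE"},
}


def posix_to_nfs4(perms: str) -> set:
    """Expand a POSIX permission string such as 'rw' into NFSv4-style bits."""
    bits = set()
    for p in perms:
        bits |= POSIX_TO_NFS4[p]
    return bits


def nfs4_to_posix(bits: set) -> str:
    """Keep a POSIX letter only if every NFSv4 bit it stands for is present."""
    return "".join(p for p in "rwx" if POSIX_TO_NFS4[p] <= bits)


# An NFSv4 permission set with no exact POSIX equivalent:
# read plus append-only access.
original = {"READ_DATA", "APPEND_DATA"}
stored = nfs4_to_posix(original)      # what a POSIX-only file system can keep
round_trip = posix_to_nfs4(stored)    # what comes back over the wire

# A permission set that POSIX can express survives the round trip.
lossless = nfs4_to_posix(posix_to_nfs4("rw"))
```

Here APPEND_DATA disappears on the way through POSIX, which is the kind of loss that native POSIX ACL support in the protocol would avoid.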
Can I name the file systems? So the question is, which file systems in Linux are asking for support for POSIX ACLs? All of them. We don't have any file systems native to Linux that support NFSv4 ACLs, and so they all want POSIX ACL support in them.
Yes, there is NFS Birds of a Feather tomorrow night. How many minutes do I have left? Let me, sure, I only have like two slides left anyway.
So there's this new concept that's come up recently called local I/O. Basically, if the client and server determine that they're both running on the same physical host, like they're in separate containers but on the same physical host, then the client can use... I don't want to say direct I/O, because everybody knows what direct I/O is and this is not direct I/O. But the client does I/O internally in the kernel on the physical machine, so that the read and write happen without going through NFS. This is work that's been contributed by Hammerspace. They've been doing something like this, prototyping it and actually, I guess, running it at customer sites for the last ten years. So there were some issues to work out in terms of security, but now I think it's going to come either in 6.12 or 6.13. It's imminent. If you want more information about it, I believe the author, or the person who's responsible for getting this upstream, is going to be giving a talk tomorrow afternoon, the pNFS Hammerspace talk.
And then, you know, the talk wouldn't be complete if I weren't going to say we're going to, you know, rip and replace some stuff. NFSv2 is probably something that's not used anywhere anymore. We're pretty sure it's not something we need to support, and it would help our test matrix if we could get rid of it. So we're thinking about that. I mentioned at the beginning of the slides that we still support UDP. Indeed, we do, but that's another thing that would help our test matrix: if we didn't have to support that, we could take it out of the kernel. I mentioned the SHA-1 encryption types. Those do have a deprecation schedule mandated by the U.S. government. And some people have pined for a day when they don't ever have to deal with NFSv4 minor version 0 again, because it's got lots of holes and everybody's using v4.1 now to great effect. So there's some talk about removing it. I don't know how serious that talk is, but...
Well, I just want to thank my hosts and the folks I work with because, as I said, you know, this is a very feature-rich piece of software, and it wouldn't be nearly as good as it is without the support I get from my contributors and reviewers and co-maintainer. Thank you.
Any questions? Go ahead.
The question is, is there anything coming in the identity-based authentication space outside of Kerberos? There's some talk. We're looking at OAuth 2. But yeah, I mean, if you'd like to steer the standards process a little bit, if Microsoft would have any suggestions for us that would help your cloud work or whatever, then I encourage you to get involved, because it's in the early stages. We haven't even written anything yet. Well, Tigran has written something, but it's an incomplete problem statement, and he would like a lot of eyeballs on it so we can start steering that. So the short answer is not much. Any other questions? Go ahead.
This is Paul from Future Looking. Tracking the progress of Rust in the Linux kernel is a great joy. There hasn't been any real movement for Rust doing file system work yet, let alone this thing. In terms of being able to upstream it, is there something that you think would be a good fit for something other than C in the Linux kernel, or even in the NFS server space?
What languages do you think will be better than C for...
Rust in particular, because it's the only other language that's supported in the kernel.
So, I don't agree with the presupposition that Rust has been soundly rejected in the Linux file system community. Actually, there was a presentation at this year's LSF by the guy who's actually writing the new APIs.
I don't mean to suggest it's been rejected, just that I haven't seen it actually get upstream yet.
It's complicated.
I get that.
The paradigms that are available in Rust are very different. And the approach that the fellow took was to basically reconstruct all of the APIs. The problem is he's not a file system guy. And so he didn't really understand why the APIs work the way they do. And he correctly observed that there isn't a lot of really good documentation for the current APIs in C. So what he constructed was something that raised the hackles of the more seasoned developers. And maybe he mistook that as another rejection. And so he's sort of gone off and started doing other things, which is too bad. There are others who are carrying on the work. And I'm very interested in seeing what Rust can bring to things like NFS, the remote file systems. And NFS is not the only one in the Linux kernel, by far. The other choices are things like Go or something else. I haven't heard much there. Rust seems to be the loudest contender. But I think work is ongoing, and we're going to see some progress there in the next year to two years. I probably need to get off the stage now. Thank you for coming.