Message ID: 20240702162831.91604-1-snitzer@kernel.org
Series: nfs/nfsd: add support for localio
> On Jul 2, 2024, at 12:28 PM, Mike Snitzer <snitzer@kernel.org> wrote: > > Hi, > > There seems to be consensus that these changes are worthwhile and > extensively iterated on. I don't see a public consensus about "extensively iterated on". The folks you talk to privately might believe that, though. > I'd very much like these changes to land upstream as-is (unless review > teases out some show-stopper). These changes have been tested fairly > extensively (via xfstests) at this point. > > Can we now please provide formal review tags and merge these changes > through the NFS client tree for 6.11? Contributors don't get to determine the kernel release where their code lands; maintainers make that decision. You've stated your preference, and we are trying to accommodate. But frankly, the (server) changes don't stand up to close inspection yet. One of the client maintainers has had years to live with this work. But the server maintainers had their first look at this just a few weeks ago, and this is not the only thing any of us have on our plates at the moment. So you need to be patient. > FYI: > - I do not intend to rebase this series on top of NeilBrown's partial > exploration of simplifying away the need for a "fake" svc_rqst > (noble goals and happy to help those changes land upstream as an > incremental improvement): > https://marc.info/?l=linux-nfs&m=171980269529965&w=2 Sorry, rebasing is going to be a requirement. Again, as with the dprintk stuff, this is code that would get reverted or replaced as soon as we merge. We don't knowingly merge that kind of code; we fix it first. To make it official, for v11 of this series: Nacked-by: Chuck Lever <chuck.lever@oracle.com> I'll be much more ready to consider an Acked-by: once the "fake svc_rqst" code has been replaced. > - In addition, tweaks to use nfsd_file_acquire_gc() instead of > nfsd_file_acquire() aren't a priority. The discussion has moved well beyond that now... IIUC the preferred approach might be to hold the file open until the local app is done with it. However, I'm still not convinced there's a benefit to using the NFSD file cache vs. a plain dentry_open(). Neil's clean-up might not need to add a new nfsd_file_acquire() API if we go with plain dentry_open(). There are still interesting choices to make here before it is merged, so IMO the choices around nfsd_file_acquire() remain a priority for merge-readiness. -- Chuck Lever
On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote: > > > > On Jul 2, 2024, at 12:28 PM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > Hi, > > > > There seems to be consensus that these changes are worthwhile and > > extensively iterated on. > > I don't see a public consensus about "extensively iterated > on". The folks you talk to privately might believe that, > though. > > > > I'd very much like these changes to land upstream as-is (unless review > > teases out some show-stopper). These changes have been tested fairly > > extensively (via xfstests) at this point. > > > > Can we now please provide formal review tags and merge these changes > > through the NFS client tree for 6.11? > > Contributors don't get to determine the kernel release where > their code lands; maintainers make that decision. You've stated > your preference, and we are trying to accommodate. But frankly, > the (server) changes don't stand up to close inspection yet. > > One of the client maintainers has had years to live with this > work. But the server maintainers had their first look at this > just a few weeks ago, and this is not the only thing any of us > have on our plates at the moment. So you need to be patient. > > > > FYI: > > - I do not intend to rebase this series on top of NeilBrown's partial > > exploration of simplifying away the need for a "fake" svc_rqst > > (noble goals and happy to help those changes land upstream as an > > incremental improvement): > > https://marc.info/?l=linux-nfs&m=171980269529965&w=2 > > Sorry, rebasing is going to be a requirement. What? You're imposing a rebase on completely unfinished and untested code? Any idea when Neil will post v2? Or am I supposed to take his partial first pass and fix it? > Again, as with the dprintk stuff, this is code that would get > reverted or replaced as soon as we merge. We don't knowingly > merge that kind of code; we fix it first. Nice rule, except there is merit in tested code landing without it having to see last minute academic changes. These aren't dprintk, these are disruptive changes that aren't fully formed. If they were fully formed I wouldn't be resisting them. > To make it official, for v11 of this series: > > Nacked-by: Chuck Lever <chuck.lever@oracle.com> Thanks for that. > I'll be much more ready to consider an Acked-by: once the > "fake svc_rqst" code has been replaced. If Neil completes his work I'll rebase. But last time I rebased to his simplification of the localio protocol (to use array and not lists, nice changes, appreciated but it took serious work on my part to fold them in): the code immediately BUG_ON()'d in sunrpc trivially. So please be considerate of my time and the requirement for code to actually work. I'm fine with these changes not landing for 6.11 if warranted. I just seriously question the arbitrary nature of what constitutes necessary change to allow inclusion. > > - In addition, tweaks to use nfsd_file_acquire_gc() instead of > > nfsd_file_acquire() aren't a priority. > > The discussion has moved well beyond that now... IIUC the > preferred approach might be to hold the file open until the > local app is done with it. However, I'm still not convinced > there's a benefit to using the NFSD file cache vs. a plain > dentry_open(). Saving an nfs_file to open_context, etc. All incremental improvement (that needs time to stick the landing). Why do you think it appropriate to cause upheaval on code that has clearly drawn a line in the sand in terms of established fitness? 
Eliding allocation of things and micro-optimizing can come later. But I guess I'll just have to agree to disagree with this approach. Really feels like I'll be forced to keep both pieces when it breaks in the near-term. By all means layer on new improvements. But this refusal to establish a baseline out of fear that we _might_ change it: I don't even know where to begin with that. > Neil's clean-up might not need to add a new nfsd_file_acquire() > API if we go with plain dentry_open(). > > There are still interesting choices to make here before it > is merged, so IMO the choices around nfsd_file_acquire() > remain a priority for merge-readiness. Maybe Neil will post a fully working v12 rebased on his changes. Mike
> On Jul 2, 2024, at 2:32 PM, Mike Snitzer <snitzer@kernel.org> wrote: > > On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote: >> >> >>> On Jul 2, 2024, at 12:28 PM, Mike Snitzer <snitzer@kernel.org> wrote: >>> >>> Hi, >>> >>> There seems to be consensus that these changes are worthwhile and >>> extensively iterated on. >> >> I don't see a public consensus about "extensively iterated >> on". The folks you talk to privately might believe that, >> though. >> >> >>> I'd very much like these changes to land upstream as-is (unless review >>> teases out some show-stopper). These changes have been tested fairly >>> extensively (via xfstests) at this point. >>> >>> Can we now please provide formal review tags and merge these changes >>> through the NFS client tree for 6.11? >> >> Contributors don't get to determine the kernel release where >> their code lands; maintainers make that decision. You've stated >> your preference, and we are trying to accommodate. But frankly, >> the (server) changes don't stand up to close inspection yet. >> >> One of the client maintainers has had years to live with this >> work. But the server maintainers had their first look at this >> just a few weeks ago, and this is not the only thing any of us >> have on our plates at the moment. So you need to be patient. >> >> >>> FYI: >>> - I do not intend to rebase this series on top of NeilBrown's partial >>> exploration of simplifying away the need for a "fake" svc_rqst >>> (noble goals and happy to help those changes land upstream as an >>> incremental improvement): >>> https://marc.info/?l=linux-nfs&m=171980269529965&w=2 >> >> Sorry, rebasing is going to be a requirement. > > What? You're imposing a rebase on completely unfinished and untested > code? Any idea when Neil will post v2? Or am I supposed to take his > partial first pass and fix it? Don't be ridiculous. Wait for Neil to post a working version. >> Again, as with the dprintk stuff, this is code that would get >> reverted or replaced as soon as we merge. We don't knowingly >> merge that kind of code; we fix it first. > > Nice rule, except there is merit in tested code landing without it > having to see last minute academic changes. These aren't dprintk, > these are disruptive changes that aren't fully formed. If they were > fully formed I wouldn't be resisting them. It's your server patch that isn't fully formed. Allocating a fake svc_rqst outside of an svc thread context and adding a work-around to avoid the cache lookup deferral is nothing but a hacky smelly prototype. It's not merge-ready or -worthy. > If Neil completes his work I'll rebase. But last time I rebased to > his simplification of the localio protocol (to use array and not > lists, nice changes, appreciated but it took serious work on my part > to fold them in): the code immediately BUG_ON()'d in sunrpc trivially. You should be very grateful that Neil is writing your code for you. He's already contributed much more than you have any reason to expect from someone who is not employed by Hammerspace. And quite frankly, it is not reasonable to expect anyone's freshly written code to be completely free of bugs. I'm sorry it took you a little while to find the problem, but it will become easier when you become more familiar with the code base. > So please be considerate of my time and the requirement for code to > actually work. 
I'll be considerate when you are considerate of our time and stop patch bombing the list with tiny incremental changes, demanding we "get the review done and merge it" before it is ready. Honestly, the work is proceeding quite unremarkably for a new feature. The problem seems to be that you don't understand why we're asking for (actually quite small) changes before merging, and we're asking you to do that work. Why are we asking you to do it? It's because you are asking for /our/ time. But we don't work for Hammerspace and do not have any particular interest in localIO and have no real way to test the facility yet (no, running fstests does not count as a full test). It's your responsibility to get this code put together, it's got to be your time and effort. You are getting paid to deal with this. None of the rest of us are. No-one else is asking for this feature. >>> - In addition, tweaks to use nfsd_file_acquire_gc() instead of >>> nfsd_file_acquire() aren't a priority. >> >> The discussion has moved well beyond that now... IIUC the >> preferred approach might be to hold the file open until the >> local app is done with it. However, I'm still not convinced >> there's a benefit to using the NFSD file cache vs. a plain >> dentry_open(). > > Saving an nfs_file to open_context, etc. All incremental improvement > (that needs time to stick the landing). You are still missing the point. The phony svc_rqst is being passed into nfsd_file_acquire(). Either we have to fix nfsd_file_acquire (as Neil did) or replace its use with fh_verify() / dentry_open(). This is not about garbage collection, and hasn't been for a while. It's about replacing unmergeable prototype code. And sticking the landing? If a few good fstests results are supposed to be good enough for us to merge your code as it exists now, why aren't they good enough to verify your code is OK to merge after a rebase? -- Chuck Lever
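[For context, the dentry_open() path being discussed above would have roughly this shape. This is a hypothetical sketch: the localio_open() name and the assumption that the filehandle has already been resolved to a struct path are illustrative, not code from the series.]

    #include <linux/fs.h>
    #include <linux/cred.h>

    /* Hypothetical: open a struct file straight from a resolved path,
     * avoiding the nfsd file cache (and therefore the fake svc_rqst). */
    static struct file *localio_open(const struct path *path, fmode_t mode)
    {
            int flags = (mode & FMODE_WRITE) ? O_WRONLY : O_RDONLY;

            /* dentry_open() wants an already-resolved path plus creds */
            return dentry_open(path, flags | O_LARGEFILE, current_cred());
    }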
On Wed, 03 Jul 2024, Mike Snitzer wrote: > > Maybe Neil will post a fully working v12 rebased on his changes. Maybe I will, but it won't be before Friday. I too wonder about the unusual expectation of haste, and what its real source is. NeilBrown
I am an upstream Linux kernel maintainer too. My ideals and approach are different but they are my own ;) The first localio RFC (that made it to the list as v2) was posted on June 11. I have tried to work well with you and everyone willing to help and engage. So for it to come to this exchange is unfortunate. Development with excess rebases is just soul-sucking. My v11's 0th header certainly conveyed exhaustion with that aspect of how things have gone as this series has evolved. I clearly upset you by suggesting v11 is suitable to merge for 6.11. I really wasn't trying to be pushy. I didn't think it controversial, but I concede not giving you much to work with if/when you disagreed. Sorry about painting you into a corner. v11 is a solid basis to develop upon further. I am all for iterating further, am aware it is my burden to carry, and am hopeful we can get localio staged in linux-next early during the 6.12 development window. Let it soak (Anna's instinct was solid). However, I'm hopeful to avoid the hell of frequent rebasing on top of refactored code that optimizes approaches that this v11 baseline provides. So I'd like to propose I carry the v11 baseline in my git tree: https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next And any changes (e.g. Neil's promising refactor to avoid needing "fake" svc_rqst) can be based on 'nfs-localio-for-next' with standalone incremental commits that can possibly get folded via a final rebase once we're happy with the end result of the changes? Thanks, Mike
On Wed, Jul 03, 2024 at 10:52:20AM +1000, NeilBrown wrote: > On Wed, 03 Jul 2024, Mike Snitzer wrote: > > > > Maybe Neil will post a fully working v12 rebased on his changes. > > Maybe I will, but it won't be before Friday. No problem! I can also just run with the first patchset you provided. But I'm hopeful you're OK with us doing incremental changes to this "v11" baseline: https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next Please see the other reply I just sent for more context on why I hope this works for you. Happy to do a final rebase once the code is settled. > I too wonder about the unusual expectation of haste, and what its real > source is. The desire to settle on an approach is all, to allow development to settle and ultimately to move on to developing something else in NFS. Mike
On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote: > To make it official, for v11 of this series: > > Nacked-by: Chuck Lever <chuck.lever@oracle.com> We've also not even looked into tackling the whole memory reclaim recursion problem that has historically made local loopback network file systems an unsupported configuration. We have an ongoing discussion on the XFS list that really needs to go to fsdevel and mm to make any progress first. I see absolutely no chance to solve that in this merge window. I'm also a bit surprised and shocked by the rush here.
On Tue, Jul 02, 2024 at 10:04:56PM -0700, Christoph Hellwig wrote: > On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote: > > To make it official, for v11 of this series: > > > > Nacked-by: Chuck Lever <chuck.lever@oracle.com> > > We've also not even looked into tackling the whole memory reclaim > recursion problem that has historically made local loopback > network file systems an unsupported configuration. We have an > ongoing discussion on the XFS list that really needs to go to fsdevel > and mm to make any progress first. I see absolutely no chance to > solve that in this merge window. I'm also a bit surprised and > shocked by the rush here. linux-nfs and linux-xfs both received those emails, there isn't some secret game here: https://marc.info/?l=linux-nfs&m=171976530216518&w=2 https://marc.info/?l=linux-xfs&m=171976530416523&w=2 I pivoted away from that (after Dave's helpful response) and back to what I provided in most every revision of this patchset (with different header and code revisions), most recent being patch 18 of this v11 series: https://marc.info/?l=linux-nfs&m=171993773109538&w=2 And if spending the past 2 months discussing and developing in the open is rushing things, I clearly need to slow down... If only I had reason to think others were considering merging these changes: https://marc.info/?l=linux-nfs&m=171942776105165&w=2 Ultimately I simply wanted to keep momentum up, I'm sure you can relate to having a vision for phasing changes in without missing a cycle. But happy to just continue working it into the 6.12 development window. I'll be sure to cc linux-fsdevel on future revision(s).
On Wed, Jul 03, 2024 at 04:52:34AM -0400, Mike Snitzer wrote: > Ultimately I simply wanted to keep momentum up, I'm sure you can > relate to having a vision for phasing changes in without missing a > cycle. But happy to just continue working it into the 6.12 > development window. It just feels really rushed to have something with cross-subsystem communication going in past -rc6 in a US holiday week. Sometimes not rushing things too much will lead to much better results.
On Wed, Jul 03, 2024 at 07:16:31AM -0700, Christoph Hellwig wrote: > On Wed, Jul 03, 2024 at 04:52:34AM -0400, Mike Snitzer wrote: > > Ultimately I simply wanted to keep momentum up, I'm sure you can > > relate to having a vision for phasing changes in without missing a > > cycle. But happy to just continue working it into the 6.12 > > development window. > > It just feels really rushed to have something with cross-subsystem > communication going in past -rc6 in a US holiday week. Sometimes > not rushing things too much will lead to much better results. Yes, I knew it to be very tight given the holiday. I should've just yielded to the reality of the calendar and there being some extra changes needed (remove "fake" svc_rqst in fs/nfsd/localio.c -- I was hopeful that could be done incrementally after merge but I digress). Will welcome any help you might offer to optimize localio as much as possible (doesn't need to be in near-term, whenever you might have time to look). Its current approach to use synchronous buffered read_iter and write_iter, with active waiting, should be improved. But Dave's idea to go full RMW to be page aligned will complicate a forecasted NFS roadmap item to allow for: "do-not-cache capabilities, so that the NFS server can turn off the buffer caching of files on clients (force O_DIRECT-like writing/reading)". But even that seems a catch-22 given the NFS client doesn't enforce DIO alignment.
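[The "synchronous buffered read_iter ... with active waiting" pattern Mike describes has roughly this shape in kernel code. A sketch based only on the description above, not the actual patch; localio_sync_read() is a made-up name.]

    #include <linux/fs.h>
    #include <linux/uio.h>

    static ssize_t localio_sync_read(struct file *file,
                                     struct iov_iter *iter, loff_t *pos)
    {
            struct kiocb kiocb;
            ssize_t ret;

            init_sync_kiocb(&kiocb, file);  /* synchronous: no ki_complete */
            kiocb.ki_pos = *pos;

            /* calls file->f_op->read_iter() and waits for the result */
            ret = call_read_iter(file, &kiocb, iter);
            if (ret > 0)
                    *pos = kiocb.ki_pos;
            return ret;
    }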
I've started looking a bit at the code, and the architectural model confuses me more than a bit. A first thing that would be very helpful is an actual problem statement. The only mention of a concrete use case is about containers, implying that this is about a client in one container/namespace with the server or the servers in another container/namespace. Is that the main use case, are there others? I kinda deduce from that that the client and server probably do not have the same view and access permissions to the underlying file systems? As this would defeat the use of NFS I suspect that is the case, but it should probably be stated clearly somewhere. Going from there I don't understand why we need multiple layers of server bypass. The normal way to do this in NFSv4 is to use pNFS layout. I.e. you add a pnfs localio layout that just does local reads and writes for the I/O path. We'd still need to find a good in-kernel way to get the file structure, but compared to the two separate layers of bypasses in the current code it should be significantly simpler.
On Wed, Jul 03, 2024 at 11:11:51AM -0400, Mike Snitzer wrote: > Will welcome any help you might offer to optimize localio as much as > possible (doesn't need to be in near-term, whenever you might have > time to look). Its current approach to use synchronous buffered > read_iter and write_iter, with active waiting, should be improved. > > But Dave's idea to go full RMW to be page aligned will complicate a > forecasted NFS roadmap item to allow for: "do-not-cache capabilities, > so that the NFS server can turn off the buffer caching of files on > clients (force O_DIRECT-like writing/reading)". But even that seems a > catch-22 given the NFS client doesn't enforce DIO alignment. As I just wrote in another mail I've now looked at the architecture, and either I'm missing some unstated requirements, or the whole architecture seems very overcomplicated and suboptimal. If localio actually were just a pNFS layout type you could trivially do asynchronous direct I/O from the layout driver, and bypass a lot of the complexity. The actual way to find the file struct still would be nasty, but I'll try to think of something good for that.
> On Jul 3, 2024, at 11:18 AM, Christoph Hellwig <hch@infradead.org> wrote: > The actual way > to find the file struct still would be nasty, but I'll try to think of > something good for that. It is that very code that I've asked to be replaced before this series can be merged. We have a set of patches for improving that aspect that Neil is working on now. When Mike presented LOCALIO to me at LSF, my initial suggestion was to use pNFS. I think Jeff had the same reaction. IMO the design document should, as part of the problem statement, explain why a pNFS-only solution is not workable. I'm also concerned about applications in one container being able to reach around existing mount namespace silos into the NFS server container's file systems. Obviously the NFS protocol has its own authorization that would grant permission for that access, but via the network. -- Chuck Lever
> On Jul 3, 2024, at 4:52 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > And if spending the past 2 months discussing and developing in the > open is rushing things, I clearly need to slow down... > > If only I had reason to think others were considering merging these > changes: https://marc.info/?l=linux-nfs&m=171942776105165&w=2 There is no mention of a particular kernel release in that email, nor is there a promise that we could hit the v6.11 merge window. In particular, I was asking how the series should be split up, since it modifies two separately maintained subsystems. I apologize for not asking this more clearly. -- Chuck Lever
On Wed, Jul 03, 2024 at 08:16:00AM -0700, Christoph Hellwig wrote: > I've started looking a bit at the code, and the architectural model > confuses me more than a bit. > > A first thing that would be very helpful is an actual problem statement. > The only mention of a concrete use case is about containers, implying > that this is about a client in one container/namespace with the server > or the servers in another container/namespace. Is that the main use > case, are there others? Containers are a significant use case, but also any client that might need to access local storage efficiently (e.g. GPU service running NFS client that needs to access NVMe on same host). > I kinda deduce from that that the client and server probably do not > have the same view and access permissions to the underlying file > systems? As this would defeat the use of NFS I suspect that is the > case, but it should probably be stated clearly somewhere. I can tighten that up in the Documentation. > Going from there I don't understand why we need multiple layers of > server bypass. The normal way to do this in NFSv4 is to use pNFS > layout. > > I.e. you add a pnfs localio layout that just does local reads > and writes for the I/O path. We'd still need to find a good > in-kernel way to get the file structure, but compared to the two > separate layers of bypasses in the current code it should be > significantly simpler. Using pNFS layout isn't viable because NFSv3 is very much in the mix (flexfiles layout) for Hammerspace.
On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote: > I'm also concerned about applications in one container being > able to reach around existing mount namespace silos into the > NFS server container's file systems. Obviously the NFS protocol > has its own authorization that would grant permission for that > access, but via the network. Yes. One good way I can think of is to use SCM_RIGHTS to duplicate a file descriptor over a unix socket. For that we'd need a way to actually create that unix socket first and I also don't think we currently have support for using that in-kernel, but it's a well-known way to hand file descriptors to other processes. A big plus would be that this would even work with non-kernel servers (or even clients for that matter) as long as they run on the same kernel (including non-Linux kernels).
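[For reference, the well-known userspace shape of SCM_RIGHTS fd passing is below; send_fd() is an illustrative helper, and as Christoph notes, equivalent in-kernel plumbing would still have to be written.]

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send fd over a connected AF_UNIX socket; the kernel installs a
     * duplicate of the file in the receiving process's fd table. */
    static int send_fd(int sock, int fd)
    {
            char byte = 'F';
            struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
            union {                 /* aligned cmsg buffer for one int */
                    char buf[CMSG_SPACE(sizeof(int))];
                    struct cmsghdr align;
            } u;
            struct msghdr msg = {
                    .msg_iov = &iov, .msg_iovlen = 1,
                    .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
            };
            struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

            return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }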
On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote: > > > > On Jul 3, 2024, at 11:18 AM, Christoph Hellwig <hch@infradead.org> wrote: > > The actual way > > to find the file struct still would be nasty, but I'll try to think of > > something good for that. > > It is that very code that I've asked to be replaced before this > series can be merged. We have a set of patches for improving > that aspect that Neil is working on now. > > When Mike presented LOCALIO to me at LSF, my initial suggestion > was to use pNFS. I think Jeff had the same reaction. No, Jeff suggested using an O_TMPFILE-based thing for the localio handshake. But he had the benefit of knowing NFSv3 is important for the intended localio use case, so I'm not aware of him having pNFS design ideas. > IMO the design document should, as part of the problem statement, > explain why a pNFS-only solution is not workable. Sure, I can add that. I explained the NFSv3 requirement when we discussed at LSF. > I'm also concerned about applications in one container being > able to reach around existing mount namespace silos into the > NFS server container's file systems. Obviously the NFS protocol > has its own authorization that would grant permission for that > access, but via the network. Jeff also had concerns there (as did I) but we arrived at NFS having the ability to do it over the network, so doing it with localio is ultimately "OK". That said, localio isn't taking special action to escape mount namespaces (that I'm aware of) and in practice there are no requirements to do so.
On Wed, Jul 03, 2024 at 03:26:30PM +0000, Chuck Lever III wrote: > > > > On Jul 3, 2024, at 4:52 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > And if spending the past 2 months discussing and developing in the > > open is rushing things, I clearly need to slow down... > > > > If only I had reason to think others were considering merging these > > changes: https://marc.info/?l=linux-nfs&m=171942776105165&w=2 > > There is no mention of a particular kernel release in that > email, nor is there a promise that we could hit the v6.11 > merge window. > > In particular, I was asking how the series should be split > up, since it modifies two separately maintained subsystems. > > I apologize for not asking this more clearly. No problem, I just got ahead of myself. Not your fault.
On Wed, 2024-07-03 at 11:36 -0400, Mike Snitzer wrote: > On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote: > > > > > > > On Jul 3, 2024, at 11:18 AM, Christoph Hellwig > > > <hch@infradead.org> wrote: > > > The actual way > > > to find the file struct still would be nasty, but I'll try to > > > think of > > > something good for that. > > > > It is that very code that I've asked to be replaced before this > > series can be merged. We have a set of patches for improving > > that aspect that Neil is working on now. > > > > When Mike presented LOCALIO to me at LSF, my initial suggestion > > was to use pNFS. I think Jeff had the same reaction. > > No, Jeff suggested using an O_TMPFILE-based thing for the localio > handshake. But he had the benefit of knowing NFSv3 is important for the > intended localio use case, so I'm not aware of him having pNFS design > ideas. > The other problem with doing this is that if a server is running in a container, how is it to know that the client is in a different container on the same host, and hence that it can give out a localio layout? We'd still need some way to detect that anyway, which would probably look a lot like the localio protocol. > > IMO the design document should, as part of the problem statement, > > explain why a pNFS-only solution is not workable. > > Sure, I can add that. > > I explained the NFSv3 requirement when we discussed at LSF. > > > I'm also concerned about applications in one container being > > able to reach around existing mount namespace silos into the > > NFS server container's file systems. Obviously the NFS protocol > > has its own authorization that would grant permission for that > > access, but via the network. > > Jeff also had concerns there (as did I) but we arrived at NFS having > the ability to do it over the network, so doing it with localio is > ultimately "OK". That said, localio isn't taking special action to escape mount > namespaces (that I'm aware of) and in practice there are no > requirements to do so. The one thing I think we need to ensure is that an unauthorized NFS client on the same kernel can't use this to bypass export permission checks. IOW, suppose we have a client and server on the same host. The server allows the client to access some of its exports, but not all. The rest are restricted to only certain IP addresses. Can the client use its localio access to bypass that since it's not going across the network anymore? Maybe by using open_by_handle_at on the NFS share on a guessed filehandle? I think we need to ensure that that isn't possible. I wonder if it's also worthwhile to gate localio access on an export option, just out of an abundance of caution.
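[The reach-around Jeff is worried about looks like this in miniature. A hypothetical sketch; note that open_by_handle_at(2) already requires CAP_DAC_READ_SEARCH, so the question is whether any localio arrangement weakens that or the export checks.]

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>

    /* Try to open a guessed/leaked filehandle directly, bypassing
     * path-based lookup; mount_fd is any fd on the target filesystem. */
    static int open_guessed_handle(int mount_fd, const void *raw,
                                   unsigned int len)
    {
            struct file_handle *fh = malloc(sizeof(*fh) + len);
            int fd;

            if (!fh)
                    return -1;
            fh->handle_bytes = len;
            fh->handle_type = 1;    /* filehandle type is part of the guess */
            memcpy(fh->f_handle, raw, len);

            /* succeeds iff the handle decodes on mount_fd's filesystem */
            fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
            free(fh);
            return fd;
    }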
> On Jul 3, 2024, at 11:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote: > >> IMO the design document should, as part of the problem statement, >> explain why a pNFS-only solution is not workable. > > Sure, I can add that. > > I explained the NFSv3 requirement when we discussed at LSF. You explained it to me in a private conversation, although there was a lot of "I don't know yet" in that discussion. It needs to be (re)explained in a public forum because reviewers keep bringing this question up. I hope to see more than just "NFSv3 is in the mix". There needs to be some explanation of why it is necessary to support NFSv3 without the use of pNFS flexfile. -- Chuck Lever
On Wed, Jul 03, 2024 at 05:19:06PM +0000, Chuck Lever III wrote: > > > > On Jul 3, 2024, at 11:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote: > > > >> IMO the design document should, as part of the problem statement, > >> explain why a pNFS-only solution is not workable. > > > > Sure, I can add that. > > > > I explained the NFSv3 requirement when we discussed at LSF. > > You explained it to me in a private conversation, although > there was a lot of "I don't know yet" in that discussion. Those "I don't know yet" were in response to you asking why a pNFS layout (like the block layout) couldn't be used to achieve localio. The answer to that is: someone(s) could try that, but there is no interest from me or my employer to resort to using block layout with centralized mapping of which client and DS are local so that the pNFS MDS could hand out such pNFS block layouts. That added MDS complexity can be avoided if the client and server have autonomy to negotiate more performant access without a centralized arbiter (hence the "localio" handshake). > It needs to be (re)explained in a public forum because > reviewers keep bringing this question up. Sure. > I hope to see more than just "NFSv3 is in the mix". There > needs to be some explanation of why it is necessary to > support NFSv3 without the use of pNFS flexfile. Loaded question there, not sure why you're leading with it being invalid to decouple localio (leveraging client and server locality) from pNFS. NFS can realize benefits from localio being completely decoupled from flexfiles and pNFS. There are clear benefits with container use-cases that don't use pNFS at all. Just so happens that flexfiles ushers in the use of NFSv3. Once the client gets a flexfiles layout that points to an NFSv3 DS: the client IO is issued in terms of NFSv3. If the client happens to be on the same host as the server then using localio is a win.
On Thu, 04 Jul 2024, Chuck Lever III wrote: > > > > On Jul 3, 2024, at 11:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote: > > > >> IMO the design document should, as part of the problem statement, > >> explain why a pNFS-only solution is not workable. > > > > Sure, I can add that. > > > > I explained the NFSv3 requirement when we discussed at LSF. > > You explained it to me in a private conversation, although > there was a lot of "I don't know yet" in that discussion. > > It needs to be (re)explained in a public forum because > reviewers keep bringing this question up. > > I hope to see more than just "NFSv3 is in the mix". There > needs to be some explanation of why it is necessary to > support NFSv3 without the use of pNFS flexfile. > My perspective is "of course NFSv3". The core idea is to accelerate loop-back NFS, and unless we have decided to deprecate NFSv3 (as I think we have decided to deprecate NFSv2), NFSv3 support should be on the table. If v3 support turns out to be particularly burdensome, then it's not a "must have" for me, but it isn't at all clear to me that a pNFS approach would have fewer problems - only different problems. Just my 2c worth. NeilBrown
On Wed, Jul 03, 2024 at 11:28:55AM -0400, Mike Snitzer wrote: > Containers are a significant use case, but also any client that might > need to access local storage efficiently (e.g. GPU service running NFS > client that needs to access NVMe on same host). Please explain that in terms of who talks to whom concretely using the actual Linux and/or NFS entities. The last sentence just sounds like an AI-generated marketing whitepaper. > I can tighten that up in the Documentation. Please write up a coherent document for the use case and circle it around. It's kinda pointless to do code review if we don't have a problem statement and use case. > Using pNFS layout isn't viable because NFSv3 is very much in the mix > (flexfiles layout) for Hammerspace. Again, why and how. We have a codebase that works entirely inside the Linux kernel, and requires new code to be merged. If we can't ask people to use the current protocol (where current means a 14-year-old RFC!), we have a problem.
On Wed, Jul 03, 2024 at 03:04:05PM -0400, Mike Snitzer wrote: > The answer to that is: someone(s) could try that, but there is no > interest from me or my employer to resort to using block layout with > centralized mapping of which client and DS are local so that the pNFS > MDS could hand out such pNFS block layouts. Where did block layout suddenly come from? > That added MDS complexity can be avoided if the client and server have > autonomy to negotiate more performant access without a centralized > arbiter (hence the "localio" handshake). Doing a localio layout would actually be a lot simpler than the current mess, so that argument goes the other way around. > NFS can realize benefits from localio being completely decoupled from > flexfiles and pNFS. How about actually listing the benefits? > There are clear benefits with container use-cases > that don't use pNFS at all. Well, the point would be to make them use pNFS, because pNFS is the well known and proven way to bypass the main server in NFS. > Just so happens that flexfiles ushers in the use of NFSv3. Once the > client gets a flexfiles layout that points to an NFSv3 DS: the client > IO is issued in terms of NFSv3. If the client happens to be on the > same host as the server then using localio is a win. I have no idea where flexfiles comes in here and why it matters. The Linux server does not even support flexfiles layouts.
On Wed, Jul 03, 2024 at 01:06:51PM -0400, Jeff Layton wrote: > The other problem with doing this is that if a server is running in a > container, how is it to know that the client is in a different container > on the same host, and hence that it can give out a localio layout? We'd > still need some way to detect that anyway, which would probably look a > lot like the localio protocol. We'll need some way to detect that client and server are capable of the bypass. And from the looks of it, that's actually the hard and complicated part, and we'll need that for any scheme. And then we need a way to bypass the server for I/O, which currently is rather complex in the patchset and would be almost trivial with a new pNFS layout. > Can the client use its localio access to bypass that since it's not > going across the network anymore? Maybe by using open_by_handle_at on > the NFS share on a guessed filehandle? I think we need to ensure that > that isn't possible. If a file system is shared by containers and users in containers have the capability to use open_by_handle_at the security model is already broken without NFS or localio involved. > I wonder if it's also worthwhile to gate localio access on an export > option, just out of an abundance of caution. export and mount option. We're speaking a non-standard side band protocol here, there is no way that should be done without explicit opt-in from both sides.
On Wed, Jul 03, 2024 at 11:36:00AM -0400, Mike Snitzer wrote: > > When Mike presented LOCALIO to me at LSF, my initial suggestion > > was to use pNFS. I think Jeff had the same reaction. > > No, Jeff suggested using an O_TMPFILE-based thing for the localio > handshake. But he had the benefit of knowing NFSv3 is important for the > intended localio use case, so I'm not aware of him having pNFS design > ideas. How does O_TMPFILE fit in here? NFS doesn't even support O_TMPFILE.
On Wed, 2024-07-03 at 23:01 -0700, Christoph Hellwig wrote: > On Wed, Jul 03, 2024 at 11:36:00AM -0400, Mike Snitzer wrote: > > > When Mike presented LOCALIO to me at LSF, my initial suggestion > > > was to use pNFS. I think Jeff had the same reaction. > > > > No, Jeff suggested using an O_TMPFILE-based thing for the localio > > handshake. But he had the benefit of knowing NFSv3 is important for the > > intended localio use case, so I'm not aware of him having pNFS design > > ideas. > > How does O_TMPFILE fit in here? NFS doesn't even support O_TMPFILE. > At LSF we were tossing around ideas about how to detect whether the client and server were on the same host. My thinking was to have a common fs (maybe even a tmpfs) that was exported by all of the servers on the host and accessible by all of the containers on the host. The client would then do an O_TMPFILE open in that tmpfs, write some data to it (uuids or something) and determine the filehandle. Then it could issue a v3 READ against the NFS server for that filehandle and if it worked and the contents were as expected you could be sure you're on the same host. The client could then just close the file and it would be cleaned up. The problem of course is that that requires having a fs that is commonly accessible between all of the containers, which is a bit more setup than is ideal. The localio protocol (particularly with Neil's suggested improvements) is really a better scheme I think.
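[Sketched in userspace, Jeff's handshake would look something like this. The /probe directory on the commonly exported fs is hypothetical, the follow-up NFSv3 READ of the returned handle is elided, and fh->handle_bytes must be sized by the caller.]

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int probe_handle(struct file_handle *fh, int *mount_id)
    {
            /* anonymous file on the fs every container can reach */
            int ret, fd = open("/probe", O_TMPFILE | O_RDWR, 0600);

            if (fd < 0)
                    return -1;
            dprintf(fd, "localio-probe-uuid\n"); /* recognizable payload */

            /* filehandle of the anonymous file itself, via AT_EMPTY_PATH */
            ret = name_to_handle_at(fd, "", fh, mount_id, AT_EMPTY_PATH);
            close(fd);      /* unlinked tmpfile vanishes on close */
            return ret;
    }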
On Wed, Jul 03, 2024 at 11:00:27PM -0700, Christoph Hellwig wrote: > On Wed, Jul 03, 2024 at 01:06:51PM -0400, Jeff Layton wrote: > > The other problem with doing this is that if a server is running in a > > container, how is it to know that the client is in a different container > > on the same host, and hence that it can give out a localio layout? We'd > > still need some way to detect that anyway, which would probably look a > > lot like the localio protocol. (NOTE: Jeff's message above was him stating flaws in his O_TMPFILE idea that we discussed at LSF; his idea wasn't pursued for many reasons. And Jeff has stated he believes localio is better.) > We'll need some way to detect that client and server are capable > of the bypass. And from the looks of it, that's actually the hard and > complicated part, and we'll need that for any scheme. Yes, hence the localio protocol that has wide buy-in. To the point that I registered an NFS Program number with iana.org for the effort. My doing the localio protocol was born out of the requirement to support NFSv3. Neil's proposed refinement to add a localio auth_domain to the nfsd_net and his proposed risk-averse handshake within the localio protocol will both improve security. > And then we need a way to bypass the server for I/O, which currently is > rather complex in the patchset and would be almost trivial with a new > pNFS layout. Some new layout misses the entire point of having localio work for NFSv3 and NFSv4. NFSv3 is very ubiquitous. And in this localio series, flexfiles is trained to use localio. (Which you apparently don't recognize or care about because nfsd doesn't have flexfiles server support). > > Can the client use its localio access to bypass that since it's not > > going across the network anymore? Maybe by using open_by_handle_at on > > the NFS share on a guessed filehandle? I think we need to ensure that > > that isn't possible. > > If a file system is shared by containers and users in containers have > the capability to use open_by_handle_at the security model is already > broken without NFS or localio involved. Containers deployed by things like podman.io and kubernetes are perfectly happy to allow containers permission to drive knfsd threads in the host kernel. That this is foreign to you is odd. An NFS client that happens to be on the host should work perfectly fine too (if it has adequate permissions). > > I wonder if it's also worthwhile to gate localio access on an export > > option, just out of an abundance of caution. > > export and mount option. We're speaking a non-standard side band > protocol here, there is no way that should be done without explicit > opt-in from both sides. That is already provided by existing controls. With both Kconfig options that default to N, and the ability to disable the use of localio entirely even if enabled in the Kconfig: echo N > /sys/module/nfs/parameters/localio_enabled And then on top of that you have to loopback NFS mount.
On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote: > Some new layout misses the entire point of having localio work for > NFSv3 and NFSv4. NFSv3 is very ubiquitous. I'm getting tired of bringing up this "oh NFSv3" again and again without any explanation of why that matters for communication inside the same Linux kernel instance with a kernel that obviously requires patching. Why is running an obsolete protocol inside the same OS instance required? Maybe it is, but if so it needs a very good explanation. > And in this localio series, flexfiles is trained to use localio. > (Which you apparently don't recognize or care about because nfsd > doesn't have flexfiles server support). And you fail to explain why it matters. You are trying to sell this code, you better have an explanation why it's complicated and convoluted as hell. So far we are running in circles but there has been no clear explanation of use cases. > > > Can the client use its localio access to bypass that since it's not > > > going across the network anymore? Maybe by using open_by_handle_at on > > > the NFS share on a guessed filehandle? I think we need to ensure that > > > that isn't possible. > > > > If a file system is shared by containers and users in containers have > > the capability to use open_by_handle_at the security model is already > > broken without NFS or localio involved. > > Containers deployed by things like podman.io and kubernetes are > perfectly happy to allow containers permission to drive knfsd threads > in the host kernel. That this is foreign to you is odd. > > An NFS client that happens to be on the host should work perfectly > fine too (if it has adequate permissions). Can you please stop the personal attacks? I am just stating the fact that IF the containers using the NFS mount have access to the exported file systems and the privileges to use open by handle, there is nothing nfsd can do about security as the container has full access to the file system anyway. That's a fact, and how you deploy the various containers is completely irrelevant. It is also, in case you didn't notice last time, about the _client_ containers, as stated by me and the original poster I replied to. > > > I wonder if it's also worthwhile to gate localio access on an export > > > option, just out of an abundance of caution. > > > > export and mount option. We're speaking a non-standard side band > > protocol here, there is no way that should be done without explicit > > opt-in from both sides. > > That is already provided by existing controls. With both Kconfig > options that default to N, and the ability to disable the use of > localio entirely even if enabled in the Kconfig: > echo N > /sys/module/nfs/parameters/localio_enabled And all of that is global and not per-mount or nfsd instance, which doesn't exactly scale to a multi-tenant container hosting setup.
> On Jul 5, 2024, at 1:18 AM, Christoph Hellwig <hch@infradead.org> wrote: > > On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote: >> Some new layout misses the entire point of having localio work for >> NFSv3 and NFSv4. NFSv3 is very ubiquitous. > > I'm getting tired of bringing up this "oh NFSv3" again and again without > any explanation of why that matters for communication inside the > same Linux kernel instance with a kernel that obviously requires > patching. Why is running an obsolete protocol inside the same OS > instance required? Maybe it is, but if so it needs a very good > explanation. I agree: I think the requirement for NFSv3 in this situation needs a clear justification. Both peers are recent vintage Linux kernels; both peers can use NFSv4.x, there's no explicit need for backwards compatibility in the use cases that have been provided so far. Generally I do agree with Neil's "why not NFSv3, we still support it" argument. But with NFSv4, you get better locking semantics, delegation, pNFS (possibly), and proper protocol extensibility. There are really strong reasons to restrict this facility to NFSv4. -- Chuck Lever
On Fri, Jul 05, 2024 at 01:35:18PM +0000, Chuck Lever III wrote: > I agree: I think the requirement for NFSv3 in this situation > needs a clear justification. Both peers are recent vintage > Linux kernels; both peers can use NFSv4.x, there's no > explicit need for backwards compatibility in the use cases > that have been provided so far. More importantly both peers are in fact the exact same Linux kernel instance. Which is the important point here - we are doing a bypass for a kernel talking to itself, although a kernel suffering from multiple personality (dis)order where the different sides might expose very different system views.
On Fri, Jul 05, 2024 at 01:35:18PM +0000, Chuck Lever III wrote: > > > > On Jul 5, 2024, at 1:18 AM, Christoph Hellwig <hch@infradead.org> wrote: > > > > On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote: > >> Some new layout misses the entire point of having localio work for > >> NFSv3 and NFSv4. NFSv3 is very ubiquitous. > > > > I'm getting tired of bringing up this "oh NFSv3" again and again without > > any explanation of why that matters for communication inside the > > same Linux kernel instance with a kernel that obviously requires > > patching. Why is running an obsolete protocol inside the same OS > > instance required? Maybe it is, but if so it needs a very good > > explanation. > > I agree: I think the requirement for NFSv3 in this situation > needs a clear justification. Both peers are recent vintage > Linux kernels; both peers can use NFSv4.x, there's no > explicit need for backwards compatibility in the use cases > that have been provided so far. > > Generally I do agree with Neil's "why not NFSv3, we still > support it" argument. But with NFSv4, you get better locking > semantics, delegation, pNFS (possibly), and proper protocol > extensibility. There are really strong reasons to restrict > this facility to NFSv4. NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on the same host.
On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: > NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on > the same host. That doesn't really bring us any further. Why is it required? I think we'll just need to stop this discussion until we have reasonable documentation of the use cases and assumptions, because without that we'll get hung up in dead loops.
On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote: > On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: > > NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on > > the same host. > > That doesn't really bring us any further. Why is it required? > > I think we'll just need to stop this discussion until we have reasonable > documentation of the use cases and assumptions, because without that > we'll get hung up in dead loops. It _really_ isn't material to the core capability that localio provides. localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in containers). Hammerspace needs localio to work with NFSv3 to assist with its "data movers" that run on the host (using nfs and nfsd). Please just remove yourself from the conversation if you cannot make sense of this. If you'd like to be involved, put the work in to understand the code and be professional.
> On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote: >> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: >>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on >>> the same host. >> >> That doesn't really bring us any further. Why is it required? >> >> I think we'll just need to stop this discussion until we have reasonable >> documentation of the use cases and assumptions, because without that >> we'll get hung up in dead loops. > > It _really_ isn't material to the core capability that localio provides. > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in > containers). > > Hammerspace needs localio to work with NFSv3 to assist with its "data > movers" that run on the host (using nfs and nfsd). > > Please just remove yourself from the conversation if you cannot make > sense of this. If you'd like to be involved, put the work in to > understand the code and be professional. Sorry, I can't make sense of this either, and I find the personal attack here completely inappropriate (and a bit hypocritical, to be honest). I have nothing else to contribute that you won't either dismiss or treat as a personal attack, so I can't continue this conversation. -- Chuck Lever
On Fri, 2024-07-05 at 10:36 -0400, Mike Snitzer wrote: > On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote: > > On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: > > > NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 > > > knfsd on the same host. > > > > That doesn't really bring us any further. Why is it required? > > > > I think we'll just need to stop this discussion until we have > > reasonable documentation of the use cases and assumptions, > > because without that we'll get hung up in dead loops. > > It _really_ isn't material to the core capability that localio > provides. > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in > containers). > > Hammerspace needs localio to work with NFSv3 to assist with its "data > movers" that run on the host (using nfs and nfsd). > > Please just remove yourself from the conversation if you cannot make > sense of this. If you'd like to be involved, put the work in to > understand the code and be professional. I disagree wholeheartedly with this statement. Christoph has raised a very valid point. You have _not_ articulated why v3 access is important here. I'm aware of why it is (at least to HS), and I think there are other valid reasons to keep v3 in the mix (as Neil has pointed out). But, that info should be in the cover letter and changelogs. Not everyone has insight into this, and tbqh, my understanding could be wrong. Let's please try to keep the discussion civil.
On Fri, 05 Jul 2024, Christoph Hellwig wrote: > On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote: > > Some new layout misses the entire point of having localio work for > > NFSv3 and NFSv4. NFSv3 is very ubiquitous. > > I'm getting tired of bringing up this "oh NFSv3" again and again without > any explanation of why that matters for communication inside the > same Linux kernel instance with a kernel that obviously requires > patching. Why is running an obsolete protocol inside the same OS > instance required? Maybe it is, but if so it needs a very good > explanation. I would like to see a good explanation for why NOT NFSv3. I don't think NFSv3 is obsolete. The first dictionary definition is "No longer in use", which certainly doesn't apply. I think "deprecated" is a more relevant term. I believe that NFSv2 has been deprecated. I believe that NFSv4.0 should be deprecated. But I don't see any reason to consider NFSv3 to be deprecated. > > > And in this localio series, flexfiles is trained to use localio. > > (Which you apparently don't recognize or care about because nfsd > > doesn't have flexfiles server support). > > And you fail to explain why it matters. You are trying to sell this > code, you better have an explanation why it's complicated and convoluted > as hell. So far we are running in circles but there has been no clear > explanation of use cases. Please avoid sweeping statements like "complicated and convoluted" without backing them up with specifics. I don't particularly want to defend the current localio protocol, and I certainly see a number of points which can and must be improved. But it isn't clear to me that the big picture is either complicated or convoluted. Please provide details. Thanks, NeilBrown
On Fri, Jul 05, 2024 at 02:59:31PM +0000, Chuck Lever III wrote: > > > > On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote: > >> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: > >>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on > >>> the same host. > >> > >> That doesn't really bring us any further. Why is it required? > >> > >> I think we'll just need to stop this discussion until we have reasonable > >> documentation of the use cases and assumptions, because without that > >> we'll get hung up in dead loops. > > > > It _really_ isn't material to the core capability that localio provides. > > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in > > containers). > > > > Hammerspace needs localio to work with NFSv3 to assist with its "data > > movers" that run on the host (using nfs and nfsd). > > > > Please just remove yourself from the conversation if you cannot make > > sense of this. If you'd like to be involved, put the work in to > > understand the code and be professional. > > Sorry, I can't make sense of this either, and I find the > personal attack here completely inappropriate (and a bit > hypocritical, to be honest). Hi Chuck, I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to Christoph, who has taken to feigning an inability to understand localio yet is perfectly OK with flexing like he is an authority on the topic. He rallied to your Nacked-By with his chest puffed up and has proceeded to baselessly shit-talk (did you miss his emails while we slept last night?). Yes, let's condone and encourage more of that!? No, I won't abide such toxicity. But thankfully Neil has since called for him to stop. Alas... Earlier today I answered the question about "why NFSv3?" in simple terms. You and Christoph rejected it. I'm _not_ being evasive. There isn't a lot to it: "More efficient NFS in containers" _is_ the answer. But hopefully this email settles "why NFSv3?". If not, please help me (or others) further your understanding by reframing your NFSv3 concern in terms other than "why NFSv3?". It's getting a bit like having to answer "why is water wet?"

Why NFSv3?
----------

The localio feature improves IO performance when using NFSv3 and NFSv4 with containers. Hammerspace has an immediate need for the NFSv3 support, because its "data movers" use NFSv3, but NFSv4 support is expected to be useful in the future. Just because Hammerspace is very invested in pNFS doesn't mean all aspects are framed in terms of it.

General statement:
------------------

I wrote maybe ~30% of the entire localio code as it stands at "v11", and that was focused primarily on adding NFSv4 support and developing the localio protocol, hooking it into NFS's client initialization and teardown along with the server (and vice-versa, nfsd lifetime due to container applications: tearing down nfsd in a container while an nfs client is actively connected from the host). Neil helped refine the localio protocol part, and he has also looked critically at many aspects and has a great list of improvements that are needed. Jeff provided top-notch review of my initial use of SRCU and later the percpu refcnt for interlocking with the client and server. My point: others wrote the majority of localio (years ago). I'm just trying to shepherd it upstream in an acceptable form.
And yes, localio supporting both NFSv3 and NFSv4 is important to me, Hammerspace and anyone who'd like more efficient IO with both NFSv3 and NFSv4 in containers. Answering "Why NFSv3?" with questions: -------------------------------------- 1) Why wasn't general NFS localio bypass controversial 3 weeks ago? Instead (given all inputs, NFSv3 support requirement being one of them) the use of a "localio protocol" got broad consensus and buy-in from you, Jeff, and Neil. I _thought_ we all agreed localio was a worthwhile _auxiliary_ addition to Linux's NFS client and server (to give them awareness of each other through nfs_common) regardless of NFS protocol version. That is why I registered a localio RPC program number with IANA (at your suggestion, you were cc'd when I applied for it, and you are named on IANA.org along with Trond and myself for the program number IANA assigned): https://www.iana.org/assignments/rpc-program-numbers/rpc-program-numbers.txt $ cat rpc-program-numbers.txt | egrep 'Snitzer|Myklebust|Lever' Linux Kernel Organization 400122 nfslocalio [Mike_Snitzer][Trond_Myklebust][Chuck_Lever] [Chuck_Lever] Chuck Lever mailto:chuck.lever&oracle.com 2024-06-20 [Mike_Snitzer] Mike Snitzer mailto:snitzer&kernel.org 2024-06-20 [Trond_Myklebust] Trond Myklebust mailto:trondmy&hammerspace.com 2024-06-20 2) If we're introducing a general NFS localio bypass feature _and_ NFSv3 is important to the stakeholder proposing the feature _and_ NFSv3 support is easily implemented and supported: then why do you have such concern about localio supporting NFSv3? 3) Why do you think NFSv3 is unworthy? Is this just a bellwether for broader opposition to flexfiles (and its encouraging more use of NFSv3)? Flexfiles is at the heart of NFSv3 use at Hammerspace. I've come to understand from you and Christoph that the lack of flexfiles support in NFSD helps fuel dislike for flexfiles. That's a lot for me to unpack, and pretty far removed from "why NFSv3?", so I'd need more context than I have if Hammerspace's use of flexfiles is what is fueling your challenge of localio's NFSv3 support. ... Reiterating and then expanding on my email above: localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in containers). Hammerspace needs localio to work with NFSv3 to assist with its "data movers" that run on the host (using nfs [on host] and nfsd [within container]). Now you can ask why _that_ is... but it really is pretty disjoint from the simple matter of ensuring localio supports both NFSv3 and NFSv4. I've shared that Hammerspace's "data movers" use NFSv3 currently; in the future they could use NFSv4 as needed. Hence the desire to support localio with both NFSv3 and NFSv4. [when I picked up the localio code NFSv4 wasn't even supported yet]. I _hope_ I've now answered "why NFSv3?" clearly. > I have nothing else to contribute that you won't either > dismiss or treat as a personal attack, so I can't continue > this conversation. That isn't even a little bit fair... but not taking the bait. Neil has been wonderful to work with and I look forward to all future work with him (localio and beyond). I am not trying to do anything out of line with this feature. I am and have been actively working with you, Neil and Jeff for over a month now. I've adapted and learned, _with_ your and others' help, to the best of my ability. I'm trying here, maybe you could say "I'm trying too hard".
Well I just started a new job with Hammerspace after working for Red Hat for the past 15 years (much of my time spent as the upstream Linux DM maintainer -- but you know this). I am a capable engineer and I've proposed the upstreaming of a localio feature that would do well to land upstream. I've done so in a proficient way all things considered, always happy to learn new things and improve. I need to work with you. Hopefully well, and hopefully I can earn your respect, please just know I'm merely trying to improve NFS. Hammerspace would like to get all its Linux kernel NFS innovation upstream. And I'm trying to do that. localio is my first task and I've been working on it with focus for the past 2 months since joining Hammerspace. But you basically know all this, I said all of it to you at LSF. So if you know all these things (I _know_ you do), why are you treating me in this way? I feel like I'm caught in the middle of some much bigger divide than anything I've been involved with, caused or made privy to. Guess the messenger gets shot sometimes. Mike
On Sat, 06 Jul 2024, Mike Snitzer wrote: > On Fri, Jul 05, 2024 at 02:59:31PM +0000, Chuck Lever III wrote: > > > > > > > On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > > > On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote: > > >> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: > > >>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on > > >>> the same host. > > >> > > >> That doesn't really bring us any further. Why is it required? > > >> > > >> I think we'll just need to stop this discussion until we have reasonable > > >> documentation of the use cases and assumptions, because without that > > >> we'll get hung up in dead loops. > > > > > > It _really_ isn't material to the core capability that localio provides. > > > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in > > > containers). > > > > > > Hammerspace needs localio to work with NFSv3 to assist with its "data > > > movers" that run on the host (using nfs and nfsd). > > > > > > Please just remove yourself from the conversation if you cannot make > > > sense of this. If you'd like to be involved, put the work in to > > > understand the code and be professional. > > > > Sorry, I can't make sense of this either, and I find the > > personal attack here completely inappropriate (and a bit > > hypocritical, to be honest). > > Hi Chuck, > > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to > Christoph. Who has taken to feign incapable of understanding localio > yet is perfectly OK with flexing like he is an authority on the topic. Ad hominem doesn't achieve anything useful. Please stick with technical arguments. (They are the only ones I understand). NeilBrown
On Fri, Jul 05, 2024 at 11:58:56PM -0400, Mike Snitzer wrote: > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to > Christoph. Who has taken to feign incapable of understanding localio > yet is perfectly OK with flexing like he is an authority on the topic. Hi Mike, please take a few days off and relax, and then write an actual use case and requirements document. I'm out of this thread for now, but I'd appreciate it if you'd just restart, assuming no one is acting in bad faith, and try to explain what you are doing and why without getting upset.
On Sat, Jul 06, 2024 at 08:08:07AM +1000, NeilBrown wrote: > I would like to see a good explanation for why NOT NFSv3. > I don't think NFSv3 is obsolete. The first dictionary definition is "no longer in > use," which certainly doesn't apply. > I think "deprecated" is a more relevant term. I believe that NFSv2 has > been deprecated. I believe that NFSv4.0 should be deprecated. But I > don't see any reason to consider NFSv3 to be deprecated. The obvious answer is that NFSv4.1/2 (which is really the same thing) is the only version of NFS under development and open for new features at the protocol level. So from the standardization perspective NFSv3 is obsolete. But the more important point is that NFSv4 has a built-in way to bypass the server for I/O, namely pNFS. And bypassing the server by directly going to a local file system is the textbook example for what pNFS does. So we'll need a really good argument why we need to reinvent a different scheme for bypassing the server for I/O. Maybe there is a really good killer argument for doing that, but it needs to be clearly stated and defended instead of assumed.
On Sat, 06 Jul 2024, Christoph Hellwig wrote: > On Sat, Jul 06, 2024 at 08:08:07AM +1000, NeilBrown wrote: > > I would like to see a good explanation for why NOT NFSv3. > > I don't think NFSv3 is obsolete. The first dictionary definition is "no longer in > > use," which certainly doesn't apply. > > I think "deprecated" is a more relevant term. I believe that NFSv2 has > > been deprecated. I believe that NFSv4.0 should be deprecated. But I > > don't see any reason to consider NFSv3 to be deprecated. > > The obvious answer is that NFSv4.1/2 (which is really the same thing) > is the only version of NFS under development and open for new features > at the protocol level. So from the standardization perspective NFSv3 > is obsolete. RFC-1813 is certainly obsolete from a standardization perspective - it isn't even an IETF standard - only informational. It can't be extended with any hope of interoperability between implementations. But we don't want interoperability between implementations. We want to enhance the internal workings of one particular implementation. I don't see that the standards status affects that choice. > > But the more important point is that NFSv4 has a built-in way to bypass > the server for I/O, namely pNFS. And bypassing the server by directly > going to a local file system is the textbook example for what pNFS > does. So we'll need a really good argument why we need to reinvent > a different scheme for bypassing the server for I/O. Maybe there is > a really good killer argument for doing that, but it needs to be clearly > stated and defended instead of assumed. Could you provide a reference to the textbook - or RFC - that describes a pNFS DS protocol that completely bypasses the network, allowing the client and MDS to determine if they are the same host and to potentially do zero-copy IO? If not, I will find it hard to understand your claim that it is "the textbook example". Also, neither you nor I are in a position to assert what is needed for a change to get accepted. That is up to the people with M: in front of their email address. I believe that any counsel that either of us gives will be considered with genuine interest, but making demands seems out of place. Thanks, NeilBrown
On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote: > > a different scheme for bypassing the server for I/O. Maybe there is > > a really good killer argument for doing that, but it needs to be clearly > > stated and defended instead of assumed. > > Could you provide a reference to the textbook - or RFC - that describes > a pNFS DS protocol that completely bypasses the network, allowing the > client and MDS to determine if they are the same host and to potentially > do zero-copy IO? I did not say that we have the exact same functionality available and there is no work to do at all, just that it is the standard way to bypass the server. RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass the network and require the client and server to find out that they talk to the same storage device, and directly perform zero-copy I/O. They do not require to be on the same host, though. > If not, I will find it hard to understand your claim that it is "the > textbook example". pNFS is all about handing out grants to bypass the server for I/O. That is exactly what localio is doing.
Earlier yesterday I answered the question about "why NFSv3?" in simple terms. Chuck and Christoph rejected it. I'm _not_ being evasive. There isn't a lot to it: "More efficient NFS in containers" _is_ the answer. But hopefully this email settles "why NFSv3?". If not, please help me (or others) further your understanding by reframing your NFSv3 concern in terms other than "why NFSv3?". Why NFSv3? ---------- The localio feature improves IO performance when using NFSv3 and NFSv4 with containers. Hammerspace has an immediate need for the NFSv3 support, because its "data movers" use NFSv3, but NFSv4 support is expected to be useful in the future. Why not use pNFS? ----------------- Just because Hammerspace is very invested in pNFS doesn't mean all aspects are framed in terms of it. A pNFS-based approach to localio would be more complex than the proposed solution: an autonomous rendezvous between the NFS client and server that allows network bypass. There is no need for pNFS, and by not depending on it the localio feature remains accessible for general NFSv3 and NFSv4 use. Answering "Why NFSv3?" with questions: -------------------------------------- 1) Why wasn't general NFS localio bypass controversial 3 weeks ago? Instead (given all inputs, NFSv3 support requirement being one of them) the use of a "localio protocol" got broad consensus and buy-in from Chuck, Jeff, and Neil. I _thought_ we all agreed localio was a worthwhile _auxiliary_ addition to Linux's NFS client and server (to give them awareness of each other through nfs_common) regardless of NFS protocol version. That is why I registered a localio RPC program number with IANA, at Chuck's suggestion, see: https://www.iana.org/assignments/rpc-program-numbers/rpc-program-numbers.txt $ cat rpc-program-numbers.txt | egrep 'Snitzer|Myklebust|Lever' Linux Kernel Organization 400122 nfslocalio [Mike_Snitzer][Trond_Myklebust][Chuck_Lever] [Chuck_Lever] Chuck Lever mailto:chuck.lever&oracle.com 2024-06-20 [Mike_Snitzer] Mike Snitzer mailto:snitzer&kernel.org 2024-06-20 [Trond_Myklebust] Trond Myklebust mailto:trondmy&hammerspace.com 2024-06-20 2) If we're introducing a general NFS localio bypass feature _and_ NFSv3 is important to the stakeholder proposing the feature _and_ NFSv3 support is easily implemented and supported: then why do you have such concern about localio supporting NFSv3? 3) Why do you think NFSv3 is unworthy? Is this just a bellwether for broader opposition to flexfiles (and its encouraging more use of NFSv3)? Flexfiles is at the heart of NFSv3 use at Hammerspace. I've come to understand from Chuck and Christoph that the lack of flexfiles support in NFSD helps fuel dislike for flexfiles. That's a lot for me to unpack, and pretty far removed from "why NFSv3?", so I'd need more context than I have if Hammerspace's use of flexfiles is what is fueling your challenge of localio's NFSv3 support. ... Reiterating and then expanding on my earlier succinct answer: localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in containers). Hammerspace needs localio to work with NFSv3 to assist with its "data movers" that run on the host (using nfs [on host] and nfsd [within container]). Now you can ask why _that_ is... but it really is pretty disjoint from the simple matter of ensuring localio supports both NFSv3 and NFSv4. I've shared that Hammerspace's "data movers" use NFSv3 currently; in the future they could use NFSv4 as needed. Hence the desire to support localio with both NFSv3 and NFSv4.
[when I picked up the localio code NFSv4 wasn't even supported yet].
On Fri, Jul 05, 2024 at 10:58:18PM -0700, Christoph Hellwig wrote: > On Fri, Jul 05, 2024 at 11:58:56PM -0400, Mike Snitzer wrote: > > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to > > Christoph. Who has taken to feign incapable of understanding localio > > yet is perfectly OK with flexing like he is an authority on the topic. > > Hi Mike, > > please take a few days off and relax, and then write an actual use case > and requirements document. I'm out of this thread for now, but I'd > appreciate it if you'd just restart, assuming no one is acting in bad > faith, and try to explain what you are doing and why without getting > upset. If you'd like reasonable people to not get upset with you, you might try treating them with respect rather than gaslighting them.
> On Jul 5, 2024, at 11:58 PM, Mike Snitzer <snitzer@kernel.org> wrote: > > On Fri, Jul 05, 2024 at 02:59:31PM +0000, Chuck Lever III wrote: >> >> >>> On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote: >>> >>> On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote: >>>> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote: >>>>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on >>>>> the same host. >>>> >>>> That doesn't really bring us any further. Why is it required? >>>> >>>> I think we'll just need to stop this discussion until we have reasonable >>>> documentation of the use cases and assumptions, because without that >>>> we'll get hung up in dead loops. >>> >>> It _really_ isn't material to the core capability that localio provides. >>> localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in >>> containers). >>> >>> Hammerspace needs localio to work with NFSv3 to assist with its "data >>> movers" that run on the host (using nfs and nfsd). >>> >>> Please just remove yourself from the conversation if you cannot make >>> sense of this. If you'd like to be involved, put the work in to >>> understand the code and be professional. >> >> Sorry, I can't make sense of this either, and I find the >> personal attack here completely inappropriate (and a bit >> hypocritical, to be honest). > > Hi Chuck, > > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to > Christoph. Who has taken to feign incapable of understanding localio > yet is perfectly OK with flexing like he is an authority on the topic. Well let's try a reality test. Christoph has authored an IETF RFC on pNFS. He's also contributed the pNFS SCSI (and now NVMe) implementation in the Linux server and client. He seems to know the code well enough to offer an informed opinion. > He rallied to your Nacked-By with his chest puffed up and has > proceeded to baselessly shit-talk (did you miss his emails while we > slept last night?). Yes, let's condone and encourage more of that!? > No, I won't abide such toxicity. But thankfully Neil has since called > for him to stop. Alas... > > Earlier today I answered the question about "why NFSv3?" in simple > terms. You and Christoph rejected it. I'm _not_ being evasive. > There isn't a lot to it: "More efficient NFS in containers" _is_ the > answer. > > But hopefully this email settles "why NFSv3?". If not, please help me > (or others) further your understanding by reframing your NFSv3 concern > in terms other than "why NFSv3?". Getting a bit like having to answer > "why is water wet?" > > Why NFSv3? > ---------- > > The localio feature improves IO performance when using NFSv3 and NFSv4 > with containers. Hammerspace has an immediate need for the NFSv3 > support, because its "data movers" use NFSv3, but NFSv4 support is > expected to be useful in the future. > > Just because Hammerspace is very invested in pNFS doesn't mean all > aspects are framed in terms of it. > > General statement: > ------------------ > > I wrote maybe ~30% of the entire localio code as it stands at "v11" > and that was focused primarily on adding NFSv4 support and developing > the localio protocol, hooking it into NFS's client initialization and > teardown along with the server (and vice-versa, nfsd lifetime due to > container applications: tearing down nfsd in container while nfs > client actively connected from host).
Neil helped refine the localio > protocol part, and he has also looked critically at many aspects and > has a great list of improvements that are needed. Jeff provided > top-notch review of my initial use of SRCU and later the percpu refcnt > for interlocking with the client and server. > > My point: others wrote the majority of localio (years ago). I'm just > trying to shepherd it upstream in an acceptable form. And yes, > localio supporting both NFSv3 and NFSv4 is important to me, > Hammerspace and anyone who'd like more efficient IO with both NFSv3 > and NFSv4 in containers. > > Answering "Why NFSv3?" with questions: > -------------------------------------- > > 1) Why wasn't general NFS localio bypass controversial 3 weeks ago? > Instead (given all inputs, NFSv3 support requirement being one of > them) the use of a "localio protocol" got broad consensus and buy-in > from you, Jeff, and Neil. > > I _thought_ we all agreed localio was a worthwhile _auxiliary_ > addition to Linux's NFS client and server (to give them awareness of > each other through nfs_common) regardless of NFS protocol version. > That is why I registered a localio RPC program number with IANA (at > your suggestion, you were cc'd when I applied for it, and you are > named on IANA.org along with Trond and myself for the program number > IANA assigned): > https://www.iana.org/assignments/rpc-program-numbers/rpc-program-numbers.txt > > $ cat rpc-program-numbers.txt | egrep 'Snitzer|Myklebust|Lever' > Linux Kernel Organization 400122 nfslocalio [Mike_Snitzer][Trond_Myklebust][Chuck_Lever] > [Chuck_Lever] Chuck Lever mailto:chuck.lever&oracle.com 2024-06-20 > [Mike_Snitzer] Mike Snitzer mailto:snitzer&kernel.org 2024-06-20 > [Trond_Myklebust] Trond Myklebust mailto:trondmy&hammerspace.com 2024-06-20 > > 2) If we're introducing a general NFS localio bypass feature _and_ > NFSv3 is important to the stakeholder proposing the feature _and_ > NFSv3 support is easily implemented and supported: then why do you > have such concern about localio supporting NFSv3? > > 3) Why do you think NFSv3 is unworthy? Is this just a bellwether for > broader opposition to flexfiles (and its encouraging more use of > NFSv3)? Flexfiles is at the heart of NFSv3 use at Hammerspace. I've > come to understand from you and Christoph that the lack of flexfiles > support in NFSD helps fuel dislike for flexfiles. That's a lot for me > to unpack, and pretty far removed from "why NFSv3?", so I'd need more > context than I have if Hammerspace's use of flexfiles is what is > fueling your challenge of localio's NFSv3 support. > > ... > > Reiterating and then expanding on my email above: > > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in > containers). > > Hammerspace needs localio to work with NFSv3 to assist with its > "data movers" that run on the host (using nfs [on host] and nfsd > [within container]). > > Now you can ask why _that_ is... but it really is pretty disjoint from > the simple matter of ensuring localio supports both NFSv3 and NFSv4. > > I've shared that Hammerspace's "data movers" use NFSv3 currently; in > the future they could use NFSv4 as needed. Hence the desire to > support localio with both NFSv3 and NFSv4. [when I picked up the > localio code NFSv4 wasn't even supported yet]. > > I _hope_ I've now answered "why NFSv3?" clearly. > >> I have nothing else to contribute that you won't either >> dismiss or treat as a personal attack, so I can't continue >> this conversation. > > That isn't even a little bit fair...
but not taking the bait. > > Neil has been wonderful to work with and I look forward to all future > work with him (localio and beyond). I am not trying to do anything > out of line with this feature. I am and have been actively working > with you, Neil and Jeff for over a month now. I've adapted and > learned, _with_ your and others' help, to the best of my ability. > > I'm trying here, maybe you could say "I'm trying too hard". Well I > just started a new job with Hammerspace after working for Red Hat for > the past 15 years (much of my time spent as the upstream Linux DM > maintainer -- but you know this). I am a capable engineer and I've > proposed the upstreaming of a localio feature that would do well to > land upstream. I've done so in a proficient way all things > considered, always happy to learn new things and improve. I need to > work with you. Hopefully well, and hopefully I can earn your respect, > please just know I'm merely trying to improve NFS. > > Hammerspace would like to get all its Linux kernel NFS innovation > upstream. And I'm trying to do that. localio is my first task and > I've been working on it with focus for the past 2 months since joining > Hammerspace. But you basically know all this, I said all of it to you > at LSF. > > So if you know all these things (I _know_ you do), why are you > treating me in this way? I feel like I'm caught in the middle of some > much bigger divide than anything I've been involved with, caused or > made privy to. Yes, NFS is larger (much larger) and much older than the Linux implementation of it. I'm sorry your new colleagues have not seen fit to help you fit yourself into this picture. > Guess the messenger gets shot sometimes. This is exactly the problem: your attitude of victimhood. You act like our questions are personal attacks on you. Answering "I don't know" or "I need to think about it" or "Let me ask someone" or "Can you explain that further" is perfectly fine. Acknowledging the areas where you need to learn more is a quintessential part of being a professional software engineer. --- I have no strong feelings one way or another about flexfiles. And, I remain a full-throated advocate of the NFSv3 support in the Linux NFS stack. If you get an impression from me, come talk to me first to confirm it. Don't go talk to your colleagues; in particular don't talk to the ones who like to spread a lot of weird ideas about me. You're getting bad information. --- Our question isn't "Why NFSv3?" It's: Can your design document explain in detail how the one existing application (the data mover) will use and benefit from loopback acceleration? It needs to explain why the data mover does not use possibly more suitable solutions like NFSv4.2 COPY. Why are we going to the effort of adding this side car instead of using facilities that are already available? We're not asking for a one sentence email. A one sentence email is not "a response in simple terms". It is a petulant dismissal of our request for more info. We're asking for a real problem statement and use case, in detail, in the design document, not in email. (Go read the requests in this thread again. Honest, that's all we're asking for). --- But finally: We've asked repeatedly for very typical changes and answers, and though sometimes we're met with a positive response, other times we get a defensive response, or "I don't feel like it" or "that's busy work" or "you're wasting my time." 
That doesn't sound like the spirit of co-operation that I would like to see from a regular contributor, nor do I expect it from someone who is also a Linux kernel maintainer who really ought to know better. So heed Christoph's excellent advice: go eat a Snickers. Calm down. Breathe. None of the rest of us are anywhere near as upset about this as you are right now. -- Chuck Lever
> On Jul 6, 2024, at 2:42 AM, Christoph Hellwig <hch@infradead.org> wrote: > > On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote: >>> a different scheme for bypassing the server for I/O. Maybe there is >>> a really good killer argument for doing that, but it needs to be clearly >>> stated and defended instead of assumed. >> >> Could you provide a reference to the textbook - or RFC - that describes >> a pNFS DS protocol that completely bypasses the network, allowing the >> client and MDS to determine if they are the same host and to potentially >> do zero-copy IO? > > I did not say that we have the exact same functionality available and > there is no work to do at all, just that it is the standard way to bypass > the server. > > RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass > the network and require the client and server to find out that they talk > to the same storage device, and directly perform zero-copy I/O. > They do not require to be on the same host, though. > >> If not, I will find it hard to understand your claim that it is "the >> textbook example". > > pNFS is all about handing out grants to bypass the server for I/O. > That is exactly what localio is doing. In particular, Neil, a pNFS block/SCSI layout provides the client with a set of device IDs. If the client is on the same storage fabric as those devices, it can then access those devices directly using SCSI commands rather than going on the network [RFC8154]. This is equivalent to a loopback acceleration mechanism. If the client and server are on the same host, then there are natural ways to expose the devices to both peers, and the existing pNFS protocol and SCSI Persistent Reservation provide strong access authorization. Both the Linux NFS client and server implement RFC 8154 well enough that this could be an alternative or even a better solution than LOCALIO. The server stores an XFS file system on the devices, and hands out layouts with the device ID and LBAs of the extents where file content is located. The fly in this ointment is the need for NFSv3 support. In an earlier email Mike mentioned that Hammerspace isn't interested in providing a centrally managed directory of block devices that could be utilized by the MDS to simply inform the client of local devices. I don't think that's the only possible solution for discovering the locality of storage devices. -- Chuck Lever
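[For readers following the block-layout details: the "device ID and LBAs of the extents" mentioned above are carried in the layout as extent records. Below is a rough C rendering of the RFC 5663 extent type, paraphrased from the RFC's XDR from memory; the field names may differ slightly, so treat the RFC itself as authoritative.]

#include <stdint.h>

/*
 * Rough C rendering of the RFC 5663 block-layout extent (the XDR type
 * pnfs_block_extent4), paraphrased from memory; consult the RFC for
 * the authoritative definition.
 */
typedef uint64_t offset4;
typedef uint64_t length4;
typedef struct { unsigned char data[16]; } deviceid4;

enum pnfs_block_extent_state4 {
	PNFS_BLOCK_READWRITE_DATA = 0,	/* valid for reads and writes */
	PNFS_BLOCK_READ_DATA      = 1,	/* valid for reads only */
	PNFS_BLOCK_INVALID_DATA   = 2,	/* allocated, must be written first */
	PNFS_BLOCK_NONE_DATA      = 3,	/* hole: reads return zeroes */
};

struct pnfs_block_extent4 {
	deviceid4 bex_vol_id;		/* device, resolved via GETDEVICEINFO */
	offset4   bex_file_offset;	/* byte offset within the file */
	length4   bex_length;		/* extent length in bytes */
	offset4   bex_storage_offset;	/* byte offset on the device (the LBA) */
	enum pnfs_block_extent_state4 bex_state;
};

[A LAYOUTGET reply carries a list of these; the client then performs its I/O directly at bex_storage_offset on the named device, which is the server bypass being discussed.]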
Chuck, I think we can both agree there is no real benefit to us trading punches and looking like fools more than we already have. I say that not to impugn you or your position, but we look foolish. I will pull my punches entirely (I really am, I could blow up everything and that'll really be career limiting, heh). My aim is to say my piece in this reply and hopefully we can set this unfortunate exchange to one side. I'll learn what I can from it, maybe you will, maybe others will... On Sat, Jul 06, 2024 at 04:58:50PM +0000, Chuck Lever III wrote: > > > > On Jul 5, 2024, at 11:58 PM, Mike Snitzer <snitzer@kernel.org> wrote: > > > > > > Hi Chuck, > > > > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to > > Christoph. Who has taken to feign incapable of understanding localio > > yet is perfectly OK with flexing like he is an authority on the topic. > > Well let's try a reality test. Perception is reality, so your reality is different from mine. Neil, Christoph and yourself all took my last "Who has taken to feign" sentence above as some ad-hominem attack. It wasn't; Christoph was acting like the localio code is incomprehensible for effect. > Christoph has authored an IETF RFC on pNFS. He's also contributed > the pNFS SCSI (and now NVMe) implementation in the Linux server > and client. He seems to know the code well enough to offer an > informed opinion. I am _not_ questioning (and _never_ have questioned) Christoph's extensive contributions, experience or intelligence. I am calling out that Christoph didn't actually review the localio code but proceeded to make extreme baseless negative judgments of it. And you've glossed over that entirely and made it about "Christoph is Christoph, who are you again?". I'm well aware who he is, but I'm saying baseless negative judgments are also a very regular occurrence from him when he decides he just wants to blow up what I am doing (could be he has done this to others, I have no idea). It isn't about the technical details in the moment he does this; if he says it, it must be true; the aim is to taint my work. Sad, because I thought that dynamic died when I finally relented in our feud over NVMe vs DM multipath (which spanned 2018 to 2021). But damnit, it is happening all over again, now in the context of localio. And you're buying it because he is parroting your concerns about "why can't pNFS be used instead!?" Neil won't get an answer after having called Christoph out on this (unless Christoph now wants to try to make a liar out of me by fabricating something well after his moment to do so has passed): https://marc.info/?l=linux-nfs&m=172021727813076&w=2 If Christoph could arrest his propensity to do harm where it is not warranted, I'd take every piece of critical technical feedback very seriously and adjust the code as needed. When it is about the code, _actually_ about the code... Christoph rules. He knows this, happy to say it (again): I respect his technical ability, etc. I do _not_ respect him making blanket statements about code without saying _why_ he arrived at his judgment. <snip many words from me answering "why NFSv3?"> > > Hammerspace would like to get all its Linux kernel NFS innovation > > upstream. And I'm trying to do that. localio is my first task and > > I've been working on it with focus for the past 2 months since joining > > Hammerspace. But you basically know all this, I said all of it to you > > at LSF. > > > > So if you know all these things (I _know_ you do), why are you > > treating me in this way?
I feel like I'm caught in the middle of some > > much bigger divide than anything I've been involved with, caused or > > made privy to. > > Yes, NFS is larger (much larger) and much older than the > Linux implementation of it. I'm sorry your new colleagues > have not seen fit to help you fit yourself into this > picture. See, a non sequitur that takes shots at Hammerspace isn't professional. Over the numerous iterations of localio there have been a handful of times you have taken to name-drop "Hammerspace" with a negative connotation, simmering with contempt. Like I and others should feel shame by association. They haven't done anything wrong here. Trond hasn't done anything wrong here. Whatever your grievance with Hammerspace, when interacting with me please know I don't have any context for it. It is immaterial to me and I don't need to know. If/when you have individual technical problems with something Hammerspace is doing, let's just work it through without it needing to be this awkward elephant. > > Guess the messenger gets shot sometimes. > > This is exactly the problem: your attitude of victimhood. Statements like that show you're going out of your way to make an enemy of me with no _real_ basis. But let me be clear: I have absolutely been a victim of bullying and serious psychological attacks over this past week, particularly repeat gaslighting by both you and Christoph. That you're so unaware of how you have spun a false narrative and are now tag-teaming with Christoph to attack me further has been apparent for all to see. Hopefully linux-nfs is low traffic ;) You have obviously formed an unfavorable opinion of me; the apparent turning point was my v11 patch header and ensuing negative exchange. I own being too pushy when seeking progress on localio v11's inclusion for 6.11; I apologized for that. Hopefully we can actually get past all of this. > You act like our questions are personal attacks on you. No, I act like your personal attacks are personal attacks. Questions and technical issues are always fair game. I pushed back on _one_ of your NFSD requirements (tracepoints), sorry if that was some indication that I'm a malcontent... but I'm not. I can have technical opinions, though, so I may make them known at times. > Answering "I don't know" or "I need to think about it" or > "Let me ask someone" or "Can you explain that further" is > perfectly fine. Acknowledging the areas where you need to > learn more is a quintessential part of being a professional > software engineer. I'm not afraid to admit when I don't know something (I've said it to you when we met at LSF). I'm in no way an expert in NFS, you are. I respect your command of NFS and welcome learning from you (like I have to this point and hope to in the future). > --- > > I have no strong feelings one way or another about flexfiles. > > And, I remain a full-throated advocate of the NFSv3 support > in the Linux NFS stack. > > If you get an impression from me, come talk to me first > to confirm it. Don't go talk to your colleagues; in > particular don't talk to the ones who like to spread a lot > of weird ideas about me. You're getting bad information. I haven't ever heard from anyone at Hammerspace about you. And I haven't sought insight about you either. AFAIK you aren't on anyone's mind within Hammerspace (other than me given I'm actively dealing with this unfortunate situation). > --- > > Our question isn't "Why NFSv3?"
It's: Can your design > document explain in detail how the one existing application > (the data mover) will use and benefit from loopback > acceleration? It needs to explain why the data mover > does not use possibly more suitable solutions like > NFSv4.2 COPY. Why are we going to the effort of adding > this side car instead of using facilities that are > already available? > > We're not asking for a one sentence email. A one sentence > email is not "a response in simple terms". It is a petulant > dismissal of our request for more info. Well, I in no way intended it to be petulant. I was trying to reduce the attack surface. And if you'd take me at my word, that'd go a long way. Try to take a step back and trust me when I tell you something. You're welcome to be unfulfilled by an answer and seek clarity, but I promise you I'm not evasive or leaving information on the floor when I answer questions. If I don't know something I seek the details out. But you and Christoph are making a simple line of development (localio) into some referendum on design choices in layers you have no charter to concern yourself with (Hammerspace's data movers). You still cannot accept that localio is devoid of pNFS use case requirements (as required by Hammerspace) but still see fit to reengineer the feature in terms of pNFS. Hammerspace simply wants to optimize an NFS client on a host connecting to an NFSD running in a container on the same host. It is doing that in service to its distributed namespace that it is hosting in disparate NFSD instances. But the "data movers" are a sideband remapping/reshaping/rebalancing service. Completely disjoint from the distributed pNFS namespace that the primary (flexfiles) clients are accessing. > We're asking for a real problem statement and use case, > in detail, in the design document, not in email. > > (Go read the requests in this thread again. Honest, that's > all we're asking for). I'll consult with Trond to try to see how he suggests appeasing your request more than I already have. But you really are departing heavily from the narrow scope that localio covers. > --- > > But finally: We've asked repeatedly for very typical > changes and answers, and though sometimes we're met > with a positive response, other times we get a defensive > response, or "I don't feel like it" or "that's busy work" > or "you're wasting my time." That doesn't sound like the > spirit of co-operation that I would like to see from a > regular contributor, nor do I expect it from someone who > is also a Linux kernel maintainer who really ought to > know better. Projecting attributes and actions onto me doesn't make them true. I haven't done or said most of the things you just asserted there. I pushed back on tracepoints, but in the end I removed all the dprintk()s from the NFSD code. > So heed Christoph's excellent advice: go eat a Snickers. > Calm down. Breathe. None of the rest of us are anywhere > near as upset about this as you are right now. Your words say otherwise; they have become quite negatively charged in the last week. But please let's just move on. I'm serious, no lasting harm done from my vantage point, I can move on if you can. Thanks, Mike
On Sat, 06 Jul 2024, Christoph Hellwig wrote: > On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote: > > > a different scheme for bypassing the server for I/O. Maybe there is > > > a really good killer argument for doing that, but it needs to be clearly > > > stated and defended instead of assumed. > > > > Could you provide a reference to the textbook - or RFC - that describes > > a pNFS DS protocol that completely bypasses the network, allowing the > > client and MDS to determine if they are the same host and to potentially > > do zero-copy IO? > > I did not say that we have the exact same functionality available and > there is no work to do at all, just that it is the standard way to bypass > the server. Sometimes what you don't say is important. As you acknowledge there is work to do. Understanding how much work is involved is critical to understanding that possible direction. > > RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass > the network and require the client and server to find out that they talk > to the same storage device, and directly perform zero-copy I/O. > They do not require to be on the same host, though. Thanks. > > > If not, I will find it hard to understand your claim that it is "the > > textbook example". > > pNFS is all about handing out grants to bypass the server for I/O. > That is exactly what localio is doing. Yes, there is clearly an alignment. But pNFS is about handing out grants using standardised protocols that support interoperability between distinct nodes, and possibly distinct implementations. localio doesn't need any of that. It all exists in a single implementation on a single node. So in that sense there can be expected to be different priorities. Why should we pay the costs of pNFS when implementing localio? That question can only be answered if we have a good understanding of the costs and benefits. And that requires having a concrete proposal for the "pNFS" option - if only a detailed sketch. Just because pNFS could be part of the answer (which I don't dispute), that doesn't necessarily mean it should be part of the answer. Thanks, NeilBrown
On Sun, 07 Jul 2024, Chuck Lever III wrote: > > Both the Linux NFS client and server implement RFC 8154 > well enough that this could be an alternative or even a > better solution than LOCALIO. The server stores an XFS > file system on the devices, and hands out layouts with > the device ID and LBAs of the extents where file content > is located. > > The fly in this ointment is the need for NFSv3 support. Another fly in this ointment is that only XFS currently implements that .map_blocks export_operation, so only it could be used as a server-side filesystem. Maybe that would not be a barrier to Mike, but it does make it a lot less interesting to me (not that I have a particular use case in mind, but I just find "local bypass for NFSv4.1+ on XFS" less interesting than "local bypass for NFS on Linux"). But my interest isn't a requirement of course. > > In an earlier email Mike mentioned that Hammerspace isn't > interested in providing a centrally managed directory of > block devices that could be utilized by the MDS to simply > inform the client of local devices. I don't think that's > the only possible solution for discovering the locality of > storage devices. Could you sketch out an alternate solution so that it can be assessed objectively? Thanks, NeilBrown
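[For context on the hook Neil names: the export side of the block/SCSI layouts runs through struct export_operations, and a filesystem opts in by implementing roughly the members below. This is a sketch from memory of include/linux/exportfs.h - check the tree for the authoritative signatures - and today only fs/xfs/xfs_pnfs.c provides them, which is why those layouts are XFS-only on the server.]

/*
 * Sketch (from memory) of the pNFS block/SCSI-layout hooks in
 * struct export_operations; see include/linux/exportfs.h for the real
 * definitions. Only XFS implements these today.
 */
struct export_operations {
	/* ... unrelated members omitted ... */

	/* identify the underlying volume so GETDEVICEINFO can describe it */
	int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len,
			u64 *offset);
	/* map (and, for writes, provision) the extents backing a LAYOUTGET */
	int (*map_blocks)(struct inode *inode, loff_t offset, u64 len,
			  struct iomap *iomap, bool write,
			  u32 *device_generation);
	/* make client-written extents stable at LAYOUTCOMMIT time */
	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
			     int nr_iomaps, struct iattr *iattr);
};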
On Mon, Jul 08, 2024 at 02:03:02PM +1000, NeilBrown wrote: > > I did not say that we have the exact same functionality available and > > there is no work to do at all, just that it is the standard way to bypass > > the server. > > Sometimes what you don't say is important. As you acknowledge there is > work to do. Understanding how much work is involved is critical to > understanding that possible direction. Of course there is. I've never said we don't need to do any work, I'm just asking why we are not using the existing infrastructure to do it. > But pNFS is about handing out grants using standardised protocols that > support interoperability between distinct nodes, and possibly distinct > implementations. localio doesn't need any of that. It all exists in a > single implementation on a single node. So in that sense there can be > expected to be different priorities. > > Why should we pay the costs of pNFS when implementing localio? Why do you think we pay a cost for it? From all I can tell it makes the job simpler, especially if we want to do things like bypassing the second page cache. > That > question can only be answered if we have a good understanding of the > costs and benefits. And that requires having a concrete proposal for > the "pNFS" option - if only a detailed sketch. I sketched the very sketchy sketch earlier - add a new localio layout that does local file I/O. The I/O side of that is pretty trivial, and maybe I can find some time to write draft code. The file open side is just as horrible as in the current localio proposal, and I could just reuse that for now, although I think the concept of opening the file in the client context is fundamentally wrong no matter how we skin the cat.
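[To make the client half of that sketch slightly more concrete: a new layout type would plug into the existing pNFS machinery through struct pnfs_layoutdriver_type. Everything below is hypothetical - the layout id, name, and callback bodies are invented for illustration; only the registration interface and the field/callback names come from the client's fs/nfs/pnfs.h.]

/*
 * HYPOTHETICAL sketch of a "localio layout" client driver, per the idea
 * above. The id, name, and callback bodies are invented; a real layout
 * type id would need IANA assignment. struct pnfs_layoutdriver_type and
 * pnfs_register_layoutdriver() are the existing client interfaces
 * (fs/nfs/pnfs.h).
 */
static enum pnfs_try_status
localio_read_pagelist(struct nfs_pgio_header *hdr)
{
	/* would issue the read via local file I/O on an nfsd-opened file */
	return PNFS_NOT_ATTEMPTED;	/* placeholder */
}

static enum pnfs_try_status
localio_write_pagelist(struct nfs_pgio_header *hdr, int sync)
{
	/* would issue the write via local file I/O on an nfsd-opened file */
	return PNFS_NOT_ATTEMPTED;	/* placeholder */
}

static struct pnfs_layoutdriver_type localio_layout_type = {
	.id		= 0x20000000,	/* invented; real ids are IANA-assigned */
	.name		= "LAYOUT_LOCALIO",
	.owner		= THIS_MODULE,
	.read_pagelist	= localio_read_pagelist,
	.write_pagelist	= localio_write_pagelist,
	/* alloc_layout_hdr/alloc_lseg/free_lseg etc. omitted for brevity */
};

static int __init localio_layout_init(void)
{
	return pnfs_register_layoutdriver(&localio_layout_type);
}

[As the email notes, the I/O path is the easy part; the open-ended question is how the file gets opened, and with whose credentials.]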
On Sat, Jul 06, 2024 at 05:15:08PM +0000, Chuck Lever III wrote: > In an earlier email Mike mentioned that Hammerspace isn't > interested in providing a centrally managed directory of > block devices that could be utilized by the MDS to simply > inform the client of local devices. I don't think that's > the only possible solution for discovering the locality of > storage devices. Btw, no matter what Hammerspace feels (and why that matters for Linux), the block layout is not a good idea for bypassing the network between supposedly isolated containers. It completely bypasses the NFS security model, which is a spectacularly bad idea if the clients aren't fully trusted. I've mentioned this before, but I absolutely do not advocate for using the block layout as a network bypass here. The concept of doing local file I/O from the client in cases where we can do it is absolutely sensible. I just don't think doing it in a magic unmanaged layer is a good idea, and figuring out how to pass the opened file from nfsd to the client without risking security problems and creating painful layering violations needs to be solved.
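[For what it's worth, the open-file handoff described above can be framed as an interface question: knfsd resolves the filehandle and performs the open, so export options, ID squashing, and permission checks all happen with server-side context, and the client only ever consumes the resulting struct file. A purely hypothetical signature to fix ideas - none of these names exist in the tree.]

/*
 * HYPOTHETICAL handoff interface, for illustration only: nfsd does the
 * open (export checks, credential mapping, and permission enforcement
 * all happen server-side); the client gets back a struct file for local
 * read/write and never resolves paths in the server's namespace itself.
 */
struct file *nfsd_local_file_handoff(struct net *net,          /* server netns */
				     const struct nfs_fh *fh,  /* filehandle */
				     const struct cred *cred,  /* mapped cred */
				     fmode_t may_flags);       /* FMODE_READ/WRITE */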
On Sat, Jul 06, 2024 at 09:12:44AM -0400, Mike Snitzer wrote: > If you'd like reasonable people to not get upset with you, you might > try treating them with respect rather than gaslighting them. Let me just repeat the calm-down offer. I'll do my side of it by not replying to these threads for the rest of this week; after that I'll forget all the accusations you're throwing at me, and we restart a purely technical discussion, ok?
On Mon, Jul 08, 2024 at 02:46:28AM -0700, Christoph Hellwig wrote: > On Sat, Jul 06, 2024 at 09:12:44AM -0400, Mike Snitzer wrote: > > If you'd like reasonable people to not get upset with you, you might > > try treating them with respect rather than gaslighting them. > > Let me just repeat the calm-down offer. I'll do my side of it by > not replying to these threads for the rest of this week; after that > I'll forget all the accusations you're throwing at me, and we restart > a purely technical discussion, ok? OK
> On Jul 8, 2024, at 12:10 AM, NeilBrown <neilb@suse.de> wrote: > > On Sun, 07 Jul 2024, Chuck Lever III wrote: >> >> Both the Linux NFS client and server implement RFC 8154 >> well enough that this could be an alternative or even a >> better solution than LOCALIO. The server stores an XFS >> file system on the devices, and hands out layouts with >> the device ID and LBAs of the extents where file content >> is located. >> >> The fly in this ointment is the need for NFSv3 support. > > Another fly in this ointment is that only XFS currently implements that > .map_blocks export_operation, so only it could be used as a server-side > filesystem. I agree that limiting loopback acceleration only to XFS exports is an artificial and undesirable constraint. > Maybe that would not be a barrier to Mike, but it does make it a lot > less interesting to me (not that I have a particular use case in mind, > but I just find "local bypass for NFSv4.1+ on XFS" less interesting than > "local bypass for NFS on Linux"). > > But my interest isn't a requirement of course. My focus for now is ensuring that whatever is merged into NFSD can be audited, in a straightforward and rigorous way, to give us confidence that it is secure and respects the existing policies and boundaries that are configured via export authorization and local namespace settings. The pNFS-based approaches have the advantage that a lot of that auditing has already been done, and by a wider set of reviewers than just the Linux community. That's my only interest in possibly excluding NFSv3; I have no other quibble with the high-level goal of supporting NFSv3 for loopback acceleration. Thanks everyone for a spirited and frank discussion. >> In an earlier email Mike mentioned that Hammerspace isn't >> interested in providing a centrally managed directory of >> block devices that could be utilized by the MDS to simply >> inform the client of local devices. I don't think that's >> the only possible solution for discovering the locality of >> storage devices. > > Could you sketch out an alternate solution so that it can be assessed > objectively? I'll give it some thought. -- Chuck Lever
On Mon, 08 Jul 2024, Christoph Hellwig wrote: > On Mon, Jul 08, 2024 at 02:03:02PM +1000, NeilBrown wrote: > > > I did not say that we have the exact same functionality available and > > > there is no work to do at all, just that it is the standard way to bypass > > > the server. > > > > Sometimes what you don't say is important. As you acknowledge there is > > work to do. Understanding how much work is involved is critical to > > understanding that possible direction. > > Of course there is. I've never said we don't need to do any work, > I'm just asking why we are not using the existing infrastructure to do > it. > > > But pNFS is about handing out grants using standardised protocols that > > support interoperability between distinct nodes, and possibly distinct > > implementations. localio doesn't need any of that. It all exists in a > > single implementation on a single node. So in that sense there can be > > expected to be different priorities. > > > > Why should we pay the costs of pNFS when implementing localio? > > Why do you think we pay a cost for it? From all I can tell it makes > the job simpler, especially if we want to do things like bypassing > the second page cache. > > > That > > question can only be answered if we have a good understanding of the > > costs and benefits. And that requires having a concrete proposal for > > the "pNFS" option - if only a detailed sketch. > > I sketched the very sketchy sketch earlier - add a new localio > layout that does local file I/O. The I/O side of that is pretty > trivial, and maybe I can find some time to write draft code. The file > open side is just as horrible as in the current localio proposal, > and I could just reuse that for now, although I think the concept > of opening the file in the client context is fundamentally wrong > no matter how we skin the cat. > I had been assuming that you were proposing a new pNFS layout type with associated protocol extension, an RFC describing them, and navigation of the IETF standards process. These are (some of) the costs I was thinking of. Of course the IETF requires demonstration of interoperability between multiple implementations, and as that is impossible for localio, we would fail before we started. But I now suspect that I guessed your intention wrongly (I'm rubbish at guessing other people's intentions). Your use of the word "infrastructure" above and the sketchy sketch you provide (thanks) seem to indicate you are only suggesting that we re-use some of the pnfs abstractions and interfaces already implemented in the Linux NFS client and server. Is that what you mean? If it is, then it isn't immediately clear to me that this would have to be NFSv4 only. The different versions already share code where that makes sense. Moving the pnfs code from the nfsv4 module to a new pnfs module which nfsv3 also depends on might make sense. I'm keen to know what you are really thinking. Thanks, NeilBrown