Message ID: 164325908579.23133.4781039121536248752.stgit@noble.brown (mailing list archive)
Series: nfsd: allow NFSv4 state to be revoked.
Hi Neil-

> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>
> If a filesystem is exported to a client with NFSv4 and that client holds a file open, the filesystem cannot be unmounted without either stopping the NFS server completely, or blocking all access from that client (unexporting all filesystems) and waiting for the lease timeout.
>
> For NFSv3 - and particularly NLM - it is possible to revoke all state by writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>
> This series extends this functionality to NFSv4. With this, to unmount an exported filesystem it is sufficient to disable export of that filesystem, and then write the path to unlock_filesystem.
>
> I've cursed mainly on NFSv4.1 and later for this. I haven't tested yet with NFSv4.0, which has different mechanisms for state management.
>
> If this series is seen as a generally acceptable approach, I'll look into the NFSv4.0 aspects properly and make sure it works there.

I've browsed this series and need to think about:
- whether we want to enable administrative state revocation, and
- whether NFSv4.0 can support that reasonably

In particular, are there security consequences for revoking state? What would applications see, and would that depend on which minor version is in use? Are there data corruption risks if this facility were to be misused?

Also, Dai's courteous server work is something that potentially conflicts with some of this, and I'd like to see that go in first.

Do you have specific user requests for this feature, and if so, what are the particular usage scenarios?
> Thanks,
> NeilBrown
>
> ---
>
> NeilBrown (4):
>       nfsd: prepare for supporting admin-revocation of state
>       nfsd: allow open state ids to be revoked and then freed
>       nfsd: allow lock state ids to be revoked and then freed
>       nfsd: allow delegation state ids to be revoked and then freed
>
>
> fs/nfsd/nfs4state.c | 105 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 96 insertions(+), 9 deletions(-)
>
> --
> Signature

You should fill this in. :-)

--
Chuck Lever
On Thu, Jan 27, 2022 at 03:58:10PM +1100, NeilBrown wrote:
> If a filesystem is exported to a client with NFSv4 and that client holds a file open, the filesystem cannot be unmounted without either stopping the NFS server completely, or blocking all access from that client (unexporting all filesystems) and waiting for the lease timeout.
>
> For NFSv3 - and particularly NLM - it is possible to revoke all state by writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>
> This series extends this functionality to NFSv4. With this, to unmount an exported filesystem it is sufficient to disable export of that filesystem, and then write the path to unlock_filesystem.

It's always been weird that /proc/fs/nfsd/unlock_filesystem was v3-only, so thanks for looking into extending it to v4.

You can accomplish the same by stopping the server, unexporting, then restarting, but then applications may see grace-period-length delays. So in a way this is just an optimization for what's probably a rare operation. Probably worth it, but I'd still be curious whether there are any specific motivating cases you can share.

I guess the procedure would be to unexport and then write to /proc/fs/nfsd/unlock_filesystem? An option to exportfs to do both might be handy.

> I've cursed mainly on NFSv4.1

It does inspire strong feelings sometimes.

--b.

> and later for this. I haven't tested yet with NFSv4.0, which has different mechanisms for state management.
>
> If this series is seen as a generally acceptable approach, I'll look into the NFSv4.0 aspects properly and make sure it works there.
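[For the archive: the unexport-then-unlock procedure being discussed would look roughly like the sketch below. The export path and wildcard client spec are made-up examples; it assumes a running knfsd with this series applied, and degrades to printing the commands on a machine without one.]

```shell
#!/bin/sh
# Sketch: unmount an exported filesystem without stopping nfsd.
# 1. unexport, 2. revoke remaining NFS state, 3. unmount.
FS=/export/data   # example path, not from the patch series

# 1. Remove the export for all clients.
if command -v exportfs >/dev/null 2>&1; then
    exportfs -u "*:$FS" 2>/dev/null || echo "note: $FS was not exported"
else
    echo "dry run: exportfs -u \"*:$FS\""
fi

# 2. Drop NLM (v3) state -- and, with this series, NFSv4 state --
#    held against the path.
if [ -w /proc/fs/nfsd/unlock_filesystem ]; then
    echo "$FS" > /proc/fs/nfsd/unlock_filesystem
else
    echo "dry run: echo $FS > /proc/fs/nfsd/unlock_filesystem"
fi

# 3. With no nfsd references left pinning the filesystem,
#    the unmount should now succeed (not executed in this sketch).
echo "would now run: umount $FS"
```

This is also roughly what a hypothetical `exportfs` convenience option, as Bruce suggests, would bundle into one step.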
>
> Thanks,
> NeilBrown
>
> ---
>
> NeilBrown (4):
>       nfsd: prepare for supporting admin-revocation of state
>       nfsd: allow open state ids to be revoked and then freed
>       nfsd: allow lock state ids to be revoked and then freed
>       nfsd: allow delegation state ids to be revoked and then freed
>
>
> fs/nfsd/nfs4state.c | 105 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 96 insertions(+), 9 deletions(-)
>
> --
> Signature
On Fri, 28 Jan 2022, Chuck Lever III wrote:
> Hi Neil-
>
> > On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > If a filesystem is exported to a client with NFSv4 and that client holds a file open, the filesystem cannot be unmounted without either stopping the NFS server completely, or blocking all access from that client (unexporting all filesystems) and waiting for the lease timeout.
> >
> > For NFSv3 - and particularly NLM - it is possible to revoke all state by writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
> >
> > This series extends this functionality to NFSv4. With this, to unmount an exported filesystem it is sufficient to disable export of that filesystem, and then write the path to unlock_filesystem.
> >
> > I've cursed mainly on NFSv4.1 and later for this. I haven't tested yet with NFSv4.0, which has different mechanisms for state management.
> >
> > If this series is seen as a generally acceptable approach, I'll look into the NFSv4.0 aspects properly and make sure it works there.
>
> I've browsed this series and need to think about:
> - whether we want to enable administrative state revocation and
> - whether NFSv4.0 can support that reasonably
>
> In particular, are there security consequences for revoking state? What would applications see, and would that depend on which minor version is in use? Are there data corruption risks if this facility were to be misused?

The expectation is that this would only be used after unexporting the filesystem. In that case, the client wouldn't notice any difference from the act of writing to unlock_filesystem, as the entire filesystem would already be inaccessible.

If we did unlock_filesystem a filesystem that was still exported, the client would see similar behaviour to a network partition that was of longer duration than the lease time. Locks would be lost.
> Also, Dai's courteous server work is something that potentially conflicts with some of this, and I'd like to see that go in first.

I'm perfectly happy to wait for the courteous server work to land before pursuing this.

> Do you have specific user requests for this feature, and if so, what are the particular usage scenarios?

It's complicated....

The customer has an HA config with multiple filesystem resources which they want to be able to migrate independently. I don't think we really support that, but they seem to want to see if they can make it work (and it should be noted that I talk to an L2 support technician who talks to the customer representative, so I might not be getting the full story).

Customer reported that even after unexporting a filesystem, they cannot then unmount it. Whether or not we think that independent filesystem resources are supportable, I do think that the customer should have a clear path for unmounting a filesystem without interfering with service provided from other filesystems. Stopping nfsd would interfere with that service by forcing a grace-period on all filesystems. The RFC explicitly supports admin-revocation of state, and that would address this specific need, so it seemed completely appropriate to provide it.

As an aside ... I'd like to be able to suggest that the customer use network namespaces for the different filesystem resources. Each could be in its own namespace and managed independently. However I don't think we have good admin infrastructure for that, do we?

I'd like to be able to say "set up these 2 or 3 config files and run systemctl start nfs-server@foo and the 'foo' network namespace will be created, configured, and have an nfs server running". Do we have anything approaching that? Even a HOWTO ??

Thanks,
NeilBrown
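[For the archive: the per-namespace unit Neil wishes for might look roughly like the purely hypothetical systemd template below. Nothing like nfs-server@.service ships today - that is the point of the question - and the netns@.service dependency and all paths are invented for illustration.]

```ini
# /etc/systemd/system/nfs-server@.service -- hypothetical sketch only.
# "systemctl start nfs-server@foo" would start knfsd inside the "foo"
# network namespace, relying on a separate netns@.service template to
# create and configure the namespace (veth, addresses, exports) first.
[Unit]
Description=NFS server in network namespace %i
Requires=netns@%i.service
After=netns@%i.service network.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Each namespace gets its own knfsd threads; "rpc.nfsd 0" stops them.
ExecStart=/usr/sbin/ip netns exec %i /usr/sbin/rpc.nfsd 8
ExecStop=/usr/sbin/ip netns exec %i /usr/sbin/rpc.nfsd 0

[Install]
WantedBy=multi-user.target
```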
On 1/27/22 2:41 PM, NeilBrown wrote:
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>> Hi Neil-
>>
>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>>>
>>> If a filesystem is exported to a client with NFSv4 and that client holds a file open, the filesystem cannot be unmounted without either stopping the NFS server completely, or blocking all access from that client (unexporting all filesystems) and waiting for the lease timeout.
>>>
>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>
>>> This series extends this functionality to NFSv4. With this, to unmount an exported filesystem it is sufficient to disable export of that filesystem, and then write the path to unlock_filesystem.
>>>
>>> I've cursed mainly on NFSv4.1 and later for this. I haven't tested yet with NFSv4.0, which has different mechanisms for state management.
>>>
>>> If this series is seen as a generally acceptable approach, I'll look into the NFSv4.0 aspects properly and make sure it works there.
>> I've browsed this series and need to think about:
>> - whether we want to enable administrative state revocation and
>> - whether NFSv4.0 can support that reasonably
>>
>> In particular, are there security consequences for revoking state? What would applications see, and would that depend on which minor version is in use? Are there data corruption risks if this facility were to be misused?
> The expectation is that this would only be used after unexporting the filesystem. In that case, the client wouldn't notice any difference from the act of writing to unlock_filesystem, as the entire filesystem would already be inaccessible.
>
> If we did unlock_filesystem a filesystem that was still exported, the client would see similar behaviour to a network partition that was of longer duration than the lease time. Locks would be lost.
>
>> Also, Dai's courteous server work is something that potentially conflicts with some of this, and I'd like to see that go in first.
> I'm perfectly happy to wait for the courteous server work to land before pursuing this.

Thank you Chuck and Neil. I'm chasing a couple of intermittent share reservation related problems with pynfs tests. I hope to have them resolved and submit the v10 patch by the end of this week.

-Dai

>
>> Do you have specific user requests for this feature, and if so, what are the particular usage scenarios?
> It's complicated....
>
> The customer has an HA config with multiple filesystem resources which they want to be able to migrate independently. I don't think we really support that, but they seem to want to see if they can make it work (and it should be noted that I talk to an L2 support technician who talks to the customer representative, so I might not be getting the full story).
>
> Customer reported that even after unexporting a filesystem, they cannot then unmount it. Whether or not we think that independent filesystem resources are supportable, I do think that the customer should have a clear path for unmounting a filesystem without interfering with service provided from other filesystems. Stopping nfsd would interfere with that service by forcing a grace-period on all filesystems. The RFC explicitly supports admin-revocation of state, and that would address this specific need, so it seemed completely appropriate to provide it.
>
> As an aside ... I'd like to be able to suggest that the customer use network namespaces for the different filesystem resources. Each could be in its own namespace and managed independently. However I don't think we have good admin infrastructure for that, do we?
>
> I'd like to be able to say "set up these 2 or 3 config files and run systemctl start nfs-server@foo and the 'foo' network namespace will be created, configured, and have an nfs server running".
> Do we have anything approaching that? Even a HOWTO ??
>
> Thanks,
> NeilBrown
> On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>> Hi Neil-
>>
>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>>>
>>> If a filesystem is exported to a client with NFSv4 and that client holds a file open, the filesystem cannot be unmounted without either stopping the NFS server completely, or blocking all access from that client (unexporting all filesystems) and waiting for the lease timeout.
>>>
>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>
>>> This series extends this functionality to NFSv4. With this, to unmount an exported filesystem it is sufficient to disable export of that filesystem, and then write the path to unlock_filesystem.
>>>
>>> I've cursed mainly on NFSv4.1 and later for this. I haven't tested yet with NFSv4.0, which has different mechanisms for state management.
>>>
>>> If this series is seen as a generally acceptable approach, I'll look into the NFSv4.0 aspects properly and make sure it works there.
>>
>> I've browsed this series and need to think about:
>> - whether we want to enable administrative state revocation and
>> - whether NFSv4.0 can support that reasonably
>>
>> In particular, are there security consequences for revoking state? What would applications see, and would that depend on which minor version is in use? Are there data corruption risks if this facility were to be misused?
>
> The expectation is that this would only be used after unexporting the filesystem. In that case, the client wouldn't notice any difference from the act of writing to unlock_filesystem, as the entire filesystem would already be inaccessible.
>
> If we did unlock_filesystem a filesystem that was still exported, the client would see similar behaviour to a network partition that was of longer duration than the lease time. Locks would be lost.
>
>> Also, Dai's courteous server work is something that potentially conflicts with some of this, and I'd like to see that go in first.
>
> I'm perfectly happy to wait for the courteous server work to land before pursuing this.
>
>> Do you have specific user requests for this feature, and if so, what are the particular usage scenarios?
>
> It's complicated....
>
> The customer has an HA config with multiple filesystem resources which they want to be able to migrate independently. I don't think we really support that,

With NFSv4, the protocol has mechanisms to indicate to clients that a shared filesystem has migrated, and to indicate that the clients' state has been migrated too. Clients can reclaim their state if the servers did not migrate that state with the data. It deals with the edge cases to prevent clients from stealing open/lock state during the migration.

Unexporting doesn't seem like the right approach to that.

> but they seem to want to see if they can make it work (and it should be noted that I talk to an L2 support technician who talks to the customer representative, so I might not be getting the full story).
>
> Customer reported that even after unexporting a filesystem, they cannot then unmount it.

My first thought is that probably clients are still pinning resources on that shared filesystem. I guess that's what the unlock_ interface is supposed to deal with. But that suggests to me that unexporting first is not as risk-free as you describe above. I think applications would notice, and there would be edge cases where other clients might be able to grab open/lock state before the original holders could re-establish their lease.

> Whether or not we think that independent filesystem resources are supportable, I do think that the customer should have a clear path for unmounting a filesystem without interfering with service provided from other filesystems.

Maybe. I guess I put that in the "last resort" category rather than the "this is something safe that I want to do as part of daily operation" category.

> Stopping nfsd would interfere with that service by forcing a grace-period on all filesystems.

Yep. We have discussed implementing a per-filesystem grace period in the past. That is probably a pre-requisite to enabling filesystem migration.

> The RFC explicitly supports admin-revocation of state, and that would address this specific need, so it seemed completely appropriate to provide it.

Well, the RFC also provides for migrating filesystems without stopping the NFS service. If that's truly the goal, then I think we want to encourage that direction instead of ripping out open and lock state.

Also, it's not clear to me that clients support administrative revocation as broadly as we might like. The Linux NFS client does have support for NFSv4 migration, though it's a bit fallow these days.

> As an aside ... I'd like to be able to suggest that the customer use network namespaces for the different filesystem resources. Each could be in its own namespace and managed independently. However I don't think we have good admin infrastructure for that, do we?

None that I'm aware of. SteveD is the best person to ask.

> I'd like to be able to say "set up these 2 or 3 config files and run systemctl start nfs-server@foo and the 'foo' network namespace will be created, configured, and have an nfs server running".
> Do we have anything approaching that? Even a HOWTO ??

Interesting idea! But doesn't ring a bell.

--
Chuck Lever
On Fri, Jan 28, 2022 at 09:41:24AM +1100, NeilBrown wrote:
> It's complicated....
>
> The customer has an HA config with multiple filesystem resources which they want to be able to migrate independently. I don't think we really support that, but they seem to want to see if they can make it work (and it should be noted that I talk to an L2 support technician who talks to the customer representative, so I might not be getting the full story).
>
> Customer reported that even after unexporting a filesystem, they cannot then unmount it. Whether or not we think that independent filesystem resources are supportable, I do think that the customer should have a clear path for unmounting a filesystem without interfering with service provided from other filesystems. Stopping nfsd would interfere with that service by forcing a grace-period on all filesystems. The RFC explicitly supports admin-revocation of state, and that would address this specific need, so it seemed completely appropriate to provide it.

I was a little worried that might be the use-case.

I don't see how it's going to work. You've got clients that hold locks and opens on the unexported filesystem. So maybe you can use an NFSv4 referral to point them to the new server. Are they going to try to issue reclaims to the new server? There's more to do before this works.

> As an aside ... I'd like to be able to suggest that the customer use network namespaces for the different filesystem resources. Each could be in its own namespace and managed independently.

Yeah. Then you're basically migrating the whole server, not just the one export, and that's more of a solved problem.

> However I don't think we have good admin infrastructure for that, do we?
>
> I'd like to be able to say "set up these 2 or 3 config files and run systemctl start nfs-server@foo and the 'foo' network namespace will be created, configured, and have an nfs server running". Do we have anything approaching that? Even a HOWTO ??

But I don't think we've got anything that simple yet?

--b.
On Fri, 28 Jan 2022, J. Bruce Fields wrote:
> On Fri, Jan 28, 2022 at 09:41:24AM +1100, NeilBrown wrote:
> > It's complicated....
> >
> > The customer has an HA config with multiple filesystem resources which they want to be able to migrate independently. I don't think we really support that, but they seem to want to see if they can make it work (and it should be noted that I talk to an L2 support technician who talks to the customer representative, so I might not be getting the full story).
> >
> > Customer reported that even after unexporting a filesystem, they cannot then unmount it. Whether or not we think that independent filesystem resources are supportable, I do think that the customer should have a clear path for unmounting a filesystem without interfering with service provided from other filesystems. Stopping nfsd would interfere with that service by forcing a grace-period on all filesystems. The RFC explicitly supports admin-revocation of state, and that would address this specific need, so it seemed completely appropriate to provide it.
>
> I was a little worried that might be the use-case.

:-)

> I don't see how it's going to work. You've got clients that hold locks and opens on the unexported filesystem. So maybe you can use an NFSv4 referral to point them to the new server. Are they going to try to issue reclaims to the new server? There's more to do before this works.

As I hope I implied, I'm not at all sure that the specific problem that the customer raised (cannot unmount a filesystem) is directly related to the general solution that the customer is trying to create. Some customers like us to hold their hand the whole way, others like to (feel that they) have more control. In general I like to encourage independence (but I have to consciously avoid trusting the results).

We have an "unlock_filesystem" interface. I want it to work for NFSv4. The HA config was background, not a complete motivation.

> > As an aside ... I'd like to be able to suggest that the customer use network namespaces for the different filesystem resources. Each could be in its own namespace and managed independently.
>
> Yeah. Then you're basically migrating the whole server, not just the one export, and that's more of a solved problem.

Exactly.

> > However I don't think we have good admin infrastructure for that, do we?
> >
> > I'd like to be able to say "set up these 2 or 3 config files and run systemctl start nfs-server@foo and the 'foo' network namespace will be created, configured, and have an nfs server running". Do we have anything approaching that? Even a HOWTO ??
>
> But I don't think we've got anything that simple yet?

I guess I have some work to do....

Thanks,
NeilBrown
On Fri, 28 Jan 2022, Chuck Lever III wrote:
>
> > On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Fri, 28 Jan 2022, Chuck Lever III wrote:
> >> Hi Neil-
> >>
> >>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> >>>
> >>> If a filesystem is exported to a client with NFSv4 and that client holds a file open, the filesystem cannot be unmounted without either stopping the NFS server completely, or blocking all access from that client (unexporting all filesystems) and waiting for the lease timeout.
> >>>
> >>> For NFSv3 - and particularly NLM - it is possible to revoke all state by writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
> >>>
> >>> This series extends this functionality to NFSv4. With this, to unmount an exported filesystem it is sufficient to disable export of that filesystem, and then write the path to unlock_filesystem.
> >>>
> >>> I've cursed mainly on NFSv4.1 and later for this. I haven't tested yet with NFSv4.0, which has different mechanisms for state management.
> >>>
> >>> If this series is seen as a generally acceptable approach, I'll look into the NFSv4.0 aspects properly and make sure it works there.
> >>
> >> I've browsed this series and need to think about:
> >> - whether we want to enable administrative state revocation and
> >> - whether NFSv4.0 can support that reasonably
> >>
> >> In particular, are there security consequences for revoking state? What would applications see, and would that depend on which minor version is in use? Are there data corruption risks if this facility were to be misused?
> >
> > The expectation is that this would only be used after unexporting the filesystem. In that case, the client wouldn't notice any difference from the act of writing to unlock_filesystem, as the entire filesystem would already be inaccessible.
> >
> > If we did unlock_filesystem a filesystem that was still exported, the client would see similar behaviour to a network partition that was of longer duration than the lease time. Locks would be lost.
> >
> >> Also, Dai's courteous server work is something that potentially conflicts with some of this, and I'd like to see that go in first.
> >
> > I'm perfectly happy to wait for the courteous server work to land before pursuing this.
> >
> >> Do you have specific user requests for this feature, and if so, what are the particular usage scenarios?
> >
> > It's complicated....
> >
> > The customer has an HA config with multiple filesystem resources which they want to be able to migrate independently. I don't think we really support that,
>
> With NFSv4, the protocol has mechanisms to indicate to clients that a shared filesystem has migrated, and to indicate that the clients' state has been migrated too. Clients can reclaim their state if the servers did not migrate that state with the data. It deals with the edge cases to prevent clients from stealing open/lock state during the migration.
>
> Unexporting doesn't seem like the right approach to that.

No, but it is something that should work, and should allow the filesystem to be unmounted. You get to keep both halves.

> > but they seem to want to see if they can make it work (and it should be noted that I talk to an L2 support technician who talks to the customer representative, so I might not be getting the full story).
> >
> > Customer reported that even after unexporting a filesystem, they cannot then unmount it.
>
> My first thought is that probably clients are still pinning resources on that shared filesystem. I guess that's what the unlock_ interface is supposed to deal with. But that suggests to me that unexporting first is not as risk-free as you describe above. I think applications would notice and there would be edge cases where other clients might be able to grab open/lock state before the original holders could re-establish their lease.

Unexporting isn't risk free. It just absorbs all the risks - none are left for unlock_filesystem to be blamed for.

Expecting an application to recover if you unexport a filesystem and later re-export it is certainly not guaranteed. That isn't the use-case I particularly want to fix. I want to be able to unmount a filesystem without visiting all clients and killing off applications.

> > Whether or not we think that independent filesystem resources are supportable, I do think that the customer should have a clear path for unmounting a filesystem without interfering with service provided from other filesystems.
>
> Maybe. I guess I put that in the "last resort" category rather than the "this is something safe that I want to do as part of daily operation" category.

Agree. Definitely "last resort".

> > Stopping nfsd would interfere with that service by forcing a grace-period on all filesystems.
>
> Yep. We have discussed implementing a per-filesystem grace period in the past. That is probably a pre-requisite to enabling filesystem migration.
>
> > The RFC explicitly supports admin-revocation of state, and that would address this specific need, so it seemed completely appropriate to provide it.
>
> Well, the RFC also provides for migrating filesystems without stopping the NFS service. If that's truly the goal, then I think we want to encourage that direction instead of ripping out open and lock state.

I suspect that virtual IPs and network namespaces are the better approach for migrating exported filesystems. It isn't clear to me that integrated migration support in NFS would add anything of value. But as I think I said to Bruce - seamless migration support is not my goal here.

In the context where a site has multiple filesystems that are all NFS exported, there is a case for being able to forcibly unexport/unmount one filesystem without affecting the others. That is my aim here.

Thanks,
NeilBrown

> Also, it's not clear to me that clients support administrative revocation as broadly as we might like. The Linux NFS client does have support for NFSv4 migration, though it's a bit fallow these days.
>
> > As an aside ... I'd like to be able to suggest that the customer use network namespaces for the different filesystem resources. Each could be in its own namespace and managed independently. However I don't think we have good admin infrastructure for that, do we?
>
> None that I'm aware of. SteveD is the best person to ask.
>
> > I'd like to be able to say "set up these 2 or 3 config files and run systemctl start nfs-server@foo and the 'foo' network namespace will be created, configured, and have an nfs server running".
> > Do we have anything approaching that? Even a HOWTO ??
>
> Interesting idea! But doesn't ring a bell.
>
> --
> Chuck Lever
On Fri, 2022-01-28 at 15:24 +1100, NeilBrown wrote: > On Fri, 28 Jan 2022, Chuck Lever III wrote: > > > > > On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > On Fri, 28 Jan 2022, Chuck Lever III wrote: > > > > Hi Neil- > > > > > > > > > On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> > > > > > wrote: > > > > > > > > > > If a filesystem is exported to a client with NFSv4 and that > > > > > client holds > > > > > a file open, the filesystem cannot be unmounted without > > > > > either stopping the > > > > > NFS server completely, or blocking all access from that > > > > > client > > > > > (unexporting all filesystems) and waiting for the lease > > > > > timeout. > > > > > > > > > > For NFSv3 - and particularly NLM - it is possible to revoke > > > > > all state by > > > > > writing the path to the filesystem into > > > > > /proc/fs/nfsd/unlock_filesystem. > > > > > > > > > > This series extends this functionality to NFSv4. With this, > > > > > to unmount > > > > > an exported filesystem is it sufficient to disable export of > > > > > that > > > > > filesystem, and then write the path to unlock_filesystem. > > > > > > > > > > I've cursed mainly on NFSv4.1 and later for this. I haven't > > > > > tested > > > > > yet with NFSv4.0 which has different mechanisms for state > > > > > management. > > > > > > > > > > If this series is seen as a general acceptable approach, I'll > > > > > look into > > > > > the NFSv4.0 aspects properly and make sure it works there. > > > > > > > > I've browsed this series and need to think about: > > > > - whether we want to enable administrative state revocation and > > > > - whether NFSv4.0 can support that reasonably > > > > > > > > In particular, are there security consequences for revoking > > > > state? What would applications see, and would that depend on > > > > which minor version is in use? Are there data corruption risks > > > > if this facility were to be misused? 
> > > > > > The expectation is that this would only be used after unexporting > > > the > > > filesystem. In that case, the client wouldn't notice any > > > difference > > > from the act of writing to unlock_filesystem, as the entire > > > filesystem > > > would already be inaccessible. > > > > > > If we did unlock_filesystem a filesystem that was still exported, > > > the > > > client would see similar behaviour to a network partition that > > > was of > > > longer duration than the lease time. Locks would be lost. > > > > > > > > > > > Also, Dai's courteous server work is something that potentially > > > > conflicts with some of this, and I'd like to see that go in > > > > first. > > > > > > I'm perfectly happy to wait for the courteous server work to land > > > before > > > pursuing this. > > > > > > > > > > > Do you have specific user requests for this feature, and if so, > > > > what are the particular usage scenarios? > > > > > > It's complicated.... > > > > > > The customer has an HA config with multiple filesystem resource > > > which > > > they want to be able to migrate independently. I don't think we > > > really > > > support that, > > > > With NFSv4, the protocol has mechanisms to indicate to clients that > > a shared filesystem has migrated, and to indicate that the clients' > > state has been migrated too. Clients can reclaim their state if the > > servers did not migrate that state with the data. It deals with the > > edge cases to prevent clients from stealing open/lock state during > > the migration. > > > > Unexporting doesn't seem like the right approach to that. > > No, but it something that should work, and should allow the > filesystem > to be unmounted. You get to keep both halves. > > > > > > > > but they seem to want to see if they can make it work (and > > > it should be noted that I talk to an L2 support technician who > > > talks to > > > the customer representative, so I might be getting the full > > > story). 
> > >
> > > Customer reported that even after unexporting a filesystem, they
> > > cannot then unmount it.
> >
> > My first thought is that probably clients are still pinning
> > resources on that shared filesystem. I guess that's what the
> > unlock_ interface is supposed to deal with. But that suggests
> > to me that unexporting first is not as risk-free as you
> > describe above. I think applications would notice and there
> > would be edge cases where other clients might be able to
> > grab open/lock state before the original holders could
> > re-establish their lease.
>
> Unexporting isn't risk free. It just absorbs all the risks - none are
> left for unlock_filesystem to be blamed for.
>
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed. That isn't the use-case
> I particularly want to fix. I want to be able to unmount a filesystem
> without visiting all clients and killing off applications.
>
> >
> > > Whether or not we think that independent filesystem
> > > resources are supportable, I do think that the customer should have a
> > > clear path for unmounting a filesystem without interfering with
> > > service provided from other filesystems.
> >
> > Maybe. I guess I put that in the "last resort" category
> > rather than "this is something safe that I want to do as
> > part of daily operation" category.
>
> Agree. Definitely "last resort".
>
> >
> > > Stopping nfsd would interfere with
> > > that service by forcing a grace-period on all filesystems.
> >
> > Yep. We have discussed implementing a per-filesystem
> > grace period in the past. That is probably a pre-requisite
> > to enabling filesystem migration.
> >
> > > The RFC explicitly supports admin-revocation of state, and that
> > > would address this specific need, so it seemed completely appropriate
> > > to provide it.
> >
> > Well the RFC also provides for migrating filesystems without
> > stopping the NFS service. If that's truly the goal, then I
> > think we want to encourage that direction instead of ripping
> > out open and lock state.
>
> I suspect that virtual IPs and network namespaces are the better
> approach for migrating exported filesystems. It isn't clear to me that
> integrated migration support in NFS would add anything of value.

No, but referrals allow you to create an arbitrary namespace out of a
set of containerised knfsd instances. It really wouldn't be hard to
convert an existing setup into something that gives you the
single-filesystem migration capabilities you're asking for.
On Fri, Jan 28, 2022 at 03:14:51PM +1100, NeilBrown wrote:
> On Fri, 28 Jan 2022, J. Bruce Fields wrote:
> > I don't see how it's going to work. You've got clients that hold locks
> > and opens on the unexported filesystem. So maybe you can use an NFSv4
> > referral to point them to the new server. Are they going to try to
> > issue reclaims to the new server? There's more to do before this works.
>
> As I hope I implied, I'm not at all sure that the specific problem that
> the customer raised (cannot unmount a filesystem) is directly related to
> the general solution that the customer is trying to create. Some
> customers like us to hold their hand the whole way, others like to (feel
> that they) have more control. In general I like to encourage
> independence (but I have to consciously avoid trusting the results).
>
> We have an "unlock_filesystem" interface. I want it to work for NFSv4.
> The HA config was background, not a complete motivation.

I think people do occasionally need to just rip a filesystem out even if
it means IO errors to applications. And we already do this for NFSv3.
So, I'm inclined to support the idea.

(I also wonder whether some of the code could be a useful step towards
other functionality.)

> > > However I don't think we have good admin infrastructure for that, do
> > > we?
> > >
> > > I'd like to be able to say "set up these 2 or 3 config files and run
> > > systemctl start nfs-server@foo and the 'foo' network namespace will be
> > > created, configured, and have an nfs server running". Do we have
> > > anything approaching that? Even a HOWTO ??
> >
> > But I don't think we've got anything that simple yet?
>
> I guess I have some work to do....

RHEL HA does support NFS failover using containers. I think it's a bit
more complicated than you're looking for. Let me go dig that up....

With a KVM VM and shared backend storage I think it's pretty easy: just
shut down the VM on one machine and bring it up on another.

--b.
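[Editor's note: for reference, the unexport-then-revoke sequence the series
enables can be sketched as below. The export path /export/projects is
hypothetical, and the sketch only *prints* each command rather than running
it, since the real steps require root on a live NFS server; treat it as an
illustration of the procedure, not a tested script.]

```shell
#!/bin/sh
# Sketch of the "unexport, then unlock, then unmount" sequence discussed
# in this thread. "run" prints the command instead of executing it, so
# this is safe to invoke anywhere.
FS=/export/projects   # hypothetical exported filesystem

run() { printf '%s\n' "$*"; }

run "exportfs -u '*:$FS'"                          # 1. stop exporting it
run "echo $FS > /proc/fs/nfsd/unlock_filesystem"   # 2. revoke NLM (and, with this series, NFSv4) state
run "umount $FS"                                   # 3. the unmount should now succeed
```

Replacing `run` with direct execution turns the sketch into the actual
procedure, at the cost of the state loss discussed above.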
> On Jan 27, 2022, at 11:24 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>>
>>> On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
>>>
>>> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>>>> Hi Neil-
>>>>
>>>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>>>>>
>>>>> If a filesystem is exported to a client with NFSv4 and that client holds
>>>>> a file open, the filesystem cannot be unmounted without either stopping
>>>>> the NFS server completely, or blocking all access from that client
>>>>> (unexporting all filesystems) and waiting for the lease timeout.
>>>>>
>>>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
>>>>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>>>
>>>>> This series extends this functionality to NFSv4. With this, to unmount
>>>>> an exported filesystem it is sufficient to disable export of that
>>>>> filesystem, and then write the path to unlock_filesystem.
>>>>>
>>>>> I've focused mainly on NFSv4.1 and later for this. I haven't tested
>>>>> yet with NFSv4.0, which has different mechanisms for state management.
>>>>>
>>>>> If this series is seen as a generally acceptable approach, I'll look
>>>>> into the NFSv4.0 aspects properly and make sure it works there.
>>>>
>>>> I've browsed this series and need to think about:
>>>> - whether we want to enable administrative state revocation and
>>>> - whether NFSv4.0 can support that reasonably
>>>>
>>>> In particular, are there security consequences for revoking
>>>> state? What would applications see, and would that depend on
>>>> which minor version is in use? Are there data corruption risks
>>>> if this facility were to be misused?
>>>
>>> The expectation is that this would only be used after unexporting the
>>> filesystem.
In that case, the client wouldn't notice any difference
>>> from the act of writing to unlock_filesystem, as the entire filesystem
>>> would already be inaccessible.
>>>
>>> If we did unlock_filesystem a filesystem that was still exported, the
>>> client would see similar behaviour to a network partition that was of
>>> longer duration than the lease time. Locks would be lost.
>>>
>>>>
>>>> Also, Dai's courteous server work is something that potentially
>>>> conflicts with some of this, and I'd like to see that go in
>>>> first.
>>>
>>> I'm perfectly happy to wait for the courteous server work to land before
>>> pursuing this.
>>>
>>>>
>>>> Do you have specific user requests for this feature, and if so,
>>>> what are the particular usage scenarios?
>>>
>>> It's complicated....
>>>
>>> The customer has an HA config with multiple filesystem resources which
>>> they want to be able to migrate independently. I don't think we really
>>> support that,
>>
>> With NFSv4, the protocol has mechanisms to indicate to clients that
>> a shared filesystem has migrated, and to indicate that the clients'
>> state has been migrated too. Clients can reclaim their state if the
>> servers did not migrate that state with the data. It deals with the
>> edge cases to prevent clients from stealing open/lock state during
>> the migration.
>>
>> Unexporting doesn't seem like the right approach to that.
>
> No, but it's something that should work, and should allow the filesystem
> to be unmounted. You get to keep both halves.
>
>>
>>> but they seem to want to see if they can make it work (and
>>> it should be noted that I talk to an L2 support technician who talks to
>>> the customer representative, so I might not be getting the full story).
>>>
>>> Customer reported that even after unexporting a filesystem, they cannot
>>> then unmount it.
>>
>> My first thought is that probably clients are still pinning
>> resources on that shared filesystem.
I guess that's what the
>> unlock_ interface is supposed to deal with. But that suggests
>> to me that unexporting first is not as risk-free as you
>> describe above. I think applications would notice and there
>> would be edge cases where other clients might be able to
>> grab open/lock state before the original holders could
>> re-establish their lease.
>
> Unexporting isn't risk free. It just absorbs all the risks - none are
> left for unlock_filesystem to be blamed for.
>
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed. That isn't the use-case
> I particularly want to fix. I want to be able to unmount a filesystem
> without visiting all clients and killing off applications.

OK. The top level goal then is simply to provide another arrow
in the administrator's quiver to manage a large NFS server. It
brings NFSv4 closer to par with the NFSv3 toolset.

I say we have enough motivation for a full proof of concept. I
would like to see support for minor version 0 added, and a fuller
discussion of the consequences for clients and applications will
be needed (at least for the purpose of administrator documentation).

>>> Whether or not we think that independent filesystem
>>> resources are supportable, I do think that the customer should have a
>>> clear path for unmounting a filesystem without interfering with service
>>> provided from other filesystems.
>>
>> Maybe. I guess I put that in the "last resort" category
>> rather than "this is something safe that I want to do as
>> part of daily operation" category.
>
> Agree. Definitely "last resort".
>
>>
>>> Stopping nfsd would interfere with
>>> that service by forcing a grace-period on all filesystems.
>>
>> Yep. We have discussed implementing a per-filesystem
>> grace period in the past. That is probably a pre-requisite
>> to enabling filesystem migration.
>>
>>
>>> The RFC explicitly supports admin-revocation of state, and that would
>>> address this specific need, so it seemed completely appropriate to
>>> provide it.
>>
>> Well the RFC also provides for migrating filesystems without
>> stopping the NFS service. If that's truly the goal, then I
>> think we want to encourage that direction instead of ripping
>> out open and lock state.
>
> I suspect that virtual IPs and network namespaces are the better
> approach for migrating exported filesystems. It isn't clear to me that
> integrated migration support in NFS would add anything of value.
>
> But as I think I said to Bruce - seamless migration support is not my
> goal here. In the context where a site has multiple filesystems that
> are all NFS exported, there is a case for being able to forcibly
> unexport/unmount one filesystem without affecting the others. That is
> my aim here.

My initial impulse is to better understand what is preventing
the unexported filesystem from being unmounted. Better
observability there could potentially be of value.

> Thanks,
> NeilBrown
>
>
>>
>> Also, it's not clear to me that clients support administrative
>> revocation as broadly as we might like. The Linux NFS client
>> does have support for NFSv4 migration, though it's a bit
>> fallow these days.
>>
>>
>>> As an aside ... I'd like to be able to suggest that the customer use
>>> network namespaces for the different filesystem resources. Each could
>>> be in its own namespace and managed independently. However I don't
>>> think we have good admin infrastructure for that, do we?
>>
>> None that I'm aware of. SteveD is the best person to ask.
>>
>>
>>> I'd like to be able to say "set up these 2 or 3 config files and run
>>> systemctl start nfs-server@foo and the 'foo' network namespace will be
>>> created, configured, and have an nfs server running".
>>> Do we have anything approaching that? Even a HOWTO ??
>>
>> Interesting idea! But doesn't ring a bell.
>>
>> --
>> Chuck Lever

--
Chuck Lever
On Fri, Jan 28, 2022 at 01:46:45PM +0000, Chuck Lever III wrote: > My initial impulse is to better understand what is preventing > the unexported filesystem from being unmounted. Better > observability there could potentially be of value. In theory that information's in /proc/fs/nfsd/clients/. Somebody could write a tool to scan that for state referencing a given filesystem. --b.
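[Editor's note: a minimal sketch of the scanning tool Bruce describes. The
layout of /proc/fs/nfsd/clients/ and the entry format of each client's
`states` file are assumptions based on recent kernels (entries along the
lines of `- 0x...: { type: open, ..., superblock: "fd:10:13649", filename:
"f" }`); the parsing here is illustrative, not authoritative.]

```python
import glob
import re

def states_for_device(states_text, dev):
    """Return the lines of a states file whose superblock field is on
    device `dev` (a "major:minor" string such as "fd:10").  Assumes the
    informal one-entry-per-line format described above."""
    pat = re.compile(r'superblock:\s*"%s:' % re.escape(dev))
    return [line for line in states_text.splitlines() if pat.search(line)]

def scan_clients(dev, root="/proc/fs/nfsd/clients"):
    """Scan every client's states file under `root` and report which
    clients still hold state pinning the given device."""
    hits = {}
    for path in glob.glob(root + "/*/states"):
        try:
            with open(path) as f:
                matches = states_for_device(f.read(), dev)
        except OSError:
            continue  # a client may expire while we are scanning
        if matches:
            hits[path] = matches
    return hits
```

Run as root on the server, `scan_clients("fd:10")` would list the opens,
locks, and delegations keeping a filesystem on that device busy.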
On Fri, Jan 28, 2022 at 08:38:43AM -0500, J. Bruce Fields wrote:
> On Fri, Jan 28, 2022 at 03:14:51PM +1100, NeilBrown wrote:
> > On Fri, 28 Jan 2022, J. Bruce Fields wrote:
> > > I don't see how it's going to work. You've got clients that hold locks
> > > and opens on the unexported filesystem. So maybe you can use an NFSv4
> > > referral to point them to the new server. Are they going to try to
> > > issue reclaims to the new server? There's more to do before this works.
> >
> > As I hope I implied, I'm not at all sure that the specific problem that
> > the customer raised (cannot unmount a filesystem) is directly related to
> > the general solution that the customer is trying to create. Some
> > customers like us to hold their hand the whole way, others like to (feel
> > that they) have more control. In general I like to encourage
> > independence (but I have to consciously avoid trusting the results).
> >
> > We have an "unlock_filesystem" interface. I want it to work for NFSv4.
> > The HA config was background, not a complete motivation.
>
> I think people do occasionally need to just rip a filesystem out even if
> it means IO errors to applications. And we already do this for NFSv3.
> So, I'm inclined to support the idea.
>
> (I also wonder whether some of the code could be a useful step towards
> other functionality.)

For example, AFS-like read-only replica update: unmount a filesystem,
mount a new version in its place. "Reconnecting" locks after the open
would be difficult, I think, but opens should be doable? And in the
read-only case nobody should care about locks.

--b.

> > > > However I don't think we have good admin infrastructure for that, do
> > > > we?
> > > >
> > > > I'd like to be able to say "set up these 2 or 3 config files and run
> > > > systemctl start nfs-server@foo and the 'foo' network namespace will be
> > > > created, configured, and have an nfs server running". Do we have
> > > > anything approaching that? Even a HOWTO ??
> > >
> > > But I don't think we've got anything that simple yet?
> >
> > I guess I have some work to do....
>
> RHEL HA does support NFS failover using containers. I think it's a bit
> more complicated than you're looking for. Let me go dig that up....
>
> With a KVM VM and shared backend storage I think it's pretty easy: just
> shut down the VM on one machine and bring it up on another.
>
> --b.