Message ID | 20241023155846.63621-1-snitzer@kernel.org (mailing list archive)
---|---
State | New
Series | [v3] nfsd: disallow file locking and delegations for NFSv4 reexport
On Wed, Oct 23, 2024 at 5:58 PM Mike Snitzer <snitzer@kernel.org> wrote: > > We do not and cannot support file locking with NFS reexport over > NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport > server reboot cannot allow clients to recover locks because the source > NFS server has not rebooted, and so it is not in grace. Since the > source NFS server is not in grace, it cannot offer any guarantees that > the file won't have been changed between the locks getting lost and > any attempt to recover/reclaim them. The same applies to delegations > and any associated locks, so disallow them too. > > Add EXPORT_OP_NOLOCKSUPPORT and exportfs_lock_op_is_unsupported(), set > EXPORT_OP_NOLOCKSUPPORT in nfs_export_ops and check for it in > nfsd4_lock(), nfsd4_locku() and nfs4_set_delegation(). Clients are > not allowed to get file locks or delegations from a reexport server, > any attempts will fail with operation not supported. Are you aware that this virtually castrates NFSv4 reexport to a point that it is no longer usable in real life? If you really want this, then the only way forward is to disable and remove NFS reexport support completely. So this patch is absolutely a NO-GO, r- Thanks, Martin
> On Oct 29, 2024, at 9:57 AM, Martin Wege <martin.l.wege@gmail.com> wrote: > > On Wed, Oct 23, 2024 at 5:58 PM Mike Snitzer <snitzer@kernel.org> wrote: >> [...] > > Are you aware that this virtually castrates NFSv4 reexport to a point > that it is no longer usable in real life? "virtually castrates" is pretty nebulous. Please provide a detailed (and less hostile) account of an existing application that works today that no longer works when this patch is applied. Only then can we count this as a regression report. > If you really want this, > then the only way forward is to disable and remove NFS reexport > support completely. "No locking" is already the way NFSv3 re-export works. At the moment I cannot remember why we chose not to go with the "only local locking for re-export" design instead. -- Chuck Lever
Honestly, I don't know the use case for re-exporting another server's NFS export in the first place. Is this someone trying to share NFS through a firewall? I've seen people share remote NFS exports via Samba in an attempt to avoid paying their NAS vendor for SMB support. (I think it's "standard equipment" now, but 10+ years ago? Not always...) But re-exporting another server's NFS exports? Haven't seen anyone do that in a while. Using "only local locks for reexport" would mean that -- in cases where different clients access the underlying export directly and others access the re-export -- you would have 2 different sources of "truth" with respect to locks... I have supported multiple tools that used file or byte-range record locks in my career... And this could easily royally hork any shared databases... Regards, Brian Cowan ClearCase/VersionVault SWAT Mob: +1 (978) 907-2334 hcltechsw.com On Tue, Oct 29, 2024 at 10:11 AM Chuck Lever III <chuck.lever@oracle.com> wrote: > [...] > "No locking" is already the way NFSv3 re-export works. > > At the moment I cannot remember why we chose not to go with > the "only local locking for re-export" design instead. > > -- > Chuck Lever
> On Oct 29, 2024, at 11:54 AM, Brian Cowan <brian.cowan@hcl-software.com> wrote: > > Honestly, I don't know the use case for re-exporting another server's > NFS export in the first place. Is this someone trying to share NFS > through a firewall? [...] But re-exporting another server's NFS > exports? Haven't seen anyone do that in a while. The "re-export" case is where there is a central repository of data and branch offices that access that via a WAN. The re-export servers cache some of that data locally so that local clients have a fast persistent cache nearby. This is also effective in cases where a small cluster of clients want fast access to a pile of data that is significantly larger than their own caches. Say, HPC or animation, where the small cluster is working on a small portion of the full data set, which is stored on a central server. > Using "only local locks for reexport" would mean that -- in cases > where different clients access the underlying export directly and > others access the re-export -- you would have 2 different sources of > "truth" with respect to locks... I have supported multiple tools that > used file or byte-range record locks in my career... And this could > easily royally hork any shared databases... Yes, that's the downside of the local-lock-only approach. I had assumed that when locking was not available on the NFS server, the client could mount with "local_lock" (man nfs(5)). -- Chuck Lever
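For illustration, the client-side option Chuck refers to might be used like this (hostname and paths are invented; nfs(5) documents local_lock with the values none, flock, posix, and all):

    # Locks are then satisfied entirely inside the client's kernel and
    # never reach the re-export server:
    mount -t nfs -o vers=4.2,local_lock=all reexport.example.com:/export /mnt/data

    # This flock succeeds, but is invisible to every other client:
    flock -n /mnt/data/app.lock -c "echo got a client-local lock"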
On Tue, 29 Oct 2024 at 17:03, Chuck Lever III <chuck.lever@oracle.com> wrote: > [...] > The "re-export" case is where there is a central repository > of data and branch offices that access that via a WAN. The > re-export servers cache some of that data locally so that > local clients have a fast persistent cache nearby. > > This is also effective in cases where a small cluster of > clients want fast access to a pile of data that is > significantly larger than their own caches. Say, HPC or > animation, where the small cluster is working on a small > portion of the full data set, which is stored on a central > server. Another use case is "isolation", IT shares a filesystem to your department, and you need to re-export only a subset to another department or homeoffice. Part of such a scenario might also be policy related, e.g. IT shares you the full filesystem but will do NOTHING else, and any further compartmentalization must be done in your own department. This is the typical use case for gov NFS re-export. Of course no one needs the gov customers, so feel free to break locking. Ced
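As a concrete sketch of that kind of setup (hostnames, paths, and the fsid value are invented; exporting an NFS mount generally requires an explicit fsid=, since the underlying filesystem has no stable device number for the export code to use):

    # On the department's re-export server: mount what IT shares,
    mount -t nfs central.example.com:/it/share /srv/share
    # then expose only one subtree via /etc/exports:
    #   /srv/share/projects  homeoffice.example.com(rw,sec=sys,fsid=100)
    exportfs -ra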
> On Oct 30, 2024, at 10:55 AM, Cedric Blancher <cedric.blancher@gmail.com> wrote: > [...] > Another use case is "isolation", IT shares a filesystem to your > department, and you need to re-export only a subset to another > department or homeoffice. Part of such a scenario might also be policy > related, e.g. IT shares you the full filesystem but will do NOTHING > else, and any further compartmentalization must be done in your own > department. > This is the typical use case for gov NFS re-export. It's not clear to me from this description why re-export is the right tool for this job. Please explain why ACLs are not used in this case -- this is exactly what they are designed to do. And again, clients of the re-export server need to mount it with local_lock. Apps can still use locking in that case, but the locks are not visible to apps on other clients. Your description does not explain why local_lock is not sufficient or feasible. > Of course no one needs the gov customers, so feel free to break locking. Please have a look at the patch description again: lock recovery does not work now, and cannot work without changes to the protocol. Isn't that a problem for such workloads? In other words, locking is already broken on NFSv4 re-export, but the current situation can lead to silent data corruption. -- Chuck Lever
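For reference, the kind of restriction Chuck is pointing at would be expressed directly on the origin export with ACL tooling. A minimal sketch using nfs4-acl-tools (the principal, group flag, and path are examples only):

    # Grant one group read access to the subtree, run against a mount
    # of the origin server:
    nfs4_setfacl -a "A:g:deptB@example.com:rx" /mnt/share/projects
    # Inspect the resulting ACL:
    nfs4_getfacl /mnt/share/projects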
On Wed, 30 Oct 2024 at 17:15, Chuck Lever III <chuck.lever@oracle.com> wrote: > [...] > It's not clear to me from this description why re-export is > the right tool for this job. Please explain why ACLs are not > used in this case -- this is exactly what they are designed > to do. 1. IT departments want better/harder/immutable isolation than ACLs 2. Linux NFSv4 only implements POSIX draft ACLs, not full Windows or NFSv4 ACLs. So there is no proper way to prevent ACL editing, rendering them useless in this case. There is a reason why POSIX draft ACLs were abandoned - they are not fine-grained enough for real world usage outside the Linux universe. As soon as interoperability is required these things just bite you HARD. Also, just running more nfsd in parallel on the origin NFS server is not a better option - remember the debate of non-2049 ports for nfsd? > And again, clients of the re-export server need to mount it > with local_lock. Apps can still use locking in that case, > but the locks are not visible to apps on other clients. Your > description does not explain why local_lock is not > sufficient or feasible. Because: - it breaks applications running on more than one machine? - it breaks use cases like NFS--->SMB bridges, because without locking the typical Windows .NET application will refuse to write to a file - it breaks even SIMPLE things like Microsoft Excel Of course the happy echo "hello Linux-NFSv4-only world" >/nfs/file will always work. > Please have a look at the patch description again: lock > recovery does not work now, and cannot work without > changes to the protocol. Isn't that a problem for such > workloads? Nope, because of UPS (Uninterruptible power supply). Either everything is UP, or *everything* is DOWN. Boolean. > In other words, locking is already broken on NFSv4 re-export, > but the current situation can lead to silent data corruption. Would storing the locking information into persistent files help, i.e. files which persist across nfsd server restarts? Ced
> On Oct 30, 2024, at 12:37 PM, Cedric Blancher <cedric.blancher@gmail.com> wrote: > [...] > 1. IT departments want better/harder/immutable isolation than ACLs So you want MAC, and the storage administrator won't set that up for you on the NFS server. NFS doesn't do MAC very well if at all. > 2. Linux NFSv4 only implements POSIX draft ACLs, not full Windows or > NFSv4 ACLs. So there is no proper way to prevent ACL editing, > rendering them useless in this case. Er. Linux NFSv4 stores the ACLs as POSIX draft, because that's what Linux file systems can support. NFSD, via NFSv4, makes these appear like NFSv4 ACLs. But I think I understand. > There is a reason why POSIX draft ACLs were abandoned - they are not > fine-grained enough for real world usage outside the Linux universe. > As soon as interoperability is required these things just bite you > HARD. You, of course, have the ability to run some other NFS server implementation that meets your security requirements more fully. > Also, just running more nfsd in parallel on the origin NFS server is > not a better option - remember the debate of non-2049 ports for nfsd? I'm not sure where this is going. Do you mean the storage administrator would provide NFS service on alternate ports that each expose a separate set of exports? So the only option Linux has there is using containers or libvirt. We've continued to privately discuss the ability for NFSD to support a separate set of exports on alternate ports, but it doesn't look feasible. The export management infrastructure and user space tools would need to be rewritten. > Because: > - it breaks applications running on more than one machine? Yes, obviously. Your description needs to mention that is a requirement, since there are a lot of applications that don't need locking across multiple clients. > - it breaks use cases like NFS--->SMB bridges, because without locking > the typical Windows .NET application will refuse to write to a file That's a quagmire, and I don't think we can guarantee that will work. Linux NFS doesn't support "deny" modes, for example. > - it breaks even SIMPLE things like Microsoft Excel If you need SMB semantics, why not use Samba? The upshot appears to be that this usage is a stack of mismatched storage protocols that work around a bunch of local IT bureaucracy. I'm trying to be sympathetic, but it's hard to say that /anyone/ would fully support this. > Nope, because of UPS (Uninterruptible power supply). Either everything > is UP, or *everything* is DOWN. Boolean. Power outages are not the only reason lock recovery might be necessary. Network partitions, re-export server upgrades or reboots, etc. So I'm not hearing anything to suggest this kind of workload is not impacted by the current lock recovery problems. > Would storing the locking information into persistent files help, i.e. > files which persist across nfsd server restarts? Yes, but it would make things horribly slow. And of course there would be a lot of coding involved to get this to work. What if we added an export option to allow the re-export server to continue handling locking, but default it to off (which is the safer option) ? -- Chuck Lever
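If NFSD went that way, the administrator-visible knob might look roughly like this (the option name "reexport_locks" is purely illustrative -- no such exports(5) option exists today):

    # /etc/exports on the re-export server, hypothetical syntax:
    /srv/reexport  *(rw,fsid=1,reexport_locks)  # admin opts back in to locking
    /srv/other     *(rw,fsid=2)                 # default: lock requests refused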
On Wed, Oct 30, 2024 at 10:08 AM Chuck Lever III <chuck.lever@oracle.com> wrote: > [...] > > Would storing the locking information into persistent files help, i.e. > > files which persist across nfsd server restarts? > > Yes, but it would make things horribly slow. > > And of course there would be a lot of coding involved > to get this to work. I suspect this suggestion might be a fair amount of code too (and I am certainly not volunteering to write it), but I will mention it. Another possibility would be to have the re-exporting NFSv4 server just pass locking ops through to the backend NFSv4 server. - It is roughly the inverse of what I did when I constructed a flex files pNFS server. The MDS did the locking ops and any I/O ops. were passed through to the DS(s). Of course, it was hoped the client would use layouts and bypass the MDS for I/O. rick > What if we added an export option to allow the re-export > server to continue handling locking, but default it to > off (which is the safer option) ? > > -- > Chuck Lever
On Wed, 2024-10-30 at 15:48 -0700, Rick Macklem wrote: > [...] > Another possibility would be to have the re-exporting NFSv4 server > just pass locking ops through to the backend NFSv4 server. > - It is roughly the inverse of what I did when I constructed a flex files > pNFS server. The MDS did the locking ops and any I/O ops. were > passed through to the DS(s). Of course, it was hoped the client > would use layouts and bypass the MDS for I/O. How do you handle reclaim in this case? IOW, suppose the backend server crashes but the reexporter stays up. How do you coordinate the grace periods between the two so that the client can reclaim its lock on the backend? > What if we added an export option to allow the re-export > server to continue handling locking, but default it to > off (which is the safer option) ? > > -- > Chuck Lever
On Thu, Oct 31, 2024 at 4:43 AM Jeff Layton <jlayton@kernel.org> wrote: > [...] > How do you handle reclaim in this case? IOW, suppose the backend server > crashes but the reexporter stays up. How do you coordinate the grace > periods between the two so that the client can reclaim its lock on the > backend? Well, I'm not saying it is trivial. I think you would need to pass through all state operations: ExchangeID, Open,...,Lock,LockU - The tricky bit would be sessions, since the re-exporter would need to maintain sessions. --> Maybe the re-exporter would need to save the ClientID (from the backend nfsd) in non-volatile storage. When the backend server crashes/reboots, the re-exporter would see this as a failure (usually NFS4ERR_BAD_SESSION) and would pass that to the client. The only recovery RPC that would not be passed through would be Create_session, although the re-exporter would do a Create_session for connection(s) it has against the backend server. I think something like that would work for the backend crash/recovery. A crash of the re-exporter could be more of a problem, I think. It would need to have the ClientID (stored in non-volatile storage) so that it could do a Create_session with it against the backend server. - It would also depend on the backend server being courteous, so that a re-exporter crash/reboot that takes a while such that the lease expires doesn't result in a loss of state on the backend server. Anyhow, something like that. Like I said, I'm not volunteering to code it, rick > > What if we added an export option to allow the re-export > server to continue handling locking, but default it to > off (which is the safer option) ? > > -- > Chuck Lever -- Jeff Layton <jlayton@kernel.org>
> On Oct 31, 2024, at 10:48 AM, Rick Macklem <rick.macklem@gmail.com> wrote: > [...] > Well, I'm not saying it is trivial. > I think you would need to pass through all state operations: > ExchangeID, Open,...,Lock,LockU > - The tricky bit would be sessions, since the re-exporter would need to > maintain sessions. > --> Maybe the re-exporter would need to save the ClientID (from the > backend nfsd) in non-volatile storage. > > When the backend server crashes/reboots, the re-exporter would see > this as a failure (usually NFS4ERR_BAD_SESSION) and would pass > that to the client. > The only recovery RPC that would not be passed through would be > Create_session, although the re-exporter would do a Create_session > for connection(s) it has against the backend server. > I think something like that would work for the backend crash/recovery. The backend server would be in grace, and the re-exporter would be able to recover its lock state on the backend server using normal state recovery. I think the re-exporter would not need to expose the backend server's crash to its own clients. > A crash of the re-exporter could be more of a problem, I think. > It would need to have the ClientID (stored in non-volatile storage) > so that it could do a Create_session with it against the backend server. > - It would also depend on the backend server being courteous, so that > a re-exporter crash/reboot that takes a while such that the lease expires > doesn't result in a loss of state on the backend server. The backend server would not be in grace after the re-export server crashes. There's no way for the re-export server's NFS client to recover its lock state from the backend server. The re-export server recovers by re-learning lock state from its own clients. The question is how the re-export server could re-initialize this state in its local client of the backend server. -- Chuck Lever
On Wed, Oct 23, 2024 at 11:58:46AM -0400, Mike Snitzer wrote: > We do not and cannot support file locking with NFS reexport over > NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport > server reboot cannot allow clients to recover locks because the source > NFS server has not rebooted, and so it is not in grace. Since the > source NFS server is not in grace, it cannot offer any guarantees that > the file won't have been changed between the locks getting lost and > any attempt to recover/reclaim them. The same applies to delegations > and any associated locks, so disallow them too. > > Add EXPORT_OP_NOLOCKSUPPORT and exportfs_lock_op_is_unsupported(), set > EXPORT_OP_NOLOCKSUPPORT in nfs_export_ops and check for it in > nfsd4_lock(), nfsd4_locku() and nfs4_set_delegation(). Clients are > not allowed to get file locks or delegations from a reexport server, > any attempts will fail with operation not supported. > > Update the "Reboot recovery" section accordingly in > Documentation/filesystems/nfs/reexport.rst > > Signed-off-by: Mike Snitzer <snitzer@kernel.org> > --- > Documentation/filesystems/nfs/reexport.rst | 10 +++++++--- > fs/nfs/export.c | 3 ++- > fs/nfsd/nfs4state.c | 20 ++++++++++++++++++++ > include/linux/exportfs.h | 14 ++++++++++++++ > 4 files changed, 43 insertions(+), 4 deletions(-) > > v3: refine the patch header and reexport.rst to be clear that both > locks and delegations will fail against an NFS reexport server. > > diff --git a/Documentation/filesystems/nfs/reexport.rst b/Documentation/filesystems/nfs/reexport.rst > index ff9ae4a46530..044be965d75e 100644 > --- a/Documentation/filesystems/nfs/reexport.rst > +++ b/Documentation/filesystems/nfs/reexport.rst > @@ -26,9 +26,13 @@ Reboot recovery > --------------- > > The NFS protocol's normal reboot recovery mechanisms don't work for the > -case when the reexport server reboots. Clients will lose any locks > -they held before the reboot, and further IO will result in errors. > -Closing and reopening files should clear the errors. > +case when the reexport server reboots because the source server has not > +rebooted, and so it is not in grace. Since the source server is not in > +grace, it cannot offer any guarantees that the file won't have been > +changed between the locks getting lost and any attempt to recover them. > +The same applies to delegations and any associated locks. Clients are > +not allowed to get file locks or delegations from a reexport server, any > +attempts will fail with operation not supported. > > Filehandle limits > ----------------- > diff --git a/fs/nfs/export.c b/fs/nfs/export.c > index be686b8e0c54..2f001a0273bc 100644 > --- a/fs/nfs/export.c > +++ b/fs/nfs/export.c > @@ -154,5 +154,6 @@ const struct export_operations nfs_export_ops = { > EXPORT_OP_CLOSE_BEFORE_UNLINK | > EXPORT_OP_REMOTE_FS | > EXPORT_OP_NOATOMIC_ATTR | > - EXPORT_OP_FLUSH_ON_CLOSE, > + EXPORT_OP_FLUSH_ON_CLOSE | > + EXPORT_OP_NOLOCKSUPPORT, > }; > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index ac1859c7cc9d..63297ea82e4e 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -5813,6 +5813,15 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, > if (!nf) > return ERR_PTR(-EAGAIN); > > + /* > + * File delegations and associated locks cannot be recovered if > + * export is from NFS proxy server. 
> + */ > + if (exportfs_lock_op_is_unsupported(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) { > + nfsd_file_put(nf); > + return ERR_PTR(-EOPNOTSUPP); > + } > + > spin_lock(&state_lock); > spin_lock(&fp->fi_lock); > if (nfs4_delegation_exists(clp, fp)) > @@ -7917,6 +7926,11 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > } > sb = cstate->current_fh.fh_dentry->d_sb; > > + if (exportfs_lock_op_is_unsupported(sb->s_export_op)) { > + status = nfserr_notsupp; > + goto out; > + } > + > if (lock->lk_is_new) { > if (nfsd4_has_session(cstate)) > /* See rfc 5661 18.10.3: given clientid is ignored: */ > @@ -8266,6 +8280,12 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > status = nfserr_lock_range; > goto put_stateid; > } > + > + if (exportfs_lock_op_is_unsupported(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) { > + status = nfserr_notsupp; > + goto put_file; > + } > + > file_lock = locks_alloc_lock(); > if (!file_lock) { > dprintk("NFSD: %s: unable to allocate lock!\n", __func__); > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h > index 893a1d21dc1c..106fd590d323 100644 > --- a/include/linux/exportfs.h > +++ b/include/linux/exportfs.h > @@ -247,6 +247,7 @@ struct export_operations { > */ > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */ > #define EXPORT_OP_ASYNC_LOCK (0x40) /* fs can do async lock request */ > +#define EXPORT_OP_NOLOCKSUPPORT (0x80) /* no file locking support */ > unsigned long flags; > }; > > @@ -263,6 +264,19 @@ exportfs_lock_op_is_async(const struct export_operations *export_ops) > return export_ops->flags & EXPORT_OP_ASYNC_LOCK; > } > > +/** > + * exportfs_lock_op_is_unsupported() - export does not support file locking > + * @export_ops: the nfs export operations to check > + * > + * Returns true if the nfs export_operations structure has > + * EXPORT_OP_NOLOCKSUPPORT in their flags set > + */ > +static inline bool > +exportfs_lock_op_is_unsupported(const struct export_operations *export_ops) > +{ > + return export_ops->flags & EXPORT_OP_NOLOCKSUPPORT; > +} > + > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, > int *max_len, struct inode *parent, > int flags); > -- > 2.44.0 > There seems to be some controversy about this approach. Also I think it would be nicer all around if we followed the usual process for changes that introduce possible behavior regressions: - add the new behavior, make it optional, default old behavior - wait a few releases - change the default to new behavior Lastly, there haven't been any user complaints about the current situation of no lock recovery in the re-export case. Jeff and I discussed this, and we plan to drop this one for 6.13 but let the conversation continue. Mike, no action needed on your part for the moment, but please stay tuned! IMO having an export option (along the lines of "async/sync") that is documented in a man page is going to be a better plan. But if we find a way to deal with this situation without a new administrative control, that would be even better.
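To make the staged-rollout idea above concrete, here is a minimal sketch of how the new check could be gated on a per-export option, in the style of the posted patch. The NFSEXP_NOREEXPORTLOCKS flag name and the helper are invented for illustration only; neither exists in the posted patch or in today's nfsd:

/*
 * Hypothetical sketch: keep the old behavior by default and let the
 * administrator opt in to the nfserr_notsupp behavior per export.
 * NFSEXP_NOREEXPORTLOCKS is an invented flag name.
 */
static bool nfsd_reject_reexport_locks(struct svc_export *exp,
				       struct super_block *sb)
{
	/* Default: old behavior, LOCK/LOCKU/delegation proceed as before. */
	if (!(exp->ex_flags & NFSEXP_NOREEXPORTLOCKS))
		return false;
	return exportfs_lock_op_is_unsupported(sb->s_export_op);
}

The unconditional check in nfsd4_lock() would then call this helper with cstate->current_fh.fh_export and the superblock, and the default could be flipped a few releases later, again purely as a sketch.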
On Thu, Oct 31, 2024 at 8:01 AM Chuck Lever III <chuck.lever@oracle.com> wrote: > > > > > On Oct 31, 2024, at 10:48 AM, Rick Macklem <rick.macklem@gmail.com> wrote: > > > > On Thu, Oct 31, 2024 at 4:43 AM Jeff Layton <jlayton@kernel.org> wrote: > >> > >> On Wed, 2024-10-30 at 15:48 -0700, Rick Macklem wrote: > >>> On Wed, Oct 30, 2024 at 10:08 AM Chuck Lever III <chuck.lever@oracle.com> wrote: > >>>> > >>>> > >>>>> On Oct 30, 2024, at 12:37 PM, Cedric Blancher <cedric.blancher@gmail.com> wrote: > >>>>> > >>>>> On Wed, 30 Oct 2024 at 17:15, Chuck Lever III <chuck.lever@oracle.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On Oct 30, 2024, at 10:55 AM, Cedric Blancher <cedric.blancher@gmail.com> wrote: > >>>>>>> > >>>>>>> On Tue, 29 Oct 2024 at 17:03, Chuck Lever III <chuck.lever@oracle.com> wrote: > >>>>>>>> > >>>>>>>>> On Oct 29, 2024, at 11:54 AM, Brian Cowan <brian.cowan@hcl-software.com> wrote: > >>>>>>>>> > >>>>>>>>> Honestly, I don't know the usecase for re-exporting another server's > >>>>>>>>> NFS export in the first place. Is this someone trying to share NFS > >>>>>>>>> through a firewall? I've seen people share remote NFS exports via > >>>>>>>>> Samba in an attempt to avoid paying their NAS vendor for SMB support. > >>>>>>>>> (I think it's "standard equipment" now, but 10+ years ago? Not > >>>>>>>>> always...) But re-exporting another server's NFS exports? Haven't seen > >>>>>>>>> anyone do that in a while. > >>>>>>>> > >>>>>>>> The "re-export" case is where there is a central repository > >>>>>>>> of data and branch offices that access that via a WAN. The > >>>>>>>> re-export servers cache some of that data locally so that > >>>>>>>> local clients have a fast persistent cache nearby. > >>>>>>>> > >>>>>>>> This is also effective in cases where a small cluster of > >>>>>>>> clients want fast access to a pile of data that is > >>>>>>>> significantly larger than their own caches. Say, HPC or > >>>>>>>> animation, where the small cluster is working on a small > >>>>>>>> portion of the full data set, which is stored on a central > >>>>>>>> server. > >>>>>>>> > >>>>>>> Another use case is "isolation", IT shares a filesystem with your > >>>>>>> department, and you need to re-export only a subset to another > >>>>>>> department or home office. Part of such a scenario might also be policy > >>>>>>> related, e.g. IT shares the full filesystem with you but will do NOTHING > >>>>>>> else, and any further compartmentalization must be done in your own > >>>>>>> department. > >>>>>>> This is the typical use case for gov NFS re-export. > >>>>>> > >>>>>> It's not clear to me from this description why re-export is > >>>>>> the right tool for this job. Please explain why ACLs are not > >>>>>> used in this case -- this is exactly what they are designed > >>>>>> to do. > >>>>> > >>>>> 1. IT departments want better/harder/immutable isolation than ACLs > >>>> > >>>> So you want MAC, and the storage administrator won't set > >>>> that up for you on the NFS server. NFS doesn't do MAC > >>>> very well if at all. > >>>> > >>>> > >>>>> 2. Linux NFSv4 only implements POSIX draft ACLs, not full Windows or > >>>>> NFSv4 ACLs. So there is no proper way to prevent ACL editing, > >>>>> rendering them useless in this case. > >>>> > >>>> Er. 
Linux NFSv4 stores the ACLs as POSIX draft, because > >>>> that's what Linux file systems can support. NFSD, via > >>>> NFSv4, makes these appear like NFSv4 ACLs. > >>>> > >>>> But I think I understand. > >>>> > >>>> > >>>>> There is a reason why POSIX draft ACLs were abandoned - they are not > >>>>> fine-grained enough for real world usage outside the Linux universe. > >>>>> As soon as interoperability is required these things just bite you > >>>>> HARD. > >>>> > >>>> You, of course, have the ability to run some other NFS > >>>> server implementation that meets your security requirements > >>>> more fully. > >>>> > >>>> > >>>>> Also, just running more nfsd in parallel on the origin NFS server is > >>>>> not a better option - remember the debate of non-2049 ports for nfsd? > >>>> > >>>> I'm not sure where this is going. Do you mean the storage > >>>> administrator would provide NFS service on alternate > >>>> ports that each expose a separate set of exports? > >>>> > >>>> So the only option Linux has there is using containers or > >>>> libvirt. We've continued to privately discuss the ability > >>>> for NFSD to support a separate set of exports on alternate > >>>> ports, but it doesn't look feasible. The export management > >>>> infrastructure and user space tools would need to be > >>>> rewritten. > >>>> > >>>> > >>>>>> And again, clients of the re-export server need to mount it > >>>>>> with local_lock. Apps can still use locking in that case, > >>>>>> but the locks are not visible to apps on other clients. Your > >>>>>> description does not explain why local_lock is not > >>>>>> sufficient or feasible. > >>>>> > >>>>> Because: > >>>>> - it breaks applications running on more than one machine? > >>>> > >>>> Yes, obviously. Your description needs to mention that is > >>>> a requirement, since there are a lot of applications that > >>>> don't need locking across multiple clients. > >>>> > >>>> > >>>>> - it breaks use cases like NFS--->SMB bridges, because without locking > >>>>> the typical Windows .NET application will refuse to write to a file > >>>> > >>>> That's a quagmire, and I don't think we can guarantee that > >>>> will work. Linux NFS doesn't support "deny" modes, for > >>>> example. > >>>> > >>>> > >>>>> - it breaks even SIMPLE things like Microsoft Excel > >>>> > >>>> If you need SMB semantics, why not use Samba? > >>>> > >>>> The upshot appears to be that this usage is a stack of > >>>> mismatched storage protocols that work around a bunch of > >>>> local IT bureaucracy. I'm trying to be sympathetic, but > >>>> it's hard to say that /anyone/ would fully support this. > >>>> > >>>> > >>>>> Of course the happy echo "hello Linux-NFSv4-only world" >/nfs/file > >>>>> will always work. > >>>>> > >>>>>>> Of course no one needs the gov customers, so feel free to break locking. > >>>>>> > >>>>>> > >>>>>> Please have a look at the patch description again: lock > >>>>>> recovery does not work now, and cannot work without > >>>>>> changes to the protocol. Isn't that a problem for such > >>>>>> workloads? > >>>>> > >>>>> Nope, because of UPS (Uninterruptible power supply). Either everything > >>>>> is UP, or *everything* is DOWN. Boolean. > >>>> > >>>> Power outages are not the only reason lock recovery might > >>>> be necessary. Network partitions, re-export server > >>>> upgrades or reboots, etc. So I'm not hearing anything > >>>> to suggest this kind of workload is not impacted by > >>>> the current lock recovery problems. 
> >>>> > >>>> > >>>>>> In other words, locking is already broken on NFSv4 re-export, > >>>>>> but the current situation can lead to silent data corruption. > >>>>> > >>>>> Would storing the locking information into persistent files help, ie. > >>>>> files which persist across nfsd server restarts? > >>>> > >>>> Yes, but it would make things horribly slow. > >>>> > >>>> And of course there would be a lot of coding involved > >>>> to get this to work. > >>> I suspect this suggestion might be a fair amount of code too > >>> (and I am certainly not volunteering to write it), but I will mention it. > >>> > >>> Another possibility would be to have the re-exporting NFSv4 server > >>> just pass locking ops through to the backend NFSv4 server. > >>> - It is roughly the inverse of what I did when I constructed a flex files > >>> pNFS server. The MDS did the locking ops and any I/O ops were > >>> passed through to the DS(s). Of course, it was hoped the client > >>> would use layouts and bypass the MDS for I/O. > >>> > >> > >> How do you handle reclaim in this case? IOW, suppose the backend server > >> crashes but the reexporter stays up. How do you coordinate the grace > >> periods between the two so that the client can reclaim its lock on the > >> backend? > > Well, I'm not saying it is trivial. > > I think you would need to pass through all state operations: > > ExchangeID, Open,...,Lock,LockU > > - The tricky bit would be sessions, since the re-exporter would need to > > maintain sessions. > > --> Maybe the re-exporter would need to save the ClientID (from the > > backend nfsd) in non-volatile storage. > > > > When the backend server crashes/reboots, the re-exporter would see > > this as a failure (usually NFS4ERR_BAD_SESSION) and would pass > > that to the client. > > The only recovery RPC that would not be passed through would be > > Create_session, although the re-exporter would do a Create_session > > for connection(s) it has against the backend server. > > I think something like that would work for the backend crash/recovery. > > The backend server would be in grace, and the re-exporter > would be able to recover its lock state on the backend > server using normal state recovery. I think the re-exporter would not need to expose the backend server's > crash to its own clients. For what I suggested, the re-exporting server does not hold any state, except for sessions. (It essentially becomes like an NFSv3 stateless server.) It would expose the backend server's crash/reboot to the client, which would do the recovery. (By pass through I mean "just repackage the arguments and do the operation against the backend server" instead of doing the operations in the re-exporter server. For example, it would have a separate "struct nfsd4_operations" array with different functions for open, lock, ...) Sessions are the weird case and the re-exporter would have to maintain session(s) for the client. On backend server reboot, the re-exporter would see NFS4ERR_BAD_SESSION. It would then nuke the session(s) for the client, so that it sees NFS4ERR_BAD_SESSION as well and starts state recovery. Those state recovery ops would be passed through to the backend server. When the re-exporter reboots, it only needs to recover sessions. It would reply NFS4ERR_BAD_SESSION to the client. The client would do a Create_session using the backend server's clientID. 
At that point, the re-exporter would know the clientID, which it could use to Create_session against the backend server and then it could create the session for the client side, assuming the Create_session on the backend server worked ok. rick > > > > A crash of the re-exporter could be more of a problem, I think. > > It would need to have the ClientID (stored in non-volatile storage) > > so that it could do a Create_session with it against the backend server. > > - It would also depend on the backend server being courteous, so that > > a re-exporter crash/reboot that takes a while such that the lease expires > > doesn't result in a loss of state on the backend server. > > The backend server would not be in grace after the re-export > server crashes. There's no way for the re-export server's > NFS client to recover its lock state from the backend server. > > The re-export server recovers by re-learning lock state from > its own clients. The question is how the re-export server > could re-initialize this state in its local client of the > backend server. > > > -- > Chuck Lever > >
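Rick's two recovery cases, laid end to end, can be summarized in a small illustrative sketch. This is user-space pseudocode of the proposal, not kernel code, and every name in it is invented:

#include <stdio.h>

enum crash_case { BACKEND_REBOOTED, REEXPORTER_REBOOTED };

static void recover(enum crash_case c)
{
	switch (c) {
	case BACKEND_REBOOTED:
		/* The re-exporter sees NFS4ERR_BAD_SESSION from the backend
		 * and nukes the client-facing session(s), so the client sees
		 * NFS4ERR_BAD_SESSION too and starts state recovery. */
		puts("re-exporter: destroy client session(s)");
		/* All recovery ops except CREATE_SESSION are repackaged and
		 * passed through; the backend is in grace, so reclaims work. */
		puts("client: reclaim opens/locks -> passed through to backend");
		break;
	case REEXPORTER_REBOOTED:
		/* Only sessions were lost.  The client's CREATE_SESSION lets
		 * the re-exporter re-create its own session against the
		 * backend, using the clientID it saved in non-volatile
		 * storage, before it answers the client. */
		puts("re-exporter: CREATE_SESSION against backend (saved clientID)");
		puts("re-exporter: then create session for the client");
		break;
	}
}

int main(void)
{
	recover(BACKEND_REBOOTED);
	recover(REEXPORTER_REBOOTED);
	return 0;
}

The second case still depends on the backend being courteous, or the re-exporter coming back before the lease expires, as noted above.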
On Thu, Oct 31, 2024 at 11:14:51AM -0400, Chuck Lever wrote: > On Wed, Oct 23, 2024 at 11:58:46AM -0400, Mike Snitzer wrote: > > We do not and cannot support file locking with NFS reexport over > > NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport [ ... patch snipped ... ] > > diff --git a/Documentation/filesystems/nfs/reexport.rst b/Documentation/filesystems/nfs/reexport.rst > > index ff9ae4a46530..044be965d75e 100644 > > --- a/Documentation/filesystems/nfs/reexport.rst > > +++ b/Documentation/filesystems/nfs/reexport.rst > > @@ -26,9 +26,13 @@ Reboot recovery > > --------------- > > > > The NFS protocol's normal reboot recovery mechanisms don't work for the > > -case when the reexport server reboots. Clients will lose any locks > > -they held before the reboot, and further IO will result in errors. > > -Closing and reopening files should clear the errors. > > +case when the reexport server reboots because the source server has not > > +rebooted, and so it is not in grace. Since the source server is not in > > +grace, it cannot offer any guarantees that the file won't have been > > +changed between the locks getting lost and any attempt to recover them. > > +The same applies to delegations and any associated locks. Clients are > > +not allowed to get file locks or delegations from a reexport server, any > > +attempts will fail with operation not supported. > > > > Filehandle limits > > ----------------- Note for Mike: Last sentence "Clients are not allowed to get ... delegations from a reexport server" -- IIUC it's up to the re-export server to not hand out delegations to its clients. Still, it's important to note that NFSv4 delegation would not be available for re-exports. See below for more: I'd like this paragraph to continue to discuss the issue of OPEN and I/O behavior when the re-export server restarts. The patch seems to redact that bit of detail. Following is general discussion: > There seems to be some controversy about this approach. > > Also I think it would be nicer all around if we followed the usual > process for changes that introduce possible behavior regressions: > > - add the new behavior, make it optional, default old behavior > - wait a few releases > - change the default to new behavior > > Lastly, there haven't been any user complaints about the current > situation of no lock recovery in the re-export case. > > Jeff and I discussed this, and we plan to drop this one for 6.13 but > let the conversation continue. Mike, no action needed on your part > for the moment, but please stay tuned! > > IMO having an export option (along the lines of "async/sync") that > is documented in a man page is going to be a better plan. But if we > find a way to deal with this situation without a new administrative > control, that would be even better. Proposed solutions so far: - Disable NFS locking entirely on NFS re-export - Implement full state pass-through for re-export Some history of the NFSD design and the re-export issue is provided here: http://wiki.linux-nfs.org/wiki/index.php/NFS_re-export#reboot_recovery Certain usage scenarios require that lock state be globally visible, so disabling NFS locking on re-export mounts will need to be considered carefully. Assuming that NFSv4 LOCK operations are proliferated to the back-end server in today's NFSD, does it make sense to avoid code changes at the moment, but more carefully document the configuration options and their risks? 
+++ In all following configurations, no state recovery occurs when the re-export server restarts, as explained in Documentation/filesystems/nfs/reexport.rst. Mount options on the re-export server and clients: * All default: open and lock state is proliferated to the back-end server and is visible to all NFS clients. * local_lock=all on the re-export server's mount of the back-end server: clients of that server all see the same set of locks, but these locks are not visible to the back-end server or any of its clients. Open state is visible everywhere. * local_lock=all on the clients' NFS mounts of the re-export server: applications on NFS clients do not see locks set by applications on any other NFS clients. Open state is visible everywhere. When an NFS client of the re-export server OPENs a file, currently that creates OPEN state on the re-export server, and I assume also on the back-end server. That state cannot be recovered if the re-export server restarts, but it also cannot be blocked by a mount option. Likewise, I assume the back-end server can hand out delegations to the re-export server. If the re-export server restarts, how does it recover those delegations? The re-export server could disable delegation by blocking off its callback service, but should it? What, if anything, is being done to further develop and regularly test NFS re-export in upstream kernels? The reexport.rst file: This still reads more like design notes than administrative documentation. IMHO it should instead have a more detailed description and disclaimer regarding what kind of manual recovery is needed after a re-export server restart. That seems like important information for administrators who think they might want to deploy this solution. Maybe Documentation/ isn't the right place for administrative documentation? It might be prudent to (temporarily) label NFS re-export as experimental use only, given its incompleteness and the long list of caveats.
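For anyone experimenting with the configurations listed above, a rough user-space probe can show where lock state is actually shared between two mounts of the same file. The paths are placeholders and this is only a sketch:

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Take a write lock through one mount in a child process, then ask,
 * via a second mount of the same file, whether that lock is visible. */
int main(int argc, char **argv)
{
	int pfd[2];
	char c;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <file-via-mount-A> <same-file-via-mount-B>\n",
			argv[0]);
		return 1;
	}
	if (pipe(pfd) < 0)
		return 1;

	pid_t pid = fork();
	if (pid == 0) {			/* child: lock via mount A */
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		int a = open(argv[1], O_RDWR);

		close(pfd[0]);
		if (a < 0 || fcntl(a, F_SETLK, &fl) < 0) {
			perror("child: F_SETLK");
			_exit(1);	/* closes pfd[1], unblocking parent */
		}
		write(pfd[1], "k", 1);	/* signal: lock is held */
		pause();		/* hold the lock until killed */
	}

	close(pfd[1]);
	if (read(pfd[0], &c, 1) != 1) {
		fprintf(stderr, "child could not take the lock\n");
		return 1;
	}

	struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int b = open(argv[2], O_RDWR);
	if (b < 0 || fcntl(b, F_GETLK, &probe) < 0)
		perror("parent: F_GETLK");
	else
		printf("child's lock is %svisible via the second mount\n",
		       probe.l_type == F_UNLCK ? "NOT " : "");
	kill(pid, SIGTERM);
	return 0;
}

Running it once with one path through the back-end mount and one through the re-export, and again with two client mounts of the re-export, maps directly onto the configurations above.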
On Mon, 18 Nov 2024 at 18:57, Chuck Lever <chuck.lever@oracle.com> wrote: > > On Thu, Oct 31, 2024 at 11:14:51AM -0400, Chuck Lever wrote: > > On Wed, Oct 23, 2024 at 11:58:46AM -0400, Mike Snitzer wrote: > > > We do not and cannot support file locking with NFS reexport over > > > NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport > > [ ... patch snipped ... ] > > > > diff --git a/Documentation/filesystems/nfs/reexport.rst b/Documentation/filesystems/nfs/reexport.rst > > > index ff9ae4a46530..044be965d75e 100644 > > > --- a/Documentation/filesystems/nfs/reexport.rst > > > +++ b/Documentation/filesystems/nfs/reexport.rst > > > @@ -26,9 +26,13 @@ Reboot recovery > > > --------------- > > > > > > The NFS protocol's normal reboot recovery mechanisms don't work for the > > > -case when the reexport server reboots. Clients will lose any locks > > > -they held before the reboot, and further IO will result in errors. > > > -Closing and reopening files should clear the errors. > > > +case when the reexport server reboots because the source server has not > > > +rebooted, and so it is not in grace. Since the source server is not in > > > +grace, it cannot offer any guarantees that the file won't have been > > > +changed between the locks getting lost and any attempt to recover them. > > > +The same applies to delegations and any associated locks. Clients are > > > +not allowed to get file locks or delegations from a reexport server, any > > > +attempts will fail with operation not supported. > > > > > > Filehandle limits > > > ----------------- > > Note for Mike: > > Last sentence "Clients are not allowed to get ... delegations from a > reexport server" -- IIUC it's up to the re-export server to not hand > out delegations to its clients. Still, it's important to note that > NFSv4 delegation would not be available for re-exports. > > See below for more: I'd like this paragraph to continue to discuss > the issue of OPEN and I/O behavior when the re-export server > restarts. The patch seems to redact that bit of detail. > > Following is general discussion: > > > > There seems to be some controversy about this approach. > > > > Also I think it would be nicer all around if we followed the usual > > process for changes that introduce possible behavior regressions: > > > > - add the new behavior, make it optional, default old behavior > > - wait a few releases > > - change the default to new behavior > > > > Lastly, there haven't been any user complaints about the current > > situation of no lock recovery in the re-export case. > > > > Jeff and I discussed this, and we plan to drop this one for 6.13 but > > let the conversation continue. Mike, no action needed on your part > > for the moment, but please stay tuned! > > > > IMO having an export option (along the lines of "async/sync") that > > is documented in a man page is going to be a better plan. But if we > > find a way to deal with this situation without a new administrative > > control, that would be even better. > > Proposed solutions so far: > > - Disable NFS locking entirely on NFS re-export > > - Implement full state pass-through for re-export > > Some history of the NFSD design and the re-export issue is provided > here: > > http://wiki.linux-nfs.org/wiki/index.php/NFS_re-export#reboot_recovery > > Certain usage scenarios require that lock state be globally visible, > so disabling NFS locking on re-export mounts will need to be > considered carefully. 
> > Assuming that NFSv4 LOCK operations are proliferated to the back-end > server in today's NFSD, does it make sense to avoid code changes at > the moment, but more carefully document the configuration options > and their risks? > > +++ In all following configurations, no state recovery occurs when > the re-export server restarts, as explained in > Documentation/filesystems/nfs/reexport.rst. > > Mount options on the re-export server and clients: > > * All default: open and lock state is proliferated to the back-end > server and is visible to all NFS clients. > > * local_lock=all on the re-export server's mount of the back-end > server: clients of that server all see the same set of locks, but > these locks are not visible to the back-end server or any of its > clients. Open state is visible everywhere. > > * local_lock=all on the clients' NFS mounts of the re-export > server: applications on NFS clients do not see locks set by > applications on any other NFS clients. Open state is visible > everywhere. > > When an NFS client of the re-export server OPENs a file, currently > that creates OPEN state on the re-export server, and I assume also > on the back-end server. That state cannot be recovered if the > re-export server restarts, but it also cannot be blocked by a mount > option. > > Likewise, I assume the back-end server can hand out delegations to > the re-export server. If the re-export server restarts, how does it > recover those delegations? The re-export server could disable > delegation by blocking off its callback service, but should it? > > What, if anything, is being done to further develop and regularly > test NFS re-export in upstream kernels? > > The reexport.rst file: This still reads more like design notes than > administrative documentation. IMHO it should instead have a more > detailed description and disclaimer regarding what kind of manual > recovery is needed after a re-export server restart. That seems like > important information for administrators who think they might want > to deploy this solution. Maybe Documentation/ isn't the right place > for administrative documentation? > > It might be prudent to (temporarily) label NFS re-export as > experimental use only, given its incompleteness and the long list > of caveats. As someone who uses NFSv3 re-export extensively in production, I can't comment much on the "correctness" of the current locking, but it is "good enough" for us (we don't explicitly mount with local locks atm). The unique thing about our workloads though is that other than maybe the odd log file or home directory shell history file, a single process always writes a new unique file and we never overwrite. We have an asset management DB that determines the file paths to be written and a batch system to run processes (i.e. a production pipeline + render farm). We also really try to avoid having either the origin backend server or re-export server crash/reboot. But even when, once a year, something invariably does go wrong, we are willing to take the hit on broken mounts, processes or corrupted files (just re-run the batch jobs). Basically the upsides outweigh the downsides for our specific workloads. Coupled with FS-Cache and a few TBs of storage, using a re-export server is a very efficient way to serve files to many clients over a bandwidth constrained and/or high latency WAN link. In the case of high latency (e.g. 
global offices), we even do things like increase actimeo and disable CTO to reduce repeat metadata round-trips to the absolute minimum. Again, I think we have a unique workload that allows for this. If the locks will eventually be passed through to the backend server, then I suspect we would still want a way to opt out to reduce WAN latency overhead at the expense of locking correctness (maybe just using local locks). I think others with similar workloads are using it in this way too and I know Google were maintaining a howto to help customers migrate workloads to their cloud: https://github.com/GoogleCloudPlatform/knfsd-cache-utils https://cloud.google.com/architecture/deploy-nfs-caching-proxy-compute-engine Although it seems like that specific project has gone a bit quiet of late. They also helped get the reexport/crossmount fsidd helper merged into nfs-utils. I have also heard others say reexports are useful for "converting" NFSv4 storage to NFSv3 (or vice-versa) for older non-NFSv4 clients or servers, but I'm not sure how big a thing that is in this day and age. I guess Netapp's "FlexCache" product is doing a similar thing to reexporting and seems to lean heavily on NFSv4 and delegations to achieve that? The latest version can even do write-back caching on files (get lock first, write back later). I could probably write a whole (longish) thread about the different ways we currently use NFS re-exporting and some of the remaining pitfalls if there is any interest in that... Daire
diff --git a/Documentation/filesystems/nfs/reexport.rst b/Documentation/filesystems/nfs/reexport.rst index ff9ae4a46530..044be965d75e 100644 --- a/Documentation/filesystems/nfs/reexport.rst +++ b/Documentation/filesystems/nfs/reexport.rst @@ -26,9 +26,13 @@ Reboot recovery --------------- The NFS protocol's normal reboot recovery mechanisms don't work for the -case when the reexport server reboots. Clients will lose any locks -they held before the reboot, and further IO will result in errors. -Closing and reopening files should clear the errors. +case when the reexport server reboots because the source server has not +rebooted, and so it is not in grace. Since the source server is not in +grace, it cannot offer any guarantees that the file won't have been +changed between the locks getting lost and any attempt to recover them. +The same applies to delegations and any associated locks. Clients are +not allowed to get file locks or delegations from a reexport server, any +attempts will fail with operation not supported. Filehandle limits ----------------- diff --git a/fs/nfs/export.c b/fs/nfs/export.c index be686b8e0c54..2f001a0273bc 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -154,5 +154,6 @@ const struct export_operations nfs_export_ops = { EXPORT_OP_CLOSE_BEFORE_UNLINK | EXPORT_OP_REMOTE_FS | EXPORT_OP_NOATOMIC_ATTR | - EXPORT_OP_FLUSH_ON_CLOSE, + EXPORT_OP_FLUSH_ON_CLOSE | + EXPORT_OP_NOLOCKSUPPORT, }; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index ac1859c7cc9d..63297ea82e4e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5813,6 +5813,15 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, if (!nf) return ERR_PTR(-EAGAIN); + /* + * File delegations and associated locks cannot be recovered if + * export is from NFS proxy server. 
+ */ + if (exportfs_lock_op_is_unsupported(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) { + nfsd_file_put(nf); + return ERR_PTR(-EOPNOTSUPP); + } + spin_lock(&state_lock); spin_lock(&fp->fi_lock); if (nfs4_delegation_exists(clp, fp)) @@ -7917,6 +7926,11 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } sb = cstate->current_fh.fh_dentry->d_sb; + if (exportfs_lock_op_is_unsupported(sb->s_export_op)) { + status = nfserr_notsupp; + goto out; + } + if (lock->lk_is_new) { if (nfsd4_has_session(cstate)) /* See rfc 5661 18.10.3: given clientid is ignored: */ @@ -8266,6 +8280,12 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfserr_lock_range; goto put_stateid; } + + if (exportfs_lock_op_is_unsupported(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) { + status = nfserr_notsupp; + goto put_file; + } + file_lock = locks_alloc_lock(); if (!file_lock) { dprintk("NFSD: %s: unable to allocate lock!\n", __func__); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 893a1d21dc1c..106fd590d323 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -247,6 +247,7 @@ struct export_operations { */ #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */ #define EXPORT_OP_ASYNC_LOCK (0x40) /* fs can do async lock request */ +#define EXPORT_OP_NOLOCKSUPPORT (0x80) /* no file locking support */ unsigned long flags; }; @@ -263,6 +264,19 @@ exportfs_lock_op_is_async(const struct export_operations *export_ops) return export_ops->flags & EXPORT_OP_ASYNC_LOCK; } +/** + * exportfs_lock_op_is_unsupported() - export does not support file locking + * @export_ops: the nfs export operations to check + * + * Returns true if the nfs export_operations structure has + * EXPORT_OP_NOLOCKSUPPORT in their flags set + */ +static inline bool +exportfs_lock_op_is_unsupported(const struct export_operations *export_ops) +{ + return export_ops->flags & EXPORT_OP_NOLOCKSUPPORT; +} + extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len, struct inode *parent, int flags);
We do not and cannot support file locking with NFS reexport over NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport server reboot cannot allow clients to recover locks because the source NFS server has not rebooted, and so it is not in grace. Since the source NFS server is not in grace, it cannot offer any guarantees that the file won't have been changed between the locks getting lost and any attempt to recover/reclaim them. The same applies to delegations and any associated locks, so disallow them too. Add EXPORT_OP_NOLOCKSUPPORT and exportfs_lock_op_is_unsupported(), set EXPORT_OP_NOLOCKSUPPORT in nfs_export_ops and check for it in nfsd4_lock(), nfsd4_locku() and nfs4_set_delegation(). Clients are not allowed to get file locks or delegations from a reexport server, any attempts will fail with operation not supported. Update the "Reboot recovery" section accordingly in Documentation/filesystems/nfs/reexport.rst Signed-off-by: Mike Snitzer <snitzer@kernel.org> --- Documentation/filesystems/nfs/reexport.rst | 10 +++++++--- fs/nfs/export.c | 3 ++- fs/nfsd/nfs4state.c | 20 ++++++++++++++++++++ include/linux/exportfs.h | 14 ++++++++++++++ 4 files changed, 43 insertions(+), 4 deletions(-) v3: refine the patch header and reexport.rst to be clear that both locks and delegations will fail against an NFS reexport server.
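To make the "operation not supported" failure mode in the description concrete, an application on a client of the re-export server would see something like the following once this patch is applied. This assumes the client surfaces NFS4ERR_NOTSUPP as EOPNOTSUPP, and the mount path is a placeholder:

#include <fcntl.h>
#include <stdio.h>

int main(void)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd = open("/mnt/reexport/testfile", O_RDWR | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (fcntl(fd, F_SETLK, &fl) < 0)
		perror("F_SETLK");	/* expected: Operation not supported */
	else
		puts("lock granted (patch not applied, or local_lock in use)");
	return 0;
}

Plain reads and writes through the re-export are unaffected; only LOCK, LOCKU and delegations are refused.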