Message ID | 20241101135452.19359-1-erin.shepherd@e43.eu (mailing list archive) |
---|---|
Headers | show |
Series | pidfs: implement file handle support | expand |
On Fri, Nov 01, 2024 at 01:54:48PM +0000, Erin Shepherd wrote: > Since the introduction of pidfs, we have had 64-bit process identifiers > that will not be reused for the entire uptime of the system. This greatly > facilitates process tracking in userspace. > > There are two limitations at present: > > * These identifiers are currently only exposed to processes on 64-bit > systems. On 32-bit systems, inode space is also limited to 32 bits and > therefore is subject to the same reuse issues. > * There is no way to go from one of these unique identifiers to a pid or > pidfd. > > Patch 1 & 2 in this stack implements fh_export for pidfs. This means > userspace can retrieve a unique process identifier even on 32-bit systems > via name_to_handle_at. > > Patch 3 & 4 in this stack implement fh_to_dentry for pidfs. This means > userspace can convert back from a file handle to the corresponding pidfd. > To support us going from a file handle to a pidfd, we have to store a pid > inside the file handle. To ensure file handles are invariant and can move > between pid namespaces, we stash a pid from the initial namespace inside > the file handle. > > I'm not quite sure if stashing an initial-namespace pid inside the file > handle is the right approach here; if not, I think that patch 1 & 2 are > useful on their own. Sorry for the delayed reply (I'm recovering from a lengthy illness.). I like the idea in general. I think this is really useful. A few of my thoughts but I need input from Amir and Jeff: * In the last patch of the series you already implement decoding of pidfd file handles by adding a .fh_to_dentry export_operations method. There are a few things to consider because of how open_by_handle_at() works. - open_by_handle_at() needs to be restricted so it only creates pidfds from pidfs file handles that resolve to a struct pid that is reachable in the caller's pid namespace. In other words, it should mirror pidfd_open(). Put another way, open_by_handle_at() must not be usable to open arbitrary pids to prevent a container from constructing a pidfd file handle for a process that lives outside it's pid namespace hierarchy. With this restriction in place open_by_handle_at() can be available to let unprivileged processes open pidfd file handles. Related to that, I don't think we need to make open_by_handle_at() open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply because any process in the initial pid namespace can open any other process via pidfd_open() anyway because pid namespaces are hierarchical. IOW, CAP_DAC_READ_SEARCH must not override the restriction that the provided pidfs file handle must be reachable from the caller's pid namespace. - open_by_handle_at() uses may_decode_fh() to determine whether it's possible to decode a file handle as an unprivileged user. The current checks don't make sense for pidfs. Conceptually, I think there don't need to place any restrictions based on global CAP_DAC_READ_SEARCH, owning user namespace of the superblock or mount on pidfs file handles. The only restriction that matters is that the requested pidfs file handle is reachable from the caller's pid namespace. - A pidfd always has exactly a single inode and a single dentry. There's no aliases. - Generally, in my naive opinion, I think that decoding pidfs file handles should be a lot simpler than decoding regular path based file handles. Because there should be no need to verify any ancestors, or reconnect paths. Pidfs also doesn't have directory inodes, only regular inodes. In other words, any dentry is acceptable. Essentially, the only thing we need is for exportfs_decode_fh_raw() to verify that the provided pidfs file handle is resolvable in the caller's pid namespace. If so we're done. The challenge is how to nicely plumb this into the code without it sticking out like a sore thumb. - Pidfs should not be exportable via NFS. It doesn't make sense.
On Tue, 2024-11-12 at 14:10 +0100, Christian Brauner wrote: > On Fri, Nov 01, 2024 at 01:54:48PM +0000, Erin Shepherd wrote: > > Since the introduction of pidfs, we have had 64-bit process identifiers > > that will not be reused for the entire uptime of the system. This greatly > > facilitates process tracking in userspace. > > > > There are two limitations at present: > > > > * These identifiers are currently only exposed to processes on 64-bit > > systems. On 32-bit systems, inode space is also limited to 32 bits and > > therefore is subject to the same reuse issues. We should really just move to storing 64-bit inode numbers internally on 32-bit machines. That would at least make statx() give you all 64 bits on 32-bit host. > > * There is no way to go from one of these unique identifiers to a pid or > > pidfd. > > > > Patch 1 & 2 in this stack implements fh_export for pidfs. This means > > userspace can retrieve a unique process identifier even on 32-bit systems > > via name_to_handle_at. > > > > Patch 3 & 4 in this stack implement fh_to_dentry for pidfs. This means > > userspace can convert back from a file handle to the corresponding pidfd. > > To support us going from a file handle to a pidfd, we have to store a pid > > inside the file handle. To ensure file handles are invariant and can move > > between pid namespaces, we stash a pid from the initial namespace inside > > the file handle. > > > > I'm not quite sure if stashing an initial-namespace pid inside the file > > handle is the right approach here; if not, I think that patch 1 & 2 are > > useful on their own. Hmm... I guess pid namespaces don't have a convenient 64-bit ID like mount namespaces do? In that case, stashing the pid from init_ns is probably the next best thing. > > Sorry for the delayed reply (I'm recovering from a lengthy illness.). > > I like the idea in general. I think this is really useful. A few of my > thoughts but I need input from Amir and Jeff: > > * In the last patch of the series you already implement decoding of > pidfd file handles by adding a .fh_to_dentry export_operations method. > > There are a few things to consider because of how open_by_handle_at() > works. > > - open_by_handle_at() needs to be restricted so it only creates pidfds > from pidfs file handles that resolve to a struct pid that is > reachable in the caller's pid namespace. In other words, it should > mirror pidfd_open(). > > Put another way, open_by_handle_at() must not be usable to open > arbitrary pids to prevent a container from constructing a pidfd file > handle for a process that lives outside it's pid namespace > hierarchy. > > With this restriction in place open_by_handle_at() can be available > to let unprivileged processes open pidfd file handles. > > Related to that, I don't think we need to make open_by_handle_at() > open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply > because any process in the initial pid namespace can open any other > process via pidfd_open() anyway because pid namespaces are > hierarchical. > > IOW, CAP_DAC_READ_SEARCH must not override the restriction that the > provided pidfs file handle must be reachable from the caller's pid > namespace. > > - open_by_handle_at() uses may_decode_fh() to determine whether it's > possible to decode a file handle as an unprivileged user. The > current checks don't make sense for pidfs. Conceptually, I think > there don't need to place any restrictions based on global > CAP_DAC_READ_SEARCH, owning user namespace of the superblock or > mount on pidfs file handles. > > The only restriction that matters is that the requested pidfs file > handle is reachable from the caller's pid namespace. > > - A pidfd always has exactly a single inode and a single dentry. > There's no aliases. > > - Generally, in my naive opinion, I think that decoding pidfs file > handles should be a lot simpler than decoding regular path based > file handles. Because there should be no need to verify any > ancestors, or reconnect paths. Pidfs also doesn't have directory > inodes, only regular inodes. In other words, any dentry is > acceptable. > > Essentially, the only thing we need is for exportfs_decode_fh_raw() > to verify that the provided pidfs file handle is resolvable in the > caller's pid namespace. If so we're done. The challenge is how to > nicely plumb this into the code without it sticking out like a sore > thumb. > > - Pidfs should not be exportable via NFS. It doesn't make sense. I haven't looked over the patchset yet, but those restrictions all sound pretty reasonable to me. Special casing the may_decode_fh permission checks may be the tricky bit. I'm not sure what that should look like, tbqh.
On 12/11/2024 14:57, Jeff Layton wrote: > On Tue, 2024-11-12 at 14:10 +0100, Christian Brauner wrote: > We should really just move to storing 64-bit inode numbers internally > on 32-bit machines. That would at least make statx() give you all 64 > bits on 32-bit host. I think that would be ideal from the perspective of exposing it to userspace. It does leave the question of going back from inode to pidfd unsolved though.I like the name_to_handle_at/open_by_handle_at approach because it neatly solves both sides of the problem with APIs we already have and understand > Hmm... I guess pid namespaces don't have a convenient 64-bit ID like > mount namespaces do? In that case, stashing the pid from init_ns is > probably the next best thing. Not that I could identify, no; so stashing the PID seemed like the most pragmatic approach. I'm not 100% sure it should be a documented property of the file handle format; I somewhat think that everything after the PID inode should be opaque to userspace and subject to change in the future (to the point I considered xoring it with a magic constant to make it less obvious to userspace/make it more obvious that its not to be relied upon; but that to my knowledge is not something that the kernel has done elsewhere). - Erin
On 12/11/2024 14:10, Christian Brauner wrote: > Sorry for the delayed reply (I'm recovering from a lengthy illness.). No worries on my part, and I hope you're feeling better! > I like the idea in general. I think this is really useful. A few of my > thoughts but I need input from Amir and Jeff: > > * In the last patch of the series you already implement decoding of > pidfd file handles by adding a .fh_to_dentry export_operations method. > > There are a few things to consider because of how open_by_handle_at() > works. > > - open_by_handle_at() needs to be restricted so it only creates pidfds > from pidfs file handles that resolve to a struct pid that is > reachable in the caller's pid namespace. In other words, it should > mirror pidfd_open(). > > Put another way, open_by_handle_at() must not be usable to open > arbitrary pids to prevent a container from constructing a pidfd file > handle for a process that lives outside it's pid namespace > hierarchy. > > With this restriction in place open_by_handle_at() can be available > to let unprivileged processes open pidfd file handles. > > Related to that, I don't think we need to make open_by_handle_at() > open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply > because any process in the initial pid namespace can open any other > process via pidfd_open() anyway because pid namespaces are > hierarchical. > > IOW, CAP_DAC_READ_SEARCH must not override the restriction that the > provided pidfs file handle must be reachable from the caller's pid > namespace. The pid_vnr(pid) == 0 check catches this case -- we return an error to the caller if there isn't a pid mapping in the caller's namespace Perhaps I should have called this out explicitly. > - open_by_handle_at() uses may_decode_fh() to determine whether it's > possible to decode a file handle as an unprivileged user. The > current checks don't make sense for pidfs. Conceptually, I think > there don't need to place any restrictions based on global > CAP_DAC_READ_SEARCH, owning user namespace of the superblock or > mount on pidfs file handles. > > The only restriction that matters is that the requested pidfs file > handle is reachable from the caller's pid namespace. I wonder if this could be handled through an addition to export_operations' flags member? > - A pidfd always has exactly a single inode and a single dentry. > There's no aliases. > > - Generally, in my naive opinion, I think that decoding pidfs file > handles should be a lot simpler than decoding regular path based > file handles. Because there should be no need to verify any > ancestors, or reconnect paths. Pidfs also doesn't have directory > inodes, only regular inodes. In other words, any dentry is > acceptable. > > Essentially, the only thing we need is for exportfs_decode_fh_raw() > to verify that the provided pidfs file handle is resolvable in the > caller's pid namespace. If so we're done. The challenge is how to > nicely plumb this into the code without it sticking out like a sore > thumb. Theoretically you should be able to use PIDFD_SELF as well (assuming that makes its way into mainline this release :-)) but I am a bit concerned about potentially polluting the open_by_handle_at logic with pidfd specificities. > - Pidfs should not be exportable via NFS. It doesn't make sense. Hmm, I guess I might have made that possible, though I'm certainly not familiar enough with the internals of nfsd to be able to test if I've done so. I guess probably this case calls for another export_ops flag? Not like we're short on them Thanks, - Erin
On Tue, Nov 12, 2024 at 11:43:13PM +0100, Erin Shepherd wrote: > On 12/11/2024 14:57, Jeff Layton wrote: > > On Tue, 2024-11-12 at 14:10 +0100, Christian Brauner wrote: > > We should really just move to storing 64-bit inode numbers internally > > on 32-bit machines. That would at least make statx() give you all 64 > > bits on 32-bit host. > > I think that would be ideal from the perspective of exposing it to > userspace. > It does leave the question of going back from inode to pidfd unsolved > though.I like the name_to_handle_at/open_by_handle_at approach because > it neatly solves both sides of the problem with APIs we already have and > understand > > > Hmm... I guess pid namespaces don't have a convenient 64-bit ID like > > mount namespaces do? In that case, stashing the pid from init_ns is > > probably the next best thing. > > Not that I could identify, no; so stashing the PID seemed like the most > pragmatic > approach. > > I'm not 100% sure it should be a documented property of the file > handle format; I somewhat think that everything after the PID inode > should be opaque to userspace and subject to change in the future (to > the point I considered xoring it with a magic constant to make it less > obvious to userspace/make it more obvious that its not to be relied > upon; but that to my knowledge is not something that the kernel has > done elsewhere). It's a handle, the internal details of its layout of it is supposed to be opaque to userspace. I wonder how well userspace deals with weirdly sized handles though... --D > - Erin > >
On Wed, Nov 13, 2024 at 12:03:08AM +0100, Erin Shepherd wrote: > > On 12/11/2024 14:10, Christian Brauner wrote: > > Sorry for the delayed reply (I'm recovering from a lengthy illness.). > No worries on my part, and I hope you're feeling better! > > I like the idea in general. I think this is really useful. A few of my > > thoughts but I need input from Amir and Jeff: > > > > * In the last patch of the series you already implement decoding of > > pidfd file handles by adding a .fh_to_dentry export_operations method. > > > > There are a few things to consider because of how open_by_handle_at() > > works. > > > > - open_by_handle_at() needs to be restricted so it only creates pidfds > > from pidfs file handles that resolve to a struct pid that is > > reachable in the caller's pid namespace. In other words, it should > > mirror pidfd_open(). > > > > Put another way, open_by_handle_at() must not be usable to open > > arbitrary pids to prevent a container from constructing a pidfd file > > handle for a process that lives outside it's pid namespace > > hierarchy. > > > > With this restriction in place open_by_handle_at() can be available > > to let unprivileged processes open pidfd file handles. > > > > Related to that, I don't think we need to make open_by_handle_at() > > open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply > > because any process in the initial pid namespace can open any other > > process via pidfd_open() anyway because pid namespaces are > > hierarchical. > > > > IOW, CAP_DAC_READ_SEARCH must not override the restriction that the > > provided pidfs file handle must be reachable from the caller's pid > > namespace. > > The pid_vnr(pid) == 0 check catches this case -- we return an error to the > caller if there isn't a pid mapping in the caller's namespace > > Perhaps I should have called this out explicitly. > > > - open_by_handle_at() uses may_decode_fh() to determine whether it's > > possible to decode a file handle as an unprivileged user. The > > current checks don't make sense for pidfs. Conceptually, I think > > there don't need to place any restrictions based on global > > CAP_DAC_READ_SEARCH, owning user namespace of the superblock or > > mount on pidfs file handles. > > > > The only restriction that matters is that the requested pidfs file > > handle is reachable from the caller's pid namespace. > > I wonder if this could be handled through an addition to export_operations' > flags member? > > > - A pidfd always has exactly a single inode and a single dentry. > > There's no aliases. > > > > - Generally, in my naive opinion, I think that decoding pidfs file > > handles should be a lot simpler than decoding regular path based > > file handles. Because there should be no need to verify any > > ancestors, or reconnect paths. Pidfs also doesn't have directory > > inodes, only regular inodes. In other words, any dentry is > > acceptable. > > > > Essentially, the only thing we need is for exportfs_decode_fh_raw() > > to verify that the provided pidfs file handle is resolvable in the > > caller's pid namespace. If so we're done. The challenge is how to > > nicely plumb this into the code without it sticking out like a sore > > thumb. > > Theoretically you should be able to use PIDFD_SELF as well (assuming that > makes its way into mainline this release :-)) but I am a bit concerned about > potentially polluting the open_by_handle_at logic with pidfd specificities. > > > - Pidfs should not be exportable via NFS. It doesn't make sense. > > Hmm, I guess I might have made that possible, though I'm certainly not > familiar enough with the internals of nfsd to be able to test if I've done > so. AFAIK check_export() in fs/nfsd/export.c spells this it out: /* There are two requirements on a filesystem to be exportable. * 1: We must be able to identify the filesystem from a number. * either a device number (so FS_REQUIRES_DEV needed) * or an FSID number (so NFSEXP_FSID or ->uuid is needed). * 2: We must be able to find an inode from a filehandle. * This means that s_export_op must be set. * 3: We must not currently be on an idmapped mount. */ Granted I've been wrong on account of stale docs before. :$ Though it would be kinda funny if you *could* mess with another machine's processes over NFS. --D > I guess probably this case calls for another export_ops flag? Not like we're > short on them > > Thanks, > - Erin > >
On 13/11/2024 01:40, Darrick J. Wong wrote: >> Hmm, I guess I might have made that possible, though I'm certainly not >> familiar enough with the internals of nfsd to be able to test if I've done >> so. > AFAIK check_export() in fs/nfsd/export.c spells this it out: > > /* There are two requirements on a filesystem to be exportable. > * 1: We must be able to identify the filesystem from a number. > * either a device number (so FS_REQUIRES_DEV needed) > * or an FSID number (so NFSEXP_FSID or ->uuid is needed). > * 2: We must be able to find an inode from a filehandle. > * This means that s_export_op must be set. > * 3: We must not currently be on an idmapped mount. > */ > > Granted I've been wrong on account of stale docs before. :$ > > Though it would be kinda funny if you *could* mess with another > machine's processes over NFS. > > --D To be clear I'm not familiar enough with the workings of nfsd to tell if pidfs fails those requirements and therefore wouldn't become exportable as a result of this patch, though I gather from you're message that we're in the clear? Regardless I think my question is: do we think either those requirements could change in the future, or the properties of pidfs could change in the future, in ways that could accidentally make the filesystem exportable? I guess though that the same concern would apply to cgroupfs and it hasn't posed an issue so far. - Erin
On Tue, Nov 12, 2024 at 11:43:13PM +0100, Erin Shepherd wrote: > On 12/11/2024 14:57, Jeff Layton wrote: > > On Tue, 2024-11-12 at 14:10 +0100, Christian Brauner wrote: > > We should really just move to storing 64-bit inode numbers internally > > on 32-bit machines. That would at least make statx() give you all 64 > > bits on 32-bit host. > > I think that would be ideal from the perspective of exposing it to > userspace. > It does leave the question of going back from inode to pidfd unsolved > though.I like the name_to_handle_at/open_by_handle_at approach because Indeed it doesn't solve it because it's possible that a given struct pid never had a pidfd created for it and thus no inode actually does exist. So when you're decoding a pidfs file handle you need to go to a struct pid based on some property. The pid is fine for that and it is equivalen to how pidfd_open() works. > it neatly solves both sides of the problem with APIs we already have and > understand > > > Hmm... I guess pid namespaces don't have a convenient 64-bit ID like > > mount namespaces do? In that case, stashing the pid from init_ns is > > probably the next best thing. > > Not that I could identify, no; so stashing the PID seemed like the most > pragmatic > approach. > > I'm not 100% sure it should be a documented property of the file handle > format; I > somewhat think that everything after the PID inode should be opaque to > userspace > and subject to change in the future (to the point I considered xoring it > with a > magic constant to make it less obvious to userspace/make it more obvious > that its > not to be relied upon; but that to my knowledge is not something that > the kernel > has done elsewhere). > > - Erin >
On Wed, 2024-11-13 at 11:17 +0100, Erin Shepherd wrote: > On 13/11/2024 01:40, Darrick J. Wong wrote: > > > Hmm, I guess I might have made that possible, though I'm certainly not > > > familiar enough with the internals of nfsd to be able to test if I've done > > > so. > > AFAIK check_export() in fs/nfsd/export.c spells this it out: > > > > /* There are two requirements on a filesystem to be exportable. > > * 1: We must be able to identify the filesystem from a number. > > * either a device number (so FS_REQUIRES_DEV needed) > > * or an FSID number (so NFSEXP_FSID or ->uuid is needed). > > * 2: We must be able to find an inode from a filehandle. > > * This means that s_export_op must be set. > > * 3: We must not currently be on an idmapped mount. > > */ > > > > Granted I've been wrong on account of stale docs before. :$ > > > > Though it would be kinda funny if you *could* mess with another > > machine's processes over NFS. > > > > --D > > To be clear I'm not familiar enough with the workings of nfsd to tell if > pidfs fails those requirements and therefore wouldn't become exportable as > a result of this patch, though I gather from you're message that we're in the > clear? > > Regardless I think my question is: do we think either those requirements could > change in the future, or the properties of pidfs could change in the future, > in ways that could accidentally make the filesystem exportable? > > I guess though that the same concern would apply to cgroupfs and it hasn't posed > an issue so far. We have other filesystems that do this sort of thing (like cgroupfs), and we don't allow them to be exportable. We'll need to make sure that that's the case before we merge this, of course, as I forget the details of how that works.
> On Nov 13, 2024, at 8:29 AM, Jeff Layton <jlayton@kernel.org> wrote: > > On Wed, 2024-11-13 at 11:17 +0100, Erin Shepherd wrote: >> On 13/11/2024 01:40, Darrick J. Wong wrote: >>>> Hmm, I guess I might have made that possible, though I'm certainly not >>>> familiar enough with the internals of nfsd to be able to test if I've done >>>> so. >>> AFAIK check_export() in fs/nfsd/export.c spells this it out: >>> >>> /* There are two requirements on a filesystem to be exportable. >>> * 1: We must be able to identify the filesystem from a number. >>> * either a device number (so FS_REQUIRES_DEV needed) >>> * or an FSID number (so NFSEXP_FSID or ->uuid is needed). >>> * 2: We must be able to find an inode from a filehandle. >>> * This means that s_export_op must be set. >>> * 3: We must not currently be on an idmapped mount. >>> */ >>> >>> Granted I've been wrong on account of stale docs before. :$ >>> >>> Though it would be kinda funny if you *could* mess with another >>> machine's processes over NFS. >>> >>> --D >> >> To be clear I'm not familiar enough with the workings of nfsd to tell if >> pidfs fails those requirements and therefore wouldn't become exportable as >> a result of this patch, though I gather from you're message that we're in the >> clear? >> >> Regardless I think my question is: do we think either those requirements could >> change in the future, or the properties of pidfs could change in the future, >> in ways that could accidentally make the filesystem exportable? >> >> I guess though that the same concern would apply to cgroupfs and it hasn't posed >> an issue so far. > > We have other filesystems that do this sort of thing (like cgroupfs), > and we don't allow them to be exportable. We'll need to make sure that > that's the case before we merge this, of course, as I forget the > details of how that works. It's far easier to add exportability later than it is to remove it if we think it was a mistake. I would err on the side of caution if there isn't an immediate need/use-case for exposure via NFS. -- Chuck Lever
On Wed, Nov 13, 2024 at 2:29 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Wed, 2024-11-13 at 11:17 +0100, Erin Shepherd wrote: > > On 13/11/2024 01:40, Darrick J. Wong wrote: > > > > Hmm, I guess I might have made that possible, though I'm certainly not > > > > familiar enough with the internals of nfsd to be able to test if I've done > > > > so. > > > AFAIK check_export() in fs/nfsd/export.c spells this it out: > > > > > > /* There are two requirements on a filesystem to be exportable. > > > * 1: We must be able to identify the filesystem from a number. > > > * either a device number (so FS_REQUIRES_DEV needed) > > > * or an FSID number (so NFSEXP_FSID or ->uuid is needed). > > > * 2: We must be able to find an inode from a filehandle. > > > * This means that s_export_op must be set. > > > * 3: We must not currently be on an idmapped mount. > > > */ > > > > > > Granted I've been wrong on account of stale docs before. :$ > > > > > > Though it would be kinda funny if you *could* mess with another > > > machine's processes over NFS. > > > > > > --D > > > > To be clear I'm not familiar enough with the workings of nfsd to tell if > > pidfs fails those requirements and therefore wouldn't become exportable as > > a result of this patch, though I gather from you're message that we're in the > > clear? > > > > Regardless I think my question is: do we think either those requirements could > > change in the future, or the properties of pidfs could change in the future, > > in ways that could accidentally make the filesystem exportable? > > > > I guess though that the same concern would apply to cgroupfs and it hasn't posed > > an issue so far. > > We have other filesystems that do this sort of thing (like cgroupfs), > and we don't allow them to be exportable. We'll need to make sure that > that's the case before we merge this, of course, as I forget the > details of how that works. TBH, I cannot find how export of cgroups with NFSEXP_FSID is prevented. We should probably block nfs export of SB_NOUSER and anyway, this should be tied to the flag for relaxing CAP_DAC_READ_SEARCH, because this is a strong indication that it's not a traditional nfs file handle. Thanks, Amir.
On Wed, Nov 13, 2024 at 02:41:29PM +0000, Chuck Lever III wrote: > > > > On Nov 13, 2024, at 8:29 AM, Jeff Layton <jlayton@kernel.org> wrote: > > > > On Wed, 2024-11-13 at 11:17 +0100, Erin Shepherd wrote: > >> On 13/11/2024 01:40, Darrick J. Wong wrote: > >>>> Hmm, I guess I might have made that possible, though I'm certainly not > >>>> familiar enough with the internals of nfsd to be able to test if I've done > >>>> so. > >>> AFAIK check_export() in fs/nfsd/export.c spells this it out: > >>> > >>> /* There are two requirements on a filesystem to be exportable. > >>> * 1: We must be able to identify the filesystem from a number. > >>> * either a device number (so FS_REQUIRES_DEV needed) > >>> * or an FSID number (so NFSEXP_FSID or ->uuid is needed). > >>> * 2: We must be able to find an inode from a filehandle. > >>> * This means that s_export_op must be set. > >>> * 3: We must not currently be on an idmapped mount. > >>> */ > >>> > >>> Granted I've been wrong on account of stale docs before. :$ > >>> > >>> Though it would be kinda funny if you *could* mess with another > >>> machine's processes over NFS. > >>> > >>> --D > >> > >> To be clear I'm not familiar enough with the workings of nfsd to tell if > >> pidfs fails those requirements and therefore wouldn't become exportable as > >> a result of this patch, though I gather from you're message that we're in the > >> clear? > >> > >> Regardless I think my question is: do we think either those requirements could > >> change in the future, or the properties of pidfs could change in the future, > >> in ways that could accidentally make the filesystem exportable? > >> > >> I guess though that the same concern would apply to cgroupfs and it hasn't posed > >> an issue so far. > > > > We have other filesystems that do this sort of thing (like cgroupfs), > > and we don't allow them to be exportable. We'll need to make sure that > > that's the case before we merge this, of course, as I forget the > > details of how that works. > > It's far easier to add exportability later than it is > to remove it if we think it was a mistake. I would err > on the side of caution if there isn't an immediate > need/use-case for exposure via NFS. Tbh, the idea itself of exporting pidfs via nfs is a) crazy and b) pretty interesting. If we could really export pidfs over NFS cleanly somehow then we would have a filesystem-like representation of a remote machine's processes. There could be a lot of interesting things in this. But I would think that this requires some proper massaging of how pidfs works. But in principle it might be possible. Again, I'm not saying it's a great idea and we definitely shouldn't do it now. But it's an interesting thought experiment at least.