[v1,38/38] nfs: add a Kconfig option for NFS reexporting and documentation

Message ID 1447761180-4250-39-git-send-email-jeff.layton@primarydata.com (mailing list archive)
State New, archived

Commit Message

Jeff Layton Nov. 17, 2015, 11:53 a.m. UTC
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 Documentation/filesystems/nfs/reexport.txt | 95 ++++++++++++++++++++++++++++++
 fs/nfs/Kconfig                             | 11 ++++
 2 files changed, 106 insertions(+)
 create mode 100644 Documentation/filesystems/nfs/reexport.txt

Comments

J. Bruce Fields Nov. 18, 2015, 8:22 p.m. UTC | #1
On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> +Filehandle size:
> +----------------
> +The maximum filehandle size is governed by the NFS version. Version 2
> +used fixed 32 byte filehandles. Version 3 moved to variable length
> +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> +maximum to 128 bytes.
> +
> +When reexporting an NFS filesystem, the underlying filehandle from the
> +server must be embedded inside the filehandles presented to clients.
> +Thus if the underlying server presents filehandles that are too big, the
> +reexporting server can fail to encode them. This can lead to
> +NFSERR_OPNOTSUPP errors being returned to clients.
> +
> +This is not a trivial thing to programmatically determine ahead of time
> +(and it can vary even within the same server), so some foreknowledge of
> +how the underlying server constructs filehandles, and their maximum
> +size is a must.

This is the trickiest one, since it depends on an undocumented
implementation detail of the server.

Do we even know if this works for all the exportable Linux filesystems?

If proxying NFSv4.x servers is actually useful, could we add a per-fs
maximum-filesystem-size attribute to the protocol?

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton Nov. 18, 2015, 9:15 p.m. UTC | #2
On Wed, 18 Nov 2015 15:22:20 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > +Filehandle size:
> > +----------------
> > +The maximum filehandle size is governed by the NFS version. Version 2
> > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > +maximum to 128 bytes.
> > +
> > +When reexporting an NFS filesystem, the underlying filehandle from the
> > +server must be embedded inside the filehandles presented to clients.
> > +Thus if the underlying server presents filehandles that are too big, the
> > +reexporting server can fail to encode them. This can lead to
> > +NFSERR_OPNOTSUPP errors being returned to clients.
> > +
> > +This is not a trivial thing to programatically determine ahead of time
> > +(and it can vary even within the same server), so some foreknowledge of
> > +how the underlying server constructs filehandles, and their maximum
> > +size is a must.
> 
> This is the trickiest one, since it depends on an undocumented
> implementation detail of the server.
> 

Yes, indeed...

> Do we even know if this works for all the exportable Linux filesystems?
> 
> If proxying NFSv4.x servers is actually useful, could we add a per-fs
> maximum-filesystem-size attribute to the protocol?
> 

Erm, I think you mean maximum-filehandle-size, but I get your point...

It's tough to do more than a quick survey, but looking at new-style fh:

The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
can get that down to 8 bytes if you specify the fsid directly. The fsid
choice is weird, because it sort of depends on the filehandle sent by
the client (which is used as a template), so I guess we really do need
to assume worst-case.

Once that's done, the encode_fh routines add the fileid part. btrfs has
a pretty large maximum one: 40 bytes. That brings the max size up to 68
bytes, which is already too large for NFSv3, before we ever get to
the part where we embed that inside another fh. We require another 12
bytes on top of the "underlying" filehandle for reexporting.

So, no, this may very well not work for all exportable Linux
filesystems, but it sort of depends on the situation (and to some
degree, what gets sent by the clients). That's what makes this so hard
to figure out programmatically.
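
To make the arithmetic above concrete, here is a rough userspace sketch of
the size budget. The limits (32/64/128 bytes), the 28 + 40 = 68 byte btrfs
worst case and the 12 bytes of reexport overhead are the figures quoted in
this thread; the helper names backend_fh_size() and reexport_fh_fits() are
made up for illustration and are not part of the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Protocol limits on filehandle size, in bytes (figures from this thread). */
enum { NFS2_FHSIZE = 32, NFS3_FHSIZE = 64, NFS4_FHSIZE = 128 };

/* Extra bytes the reexporting server needs on top of the underlying handle. */
enum { REEXPORT_OVERHEAD = 12 };

/*
 * Hypothetical helper: size of a backend handle built from an fsid part
 * (28 bytes worst case, per the survey above) and a filesystem-specific
 * fileid part (40 bytes worst case for btrfs).
 */
static size_t backend_fh_size(size_t fsid_len, size_t fileid_len)
{
	return fsid_len + fileid_len;
}

/*
 * Hypothetical helper: can a backend handle of backend_len bytes be
 * wrapped for reexport and still fit under the given protocol limit?
 */
static bool reexport_fh_fits(size_t backend_len, size_t proto_max)
{
	assert(proto_max >= (size_t)REEXPORT_OVERHEAD);
	return backend_len <= proto_max - (size_t)REEXPORT_OVERHEAD;
}
```

With these numbers, reexport_fh_fits(backend_fh_size(28, 40), NFS3_FHSIZE)
is false, while the same handle fits under NFS4_FHSIZE (68 + 12 = 80
bytes). For a v3 reexport the backend handle has to stay at or below 52
bytes.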

As far as extending the protocol...that's not a bad idea, though that's
obviously a longer-term solution. I don't think we can reasonably rely
on that anyway. Maybe though...
Frank Filz Nov. 18, 2015, 10:30 p.m. UTC | #3
Jeff Layton said:
> On Wed, 18 Nov 2015 15:22:20 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > +Filehandle size:
> > > +----------------
> > > +The maximum filehandle size is governed by the NFS version. Version
> > > +2 used fixed 32 byte filehandles. Version 3 moved to variable
> > > +length filehandles that can be up to 64 bytes in size. NFSv4
> > > +increased that maximum to 128 bytes.
> > > +
> > > > +When reexporting an NFS filesystem, the underlying filehandle from
> > > > +the server must be embedded inside the filehandles presented to
> > > > +clients.
> > > > +Thus if the underlying server presents filehandles that are too
> > > > +big, the reexporting server can fail to encode them. This can lead
> > > > +to NFSERR_OPNOTSUPP errors being returned to clients.
> > > +
> > > +This is not a trivial thing to programatically determine ahead of
> > > +time (and it can vary even within the same server), so some
> > > +foreknowledge of how the underlying server constructs filehandles,
> > > +and their maximum size is a must.
> >
> > This is the trickiest one, since it depends on an undocumented
> > implementation detail of the server.
> >
> 
> Yes, indeed...
> 
> > Do we even know if this works for all the exportable Linux filesystems?
> >
> > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > maximum-filesystem-size attribute to the protocol?
> >
> 
> Erm, I think you mean maximum-filehandle-size, but I get your point...
> 
> It's tough to do more than a quick survey, but looking at new-style fh:
> 
> The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> can get that down to 8 bytes if you specify the fsid directly. The fsid
> choice is weird, because it sort of depends on the filehandle sent by
> the client (which is used as a template), so I guess we really do need
> to assume worst-case.
> 
> Once that's done, the encode_fh routines add the fileid part. btrfs has
> a pretty large maximum one: 40 bytes. That brings the max size up to 68
> bytes, which is already too large for NFSv3, before we ever get to
> the part where we embed that inside another fh. We require another 12
> bytes on top of the "underlying" filehandle for reexporting.
> 
> So, no this may very well not work for all exportable Linux
> filesystems, but it sort of depends on the situation (and to some
> degree, what gets sent by the clients). That's what makes this so hard
> to figure out programmatically.
> 
> As far as extending the protocol...that's not a bad idea, though that's
> obviously a longer-term solution. I don't think we can reasonably rely
> on that anyway. Maybe though...

I've been thinking about this kind of thing with Ganesha's proxy server, and
conveniently, you have also provided a good use case for proxy...

One option I was going to give Ganesha is the ability to indicate in the
export configuration that the upstream server is Ganesha, and to expect the
export configuration to be mirrored (easy for a config tool to do across the
set of servers, primary and proxy) so that Ganesha could just pass handles
through. Something similar might be possible for knfsd. With a bit more work,
we could be prepared to deal with other servers (like Ganesha providing for
knfsd or vice versa) by breaking the upstream handle apart into an "export"
component, which can be static, and a "filesystem specific" portion that
needs to be passed through. So Ganesha could break out knfsd's fsid encoding
and map that to an exportid, and just pass through the payload handle (the
portion that comes from the exportfs interface).

Frank


Jeff Layton Nov. 19, 2015, 2:01 p.m. UTC | #4
On Wed, 18 Nov 2015 14:30:41 -0800
"Frank Filz" <ffilzlnx@mindspring.com> wrote:

> Jeff Layton said:
> > On Wed, 18 Nov 2015 15:22:20 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > > +Filehandle size:
> > > > +----------------
> > > > +The maximum filehandle size is governed by the NFS version. Version
> > > > +2 used fixed 32 byte filehandles. Version 3 moved to variable
> > > > +length filehandles that can be up to 64 bytes in size. NFSv4
> > > > +increased that maximum to 128 bytes.
> > > > +
> > > > > +When reexporting an NFS filesystem, the underlying filehandle from
> > > > > +the server must be embedded inside the filehandles presented to
> > > > > +clients.
> > > > > +Thus if the underlying server presents filehandles that are too
> > > > > +big, the reexporting server can fail to encode them. This can lead
> > > > > +to NFSERR_OPNOTSUPP errors being returned to clients.
> > > > +
> > > > +This is not a trivial thing to programatically determine ahead of
> > > > +time (and it can vary even within the same server), so some
> > > > +foreknowledge of how the underlying server constructs filehandles,
> > > > +and their maximum size is a must.
> > >
> > > This is the trickiest one, since it depends on an undocumented
> > > implementation detail of the server.
> > >
> > 
> > Yes, indeed...
> > 
> > > Do we even know if this works for all the exportable Linux filesystems?
> > >
> > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > maximum-filesystem-size attribute to the protocol?
> > >
> > 
> > Erm, I think you mean maximum-filehandle-size, but I get your point...
> > 
> > It's tough to do more than a quick survey, but looking at new-style fh:
> > 
> > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > choice is weird, because it sort of depends on the filehandle sent by
> > the client (which is used as a template), so I guess we really do need
> > to assume worst-case.
> > 
> > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > bytes, which is already too large for NFSv3, before we ever get to
> > the part where we embed that inside another fh. We require another 12
> > bytes on top of the "underlying" filehandle for reexporting.
> > 
> > So, no this may very well not work for all exportable Linux
> > filesystems, but it sort of depends on the situation (and to some
> > degree, what gets sent by the clients). That's what makes this so hard
> > to figure out programmatically.
> > 
> > As far as extending the protocol...that's not a bad idea, though that's
> > obviously a longer-term solution. I don't think we can reasonably rely
> > on that anyway. Maybe though...
> 
> I've been thinking about this kind of thing with Ganesha's proxy server, and
> conveniently, you have also provided a good use case for proxy...
> 
> One option I was going to give Ganesha is the ability to in export
> configuration indicate the upstream server is Ganesha, and expect the export
> configuration to be mirrored (easy for a config tool to do across the set of
> servers, primary and proxy) so that Ganesha could just pass handles through.
> Something similar might be possible for knfsd. With a bit more work, we
> could be prepared to deal with other servers (like Ganesha providing for
> knfsd or visa versa) to break apart the upstream handle to an "export"
> component which can be static, and a "filesystem specific" portion that
> needs to be passed through. So Ganesha could break out knfsd's fsid encoding
> and map that to an exportid, and just pass through the payload handle (the
> portion that comes from the exportfs interface).
> 
> Frank
> 

It would be very tough to just pass the filehandles through here. After
all, we're hooking nfsd up to the nfs client code. You could (in
principle) have a mix of regular filesystems and reexported nfs mounts.
What if there are filehandle collisions between the one you passed
through and one of your local exported filesystems?

Breaking up the filehandle is also pretty much impossible to do in a
general way. The problem of course is that filehandles are really
opaque blobs to the client. You'd have to know ahead of time what part
of it refers to the fsid and what part is the fileid part. knfsd can
compose all sorts of filehandles, and it tries hard to mirror the type
of fh that the client is using.

Beyond that, what do you do when you get one of these "reconstituted"
filehandles back from the client where you've stripped off the fsid
info? At some point you have to reconstruct the original filehandle so
you can call back to the underlying server. Where do you store the fsid
info? Note that it has to be persistent across reboots too or you'll
see stale nfs filehandles on the clients.
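
As a rough illustration of why the whole upstream handle gets embedded
rather than split up: if the opaque bytes are carried verbatim behind a
fixed header, the original handle can be recovered from what the client
sends back without any persistent side table. The 12-byte overhead is the
figure mentioned earlier in the thread; the header contents and the
wrap_fh()/unwrap_fh() helpers below are purely hypothetical and do not
reflect the actual layout used by the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { WRAP_HDR_LEN = 12 };	/* overhead cited in the thread; layout is hypothetical */

/*
 * Hypothetical: build a reexported handle as a header followed by the
 * opaque upstream handle. Returns the total length in bytes, or -1 if
 * the result would not fit in out_max bytes.
 */
static int wrap_fh(const uint8_t hdr[WRAP_HDR_LEN],
		   const uint8_t *up, size_t up_len,
		   uint8_t *out, size_t out_max)
{
	assert(hdr != NULL && up != NULL && out != NULL);
	if (out_max < WRAP_HDR_LEN || up_len > out_max - WRAP_HDR_LEN)
		return -1;
	memcpy(out, hdr, WRAP_HDR_LEN);
	memcpy(out + WRAP_HDR_LEN, up, up_len);
	return (int)(WRAP_HDR_LEN + up_len);
}

/*
 * Hypothetical: recover the upstream handle from a reexported one.
 * Returns the upstream length in bytes, or -1 if the handle is too
 * short or the result would not fit in up_max bytes.
 */
static int unwrap_fh(const uint8_t *fh, size_t fh_len,
		     uint8_t *up, size_t up_max)
{
	assert(fh != NULL && up != NULL);
	if (fh_len < WRAP_HDR_LEN || fh_len - WRAP_HDR_LEN > up_max)
		return -1;
	memcpy(up, fh + WRAP_HDR_LEN, fh_len - WRAP_HDR_LEN);
	return (int)(fh_len - WRAP_HDR_LEN);
}
```

The round trip needs no state on the reexporting server, which is what
keeps handles valid across its reboots. The price is the size limit: a
68-byte upstream handle cannot be wrapped into a 64-byte NFSv3 handle, and
wrap_fh() reports that with -1.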
J. Bruce Fields Nov. 20, 2015, 12:04 a.m. UTC | #5
On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:
> On Wed, 18 Nov 2015 15:22:20 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > +Filehandle size:
> > > +----------------
> > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > +maximum to 128 bytes.
> > > +
> > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > +server must be embedded inside the filehandles presented to clients.
> > > +Thus if the underlying server presents filehandles that are too big, the
> > > +reexporting server can fail to encode them. This can lead to
> > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > +
> > > +This is not a trivial thing to programatically determine ahead of time
> > > +(and it can vary even within the same server), so some foreknowledge of
> > > +how the underlying server constructs filehandles, and their maximum
> > > +size is a must.
> > 
> > This is the trickiest one, since it depends on an undocumented
> > implementation detail of the server.
> > 
> 
> Yes, indeed...
> 
> > Do we even know if this works for all the exportable Linux filesystems?
> > 
> > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > maximum-filesystem-size attribute to the protocol?
> > 
> 
> Erm, I think you mean maximum-filehandle-size, but I get your point...

Whoops, thanks.

> It's tough to do more than a quick survey, but looking at new-style fh:
> 
> The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> can get that down to 8 bytes if you specify the fsid directly. The fsid
> choice is weird, because it sort of depends on the filehandle sent by
> the client (which is used as a template), so I guess we really do need
> to assume worst-case.

The client can only ever use filehandles it's been given, so if the
backend server's always been configured to use a certain kind (e.g. if
the exports have fsid= set), then we're OK, we're not responsible for
clients that guess random filehandles.

> Once that's done, the encode_fh routines add the fileid part. btrfs has
> a pretty large maximum one: 40 bytes. That brings the max size up to 68
> bytes, which is already too large for NFSv3, before we ever get to
> the part where we embed that inside another fh. We require another 12
> bytes on top of the "underlying" filehandle for reexporting.

So it's not necessarily that bad for nfsd, though of course it makes it
more complicated to configure the backend server.  Well, and knfsd has
v3 support so this is all a bit academic I guess.

So I'm having trouble weighing the benefits of this patch set against
the risks.

It's not even necessarily true that filehandles on a given filesystem
need be constant length.  In theory a server could decide to start
giving out bigger filehandles some day (as long as it continued to
respect the old ones), and the proxy would break.  In practice maybe
nobody does that.

--b.

> So, no this may very well not work for all exportable Linux
> filesystems, but it sort of depends on the situation (and to some
> degree, what gets sent by the clients). That's what makes this so hard
> to figure out programmatically.
> 
> As far as extending the protocol...that's not a bad idea, though that's
> obviously a longer-term solution. I don't think we can reasonably rely
> on that anyway. Maybe though...
> 
> -- 
> Jeff Layton <jlayton@poochiereds.net>
Jeff Layton Nov. 20, 2015, 12:28 a.m. UTC | #6
On Thu, 19 Nov 2015 19:04:15 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:
> > On Wed, 18 Nov 2015 15:22:20 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > > +Filehandle size:
> > > > +----------------
> > > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > > +maximum to 128 bytes.
> > > > +
> > > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > > +server must be embedded inside the filehandles presented to clients.
> > > > +Thus if the underlying server presents filehandles that are too big, the
> > > > +reexporting server can fail to encode them. This can lead to
> > > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > > +
> > > > +This is not a trivial thing to programatically determine ahead of time
> > > > +(and it can vary even within the same server), so some foreknowledge of
> > > > +how the underlying server constructs filehandles, and their maximum
> > > > +size is a must.
> > > 
> > > This is the trickiest one, since it depends on an undocumented
> > > implementation detail of the server.
> > > 
> > 
> > Yes, indeed...
> > 
> > > Do we even know if this works for all the exportable Linux filesystems?
> > > 
> > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > maximum-filesystem-size attribute to the protocol?
> > > 
> > 
> > Erm, I think you mean maximum-filehandle-size, but I get your point...
> 
> Whoops, thanks.
> 
> > It's tough to do more than a quick survey, but looking at new-style fh:
> > 
> > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > choice is weird, because it sort of depends on the filehandle sent by
> > the client (which is used as a template), so I guess we really do need
> > to assume worst-case.
> 
> The client can only ever use filehandles it's been given, so if the
> backend server's always been configured to use a certain kind (e.g. if
> the exports have fsid= set), then we're OK, we're not responsible for
> clients that guess random filehandles.
> 
> > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > bytes, which is already too large for NFSv3, before we ever get to
> > the part where we embed that inside another fh. We require another 12
> > bytes on top of the "underlying" filehandle for reexporting.
> 
> So it's not necessarily that bad for nfsd, though of course it makes it
> more complicated to configure the backend server.  Well, and knfsd has
> v3 support so this is all a bit academic I guess.
> 

You just have to make sure you vet the filehandle size on the stuff
you're reexporting. In our use-case, we know that the backend server's
filehandles are well under 42 bytes, so we're well under the max size.

One thing we could consider is promoting the dprintk in nfs_encode_fh
when this occurs to a pr_err or something. That would at least make
it very obvious when that occurs...

> So I'm having trouble weighing the benefits of this patch set against
> the risks.
> 
> It's not even necessarily true that filehandles on a given filesystem
> need be constant length.  In theory a server could decide to start
> giving out bigger filehandles some day (as long as it continued to
> respect the old ones), and the proxy would break.  In practice maybe
> nobody does that.
> 

Hard to say. There are a lot of oddball servers out there. There
certainly are risks involved in reexporting, particularly if you don't
heed the caveats. It's for good reason this Kconfig option defaults to
"n". ;)

OTOH, the kernel shouldn't crash or anything if that occurs. If your
filehandles are too large to be embedded, then you just end up getting
back FILEID_INVALID on the encode_fh. That sucks if it occurs, but
it shouldn't happen if you're careful about what gets reexported.
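
For readers unfamiliar with the convention, the failure mode can be
sketched in userspace roughly as follows. The exportfs encode_fh
convention measures the buffer in 32-bit words and, when the buffer is too
small, reports the needed size through *max_len and returns
FILEID_INVALID (0xff). Everything else here is an assumption made for
illustration: sketch_encode_fh(), SKETCH_FILEID and the zero-filled
three-word header are not the real nfs_encode_fh.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILEID_INVALID	0xff	/* same value as in the kernel's linux/exportfs.h */
#define SKETCH_FILEID	0x01	/* hypothetical fileid type for this sketch */

enum { WRAP_HDR_WORDS = 3 };	/* 12 bytes of reexport overhead, in 32-bit words */

/*
 * Userspace sketch of an encode_fh-style routine. On entry *max_len is
 * the space available in fh, in 32-bit words. If the wrapped handle does
 * not fit, *max_len is set to the space needed and FILEID_INVALID is
 * returned; otherwise *max_len is set to the space used.
 */
static int sketch_encode_fh(const uint32_t *up, int up_words,
			    uint32_t *fh, int *max_len)
{
	int needed = WRAP_HDR_WORDS + up_words;

	assert(up != NULL && fh != NULL && max_len != NULL && up_words >= 0);
	if (*max_len < needed) {
		*max_len = needed;
		return FILEID_INVALID;
	}
	/* Placeholder header; the real contents are not described here. */
	memset(fh, 0, WRAP_HDR_WORDS * sizeof(*fh));
	memcpy(fh + WRAP_HDR_WORDS, up, (size_t)up_words * sizeof(*up));
	*max_len = needed;
	return SKETCH_FILEID;
}
```

A 68-byte (17-word) upstream handle with 16 words of space, the NFSv3
limit, takes the FILEID_INVALID path and reports that 20 words would have
been needed. That path is where a louder message than a dprintk would
help.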


> 
> > So, no this may very well not work for all exportable Linux
> > filesystems, but it sort of depends on the situation (and to some
> > degree, what gets sent by the clients). That's what makes this so hard
> > to figure out programmatically.
> > 
> > As far as extending the protocol...that's not a bad idea, though that's
> > obviously a longer-term solution. I don't think we can reasonably rely
> > on that anyway. Maybe though...
> > 
> > -- 
> > Jeff Layton <jlayton@poochiereds.net>
J. Bruce Fields Jan. 14, 2016, 10:21 p.m. UTC | #7
On Thu, Nov 19, 2015 at 07:28:49PM -0500, Jeff Layton wrote:
> On Thu, 19 Nov 2015 19:04:15 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:
> > > On Wed, 18 Nov 2015 15:22:20 -0500
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > 
> > > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > > > +Filehandle size:
> > > > > +----------------
> > > > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > > > +maximum to 128 bytes.
> > > > > +
> > > > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > > > +server must be embedded inside the filehandles presented to clients.
> > > > > +Thus if the underlying server presents filehandles that are too big, the
> > > > > +reexporting server can fail to encode them. This can lead to
> > > > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > > > +
> > > > > +This is not a trivial thing to programatically determine ahead of time
> > > > > +(and it can vary even within the same server), so some foreknowledge of
> > > > > +how the underlying server constructs filehandles, and their maximum
> > > > > +size is a must.
> > > > 
> > > > This is the trickiest one, since it depends on an undocumented
> > > > implementation detail of the server.
> > > > 
> > > 
> > > Yes, indeed...
> > > 
> > > > Do we even know if this works for all the exportable Linux filesystems?
> > > > 
> > > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > > maximum-filesystem-size attribute to the protocol?
> > > > 
> > > 
> > > Erm, I think you mean maximum-filehandle-size, but I get your point...
> > 
> > Whoops, thanks.
> > 
> > > It's tough to do more than a quick survey, but looking at new-style fh:
> > > 
> > > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > > choice is weird, because it sort of depends on the filehandle sent by
> > > the client (which is used as a template), so I guess we really do need
> > > to assume worst-case.
> > 
> > The client can only ever use filehandles it's been given, so if the
> > backend server's always been configured to use a certain kind (e.g. if
> > the exports have fsid= set), then we're OK, we're not responsible for
> > clients that guess random filehandles.
> > 
> > > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > > bytes, which is already too large for NFSv3, before we ever get to
> > > the part where we embed that inside another fh. We require another 12
> > > bytes on top of the "underlying" filehandle for reexporting.
> > 
> > So it's not necessarily that bad for nfsd, though of course it makes it
> > more complicated to configure the backend server.  Well, and knfsd has
> > v3 support so this is all a bit academic I guess.
> > 
> 
> You just have to make sure you vet the filehandle size on the stuff
> you're reexporting. In our use-case, we know that the backend server's
> filehandles are well under 42 bytes, so we're well under the max size.
> 
> One thing we could consider is promoting the dprintk in nfs_encode_fh
> when this occurs to a pr_err or something. That would at least make
> it very obvious when that occurs...
> 
> > So I'm having trouble weighing the benefits of this patch set against
> > the risks.
> > 
> > It's not even necessarily true that filehandles on a given filesystem
> > need be constant length.  In theory a server could decide to start
> > giving out bigger filehandles some day (as long as it continued to
> > respect the old ones), and the proxy would break.  In practice maybe
> > nobody does that.
> > 
> 
> Hard to say. There are a lot of oddball servers out there. There
> certainly are risks involved in reexporting, particularly if you don't
> heed the caveats. It's for good reason this Kconfig option defaults to
> "n". ;)
> 
> OTOH, the kernel shouldn't crash or anything if that occurs. If your
> filehandles are too large to be embedded, then you just end up getting
> back FILEID_INVALID on the encode_fh. That sucks if it occurs, but
> it shouldn't happen if you're careful about what gets reexported.

OK, sorry for the long silence on this.

Basically I'm having trouble making the case to myself here:

	- On the one hand, having you guys carry all this stuff is
	  annoying, I'd rather our code bases were closer.
	- On the other hand, I can't see taking something that's in
	  practice basically only useful for one proprietary server,
	  which is the way it looks to me right now.
	- Also, "NFS proxying" *sounds* much more general than it really
	  is, and I fear a lot of people are going to fall into that
	  trap no matter how we warn them.

Gah.

Anyway, for now I should take the one tracepoint patch at least (and
shouldn't some of the fs patches go in regardless?) but I'm punting on
the rest.

--b.
Jeff Layton Jan. 15, 2016, 4 p.m. UTC | #8
On Thu, 14 Jan 2016 17:21:27 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Nov 19, 2015 at 07:28:49PM -0500, Jeff Layton wrote:
> > On Thu, 19 Nov 2015 19:04:15 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> >   
> > > On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:  
> > > > On Wed, 18 Nov 2015 15:22:20 -0500
> > > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > >   
> > > > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:  
> > > > > > +Filehandle size:
> > > > > > +----------------
> > > > > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > > > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > > > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > > > > +maximum to 128 bytes.
> > > > > > +
> > > > > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > > > > +server must be embedded inside the filehandles presented to clients.
> > > > > > +Thus if the underlying server presents filehandles that are too big, the
> > > > > > +reexporting server can fail to encode them. This can lead to
> > > > > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > > > > +
> > > > > > +This is not a trivial thing to programatically determine ahead of time
> > > > > > +(and it can vary even within the same server), so some foreknowledge of
> > > > > > +how the underlying server constructs filehandles, and their maximum
> > > > > > +size is a must.  
> > > > > 
> > > > > This is the trickiest one, since it depends on an undocumented
> > > > > implementation detail of the server.
> > > > >   
> > > > 
> > > > Yes, indeed...
> > > >   
> > > > > Do we even know if this works for all the exportable Linux filesystems?
> > > > > 
> > > > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > > > maximum-filesystem-size attribute to the protocol?
> > > > >   
> > > > 
> > > > Erm, I think you mean maximum-filehandle-size, but I get your point...  
> > > 
> > > Whoops, thanks.
> > >   
> > > > It's tough to do more than a quick survey, but looking at new-style fh:
> > > > 
> > > > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > > > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > > > choice is weird, because it sort of depends on the filehandle sent by
> > > > the client (which is used as a template), so I guess we really do need
> > > > to assume worst-case.  
> > > 
> > > The client can only ever use filehandles it's been given, so if the
> > > backend server's always been configured to use a certain kind (e.g. if
> > > the exports have fsid= set), then we're OK, we're not responsible for
> > > clients that guess random filehandles.
> > >   
> > > > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > > > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > > > bytes, which is already too large for NFSv3, before we ever get to
> > > > the part where we embed that inside another fh. We require another 12
> > > > bytes on top of the "underlying" filehandle for reexporting.  
> > > 
> > > So it's not necessarily that bad for nfsd, though of course it makes it
> > > more complicated to configure the backend server.  Well, and knfsd has
> > > v3 support so this is all a bit academic I guess.
> > >   
> > 
> > You just have to make sure you vet the filehandle size on the stuff
> > you're reexporting. In our use-case, we know that the backend server's
> > filehandles are well under 42 bytes, so we're well under the max size.
> > 
> > One thing we could consider is promoting the dprintk in nfs_encode_fh
> > when this occurs to a pr_err or something. That would at least make
> > it very obvious when that occurs...
> >   
> > > So I'm having trouble weighing the benefits of this patch set against
> > > the risks.
> > > 
> > > It's not even necessarily true that filehandles on a given filesystem
> > > need be constant length.  In theory a server could decide to start
> > > giving out bigger filehandles some day (as long as it continued to
> > > respect the old ones), and the proxy would break.  In practice maybe
> > > nobody does that.
> > >   
> > 
> > Hard to say. There are a lot of oddball servers out there. There
> > certainly are risks involved in reexporting, particularly if you don't
> > heed the caveats. It's for good reason this Kconfig option defaults to
> > "n". ;)
> > 
> > OTOH, the kernel shouldn't crash or anything if that occurs. If your
> > filehandles are too large to be embedded, then you just end up getting
> > back FILEID_INVALID on the encode_fh. That sucks if it occurs, but
> > it shouldn't happen if you're careful about what gets reexported.  
> 
> OK, sorry for the long silence on this.
> 
> Basically I'm having trouble making the case to myself here:
> 
> 	- On the one hand, having you guys carry all this stuff is
> 	  annoying, I'd rather our code bases were closer.
> 	- On the other hand, I can't see taking something that's in
> 	  practice basically only useful for one proprietary server,
> 	  which is the way it looks to me right now.
> 	- Also, "NFS proxying" *sounds* much more general than it really
> 	  is, and I fear a lot of people are going to fall into that
> 	  trap no matter how we warn them.
> 
> Gah.
> 
> Anyway, for now I should take the one tracepoint patch at least (and
> shouldn't some of the fs patches go in regardless?) but I'm punting on
> the rest.
> 
> --b.

Understood.

I've not had the cycles to spend on this lately anyway, as I've been
putting out fires elsewhere. Perhaps once I am able to do that and
spend some time on the performance of this, we may find that the open
file cache is more generally useful, and we can revisit it then. We'll
see...

FWIW, there is also one significant bugfix to that series that I've not
had the time to post. The error handling when fsnotify_add_mark
returns an error is not right, and it can end up with a double free of
the mark.

As far as what should go in soon...yeah, this tracepoint patch might be
nice:

    nfsd: add new io class tracepoint

For the vfs, these two might be good, but I'd like Al to offer an
opinion on the first one. I'm pretty sure we don't call
flush_delayed_fput until after the workqueue threads have been started,
but the only caller now is in the boot code, AFAICT, and I'm not 100%
sure on that point:

    fs: have flush_delayed_fput flush the workqueue job
    fs: add a kerneldoc header to fput

This patch has already been picked up by Andrew, AFAICT:

    fsnotify: destroy marks with call_srcu instead of dedicated thread

...and the rest are pretty much specific to the reexporting
functionality.

Patch

diff --git a/Documentation/filesystems/nfs/reexport.txt b/Documentation/filesystems/nfs/reexport.txt
new file mode 100644
index 000000000000..4ecfd3832338
--- /dev/null
+++ b/Documentation/filesystems/nfs/reexport.txt
@@ -0,0 +1,95 @@ 
+Re-exporting nfs via nfsd:
+--------------------------
+It is possible to reexport an NFS filesystem via nfsd, but there are
+some limitations to this scheme.
+
+The primary use case for this is allowing clients that do not support
+newer versions of NFS to access servers that do not export older
+versions of NFS. In particular, it's a way to distribute pnfs support to
+non-pnfs enabled clients (albeit at the cost of an extra hop).
+
+There are a number of caveats to doing this -- be sure to read the
+entire document below and make sure that you know what you're doing!
+
+Quick Start:
+------------
+1) Ensure that the kernel is built with CONFIG_NFS_REEXPORT enabled.
+
+2) Mount the _entire_ directory tree that you wish to reexport on the
+server. nfsd is unable to cross server filesystem boundaries
+automatically, so the entire tree to be reexported must be mounted
+prior to exporting.
+
+3) Add exports for the reexported filesystem to /etc/exports, assigning
+fsid= values to each. NFS doesn't have a persistent UUID or device
+number that is guaranteed to be unique across multiple servers, so
+fsid= values must always be explicitly assigned.
+
+4) Avoid stateful operations from the clients. File locking is
+particularly problematic, but reexporting NFSv4 via NFSv4 is likely to
+have similar problems with open and delegation stateids as well.
+
+The gory details of reexporting:
+--------------------------------
+Below is a detailed list of the _known_ problems with reexporting NFS
+via nfsd. Be aware of these facts when using this feature:
+
+Filehandle size:
+----------------
+The maximum filehandle size is governed by the NFS version. Version 2
+used fixed 32 byte filehandles. Version 3 moved to variable length
+filehandles that can be up to 64 bytes in size. NFSv4 increased that
+maximum to 128 bytes.
+
+When reexporting an NFS filesystem, the underlying filehandle from the
+server must be embedded inside the filehandles presented to clients.
+Thus if the underlying server presents filehandles that are too big, the
+reexporting server can fail to encode them. This can lead to
+NFSERR_OPNOTSUPP errors being returned to clients.
+
+This is not trivial to determine programmatically ahead of time (and it
+can vary even within the same server), so some foreknowledge of how the
+underlying server constructs filehandles, and of their maximum size, is
+a must.
+
+No subtree checking:
+--------------------
+Subtree checking requires that information about the parent be encoded
+in non-directory filehandles. Since filehandle space is already at a
+premium, subtree checking is disallowed on reexported nfs filesystems.
+
+No crossing of mountpoints:
+---------------------------
+Crossing from one exported filesystem to another typically involves the
+nfs client doing a behind-the-scenes mount of the "child" filesystem. nfsd
+lacks the machinery to do this. It could (in principle) be added, but
+there's really no point as there is no way to ensure that the fsid
+(filesystem identifier) value that got assigned was persistent.
+
+Lack of a persistent fsid= value:
+---------------------------------
+NFS filesystems don't have a persistent value that we can stuff into
+the fsid. We could repackage the one that the server provides, but that
+could lead to collisions if the reexporting server has mounts to
+different underlying servers. Thus, reexporting NFS requires assigning
+a fsid= value in the export options. This value must be persistent
+across reboots of the reexporting server as well or the clients will
+see filehandles change (the dreaded "Stale NFS filehandle" error).
+
+Statefulness and locking:
+-------------------------
+Holding any sort of state across a reexported NFS mount is problematic.
+It's always possible that the reexporting server could reboot, in which
+case it will lose track of the state held on the underlying server.
+
+When it comes back up, the clients will then try to reclaim that state
+from the reexporter, but the reexporter can't provide the necessary
+guarantees to ensure that conflicting state wasn't set and released
+during the time it was down. This may mean silent data corruption.
+Any sort of stateful operations against the reexporting fileserver are
+best avoided.
+
+Because of this, it's best to use a configuration that does not involve
+the clients holding any state on the reexporter. For example, reexporting
+an NFSv4 filesystem to legacy clients via NFSv3 (sans file locking) should
+basically work.
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index f31fd0dd92c6..92ad6bcc81cc 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -200,3 +200,14 @@  config NFS_DEBUG
 	depends on NFS_FS && SUNRPC_DEBUG
 	select CRC32
 	default y
+
+config NFS_REEXPORT
+	bool "Allow reexporting of NFS filesystems via knfsd"
+	depends on NFSD
+	default n
+	help
+	  This option allows NFS filesystems to be re-exported via knfsd.
+	  This is generally only useful in some very limited situations.
+	  One such is to allow legacy client access to servers that do not
+	  support older NFS versions. Use with caution and be sure to read
+	  Documentation/filesystems/nfs/reexport.txt first!
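
As a rough illustration of the filehandle-size caveat in reexport.txt, here
is a minimal C sketch of the size budget. It is not part of the patch. The
protocol maximums (32, 64 and 128 bytes) come from the document; the 12-byte
reexport overhead and the 8- and 28-byte fsid lengths are the figures quoted
in the thread, and treating them as simply additive is an assumption. All
names below are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Protocol maximum filehandle sizes, as listed in reexport.txt.
 * Hypothetical names, not the kernel's own constants. */
enum {
	SKETCH_NFS2_FH_MAX = 32,	/* NFSv2: fixed size */
	SKETCH_NFS3_FH_MAX = 64,	/* NFSv3: variable, up to 64 bytes */
	SKETCH_NFS4_FH_MAX = 128,	/* NFSv4: variable, up to 128 bytes */
};

/* Figures quoted in the thread; assumed here to add up linearly. */
#define SKETCH_REEXPORT_OVERHEAD	12	/* extra bytes needed for reexporting */
#define SKETCH_FSID_LEN_EXPLICIT	8	/* fsid= given in the export options */
#define SKETCH_FSID_LEN_WORST		28	/* FSID_UUID16_INUM worst case */

/* Size of the filehandle the reexporter would have to hand to its clients. */
static inline size_t sketch_reexport_fh_len(size_t fsid_len, size_t backend_fh_len)
{
	return fsid_len + SKETCH_REEXPORT_OVERHEAD + backend_fh_len;
}

/* True if a backend filehandle of the given length can be embedded. */
static inline bool sketch_reexport_fh_fits(size_t fsid_len, size_t backend_fh_len,
					   size_t proto_max)
{
	/* No NFS version in the document allows more than 128 bytes. */
	assert(proto_max <= (size_t)SKETCH_NFS4_FH_MAX);
	return sketch_reexport_fh_len(fsid_len, backend_fh_len) <= proto_max;
}

/* Largest backend filehandle that still fits, or 0 if none does. */
static inline size_t sketch_reexport_max_backend_fh(size_t fsid_len, size_t proto_max)
{
	size_t fixed = fsid_len + SKETCH_REEXPORT_OVERHEAD;

	return proto_max > fixed ? proto_max - fixed : 0;
}
```

Under these assumptions, an export with an explicit fsid= leaves
64 - 8 - 12 = 44 bytes for the backend filehandle when serving NFSv3 clients,
so the 68-byte worst case computed for btrfs in the thread would not fit
there, but would fit within the 128-byte NFSv4 limit.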