nfs: set block size according to pnfs_blksize first

Message ID	1623847469-150122-1-git-send-email-hsiangkao@linux.alibaba.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nfs-owner@kernel.org> From: Gao Xiang <hsiangkao@linux.alibaba.com> To: linux-nfs@vger.kernel.org Cc: LKML <linux-kernel@vger.kernel.org>, Gao Xiang <hsiangkao@linux.alibaba.com>, Trond Myklebust <trond.myklebust@hammerspace.com>, Anna Schumaker <anna.schumaker@netapp.com>, Joseph Qi <joseph.qi@linux.alibaba.com> Subject: [PATCH] nfs: set block size according to pnfs_blksize first Date: Wed, 16 Jun 2021 20:44:29 +0800 Message-Id: <1623847469-150122-1-git-send-email-hsiangkao@linux.alibaba.com> Precedence: bulk
Series	nfs: set block size according to pnfs_blksize first \| expand nfs: set block size according to pnfs_blksize first

Gao Xiang June 16, 2021, 12:44 p.m. UTC

When testing fstests with ext4 over nfs 4.2, I found generic/486
failed. The root cause is that the length of its xattr value is
  min(st_blksize * 3 / 4, XATTR_SIZE_MAX)

which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than
XATTR_SIZE_MAX = 65536 for nfs since the block size would be wsize
(=131072) if bsize is not specified.

Let's use pnfs_blksize first instead of using wsize directly if
bsize isn't specified. And the testcase itself can pass now.

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
Considering bsize is not specified, we might use pnfs_blksize
directly first rather than wsize.

 fs/nfs/super.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Trond Myklebust June 16, 2021, 1:47 p.m. UTC | #1

On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote:
> When testing fstests with ext4 over nfs 4.2, I found generic/486
> failed. The root cause is that the length of its xattr value is
>   min(st_blksize * 3 / 4, XATTR_SIZE_MAX)
> 
> which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than
> XATTR_SIZE_MAX = 65536 for nfs since the block size would be wsize
> (=131072) if bsize is not specified.
> 
> Let's use pnfs_blksize first instead of using wsize directly if
> bsize isn't specified. And the testcase itself can pass now.
> 
> Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> Cc: Anna Schumaker <anna.schumaker@netapp.com>
> Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
> Considering bsize is not specified, we might use pnfs_blksize
> directly first rather than wsize.
> 
>  fs/nfs/super.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index fe58525cfed4..5015edf0cd9a 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct super_block
> *sb, struct nfs_fs_context *ctx)
>         snprintf(sb->s_id, sizeof(sb->s_id),
>                  "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev));
>  
> -       if (sb->s_blocksize == 0)
> -               sb->s_blocksize = nfs_block_bits(server->wsize,
> +       if (sb->s_blocksize == 0) {
> +               unsigned int blksize = server->pnfs_blksize ?
> +                       server->pnfs_blksize : server->wsize;

NACK. The pnfs block size is a layout driver-specific quantity, and
should not be used to substitute for the server-advertised block size.
It only applies to I/O _if_ the client is holding a layout for a
specific file and is using pNFS to do I/O to that file.

It has nothing to do with xattrs at all.

> +
> +               sb->s_blocksize = nfs_block_bits(blksize,
>                                                  &sb-
> >s_blocksize_bits);
> +       }
>  
>         nfs_super_set_maxbytes(sb, server->maxfilesize);
>         server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;

Gao Xiang June 16, 2021, 2:06 p.m. UTC | #2

On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote:
> On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote:
> > When testing fstests with ext4 over nfs 4.2, I found generic/486
> > failed. The root cause is that the length of its xattr value is
> >   min(st_blksize * 3 / 4, XATTR_SIZE_MAX)
> > 
> > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than
> > XATTR_SIZE_MAX = 65536 for nfs since the block size would be wsize
> > (=131072) if bsize is not specified.
> > 
> > Let's use pnfs_blksize first instead of using wsize directly if
> > bsize isn't specified. And the testcase itself can pass now.
> > 
> > Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Cc: Anna Schumaker <anna.schumaker@netapp.com>
> > Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> > Considering bsize is not specified, we might use pnfs_blksize
> > directly first rather than wsize.
> > 
> >  fs/nfs/super.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> > index fe58525cfed4..5015edf0cd9a 100644
> > --- a/fs/nfs/super.c
> > +++ b/fs/nfs/super.c
> > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct super_block
> > *sb, struct nfs_fs_context *ctx)
> >         snprintf(sb->s_id, sizeof(sb->s_id),
> >                  "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev));
> >  
> > -       if (sb->s_blocksize == 0)
> > -               sb->s_blocksize = nfs_block_bits(server->wsize,
> > +       if (sb->s_blocksize == 0) {
> > +               unsigned int blksize = server->pnfs_blksize ?
> > +                       server->pnfs_blksize : server->wsize;
> 
> NACK. The pnfs block size is a layout driver-specific quantity, and
> should not be used to substitute for the server-advertised block size.
> It only applies to I/O _if_ the client is holding a layout for a
> specific file and is using pNFS to do I/O to that file.

Honestly, I'm not sure if it's ok as well.

> 
> It has nothing to do with xattrs at all.

Yet my question is how to deal with generic/486, should we just skip
the case directly? I cannot find some proper way to get underlayfs
block size or real xattr value limit.

For now, generic/486 will return ENOSPC at
fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE);
when testing new nfs4.2 xattr support.

Thanks,
Gao Xiang

> 
> > +
> > +               sb->s_blocksize = nfs_block_bits(blksize,
> >                                                  &sb-
> > >s_blocksize_bits);
> > +       }
> >  
> >         nfs_super_set_maxbytes(sb, server->maxfilesize);
> >         server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>

Trond Myklebust June 16, 2021, 2:20 p.m. UTC | #3

On Wed, 2021-06-16 at 22:06 +0800, Gao Xiang wrote:
> On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote:
> > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote:
> > > When testing fstests with ext4 over nfs 4.2, I found generic/486
> > > failed. The root cause is that the length of its xattr value is
> > >   min(st_blksize * 3 / 4, XATTR_SIZE_MAX)
> > > 
> > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than
> > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be
> > > wsize
> > > (=131072) if bsize is not specified.
> > > 
> > > Let's use pnfs_blksize first instead of using wsize directly if
> > > bsize isn't specified. And the testcase itself can pass now.
> > > 
> > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Cc: Anna Schumaker <anna.schumaker@netapp.com>
> > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
> > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > ---
> > > Considering bsize is not specified, we might use pnfs_blksize
> > > directly first rather than wsize.
> > > 
> > >  fs/nfs/super.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> > > index fe58525cfed4..5015edf0cd9a 100644
> > > --- a/fs/nfs/super.c
> > > +++ b/fs/nfs/super.c
> > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct
> > > super_block
> > > *sb, struct nfs_fs_context *ctx)
> > >         snprintf(sb->s_id, sizeof(sb->s_id),
> > >                  "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev));
> > >  
> > > -       if (sb->s_blocksize == 0)
> > > -               sb->s_blocksize = nfs_block_bits(server->wsize,
> > > +       if (sb->s_blocksize == 0) {
> > > +               unsigned int blksize = server->pnfs_blksize ?
> > > +                       server->pnfs_blksize : server->wsize;
> > 
> > NACK. The pnfs block size is a layout driver-specific quantity, and
> > should not be used to substitute for the server-advertised block
> > size.
> > It only applies to I/O _if_ the client is holding a layout for a
> > specific file and is using pNFS to do I/O to that file.
> 
> Honestly, I'm not sure if it's ok as well.
> 
> > 
> > It has nothing to do with xattrs at all.
> 
> Yet my question is how to deal with generic/486, should we just skip
> the case directly? I cannot find some proper way to get underlayfs
> block size or real xattr value limit.
> 

RFC8276 provides no method for determining the xattr size limits. It
just notes that such limits may exist, and provides the error code
NFS4ERR_XATTR2BIG, that the server may use as a return value when those
limits are exceeded.

> For now, generic/486 will return ENOSPC at
> fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE);
> when testing new nfs4.2 xattr support.
> 

As noted above, the NFS server should really be returning
NFS4ERR_XATTR2BIG in this case, which the client, again, should be
transforming into -E2BIG. Where does ENOSPC come from?

> Thanks,
> Gao Xiang
> 
> > 
> > > +
> > > +               sb->s_blocksize = nfs_block_bits(blksize,
> > >                                                  &sb-
> > > > s_blocksize_bits);
> > > +       }
> > >  
> > >         nfs_super_set_maxbytes(sb, server->maxfilesize);
> > >         server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;

Gao Xiang June 16, 2021, 2:41 p.m. UTC | #4

Hi Trond,

On Wed, Jun 16, 2021 at 02:20:49PM +0000, Trond Myklebust wrote:
> On Wed, 2021-06-16 at 22:06 +0800, Gao Xiang wrote:
> > On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote:
> > > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote:
> > > > When testing fstests with ext4 over nfs 4.2, I found generic/486
> > > > failed. The root cause is that the length of its xattr value is
> > > >   min(st_blksize * 3 / 4, XATTR_SIZE_MAX)
> > > > 
> > > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than
> > > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be
> > > > wsize
> > > > (=131072) if bsize is not specified.
> > > > 
> > > > Let's use pnfs_blksize first instead of using wsize directly if
> > > > bsize isn't specified. And the testcase itself can pass now.
> > > > 
> > > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > Cc: Anna Schumaker <anna.schumaker@netapp.com>
> > > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
> > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > ---
> > > > Considering bsize is not specified, we might use pnfs_blksize
> > > > directly first rather than wsize.
> > > > 
> > > >  fs/nfs/super.c | 8 ++++++--
> > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> > > > index fe58525cfed4..5015edf0cd9a 100644
> > > > --- a/fs/nfs/super.c
> > > > +++ b/fs/nfs/super.c
> > > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct
> > > > super_block
> > > > *sb, struct nfs_fs_context *ctx)
> > > >         snprintf(sb->s_id, sizeof(sb->s_id),
> > > >                  "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev));
> > > >  
> > > > -       if (sb->s_blocksize == 0)
> > > > -               sb->s_blocksize = nfs_block_bits(server->wsize,
> > > > +       if (sb->s_blocksize == 0) {
> > > > +               unsigned int blksize = server->pnfs_blksize ?
> > > > +                       server->pnfs_blksize : server->wsize;
> > > 
> > > NACK. The pnfs block size is a layout driver-specific quantity, and
> > > should not be used to substitute for the server-advertised block
> > > size.
> > > It only applies to I/O _if_ the client is holding a layout for a
> > > specific file and is using pNFS to do I/O to that file.
> > 
> > Honestly, I'm not sure if it's ok as well.
> > 
> > > 
> > > It has nothing to do with xattrs at all.
> > 
> > Yet my question is how to deal with generic/486, should we just skip
> > the case directly? I cannot find some proper way to get underlayfs
> > block size or real xattr value limit.
> > 
> 
> RFC8276 provides no method for determining the xattr size limits. It
> just notes that such limits may exist, and provides the error code
> NFS4ERR_XATTR2BIG, that the server may use as a return value when those
> limits are exceeded.
> 
> > For now, generic/486 will return ENOSPC at
> > fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE);
> > when testing new nfs4.2 xattr support.
> > 
> 
> As noted above, the NFS server should really be returning
> NFS4ERR_XATTR2BIG in this case, which the client, again, should be
> transforming into -E2BIG. Where does ENOSPC come from?

Thanks for the detailed explanation...

I think that is due to ext4 returning ENOSPC since I tested

fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE);
with ext4 as well and it returned ENOSPC, and I think it's reasonable
since setxattr() will return ENOSPC for such cases.
https://man7.org/linux/man-pages/man2/setxattr.2.html

should we transform it to E2BIG instead (at least in NFS
protocol)? but I'm still not sure that E2BIG is a valid return code for
setxattr()...

If necessary, I will look into it more tomorrow....

Thanks,
Gao Xiang

> 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > > +
> > > > +               sb->s_blocksize = nfs_block_bits(blksize,
> > > >                                                  &sb-
> > > > > s_blocksize_bits);
> > > > +       }
> > > >  
> > > >         nfs_super_set_maxbytes(sb, server->maxfilesize);
> > > >         server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>

Trond Myklebust June 16, 2021, 3:14 p.m. UTC | #5

On Wed, 2021-06-16 at 22:41 +0800, Gao Xiang wrote:
> Hi Trond,
> 
> On Wed, Jun 16, 2021 at 02:20:49PM +0000, Trond Myklebust wrote:
> > On Wed, 2021-06-16 at 22:06 +0800, Gao Xiang wrote:
> > > On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote:
> > > > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote:
> > > > > When testing fstests with ext4 over nfs 4.2, I found
> > > > > generic/486
> > > > > failed. The root cause is that the length of its xattr value is
> > > > >   min(st_blksize * 3 / 4, XATTR_SIZE_MAX)
> > > > > 
> > > > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than
> > > > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be
> > > > > wsize
> > > > > (=131072) if bsize is not specified.
> > > > > 
> > > > > Let's use pnfs_blksize first instead of using wsize directly if
> > > > > bsize isn't specified. And the testcase itself can pass now.
> > > > > 
> > > > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > Cc: Anna Schumaker <anna.schumaker@netapp.com>
> > > > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
> > > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > > ---
> > > > > Considering bsize is not specified, we might use pnfs_blksize
> > > > > directly first rather than wsize.
> > > > > 
> > > > >  fs/nfs/super.c | 8 ++++++--
> > > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> > > > > index fe58525cfed4..5015edf0cd9a 100644
> > > > > --- a/fs/nfs/super.c
> > > > > +++ b/fs/nfs/super.c
> > > > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct
> > > > > super_block
> > > > > *sb, struct nfs_fs_context *ctx)
> > > > >         snprintf(sb->s_id, sizeof(sb->s_id),
> > > > >                  "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev));
> > > > >  
> > > > > -       if (sb->s_blocksize == 0)
> > > > > -               sb->s_blocksize = nfs_block_bits(server->wsize,
> > > > > +       if (sb->s_blocksize == 0) {
> > > > > +               unsigned int blksize = server->pnfs_blksize ?
> > > > > +                       server->pnfs_blksize : server->wsize;
> > > > 
> > > > NACK. The pnfs block size is a layout driver-specific quantity,
> > > > and
> > > > should not be used to substitute for the server-advertised block
> > > > size.
> > > > It only applies to I/O _if_ the client is holding a layout for a
> > > > specific file and is using pNFS to do I/O to that file.
> > > 
> > > Honestly, I'm not sure if it's ok as well.
> > > 
> > > > 
> > > > It has nothing to do with xattrs at all.
> > > 
> > > Yet my question is how to deal with generic/486, should we just
> > > skip
> > > the case directly? I cannot find some proper way to get underlayfs
> > > block size or real xattr value limit.
> > > 
> > 
> > RFC8276 provides no method for determining the xattr size limits. It
> > just notes that such limits may exist, and provides the error code
> > NFS4ERR_XATTR2BIG, that the server may use as a return value when
> > those
> > limits are exceeded.
> > 
> > > For now, generic/486 will return ENOSPC at
> > > fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE);
> > > when testing new nfs4.2 xattr support.
> > > 
> > 
> > As noted above, the NFS server should really be returning
> > NFS4ERR_XATTR2BIG in this case, which the client, again, should be
> > transforming into -E2BIG. Where does ENOSPC come from?
> 
> Thanks for the detailed explanation...
> 
> I think that is due to ext4 returning ENOSPC since I tested
> 
> fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE);
> with ext4 as well and it returned ENOSPC, and I think it's reasonable
> since setxattr() will return ENOSPC for such cases.
> https://man7.org/linux/man-pages/man2/setxattr.2.html
> 
> should we transform it to E2BIG instead (at least in NFS
> protocol)? but I'm still not sure that E2BIG is a valid return code for
> setxattr()...

The setxattr() manpage appears to suggest ERANGE is the correct return
value here.

       ERANGE The size of name or value exceeds a filesystem-specific
limit.


However I can't tell if ext4 and xfs ever do that. Furthermore, it
looks as if the VFS is always returning E2BIG if size > XATTR_SIZE_MAX.

> 
> If necessary, I will look into it more tomorrow....
> 
> Thanks,
> Gao Xiang
> 
> > 
> > > Thanks,
> > > Gao Xiang
> > > 
> > > > 
> > > > > +
> > > > > +               sb->s_blocksize = nfs_block_bits(blksize,
> > > > >                                                  &sb-
> > > > > > s_blocksize_bits);
> > > > > +       }
> > > > >  
> > > > >         nfs_super_set_maxbytes(sb, server->maxfilesize);
> > > > >         server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;
> > 
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> >

Theodore Ts'o June 16, 2021, 4:14 p.m. UTC | #6

On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote:
> > > Yet my question is how to deal with generic/486, should we just skip
> > > the case directly? I cannot find some proper way to get underlayfs
> > > block size or real xattr value limit.

Note that the block size does not necessarily have thing to do with
the xattr value limit.  For example, in ext4, if you create the file
system with the ea_inode feature enabled, you can create extended
attributes up to the maximum value of 64k defined by the xattr
interface --- unless, of course, there isn't enough space in the file
system.

(The ea_inode feature will also use shared space for large xattrs, so
if you are storing hundreds of thousands of files that all have an
identical 20 kilbyte Windows security id or ACL, ea_inode is going to
be much more efficient way of supporting that particular use case.)

> > As noted above, the NFS server should really be returning
> > NFS4ERR_XATTR2BIG in this case, which the client, again, should be
> > transforming into -E2BIG. Where does ENOSPC come from?
> 
> Thanks for the detailed explanation...
> 
> I think that is due to ext4 returning ENOSPC since I tested

It's not just ext4.  From inspection, under some circumstances f2fs
and btrfs can return ENOSPC.

The distinction is that E2BIG is (from the attr_set man page):

       [E2BIG]          The value of the given attribute is too large,
       			it exceeds the maximum allowable size of an
			attribute value.

The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE,
which is 65536 bytes.  Some file systems may impose a smaller max
allowable size.

In contrast ENOSPC means something different:

       ENOSPC		No space left on device.

This isn't necessarily just block space, BTW; it might some other file
system space --- thre might not be a free inode in the case of ext4's
ea_inode feature.  Or it be the f2fs file system not being able to
allocate a node id via f2fs_alloc_nid().

Note that generic/486 is testing a very specific case, which is
replacing a small xattr (16 bytes) with an xattr with a large value.
This is would be a really interesting test for ext4 ea_inode, when we
are replacing an xattr stored inline in the inode, or in a single 4k
block, with an xattr stored in a separate inode.  But not the way
src/attr_replace_test.c (which does all of the heavy lifting for the
generic/486 test) is currently written.

So what I would suggest is to have attr_replace_test.c try to
determine the maximum attr value size using binary search.  Start with
min=16, and max=65536, and try creating an xattr of a particular size,
and then delete the xattr, and then retry creating the xattr with the
next binary search size.  This will allow you to create a function
which determines the maximum size attr for a particular file system,
especially in those cases where it is dependent on how the file system
is configured.

> should we transform it to E2BIG instead (at least in NFS
> protocol)? but I'm still not sure that E2BIG is a valid return code for
> setxattr()...

E2BIG is defined in the attr_set(3) man page.  ENOSPC isn't mentioned
in the attr_set man page, but given that multiple file systems return
ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very
much makes sense when creating extended attributes, we should fix that
by adding ENOSPC to the attr_set(3) man page (which is shipped as part
of the libattr library sources).

Cheers,

						- Ted

Theodore Ts'o June 16, 2021, 5:08 p.m. UTC | #7

On Wed, Jun 16, 2021 at 03:14:17PM +0000, Trond Myklebust wrote:
> > should we transform it to E2BIG instead (at least in NFS
> > protocol)? but I'm still not sure that E2BIG is a valid return code for
> > setxattr()...
> 
> The setxattr() manpage appears to suggest ERANGE is the correct return
> value here.
> 
>        ERANGE The size of name or value exceeds a filesystem-specific
> limit.
> 
> However I can't tell if ext4 and xfs ever do that.

Ext4 will return ERANGE if the size of the xattr name is greater than
255 bytes.  However, if there is no room to create the xattr --- for
example there is no space to allocate a file system block, ext4 will
return ENOSPC.

Without the ea_inode feature, ext2 and ext4 use a single 4k block to
store all extended attributes.  So whether an xattr can be created is
dependent on whether there is room in that 4k block.  If there are too
many xattrs, or the sum of the length of all the xattr names and
values plus a certain amount of overhead exceeds 4k, it will return
ENOSPC.

> Furthermore, it looks as if the VFS is always returning E2BIG if
> size > XATTR_SIZE_MAX.

This is defined by the attr_set(3) man page, but not in the
setxattr(2) man page.  The set of errors are never intended to be
exhaustive --- there are some tcp-related errors that probably can
exposed to the caller when it is operating on a network-based file
system, for example.  But it would probably be a good idea to update
the man pages so it's a bit clearer when E2BIG and ERANGE and ENOSPC
mean.

Cheers,

					- Ted

Frank van der Linden June 16, 2021, 5:17 p.m. UTC | #8

On Wed, Jun 16, 2021 at 03:14:17PM +0000, Trond Myklebust wrote:
> The setxattr() manpage appears to suggest ERANGE is the correct return
> value here.
> 
>        ERANGE The size of name or value exceeds a filesystem-specific
> limit.
> 
> 
> However I can't tell if ext4 and xfs ever do that. Furthermore, it
> looks as if the VFS is always returning E2BIG if size > XATTR_SIZE_MAX.
> 

The basic issue here is that there are two limits: the generic one
(XATTR_SIZE_MAX), and the fs-specific one.

When crossing the generic one, the xattr code returns E2BIG. When
crossing the fs-specific one, it looks like there are a few filesystems
that return E2BIG, but others (like ext4) return ENOSPC.

For the server, NFS4ERR_XATTR2BIG is the right value to return for all
these cases.  For the generic limit, it's an easy check. For the
fs-specific limit, the server code doesn't necessarily know what's
going on, since filesystems don't have a way to advertise their
limits. So ENOSPC will *probably* mean that the attribute was too
large for the filesystem, but it might not.

You could change the server code to translate ENOSPC to NFS4ERR_XATTR2BIG.
But that might not be totally correct either, you're going to end up returning
an error to the client that is not correct in all cases either way.

The problem here for xfstests is how to define the 'correct' behavior
across all filesystems so that there's a clean pass/fail, as long
as these inconsistencies exist.

- Frank

Gao Xiang June 16, 2021, 5:51 p.m. UTC | #9

Hi Ted,

On Wed, Jun 16, 2021 at 12:14:44PM -0400, Theodore Ts'o wrote:
> On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote:
> > > > Yet my question is how to deal with generic/486, should we just skip
> > > > the case directly? I cannot find some proper way to get underlayfs
> > > > block size or real xattr value limit.
> 
> Note that the block size does not necessarily have thing to do with
> the xattr value limit.  For example, in ext4, if you create the file
> system with the ea_inode feature enabled, you can create extended
> attributes up to the maximum value of 64k defined by the xattr
> interface --- unless, of course, there isn't enough space in the file
> system.
> 
> (The ea_inode feature will also use shared space for large xattrs, so
> if you are storing hundreds of thousands of files that all have an
> identical 20 kilbyte Windows security id or ACL, ea_inode is going to
> be much more efficient way of supporting that particular use case.)

Thanks for your detailed explanation too.

Yeah, yet it's not enabled in the issue setup I was assigned :(

> 
> > > As noted above, the NFS server should really be returning
> > > NFS4ERR_XATTR2BIG in this case, which the client, again, should be
> > > transforming into -E2BIG. Where does ENOSPC come from?
> > 
> > Thanks for the detailed explanation...
> > 
> > I think that is due to ext4 returning ENOSPC since I tested
> 
> It's not just ext4.  From inspection, under some circumstances f2fs
> and btrfs can return ENOSPC.

I did some test on ext4 only earlier since it's our test environment.
I didn't mean the ENOSPC behavior was ext4 only ( :-) very sorry about
that if some misunderstanding here )

> 
> The distinction is that E2BIG is (from the attr_set man page):
> 
>        [E2BIG]          The value of the given attribute is too large,
>        			it exceeds the maximum allowable size of an
> 			attribute value.
> 
> The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE,
> which is 65536 bytes.  Some file systems may impose a smaller max
> allowable size.
> 
> In contrast ENOSPC means something different:
> 
>        ENOSPC		No space left on device.
> 
> This isn't necessarily just block space, BTW; it might some other file
> system space --- thre might not be a free inode in the case of ext4's
> ea_inode feature.  Or it be the f2fs file system not being able to
> allocate a node id via f2fs_alloc_nid().
> 
> Note that generic/486 is testing a very specific case, which is
> replacing a small xattr (16 bytes) with an xattr with a large value.
> This is would be a really interesting test for ext4 ea_inode, when we
> are replacing an xattr stored inline in the inode, or in a single 4k
> block, with an xattr stored in a separate inode.  But not the way
> src/attr_replace_test.c (which does all of the heavy lifting for the
> generic/486 test) is currently written.
> 

Yeah, as for the original case, it tried to test when it turned into
the XFS extents format, but I'm not sure if it's just the regression
test for such report only (similiar to ext4 inline xattr to non-inline
xattr case.) rather than test the XFS_DINODE_FMT_BTREE case or ea_inode
case...

> So what I would suggest is to have attr_replace_test.c try to
> determine the maximum attr value size using binary search.  Start with
> min=16, and max=65536, and try creating an xattr of a particular size,
> and then delete the xattr, and then retry creating the xattr with the
> next binary search size.  This will allow you to create a function
> which determines the maximum size attr for a particular file system,
> especially in those cases where it is dependent on how the file system
> is configured.

Considering the original XFS regression report [1], I think
underlayfs blksize may still be needed. And binary search to get the
maximum attr value may be another new case for this as well. Or
alternatively just add by block size to do a trip test.

Although I have no idea if we can just skip the case when testing with
NFS. If getting underlayfs blksize is unfeasible, I think we might
skip such case for now since nfs blksize is not useful for generic/486.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=199119

Thanks,
Gao Xiang

> 
> > should we transform it to E2BIG instead (at least in NFS
> > protocol)? but I'm still not sure that E2BIG is a valid return code for
> > setxattr()...
> 
> E2BIG is defined in the attr_set(3) man page.  ENOSPC isn't mentioned
> in the attr_set man page, but given that multiple file systems return
> ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very
> much makes sense when creating extended attributes, we should fix that
> by adding ENOSPC to the attr_set(3) man page (which is shipped as part
> of the libattr library sources).
> 
> Cheers,
> 
> 						- Ted

Trond Myklebust June 16, 2021, 6:51 p.m. UTC | #10

On Thu, 2021-06-17 at 01:51 +0800, Gao Xiang wrote:
> Hi Ted,
> 
> On Wed, Jun 16, 2021 at 12:14:44PM -0400, Theodore Ts'o wrote:
> > On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote:
> > > > > Yet my question is how to deal with generic/486, should we
> > > > > just skip
> > > > > the case directly? I cannot find some proper way to get
> > > > > underlayfs
> > > > > block size or real xattr value limit.
> > 
> > Note that the block size does not necessarily have thing to do with
> > the xattr value limit.  For example, in ext4, if you create the
> > file
> > system with the ea_inode feature enabled, you can create extended
> > attributes up to the maximum value of 64k defined by the xattr
> > interface --- unless, of course, there isn't enough space in the
> > file
> > system.
> > 
> > (The ea_inode feature will also use shared space for large xattrs,
> > so
> > if you are storing hundreds of thousands of files that all have an
> > identical 20 kilbyte Windows security id or ACL, ea_inode is going
> > to
> > be much more efficient way of supporting that particular use case.)
> 
> Thanks for your detailed explanation too.
> 
> Yeah, yet it's not enabled in the issue setup I was assigned :(
> 
> > 
> > > > As noted above, the NFS server should really be returning
> > > > NFS4ERR_XATTR2BIG in this case, which the client, again, should
> > > > be
> > > > transforming into -E2BIG. Where does ENOSPC come from?
> > > 
> > > Thanks for the detailed explanation...
> > > 
> > > I think that is due to ext4 returning ENOSPC since I tested
> > 
> > It's not just ext4.  From inspection, under some circumstances f2fs
> > and btrfs can return ENOSPC.
> 
> I did some test on ext4 only earlier since it's our test environment.
> I didn't mean the ENOSPC behavior was ext4 only ( :-) very sorry
> about
> that if some misunderstanding here )
> 
> > 
> > The distinction is that E2BIG is (from the attr_set man page):
> > 
> >        [E2BIG]          The value of the given attribute is too
> > large,
> >                         it exceeds the maximum allowable size of an
> >                         attribute value.
> > 
> > The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE,
> > which is 65536 bytes.  Some file systems may impose a smaller max
> > allowable size.
> > 
> > In contrast ENOSPC means something different:
> > 
> >        ENOSPC           No space left on device.
> > 
> > This isn't necessarily just block space, BTW; it might some other
> > file
> > system space --- thre might not be a free inode in the case of
> > ext4's
> > ea_inode feature.  Or it be the f2fs file system not being able to
> > allocate a node id via f2fs_alloc_nid().
> > 
> > Note that generic/486 is testing a very specific case, which is
> > replacing a small xattr (16 bytes) with an xattr with a large
> > value.
> > This is would be a really interesting test for ext4 ea_inode, when
> > we
> > are replacing an xattr stored inline in the inode, or in a single
> > 4k
> > block, with an xattr stored in a separate inode.  But not the way
> > src/attr_replace_test.c (which does all of the heavy lifting for
> > the
> > generic/486 test) is currently written.
> > 
> 
> Yeah, as for the original case, it tried to test when it turned into
> the XFS extents format, but I'm not sure if it's just the regression
> test for such report only (similiar to ext4 inline xattr to non-
> inline
> xattr case.) rather than test the XFS_DINODE_FMT_BTREE case or
> ea_inode
> case...
> 
> > So what I would suggest is to have attr_replace_test.c try to
> > determine the maximum attr value size using binary search.  Start
> > with
> > min=16, and max=65536, and try creating an xattr of a particular
> > size,
> > and then delete the xattr, and then retry creating the xattr with
> > the
> > next binary search size.  This will allow you to create a function
> > which determines the maximum size attr for a particular file
> > system,
> > especially in those cases where it is dependent on how the file
> > system
> > is configured.
> 
> Considering the original XFS regression report [1], I think
> underlayfs blksize may still be needed. And binary search to get the
> maximum attr value may be another new case for this as well. Or
> alternatively just add by block size to do a trip test.
> 
> Although I have no idea if we can just skip the case when testing
> with
> NFS. If getting underlayfs blksize is unfeasible, I think we might
> skip such case for now since nfs blksize is not useful for
> generic/486.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119

I'm still not seeing why we care about block sizes, and I'm certainly
not seeing why we should care about them in the NFS case.

xattrs are a form of key-value storage with atomic semantics (i.e. when
you attempt to get/set/update/remove them, then the operation either
succeeds or it fails without any side-effects).
There is no interface for doing any form of block I/O to an xattr. 
There is no requirement in the documentation that the user know
anything about block size in order to use them.

In other words, if this xfstest 'generic/486' is depending on knowledge
of the block size, then it is fundamentally a filesystem
implementation-specific test. It doesn't belong in the 'generic' test
category, because it is not testing anything that is generic to xattrs.

> 
> Thanks,
> Gao Xiang
> 
> > 
> > > should we transform it to E2BIG instead (at least in NFS
> > > protocol)? but I'm still not sure that E2BIG is a valid return
> > > code for
> > > setxattr()...
> > 
> > E2BIG is defined in the attr_set(3) man page.  ENOSPC isn't
> > mentioned
> > in the attr_set man page, but given that multiple file systems
> > return
> > ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very
> > much makes sense when creating extended attributes, we should fix
> > that
> > by adding ENOSPC to the attr_set(3) man page (which is shipped as
> > part
> > of the libattr library sources).
> > 
> > Cheers,
> > 
> >                                                 - Ted

Theodore Ts'o June 16, 2021, 10:51 p.m. UTC | #11

On Wed, Jun 16, 2021 at 05:17:08PM +0000, Frank van der Linden wrote:
> 
> The problem here for xfstests is how to define the 'correct' behavior
> across all filesystems so that there's a clean pass/fail, as long
> as these inconsistencies exist.

Note that the original xfstest in question is to check what happens
when you have a pre-existing xattr with a small (16 byte) value, and
replacing (overwriting) the xattr with a large (but valid) value.

The problem was that generic/486 was using the preferred size for
efficient I/O size as the "block" size, and then using a percentage of
this "block" size as the arbitrary "large xattr size".

It was not about the error codes being returned, which is a bit
confusing, and for which different man pages (attr_set and setxattr)
are differently incomplete.  That's really a different issue, and the
fact that different file systems can use different error codes is not
necessarily something that we need to rationalize, in particular for
file systems like NFS where the error codes returned are not entirely
under the control of the NFS client or server code.

Cheers,

					- Ted

Theodore Ts'o June 16, 2021, 10:55 p.m. UTC | #12

On Thu, Jun 17, 2021 at 01:51:04AM +0800, Gao Xiang wrote:
> 
> Considering the original XFS regression report [1], I think
> underlayfs blksize may still be needed. And binary search to get the
> maximum attr value may be another new case for this as well. Or
> alternatively just add by block size to do a trip test.
> 
> Although I have no idea if we can just skip the case when testing with
> NFS. If getting underlayfs blksize is unfeasible, I think we might
> skip such case for now since nfs blksize is not useful for generic/486.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119

I've looked at the original XFS regression size, and I don't see why
using the underlaying blocksize matters at all.  This is especially
true if you look at the comment in the test, and the commit which
fixed the bug.  All that is needed for the xfs regression test is to
start with a small xattr, and replace it with a large xattr.  The
blocksize is really irrelevant.

						- Ted

Gao Xiang June 17, 2021, 2:39 a.m. UTC | #13

On Wed, Jun 16, 2021 at 06:55:31PM -0400, Theodore Ts'o wrote:
> On Thu, Jun 17, 2021 at 01:51:04AM +0800, Gao Xiang wrote:
> > 
> > Considering the original XFS regression report [1], I think
> > underlayfs blksize may still be needed. And binary search to get the
> > maximum attr value may be another new case for this as well. Or
> > alternatively just add by block size to do a trip test.
> > 
> > Although I have no idea if we can just skip the case when testing with
> > NFS. If getting underlayfs blksize is unfeasible, I think we might
> > skip such case for now since nfs blksize is not useful for generic/486.
> > 
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119
> 
> I've looked at the original XFS regression size, and I don't see why
> using the underlaying blocksize matters at all.  This is especially
> true if you look at the comment in the test, and the commit which
> fixed the bug.  All that is needed for the xfs regression test is to
> start with a small xattr, and replace it with a large xattr.  The
> blocksize is really irrelevant.

What I said was the original testcase strictly addressing the original
regression report, which converts from shortform to single-block
leaf format, see:

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/attr_replace_test.c#n40

>	/*
>	 * The value should be 3/4 the size of a fs block to ensure that we
>	 * get to extents format.
>	 */
>	ret = fstat(fd, &sbuf);
>	if (ret < 0) die();
>	size = sbuf.st_blksize * 3 / 4;

and

https://lore.kernel.org/fstests/20180425054826.GB1661@magnolia/

> > And I found another problem in the test, it fails on 1k/2k block size
> > extN filesystems, because 2k xattr doesn't fit in single block.. e.g.
> > 
> >     -Attribute "world" has a 2048 byte value for SCRATCH_MNT/hello
> >     +No space left on device
> >     +error=22 at line 46
> >     +Attribute "world" has a 1 byte value for
> > 
> > We probably need to check the block size of SCRATCH_DEV and _notrun if
> > it's smaller than 4k.
> >
> > Or require ea_inode feature when block size < 4k? Note that this test
> > does fail on ext4 with ea_inode feature enabled (so add ext4 list to
> > cc). e.g.

> I was about to say "Eh, this is a regression test for XFS so that's
> probably fine, but then...

Of course, the testcase itself may have room to improve, I'll look at
it when I have extra time.

Thanks,
Gao Xiang

> 
> 						- Ted

Theodore Ts'o June 17, 2021, 1:08 p.m. UTC | #14

On Thu, Jun 17, 2021 at 10:39:15AM +0800, Gao Xiang wrote:
> What I said was the original testcase strictly addressing the original
> regression report, which converts from shortform to single-block
> leaf format, see:
> 
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/attr_replace_test.c#n40

So the question is really is this a generic test or an xfs test?  If
it's an xfs test, then you can make non-portable assumptions about the
returned values from statfs(2).

If it's a generic test, then you can't really do that.  And given the
name --- generic/486, it's a generic test.

What we probably *can* do is to test a series of xattr value
replacements --- e.g., find out what is the maximum xattr size
supported, and then try going from an 8 byte xattr value to an xattr
value of size N, where N might be 16, 32, 64, 128, 256, .. up to the
max xattr size supported.

One could also imagine testing starting with an xattr size of 16, 32,
64, 128, 256, ... max_size and replacing with a small xattr, which
would test the reverse case.

That would ba a truly general, generic test.

In any case, I'd suggest making a proposed patch to the fstests@
mailing list, since I'll note that the current set of lists where this
is being discussed may not guarantee that XFS developers will be
paying attention to this thread.

> > > And I found another problem in the test, it fails on 1k/2k block size
> > > extN filesystems, because 2k xattr doesn't fit in single block.. e.g.
> > > 
> > >     -Attribute "world" has a 2048 byte value for SCRATCH_MNT/hello
> > >     +No space left on device
> > >     +error=22 at line 46
> > >     +Attribute "world" has a 1 byte value for
> > > 
> > > We probably need to check the block size of SCRATCH_DEV and _notrun if
> > > it's smaller than 4k.

I think you mean using a 1k or 2k extN file system when testing over
NFS.  It works fine directly on a 1k block file system, since that's
one of my regular test configs and I would have noticed if it was
breaking.

BEGIN TEST 1k (1 test): Ext4 1k block Thu Jun 17 09:05:09 EDT 2021
DEVICE: /dev/vdd
EXT_MKFS_OPTIONS: -b 1024
EXT_MOUNT_OPTIONS: -o block_validity
FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 kvm-xfstests 5.13.0-rc5-xfstests-00008-ga3f5d4136d5a #171 SMP Wed Jun 16 20:40:09 EDT 2021
MKFS_OPTIONS  -- -q -b 1024 /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc

generic/486             [09:05:10][   19.128541] run fstests generic/486 at 2021-06-17 09:05:10
 [09:05:10] 0s
Ran: generic/486
Passed all 1 tests
Xunit report: /results/ext4/results-1k/result.xml

Again, my suggestion of using binary search to find the largest size
xattr supported by the file system is really the way to go.

Cheers,

						- Ted

nfs: set block size according to pnfs_blksize first

Commit Message

Comments

Patch