Message ID | 1623847469-150122-1-git-send-email-hsiangkao@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | nfs: set block size according to pnfs_blksize first | expand |
On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote: > When testing fstests with ext4 over nfs 4.2, I found generic/486 > failed. The root cause is that the length of its xattr value is > min(st_blksize * 3 / 4, XATTR_SIZE_MAX) > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than > XATTR_SIZE_MAX = 65536 for nfs since the block size would be wsize > (=131072) if bsize is not specified. > > Let's use pnfs_blksize first instead of using wsize directly if > bsize isn't specified. And the testcase itself can pass now. > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com> > Cc: Anna Schumaker <anna.schumaker@netapp.com> > Cc: Joseph Qi <joseph.qi@linux.alibaba.com> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > --- > Considering bsize is not specified, we might use pnfs_blksize > directly first rather than wsize. > > fs/nfs/super.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > index fe58525cfed4..5015edf0cd9a 100644 > --- a/fs/nfs/super.c > +++ b/fs/nfs/super.c > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct super_block > *sb, struct nfs_fs_context *ctx) > snprintf(sb->s_id, sizeof(sb->s_id), > "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev)); > > - if (sb->s_blocksize == 0) > - sb->s_blocksize = nfs_block_bits(server->wsize, > + if (sb->s_blocksize == 0) { > + unsigned int blksize = server->pnfs_blksize ? > + server->pnfs_blksize : server->wsize; NACK. The pnfs block size is a layout driver-specific quantity, and should not be used to substitute for the server-advertised block size. It only applies to I/O _if_ the client is holding a layout for a specific file and is using pNFS to do I/O to that file. It has nothing to do with xattrs at all. > + > + sb->s_blocksize = nfs_block_bits(blksize, > &sb- > >s_blocksize_bits); > + } > > nfs_super_set_maxbytes(sb, server->maxfilesize); > server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;
On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote: > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote: > > When testing fstests with ext4 over nfs 4.2, I found generic/486 > > failed. The root cause is that the length of its xattr value is > > min(st_blksize * 3 / 4, XATTR_SIZE_MAX) > > > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be wsize > > (=131072) if bsize is not specified. > > > > Let's use pnfs_blksize first instead of using wsize directly if > > bsize isn't specified. And the testcase itself can pass now. > > > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com> > > Cc: Anna Schumaker <anna.schumaker@netapp.com> > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com> > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > > --- > > Considering bsize is not specified, we might use pnfs_blksize > > directly first rather than wsize. > > > > fs/nfs/super.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > > index fe58525cfed4..5015edf0cd9a 100644 > > --- a/fs/nfs/super.c > > +++ b/fs/nfs/super.c > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct super_block > > *sb, struct nfs_fs_context *ctx) > > snprintf(sb->s_id, sizeof(sb->s_id), > > "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev)); > > > > - if (sb->s_blocksize == 0) > > - sb->s_blocksize = nfs_block_bits(server->wsize, > > + if (sb->s_blocksize == 0) { > > + unsigned int blksize = server->pnfs_blksize ? > > + server->pnfs_blksize : server->wsize; > > NACK. The pnfs block size is a layout driver-specific quantity, and > should not be used to substitute for the server-advertised block size. > It only applies to I/O _if_ the client is holding a layout for a > specific file and is using pNFS to do I/O to that file. Honestly, I'm not sure if it's ok as well. > > It has nothing to do with xattrs at all. Yet my question is how to deal with generic/486, should we just skip the case directly? I cannot find some proper way to get underlayfs block size or real xattr value limit. For now, generic/486 will return ENOSPC at fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE); when testing new nfs4.2 xattr support. Thanks, Gao Xiang > > > + > > + sb->s_blocksize = nfs_block_bits(blksize, > > &sb- > > >s_blocksize_bits); > > + } > > > > nfs_super_set_maxbytes(sb, server->maxfilesize); > > server->has_sec_mnt_opts = ctx->has_sec_mnt_opts; > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > >
On Wed, 2021-06-16 at 22:06 +0800, Gao Xiang wrote: > On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote: > > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote: > > > When testing fstests with ext4 over nfs 4.2, I found generic/486 > > > failed. The root cause is that the length of its xattr value is > > > min(st_blksize * 3 / 4, XATTR_SIZE_MAX) > > > > > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than > > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be > > > wsize > > > (=131072) if bsize is not specified. > > > > > > Let's use pnfs_blksize first instead of using wsize directly if > > > bsize isn't specified. And the testcase itself can pass now. > > > > > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com> > > > Cc: Anna Schumaker <anna.schumaker@netapp.com> > > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com> > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > > > --- > > > Considering bsize is not specified, we might use pnfs_blksize > > > directly first rather than wsize. > > > > > > fs/nfs/super.c | 8 ++++++-- > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > > > index fe58525cfed4..5015edf0cd9a 100644 > > > --- a/fs/nfs/super.c > > > +++ b/fs/nfs/super.c > > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct > > > super_block > > > *sb, struct nfs_fs_context *ctx) > > > snprintf(sb->s_id, sizeof(sb->s_id), > > > "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev)); > > > > > > - if (sb->s_blocksize == 0) > > > - sb->s_blocksize = nfs_block_bits(server->wsize, > > > + if (sb->s_blocksize == 0) { > > > + unsigned int blksize = server->pnfs_blksize ? > > > + server->pnfs_blksize : server->wsize; > > > > NACK. The pnfs block size is a layout driver-specific quantity, and > > should not be used to substitute for the server-advertised block > > size. > > It only applies to I/O _if_ the client is holding a layout for a > > specific file and is using pNFS to do I/O to that file. > > Honestly, I'm not sure if it's ok as well. > > > > > It has nothing to do with xattrs at all. > > Yet my question is how to deal with generic/486, should we just skip > the case directly? I cannot find some proper way to get underlayfs > block size or real xattr value limit. > RFC8276 provides no method for determining the xattr size limits. It just notes that such limits may exist, and provides the error code NFS4ERR_XATTR2BIG, that the server may use as a return value when those limits are exceeded. > For now, generic/486 will return ENOSPC at > fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE); > when testing new nfs4.2 xattr support. > As noted above, the NFS server should really be returning NFS4ERR_XATTR2BIG in this case, which the client, again, should be transforming into -E2BIG. Where does ENOSPC come from? > Thanks, > Gao Xiang > > > > > > + > > > + sb->s_blocksize = nfs_block_bits(blksize, > > > &sb- > > > > s_blocksize_bits); > > > + } > > > > > > nfs_super_set_maxbytes(sb, server->maxfilesize); > > > server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;
Hi Trond, On Wed, Jun 16, 2021 at 02:20:49PM +0000, Trond Myklebust wrote: > On Wed, 2021-06-16 at 22:06 +0800, Gao Xiang wrote: > > On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote: > > > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote: > > > > When testing fstests with ext4 over nfs 4.2, I found generic/486 > > > > failed. The root cause is that the length of its xattr value is > > > > min(st_blksize * 3 / 4, XATTR_SIZE_MAX) > > > > > > > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than > > > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be > > > > wsize > > > > (=131072) if bsize is not specified. > > > > > > > > Let's use pnfs_blksize first instead of using wsize directly if > > > > bsize isn't specified. And the testcase itself can pass now. > > > > > > > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > Cc: Anna Schumaker <anna.schumaker@netapp.com> > > > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com> > > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > > > > --- > > > > Considering bsize is not specified, we might use pnfs_blksize > > > > directly first rather than wsize. > > > > > > > > fs/nfs/super.c | 8 ++++++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > > > > index fe58525cfed4..5015edf0cd9a 100644 > > > > --- a/fs/nfs/super.c > > > > +++ b/fs/nfs/super.c > > > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct > > > > super_block > > > > *sb, struct nfs_fs_context *ctx) > > > > snprintf(sb->s_id, sizeof(sb->s_id), > > > > "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev)); > > > > > > > > - if (sb->s_blocksize == 0) > > > > - sb->s_blocksize = nfs_block_bits(server->wsize, > > > > + if (sb->s_blocksize == 0) { > > > > + unsigned int blksize = server->pnfs_blksize ? > > > > + server->pnfs_blksize : server->wsize; > > > > > > NACK. The pnfs block size is a layout driver-specific quantity, and > > > should not be used to substitute for the server-advertised block > > > size. > > > It only applies to I/O _if_ the client is holding a layout for a > > > specific file and is using pNFS to do I/O to that file. > > > > Honestly, I'm not sure if it's ok as well. > > > > > > > > It has nothing to do with xattrs at all. > > > > Yet my question is how to deal with generic/486, should we just skip > > the case directly? I cannot find some proper way to get underlayfs > > block size or real xattr value limit. > > > > RFC8276 provides no method for determining the xattr size limits. It > just notes that such limits may exist, and provides the error code > NFS4ERR_XATTR2BIG, that the server may use as a return value when those > limits are exceeded. > > > For now, generic/486 will return ENOSPC at > > fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE); > > when testing new nfs4.2 xattr support. > > > > As noted above, the NFS server should really be returning > NFS4ERR_XATTR2BIG in this case, which the client, again, should be > transforming into -E2BIG. Where does ENOSPC come from? Thanks for the detailed explanation... I think that is due to ext4 returning ENOSPC since I tested fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE); with ext4 as well and it returned ENOSPC, and I think it's reasonable since setxattr() will return ENOSPC for such cases. https://man7.org/linux/man-pages/man2/setxattr.2.html should we transform it to E2BIG instead (at least in NFS protocol)? but I'm still not sure that E2BIG is a valid return code for setxattr()... If necessary, I will look into it more tomorrow.... Thanks, Gao Xiang > > > Thanks, > > Gao Xiang > > > > > > > > > + > > > > + sb->s_blocksize = nfs_block_bits(blksize, > > > > &sb- > > > > > s_blocksize_bits); > > > > + } > > > > > > > > nfs_super_set_maxbytes(sb, server->maxfilesize); > > > > server->has_sec_mnt_opts = ctx->has_sec_mnt_opts; > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > >
On Wed, 2021-06-16 at 22:41 +0800, Gao Xiang wrote: > Hi Trond, > > On Wed, Jun 16, 2021 at 02:20:49PM +0000, Trond Myklebust wrote: > > On Wed, 2021-06-16 at 22:06 +0800, Gao Xiang wrote: > > > On Wed, Jun 16, 2021 at 01:47:13PM +0000, Trond Myklebust wrote: > > > > On Wed, 2021-06-16 at 20:44 +0800, Gao Xiang wrote: > > > > > When testing fstests with ext4 over nfs 4.2, I found > > > > > generic/486 > > > > > failed. The root cause is that the length of its xattr value is > > > > > min(st_blksize * 3 / 4, XATTR_SIZE_MAX) > > > > > > > > > > which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than > > > > > XATTR_SIZE_MAX = 65536 for nfs since the block size would be > > > > > wsize > > > > > (=131072) if bsize is not specified. > > > > > > > > > > Let's use pnfs_blksize first instead of using wsize directly if > > > > > bsize isn't specified. And the testcase itself can pass now. > > > > > > > > > > Cc: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > Cc: Anna Schumaker <anna.schumaker@netapp.com> > > > > > Cc: Joseph Qi <joseph.qi@linux.alibaba.com> > > > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > > > > > --- > > > > > Considering bsize is not specified, we might use pnfs_blksize > > > > > directly first rather than wsize. > > > > > > > > > > fs/nfs/super.c | 8 ++++++-- > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > > > > > index fe58525cfed4..5015edf0cd9a 100644 > > > > > --- a/fs/nfs/super.c > > > > > +++ b/fs/nfs/super.c > > > > > @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct > > > > > super_block > > > > > *sb, struct nfs_fs_context *ctx) > > > > > snprintf(sb->s_id, sizeof(sb->s_id), > > > > > "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev)); > > > > > > > > > > - if (sb->s_blocksize == 0) > > > > > - sb->s_blocksize = nfs_block_bits(server->wsize, > > > > > + if (sb->s_blocksize == 0) { > > > > > + unsigned int blksize = server->pnfs_blksize ? > > > > > + server->pnfs_blksize : server->wsize; > > > > > > > > NACK. The pnfs block size is a layout driver-specific quantity, > > > > and > > > > should not be used to substitute for the server-advertised block > > > > size. > > > > It only applies to I/O _if_ the client is holding a layout for a > > > > specific file and is using pNFS to do I/O to that file. > > > > > > Honestly, I'm not sure if it's ok as well. > > > > > > > > > > > It has nothing to do with xattrs at all. > > > > > > Yet my question is how to deal with generic/486, should we just > > > skip > > > the case directly? I cannot find some proper way to get underlayfs > > > block size or real xattr value limit. > > > > > > > RFC8276 provides no method for determining the xattr size limits. It > > just notes that such limits may exist, and provides the error code > > NFS4ERR_XATTR2BIG, that the server may use as a return value when > > those > > limits are exceeded. > > > > > For now, generic/486 will return ENOSPC at > > > fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE); > > > when testing new nfs4.2 xattr support. > > > > > > > As noted above, the NFS server should really be returning > > NFS4ERR_XATTR2BIG in this case, which the client, again, should be > > transforming into -E2BIG. Where does ENOSPC come from? > > Thanks for the detailed explanation... > > I think that is due to ext4 returning ENOSPC since I tested > > fsetxattr(fd, "user.world", value, 65536, XATTR_REPLACE); > with ext4 as well and it returned ENOSPC, and I think it's reasonable > since setxattr() will return ENOSPC for such cases. > https://man7.org/linux/man-pages/man2/setxattr.2.html > > should we transform it to E2BIG instead (at least in NFS > protocol)? but I'm still not sure that E2BIG is a valid return code for > setxattr()... The setxattr() manpage appears to suggest ERANGE is the correct return value here. ERANGE The size of name or value exceeds a filesystem-specific limit. However I can't tell if ext4 and xfs ever do that. Furthermore, it looks as if the VFS is always returning E2BIG if size > XATTR_SIZE_MAX. > > If necessary, I will look into it more tomorrow.... > > Thanks, > Gao Xiang > > > > > > Thanks, > > > Gao Xiang > > > > > > > > > > > > + > > > > > + sb->s_blocksize = nfs_block_bits(blksize, > > > > > &sb- > > > > > > s_blocksize_bits); > > > > > + } > > > > > > > > > > nfs_super_set_maxbytes(sb, server->maxfilesize); > > > > > server->has_sec_mnt_opts = ctx->has_sec_mnt_opts; > > > > -- > > Trond Myklebust > > Linux NFS client maintainer, Hammerspace > > trond.myklebust@hammerspace.com > > > >
On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote: > > > Yet my question is how to deal with generic/486, should we just skip > > > the case directly? I cannot find some proper way to get underlayfs > > > block size or real xattr value limit. Note that the block size does not necessarily have thing to do with the xattr value limit. For example, in ext4, if you create the file system with the ea_inode feature enabled, you can create extended attributes up to the maximum value of 64k defined by the xattr interface --- unless, of course, there isn't enough space in the file system. (The ea_inode feature will also use shared space for large xattrs, so if you are storing hundreds of thousands of files that all have an identical 20 kilbyte Windows security id or ACL, ea_inode is going to be much more efficient way of supporting that particular use case.) > > As noted above, the NFS server should really be returning > > NFS4ERR_XATTR2BIG in this case, which the client, again, should be > > transforming into -E2BIG. Where does ENOSPC come from? > > Thanks for the detailed explanation... > > I think that is due to ext4 returning ENOSPC since I tested It's not just ext4. From inspection, under some circumstances f2fs and btrfs can return ENOSPC. The distinction is that E2BIG is (from the attr_set man page): [E2BIG] The value of the given attribute is too large, it exceeds the maximum allowable size of an attribute value. The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE, which is 65536 bytes. Some file systems may impose a smaller max allowable size. In contrast ENOSPC means something different: ENOSPC No space left on device. This isn't necessarily just block space, BTW; it might some other file system space --- thre might not be a free inode in the case of ext4's ea_inode feature. Or it be the f2fs file system not being able to allocate a node id via f2fs_alloc_nid(). Note that generic/486 is testing a very specific case, which is replacing a small xattr (16 bytes) with an xattr with a large value. This is would be a really interesting test for ext4 ea_inode, when we are replacing an xattr stored inline in the inode, or in a single 4k block, with an xattr stored in a separate inode. But not the way src/attr_replace_test.c (which does all of the heavy lifting for the generic/486 test) is currently written. So what I would suggest is to have attr_replace_test.c try to determine the maximum attr value size using binary search. Start with min=16, and max=65536, and try creating an xattr of a particular size, and then delete the xattr, and then retry creating the xattr with the next binary search size. This will allow you to create a function which determines the maximum size attr for a particular file system, especially in those cases where it is dependent on how the file system is configured. > should we transform it to E2BIG instead (at least in NFS > protocol)? but I'm still not sure that E2BIG is a valid return code for > setxattr()... E2BIG is defined in the attr_set(3) man page. ENOSPC isn't mentioned in the attr_set man page, but given that multiple file systems return ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very much makes sense when creating extended attributes, we should fix that by adding ENOSPC to the attr_set(3) man page (which is shipped as part of the libattr library sources). Cheers, - Ted
On Wed, Jun 16, 2021 at 03:14:17PM +0000, Trond Myklebust wrote: > > should we transform it to E2BIG instead (at least in NFS > > protocol)? but I'm still not sure that E2BIG is a valid return code for > > setxattr()... > > The setxattr() manpage appears to suggest ERANGE is the correct return > value here. > > ERANGE The size of name or value exceeds a filesystem-specific > limit. > > However I can't tell if ext4 and xfs ever do that. Ext4 will return ERANGE if the size of the xattr name is greater than 255 bytes. However, if there is no room to create the xattr --- for example there is no space to allocate a file system block, ext4 will return ENOSPC. Without the ea_inode feature, ext2 and ext4 use a single 4k block to store all extended attributes. So whether an xattr can be created is dependent on whether there is room in that 4k block. If there are too many xattrs, or the sum of the length of all the xattr names and values plus a certain amount of overhead exceeds 4k, it will return ENOSPC. > Furthermore, it looks as if the VFS is always returning E2BIG if > size > XATTR_SIZE_MAX. This is defined by the attr_set(3) man page, but not in the setxattr(2) man page. The set of errors are never intended to be exhaustive --- there are some tcp-related errors that probably can exposed to the caller when it is operating on a network-based file system, for example. But it would probably be a good idea to update the man pages so it's a bit clearer when E2BIG and ERANGE and ENOSPC mean. Cheers, - Ted
On Wed, Jun 16, 2021 at 03:14:17PM +0000, Trond Myklebust wrote: > The setxattr() manpage appears to suggest ERANGE is the correct return > value here. > > ERANGE The size of name or value exceeds a filesystem-specific > limit. > > > However I can't tell if ext4 and xfs ever do that. Furthermore, it > looks as if the VFS is always returning E2BIG if size > XATTR_SIZE_MAX. > The basic issue here is that there are two limits: the generic one (XATTR_SIZE_MAX), and the fs-specific one. When crossing the generic one, the xattr code returns E2BIG. When crossing the fs-specific one, it looks like there are a few filesystems that return E2BIG, but others (like ext4) return ENOSPC. For the server, NFS4ERR_XATTR2BIG is the right value to return for all these cases. For the generic limit, it's an easy check. For the fs-specific limit, the server code doesn't necessarily know what's going on, since filesystems don't have a way to advertise their limits. So ENOSPC will *probably* mean that the attribute was too large for the filesystem, but it might not. You could change the server code to translate ENOSPC to NFS4ERR_XATTR2BIG. But that might not be totally correct either, you're going to end up returning an error to the client that is not correct in all cases either way. The problem here for xfstests is how to define the 'correct' behavior across all filesystems so that there's a clean pass/fail, as long as these inconsistencies exist. - Frank
Hi Ted, On Wed, Jun 16, 2021 at 12:14:44PM -0400, Theodore Ts'o wrote: > On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote: > > > > Yet my question is how to deal with generic/486, should we just skip > > > > the case directly? I cannot find some proper way to get underlayfs > > > > block size or real xattr value limit. > > Note that the block size does not necessarily have thing to do with > the xattr value limit. For example, in ext4, if you create the file > system with the ea_inode feature enabled, you can create extended > attributes up to the maximum value of 64k defined by the xattr > interface --- unless, of course, there isn't enough space in the file > system. > > (The ea_inode feature will also use shared space for large xattrs, so > if you are storing hundreds of thousands of files that all have an > identical 20 kilbyte Windows security id or ACL, ea_inode is going to > be much more efficient way of supporting that particular use case.) Thanks for your detailed explanation too. Yeah, yet it's not enabled in the issue setup I was assigned :( > > > > As noted above, the NFS server should really be returning > > > NFS4ERR_XATTR2BIG in this case, which the client, again, should be > > > transforming into -E2BIG. Where does ENOSPC come from? > > > > Thanks for the detailed explanation... > > > > I think that is due to ext4 returning ENOSPC since I tested > > It's not just ext4. From inspection, under some circumstances f2fs > and btrfs can return ENOSPC. I did some test on ext4 only earlier since it's our test environment. I didn't mean the ENOSPC behavior was ext4 only ( :-) very sorry about that if some misunderstanding here ) > > The distinction is that E2BIG is (from the attr_set man page): > > [E2BIG] The value of the given attribute is too large, > it exceeds the maximum allowable size of an > attribute value. > > The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE, > which is 65536 bytes. Some file systems may impose a smaller max > allowable size. > > In contrast ENOSPC means something different: > > ENOSPC No space left on device. > > This isn't necessarily just block space, BTW; it might some other file > system space --- thre might not be a free inode in the case of ext4's > ea_inode feature. Or it be the f2fs file system not being able to > allocate a node id via f2fs_alloc_nid(). > > Note that generic/486 is testing a very specific case, which is > replacing a small xattr (16 bytes) with an xattr with a large value. > This is would be a really interesting test for ext4 ea_inode, when we > are replacing an xattr stored inline in the inode, or in a single 4k > block, with an xattr stored in a separate inode. But not the way > src/attr_replace_test.c (which does all of the heavy lifting for the > generic/486 test) is currently written. > Yeah, as for the original case, it tried to test when it turned into the XFS extents format, but I'm not sure if it's just the regression test for such report only (similiar to ext4 inline xattr to non-inline xattr case.) rather than test the XFS_DINODE_FMT_BTREE case or ea_inode case... > So what I would suggest is to have attr_replace_test.c try to > determine the maximum attr value size using binary search. Start with > min=16, and max=65536, and try creating an xattr of a particular size, > and then delete the xattr, and then retry creating the xattr with the > next binary search size. This will allow you to create a function > which determines the maximum size attr for a particular file system, > especially in those cases where it is dependent on how the file system > is configured. Considering the original XFS regression report [1], I think underlayfs blksize may still be needed. And binary search to get the maximum attr value may be another new case for this as well. Or alternatively just add by block size to do a trip test. Although I have no idea if we can just skip the case when testing with NFS. If getting underlayfs blksize is unfeasible, I think we might skip such case for now since nfs blksize is not useful for generic/486. [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119 Thanks, Gao Xiang > > > should we transform it to E2BIG instead (at least in NFS > > protocol)? but I'm still not sure that E2BIG is a valid return code for > > setxattr()... > > E2BIG is defined in the attr_set(3) man page. ENOSPC isn't mentioned > in the attr_set man page, but given that multiple file systems return > ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very > much makes sense when creating extended attributes, we should fix that > by adding ENOSPC to the attr_set(3) man page (which is shipped as part > of the libattr library sources). > > Cheers, > > - Ted
On Thu, 2021-06-17 at 01:51 +0800, Gao Xiang wrote: > Hi Ted, > > On Wed, Jun 16, 2021 at 12:14:44PM -0400, Theodore Ts'o wrote: > > On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote: > > > > > Yet my question is how to deal with generic/486, should we > > > > > just skip > > > > > the case directly? I cannot find some proper way to get > > > > > underlayfs > > > > > block size or real xattr value limit. > > > > Note that the block size does not necessarily have thing to do with > > the xattr value limit. For example, in ext4, if you create the > > file > > system with the ea_inode feature enabled, you can create extended > > attributes up to the maximum value of 64k defined by the xattr > > interface --- unless, of course, there isn't enough space in the > > file > > system. > > > > (The ea_inode feature will also use shared space for large xattrs, > > so > > if you are storing hundreds of thousands of files that all have an > > identical 20 kilbyte Windows security id or ACL, ea_inode is going > > to > > be much more efficient way of supporting that particular use case.) > > Thanks for your detailed explanation too. > > Yeah, yet it's not enabled in the issue setup I was assigned :( > > > > > > > As noted above, the NFS server should really be returning > > > > NFS4ERR_XATTR2BIG in this case, which the client, again, should > > > > be > > > > transforming into -E2BIG. Where does ENOSPC come from? > > > > > > Thanks for the detailed explanation... > > > > > > I think that is due to ext4 returning ENOSPC since I tested > > > > It's not just ext4. From inspection, under some circumstances f2fs > > and btrfs can return ENOSPC. > > I did some test on ext4 only earlier since it's our test environment. > I didn't mean the ENOSPC behavior was ext4 only ( :-) very sorry > about > that if some misunderstanding here ) > > > > > The distinction is that E2BIG is (from the attr_set man page): > > > > [E2BIG] The value of the given attribute is too > > large, > > it exceeds the maximum allowable size of an > > attribute value. > > > > The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE, > > which is 65536 bytes. Some file systems may impose a smaller max > > allowable size. > > > > In contrast ENOSPC means something different: > > > > ENOSPC No space left on device. > > > > This isn't necessarily just block space, BTW; it might some other > > file > > system space --- thre might not be a free inode in the case of > > ext4's > > ea_inode feature. Or it be the f2fs file system not being able to > > allocate a node id via f2fs_alloc_nid(). > > > > Note that generic/486 is testing a very specific case, which is > > replacing a small xattr (16 bytes) with an xattr with a large > > value. > > This is would be a really interesting test for ext4 ea_inode, when > > we > > are replacing an xattr stored inline in the inode, or in a single > > 4k > > block, with an xattr stored in a separate inode. But not the way > > src/attr_replace_test.c (which does all of the heavy lifting for > > the > > generic/486 test) is currently written. > > > > Yeah, as for the original case, it tried to test when it turned into > the XFS extents format, but I'm not sure if it's just the regression > test for such report only (similiar to ext4 inline xattr to non- > inline > xattr case.) rather than test the XFS_DINODE_FMT_BTREE case or > ea_inode > case... > > > So what I would suggest is to have attr_replace_test.c try to > > determine the maximum attr value size using binary search. Start > > with > > min=16, and max=65536, and try creating an xattr of a particular > > size, > > and then delete the xattr, and then retry creating the xattr with > > the > > next binary search size. This will allow you to create a function > > which determines the maximum size attr for a particular file > > system, > > especially in those cases where it is dependent on how the file > > system > > is configured. > > Considering the original XFS regression report [1], I think > underlayfs blksize may still be needed. And binary search to get the > maximum attr value may be another new case for this as well. Or > alternatively just add by block size to do a trip test. > > Although I have no idea if we can just skip the case when testing > with > NFS. If getting underlayfs blksize is unfeasible, I think we might > skip such case for now since nfs blksize is not useful for > generic/486. > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119 I'm still not seeing why we care about block sizes, and I'm certainly not seeing why we should care about them in the NFS case. xattrs are a form of key-value storage with atomic semantics (i.e. when you attempt to get/set/update/remove them, then the operation either succeeds or it fails without any side-effects). There is no interface for doing any form of block I/O to an xattr. There is no requirement in the documentation that the user know anything about block size in order to use them. In other words, if this xfstest 'generic/486' is depending on knowledge of the block size, then it is fundamentally a filesystem implementation-specific test. It doesn't belong in the 'generic' test category, because it is not testing anything that is generic to xattrs. > > Thanks, > Gao Xiang > > > > > > should we transform it to E2BIG instead (at least in NFS > > > protocol)? but I'm still not sure that E2BIG is a valid return > > > code for > > > setxattr()... > > > > E2BIG is defined in the attr_set(3) man page. ENOSPC isn't > > mentioned > > in the attr_set man page, but given that multiple file systems > > return > > ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very > > much makes sense when creating extended attributes, we should fix > > that > > by adding ENOSPC to the attr_set(3) man page (which is shipped as > > part > > of the libattr library sources). > > > > Cheers, > > > > - Ted
On Wed, Jun 16, 2021 at 05:17:08PM +0000, Frank van der Linden wrote: > > The problem here for xfstests is how to define the 'correct' behavior > across all filesystems so that there's a clean pass/fail, as long > as these inconsistencies exist. Note that the original xfstest in question is to check what happens when you have a pre-existing xattr with a small (16 byte) value, and replacing (overwriting) the xattr with a large (but valid) value. The problem was that generic/486 was using the preferred size for efficient I/O size as the "block" size, and then using a percentage of this "block" size as the arbitrary "large xattr size". It was not about the error codes being returned, which is a bit confusing, and for which different man pages (attr_set and setxattr) are differently incomplete. That's really a different issue, and the fact that different file systems can use different error codes is not necessarily something that we need to rationalize, in particular for file systems like NFS where the error codes returned are not entirely under the control of the NFS client or server code. Cheers, - Ted
On Thu, Jun 17, 2021 at 01:51:04AM +0800, Gao Xiang wrote: > > Considering the original XFS regression report [1], I think > underlayfs blksize may still be needed. And binary search to get the > maximum attr value may be another new case for this as well. Or > alternatively just add by block size to do a trip test. > > Although I have no idea if we can just skip the case when testing with > NFS. If getting underlayfs blksize is unfeasible, I think we might > skip such case for now since nfs blksize is not useful for generic/486. > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119 I've looked at the original XFS regression size, and I don't see why using the underlaying blocksize matters at all. This is especially true if you look at the comment in the test, and the commit which fixed the bug. All that is needed for the xfs regression test is to start with a small xattr, and replace it with a large xattr. The blocksize is really irrelevant. - Ted
On Wed, Jun 16, 2021 at 06:55:31PM -0400, Theodore Ts'o wrote: > On Thu, Jun 17, 2021 at 01:51:04AM +0800, Gao Xiang wrote: > > > > Considering the original XFS regression report [1], I think > > underlayfs blksize may still be needed. And binary search to get the > > maximum attr value may be another new case for this as well. Or > > alternatively just add by block size to do a trip test. > > > > Although I have no idea if we can just skip the case when testing with > > NFS. If getting underlayfs blksize is unfeasible, I think we might > > skip such case for now since nfs blksize is not useful for generic/486. > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119 > > I've looked at the original XFS regression size, and I don't see why > using the underlaying blocksize matters at all. This is especially > true if you look at the comment in the test, and the commit which > fixed the bug. All that is needed for the xfs regression test is to > start with a small xattr, and replace it with a large xattr. The > blocksize is really irrelevant. What I said was the original testcase strictly addressing the original regression report, which converts from shortform to single-block leaf format, see: https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/attr_replace_test.c#n40 > /* > * The value should be 3/4 the size of a fs block to ensure that we > * get to extents format. > */ > ret = fstat(fd, &sbuf); > if (ret < 0) die(); > size = sbuf.st_blksize * 3 / 4; and https://lore.kernel.org/fstests/20180425054826.GB1661@magnolia/ > > And I found another problem in the test, it fails on 1k/2k block size > > extN filesystems, because 2k xattr doesn't fit in single block.. e.g. > > > > -Attribute "world" has a 2048 byte value for SCRATCH_MNT/hello > > +No space left on device > > +error=22 at line 46 > > +Attribute "world" has a 1 byte value for > > > > We probably need to check the block size of SCRATCH_DEV and _notrun if > > it's smaller than 4k. > > > > Or require ea_inode feature when block size < 4k? Note that this test > > does fail on ext4 with ea_inode feature enabled (so add ext4 list to > > cc). e.g. > I was about to say "Eh, this is a regression test for XFS so that's > probably fine, but then... Of course, the testcase itself may have room to improve, I'll look at it when I have extra time. Thanks, Gao Xiang > > - Ted
On Thu, Jun 17, 2021 at 10:39:15AM +0800, Gao Xiang wrote: > What I said was the original testcase strictly addressing the original > regression report, which converts from shortform to single-block > leaf format, see: > > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/attr_replace_test.c#n40 So the question is really is this a generic test or an xfs test? If it's an xfs test, then you can make non-portable assumptions about the returned values from statfs(2). If it's a generic test, then you can't really do that. And given the name --- generic/486, it's a generic test. What we probably *can* do is to test a series of xattr value replacements --- e.g., find out what is the maximum xattr size supported, and then try going from an 8 byte xattr value to an xattr value of size N, where N might be 16, 32, 64, 128, 256, .. up to the max xattr size supported. One could also imagine testing starting with an xattr size of 16, 32, 64, 128, 256, ... max_size and replacing with a small xattr, which would test the reverse case. That would ba a truly general, generic test. In any case, I'd suggest making a proposed patch to the fstests@ mailing list, since I'll note that the current set of lists where this is being discussed may not guarantee that XFS developers will be paying attention to this thread. > > > And I found another problem in the test, it fails on 1k/2k block size > > > extN filesystems, because 2k xattr doesn't fit in single block.. e.g. > > > > > > -Attribute "world" has a 2048 byte value for SCRATCH_MNT/hello > > > +No space left on device > > > +error=22 at line 46 > > > +Attribute "world" has a 1 byte value for > > > > > > We probably need to check the block size of SCRATCH_DEV and _notrun if > > > it's smaller than 4k. I think you mean using a 1k or 2k extN file system when testing over NFS. It works fine directly on a 1k block file system, since that's one of my regular test configs and I would have noticed if it was breaking. BEGIN TEST 1k (1 test): Ext4 1k block Thu Jun 17 09:05:09 EDT 2021 DEVICE: /dev/vdd EXT_MKFS_OPTIONS: -b 1024 EXT_MOUNT_OPTIONS: -o block_validity FSTYP -- ext4 PLATFORM -- Linux/x86_64 kvm-xfstests 5.13.0-rc5-xfstests-00008-ga3f5d4136d5a #171 SMP Wed Jun 16 20:40:09 EDT 2021 MKFS_OPTIONS -- -q -b 1024 /dev/vdc MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc generic/486 [09:05:10][ 19.128541] run fstests generic/486 at 2021-06-17 09:05:10 [09:05:10] 0s Ran: generic/486 Passed all 1 tests Xunit report: /results/ext4/results-1k/result.xml Again, my suggestion of using binary search to find the largest size xattr supported by the file system is really the way to go. Cheers, - Ted
diff --git a/fs/nfs/super.c b/fs/nfs/super.c index fe58525cfed4..5015edf0cd9a 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -1068,9 +1068,13 @@ static void nfs_fill_super(struct super_block *sb, struct nfs_fs_context *ctx) snprintf(sb->s_id, sizeof(sb->s_id), "%u:%u", MAJOR(sb->s_dev), MINOR(sb->s_dev)); - if (sb->s_blocksize == 0) - sb->s_blocksize = nfs_block_bits(server->wsize, + if (sb->s_blocksize == 0) { + unsigned int blksize = server->pnfs_blksize ? + server->pnfs_blksize : server->wsize; + + sb->s_blocksize = nfs_block_bits(blksize, &sb->s_blocksize_bits); + } nfs_super_set_maxbytes(sb, server->maxfilesize); server->has_sec_mnt_opts = ctx->has_sec_mnt_opts;
When testing fstests with ext4 over nfs 4.2, I found generic/486 failed. The root cause is that the length of its xattr value is min(st_blksize * 3 / 4, XATTR_SIZE_MAX) which is 4096 * 3 / 4 = 3072 for underlayfs ext4 rather than XATTR_SIZE_MAX = 65536 for nfs since the block size would be wsize (=131072) if bsize is not specified. Let's use pnfs_blksize first instead of using wsize directly if bsize isn't specified. And the testcase itself can pass now. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> --- Considering bsize is not specified, we might use pnfs_blksize directly first rather than wsize. fs/nfs/super.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)