Message ID | 20200910200644.8165-1-fllinden@amazon.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | nfs: round down reported block numbers in statfs | expand |
On Thu, Sep 10, 2020 at 08:06:44PM +0000, Frank van der Linden wrote: > nfs_statfs rounds up the numbers of blocks as computed > from the numbers the server return values. > > This works out well if the client block size, which is > the same as wrsize, is smaller than or equal to the actual > filesystem block size on the server. > > But, for NFS4, the size is usually larger (1M), meaning > that statfs reports more free space than actually is > available. This confuses, for example, fstest generic/103. > > Given a choice between reporting too much or too little > space, the latter is the safer option, so don't round > up the number of blocks. This also simplifies the code. > > Signed-off-by: Frank van der Linden <fllinden@amazon.com> I doubted whether I should send this in as an RFC, since this is one of those things that might generate more discussion. In any case, let me add some details here: generic/103 is a test that sees if adding an xattr to a full filesystem correctly returns ENOSPC. To achieve that, it gets the number of free blocks (f_bavail), uses fallocate to allocate that space minus some slack (512k), and then fills it up with 64k-sized xattrs. For NFS (4.2) this fails, because the filesystem rounds the free blocks up to f_bsize (1M). So even f_bavail minus 512k can be more than is actually available. The fallocate fails, and the test fails before it gets to the xattr part. Other client implementations simply use the lowest common denominator for f_bsize (512), so the space reporting always works out. But since wrsize is used here, you have to make a choice between rounding up or rounding down, and the latter seems safer. Sure, all the caveats apply: the stats are just a snapshot, applications shouldn't rely on the exact data, etc, but I think the xfstest in question is a decent example of why rounding down is a bit better. It also simplifies the code. - Frank
Hi Frank, Thank you for the patch! Yet something to improve: [auto build test ERROR on nfs/linux-next] [also build test ERROR on v5.9-rc4 next-20200910] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/nfs-round-down-reported-block-numbers-in-statfs/20200911-040903 base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next config: microblaze-nommu_defconfig (attached as .config) compiler: microblaze-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=microblaze If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): microblaze-linux-ld: fs/nfs/super.o: in function `nfs_statfs': >> (.text+0x140): undefined reference to `__udivdi3' >> microblaze-linux-ld: (.text+0x168): undefined reference to `__udivdi3' microblaze-linux-ld: (.text+0x190): undefined reference to `__udivdi3' --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Frank,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on nfs/linux-next]
[also build test ERROR on v5.9-rc4 next-20200910]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/nfs-round-down-reported-block-numbers-in-statfs/20200911-040903
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: riscv-rv32_defconfig (attached as .config)
compiler: riscv32-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
riscv32-linux-ld: fs/nfs/super.o: in function `nfs_statfs':
super.c:(.text+0x204): undefined reference to `__udivdi3'
>> riscv32-linux-ld: super.c:(.text+0x220): undefined reference to `__udivdi3'
riscv32-linux-ld: super.c:(.text+0x23c): undefined reference to `__udivdi3'
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Fri, Sep 11, 2020 at 12:36:31PM +0800, kernel test robot wrote: > > > Hi Frank, > > Thank you for the patch! Yet something to improve: > > [auto build test ERROR on nfs/linux-next] > [also build test ERROR on v5.9-rc4 next-20200910] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch] > > url: https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/nfs-round-down-reported-block-numbers-in-statfs/20200911-040903 > base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next > config: riscv-rv32_defconfig (attached as .config) > compiler: riscv32-linux-gcc (GCC) 9.3.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > > All errors (new ones prefixed by >>): > > riscv32-linux-ld: fs/nfs/super.o: in function `nfs_statfs': > super.c:(.text+0x204): undefined reference to `__udivdi3' > >> riscv32-linux-ld: super.c:(.text+0x220): undefined reference to `__udivdi3' > riscv32-linux-ld: super.c:(.text+0x23c): undefined reference to `__udivdi3' > > --- > 0-DAY CI Kernel Test Service, Intel Corporation > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Ah yeah, forgot about 64bit divides and 32bit archs.. Will send v2 with div64_u64. - Frank
On Thu, 2020-09-10 at 20:06 +0000, Frank van der Linden wrote: > nfs_statfs rounds up the numbers of blocks as computed > from the numbers the server return values. > > This works out well if the client block size, which is > the same as wrsize, is smaller than or equal to the actual > filesystem block size on the server. > > But, for NFS4, the size is usually larger (1M), meaning > that statfs reports more free space than actually is > available. This confuses, for example, fstest generic/103. > > Given a choice between reporting too much or too little > space, the latter is the safer option, so don't round > up the number of blocks. This also simplifies the code. > > Signed-off-by: Frank van der Linden <fllinden@amazon.com> > --- > fs/nfs/super.c | 25 ++++--------------------- > 1 file changed, 4 insertions(+), 21 deletions(-) > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > index 7a70287f21a2..45d368a5cc95 100644 > --- a/fs/nfs/super.c > +++ b/fs/nfs/super.c > @@ -217,8 +217,6 @@ EXPORT_SYMBOL_GPL(nfs_client_for_each_server); > int nfs_statfs(struct dentry *dentry, struct kstatfs *buf) > { > struct nfs_server *server = NFS_SB(dentry->d_sb); > - unsigned char blockbits; > - unsigned long blockres; > struct nfs_fh *fh = NFS_FH(d_inode(dentry)); > struct nfs_fsstat res; > int error = -ENOMEM; > @@ -241,26 +239,11 @@ int nfs_statfs(struct dentry *dentry, struct > kstatfs *buf) > > buf->f_type = NFS_SUPER_MAGIC; > > - /* > - * Current versions of glibc do not correctly handle the > - * case where f_frsize != f_bsize. Eventually we want to > - * report the value of wtmult in this field. > - */ > - buf->f_frsize = dentry->d_sb->s_blocksize; NACK. This comment and explicit setting is there to document why we're not propagating the true value of the filesystem block size. Please do not remove it. > - > - /* > - * On most *nix systems, f_blocks, f_bfree, and f_bavail > - * are reported in units of f_frsize. Linux hasn't had > - * an f_frsize field in its statfs struct until recently, > - * thus historically Linux's sys_statfs reports these > - * fields in units of f_bsize. > - */ > buf->f_bsize = dentry->d_sb->s_blocksize; > - blockbits = dentry->d_sb->s_blocksize_bits; > - blockres = (1 << blockbits) - 1; > - buf->f_blocks = (res.tbytes + blockres) >> blockbits; > - buf->f_bfree = (res.fbytes + blockres) >> blockbits; > - buf->f_bavail = (res.abytes + blockres) >> blockbits; > + > + buf->f_blocks = res.tbytes / buf->f_bsize; > + buf->f_bfree = res.fbytes / buf->f_bsize; > + buf->f_bavail = res.abytes / buf->f_bsize; > > buf->f_files = res.tfiles; > buf->f_ffree = res.afiles; As far as I can tell, all we're doing here is changing rounding up to rounding down, since dentry->d_sb->s_blocksize should always equal to (1 << dentry->d_sb->s_blocksize_bits). Otherwise, switching from a shift to division seems like it is just making the calculation less efficient. Why?
On Wed, Sep 16, 2020 at 03:14:52PM +0000, Trond Myklebust wrote: > On Thu, 2020-09-10 at 20:06 +0000, Frank van der Linden wrote: > > nfs_statfs rounds up the numbers of blocks as computed > > from the numbers the server return values. > > > > This works out well if the client block size, which is > > the same as wrsize, is smaller than or equal to the actual > > filesystem block size on the server. > > > > But, for NFS4, the size is usually larger (1M), meaning > > that statfs reports more free space than actually is > > available. This confuses, for example, fstest generic/103. > > > > Given a choice between reporting too much or too little > > space, the latter is the safer option, so don't round > > up the number of blocks. This also simplifies the code. > > > > Signed-off-by: Frank van der Linden <fllinden@amazon.com> > > --- > > fs/nfs/super.c | 25 ++++--------------------- > > 1 file changed, 4 insertions(+), 21 deletions(-) > > > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > > index 7a70287f21a2..45d368a5cc95 100644 > > --- a/fs/nfs/super.c > > +++ b/fs/nfs/super.c > > @@ -217,8 +217,6 @@ EXPORT_SYMBOL_GPL(nfs_client_for_each_server); > > int nfs_statfs(struct dentry *dentry, struct kstatfs *buf) > > { > > struct nfs_server *server = NFS_SB(dentry->d_sb); > > - unsigned char blockbits; > > - unsigned long blockres; > > struct nfs_fh *fh = NFS_FH(d_inode(dentry)); > > struct nfs_fsstat res; > > int error = -ENOMEM; > > @@ -241,26 +239,11 @@ int nfs_statfs(struct dentry *dentry, struct > > kstatfs *buf) > > > > buf->f_type = NFS_SUPER_MAGIC; > > > > - /* > > - * Current versions of glibc do not correctly handle the > > - * case where f_frsize != f_bsize. Eventually we want to > > - * report the value of wtmult in this field. > > - */ > > - buf->f_frsize = dentry->d_sb->s_blocksize; > > NACK. This comment and explicit setting is there to document why we're > not propagating the true value of the filesystem block size. Please do > not remove it. It's a comment from 2006, which is a bit outdated. wtmult is an NFSv3 value, and glibc hasn't had that issue for a while. Maybe the comment should be updated? I'm happy to not touch it, though - will leave that change out. > > > - > > - /* > > - * On most *nix systems, f_blocks, f_bfree, and f_bavail > > - * are reported in units of f_frsize. Linux hasn't had > > - * an f_frsize field in its statfs struct until recently, > > - * thus historically Linux's sys_statfs reports these > > - * fields in units of f_bsize. > > - */ > > buf->f_bsize = dentry->d_sb->s_blocksize; > > - blockbits = dentry->d_sb->s_blocksize_bits; > > - blockres = (1 << blockbits) - 1; > > - buf->f_blocks = (res.tbytes + blockres) >> blockbits; > > - buf->f_bfree = (res.fbytes + blockres) >> blockbits; > > - buf->f_bavail = (res.abytes + blockres) >> blockbits; > > + > > + buf->f_blocks = res.tbytes / buf->f_bsize; > > + buf->f_bfree = res.fbytes / buf->f_bsize; > > + buf->f_bavail = res.abytes / buf->f_bsize; > > > > buf->f_files = res.tfiles; > > buf->f_ffree = res.afiles; > > As far as I can tell, all we're doing here is changing rounding up to > rounding down, since dentry->d_sb->s_blocksize should always equal to > (1 << dentry->d_sb->s_blocksize_bits). > > Otherwise, switching from a shift to division seems like it is just > making the calculation less efficient. Why? Well, my thinking here was that using a straight division made the code simpler, and it's not a code path that is performance-sensitive. But, I forgot about needing a div64 macro for 32bit archs anyway, which kind of undoes the 'make the code simpler' argument a bit.. I'll change it to just rounding down (e.g. remove blockres from the equation). Thanks, - Frank
diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 7a70287f21a2..45d368a5cc95 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -217,8 +217,6 @@ EXPORT_SYMBOL_GPL(nfs_client_for_each_server); int nfs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct nfs_server *server = NFS_SB(dentry->d_sb); - unsigned char blockbits; - unsigned long blockres; struct nfs_fh *fh = NFS_FH(d_inode(dentry)); struct nfs_fsstat res; int error = -ENOMEM; @@ -241,26 +239,11 @@ int nfs_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_type = NFS_SUPER_MAGIC; - /* - * Current versions of glibc do not correctly handle the - * case where f_frsize != f_bsize. Eventually we want to - * report the value of wtmult in this field. - */ - buf->f_frsize = dentry->d_sb->s_blocksize; - - /* - * On most *nix systems, f_blocks, f_bfree, and f_bavail - * are reported in units of f_frsize. Linux hasn't had - * an f_frsize field in its statfs struct until recently, - * thus historically Linux's sys_statfs reports these - * fields in units of f_bsize. - */ buf->f_bsize = dentry->d_sb->s_blocksize; - blockbits = dentry->d_sb->s_blocksize_bits; - blockres = (1 << blockbits) - 1; - buf->f_blocks = (res.tbytes + blockres) >> blockbits; - buf->f_bfree = (res.fbytes + blockres) >> blockbits; - buf->f_bavail = (res.abytes + blockres) >> blockbits; + + buf->f_blocks = res.tbytes / buf->f_bsize; + buf->f_bfree = res.fbytes / buf->f_bsize; + buf->f_bavail = res.abytes / buf->f_bsize; buf->f_files = res.tfiles; buf->f_ffree = res.afiles;
nfs_statfs rounds up the numbers of blocks as computed from the numbers the server return values. This works out well if the client block size, which is the same as wrsize, is smaller than or equal to the actual filesystem block size on the server. But, for NFS4, the size is usually larger (1M), meaning that statfs reports more free space than actually is available. This confuses, for example, fstest generic/103. Given a choice between reporting too much or too little space, the latter is the safer option, so don't round up the number of blocks. This also simplifies the code. Signed-off-by: Frank van der Linden <fllinden@amazon.com> --- fs/nfs/super.c | 25 ++++--------------------- 1 file changed, 4 insertions(+), 21 deletions(-)