Message ID | 20191024154455.19370-3-jthumshirn@suse.de (mailing list archive) |
---|---|
State | New, archived |
Series | Provide an estimation of (free/total) inodes in statfs |
On 2019/10/24 11:44 PM, Johannes Thumshirn wrote:
> On the BeeGFS Mailing list there is a report claiming BTRFS is not usable
> with BeeGFS, as BeeGFS is using statfs output to determine the number of
> total and free inodes. BeeGFS needs the number of free inodes as it stores
> its meta-data either in extended attributes of the underlying file-system
> or directly in an inline inode. According to the BeeGFS Server Tuning
> Guide:
>
> """
> BeeGFS metadata is stored as extended attributes (EAs) on the underlying
> file system to optimal performance. One metadata file will be created for
> each file that a user creates. About extended attributes usage: BeeGFS
> Metadata files have a size of 0 bytes (i.e. no normal file contents).
>
> Access to extended attributes is possible with the getfattr tool.
>
> If the inodes of the underlying file system are sufficiently large, EAs
> can be inlined into the inode of the underlying file system. Additional
> data blocks are then not required anymore and metadata disk usage will be
> reduced. With EAs inlined into the inode, access latencies are reduced as
> seeking to an extra data block is not required anymore.
> """

Personally speaking, reporting 0 used and 0 free should be the proper
way. Users of the fs should be aware that a dynamic fs does not have a
fixed number of inodes.

I really think it's BeeGFS' job to change their behavior, since there
are more things to consider when faking the used/free inodes.

>
> Provide some estimated numbers of total and free inodes in statfs by
> dividing the number of blocks by the size of an inode-item for the total
> number of possible inodes and for the number of free inodes divide the
> number of free blocks by the size of an inode-item, similar to what other
> file-systems without a fixed number of inodes do.
>
> This is just an estimation and should not be relied upon.
>
> Without the patch applied:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
> /mnt/test      btrfs      0     0     0     - /mnt/test
>
> With the patch applied on an empty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs   1.6K     0  1.6K    0% /mnt/test
>
> With the patch applied on a dirty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs   1.6K  1.5K   197   88% /mnt/test
>
> Link: https://groups.google.com/forum/#!msg/fhgfs-user/IJqGS5o1UD0/8ftDdUI3AQAJ
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  fs/btrfs/super.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b818f764c1c9..6f6f6a70eb1e 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2068,6 +2068,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
>  	buf->f_blocks >>= bits;
>  	buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
> +	buf->f_files = div_u64(buf->f_blocks, sizeof(struct btrfs_inode_item));

That's too optimistic. (I'd even call it beyond Elon Musk's schedule.)

We have tree block header overhead, and as the number of tree blocks
grows, the extent tree grows too and adds further overhead.

In the long run, users will report that f_files increases more than
what they used.

It will be hell to calculate such an estimation, and we will never
reach a good enough point for that.

> +	buf->f_ffree = div_u64(buf->f_bfree, sizeof(struct btrfs_inode_item));

The same applies to f_ffree: it will decrease faster than real usage.

Whatever distributed fs is using f_ffree/f_files as an indicator, it's
not reliable anyway.
And if they accept such an unreliable indicator, they had better think
twice before using it.

Thanks,
Qu

>
>  	/* Account global block reserve as used, it's in logical size already */
>  	spin_lock(&block_rsv->lock);
>
On 25/10/2019 02:56, Qu Wenruo wrote:
[...]
> Personally speaking, reporting 0 used and 0 free should be the proper
> way. Users of the fs should be aware that a dynamic fs does not have a
> fixed number of inodes.
>
> I really think it's BeeGFS' job to change their behavior, since there
> are more things to consider when faking the used/free inodes.

I'm with you on this. It is something BeeGFS has to fix, but judging
from what other file-systems do: some have a real fixed number of
inodes, some assign 0 or -1, some do not touch the variable at all and
some (e.g. xfs) fake a number.

My role model was xfs here.
On Thu, Oct 24, 2019 at 05:44:55PM +0200, Johannes Thumshirn wrote:
> On the BeeGFS Mailing list there is a report claiming BTRFS is not usable
> with BeeGFS, as BeeGFS is using statfs output to determine the number of
> total and free inodes. BeeGFS needs the number of free inodes as it stores
> its meta-data either in extended attributes of the underlying file-system
> or directly in an inline inode. According to the BeeGFS Server Tuning
> Guide:
>
> """
> BeeGFS metadata is stored as extended attributes (EAs) on the underlying
> file system to optimal performance. One metadata file will be created for
> each file that a user creates. About extended attributes usage: BeeGFS
> Metadata files have a size of 0 bytes (i.e. no normal file contents).

That's not really a typical use of files and the 'optimal performance'
claim would need some clarification.

> Access to extended attributes is possible with the getfattr tool.
>
> If the inodes of the underlying file system are sufficiently large, EAs
> can be inlined into the inode of the underlying file system. Additional
> data blocks are then not required anymore and metadata disk usage will be
> reduced. With EAs inlined into the inode, access latencies are reduced as
> seeking to an extra data block is not required anymore.

So this describes how it's implemented in EXT4, and BeeGFS is probably
tuned to work 'optimally' there.

> """
>
> Provide some estimated numbers of total and free inodes in statfs by
> dividing the number of blocks by the size of an inode-item for the total
> number of possible inodes and for the number of free inodes divide the
> number of free blocks by the size of an inode-item, similar to what other
> file-systems without a fixed number of inodes do.
>
> This is just an estimation and should not be relied upon.

This is the most problematic part. The inode counts cannot be calculated
exactly on btrfs, because of the dynamic nature of the space usage. We
can only give rough estimates of "how the rest of unallocated space
would be used if [assumptions]". We have this problem with explaining
'df' values and now somebody is asking for the same with 'df -i'.

The Inodes/IFree numbers are intentionally zero, to keep monitoring
tools from reporting low inode counts. However, I can't find a documented
and standardized interpretation of the numbers; the manual page of
statfs only says

	fsfilcnt_t f_files;   /* Total file nodes in filesystem */
	fsfilcnt_t f_ffree;   /* Free file nodes in filesystem */

for the respective fields. And nothing else.

For traditional filesystems, and EXT2/3/4 in particular, the inodes are
preallocated at creation time so calculating the numbers is easy. I
believe XFS does that too without the option inode64, so users are used
to seeing non-zero values and nowadays they have to be faked. That makes
sense from a backward compatibility POV. But still the numbers are made
up and can change unexpectedly.

Btrfs has reported 0/0/0 since the beginning to not confuse monitoring
tools, yet this is exactly what can be seen at
https://groups.google.com/forum/#!msg/fhgfs-user/IJqGS5o1UD0/8ftDdUI3AQAJ

I'd say fix your monitoring tool not to interpret the result as 0% free
inodes in case there's also 0 in total. This is not even a
btrfs-specific fix; IMHO this is interpreting the numbers in the wrong
way.

> Without the patch applied:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
> /mnt/test      btrfs      0     0     0     - /mnt/test
>
> With the patch applied on an empty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs   1.6K     0  1.6K    0% /mnt/test
>
> With the patch applied on a dirty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs   1.6K  1.5K   197   88% /mnt/test

At the moment I object to conjuring up numbers like that. It's perhaps
going to silence some tools but would cause lots of questions because
the numbers otherwise don't reflect reality, not even close.

We try hard to make the regular Allocated/Free space numbers match
users' expectations, but it's not perfect and can't be made much better.
And I'm glad we have a simple answer to the inode counts.

Should the discussion continue, it would be good to have interested
people from BeeGFS on CC.
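The suggestion above, that a tool should not read "0 total, 0 free" as "0% free", could look roughly like the following user-space sketch. The helper names `inode_use_percent` and `path_inode_use_percent` are hypothetical and not taken from any existing tool; only `statvfs(3)` and its `f_files`/`f_ffree` fields are real interfaces.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: compute the percentage of used inodes.
 * Returns false when the filesystem reports no inode total
 * (f_files == 0, as btrfs does) or inconsistent counts, so the
 * caller can print "-" instead of a misleading percentage.
 */
static bool inode_use_percent(uint64_t f_files, uint64_t f_ffree,
			      unsigned *pct)
{
	if (f_files == 0 || f_ffree > f_files)
		return false;
	/* Floating point avoids overflow of (used * 100) for huge counts. */
	*pct = (unsigned)((double)(f_files - f_ffree) * 100.0 / (double)f_files);
	return true;
}

/* Query a mount point; false on error or when no counts are reported. */
static bool path_inode_use_percent(const char *path, unsigned *pct)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return false;
	return inode_use_percent(st.f_files, st.f_ffree, pct);
}
```

A caller would treat a `false` return as "inode accounting not available" and skip any low-inode warning, which matches the `-` that df prints in the IUse% column for the unpatched case.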
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b818f764c1c9..6f6f6a70eb1e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2068,6 +2068,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
 	buf->f_blocks >>= bits;
 	buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
+	buf->f_files = div_u64(buf->f_blocks, sizeof(struct btrfs_inode_item));
+	buf->f_ffree = div_u64(buf->f_bfree, sizeof(struct btrfs_inode_item));
 
 	/* Account global block reserve as used, it's in logical size already */
 	spin_lock(&block_rsv->lock);
On the BeeGFS Mailing list there is a report claiming BTRFS is not usable
with BeeGFS, as BeeGFS is using statfs output to determine the number of
total and free inodes. BeeGFS needs the number of free inodes as it stores
its meta-data either in extended attributes of the underlying file-system
or directly in an inline inode. According to the BeeGFS Server Tuning
Guide:

"""
BeeGFS metadata is stored as extended attributes (EAs) on the underlying
file system to optimal performance. One metadata file will be created for
each file that a user creates. About extended attributes usage: BeeGFS
Metadata files have a size of 0 bytes (i.e. no normal file contents).

Access to extended attributes is possible with the getfattr tool.

If the inodes of the underlying file system are sufficiently large, EAs
can be inlined into the inode of the underlying file system. Additional
data blocks are then not required anymore and metadata disk usage will be
reduced. With EAs inlined into the inode, access latencies are reduced as
seeking to an extra data block is not required anymore.
"""

Provide some estimated numbers of total and free inodes in statfs by
dividing the number of blocks by the size of an inode-item for the total
number of possible inodes and for the number of free inodes divide the
number of free blocks by the size of an inode-item, similar to what other
file-systems without a fixed number of inodes do.

This is just an estimation and should not be relied upon.

Without the patch applied:
rapido1:/# df -hTi /mnt/test
Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
/mnt/test      btrfs      0     0     0     - /mnt/test

With the patch applied on an empty fs:
rapido1:/# df -hTi /mnt/test
Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
/dev/zram0     btrfs   1.6K     0  1.6K    0% /mnt/test

With the patch applied on a dirty fs:
rapido1:/# df -hTi /mnt/test
Filesystem     Type  Inodes IUsed IFree IUse% Mounted on
/dev/zram0     btrfs   1.6K  1.5K   197   88% /mnt/test

Link: https://groups.google.com/forum/#!msg/fhgfs-user/IJqGS5o1UD0/8ftDdUI3AQAJ
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/super.c | 2 ++
 1 file changed, 2 insertions(+)
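For a sense of scale, the estimate described in the commit message reduces to one integer division per field. The sketch below is a user-space illustration only, not kernel code: `estimate_inodes` and `INODE_ITEM_SIZE` are hypothetical names, and the 160-byte item size is an assumption that is not stated anywhere in the thread. Note that, following the patch literally, the value being divided is a block count (f_blocks has already been shifted by the block-size bits), not a byte count.

```c
#include <stdint.h>

/*
 * Assumed size in bytes of the on-disk struct btrfs_inode_item.
 * This value is an assumption for illustration; the thread does not
 * state it.
 */
#define INODE_ITEM_SIZE 160u

/*
 * Mirrors the arithmetic of the two added lines: a block count
 * (total or free) divided by the size of one inode item.
 */
static uint64_t estimate_inodes(uint64_t blocks)
{
	return blocks / INODE_ITEM_SIZE;
}
```

Assuming a 1 GiB device with 4 KiB blocks (262144 blocks, a guess made purely for illustration), `estimate_inodes(262144)` yields 1638, which is in the same range as the 1.6K shown in the df output above. The estimate takes no account of tree block headers or extent tree growth, which is the overhead raised in the review.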