mbox series

[RFC,0/3] vfs: convert inode cache to hlist-bl

Message ID 20210406123343.1739669-1-david@fromorbit.com (mailing list archive)
Headers show
Series vfs: convert inode cache to hlist-bl | expand

Message

Dave Chinner April 6, 2021, 12:33 p.m. UTC
Hi folks,

Recently I've been doing some scalability characterisation of
various filesystems, and one of the limiting factors that has
prevented me from exploring filesystem characteristics is the
inode hash table. namely, the global inode_hash_lock that protects
it.

This has long been a problem, but I personally haven't cared about
it because, well, XFS doesn't use it and so it's not a limiting
factor for most of my work. However, in trying to characterise the
scalability boundaries of bcachefs, I kept hitting against VFS
limitations first. bcachefs hits the inode hash table pretty hard
and it becaomse a contention point a lot sooner than it does for
ext4. Btrfs also uses the inode hash, but it's namespace doesn't
have the capability to stress the indoe hash lock due to it hitting
internal contention first.

Long story short, I did what should have been done a decade or more
ago - I converted the inode hash table to use hlist-bl to split up
the global lock. This is modelled on the dentry cache, with one
minor tweak. That is, the inode hash value cannot be calculated from
the inode, so we have to keep a record of either the hash value or a
pointer to the hlist-bl list head that the inode is hashed into so
taht we can lock the corect list on removal.

Other than that, this is mostly just a mechanical conversion from
one list and lock type to another. None of the algorithms have
changed and none of the RCU behaviours have changed. But it removes
the inode_hash_lock from the picture and so performance for bcachefs
goes way up and CPU usage for ext4 halves at 16 and 32 threads. At
higher thread counts, we start to hit filesystem and other VFS locks
as the limiting factors. Profiles and performance numbers are in
patch 3 for those that are curious.

I've been running this in benchmarks and perf testing across
bcachefs, btrfs and ext4 for a couple of weeks, and it passes
fstests on ext4 and btrfs without regressions. So now it needs more
eyes and testing and hopefully merging....

Cheers,

Dave.

Comments

Kent Overstreet April 6, 2021, 10:03 p.m. UTC | #1
On Tue, Apr 06, 2021 at 10:33:40PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> Recently I've been doing some scalability characterisation of
> various filesystems, and one of the limiting factors that has
> prevented me from exploring filesystem characteristics is the
> inode hash table. namely, the global inode_hash_lock that protects
> it.
> 
> This has long been a problem, but I personally haven't cared about
> it because, well, XFS doesn't use it and so it's not a limiting
> factor for most of my work. However, in trying to characterise the
> scalability boundaries of bcachefs, I kept hitting against VFS
> limitations first. bcachefs hits the inode hash table pretty hard
> and it becaomse a contention point a lot sooner than it does for
> ext4. Btrfs also uses the inode hash, but it's namespace doesn't
> have the capability to stress the indoe hash lock due to it hitting
> internal contention first.
> 
> Long story short, I did what should have been done a decade or more
> ago - I converted the inode hash table to use hlist-bl to split up
> the global lock. This is modelled on the dentry cache, with one
> minor tweak. That is, the inode hash value cannot be calculated from
> the inode, so we have to keep a record of either the hash value or a
> pointer to the hlist-bl list head that the inode is hashed into so
> taht we can lock the corect list on removal.
> 
> Other than that, this is mostly just a mechanical conversion from
> one list and lock type to another. None of the algorithms have
> changed and none of the RCU behaviours have changed. But it removes
> the inode_hash_lock from the picture and so performance for bcachefs
> goes way up and CPU usage for ext4 halves at 16 and 32 threads. At
> higher thread counts, we start to hit filesystem and other VFS locks
> as the limiting factors. Profiles and performance numbers are in
> patch 3 for those that are curious.
> 
> I've been running this in benchmarks and perf testing across
> bcachefs, btrfs and ext4 for a couple of weeks, and it passes
> fstests on ext4 and btrfs without regressions. So now it needs more
> eyes and testing and hopefully merging....

These patches have been in the bcachefs repo for a bit with no issues, and they
definitely do help with performance - thanks, Dave!