[v7,0/4] xfs: Remove wrappers for some semaphores

Message ID 20200320210317.1071747-1-preichl@redhat.com

Message

Pavel Reichl March 20, 2020, 9:03 p.m. UTC
Remove some wrappers that we have in XFS around the read-write semaphore
locks.

The goal of this cleanup is to remove mrlock_t structure and its mr*()
wrapper functions and replace it with native rw_semaphore type and its
native calls.
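
For illustration, this is roughly the kind of wrapper being removed and
what replaces it (paraphrased from fs/xfs/mrlock.h; the ifdefs and the
full set of mr*() helpers are trimmed):

/* Before: mrlock_t wraps an rw_semaphore plus a debug-only writer flag. */
typedef struct {
    struct rw_semaphore mr_lock;
#if defined(DEBUG) || defined(XFS_WARN)
    int                 mr_writer;   /* nonzero while write-locked */
#endif
} mrlock_t;

static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
{
    down_write_nested(&mrp->mr_lock, subclass);
#if defined(DEBUG) || defined(XFS_WARN)
    mrp->mr_writer = 1;
#endif
}

static inline void mrunlock_excl(mrlock_t *mrp)
{
#if defined(DEBUG) || defined(XFS_WARN)
    mrp->mr_writer = 0;
#endif
    up_write(&mrp->mr_lock);
}

/*
 * After the series the inode embeds a plain rw_semaphore, so callers do
 *     down_write_nested(&ip->i_lock, subclass);   (was mrupdate_nested)
 *     up_write(&ip->i_lock);                      (was mrunlock_excl)
 * directly, with no wrapper and no mr_writer bookkeeping.
 */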

Pavel Reichl (4):
  xfs: Refactor xfs_isilocked()
  xfs: clean up whitespace in xfs_isilocked() calls
  xfs: xfs_isilocked() can only check a single lock type
  xfs: replace mrlock_t with rw_semaphores

 fs/xfs/libxfs/xfs_bmap.c |   8 +--
 fs/xfs/mrlock.h          |  78 -----------------------------
 fs/xfs/xfs_file.c        |   3 +-
 fs/xfs/xfs_inode.c       | 104 ++++++++++++++++++++++++++-------------
 fs/xfs/xfs_inode.h       |  25 ++++++----
 fs/xfs/xfs_iops.c        |   4 +-
 fs/xfs/xfs_linux.h       |   2 +-
 fs/xfs/xfs_qm.c          |   2 +-
 fs/xfs/xfs_super.c       |   6 +--
 9 files changed, 98 insertions(+), 134 deletions(-)
 delete mode 100644 fs/xfs/mrlock.h

Comments

Darrick J. Wong March 23, 2020, 3:28 a.m. UTC | #1
On Fri, Mar 20, 2020 at 10:03:13PM +0100, Pavel Reichl wrote:
> Remove some wrappers that we have in XFS around the read-write semaphore
> locks.
> 
> The goal of this cleanup is to remove mrlock_t structure and its mr*()
> wrapper functions and replace it with native rw_semaphore type and its
> native calls.

Hmmm, there's something funny about this patchset that causes my fstests
vm to explode with isilocked assertions everywhere... I'll look more
tomorrow (it's still the weekend here) but figured I should tell you
sooner than later.

--D

> Pavel Reichl (4):
>   xfs: Refactor xfs_isilocked()
>   xfs: clean up whitespace in xfs_isilocked() calls
>   xfs: xfs_isilocked() can only check a single lock type
>   xfs: replace mrlock_t with rw_semaphores
> 
>  fs/xfs/libxfs/xfs_bmap.c |   8 +--
>  fs/xfs/mrlock.h          |  78 -----------------------------
>  fs/xfs/xfs_file.c        |   3 +-
>  fs/xfs/xfs_inode.c       | 104 ++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_inode.h       |  25 ++++++----
>  fs/xfs/xfs_iops.c        |   4 +-
>  fs/xfs/xfs_linux.h       |   2 +-
>  fs/xfs/xfs_qm.c          |   2 +-
>  fs/xfs/xfs_super.c       |   6 +--
>  9 files changed, 98 insertions(+), 134 deletions(-)
>  delete mode 100644 fs/xfs/mrlock.h
> 
> -- 
> 2.25.1
>
Pavel Reichl March 23, 2020, 9:22 a.m. UTC | #2
Oh, thanks for the heads up...I'll try to investigate.

On Mon, Mar 23, 2020 at 4:28 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Fri, Mar 20, 2020 at 10:03:13PM +0100, Pavel Reichl wrote:
> > Remove some wrappers that we have in XFS around the read-write semaphore
> > locks.
> >
> > The goal of this cleanup is to remove mrlock_t structure and its mr*()
> > wrapper functions and replace it with native rw_semaphore type and its
> > native calls.
>
> Hmmm, there's something funny about this patchset that causes my fstests
> vm to explode with isilocked assertions everywhere... I'll look more
> tomorrow (it's still the weekend here) but figured I should tell you
> sooner than later.
>
> --D
>
> > Pavel Reichl (4):
> >   xfs: Refactor xfs_isilocked()
> >   xfs: clean up whitespace in xfs_isilocked() calls
> >   xfs: xfs_isilocked() can only check a single lock type
> >   xfs: replace mrlock_t with rw_semaphores
> >
> >  fs/xfs/libxfs/xfs_bmap.c |   8 +--
> >  fs/xfs/mrlock.h          |  78 -----------------------------
> >  fs/xfs/xfs_file.c        |   3 +-
> >  fs/xfs/xfs_inode.c       | 104 ++++++++++++++++++++++++++-------------
> >  fs/xfs/xfs_inode.h       |  25 ++++++----
> >  fs/xfs/xfs_iops.c        |   4 +-
> >  fs/xfs/xfs_linux.h       |   2 +-
> >  fs/xfs/xfs_qm.c          |   2 +-
> >  fs/xfs/xfs_super.c       |   6 +--
> >  9 files changed, 98 insertions(+), 134 deletions(-)
> >  delete mode 100644 fs/xfs/mrlock.h
> >
> > --
> > 2.25.1
> >
>
Darrick J. Wong March 23, 2020, 4:33 p.m. UTC | #3
On Mon, Mar 23, 2020 at 10:22:02AM +0100, Pavel Reichl wrote:
> Oh, thanks for the heads up...I'll try to investigate.

Ahah, I figured it out.  It took me a while to pin down a solid reproducer,
but here's a stack trace that I see most often:

[  812.102819] XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 91
[  812.104017] ------------[ cut here ]------------
[  812.104598] WARNING: CPU: 2 PID: 26250 at fs/xfs/xfs_message.c:112 assfail+0x30/0x50 [xfs]
[  812.105505] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables iptable_filter bfq sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs]
[  812.108176] CPU: 2 PID: 26250 Comm: kworker/2:1 Tainted: G        W         5.6.0-rc4-djw #rc4
[  812.110742] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
[  812.111724] Workqueue: xfsalloc xfs_btree_split_worker [xfs]
[  812.112404] RIP: 0010:assfail+0x30/0x50 [xfs]
[  812.112905] Code: 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 10 5d 37 a0 e8 c5 f8 ff ff 0f b6 1d fa 5b 11 00 80 fb 01 0f 87 aa 1b 06 00 83 e3 01 75 04 <0f> 0b 5b c3 0f 0b 48 c7 c7 80 a6 40 a0 e8 4d b8 0d e1 0f 1f 40 00
[  812.114912] RSP: 0018:ffffc90000c7fc48 EFLAGS: 00010246
[  812.115508] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  812.116286] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0364b4d
[  812.117087] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  812.117875] R10: 000000000000000a R11: f000000000000000 R12: ffff888078695700
[  812.118661] R13: 0000000000000000 R14: ffff8880787c4460 R15: 0000000000000000
[  812.119445] FS:  0000000000000000(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
[  812.120311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  812.120969] CR2: 00007fe67192deb8 CR3: 00000000713a1002 CR4: 00000000001606a0
[  812.121755] Call Trace:
[  812.122097]  xfs_trans_log_inode+0x128/0x140 [xfs]
[  812.122685]  xfs_bmbt_alloc_block+0x133/0x230 [xfs]
[  812.123286]  __xfs_btree_split+0x16d/0x980 [xfs]
[  812.123876]  xfs_btree_split_worker+0x61/0x90 [xfs]
[  812.124446]  process_one_work+0x250/0x5c0
[  812.124910]  ? worker_thread+0xcf/0x3a0
[  812.125380]  worker_thread+0x3d/0x3a0
[  812.125807]  ? process_one_work+0x5c0/0x5c0
[  812.126297]  kthread+0x121/0x140
[  812.126684]  ? kthread_park+0x80/0x80
[  812.127111]  ret_from_fork+0x3a/0x50
[  812.127545] irq event stamp: 75298
[  812.127947] hardirqs last  enabled at (75297): [<ffffffff810d7f98>] console_unlock+0x428/0x580
[  812.128895] hardirqs last disabled at (75298): [<ffffffff81001db0>] trace_hardirqs_off_thunk+0x1a/0x1c
[  812.129942] softirqs last  enabled at (74892): [<ffffffffa02c9f2b>] xfs_buf_find+0xa6b/0x1130 [xfs]
[  812.130984] softirqs last disabled at (74890): [<ffffffffa02c98c0>] xfs_buf_find+0x400/0x1130 [xfs]
[  812.131993] ---[ end trace 82fd2c9f1faba927 ]---

This one I could reproduce on my laptop by running generic/324 with a 1k
blocksize, but on my test vm fleet it was xfs/057 with 4k blocks.  YMMV.

Anyway -- I augmented isilocked with some extra trace_printk to see what
was going on, and noticed this:

161011:           <...>-28177 [000]   803.593121: xfs_ilock:            dev 8:80 ino 0x46 flags ILOCK_EXCL caller xfs_alloc_file_space
161022:           <...>-28177 [000]   803.593126: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161038:           <...>-28177 [000]   803.593132: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161048:           <...>-28177 [000]   803.593136: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161064:           <...>-28177 [000]   803.593172: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161083:           <...>-27081 [000]   803.593559: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 0
<assertion blows up>
162017:           <...>-28177 [001]   803.641200: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162036:           <...>-28177 [001]   803.641215: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162057:           <...>-28177 [001]   803.641224: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162073:           <...>-28177 [001]   803.641243: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162091:           <...>-28177 [001]   803.641260: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162106:           <...>-28177 [001]   803.641288: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162127:           <...>-28177 [001]   803.641318: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162140:           <...>-28177 [001]   803.641397: bprint:               xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162158:           <...>-28177 [001]   803.641416: xfs_iunlock:          dev 8:80 ino 0x46 flags ILOCK_EXCL caller xfs_alloc_file_space

Process 28177 takes the ilock, and tries to do a bmapi_write which
involves a bmap btree split.  The kernel libxfs delegates the split to a
workqueue (apparently to reduce stack usage?), which performs the bmbt
split and logs the inode core.  The kworker is process 27081.
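
For reference, the hand-off has roughly this shape (a simplified sketch,
not the real xfs_btree_split(); the actual args struct carries the btree
cursor, level, keys and so on, and do_split() here is a made-up stand-in
for __xfs_btree_split()):

#include <linux/completion.h>
#include <linux/workqueue.h>

struct split_args {
    struct work_struct  work;
    struct completion   done;
    int                 result;
};

static int do_split(struct split_args *args)    /* hypothetical split body */
{
    return 0;
}

static void split_worker(struct work_struct *work)
{
    struct split_args *args = container_of(work, struct split_args, work);

    /* Runs in a kworker, i.e. NOT the task that took ILOCK_EXCL. */
    args->result = do_split(args);
    complete(&args->done);
}

/* Caller, still holding the inode's ILOCK_EXCL: */
static int punt_split(struct workqueue_struct *wq)
{
    struct split_args args = { .result = 0 };

    init_completion(&args.done);
    INIT_WORK_ONSTACK(&args.work, split_worker);
    queue_work(wq, &args.work);          /* split runs on a fresh kworker stack */
    wait_for_completion(&args.done);     /* original task just waits */
    destroy_work_on_stack(&args.work);
    return args.result;
}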

lockdep tracks the rwsem's lock state /and/ which process actually
holds the rwsem.  This ownership doesn't transfer from 28177 to 27081,
so when the kworker asks lockdep if it holds ILOCK_EXCL, lockdep says
no, because 27081 doesn't own the lock, 28177 does.  Kaboom.

The old mrlock_t had that 'int mr_writer' field which didn't care about
lock ownership and so isilocked would return true and so the assert was
happy.
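
To make the difference concrete, the two checks look roughly like this
(the old one paraphrased from the mrlock-era xfs_isilocked(); the new one
is a sketch of the lockdep-based idea, not the exact helper from this
series):

/* Old, mrlock-based check: "is the lock write-held by anyone at all?" */
static bool ilock_excl_held_old(struct xfs_inode *ip)
{
    return !!ip->i_lock.mr_writer;          /* mr_writer ignores ownership */
}

/*
 * New, lockdep-based check: "does the *current task* hold it?"  With
 * i_lock now a bare rw_semaphore, this returns false in the kworker even
 * though the lock really is held by the submitting task.
 */
static bool ilock_excl_held_new(struct xfs_inode *ip)
{
    if (debug_locks)
        return lockdep_is_held(&ip->i_lock);
    return rwsem_is_locked(&ip->i_lock);    /* fallback when lockdep is off */
}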

So now comes the fun part -- the old isilocked code had a glaring hole
in which it would return true if *anyone* held the lock, even if the
owner is some other unrelated thread.  That's probably good enough for
most of the fstests because we generally only run one thread at a time,
and developers will probably notice. :)

However, with your series applied, the isilocked function becomes much
more powerful when lockdep is active because now we can test that the
lock is held *by the caller*, which closes that hole.

Unfortunately, it also trips over this bmap split case, so if there's a
way to solve that problem we'd better find it quickly.  Unfortunately, I
don't know of a way to gift a lock to another thread temporarily...

Thoughts?

--D

> On Mon, Mar 23, 2020 at 4:28 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Fri, Mar 20, 2020 at 10:03:13PM +0100, Pavel Reichl wrote:
> > > Remove some wrappers that we have in XFS around the read-write semaphore
> > > locks.
> > >
> > > The goal of this cleanup is to remove mrlock_t structure and its mr*()
> > > wrapper functions and replace it with native rw_semaphore type and its
> > > native calls.
> >
> > Hmmm, there's something funny about this patchset that causes my fstests
> > vm to explode with isilocked assertions everywhere... I'll look more
> > tomorrow (it's still the weekend here) but figured I should tell you
> > sooner than later.
> >
> > --D
> >
> > > Pavel Reichl (4):
> > >   xfs: Refactor xfs_isilocked()
> > >   xfs: clean up whitespace in xfs_isilocked() calls
> > >   xfs: xfs_isilocked() can only check a single lock type
> > >   xfs: replace mrlock_t with rw_semaphores
> > >
> > >  fs/xfs/libxfs/xfs_bmap.c |   8 +--
> > >  fs/xfs/mrlock.h          |  78 -----------------------------
> > >  fs/xfs/xfs_file.c        |   3 +-
> > >  fs/xfs/xfs_inode.c       | 104 ++++++++++++++++++++++++++-------------
> > >  fs/xfs/xfs_inode.h       |  25 ++++++----
> > >  fs/xfs/xfs_iops.c        |   4 +-
> > >  fs/xfs/xfs_linux.h       |   2 +-
> > >  fs/xfs/xfs_qm.c          |   2 +-
> > >  fs/xfs/xfs_super.c       |   6 +--
> > >  9 files changed, 98 insertions(+), 134 deletions(-)
> > >  delete mode 100644 fs/xfs/mrlock.h
> > >
> > > --
> > > 2.25.1
> > >
> >
>
Christoph Hellwig March 23, 2020, 4:49 p.m. UTC | #4
On Mon, Mar 23, 2020 at 09:33:42AM -0700, Darrick J. Wong wrote:
> lockdep tracks the rwsem's lock state /and/ which process actually
> holds the rwsem.  This ownership doesn't transfer from 28177 to 27081,
> so when the kworker asks lockdep if it holds ILOCK_EXCL, lockdep says
> no, because 27081 doesn't own the lock, 28177 does.  Kaboom.

> The old mrlock_t had that 'int mr_writer' field which didn't care about
> lock ownership and so isilocked would return true and so the assert was
> happy.
> 
> So now comes the fun part -- the old isilocked code had a glaring hole
> in which it would return true if *anyone* held the lock, even if the
> owner is some other unrelated thread.  That's probably good enough for
> most of the fstests because we generally only run one thread at a time,
> and developers will probably notice. :)
> 
> However, with your series applied, the isilocked function becomes much
> more powerful when lockdep is active because now we can test that the
> lock is held *by the caller*, which closes that hole.
> 
> Unfortunately, it also trips over this bmap split case, so if there's a
> way to solve that problem we'd better find it quickly.  Unfortunately, I
> don't know of a way to gift a lock to another thread temporarily...
> 
> Thoughts?

lock_release() in the donor, lock_acquire in the worker context, and
vice versa on return.  We already do that dance indirectly with
xfs_setfilesize_trans_alloc and friends.
Dave Chinner March 23, 2020, 9:59 p.m. UTC | #5
On Mon, Mar 23, 2020 at 09:33:42AM -0700, Darrick J. Wong wrote:
> On Mon, Mar 23, 2020 at 10:22:02AM +0100, Pavel Reichl wrote:
> > Oh, thanks for the heads up...I'll try to investigate.
> 
> Ahah, I figured it out.  It took me a while to pin down a solid reproducer,
> but here's a stack trace that I see most often:
....
> lockdep tracks the rwsem's lock state /and/ which process actually
> holds the rwsem.  This ownership doesn't transfer from 28177 to 27081,
> so when the kworker asks lockdep if it holds ILOCK_EXCL, lockdep says
> no, because 27081 doesn't own the lock, 28177 does.  Kaboom.

Yeah, Linux has this weird idea that semaphores can only be accessed
from a single process context, like a mutex. We've beaten our head
against this many times, and it's the only reason struct semaphore
still exists and everything isn't a mutex.

rwsems now do crazy stuff like track the owner for spinning
optimisations, despite the fact it's a semaphore and, by definition,
can be released by non-owner contexts....

> The old mrlock_t had that 'int mr_writer' field which didn't care about
> lock ownership and so isilocked would return true and so the assert was
> happy.
> 
> So now comes the fun part -- the old isilocked code had a glaring hole
> in which it would return true if *anyone* held the lock, even if the
> owner is some other unrelated thread.  That's probably good enough for
> most of the fstests because we generally only run one thread at a time,
> and developers will probably notice. :)

No, that's not a bug, that's how semaphores work - semaphores
protect the object, not the process context that took the lock.

i.e. the difference between a mutex and a semaphore is that the
mutex protects a code path from running concurrently with other
processes, while a semaphore protects an object from other accesses,
even when they don't have a process context associated with them
(e.g. while they are under IO).

> However, with your series applied, the isilocked function becomes much
> more powerful when lockdep is active because now we can test that the
> lock is held *by the caller*, which closes that hole.

Not really, it's just another "lockdep behaviour is wrong" issue we
have to work around. When we switch to a worker, we need to release
and acquire the lockdep context so that it thinks the current
process working on the object owns the lock.

> Unfortunately, it also trips over this bmap split case, so if there's a
> way to solve that problem we'd better find it quickly.  Unfortunately, I
> don't know of a way to gift a lock to another thread temporarily...

rwsem_release() when the work is queued, rwsem_acquire() when the
work is run. And vice versa when the work is done.
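
A minimal sketch of that dance (helper names made up; the
rwsem_acquire()/rwsem_release() argument lists differ slightly between
kernel versions, and dep_map only exists with CONFIG_DEBUG_LOCK_ALLOC):

#include <linux/completion.h>
#include <linux/lockdep.h>
#include <linux/rwsem.h>
#include <linux/workqueue.h>

/* Queueing task: it really holds the ILOCK rwsem for write at this point. */
static void punt_split_with_ilock(struct rw_semaphore *ilock,
                                  struct workqueue_struct *wq,
                                  struct work_struct *work,
                                  struct completion *done)
{
    rwsem_release(&ilock->dep_map, _THIS_IP_);        /* lend it to the worker */
    queue_work(wq, work);
    wait_for_completion(done);
    rwsem_acquire(&ilock->dep_map, 0, 0, _THIS_IP_);  /* and take it back */
}

/* Worker side, bracketing the split and the xfs_trans_log_inode() call: */
static void split_worker_body(struct rw_semaphore *ilock)
{
    rwsem_acquire(&ilock->dep_map, 0, 0, _THIS_IP_);  /* claim ownership */
    /* ... do the bmbt split and log the inode; xfs_isilocked() now passes ... */
    rwsem_release(&ilock->dep_map, _THIS_IP_);        /* hand it back */
}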

Cheers,

Dave.