diff mbox series

soft lockup in xfs/170 on a file system formatted with -m crc=0

Message ID YYVo8ZyKpy4Di0pK@mit.edu (mailing list archive)
State Deferred, archived
Headers show
Series soft lockup in xfs/170 on a file system formatted with -m crc=0 | expand

Commit Message

Theodore Ts'o Nov. 5, 2021, 5:25 p.m. UTC
Is this a known failure?  I can reliably reproduce this soft lockup
running xfs/170 using "gce-xfstests -c xfs/v4 xfs/170" using
v5.15-rc4.  The xfs/v4 test config formats the file system using -m
crc=0 with no special mount options.

I've attached the kernel config that I used; it's the standard one
obtained via "gce-xfstests install-kconfig"[1].


[1] https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/util/install-kconfig

Thanks,

					- Ted

commit adac31869b098d5f85d0930874dcf6a524d128d3
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Nov 5 13:10:35 2021 -0400

    test-appliance: add xfs/170 to xfs/v4's exclude file
    
    The xfs/170 test is reliably causing a soft lockup when run on a file
    system formatted with mkfs.xfs -m crc=0.
    
            CONFIG: xfs/v4
            ZONE:   us-east1-b
            VM STATUS:      timeout on one test (xfs/170)
            SINCE LAST UPDATE:      3h6m10s
            TEST STATUS:    hang
    
    run fstests xfs/170 at 2021-11-05 02:27:05
    ...
    [11024.269799] XFS (dm-1): Unmounting Filesystem
    [11024.695394] XFS (dm-1): Mounting V4 Filesystem
    [11024.731406] XFS (dm-1): Ending clean mount
    [11024.731552] xfs filesystem being mounted at /xt-vdc supports timestamps until 2038 (0x7fffffff)
    watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [dd:9671]
    irq event stamp: 22146924
    hardirqs last  enabled at (22146923): [<ffffffff903cebad>] _raw_spin_unlock_irqrestore+0x2d/0x40
    hardirqs last disabled at (22146924): [<ffffffff903bd4fb>] sysvec_apic_timer_interrupt+0xb/0x90
    softirqs last  enabled at (16886422): [<ffffffff906002ce>] __do_softirq+0x2ce/0x3fd
    softirqs last disabled at (16886399): [<ffffffff8f6bc1e8>] __irq_exit_rcu+0x88/0xb0
    CPU: 1 PID: 9671 Comm: dd Not tainted 5.15.0-rc4-xfstests-00018-g124e7c61deb2 #382
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
    Code: c7 18 53 48 89 f3 48 8b 74 24 10 e8 e5 3b 35 ff 48 89 ef e8 dd 66 35 ff 80 e7 02 74 06 e8 f3 77 3e ff fb 65 ff 0d 4b 82 c4 6f <5b> 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 65 ff 05 32
    RSP: 0018:ffffa055c2913598 EFLAGS: 00000246
    RAX: 000000000151ef6b RBX: 0000000000000282 RCX: 0000000000000040
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff903cebad
    RBP: ffff945d3aff4be0 R08: 0000000000000001 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000080000 R12: 0000000000000000
    R13: ffffa055c2913640 R14: ffff945d13892a10 R15: ffff945d13892a58
    FS:  00007f62407d8580(0000) GS:ffff945dd9400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f6240613000 CR3: 000000014bc04004 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     down_trylock+0x25/0x30
     xfs_buf_trylock+0x17/0x190
     xfs_buf_find+0x1b3/0x4a0
     xfs_buf_get_map+0x44/0x3a0
     xfs_buf_read_map+0x52/0x2e0
     ? xfs_read_agf+0xa3/0x180
     xfs_trans_read_buf_map+0x144/0x430
     ? xfs_read_agf+0xa3/0x180
     ? __lock_acquire+0x3a7/0x6c0
     xfs_read_agf+0xa3/0x180
     xfs_alloc_read_agf+0x4c/0x110
     xfs_alloc_pagf_init+0x27/0x60
     xfs_filestream_pick_ag+0x280/0x530
     xfs_filestream_new_ag+0x87/0x100
     xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
     xfs_bmap_btalloc+0x3e6/0x700
     xfs_bmapi_allocate+0xe4/0x310
     xfs_bmapi_write+0x42a/0x580
     xfs_iomap_write_direct+0x18d/0x210
     xfs_direct_write_iomap_begin+0x3ab/0x690
     ? lock_is_held_type+0x98/0x100
     iomap_iter+0x12b/0x240
     __iomap_dio_rw+0x1ff/0x630
     iomap_dio_rw+0xa/0x30
     xfs_file_dio_write_aligned+0xae/0x1c0
     xfs_file_write_iter+0xd8/0x130
     new_sync_write+0x122/0x1b0
     ? mod_objcg_state+0x180/0x2a0
     vfs_write+0x25d/0x370
     ksys_write+0x68/0xe0
     do_syscall_64+0x3b/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f6240700504
    Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
    RSP: 002b:00007ffdac48f148 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6240700504
    RDX: 0000000000100000 RSI: 00007f6240514000 RDI: 0000000000000001
    RBP: 0000000000100000 R08: 00000000ffffffff R09: 0000000000000000
    R10: ffffffffffffff3b R11: 0000000000000246 R12: 00007f6240514000
    R13: 0000000000000000 R14: 0000000000000000 R15: 00007f6240514000
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Comments

Darrick J. Wong Nov. 6, 2021, 2:10 a.m. UTC | #1
On Fri, Nov 05, 2021 at 01:25:05PM -0400, Theodore Ts'o wrote:
> Is this a known failure?  I can reliably reproduce this soft lockup
> running xfs/170 using "gce-xfstests -c xfs/v4 xfs/170" using
> v5.15-rc4.  The xfs/v4 test config formats the file system using -m
> crc=0 with no special mount options.
> 
> I've attached the kernel config that I used; it's the standard one
> obtained via "gce-xfstests install-kconfig"[1].

Is the system totally idle, or is it still pounding the storage?  The
softlockup looks like we're stuck trying to lock an AGF buffer, which
could just be the result of a long(ish) stall due to other threads or
the log or something else.

I guess "3h6m10s" implies it just died for 3+ hours.

Also, uh... 5.15 didn't prove to be a stable testing base (at least not
without a bunch of other patches to kvm, the memory manager, and the
block layer) until 5.15-rc7.

--D

> 
> 
> [1] https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/util/install-kconfig
> 
> Thanks,
> 
> 					- Ted
> 
> commit adac31869b098d5f85d0930874dcf6a524d128d3
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Fri Nov 5 13:10:35 2021 -0400
> 
>     test-appliance: add xfs/170 to xfs/v4's exclude file
>     
>     The xfs/170 test is reliably causing a soft lockup when run on a file
>     system formatted with mkfs.xfs -m crc=0.
>     
>             CONFIG: xfs/v4
>             ZONE:   us-east1-b
>             VM STATUS:      timeout on one test (xfs/170)
>             SINCE LAST UPDATE:      3h6m10s
>             TEST STATUS:    hang
>     
>     run fstests xfs/170 at 2021-11-05 02:27:05
>     ...
>     [11024.269799] XFS (dm-1): Unmounting Filesystem
>     [11024.695394] XFS (dm-1): Mounting V4 Filesystem
>     [11024.731406] XFS (dm-1): Ending clean mount
>     [11024.731552] xfs filesystem being mounted at /xt-vdc supports timestamps until 2038 (0x7fffffff)
>     watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [dd:9671]
>     irq event stamp: 22146924
>     hardirqs last  enabled at (22146923): [<ffffffff903cebad>] _raw_spin_unlock_irqrestore+0x2d/0x40
>     hardirqs last disabled at (22146924): [<ffffffff903bd4fb>] sysvec_apic_timer_interrupt+0xb/0x90
>     softirqs last  enabled at (16886422): [<ffffffff906002ce>] __do_softirq+0x2ce/0x3fd
>     softirqs last disabled at (16886399): [<ffffffff8f6bc1e8>] __irq_exit_rcu+0x88/0xb0
>     CPU: 1 PID: 9671 Comm: dd Not tainted 5.15.0-rc4-xfstests-00018-g124e7c61deb2 #382
>     Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>     RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>     Code: c7 18 53 48 89 f3 48 8b 74 24 10 e8 e5 3b 35 ff 48 89 ef e8 dd 66 35 ff 80 e7 02 74 06 e8 f3 77 3e ff fb 65 ff 0d 4b 82 c4 6f <5b> 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 65 ff 05 32
>     RSP: 0018:ffffa055c2913598 EFLAGS: 00000246
>     RAX: 000000000151ef6b RBX: 0000000000000282 RCX: 0000000000000040
>     RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff903cebad
>     RBP: ffff945d3aff4be0 R08: 0000000000000001 R09: 0000000000000001
>     R10: 0000000000000000 R11: 0000000000080000 R12: 0000000000000000
>     R13: ffffa055c2913640 R14: ffff945d13892a10 R15: ffff945d13892a58
>     FS:  00007f62407d8580(0000) GS:ffff945dd9400000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 00007f6240613000 CR3: 000000014bc04004 CR4: 00000000003706e0
>     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>     Call Trace:
>      down_trylock+0x25/0x30
>      xfs_buf_trylock+0x17/0x190
>      xfs_buf_find+0x1b3/0x4a0
>      xfs_buf_get_map+0x44/0x3a0
>      xfs_buf_read_map+0x52/0x2e0
>      ? xfs_read_agf+0xa3/0x180
>      xfs_trans_read_buf_map+0x144/0x430
>      ? xfs_read_agf+0xa3/0x180
>      ? __lock_acquire+0x3a7/0x6c0
>      xfs_read_agf+0xa3/0x180
>      xfs_alloc_read_agf+0x4c/0x110
>      xfs_alloc_pagf_init+0x27/0x60
>      xfs_filestream_pick_ag+0x280/0x530
>      xfs_filestream_new_ag+0x87/0x100
>      xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
>      xfs_bmap_btalloc+0x3e6/0x700
>      xfs_bmapi_allocate+0xe4/0x310
>      xfs_bmapi_write+0x42a/0x580
>      xfs_iomap_write_direct+0x18d/0x210
>      xfs_direct_write_iomap_begin+0x3ab/0x690
>      ? lock_is_held_type+0x98/0x100
>      iomap_iter+0x12b/0x240
>      __iomap_dio_rw+0x1ff/0x630
>      iomap_dio_rw+0xa/0x30
>      xfs_file_dio_write_aligned+0xae/0x1c0
>      xfs_file_write_iter+0xd8/0x130
>      new_sync_write+0x122/0x1b0
>      ? mod_objcg_state+0x180/0x2a0
>      vfs_write+0x25d/0x370
>      ksys_write+0x68/0xe0
>      do_syscall_64+0x3b/0x90
>      entry_SYSCALL_64_after_hwframe+0x44/0xae
>     RIP: 0033:0x7f6240700504
>     Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
>     RSP: 002b:00007ffdac48f148 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>     RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6240700504
>     RDX: 0000000000100000 RSI: 00007f6240514000 RDI: 0000000000000001
>     RBP: 0000000000100000 R08: 00000000ffffffff R09: 0000000000000000
>     R10: ffffffffffffff3b R11: 0000000000000246 R12: 00007f6240514000
>     R13: 0000000000000000 R14: 0000000000000000 R15: 00007f6240514000
>     
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
> index a9acba9c..83ccfd79 100644
> --- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
> +++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
> @@ -1,2 +1,16 @@
>  # Normal configurations don't support dax
>  -g dax
> +
> +# On a 5.15-rc4 kernel, xfs/170 reliably causes a soft lockup in
> +# xfs_alloc_read_agf()+0x41/0x110.  Call stack:
> +#
> +# xfs_alloc_pagf_init+0x27/0x60
> +# xfs_filestream_pick_ag+0x280/0x530
> +# xfs_filestream_new_ag+0x87/0x100
> +# xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
> +# xfs_bmap_btalloc+0x3e6/0x700
> +# xfs_bmapi_allocate+0xe4/0x310
> +# xfs_bmapi_convert_delalloc+0x26c/0x480
> +# xfs_map_blocks+0x1b5/0x510
> +# ...
> +xfs/170
Theodore Ts'o Nov. 6, 2021, 2:52 p.m. UTC | #2
On Fri, Nov 05, 2021 at 07:10:31PM -0700, Darrick J. Wong wrote:
> On Fri, Nov 05, 2021 at 01:25:05PM -0400, Theodore Ts'o wrote:
> > Is this a known failure?  I can reliably reproduce this soft lockup
> > running xfs/170 using "gce-xfstests -c xfs/v4 xfs/170" using
> > v5.15-rc4.  The xfs/v4 test config formats the file system using -m
> > crc=0 with no special mount options.
> > 
> > I've attached the kernel config that I used; it's the standard one
> > obtained via "gce-xfstests install-kconfig"[1].
> 
> Also, uh... 5.15 didn't prove to be a stable testing base (at least not
> without a bunch of other patches to kvm, the memory manager, and the
> block layer) until 5.15-rc7.

Thanks, I haven't had a problem using 5.15-rc4 as a testing base for
ext4, so I wasn't aware of that.  I just tested xfs/170 on 5.15 final,
and it is no longer triggering a soft lockup.

Thanks again,

					- Ted
diff mbox series

Patch

diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
index a9acba9c..83ccfd79 100644
--- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
+++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
@@ -1,2 +1,16 @@ 
 # Normal configurations don't support dax
 -g dax
+
+# On a 5.15-rc4 kernel, xfs/170 reliably causes a soft lockup in
+# xfs_alloc_read_agf()+0x41/0x110.  Call stack:
+#
+# xfs_alloc_pagf_init+0x27/0x60
+# xfs_filestream_pick_ag+0x280/0x530
+# xfs_filestream_new_ag+0x87/0x100
+# xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
+# xfs_bmap_btalloc+0x3e6/0x700
+# xfs_bmapi_allocate+0xe4/0x310
+# xfs_bmapi_convert_delalloc+0x26c/0x480
+# xfs_map_blocks+0x1b5/0x510
+# ...
+xfs/170