
[v4,00/27] add block layout driver to pnfs client

Message ID 20110729185415.GA23061@merit.edu (mailing list archive)
State New, archived

Commit Message

Jim Rees July 29, 2011, 6:54 p.m. UTC
Christoph Hellwig wrote:

  How well is the I/O code tested?  It's a full reimplementation of
  code full of nasty traps.  Did you run xfstests over it?  It supports
  nfs, so pointing it to a pnfs share should probably just work.

The current version of the code has been tested with Connectathon and
iozone.  Previous versions have been tested with the above plus various
other test suites and everyday use like kernel builds.

xfstests does require a small patch to work with NFSv4, which I can supply
if anyone is interested.
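For reference, a minimal xfstests configuration for an NFS run looks something like the following. This is a sketch based on the standard xfstests variables, not on Jim's actual setup; the server name and export paths are illustrative:

```shell
# Hypothetical xfstests environment for testing an NFS (or pNFS) mount.
# xfstests keys its NFS support off FSTYP=nfs, and the TEST_DEV/SCRATCH_DEV
# values take the server:/export form instead of a block device.
export FSTYP=nfs
export TEST_DEV=server:/export/test        # share under test, already mounted
export TEST_DIR=/mnt/test                  # mount point for TEST_DEV
export SCRATCH_DEV=server:/export/scratch  # required by the $SCRATCH_DEV tests
export SCRATCH_MNT=/mnt/scratch            # mount point for SCRATCH_DEV

./check -nfs                               # run the NFS-capable tests
```

Without SCRATCH_DEV set, the tests that need a scratch filesystem report "this test requires a valid host fs for $SCRATCH_DEV", as in the output below.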

I can't test the current code with xfstests because NFS 4.1 without pnfs
doesn't pass these tests.  Here is what I get (test with block layout is
similar but without the hung task):

rhcl1# ./check -nfs
FSTYP         -- nfs
PLATFORM      -- Linux/x86_64 rhcl1 3.0.0-blk

001      - output mismatch (see 001.out.bad)
089
Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
 kernel:------------[ cut here ]------------

Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
 kernel:invalid opcode: 0000 [#1] SMP 

Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
 kernel:Stack:

Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
 kernel:Call Trace:

Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
 kernel:Code: 48 89 e5 53 41 52 48 8b 9f a8 02 00 00 48 8d bb 88 01 00 00 e8 23 39 19 e1 8b 83 5c 02 00 00 ff c0 85 c0 89 83 5c 02 00 00 74 02 <0f> 0b 66 ff 83 88 01 00 00 41 59 5b 5d c3 55 48 89 e5 41 57 41 

INFO: task t_mtab:13810 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab          D 0000000000000000     0 13810  13684 0x00000080
 ffff880037b05c38 0000000000000086 ffff88007ae46dc8 ffff880000000000
 ffff88007b02ae00 ffff880037b05fd8 ffff880037b05fd8 0000000000012c40
 ffffffff81a0c020 ffff88007b02ae00 ffff880037b05c38 ffffffffa0281a2f
Call Trace:
 [<ffffffffa0281a2f>] ? __put_nfs_open_context+0x35/0xad [nfs]
 [<ffffffff8143b623>] __mutex_lock_common+0xfd/0x15e
 [<ffffffff8143b799>] __mutex_lock_slowpath+0x16/0x18
 [<ffffffff8143b737>] mutex_lock+0x1e/0x32
 [<ffffffff811082fd>] ? walk_component+0x36d/0x3b1
 [<ffffffff811d738c>] ima_file_check+0x53/0x119
 [<ffffffff81109c2d>] do_last+0x44d/0x57c
 [<ffffffff81108bf6>] ? path_init+0x196/0x29d
 [<ffffffff8110a6e3>] path_openat+0xca/0x30b
 [<ffffffff8109a316>] ? call_rcu_sched+0x10/0x12
 [<ffffffff8110a972>] do_filp_open+0x33/0x81
 [<ffffffff8143af5c>] ? _cond_resched+0x9/0x1d
 [<ffffffff81113920>] ? alloc_fd+0x6d/0x118
 [<ffffffff810fe69b>] do_sys_open+0x69/0xfb
 [<ffffffff810907f2>] ? audit_syscall_entry+0x140/0x16c
 [<ffffffff810fe748>] sys_open+0x1b/0x1d
 [<ffffffff81442592>] system_call_fastpath+0x16/0x1b
INFO: task t_mtab:13812 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab          D ffff88007b291c00     0 13812  13684 0x00000080
 ffff880037d97c88 0000000000000082 ffff880037f099d8 ffff88007ae46dc8
 ffff880037b81700 ffff880037d97fd8 ffff880037d97fd8 0000000000012c40
 ffff88007b02ae00 ffff880037b81700 ffff880037d97c58 ffffffffa01ffa64
Call Trace:
 [<ffffffffa01ffa64>] ? generic_lookup_cred+0x10/0x12 [sunrpc]
 [<ffffffff8143b623>] __mutex_lock_common+0xfd/0x15e
 [<ffffffff8143b799>] __mutex_lock_slowpath+0x16/0x18
 [<ffffffff8143b737>] mutex_lock+0x1e/0x32
 [<ffffffff81107a9c>] ? audit_inode+0x15/0x28
 [<ffffffff8110998f>] do_last+0x1af/0x57c
 [<ffffffff81108bf6>] ? path_init+0x196/0x29d
 [<ffffffff8110a6e3>] path_openat+0xca/0x30b
 [<ffffffffa02a545b>] ? __nfs4_close+0xfc/0x108 [nfs]
 [<ffffffff8110a972>] do_filp_open+0x33/0x81
 [<ffffffff8143af5c>] ? _cond_resched+0x9/0x1d
 [<ffffffff81113920>] ? alloc_fd+0x6d/0x118
 [<ffffffff810fe69b>] do_sys_open+0x69/0xfb
 [<ffffffff810907f2>] ? audit_syscall_entry+0x140/0x16c
 [<ffffffff810fe748>] sys_open+0x1b/0x1d
 [<ffffffff81442592>] system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
kernel BUG at /home/rees/linux-pnfs/fs/nfs/callback_xdr.c:775!
invalid opcode: 0000 [#1] SMP 
CPU 0 
Modules linked in: blocklayoutdriver nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio ip6t_REJECT ib_iser rdma_cm ib_cm nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter iw_cm ib_sa ib_mad ip6_tables ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi serio_raw amd64_edac_mod pcspkr tg3 i2c_nforce2 i2c_core edac_core edac_mce_amd shpchp k8temp ipv6 autofs4 mptspi ata_generic mptscsih pata_acpi mptbase scsi_transport_spi sata_nv pata_amd [last unloaded: scsi_wait_scan]

Pid: 1503, comm: nfsv4.1-svc Not tainted 3.0.0-blk #35 HP ProLiant DL145 G2/K85NL
RIP: 0010:[<ffffffffa02a89c2>]  [<ffffffffa02a89c2>] nfs4_cb_take_slot+0x2c/0x3a [nfs]
RSP: 0018:ffff880037f89c00  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff88006b7ffc00 RCX: 0000000000000001
RDX: 0000000000000004 RSI: ffff88006b48da40 RDI: ffff88006b7ffd88
RBP: ffff880037f89c10 R08: 0000000000000000 R09: ffff880071a01db8
R10: ffff880071a01ca8 R11: ffff880037f89e40 R12: ffff88006bbb9800
R13: 0000000000000000 R14: ffff88006b716800 R15: ffff88006b7fec00
FS:  00007f78a8d7d720(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003ff9064c60 CR3: 000000006b341000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsv4.1-svc (pid: 1503, threadinfo ffff880037f88000, task ffff88007b060000)
Stack:
 ffff880071a01ca8 ffff88006b716800 ffff880037f89ca0 ffffffffa02a942f
 ffff88006b716800 ffff880071f33098 ffff880037f89ca0 ffff88006b48da40
 0000000137f89c50 ffff88006b716808 ffff88006b7ffc58 ffff880037f89d60
Call Trace:
 [<ffffffffa02a942f>] nfs4_callback_sequence+0x264/0x32c [nfs]
 [<ffffffffa02a80f9>] nfs4_callback_compound+0x36a/0x4e5 [nfs]
 [<ffffffffa01fff27>] svc_process_common+0x253/0x4d0 [sunrpc]
 [<ffffffffa0200279>] bc_svc_process+0xd5/0xfe [sunrpc]
 [<ffffffffa02a74b3>] nfs41_callback_svc+0xd5/0x126 [nfs]
 [<ffffffff81063928>] ? remove_wait_queue+0x35/0x35
 [<ffffffffa02a73de>] ? param_set_portnr+0x47/0x47 [nfs]
 [<ffffffff8106328a>] kthread+0x7f/0x87
 [<ffffffff81444714>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106320b>] ? kthread_worker_fn+0x143/0x143
 [<ffffffff81444710>] ? gs_change+0x13/0x13
Code: 48 89 e5 53 41 52 48 8b 9f a8 02 00 00 48 8d bb 88 01 00 00 e8 23 39 19 e1 8b 83 5c 02 00 00 ff c0 85 c0 89 83 5c 02 00 00 74 02 <0f> 0b 66 ff 83 88 01 00 00 41 59 5b 5d c3 55 48 89 e5 41 57 41 
RIP  [<ffffffffa02a89c2>] nfs4_cb_take_slot+0x2c/0x3a [nfs]
 RSP <ffff880037f89c00>
---[ end trace 76c6d9f5d46ae22e ]---
Callback slot table overflowed
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Hellwig July 29, 2011, 7:01 p.m. UTC | #1
On Fri, Jul 29, 2011 at 02:54:15PM -0400, Jim Rees wrote:
> xfstests does require a small patch to work with NFSv4, which I can supply
> if anyone is interested.

Please send it to xfs@oss.sgi.com with a proper description and signoff.

> PLATFORM      -- Linux/x86_64 rhcl1 3.0.0-blk
> 
> 001      - output mismatch (see 001.out.bad)
> -iter 5 chain ... check ....................................
> +001 not run: this test requires a valid host fs for $SCRATCH_DEV
>  cleanup
> 002      [not run] this test requires a valid host fs for $SCRATCH_DEV
> 003      [not run] not suitable for this filesystem type: nfs
> 004      [not run] not suitable for this filesystem type: nfs
> 005      [not run] this test requires a valid host fs for $SCRATCH_DEV
> 006      [not run] this test requires a valid host fs for $SCRATCH_DEV
> 007      [not run] this test requires a valid host fs for $SCRATCH_DEV

It seems like you didn't set up the SCRATCH_DEV variable properly.

> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>  kernel:------------[ cut here ]------------
> 
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>  kernel:invalid opcode: 0000 [#1] SMP 
> 
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>  kernel:Stack:
> 
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>  kernel:Call Trace:
> 
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...

Looks like we did find a bug in NFS.

Jim Rees July 29, 2011, 7:13 p.m. UTC | #2
Christoph Hellwig wrote:

  On Fri, Jul 29, 2011 at 02:54:15PM -0400, Jim Rees wrote:
  > xfstests does require a small patch to work with NFSv4, which I can supply
  > if anyone is interested.
  
  Please send it to xfs@oss.sgi.com with a proper description and signoff.

I don't have a proper patch, just one that works for me, but I'll send a
bug report.  It has to do with the mismatch between the nfs and nfs4 mount
types, so it's not really an xfstests bug.  I think this will fix itself
when the nfsvers=4 changes fully propagate.

  It seems like you didn't set up the SCRATCH_DEV variable properly.

I was just skipping those tests so I could get to the one that fails.  I've
also tested with the SCRATCH_DEV tests in other runs.

  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  >  kernel:------------[ cut here ]------------
  > 
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  >  kernel:invalid opcode: 0000 [#1] SMP 
  > 
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  >  kernel:Stack:
  > 
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  >  kernel:Call Trace:
  > 
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  
  Looks like we did find a bug in NFS.

It kind of looks that way.
Trond Myklebust July 30, 2011, 1:09 a.m. UTC | #3
On Fri, 2011-07-29 at 15:13 -0400, Jim Rees wrote: 
> Christoph Hellwig wrote:
> 
>   On Fri, Jul 29, 2011 at 02:54:15PM -0400, Jim Rees wrote:
>   > xfstests does require a small patch to work with NFSv4, which I can supply
>   > if anyone is interested.
>   
>   Please send it to xfs@oss.sgi.com with a proper description and signoff.
> 
> I don't have a proper patch, just one that works for me.  But I'll send a
> bug report.  It has to do with the mismatch between nfs and nfs4 mount
> types, so it's not really a xfstests bug.  I think this will fix itself when
> the nfsvers=4 changes fully propagate.
> 
>   It seems like you didn't set up the SCRATCH_DEV variable properly.
> 
> I was just skipping those tests so I could get to the one that fails.  I've
> also tested with the SCRATCH_DEV tests in other runs.
> 
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   >  kernel:------------[ cut here ]------------
>   > 
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   >  kernel:invalid opcode: 0000 [#1] SMP 
>   > 
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   >  kernel:Stack:
>   > 
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   >  kernel:Call Trace:
>   > 
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   
>   Looks like we did find a bug in NFS.
> 
> It kind of looks that way.

Is that reproducible on the upstream kernel, or is it something that is
being introduced by the pNFS blocks code?
Jim Rees July 30, 2011, 3:26 a.m. UTC | #4
Trond Myklebust wrote:

  >   Looks like we did find a bug in NFS.
  > 
  > It kind of looks that way.
  
  Is that reproducible on the upstream kernel, or is it something that is
  being introduced by the pNFS blocks code?

It happens without the blocks module loaded, but it could be from something
we did outside the module.  I will test this weekend when I get a chance.
Jim Rees July 30, 2011, 2:18 p.m. UTC | #5
Trond Myklebust wrote:

  Is that reproducible on the upstream kernel, or is it something that is
  being introduced by the pNFS blocks code?

Upstream kernel 3.0.0-next-20110729 fails in a similar way, so it's not
anything introduced by the block layout code.

kernel BUG at /home/rees/linux-next/fs/nfs/callback_xdr.c:775!
invalid opcode: 0000 [#1] SMP 
CPU 0 
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i ip6t_REJECT libcxgbi cxgb3 nf_conntrack_ipv6 nf_defrag_ipv6 mdio ib_iser ip6table_filter ip6_tables rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi amd64_edac_mod i2c_nforce2 shpchp tg3 pcspkr edac_core i2c_core serio_raw k8temp edac_mce_amd ipv6 autofs4 ata_generic pata_acpi mptspi mptscsih mptbase scsi_transport_spi sata_nv pata_amd [last unloaded: scsi_wait_scan]

Pid: 6494, comm: nfsv4.1-svc Tainted: G        W   3.0.0-next-20110729 #2 HP ProLiant DL145 G2/K85NL
RIP: 0010:[<ffffffffa02998df>]  [<ffffffffa02998df>] nfs4_cb_take_slot+0x2e/0x3e [nfs]
RSP: 0018:ffff880074275bf0  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff88007b67ec00 RCX: 0000000000000001
RDX: 000000000000001c RSI: ffff88006b8b49f0 RDI: ffff88007b67ed88
RBP: ffff880074275c00 R08: 00000000000000d0 R09: 0000000000000002
R10: ffff88007fc12e70 R11: ffffffff81b42ab0 R12: ffff880037e72000
R13: 0000000000000000 R14: ffff880037ed3800 R15: ffff88007b67e800
FS:  00007fb17e8de720(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003ff9064c60 CR3: 0000000069a99000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsv4.1-svc (pid: 6494, threadinfo ffff880074274000, task ffff880069410000)
Stack:
 ffff880037e72000 ffff880037ed3800 ffff880074275ca0 ffffffffa029a5c2
 ffff88006b8b49e0 ffff880071786098 ffff880074275ca0 ffffffffa0299a1a
 ffff88006b8b49e0 0000000100000246 ffff88007b67ec48 ffff880037ed3808
Call Trace:
 [<ffffffffa029a5c2>] nfs4_callback_sequence+0x272/0x338 [nfs]
 [<ffffffffa0299a1a>] ? decode_cb_sequence_args+0x12b/0x24a [nfs]
 [<ffffffffa0299733>] nfs4_callback_compound+0x364/0x4e2 [nfs]
 [<ffffffff8106f3bf>] ? groups_alloc+0x38/0xbe
 [<ffffffffa01f7206>] svc_process_common+0x260/0x4d3 [sunrpc]
 [<ffffffffa01f7552>] bc_svc_process+0xd9/0x102 [sunrpc]
 [<ffffffffa029884c>] nfs41_callback_svc+0xd5/0x126 [nfs]
 [<ffffffff8106905f>] ? wake_up_bit+0x25/0x25
 [<ffffffffa0298777>] ? nfs_callback_down+0x7c/0x7c [nfs]
 [<ffffffff81068bf8>] kthread+0x7d/0x85
 [<ffffffff81467e14>] kernel_thread_helper+0x4/0x10
 [<ffffffff81068b7b>] ? kthread_worker_fn+0x147/0x147
 [<ffffffff81467e10>] ? gs_change+0x13/0x13
Code: e5 53 48 83 ec 08 48 8b 9f a8 02 00 00 48 8d bb 88 01 00 00 e8 c4 5b 1c e1 8b 83 5c 02 00 00 ff c0 85 c0 89 83 5c 02 00 00 74 04 <0f> 0b eb fe 66 ff 83 88 01 00 00 41 5b 5b c9 c3 55 48 8d 42 08 
RIP  [<ffffffffa02998df>] nfs4_cb_take_slot+0x2e/0x3e [nfs]
 RSP <ffff880074275bf0>
---[ end trace 27e49b345894527a ]---
Callback slot table overflowed
Peng Tao July 30, 2011, 2:25 p.m. UTC | #6
On Sat, Jul 30, 2011 at 11:26 AM, Jim Rees <rees@umich.edu> wrote:
> Trond Myklebust wrote:
>
>  >   Looks like we did find a bug in NFS.
>  >
>  > It kind of looks that way.
>
>  Is that reproducible on the upstream kernel, or is it something that is
>  being introduced by the pNFS blocks code?
>
> It happens without the blocks module loaded, but it could be from something
> we did outside the module.  I will test this weekend when I get a chance.
I tried xfstests again and was able to reproduce a hang on both block
layout and file layout (upstream commit ed1e62, w/o block layout code),
so it seems to be a bug in the pnfs code; I did not see it w/ NFSv4.
For both pnfs block and file layout, it can be reproduced just by running
xfstests with ./check -nfs.  It does not show up every time but is likely
to happen within 10 runs.  Not sure if it is the same one Jim reported,
though.

block layout trace:
[  660.039009] BUG: soft lockup - CPU#1 stuck for 22s! [10.244.82.74-ma:29730]
[  660.039014] Modules linked in: blocklayoutdriver nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc be2iscsi ip6t_REJECT iscsi_boot_sysfs nf_conntrack_ipv6 nf_defrag_ipv6 bnx2i ip6table_filter cnic uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev i2c_piix4 i2c_core pcspkr e1000 parport_pc microcode parport vmw_balloon shpchp ipv6 floppy mptspi mptscsih mptbase scsi_transport_spi [last unloaded: nfs]
[  660.039014] CPU 1
[  660.039014] Modules linked in: blocklayoutdriver nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc be2iscsi ip6t_REJECT iscsi_boot_sysfs nf_conntrack_ipv6 nf_defrag_ipv6 bnx2i ip6table_filter cnic uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev i2c_piix4 i2c_core pcspkr e1000 parport_pc microcode parport vmw_balloon shpchp ipv6 floppy mptspi mptscsih mptbase scsi_transport_spi [last unloaded: nfs]
[  660.039014]
[  660.039014] Pid: 29730, comm: 10.244.82.74-ma Tainted: G      D    3.0.0-pnfs+ #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[  660.039014] RIP: 0010:[<ffffffff81084f49>]  [<ffffffff81084f49>] do_raw_spin_lock+0x1e/0x25
[  660.039014] RSP: 0018:ffff88001fef5e60  EFLAGS: 00000297
[  660.039014] RAX: 000000000000002b RBX: ffff88003be19000 RCX: 0000000000000001
[  660.039014] RDX: 000000000000002a RSI: ffff8800219a7cf0 RDI: ffff880020e4d988
[  660.039014] RBP: ffff88001fef5e60 R08: 0000000000000000 R09: 000000000000df20
[  660.039014] R10: 0000000000000000 R11: ffff8800219a7c00 R12: ffff88001fef5df0
[  660.039014] R13: 00000000c355df1b R14: ffff88003bfaeac0 R15: ffff8800219a7c00
[  660.039014] FS:  0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  660.039014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  660.039014] CR2: 00007fc6122a4000 CR3: 0000000001a04000 CR4: 00000000000006e0
[  660.039014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  660.039014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  660.039014] Process 10.244.82.74-ma (pid: 29730, threadinfo ffff88001fef4000, task ffff88001fca8000)
[  660.039014] Stack:
[  660.039014]  ffff88001fef5e70 ffffffff814585ee ffff88001fef5e90 ffffffffa02badee
[  660.039014]  0000000000000000 ffff8800219a7c00 ffff88001fef5ee0 ffffffffa02bc2d9
[  660.039014]  ffff880000000000 ffffffffa02d2250 ffff88001fef5ee0 ffff88002059ba10
[  660.039014] Call Trace:
[  660.039014]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[  660.039014]  [<ffffffffa02badee>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[  660.039014]  [<ffffffffa02bc2d9>] nfs4_run_state_manager+0x271/0x517 [nfs]
[  660.039014]  [<ffffffffa02bc068>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[  660.039014]  [<ffffffff810719bf>] kthread+0x84/0x8c
[  660.039014]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[  660.039014]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[  660.039014]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
[  660.039014] Code: 00 00 10 00 74 05 e8 a7 59 1b 00 5d c3 55 48 89 e5 66 66 66 66 90 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 5d c3 55 48 89 e5 66 66 66 66 90 8b 07 89 c2 c1
[  660.039014] Call Trace:
[  660.039014]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[  660.039014]  [<ffffffffa02badee>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[  660.039014]  [<ffffffffa02bc2d9>] nfs4_run_state_manager+0x271/0x517 [nfs]
[  660.039014]  [<ffffffffa02bc068>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[  660.039014]  [<ffffffff810719bf>] kthread+0x84/0x8c
[  660.039014]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[  660.039014]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[  660.039014]  [<ffffffff81460f50>] ? gs_change+0x13/0x13


file layout trace:
[19716.049009] BUG: soft lockup - CPU#1 stuck for 23s! [10.244.82.76-ma:29036]
[19716.049011] Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 be2iscsi iscsi_boot_sysfs bnx2i cnic ip6table_filter uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev microcode i2c_piix4 e1000 vmw_balloon parport_pc parport shpchp pcspkr i2c_core ipv6 mptspi mptscsih mptbase scsi_transport_spi floppy [last unloaded: nfs]
[19716.049011] CPU 1
[19716.049011] Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 be2iscsi iscsi_boot_sysfs bnx2i cnic ip6table_filter uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev microcode i2c_piix4 e1000 vmw_balloon parport_pc parport shpchp pcspkr i2c_core ipv6 mptspi mptscsih mptbase scsi_transport_spi floppy [last unloaded: nfs]
[19716.049011]
[19716.049011] Pid: 29036, comm: 10.244.82.76-ma Tainted: G      D    3.0.0-pnfs+ #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[19716.049011] RIP: 0010:[<ffffffff81084f49>]  [<ffffffff81084f49>] do_raw_spin_lock+0x1e/0x25
[19716.049011] RSP: 0018:ffff88002a69be60  EFLAGS: 00000297
[19716.049011] RAX: 0000000000000005 RBX: ffff88002a59fd00 RCX: 0000000000000002
[19716.049011] RDX: 0000000000000004 RSI: ffff8800208c00f0 RDI: ffff8800208c1188
[19716.049011] RBP: ffff88002a69be60 R08: 0000000000000002 R09: 0000ffff00066c0a
[19716.049011] R10: 0000ffff00066c0a R11: ffff8800208c0000 R12: ffff88002a69bdf0
[19716.049011] R13: 0000000001ce15a2 R14: ffff88002a6f1f80 R15: ffff8800208c0000
[19716.049011] FS:  0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[19716.049011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[19716.049011] CR2: 00007fad5ac53000 CR3: 0000000038784000 CR4: 00000000000006e0
[19716.049011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19716.049011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[19716.049011] Process 10.244.82.76-ma (pid: 29036, threadinfo ffff88002a69a000, task ffff880022db9720)
[19716.049011] Stack:
[19716.049011]  ffff88002a69be70 ffffffff814585ee ffff88002a69be90 ffffffffa02be836
[19716.049011]  0000000000000002 ffff8800208c0000 ffff88002a69bee0 ffffffffa02bfd21
[19716.049011]  ffff880000000000 ffffffffa02d59c0 ffff88002a69bee0 ffff880037971ce8
[19716.049011] Call Trace:
[19716.049011]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[19716.049011]  [<ffffffffa02be836>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[19716.049011]  [<ffffffffa02bfd21>] nfs4_run_state_manager+0x271/0x517 [nfs]
[19716.049011]  [<ffffffffa02bfab0>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[19716.049011]  [<ffffffff810719bf>] kthread+0x84/0x8c
[19716.049011]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[19716.049011]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[19716.049011]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
[19716.049011] Code: 00 00 10 00 74 05 e8 a7 59 1b 00 5d c3 55 48 89 e5 66 66 66 66 90 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 5d c3 55 48 89 e5 66 66 66 66 90 8b 07 89 c2 c1
[19716.049011] Call Trace:
[19716.049011]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[19716.049011]  [<ffffffffa02be836>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[19716.049011]  [<ffffffffa02bfd21>] nfs4_run_state_manager+0x271/0x517 [nfs]
[19716.049011]  [<ffffffffa02bfab0>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[19716.049011]  [<ffffffff810719bf>] kthread+0x84/0x8c
[19716.049011]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[19716.049011]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[19716.049011]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
Trond Myklebust Aug. 1, 2011, 9:10 p.m. UTC | #7
On Sat, 2011-07-30 at 22:25 +0800, Peng Tao wrote: 
> On Sat, Jul 30, 2011 at 11:26 AM, Jim Rees <rees@umich.edu> wrote:
> > Trond Myklebust wrote:
> >
> >  >   Looks like we did find a bug in NFS.
> >  >
> >  > It kind of looks that way.
> >
> >  Is that reproducible on the upstream kernel, or is it something that is
> >  being introduced by the pNFS blocks code?
> >
> > It happens without the blocks module loaded, but it could be from something
> > we did outside the module.  I will test this weekend when I get a chance.
> I tried xfstests again and was able to reproduce a hang on both block
> layout and file layout (upstream commit ed1e62, w/o block layout
> code). It seems it is a bug in pnfs code. I did not see it w/ NFSv4.
> For pnfs block and file layout, it can be reproduced by just running
> xfstests with ./check -nfs. It does not show up every time but is
> likely to happen in less than 10 runs.  Not sure if it is the same one
> Jim reported though.
> 
> block layout trace:
> [  660.039009] BUG: soft lockup - CPU#1 stuck for 22s! [10.244.82.74-ma:29730]
> [...]
> [  660.039014] Call Trace:
> [  660.039014]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
> [  660.039014]  [<ffffffffa02badee>] nfs4_begin_drain_session+0x24/0x8f [nfs]
> [  660.039014]  [<ffffffffa02bc2d9>] nfs4_run_state_manager+0x271/0x517 [nfs]
> [  660.039014]  [<ffffffffa02bc068>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
> [  660.039014]  [<ffffffff810719bf>] kthread+0x84/0x8c
> [  660.039014]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
> [  660.039014]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
> [  660.039014]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
> [  660.039014] Code: 00 00 10 00 74 05 e8 a7 59 1b 00 5d c3 55 48 89 e5 66 66 66 66 90 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 5d c3 55 48 89 e5 66 66 66 66 90 8b 07 89 c2 c1

OK...

Looking at the callback code, I see that if tbl->highest_used_slotid !=
0, then we BUG() while holding the backchannel's tbl->slot_tbl_lock
spinlock. That seems a likely candidate for the above hang.

Andy, how are we guaranteed that tbl->highest_used_slotid won't take
values other than 0, and why do we commit suicide when it does? As far
as I can see, there is no guarantee that we call nfs4_cb_take_slot() in
nfs4_callback_compound(), yet we appear to unconditionally call
nfs4_cb_free_slot() provided there is a session.

The other strangeness would be the fact that there is nothing enforcing
the NFS4_SESSION_DRAINING flag. If the session is draining, then the
back-channel simply ignores that and goes ahead with processing the
callback. Is this to avoid deadlocks with the server returning
NFS4ERR_BACK_CHAN_BUSY when the client does a DESTROY_SESSION?

Patch

--- 001.out     2011-07-29 12:11:34.057245055 -0400
+++ 001.out.bad 2011-07-29 14:41:36.697152750 -0400
@@ -1,9 +1,4 @@ 
 QA output created by 001
 cleanup
-setup ....................................
-iter 1 chain ... check ....................................
-iter 2 chain ... check ....................................
-iter 3 chain ... check ....................................
-iter 4 chain ... check ....................................
-iter 5 chain ... check ....................................
+001 not run: this test requires a valid host fs for $SCRATCH_DEV
 cleanup
002      [not run] this test requires a valid host fs for $SCRATCH_DEV
003      [not run] not suitable for this filesystem type: nfs
004      [not run] not suitable for this filesystem type: nfs
005      [not run] this test requires a valid host fs for $SCRATCH_DEV
006      [not run] this test requires a valid host fs for $SCRATCH_DEV
007      [not run] this test requires a valid host fs for $SCRATCH_DEV
008      [not run] not suitable for this filesystem type: nfs
009      [not run] not suitable for this filesystem type: nfs
010      [not run] dbtest was not built for this platform
011      [not run] this test requires a valid host fs for $SCRATCH_DEV
012      [not run] not suitable for this filesystem type: nfs
013      [not run] this test requires a valid host fs for $SCRATCH_DEV
014      [not run] this test requires a valid host fs for $SCRATCH_DEV
015      [not run] not suitable for this filesystem type: nfs
016      [not run] not suitable for this filesystem type: nfs
017      [not run] not suitable for this filesystem type: nfs
018      [not run] not suitable for this filesystem type: nfs
019      [not run] not suitable for this filesystem type: nfs
020      [not run] not suitable for this filesystem type: nfs
021      [not run] not suitable for this filesystem type: nfs
022      [not run] xfsdump not found
023      [not run] xfsdump not found
024      [not run] xfsdump not found
025      [not run] xfsdump not found
026      [not run] xfsdump not found
027      [not run] xfsdump not found
028      [not run] xfsdump not found
029      [not run] not suitable for this filesystem type: nfs
030      [not run] not suitable for this filesystem type: nfs
031      [not run] not suitable for this filesystem type: nfs
032      [not run] not suitable for this filesystem type: nfs
033      [not run] not suitable for this filesystem type: nfs
034      [not run] not suitable for this filesystem type: nfs
035      [not run] xfsdump not found
036      [not run] xfsdump not found
037      [not run] xfsdump not found
038      [not run] xfsdump not found
039      [not run] xfsdump not found
040      [not run] Can't run srcdiff without KWORKAREA set
041      [not run] not suitable for this filesystem type: nfs
042      [not run] not suitable for this filesystem type: nfs
043      [not run] xfsdump not found
044      [not run] not suitable for this filesystem type: nfs
045      [not run] not suitable for this filesystem type: nfs
046      [not run] xfsdump not found
047      [not run] xfsdump not found
048      [not run] not suitable for this filesystem type: nfs
049      [not run] not suitable for this filesystem type: nfs
050      [not run] not suitable for this filesystem type: nfs
051      [not run] not suitable for this filesystem type: nfs
052      [not run] not suitable for this filesystem type: nfs
053      [not run] this test requires a valid $SCRATCH_DEV
054      [not run] not suitable for this filesystem type: nfs
055      [not run] xfsdump not found
056      [not run] xfsdump not found
057      [not run] Place holder for IRIX test 057
058      [not run] Place holder for IRIX test 058
059      [not run] Place holder for IRIX test 059
060      [not run] Place holder for IRIX test 060
061      [not run] xfsdump not found
062      [not run] this test requires a valid $SCRATCH_DEV
063      [not run] xfsdump not found
064      [not run] xfsdump not found
065      [not run] xfsdump not found
066      [not run] xfsdump not found
067      [not run] not suitable for this filesystem type: nfs
068      [not run] not suitable for this filesystem type: nfs
069      [not run] this test requires a valid $SCRATCH_DEV
070      [not run] attrs not supported by this filesystem type: nfs
071      [not run] not suitable for this filesystem type: nfs
072      [not run] not suitable for this filesystem type: nfs
073      [not run] not suitable for this filesystem type: nfs
074      [not run] this test requires a valid host fs for $SCRATCH_DEV
075      [not run] this test requires a valid host fs for $SCRATCH_DEV
076      [not run] this test requires a valid $SCRATCH_DEV
077      [not run] attrs not supported by this filesystem type: nfs
078      [not run] not suitable for this filesystem type: nfs
079      [not run] not suitable for this filesystem type: nfs
080      [not run] not suitable for this filesystem type: nfs
081      [not run] not suitable for this filesystem type: nfs
082      [not run] not suitable for this filesystem type: nfs
083      [not run] not suitable for this filesystem type: nfs
084      [not run] not suitable for this filesystem type: nfs
085      [not run] not suitable for this filesystem type: nfs
086      [not run] not suitable for this filesystem type: nfs
087      [not run] not suitable for this filesystem type: nfs
088      - output mismatch (see 088.out.bad)
--- 088.out     2011-07-29 12:11:34.085247833 -0400
+++ 088.out.bad 2011-07-29 14:41:48.336307595 -0400
@@ -1,9 +1,2 @@ 
 QA output created by 088
-access(TEST_DIR/t_access, 0) returns 0
-access(TEST_DIR/t_access, R_OK) returns 0
-access(TEST_DIR/t_access, W_OK) returns 0
-access(TEST_DIR/t_access, X_OK) returns -1
-access(TEST_DIR/t_access, R_OK | W_OK) returns 0
-access(TEST_DIR/t_access, R_OK | X_OK) returns -1
-access(TEST_DIR/t_access, W_OK | X_OK) returns -1
-access(TEST_DIR/t_access, R_OK | W_OK | X_OK) returns -1
+fchown: Invalid argument