Message ID: 20110729185415.GA23061@merit.edu (mailing list archive)
State: New, archived
On Fri, Jul 29, 2011 at 02:54:15PM -0400, Jim Rees wrote:
> xfstests does require a small patch to work with NFSv4, which I can supply
> if anyone is interested.

Please send it to xfs@oss.sgi.com with a proper description and signoff.

> PLATFORM  -- Linux/x86_64 rhcl1 3.0.0-blk
>
> 001  - output mismatch (see 001.out.bad)
> -iter 5 chain ... check ....................................
> +001 not run: this test requires a valid host fs for $SCRATCH_DEV
> cleanup
> 002  [not run] this test requires a valid host fs for $SCRATCH_DEV
> 003  [not run] not suitable for this filesystem type: nfs
> 004  [not run] not suitable for this filesystem type: nfs
> 005  [not run] this test requires a valid host fs for $SCRATCH_DEV
> 006  [not run] this test requires a valid host fs for $SCRATCH_DEV
> 007  [not run] this test requires a valid host fs for $SCRATCH_DEV

It seems like you didn't set up the SCRATCH_DEV variable properly.

> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
> kernel:------------[ cut here ]------------
>
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
> kernel:invalid opcode: 0000 [#1] SMP
>
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
> kernel:Stack:
>
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
> kernel:Call Trace:
>
> Message from syslogd@rhcl1 at Jul 29 14:42:05 ...

Looks like we did find a bug in NFS.
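Regarding the SCRATCH_DEV remark above: xfstests reads these settings from a
config file sourced by its ./check script. A minimal sketch of what an NFS
run expects (the server name and export paths here are placeholders, not
values from this thread):

    # local.config, sourced by xfstests' ./check -- hypothetical values
    export FSTYP=nfs
    export TEST_DEV=server:/export/test        # exported fs under test
    export TEST_DIR=/mnt/test                  # local mount point for it
    export SCRATCH_DEV=server:/export/scratch  # scratch export that the
    export SCRATCH_MNT=/mnt/scratch            # skipped tests are asking for

With SCRATCH_DEV unset or unmountable, tests that need a scratch filesystem
report "[not run] this test requires a valid host fs for $SCRATCH_DEV", as
in the output above.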
Christoph Hellwig wrote:

  On Fri, Jul 29, 2011 at 02:54:15PM -0400, Jim Rees wrote:
  > xfstests does require a small patch to work with NFSv4, which I can supply
  > if anyone is interested.

  Please send it to xfs@oss.sgi.com with a proper description and signoff.

I don't have a proper patch, just one that works for me. But I'll send a
bug report. It has to do with the mismatch between nfs and nfs4 mount
types, so it's not really an xfstests bug. I think this will fix itself
when the nfsvers=4 changes fully propagate.

  It seems like you didn't set up the SCRATCH_DEV variable properly.

I was just skipping those tests so I could get to the one that fails. I've
also tested with the SCRATCH_DEV tests in other runs.

  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  > kernel:------------[ cut here ]------------
  >
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  > kernel:invalid opcode: 0000 [#1] SMP
  >
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  > kernel:Stack:
  >
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
  > kernel:Call Trace:
  >
  > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...

  Looks like we did find a bug in NFS.

It kind of looks that way.
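Regarding the nfs/nfs4 mount-type mismatch Jim mentions: his actual patch is
not shown in the thread, but the symptom can be sketched from the mount
table (hostnames and paths below are placeholders, and this is a hedged
illustration of the mismatch, not his fix):

    # On kernels of this era, a v4 mount registers under its own fstype:
    mount -t nfs4 server:/export /mnt/test
    awk '$2 == "/mnt/test" { print $3 }' /proc/mounts   # prints: nfs4
    # xfstests compares $FSTYP literally, so checks written for "nfs"
    # skip or mishandle an "nfs4" mount until the "-t nfs -o nfsvers=4"
    # style of mounting converges.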
On Fri, 2011-07-29 at 15:13 -0400, Jim Rees wrote:
> Christoph Hellwig wrote:
>
>   On Fri, Jul 29, 2011 at 02:54:15PM -0400, Jim Rees wrote:
>   > xfstests does require a small patch to work with NFSv4, which I can supply
>   > if anyone is interested.
>
>   Please send it to xfs@oss.sgi.com with a proper description and signoff.
>
> I don't have a proper patch, just one that works for me. But I'll send a
> bug report. It has to do with the mismatch between nfs and nfs4 mount
> types, so it's not really an xfstests bug. I think this will fix itself
> when the nfsvers=4 changes fully propagate.
>
>   It seems like you didn't set up the SCRATCH_DEV variable properly.
>
> I was just skipping those tests so I could get to the one that fails. I've
> also tested with the SCRATCH_DEV tests in other runs.
>
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   > kernel:------------[ cut here ]------------
>   >
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   > kernel:invalid opcode: 0000 [#1] SMP
>   >
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   > kernel:Stack:
>   >
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>   > kernel:Call Trace:
>   >
>   > Message from syslogd@rhcl1 at Jul 29 14:42:05 ...
>
>   Looks like we did find a bug in NFS.
>
> It kind of looks that way.

Is that reproducible on the upstream kernel, or is it something that is
being introduced by the pNFS blocks code?
Trond Myklebust wrote:

    > Looks like we did find a bug in NFS.
    >
    > It kind of looks that way.

  Is that reproducible on the upstream kernel, or is it something that is
  being introduced by the pNFS blocks code?

It happens without the blocks module loaded, but it could be from something
we did outside the module. I will test this weekend when I get a chance.
Trond Myklebust wrote:

  Is that reproducible on the upstream kernel, or is it something that is
  being introduced by the pNFS blocks code?

Upstream kernel 3.0.0-next-20110729 fails in a similar way, so it's not
anything introduced by the block layout code.

kernel BUG at /home/rees/linux-next/fs/nfs/callback_xdr.c:775!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i ip6t_REJECT libcxgbi cxgb3 nf_conntrack_ipv6 nf_defrag_ipv6 mdio ib_iser ip6table_filter ip6_tables rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi amd64_edac_mod i2c_nforce2 shpchp tg3 pcspkr edac_core i2c_core serio_raw k8temp edac_mce_amd ipv6 autofs4 ata_generic pata_acpi mptspi mptscsih mptbase scsi_transport_spi sata_nv pata_amd [last unloaded: scsi_wait_scan]

Pid: 6494, comm: nfsv4.1-svc Tainted: G      W   3.0.0-next-20110729 #2 HP ProLiant DL145 G2/K85NL
RIP: 0010:[<ffffffffa02998df>]  [<ffffffffa02998df>] nfs4_cb_take_slot+0x2e/0x3e [nfs]
RSP: 0018:ffff880074275bf0  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff88007b67ec00 RCX: 0000000000000001
RDX: 000000000000001c RSI: ffff88006b8b49f0 RDI: ffff88007b67ed88
RBP: ffff880074275c00 R08: 00000000000000d0 R09: 0000000000000002
R10: ffff88007fc12e70 R11: ffffffff81b42ab0 R12: ffff880037e72000
R13: 0000000000000000 R14: ffff880037ed3800 R15: ffff88007b67e800
FS:  00007fb17e8de720(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003ff9064c60 CR3: 0000000069a99000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsv4.1-svc (pid: 6494, threadinfo ffff880074274000, task ffff880069410000)
Stack:
 ffff880037e72000 ffff880037ed3800 ffff880074275ca0 ffffffffa029a5c2
 ffff88006b8b49e0 ffff880071786098 ffff880074275ca0 ffffffffa0299a1a
 ffff88006b8b49e0 0000000100000246 ffff88007b67ec48 ffff880037ed3808
Call Trace:
 [<ffffffffa029a5c2>] nfs4_callback_sequence+0x272/0x338 [nfs]
 [<ffffffffa0299a1a>] ? decode_cb_sequence_args+0x12b/0x24a [nfs]
 [<ffffffffa0299733>] nfs4_callback_compound+0x364/0x4e2 [nfs]
 [<ffffffff8106f3bf>] ? groups_alloc+0x38/0xbe
 [<ffffffffa01f7206>] svc_process_common+0x260/0x4d3 [sunrpc]
 [<ffffffffa01f7552>] bc_svc_process+0xd9/0x102 [sunrpc]
 [<ffffffffa029884c>] nfs41_callback_svc+0xd5/0x126 [nfs]
 [<ffffffff8106905f>] ? wake_up_bit+0x25/0x25
 [<ffffffffa0298777>] ? nfs_callback_down+0x7c/0x7c [nfs]
 [<ffffffff81068bf8>] kthread+0x7d/0x85
 [<ffffffff81467e14>] kernel_thread_helper+0x4/0x10
 [<ffffffff81068b7b>] ? kthread_worker_fn+0x147/0x147
 [<ffffffff81467e10>] ? gs_change+0x13/0x13
Code: e5 53 48 83 ec 08 48 8b 9f a8 02 00 00 48 8d bb 88 01 00 00 e8 c4 5b 1c e1 8b 83 5c 02 00 00 ff c0 85 c0 89 83 5c 02 00 00 74 04 <0f> 0b eb fe 66 ff 83 88 01 00 00 41 5b 5b c9 c3 55 48 8d 42 08
RIP  [<ffffffffa02998df>] nfs4_cb_take_slot+0x2e/0x3e [nfs]
 RSP <ffff880074275bf0>
---[ end trace 27e49b345894527a ]---
Callback slot table overflowed
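For context, callback_xdr.c:775 sits in the backchannel slot accounting. A
hedged sketch of the pattern implied by the oops and its disassembly (names
follow the trace; the body is a reconstruction, not the verbatim 3.0
source): the backchannel table has a single slot, highest_used_slotid
starts at -1, and a take that does not land the counter on 0 trips the
check while slot_tbl_lock is still held:

    /* Sketch, reconstructed from the oops -- not the verbatim 3.0 source. */
    static void nfs4_cb_take_slot(struct nfs_client *clp)
    {
        struct nfs4_slot_table *tbl = &clp->cl_session->bc_slot_table;

        spin_lock(&tbl->slot_tbl_lock);
        tbl->highest_used_slotid++;             /* -1 -> 0 on a normal take */
        BUG_ON(tbl->highest_used_slotid != 0);  /* any imbalance dies here,
                                                   with slot_tbl_lock held */
        spin_unlock(&tbl->slot_tbl_lock);
    }

If the BUG fires with the lock held, anything that later tries to take
slot_tbl_lock (for example nfs4_begin_drain_session() in the state manager)
would spin forever, which matches the soft-lockup reports later in the
thread.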
On Sat, Jul 30, 2011 at 11:26 AM, Jim Rees <rees@umich.edu> wrote:
> Trond Myklebust wrote:
>
>     Looks like we did find a bug in NFS.
>
>     It kind of looks that way.
>
>   Is that reproducible on the upstream kernel, or is it something that is
>   being introduced by the pNFS blocks code?
>
> It happens without the blocks module loaded, but it could be from something
> we did outside the module. I will test this weekend when I get a chance.

I tried xfstests again and was able to reproduce a hang on both block
layout and file layout (upstream commit ed1e62, w/o block layout code). It
seems to be a bug in the pNFS code. I did not see it w/ NFSv4. For pNFS
block and file layout, it can be reproduced by just running xfstests with
./check -nfs. It does not show up every time but is likely to happen in
less than 10 runs. Not sure if it is the same one Jim reported though.

block layout trace:
[  660.039009] BUG: soft lockup - CPU#1 stuck for 22s! [10.244.82.74-ma:29730]
[  660.039014] Modules linked in: blocklayoutdriver nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc be2iscsi ip6t_REJECT iscsi_boot_sysfs nf_conntrack_ipv6 nf_defrag_ipv6 bnx2i ip6table_filter cnic uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev i2c_piix4 i2c_core pcspkr e1000 parport_pc microcode parport vmw_balloon shpchp ipv6 floppy mptspi mptscsih mptbase scsi_transport_spi [last unloaded: nfs]
[  660.039014] CPU 1
[  660.039014] Modules linked in: blocklayoutdriver nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc be2iscsi ip6t_REJECT iscsi_boot_sysfs nf_conntrack_ipv6 nf_defrag_ipv6 bnx2i ip6table_filter cnic uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev i2c_piix4 i2c_core pcspkr e1000 parport_pc microcode parport vmw_balloon shpchp ipv6 floppy mptspi mptscsih mptbase scsi_transport_spi [last unloaded: nfs]
[  660.039014]
[  660.039014] Pid: 29730, comm: 10.244.82.74-ma Tainted: G      D     3.0.0-pnfs+ #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[  660.039014] RIP: 0010:[<ffffffff81084f49>]  [<ffffffff81084f49>] do_raw_spin_lock+0x1e/0x25
[  660.039014] RSP: 0018:ffff88001fef5e60  EFLAGS: 00000297
[  660.039014] RAX: 000000000000002b RBX: ffff88003be19000 RCX: 0000000000000001
[  660.039014] RDX: 000000000000002a RSI: ffff8800219a7cf0 RDI: ffff880020e4d988
[  660.039014] RBP: ffff88001fef5e60 R08: 0000000000000000 R09: 000000000000df20
[  660.039014] R10: 0000000000000000 R11: ffff8800219a7c00 R12: ffff88001fef5df0
[  660.039014] R13: 00000000c355df1b R14: ffff88003bfaeac0 R15: ffff8800219a7c00
[  660.039014] FS:  0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  660.039014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  660.039014] CR2: 00007fc6122a4000 CR3: 0000000001a04000 CR4: 00000000000006e0
[  660.039014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  660.039014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  660.039014] Process 10.244.82.74-ma (pid: 29730, threadinfo ffff88001fef4000, task ffff88001fca8000)
[  660.039014] Stack:
[  660.039014]  ffff88001fef5e70 ffffffff814585ee ffff88001fef5e90 ffffffffa02badee
[  660.039014]  0000000000000000 ffff8800219a7c00 ffff88001fef5ee0 ffffffffa02bc2d9
[  660.039014]  ffff880000000000 ffffffffa02d2250 ffff88001fef5ee0 ffff88002059ba10
[  660.039014] Call Trace:
[  660.039014]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[  660.039014]  [<ffffffffa02badee>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[  660.039014]  [<ffffffffa02bc2d9>] nfs4_run_state_manager+0x271/0x517 [nfs]
[  660.039014]  [<ffffffffa02bc068>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[  660.039014]  [<ffffffff810719bf>] kthread+0x84/0x8c
[  660.039014]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[  660.039014]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[  660.039014]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
[  660.039014] Code: 00 00 10 00 74 05 e8 a7 59 1b 00 5d c3 55 48 89 e5 66 66 66 66 90 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 5d c3 55 48 89 e5 66 66 66 66 90 8b 07 89 c2 c1
[  660.039014] Call Trace:
[  660.039014]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[  660.039014]  [<ffffffffa02badee>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[  660.039014]  [<ffffffffa02bc2d9>] nfs4_run_state_manager+0x271/0x517 [nfs]
[  660.039014]  [<ffffffffa02bc068>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[  660.039014]  [<ffffffff810719bf>] kthread+0x84/0x8c
[  660.039014]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[  660.039014]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[  660.039014]  [<ffffffff81460f50>] ? gs_change+0x13/0x13

file layout trace:
[19716.049009] BUG: soft lockup - CPU#1 stuck for 23s! [10.244.82.76-ma:29036]
[19716.049011] Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 be2iscsi iscsi_boot_sysfs bnx2i cnic ip6table_filter uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev microcode i2c_piix4 e1000 vmw_balloon parport_pc parport shpchp pcspkr i2c_core ipv6 mptspi mptscsih mptbase scsi_transport_spi floppy [last unloaded: nfs]
[19716.049011] CPU 1
[19716.049011] Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 be2iscsi iscsi_boot_sysfs bnx2i cnic ip6table_filter uio ip6_tables cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev microcode i2c_piix4 e1000 vmw_balloon parport_pc parport shpchp pcspkr i2c_core ipv6 mptspi mptscsih mptbase scsi_transport_spi floppy [last unloaded: nfs]
[19716.049011]
[19716.049011] Pid: 29036, comm: 10.244.82.76-ma Tainted: G      D     3.0.0-pnfs+ #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[19716.049011] RIP: 0010:[<ffffffff81084f49>]  [<ffffffff81084f49>] do_raw_spin_lock+0x1e/0x25
[19716.049011] RSP: 0018:ffff88002a69be60  EFLAGS: 00000297
[19716.049011] RAX: 0000000000000005 RBX: ffff88002a59fd00 RCX: 0000000000000002
[19716.049011] RDX: 0000000000000004 RSI: ffff8800208c00f0 RDI: ffff8800208c1188
[19716.049011] RBP: ffff88002a69be60 R08: 0000000000000002 R09: 0000ffff00066c0a
[19716.049011] R10: 0000ffff00066c0a R11: ffff8800208c0000 R12: ffff88002a69bdf0
[19716.049011] R13: 0000000001ce15a2 R14: ffff88002a6f1f80 R15: ffff8800208c0000
[19716.049011] FS:  0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[19716.049011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[19716.049011] CR2: 00007fad5ac53000 CR3: 0000000038784000 CR4: 00000000000006e0
[19716.049011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19716.049011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[19716.049011] Process 10.244.82.76-ma (pid: 29036, threadinfo ffff88002a69a000, task ffff880022db9720)
[19716.049011] Stack:
[19716.049011]  ffff88002a69be70 ffffffff814585ee ffff88002a69be90 ffffffffa02be836
[19716.049011]  0000000000000002 ffff8800208c0000 ffff88002a69bee0 ffffffffa02bfd21
[19716.049011]  ffff880000000000 ffffffffa02d59c0 ffff88002a69bee0 ffff880037971ce8
[19716.049011] Call Trace:
[19716.049011]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[19716.049011]  [<ffffffffa02be836>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[19716.049011]  [<ffffffffa02bfd21>] nfs4_run_state_manager+0x271/0x517 [nfs]
[19716.049011]  [<ffffffffa02bfab0>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[19716.049011]  [<ffffffff810719bf>] kthread+0x84/0x8c
[19716.049011]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[19716.049011]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[19716.049011]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
[19716.049011] Code: 00 00 10 00 74 05 e8 a7 59 1b 00 5d c3 55 48 89 e5 66 66 66 66 90 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 5d c3 55 48 89 e5 66 66 66 66 90 8b 07 89 c2 c1
[19716.049011] Call Trace:
[19716.049011]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
[19716.049011]  [<ffffffffa02be836>] nfs4_begin_drain_session+0x24/0x8f [nfs]
[19716.049011]  [<ffffffffa02bfd21>] nfs4_run_state_manager+0x271/0x517 [nfs]
[19716.049011]  [<ffffffffa02bfab0>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
[19716.049011]  [<ffffffff810719bf>] kthread+0x84/0x8c
[19716.049011]  [<ffffffff81460f54>] kernel_thread_helper+0x4/0x10
[19716.049011]  [<ffffffff8107193b>] ? kthread_worker_fn+0x148/0x148
[19716.049011]  [<ffffffff81460f50>] ? gs_change+0x13/0x13
On Sat, 2011-07-30 at 22:25 +0800, Peng Tao wrote:
> I tried xfstests again and was able to reproduce a hang on both block
> layout and file layout (upstream commit ed1e62, w/o block layout code). It
> seems to be a bug in the pNFS code. I did not see it w/ NFSv4. For pNFS
> block and file layout, it can be reproduced by just running xfstests with
> ./check -nfs. It does not show up every time but is likely to happen in
> less than 10 runs. Not sure if it is the same one Jim reported though.
>
> block layout trace:
> [  660.039009] BUG: soft lockup - CPU#1 stuck for 22s! [10.244.82.74-ma:29730]
> [...]
> [  660.039014] Call Trace:
> [  660.039014]  [<ffffffff814585ee>] _raw_spin_lock+0xe/0x10
> [  660.039014]  [<ffffffffa02badee>] nfs4_begin_drain_session+0x24/0x8f [nfs]
> [  660.039014]  [<ffffffffa02bc2d9>] nfs4_run_state_manager+0x271/0x517 [nfs]
> [  660.039014]  [<ffffffffa02bc068>] ? nfs4_do_reclaim+0x422/0x422 [nfs]
> [...]

OK... Looking at the callback code, I see that if tbl->highest_used_slotid
!= 0, then we BUG() while holding the backchannel's tbl->slot_tbl_lock
spinlock. That seems a likely candidate for the above hang.

Andy, how are we guaranteed that tbl->highest_used_slotid won't take values
other than 0, and why do we commit suicide when it does? As far as I can
see, there is no guarantee that we call nfs4_cb_take_slot() in
nfs4_callback_compound(); however, we appear to unconditionally call
nfs4_cb_free_slot() provided there is a session.

The other strangeness is that there is nothing enforcing the
NFS4_SESSION_DRAINING flag. If the session is draining, the back channel
simply ignores that and goes ahead with processing the callback. Is this to
avoid deadlocks with the server returning NFS4ERR_BACK_CHAN_BUSY when the
client does a DESTROY_SESSION?
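To make the suspected take/free imbalance concrete, here is a minimal
user-space model of the hypothesis above. It assumes a single backchannel
slot, a take that asserts the counter lands on 0, and a free that simply
decrements; all names are illustrative, and the decrement-on-free semantics
is an assumption, not quoted from the 3.0 source:

    /* Toy model of the suspected take/free imbalance -- not kernel code. */
    #include <assert.h>

    static int highest_used_slotid = -1;    /* empty single-slot table */

    static void cb_take_slot(void)
    {
            highest_used_slotid++;
            assert(highest_used_slotid == 0);   /* models the BUG_ON() */
    }

    static void cb_free_slot(void)
    {
            highest_used_slotid--;              /* assumed: unconditional */
    }

    int main(void)
    {
            cb_take_slot();     /* CB_SEQUENCE decoded OK: -1 -> 0 */
            cb_free_slot();     /* reply sent:              0 -> -1 */

            cb_free_slot();     /* a compound that never took the slot
                                   still frees on exit:    -1 -> -2 */
            cb_take_slot();     /* next callback: -2 -> -1, assertion
                                   fires, mirroring "Callback slot table
                                   overflowed" in Jim's oops */
            return 0;
    }

Under these assumptions, any error path through nfs4_callback_compound()
that skips the take but not the free would eventually trip the BUG with
slot_tbl_lock held, leaving the state manager spinning exactly as in Peng
Tao's traces.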
--- 001.out	2011-07-29 12:11:34.057245055 -0400
+++ 001.out.bad	2011-07-29 14:41:36.697152750 -0400
@@ -1,9 +1,4 @@
 QA output created by 001
 cleanup
-setup ....................................
-iter 1 chain ... check ....................................
-iter 2 chain ... check ....................................
-iter 3 chain ... check ....................................
-iter 4 chain ... check ....................................
-iter 5 chain ... check ....................................
+001 not run: this test requires a valid host fs for $SCRATCH_DEV
 cleanup
002 [not run] this test requires a valid host fs for $SCRATCH_DEV
003 [not run] not suitable for this filesystem type: nfs
004 [not run] not suitable for this filesystem type: nfs
005 [not run] this test requires a valid host fs for $SCRATCH_DEV
006 [not run] this test requires a valid host fs for $SCRATCH_DEV
007 [not run] this test requires a valid host fs for $SCRATCH_DEV
008 [not run] not suitable for this filesystem type: nfs
009 [not run] not suitable for this filesystem type: nfs
010 [not run] dbtest was not built for this platform
011 [not run] this test requires a valid host fs for $SCRATCH_DEV
012 [not run] not suitable for this filesystem type: nfs
013 [not run] this test requires a valid host fs for $SCRATCH_DEV
014 [not run] this test requires a valid host fs for $SCRATCH_DEV
015 [not run] not suitable for this filesystem type: nfs
016 [not run] not suitable for this filesystem type: nfs
017 [not run] not suitable for this filesystem type: nfs
018 [not run] not suitable for this filesystem type: nfs
019 [not run] not suitable for this filesystem type: nfs
020 [not run] not suitable for this filesystem type: nfs
021 [not run] not suitable for this filesystem type: nfs
022 [not run] xfsdump not found
023 [not run] xfsdump not found
024 [not run] xfsdump not found
025 [not run] xfsdump not found
026 [not run] xfsdump not found
027 [not run] xfsdump not found
028 [not run] xfsdump not found
029 [not run] not suitable for this filesystem type: nfs
030 [not run] not suitable for this filesystem type: nfs
031 [not run] not suitable for this filesystem type: nfs
032 [not run] not suitable for this filesystem type: nfs
033 [not run] not suitable for this filesystem type: nfs
034 [not run] not suitable for this filesystem type: nfs
035 [not run] xfsdump not found
036 [not run] xfsdump not found
037 [not run] xfsdump not found
038 [not run] xfsdump not found
039 [not run] xfsdump not found
040 [not run] Can't run srcdiff without KWORKAREA set
041 [not run] not suitable for this filesystem type: nfs
042 [not run] not suitable for this filesystem type: nfs
043 [not run] xfsdump not found
044 [not run] not suitable for this filesystem type: nfs
045 [not run] not suitable for this filesystem type: nfs
046 [not run] xfsdump not found
047 [not run] xfsdump not found
048 [not run] not suitable for this filesystem type: nfs
049 [not run] not suitable for this filesystem type: nfs
050 [not run] not suitable for this filesystem type: nfs
051 [not run] not suitable for this filesystem type: nfs
052 [not run] not suitable for this filesystem type: nfs
053 [not run] this test requires a valid $SCRATCH_DEV
054 [not run] not suitable for this filesystem type: nfs
055 [not run] xfsdump not found
056 [not run] xfsdump not found
057 [not run] Place holder for IRIX test 057
058 [not run] Place holder for IRIX test 058
059 [not run] Place holder for IRIX test 059
060 [not run] Place holder for IRIX test 060
061 [not run] xfsdump not found
062 [not run] this test requires a valid $SCRATCH_DEV
063 [not run] xfsdump not found
064 [not run] xfsdump not found
065 [not run] xfsdump not found
066 [not run] xfsdump not found
067 [not run] not suitable for this filesystem type: nfs
068 [not run] not suitable for this filesystem type: nfs
069 [not run] this test requires a valid $SCRATCH_DEV
070 [not run] attrs not supported by this filesystem type: nfs
071 [not run] not suitable for this filesystem type: nfs
072 [not run] not suitable for this filesystem type: nfs
073 [not run] not suitable for this filesystem type: nfs
074 [not run] this test requires a valid host fs for $SCRATCH_DEV
075 [not run] this test requires a valid host fs for $SCRATCH_DEV
076 [not run] this test requires a valid $SCRATCH_DEV
077 [not run] attrs not supported by this filesystem type: nfs
078 [not run] not suitable for this filesystem type: nfs
079 [not run] not suitable for this filesystem type: nfs
080 [not run] not suitable for this filesystem type: nfs
081 [not run] not suitable for this filesystem type: nfs
082 [not run] not suitable for this filesystem type: nfs
083 [not run] not suitable for this filesystem type: nfs
084 [not run] not suitable for this filesystem type: nfs
085 [not run] not suitable for this filesystem type: nfs
086 [not run] not suitable for this filesystem type: nfs
087 [not run] not suitable for this filesystem type: nfs
088 - output mismatch (see 088.out.bad)
--- 088.out	2011-07-29 12:11:34.085247833 -0400
+++ 088.out.bad	2011-07-29 14:41:48.336307595 -0400
@@ -1,9 +1,2 @@
 QA output created by 088
-access(TEST_DIR/t_access, 0) returns 0
-access(TEST_DIR/t_access, R_OK) returns 0
-access(TEST_DIR/t_access, W_OK) returns 0
-access(TEST_DIR/t_access, X_OK) returns -1
-access(TEST_DIR/t_access, R_OK | W_OK) returns 0
-access(TEST_DIR/t_access, R_OK | X_OK) returns -1
-access(TEST_DIR/t_access, W_OK | X_OK) returns -1
-access(TEST_DIR/t_access, R_OK | W_OK | X_OK) returns -1
+fchown: Invalid argument