Message ID | 20130326144640.GB3353@fieldses.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Mar 26, 2013 at 10:46:40AM -0400, J. Bruce Fields wrote: > > fs/locks.c:2093 says that we did the final fput of a file that still had > > posix locks held on it. > > > > I can't see how that would happen, but admittedly the nfsd4 code here is > > much too complicated for its own good. > > > > If it were possible it would be useful to know what lead up to this. A > > network trace (tcpdump -s0 -wtmp.pcap, then send me tmp.pcap) would be > > useful for that, but I fear it's probably too huge by the time you hit > > the bug. > > > > (The following warning ("rcu_sched self-detected stall") looks like the > > result of BUGing while holding file_lock_lock. That BUG should probably > > be downgraded to a WARN_ON_ONCE.) > > Can you run with test patches? > > This just makes nfsd's fput calls synchronous so that we see in the > backtrace who called them. > > (But assuming it does turn out to be one of these callers, we may then > need some more printk's in the nfsd code...). Might also be a good idea to dump the offending struct file_lock, while we are at it... -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/26/2013 03:46 PM, J. Bruce Fields wrote: > Can you run with test patches? > > This just makes nfsd's fput calls synchronous so that we see in the > backtrace who called them. Well - the patched 3.8.4 host kernel now survives the stress test of the UML system. What I get at the host is now : 2013-03-26T18:32:17.487+01:00 n22 kernel: ------------[ cut here ]------------ 2013-03-26T18:32:17.487+01:00 n22 kernel: WARNING: at mm/page_alloc.c:2376 __alloc_pages_nodemask+0x262/0x7c0() 2013-03-26T18:32:17.487+01:00 n22 kernel: Hardware name: 4180F65 2013-03-26T18:32:17.487+01:00 n22 kernel: Modules linked in: md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr snd_hda_codec_conexant snd_hda_intel uvcvideo i915 usblp snd_hda_codec sdhci_pci snd_pcm cfbfillrect videobuf2_vmalloc arc4 iwldvm cfbimgblt i2c_algo_bit cfbcopyarea intel_agp videobuf2_memops sdhci coretemp videobuf2_core mac80211 snd_page_alloc kvm_intel acpi_cpufreq fbcon mperf bitblit softcursor font intel_gtt videodev mmc_core snd_timer kvm drm_kms_helper thinkpad_acpi drm nvram agpgart fb snd i2c_i801 video i2c_core tpm_tis soundcore processor 8250_pci 8250_core iwlwifi thermal battery cfg80211 fbdev ac tpm rfkill psmouse serial_core e1000e evdev thermal_sys button tpm_bios wmi hwmon aesni_intel ablk_helper cryptd lr w aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode] 2013-03-26T18:32:17.487+01:00 n22 kernel: Pid: 6614, comm: nfsd Not tainted 3.8.4 #9 2013-03-26T18:32:17.487+01:00 n22 kernel: Call Trace: 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c1033ba2>] warn_slowpath_common+0x72/0xa0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10db802>] ? __alloc_pages_nodemask+0x262/0x7c0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10db802>] ? __alloc_pages_nodemask+0x262/0x7c0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c1033bf2>] warn_slowpath_null+0x22/0x30 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10db802>] __alloc_pages_nodemask+0x262/0x7c0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c1069758>] ? dequeue_task_fair+0x258/0x6f0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c13a8803>] ? inet_recvmsg+0x73/0x90 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10dbd7c>] __get_free_pages+0x1c/0x30 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110e0f9>] kmalloc_order_trace+0x29/0xb0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f857e480>] ? ip_map_request+0xd0/0xd0 [sunrpc] 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85824d6>] ? cache_revisit_request+0x26/0xd0 [sunrpc] 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110f149>] __kmalloc+0x1b9/0x1e0 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85835bf>] ? cache_check+0x22f/0x340 [sunrpc] 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f90a0b1c>] nfs4_acl_new+0x1c/0x30 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095482>] nfsd4_decode_fattr+0x302/0x6c0 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<c110f068>] ? __kmalloc+0xd8/0x1e0 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095871>] nfsd4_decode_setattr+0x31/0x40 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f909804d>] nfs4svc_decode_compoundargs+0x23d/0x280 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9097e10>] ? nfsd4_release_compoundargs+0x70/0x70 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9081b3f>] nfsd_dispatch+0x3f/0x1b0 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f908165d>] nfsd+0xad/0x110 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd] 2013-03-26T18:32:17.488+01:00 n22 kernel: [<c1053b94>] kthread+0x94/0xa0 2013-03-26T18:32:17.488+01:00 n22 kernel: [<c13d0f77>] ret_from_kernel_thread+0x1b/0x28 2013-03-26T18:32:17.488+01:00 n22 kernel: [<c1053b00>] ? flush_kthread_work+0xd0/0xd0 2013-03-26T18:32:17.488+01:00 n22 kernel: ---[ end trace 82b96ff8bcd5453f ]---
On 03/26/2013 03:46 PM, J. Bruce Fields wrote:
> Can you run with test patches?
applied - today I got a warning instead a bug at the host kernel (3.8.4):
2013-03-27T22:28:43.480+01:00 n22 kernel: ------------[ cut here ]------------
2013-03-27T22:28:43.480+01:00 n22 kernel: WARNING: at fs/locks.c:2093 locks_remove_flock+0xe5/0x110()
2013-03-27T22:28:43.480+01:00 n22 kernel: Hardware name: 4180F65
2013-03-27T22:28:43.480+01:00 n22 kernel: Modules linked in: loop md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr i915 cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt sdhci_pci fbcon bitblit softcursor font drm_kms_helper sdhci usblp drm arc4 mmc_core iwldvm mac80211 uvcvideo videobuf2_vmalloc snd_hda_codec_conexant videobuf2_memops videobuf2_core coretemp videodev acpi_cpufreq iwlwifi mperf cfg80211 snd_hda_intel agpgart fb fbdev i2c_i801 thinkpad_acpi processor video thermal ac snd_hda_codec snd_pcm snd_page_alloc 8250_pci i2c_core kvm_intel snd_timer nvram snd soundcore 8250_core e1000e serial_core button tpm_tis tpm kvm rfkill tpm_bios psmouse thermal_sys battery wmi evdev hwmon aesni_intel ablk_helper cryp
td lrw aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
2013-03-27T22:28:43.480+01:00 n22 kernel: Pid: 7442, comm: nfsd Not tainted 3.8.4 #13
2013-03-27T22:28:43.480+01:00 n22 kernel: Call Trace:
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033ba2>] warn_slowpath_common+0x72/0xa0
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033bf2>] warn_slowpath_null+0x22/0x30
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] locks_remove_flock+0xe5/0x110
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c110eb53>] ? kmem_cache_alloc+0x73/0x140
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111e719>] __fput+0x79/0x1f0
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111ed1b>] __fput_sync+0x4b/0x60
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b17>] __nfs4_file_put_access+0x57/0x80 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b73>] nfs4_file_put_access+0x33/0x40 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd4f6>] nfsd4_open_downgrade+0x1b6/0x210 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd340>] ? nfsd4_open_confirm+0x130/0x130 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8baf94b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbf180>] ? nfsd4_set_lockstateid+0x50/0x50 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba1ba0>] nfsd_dispatch+0xa0/0x1b0 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba165d>] nfsd+0xad/0x110 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba15b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b94>] kthread+0x94/0xa0
2013-03-27T22:28:43.481+01:00 n22 kernel: [<c13d0f77>] ret_from_kernel_thread+0x1b/0x28
2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b00>] ? flush_kthread_work+0xd0/0xd0
2013-03-27T22:28:43.481+01:00 n22 kernel: ---[ end trace 66f6fe36b0d13e8f ]---
The UML kernel was 3.9.0-rc4-00144-ga8c4528, and the trinity command just :
$> trinity --children 2 --victims /mnt/nfs/n22/victims -x mremap
On Wed, Mar 27, 2013 at 10:35:12PM +0100, Toralf Förster wrote: > On 03/26/2013 03:46 PM, J. Bruce Fields wrote: > > Can you run with test patches? > > applied - today I got a warning instead a bug at the host kernel (3.8.4): Excellent! Thanks, I think I see the bug now.... --b. > 2013-03-27T22:28:43.480+01:00 n22 kernel: ------------[ cut here ]------------ > 2013-03-27T22:28:43.480+01:00 n22 kernel: WARNING: at fs/locks.c:2093 locks_remove_flock+0xe5/0x110() > 2013-03-27T22:28:43.480+01:00 n22 kernel: Hardware name: 4180F65 > 2013-03-27T22:28:43.480+01:00 n22 kernel: Modules linked in: loop md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr i915 cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt sdhci_pci fbcon bitblit softcursor font drm_kms_helper sdhci usblp drm arc4 mmc_core iwldvm mac80211 uvcvideo videobuf2_vmalloc snd_hda_codec_conexant videobuf2_memops videobuf2_core coretemp videodev acpi_cpufreq iwlwifi mperf cfg80211 snd_hda_intel agpgart fb fbdev i2c_i801 thinkpad_acpi processor video thermal ac snd_hda_codec snd_pcm snd_page_alloc 8250_pci i2c_core kvm_intel snd_timer nvram snd soundcore 8250_core e1000e serial_core button tpm_tis tpm kvm rfkill tpm_bios psmouse thermal_sys battery wmi evdev hwmon aesni_intel ablk_helper cryp > td lrw aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode] > 2013-03-27T22:28:43.480+01:00 n22 kernel: Pid: 7442, comm: nfsd Not tainted 3.8.4 #13 > 2013-03-27T22:28:43.480+01:00 n22 kernel: Call Trace: > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033ba2>] warn_slowpath_common+0x72/0xa0 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033bf2>] warn_slowpath_null+0x22/0x30 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] locks_remove_flock+0xe5/0x110 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c110eb53>] ? kmem_cache_alloc+0x73/0x140 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111e719>] __fput+0x79/0x1f0 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111ed1b>] __fput_sync+0x4b/0x60 > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b17>] __nfs4_file_put_access+0x57/0x80 [nfsd] > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b73>] nfs4_file_put_access+0x33/0x40 [nfsd] > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd4f6>] nfsd4_open_downgrade+0x1b6/0x210 [nfsd] > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd340>] ? nfsd4_open_confirm+0x130/0x130 [nfsd] > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8baf94b>] nfsd4_proc_compound+0x41b/0x530 [nfsd] > 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbf180>] ? nfsd4_set_lockstateid+0x50/0x50 [nfsd] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba1ba0>] nfsd_dispatch+0xa0/0x1b0 [nfsd] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba165d>] nfsd+0xad/0x110 [nfsd] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba15b0>] ? nfsd_destroy+0x70/0x70 [nfsd] > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b94>] kthread+0x94/0xa0 > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<c13d0f77>] ret_from_kernel_thread+0x1b/0x28 > 2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b00>] ? flush_kthread_work+0xd0/0xd0 > 2013-03-27T22:28:43.481+01:00 n22 kernel: ---[ end trace 66f6fe36b0d13e8f ]--- > > > The UML kernel was 3.9.0-rc4-00144-ga8c4528, and the trinity command just : > $> trinity --children 2 --victims /mnt/nfs/n22/victims -x mremap > > -- > MfG/Sincerely > Toralf Förster > pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/file_table.c b/fs/file_table.c index de9e965..79cb999 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -337,6 +337,8 @@ void __fput_sync(struct file *file) } } +EXPORT_SYMBOL(__fput_sync); + EXPORT_SYMBOL(fput); void put_filp(struct file *file) diff --git a/fs/locks.c b/fs/locks.c index a94e331..c962161 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2090,7 +2090,7 @@ void locks_remove_flock(struct file *filp) continue; } /* What? */ - BUG(); + WARN_ON_ONCE(1); } before = &fl->fl_next; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 9d1c5db..c01c60c 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -951,7 +951,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_offset, rqstp->rq_vec, nvecs, &cnt, &write->wr_how_written); if (filp) - fput(filp); + __fput_sync(filp); write->wr_bytes_written = cnt; @@ -1325,7 +1325,7 @@ encode_op: } /* XXX Ugh, we need to get rid of this kind of special case: */ if (op->opnum == OP_READ && op->u.read.rd_filp) - fput(op->u.read.rd_filp); + __fput_sync(op->u.read.rd_filp); nfsd4_increment_op_stats(op->opnum); } diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index ba6fdd4..dfdf54c 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -546,7 +546,7 @@ nfsd4_shutdown_recdir(struct nfsd_net *nn) { if (!nn->rec_file) return; - fput(nn->rec_file); + __fput_sync(nn->rec_file); nn->rec_file = NULL; } diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a8309c6..0f38b39 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -201,7 +201,7 @@ static void nfs4_file_get_access(struct nfs4_file *fp, int oflag) static void nfs4_file_put_fd(struct nfs4_file *fp, int oflag) { if (fp->fi_fds[oflag]) { - fput(fp->fi_fds[oflag]); + __fput_sync(fp->fi_fds[oflag]); fp->fi_fds[oflag] = NULL; } } @@ -353,7 +353,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp) if (atomic_dec_and_test(&fp->fi_delegees)) { vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease); fp->fi_lease = NULL; - fput(fp->fi_deleg_file); + __fput_sync(fp->fi_deleg_file); fp->fi_deleg_file = NULL; } } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index d586117..76bb67c 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -826,7 +826,7 @@ out: void nfsd_close(struct file *filp) { - fput(filp); + __fput_sync(filp); } /*