Message ID | 20160603055655.GQ14480@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Well, I have some bad news. This patch does not fix the issue 100% of the time apparently, I just hit it again. Only this time it's much harder to trigger, but stack is the same (looks a bit different due to a compiler change). Must be some much narrower race now. I still don't have a crashdump, though (apparently makedumpfile that is used by kexec-tools is behind times and does not work with 4.7.0-rc1 kernels) so I cannot tell you more about the dentry still. [12470.365211] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989! [12470.366336] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [12470.366927] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis virtio_console tpm nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw virtio_blk floppy [12470.368917] CPU: 7 PID: 15952 Comm: cat Not tainted 4.7.0-rc1-vm-nfs+ #115 [12470.369554] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [12470.370137] task: ffff8800447447c0 ti: ffff880049a48000 task.ti: ffff880049a48000 [12470.371214] RIP: 0010:[<ffffffff81288061>] [<ffffffff81288061>] d_splice_alias+0x1e1/0x470 [12470.372340] RSP: 0018:ffff880049a4bab8 EFLAGS: 00010286 [12470.372906] RAX: ffff8800393372a8 RBX: ffff88003c781000 RCX: 0000000000000001 [12470.373525] RDX: 0000000000001895 RSI: ffff88003c781000 RDI: ffff8800393372a8 [12470.374145] RBP: ffff880049a4baf0 R08: 00001353641935c2 R09: 0000000000000000 [12470.374777] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003a7b9300 [12470.375407] R13: 0000000000000000 R14: ffff88003bf0d2a8 R15: 0000000000000000 [12470.376016] FS: 00007fbb07feb700(0000) GS:ffff88006b800000(0000) knlGS:0000000000000000 [12470.377106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [12470.377693] CR2: 000055a498c7bdc8 CR3: 00000000479b3000 CR4: 00000000000006e0 [12470.378311] Stack: [12470.378823] ffff880040008f00 0000000029e67876 ffff88003c781000 ffff88003a7b9300 [12470.379946] 0000000000000000 ffff88003bf0d2a8 0000000000000000 ffff880049a4bb48 [12470.381075] ffffffff8137d63c ffffffffffffffeb ffff880000000000 0000000000000000 [12470.382228] Call Trace: [12470.382766] [<ffffffff8137d63c>] nfs_lookup+0x15c/0x420 [12470.383363] [<ffffffff8137f681>] nfs_atomic_open+0xb1/0x700 [12470.383961] [<ffffffff812792ea>] lookup_open+0x2ea/0x770 [12470.384570] [<ffffffff8127c76f>] path_openat+0x7ff/0x1030 [12470.385169] [<ffffffff8127d15f>] ? getname_flags+0x4f/0x1f0 [12470.385770] [<ffffffff8104a485>] ? kvm_sched_clock_read+0x25/0x40 [12470.386361] [<ffffffff8127e1d1>] do_filp_open+0x91/0x100 [12470.386945] [<ffffffff8188aa97>] ? _raw_spin_unlock+0x27/0x40 [12470.387559] [<ffffffff8128f810>] ? __alloc_fd+0x100/0x200 [12470.388158] [<ffffffff8126a230>] do_sys_open+0x130/0x220 [12470.388758] [<ffffffff8126a33e>] SyS_open+0x1e/0x20 [12470.389327] [<ffffffff8188b3fc>] entry_SYSCALL_64_fastpath+0x1f/0xbd [12470.389929] Code: 83 c4 10 4c 89 f8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 df e8 f1 d6 ff ff 4c 89 f7 e8 19 2a 60 00 45 31 ff eb d9 49 89 ff eb d4 <0f> 0b 48 8b 43 50 4c 8b 78 68 49 8d 97 c8 03 00 00 eb 02 f3 90 [12470.392299] RIP [<ffffffff81288061>] d_splice_alias+0x1e1/0x470 On Jun 3, 2016, at 1:56 AM, Al Viro wrote: > On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote: > >> This one cures the insta-crash I was having, and I see no other ill-effects so far. > > OK... I can take it through vfs.git, but I think it'd be better off in > NFS tree. Is everyone OK with something like the following? > > make nfs_atomic_open() call d_drop() on all ->open_context() errors. > > In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code" > unconditional d_drop() after the ->open_context() had been removed. It had > been correct for success cases (there ->open_context() itself had been doing > dcache manipulations), but not for error ones. Only one of those (ENOENT) > got a compensatory d_drop() added in that commit, but in fact it should've > been done for all errors. As it is, the case of O_CREAT non-exclusive open > on a hashed negative dentry racing with e.g. symlink creation from another > client ended up with ->open_context() getting an error and proceeding to > call nfs_lookup(). On a hashed dentry, which would've instantly triggered > BUG_ON() in d_materialise_unique() (or, these days, its equivalent in > d_splice_alias()). > > Cc: stable@vger.kernel.org # v3.10+ > Tested-by: Oleg Drokin <green@linuxhacker.ru> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index aaf7bd0..6e3a6f4 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry, > err = PTR_ERR(inode); > trace_nfs_atomic_open_exit(dir, ctx, open_flags, err); > put_nfs_open_context(ctx); > + d_drop(dentry); > switch (err) { > case -ENOENT: > - d_drop(dentry); > d_add(dentry, NULL); > nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); > break; -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote: > Well, I have some bad news. > > This patch does not fix the issue 100% of the time apparently, I just hit it again. Ok, so now finally a piece of good news. Whatever was causing this other much harder to hit crash is gone in -rc2, gone are some other disturbing things I saw. Hopefully this patch will get propagated everywhere soonish.-- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Jun 9, 2016, at 9:33 PM, Oleg Drokin wrote: > > On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote: > >> Well, I have some bad news. >> >> This patch does not fix the issue 100% of the time apparently, I just hit it again. > > Ok, so now finally a piece of good news. > Whatever was causing this other much harder to hit crash is gone in -rc2, gone are > some other disturbing things I saw. Famous last words, I guess. It all returned overnight. But this time it's different from the past couple. 0xffffffff813aa8bb is in nfs4_do_open (/home/green/bk/linux/fs/nfs/nfs4proc.c:2482). 2477 d_drop(dentry); 2478 alias = d_exact_alias(dentry, state->inode); 2479 if (!alias) 2480 alias = d_splice_alias(igrab(state->inode), dentry); 2481 /* d_splice_alias() can't fail here - it's a non-directory */ So it appears the dentry that was negative turned positive in the middle of d_exact_alias call… and also despite us doing d_drop(dentry), it's also already hashed? If this can happen in the middle of our operations here, same thing presumably can happen in the other place we patched up in nfs_atomic_open - we do d_drop there, but by the time we get into d_splice_alias it's now all hashed again by some racing thread, that would explain some of the rarer earlier crashes I had in -rc1 after the initial fix. I wonder if it could be a remote-fs specific open vs open race? Say we have atomic open for parent1/file1, this locks parent1 and proceeds to lookup file1. Now before the lookup commences, some other thread moves file1 to parent2 and yet some other thread goes to open parent2/file1. Eventually it all converges with two threads trying to instantiate file1, if we get it "just wrong" then a clash like what we see can happen, no? Hm, I guess then both opens would have their own dentry and it's only inode that's shared, so that cannot be it. How can anything find a dentry we presumably just allocated and hash it while we are not done with it, I wonder? Also I wonder if all of this is somehow related to this other problem I am having where drop_nlink reports going into negative territory on rename() call, I hit this twice already and I guess I just need to convert that to BUG_ON instead for a closer examination. The dentry is (I finally have a crashdump): crash> p *(struct dentry *)0xffff880055dbd2e8 $4 = { d_flags = 4718796, d_seq = { sequence = 4, dep_map = { key = 0xffffffff8337b1c0 <__key.41115>, class_cache = {0x0, 0x0}, name = 0xffffffff81c79c66 "&dentry->d_seq", cpu = 6, ip = 6510615555426900570 } }, d_hash = { next = 0x0, pprev = 0xffffc900000fd9c0 }, d_parent = 0xffff8800079d1008, d_name = { { { hash = 2223188567, len = 2 }, hash_len = 10813123159 }, name = 0xffff880055dbd358 "10" }, d_inode = 0xffff880066d8ab38, d_iname = "10\000ZZZZZZZZZZZZZZZZZZZZZZZZZZZZ", d_lockref = { { { lock = { { rlock = { raw_lock = { val = { counter = 0 } }, magic = 3735899821, owner_cpu = 4294967295, owner = 0xffffffffffffffff, dep_map = { key = 0xffffffff8337b1c8 <__key.41114>, class_cache = {0xffffffff82c1c3a0 <lock_classes+47616>, 0x0}, name = 0xffffffff81c65bc8 "&(&dentry->d_lockref.lock)->rlock", cpu = 3, ip = 18446744071583760701 } }, { __padding = "\000\000\000\000╜N╜чЪЪЪЪZZZZЪЪЪЪЪЪЪЪ", dep_map = { key = 0xffffffff8337b1c8 <__key.41114>, class_cache = {0xffffffff82c1c3a0 <lock_classes+47616>, 0x0}, name = 0xffffffff81c65bc8 "&(&dentry->d_lockref.lock)->rlock", cpu = 3, ip = 18446744071583760701 } } } }, count = 3 } } }, d_op = 0xffffffff81a46780 <nfs4_dentry_operations>, d_sb = 0xffff88004eaf3000, d_time = 6510615555426956154, d_fsdata = 0x0, { d_lru = { next = 0xffff880066d7e3e8, prev = 0xffff8800036736c8 }, d_wait = 0xffff880066d7e3e8 }, d_child = { next = 0xffff8800268629b8, prev = 0xffff8800079d1128 }, d_subdirs = { next = 0xffff880055dbd408, prev = 0xffff880055dbd408 }, d_u = { d_alias = { next = 0x0, pprev = 0xffff880066d8ad20 }, d_in_lookup_hash = { next = 0x0, pprev = 0xffff880066d8ad20 }, d_rcu = { next = 0x0, func = 0xffff880066d8ad20 } } } [22845.516232] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989! [22845.516864] invalid opcode: 0000 [#1] SMP [22845.517350] Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq joydev tpm_tis tpm virtio_console i2c_piix4 pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_blk serio_raw floppy [22845.519236] CPU: 0 PID: 2894259 Comm: cat Not tainted 4.7.0-rc2-vm-nfs+ #122 [22845.519843] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [22845.520259] task: ffff8800640d8e40 ti: ffff88004665c000 task.ti: ffff88004665c000 [22845.520975] RIP: 0010:[<ffffffff81287811>] [<ffffffff81287811>] d_splice_alias+0x1e1/0x470 [22845.521746] RSP: 0018:ffff88004665fa08 EFLAGS: 00010286 [22845.522122] RAX: ffff880066d8ab38 RBX: 0000000000000000 RCX: 0000000000000001 [22845.522524] RDX: 000000000000191a RSI: ffff880055dbd2e8 RDI: ffff880066d8ab38 [22845.522926] RBP: ffff88004665fa40 R08: 0000235352190a66 R09: ffff880055dbd2e8 [22845.523328] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880072d43c00 [22845.523731] R13: ffff880072d43c00 R14: ffff88004883a580 R15: ffff880052615800 [22845.524131] FS: 00007f3b2c27d700(0000) GS:ffff88006b400000(0000) knlGS:0000000000000000 [22845.524846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [22845.525225] CR2: 000055649285c0d0 CR3: 000000004d961000 CR4: 00000000000006f0 [22845.525628] Stack: [22845.525964] ffff88004665fa20 ffffffff8188a1f7 0000000000000000 ffff880072d43c00 [22845.526700] ffff880072d43c00 ffff88004883a580 ffff880052615800 ffff88004665fb20 [22845.527433] ffffffff813aa8bb ffffffff00000000 ffff8800024000c0 ffff88004665fd80 [22845.528161] Call Trace: [22845.528504] [<ffffffff8188a1f7>] ? _raw_spin_unlock+0x27/0x40 [22845.528943] [<ffffffff813aa8bb>] nfs4_do_open+0xaeb/0xb10 [22845.529331] [<ffffffff813aa9d0>] nfs4_atomic_open+0xf0/0x110 [22845.529718] [<ffffffff8137eefc>] nfs_atomic_open+0x1ac/0x700 [22845.530112] [<ffffffff8127900a>] lookup_open+0x2ea/0x770 [22845.530488] [<ffffffff8127bee5>] path_openat+0x6e5/0xc20 [22845.530881] [<ffffffff8104a425>] ? kvm_sched_clock_read+0x25/0x40 [22845.531268] [<ffffffff8127d981>] do_filp_open+0x91/0x100 [22845.531645] [<ffffffff8188a1f7>] ? _raw_spin_unlock+0x27/0x40 [22845.532027] [<ffffffff8128efc0>] ? __alloc_fd+0x100/0x200 [22845.532405] [<ffffffff81269f80>] do_sys_open+0x130/0x220 [22845.533157] [<ffffffff8126a08e>] SyS_open+0x1e/0x20 vmcore is at http://knox.linuxhacker.ru/tmp/dentry2/vmcore.gz vmlinux is at http://knox.linuxhacker.ru/tmp/dentry2/vmlinux.gz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
It looks like this patch was totally forgotten? I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause crash in nfs code. And I think it's unrelated to the other parallel case too. On Jun 3, 2016, at 1:56 AM, Al Viro wrote: > On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote: > >> This one cures the insta-crash I was having, and I see no other ill-effects so far. > > OK... I can take it through vfs.git, but I think it'd be better off in > NFS tree. Is everyone OK with something like the following? > > make nfs_atomic_open() call d_drop() on all ->open_context() errors. > > In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code" > unconditional d_drop() after the ->open_context() had been removed. It had > been correct for success cases (there ->open_context() itself had been doing > dcache manipulations), but not for error ones. Only one of those (ENOENT) > got a compensatory d_drop() added in that commit, but in fact it should've > been done for all errors. As it is, the case of O_CREAT non-exclusive open > on a hashed negative dentry racing with e.g. symlink creation from another > client ended up with ->open_context() getting an error and proceeding to > call nfs_lookup(). On a hashed dentry, which would've instantly triggered > BUG_ON() in d_materialise_unique() (or, these days, its equivalent in > d_splice_alias()). > > Cc: stable@vger.kernel.org # v3.10+ > Tested-by: Oleg Drokin <green@linuxhacker.ru> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index aaf7bd0..6e3a6f4 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry, > err = PTR_ERR(inode); > trace_nfs_atomic_open_exit(dir, ctx, open_flags, err); > put_nfs_open_context(ctx); > + d_drop(dentry); > switch (err) { > case -ENOENT: > - d_drop(dentry); > d_add(dentry, NULL); > nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); > break; -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote: > It looks like this patch was totally forgotten? > I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause > crash in nfs code. And I think it's unrelated to the other parallel case too. I assumed it would go through NFS tree, seeing that it's NFS-specific and has nothing to do with any of the recent VFS changes (oops is triggerable starting from 3.11); I can certainly put it through vfs.git, and there will be changes nearby, but this one should go into -stable as a separate patch. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jun 20, 2016, at 10:08, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote: >> It looks like this patch was totally forgotten? >> I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause >> crash in nfs code. And I think it's unrelated to the other parallel case too. > > I assumed it would go through NFS tree, seeing that it's NFS-specific and > has nothing to do with any of the recent VFS changes (oops is triggerable > starting from 3.11); I can certainly put it through vfs.git, and there > will be changes nearby, but this one should go into -stable as a separate > patch. > I’ll take it through the NFS tree. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 20, 2016 at 02:54:36PM +0000, Trond Myklebust wrote: > > > On Jun 20, 2016, at 10:08, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote: > >> It looks like this patch was totally forgotten? > >> I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause > >> crash in nfs code. And I think it's unrelated to the other parallel case too. > > > > I assumed it would go through NFS tree, seeing that it's NFS-specific and > > has nothing to do with any of the recent VFS changes (oops is triggerable > > starting from 3.11); I can certainly put it through vfs.git, and there > > will be changes nearby, but this one should go into -stable as a separate > > patch. > > > > I’ll take it through the NFS tree. OK. It's really a -stable fodder, BTW - all you need to trigger that oops is a hashed negative dentry from earlier lookup + symlink created from another client + attempt to open from ours. Gets you d_splice_alias() (or d_materialise_unique() prior to 3.19) with hashed dentry and that triggers BUG_ON, leaving us with the parent directory locked. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On 06/20/2016 10:08 AM, Al Viro wrote: > On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote: >> It looks like this patch was totally forgotten? >> I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause >> crash in nfs code. And I think it's unrelated to the other parallel case too. > > I assumed it would go through NFS tree, seeing that it's NFS-specific and > has nothing to do with any of the recent VFS changes (oops is triggerable > starting from 3.11); I can certainly put it through vfs.git, and there > will be changes nearby, but this one should go into -stable as a separate > patch. I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit. Anna > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Jun 20, 2016, at 11:43 AM, Anna Schumaker wrote: > Hi, > > On 06/20/2016 10:08 AM, Al Viro wrote: >> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote: >>> It looks like this patch was totally forgotten? >>> I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause >>> crash in nfs code. And I think it's unrelated to the other parallel case too. >> >> I assumed it would go through NFS tree, seeing that it's NFS-specific and >> has nothing to do with any of the recent VFS changes (oops is triggerable >> starting from 3.11); I can certainly put it through vfs.git, and there >> will be changes nearby, but this one should go into -stable as a separate >> patch. > > I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit. Yes, it is very easy to hit. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jun 20, 2016, at 11:43, Anna Schumaker <Anna.Schumaker@netapp.com> wrote: > > Hi, > > On 06/20/2016 10:08 AM, Al Viro wrote: >> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote: >>> It looks like this patch was totally forgotten? >>> I don't see it in neither vfs nor nfs trees and yet it fixes a very easy to cause >>> crash in nfs code. And I think it's unrelated to the other parallel case too. >> >> I assumed it would go through NFS tree, seeing that it's NFS-specific and >> has nothing to do with any of the recent VFS changes (oops is triggerable >> starting from 3.11); I can certainly put it through vfs.git, and there >> will be changes nearby, but this one should go into -stable as a separate >> patch. > > I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit. > Hi Anna, Please do, and please keep the Cc: stable… Thanks Trond
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index aaf7bd0..6e3a6f4 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry, err = PTR_ERR(inode); trace_nfs_atomic_open_exit(dir, ctx, open_flags, err); put_nfs_open_context(ctx); + d_drop(dentry); switch (err) { case -ENOENT: - d_drop(dentry); d_add(dentry, NULL); nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); break;