Message ID | 20140717202721.GG30442@fieldses.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/17/2014 10:27 PM, J. Bruce Fields wrote: > On Wed, Jul 16, 2014 at 02:57:24PM -0400, J. Bruce Fields wrote: >> On Sat, Jul 12, 2014 at 08:31:15PM +0800, Kinglong Mee wrote: >>> I think it is caused by kfree an uninitialized address. >>> Can you test with the patch listed in following url, >>> I have send some days before ? >>> >>> "[PATCH 1/4] NFSD: Fix memory leak in encoding denied lock" >>> http://www.spinics.net/lists/linux-nfs/msg44719.html >> >> I have this queued for 3.17, but if it causes a crash then it should go >> to 3.16 now. >> >> However, I'm confused: the only explicit initialization of lk_denied is >> in the case vfs_lock_file() returns -EAGAIN. Our usual tests (cthon, >> pynfs) do lots of succesful locks, so should have hit this before. >> >> OK, I see: this memory zeroed by a memset in svc_process_common(): >> >> memset(rqstp->rq_argp, 0, procp->pc_argsize); >> >> *But* in the case of the NFSv4 compound operation, we only have enough >> space in rq_argp for 8 operations, anything more is allocated in >> fs/nfsd/nfs4xdr.c:nfsd4_decode_compound: >> >> if (argp->opcnt > ARRAY_SIZE(argp->iops)) { >> argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); >> ... >> >> So, perhaps we got a compound with more than 8 operations, with the LOCK >> operation in the 9th or later position? >> >> But the Linux NFS client doesn't do that, so I don't understand how >> Toralf hit this. >> >> Am I missing anything here? >> >> Toralf, is that crash reproduceable? If so, does replacing the above >> kmalloc by a kcalloc also fix it? > > Sorry, that should be kzalloc. We should probably fix that regardless. > > But I still don't understand how you hit this case.... > > --b. > > commit 5d6031ca742f > Author: J. Bruce Fields <bfields@redhat.com> > Date: Thu Jul 17 16:20:39 2014 -0400 > > nfsd4: zero op arguments beyond the 8th compound op > > The first 8 ops of the compound are zeroed since they're a part of the > argument that's zeroed by the > > memset(rqstp->rq_argp, 0, procp->pc_argsize); > > in svc_process_common(). But we handle larger compounds by allocating > the memory on the fly in nfsd4_decode_compound(). Other than code > recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied > lock", I don't know of any examples of code depending on this > initialization. But it definitely seems possible, and I'd rather be > safe. > > Compounds this long are unusual so I'm much more worried about failure > in this poorly tested cases than about an insignificant performance hit. > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > index 01023a595163..628b430e743e 100644 > --- a/fs/nfsd/nfs4xdr.c > +++ b/fs/nfsd/nfs4xdr.c > @@ -1635,7 +1635,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) > goto xdr_error; > > if (argp->opcnt > ARRAY_SIZE(argp->iops)) { > - argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); > + argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); > if (!argp->ops) { > argp->ops = argp->iops; > dprintk("nfsd: couldn't allocate room for COMPOUND\n"); > I'm trying to reproduce it, but due to the nature of fuzz testing ...
On 07/17/2014 10:27 PM, J. Bruce Fields wrote: >> Toralf, is that crash reproduceable? If so, does replacing the above >> kmalloc by a kcalloc also fix it? Yes. I could reproduce it today, it tooks about 2 hours using these trinity paramters : -C 2 -N 100000 -x mremap -x munmap -q -v /mnt/nfsv4 The backtrace of the core file : Thread 1 (LWP 17085): #0 0xb770baec in __kernel_vsyscall () #1 0x08484e85 in kill () at ../sysdeps/unix/syscall-template.S:81 #2 0x0807253d in uml_abort () at arch/um/os-Linux/util.c:93 #3 0x08072885 in os_dump_core () at arch/um/os-Linux/util.c:148 #4 0x0806241d in panic_exit (self=0x86c95c0 <panic_exit_notifier>, unused1=0, unused2=0x8700980 <buf.19753>) at arch/um/kernel/um_arch.c:240 #5 0x08099706 in notifier_call_chain (nl=0x0, val=2231598400, v=0x6, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93 #6 0x08099863 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183 #7 atomic_notifier_call_chain (nh=0x8700944 <panic_notifier_list>, val=0, v=0x8700980 <buf.19753>) at kernel/notifier.c:193 #8 0x084e0494 in panic (fmt=0x0) at kernel/panic.c:133 #9 0x080ffc70 in kfree (x=0x218) at mm/slub.c:3380 #10 0x0822c0db in nfsd4_encode_lock (resp=0x85576120, nfserr=0, lock=0x855752d0) at fs/nfsd/nfs4xdr.c:2910 #11 0x08233ad3 in nfsd4_encode_operation (resp=0x85576120, op=0x855752b0) at fs/nfsd/nfs4xdr.c:3889 #12 0x0822afff in nfsd4_proc_compound (rqstp=0x85571030, args=0x85037d40, resp=0x85576120) at fs/nfsd/nfs4proc.c:1386 #13 0x08219aea in nfsd_dispatch (rqstp=0x85571030, statp=0x7fcc8018) at fs/nfsd/nfssvc.c:686 #14 0x084686ff in svc_process_common (rqstp=0x85571030, argv=0x85037d40, resv=0x855711ac) at net/sunrpc/svc.c:1206 #15 0x08469bbb in svc_process (rqstp=0x85571030) at net/sunrpc/svc.c:1331 #16 0x08219472 in nfsd (vrqstp=0x85571030) at fs/nfsd/nfssvc.c:609 #17 0x08095806 in kthread (_create=0x84f86eb0) at kernel/kthread.c:207 #18 0x0805f64b in new_thread_handler () at arch/um/kernel/process.c:129 #19 0x00000000 in ?? () and the UML guest command line : ... Kernel panic - not syncing: BUG! CPU: 0 PID: 1397 Comm: nfsd Tainted: G B 3.16.0-rc5-00152-g59ca9ee-dirty #76 Stack: 0859f770 0859f770 00000003 086c8547 00000000 855752d0 0c6c9700 85037e1c 084e3e42 00000000 85037df0 85037e44 084e0478 085ab1c4 08700980 0859c46a 85037e50 00000000 00000000 855752d0 0c6c9700 85037e74 080ffc70 0859c46a Call Trace: [<084e3e42>] dump_stack+0x26/0x28 [<084e0478>] panic+0x7a/0x194 [<080ffc70>] kfree+0x80/0x150 [<0822ba5e>] ? nfsd4_encode_stateid+0x3e/0x60 [<0822c0db>] nfsd4_encode_lock+0x4b/0x60 [<08233ad3>] nfsd4_encode_operation+0xd3/0x1d0 [<0822afff>] nfsd4_proc_compound+0x4bf/0x670 [<0809b999>] ? groups_alloc+0x39/0xe0 [<08219aea>] nfsd_dispatch+0xda/0x200 [<084686ff>] svc_process_common+0x31f/0x610 [<08469bbb>] svc_process+0x10b/0x140 [<0809d270>] ? default_wake_function+0x0/0x20 [<08219390>] ? nfsd+0x0/0x140 [<08219472>] nfsd+0xe2/0x140 [<08095806>] kthread+0xd6/0xe0 [<0809c60d>] ? finish_task_switch.isra.56+0x1d/0x70 [<0805f64b>] new_thread_handler+0x6b/0x90 I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ? > > Sorry, that should be kzalloc. We should probably fix that regardless. > > But I still don't understand how you hit this case.... > > --b.
On 07/18/2014 06:22 PM, Toralf Förster wrote:
> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?
Well, next crash (with kzalloc patch) happened after 20 minutes ...
On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <toralf.foerster@gmx.de> wrote: > On 07/18/2014 06:22 PM, Toralf Förster wrote: >> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ? > > Well, next crash (with kzalloc patch) happened after 20 minutes ... Maybe I have found the problem. The stateid and denied are defined as an union as, fs/nfsd/xdr4.h 145 struct nfsd4_lock_denied { 146 clientid_t ld_clientid; 147 struct xdr_netobj ld_owner; 148 u64 ld_start; 149 u64 ld_length; 150 u32 ld_type; 151 }; 152 153 struct nfsd4_lock { ... ... 174 /* response */ 175 union { 176 struct { 177 stateid_t stateid; 178 } ok; 179 struct nfsd4_lock_denied denied; 180 } u; 30 struct xdr_netobj { 31 unsigned int len; 32 u8 * data; 33 }; sizeof(stateid_t) = 16, sizeof(clientid_t) = 8, sizeof(struct xdr_netobj) = 16, (on x86_x64 platform), sizeof(struct xdr_netobj) = 8, (on i686 platform) Lock file success, nfsd will copy stateid to the union, but the value also influence denied. If on x86_64 platform, only influence the len in xdr_netobj, but on i686 platform, will influence the len and the data in xdr_netobj. So, the problem only appears on i686 platform. thanks, Kinglong Mee -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/19/2014 05:23 AM, Kinglong Mee wrote: > Lock file success, nfsd will copy stateid to the union, but the value > also influence denied. > If on x86_64 platform, only influence the len in xdr_netobj, > but on i686 platform, will influence the len and the data in xdr_netobj. > So, the problem only appears on i686 platform. OT, but wouldn't this be a good example for the coverity people to check for it too ?
On Sat, Jul 19, 2014 at 11:23:59AM +0800, Kinglong Mee wrote: > On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <toralf.foerster@gmx.de> wrote: > > On 07/18/2014 06:22 PM, Toralf Förster wrote: > >> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ? > > > > Well, next crash (with kzalloc patch) happened after 20 minutes ... > > Maybe I have found the problem. > The stateid and denied are defined as an union as, > fs/nfsd/xdr4.h > 145 struct nfsd4_lock_denied { > 146 clientid_t ld_clientid; > 147 struct xdr_netobj ld_owner; > 148 u64 ld_start; > 149 u64 ld_length; > 150 u32 ld_type; > 151 }; > 152 > 153 struct nfsd4_lock { > ... ... > 174 /* response */ > 175 union { > 176 struct { > 177 stateid_t stateid; > 178 } ok; > 179 struct nfsd4_lock_denied denied; > 180 } u; > > 30 struct xdr_netobj { > 31 unsigned int len; > 32 u8 * data; > 33 }; > > sizeof(stateid_t) = 16, sizeof(clientid_t) = 8, > sizeof(struct xdr_netobj) = 16, (on x86_x64 platform), > sizeof(struct xdr_netobj) = 8, (on i686 platform) > > Lock file success, nfsd will copy stateid to the union, but the value > also influence denied. > If on x86_64 platform, only influence the len in xdr_netobj, > but on i686 platform, will influence the len and the data in xdr_netobj. > So, the problem only appears on i686 platform. Oh, great catch, thanks. Sounds like that would explain all of Toralf's results. I'll include this explanation with your original patch and submit it for 3.16. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 7/21/2014 23:55, J. Bruce Fields wrote: > On Sat, Jul 19, 2014 at 11:23:59AM +0800, Kinglong Mee wrote: >> On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <toralf.foerster@gmx.de> wrote: >>> On 07/18/2014 06:22 PM, Toralf Förster wrote: >>>> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ? >>> >>> Well, next crash (with kzalloc patch) happened after 20 minutes ... >> >> Maybe I have found the problem. >> The stateid and denied are defined as an union as, >> fs/nfsd/xdr4.h >> 145 struct nfsd4_lock_denied { >> 146 clientid_t ld_clientid; >> 147 struct xdr_netobj ld_owner; >> 148 u64 ld_start; >> 149 u64 ld_length; >> 150 u32 ld_type; >> 151 }; >> 152 >> 153 struct nfsd4_lock { >> ... ... >> 174 /* response */ >> 175 union { >> 176 struct { >> 177 stateid_t stateid; >> 178 } ok; >> 179 struct nfsd4_lock_denied denied; >> 180 } u; >> >> 30 struct xdr_netobj { >> 31 unsigned int len; >> 32 u8 * data; >> 33 }; >> >> sizeof(stateid_t) = 16, sizeof(clientid_t) = 8, >> sizeof(struct xdr_netobj) = 16, (on x86_x64 platform), >> sizeof(struct xdr_netobj) = 8, (on i686 platform) >> >> Lock file success, nfsd will copy stateid to the union, but the value >> also influence denied. >> If on x86_64 platform, only influence the len in xdr_netobj, >> but on i686 platform, will influence the len and the data in xdr_netobj. >> So, the problem only appears on i686 platform. > > Oh, great catch, thanks. Sounds like that would explain all of Toralf's > results. > > I'll include this explanation with your original patch and submit it for > 3.16. I saw the patch in your git-tree, there is one fault about the comments. > (Note that lock->lk_denied.ld_owner.data appears it should be zero here, > until you notice that it's one arm of a union the other arm of which is > written to in the succesful case by the > > memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t)); > > in nfsd4_open_confirm(). It's, memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, sizeof(stateid_t)); in nfsd4_lock(). > In the 32-bit case this overwrites ld_owner.data.) thanks, Kinglong Mee -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 23, 2014 at 01:04:35PM +0800, Kinglong Mee wrote: > On 7/21/2014 23:55, J. Bruce Fields wrote: > > On Sat, Jul 19, 2014 at 11:23:59AM +0800, Kinglong Mee wrote: > >> On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <toralf.foerster@gmx.de> wrote: > >>> On 07/18/2014 06:22 PM, Toralf Förster wrote: > >>>> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ? > >>> > >>> Well, next crash (with kzalloc patch) happened after 20 minutes ... > >> > >> Maybe I have found the problem. > >> The stateid and denied are defined as an union as, > >> fs/nfsd/xdr4.h > >> 145 struct nfsd4_lock_denied { > >> 146 clientid_t ld_clientid; > >> 147 struct xdr_netobj ld_owner; > >> 148 u64 ld_start; > >> 149 u64 ld_length; > >> 150 u32 ld_type; > >> 151 }; > >> 152 > >> 153 struct nfsd4_lock { > >> ... ... > >> 174 /* response */ > >> 175 union { > >> 176 struct { > >> 177 stateid_t stateid; > >> 178 } ok; > >> 179 struct nfsd4_lock_denied denied; > >> 180 } u; > >> > >> 30 struct xdr_netobj { > >> 31 unsigned int len; > >> 32 u8 * data; > >> 33 }; > >> > >> sizeof(stateid_t) = 16, sizeof(clientid_t) = 8, > >> sizeof(struct xdr_netobj) = 16, (on x86_x64 platform), > >> sizeof(struct xdr_netobj) = 8, (on i686 platform) > >> > >> Lock file success, nfsd will copy stateid to the union, but the value > >> also influence denied. > >> If on x86_64 platform, only influence the len in xdr_netobj, > >> but on i686 platform, will influence the len and the data in xdr_netobj. > >> So, the problem only appears on i686 platform. > > > > Oh, great catch, thanks. Sounds like that would explain all of Toralf's > > results. > > > > I'll include this explanation with your original patch and submit it for > > 3.16. > > I saw the patch in your git-tree, there is one fault about the comments. > > > (Note that lock->lk_denied.ld_owner.data appears it should be zero here, > > until you notice that it's one arm of a union the other arm of which is > > written to in the succesful case by the > > > > memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t)); > > > > in nfsd4_open_confirm(). > > It's, > memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, > sizeof(stateid_t)); > in nfsd4_lock(). Thanks, fixed! --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 01023a595163..628b430e743e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1635,7 +1635,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) goto xdr_error; if (argp->opcnt > ARRAY_SIZE(argp->iops)) { - argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); + argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); if (!argp->ops) { argp->ops = argp->iops; dprintk("nfsd: couldn't allocate room for COMPOUND\n");