Message ID | 20160714164647.GD14480@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 14/07/2016 18:46, Al Viro wrote: > On Tue, Jul 12, 2016 at 12:24:39PM +0200, Paolo Bonzini wrote: > >> On 12/07/2016 11:38, Liu Shuo wrote: >>> The failure of create debugfs of VM will return directly without release >>> the anon file. It will leak memory and file descriptors, even through >>> be not serious. >>> >>> Signed-off-by: Liu Shuo <shuo.a.liu@intel.com> > > This is broken. > >>> @@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) >>> >>> if (kvm_create_vm_debugfs(kvm, r) < 0) { >>> kvm_put_kvm(kvm); >>> + sys_close(r); > > You have no warranty whatsoever that descriptor table has not been changed > by that point. You should *NEVER* use sys_close() on failure exit paths > like that. Moreover, this kvm_put_kvm() becomes a double-put, since > closing the damn file will drop that reference to kvm. > >>> return -ENOMEM; >>> } >>> >>> >> >> Thanks, applied to kvm/master. > > Please, revert. anon_inode_getfd() should be used only when there's no > possible failures past its call. What you need in this case (due to fucking > ugly API - decriptor number used as part of debugfs pathname) is something > like this (against mainline): Al, thanks very much for the explanation and the patch. I've reverted the original patch and installed yours instead. Paolo > [kvm] don't use anon_inode_getfd() before possible failures > > Once anon_inode_getfd() has succeeded, it's impossible to undo in a clean > way and no, sys_close() is not usable in such cases. Use anon_inode_getfile() > and get_unused_fd_flags() to get struct file and descriptor and do *not* > install the file into descriptor table until after the last possible failure > exit. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 48bd520..7f82c6c 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -3048,6 +3048,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) > { > int r; > struct kvm *kvm; > + struct file *file; > > kvm = kvm_create_vm(type); > if (IS_ERR(kvm)) > @@ -3059,17 +3060,25 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) > return r; > } > #endif > - r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC); > + r = get_unused_fd_flags(O_CLOEXEC); > if (r < 0) { > kvm_put_kvm(kvm); > return r; > } > + file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); > + if (IS_ERR(file)) { > + put_unused_fd(r); > + kvm_put_kvm(kvm); > + return PTR_ERR(file); > + } > > if (kvm_create_vm_debugfs(kvm, r) < 0) { > - kvm_put_kvm(kvm); > + put_unused_fd(r); > + fput(file); > return -ENOMEM; > } > > + fd_install(r, file); > return r;git > } > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu 14.Jul'16 at 17:46:47 +0100, Al Viro wrote: >On Tue, Jul 12, 2016 at 12:24:39PM +0200, Paolo Bonzini wrote: > >> On 12/07/2016 11:38, Liu Shuo wrote: >> > The failure of create debugfs of VM will return directly without release >> > the anon file. It will leak memory and file descriptors, even through >> > be not serious. >> > >> > Signed-off-by: Liu Shuo <shuo.a.liu@intel.com> > >This is broken. > >> > @@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) >> > >> > if (kvm_create_vm_debugfs(kvm, r) < 0) { >> > kvm_put_kvm(kvm); >> > + sys_close(r); > >You have no warranty whatsoever that descriptor table has not been changed >by that point. You should *NEVER* use sys_close() on failure exit paths Could you please elaborate why we're not sure descriptor table's changing at the point? >like that. Moreover, this kvm_put_kvm() becomes a double-put, since >closing the damn file will drop that reference to kvm. You're right. Thanks for the correction on this point. > >> > return -ENOMEM; >> > } >> > >> > >> >> Thanks, applied to kvm/master. > >Please, revert. anon_inode_getfd() should be used only when there's no >possible failures past its call. What you need in this case (due to fucking >ugly API - decriptor number used as part of debugfs pathname) is something >like this (against mainline): > >[kvm] don't use anon_inode_getfd() before possible failures > >Once anon_inode_getfd() has succeeded, it's impossible to undo in a clean >way and no, sys_close() is not usable in such cases. Use anon_inode_getfile() >and get_unused_fd_flags() to get struct file and descriptor and do *not* >install the file into descriptor table until after the last possible failure >exit. > >Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Thanks. It's better. >--- >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >index 48bd520..7f82c6c 100644 >--- a/virt/kvm/kvm_main.c >+++ b/virt/kvm/kvm_main.c >@@ -3048,6 +3048,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) > { > int r; > struct kvm *kvm; >+ struct file *file; > > kvm = kvm_create_vm(type); > if (IS_ERR(kvm)) >@@ -3059,17 +3060,25 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) > return r; > } > #endif >- r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC); >+ r = get_unused_fd_flags(O_CLOEXEC); > if (r < 0) { > kvm_put_kvm(kvm); > return r; > } >+ file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); >+ if (IS_ERR(file)) { >+ put_unused_fd(r); >+ kvm_put_kvm(kvm); >+ return PTR_ERR(file); >+ } > > if (kvm_create_vm_debugfs(kvm, r) < 0) { >- kvm_put_kvm(kvm); >+ put_unused_fd(r); >+ fput(file); > return -ENOMEM; > } > >+ fd_install(r, file); > return r; > } > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 15, 2016 at 10:22:04AM +0800, Liu Shuo wrote: > > You have no warranty whatsoever that descriptor table has not been changed > > by that point. You should *NEVER* use sys_close() on failure exit paths > Could you please elaborate why we're not sure descriptor table's changing at the point? Because that could be called by one thread while another (having guessed the descriptor you are about to get) does close()/dup2()/etc. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri 15.Jul'16 at 3:26:03 +0100, Al Viro wrote: >On Fri, Jul 15, 2016 at 10:22:04AM +0800, Liu Shuo wrote: >> > You have no warranty whatsoever that descriptor table has not been changed >> > by that point. You should *NEVER* use sys_close() on failure exit paths >> Could you please elaborate why we're not sure descriptor table's changing at the point? > >Because that could be called by one thread while another (having guessed the >descriptor you are about to get) does close()/dup2()/etc. If there is no such thread (who operates the descriptor based on guessing), i can think the changing is safe at the point. As the fd has not been delivered to userspace. Am i right? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote: > If there is no such thread (who operates the descriptor based on > guessing), i can think the changing is safe at the point. As the fd has > not been delivered to userspace. Am i right? <wry> Expecting nice behaviour from userland code is something best avoided, really. </wry> All jokes aside, this other thread doesn't have to be malicious - just being buggy would suffice. Besides, you never know if something like userns won't be dumped into the kernel, making your ioctl accessible to genuinely malicious code. The only sane approach is to treat descriptor tables as shared data structures and postpone the insertion of struct file reference into descriptor table until you are past all failure exits. Including the ones related to copying to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated with it, reserves two descriptors, copies them into userland array and only if everything has succeeded proceeds to fd_install(). In your case passing the descriptor to userland is not an issue (return value of ioctl(2) goes via register and that can't fail), so the last failure exit is that after failed attempt to create debugfs stuff. We have to reserve the descriptor before that (it's used as a part of debugfs directory name), so anon_inode_getfd() is not an option - it combines reserving descriptor with fd_install(). Such situations are exactly the reason why anon_inode_getfile() is there; anon_inode_getfd() is usable only when it is the very last thing we do before returning the descriptor to userland. FWIW, original code was not unreasonable - it simply treated debugfs stuff as optional and ignored those failures. That way anon_inode_getfd() is fine - there's no failure exits after it. If we want to fail when debugfs had been enabled and we'd failed to populate it, we need to use the real primitives behind anon_inode_getfd(), though. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri 15.Jul'16 at 6:09:59 +0100, Al Viro wrote: >On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote: > >> If there is no such thread (who operates the descriptor based on >> guessing), i can think the changing is safe at the point. As the fd has >> not been delivered to userspace. Am i right? > ><wry> >Expecting nice behaviour from userland code is something best avoided, really. ></wry> Got it! :) > >All jokes aside, this other thread doesn't have to be malicious - just being >buggy would suffice. Besides, you never know if something like userns won't >be dumped into the kernel, making your ioctl accessible to genuinely >malicious code. > >The only sane approach is to treat descriptor tables as shared data structures >and postpone the insertion of struct file reference into descriptor table >until you are past all failure exits. Including the ones related to copying >to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated >with it, reserves two descriptors, copies them into userland array and only if >everything has succeeded proceeds to fd_install(). In your case passing the >descriptor to userland is not an issue (return value of ioctl(2) goes via >register and that can't fail), so the last failure exit is that after failed >attempt to create debugfs stuff. We have to reserve the descriptor before >that (it's used as a part of debugfs directory name), so anon_inode_getfd() >is not an option - it combines reserving descriptor with fd_install(). >Such situations are exactly the reason why anon_inode_getfile() is there; >anon_inode_getfd() is usable only when it is the very last thing we do >before returning the descriptor to userland. Really appreciate your exhaustive explanations. Thanks. > >FWIW, original code was not unreasonable - it simply treated debugfs stuff as >optional and ignored those failures. That way anon_inode_getfd() is fine - >there's no failure exits after it. If we want to fail when debugfs had >been enabled and we'd failed to populate it, we need to use the real primitives >behind anon_inode_getfd(), though. So, my firstly immature idea returns back. Is the fd used by debugfs name very useful? Do we really need it here? Alternative approach is putting the fd into the debugfs, then the code will be clean here, as we can put anon_inode_getfd at very last of the ioctl. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 48bd520..7f82c6c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3048,6 +3048,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) { int r; struct kvm *kvm; + struct file *file; kvm = kvm_create_vm(type); if (IS_ERR(kvm)) @@ -3059,17 +3060,25 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) return r; } #endif - r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC); + r = get_unused_fd_flags(O_CLOEXEC); if (r < 0) { kvm_put_kvm(kvm); return r; } + file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); + if (IS_ERR(file)) { + put_unused_fd(r); + kvm_put_kvm(kvm); + return PTR_ERR(file); + } if (kvm_create_vm_debugfs(kvm, r) < 0) { - kvm_put_kvm(kvm); + put_unused_fd(r); + fput(file); return -ENOMEM; } + fd_install(r, file); return r; }