Message ID | 20220303183328.1499189-2-dmatlack@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RESEND,1/2] KVM: Prevent module exit until all VMs are freed |
On Thu, Mar 03, 2022, David Matlack wrote:
> Tie the lifetime of the KVM module to the lifetime of each VM via
> kvm.users_count. This way anything that grabs a reference to the VM via
> kvm_get_kvm() cannot accidentally outlive the KVM module.
>
> Prior to this commit, the lifetime of the KVM module was tied to the
> lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> file descriptors by their respective file_operations "owner" field.
> This approach is insufficient because references grabbed via
> kvm_get_kvm() do not prevent closing any of the aforementioned file
> descriptors.
>
> This fixes a long standing theoretical bug in KVM that at least affects
> async page faults. kvm_setup_async_pf() grabs a reference via
> kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> prevents the VM file descriptor from being closed and the KVM module
> from being unloaded before this callback runs.
>
> Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")

And (or)

  Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")

because the above is x86-centric, at a glance PPC and maybe s390 have issues
beyond async #PF.

> Cc: stable@vger.kernel.org
> Suggested-by: Ben Gardon <bgardon@google.com>
> [ Based on a patch from Ben implemented for Google's kernel. ]
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  virt/kvm/kvm_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 35ae6d32dae5..b59f0a29dbd5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
>
>  static const struct file_operations stat_fops_per_vm;
>
> +static struct file_operations kvm_chardev_ops;
> +
>  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
>  			   unsigned long arg);
>  #ifdef CONFIG_KVM_COMPAT
> @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	preempt_notifier_inc();
>  	kvm_init_pm_notifier(kvm);
>
> +	if (!try_module_get(kvm_chardev_ops.owner)) {

The "try" aspect is unnecessary. Stealing from Paolo's version,

	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
	__module_get(kvm_chardev_ops.owner);

> +		r = -ENODEV;
> +		goto out_err;
> +	}
> +
>  	return kvm;
>
>  out_err:
> @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	preempt_notifier_dec();
>  	hardware_disable_all();
>  	mmdrop(mm);
> +	module_put(kvm_chardev_ops.owner);
>  }
>
>  void kvm_get_kvm(struct kvm *kvm)
>
> base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
> --
> 2.35.1.616.g0bdcbb4464-goog
>
On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 03, 2022, David Matlack wrote:
> > Tie the lifetime of the KVM module to the lifetime of each VM via
> > kvm.users_count. This way anything that grabs a reference to the VM via
> > kvm_get_kvm() cannot accidentally outlive the KVM module.
> >
> > Prior to this commit, the lifetime of the KVM module was tied to the
> > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > file descriptors by their respective file_operations "owner" field.
> > This approach is insufficient because references grabbed via
> > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > descriptors.
> >
> > This fixes a long standing theoretical bug in KVM that at least affects
> > async page faults. kvm_setup_async_pf() grabs a reference via
> > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > prevents the VM file descriptor from being closed and the KVM module
> > from being unloaded before this callback runs.
> >
> > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
>
> And (or)
>
>   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
>
> because the above is x86-centric, at a glance PPC and maybe s390 have issues
> beyond async #PF.
>
> > Cc: stable@vger.kernel.org
> > Suggested-by: Ben Gardon <bgardon@google.com>
> > [ Based on a patch from Ben implemented for Google's kernel. ]
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  virt/kvm/kvm_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 35ae6d32dae5..b59f0a29dbd5 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> >
> >  static const struct file_operations stat_fops_per_vm;
> >
> > +static struct file_operations kvm_chardev_ops;
> > +
> >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> >  			   unsigned long arg);
> >  #ifdef CONFIG_KVM_COMPAT
> > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >  	preempt_notifier_inc();
> >  	kvm_init_pm_notifier(kvm);
> >
> > +	if (!try_module_get(kvm_chardev_ops.owner)) {
>
> The "try" aspect is unnecessary. Stealing from Paolo's version,
>
> 	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> 	__module_get(kvm_chardev_ops.owner);

Right, I did see that and agree we're guaranteed the KVM module has a
reference at this point. But the KVM module might be in state
MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
try_module_get() checks.

>
> > +		r = -ENODEV;
> > +		goto out_err;
> > +	}
> > +
> >  	return kvm;
> >
> >  out_err:
> > @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >  	preempt_notifier_dec();
> >  	hardware_disable_all();
> >  	mmdrop(mm);
> > +	module_put(kvm_chardev_ops.owner);
> >  }
> >
> >  void kvm_get_kvm(struct kvm *kvm)
> >
> > base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> > prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> > prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
> > --
> > 2.35.1.616.g0bdcbb4464-goog
> >
On Tue, Mar 08, 2022, David Matlack wrote:
> On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Mar 03, 2022, David Matlack wrote:
> > > Tie the lifetime of the KVM module to the lifetime of each VM via
> > > kvm.users_count. This way anything that grabs a reference to the VM via
> > > kvm_get_kvm() cannot accidentally outlive the KVM module.
> > >
> > > Prior to this commit, the lifetime of the KVM module was tied to the
> > > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > > file descriptors by their respective file_operations "owner" field.
> > > This approach is insufficient because references grabbed via
> > > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > > descriptors.
> > >
> > > This fixes a long standing theoretical bug in KVM that at least affects
> > > async page faults. kvm_setup_async_pf() grabs a reference via
> > > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > > prevents the VM file descriptor from being closed and the KVM module
> > > from being unloaded before this callback runs.
> > >
> > > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> >
> > And (or)
> >
> >   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
> >
> > because the above is x86-centric, at a glance PPC and maybe s390 have issues
> > beyond async #PF.
> >
> > > Cc: stable@vger.kernel.org
> > > Suggested-by: Ben Gardon <bgardon@google.com>
> > > [ Based on a patch from Ben implemented for Google's kernel. ]
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >  virt/kvm/kvm_main.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 35ae6d32dae5..b59f0a29dbd5 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> > >
> > >  static const struct file_operations stat_fops_per_vm;
> > >
> > > +static struct file_operations kvm_chardev_ops;
> > > +
> > >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> > >  			   unsigned long arg);
> > >  #ifdef CONFIG_KVM_COMPAT
> > > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> > >  	preempt_notifier_inc();
> > >  	kvm_init_pm_notifier(kvm);
> > >
> > > +	if (!try_module_get(kvm_chardev_ops.owner)) {
> >
> > The "try" aspect is unnecessary. Stealing from Paolo's version,
> >
> > 	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> > 	__module_get(kvm_chardev_ops.owner);
>
> Right, I did see that and agree we're guaranteed the KVM module has a
> reference at this point. But the KVM module might be in state
> MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
> try_module_get() checks.

Ah, can you throw that in as a comment? Doesn't have to be much, just enough of
a breadcrumb to connect the dots and to prevent us from "optimizing" this to
__module_get() in the future.

	/* Use the "try" variant to play nice with e.g. "rmmod --wait". */

With a comment,

Reviewed-by: Sean Christopherson <seanjc@google.com>
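The distinction discussed above, between a "try" get that can fail once unload has begun and an unconditional get, can be illustrated with a small userspace model. This is only a sketch: the toy_* names and the two-state enum are hypothetical stand-ins invented for illustration, not the kernel's definitions, and the model assumes (as stated in the thread) that a module can be in the GOING state while a reference is still held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's module states. */
enum toy_module_state { TOY_MODULE_LIVE, TOY_MODULE_GOING };

struct toy_module {
	enum toy_module_state state;
	int refcnt;
};

/* Models the "try" variant: refuse new references once unload has begun. */
static bool toy_try_module_get(struct toy_module *mod)
{
	if (mod == NULL)
		return true;	/* no owner: nothing to pin */
	if (mod->state == TOY_MODULE_GOING)
		return false;
	mod->refcnt++;
	return true;
}

/* Models the unconditional variant: the caller must already hold a reference. */
static void toy_module_get_unchecked(struct toy_module *mod)
{
	if (mod == NULL)
		return;
	assert(mod->refcnt > 0);
	mod->refcnt++;	/* succeeds even if the module is GOING */
}

static void toy_module_put(struct toy_module *mod)
{
	if (mod == NULL)
		return;
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

/* Models the check added at the end of VM creation: 0 or -ENODEV. */
static int toy_create_vm(struct toy_module *owner)
{
	if (!toy_try_module_get(owner))
		return -ENODEV;
	return 0;
}
```

In this model, VM creation fails with -ENODEV once the module is marked GOING, whereas the unchecked variant would still take a new reference. That is the behavioral difference the requested comment is meant to document.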
On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 03, 2022, David Matlack wrote:
> > Tie the lifetime of the KVM module to the lifetime of each VM via
> > kvm.users_count. This way anything that grabs a reference to the VM via
> > kvm_get_kvm() cannot accidentally outlive the KVM module.
> >
> > Prior to this commit, the lifetime of the KVM module was tied to the
> > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > file descriptors by their respective file_operations "owner" field.
> > This approach is insufficient because references grabbed via
> > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > descriptors.
> >
> > This fixes a long standing theoretical bug in KVM that at least affects
> > async page faults. kvm_setup_async_pf() grabs a reference via
> > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > prevents the VM file descriptor from being closed and the KVM module
> > from being unloaded before this callback runs.
> >
> > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
>
> And (or)
>
>   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
>
> because the above is x86-centric, at a glance PPC and maybe s390 have issues
> beyond async #PF.

SGTM. It's a moot point in terms of stable inclusion since af585b921e5d
was first added in v2.6.38. But for anyone doing their own
backporting, 3d3aab1b973b makes it a bit more obvious this is a
generic problem even though it's not the commit that introduces the
bug.

>
> > Cc: stable@vger.kernel.org
> > Suggested-by: Ben Gardon <bgardon@google.com>
> > [ Based on a patch from Ben implemented for Google's kernel. ]
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  virt/kvm/kvm_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 35ae6d32dae5..b59f0a29dbd5 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> >
> >  static const struct file_operations stat_fops_per_vm;
> >
> > +static struct file_operations kvm_chardev_ops;
> > +
> >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> >  			   unsigned long arg);
> >  #ifdef CONFIG_KVM_COMPAT
> > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >  	preempt_notifier_inc();
> >  	kvm_init_pm_notifier(kvm);
> >
> > +	if (!try_module_get(kvm_chardev_ops.owner)) {
>
> The "try" aspect is unnecessary. Stealing from Paolo's version,
>
> 	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> 	__module_get(kvm_chardev_ops.owner);
>
> > +		r = -ENODEV;
> > +		goto out_err;
> > +	}
> > +
> >  	return kvm;
> >
> >  out_err:
> > @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >  	preempt_notifier_dec();
> >  	hardware_disable_all();
> >  	mmdrop(mm);
> > +	module_put(kvm_chardev_ops.owner);
> >  }
> >
> >  void kvm_get_kvm(struct kvm *kvm)
> >
> > base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> > prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> > prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
> > --
> > 2.35.1.616.g0bdcbb4464-goog
> >
On Tue, Mar 8, 2022 at 3:09 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 08, 2022, David Matlack wrote:
> > On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Mar 03, 2022, David Matlack wrote:
> > > > Tie the lifetime of the KVM module to the lifetime of each VM via
> > > > kvm.users_count. This way anything that grabs a reference to the VM via
> > > > kvm_get_kvm() cannot accidentally outlive the KVM module.
> > > >
> > > > Prior to this commit, the lifetime of the KVM module was tied to the
> > > > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > > > file descriptors by their respective file_operations "owner" field.
> > > > This approach is insufficient because references grabbed via
> > > > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > > > descriptors.
> > > >
> > > > This fixes a long standing theoretical bug in KVM that at least affects
> > > > async page faults. kvm_setup_async_pf() grabs a reference via
> > > > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > > > prevents the VM file descriptor from being closed and the KVM module
> > > > from being unloaded before this callback runs.
> > > >
> > > > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> > >
> > > And (or)
> > >
> > >   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
> > >
> > > because the above is x86-centric, at a glance PPC and maybe s390 have issues
> > > beyond async #PF.
> > >
> > > > Cc: stable@vger.kernel.org
> > > > Suggested-by: Ben Gardon <bgardon@google.com>
> > > > [ Based on a patch from Ben implemented for Google's kernel. ]
> > > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > > ---
> > > >  virt/kvm/kvm_main.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 35ae6d32dae5..b59f0a29dbd5 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> > > >
> > > >  static const struct file_operations stat_fops_per_vm;
> > > >
> > > > +static struct file_operations kvm_chardev_ops;
> > > > +
> > > >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> > > >  			   unsigned long arg);
> > > >  #ifdef CONFIG_KVM_COMPAT
> > > > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> > > >  	preempt_notifier_inc();
> > > >  	kvm_init_pm_notifier(kvm);
> > > >
> > > > +	if (!try_module_get(kvm_chardev_ops.owner)) {
> > >
> > > The "try" aspect is unnecessary. Stealing from Paolo's version,
> > >
> > > 	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> > > 	__module_get(kvm_chardev_ops.owner);
> >
> > Right, I did see that and agree we're guaranteed the KVM module has a
> > reference at this point. But the KVM module might be in state
> > MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
> > try_module_get() checks.
>
> Ah, can you throw that in as a comment? Doesn't have to be much, just enough of
> a breadcrumb to connect the dots and to prevent us from "optimizing" this to
> __module_get() in the future.
>
> 	/* Use the "try" variant to play nice with e.g. "rmmod --wait". */

Yeah. I should have included this in the first place (or at least a
blurb in the commit message).

>
> With a comment,
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>
Hi, David.

Some comments below.

On 3/3/22 15:33, David Matlack wrote:
> Tie the lifetime of the KVM module to the lifetime of each VM via
> kvm.users_count. This way anything that grabs a reference to the VM via
> kvm_get_kvm() cannot accidentally outlive the KVM module.
>
> Prior to this commit, the lifetime of the KVM module was tied to the
> lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> file descriptors by their respective file_operations "owner" field.
> This approach is insufficient because references grabbed via
> kvm_get_kvm() do not prevent closing any of the aforementioned file
> descriptors.
>
> This fixes a long standing theoretical bug in KVM that at least affects
> async page faults. kvm_setup_async_pf() grabs a reference via
> kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> prevents the VM file descriptor from being closed and the KVM module
> from being unloaded before this callback runs.
>
> Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> Cc: stable@vger.kernel.org
> Suggested-by: Ben Gardon <bgardon@google.com>
> [ Based on a patch from Ben implemented for Google's kernel. ]
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  virt/kvm/kvm_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 35ae6d32dae5..b59f0a29dbd5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
>
>  static const struct file_operations stat_fops_per_vm;
>
> +static struct file_operations kvm_chardev_ops;
> +
>  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
>  			   unsigned long arg);
>  #ifdef CONFIG_KVM_COMPAT
> @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	preempt_notifier_inc();
>  	kvm_init_pm_notifier(kvm);
>
> +	if (!try_module_get(kvm_chardev_ops.owner)) {
> +		r = -ENODEV;
> +		goto out_err;
> +	}
> +

Doesn't this problem also affect the other functions called from
kvm_dev_ioctl()?

Is it possible that the module is removed while other ioctl's are
still running, e.g. KVM_GET_API_VERSION and KVM_CHECK_EXTENSION, even
though they don't use struct kvm?

I wonder if this try_module_get() (along with module_put() in the out
path of the function) shouldn't be placed in the upper function
kvm_dev_ioctl() so it would cover all the other ioctl's.

>  	return kvm;
>
>  out_err:
> @@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	preempt_notifier_dec();
>  	hardware_disable_all();
>  	mmdrop(mm);
> +	module_put(kvm_chardev_ops.owner);
>  }
>
>  void kvm_get_kvm(struct kvm *kvm)
>
> base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
> prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
> prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156
On 3/15/22 16:43, Murilo Opsfelder Araújo wrote:
>>
>> +	if (!try_module_get(kvm_chardev_ops.owner)) {
>> +		r = -ENODEV;
>> +		goto out_err;
>> +	}
>> +
>
> Doesn't this problem also affect the other functions called from
> kvm_dev_ioctl()?
>
> Is it possible that the module is removed while other ioctl's are
> still running, e.g. KVM_GET_API_VERSION and KVM_CHECK_EXTENSION, even
> though they don't use struct kvm?

No, because opening /dev/kvm also adds a reference to the module. The
problem is that create_vm creates another source of references to the
module that can survive after /dev/kvm is closed.

Paolo
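The lifetime issue described in this thread can be sketched with a small userspace model: a VM reference taken by asynchronous work outlives the file descriptors, so the module stays pinned only if the VM itself holds a module reference until its last user goes away. Everything here is a hypothetical illustration (the toy_* types and functions are invented and greatly simplified), not KVM's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified module: only a reference count. */
struct toy_module {
	int refcnt;
};

/* Hypothetical, simplified VM with a user count and an owning module. */
struct toy_vm {
	int users_count;
	struct toy_module *owner;
	bool destroyed;
};

/* In this model the module may be unloaded only when nothing pins it. */
static bool toy_module_can_unload(const struct toy_module *mod)
{
	return mod->refcnt == 0;
}

/* VM creation takes one module reference, mirroring what the patch adds. */
static void toy_vm_create(struct toy_vm *vm, struct toy_module *owner)
{
	vm->users_count = 1;	/* reference held by the VM file descriptor */
	vm->owner = owner;
	vm->destroyed = false;
	owner->refcnt++;
}

/* Models taking an extra VM reference, e.g. for asynchronous work. */
static void toy_vm_get(struct toy_vm *vm)
{
	assert(vm->users_count > 0);
	vm->users_count++;
}

/* Dropping the last VM reference destroys the VM and unpins the module. */
static void toy_vm_put(struct toy_vm *vm)
{
	assert(vm->users_count > 0);
	if (--vm->users_count == 0) {
		vm->destroyed = true;
		vm->owner->refcnt--;
	}
}
```

In this model, closing the VM file descriptor while the asynchronous work still holds its reference leaves the module pinned; only the final put, made from the callback, allows unloading.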
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 35ae6d32dae5..b59f0a29dbd5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static const struct file_operations stat_fops_per_vm;
 
+static struct file_operations kvm_chardev_ops;
+
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 #ifdef CONFIG_KVM_COMPAT
@@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	preempt_notifier_inc();
 	kvm_init_pm_notifier(kvm);
 
+	if (!try_module_get(kvm_chardev_ops.owner)) {
+		r = -ENODEV;
+		goto out_err;
+	}
+
 	return kvm;
 
 out_err:
@@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
+	module_put(kvm_chardev_ops.owner);
 }
 
 void kvm_get_kvm(struct kvm *kvm)
Tie the lifetime of the KVM module to the lifetime of each VM via
kvm.users_count. This way anything that grabs a reference to the VM via
kvm_get_kvm() cannot accidentally outlive the KVM module.

Prior to this commit, the lifetime of the KVM module was tied to the
lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
file descriptors by their respective file_operations "owner" field.
This approach is insufficient because references grabbed via
kvm_get_kvm() do not prevent closing any of the aforementioned file
descriptors.

This fixes a long standing theoretical bug in KVM that at least affects
async page faults. kvm_setup_async_pf() grabs a reference via
kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
prevents the VM file descriptor from being closed and the KVM module
from being unloaded before this callback runs.

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Suggested-by: Ben Gardon <bgardon@google.com>
[ Based on a patch from Ben implemented for Google's kernel. ]
Signed-off-by: David Matlack <dmatlack@google.com>
---
 virt/kvm/kvm_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156