diff mbox series

[v2,1/2] KVM: Prevent module exit until all VMs are freed

Message ID 20220309213208.872644-2-dmatlack@google.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/2] KVM: Prevent module exit until all VMs are freed | expand

Commit Message

David Matlack March 9, 2022, 9:32 p.m. UTC
Tie the lifetime the KVM module to the lifetime of each VM via
kvm.users_count. This way anything that grabs a reference to the VM via
kvm_get_kvm() cannot accidentally outlive the KVM module.

Prior to this commit, the lifetime of the KVM module was tied to the
lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
file descriptors by their respective file_operations "owner" field.
This approach is insufficient because references grabbed via
kvm_get_kvm() donot prevent closing any of the aforementioned file
descriptors.

This fixes a long standing theoretical bug in KVM that at least affects
async page faults. kvm_setup_async_pf() grabs a reference via
kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
prevents the VM file descriptor from being closed and the KVM module
from being unloaded before this callback runs.

PPC and s390 also look broken beyond the Fixes commits listed below, but
the below commits should be more than enough to guarantee inclusion in
all stable kernels.

Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
[ This 2.6.29 commit was an incomplete attempt to fix this bug. ]
Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
[ This 2.6.38 commit introduced async_pf and is definitely broken. ]
Cc: stable@vger.kernel.org
Suggested-by: Ben Gardon <bgardon@google.com>
[ Based on a patch from Ben implemented for Google's kernel. ]
Reviewed-by: Sean Christopherson <seanjc@google.com>

Signed-off-by: David Matlack <dmatlack@google.com>
---
 virt/kvm/kvm_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)


base-commit: ce41d078aaa9cf15cbbb4a42878cc6160d76525e
diff mbox series

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9581a24c3d17..e17f9fd847e0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -117,6 +117,8 @@  EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static const struct file_operations stat_fops_per_vm;
 
+static struct file_operations kvm_chardev_ops;
+
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 #ifdef CONFIG_KVM_COMPAT
@@ -1132,6 +1134,12 @@  static struct kvm *kvm_create_vm(unsigned long type)
 	preempt_notifier_inc();
 	kvm_init_pm_notifier(kvm);
 
+	/* Use the "try" variant to play nice with e.g. "rmmod --wait". */
+	if (!try_module_get(kvm_chardev_ops.owner)) {
+		r = -ENODEV;
+		goto out_err;
+	}
+
 	return kvm;
 
 out_err:
@@ -1221,6 +1229,7 @@  static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
+	module_put(kvm_chardev_ops.owner);
 }
 
 void kvm_get_kvm(struct kvm *kvm)