From patchwork Wed Feb 26 18:14:20 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13992917 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4447E22E3F1 for ; Wed, 26 Feb 2025 18:15:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740593705; cv=none; b=OrRXmS3K07haFJe98ULWuhmAZUousoP9lgpXU5P7mZGH7tcnDbb8MNo0ljvTzMYd3Uy5rV/i7cpZ0IpHFwoi8Mm3fE6w12nCqDu1E0PzhHyDQ6wT3rTW32VypcbGljcrtw9VF5gNpNm1GyBRsK9m9yBTR+jikDrM9gFs/4sJjkc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740593705; c=relaxed/simple; bh=5Kk3y5PCDWPzxrS3m/SQBAQhfMgokgFjRloIXhxSO7I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Q7TwltYmkDSJWqT/jQR7E3pTUfekpcC2RuZe8FTDdPNggEdmgTnIqqG0V3dGPQO3gbuUFKrNDyNZrDBiPtq4Y0emdBWrg7m6+4UqIeOlaFJPCj7eWzA4DKifE3BxQtzEOnvKP8AZ3pvkpgDsPN3l9kpzgB9wON9Lng1f+rkBHbE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=e6oXzmjI; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="e6oXzmjI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740593702; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oWJcvMonOJSpmhcxeUWcddM1INajJ8rhyv2xBoxE7vw=; b=e6oXzmjIFcK0NTg7urS02mtxArBV+XHvhzsMe2aAqayiBWE5M6X9sOsnnzLFefn341sPXs NLr7OzYHHVbtrD6gIUrkfK/DElnitW/3FV+4HjW4pGR/nYhKQcqFTIRrIdLz422NQ7nari hS2/NYwhOHD9A99ofrsoV08LC+IOLKo= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-654-M2oJ8OExPEa7aCgfMqKTiA-1; Wed, 26 Feb 2025 13:14:58 -0500 X-MC-Unique: M2oJ8OExPEa7aCgfMqKTiA-1 X-Mimecast-MFC-AGG-ID: M2oJ8OExPEa7aCgfMqKTiA_1740593696 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8F863180087C; Wed, 26 Feb 2025 18:14:56 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 374B01944D02; Wed, 26 Feb 2025 18:14:55 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Aaron Lewis , Jim Mattson , Kai Huang , Isaku Yamahata Subject: [PATCH 01/33] KVM: x86: Free vCPUs before freeing VM state Date: Wed, 26 Feb 2025 13:14:20 -0500 Message-ID: <20250226181453.2311849-2-pbonzini@redhat.com> In-Reply-To: <20250226181453.2311849-1-pbonzini@redhat.com> References: <20250226181453.2311849-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 From: Sean Christopherson Free vCPUs before freeing any VM state, as both SVM and VMX may access VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs to be kicked out of nested guest mode. Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was called") partially fixed the issue, but for unknown reasons only moved the MMU unloading before VM destruction. Complete the change, and free all vCPU state prior to destroying VM state, as nVMX accesses even more state than nSVM. In addition to the AVIC, KVM can hit a use-after-free on MSR filters: kvm_msr_allowed+0x4c/0xd0 __kvm_set_msr+0x12d/0x1e0 kvm_set_msr+0x19/0x40 load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel] nested_vmx_vmexit+0x715/0xbd0 [kvm_intel] nested_vmx_free_vcpu+0x33/0x50 [kvm_intel] vmx_free_vcpu+0x54/0xc0 [kvm_intel] kvm_arch_vcpu_destroy+0x28/0xf0 kvm_vcpu_destroy+0x12/0x50 kvm_arch_destroy_vm+0x12c/0x1c0 kvm_put_kvm+0x263/0x3c0 kvm_vm_release+0x21/0x30 and an upcoming fix to process injectable interrupts on nested VM-Exit will access the PIC: BUG: kernel NULL pointer dereference, address: 0000000000000090 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm] Call Trace: kvm_cpu_has_injectable_intr+0xe/0x60 [kvm] nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel] nested_vmx_free_vcpu+0x40/0x50 [kvm_intel] vmx_vcpu_free+0x2d/0x80 [kvm_intel] kvm_arch_vcpu_destroy+0x2d/0x130 [kvm] kvm_destroy_vcpus+0x8a/0x100 [kvm] kvm_arch_destroy_vm+0xa7/0x1d0 [kvm] kvm_destroy_vm+0x172/0x300 [kvm] kvm_vcpu_release+0x31/0x50 [kvm] Inarguably, both nSVM and nVMX need to be fixed, but punt on those cleanups for the moment. Conceptually, vCPUs should be freed before VM state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs are created, so it stands to reason that they must be freed _after_ vCPUs are destroyed. Reported-by: Aaron Lewis Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com Cc: Jim Mattson Cc: Yan Zhao Cc: Rick P Edgecombe Cc: Kai Huang Cc: Isaku Yamahata Signed-off-by: Sean Christopherson Message-ID: <20250224235542.2562848-2-seanjc@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 02159c967d29..6fc4ddc606bd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12877,11 +12877,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm) mutex_unlock(&kvm->slots_lock); } kvm_unload_vcpu_mmus(kvm); + kvm_destroy_vcpus(kvm); kvm_x86_call(vm_destroy)(kvm); kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1)); kvm_pic_destroy(kvm); kvm_ioapic_destroy(kvm); - kvm_destroy_vcpus(kvm); kvfree(rcu_dereference_check(kvm->arch.apic_map, 1)); kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1)); kvm_mmu_uninit_vm(kvm);