From patchwork Tue Jul 16 02:20:12 2024
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 13733985
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Paolo Bonzini, Ingo Molnar, "H. Peter Anvin", x86@kernel.org,
    Sean Christopherson, Borislav Petkov, linux-kernel@vger.kernel.org,
    Dave Hansen, Thomas Gleixner, Maxim Levitsky
Peter Anvin" , x86@kernel.org, Sean Christopherson , Borislav Petkov , linux-kernel@vger.kernel.org, Dave Hansen , Thomas Gleixner , Maxim Levitsky Subject: [PATCH v2 0/2] Fix for a very old KVM bug in the segment cache Date: Mon, 15 Jul 2024 22:20:12 -0400 Message-Id: <20240716022014.240960-1-mlevitsk@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Hi, Recently, while trying to understand why the pmu_counters_test selftest sometimes fails when run nested I stumbled upon a very interesting and old bug: It turns out that KVM caches guest segment state, but this cache doesn't have any protection against concurrent use. This usually works because the cache is per vcpu, and should only be accessed by vCPU thread, however there is an exception: If the full preemption is enabled in the host kernel, it is possible that vCPU thread will be preempted, for example during the vmx_vcpu_reset. vmx_vcpu_reset resets the segment cache bitmask and then initializes the segments in the vmcs, however if the vcpus is preempted in the middle of this code, the kvm_arch_vcpu_put is called which reads SS's AR bytes to determine if the vCPU is in the kernel mode, which caches the old value. Later vmx_vcpu_reset will set the SS's AR field to the correct value in vmcs but the cache still contains an invalid value which can later for example leak via KVM_GET_SREGS and such. In particular, kvm selftests will do KVM_GET_SREGS, and then KVM_SET_SREGS, with a broken SS's AR field passed as is, which will lead to vm entry failure. This issue is not a nested issue, and actually I was able to reproduce it on bare metal, but due to timing it happens much more often nested. The only requirement for this to happen is to have full preemption enabled in the kernel which runs the selftest. pmu_counters_test reproduces this issue well, because it creates lots of short lived VMs, but the issue as was noted about is not related to pmu. To fix this issue, all places in KVM which write to the segment cache, are now wrapped with vmx_write_segment_cache_start/end which disables the preemption. V2: incorporated Paolo's suggestion of having vmx_write_segment_cache_start/end functions (thanks!) Best regards, Maxim Levitsky Maxim Levitsky (2): KVM: nVMX: use vmx_segment_cache_clear KVM: VMX: disable preemption when touching segment fields arch/x86/kvm/vmx/nested.c | 5 ++++- arch/x86/kvm/vmx/vmx.c | 29 +++++++++++++++++++---------- arch/x86/kvm/vmx/vmx.h | 17 +++++++++++++++++ 3 files changed, 40 insertions(+), 11 deletions(-)