Message ID | 20180925202000.27352-1-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] KVM: x86: fix L1TF's MMIO GFN calculation |
On 09/25/2018 01:20 PM, Sean Christopherson wrote:
> One defense against L1TF in KVM is to always set the upper five bits
> of the *legal* physical address in the SPTEs for non-present and
> reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the
> MMIO SPTE may overlap with the upper five bits that are being usurped
> to defend against L1TF. To preserve the GFN, the bits of the GFN that
> overlap with the repurposed bits are shifted left into the reserved
> bits, i.e. the GFN in the SPTE will be split into high and low parts.
> When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO
> access, get_mmio_spte_gfn() unshifts the affected bits and restores
> the original GFN for comparison. Unfortunately, get_mmio_spte_gfn()
> neglects to mask off the reserved bits in the SPTE that were used to
> store the upper chunk of the GFN. As a result, KVM fails to detect
> MMIO accesses whose GPA overlaps the repurposed bits, which in turn
> causes guest panics and hangs.
>
> Fix the bug by generating a mask that covers the lower chunk of the
> GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The
> alternative approach would be to explicitly zero the five reserved
> bits that are used to store the upper chunk of the GFN, but that
> requires additional run-time computation and makes an already-ugly
> bit of code even more inscrutable.
>
> I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to
> warn if GENMASK_ULL() generated a nonsensical value, but that seemed
> silly since that would mean a system that supports VMX has less than
> 18 bits of physical address space...
>
> Reported-by: Sakari Ailus <sakari.ailus@iki.fi>
> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
> Cc: Junaid Shahid <junaids@google.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>
> v2: Changed the mask to cover only the GFN and tweaked variable
>     names to better reflect this behavior [Junaid Shahid].
>
> I didn't include Jim and Junaid's "Reviewed-by" tags in the changelog
> because v2 reworks the calculation of the mask, which is the part of
> this change that I'm most likely to screw up (math is hard). I pasted
> them below in case you feel it's appropriate to keep them.
>
> Reviewed-by: Junaid Shahid <junaids@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
>
> arch/x86/kvm/mmu.c | 24 ++++++++++++++++++++----
> 1 file changed, 20 insertions(+), 4 deletions(-)

Reviewed-by: Junaid Shahid <junaids@google.com>
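To make the encoding concrete, here is a minimal, self-contained userspace sketch of the split-and-unshift scheme the changelog describes. The 46-bit physical address width, the genmask() helper and the sample GPA are assumptions for illustration only; the names mirror the kernel's, but this is not the KVM code itself.

```c
/*
 * Sketch of the L1TF GFN split/unshift, assuming a 46-bit physical
 * address width and 5 repurposed bits; illustration only.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12
#define RSVD_LEN    5   /* plays the role of shadow_nonpresent_or_rsvd_mask_len */
#define PHYS_BITS   46  /* stand-in for boot_cpu_data.x86_phys_bits */

/* Mask with bits [lo, hi] set, like the kernel's GENMASK_ULL(hi, lo). */
static uint64_t genmask(unsigned int hi, unsigned int lo)
{
        return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

int main(void)
{
        /* Upper five *legal* physical address bits, always set against L1TF. */
        uint64_t rsvd_mask = genmask(PHYS_BITS - 1, PHYS_BITS - RSVD_LEN);
        /* GFN bits that sit below the repurposed region. */
        uint64_t lower_gfn_mask = genmask(PHYS_BITS - RSVD_LEN - 1, PAGE_SHIFT);

        /* A page-aligned GPA whose GFN overlaps the repurposed bits 41-45. */
        uint64_t gpa = 0x2345678000ULL | (3ULL << (PHYS_BITS - RSVD_LEN));

        /* Encode: set the repurposed bits, stash the overlapping GFN bits
         * shifted left by RSVD_LEN into the reserved bits above them. */
        uint64_t spte = gpa | rsvd_mask;
        spte |= (gpa & rsvd_mask) << RSVD_LEN;

        /* Decode, as the fixed get_mmio_spte_gfn() does: take the low bits
         * in place and unshift the high bits out of the reserved area. */
        uint64_t out = (spte & lower_gfn_mask) | ((spte >> RSVD_LEN) & rsvd_mask);

        assert(out == gpa);
        printf("gpa %#llx round-trips to %#llx\n",
               (unsigned long long)gpa, (unsigned long long)out);
        return 0;
}
```

The bug being fixed is visible in the decode step: without a mask that stops below the repurposed bits, the all-ones rsvd_mask bits set at encode time leak into the reconstructed GPA and the comparison against the real MMIO GPA fails.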
Hi Sean,

On Tue, Sep 25, 2018 at 01:20:00PM -0700, Sean Christopherson wrote:
> One defense against L1TF in KVM is to always set the upper five bits
> of the *legal* physical address in the SPTEs for non-present and
> reserved SPTEs, e.g. MMIO SPTEs.
> [...]
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

For this one as well:

Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
On 25/09/2018 22:20, Sean Christopherson wrote:
> One defense against L1TF in KVM is to always set the upper five bits
> of the *legal* physical address in the SPTEs for non-present and
> reserved SPTEs, e.g. MMIO SPTEs.
> [...]
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Queued, thanks.

Paolo
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a282321329b5..a04084cb4b7b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -249,6 +249,17 @@ static u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  */
 static const u64 shadow_nonpresent_or_rsvd_mask_len = 5;
 
+/*
+ * In some cases, we need to preserve the GFN of a non-present or reserved
+ * SPTE when we usurp the upper five bits of the physical address space to
+ * defend against L1TF, e.g. for MMIO SPTEs. To preserve the GFN, we'll
+ * shift bits of the GFN that overlap with shadow_nonpresent_or_rsvd_mask
+ * left into the reserved bits, i.e. the GFN in the SPTE will be split into
+ * high and low parts. This mask covers the lower bits of the GFN.
+ */
+static u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
+
+
 static void mmu_spte_set(u64 *sptep, u64 spte);
 static union kvm_mmu_page_role
 kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
@@ -357,9 +368,7 @@ static bool is_mmio_spte(u64 spte)
 
 static gfn_t get_mmio_spte_gfn(u64 spte)
 {
-        u64 mask = generation_mmio_spte_mask(MMIO_GEN_MASK) | shadow_mmio_mask |
-                   shadow_nonpresent_or_rsvd_mask;
-        u64 gpa = spte & ~mask;
+        u64 gpa = spte & shadow_nonpresent_or_rsvd_lower_gfn_mask;
 
         gpa |= (spte >> shadow_nonpresent_or_rsvd_mask_len)
                & shadow_nonpresent_or_rsvd_mask;
@@ -423,6 +432,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
 static void kvm_mmu_reset_all_pte_masks(void)
 {
+        u8 low_phys_bits;
+
         shadow_user_mask = 0;
         shadow_accessed_mask = 0;
         shadow_dirty_mask = 0;
@@ -437,12 +448,17 @@ static void kvm_mmu_reset_all_pte_masks(void)
          * appropriate mask to guard against L1TF attacks. Otherwise, it is
          * assumed that the CPU is not vulnerable to L1TF.
          */
+        low_phys_bits = boot_cpu_data.x86_phys_bits;
         if (boot_cpu_data.x86_phys_bits <
-            52 - shadow_nonpresent_or_rsvd_mask_len)
+            52 - shadow_nonpresent_or_rsvd_mask_len) {
                 shadow_nonpresent_or_rsvd_mask =
                         rsvd_bits(boot_cpu_data.x86_phys_bits -
                                   shadow_nonpresent_or_rsvd_mask_len,
                                   boot_cpu_data.x86_phys_bits - 1);
+                low_phys_bits -= shadow_nonpresent_or_rsvd_mask_len;
+        }
+        shadow_nonpresent_or_rsvd_lower_gfn_mask =
+                GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT);
 }
 
 static int is_cpuid_PSE36(void)
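For reference, a small hypothetical userspace sketch of the mask setup in the last hunk above: GENMASK_ULL() is reimplemented locally and the x86_phys_bits values (36, 46, 48) are assumed examples, not values taken from the thread.

```c
/*
 * Sketch mirroring the mask computation in kvm_mmu_reset_all_pte_masks();
 * illustration only, not kernel code.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define RSVD_LEN   5    /* shadow_nonpresent_or_rsvd_mask_len */

/* Mask with bits [lo, hi] set, like the kernel's GENMASK_ULL(hi, lo). */
static uint64_t genmask_ull(unsigned int hi, unsigned int lo)
{
        return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

static void compute_masks(unsigned int phys_bits)
{
        uint64_t rsvd_mask = 0;
        unsigned int low_phys_bits = phys_bits;

        /* The mitigation only applies if there are reserved bits to borrow. */
        if (phys_bits < 52 - RSVD_LEN) {
                rsvd_mask = genmask_ull(phys_bits - 1, phys_bits - RSVD_LEN);
                low_phys_bits -= RSVD_LEN;
        }

        printf("phys_bits=%2u  rsvd_mask=%#014llx  lower_gfn_mask=%#014llx\n",
               phys_bits,
               (unsigned long long)rsvd_mask,
               (unsigned long long)genmask_ull(low_phys_bits - 1, PAGE_SHIFT));
}

int main(void)
{
        compute_masks(36);      /* small physical address space */
        compute_masks(46);      /* typical large-server width */
        compute_masks(48);      /* >= 47: nothing to borrow */
        return 0;
}
```

For a part with 46 physical bits the lower-GFN mask stops at bit 40, directly below the five repurposed bits, while a part with 47 or more physical bits has no reserved bits to borrow, so the mask simply covers the whole legal GFN.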