Message ID | 20180925144820.32739-1-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: fix L1TF's MMIO GFN calculation | expand |
On Tue, Sep 25, 2018 at 7:48 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > One defense against L1TF in KVM is to always set the upper five bits > of the *legal* physical address in the SPTEs for non-present and > reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the > MMIO SPTE may overlap with the upper five bits that are being usurped > to defend against L1TF. To preserve the GFN, the bits of the GFN that > overlap with the repurposed bits are shifted left into the reserved > bits, i.e. the GFN in the SPTE will be split into high and low parts. > When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO > access, get_mmio_spte_gfn() unshifts the affected bits and restores > the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() > neglects to mask off the reserved bits in the SPTE that were used to > store the upper chunk of the GFN. As a result, KVM fails to detect > MMIO accesses whose GPA overlaps the repurprosed bits, which in turn > causes guest panics and hangs. > > Fix the bug by generating a mask that covers the lower chunk of the > GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The > alternative approach would be to explicitly zero the five reserved > bits that are used to store the upper chunk of the GFN, but that > requires additional run-time computation and makes an already-ugly > bit of code even more inscrutable. > > Reported-by: Sakari Ailus <sakari.ailus@iki.fi> > Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") > Cc: Junaid Shahid <junaids@google.com> > Cc: Jim Mattson <jmattson@google.com> > Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com>
On 09/25/2018 07:48 AM, Sean Christopherson wrote: > One defense against L1TF in KVM is to always set the upper five bits > of the *legal* physical address in the SPTEs for non-present and > reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the > MMIO SPTE may overlap with the upper five bits that are being usurped > to defend against L1TF. To preserve the GFN, the bits of the GFN that > overlap with the repurposed bits are shifted left into the reserved > bits, i.e. the GFN in the SPTE will be split into high and low parts. > When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO > access, get_mmio_spte_gfn() unshifts the affected bits and restores > the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() > neglects to mask off the reserved bits in the SPTE that were used to > store the upper chunk of the GFN. As a result, KVM fails to detect > MMIO accesses whose GPA overlaps the repurprosed bits, which in turn > causes guest panics and hangs. > > Fix the bug by generating a mask that covers the lower chunk of the > GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The > alternative approach would be to explicitly zero the five reserved > bits that are used to store the upper chunk of the GFN, but that > requires additional run-time computation and makes an already-ugly > bit of code even more inscrutable. Thanks for debugging and fixing this. > > Reported-by: Sakari Ailus <sakari.ailus@iki.fi> > Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") > Cc: Junaid Shahid <junaids@google.com> > Cc: Jim Mattson <jmattson@google.com> > Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > --- > arch/x86/kvm/mmu.c | 23 +++++++++++++++++++---- > 1 file changed, 19 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index a282321329b5..be5cb753fb42 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -249,6 +249,17 @@ static u64 __read_mostly shadow_nonpresent_or_rsvd_mask; > */ > static const u64 shadow_nonpresent_or_rsvd_mask_len = 5; > > +/* > + * In some cases, we need to preserve the GFN of a non-present or reserved > + * SPTE when we usurp the upper five bits of the physical address space to > + * defend against L1TF, e.g. for MMIO SPTEs. To preserve the GFN, we'll > + * shift bits of the GFN that overlap with shadow_nonpresent_or_rsvd_mask > + * left into the reserved bits, i.e. the GFN in the SPTE will be split into > + * high and low parts. This mask covers the lower bits of the GFN. > + */ > +static u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gpa_mask; > + > + > static void mmu_spte_set(u64 *sptep, u64 spte); > static union kvm_mmu_page_role > kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu); > @@ -357,9 +368,7 @@ static bool is_mmio_spte(u64 spte) > > static gfn_t get_mmio_spte_gfn(u64 spte) > { > - u64 mask = generation_mmio_spte_mask(MMIO_GEN_MASK) | shadow_mmio_mask | > - shadow_nonpresent_or_rsvd_mask; > - u64 gpa = spte & ~mask; > + u64 gpa = spte & shadow_nonpresent_or_rsvd_lower_gpa_mask; > > gpa |= (spte >> shadow_nonpresent_or_rsvd_mask_len) > & shadow_nonpresent_or_rsvd_mask; > @@ -423,6 +432,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); > > static void kvm_mmu_reset_all_pte_masks(void) > { > + u8 low_gpa_bits; > + > shadow_user_mask = 0; > shadow_accessed_mask = 0; > shadow_dirty_mask = 0; > @@ -437,12 +448,16 @@ static void kvm_mmu_reset_all_pte_masks(void) > * appropriate mask to guard against L1TF attacks. Otherwise, it is > * assumed that the CPU is not vulnerable to L1TF. > */ > + low_gpa_bits = boot_cpu_data.x86_phys_bits; > if (boot_cpu_data.x86_phys_bits < > - 52 - shadow_nonpresent_or_rsvd_mask_len) > + 52 - shadow_nonpresent_or_rsvd_mask_len) { > shadow_nonpresent_or_rsvd_mask = > rsvd_bits(boot_cpu_data.x86_phys_bits - > shadow_nonpresent_or_rsvd_mask_len, > boot_cpu_data.x86_phys_bits - 1); > + low_gpa_bits -= shadow_nonpresent_or_rsvd_mask_len; > + } > + shadow_nonpresent_or_rsvd_lower_gpa_mask = (1ULL << low_gpa_bits) - 1; I think that it might be slightly better to do something like: + shadow_nonpresent_or_rsvd_lower_gpa_mask = rsvd_bits(PAGE_SHIFT, low_gpa_bits - 1); Of course, it doesn't matter for get_mmio_spte_gfn() because that already shifts by PAGE_SHIFT, but could matter if this were to get used somewhere else. > } > > static int is_cpuid_PSE36(void) > Reviewed-by: Junaid Shahid <junaids@google.com>
On Tue, 2018-09-25 at 12:33 -0700, Junaid Shahid wrote: > On 09/25/2018 07:48 AM, Sean Christopherson wrote: ... > > @@ -423,6 +432,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); > > > > static void kvm_mmu_reset_all_pte_masks(void) > > { > > + u8 low_gpa_bits; > > + > > shadow_user_mask = 0; > > shadow_accessed_mask = 0; > > shadow_dirty_mask = 0; > > @@ -437,12 +448,16 @@ static void kvm_mmu_reset_all_pte_masks(void) > > * appropriate mask to guard against L1TF attacks. Otherwise, it is > > * assumed that the CPU is not vulnerable to L1TF. > > */ > > + low_gpa_bits = boot_cpu_data.x86_phys_bits; > > if (boot_cpu_data.x86_phys_bits < > > - 52 - shadow_nonpresent_or_rsvd_mask_len) > > + 52 - shadow_nonpresent_or_rsvd_mask_len) { > > shadow_nonpresent_or_rsvd_mask = > > rsvd_bits(boot_cpu_data.x86_phys_bits - > > shadow_nonpresent_or_rsvd_mask_len, > > boot_cpu_data.x86_phys_bits - 1); > > + low_gpa_bits -= shadow_nonpresent_or_rsvd_mask_len; > > + } > > + shadow_nonpresent_or_rsvd_lower_gpa_mask = (1ULL << low_gpa_bits) - 1; > I think that it might be slightly better to do something like: > > + shadow_nonpresent_or_rsvd_lower_gpa_mask = rsvd_bits(PAGE_SHIFT, low_gpa_bits - 1); > > Of course, it doesn't matter for get_mmio_spte_gfn() because that already shifts by PAGE_SHIFT, but could matter if this were to get used somewhere else. Good point, we're providing a mask for the GFN, not the GPA. I'll send a v2.
Hi Sean, On Tue, Sep 25, 2018 at 12:50:40PM -0700, Sean Christopherson wrote: > On Tue, 2018-09-25 at 12:33 -0700, Junaid Shahid wrote: > > On 09/25/2018 07:48 AM, Sean Christopherson wrote: > > ... > > > > @@ -423,6 +432,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); > > > > > > static void kvm_mmu_reset_all_pte_masks(void) > > > { > > > + u8 low_gpa_bits; > > > + > > > shadow_user_mask = 0; > > > shadow_accessed_mask = 0; > > > shadow_dirty_mask = 0; > > > @@ -437,12 +448,16 @@ static void kvm_mmu_reset_all_pte_masks(void) > > > * appropriate mask to guard against L1TF attacks. Otherwise, it is > > > * assumed that the CPU is not vulnerable to L1TF. > > > */ > > > + low_gpa_bits = boot_cpu_data.x86_phys_bits; > > > if (boot_cpu_data.x86_phys_bits < > > > - 52 - shadow_nonpresent_or_rsvd_mask_len) > > > + 52 - shadow_nonpresent_or_rsvd_mask_len) { > > > shadow_nonpresent_or_rsvd_mask = > > > rsvd_bits(boot_cpu_data.x86_phys_bits - > > > shadow_nonpresent_or_rsvd_mask_len, > > > boot_cpu_data.x86_phys_bits - 1); > > > + low_gpa_bits -= shadow_nonpresent_or_rsvd_mask_len; > > > + } > > > + shadow_nonpresent_or_rsvd_lower_gpa_mask = (1ULL << low_gpa_bits) - 1; > > I think that it might be slightly better to do something like: > > > > + shadow_nonpresent_or_rsvd_lower_gpa_mask = rsvd_bits(PAGE_SHIFT, low_gpa_bits - 1); > > > > Of course, it doesn't matter for get_mmio_spte_gfn() because that > > already shifts by PAGE_SHIFT, but could matter if this were to get used > > somewhere else. > > Good point, we're providing a mask for the GFN, not the GPA. I'll > send a v2. With (as well as without) the change suggested by Junaid, Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a282321329b5..be5cb753fb42 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -249,6 +249,17 @@ static u64 __read_mostly shadow_nonpresent_or_rsvd_mask; */ static const u64 shadow_nonpresent_or_rsvd_mask_len = 5; +/* + * In some cases, we need to preserve the GFN of a non-present or reserved + * SPTE when we usurp the upper five bits of the physical address space to + * defend against L1TF, e.g. for MMIO SPTEs. To preserve the GFN, we'll + * shift bits of the GFN that overlap with shadow_nonpresent_or_rsvd_mask + * left into the reserved bits, i.e. the GFN in the SPTE will be split into + * high and low parts. This mask covers the lower bits of the GFN. + */ +static u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gpa_mask; + + static void mmu_spte_set(u64 *sptep, u64 spte); static union kvm_mmu_page_role kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu); @@ -357,9 +368,7 @@ static bool is_mmio_spte(u64 spte) static gfn_t get_mmio_spte_gfn(u64 spte) { - u64 mask = generation_mmio_spte_mask(MMIO_GEN_MASK) | shadow_mmio_mask | - shadow_nonpresent_or_rsvd_mask; - u64 gpa = spte & ~mask; + u64 gpa = spte & shadow_nonpresent_or_rsvd_lower_gpa_mask; gpa |= (spte >> shadow_nonpresent_or_rsvd_mask_len) & shadow_nonpresent_or_rsvd_mask; @@ -423,6 +432,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); static void kvm_mmu_reset_all_pte_masks(void) { + u8 low_gpa_bits; + shadow_user_mask = 0; shadow_accessed_mask = 0; shadow_dirty_mask = 0; @@ -437,12 +448,16 @@ static void kvm_mmu_reset_all_pte_masks(void) * appropriate mask to guard against L1TF attacks. Otherwise, it is * assumed that the CPU is not vulnerable to L1TF. */ + low_gpa_bits = boot_cpu_data.x86_phys_bits; if (boot_cpu_data.x86_phys_bits < - 52 - shadow_nonpresent_or_rsvd_mask_len) + 52 - shadow_nonpresent_or_rsvd_mask_len) { shadow_nonpresent_or_rsvd_mask = rsvd_bits(boot_cpu_data.x86_phys_bits - shadow_nonpresent_or_rsvd_mask_len, boot_cpu_data.x86_phys_bits - 1); + low_gpa_bits -= shadow_nonpresent_or_rsvd_mask_len; + } + shadow_nonpresent_or_rsvd_lower_gpa_mask = (1ULL << low_gpa_bits) - 1; } static int is_cpuid_PSE36(void)
One defense against L1TF in KVM is to always set the upper five bits of the *legal* physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurprosed bits, which in turn causes guest panics and hangs. Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable. Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/kvm/mmu.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)