Message ID | 1432983566-15773-15-git-send-email-guangrong.xiao@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 30/05/2015 12:59, Xiao Guangrong wrote: > Currently guest MTRR is completely prohibited if cache snoop is supported on > IOMMU (!noncoherent_dma) and host does the emulation based on the knowledge > from host side, however, host side is not the good point to know > what the purpose of guest is. A good example is that pass-throughed VGA > frame buffer is not always UC as host expected Can you explain how? The original idea was that such a framebuffer would be kvm_is_reserved_pfn and thus be unconditionally UC. > +bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, > + int page_num) > +{ > + struct mtrr_looker looker; > + struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state; > + u64 start = gfn_to_gpa(gfn), end = gfn_to_gpa(gfn + page_num); > + int type = -1; > + > + mtrr_for_each_mem_type(&looker, mtrr_state, start, end) { > + if (type == -1) { > + type = looker.mem_type; > + continue; > + } > + > + if (type != looker.mem_type) > + return false; > + } > + > + if ((type != -1) && looker.partial_map && > + (mtrr_state->def_type != type)) > + return false; > + No Pascal-like parentheses. Does this have a performance impact on shadow? Perhaps we could cache in struct kvm_arch_memory_slot whether the memslot is covered by MTRRs? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/06/2015 11:36, Paolo Bonzini wrote: > Does this have a performance impact on shadow? Perhaps we could cache > in struct kvm_arch_memory_slot whether the memslot is covered by MTRRs? Nevermind, patch 15 answers my question. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/01/2015 05:36 PM, Paolo Bonzini wrote: > > > On 30/05/2015 12:59, Xiao Guangrong wrote: >> Currently guest MTRR is completely prohibited if cache snoop is supported on >> IOMMU (!noncoherent_dma) and host does the emulation based on the knowledge >> from host side, however, host side is not the good point to know >> what the purpose of guest is. A good example is that pass-throughed VGA >> frame buffer is not always UC as host expected > > Can you explain how? The original idea was that such a framebuffer > would be kvm_is_reserved_pfn and thus be unconditionally UC. Yes, frame-buffer is always UC in current code, however, UC for frame-buffer causes bad performance. It's quoted from Documentation/x86/mtrr.txt: | This is most useful when you have a video (VGA) card on a PCI or AGP bus. | Enabling write-combining allows bus write transfers to be combined into a | larger transfer before bursting over the PCI/AGP bus. This can increase | performance of image write operations 2.5 times or more. So that guest will configure the range to MTRR, this patchset follows guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to emulate guest cache type as guest expects. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/06/2015 04:56, Xiao Guangrong wrote: > > > On 06/01/2015 05:36 PM, Paolo Bonzini wrote: >> >> >> On 30/05/2015 12:59, Xiao Guangrong wrote: >>> Currently guest MTRR is completely prohibited if cache snoop is >>> supported on >>> IOMMU (!noncoherent_dma) and host does the emulation based on the >>> knowledge >>> from host side, however, host side is not the good point to know >>> what the purpose of guest is. A good example is that pass-throughed VGA >>> frame buffer is not always UC as host expected >> >> Can you explain how? The original idea was that such a framebuffer >> would be kvm_is_reserved_pfn and thus be unconditionally UC. > > Yes, frame-buffer is always UC in current code, however, UC for > frame-buffer causes bad performance. Understood now, thanks. > So that guest will configure the range to MTRR, this patchset follows > guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to > emulate guest cache type as guest expects. Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve coherency. AMD, has special logic to do this, for example: - if guest PAT says "UC" and host MTRR says "WB", the processor will not cache the memory but will snoop the cache as if CR0.CD=1 - if guest PAT says "WC" and host (nested page table) PAT says "WB" and host MTRR says "WB", the processor will still do write combining but also snoop the cache as if CR0.CD=1 I am worried that the lack of this feature could cause problems if guests map QEMU's VGA framebuffer as uncached. We have this problem on ARM, so it's not 100% theoretical. So, why do you need to always use IPAT=0? Can patch 15 keep the current logic for RAM, like this: if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm)) ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT; else ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; ? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/03/2015 03:55 PM, Paolo Bonzini wrote: > > > On 03/06/2015 04:56, Xiao Guangrong wrote: >> >> >> On 06/01/2015 05:36 PM, Paolo Bonzini wrote: >>> >>> >>> On 30/05/2015 12:59, Xiao Guangrong wrote: >>>> Currently guest MTRR is completely prohibited if cache snoop is >>>> supported on >>>> IOMMU (!noncoherent_dma) and host does the emulation based on the >>>> knowledge >>>> from host side, however, host side is not the good point to know >>>> what the purpose of guest is. A good example is that pass-throughed VGA >>>> frame buffer is not always UC as host expected >>> >>> Can you explain how? The original idea was that such a framebuffer >>> would be kvm_is_reserved_pfn and thus be unconditionally UC. >> >> Yes, frame-buffer is always UC in current code, however, UC for >> frame-buffer causes bad performance. > > Understood now, thanks. > >> So that guest will configure the range to MTRR, this patchset follows >> guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to >> emulate guest cache type as guest expects. > > Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve > coherency. AMD, has special logic to do this, for example: > > - if guest PAT says "UC" and host MTRR says "WB", the processor will not > cache the memory but will snoop the cache as if CR0.CD=1 > > - if guest PAT says "WC" and host (nested page table) PAT says "WB" and > host MTRR says "WB", the processor will still do write combining but > also snoop the cache as if CR0.CD=1 > > I am worried that the lack of this feature could cause problems if > guests map QEMU's VGA framebuffer as uncached. We have this problem on > ARM, so it's not 100% theoretical. CR0.CD is always 0 in both host and guest, i guess it's why we cleared CR0.CD and CR0.NW in vmx_set_cr0(). > > So, why do you need to always use IPAT=0? Can patch 15 keep the current > logic for RAM, like this: > > if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm)) > ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) << > VMX_EPT_MT_EPTE_SHIFT; > else > ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) > | VMX_EPT_IPAT_BIT; Yeah, it's okay, actually we considered this way, however - it's light enough, it did not hurt guest performance based on our benchmark. - the logic has always used for noncherent_dma case, extend it to normal case should have low risk and also help us to check the logic. - completely follow MTRRS spec would be better than host hides it. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/04/2015 04:23 PM, Xiao Guangrong wrote: > > > On 06/03/2015 03:55 PM, Paolo Bonzini wrote: >> >> >> On 03/06/2015 04:56, Xiao Guangrong wrote: >>> >>> >>> On 06/01/2015 05:36 PM, Paolo Bonzini wrote: >>>> >>>> >>>> On 30/05/2015 12:59, Xiao Guangrong wrote: >>>>> Currently guest MTRR is completely prohibited if cache snoop is >>>>> supported on >>>>> IOMMU (!noncoherent_dma) and host does the emulation based on the >>>>> knowledge >>>>> from host side, however, host side is not the good point to know >>>>> what the purpose of guest is. A good example is that pass-throughed VGA >>>>> frame buffer is not always UC as host expected >>>> >>>> Can you explain how? The original idea was that such a framebuffer >>>> would be kvm_is_reserved_pfn and thus be unconditionally UC. >>> >>> Yes, frame-buffer is always UC in current code, however, UC for >>> frame-buffer causes bad performance. >> >> Understood now, thanks. >> >>> So that guest will configure the range to MTRR, this patchset follows >>> guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to >>> emulate guest cache type as guest expects. >> >> Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve >> coherency. AMD, has special logic to do this, for example: >> >> - if guest PAT says "UC" and host MTRR says "WB", the processor will not >> cache the memory but will snoop the cache as if CR0.CD=1 >> >> - if guest PAT says "WC" and host (nested page table) PAT says "WB" and >> host MTRR says "WB", the processor will still do write combining but >> also snoop the cache as if CR0.CD=1 >> >> I am worried that the lack of this feature could cause problems if >> guests map QEMU's VGA framebuffer as uncached. We have this problem on >> ARM, so it's not 100% theoretical. > > CR0.CD is always 0 in both host and guest, i guess it's why we cleared > CR0.CD and CR0.NW in vmx_set_cr0(). It reminds me that we should check guest CR0.CD before check guest MTRR and disable guest PAT if guest CR0.CD = 1. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/06/2015 10:26, Xiao Guangrong wrote: >> >> CR0.CD is always 0 in both host and guest, i guess it's why we cleared >> CR0.CD and CR0.NW in vmx_set_cr0(). > > It reminds me that we should check guest CR0.CD before check guest MTRR > and disable guest PAT if guest CR0.CD = 1. I think it can be done separately, on top. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/06/2015 10:23, Xiao Guangrong wrote: >> >> So, why do you need to always use IPAT=0? Can patch 15 keep the current >> logic for RAM, like this: >> >> if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm)) >> ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) << >> VMX_EPT_MT_EPTE_SHIFT; >> else >> ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) >> | VMX_EPT_IPAT_BIT; > > Yeah, it's okay, actually we considered this way, however > - it's light enough, it did not hurt guest performance based on our > benchmark. > - the logic has always used for noncherent_dma case, extend it to > normal case should have low risk and also help us to check the logic. But noncoherent_dma is not the common case, so it's not necessarily true that the risk is low. > - completely follow MTRRS spec would be better than host hides it. We are a virtualization platform, we know well when MTRRs are necessary. Tis a risk from blindly obeying the guest MTRRs: userspace can see stale data if the guest's accesses bypass the cache. AMD bypasses this by enabling snooping even in cases that ordinarily wouldn't snoop; for Intel the solution is that RAM-backed areas should always use IPAT. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ CCed Zhang Yang ] On 06/04/2015 04:36 PM, Paolo Bonzini wrote: > > > On 04/06/2015 10:23, Xiao Guangrong wrote: >>> >>> So, why do you need to always use IPAT=0? Can patch 15 keep the current >>> logic for RAM, like this: >>> >>> if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm)) >>> ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) << >>> VMX_EPT_MT_EPTE_SHIFT; >>> else >>> ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) >>> | VMX_EPT_IPAT_BIT; >> >> Yeah, it's okay, actually we considered this way, however >> - it's light enough, it did not hurt guest performance based on our >> benchmark. >> - the logic has always used for noncherent_dma case, extend it to >> normal case should have low risk and also help us to check the logic. > > But noncoherent_dma is not the common case, so it's not necessarily true > that the risk is low. I thought noncoherent_dma exists on 1st generation(s) IOMMU, it should be fully tested at that time. > >> - completely follow MTRRS spec would be better than host hides it. > > We are a virtualization platform, we know well when MTRRs are necessary. > > Tis a risk from blindly obeying the guest MTRRs: userspace can see stale > data if the guest's accesses bypass the cache. AMD bypasses this by > enabling snooping even in cases that ordinarily wouldn't snoop; for > Intel the solution is that RAM-backed areas should always use IPAT. Not sure if UC and other cacheable type combinations on guest and host will cause problem. The SMD mentioned that snoop is not required only when "The UC attribute comes from the MTRRs and the processors are not required to snoop their caches since the data could never have been cached." (Vol 3. 11.5.2.2) VMX do not touch hardware MTRR MSRs and i guess snoop works under this case. I also noticed if SS (self-snooping) is supported we need not to invalidate cache when programming memory type (Vol 3. 11.11.8), so that means CPU works well on the page which has different cache types i guess. After think it carefully, we (Zhang Yang) doubt if always set WB for DMA memory is really a good idea because we can not assume WB DMA works well for all devices. One example is that audio DMA (not a MMIO region) is required WC to improve its performance. However, we think the SDM is not clear enough so let's do full vMTRR on MMIO and noncoherent_dma first.?:) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 7462c57..c8c2a90 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3437,6 +3437,16 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, return false; } +static bool +check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level) +{ + int page_num = KVM_PAGES_PER_HPAGE(level); + + gfn &= ~(page_num - 1); + + return kvm_mtrr_check_gfn_range_consistency(vcpu, gfn, page_num); +} + static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, bool prefault) { @@ -3462,9 +3472,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, if (r) return r; - force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn); + if (mapping_level_dirty_bitmap(vcpu, gfn) || + !check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL)) + force_pt_level = 1; + else + force_pt_level = 0; + if (likely(!force_pt_level)) { level = mapping_level(vcpu, gfn); + if (level > PT_DIRECTORY_LEVEL && + !check_hugepage_cache_consistency(vcpu, gfn, level)) + level = PT_DIRECTORY_LEVEL; gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); } else level = PT_PAGE_TABLE_LEVEL; diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index bc90834..703a66b 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -645,3 +645,28 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) return type; } EXPORT_SYMBOL_GPL(kvm_mtrr_get_guest_memory_type); + +bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, + int page_num) +{ + struct mtrr_looker looker; + struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state; + u64 start = gfn_to_gpa(gfn), end = gfn_to_gpa(gfn + page_num); + int type = -1; + + mtrr_for_each_mem_type(&looker, mtrr_state, start, end) { + if (type == -1) { + type = looker.mem_type; + continue; + } + + if (type != looker.mem_type) + return false; + } + + if ((type != -1) && looker.partial_map && + (mtrr_state->def_type != type)) + return false; + + return true; +} diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index a3dae49..7c30ec8 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -166,6 +166,8 @@ void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu); bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data); int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data); int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); +bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, + int page_num); #define KVM_SUPPORTED_XCR0 (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \ | XSTATE_BNDREGS | XSTATE_BNDCSR \
Based on Intel's SDM, mapping huge page which do not have consistent memory cache for each 4k page will cause undefined behavior In order to avoiding this kind of undefined behavior, we force to use 4k pages under this case Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> --- arch/x86/kvm/mmu.c | 20 +++++++++++++++++++- arch/x86/kvm/mtrr.c | 25 +++++++++++++++++++++++++ arch/x86/kvm/x86.h | 2 ++ 3 files changed, 46 insertions(+), 1 deletion(-)