Message ID | 20250224070716.31360-1-yan.y.zhao@intel.com (mailing list archive) |
---|---|
Series | KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT |
On 2/24/25 08:07, Yan Zhao wrote:
> This series introduces a quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT as
> suggested by Paolo and Sean [1].
>
> The purpose of introducing this quirk is to allow KVM to honor guest PAT on
> Intel platforms with self-snoop feature. This support was previously
> reverted by commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT
> on CPUs that support self-snoop"") due to a reported broken of an old bochs
> driver which incorrectly set memory type to UC but did not expect that UC
> would be very slow on certain Intel platforms.

Hi Yan,

the main issue with this series is that the quirk is not disabled only for
TDX VMs, but for *all* VMs if TDX is available.

There are two concepts here:

- which quirks can be disabled

- which quirks are active

I agree with making the first vendor-dependent, but for a different reason:
the new KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT must be hidden if self-snoop is
not present.

As to the second, we already have an example of a quirk that is also active,
though we don't represent that in kvm->arch.disabled_quirks: that's
KVM_X86_QUIRK_CD_NW_CLEARED which is for AMD only and is effectively always
disabled on Intel platforms. For those cases, we need to expose the quirk
anyway in KVM_CAP_DISABLE_QUIRKS2, so that userspace knows that KVM is
*aware* of a particular issue. In other words, even if disabling it has no
effect, userspace may want to know that it can rely on the problematic
behavior not being present.

I'm testing an alternative series and will post it shortly.

Paolo

> Sean previously suggested to bottom out if the UC slowness issue is working
> as intended so that we can enable the quirk only when the VMs are affected
> by the old unmodifiable guests [2]. After consulting with CPU architects,
> it's told that this behavior is expected on ICX/SPR Xeon platforms due to
> the snooping implementation.
>
> So, implement the quirk such that KVM enables it by default on all Intel
> non-TDX platforms while having the quirk explicitly reference the old
> unmodifiable guests that rely on KVM to force memory type to WB. Newer
> userspace can disable the quirk by default and only leave it enabled if an
> old unmodifiable guest is an concern.
>
> The quirk is platform-specific valid, available only on Intel non-TDX
> platforms. It is absent on Intel TDX and AMD platforms, where KVM always
> honors guest PAT.
>
> Patch 1 does the preparation of making quirks platform-specific valid.
> Patch 2 makes the quirk to be present on Intel and absent on AMD.
> Patch 3 makes the quirk to be absent on Intel TDX and self-snoop a hard
>         dependency to enable TDX [3].
>         As a new platform, TDX is always running on CPUs with self-snoop
>         feature. It has no worry to break old yet unmodifiable guests.
>         Simply have KVM always honor guest PAT on TDX enabled platforms.
>         Attaching/detaching non-coherent DMA devices would not lead to
>         mirrored EPTs being zapped for TDs then. A previous attempt for
>         this purpose is at [4].
>
>
> This series is based on kvm-coco-queue. It was supposed to be included in
> TDX's "the rest" section. We post it separately to start review earlier.
>
> Patches 1 and 2 are changes to the generic code, which can also be applied
> to kvm/queue. A proposal is to have them go into kvm/queue and we rebase on
> that.
>
> Patch 3 can be included in TDX's "the rest" section in the end.
>
> Thanks
> Yan
>
> [1] https://lore.kernel.org/kvm/CABgObfa=t1dGR5cEhbUqVWTD03vZR4QrzEUgHxq+3JJ7YsA9pA@mail.gmail.com
> [2] https://lore.kernel.org/kvm/Zt8cgUASZCN6gP8H@google.com
> [3] https://lore.kernel.org/kvm/ZuBSNS33_ck-w6-9@google.com
> [4] https://lore.kernel.org/kvm/20241115084600.12174-1-yan.y.zhao@intel.com
>
>
> Yan Zhao (3):
>   KVM: x86: Introduce supported_quirks for platform-specific valid
>     quirks
>   KVM: x86: Introduce Intel specific quirk
>     KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
>   KVM: TDX: Always honor guest PAT on TDX enabled platforms
>
>  Documentation/virt/kvm/api.rst  | 30 +++++++++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/mmu.h              |  2 +-
>  arch/x86/kvm/mmu/mmu.c          | 14 +++++++-----
>  arch/x86/kvm/vmx/main.c         |  1 +
>  arch/x86/kvm/vmx/tdx.c          |  5 +++++
>  arch/x86/kvm/vmx/vmx.c          | 39 +++++++++++++++++++++++++++------
>  arch/x86/kvm/x86.c              |  7 +++---
>  arch/x86/kvm/x86.h              | 12 +++++-----
>  10 files changed, 91 insertions(+), 22 deletions(-)
>
On Sat, Mar 01, 2025 at 07:49:13AM +0100, Paolo Bonzini wrote:
> On 2/24/25 08:07, Yan Zhao wrote:
> > This series introduces a quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT as
> > suggested by Paolo and Sean [1].
> >
> > The purpose of introducing this quirk is to allow KVM to honor guest PAT on
> > Intel platforms with self-snoop feature. This support was previously
> > reverted by commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT
> > on CPUs that support self-snoop"") due to a reported broken of an old bochs
> > driver which incorrectly set memory type to UC but did not expect that UC
> > would be very slow on certain Intel platforms.
>
> Hi Yan,
Hi Paolo,

> the main issue with this series is that the quirk is not disabled only for
> TDX VMs, but for *all* VMs if TDX is available.
Yes, once TDX is enabled, the quirk is disabled for all VMs.
My thought is that on TDX as a new platform, users have the option to update
guest software to address bugs caused by incorrect guest PAT settings.

If you think it's a must to support old unmodifiable non-TDX VMs on TDX
platforms, then it's indeed an issue of this series.

>
> There are two concepts here:
>
> - which quirks can be disabled
>
> - which quirks are active
>
> I agree with making the first vendor-dependent, but for a different reason:
> the new KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT must be hidden if self-snoop is
> not present.
I think it's a good idea to make KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT out of
KVM_CAP_DISABLE_QUIRKS2, so that the quirk is always enabled when self-snoop is
not present as userspace has no way to disable this quirk.

However, this seems to contradict your point below, especially since it is even
present on AMD platforms.

"we need to expose the quirk anyway in KVM_CAP_DISABLE_QUIRKS2, so that
userspace knows that KVM is *aware* of a particular issue", "even if disabling
it has no effect, userspace may want to know that it can rely on the problematic
behavior not being present".

So, could we also expose KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT in
KVM_CAP_DISABLE_QUIRKS2 on Intel platforms without self-snoop, but ensure that
disabling the quirk has no effect?

> As to the second, we already have an example of a quirk that is also active,
> though we don't represent that in kvm->arch.disabled_quirks: that's
> KVM_X86_QUIRK_CD_NW_CLEARED which is for AMD only and is effectively always
> disabled on Intel platforms. For those cases, we need to expose the quirk
I also have a concern about this one. Please find my comments in v2.

> anyway in KVM_CAP_DISABLE_QUIRKS2, so that userspace knows that KVM is
> *aware* of a particular issue. In other words, even if disabling it has no
> effect, userspace may want to know that it can rely on the problematic
> behavior not being present.
>
> I'm testing an alternative series and will post it shortly.
Thanks a lot for helping with refining the patches!

>
> > Sean previously suggested to bottom out if the UC slowness issue is working
> > as intended so that we can enable the quirk only when the VMs are affected
> > by the old unmodifiable guests [2]. After consulting with CPU architects,
> > it's told that this behavior is expected on ICX/SPR Xeon platforms due to
> > the snooping implementation.
> >
> > So, implement the quirk such that KVM enables it by default on all Intel
> > non-TDX platforms while having the quirk explicitly reference the old
> > unmodifiable guests that rely on KVM to force memory type to WB. Newer
> > userspace can disable the quirk by default and only leave it enabled if an
> > old unmodifiable guest is an concern.
> >
> > The quirk is platform-specific valid, available only on Intel non-TDX
> > platforms. It is absent on Intel TDX and AMD platforms, where KVM always
> > honors guest PAT.
> >
> > Patch 1 does the preparation of making quirks platform-specific valid.
> > Patch 2 makes the quirk to be present on Intel and absent on AMD.
> > Patch 3 makes the quirk to be absent on Intel TDX and self-snoop a hard
> >         dependency to enable TDX [3].
> >         As a new platform, TDX is always running on CPUs with self-snoop
> >         feature. It has no worry to break old yet unmodifiable guests.
> >         Simply have KVM always honor guest PAT on TDX enabled platforms.
> >         Attaching/detaching non-coherent DMA devices would not lead to
> >         mirrored EPTs being zapped for TDs then. A previous attempt for
> >         this purpose is at [4].
> >
> >
> > This series is based on kvm-coco-queue. It was supposed to be included in
> > TDX's "the rest" section. We post it separately to start review earlier.
> >
> > Patches 1 and 2 are changes to the generic code, which can also be applied
> > to kvm/queue. A proposal is to have them go into kvm/queue and we rebase on
> > that.
> >
> > Patch 3 can be included in TDX's "the rest" section in the end.
> >
> > Thanks
> > Yan
> >
> > [1] https://lore.kernel.org/kvm/CABgObfa=t1dGR5cEhbUqVWTD03vZR4QrzEUgHxq+3JJ7YsA9pA@mail.gmail.com
> > [2] https://lore.kernel.org/kvm/Zt8cgUASZCN6gP8H@google.com
> > [3] https://lore.kernel.org/kvm/ZuBSNS33_ck-w6-9@google.com
> > [4] https://lore.kernel.org/kvm/20241115084600.12174-1-yan.y.zhao@intel.com
> >
> >
> > Yan Zhao (3):
> >   KVM: x86: Introduce supported_quirks for platform-specific valid
> >     quirks
> >   KVM: x86: Introduce Intel specific quirk
> >     KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
> >   KVM: TDX: Always honor guest PAT on TDX enabled platforms
> >
> >  Documentation/virt/kvm/api.rst  | 30 +++++++++++++++++++++++++
> >  arch/x86/include/asm/kvm_host.h |  2 +-
> >  arch/x86/include/uapi/asm/kvm.h |  1 +
> >  arch/x86/kvm/mmu.h              |  2 +-
> >  arch/x86/kvm/mmu/mmu.c          | 14 +++++++-----
> >  arch/x86/kvm/vmx/main.c         |  1 +
> >  arch/x86/kvm/vmx/tdx.c          |  5 +++++
> >  arch/x86/kvm/vmx/vmx.c          | 39 +++++++++++++++++++++++++++------
> >  arch/x86/kvm/x86.c              |  7 +++---
> >  arch/x86/kvm/x86.h              | 12 +++++-----
> >  10 files changed, 91 insertions(+), 22 deletions(-)
> >
>
On 3/3/25 02:11, Yan Zhao wrote:
>> the main issue with this series is that the quirk is not disabled only for
>> TDX VMs, but for *all* VMs if TDX is available.
> Yes, once TDX is enabled, the quirk is disabled for all VMs.
> My thought is that on TDX as a new platform, users have the option to update
> guest software to address bugs caused by incorrect guest PAT settings.
>
> If you think it's a must to support old unmodifiable non-TDX VMs on TDX
> platforms, then it's indeed an issue of this series.

Yeah, unfortunately I think we need to keep the quirk for old VMs. But I
think the code changes needed to do so are small and good to have anyway.

>> There are two concepts here:
>>
>> - which quirks can be disabled
>>
>> - which quirks are active
>>
>> I agree with making the first vendor-dependent, but for a different reason:
>> the new KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT must be hidden if self-snoop is
>> not present.
>
> I think it's a good idea to make KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT out of
> KVM_CAP_DISABLE_QUIRKS2, so that the quirk is always enabled when self-snoop is
> not present as userspace has no way to disable this quirk.
>
> However, this seems to contradict your point below, especially since it is even
> present on AMD platforms.
>
> "we need to expose the quirk anyway in KVM_CAP_DISABLE_QUIRKS2, so that
> userspace knows that KVM is *aware* of a particular issue", "even if disabling
> it has no effect, userspace may want to know that it can rely on the problematic
> behavior not being present".

There are four cases:

* quirk cannot be disabled: example, "ignore guest PAT" on non-self-snoop
machines: the quirk must not be in KVM_CAP_DISABLE_QUIRKS2

* quirk can be disabled: the quirk must be in KVM_CAP_DISABLE_QUIRKS2

* quirk is always disabled: right now we're always exposing those in
KVM_CAP_DISABLE_QUIRKS2, so we should keep that behavior. If desired we
could add a capability like KVM_CAP_DISABLED_QUIRKS

* for some VMs, quirk is always disabled: this is the case also for the
zap_all quirk that you have previously introduced. Right now there's no way
to query it, but KVM_CAP_DISABLED_QUIRKS would also cover this. If
KVM_CAP_DISABLED_QUIRKS was introduced, zap_all could be added too.

> So, could we also expose KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT in
> KVM_CAP_DISABLE_QUIRKS2 on Intel platforms without self-snoop, but ensure that
> disabling the quirk has no effect?

To keep the API clear, disabling the quirk should *always* have the effect
of going to the non-quirky behavior. Which may be no effect at all if the
non-quirky behavior is the only one---but the important thing is that you
don't want the quirky/buggy/non-architectural behavior after a successful
KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).

There is a pre-existing bug in that I think
KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2) should be cumulative, i.e. should
not allow re-enabling a previously-disabled quirk. I think we can change
that without worrying about breaking userspace there, as the current
behavior is the most surprising.

>> As to the second, we already have an example of a quirk that is also active,
>> though we don't represent that in kvm->arch.disabled_quirks: that's
>> KVM_X86_QUIRK_CD_NW_CLEARED which is for AMD only and is effectively always
>> disabled on Intel platforms. For those cases, we need to expose the quirk
> I also have a concern about this one. Please find my comments in v2.

Ok, I'll reply there too.

>> anyway in KVM_CAP_DISABLE_QUIRKS2, so that userspace knows that KVM is
>> *aware* of a particular issue. In other words, even if disabling it has no
>> effect, userspace may want to know that it can rely on the problematic
>> behavior not being present.
>>
>> I'm testing an alternative series and will post it shortly.
>
> Thanks a lot for helping with refining the patches!

Thanks to you and sorry that the patches weren't of the best quality - I
mostly wanted to start the discussion on the userspace API side before the
beginning of the week in your time zone.

Paolo
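
Editor's illustration (not part of the thread): the quirk cases and the cumulative-disable rule described in the message above can be modeled with plain bitmasks. This is only a sketch under stated assumptions. The bit positions, `struct quirk_state`, `disable_quirks()` and `quirk_active()` are all hypothetical names invented for the example; they are not KVM's data structures or its uAPI values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define QUIRK_CD_NW_CLEARED        (UINT64_C(1) << 0)
#define QUIRK_EPT_IGNORE_GUEST_PAT (UINT64_C(1) << 1)

/* Illustrative model of per-VM quirk state; not the kernel's struct kvm_arch. */
struct quirk_state {
    uint64_t advertised;   /* bits userspace is allowed to disable */
    uint64_t inapplicable; /* quirky behavior never occurs on this host/VM */
    uint64_t disabled;     /* bits userspace has disabled so far */
};

/*
 * Model of a disable request. Unknown bits are rejected. Accepted bits are
 * OR-ed into the existing set, so a later call with a smaller mask cannot
 * re-enable a quirk (the "cumulative" behavior suggested in the thread).
 */
static int disable_quirks(struct quirk_state *s, uint64_t mask)
{
    if (mask & ~s->advertised)
        return -1;
    s->disabled |= mask;
    return 0;
}

/* A quirk is in effect only if it applies here and has not been disabled. */
static bool quirk_active(const struct quirk_state *s, uint64_t quirk)
{
    return !((s->disabled | s->inapplicable) & quirk);
}
```

In this model, a quirk that cannot be disabled is simply absent from `advertised`, while a quirk that is always disabled is present in both `advertised` and `inapplicable`, so disabling it succeeds and changes nothing.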
On Mon, Mar 03, 2025 at 11:25:08AM +0100, Paolo Bonzini wrote:
> On 3/3/25 02:11, Yan Zhao wrote:
> > > the main issue with this series is that the quirk is not disabled only for
> > > TDX VMs, but for *all* VMs if TDX is available.
> > Yes, once TDX is enabled, the quirk is disabled for all VMs.
> > My thought is that on TDX as a new platform, users have the option to update
> > guest software to address bugs caused by incorrect guest PAT settings.
> >
> > If you think it's a must to support old unmodifiable non-TDX VMs on TDX
> > platforms, then it's indeed an issue of this series.
>
> Yeah, unfortunately I think we need to keep the quirk for old VMs. But I
> think the code changes needed to do so are small and good to have anyway.
>
> > > There are two concepts here:
> > >
> > > - which quirks can be disabled
> > >
> > > - which quirks are active
> > >
> > > I agree with making the first vendor-dependent, but for a different reason:
> > > the new KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT must be hidden if self-snoop is
> > > not present.
> >
> > I think it's a good idea to make KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT out of
> > KVM_CAP_DISABLE_QUIRKS2, so that the quirk is always enabled when self-snoop is
> > not present as userspace has no way to disable this quirk.
> >
> > However, this seems to contradict your point below, especially since it is even
> > present on AMD platforms.
> >
> > "we need to expose the quirk anyway in KVM_CAP_DISABLE_QUIRKS2, so that
> > userspace knows that KVM is *aware* of a particular issue", "even if disabling
> > it has no effect, userspace may want to know that it can rely on the problematic
> > behavior not being present".
>
> There are four cases:
>
> * quirk cannot be disabled: example, "ignore guest PAT" on non-self-snoop
> machines: the quirk must not be in KVM_CAP_DISABLE_QUIRKS2
>
> * quirk can be disabled: the quirk must be in KVM_CAP_DISABLE_QUIRKS2
>
> * quirk is always disabled: right now we're always exposing those in
> KVM_CAP_DISABLE_QUIRKS2, so we should keep that behavior. If desired we
> could add a capability like KVM_CAP_DISABLED_QUIRKS
>
> * for some VMs, quirk is always disabled: this is the case also for the
> zap_all quirk that you have previously introduced. Right now there's no way
> to query it, but KVM_CAP_DISABLED_QUIRKS would also cover this. If
> KVM_CAP_DISABLED_QUIRKS was introduced, zap_all could be added too.
>
> > So, could we also expose KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT in
> > KVM_CAP_DISABLE_QUIRKS2 on Intel platforms without self-snoop, but ensure that
> > disabling the quirk has no effect?
>
> To keep the API clear, disabling the quirk should *always* have the effect
> of going to the non-quirky behavior. Which may be no effect at all if the
> non-quirky behavior is the only one---but the important thing is that you
> don't want the quirky/buggy/non-architectural behavior after a successful
> KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).
Thanks for this clarification!

>
> There is a pre-existing bug in that I think
> KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2) should be cumulative, i.e. should
> not allow re-enabling a previously-disabled quirk. I think we can change
> that without worrying about breaking userspace there, as the current
> behavior is the most surprising.
That would be better.

>
> > > As to the second, we already have an example of a quirk that is also active,
> > > though we don't represent that in kvm->arch.disabled_quirks: that's
> > > KVM_X86_QUIRK_CD_NW_CLEARED which is for AMD only and is effectively always
> > > disabled on Intel platforms. For those cases, we need to expose the quirk
> > I also have a concern about this one. Please find my comments in v2.
>
> Ok, I'll reply there too.
>
> > > anyway in KVM_CAP_DISABLE_QUIRKS2, so that userspace knows that KVM is
> > > *aware* of a particular issue. In other words, even if disabling it has no
> > > effect, userspace may want to know that it can rely on the problematic
> > > behavior not being present.
> > >
> > > I'm testing an alternative series and will post it shortly.
> > Thanks a lot for helping with refining the patches!
>
> Thanks to you and sorry that the patches weren't of the best quality - I
> mostly wanted to start the discussion on the userspace API side before the
> beginning of the week in your time zone.
No problem.

I realized the problem in my implementation of excluding quirk
IGNORE_GUEST_PAT from KVM_CAP_DISABLE_QUIRKS2 on TDX platforms. This could
lead to confusion for userspace, which wouldn't be able to determine whether:
- it's an old KVM that does not support quirk IGNORE_GUEST_PAT, meaning KVM
  will ignore guest PAT, or
- it's a new KVM that supports IGNORE_GUEST_PAT, meaning KVM will honor guest
  PAT on TDX platforms.

Looking back, I was too KVM-centric. I just thought users wouldn't need to
invoke KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2) on AMD or TDX, but that was
wrong -- I did not consider the issue from the user's perspective.
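
Editor's illustration (not part of the thread): the ambiguity described in the last message can be seen from the userspace side with a small sketch. It assumes userspace has already obtained the bitmask of disable-able quirks (the value KVM reports for KVM_CAP_DISABLE_QUIRKS2). The bit value, the enum and `guest_pat_expectation()` are hypothetical names made up for this example, not real KVM definitions.

```c
#include <stdint.h>

/* Placeholder bit; the real uAPI value is not given in this thread. */
#define QUIRK_EPT_IGNORE_GUEST_PAT (UINT64_C(1) << 1)

enum pat_expectation {
    PAT_HONORED_AFTER_DISABLE, /* KVM knows the quirk; disabling it is meaningful */
    PAT_BEHAVIOR_UNKNOWN,      /* bit hidden: old KVM, or a KVM that hides it */
};

/*
 * Decide what userspace can conclude from the advertised quirk mask alone.
 * If the bit is hidden, an old KVM that ignores guest PAT and a newer KVM
 * that hides the bit while honoring guest PAT look identical.
 */
static enum pat_expectation guest_pat_expectation(uint64_t advertised_mask)
{
    if (advertised_mask & QUIRK_EPT_IGNORE_GUEST_PAT)
        return PAT_HONORED_AFTER_DISABLE;
    return PAT_BEHAVIOR_UNKNOWN;
}
```

This is why the thread settles on advertising the bit even where disabling it changes nothing: only then can userspace rely on the non-quirky behavior after a successful disable.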