Message ID: 20220616084643.19564-1-weijiang.yang@intel.com (mailing list archive)
Series: Refresh queued CET virtualization series
On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> The purpose of this patch series is to refresh the queued CET KVM
> patches[1] with the latest dependent CET native patches, pursuing
> the result that whole series could be merged ahead of CET native
> series[2] [3].

It might be helpful to explain what CET is here..
On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> To minimize the impact to existing kernel/KVM code, most of the KVM patch
> code can be bypassed during runtime. Uncheck "CONFIG_X86_KERNEL_IBT"
> and "CONFIG_X86_SHADOW_STACK" in Kconfig before kernel build to get
> rid of CET features in KVM. If both of them are not enabled, KVM
> clears the related feature bits as well as the CET user bit in
> supported_xss; this makes CET related checks stop at the first point.
> Since most of the patch code runs on the non-hot path of KVM, it's
> expected to introduce little impact to existing code.

Do I understand this right in that a host without X86_KERNEL_IBT cannot
run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
was exactly what I did while developing the X86_KERNEL_IBT patches.

I'm thinking that if the hardware supports it, KVM should expose it,
irrespective of the host kernel using it.
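The gating described in the cover letter could be sketched roughly as below. This is an illustrative sketch only, not the actual series' code: the function and parameter names are hypothetical stand-ins for the real KVM symbols, while the two bit positions are the architectural XSAVE state-component numbers for CET user and supervisor state.

```c
#include <stdint.h>
#include <stdbool.h>

#define XFEATURE_MASK_CET_USER   (1ull << 11)  /* architectural CET_U xstate component */
#define XFEATURE_MASK_CET_KERNEL (1ull << 12)  /* architectural CET_S xstate component */

/*
 * Hypothetical sketch of the cover letter's scheme: if the host kernel
 * was built with neither CONFIG_X86_KERNEL_IBT nor
 * CONFIG_X86_SHADOW_STACK, drop both CET state components from
 * supported_xss so every later CET check fails at the first gate.
 */
static uint64_t cet_adjust_supported_xss(uint64_t supported_xss,
					 bool ibt_enabled,
					 bool shstk_enabled)
{
	if (!ibt_enabled && !shstk_enabled)
		supported_xss &= ~(XFEATURE_MASK_CET_USER |
				   XFEATURE_MASK_CET_KERNEL);
	return supported_xss;
}
```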
On 6/16/22 12:12, Peter Zijlstra wrote:
> Do I understand this right in that a host without X86_KERNEL_IBT cannot
> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
> was exactly what I did while developing the X86_KERNEL_IBT patches.
>
> I'm thinking that if the hardware supports it, KVM should expose it,
> irrespective of the host kernel using it.

For IBT in particular, I think all processor state is only loaded and
stored at vmentry/vmexit (it does not need XSAVES), so it should be
feasible.

Paolo
On Thu, Jun 16, 2022 at 02:10:50AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> > The purpose of this patch series is to refresh the queued CET KVM
> > patches[1] with the latest dependent CET native patches, pursuing
> > the result that whole series could be merged ahead of CET native
> > series[2] [3].
>
> It might be helpful to explain what CET is here..

Central European Time ofc :-)

I think it stands for Control-flow Enforcement Technology or something
along those lines, but this being Intel it loves to obfuscate stuff and
make it impossible to understand what's being said to increase the
buzzword bong hits.

It's a mostly pointless umbrella term for IBT (Indirect Branch Tracking)
and SHSTK (SHadow STacK), the first of which covers forward-edge control
flow and the second covers backward-edge control flow.
On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
> On 6/16/22 12:12, Peter Zijlstra wrote:
> > Do I understand this right in that a host without X86_KERNEL_IBT cannot
> > run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
> > was exactly what I did while developing the X86_KERNEL_IBT patches.
> >
> > I'm thinking that if the hardware supports it, KVM should expose it,
> > irrespective of the host kernel using it.
>
> For IBT in particular, I think all processor state is only loaded and stored
> at vmentry/vmexit (does not need XSAVES), so it should be feasible.

That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
stuff is all XSAVE though.

But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
enumerate IBT and SS separately, but for each of IBT/SS you have to
implement both U and S.

That was a problem with the first series, which only implemented support
for U_CET while advertising IBT and SS (very much including S_CET), and
still is a problem with this series because S_SS is missing while
advertised.
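The enumeration mismatch Peter describes can be made concrete with the architectural bit positions (the macro names below are my own shorthand, not necessarily the kernel's): CPUID leaf 7 advertises SHSTK and IBT as whole features, while the XSAVE enumeration (leaf 0xD) splits the *state* into user (CET_U) and supervisor (CET_S) components. So advertising either CPUID bit implicitly commits the hypervisor to managing both state components:

```c
#include <stdint.h>

#define CPUID_7_0_ECX_SHSTK (1u << 7)   /* CPUID.(7,0):ECX[7]  - shadow stack */
#define CPUID_7_0_EDX_IBT   (1u << 20)  /* CPUID.(7,0):EDX[20] - indirect branch tracking */

#define XFEATURE_MASK_CET_USER   (1ull << 11)  /* CET_U state component */
#define XFEATURE_MASK_CET_KERNEL (1ull << 12)  /* CET_S state component */

/*
 * Sketch of the minimal state-management commitment implied by a given
 * CPUID advertisement: either feature bit drags in both the user and
 * supervisor CET state components, since CPUID has no {U,S} split.
 */
static uint64_t cet_xstate_required(uint32_t ecx, uint32_t edx)
{
	if ((ecx & CPUID_7_0_ECX_SHSTK) || (edx & CPUID_7_0_EDX_IBT))
		return XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
	return 0;
}
```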
On 6/16/2022 10:18 PM, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>
>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>> irrespective of the host kernel using it.
>> For IBT in particular, I think all processor state is only loaded and stored
>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
> stuff is all XSAVE though.

Thank you Peter and Paolo! In this version, I referenced the host kernel
settings when exposing X86_KERNEL_IBT to the guest. The reason is that
_IF_ the host, for whatever reason, disabled the IBT feature, exposing
the feature blindly to the guest could be risky, e.g., hitting some
issues the host wants to mitigate. The actual implementation depends on
the agreement we reach :-)

> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.

Exactly, the CPUID enumeration could be a pain point for the KVM
solution. It makes {U,S}_CET feature control harder for the guest.

> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.

KVM has a problem advertising S_SS alone to the guest when U_CET (both
SS and IBT) is not available to the guest. I would like to hear the
community's voice on how to make the feature controls straightforward
and reasonable. The existing CPUID enumeration cannot advertise {U,S}_SS
and {U,S}_IBT well.
On 6/16/22 16:18, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>
>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>> irrespective of the host kernel using it.
>>
>> For IBT in particular, I think all processor state is only loaded and stored
>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
>
> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
> stuff is all XSAVE though.

What matters is whether XFEATURE_MASK_USER_SUPPORTED includes
XFEATURE_CET_USER. If you build with !X86_KERNEL_IBT, KVM can still rely
on the FPU state for U_CET state, and S_CET is saved/restored via the
VMCS independent of X86_KERNEL_IBT.

Paolo

> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.
>
> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.
On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
> On 6/16/22 16:18, Peter Zijlstra wrote:
>> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>>
>>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>>> irrespective of the host kernel using it.
>>>
>>> For IBT in particular, I think all processor state is only loaded and stored
>>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
>>
>> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
>> stuff is all XSAVE though.
>
> What matters is whether XFEATURE_MASK_USER_SUPPORTED includes
> XFEATURE_CET_USER.

Small correction: XFEATURE_CET_USER belongs to
XFEATURE_MASK_SUPERVISOR_SUPPORTED; the name is misleading.

> If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
> for U_CET state, and S_CET is saved/restored via the VMCS independent
> of X86_KERNEL_IBT.

A fundamental question is: should KVM always honor host CET enablement
before exposing the feature to the guest? I.e., check X86_KERNEL_IBT and
X86_SHADOW_STACK.

> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.
>
> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.
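The naming quirk Weijiang corrects can be captured in a small sketch: CET_U is user-mode register state, but it lives in MSRs and is only reachable via XSAVES/XRSTORS, so the kernel groups it with the supervisor-supported xfeatures. The helper and mask values below are a simplified illustration, not code copied from arch/x86.

```c
#include <stdint.h>
#include <stdbool.h>

#define XFEATURE_MASK_CET_USER (1ull << 11)  /* architectural CET_U component */

/*
 * Returns true if 'xfeature' must be managed with XSAVES, i.e. it sits
 * in the supervisor-supported grouping even though, as with CET_U, the
 * state itself may logically belong to user mode.
 */
static bool xfeature_is_supervisor_managed(uint64_t user_supported,
					   uint64_t supervisor_supported,
					   uint64_t xfeature)
{
	return (supervisor_supported & xfeature) &&
	       !(user_supported & xfeature);
}
```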
On Sat, Jun 18, 2022, Yang, Weijiang wrote:
> On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
> > If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
> > for U_CET state, and S_CET is saved/restored via the VMCS independent of
> > X86_KERNEL_IBT.
>
> A fundamental question is, should KVM always honor host CET enablement
> before expose the feature to guest? i.e., check X86_KERNEL_IBT and
> X86_SHADOW_STACK.

If there is a legitimate use case to NOT require host enablement and it's
100% safe to do so (without requiring hacks to the core kernel), then
there's no hard requirement that says KVM can't virtualize a feature
that's not used by the host.

It's definitely uncommon; unless I'm forgetting features, LA57 is the
only feature that KVM fully virtualizes (as opposed to emulates in
software) without requiring host enablement. Ah, and good ol' MPX, which
is probably the best prior art since it shares the same XSAVE+VMCS
user+supervisor state management. So more than one, but still not very
many.

But, requiring host "support" is the de facto standard largely because
features tend to fall into one of three categories:

 1. The feature is always available, i.e. doesn't have a software
    enable/disable flag.

 2. The feature isn't explicitly disabled in cpufeatures / x86_capability
    even if it's not used by the host. E.g. MONITOR/MWAIT comes to mind,
    where the host can be configured to not use MWAIT for idle, but it's
    still reported as supported (and for that case, KVM does have to
    explicitly guard against X86_BUG_MONITOR).

 3. The feature requires some amount of host support, e.g. exposing XSAVE
    without the kernel knowing how to save/restore all that state
    wouldn't end well.

In other words, virtualizing a feature that's disabled in the host is
allowed, but it's rare because there just aren't many features where
doing so is possible _and_ necessary.
On 7/15/2022 3:36 AM, Sean Christopherson wrote:
> On Sat, Jun 18, 2022, Yang, Weijiang wrote:
>> On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
>>> If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
>>> for U_CET state, and S_CET is saved/restored via the VMCS independent of
>>> X86_KERNEL_IBT.
>> A fundamental question is, should KVM always honor host CET enablement
>> before expose the feature to guest? i.e., check X86_KERNEL_IBT and
>> X86_SHADOW_STACK.
> If there is a legitimate use case to NOT require host enablement and it's 100%
> safe to do so (without requiring hacks to the core kernel), then there's no hard
> requirement that says KVM can't virtualize a feature that's not used by the host.

Yeah, CET definitely can be virtualized without considering host usage,
but to make things easier it still relies on some kind of host side
support, e.g., XSAVES.

> It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
> that KVM fully virtualizes (as opposed to emulates in software) without requiring
> host enablement. Ah, and good ol' MPX, which is probably the best prior art since
> it shares the same XSAVE+VMCS user+supervisor state management. So more than
> one, but still not very many.

Speaking of MPX, is it really active in recent kernels? I can find only
a little code on the native side and, instead, more code in KVM.

> But, requiring host "support" is the de facto standard largely because features
> tend to fall into one of three categories:
>
> 1. The feature is always available, i.e. doesn't have a software enable/disable
>    flag.
>
> 2. The feature isn't explicitly disabled in cpufeatures / x86_capability even
>    if it's not used by the host. E.g. MONITOR/MWAIT comes to mind, where the
>    host can be configured to not use MWAIT for idle, but it's still reported
>    as supported (and for that case, KVM does have to explicitly guard against
>    X86_BUG_MONITOR).
>
> 3. The feature requires some amount of host support, e.g. exposing XSAVE
>    without the kernel knowing how to save/restore all that state wouldn't
>    end well.

CET may fall into one of the three, or a combination of them :-),
depending on the complexity of the implementation.

> In other words, virtualizing a feature that's disabled in the host is allowed,
> but it's rare because there just aren't many features where doing so is possible
> _and_ necessary.

I'm thinking of tweaking the patches to construct a safe yet flexible
solution based on a bunch of MSRs/CPUIDs/VMCS fields/XSAVES elements
plus a few host side constraints. Thanks for the enlightenment!
On Fri, Jul 15, 2022, Yang, Weijiang wrote:
> On 7/15/2022 3:36 AM, Sean Christopherson wrote:
> > It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
> > that KVM fully virtualizes (as opposed to emulates in software) without requiring
> > host enablement. Ah, and good ol' MPX, which is probably the best prior art since
> > it shares the same XSAVE+VMCS user+supervisor state management. So more than
> > one, but still not very many.
>
> Speaking of MPX, is it really active in recent kernels? I can find only
> a little code on the native side and, instead, more code in KVM.

Nope, native MPX support was ripped out a year or two ago. The kernel
provides just enough save+restore support so that KVM can continue to
virtualize MPX.