diff mbox series

[v1,7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit

Message ID 317430261.8766476.1592321051337.JavaMail.zimbra@cert.pl (mailing list archive)
State Superseded
Headers show
Series Implement support for external IPT monitoring | expand

Commit Message

Michał Leszczyński June 16, 2020, 3:24 p.m. UTC
Enable IPT when entering the VM and disable it on vmexit.
Register state is persisted using vCPU ipt_state structure.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/arch/x86/hvm/vmx/vmx.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

Roger Pau Monné June 16, 2020, 5:38 p.m. UTC | #1
On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
> Enable IPT when entering the VM and disable it on vmexit.
> Register state is persisted using vCPU ipt_state structure.

Shouldn't this be better done using Intel MSR load lists?

That seems to be what the SDM recommends for tracing VM events.

Thanks, Roger.
Michał Leszczyński June 16, 2020, 5:47 p.m. UTC | #2
----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
>> Enable IPT when entering the VM and disable it on vmexit.
>> Register state is persisted using vCPU ipt_state structure.
> 
> Shouldn't this be better done using Intel MSR load lists?
> 
> That seems to be what the SDM recommends for tracing VM events.
> 
> Thanks, Roger.


This is intentional, additionally described by the comment:

// MSR_IA32_RTIT_CTL is context-switched manually instead of being
// stored inside VMCS, as of Q2'20 only the most recent processors
// support such field in VMCS


There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be loaded using MR load lists. During my experiments, I haven't found any single CPU available to me that would declare such a feature flag. I was mostly testing CPUs that were launched in 2018, so I suppose that this feature is present only on very recent hardware. Unfortunately it's not possible to check on Intel ARK as this information is not listed there at all.


Best regards,
Michał Leszczyński
CERT Polska
Roger Pau Monné June 17, 2020, 9:09 a.m. UTC | #3
On Tue, Jun 16, 2020 at 07:47:07PM +0200, Michał Leszczyński wrote:
> ----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
> >> Enable IPT when entering the VM and disable it on vmexit.
> >> Register state is persisted using vCPU ipt_state structure.
> > 
> > Shouldn't this be better done using Intel MSR load lists?
> > 
> > That seems to be what the SDM recommends for tracing VM events.
> > 
> > Thanks, Roger.
> 
> 
> This is intentional, additionally described by the comment:
> 
> // MSR_IA32_RTIT_CTL is context-switched manually instead of being
> // stored inside VMCS, as of Q2'20 only the most recent processors
> // support such field in VMCS
> 
> 
> There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be loaded using MR load lists.

I've been looking at the Intel SDM and I'm not able to find which bit
signals whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.
Sorry to ask, but can you elaborate on where is this signaled?

Thanks, Roger.
Michał Leszczyński June 17, 2020, 11:54 a.m. UTC | #4
----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Tue, Jun 16, 2020 at 07:47:07PM +0200, Michał Leszczyński wrote:
>> ----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):
>> 
>> > On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
>> >> Enable IPT when entering the VM and disable it on vmexit.
>> >> Register state is persisted using vCPU ipt_state structure.
>> > 
>> > Shouldn't this be better done using Intel MSR load lists?
>> > 
>> > That seems to be what the SDM recommends for tracing VM events.
>> > 
>> > Thanks, Roger.
>> 
>> 
>> This is intentional, additionally described by the comment:
>> 
>> // MSR_IA32_RTIT_CTL is context-switched manually instead of being
>> // stored inside VMCS, as of Q2'20 only the most recent processors
>> // support such field in VMCS
>> 
>> 
>> There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be
>> loaded using MR load lists.
> 
> I've been looking at the Intel SDM and I'm not able to find which bit
> signals whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.
> Sorry to ask, but can you elaborate on where is this signaled?
> 
> Thanks, Roger.


According to SDM:

> 24 Virtual Machine Control Structures -> 24.4 Guest-state Area -> 24.4.1 Guest Register State

> IA32_RTIT_CTL (64 bits). This field is supported only on processors that support either the 1-setting of the "load IA32_RTIT_CTL" VM-entry control or that of the "clear IA32_RTIT_CTL" VM-exit control.


> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1 VM-Entry Controls

> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the reserved bits.

Please look at bit position 18 "Load IA32_RTIT_CTL".



Best regards,
Michał Leszczyński
CERT Polska
Roger Pau Monné June 17, 2020, 12:51 p.m. UTC | #5
On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Tue, Jun 16, 2020 at 07:47:07PM +0200, Michał Leszczyński wrote:
> >> ----- 16 cze 2020 o 19:38, Roger Pau Monné roger.pau@citrix.com napisał(a):
> >> 
> >> > On Tue, Jun 16, 2020 at 05:24:11PM +0200, Michał Leszczyński wrote:
> >> >> Enable IPT when entering the VM and disable it on vmexit.
> >> >> Register state is persisted using vCPU ipt_state structure.
> >> > 
> >> > Shouldn't this be better done using Intel MSR load lists?
> >> > 
> >> > That seems to be what the SDM recommends for tracing VM events.
> >> > 
> >> > Thanks, Roger.
> >> 
> >> 
> >> This is intentional, additionally described by the comment:
> >> 
> >> // MSR_IA32_RTIT_CTL is context-switched manually instead of being
> >> // stored inside VMCS, as of Q2'20 only the most recent processors
> >> // support such field in VMCS
> >> 
> >> 
> >> There is a special feature flag which indicates whether MSR_IA32_RTIT_CTL can be
> >> loaded using MR load lists.
> > 
> > I've been looking at the Intel SDM and I'm not able to find which bit
> > signals whether MSR_IA32_RTIT_CTL can be loaded using MSR load lists.
> > Sorry to ask, but can you elaborate on where is this signaled?
> > 
> > Thanks, Roger.
> 
> 
> According to SDM:
> 
> > 24 Virtual Machine Control Structures -> 24.4 Guest-state Area -> 24.4.1 Guest Register State
> 
> > IA32_RTIT_CTL (64 bits). This field is supported only on processors that support either the 1-setting of the "load IA32_RTIT_CTL" VM-entry control or that of the "clear IA32_RTIT_CTL" VM-exit control.
> 
> 
> > 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1 VM-Entry Controls
> 
> > Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the reserved bits.
> 
> Please look at bit position 18 "Load IA32_RTIT_CTL".

I think this is something different from what I was referring to.
Those options you refer to (load/clear IA32_RTIT_CTL) deal with
loading/storing a specific field on the vmcs that maps to the guest
IA32_RTIT_CTL.

OTOH MSR load lists can be used to load and store any arbitrary MSR on
vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.

Roger.
Andrew Cooper June 17, 2020, 3:14 p.m. UTC | #6
On 17/06/2020 13:51, Roger Pau Monné wrote:
> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
>>
>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1 VM-Entry Controls
>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the reserved bits.
>> Please look at bit position 18 "Load IA32_RTIT_CTL".
> I think this is something different from what I was referring to.
> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> loading/storing a specific field on the vmcs that maps to the guest
> IA32_RTIT_CTL.
>
> OTOH MSR load lists can be used to load and store any arbitrary MSR on
> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.

If I remember the historic roadmaps correctly, there are 3 cases.

The first hardware to support PT (Broadwell?) prohibited its use
completely in VMX operations.  In this case, we can use it to trace PV
guests iff we don't enable VMX in hardware to begin with.

This was relaxed in later hardware (Skylake?) to permit use within VMX
operations, but without any help in the VMCS.  (i.e. manual context
switching per this patch, or MSR load lists as noted in the SDM.)

Subsequent support for "virtualised PT" was added (IceLake?) which adds
the load/save controls, and the ability to translate the output buffer
under EPT.


All of this is from memory so I'm quite possibly wrong with details, but
I believe this is why the current complexity exists.

~Andrew
Michał Leszczyński June 17, 2020, 6:56 p.m. UTC | #7
----- 17 cze 2020 o 17:14, Andrew Cooper andrew.cooper3@citrix.com napisał(a):

> On 17/06/2020 13:51, Roger Pau Monné wrote:
>> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
>>>
>>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
>>>> VM-Entry Controls
>>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
>>>> how it should set the reserved bits.
>>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>> I think this is something different from what I was referring to.
>> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>> loading/storing a specific field on the vmcs that maps to the guest
>> IA32_RTIT_CTL.
>>
>> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> 
> If I remember the historic roadmaps correctly, there are 3 cases.
> 
> The first hardware to support PT (Broadwell?) prohibited its use
> completely in VMX operations.  In this case, we can use it to trace PV
> guests iff we don't enable VMX in hardware to begin with.
> 
> This was relaxed in later hardware (Skylake?) to permit use within VMX
> operations, but without any help in the VMCS.  (i.e. manual context
> switching per this patch, or MSR load lists as noted in the SDM.)
> 
> Subsequent support for "virtualised PT" was added (IceLake?) which adds
> the load/save controls, and the ability to translate the output buffer
> under EPT.
> 
> 
> All of this is from memory so I'm quite possibly wrong with details, but
> I believe this is why the current complexity exists.
> 
> ~Andrew


I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:

> 35.5.2.2 Guest-Only Tracing
> "For this usage, VM-entry is programmed to enable trace packet generation, while VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable trace-packet generation in the host."

it actually helped a bit. With patch v1 there were parts of hypervisor recorded in the trace (i.e. the moment between TRACE_EN being set and actual vmenter, and the moment between vmexit and TRACE_EN being unset). Using MSR load list this was eliminated. This change will be reflected in patch v2.


I can't however implement any working scenario in which all these MSRs are managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are buffered internally and are flushed only when TRACE_EN bit in MSR_IA32_RTIT_CTL is set to 0. The values of remaining registers will be stable after everything is serialized. I think this is too complex for the load lists alone. I belive that currently SDM instructs to use load lists only for toggling this single bit on-or-off.


Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR load lists and the rest of related MSRs being managed manually.


Best regards,
Michał Leszczyński
CERT Polska
Luwei Kang June 17, 2020, 11:30 p.m. UTC | #8
> > On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> >> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
> >>
> >>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control
> >>> Fields -> 24.8.1 VM-Entry Controls Software should consult the VMX
> capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the
> reserved bits.
> >> Please look at bit position 18 "Load IA32_RTIT_CTL".
> > I think this is something different from what I was referring to.
> > Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> > loading/storing a specific field on the vmcs that maps to the guest
> > IA32_RTIT_CTL.
> >
> > OTOH MSR load lists can be used to load and store any arbitrary MSR on
> > vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> > already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> 
> If I remember the historic roadmaps correctly, there are 3 cases.
> 
> The first hardware to support PT (Broadwell?) prohibited its use completely in
> VMX operations.  In this case, we can use it to trace PV guests iff we don't
> enable VMX in hardware to begin with.
> 
> This was relaxed in later hardware (Skylake?) to permit use within VMX
> operations, but without any help in the VMCS.  (i.e. manual context switching
> per this patch, or MSR load lists as noted in the SDM.)
> 
> Subsequent support for "virtualised PT" was added (IceLake?) which adds the
> load/save controls, and the ability to translate the output buffer under EPT.
> 
> 
> All of this is from memory so I'm quite possibly wrong with details, but I believe
> this is why the current complexity exists.

Yes, It include 3 cases.
1. Before IA32_VMX_MISC[bit 14]:
     Intel PT doesn't support tracing in VMX operation. Execution of the VMXON instruction clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL in VMX operation causes a general-protection exception (#GP)
2. Support IA32_VMX_MISC[bit 14] but no EPT to direct PT output:
    Intel PT can be enabled across VMX but the address of Intel PT buffer is always HPA from HW point of view. There is not VMCS support in this stage. The MSR load list can be used for Intel PT context switch(VM-Entry/Exit).
3. Intel PT VM improvements (start from Icelake):
    Add a new guest IA32_RTIT_CTL field in VMCS, and HW treat the PT output addresses as GPA and translate them using EPT.

Thanks,
Luwei Kang
Roger Pau Monné June 18, 2020, 8:52 a.m. UTC | #9
On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
> ----- 17 cze 2020 o 17:14, Andrew Cooper andrew.cooper3@citrix.com napisał(a):
> 
> > On 17/06/2020 13:51, Roger Pau Monné wrote:
> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> >>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
> >>>
> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
> >>>> VM-Entry Controls
> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
> >>>> how it should set the reserved bits.
> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
> >> I think this is something different from what I was referring to.
> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> >> loading/storing a specific field on the vmcs that maps to the guest
> >> IA32_RTIT_CTL.
> >>
> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> > 
> > If I remember the historic roadmaps correctly, there are 3 cases.
> > 
> > The first hardware to support PT (Broadwell?) prohibited its use
> > completely in VMX operations.  In this case, we can use it to trace PV
> > guests iff we don't enable VMX in hardware to begin with.
> > 
> > This was relaxed in later hardware (Skylake?) to permit use within VMX
> > operations, but without any help in the VMCS.  (i.e. manual context
> > switching per this patch, or MSR load lists as noted in the SDM.)
> > 
> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
> > the load/save controls, and the ability to translate the output buffer
> > under EPT.
> > 
> > 
> > All of this is from memory so I'm quite possibly wrong with details, but
> > I believe this is why the current complexity exists.
> > 
> > ~Andrew
> 
> 
> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
> 
> > 35.5.2.2 Guest-Only Tracing
> > "For this usage, VM-entry is programmed to enable trace packet generation, while VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable trace-packet generation in the host."
> 
> it actually helped a bit. With patch v1 there were parts of hypervisor recorded in the trace (i.e. the moment between TRACE_EN being set and actual vmenter, and the moment between vmexit and TRACE_EN being unset). Using MSR load list this was eliminated. This change will be reflected in patch v2.
> 
> 
> I can't however implement any working scenario in which all these MSRs are managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are buffered internally and are flushed only when TRACE_EN bit in MSR_IA32_RTIT_CTL is set to 0. The values of remaining registers will be stable after everything is serialized. I think this is too complex for the load lists alone. I belive that currently SDM instructs to use load lists only for toggling this single bit on-or-off.

I think that's exactly what we want: handling TraceEn at
vmentry/vmexit, so that no hypervisor packets are recorded. The rest
of the MSRs can be handled in VMM mode without issues. Switching those
on every vmentry/vmexit would also add more overhead that needed,
since I assume they don't need to be modified on every entry/exit?

> 
> Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR load lists and the rest of related MSRs being managed manually.

Yes, that' seems like a good approach.

Roger.
Andrew Cooper June 18, 2020, 10:02 a.m. UTC | #10
On 18/06/2020 00:30, Kang, Luwei wrote:
>>> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>>>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
>>>>
>>>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control
>>>>> Fields -> 24.8.1 VM-Entry Controls Software should consult the VMX
>> capability MSRs IA32_VMX_ENTRY_CTLS to determine how it should set the
>> reserved bits.
>>>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>>> I think this is something different from what I was referring to.
>>> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>>> loading/storing a specific field on the vmcs that maps to the guest
>>> IA32_RTIT_CTL.
>>>
>>> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>>> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>>> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
>> If I remember the historic roadmaps correctly, there are 3 cases.
>>
>> The first hardware to support PT (Broadwell?) prohibited its use completely in
>> VMX operations.  In this case, we can use it to trace PV guests iff we don't
>> enable VMX in hardware to begin with.
>>
>> This was relaxed in later hardware (Skylake?) to permit use within VMX
>> operations, but without any help in the VMCS.  (i.e. manual context switching
>> per this patch, or MSR load lists as noted in the SDM.)
>>
>> Subsequent support for "virtualised PT" was added (IceLake?) which adds the
>> load/save controls, and the ability to translate the output buffer under EPT.
>>
>>
>> All of this is from memory so I'm quite possibly wrong with details, but I believe
>> this is why the current complexity exists.
> Yes, It include 3 cases.
> 1. Before IA32_VMX_MISC[bit 14]:
>      Intel PT doesn't support tracing in VMX operation. Execution of the VMXON instruction clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL in VMX operation causes a general-protection exception (#GP)
> 2. Support IA32_VMX_MISC[bit 14] but no EPT to direct PT output:
>     Intel PT can be enabled across VMX but the address of Intel PT buffer is always HPA from HW point of view. There is not VMCS support in this stage. The MSR load list can be used for Intel PT context switch(VM-Entry/Exit).
> 3. Intel PT VM improvements (start from Icelake):
>     Add a new guest IA32_RTIT_CTL field in VMCS, and HW treat the PT output addresses as GPA and translate them using EPT.

Thanks for the details, and confirming.  I think for now we can ignore
case 1 for simplicity, as I don't think it is likely that we'll have
someone on Broadwell hardware intending to run without VMX.  (If people
really want it, we can retrofit it, but I don't think the effort is
worth it for now)

~Andrew
Michał Leszczyński June 18, 2020, 11:07 a.m. UTC | #11
----- 18 cze 2020 o 10:52, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
>> ----- 17 cze 2020 o 17:14, Andrew Cooper andrew.cooper3@citrix.com napisał(a):
>> 
>> > On 17/06/2020 13:51, Roger Pau Monné wrote:
>> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>> >>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
>> >>>
>> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
>> >>>> VM-Entry Controls
>> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
>> >>>> how it should set the reserved bits.
>> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>> >> I think this is something different from what I was referring to.
>> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>> >> loading/storing a specific field on the vmcs that maps to the guest
>> >> IA32_RTIT_CTL.
>> >>
>> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
>> > 
>> > If I remember the historic roadmaps correctly, there are 3 cases.
>> > 
>> > The first hardware to support PT (Broadwell?) prohibited its use
>> > completely in VMX operations.  In this case, we can use it to trace PV
>> > guests iff we don't enable VMX in hardware to begin with.
>> > 
>> > This was relaxed in later hardware (Skylake?) to permit use within VMX
>> > operations, but without any help in the VMCS.  (i.e. manual context
>> > switching per this patch, or MSR load lists as noted in the SDM.)
>> > 
>> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
>> > the load/save controls, and the ability to translate the output buffer
>> > under EPT.
>> > 
>> > 
>> > All of this is from memory so I'm quite possibly wrong with details, but
>> > I believe this is why the current complexity exists.
>> > 
>> > ~Andrew
>> 
>> 
>> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
>> 
>> > 35.5.2.2 Guest-Only Tracing
>> > "For this usage, VM-entry is programmed to enable trace packet generation, while
>> > VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable
>> > trace-packet generation in the host."
>> 
>> it actually helped a bit. With patch v1 there were parts of hypervisor recorded
>> in the trace (i.e. the moment between TRACE_EN being set and actual vmenter,
>> and the moment between vmexit and TRACE_EN being unset). Using MSR load list
>> this was eliminated. This change will be reflected in patch v2.
>> 
>> 
>> I can't however implement any working scenario in which all these MSRs are
>> managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are
>> buffered internally and are flushed only when TRACE_EN bit in MSR_IA32_RTIT_CTL
>> is set to 0. The values of remaining registers will be stable after everything
>> is serialized. I think this is too complex for the load lists alone. I belive
>> that currently SDM instructs to use load lists only for toggling this single
>> bit on-or-off.
> 
> I think that's exactly what we want: handling TraceEn at
> vmentry/vmexit, so that no hypervisor packets are recorded. The rest
> of the MSRs can be handled in VMM mode without issues. Switching those
> on every vmentry/vmexit would also add more overhead that needed,
> since I assume they don't need to be modified on every entry/exit?


Assuming that there is a single DomU per pcpu and they are never migrated between pcpus then you never need to modify the remaining MSRs.

In case DomUs are floating or there are multiple DomUs per pcpu, we need to read out a few MSRs on vm-exit and restore them on vm-entry. Right now I'm always using this approach as I'm pretty not sure how to optimize it without introducing additional bugs. I will show the implementation in patch v2.


> 
>> 
>> Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR load
>> lists and the rest of related MSRs being managed manually.
> 
> Yes, that' seems like a good approach.
> 
> Roger.
Roger Pau Monné June 18, 2020, 11:49 a.m. UTC | #12
On Thu, Jun 18, 2020 at 01:07:33PM +0200, Michał Leszczyński wrote:
> ----- 18 cze 2020 o 10:52, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
> > On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
> >> ----- 17 cze 2020 o 17:14, Andrew Cooper andrew.cooper3@citrix.com napisał(a):
> >> 
> >> > On 17/06/2020 13:51, Roger Pau Monné wrote:
> >> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
> >> >>> ----- 17 cze 2020 o 11:09, Roger Pau Monné roger.pau@citrix.com napisał(a):
> >> >>>
> >> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 24.8.1
> >> >>>> VM-Entry Controls
> >> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to determine
> >> >>>> how it should set the reserved bits.
> >> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
> >> >> I think this is something different from what I was referring to.
> >> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
> >> >> loading/storing a specific field on the vmcs that maps to the guest
> >> >> IA32_RTIT_CTL.
> >> >>
> >> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
> >> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
> >> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> >> > 
> >> > If I remember the historic roadmaps correctly, there are 3 cases.
> >> > 
> >> > The first hardware to support PT (Broadwell?) prohibited its use
> >> > completely in VMX operations.  In this case, we can use it to trace PV
> >> > guests iff we don't enable VMX in hardware to begin with.
> >> > 
> >> > This was relaxed in later hardware (Skylake?) to permit use within VMX
> >> > operations, but without any help in the VMCS.  (i.e. manual context
> >> > switching per this patch, or MSR load lists as noted in the SDM.)
> >> > 
> >> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
> >> > the load/save controls, and the ability to translate the output buffer
> >> > under EPT.
> >> > 
> >> > 
> >> > All of this is from memory so I'm quite possibly wrong with details, but
> >> > I believe this is why the current complexity exists.
> >> > 
> >> > ~Andrew
> >> 
> >> 
> >> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
> >> 
> >> > 35.5.2.2 Guest-Only Tracing
> >> > "For this usage, VM-entry is programmed to enable trace packet generation, while
> >> > VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable
> >> > trace-packet generation in the host."
> >> 
> >> it actually helped a bit. With patch v1 there were parts of hypervisor recorded
> >> in the trace (i.e. the moment between TRACE_EN being set and actual vmenter,
> >> and the moment between vmexit and TRACE_EN being unset). Using MSR load list
> >> this was eliminated. This change will be reflected in patch v2.
> >> 
> >> 
> >> I can't however implement any working scenario in which all these MSRs are
> >> managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets are
> >> buffered internally and are flushed only when TRACE_EN bit in MSR_IA32_RTIT_CTL
> >> is set to 0. The values of remaining registers will be stable after everything
> >> is serialized. I think this is too complex for the load lists alone. I belive
> >> that currently SDM instructs to use load lists only for toggling this single
> >> bit on-or-off.
> > 
> > I think that's exactly what we want: handling TraceEn at
> > vmentry/vmexit, so that no hypervisor packets are recorded. The rest
> > of the MSRs can be handled in VMM mode without issues. Switching those
> > on every vmentry/vmexit would also add more overhead that needed,
> > since I assume they don't need to be modified on every entry/exit?
> 
> 
> Assuming that there is a single DomU per pcpu and they are never migrated between pcpus then you never need to modify the remaining MSRs.
> 
> In case DomUs are floating or there are multiple DomUs per pcpu, we need to read out a few MSRs on vm-exit and restore them on vm-entry. Right now I'm always using this approach as I'm pretty not sure how to optimize it without introducing additional bugs. I will show the implementation in patch v2.

I think you might likely only need to modify the MSRs when doing
context switches of domains, instead of doing it on every
vmentry/vmexit?

Roger.
Andrew Cooper June 18, 2020, 5:38 p.m. UTC | #13
On 16/06/2020 16:24, Michał Leszczyński wrote:
> Enable IPT when entering the VM and disable it on vmexit.
> Register state is persisted using vCPU ipt_state structure.
>
> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 97104c319e..01d9a7b584 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3698,6 +3698,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>      __vmread(GUEST_RSP,    &regs->rsp);
>      __vmread(GUEST_RFLAGS, &regs->rflags);
>  
> +    if ( unlikely(v->arch.hvm.vmx.ipt_state) )
> +    {
> +        wrmsrl(MSR_IA32_RTIT_CTL, 0);
> +        smp_rmb();
> +
> +        rdmsrl(MSR_IA32_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
> +        rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, v->arch.hvm.vmx.ipt_state->output_mask);
> +    }
> +
>      hvm_invalidate_regs_fields(regs);
>  
>      if ( paging_mode_hap(v->domain) )
> @@ -4497,6 +4506,23 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      }
>  
>   out:
> +    if ( unlikely(curr->arch.hvm.vmx.ipt_state) )
> +    {
> +        wrmsrl(MSR_IA32_RTIT_CTL, 0);
> +
> +        if (curr->arch.hvm.vmx.ipt_state->ctl)
> +        {
> +            wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, curr->arch.hvm.vmx.ipt_state->output_base);
> +            wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, curr->arch.hvm.vmx.ipt_state->output_mask);
> +            wrmsrl(MSR_IA32_RTIT_STATUS, curr->arch.hvm.vmx.ipt_state->status);
> +
> +            // MSR_IA32_RTIT_CTL is context-switched manually instead of being
> +            // stored inside VMCS, as of Q2'20 only the most recent processors
> +            // support such field in VMCS
> +            wrmsrl(MSR_IA32_RTIT_CTL, curr->arch.hvm.vmx.ipt_state->ctl);
> +        }
> +    }
> +

Some notes to help with v2.

RTIT_CTL wants managing by MSR load/save list.  See how
vmx_update_guest_efer() manages MSR_EFER for the Gen1 hardware case,
because RTIT_CTL is very similar until we get to IceLake hardware and
have a GUEST_RTIT_CTRL field.

With RTIT_CTL handled by MSR load/save list, we are now certain that
TraceEn is always clear in hypervisor context, so there's no need to
explicitly zero it before playing with other MSRs.


You don't need to save/restore the values in vmentry/exit, because that
is very expensive an unnecessary.  Instead, you can use
vmx_ctxt_switch_{from,to}() which is based on when the vcpu is switched
in/out of context.

Specifically, from your current code, it looks to be safe to leave
RTIT_STATUS/OUTPUT_MASK dirty in hardware across multiple
vmentries/exits while the vcpu is in context.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 97104c319e..01d9a7b584 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3698,6 +3698,15 @@  void vmx_vmexit_handler(struct cpu_user_regs *regs)
     __vmread(GUEST_RSP,    &regs->rsp);
     __vmread(GUEST_RFLAGS, &regs->rflags);
 
+    if ( unlikely(v->arch.hvm.vmx.ipt_state) )
+    {
+        wrmsrl(MSR_IA32_RTIT_CTL, 0);
+        smp_rmb();
+
+        rdmsrl(MSR_IA32_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
+        rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, v->arch.hvm.vmx.ipt_state->output_mask);
+    }
+
     hvm_invalidate_regs_fields(regs);
 
     if ( paging_mode_hap(v->domain) )
@@ -4497,6 +4506,23 @@  bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
     }
 
  out:
+    if ( unlikely(curr->arch.hvm.vmx.ipt_state) )
+    {
+        wrmsrl(MSR_IA32_RTIT_CTL, 0);
+
+        if (curr->arch.hvm.vmx.ipt_state->ctl)
+        {
+            wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, curr->arch.hvm.vmx.ipt_state->output_base);
+            wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, curr->arch.hvm.vmx.ipt_state->output_mask);
+            wrmsrl(MSR_IA32_RTIT_STATUS, curr->arch.hvm.vmx.ipt_state->status);
+
+            // MSR_IA32_RTIT_CTL is context-switched manually instead of being
+            // stored inside VMCS, as of Q2'20 only the most recent processors
+            // support such field in VMCS
+            wrmsrl(MSR_IA32_RTIT_CTL, curr->arch.hvm.vmx.ipt_state->ctl);
+        }
+    }
+
     if ( unlikely(curr->arch.hvm.vmx.lbr_flags & LBR_FIXUP_MASK) )
         lbr_fixup();