diff mbox

[PATCHv3,1/3] x86/fpu: improve check for XSAVE* not writing FIP/FDP fields

Message ID 1456397884-661-2-git-send-email-david.vrabel@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

David Vrabel Feb. 25, 2016, 10:58 a.m. UTC
The hardware may not write the FIP/FDP fields with a XSAVE*
instruction.  e.g., with XSAVEOPT/XSAVES if the state hasn't changed
or on AMD CPUs when a floating point exception is not pending.  We
need to identify this case so we can correctly apply the check for
whether to save/restore FCS/FDS.

By poisoning FIP in the saved state we can check if the hardware
writes to this field.  The poison value is both: a) non-canonical; and
b) random with a vanishingly small probability of matching a value
written by the hardware (1 / (2^63) = 10^-19).

The poison value is fixed and thus knowable by a guest (or guest
userspace).  This could allow the guest to cause Xen to incorrectly
detect that the field has not been written.  But: a) this requires the
FIP register to be a full 64 bits internally which is not the case for
all current AMD and Intel CPUs; and b) this only allows the guest (or
a guest userspace process) to corrupt its own state (i.e., it cannot
affect the state of another guest or another user space process).

This results in smaller code with fewer branches and is more
understandable.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v3:
- Use a random (and still non-canonical) value to poison FIP.
---
 xen/arch/x86/xstate.c | 47 +++++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 30 deletions(-)

Comments

Jan Beulich Feb. 25, 2016, 11:32 a.m. UTC | #1
>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote:
> The hardware may not write the FIP/FDP fields with a XSAVE*
> instruction.  e.g., with XSAVEOPT/XSAVES if the state hasn't changed
> or on AMD CPUs when a floating point exception is not pending.  We
> need to identify this case so we can correctly apply the check for
> whether to save/restore FCS/FDS.
> 
> By poisoning FIP in the saved state we can check if the hardware
> writes to this field.  The poison value is both: a) non-canonical; and
> b) random with a vanishingly small probability of matching a value
> written by the hardware (1 / (2^63) = 10^-19).

The hardware by itself will always write a canonical value with
the 64-bit save variants. The case to consider really is, as said
before, that of software storing an arbitrary value there, and
for that case I don't think a how ever small probability would
make my concerns go away (or else I would have suggested
this variation of your previous approach during v2 review).

> The poison value is fixed and thus knowable by a guest (or guest
> userspace).  This could allow the guest to cause Xen to incorrectly
> detect that the field has not been written.  But: a) this requires the
> FIP register to be a full 64 bits internally which is not the case for
> all current AMD and Intel CPUs; and b) this only allows the guest (or
> a guest userspace process) to corrupt its own state (i.e., it cannot
> affect the state of another guest or another user space process).
> 
> This results in smaller code with fewer branches and is more
> understandable.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Pending confirmation on FIP register width by at least Intel,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
David Vrabel Feb. 25, 2016, 12:18 p.m. UTC | #2
On 25/02/16 11:32, Jan Beulich wrote:
>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote:
>> The hardware may not write the FIP/FDP fields with a XSAVE*
>> instruction.  e.g., with XSAVEOPT/XSAVES if the state hasn't changed
>> or on AMD CPUs when a floating point exception is not pending.  We
>> need to identify this case so we can correctly apply the check for
>> whether to save/restore FCS/FDS.
>>
>> By poisoning FIP in the saved state we can check if the hardware
>> writes to this field.  The poison value is both: a) non-canonical; and
>> b) random with a vanishingly small probability of matching a value
>> written by the hardware (1 / (2^63) = 10^-19).
> 
> The hardware by itself will always write a canonical value with
> the 64-bit save variants. The case to consider really is, as said
> before, that of software storing an arbitrary value there, and
> for that case I don't think a how ever small probability would
> make my concerns go away (or else I would have suggested
> this variation of your previous approach during v2 review).

Do you not appreciate how unlikely 10^-19 is?

Assuming a context switch every 1 ms the probability of a error in a
year is 3e-9.

The probability of a dinosaur killing asteroid strike in a year is about
2e-8.

I know which one I'd be worried about...

>> The poison value is fixed and thus knowable by a guest (or guest
>> userspace).  This could allow the guest to cause Xen to incorrectly
>> detect that the field has not been written.  But: a) this requires the
>> FIP register to be a full 64 bits internally which is not the case for
>> all current AMD and Intel CPUs; and b) this only allows the guest (or
>> a guest userspace process) to corrupt its own state (i.e., it cannot
>> affect the state of another guest or another user space process).
>>
>> This results in smaller code with fewer branches and is more
>> understandable.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Pending confirmation on FIP register width by at least Intel,
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and
thus we will always use the 64-bit save.

For AMD, which only writes FIP and FDP if an exception is pending, if a
guest wanted to use FIP to store an arbitrary 64-bit value (in some
future CPU) it would have to manually set an exception as pending.  Its
seems implausible that any software would actually do this.

David
Jan Beulich Feb. 25, 2016, 12:27 p.m. UTC | #3
>>> On 25.02.16 at 13:18, <david.vrabel@citrix.com> wrote:
> On 25/02/16 11:32, Jan Beulich wrote:
>>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote:
>>> The hardware may not write the FIP/FDP fields with a XSAVE*
>>> instruction.  e.g., with XSAVEOPT/XSAVES if the state hasn't changed
>>> or on AMD CPUs when a floating point exception is not pending.  We
>>> need to identify this case so we can correctly apply the check for
>>> whether to save/restore FCS/FDS.
>>>
>>> By poisoning FIP in the saved state we can check if the hardware
>>> writes to this field.  The poison value is both: a) non-canonical; and
>>> b) random with a vanishingly small probability of matching a value
>>> written by the hardware (1 / (2^63) = 10^-19).
>> 
>> The hardware by itself will always write a canonical value with
>> the 64-bit save variants. The case to consider really is, as said
>> before, that of software storing an arbitrary value there, and
>> for that case I don't think a how ever small probability would
>> make my concerns go away (or else I would have suggested
>> this variation of your previous approach during v2 review).
> 
> Do you not appreciate how unlikely 10^-19 is?
> 
> Assuming a context switch every 1 ms the probability of a error in a
> year is 3e-9.
> 
> The probability of a dinosaur killing asteroid strike in a year is about
> 2e-8.
> 
> I know which one I'd be worried about...

:-)

>>> The poison value is fixed and thus knowable by a guest (or guest
>>> userspace).  This could allow the guest to cause Xen to incorrectly
>>> detect that the field has not been written.  But: a) this requires the
>>> FIP register to be a full 64 bits internally which is not the case for
>>> all current AMD and Intel CPUs; and b) this only allows the guest (or
>>> a guest userspace process) to corrupt its own state (i.e., it cannot
>>> affect the state of another guest or another user space process).
>>>
>>> This results in smaller code with fewer branches and is more
>>> understandable.
>>>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> 
>> Pending confirmation on FIP register width by at least Intel,
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and
> thus we will always use the 64-bit save.

Has Intel told you (but not us), or is this just based on experiments
you did, or re-stating what I've found from experimenting?

> For AMD, which only writes FIP and FDP if an exception is pending, if a
> guest wanted to use FIP to store an arbitrary 64-bit value (in some
> future CPU) it would have to manually set an exception as pending.  Its
> seems implausible that any software would actually do this.

All of these uses of FIP/FDP are implausible, yet we're aiming at
correctly mimicking hardware behavior, allowing folks to even do
implausible things that work on bare hardware.

Jan
David Vrabel Feb. 25, 2016, 12:49 p.m. UTC | #4
On 25/02/16 12:27, Jan Beulich wrote:
>>>> On 25.02.16 at 13:18, <david.vrabel@citrix.com> wrote:
>> On 25/02/16 11:32, Jan Beulich wrote:
>>>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote:
>>>> The poison value is fixed and thus knowable by a guest (or guest
>>>> userspace).  This could allow the guest to cause Xen to incorrectly
>>>> detect that the field has not been written.  But: a) this requires the
>>>> FIP register to be a full 64 bits internally which is not the case for
>>>> all current AMD and Intel CPUs; and b) this only allows the guest (or
>>>> a guest userspace process) to corrupt its own state (i.e., it cannot
>>>> affect the state of another guest or another user space process).
>>>>
>>>> This results in smaller code with fewer branches and is more
>>>> understandable.
>>>>
>>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>>
>>> Pending confirmation on FIP register width by at least Intel,
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and
>> thus we will always use the 64-bit save.
> 
> Has Intel told you (but not us), or is this just based on experiments
> you did, or re-stating what I've found from experimenting?

I'm just restating things already mentioned in the various threads.

>> For AMD, which only writes FIP and FDP if an exception is pending, if a
>> guest wanted to use FIP to store an arbitrary 64-bit value (in some
>> future CPU) it would have to manually set an exception as pending.  Its
>> seems implausible that any software would actually do this.
> 
> All of these uses of FIP/FDP are implausible, yet we're aiming at
> correctly mimicking hardware behavior, allowing folks to even do
> implausible things that work on bare hardware.

I think:

a) On hardware with FPCSDS, we always do a 64-bit save/restore and thus
always match the hardware behaviour.

b) On hardware without FPCSDS we /cannot/ match the hardware behaviour.
 We must have some sort of heuristic to cover the common use cases.  The
existing heuristic is /already/ inadequate since Driver Verifier
confuses it. So IMO, making the heuristic a teeny, tiny bit less precise
doesn't matter.

c) For the uncommon use cases, there is always HVM_PARAM_X87_FIP_WIDTH
to force a particular behaviour.

David
Andrew Cooper Feb. 25, 2016, 1:16 p.m. UTC | #5
On 25/02/16 12:49, David Vrabel wrote:
> On 25/02/16 12:27, Jan Beulich wrote:
>>>>> On 25.02.16 at 13:18, <david.vrabel@citrix.com> wrote:
>>> On 25/02/16 11:32, Jan Beulich wrote:
>>>>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote:
>>>>> The poison value is fixed and thus knowable by a guest (or guest
>>>>> userspace).  This could allow the guest to cause Xen to incorrectly
>>>>> detect that the field has not been written.  But: a) this requires the
>>>>> FIP register to be a full 64 bits internally which is not the case for
>>>>> all current AMD and Intel CPUs; and b) this only allows the guest (or
>>>>> a guest userspace process) to corrupt its own state (i.e., it cannot
>>>>> affect the state of another guest or another user space process).
>>>>>
>>>>> This results in smaller code with fewer branches and is more
>>>>> understandable.
>>>>>
>>>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>>> Pending confirmation on FIP register width by at least Intel,
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and
>>> thus we will always use the 64-bit save.
>> Has Intel told you (but not us), or is this just based on experiments
>> you did, or re-stating what I've found from experimenting?
> I'm just restating things already mentioned in the various threads.
>
>>> For AMD, which only writes FIP and FDP if an exception is pending, if a
>>> guest wanted to use FIP to store an arbitrary 64-bit value (in some
>>> future CPU) it would have to manually set an exception as pending.  Its
>>> seems implausible that any software would actually do this.
>> All of these uses of FIP/FDP are implausible, yet we're aiming at
>> correctly mimicking hardware behavior, allowing folks to even do
>> implausible things that work on bare hardware.
> I think:
>
> a) On hardware with FPCSDS, we always do a 64-bit save/restore and thus
> always match the hardware behaviour.
>
> b) On hardware without FPCSDS we /cannot/ match the hardware behaviour.
>  We must have some sort of heuristic to cover the common use cases.  The
> existing heuristic is /already/ inadequate since Driver Verifier
> confuses it. So IMO, making the heuristic a teeny, tiny bit less precise
> doesn't matter.
>
> c) For the uncommon use cases, there is always HVM_PARAM_X87_FIP_WIDTH
> to force a particular behaviour.

No OS is plausibly going to hide non-IP information in FIP.

If some theoretical OS does do something like that, there is always the
override available.

~Andrew
Jan Beulich Feb. 25, 2016, 2:27 p.m. UTC | #6
>>> On 25.02.16 at 14:16, <andrew.cooper3@citrix.com> wrote:
> No OS is plausibly going to hide non-IP information in FIP.
> 
> If some theoretical OS does do something like that, there is always the
> override available.

But who says it needs to be an OS to do so? User mode code could
as much (and would imo even be more likely to do so, compared to
an OS).

Jan
Andrew Cooper Feb. 25, 2016, 3:07 p.m. UTC | #7
On 25/02/16 14:27, Jan Beulich wrote:
>>>> On 25.02.16 at 14:16, <andrew.cooper3@citrix.com> wrote:
>> No OS is plausibly going to hide non-IP information in FIP.
>>
>> If some theoretical OS does do something like that, there is always the
>> override available.
> But who says it needs to be an OS to do so? User mode code could
> as much (and would imo even be more likely to do so, compared to
> an OS).

Why does this matter?  We *cannot* make a perfect heuristic in this case.

The new heuristic is wrong substantially less often than the old
heuristic, so is an improvement.

~Andrew
David Vrabel Feb. 25, 2016, 3:09 p.m. UTC | #8
On 25/02/16 15:07, Andrew Cooper wrote:
> On 25/02/16 14:27, Jan Beulich wrote:
>>>>> On 25.02.16 at 14:16, <andrew.cooper3@citrix.com> wrote:
>>> No OS is plausibly going to hide non-IP information in FIP.
>>>
>>> If some theoretical OS does do something like that, there is always the
>>> override available.
>> But who says it needs to be an OS to do so? User mode code could
>> as much (and would imo even be more likely to do so, compared to
>> an OS).
> 
> Why does this matter?  We *cannot* make a perfect heuristic in this case.
> 
> The new heuristic is wrong substantially less often than the old
> heuristic, so is an improvement.

Well, no.  The current one is wrong 0.01% of the time[1], and is now
wrong 0.0100000000000000001% of the time.  I would suggest that this
isn't a significant different.

Anyway, since it was trivial to reorder the series, I'm posting a v4
with this controversial patch at the end.

David

[1] Based on the number of XenServer test cases it breaks.
Tian, Kevin March 1, 2016, 6:27 a.m. UTC | #9
> From: Jan Beulich

> Sent: Thursday, February 25, 2016 8:28 PM

> 

> >>

> >> Pending confirmation on FIP register width by at least Intel,

> >> Reviewed-by: Jan Beulich <jbeulich@suse.com>

> >

> > For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and

> > thus we will always use the 64-bit save.

> 

> Has Intel told you (but not us), or is this just based on experiments

> you did, or re-stating what I've found from experimenting?

> 


Still wait for an internal clarification... but could I think this
question is more for older CPUs which don't claim FPCSDS
since otherwise no such trick is required with 64bit save/restore?

Thanks
Kevin
Jan Beulich March 1, 2016, 9:31 a.m. UTC | #10
>>> On 01.03.16 at 07:27, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich
>> Sent: Thursday, February 25, 2016 8:28 PM
>> 
>> >>
>> >> Pending confirmation on FIP register width by at least Intel,
>> >> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> >
>> > For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and
>> > thus we will always use the 64-bit save.
>> 
>> Has Intel told you (but not us), or is this just based on experiments
>> you did, or re-stating what I've found from experimenting?
>> 
> 
> Still wait for an internal clarification... but could I think this
> question is more for older CPUs which don't claim FPCSDS
> since otherwise no such trick is required with 64bit save/restore?

Yes, the question regards just CPUs storing non-zero selector
values. Yet with that functionality being dropped having an
impact on migration of Windows guests, I'm not actually sure you
can retain the decision to no longer store these selectors without
the OS (or hypervisor in our case) explicitly stating its consent
(via some control register bit).

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4f2fb8e..6a917f8 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -263,41 +263,28 @@  void xsave(struct vcpu *v, uint64_t mask)
 
     if ( word_size <= 0 || !is_pv_32bit_vcpu(v) )
     {
-        typeof(ptr->fpu_sse.fip.sel) fcs = ptr->fpu_sse.fip.sel;
-        typeof(ptr->fpu_sse.fdp.sel) fds = ptr->fpu_sse.fdp.sel;
+        uint64_t orig_fip;
+        static const uint64_t bad_fip = 0x6a3f5c4b13a533f6;
 
-        if ( cpu_has_xsaveopt || cpu_has_xsaves )
-        {
-            /*
-             * XSAVEOPT/XSAVES may not write the FPU portion even when the
-             * respective mask bit is set. For the check further down to work
-             * we hence need to put the save image back into the state that
-             * it was in right after the previous XSAVEOPT.
-             */
-            if ( word_size > 0 &&
-                 (ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] == 4 ||
-                  ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] == 2) )
-            {
-                ptr->fpu_sse.fip.sel = 0;
-                ptr->fpu_sse.fdp.sel = 0;
-            }
-        }
+        /*
+         * FIP/FDP may not be written in some cases (e.g., if
+         * XSAVEOPT/XSAVES is used, or on AMD CPUs if an exception
+         * isn't pending).
+         *
+         * To tell if the hardware writes these fields, poison the FIP
+         * field.  The poison is both a) non-canonical; and b) random
+         * with a vanishingly small probability to match a value the
+         * hardware may write (1e-19).
+         */
+        orig_fip = ptr->fpu_sse.fip.addr;
+        ptr->fpu_sse.fip.addr = bad_fip;
 
         XSAVE("0x48,");
 
-        if ( !(mask & ptr->xsave_hdr.xstate_bv & XSTATE_FP) ||
-             /*
-              * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
-              * is pending.
-              */
-             (!(ptr->fpu_sse.fsw & 0x0080) &&
-              boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+        /* FIP/FDP not updated? Restore the old FIP value. */
+        if ( ptr->fpu_sse.fip.addr == bad_fip )
         {
-            if ( (cpu_has_xsaveopt || cpu_has_xsaves) && word_size > 0 )
-            {
-                ptr->fpu_sse.fip.sel = fcs;
-                ptr->fpu_sse.fdp.sel = fds;
-            }
+            ptr->fpu_sse.fip.addr = orig_fip;
             return;
         }