Message ID | 1456397884-661-2-git-send-email-david.vrabel@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote: > The hardware may not write the FIP/FDP fields with a XSAVE* > instruction. e.g., with XSAVEOPT/XSAVES if the state hasn't changed > or on AMD CPUs when a floating point exception is not pending. We > need to identify this case so we can correctly apply the check for > whether to save/restore FCS/FDS. > > By poisoning FIP in the saved state we can check if the hardware > writes to this field. The poison value is both: a) non-canonical; and > b) random with a vanishingly small probability of matching a value > written by the hardware (1 / (2^63) = 10^-19). The hardware by itself will always write a canonical value with the 64-bit save variants. The case to consider really is, as said before, that of software storing an arbitrary value there, and for that case I don't think a how ever small probability would make my concerns go away (or else I would have suggested this variation of your previous approach during v2 review). > The poison value is fixed and thus knowable by a guest (or guest > userspace). This could allow the guest to cause Xen to incorrectly > detect that the field has not been written. But: a) this requires the > FIP register to be a full 64 bits internally which is not the case for > all current AMD and Intel CPUs; and b) this only allows the guest (or > a guest userspace process) to corrupt its own state (i.e., it cannot > affect the state of another guest or another user space process). > > This results in smaller code with fewer branches and is more > understandable. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> Pending confirmation on FIP register width by at least Intel, Reviewed-by: Jan Beulich <jbeulich@suse.com> Jan
On 25/02/16 11:32, Jan Beulich wrote: >>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote: >> The hardware may not write the FIP/FDP fields with a XSAVE* >> instruction. e.g., with XSAVEOPT/XSAVES if the state hasn't changed >> or on AMD CPUs when a floating point exception is not pending. We >> need to identify this case so we can correctly apply the check for >> whether to save/restore FCS/FDS. >> >> By poisoning FIP in the saved state we can check if the hardware >> writes to this field. The poison value is both: a) non-canonical; and >> b) random with a vanishingly small probability of matching a value >> written by the hardware (1 / (2^63) = 10^-19). > > The hardware by itself will always write a canonical value with > the 64-bit save variants. The case to consider really is, as said > before, that of software storing an arbitrary value there, and > for that case I don't think a how ever small probability would > make my concerns go away (or else I would have suggested > this variation of your previous approach during v2 review). Do you not appreciate how unlikely 10^-19 is? Assuming a context switch every 1 ms the probability of a error in a year is 3e-9. The probability of a dinosaur killing asteroid strike in a year is about 2e-8. I know which one I'd be worried about... >> The poison value is fixed and thus knowable by a guest (or guest >> userspace). This could allow the guest to cause Xen to incorrectly >> detect that the field has not been written. But: a) this requires the >> FIP register to be a full 64 bits internally which is not the case for >> all current AMD and Intel CPUs; and b) this only allows the guest (or >> a guest userspace process) to corrupt its own state (i.e., it cannot >> affect the state of another guest or another user space process). >> >> This results in smaller code with fewer branches and is more >> understandable. >> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com> > > Pending confirmation on FIP register width by at least Intel, > Reviewed-by: Jan Beulich <jbeulich@suse.com> For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and thus we will always use the 64-bit save. For AMD, which only writes FIP and FDP if an exception is pending, if a guest wanted to use FIP to store an arbitrary 64-bit value (in some future CPU) it would have to manually set an exception as pending. Its seems implausible that any software would actually do this. David
>>> On 25.02.16 at 13:18, <david.vrabel@citrix.com> wrote: > On 25/02/16 11:32, Jan Beulich wrote: >>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote: >>> The hardware may not write the FIP/FDP fields with a XSAVE* >>> instruction. e.g., with XSAVEOPT/XSAVES if the state hasn't changed >>> or on AMD CPUs when a floating point exception is not pending. We >>> need to identify this case so we can correctly apply the check for >>> whether to save/restore FCS/FDS. >>> >>> By poisoning FIP in the saved state we can check if the hardware >>> writes to this field. The poison value is both: a) non-canonical; and >>> b) random with a vanishingly small probability of matching a value >>> written by the hardware (1 / (2^63) = 10^-19). >> >> The hardware by itself will always write a canonical value with >> the 64-bit save variants. The case to consider really is, as said >> before, that of software storing an arbitrary value there, and >> for that case I don't think a how ever small probability would >> make my concerns go away (or else I would have suggested >> this variation of your previous approach during v2 review). > > Do you not appreciate how unlikely 10^-19 is? > > Assuming a context switch every 1 ms the probability of a error in a > year is 3e-9. > > The probability of a dinosaur killing asteroid strike in a year is about > 2e-8. > > I know which one I'd be worried about... :-) >>> The poison value is fixed and thus knowable by a guest (or guest >>> userspace). This could allow the guest to cause Xen to incorrectly >>> detect that the field has not been written. But: a) this requires the >>> FIP register to be a full 64 bits internally which is not the case for >>> all current AMD and Intel CPUs; and b) this only allows the guest (or >>> a guest userspace process) to corrupt its own state (i.e., it cannot >>> affect the state of another guest or another user space process). >>> >>> This results in smaller code with fewer branches and is more >>> understandable. >>> >>> Signed-off-by: David Vrabel <david.vrabel@citrix.com> >> >> Pending confirmation on FIP register width by at least Intel, >> Reviewed-by: Jan Beulich <jbeulich@suse.com> > > For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and > thus we will always use the 64-bit save. Has Intel told you (but not us), or is this just based on experiments you did, or re-stating what I've found from experimenting? > For AMD, which only writes FIP and FDP if an exception is pending, if a > guest wanted to use FIP to store an arbitrary 64-bit value (in some > future CPU) it would have to manually set an exception as pending. Its > seems implausible that any software would actually do this. All of these uses of FIP/FDP are implausible, yet we're aiming at correctly mimicking hardware behavior, allowing folks to even do implausible things that work on bare hardware. Jan
On 25/02/16 12:27, Jan Beulich wrote: >>>> On 25.02.16 at 13:18, <david.vrabel@citrix.com> wrote: >> On 25/02/16 11:32, Jan Beulich wrote: >>>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote: >>>> The poison value is fixed and thus knowable by a guest (or guest >>>> userspace). This could allow the guest to cause Xen to incorrectly >>>> detect that the field has not been written. But: a) this requires the >>>> FIP register to be a full 64 bits internally which is not the case for >>>> all current AMD and Intel CPUs; and b) this only allows the guest (or >>>> a guest userspace process) to corrupt its own state (i.e., it cannot >>>> affect the state of another guest or another user space process). >>>> >>>> This results in smaller code with fewer branches and is more >>>> understandable. >>>> >>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com> >>> >>> Pending confirmation on FIP register width by at least Intel, >>> Reviewed-by: Jan Beulich <jbeulich@suse.com> >> >> For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and >> thus we will always use the 64-bit save. > > Has Intel told you (but not us), or is this just based on experiments > you did, or re-stating what I've found from experimenting? I'm just restating things already mentioned in the various threads. >> For AMD, which only writes FIP and FDP if an exception is pending, if a >> guest wanted to use FIP to store an arbitrary 64-bit value (in some >> future CPU) it would have to manually set an exception as pending. Its >> seems implausible that any software would actually do this. > > All of these uses of FIP/FDP are implausible, yet we're aiming at > correctly mimicking hardware behavior, allowing folks to even do > implausible things that work on bare hardware. I think: a) On hardware with FPCSDS, we always do a 64-bit save/restore and thus always match the hardware behaviour. b) On hardware without FPCSDS we /cannot/ match the hardware behaviour. We must have some sort of heuristic to cover the common use cases. The existing heuristic is /already/ inadequate since Driver Verifier confuses it. So IMO, making the heuristic a teeny, tiny bit less precise doesn't matter. c) For the uncommon use cases, there is always HVM_PARAM_X87_FIP_WIDTH to force a particular behaviour. David
On 25/02/16 12:49, David Vrabel wrote: > On 25/02/16 12:27, Jan Beulich wrote: >>>>> On 25.02.16 at 13:18, <david.vrabel@citrix.com> wrote: >>> On 25/02/16 11:32, Jan Beulich wrote: >>>>>>> On 25.02.16 at 11:58, <david.vrabel@citrix.com> wrote: >>>>> The poison value is fixed and thus knowable by a guest (or guest >>>>> userspace). This could allow the guest to cause Xen to incorrectly >>>>> detect that the field has not been written. But: a) this requires the >>>>> FIP register to be a full 64 bits internally which is not the case for >>>>> all current AMD and Intel CPUs; and b) this only allows the guest (or >>>>> a guest userspace process) to corrupt its own state (i.e., it cannot >>>>> affect the state of another guest or another user space process). >>>>> >>>>> This results in smaller code with fewer branches and is more >>>>> understandable. >>>>> >>>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com> >>>> Pending confirmation on FIP register width by at least Intel, >>>> Reviewed-by: Jan Beulich <jbeulich@suse.com> >>> For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and >>> thus we will always use the 64-bit save. >> Has Intel told you (but not us), or is this just based on experiments >> you did, or re-stating what I've found from experimenting? > I'm just restating things already mentioned in the various threads. > >>> For AMD, which only writes FIP and FDP if an exception is pending, if a >>> guest wanted to use FIP to store an arbitrary 64-bit value (in some >>> future CPU) it would have to manually set an exception as pending. Its >>> seems implausible that any software would actually do this. >> All of these uses of FIP/FDP are implausible, yet we're aiming at >> correctly mimicking hardware behavior, allowing folks to even do >> implausible things that work on bare hardware. > I think: > > a) On hardware with FPCSDS, we always do a 64-bit save/restore and thus > always match the hardware behaviour. > > b) On hardware without FPCSDS we /cannot/ match the hardware behaviour. > We must have some sort of heuristic to cover the common use cases. The > existing heuristic is /already/ inadequate since Driver Verifier > confuses it. So IMO, making the heuristic a teeny, tiny bit less precise > doesn't matter. > > c) For the uncommon use cases, there is always HVM_PARAM_X87_FIP_WIDTH > to force a particular behaviour. No OS is plausibly going to hide non-IP information in FIP. If some theoretical OS does do something like that, there is always the override available. ~Andrew
>>> On 25.02.16 at 14:16, <andrew.cooper3@citrix.com> wrote: > No OS is plausibly going to hide non-IP information in FIP. > > If some theoretical OS does do something like that, there is always the > override available. But who says it needs to be an OS to do so? User mode code could as much (and would imo even be more likely to do so, compared to an OS). Jan
On 25/02/16 14:27, Jan Beulich wrote: >>>> On 25.02.16 at 14:16, <andrew.cooper3@citrix.com> wrote: >> No OS is plausibly going to hide non-IP information in FIP. >> >> If some theoretical OS does do something like that, there is always the >> override available. > But who says it needs to be an OS to do so? User mode code could > as much (and would imo even be more likely to do so, compared to > an OS). Why does this matter? We *cannot* make a perfect heuristic in this case. The new heuristic is wrong substantially less often than the old heuristic, so is an improvement. ~Andrew
On 25/02/16 15:07, Andrew Cooper wrote: > On 25/02/16 14:27, Jan Beulich wrote: >>>>> On 25.02.16 at 14:16, <andrew.cooper3@citrix.com> wrote: >>> No OS is plausibly going to hide non-IP information in FIP. >>> >>> If some theoretical OS does do something like that, there is always the >>> override available. >> But who says it needs to be an OS to do so? User mode code could >> as much (and would imo even be more likely to do so, compared to >> an OS). > > Why does this matter? We *cannot* make a perfect heuristic in this case. > > The new heuristic is wrong substantially less often than the old > heuristic, so is an improvement. Well, no. The current one is wrong 0.01% of the time[1], and is now wrong 0.0100000000000000001% of the time. I would suggest that this isn't a significant different. Anyway, since it was trivial to reorder the series, I'm posting a v4 with this controversial patch at the end. David [1] Based on the number of XenServer test cases it breaks.
> From: Jan Beulich > Sent: Thursday, February 25, 2016 8:28 PM > > >> > >> Pending confirmation on FIP register width by at least Intel, > >> Reviewed-by: Jan Beulich <jbeulich@suse.com> > > > > For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and > > thus we will always use the 64-bit save. > > Has Intel told you (but not us), or is this just based on experiments > you did, or re-stating what I've found from experimenting? > Still wait for an internal clarification... but could I think this question is more for older CPUs which don't claim FPCSDS since otherwise no such trick is required with 64bit save/restore? Thanks Kevin
>>> On 01.03.16 at 07:27, <kevin.tian@intel.com> wrote: >> From: Jan Beulich >> Sent: Thursday, February 25, 2016 8:28 PM >> >> >> >> >> Pending confirmation on FIP register width by at least Intel, >> >> Reviewed-by: Jan Beulich <jbeulich@suse.com> >> > >> > For Intel CPUs, FIP is 48-bits internally and newer CPUs have FPCSDS and >> > thus we will always use the 64-bit save. >> >> Has Intel told you (but not us), or is this just based on experiments >> you did, or re-stating what I've found from experimenting? >> > > Still wait for an internal clarification... but could I think this > question is more for older CPUs which don't claim FPCSDS > since otherwise no such trick is required with 64bit save/restore? Yes, the question regards just CPUs storing non-zero selector values. Yet with that functionality being dropped having an impact on migration of Windows guests, I'm not actually sure you can retain the decision to no longer store these selectors without the OS (or hypervisor in our case) explicitly stating its consent (via some control register bit). Jan
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index 4f2fb8e..6a917f8 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -263,41 +263,28 @@ void xsave(struct vcpu *v, uint64_t mask) if ( word_size <= 0 || !is_pv_32bit_vcpu(v) ) { - typeof(ptr->fpu_sse.fip.sel) fcs = ptr->fpu_sse.fip.sel; - typeof(ptr->fpu_sse.fdp.sel) fds = ptr->fpu_sse.fdp.sel; + uint64_t orig_fip; + static const uint64_t bad_fip = 0x6a3f5c4b13a533f6; - if ( cpu_has_xsaveopt || cpu_has_xsaves ) - { - /* - * XSAVEOPT/XSAVES may not write the FPU portion even when the - * respective mask bit is set. For the check further down to work - * we hence need to put the save image back into the state that - * it was in right after the previous XSAVEOPT. - */ - if ( word_size > 0 && - (ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] == 4 || - ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] == 2) ) - { - ptr->fpu_sse.fip.sel = 0; - ptr->fpu_sse.fdp.sel = 0; - } - } + /* + * FIP/FDP may not be written in some cases (e.g., if + * XSAVEOPT/XSAVES is used, or on AMD CPUs if an exception + * isn't pending). + * + * To tell if the hardware writes these fields, poison the FIP + * field. The poison is both a) non-canonical; and b) random + * with a vanishingly small probability to match a value the + * hardware may write (1e-19). + */ + orig_fip = ptr->fpu_sse.fip.addr; + ptr->fpu_sse.fip.addr = bad_fip; XSAVE("0x48,"); - if ( !(mask & ptr->xsave_hdr.xstate_bv & XSTATE_FP) || - /* - * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception - * is pending. - */ - (!(ptr->fpu_sse.fsw & 0x0080) && - boot_cpu_data.x86_vendor == X86_VENDOR_AMD) ) + /* FIP/FDP not updated? Restore the old FIP value. */ + if ( ptr->fpu_sse.fip.addr == bad_fip ) { - if ( (cpu_has_xsaveopt || cpu_has_xsaves) && word_size > 0 ) - { - ptr->fpu_sse.fip.sel = fcs; - ptr->fpu_sse.fdp.sel = fds; - } + ptr->fpu_sse.fip.addr = orig_fip; return; }
The hardware may not write the FIP/FDP fields with a XSAVE* instruction. e.g., with XSAVEOPT/XSAVES if the state hasn't changed or on AMD CPUs when a floating point exception is not pending. We need to identify this case so we can correctly apply the check for whether to save/restore FCS/FDS. By poisoning FIP in the saved state we can check if the hardware writes to this field. The poison value is both: a) non-canonical; and b) random with a vanishingly small probability of matching a value written by the hardware (1 / (2^63) = 10^-19). The poison value is fixed and thus knowable by a guest (or guest userspace). This could allow the guest to cause Xen to incorrectly detect that the field has not been written. But: a) this requires the FIP register to be a full 64 bits internally which is not the case for all current AMD and Intel CPUs; and b) this only allows the guest (or a guest userspace process) to corrupt its own state (i.e., it cannot affect the state of another guest or another user space process). This results in smaller code with fewer branches and is more understandable. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- v3: - Use a random (and still non-canonical) value to poison FIP. --- xen/arch/x86/xstate.c | 47 +++++++++++++++++------------------------------ 1 file changed, 17 insertions(+), 30 deletions(-)