diff mbox series

x86: refine guest_mode()

Message ID 7b62d06c-1369-2857-81c0-45e2434357f4@suse.com (mailing list archive)
State Superseded
Headers show
Series x86: refine guest_mode() | expand

Commit Message

Jan Beulich April 27, 2020, 8:03 a.m. UTC
The 2nd of the assertions as well as the macro's return value have been
assuming we're on the primary stack. While for most IST exceptions we
eventually switch back to the main one, for #DF we intentionally never
do, and hence a #DF actually triggering on a user mode insn (which then
is still a Xen bug) would in turn trigger this assertion, rather than
cleanly logging state.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While we could go further and also assert we're on the correct IST
stack in an "else" ti the "if()" added, I'm not fully convinced this
would be generally helpful. I'll be happy to adjust accordingly if
others think differently; at such a point though I think this should
then no longer be a macro.

Comments

Roger Pau Monné April 27, 2020, 9:59 a.m. UTC | #1
On Mon, Apr 27, 2020 at 10:03:05AM +0200, Jan Beulich wrote:
> The 2nd of the assertions as well as the macro's return value have been
> assuming we're on the primary stack. While for most IST exceptions we
> eventually switch back to the main one, for #DF we intentionally never
> do, and hence a #DF actually triggering on a user mode insn (which then
> is still a Xen bug) would in turn trigger this assertion, rather than
> cleanly logging state.
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> While we could go further and also assert we're on the correct IST
> stack in an "else" ti the "if()" added, I'm not fully convinced this
> would be generally helpful. I'll be happy to adjust accordingly if
> others think differently; at such a point though I think this should
> then no longer be a macro.
> 
> --- a/xen/include/asm-x86/regs.h
> +++ b/xen/include/asm-x86/regs.h
> @@ -10,9 +10,10 @@
>      /* Frame pointer must point into current CPU stack. */                    \
>      ASSERT(diff < STACK_SIZE);                                                \
>      /* If not a guest frame, it must be a hypervisor frame. */                \
> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \

Why not use:

ASSERT(diff >= PRIMARY_STACK_SIZE || !diff || ((r)->cs == __HYPERVISOR_CS));

I'm not sure I fully understand this layout, is it possible that you
also need to account for the size of cpu_info?

Roger.
Jan Beulich April 27, 2020, 2:08 p.m. UTC | #2
On 27.04.2020 11:59, Roger Pau Monné wrote:
> On Mon, Apr 27, 2020 at 10:03:05AM +0200, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/regs.h
>> +++ b/xen/include/asm-x86/regs.h
>> @@ -10,9 +10,10 @@
>>      /* Frame pointer must point into current CPU stack. */                    \
>>      ASSERT(diff < STACK_SIZE);                                                \
>>      /* If not a guest frame, it must be a hypervisor frame. */                \
>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
> 
> Why not use:
> 
> ASSERT(diff >= PRIMARY_STACK_SIZE || !diff || ((r)->cs == __HYPERVISOR_CS));

Except for the longer (without being helpful imo) string reported if
the assertion triggers, I see not difference.

> I'm not sure I fully understand this layout, is it possible that you
> also need to account for the size of cpu_info?

Depends on how paranoid we want the checking here to be, but in going
further I wouldn't want this to become sub-page fine-grained if we
aren't first doing e.g. what I'm mentioning in the post-commit-message
remark.

Jan
Andrew Cooper April 27, 2020, 2:35 p.m. UTC | #3
On 27/04/2020 09:03, Jan Beulich wrote:
> The 2nd of the assertions as well as the macro's return value have been
> assuming we're on the primary stack. While for most IST exceptions we
> eventually switch back to the main one,

"we switch to the main one when interrupting user mode".

"eventually" isn't accurate as it is before we enter C.

>  for #DF we intentionally never
> do, and hence a #DF actually triggering on a user mode insn (which then
> is still a Xen bug) would in turn trigger this assertion, rather than
> cleanly logging state.
>
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> While we could go further and also assert we're on the correct IST
> stack in an "else" ti the "if()" added, I'm not fully convinced this
> would be generally helpful. I'll be happy to adjust accordingly if
> others think differently; at such a point though I think this should
> then no longer be a macro.
>
> --- a/xen/include/asm-x86/regs.h
> +++ b/xen/include/asm-x86/regs.h
> @@ -10,9 +10,10 @@
>      /* Frame pointer must point into current CPU stack. */                    \
>      ASSERT(diff < STACK_SIZE);                                                \
>      /* If not a guest frame, it must be a hypervisor frame. */                \
> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
>      /* Return TRUE if it's a guest frame. */                                  \
> -    (diff == 0);                                                              \
> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \

The (diff == 0) already worried me before because it doesn't fail safe,
but this makes things more problematic.  Consider the case back when we
had __HYPERVISOR_CS32.

Guest mode is strictly "(r)->cs & 3".

Everything else is expectations about how things ought to be laid out,
but for safety in release builds, the final judgement should not depend
on the expectations evaluating true.

~Andrew
Jan Beulich April 27, 2020, 3:15 p.m. UTC | #4
On 27.04.2020 16:35, Andrew Cooper wrote:
> On 27/04/2020 09:03, Jan Beulich wrote:
>> The 2nd of the assertions as well as the macro's return value have been
>> assuming we're on the primary stack. While for most IST exceptions we
>> eventually switch back to the main one,
> 
> "we switch to the main one when interrupting user mode".
> 
> "eventually" isn't accurate as it is before we enter C.

Right, will change.

>> --- a/xen/include/asm-x86/regs.h
>> +++ b/xen/include/asm-x86/regs.h
>> @@ -10,9 +10,10 @@
>>      /* Frame pointer must point into current CPU stack. */                    \
>>      ASSERT(diff < STACK_SIZE);                                                \
>>      /* If not a guest frame, it must be a hypervisor frame. */                \
>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
>>      /* Return TRUE if it's a guest frame. */                                  \
>> -    (diff == 0);                                                              \
>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
> 
> The (diff == 0) already worried me before because it doesn't fail safe,
> but this makes things more problematic.  Consider the case back when we
> had __HYPERVISOR_CS32.

Yes - if __HYPERVISOR_CS32 would ever have been to be used for
anything, it would have needed checking for here.

> Guest mode is strictly "(r)->cs & 3".

As long as CS (a) gets properly saved (it's a "manual" step for
SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
didn't write this code, I don't think, so I can only guess that
there were intentions behind this along these lines.

> Everything else is expectations about how things ought to be laid out,
> but for safety in release builds, the final judgement should not depend
> on the expectations evaluating true.

Well, I can switch to a purely CS.RPL based approach, as long as
we're happy to live with the possible downside mentioned above.
Of course this would then end up being a more intrusive change
than originally intended ...

Jan
Roger Pau Monné April 27, 2020, 4 p.m. UTC | #5
On Mon, Apr 27, 2020 at 04:08:59PM +0200, Jan Beulich wrote:
> On 27.04.2020 11:59, Roger Pau Monné wrote:
> > On Mon, Apr 27, 2020 at 10:03:05AM +0200, Jan Beulich wrote:
> >> --- a/xen/include/asm-x86/regs.h
> >> +++ b/xen/include/asm-x86/regs.h
> >> @@ -10,9 +10,10 @@
> >>      /* Frame pointer must point into current CPU stack. */                    \
> >>      ASSERT(diff < STACK_SIZE);                                                \
> >>      /* If not a guest frame, it must be a hypervisor frame. */                \
> >> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
> >> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
> >> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
> > 
> > Why not use:
> > 
> > ASSERT(diff >= PRIMARY_STACK_SIZE || !diff || ((r)->cs == __HYPERVISOR_CS));
> 
> Except for the longer (without being helpful imo) string reported if
> the assertion triggers, I see not difference.

Wanted to avoid the empty if on non-debug builds, but I assume the
compiler will already optimize it out.

> > I'm not sure I fully understand this layout, is it possible that you
> > also need to account for the size of cpu_info?
> 
> Depends on how paranoid we want the checking here to be, but in going
> further I wouldn't want this to become sub-page fine-grained if we
> aren't first doing e.g. what I'm mentioning in the post-commit-message
> remark.

Right, leaving it as-is is fine, just wanted to be sure I fully
understand the layout.

Thanks, Roger.
Andrew Cooper April 27, 2020, 8:11 p.m. UTC | #6
On 27/04/2020 16:15, Jan Beulich wrote:
> On 27.04.2020 16:35, Andrew Cooper wrote:
>> On 27/04/2020 09:03, Jan Beulich wrote:
>>> The 2nd of the assertions as well as the macro's return value have been
>>> assuming we're on the primary stack. While for most IST exceptions we
>>> eventually switch back to the main one,
>> "we switch to the main one when interrupting user mode".
>>
>> "eventually" isn't accurate as it is before we enter C.
> Right, will change.
>
>>> --- a/xen/include/asm-x86/regs.h
>>> +++ b/xen/include/asm-x86/regs.h
>>> @@ -10,9 +10,10 @@
>>>      /* Frame pointer must point into current CPU stack. */                    \
>>>      ASSERT(diff < STACK_SIZE);                                                \
>>>      /* If not a guest frame, it must be a hypervisor frame. */                \
>>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
>>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
>>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
>>>      /* Return TRUE if it's a guest frame. */                                  \
>>> -    (diff == 0);                                                              \
>>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
>> The (diff == 0) already worried me before because it doesn't fail safe,
>> but this makes things more problematic.  Consider the case back when we
>> had __HYPERVISOR_CS32.
> Yes - if __HYPERVISOR_CS32 would ever have been to be used for
> anything, it would have needed checking for here.
>
>> Guest mode is strictly "(r)->cs & 3".
> As long as CS (a) gets properly saved (it's a "manual" step for
> SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
> didn't write this code, I don't think, so I can only guess that
> there were intentions behind this along these lines.

Hmm - the VMExit case might be problematic here, due to the variability
in the poison used.

>
>> Everything else is expectations about how things ought to be laid out,
>> but for safety in release builds, the final judgement should not depend
>> on the expectations evaluating true.
> Well, I can switch to a purely CS.RPL based approach, as long as
> we're happy to live with the possible downside mentioned above.
> Of course this would then end up being a more intrusive change
> than originally intended ...

I'd certainly prefer to go for something which is more robust, even if
it is a larger change.

~Andrew
Jan Beulich April 28, 2020, 6:30 a.m. UTC | #7
On 27.04.2020 22:11, Andrew Cooper wrote:
> On 27/04/2020 16:15, Jan Beulich wrote:
>> On 27.04.2020 16:35, Andrew Cooper wrote:
>>> On 27/04/2020 09:03, Jan Beulich wrote:
>>>> The 2nd of the assertions as well as the macro's return value have been
>>>> assuming we're on the primary stack. While for most IST exceptions we
>>>> eventually switch back to the main one,
>>> "we switch to the main one when interrupting user mode".
>>>
>>> "eventually" isn't accurate as it is before we enter C.
>> Right, will change.
>>
>>>> --- a/xen/include/asm-x86/regs.h
>>>> +++ b/xen/include/asm-x86/regs.h
>>>> @@ -10,9 +10,10 @@
>>>>      /* Frame pointer must point into current CPU stack. */                    \
>>>>      ASSERT(diff < STACK_SIZE);                                                \
>>>>      /* If not a guest frame, it must be a hypervisor frame. */                \
>>>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
>>>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
>>>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
>>>>      /* Return TRUE if it's a guest frame. */                                  \
>>>> -    (diff == 0);                                                              \
>>>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
>>> The (diff == 0) already worried me before because it doesn't fail safe,
>>> but this makes things more problematic.  Consider the case back when we
>>> had __HYPERVISOR_CS32.
>> Yes - if __HYPERVISOR_CS32 would ever have been to be used for
>> anything, it would have needed checking for here.
>>
>>> Guest mode is strictly "(r)->cs & 3".
>> As long as CS (a) gets properly saved (it's a "manual" step for
>> SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
>> didn't write this code, I don't think, so I can only guess that
>> there were intentions behind this along these lines.
> 
> Hmm - the VMExit case might be problematic here, due to the variability
> in the poison used.

"Variability" is an understatement - there's no poisoning at all
in release builds afaics (and to be honest it seems a somewhat
pointless to write the same values over and over again in debug
mode). With this, ...

>>> Everything else is expectations about how things ought to be laid out,
>>> but for safety in release builds, the final judgement should not depend
>>> on the expectations evaluating true.
>> Well, I can switch to a purely CS.RPL based approach, as long as
>> we're happy to live with the possible downside mentioned above.
>> Of course this would then end up being a more intrusive change
>> than originally intended ...
> 
> I'd certainly prefer to go for something which is more robust, even if
> it is a larger change.

... what's your suggestion? Basing on _just_ CS.RPL obviously won't
work. Not even if we put in place the guest's CS (albeit that
somewhat depends on the meaning we assign to the macro's returned
value). Using current inside the macro to determine whether the
guest is HVM would also seem fragile to me - there are quite a few
uses of guest_mode(). Which would leave passing in a const struct
vcpu * (or domain *), requiring to touch all call sites, including
Arm's.

Compared to this it would seem to me that the change as presented
is a clear improvement without becoming overly large of a change.

Jan
Roger Pau Monné May 18, 2020, 2:51 p.m. UTC | #8
On Tue, Apr 28, 2020 at 08:30:12AM +0200, Jan Beulich wrote:
> On 27.04.2020 22:11, Andrew Cooper wrote:
> > On 27/04/2020 16:15, Jan Beulich wrote:
> >> On 27.04.2020 16:35, Andrew Cooper wrote:
> >>> On 27/04/2020 09:03, Jan Beulich wrote:
> >>>> The 2nd of the assertions as well as the macro's return value have been
> >>>> assuming we're on the primary stack. While for most IST exceptions we
> >>>> eventually switch back to the main one,
> >>> "we switch to the main one when interrupting user mode".
> >>>
> >>> "eventually" isn't accurate as it is before we enter C.
> >> Right, will change.
> >>
> >>>> --- a/xen/include/asm-x86/regs.h
> >>>> +++ b/xen/include/asm-x86/regs.h
> >>>> @@ -10,9 +10,10 @@
> >>>>      /* Frame pointer must point into current CPU stack. */                    \
> >>>>      ASSERT(diff < STACK_SIZE);                                                \
> >>>>      /* If not a guest frame, it must be a hypervisor frame. */                \
> >>>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
> >>>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
> >>>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
> >>>>      /* Return TRUE if it's a guest frame. */                                  \
> >>>> -    (diff == 0);                                                              \
> >>>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
> >>> The (diff == 0) already worried me before because it doesn't fail safe,
> >>> but this makes things more problematic.  Consider the case back when we
> >>> had __HYPERVISOR_CS32.
> >> Yes - if __HYPERVISOR_CS32 would ever have been to be used for
> >> anything, it would have needed checking for here.
> >>
> >>> Guest mode is strictly "(r)->cs & 3".
> >> As long as CS (a) gets properly saved (it's a "manual" step for
> >> SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
> >> didn't write this code, I don't think, so I can only guess that
> >> there were intentions behind this along these lines.
> > 
> > Hmm - the VMExit case might be problematic here, due to the variability
> > in the poison used.
> 
> "Variability" is an understatement - there's no poisoning at all
> in release builds afaics (and to be honest it seems a somewhat
> pointless to write the same values over and over again in debug
> mode). With this, ...
> 
> >>> Everything else is expectations about how things ought to be laid out,
> >>> but for safety in release builds, the final judgement should not depend
> >>> on the expectations evaluating true.
> >> Well, I can switch to a purely CS.RPL based approach, as long as
> >> we're happy to live with the possible downside mentioned above.
> >> Of course this would then end up being a more intrusive change
> >> than originally intended ...
> > 
> > I'd certainly prefer to go for something which is more robust, even if
> > it is a larger change.
> 
> ... what's your suggestion? Basing on _just_ CS.RPL obviously won't
> work. Not even if we put in place the guest's CS (albeit that
> somewhat depends on the meaning we assign to the macro's returned
> value).

Just to check I'm following this correctly, using CS.RPL won't work
for HVM guests, as HVM can legitimately use a RPL of 0 (which is not
the case for PV guests). Doesn't the same apply to the usage of
__HYPERVISOR_CS? (A HVM guest could also use the same code segment
value as Xen?)

> Using current inside the macro to determine whether the
> guest is HVM would also seem fragile to me - there are quite a few
> uses of guest_mode(). Which would leave passing in a const struct
> vcpu * (or domain *), requiring to touch all call sites, including
> Arm's.

Fragile or slow? Are there corner cases where guest_mode is used where
current is not reliable?

> Compared to this it would seem to me that the change as presented
> is a clear improvement without becoming overly large of a change.

Using the cs register is already part of the guest_mode code, even if
just in debug mode, hence I don't see it as a regression from existing
code. It however feels weird to me that the reporter of the issue
doesn't agree with the fix, and hence would like to know if there's a
way we could achieve consensus on this.

Roger.
Jan Beulich May 20, 2020, 8:56 a.m. UTC | #9
On 18.05.2020 16:51, Roger Pau Monné wrote:
> On Tue, Apr 28, 2020 at 08:30:12AM +0200, Jan Beulich wrote:
>> On 27.04.2020 22:11, Andrew Cooper wrote:
>>> On 27/04/2020 16:15, Jan Beulich wrote:
>>>> On 27.04.2020 16:35, Andrew Cooper wrote:
>>>>> On 27/04/2020 09:03, Jan Beulich wrote:
>>>>>> --- a/xen/include/asm-x86/regs.h
>>>>>> +++ b/xen/include/asm-x86/regs.h
>>>>>> @@ -10,9 +10,10 @@
>>>>>>      /* Frame pointer must point into current CPU stack. */                    \
>>>>>>      ASSERT(diff < STACK_SIZE);                                                \
>>>>>>      /* If not a guest frame, it must be a hypervisor frame. */                \
>>>>>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
>>>>>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
>>>>>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
>>>>>>      /* Return TRUE if it's a guest frame. */                                  \
>>>>>> -    (diff == 0);                                                              \
>>>>>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
>>>>> The (diff == 0) already worried me before because it doesn't fail safe,
>>>>> but this makes things more problematic.  Consider the case back when we
>>>>> had __HYPERVISOR_CS32.
>>>> Yes - if __HYPERVISOR_CS32 would ever have been to be used for
>>>> anything, it would have needed checking for here.
>>>>
>>>>> Guest mode is strictly "(r)->cs & 3".
>>>> As long as CS (a) gets properly saved (it's a "manual" step for
>>>> SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
>>>> didn't write this code, I don't think, so I can only guess that
>>>> there were intentions behind this along these lines.
>>>
>>> Hmm - the VMExit case might be problematic here, due to the variability
>>> in the poison used.
>>
>> "Variability" is an understatement - there's no poisoning at all
>> in release builds afaics (and to be honest it seems a somewhat
>> pointless to write the same values over and over again in debug
>> mode). With this, ...
>>
>>>>> Everything else is expectations about how things ought to be laid out,
>>>>> but for safety in release builds, the final judgement should not depend
>>>>> on the expectations evaluating true.
>>>> Well, I can switch to a purely CS.RPL based approach, as long as
>>>> we're happy to live with the possible downside mentioned above.
>>>> Of course this would then end up being a more intrusive change
>>>> than originally intended ...
>>>
>>> I'd certainly prefer to go for something which is more robust, even if
>>> it is a larger change.
>>
>> ... what's your suggestion? Basing on _just_ CS.RPL obviously won't
>> work. Not even if we put in place the guest's CS (albeit that
>> somewhat depends on the meaning we assign to the macro's returned
>> value).
> 
> Just to check I'm following this correctly, using CS.RPL won't work
> for HVM guests, as HVM can legitimately use a RPL of 0 (which is not
> the case for PV guests). Doesn't the same apply to the usage of
> __HYPERVISOR_CS? (A HVM guest could also use the same code segment
> value as Xen?)

Of course (and in particular Xen as a guest would). My "Basing on
_just_ CS.RPL" wasn't meant to exclude the rest of the selector,
but to contrast this to the case where "diff" also is involved in
the calculation (which looks to be what Andrew would prefer to see
go away).

>> Using current inside the macro to determine whether the
>> guest is HVM would also seem fragile to me - there are quite a few
>> uses of guest_mode(). Which would leave passing in a const struct
>> vcpu * (or domain *), requiring to touch all call sites, including
>> Arm's.
> 
> Fragile or slow? Are there corner cases where guest_mode is used where
> current is not reliable?

This question is why I said "there are quite a few uses of
guest_mode()" - auditing them all is just one side of the medal.
The other is to prevent a new use appearing in the future that
can be reached by a call path in the time window where a lazy
context switch is pending (i.e. when current has already been
updated, but register state hasn't been yet).

>> Compared to this it would seem to me that the change as presented
>> is a clear improvement without becoming overly large of a change.
> 
> Using the cs register is already part of the guest_mode code, even if
> just in debug mode, hence I don't see it as a regression from existing
> code. It however feels weird to me that the reporter of the issue
> doesn't agree with the fix, and hence would like to know if there's a
> way we could achieve consensus on this.

Indeed. I'd be happy to make further adjustments, if only I had a
clear understanding of what is wanted (or why leaving things as
they are is better than a little bit of an improvement).

Jan
Roger Pau Monné May 20, 2020, 3:13 p.m. UTC | #10
On Wed, May 20, 2020 at 10:56:26AM +0200, Jan Beulich wrote:
> On 18.05.2020 16:51, Roger Pau Monné wrote:
> > On Tue, Apr 28, 2020 at 08:30:12AM +0200, Jan Beulich wrote:
> >> On 27.04.2020 22:11, Andrew Cooper wrote:
> >>> On 27/04/2020 16:15, Jan Beulich wrote:
> >>>> On 27.04.2020 16:35, Andrew Cooper wrote:
> >>>>> On 27/04/2020 09:03, Jan Beulich wrote:
> >>>>>> --- a/xen/include/asm-x86/regs.h
> >>>>>> +++ b/xen/include/asm-x86/regs.h
> >>>>>> @@ -10,9 +10,10 @@
> >>>>>>      /* Frame pointer must point into current CPU stack. */                    \
> >>>>>>      ASSERT(diff < STACK_SIZE);                                                \
> >>>>>>      /* If not a guest frame, it must be a hypervisor frame. */                \
> >>>>>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
> >>>>>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
> >>>>>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
> >>>>>>      /* Return TRUE if it's a guest frame. */                                  \
> >>>>>> -    (diff == 0);                                                              \
> >>>>>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
> >>>>> The (diff == 0) already worried me before because it doesn't fail safe,
> >>>>> but this makes things more problematic.  Consider the case back when we
> >>>>> had __HYPERVISOR_CS32.
> >>>> Yes - if __HYPERVISOR_CS32 would ever have been to be used for
> >>>> anything, it would have needed checking for here.
> >>>>
> >>>>> Guest mode is strictly "(r)->cs & 3".
> >>>> As long as CS (a) gets properly saved (it's a "manual" step for
> >>>> SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
> >>>> didn't write this code, I don't think, so I can only guess that
> >>>> there were intentions behind this along these lines.
> >>>
> >>> Hmm - the VMExit case might be problematic here, due to the variability
> >>> in the poison used.
> >>
> >> "Variability" is an understatement - there's no poisoning at all
> >> in release builds afaics (and to be honest it seems a somewhat
> >> pointless to write the same values over and over again in debug
> >> mode). With this, ...
> >>
> >>>>> Everything else is expectations about how things ought to be laid out,
> >>>>> but for safety in release builds, the final judgement should not depend
> >>>>> on the expectations evaluating true.
> >>>> Well, I can switch to a purely CS.RPL based approach, as long as
> >>>> we're happy to live with the possible downside mentioned above.
> >>>> Of course this would then end up being a more intrusive change
> >>>> than originally intended ...
> >>>
> >>> I'd certainly prefer to go for something which is more robust, even if
> >>> it is a larger change.
> >>
> >> ... what's your suggestion? Basing on _just_ CS.RPL obviously won't
> >> work. Not even if we put in place the guest's CS (albeit that
> >> somewhat depends on the meaning we assign to the macro's returned
> >> value).
> > 
> > Just to check I'm following this correctly, using CS.RPL won't work
> > for HVM guests, as HVM can legitimately use a RPL of 0 (which is not
> > the case for PV guests). Doesn't the same apply to the usage of
> > __HYPERVISOR_CS? (A HVM guest could also use the same code segment
> > value as Xen?)
> 
> Of course (and in particular Xen as a guest would). My "Basing on
> _just_ CS.RPL" wasn't meant to exclude the rest of the selector,
> but to contrast this to the case where "diff" also is involved in
> the calculation (which looks to be what Andrew would prefer to see
> go away).
> 
> >> Using current inside the macro to determine whether the
> >> guest is HVM would also seem fragile to me - there are quite a few
> >> uses of guest_mode(). Which would leave passing in a const struct
> >> vcpu * (or domain *), requiring to touch all call sites, including
> >> Arm's.
> > 
> > Fragile or slow? Are there corner cases where guest_mode is used where
> > current is not reliable?
> 
> This question is why I said "there are quite a few uses of
> guest_mode()" - auditing them all is just one side of the medal.
> The other is to prevent a new use appearing in the future that
> can be reached by a call path in the time window where a lazy
> context switch is pending (i.e. when current has already been
> updated, but register state hasn't been yet).
> 
> >> Compared to this it would seem to me that the change as presented
> >> is a clear improvement without becoming overly large of a change.
> > 
> > Using the cs register is already part of the guest_mode code, even if
> > just in debug mode, hence I don't see it as a regression from existing
> > code. It however feels weird to me that the reporter of the issue
> > doesn't agree with the fix, and hence would like to know if there's a
> > way we could achieve consensus on this.
> 
> Indeed. I'd be happy to make further adjustments, if only I had a
> clear understanding of what is wanted (or why leaving things as
> they are is better than a little bit of an improvement).

OK, so I think I'm starting to understand this all. Sorry it's taken
me so long. So it's my understanding that diff != 0 can only happen in
Xen context, or when in an IST that has a different stack (ie: MCE, NMI
or DF according to current.h) and running in PV mode?

Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
mode if diff != 0? I see a lot of other places where cs & 3 is already
used to that effect AFAICT (like entry.S).

Roger.
Jan Beulich May 22, 2020, 9:52 a.m. UTC | #11
On 20.05.2020 17:13, Roger Pau Monné wrote:
> On Wed, May 20, 2020 at 10:56:26AM +0200, Jan Beulich wrote:
>> On 18.05.2020 16:51, Roger Pau Monné wrote:
>>> On Tue, Apr 28, 2020 at 08:30:12AM +0200, Jan Beulich wrote:
>>>> On 27.04.2020 22:11, Andrew Cooper wrote:
>>>>> On 27/04/2020 16:15, Jan Beulich wrote:
>>>>>> On 27.04.2020 16:35, Andrew Cooper wrote:
>>>>>>> On 27/04/2020 09:03, Jan Beulich wrote:
>>>>>>>> --- a/xen/include/asm-x86/regs.h
>>>>>>>> +++ b/xen/include/asm-x86/regs.h
>>>>>>>> @@ -10,9 +10,10 @@
>>>>>>>>      /* Frame pointer must point into current CPU stack. */                    \
>>>>>>>>      ASSERT(diff < STACK_SIZE);                                                \
>>>>>>>>      /* If not a guest frame, it must be a hypervisor frame. */                \
>>>>>>>> -    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
>>>>>>>> +    if ( diff < PRIMARY_STACK_SIZE )                                          \
>>>>>>>> +        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
>>>>>>>>      /* Return TRUE if it's a guest frame. */                                  \
>>>>>>>> -    (diff == 0);                                                              \
>>>>>>>> +    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
>>>>>>> The (diff == 0) already worried me before because it doesn't fail safe,
>>>>>>> but this makes things more problematic.  Consider the case back when we
>>>>>>> had __HYPERVISOR_CS32.
>>>>>> Yes - if __HYPERVISOR_CS32 would ever have been to be used for
>>>>>> anything, it would have needed checking for here.
>>>>>>
>>>>>>> Guest mode is strictly "(r)->cs & 3".
>>>>>> As long as CS (a) gets properly saved (it's a "manual" step for
>>>>>> SYSCALL/SYSRET as well as #VMEXIT) and (b) didn't get clobbered. I
>>>>>> didn't write this code, I don't think, so I can only guess that
>>>>>> there were intentions behind this along these lines.
>>>>>
>>>>> Hmm - the VMExit case might be problematic here, due to the variability
>>>>> in the poison used.
>>>>
>>>> "Variability" is an understatement - there's no poisoning at all
>>>> in release builds afaics (and to be honest it seems a somewhat
>>>> pointless to write the same values over and over again in debug
>>>> mode). With this, ...
>>>>
>>>>>>> Everything else is expectations about how things ought to be laid out,
>>>>>>> but for safety in release builds, the final judgement should not depend
>>>>>>> on the expectations evaluating true.
>>>>>> Well, I can switch to a purely CS.RPL based approach, as long as
>>>>>> we're happy to live with the possible downside mentioned above.
>>>>>> Of course this would then end up being a more intrusive change
>>>>>> than originally intended ...
>>>>>
>>>>> I'd certainly prefer to go for something which is more robust, even if
>>>>> it is a larger change.
>>>>
>>>> ... what's your suggestion? Basing on _just_ CS.RPL obviously won't
>>>> work. Not even if we put in place the guest's CS (albeit that
>>>> somewhat depends on the meaning we assign to the macro's returned
>>>> value).
>>>
>>> Just to check I'm following this correctly, using CS.RPL won't work
>>> for HVM guests, as HVM can legitimately use a RPL of 0 (which is not
>>> the case for PV guests). Doesn't the same apply to the usage of
>>> __HYPERVISOR_CS? (A HVM guest could also use the same code segment
>>> value as Xen?)
>>
>> Of course (and in particular Xen as a guest would). My "Basing on
>> _just_ CS.RPL" wasn't meant to exclude the rest of the selector,
>> but to contrast this to the case where "diff" also is involved in
>> the calculation (which looks to be what Andrew would prefer to see
>> go away).
>>
>>>> Using current inside the macro to determine whether the
>>>> guest is HVM would also seem fragile to me - there are quite a few
>>>> uses of guest_mode(). Which would leave passing in a const struct
>>>> vcpu * (or domain *), requiring to touch all call sites, including
>>>> Arm's.
>>>
>>> Fragile or slow? Are there corner cases where guest_mode is used where
>>> current is not reliable?
>>
>> This question is why I said "there are quite a few uses of
>> guest_mode()" - auditing them all is just one side of the medal.
>> The other is to prevent a new use appearing in the future that
>> can be reached by a call path in the time window where a lazy
>> context switch is pending (i.e. when current has already been
>> updated, but register state hasn't been yet).
>>
>>>> Compared to this it would seem to me that the change as presented
>>>> is a clear improvement without becoming overly large of a change.
>>>
>>> Using the cs register is already part of the guest_mode code, even if
>>> just in debug mode, hence I don't see it as a regression from existing
>>> code. It however feels weird to me that the reporter of the issue
>>> doesn't agree with the fix, and hence would like to know if there's a
>>> way we could achieve consensus on this.
>>
>> Indeed. I'd be happy to make further adjustments, if only I had a
>> clear understanding of what is wanted (or why leaving things as
>> they are is better than a little bit of an improvement).
> 
> OK, so I think I'm starting to understand this all. Sorry it's taken
> me so long. So it's my understanding that diff != 0 can only happen in
> Xen context, or when in an IST that has a different stack (ie: MCE, NMI
> or DF according to current.h) and running in PV mode?
> 
> Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
> mode if diff != 0? I see a lot of other places where cs & 3 is already
> used to that effect AFAICT (like entry.S).

Technically this would be correct afaics, but the idea with all this
is (or should I say "looks to be"?) to have the checks be as tight as
possible, to make sure we don't mistakenly consider something "guest
mode" which really isn't. IOW your suggestion would be fine with me
if we could exclude bugs anywhere in the code. But since this isn't
realistic, I consider your suggestion to be relaxing things by too
much.

Jan
Roger Pau Monné May 22, 2020, 10:48 a.m. UTC | #12
On Fri, May 22, 2020 at 11:52:42AM +0200, Jan Beulich wrote:
> On 20.05.2020 17:13, Roger Pau Monné wrote:
> > OK, so I think I'm starting to understand this all. Sorry it's taken
> > me so long. So it's my understanding that diff != 0 can only happen in
> > Xen context, or when in an IST that has a different stack (ie: MCE, NMI
> > or DF according to current.h) and running in PV mode?
> > 
> > Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
> > mode if diff != 0? I see a lot of other places where cs & 3 is already
> > used to that effect AFAICT (like entry.S).
> 
> Technically this would be correct afaics, but the idea with all this
> is (or should I say "looks to be"?) to have the checks be as tight as
> possible, to make sure we don't mistakenly consider something "guest
> mode" which really isn't. IOW your suggestion would be fine with me
> if we could exclude bugs anywhere in the code. But since this isn't
> realistic, I consider your suggestion to be relaxing things by too
> much.

OK, so I take that (long time) we might also want to change the cs & 3
checks from entry.S to check against __HYPERVISOR_CS explicitly?

What I would prefer is to have some kind of homogeneity in how guest
mode vs Xen mode checks are performed, so that we don't confuse
people.

Thanks, Roger.
Jan Beulich May 22, 2020, noon UTC | #13
On 22.05.2020 12:48, Roger Pau Monné wrote:
> On Fri, May 22, 2020 at 11:52:42AM +0200, Jan Beulich wrote:
>> On 20.05.2020 17:13, Roger Pau Monné wrote:
>>> OK, so I think I'm starting to understand this all. Sorry it's taken
>>> me so long. So it's my understanding that diff != 0 can only happen in
>>> Xen context, or when in an IST that has a different stack (ie: MCE, NMI
>>> or DF according to current.h) and running in PV mode?
>>>
>>> Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
>>> mode if diff != 0? I see a lot of other places where cs & 3 is already
>>> used to that effect AFAICT (like entry.S).
>>
>> Technically this would be correct afaics, but the idea with all this
>> is (or should I say "looks to be"?) to have the checks be as tight as
>> possible, to make sure we don't mistakenly consider something "guest
>> mode" which really isn't. IOW your suggestion would be fine with me
>> if we could exclude bugs anywhere in the code. But since this isn't
>> realistic, I consider your suggestion to be relaxing things by too
>> much.
> 
> OK, so I take that (long time) we might also want to change the cs & 3
> checks from entry.S to check against __HYPERVISOR_CS explicitly?

I didn't think so, no (not the least because of there not being any
guarantee afaik that EFI runtime calls couldn't play with segment
registers; they shouldn't, yes, but there's a lot of other "should"
many don't obey to). Those are guaranteed PV-only code paths. The
main issue here is that ->cs cannot be relied upon when a frame
points at HVM state.

Jan
Roger Pau Monné May 26, 2020, 10:56 a.m. UTC | #14
On Fri, May 22, 2020 at 02:00:22PM +0200, Jan Beulich wrote:
> On 22.05.2020 12:48, Roger Pau Monné wrote:
> > On Fri, May 22, 2020 at 11:52:42AM +0200, Jan Beulich wrote:
> >> On 20.05.2020 17:13, Roger Pau Monné wrote:
> >>> OK, so I think I'm starting to understand this all. Sorry it's taken
> >>> me so long. So it's my understanding that diff != 0 can only happen in
> >>> Xen context, or when in an IST that has a different stack (ie: MCE, NMI
> >>> or DF according to current.h) and running in PV mode?
> >>>
> >>> Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
> >>> mode if diff != 0? I see a lot of other places where cs & 3 is already
> >>> used to that effect AFAICT (like entry.S).
> >>
> >> Technically this would be correct afaics, but the idea with all this
> >> is (or should I say "looks to be"?) to have the checks be as tight as
> >> possible, to make sure we don't mistakenly consider something "guest
> >> mode" which really isn't. IOW your suggestion would be fine with me
> >> if we could exclude bugs anywhere in the code. But since this isn't
> >> realistic, I consider your suggestion to be relaxing things by too
> >> much.
> > 
> > OK, so I take that (long time) we might also want to change the cs & 3
> > checks from entry.S to check against __HYPERVISOR_CS explicitly?
> 
> I didn't think so, no (not the least because of there not being any
> guarantee afaik that EFI runtime calls couldn't play with segment
> registers; they shouldn't, yes, but there's a lot of other "should"
> many don't obey to). Those are guaranteed PV-only code paths. The
> main issue here is that ->cs cannot be relied upon when a frame
> points at HVM state.

Well, if it points at HVM state it could equally have __HYPERVISOR_CS
set by the guest.

Will things work anyway if you get here from an exception generated by
EFI code that has changed the code segment? You are going to hit the
assert at least, since diff will be != 0 and cs != __HYPERVISOR_CS?

I would prefer to keep things coherent by either using cs & 3 or
cs == __HYPERVISOR_CS everywhere if possible, as I'm still unsure of
the benefit of using __HYPERVISOR_CS.

Thanks, Roger.
Jan Beulich May 26, 2020, 1:55 p.m. UTC | #15
On 26.05.2020 12:56, Roger Pau Monné wrote:
> On Fri, May 22, 2020 at 02:00:22PM +0200, Jan Beulich wrote:
>> On 22.05.2020 12:48, Roger Pau Monné wrote:
>>> On Fri, May 22, 2020 at 11:52:42AM +0200, Jan Beulich wrote:
>>>> On 20.05.2020 17:13, Roger Pau Monné wrote:
>>>>> OK, so I think I'm starting to understand this all. Sorry it's taken
>>>>> me so long. So it's my understanding that diff != 0 can only happen in
>>>>> Xen context, or when in an IST that has a different stack (ie: MCE, NMI
>>>>> or DF according to current.h) and running in PV mode?
>>>>>
>>>>> Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
>>>>> mode if diff != 0? I see a lot of other places where cs & 3 is already
>>>>> used to that effect AFAICT (like entry.S).
>>>>
>>>> Technically this would be correct afaics, but the idea with all this
>>>> is (or should I say "looks to be"?) to have the checks be as tight as
>>>> possible, to make sure we don't mistakenly consider something "guest
>>>> mode" which really isn't. IOW your suggestion would be fine with me
>>>> if we could exclude bugs anywhere in the code. But since this isn't
>>>> realistic, I consider your suggestion to be relaxing things by too
>>>> much.
>>>
>>> OK, so I take that (long time) we might also want to change the cs & 3
>>> checks from entry.S to check against __HYPERVISOR_CS explicitly?
>>
>> I didn't think so, no (not the least because of there not being any
>> guarantee afaik that EFI runtime calls couldn't play with segment
>> registers; they shouldn't, yes, but there's a lot of other "should"
>> many don't obey to). Those are guaranteed PV-only code paths. The
>> main issue here is that ->cs cannot be relied upon when a frame
>> points at HVM state.
> 
> Well, if it points at HVM state it could equally have __HYPERVISOR_CS
> set by the guest.

No, that's not the point. ->cs will never be __HYPERVISOR_CS in that
case, as we never store the guest's CS selector there. Instead
hvm_invalidate_regs_fields() clobbers the field in debug builds (with
a value resulting in RPL 3), but zero (i.e. a value implying RPL 0)
remains in place in release builds.

Instead of doing this clobbering in debug mode only, we could - as I
think I did suggest before - clobber always, but just once during vCPU
init rather than on every VM exit. In debug mode we could then instead
check that the dummy values didn't themselves get clobbered.

> Will things work anyway if you get here from an exception generated by
> EFI code that has changed the code segment? You are going to hit the
> assert at least, since diff will be != 0 and cs != __HYPERVISOR_CS?

What would guarantee the latter? Additionally they could in principle
also have switched stacks then, i.e. diff may then also be larger than
PRIMARY_STACK_SIZE, in which case - with the patch in place - the
assertion is bypassed altogether.

> I would prefer to keep things coherent by either using cs & 3 or
> cs == __HYPERVISOR_CS everywhere if possible, as I'm still unsure of
> the benefit of using __HYPERVISOR_CS.

See above.

Jan
Roger Pau Monné May 27, 2020, 3:17 p.m. UTC | #16
On Tue, May 26, 2020 at 03:55:39PM +0200, Jan Beulich wrote:
> On 26.05.2020 12:56, Roger Pau Monné wrote:
> > On Fri, May 22, 2020 at 02:00:22PM +0200, Jan Beulich wrote:
> >> On 22.05.2020 12:48, Roger Pau Monné wrote:
> >>> On Fri, May 22, 2020 at 11:52:42AM +0200, Jan Beulich wrote:
> >>>> On 20.05.2020 17:13, Roger Pau Monné wrote:
> >>>>> OK, so I think I'm starting to understand this all. Sorry it's taken
> >>>>> me so long. So it's my understanding that diff != 0 can only happen in
> >>>>> Xen context, or when in an IST that has a different stack (ie: MCE, NMI
> >>>>> or DF according to current.h) and running in PV mode?
> >>>>>
> >>>>> Wouldn't in then be fine to use (r)->cs & 3 to check we are in guest
> >>>>> mode if diff != 0? I see a lot of other places where cs & 3 is already
> >>>>> used to that effect AFAICT (like entry.S).
> >>>>
> >>>> Technically this would be correct afaics, but the idea with all this
> >>>> is (or should I say "looks to be"?) to have the checks be as tight as
> >>>> possible, to make sure we don't mistakenly consider something "guest
> >>>> mode" which really isn't. IOW your suggestion would be fine with me
> >>>> if we could exclude bugs anywhere in the code. But since this isn't
> >>>> realistic, I consider your suggestion to be relaxing things by too
> >>>> much.
> >>>
> >>> OK, so I take that (long time) we might also want to change the cs & 3
> >>> checks from entry.S to check against __HYPERVISOR_CS explicitly?
> >>
> >> I didn't think so, no (not the least because of there not being any
> >> guarantee afaik that EFI runtime calls couldn't play with segment
> >> registers; they shouldn't, yes, but there's a lot of other "should"
> >> many don't obey to). Those are guaranteed PV-only code paths. The
> >> main issue here is that ->cs cannot be relied upon when a frame
> >> points at HVM state.
> > 
> > Well, if it points at HVM state it could equally have __HYPERVISOR_CS
> > set by the guest.
> 
> No, that's not the point. ->cs will never be __HYPERVISOR_CS in that
> case, as we never store the guest's CS selector there. Instead
> hvm_invalidate_regs_fields() clobbers the field in debug builds (with
> a value resulting in RPL 3), but zero (i.e. a value implying RPL 0)
> remains in place in release builds.
> 
> Instead of doing this clobbering in debug mode only, we could - as I
> think I did suggest before - clobber always, but just once during vCPU
> init rather than on every VM exit. In debug mode we could then instead
> check that the dummy values didn't themselves get clobbered.

It would make sense to clobber it always with a value that has RPL >
0, so that it's consistent with PV state.

> > Will things work anyway if you get here from an exception generated by
> > EFI code that has changed the code segment? You are going to hit the
> > assert at least, since diff will be != 0 and cs != __HYPERVISOR_CS?
> 
> What would guarantee the latter? Additionally they could in principle
> also have switched stacks then, i.e. diff may then also be larger than
> PRIMARY_STACK_SIZE, in which case - with the patch in place - the
> assertion is bypassed altogether.
> 
> > I would prefer to keep things coherent by either using cs & 3 or
> > cs == __HYPERVISOR_CS everywhere if possible, as I'm still unsure of
> > the benefit of using __HYPERVISOR_CS.
> 
> See above.

Well, I think it's an improvement overall, as it allows to properly
handle the case where a PV guest could manage to trigger an exception
that uses a stack different than the primary one.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Roger.
diff mbox series

Patch

--- a/xen/include/asm-x86/regs.h
+++ b/xen/include/asm-x86/regs.h
@@ -10,9 +10,10 @@ 
     /* Frame pointer must point into current CPU stack. */                    \
     ASSERT(diff < STACK_SIZE);                                                \
     /* If not a guest frame, it must be a hypervisor frame. */                \
-    ASSERT((diff == 0) || (r->cs == __HYPERVISOR_CS));                        \
+    if ( diff < PRIMARY_STACK_SIZE )                                          \
+        ASSERT(!diff || ((r)->cs == __HYPERVISOR_CS));                        \
     /* Return TRUE if it's a guest frame. */                                  \
-    (diff == 0);                                                              \
+    !diff || ((r)->cs != __HYPERVISOR_CS);                                    \
 })
 
 #endif /* __X86_REGS_H__ */