diff mbox series

[for-4.17?] x86/paging: return -EINVAL for paging domctls for dying domains

Message ID 20221108113850.61619-1-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show
Series [for-4.17?] x86/paging: return -EINVAL for paging domctls for dying domains | expand

Commit Message

Roger Pau Monné Nov. 8, 2022, 11:38 a.m. UTC
Like on the Arm side, return -EINVAL when attempting to do a p2m
operation on dying domains.

The current logic returns 0 and leaves the domctl parameter
uninitialized for any parameter fetching operations (like the
GET_ALLOCATION operation), which is not helpful from a toolstack point
of view, because there's no indication that the data hasn't been
fetched.

Reported-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I'm adding the for-4.17 tag because I think this is a backport
candidate to older Xen versions also.
---
 xen/arch/x86/mm/paging.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jan Beulich Nov. 8, 2022, 4:14 p.m. UTC | #1
On 08.11.2022 12:38, Roger Pau Monne wrote:
> Like on the Arm side, return -EINVAL when attempting to do a p2m
> operation on dying domains.
> 
> The current logic returns 0 and leaves the domctl parameter
> uninitialized for any parameter fetching operations (like the
> GET_ALLOCATION operation), which is not helpful from a toolstack point
> of view, because there's no indication that the data hasn't been
> fetched.

While I can see how the present behavior is problematic when it comes
to consuming supposedly returned data, ...

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>  
>      if ( unlikely(d->is_dying) )
>      {
> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
> +        gdprintk(XENLOG_INFO,
> +                 "Tried to do a paging domctl op on dying domain %u\n",
>                   d->domain_id);
> -        return 0;
> +        return -EINVAL;
>      }

... going from "success" to "failure" here has a meaningful risk of
regressing callers. It is my understanding that it was deliberate to
mimic success in this case (without meaning to assign "good" or "bad"
to that decision). Can you instead fill the data to be returned in
some simple enough way? I assume a mere memset() isn't going to be
good enough, though (albeit public/domctl.h doesn't explicitly name
any input-only fields, so it may not be necessary to preserve
anything). Maybe zeroing ->mb and ->stats would do?

As a minor remark: _If_ you're changing the printk(), then please
also switch to using %pd.

Jan
Roger Pau Monné Nov. 8, 2022, 4:43 p.m. UTC | #2
On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
> On 08.11.2022 12:38, Roger Pau Monne wrote:
> > Like on the Arm side, return -EINVAL when attempting to do a p2m
> > operation on dying domains.
> > 
> > The current logic returns 0 and leaves the domctl parameter
> > uninitialized for any parameter fetching operations (like the
> > GET_ALLOCATION operation), which is not helpful from a toolstack point
> > of view, because there's no indication that the data hasn't been
> > fetched.
> 
> While I can see how the present behavior is problematic when it comes
> to consuming supposedly returned data, ...
> 
> > --- a/xen/arch/x86/mm/paging.c
> > +++ b/xen/arch/x86/mm/paging.c
> > @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
> >  
> >      if ( unlikely(d->is_dying) )
> >      {
> > -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
> > +        gdprintk(XENLOG_INFO,
> > +                 "Tried to do a paging domctl op on dying domain %u\n",
> >                   d->domain_id);
> > -        return 0;
> > +        return -EINVAL;
> >      }
> 
> ... going from "success" to "failure" here has a meaningful risk of
> regressing callers. It is my understanding that it was deliberate to
> mimic success in this case (without meaning to assign "good" or "bad"
> to that decision).

I would assume that was the original intention, yes, albeit the commit
message doesn't go into details about why mimicking success is
required, it's very well possible the code relying on this was xend.

> Can you instead fill the data to be returned in
> some simple enough way? I assume a mere memset() isn't going to be
> good enough, though (albeit public/domctl.h doesn't explicitly name
> any input-only fields, so it may not be necessary to preserve
> anything). Maybe zeroing ->mb and ->stats would do?

Hm, it still feels kind of wrong.  We do return errors elsewhere for
operations attempted against dying domains, and that seems all fine,
not sure why paging operations need to be different in this regard.
Arm does also return -EINVAL in that case.

So what about postponing this change to 4.18 in order to avoid
surprises, but then taking it in its current form at the start of the
development window, as to have time to detect any issues?

> As a minor remark: _If_ you're changing the printk(), then please
> also switch to using %pd.

I've considered this, but then printing: "Tried to do a paging domctl
op on dying domain dX" felt kind of repetitive to me because of the
usage of domain and dX in the same sentence.  Anyway, will adjust.

Thanks, Roger.
Jan Beulich Nov. 8, 2022, 5:03 p.m. UTC | #3
On 08.11.2022 17:43, Roger Pau Monné wrote:
> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>> operation on dying domains.
>>>
>>> The current logic returns 0 and leaves the domctl parameter
>>> uninitialized for any parameter fetching operations (like the
>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>> of view, because there's no indication that the data hasn't been
>>> fetched.
>>
>> While I can see how the present behavior is problematic when it comes
>> to consuming supposedly returned data, ...
>>
>>> --- a/xen/arch/x86/mm/paging.c
>>> +++ b/xen/arch/x86/mm/paging.c
>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>  
>>>      if ( unlikely(d->is_dying) )
>>>      {
>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>> +        gdprintk(XENLOG_INFO,
>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>                   d->domain_id);
>>> -        return 0;
>>> +        return -EINVAL;
>>>      }
>>
>> ... going from "success" to "failure" here has a meaningful risk of
>> regressing callers. It is my understanding that it was deliberate to
>> mimic success in this case (without meaning to assign "good" or "bad"
>> to that decision).
> 
> I would assume that was the original intention, yes, albeit the commit
> message doesn't go into details about why mimicking success is
> required, it's very well possible the code relying on this was xend.

Quite possible, but you never know who else has cloned code from there.

>> Can you instead fill the data to be returned in
>> some simple enough way? I assume a mere memset() isn't going to be
>> good enough, though (albeit public/domctl.h doesn't explicitly name
>> any input-only fields, so it may not be necessary to preserve
>> anything). Maybe zeroing ->mb and ->stats would do?
> 
> Hm, it still feels kind of wrong.  We do return errors elsewhere for
> operations attempted against dying domains, and that seems all fine,
> not sure why paging operations need to be different in this regard.
> Arm does also return -EINVAL in that case.
> 
> So what about postponing this change to 4.18 in order to avoid
> surprises, but then taking it in its current form at the start of the
> development window, as to have time to detect any issues?

Maybe, but to be honest I'm not convinced. Arm can't really be taken
for comparison, since the op is pretty new there iirc.

>> As a minor remark: _If_ you're changing the printk(), then please
>> also switch to using %pd.
> 
> I've considered this, but then printing: "Tried to do a paging domctl
> op on dying domain dX" felt kind of repetitive to me because of the
> usage of domain and dX in the same sentence.  Anyway, will adjust.

Simply drop the word "domain", as we've done elsewhere when switching
to %pd?

Jan
Roger Pau Monné Nov. 8, 2022, 5:15 p.m. UTC | #4
On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
> On 08.11.2022 17:43, Roger Pau Monné wrote:
> > On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
> >> On 08.11.2022 12:38, Roger Pau Monne wrote:
> >>> Like on the Arm side, return -EINVAL when attempting to do a p2m
> >>> operation on dying domains.
> >>>
> >>> The current logic returns 0 and leaves the domctl parameter
> >>> uninitialized for any parameter fetching operations (like the
> >>> GET_ALLOCATION operation), which is not helpful from a toolstack point
> >>> of view, because there's no indication that the data hasn't been
> >>> fetched.
> >>
> >> While I can see how the present behavior is problematic when it comes
> >> to consuming supposedly returned data, ...
> >>
> >>> --- a/xen/arch/x86/mm/paging.c
> >>> +++ b/xen/arch/x86/mm/paging.c
> >>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
> >>>  
> >>>      if ( unlikely(d->is_dying) )
> >>>      {
> >>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
> >>> +        gdprintk(XENLOG_INFO,
> >>> +                 "Tried to do a paging domctl op on dying domain %u\n",
> >>>                   d->domain_id);
> >>> -        return 0;
> >>> +        return -EINVAL;
> >>>      }
> >>
> >> ... going from "success" to "failure" here has a meaningful risk of
> >> regressing callers. It is my understanding that it was deliberate to
> >> mimic success in this case (without meaning to assign "good" or "bad"
> >> to that decision).
> > 
> > I would assume that was the original intention, yes, albeit the commit
> > message doesn't go into details about why mimicking success is
> > required, it's very well possible the code relying on this was xend.
> 
> Quite possible, but you never know who else has cloned code from there.
> 
> >> Can you instead fill the data to be returned in
> >> some simple enough way? I assume a mere memset() isn't going to be
> >> good enough, though (albeit public/domctl.h doesn't explicitly name
> >> any input-only fields, so it may not be necessary to preserve
> >> anything). Maybe zeroing ->mb and ->stats would do?
> > 
> > Hm, it still feels kind of wrong.  We do return errors elsewhere for
> > operations attempted against dying domains, and that seems all fine,
> > not sure why paging operations need to be different in this regard.
> > Arm does also return -EINVAL in that case.
> > 
> > So what about postponing this change to 4.18 in order to avoid
> > surprises, but then taking it in its current form at the start of the
> > development window, as to have time to detect any issues?
> 
> Maybe, but to be honest I'm not convinced. Arm can't really be taken
> for comparison, since the op is pretty new there iirc.

Indeed, but the tools code paths are likely shared between x86 and
Arm, as the hypercalls are the same.

This is a domctl interface, so we are fine to do such changes.  I
understand that we want to avoid such interface changes as much as
possible, but I think we need to fix the hypercall to return error
codes rather than implementing workarounds to try to cope with a wrong
interface behavior in the first place.  Or else we could be
accumulation workarounds here in order to fool caller into thinking
the hypercall has somehow succeed, and provide kind of suitable
looking data for the output parameters.

> >> As a minor remark: _If_ you're changing the printk(), then please
> >> also switch to using %pd.
> > 
> > I've considered this, but then printing: "Tried to do a paging domctl
> > op on dying domain dX" felt kind of repetitive to me because of the
> > usage of domain and dX in the same sentence.  Anyway, will adjust.
> 
> Simply drop the word "domain", as we've done elsewhere when switching
> to %pd?

OK.

Thanks, Roger.
Edwin Török Nov. 8, 2022, 5:30 p.m. UTC | #5
> On 8 Nov 2022, at 16:43, Roger Pau Monne <roger.pau@citrix.com> wrote:
> 
> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>> operation on dying domains.
>>> 
>>> The current logic returns 0 and leaves the domctl parameter
>>> uninitialized for any parameter fetching operations (like the
>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>> of view, because there's no indication that the data hasn't been
>>> fetched.
>> 
>> While I can see how the present behavior is problematic when it comes
>> to consuming supposedly returned data, ...
>> 
>>> --- a/xen/arch/x86/mm/paging.c
>>> +++ b/xen/arch/x86/mm/paging.c
>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>> 
>>>     if ( unlikely(d->is_dying) )
>>>     {
>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>> +        gdprintk(XENLOG_INFO,
>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>                  d->domain_id);
>>> -        return 0;
>>> +        return -EINVAL;
>>>     }
>> 
>> ... going from "success" to "failure" here has a meaningful risk of
>> regressing callers. It is my understanding that it was deliberate to
>> mimic success in this case (without meaning to assign "good" or "bad"
>> to that decision).
> 
> I would assume that was the original intention, yes, albeit the commit
> message doesn't go into details about why mimicking success is
> required, it's very well possible the code relying on this was xend.
> 
>> Can you instead fill the data to be returned in
>> some simple enough way? I assume a mere memset() isn't going to be
>> good enough, though (albeit public/domctl.h doesn't explicitly name
>> any input-only fields, so it may not be necessary to preserve
>> anything). Maybe zeroing ->mb and ->stats would do?
> 
> Hm, it still feels kind of wrong.

Without the fix in the bindings (and any fixup in other toolstacks as needed, I haven't checked): https://lore.kernel.org/xen-devel/94f93ee61a4d0bd2fac3f5a753cb935962be20bb.1667920496.git.edvin.torok@citrix.com/T/#u
a value of 0 here might still cause things to go subtly wrong, i.e. cause the HVM shadow multiplier of a VM to decrease, although I think there are safeguards against it going below 1.0.
However a ->mb value of all zeroes is much easier to debug (and detect!) than a completely uninitialised value.

I'd prefer if the return value of domctls could be trusted to mean that 0 = all values are initialized on output, and to be potentially uninitialised only on failure.
There are a lot of other calls with similar pattern (particularly around vcpu, which will go down a different path) in the xenctrl bindings, and I haven't checked them all yet, but it'd be a good rule of thumb if the behaviour was consistent with other hypercalls/domctls.


>  We do return errors elsewhere for
> operations attempted against dying domains, and that seems all fine,
> not sure why paging operations need to be different in this regard.
> Arm does also return -EINVAL in that case.

How about we return EINVAL *and* in DEBUG builds fill the struct with an easily recognizable poison value?

> 
> So what about postponing this change to 4.18 in order to avoid
> surprises, but then taking it in its current form at the start of the
> development window, as to have time to detect any issues?
> 
>> As a minor remark: _If_ you're changing the printk(), then please
>> also switch to using %pd.
> 
> I've considered this, but then printing: "Tried to do a paging domctl
> op on dying domain dX" felt kind of repetitive to me because of the
> usage of domain and dX in the same sentence.  Anyway, will adjust.
> 
> Thanks, Roger.

Thanks,
--Edwin
Jan Beulich Nov. 9, 2022, 7:48 a.m. UTC | #6
On 08.11.2022 18:15, Roger Pau Monné wrote:
> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>> operation on dying domains.
>>>>>
>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>> uninitialized for any parameter fetching operations (like the
>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>> of view, because there's no indication that the data hasn't been
>>>>> fetched.
>>>>
>>>> While I can see how the present behavior is problematic when it comes
>>>> to consuming supposedly returned data, ...
>>>>
>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>>>  
>>>>>      if ( unlikely(d->is_dying) )
>>>>>      {
>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>>>> +        gdprintk(XENLOG_INFO,
>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>                   d->domain_id);
>>>>> -        return 0;
>>>>> +        return -EINVAL;
>>>>>      }
>>>>
>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>> regressing callers. It is my understanding that it was deliberate to
>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>> to that decision).
>>>
>>> I would assume that was the original intention, yes, albeit the commit
>>> message doesn't go into details about why mimicking success is
>>> required, it's very well possible the code relying on this was xend.
>>
>> Quite possible, but you never know who else has cloned code from there.
>>
>>>> Can you instead fill the data to be returned in
>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>> any input-only fields, so it may not be necessary to preserve
>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>
>>> Hm, it still feels kind of wrong.  We do return errors elsewhere for
>>> operations attempted against dying domains, and that seems all fine,
>>> not sure why paging operations need to be different in this regard.
>>> Arm does also return -EINVAL in that case.
>>>
>>> So what about postponing this change to 4.18 in order to avoid
>>> surprises, but then taking it in its current form at the start of the
>>> development window, as to have time to detect any issues?
>>
>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>> for comparison, since the op is pretty new there iirc.
> 
> Indeed, but the tools code paths are likely shared between x86 and
> Arm, as the hypercalls are the same.

On x86 we have both xc_shadow_control() and (functional)
xc_logdirty_control(); on Arm only the former is used, while the latter
would also be impacted by your change. Plus you're not accounting for
external tool stacks (like xend would be if anyone had cared to forward
port it, when - as you said earlier - the suspicion is that the original
change was made to "please" xend).

> This is a domctl interface, so we are fine to do such changes.

We're fine to make changes to domctl which are either binary compatible
with earlier versions or which are associated with a bump of the
interface version. The latter wouldn't help in this case, while the
former is simply not true here. For Andrew's proposed new paging pool
interface the behavior suggested here would of course be fully
appropriate, demanding that tool stack either don't issue such requests
against dying domains or that they be prepared to get back errors.

Thinking about it again I'm also not convinced EINVAL is an appropriate
error code to use here. The operation isn't necessarily invalid; we
only prefer to not carry out any such anymore. EOPNOTSUPP, EPERM, or
EACCES would all seem more appropriate. Or, for ease of recognition, a
rarely used one, e.g. ENODATA, EILSEQ, or EROFS.

Finally I'm not convinced of the usefulness of this dying check in the
first place: is_dying may become set immediately after the check was
done.

Jan

>  I
> understand that we want to avoid such interface changes as much as
> possible, but I think we need to fix the hypercall to return error
> codes rather than implementing workarounds to try to cope with a wrong
> interface behavior in the first place.  Or else we could be
> accumulation workarounds here in order to fool caller into thinking
> the hypercall has somehow succeed, and provide kind of suitable
> looking data for the output parameters.
Edwin Török Nov. 9, 2022, 9:45 a.m. UTC | #7
> On 9 Nov 2022, at 07:48, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 08.11.2022 18:15, Roger Pau Monné wrote:
>> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>>> operation on dying domains.
>>>>>> 
>>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>>> uninitialized for any parameter fetching operations (like the
>>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>>> of view, because there's no indication that the data hasn't been
>>>>>> fetched.
>>>>> 
>>>>> While I can see how the present behavior is problematic when it comes
>>>>> to consuming supposedly returned data, ...
>>>>> 
>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>>>> 
>>>>>>     if ( unlikely(d->is_dying) )
>>>>>>     {
>>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>>>>> +        gdprintk(XENLOG_INFO,
>>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>>                  d->domain_id);
>>>>>> -        return 0;
>>>>>> +        return -EINVAL;
>>>>>>     }
>>>>> 
>>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>>> regressing callers. It is my understanding that it was deliberate to
>>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>>> to that decision).
>>>> 
>>>> I would assume that was the original intention, yes, albeit the commit
>>>> message doesn't go into details about why mimicking success is
>>>> required, it's very well possible the code relying on this was xend.
>>> 
>>> Quite possible, but you never know who else has cloned code from there.
>>> 
>>>>> Can you instead fill the data to be returned in
>>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>>> any input-only fields, so it may not be necessary to preserve
>>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>> 
>>>> Hm, it still feels kind of wrong.  We do return errors elsewhere for
>>>> operations attempted against dying domains, and that seems all fine,
>>>> not sure why paging operations need to be different in this regard.
>>>> Arm does also return -EINVAL in that case.
>>>> 
>>>> So what about postponing this change to 4.18 in order to avoid
>>>> surprises, but then taking it in its current form at the start of the
>>>> development window, as to have time to detect any issues?
>>> 
>>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>>> for comparison, since the op is pretty new there iirc.
>> 
>> Indeed, but the tools code paths are likely shared between x86 and
>> Arm, as the hypercalls are the same.
> 
> On x86 we have both xc_shadow_control() and (functional)
> xc_logdirty_control(); on Arm only the former is used, while the latter
> would also be impacted by your change. Plus you're not accounting for
> external tool stacks (like xend would be if anyone had cared to forward
> port it, when - as you said earlier - the suspicion is that the original
> change was made to "please" xend).

I don't see how returning random uninitialised data (current behaviour) or wrong data (all zeroes) is better than returning an explicit error code.

Toolstacks (e.g. XAPI) only seemingly "work" with the current behaviour, in that it is trying to detect invalid data being returned,
and the chances of the invalid data being >1000 is more likely than it being <1000, so "most of the time" the toolstack happens to work.

There was a comment in the code that the bug might be in the binding, but the binding itself seems fine (well that one at least, there might be memory corruption in other bindings
which is probably what the comment was hinting at...), which is probably why the original investigation of this bug stopped and worked around the problem toolstack
side instead of digging into the hypervisor to find the real bug, it probably trusted the hypervisor being correct too much.

Except when the random data is between 1 and 1000, in which case it accepts that value as plausible and things break, 
but not in a way in which you can reproduce (you need to rerun many times until the value on the stack happens to be in this range).

And for the latter the toolstack needs to be changed anyway to detect 0 as a bad value (i.e. same as an exception), and it already has code
to detect exceptions from these domctls.

Even if such a change was made to "please xend" it is very likely just hiding the bad/uninitialized behaviour the same way XAPI does,
it doesn't necessarily mean it worked reliably.

And even this change happens to break some code, it'd do so with a clear error code and in a reproducible way, and then such breakage can be easily fixed in the affected piece of code
(e.g. a toolstack other than XAPI which was not looking for errors from this domctl call)
Using an error code other than EINVAL as you suggest might even make finding those problems more obvious.

AFAICT XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION is only used in the OCaml C stubs (i.e. used XAPI/xenopsd) which always checks return value
and raises OCaml exception and in python lowlevel libs, which checks return value and raises python exception.

> 
>> This is a domctl interface, so we are fine to do such changes.
> 
> We're fine to make changes to domctl which are either binary compatible
> with earlier versions or which are associated with a bump of the
> interface version. The latter wouldn't help in this case, while the
> former is simply not true here. For Andrew's proposed new paging pool
> interface the behavior suggested here would of course be fully
> appropriate, demanding that tool stack either don't issue such requests
> against dying domains or that they be prepared to get back errors.
> 
> Thinking about it again I'm also not convinced EINVAL is an appropriate
> error code to use here. The operation isn't necessarily invalid; we
> only prefer to not carry out any such anymore. EOPNOTSUPP, EPERM, or
> EACCES would all seem more appropriate. Or, for ease of recognition, a
> rarely used one, e.g. ENODATA, EILSEQ, or EROFS.

EROFS on a read-only query (OP_GET_ALLOCATION) would seem wrong.
Similarly EPERM/EACCES when you're Dom0 would make someone trying to dig in the wrong place.

Other places in the code also may return EINVAL,ESRCH, EBUSY, but not sure how many of those may reach userspace,
ESRCH would seem appropriate.
EOPNOTSUP,ENODATA might be ok too.

> 
> Finally I'm not convinced of the usefulness of this dying check in the
> first place: is_dying may become set immediately after the check was
> done.


Indeed, there is an inherent race condition here (which is why the toolstack can't make this check prior to calling the domctl)
There are other places in the code where there is a BUG_ON on !is_dying, are they protected by locks,
or do they have a race condition where a dying domain can trigger it?

Best regards,
--Edwin

> 
> Jan
> 
>> I
>> understand that we want to avoid such interface changes as much as
>> possible, but I think we need to fix the hypercall to return error
>> codes rather than implementing workarounds to try to cope with a wrong
>> interface behavior in the first place.  Or else we could be
>> accumulation workarounds here in order to fool caller into thinking
>> the hypercall has somehow succeed, and provide kind of suitable
>> looking data for the output parameters.
>
Jan Beulich Nov. 9, 2022, 9:58 a.m. UTC | #8
On 09.11.2022 10:45, Edwin Torok wrote:
>> On 9 Nov 2022, at 07:48, Jan Beulich <jbeulich@suse.com> wrote:
>> On 08.11.2022 18:15, Roger Pau Monné wrote:
>>> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>>>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>>>> operation on dying domains.
>>>>>>>
>>>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>>>> uninitialized for any parameter fetching operations (like the
>>>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>>>> of view, because there's no indication that the data hasn't been
>>>>>>> fetched.
>>>>>>
>>>>>> While I can see how the present behavior is problematic when it comes
>>>>>> to consuming supposedly returned data, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>>>>>
>>>>>>>     if ( unlikely(d->is_dying) )
>>>>>>>     {
>>>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>>>>>> +        gdprintk(XENLOG_INFO,
>>>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>>>                  d->domain_id);
>>>>>>> -        return 0;
>>>>>>> +        return -EINVAL;
>>>>>>>     }
>>>>>>
>>>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>>>> regressing callers. It is my understanding that it was deliberate to
>>>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>>>> to that decision).
>>>>>
>>>>> I would assume that was the original intention, yes, albeit the commit
>>>>> message doesn't go into details about why mimicking success is
>>>>> required, it's very well possible the code relying on this was xend.
>>>>
>>>> Quite possible, but you never know who else has cloned code from there.
>>>>
>>>>>> Can you instead fill the data to be returned in
>>>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>>>> any input-only fields, so it may not be necessary to preserve
>>>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>>>
>>>>> Hm, it still feels kind of wrong.  We do return errors elsewhere for
>>>>> operations attempted against dying domains, and that seems all fine,
>>>>> not sure why paging operations need to be different in this regard.
>>>>> Arm does also return -EINVAL in that case.
>>>>>
>>>>> So what about postponing this change to 4.18 in order to avoid
>>>>> surprises, but then taking it in its current form at the start of the
>>>>> development window, as to have time to detect any issues?
>>>>
>>>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>>>> for comparison, since the op is pretty new there iirc.
>>>
>>> Indeed, but the tools code paths are likely shared between x86 and
>>> Arm, as the hypercalls are the same.
>>
>> On x86 we have both xc_shadow_control() and (functional)
>> xc_logdirty_control(); on Arm only the former is used, while the latter
>> would also be impacted by your change. Plus you're not accounting for
>> external tool stacks (like xend would be if anyone had cared to forward
>> port it, when - as you said earlier - the suspicion is that the original
>> change was made to "please" xend).
> 
> I don't see how returning random uninitialised data (current behaviour) or wrong data (all zeroes) is better than returning an explicit error code.

I didn't say anything like that. What I did say is that we cannot lightly
move from "success" to "failure".

> And even this change happens to break some code, it'd do so with a clear error code and in a reproducible way, and then such breakage can be easily fixed in the affected piece of code
> (e.g. a toolstack other than XAPI which was not looking for errors from this domctl call)

My experience with error messages (including the conveying of error codes)
by the tool stack is rather poor. I also dare to question the
"reproducible" aspect: The main risk I see here is with a multi-threaded
tool stack where one thread hasn't become / been made properly aware of
another one being in the process of cleaning up after a domain. Its racy
issuing of further domctl-s may very well not be nicely reproducible.

> AFAICT XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION is only used in the OCaml C stubs (i.e. used XAPI/xenopsd) which always checks return value
> and raises OCaml exception and in python lowlevel libs, which checks return value and raises python exception.

But XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION isn't the only sub-op affected here,
is it?

Jan
Roger Pau Monné Nov. 9, 2022, 10:11 a.m. UTC | #9
On Wed, Nov 09, 2022 at 08:48:46AM +0100, Jan Beulich wrote:
> On 08.11.2022 18:15, Roger Pau Monné wrote:
> > On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
> >> On 08.11.2022 17:43, Roger Pau Monné wrote:
> >>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
> >>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
> >>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
> >>>>> operation on dying domains.
> >>>>>
> >>>>> The current logic returns 0 and leaves the domctl parameter
> >>>>> uninitialized for any parameter fetching operations (like the
> >>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
> >>>>> of view, because there's no indication that the data hasn't been
> >>>>> fetched.
> >>>>
> >>>> While I can see how the present behavior is problematic when it comes
> >>>> to consuming supposedly returned data, ...
> >>>>
> >>>>> --- a/xen/arch/x86/mm/paging.c
> >>>>> +++ b/xen/arch/x86/mm/paging.c
> >>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
> >>>>>  
> >>>>>      if ( unlikely(d->is_dying) )
> >>>>>      {
> >>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
> >>>>> +        gdprintk(XENLOG_INFO,
> >>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
> >>>>>                   d->domain_id);
> >>>>> -        return 0;
> >>>>> +        return -EINVAL;
> >>>>>      }
> >>>>
> >>>> ... going from "success" to "failure" here has a meaningful risk of
> >>>> regressing callers. It is my understanding that it was deliberate to
> >>>> mimic success in this case (without meaning to assign "good" or "bad"
> >>>> to that decision).
> >>>
> >>> I would assume that was the original intention, yes, albeit the commit
> >>> message doesn't go into details about why mimicking success is
> >>> required, it's very well possible the code relying on this was xend.
> >>
> >> Quite possible, but you never know who else has cloned code from there.
> >>
> >>>> Can you instead fill the data to be returned in
> >>>> some simple enough way? I assume a mere memset() isn't going to be
> >>>> good enough, though (albeit public/domctl.h doesn't explicitly name
> >>>> any input-only fields, so it may not be necessary to preserve
> >>>> anything). Maybe zeroing ->mb and ->stats would do?
> >>>
> >>> Hm, it still feels kind of wrong.  We do return errors elsewhere for
> >>> operations attempted against dying domains, and that seems all fine,
> >>> not sure why paging operations need to be different in this regard.
> >>> Arm does also return -EINVAL in that case.
> >>>
> >>> So what about postponing this change to 4.18 in order to avoid
> >>> surprises, but then taking it in its current form at the start of the
> >>> development window, as to have time to detect any issues?
> >>
> >> Maybe, but to be honest I'm not convinced. Arm can't really be taken
> >> for comparison, since the op is pretty new there iirc.
> > 
> > Indeed, but the tools code paths are likely shared between x86 and
> > Arm, as the hypercalls are the same.
> 
> On x86 we have both xc_shadow_control() and (functional)
> xc_logdirty_control(); on Arm only the former is used, while the latter
> would also be impacted by your change. Plus you're not accounting for
> external tool stacks (like xend would be if anyone had cared to forward
> port it, when - as you said earlier - the suspicion is that the original
> change was made to "please" xend).

AFAICT XEN_DOMCTL_SHADOW_OP_{CLEAN,PEEK} are equally broken if no
error is returned when the domain is dying.  A caller might think the
bitmap has been fetched correctly when that's not the case, as the
bitmap buffer would be left untouched.

> > This is a domctl interface, so we are fine to do such changes.
> 
> We're fine to make changes to domctl which are either binary compatible
> with earlier versions or which are associated with a bump of the
> interface version. The latter wouldn't help in this case, while the
> former is simply not true here. For Andrew's proposed new paging pool
> interface the behavior suggested here would of course be fully
> appropriate, demanding that tool stack either don't issue such requests
> against dying domains or that they be prepared to get back errors.

I still think we need to fix this bug in Xen, and then fix any
toolstacks that rely on such bogus behavior.  Propagating such broken
interface is just going to cause more issues down the road.

> Thinking about it again I'm also not convinced EINVAL is an appropriate
> error code to use here. The operation isn't necessarily invalid; we
> only prefer to not carry out any such anymore. EOPNOTSUPP, EPERM, or
> EACCES would all seem more appropriate. Or, for ease of recognition, a
> rarely used one, e.g. ENODATA, EILSEQ, or EROFS.

I was about to use ESRCH, as that also used by track_dirty_vram() and
seems sensible.

> Finally I'm not convinced of the usefulness of this dying check in the
> first place: is_dying may become set immediately after the check was
> done.

While strictly true, this code is executed with the domain lock held,
so while is_dying might change, domain_kill() won't make progress
because of the barrier on the domain lock just after setting is_dying.

Thanks, Roger.
Jan Beulich Nov. 9, 2022, 10:23 a.m. UTC | #10
On 09.11.2022 11:11, Roger Pau Monné wrote:
> On Wed, Nov 09, 2022 at 08:48:46AM +0100, Jan Beulich wrote:
>> Finally I'm not convinced of the usefulness of this dying check in the
>> first place: is_dying may become set immediately after the check was
>> done.
> 
> While strictly true, this code is executed with the domain lock held,
> so while is_dying might change, domain_kill() won't make progress
> because of the barrier on the domain lock just after setting is_dying.

I guess I'm confused now: This code is called with the domctl lock
held, which - as said before - is a questionable thing, for serializing
things more than necessary as well as for holding this lock for
excessive periods of time. IOW I consider it wrong to depend on that
in paging_domctl() to synchronize against domain_kill(). Yet indeed that
should eliminate races at present.

Jan
Roger Pau Monné Nov. 9, 2022, 11:36 a.m. UTC | #11
On Wed, Nov 09, 2022 at 11:23:01AM +0100, Jan Beulich wrote:
> On 09.11.2022 11:11, Roger Pau Monné wrote:
> > On Wed, Nov 09, 2022 at 08:48:46AM +0100, Jan Beulich wrote:
> >> Finally I'm not convinced of the usefulness of this dying check in the
> >> first place: is_dying may become set immediately after the check was
> >> done.
> > 
> > While strictly true, this code is executed with the domain lock held,
> > so while is_dying might change, domain_kill() won't make progress
> > because of the barrier on the domain lock just after setting is_dying.
> 
> I guess I'm confused now: This code is called with the domctl lock
> held, which - as said before - is a questionable thing, for serializing
> things more than necessary as well as for holding this lock for
> excessive periods of time. IOW I consider it wrong to depend on that
> in paging_domctl() to synchronize against domain_kill(). Yet indeed that
> should eliminate races at present.

Right, both are domctls.  There are other places where is_dying get
set as part of failures in the domain create paths, but then the
paging domctl failing would be natural, as the domain is being
destroyed as part of a failed domain create.

Since I don't see replies to my other comments, do you agree on
returning an error then?

Thanks, Roger.
Jan Beulich Nov. 9, 2022, 12:02 p.m. UTC | #12
On 09.11.2022 12:36, Roger Pau Monné wrote:
> Since I don't see replies to my other comments, do you agree on
> returning an error then?

No, my view there hasn't changed. I wouldn't block a change to go in
early for 4.18, but I also wouldn't ack such.

Perhaps just one remark on your other earlier comments: While you're
right about XEN_DOMCTL_SHADOW_OP_{CLEAN,PEEK}, (effectively) random
data in the bitmap may cause a caller to do extra work, but wouldn't
look to be otherwise harmful: Considering pages dirty which aren't
is never a functional problem, while considering pages clean which
aren't is (imo) not a problem for a dying domain.

Jan
Roger Pau Monné Nov. 9, 2022, 1:22 p.m. UTC | #13
On Wed, Nov 09, 2022 at 01:02:28PM +0100, Jan Beulich wrote:
> On 09.11.2022 12:36, Roger Pau Monné wrote:
> > Since I don't see replies to my other comments, do you agree on
> > returning an error then?
> 
> No, my view there hasn't changed. I wouldn't block a change to go in
> early for 4.18, but I also wouldn't ack such.
> 
> Perhaps just one remark on your other earlier comments: While you're
> right about XEN_DOMCTL_SHADOW_OP_{CLEAN,PEEK}, (effectively) random
> data in the bitmap may cause a caller to do extra work, but wouldn't
> look to be otherwise harmful: Considering pages dirty which aren't
> is never a functional problem, while considering pages clean which
> aren't is (imo) not a problem for a dying domain.

Can't that lead to failures elsewhere when attempts to fetch those
pages find they might have been removed from the p2m?

We are exchanging one failure path for another, but it would make more
sense to return an error here instead of uninitialized data, so that
the tools don't attempt to perform actions based on such invalid
bitmaps.

Thanks, Roger.
Andrew Cooper Nov. 9, 2022, 4:11 p.m. UTC | #14
On 08/11/2022 11:38, Roger Pau Monne wrote:
> Like on the Arm side, return -EINVAL when attempting to do a p2m
> operation on dying domains.

Honestly, I'd drop the comment about ARM.  "the Arm side" has existed
for of all of a couple of weeks.

A far better justification is because almost all other DOMCTLs are
rejected with -EINVAL against dying domains.

> The current logic returns 0 and leaves the domctl parameter
> uninitialized for any parameter fetching operations (like the
> GET_ALLOCATION operation), which is not helpful from a toolstack point
> of view, because there's no indication that the data hasn't been
> fetched.
>
> Reported-by: Edwin Török <edvin.torok@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>, ideally with a
modified commit message.

Xen's behaviour is definitely bogus, whatever the perceived intention
behind this change was originally.

System-wide memory handling is unusably broken for known reasons, let
alone unexpected surprises like this, it is not credible to suggest that
unspecified obsolete code might be broken by such a change; it's
definitely broken, and what we risk is exposing a previously hidden error.

Not that this is relevant, because Xend only makes this hypercall
bounded by domctls which do yield -EINVAL for dying domains.

~Andrew
Jan Beulich Nov. 10, 2022, 9:02 a.m. UTC | #15
On 09.11.2022 17:11, Andrew Cooper wrote:
> On 08/11/2022 11:38, Roger Pau Monne wrote:
>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>> operation on dying domains.
> 
> Honestly, I'd drop the comment about ARM.  "the Arm side" has existed
> for of all of a couple of weeks.
> 
> A far better justification is because almost all other DOMCTLs are
> rejected with -EINVAL against dying domains.

Would you mind supplying data to prove this statement? When looking
at just x86'es arch_do_domctl() I can't see this being the case for
ioport_permission, getpageframeinfo3, hypercall_init, set_address_size,
get_address_size, sendtrigger, bind_pt_irq, unbind_pt_irq,
ioport_mapping, set_ext_vcpucontext, and get_ext_vcpucontext. At which
point I stopped checking further because in the order they appear in
the file these are _all_ except shadow_op and [gs]ethvmcontext*. Am I
overlooking hidden checks of ->is_dying?

Jan
Jan Beulich Nov. 10, 2022, 9:03 a.m. UTC | #16
On 09.11.2022 14:22, Roger Pau Monné wrote:
> On Wed, Nov 09, 2022 at 01:02:28PM +0100, Jan Beulich wrote:
>> On 09.11.2022 12:36, Roger Pau Monné wrote:
>>> Since I don't see replies to my other comments, do you agree on
>>> returning an error then?
>>
>> No, my view there hasn't changed. I wouldn't block a change to go in
>> early for 4.18, but I also wouldn't ack such.
>>
>> Perhaps just one remark on your other earlier comments: While you're
>> right about XEN_DOMCTL_SHADOW_OP_{CLEAN,PEEK}, (effectively) random
>> data in the bitmap may cause a caller to do extra work, but wouldn't
>> look to be otherwise harmful: Considering pages dirty which aren't
>> is never a functional problem, while considering pages clean which
>> aren't is (imo) not a problem for a dying domain.
> 
> Can't that lead to failures elsewhere when attempts to fetch those
> pages find they might have been removed from the p2m?

Quite possible, yes.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 3a355eee9c..3e7be07e86 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -694,9 +694,10 @@  int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
 
     if ( unlikely(d->is_dying) )
     {
-        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
+        gdprintk(XENLOG_INFO,
+                 "Tried to do a paging domctl op on dying domain %u\n",
                  d->domain_id);
-        return 0;
+        return -EINVAL;
     }
 
     if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )