diff mbox series

[v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs

Message ID 20200306160254.8465-1-paul@xen.org (mailing list archive)
State Superseded
Headers show
Series [v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs | expand

Commit Message

Paul Durrant March 6, 2020, 4:02 p.m. UTC
From: Varad Gautam <vrd@amazon.de>

XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
In that scenario, it is possible to receive multiple __pirq_guest_unbind
calls for the same pirq from domain_kill, if the pirq has not yet been
removed from the domain's pirq_tree, as:
  domain_kill()
    -> domain_relinquish_resources()
      -> pci_release_devices()
        -> pci_clean_dpci_irq()
          -> pirq_guest_unbind()
            -> __pirq_guest_unbind()

For a shared pirq (nr_guests > 1), the first call would zap the current
domain from the pirq's guests[] list, but the action handler is never freed
as there are other guests using this pirq. As a result, on the second call,
__pirq_guest_unbind searches for the current domain which has been removed
from the guests[] list, and hits a BUG_ON.

Make __pirq_guest_unbind safe to be called multiple times by letting xen
continue if a shared pirq has already been unbound from this guest. The
PIRQ will be cleaned up from the domain's pirq_tree during the destruction
in complete_domain_destroy anyway.

Signed-off-by: Varad Gautam <vrd@amazon.de>
[taking over from Varad at v4]
Signed-off-by: Paul Durrant <paul@xen.org>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Roger suggested cleaning the entry from the domain pirq_tree so that
we need not make it safe to re-call __pirq_guest_unbind(). This seems like
a reasonable suggestion but the semantics of the code are almost
impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
the name of struct so you generally have little idea what it actally means)
so I prefer to stick with a small fix that I can actually reason about.

v4:
 - Re-work the guest array search to make it clearer

v3:
  - Style fixups

v2:
 - Split the check on action->nr_guests > 0 and make it an ASSERT
---
 xen/arch/x86/irq.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

Comments

Jan Beulich March 9, 2020, 4:29 p.m. UTC | #1
On 06.03.2020 17:02, paul@xen.org wrote:
> From: Varad Gautam <vrd@amazon.de>
> 
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple __pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>   domain_kill()
>     -> domain_relinquish_resources()
>       -> pci_release_devices()
>         -> pci_clean_dpci_irq()
>           -> pirq_guest_unbind()
>             -> __pirq_guest_unbind()
> 
> For a shared pirq (nr_guests > 1), the first call would zap the current
> domain from the pirq's guests[] list, but the action handler is never freed
> as there are other guests using this pirq. As a result, on the second call,
> __pirq_guest_unbind searches for the current domain which has been removed
> from the guests[] list, and hits a BUG_ON.
> 
> Make __pirq_guest_unbind safe to be called multiple times by letting xen
> continue if a shared pirq has already been unbound from this guest. The
> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> in complete_domain_destroy anyway.
> 
> Signed-off-by: Varad Gautam <vrd@amazon.de>
> [taking over from Varad at v4]
> Signed-off-by: Paul Durrant <paul@xen.org>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Roger suggested cleaning the entry from the domain pirq_tree so that
> we need not make it safe to re-call __pirq_guest_unbind(). This seems like
> a reasonable suggestion but the semantics of the code are almost
> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
> the name of struct so you generally have little idea what it actally means)
> so I prefer to stick with a small fix that I can actually reason about.
> 
> v4:
>  - Re-work the guest array search to make it clearer

I.e. there are cosmetic differences to v3 (see below), but
technically it's still the same. I can't believe the re-use
of "pirq" for different entities is this big of a problem.
But anyway:

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind(
>  
>      BUG_ON(!(desc->status & IRQ_GUEST));
>  
> -    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
> -        continue;
> -    BUG_ON(i == action->nr_guests);
> +    for ( i = 0; i < action->nr_guests; i++ )
> +        if ( action->guest[i] == d )
> +            break;
> +
> +    if ( i == action->nr_guests ) /* No matching entry */
> +    {
> +        /*
> +         * In case the pirq was shared, unbound for this domain in an earlier
> +         * call, but still existed on the domain's pirq_tree, we still reach
> +         * here if there are any later unbind calls on the same pirq. Return
> +         * if such an unbind happens.
> +         */
> +        ASSERT(action->shareable);
> +        return NULL;
> +    }
> +
> +    ASSERT(action->nr_guests > 0);

This seems pointless to have here - v3 had it inside the if(),
where it would actually guard against coming here with nr_guests
equal to zero. v3 also used if() and BUG() instead of ASSERT()
inside this if(), which to me would seem more in line with our
current ./CODING_STYLE guidelines of handling unexpected
conditions. Could you clarify why you switched things?

Jan
Paul Durrant March 9, 2020, 5:47 p.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 09 March 2020 16:29
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; Roger
> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> 
> On 06.03.2020 17:02, paul@xen.org wrote:
> > From: Varad Gautam <vrd@amazon.de>
> >
> > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> > In that scenario, it is possible to receive multiple __pirq_guest_unbind
> > calls for the same pirq from domain_kill, if the pirq has not yet been
> > removed from the domain's pirq_tree, as:
> >   domain_kill()
> >     -> domain_relinquish_resources()
> >       -> pci_release_devices()
> >         -> pci_clean_dpci_irq()
> >           -> pirq_guest_unbind()
> >             -> __pirq_guest_unbind()
> >
> > For a shared pirq (nr_guests > 1), the first call would zap the current
> > domain from the pirq's guests[] list, but the action handler is never freed
> > as there are other guests using this pirq. As a result, on the second call,
> > __pirq_guest_unbind searches for the current domain which has been removed
> > from the guests[] list, and hits a BUG_ON.
> >
> > Make __pirq_guest_unbind safe to be called multiple times by letting xen
> > continue if a shared pirq has already been unbound from this guest. The
> > PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> > in complete_domain_destroy anyway.
> >
> > Signed-off-by: Varad Gautam <vrd@amazon.de>
> > [taking over from Varad at v4]
> > Signed-off-by: Paul Durrant <paul@xen.org>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Roger Pau Monné <roger.pau@citrix.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >
> > Roger suggested cleaning the entry from the domain pirq_tree so that
> > we need not make it safe to re-call __pirq_guest_unbind(). This seems like
> > a reasonable suggestion but the semantics of the code are almost
> > impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
> > the name of struct so you generally have little idea what it actally means)
> > so I prefer to stick with a small fix that I can actually reason about.
> >
> > v4:
> >  - Re-work the guest array search to make it clearer
> 
> I.e. there are cosmetic differences to v3 (see below), but
> technically it's still the same. I can't believe the re-use
> of "pirq" for different entities is this big of a problem.

Please suggest code if you think it ought to be done differentely. I tried.

> But anyway:
> 
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind(
> >
> >      BUG_ON(!(desc->status & IRQ_GUEST));
> >
> > -    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
> > -        continue;
> > -    BUG_ON(i == action->nr_guests);
> > +    for ( i = 0; i < action->nr_guests; i++ )
> > +        if ( action->guest[i] == d )
> > +            break;
> > +
> > +    if ( i == action->nr_guests ) /* No matching entry */
> > +    {
> > +        /*
> > +         * In case the pirq was shared, unbound for this domain in an earlier
> > +         * call, but still existed on the domain's pirq_tree, we still reach
> > +         * here if there are any later unbind calls on the same pirq. Return
> > +         * if such an unbind happens.
> > +         */
> > +        ASSERT(action->shareable);
> > +        return NULL;
> > +    }
> > +
> > +    ASSERT(action->nr_guests > 0);
> 
> This seems pointless to have here - v3 had it inside the if(),
> where it would actually guard against coming here with nr_guests
> equal to zero.

Why. The code just after this decrements nr_guests so it seems like entirely the right point to have the ASSERT. I can make it ASSERT >= 0, if that makes more sense.

> v3 also used if() and BUG() instead of ASSERT()
> inside this if(), which to me would seem more in line with our
> current ./CODING_STYLE guidelines of handling unexpected
> conditions. Could you clarify why you switched things?
> 

Because I don't think there is need to kill the host in a non-debug context if we hit this condition; it is entirely survivable as far as I can tell so a BUG_ON() did not seem appropriate.

  Paul
Jan Beulich March 10, 2020, 11:23 a.m. UTC | #3
On 09.03.2020 18:47, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 09 March 2020 16:29
>>
>> On 06.03.2020 17:02, paul@xen.org wrote:
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind(
>>>
>>>      BUG_ON(!(desc->status & IRQ_GUEST));
>>>
>>> -    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
>>> -        continue;
>>> -    BUG_ON(i == action->nr_guests);
>>> +    for ( i = 0; i < action->nr_guests; i++ )
>>> +        if ( action->guest[i] == d )
>>> +            break;
>>> +
>>> +    if ( i == action->nr_guests ) /* No matching entry */
>>> +    {
>>> +        /*
>>> +         * In case the pirq was shared, unbound for this domain in an earlier
>>> +         * call, but still existed on the domain's pirq_tree, we still reach
>>> +         * here if there are any later unbind calls on the same pirq. Return
>>> +         * if such an unbind happens.
>>> +         */
>>> +        ASSERT(action->shareable);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    ASSERT(action->nr_guests > 0);
>>
>> This seems pointless to have here - v3 had it inside the if(),
>> where it would actually guard against coming here with nr_guests
>> equal to zero.
> 
> Why. The code just after this decrements nr_guests so it seems
> like entirely the right point to have the ASSERT. I can make it
> ASSERT >= 0, if that makes more sense.

There's no way to come here when nr_guests == 0. This is because
in this case the loop will be exited with i being zero, and hence
the earlier if()'s body will be entered.

(And no, >= 0 wouldn't make sense to me - it would mean we might
have a count of -1 after the decrement.)

>> v3 also used if() and BUG() instead of ASSERT()
>> inside this if(), which to me would seem more in line with our
>> current ./CODING_STYLE guidelines of handling unexpected
>> conditions. Could you clarify why you switched things?
>>
> 
> Because I don't think there is need to kill the host in a
> non-debug context if we hit this condition; it is entirely
> survivable as far as I can tell so a BUG_ON() did not seem
> appropriate.

It'll mean we have a non-sharable IRQ in a place where this is
not supposed to happen. How can we be sure the system is in a
state allowing to safely continue? To me, if shareable / non-
shareable is of any concern here, then it ought to be BUG().
If it's not, then the ASSERT() ought to be dropped as well.

Jan
Paul Durrant March 10, 2020, 12:36 p.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 10 March 2020 11:23
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Varad Gautam' <vrd@amazon.de>; 'Julien Grall' <julien@xen.org>;
> 'Roger Pau Monné' <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> 
> On 09.03.2020 18:47, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 09 March 2020 16:29
> >>
> >> On 06.03.2020 17:02, paul@xen.org wrote:
> >>> --- a/xen/arch/x86/irq.c
> >>> +++ b/xen/arch/x86/irq.c
> >>> @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind(
> >>>
> >>>      BUG_ON(!(desc->status & IRQ_GUEST));
> >>>
> >>> -    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
> >>> -        continue;
> >>> -    BUG_ON(i == action->nr_guests);
> >>> +    for ( i = 0; i < action->nr_guests; i++ )
> >>> +        if ( action->guest[i] == d )
> >>> +            break;
> >>> +
> >>> +    if ( i == action->nr_guests ) /* No matching entry */
> >>> +    {
> >>> +        /*
> >>> +         * In case the pirq was shared, unbound for this domain in an earlier
> >>> +         * call, but still existed on the domain's pirq_tree, we still reach
> >>> +         * here if there are any later unbind calls on the same pirq. Return
> >>> +         * if such an unbind happens.
> >>> +         */
> >>> +        ASSERT(action->shareable);
> >>> +        return NULL;
> >>> +    }
> >>> +
> >>> +    ASSERT(action->nr_guests > 0);
> >>
> >> This seems pointless to have here - v3 had it inside the if(),
> >> where it would actually guard against coming here with nr_guests
> >> equal to zero.
> >
> > Why. The code just after this decrements nr_guests so it seems
> > like entirely the right point to have the ASSERT. I can make it
> > ASSERT >= 0, if that makes more sense.
> 
> There's no way to come here when nr_guests == 0. This is because
> in this case the loop will be exited with i being zero, and hence
> the earlier if()'s body will be entered.

Ok, yes, that's true.

> 
> (And no, >= 0 wouldn't make sense to me - it would mean we might
> have a count of -1 after the decrement.)
> 
> >> v3 also used if() and BUG() instead of ASSERT()
> >> inside this if(), which to me would seem more in line with our
> >> current ./CODING_STYLE guidelines of handling unexpected
> >> conditions. Could you clarify why you switched things?
> >>
> >
> > Because I don't think there is need to kill the host in a
> > non-debug context if we hit this condition; it is entirely
> > survivable as far as I can tell so a BUG_ON() did not seem
> > appropriate.
> 
> It'll mean we have a non-sharable IRQ in a place where this is
> not supposed to happen. How can we be sure the system is in a
> state allowing to safely continue? To me, if shareable / non-
> shareable is of any concern here, then it ought to be BUG().
> If it's not, then the ASSERT() ought to be dropped as well.

Ok, I'll convert back to a BUG().

  Paul

> 
> Jan
Jan Beulich March 10, 2020, 1:38 p.m. UTC | #5
On 10.03.2020 13:36, Paul Durrant wrote:
> Ok, I'll convert back to a BUG().

Wait a little - I think I have an alternative proposal. Just want to
at least smoke test it before sending out.

Jan
Jan Beulich March 10, 2020, 2:19 p.m. UTC | #6
On 09.03.2020 18:47, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 09 March 2020 16:29
>> To: paul@xen.org
>> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; Roger
>> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com>
>> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
>>
>> On 06.03.2020 17:02, paul@xen.org wrote:
>>> From: Varad Gautam <vrd@amazon.de>
>>>
>>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
>>> In that scenario, it is possible to receive multiple __pirq_guest_unbind
>>> calls for the same pirq from domain_kill, if the pirq has not yet been
>>> removed from the domain's pirq_tree, as:
>>>   domain_kill()
>>>     -> domain_relinquish_resources()
>>>       -> pci_release_devices()
>>>         -> pci_clean_dpci_irq()
>>>           -> pirq_guest_unbind()
>>>             -> __pirq_guest_unbind()
>>>
>>> For a shared pirq (nr_guests > 1), the first call would zap the current
>>> domain from the pirq's guests[] list, but the action handler is never freed
>>> as there are other guests using this pirq. As a result, on the second call,
>>> __pirq_guest_unbind searches for the current domain which has been removed
>>> from the guests[] list, and hits a BUG_ON.
>>>
>>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
>>> continue if a shared pirq has already been unbound from this guest. The
>>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
>>> in complete_domain_destroy anyway.
>>>
>>> Signed-off-by: Varad Gautam <vrd@amazon.de>
>>> [taking over from Varad at v4]
>>> Signed-off-by: Paul Durrant <paul@xen.org>
>>> ---
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Julien Grall <julien@xen.org>
>>> Cc: Roger Pau Monné <roger.pau@citrix.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> Roger suggested cleaning the entry from the domain pirq_tree so that
>>> we need not make it safe to re-call __pirq_guest_unbind(). This seems like
>>> a reasonable suggestion but the semantics of the code are almost
>>> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
>>> the name of struct so you generally have little idea what it actally means)
>>> so I prefer to stick with a small fix that I can actually reason about.
>>>
>>> v4:
>>>  - Re-work the guest array search to make it clearer
>>
>> I.e. there are cosmetic differences to v3 (see below), but
>> technically it's still the same. I can't believe the re-use
>> of "pirq" for different entities is this big of a problem.
> 
> Please suggest code if you think it ought to be done differentely. I tried.

How about this? It's admittedly more code, but imo less ad hoc.
I've smoke tested it, but I depend on you or Varad to check that
it actually addresses the reported issue.

Jan

x86/pass-through: avoid double IRQ unbind during domain cleanup

XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
In that scenario, it is possible to receive multiple _pirq_guest_unbind
calls for the same pirq from domain_kill, if the pirq has not yet been
removed from the domain's pirq_tree, as:
  domain_kill()
    -> domain_relinquish_resources()
      -> pci_release_devices()
        -> pci_clean_dpci_irq()
          -> pirq_guest_unbind()
            -> __pirq_guest_unbind()

Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
from the tree being iterated after the first call there. In case such a
removed entry still has a softirq outstanding, record it and re-check
upon re-invocation.

Reported-by: Varad Gautam <vrd@amazon.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- unstable.orig/xen/arch/x86/irq.c
+++ unstable/xen/arch/x86/irq.c
@@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
     }
 
     if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
-        BUG();
+        BUG_ON(!d->is_dying);
 }
 
 /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */
--- unstable.orig/xen/drivers/passthrough/pci.c
+++ unstable/xen/drivers/passthrough/pci.c
@@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom
         xfree(digl);
     }
 
-    return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
+    radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq);
+
+    if ( !pt_pirq_softirq_active(pirq_dpci) )
+        return 0;
+
+    domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci;
+
+    return -ERESTART;
 }
 
 static int pci_clean_dpci_irqs(struct domain *d)
@@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do
     hvm_irq_dpci = domain_get_irq_dpci(d);
     if ( hvm_irq_dpci != NULL )
     {
-        int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
+        int ret = 0;
+
+        if ( hvm_irq_dpci->pending_pirq_dpci )
+        {
+            if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) )
+                 ret = -ERESTART;
+            else
+                 hvm_irq_dpci->pending_pirq_dpci = NULL;
+        }
 
+        if ( !ret )
+            ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
         if ( ret )
         {
             spin_unlock(&d->event_lock);
--- unstable.orig/xen/include/asm-x86/hvm/irq.h
+++ unstable/xen/include/asm-x86/hvm/irq.h
@@ -158,6 +158,8 @@ struct hvm_irq_dpci {
     DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
     /* Record of mapped Links */
     uint8_t link_cnt[NR_LINK];
+    /* Clean up: Entry with a softirq invocation pending / in progress. */
+    struct hvm_pirq_dpci *pending_pirq_dpci;
 };
 
 /* Machine IRQ to guest device/intx mapping. */
Paul Durrant March 17, 2020, 3:23 p.m. UTC | #7
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 10 March 2020 14:19
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Varad Gautam' <vrd@amazon.de>; 'Julien Grall' <julien@xen.org>;
> 'Roger Pau Monné' <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> 
> On 09.03.2020 18:47, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 09 March 2020 16:29
> >> To: paul@xen.org
> >> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>;
> Roger
> >> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com>
> >> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> >>
> >> On 06.03.2020 17:02, paul@xen.org wrote:
> >>> From: Varad Gautam <vrd@amazon.de>
> >>>
> >>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> >>> In that scenario, it is possible to receive multiple __pirq_guest_unbind
> >>> calls for the same pirq from domain_kill, if the pirq has not yet been
> >>> removed from the domain's pirq_tree, as:
> >>>   domain_kill()
> >>>     -> domain_relinquish_resources()
> >>>       -> pci_release_devices()
> >>>         -> pci_clean_dpci_irq()
> >>>           -> pirq_guest_unbind()
> >>>             -> __pirq_guest_unbind()
> >>>
> >>> For a shared pirq (nr_guests > 1), the first call would zap the current
> >>> domain from the pirq's guests[] list, but the action handler is never freed
> >>> as there are other guests using this pirq. As a result, on the second call,
> >>> __pirq_guest_unbind searches for the current domain which has been removed
> >>> from the guests[] list, and hits a BUG_ON.
> >>>
> >>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
> >>> continue if a shared pirq has already been unbound from this guest. The
> >>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> >>> in complete_domain_destroy anyway.
> >>>
> >>> Signed-off-by: Varad Gautam <vrd@amazon.de>
> >>> [taking over from Varad at v4]
> >>> Signed-off-by: Paul Durrant <paul@xen.org>
> >>> ---
> >>> Cc: Jan Beulich <jbeulich@suse.com>
> >>> Cc: Julien Grall <julien@xen.org>
> >>> Cc: Roger Pau Monné <roger.pau@citrix.com>
> >>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>
> >>> Roger suggested cleaning the entry from the domain pirq_tree so that
> >>> we need not make it safe to re-call __pirq_guest_unbind(). This seems like
> >>> a reasonable suggestion but the semantics of the code are almost
> >>> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
> >>> the name of struct so you generally have little idea what it actally means)
> >>> so I prefer to stick with a small fix that I can actually reason about.
> >>>
> >>> v4:
> >>>  - Re-work the guest array search to make it clearer
> >>
> >> I.e. there are cosmetic differences to v3 (see below), but
> >> technically it's still the same. I can't believe the re-use
> >> of "pirq" for different entities is this big of a problem.
> >
> > Please suggest code if you think it ought to be done differentely. I tried.
> 
> How about this? It's admittedly more code, but imo less ad hoc.
> I've smoke tested it, but I depend on you or Varad to check that
> it actually addresses the reported issue.

It's fairly hard to hit IIRC but we could probably engineer it with a one off ERESTART in the right place.

> 
> Jan
> 
> x86/pass-through: avoid double IRQ unbind during domain cleanup
> 
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>   domain_kill()
>     -> domain_relinquish_resources()
>       -> pci_release_devices()
>         -> pci_clean_dpci_irq()
>           -> pirq_guest_unbind()
>             -> __pirq_guest_unbind()
> 
> Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
> from the tree being iterated after the first call there. In case such a
> removed entry still has a softirq outstanding, record it and re-check
> upon re-invocation.
> 
> Reported-by: Varad Gautam <vrd@amazon.de>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- unstable.orig/xen/arch/x86/irq.c
> +++ unstable/xen/arch/x86/irq.c
> @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
>      }
> 
>      if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
> -        BUG();
> +        BUG_ON(!d->is_dying);
>  }
> 
>  /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */
> --- unstable.orig/xen/drivers/passthrough/pci.c
> +++ unstable/xen/drivers/passthrough/pci.c
> @@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom
>          xfree(digl);
>      }
> 
> -    return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
> +    radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq);
> +
> +    if ( !pt_pirq_softirq_active(pirq_dpci) )
> +        return 0;
> +
> +    domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci;
> +
> +    return -ERESTART;
>  }
> 
>  static int pci_clean_dpci_irqs(struct domain *d)
> @@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do
>      hvm_irq_dpci = domain_get_irq_dpci(d);
>      if ( hvm_irq_dpci != NULL )
>      {
> -        int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
> +        int ret = 0;
> +
> +        if ( hvm_irq_dpci->pending_pirq_dpci )
> +        {
> +            if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) )
> +                 ret = -ERESTART;
> +            else
> +                 hvm_irq_dpci->pending_pirq_dpci = NULL;
> +        }
> 
> +        if ( !ret )
> +            ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
>          if ( ret )
>          {
>              spin_unlock(&d->event_lock);
> --- unstable.orig/xen/include/asm-x86/hvm/irq.h
> +++ unstable/xen/include/asm-x86/hvm/irq.h
> @@ -158,6 +158,8 @@ struct hvm_irq_dpci {
>      DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
>      /* Record of mapped Links */
>      uint8_t link_cnt[NR_LINK];
> +    /* Clean up: Entry with a softirq invocation pending / in progress. */
> +    struct hvm_pirq_dpci *pending_pirq_dpci;
>  };
> 

That looks like it will do the job. I'll see if I can get it tested.

  Paul

>  /* Machine IRQ to guest device/intx mapping. */
>
Jan Beulich March 31, 2020, 7:40 a.m. UTC | #8
On 17.03.2020 16:23, Paul Durrant wrote:
> That looks like it will do the job. I'll see if I can get it tested.

Any luck with this, yet?

Jan
Paul Durrant March 31, 2020, 11:51 a.m. UTC | #9
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 31 March 2020 08:41
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Varad Gautam' <vrd@amazon.de>; 'Julien Grall' <julien@xen.org>;
> 'Roger Pau Monné' <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> 
> On 17.03.2020 16:23, Paul Durrant wrote:
> > That looks like it will do the job. I'll see if I can get it tested.
> 
> Any luck with this, yet?
> 

I have asked Varad to test it. He hopes to get to it this week.

  Paul
Jan Beulich April 23, 2020, 11:08 a.m. UTC | #10
On 31.03.2020 13:51, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 31 March 2020 08:41
>>
>> On 17.03.2020 16:23, Paul Durrant wrote:
>>> That looks like it will do the job. I'll see if I can get it tested.
>>
>> Any luck with this, yet?
> 
> I have asked Varad to test it. He hopes to get to it this week.

Any news here?

Jan
Paul Durrant April 23, 2020, 3:14 p.m. UTC | #11
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 23 April 2020 12:09
> To: paul@xen.org; 'Varad Gautam' <vrd@amazon.de>
> Cc: xen-devel@lists.xenproject.org; 'Julien Grall' <julien@xen.org>; 'Roger Pau Monné'
> <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> 
> On 31.03.2020 13:51, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 31 March 2020 08:41
> >>
> >> On 17.03.2020 16:23, Paul Durrant wrote:
> >>> That looks like it will do the job. I'll see if I can get it tested.
> >>
> >> Any luck with this, yet?
> >
> > I have asked Varad to test it. He hopes to get to it this week.
> 
> Any news here?
> 

Varad tells me he is currently testing and will get back to you soon (hopefully today).

  Paul
vrd@amazon.com April 28, 2020, 11:58 a.m. UTC | #12
Hi Jan,

On 3/10/20 3:19 PM, Jan Beulich wrote:
> On 09.03.2020 18:47, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 09 March 2020 16:29
>>> To: paul@xen.org
>>> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; Roger
>>> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com>
>>> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
>>>
>>> On 06.03.2020 17:02, paul@xen.org wrote:
>>>> From: Varad Gautam <vrd@amazon.de>
>>>>
>>>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
>>>> In that scenario, it is possible to receive multiple __pirq_guest_unbind
>>>> calls for the same pirq from domain_kill, if the pirq has not yet been
>>>> removed from the domain's pirq_tree, as:
>>>>    domain_kill()
>>>>      -> domain_relinquish_resources()
>>>>        -> pci_release_devices()
>>>>          -> pci_clean_dpci_irq()
>>>>            -> pirq_guest_unbind()
>>>>              -> __pirq_guest_unbind()
>>>>
>>>> For a shared pirq (nr_guests > 1), the first call would zap the current
>>>> domain from the pirq's guests[] list, but the action handler is never freed
>>>> as there are other guests using this pirq. As a result, on the second call,
>>>> __pirq_guest_unbind searches for the current domain which has been removed
>>>> from the guests[] list, and hits a BUG_ON.
>>>>
>>>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
>>>> continue if a shared pirq has already been unbound from this guest. The
>>>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
>>>> in complete_domain_destroy anyway.
>>>>
>>>> Signed-off-by: Varad Gautam <vrd@amazon.de>
>>>> [taking over from Varad at v4]
>>>> Signed-off-by: Paul Durrant <paul@xen.org>
>>>> ---
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Julien Grall <julien@xen.org>
>>>> Cc: Roger Pau Monné <roger.pau@citrix.com>
>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>
>>>> Roger suggested cleaning the entry from the domain pirq_tree so that
>>>> we need not make it safe to re-call __pirq_guest_unbind(). This seems like
>>>> a reasonable suggestion but the semantics of the code are almost
>>>> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
>>>> the name of struct so you generally have little idea what it actally means)
>>>> so I prefer to stick with a small fix that I can actually reason about.
>>>>
>>>> v4:
>>>>   - Re-work the guest array search to make it clearer
>>> I.e. there are cosmetic differences to v3 (see below), but
>>> technically it's still the same. I can't believe the re-use
>>> of "pirq" for different entities is this big of a problem.
>> Please suggest code if you think it ought to be done differentely. I tried.
> How about this? It's admittedly more code, but imo less ad hoc.
> I've smoke tested it, but I depend on you or Varad to check that
> it actually addresses the reported issue.
>
> Jan
>
> x86/pass-through: avoid double IRQ unbind during domain cleanup


I have tested that this patch prevents __pirq_guest_unbind on an 
already-unbound pirq
during the continuation call for domain_kill -ERESTART, by using a 
modified xen that
forces an -ERESTART from pirq_guest_unbind to create the continuation. 
It fixes the
underlying issue.

Tested-by: Varad Gautam <vrd@amazon.de>


>
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>    domain_kill()
>      -> domain_relinquish_resources()
>        -> pci_release_devices()
>          -> pci_clean_dpci_irq()
>            -> pirq_guest_unbind()
>              -> __pirq_guest_unbind()
>
> Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
> from the tree being iterated after the first call there. In case such a
> removed entry still has a softirq outstanding, record it and re-check
> upon re-invocation.
>
> Reported-by: Varad Gautam <vrd@amazon.de>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- unstable.orig/xen/arch/x86/irq.c
> +++ unstable/xen/arch/x86/irq.c
> @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
>       }
>
>       if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
> -        BUG();
> +        BUG_ON(!d->is_dying);
>   }
>
>   /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */
> --- unstable.orig/xen/drivers/passthrough/pci.c
> +++ unstable/xen/drivers/passthrough/pci.c
> @@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom
>           xfree(digl);
>       }
>
> -    return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
> +    radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq);
> +
> +    if ( !pt_pirq_softirq_active(pirq_dpci) )
> +        return 0;
> +
> +    domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci;
> +
> +    return -ERESTART;
>   }
>
>   static int pci_clean_dpci_irqs(struct domain *d)
> @@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do
>       hvm_irq_dpci = domain_get_irq_dpci(d);
>       if ( hvm_irq_dpci != NULL )
>       {
> -        int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
> +        int ret = 0;
> +
> +        if ( hvm_irq_dpci->pending_pirq_dpci )
> +        {
> +            if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) )
> +                 ret = -ERESTART;
> +            else
> +                 hvm_irq_dpci->pending_pirq_dpci = NULL;
> +        }
>
> +        if ( !ret )
> +            ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
>           if ( ret )
>           {
>               spin_unlock(&d->event_lock);
> --- unstable.orig/xen/include/asm-x86/hvm/irq.h
> +++ unstable/xen/include/asm-x86/hvm/irq.h
> @@ -158,6 +158,8 @@ struct hvm_irq_dpci {
>       DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
>       /* Record of mapped Links */
>       uint8_t link_cnt[NR_LINK];
> +    /* Clean up: Entry with a softirq invocation pending / in progress. */
> +    struct hvm_pirq_dpci *pending_pirq_dpci;
>   };
>
>   /* Machine IRQ to guest device/intx mapping. */
>
>




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Jan Beulich April 28, 2020, 12:18 p.m. UTC | #13
On 28.04.2020 13:58, vrd@amazon.com wrote:
> On 3/10/20 3:19 PM, Jan Beulich wrote:
>> On 09.03.2020 18:47, Paul Durrant wrote:
>>> Please suggest code if you think it ought to be done differentely. I tried.
>> How about this? It's admittedly more code, but imo less ad hoc.
>> I've smoke tested it, but I depend on you or Varad to check that
>> it actually addresses the reported issue.
>>
>> Jan
>>
>> x86/pass-through: avoid double IRQ unbind during domain cleanup
> 
> 
> I have tested that this patch prevents __pirq_guest_unbind on an already-unbound pirq
> during the continuation call for domain_kill -ERESTART, by using a modified xen that
> forces an -ERESTART from pirq_guest_unbind to create the continuation. It fixes the
> underlying issue.
> 
> Tested-by: Varad Gautam <vrd@amazon.de>

Thanks much; I'll formally submit the patch then.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..32fcb624dc 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1680,9 +1680,23 @@  static irq_guest_action_t *__pirq_guest_unbind(
 
     BUG_ON(!(desc->status & IRQ_GUEST));
 
-    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
-        continue;
-    BUG_ON(i == action->nr_guests);
+    for ( i = 0; i < action->nr_guests; i++ )
+        if ( action->guest[i] == d )
+            break;
+
+    if ( i == action->nr_guests ) /* No matching entry */
+    {
+        /*
+         * In case the pirq was shared, unbound for this domain in an earlier
+         * call, but still existed on the domain's pirq_tree, we still reach
+         * here if there are any later unbind calls on the same pirq. Return
+         * if such an unbind happens.
+         */
+        ASSERT(action->shareable);
+        return NULL;
+    }
+
+    ASSERT(action->nr_guests > 0);
     memmove(&action->guest[i], &action->guest[i+1],
             (action->nr_guests-i-1) * sizeof(action->guest[0]));
     action->nr_guests--;