diff mbox series

[v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs

Message ID 1576666417-20989-1-git-send-email-vrd@amazon.de (mailing list archive)
State Superseded
Headers show
Series [v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs | expand

Commit Message

Varad Gautam Dec. 18, 2019, 10:53 a.m. UTC
XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
In that scenario, it is possible to receive multiple _pirq_guest_unbind
calls for the same pirq from domain_kill, if the pirq has not yet been
removed from the domain's pirq_tree, as:
  domain_kill()
    -> domain_relinquish_resources()
      -> pci_release_devices()
        -> pci_clean_dpci_irq()
          -> pirq_guest_unbind()
            -> __pirq_guest_unbind()

For a shared pirq (nr_guests > 1), the first call would zap the current
domain from the pirq's guests[] list, but the action handler is never freed
as there are other guests using this pirq. As a result, on the second call,
__pirq_guest_unbind searches for the current domain which has been removed
from the guests[] list, and hits a BUG_ON.

Make __pirq_guest_unbind safe to be called multiple times by letting xen
continue if a shared pirq has already been unbound from this guest. The
PIRQ will be cleaned up from the domain's pirq_tree during the destruction
in complete_domain_destroy anyways.

Signed-off-by: Varad Gautam <vrd@amazon.de>
CC: Jan Beulich <jbeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword.
---
 xen/arch/x86/irq.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Paul Durrant Dec. 18, 2019, 11:13 a.m. UTC | #1
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Varad Gautam
> Sent: 18 December 2019 10:54
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Gautam, Varad
> <vrd@amazon.de>; Jan Beulich <jbeulich@suse.com>; Roger Pau Monné
> <roger.pau@citrix.com>
> Subject: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind
> calls for shared pirqs
> 
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>   domain_kill()
>     -> domain_relinquish_resources()
>       -> pci_release_devices()
>         -> pci_clean_dpci_irq()
>           -> pirq_guest_unbind()
>             -> __pirq_guest_unbind()
> 
> For a shared pirq (nr_guests > 1), the first call would zap the current
> domain from the pirq's guests[] list, but the action handler is never
> freed
> as there are other guests using this pirq. As a result, on the second
> call,
> __pirq_guest_unbind searches for the current domain which has been removed
> from the guests[] list, and hits a BUG_ON.
> 
> Make __pirq_guest_unbind safe to be called multiple times by letting xen
> continue if a shared pirq has already been unbound from this guest. The
> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> in complete_domain_destroy anyways.
> 
> Signed-off-by: Varad Gautam <vrd@amazon.de>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Paul Durrant <pdurrant@amazon.com>

> 
> v2: Split the check on action->nr_guests > 0 and make it an ASSERT,
> reword.
> ---
>  xen/arch/x86/irq.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 5d0d94c..3eb7b22 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
> 
>      for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++
> )
>          continue;
> -    BUG_ON(i == action->nr_guests);
> +    if ( i == action->nr_guests ) {
> +        ASSERT(action->nr_guests > 0) ;
> +        /* In case the pirq was shared, unbound for this domain in an
> earlier call, but still
> +         * existed on the domain's pirq_tree, we still reach here if
> there are any later
> +         * unbind calls on the same pirq. Return if such an unbind
> happens. */
> +        if ( action->shareable )
> +            return NULL;
> +        BUG();
> +    }
> +
>      memmove(&action->guest[i], &action->guest[i+1],
>              (action->nr_guests-i-1) * sizeof(action->guest[0]));
>      action->nr_guests--;
> --
> 2.7.4
> 
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
Jan Beulich Dec. 18, 2019, 1:42 p.m. UTC | #2
On 18.12.2019 11:53, Varad Gautam wrote:
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>   domain_kill()
>     -> domain_relinquish_resources()
>       -> pci_release_devices()
>         -> pci_clean_dpci_irq()
>           -> pirq_guest_unbind()
>             -> __pirq_guest_unbind()
> 
> For a shared pirq (nr_guests > 1), the first call would zap the current
> domain from the pirq's guests[] list, but the action handler is never freed
> as there are other guests using this pirq. As a result, on the second call,
> __pirq_guest_unbind searches for the current domain which has been removed
> from the guests[] list, and hits a BUG_ON.

There must be more to this, seeing the cleanup_domain_irq_pirq()
invocation at the end of pirq_guest_unbind(), which ought to be
reached in the case you describe.

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
>  
>      for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
>          continue;
> -    BUG_ON(i == action->nr_guests);
> +    if ( i == action->nr_guests ) {

Brace on its own line please.

> +        ASSERT(action->nr_guests > 0) ;

Stray blank.

> +        /* In case the pirq was shared, unbound for this domain in an earlier call, but still
> +         * existed on the domain's pirq_tree, we still reach here if there are any later
> +         * unbind calls on the same pirq. Return if such an unbind happens. */
> +        if ( action->shareable )

Long lines and malformed comment.

Do you perhaps also want to check that you take this path only
for dying guests?

Jan
Julien Grall Dec. 18, 2019, 1:57 p.m. UTC | #3
Hi Varad,

Please send new version of a patch in a new thread rather than in-reply 
to the first version.

On 18/12/2019 10:53, Varad Gautam wrote:
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>    domain_kill()
>      -> domain_relinquish_resources()
>        -> pci_release_devices()
>          -> pci_clean_dpci_irq()
>            -> pirq_guest_unbind()
>              -> __pirq_guest_unbind()
> 
> For a shared pirq (nr_guests > 1), the first call would zap the current
> domain from the pirq's guests[] list, but the action handler is never freed
> as there are other guests using this pirq. As a result, on the second call,
> __pirq_guest_unbind searches for the current domain which has been removed
> from the guests[] list, and hits a BUG_ON.
> 
> Make __pirq_guest_unbind safe to be called multiple times by letting xen
> continue if a shared pirq has already been unbound from this guest. The
> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> in complete_domain_destroy anyways.
> 
> Signed-off-by: Varad Gautam <vrd@amazon.de>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword.
> ---
>   xen/arch/x86/irq.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 5d0d94c..3eb7b22 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
>   
>       for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
>           continue;
> -    BUG_ON(i == action->nr_guests);
> +    if ( i == action->nr_guests ) {

The { should be a new line.

> +        ASSERT(action->nr_guests > 0) ;

The space before ; is not necessary.

> +        /* In case the pirq was shared, unbound for this domain in an earlier call, but still
> +         * existed on the domain's pirq_tree, we still reach here if there are any later
> +         * unbind calls on the same pirq. Return if such an unbind happens. */

The coding style for comment is:

/*
  * Foo
  * Bar
  */

> +        if ( action->shareable )
> +            return NULL;
> +        BUG();

Given that the previous BUG_ON() was hit, would it make sense to try to 
avoid a new BUG().

So why not just returning NULL as you do for action->shareable?

> +    }
> +
>       memmove(&action->guest[i], &action->guest[i+1],
>               (action->nr_guests-i-1) * sizeof(action->guest[0]));
>       action->nr_guests--;
> 

Cheers,
vrd@amazon.com Jan. 29, 2020, 9:27 a.m. UTC | #4
Hey Julien,

On 12/18/19 2:57 PM, Julien Grall wrote:
> Hi Varad,
>
> Please send new version of a patch in a new thread rather than 
> in-reply to the first version.
>
> On 18/12/2019 10:53, Varad Gautam wrote:
>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill 
>> -ERESTARTS.
>> In that scenario, it is possible to receive multiple _pirq_guest_unbind
>> calls for the same pirq from domain_kill, if the pirq has not yet been
>> removed from the domain's pirq_tree, as:
>>    domain_kill()
>>      -> domain_relinquish_resources()
>>        -> pci_release_devices()
>>          -> pci_clean_dpci_irq()
>>            -> pirq_guest_unbind()
>>              -> __pirq_guest_unbind()
>>
>> For a shared pirq (nr_guests > 1), the first call would zap the current
>> domain from the pirq's guests[] list, but the action handler is never 
>> freed
>> as there are other guests using this pirq. As a result, on the second 
>> call,
>> __pirq_guest_unbind searches for the current domain which has been 
>> removed
>> from the guests[] list, and hits a BUG_ON.
>>
>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
>> continue if a shared pirq has already been unbound from this guest. The
>> PIRQ will be cleaned up from the domain's pirq_tree during the 
>> destruction
>> in complete_domain_destroy anyways.
>>
>> Signed-off-by: Varad Gautam <vrd@amazon.de>
>> CC: Jan Beulich <jbeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> v2: Split the check on action->nr_guests > 0 and make it an ASSERT, 
>> reword.
>> ---
>>   xen/arch/x86/irq.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>> index 5d0d94c..3eb7b22 100644
>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
>>         for ( i = 0; (i < action->nr_guests) && (action->guest[i] != 
>> d); i++ )
>>           continue;
>> -    BUG_ON(i == action->nr_guests);
>> +    if ( i == action->nr_guests ) {
>
> The { should be a new line.
>
>> +        ASSERT(action->nr_guests > 0) ;
>
> The space before ; is not necessary.
>
>> +        /* In case the pirq was shared, unbound for this domain in 
>> an earlier call, but still
>> +         * existed on the domain's pirq_tree, we still reach here if 
>> there are any later
>> +         * unbind calls on the same pirq. Return if such an unbind 
>> happens. */
>
> The coding style for comment is:
>
> /*
>  * Foo
>  * Bar
>  */
>
>> +        if ( action->shareable )
>> +            return NULL;
>> +        BUG();
>
> Given that the previous BUG_ON() was hit, would it make sense to try 
> to avoid a new BUG().
>
> So why not just returning NULL as you do for action->shareable?
>

Thanks, I've done the style fixups in v3.

I'd argue that is indeed a BUG, if the pirq was _not_ shareable and the 
loop above couldn't find a matching domain for it - that implies the 
pirq shouldn't have existed in the first place.


>> +    }
>> +
>>       memmove(&action->guest[i], &action->guest[i+1],
>>               (action->nr_guests-i-1) * sizeof(action->guest[0]));
>>       action->nr_guests--;
>>
>
> Cheers,
>




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
vrd@amazon.com Jan. 29, 2020, 9:27 a.m. UTC | #5
Hey Jan,

On 12/18/19 2:42 PM, Jan Beulich wrote:
> On 18.12.2019 11:53, Varad Gautam wrote:
>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
>> In that scenario, it is possible to receive multiple _pirq_guest_unbind
>> calls for the same pirq from domain_kill, if the pirq has not yet been
>> removed from the domain's pirq_tree, as:
>>    domain_kill()
>>      -> domain_relinquish_resources()
>>        -> pci_release_devices()
>>          -> pci_clean_dpci_irq()
>>            -> pirq_guest_unbind()
>>              -> __pirq_guest_unbind()
>>
>> For a shared pirq (nr_guests > 1), the first call would zap the current
>> domain from the pirq's guests[] list, but the action handler is never freed
>> as there are other guests using this pirq. As a result, on the second call,
>> __pirq_guest_unbind searches for the current domain which has been removed
>> from the guests[] list, and hits a BUG_ON.
> There must be more to this, seeing the cleanup_domain_irq_pirq()
> invocation at the end of pirq_guest_unbind(), which ought to be
> reached in the case you describe.


The calls to __pirq_guest_unbind and cleanup_domain_irq_pirq from 
pirq_guest_unbind are going to be mutually exclusive, since irq defaults 
to 0.


>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
>>   
>>       for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
>>           continue;
>> -    BUG_ON(i == action->nr_guests);
>> +    if ( i == action->nr_guests ) {
> Brace on its own line please.
>
>> +        ASSERT(action->nr_guests > 0) ;
> Stray blank.
>
>> +        /* In case the pirq was shared, unbound for this domain in an earlier call, but still
>> +         * existed on the domain's pirq_tree, we still reach here if there are any later
>> +         * unbind calls on the same pirq. Return if such an unbind happens. */
>> +        if ( action->shareable )
> Long lines and malformed comment.


Handled all style fixups in v3.


> Do you perhaps also want to check that you take this path only
> for dying guests?


The patch is about making pirq_guest_unbind() resilient/safe to multiple 
calls. At present, this happens when domain_kill -ERESTARTs for dying 
guests. It might also happen in other cases - and shouldn't cause xen to 
crash.


> Jan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Julien Grall Jan. 29, 2020, 10:07 a.m. UTC | #6
On 29/01/2020 09:27, vrd@amazon.com wrote:
> Hey Julien,

Hi,

> 
> On 12/18/19 2:57 PM, Julien Grall wrote:
>> Hi Varad,
>>
>> Please send new version of a patch in a new thread rather than 
>> in-reply to the first version.
>>
>> On 18/12/2019 10:53, Varad Gautam wrote:
>>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill 
>>> -ERESTARTS.
>>> In that scenario, it is possible to receive multiple _pirq_guest_unbind
>>> calls for the same pirq from domain_kill, if the pirq has not yet been
>>> removed from the domain's pirq_tree, as:
>>>    domain_kill()
>>>      -> domain_relinquish_resources()
>>>        -> pci_release_devices()
>>>          -> pci_clean_dpci_irq()
>>>            -> pirq_guest_unbind()
>>>              -> __pirq_guest_unbind()
>>>
>>> For a shared pirq (nr_guests > 1), the first call would zap the current
>>> domain from the pirq's guests[] list, but the action handler is never 
>>> freed
>>> as there are other guests using this pirq. As a result, on the second 
>>> call,
>>> __pirq_guest_unbind searches for the current domain which has been 
>>> removed
>>> from the guests[] list, and hits a BUG_ON.
>>>
>>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
>>> continue if a shared pirq has already been unbound from this guest. The
>>> PIRQ will be cleaned up from the domain's pirq_tree during the 
>>> destruction
>>> in complete_domain_destroy anyways.
>>>
>>> Signed-off-by: Varad Gautam <vrd@amazon.de>
>>> CC: Jan Beulich <jbeulich@suse.com>
>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> v2: Split the check on action->nr_guests > 0 and make it an ASSERT, 
>>> reword.
>>> ---
>>>   xen/arch/x86/irq.c | 11 ++++++++++-
>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>> index 5d0d94c..3eb7b22 100644
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
>>>         for ( i = 0; (i < action->nr_guests) && (action->guest[i] != 
>>> d); i++ )
>>>           continue;
>>> -    BUG_ON(i == action->nr_guests);
>>> +    if ( i == action->nr_guests ) {
>>
>> The { should be a new line.
>>
>>> +        ASSERT(action->nr_guests > 0) ;
>>
>> The space before ; is not necessary.
>>
>>> +        /* In case the pirq was shared, unbound for this domain in 
>>> an earlier call, but still
>>> +         * existed on the domain's pirq_tree, we still reach here if 
>>> there are any later
>>> +         * unbind calls on the same pirq. Return if such an unbind 
>>> happens. */
>>
>> The coding style for comment is:
>>
>> /*
>>  * Foo
>>  * Bar
>>  */
>>
>>> +        if ( action->shareable )
>>> +            return NULL;
>>> +        BUG();
>>
>> Given that the previous BUG_ON() was hit, would it make sense to try 
>> to avoid a new BUG().
>>
>> So why not just returning NULL as you do for action->shareable?
>>
> 
> Thanks, I've done the style fixups in v3.
> 
> I'd argue that is indeed a BUG, if the pirq was _not_ shareable and the 
> loop above couldn't find a matching domain for it - that implies the 
> pirq shouldn't have existed in the first place.

I am afraid this is only telling me how the BUG() could be triggered and 
not why a BUG() is more warrant than an ASSERT().

AFAIU, the BUG() can only be triggered if there is a programatic error. 
This is no different than your ASSERT(action->nr_guest > 0) you just added.

Reading the section "Handling unexpected conditions" in CODING_STYLE, it 
feels to me the BUG() is not the correct handling as you can return an 
error here and it would continue fine.

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 5d0d94c..3eb7b22 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1863,7 +1863,16 @@  static irq_guest_action_t *__pirq_guest_unbind(
 
     for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
         continue;
-    BUG_ON(i == action->nr_guests);
+    if ( i == action->nr_guests ) {
+        ASSERT(action->nr_guests > 0) ;
+        /* In case the pirq was shared, unbound for this domain in an earlier call, but still
+         * existed on the domain's pirq_tree, we still reach here if there are any later
+         * unbind calls on the same pirq. Return if such an unbind happens. */
+        if ( action->shareable )
+            return NULL;
+        BUG();
+    }
+
     memmove(&action->guest[i], &action->guest[i+1],
             (action->nr_guests-i-1) * sizeof(action->guest[0]));
     action->nr_guests--;