Message ID | 1576666417-20989-1-git-send-email-vrd@amazon.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs | expand |
> -----Original Message----- > From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of > Varad Gautam > Sent: 18 December 2019 10:54 > To: xen-devel@lists.xenproject.org > Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Gautam, Varad > <vrd@amazon.de>; Jan Beulich <jbeulich@suse.com>; Roger Pau Monné > <roger.pau@citrix.com> > Subject: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind > calls for shared pirqs > > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > In that scenario, it is possible to receive multiple _pirq_guest_unbind > calls for the same pirq from domain_kill, if the pirq has not yet been > removed from the domain's pirq_tree, as: > domain_kill() > -> domain_relinquish_resources() > -> pci_release_devices() > -> pci_clean_dpci_irq() > -> pirq_guest_unbind() > -> __pirq_guest_unbind() > > For a shared pirq (nr_guests > 1), the first call would zap the current > domain from the pirq's guests[] list, but the action handler is never > freed > as there are other guests using this pirq. As a result, on the second > call, > __pirq_guest_unbind searches for the current domain which has been removed > from the guests[] list, and hits a BUG_ON. > > Make __pirq_guest_unbind safe to be called multiple times by letting xen > continue if a shared pirq has already been unbound from this guest. The > PIRQ will be cleaned up from the domain's pirq_tree during the destruction > in complete_domain_destroy anyways. > > Signed-off-by: Varad Gautam <vrd@amazon.de> > CC: Jan Beulich <jbeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <pdurrant@amazon.com> > > v2: Split the check on action->nr_guests > 0 and make it an ASSERT, > reword. > --- > xen/arch/x86/irq.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > index 5d0d94c..3eb7b22 100644 > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( > > for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ > ) > continue; > - BUG_ON(i == action->nr_guests); > + if ( i == action->nr_guests ) { > + ASSERT(action->nr_guests > 0) ; > + /* In case the pirq was shared, unbound for this domain in an > earlier call, but still > + * existed on the domain's pirq_tree, we still reach here if > there are any later > + * unbind calls on the same pirq. Return if such an unbind > happens. */ > + if ( action->shareable ) > + return NULL; > + BUG(); > + } > + > memmove(&action->guest[i], &action->guest[i+1], > (action->nr_guests-i-1) * sizeof(action->guest[0])); > action->nr_guests--; > -- > 2.7.4 > > > > > Amazon Development Center Germany GmbH > Krausenstr. 38 > 10117 Berlin > Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss > Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B > Sitz: Berlin > Ust-ID: DE 289 237 879 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel
On 18.12.2019 11:53, Varad Gautam wrote: > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > In that scenario, it is possible to receive multiple _pirq_guest_unbind > calls for the same pirq from domain_kill, if the pirq has not yet been > removed from the domain's pirq_tree, as: > domain_kill() > -> domain_relinquish_resources() > -> pci_release_devices() > -> pci_clean_dpci_irq() > -> pirq_guest_unbind() > -> __pirq_guest_unbind() > > For a shared pirq (nr_guests > 1), the first call would zap the current > domain from the pirq's guests[] list, but the action handler is never freed > as there are other guests using this pirq. As a result, on the second call, > __pirq_guest_unbind searches for the current domain which has been removed > from the guests[] list, and hits a BUG_ON. There must be more to this, seeing the cleanup_domain_irq_pirq() invocation at the end of pirq_guest_unbind(), which ought to be reached in the case you describe. > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( > > for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > continue; > - BUG_ON(i == action->nr_guests); > + if ( i == action->nr_guests ) { Brace on its own line please. > + ASSERT(action->nr_guests > 0) ; Stray blank. > + /* In case the pirq was shared, unbound for this domain in an earlier call, but still > + * existed on the domain's pirq_tree, we still reach here if there are any later > + * unbind calls on the same pirq. Return if such an unbind happens. */ > + if ( action->shareable ) Long lines and malformed comment. Do you perhaps also want to check that you take this path only for dying guests? Jan
Hi Varad, Please send new version of a patch in a new thread rather than in-reply to the first version. On 18/12/2019 10:53, Varad Gautam wrote: > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > In that scenario, it is possible to receive multiple _pirq_guest_unbind > calls for the same pirq from domain_kill, if the pirq has not yet been > removed from the domain's pirq_tree, as: > domain_kill() > -> domain_relinquish_resources() > -> pci_release_devices() > -> pci_clean_dpci_irq() > -> pirq_guest_unbind() > -> __pirq_guest_unbind() > > For a shared pirq (nr_guests > 1), the first call would zap the current > domain from the pirq's guests[] list, but the action handler is never freed > as there are other guests using this pirq. As a result, on the second call, > __pirq_guest_unbind searches for the current domain which has been removed > from the guests[] list, and hits a BUG_ON. > > Make __pirq_guest_unbind safe to be called multiple times by letting xen > continue if a shared pirq has already been unbound from this guest. The > PIRQ will be cleaned up from the domain's pirq_tree during the destruction > in complete_domain_destroy anyways. > > Signed-off-by: Varad Gautam <vrd@amazon.de> > CC: Jan Beulich <jbeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Andrew Cooper <andrew.cooper3@citrix.com> > > v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword. > --- > xen/arch/x86/irq.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > index 5d0d94c..3eb7b22 100644 > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( > > for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > continue; > - BUG_ON(i == action->nr_guests); > + if ( i == action->nr_guests ) { The { should be a new line. > + ASSERT(action->nr_guests > 0) ; The space before ; is not necessary. > + /* In case the pirq was shared, unbound for this domain in an earlier call, but still > + * existed on the domain's pirq_tree, we still reach here if there are any later > + * unbind calls on the same pirq. Return if such an unbind happens. */ The coding style for comment is: /* * Foo * Bar */ > + if ( action->shareable ) > + return NULL; > + BUG(); Given that the previous BUG_ON() was hit, would it make sense to try to avoid a new BUG(). So why not just returning NULL as you do for action->shareable? > + } > + > memmove(&action->guest[i], &action->guest[i+1], > (action->nr_guests-i-1) * sizeof(action->guest[0])); > action->nr_guests--; > Cheers,
Hey Julien, On 12/18/19 2:57 PM, Julien Grall wrote: > Hi Varad, > > Please send new version of a patch in a new thread rather than > in-reply to the first version. > > On 18/12/2019 10:53, Varad Gautam wrote: >> XEN_DOMCTL_destroydomain creates a continuation if domain_kill >> -ERESTARTS. >> In that scenario, it is possible to receive multiple _pirq_guest_unbind >> calls for the same pirq from domain_kill, if the pirq has not yet been >> removed from the domain's pirq_tree, as: >> domain_kill() >> -> domain_relinquish_resources() >> -> pci_release_devices() >> -> pci_clean_dpci_irq() >> -> pirq_guest_unbind() >> -> __pirq_guest_unbind() >> >> For a shared pirq (nr_guests > 1), the first call would zap the current >> domain from the pirq's guests[] list, but the action handler is never >> freed >> as there are other guests using this pirq. As a result, on the second >> call, >> __pirq_guest_unbind searches for the current domain which has been >> removed >> from the guests[] list, and hits a BUG_ON. >> >> Make __pirq_guest_unbind safe to be called multiple times by letting xen >> continue if a shared pirq has already been unbound from this guest. The >> PIRQ will be cleaned up from the domain's pirq_tree during the >> destruction >> in complete_domain_destroy anyways. >> >> Signed-off-by: Varad Gautam <vrd@amazon.de> >> CC: Jan Beulich <jbeulich@suse.com> >> CC: Roger Pau Monné <roger.pau@citrix.com> >> CC: Andrew Cooper <andrew.cooper3@citrix.com> >> >> v2: Split the check on action->nr_guests > 0 and make it an ASSERT, >> reword. >> --- >> xen/arch/x86/irq.c | 11 ++++++++++- >> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c >> index 5d0d94c..3eb7b22 100644 >> --- a/xen/arch/x86/irq.c >> +++ b/xen/arch/x86/irq.c >> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( >> for ( i = 0; (i < action->nr_guests) && (action->guest[i] != >> d); i++ ) >> continue; >> - BUG_ON(i == action->nr_guests); >> + if ( i == action->nr_guests ) { > > The { should be a new line. > >> + ASSERT(action->nr_guests > 0) ; > > The space before ; is not necessary. > >> + /* In case the pirq was shared, unbound for this domain in >> an earlier call, but still >> + * existed on the domain's pirq_tree, we still reach here if >> there are any later >> + * unbind calls on the same pirq. Return if such an unbind >> happens. */ > > The coding style for comment is: > > /* > * Foo > * Bar > */ > >> + if ( action->shareable ) >> + return NULL; >> + BUG(); > > Given that the previous BUG_ON() was hit, would it make sense to try > to avoid a new BUG(). > > So why not just returning NULL as you do for action->shareable? > Thanks, I've done the style fixups in v3. I'd argue that is indeed a BUG, if the pirq was _not_ shareable and the loop above couldn't find a matching domain for it - that implies the pirq shouldn't have existed in the first place. >> + } >> + >> memmove(&action->guest[i], &action->guest[i+1], >> (action->nr_guests-i-1) * sizeof(action->guest[0])); >> action->nr_guests--; >> > > Cheers, > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
Hey Jan, On 12/18/19 2:42 PM, Jan Beulich wrote: > On 18.12.2019 11:53, Varad Gautam wrote: >> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. >> In that scenario, it is possible to receive multiple _pirq_guest_unbind >> calls for the same pirq from domain_kill, if the pirq has not yet been >> removed from the domain's pirq_tree, as: >> domain_kill() >> -> domain_relinquish_resources() >> -> pci_release_devices() >> -> pci_clean_dpci_irq() >> -> pirq_guest_unbind() >> -> __pirq_guest_unbind() >> >> For a shared pirq (nr_guests > 1), the first call would zap the current >> domain from the pirq's guests[] list, but the action handler is never freed >> as there are other guests using this pirq. As a result, on the second call, >> __pirq_guest_unbind searches for the current domain which has been removed >> from the guests[] list, and hits a BUG_ON. > There must be more to this, seeing the cleanup_domain_irq_pirq() > invocation at the end of pirq_guest_unbind(), which ought to be > reached in the case you describe. The calls to __pirq_guest_unbind and cleanup_domain_irq_pirq from pirq_guest_unbind are going to be mutually exclusive, since irq defaults to 0. >> --- a/xen/arch/x86/irq.c >> +++ b/xen/arch/x86/irq.c >> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( >> >> for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) >> continue; >> - BUG_ON(i == action->nr_guests); >> + if ( i == action->nr_guests ) { > Brace on its own line please. > >> + ASSERT(action->nr_guests > 0) ; > Stray blank. > >> + /* In case the pirq was shared, unbound for this domain in an earlier call, but still >> + * existed on the domain's pirq_tree, we still reach here if there are any later >> + * unbind calls on the same pirq. Return if such an unbind happens. */ >> + if ( action->shareable ) > Long lines and malformed comment. Handled all style fixups in v3. > Do you perhaps also want to check that you take this path only > for dying guests? The patch is about making pirq_guest_unbind() resilient/safe to multiple calls. At present, this happens when domain_kill -ERESTARTs for dying guests. It might also happen in other cases - and shouldn't cause xen to crash. > Jan > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On 29/01/2020 09:27, vrd@amazon.com wrote: > Hey Julien, Hi, > > On 12/18/19 2:57 PM, Julien Grall wrote: >> Hi Varad, >> >> Please send new version of a patch in a new thread rather than >> in-reply to the first version. >> >> On 18/12/2019 10:53, Varad Gautam wrote: >>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill >>> -ERESTARTS. >>> In that scenario, it is possible to receive multiple _pirq_guest_unbind >>> calls for the same pirq from domain_kill, if the pirq has not yet been >>> removed from the domain's pirq_tree, as: >>> domain_kill() >>> -> domain_relinquish_resources() >>> -> pci_release_devices() >>> -> pci_clean_dpci_irq() >>> -> pirq_guest_unbind() >>> -> __pirq_guest_unbind() >>> >>> For a shared pirq (nr_guests > 1), the first call would zap the current >>> domain from the pirq's guests[] list, but the action handler is never >>> freed >>> as there are other guests using this pirq. As a result, on the second >>> call, >>> __pirq_guest_unbind searches for the current domain which has been >>> removed >>> from the guests[] list, and hits a BUG_ON. >>> >>> Make __pirq_guest_unbind safe to be called multiple times by letting xen >>> continue if a shared pirq has already been unbound from this guest. The >>> PIRQ will be cleaned up from the domain's pirq_tree during the >>> destruction >>> in complete_domain_destroy anyways. >>> >>> Signed-off-by: Varad Gautam <vrd@amazon.de> >>> CC: Jan Beulich <jbeulich@suse.com> >>> CC: Roger Pau Monné <roger.pau@citrix.com> >>> CC: Andrew Cooper <andrew.cooper3@citrix.com> >>> >>> v2: Split the check on action->nr_guests > 0 and make it an ASSERT, >>> reword. >>> --- >>> xen/arch/x86/irq.c | 11 ++++++++++- >>> 1 file changed, 10 insertions(+), 1 deletion(-) >>> >>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c >>> index 5d0d94c..3eb7b22 100644 >>> --- a/xen/arch/x86/irq.c >>> +++ b/xen/arch/x86/irq.c >>> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( >>> for ( i = 0; (i < action->nr_guests) && (action->guest[i] != >>> d); i++ ) >>> continue; >>> - BUG_ON(i == action->nr_guests); >>> + if ( i == action->nr_guests ) { >> >> The { should be a new line. >> >>> + ASSERT(action->nr_guests > 0) ; >> >> The space before ; is not necessary. >> >>> + /* In case the pirq was shared, unbound for this domain in >>> an earlier call, but still >>> + * existed on the domain's pirq_tree, we still reach here if >>> there are any later >>> + * unbind calls on the same pirq. Return if such an unbind >>> happens. */ >> >> The coding style for comment is: >> >> /* >> * Foo >> * Bar >> */ >> >>> + if ( action->shareable ) >>> + return NULL; >>> + BUG(); >> >> Given that the previous BUG_ON() was hit, would it make sense to try >> to avoid a new BUG(). >> >> So why not just returning NULL as you do for action->shareable? >> > > Thanks, I've done the style fixups in v3. > > I'd argue that is indeed a BUG, if the pirq was _not_ shareable and the > loop above couldn't find a matching domain for it - that implies the > pirq shouldn't have existed in the first place. I am afraid this is only telling me how the BUG() could be triggered and not why a BUG() is more warrant than an ASSERT(). AFAIU, the BUG() can only be triggered if there is a programatic error. This is no different than your ASSERT(action->nr_guest > 0) you just added. Reading the section "Handling unexpected conditions" in CODING_STYLE, it feels to me the BUG() is not the correct handling as you can return an error here and it would continue fine. Cheers,
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 5d0d94c..3eb7b22 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind( for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) continue; - BUG_ON(i == action->nr_guests); + if ( i == action->nr_guests ) { + ASSERT(action->nr_guests > 0) ; + /* In case the pirq was shared, unbound for this domain in an earlier call, but still + * existed on the domain's pirq_tree, we still reach here if there are any later + * unbind calls on the same pirq. Return if such an unbind happens. */ + if ( action->shareable ) + return NULL; + BUG(); + } + memmove(&action->guest[i], &action->guest[i+1], (action->nr_guests-i-1) * sizeof(action->guest[0])); action->nr_guests--;
XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. In that scenario, it is possible to receive multiple _pirq_guest_unbind calls for the same pirq from domain_kill, if the pirq has not yet been removed from the domain's pirq_tree, as: domain_kill() -> domain_relinquish_resources() -> pci_release_devices() -> pci_clean_dpci_irq() -> pirq_guest_unbind() -> __pirq_guest_unbind() For a shared pirq (nr_guests > 1), the first call would zap the current domain from the pirq's guests[] list, but the action handler is never freed as there are other guests using this pirq. As a result, on the second call, __pirq_guest_unbind searches for the current domain which has been removed from the guests[] list, and hits a BUG_ON. Make __pirq_guest_unbind safe to be called multiple times by letting xen continue if a shared pirq has already been unbound from this guest. The PIRQ will be cleaned up from the domain's pirq_tree during the destruction in complete_domain_destroy anyways. Signed-off-by: Varad Gautam <vrd@amazon.de> CC: Jan Beulich <jbeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword. --- xen/arch/x86/irq.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)