Message ID | 20200306160254.8465-1-paul@xen.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs | expand |
On 06.03.2020 17:02, paul@xen.org wrote: > From: Varad Gautam <vrd@amazon.de> > > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > In that scenario, it is possible to receive multiple __pirq_guest_unbind > calls for the same pirq from domain_kill, if the pirq has not yet been > removed from the domain's pirq_tree, as: > domain_kill() > -> domain_relinquish_resources() > -> pci_release_devices() > -> pci_clean_dpci_irq() > -> pirq_guest_unbind() > -> __pirq_guest_unbind() > > For a shared pirq (nr_guests > 1), the first call would zap the current > domain from the pirq's guests[] list, but the action handler is never freed > as there are other guests using this pirq. As a result, on the second call, > __pirq_guest_unbind searches for the current domain which has been removed > from the guests[] list, and hits a BUG_ON. > > Make __pirq_guest_unbind safe to be called multiple times by letting xen > continue if a shared pirq has already been unbound from this guest. The > PIRQ will be cleaned up from the domain's pirq_tree during the destruction > in complete_domain_destroy anyway. > > Signed-off-by: Varad Gautam <vrd@amazon.de> > [taking over from Varad at v4] > Signed-off-by: Paul Durrant <paul@xen.org> > --- > Cc: Jan Beulich <jbeulich@suse.com> > Cc: Julien Grall <julien@xen.org> > Cc: Roger Pau Monné <roger.pau@citrix.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > > Roger suggested cleaning the entry from the domain pirq_tree so that > we need not make it safe to re-call __pirq_guest_unbind(). This seems like > a reasonable suggestion but the semantics of the code are almost > impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also > the name of struct so you generally have little idea what it actally means) > so I prefer to stick with a small fix that I can actually reason about. > > v4: > - Re-work the guest array search to make it clearer I.e. there are cosmetic differences to v3 (see below), but technically it's still the same. I can't believe the re-use of "pirq" for different entities is this big of a problem. But anyway: > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind( > > BUG_ON(!(desc->status & IRQ_GUEST)); > > - for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > - continue; > - BUG_ON(i == action->nr_guests); > + for ( i = 0; i < action->nr_guests; i++ ) > + if ( action->guest[i] == d ) > + break; > + > + if ( i == action->nr_guests ) /* No matching entry */ > + { > + /* > + * In case the pirq was shared, unbound for this domain in an earlier > + * call, but still existed on the domain's pirq_tree, we still reach > + * here if there are any later unbind calls on the same pirq. Return > + * if such an unbind happens. > + */ > + ASSERT(action->shareable); > + return NULL; > + } > + > + ASSERT(action->nr_guests > 0); This seems pointless to have here - v3 had it inside the if(), where it would actually guard against coming here with nr_guests equal to zero. v3 also used if() and BUG() instead of ASSERT() inside this if(), which to me would seem more in line with our current ./CODING_STYLE guidelines of handling unexpected conditions. Could you clarify why you switched things? Jan
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 09 March 2020 16:29 > To: paul@xen.org > Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; Roger > Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com> > Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs > > On 06.03.2020 17:02, paul@xen.org wrote: > > From: Varad Gautam <vrd@amazon.de> > > > > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > > In that scenario, it is possible to receive multiple __pirq_guest_unbind > > calls for the same pirq from domain_kill, if the pirq has not yet been > > removed from the domain's pirq_tree, as: > > domain_kill() > > -> domain_relinquish_resources() > > -> pci_release_devices() > > -> pci_clean_dpci_irq() > > -> pirq_guest_unbind() > > -> __pirq_guest_unbind() > > > > For a shared pirq (nr_guests > 1), the first call would zap the current > > domain from the pirq's guests[] list, but the action handler is never freed > > as there are other guests using this pirq. As a result, on the second call, > > __pirq_guest_unbind searches for the current domain which has been removed > > from the guests[] list, and hits a BUG_ON. > > > > Make __pirq_guest_unbind safe to be called multiple times by letting xen > > continue if a shared pirq has already been unbound from this guest. The > > PIRQ will be cleaned up from the domain's pirq_tree during the destruction > > in complete_domain_destroy anyway. > > > > Signed-off-by: Varad Gautam <vrd@amazon.de> > > [taking over from Varad at v4] > > Signed-off-by: Paul Durrant <paul@xen.org> > > --- > > Cc: Jan Beulich <jbeulich@suse.com> > > Cc: Julien Grall <julien@xen.org> > > Cc: Roger Pau Monné <roger.pau@citrix.com> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > > > > Roger suggested cleaning the entry from the domain pirq_tree so that > > we need not make it safe to re-call __pirq_guest_unbind(). This seems like > > a reasonable suggestion but the semantics of the code are almost > > impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also > > the name of struct so you generally have little idea what it actally means) > > so I prefer to stick with a small fix that I can actually reason about. > > > > v4: > > - Re-work the guest array search to make it clearer > > I.e. there are cosmetic differences to v3 (see below), but > technically it's still the same. I can't believe the re-use > of "pirq" for different entities is this big of a problem. Please suggest code if you think it ought to be done differentely. I tried. > But anyway: > > > --- a/xen/arch/x86/irq.c > > +++ b/xen/arch/x86/irq.c > > @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind( > > > > BUG_ON(!(desc->status & IRQ_GUEST)); > > > > - for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > > - continue; > > - BUG_ON(i == action->nr_guests); > > + for ( i = 0; i < action->nr_guests; i++ ) > > + if ( action->guest[i] == d ) > > + break; > > + > > + if ( i == action->nr_guests ) /* No matching entry */ > > + { > > + /* > > + * In case the pirq was shared, unbound for this domain in an earlier > > + * call, but still existed on the domain's pirq_tree, we still reach > > + * here if there are any later unbind calls on the same pirq. Return > > + * if such an unbind happens. > > + */ > > + ASSERT(action->shareable); > > + return NULL; > > + } > > + > > + ASSERT(action->nr_guests > 0); > > This seems pointless to have here - v3 had it inside the if(), > where it would actually guard against coming here with nr_guests > equal to zero. Why. The code just after this decrements nr_guests so it seems like entirely the right point to have the ASSERT. I can make it ASSERT >= 0, if that makes more sense. > v3 also used if() and BUG() instead of ASSERT() > inside this if(), which to me would seem more in line with our > current ./CODING_STYLE guidelines of handling unexpected > conditions. Could you clarify why you switched things? > Because I don't think there is need to kill the host in a non-debug context if we hit this condition; it is entirely survivable as far as I can tell so a BUG_ON() did not seem appropriate. Paul
On 09.03.2020 18:47, Paul Durrant wrote: >> -----Original Message----- >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 09 March 2020 16:29 >> >> On 06.03.2020 17:02, paul@xen.org wrote: >>> --- a/xen/arch/x86/irq.c >>> +++ b/xen/arch/x86/irq.c >>> @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind( >>> >>> BUG_ON(!(desc->status & IRQ_GUEST)); >>> >>> - for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) >>> - continue; >>> - BUG_ON(i == action->nr_guests); >>> + for ( i = 0; i < action->nr_guests; i++ ) >>> + if ( action->guest[i] == d ) >>> + break; >>> + >>> + if ( i == action->nr_guests ) /* No matching entry */ >>> + { >>> + /* >>> + * In case the pirq was shared, unbound for this domain in an earlier >>> + * call, but still existed on the domain's pirq_tree, we still reach >>> + * here if there are any later unbind calls on the same pirq. Return >>> + * if such an unbind happens. >>> + */ >>> + ASSERT(action->shareable); >>> + return NULL; >>> + } >>> + >>> + ASSERT(action->nr_guests > 0); >> >> This seems pointless to have here - v3 had it inside the if(), >> where it would actually guard against coming here with nr_guests >> equal to zero. > > Why. The code just after this decrements nr_guests so it seems > like entirely the right point to have the ASSERT. I can make it > ASSERT >= 0, if that makes more sense. There's no way to come here when nr_guests == 0. This is because in this case the loop will be exited with i being zero, and hence the earlier if()'s body will be entered. (And no, >= 0 wouldn't make sense to me - it would mean we might have a count of -1 after the decrement.) >> v3 also used if() and BUG() instead of ASSERT() >> inside this if(), which to me would seem more in line with our >> current ./CODING_STYLE guidelines of handling unexpected >> conditions. Could you clarify why you switched things? >> > > Because I don't think there is need to kill the host in a > non-debug context if we hit this condition; it is entirely > survivable as far as I can tell so a BUG_ON() did not seem > appropriate. It'll mean we have a non-sharable IRQ in a place where this is not supposed to happen. How can we be sure the system is in a state allowing to safely continue? To me, if shareable / non- shareable is of any concern here, then it ought to be BUG(). If it's not, then the ASSERT() ought to be dropped as well. Jan
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 10 March 2020 11:23 > To: paul@xen.org > Cc: xen-devel@lists.xenproject.org; 'Varad Gautam' <vrd@amazon.de>; 'Julien Grall' <julien@xen.org>; > 'Roger Pau Monné' <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com> > Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs > > On 09.03.2020 18:47, Paul Durrant wrote: > >> -----Original Message----- > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: 09 March 2020 16:29 > >> > >> On 06.03.2020 17:02, paul@xen.org wrote: > >>> --- a/xen/arch/x86/irq.c > >>> +++ b/xen/arch/x86/irq.c > >>> @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind( > >>> > >>> BUG_ON(!(desc->status & IRQ_GUEST)); > >>> > >>> - for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > >>> - continue; > >>> - BUG_ON(i == action->nr_guests); > >>> + for ( i = 0; i < action->nr_guests; i++ ) > >>> + if ( action->guest[i] == d ) > >>> + break; > >>> + > >>> + if ( i == action->nr_guests ) /* No matching entry */ > >>> + { > >>> + /* > >>> + * In case the pirq was shared, unbound for this domain in an earlier > >>> + * call, but still existed on the domain's pirq_tree, we still reach > >>> + * here if there are any later unbind calls on the same pirq. Return > >>> + * if such an unbind happens. > >>> + */ > >>> + ASSERT(action->shareable); > >>> + return NULL; > >>> + } > >>> + > >>> + ASSERT(action->nr_guests > 0); > >> > >> This seems pointless to have here - v3 had it inside the if(), > >> where it would actually guard against coming here with nr_guests > >> equal to zero. > > > > Why. The code just after this decrements nr_guests so it seems > > like entirely the right point to have the ASSERT. I can make it > > ASSERT >= 0, if that makes more sense. > > There's no way to come here when nr_guests == 0. This is because > in this case the loop will be exited with i being zero, and hence > the earlier if()'s body will be entered. Ok, yes, that's true. > > (And no, >= 0 wouldn't make sense to me - it would mean we might > have a count of -1 after the decrement.) > > >> v3 also used if() and BUG() instead of ASSERT() > >> inside this if(), which to me would seem more in line with our > >> current ./CODING_STYLE guidelines of handling unexpected > >> conditions. Could you clarify why you switched things? > >> > > > > Because I don't think there is need to kill the host in a > > non-debug context if we hit this condition; it is entirely > > survivable as far as I can tell so a BUG_ON() did not seem > > appropriate. > > It'll mean we have a non-sharable IRQ in a place where this is > not supposed to happen. How can we be sure the system is in a > state allowing to safely continue? To me, if shareable / non- > shareable is of any concern here, then it ought to be BUG(). > If it's not, then the ASSERT() ought to be dropped as well. Ok, I'll convert back to a BUG(). Paul > > Jan
On 10.03.2020 13:36, Paul Durrant wrote:
> Ok, I'll convert back to a BUG().
Wait a little - I think I have an alternative proposal. Just want to
at least smoke test it before sending out.
Jan
On 09.03.2020 18:47, Paul Durrant wrote: >> -----Original Message----- >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 09 March 2020 16:29 >> To: paul@xen.org >> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; Roger >> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com> >> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs >> >> On 06.03.2020 17:02, paul@xen.org wrote: >>> From: Varad Gautam <vrd@amazon.de> >>> >>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. >>> In that scenario, it is possible to receive multiple __pirq_guest_unbind >>> calls for the same pirq from domain_kill, if the pirq has not yet been >>> removed from the domain's pirq_tree, as: >>> domain_kill() >>> -> domain_relinquish_resources() >>> -> pci_release_devices() >>> -> pci_clean_dpci_irq() >>> -> pirq_guest_unbind() >>> -> __pirq_guest_unbind() >>> >>> For a shared pirq (nr_guests > 1), the first call would zap the current >>> domain from the pirq's guests[] list, but the action handler is never freed >>> as there are other guests using this pirq. As a result, on the second call, >>> __pirq_guest_unbind searches for the current domain which has been removed >>> from the guests[] list, and hits a BUG_ON. >>> >>> Make __pirq_guest_unbind safe to be called multiple times by letting xen >>> continue if a shared pirq has already been unbound from this guest. The >>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction >>> in complete_domain_destroy anyway. >>> >>> Signed-off-by: Varad Gautam <vrd@amazon.de> >>> [taking over from Varad at v4] >>> Signed-off-by: Paul Durrant <paul@xen.org> >>> --- >>> Cc: Jan Beulich <jbeulich@suse.com> >>> Cc: Julien Grall <julien@xen.org> >>> Cc: Roger Pau Monné <roger.pau@citrix.com> >>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>> >>> Roger suggested cleaning the entry from the domain pirq_tree so that >>> we need not make it safe to re-call __pirq_guest_unbind(). This seems like >>> a reasonable suggestion but the semantics of the code are almost >>> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also >>> the name of struct so you generally have little idea what it actally means) >>> so I prefer to stick with a small fix that I can actually reason about. >>> >>> v4: >>> - Re-work the guest array search to make it clearer >> >> I.e. there are cosmetic differences to v3 (see below), but >> technically it's still the same. I can't believe the re-use >> of "pirq" for different entities is this big of a problem. > > Please suggest code if you think it ought to be done differentely. I tried. How about this? It's admittedly more code, but imo less ad hoc. I've smoke tested it, but I depend on you or Varad to check that it actually addresses the reported issue. Jan x86/pass-through: avoid double IRQ unbind during domain cleanup XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. In that scenario, it is possible to receive multiple _pirq_guest_unbind calls for the same pirq from domain_kill, if the pirq has not yet been removed from the domain's pirq_tree, as: domain_kill() -> domain_relinquish_resources() -> pci_release_devices() -> pci_clean_dpci_irq() -> pirq_guest_unbind() -> __pirq_guest_unbind() Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ from the tree being iterated after the first call there. In case such a removed entry still has a softirq outstanding, record it and re-check upon re-invocation. Reported-by: Varad Gautam <vrd@amazon.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> --- unstable.orig/xen/arch/x86/irq.c +++ unstable/xen/arch/x86/irq.c @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p } if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq ) - BUG(); + BUG_ON(!d->is_dying); } /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */ --- unstable.orig/xen/drivers/passthrough/pci.c +++ unstable/xen/drivers/passthrough/pci.c @@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom xfree(digl); } - return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0; + radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq); + + if ( !pt_pirq_softirq_active(pirq_dpci) ) + return 0; + + domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci; + + return -ERESTART; } static int pci_clean_dpci_irqs(struct domain *d) @@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do hvm_irq_dpci = domain_get_irq_dpci(d); if ( hvm_irq_dpci != NULL ) { - int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); + int ret = 0; + + if ( hvm_irq_dpci->pending_pirq_dpci ) + { + if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) ) + ret = -ERESTART; + else + hvm_irq_dpci->pending_pirq_dpci = NULL; + } + if ( !ret ) + ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); if ( ret ) { spin_unlock(&d->event_lock); --- unstable.orig/xen/include/asm-x86/hvm/irq.h +++ unstable/xen/include/asm-x86/hvm/irq.h @@ -158,6 +158,8 @@ struct hvm_irq_dpci { DECLARE_BITMAP(isairq_map, NR_ISAIRQS); /* Record of mapped Links */ uint8_t link_cnt[NR_LINK]; + /* Clean up: Entry with a softirq invocation pending / in progress. */ + struct hvm_pirq_dpci *pending_pirq_dpci; }; /* Machine IRQ to guest device/intx mapping. */
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 10 March 2020 14:19 > To: paul@xen.org > Cc: xen-devel@lists.xenproject.org; 'Varad Gautam' <vrd@amazon.de>; 'Julien Grall' <julien@xen.org>; > 'Roger Pau Monné' <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com> > Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs > > On 09.03.2020 18:47, Paul Durrant wrote: > >> -----Original Message----- > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: 09 March 2020 16:29 > >> To: paul@xen.org > >> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; > Roger > >> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com> > >> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs > >> > >> On 06.03.2020 17:02, paul@xen.org wrote: > >>> From: Varad Gautam <vrd@amazon.de> > >>> > >>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > >>> In that scenario, it is possible to receive multiple __pirq_guest_unbind > >>> calls for the same pirq from domain_kill, if the pirq has not yet been > >>> removed from the domain's pirq_tree, as: > >>> domain_kill() > >>> -> domain_relinquish_resources() > >>> -> pci_release_devices() > >>> -> pci_clean_dpci_irq() > >>> -> pirq_guest_unbind() > >>> -> __pirq_guest_unbind() > >>> > >>> For a shared pirq (nr_guests > 1), the first call would zap the current > >>> domain from the pirq's guests[] list, but the action handler is never freed > >>> as there are other guests using this pirq. As a result, on the second call, > >>> __pirq_guest_unbind searches for the current domain which has been removed > >>> from the guests[] list, and hits a BUG_ON. > >>> > >>> Make __pirq_guest_unbind safe to be called multiple times by letting xen > >>> continue if a shared pirq has already been unbound from this guest. The > >>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction > >>> in complete_domain_destroy anyway. > >>> > >>> Signed-off-by: Varad Gautam <vrd@amazon.de> > >>> [taking over from Varad at v4] > >>> Signed-off-by: Paul Durrant <paul@xen.org> > >>> --- > >>> Cc: Jan Beulich <jbeulich@suse.com> > >>> Cc: Julien Grall <julien@xen.org> > >>> Cc: Roger Pau Monné <roger.pau@citrix.com> > >>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> > >>> > >>> Roger suggested cleaning the entry from the domain pirq_tree so that > >>> we need not make it safe to re-call __pirq_guest_unbind(). This seems like > >>> a reasonable suggestion but the semantics of the code are almost > >>> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also > >>> the name of struct so you generally have little idea what it actally means) > >>> so I prefer to stick with a small fix that I can actually reason about. > >>> > >>> v4: > >>> - Re-work the guest array search to make it clearer > >> > >> I.e. there are cosmetic differences to v3 (see below), but > >> technically it's still the same. I can't believe the re-use > >> of "pirq" for different entities is this big of a problem. > > > > Please suggest code if you think it ought to be done differentely. I tried. > > How about this? It's admittedly more code, but imo less ad hoc. > I've smoke tested it, but I depend on you or Varad to check that > it actually addresses the reported issue. It's fairly hard to hit IIRC but we could probably engineer it with a one off ERESTART in the right place. > > Jan > > x86/pass-through: avoid double IRQ unbind during domain cleanup > > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > In that scenario, it is possible to receive multiple _pirq_guest_unbind > calls for the same pirq from domain_kill, if the pirq has not yet been > removed from the domain's pirq_tree, as: > domain_kill() > -> domain_relinquish_resources() > -> pci_release_devices() > -> pci_clean_dpci_irq() > -> pirq_guest_unbind() > -> __pirq_guest_unbind() > > Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ > from the tree being iterated after the first call there. In case such a > removed entry still has a softirq outstanding, record it and re-check > upon re-invocation. > > Reported-by: Varad Gautam <vrd@amazon.de> > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- unstable.orig/xen/arch/x86/irq.c > +++ unstable/xen/arch/x86/irq.c > @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p > } > > if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq ) > - BUG(); > + BUG_ON(!d->is_dying); > } > > /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */ > --- unstable.orig/xen/drivers/passthrough/pci.c > +++ unstable/xen/drivers/passthrough/pci.c > @@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom > xfree(digl); > } > > - return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0; > + radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq); > + > + if ( !pt_pirq_softirq_active(pirq_dpci) ) > + return 0; > + > + domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci; > + > + return -ERESTART; > } > > static int pci_clean_dpci_irqs(struct domain *d) > @@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do > hvm_irq_dpci = domain_get_irq_dpci(d); > if ( hvm_irq_dpci != NULL ) > { > - int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); > + int ret = 0; > + > + if ( hvm_irq_dpci->pending_pirq_dpci ) > + { > + if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) ) > + ret = -ERESTART; > + else > + hvm_irq_dpci->pending_pirq_dpci = NULL; > + } > > + if ( !ret ) > + ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); > if ( ret ) > { > spin_unlock(&d->event_lock); > --- unstable.orig/xen/include/asm-x86/hvm/irq.h > +++ unstable/xen/include/asm-x86/hvm/irq.h > @@ -158,6 +158,8 @@ struct hvm_irq_dpci { > DECLARE_BITMAP(isairq_map, NR_ISAIRQS); > /* Record of mapped Links */ > uint8_t link_cnt[NR_LINK]; > + /* Clean up: Entry with a softirq invocation pending / in progress. */ > + struct hvm_pirq_dpci *pending_pirq_dpci; > }; > That looks like it will do the job. I'll see if I can get it tested. Paul > /* Machine IRQ to guest device/intx mapping. */ >
On 17.03.2020 16:23, Paul Durrant wrote:
> That looks like it will do the job. I'll see if I can get it tested.
Any luck with this, yet?
Jan
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 31 March 2020 08:41 > To: paul@xen.org > Cc: xen-devel@lists.xenproject.org; 'Varad Gautam' <vrd@amazon.de>; 'Julien Grall' <julien@xen.org>; > 'Roger Pau Monné' <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com> > Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs > > On 17.03.2020 16:23, Paul Durrant wrote: > > That looks like it will do the job. I'll see if I can get it tested. > > Any luck with this, yet? > I have asked Varad to test it. He hopes to get to it this week. Paul
On 31.03.2020 13:51, Paul Durrant wrote: >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 31 March 2020 08:41 >> >> On 17.03.2020 16:23, Paul Durrant wrote: >>> That looks like it will do the job. I'll see if I can get it tested. >> >> Any luck with this, yet? > > I have asked Varad to test it. He hopes to get to it this week. Any news here? Jan
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 23 April 2020 12:09 > To: paul@xen.org; 'Varad Gautam' <vrd@amazon.de> > Cc: xen-devel@lists.xenproject.org; 'Julien Grall' <julien@xen.org>; 'Roger Pau Monné' > <roger.pau@citrix.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com> > Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs > > On 31.03.2020 13:51, Paul Durrant wrote: > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: 31 March 2020 08:41 > >> > >> On 17.03.2020 16:23, Paul Durrant wrote: > >>> That looks like it will do the job. I'll see if I can get it tested. > >> > >> Any luck with this, yet? > > > > I have asked Varad to test it. He hopes to get to it this week. > > Any news here? > Varad tells me he is currently testing and will get back to you soon (hopefully today). Paul
Hi Jan, On 3/10/20 3:19 PM, Jan Beulich wrote: > On 09.03.2020 18:47, Paul Durrant wrote: >>> -----Original Message----- >>> From: Jan Beulich <jbeulich@suse.com> >>> Sent: 09 March 2020 16:29 >>> To: paul@xen.org >>> Cc: xen-devel@lists.xenproject.org; Varad Gautam <vrd@amazon.de>; Julien Grall <julien@xen.org>; Roger >>> Pau Monné <roger.pau@citrix.com>; Andrew Cooper <andrew.cooper3@citrix.com> >>> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs >>> >>> On 06.03.2020 17:02, paul@xen.org wrote: >>>> From: Varad Gautam <vrd@amazon.de> >>>> >>>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. >>>> In that scenario, it is possible to receive multiple __pirq_guest_unbind >>>> calls for the same pirq from domain_kill, if the pirq has not yet been >>>> removed from the domain's pirq_tree, as: >>>> domain_kill() >>>> -> domain_relinquish_resources() >>>> -> pci_release_devices() >>>> -> pci_clean_dpci_irq() >>>> -> pirq_guest_unbind() >>>> -> __pirq_guest_unbind() >>>> >>>> For a shared pirq (nr_guests > 1), the first call would zap the current >>>> domain from the pirq's guests[] list, but the action handler is never freed >>>> as there are other guests using this pirq. As a result, on the second call, >>>> __pirq_guest_unbind searches for the current domain which has been removed >>>> from the guests[] list, and hits a BUG_ON. >>>> >>>> Make __pirq_guest_unbind safe to be called multiple times by letting xen >>>> continue if a shared pirq has already been unbound from this guest. The >>>> PIRQ will be cleaned up from the domain's pirq_tree during the destruction >>>> in complete_domain_destroy anyway. >>>> >>>> Signed-off-by: Varad Gautam <vrd@amazon.de> >>>> [taking over from Varad at v4] >>>> Signed-off-by: Paul Durrant <paul@xen.org> >>>> --- >>>> Cc: Jan Beulich <jbeulich@suse.com> >>>> Cc: Julien Grall <julien@xen.org> >>>> Cc: Roger Pau Monné <roger.pau@citrix.com> >>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>>> >>>> Roger suggested cleaning the entry from the domain pirq_tree so that >>>> we need not make it safe to re-call __pirq_guest_unbind(). This seems like >>>> a reasonable suggestion but the semantics of the code are almost >>>> impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also >>>> the name of struct so you generally have little idea what it actally means) >>>> so I prefer to stick with a small fix that I can actually reason about. >>>> >>>> v4: >>>> - Re-work the guest array search to make it clearer >>> I.e. there are cosmetic differences to v3 (see below), but >>> technically it's still the same. I can't believe the re-use >>> of "pirq" for different entities is this big of a problem. >> Please suggest code if you think it ought to be done differentely. I tried. > How about this? It's admittedly more code, but imo less ad hoc. > I've smoke tested it, but I depend on you or Varad to check that > it actually addresses the reported issue. > > Jan > > x86/pass-through: avoid double IRQ unbind during domain cleanup I have tested that this patch prevents __pirq_guest_unbind on an already-unbound pirq during the continuation call for domain_kill -ERESTART, by using a modified xen that forces an -ERESTART from pirq_guest_unbind to create the continuation. It fixes the underlying issue. Tested-by: Varad Gautam <vrd@amazon.de> > > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > In that scenario, it is possible to receive multiple _pirq_guest_unbind > calls for the same pirq from domain_kill, if the pirq has not yet been > removed from the domain's pirq_tree, as: > domain_kill() > -> domain_relinquish_resources() > -> pci_release_devices() > -> pci_clean_dpci_irq() > -> pirq_guest_unbind() > -> __pirq_guest_unbind() > > Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ > from the tree being iterated after the first call there. In case such a > removed entry still has a softirq outstanding, record it and re-check > upon re-invocation. > > Reported-by: Varad Gautam <vrd@amazon.de> > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- unstable.orig/xen/arch/x86/irq.c > +++ unstable/xen/arch/x86/irq.c > @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p > } > > if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq ) > - BUG(); > + BUG_ON(!d->is_dying); > } > > /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */ > --- unstable.orig/xen/drivers/passthrough/pci.c > +++ unstable/xen/drivers/passthrough/pci.c > @@ -873,7 +873,14 @@ static int pci_clean_dpci_irq(struct dom > xfree(digl); > } > > - return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0; > + radix_tree_delete(&d->pirq_tree, dpci_pirq(pirq_dpci)->pirq); > + > + if ( !pt_pirq_softirq_active(pirq_dpci) ) > + return 0; > + > + domain_get_irq_dpci(d)->pending_pirq_dpci = pirq_dpci; > + > + return -ERESTART; > } > > static int pci_clean_dpci_irqs(struct domain *d) > @@ -890,8 +897,18 @@ static int pci_clean_dpci_irqs(struct do > hvm_irq_dpci = domain_get_irq_dpci(d); > if ( hvm_irq_dpci != NULL ) > { > - int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); > + int ret = 0; > + > + if ( hvm_irq_dpci->pending_pirq_dpci ) > + { > + if ( pt_pirq_softirq_active(hvm_irq_dpci->pending_pirq_dpci) ) > + ret = -ERESTART; > + else > + hvm_irq_dpci->pending_pirq_dpci = NULL; > + } > > + if ( !ret ) > + ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); > if ( ret ) > { > spin_unlock(&d->event_lock); > --- unstable.orig/xen/include/asm-x86/hvm/irq.h > +++ unstable/xen/include/asm-x86/hvm/irq.h > @@ -158,6 +158,8 @@ struct hvm_irq_dpci { > DECLARE_BITMAP(isairq_map, NR_ISAIRQS); > /* Record of mapped Links */ > uint8_t link_cnt[NR_LINK]; > + /* Clean up: Entry with a softirq invocation pending / in progress. */ > + struct hvm_pirq_dpci *pending_pirq_dpci; > }; > > /* Machine IRQ to guest device/intx mapping. */ > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On 28.04.2020 13:58, vrd@amazon.com wrote: > On 3/10/20 3:19 PM, Jan Beulich wrote: >> On 09.03.2020 18:47, Paul Durrant wrote: >>> Please suggest code if you think it ought to be done differentely. I tried. >> How about this? It's admittedly more code, but imo less ad hoc. >> I've smoke tested it, but I depend on you or Varad to check that >> it actually addresses the reported issue. >> >> Jan >> >> x86/pass-through: avoid double IRQ unbind during domain cleanup > > > I have tested that this patch prevents __pirq_guest_unbind on an already-unbound pirq > during the continuation call for domain_kill -ERESTART, by using a modified xen that > forces an -ERESTART from pirq_guest_unbind to create the continuation. It fixes the > underlying issue. > > Tested-by: Varad Gautam <vrd@amazon.de> Thanks much; I'll formally submit the patch then. Jan
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index cc2eb8e925..32fcb624dc 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind( BUG_ON(!(desc->status & IRQ_GUEST)); - for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) - continue; - BUG_ON(i == action->nr_guests); + for ( i = 0; i < action->nr_guests; i++ ) + if ( action->guest[i] == d ) + break; + + if ( i == action->nr_guests ) /* No matching entry */ + { + /* + * In case the pirq was shared, unbound for this domain in an earlier + * call, but still existed on the domain's pirq_tree, we still reach + * here if there are any later unbind calls on the same pirq. Return + * if such an unbind happens. + */ + ASSERT(action->shareable); + return NULL; + } + + ASSERT(action->nr_guests > 0); memmove(&action->guest[i], &action->guest[i+1], (action->nr_guests-i-1) * sizeof(action->guest[0])); action->nr_guests--;