Message ID | 1562165173-31383-1-git-send-email-vrd@amazon.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs | expand |
On 03.07.2019 16:46, Varad Gautam wrote: > It is possible to receive multiple __pirq_guest_unbind calls for the same pirq > if the pirq has not yet been removed from the domain's pirq_tree. I'd appreciate if you would make clear under what conditions this can happen, as I'm getting the impression that it's not the BUG_ON() that wants removing here, but that instead some caller may need fixing, or that instead the pirq tree removal needs to happen earlier. Afaict the call from evtchn_close() can't happen more than once, for example, and I wouldn't be surprised at all if one of the callers from xen/drivers/passthrough/ wasn't sufficiently gated. > To apply stable-4.11 onwards. That's based on you having found a broken commit in the 4.11 development window (if so, please name the commit), or simply because 4.10 is about to go out of general support? > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -1711,7 +1711,15 @@ static irq_guest_action_t *__pirq_guest_unbind( > > for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > continue; > - BUG_ON(i == action->nr_guests); > + if ( i == action->nr_guests ) { > + /* In case the pirq was shared, unbound for this domain in an earlier call, but still > + * existed on the domain's pirq_tree, we still reach here if there are any later > + * unbind calls on the same pirq. Return if such an unbind happens. */ Style: The opening brace is misplaced, plus the comment is not properly formatted and has overly long lines. > + if ( action->nr_guests > 0 && action->shareable ) > + return NULL; Why does the sharable aspect matter here? Or asked differently, why can the same situation (multiple unbind requests) not arise for non-sharable IRQs? Similary, why would this same situation not arise for the last guest getting unbound from the IRQ? There is an "action == NULL" check earlier on, but if multiple calls were legitimate, then the dprintk() there should go away (or be gated suitably) as well. Jan
On Wed, Jul 03, 2019 at 04:46:13PM +0200, Varad Gautam wrote: > It is possible to receive multiple __pirq_guest_unbind calls for the same pirq > if the pirq has not yet been removed from the domain's pirq_tree. For a shared > pirq (nr_guests > 1), the first call zaps the current domain from the pirq's > guests[] list, but the action handler is never freed as there are other guests > using this pirq. As a result, on the second call, __pirq_guest_unbind tries > search for the current domain which has been removed from the guests[] list, > and hits a BUG_ON. Thanks for digging into this, the passthrough code is quite complex and not easy to debug. As raised by Jan, I'm not sure I see how the above can happen. For PIRQs bound to event channels the freeing can only happen once, as the event channel is afterwards closed and further calls won't make progress (regardless of whether the underlying PIRQ is shared or not). So I assume this is something that you have seen with HVM guests and dpci? Getting the full trace that leads to the BUG_ON would be very helpful, because as pointed out this is likely something that wants to be fixed in the caller of pirq_guest_unbind. Do you have a reliable way to reproduce? > Allow xen to continue if a shared pirq has already been unbound from this guest. > It will be cleaned up from the domain's pirq_tree during the destruction in > complete_domain_destroy. > > Signed-off-by: Varad Gautam <vrd@amazon.de> > > --- > > To apply stable-4.11 onwards. Do you also have a bisection of when the issue was introduced? Thanks, Roger.
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 094c3c5..256f503 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1711,7 +1711,15 @@ static irq_guest_action_t *__pirq_guest_unbind( for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) continue; - BUG_ON(i == action->nr_guests); + if ( i == action->nr_guests ) { + /* In case the pirq was shared, unbound for this domain in an earlier call, but still + * existed on the domain's pirq_tree, we still reach here if there are any later + * unbind calls on the same pirq. Return if such an unbind happens. */ + if ( action->nr_guests > 0 && action->shareable ) + return NULL; + BUG(); + } + memmove(&action->guest[i], &action->guest[i+1], (action->nr_guests-i-1) * sizeof(action->guest[0])); action->nr_guests--;
It is possible to receive multiple __pirq_guest_unbind calls for the same pirq if the pirq has not yet been removed from the domain's pirq_tree. For a shared pirq (nr_guests > 1), the first call zaps the current domain from the pirq's guests[] list, but the action handler is never freed as there are other guests using this pirq. As a result, on the second call, __pirq_guest_unbind tries search for the current domain which has been removed from the guests[] list, and hits a BUG_ON. Allow xen to continue if a shared pirq has already been unbound from this guest. It will be cleaned up from the domain's pirq_tree during the destruction in complete_domain_destroy. Signed-off-by: Varad Gautam <vrd@amazon.de> --- To apply stable-4.11 onwards. xen/arch/x86/irq.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)