Message ID | 1482372913-18366-4-git-send-email-sstabellini@kernel.org (mailing list archive)
---|---
State | New, archived
Hi Stefano,

On 22/12/16 02:15, Stefano Stabellini wrote:
> gic_update_one_lr is called with the vgic lock held, but it calls
> vgic_get_target_vcpu, which tries to obtain the rank lock. This can
> cause deadlocks.
>
> We already have a version of vgic_get_target_vcpu that doesn't take the
> rank lock: __vgic_get_target_vcpu.
>
> Solve the lock inversion problem by not taking the rank lock in
> gic_update_one_lr (calling __vgic_get_target_vcpu instead of
> vgic_get_target_vcpu). This is safe, because vcpu target modifications
> are protected by the same vgic vcpu lock.

I maintain what I said on the first version of this patch. All of this
patch could be simplified by moving the call to irq_set_affinity into
vgic_irq_migrate.

There is no restriction on doing that, and no lock needs to be taken
other than the rank lock.

Regards,
On Wed, 28 Dec 2016, Julien Grall wrote:
> Hi Stefano,
>
> On 22/12/16 02:15, Stefano Stabellini wrote:
> > gic_update_one_lr is called with the vgic lock held, but it calls
> > vgic_get_target_vcpu, which tries to obtain the rank lock. This can
> > cause deadlocks.
> >
> > We already have a version of vgic_get_target_vcpu that doesn't take the
> > rank lock: __vgic_get_target_vcpu.
> >
> > Solve the lock inversion problem by not taking the rank lock in
> > gic_update_one_lr (calling __vgic_get_target_vcpu instead of
> > vgic_get_target_vcpu). This is safe, because vcpu target modifications
> > are protected by the same vgic vcpu lock.
>
> I maintain what I said on the first version of this patch. All of this
> patch could be simplified by moving the call to irq_set_affinity into
> vgic_irq_migrate.
>
> There is no restriction on doing that, and no lock needs to be taken
> other than the rank lock.

All right. I'll submit a patch that does exactly that. It is not perfect,
because it drops interrupts in the problematic scenario, but it should be
a good place to start for a technical discussion. If it turns out that
this is the best approach, we can come back to it.

Cheers,

Stefano
Hi Stefano,

On 03/01/17 22:51, Stefano Stabellini wrote:
> On Wed, 28 Dec 2016, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 22/12/16 02:15, Stefano Stabellini wrote:
>>> gic_update_one_lr is called with the vgic lock held, but it calls
>>> vgic_get_target_vcpu, which tries to obtain the rank lock. This can
>>> cause deadlocks.
>>>
>>> We already have a version of vgic_get_target_vcpu that doesn't take the
>>> rank lock: __vgic_get_target_vcpu.
>>>
>>> Solve the lock inversion problem by not taking the rank lock in
>>> gic_update_one_lr (calling __vgic_get_target_vcpu instead of
>>> vgic_get_target_vcpu). This is safe, because vcpu target modifications
>>> are protected by the same vgic vcpu lock.
>>
>> I maintain what I said on the first version of this patch. All of this
>> patch could be simplified by moving the call to irq_set_affinity into
>> vgic_irq_migrate.
>>
>> There is no restriction on doing that, and no lock needs to be taken
>> other than the rank lock.
>
> All right. I'll submit a patch that does exactly that. It is not
> perfect, because it drops interrupts in the problematic scenario, but it
> should be a good place to start for a technical discussion.

I think I have missed something. What is the problematic scenario you are
describing? Looking at v3, I don't see any mention of this problem. Does
that mean it has been solved?

Cheers,
On Mon, 16 Jan 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 03/01/17 22:51, Stefano Stabellini wrote:
> > On Wed, 28 Dec 2016, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On 22/12/16 02:15, Stefano Stabellini wrote:
> > > > gic_update_one_lr is called with the vgic lock held, but it calls
> > > > vgic_get_target_vcpu, which tries to obtain the rank lock. This can
> > > > cause deadlocks.
> > > >
> > > > We already have a version of vgic_get_target_vcpu that doesn't take the
> > > > rank lock: __vgic_get_target_vcpu.
> > > >
> > > > Solve the lock inversion problem by not taking the rank lock in
> > > > gic_update_one_lr (calling __vgic_get_target_vcpu instead of
> > > > vgic_get_target_vcpu). This is safe, because vcpu target modifications
> > > > are protected by the same vgic vcpu lock.
> > >
> > > I maintain what I said on the first version of this patch. All of this
> > > patch could be simplified by moving the call to irq_set_affinity into
> > > vgic_irq_migrate.
> > >
> > > There is no restriction on doing that, and no lock needs to be taken
> > > other than the rank lock.
> >
> > All right. I'll submit a patch that does exactly that. It is not
> > perfect, because it drops interrupts in the problematic scenario, but it
> > should be a good place to start for a technical discussion.
>
> I think I have missed something. What is the problematic scenario you are
> describing? Looking at v3, I don't see any mention of this problem. Does
> that mean it has been solved?

No, it hasn't been solved. Sorry for not having been clearer here. The
issue is described in the commit message of
http://marc.info/?l=xen-devel&m=148348625720995:

  After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
  possible to receive a physical interrupt on pcpu 1, which Xen is
  supposed to inject into vcpu 1, before the LR on pcpu 0 has been
  cleared. In this case the irq is still marked as
  GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight"
  on vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
  simultaneously, drop the interrupt.

This is not the right behavior: the interrupt should be injected into
vcpu 1.
Hi Stefano,

On 16/01/2017 19:10, Stefano Stabellini wrote:
> On Mon, 16 Jan 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 03/01/17 22:51, Stefano Stabellini wrote:
>>> On Wed, 28 Dec 2016, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 22/12/16 02:15, Stefano Stabellini wrote:
>>>>> gic_update_one_lr is called with the vgic lock held, but it calls
>>>>> vgic_get_target_vcpu, which tries to obtain the rank lock. This can
>>>>> cause deadlocks.
>>>>>
>>>>> We already have a version of vgic_get_target_vcpu that doesn't take the
>>>>> rank lock: __vgic_get_target_vcpu.
>>>>>
>>>>> Solve the lock inversion problem by not taking the rank lock in
>>>>> gic_update_one_lr (calling __vgic_get_target_vcpu instead of
>>>>> vgic_get_target_vcpu). This is safe, because vcpu target modifications
>>>>> are protected by the same vgic vcpu lock.
>>>>
>>>> I maintain what I said on the first version of this patch. All of this
>>>> patch could be simplified by moving the call to irq_set_affinity into
>>>> vgic_irq_migrate.
>>>>
>>>> There is no restriction on doing that, and no lock needs to be taken
>>>> other than the rank lock.
>>>
>>> All right. I'll submit a patch that does exactly that. It is not
>>> perfect, because it drops interrupts in the problematic scenario, but it
>>> should be a good place to start for a technical discussion.
>>
>> I think I have missed something. What is the problematic scenario you are
>> describing? Looking at v3, I don't see any mention of this problem. Does
>> that mean it has been solved?
>
> No, it hasn't been solved. Sorry for not having been clearer here. The
> issue is described in the commit message of
> http://marc.info/?l=xen-devel&m=148348625720995:
>
>   After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
>   possible to receive a physical interrupt on pcpu 1, which Xen is
>   supposed to inject into vcpu 1, before the LR on pcpu 0 has been
>   cleared. In this case the irq is still marked as
>   GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight"
>   on vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
>   simultaneously, drop the interrupt.
>
> This is not the right behavior: the interrupt should be injected into
> vcpu 1.

I haven't yet reviewed v3, which is why I missed this paragraph, sorry.
As you suggested in an earlier mail, I will start the discussion there.

Cheers,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 51148b4..28ab2f9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -509,7 +509,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
         list_del_init(&p->inflight);
         if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
         {
-            struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
+            struct vcpu *v_target = __vgic_get_target_vcpu(v, irq);
             irq_set_affinity(p->desc, cpumask_of(v_target->processor));
         }
         /*
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index cceac24..c89b85f 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -221,12 +221,10 @@ int vcpu_vgic_free(struct vcpu *v)
 }
 
 /* The function should be called by rank lock taken. */
-static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
+struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
 {
     struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
 
-    ASSERT(spin_is_locked(&rank->lock));
-
     return v->domain->vcpu[rank->vcpu[virq & INTERRUPT_RANK_MASK]];
 }
 
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index dce2f84..26594b0 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -295,6 +295,7 @@ extern int domain_vgic_init(struct domain *d, unsigned int nr_spis);
 extern void domain_vgic_free(struct domain *d);
 extern int vcpu_vgic_init(struct vcpu *v);
 extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
+extern struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
 extern void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq);
 extern void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq);
 extern void vgic_clear_pending_irqs(struct vcpu *v);
gic_update_one_lr is called with the vgic lock held, but it calls
vgic_get_target_vcpu, which tries to obtain the rank lock. This can
cause deadlocks.

We already have a version of vgic_get_target_vcpu that doesn't take the
rank lock: __vgic_get_target_vcpu.

Solve the lock inversion problem by not taking the rank lock in
gic_update_one_lr (calling __vgic_get_target_vcpu instead of
vgic_get_target_vcpu). This is safe, because vcpu target modifications
are protected by the same vgic vcpu lock.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/arm/gic.c         | 2 +-
 xen/arch/arm/vgic.c        | 4 +---
 xen/include/asm-arm/vgic.h | 1 +
 3 files changed, 3 insertions(+), 4 deletions(-)