Message ID: 20240705093155.871070-1-maz@kernel.org (mailing list archive)
Series: irqchip/gic-v4: Fix VMAPP/VMOVP races
On 7/5/2024 17:31, Marc Zyngier wrote:
> In 20240701072305.4129823-1-tangnianyao@huawei.com, Nianyao reports
> a number of possible races that can trigger on GICv4 implementations
> using the ITSList feature.
>
> These races involve concurrent VMOVP and VMAPP, the former happening
> on vcpu load, while the latter is triggered on the first device being
> MAPTI'd on a given ITS for this particular guest.
>
> The main issue is that we want to establish the affinity at VMAPP time,
> while vcpu_load wants to set the affinity where the vcpu actually runs.
> Lock ordering constraints mean that we can't lock the VPE on VMAPP,
> so things are modified without any lock. What could possibly go wrong?
>
> The fix is a bit involved, and relies on 3 things:
>
> - Making sure that the initial affinity of a VPE is fixed at activate
>   time, which is done early in the life of the vcpu, before it can run.
>
> - Add a per-VM lock that can be taken instead of the global vmovp_lock,
>   paving the way for a more manageable lock order.
>
> - Take the per-VPE lock whenever modifying the VPE affinity, as expected
>   everywhere else in the code.
>
> With that, VMAPP and VMOVP can now run concurrently and still lead to
> sensible results.
>
> Marc Zyngier (3):
>   irqchip/gic-v4: Always configure affinity on VPE activation
>   irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
>   irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
>
>  drivers/irqchip/irq-gic-v3-its.c   | 48 ++++++++++++++----------------
>  include/linux/irqchip/arm-gic-v4.h |  8 +++++
>  2 files changed, 30 insertions(+), 26 deletions(-)

I've tested this patch series. It works.

Tested-by: Nianyao Tang <tangnianyao@huawei.com>
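Concretely, the scheme the cover letter describes amounts to something like the sketch below. The type and lock names (sketch_vm, vmapp_lock, vpe_lock, col_idx) and the exact nesting order shown are illustrative assumptions, not the contents of the actual patches:

    #include <linux/spinlock.h>

    /* Per-VM state: one lock per VM instead of the global vmovp_lock */
    struct sketch_vm {
            raw_spinlock_t vmapp_lock;
    };

    /* Per-VPE state: its own lock guards its affinity */
    struct sketch_vpe {
            struct sketch_vm *vm;
            raw_spinlock_t vpe_lock;
            int col_idx;    /* redistributor the VPE is currently mapped to */
    };

    /*
     * VMAPP path: with the per-VM lock in place of the global one, the
     * per-VPE lock can now be taken as well, so the affinity is never
     * modified without a lock. Order assumed here for illustration:
     * per-VM lock outer, per-VPE lock inner.
     */
    static void sketch_vmapp(struct sketch_vpe *vpe, int cpu)
    {
            raw_spin_lock(&vpe->vm->vmapp_lock);
            raw_spin_lock(&vpe->vpe_lock);
            vpe->col_idx = cpu;     /* affinity set under the VPE lock */
            raw_spin_unlock(&vpe->vpe_lock);
            raw_spin_unlock(&vpe->vm->vmapp_lock);
    }

The point of the per-VM lock is scope: VMAPP and VMOVP for *different* VMs no longer serialize against each other, while the two paths for the same VM agree on a single lock order.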
Hi, Marc

I've run into another problem while backporting this fix to kernel 5.10.

Kernel 5.10 does not support guard(), so we replaced it with raw_spin_lock/unlock.
When the guest insmods the NIC drivers, it somehow triggers a host CPU power-off.
The same test case runs fine on kernel 6.6 (both host and guest).

Would you fix this on the long-term kernel 5.10? Or any suggestion?

Thanks for your help.
Nianyao Tang

On 7/8/2024 10:02, Tangnianyao wrote:
> I've tested this patch series. It works.
>
> Tested-by: Nianyao Tang <tangnianyao@huawei.com>
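For context on the guard() replacement mentioned above: on v6.6, guard(raw_spinlock)(&lock) from <linux/cleanup.h> releases the lock on *every* exit from the enclosing scope, including early returns. One common way a hand-converted backport goes wrong is shown below; whether this is what happened in this particular 5.10 backport is speculation, and the function names are hypothetical:

    #include <linux/cleanup.h>
    #include <linux/errno.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    static DEFINE_RAW_SPINLOCK(demo_lock);

    /* v6.6 original: the guard drops the lock on every return path */
    static int v6_6_style(bool bail)
    {
            guard(raw_spinlock)(&demo_lock);

            if (bail)
                    return -EBUSY;  /* lock released here too */

            return 0;               /* ... and here */
    }

    /* Naive 5.10 conversion: only the final path unlocks */
    static int naive_5_10_style(bool bail)
    {
            raw_spin_lock(&demo_lock);

            if (bail)
                    return -EBUSY;  /* BUG: returns with demo_lock held */

            raw_spin_unlock(&demo_lock);
            return 0;
    }

A CPU that later spins on such a leaked raw spinlock never makes progress; a hard-lockup watchdog or the firmware can then reset or power the machine down, which would match the symptom described.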
On Wed, 17 Jul 2024 09:41:30 +0100,
Tangnianyao <tangnianyao@huawei.com> wrote:
>
> Hi, Marc
>
> I've run into another problem while backporting this fix to kernel 5.10.
>
> Kernel 5.10 does not support guard(), so we replaced it with raw_spin_lock/unlock.
> When the guest insmods the NIC drivers, it somehow triggers a host CPU power-off.
> The same test case runs fine on kernel 6.6 (both host and guest).

This suggests that you got the locking order wrong, that one or several
CPUs deadlock, and that a watchdog fires.

> Would you fix this on the long-term kernel 5.10? Or any suggestion?

Sorry, but I have no intention of maintaining such an ancient kernel,
which is why I did not Cc stable on these patches. I have no bandwidth
for this, nor any interest in it.

M.
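The inversion Marc suspects is the classic AB-BA pattern: two code paths take the same pair of locks in opposite orders, and two CPUs end up each holding one lock while spinning on the other. A hypothetical sketch (lock and function names invented for illustration):

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(vm_lock);   /* "A" */
    static DEFINE_RAW_SPINLOCK(vpe_lock);  /* "B" */

    /* One path (say, VMAPP) takes A then B ... */
    static void path_one(void)
    {
            raw_spin_lock(&vm_lock);
            raw_spin_lock(&vpe_lock);
            /* ... work ... */
            raw_spin_unlock(&vpe_lock);
            raw_spin_unlock(&vm_lock);
    }

    /* ... while another (say, vcpu_load/VMOVP) takes B then A */
    static void path_two(void)
    {
            raw_spin_lock(&vpe_lock);
            raw_spin_lock(&vm_lock);    /* AB-BA: both CPUs spin forever */
            /* ... work ... */
            raw_spin_unlock(&vm_lock);
            raw_spin_unlock(&vpe_lock);
    }

Running the 5.10 backport on a debug kernel with CONFIG_PROVE_LOCKING=y would normally make lockdep report such an ordering inversion the first time both paths execute, rather than waiting for the machine to hang.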