[v1,0/3] ARM: ITS: implement TODO and fix issues with cache

Message ID 20230919112827.1001484-1-volodymyr_babchuk@epam.com (mailing list archive)

Message

Volodymyr Babchuk Sept. 19, 2023, 11:28 a.m. UTC
Hello,

There were a couple of issues with the GICv3 ITS implementation in
Xen. From the user's perspective, it looks like no interrupts are
delivered. I observed those issues while experimenting with SR-IOV on
a Renesas S4 board. In my case the issue was not 100% reproducible, so
it took some time and a couple of tries to fix it. I wasn't sure
whether my fix addressed a hardware quirk of the S4 board or was a
generic solution, so I postponed publishing it.

Later, Stewart Hildebrand had very similar issues with his setup. I
shared these 3 patches with him and they fixed his issue as well. So,
I believe we need these changes in Xen mainline.

The second patch ("ARM: GICv3 ITS: do not invalidate memory while
sending a command") is not strictly required, as it just provides a
small optimization, but I believe it would be nice to have it in the
code base.

Volodymyr Babchuk (3):
  ARM: GICv3 ITS: issue INVALL command after mapping host events
  ARM: GICv3 ITS: do not invalidate memory while sending a command
  ARM: GICv3 ITS: flush all buffers, not just command queue

 xen/arch/arm/gic-v3-its.c             | 27 ++++++++++++++++++++++-----
 xen/arch/arm/include/asm/gic_v3_its.h |  2 +-
 2 files changed, 23 insertions(+), 6 deletions(-)

Comments

Stewart Hildebrand Sept. 19, 2023, 7:07 p.m. UTC | #1
On 9/19/23 07:28, Volodymyr Babchuk wrote:
> Hello,
> 
> There were a couple of issues with the GICv3 ITS implementation in
> Xen. From the user's perspective, it looks like no interrupts are
> delivered. I observed those issues while experimenting with SR-IOV on
> a Renesas S4 board. In my case the issue was not 100% reproducible, so
> it took some time and a couple of tries to fix it. I wasn't sure
> whether my fix addressed a hardware quirk of the S4 board or was a
> generic solution, so I postponed publishing it.
> 
> Later, Stewart Hildebrand had very similar issues with his setup. I
> shared these 3 patches with him and they fixed his issue as well. So,
> I believe we need these changes in Xen mainline.
> 
> The second patch ("ARM: GICv3 ITS: do not invalidate memory while
> sending a command") is not strictly required, as it just provides a
> small optimization, but I believe it would be nice to have it in the
> code base.

I did a bit more experimentation, and the first patch ("ARM: GICv3 ITS: issue INVALL command after mapping host events") is not strictly required either for my particular test case. But the third one ("ARM: GICv3 ITS: flush all buffers, not just command queue") indeed appears to fix a real bug.

> 
> Volodymyr Babchuk (3):
>   ARM: GICv3 ITS: issue INVALL command after mapping host events
>   ARM: GICv3 ITS: do not invalidate memory while sending a command
>   ARM: GICv3 ITS: flush all buffers, not just command queue
> 

For the curious, here are a few more details about my test case. While testing the "SMMU handling for PCIe Passthrough on ARM" series [1] on an AMD Versal VCK190 board, I discovered an issue with MSIs in dom0. The driver for the PCIe device uses multiple IRQs, but only one of them was being raised in dom0: nvme0q0 was working, but nvme0q1 and nvme0q2 were not:

xilinx-vck190-20231:~$ cat /proc/interrupts
           CPU0       CPU1
  0:          0         92   xen-dyn     Edge    -event     xenbus
 11:      18084      11575     GICv3  27 Level     arch_timer
 12:         61         90     GICv3  16 Level     events
 13:          0          0     GICv3  62 Level     zynqmp_ipi
 15:          0          0  RC-Event   0 Level     LINK_DOWN
 16:          0          0  RC-Event   3 Level     HOT_RESET
 17:          0          0  RC-Event   4 Level     CFG_PCIE_TIMEOUT
 18:          0          0  RC-Event   8 Level     CFG_TIMEOUT
 19:          0          0  RC-Event   9 Level     CORRECTABLE
 20:          0          0  RC-Event  10 Level     NONFATAL
 21:          0          0  RC-Event  11 Level     FATAL
 22:          0          0  RC-Event  12 Level     CFG_ERR_POISON
 23:          0          0  RC-Event  15 Level     PME_TO_ACK_RCVD
 24:          0          0  RC-Event  17 Level     PM_PME_RCVD
 25:          0          0  RC-Event  20 Level     SLV_UNSUPP
 26:          0          0  RC-Event  21 Level     SLV_UNEXP
 27:          0          0  RC-Event  22 Level     SLV_COMPL
 28:          0          0  RC-Event  23 Level     SLV_ERRP
 29:          0          0  RC-Event  24 Level     SLV_CMPABT
 30:          0          0  RC-Event  25 Level     SLV_ILLBUR
 31:          0          0  RC-Event  26 Level     MST_DECERR
 32:          0          0  RC-Event  27 Level     MST_SLVERR
 33:          0          0  RC-Event  28 Level     SLV_PCIE_TIMEOUT
 35:         41          0   xen-dyn     Edge    -virq      hvc_console
 37:         17          0   ITS-MSI 524288 Edge      nvme0q0
 38:          0          0     GICv3 176 Level     sysmon-irq
 40:          0          0   ITS-MSI 524289 Edge      nvme0q1
 41:          0          0   ITS-MSI 524290 Edge      nvme0q2
 42:          0          0  xen-dyn-lateeoi     Edge    -event     evtchn:xenstored
 43:         19          0  xen-dyn-lateeoi     Edge    -event     evtchn:xenstored
IPI0:        30        101       Rescheduling interrupts
IPI1:      1888       1458       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:         0          0       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
Err:          0

After applying the patch ("ARM: GICv3 ITS: flush all buffers, not just command queue"), all the ITS-MSI IRQs work:

xilinx-vck190-20231:~$ cat /proc/interrupts
           CPU0       CPU1
  0:          0         94   xen-dyn     Edge    -event     xenbus
 11:       4928       3938     GICv3  27 Level     arch_timer
 12:         56         95     GICv3  16 Level     events
 13:          0          0     GICv3  62 Level     zynqmp_ipi
 15:          0          0  RC-Event   0 Level     LINK_DOWN
 16:          0          0  RC-Event   3 Level     HOT_RESET
 17:          0          0  RC-Event   4 Level     CFG_PCIE_TIMEOUT
 18:          0          0  RC-Event   8 Level     CFG_TIMEOUT
 19:          0          0  RC-Event   9 Level     CORRECTABLE
 20:          0          0  RC-Event  10 Level     NONFATAL
 21:          0          0  RC-Event  11 Level     FATAL
 22:          0          0  RC-Event  12 Level     CFG_ERR_POISON
 23:          0          0  RC-Event  15 Level     PME_TO_ACK_RCVD
 24:          0          0  RC-Event  17 Level     PM_PME_RCVD
 25:          0          0  RC-Event  20 Level     SLV_UNSUPP
 26:          0          0  RC-Event  21 Level     SLV_UNEXP
 27:          0          0  RC-Event  22 Level     SLV_COMPL
 28:          0          0  RC-Event  23 Level     SLV_ERRP
 29:          0          0  RC-Event  24 Level     SLV_CMPABT
 30:          0          0  RC-Event  25 Level     SLV_ILLBUR
 31:          0          0  RC-Event  26 Level     MST_DECERR
 32:          0          0  RC-Event  27 Level     MST_SLVERR
 33:          0          0  RC-Event  28 Level     SLV_PCIE_TIMEOUT
 35:         42          0   xen-dyn     Edge    -virq      hvc_console
 37:         10          0   ITS-MSI 524288 Edge      nvme0q0
 38:         48          0   ITS-MSI 524289 Edge      nvme0q1
 39:          0         66   ITS-MSI 524290 Edge      nvme0q2
 40:          0          0     GICv3 176 Level     sysmon-irq
 42:          0          0  xen-dyn-lateeoi     Edge    -event     evtchn:xenstored
 43:         13          0  xen-dyn-lateeoi     Edge    -event     evtchn:xenstored
IPI0:        78         77       Rescheduling interrupts
IPI1:      2513       2512       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:         0          0       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
Err:          0
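
For anyone who wants to check for the same symptom on their own setup, here is a minimal sketch (not part of the series) that scans /proc/interrupts text and lists ITS-MSI interrupts that never fired on any CPU. The function name silent_its_msi and the BEFORE/AFTER excerpts are just for illustration; the excerpts are cut down from the two dumps in this mail. It assumes the column layout shown here (IRQ number, one count per CPU, chip name, hwirq, trigger, name), which may differ on other kernels.

```python
def silent_its_msi(text):
    """Return the names of ITS-MSI interrupts whose count is zero on every CPU."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    silent = []
    for line in lines[1:]:
        fields = line.split()
        # Assumed layout: "<irq>:", one count per CPU, chip, hwirq, trigger, name
        if len(fields) < ncpus + 5:
            continue
        if fields[ncpus + 1] != "ITS-MSI":
            continue
        counts = [int(c) for c in fields[1:ncpus + 1]]
        if sum(counts) == 0:
            silent.append(fields[-1])
    return silent


# Excerpt of the dump taken before applying the third patch
BEFORE = """\
           CPU0       CPU1
 37:         17          0   ITS-MSI 524288 Edge      nvme0q0
 38:          0          0     GICv3 176 Level     sysmon-irq
 40:          0          0   ITS-MSI 524289 Edge      nvme0q1
 41:          0          0   ITS-MSI 524290 Edge      nvme0q2
Err:          0
"""

# Excerpt of the dump taken after applying the third patch
AFTER = """\
           CPU0       CPU1
 37:         10          0   ITS-MSI 524288 Edge      nvme0q0
 38:         48          0   ITS-MSI 524289 Edge      nvme0q1
 39:          0         66   ITS-MSI 524290 Edge      nvme0q2
 40:          0          0     GICv3 176 Level     sysmon-irq
Err:          0
"""

print(silent_its_msi(BEFORE))  # ['nvme0q1', 'nvme0q2']
print(silent_its_msi(AFTER))   # []
```

On a live system the same function can be fed open("/proc/interrupts").read(). A zero count alone does not prove an ITS problem (an idle queue also shows zero), so this is only a quick first check.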

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-06/msg00353.html