Message ID | 20220405111751.166427-1-daire.mcnamara@microchip.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Lorenzo Pieralisi |
Series | [RESEND,v1,1/1] PCI: microchip: Fix potential race in interrupt handling |
Hey all,
Is there anything you need from us on this patch?
Thanks,
Conor

On 05/04/2022 12:17, daire.mcnamara@microchip.com wrote:
> From: Daire McNamara <daire.mcnamara@microchip.com>
>
> Clear MSI bit in ISTATUS register after reading it before
> handling individual MSI bits
>
> This fixes a potential race condition pointed out by Bjorn Helgaas:
> https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/
>
> Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
> Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com>
> ---
> Adding linux-pci mailing list
>  drivers/pci/controller/pcie-microchip-host.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-microchip-host.c b/drivers/pci/controller/pcie-microchip-host.c
> index 29d8e81e4181..da8e3fdc97b3 100644
> --- a/drivers/pci/controller/pcie-microchip-host.c
> +++ b/drivers/pci/controller/pcie-microchip-host.c
> @@ -416,6 +416,7 @@ static void mc_handle_msi(struct irq_desc *desc)
>
>  	status = readl_relaxed(bridge_base_addr + ISTATUS_LOCAL);
>  	if (status & PM_MSI_INT_MSI_MASK) {
> +		writel_relaxed(status & PM_MSI_INT_MSI_MASK, bridge_base_addr + ISTATUS_LOCAL);
>  		status = readl_relaxed(bridge_base_addr + ISTATUS_MSI);
>  		for_each_set_bit(bit, &status, msi->num_vectors) {
>  			ret = generic_handle_domain_irq(msi->dev_domain, bit);
> @@ -432,13 +433,8 @@ static void mc_msi_bottom_irq_ack(struct irq_data *data)
>  	void __iomem *bridge_base_addr =
>  		port->axi_base_addr + MC_PCIE_BRIDGE_ADDR;
>  	u32 bitpos = data->hwirq;
> -	unsigned long status;
>
>  	writel_relaxed(BIT(bitpos), bridge_base_addr + ISTATUS_MSI);
> -	status = readl_relaxed(bridge_base_addr + ISTATUS_MSI);
> -	if (!status)
> -		writel_relaxed(BIT(PM_MSI_INT_MSI_SHIFT),
> -			       bridge_base_addr + ISTATUS_LOCAL);
>  }
>
>  static void mc_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote:
> From: Daire McNamara <daire.mcnamara@microchip.com>
>
> Clear MSI bit in ISTATUS register after reading it before
> handling individual MSI bits

That explains nothing. If you are fixing a bug please describe
the issue and how the patch is fixing it.

> This fixes a potential race condition pointed out by Bjorn Helgaas:
> https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/
>
> Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
> Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com>
> ---
> Adding linux-pci mailing list
>  drivers/pci/controller/pcie-microchip-host.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-microchip-host.c b/drivers/pci/controller/pcie-microchip-host.c
> index 29d8e81e4181..da8e3fdc97b3 100644
> --- a/drivers/pci/controller/pcie-microchip-host.c
> +++ b/drivers/pci/controller/pcie-microchip-host.c
> @@ -416,6 +416,7 @@ static void mc_handle_msi(struct irq_desc *desc)
>
>  	status = readl_relaxed(bridge_base_addr + ISTATUS_LOCAL);
>  	if (status & PM_MSI_INT_MSI_MASK) {
> +		writel_relaxed(status & PM_MSI_INT_MSI_MASK, bridge_base_addr + ISTATUS_LOCAL);

What does ISTATUS_LOCAL contain vs ISTATUS_MSI ? If you explain that
to me I could help you write the commit log.

Thanks,
Lorenzo

>  		status = readl_relaxed(bridge_base_addr + ISTATUS_MSI);
>  		for_each_set_bit(bit, &status, msi->num_vectors) {
>  			ret = generic_handle_domain_irq(msi->dev_domain, bit);
> @@ -432,13 +433,8 @@ static void mc_msi_bottom_irq_ack(struct irq_data *data)
>  	void __iomem *bridge_base_addr =
>  		port->axi_base_addr + MC_PCIE_BRIDGE_ADDR;
>  	u32 bitpos = data->hwirq;
> -	unsigned long status;
>
>  	writel_relaxed(BIT(bitpos), bridge_base_addr + ISTATUS_MSI);
> -	status = readl_relaxed(bridge_base_addr + ISTATUS_MSI);
> -	if (!status)
> -		writel_relaxed(BIT(PM_MSI_INT_MSI_SHIFT),
> -			       bridge_base_addr + ISTATUS_LOCAL);
>  }
>
>  static void mc_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> --
> 2.25.1
>
On 28/04/2022 10:29, Lorenzo Pieralisi wrote: > On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: >> From: Daire McNamara <daire.mcnamara@microchip.com> >> >> Clear MSI bit in ISTATUS register after reading it before >> handling individual MSI bits > > That explains nothing. If you are fixing a bug please describe > the issue and how the patch is fixing it. Someone in the pantheon of IT gods has it out for Daire, so I am sending this on his behalf, but is the following revised commit message better? Clear the MSI bit in ISTATUS register after reading it, but before reading and handling individual MSI bits from the IMSI register. This avoids a potential race where new MSI bits may be set on the IMSI register after it was read and be missed when the MSI bit in the ISTATUS register is cleared. Reported-by: Bjorn Helgaas <helgaas@kernel.org> Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> > >> This fixes a potential race condition pointed out by Bjorn Helgaas: >> https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ >> >> Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") >> Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> >> --- >> Adding linux-pci mailing list >> drivers/pci/controller/pcie-microchip-host.c | 6 +----- >> 1 file changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/drivers/pci/controller/pcie-microchip-host.c b/drivers/pci/controller/pcie-microchip-host.c >> index 29d8e81e4181..da8e3fdc97b3 100644 >> --- a/drivers/pci/controller/pcie-microchip-host.c >> +++ b/drivers/pci/controller/pcie-microchip-host.c >> @@ -416,6 +416,7 @@ static void mc_handle_msi(struct irq_desc *desc) >> >> status = readl_relaxed(bridge_base_addr + ISTATUS_LOCAL); >> if (status & PM_MSI_INT_MSI_MASK) { >> + 
writel_relaxed(status & PM_MSI_INT_MSI_MASK, bridge_base_addr + ISTATUS_LOCAL); > > What does ISTATUS_LOCAL contain vs ISTATUS_MSI ? If you explain that > to me I could help you write the commit log. > > Thanks, > Lorenzo > >> status = readl_relaxed(bridge_base_addr + ISTATUS_MSI); >> for_each_set_bit(bit, &status, msi->num_vectors) { >> ret = generic_handle_domain_irq(msi->dev_domain, bit); >> @@ -432,13 +433,8 @@ static void mc_msi_bottom_irq_ack(struct irq_data *data) >> void __iomem *bridge_base_addr = >> port->axi_base_addr + MC_PCIE_BRIDGE_ADDR; >> u32 bitpos = data->hwirq; >> - unsigned long status; >> >> writel_relaxed(BIT(bitpos), bridge_base_addr + ISTATUS_MSI); >> - status = readl_relaxed(bridge_base_addr + ISTATUS_MSI); >> - if (!status) >> - writel_relaxed(BIT(PM_MSI_INT_MSI_SHIFT), >> - bridge_base_addr + ISTATUS_LOCAL); >> } >> >> static void mc_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) >> -- >> 2.25.1 >>
[+to Marc] On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: > On 28/04/2022 10:29, Lorenzo Pieralisi wrote: > > On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: > >> From: Daire McNamara <daire.mcnamara@microchip.com> > >> > >> Clear MSI bit in ISTATUS register after reading it before > >> handling individual MSI bits > > > > That explains nothing. If you are fixing a bug please describe > > the issue and how the patch is fixing it. > > Someone in the pantheon of IT gods has it out for Daire, so I am > sending this on his behalf, but is the following revised commit > message better? > > Clear the MSI bit in ISTATUS register after reading it, but before > reading and handling individual MSI bits from the IMSI register. > This avoids a potential race where new MSI bits may be set on the > IMSI register after it was read and be missed when the MSI bit in > the ISTATUS register is cleared. "ISTATUS" doesn't appear in the code as a register name. Neither does "IMSI". Please use names that match the code. Honestly, I don't understand enough about IRQs to determine whether this is a correct fix. Hopefully Marc will chime in. All I really know how to do is compare all the drivers and see which ones don't fit the typical patterns. And speaking of that, I looked at all the users of irq_set_chained_handler_and_data() in drivers/pci. All the handlers except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() and chained_irq_exit(). Are mc_handle_intx() and mc_handle_msi() just really special, or is this a mistake? 
> Reported-by: Bjorn Helgaas <helgaas@kernel.org> Please use this address instead: Reported by: Bjorn Helgaas <bhelgaas@google.com> > Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ > Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") > Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> > > >> This fixes a potential race condition pointed out by Bjorn Helgaas: > >> https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ > >> > >> Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") > >> Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> > >> --- > >> Adding linux-pci mailing list > >> drivers/pci/controller/pcie-microchip-host.c | 6 +----- > >> 1 file changed, 1 insertion(+), 5 deletions(-) > >> > >> diff --git a/drivers/pci/controller/pcie-microchip-host.c b/drivers/pci/controller/pcie-microchip-host.c > >> index 29d8e81e4181..da8e3fdc97b3 100644 > >> --- a/drivers/pci/controller/pcie-microchip-host.c > >> +++ b/drivers/pci/controller/pcie-microchip-host.c > >> @@ -416,6 +416,7 @@ static void mc_handle_msi(struct irq_desc *desc) > >> > >> status = readl_relaxed(bridge_base_addr + ISTATUS_LOCAL); > >> if (status & PM_MSI_INT_MSI_MASK) { > >> + writel_relaxed(status & PM_MSI_INT_MSI_MASK, bridge_base_addr + ISTATUS_LOCAL); > > > > What does ISTATUS_LOCAL contain vs ISTATUS_MSI ? If you explain that > > to me I could help you write the commit log. 
> > > > Thanks, > > Lorenzo > > > >> status = readl_relaxed(bridge_base_addr + ISTATUS_MSI); > >> for_each_set_bit(bit, &status, msi->num_vectors) { > >> ret = generic_handle_domain_irq(msi->dev_domain, bit); > >> @@ -432,13 +433,8 @@ static void mc_msi_bottom_irq_ack(struct irq_data *data) > >> void __iomem *bridge_base_addr = > >> port->axi_base_addr + MC_PCIE_BRIDGE_ADDR; > >> u32 bitpos = data->hwirq; > >> - unsigned long status; > >> > >> writel_relaxed(BIT(bitpos), bridge_base_addr + ISTATUS_MSI); > >> - status = readl_relaxed(bridge_base_addr + ISTATUS_MSI); > >> - if (!status) > >> - writel_relaxed(BIT(PM_MSI_INT_MSI_SHIFT), > >> - bridge_base_addr + ISTATUS_LOCAL); > >> } > >> > >> static void mc_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) > >> -- > >> 2.25.1 > >>
On Fri, 29 Apr 2022 22:57:33 +0100,
Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+to Marc]
>
> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote:
> > On 28/04/2022 10:29, Lorenzo Pieralisi wrote:
> > > On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote:
> > >> From: Daire McNamara <daire.mcnamara@microchip.com>
> > >>
> > >> Clear MSI bit in ISTATUS register after reading it before
> > >> handling individual MSI bits
> > >
> > > That explains nothing. If you are fixing a bug please describe
> > > the issue and how the patch is fixing it.
> >
> > Someone in the pantheon of IT gods has it out for Daire, so I am
> > sending this on his behalf, but is the following revised commit
> > message better?
> >
> > Clear the MSI bit in ISTATUS register after reading it, but before
> > reading and handling individual MSI bits from the IMSI register.
> > This avoids a potential race where new MSI bits may be set on the
> > IMSI register after it was read and be missed when the MSI bit in
> > the ISTATUS register is cleared.
>
> "ISTATUS" doesn't appear in the code as a register name. Neither does
> "IMSI". Please use names that match the code.
>
> Honestly, I don't understand enough about IRQs to determine whether
> this is a correct fix. Hopefully Marc will chime in. All I really
> know how to do is compare all the drivers and see which ones don't fit
> the typical patterns.

This seems sensible. In general, edge interrupts need an early Ack
*before* the handler can be run. If it happens after, you're pretty
much guaranteed to lose edges that would be generated between the
handler and the late Ack.

This can be implemented in HW in a variety of ways (read a register,
write a register, or even both).

>
> And speaking of that, I looked at all the users of
> irq_set_chained_handler_and_data() in drivers/pci. All the handlers
> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter()
> and chained_irq_exit().
>
> Are mc_handle_intx() and mc_handle_msi() just really special, or is
> this a mistake?

That's just a bug. On the right HW, this would just result in lost
interrupts.

M.
On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: > On Fri, 29 Apr 2022 22:57:33 +0100, > Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: > > > On 28/04/2022 10:29, Lorenzo Pieralisi wrote: > > > > On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: > > > >> From: Daire McNamara <daire.mcnamara@microchip.com> > > > >> > > > >> Clear MSI bit in ISTATUS register after reading it before > > > >> handling individual MSI bits > > > Clear the MSI bit in ISTATUS register after reading it, but before > > > reading and handling individual MSI bits from the IMSI register. > > > This avoids a potential race where new MSI bits may be set on the > > > IMSI register after it was read and be missed when the MSI bit in > > > the ISTATUS register is cleared. > > Honestly, I don't understand enough about IRQs to determine whether > > this is a correct fix. Hopefully Marc will chime in. All I really > > know how to do is compare all the drivers and see which ones don't fit > > the typical patterns. > > This seems sensible. In general, edge interrupts need an early Ack > *before* the handler can be run. If it happens after, you're pretty > much guaranteed to lose edges that would be generated between the > handler and the late Ack. > > This can be implemented in HW in a variety of ways (read a register, > write a register, or even both). Is this something that is or could be documented somewhere under Documentation, e.g., "here are the common canonical patterns to use"? I feel like an idiot because I have this kind of question all the time and I never know how to confidently analyze it. > > And speaking of that, I looked at all the users of > > irq_set_chained_handler_and_data() in drivers/pci. All the handlers > > except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() > > and chained_irq_exit(). 
> > > > Are mc_handle_intx() and mc_handle_msi() just really special, or is > > this a mistake? > > That's just a bug. On the right HW, this would just result in lost > interrupts. I wonder if coccinelle or some other static analyzer would be smart enough to find this kind of error. Bjorn
On 02/05/2022 20:22, Bjorn Helgaas wrote: > On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: >> On Fri, 29 Apr 2022 22:57:33 +0100, >> Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: >>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote: >>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: >>>>>> From: Daire McNamara <daire.mcnamara@microchip.com> >>>>>> >>>>>> Clear MSI bit in ISTATUS register after reading it before >>>>>> handling individual MSI bits > >>>> Clear the MSI bit in ISTATUS register after reading it, but before >>>> reading and handling individual MSI bits from the IMSI register. >>>> This avoids a potential race where new MSI bits may be set on the >>>> IMSI register after it was read and be missed when the MSI bit in >>>> the ISTATUS register is cleared. > >>> Honestly, I don't understand enough about IRQs to determine whether >>> this is a correct fix. Hopefully Marc will chime in. All I really >>> know how to do is compare all the drivers and see which ones don't fit >>> the typical patterns. >> >> This seems sensible. In general, edge interrupts need an early Ack >> *before* the handler can be run. If it happens after, you're pretty >> much guaranteed to lose edges that would be generated between the >> handler and the late Ack. >> >> This can be implemented in HW in a variety of ways (read a register, >> write a register, or even both). > > Is this something that is or could be documented somewhere under > Documentation, e.g., "here are the common canonical patterns to use"? > I feel like an idiot because I have this kind of question all the time > and I never know how to confidently analyze it. 
Daire is still having the IT issues, so before I resend the patch with a new commit message, how is the following: Clear the MSI bit in ISTATUS_LOCAL register after reading it, but before reading and handling individual MSI bits from the ISTATUS_MSI register. This avoids a potential race where new MSI bits may be set on the ISTATUS_MSI register after it was read and be missed when the MSI bit in the ISTATUS_LOCAL register is cleared. Reported by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> > >>> And speaking of that, I looked at all the users of >>> irq_set_chained_handler_and_data() in drivers/pci. All the handlers >>> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() >>> and chained_irq_exit(). >>> >>> Are mc_handle_intx() and mc_handle_msi() just really special, or is >>> this a mistake? >> >> That's just a bug. On the right HW, this would just result in lost >> interrupts. Separate issue, separate patch. Do you want them in a series or as another standalone patch? Thanks, Conor.
On Wed, May 04, 2022 at 04:12:39PM +0100, Conor Dooley wrote: > On 02/05/2022 20:22, Bjorn Helgaas wrote: > > On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: > >> On Fri, 29 Apr 2022 22:57:33 +0100, > >> Bjorn Helgaas <helgaas@kernel.org> wrote: > >>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: > >>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote: > >>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: > >>>>>> From: Daire McNamara <daire.mcnamara@microchip.com> > >>>>>> > >>>>>> Clear MSI bit in ISTATUS register after reading it before > >>>>>> handling individual MSI bits > > > >>>> Clear the MSI bit in ISTATUS register after reading it, but before > >>>> reading and handling individual MSI bits from the IMSI register. > >>>> This avoids a potential race where new MSI bits may be set on the > >>>> IMSI register after it was read and be missed when the MSI bit in > >>>> the ISTATUS register is cleared. > > > >>> Honestly, I don't understand enough about IRQs to determine whether > >>> this is a correct fix. Hopefully Marc will chime in. All I really > >>> know how to do is compare all the drivers and see which ones don't fit > >>> the typical patterns. > >> > >> This seems sensible. In general, edge interrupts need an early Ack > >> *before* the handler can be run. If it happens after, you're pretty > >> much guaranteed to lose edges that would be generated between the > >> handler and the late Ack. > >> > >> This can be implemented in HW in a variety of ways (read a register, > >> write a register, or even both). > > > > Is this something that is or could be documented somewhere under > > Documentation, e.g., "here are the common canonical patterns to use"? > > I feel like an idiot because I have this kind of question all the time > > and I never know how to confidently analyze it. 
> > Daire is still having the IT issues, so before I resend the patch with > a new commit message, how is the following: > > Clear the MSI bit in ISTATUS_LOCAL register after reading it, but > before reading and handling individual MSI bits from the ISTATUS_MSI > register. This avoids a potential race where new MSI bits may be set > on the ISTATUS_MSI register after it was read and be missed when the > MSI bit in the ISTATUS_LOCAL register is cleared. It is still unclear. You should translate what Marc said above into how ISTATUS_MSI and ISTATUS_LOCAL work (ie describe how HW works). Please describe what the registers do and use that to describe the fix. Thanks, Lorenzo > Reported by: Bjorn Helgaas <bhelgaas@google.com> > Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ > Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") > Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> > > > >>> And speaking of that, I looked at all the users of > >>> irq_set_chained_handler_and_data() in drivers/pci. All the handlers > >>> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() > >>> and chained_irq_exit(). > >>> > >>> Are mc_handle_intx() and mc_handle_msi() just really special, or is > >>> this a mistake? > >> > >> That's just a bug. On the right HW, this would just result in lost > >> interrupts. > > Separate issue, separate patch. Do you want them in a series or as > another standalone patch? > > Thanks, > Conor.
On 04/05/2022 17:53, Lorenzo Pieralisi wrote: > On Wed, May 04, 2022 at 04:12:39PM +0100, Conor Dooley wrote: >> On 02/05/2022 20:22, Bjorn Helgaas wrote: >>> On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: >>>> On Fri, 29 Apr 2022 22:57:33 +0100, >>>> Bjorn Helgaas <helgaas@kernel.org> wrote: >>>>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: >>>>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote: >>>>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: >>>>>>>> From: Daire McNamara <daire.mcnamara@microchip.com> >>>>>>>> >>>>>>>> Clear MSI bit in ISTATUS register after reading it before >>>>>>>> handling individual MSI bits >>> >>>>>> Clear the MSI bit in ISTATUS register after reading it, but before >>>>>> reading and handling individual MSI bits from the IMSI register. >>>>>> This avoids a potential race where new MSI bits may be set on the >>>>>> IMSI register after it was read and be missed when the MSI bit in >>>>>> the ISTATUS register is cleared. >>> >>>>> Honestly, I don't understand enough about IRQs to determine whether >>>>> this is a correct fix. Hopefully Marc will chime in. All I really >>>>> know how to do is compare all the drivers and see which ones don't fit >>>>> the typical patterns. >>>> >>>> This seems sensible. In general, edge interrupts need an early Ack >>>> *before* the handler can be run. If it happens after, you're pretty >>>> much guaranteed to lose edges that would be generated between the >>>> handler and the late Ack. >>>> >>>> This can be implemented in HW in a variety of ways (read a register, >>>> write a register, or even both). >>> >>> Is this something that is or could be documented somewhere under >>> Documentation, e.g., "here are the common canonical patterns to use"? >>> I feel like an idiot because I have this kind of question all the time >>> and I never know how to confidently analyze it. 
>> >> Daire is still having the IT issues, so before I resend the patch with >> a new commit message, how is the following: >> >> Clear the MSI bit in ISTATUS_LOCAL register after reading it, but >> before reading and handling individual MSI bits from the ISTATUS_MSI >> register. This avoids a potential race where new MSI bits may be set >> on the ISTATUS_MSI register after it was read and be missed when the >> MSI bit in the ISTATUS_LOCAL register is cleared. > > It is still unclear. You should translate what Marc said above into > how ISTATUS_MSI and ISTATUS_LOCAL work (ie describe how HW works). > > Please describe what the registers do and use that to describe > the fix. Sure, best to wait until the IT issues are resolved so! Conor.
On Wed, May 04, 2022 at 04:12:39PM +0100, Conor Dooley wrote: > On 02/05/2022 20:22, Bjorn Helgaas wrote: > > On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: > >> On Fri, 29 Apr 2022 22:57:33 +0100, > >> Bjorn Helgaas <helgaas@kernel.org> wrote: > >>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: > >>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote: > >>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: > >>>>>> From: Daire McNamara <daire.mcnamara@microchip.com> > >>>>>> > >>>>>> Clear MSI bit in ISTATUS register after reading it before > >>>>>> handling individual MSI bits > > > >>>> Clear the MSI bit in ISTATUS register after reading it, but before > >>>> reading and handling individual MSI bits from the IMSI register. > >>>> This avoids a potential race where new MSI bits may be set on the > >>>> IMSI register after it was read and be missed when the MSI bit in > >>>> the ISTATUS register is cleared. Restoring the context here: > >>> "ISTATUS" doesn't appear in the code as a register name. > >>> Neither does "IMSI". Please use names that match the code. > Daire is still having the IT issues, so before I resend the patch with > a new commit message, how is the following: > > Clear the MSI bit in ISTATUS_LOCAL register after reading it, but > before reading and handling individual MSI bits from the ISTATUS_MSI > register. This avoids a potential race where new MSI bits may be set > on the ISTATUS_MSI register after it was read and be missed when the > MSI bit in the ISTATUS_LOCAL register is cleared. Looks good, thank you! > >>> And speaking of that, I looked at all the users of > >>> irq_set_chained_handler_and_data() in drivers/pci. All the handlers > >>> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() > >>> and chained_irq_exit(). > >>> > >>> Are mc_handle_intx() and mc_handle_msi() just really special, or is > >>> this a mistake? > >> > >> That's just a bug. 
On the right HW, this would just result in lost > >> interrupts. > > Separate issue, separate patch. Do you want them in a series or as > another standalone patch? Agreed, should be a separate patch. Doesn't need to be a series unless that patch only applies correctly on top of this one. Bjorn
On 04/05/2022 17:59, Bjorn Helgaas wrote: > On Wed, May 04, 2022 at 04:12:39PM +0100, Conor Dooley wrote: >> On 02/05/2022 20:22, Bjorn Helgaas wrote: >>> On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: >>>> On Fri, 29 Apr 2022 22:57:33 +0100, >>>> Bjorn Helgaas <helgaas@kernel.org> wrote: >>>>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: >>>>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote: >>>>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: >>>>>>>> From: Daire McNamara <daire.mcnamara@microchip.com> >>>>>>>> >>>>>>>> Clear MSI bit in ISTATUS register after reading it before >>>>>>>> handling individual MSI bits >>> >>>>>> Clear the MSI bit in ISTATUS register after reading it, but before >>>>>> reading and handling individual MSI bits from the IMSI register. >>>>>> This avoids a potential race where new MSI bits may be set on the >>>>>> IMSI register after it was read and be missed when the MSI bit in >>>>>> the ISTATUS register is cleared. > > Restoring the context here: > >>>>> "ISTATUS" doesn't appear in the code as a register name. >>>>> Neither does "IMSI". Please use names that match the code. > >> Daire is still having the IT issues, so before I resend the patch with >> a new commit message, how is the following: >> >> Clear the MSI bit in ISTATUS_LOCAL register after reading it, but >> before reading and handling individual MSI bits from the ISTATUS_MSI >> register. This avoids a potential race where new MSI bits may be set >> on the ISTATUS_MSI register after it was read and be missed when the >> MSI bit in the ISTATUS_LOCAL register is cleared. > > Looks good, thank you! Hmm, there's now a response saying that the proposed commit message is fine and one saying it isn't. Which is it? > >>>>> And speaking of that, I looked at all the users of >>>>> irq_set_chained_handler_and_data() in drivers/pci. 
All the handlers >>>>> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() >>>>> and chained_irq_exit(). >>>>> >>>>> Are mc_handle_intx() and mc_handle_msi() just really special, or is >>>>> this a mistake? >>>> >>>> That's just a bug. On the right HW, this would just result in lost >>>> interrupts. >> >> Separate issue, separate patch. Do you want them in a series or as >> another standalone patch? > > Agreed, should be a separate patch. Doesn't need to be a series > unless that patch only applies correctly on top of this one. Cool, just sent one: https://lore.kernel.org/linux-pci/20220511095504.2273799-1-conor.dooley@microchip.com/ Thanks, Conor.
On Wed, May 11, 2022 at 11:00:18AM +0100, Conor Dooley wrote: > On 04/05/2022 17:59, Bjorn Helgaas wrote: > > On Wed, May 04, 2022 at 04:12:39PM +0100, Conor Dooley wrote: > > > On 02/05/2022 20:22, Bjorn Helgaas wrote: > > > > On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote: > > > > > On Fri, 29 Apr 2022 22:57:33 +0100, > > > > > Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote: > > > > > > > On 28/04/2022 10:29, Lorenzo Pieralisi wrote: > > > > > > > > On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote: > > > > > > > > > From: Daire McNamara <daire.mcnamara@microchip.com> > > > > > > > > > > > > > > > > > > Clear MSI bit in ISTATUS register after reading it before > > > > > > > > > handling individual MSI bits > > > > > > > > > > > Clear the MSI bit in ISTATUS register after reading it, but before > > > > > > > reading and handling individual MSI bits from the IMSI register. > > > > > > > This avoids a potential race where new MSI bits may be set on the > > > > > > > IMSI register after it was read and be missed when the MSI bit in > > > > > > > the ISTATUS register is cleared. > > > > Restoring the context here: > > > > > > > > "ISTATUS" doesn't appear in the code as a register name. > > > > > > Neither does "IMSI". Please use names that match the code. > > > > > Daire is still having the IT issues, so before I resend the patch with > > > a new commit message, how is the following: > > > > > > Clear the MSI bit in ISTATUS_LOCAL register after reading it, but > > > before reading and handling individual MSI bits from the ISTATUS_MSI > > > register. This avoids a potential race where new MSI bits may be set > > > on the ISTATUS_MSI register after it was read and be missed when the > > > MSI bit in the ISTATUS_LOCAL register is cleared. > > > > Looks good, thank you! 
> > Hmm, there's now a response saying that the proposed commit message is > fine and one saying it isn't. Which is it? I would like the commit log to contain an explanation of what ISTATUS_LOCAL reg is there for and how it is related to ISTATUS_MSI please. Thanks, Lorenzo > > > > > > And speaking of that, I looked at all the users of > > > > > > irq_set_chained_handler_and_data() in drivers/pci. All the handlers > > > > > > except mc_handle_intx() and mc_handle_msi() call chained_irq_enter() > > > > > > and chained_irq_exit(). > > > > > > > > > > > > Are mc_handle_intx() and mc_handle_msi() just really special, or is > > > > > > this a mistake? > > > > > > > > > > That's just a bug. On the right HW, this would just result in lost > > > > > interrupts. > > > > > > Separate issue, separate patch. Do you want them in a series or as > > > another standalone patch? > > > > Agreed, should be a separate patch. Doesn't need to be a series > > unless that patch only applies correctly on top of this one. > > Cool, just sent one: > https://lore.kernel.org/linux-pci/20220511095504.2273799-1-conor.dooley@microchip.com/ > > Thanks, > Conor.
diff --git a/drivers/pci/controller/pcie-microchip-host.c b/drivers/pci/controller/pcie-microchip-host.c
index 29d8e81e4181..da8e3fdc97b3 100644
--- a/drivers/pci/controller/pcie-microchip-host.c
+++ b/drivers/pci/controller/pcie-microchip-host.c
@@ -416,6 +416,7 @@ static void mc_handle_msi(struct irq_desc *desc)
 
 	status = readl_relaxed(bridge_base_addr + ISTATUS_LOCAL);
 	if (status & PM_MSI_INT_MSI_MASK) {
+		writel_relaxed(status & PM_MSI_INT_MSI_MASK, bridge_base_addr + ISTATUS_LOCAL);
 		status = readl_relaxed(bridge_base_addr + ISTATUS_MSI);
 		for_each_set_bit(bit, &status, msi->num_vectors) {
 			ret = generic_handle_domain_irq(msi->dev_domain, bit);
@@ -432,13 +433,8 @@ static void mc_msi_bottom_irq_ack(struct irq_data *data)
 	void __iomem *bridge_base_addr =
 		port->axi_base_addr + MC_PCIE_BRIDGE_ADDR;
 	u32 bitpos = data->hwirq;
-	unsigned long status;
 
 	writel_relaxed(BIT(bitpos), bridge_base_addr + ISTATUS_MSI);
-	status = readl_relaxed(bridge_base_addr + ISTATUS_MSI);
-	if (!status)
-		writel_relaxed(BIT(PM_MSI_INT_MSI_SHIFT),
-			       bridge_base_addr + ISTATUS_LOCAL);
 }
 
 static void mc_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)