Message ID | 20180129213145.26068-2-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On 1/29/2018 4:31 PM, Keith Busch wrote: > + if (!work_busy(&dpc->work)) > + schedule_work(&dpc->work); Isn't there a race condition between the time that dpc_work() clears PCI_EXP_DPC_STATUS register and when work actually completes by returning from dpc_work()? If the interrupt arrives just in this window, this code will not schedule the work.
+ Oza On Tue, Jan 30, 2018 at 7:41 AM, Sinan Kaya <okaya@codeaurora.org> wrote: > On 1/29/2018 4:31 PM, Keith Busch wrote: >> + if (!work_busy(&dpc->work)) >> + schedule_work(&dpc->work); > > Isn't there a race condition between the time that dpc_work() clears PCI_EXP_DPC_STATUS > register and when work actually completes by returning from dpc_work()? > > If the interrupt arrives just in this window, this code will not schedule the work. > > -- > Sinan Kaya > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
On 2018-01-30 11:59, Oza Pawandeep wrote: > + Oza > > On Tue, Jan 30, 2018 at 7:41 AM, Sinan Kaya <okaya@codeaurora.org> > wrote: >> On 1/29/2018 4:31 PM, Keith Busch wrote: >>> + if (!work_busy(&dpc->work)) >>> + schedule_work(&dpc->work); >> >> Isn't there a race condition between the time that dpc_work() clears >> PCI_EXP_DPC_STATUS >> register and when work actually completes by returning from >> dpc_work()? >> >> If the interrupt arrives just in this window, this code will not >> schedule the work. >> >> -- >> Sinan Kaya >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. besides, there is one more problem which I was debugging: if RP doesnt support MSI, then legacy interrupts are not well handled and we get interrupt storm. this is because; PCI_EXP_DPC_STATUS_INTERRUPT is being cleared in deferred work. which should get cleared in dpc_irq interrupt handler. I will post a patch for this, but I see there are lot of changes being made in dpc and aer lately. Bjorn: Can you please help to review my AER/DPC series of patches so that I do not have to keep rebasing and do manual merge ? Regards, Oza.
On 2018-01-30 12:04, poza@codeaurora.org wrote: > On 2018-01-30 11:59, Oza Pawandeep wrote: >> + Oza >> >> On Tue, Jan 30, 2018 at 7:41 AM, Sinan Kaya <okaya@codeaurora.org> >> wrote: >>> On 1/29/2018 4:31 PM, Keith Busch wrote: >>>> + if (!work_busy(&dpc->work)) >>>> + schedule_work(&dpc->work); >>> >>> Isn't there a race condition between the time that dpc_work() clears >>> PCI_EXP_DPC_STATUS >>> register and when work actually completes by returning from >>> dpc_work()? >>> >>> If the interrupt arrives just in this window, this code will not >>> schedule the work. >>> >>> -- >>> Sinan Kaya >>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >>> Technologies, Inc. >>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >>> Linux Foundation Collaborative Project. > > besides, there is one more problem which I was debugging: > if RP doesnt support MSI, then legacy interrupts are not well handled > and we get interrupt storm. > this is because; > PCI_EXP_DPC_STATUS_INTERRUPT is being cleared in deferred work. > which should get cleared in dpc_irq interrupt handler. > I will post a patch for this, but I see there are lot of changes being > made in dpc and aer lately. > > Bjorn: Can you please help to review my AER/DPC series of patches so > that I do not have to keep rebasing and do manual merge ? > > Regards, > Oza. the DPC code assumes that the interrupt will be edge triggered (MSI), and clears interrupt in deferred work. but for SPI (level triggered) it should be cleared in dpc_irq. Regards, Oza.
On Mon, Jan 29, 2018 at 10:34:44PM -0800, poza@codeaurora.org wrote: > On 2018-01-30 11:59, Oza Pawandeep wrote: > > + Oza > > > > On Tue, Jan 30, 2018 at 7:41 AM, Sinan Kaya <okaya@codeaurora.org> > > wrote: > >> On 1/29/2018 4:31 PM, Keith Busch wrote: > >>> + if (!work_busy(&dpc->work)) > >>> + schedule_work(&dpc->work); > >> > >> Isn't there a race condition between the time that dpc_work() clears > >> PCI_EXP_DPC_STATUS > >> register and when work actually completes by returning from > >> dpc_work()? > >> > >> If the interrupt arrives just in this window, this code will not > >> schedule the work. > >> > >> -- > >> Sinan Kaya > >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm > >> Technologies, Inc. > >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a > >> Linux Foundation Collaborative Project. > > besides, there is one more problem which I was debugging: > if RP doesnt support MSI, then legacy interrupts are not well handled > and we get interrupt storm. > this is because; > PCI_EXP_DPC_STATUS_INTERRUPT is being cleared in deferred work. > which should get cleared in dpc_irq interrupt handler. Okay, thanks for that information. I haven't got a DPC capable device that supports INTx, so that's a gap in my current testing capabilities. It looks like as you're suggesting, we should clear DPC Interrupt Status in the top-half IRQ handler, and we should not see another DPC interrupt raised until after we clear DPC Trigger Status in the bottom-half.
On Tue, Jan 30, 2018 at 12:04:44PM +0530, poza@codeaurora.org wrote: > Bjorn: Can you please help to review my AER/DPC series of patches so > that I do not have to keep rebasing and do manual merge ? They're in the queue and I'll get to them this cycle. You don't need to rebase them unless you change your patches to actually require something from another branch. If there are conflicts with other patches that are in-flight, I'll resolve those myself. Since your patches will probably land in the v4.17 merge, it *will* be nice if you rebase them when I update the pci/master branch, which will probably be to v4.16-rc1. But unless you *depend* on features in another branch, it's easiest if you base them on pci/master, which generally stays the same throughout the cycle. Bjorn
On 2018-01-30 23:47, Keith Busch wrote: > On Mon, Jan 29, 2018 at 10:34:44PM -0800, poza@codeaurora.org wrote: >> On 2018-01-30 11:59, Oza Pawandeep wrote: >> > + Oza >> > >> > On Tue, Jan 30, 2018 at 7:41 AM, Sinan Kaya <okaya@codeaurora.org> >> > wrote: >> >> On 1/29/2018 4:31 PM, Keith Busch wrote: >> >>> + if (!work_busy(&dpc->work)) >> >>> + schedule_work(&dpc->work); >> >> >> >> Isn't there a race condition between the time that dpc_work() clears >> >> PCI_EXP_DPC_STATUS >> >> register and when work actually completes by returning from >> >> dpc_work()? >> >> >> >> If the interrupt arrives just in this window, this code will not >> >> schedule the work. >> >> >> >> -- >> >> Sinan Kaya >> >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> >> Technologies, Inc. >> >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> >> Linux Foundation Collaborative Project. >> >> besides, there is one more problem which I was debugging: >> if RP doesnt support MSI, then legacy interrupts are not well handled >> and we get interrupt storm. >> this is because; >> PCI_EXP_DPC_STATUS_INTERRUPT is being cleared in deferred work. >> which should get cleared in dpc_irq interrupt handler. > > Okay, thanks for that information. I haven't got a DPC capable device > that > supports INTx, so that's a gap in my current testing capabilities. > > It looks like as you're suggesting, we should clear DPC Interrupt > Status > in the top-half IRQ handler, and we should not see another DPC > interrupt > raised until after we clear DPC Trigger Status in the bottom-half. yes that's right. I am preparing a patch with that, and testing it on our platform. please let me know on what branch should I post it ? because other mail from Bjorn suggests me to post on pci/master. since he will take care of merging it. Regards, Oza.
On 2018-01-31 00:00, Bjorn Helgaas wrote: > On Tue, Jan 30, 2018 at 12:04:44PM +0530, poza@codeaurora.org wrote: >> Bjorn: Can you please help to review my AER/DPC series of patches so >> that I do not have to keep rebasing and do manual merge ? > > They're in the queue and I'll get to them this cycle. > > You don't need to rebase them unless you change your patches to > actually require something from another branch. If there are > conflicts with other patches that are in-flight, I'll resolve those > myself. > > Since your patches will probably land in the v4.17 merge, it *will* be > nice if you rebase them when I update the pci/master branch, which > will probably be to v4.16-rc1. But unless you *depend* on features in > another branch, it's easiest if you base them on pci/master, which > generally stays the same throughout the cycle. > > Bjorn sure, will rebase them on top of pci/master. Regards, Oza.
diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c index ecdb76bc7b56..cf0398ccaeb6 100644 --- a/drivers/pci/pcie/pcie-dpc.c +++ b/drivers/pci/pcie/pcie-dpc.c @@ -89,7 +89,7 @@ static void dpc_work(struct work_struct *work) struct dpc_dev *dpc = container_of(work, struct dpc_dev, work); struct pci_dev *dev, *temp, *pdev = dpc->dev->port; struct pci_bus *parent = pdev->subordinate; - u16 cap = dpc->cap_pos, ctl, status, source, reason, ext_reason; + u16 cap = dpc->cap_pos, status, source, reason, ext_reason; pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status); pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source); @@ -135,10 +135,6 @@ static void dpc_work(struct work_struct *work) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT); - - pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl); - pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL, - ctl | PCI_EXP_DPC_CTL_INT_EN); } static void dpc_process_rp_pio_error(struct dpc_dev *dpc) @@ -207,16 +203,10 @@ static irqreturn_t dpc_irq(int irq, void *context) { struct dpc_dev *dpc = (struct dpc_dev *)context; struct pci_dev *pdev = dpc->dev->port; - u16 cap = dpc->cap_pos, ctl, status; - - pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl); - - if (!(ctl & PCI_EXP_DPC_CTL_INT_EN) || ctl == (u16)(~0)) - return IRQ_NONE; + u16 cap = dpc->cap_pos, status; pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status); - - if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT)) + if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT) || status == (u16)(~0)) return IRQ_NONE; if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) { @@ -225,11 +215,8 @@ static irqreturn_t dpc_irq(int irq, void *context) return IRQ_HANDLED; } - pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL, - ctl & ~PCI_EXP_DPC_CTL_INT_EN); - - schedule_work(&dpc->work); - + if (!work_busy(&dpc->work)) + schedule_work(&dpc->work); return IRQ_HANDLED; }
The control register was being abused as a way to know if a shared interrupt is notifying the driver of a new DPC event. A DPC capable port can not trigger a second interrupt until the host acknowledges the first, and since DPC handles events in a deferred work queue, we don't need to use the config register to know if the DPC driver needs to handle the interrupt. We just need to make sure we don't schedule the same work multiple times. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/pcie/pcie-dpc.c | 23 +++++------------------ 1 file changed, 5 insertions(+), 18 deletions(-)