Message ID | 20160115183625.GG3781@linutronix.de |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Hi Sebastian,

On Fri, Jan 15, 2016 at 07:36:25PM +0100, Sebastian Andrzej Siewior wrote:
> I start a binary which should flash the FPGA and re-enumerate the PCI bus
> and find a new device. It works most of the time. With SLUB debug it
> crashes on each iteration with something like this (compressed output):
>
> | pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
> | Unable to handle kernel paging request for data at address 0x27ef9e3e
> | Faulting instruction address: 0x602f5328
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | Workqueue: events aer_isr
> | GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
> | NIP [602f5328] pci_walk_bus+0xd4/0x104
>
> Register 25 contains the use-after-free magic. As it turns out, the old PCIe
> device is going away and generates an error before it leaves; aer_irq()
> fires and schedules a work item. Then free_irq() is invoked and all
> resources are gone *before* the aer_isr() work item has completed.
> To fix this, I flush the workqueue so that no work remains pending.
> The wait_event() on wait_release was meant to synchronize against
> removal. However, the condition (->prod_idx == ->cons_idx) becomes true
> before aer_isr() completes (aer_isr_one_error() is invoked right after
> that), so it does not serve its purpose. Therefore I remove it.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I propose to merge this patch unchanged, but with the following
changelog. I want to add a bit more detail about the concurrency
problem and remove a bit of the specific detail about your FPGA:

commit 9963c9487f733ef8fe3a06ce3398072a40f955bf
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Fri Jan 15 19:36:25 2016 +0100

    PCI/AER: Flush workqueue on device remove to avoid use-after-free

    A Root Port's AER structure (rpc) contains a queue of events. aer_irq()
    enqueues AER status information and schedules aer_isr() to dequeue and
    process it. When we remove a device, aer_remove() waits for the queue to
    be empty, then frees the rpc struct.

    But aer_isr() references the rpc struct after dequeueing and possibly
    emptying the queue, which can cause a use-after-free error as in the
    following scenario with two threads, aer_isr() on the left and a
    concurrent aer_remove() on the right:

      Thread A                      Thread B
      --------                      --------
      aer_irq():
        rpc->prod_idx++
                                    aer_remove():
                                      wait_event(rpc->prod_idx == rpc->cons_idx)
                                      # now blocked until queue becomes empty
      aer_isr():                      # ...
        rpc->cons_idx++               # unblocked because queue is now empty
        ...                           kfree(rpc)
        mutex_unlock(&rpc->rpc_mutex)

    Wait until the last scheduled instance of aer_isr() has completed before
    freeing the rpc struct by using flush_work() in aer_remove().

    I reproduced this use-after-free by flashing a device FPGA and
    re-enumerating the bus to find the new device. With SLUB debug, this
    crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
    GPR25:

      pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
      Unable to handle kernel paging request for data at address 0x27ef9e3e
      Workqueue: events aer_isr
      GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
      NIP [602f5328] pci_walk_bus+0xd4/0x104

    [bhelgaas: changelog]
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/21/2016 09:57 PM, Bjorn Helgaas wrote:
> Hi Sebastian,

Hi Bjorn,

> I propose to merge this patch unchanged, but with the following
> changelog. I want to add a bit more detail about the concurrency
> problem and remove a bit of the specific detail about your FPGA:

Perfect, thanks.

Sebastian
On Thu, Jan 21, 2016 at 02:57:17PM -0600, Bjorn Helgaas wrote:
> I propose to merge this patch unchanged, but with the following
> changelog. I want to add a bit more detail about the concurrency
> problem and remove a bit of the specific detail about your FPGA:
>
>     PCI/AER: Flush workqueue on device remove to avoid use-after-free

Applied to for-linus for v4.5 with stable tag, thanks!
diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 0bf82a20a0fb..48d21e0edd56 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -262,7 +262,6 @@ static struct aer_rpc *aer_alloc_rpc(struct pcie_device *dev)
 	rpc->rpd = dev;
 	INIT_WORK(&rpc->dpc_handler, aer_isr);
 	mutex_init(&rpc->rpc_mutex);
-	init_waitqueue_head(&rpc->wait_release);
 
 	/* Use PCIe bus function to store rpc into PCIe device */
 	set_service_data(dev, rpc);
@@ -285,8 +284,7 @@ static void aer_remove(struct pcie_device *dev)
 	if (rpc->isr)
 		free_irq(dev->irq, dev);
 
-	wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx);
-
+	flush_work(&rpc->dpc_handler);
 	aer_disable_rootport(rpc);
 	kfree(rpc);
 	set_service_data(dev, NULL);
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 84420b7c9456..945c939a86c5 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -72,7 +72,6 @@ struct aer_rpc {
 					 * recovery on the same
 					 * root port hierarchy
 					 */
-	wait_queue_head_t wait_release;
 };
 
 struct aer_broadcast_data {
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index fba785e9df75..4e14de0f0f98 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -811,8 +811,6 @@ void aer_isr(struct work_struct *work)
 	while (get_e_source(rpc, &e_src))
 		aer_isr_one_error(p_device, &e_src);
 	mutex_unlock(&rpc->rpc_mutex);
-
-	wake_up(&rpc->wait_release);
 }
 
 /**
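The following is a rough user-space analogue of the aer_remove() change, built on POSIX threads, to show the ordering the patch relies on. It is a sketch under stated assumptions and not kernel code. pthread_join() stands in for flush_work(), so the struct is freed only after the worker has returned and dropped the mutex. All of the names (struct fake_rpc, fake_isr(), fake_remove(), run_demo()) are hypothetical. The sketch also assumes that nothing queues new events once removal has started.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct aer_rpc. */
struct fake_rpc {
	pthread_mutex_t rpc_mutex;
	pthread_t worker;       /* plays the role of rpc->dpc_handler */
	int prod_idx;           /* advanced by the "IRQ" side */
	int cons_idx;           /* advanced by the worker */
	int events_handled;
};

/* Analogue of aer_isr(): drain the queue under rpc_mutex. */
static void *fake_isr(void *arg)
{
	struct fake_rpc *rpc = arg;

	pthread_mutex_lock(&rpc->rpc_mutex);
	while (rpc->cons_idx != rpc->prod_idx) {
		rpc->cons_idx++;        /* queue may now look empty ... */
		rpc->events_handled++;  /* ... yet rpc is still being used */
	}
	pthread_mutex_unlock(&rpc->rpc_mutex); /* last access to rpc */
	return NULL;
}

/*
 * Analogue of the patched aer_remove(): pthread_join() stands in for
 * flush_work(), so rpc is freed only after the worker has returned.
 * Assumes nothing queues new events once removal has started.
 */
static int fake_remove(struct fake_rpc *rpc)
{
	int handled;

	pthread_join(rpc->worker, NULL);
	assert(rpc->cons_idx == rpc->prod_idx); /* queue fully drained */
	handled = rpc->events_handled;
	pthread_mutex_destroy(&rpc->rpc_mutex);
	free(rpc);
	return handled;
}

/* Queue three events, start the worker, then tear everything down. */
static int run_demo(void)
{
	struct fake_rpc *rpc = calloc(1, sizeof(*rpc));

	if (!rpc)
		return -1;
	pthread_mutex_init(&rpc->rpc_mutex, NULL);
	rpc->prod_idx = 3;
	if (pthread_create(&rpc->worker, NULL, fake_isr, rpc) != 0) {
		pthread_mutex_destroy(&rpc->rpc_mutex);
		free(rpc);
		return -1;
	}
	return fake_remove(rpc);
}
```

run_demo() returns 3 whether the worker runs before or after fake_remove() is entered, because the join guarantees that the worker is finished with rpc before free(). Polling cons_idx == prod_idx instead of joining, as the old wait_event() did, would not give that guarantee. fake_isr() still touches rpc (the mutex unlock) after its final increment of cons_idx.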
I start a binary which should flash the FPGA and re-enumerate the PCI bus
and find a new device. It works most of the time. With SLUB debug it
crashes on each iteration with something like this (compressed output):

| pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
| Unable to handle kernel paging request for data at address 0x27ef9e3e
| Faulting instruction address: 0x602f5328
| Oops: Kernel access of bad area, sig: 11 [#1]
| Workqueue: events aer_isr
| GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
| NIP [602f5328] pci_walk_bus+0xd4/0x104

Register 25 contains the use-after-free magic. As it turns out, the old PCIe
device is going away and generates an error before it leaves; aer_irq()
fires and schedules a work item. Then free_irq() is invoked and all
resources are gone *before* the aer_isr() work item has completed.
To fix this, I flush the workqueue so that no work remains pending.
The wait_event() on wait_release was meant to synchronize against
removal. However, the condition (->prod_idx == ->cons_idx) becomes true
before aer_isr() completes (aer_isr_one_error() is invoked right after
that), so it does not serve its purpose. Therefore I remove it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
 - remove wait_release since it is broken on SMP
 - flush the workqueue unconditionally rather than only if ->isr is set,
   because the work item can also be scheduled via the inject module.

*compile* tested only because I don't have the HW at the moment.
Bjorn, this may deserve a stable tag. However, it seems to have been
like this since at least v2.6.20.

 drivers/pci/pcie/aer/aerdrv.c      | 4 +---
 drivers/pci/pcie/aer/aerdrv.h      | 1 -
 drivers/pci/pcie/aer/aerdrv_core.c | 2 --
 3 files changed, 1 insertion(+), 6 deletions(-)
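The point above is that the ->prod_idx == ->cons_idx condition becomes true before aer_isr() is done with rpc. That schedule can be replayed deterministically. The sketch below is a single-threaded model under stated assumptions, not kernel code. It steps by hand through the Thread A / Thread B interleaving from the changelog and replaces kfree() with a hypothetical `freed` flag. It then counts how often the worker touches the struct after that point. struct model_rpc, worker_touch(), thread_b_poll() and the two gate functions are made-up names. The "new" gate models only the guarantee that flush_work() gives, namely that the work item has returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of struct aer_rpc; "freed" replaces kfree(). */
struct model_rpc {
	int prod_idx, cons_idx;
	bool freed;
	bool isr_finished;
	int accesses_after_free;
};

/* Every access the worker makes to rpc goes through this helper. */
static void worker_touch(struct model_rpc *rpc)
{
	if (rpc->freed)
		rpc->accesses_after_free++;
}

/* Old aer_remove() gate: proceed once the queue looks empty. */
static bool old_remove_may_free(const struct model_rpc *rpc)
{
	return rpc->prod_idx == rpc->cons_idx;
}

/* Patched gate: proceed only once the work item has fully returned. */
static bool new_remove_may_free(const struct model_rpc *rpc)
{
	return rpc->isr_finished;
}

/* Thread B re-checks its gate; frees rpc as soon as the gate opens. */
static void thread_b_poll(struct model_rpc *rpc,
			  bool (*may_free)(const struct model_rpc *))
{
	if (!rpc->freed && may_free(rpc))
		rpc->freed = true;              /* aer_remove(): kfree(rpc) */
}

/* Replays the changelog's schedule; returns use-after-free accesses. */
static int replay(bool (*may_free)(const struct model_rpc *))
{
	struct model_rpc rpc = {0};

	rpc.prod_idx++;                         /* A: aer_irq() queues an event */
	thread_b_poll(&rpc, may_free);          /* B: aer_remove() checks gate */

	worker_touch(&rpc);
	rpc.cons_idx++;                         /* A: aer_isr() dequeues it */
	thread_b_poll(&rpc, may_free);

	worker_touch(&rpc);                     /* A: mutex_unlock(&rpc->rpc_mutex) */
	rpc.isr_finished = true;                /* A: aer_isr() returns */
	thread_b_poll(&rpc, may_free);

	assert(rpc.freed);                      /* removal completes either way */
	return rpc.accesses_after_free;
}
```

With the old gate, the model frees rpc right after the dequeue, so its final mutex_unlock() step counts one access to freed memory. That is consistent with the POISON_FREE bytes reported in GPR25. With the flush-style gate the count is zero, because the free waits for aer_isr() to return rather than for the queue to look empty.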