Message ID | dac488604766f3dcb948e702210ecc381c4f907b.1533015755.git.lukas@wunner.de (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: pciehp: Differentiate between surprise and safe removal | expand |
On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: > When removing PCI devices below a hotplug bridge, pciehp marks them as > disconnected if the card is no longer present in the slot or it quiesces > them if the card is still present (by disabling INTx interrupts, bus > mastering and SERR# reporting). > > To detect whether the card is still present, pciehp checks the Presence > Detect State bit in the Slot Status register. The problem with this > approach is that even if the card is present, the link to it may be > down, and it that case it would be better to mark the devices as > disconnected instead of trying to quiesce them. Moreover, if the card > in the slot was quickly replaced by another one, the Presence Detect > State bit would be set, yet trying to quiesce the new card's devices > would be wrong and the correct thing to do is to mark the previous > card's devices as disconnected. > > Instead of looking at the Presence Detect State bit, it is better to > differentiate whether the card was surprise removed versus safely > removed (via sysfs or an Attention Button press). On surprise removal, > the devices should be marked as disconnected, whereas on safe removal it > is correct to quiesce the devices. > > The knowledge whether a surprise removal or a safe removal is at hand > does exist further up in the call stack: A surprise removal is > initiated by pciehp_handle_presence_or_link_change(), a safe removal by > pciehp_handle_disable_request(). > > Pass that information down to pciehp_unconfigure_device() and use it in > lieu of the Presence Detect State bit. While there, add kernel-doc to > pciehp_unconfigure_device() and pciehp_configure_device(). > > Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: Keith Busch <keith.busch@intel.com> > --- > drivers/pci/hotplug/pciehp.h | 2 +- > drivers/pci/hotplug/pciehp_ctrl.c | 22 +++++++++++++--------- > drivers/pci/hotplug/pciehp_pci.c | 23 ++++++++++++++++++++--- > 3 files changed, 34 insertions(+), 13 deletions(-) > > diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h > index 811cf83f956d..39c9c8815a35 100644 > --- a/drivers/pci/hotplug/pciehp.h > +++ b/drivers/pci/hotplug/pciehp.h > @@ -181,7 +181,7 @@ void pciehp_handle_button_press(struct slot *slot); > void pciehp_handle_disable_request(struct slot *slot); > void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events); > int pciehp_configure_device(struct slot *p_slot); > -void pciehp_unconfigure_device(struct slot *p_slot); > +void pciehp_unconfigure_device(struct slot *p_slot, bool presence); > void pciehp_queue_pushbutton_work(struct work_struct *work); > struct controller *pcie_init(struct pcie_device *dev); > int pcie_init_notification(struct controller *ctrl); > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c > index 6855933ab372..7932e70e9f29 100644 > --- a/drivers/pci/hotplug/pciehp_ctrl.c > +++ b/drivers/pci/hotplug/pciehp_ctrl.c > @@ -26,6 +26,9 @@ > hotplug controller logic > */ > > +#define SAFE_REMOVAL true > +#define SURPRISE_REMOVAL false > + > static void set_slot_off(struct controller *ctrl, struct slot *pslot) > { > /* turn off slot, turn on Amber LED, turn off Green LED if supported*/ > @@ -101,12 +104,13 @@ static int board_added(struct slot *p_slot) > /** > * remove_board - Turns off slot and LEDs > * @p_slot: slot where board is being removed > + * @safe_removal: whether the board is safely removed (versus surprise removed) > */ > -static void remove_board(struct slot *p_slot) > +static void remove_board(struct slot *p_slot, bool safe_removal) > { > struct controller *ctrl = p_slot->ctrl; > > - pciehp_unconfigure_device(p_slot); > + pciehp_unconfigure_device(p_slot, safe_removal); Below we turn off power to the slot if it has power controller. Even if we disable slot from sysfs, I think it ends up being inaccessible after power is turned off. I wonder if we should mark the devices disconnected in that case as well? > > if (POWER_CTRL(ctrl)) { > pciehp_power_off_slot(p_slot);
On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote: > On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: > > -static void remove_board(struct slot *p_slot) > > +static void remove_board(struct slot *p_slot, bool safe_removal) > > { > > struct controller *ctrl = p_slot->ctrl; > > > > - pciehp_unconfigure_device(p_slot); > > + pciehp_unconfigure_device(p_slot, safe_removal); > > Below we turn off power to the slot if it has power controller. Even if > we disable slot from sysfs, I think it ends up being inaccessible after > power is turned off. I wonder if we should mark the devices disconnected > in that case as well? > > > > > if (POWER_CTRL(ctrl)) { > > pciehp_power_off_slot(p_slot); No, when pciehp_unconfigure_device() returns, the PCI devices below the hotplug bridge are unbound and removed from the system. They're gone, so the bit set in their pci_dev struct would no longer be accessible anyway. Unless of course something is holding a ref on the pci_dev, but that would seem to be a bug. (Accessing a device that's already removed from the system, that is.) Calling pci_dev_set_disconnected() only gives the PCI core and the driver bound to the device an indication that's it's inaccessible, so any code paths during unbound and PCI device teardown can skip accesses. (Because pci_dev_is_disconnected() is currently scoped to the PCI core, the disconnected status can only be queried from drivers that live in the PCI core, such as portdrv and all the port services drivers.) After the pci_dev is removed from the hierarchy, accessing it seems at least questionable. Does this make things clearer? Shout if it not. :-) Thanks, Lukas
On 08/01/2018 12:15 PM, Lukas Wunner wrote: > On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote: >> On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: >>> -static void remove_board(struct slot *p_slot) >>> +static void remove_board(struct slot *p_slot, bool safe_removal) >>> { >>> struct controller *ctrl = p_slot->ctrl; >>> >>> - pciehp_unconfigure_device(p_slot); >>> + pciehp_unconfigure_device(p_slot, safe_removal); >> >> Below we turn off power to the slot if it has power controller. Even if >> we disable slot from sysfs, I think it ends up being inaccessible after >> power is turned off. I wonder if we should mark the devices disconnected >> in that case as well? >> >>> >>> if (POWER_CTRL(ctrl)) { >>> pciehp_power_off_slot(p_slot); > > No, when pciehp_unconfigure_device() returns, the PCI devices below > the hotplug bridge are unbound and removed from the system. They're > gone, so the bit set in their pci_dev struct would no longer be > accessible anyway. Unless of course something is holding a ref on > the pci_dev, but that would seem to be a bug. And a very common bug at that. Choose your favorite open-source GPU (or NVME) driver, and try unloading it -- "cannot unload/in use". I stopped being bothered by it, and take it as the status quo. > (Accessing a device that's already removed from the system, that is.) Why? lspci/userspace can still send config reads/writes. Of course, userspace reads/writes are not protected by PCI_DEV_DISCONNECTED. > Calling pci_dev_set_disconnected() only gives the PCI core and the > driver bound to the device an indication that's it's inaccessible, > so any code paths during unbound and PCI device teardown can skip > accesses. (Because pci_dev_is_disconnected() is currently scoped > to the PCI core, the disconnected status can only be queried from > drivers that live in the PCI core, such as portdrv and all the > port services drivers.) After the pci_dev is removed from the > hierarchy, accessing it seems at least questionable. I fully agree. > Does this make things clearer? Shout if it not. :-) > > Thanks, > > Lukas >
On Wed, Aug 01, 2018 at 07:15:12PM +0200, Lukas Wunner wrote: > On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote: > > On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: > > > -static void remove_board(struct slot *p_slot) > > > +static void remove_board(struct slot *p_slot, bool safe_removal) > > > { > > > struct controller *ctrl = p_slot->ctrl; > > > > > > - pciehp_unconfigure_device(p_slot); > > > + pciehp_unconfigure_device(p_slot, safe_removal); > > > > Below we turn off power to the slot if it has power controller. Even if > > we disable slot from sysfs, I think it ends up being inaccessible after > > power is turned off. I wonder if we should mark the devices disconnected > > in that case as well? > > > > > > > > if (POWER_CTRL(ctrl)) { > > > pciehp_power_off_slot(p_slot); > > No, when pciehp_unconfigure_device() returns, the PCI devices below > the hotplug bridge are unbound and removed from the system. They're > gone, so the bit set in their pci_dev struct would no longer be > accessible anyway. Unless of course something is holding a ref on > the pci_dev, but that would seem to be a bug. (Accessing a device > that's already removed from the system, that is.) > > Calling pci_dev_set_disconnected() only gives the PCI core and the > driver bound to the device an indication that's it's inaccessible, > so any code paths during unbound and PCI device teardown can skip > accesses. (Because pci_dev_is_disconnected() is currently scoped > to the PCI core, the disconnected status can only be queried from > drivers that live in the PCI core, such as portdrv and all the > port services drivers.) After the pci_dev is removed from the > hierarchy, accessing it seems at least questionable. > > Does this make things clearer? Shout if it not. :-) Yes it does. Thank you :)
Hi , >After the pci_dev is removed from the >> hierarchy, accessing it seems at least questionable. What about AER driver . I was discussing the same in another mail chain with subject "Possible race condition in the kernel between PCI driver and AER handling" Regards Gokul, --------------------FYI--------------- I am suspecting a possible race condition in the kernel between PCI driver and AER handling. Because of the same kernel panic happens from worker thread which handles bottom half of aer irq. I am seeing this issue when I suddenly power off PCI card which supports/enabled PCIE AER error reporting. While powering off PCI device, AER driver will get AER IRQ for the device, from AER IRQ handler, it will cache AER error code and schedule worker thread to handle error. The PCIe device will get removed from PCI tree before worker thread completes its task and kernel panic is happening when worker thread tries to access PCI device's config space. Issue: crash> crash> bt PID: 2727 TASK: ffff880272adc530 CPU: 0 COMMAND: "kworker/0:2" #0 [ffff88027469fac8] machine_kexec at ffffffff8102cf18 #1 [ffff88027469fb28] crash_kexec at ffffffff810a6b05 #2 [ffff88027469fbf0] oops_end at ffffffff8176d960 #3 [ffff88027469fc18] die at ffffffff810060db #4 [ffff88027469fc48] do_general_protection at ffffffff8176d452 #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2 [exception RIP: pci_bus_read_config_dword+100] RIP: ffffffff813405f4 RSP: ffff88027469fd20 RFLAGS: 00010046 RAX: 435f494350006963 RBX: ffff880274892000 RCX: 0000000000000004 RDX: 0000000000000100 RSI: 0000000000000060 RDI: ffff880274892000 RBP: ffff88027469fd48 R8: ffff88027469fd2c R9: 00000000000012c0 R10: 0000000000000006 R11: 00000000000012bf R12: ffff88027469fd5c R13: 0000000000000246 R14: 0000000000000000 R15: ffff8802741a4000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225 #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38 #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0 #11 [ffff88027469fe70] worker_thread at ffffffff8105e251 #12 [ffff88027469fed0] kthread at ffffffff81064260 #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38 crash> I have tested it on kernel 3.10 . But from source i could see that this case is still relevant for latest Linux source . --------------------END-------------- On Thu, Aug 2, 2018 at 12:50 PM, Mika Westerberg < mika.westerberg@linux.intel.com> wrote: > On Wed, Aug 01, 2018 at 07:15:12PM +0200, Lukas Wunner wrote: > > On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote: > > > On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: > > > > -static void remove_board(struct slot *p_slot) > > > > +static void remove_board(struct slot *p_slot, bool safe_removal) > > > > { > > > > struct controller *ctrl = p_slot->ctrl; > > > > > > > > - pciehp_unconfigure_device(p_slot); > > > > + pciehp_unconfigure_device(p_slot, safe_removal); > > > > > > Below we turn off power to the slot if it has power controller. Even if > > > we disable slot from sysfs, I think it ends up being inaccessible after > > > power is turned off. I wonder if we should mark the devices > disconnected > > > in that case as well? > > > > > > > > > > > if (POWER_CTRL(ctrl)) { > > > > pciehp_power_off_slot(p_slot); > > > > No, when pciehp_unconfigure_device() returns, the PCI devices below > > the hotplug bridge are unbound and removed from the system. They're > > gone, so the bit set in their pci_dev struct would no longer be > > accessible anyway. Unless of course something is holding a ref on > > the pci_dev, but that would seem to be a bug. (Accessing a device > > that's already removed from the system, that is.) > > > > Calling pci_dev_set_disconnected() only gives the PCI core and the > > driver bound to the device an indication that's it's inaccessible, > > so any code paths during unbound and PCI device teardown can skip > > accesses. (Because pci_dev_is_disconnected() is currently scoped > > to the PCI core, the disconnected status can only be queried from > > drivers that live in the PCI core, such as portdrv and all the > > port services drivers.) After the pci_dev is removed from the > > hierarchy, accessing it seems at least questionable. > > > > Does this make things clearer? Shout if it not. :-) > > Yes it does. Thank you :) > <div dir="ltr">Hi ,<div><br></div><div>><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">After the pci_dev is removed from the</span><br style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">>> hierarchy, accessing it seems at least questionable.</span></div><div><br></div><div>What about AER driver . I was discussing the same in another mail chain with subject "<span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Possible race condition in the kernel between PCI driver and AER handling"</span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Regards</span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Gokul,</span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">--------------------FYI---------------</span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><span style="font-size:12px;font-family:Helvetica"><span>I </span>am suspecting a possible race condition in the kernel between PCI driver and AER handling.</span><br></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">Because of the same kernel panic happens from worker thread which handles bottom half of aer irq.</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"><br></span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">I am seeing this issue when I suddenly power off PCI card which supports/enabled PCIE AER error reporting.</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">While powering off PCI device, AER driver will get AER IRQ for the device, from AER IRQ handler, it will cache AER error code and schedule worker thread to handle error.</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">The PCIe device will get removed from PCI tree before worker thread completes its task and kernel panic is happening when worker thread tries to access PCI device's config space.</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><br></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:small;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">Issue:</span></font></p><br class="gmail-m_6437089603770748812gmail-Apple-interchange-newline" style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial"><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">crash></span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">crash> bt</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">PID: 2727 TASK: ffff880272adc530 CPU: 0 COMMAND: "kworker/0:2"</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#0 [ffff88027469fac8] machine_kexec at ffffffff8102cf18</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#1 [ffff88027469fb28] crash_kexec at ffffffff810a6b05</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#2 [ffff88027469fbf0] oops_end at ffffffff8176d960</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#3 [ffff88027469fc18] die at ffffffff810060db</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#4 [ffff88027469fc48] do_general_protection at ffffffff8176d452</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#5 [ffff88027469fc70] general_protection at ffffffff8176cdf2</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> [exception RIP: pci_bus_read_config_dword+100]</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> RIP: ffffffff813405f4 RSP: ffff88027469fd20 RFLAGS: 00010046</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> RAX: 435f494350006963 RBX: ffff880274892000 RCX: 0000000000000004</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> RDX: 0000000000000100 RSI: 0000000000000060 RDI: ffff880274892000</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> RBP: ffff88027469fd48 R8: ffff88027469fd2c R9: 00000000000012c0</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> R10: 0000000000000006 R11: 00000000000012bf R12: ffff88027469fd5c</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> R13: 0000000000000246 R14: 0000000000000000 R15: ffff8802741a4000</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#9 [ffff88027469fdd0] aer_isr at ffffffff81357a38</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#11 [ffff88027469fe70] worker_thread at ffffffff8105e251</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#12 [ffff88027469fed0] kthread at ffffffff81064260</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">#13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38</span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"><br></span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">crash></span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px"><br></span></font></p><p class="gmail-m_6437089603770748812gmail-p1" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8px;line-height:normal;text-decoration-style:initial;text-decoration-color:initial;margin:0px"><font face="Helvetica"><span style="font-size:12px">I have tested it on kernel 3.10 . But from source i could see that this case is still relevant for latest Linux source .</span></font></p></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">--------------------END-------------- </span></div><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 2, 2018 at 12:50 PM, Mika Westerberg <span dir="ltr"><<a href="mailto:mika.westerberg@linux.intel.com" target="_blank">mika.westerberg@linux.intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Wed, Aug 01, 2018 at 07:15:12PM +0200, Lukas Wunner wrote:<br> > On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote:<br> > > On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote:<br> > > > -static void remove_board(struct slot *p_slot)<br> > > > +static void remove_board(struct slot *p_slot, bool safe_removal)<br> > > > {<br> > > > struct controller *ctrl = p_slot->ctrl;<br> > > > <br> > > > - pciehp_unconfigure_device(p_<wbr>slot);<br> > > > + pciehp_unconfigure_device(p_<wbr>slot, safe_removal);<br> > > <br> > > Below we turn off power to the slot if it has power controller. Even if<br> > > we disable slot from sysfs, I think it ends up being inaccessible after<br> > > power is turned off. I wonder if we should mark the devices disconnected<br> > > in that case as well?<br> > > <br> > > > <br> > > > if (POWER_CTRL(ctrl)) {<br> > > > pciehp_power_off_slot(p_slot);<br> > <br> > No, when pciehp_unconfigure_device() returns, the PCI devices below<br> > the hotplug bridge are unbound and removed from the system. They're<br> > gone, so the bit set in their pci_dev struct would no longer be<br> > accessible anyway. Unless of course something is holding a ref on<br> > the pci_dev, but that would seem to be a bug. (Accessing a device<br> > that's already removed from the system, that is.)<br> > <br> > Calling pci_dev_set_disconnected() only gives the PCI core and the<br> > driver bound to the device an indication that's it's inaccessible,<br> > so any code paths during unbound and PCI device teardown can skip<br> > accesses. (Because pci_dev_is_disconnected() is currently scoped<br> > to the PCI core, the disconnected status can only be queried from<br> > drivers that live in the PCI core, such as portdrv and all the<br> > port services drivers.) After the pci_dev is removed from the<br> > hierarchy, accessing it seems at least questionable.<br> > <br> > Does this make things clearer? Shout if it not. :-)<br> <br> Yes it does. Thank you :)<br> </blockquote></div><br></div></div></div>
On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote: > I am suspecting a possible race condition in the kernel between PCI driver > and AER handling. > > Because of the same kernel panic happens from worker thread which handles > bottom half of aer irq. > > I am seeing this issue when I suddenly power off PCI card which > supports/enabled PCIE AER error reporting. > > While powering off PCI device, AER driver will get AER IRQ for the device, > from AER IRQ handler, it will cache AER error code and schedule worker > thread to handle error. > > The PCIe device will get removed from PCI tree before worker thread > completes its task and kernel panic is happening when worker thread tries > to access PCI device's config space. > > #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2 > [exception RIP: pci_bus_read_config_dword+100] > #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b > #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225 > #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d > #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38 > #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0 > #11 [ffff88027469fe70] worker_thread at ffffffff8105e251 > #12 [ffff88027469fed0] kthread at ffffffff81064260 > #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38 > > I have tested it on kernel 3.10 . But from source i could see that this > case is still relevant for latest Linux source . I'm not really familiar with the AER driver, but the problem is actually easy to spot: find_source_device() walks the hierarchy and saves a pointer to pci_dev's in an array. That array is later traversed and the pci_dev's are accessed. The solution is to acquire a ref on each device in add_error_device(): - e_info->dev[e_info->error_dev_num] = dev; + e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); Then release the ref aer_process_err_devices() by calling pci_dev_put(). I believe there's an ongoing refactoring of the AER driver and the issue may be addressed in the course of it, but as a quick fix for an ancient v3.10 kernel, the above should do the trick. HTH, Lukas
Hi Lukas, >The solution is to acquire a ref on each device in add_error_device(): I will give try and update if it works . regards, Gokul On Thu, Aug 2, 2018 at 2:16 PM, Lukas Wunner <lukas@wunner.de> wrote: > On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote: > > I am suspecting a possible race condition in the kernel between PCI > driver > > and AER handling. > > > > Because of the same kernel panic happens from worker thread which handles > > bottom half of aer irq. > > > > I am seeing this issue when I suddenly power off PCI card which > > supports/enabled PCIE AER error reporting. > > > > While powering off PCI device, AER driver will get AER IRQ for the > device, > > from AER IRQ handler, it will cache AER error code and schedule worker > > thread to handle error. > > > > The PCIe device will get removed from PCI tree before worker thread > > completes its task and kernel panic is happening when worker thread > tries > > to access PCI device's config space. > > > > #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2 > > [exception RIP: pci_bus_read_config_dword+100] > > #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b > > #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225 > > #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d > > #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38 > > #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0 > > #11 [ffff88027469fe70] worker_thread at ffffffff8105e251 > > #12 [ffff88027469fed0] kthread at ffffffff81064260 > > #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38 > > > > I have tested it on kernel 3.10 . But from source i could see that this > > case is still relevant for latest Linux source . > > I'm not really familiar with the AER driver, but the problem is > actually easy to spot: > > find_source_device() walks the hierarchy and saves a pointer to > pci_dev's in an array. That array is later traversed and the > pci_dev's are accessed. > > The solution is to acquire a ref on each device in add_error_device(): > > - e_info->dev[e_info->error_dev_num] = dev; > + e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); > > Then release the ref aer_process_err_devices() by calling pci_dev_put(). > > I believe there's an ongoing refactoring of the AER driver and the > issue may be addressed in the course of it, but as a quick fix for > an ancient v3.10 kernel, the above should do the trick. > > HTH, > > Lukas > <div dir="ltr">Hi <span style="font-size:small;text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Lukas,</span><div><br></div><div><div>><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">The solution is to acquire a ref on each device in add_error_device():</span></div><div>I will give try and update if it works .<br><div class="gmail_extra"><br></div><div class="gmail_extra">regards,</div><div class="gmail_extra">Gokul </div><div class="gmail_extra"><div class="gmail_quote">On Thu, Aug 2, 2018 at 2:16 PM, Lukas Wunner <span dir="ltr"><<a href="mailto:lukas@wunner.de" target="_blank">lukas@wunner.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote:<br> > I am suspecting a possible race condition in the kernel between PCI driver<br> > and AER handling.<br> > <br> > Because of the same kernel panic happens from worker thread which handles<br> > bottom half of aer irq.<br> > <br> > I am seeing this issue when I suddenly power off PCI card which<br> > supports/enabled PCIE AER error reporting.<br> > <br> > While powering off PCI device, AER driver will get AER IRQ for the device,<br> > from AER IRQ handler, it will cache AER error code and schedule worker<br> > thread to handle error.<br> > <br> > The PCIe device will get removed from PCI tree before worker thread<br> > completes its task and kernel panic is happening when worker thread tries<br> > to access PCI device's config space.<br> > <br> </span><span class="">> #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2<br> > [exception RIP: pci_bus_read_config_dword+100]<br> </span><span class="">> #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b<br> > #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225<br> > #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d<br> > #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38<br> > #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0<br> > #11 [ffff88027469fe70] worker_thread at ffffffff8105e251<br> > #12 [ffff88027469fed0] kthread at ffffffff81064260<br> > #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38<br> > <br> </span><span class="">> I have tested it on kernel 3.10 . But from source i could see that this<br> > case is still relevant for latest Linux source .<br> <br> </span>I'm not really familiar with the AER driver, but the problem is<br> actually easy to spot:<br> <br> find_source_device() walks the hierarchy and saves a pointer to<br> pci_dev's in an array. That array is later traversed and the<br> pci_dev's are accessed.<br> <br> The solution is to acquire a ref on each device in add_error_device():<br> <br> - e_info->dev[e_info->error_dev_<wbr>num] = dev;<br> + e_info->dev[e_info->error_dev_<wbr>num] = pci_dev_get(dev);<br> <br> Then release the ref aer_process_err_devices() by calling pci_dev_put().<br> <br> I believe there's an ongoing refactoring of the AER driver and the<br> issue may be addressed in the course of it, but as a quick fix for<br> an ancient v3.10 kernel, the above should do the trick.<br> <br> HTH,<br> <br> Lukas<br> </blockquote></div><br></div></div></div></div>
[cc += Thomas Tai] On Thu, Aug 02, 2018 at 10:46:57AM +0200, Lukas Wunner wrote: > On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote: > > I am suspecting a possible race condition in the kernel between PCI driver > > and AER handling. > > The solution is to acquire a ref on each device in add_error_device(). > Then release the ref aer_process_err_devices() by calling pci_dev_put(). So in case it wasn't clear, the below is what I had in mind. Completely untested though. Does this work for you? For v3.10 compatibility, cherry-pick 89ee9f768003 (or alternatively cherry-pick 8496e85c20e7 and replace pci_dev_is_disconnected(dev) with !pci_device_is_present(dev)). -- >8 -- Subject: [PATCH] PCI/AER: Fix use-after-free on surprise removal The work item to consume errors, aer_isr(), walks the hierarchy using pci_walk_bus() and stores a pointer to PCI devices which reported an error in an array. As long as pci_walk_bus() runs, those pointers are valid because pci_bus_sem is held. But once pci_walk_bus() finishes, nothing prevents the pointers from becoming invalid, e.g. through unplugging of the PCI devices. The unprotected pointers are then dereferenced in aer_process_err_devices(), which may oops: #5 general_protection at ffffffff8176cdf2 [exception RIP: pci_bus_read_config_dword+100] #6 pci_find_next_ext_capability at ffffffff81345d7b #7 pci_find_ext_capability at ffffffff81347225 #8 get_device_error_info at ffffffff81356c4d #9 aer_isr at ffffffff81357a38 Fix by holding a ref on the devices until they have been processed. Skip processing of unplugged devices. Reported-by: gokul cg <gokuljnpr@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> --- drivers/pci/pcie/aer.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index a2e8838..937592e 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -657,7 +657,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity, static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev) { if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { - e_info->dev[e_info->error_dev_num] = dev; + e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); e_info->error_dev_num++; return 0; } @@ -898,6 +898,9 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) if (!pos) return 0; + if (pci_dev_is_disconnected(dev)) + return 0; + if (info->severity == AER_CORRECTABLE) { pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, &info->status); @@ -948,6 +951,7 @@ static inline void aer_process_err_devices(struct aer_err_info *e_info) for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { if (get_device_error_info(e_info->dev[i], e_info)) handle_error_source(e_info->dev[i], e_info); + pci_dev_put(e_info->dev[i]); } }
On 08/02/2018 11:07 AM, Lukas Wunner wrote: > [cc += Thomas Tai] Hi Lukas, Thank you very much for cc me. > > On Thu, Aug 02, 2018 at 10:46:57AM +0200, Lukas Wunner wrote: >> On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote: >>> I am suspecting a possible race condition in the kernel between PCI driver >>> and AER handling. >> >> The solution is to acquire a ref on each device in add_error_device(). >> Then release the ref aer_process_err_devices() by calling pci_dev_put(). > > So in case it wasn't clear, the below is what I had in mind. > Completely untested though. Does this work for you? > > For v3.10 compatibility, cherry-pick 89ee9f768003 (or alternatively > cherry-pick 8496e85c20e7 and replace pci_dev_is_disconnected(dev) > with !pci_device_is_present(dev)). > > -- >8 -- > Subject: [PATCH] PCI/AER: Fix use-after-free on surprise removal > > The work item to consume errors, aer_isr(), walks the hierarchy using > pci_walk_bus() and stores a pointer to PCI devices which reported an > error in an array. As long as pci_walk_bus() runs, those pointers are > valid because pci_bus_sem is held. But once pci_walk_bus() finishes, > nothing prevents the pointers from becoming invalid, e.g. through > unplugging of the PCI devices. The unprotected pointers are then > dereferenced in aer_process_err_devices(), which may oops: I like your idea to increment the refcount during pci_walk_bus(), that should fix the use-after-free issue. We just need Gokul to confirm if it fixes his issue or not. Thanks, Thomas > > #5 general_protection at ffffffff8176cdf2 > [exception RIP: pci_bus_read_config_dword+100] > #6 pci_find_next_ext_capability at ffffffff81345d7b > #7 pci_find_ext_capability at ffffffff81347225 > #8 get_device_error_info at ffffffff81356c4d > #9 aer_isr at ffffffff81357a38 > > Fix by holding a ref on the devices until they have been processed. > Skip processing of unplugged devices. > > Reported-by: gokul cg <gokuljnpr@gmail.com> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > --- > drivers/pci/pcie/aer.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index a2e8838..937592e 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -657,7 +657,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity, > static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev) > { > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > - e_info->dev[e_info->error_dev_num] = dev; > + e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); > e_info->error_dev_num++; > return 0; > } > @@ -898,6 +898,9 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) > if (!pos) > return 0; > > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, > &info->status); > @@ -948,6 +951,7 @@ static inline void aer_process_err_devices(struct aer_err_info *e_info) > for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { > if (get_device_error_info(e_info->dev[i], e_info)) > handle_error_source(e_info->dev[i], e_info); > + pci_dev_put(e_info->dev[i]); > } > } > >
On 08/06/2018 02:33 PM, gokul cg wrote: > Hi, > > I have tried with following patch and I am still getting same kernel panic. Sorry to hear that. What kernel message did you see before the crash. We assume the hotplug is corrupting the pci_dev, but is it really? Have you tried without insmod the hotplug provided that you can still power off the device via your I2C. > > -------------X++++++++++++++++++++X--------------------- > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c > b/drivers/pci/pcie/aer/aerdrv_core.c > index 0f4554e..05592aa 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -26,6 +26,7 @@ > #include <linux/slab.h> > #include <linux/kfifo.h> > #include "aerdrv.h" > +#include "../../pci.h" > > static bool forceload; > static bool nosourceid; > @@ -82,7 +82,7 @@ EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status); > static int add_error_device(struct aer_err_info *e_info, struct > pci_dev *dev) > { > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > -e_info->dev[e_info->error_dev_num] = dev; > +e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); > e_info->error_dev_num++; > return 0; > } > @@ -659,6 +659,9 @@ static int get_device_error_info(struct pci_dev > *dev, struct aer_err_info *info) > if (!pos) > return 1; > > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, > &info->status); > @@ -710,6 +713,8 @@ static inline void aer_process_err_devices(struct > pcie_device *p_device, > for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { > if (get_device_error_info(e_info->dev[i], e_info)) > handle_error_source(p_device, e_info->dev[i], e_info); > + > + pci_dev_put(e_info->dev[i]); > } > } > -------------X++++++++++++++++++++X--------------------- > > > Note: I have configured CONFIG_HOTPLUG_PCI_PCIE and CONFIG_HOTPLUG_PCI > as modules and loading in start up using script. > > root@/proc/:~# cat config | grep -i HOT > CONFIG_TICK_ONESHOT=y > CONFIG_HOTPLUG=y > # CONFIG_MEMORY_HOTPLUG is not set > CONFIG_HOTPLUG_CPU=y > # CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set > # CONFIG_DEBUG_HOTPLUG_CPU0 is not set > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y > CONFIG_ACPI_HOTPLUG_CPU=y > CONFIG_HOTPLUG_PCI_PCIE=m > CONFIG_HOTPLUG_PCI=m > # CONFIG_HOTPLUG_PCI_CPCI is not set > # CONFIG_HOTPLUG_PCI_SHPC is not set > CONFIG_DM_SNAPSHOT=y > # CONFIG_USB_STORAGE_JUMPSHOT is not set > # CONFIG_TRACER_SNAPSHOT is not set > root@/proc/:~# > > Panic back trace : > crash> bt > PID: 24 TASK: ffff880274ac0000 CPU: 0 COMMAND: "kworker/0:1" > #0 [ffff880274abbac8] machine_kexec at ffffffff8102cf18 > #1 [ffff880274abbb28] crash_kexec at ffffffff810a6b05 > #2 [ffff880274abbbf0] oops_end at ffffffff8176d8a0 > #3 [ffff880274abbc18] die at ffffffff810060db > #4 [ffff880274abbc48] do_general_protection at ffffffff8176d392 > #5 [ffff880274abbc70] general_protection at ffffffff8176cd32 > [exception RIP: pci_bus_read_config_dword+100] > RIP: ffffffff813405f4 RSP: ffff880274abbd20 RFLAGS: 00010046 > RAX: 435f494350006963 RBX: ffff880274891800 RCX: 0000000000000004 > RDX: 0000000000000ffc RSI: 0000000000000060 RDI: ffff880274891800 > RBP: ffff880274abbd48 R8: ffff880274abbd2c R9: 00000000000002b8 > R10: ffff880274340000 R11: 0000000000000246 R12: ffff880274abbd5c > R13: 0000000000000246 R14: 0000000000000000 R15: ffff880274920000 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #6 [ffff880274abbd50] pci_find_next_ext_capability at ffffffff81345db6 > #7 [ffff880274abbd90] pci_find_ext_capability at ffffffff81347225 > #8 [ffff880274abbda0] get_device_error_info at ffffffff81356c4d > #9 [ffff880274abbdd0] aer_isr at ffffffff81357ab0 > #10 [ffff880274abbe28] process_one_work at ffffffff8105d4c0 > #11 [ffff880274abbe70] worker_thread at ffffffff8105e251 > #12 [ffff880274abbed0] kthread at ffffffff81064260 > #13 [ffff880274abbf50] ret_from_fork at ffffffff81773978 > crash> > > > Regards, > Gokul > > On Thu, Aug 2, 2018 at 10:39 PM, Thomas Tai <thomas.tai@oracle.com > <mailto:thomas.tai@oracle.com>> wrote: > > > On 08/02/2018 11:07 AM, Lukas Wunner wrote: > > [cc += Thomas Tai] > > > Hi Lukas, > Thank you very much for cc me. > > > On Thu, Aug 02, 2018 at 10:46:57AM +0200, Lukas Wunner wrote: > > On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote: > > I am suspecting a possible race condition in the kernel > between PCI driver > and AER handling. > > > The solution is to acquire a ref on each device in > add_error_device(). > Then release the ref aer_process_err_devices() by calling > pci_dev_put(). > > > So in case it wasn't clear, the below is what I had in mind. > Completely untested though. Does this work for you? > > For v3.10 compatibility, cherry-pick 89ee9f768003 (or alternatively > cherry-pick 8496e85c20e7 and replace pci_dev_is_disconnected(dev) > with !pci_device_is_present(dev)). > > -- >8 -- > Subject: [PATCH] PCI/AER: Fix use-after-free on surprise removal > > The work item to consume errors, aer_isr(), walks the hierarchy > using > pci_walk_bus() and stores a pointer to PCI devices which reported an > error in an array. As long as pci_walk_bus() runs, those > pointers are > valid because pci_bus_sem is held. But once pci_walk_bus() > finishes, > nothing prevents the pointers from becoming invalid, e.g. through > unplugging of the PCI devices. The unprotected pointers are then > dereferenced in aer_process_err_devices(), which may oops: > > > I like your idea to increment the refcount during pci_walk_bus(), > that should fix the use-after-free issue. We just need Gokul to > confirm if it fixes his issue or not. > > Thanks, > Thomas > > > > #5 general_protection at ffffffff8176cdf2 > [exception RIP: pci_bus_read_config_dword+100] > #6 pci_find_next_ext_capability at ffffffff81345d7b > #7 pci_find_ext_capability at ffffffff81347225 > #8 get_device_error_info at ffffffff81356c4d > #9 aer_isr at ffffffff81357a38 > > Fix by holding a ref on the devices until they have been processed. > Skip processing of unplugged devices. > > Reported-by: gokul cg <gokuljnpr@gmail.com > <mailto:gokuljnpr@gmail.com>> > Signed-off-by: Lukas Wunner <lukas@wunner.de > <mailto:lukas@wunner.de>> > --- > drivers/pci/pcie/aer.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index a2e8838..937592e 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -657,7 +657,7 @@ void cper_print_aer(struct pci_dev *dev, int > aer_severity, > static int add_error_device(struct aer_err_info *e_info, > struct pci_dev *dev) > { > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > - e_info->dev[e_info->error_dev_num] = dev; > + e_info->dev[e_info->error_dev_num] = > pci_dev_get(dev); > e_info->error_dev_num++; > return 0; > } > @@ -898,6 +898,9 @@ static int get_device_error_info(struct > pci_dev *dev, struct aer_err_info *info) > if (!pos) > return 0; > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + > PCI_ERR_COR_STATUS, > &info->status); > @@ -948,6 +951,7 @@ static inline void > aer_process_err_devices(struct aer_err_info *e_info) > for (i = 0; i < e_info->error_dev_num && > e_info->dev[i]; i++) { > if (get_device_error_info(e_info->dev[i], e_info)) > handle_error_source(e_info->dev[i], > e_info); > + pci_dev_put(e_info->dev[i]); > } > } > >
Hi Gokul, Something pop up in my mind and want to share with you. I assume that your device is not a root port device or a switch device. I assume when you power off the device, a FATAL error is sent to the root port thus trigger the aer_isr. Since it is a fatal error and your device is not a switch device, the code should not reach out your device because fatal error means that the link to your device is not reliable. So the pci_find_ext_capability() looks strange to me. When compare the code with the master branch. v3.10 is missing following patch. Would you think you can give it a try? commit 66b808099146166c44157600a166c8372172cd76 Author: Keith Busch <keith.busch@intel.com> Date: Tue Sep 27 16:23:34 2016 -0400 PCI/AER: Cache capability position Save the position of the error reporting capability so it doesn't need to be rediscovered during error handling. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Lukas Wunner <lukas@wunner.de> - Thomas On 08/06/2018 02:33 PM, gokul cg wrote: > Hi, > > I have tried with following patch and I am still getting same kernel panic. > > -------------X++++++++++++++++++++X--------------------- > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c > b/drivers/pci/pcie/aer/aerdrv_core.c > index 0f4554e..05592aa 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -26,6 +26,7 @@ > #include <linux/slab.h> > #include <linux/kfifo.h> > #include "aerdrv.h" > +#include "../../pci.h" > > static bool forceload; > static bool nosourceid; > @@ -82,7 +82,7 @@ EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status); > static int add_error_device(struct aer_err_info *e_info, struct > pci_dev *dev) > { > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > -e_info->dev[e_info->error_dev_num] = dev; > +e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); > e_info->error_dev_num++; > return 0; > } > @@ -659,6 +659,9 @@ static int get_device_error_info(struct pci_dev > *dev, struct aer_err_info *info) > if (!pos) > return 1; > > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, > &info->status); > @@ -710,6 +713,8 @@ static inline void aer_process_err_devices(struct > pcie_device *p_device, > for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { > if (get_device_error_info(e_info->dev[i], e_info)) > handle_error_source(p_device, e_info->dev[i], e_info); > + > + pci_dev_put(e_info->dev[i]); > } > } > -------------X++++++++++++++++++++X--------------------- > > > Note: I have configured CONFIG_HOTPLUG_PCI_PCIE and CONFIG_HOTPLUG_PCI > as modules and loading in start up using script. > > root@/proc/:~# cat config | grep -i HOT > CONFIG_TICK_ONESHOT=y > CONFIG_HOTPLUG=y > # CONFIG_MEMORY_HOTPLUG is not set > CONFIG_HOTPLUG_CPU=y > # CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set > # CONFIG_DEBUG_HOTPLUG_CPU0 is not set > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y > CONFIG_ACPI_HOTPLUG_CPU=y > CONFIG_HOTPLUG_PCI_PCIE=m > CONFIG_HOTPLUG_PCI=m > # CONFIG_HOTPLUG_PCI_CPCI is not set > # CONFIG_HOTPLUG_PCI_SHPC is not set > CONFIG_DM_SNAPSHOT=y > # CONFIG_USB_STORAGE_JUMPSHOT is not set > # CONFIG_TRACER_SNAPSHOT is not set > root@/proc/:~# > > Panic back trace : > crash> bt > PID: 24 TASK: ffff880274ac0000 CPU: 0 COMMAND: "kworker/0:1" > #0 [ffff880274abbac8] machine_kexec at ffffffff8102cf18 > #1 [ffff880274abbb28] crash_kexec at ffffffff810a6b05 > #2 [ffff880274abbbf0] oops_end at ffffffff8176d8a0 > #3 [ffff880274abbc18] die at ffffffff810060db > #4 [ffff880274abbc48] do_general_protection at ffffffff8176d392 > #5 [ffff880274abbc70] general_protection at ffffffff8176cd32 > [exception RIP: pci_bus_read_config_dword+100] > RIP: ffffffff813405f4 RSP: ffff880274abbd20 RFLAGS: 00010046 > RAX: 435f494350006963 RBX: ffff880274891800 RCX: 0000000000000004 > RDX: 0000000000000ffc RSI: 0000000000000060 RDI: ffff880274891800 > RBP: ffff880274abbd48 R8: ffff880274abbd2c R9: 00000000000002b8 > R10: ffff880274340000 R11: 0000000000000246 R12: ffff880274abbd5c > R13: 0000000000000246 R14: 0000000000000000 R15: ffff880274920000 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #6 [ffff880274abbd50] pci_find_next_ext_capability at ffffffff81345db6 > #7 [ffff880274abbd90] pci_find_ext_capability at ffffffff81347225 > #8 [ffff880274abbda0] get_device_error_info at ffffffff81356c4d > #9 [ffff880274abbdd0] aer_isr at ffffffff81357ab0 > #10 [ffff880274abbe28] process_one_work at ffffffff8105d4c0 > #11 [ffff880274abbe70] worker_thread at ffffffff8105e251 > #12 [ffff880274abbed0] kthread at ffffffff81064260 > #13 [ffff880274abbf50] ret_from_fork at ffffffff81773978 > crash> > > > Regards, > Gokul > > On Thu, Aug 2, 2018 at 10:39 PM, Thomas Tai <thomas.tai@oracle.com > <mailto:thomas.tai@oracle.com>> wrote: > > > On 08/02/2018 11:07 AM, Lukas Wunner wrote: > > [cc += Thomas Tai] > > > Hi Lukas, > Thank you very much for cc me. > > > On Thu, Aug 02, 2018 at 10:46:57AM +0200, Lukas Wunner wrote: > > On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote: > > I am suspecting a possible race condition in the kernel > between PCI driver > and AER handling. > > > The solution is to acquire a ref on each device in > add_error_device(). > Then release the ref aer_process_err_devices() by calling > pci_dev_put(). > > > So in case it wasn't clear, the below is what I had in mind. > Completely untested though. Does this work for you? > > For v3.10 compatibility, cherry-pick 89ee9f768003 (or alternatively > cherry-pick 8496e85c20e7 and replace pci_dev_is_disconnected(dev) > with !pci_device_is_present(dev)). > > -- >8 -- > Subject: [PATCH] PCI/AER: Fix use-after-free on surprise removal > > The work item to consume errors, aer_isr(), walks the hierarchy > using > pci_walk_bus() and stores a pointer to PCI devices which reported an > error in an array. As long as pci_walk_bus() runs, those > pointers are > valid because pci_bus_sem is held. But once pci_walk_bus() > finishes, > nothing prevents the pointers from becoming invalid, e.g. through > unplugging of the PCI devices. The unprotected pointers are then > dereferenced in aer_process_err_devices(), which may oops: > > > I like your idea to increment the refcount during pci_walk_bus(), > that should fix the use-after-free issue. We just need Gokul to > confirm if it fixes his issue or not. > > Thanks, > Thomas > > > > #5 general_protection at ffffffff8176cdf2 > [exception RIP: pci_bus_read_config_dword+100] > #6 pci_find_next_ext_capability at ffffffff81345d7b > #7 pci_find_ext_capability at ffffffff81347225 > #8 get_device_error_info at ffffffff81356c4d > #9 aer_isr at ffffffff81357a38 > > Fix by holding a ref on the devices until they have been processed. > Skip processing of unplugged devices. > > Reported-by: gokul cg <gokuljnpr@gmail.com > <mailto:gokuljnpr@gmail.com>> > Signed-off-by: Lukas Wunner <lukas@wunner.de > <mailto:lukas@wunner.de>> > --- > drivers/pci/pcie/aer.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index a2e8838..937592e 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -657,7 +657,7 @@ void cper_print_aer(struct pci_dev *dev, int > aer_severity, > static int add_error_device(struct aer_err_info *e_info, > struct pci_dev *dev) > { > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > - e_info->dev[e_info->error_dev_num] = dev; > + e_info->dev[e_info->error_dev_num] = > pci_dev_get(dev); > e_info->error_dev_num++; > return 0; > } > @@ -898,6 +898,9 @@ static int get_device_error_info(struct > pci_dev *dev, struct aer_err_info *info) > if (!pos) > return 0; > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + > PCI_ERR_COR_STATUS, > &info->status); > @@ -948,6 +951,7 @@ static inline void > aer_process_err_devices(struct aer_err_info *e_info) > for (i = 0; i < e_info->error_dev_num && > e_info->dev[i]; i++) { > if (get_device_error_info(e_info->dev[i], e_info)) > handle_error_source(e_info->dev[i], > e_info); > + pci_dev_put(e_info->dev[i]); > } > } > > From 66b808099146166c44157600a166c8372172cd76 Mon Sep 17 00:00:00 2001 From: Keith Busch <keith.busch@intel.com> Date: Tue, 27 Sep 2016 16:23:34 -0400 Subject: [PATCH] PCI/AER: Cache capability position Save the position of the error reporting capability so it doesn't need to be rediscovered during error handling. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Lukas Wunner <lukas@wunner.de> --- drivers/pci/pcie/aer/aerdrv.c | 10 +++++----- drivers/pci/pcie/aer/aerdrv_core.c | 18 ++++++++++++------ drivers/pci/probe.c | 3 ++- include/linux/pci.h | 5 +++++ 4 files changed, 24 insertions(+), 12 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c index 08ce257..e99efaa 100644 --- a/drivers/pci/pcie/aer/aerdrv.c +++ b/drivers/pci/pcie/aer/aerdrv.c @@ -134,7 +134,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) pcie_capability_clear_word(pdev, PCI_EXP_RTCTL, SYSTEM_ERROR_INTR_ON_MESG_MASK); - aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR); + aer_pos = pdev->aer_cap; /* Clear error status */ pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, ®32); pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, reg32); @@ -173,7 +173,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) */ set_downstream_devices_error_reporting(pdev, false); - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR); + pos = pdev->aer_cap; /* Disable Root's interrupt in response to error messages */ pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, ®32); reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; @@ -200,7 +200,7 @@ irqreturn_t aer_irq(int irq, void *context) unsigned long flags; int pos; - pos = pci_find_ext_capability(pdev->port, PCI_EXT_CAP_ID_ERR); + pos = pdev->port->aer_cap; /* * Must lock access to Root Error Status Reg, Root Error ID Reg, * and Root error producer/consumer index @@ -338,7 +338,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) u32 reg32; int pos; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; /* Disable Root's interrupt in response to error messages */ pci_read_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, ®32); @@ -391,7 +391,7 @@ static void aer_error_resume(struct pci_dev *dev) pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16); /* Clean AER Root Error Status */ - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status); pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &mask); if (dev->error_state == pci_channel_io_normal) diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c index 9fd18a0..b1303b3 100644 --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -35,7 +35,7 @@ int pci_enable_pcie_error_reporting(struct pci_dev *dev) if (pcie_aer_get_firmware_first(dev)) return -EIO; - if (!pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR)) + if (!dev->aer_cap) return -EIO; return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS); @@ -57,7 +57,7 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev) int pos; u32 status; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; if (!pos) return -EIO; @@ -78,7 +78,7 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev) if (!pci_is_pcie(dev)) return -ENODEV; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; if (!pos) return -EIO; @@ -97,6 +97,12 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev) return 0; } +int pci_aer_init(struct pci_dev *dev) +{ + dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + return pci_cleanup_aer_error_status_regs(dev); +} + /** * add_error_device - list device to be handled * @e_info: pointer to error info @@ -154,7 +160,7 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info) if (!(reg16 & PCI_EXP_AER_FLAGS)) return false; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; if (!pos) return false; @@ -551,7 +557,7 @@ static void handle_error_source(struct pcie_device *aerdev, * Correctable error does not need software intervention. * No need to go through error recovery process. */ - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; if (pos) pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS, info->status); @@ -643,7 +649,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) info->status = 0; info->tlp_header_valid = 0; - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); + pos = dev->aer_cap; /* The device might not support AER */ if (!pos) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 93f280d..1575724 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1666,7 +1666,8 @@ static void pci_init_capabilities(struct pci_dev *dev) /* Enable ACS P2P upstream forwarding */ pci_enable_acs(dev); - pci_cleanup_aer_error_status_regs(dev); + /* Advanced Error Reporting */ + pci_aer_init(dev); } /* diff --git a/include/linux/pci.h b/include/linux/pci.h index 57bc838..ab6b027 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -269,6 +269,9 @@ struct pci_dev { unsigned int class; /* 3 bytes: (base,sub,prog-if) */ u8 revision; /* PCI revision, low byte of class word */ u8 hdr_type; /* PCI header type (`multi' flag masked out) */ +#ifdef CONFIG_PCIEAER + u16 aer_cap; /* AER capability offset */ +#endif u8 pcie_cap; /* PCIe capability offset */ u8 msi_cap; /* MSI capability offset */ u8 msix_cap; /* MSI-X capability offset */ @@ -1369,9 +1372,11 @@ static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr) #ifdef CONFIG_PCIEAER void pci_no_aer(void); bool pci_aer_available(void); +int pci_aer_init(struct pci_dev *dev); #else static inline void pci_no_aer(void) { } static inline bool pci_aer_available(void) { return false; } +static inline int pci_aer_init(struct pci_dev *d) { return -ENODEV; } #endif #ifdef CONFIG_PCIE_ECRC
On 08/08/2018 07:21 AM, gokul cg wrote: > Thanks Thomas, > > With patch you suggested , panic has gone away from > 'pci_find_next_ext_capability' as we not using inside aer_isr , but now > it hits at pci_bus_read_config_dword. Hmm, that's too bad. You probably are right, the dev->bus->ops->read may be corrupted. I am wondering can you print out the dev->bus->ops->read in normal working condition and compare it with the surprise power off. By the way, I am using following aer-inject tools: https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git Others thoughts is that I did pci_stop_and_remove_bus_device() and then call pci_read_config_dword() I still don't get the protection fault. Do your device driver do something special when it detects the power off? Or, may be the BIOS/UEFI did something to prevent the configure read? Regards, Thomas > > -------------------xxxxxxx bt og xxxxxxxx-----------------" > PID: 24 TASK: ffff880274ac0000 CPU: 0 COMMAND: "kworker/0:1" > #0 [ffff880274abbb18] machine_kexec at ffffffff8102cf18 > #1 [ffff880274abbb78] crash_kexec at ffffffff810a6b05 > #2 [ffff880274abbc40] oops_end at ffffffff8176d960 > #3 [ffff880274abbc68] die at ffffffff810060db > #4 [ffff880274abbc98] do_general_protection at ffffffff8176d452 > #5 [ffff880274abbcc0] general_protection at ffffffff8176cdf2 > [exception RIP: pci_bus_read_config_dword+100] > RIP: ffffffff813405f4 RSP: ffff880274abbd70 RFLAGS: 00010046 > RAX: 455a494c41495449 RBX: ffff880274891800 RCX: 0000000000000004 > RDX: 0000000000000110 RSI: 0000000000000060 RDI: ffff880274891800 > RBP: ffff880274abbd98 R8: ffff880274abbd7c R9: 00000000000011b5 > R10: 0000000000000000 R11: 00000000000011b4 R12: ffff8802741a0210 > R13: 0000000000000246 R14: ffff880272afc008 R15: ffff880272af8800 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #6 [ffff880274abbda0] get_device_error_info at ffffffff81356d74 > #7 [ffff880274abbdd0] aer_isr at ffffffff81357b41 > #8 [ffff880274abbe28] process_one_work at ffffffff8105d4c0 > #9 [ffff880274abbe70] worker_thread at ffffffff8105e251 > #10 [ffff880274abbed0] kthread at ffffffff81064260 > #11 [ffff880274abbf50] ret_from_fork at ffffffff81773a38" > > -------------------xxxxxxx bt og end xxxxxxxx----------------- > > > Regards, > Gokul > > > On Tue, Aug 7, 2018 at 9:00 PM, Thomas Tai <thomas.tai@oracle.com > <mailto:thomas.tai@oracle.com>> wrote: > > Hi Gokul, > Something pop up in my mind and want to share with you. I assume > that your device is not a root port device or a switch device. I > assume when you power off the device, a FATAL error is sent to the > root port thus trigger the aer_isr. > > Since it is a fatal error and your device is not a switch device, > the code should not reach out your device because fatal error means > that the link to your device is not reliable. So the > pci_find_ext_capability() looks strange to me. When compare the code > with the master branch. v3.10 is missing following patch. Would you > think you can give it a try? > > commit 66b808099146166c44157600a166c8372172cd76 > Author: Keith Busch <keith.busch@intel.com > <mailto:keith.busch@intel.com>> > Date: Tue Sep 27 16:23:34 2016 -0400 > > PCI/AER: Cache capability position > > Save the position of the error reporting capability so it > doesn't need to > be rediscovered during error handling. > > Signed-off-by: Keith Busch <keith.busch@intel.com > <mailto:keith.busch@intel.com>> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com > <mailto:bhelgaas@google.com>> > CC: Lukas Wunner <lukas@wunner.de <mailto:lukas@wunner.de>> > > - Thomas > > > On 08/06/2018 02:33 PM, gokul cg wrote: > > Hi, > > I have tried with following patch and I am still getting same > kernel panic. > > -------------X++++++++++++++++++++X--------------------- > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c > b/drivers/pci/pcie/aer/aerdrv_core.c > index 0f4554e..05592aa 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -26,6 +26,7 @@ > #include <linux/slab.h> > #include <linux/kfifo.h> > #include "aerdrv.h" > +#include "../../pci.h" > > static bool forceload; > static bool nosourceid; > @@ -82,7 +82,7 @@ > EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status); > static int add_error_device(struct aer_err_info *e_info, > struct pci_dev *dev) > { > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > -e_info->dev[e_info->error_dev_num] = dev; > +e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); > > e_info->error_dev_num++; > return 0; > } > @@ -659,6 +659,9 @@ static int get_device_error_info(struct > pci_dev *dev, struct aer_err_info *info) > if (!pos) > return 1; > > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, > &info->status); > @@ -710,6 +713,8 @@ static inline void > aer_process_err_devices(struct pcie_device *p_device, > for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { > if (get_device_error_info(e_info->dev[i], e_info)) > handle_error_source(p_device, e_info->dev[i], e_info); > + > + pci_dev_put(e_info->dev[i]); > } > } > -------------X++++++++++++++++++++X--------------------- > > > Note: I have configured CONFIG_HOTPLUG_PCI_PCIE and > CONFIG_HOTPLUG_PCI as modules and loading in start up using script. > > root@/proc/:~# cat config | grep -i HOT > CONFIG_TICK_ONESHOT=y > CONFIG_HOTPLUG=y > # CONFIG_MEMORY_HOTPLUG is not set > CONFIG_HOTPLUG_CPU=y > # CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set > # CONFIG_DEBUG_HOTPLUG_CPU0 is not set > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y > CONFIG_ACPI_HOTPLUG_CPU=y > CONFIG_HOTPLUG_PCI_PCIE=m > CONFIG_HOTPLUG_PCI=m > # CONFIG_HOTPLUG_PCI_CPCI is not set > # CONFIG_HOTPLUG_PCI_SHPC is not set > CONFIG_DM_SNAPSHOT=y > # CONFIG_USB_STORAGE_JUMPSHOT is not set > # CONFIG_TRACER_SNAPSHOT is not set > root@/proc/:~# > > Panic back trace : > crash> bt > PID: 24 TASK: ffff880274ac0000 CPU: 0 COMMAND: "kworker/0:1" > #0 [ffff880274abbac8] machine_kexec at ffffffff8102cf18 > #1 [ffff880274abbb28] crash_kexec at ffffffff810a6b05 > #2 [ffff880274abbbf0] oops_end at ffffffff8176d8a0 > #3 [ffff880274abbc18] die at ffffffff810060db > #4 [ffff880274abbc48] do_general_protection at ffffffff8176d392 > #5 [ffff880274abbc70] general_protection at ffffffff8176cd32 > [exception RIP: pci_bus_read_config_dword+100] > RIP: ffffffff813405f4 RSP: ffff880274abbd20 RFLAGS: 00010046 > RAX: 435f494350006963 RBX: ffff880274891800 RCX: > 0000000000000004 > RDX: 0000000000000ffc RSI: 0000000000000060 RDI: > ffff880274891800 > RBP: ffff880274abbd48 R8: ffff880274abbd2c R9: > 00000000000002b8 > R10: ffff880274340000 R11: 0000000000000246 R12: > ffff880274abbd5c > R13: 0000000000000246 R14: 0000000000000000 R15: > ffff880274920000 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #6 [ffff880274abbd50] pci_find_next_ext_capability at > ffffffff81345db6 > #7 [ffff880274abbd90] pci_find_ext_capability at ffffffff81347225 > #8 [ffff880274abbda0] get_device_error_info at ffffffff81356c4d > #9 [ffff880274abbdd0] aer_isr at ffffffff81357ab0 > #10 [ffff880274abbe28] process_one_work at ffffffff8105d4c0 > #11 [ffff880274abbe70] worker_thread at ffffffff8105e251 > #12 [ffff880274abbed0] kthread at ffffffff81064260 > #13 [ffff880274abbf50] ret_from_fork at ffffffff81773978 > crash> > > > Regards, > Gokul > > On Thu, Aug 2, 2018 at 10:39 PM, Thomas Tai > <thomas.tai@oracle.com <mailto:thomas.tai@oracle.com> > <mailto:thomas.tai@oracle.com <mailto:thomas.tai@oracle.com>>> > wrote: > > > On 08/02/2018 11:07 AM, Lukas Wunner wrote: > > [cc += Thomas Tai] > > > Hi Lukas, > Thank you very much for cc me. > > > On Thu, Aug 02, 2018 at 10:46:57AM +0200, Lukas Wunner > wrote: > > On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg > wrote: > > I am suspecting a possible race condition in > the kernel > between PCI driver > and AER handling. > > > The solution is to acquire a ref on each device in > add_error_device(). > Then release the ref aer_process_err_devices() by > calling > pci_dev_put(). > > > So in case it wasn't clear, the below is what I had in > mind. > Completely untested though. Does this work for you? > > For v3.10 compatibility, cherry-pick 89ee9f768003 (or > alternatively > cherry-pick 8496e85c20e7 and replace > pci_dev_is_disconnected(dev) > with !pci_device_is_present(dev)). > > -- >8 -- > Subject: [PATCH] PCI/AER: Fix use-after-free on > surprise removal > > The work item to consume errors, aer_isr(), walks the > hierarchy > using > pci_walk_bus() and stores a pointer to PCI devices > which reported an > error in an array. As long as pci_walk_bus() runs, those > pointers are > valid because pci_bus_sem is held. But once pci_walk_bus() > finishes, > nothing prevents the pointers from becoming invalid, > e.g. through > unplugging of the PCI devices. The unprotected > pointers are then > dereferenced in aer_process_err_devices(), which may oops: > > > I like your idea to increment the refcount during > pci_walk_bus(), > that should fix the use-after-free issue. We just need Gokul to > confirm if it fixes his issue or not. > > Thanks, > Thomas > > > > #5 general_protection at ffffffff8176cdf2 > [exception RIP: pci_bus_read_config_dword+100] > #6 pci_find_next_ext_capability at ffffffff81345d7b > #7 pci_find_ext_capability at ffffffff81347225 > #8 get_device_error_info at ffffffff81356c4d > #9 aer_isr at ffffffff81357a38 > > Fix by holding a ref on the devices until they have > been processed. > Skip processing of unplugged devices. > > Reported-by: gokul cg <gokuljnpr@gmail.com > <mailto:gokuljnpr@gmail.com> > <mailto:gokuljnpr@gmail.com <mailto:gokuljnpr@gmail.com>>> > Signed-off-by: Lukas Wunner <lukas@wunner.de > <mailto:lukas@wunner.de> > <mailto:lukas@wunner.de <mailto:lukas@wunner.de>>> > > --- > drivers/pci/pcie/aer.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer.c > b/drivers/pci/pcie/aer.c > index a2e8838..937592e 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -657,7 +657,7 @@ void cper_print_aer(struct pci_dev > *dev, int > aer_severity, > static int add_error_device(struct aer_err_info *e_info, > struct pci_dev *dev) > { > if (e_info->error_dev_num < > AER_MAX_MULTI_ERR_DEVICES) { > - e_info->dev[e_info->error_dev_num] = dev; > + e_info->dev[e_info->error_dev_num] = > pci_dev_get(dev); > e_info->error_dev_num++; > return 0; > } > @@ -898,6 +898,9 @@ static int get_device_error_info(struct > pci_dev *dev, struct aer_err_info *info) > if (!pos) > return 0; > + if (pci_dev_is_disconnected(dev)) > + return 0; > + > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + > PCI_ERR_COR_STATUS, > &info->status); > @@ -948,6 +951,7 @@ static inline void > aer_process_err_devices(struct aer_err_info *e_info) > for (i = 0; i < e_info->error_dev_num && > e_info->dev[i]; i++) { > if > (get_device_error_info(e_info->dev[i], e_info)) > > handle_error_source(e_info->dev[i], > e_info); > + pci_dev_put(e_info->dev[i]); > } > } > > >
On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: > When removing PCI devices below a hotplug bridge, pciehp marks them as > disconnected if the card is no longer present in the slot or it quiesces > them if the card is still present (by disabling INTx interrupts, bus > mastering and SERR# reporting). > > To detect whether the card is still present, pciehp checks the Presence > Detect State bit in the Slot Status register. The problem with this > approach is that even if the card is present, the link to it may be > down, and it that case it would be better to mark the devices as > disconnected instead of trying to quiesce them. Moreover, if the card > in the slot was quickly replaced by another one, the Presence Detect > State bit would be set, yet trying to quiesce the new card's devices > would be wrong and the correct thing to do is to mark the previous > card's devices as disconnected. > > Instead of looking at the Presence Detect State bit, it is better to > differentiate whether the card was surprise removed versus safely > removed (via sysfs or an Attention Button press). On surprise removal, > the devices should be marked as disconnected, whereas on safe removal it > is correct to quiesce the devices. > > The knowledge whether a surprise removal or a safe removal is at hand > does exist further up in the call stack: A surprise removal is > initiated by pciehp_handle_presence_or_link_change(), a safe removal by > pciehp_handle_disable_request(). > > Pass that information down to pciehp_unconfigure_device() and use it in > lieu of the Presence Detect State bit. While there, add kernel-doc to > pciehp_unconfigure_device() and pciehp_configure_device(). > > Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: Keith Busch <keith.busch@intel.com> Applied to pci/hotplug for v4.20, thanks! > --- > drivers/pci/hotplug/pciehp.h | 2 +- > drivers/pci/hotplug/pciehp_ctrl.c | 22 +++++++++++++--------- > drivers/pci/hotplug/pciehp_pci.c | 23 ++++++++++++++++++++--- > 3 files changed, 34 insertions(+), 13 deletions(-) > > diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h > index 811cf83f956d..39c9c8815a35 100644 > --- a/drivers/pci/hotplug/pciehp.h > +++ b/drivers/pci/hotplug/pciehp.h > @@ -181,7 +181,7 @@ void pciehp_handle_button_press(struct slot *slot); > void pciehp_handle_disable_request(struct slot *slot); > void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events); > int pciehp_configure_device(struct slot *p_slot); > -void pciehp_unconfigure_device(struct slot *p_slot); > +void pciehp_unconfigure_device(struct slot *p_slot, bool presence); > void pciehp_queue_pushbutton_work(struct work_struct *work); > struct controller *pcie_init(struct pcie_device *dev); > int pcie_init_notification(struct controller *ctrl); > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c > index 6855933ab372..7932e70e9f29 100644 > --- a/drivers/pci/hotplug/pciehp_ctrl.c > +++ b/drivers/pci/hotplug/pciehp_ctrl.c > @@ -26,6 +26,9 @@ > hotplug controller logic > */ > > +#define SAFE_REMOVAL true > +#define SURPRISE_REMOVAL false > + > static void set_slot_off(struct controller *ctrl, struct slot *pslot) > { > /* turn off slot, turn on Amber LED, turn off Green LED if supported*/ > @@ -101,12 +104,13 @@ static int board_added(struct slot *p_slot) > /** > * remove_board - Turns off slot and LEDs > * @p_slot: slot where board is being removed > + * @safe_removal: whether the board is safely removed (versus surprise removed) > */ > -static void remove_board(struct slot *p_slot) > +static void remove_board(struct slot *p_slot, bool safe_removal) > { > struct controller *ctrl = p_slot->ctrl; > > - pciehp_unconfigure_device(p_slot); > + pciehp_unconfigure_device(p_slot, safe_removal); > > if (POWER_CTRL(ctrl)) { > pciehp_power_off_slot(p_slot); > @@ -124,7 +128,7 @@ static void remove_board(struct slot *p_slot) > } > > static int pciehp_enable_slot(struct slot *slot); > -static int pciehp_disable_slot(struct slot *slot); > +static int pciehp_disable_slot(struct slot *slot, bool safe_removal); > > void pciehp_request(struct controller *ctrl, int action) > { > @@ -215,7 +219,7 @@ void pciehp_handle_disable_request(struct slot *slot) > slot->state = POWEROFF_STATE; > mutex_unlock(&slot->lock); > > - ctrl->request_result = pciehp_disable_slot(slot); > + ctrl->request_result = pciehp_disable_slot(slot, SAFE_REMOVAL); > } > > void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events) > @@ -241,7 +245,7 @@ void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events) > if (events & PCI_EXP_SLTSTA_PDC) > ctrl_info(ctrl, "Slot(%s): Card not present\n", > slot_name(slot)); > - pciehp_disable_slot(slot); > + pciehp_disable_slot(slot, SURPRISE_REMOVAL); > break; > default: > mutex_unlock(&slot->lock); > @@ -324,7 +328,7 @@ static int pciehp_enable_slot(struct slot *slot) > return ret; > } > > -static int __pciehp_disable_slot(struct slot *p_slot) > +static int __pciehp_disable_slot(struct slot *p_slot, bool safe_removal) > { > u8 getstatus = 0; > struct controller *ctrl = p_slot->ctrl; > @@ -338,17 +342,17 @@ static int __pciehp_disable_slot(struct slot *p_slot) > } > } > > - remove_board(p_slot); > + remove_board(p_slot, safe_removal); > return 0; > } > > -static int pciehp_disable_slot(struct slot *slot) > +static int pciehp_disable_slot(struct slot *slot, bool safe_removal) > { > struct controller *ctrl = slot->ctrl; > int ret; > > pm_runtime_get_sync(&ctrl->pcie->port->dev); > - ret = __pciehp_disable_slot(slot); > + ret = __pciehp_disable_slot(slot, safe_removal); > pm_runtime_put(&ctrl->pcie->port->dev); > > mutex_lock(&slot->lock); > diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c > index 5c58c22e0c08..18f83e554c73 100644 > --- a/drivers/pci/hotplug/pciehp_pci.c > +++ b/drivers/pci/hotplug/pciehp_pci.c > @@ -20,6 +20,14 @@ > #include "../pci.h" > #include "pciehp.h" > > +/** > + * pciehp_configure_device() - enumerate PCI devices below a hotplug bridge > + * @p_slot: PCIe hotplug slot > + * > + * Enumerate PCI devices below a hotplug bridge and add them to the system. > + * Return 0 on success, %-EEXIST if the devices are already enumerated or > + * %-ENODEV if enumeration failed. > + */ > int pciehp_configure_device(struct slot *p_slot) > { > struct pci_dev *dev; > @@ -62,9 +70,19 @@ int pciehp_configure_device(struct slot *p_slot) > return ret; > } > > -void pciehp_unconfigure_device(struct slot *p_slot) > +/** > + * pciehp_unconfigure_device() - remove PCI devices below a hotplug bridge > + * @p_slot: PCIe hotplug slot > + * @presence: whether the card is still present in the slot; > + * true for safe removal via sysfs or an Attention Button press, > + * false for surprise removal > + * > + * Unbind PCI devices below a hotplug bridge from their drivers and remove > + * them from the system. Safely removed devices are quiesced. Surprise > + * removed devices are marked as such to prevent further accesses. > + */ > +void pciehp_unconfigure_device(struct slot *p_slot, bool presence) > { > - u8 presence = 0; > struct pci_dev *dev, *temp; > struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate; > u16 command; > @@ -72,7 +90,6 @@ void pciehp_unconfigure_device(struct slot *p_slot) > > ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n", > __func__, pci_domain_nr(parent), parent->number); > - pciehp_get_adapter_status(p_slot, &presence); > > pci_lock_rescan_remove(); > > -- > 2.18.0 >
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 811cf83f956d..39c9c8815a35 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -181,7 +181,7 @@ void pciehp_handle_button_press(struct slot *slot); void pciehp_handle_disable_request(struct slot *slot); void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events); int pciehp_configure_device(struct slot *p_slot); -void pciehp_unconfigure_device(struct slot *p_slot); +void pciehp_unconfigure_device(struct slot *p_slot, bool presence); void pciehp_queue_pushbutton_work(struct work_struct *work); struct controller *pcie_init(struct pcie_device *dev); int pcie_init_notification(struct controller *ctrl); diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index 6855933ab372..7932e70e9f29 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -26,6 +26,9 @@ hotplug controller logic */ +#define SAFE_REMOVAL true +#define SURPRISE_REMOVAL false + static void set_slot_off(struct controller *ctrl, struct slot *pslot) { /* turn off slot, turn on Amber LED, turn off Green LED if supported*/ @@ -101,12 +104,13 @@ static int board_added(struct slot *p_slot) /** * remove_board - Turns off slot and LEDs * @p_slot: slot where board is being removed + * @safe_removal: whether the board is safely removed (versus surprise removed) */ -static void remove_board(struct slot *p_slot) +static void remove_board(struct slot *p_slot, bool safe_removal) { struct controller *ctrl = p_slot->ctrl; - pciehp_unconfigure_device(p_slot); + pciehp_unconfigure_device(p_slot, safe_removal); if (POWER_CTRL(ctrl)) { pciehp_power_off_slot(p_slot); @@ -124,7 +128,7 @@ static void remove_board(struct slot *p_slot) } static int pciehp_enable_slot(struct slot *slot); -static int pciehp_disable_slot(struct slot *slot); +static int pciehp_disable_slot(struct slot *slot, bool safe_removal); void pciehp_request(struct controller *ctrl, int action) { @@ -215,7 +219,7 @@ void pciehp_handle_disable_request(struct slot *slot) slot->state = POWEROFF_STATE; mutex_unlock(&slot->lock); - ctrl->request_result = pciehp_disable_slot(slot); + ctrl->request_result = pciehp_disable_slot(slot, SAFE_REMOVAL); } void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events) @@ -241,7 +245,7 @@ void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events) if (events & PCI_EXP_SLTSTA_PDC) ctrl_info(ctrl, "Slot(%s): Card not present\n", slot_name(slot)); - pciehp_disable_slot(slot); + pciehp_disable_slot(slot, SURPRISE_REMOVAL); break; default: mutex_unlock(&slot->lock); @@ -324,7 +328,7 @@ static int pciehp_enable_slot(struct slot *slot) return ret; } -static int __pciehp_disable_slot(struct slot *p_slot) +static int __pciehp_disable_slot(struct slot *p_slot, bool safe_removal) { u8 getstatus = 0; struct controller *ctrl = p_slot->ctrl; @@ -338,17 +342,17 @@ static int __pciehp_disable_slot(struct slot *p_slot) } } - remove_board(p_slot); + remove_board(p_slot, safe_removal); return 0; } -static int pciehp_disable_slot(struct slot *slot) +static int pciehp_disable_slot(struct slot *slot, bool safe_removal) { struct controller *ctrl = slot->ctrl; int ret; pm_runtime_get_sync(&ctrl->pcie->port->dev); - ret = __pciehp_disable_slot(slot); + ret = __pciehp_disable_slot(slot, safe_removal); pm_runtime_put(&ctrl->pcie->port->dev); mutex_lock(&slot->lock); diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c index 5c58c22e0c08..18f83e554c73 100644 --- a/drivers/pci/hotplug/pciehp_pci.c +++ b/drivers/pci/hotplug/pciehp_pci.c @@ -20,6 +20,14 @@ #include "../pci.h" #include "pciehp.h" +/** + * pciehp_configure_device() - enumerate PCI devices below a hotplug bridge + * @p_slot: PCIe hotplug slot + * + * Enumerate PCI devices below a hotplug bridge and add them to the system. + * Return 0 on success, %-EEXIST if the devices are already enumerated or + * %-ENODEV if enumeration failed. + */ int pciehp_configure_device(struct slot *p_slot) { struct pci_dev *dev; @@ -62,9 +70,19 @@ int pciehp_configure_device(struct slot *p_slot) return ret; } -void pciehp_unconfigure_device(struct slot *p_slot) +/** + * pciehp_unconfigure_device() - remove PCI devices below a hotplug bridge + * @p_slot: PCIe hotplug slot + * @presence: whether the card is still present in the slot; + * true for safe removal via sysfs or an Attention Button press, + * false for surprise removal + * + * Unbind PCI devices below a hotplug bridge from their drivers and remove + * them from the system. Safely removed devices are quiesced. Surprise + * removed devices are marked as such to prevent further accesses. + */ +void pciehp_unconfigure_device(struct slot *p_slot, bool presence) { - u8 presence = 0; struct pci_dev *dev, *temp; struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate; u16 command; @@ -72,7 +90,6 @@ void pciehp_unconfigure_device(struct slot *p_slot) ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n", __func__, pci_domain_nr(parent), parent->number); - pciehp_get_adapter_status(p_slot, &presence); pci_lock_rescan_remove();