Message ID | PU1P153MB0169F06340FC519C2802C5CFBF680@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Series | PCI: hv: Add hv_pci_remove_slots() when we unload the driver |
On Thu, 7 Feb 2019 20:36:32 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> When we unload pci-hyperv, the host doesn't send us a PCI_EJECT message.
> In this case we also need to make sure the sysfs pci slot directory
> is removed, otherwise "cat /sys/bus/pci/slots/2/address" will trigger
> "BUG: unable to handle kernel paging request". And, if we unload/reload
> the driver several times, we'll have multiple pci slot directories in
> /sys/bus/pci/slots/ like this:
>
> root@localhost:~# ls -rtl /sys/bus/pci/slots/
> total 0
> drwxr-xr-x 2 root root 0 Feb 7 10:49 2
> drwxr-xr-x 2 root root 0 Feb 7 10:49 2-1
> drwxr-xr-x 2 root root 0 Feb 7 10:51 2-2
>
> The patch adds the missing code, and in hv_eject_device_work() it also
> moves pci_destroy_slot() to an earlier place where we hold the pci lock.
>
> Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> Cc: Stephen Hemminger <sthemmin@microsoft.com>

Acked-by: Stephen Hemminger <sthemmin@microsoft.com>
On Thu, Feb 07, 2019 at 08:36:32PM +0000, Dexuan Cui wrote:
>
> When we unload pci-hyperv, the host doesn't send us a PCI_EJECT message.
> In this case we also need to make sure the sysfs pci slot directory
> is removed, otherwise "cat /sys/bus/pci/slots/2/address" will trigger
> "BUG: unable to handle kernel paging request". And, if we unload/reload
> the driver several times, we'll have multiple pci slot directories in
> /sys/bus/pci/slots/ like this:
>
> root@localhost:~# ls -rtl /sys/bus/pci/slots/
> total 0
> drwxr-xr-x 2 root root 0 Feb 7 10:49 2
> drwxr-xr-x 2 root root 0 Feb 7 10:49 2-1
> drwxr-xr-x 2 root root 0 Feb 7 10:51 2-2
>
> The patch adds the missing code, and in hv_eject_device_work() it also
> moves pci_destroy_slot() to an earlier place where we hold the pci lock.

This patch fixes three bugs:

1) set hpdev->pci_slot to NULL
2) move code destroying the slot inside a locked region in
   hv_eject_device_work()
3) Add missing slots removal code in hv_pci_remove()

We need three patches, not one.

(1) and (2), I am not entirely sure we want them in stable kernels,
since they are potential bugs, waiting for your input.

Lorenzo

> Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 9ba4d12c179c..6b4773727525 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1491,6 +1491,21 @@ static void hv_pci_assign_slots(struct hv_pcibus_device *hbus)
>  	}
>  }
>  
> +/*
> + * Remove entries in sysfs pci slot directory.
> + */
> +static void hv_pci_remove_slots(struct hv_pcibus_device *hbus)
> +{
> +	struct hv_pci_dev *hpdev;
> +
> +	list_for_each_entry(hpdev, &hbus->children, list_entry) {
> +		if (!hpdev->pci_slot)
> +			continue;
> +		pci_destroy_slot(hpdev->pci_slot);
> +		hpdev->pci_slot = NULL;
> +	}
> +}
> +
>  /**
>   * create_root_hv_pci_bus() - Expose a new root PCI bus
>   * @hbus: Root PCI bus, as understood by this driver
> @@ -1887,6 +1902,10 @@ static void hv_eject_device_work(struct work_struct *work)
>  		pci_lock_rescan_remove();
>  		pci_stop_and_remove_bus_device(pdev);
>  		pci_dev_put(pdev);
> +		if (hpdev->pci_slot) {
> +			pci_destroy_slot(hpdev->pci_slot);
> +			hpdev->pci_slot = NULL;
> +		}
>  		pci_unlock_rescan_remove();
>  	}
>  
> @@ -1894,9 +1913,6 @@ static void hv_eject_device_work(struct work_struct *work)
>  	list_del(&hpdev->list_entry);
>  	spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
>  
> -	if (hpdev->pci_slot)
> -		pci_destroy_slot(hpdev->pci_slot);
> -
>  	memset(&ctxt, 0, sizeof(ctxt));
>  	ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
>  	ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
> @@ -2682,6 +2698,7 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
>  		pci_remove_root_bus(hbus->pci_bus);
> +		hv_pci_remove_slots(hbus);
>  		pci_unlock_rescan_remove();
>  		hbus->state = hv_pcibus_removed;
>  	}
> -- 
> 2.19.1
>
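Point (2) above is about ordering: after the patch, hv_eject_device_work() destroys the slot while it still holds the rescan/remove lock, which is the same lock hv_pci_remove() holds around hv_pci_remove_slots(). The following userspace sketch models that idea under stated assumptions: a pthread mutex stands in for pci_lock_rescan_remove()/pci_unlock_rescan_remove(), free() stands in for pci_destroy_slot(), and the structs plus eject_child() and unload_child() are hypothetical illustrations, not the driver's code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's objects. */
struct slot { int number; };

struct child {
    int registered;          /* 1 while the PCI device is on the bus */
    struct slot *pci_slot;   /* NULL once the slot has been destroyed */
};

/* Stand-in for pci_lock_rescan_remove()/pci_unlock_rescan_remove(). */
static pthread_mutex_t rescan_lock = PTHREAD_MUTEX_INITIALIZER;
static int slots_destroyed;  /* counts destroy calls, to expose double frees */

/* Caller must hold rescan_lock: check, destroy, and clear as one step. */
static void destroy_slot_locked(struct child *c)
{
    if (c->pci_slot) {
        free(c->pci_slot);
        slots_destroyed++;
        c->pci_slot = NULL;
    }
}

/* Eject path after the patch: device removal and slot teardown share one
 * critical section, so a path taking the same lock never sees a slot
 * pointer that has already been freed. */
static void eject_child(struct child *c)
{
    pthread_mutex_lock(&rescan_lock);
    c->registered = 0;
    destroy_slot_locked(c);
    pthread_mutex_unlock(&rescan_lock);
}

/* Unload path: the same sequence under the same lock. */
static void unload_child(struct child *c)
{
    pthread_mutex_lock(&rescan_lock);
    destroy_slot_locked(c);
    pthread_mutex_unlock(&rescan_lock);
}
```

Whichever path runs second finds pci_slot already NULL and does nothing, so the slot is destroyed exactly once.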
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: Tuesday, February 12, 2019 4:13 AM
> ...
> This patch fixes three bugs:
>
> 1) set hpdev->pci_slot to NULL
> 2) move code destroying the slot inside a locked region in
> hv_eject_device_work()
> 3) Add missing slots removal code in hv_pci_remove()
>
> We need three patches, not one.
>
> (1) and (2), I am not entirely sure we want them in stable kernels,
> since they are potential bugs, waiting for your input.
>
> Lorenzo

(1) is actually unnecessary, as I suppose hpdev should be freed at a
later place in the same function: hv_eject_device_work() ->
put_pcichild() -> kfree(hpdev). But today I think I found a refcount
bug in the hot-remove case: "kfree(hpdev)" is never called there. I'll
dig further into this and make some extra patches.

About (2), it's a race condition that can happen when the device is
being hot-removed and we're unloading the pci-hyperv driver at the same
time. This is not normal usage, so I agree it doesn't really need to go
into the stables.

(3) should go into the stables.

I'll make 3 separate patches, and extra patches for the refcount issue,
and possible other minor issues.

Thanks,
-- Dexuan
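Dexuan's reply rests on reference counting: hpdev is expected to be freed by the final put (put_pcichild() -> kfree(hpdev)), so any path that takes a reference and never drops it leaks the object. A minimal single-threaded sketch of that get/put pattern follows; child_alloc(), get_child() and put_child() are hypothetical helpers, and real kernel code would use an atomic refcount_t rather than a plain int.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct hv_pci_dev. */
struct child {
    int refs;
};

static int children_freed;  /* incremented each time a child is freed */

static struct child *child_alloc(void)
{
    struct child *c = calloc(1, sizeof(*c));
    if (c)
        c->refs = 1;        /* the creator holds the initial reference */
    return c;
}

static void get_child(struct child *c)
{
    c->refs++;
}

/* Drop one reference; free the object only when the last one goes away. */
static void put_child(struct child *c)
{
    if (--c->refs == 0) {
        free(c);
        children_freed++;
    }
}
```

If some path calls get_child() without a matching put_child(), refs never reaches zero and the free never happens, which is the general shape of the leak described above.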
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 9ba4d12c179c..6b4773727525 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1491,6 +1491,21 @@ static void hv_pci_assign_slots(struct hv_pcibus_device *hbus)
 	}
 }
 
+/*
+ * Remove entries in sysfs pci slot directory.
+ */
+static void hv_pci_remove_slots(struct hv_pcibus_device *hbus)
+{
+	struct hv_pci_dev *hpdev;
+
+	list_for_each_entry(hpdev, &hbus->children, list_entry) {
+		if (!hpdev->pci_slot)
+			continue;
+		pci_destroy_slot(hpdev->pci_slot);
+		hpdev->pci_slot = NULL;
+	}
+}
+
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus: Root PCI bus, as understood by this driver
@@ -1887,6 +1902,10 @@ static void hv_eject_device_work(struct work_struct *work)
 		pci_lock_rescan_remove();
 		pci_stop_and_remove_bus_device(pdev);
 		pci_dev_put(pdev);
+		if (hpdev->pci_slot) {
+			pci_destroy_slot(hpdev->pci_slot);
+			hpdev->pci_slot = NULL;
+		}
 		pci_unlock_rescan_remove();
 	}
 
@@ -1894,9 +1913,6 @@ static void hv_eject_device_work(struct work_struct *work)
 	list_del(&hpdev->list_entry);
 	spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
 
-	if (hpdev->pci_slot)
-		pci_destroy_slot(hpdev->pci_slot);
-
 	memset(&ctxt, 0, sizeof(ctxt));
 	ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
 	ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
@@ -2682,6 +2698,7 @@ static int hv_pci_remove(struct hv_device *hdev)
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
 		pci_remove_root_bus(hbus->pci_bus);
+		hv_pci_remove_slots(hbus);
 		pci_unlock_rescan_remove();
 		hbus->state = hv_pcibus_removed;
 	}
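The new hv_pci_remove_slots() walks every child, skips those without a slot, destroys the rest, and clears the pointer. A userspace model of that loop, assuming a hypothetical singly linked list in place of hbus->children and free() in place of pci_destroy_slot(), shows why clearing the pointer matters: a second pass becomes a harmless no-op instead of a double free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a singly linked list models hbus->children. */
struct slot { int number; };

struct child {
    struct slot *pci_slot;   /* NULL when no slot is registered */
    struct child *next;
};

static int slots_destroyed;  /* counts destroy calls */

/* Stand-in for pci_destroy_slot(). */
static void destroy_slot(struct slot *s)
{
    free(s);
    slots_destroyed++;
}

/* Mirrors the loop in hv_pci_remove_slots(). */
static void remove_slots(struct child *head)
{
    for (struct child *c = head; c != NULL; c = c->next) {
        if (!c->pci_slot)
            continue;            /* nothing registered for this child */
        destroy_slot(c->pci_slot);
        c->pci_slot = NULL;      /* a repeated call now skips this child */
    }
}
```

In the driver the same loop runs with pci_lock_rescan_remove() held, after the root bus has been removed.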
When we unload pci-hyperv, the host doesn't send us a PCI_EJECT message.
In this case we also need to make sure the sysfs pci slot directory
is removed, otherwise "cat /sys/bus/pci/slots/2/address" will trigger
"BUG: unable to handle kernel paging request". And, if we unload/reload
the driver several times, we'll have multiple pci slot directories in
/sys/bus/pci/slots/ like this:

root@localhost:~# ls -rtl /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Feb 7 10:49 2
drwxr-xr-x 2 root root 0 Feb 7 10:49 2-1
drwxr-xr-x 2 root root 0 Feb 7 10:51 2-2

The patch adds the missing code, and in hv_eject_device_work() it also
moves pci_destroy_slot() to an earlier place where we hold the pci lock.

Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
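The "2, 2-1, 2-2" listing is what leaked slots look like when each new slot name is made unique by appending a numeric suffix: the stale "2" from the previous load is never destroyed, so each reload registers the next free variant. This appears to match make_slot_name() in drivers/pci/slot.c, but treat that as an assumption. The sketch below uses a hypothetical fixed-size registry and register_slot() helper in place of the kernel's slot kset.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 16
#define NAME_LEN  32

/* Hypothetical registry standing in for /sys/bus/pci/slots/. */
static char registry[MAX_SLOTS][NAME_LEN];
static int nslots;

static int name_taken(const char *name)
{
    for (int i = 0; i < nslots; i++)
        if (strcmp(registry[i], name) == 0)
            return 1;
    return 0;
}

/* Register a slot name, appending -1, -2, ... while the name is taken.
 * Returns the stored name, or NULL when the registry is full. */
static const char *register_slot(const char *name)
{
    char candidate[NAME_LEN];
    int dup = 1;

    if (nslots == MAX_SLOTS)
        return NULL;
    snprintf(candidate, sizeof(candidate), "%s", name);
    while (name_taken(candidate))
        snprintf(candidate, sizeof(candidate), "%s-%d", name, dup++);
    strcpy(registry[nslots], candidate);
    return registry[nslots++];
}
```

Calling register_slot("2") three times without ever removing an entry, as repeated load/unload cycles did before this patch, produces "2", "2-1" and "2-2".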