| Message ID | alpine.LRH.2.02.1805021717430.2705@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Delegated to: | Bjorn Helgaas |
On Thu, May 03, 2018 at 10:55:09AM -0400, Mikulas Patocka wrote:
> I think there is still one more bug:
>
> If nvme_probe is called, it schedules the asynchronous work using
> async_schedule - now suppose that the pci system calls the "remove",
> "shutdown" or "suspend" method - this method will race with
> nvme_async_probe running in the async domain - that will cause
> misbehavior.
>
> Or - does the PCI subsystem flush the async queues before calling these
> methods? I'm not sure, but it doesn't seem so.
>
> I think, you need to save the cookie returned by async_schedule and wait
> for this cookie with async_synchronize_cookie in the other methods.

I think we're fine as-is without syncing the cookie.

The remove path should be fine since we already sync with the necessary
work queues.

The shutdown, suspend and reset paths will just cause the initial reset
work to end early, the same result as what previously would happen.
On Thu, 3 May 2018, Keith Busch wrote:

> On Thu, May 03, 2018 at 10:55:09AM -0400, Mikulas Patocka wrote:
> > I think there is still one more bug:
> >
> > If nvme_probe is called, it schedules the asynchronous work using
> > async_schedule - now suppose that the pci system calls the "remove",
> > "shutdown" or "suspend" method - this method will race with
> > nvme_async_probe running in the async domain - that will cause
> > misbehavior.
> >
> > Or - does the PCI subsystem flush the async queues before calling these
> > methods? I'm not sure, but it doesn't seem so.
> >
> > I think, you need to save the cookie returned by async_schedule and wait
> > for this cookie with async_synchronize_cookie in the other methods.
>
> I think we're fine as-is without syncing the cookie.
>
> The remove path should be fine since we already sync with the necessary
> work queues.
>
> The shutdown, suspend and reset paths will just cause the initial reset
> work to end early, the same result as what previously would happen.

Suppose this:

task 1: nvme_probe
task 1: calls async_schedule(nvme_async_probe), which queues the work for task 2
task 1: exits (so the device is active from the pci subsystem's point of view)
task 3: the pci subsystem calls nvme_remove
task 3: calls nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
task 3: cancel_work_sync(&dev->ctrl.reset_work); (does nothing because the work item hasn't been queued yet)
task 3: nvme_remove does all the remaining work
task 3: frees the device
task 3: exits nvme_remove
task 2: (in the async domain) runs nvme_async_probe
task 2: calls nvme_reset_ctrl_sync
task 2: nvme_reset_ctrl
task 2: calls nvme_change_ctrl_state and queue_work - on a structure that was already freed by nvme_remove

This bug is rare - but it may happen if the user activates and deactivates
the device too quickly by writing to sysfs.

Mikulas
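The step that makes this window possible is that cancel_work_sync() only waits
for a work item that is already queued or running; it does nothing to prevent a
queue_work() issued later from another context. A condensed sketch of that
window, using hypothetical demo_* names rather than the actual NVMe structures
(the two sides are shown as plain functions; in the scenario above they run as
task 3 and task 2):

/*
 * Hypothetical demo_* names, not NVMe code.  Assume INIT_WORK(&d->work,
 * demo_handler) was done when "d" was allocated.
 */
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo {
	struct work_struct work;
	int payload;
};

static void demo_handler(struct work_struct *w)
{
	struct demo *d = container_of(w, struct demo, work);

	pr_info("payload=%d\n", d->payload);	/* use-after-free if d was freed */
}

/* "task 3" in the scenario above: the remove path */
static void demo_remove(struct demo *d)
{
	cancel_work_sync(&d->work);	/* work not yet queued: returns immediately */
	kfree(d);			/* the object is gone */
}

/* "task 2": the async side still holds its pointer to d */
static void demo_async_side(struct demo *d)
{
	schedule_work(&d->work);	/* queues demo_handler() against freed memory */
}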
On Thu, May 03, 2018 at 04:45:22PM -0400, Mikulas Patocka wrote:
> Suppose this:
>
> task 1: nvme_probe
> task 1: calls async_schedule(nvme_async_probe), which queues the work for task 2
> task 1: exits (so the device is active from the pci subsystem's point of view)
> task 3: the pci subsystem calls nvme_remove
> task 3: calls nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
> task 3: cancel_work_sync(&dev->ctrl.reset_work); (does nothing because the work item hasn't been queued yet)
> task 3: nvme_remove does all the remaining work
> task 3: frees the device
> task 3: exits nvme_remove
> task 2: (in the async domain) runs nvme_async_probe
> task 2: calls nvme_reset_ctrl_sync
> task 2: nvme_reset_ctrl
> task 2: calls nvme_change_ctrl_state and queue_work - on a structure that was already freed by nvme_remove
>
> This bug is rare - but it may happen if the user activates and deactivates
> the device too quickly by writing to sysfs.

Okay, I think I see your point. Pairing a nvme_get_ctrl with a
nvme_put_ctrl should fix that.
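A sketch of what that get/put pairing could look like, layered on the async
probe shown in the patch below (illustrative only, not an actual posted patch;
nvme_get_ctrl() and nvme_put_ctrl() are the existing controller refcount
helpers): the probe path takes a reference before handing the device to the
async domain, and the async work drops it when done, so the nvme_dev is only
freed after the last reference is gone even if nvme_remove() runs first.

/*
 * Illustrative only: hold a controller reference across the async probe so
 * that nvme_remove() freeing the nvme_dev cannot race with nvme_async_probe().
 * The structure is released only when the last nvme_put_ctrl() drops the
 * final reference.
 */
static void nvme_async_probe(void *data, async_cookie_t cookie)
{
	struct nvme_dev *dev = data;

	nvme_reset_ctrl_sync(&dev->ctrl);
	flush_work(&dev->ctrl.scan_work);
	nvme_put_ctrl(&dev->ctrl);	/* drop the reference taken in nvme_probe() */
}

static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct nvme_dev *dev;

	/* ... allocation and setup of dev as in the existing nvme_probe() ... */

	nvme_get_ctrl(&dev->ctrl);	/* keep dev alive for the async work */
	async_schedule(nvme_async_probe, dev);

	return 0;
}

If nvme_remove() has already moved the controller to NVME_CTRL_DELETING by the
time the async work runs, nvme_reset_ctrl_sync() simply ends early, matching
the behaviour Keith describes above, but it no longer touches freed memory.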
Index: linux-4.16.6/drivers/nvme/host/pci.c
===================================================================
--- linux-4.16.6.orig/drivers/nvme/host/pci.c	2018-05-03 16:43:01.000000000 +0200
+++ linux-4.16.6/drivers/nvme/host/pci.c	2018-05-03 16:46:35.000000000 +0200
@@ -13,6 +13,7 @@
  */
 
 #include <linux/aer.h>
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-pci.h>
@@ -111,6 +112,8 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+
+	async_cookie_t probe_cookie;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2471,6 +2474,13 @@ static unsigned long check_vendor_combin
 	return 0;
 }
 
+static void nvme_async_probe(void *data, async_cookie_t cookie)
+{
+	struct nvme_dev *dev = data;
+	nvme_reset_ctrl_sync(&dev->ctrl);
+	flush_work(&dev->ctrl.scan_work);
+}
+
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	int node, result = -ENOMEM;
@@ -2515,7 +2525,7 @@ static int nvme_probe(struct pci_dev *pd
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
-	nvme_reset_ctrl(&dev->ctrl);
+	dev->probe_cookie = async_schedule(nvme_async_probe, dev);
 
 	return 0;
 
@@ -2546,6 +2556,9 @@ static void nvme_reset_done(struct pci_d
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	async_synchronize_cookie(dev->probe_cookie);
+
 	nvme_dev_disable(dev, true);
 }
 
@@ -2558,6 +2571,8 @@ static void nvme_remove(struct pci_dev *
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
+	async_synchronize_cookie(dev->probe_cookie);
+
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 
 	cancel_work_sync(&dev->ctrl.reset_work);
@@ -2605,6 +2620,8 @@ static int nvme_suspend(struct device *d
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+	async_synchronize_cookie(ndev->probe_cookie);
+
 	nvme_dev_disable(ndev, true);
 	return 0;
 }
@@ -2614,6 +2631,8 @@ static int nvme_resume(struct device *de
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+	async_synchronize_cookie(ndev->probe_cookie);
+
 	nvme_reset_ctrl(&ndev->ctrl);
 	return 0;
 }