Message ID | 1998341d-7f47-f73a-c5c5-4095b7fa9ef1@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 06/16/2017 10:41 AM, Douglas Miller wrote: > On 03/16/2017 01:49 PM, James Bottomley wrote: >> On Wed, 2017-03-15 at 19:39 -0400, Martin K. Petersen wrote: >>> Maurizio Lombardi <mlombard@redhat.com> writes: >>> >>>> With multipath, it may happen that the same device is passed to >>>> enclosure_add_device() multiple times and that the >>>> enclosure_add_links() function fails to create the symlinks because >>>> the device's sysfs directory entry is still NULL. In this case, >>>> the >>>> links will never be created because all the subsequent calls to >>>> enclosure_add_device() will immediately fail with EEXIST. >>> James? >> Well I don't think the patch is the correct way to do this. The >> problem is that if we encounter an error creating the links, we >> shouldn't add the device to the enclosure. There's no need of a >> links_created variable (see below). >> >> However, more interesting is why the link creation failed in the first >> place. The device clearly seems to exist because it was added to sysfs >> at time index 19.2 and the enclosure didn't try to use it until 60.0. >> Can you debug this a bit more, please? I can't see anything specific >> to multipath in the trace, so whatever this is looks like it could >> happen in the single path case as well. >> >> James >> >> diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c >> index 65fed71..ae89082 100644 >> --- a/drivers/misc/enclosure.c >> +++ b/drivers/misc/enclosure.c >> @@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device >> *edev, int component, >> struct device *dev) >> { >> struct enclosure_component *cdev; >> + int err; >> if (!edev || component >= edev->components) >> return -EINVAL; >> @@ -384,12 +385,15 @@ int enclosure_add_device(struct >> enclosure_device *edev, int component, >> if (cdev->dev == dev) >> return -EEXIST; >> - if (cdev->dev) >> + if (cdev->dev) { >> enclosure_remove_links(cdev); >> - >> - put_device(cdev->dev); >> - cdev->dev = get_device(dev); >> - return enclosure_add_links(cdev); >> + put_device(cdev->dev); >> + cdev->dev = NULL; >> + } >> + err = enclosure_add_links(cdev); >> + if (!err) >> + cdev->dev = get_device(dev); >> + return err; >> } >> EXPORT_SYMBOL_GPL(enclosure_add_device); > After stumbling across the NULL pointer panic, I was able to use > Maurizio's second patch below: > > diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c > index 65fed71..6ac07ea 100644 > --- a/drivers/misc/enclosure.c > +++ b/drivers/misc/enclosure.c > @@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device > *edev, int component, > struct device *dev) > { > struct enclosure_component *cdev; > + int err; > > if (!edev || component >= edev->components) > return -EINVAL; > @@ -384,12 +385,17 @@ int enclosure_add_device(struct enclosure_device > *edev, int component, > if (cdev->dev == dev) > return -EEXIST; > > - if (cdev->dev) > + if (cdev->dev) { > enclosure_remove_links(cdev); > - > - put_device(cdev->dev); > + put_device(cdev->dev); > + } > cdev->dev = get_device(dev); > - return enclosure_add_links(cdev); > + err = enclosure_add_links(cdev); > + if (err) { > + cdev->dev = NULL; > + put_device(cdev->dev); > + } > + return err; > } > EXPORT_SYMBOL_GPL(enclosure_add_device); > > > I am able to pass my testing with this patch. I don't see an official > submit of this patch, but will respond to it when I see one. Again, I > am seeing the problem even without multipath. > Just to respond to James' question on the cause. What I observed was a race condition between udevd (ses_init()) and a worker thread (do_scan_async()), where the worker thread is creating the directories that are the target of the symlinks being created by udevd. Something was happening when udevd caught up with the worker thread (so the target directory did not exist) and it seemed the worker thread either got preempted or else just could not stay ahead of udevd. This means that udevd started failing to create symlinks even though the worker thread eventually got them all created. I did observe what appeared to be preemption, as the creation of directories stopped until udevd finished failing all the (rest of the) symlinks. Although there may have been other explanations for what I saw.
Dne 16.6.2017 v 18:08 Douglas Miller napsal(a): > Just to respond to James' question on the cause. What I observed was a race condition between udevd (ses_init()) and a worker thread (do_scan_async()), > where the worker thread is creating the directories that are the target of the symlinks being created by udevd. > Something was happening when udevd caught up with the worker thread (so the target directory did not exist) and it seemed the worker thread either got preempted or > else just could not stay ahead of udevd. This means that udevd started failing to create symlinks even though the worker thread eventually got them all created. > I did observe what appeared to be preemption, as the creation of directories stopped until udevd finished failing all the (rest of the) symlinks. > Although there may have been other explanations for what I saw. > > I am able to pass my testing with this patch. I don't see an official submit of this patch, but will respond to it when I see one. Thanks Douglas for testing it, I will resubmit the patch if no one has any objections. Maurizio.
On 06/20/2017 06:38 AM, Maurizio Lombardi wrote: > > Dne 16.6.2017 v 18:08 Douglas Miller napsal(a): >> Just to respond to James' question on the cause. What I observed was a race condition between udevd (ses_init()) and a worker thread (do_scan_async()), >> where the worker thread is creating the directories that are the target of the symlinks being created by udevd. >> Something was happening when udevd caught up with the worker thread (so the target directory did not exist) and it seemed the worker thread either got preempted or >> else just could not stay ahead of udevd. This means that udevd started failing to create symlinks even though the worker thread eventually got them all created. >> I did observe what appeared to be preemption, as the creation of directories stopped until udevd finished failing all the (rest of the) symlinks. >> Although there may have been other explanations for what I saw. >> >> I am able to pass my testing with this patch. I don't see an official submit of this patch, but will respond to it when I see one. > Thanks Douglas for testing it, I will resubmit the patch if no one has any objections. > > Maurizio. > I did not see any additional comments, and no objections. Is it time to submit the new patch? Thanks, Doug
diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c index 65fed71..6ac07ea 100644 --- a/drivers/misc/enclosure.c +++ b/drivers/misc/enclosure.c @@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, int component, struct device *dev) { struct enclosure_component *cdev; + int err; if (!edev || component >= edev->components)