Message ID | 8816662.k3KXbdkA2e@aspire.rjw.lan (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, Jul 6, 2018 at 5:01 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The devices_kset_move_last() call in really_probe() is a mistake > as it may cause parents to follow children in the devices_kset list > which then causes system shutdown to fail. Namely, if a device has > children before really_probe() is called for it (which is not > uncommon), that call will cause it to be reordered after the children > in the devices_kset list and the ordering of that list will not > reflect the correct device shutdown order. > > Also it causes the devices_kset list to be constantly reordered > until all drivers have been probed which is totally pointless > overhead in the majority of cases. > > For that reason, revert the really_probe() modifications made by > commit 52cdbdd49853. I'm sure you've considered this, but I can't figure out whether this patch will reintroduce the problem that was solved by 52cdbdd49853. That patch updated two places: (1) really_probe(), the change you're reverting here, and (2) device_move(). device_move() is only called from 4-5 places, none of which look related to the problem fixed by 52cdbdd49853, so it seems like that problem was probably resolved by the hunk you're reverting. > Fixes: 52cdbdd49853 (driver core: correct device's shutdown order) > Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/ > Reported-by: Pingfan Liu <kernelfans@gmail.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/base/dd.c | 8 -------- > 1 file changed, 8 deletions(-) > > Index: linux-pm/drivers/base/dd.c > =================================================================== > --- linux-pm.orig/drivers/base/dd.c > +++ linux-pm/drivers/base/dd.c > @@ -434,14 +434,6 @@ re_probe: > goto probe_failed; > } > > - /* > - * Ensure devices are listed in devices_kset in correct order > - * It's important to move Dev to the end of devices_kset before > - * calling .probe, because it could be recursive and parent Dev > - * should always go first > - */ > - devices_kset_move_last(dev); > - > if (dev->bus->probe) { > ret = dev->bus->probe(dev); > if (ret) >
On Mon, Jul 9, 2018 at 3:57 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: > On Fri, Jul 6, 2018 at 5:01 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> The devices_kset_move_last() call in really_probe() is a mistake >> as it may cause parents to follow children in the devices_kset list >> which then causes system shutdown to fail. Namely, if a device has >> children before really_probe() is called for it (which is not >> uncommon), that call will cause it to be reordered after the children >> in the devices_kset list and the ordering of that list will not >> reflect the correct device shutdown order. >> >> Also it causes the devices_kset list to be constantly reordered >> until all drivers have been probed which is totally pointless >> overhead in the majority of cases. >> >> For that reason, revert the really_probe() modifications made by >> commit 52cdbdd49853. > > I'm sure you've considered this, but I can't figure out whether this > patch will reintroduce the problem that was solved by 52cdbdd49853. > That patch updated two places: (1) really_probe(), the change you're > reverting here, and (2) device_move(). > > device_move() is only called from 4-5 places, none of which look > related to the problem fixed by 52cdbdd49853, so it seems like that > problem was probably resolved by the hunk you're reverting. That's right, but I don't want to revert all of it. The other parts of it are kind of useful as they make the handling of the devices_kset list be consistent with the handling of dpm_list. The hunk I'm reverting, however, is completely off. It not only is incorrect (as per the above), but it also causes the devices_kset list and dpm_list to be handled differently. It had attempted to fix something, but it failed miserably at that. >> Fixes: 52cdbdd49853 (driver core: correct device's shutdown order) >> Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/ >> Reported-by: Pingfan Liu <kernelfans@gmail.com> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> --- >> drivers/base/dd.c | 8 -------- >> 1 file changed, 8 deletions(-) >> >> Index: linux-pm/drivers/base/dd.c >> =================================================================== >> --- linux-pm.orig/drivers/base/dd.c >> +++ linux-pm/drivers/base/dd.c >> @@ -434,14 +434,6 @@ re_probe: >> goto probe_failed; >> } >> >> - /* >> - * Ensure devices are listed in devices_kset in correct order >> - * It's important to move Dev to the end of devices_kset before >> - * calling .probe, because it could be recursive and parent Dev >> - * should always go first >> - */ >> - devices_kset_move_last(dev); >> - >> if (dev->bus->probe) { >> ret = dev->bus->probe(dev); >> if (ret) >>
[+cc Kishon] On Mon, Jul 9, 2018 at 4:35 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Mon, Jul 9, 2018 at 3:57 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: > > On Fri, Jul 6, 2018 at 5:01 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > >> > >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> > >> The devices_kset_move_last() call in really_probe() is a mistake > >> as it may cause parents to follow children in the devices_kset list > >> which then causes system shutdown to fail. Namely, if a device has > >> children before really_probe() is called for it (which is not > >> uncommon), that call will cause it to be reordered after the children > >> in the devices_kset list and the ordering of that list will not > >> reflect the correct device shutdown order. > >> > >> Also it causes the devices_kset list to be constantly reordered > >> until all drivers have been probed which is totally pointless > >> overhead in the majority of cases. > >> > >> For that reason, revert the really_probe() modifications made by > >> commit 52cdbdd49853. > > > > I'm sure you've considered this, but I can't figure out whether this > > patch will reintroduce the problem that was solved by 52cdbdd49853. > > That patch updated two places: (1) really_probe(), the change you're > > reverting here, and (2) device_move(). > > > > device_move() is only called from 4-5 places, none of which look > > related to the problem fixed by 52cdbdd49853, so it seems like that > > problem was probably resolved by the hunk you're reverting. > > That's right, but I don't want to revert all of it. The other parts > of it are kind of useful as they make the handling of the devices_kset > list be consistent with the handling of dpm_list. > > The hunk I'm reverting, however, is completely off. It not only is > incorrect (as per the above), but it also causes the devices_kset list > and dpm_list to be handled differently. If I understand correctly, you are saying: - the 52cdbdd49853 really_probe() hunk fixed a problem, but - that hunk was the wrong fix for it, and - this patch removes the wrong fix (and probably reintroduces the problem) If devices_kset is supposed to be ordered so children follow parents, I agree the really_probe() hunk doesn't make much sense because the parent/child relation is determined by the circuit design, not by the probe order. It just seems like it's worth being clear that we're reintroducing the problem fixed by 52cdbdd49853, so it needs to be solved a different way. Ideally that would be done before this patch so there's not a regression, and this changelog could mention what's happening. > It had attempted to fix something, but it failed miserably at that. If you're saying that 52cdbdd49853 *tried* to fix a DRA7XX_evm reboot problem, but in fact, it did not fix that problem, then I guess there should be no issue with reverting that hunk. > >> Fixes: 52cdbdd49853 (driver core: correct device's shutdown order) > >> Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/ > >> Reported-by: Pingfan Liu <kernelfans@gmail.com> > >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> --- > >> drivers/base/dd.c | 8 -------- > >> 1 file changed, 8 deletions(-) > >> > >> Index: linux-pm/drivers/base/dd.c > >> =================================================================== > >> --- linux-pm.orig/drivers/base/dd.c > >> +++ linux-pm/drivers/base/dd.c > >> @@ -434,14 +434,6 @@ re_probe: > >> goto probe_failed; > >> } > >> > >> - /* > >> - * Ensure devices are listed in devices_kset in correct order > >> - * It's important to move Dev to the end of devices_kset before > >> - * calling .probe, because it could be recursive and parent Dev > >> - * should always go first > >> - */ > >> - devices_kset_move_last(dev); > >> - > >> if (dev->bus->probe) { > >> ret = dev->bus->probe(dev); > >> if (ret) > >>
On Fri, Jul 6, 2018 at 6:01 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The devices_kset_move_last() call in really_probe() is a mistake > as it may cause parents to follow children in the devices_kset list > which then causes system shutdown to fail. Namely, if a device has > children before really_probe() is called for it (which is not > uncommon), that call will cause it to be reordered after the children > in the devices_kset list and the ordering of that list will not > reflect the correct device shutdown order. > > Also it causes the devices_kset list to be constantly reordered > until all drivers have been probed which is totally pointless > overhead in the majority of cases. > > For that reason, revert the really_probe() modifications made by > commit 52cdbdd49853. > > Fixes: 52cdbdd49853 (driver core: correct device's shutdown order) > Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/ > Reported-by: Pingfan Liu <kernelfans@gmail.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/base/dd.c | 8 -------- > 1 file changed, 8 deletions(-) > > Index: linux-pm/drivers/base/dd.c > =================================================================== > --- linux-pm.orig/drivers/base/dd.c > +++ linux-pm/drivers/base/dd.c > @@ -434,14 +434,6 @@ re_probe: > goto probe_failed; > } > > - /* > - * Ensure devices are listed in devices_kset in correct order > - * It's important to move Dev to the end of devices_kset before > - * calling .probe, because it could be recursive and parent Dev > - * should always go first > - */ > - devices_kset_move_last(dev); > - > if (dev->bus->probe) { > ret = dev->bus->probe(dev); > if (ret) > Tested-by: Pingfan Liu <kernelfans@gmail.com>
On Tue, Jul 10, 2018 at 12:06 AM, Bjorn Helgaas <bhelgaas@google.com> wrote: > [+cc Kishon] > > On Mon, Jul 9, 2018 at 4:35 PM Rafael J. Wysocki <rafael@kernel.org> wrote: >> >> On Mon, Jul 9, 2018 at 3:57 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> > On Fri, Jul 6, 2018 at 5:01 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> >> >> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> >> >> The devices_kset_move_last() call in really_probe() is a mistake >> >> as it may cause parents to follow children in the devices_kset list >> >> which then causes system shutdown to fail. Namely, if a device has >> >> children before really_probe() is called for it (which is not >> >> uncommon), that call will cause it to be reordered after the children >> >> in the devices_kset list and the ordering of that list will not >> >> reflect the correct device shutdown order. >> >> >> >> Also it causes the devices_kset list to be constantly reordered >> >> until all drivers have been probed which is totally pointless >> >> overhead in the majority of cases. >> >> >> >> For that reason, revert the really_probe() modifications made by >> >> commit 52cdbdd49853. >> > >> > I'm sure you've considered this, but I can't figure out whether this >> > patch will reintroduce the problem that was solved by 52cdbdd49853. >> > That patch updated two places: (1) really_probe(), the change you're >> > reverting here, and (2) device_move(). >> > >> > device_move() is only called from 4-5 places, none of which look >> > related to the problem fixed by 52cdbdd49853, so it seems like that >> > problem was probably resolved by the hunk you're reverting. >> >> That's right, but I don't want to revert all of it. The other parts >> of it are kind of useful as they make the handling of the devices_kset >> list be consistent with the handling of dpm_list. >> >> The hunk I'm reverting, however, is completely off. It not only is >> incorrect (as per the above), but it also causes the devices_kset list >> and dpm_list to be handled differently. > > If I understand correctly, you are saying: > > - the 52cdbdd49853 really_probe() hunk fixed a problem, but It papered over a shutdown failure. Calling it a "fix" is an overstatement IMO. > - that hunk was the wrong fix for it, and > - this patch removes the wrong fix (and probably reintroduces the problem) > > If devices_kset is supposed to be ordered so children follow parents, > I agree the really_probe() hunk doesn't make much sense because the > parent/child relation is determined by the circuit design, not by the > probe order. Exactly. > It just seems like it's worth being clear that we're reintroducing the > problem fixed by 52cdbdd49853, so it needs to be solved a different > way. OK > Ideally that would be done before this patch so there's not a > regression, and this changelog could mention what's happening. Well, commit 52cdbdd49853 introduced a regression by itself, but that regression has only been reported recently. I don't really want to go into a discussion on which of the two regressions is more painful, but then IMO going back to the state from before commit 52cdbdd49853 is fair enough. Hence the patch. >> It had attempted to fix something, but it failed miserably at that. > > If you're saying that 52cdbdd49853 *tried* to fix a DRA7XX_evm reboot > problem, but in fact, it did not fix that problem, then I guess there > should be no issue with reverting that hunk. Again, it hid the reboot problem by changing the core in a way that led to a shutdown regression elsewhere. Also it looks like the platform(s) having that reboot issue do(es)n't really do system-wide suspend/resume, because that "fix" obviously doesn't help there. >> >> Fixes: 52cdbdd49853 (driver core: correct device's shutdown order) >> >> Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/ >> >> Reported-by: Pingfan Liu <kernelfans@gmail.com> >> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> --- >> >> drivers/base/dd.c | 8 -------- >> >> 1 file changed, 8 deletions(-) >> >> >> >> Index: linux-pm/drivers/base/dd.c >> >> =================================================================== >> >> --- linux-pm.orig/drivers/base/dd.c >> >> +++ linux-pm/drivers/base/dd.c >> >> @@ -434,14 +434,6 @@ re_probe: >> >> goto probe_failed; >> >> } >> >> >> >> - /* >> >> - * Ensure devices are listed in devices_kset in correct order >> >> - * It's important to move Dev to the end of devices_kset before >> >> - * calling .probe, because it could be recursive and parent Dev >> >> - * should always go first >> >> - */ >> >> - devices_kset_move_last(dev); >> >> - >> >> if (dev->bus->probe) { >> >> ret = dev->bus->probe(dev); >> >> if (ret) >> >>
Index: linux-pm/drivers/base/dd.c =================================================================== --- linux-pm.orig/drivers/base/dd.c +++ linux-pm/drivers/base/dd.c @@ -434,14 +434,6 @@ re_probe: goto probe_failed; } - /* - * Ensure devices are listed in devices_kset in correct order - * It's important to move Dev to the end of devices_kset before - * calling .probe, because it could be recursive and parent Dev - * should always go first - */ - devices_kset_move_last(dev); - if (dev->bus->probe) { ret = dev->bus->probe(dev); if (ret)