Message ID | 20220701012647.2007122-1-saravanak@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Fix console probe delay when stdout-path isn't set | expand |
On Fri, Jul 1, 2022 at 3:28 AM Saravana Kannan <saravanak@google.com> wrote: > > With commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by > default") the probing of TTY consoles could get delayed if they have > optional suppliers that are listed in DT, but those suppliers don't > probe by the time kernel boot finishes. The console devices will probe > eventually after driver_probe_timeout expires. > > However, since consoles are often used for debugging kernel issues, it > does not make sense to delay their probe. So, set the newly added > probe_no_timeout flag for all serial drivers that at DT based. This way, > fw_devlink will know not to delay the probing of the consoles past > kernel boot. Same question, do you think only serial drivers need that?
On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > These patches are on top of driver-core-next. > > Even if stdout-path isn't set in DT, this patch should take console > probe times back to how they were before the deferred_probe_timeout > clean up series[1]. Now dropped from my queue due to lack of a response to other reviewer's questions. thanks, greg k-h
On Tue, Aug 23, 2022 at 4:25 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > These patches are on top of driver-core-next. > > > > Even if stdout-path isn't set in DT, this patch should take console > > probe times back to how they were before the deferred_probe_timeout > > clean up series[1]. > > Now dropped from my queue due to lack of a response to other reviewer's > questions. Sorry, I somehow missed those emails. I'll respond later today/tomorrow. -Saravana
On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > These patches are on top of driver-core-next. > > > > Even if stdout-path isn't set in DT, this patch should take console > > probe times back to how they were before the deferred_probe_timeout > > clean up series[1]. > > Now dropped from my queue due to lack of a response to other reviewer's > questions. What happened to this patch? I have a 10 second timeout on console probe on my SiFive Unmatched, and I don't see this flag being set for the serial driver. In fact, I don't see it anywhere in-tree. I can't seem to locate another patchset from Saravana around this though, so I'm not sure where to look for a missing piece for the sifive serial driver. This is the second boot time regression (this one not fatal, unlike the Layerscape PCIe one) from the fw_devlink patchset. Greg, can you revert the whole set for 6.0, please? It's obviously nowhere near tested enough to go in and I expect we'll see a bunch of -stable fixups due to this if we let it remain in. This seems to be one of the worst releases I've encountered in recent years on my hardware here due to this patchset. :-( -Olof
On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote: > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > > These patches are on top of driver-core-next. > > > > > > Even if stdout-path isn't set in DT, this patch should take console > > > probe times back to how they were before the deferred_probe_timeout > > > clean up series[1]. > > > > Now dropped from my queue due to lack of a response to other reviewer's > > questions. > > What happened to this patch? I have a 10 second timeout on console > probe on my SiFive Unmatched, and I don't see this flag being set for > the serial driver. In fact, I don't see it anywhere in-tree. I can't > seem to locate another patchset from Saravana around this though, so > I'm not sure where to look for a missing piece for the sifive serial > driver. > > This is the second boot time regression (this one not fatal, unlike > the Layerscape PCIe one) from the fw_devlink patchset. > > Greg, can you revert the whole set for 6.0, please? It's obviously > nowhere near tested enough to go in and I expect we'll see a bunch of > -stable fixups due to this if we let it remain in. What exactly is "the whole set"? I have the default option fix queued up and will send that to Linus later this week (am traveling back from Plumbers still), but have not heard any problems about any other issues at all other than your report. thnaks, greg k-h
On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote: > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > > > These patches are on top of driver-core-next. > > > > > > > > Even if stdout-path isn't set in DT, this patch should take console > > > > probe times back to how they were before the deferred_probe_timeout > > > > clean up series[1]. > > > > > > Now dropped from my queue due to lack of a response to other reviewer's > > > questions. > > > > What happened to this patch? I have a 10 second timeout on console > > probe on my SiFive Unmatched, and I don't see this flag being set for > > the serial driver. In fact, I don't see it anywhere in-tree. I can't > > seem to locate another patchset from Saravana around this though, so > > I'm not sure where to look for a missing piece for the sifive serial > > driver. > > > > This is the second boot time regression (this one not fatal, unlike > > the Layerscape PCIe one) from the fw_devlink patchset. > > > > Greg, can you revert the whole set for 6.0, please? It's obviously > > nowhere near tested enough to go in and I expect we'll see a bunch of > > -stable fixups due to this if we let it remain in. > > What exactly is "the whole set"? I have the default option fix queued > up and will send that to Linus later this week (am traveling back from > Plumbers still), but have not heard any problems about any other issues > at all other than your report. I stand corrected in this case, the issue on the Hifive Unmatched was a regression due to a PWM clock change -- I just sent a patch for that (serial driver fix). So it seems like as long as the fw_devlink.strict=1 patch is reverted, things are back to a working state here. I still struggle with how the fw_devlink patchset is expected to work though, since DT is expected to describe the hardware configuration, and it has no knowledge of whether there are drivers that will be bound to any referenced supplier devnodes. It's not going to work well to assume that they will always be bound, and to add 10 second timeouts for those cases isn't a good solution. Seems like the number of special cases will keep adding up. The whole design feels like it's falling short, and it's been patched here and there to deal with the shortcomings, instead of revisiting the full solution. (The patches are the console one, and another to deal with nfsroot boots). As long as it doesn't keep regressing others, I suppose the work to redesign it can happen in-tree, but it's not usually how we try to do it for new functionality. Especially since it's still being iterated on (with active patch sets posted around -rc1 for improvements). Oh, and one more thing for the future -- the main patch that changes behavior due to dependency tracking is 2f8c3ae8288e, named "driver core: Add wait_for_init_devices_probe helper function". It's easy to overlook this when looking at a list of patches since it's said to just introduce a helper. -Olof
On Mon, Sep 19, 2022 at 5:56 PM Olof Johansson <olof@lixom.net> wrote: > > On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote: > > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > > > > These patches are on top of driver-core-next. > > > > > > > > > > Even if stdout-path isn't set in DT, this patch should take console > > > > > probe times back to how they were before the deferred_probe_timeout > > > > > clean up series[1]. > > > > > > > > Now dropped from my queue due to lack of a response to other reviewer's > > > > questions. > > > > > > What happened to this patch? I have a 10 second timeout on console > > > probe on my SiFive Unmatched, and I don't see this flag being set for > > > the serial driver. In fact, I don't see it anywhere in-tree. I can't > > > seem to locate another patchset from Saravana around this though, so > > > I'm not sure where to look for a missing piece for the sifive serial > > > driver. > > > > > > This is the second boot time regression (this one not fatal, unlike > > > the Layerscape PCIe one) from the fw_devlink patchset. > > > > > > Greg, can you revert the whole set for 6.0, please? It's obviously > > > nowhere near tested enough to go in and I expect we'll see a bunch of > > > -stable fixups due to this if we let it remain in. > > > > What exactly is "the whole set"? I have the default option fix queued > > up and will send that to Linus later this week (am traveling back from > > Plumbers still), but have not heard any problems about any other issues > > at all other than your report. > > I stand corrected in this case, the issue on the Hifive Unmatched was > a regression due to a PWM clock change -- I just sent a patch for that > (serial driver fix). > > So it seems like as long as the fw_devlink.strict=1 patch is reverted, > things are back to a working state here. > > I still struggle with how the fw_devlink patchset is expected to work > though, since DT is expected to describe the hardware configuration, > and it has no knowledge of whether there are drivers that will be > bound to any referenced supplier devnodes. It's not going to work well > to assume that they will always be bound, and to add 10 second > timeouts for those cases isn't a good solution. Seems like the number > of special cases will keep adding up. Since the introduction of deferred probe, the kernel has always assumed if there is a device described, then there is or will be a driver for it. The result is you can't use new DTs (if they add providers) with older kernels. We've ended up with a timeout because no one has come up with a better way to handle it. What the kernel needs is userspace saying "I'm done loading modules", but it's debatable whether that's a good solution too. Rob
On Mon, Sep 26, 2022 at 01:25:05PM -0500, Rob Herring wrote: > On Mon, Sep 19, 2022 at 5:56 PM Olof Johansson <olof@lixom.net> wrote: > > > > On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote: > > > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > > > > > These patches are on top of driver-core-next. > > > > > > > > > > > > Even if stdout-path isn't set in DT, this patch should take console > > > > > > probe times back to how they were before the deferred_probe_timeout > > > > > > clean up series[1]. > > > > > > > > > > Now dropped from my queue due to lack of a response to other reviewer's > > > > > questions. > > > > > > > > What happened to this patch? I have a 10 second timeout on console > > > > probe on my SiFive Unmatched, and I don't see this flag being set for > > > > the serial driver. In fact, I don't see it anywhere in-tree. I can't > > > > seem to locate another patchset from Saravana around this though, so > > > > I'm not sure where to look for a missing piece for the sifive serial > > > > driver. > > > > > > > > This is the second boot time regression (this one not fatal, unlike > > > > the Layerscape PCIe one) from the fw_devlink patchset. > > > > > > > > Greg, can you revert the whole set for 6.0, please? It's obviously > > > > nowhere near tested enough to go in and I expect we'll see a bunch of > > > > -stable fixups due to this if we let it remain in. > > > > > > What exactly is "the whole set"? I have the default option fix queued > > > up and will send that to Linus later this week (am traveling back from > > > Plumbers still), but have not heard any problems about any other issues > > > at all other than your report. > > > > I stand corrected in this case, the issue on the Hifive Unmatched was > > a regression due to a PWM clock change -- I just sent a patch for that > > (serial driver fix). > > > > So it seems like as long as the fw_devlink.strict=1 patch is reverted, > > things are back to a working state here. > > > > I still struggle with how the fw_devlink patchset is expected to work > > though, since DT is expected to describe the hardware configuration, > > and it has no knowledge of whether there are drivers that will be > > bound to any referenced supplier devnodes. It's not going to work well > > to assume that they will always be bound, and to add 10 second > > timeouts for those cases isn't a good solution. Seems like the number > > of special cases will keep adding up. > > Since the introduction of deferred probe, the kernel has always > assumed if there is a device described, then there is or will be a > driver for it. The result is you can't use new DTs (if they add > providers) with older kernels. > > We've ended up with a timeout because no one has come up with a better > way to handle it. What the kernel needs is userspace saying "I'm done > loading modules", but it's debatable whether that's a good solution > too. In my opinion the deferred probe is a big hack and that is the root cause of the issues we have here and there. It has to be redesigned to be mathematically robust. It was an attempt by Andrzej Hajda to solve this [1]. [1]: https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Deferred-Problem-Issues-With-Complex-Dependencies-Between-Devices-in-Linux-Kernel-Andrzej-Hajda-Samsung.pdf
On Tue, Sep 27, 2022 at 03:17:59PM +0300, Andy Shevchenko wrote: > On Mon, Sep 26, 2022 at 01:25:05PM -0500, Rob Herring wrote: > > On Mon, Sep 19, 2022 at 5:56 PM Olof Johansson <olof@lixom.net> wrote: > > > > > > On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote: > > > > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote: > > > > > > > These patches are on top of driver-core-next. > > > > > > > > > > > > > > Even if stdout-path isn't set in DT, this patch should take console > > > > > > > probe times back to how they were before the deferred_probe_timeout > > > > > > > clean up series[1]. > > > > > > > > > > > > Now dropped from my queue due to lack of a response to other reviewer's > > > > > > questions. > > > > > > > > > > What happened to this patch? I have a 10 second timeout on console > > > > > probe on my SiFive Unmatched, and I don't see this flag being set for > > > > > the serial driver. In fact, I don't see it anywhere in-tree. I can't > > > > > seem to locate another patchset from Saravana around this though, so > > > > > I'm not sure where to look for a missing piece for the sifive serial > > > > > driver. > > > > > > > > > > This is the second boot time regression (this one not fatal, unlike > > > > > the Layerscape PCIe one) from the fw_devlink patchset. > > > > > > > > > > Greg, can you revert the whole set for 6.0, please? It's obviously > > > > > nowhere near tested enough to go in and I expect we'll see a bunch of > > > > > -stable fixups due to this if we let it remain in. > > > > > > > > What exactly is "the whole set"? I have the default option fix queued > > > > up and will send that to Linus later this week (am traveling back from > > > > Plumbers still), but have not heard any problems about any other issues > > > > at all other than your report. > > > > > > I stand corrected in this case, the issue on the Hifive Unmatched was > > > a regression due to a PWM clock change -- I just sent a patch for that > > > (serial driver fix). > > > > > > So it seems like as long as the fw_devlink.strict=1 patch is reverted, > > > things are back to a working state here. > > > > > > I still struggle with how the fw_devlink patchset is expected to work > > > though, since DT is expected to describe the hardware configuration, > > > and it has no knowledge of whether there are drivers that will be > > > bound to any referenced supplier devnodes. It's not going to work well > > > to assume that they will always be bound, and to add 10 second > > > timeouts for those cases isn't a good solution. Seems like the number > > > of special cases will keep adding up. > > > > Since the introduction of deferred probe, the kernel has always > > assumed if there is a device described, then there is or will be a > > driver for it. The result is you can't use new DTs (if they add > > providers) with older kernels. > > > > We've ended up with a timeout because no one has come up with a better > > way to handle it. What the kernel needs is userspace saying "I'm done > > loading modules", but it's debatable whether that's a good solution > > too. > > In my opinion the deferred probe is a big hack and that is the root > cause of the issues we have here and there. It has to be redesigned > to be mathematically robust. It was an attempt by Andrzej Hajda to > solve this [1]. > > [1]: https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Deferred-Problem-Issues-With-Complex-Dependencies-Between-Devices-in-Linux-Kernel-Andrzej-Hajda-Samsung.pdf deferred probe has _ALWAYS_ been known to be a hack, way back when we accepted it, but it was the best hack we had to solve a real problem that we had, so it was accepted. It's been polished over the years, but yes, it does break down at times, due to the crazy complexity of hardware systems that we have no control over. If you have concrete solutions for how to solve the issue, wonderful, please submit patches :) thanks, greg k-h