[v1,1/2] driver core: fw_devlink: Allow firmware to mark devices as best effort

Message ID 20220622215912.550419-2-saravanak@google.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series Fix console probe delay due to fw_devlink

Checks

Context               | Check   | Description
netdev/tree_selection | success | Not a local patch

Commit Message

Saravana Kannan June 22, 2022, 9:59 p.m. UTC
When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
fw_devlink will do a best effort ordering for that device where it'll
only enforce the probe/suspend/resume ordering of that device with
suppliers that have drivers. The driver of that device can then decide
if it wants to defer probe or probe without the suppliers.

This will be useful for avoiding probe delays of the console device that
were caused by commit 71066545b48e ("driver core: Set
fw_devlink.strict=1 by default").

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer <sha@pengutronix.de>
Reported-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 3 ++-
 include/linux/fwnode.h | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)
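
The rule in the commit message can be illustrated with a small userspace model. This is only a sketch, not the kernel implementation: `struct mock_supplier`, `struct mock_consumer` and `must_wait_for()` are hypothetical names invented here, and the `probed` field is an assumed simplification of the real device-link state. Only `FWNODE_FLAG_BEST_EFFORT` and its bit position come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1U << (n))
#define FWNODE_FLAG_BEST_EFFORT BIT(4) /* bit position taken from the patch */

/* Hypothetical stand-ins for kernel objects; not the real structures. */
struct mock_supplier {
	bool has_driver; /* some driver exists that can bind to the supplier */
	bool probed;     /* the supplier has already finished probing */
};

struct mock_consumer {
	unsigned int fwnode_flags; /* flags that firmware code set on the fwnode */
};

/*
 * Returns true when the consumer's probe should be held back because of
 * this supplier. A best-effort consumer ignores suppliers without drivers
 * and leaves it to its own driver to defer or to probe without them.
 */
bool must_wait_for(const struct mock_consumer *c,
		   const struct mock_supplier *s)
{
	assert(c != NULL && s != NULL);

	if (s->probed)
		return false; /* supplier is ready, nothing to wait for */
	if ((c->fwnode_flags & FWNODE_FLAG_BEST_EFFORT) && !s->has_driver)
		return false; /* best effort: skip driverless suppliers */
	return true;          /* ordering is enforced */
}
```

In this model a console device whose fwnode carries the flag is no longer held back by a supplier that has no driver, while a supplier that has a driver but has not probed yet still delays it.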

Comments

Sascha Hauer June 23, 2022, 6:50 a.m. UTC | #1
On Wed, Jun 22, 2022 at 02:59:10PM -0700, Saravana Kannan wrote:
> When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
> fw_devlink will do a best effort ordering for that device where it'll
> only enforce the probe/suspend/resume ordering of that device with
> suppliers that have drivers. The driver of that device can then decide
> if it wants to defer probe or probe without the suppliers.
> 
> This will be useful for avoiding probe delays of the console device that
> were caused by commit 71066545b48e ("driver core: Set
> fw_devlink.strict=1 by default").
> 
> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> Reported-by: Sascha Hauer <sha@pengutronix.de>
> Reported-by: Peng Fan <peng.fan@nxp.com>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c    | 3 ++-
>  include/linux/fwnode.h | 4 ++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 839f64485a55..61edd18b7bf3 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -968,7 +968,8 @@ static void device_links_missing_supplier(struct device *dev)
>  
>  static bool dev_is_best_effort(struct device *dev)
>  {
> -	return fw_devlink_best_effort && dev->can_match;
> +	return (fw_devlink_best_effort && dev->can_match) ||
> +		dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT;

Check dev->fwnode first. I am running into a NULL pointer dereference
here for a device that doesn't have a fwnode.

Sascha
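
One way to add the requested check is sketched below as a standalone userspace mock. It is not necessarily the code that went into v2. The reduced `struct device` and `struct fwnode_handle` contain only the fields the function reads, and the `fw_devlink_best_effort` variable stands in for the global of the same name in drivers/base/core.c; all of these are simplifications assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1U << (n))
#define FWNODE_FLAG_BEST_EFFORT BIT(4) /* bit position taken from the patch */

/* Reduced mock types: only the fields the check reads. */
struct fwnode_handle {
	unsigned char flags;
};

struct device {
	struct fwnode_handle *fwnode; /* NULL when the device has no fwnode */
	bool can_match;
};

/* Stand-in for the global of the same name in drivers/base/core.c. */
bool fw_devlink_best_effort;

bool dev_is_best_effort(const struct device *dev)
{
	assert(dev != NULL);

	/* Test dev->fwnode before dereferencing it to read the flags. */
	return (fw_devlink_best_effort && dev->can_match) ||
		(dev->fwnode &&
		 (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT));
}
```

With the guard in place, a device without a fwnode falls back to the global best-effort setting instead of faulting.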
Saravana Kannan June 23, 2022, 8:04 a.m. UTC | #2
On Wed, Jun 22, 2022 at 11:50 PM Sascha Hauer <sha@pengutronix.de> wrote:
>
> On Wed, Jun 22, 2022 at 02:59:10PM -0700, Saravana Kannan wrote:
> > When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
> > fw_devlink will do a best effort ordering for that device where it'll
> > only enforce the probe/suspend/resume ordering of that device with
> > suppliers that have drivers. The driver of that device can then decide
> > if it wants to defer probe or probe without the suppliers.
> >
> > This will be useful for avoiding probe delays of the console device that
> > were caused by commit 71066545b48e ("driver core: Set
> > fw_devlink.strict=1 by default").
> >
> > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > Reported-by: Sascha Hauer <sha@pengutronix.de>
> > Reported-by: Peng Fan <peng.fan@nxp.com>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/core.c    | 3 ++-
> >  include/linux/fwnode.h | 4 ++++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 839f64485a55..61edd18b7bf3 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -968,7 +968,8 @@ static void device_links_missing_supplier(struct device *dev)
> >
> >  static bool dev_is_best_effort(struct device *dev)
> >  {
> > -     return fw_devlink_best_effort && dev->can_match;
> > +     return (fw_devlink_best_effort && dev->can_match) ||
> > +             dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT;
>
> Check dev->fwnode first. I am running into a NULL pointer dereference
> here for a device that doesn't have a fwnode.

Oops. Fixed and sent out a v2.

-Saravana
kernel test robot Sept. 9, 2022, 8:36 a.m. UTC | #3
Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 8c69343389e04f826b4f975dff69aaec083bb186 ("[PATCH v1 1/2] driver core: fw_devlink: Allow firmware to mark devices as best effort")
url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/Fix-console-probe-delay-due-to-fw_devlink/20220623-060244
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git a52ed4866d2b90dd5e4ae9dabd453f3ed8fa3cbc
patch link: https://lore.kernel.org/lkml/20220622215912.550419-2-saravanak@google.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+------------------------------------------+------------+------------+
|                                          | a52ed4866d | 8c69343389 |
+------------------------------------------+------------+------------+
| boot_successes                           | 20         | 0          |
| boot_failures                            | 0          | 20         |
| canonical_address#:#[##]                 | 0          | 20         |
| RIP:dev_is_best_effort                   | 0          | 20         |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 20         |
+------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tags
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/r/202209091632.cb6dfc9f-oliver.sang@intel.com


[   37.669495][    T8] RIP: 0010:dev_is_best_effort+0x4a/0xa7
[   37.669506][    T8] Code: 48 c1 e0 2a 80 3c 02 00 74 05 e8 21 64 40 ff 48 8b 9b c0 03 00 00 48 8d 7b 38 48 89 fa 48 c1 ea 03 b8 ff ff 37 00 48 c1 e0 2a <8a> 04 02 84 c0 74 07 7f 05 e8 c8 63 40 ff 8a 43 38 c0 e8 04 83 e0
[   37.669512][    T8] RSP: 0000:ffffc9000008fac8 EFLAGS: 00010286
[   37.669519][    T8] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   37.669525][    T8] RDX: 0000000000000007 RSI: ffffffff8893f5a0 RDI: 0000000000000038
[   37.669530][    T8] RBP: ffff888184889000 R08: 0000000000000000 R09: ffffffff8ab7de3f
[   37.669535][    T8] R10: ffffc9000008fad8 R11: ffffffff812d9753 R12: ffff88819af00000
[   37.669540][    T8] R13: ffff8881848890f0 R14: ffff88810cb3c0c8 R15: ffff888184889068
[   37.669546][    T8] FS:  0000000000000000(0000) GS:ffff8883ae500000(0000) knlGS:0000000000000000
[   37.669551][    T8] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.669556][    T8] CR2: 0000000000000000 CR3: 0000000008416000 CR4: 00000000000406a0
[   37.669564][    T8] Call Trace:
[   37.669569][    T8]  <TASK>
[   37.669573][    T8]  device_links_driver_bound+0x375/0x4e9
[   37.669582][    T8]  ? device_links_force_bind+0x152/0x152
[   37.669588][    T8]  driver_bound+0xd3/0x15e
[   37.669596][    T8]  really_probe+0x4d4/0x5c9
[   37.669603][    T8]  ? driver_allows_async_probing+0xf1/0xf1
[   37.669610][    T8]  __driver_probe_device+0x19f/0x1e9
[   37.669617][    T8]  driver_probe_device+0x44/0xbb
[   37.669624][    T8]  __device_attach_driver+0x100/0x149
[   37.669632][    T8]  bus_for_each_drv+0x136/0x15e
[   37.669639][    T8]  ? bus_rescan_devices+0x10/0x10
[   37.669646][    T8]  __device_attach+0x19b/0x239
[   37.669653][    T8]  ? device_driver_attach+0x96/0x96
[   37.669660][    T8]  bus_probe_device+0x9e/0x1d4
[   37.669667][    T8]  deferred_probe_work_func+0xbb/0xe6
[   37.669675][    T8]  process_one_work+0x5fa/0x9a3
[   37.669686][    T8]  ? max_active_store+0xba/0xba
[   37.669693][    T8]  ? rcu_read_unlock+0x54/0x54
[   37.669703][    T8]  ? list_add_tail+0x40/0xd7
[   37.669711][    T8]  process_scheduled_works+0x46/0x4d
[   37.669718][    T8]  worker_thread+0x47c/0x57b
[   37.669726][    T8]  ? rescuer_thread+0x561/0x561
[   37.669733][    T8]  kthread+0x237/0x246
[   37.669741][    T8]  ? kthread_complete_and_exit+0x1b/0x1b
[   37.669749][    T8]  ret_from_fork+0x22/0x30
[   37.669758][    T8]  </TASK>
[   37.669761][    T8] Modules linked in:
[   37.669769][    T8] ---[ end trace 0000000000000000 ]---
[   37.669773][    T8] RIP: 0010:dev_is_best_effort+0x4a/0xa7
[   37.669779][    T8] Code: 48 c1 e0 2a 80 3c 02 00 74 05 e8 21 64 40 ff 48 8b 9b c0 03 00 00 48 8d 7b 38 48 89 fa 48 c1 ea 03 b8 ff ff 37 00 48 c1 e0 2a <8a> 04 02 84 c0 74 07 7f 05 e8 c8 63 40 ff 8a 43 38 c0 e8 04 83 e0
[   37.669784][    T8] RSP: 0000:ffffc9000008fac8 EFLAGS: 00010286
[   37.669789][    T8] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   37.669793][    T8] RDX: 0000000000000007 RSI: ffffffff8893f5a0 RDI: 0000000000000038
[   37.669797][    T8] RBP: ffff888184889000 R08: 0000000000000000 R09: ffffffff8ab7de3f
[   37.669801][    T8] R10: ffffc9000008fad8 R11: ffffffff812d9753 R12: ffff88819af00000
[   37.669805][    T8] R13: ffff8881848890f0 R14: ffff88810cb3c0c8 R15: ffff888184889068
[   37.669810][    T8] FS:  0000000000000000(0000) GS:ffff8883ae500000(0000) knlGS:0000000000000000
[   37.669814][    T8] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.669818][    T8] CR2: 0000000000000000 CR3: 0000000008416000 CR4: 00000000000406a0
[   37.669823][    T8] Kernel panic - not syncing: Fatal exception
[   37.670010][    T8] Kernel Offset: disabled
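
The register dump above is consistent with a NULL dev->fwnode: RBX is 0 and RDI is 0x38, which would be the address of the `flags` member if the fwnode pointer is NULL. The sketch below recomputes that offset in userspace. The layout of `struct mock_fwnode_handle` is an assumption, chosen to mirror what `struct fwnode_handle` appears to look like in this tree; it is not copied from the kernel headers, and `flags_offset()` and `flags_address()` are hypothetical helpers. The 0x38 result holds only where pointers are 8 bytes wide.

```c
#include <assert.h> /* only needed for checks like the one in the note below */
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Assumed layout; the real definition lives in include/linux/fwnode.h. */
struct mock_fwnode_handle {
	void *secondary;
	const void *ops;
	void *dev;
	struct list_head suppliers;
	struct list_head consumers;
	uint8_t flags;
};

/* Byte offset of the flags member inside the assumed structure. */
size_t flags_offset(void)
{
	return offsetof(struct mock_fwnode_handle, flags);
}

/* Address the CPU would try to read for a given fwnode pointer value. */
uintptr_t flags_address(uintptr_t fwnode)
{
	return fwnode + flags_offset();
}
```

On an LP64 build, `flags_address(0)` evaluates to 0x38, matching RDI in the trace. RDX holds 0x38 >> 3 = 7 and RAX holds dffffc0000000000, so the faulting byte load looks like a KASAN shadow-memory lookup for that address, which would explain why the report lists a canonical-address fault rather than a plain NULL dereference.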


To reproduce:

        # build kernel
        cd linux
        cp config-5.19.0-rc1-00014-g8c69343389e0 .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 839f64485a55..61edd18b7bf3 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -968,7 +968,8 @@  static void device_links_missing_supplier(struct device *dev)
 
 static bool dev_is_best_effort(struct device *dev)
 {
-	return fw_devlink_best_effort && dev->can_match;
+	return (fw_devlink_best_effort && dev->can_match) ||
+		dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT;
 }
 
 /**
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 9a81c4410b9f..89b9bdfca925 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -27,11 +27,15 @@  struct device;
  *			     driver needs its child devices to be bound with
  *			     their respective drivers as soon as they are
  *			     added.
+ * BEST_EFFORT: The fwnode/device needs to probe early and might be missing some
+ *		suppliers. Only enforce ordering with suppliers that have
+ *		drivers.
  */
 #define FWNODE_FLAG_LINKS_ADDED			BIT(0)
 #define FWNODE_FLAG_NOT_DEVICE			BIT(1)
 #define FWNODE_FLAG_INITIALIZED			BIT(2)
 #define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD	BIT(3)
+#define FWNODE_FLAG_BEST_EFFORT			BIT(4)
 
 struct fwnode_handle {
 	struct fwnode_handle *secondary;