
Revert "module, async: async_synchronize_full() on module init iff async is used"

Message ID 20220127233953.2185045-1-ipylypiv@google.com (mailing list archive)
State New, archived
Series Revert "module, async: async_synchronize_full() on module init iff async is used"

Commit Message

Igor Pylypiv Jan. 27, 2022, 11:39 p.m. UTC
This reverts commit 774a1221e862b343388347bac9b318767336b20b.

We need to finish all async code before the module init sequence is done.
In the reverted commit the PF_USED_ASYNC flag was added to mark a thread
that called async_schedule(). Then the PF_USED_ASYNC flag was used to
determine whether or not async_synchronize_full() needs to be invoked.
This works when the modprobe thread itself calls async_schedule(), but it
does not work if the module dispatches init code to a worker thread which
then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based on
the node where the device is attached:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC flag
set instead of the modprobe thread. As a result, async_synchronize_full()
is not invoked and modprobe completes without waiting for the async code
to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
  	async_synchronize_full();
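
The same effect can be reproduced with a purely illustrative userspace sketch
(not kernel code; the thread-local variable stands in for the per-task
PF_USED_ASYNC flag, and the pthread worker stands in for work_on_cpu()):

	/* illustrative sketch only; build with: cc -pthread -o flagdemo flagdemo.c */
	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	static __thread bool used_async;	/* stand-in for PF_USED_ASYNC */

	/* stand-in for local_pci_probe() -> pm8001_pci_probe() -> scsi_scan_host() */
	static void *worker_fn(void *arg)
	{
		(void)arg;
		used_async = true;	/* the flag lands on the worker thread */
		return NULL;
	}

	int main(void)
	{
		pthread_t worker;

		used_async = false;	/* do_init_module() clears the flag */

		/* stand-in for work_on_cpu(cpu, local_pci_probe, &ddi) */
		pthread_create(&worker, NULL, worker_fn, NULL);
		pthread_join(&worker, NULL);

		/* the dispatching thread's copy is still false, so no wait happens */
		printf("modprobe thread sees used_async = %d\n", used_async);
		return 0;
	}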

Commit 21c3c5d28007 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e862 ("module,
async: async_synchronize_full() on module init iff async is used") tried
to fix.

Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading
from async is not allowed.

Given that the original deadlock issue is fixed and it is no longer allowed
to call synchronous request_module() from async, we can remove the
PF_USED_ASYNC flag so that module init consistently invokes
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
---
 include/linux/sched.h |  1 -
 kernel/async.c        |  3 ---
 kernel/module.c       | 25 +++++--------------------
 3 files changed, 5 insertions(+), 24 deletions(-)

Comments

Linus Torvalds Jan. 28, 2022, 7:39 a.m. UTC | #1
On Fri, Jan 28, 2022 at 1:40 AM Igor Pylypiv <ipylypiv@google.com> wrote:
>
> This reverts commit 774a1221e862b343388347bac9b318767336b20b.

Whee. That's from early 2013, more than 9 years ago.

> This works when the modprobe thread itself calls async_schedule(), but it
> does not work if the module dispatches init code to a worker thread which
> then calls async_schedule().

Hmm.

You make a good argument:

> Commit 21c3c5d28007 ("block: don't request module during elevator init")
> fixed the deadlock issue which the reverted commit 774a1221e862 ("module,
> async: async_synchronize_full() on module init iff async is used") tried
> to fix.
>
> Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
> request_module() from async workers") synchronous module loading
> from async is not allowed.

This does seem to imply that limiting it to PF_USED_ASYNC is largely
pointless, at least for the originally stated reason of deadlocks with
other module loading.

However, we've done this for *so* long that I wonder if there might be
situations that have ended up depending on the lack of synchronization
for pure performance reasons.

If *this* module loading process started the async work, then we'd
wait for it, but what if there's other async work that was started by
others? This revert would now make us wait for that async work too,
and that might be a big deal slowing things down at boot time.

Looking at it, this is all under the 'module_mutex', so I guess we are
already single-threaded at least wrt loading other modules, so the
amount of unrelated async work going on is presumably fairly low and
that isn't an issue.

Anyway, I think this patch is the right thing to do, but just the fact
that we've avoided that async wait for so long makes me a bit nervous
about fallout from the revert.

Comments? Maybe this is a "just apply it, see if somebody screams" situation?

                   Linus
Igor Pylypiv Jan. 31, 2022, 9:58 p.m. UTC | #2
On Fri, Jan 28, 2022 at 09:39:12AM +0200, Linus Torvalds wrote:
> On Fri, Jan 28, 2022 at 1:40 AM Igor Pylypiv <ipylypiv@google.com> wrote:
> >
> > This reverts commit 774a1221e862b343388347bac9b318767336b20b.
> 
> Whee. That's from early 2013, more than 9 years ago.
> 
> > This works when modprobe thread is calling async_schedule(), but it
> > does not work if module dispatches init code to a worker thread which
> > then calls async_schedule().
> 
> Hmm.
> 
> You make a good argument:
> 
> > Commit 21c3c5d28007 ("block: don't request module during elevator init")
> > fixed the deadlock issue which the reverted commit 774a1221e862 ("module,
> > async: async_synchronize_full() on module init iff async is used") tried
> > to fix.
> >
> > Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
> > request_module() from async workers") synchronous module loading
> > from async is not allowed.
> 
> This does seem to imply that limiting it to PF_USED_ASYNC is largely
> pointless, at least for the originally stated reason of deadlocks with
> other module loading.
> 
> However, we've done this for *so* long that I wonder if there might be
> situations that have ended up depending on the lack of synchronization
> for pure performance reasons.
> 
> If *this* module loading process started the async work, then we'd
> wait for it, but what if there's other async work that was started by
> others? This revert would now make us wait for that async work too,
> and that might be a big deal slowing things down at boot time.

Thanks, Linus! It is not like we have no wait at all today; we have it
for drivers that call async directly (not through a worker). A potential
future refactor of some driver to start using async may also "suddenly"
enable synchronization through PF_USED_ASYNC.

In case someone encounters a noticeable boot-time slowdown, they can pass
the 'async_probe' module parameter to probe the affected module asynchronously.
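
For reference, after the revert the only remaining opt-out is the check below
(quoted from the patch at the end of this thread). async_probe_requested is
set when a module is loaded with the 'async_probe' parameter, e.g. something
like "modprobe pm80xx async_probe=1" (parameter spelling from memory, see
kernel/module.c):

	/* do_init_module(): wait for async code unless async probe was requested */
	if (!mod->async_probe_requested)
		async_synchronize_full();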

> 
> Looking at it, this is all under the 'module_mutex', so I guess we are
> already single-threaded at least wrt loading other modules, so the
> amount of unrelated async work going on is presumably fairly low and
> that isn't an issue.
> 
> Anyway, I think this patch is the right thing to do, but just the fact
> that we've avoided that async wait for so long makes me a bit nervous
> about fallout from the revert.
> 
> Comments? Maybe this is a "just apply it, see if somebody screams" situation?
> 
>                    Linus
Tejun Heo Feb. 1, 2022, 5:43 p.m. UTC | #3
On Thu, Jan 27, 2022 at 03:39:53PM -0800, Igor Pylypiv wrote:
> This reverts commit 774a1221e862b343388347bac9b318767336b20b.
> 
> We need to finish all async code before the module init sequence is done.
> In the reverted commit the PF_USED_ASYNC flag was added to mark a thread
> that called async_schedule(). Then the PF_USED_ASYNC flag was used to
> determine whether or not async_synchronize_full() needs to be invoked.
> This works when the modprobe thread itself calls async_schedule(), but it
> does not work if the module dispatches init code to a worker thread which
> then calls async_schedule().
> 
> For example, PCI driver probing is invoked from a worker thread based on
> the node where the device is attached:
> 
> 	if (cpu < nr_cpu_ids)
> 		error = work_on_cpu(cpu, local_pci_probe, &ddi);
> 	else
> 		error = local_pci_probe(&ddi);
> 
> We end up in a situation where a worker thread gets the PF_USED_ASYNC flag
> set instead of the modprobe thread. As a result, async_synchronize_full()
> is not invoked and modprobe completes without waiting for the async code
> to finish.
> 
> The issue was discovered while loading the pm80xx driver:
> (scsi_mod.scan=async)
> 
> modprobe pm80xx                      worker
> ...
>   do_init_module()
>   ...
>     pci_call_probe()
>       work_on_cpu(local_pci_probe)
>                                      local_pci_probe()
>                                        pm8001_pci_probe()
>                                          scsi_scan_host()
>                                            async_schedule()
>                                            worker->flags |= PF_USED_ASYNC;
>                                      ...
>       < return from worker >
>   ...
>   if (current->flags & PF_USED_ASYNC) <--- false
>   	async_synchronize_full();
> 
> Commit 21c3c5d28007 ("block: don't request module during elevator init")
> fixed the deadlock issue which the reverted commit 774a1221e862 ("module,
> async: async_synchronize_full() on module init iff async is used") tried
> to fix.
> 
> Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
> request_module() from async workers") synchronous module loading
> from async is not allowed.
> 
> Given that the original deadlock issue is fixed and it is no longer allowed
> to call synchronous request_module() from async, we can remove the
> PF_USED_ASYNC flag so that module init consistently invokes
> async_synchronize_full() unless async module probe is requested.
> 
> Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
> Reviewed-by: Changyuan Lyu <changyuanl@google.com>

That's quite a walk down memory lane and I agree with your analysis. The
PF_USED_ASYNC flag is redundant for correctness with the removal of synchronous
loading from the iosched path, and the WARN_ON guarantees that nothing in the
kernel is creating a similar situation.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.
Tejun Heo Feb. 1, 2022, 5:50 p.m. UTC | #4
On Tue, Feb 01, 2022 at 07:43:27AM -1000, Tejun Heo wrote:
> That's quite a walk down memory lane and I agree with your analysis. The
> PF_USED_ASYNC flag is redundant for correctness with the removal of synchronous
> loading from the iosched path, and the WARN_ON guarantees that nothing in the
> kernel is creating a similar situation.
> 
> Acked-by: Tejun Heo <tj@kernel.org>

BTW, I can route this through workqueue tree but -mm or going to Linus's
tree directly might be a better option. Any opinions?

Thanks.
Tejun Heo Feb. 1, 2022, 6:13 p.m. UTC | #5
Hello,

On Fri, Jan 28, 2022 at 09:39:12AM +0200, Linus Torvalds wrote:
> However, we've done this for *so* long that I wonder if there might be
> situations that have ended up depending on the lack of synchronization
> for pure performance reasons.
> 
> If *this* module loading process started the async work, then we'd
> wait for it, but what if there's other async work that was started by
> others? This revert would now make us wait for that async work too,
> and that might be a big deal slowing things down at boot time.
> 
> Looking at it, this is all under the 'module_mutex', so I guess we are
> already single-threaded at least wrt loading other modules, so the
> amount of unrelated async work going on is presumably fairly low and
> that isn't an issue.

Looks like we're multi-threaded while running the mod inits which launch the
async jobs and single-threaded while waiting for them to finish. Greg should
know a lot better than me but according to my hazy memory and cursory code
reading udev is multi-processed when loading modules, which makes it a lot
less likely that this will impact boot time in most cases.

> Anyway, I think this patch is the right thing to do, but just the fact
> that we've avoided that async wait for so long makes me a bit nervous
> about fallout from the revert.
> 
> Comments? Maybe this is a "just apply it, see if somebody screams" situation?

So, yeah, I think the risk is pretty low and even in the unlikely case that
someone is affected, the workaround is pretty straight-forward - not waiting
for the module loading to finish if appropriate.

Thanks.
Greg KH Feb. 1, 2022, 6:20 p.m. UTC | #6
On Tue, Feb 01, 2022 at 08:13:14AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 28, 2022 at 09:39:12AM +0200, Linus Torvalds wrote:
> > However, we've done this for *so* long that I wonder if there might be
> > situations that have ended up depending on the lack of synchronization
> > for pure performance reasons.
> > 
> > If *this* module loading process started the async work, then we'd
> > wait for it, but what if there's other async work that was started by
> > others? This revert would now make us wait for that async work too,
> > and that might be a big deal slowing things down at boot time.
> > 
> > Looking at it, this is all under the 'module_mutex', so I guess we are
> > already single-threaded at least wrt loading other modules, so the
> > amount of unrelated async work going on is presumably fairly low and
> > that isn't an issue.
> 
> Looks like we're multi-threaded while running the mod inits which launch the
> async jobs and single-threaded while waiting for them to finish. Greg should
> know a lot better than me but according to my hazy memory and cursory code
> reading udev is multi-processed when loading modules, which makes it a lot
> less likely that this will impact boot time in most cases.

I think userspace is multi-processed here, which should help with
reading modules from disk at boot while others are still being loaded
serially due to the kernel lock.

> > Anyway, I think this patch is the right thing to do, but just the fact
> > that we've avoided that async wait for so long makes me a bit nervous
> > about fallout from the revert.
> > 
> > Comments? Maybe this is a "just apply it, see if somebody screams" situation?
> 
> So, yeah, I think the risk is pretty low and even in the unlikely case that
> someone is affected, the workaround is pretty straight-forward - not waiting
> for the module loading to finish if appropriate.

I agree with Linus, let's see if anyone notices :)

thanks,

greg k-h
Luis Chamberlain Feb. 3, 2022, 1:11 a.m. UTC | #7
On Tue, Feb 01, 2022 at 07:50:08AM -1000, Tejun Heo wrote:
> On Tue, Feb 01, 2022 at 07:43:27AM -1000, Tejun Heo wrote:
> > That's quite a walk down memory lane and I agree with your analysis. The
> > PF_USED_ASYNC flag is redundant for correctness with the removal of synchronous
> > loading from the iosched path, and the WARN_ON guarantees that nothing in the
> > kernel is creating a similar situation.
> > 
> > Acked-by: Tejun Heo <tj@kernel.org>
> 
> BTW, I can route this through workqueue tree but -mm or going to Linus's
> tree directly might be a better option. Any opinions?

FWIW, although I was not working on modules when the reverted commit was
merged, my take on this is that I am more concerned about a regression than
others seem to be. However, I do agree that async probe can help with any
slowdown, and the idea was that folks would slowly strive to make more and
more modules default to async probe as they verify that it's fine.

With that said,

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

And I'd prefer this go through Linus' tree since this is a fix and
can be merged already. That would also help with Aaron's kernel/module.c
refactoring work (no functional changes), which will soon slice and
dice that file into kernel/module/*.c files.

 Luis
Linus Torvalds Feb. 3, 2022, 7:19 p.m. UTC | #8
On Tue, Feb 1, 2022 at 9:50 AM Tejun Heo <tj@kernel.org> wrote:
>
> BTW, I can route this through workqueue tree but -mm or going to Linus's
> tree directly might be a better option. Any opinions?

I'll take it directly. Hopefully nobody notices anything at all - but
if it causes unexpected serialization and somebody screams, we'll know
more at that point.

           Linus

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f5b2be39a78c..75ba8aa60248 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1680,7 +1680,6 @@  extern struct pid *cad_pid;
 #define PF_MEMALLOC		0x00000800	/* Allocating memory */
 #define PF_NPROC_EXCEEDED	0x00001000	/* set_user() noticed that RLIMIT_NPROC was exceeded */
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
-#define PF_USED_ASYNC		0x00004000	/* Used async_schedule*(), used by module init */
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
 #define PF_FROZEN		0x00010000	/* Frozen for system suspend */
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
diff --git a/kernel/async.c b/kernel/async.c
index b8d7a663497f..b2c4ba5686ee 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -205,9 +205,6 @@  async_cookie_t async_schedule_node_domain(async_func_t func, void *data,
 	atomic_inc(&entry_count);
 	spin_unlock_irqrestore(&async_lock, flags);
 
-	/* mark that this task has queued an async job, used by module init */
-	current->flags |= PF_USED_ASYNC;
-
 	/* schedule for execution */
 	queue_work_node(node, system_unbound_wq, &entry->work);
 
diff --git a/kernel/module.c b/kernel/module.c
index 24dab046e16c..46a5c2ed1928 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3725,12 +3725,6 @@  static noinline int do_init_module(struct module *mod)
 	}
 	freeinit->module_init = mod->init_layout.base;
 
-	/*
-	 * We want to find out whether @mod uses async during init.  Clear
-	 * PF_USED_ASYNC.  async_schedule*() will set it.
-	 */
-	current->flags &= ~PF_USED_ASYNC;
-
 	do_mod_ctors(mod);
 	/* Start the module */
 	if (mod->init != NULL)
@@ -3756,22 +3750,13 @@  static noinline int do_init_module(struct module *mod)
 
 	/*
 	 * We need to finish all async code before the module init sequence
-	 * is done.  This has potential to deadlock.  For example, a newly
-	 * detected block device can trigger request_module() of the
-	 * default iosched from async probing task.  Once userland helper
-	 * reaches here, async_synchronize_full() will wait on the async
-	 * task waiting on request_module() and deadlock.
-	 *
-	 * This deadlock is avoided by perfomring async_synchronize_full()
-	 * iff module init queued any async jobs.  This isn't a full
-	 * solution as it will deadlock the same if module loading from
-	 * async jobs nests more than once; however, due to the various
-	 * constraints, this hack seems to be the best option for now.
-	 * Please refer to the following thread for details.
+	 * is done. This has potential to deadlock if synchronous module
+	 * loading is requested from async (which is not allowed!).
 	 *
-	 * http://thread.gmane.org/gmane.linux.kernel/1420814
+	 * See commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
+	 * request_module() from async workers") for more details.
 	 */
-	if (!mod->async_probe_requested && (current->flags & PF_USED_ASYNC))
+	if (!mod->async_probe_requested)
 		async_synchronize_full();
 
 	ftrace_free_mem(mod, mod->init_layout.base, mod->init_layout.base +