Message ID | 20220622023501.517125-1-senozhatsky@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | zram: do not lookup algorithm in backends table | expand |
On (22/06/22 11:35), Sergey Senozhatsky wrote: > Always use crypto_has_comp() so that crypto can lookup module, > call usermodhelper to load the modules, wait for usermodhelper > to finish and so on. Otherwise crypto will do all of these steps > under CPU hot-plug lock and this looks like too much stuff to > handle under the CPU hot-plug lock. Besides this can end up in > a deadlock when usermodhelper triggers a code path that attempts > to lock the CPU hot-plug lock, that zram already holds. And we think that we (not exactly "we", our partners) actually see a deadlock. It goes something like this: - path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto and waits for modprobe to finish disksize_store zcomp_create __cpuhp_state_add_instance __cpuhp_state_add_instance_cpuslocked zcomp_cpu_up_prepare crypto_alloc_base crypto_alg_mod_lookup call_usermodehelper_exec wait_for_completion_killable do_wait_for_common schedule - path B. async work kthread that brings in scsi device. It wants to register CPUHP states at some point, and it needs the CPU hot-plug lock for that, which is owned by zram. async_run_entry_fn scsi_probe_and_add_lun scsi_mq_alloc_queue blk_mq_init_queue blk_mq_init_allocated_queue blk_mq_realloc_hw_ctxs __cpuhp_state_add_instance __cpuhp_state_add_instance_cpuslocked mutex_lock schedule - path C. modprobe sleeps, waiting for all aync works to finish. load_module do_init_module async_synchronize_full async_synchronize_cookie_domain schedule And none can make any progress. So I think we need to move crypto_alg_mod_lookup()->call_usermodehelper_exec() out of CPU hot-plug lock and pre-load modules in advance, before we grab the hot-plug lock.
On Thu, 23 Jun 2022 00:20:05 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > On (22/06/22 11:35), Sergey Senozhatsky wrote: > > Always use crypto_has_comp() so that crypto can lookup module, > > call usermodhelper to load the modules, wait for usermodhelper > > to finish and so on. Otherwise crypto will do all of these steps > > under CPU hot-plug lock and this looks like too much stuff to > > handle under the CPU hot-plug lock. Besides this can end up in > > a deadlock when usermodhelper triggers a code path that attempts > > to lock the CPU hot-plug lock, that zram already holds. > > And we think that we (not exactly "we", our partners) actually > see a deadlock. It goes something like this: > > - path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto > and waits for modprobe to finish Nope, can't do that. > disksize_store > zcomp_create > __cpuhp_state_add_instance > __cpuhp_state_add_instance_cpuslocked > zcomp_cpu_up_prepare > crypto_alloc_base > crypto_alg_mod_lookup > call_usermodehelper_exec > wait_for_completion_killable > do_wait_for_common > schedule The usermode helper is free to do anything it wants, including operations that take the CPU hotplug lock. Or operations which might in the future be changed to take that lock. > - path B. async work kthread that brings in scsi device. It wants to > register CPUHP states at some point, and it needs the CPU hot-plug > lock for that, which is owned by zram. > > async_run_entry_fn > scsi_probe_and_add_lun > scsi_mq_alloc_queue > blk_mq_init_queue > blk_mq_init_allocated_queue > blk_mq_realloc_hw_ctxs > __cpuhp_state_add_instance > __cpuhp_state_add_instance_cpuslocked > mutex_lock > schedule > > - path C. modprobe sleeps, waiting for all aync works to finish. > > load_module > do_init_module > async_synchronize_full > async_synchronize_cookie_domain > schedule > > And none can make any progress. > > So I think we need to move crypto_alg_mod_lookup()->call_usermodehelper_exec() > out of CPU hot-plug lock and pre-load modules in advance, before we grab the > hot-plug lock. If the locking is fixed, why is there still a need to preload modules?
On (22/06/22 12:19), Andrew Morton wrote: > > On (22/06/22 11:35), Sergey Senozhatsky wrote: > > > Always use crypto_has_comp() so that crypto can lookup module, > > > call usermodhelper to load the modules, wait for usermodhelper > > > to finish and so on. Otherwise crypto will do all of these steps > > > under CPU hot-plug lock and this looks like too much stuff to > > > handle under the CPU hot-plug lock. Besides this can end up in > > > a deadlock when usermodhelper triggers a code path that attempts > > > to lock the CPU hot-plug lock, that zram already holds. > > > > And we think that we (not exactly "we", our partners) actually > > see a deadlock. It goes something like this: > > > > - path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto > > and waits for modprobe to finish > > Nope, can't do that. > > > disksize_store > > zcomp_create > > __cpuhp_state_add_instance > > __cpuhp_state_add_instance_cpuslocked > > zcomp_cpu_up_prepare > > crypto_alloc_base > > crypto_alg_mod_lookup > > call_usermodehelper_exec > > wait_for_completion_killable > > do_wait_for_common > > schedule > > The usermode helper is free to do anything it wants, including > operations that take the CPU hotplug lock. Or operations which might > in the future be changed to take that lock. Agreed. > > - path B. async work kthread that brings in scsi device. It wants to > > register CPUHP states at some point, and it needs the CPU hot-plug > > lock for that, which is owned by zram. > > > > async_run_entry_fn > > scsi_probe_and_add_lun > > scsi_mq_alloc_queue > > blk_mq_init_queue > > blk_mq_init_allocated_queue > > blk_mq_realloc_hw_ctxs > > __cpuhp_state_add_instance > > __cpuhp_state_add_instance_cpuslocked > > mutex_lock > > schedule > > > > - path C. modprobe sleeps, waiting for all aync works to finish. > > > > load_module > > do_init_module > > async_synchronize_full > > async_synchronize_cookie_domain > > schedule > > > > And none can make any progress. > > > > So I think we need to move crypto_alg_mod_lookup()->call_usermodehelper_exec() > > out of CPU hot-plug lock and pre-load modules in advance, before we grab the > > hot-plug lock. > > If the locking is fixed, why is there still a need to preload modules? We "fix" locking by doing initial crypto compression algorithm lookup outside of hot-plug lock (pre-load). Crypto API handles a list of preloaded modules internally. What we do currently, we call crypto_alloc_base() under hot-plug lock, which calls crypto_alg_mod_lookup(), which figures out that crypto modules list does not contain that module yet so then it modprobes it. With this patch we do the first crypto_alg_mod_lookup() outside of hot-plug lock, so that it safely modprobes compression module. Then when we grab the hot-plug lock and setup per-CPU streams, crypto_alloc_base()->crypto_alg_mod_lookup() figures that module is already on the list so no modprobe is needed.
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c index 052aa3f65514..398eb9e24eff 100644 --- a/drivers/block/zram/zcomp.c +++ b/drivers/block/zram/zcomp.c @@ -63,12 +63,6 @@ static int zcomp_strm_init(struct zcomp_strm *zstrm, struct zcomp *comp) bool zcomp_available_algorithm(const char *comp) { - int i; - - i = sysfs_match_string(backends, comp); - if (i >= 0) - return true; - /* * Crypto does not ignore a trailing new line symbol, * so make sure you don't supply a string containing
Always use crypto_has_comp() so that crypto can lookup module, call usermodhelper to load the modules, wait for usermodhelper to finish and so on. Otherwise crypto will do all of these steps under CPU hot-plug lock and this looks like too much stuff to handle under the CPU hot-plug lock. Besides this can end up in a deadlock when usermodhelper triggers a code path that attempts to lock the CPU hot-plug lock, that zram already holds. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> --- drivers/block/zram/zcomp.c | 6 ------ 1 file changed, 6 deletions(-)