Message ID | e0a0bc94-e6de-b0e5-ee46-a76cd1570ea6@I-love.SAKURA.ne.jp (mailing list archive)
---|---
State | New, archived
Series | loop: add WQ_MEM_RECLAIM flag to per device workqueue
On Thu, Mar 17, 2022 at 11:08:04PM +0900, Tetsuo Handa wrote:
> Commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
> again started using per device workqueue
>
> -	lo->worker_task = kthread_run(loop_kthread_worker_fn,
> -			&lo->worker, "loop%d", lo->lo_number);
> +	lo->workqueue = alloc_workqueue("loop%d",
> +					WQ_UNBOUND | WQ_FREEZABLE,
> +					0,
> +					lo->lo_number);
>
> but forgot to restore WQ_MEM_RECLAIM flag.

Early versions of the patch did have WQ_MEM_RECLAIM but it was
removed. I looked up my notes and found this:

Changes since V11:

* Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this
  can be driven by writeback, but this was causing a warning in xfs
  and likely other filesystems aren't equipped to be driven by reclaim
  at the VFS layer.

I don't know if this is still the case. Can you test it?

> I don't know why WQ_FREEZABLE flag was added

My memory here is hazy and I don't have it in my notes but the same is
done in block/blk-cgroup.c which is conceptually quite similar.
On 2022/03/17 23:38, Dan Schatzberg wrote:
> On Thu, Mar 17, 2022 at 11:08:04PM +0900, Tetsuo Handa wrote:
>> Commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
>> again started using per device workqueue
>>
>> -	lo->worker_task = kthread_run(loop_kthread_worker_fn,
>> -			&lo->worker, "loop%d", lo->lo_number);
>> +	lo->workqueue = alloc_workqueue("loop%d",
>> +					WQ_UNBOUND | WQ_FREEZABLE,
>> +					0,
>> +					lo->lo_number);
>>
>> but forgot to restore WQ_MEM_RECLAIM flag.
>
> Early versions of the patch did have WQ_MEM_RECLAIM but it was
> removed. I looked up my notes and found this:
>
> Changes since V11:
>
> * Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this
>   can be driven by writeback, but this was causing a warning in xfs
>   and likely other filesystems aren't equipped to be driven by reclaim
>   at the VFS layer.
>
> I don't know if this is still the case. Can you test it?

Well, indeed there was WQ_MEM_RECLAIM flag in
https://lore.kernel.org/all/20210316153655.500806-2-schatzberg.dan@gmail.com/
and the warning you are referring to seems to be
https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

----------------------------------------
[ 49.143055] run fstests generic/361 at 2021-03-19 23:56:49
[ 49.418726] loop: module loaded
[ 49.642111] XFS (sda4): Mounting V5 Filesystem
[ 49.761139] XFS (sda4): Ending clean mount
[ 49.768167] xfs filesystem being mounted at /fs/scratch supports timestamps until 2038 (0x7fffffff)
[ 49.799002] loop0: detected capacity change from 0 to 2097152
[ 50.027339] XFS (loop0): Mounting V5 Filesystem
[ 50.033830] XFS (loop0): Ending clean mount
[ 50.038232] xfs filesystem being mounted at /fs/scratch/mnt supports timestamps until 2038 (0x7fffffff)
[ 50.239610] XFS: attr2 mount option is deprecated.
[ 50.423584] ------------[ cut here ]------------
[ 50.428278] workqueue: WQ_MEM_RECLAIM loop0:loop_rootcg_workfn [loop] is flushing !WQ_MEM_RECLAIM xfs-sync/sda4:xfs_flush_inodes_worker [xfs]
[ 50.428387] WARNING: CPU: 0 PID: 35 at kernel/workqueue.c:2613 check_flush_dependency+0x116/0x140
[ 50.450013] Modules linked in: loop xfs dm_mod btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c sd_mod t10_pi sg ipmi_devintf ipmi_msghandler intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul hp_wmi sparse_keymap intel_gtt crc32c_intel ghash_clmulni_intel mei_wdt rfkill wmi_bmof rapl drm_kms_helper ahci intel_cstate syscopyarea mei_me libahci sysfillrect sysimgblt fb_sys_fops intel_uncore serio_raw mei drm libata intel_pch_thermal ie31200_edac wmi video tpm_infineon intel_pmc_core acpi_pad ip_tables
[ 50.500731] CPU: 0 PID: 35 Comm: kworker/u8:3 Not tainted 5.12.0-rc2-00093-geaba74271070 #1
[ 50.509081] Hardware name: HP HP Z238 Microtower Workstation/8183, BIOS N51 Ver. 01.63 10/05/2017
[ 50.517963] Workqueue: loop0 loop_rootcg_workfn [loop]
[ 50.523109] RIP: 0010:check_flush_dependency+0x116/0x140
[ 50.528427] Code: 8d 8b b0 00 00 00 48 8b 50 18 49 89 e8 48 8d b1 b0 00 00 00 48 c7 c7 80 ed 34 82 4c 89 c9 c6 05 8a 11 ab 01 01 e8 29 9a a9 00 <0f> 0b e9 07 ff ff ff 80 3d 78 11 ab 01 00 75 92 e9 3b ff ff ff 66
[ 50.547207] RSP: 0018:ffffc9000018fba0 EFLAGS: 00010086
[ 50.552434] RAX: 0000000000000000 RBX: ffff88843a39b600 RCX: 0000000000000027
[ 50.559567] RDX: 0000000000000027 RSI: 00000000ffff7fff RDI: ffff88843f4177f8
[ 50.566713] RBP: ffffffffc091c760 R08: ffff88843f4177f0 R09: ffffc9000018f9b8
[ 50.573875] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810d0fa800
[ 50.581006] R13: 0000000000000001 R14: ffff88843f42ab80 R15: ffff88810d0fa800
[ 50.588139] FS: 0000000000000000(0000) GS:ffff88843f400000(0000) knlGS:0000000000000000
[ 50.596228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 50.601975] CR2: 00007fbda63ea7f8 CR3: 000000043da0a002 CR4: 00000000003706f0
[ 50.609105] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 50.616240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 50.623379] Call Trace:
[ 50.625837]  __flush_work+0xaf/0x220
[ 50.629418]  ? __queue_work+0x142/0x400
[ 50.633261]  xfs_file_buffered_write+0x1d0/0x2c0 [xfs]
[ 50.638468]  do_iter_readv_writev+0x157/0x1c0
[ 50.642833]  do_iter_write+0x7d/0x1c0
[ 50.646513]  lo_write_bvec+0x62/0x1a0 [loop]
[ 50.650804]  loop_process_work+0x20d/0xb80 [loop]
[ 50.655543]  ? newidle_balance+0x1f0/0x400
[ 50.659647]  process_one_work+0x1ed/0x3c0
[ 50.663696]  worker_thread+0x50/0x3c0
[ 50.667365]  ? process_one_work+0x3c0/0x3c0
[ 50.671568]  kthread+0x116/0x160
[ 50.674813]  ? kthread_park+0xa0/0xa0
[ 50.678476]  ret_from_fork+0x22/0x30
[ 50.682070] ---[ end trace cdbf6e08ae434f6c ]---
[ 56.951496] loop: Write error at byte offset 1032134656, length 4096.
[ 56.957973] blk_update_request: I/O error, dev loop0, sector 2015816 op 0x1:(WRITE) flags 0x4000 phys_seg 21 prio class 0
----------------------------------------

Since xfs_init_mount_workqueues() as of 5.17-rc8 is doing

	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", XFS_WQFLAGS(WQ_FREEZABLE),
					       0, mp->m_super->s_id);

, this will still be the case.

Since scsi_add_host_with_dma() in drivers/scsi/hosts.c is doing

	shost->work_q = alloc_workqueue("%s",
			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
			1, shost->work_q_name);

iscsi_host_alloc() in drivers/scsi/libiscsi.c is doing

	ihost->workq = alloc_workqueue("%s",
			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
			1, ihost->workq_name);

iscsi_transport_init() in drivers/scsi/scsi_transport_iscsi.c is doing

	iscsi_eh_timer_workq = alloc_workqueue("%s",
			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
			1, "iscsi_eh");

and include/linux/workqueue.h is doing

	#define create_workqueue(name)					\
		alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

, I guess that also adding __WQ_LEGACY flag would avoid hitting the

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
		  "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps",
		  worker->current_pwq->wq->name, worker->current_func,
		  target_wq->name, target_func);

warning. But since include/linux/workqueue.h only says

	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */

, I can't tell when not to specify __WQ_LEGACY and WQ_MEM_RECLAIM together...

Tejun, what is the intent of this warning? Can the description of __WQ_LEGACY flag
be updated? I think that the loop module had better reserve one "struct task_struct"
for each loop device.

I guess that, in general, waiting for a work in !WQ_MEM_RECLAIM WQ from a
WQ_MEM_RECLAIM WQ is dangerous because that work may not be able to find
"struct task_struct" for processing that work. Then, what we should do is to
create mp->m_sync_workqueue with WQ_MEM_RECLAIM flag added instead of creating
lo->workqueue with __WQ_LEGACY + WQ_MEM_RECLAIM flags added...

Is __WQ_LEGACY + WQ_MEM_RECLAIM combination a hack for silencing this warning
without fixing various WQs used by xfs and other filesystems?
Hello,

On Fri, Mar 18, 2022 at 09:05:42PM +0900, Tetsuo Handa wrote:
> But since include/linux/workqueue.h only says
>
> 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
>
> , I can't tell when not to specify __WQ_LEGACY and WQ_MEM_RECLAIM together...
>
> Tejun, what is the intent of this warning? Can the description of __WQ_LEGACY flag
> be updated? I think that the loop module had better reserve one "struct task_struct"
> for each loop device.
>
> I guess that, in general, waiting for a work in !WQ_MEM_RECLAIM WQ from a
> WQ_MEM_RECLAIM WQ is dangerous because that work may not be able to find
> "struct task_struct" for processing that work. Then, what we should do is to
> create mp->m_sync_workqueue with WQ_MEM_RECLAIM flag added instead of creating
> lo->workqueue with __WQ_LEGACY + WQ_MEM_RECLAIM flags added...
>
> Is __WQ_LEGACY + WQ_MEM_RECLAIM combination a hack for silencing this warning
> without fixing various WQs used by xfs and other filesystems?

So, create_workqueue() is the deprecated interface and always implies
MEM_RECLAIM because back when the interface was added each wq had a
dedicated worker and there's no way to tell one way or the other. The
warning is telling you to convert the workqueue to the alloc_workqueue()
interface and explicitly use WQ_MEM_RECLAIM flag if the workqueue is gonna
participate in MEM_RECLAIM chain.

Thanks.
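To make the conversion Tejun describes concrete, a minimal sketch might look like
the following (the driver name and function here are hypothetical, not code from
this thread):

	#include <linux/workqueue.h>

	static struct workqueue_struct *example_wq;	/* hypothetical */

	static int __init example_init(void)
	{
		/*
		 * Deprecated: create_workqueue() always implies WQ_MEM_RECLAIM
		 * and internally tags the workqueue with __WQ_LEGACY.
		 *
		 * example_wq = create_workqueue("example");
		 */

		/*
		 * Preferred: state the reclaim dependency explicitly. Pass
		 * WQ_MEM_RECLAIM only if this workqueue can sit in the path
		 * that cleans dirty pages under memory pressure.
		 */
		example_wq = alloc_workqueue("example",
					     WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
		return example_wq ? 0 : -ENOMEM;
	}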
On 2022/03/19 2:15, Tejun Heo wrote:
> Hello,
>
> On Fri, Mar 18, 2022 at 09:05:42PM +0900, Tetsuo Handa wrote:
>> But since include/linux/workqueue.h only says
>>
>> 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
>>
>> , I can't tell when not to specify __WQ_LEGACY and WQ_MEM_RECLAIM together...
>>
>> Tejun, what is the intent of this warning? Can the description of __WQ_LEGACY flag
>> be updated? I think that the loop module had better reserve one "struct task_struct"
>> for each loop device.
>>
>> I guess that, in general, waiting for a work in !WQ_MEM_RECLAIM WQ from a
>> WQ_MEM_RECLAIM WQ is dangerous because that work may not be able to find
>> "struct task_struct" for processing that work. Then, what we should do is to
>> create mp->m_sync_workqueue with WQ_MEM_RECLAIM flag added instead of creating
>> lo->workqueue with __WQ_LEGACY + WQ_MEM_RECLAIM flags added...
>>
>> Is __WQ_LEGACY + WQ_MEM_RECLAIM combination a hack for silencing this warning
>> without fixing various WQs used by xfs and other filesystems?
>
> So, create_workqueue() is the deprecated interface and always implies
> MEM_RECLAIM because back when the interface was added each wq had a
> dedicated worker and there's no way to tell one way or the other. The
> warning is telling you to convert the workqueue to the alloc_workqueue()
> interface and explicitly use WQ_MEM_RECLAIM flag if the workqueue is gonna
> participate in MEM_RECLAIM chain.

Is the intent of __WQ_LEGACY flag to indicate that "this WQ was created
using the deprecated interface"? But such intention no longer holds true.

Although the __WQ_LEGACY flag is described as "internal: create*_workqueue()",
tegra194_cpufreq_probe()/scsi_add_host_with_dma()/iscsi_host_alloc()/
iscsi_transport_init() are passing __WQ_LEGACY flag using the alloc_workqueue()
interface. Therefore, __WQ_LEGACY flag is no longer a meaningful indicator of
"internal: create*_workqueue()". The description of the __WQ_LEGACY flag needs
an update.

Here is an example program which reproduces the

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
		  "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps",
		  worker->current_pwq->wq->name, worker->current_func,
		  target_wq->name, target_func);

warning reported in
https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

---------- test.c ----------
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

static struct workqueue_struct *wq1;
static struct workqueue_struct *wq2;
static struct work_struct w1;
static struct work_struct w2;

static void wq2_workfn(struct work_struct *work)
{
}

static void wq1_workfn(struct work_struct *work)
{
	INIT_WORK(&w2, wq2_workfn);
	queue_work(wq2, &w2);
	flush_work(&w2);
}

static int __init wq_test_init(void)
{
	wq1 = alloc_workqueue("wq1", WQ_MEM_RECLAIM, 0);
	wq2 = alloc_workqueue("wq2", 0, 0);
	INIT_WORK(&w1, wq1_workfn);
	queue_work(wq1, &w1);
	flush_work(&w1);
	destroy_workqueue(wq2);
	destroy_workqueue(wq1);
	return -EINVAL;
}

module_init(wq_test_init);
MODULE_LICENSE("GPL");
---------- test.c ----------

----------
[ 152.666153] test: loading out-of-tree module taints kernel.
[ 152.673510] ------------[ cut here ]------------
[ 152.675765] workqueue: WQ_MEM_RECLAIM wq1:wq1_workfn [test] is flushing !WQ_MEM_RECLAIM wq2:wq2_workfn [test]
[ 152.675790] WARNING: CPU: 0 PID: 259 at kernel/workqueue.c:2650 check_flush_dependency+0x169/0x170
[ 152.682636] Modules linked in: test(O+) loop dm_mod dax serio_raw sg vmw_vmci fuse drm sd_mod t10_pi ata_generic pata_acpi ahci libahci ata_piix mptspi mptscsih i2c_piix4 mptbase i2c_core libata scsi_transport_spi e1000
[ 152.690020] CPU: 0 PID: 259 Comm: kworker/0:2 Kdump: loaded Tainted: G O 5.17.0-rc8+ #72
[ 152.693869] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 152.697764] Workqueue: wq1 wq1_workfn [test]
[ 152.699762] RIP: 0010:check_flush_dependency+0x169/0x170
[ 152.701817] Code: 8d 7f 18 e8 89 84 3a 00 49 8b 57 18 49 81 c5 60 01 00 00 48 c7 c7 60 f4 86 82 4c 89 e6 4c 89 e9 4d 89 f0 31 c0 e8 c7 80 fc ff <0f> 0b e9 61 ff ff ff 55 41 57 41 56 41 55 41 54 53 48 83 ec 18 48
[ 152.709031] RSP: 0018:ffff8881110a7b00 EFLAGS: 00010046
[ 152.711434] RAX: 19db87cad24ebb00 RBX: ffffe8ffffa4cc00 RCX: 0000000000000002
[ 152.714781] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: ffffffff84a84f00
[ 152.717334] RBP: 0000000000000000 R08: dffffc0000000000 R09: ffffed1023546ef8
[ 152.723543] R10: ffffed1023546ef8 R11: 1ffff11023546ef7 R12: ffff888113e75160
[ 152.729068] R13: ffff888113e70f60 R14: ffffffffa0848090 R15: ffff8881014c3528
[ 152.731834] FS: 0000000000000000(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
[ 152.734735] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 152.736840] CR2: 0000557cfd128768 CR3: 000000007216f006 CR4: 00000000001706f0
[ 152.739790] Call Trace:
[ 152.740715]  <TASK>
[ 152.742072]  start_flush_work+0xf9/0x440
[ 152.746516]  __flush_work+0xed/0x170
[ 152.747845]  ? flush_work+0x10/0x10
[ 152.749240]  ? __queue_work+0x558/0x5b0
[ 152.750648]  ? queue_work_on+0xe0/0x160
[ 152.752036]  ? lockdep_hardirqs_on+0xe6/0x170
[ 152.753757]  ? queue_work_on+0xed/0x160
[ 152.755546]  ? wq_worker_last_func+0x20/0x20
[ 152.757177]  ? rcu_read_lock_sched_held+0x87/0x100
[ 152.758960]  ? perf_trace_rcu_stall_warning+0x210/0x210
[ 152.760929]  process_one_work+0x45e/0x6b0
[ 152.762587]  ? rescuer_thread+0x9f0/0x9f0
[ 152.764332]  ? _raw_spin_lock_irqsave+0xf0/0xf0
[ 152.766110]  worker_thread+0x4d7/0x960
[ 152.767523]  ? _raw_spin_unlock+0x40/0x40
[ 152.769146]  ? preempt_count_sub+0xf/0xc0
[ 152.770595]  ? _raw_spin_unlock_irqrestore+0xb2/0x110
[ 152.773091]  ? rcu_lock_release+0x20/0x20
[ 152.774637]  kthread+0x178/0x1a0
[ 152.775819]  ? kthread_blkcg+0x50/0x50
[ 152.777228]  ret_from_fork+0x1f/0x30
[ 152.778603]  </TASK>
[ 152.779427] irq event stamp: 12002
[ 152.782443] hardirqs last enabled at (12001): [<ffffffff81102620>] queue_work_on+0xe0/0x160
[ 152.786186] hardirqs last disabled at (12002): [<ffffffff8237473b>] _raw_spin_lock_irq+0x7b/0xe0
[ 152.789707] softirqs last enabled at (11996): [<ffffffff810d9105>] irq_exit_rcu+0xb5/0x100
[ 152.792767] softirqs last disabled at (11971): [<ffffffff810d9105>] irq_exit_rcu+0xb5/0x100
[ 152.795802] ---[ end trace 0000000000000000 ]---
----------

But if I do

-	wq1 = alloc_workqueue("wq1", WQ_MEM_RECLAIM, 0);
+	wq1 = alloc_workqueue("wq1", __WQ_LEGACY | WQ_MEM_RECLAIM, 0);

, this warning goes away.

Therefore, it seems to me that the __WQ_LEGACY flag is used in combination with
the WQ_MEM_RECLAIM flag in order to ask check_flush_dependency() not to emit this
warning when we cannot afford doing

-	wq2 = alloc_workqueue("wq2", 0, 0);
+	wq2 = alloc_workqueue("wq2", WQ_MEM_RECLAIM, 0);

because the owners of wq1 and wq2 differ.

Given that the legacy create_workqueue() interface always implied WQ_MEM_RECLAIM flag,
maybe it is better to make the alloc_workqueue() interface WQ_MEM_RECLAIM by default.
That is, obsolete the WQ_MEM_RECLAIM flag and the __WQ_LEGACY flag, and introduce a new
flag (e.g. WQ_MAY_SHARE_WORKER) which is passed to the alloc_workqueue() interface only
when it is absolutely certain that this WQ never participates in the memory reclaim path
and never participates in a flush_workqueue()/flush_work() operation.
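The opposite resolution — giving the flushed workqueue its own rescuer rather than
tagging the flusher with __WQ_LEGACY — can be sketched against the same test module
(a hypothetical fix, assuming the owner of wq2 agrees that it participates in
reclaim):

	/*
	 * check_flush_dependency() returns early when the flushed (target)
	 * workqueue has WQ_MEM_RECLAIM, so a WQ_MEM_RECLAIM workqueue may
	 * safely flush another WQ_MEM_RECLAIM workqueue without warning.
	 */
	wq1 = alloc_workqueue("wq1", WQ_MEM_RECLAIM, 0);
	wq2 = alloc_workqueue("wq2", WQ_MEM_RECLAIM, 0);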
Hello,

On Sat, Mar 19, 2022 at 11:02:51AM +0900, Tetsuo Handa wrote:
> Is the intent of __WQ_LEGACY flag to indicate that "this WQ was created
> using the deprecated interface"? But such intention no longer holds true.
>
> Although the __WQ_LEGACY flag is described as "internal: create*_workqueue()",
> tegra194_cpufreq_probe()/scsi_add_host_with_dma()/iscsi_host_alloc()/
> iscsi_transport_init() are passing __WQ_LEGACY flag using the alloc_workqueue()
> interface. Therefore, __WQ_LEGACY flag is no longer a meaningful indicator of
> "internal: create*_workqueue()". The description of the __WQ_LEGACY flag needs
> an update.
...
> Given that the legacy create_workqueue() interface always implied WQ_MEM_RECLAIM flag,
> maybe it is better to make the alloc_workqueue() interface WQ_MEM_RECLAIM by default.

That actually is pretty expensive when added up, which is why we went for
the shared worker pool model in the first place.

> That is, obsolete the WQ_MEM_RECLAIM flag and the __WQ_LEGACY flag, and introduce a new
> flag (e.g. WQ_MAY_SHARE_WORKER) which is passed to the alloc_workqueue() interface only
> when it is absolutely certain that this WQ never participates in the memory reclaim path
> and never participates in a flush_workqueue()/flush_work() operation.

No, just fix the abusers. There are four abusers in the kernel and they
aren't difficult to fix.

Thanks.
On 2022/03/22 1:55, Tejun Heo wrote:
> No, just fix the abusers. There are four abusers in the kernel and they
> aren't difficult to fix.

So, are you expecting that a change shown below happens, by adding WQ_MEM_RECLAIM
flag to all WQs which may hit the "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing
!WQ_MEM_RECLAIM %s:%ps" warning? Otherwise, __WQ_LEGACY flag will continue
serving as a hack for suppressing this warning.

---
 drivers/cpufreq/tegra194-cpufreq.c  | 2 +-
 drivers/scsi/hosts.c                | 2 +-
 drivers/scsi/libiscsi.c             | 2 +-
 drivers/scsi/scsi_transport_iscsi.c | 2 +-
 include/linux/workqueue.h           | 7 +++----
 kernel/workqueue.c                  | 3 +--
 6 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index ac381db25dbe..5d72ef07f9ed 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -379,7 +379,7 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
 	if (IS_ERR(bpmp))
 		return PTR_ERR(bpmp);
 
-	read_counters_wq = alloc_workqueue("read_counters_wq", __WQ_LEGACY, 1);
+	read_counters_wq = alloc_workqueue("read_counters_wq", 0, 1);
 	if (!read_counters_wq) {
 		dev_err(&pdev->dev, "fail to create_workqueue\n");
 		err = -EINVAL;
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f69b77cbf538..4485f65d5e92 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -277,7 +277,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 		snprintf(shost->work_q_name, sizeof(shost->work_q_name),
 			 "scsi_wq_%d", shost->host_no);
 		shost->work_q = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
 			1, shost->work_q_name);
 
 		if (!shost->work_q) {
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 059dae8909ee..c923d2a1086e 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2801,7 +2801,7 @@ struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
 		snprintf(ihost->workq_name, sizeof(ihost->workq_name),
 			 "iscsi_q_%d", shost->host_no);
 		ihost->workq = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
 			1, ihost->workq_name);
 		if (!ihost->workq)
 			goto free_host;
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 554b6f784223..b4f0b7584112 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -4876,7 +4876,7 @@ static __init int iscsi_transport_init(void)
 	}
 
 	iscsi_eh_timer_workq = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
 			1, "iscsi_eh");
 	if (!iscsi_eh_timer_workq) {
 		err = -ENOMEM;
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..666c4cc73f6b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -337,7 +337,6 @@ enum {
 
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
-	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
@@ -420,12 +419,12 @@ alloc_workqueue(const char *fmt, unsigned int flags, int max_active, ...);
 			__WQ_ORDERED_EXPLICIT | (flags), 1, ##args)
 
 #define create_workqueue(name)						\
-	alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
+	alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
 #define create_freezable_workqueue(name)				\
-	alloc_workqueue("%s", __WQ_LEGACY | WQ_FREEZABLE | WQ_UNBOUND |	\
+	alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND |		\
 			WQ_MEM_RECLAIM, 1, (name))
 #define create_singlethread_workqueue(name)				\
-	alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)
+	alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..aba3b1505292 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2643,8 +2643,7 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
 	WARN_ONCE(current->flags & PF_MEMALLOC,
 		  "workqueue: PF_MEMALLOC task %d(%s) is flushing !WQ_MEM_RECLAIM %s:%ps",
 		  current->pid, current->comm, target_wq->name, target_func);
-	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
-			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
+	WARN_ONCE(worker && (worker->current_pwq->wq->flags & WQ_MEM_RECLAIM),
 		  "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps",
 		  worker->current_pwq->wq->name, worker->current_func,
 		  target_wq->name, target_func);
Hello,

On Tue, Mar 22, 2022 at 07:53:49AM +0900, Tetsuo Handa wrote:
> On 2022/03/22 1:55, Tejun Heo wrote:
> > No, just fix the abusers. There are four abusers in the kernel and they
> > aren't difficult to fix.
>
> So, are you expecting that a change shown below happens, by adding WQ_MEM_RECLAIM
> flag to all WQs which may hit the "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing
> !WQ_MEM_RECLAIM %s:%ps" warning? Otherwise, __WQ_LEGACY flag will continue
> serving as a hack for suppressing this warning.

If you convert all of them in the flush chain to use alloc_workqueue()
w/ MEM_RECLAIM, the warning will go away.

> #define create_workqueue(name)						\
> -	alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
> +	alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
> #define create_freezable_workqueue(name)				\
> -	alloc_workqueue("%s", __WQ_LEGACY | WQ_FREEZABLE | WQ_UNBOUND |	\
> +	alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND |		\
> 			WQ_MEM_RECLAIM, 1, (name))
> #define create_singlethread_workqueue(name)				\
> -	alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)
> +	alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)

But why are you dropping the flag from their intended users?

Thanks.
On 2022/03/22 8:04, Tejun Heo wrote:
> But why are you dropping the flag from their intended users?
As far as I can see, the only difference __WQ_LEGACY makes is whether
"workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
warning is printed or not. What are the intended users?
On Tue, Mar 22, 2022 at 08:17:43AM +0900, Tetsuo Handa wrote:
> On 2022/03/22 8:04, Tejun Heo wrote:
> > But why are you dropping the flag from their intended users?
>
> As far as I can see, the only difference __WQ_LEGACY makes is whether
> "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
> warning is printed or not. What are the intended users?

The old create_workqueue() and friends always imply WQ_MEM_RECLAIM because
they all used to be served by dedicated kthreads. The growing number of
kthreads used this way became a headache. There were too many of these
kthreads sitting around doing nothing. In some niche configurations, they
ate up enough PIDs to cause boot failures.

To address the issue, the new implementation made the workqueues share pools
of workers. However, this means that forward progress can't be guaranteed
under memory pressure, so workqueues which are depended upon during memory
reclaim now need to set WQ_MEM_RECLAIM explicitly to request a dedicated
rescuer thread.

The legacy flushing warning is telling us that those workqueues can be
converted to alloc_workqueue + WQ_MEM_RECLAIM as we know them to be
participating in memory reclaim (as they're flushing one of the explicitly
marked workqueues). So, if you spot them, the right thing to do is
converting all the involved workqueues to use alloc_workqueue() +
WQ_MEM_RECLAIM.

Thanks.
On 2022/03/22 8:27, Tejun Heo wrote:
> On Tue, Mar 22, 2022 at 08:17:43AM +0900, Tetsuo Handa wrote:
>> On 2022/03/22 8:04, Tejun Heo wrote:
>>> But why are you dropping the flag from their intended users?
>>
>> As far as I can see, the only difference __WQ_LEGACY makes is whether
>> "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
>> warning is printed or not. What are the intended users?
>
> The old create_workqueue() and friends always imply WQ_MEM_RECLAIM because
> they all used to be served by dedicated kthreads. The growing number of
> kthreads used this way became a headache. There were too many of these
> kthreads sitting around doing nothing. In some niche configurations, they
> ate up enough PIDs to cause boot failures.

OK.

> To address the issue, the new implementation made the workqueues share pools
> of workers. However, this means that forward progress can't be guaranteed
> under memory pressure, so workqueues which are depended upon during memory
> reclaim now need to set WQ_MEM_RECLAIM explicitly to request a dedicated
> rescuer thread.

OK.

> The legacy flushing warning is telling us that those workqueues can be

s/can be/must be/ ?

> converted to alloc_workqueue + WQ_MEM_RECLAIM as we know them to be
> participating in memory reclaim (as they're flushing one of the explicitly
> marked workqueues). So, if you spot them, the right thing to do is
> converting all the involved workqueues to use alloc_workqueue() +
> WQ_MEM_RECLAIM.

Then, can the description of

	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */

be improved to something like

	/*
	 * This flag disables deadlock detection which can happen when flushing
	 * a work item in !WQ_MEM_RECLAIM workqueue from WQ_MEM_RECLAIM workqueue.
	 * But try to avoid using this flag, by adding WQ_MEM_RECLAIM to all WQs which
	 * can be involved where a guarantee of forward progress under memory pressure
	 * is required.
	 */

? Current /* internal: create*_workqueue() */ tells me nothing.

My question is: I want to add WQ_MEM_RECLAIM flag to the WQ used by loop module
because this WQ is involved upon writeback operation. But unless I add both
__WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used by loop module, we will hit the

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),

warning because e.g. xfs-sync WQ used by xfs module is not using WQ_MEM_RECLAIM flag.

	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", XFS_WQFLAGS(WQ_FREEZABLE),
					       0, mp->m_super->s_id);

You are suggesting that the correct approach is to add WQ_MEM_RECLAIM flag to WQs
used by filesystems when adding WQ_MEM_RECLAIM flag to the WQ used by loop module
introduces possibility of hitting the

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),

warning (instead of either adding __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used
by loop module or giving up WQ_MEM_RECLAIM flag for the WQ used by loop module),
correct?
Hello,

On Tue, Mar 22, 2022 at 09:09:53AM +0900, Tetsuo Handa wrote:
> > The legacy flushing warning is telling us that those workqueues can be
>
> s/can be/must be/ ?

Well, one thing that we can but don't want to do is converting all
create_workqueue() users to alloc_workqueue() with MEM_RECLAIM, which is
wasteful but won't break anything. We know for sure that the workqueues
which trigger the legacy warning are participating in memory reclaim and
thus need MEM_RECLAIM. So, yeah, the warning tells us that they need
MEM_RECLAIM and should be converted.

> ? Current /* internal: create*_workqueue() */ tells me nothing.

It's trying to say that it shouldn't be used outside workqueue proper and
the warning message is supposed to trigger the conversion. But, yeah, a
stronger comment can help.

> My question is: I want to add WQ_MEM_RECLAIM flag to the WQ used by loop module
> because this WQ is involved upon writeback operation. But unless I add both
> __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used by loop module, we will hit the
>
> 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>
> warning because e.g. xfs-sync WQ used by xfs module is not using WQ_MEM_RECLAIM flag.
>
> 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", XFS_WQFLAGS(WQ_FREEZABLE),
> 					       0, mp->m_super->s_id);
>
> You are suggesting that the correct approach is to add WQ_MEM_RECLAIM flag to WQs
> used by filesystems when adding WQ_MEM_RECLAIM flag to the WQ used by loop module
> introduces possibility of hitting the
>
> 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>
> warning (instead of either adding __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used
> by loop module or giving up WQ_MEM_RECLAIM flag for the WQ used by loop module),
> correct?

Yeah, you detected multiple issues at the same time. xfs sync is
participating in memory reclaim but doesn't have MEM_RECLAIM and loop is
marked with LEGACY but flushing other workqueues which are MEM_RECLAIM. So,
both xfs and loop workqueues need to be explicitly marked with MEM_RECLAIM.

Thanks.
On Tue, Mar 22, 2022 at 06:52:14AM -1000, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 22, 2022 at 09:09:53AM +0900, Tetsuo Handa wrote:
> > > The legacy flushing warning is telling us that those workqueues can be
> >
> > s/can be/must be/ ?
>
> Well, one thing that we can but don't want to do is converting all
> create_workqueue() users to alloc_workqueue() with MEM_RECLAIM, which is
> wasteful but won't break anything. We know for sure that the workqueues
> which trigger the legacy warning are participating in memory reclaim and
> thus need MEM_RECLAIM. So, yeah, the warning tells us that they need
> MEM_RECLAIM and should be converted.
>
> > ? Current /* internal: create*_workqueue() */ tells me nothing.
>
> It's trying to say that it shouldn't be used outside workqueue proper and
> the warning message is supposed to trigger the conversion. But, yeah, a
> stronger comment can help.
>
> > My question is: I want to add WQ_MEM_RECLAIM flag to the WQ used by loop module
> > because this WQ is involved upon writeback operation. But unless I add both
> > __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used by loop module, we will hit the
> >
> > 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> > 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> >
> > warning because e.g. xfs-sync WQ used by xfs module is not using WQ_MEM_RECLAIM flag.
> >
> > 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", XFS_WQFLAGS(WQ_FREEZABLE),
> > 					       0, mp->m_super->s_id);
> >
> > You are suggesting that the correct approach is to add WQ_MEM_RECLAIM flag to WQs
> > used by filesystems when adding WQ_MEM_RECLAIM flag to the WQ used by loop module
> > introduces possibility of hitting the
> >
> > 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> > 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> >
> > warning (instead of either adding __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used
> > by loop module or giving up WQ_MEM_RECLAIM flag for the WQ used by loop module),
> > correct?
>
> Yeah, you detected multiple issues at the same time. xfs sync is
> participating in memory reclaim

No it isn't. What makes you think it is part of memory reclaim?

The xfs-sync workqueue exists solely to perform async flushes of
dirty data at ENOSPC via sync_inodes_sb() because we can't call
sync_inodes_sb directly in the context that hit ENOSPC due to locks
and transaction contexts held. The paths that need this are
buffered writes and file create (on disk inode allocation), neither
of which are in the memory reclaim path, either.

So this work has nothing to do with memory reclaim, and as such it's
not tagged with WQ_MEM_RECLAIM.

Cheers,

Dave.
Hello,

On Wed, Mar 23, 2022 at 09:00:07AM +1100, Dave Chinner wrote:
> > Yeah, you detected multiple issues at the same time. xfs sync is
> > participating in memory reclaim
>
> No it isn't. What makes you think it is part of memory reclaim?
>
> The xfs-sync workqueue exists solely to perform async flushes of
> dirty data at ENOSPC via sync_inodes_sb() because we can't call
> sync_inodes_sb directly in the context that hit ENOSPC due to locks
> and transaction contexts held. The paths that need this are
> buffered writes and file create (on disk inode allocation), neither
> of which are in the memory reclaim path, either.
>
> So this work has nothing to do with memory reclaim, and as such it's
> not tagged with WQ_MEM_RECLAIM.

Hmmm... yeah, I actually don't know the exact dependency here and the
dependency may not be real - e.g. the conclusion might be that loop is
conflating different uses and needs to split its use of workqueues into two
separate ones. Tetsuo, can you post more details on the warning that you're
seeing?

Thanks.
On 2022/03/23 7:02, Tejun Heo wrote:
> Hello,
>
> On Wed, Mar 23, 2022 at 09:00:07AM +1100, Dave Chinner wrote:
>>> Yeah, you detected multiple issues at the same time. xfs sync is
>>> participating in memory reclaim
>>
>> No it isn't. What makes you think it is part of memory reclaim?
>>
>> The xfs-sync workqueue exists solely to perform async flushes of
>> dirty data at ENOSPC via sync_inodes_sb() because we can't call
>> sync_inodes_sb directly in the context that hit ENOSPC due to locks
>> and transaction contexts held. The paths that need this are
>> buffered writes and file create (on disk inode allocation), neither
>> of which are in the memory reclaim path, either.
>>
>> So this work has nothing to do with memory reclaim, and as such it's
>> not tagged with WQ_MEM_RECLAIM.
>
> Hmmm... yeah, I actually don't know the exact dependency here and the
> dependency may not be real - e.g. the conclusion might be that loop is
> conflating different uses and needs to split its use of workqueues into two
> separate ones. Tetsuo, can you post more details on the warning that you're
> seeing?

It was reported at
https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .
On Wed, Mar 23, 2022 at 07:05:56AM +0900, Tetsuo Handa wrote:
> > Hmmm... yeah, I actually don't know the exact dependency here and the
> > dependency may not be real - e.g. the conclusion might be that loop is
> > conflating different uses and needs to split its use of workqueues into two
> > separate ones. Tetsuo, can you post more details on the warning that you're
> > seeing?
>
> It was reported at
> https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

Looks like a correct dependency to me. The work item is being flushed from
good old write path. Dave?

Thanks.
On Tue, Mar 22, 2022 at 12:19:40PM -1000, Tejun Heo wrote:
> On Wed, Mar 23, 2022 at 07:05:56AM +0900, Tetsuo Handa wrote:
> > > Hmmm... yeah, I actually don't know the exact dependency here and the
> > > dependency may not be real - e.g. the conclusion might be that loop is
> > > conflating different uses and needs to split its use of workqueues into two
> > > separate ones. Tetsuo, can you post more details on the warning that you're
> > > seeing?
> >
> > It was reported at
> > https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .
>
> Looks like a correct dependency to me. The work item is being flushed from
> good old write path. Dave?

The filesystem buffered write IO path isn't part of memory reclaim -
it's a user IO path and I think most filesystems will treat it that
way.

We've had similar layering problems with the loop IO path implying
GFP_NOFS must be used by filesystems allocating memory in the IO
path - we solved that by requiring the loop IO submission context
(loop_process_work()) to set PF_MEMALLOC_NOIO so that it didn't
deadlock anywhere in the underlying filesystems that have no idea
that the loop device has added memory reclaim constraints to the IO
path.

This seems like it's the same layering problem - syscall facing IO
paths are designed for incoming IO from user context, not outgoing
writeback IO from memory reclaim contexts. Memory reclaim contexts
are supposed to use back end filesystem operations like
->writepages() to flush dirty data when necessary.

If the loop device IO mechanism means that every ->write_iter path
needs to be considered as directly in the memory reclaim path, then
that means a huge amount of the kernel needs to be considered as "in
memory reclaim". i.e. it's not just this one XFS workqueue that is
going to have this problem - it's any workqueue that can be waited on
by the incoming IO path.

For example, a network filesystem might put the network stack directly
in the IO path. Which means if we then put loop on top of such a
filesystem, various workqueues in the network stack may now need to
be considered as running under the memory reclaim path because of
the loop block device.

I don't know what the solution is, but if the fix is "xfs needs to
mark a workqueue that has nothing to do with memory reclaim as
WQ_MEM_RECLAIM because of the loop device" then we're talking about
playing workqueue whack-a-mole across the entire kernel forever
more....

Cheers,

Dave.
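The PF_MEMALLOC_NOIO constraint Dave mentions is normally applied with the scoped
helpers from include/linux/sched/mm.h; a minimal sketch of an I/O submission work
function wrapped this way (illustrative only, not the actual loop driver code)
might look like:

	#include <linux/sched/mm.h>
	#include <linux/workqueue.h>

	static void example_submit_workfn(struct work_struct *work)
	{
		/*
		 * Everything below allocates with an implicit GFP_NOIO, so
		 * the underlying filesystem cannot recurse back into this
		 * I/O path through direct reclaim.
		 */
		unsigned int noio_flags = memalloc_noio_save();

		/* ... issue reads/writes against the backing file here ... */

		memalloc_noio_restore(noio_flags);
	}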
On 2022/03/23 7:59, Dave Chinner wrote:
> I don't know what the solution is, but if the fix is "xfs needs to
> mark a workqueue that has nothing to do with memory reclaim as
> WQ_MEM_RECLAIM because of the loop device" then we're talking about
> playing workqueue whack-a-mole across the entire kernel forever
> more....

During an attempt to fix a lockdep warning caused by calling destroy_workqueue()
when loop device's autoclear operation is triggered with disk->open_mutex held,
Christoph has proposed a patch which avoids calling destroy_workqueue() by using
a global WQ. But as demonstrated at
https://lkml.kernel.org/r/22b51922-30c6-e4ed-ace9-5620f877682c@i-love.sakura.ne.jp ,
I confirmed that the loop module needs to reserve one "struct task_struct" on
each loop device's WQ. And creating a per loop device WQ with WQ_MEM_RECLAIM
flag is the only available method for reserving "struct task_struct" on each
loop device's WQ. That is, WQ_MEM_RECLAIM flag is required not only for
surviving memory pressure situations but also for surviving the max_active
limitation.

But as demonstrated at
https://lkml.kernel.org/r/61f41e56-3650-f0fc-9ef5-7e19fe84e6b7@I-love.SAKURA.ne.jp ,
creating a per loop device WQ with WQ_MEM_RECLAIM flag without __WQ_LEGACY flag
will hit this "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
warning. And if WQs used by filesystem side do not want to use WQ_MEM_RECLAIM flag, the loop
module would have to abuse __WQ_LEGACY flag in order to suppress this warning.
On Wed, Mar 23, 2022 at 08:32:49AM +0900, Tetsuo Handa wrote:
> On 2022/03/23 7:59, Dave Chinner wrote:
> > I don't know what the solution is, but if the fix is "xfs needs to
> > mark a workqueue that has nothing to do with memory reclaim as
> > WQ_MEM_RECLAIM because of the loop device" then we're talking about
> > playing workqueue whack-a-mole across the entire kernel forever
> > more....
....
> And if WQs used by filesystem side do not want to use WQ_MEM_RECLAIM flag, the loop
> module would have to abuse __WQ_LEGACY flag in order to suppress this warning.

I'm not talking about whether filesystems want to use WQ_MEM_RECLAIM
or not, I'm commenting on the implicit dependency that the loop
device creates that forces the use of WQ_MEM_RECLAIM in all
downstream workqueues.

That's what I'm asking about here - how far does this implicit,
undocumented dependency actually reach and how do we communicate to
all developers so that they know about this in the future when
creating new workqueues that might end up under the loop device?

That's the problem here - unless the developer explicitly considers
and/or remembers this loopback dependency when adding a new
workqueue to a filesystem (or even as far down as network stacks)
then this problem is going to keep happening and we'll just have to
keep driving WQ_MEM_RECLAIM deeper into the stack.

Tejun stated that just marking all workqueues as WQ_MEM_RECLAIM is
problematic in some situations, and I'm concerned that resolving
implicit dependencies is going to end up in that situation anyway.

Cheers,

Dave.
Hello,

On Wed, Mar 23, 2022 at 09:59:14AM +1100, Dave Chinner wrote:
> The filesystem buffered write IO path isn't part of memory reclaim -
> it's a user IO path and I think most filesystems will treat it that
> way.

We can argue the semantics but anything in the fs / io path which sits on
the write path should be marked MEM_RECLAIM because it can be depended upon
while cleaning dirty pages. This isn't a layering problem or anything. It's
just what that flag is for.

> If the loop device IO mechanism means that every ->write_iter path
> needs to be considered as directly in the memory reclaim path, then
> that means a huge amount of the kernel needs to be considered as "in
> memory reclaim". i.e. it's not just this one XFS workqueue that is
> going to have this problem - it's any workqueue that can be waited on
> by the incoming IO path.
>
> For example, a network filesystem might put the network stack directly
> in the IO path. Which means if we then put loop on top of such a
> filesystem, various workqueues in the network stack may now need to
> be considered as running under the memory reclaim path because of
> the loop block device.
>
> I don't know what the solution is, but if the fix is "xfs needs to
> mark a workqueue that has nothing to do with memory reclaim as
> WQ_MEM_RECLAIM because of the loop device" then we're talking about
> playing workqueue whack-a-mole across the entire kernel forever
> more....

Yeah, all those workqueues must be and most of them are already tagged with
MEM_RECLAIM. The network drivers are kinda painful and we *can* make them
conditional (on it sitting in the io path) if that ever becomes necessary
but the number hasn't been problematic till now.

Thanks.
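For concreteness, the xfs-side conversion Tejun is pointing at would presumably
be a one-flag change along the following lines in xfs_init_mount_workqueues().
This is only a sketch of the idea under discussion - no such patch was posted
in this thread, and Dave's objection above is precisely about whether it is
the right fix:

-	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", XFS_WQFLAGS(WQ_FREEZABLE),
-			0, mp->m_super->s_id);
+	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
+			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE),
+			0, mp->m_super->s_id);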
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 19fe19eaa50e..60dfebcedd95 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1013,7 +1013,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 		lo->lo_flags |= LO_FLAGS_READ_ONLY;
 
 	lo->workqueue = alloc_workqueue("loop%d",
-					WQ_UNBOUND | WQ_FREEZABLE,
+					WQ_MEM_RECLAIM | WQ_UNBOUND | WQ_FREEZABLE,
 					0,
 					lo->lo_number);
 	if (!lo->workqueue) {
Commit b5dd2f6047ca1080 ("block: loop: improve performance via blk-mq") started
using a global WQ_MEM_RECLAIM workqueue.

+	loop_wq = alloc_workqueue("kloopd",
+			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0);

Commit f4aa4c7bbac6c4af ("block: loop: convert to per-device workqueue") started
using a per device WQ_MEM_RECLAIM workqueue.

-	loop_wq = alloc_workqueue("kloopd",
-			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0);
+	lo->wq = alloc_workqueue("kloopd%d",
+			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0,
+			lo->lo_number);

Commit 4d4e41aef9429872 ("block: loop: avoiding too many pending per work I/O")
adjusted the max_active parameter.

-			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0,
+			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16,

Commit e03a3d7a94e2485b ("block: loop: use kthread_work") started using a per
device kthread.

-	lo->wq = alloc_workqueue("kloopd%d",
-			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16,
-			lo->lo_number);
+	lo->worker_task = kthread_run(kthread_worker_fn,
+			&lo->worker, "loop%d", lo->lo_number);

Commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker") again
started using a per device workqueue

-	lo->worker_task = kthread_run(loop_kthread_worker_fn,
-			&lo->worker, "loop%d", lo->lo_number);
+	lo->workqueue = alloc_workqueue("loop%d",
+					WQ_UNBOUND | WQ_FREEZABLE,
+					0,
+					lo->lo_number);

but forgot to restore WQ_MEM_RECLAIM flag.

We should reserve one "struct task_struct" for each loop device in order to
guarantee that I/O requests for writeback operations can make forward progress
even under memory pressure.

I don't know why WQ_FREEZABLE flag was added, but let's add WQ_MEM_RECLAIM flag
like commit f4aa4c7bbac6c4af ("block: loop: convert to per-device workqueue")
describes.

Fixes: 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)