Message ID | 20240705075544.11315-1-boy.wu@mediatek.com (mailing list archive)
---|---
State | New, archived
Series | blk-cgroup: add spin_lock for u64_stats_update
> In 32bit SMP systems, if the system is stressed on the sys node
> by processes, it may cause blkcg_fill_root_iostats to have a concurrent
> problem on the seqlock in u64_stats_update, which will cause a deadlock
> on u64_stats_fetch_begin in blkcg_print_one_stat.

Would you like to mark any references to functions with parentheses?


> To prevent this problem, add spin_locks.

Another wording suggestion:
Thus use an additional spin lock.

How do you think about to use a summary phrase like
"Add a spin lock for stats update in blkcg_fill_root_iostats()"?

…
> +++ b/block/blk-cgroup.c
> @@ -1134,9 +1134,15 @@ static void blkcg_fill_root_iostats(void)
> 				cpu_dkstats->sectors[STAT_DISCARD] << 9;
> 		}
>
> +#if BITS_PER_LONG == 32
> +		spin_lock_irq(&blkg->q->queue_lock);
> +#endif
> 		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
> 		blkg_iostat_set(&blkg->iostat.cur, &tmp);
> 		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
> +#if BITS_PER_LONG == 32
> +		spin_unlock_irq(&blkg->q->queue_lock);
> +#endif
…

Under which circumstances would you become interested to apply a statement
like "guard(spinlock_irq)(&blkg->q->queue_lock);"?
https://elixir.bootlin.com/linux/v6.10-rc6/source/include/linux/spinlock.h#L567

Regards,
Markus
Hello,

On Fri, Jul 05, 2024 at 03:55:44PM +0800, boy.wu wrote:
> From: Boy Wu <boy.wu@mediatek.com>
>
> In 32bit SMP systems, if the system is stressed on the sys node
> by processes, it may cause blkcg_fill_root_iostats to have a concurrent

What is sys node?

> problem on the seqlock in u64_stats_update, which will cause a deadlock
> on u64_stats_fetch_begin in blkcg_print_one_stat.

I'm not following the scenario. Can you please detail the scenario where
this leads to deadlocks?

Thanks.
On Fri, 2024-07-05 at 07:13 -1000, Tejun Heo wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>
> Hello,
>
> On Fri, Jul 05, 2024 at 03:55:44PM +0800, boy.wu wrote:
> > From: Boy Wu <boy.wu@mediatek.com>
> >
> > In 32bit SMP systems, if the system is stressed on the sys node
> > by processes, it may cause blkcg_fill_root_iostats to have a concurrent
>
> What is sys node?
>
> > problem on the seqlock in u64_stats_update, which will cause a deadlock
> > on u64_stats_fetch_begin in blkcg_print_one_stat.
>
> I'm not following the scenario. Can you please detail the scenario where
> this leads to deadlocks?
>
> Thanks.
>
> --
> tejun

I am using stress-ng to stress my ARM 32bit SMP system; its --sysfs test
case creates processes that read and write the nodes under /sys/. I then
encountered a deadlock in which three CPUs were in do_raw_spin_lock
(block/blk-cgroup.c:997) in blkcg_print_stat and one CPU was in
u64_stats_fetch_begin (block/blk-cgroup.c:931) in blkcg_print_stat, and
sync.seq.sequence was an odd number, not an even number.

When /sys/fs/cgroup/io.stat is accessed, blkcg_print_stat is called, and
there is a small chance that four processes, one on each CPU core, access
/sys/fs/cgroup/io.stat at the same time, which means four CPUs call
blkcg_print_stat and, as a result, blkcg_fill_root_iostats is called
simultaneously. However, u64_stats_update_begin_irqsave and
u64_stats_update_end_irqrestore are not protected by spin locks, so there
is a small chance that sync.seq.sequence ends up as an odd number after
u64_stats_update_end_irqrestore due to the concurrent CPU access, because
incrementing sync.seq.sequence is not an atomic operation.
do_raw_write_seqcount_begin():
/usr/src/kernel/common/include/linux/seqlock.h:469
c05e5cfc:  e5963030  ldr  r3, [r6, #48]  ; 0x30
c05e5d00:  e2833001  add  r3, r3, #1
c05e5d04:  e5863030  str  r3, [r6, #48]  ; 0x30
/usr/src/kernel/common/include/linux/seqlock.h:470
c05e5d08:  f57ff05a  dmb  ishst

do_raw_write_seqcount_end():
/usr/src/kernel/common/include/linux/seqlock.h:489
c05e5d30:  f57ff05a  dmb  ishst
/usr/src/kernel/common/include/linux/seqlock.h:490
c05e5d34:  e5963030  ldr  r3, [r6, #48]  ; 0x30
c05e5d38:  e2833001  add  r3, r3, #1
c05e5d3c:  e5863030  str  r3, [r6, #48]  ; 0x30

To prevent this problem, I added spin locks in blkcg_fill_root_iostats,
and this solution works fine for me under the stress-ng --sysfs test.

--
boy.wu
On Fri, 2024-07-05 at 19:05 +0200, Markus Elfring wrote:
> > In 32bit SMP systems, if the system is stressed on the sys node
> > by processes, it may cause blkcg_fill_root_iostats to have a concurrent
> > problem on the seqlock in u64_stats_update, which will cause a deadlock
> > on u64_stats_fetch_begin in blkcg_print_one_stat.
>
> Would you like to mark any references to functions with parentheses?

When the deadlock happens, there are 3 CPUs in:

  do_raw_spin_lock
  blkcg_print_stat
  seq_read_iter
  vfs_read
  ksys_read

and another CPU is in:

  u64_stats_fetch_begin
  blkcg_print_one_stat
  blkcg_print_stat
  seq_read_iter
  vfs_read
  ksys_read

> > To prevent this problem, add spin_locks.
>
> Another wording suggestion:
> Thus use an additional spin lock.
>
> How do you think about to use a summary phrase like "Add a spin lock
> for stats update in blkcg_fill_root_iostats()"?

I can refine the commit message as:

  blk-cgroup: Add a spin lock for stats update

  In 32bit SMP systems, if multiple CPUs call blkcg_print_stat,
  it may cause blkcg_fill_root_iostats to have a concurrent problem
  on the seqlock in u64_stats_update, which will cause a deadlock
  on u64_stats_fetch_begin in blkcg_print_one_stat.

  Thus use an additional spin lock.

> …
> > +++ b/block/blk-cgroup.c
> > @@ -1134,9 +1134,15 @@ static void blkcg_fill_root_iostats(void)
> > 				cpu_dkstats->sectors[STAT_DISCARD] << 9;
> > 		}
> >
> > +#if BITS_PER_LONG == 32
> > +		spin_lock_irq(&blkg->q->queue_lock);
> > +#endif
> > 		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
> > 		blkg_iostat_set(&blkg->iostat.cur, &tmp);
> > 		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
> > +#if BITS_PER_LONG == 32
> > +		spin_unlock_irq(&blkg->q->queue_lock);
> > +#endif
> …
>
> Under which circumstances would you become interested to apply a
> statement like "guard(spinlock_irq)(&blkg->q->queue_lock);"?
> https://elixir.bootlin.com/linux/v6.10-rc6/source/include/linux/spinlock.h#L567
>
> Regards,
> Markus

The spin lock usage is referenced from blkcg_print_stat (v6.1.25).
I can try guard(spinlock_irq)(&blkg->q->queue_lock); instead.

--
boy.wu
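For reference, Markus's guard() suggestion would collapse the lock/unlock
pair into a scope-based guard, roughly as sketched below (kernel-only
fragment, not compilable on its own; it assumes the guard definitions from
<linux/spinlock.h> in the linked v6.10-rc6 source and the surrounding loop
body of blkcg_fill_root_iostats()):

```c
#if BITS_PER_LONG == 32
		/* Scope-based lock: released automatically when the
		 * enclosing block is left, so no explicit unlock (and no
		 * unlock-on-early-return mistake) is possible.
		 */
		guard(spinlock_irq)(&blkg->q->queue_lock);
#endif
		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
		blkg_iostat_set(&blkg->iostat.cur, &tmp);
		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
```

The trade-off is that the guard's scope must coincide with the intended
critical section, which in this function it does, since the update is the
last work in the loop body.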
> ************* MEDIATEK Confidentiality Notice ********************
> …
Please omit such information from your patches for Linux software components.
Such hints probably do not fit the communication characteristics of Linux
mailing lists.
Regards,
Markus
Hello,

On Mon, Jul 08, 2024 at 02:00:42AM +0000, Boy Wu (吳勃誼) wrote:
> When accessing /sys/fs/cgroup/io.stat, blkcg_print_stat will be called,
> and there is a small chance that four processes, one on each CPU core,
> are accessing /sys/fs/cgroup/io.stat, which means four CPUs are calling
> blkcg_print_stat. As a result, blkcg_fill_root_iostats will be called
> simultaneously. However, u64_stats_update_begin_irqsave and
> u64_stats_update_end_irqrestore are not protected by spin locks, so there
> is a small chance that sync.seq.sequence will be an odd number after
> u64_stats_update_end_irqrestore due to the concurrent CPU access,
> because incrementing sync.seq.sequence is not an atomic operation.
>
> do_raw_write_seqcount_begin():
> /usr/src/kernel/common/include/linux/seqlock.h:469
> c05e5cfc:  e5963030  ldr  r3, [r6, #48]  ; 0x30
> c05e5d00:  e2833001  add  r3, r3, #1
> c05e5d04:  e5863030  str  r3, [r6, #48]  ; 0x30
> /usr/src/kernel/common/include/linux/seqlock.h:470
> c05e5d08:  f57ff05a  dmb  ishst
>
> do_raw_write_seqcount_end():
> /usr/src/kernel/common/include/linux/seqlock.h:489
> c05e5d30:  f57ff05a  dmb  ishst
> /usr/src/kernel/common/include/linux/seqlock.h:490
> c05e5d34:  e5963030  ldr  r3, [r6, #48]  ; 0x30
> c05e5d38:  e2833001  add  r3, r3, #1
> c05e5d3c:  e5863030  str  r3, [r6, #48]  ; 0x30
>
> To prevent this problem, I added spin locks in blkcg_fill_root_iostats,
> and this solution works fine for me under the stress-ng --sysfs test.

Ah, okay, so, there are two usages of u64_sync synchronization - one is
for blkg->iostat_cpu and the other for blkg->iostat. The former makes
sense and is safe as there can ever be only one updater with irq
disabled. The latter doesn't work as there can be multiple updaters and
should be removed (and replaced with spinlock). There's already
blkg_stat_lock which probably was added for a similar reason in
__blkcg_rstat_flush(). Can you please update the patch to remove the
u64_sync usage for blkg->iostat and replace it with blkg_stat_lock?
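A sketch of the change Tejun asks for might look like the hunk below
(illustrative only: it assumes blkg_stat_lock is the spinlock already
defined in block/blk-cgroup.c for __blkcg_rstat_flush(), and the hunk
context is approximate, not taken from an actual v2 patch):

```diff
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ static void blkcg_fill_root_iostats(void)
-		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
+		raw_spin_lock_irqsave(&blkg_stat_lock, flags);
 		blkg_iostat_set(&blkg->iostat.cur, &tmp);
-		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
+		raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
```

Note that if the u64_sync markers on blkg->iostat are dropped entirely, as
suggested, readers of blkg->iostat would have to take blkg_stat_lock as
well instead of spinning in u64_stats_fetch_begin().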
Thanks.
On Mon, 2024-07-08 at 07:43 +0200, Markus Elfring wrote:
> > ************* MEDIATEK Confidentiality Notice ********************
> > …
>
> Please omit such information from your patches for Linux software
> components. Such hints do probably not fit to communication
> characteristics of Linux mailing lists.
>
> Regards,
> Markus

The email system has been fixed to remove the disclaimer. Please check
patch v2.

Boy.Wu
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 37e6cc91d576..a633b7431e91 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1134,9 +1134,15 @@ static void blkcg_fill_root_iostats(void)
 				cpu_dkstats->sectors[STAT_DISCARD] << 9;
 		}
 
+#if BITS_PER_LONG == 32
+		spin_lock_irq(&blkg->q->queue_lock);
+#endif
 		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
 		blkg_iostat_set(&blkg->iostat.cur, &tmp);
 		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
+#if BITS_PER_LONG == 32
+		spin_unlock_irq(&blkg->q->queue_lock);
+#endif
 	}
 }