diff mbox series

blk-cgroup: add spin_lock for u64_stats_update

Message ID 20240705075544.11315-1-boy.wu@mediatek.com (mailing list archive)
State New
Headers show
Series blk-cgroup: add spin_lock for u64_stats_update | expand

Commit Message

Boy Wu (吳勃誼) July 5, 2024, 7:55 a.m. UTC
From: Boy Wu <boy.wu@mediatek.com>

In 32bit SMP systems, if the system is stressed on the sys node
by processes, it may cause blkcg_fill_root_iostats to have a concurrent
problem on the seqlock in u64_stats_update, which will cause a deadlock 
on u64_stats_fetch_begin in blkcg_print_one_stat.

To prevent this problem, add spin_locks.

Fixes: ef45fe470e1e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Boy Wu <boy.wu@mediatek.com>
---
 block/blk-cgroup.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Markus Elfring July 5, 2024, 5:05 p.m. UTC | #1
> In 32bit SMP systems, if the system is stressed on the sys node
> by processes, it may cause blkcg_fill_root_iostats to have a concurrent
> problem on the seqlock in u64_stats_update, which will cause a deadlock
> on u64_stats_fetch_begin in blkcg_print_one_stat.

Would you like to mark any references to functions with parentheses?


> To prevent this problem, add spin_locks.

Another wording suggestion:
  Thus use an additional spin lock.


How do you think about to use a summary phrase like “Add a spin lock for stats update
in blkcg_fill_root_iostats()”?


…
> +++ b/block/blk-cgroup.c
> @@ -1134,9 +1134,15 @@ static void blkcg_fill_root_iostats(void)
>  				cpu_dkstats->sectors[STAT_DISCARD] << 9;
>  		}
>
> +#if BITS_PER_LONG == 32
> +		spin_lock_irq(&blkg->q->queue_lock);
> +#endif
>  		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
>  		blkg_iostat_set(&blkg->iostat.cur, &tmp);
>  		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
> +#if BITS_PER_LONG == 32
> +		spin_unlock_irq(&blkg->q->queue_lock);
> +#endif
…

Under which circumstances would you become interested to apply a statement
like “guard(spinlock_irq)(&blkg->q->queue_lock);”?
https://elixir.bootlin.com/linux/v6.10-rc6/source/include/linux/spinlock.h#L567

Regards,
Markus
Tejun Heo July 5, 2024, 5:13 p.m. UTC | #2
Hello,

On Fri, Jul 05, 2024 at 03:55:44PM +0800, boy.wu wrote:
> From: Boy Wu <boy.wu@mediatek.com>
> 
> In 32bit SMP systems, if the system is stressed on the sys node
> by processes, it may cause blkcg_fill_root_iostats to have a concurrent

What is sys node?

> problem on the seqlock in u64_stats_update, which will cause a deadlock 
> on u64_stats_fetch_begin in blkcg_print_one_stat.

I'm not following the scenario. Can you please detail the scenario where
this leads to deadlocks?

Thanks.
Boy Wu (吳勃誼) July 8, 2024, 2 a.m. UTC | #3
On Fri, 2024-07-05 at 07:13 -1000, Tejun Heo wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  Hello,
> 
> On Fri, Jul 05, 2024 at 03:55:44PM +0800, boy.wu wrote:
> > From: Boy Wu <boy.wu@mediatek.com>
> > 
> > In 32bit SMP systems, if the system is stressed on the sys node
> > by processes, it may cause blkcg_fill_root_iostats to have a
> concurrent
> 
> What is sys node?
> 
> > problem on the seqlock in u64_stats_update, which will cause a
> deadlock 
> > on u64_stats_fetch_begin in blkcg_print_one_stat.
> 
> I'm not following the scenario. Can you please detail the scenario
> where
> this leads to deadlocks?
> 
> Thanks.
> 
> -- 
> tejun

I am using stress-ng to stress my ARM 32bit SMP system, and there is a
test case --sysfs which create processes to read and write the node
under /sys/. Then I encountered a deadlock that 3 CPUs are in
do_raw_spin_lock(block/blk-cgroup.c:997) in blkcg_print_stat and 1 CPU
is in u64_stats_fetch_begin(block/blk-cgroup.c:931) in
blkcg_print_stat, and the sync.seq.sequence is an odd number, not an
even number.

When accessing /sys/fs/cgroup/io.stat, blkcg_print_stat will be called,
and there is a small chance that four processes on each CPU core are
accessing /sys/fs/cgroup/io.stat, which means four CPUs are calling
blkcg_print_stat. As a result, blkcg_fill_root_iostats will be called
simultaneously. However, u64_stats_update_begin_irqsave and
u64_stats_update_end_irqrestore are not protect by spin_locks, so there
is a small chance that the sync.seq.sequence will be an odd number
after u64_stats_update_end_irqrestore due to the concurrent CPUs acess,
because sync.seq.sequence plus one is not an atomic operation.

do_raw_write_seqcount_begin():
/usr/src/kernel/common/include/linux/seqlock.h:469
c05e5cfc:       e5963030        ldr     r3, [r6, #48]   ; 0x30
c05e5d00:       e2833001        add     r3, r3, #1
c05e5d04:       e5863030        str     r3, [r6, #48]   ; 0x30
/usr/src/kernel/common/include/linux/seqlock.h:470
c05e5d08:       f57ff05a        dmb     ishst

do_raw_write_seqcount_end():
/usr/src/kernel/common/include/linux/seqlock.h:489
c05e5d30:       f57ff05a        dmb     ishst
/usr/src/kernel/common/include/linux/seqlock.h:490
c05e5d34:       e5963030        ldr     r3, [r6, #48]   ; 0x30
c05e5d38:       e2833001        add     r3, r3, #1
c05e5d3c:       e5863030        str     r3, [r6, #48]   ; 0x30

To prevent this problem, I added spin_locks in blkcg_fill_root_iostats,
and this solution works fine to me when I use the stress-ng --sysfs
test.

--
boy.wu
Boy Wu (吳勃誼) July 8, 2024, 2:52 a.m. UTC | #4
On Fri, 2024-07-05 at 19:05 +0200, Markus Elfring wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  > In 32bit SMP systems, if the system is stressed on the sys node
> > by processes, it may cause blkcg_fill_root_iostats to have a
> concurrent
> > problem on the seqlock in u64_stats_update, which will cause a
> deadlock
> > on u64_stats_fetch_begin in blkcg_print_one_stat.
> 
> Would you like to mark any references to functions with parentheses?
> 
When deadlock happens, there are 3 CPUs in

do_raw_spin_lock
blkcg_print_stat
seq_read_iter
vfs_read
ksys_read

another 1 CPU is in

u64_stats_fetch_begin
blkcg_print_one_stat
blkcg_print_stat
seq_read_iter
vfs_read
ksys_read

> 
> > To prevent this problem, add spin_locks.
> 
> Another wording suggestion:
>   Thus use an additional spin lock.
> 
> 
> How do you think about to use a summary phrase like “Add a spin lock
> for stats update
> in blkcg_fill_root_iostats()”?
> 
> 
I can refine commit message as

blk-cgroup: Add a spin lock for stats update

In 32bit SMP systems, if multiple CPUs call blkcg_print_stat, it may
cause blkcg_fill_root_iostats to have a concurrent problem on the
seqlock in u64_stats_update, which will cause a deadlock
on u64_stats_fetch_begin in blkcg_print_one_stat.

Thus use an additional spin lock.
> …
> > +++ b/block/blk-cgroup.c
> > @@ -1134,9 +1134,15 @@ static void blkcg_fill_root_iostats(void)
> >  cpu_dkstats->sectors[STAT_DISCARD] << 9;
> >  }
> >
> > +#if BITS_PER_LONG == 32
> > +spin_lock_irq(&blkg->q->queue_lock);
> > +#endif
> >  flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
> >  blkg_iostat_set(&blkg->iostat.cur, &tmp);
> >  u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
> > +#if BITS_PER_LONG == 32
> > +spin_unlock_irq(&blkg->q->queue_lock);
> > +#endif
> …
> 
> Under which circumstances would you become interested to apply a
> statement
> like “guard(spinlock_irq)(&blkg->q->queue_lock);”?
> 
https://elixir.bootlin.com/linux/v6.10-rc6/source/include/linux/spinlock.h#L567
> 
> Regards,
> Markus

The spin lock usage is reference from blkcg_print_stat (v6.1.25).

I can try 
guard(spinlock_irq)(&blkg->q->queue_lock);
instead.

--
boy.wu
Markus Elfring July 8, 2024, 5:43 a.m. UTC | #5
> ************* MEDIATEK Confidentiality Notice ********************
…

Please omit such information from your patches for Linux software components.
Such hints do probably not fit to communication characteristics of Linux mailing lists.

Regards,
Markus
Tejun Heo July 8, 2024, 6:36 p.m. UTC | #6
Hello,

On Mon, Jul 08, 2024 at 02:00:42AM +0000, Boy Wu (吳勃誼) wrote:
> When accessing /sys/fs/cgroup/io.stat, blkcg_print_stat will be called,
> and there is a small chance that four processes on each CPU core are
> accessing /sys/fs/cgroup/io.stat, which means four CPUs are calling
> blkcg_print_stat. As a result, blkcg_fill_root_iostats will be called
> simultaneously. However, u64_stats_update_begin_irqsave and
> u64_stats_update_end_irqrestore are not protect by spin_locks, so there
> is a small chance that the sync.seq.sequence will be an odd number
> after u64_stats_update_end_irqrestore due to the concurrent CPUs acess,
> because sync.seq.sequence plus one is not an atomic operation.
> 
> do_raw_write_seqcount_begin():
> /usr/src/kernel/common/include/linux/seqlock.h:469
> c05e5cfc:       e5963030        ldr     r3, [r6, #48]   ; 0x30
> c05e5d00:       e2833001        add     r3, r3, #1
> c05e5d04:       e5863030        str     r3, [r6, #48]   ; 0x30
> /usr/src/kernel/common/include/linux/seqlock.h:470
> c05e5d08:       f57ff05a        dmb     ishst
> 
> do_raw_write_seqcount_end():
> /usr/src/kernel/common/include/linux/seqlock.h:489
> c05e5d30:       f57ff05a        dmb     ishst
> /usr/src/kernel/common/include/linux/seqlock.h:490
> c05e5d34:       e5963030        ldr     r3, [r6, #48]   ; 0x30
> c05e5d38:       e2833001        add     r3, r3, #1
> c05e5d3c:       e5863030        str     r3, [r6, #48]   ; 0x30
> 
> To prevent this problem, I added spin_locks in blkcg_fill_root_iostats,
> and this solution works fine to me when I use the stress-ng --sysfs
> test.

Ah, okay, so, there are two usages of u64_sync synchronization - one is for
blkg->iostat_cpu and the other for blkg->iostat. The former makes sense and
is safe as there can ever be only one updater with irq disabled. The latter
doesn't work as there can be multiple updaters and should be removed (and
replaced with spinlock). There's already blkg_stat_lock which probably was
added for a similar reason in __blkcg_rstat_flush(). Can you please update
the patch to remove the u64_sync usage for blkg->iostat and replace it with
blkg_stat_lock?

Thanks.
Boy Wu (吳勃誼) July 10, 2024, 6:19 a.m. UTC | #7
On Mon, 2024-07-08 at 07:43 +0200, Markus Elfring wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  > ************* MEDIATEK Confidentiality Notice ********************
> …
> 
> Please omit such information from your patches for Linux software
> components.
> Such hints do probably not fit to communication characteristics of
> Linux mailing lists.
> 
> Regards,
> Markus

The email system removing disclaimer is fixed. Can check with patch v2.

Boy.Wu
diff mbox series

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 37e6cc91d576..a633b7431e91 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1134,9 +1134,15 @@  static void blkcg_fill_root_iostats(void)
 				cpu_dkstats->sectors[STAT_DISCARD] << 9;
 		}
 
+#if BITS_PER_LONG == 32
+		spin_lock_irq(&blkg->q->queue_lock);
+#endif
 		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
 		blkg_iostat_set(&blkg->iostat.cur, &tmp);
 		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
+#if BITS_PER_LONG == 32
+		spin_unlock_irq(&blkg->q->queue_lock);
+#endif
 	}
 }