[v4] blk-cgroup: Replace u64 sync with spinlock for iostat

Message ID 20240718084112.12202-1-boy.wu@mediatek.com (mailing list archive)
State New
Series [v4] blk-cgroup: Replace u64 sync with spinlock for iostat

Commit Message

Boy Wu (吳勃誼) July 18, 2024, 8:41 a.m. UTC
From: Boy Wu <boy.wu@mediatek.com>

In 32-bit SMP systems, if multiple CPUs call blkcg_print_stat,
blkcg_fill_root_iostats can race on the seqlock taken in
u64_stats_update, leaving it in a state that deadlocks
u64_stats_fetch_begin in blkcg_print_one_stat.

Thus, replace u64 sync with a spinlock to protect iostat.
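
To illustrate the failure mode, a simplified sketch (not part of the patch;
on 32-bit, u64_stats_update_begin() bumps a plain seqcount, so two
unserialized writers can lose an increment and leave the count odd, after
which readers spin forever):

	/* CPU0 and CPU1 both enter blkcg_fill_root_iostats() for the same blkg */
	u64_stats_update_begin(&blkg->iostat.sync);	/* CPU0: reads 0, writes 1 */
	u64_stats_update_begin(&blkg->iostat.sync);	/* CPU1: also reads 0, writes 1 */
	blkg_iostat_set(&blkg->iostat.cur, &tmp);
	u64_stats_update_end(&blkg->iostat.sync);	/* CPU0: 1 -> 2 */
	u64_stats_update_end(&blkg->iostat.sync);	/* CPU1: 2 -> 3, count stays odd */

	/* a reader in blkcg_print_one_stat() then never sees an even count */
	do {
		seq = u64_stats_fetch_begin(&bis->sync);
		rbytes = bis->cur.bytes[BLKG_IOSTAT_READ];
	} while (u64_stats_fetch_retry(&bis->sync, seq));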

Fixes: ef45fe470e1e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Boy Wu <boy.wu@mediatek.com>
---
Changes in v2:
 - Update commit message
 - Remove u64_sync
 - Replace spin_lock_irq with guard statement
 - Replace blkg->q->queue_lock with blkg_stat_lock
Changes in v3:
 - Update commit message
 - Add spinlock in blkg_iostat_set structure
 - Replace all u64_sync with spinlock for iostat
 - Replace blkg_stat_lock with iostat.spinlock
Changes in v4:
 - Update commit message
 - Remove spinlock from blkg_iostat_set structure
 - Replace iostat.spinlock with blkg_stat_lock
 - Add define for 32-bit systems only
---
 block/blk-cgroup.c | 34 +++++++++++++---------------------
 1 file changed, 13 insertions(+), 21 deletions(-)

Comments

Tejun Heo July 18, 2024, 9:15 p.m. UTC | #1
Hello,

On Thu, Jul 18, 2024 at 04:41:12PM +0800, boy.wu wrote:
>  static void blkg_clear_stat(struct blkcg_gq *blkg)
>  {
>  	int cpu;
>  
> +#if BITS_PER_LONG == 32
> +	guard(raw_spinlock_irqsave)(&blkg_stat_lock);
> +#endif

Can you please collect the ifdefs into a single place? If guard can be used
for that, that's great. If not, just spin_lock/unlock wrappers are fine too,
but please collect them into a single place and add a comment explaining why
this is necessary and why u64_sync isn't being used.
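
For illustration, such wrappers could look roughly like this (a sketch with
made-up names, assuming the lock is only wanted on 32-bit; Boy Wu proposes a
fuller version of this idea later in the thread):

#if BITS_PER_LONG == 32
/*
 * 64-bit counters can tear on 32-bit, and u64_sync alone is not enough
 * here because the stats have more than one updater, so serialize with
 * blkg_stat_lock.
 */
static void blkg_stat_lock_irqsave(unsigned long *flags)
{
	raw_spin_lock_irqsave(&blkg_stat_lock, *flags);
}

static void blkg_stat_unlock_irqrestore(unsigned long flags)
{
	raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
}
#else
/* on 64-bit the u64 counters are updated atomically, no lock needed */
static void blkg_stat_lock_irqsave(unsigned long *flags) { *flags = 0; }
static void blkg_stat_unlock_irqrestore(unsigned long flags) { }
#endif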

Also, for blkg_clear_stat(), we're running a slight chance of the clearing
of iostat_cpu racing against stat updates from the hot path. Given the
circumstances - stat reset is a cgroup1-only feature which isn't used
widely and a problematic interface to begin with - I believe this is a
valid choice but it needs to be noted.

Thanks.
Boy Wu (吳勃誼) July 19, 2024, 1:47 a.m. UTC | #2
On Thu, 2024-07-18 at 11:15 -1000, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jul 18, 2024 at 04:41:12PM +0800, boy.wu wrote:
> >  static void blkg_clear_stat(struct blkcg_gq *blkg)
> >  {
> >  	int cpu;
> >  
> > +#if BITS_PER_LONG == 32
> > +	guard(raw_spinlock_irqsave)(&blkg_stat_lock);
> > +#endif
> 
> Can you please collect the ifdefs into a single place? If guard can be
> used for that, that's great. If not, just spin_lock/unlock wrappers are
> fine too, but please collect them into a single place and add a comment
> explaining why this is necessary and why u64_sync isn't being used.
> 
If it is for readability, I think keeping u64 sync instead of replacing it
with a spinlock is better, because what u64 sync protects is 64-bit data
for 32-bit systems, while a spinlock can be used for many other reasons.
The root cause of this issue is just the incorrect use of u64 sync. Adding
back the missing spinlock for the correct usage of u64 sync is simpler. Is
there any benefit to replacing u64 sync with a spinlock?

> Also, for blkg_clear_stat(), we're running a slight chance of the
> clearing of iostat_cpu racing against stat updates from the hot path.
> Given the circumstances - stat reset is a cgroup1-only feature which
> isn't used widely and a problematic interface to begin with - I believe
> this is a valid choice but it needs to be noted.

I don't get this part, but if this is another issue, maybe another
patch would be better?

> 
> Thanks.
> 
> -- 
> tejun

--
boy.wu
Tejun Heo July 19, 2024, 5:31 p.m. UTC | #3
Hello, Boy.

On Fri, Jul 19, 2024 at 01:47:35AM +0000, Boy Wu (吳勃誼) wrote:
...
> If it is for readability, I think keeping u64 sync instead of replacing
> it with a spinlock is better, because what u64 sync protects is 64-bit
> data for 32-bit systems, while a spinlock can be used for many other
> reasons. The root cause of this issue is just the incorrect use

Yeah, but it can't be used when there are multiple updaters.

> of u64 sync. Adding back the missing spinlock for the correct usage of
> u64 sync is simpler. Is there any benefit to replacing u64 sync with a
> spinlock?

It doesn't make sense to protect u64_sync with a spinlock. Both are only
needed on 32bit. What's the point of having both? Also, note that iostat_cpu
is also updated from two paths - bio issue and stat reset. If you want to
keep that u64_sync, the only way to avoid possible deadlocks is adding a
spinlock in the bio issue path too, which will be pretty expensive.

> > Also, for blkg_clear_stat(), we're running a slight chance of the
> > clearing of iostat_cpu racing against stat updates from the hot path.
> > Given the circumstances - stat reset is a cgroup1-only feature which
> > isn't used widely and a problematic interface to begin with - I believe
> > this is a valid choice but it needs to be noted.
> 
> I don't get this part, but if this is another issue, maybe another
> patch would be better?

It's the same issue. Reset is another writer, and whenever you have more than
one writer w/ u64_sync, there's a chance of deadlocks. So, iostat_cpu also
has two writers - bio issue and stat reset. If you want to keep using
u64_sync in both places, the only way to do it is adding spinlock protection
to both paths, which is an expensive thing to do for the bio issue path. So,
here, we'd rather just give up and let stat resetting be racy on 32bit.
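
For reference, the two iostat_cpu writers in question, trimmed from the code
appearing elsewhere in this thread (rwd stands in for the request-direction
index used by the real code):

	/* writer 1: bio issue hot path, blk_cgroup_bio_start() */
	bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
	flags = u64_stats_update_begin_irqsave(&bis->sync);
	bis->cur.bytes[rwd] += bio->bi_iter.bi_size;
	bis->cur.ios[rwd]++;
	u64_stats_update_end_irqrestore(&bis->sync, flags);

	/* writer 2: cgroup1 stat reset, blkg_clear_stat() -> __blkg_clear_stat() */
	struct blkg_iostat cur = {0};
	flags = u64_stats_update_begin_irqsave(&bis->sync);
	blkg_iostat_set(&bis->cur, &cur);
	blkg_iostat_set(&bis->last, &cur);
	u64_stats_update_end_irqrestore(&bis->sync, flags);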

Thanks.
Boy Wu (吳勃誼) July 26, 2024, 3:43 a.m. UTC | #4
On Fri, 2024-07-19 at 07:31 -1000, tj@kernel.org wrote:
> Hello, Boy.
> 
> On Fri, Jul 19, 2024 at 01:47:35AM +0000, Boy Wu (吳勃誼) wrote:
> ...
> > If it is for readability, I think keeping u64 sync instead of
> > replacing it with a spinlock is better, because what u64 sync protects
> > is 64-bit data for 32-bit systems, while a spinlock can be used for
> > many other reasons. The root cause of this issue is just the incorrect
> > use
> 
> Yeah, but it can't be used when there are multiple updaters.
> 
> > of u64 sync. Adding back the missing spinlock for the correct usage of
> > u64 sync is simpler. Is there any benefit to replacing u64 sync with a
> > spinlock?
> 
> It doesn't make sense to protect u64_sync with a spinlock. Both are only
> needed on 32bit. What's the point of having both? Also, note that
> iostat_cpu is also updated from two paths - bio issue and stat reset. If
> you want to keep that u64_sync, the only way to avoid possible deadlocks
> is adding a spinlock in the bio issue path too, which will be pretty
> expensive.
> 

The use of a spinlock with u64 sync is suggested in
include/linux/u64_stats_sync.h:33.

 * Usage :
 *
 * Stats producer (writer) should use following template granted it already got
 * an exclusive access to counters (a lock is already taken, or per cpu
 * data is used [in a non preemptable context])
 *
 *   spin_lock_bh(...) or other synchronization to get exclusive access
 *   ...
 *   u64_stats_update_begin(&stats->syncp);
 *   u64_stats_add(&stats->bytes64, len); // non atomic operation
 *   u64_stats_inc(&stats->packets64);    // non atomic operation
 *   u64_stats_update_end(&stats->syncp);

Is this an incorrect statement?

> > > Also, for blkg_clear_stat(), we're running a slight chance of the
> > > clearing of iostat_cpu racing against stat updates from the hot path.
> > > Given the circumstances - stat reset is a cgroup1-only feature which
> > > isn't used widely and a problematic interface to begin with - I
> > > believe this is a valid choice but it needs to be noted.
> > 
> > I don't get this part, but if this is another issue, maybe another
> > patch would be better?
> 
> It's the same issue. Reset is another writer, and whenever you have more
> than one writer w/ u64_sync, there's a chance of deadlocks. So,
> iostat_cpu also has two writers - bio issue and stat reset. If you want
> to keep using u64_sync in both places, the only way to do it is adding
> spinlock protection to both paths, which is an expensive thing to do for
> the bio issue path. So, here, we'd rather just give up and let stat
> resetting be racy on 32bit.
> 
> Thanks.
> 
> -- 
> tejun

There are three places that update iostat and two places that update
iostat_cpu in blk-cgroup.c in version 6.10.1.

I assume the suggestion in u64_stats_sync.h is correct. For
readability, how about the following change?

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 85b3b9051455..7472cfa9a506 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -62,6 +62,10 @@ static struct workqueue_struct *blkcg_punt_bio_wq;

 static DEFINE_RAW_SPINLOCK(blkg_stat_lock);

+#if BITS_PER_LONG == 32
+static DEFINE_RAW_SPINLOCK(blkg_stat_cpu_lock);
+#endif
+
 #define BLKG_DESTROY_BATCH_SIZE  64

 /*
@@ -535,5 +539,55 @@ static void blkg_destroy_all(struct gendisk *disk)
        spin_unlock_irq(&q->queue_lock);
 }

+static inline unsigned long blkcg_iostats_update_begin_irqsave(struct u64_stats_sync *syncp)
+{
+       unsigned long flags;
+
+#if BITS_PER_LONG == 64
+       flags = u64_stats_update_begin_irqsave(syncp);
+#else /* 64 bit */
+       raw_spin_lock_irqsave(&blkg_stat_lock, flags);
+       u64_stats_update_begin(syncp);
+#endif
+
+       return flags;
+}
+
+static inline void blkcg_iostats_update_end_irqrestore(struct u64_stats_sync *syncp,
+                                                       unsigned long flags)
+{
+#if BITS_PER_LONG == 64
+       u64_stats_update_end_irqrestore(syncp, flags);
+#else /* 64 bit */
+       u64_stats_update_end(syncp);
+       raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
+#endif
+}
+
+static unsigned long blkcg_iostats_cpu_update_begin_irqsave(struct u64_stats_sync *syncp)
+{
+       unsigned long flags;
+
+#if BITS_PER_LONG == 64
+       flags = u64_stats_update_begin_irqsave(syncp);
+#else /* 64 bit */
+       raw_spin_lock_irqsave(&blkg_stat_cpu_lock, flags);
+       u64_stats_update_begin(syncp);
+#endif
+
+       return flags;
+}
+
+static inline void blkcg_iostats_cpu_update_end_irqrestore(struct u64_stats_sync *syncp,
+                                                           unsigned long flags)
+{
+#if BITS_PER_LONG == 64
+       u64_stats_update_end_irqrestore(syncp, flags);
+#else /* 64 bit */
+       u64_stats_update_end(syncp);
+       raw_spin_unlock_irqrestore(&blkg_stat_cpu_lock, flags);
+#endif
+}
+
 static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
 {
        int i;
@@ -632,24 +686,26 @@ static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
 static void __blkg_clear_stat(struct blkg_iostat_set *bis)
 {
        struct blkg_iostat cur = {0};
-       unsigned long flags;

-       flags = u64_stats_update_begin_irqsave(&bis->sync);
        blkg_iostat_set(&bis->cur, &cur);
        blkg_iostat_set(&bis->last, &cur);
-       u64_stats_update_end_irqrestore(&bis->sync, flags);
 }

 static void blkg_clear_stat(struct blkcg_gq *blkg)
 {
        int cpu;
+       unsigned long flags;

        for_each_possible_cpu(cpu) {
                struct blkg_iostat_set *s = per_cpu_ptr(blkg->iostat_cpu, cpu);

+               flags = blkcg_iostats_cpu_update_begin_irqsave(&s->sync);
                __blkg_clear_stat(s);
+               blkcg_iostats_cpu_update_end_irqrestore(&s->sync, flags);
        }
+       flags = blkcg_iostats_update_begin_irqsave(&blkg->iostat.sync);
        __blkg_clear_stat(&blkg->iostat);
+       blkcg_iostats_update_end_irqrestore(&blkg->iostat.sync, flags);
 }

 static int blkcg_reset_stats(struct cgroup_subsys_state *css,
@@ -1134,9 +1190,9 @@ static void blkcg_fill_root_iostats(void)
                                cpu_dkstats->sectors[STAT_DISCARD] << 9;
                }

-               flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
+               flags = blkcg_iostats_update_begin_irqsave(&blkg->iostat.sync);
                blkg_iostat_set(&blkg->iostat.cur, &tmp);
-               u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
+               blkcg_iostats_update_end_irqrestore(&blkg->iostat.sync, flags);
        }
 }

@@ -2152,7 +2208,7 @@ void blk_cgroup_bio_start(struct bio *bio)

        cpu = get_cpu();
        bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
-       flags = u64_stats_update_begin_irqsave(&bis->sync);
+       flags = blkcg_iostats_cpu_update_begin_irqsave(&bis->sync);

        /*
         * If the bio is flagged with BIO_CGROUP_ACCT it means this is a split
@@ -2175,7 +2231,7 @@ void blk_cgroup_bio_start(struct bio *bio)
                WRITE_ONCE(bis->lqueued, true);
        }

-       u64_stats_update_end_irqrestore(&bis->sync, flags);
+       blkcg_iostats_cpu_update_end_irqrestore(&bis->sync, flags);
        cgroup_rstat_updated(blkcg->css.cgroup, cpu);
        put_cpu();
 }

--
boy.wu
Tejun Heo July 30, 2024, 7:49 p.m. UTC | #5
Hello, Boy.

On Fri, Jul 26, 2024 at 03:43:27AM +0000, Boy Wu (吳勃誼) wrote:
...
> The use of a spinlock with u64 sync is suggested in
> include/linux/u64_stats_sync.h:33.
> 
>  * Usage :
>  *
>  * Stats producer (writer) should use following template granted it already got
>  * an exclusive access to counters (a lock is already taken, or per cpu
>  * data is used [in a non preemptable context])
>  *
>  *   spin_lock_bh(...) or other synchronization to get exclusive access
>  *   ...
>  *   u64_stats_update_begin(&stats->syncp);
>  *   u64_stats_add(&stats->bytes64, len); // non atomic operation
>  *   u64_stats_inc(&stats->packets64);    // non atomic operation
>  *   u64_stats_update_end(&stats->syncp);
> 
> Is this an incorrect statement?

That's not incorrect, and it'd make sense if we really want to use u64_sync -
e.g. if the reader is a hot path. Here, just a spinlock would be simpler and
do fine.

Thanks.
Boy Wu (吳勃誼) Aug. 9, 2024, 7:50 a.m. UTC | #6
On Tue, 2024-07-30 at 09:49 -1000, tj@kernel.org wrote:
> Hello, Boy.
> 
> On Fri, Jul 26, 2024 at 03:43:27AM +0000, Boy Wu (吳勃誼) wrote:
> ...
> > The use of a spinlock with u64 sync is suggested in
> > include/linux/u64_stats_sync.h:33.
> > 
> >  * Usage :
> >  *
> >  * Stats producer (writer) should use following template granted it already got
> >  * an exclusive access to counters (a lock is already taken, or per cpu
> >  * data is used [in a non preemptable context])
> >  *
> >  *   spin_lock_bh(...) or other synchronization to get exclusive access
> >  *   ...
> >  *   u64_stats_update_begin(&stats->syncp);
> >  *   u64_stats_add(&stats->bytes64, len); // non atomic operation
> >  *   u64_stats_inc(&stats->packets64);    // non atomic operation
> >  *   u64_stats_update_end(&stats->syncp);
> > 
> > Is this an incorrect statement?
> 
> That's not incorrect, and it'd make sense if we really want to use u64_sync -
> e.g. if the reader is a hot path. Here, just a spinlock would be simpler and
> do fine.
> 
> Thanks.
> 
> -- 
> tejun


u64_sync with a spinlock has the benefit of locking only when writing
iostat, but replacing u64_sync with a spinlock means locking not only when
writing iostat but also when reading it. Is there enough benefit to justify
replacing u64_sync and adding the cost of locking when reading iostat?
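
For concreteness, the two read paths being compared, trimmed from the code
in this thread (the spinlock form is equivalent to the scoped_guard used by
the v4 patch on 32-bit):

	/* with u64_sync: lockless retry loop, readers never block writers */
	do {
		seq = u64_stats_fetch_begin(&bis->sync);
		rbytes = bis->cur.bytes[BLKG_IOSTAT_READ];
		wbytes = bis->cur.bytes[BLKG_IOSTAT_WRITE];
	} while (u64_stats_fetch_retry(&bis->sync, seq));

	/* with the v4 spinlock (32-bit only): readers share the lock with writers */
	raw_spin_lock_irqsave(&blkg_stat_lock, flags);
	rbytes = bis->cur.bytes[BLKG_IOSTAT_READ];
	wbytes = bis->cur.bytes[BLKG_IOSTAT_WRITE];
	raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);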

--
Boy.Wu

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 37e6cc91d576..faa604c6fab9 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -329,7 +329,6 @@  static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
 	INIT_WORK(&blkg->async_bio_work, blkg_async_bio_workfn);
 #endif
 
-	u64_stats_init(&blkg->iostat.sync);
 	for_each_possible_cpu(cpu) {
 		u64_stats_init(&per_cpu_ptr(blkg->iostat_cpu, cpu)->sync);
 		per_cpu_ptr(blkg->iostat_cpu, cpu)->blkg = blkg;
@@ -632,18 +631,18 @@  static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
 static void __blkg_clear_stat(struct blkg_iostat_set *bis)
 {
 	struct blkg_iostat cur = {0};
-	unsigned long flags;
 
-	flags = u64_stats_update_begin_irqsave(&bis->sync);
 	blkg_iostat_set(&bis->cur, &cur);
 	blkg_iostat_set(&bis->last, &cur);
-	u64_stats_update_end_irqrestore(&bis->sync, flags);
 }
 
 static void blkg_clear_stat(struct blkcg_gq *blkg)
 {
 	int cpu;
 
+#if BITS_PER_LONG == 32
+	guard(raw_spinlock_irqsave)(&blkg_stat_lock);
+#endif
 	for_each_possible_cpu(cpu) {
 		struct blkg_iostat_set *s = per_cpu_ptr(blkg->iostat_cpu, cpu);
 
@@ -995,15 +994,12 @@  static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
 				struct blkg_iostat *last)
 {
 	struct blkg_iostat delta;
-	unsigned long flags;
 
 	/* propagate percpu delta to global */
-	flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
 	blkg_iostat_set(&delta, cur);
 	blkg_iostat_sub(&delta, last);
 	blkg_iostat_add(&blkg->iostat.cur, &delta);
 	blkg_iostat_add(last, &delta);
-	u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
 }
 
 static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
@@ -1034,7 +1030,6 @@  static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 		struct blkcg_gq *blkg = bisc->blkg;
 		struct blkcg_gq *parent = blkg->parent;
 		struct blkg_iostat cur;
-		unsigned int seq;
 
 		/*
 		 * Order assignment of `next_bisc` from `bisc->lnode.next` in
@@ -1051,10 +1046,7 @@  static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 			goto propagate_up; /* propagate up to parent only */
 
 		/* fetch the current per-cpu values */
-		do {
-			seq = u64_stats_fetch_begin(&bisc->sync);
-			blkg_iostat_set(&cur, &bisc->cur);
-		} while (u64_stats_fetch_retry(&bisc->sync, seq));
+		blkg_iostat_set(&cur, &bisc->cur);
 
 		blkcg_iostat_update(blkg, &cur, &bisc->last);
 
@@ -1112,7 +1104,6 @@  static void blkcg_fill_root_iostats(void)
 		struct blkcg_gq *blkg = bdev->bd_disk->queue->root_blkg;
 		struct blkg_iostat tmp;
 		int cpu;
-		unsigned long flags;
 
 		memset(&tmp, 0, sizeof(tmp));
 		for_each_possible_cpu(cpu) {
@@ -1133,10 +1124,10 @@  static void blkcg_fill_root_iostats(void)
 			tmp.bytes[BLKG_IOSTAT_DISCARD] +=
 				cpu_dkstats->sectors[STAT_DISCARD] << 9;
 		}
-
-		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
+#if BITS_PER_LONG == 32
+		guard(raw_spinlock_irqsave)(&blkg_stat_lock);
+#endif
 		blkg_iostat_set(&blkg->iostat.cur, &tmp);
-		u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
 	}
 }
 
@@ -1145,7 +1136,6 @@  static void blkcg_print_one_stat(struct blkcg_gq *blkg, struct seq_file *s)
 	struct blkg_iostat_set *bis = &blkg->iostat;
 	u64 rbytes, wbytes, rios, wios, dbytes, dios;
 	const char *dname;
-	unsigned seq;
 	int i;
 
 	if (!blkg->online)
@@ -1157,16 +1147,18 @@  static void blkcg_print_one_stat(struct blkcg_gq *blkg, struct seq_file *s)
 
 	seq_printf(s, "%s ", dname);
 
-	do {
-		seq = u64_stats_fetch_begin(&bis->sync);
-
+#if BITS_PER_LONG == 32
+	scoped_guard(raw_spinlock_irqsave, &blkg_stat_lock) {
+#endif
 		rbytes = bis->cur.bytes[BLKG_IOSTAT_READ];
 		wbytes = bis->cur.bytes[BLKG_IOSTAT_WRITE];
 		dbytes = bis->cur.bytes[BLKG_IOSTAT_DISCARD];
 		rios = bis->cur.ios[BLKG_IOSTAT_READ];
 		wios = bis->cur.ios[BLKG_IOSTAT_WRITE];
 		dios = bis->cur.ios[BLKG_IOSTAT_DISCARD];
-	} while (u64_stats_fetch_retry(&bis->sync, seq));
+#if BITS_PER_LONG == 32
+	}
+#endif
 
 	if (rbytes || wbytes || rios || wios) {
 		seq_printf(s, "rbytes=%llu wbytes=%llu rios=%llu wios=%llu dbytes=%llu dios=%llu",