Message ID | 20221213184446.50181-3-longman@redhat.com
---|---
State | New
Series | blk-cgroup: Fix potential UAF & flush rstat at blkgs destruction path
On Tue, Dec 13, 2022 at 01:44:46PM -0500, Waiman Long wrote:
> +	/*
> +	 * Flush all the non-empty percpu lockless lists so as to release
> +	 * the blkg references held by those lists which, in turn, may
> +	 * allow the blkgs to be freed and release their references to
> +	 * blkcg speeding up its freeing.
> +	 */

Can you mention the possible deadlock explicitly? This sounds more like an
optimization.

Thanks.
On 12/13/22 14:30, Tejun Heo wrote:
> On Tue, Dec 13, 2022 at 01:44:46PM -0500, Waiman Long wrote:
>> +	/*
>> +	 * Flush all the non-empty percpu lockless lists so as to release
>> +	 * the blkg references held by those lists which, in turn, may
>> +	 * allow the blkgs to be freed and release their references to
>> +	 * blkcg speeding up its freeing.
>> +	 */
> Can you mention the possible deadlock explicitly? This sounds more like an
> optimization.

I am mostly thinking about the optimization aspect. Yes, a deadlock in the
sense that both the blkgs and the blkcg remain offline but are never freed
is possible because of the references held in those lockless lists. It is a
problem if blkcg is the only controller in a cgroup. For a cgroup that has
both the blkcg and memory controllers, it shouldn't be a problem, as the
cgroup_rstat_flush() call in the release path of the memory cgroup will
clear up the blkcg too.

Right, I will update the comment to mention that.

Cheers,
Longman
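[Editor's note] To make the "references held in those lockless lists" point
concrete, below is a simplified, illustrative sketch of the mechanism the
thread refers to -- not the actual kernel code. The field and helper names
(blkcg->lhead, struct blkg_iostat_set, queue_iostat_update(),
flush_iostat_updates()) are assumptions loosely based on this series, and
the real code additionally guards against queueing the same node twice:

	/* Queueing side: a stat update parks the blkg on a per-cpu lockless list. */
	static void queue_iostat_update(struct blkcg *blkcg, struct blkg_iostat_set *bis)
	{
		struct llist_head *lhead = this_cpu_ptr(blkcg->lhead);

		llist_add(&bis->lnode, lhead);
		/* This reference pins the blkg (and thus its blkcg) until a flush. */
		percpu_ref_get(&bis->blkg->refcnt);
	}

	/* Flush side: popping the list is what finally drops those references. */
	static void flush_iostat_updates(struct blkcg *blkcg, int cpu)
	{
		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
		struct llist_node *lnode = llist_del_all(lhead);
		struct blkg_iostat_set *bis, *next;

		llist_for_each_entry_safe(bis, next, lnode, lnode) {
			/* ... fold bis into the parent's stats ... */
			percpu_ref_put(&bis->blkg->refcnt);
		}
	}

If blkcg_destroy_blkgs() returned without draining these lists, each queued
entry would keep holding its blkg reference, which is the pinning scenario
described above when no other controller triggers a flush.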
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ca28306aa1b1..ddd27a714d3e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,6 +1084,8 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
  */
 static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 {
+	int cpu;
+
 	/*
 	 * blkcg_destroy_blkgs() shouldn't be called with all the blkcg
 	 * references gone.
@@ -1093,6 +1095,19 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 
 	might_sleep();
 
+	/*
+	 * Flush all the non-empty percpu lockless lists so as to release
+	 * the blkg references held by those lists which, in turn, may
+	 * allow the blkgs to be freed and release their references to
+	 * blkcg speeding up its freeing.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+		if (!llist_empty(lhead))
+			cgroup_rstat_css_cpu_flush(&blkcg->css, cpu);
+	}
+
 	spin_lock_irq(&blkcg->lock);
 
 	while (!hlist_empty(&blkcg->blkg_list)) {
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..6c4e66b3fa84 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu);
 
 /*
  * Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..2e44be44351f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
+/**
+ * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu
+ * @css: target css to be flush
+ * @cpu: the cpu that holds the stats to be flush
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
+ * isn't used.
+ */
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu)
+{
+	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+	raw_spin_lock_irq(cpu_lock);
+	css->ss->css_rstat_flush(css, cpu);
+	raw_spin_unlock_irq(cpu_lock);
+}
+
 int cgroup_rstat_init(struct cgroup *cgrp)
 {
 	int cpu;
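[Editor's note] On the design choice: the generic way to drain pending per-cpu
updates would be a full rstat flush, roughly as sketched below (illustrative
only, not part of the patch). That path takes the global cgroup_rstat_lock and
walks every CPU and every rstat-using controller, which appears to be why the
patch instead adds the lighter cgroup_rstat_css_cpu_flush(), holding only the
per-cpu cgroup_rstat_cpu_lock and flushing just the css/cpu pairs whose
lockless lists are non-empty:

	/*
	 * Heavier alternative (illustrative): flush everything queued for
	 * this cgroup. Acquires cgroup_rstat_lock and iterates all CPUs and
	 * all controllers with a css_rstat_flush() callback, even though
	 * only blkcg's non-empty per-cpu lists need draining here.
	 */
	cgroup_rstat_flush(blkcg->css.cgroup);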