Message ID | 20241125015258.2652325-1-leo.lilong@huawei.com (mailing list archive)
---|---
State | New
Series | xfs: fix race condition in inodegc list and cpumask handling
On Mon, Nov 25, 2024 at 09:52:58AM +0800, Long Li wrote:
> There is a race condition between inodegc queue and inodegc worker where
> the cpumask bit may not be set when concurrent operations occur.

What problems does this cause? i.e. how do we identify systems hitting
this issue?

> Current problematic sequence:
>
>   CPU0                                CPU1
>   --------------------                ---------------------
>   xfs_inodegc_queue()                 xfs_inodegc_worker()
>                                         llist_del_all(&gc->list)
>   llist_add(&ip->i_gclist, &gc->list)
>   cpumask_test_and_set_cpu()
>                                         cpumask_clear_cpu()
>                   < cpumask not set >
>
> Fix this by moving llist_del_all() after cpumask_clear_cpu() to ensure
> proper ordering. This change ensures that when the worker thread clears
> the cpumask, any concurrent queue operations will either properly set
> the cpumask bit or have already emptied the list.
>
> Also remove the unnecessary smp_mb__{before/after}_atomic() barriers
> since the llist_* operations already provide the required ordering
> semantics. This makes the code cleaner.

IIRC, the barriers were for ordering the cpumask bitmap ops against the
llist operations. There are calls elsewhere to for_each_cpu() that then
use llist_empty() checks (e.g. xfs_inodegc_queue_all/wait_all), so on
relaxed architectures (like alpha) I think we have to ensure the bitmask
ops carry full ordering against the independent llist ops themselves.
i.e. llist_empty() just uses READ_ONCE, so it only orders against other
llist ops and won't guarantee any specific ordering against cpumask
modifications.

I could be remembering incorrectly, but I think that was the original
reason for the barriers. Can you please confirm that the cpumask
iteration/llist_empty checks do not need these bitmask barriers anymore?
If that's ok, then the change looks fine.

-Dave.
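For reference, the consumer pattern the review refers to looks roughly
like the sketch below. This is a paraphrase of xfs_inodegc_queue_all()
in fs/xfs/xfs_icache.c for illustration only, not the verbatim upstream
code:

/*
 * Sketch of the for_each_cpu() + llist_empty() pattern in question:
 * walk the per-mount cpumask and schedule the per-cpu worker for any
 * CPU whose gc list appears non-empty.
 */
static bool
xfs_inodegc_queue_all(
	struct xfs_mount	*mp)
{
	struct xfs_inodegc	*gc;
	int			cpu;
	bool			ret = false;

	for_each_cpu(cpu, &mp->m_inodegc_cpumask) {
		gc = per_cpu_ptr(mp->m_inodegc, cpu);
		/*
		 * llist_empty() is just READ_ONCE(head->first) == NULL,
		 * so it carries no ordering against the cpumask read
		 * above; that cross-ordering is what the smp_mb__*
		 * barriers may have been providing.
		 */
		if (!llist_empty(&gc->list)) {
			mod_delayed_work_on(cpu, mp->m_inodegc_wq,
					&gc->work, 0);
			ret = true;
		}
	}
	return ret;
}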
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 7b6c026d01a1..fba784d7a146 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1948,19 +1948,20 @@ xfs_inodegc_worker(
 {
 	struct xfs_inodegc	*gc = container_of(to_delayed_work(work),
 						struct xfs_inodegc, work);
-	struct llist_node	*node = llist_del_all(&gc->list);
+	struct llist_node	*node;
 	struct xfs_inode	*ip, *n;
 	struct xfs_mount	*mp = gc->mp;
 	unsigned int		nofs_flag;
 
+	cpumask_clear_cpu(gc->cpu, &mp->m_inodegc_cpumask);
+
 	/*
-	 * Clear the cpu mask bit and ensure that we have seen the latest
-	 * update of the gc structure associated with this CPU. This matches
-	 * with the release semantics used when setting the cpumask bit in
-	 * xfs_inodegc_queue.
+	 * llist_del_all() provides ordering semantics that ensure the CPU
+	 * mask clearing is always visible before all entries are removed
+	 * from the gc list. This prevents items being added to the list
+	 * while the CPU mask bit is unset in xfs_inodegc_queue().
 	 */
-	cpumask_clear_cpu(gc->cpu, &mp->m_inodegc_cpumask);
-	smp_mb__after_atomic();
+	node = llist_del_all(&gc->list);
 
 	WRITE_ONCE(gc->items, 0);
 
@@ -2188,13 +2189,11 @@ xfs_inodegc_queue(
 	shrinker_hits = READ_ONCE(gc->shrinker_hits);
 
 	/*
-	 * Ensure the list add is always seen by anyone who finds the cpumask
-	 * bit set. This effectively gives the cpumask bit set operation
-	 * release ordering semantics.
+	 * llist_add() provides the necessary ordering semantics to ensure
+	 * list additions are visible when the cpumask bit is found set,
+	 * so no additional memory barrier is needed.
 	 */
-	smp_mb__before_atomic();
-	if (!cpumask_test_cpu(cpu_nr, &mp->m_inodegc_cpumask))
-		cpumask_test_and_set_cpu(cpu_nr, &mp->m_inodegc_cpumask);
+	cpumask_test_and_set_cpu(cpu_nr, &mp->m_inodegc_cpumask);
 
 	/*
 	 * We queue the work while holding the current CPU so that the work
There is a race condition between inodegc queue and inodegc worker where
the cpumask bit may not be set when concurrent operations occur.

Current problematic sequence:

  CPU0                                CPU1
  --------------------                ---------------------
  xfs_inodegc_queue()                 xfs_inodegc_worker()
                                        llist_del_all(&gc->list)
  llist_add(&ip->i_gclist, &gc->list)
  cpumask_test_and_set_cpu()
                                        cpumask_clear_cpu()
                  < cpumask not set >

Fix this by moving llist_del_all() after cpumask_clear_cpu() to ensure
proper ordering. This change ensures that when the worker thread clears
the cpumask, any concurrent queue operations will either properly set
the cpumask bit or have already emptied the list.

Also remove the unnecessary smp_mb__{before/after}_atomic() barriers
since the llist_* operations already provide the required ordering
semantics. This makes the code cleaner.

Fixes: 62334fab4762 ("xfs: use per-mount cpumask to track nonempty percpu inodegc lists")
Reported-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_icache.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
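As background on the claim that the llist_* operations already provide
the required ordering: both primitives are value-returning atomic RMW
operations, which the kernel memory model treats as fully ordered. The
sketches below are paraphrased from include/linux/llist.h and
lib/llist.c for illustration; they are not the verbatim kernel code.

/* Sketch of llist_add(); the real kernel version is built on
 * llist_add_batch(). try_cmpxchg() is fully ordered when it succeeds,
 * so the list add cannot be reordered past a later cpumask update.
 */
static inline bool sketch_llist_add(struct llist_node *new,
				    struct llist_head *head)
{
	struct llist_node *first = READ_ONCE(head->first);

	do {
		new->next = first;
	} while (!try_cmpxchg(&head->first, &first, new));

	return !first;		/* true if the list was previously empty */
}

/* Sketch of llist_del_all(): xchg() is always fully ordered, so a
 * cpumask_clear_cpu() issued before it cannot be reordered after the
 * list is drained.
 */
static inline struct llist_node *sketch_llist_del_all(struct llist_head *head)
{
	return xchg(&head->first, NULL);
}

Whether this full ordering on the RMW side is also sufficient for the
for_each_cpu()/llist_empty() readers, which perform plain READ_ONCE
loads, is the open question raised in the review above.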