Message ID | 20220328005736.2513727-1-longman@redhat.com (mailing list archive)
---|---
State | New
Series | mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
On 3/27/22 20:57, Waiman Long wrote:
> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the latter function to miss
> reparenting of a lru entry as shown below:
>
> CPU0:                           CPU1:
> list_lru_add()
>     spin_lock(&nlru->lock)
>     l = list_lru_from_kmem(memcg)
>                                 memcg_reparent_objcgs(memcg)
>                                 memcg_reparent_list_lrus(memcg)
>                                   memcg_reparent_list_lru()
>                                     memcg_reparent_list_lru_node()
>                                       if (!READ_ONCE(nlru->nr_items))
>                                         // Miss reparenting
>                                         return
>     // Assume 0->1
>     l->nr_items++
>     // Assume 0->1
>     nlru->nr_items++
>
> Though it is not likely that a list_lru_node that has 0 items suddenly
> gains a newly added lru entry at the end of its life, the race is still
> theoretically possible.
>
> Adding a spin_is_locked() check will likely be enough for x86, but it
> is less certain for other arches with more relaxed memory semantics
> like arm64 and ppc. To avoid the race, this patch moves the nr_items
> check to within the lock critical section.
>
> Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")

Sorry, I should have added

Reported-by: Muchun Song <songmuchun@bytedance.com>

> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/list_lru.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index c669d87001a6..8aec8ebd5995 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -394,18 +394,18 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
>  	int dst_idx = dst_memcg->kmemcg_id;
>  	struct list_lru_one *src, *dst;
>
> -	/*
> -	 * If there is no lru entry in this nlru, we can skip it immediately.
> -	 */
> -	if (!READ_ONCE(nlru->nr_items))
> -		return;
> -
>  	/*
>  	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
>  	 * we have to use IRQ-safe primitives here to avoid deadlock.
>  	 */
>  	spin_lock_irq(&nlru->lock);
>
> +	/*
> +	 * If there is no lru entry in this nlru, we can skip it immediately.
> +	 */
> +	if (!nlru->nr_items)
> +		goto out;
> +
>  	src = list_lru_from_memcg_idx(lru, nid, src_idx);
>  	if (!src)
>  		goto out;

Cheers,
Longman
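For readers applying the hunk mentally, this is roughly how the top of
memcg_reparent_list_lru_node() reads once the patch above is applied. It is
reassembled from the context and "+" lines of the hunk; the nlru declaration
and the tail of the parameter list are filled in from the uses visible in the
hunk, so treat them as approximate rather than verbatim upstream code:

	static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
						 int src_idx, struct mem_cgroup *dst_memcg)
	{
		struct list_lru_node *nlru = &lru->node[nid];	/* assumed declaration */
		int dst_idx = dst_memcg->kmemcg_id;
		struct list_lru_one *src, *dst;

		/*
		 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
		 * we have to use IRQ-safe primitives here to avoid deadlock.
		 */
		spin_lock_irq(&nlru->lock);

		/*
		 * If there is no lru entry in this nlru, we can skip it immediately.
		 * The check is now made while nlru->lock is held, so a concurrent
		 * list_lru_add() can no longer slip in between the check and the
		 * reparenting below.
		 */
		if (!nlru->nr_items)
			goto out;

		src = list_lru_from_memcg_idx(lru, nid, src_idx);
		if (!src)
			goto out;

		/* ... splicing of src into dst elided ... */
	out:
		spin_unlock_irq(&nlru->lock);
	}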
On Mon, Mar 28, 2022 at 8:58 AM Waiman Long <longman@redhat.com> wrote:
>
> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the latter function to miss
> reparenting of a lru entry as shown below:
>
> CPU0:                           CPU1:
> list_lru_add()
>     spin_lock(&nlru->lock)
>     l = list_lru_from_kmem(memcg)
>                                 memcg_reparent_objcgs(memcg)
>                                 memcg_reparent_list_lrus(memcg)
>                                   memcg_reparent_list_lru()
>                                     memcg_reparent_list_lru_node()
>                                       if (!READ_ONCE(nlru->nr_items))
>                                         // Miss reparenting
>                                         return
>     // Assume 0->1
>     l->nr_items++
>     // Assume 0->1
>     nlru->nr_items++
>
> Though it is not likely that a list_lru_node that has 0 items suddenly
> gains a newly added lru entry at the end of its life, the race is still
> theoretically possible.
>
> Adding a spin_is_locked() check will likely be enough for x86, but it
> is less certain for other arches with more relaxed memory semantics
> like arm64 and ppc. To avoid the race, this patch moves the nr_items
> check to within the lock critical section.
>
> Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> Signed-off-by: Waiman Long <longman@redhat.com>

How about the following patch? It is low overhead on x86_64. Even with
relaxed memory ordering, I think it is also lower overhead since it avoids
a store operation to nlru->lock. We do not need to insert a smp_wmb()
into list_lru_add() since spin_lock() always implies at least load-acquire
semantics.

Thanks.

diff --git a/mm/list_lru.c b/mm/list_lru.c
index c669d87001a6..0e58374b629b 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -397,8 +397,11 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	/*
 	 * If there is no lru entry in this nlru, we can skip it immediately.
 	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
+	if (!READ_ONCE(nlru->nr_items)) {
+		smp_rmb();
+		if (!spin_is_locked(&nlru->lock))
+			return;
+	}
 
 	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
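To make the intended pairing in this proposal easier to follow, here is the
same check annotated with the ordering argument as stated above. This is
purely illustrative; whether the barrier pairing is actually sufficient on
weakly ordered architectures is exactly the question the thread is debating:

	if (!READ_ONCE(nlru->nr_items)) {
		/*
		 * Per the argument above: a racing list_lru_add() takes
		 * nlru->lock (an acquire) before it increments nr_items,
		 * so if the increment is not yet visible here, the lock
		 * word should be observable as held instead.  smp_rmb()
		 * orders the nr_items load before the lock-word load.
		 */
		smp_rmb();
		if (!spin_is_locked(&nlru->lock))
			return;		/* nothing to reparent */
		/* otherwise fall through and take the lock as usual */
	}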
diff --git a/mm/list_lru.c b/mm/list_lru.c
index c669d87001a6..8aec8ebd5995 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -394,18 +394,18 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	int dst_idx = dst_memcg->kmemcg_id;
 	struct list_lru_one *src, *dst;
 
-	/*
-	 * If there is no lru entry in this nlru, we can skip it immediately.
-	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
-
 	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
 	 * we have to use IRQ-safe primitives here to avoid deadlock.
 	 */
 	spin_lock_irq(&nlru->lock);
 
+	/*
+	 * If there is no lru entry in this nlru, we can skip it immediately.
+	 */
+	if (!nlru->nr_items)
+		goto out;
+
 	src = list_lru_from_memcg_idx(lru, nid, src_idx);
 	if (!src)
 		goto out;
Muchun Song found out there could be a race between list_lru_add()
and memcg_reparent_list_lru_node() causing the latter function to miss
reparenting of a lru entry as shown below:

CPU0:                           CPU1:
list_lru_add()
    spin_lock(&nlru->lock)
    l = list_lru_from_kmem(memcg)
                                memcg_reparent_objcgs(memcg)
                                memcg_reparent_list_lrus(memcg)
                                  memcg_reparent_list_lru()
                                    memcg_reparent_list_lru_node()
                                      if (!READ_ONCE(nlru->nr_items))
                                        // Miss reparenting
                                        return
    // Assume 0->1
    l->nr_items++
    // Assume 0->1
    nlru->nr_items++

Though it is not likely that a list_lru_node that has 0 items suddenly
gains a newly added lru entry at the end of its life, the race is still
theoretically possible.

Adding a spin_is_locked() check will likely be enough for x86, but it
is less certain for other arches with more relaxed memory semantics
like arm64 and ppc. To avoid the race, this patch moves the nr_items
check to within the lock critical section.

Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/list_lru.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
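For reference, the add side shown in the diagram publishes the entry and both
counters entirely under nlru->lock. The condensed sketch below is
reconstructed from the diagram, not copied from the upstream list_lru_add();
the node and memcg lookups are elided and the helper call is shown in the
same simplified form the diagram uses:

	/*
	 * Condensed sketch of the add side from the race diagram (not the
	 * verbatim upstream list_lru_add(); nid/memcg lookups elided).
	 */
	bool list_lru_add(struct list_lru *lru, struct list_head *item)
	{
		struct list_lru_node *nlru = &lru->node[nid];	/* nid from the item's page */
		struct list_lru_one *l;

		spin_lock(&nlru->lock);
		l = list_lru_from_kmem(memcg);	/* per-memcg list, as in the diagram */
		list_add_tail(item, &l->list);
		l->nr_items++;			/* the "0 -> 1" step in the diagram */
		nlru->nr_items++;		/* happens after CPU1 already bailed out */
		spin_unlock(&nlru->lock);
		return true;
	}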