Message ID | 20200103143407.1089-1-richardw.yang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] mm: thp: grab the lock before manipulation defer list |
On Fri, 3 Jan 2020, Wei Yang wrote: > As all the other places, we grab the lock before manipulate the defer list. > Current implementation may face a race condition. > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > --- > I notice the difference during code reading and just confused about the > difference. No specific test is done since limited knowledge about cgroup. > > Maybe I miss something important? The check for !list_empty(page_deferred_list(page)) must certainly be serialized with doing list_del_init(page_deferred_list(page)). > --- > mm/memcontrol.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index bc01423277c5..62b7ec34ef1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&from->deferred_split_queue.split_queue_lock); > if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > list_del_init(page_deferred_list(page)); > from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&from->deferred_split_queue.split_queue_lock); > #endif > /* > * It is safe to change page->mem_cgroup here because the page > @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page, > page->mem_cgroup = to; > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&to->deferred_split_queue.split_queue_lock); > if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > list_add_tail(page_deferred_list(page), > &to->deferred_split_queue.split_queue); > to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&to->deferred_split_queue.split_queue_lock); > #endif > > spin_unlock_irqrestore(&from->move_lock, flags); > -- > 2.17.1 > > >
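For readers skimming the archive: David's point is the classic check-then-act race. The !list_empty() test and the list_del_init() form one logical operation, so if the lock only covers the second half, another CPU can remove the entry in between. Below is a small standalone userspace sketch of that pattern in plain C with pthreads; the list helpers, the thread names and the file name are purely illustrative stand-ins, not the kernel code or the kernel's actual call paths.

	/* build (hypothetical file name): cc -pthread -o defer-race defer-race.c */
	#include <pthread.h>
	#include <stdio.h>

	struct list_head { struct list_head *next, *prev; };

	static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
	static int list_empty(const struct list_head *h) { return h->next == h; }

	static void list_del_init(struct list_head *e)	/* unlink and reinitialize */
	{
		e->prev->next = e->next;
		e->next->prev = e->prev;
		INIT_LIST_HEAD(e);
	}

	static pthread_mutex_t split_queue_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct list_head queue;	/* stands in for a deferred split queue */
	static struct list_head entry;	/* stands in for one page's deferred list linkage */

	/*
	 * Racy shape (the code before the patch): the emptiness check runs
	 * outside the lock.  Between the check and the lock the other thread
	 * can remove 'entry'.  In this sketch the second removal is only a
	 * silent no-op; in the kernel the splitting path removes the entry
	 * with the poisoning list_del(), so a late list_del_init() would
	 * dereference poison pointers instead.
	 */
	static void *mover(void *arg)
	{
		(void)arg;
		if (!list_empty(&entry)) {
			pthread_mutex_lock(&split_queue_lock);
			list_del_init(&entry);	/* may act on an already-removed entry */
			pthread_mutex_unlock(&split_queue_lock);
		}
		return NULL;
	}

	/* Correct shape (the patch): check and removal share one critical section. */
	static void *splitter(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&split_queue_lock);
		if (!list_empty(&entry))
			list_del_init(&entry);
		pthread_mutex_unlock(&split_queue_lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		INIT_LIST_HEAD(&queue);
		/* enqueue 'entry' on 'queue' */
		entry.next = &queue;
		entry.prev = &queue;
		queue.next = &entry;
		queue.prev = &entry;

		pthread_create(&a, NULL, mover, NULL);
		pthread_create(&b, NULL, splitter, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		printf("queue empty afterwards: %d\n", list_empty(&queue));
		return 0;
	}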
On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote: >On Fri, 3 Jan 2020, Wei Yang wrote: > >> As all the other places, we grab the lock before manipulate the defer list. >> Current implementation may face a race condition. >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> >> --- >> I notice the difference during code reading and just confused about the >> difference. No specific test is done since limited knowledge about cgroup. >> >> Maybe I miss something important? > >The check for !list_empty(page_deferred_list(page)) must certainly be >serialized with doing list_del_init(page_deferred_list(page)). > Hi David Would you mind giving more information? You mean list_empty and list_del_init is atomic?
On Sat, 4 Jan 2020, Wei Yang wrote: > On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote: > >On Fri, 3 Jan 2020, Wei Yang wrote: > > > >> As all the other places, we grab the lock before manipulate the defer list. > >> Current implementation may face a race condition. > >> > >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > >> > >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > >> > >> --- > >> I notice the difference during code reading and just confused about the > >> difference. No specific test is done since limited knowledge about cgroup. > >> > >> Maybe I miss something important? > > > >The check for !list_empty(page_deferred_list(page)) must certainly be > >serialized with doing list_del_init(page_deferred_list(page)). > > > > Hi David > > Would you mind giving more information? You mean list_empty and list_del_init > is atomic? > I mean your patch is obviously correct :) It should likely also have a stable@vger.kernel.org # 5.4+ Acked-by: David Rientjes <rientjes@google.com>
On Fri, Jan 03, 2020 at 04:44:59PM -0800, David Rientjes wrote: >On Sat, 4 Jan 2020, Wei Yang wrote: > >> On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote: >> >On Fri, 3 Jan 2020, Wei Yang wrote: >> > >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> Current implementation may face a race condition. >> >> >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> >> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> >> >> >> --- >> >> I notice the difference during code reading and just confused about the >> >> difference. No specific test is done since limited knowledge about cgroup. >> >> >> >> Maybe I miss something important? >> > >> >The check for !list_empty(page_deferred_list(page)) must certainly be >> >serialized with doing list_del_init(page_deferred_list(page)). >> > >> >> Hi David >> >> Would you mind giving more information? You mean list_empty and list_del_init >> is atomic? >> > >I mean your patch is obviously correct :) It should likely also have a >stable@vger.kernel.org # 5.4+ Ah, my poor English ;-) > >Acked-by: David Rientjes <rientjes@google.com>
On Fri 03-01-20 22:34:07, Wei Yang wrote: > As all the other places, we grab the lock before manipulate the defer list. > Current implementation may face a race condition. Please always make sure to describe the effect of the change. Why a racy list_empty check matters? > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > --- > I notice the difference during code reading and just confused about the > difference. No specific test is done since limited knowledge about cgroup. > > Maybe I miss something important? > --- > mm/memcontrol.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index bc01423277c5..62b7ec34ef1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&from->deferred_split_queue.split_queue_lock); > if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > list_del_init(page_deferred_list(page)); > from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&from->deferred_split_queue.split_queue_lock); > #endif > /* > * It is safe to change page->mem_cgroup here because the page > @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page, > page->mem_cgroup = to; > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&to->deferred_split_queue.split_queue_lock); > if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > list_add_tail(page_deferred_list(page), > &to->deferred_split_queue.split_queue); > to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&to->deferred_split_queue.split_queue_lock); > #endif > > spin_unlock_irqrestore(&from->move_lock, flags); > -- > 2.17.1
On Fri, Jan 3, 2020 at 6:34 AM Wei Yang <richardw.yang@linux.intel.com> wrote: > > As all the other places, we grab the lock before manipulate the defer list. > Current implementation may face a race condition. > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > --- > I notice the difference during code reading and just confused about the > difference. No specific test is done since limited knowledge about cgroup. > > Maybe I miss something important? > --- > mm/memcontrol.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index bc01423277c5..62b7ec34ef1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&from->deferred_split_queue.split_queue_lock); > if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > list_del_init(page_deferred_list(page)); > from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&from->deferred_split_queue.split_queue_lock); > #endif > /* > * It is safe to change page->mem_cgroup here because the page So I suspect the lock placement has to do with the compound boolean value passed to the function. One thing you might want to do is pull the "if (compound)" check out and place it outside of the spinlock check. It would then simplify this signficantly so it is something like if (compound) { spin_lock(); list = page_deferred_list(page); if (!list_empty(list)) { list_del_init(list); from->..split_queue_len--; } spin_unlock(); } Same for the block below. I would pull the check for compound outside of the spinlock call since it is a value that shouldn't change and would eliminate an unnecessary lock in the non-compound case. > @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page, > page->mem_cgroup = to; > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&to->deferred_split_queue.split_queue_lock); > if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > list_add_tail(page_deferred_list(page), > &to->deferred_split_queue.split_queue); > to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&to->deferred_split_queue.split_queue_lock); > #endif > > spin_unlock_irqrestore(&from->move_lock, flags); > --
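Concretely, folding Alexander's suggestion into the hunk quoted above, the "from" side of mem_cgroup_move_account() could end up looking roughly like the sketch below. The field and helper names are taken from the quoted diff; this is only an illustration of the proposed restructuring, not a tested patch, and the "to" side would change in the same way with list_add_tail() and split_queue_len++.

	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		if (compound) {
			struct list_head *list = page_deferred_list(page);

			spin_lock(&from->deferred_split_queue.split_queue_lock);
			if (!list_empty(list)) {
				list_del_init(list);
				from->deferred_split_queue.split_queue_len--;
			}
			spin_unlock(&from->deferred_split_queue.split_queue_lock);
		}
	#endif

Taking the spinlock only when compound is true keeps the lock/unlock out of the common non-compound path, which is the saving Alexander is after.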
On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> As all the other places, we grab the lock before manipulate the defer list. >> Current implementation may face a race condition. > >Please always make sure to describe the effect of the change. Why a racy >list_empty check matters? > Hmm... access the list without proper lock leads to many bad behaviors. For example, if we grab the lock after checking list_empty, the page may already be removed from list in split_huge_page_list. And then list_del_init would trigger bug.
On Mon, Jan 06, 2020 at 08:18:34AM -0800, Alexander Duyck wrote: >On Fri, Jan 3, 2020 at 6:34 AM Wei Yang <richardw.yang@linux.intel.com> wrote: >> >> As all the other places, we grab the lock before manipulate the defer list. >> Current implementation may face a race condition. >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> >> --- >> I notice the difference during code reading and just confused about the >> difference. No specific test is done since limited knowledge about cgroup. >> >> Maybe I miss something important? >> --- >> mm/memcontrol.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index bc01423277c5..62b7ec34ef1a 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, >> } >> >> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >> + spin_lock(&from->deferred_split_queue.split_queue_lock); >> if (compound && !list_empty(page_deferred_list(page))) { >> - spin_lock(&from->deferred_split_queue.split_queue_lock); >> list_del_init(page_deferred_list(page)); >> from->deferred_split_queue.split_queue_len--; >> - spin_unlock(&from->deferred_split_queue.split_queue_lock); >> } >> + spin_unlock(&from->deferred_split_queue.split_queue_lock); >> #endif >> /* >> * It is safe to change page->mem_cgroup here because the page > >So I suspect the lock placement has to do with the compound boolean >value passed to the function. > Hey, Alexander Thanks for your comment. >One thing you might want to do is pull the "if (compound)" check out >and place it outside of the spinlock check. It would then simplify >this signficantly so it is something like >if (compound) { > spin_lock(); > list = page_deferred_list(page); > if (!list_empty(list)) { > list_del_init(list); > from->..split_queue_len--; > } > spin_unlock(); >} > >Same for the block below. I would pull the check for compound outside >of the spinlock call since it is a value that shouldn't change and >would eliminate an unnecessary lock in the non-compound case. This is reasonable, if no objection from others, I would change this in v2.
On Tue, 7 Jan 2020, Wei Yang wrote: > >One thing you might want to do is pull the "if (compound)" check out > >and place it outside of the spinlock check. It would then simplify > >this signficantly so it is something like > >if (compound) { > > spin_lock(); > > list = page_deferred_list(page); > > if (!list_empty(list)) { > > list_del_init(list); > > from->..split_queue_len--; > > } > > spin_unlock(); > >} > > > >Same for the block below. I would pull the check for compound outside > >of the spinlock call since it is a value that shouldn't change and > >would eliminate an unnecessary lock in the non-compound case. > > This is reasonable, if no objection from others, I would change this in v2. Looks fine to me; I don't see it as a necessary improvement but there's also no reason to object to it. It's definitely a patch that is needed, however, for the simple reason that with the existing code we can manipulate the deferred split queue incorrectly so either way works for me. Feel free to keep my acked-by.
On Mon, Jan 06, 2020 at 06:07:29PM -0800, David Rientjes wrote: >On Tue, 7 Jan 2020, Wei Yang wrote: > >> >One thing you might want to do is pull the "if (compound)" check out >> >and place it outside of the spinlock check. It would then simplify >> >this signficantly so it is something like >> >if (compound) { >> > spin_lock(); >> > list = page_deferred_list(page); >> > if (!list_empty(list)) { >> > list_del_init(list); >> > from->..split_queue_len--; >> > } >> > spin_unlock(); >> >} >> > >> >Same for the block below. I would pull the check for compound outside >> >of the spinlock call since it is a value that shouldn't change and >> >would eliminate an unnecessary lock in the non-compound case. >> >> This is reasonable, if no objection from others, I would change this in v2. > >Looks fine to me; I don't see it as a necessary improvement but there's >also no reason to object to it. It's definitely a patch that is needed, >however, for the simple reason that with the existing code we can >manipulate the deferred split queue incorrectly so either way works for >me. Feel free to keep my acked-by. Ah, thanks David. You are so supportive.
On Tue 07-01-20 09:22:41, Wei Yang wrote: > On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: > >On Fri 03-01-20 22:34:07, Wei Yang wrote: > >> As all the other places, we grab the lock before manipulate the defer list. > >> Current implementation may face a race condition. > > > >Please always make sure to describe the effect of the change. Why a racy > >list_empty check matters? > > > > Hmm... access the list without proper lock leads to many bad behaviors. My point is that the changelog should describe that bad behavior. > For example, if we grab the lock after checking list_empty, the page may > already be removed from list in split_huge_page_list. And then list_del_init > would trigger bug. And how does list_empty check under the lock guarantee that the page is on the deferred list?
On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: >On Tue 07-01-20 09:22:41, Wei Yang wrote: >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> Current implementation may face a race condition. >> > >> >Please always make sure to describe the effect of the change. Why a racy >> >list_empty check matters? >> > >> >> Hmm... access the list without proper lock leads to many bad behaviors. > >My point is that the changelog should describe that bad behavior. > >> For example, if we grab the lock after checking list_empty, the page may >> already be removed from list in split_huge_page_list. And then list_del_init >> would trigger bug. > >And how does list_empty check under the lock guarantee that the page is >on the deferred list? Just one confusion, is this kind of description basic concept of concurrent programming? How detail level we need to describe the effect? To me, grab the lock before accessing the critical section is obvious. list_empty and list_del should be the critical section. And the lock should protect the whole critical section instead of part of it. >-- >Michal Hocko >SUSE Labs
On Wed 08-01-20 08:35:43, Wei Yang wrote:
> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> >> As all the other places, we grab the lock before manipulate the defer list.
> >> >> Current implementation may face a race condition.
> >> >
> >> >Please always make sure to describe the effect of the change. Why a racy
> >> >list_empty check matters?
> >> >
> >>
> >> Hmm... access the list without proper lock leads to many bad behaviors.
> >
> >My point is that the changelog should describe that bad behavior.
> >
> >> For example, if we grab the lock after checking list_empty, the page may
> >> already be removed from list in split_huge_page_list. And then list_del_init
> >> would trigger bug.
> >
> >And how does list_empty check under the lock guarantee that the page is
> >on the deferred list?
>
> Just one confusion, is this kind of description basic concept of concurrent
> programming? How detail level we need to describe the effect?

When I write changelogs for patches like this I usually describe, what
is the potential race - e.g.

	CPU1				CPU2
	path1				path2
	check				lock
					operation2
					unlock
	lock
	# check might not hold anymore
	operation1
	unlock

and what is the effect of the race - e.g. a crash, data corruption,
pointless attempt for operation1 which fails with user visible effect
etc.
This helps reviewers and everybody reading the code in the future to
understand the locking scheme.

> To me, grab the lock before accessing the critical section is obvious.

It might be obvious but in many cases it is useful to minimize the
locking and do a potentially race check before the lock is taken if the
resulting operation can handle that.

> list_empty and list_del should be the critical section. And the
> lock should protect the whole critical section instead of part of it.

I am not disputing that. What I am trying to say is that the changelog
should described the problem in the first place.

Moreover, look at the code you are trying to fix. Sure extending the
locking seem straightforward but does it result in a correct code
though? See my question in the previous email. How do we know that the
page is actually enqued in a non-empty list?
On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: >On Wed 08-01-20 08:35:43, Wei Yang wrote: >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: >> >On Tue 07-01-20 09:22:41, Wei Yang wrote: >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> >> Current implementation may face a race condition. >> >> > >> >> >Please always make sure to describe the effect of the change. Why a racy >> >> >list_empty check matters? >> >> > >> >> >> >> Hmm... access the list without proper lock leads to many bad behaviors. >> > >> >My point is that the changelog should describe that bad behavior. >> > >> >> For example, if we grab the lock after checking list_empty, the page may >> >> already be removed from list in split_huge_page_list. And then list_del_init >> >> would trigger bug. >> > >> >And how does list_empty check under the lock guarantee that the page is >> >on the deferred list? >> >> Just one confusion, is this kind of description basic concept of concurrent >> programming? How detail level we need to describe the effect? > >When I write changelogs for patches like this I usually describe, what >is the potential race - e.g. > CPU1 CPU2 > path1 path2 > check lock > operation2 > unlock > lock > # check might not hold anymore > operation1 > unlock > Nice, I would prepare a changelog like this. >and what is the effect of the race - e.g. a crash, data corruption, >pointless attempt for operation1 which fails with user visible effect >etc. >This helps reviewers and everybody reading the code in the future to >understand the locking scheme. > >> To me, grab the lock before accessing the critical section is obvious. > >It might be obvious but in many cases it is useful to minimize the >locking and do a potentially race check before the lock is taken if the >resulting operation can handle that. > >> list_empty and list_del should be the critical section. And the >> lock should protect the whole critical section instead of part of it. > >I am not disputing that. What I am trying to say is that the changelog >should described the problem in the first place. > >Moreover, look at the code you are trying to fix. Sure extending the >locking seem straightforward but does it result in a correct code >though? See my question in the previous email. How do we know that the >page is actually enqued in a non-empty list? I may not get your point for the last sentence. The list_empty() doesn't check the queue is empty but check the list, here is the page, is not enqueued into any list. Is this your concern? >-- >Michal Hocko >SUSE Labs
On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
>On Wed 08-01-20 08:35:43, Wei Yang wrote:
>> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
>> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
>> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> >> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> >> Current implementation may face a race condition.
>> >> >
>> >> >Please always make sure to describe the effect of the change. Why a racy
>> >> >list_empty check matters?
>> >> >
>> >>
>> >> Hmm... access the list without proper lock leads to many bad behaviors.
>> >
>> >My point is that the changelog should describe that bad behavior.
>> >
>> >> For example, if we grab the lock after checking list_empty, the page may
>> >> already be removed from list in split_huge_page_list. And then list_del_init
>> >> would trigger bug.
>> >
>> >And how does list_empty check under the lock guarantee that the page is
>> >on the deferred list?
>>
>> Just one confusion, is this kind of description basic concept of concurrent
>> programming? How detail level we need to describe the effect?
>
>When I write changelogs for patches like this I usually describe, what
>is the potential race - e.g.
>	CPU1				CPU2
>	path1				path2
>	check				lock
>					operation2
>					unlock
>	lock
>	# check might not hold anymore
>	operation1
>	unlock
>
>and what is the effect of the race - e.g. a crash, data corruption,
>pointless attempt for operation1 which fails with user visible effect
>etc.

Hi, Michal, here is my attempt for an example. Hope this one looks good to
you.


For example, the potential race would be:

	CPU1				CPU2
	mem_cgroup_move_account		split_huge_page_to_list
	  !list_empty
					lock
					!list_empty
					list_del
					unlock
	  lock
	  # !list_empty might not hold anymore
	  list_del_init
	  unlock

When this sequence happens, the list_del_init() in
mem_cgroup_move_account() would crash since the page is already been
removed by list_del in split_huge_page_to_list().
On Thu 09-01-20 10:03:19, Wei Yang wrote: > On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: [...] > >Moreover, look at the code you are trying to fix. Sure extending the > >locking seem straightforward but does it result in a correct code > >though? See my question in the previous email. How do we know that the > >page is actually enqued in a non-empty list? > > I may not get your point for the last sentence. > > The list_empty() doesn't check the queue is empty but check the list, here is > the page, is not enqueued into any list. Is this your concern? My bad. For some reason I have misread the code and thought this was get_deferred_split_queue rather than page_deferred_list. Sorry about the confusion.
On Thu 09-01-20 11:18:21, Wei Yang wrote: > On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: > >On Wed 08-01-20 08:35:43, Wei Yang wrote: > >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: > >> >On Tue 07-01-20 09:22:41, Wei Yang wrote: > >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: > >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: > >> >> >> As all the other places, we grab the lock before manipulate the defer list. > >> >> >> Current implementation may face a race condition. > >> >> > > >> >> >Please always make sure to describe the effect of the change. Why a racy > >> >> >list_empty check matters? > >> >> > > >> >> > >> >> Hmm... access the list without proper lock leads to many bad behaviors. > >> > > >> >My point is that the changelog should describe that bad behavior. > >> > > >> >> For example, if we grab the lock after checking list_empty, the page may > >> >> already be removed from list in split_huge_page_list. And then list_del_init > >> >> would trigger bug. > >> > > >> >And how does list_empty check under the lock guarantee that the page is > >> >on the deferred list? > >> > >> Just one confusion, is this kind of description basic concept of concurrent > >> programming? How detail level we need to describe the effect? > > > >When I write changelogs for patches like this I usually describe, what > >is the potential race - e.g. > > CPU1 CPU2 > > path1 path2 > > check lock > > operation2 > > unlock > > lock > > # check might not hold anymore > > operation1 > > unlock > > > >and what is the effect of the race - e.g. a crash, data corruption, > >pointless attempt for operation1 which fails with user visible effect > >etc. > > Hi, Michal, here is my attempt for an example. Hope this one looks good to > you. > > > For example, the potential race would be: > > CPU1 CPU2 > mem_cgroup_move_account split_huge_page_to_list > !list_empty > lock > !list_empty > list_del > unlock > lock > # !list_empty might not hold anymore > list_del_init > unlock > > When this sequence happens, the list_del_init() in > mem_cgroup_move_account() would crash since the page is already been > removed by list_del in split_huge_page_to_list(). Yes this looks much more informative. I would just add that this will crash if CONFIG_DEBUG_LIST. Thanks!
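Some context on the CONFIG_DEBUG_LIST remark, based on my reading of the kernel sources around v5.4 (an editorial note, not part of the thread): the splitting path removes the page with list_del(), which leaves poison values in the entry, roughly as below (simplified from include/linux/list.h, not verbatim).

	static inline void list_del(struct list_head *entry)
	{
		__list_del_entry(entry);	/* unlink from the list */
		entry->next = LIST_POISON1;	/* poison values from include/linux/poison.h */
		entry->prev = LIST_POISON2;
	}

A later list_del_init() on the same entry then dereferences those poisoned pointers. Without CONFIG_DEBUG_LIST that is a fault on a poison address; with CONFIG_DEBUG_LIST, if I read lib/list_debug.c correctly, __list_del_entry_valid() detects the poisoned entry first and reports the list corruption explicitly, which is the crash Michal refers to.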
On Thu, Jan 09, 2020 at 09:36:41AM +0100, Michal Hocko wrote: >On Thu 09-01-20 11:18:21, Wei Yang wrote: >> On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: >> >On Wed 08-01-20 08:35:43, Wei Yang wrote: >> >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: >> >> >On Tue 07-01-20 09:22:41, Wei Yang wrote: >> >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >> >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> >> >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> >> >> Current implementation may face a race condition. >> >> >> > >> >> >> >Please always make sure to describe the effect of the change. Why a racy >> >> >> >list_empty check matters? >> >> >> > >> >> >> >> >> >> Hmm... access the list without proper lock leads to many bad behaviors. >> >> > >> >> >My point is that the changelog should describe that bad behavior. >> >> > >> >> >> For example, if we grab the lock after checking list_empty, the page may >> >> >> already be removed from list in split_huge_page_list. And then list_del_init >> >> >> would trigger bug. >> >> > >> >> >And how does list_empty check under the lock guarantee that the page is >> >> >on the deferred list? >> >> >> >> Just one confusion, is this kind of description basic concept of concurrent >> >> programming? How detail level we need to describe the effect? >> > >> >When I write changelogs for patches like this I usually describe, what >> >is the potential race - e.g. >> > CPU1 CPU2 >> > path1 path2 >> > check lock >> > operation2 >> > unlock >> > lock >> > # check might not hold anymore >> > operation1 >> > unlock >> > >> >and what is the effect of the race - e.g. a crash, data corruption, >> >pointless attempt for operation1 which fails with user visible effect >> >etc. >> >> Hi, Michal, here is my attempt for an example. Hope this one looks good to >> you. >> >> >> For example, the potential race would be: >> >> CPU1 CPU2 >> mem_cgroup_move_account split_huge_page_to_list >> !list_empty >> lock >> !list_empty >> list_del >> unlock >> lock >> # !list_empty might not hold anymore >> list_del_init >> unlock >> >> When this sequence happens, the list_del_init() in >> mem_cgroup_move_account() would crash since the page is already been >> removed by list_del in split_huge_page_to_list(). > >Yes this looks much more informative. I would just add that this will >crash if CONFIG_DEBUG_LIST. > >Thanks! Glad you like it~ Will prepare v2 with your suggestion :-) >-- >Michal Hocko >SUSE Labs
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bc01423277c5..62b7ec34ef1a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	spin_lock(&from->deferred_split_queue.split_queue_lock);
 	if (compound && !list_empty(page_deferred_list(page))) {
-		spin_lock(&from->deferred_split_queue.split_queue_lock);
 		list_del_init(page_deferred_list(page));
 		from->deferred_split_queue.split_queue_len--;
-		spin_unlock(&from->deferred_split_queue.split_queue_lock);
 	}
+	spin_unlock(&from->deferred_split_queue.split_queue_lock);
 #endif
 	/*
 	 * It is safe to change page->mem_cgroup here because the page
@@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page,
 	page->mem_cgroup = to;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	spin_lock(&to->deferred_split_queue.split_queue_lock);
 	if (compound && list_empty(page_deferred_list(page))) {
-		spin_lock(&to->deferred_split_queue.split_queue_lock);
 		list_add_tail(page_deferred_list(page),
 			      &to->deferred_split_queue.split_queue);
 		to->deferred_split_queue.split_queue_len++;
-		spin_unlock(&to->deferred_split_queue.split_queue_lock);
 	}
+	spin_unlock(&to->deferred_split_queue.split_queue_lock);
 #endif
 
 	spin_unlock_irqrestore(&from->move_lock, flags);
As all the other places, we grab the lock before manipulate the defer list.
Current implementation may face a race condition.

Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

---
I notice the difference during code reading and just confused about the
difference. No specific test is done since limited knowledge about cgroup.

Maybe I miss something important?
---
 mm/memcontrol.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)