Message ID | 20200103143407.1089-1-richardw.yang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] mm: thp: grab the lock before manipulation defer list |
On Fri, 3 Jan 2020, Wei Yang wrote: > As all the other places, we grab the lock before manipulate the defer list. > Current implementation may face a race condition. > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > --- > I notice the difference during code reading and just confused about the > difference. No specific test is done since limited knowledge about cgroup. > > Maybe I miss something important? The check for !list_empty(page_deferred_list(page)) must certainly be serialized with doing list_del_init(page_deferred_list(page)). > --- > mm/memcontrol.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index bc01423277c5..62b7ec34ef1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&from->deferred_split_queue.split_queue_lock); > if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > list_del_init(page_deferred_list(page)); > from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&from->deferred_split_queue.split_queue_lock); > #endif > /* > * It is safe to change page->mem_cgroup here because the page > @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page, > page->mem_cgroup = to; > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&to->deferred_split_queue.split_queue_lock); > if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > list_add_tail(page_deferred_list(page), > &to->deferred_split_queue.split_queue); > to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&to->deferred_split_queue.split_queue_lock); > #endif > > spin_unlock_irqrestore(&from->move_lock, flags); > -- > 2.17.1 > > >
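For readers skimming the archive: David's point is the classic check-then-act race. The !list_empty() test and the list_del_init() form one logical operation, so if the lock only covers the second half, another CPU can remove the entry in between. Below is a small standalone userspace sketch of that pattern in plain C with pthreads; the list helpers, the thread names and the file name are purely illustrative stand-ins, not the kernel code or the kernel's actual call paths.

	/* build (hypothetical file name): cc -pthread -o defer-race defer-race.c */
	#include <pthread.h>
	#include <stdio.h>

	struct list_head { struct list_head *next, *prev; };

	static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
	static int list_empty(const struct list_head *h) { return h->next == h; }

	static void list_del_init(struct list_head *e)	/* unlink and reinitialize */
	{
		e->prev->next = e->next;
		e->next->prev = e->prev;
		INIT_LIST_HEAD(e);
	}

	static pthread_mutex_t split_queue_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct list_head queue;	/* stands in for a deferred split queue */
	static struct list_head entry;	/* stands in for one page's deferred list linkage */

	/*
	 * Racy shape (the code before the patch): the emptiness check runs
	 * outside the lock.  Between the check and the lock the other thread
	 * can remove 'entry'.  In this sketch the second removal is only a
	 * silent no-op; in the kernel the splitting path removes the entry
	 * with the poisoning list_del(), so a late list_del_init() would
	 * dereference poison pointers instead.
	 */
	static void *mover(void *arg)
	{
		(void)arg;
		if (!list_empty(&entry)) {
			pthread_mutex_lock(&split_queue_lock);
			list_del_init(&entry);	/* may act on an already-removed entry */
			pthread_mutex_unlock(&split_queue_lock);
		}
		return NULL;
	}

	/* Correct shape (the patch): check and removal share one critical section. */
	static void *splitter(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&split_queue_lock);
		if (!list_empty(&entry))
			list_del_init(&entry);
		pthread_mutex_unlock(&split_queue_lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		INIT_LIST_HEAD(&queue);
		/* enqueue 'entry' on 'queue' */
		entry.next = &queue;
		entry.prev = &queue;
		queue.next = &entry;
		queue.prev = &entry;

		pthread_create(&a, NULL, mover, NULL);
		pthread_create(&b, NULL, splitter, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		printf("queue empty afterwards: %d\n", list_empty(&queue));
		return 0;
	}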
On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote: >On Fri, 3 Jan 2020, Wei Yang wrote: > >> As all the other places, we grab the lock before manipulate the defer list. >> Current implementation may face a race condition. >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> >> --- >> I notice the difference during code reading and just confused about the >> difference. No specific test is done since limited knowledge about cgroup. >> >> Maybe I miss something important? > >The check for !list_empty(page_deferred_list(page)) must certainly be >serialized with doing list_del_init(page_deferred_list(page)). > Hi David Would you mind giving more information? You mean list_empty and list_del_init is atomic?
On Sat, 4 Jan 2020, Wei Yang wrote: > On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote: > >On Fri, 3 Jan 2020, Wei Yang wrote: > > > >> As all the other places, we grab the lock before manipulate the defer list. > >> Current implementation may face a race condition. > >> > >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > >> > >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > >> > >> --- > >> I notice the difference during code reading and just confused about the > >> difference. No specific test is done since limited knowledge about cgroup. > >> > >> Maybe I miss something important? > > > >The check for !list_empty(page_deferred_list(page)) must certainly be > >serialized with doing list_del_init(page_deferred_list(page)). > > > > Hi David > > Would you mind giving more information? You mean list_empty and list_del_init > is atomic? > I mean your patch is obviously correct :) It should likely also have a stable@vger.kernel.org # 5.4+ Acked-by: David Rientjes <rientjes@google.com>
On Fri, Jan 03, 2020 at 04:44:59PM -0800, David Rientjes wrote: >On Sat, 4 Jan 2020, Wei Yang wrote: > >> On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote: >> >On Fri, 3 Jan 2020, Wei Yang wrote: >> > >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> Current implementation may face a race condition. >> >> >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> >> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> >> >> >> --- >> >> I notice the difference during code reading and just confused about the >> >> difference. No specific test is done since limited knowledge about cgroup. >> >> >> >> Maybe I miss something important? >> > >> >The check for !list_empty(page_deferred_list(page)) must certainly be >> >serialized with doing list_del_init(page_deferred_list(page)). >> > >> >> Hi David >> >> Would you mind giving more information? You mean list_empty and list_del_init >> is atomic? >> > >I mean your patch is obviously correct :) It should likely also have a >stable@vger.kernel.org # 5.4+ Ah, my poor English ;-) > >Acked-by: David Rientjes <rientjes@google.com>
On Fri 03-01-20 22:34:07, Wei Yang wrote: > As all the other places, we grab the lock before manipulate the defer list. > Current implementation may face a race condition. Please always make sure to describe the effect of the change. Why a racy list_empty check matters? > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > --- > I notice the difference during code reading and just confused about the > difference. No specific test is done since limited knowledge about cgroup. > > Maybe I miss something important? > --- > mm/memcontrol.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index bc01423277c5..62b7ec34ef1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&from->deferred_split_queue.split_queue_lock); > if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > list_del_init(page_deferred_list(page)); > from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&from->deferred_split_queue.split_queue_lock); > #endif > /* > * It is safe to change page->mem_cgroup here because the page > @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page, > page->mem_cgroup = to; > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&to->deferred_split_queue.split_queue_lock); > if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > list_add_tail(page_deferred_list(page), > &to->deferred_split_queue.split_queue); > to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&to->deferred_split_queue.split_queue_lock); > #endif > > spin_unlock_irqrestore(&from->move_lock, flags); > -- > 2.17.1
On Fri, Jan 3, 2020 at 6:34 AM Wei Yang <richardw.yang@linux.intel.com> wrote: > > As all the other places, we grab the lock before manipulate the defer list. > Current implementation may face a race condition. > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > --- > I notice the difference during code reading and just confused about the > difference. No specific test is done since limited knowledge about cgroup. > > Maybe I miss something important? > --- > mm/memcontrol.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index bc01423277c5..62b7ec34ef1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&from->deferred_split_queue.split_queue_lock); > if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > list_del_init(page_deferred_list(page)); > from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&from->deferred_split_queue.split_queue_lock); > #endif > /* > * It is safe to change page->mem_cgroup here because the page So I suspect the lock placement has to do with the compound boolean value passed to the function. One thing you might want to do is pull the "if (compound)" check out and place it outside of the spinlock check. It would then simplify this signficantly so it is something like if (compound) { spin_lock(); list = page_deferred_list(page); if (!list_empty(list)) { list_del_init(list); from->..split_queue_len--; } spin_unlock(); } Same for the block below. I would pull the check for compound outside of the spinlock call since it is a value that shouldn't change and would eliminate an unnecessary lock in the non-compound case. > @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page, > page->mem_cgroup = to; > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > + spin_lock(&to->deferred_split_queue.split_queue_lock); > if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > list_add_tail(page_deferred_list(page), > &to->deferred_split_queue.split_queue); > to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > } > + spin_unlock(&to->deferred_split_queue.split_queue_lock); > #endif > > spin_unlock_irqrestore(&from->move_lock, flags); > --
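Concretely, folding Alexander's suggestion into the hunk quoted above, the "from" side of mem_cgroup_move_account() could end up looking roughly like the sketch below. The field and helper names are taken from the quoted diff; this is only an illustration of the proposed restructuring, not a tested patch, and the "to" side would change in the same way with list_add_tail() and split_queue_len++.

	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		if (compound) {
			struct list_head *list = page_deferred_list(page);

			spin_lock(&from->deferred_split_queue.split_queue_lock);
			if (!list_empty(list)) {
				list_del_init(list);
				from->deferred_split_queue.split_queue_len--;
			}
			spin_unlock(&from->deferred_split_queue.split_queue_lock);
		}
	#endif

Taking the spinlock only when compound is true keeps the lock/unlock out of the common non-compound path, which is the saving Alexander is after.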
On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> As all the other places, we grab the lock before manipulate the defer list. >> Current implementation may face a race condition. > >Please always make sure to describe the effect of the change. Why a racy >list_empty check matters? > Hmm... access the list without proper lock leads to many bad behaviors. For example, if we grab the lock after checking list_empty, the page may already be removed from list in split_huge_page_list. And then list_del_init would trigger bug.
On Mon, Jan 06, 2020 at 08:18:34AM -0800, Alexander Duyck wrote: >On Fri, Jan 3, 2020 at 6:34 AM Wei Yang <richardw.yang@linux.intel.com> wrote: >> >> As all the other places, we grab the lock before manipulate the defer list. >> Current implementation may face a race condition. >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> >> --- >> I notice the difference during code reading and just confused about the >> difference. No specific test is done since limited knowledge about cgroup. >> >> Maybe I miss something important? >> --- >> mm/memcontrol.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index bc01423277c5..62b7ec34ef1a 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page, >> } >> >> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >> + spin_lock(&from->deferred_split_queue.split_queue_lock); >> if (compound && !list_empty(page_deferred_list(page))) { >> - spin_lock(&from->deferred_split_queue.split_queue_lock); >> list_del_init(page_deferred_list(page)); >> from->deferred_split_queue.split_queue_len--; >> - spin_unlock(&from->deferred_split_queue.split_queue_lock); >> } >> + spin_unlock(&from->deferred_split_queue.split_queue_lock); >> #endif >> /* >> * It is safe to change page->mem_cgroup here because the page > >So I suspect the lock placement has to do with the compound boolean >value passed to the function. > Hey, Alexander Thanks for your comment. >One thing you might want to do is pull the "if (compound)" check out >and place it outside of the spinlock check. It would then simplify >this signficantly so it is something like >if (compound) { > spin_lock(); > list = page_deferred_list(page); > if (!list_empty(list)) { > list_del_init(list); > from->..split_queue_len--; > } > spin_unlock(); >} > >Same for the block below. I would pull the check for compound outside >of the spinlock call since it is a value that shouldn't change and >would eliminate an unnecessary lock in the non-compound case. This is reasonable, if no objection from others, I would change this in v2.
On Tue, 7 Jan 2020, Wei Yang wrote: > >One thing you might want to do is pull the "if (compound)" check out > >and place it outside of the spinlock check. It would then simplify > >this signficantly so it is something like > >if (compound) { > > spin_lock(); > > list = page_deferred_list(page); > > if (!list_empty(list)) { > > list_del_init(list); > > from->..split_queue_len--; > > } > > spin_unlock(); > >} > > > >Same for the block below. I would pull the check for compound outside > >of the spinlock call since it is a value that shouldn't change and > >would eliminate an unnecessary lock in the non-compound case. > > This is reasonable, if no objection from others, I would change this in v2. Looks fine to me; I don't see it as a necessary improvement but there's also no reason to object to it. It's definitely a patch that is needed, however, for the simple reason that with the existing code we can manipulate the deferred split queue incorrectly so either way works for me. Feel free to keep my acked-by.
On Mon, Jan 06, 2020 at 06:07:29PM -0800, David Rientjes wrote: >On Tue, 7 Jan 2020, Wei Yang wrote: > >> >One thing you might want to do is pull the "if (compound)" check out >> >and place it outside of the spinlock check. It would then simplify >> >this signficantly so it is something like >> >if (compound) { >> > spin_lock(); >> > list = page_deferred_list(page); >> > if (!list_empty(list)) { >> > list_del_init(list); >> > from->..split_queue_len--; >> > } >> > spin_unlock(); >> >} >> > >> >Same for the block below. I would pull the check for compound outside >> >of the spinlock call since it is a value that shouldn't change and >> >would eliminate an unnecessary lock in the non-compound case. >> >> This is reasonable, if no objection from others, I would change this in v2. > >Looks fine to me; I don't see it as a necessary improvement but there's >also no reason to object to it. It's definitely a patch that is needed, >however, for the simple reason that with the existing code we can >manipulate the deferred split queue incorrectly so either way works for >me. Feel free to keep my acked-by. Ah, thanks David. You are so supportive.
On Tue 07-01-20 09:22:41, Wei Yang wrote: > On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: > >On Fri 03-01-20 22:34:07, Wei Yang wrote: > >> As all the other places, we grab the lock before manipulate the defer list. > >> Current implementation may face a race condition. > > > >Please always make sure to describe the effect of the change. Why a racy > >list_empty check matters? > > > > Hmm... access the list without proper lock leads to many bad behaviors. My point is that the changelog should describe that bad behavior. > For example, if we grab the lock after checking list_empty, the page may > already be removed from list in split_huge_page_list. And then list_del_init > would trigger bug. And how does list_empty check under the lock guarantee that the page is on the deferred list?
On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: >On Tue 07-01-20 09:22:41, Wei Yang wrote: >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> Current implementation may face a race condition. >> > >> >Please always make sure to describe the effect of the change. Why a racy >> >list_empty check matters? >> > >> >> Hmm... access the list without proper lock leads to many bad behaviors. > >My point is that the changelog should describe that bad behavior. > >> For example, if we grab the lock after checking list_empty, the page may >> already be removed from list in split_huge_page_list. And then list_del_init >> would trigger bug. > >And how does list_empty check under the lock guarantee that the page is >on the deferred list? Just one confusion, is this kind of description basic concept of concurrent programming? How detail level we need to describe the effect? To me, grab the lock before accessing the critical section is obvious. list_empty and list_del should be the critical section. And the lock should protect the whole critical section instead of part of it. >-- >Michal Hocko >SUSE Labs
On Wed 08-01-20 08:35:43, Wei Yang wrote:
> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> >> As all the other places, we grab the lock before manipulate the defer list.
> >> >> Current implementation may face a race condition.
> >> >
> >> >Please always make sure to describe the effect of the change. Why a racy
> >> >list_empty check matters?
> >> >
> >>
> >> Hmm... access the list without proper lock leads to many bad behaviors.
> >
> >My point is that the changelog should describe that bad behavior.
> >
> >> For example, if we grab the lock after checking list_empty, the page may
> >> already be removed from list in split_huge_page_list. And then list_del_init
> >> would trigger bug.
> >
> >And how does list_empty check under the lock guarantee that the page is
> >on the deferred list?
>
> Just one confusion, is this kind of description basic concept of concurrent
> programming? How detail level we need to describe the effect?

When I write changelogs for patches like this I usually describe, what
is the potential race - e.g.

	CPU1				CPU2
	path1				path2
	check				lock
					operation2
					unlock
	lock
	# check might not hold anymore
	operation1
	unlock

and what is the effect of the race - e.g. a crash, data corruption,
pointless attempt for operation1 which fails with user visible effect
etc.
This helps reviewers and everybody reading the code in the future to
understand the locking scheme.

> To me, grab the lock before accessing the critical section is obvious.

It might be obvious but in many cases it is useful to minimize the
locking and do a potentially race check before the lock is taken if the
resulting operation can handle that.

> list_empty and list_del should be the critical section. And the
> lock should protect the whole critical section instead of part of it.

I am not disputing that. What I am trying to say is that the changelog
should described the problem in the first place.

Moreover, look at the code you are trying to fix. Sure extending the
locking seem straightforward but does it result in a correct code
though? See my question in the previous email. How do we know that the
page is actually enqued in a non-empty list?
On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: >On Wed 08-01-20 08:35:43, Wei Yang wrote: >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: >> >On Tue 07-01-20 09:22:41, Wei Yang wrote: >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> >> Current implementation may face a race condition. >> >> > >> >> >Please always make sure to describe the effect of the change. Why a racy >> >> >list_empty check matters? >> >> > >> >> >> >> Hmm... access the list without proper lock leads to many bad behaviors. >> > >> >My point is that the changelog should describe that bad behavior. >> > >> >> For example, if we grab the lock after checking list_empty, the page may >> >> already be removed from list in split_huge_page_list. And then list_del_init >> >> would trigger bug. >> > >> >And how does list_empty check under the lock guarantee that the page is >> >on the deferred list? >> >> Just one confusion, is this kind of description basic concept of concurrent >> programming? How detail level we need to describe the effect? > >When I write changelogs for patches like this I usually describe, what >is the potential race - e.g. > CPU1 CPU2 > path1 path2 > check lock > operation2 > unlock > lock > # check might not hold anymore > operation1 > unlock > Nice, I would prepare a changelog like this. >and what is the effect of the race - e.g. a crash, data corruption, >pointless attempt for operation1 which fails with user visible effect >etc. >This helps reviewers and everybody reading the code in the future to >understand the locking scheme. > >> To me, grab the lock before accessing the critical section is obvious. > >It might be obvious but in many cases it is useful to minimize the >locking and do a potentially race check before the lock is taken if the >resulting operation can handle that. > >> list_empty and list_del should be the critical section. And the >> lock should protect the whole critical section instead of part of it. > >I am not disputing that. What I am trying to say is that the changelog >should described the problem in the first place. > >Moreover, look at the code you are trying to fix. Sure extending the >locking seem straightforward but does it result in a correct code >though? See my question in the previous email. How do we know that the >page is actually enqued in a non-empty list? I may not get your point for the last sentence. The list_empty() doesn't check the queue is empty but check the list, here is the page, is not enqueued into any list. Is this your concern? >-- >Michal Hocko >SUSE Labs
On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
>On Wed 08-01-20 08:35:43, Wei Yang wrote:
>> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
>> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
>> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> >> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> >> Current implementation may face a race condition.
>> >> >
>> >> >Please always make sure to describe the effect of the change. Why a racy
>> >> >list_empty check matters?
>> >> >
>> >>
>> >> Hmm... access the list without proper lock leads to many bad behaviors.
>> >
>> >My point is that the changelog should describe that bad behavior.
>> >
>> >> For example, if we grab the lock after checking list_empty, the page may
>> >> already be removed from list in split_huge_page_list. And then list_del_init
>> >> would trigger bug.
>> >
>> >And how does list_empty check under the lock guarantee that the page is
>> >on the deferred list?
>>
>> Just one confusion, is this kind of description basic concept of concurrent
>> programming? How detail level we need to describe the effect?
>
>When I write changelogs for patches like this I usually describe, what
>is the potential race - e.g.
>	CPU1				CPU2
>	path1				path2
>	check				lock
>					operation2
>					unlock
>	lock
>	# check might not hold anymore
>	operation1
>	unlock
>
>and what is the effect of the race - e.g. a crash, data corruption,
>pointless attempt for operation1 which fails with user visible effect
>etc.

Hi, Michal, here is my attempt for an example. Hope this one looks good to
you.


For example, the potential race would be:

	CPU1				CPU2
	mem_cgroup_move_account		split_huge_page_to_list
	  !list_empty
					lock
					!list_empty
					list_del
					unlock
	  lock
	  # !list_empty might not hold anymore
	  list_del_init
	  unlock

When this sequence happens, the list_del_init() in
mem_cgroup_move_account() would crash since the page is already been
removed by list_del in split_huge_page_to_list().
On Thu 09-01-20 10:03:19, Wei Yang wrote: > On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: [...] > >Moreover, look at the code you are trying to fix. Sure extending the > >locking seem straightforward but does it result in a correct code > >though? See my question in the previous email. How do we know that the > >page is actually enqued in a non-empty list? > > I may not get your point for the last sentence. > > The list_empty() doesn't check the queue is empty but check the list, here is > the page, is not enqueued into any list. Is this your concern? My bad. For some reason I have misread the code and thought this was get_deferred_split_queue rather than page_deferred_list. Sorry about the confusion.
On Thu 09-01-20 11:18:21, Wei Yang wrote: > On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: > >On Wed 08-01-20 08:35:43, Wei Yang wrote: > >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: > >> >On Tue 07-01-20 09:22:41, Wei Yang wrote: > >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: > >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: > >> >> >> As all the other places, we grab the lock before manipulate the defer list. > >> >> >> Current implementation may face a race condition. > >> >> > > >> >> >Please always make sure to describe the effect of the change. Why a racy > >> >> >list_empty check matters? > >> >> > > >> >> > >> >> Hmm... access the list without proper lock leads to many bad behaviors. > >> > > >> >My point is that the changelog should describe that bad behavior. > >> > > >> >> For example, if we grab the lock after checking list_empty, the page may > >> >> already be removed from list in split_huge_page_list. And then list_del_init > >> >> would trigger bug. > >> > > >> >And how does list_empty check under the lock guarantee that the page is > >> >on the deferred list? > >> > >> Just one confusion, is this kind of description basic concept of concurrent > >> programming? How detail level we need to describe the effect? > > > >When I write changelogs for patches like this I usually describe, what > >is the potential race - e.g. > > CPU1 CPU2 > > path1 path2 > > check lock > > operation2 > > unlock > > lock > > # check might not hold anymore > > operation1 > > unlock > > > >and what is the effect of the race - e.g. a crash, data corruption, > >pointless attempt for operation1 which fails with user visible effect > >etc. > > Hi, Michal, here is my attempt for an example. Hope this one looks good to > you. > > > For example, the potential race would be: > > CPU1 CPU2 > mem_cgroup_move_account split_huge_page_to_list > !list_empty > lock > !list_empty > list_del > unlock > lock > # !list_empty might not hold anymore > list_del_init > unlock > > When this sequence happens, the list_del_init() in > mem_cgroup_move_account() would crash since the page is already been > removed by list_del in split_huge_page_to_list(). Yes this looks much more informative. I would just add that this will crash if CONFIG_DEBUG_LIST. Thanks!
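Some context on the CONFIG_DEBUG_LIST remark, based on my reading of the kernel sources around v5.4 (an editorial note, not part of the thread): the splitting path removes the page with list_del(), which leaves poison values in the entry, roughly as below (simplified from include/linux/list.h, not verbatim).

	static inline void list_del(struct list_head *entry)
	{
		__list_del_entry(entry);	/* unlink from the list */
		entry->next = LIST_POISON1;	/* poison values from include/linux/poison.h */
		entry->prev = LIST_POISON2;
	}

A later list_del_init() on the same entry then dereferences those poisoned pointers. Without CONFIG_DEBUG_LIST that is a fault on a poison address; with CONFIG_DEBUG_LIST, if I read lib/list_debug.c correctly, __list_del_entry_valid() detects the poisoned entry first and reports the list corruption explicitly, which is the crash Michal refers to.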
On Thu, Jan 09, 2020 at 09:36:41AM +0100, Michal Hocko wrote: >On Thu 09-01-20 11:18:21, Wei Yang wrote: >> On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote: >> >On Wed 08-01-20 08:35:43, Wei Yang wrote: >> >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote: >> >> >On Tue 07-01-20 09:22:41, Wei Yang wrote: >> >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote: >> >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote: >> >> >> >> As all the other places, we grab the lock before manipulate the defer list. >> >> >> >> Current implementation may face a race condition. >> >> >> > >> >> >> >Please always make sure to describe the effect of the change. Why a racy >> >> >> >list_empty check matters? >> >> >> > >> >> >> >> >> >> Hmm... access the list without proper lock leads to many bad behaviors. >> >> > >> >> >My point is that the changelog should describe that bad behavior. >> >> > >> >> >> For example, if we grab the lock after checking list_empty, the page may >> >> >> already be removed from list in split_huge_page_list. And then list_del_init >> >> >> would trigger bug. >> >> > >> >> >And how does list_empty check under the lock guarantee that the page is >> >> >on the deferred list? >> >> >> >> Just one confusion, is this kind of description basic concept of concurrent >> >> programming? How detail level we need to describe the effect? >> > >> >When I write changelogs for patches like this I usually describe, what >> >is the potential race - e.g. >> > CPU1 CPU2 >> > path1 path2 >> > check lock >> > operation2 >> > unlock >> > lock >> > # check might not hold anymore >> > operation1 >> > unlock >> > >> >and what is the effect of the race - e.g. a crash, data corruption, >> >pointless attempt for operation1 which fails with user visible effect >> >etc. >> >> Hi, Michal, here is my attempt for an example. Hope this one looks good to >> you. >> >> >> For example, the potential race would be: >> >> CPU1 CPU2 >> mem_cgroup_move_account split_huge_page_to_list >> !list_empty >> lock >> !list_empty >> list_del >> unlock >> lock >> # !list_empty might not hold anymore >> list_del_init >> unlock >> >> When this sequence happens, the list_del_init() in >> mem_cgroup_move_account() would crash since the page is already been >> removed by list_del in split_huge_page_to_list(). > >Yes this looks much more informative. I would just add that this will >crash if CONFIG_DEBUG_LIST. > >Thanks! Glad you like it~ Will prepare v2 with your suggestion :-) >-- >Michal Hocko >SUSE Labs
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bc01423277c5..62b7ec34ef1a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	spin_lock(&from->deferred_split_queue.split_queue_lock);
 	if (compound && !list_empty(page_deferred_list(page))) {
-		spin_lock(&from->deferred_split_queue.split_queue_lock);
 		list_del_init(page_deferred_list(page));
 		from->deferred_split_queue.split_queue_len--;
-		spin_unlock(&from->deferred_split_queue.split_queue_lock);
 	}
+	spin_unlock(&from->deferred_split_queue.split_queue_lock);
 #endif
 	/*
 	 * It is safe to change page->mem_cgroup here because the page
@@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page,
 	page->mem_cgroup = to;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	spin_lock(&to->deferred_split_queue.split_queue_lock);
 	if (compound && list_empty(page_deferred_list(page))) {
-		spin_lock(&to->deferred_split_queue.split_queue_lock);
 		list_add_tail(page_deferred_list(page),
 			      &to->deferred_split_queue.split_queue);
 		to->deferred_split_queue.split_queue_len++;
-		spin_unlock(&to->deferred_split_queue.split_queue_lock);
 	}
+	spin_unlock(&to->deferred_split_queue.split_queue_lock);
 #endif
 
 	spin_unlock_irqrestore(&from->move_lock, flags);
As all the other places, we grab the lock before manipulate the defer list.
Current implementation may face a race condition.

Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

---
I notice the difference during code reading and just confused about the
difference. No specific test is done since limited knowledge about cgroup.

Maybe I miss something important?
---
 mm/memcontrol.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)