Message ID | 20200117233836.3434-1-richardw.yang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] mm: thp: remove the defer list related code since this will not happen | expand |
On 1/17/20 3:38 PM, Wei Yang wrote: > If compound is true, this means it is a PMD mapped THP. Which implies > the page is not linked to any defer list. So the first code chunk will > not be executed. > > Also with this reason, it would not be proper to add this page to a > defer list. So the second code chunk is not correct. > > Based on this, we should remove the defer list related code. > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Cc: <stable@vger.kernel.org> [5.4+] > > --- > v4: > * finally we identified the related code is not necessary and not > correct, just remove it > * thanks to Kirill T first spot some problem Thanks for debugging and figuring this out. Acked-by: Yang Shi <yang.shi@linux.alibaba.com> > v3: > * remove all review/ack tag since rewrite the changelog > * use deferred_split_huge_page as the example of race > * add cc stable 5.4+ tag as suggested by David Rientjes > > v2: > * move check on compound outside suggested by Alexander > * an example of the race condition, suggested by Michal > --- > mm/memcontrol.c | 18 ------------------ > 1 file changed, 18 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6c83cf4ed970..27c231bf4565 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page *page, > __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); > } > > -#ifdef CONFIG_TRANSPARENT_HUGEPAGE > - if (compound && !list_empty(page_deferred_list(page))) { > - spin_lock(&from->deferred_split_queue.split_queue_lock); > - list_del_init(page_deferred_list(page)); > - from->deferred_split_queue.split_queue_len--; > - spin_unlock(&from->deferred_split_queue.split_queue_lock); > - } > -#endif > /* > * It is safe to change page->mem_cgroup here because the page > * is referenced, charged, and isolated - we can't race with > @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page *page, > /* caller should have done css_get */ > page->mem_cgroup = to; > > -#ifdef CONFIG_TRANSPARENT_HUGEPAGE > - if (compound && list_empty(page_deferred_list(page))) { > - spin_lock(&to->deferred_split_queue.split_queue_lock); > - list_add_tail(page_deferred_list(page), > - &to->deferred_split_queue.split_queue); > - to->deferred_split_queue.split_queue_len++; > - spin_unlock(&to->deferred_split_queue.split_queue_lock); > - } > -#endif > - > spin_unlock_irqrestore(&from->move_lock, flags); > > ret = 0;
On 1/17/20 4:57 PM, Yang Shi wrote: > > > On 1/17/20 3:38 PM, Wei Yang wrote: >> If compound is true, this means it is a PMD mapped THP. Which implies >> the page is not linked to any defer list. So the first code chunk will >> not be executed. >> >> Also with this reason, it would not be proper to add this page to a >> defer list. So the second code chunk is not correct. >> >> Based on this, we should remove the defer list related code. >> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg >> aware") >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >> Cc: <stable@vger.kernel.org> [5.4+] >> >> --- >> v4: >> * finally we identified the related code is not necessary and not >> correct, just remove it >> * thanks to Kirill T first spot some problem > > Thanks for debugging and figuring this out. Acked-by: Yang Shi > <yang.shi@linux.alibaba.com> BTW, the patch itself is fine, but the subject looks really confusing. It sounds like we would remove all deferred list code. I'd suggest rephrase it to: mm: thp: don't need care deferred split queue in memcg charge move path > >> v3: >> * remove all review/ack tag since rewrite the changelog >> * use deferred_split_huge_page as the example of race >> * add cc stable 5.4+ tag as suggested by David Rientjes >> >> v2: >> * move check on compound outside suggested by Alexander >> * an example of the race condition, suggested by Michal >> --- >> mm/memcontrol.c | 18 ------------------ >> 1 file changed, 18 deletions(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index 6c83cf4ed970..27c231bf4565 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page >> *page, >> __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); >> } >> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE >> - if (compound && !list_empty(page_deferred_list(page))) { >> - spin_lock(&from->deferred_split_queue.split_queue_lock); >> - list_del_init(page_deferred_list(page)); >> - from->deferred_split_queue.split_queue_len--; >> - spin_unlock(&from->deferred_split_queue.split_queue_lock); >> - } >> -#endif >> /* >> * It is safe to change page->mem_cgroup here because the page >> * is referenced, charged, and isolated - we can't race with >> @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page >> *page, >> /* caller should have done css_get */ >> page->mem_cgroup = to; >> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE >> - if (compound && list_empty(page_deferred_list(page))) { >> - spin_lock(&to->deferred_split_queue.split_queue_lock); >> - list_add_tail(page_deferred_list(page), >> - &to->deferred_split_queue.split_queue); >> - to->deferred_split_queue.split_queue_len++; >> - spin_unlock(&to->deferred_split_queue.split_queue_lock); >> - } >> -#endif >> - >> spin_unlock_irqrestore(&from->move_lock, flags); >> ret = 0; >
On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote: > If compound is true, this means it is a PMD mapped THP. Which implies > the page is not linked to any defer list. So the first code chunk will > not be executed. > > Also with this reason, it would not be proper to add this page to a > defer list. So the second code chunk is not correct. > > Based on this, we should remove the defer list related code. > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Cc: <stable@vger.kernel.org> [5.4+] This patch is identical to "mm: thp: grab the lock before manipulating defer list", which is rather confusing. Please let people know when this sort of thing is done. The earlier changelog mentioned a possible race condition. This changelog does not. In fact this changelog fails to provide any description of any userspace-visible runtime effects of the bug. Please send along such a description for inclusion, as always.
On Sat, 18 Jan 2020, Andrew Morton wrote: > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote: > > > If compound is true, this means it is a PMD mapped THP. Which implies > > the page is not linked to any defer list. So the first code chunk will > > not be executed. > > > > Also with this reason, it would not be proper to add this page to a > > defer list. So the second code chunk is not correct. > > > > Based on this, we should remove the defer list related code. > > > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > Cc: <stable@vger.kernel.org> [5.4+] > > This patch is identical to "mm: thp: grab the lock before manipulating > defer list", which is rather confusing. Please let people know when > this sort of thing is done. > > The earlier changelog mentioned a possible race condition. This > changelog does not. In fact this changelog fails to provide any > description of any userspace-visible runtime effects of the bug. > Please send along such a description for inclusion, as always. > The locking concern that Wei was originally looking at is no longer an issue because we determined that the code in question could simply be removed. I think the following can be added to the changelog: ----->o----- When migrating memcg charges of thp memory, there are two possibilities: (1) The underlying compound page is mapped by a pmd and thus does is not on a deferred split queue (it's mapped), or (2) The compound page is not mapped by a pmd and is awaiting split on a deferred split queue. The current charge migration implementation does *not* migrate charges for thp memory on the deferred split queue, it only migrates charges for pages that are mapped by a pmd. Thus, to migrate charges, the underlying compound page cannot be on a deferred split queue; no list manipulation needs to be done in mem_cgroup_move_account(). With the current code, the underlying compound page is moved to the deferred split queue of the memcg its memory is not charged to, so susbequent reclaim will consider these pages for the wrong memcg. Remove the deferred split queue handling in mem_cgroup_move_account() entirely. ----->o----- Acked-by: David Rientjes <rientjes@google.com>
On Sat, Jan 18, 2020 at 03:36:06PM -0800, David Rientjes wrote: >On Sat, 18 Jan 2020, Andrew Morton wrote: > >> On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote: >> >> > If compound is true, this means it is a PMD mapped THP. Which implies >> > the page is not linked to any defer list. So the first code chunk will >> > not be executed. >> > >> > Also with this reason, it would not be proper to add this page to a >> > defer list. So the second code chunk is not correct. >> > >> > Based on this, we should remove the defer list related code. >> > >> > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> > >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >> > Cc: <stable@vger.kernel.org> [5.4+] >> >> This patch is identical to "mm: thp: grab the lock before manipulating >> defer list", which is rather confusing. Please let people know when >> this sort of thing is done. >> >> The earlier changelog mentioned a possible race condition. This >> changelog does not. In fact this changelog fails to provide any >> description of any userspace-visible runtime effects of the bug. >> Please send along such a description for inclusion, as always. >> > >The locking concern that Wei was originally looking at is no longer an >issue because we determined that the code in question could simply be >removed. > >I think the following can be added to the changelog: > >----->o----- > >When migrating memcg charges of thp memory, there are two possibilities: > > (1) The underlying compound page is mapped by a pmd and thus does is not > on a deferred split queue (it's mapped), or > > (2) The compound page is not mapped by a pmd and is awaiting split on a > deferred split queue. > >The current charge migration implementation does *not* migrate charges for >thp memory on the deferred split queue, it only migrates charges for pages >that are mapped by a pmd. > >Thus, to migrate charges, the underlying compound page cannot be on a >deferred split queue; no list manipulation needs to be done in >mem_cgroup_move_account(). > >With the current code, the underlying compound page is moved to the >deferred split queue of the memcg its memory is not charged to, so >susbequent reclaim will consider these pages for the wrong memcg. Remove >the deferred split queue handling in mem_cgroup_move_account() entirely. > >----->o----- > >Acked-by: David Rientjes <rientjes@google.com> Hi David, The changlog looks awesome to me. Thanks ~ Hi Andrew I see you queue this in you tree, do I need to rewrite a patch with better changelog?
On Sat 18-01-20 15:36:06, David Rientjes wrote: > On Sat, 18 Jan 2020, Andrew Morton wrote: > > > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote: > > > > > If compound is true, this means it is a PMD mapped THP. Which implies > > > the page is not linked to any defer list. So the first code chunk will > > > not be executed. > > > > > > Also with this reason, it would not be proper to add this page to a > > > defer list. So the second code chunk is not correct. > > > > > > Based on this, we should remove the defer list related code. > > > > > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") > > > > > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > > > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > > Cc: <stable@vger.kernel.org> [5.4+] > > > > This patch is identical to "mm: thp: grab the lock before manipulating > > defer list", which is rather confusing. Please let people know when > > this sort of thing is done. > > > > The earlier changelog mentioned a possible race condition. This > > changelog does not. In fact this changelog fails to provide any > > description of any userspace-visible runtime effects of the bug. > > Please send along such a description for inclusion, as always. > > > > The locking concern that Wei was originally looking at is no longer an > issue because we determined that the code in question could simply be > removed. > > I think the following can be added to the changelog: > > ----->o----- > > When migrating memcg charges of thp memory, there are two possibilities: > > (1) The underlying compound page is mapped by a pmd and thus does is not > on a deferred split queue (it's mapped), or > > (2) The compound page is not mapped by a pmd and is awaiting split on a > deferred split queue. > > The current charge migration implementation does *not* migrate charges for > thp memory on the deferred split queue, it only migrates charges for pages > that are mapped by a pmd. > > Thus, to migrate charges, the underlying compound page cannot be on a > deferred split queue; no list manipulation needs to be done in > mem_cgroup_move_account(). > > With the current code, the underlying compound page is moved to the > deferred split queue of the memcg its memory is not charged to, so > susbequent reclaim will consider these pages for the wrong memcg. Remove > the deferred split queue handling in mem_cgroup_move_account() entirely. I believe this still doesn't describe the underlying problem to the full extent. What happens with the page on the deferred list when it shouldn't be there in fact? Unless I am missing something deferred_split_scan will simply split that huge page. Which is a bit unfortunate but nothing really critical. This should be mentioned in the changelog. With that clarified, feel free to add Acked-by: Michal Hocko <mhocko@suse.com>
On Mon, Jan 20, 2020 at 08:22:37AM +0100, Michal Hocko wrote: >On Sat 18-01-20 15:36:06, David Rientjes wrote: >> On Sat, 18 Jan 2020, Andrew Morton wrote: >> >> > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote: >> > >> > > If compound is true, this means it is a PMD mapped THP. Which implies >> > > the page is not linked to any defer list. So the first code chunk will >> > > not be executed. >> > > >> > > Also with this reason, it would not be proper to add this page to a >> > > defer list. So the second code chunk is not correct. >> > > >> > > Based on this, we should remove the defer list related code. >> > > >> > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") >> > > >> > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> > > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >> > > Cc: <stable@vger.kernel.org> [5.4+] >> > >> > This patch is identical to "mm: thp: grab the lock before manipulating >> > defer list", which is rather confusing. Please let people know when >> > this sort of thing is done. >> > >> > The earlier changelog mentioned a possible race condition. This >> > changelog does not. In fact this changelog fails to provide any >> > description of any userspace-visible runtime effects of the bug. >> > Please send along such a description for inclusion, as always. >> > >> >> The locking concern that Wei was originally looking at is no longer an >> issue because we determined that the code in question could simply be >> removed. >> >> I think the following can be added to the changelog: >> >> ----->o----- >> >> When migrating memcg charges of thp memory, there are two possibilities: >> >> (1) The underlying compound page is mapped by a pmd and thus does is not >> on a deferred split queue (it's mapped), or >> >> (2) The compound page is not mapped by a pmd and is awaiting split on a >> deferred split queue. >> >> The current charge migration implementation does *not* migrate charges for >> thp memory on the deferred split queue, it only migrates charges for pages >> that are mapped by a pmd. >> >> Thus, to migrate charges, the underlying compound page cannot be on a >> deferred split queue; no list manipulation needs to be done in >> mem_cgroup_move_account(). >> >> With the current code, the underlying compound page is moved to the >> deferred split queue of the memcg its memory is not charged to, so >> susbequent reclaim will consider these pages for the wrong memcg. Remove >> the deferred split queue handling in mem_cgroup_move_account() entirely. > >I believe this still doesn't describe the underlying problem to the full >extent. What happens with the page on the deferred list when it >shouldn't be there in fact? Unless I am missing something deferred_split_scan >will simply split that huge page. Which is a bit unfortunate but nothing >really critical. This should be mentioned in the changelog. > Per my understanding, if we do the split when it is not necessary, we probably have a lower performance due to tlb miss. For others, I don't see the impact. >With that clarified, feel free to add > >Acked-by: Michal Hocko <mhocko@suse.com> > >-- >Michal Hocko >SUSE Labs
On Mon, 20 Jan 2020, Michal Hocko wrote: > > When migrating memcg charges of thp memory, there are two possibilities: > > > > (1) The underlying compound page is mapped by a pmd and thus does is not > > on a deferred split queue (it's mapped), or > > > > (2) The compound page is not mapped by a pmd and is awaiting split on a > > deferred split queue. > > > > The current charge migration implementation does *not* migrate charges for > > thp memory on the deferred split queue, it only migrates charges for pages > > that are mapped by a pmd. > > > > Thus, to migrate charges, the underlying compound page cannot be on a > > deferred split queue; no list manipulation needs to be done in > > mem_cgroup_move_account(). > > > > With the current code, the underlying compound page is moved to the > > deferred split queue of the memcg its memory is not charged to, so > > susbequent reclaim will consider these pages for the wrong memcg. Remove > > the deferred split queue handling in mem_cgroup_move_account() entirely. > > I believe this still doesn't describe the underlying problem to the full > extent. What happens with the page on the deferred list when it > shouldn't be there in fact? Unless I am missing something deferred_split_scan > will simply split that huge page. Which is a bit unfortunate but nothing > really critical. This should be mentioned in the changelog. > Are you referring to a compound page on the deferred split queue before a task is moved? I'm not sure this is within the scope of Wei's patch.. this is simply preventing a page from being moved to the deferred split queue of a memcg that it is not charged to. Is there a concern about why this code can be removed or a suggestion on something else it should be doing instead?
On Mon 20-01-20 13:10:56, David Rientjes wrote: > On Mon, 20 Jan 2020, Michal Hocko wrote: > > > > When migrating memcg charges of thp memory, there are two possibilities: > > > > > > (1) The underlying compound page is mapped by a pmd and thus does is not > > > on a deferred split queue (it's mapped), or > > > > > > (2) The compound page is not mapped by a pmd and is awaiting split on a > > > deferred split queue. > > > > > > The current charge migration implementation does *not* migrate charges for > > > thp memory on the deferred split queue, it only migrates charges for pages > > > that are mapped by a pmd. > > > > > > Thus, to migrate charges, the underlying compound page cannot be on a > > > deferred split queue; no list manipulation needs to be done in > > > mem_cgroup_move_account(). > > > > > > With the current code, the underlying compound page is moved to the > > > deferred split queue of the memcg its memory is not charged to, so > > > susbequent reclaim will consider these pages for the wrong memcg. Remove > > > the deferred split queue handling in mem_cgroup_move_account() entirely. > > > > I believe this still doesn't describe the underlying problem to the full > > extent. What happens with the page on the deferred list when it > > shouldn't be there in fact? Unless I am missing something deferred_split_scan > > will simply split that huge page. Which is a bit unfortunate but nothing > > really critical. This should be mentioned in the changelog. > > > > Are you referring to a compound page on the deferred split queue before a > task is moved? I'm not sure this is within the scope of Wei's patch.. > this is simply preventing a page from being moved to the deferred split > queue of a memcg that it is not charged to. Is there a concern about why > this code can be removed or a suggestion on something else it should be > doing instead? No, I do not have any concern about the patch itslef. It is that the changelog doesn't decribe the user visible effect. All I am saying is that the current code splits THPs of moved pages under memory pressure even if that is not needed. And that is a clear bug.
On Mon, 20 Jan 2020, Michal Hocko wrote: > > > > When migrating memcg charges of thp memory, there are two possibilities: > > > > > > > > (1) The underlying compound page is mapped by a pmd and thus does is not > > > > on a deferred split queue (it's mapped), or > > > > > > > > (2) The compound page is not mapped by a pmd and is awaiting split on a > > > > deferred split queue. > > > > > > > > The current charge migration implementation does *not* migrate charges for > > > > thp memory on the deferred split queue, it only migrates charges for pages > > > > that are mapped by a pmd. > > > > > > > > Thus, to migrate charges, the underlying compound page cannot be on a > > > > deferred split queue; no list manipulation needs to be done in > > > > mem_cgroup_move_account(). > > > > > > > > With the current code, the underlying compound page is moved to the > > > > deferred split queue of the memcg its memory is not charged to, so > > > > susbequent reclaim will consider these pages for the wrong memcg. Remove > > > > the deferred split queue handling in mem_cgroup_move_account() entirely. > > > > > > I believe this still doesn't describe the underlying problem to the full > > > extent. What happens with the page on the deferred list when it > > > shouldn't be there in fact? Unless I am missing something deferred_split_scan > > > will simply split that huge page. Which is a bit unfortunate but nothing > > > really critical. This should be mentioned in the changelog. > > > > > > > Are you referring to a compound page on the deferred split queue before a > > task is moved? I'm not sure this is within the scope of Wei's patch.. > > this is simply preventing a page from being moved to the deferred split > > queue of a memcg that it is not charged to. Is there a concern about why > > this code can be removed or a suggestion on something else it should be > > doing instead? > > No, I do not have any concern about the patch itslef. It is that the > changelog doesn't decribe the user visible effect. All I am saying is > that the current code splits THPs of moved pages under memory pressure > even if that is not needed. And that is a clear bug. Ah, gotcha. I tried to do this in the final paragraph of my amedment to Wei's patch and why it's important that this is marked as stable. The current code in 5.4 from commit 87eaceb3faa59 places any migrated compound page onto the deferred split queue of the destination memcg regardless of whether it has a mapping pmd (list_empty(page_deferred_list()) was already false) or it does not have a mapping pmd (but is now on the wrong queue). For the latter, can_split_huge_page() can help for the actual split but not for the removal of the page that is now erroneously on the queue. For the former, memcg reclaim would not see the pages that it should split under memcg pressure so we'll see the same memcg oom conditions we saw before the deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.
On Tue 21-01-20 15:08:39, David Rientjes wrote: > On Mon, 20 Jan 2020, Michal Hocko wrote: > > > > > > When migrating memcg charges of thp memory, there are two possibilities: > > > > > > > > > > (1) The underlying compound page is mapped by a pmd and thus does is not > > > > > on a deferred split queue (it's mapped), or > > > > > > > > > > (2) The compound page is not mapped by a pmd and is awaiting split on a > > > > > deferred split queue. > > > > > > > > > > The current charge migration implementation does *not* migrate charges for > > > > > thp memory on the deferred split queue, it only migrates charges for pages > > > > > that are mapped by a pmd. > > > > > > > > > > Thus, to migrate charges, the underlying compound page cannot be on a > > > > > deferred split queue; no list manipulation needs to be done in > > > > > mem_cgroup_move_account(). > > > > > > > > > > With the current code, the underlying compound page is moved to the > > > > > deferred split queue of the memcg its memory is not charged to, so > > > > > susbequent reclaim will consider these pages for the wrong memcg. Remove > > > > > the deferred split queue handling in mem_cgroup_move_account() entirely. > > > > > > > > I believe this still doesn't describe the underlying problem to the full > > > > extent. What happens with the page on the deferred list when it > > > > shouldn't be there in fact? Unless I am missing something deferred_split_scan > > > > will simply split that huge page. Which is a bit unfortunate but nothing > > > > really critical. This should be mentioned in the changelog. > > > > > > > > > > Are you referring to a compound page on the deferred split queue before a > > > task is moved? I'm not sure this is within the scope of Wei's patch.. > > > this is simply preventing a page from being moved to the deferred split > > > queue of a memcg that it is not charged to. Is there a concern about why > > > this code can be removed or a suggestion on something else it should be > > > doing instead? > > > > No, I do not have any concern about the patch itslef. It is that the > > changelog doesn't decribe the user visible effect. All I am saying is > > that the current code splits THPs of moved pages under memory pressure > > even if that is not needed. And that is a clear bug. > > Ah, gotcha. I tried to do this in the final paragraph of my amedment to > Wei's patch and why it's important that this is marked as stable. I considered "susbequent reclaim will consider these pages for the wrong memcg." quite unclear TBH. > The current code in 5.4 from commit 87eaceb3faa59 places any migrated > compound page onto the deferred split queue of the destination memcg > regardless of whether it has a mapping pmd > (list_empty(page_deferred_list()) was already false) or it does not have a > mapping pmd (but is now on the wrong queue). For the latter, > can_split_huge_page() can help for the actual split but not for the > removal of the page that is now erroneously on the queue. Does that mean that those fully mapped THPs are not going to be split? > For the former, > memcg reclaim would not see the pages that it should split under memcg > pressure so we'll see the same memcg oom conditions we saw before the > deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms. OK, this is yet another user visibile effect and it would be better to mention it explicitly in the changelog.
On Wed, 22 Jan 2020, Michal Hocko wrote: > > The current code in 5.4 from commit 87eaceb3faa59 places any migrated > > compound page onto the deferred split queue of the destination memcg > > regardless of whether it has a mapping pmd > > (list_empty(page_deferred_list()) was already false) or it does not have a > > mapping pmd (but is now on the wrong queue). For the latter, > > can_split_huge_page() can help for the actual split but not for the > > removal of the page that is now erroneously on the queue. > > Does that mean that those fully mapped THPs are not going to be split? > It believe it should but deferred_split_scan() also won't take it off the wrong split queue so it will remain there and any other checks for page_deferred_list(page) will succeed. > > For the former, > > memcg reclaim would not see the pages that it should split under memcg > > pressure so we'll see the same memcg oom conditions we saw before the > > deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms. > > OK, this is yet another user visibile effect and it would be better to > mention it explicitly in the changelog. > Ok, feel free to add to the last paragraph: Memcg reclaim would not see the compound pages that it should split under memcg pressure so we'll otherwise see the same memcg oom conditions we saw before the deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6c83cf4ed970..27c231bf4565 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page *page, __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - if (compound && !list_empty(page_deferred_list(page))) { - spin_lock(&from->deferred_split_queue.split_queue_lock); - list_del_init(page_deferred_list(page)); - from->deferred_split_queue.split_queue_len--; - spin_unlock(&from->deferred_split_queue.split_queue_lock); - } -#endif /* * It is safe to change page->mem_cgroup here because the page * is referenced, charged, and isolated - we can't race with @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page *page, /* caller should have done css_get */ page->mem_cgroup = to; -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - if (compound && list_empty(page_deferred_list(page))) { - spin_lock(&to->deferred_split_queue.split_queue_lock); - list_add_tail(page_deferred_list(page), - &to->deferred_split_queue.split_queue); - to->deferred_split_queue.split_queue_len++; - spin_unlock(&to->deferred_split_queue.split_queue_lock); - } -#endif - spin_unlock_irqrestore(&from->move_lock, flags); ret = 0;
If compound is true, this means it is a PMD mapped THP. Which implies the page is not linked to any defer list. So the first code chunk will not be executed. Also with this reason, it would not be proper to add this page to a defer list. So the second code chunk is not correct. Based on this, we should remove the defer list related code. Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [5.4+] --- v4: * finally we identified the related code is not necessary and not correct, just remove it * thanks to Kirill T first spot some problem v3: * remove all review/ack tag since rewrite the changelog * use deferred_split_huge_page as the example of race * add cc stable 5.4+ tag as suggested by David Rientjes v2: * move check on compound outside suggested by Alexander * an example of the race condition, suggested by Michal --- mm/memcontrol.c | 18 ------------------ 1 file changed, 18 deletions(-)