diff mbox series

mm: Fix the pgtable leak

Message ID 20190213112900.33963-1-minchan@kernel.org (mailing list archive)
State New, archived
Headers show
Series mm: Fix the pgtable leak | expand

Commit Message

Minchan Kim Feb. 13, 2019, 11:29 a.m. UTC
[1] was backported to v4.9 stable tree but it introduces pgtable
memory leak because with fault retrial, preallocated pagetable
could be leaked in second iteration.
To fix the problem, this patch backport [2].

[1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
[2] b0b9b3df27d10, mm: stop leaking PageTables

Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> [4.9]
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/memory.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Minchan Kim Feb. 13, 2019, 11:33 a.m. UTC | #1
On Wed, Feb 13, 2019 at 08:29:00PM +0900, Minchan Kim wrote:
> [1] was backported to v4.9 stable tree but it introduces pgtable
> memory leak because with fault retrial, preallocated pagetable
> could be leaked in second iteration.
> To fix the problem, this patch backport [2].
> 
> [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> [2] b0b9b3df27d10, mm: stop leaking PageTables
> 
> Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Liu Bo <bo.liu@linux.alibaba.com>
> Cc: <stable@vger.kernel.org> [4.9]
> Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Martin Liu <liumartin@google.com>
Michal Hocko Feb. 13, 2019, 12:03 p.m. UTC | #2
On Wed 13-02-19 20:29:00, Minchan Kim wrote:
> [1] was backported to v4.9 stable tree but it introduces pgtable
> memory leak because with fault retrial, preallocated pagetable
> could be leaked in second iteration.
> To fix the problem, this patch backport [2].
> 
> [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> [2] b0b9b3df27d10, mm: stop leaking PageTables
> 
> Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Liu Bo <bo.liu@linux.alibaba.com>
> Cc: <stable@vger.kernel.org> [4.9]
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Thanks for catching this dependency. Do I assume it correctly that this
is stable-4.9 only?

> ---
>  mm/memory.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 35d8217bb0467..47248dc0b9e1a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3329,15 +3329,24 @@ static int do_fault(struct fault_env *fe)
>  {
>  	struct vm_area_struct *vma = fe->vma;
>  	pgoff_t pgoff = linear_page_index(vma, fe->address);
> +	int ret;
>  
>  	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
>  	if (!vma->vm_ops->fault)
> -		return VM_FAULT_SIGBUS;
> -	if (!(fe->flags & FAULT_FLAG_WRITE))
> -		return do_read_fault(fe, pgoff);
> -	if (!(vma->vm_flags & VM_SHARED))
> -		return do_cow_fault(fe, pgoff);
> -	return do_shared_fault(fe, pgoff);
> +		ret = VM_FAULT_SIGBUS;
> +	else if (!(fe->flags & FAULT_FLAG_WRITE))
> +		ret = do_read_fault(fe, pgoff);
> +	else if (!(vma->vm_flags & VM_SHARED))
> +		ret = do_cow_fault(fe, pgoff);
> +	else
> +		ret = do_shared_fault(fe, pgoff);
> +
> +	/* preallocated pagetable is unused: free it */
> +	if (fe->prealloc_pte) {
> +		pte_free(vma->vm_mm, fe->prealloc_pte);
> +		fe->prealloc_pte = 0;
> +	}
> +	return ret;
>  }
>  
>  static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
>
Minchan Kim Feb. 13, 2019, 12:12 p.m. UTC | #3
On Wed, Feb 13, 2019 at 01:03:30PM +0100, Michal Hocko wrote:
> On Wed 13-02-19 20:29:00, Minchan Kim wrote:
> > [1] was backported to v4.9 stable tree but it introduces pgtable
> > memory leak because with fault retrial, preallocated pagetable
> > could be leaked in second iteration.
> > To fix the problem, this patch backport [2].
> > 
> > [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> > [2] b0b9b3df27d10, mm: stop leaking PageTables
> > 
> > Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Liu Bo <bo.liu@linux.alibaba.com>
> > Cc: <stable@vger.kernel.org> [4.9]
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Thanks for catching this dependency. Do I assume it correctly that this
> is stable-4.9 only?

I have no idea how I could find it automatically that a stable patch of
linus tree is spread out with several stable trees(Hope Greg has an
answer). I just checked 4.4 longterm kernel and couldn't find it in there.

Thanks.
Michal Hocko Feb. 13, 2019, 12:24 p.m. UTC | #4
On Wed 13-02-19 21:12:00, Minchan Kim wrote:
> On Wed, Feb 13, 2019 at 01:03:30PM +0100, Michal Hocko wrote:
> > On Wed 13-02-19 20:29:00, Minchan Kim wrote:
> > > [1] was backported to v4.9 stable tree but it introduces pgtable
> > > memory leak because with fault retrial, preallocated pagetable
> > > could be leaked in second iteration.
> > > To fix the problem, this patch backport [2].
> > > 
> > > [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> > > [2] b0b9b3df27d10, mm: stop leaking PageTables
> > > 
> > > Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Cc: Liu Bo <bo.liu@linux.alibaba.com>
> > > Cc: <stable@vger.kernel.org> [4.9]
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > Thanks for catching this dependency. Do I assume it correctly that this
> > is stable-4.9 only?
> 
> I have no idea how I could find it automatically that a stable patch of
> linus tree is spread out with several stable trees(Hope Greg has an
> answer). I just checked 4.4 longterm kernel and couldn't find it in there.

See http://lkml.kernel.org/r/20190115174036.GA24149@dhcp22.suse.cz

But my question was more about "this is a stable only thing"? It was not
obvious from the subject so I wanted to be sure that I am not missing
anything.
Greg Kroah-Hartman Feb. 13, 2019, 1:36 p.m. UTC | #5
On Wed, Feb 13, 2019 at 08:29:00PM +0900, Minchan Kim wrote:
> [1] was backported to v4.9 stable tree but it introduces pgtable
> memory leak because with fault retrial, preallocated pagetable
> could be leaked in second iteration.
> To fix the problem, this patch backport [2].
> 
> [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback

This is really commit 63f3655f9501 ("mm, memcg: fix reclaim deadlock
with writeback") which was in 4.9.152, 4.14.94, 4.19.16, and 4.20.3 as
well as 5.0-rc2.

> [2] b0b9b3df27d10, mm: stop leaking PageTables

This commit was in 4.10, so I am guessing that this really is just a
backport of that commit?

If so, it's not the full backport, why not take the whole thing?  Why
only cherry-pick one chunk of it?  Why do we not need the other parts?

thanks,

greg k-h
Minchan Kim Feb. 14, 2019, 7:23 a.m. UTC | #6
On Wed, Feb 13, 2019 at 02:36:24PM +0100, Greg KH wrote:
> On Wed, Feb 13, 2019 at 08:29:00PM +0900, Minchan Kim wrote:
> > [1] was backported to v4.9 stable tree but it introduces pgtable
> > memory leak because with fault retrial, preallocated pagetable
> > could be leaked in second iteration.
> > To fix the problem, this patch backport [2].
> > 
> > [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> 
> This is really commit 63f3655f9501 ("mm, memcg: fix reclaim deadlock
> with writeback") which was in 4.9.152, 4.14.94, 4.19.16, and 4.20.3 as
> well as 5.0-rc2.

Since 4.10, we has [2] so it should be okay other (tree > 4.10)

> 
> > [2] b0b9b3df27d10, mm: stop leaking PageTables
> 
> This commit was in 4.10, so I am guessing that this really is just a
> backport of that commit?

Yub.

> 
> If so, it's not the full backport, why not take the whole thing?  Why
> only cherry-pick one chunk of it?  Why do we not need the other parts?

Because [2] actually aims for fixing [3] which was introduced at 4.10.
Since then, [1] relies on the chunk I sent. Thus we don't need other part
for 4.9.

[3] 953c66c2b22a ("mm: THP page cache support for ppc64")

Thanks.
Minchan Kim Feb. 14, 2019, 7:25 a.m. UTC | #7
On Wed, Feb 13, 2019 at 01:24:58PM +0100, Michal Hocko wrote:
> On Wed 13-02-19 21:12:00, Minchan Kim wrote:
> > On Wed, Feb 13, 2019 at 01:03:30PM +0100, Michal Hocko wrote:
> > > On Wed 13-02-19 20:29:00, Minchan Kim wrote:
> > > > [1] was backported to v4.9 stable tree but it introduces pgtable
> > > > memory leak because with fault retrial, preallocated pagetable
> > > > could be leaked in second iteration.
> > > > To fix the problem, this patch backport [2].
> > > > 
> > > > [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> > > > [2] b0b9b3df27d10, mm: stop leaking PageTables
> > > > 
> > > > Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
> > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: Hugh Dickins <hughd@google.com>
> > > > Cc: Liu Bo <bo.liu@linux.alibaba.com>
> > > > Cc: <stable@vger.kernel.org> [4.9]
> > > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > 
> > > Thanks for catching this dependency. Do I assume it correctly that this
> > > is stable-4.9 only?
> > 
> > I have no idea how I could find it automatically that a stable patch of
> > linus tree is spread out with several stable trees(Hope Greg has an
> > answer). I just checked 4.4 longterm kernel and couldn't find it in there.
> 
> See http://lkml.kernel.org/r/20190115174036.GA24149@dhcp22.suse.cz
> 
> But my question was more about "this is a stable only thing"? It was not
> obvious from the subject so I wanted to be sure that I am not missing
> anything.

Yub, I think only 4.9 stable tree need to be fixed because Hugh's patch was
in there since v4.10. 

Thanks.
Minchan Kim Feb. 18, 2019, 8:20 a.m. UTC | #8
On Thu, Feb 14, 2019 at 04:23:52PM +0900, Minchan Kim wrote:
> On Wed, Feb 13, 2019 at 02:36:24PM +0100, Greg KH wrote:
> > On Wed, Feb 13, 2019 at 08:29:00PM +0900, Minchan Kim wrote:
> > > [1] was backported to v4.9 stable tree but it introduces pgtable
> > > memory leak because with fault retrial, preallocated pagetable
> > > could be leaked in second iteration.
> > > To fix the problem, this patch backport [2].
> > > 
> > > [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> > 
> > This is really commit 63f3655f9501 ("mm, memcg: fix reclaim deadlock
> > with writeback") which was in 4.9.152, 4.14.94, 4.19.16, and 4.20.3 as
> > well as 5.0-rc2.
> 
> Since 4.10, we has [2] so it should be okay other (tree > 4.10)
> 
> > 
> > > [2] b0b9b3df27d10, mm: stop leaking PageTables
> > 
> > This commit was in 4.10, so I am guessing that this really is just a
> > backport of that commit?
> 
> Yub.
> 
> > 
> > If so, it's not the full backport, why not take the whole thing?  Why
> > only cherry-pick one chunk of it?  Why do we not need the other parts?
> 
> Because [2] actually aims for fixing [3] which was introduced at 4.10.
> Since then, [1] relies on the chunk I sent. Thus we don't need other part
> for 4.9.
> 
> [3] 953c66c2b22a ("mm: THP page cache support for ppc64")

Hi Greg,

Any chance to look into this patch?
Greg Kroah-Hartman Feb. 18, 2019, 8:33 a.m. UTC | #9
On Mon, Feb 18, 2019 at 05:20:26PM +0900, Minchan Kim wrote:
> On Thu, Feb 14, 2019 at 04:23:52PM +0900, Minchan Kim wrote:
> > On Wed, Feb 13, 2019 at 02:36:24PM +0100, Greg KH wrote:
> > > On Wed, Feb 13, 2019 at 08:29:00PM +0900, Minchan Kim wrote:
> > > > [1] was backported to v4.9 stable tree but it introduces pgtable
> > > > memory leak because with fault retrial, preallocated pagetable
> > > > could be leaked in second iteration.
> > > > To fix the problem, this patch backport [2].
> > > > 
> > > > [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> > > 
> > > This is really commit 63f3655f9501 ("mm, memcg: fix reclaim deadlock
> > > with writeback") which was in 4.9.152, 4.14.94, 4.19.16, and 4.20.3 as
> > > well as 5.0-rc2.
> > 
> > Since 4.10, we has [2] so it should be okay other (tree > 4.10)
> > 
> > > 
> > > > [2] b0b9b3df27d10, mm: stop leaking PageTables
> > > 
> > > This commit was in 4.10, so I am guessing that this really is just a
> > > backport of that commit?
> > 
> > Yub.
> > 
> > > 
> > > If so, it's not the full backport, why not take the whole thing?  Why
> > > only cherry-pick one chunk of it?  Why do we not need the other parts?
> > 
> > Because [2] actually aims for fixing [3] which was introduced at 4.10.
> > Since then, [1] relies on the chunk I sent. Thus we don't need other part
> > for 4.9.
> > 
> > [3] 953c66c2b22a ("mm: THP page cache support for ppc64")
> 
> Hi Greg,
> 
> Any chance to look into this patch?

You sent it on Thursday, it is Monday, I tried to actually take the
weekend off :)

It's in my queue, relax, it has good company with a few other hundred
stable patches being requested...

greg k-h
Greg Kroah-Hartman Feb. 18, 2019, 1:33 p.m. UTC | #10
On Wed, Feb 13, 2019 at 08:29:00PM +0900, Minchan Kim wrote:
> [1] was backported to v4.9 stable tree but it introduces pgtable
> memory leak because with fault retrial, preallocated pagetable
> could be leaked in second iteration.
> To fix the problem, this patch backport [2].
> 
> [1] 5cf3e5ff95876, mm, memcg: fix reclaim deadlock with writeback
> [2] b0b9b3df27d10, mm: stop leaking PageTables
> 
> Fixes: 5cf3e5ff95876 ("mm, memcg: fix reclaim deadlock with writeback")
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Liu Bo <bo.liu@linux.alibaba.com>
> Cc: <stable@vger.kernel.org> [4.9]
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/memory.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)

I fixed up the changelog text to be correct, and now queued this up,
thanks.

greg k-h
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 35d8217bb0467..47248dc0b9e1a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3329,15 +3329,24 @@  static int do_fault(struct fault_env *fe)
 {
 	struct vm_area_struct *vma = fe->vma;
 	pgoff_t pgoff = linear_page_index(vma, fe->address);
+	int ret;
 
 	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
 	if (!vma->vm_ops->fault)
-		return VM_FAULT_SIGBUS;
-	if (!(fe->flags & FAULT_FLAG_WRITE))
-		return do_read_fault(fe, pgoff);
-	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(fe, pgoff);
-	return do_shared_fault(fe, pgoff);
+		ret = VM_FAULT_SIGBUS;
+	else if (!(fe->flags & FAULT_FLAG_WRITE))
+		ret = do_read_fault(fe, pgoff);
+	else if (!(vma->vm_flags & VM_SHARED))
+		ret = do_cow_fault(fe, pgoff);
+	else
+		ret = do_shared_fault(fe, pgoff);
+
+	/* preallocated pagetable is unused: free it */
+	if (fe->prealloc_pte) {
+		pte_free(vma->vm_mm, fe->prealloc_pte);
+		fe->prealloc_pte = 0;
+	}
+	return ret;
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,