diff mbox series

[6.12.y] mm/mmap: fix __mmap_region() error handling in rare merge failure case

Message ID 20241118194048.2355180-1-Liam.Howlett@oracle.com (mailing list archive)
State New
Headers show
Series [6.12.y] mm/mmap: fix __mmap_region() error handling in rare merge failure case | expand

Commit Message

Liam R. Howlett Nov. 18, 2024, 7:40 p.m. UTC
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

The mmap_region() function tries to install a new vma, which requires a
pre-allocation for the maple tree write due to the complex locking
scenarios involved.

Recent efforts to simplify the error recovery required the relocation of
the preallocation of the maple tree nodes (via vma_iter_prealloc()
calling mas_preallocate()) higher in the function.

The relocation of the preallocation meant that, if there was a file
associated with the vma and the driver call (mmap_file()) modified the
vma flags, then a new merge of the new vma with existing vmas is
attempted.

During the attempt to merge the existing vma with the new vma, the vma
iterator is used - the same iterator that would be used for the next
write attempt to the tree.  In the event of needing a further allocation
and if the new allocations fails, the vma iterator (and contained maple
state) will cleaned up, including freeing all previous allocations and
will be reset internally.

Upon returning to the __mmap_region() function, the error reason is lost
and the function sets the vma iterator limits, and then tries to
continue to store the new vma using vma_iter_store() - which expects
preallocated nodes.

A preallocation should be performed in case the allocations were lost
during the failure scenario - there is no risk of over allocating.  The
range is already set in the vma_iter_store() call below, so it is not
necessary.

Reported-by: syzbot+bc6bfc25a68b7a020ee1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/x/log.txt?x=17b0ace8580000
Fixes: 5de195060b2e2 ("mm: resolve faulty mmap_region() error path behaviour")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
---
 mm/mmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Liam R. Howlett Nov. 18, 2024, 8:32 p.m. UTC | #1
Okay, before I get yelled at...

This commit is only necessary for 6.12.y until Lorenzo's other fixes to
older stables land (and I'll have to figure out what to do in each).

The commit will not work on mm-unstable, because it doesn't exist due to
refactoring.

The commit does not have a tag about "upstream commit" because there
isn't one - the closest thing I could point to does not have a stable
git id.

So here I am with a fix for a kernel that was released a few hours ago
that is not necessary in v6.13, for a bug that's out there on syzkaller.

Also, it's very unlikely to happen unless you inject failures like
syzkaller.  But hey, pretty decent turn-around on finding a fix - so
that's a rosy outlook.


* Liam R. Howlett <Liam.Howlett@oracle.com> [241118 14:41]:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> The mmap_region() function tries to install a new vma, which requires a
> pre-allocation for the maple tree write due to the complex locking
> scenarios involved.
> 
> Recent efforts to simplify the error recovery required the relocation of
> the preallocation of the maple tree nodes (via vma_iter_prealloc()
> calling mas_preallocate()) higher in the function.
> 
> The relocation of the preallocation meant that, if there was a file
> associated with the vma and the driver call (mmap_file()) modified the
> vma flags, then a new merge of the new vma with existing vmas is
> attempted.
> 
> During the attempt to merge the existing vma with the new vma, the vma
> iterator is used - the same iterator that would be used for the next
> write attempt to the tree.  In the event of needing a further allocation
> and if the new allocations fails, the vma iterator (and contained maple
> state) will cleaned up, including freeing all previous allocations and
> will be reset internally.
> 
> Upon returning to the __mmap_region() function, the error reason is lost
> and the function sets the vma iterator limits, and then tries to
> continue to store the new vma using vma_iter_store() - which expects
> preallocated nodes.
> 
> A preallocation should be performed in case the allocations were lost
> during the failure scenario - there is no risk of over allocating.  The
> range is already set in the vma_iter_store() call below, so it is not
> necessary.
> 
> Reported-by: syzbot+bc6bfc25a68b7a020ee1@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/x/log.txt?x=17b0ace8580000
> Fixes: 5de195060b2e2 ("mm: resolve faulty mmap_region() error path behaviour")
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jann Horn <jannh@google.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/mmap.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 79d541f1502b2..5cef9a1981f1b 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1491,7 +1491,10 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  				vm_flags = vma->vm_flags;
>  				goto file_expanded;
>  			}
> -			vma_iter_config(&vmi, addr, end);
> +			if (vma_iter_prealloc(&vmi, vma)) {
> +				error = -ENOMEM;
> +				goto unmap_and_free_file_vma;
> +			}
>  		}
>  
>  		vm_flags = vma->vm_flags;
> -- 
> 2.43.0
>
Greg KH Nov. 19, 2024, 2:17 p.m. UTC | #2
On Mon, Nov 18, 2024 at 03:32:14PM -0500, Liam R. Howlett wrote:
> Okay, before I get yelled at...
> 
> This commit is only necessary for 6.12.y until Lorenzo's other fixes to
> older stables land (and I'll have to figure out what to do in each).
> 
> The commit will not work on mm-unstable, because it doesn't exist due to
> refactoring.
> 
> The commit does not have a tag about "upstream commit" because there
> isn't one - the closest thing I could point to does not have a stable
> git id.
> 
> So here I am with a fix for a kernel that was released a few hours ago
> that is not necessary in v6.13, for a bug that's out there on syzkaller.
> 
> Also, it's very unlikely to happen unless you inject failures like
> syzkaller.  But hey, pretty decent turn-around on finding a fix - so
> that's a rosy outlook.

Why isn't this needed in 6.13.y?  What's going to be different in there
that this isn't needed?

Do you just want me to take this for the 6.12.y tree now?  I'll be glad
to, just confused a bit.

thanks,

greg k-h
Liam R. Howlett Nov. 19, 2024, 2:25 p.m. UTC | #3
* Greg KH <gregkh@linuxfoundation.org> [241119 09:17]:
> On Mon, Nov 18, 2024 at 03:32:14PM -0500, Liam R. Howlett wrote:
> > Okay, before I get yelled at...
> > 
> > This commit is only necessary for 6.12.y until Lorenzo's other fixes to
> > older stables land (and I'll have to figure out what to do in each).
> > 
> > The commit will not work on mm-unstable, because it doesn't exist due to
> > refactoring.
> > 
> > The commit does not have a tag about "upstream commit" because there
> > isn't one - the closest thing I could point to does not have a stable
> > git id.
> > 
> > So here I am with a fix for a kernel that was released a few hours ago
> > that is not necessary in v6.13, for a bug that's out there on syzkaller.
> > 
> > Also, it's very unlikely to happen unless you inject failures like
> > syzkaller.  But hey, pretty decent turn-around on finding a fix - so
> > that's a rosy outlook.
> 
> Why isn't this needed in 6.13.y?  What's going to be different in there
> that this isn't needed?

The code has been refactored and avoids the scenario.  I'd name the
refactoring commit as the upstream commit, but it does not have a stable
git id as it's in mm-unstable.  So I'm at a bit of a loss of how to
follow the process.

> 
> Do you just want me to take this for the 6.12.y tree now?  I'll be glad
> to, just confused a bit.

Yes, please.


Thanks,
Liam
Vlastimil Babka Nov. 19, 2024, 2:36 p.m. UTC | #4
On 11/19/24 15:25, Liam R. Howlett wrote:
> * Greg KH <gregkh@linuxfoundation.org> [241119 09:17]:
>> On Mon, Nov 18, 2024 at 03:32:14PM -0500, Liam R. Howlett wrote:
>> > Okay, before I get yelled at...
>> > 
>> > This commit is only necessary for 6.12.y until Lorenzo's other fixes to
>> > older stables land (and I'll have to figure out what to do in each).
>> > 
>> > The commit will not work on mm-unstable, because it doesn't exist due to
>> > refactoring.
>> > 
>> > The commit does not have a tag about "upstream commit" because there
>> > isn't one - the closest thing I could point to does not have a stable
>> > git id.
>> > 
>> > So here I am with a fix for a kernel that was released a few hours ago
>> > that is not necessary in v6.13, for a bug that's out there on syzkaller.
>> > 
>> > Also, it's very unlikely to happen unless you inject failures like
>> > syzkaller.  But hey, pretty decent turn-around on finding a fix - so
>> > that's a rosy outlook.
>> 
>> Why isn't this needed in 6.13.y?  What's going to be different in there
>> that this isn't needed?
> 
> The code has been refactored and avoids the scenario.  I'd name the
> refactoring commit as the upstream commit, but it does not have a stable
> git id as it's in mm-unstable.  So I'm at a bit of a loss of how to
> follow the process.

Is it not in mm-stable now, given we're in a merge window? Anyway AFAIU if
the stable-specific fix is completely different from the upstream
refactoring, we don't even try to pretend it's the same "commit XYZ
upstream" no?

>> 
>> Do you just want me to take this for the 6.12.y tree now?  I'll be glad
>> to, just confused a bit.
> 
> Yes, please.
> 
> 
> Thanks,
> Liam
Lorenzo Stoakes Nov. 19, 2024, 2:59 p.m. UTC | #5
On Mon, Nov 18, 2024 at 02:40:48PM -0500, Liam R. Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>
> The mmap_region() function tries to install a new vma, which requires a
> pre-allocation for the maple tree write due to the complex locking
> scenarios involved.
>
> Recent efforts to simplify the error recovery required the relocation of
> the preallocation of the maple tree nodes (via vma_iter_prealloc()
> calling mas_preallocate()) higher in the function.
>
> The relocation of the preallocation meant that, if there was a file
> associated with the vma and the driver call (mmap_file()) modified the
> vma flags, then a new merge of the new vma with existing vmas is
> attempted.
>
> During the attempt to merge the existing vma with the new vma, the vma
> iterator is used - the same iterator that would be used for the next
> write attempt to the tree.  In the event of needing a further allocation
> and if the new allocations fails, the vma iterator (and contained maple
> state) will cleaned up, including freeing all previous allocations and
> will be reset internally.
>
> Upon returning to the __mmap_region() function, the error reason is lost
> and the function sets the vma iterator limits, and then tries to
> continue to store the new vma using vma_iter_store() - which expects
> preallocated nodes.
>
> A preallocation should be performed in case the allocations were lost
> during the failure scenario - there is no risk of over allocating.  The
> range is already set in the vma_iter_store() call below, so it is not
> necessary.
>
> Reported-by: syzbot+bc6bfc25a68b7a020ee1@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/x/log.txt?x=17b0ace8580000
> Fixes: 5de195060b2e2 ("mm: resolve faulty mmap_region() error path behaviour")
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jann Horn <jannh@google.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/mmap.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 79d541f1502b2..5cef9a1981f1b 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1491,7 +1491,10 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  				vm_flags = vma->vm_flags;
>  				goto file_expanded;
>  			}
> -			vma_iter_config(&vmi, addr, end);
> +			if (vma_iter_prealloc(&vmi, vma)) {
> +				error = -ENOMEM;
> +				goto unmap_and_free_file_vma;
> +			}

This breaks the whole thing of these changes which is to avoid having to
abort after invoking mmap_file(), so we can't do it this way.

As discussed via IRC, it's probably best to do this with __GFP_NOFAIL or
similar to avoid the abort altogether.

But reaching this point is very common, since drivers will often change the
flags, and failing to merge is also very common so we need to be sure this
is a no-op if no pre-allocation is required in the _very rare_ (if even
possible in reality) case of a merge failure due to OOM.

We store in the vmg state whether a merge failed due to oom, which you can
test for explicitly via vmg_nomem().


>  		}
>
>  		vm_flags = vma->vm_flags;
> --
> 2.43.0
>
Liam R. Howlett Nov. 19, 2024, 3:16 p.m. UTC | #6
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [241119 09:59]:
> On Mon, Nov 18, 2024 at 02:40:48PM -0500, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > The mmap_region() function tries to install a new vma, which requires a
> > pre-allocation for the maple tree write due to the complex locking
> > scenarios involved.
> >
> > Recent efforts to simplify the error recovery required the relocation of
> > the preallocation of the maple tree nodes (via vma_iter_prealloc()
> > calling mas_preallocate()) higher in the function.
> >
> > The relocation of the preallocation meant that, if there was a file
> > associated with the vma and the driver call (mmap_file()) modified the
> > vma flags, then a new merge of the new vma with existing vmas is
> > attempted.
> >
> > During the attempt to merge the existing vma with the new vma, the vma
> > iterator is used - the same iterator that would be used for the next
> > write attempt to the tree.  In the event of needing a further allocation
> > and if the new allocations fails, the vma iterator (and contained maple
> > state) will cleaned up, including freeing all previous allocations and
> > will be reset internally.
> >
> > Upon returning to the __mmap_region() function, the error reason is lost
> > and the function sets the vma iterator limits, and then tries to
> > continue to store the new vma using vma_iter_store() - which expects
> > preallocated nodes.
> >
> > A preallocation should be performed in case the allocations were lost
> > during the failure scenario - there is no risk of over allocating.  The
> > range is already set in the vma_iter_store() call below, so it is not
> > necessary.
> >
> > Reported-by: syzbot+bc6bfc25a68b7a020ee1@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/x/log.txt?x=17b0ace8580000
> > Fixes: 5de195060b2e2 ("mm: resolve faulty mmap_region() error path behaviour")
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: <stable@vger.kernel.org>
> > ---
> >  mm/mmap.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 79d541f1502b2..5cef9a1981f1b 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1491,7 +1491,10 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> >  				vm_flags = vma->vm_flags;
> >  				goto file_expanded;
> >  			}
> > -			vma_iter_config(&vmi, addr, end);
> > +			if (vma_iter_prealloc(&vmi, vma)) {
> > +				error = -ENOMEM;
> > +				goto unmap_and_free_file_vma;
> > +			}
> 
> This breaks the whole thing of these changes which is to avoid having to
> abort after invoking mmap_file(), so we can't do it this way.
> 
> As discussed via IRC, it's probably best to do this with __GFP_NOFAIL or
> similar to avoid the abort altogether.
> 
> But reaching this point is very common, since drivers will often change the
> flags, and failing to merge is also very common so we need to be sure this
> is a no-op if no pre-allocation is required in the _very rare_ (if even
> possible in reality) case of a merge failure due to OOM.

The tree will do nothing if there is enough or more than needed
preallocated for the operation.  If this is common though, we should try
to avoid the call into the tree with the vmg_nomem() test.

> 
> We store in the vmg state whether a merge failed due to oom, which you can
> test for explicitly via vmg_nomem().

Okay, I will respin.

Thanks,
Liam
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 79d541f1502b2..5cef9a1981f1b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1491,7 +1491,10 @@  static unsigned long __mmap_region(struct file *file, unsigned long addr,
 				vm_flags = vma->vm_flags;
 				goto file_expanded;
 			}
-			vma_iter_config(&vmi, addr, end);
+			if (vma_iter_prealloc(&vmi, vma)) {
+				error = -ENOMEM;
+				goto unmap_and_free_file_vma;
+			}
 		}
 
 		vm_flags = vma->vm_flags;