mm/hugetlb: simplify refs in memfd_alloc_folio

Message ID	1725478868-61732-1-git-send-email-steven.sistare@oracle.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Steve Sistare <steven.sistare@oracle.com> To: linux-mm@kvack.org Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>, Muchun Song <muchun.song@linux.dev>, Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>, Steve Sistare <steven.sistare@oracle.com> Subject: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio Date: Wed, 4 Sep 2024 12:41:08 -0700 Message-Id: <1725478868-61732-1-git-send-email-steven.sistare@oracle.com> Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	mm/hugetlb: simplify refs in memfd_alloc_folio \| expand mm/hugetlb: simplify refs in memfd_alloc_folio

Message ID

1725478868-61732-1-git-send-email-steven.sistare@oracle.com (mailing list archive)

State

New

Headers

From: Steve Sistare <steven.sistare@oracle.com>
To: linux-mm@kvack.org
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>,
        Muchun Song <muchun.song@linux.dev>,
        Andrew Morton <akpm@linux-foundation.org>,
        Matthew Wilcox <willy@infradead.org>, Peter Xu <peterx@redhat.com>,
        David Hildenbrand <david@redhat.com>,
 Jason Gunthorpe <jgg@nvidia.com>,
        Steve Sistare <steven.sistare@oracle.com>
Subject: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
Date: Wed,  4 Sep 2024 12:41:08 -0700
Message-Id: <1725478868-61732-1-git-send-email-steven.sistare@oracle.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

mm/hugetlb: simplify refs in memfd_alloc_folio | expand

Commit Message

Steven Sistare Sept. 4, 2024, 7:41 p.m. UTC

The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
delete the matching folio_put in memfd_pin_folios.  This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails, which would otherwise need an additional folio_put.  This is a
continuation of the fix
  "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"

Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")

Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 mm/gup.c   | 4 +---
 mm/memfd.c | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

Comments

Matthew Wilcox (Oracle) Sept. 4, 2024, 8:02 p.m. UTC | #1

On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
> The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
> delete the matching folio_put in memfd_pin_folios.  This also avoids
> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
> fails, which would otherwise need an additional folio_put.  This is a
> continuation of the fix
>   "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"

I think you're right, but don't we also need to get rid of the
folio_put() call in the 'if (err)' case after calling
hugetlb_add_to_page_cache()?

> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
> 
> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  mm/gup.c   | 4 +---
>  mm/memfd.c | 2 +-
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index bccabaa..947881ff 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>  	pgoff_t start_idx, end_idx, next_idx;
>  	struct folio *folio = NULL;
>  	struct folio_batch fbatch;
> -	struct hstate *h = NULL;
> +	struct hstate *h;
>  	long ret = -EINVAL;
>  
>  	if (start < 0 || start > end || !max_folios)
> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>  							     &fbatch);
>  			if (folio) {
>  				folio_put(folio);
> -				if (h)
> -					folio_put(folio);
>  				folio = NULL;
>  			}
>  
> diff --git a/mm/memfd.c b/mm/memfd.c
> index bcb131d..f715301 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
>  						    numa_node_id(),
>  						    NULL,
>  						    gfp_mask);
> -		if (folio && folio_try_get(folio)) {
> +		if (folio) {
>  			err = hugetlb_add_to_page_cache(folio,
>  							memfd->f_mapping,
>  							idx);
> -- 
> 1.8.3.1
>

Matthew Wilcox (Oracle) Sept. 4, 2024, 8:11 p.m. UTC | #2

On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
> > The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
> > delete the matching folio_put in memfd_pin_folios.  This also avoids
> > leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
> > fails, which would otherwise need an additional folio_put.  This is a
> > continuation of the fix
> >   "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
> 
> I think you're right, but don't we also need to get rid of the
> folio_put() call in the 'if (err)' case after calling
> hugetlb_add_to_page_cache()?

After scratching my head about this a bit more, I was trying to preserve
the existing semantics of the code, but I think the code was always
buggy.

The correct code would be:

	folio = alloc_hugetlb_folio_nodemask(...);
	folio_put(folio);

The code as in tree today would trip an assertion:

        VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);

as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
folio_try_get() would increment it to 2, folio_put() would decrement
it to 1, and so we'd call free_huge_folio() with a refcount of 1.

But after your patch, the code _is_ still wrong because we'll
start with a refcount of 1, fail to add to the page cache, call
folio_put() which will decrement the refcount to 0 _and call
free_huge_folio() itself_.  Then we'll call free_huge_folio()
on an already freed and possibly reallocated folio.

So every version suggested so far (current, yours, mine) is wrong,
and the right code looks like:

                folio = alloc_hugetlb_folio_nodemask(...);
		if (folio) {
			err = hugetlb_add_to_page_cache(...);
			if (err) {
				folio_put(folio);
				return ERR_PTR(err);
			}
...

Or have I got something wrong in that analysis?

> > Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
> > 
> > Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > ---
> >  mm/gup.c   | 4 +---
> >  mm/memfd.c | 2 +-
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index bccabaa..947881ff 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> >  	pgoff_t start_idx, end_idx, next_idx;
> >  	struct folio *folio = NULL;
> >  	struct folio_batch fbatch;
> > -	struct hstate *h = NULL;
> > +	struct hstate *h;
> >  	long ret = -EINVAL;
> >  
> >  	if (start < 0 || start > end || !max_folios)
> > @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> >  							     &fbatch);
> >  			if (folio) {
> >  				folio_put(folio);
> > -				if (h)
> > -					folio_put(folio);
> >  				folio = NULL;
> >  			}
> >  
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index bcb131d..f715301 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> > @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
> >  						    numa_node_id(),
> >  						    NULL,
> >  						    gfp_mask);
> > -		if (folio && folio_try_get(folio)) {
> > +		if (folio) {
> >  			err = hugetlb_add_to_page_cache(folio,
> >  							memfd->f_mapping,
> >  							idx);
> > -- 
> > 1.8.3.1
> > 
>

Steven Sistare Sept. 4, 2024, 8:13 p.m. UTC | #3

On 9/4/2024 4:02 PM, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
>> The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
>> delete the matching folio_put in memfd_pin_folios.  This also avoids
>> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
>> fails, which would otherwise need an additional folio_put.  This is a
>> continuation of the fix
>>    "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
> 
> I think you're right, but don't we also need to get rid of the
> folio_put() call in the 'if (err)' case after calling
> hugetlb_add_to_page_cache()?

That folio_put drops the ref obtained by unfreeze:

memfd_alloc_folio()
   alloc_hugetlb_folio_nodemask()
     dequeue_hugetlb_folio_nodemask()
       dequeue_hugetlb_folio_node_exact()
         folio_ref_unfreeze(folio, 1);    ; adds 1 refcount

- Steve

>> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
>>
>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   mm/gup.c   | 4 +---
>>   mm/memfd.c | 2 +-
>>   2 files changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index bccabaa..947881ff 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>   	pgoff_t start_idx, end_idx, next_idx;
>>   	struct folio *folio = NULL;
>>   	struct folio_batch fbatch;
>> -	struct hstate *h = NULL;
>> +	struct hstate *h;
>>   	long ret = -EINVAL;
>>   
>>   	if (start < 0 || start > end || !max_folios)
>> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>   							     &fbatch);
>>   			if (folio) {
>>   				folio_put(folio);
>> -				if (h)
>> -					folio_put(folio);
>>   				folio = NULL;
>>   			}
>>   
>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index bcb131d..f715301 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
>> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
>>   						    numa_node_id(),
>>   						    NULL,
>>   						    gfp_mask);
>> -		if (folio && folio_try_get(folio)) {
>> +		if (folio) {
>>   			err = hugetlb_add_to_page_cache(folio,
>>   							memfd->f_mapping,
>>   							idx);
>> -- 
>> 1.8.3.1
>>

Steven Sistare Sept. 4, 2024, 8:20 p.m. UTC | #4

On 9/4/2024 4:11 PM, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
>> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
>>> The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
>>> delete the matching folio_put in memfd_pin_folios.  This also avoids
>>> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
>>> fails, which would otherwise need an additional folio_put.  This is a
>>> continuation of the fix
>>>    "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
>>
>> I think you're right, but don't we also need to get rid of the
>> folio_put() call in the 'if (err)' case after calling
>> hugetlb_add_to_page_cache()?
> 
> After scratching my head about this a bit more, I was trying to preserve
> the existing semantics of the code, but I think the code was always
> buggy.
> 
> The correct code would be:
> 
> 	folio = alloc_hugetlb_folio_nodemask(...);
> 	folio_put(folio);
> 
> The code as in tree today would trip an assertion:
> 
>          VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
> 
> as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
> folio_try_get() would increment it to 2, folio_put() would decrement
> it to 1, and so we'd call free_huge_folio() with a refcount of 1.
> 
> But after your patch, the code _is_ still wrong because we'll
> start with a refcount of 1, fail to add to the page cache, call
> folio_put() which will decrement the refcount to 0 _and call
> free_huge_folio() itself_.  Then we'll call free_huge_folio()
> on an already freed and possibly reallocated folio.

Indeed.  The explicit call to free_huge_folio must be deleted, as you
coded below.  I'll send a V2 of the patch.

- Steve

> So every version suggested so far (current, yours, mine) is wrong,
> and the right code looks like:
> 
>                  folio = alloc_hugetlb_folio_nodemask(...);
> 		if (folio) {
> 			err = hugetlb_add_to_page_cache(...);
> 			if (err) {
> 				folio_put(folio);
> 				return ERR_PTR(err);
> 			}
> ...
> 
> Or have I got something wrong in that analysis?
> 
>>> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
>>>
>>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>>   mm/gup.c   | 4 +---
>>>   mm/memfd.c | 2 +-
>>>   2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index bccabaa..947881ff 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>   	pgoff_t start_idx, end_idx, next_idx;
>>>   	struct folio *folio = NULL;
>>>   	struct folio_batch fbatch;
>>> -	struct hstate *h = NULL;
>>> +	struct hstate *h;
>>>   	long ret = -EINVAL;
>>>   
>>>   	if (start < 0 || start > end || !max_folios)
>>> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>   							     &fbatch);
>>>   			if (folio) {
>>>   				folio_put(folio);
>>> -				if (h)
>>> -					folio_put(folio);
>>>   				folio = NULL;
>>>   			}
>>>   
>>> diff --git a/mm/memfd.c b/mm/memfd.c
>>> index bcb131d..f715301 100644
>>> --- a/mm/memfd.c
>>> +++ b/mm/memfd.c
>>> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
>>>   						    numa_node_id(),
>>>   						    NULL,
>>>   						    gfp_mask);
>>> -		if (folio && folio_try_get(folio)) {
>>> +		if (folio) {
>>>   			err = hugetlb_add_to_page_cache(folio,
>>>   							memfd->f_mapping,
>>>   							idx);
>>> -- 
>>> 1.8.3.1
>>>
>>

diff --git a/mm/gup.c b/mm/gup.c
index bccabaa..947881ff 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3618,7 +3618,7 @@  long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 	pgoff_t start_idx, end_idx, next_idx;
 	struct folio *folio = NULL;
 	struct folio_batch fbatch;
-	struct hstate *h = NULL;
+	struct hstate *h;
 	long ret = -EINVAL;
 
 	if (start < 0 || start > end || !max_folios)
@@ -3662,8 +3662,6 @@  long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 							     &fbatch);
 			if (folio) {
 				folio_put(folio);
-				if (h)
-					folio_put(folio);
 				folio = NULL;
 			}
 
diff --git a/mm/memfd.c b/mm/memfd.c
index bcb131d..f715301 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -89,7 +89,7 @@  struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 						    numa_node_id(),
 						    NULL,
 						    gfp_mask);
-		if (folio && folio_try_get(folio)) {
+		if (folio) {
 			err = hugetlb_add_to_page_cache(folio,
 							memfd->f_mapping,
 							idx);

mm/hugetlb: simplify refs in memfd_alloc_folio

Commit Message

Comments

Patch