[v3] mm/page_alloc: Further fix __alloc_pages_bulk() return value

Message ID 162497449506.16614.7781489905877008435.stgit@klimt.1015granger.net (mailing list archive)
State New

Commit Message

Chuck Lever III June 29, 2021, 1:48 p.m. UTC
The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
bounds check after checking populated elements") was possibly
confused by the mixture of return values throughout the function.

The API contract is clear that the function "Returns the number of
pages on the list or array." It does not list zero as a unique
return value with a special meaning. Therefore zero is a plausible
return value only if @nr_pages is zero or less.

Clean up the return logic to make it clear that the returned value
is always the total number of pages in the array/list, not the
number of pages that were allocated during this call.

The only change in behavior with this patch is the value returned
if prepare_alloc_pages() fails. To match the API contract, the
number of pages currently in the array/list is returned in this
case.

The call site in __page_pool_alloc_pages_slow() also seems to be
confused on this matter. It should be attended to by someone who
is familiar with that code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 mm/page_alloc.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
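
To make the contract concrete, here is a minimal sketch of a caller that
refills a partially populated page array, loosely in the style of
svc_alloc_arg(); the array size, retry policy, and function name are
illustrative assumptions, not code from this patch:

#include <linux/gfp.h>
#include <linux/jiffies.h>
#include <linux/sched.h>

#define NR_PAGES 16	/* illustrative array size */

/* Sketch only: @pages must start out zero-filled, since the bulk
 * allocator treats non-NULL entries as already populated.
 */
static void fill_page_array(struct page **pages)
{
	unsigned long filled;

	for (;;) {
		/* Per the API contract, the return value is the total
		 * number of populated entries in the array, not the
		 * number allocated by this particular call, so it can
		 * be compared directly against NR_PAGES.
		 */
		filled = alloc_pages_bulk_array(GFP_KERNEL, NR_PAGES, pages);
		if (filled == NR_PAGES)
			return;

		/* Partial fill: back off briefly, then ask again for
		 * the remaining NULL slots.
		 */
		schedule_timeout_uninterruptible(msecs_to_jiffies(10));
	}
}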

Comments

Mel Gorman June 29, 2021, 3:33 p.m. UTC | #1
On Tue, Jun 29, 2021 at 09:48:15AM -0400, Chuck Lever wrote:
> The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
> bounds check after checking populated elements") was possibly
> confused by the mixture of return values throughout the function.
> 
> The API contract is clear that the function "Returns the number of
> pages on the list or array." It does not list zero as a unique
> return value with a special meaning. Therefore zero is a plausible
> return value only if @nr_pages is zero or less.
> 
> Clean up the return logic to make it clear that the returned value
> is always the total number of pages in the array/list, not the
> number of pages that were allocated during this call.
> 
> The only change in behavior with this patch is the value returned
> if prepare_alloc_pages() fails. To match the API contract, the
> number of pages currently in the array/list is returned in this
> case.
> 
> The call site in __page_pool_alloc_pages_slow() also seems to be
> confused on this matter. It should be attended to by someone who
> is familiar with that code.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>
Jesper Dangaard Brouer June 29, 2021, 4:01 p.m. UTC | #2
On 29/06/2021 15.48, Chuck Lever wrote:

> The call site in __page_pool_alloc_pages_slow() also seems to be
> confused on this matter. It should be attended to by someone who
> is familiar with that code.

I don't think we need a fix for __page_pool_alloc_pages_slow(), as the 
array is guaranteed to be empty.

But a fix would look like this:

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index c137ce308c27..1b04538a3da3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -245,22 +245,23 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
         if (unlikely(pp_order))
                 return __page_pool_alloc_page_order(pool, gfp);

         /* Unnecessary as alloc cache is empty, but guarantees zero count */
-       if (unlikely(pool->alloc.count > 0))
+       if (unlikely(pool->alloc.count > 0))   // ^^^^^^^^^^^^^^^^^^^^^^
                 return pool->alloc.cache[--pool->alloc.count];

         /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
         memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);

+       /* bulk API ret value also counts existing pages, but array is empty */
         nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
         if (unlikely(!nr_pages))
                 return NULL;

         /* Pages have been filled into alloc.cache array, but count is zero and
          * page elements have possibly not been DMA mapped.
          */
-       for (i = 0; i < nr_pages; i++) {
+       for (i = pool->alloc.count; i < nr_pages; i++) {
                 page = pool->alloc.cache[i];
                 if ((pp_flags & PP_FLAG_DMA_MAP) &&
                     unlikely(!page_pool_dma_map(pool, page))) {
                         put_page(page);
Jesper Dangaard Brouer June 29, 2021, 4:04 p.m. UTC | #3
On 29/06/2021 17.33, Mel Gorman wrote:
> On Tue, Jun 29, 2021 at 09:48:15AM -0400, Chuck Lever wrote:
>> The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
>> bounds check after checking populated elements") was possibly
>> confused by the mixture of return values throughout the function.
>>
>> The API contract is clear that the function "Returns the number of
>> pages on the list or array." It does not list zero as a unique
>> return value with a special meaning. Therefore zero is a plausible
>> return value only if @nr_pages is zero or less.
>>
>> Clean up the return logic to make it clear that the returned value
>> is always the total number of pages in the array/list, not the
>> number of pages that were allocated during this call.
>>
>> The only change in behavior with this patch is the value returned
>> if prepare_alloc_pages() fails. To match the API contract, the
>> number of pages currently in the array/list is returned in this
>> case.
>>
>> The call site in __page_pool_alloc_pages_slow() also seems to be
>> confused on this matter. It should be attended to by someone who
>> is familiar with that code.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Chuck Lever III June 29, 2021, 4:32 p.m. UTC | #4
> On Jun 29, 2021, at 12:01 PM, Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
> 
> On 29/06/2021 15.48, Chuck Lever wrote:
> 
>> The call site in __page_pool_alloc_pages_slow() also seems to be
>> confused on this matter. It should be attended to by someone who
>> is familiar with that code.
> 
> I don't think we need a fix for __page_pool_alloc_pages_slow(), as the array is guaranteed to be empty.
> 
> But a fix would look like this:
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index c137ce308c27..1b04538a3da3 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -245,22 +245,23 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>         if (unlikely(pp_order))
>                 return __page_pool_alloc_page_order(pool, gfp);
> 
>         /* Unnecessary as alloc cache is empty, but guarantees zero count */
> -       if (unlikely(pool->alloc.count > 0))
> +       if (unlikely(pool->alloc.count > 0))   // ^^^^^^^^^^^^^^^^^^^^^^
>                 return pool->alloc.cache[--pool->alloc.count];
> 
>         /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>         memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
> 
> +       /* bulk API ret value also counts existing pages, but array is empty */
>         nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
>         if (unlikely(!nr_pages))
>                 return NULL;

Thanks, this check makes sense to me now.


>         /* Pages have been filled into alloc.cache array, but count is zero and
>          * page elements have possibly not been DMA mapped.
>          */
> -       for (i = 0; i < nr_pages; i++) {
> +       for (i = pool->alloc.count; i < nr_pages; i++) {
>                 page = pool->alloc.cache[i];
>                 if ((pp_flags & PP_FLAG_DMA_MAP) &&
>                     unlikely(!page_pool_dma_map(pool, page))) {
>                         put_page(page);
> 

--
Chuck Lever
Mel Gorman June 30, 2021, 6:58 a.m. UTC | #5
On Tue, Jun 29, 2021 at 06:01:12PM +0200, Jesper Dangaard Brouer wrote:
> On 29/06/2021 15.48, Chuck Lever wrote:
> 
> > The call site in __page_pool_alloc_pages_slow() also seems to be
> > confused on this matter. It should be attended to by someone who
> > is familiar with that code.
> 
> I don't think we need a fix for __page_pool_alloc_pages_slow(), as the array
> is guaranteed to be empty.
> 
> But a fix would look like this:
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index c137ce308c27..1b04538a3da3 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -245,22 +245,23 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>         if (unlikely(pp_order))
>                 return __page_pool_alloc_page_order(pool, gfp);
> 
>         /* Unnecessary as alloc cache is empty, but guarantees zero count */
> -       if (unlikely(pool->alloc.count > 0))
> +       if (unlikely(pool->alloc.count > 0))   // ^^^^^^^^^^^^^^^^^^^^^^
>                 return pool->alloc.cache[--pool->alloc.count];
> 
>         /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>         memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
> 
> +       /* bulk API ret value also counts existing pages, but array is empty */
>         nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
>         if (unlikely(!nr_pages))
>                 return NULL;
> 
>         /* Pages have been filled into alloc.cache array, but count is zero and
>          * page elements have possibly not been DMA mapped.
>          */
> -       for (i = 0; i < nr_pages; i++) {
> +       for (i = pool->alloc.count; i < nr_pages; i++) {

That last part would break as the loop is updating pool->alloc.count.
Just setting pool->alloc.count = nr_pages would break if PP_FLAG_DMA_MAP
was set and page_pool_dma_map() failed. Right?
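
For context, the loop in question (condensed from net/core/page_pool.c of
that era; the tracing and in-flight accounting are elided) uses
pool->alloc.count as a write cursor while i is the read cursor:

        for (i = 0; i < nr_pages; i++) {
                page = pool->alloc.cache[i];
                if ((pp_flags & PP_FLAG_DMA_MAP) &&
                    unlikely(!page_pool_dma_map(pool, page))) {
                        /* Mapping failed: drop the page and leave the
                         * write cursor where it is.
                         */
                        put_page(page);
                        continue;
                }
                /* Mapping succeeded: compact the page toward the front
                 * of the array and advance the write cursor.
                 */
                pool->alloc.cache[pool->alloc.count++] = page;
        }

Starting i at pool->alloc.count would set the read cursor chasing a write
cursor that the loop body itself advances, which is the breakage pointed
out above.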
Jesper Dangaard Brouer June 30, 2021, 11:22 a.m. UTC | #6
On 30/06/2021 08.58, Mel Gorman wrote:
> On Tue, Jun 29, 2021 at 06:01:12PM +0200, Jesper Dangaard Brouer wrote:
>> On 29/06/2021 15.48, Chuck Lever wrote:
>>
>>> The call site in __page_pool_alloc_pages_slow() also seems to be
>>> confused on this matter. It should be attended to by someone who
>>> is familiar with that code.
>> I don't think we need a fix for __page_pool_alloc_pages_slow(), as the array
>> is guaranteed to be empty.
>>
>> But a fix would look like this:
>>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index c137ce308c27..1b04538a3da3 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -245,22 +245,23 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>>          if (unlikely(pp_order))
>>                  return __page_pool_alloc_page_order(pool, gfp);
>>
>>          /* Unnecessary as alloc cache is empty, but guarantees zero count */
>> -       if (unlikely(pool->alloc.count > 0))
>> +       if (unlikely(pool->alloc.count > 0))   // ^^^^^^^^^^^^^^^^^^^^^^
>>                  return pool->alloc.cache[--pool->alloc.count];
>>
>>          /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>>          memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
>>
>> +       /* bulk API ret value also counts existing pages, but array is empty */
>>          nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
>>          if (unlikely(!nr_pages))
>>                  return NULL;
>>
>>          /* Pages have been filled into alloc.cache array, but count is zero and
>>           * page elements have possibly not been DMA mapped.
>>           */
>> -       for (i = 0; i < nr_pages; i++) {
>> +       for (i = pool->alloc.count; i < nr_pages; i++) {
> That last part would break as the loop is updating pool->alloc.count.
The last part "i = pool->alloc.count" is probably a bad idea.
> Just setting pool->alloc.count = nr_pages would break if PP_FLAG_DMA_MAP
> was set and page_pool_dma_map() failed. Right?

Yes, this loop is calling page_pool_dma_map(), and if that fails we
don't advance pool->alloc.count.

--Jesper
Mel Gorman June 30, 2021, 12:05 p.m. UTC | #7
On Wed, Jun 30, 2021 at 01:22:24PM +0200, Jesper Dangaard Brouer wrote:
> 
> On 30/06/2021 08.58, Mel Gorman wrote:
> > On Tue, Jun 29, 2021 at 06:01:12PM +0200, Jesper Dangaard Brouer wrote:
> > > On 29/06/2021 15.48, Chuck Lever wrote:
> > > 
> > > > The call site in __page_pool_alloc_pages_slow() also seems to be
> > > > confused on this matter. It should be attended to by someone who
> > > > is familiar with that code.
> > > I don't think we need a fix for __page_pool_alloc_pages_slow(), as the array
> > > is guaranteed to be empty.
> > > 
> > > But a fix would look like this:
> > > 
> > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > index c137ce308c27..1b04538a3da3 100644
> > > --- a/net/core/page_pool.c
> > > +++ b/net/core/page_pool.c
> > > @@ -245,22 +245,23 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
> > >          if (unlikely(pp_order))
> > >                  return __page_pool_alloc_page_order(pool, gfp);
> > > 
> > >          /* Unnecessary as alloc cache is empty, but guarantees zero count */
> > > -       if (unlikely(pool->alloc.count > 0))
> > > +       if (unlikely(pool->alloc.count > 0))   // ^^^^^^^^^^^^^^^^^^^^^^
> > >                  return pool->alloc.cache[--pool->alloc.count];
> > > 
> > >          /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
> > >          memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
> > > 
> > > +       /* bulk API ret value also counts existing pages, but array is empty */
> > >          nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
> > >          if (unlikely(!nr_pages))
> > >                  return NULL;
> > > 
> > >          /* Pages have been filled into alloc.cache array, but count is zero and
> > >           * page elements have possibly not been DMA mapped.
> > >           */
> > > -       for (i = 0; i < nr_pages; i++) {
> > > +       for (i = pool->alloc.count; i < nr_pages; i++) {
> > That last part would break as the loop is updating pool->alloc.count.
>
> The last part "i = pool->alloc.count" is probably a bad idea.

It is, because it confuses the context: either alloc.count is guaranteed
to be zero or it's not. With i initialised to zero, it's fairly clear
that a) alloc.count is always zero at that point and b) the way i and
alloc.count work means that an array element pointing to a freed page is
either ignored or overwritten by a valid page pointer in a later loop
iteration.
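
A short worked trace (hypothetical values) illustrates point b):

        /* nr_pages = 3, and DMA mapping fails for the page in slot 1:
         *
         *   i = 0: map OK    -> cache[0] = cache[0]; alloc.count becomes 1
         *   i = 1: map fails -> put_page();          alloc.count stays 1
         *   i = 2: map OK    -> cache[1] = cache[2]; alloc.count becomes 2
         *
         * Slot 2 still holds a stale pointer afterwards, but it lies
         * beyond alloc.count, so it is never handed out and is cleared
         * by the memset on the next slow-path refill.
         */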

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e7af86e1a312..49eb2e134f9d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5059,7 +5059,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	int nr_populated = 0;
 
 	if (unlikely(nr_pages <= 0))
-		return 0;
+		goto out;
 
 	/*
 	 * Skip populated array elements to determine if any pages need
@@ -5070,7 +5070,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	/* Already populated array? */
 	if (unlikely(page_array && nr_pages - nr_populated == 0))
-		return nr_populated;
+		goto out;
 
 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
@@ -5080,7 +5080,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
 	if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
-		return 0;
+		goto out;
 	gfp = alloc_gfp;
 
 	/* Find an allowed local zone that meets the low watermark. */
@@ -5153,6 +5153,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	local_irq_restore(flags);
 
+out:
 	return nr_populated;
 
 failed_irq:
@@ -5168,7 +5169,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_populated++;
 	}
 
-	return nr_populated;
+	goto out;
 }
 EXPORT_SYMBOL_GPL(__alloc_pages_bulk);