page_frag: Recover from memory pressure

Message ID 20201105042140.5253-1-willy@infradead.org (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers

Commit Message

Matthew Wilcox (Oracle) Nov. 5, 2020, 4:21 a.m. UTC
When the machine is under extreme memory pressure, the page_frag allocator
signals this to the networking stack by marking allocations with the
'pfmemalloc' flag, which causes non-essential packets to be dropped.
Unfortunately, even after the machine recovers from the low memory
condition, the page continues to be used by the page_frag allocator,
so all allocations from this page will continue to be dropped.

Fix this by freeing and re-allocating the page instead of recycling it.

Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bert Barbe <bert.barbe@oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: SRINIVAS <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/page_alloc.c | 4 ++++
 1 file changed, 4 insertions(+)
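
The behaviour described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `model_frag_cache`, `model_refill`, `model_frag_alloc`, `model_tainted_after_recovery` and `FRAGS_PER_PAGE` are hypothetical names, and the model assumes every fragment is released before the page is exhausted, so the cache is always the sole owner at recycle time.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAGS_PER_PAGE 4	/* hypothetical: fragments carved from one page */

/* Toy stand-in for struct page_frag_cache; not the kernel structure. */
struct model_frag_cache {
	bool have_page;
	bool pfmemalloc;	/* recorded when the page was allocated */
	int remaining;		/* fragments left in the current page */
};

/* Allocate a fresh page; it is marked pfmemalloc only if memory is low now. */
static void model_refill(struct model_frag_cache *nc, bool under_pressure)
{
	nc->have_page = true;
	nc->pfmemalloc = under_pressure;
	nc->remaining = FRAGS_PER_PAGE;
}

/* Hand out one fragment and report whether it carries the pfmemalloc mark. */
static bool model_frag_alloc(struct model_frag_cache *nc, bool under_pressure,
			     bool apply_fix)
{
	if (!nc->have_page)
		model_refill(nc, under_pressure);

	if (nc->remaining == 0) {
		/* Assumes every earlier fragment was already released, so the
		 * cache is the sole owner of the exhausted page. */
		if (apply_fix && nc->pfmemalloc)
			model_refill(nc, under_pressure);	/* free and re-allocate */
		else
			nc->remaining = FRAGS_PER_PAGE;		/* recycle; flag is kept */
	}

	assert(nc->remaining > 0);
	nc->remaining--;
	return nc->pfmemalloc;
}

/* One allocation under pressure, then n after recovery; count marked ones. */
static int model_tainted_after_recovery(bool apply_fix, int n)
{
	struct model_frag_cache nc = { 0 };
	int tainted = 0;

	model_frag_alloc(&nc, true, apply_fix);
	for (int i = 0; i < n; i++)
		tainted += model_frag_alloc(&nc, false, apply_fix);
	return tainted;
}
```

In this model, `model_tainted_after_recovery(false, 100)` reports all 100 post-recovery fragments as marked, because the recycled page never loses its flag. With `apply_fix` set, only the fragments still left in the original page are marked, and everything after the first exhaustion comes from a fresh page.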

Comments

Vlastimil Babka Nov. 5, 2020, 11:56 a.m. UTC | #1
On 11/5/20 5:21 AM, Matthew Wilcox (Oracle) wrote:
> When the machine is under extreme memory pressure, the page_frag allocator
> signals this to the networking stack by marking allocations with the
> 'pfmemalloc' flag, which causes non-essential packets to be dropped.
> Unfortunately, even after the machine recovers from the low memory
> condition, the page continues to be used by the page_frag allocator,
> so all allocations from this page will continue to be dropped.
> 
> Fix this by freeing and re-allocating the page instead of recycling it.
> 
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: Bert Barbe <bert.barbe@oracle.com>
> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Cc: SRINIVAS <srinivas.eeda@oracle.com>
> Cc: stable@vger.kernel.org
> Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/page_alloc.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 778e815130a6..631546ae1c53 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5139,6 +5139,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>   
>   		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>   			goto refill;
> +		if (nc->pfmemalloc) {
> +			free_the_page(page, compound_order(page));
> +			goto refill;

Theoretically the refill can fail and we return NULL while leaving nc->va 
pointing to a freed page, so I think you should set nc->va to NULL.

Geez, can't the same thing already happen after we sub the nc->pagecnt_bias from 
page ref, and last users of the page fragments then return them and dec the ref 
to zero and the page gets freed?

> +		}
>   
>   #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>   		/* if size can vary use size else just use PAGE_SIZE */
>
Matthew Wilcox (Oracle) Nov. 5, 2020, 12:05 p.m. UTC | #2
On Thu, Nov 05, 2020 at 12:56:43PM +0100, Vlastimil Babka wrote:
> > +++ b/mm/page_alloc.c
> > @@ -5139,6 +5139,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> >   		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> >   			goto refill;
> > +		if (nc->pfmemalloc) {
> > +			free_the_page(page, compound_order(page));
> > +			goto refill;
> 
> Theoretically the refill can fail and we return NULL while leaving nc->va
> pointing to a freed page, so I think you should set nc->va to NULL.
> 
> Geez, can't the same thing already happen after we sub the nc->pagecnt_bias
> from page ref, and last users of the page fragments then return them and dec
> the ref to zero and the page gets freed?

I don't think you read __page_frag_cache_refill() closely enough ...

        if (unlikely(!page))
                page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);

        nc->va = page ? page_address(page) : NULL;
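
The point of the quoted lines can be sketched in userspace as follows. This is an illustrative model only: `model_cache`, `model_backing_page`, `model_refill` and `model_free_then_refill` are hypothetical names, and the `alloc_fails` parameter stands in for the page allocator returning NULL. It shows that a refill which unconditionally assigns `va` leaves no stale pointer behind, even when the allocation fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cache holding only the field under discussion. */
struct model_cache {
	void *va;
};

static char model_backing_page[64];	/* hypothetical stand-in for a real page */

/* Mirrors the quoted tail of the refill: va is assigned on success and on
 * failure alike, so a failed allocation stores NULL. */
static void *model_refill(struct model_cache *nc, bool alloc_fails)
{
	void *page = alloc_fails ? NULL : model_backing_page;

	nc->va = page;
	return page;
}

/* Model of "free the page, then goto refill"; returns nc->va afterwards. */
static void *model_free_then_refill(struct model_cache *nc, bool alloc_fails)
{
	/* nc->va is stale from here until the refill overwrites it. */
	model_refill(nc, alloc_fails);
	return nc->va;
}
```

Calling `model_free_then_refill(&nc, true)` on a cache whose `va` was previously valid leaves `nc.va` as NULL, which is the property the reply relies on.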
Vlastimil Babka Nov. 5, 2020, 12:09 p.m. UTC | #3
On 11/5/20 1:05 PM, Matthew Wilcox wrote:
> On Thu, Nov 05, 2020 at 12:56:43PM +0100, Vlastimil Babka wrote:
>> > +++ b/mm/page_alloc.c
>> > @@ -5139,6 +5139,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>> >   		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>> >   			goto refill;
>> > +		if (nc->pfmemalloc) {
>> > +			free_the_page(page, compound_order(page));
>> > +			goto refill;
>> 
>> Theoretically the refill can fail and we return NULL while leaving nc->va
>> pointing to a freed page, so I think you should set nc->va to NULL.
>> 
>> Geez, can't the same thing already happen after we sub the nc->pagecnt_bias
>> from page ref, and last users of the page fragments then return them and dec
>> the ref to zero and the page gets freed?
> 
> I don't think you read __page_frag_cache_refill() closely enough ...

Or rather not at all, sorry :) somehow I just saw "ah here we call the page 
allocator".

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> 
>          if (unlikely(!page))
>                  page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> 
>          nc->va = page ? page_address(page) : NULL;
> 
>
Eric Dumazet Nov. 5, 2020, 1:21 p.m. UTC | #4
On 11/5/20 5:21 AM, Matthew Wilcox (Oracle) wrote:
> When the machine is under extreme memory pressure, the page_frag allocator
> signals this to the networking stack by marking allocations with the
> 'pfmemalloc' flag, which causes non-essential packets to be dropped.
> Unfortunately, even after the machine recovers from the low memory
> condition, the page continues to be used by the page_frag allocator,
> so all allocations from this page will continue to be dropped.
> 
> Fix this by freeing and re-allocating the page instead of recycling it.
> 
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: Bert Barbe <bert.barbe@oracle.com>
> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Cc: SRINIVAS <srinivas.eeda@oracle.com>
> Cc: stable@vger.kernel.org
> Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")

Your patch looks fine, although this Fixes: tag seems incorrect.

79930f5892e ("net: do not deplete pfmemalloc reserve") was propagating
the page pfmemalloc status into the skb, and seems correct to me.

The bug was the page_frag_alloc() was keeping a problematic page for
an arbitrary period of time ?

> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/page_alloc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 778e815130a6..631546ae1c53 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5139,6 +5139,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>  
>  		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>  			goto refill;
> +		if (nc->pfmemalloc) {

                if (unlikely(nc->pfmemalloc)) {

> +			free_the_page(page, compound_order(page));
> +			goto refill;
> +		}
>  
>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>  		/* if size can vary use size else just use PAGE_SIZE */
>
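
For readers unfamiliar with the suggested `unlikely()` annotation: it is a branch hint and does not change the result of the test. A minimal userspace sketch follows, assuming a GCC- or Clang-compatible compiler that provides `__builtin_expect`; the macro here is defined locally for illustration, and `model_take_slow_path` is a hypothetical name.

```c
#include <assert.h>

/* Local definition for illustration; tells the compiler the condition is
 * expected to be false so the fall-through path is laid out as the hot one. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Returns 1 when the rare path is taken, 0 otherwise. */
static int model_take_slow_path(int pfmemalloc)
{
	if (unlikely(pfmemalloc))
		return 1;	/* rare: page came from emergency reserves */
	return 0;		/* common case */
}
```

The hint only affects code layout and static prediction; `model_take_slow_path()` returns the same values with or without it.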
Matthew Wilcox (Oracle) Nov. 5, 2020, 2:02 p.m. UTC | #5
On Thu, Nov 05, 2020 at 02:21:25PM +0100, Eric Dumazet wrote:
> On 11/5/20 5:21 AM, Matthew Wilcox (Oracle) wrote:
> > When the machine is under extreme memory pressure, the page_frag allocator
> > signals this to the networking stack by marking allocations with the
> > 'pfmemalloc' flag, which causes non-essential packets to be dropped.
> > Unfortunately, even after the machine recovers from the low memory
> > condition, the page continues to be used by the page_frag allocator,
> > so all allocations from this page will continue to be dropped.
> > 
> > Fix this by freeing and re-allocating the page instead of recycling it.
> > 
> > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> > Cc: Bert Barbe <bert.barbe@oracle.com>
> > Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
> > Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> > Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
> > Cc: Joe Jin <joe.jin@oracle.com>
> > Cc: SRINIVAS <srinivas.eeda@oracle.com>
> > Cc: stable@vger.kernel.org
> > Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
> 
> Your patch looks fine, although this Fixes: tag seems incorrect.
> 
> 79930f5892e ("net: do not deplete pfmemalloc reserve") was propagating
> the page pfmemalloc status into the skb, and seems correct to me.
> 
> The bug was the page_frag_alloc() was keeping a problematic page for
> an arbitrary period of time ?

Isn't this the commit which unmasks the problem, though?  I don't think
it's the buggy commit, but if your tree doesn't have 79930f5892e, then
you don't need this patch.

Or are you saying the problem dates back all the way to
c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")

> > +		if (nc->pfmemalloc) {
> 
>                 if (unlikely(nc->pfmemalloc)) {

ACK.  Will make the change once we've settled on an appropriate Fixes tag.
Matthew Wilcox (Oracle) Nov. 9, 2020, 2:32 p.m. UTC | #6
On Thu, Nov 05, 2020 at 02:02:24PM +0000, Matthew Wilcox wrote:
> On Thu, Nov 05, 2020 at 02:21:25PM +0100, Eric Dumazet wrote:
> > On 11/5/20 5:21 AM, Matthew Wilcox (Oracle) wrote:
> > > When the machine is under extreme memory pressure, the page_frag allocator
> > > signals this to the networking stack by marking allocations with the
> > > 'pfmemalloc' flag, which causes non-essential packets to be dropped.
> > > Unfortunately, even after the machine recovers from the low memory
> > > condition, the page continues to be used by the page_frag allocator,
> > > so all allocations from this page will continue to be dropped.
> > > 
> > > Fix this by freeing and re-allocating the page instead of recycling it.
> > > 
> > > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > > Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> > > Cc: Bert Barbe <bert.barbe@oracle.com>
> > > Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
> > > Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> > > Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
> > > Cc: Joe Jin <joe.jin@oracle.com>
> > > Cc: SRINIVAS <srinivas.eeda@oracle.com>
> > > Cc: stable@vger.kernel.org
> > > Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
> > 
> > Your patch looks fine, although this Fixes: tag seems incorrect.
> > 
> > 79930f5892e ("net: do not deplete pfmemalloc reserve") was propagating
> > the page pfmemalloc status into the skb, and seems correct to me.
> > 
> > The bug was the page_frag_alloc() was keeping a problematic page for
> > an arbitrary period of time ?
> 
> Isn't this the commit which unmasks the problem, though?  I don't think
> it's the buggy commit, but if your tree doesn't have 79930f5892e, then
> you don't need this patch.
> 
> Or are you saying the problem dates back all the way to
> c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
> 
> > > +		if (nc->pfmemalloc) {
> > 
> >                 if (unlikely(nc->pfmemalloc)) {
> 
> ACK.  Will make the change once we've settled on an appropriate Fixes tag.

Which commit should I claim this fixes?
Eric Dumazet Nov. 9, 2020, 2:37 p.m. UTC | #7
On 11/9/20 3:32 PM, Matthew Wilcox wrote:
> On Thu, Nov 05, 2020 at 02:02:24PM +0000, Matthew Wilcox wrote:
>> On Thu, Nov 05, 2020 at 02:21:25PM +0100, Eric Dumazet wrote:
>>> On 11/5/20 5:21 AM, Matthew Wilcox (Oracle) wrote:
>>>> When the machine is under extreme memory pressure, the page_frag allocator
>>>> signals this to the networking stack by marking allocations with the
>>>> 'pfmemalloc' flag, which causes non-essential packets to be dropped.
>>>> Unfortunately, even after the machine recovers from the low memory
>>>> condition, the page continues to be used by the page_frag allocator,
>>>> so all allocations from this page will continue to be dropped.
>>>>
>>>> Fix this by freeing and re-allocating the page instead of recycling it.
>>>>
>>>> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>>> Cc: Bert Barbe <bert.barbe@oracle.com>
>>>> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
>>>> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
>>>> Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
>>>> Cc: Joe Jin <joe.jin@oracle.com>
>>>> Cc: SRINIVAS <srinivas.eeda@oracle.com>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
>>>
>>> Your patch looks fine, although this Fixes: tag seems incorrect.
>>>
>>> 79930f5892e ("net: do not deplete pfmemalloc reserve") was propagating
>>> the page pfmemalloc status into the skb, and seems correct to me.
>>>
>>> The bug was the page_frag_alloc() was keeping a problematic page for
>>> an arbitrary period of time ?
>>
>> Isn't this the commit which unmasks the problem, though?  I don't think
>> it's the buggy commit, but if your tree doesn't have 79930f5892e, then
>> you don't need this patch.
>>
>> Or are you saying the problem dates back all the way to
>> c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
>>
>>>> +		if (nc->pfmemalloc) {
>>>
>>>                 if (unlikely(nc->pfmemalloc)) {
>>
>> ACK.  Will make the change once we've settled on an appropriate Fixes tag.
> 
> Which commit should I claim this fixes?

Hmm, no big deal, let's not waste time on tracking precise bug origin.
Dongli Zhang Nov. 15, 2020, 6:47 a.m. UTC | #8
As of linux-next, this patch is not in the akpm branch.

Following an offline discussion with Matthew, I will take over authorship of
this patch, since Matthew was providing review and suggested this as a better
alternative.

That will make it much easier for me to track this patch.

I will re-send the patch as v2 with:

1. change author from Matthew to Dongli
2. Add references to all prior discussions
3. Add more details to the commit message so that it is much easier to find
online when other people encounter this issue again.
4. Add "Acked-by: Vlastimil Babka <vbabka@suse.cz>".

Thank you very much!

Dongli Zhang

On 11/9/20 6:37 AM, Eric Dumazet wrote:
> 
> 
> On 11/9/20 3:32 PM, Matthew Wilcox wrote:
>> On Thu, Nov 05, 2020 at 02:02:24PM +0000, Matthew Wilcox wrote:
>>> On Thu, Nov 05, 2020 at 02:21:25PM +0100, Eric Dumazet wrote:
>>>> On 11/5/20 5:21 AM, Matthew Wilcox (Oracle) wrote:
>>>>> When the machine is under extreme memory pressure, the page_frag allocator
>>>>> signals this to the networking stack by marking allocations with the
>>>>> 'pfmemalloc' flag, which causes non-essential packets to be dropped.
>>>>> Unfortunately, even after the machine recovers from the low memory
>>>>> condition, the page continues to be used by the page_frag allocator,
>>>>> so all allocations from this page will continue to be dropped.
>>>>>
>>>>> Fix this by freeing and re-allocating the page instead of recycling it.
>>>>>
>>>>> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>>>> Cc: Bert Barbe <bert.barbe@oracle.com>
>>>>> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
>>>>> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
>>>>> Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
>>>>> Cc: Joe Jin <joe.jin@oracle.com>
>>>>> Cc: SRINIVAS <srinivas.eeda@oracle.com>
>>>>> Cc: stable@vger.kernel.org
>>>>> Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
>>>>
>>>> Your patch looks fine, although this Fixes: tag seems incorrect.
>>>>
>>>> 79930f5892e ("net: do not deplete pfmemalloc reserve") was propagating
>>>> the page pfmemalloc status into the skb, and seems correct to me.
>>>>
>>>> The bug was the page_frag_alloc() was keeping a problematic page for
>>>> an arbitrary period of time ?
>>>
>>> Isn't this the commit which unmasks the problem, though?  I don't think
>>> it's the buggy commit, but if your tree doesn't have 79930f5892e, then
>>> you don't need this patch.
>>>
>>> Or are you saying the problem dates back all the way to
>>> c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
>>>
>>>>> +		if (nc->pfmemalloc) {
>>>>
>>>>                 if (unlikely(nc->pfmemalloc)) {
>>>
>>> ACK.  Will make the change once we've settled on an appropriate Fixes tag.
>>
>> Which commit should I claim this fixes?
> 
> Hmm, no big deal, let's not waste time on tracking precise bug origin.
>

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 778e815130a6..631546ae1c53 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5139,6 +5139,10 @@  void *page_frag_alloc(struct page_frag_cache *nc,
 
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
 			goto refill;
+		if (nc->pfmemalloc) {
+			free_the_page(page, compound_order(page));
+			goto refill;
+		}
 
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 		/* if size can vary use size else just use PAGE_SIZE */