
[0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT

Message ID 20200918194817.48921-1-urezki@gmail.com
Series kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT

Message

Uladzislau Rezki Sept. 18, 2020, 7:48 p.m. UTC
Hello, folks!

This is another iteration of fixing kvfree_rcu() issues related
to the CONFIG_PROVE_RAW_LOCK_NESTING and CONFIG_PREEMPT_RT configs.

The first discussion is here: https://lkml.org/lkml/2020/8/9/195

- As an outcome of it, Peter proposed that, instead of using a
special "lock-less" flag, the lock-less access to the pcplist
should be moved into a separate function.

- The second change is a special worker thread that prefetches pages
when a per-cpu page cache is depleted (which is absolutely normal).
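The two points above can be pictured with a small userspace model. This is a sketch under stated assumptions, not the code in the series: it models a single CPU, uses malloc() in place of the page allocator, and the names cache_get_lockless() and refill_work() are hypothetical (the series itself adds __rcu_alloc_page_lockless()). The fast path is its own function rather than a flag on a general allocator, it never sleeps or locks, and on a miss it only requests a refill; the refill runs later from regular (sleepable) context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-CPU page cache; malloc() stands in for the page allocator. */
#define CACHE_CAP 4
#define PAGE_SZ 4096

struct page_cache {
	void *pages[CACHE_CAP];
	int nr;            /* number of cached pages */
	bool work_queued;  /* refill work has been requested */
};

/*
 * Lock-less fast path, kept as a separate function instead of a
 * "lock-less" flag on the general allocator. It never sleeps and takes
 * no lock; it only pops from the cache. On a miss it requests a refill
 * and returns NULL so the caller can take its fallback path.
 */
static void *cache_get_lockless(struct page_cache *c)
{
	if (c->nr > 0)
		return c->pages[--c->nr];
	c->work_queued = true;  /* depletion is normal: ask the worker */
	return NULL;
}

/*
 * Refill "work" that would run later from regular (sleepable) context,
 * where an ordinary allocation is allowed. It fills the cache to capacity.
 */
static void refill_work(struct page_cache *c)
{
	while (c->nr < CACHE_CAP) {
		void *p = malloc(PAGE_SZ);
		if (!p)
			break;
		c->pages[c->nr++] = p;
	}
	c->work_queued = false;
}
```

In this model the caller treats NULL as "use the fallback path"; in the kernel the refill would be a queued work item rather than a direct call to refill_work().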

As usual, thank you for your attention and your help!

Uladzislau Rezki (Sony) (4):
  rcu/tree: Add a work to allocate pages from regular context
  mm: Add __rcu_alloc_page_lockless() func.
  rcu/tree: use __rcu_alloc_page_lockless() func.
  rcu/tree: Use schedule_delayed_work() instead of WQ_HIGHPRI queue

 include/linux/gfp.h |  1 +
 kernel/rcu/tree.c   | 90 ++++++++++++++++++++++++---------------------
 mm/page_alloc.c     | 82 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 41 deletions(-)

Comments

Paul E. McKenney Sept. 18, 2020, 10:15 p.m. UTC | #1
On Fri, Sep 18, 2020 at 09:48:13PM +0200, Uladzislau Rezki (Sony) wrote:
> Hello, folks!
> 
> This is another iteration of fixing kvfree_rcu() issues related
> to the CONFIG_PROVE_RAW_LOCK_NESTING and CONFIG_PREEMPT_RT configs.
> 
> The first discussion is here: https://lkml.org/lkml/2020/8/9/195
> 
> - As an outcome of it, Peter proposed that, instead of using a
> special "lock-less" flag, the lock-less access to the pcplist
> should be moved into a separate function.
> 
> - The second change is a special worker thread that prefetches pages
> when a per-cpu page cache is depleted (which is absolutely normal).
> 
> As usual, thank you for your attention and your help!
> 
> Uladzislau Rezki (Sony) (4):
>   rcu/tree: Add a work to allocate pages from regular context
>   mm: Add __rcu_alloc_page_lockless() func.
>   rcu/tree: use __rcu_alloc_page_lockless() func.
>   rcu/tree: Use schedule_delayed_work() instead of WQ_HIGHPRI queue

Thank you, Uladzislau!

I have pulled this into -rcu for review and testing.  I have not yet
assigned it to an intended release.

							Thanx, Paul

Joel Fernandes Sept. 30, 2020, 3:52 p.m. UTC | #2
On Fri, Sep 18, 2020 at 09:48:13PM +0200, Uladzislau Rezki (Sony) wrote:
> Hello, folks!
> 
> This is another iteration of fixing kvfree_rcu() issues related
> to the CONFIG_PROVE_RAW_LOCK_NESTING and CONFIG_PREEMPT_RT configs.
> 
> The first discussion is here: https://lkml.org/lkml/2020/8/9/195
> 
> - As an outcome of it, Peter proposed that, instead of using a
> special "lock-less" flag, the lock-less access to the pcplist
> should be moved into a separate function.
> 
> - The second change is a special worker thread that prefetches pages
> when a per-cpu page cache is depleted (which is absolutely normal).
> 
> As usual, thank you for your attention and your help!

Doesn't making it a lower priority WQ exacerbate the problem Mel described?

In other words:
1. The pcp cache is depleted by kvfree_rcu() without a refill or other
   measures to relieve memory pressure.
2. Other GFP_ATOMIC users are now more likely to hit the emergency
   reserves in the buddy allocator as the watermarks are crossed.
3. kvfree_rcu() notices the failure and queues a work item to do
   non-preemptible buddy allocations, which refill the pcp cache in the
   process.
4. But that happens much later, because this patch (4/4) deprioritized
   the refill work.
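To make step 4 concrete, here is a toy single-CPU model, entirely hypothetical and not taken from the patches: each tick kvfree_rcu() wants one page from the pcp cache; a miss is counted as a fallback (standing in for pressure on the buddy reserves) and queues refill work that runs `delay` ticks later and refills the cache to `cap` pages. It says nothing about real workqueue latencies; it only shows that, in this model, extra delay before the refill work runs turns directly into extra misses.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not kernel code). Returns how many
 * allocations missed the cache over `ticks` ticks, for a cache of
 * `cap` pages whose refill work runs `delay` ticks after being queued.
 */
static int count_fallbacks(int ticks, int cap, int delay)
{
	int cached = cap;
	int fallbacks = 0;
	int run_at = -1;  /* tick at which queued refill runs; -1 = none queued */

	for (int t = 0; t < ticks; t++) {
		if (run_at == t) {  /* the refill work finally runs */
			cached = cap;
			run_at = -1;
		}
		if (cached > 0) {
			cached--;  /* served from the pcp cache */
		} else {
			fallbacks++;  /* miss: caller must fall back */
			if (run_at < 0)
				run_at = t + 1 + delay;  /* queue refill once */
		}
	}
	return fallbacks;
}
```

With a 4-page cache over 20 ticks, count_fallbacks(20, 4, 0) returns 4, while count_fallbacks(20, 4, 3) returns 8.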

I'd suggest keeping it high priority, since I don't see how lowering it
can make things better.

Another option: why not just take the fallback path in the caller on the
first attempt and trigger the WQ to do the allocation? If the pool grows
too big, we can have shrinkers that free the excess memory, which will
help the phone use cases. That way no changes to the low-level allocator
are needed.
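The alternative above can be sketched as a userspace model. It is a minimal sketch under stated assumptions: the pool, pool_get(), pool_refill() and pool_shrink() are hypothetical names, malloc() stands in for the page allocator, and pool_shrink() only plays the role that a kernel shrinker's scan_objects callback (in struct shrinker) would play. The point is that the caller never calls into the low-level allocator from atomic context: on a miss it falls back immediately and requests a refill, and excess pages are trimmed when memory is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical page pool; malloc() stands in for the page allocator. */
#define PAGE_SZ 4096

struct page_pool {
	void **pages;           /* storage for cached page pointers */
	int nr, cap;            /* cached count and capacity */
	bool refill_requested;  /* refill work has been kicked */
};

/* Fast path: pop a cached page, or request a refill and return NULL
 * so the caller takes its existing fallback path right away. */
static void *pool_get(struct page_pool *pp)
{
	if (pp->nr > 0)
		return pp->pages[--pp->nr];
	pp->refill_requested = true;
	return NULL;
}

/* Refill work: would run in sleepable context; adds up to n pages. */
static void pool_refill(struct page_pool *pp, int n)
{
	while (n-- > 0 && pp->nr < pp->cap) {
		void *p = malloc(PAGE_SZ);
		if (!p)
			break;
		pp->pages[pp->nr++] = p;
	}
	pp->refill_requested = false;
}

/* Shrinker-style scan: free cached pages above `keep`; return count freed. */
static int pool_shrink(struct page_pool *pp, int keep)
{
	int freed = 0;

	while (pp->nr > keep) {
		free(pp->pages[--pp->nr]);
		freed++;
	}
	return freed;
}
```

For example, filling an 8-slot pool and then calling pool_shrink(&pp, 2) frees 6 pages and leaves 2 cached.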

Or did I miss something?

thanks,

 - Joel

