
[v13,1/7] net: split off __napi_busy_poll from napi_busy_poll

Message ID 20230518211751.3492982-2-shr@devkernel.io (mailing list archive)
State New
Series io_uring: add napi busy polling support

Commit Message

Stefan Roesch May 18, 2023, 9:17 p.m. UTC
This splits the core of napi_busy_loop() off into its own helper,
__napi_busy_poll(). This is done in preparation for an additional
busy-poll entry point that does not take the rcu_read_lock() itself;
that new function is introduced in the next patch.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
---
 net/core/dev.c | 99 ++++++++++++++++++++++++++++----------------------
 1 file changed, 56 insertions(+), 43 deletions(-)

Comments

Jakub Kicinski May 31, 2023, 5:26 p.m. UTC | #1
On Thu, 18 May 2023 14:17:45 -0700 Stefan Roesch wrote:
>  	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
> -	int (*napi_poll)(struct napi_struct *napi, int budget);
> -	void *have_poll_lock = NULL;
> -	struct napi_struct *napi;
> +	struct napi_busy_poll_ctx ctx = {};
>  
>  restart:

Can you refactor this further? I think the only state that's kept
across "restarts" is the start_time right? So this version is
effectively a loop around what ends up being napi_busy_loop_rcu(), no?
Jakub Kicinski June 1, 2023, 4:15 a.m. UTC | #2
On Wed, 31 May 2023 12:16:50 -0700 Stefan Roesch wrote:
> > This will conflict with:
> >
> >     https://git.kernel.org/netdev/net-next/c/c857946a4e26
> >
> > :( Not sure what to do about it..
> >
> > Maybe we can merge a simpler version to unblock io-uring (just add
> > need_resched() to your loop_end callback and you'll get the same
> > behavior). Refactor in net-next in parallel. Then once trees converge
> > do simple a cleanup and call the _rcu version?  
> 
> Jakub, I can certainly call need_resched() in the loop_end callback, but
> isn't there a potential race? need_resched() in the loop_end callback
> might not return true, but the need_resched() call in napi_busy_poll
> does?

need_resched() is best effort. It gets added to potentially long
execution paths and loops. Extra single round thru the loop won't 
make a difference.
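
A minimal sketch of the suggested workaround, assuming an io_uring-style
loop_end callback; the budget handling and data layout are illustrative,
not the actual io_uring code (needs <net/busy_poll.h> and
<linux/sched/signal.h>):

/* Sketch only: a loop_end callback that also reports need_resched(), so
 * napi_busy_loop() stops at the loop_end check instead of reaching its
 * internal need_resched() branch.
 */
static bool busy_poll_loop_end_sketch(void *data, unsigned long start_time)
{
	unsigned long *budget_usecs = data;	/* caller-chosen poll budget */

	if (need_resched() || signal_pending(current))
		return true;

	/* busy_loop_current_time() counts in ~usec units (local_clock() >> 10) */
	return time_after(busy_loop_current_time(),
			  start_time + *budget_usecs);
}

It would be passed as the loop_end / loop_end_arg pair to napi_busy_loop().
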
Stefan Roesch June 2, 2023, 4:12 a.m. UTC | #3
Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 31 May 2023 12:16:50 -0700 Stefan Roesch wrote:
>> > This will conflict with:
>> >
>> >     https://git.kernel.org/netdev/net-next/c/c857946a4e26
>> >
>> > :( Not sure what to do about it..
>> >
>> > Maybe we can merge a simpler version to unblock io-uring (just add
>> > need_resched() to your loop_end callback and you'll get the same
>> > behavior). Refactor in net-next in parallel. Then once trees converge
>> > do simple a cleanup and call the _rcu version?
>>
>> Jakub, I can certainly call need_resched() in the loop_end callback, but
>> isn't there a potential race? need_resched() in the loop_end callback
>> might not return true, but the need_resched() call in napi_busy_poll
>> does?
>
> need_resched() is best effort. It gets added to potentially long
> execution paths and loops. Extra single round thru the loop won't
> make a difference.

I might be missing something, but at a high level what can happen is:

io_napi_blocking_busy_loop()
  rcu_read_lock()
  __io_napi_do_busy_loop()
  rcu_read_unlock()

in __io_napi_do_busy_loop() we do

__io_napi_do_busy_loop()
    list_for_each_entry_rcu()
    napi_busy_loop()


and in napi_busy_loop()

napi_busy_loop()
  rcu_read_lock()
  __napi_busy_poll()
  loop_end()
  if (need_resched()) {
    rcu_read_unlock()
    schedule()
  }


The problem with checking need_resched() in loop_end is that it can
return false there while the subsequent need_resched() check inside
napi_busy_loop() returns true, because the resched flag can be set in
the window between the two checks. In that case napi_busy_loop() drops
the RCU read lock and calls schedule(), while the code in
io_napi_blocking_busy_loop() still believes it holds the read lock.
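
A condensed sketch of that calling pattern (the entry type and the call
into napi_busy_loop() are illustrative, not the series' actual code):

/* Sketch only.  The outer rcu_read_lock() protects the RCU list walk;
 * napi_busy_loop() nests its own rcu_read_lock(), and on need_resched()
 * it drops that lock and calls cond_resched() -- i.e. it may sleep while
 * the walk below still relies on being in a read-side critical section.
 */
struct napi_id_entry {			/* stands in for the io_uring entry type */
	unsigned int napi_id;
	struct list_head list;
};

static void blocking_busy_loop_sketch(struct list_head *napi_list,
				      bool (*loop_end)(void *, unsigned long),
				      void *loop_end_arg)
{
	struct napi_id_entry *e;

	rcu_read_lock();
	list_for_each_entry_rcu(e, napi_list, list)
		napi_busy_loop(e->napi_id, loop_end, loop_end_arg,
			       false, BUSY_POLL_BUDGET);
	rcu_read_unlock();
}
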
Jakub Kicinski June 2, 2023, 4:26 a.m. UTC | #4
On Thu, 01 Jun 2023 21:12:10 -0700 Stefan Roesch wrote:
> The problem with checking need_resched() in loop_end is that it can
> return false there while the subsequent need_resched() check inside
> napi_busy_loop() returns true, because the resched flag can be set in
> the window between the two checks. In that case napi_busy_loop() drops
> the RCU read lock and calls schedule(), while the code in
> io_napi_blocking_busy_loop() still believes it holds the read lock.

Ah, yes, now it makes sense. 
Yeah, that's a race; scratch the workaround idea then.
Let's work on revising the patches; we'll see where we
are in the development process when they're ready.
Stefan Roesch June 5, 2023, 5:47 p.m. UTC | #5
Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 18 May 2023 14:17:45 -0700 Stefan Roesch wrote:
>>  	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
>> -	int (*napi_poll)(struct napi_struct *napi, int budget);
>> -	void *have_poll_lock = NULL;
>> -	struct napi_struct *napi;
>> +	struct napi_busy_poll_ctx ctx = {};
>>
>>  restart:
>
> Can you refactor this further? I think the only state that's kept
> across "restarts" is the start_time right? So this version is
> effectively a loop around what ends up being napi_busy_loop_rcu(), no?

I'm not sure I understand this correctly. Do you want the start time to
be a parameter of the function napi_busy_poll_rcu?
Jakub Kicinski June 5, 2023, 6 p.m. UTC | #6
On Mon, 05 Jun 2023 10:47:40 -0700 Stefan Roesch wrote:
> > Can you refactor this further? I think the only state that's kept
> > across "restarts" is the start_time right? So this version is
> > effectively a loop around what ends up being napi_busy_loop_rcu(), no?  
> 
> I'm not sure I understand this correctly. Do you want the start time to
> be a parameter of the function napi_busy_poll_rcu?

The RCU and non-RCU version of this function end up looking very
similar, what I'm asking is to share more code.
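
A minimal sketch of that shape, assuming the shared inner helper takes
start_time as a parameter and reports back when it stopped because a
reschedule is pending; the names and the exact split are assumptions,
not the code that was eventually merged:

/* Sketch only: one inner loop, called under an RCU read lock held by the
 * caller, usable both by a napi_busy_loop_rcu()-style entry point and by
 * the restart loop below.
 */
static bool __napi_busy_loop_sketch(unsigned int napi_id,
				    bool (*loop_end)(void *, unsigned long),
				    void *loop_end_arg, bool prefer_busy_poll,
				    u16 budget, bool end_on_resched,
				    unsigned long start_time)
{
	struct napi_busy_poll_ctx ctx = {};
	bool want_resched = false;

	ctx.napi = napi_by_id(napi_id);
	if (!ctx.napi)
		return false;

	preempt_disable();
	for (;;) {
		__napi_busy_poll(&ctx, prefer_busy_poll, budget);

		if (!loop_end || loop_end(loop_end_arg, start_time))
			break;

		if (end_on_resched && unlikely(need_resched())) {
			/* let the non-RCU caller drop its locks,
			 * reschedule and restart
			 */
			want_resched = true;
			break;
		}
		cpu_relax();
	}
	if (ctx.napi_poll)
		busy_poll_stop(ctx.napi, ctx.have_poll_lock,
			       prefer_busy_poll, budget);
	preempt_enable();

	return want_resched;
}

void napi_busy_loop_sketch(unsigned int napi_id,
			   bool (*loop_end)(void *, unsigned long),
			   void *loop_end_arg, bool prefer_busy_poll, u16 budget)
{
	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;

	for (;;) {
		bool again;

		rcu_read_lock();
		again = __napi_busy_loop_sketch(napi_id, loop_end, loop_end_arg,
						prefer_busy_poll, budget,
						true, start_time);
		rcu_read_unlock();

		if (!again)
			break;
		cond_resched();
	}
}

A napi_busy_loop_rcu()-style caller would then invoke the inner helper
directly, with end_on_resched = false, under the RCU read lock it already
holds; start_time is the only state the restart loop has to carry.
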

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 253584777101..f4677aa20f84 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6166,66 +6166,79 @@  static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 	local_bh_enable();
 }
 
+struct napi_busy_poll_ctx {
+	struct napi_struct *napi;
+	int (*napi_poll)(struct napi_struct *napi, int budget);
+	void *have_poll_lock;
+};
+
+static inline void __napi_busy_poll(struct napi_busy_poll_ctx *ctx,
+				    bool prefer_busy_poll, u16 budget)
+{
+	struct napi_struct *napi = ctx->napi;
+	int work = 0;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	local_bh_disable();
+	if (!ctx->napi_poll) {
+		unsigned long val = READ_ONCE(napi->state);
+
+		/* If multiple threads are competing for this napi,
+		 * we avoid dirtying napi->state as much as we can.
+		 */
+		if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
+			   NAPIF_STATE_IN_BUSY_POLL)) {
+			if (prefer_busy_poll)
+				set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
+			goto count;
+		}
+		if (cmpxchg(&napi->state, val,
+			    val | NAPIF_STATE_IN_BUSY_POLL |
+				  NAPIF_STATE_SCHED) != val) {
+			if (prefer_busy_poll)
+				set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
+			goto count;
+		}
+		ctx->have_poll_lock = netpoll_poll_lock(napi);
+		ctx->napi_poll = napi->poll;
+	}
+	work = ctx->napi_poll(napi, budget);
+	trace_napi_poll(napi, work, budget);
+	gro_normal_list(napi);
+
+count:
+	if (work > 0)
+		__NET_ADD_STATS(dev_net(napi->dev),
+				LINUX_MIB_BUSYPOLLRXPACKETS, work);
+	local_bh_enable();
+}
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
 		    void *loop_end_arg, bool prefer_busy_poll, u16 budget)
 {
 	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
-	int (*napi_poll)(struct napi_struct *napi, int budget);
-	void *have_poll_lock = NULL;
-	struct napi_struct *napi;
+	struct napi_busy_poll_ctx ctx = {};
 
 restart:
-	napi_poll = NULL;
+	ctx.napi_poll = NULL;
 
 	rcu_read_lock();
 
-	napi = napi_by_id(napi_id);
-	if (!napi)
+	ctx.napi = napi_by_id(napi_id);
+	if (!ctx.napi)
 		goto out;
 
 	preempt_disable();
 	for (;;) {
-		int work = 0;
-
-		local_bh_disable();
-		if (!napi_poll) {
-			unsigned long val = READ_ONCE(napi->state);
-
-			/* If multiple threads are competing for this napi,
-			 * we avoid dirtying napi->state as much as we can.
-			 */
-			if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
-				   NAPIF_STATE_IN_BUSY_POLL)) {
-				if (prefer_busy_poll)
-					set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
-				goto count;
-			}
-			if (cmpxchg(&napi->state, val,
-				    val | NAPIF_STATE_IN_BUSY_POLL |
-					  NAPIF_STATE_SCHED) != val) {
-				if (prefer_busy_poll)
-					set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
-				goto count;
-			}
-			have_poll_lock = netpoll_poll_lock(napi);
-			napi_poll = napi->poll;
-		}
-		work = napi_poll(napi, budget);
-		trace_napi_poll(napi, work, budget);
-		gro_normal_list(napi);
-count:
-		if (work > 0)
-			__NET_ADD_STATS(dev_net(napi->dev),
-					LINUX_MIB_BUSYPOLLRXPACKETS, work);
-		local_bh_enable();
+		__napi_busy_poll(&ctx, prefer_busy_poll, budget);
 
 		if (!loop_end || loop_end(loop_end_arg, start_time))
 			break;
 
 		if (unlikely(need_resched())) {
-			if (napi_poll)
-				busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
+			if (ctx.napi_poll)
+				busy_poll_stop(ctx.napi, ctx.have_poll_lock, prefer_busy_poll, budget);
 			preempt_enable();
 			rcu_read_unlock();
 			cond_resched();
@@ -6235,8 +6248,8 @@  void napi_busy_loop(unsigned int napi_id,
 		}
 		cpu_relax();
 	}
-	if (napi_poll)
-		busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
+	if (ctx.napi_poll)
+		busy_poll_stop(ctx.napi, ctx.have_poll_lock, prefer_busy_poll, budget);
 	preempt_enable();
 out:
 	rcu_read_unlock();