[v1,net-next] af_unix: Try not to hold unix_gc_lock during accept().

Message ID 20240410201929.34716-1-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [v1,net-next] af_unix: Try not to hold unix_gc_lock during accept().

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 966 this patch: 967
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: dhowells@redhat.com
netdev/build_clang success Errors and warnings before: 954 this patch: 954
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 978 this patch: 979
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 66 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kuniyuki Iwashima April 10, 2024, 8:19 p.m. UTC
Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo
socket.") added spin_lock(&unix_gc_lock) to the accept() path, and it
caused a regression in a stress test, as reported by the kernel test
robot.

If the embryo socket is not part of the inflight graph, we need not
hold the lock.

To decide that in O(1) time and avoid the regression in the normal
use case,

  1. add a new stat unix_sk(sk)->scm_stat.nr_unix_fds

  2. count the number of inflight AF_UNIX sockets in the receive
     queue under unix_state_lock()

  3. move unix_update_edges() call under unix_state_lock()

  4. avoid locking if nr_unix_fds is 0 in unix_update_edges()

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202404101427.92a08551-oliver.sang@intel.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  1 +
 net/unix/af_unix.c    |  2 +-
 net/unix/garbage.c    | 21 ++++++++++++++++++---
 3 files changed, 20 insertions(+), 4 deletions(-)
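
For orientation, this is how unix_update_edges() reads with the patch
applied; it is assembled directly from the diff below and adds nothing
new:

	void unix_update_edges(struct unix_sock *receiver)
	{
		/* nr_unix_fds is only updated under unix_state_lock().
		 * If it's 0 here, the embryo socket is not part of the
		 * inflight graph, and GC will not see it.
		 */
		bool need_lock = !!receiver->scm_stat.nr_unix_fds;

		if (need_lock) {
			spin_lock(&unix_gc_lock);
			unix_update_graph(unix_sk(receiver->listener)->vertex);
		}

		receiver->listener = NULL;

		if (need_lock)
			spin_unlock(&unix_gc_lock);
	}

The caller, unix_accept(), now invokes this under unix_state_lock(tsk),
which is what makes the unlocked nr_unix_fds check safe.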

Comments

Jakub Kicinski April 13, 2024, 2:05 a.m. UTC | #1
On Wed, 10 Apr 2024 13:19:29 -0700 Kuniyuki Iwashima wrote:
>  void unix_update_edges(struct unix_sock *receiver)
>  {
> -	spin_lock(&unix_gc_lock);
> -	unix_update_graph(unix_sk(receiver->listener)->vertex);
> +	/* nr_unix_fds is only updated under unix_state_lock().
> +	 * If it's 0 here, the embryo socket is not part of the
> +	 * inflight graph, and GC will not see it.
> +	 */
> +	bool need_lock = !!receiver->scm_stat.nr_unix_fds;
> +
> +	if (need_lock) {
> +		spin_lock(&unix_gc_lock);
> +		unix_update_graph(unix_sk(receiver->listener)->vertex);
> +	}
> +
>  	receiver->listener = NULL;
> -	spin_unlock(&unix_gc_lock);
> +
> +	if (need_lock)
> +		spin_unlock(&unix_gc_lock);
>  }

Are you planning to add more code here? I feel like the sharing of
a single line is outweighed by the conditionals... I mean:

	/* ...
	 */
	if (!receiver->scm_stat.nr_unix_fd) {
		receiver->listener = NULL;
	} else {
		spin_lock(&unix_gc_lock);
		unix_update_graph(unix_sk(receiver->listener)->vertex);
		receiver->listener = NULL;
		spin_unlock(&unix_gc_lock);
	}

no?
Kuniyuki Iwashima April 13, 2024, 2:10 a.m. UTC | #2
From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 12 Apr 2024 19:05:22 -0700
> On Wed, 10 Apr 2024 13:19:29 -0700 Kuniyuki Iwashima wrote:
> >  void unix_update_edges(struct unix_sock *receiver)
> >  {
> > -	spin_lock(&unix_gc_lock);
> > -	unix_update_graph(unix_sk(receiver->listener)->vertex);
> > +	/* nr_unix_fds is only updated under unix_state_lock().
> > +	 * If it's 0 here, the embryo socket is not part of the
> > +	 * inflight graph, and GC will not see it.
> > +	 */
> > +	bool need_lock = !!receiver->scm_stat.nr_unix_fds;
> > +
> > +	if (need_lock) {
> > +		spin_lock(&unix_gc_lock);
> > +		unix_update_graph(unix_sk(receiver->listener)->vertex);
> > +	}
> > +
> >  	receiver->listener = NULL;
> > -	spin_unlock(&unix_gc_lock);
> > +
> > +	if (need_lock)
> > +		spin_unlock(&unix_gc_lock);
> >  }
> 
> Are you planning to add more code here? I feel like the sharing of
> a single line is outweighed by the conditionals... I mean:
> 
> 	/* ...
> 	 */
> 	if (!receiver->scm_stat.nr_unix_fd) {
> 		receiver->listener = NULL;
> 	} else {
> 		spin_lock(&unix_gc_lock);
> 		unix_update_graph(unix_sk(receiver->listener)->vertex);
> 		receiver->listener = NULL;
> 		spin_unlock(&unix_gc_lock);
> 	}
> 
> no?

Ah, exactly, I'll respin v2 with that style.

Thanks!
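
For reference, a sketch of the simplified shape Jakub suggests for v2
(an illustration only, not the actual v2 patch; note the field is
scm_stat.nr_unix_fds):

	void unix_update_edges(struct unix_sock *receiver)
	{
		/* nr_unix_fds is only updated under unix_state_lock().
		 * If it's 0 here, the embryo socket is not part of the
		 * inflight graph, and GC will not see it.
		 */
		if (!receiver->scm_stat.nr_unix_fds) {
			receiver->listener = NULL;
		} else {
			spin_lock(&unix_gc_lock);
			unix_update_graph(unix_sk(receiver->listener)->vertex);
			receiver->listener = NULL;
			spin_unlock(&unix_gc_lock);
		}
	}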

Patch

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 7311b77edfc7..872ff2a50372 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -67,6 +67,7 @@  struct unix_skb_parms {
 
 struct scm_stat {
 	atomic_t nr_fds;
+	unsigned long nr_unix_fds;
 };
 
 #define UNIXCB(skb)	(*(struct unix_skb_parms *)&((skb)->cb))
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 61ecfa9c9c6b..024ba5cbdcb8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1719,12 +1719,12 @@  static int unix_accept(struct socket *sock, struct socket *newsock, int flags,
 	}
 
 	tsk = skb->sk;
-	unix_update_edges(unix_sk(tsk));
 	skb_free_datagram(sk, skb);
 	wake_up_interruptible(&unix_sk(sk)->peer_wait);
 
 	/* attach accepted sock to socket */
 	unix_state_lock(tsk);
+	unix_update_edges(unix_sk(tsk));
 	newsock->state = SS_CONNECTED;
 	unix_sock_inherit_flags(sock, newsock);
 	sock_graft(tsk, newsock);
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 12a4ec27e0d4..4da3f4e0bb6e 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -209,6 +209,7 @@  void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 		unix_add_edge(fpl, edge);
 	} while (i < fpl->count_unix);
 
+	receiver->scm_stat.nr_unix_fds += fpl->count_unix;
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix);
 out:
 	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->count);
@@ -222,6 +223,7 @@  void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 
 void unix_del_edges(struct scm_fp_list *fpl)
 {
+	struct unix_sock *receiver;
 	int i = 0;
 
 	spin_lock(&unix_gc_lock);
@@ -235,6 +237,8 @@  void unix_del_edges(struct scm_fp_list *fpl)
 		unix_del_edge(fpl, edge);
 	} while (i < fpl->count_unix);
 
+	receiver = fpl->edges[0].successor;
+	receiver->scm_stat.nr_unix_fds -= fpl->count_unix;
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix);
 out:
 	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count);
@@ -246,10 +250,21 @@  void unix_del_edges(struct scm_fp_list *fpl)
 
 void unix_update_edges(struct unix_sock *receiver)
 {
-	spin_lock(&unix_gc_lock);
-	unix_update_graph(unix_sk(receiver->listener)->vertex);
+	/* nr_unix_fds is only updated under unix_state_lock().
+	 * If it's 0 here, the embryo socket is not part of the
+	 * inflight graph, and GC will not see it.
+	 */
+	bool need_lock = !!receiver->scm_stat.nr_unix_fds;
+
+	if (need_lock) {
+		spin_lock(&unix_gc_lock);
+		unix_update_graph(unix_sk(receiver->listener)->vertex);
+	}
+
 	receiver->listener = NULL;
-	spin_unlock(&unix_gc_lock);
+
+	if (need_lock)
+		spin_unlock(&unix_gc_lock);
 }
 
 int unix_prepare_fpl(struct scm_fp_list *fpl)