[v2,net-next,08/11] af_unix: Define locking order for U_RECVQ_LOCK_EMBRYO in unix_collect_skb().

Message ID 20240611222905.34695-9-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series af_unix: Remove spin_lock_nested() and convert to lock_cmp_fn.

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 856 this patch: 856
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 854 this patch: 854
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 860 this patch: 860
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-14--06-00 (tests: 647)

Commit Message

Kuniyuki Iwashima June 11, 2024, 10:29 p.m. UTC
While GC is cleaning up cyclic references by SCM_RIGHTS,
unix_collect_skb() collects skb in the socket's recvq.

If the socket is TCP_LISTEN, we need to collect skb in the
embryo's queue.  Then, both the listener's recvq lock and
the embryo's one are held.

The locking is always done in the listener -> embryo order.

Let's define it as unix_recvq_lock_cmp_fn() instead of using
spin_lock_nested().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 17 +++++++++++++++++
 net/unix/garbage.c |  8 +-------
 2 files changed, 18 insertions(+), 7 deletions(-)
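
Background on the mechanism, as a rough sketch: lock_set_cmp_fn() attaches a
comparator to the lock's class, and lockdep consults it whenever a second lock
of the same class is acquired while one is already held; a negative return
marks that nesting as an allowed order.  The lines below restate the two
pieces the patch adds (the comments paraphrase lockdep behaviour and are not
kernel source):

  /* In unix_create1(): attach the comparator to every socket's recvq lock.
   * The trailing NULL is the optional print function lockdep would use when
   * reporting a violation.
   */
  lock_set_cmp_fn(&sk->sk_receive_queue.lock, unix_recvq_lock_cmp_fn, NULL);

  /* When unix_collect_skb() later takes an embryo's recvq lock while holding
   * the listener's, lockdep evaluates unix_recvq_lock_cmp_fn(listener, embryo);
   * the -1 return marks listener -> embryo as a valid order, replacing the
   * U_RECVQ_LOCK_EMBRYO subclass previously passed to spin_lock_nested().
   */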

Comments

Kent Overstreet June 11, 2024, 11:17 p.m. UTC | #1
On Tue, Jun 11, 2024 at 03:29:02PM GMT, Kuniyuki Iwashima wrote:
> While GC is cleaning up cyclic references by SCM_RIGHTS,
> unix_collect_skb() collects skb in the socket's recvq.
> 
> If the socket is TCP_LISTEN, we need to collect skb in the
> embryo's queue.  Then, both the listener's recvq lock and
> the embryo's one are held.
> 
> The locking is always done in the listener -> embryo order.
> 
> Let's define it as unix_recvq_lock_cmp_fn() instead of using
> spin_lock_nested().
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/unix/af_unix.c | 17 +++++++++++++++++
>  net/unix/garbage.c |  8 +-------
>  2 files changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 8d03c5ef61df..8959ee8753d1 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -170,6 +170,21 @@ static int unix_state_lock_cmp_fn(const struct lockdep_map *_a,
>  	/* unix_state_double_lock(): ascending address order. */
>  	return cmp_ptr(a, b);
>  }
> +
> +static int unix_recvq_lock_cmp_fn(const struct lockdep_map *_a,
> +				  const struct lockdep_map *_b)
> +{
> +	const struct sock *a, *b;
> +
> +	a = container_of(_a, struct sock, sk_receive_queue.lock.dep_map);
> +	b = container_of(_b, struct sock, sk_receive_queue.lock.dep_map);
> +
> +	/* unix_collect_skb(): listener -> embryo order. */
> +	if (a->sk_state == TCP_LISTEN && unix_sk(b)->listener == a)
> +		return -1;
> +
> +	return 0;
> +}
>  #endif

That's not symmetric.
Kuniyuki Iwashima June 11, 2024, 11:23 p.m. UTC | #2
From: Kent Overstreet <kent.overstreet@linux.dev>
Date: Tue, 11 Jun 2024 19:17:53 -0400
> On Tue, Jun 11, 2024 at 03:29:02PM GMT, Kuniyuki Iwashima wrote:
> > While GC is cleaning up cyclic references by SCM_RIGHTS,
> > unix_collect_skb() collects skb in the socket's recvq.
> > 
> > If the socket is TCP_LISTEN, we need to collect skb in the
> > embryo's queue.  Then, both the listener's recvq lock and
> > the embryo's one are held.
> > 
> > The locking is always done in the listener -> embryo order.
> > 
> > Let's define it as unix_recvq_lock_cmp_fn() instead of using
> > spin_lock_nested().
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  net/unix/af_unix.c | 17 +++++++++++++++++
> >  net/unix/garbage.c |  8 +-------
> >  2 files changed, 18 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index 8d03c5ef61df..8959ee8753d1 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -170,6 +170,21 @@ static int unix_state_lock_cmp_fn(const struct lockdep_map *_a,
> >  	/* unix_state_double_lock(): ascending address order. */
> >  	return cmp_ptr(a, b);
> >  }
> > +
> > +static int unix_recvq_lock_cmp_fn(const struct lockdep_map *_a,
> > +				  const struct lockdep_map *_b)
> > +{
> > +	const struct sock *a, *b;
> > +
> > +	a = container_of(_a, struct sock, sk_receive_queue.lock.dep_map);
> > +	b = container_of(_b, struct sock, sk_receive_queue.lock.dep_map);
> > +
> > +	/* unix_collect_skb(): listener -> embryo order. */
> > +	if (a->sk_state == TCP_LISTEN && unix_sk(b)->listener == a)
> > +		return -1;
> > +
> > +	return 0;
> > +}
> >  #endif
> 
> That's not symmetric.

I think we agreed this is allowed, no?

https://lore.kernel.org/netdev/thzkgbuwuo3knevpipu4rzsh5qgmwhklihypdgziiruabvh46f@uwdkpcfxgloo/
Paolo Abeni June 14, 2024, 11:04 a.m. UTC | #3
On Tue, 2024-06-11 at 16:23 -0700, Kuniyuki Iwashima wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
> Date: Tue, 11 Jun 2024 19:17:53 -0400
> > On Tue, Jun 11, 2024 at 03:29:02PM GMT, Kuniyuki Iwashima wrote:
> > > While GC is cleaning up cyclic references by SCM_RIGHTS,
> > > unix_collect_skb() collects skb in the socket's recvq.
> > > 
> > > If the socket is TCP_LISTEN, we need to collect skb in the
> > > embryo's queue.  Then, both the listener's recvq lock and
> > > the embryo's one are held.
> > > 
> > > The locking is always done in the listener -> embryo order.
> > > 
> > > Let's define it as unix_recvq_lock_cmp_fn() instead of using
> > > spin_lock_nested().
> > > 
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > >  net/unix/af_unix.c | 17 +++++++++++++++++
> > >  net/unix/garbage.c |  8 +-------
> > >  2 files changed, 18 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > > index 8d03c5ef61df..8959ee8753d1 100644
> > > --- a/net/unix/af_unix.c
> > > +++ b/net/unix/af_unix.c
> > > @@ -170,6 +170,21 @@ static int unix_state_lock_cmp_fn(const struct lockdep_map *_a,
> > >  	/* unix_state_double_lock(): ascending address order. */
> > >  	return cmp_ptr(a, b);
> > >  }
> > > +
> > > +static int unix_recvq_lock_cmp_fn(const struct lockdep_map *_a,
> > > +				  const struct lockdep_map *_b)
> > > +{
> > > +	const struct sock *a, *b;
> > > +
> > > +	a = container_of(_a, struct sock, sk_receive_queue.lock.dep_map);
> > > +	b = container_of(_b, struct sock, sk_receive_queue.lock.dep_map);
> > > +
> > > +	/* unix_collect_skb(): listener -> embryo order. */
> > > +	if (a->sk_state == TCP_LISTEN && unix_sk(b)->listener == a)
> > > +		return -1;
> > > +
> > > +	return 0;
> > > +}
> > >  #endif
> > 
> > That's not symmetric.
> 
> I think we agreed this is allowed, no?
> 
> https://lore.kernel.org/netdev/thzkgbuwuo3knevpipu4rzsh5qgmwhklihypdgziiruabvh46f@uwdkpcfxgloo/

My understanding of that thread is that you should return 1 for the
embryo -> listener order (for consistency). You can keep returning 0
for all the other 'undefined' cases.

Thanks!

Paolo
Kuniyuki Iwashima June 14, 2024, 6:55 p.m. UTC | #4
From: Paolo Abeni <pabeni@redhat.com>
Date: Fri, 14 Jun 2024 13:04:06 +0200
> On Tue, 2024-06-11 at 16:23 -0700, Kuniyuki Iwashima wrote:
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> > Date: Tue, 11 Jun 2024 19:17:53 -0400
> > > On Tue, Jun 11, 2024 at 03:29:02PM GMT, Kuniyuki Iwashima wrote:
> > > > While GC is cleaning up cyclic references by SCM_RIGHTS,
> > > > unix_collect_skb() collects skb in the socket's recvq.
> > > > 
> > > > If the socket is TCP_LISTEN, we need to collect skb in the
> > > > embryo's queue.  Then, both the listener's recvq lock and
> > > > the embryo's one are held.
> > > > 
> > > > The locking is always done in the listener -> embryo order.
> > > > 
> > > > Let's define it as unix_recvq_lock_cmp_fn() instead of using
> > > > spin_lock_nested().
> > > > 
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > ---
> > > >  net/unix/af_unix.c | 17 +++++++++++++++++
> > > >  net/unix/garbage.c |  8 +-------
> > > >  2 files changed, 18 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > > > index 8d03c5ef61df..8959ee8753d1 100644
> > > > --- a/net/unix/af_unix.c
> > > > +++ b/net/unix/af_unix.c
> > > > @@ -170,6 +170,21 @@ static int unix_state_lock_cmp_fn(const struct lockdep_map *_a,
> > > >  	/* unix_state_double_lock(): ascending address order. */
> > > >  	return cmp_ptr(a, b);
> > > >  }
> > > > +
> > > > +static int unix_recvq_lock_cmp_fn(const struct lockdep_map *_a,
> > > > +				  const struct lockdep_map *_b)
> > > > +{
> > > > +	const struct sock *a, *b;
> > > > +
> > > > +	a = container_of(_a, struct sock, sk_receive_queue.lock.dep_map);
> > > > +	b = container_of(_b, struct sock, sk_receive_queue.lock.dep_map);
> > > > +
> > > > +	/* unix_collect_skb(): listener -> embryo order. */
> > > > +	if (a->sk_state == TCP_LISTEN && unix_sk(b)->listener == a)
> > > > +		return -1;
> > > > +
> > > > +	return 0;
> > > > +}
> > > >  #endif
> > > 
> > > That's not symmetric.
> > 
> > I think we agreed this is allowed, no?
> > 
> > https://lore.kernel.org/netdev/thzkgbuwuo3knevpipu4rzsh5qgmwhklihypdgziiruabvh46f@uwdkpcfxgloo/
> 
> My understanding of that thread is that you should return 1 for the
> embryo -> listener order (for consistency). You can keep returning 0
> for all the other 'undefined' cases.

Ah, I understood.  Will do so in v3.

Thanks!
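
A sketch of the shape Paolo's suggestion implies for v3 (an assumption, not
the posted v3): keep -1 for the listener -> embryo order, additionally return
1 when the same pair is seen in the reverse (embryo -> listener) direction,
and keep 0 for every other, 'undefined' pairing:

static int unix_recvq_lock_cmp_fn(const struct lockdep_map *_a,
				  const struct lockdep_map *_b)
{
	const struct sock *a, *b;

	a = container_of(_a, struct sock, sk_receive_queue.lock.dep_map);
	b = container_of(_b, struct sock, sk_receive_queue.lock.dep_map);

	/* unix_collect_skb(): listener -> embryo order. */
	if (a->sk_state == TCP_LISTEN && unix_sk(b)->listener == a)
		return -1;

	/* The inverse, embryo -> listener, would be a lock inversion. */
	if (b->sk_state == TCP_LISTEN && unix_sk(a)->listener == b)
		return 1;

	return 0;
}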

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 8d03c5ef61df..8959ee8753d1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -170,6 +170,21 @@  static int unix_state_lock_cmp_fn(const struct lockdep_map *_a,
 	/* unix_state_double_lock(): ascending address order. */
 	return cmp_ptr(a, b);
 }
+
+static int unix_recvq_lock_cmp_fn(const struct lockdep_map *_a,
+				  const struct lockdep_map *_b)
+{
+	const struct sock *a, *b;
+
+	a = container_of(_a, struct sock, sk_receive_queue.lock.dep_map);
+	b = container_of(_b, struct sock, sk_receive_queue.lock.dep_map);
+
+	/* unix_collect_skb(): listener -> embryo order. */
+	if (a->sk_state == TCP_LISTEN && unix_sk(b)->listener == a)
+		return -1;
+
+	return 0;
+}
 #endif
 
 static unsigned int unix_unbound_hash(struct sock *sk)
@@ -1017,6 +1032,8 @@  static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	sk->sk_write_space	= unix_write_space;
 	sk->sk_max_ack_backlog	= READ_ONCE(net->unx.sysctl_max_dgram_qlen);
 	sk->sk_destruct		= unix_sock_destructor;
+	lock_set_cmp_fn(&sk->sk_receive_queue.lock, unix_recvq_lock_cmp_fn, NULL);
+
 	u = unix_sk(sk);
 	u->listener = NULL;
 	u->vertex = NULL;
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index dfe94a90ece4..eb8aa5171a68 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -337,11 +337,6 @@  static bool unix_vertex_dead(struct unix_vertex *vertex)
 	return true;
 }
 
-enum unix_recv_queue_lock_class {
-	U_RECVQ_LOCK_NORMAL,
-	U_RECVQ_LOCK_EMBRYO,
-};
-
 static void unix_collect_queue(struct unix_sock *u, struct sk_buff_head *hitlist)
 {
 	skb_queue_splice_init(&u->sk.sk_receive_queue, hitlist);
@@ -375,8 +370,7 @@  static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist
 			skb_queue_walk(queue, skb) {
 				struct sk_buff_head *embryo_queue = &skb->sk->sk_receive_queue;
 
-				/* listener -> embryo order, the inversion never happens. */
-				spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO);
+				spin_lock(&embryo_queue->lock);
 				unix_collect_queue(unix_sk(skb->sk), hitlist);
 				spin_unlock(&embryo_queue->lock);
 			}