diff mbox series

[v1,net,03/11] af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.

Message ID 20240625013645.45034-4-kuniyu@amazon.com (mailing list archive)
State Accepted
Commit b94038d841a91d0e3f59cfe4d073e210910366ee
Delegated to: Netdev Maintainers
Headers show
Series af_unix: Fix bunch of MSG_OOB bugs and add new tests. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 859 this patch: 859
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 2 maintainers not CCed: shuah@kernel.org linux-kselftest@vger.kernel.org
netdev/build_clang success Errors and warnings before: 863 this patch: 863
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 867 this patch: 867
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 52 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-27--03-00 (tests: 665)

Commit Message

Kuniyuki Iwashima June 25, 2024, 1:36 a.m. UTC
After consuming OOB data, recv() reading the preceding data must break at
the OOB skb regardless of MSG_PEEK.

Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
not compliant with TCP.

  >>> from socket import *
  >>> c1, c2 = socketpair(AF_UNIX)
  >>> c1.send(b'hello', MSG_OOB)
  5
  >>> c1.send(b'world')
  5
  >>> c2.recv(1, MSG_OOB)
  b'o'
  >>> c2.recv(9, MSG_PEEK)  # This should return b'hell'
  b'hellworld'              # even with enough buffer.

Let's fix it by returning NULL for consumed skb and unlinking it only if
MSG_PEEK is not specified.

This patch also adds test cases that add recv(MSG_PEEK) before each recv().

Without fix:

  #  RUN           msg_oob.peek.oob_ahead_break ...
  # msg_oob.c:134:oob_ahead_break:AF_UNIX :hellworld
  # msg_oob.c:135:oob_ahead_break:Expected:hell
  # msg_oob.c:137:oob_ahead_break:Expected ret[0] (9) == expected_len (4)
  # oob_ahead_break: Test terminated by assertion
  #          FAIL  msg_oob.peek.oob_ahead_break
  not ok 13 msg_oob.peek.oob_ahead_break

With fix:

  #  RUN           msg_oob.peek.oob_ahead_break ...
  #            OK  msg_oob.peek.oob_ahead_break
  ok 13 msg_oob.peek.oob_ahead_break

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c                            |  9 ++++---
 tools/testing/selftests/net/af_unix/msg_oob.c | 25 +++++++++++++++++--
 2 files changed, 29 insertions(+), 5 deletions(-)

Comments

Paolo Abeni June 26, 2024, 4:56 p.m. UTC | #1
On Mon, 2024-06-24 at 18:36 -0700, Kuniyuki Iwashima wrote:
> After consuming OOB data, recv() reading the preceding data must break at
> the OOB skb regardless of MSG_PEEK.
> 
> Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
> not compliant with TCP.

I'm unsure we can change the MSG_PEEK behavior at this point: existing
application(s?) could relay on that, regardless of how inconsistent
such behavior is.

I think we need at least an explicit ack from Rao, as the main known
user.

Paolo
Paolo Abeni June 26, 2024, 9:10 p.m. UTC | #2
On Wed, 2024-06-26 at 18:56 +0200, Paolo Abeni wrote:
> On Mon, 2024-06-24 at 18:36 -0700, Kuniyuki Iwashima wrote:
> > After consuming OOB data, recv() reading the preceding data must break at
> > the OOB skb regardless of MSG_PEEK.
> > 
> > Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
> > not compliant with TCP.
> 
> I'm unsure we can change the MSG_PEEK behavior at this point: existing
> application(s?) could relay on that, regardless of how inconsistent
> such behavior is.
> 
> I think we need at least an explicit ack from Rao, as the main known
> user.

I see Rao stated that the unix OoB implementation was designed to mimic
the tcp one:

https://lore.kernel.org/netdev/c5f6abbe-de43-48b8-856a-36ded227e94f@oracle.com/

so the series should be ok.

Still given the size of the changes and the behavior change I'm
wondering if the series should target net-next instead.
That would offer some time cushion to deal with eventual regression.
WDYT?

Thanks,

Paolo
Kuniyuki Iwashima June 26, 2024, 9:47 p.m. UTC | #3
From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 26 Jun 2024 23:10:40 +0200
> On Wed, 2024-06-26 at 18:56 +0200, Paolo Abeni wrote:
> > On Mon, 2024-06-24 at 18:36 -0700, Kuniyuki Iwashima wrote:
> > > After consuming OOB data, recv() reading the preceding data must break at
> > > the OOB skb regardless of MSG_PEEK.
> > > 
> > > Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
> > > not compliant with TCP.
> > 
> > I'm unsure we can change the MSG_PEEK behavior at this point: existing
> > application(s?) could relay on that, regardless of how inconsistent
> > such behavior is.
> > 
> > I think we need at least an explicit ack from Rao, as the main known
> > user.
> 
> I see Rao stated that the unix OoB implementation was designed to mimic
> the tcp one:
> 
> https://lore.kernel.org/netdev/c5f6abbe-de43-48b8-856a-36ded227e94f@oracle.com/
> 
> so the series should be ok.
> 
> Still given the size of the changes and the behavior change I'm
> wondering if the series should target net-next instead.
> That would offer some time cushion to deal with eventual regression.
> WDYT?

The actual change is 37 LoC and we recently have this kind of changes
(30 LoC in total) in net.git.  The last two were merged in April and
we have no user report so far.

  a6736a0addd6 af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
  22dd70eb2c3d af_unix: Don't peek OOB data without MSG_OOB.
  283454c8a123 af_unix: Call manage_oob() for every skb in unix_stream_read_generic().

Most of the changes are due to selftest.  The original test repeated the
same set of code but covered few cases.  OTOH, the new test spends most
of lines to add as many test cases as possible, which IMHO nicely covers
regression if we want to mimic TCP.

  On net.git:

  # FAILED: 20 / 38 tests passed.
  # Totals: pass:20 fail:18 xfail:0 xpass:0 skip:0 error:0

Also, now the original test is broken in stable after the commits above,
so I think it would be nice to have this series in stable.

Thanks!
Paolo Abeni June 27, 2024, 10:04 a.m. UTC | #4
On Wed, 2024-06-26 at 14:47 -0700, Kuniyuki Iwashima wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Wed, 26 Jun 2024 23:10:40 +0200
> > On Wed, 2024-06-26 at 18:56 +0200, Paolo Abeni wrote:
> > > On Mon, 2024-06-24 at 18:36 -0700, Kuniyuki Iwashima wrote:
> > > > After consuming OOB data, recv() reading the preceding data must break at
> > > > the OOB skb regardless of MSG_PEEK.
> > > > 
> > > > Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
> > > > not compliant with TCP.
> > > 
> > > I'm unsure we can change the MSG_PEEK behavior at this point: existing
> > > application(s?) could relay on that, regardless of how inconsistent
> > > such behavior is.
> > > 
> > > I think we need at least an explicit ack from Rao, as the main known
> > > user.
> > 
> > I see Rao stated that the unix OoB implementation was designed to mimic
> > the tcp one:
> > 
> > https://lore.kernel.org/netdev/c5f6abbe-de43-48b8-856a-36ded227e94f@oracle.com/
> > 
> > so the series should be ok.
> > 
> > Still given the size of the changes and the behavior change I'm
> > wondering if the series should target net-next instead.
> > That would offer some time cushion to deal with eventual regression.
> > WDYT?
> 
> The actual change is 37 LoC 

... excluding the other ~1200 LoC ;)

> and we recently have this kind of changes
> (30 LoC in total) in net.git.  The last two were merged in April and
> we have no user report so far.
> 
>   a6736a0addd6 af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
>   22dd70eb2c3d af_unix: Don't peek OOB data without MSG_OOB.
>   283454c8a123 af_unix: Call manage_oob() for every skb in unix_stream_read_generic().

The last commit mentioned above actually sparked a bit of post merge
discussion, which is IMHO sub-optimal.

On the flip side, I think all the changes in this series make sense,
and the self-tests refactor and extension is largish, but very nice.

TL;DR: I'm going to apply this now.

Thanks,

Paolo
Shoaib Rao July 6, 2024, 9:38 a.m. UTC | #5
On 6/26/24 02:10, Paolo Abeni wrote:
> On Wed, 2024-06-26 at 18:56 +0200, Paolo Abeni wrote:
>> On Mon, 2024-06-24 at 18:36 -0700, Kuniyuki Iwashima wrote:
>>> After consuming OOB data, recv() reading the preceding data must break at
>>> the OOB skb regardless of MSG_PEEK.
>>>
>>> Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
>>> not compliant with TCP.
>>
>> I'm unsure we can change the MSG_PEEK behavior at this point: existing
>> application(s?) could relay on that, regardless of how inconsistent
>> such behavior is.
>>
>> I think we need at least an explicit ack from Rao, as the main known
>> user.
> 
> I see Rao stated that the unix OoB implementation was designed to mimic
> the tcp one:
> 
> https://urldefense.com/v3/__https://lore.kernel.org/netdev/c5f6abbe-de43-48b8-856a-36ded227e94f@oracle.com/__;!!ACWV5N9M2RV99hQ!MxZ1Tdors9BCjgG4K-LeD_fvtJ0mQ6jgbR5CfGYIouxV7LbYRiKf7zCml6SKYN7OLG7LZnZHnBnVfyo$ 
> 
> so the series should be ok.

Yes.

Thanks,

Shoaib
> 
> Still given the size of the changes and the behavior change I'm
> wondering if the series should target net-next instead.
> That would offer some time cushion to deal with eventual regression.
> WDYT?
> 
> Thanks,
> 
> Paolo
>
diff mbox series

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5e695a9a609c..2eaecf9d78a4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2613,9 +2613,12 @@  static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
 {
 	struct unix_sock *u = unix_sk(sk);
 
-	if (!unix_skb_len(skb) && !(flags & MSG_PEEK)) {
-		skb_unlink(skb, &sk->sk_receive_queue);
-		consume_skb(skb);
+	if (!unix_skb_len(skb)) {
+		if (!(flags & MSG_PEEK)) {
+			skb_unlink(skb, &sk->sk_receive_queue);
+			consume_skb(skb);
+		}
+
 		skb = NULL;
 	} else {
 		struct sk_buff *unlinked_skb = NULL;
diff --git a/tools/testing/selftests/net/af_unix/msg_oob.c b/tools/testing/selftests/net/af_unix/msg_oob.c
index d427d39d0806..de8d1fcde883 100644
--- a/tools/testing/selftests/net/af_unix/msg_oob.c
+++ b/tools/testing/selftests/net/af_unix/msg_oob.c
@@ -21,6 +21,21 @@  FIXTURE(msg_oob)
 				 */
 };
 
+FIXTURE_VARIANT(msg_oob)
+{
+	bool peek;
+};
+
+FIXTURE_VARIANT_ADD(msg_oob, no_peek)
+{
+	.peek = false,
+};
+
+FIXTURE_VARIANT_ADD(msg_oob, peek)
+{
+	.peek = true
+};
+
 static void create_unix_socketpair(struct __test_metadata *_metadata,
 				   FIXTURE_DATA(msg_oob) *self)
 {
@@ -156,8 +171,14 @@  static void __recvpair(struct __test_metadata *_metadata,
 	__sendpair(_metadata, self, buf, len, flags)
 
 #define recvpair(expected_buf, expected_len, buf_len, flags)		\
-	__recvpair(_metadata, self,					\
-		   expected_buf, expected_len, buf_len, flags)
+	do {								\
+		if (variant->peek)					\
+			__recvpair(_metadata, self,			\
+				   expected_buf, expected_len,		\
+				   buf_len, (flags) | MSG_PEEK);	\
+		__recvpair(_metadata, self,				\
+			   expected_buf, expected_len, buf_len, flags);	\
+	} while (0)
 
 TEST_F(msg_oob, non_oob)
 {