Message ID | 20241024154119.1096947-1-maze@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: define and implement new SOL_SOCKET SO_RX_IFINDEX option |
On Thu, Oct 24, 2024 at 5:41 PM Maciej Żenczykowski <maze@google.com> wrote:
>
> This is currently only implemented for TCP and is not
> guaranteed to return correct information for a multitude
> of reasons (including multipath reception), but there are
> scenarios where it is useful: in particular a strong host
> model where connections are only viable via a single interface,
> for example a VPN interface. One could for example choose
> to use this to SO_BINDTODEVICE.
>
> Test:
>   // Python 2.7.18 (default, Jul 13 2022, 18:14:36)
>   import socket
>   SO_RX_IFINDEX=82
>   s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
>   c = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
>   s.bind(('::', 8888))
>   s.listen(128)
>   c.connect(('::', 8888))
>   a = s.accept()
>   print a # (<socket._socketobject object>, ('::1', 58144, 0, 0))
>   p=a[0]
>   p.getsockname() # ('::1', 8888, 0, 0)
>   p.getpeername() # ('::1', 58144, 0, 0)
>   c.getsockname() # ('::1', 58144, 0, 0)
>   c.getpeername() # ('::1', 8888, 0, 0)
>   p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
>   c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 0 (unknown)
>   c.send(b'X') # 1
>   p.recv(2) # 'X'
>   p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
>   c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 0 (unknown)
>   p.send(b'Z') # 1
>   c.recv(2) # 'Z'
>   p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
>   c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
>
> Which shows we should possibly fix the 3-way handshake SYN-ACK
> to set sk->sk_rx_dst_ifindex.
>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>  arch/alpha/include/uapi/asm/socket.h  | 2 ++
>  arch/mips/include/uapi/asm/socket.h   | 2 ++
>  arch/parisc/include/uapi/asm/socket.h | 2 ++
>  arch/sparc/include/uapi/asm/socket.h  | 2 ++
>  include/uapi/asm-generic/socket.h     | 2 ++
>  net/core/sock.c                       | 4 ++++
>  6 files changed, 14 insertions(+)
>
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index 302507bf9b5d..5f139b095a49 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -148,6 +148,8 @@
>
>  #define SCM_TS_OPT_ID 81
>
> +#define SO_RX_IFINDEX 82
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index d118d4731580..ff25d24b4dea 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -159,6 +159,8 @@
>
>  #define SCM_TS_OPT_ID 81
>
> +#define SO_RX_IFINDEX 82
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index d268d69bfcd2..3f89c388e356 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -140,6 +140,8 @@
>
>  #define SCM_TS_OPT_ID 0x404C
>
> +#define SO_RX_IFINDEX 82
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 113cd9f353e3..f1af74f5f1ad 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -141,6 +141,8 @@
>
>  #define SCM_TS_OPT_ID 0x005a
>
> +#define SO_RX_IFINDEX 0x005b
> +
>  #if !defined(__KERNEL__)
>
>
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index deacfd6dd197..b16c69e22606 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -143,6 +143,8 @@
>
>  #define SCM_TS_OPT_ID 81
>
> +#define SO_RX_IFINDEX 82
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 7f398bd07fb7..6c985413c21f 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1932,6 +1932,10 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
>  		v.val = READ_ONCE(sk->sk_mark);
>  		break;
>
> +	case SO_RX_IFINDEX:
> +		v.val = READ_ONCE(sk->sk_rx_dst_ifindex);
> +		break;
> +
>  	case SO_RCVMARK:
>  		v.val = sock_flag(sk, SOCK_RCVMARK);
>  		break;
> --
> 2.47.0.105.g07ac214952-goog
>

Note: I'm not sure if I did the right thing with parisc...
It has:
  #define SO_DEVMEM_LINEAR 78
  #define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
  #define SO_DEVMEM_DMABUF 79
  #define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
  #define SO_DEVMEM_DONTNEED 80
which is weird...

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski wrote:
> On Thu, Oct 24, 2024 at 5:41 PM Maciej Żenczykowski <maze@google.com> wrote:
> >
> > This is currently only implemented for TCP and is not
> > guaranteed to return correct information for a multitude
> > of reasons (including multipath reception), but there are
> > scenarios where it is useful: in particular a strong host
> > model where connections are only viable via a single interface,
> > for example a VPN interface. One could for example choose
> > to use this to SO_BINDTODEVICE.

Fair to say that this is the equivalent of ipi_ifindex in IP_PKTINFO,
but for non-datagram sockets where skb_iif cannot be read directly?

> >
> > Test:
> >   // Python 2.7.18 (default, Jul 13 2022, 18:14:36)
> >   import socket
> >   SO_RX_IFINDEX=82
> >   s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
> >   c = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
> >   s.bind(('::', 8888))
> >   s.listen(128)
> >   c.connect(('::', 8888))
> >   a = s.accept()
> >   print a # (<socket._socketobject object>, ('::1', 58144, 0, 0))
> >   p=a[0]
> >   p.getsockname() # ('::1', 8888, 0, 0)
> >   p.getpeername() # ('::1', 58144, 0, 0)
> >   c.getsockname() # ('::1', 58144, 0, 0)
> >   c.getpeername() # ('::1', 8888, 0, 0)
> >   p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
> >   c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 0 (unknown)
> >   c.send(b'X') # 1
> >   p.recv(2) # 'X'
> >   p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
> >   c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 0 (unknown)
> >   p.send(b'Z') # 1
> >   c.recv(2) # 'Z'
> >   p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
> >   c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
> >
> > Which shows we should possibly fix the 3-way handshake SYN-ACK
> > to set sk->sk_rx_dst_ifindex.
> >
> > Cc: Lorenzo Colitti <lorenzo@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > ---
> >  arch/alpha/include/uapi/asm/socket.h  | 2 ++
> >  arch/mips/include/uapi/asm/socket.h   | 2 ++
> >  arch/parisc/include/uapi/asm/socket.h | 2 ++
> >  arch/sparc/include/uapi/asm/socket.h  | 2 ++
> >  include/uapi/asm-generic/socket.h     | 2 ++
> >  net/core/sock.c                       | 4 ++++
> >  6 files changed, 14 insertions(+)
> >
> > diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> > index 302507bf9b5d..5f139b095a49 100644
> > --- a/arch/alpha/include/uapi/asm/socket.h
> > +++ b/arch/alpha/include/uapi/asm/socket.h
> > @@ -148,6 +148,8 @@
> >
> >  #define SCM_TS_OPT_ID 81
> >
> > +#define SO_RX_IFINDEX 82
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> > index d118d4731580..ff25d24b4dea 100644
> > --- a/arch/mips/include/uapi/asm/socket.h
> > +++ b/arch/mips/include/uapi/asm/socket.h
> > @@ -159,6 +159,8 @@
> >
> >  #define SCM_TS_OPT_ID 81
> >
> > +#define SO_RX_IFINDEX 82
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> > index d268d69bfcd2..3f89c388e356 100644
> > --- a/arch/parisc/include/uapi/asm/socket.h
> > +++ b/arch/parisc/include/uapi/asm/socket.h
> > @@ -140,6 +140,8 @@
> >
> >  #define SCM_TS_OPT_ID 0x404C
> >
> > +#define SO_RX_IFINDEX 82
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> > index 113cd9f353e3..f1af74f5f1ad 100644
> > --- a/arch/sparc/include/uapi/asm/socket.h
> > +++ b/arch/sparc/include/uapi/asm/socket.h
> > @@ -141,6 +141,8 @@
> >
> >  #define SCM_TS_OPT_ID 0x005a
> >
> > +#define SO_RX_IFINDEX 0x005b
> > +
> >  #if !defined(__KERNEL__)
> >
> >
> > diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> > index deacfd6dd197..b16c69e22606 100644
> > --- a/include/uapi/asm-generic/socket.h
> > +++ b/include/uapi/asm-generic/socket.h
> > @@ -143,6 +143,8 @@
> >
> >  #define SCM_TS_OPT_ID 81
> >
> > +#define SO_RX_IFINDEX 82
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 7f398bd07fb7..6c985413c21f 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1932,6 +1932,10 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
> >  		v.val = READ_ONCE(sk->sk_mark);
> >  		break;
> >
> > +	case SO_RX_IFINDEX:
> > +		v.val = READ_ONCE(sk->sk_rx_dst_ifindex);
> > +		break;
> > +

If it is limited to TCP, return an error in all other cases, so that
we can extend it later with well-defined behavior.

> >  	case SO_RCVMARK:
> >  		v.val = sock_flag(sk, SOCK_RCVMARK);
> >  		break;
> > --
> > 2.47.0.105.g07ac214952-goog
> >
>
> Note: I'm not sure if I did the right thing with parisc...
> It has:
>   #define SO_DEVMEM_LINEAR 78
>   #define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
>   #define SO_DEVMEM_DMABUF 79
>   #define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
>   #define SO_DEVMEM_DONTNEED 80
> which is weird...

This is a common pattern: separate SCM constants are defined for cmsg
fields, even though they have the same value as their [gs]etsockopt
counterparts.
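
For comparison with the IP_PKTINFO analogy raised above, here is a minimal userspace sketch of how a datagram socket already obtains the receive interface index per packet. It is an illustration only, not part of the patch: the helper names (parse_in_pktinfo, recv_with_ifindex, demo) are made up for this example, the fallback value 8 for IP_PKTINFO is the Linux definition, and the 12-byte layout assumes Linux's struct in_pktinfo.

```python
import socket
import struct

# Linux value of IP_PKTINFO; use Python's constant when it is available.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)

# struct in_pktinfo { int ipi_ifindex; struct in_addr ipi_spec_dst, ipi_addr; }
IN_PKTINFO = struct.Struct("=i4s4s")


def parse_in_pktinfo(data):
    """Return ipi_ifindex from the raw bytes of a struct in_pktinfo."""
    ifindex, _spec_dst, _addr = IN_PKTINFO.unpack(data[:IN_PKTINFO.size])
    return ifindex


def recv_with_ifindex(sock, bufsize=2048):
    """Receive one datagram; return (payload, ifindex or None)."""
    payload, ancdata, _flags, _peer = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(IN_PKTINFO.size))
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            return payload, parse_in_pktinfo(cdata)
    return payload, None


def demo():
    """Send one UDP datagram over loopback and report its receive ifindex."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Ask the kernel to attach an IP_PKTINFO cmsg to each datagram.
        rx.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(2.0)
        tx.sendto(b"X", rx.getsockname())
        return recv_with_ifindex(rx)
    finally:
        rx.close()
        tx.close()
```

A stream socket has no per-packet ancillary data of this kind, which is the gap the proposed getsockopt() is meant to cover.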
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 302507bf9b5d..5f139b095a49 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -148,6 +148,8 @@

 #define SCM_TS_OPT_ID 81

+#define SO_RX_IFINDEX 82
+
 #if !defined(__KERNEL__)

 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d118d4731580..ff25d24b4dea 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -159,6 +159,8 @@

 #define SCM_TS_OPT_ID 81

+#define SO_RX_IFINDEX 82
+
 #if !defined(__KERNEL__)

 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index d268d69bfcd2..3f89c388e356 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -140,6 +140,8 @@

 #define SCM_TS_OPT_ID 0x404C

+#define SO_RX_IFINDEX 82
+
 #if !defined(__KERNEL__)

 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 113cd9f353e3..f1af74f5f1ad 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -141,6 +141,8 @@

 #define SCM_TS_OPT_ID 0x005a

+#define SO_RX_IFINDEX 0x005b
+
 #if !defined(__KERNEL__)


diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index deacfd6dd197..b16c69e22606 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -143,6 +143,8 @@

 #define SCM_TS_OPT_ID 81

+#define SO_RX_IFINDEX 82
+
 #if !defined(__KERNEL__)

 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index 7f398bd07fb7..6c985413c21f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1932,6 +1932,10 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		v.val = READ_ONCE(sk->sk_mark);
 		break;

+	case SO_RX_IFINDEX:
+		v.val = READ_ONCE(sk->sk_rx_dst_ifindex);
+		break;
+
 	case SO_RCVMARK:
 		v.val = sock_flag(sk, SOCK_RCVMARK);
 		break;
This is currently only implemented for TCP and is not
guaranteed to return correct information for a multitude
of reasons (including multipath reception), but there are
scenarios where it is useful: in particular a strong host
model where connections are only viable via a single interface,
for example a VPN interface. One could for example choose
to use this to SO_BINDTODEVICE.

Test:
  // Python 2.7.18 (default, Jul 13 2022, 18:14:36)
  import socket
  SO_RX_IFINDEX=82
  s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
  c = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
  s.bind(('::', 8888))
  s.listen(128)
  c.connect(('::', 8888))
  a = s.accept()
  print a # (<socket._socketobject object>, ('::1', 58144, 0, 0))
  p=a[0]
  p.getsockname() # ('::1', 8888, 0, 0)
  p.getpeername() # ('::1', 58144, 0, 0)
  c.getsockname() # ('::1', 58144, 0, 0)
  c.getpeername() # ('::1', 8888, 0, 0)
  p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
  c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 0 (unknown)
  c.send(b'X') # 1
  p.recv(2) # 'X'
  p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
  c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 0 (unknown)
  p.send(b'Z') # 1
  c.recv(2) # 'Z'
  p.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)
  c.getsockopt(socket.SOL_SOCKET, SO_RX_IFINDEX) # 1 (lo)

Which shows we should possibly fix the 3-way handshake SYN-ACK
to set sk->sk_rx_dst_ifindex.

Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 arch/alpha/include/uapi/asm/socket.h  | 2 ++
 arch/mips/include/uapi/asm/socket.h   | 2 ++
 arch/parisc/include/uapi/asm/socket.h | 2 ++
 arch/sparc/include/uapi/asm/socket.h  | 2 ++
 include/uapi/asm-generic/socket.h     | 2 ++
 net/core/sock.c                       | 4 ++++
 6 files changed, 14 insertions(+)
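
The usage described in the commit message (query the receive interface, then optionally SO_BINDTODEVICE to it) might look roughly like the Python 3 sketch below. Several things here are assumptions rather than facts about any shipped kernel: SO_RX_IFINDEX = 82 is only the value proposed in this patch (sparc would use 0x005b), the patch is in state "Changes Requested", and on a kernel without it the number may be unassigned or mean a different option, so the result is only meaningful on a kernel carrying this patch. The helper names rx_ifindex and bind_to_rx_device are invented for the example.

```python
import errno
import socket

# Value proposed by this patch for asm-generic; hypothetical on any kernel
# that does not carry the patch (where 82 may mean something else).
SO_RX_IFINDEX = 82


def rx_ifindex(sock, optname=SO_RX_IFINDEX):
    """Return the receive ifindex reported for sock, or None.

    None covers both "unknown" (the option returned 0, as the test above
    shows for a client socket before it has received data) and
    "unsupported" (the kernel rejected the option).
    """
    try:
        value = sock.getsockopt(socket.SOL_SOCKET, optname)
    except OSError as exc:
        if exc.errno in (errno.ENOPROTOOPT, errno.EOPNOTSUPP, errno.EINVAL):
            return None
        raise
    return value if value > 0 else None


def bind_to_rx_device(sock, ifindex):
    """Bind sock to the interface with the given index; return its name.

    SO_BINDTODEVICE may require privileges depending on the kernel and
    on whether the socket is already bound to a device.
    """
    name = socket.if_indextoname(ifindex)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                    name.encode() + b"\0")
    return name
```

Treating an error from getsockopt() the same as "unknown" also keeps a caller working if the option is later changed to fail on non-TCP sockets, as suggested in review.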