Message ID | 20241029-b4-ovpn-v11-9-de4698c73a25@openvpn.net (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Introducing OpenVPN Data Channel Offload | expand |
2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote: > +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) > +{ [...] > + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); > + if (unlikely(opcode != OVPN_DATA_V2)) { > + /* DATA_V1 is not supported */ > + if (opcode == OVPN_DATA_V1) The TCP encap code passes everything that's not V2 to userspace. Why not do that with UDP as well? > + goto drop; > + > + /* unknown or control packet: let it bubble up to userspace */ > + return 1; > + } > + > + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); > + /* some OpenVPN server implementations send data packets with the > + * peer-id set to undef. In this case we skip the peer lookup by peer-id > + * and we try with the transport address > + */ > + if (peer_id != OVPN_PEER_ID_UNDEF) { > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > + if (!peer) { > + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", > + __func__, peer_id); > + goto drop; > + } > + } > + > + if (!peer) { nit: that could be an "else" combined with the previous case? > + /* data packet with undef peer-id */ > + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); > + if (unlikely(!peer)) { > + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", > + __func__); > + goto drop; > + } > + }
On 31/10/2024 12:29, Sabrina Dubroca wrote: > 2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote: >> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >> +{ > [...] >> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >> + if (unlikely(opcode != OVPN_DATA_V2)) { >> + /* DATA_V1 is not supported */ >> + if (opcode == OVPN_DATA_V1) > > The TCP encap code passes everything that's not V2 to userspace. Why > not do that with UDP as well? If that's the case, then this is a bug in the TCP code. DATA_Vx packets are part of the data channel and userspace can't do anything with them (userspace handles the control channel only when the ovpn module is in use). I'll go check the TCP code then, because sending DATA_V1 to userspace is not expected. Thanks for noticing this discrepancy. > >> + goto drop; >> + >> + /* unknown or control packet: let it bubble up to userspace */ >> + return 1; >> + } >> + >> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >> + /* some OpenVPN server implementations send data packets with the >> + * peer-id set to undef. In this case we skip the peer lookup by peer-id >> + * and we try with the transport address >> + */ >> + if (peer_id != OVPN_PEER_ID_UNDEF) { >> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >> + if (!peer) { >> + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", >> + __func__, peer_id); >> + goto drop; >> + } >> + } >> + >> + if (!peer) { > > nit: that could be an "else" combined with the previous case? mhh that's true. Then I can combine the two "if (!peer)" in one block only. Thanks! Regards, > >> + /* data packet with undef peer-id */ >> + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); >> + if (unlikely(!peer)) { >> + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", >> + __func__); >> + goto drop; >> + } >> + } >
On 29.10.2024 12:47, Antonio Quartulli wrote: > Packets received over the socket are forwarded to the user device. > > Implementation is UDP only. TCP will be added by a later patch. > > Note: no decryption/decapsulation exists yet, packets are forwarded as > they arrive without much processing. > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> > --- > drivers/net/ovpn/io.c | 66 ++++++++++++++++++++++++++- > drivers/net/ovpn/io.h | 2 + > drivers/net/ovpn/main.c | 13 +++++- > drivers/net/ovpn/ovpnstruct.h | 3 ++ > drivers/net/ovpn/proto.h | 75 ++++++++++++++++++++++++++++++ > drivers/net/ovpn/socket.c | 24 ++++++++++ > drivers/net/ovpn/udp.c | 104 +++++++++++++++++++++++++++++++++++++++++- > drivers/net/ovpn/udp.h | 3 +- > 8 files changed, 286 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c > index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644 > --- a/drivers/net/ovpn/io.c > +++ b/drivers/net/ovpn/io.c > @@ -9,15 +9,79 @@ > > #include <linux/netdevice.h> > #include <linux/skbuff.h> > +#include <net/gro_cells.h> > #include <net/gso.h> > > -#include "io.h" > #include "ovpnstruct.h" > #include "peer.h" > +#include "io.h" > +#include "netlink.h" > +#include "proto.h" > #include "udp.h" > #include "skb.h" > #include "socket.h" > > +/* Called after decrypt to write the IP packet to the device. > + * This method is expected to manage/free the skb. > + */ > +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + unsigned int pkt_len; > + > + /* we can't guarantee the packet wasn't corrupted before entering the > + * VPN, therefore we give other layers a chance to check that > + */ > + skb->ip_summed = CHECKSUM_NONE; > + > + /* skb hash for transport packet no longer valid after decapsulation */ > + skb_clear_hash(skb); > + > + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the > + * interface, based on __skb_tunnel_rx() in dst.h > + */ > + skb->dev = peer->ovpn->dev; > + skb_set_queue_mapping(skb, 0); > + skb_scrub_packet(skb, true); > + The skb->protocol field is going to be updated in the upcoming patch in the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, why do not touch the protocol field here? > + skb_reset_network_header(skb); ovpn_decrypt_post() already reseted the network header. Why do we need it here again? > + skb_reset_transport_header(skb); > + skb_probe_transport_header(skb); > + skb_reset_inner_headers(skb); > + > + memset(skb->cb, 0, sizeof(skb->cb)); Why do we need to zero the control buffer here? > + /* cause packet to be "received" by the interface */ > + pkt_len = skb->len; > + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, > + skb) == NET_RX_SUCCESS)) > + /* update RX stats with the size of decrypted packet */ > + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); > +} > + > +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) > +{ > + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; > + > + if (unlikely(ret < 0)) > + goto drop; > + > + ovpn_netdev_write(peer, skb); > + /* skb is passed to upper layer - don't free it */ > + skb = NULL; > +drop: > + if (unlikely(skb)) > + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); > + ovpn_peer_put(peer); > + kfree_skb(skb); > +} > + > +/* pick next packet from RX queue, decrypt and forward it to the device */ The function now receives packets from externel callers. Should we update the above comment? > +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + ovpn_skb_cb(skb)->peer = peer; > + ovpn_decrypt_post(skb, 0); > +} > + > static void ovpn_encrypt_post(struct sk_buff *skb, int ret) > { > struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; > diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h > index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644 > --- a/drivers/net/ovpn/io.h > +++ b/drivers/net/ovpn/io.h > @@ -12,4 +12,6 @@ > > netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); > > +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); > + > #endif /* _NET_OVPN_OVPN_H_ */ > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c > index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644 > --- a/drivers/net/ovpn/main.c > +++ b/drivers/net/ovpn/main.c > @@ -11,6 +11,7 @@ > #include <linux/module.h> > #include <linux/netdevice.h> > #include <linux/inetdevice.h> > +#include <net/gro_cells.h> > #include <net/ip.h> > #include <net/rtnetlink.h> > #include <uapi/linux/if_arp.h> > @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net) > > static int ovpn_net_init(struct net_device *dev) > { > - return 0; > + struct ovpn_struct *ovpn = netdev_priv(dev); > + > + return gro_cells_init(&ovpn->gro_cells, dev); > +} > + > +static void ovpn_net_uninit(struct net_device *dev) > +{ > + struct ovpn_struct *ovpn = netdev_priv(dev); > + > + gro_cells_destroy(&ovpn->gro_cells); > } > > static int ovpn_net_open(struct net_device *dev) > @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev) > > static const struct net_device_ops ovpn_netdev_ops = { > .ndo_init = ovpn_net_init, > + .ndo_uninit = ovpn_net_uninit, > .ndo_open = ovpn_net_open, > .ndo_stop = ovpn_net_stop, > .ndo_start_xmit = ovpn_net_xmit, > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h > index a22c5083381c131db01a28c0f51e661d690d4998..4a48fc048890ab1cda78bc104fe3034b4a49d226 100644 > --- a/drivers/net/ovpn/ovpnstruct.h > +++ b/drivers/net/ovpn/ovpnstruct.h > @@ -10,6 +10,7 @@ > #ifndef _NET_OVPN_OVPNSTRUCT_H_ > #define _NET_OVPN_OVPNSTRUCT_H_ > > +#include <net/gro_cells.h> > #include <net/net_trackers.h> > #include <uapi/linux/if_link.h> > #include <uapi/linux/ovpn.h> > @@ -23,6 +24,7 @@ > * @lock: protect this object > * @peer: in P2P mode, this is the only remote peer > * @dev_list: entry for the module wide device list > + * @gro_cells: pointer to the Generic Receive Offload cell > */ > struct ovpn_struct { > struct net_device *dev; > @@ -32,6 +34,7 @@ struct ovpn_struct { > spinlock_t lock; /* protect writing to the ovpn_struct object */ > struct ovpn_peer __rcu *peer; > struct list_head dev_list; > + struct gro_cells gro_cells; > }; > > #endif /* _NET_OVPN_OVPNSTRUCT_H_ */ > diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h > new file mode 100644 > index 0000000000000000000000000000000000000000..69604cf26bbf82539ee5cd5a7ac9c23920f555de > --- /dev/null > +++ b/drivers/net/ovpn/proto.h > @@ -0,0 +1,75 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2020-2024 OpenVPN, Inc. > + * > + * Author: Antonio Quartulli <antonio@openvpn.net> > + * James Yonan <james@openvpn.net> > + */ > + > +#ifndef _NET_OVPN_OVPNPROTO_H_ > +#define _NET_OVPN_OVPNPROTO_H_ > + > +#include "main.h" > + > +#include <linux/skbuff.h> > + > +/* Methods for operating on the initial command > + * byte of the OpenVPN protocol. > + */ > + > +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in > + * one byte > + */ > +#define OVPN_KEY_ID_MASK 0x07 > +#define OVPN_OPCODE_SHIFT 3 > +#define OVPN_OPCODE_MASK 0x1F Instead of defining mask(s) and shift(s), we can define only masks and use bitfield API (see below). > +/* upper bounds on opcode and key ID */ > +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) > +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) > +/* packet opcodes of interest to us */ > +#define OVPN_DATA_V1 6 /* data channel V1 packet */ > +#define OVPN_DATA_V2 9 /* data channel V2 packet */ > +/* size of initial packet opcode */ > +#define OVPN_OP_SIZE_V1 1 > +#define OVPN_OP_SIZE_V2 4 > +#define OVPN_PEER_ID_MASK 0x00FFFFFF > +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF > +/* first byte of keepalive message */ > +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a > +/* first byte of exit message */ > +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28 From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and looks like should be removed. > +/** > + * ovpn_opcode_from_skb - extract OP code from skb at specified offset > + * @skb: the packet to extract the OP code from > + * @offset: the offset in the data buffer where the OP code is located > + * > + * Note: this function assumes that the skb head was pulled enough > + * to access the first byte. > + * > + * Return: the OP code > + */ > +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset) > +{ > + u8 byte = *(skb->data + offset); > + > + return byte >> OVPN_OPCODE_SHIFT; For example here, the shift can be replaced with bitfield macro: #define OVPN_OPCODE_PKTTYPE_MSK 0xf8000000 #define OVPN_OPCODE_KEYID_MSK 0x07000000 #define OVPN_OPCODE_PEERID_MSK 0x00ffffff static inline u8 ovpn_opcode_from_skb(...) { u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset)); return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode); } And the upcoming ovpn_opcode_compose() can be implemented like this: static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id) { return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) | FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) | FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id); } And with this size can be even embedded into ovpn_aead_encrypt() to make the header composing more clear. > +} > + > +/** > + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset > + * @skb: the packet to extract the OP code from > + * @offset: the offset in the data buffer where the OP code is located > + * > + * Note: this function assumes that the skb head was pulled enough > + * to access the first 4 bytes. > + * > + * Return: the peer ID. > + */ > +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) > +{ > + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; > +} > + > +#endif /* _NET_OVPN_OVPNPROTO_H_ */ > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c > index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 > --- a/drivers/net/ovpn/socket.c > +++ b/drivers/net/ovpn/socket.c > @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) > if (!sock) > return; > > + if (sock->sk->sk_protocol == IPPROTO_UDP) > + ovpn_udp_socket_detach(sock); > + > sockfd_put(sock); > } > > @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) > return ret; > } > > +/* Retrieve the corresponding ovpn object from a UDP socket > + * rcu_read_lock must be held on entry > + */ > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) > +{ > + struct ovpn_socket *ovpn_sock; > + > + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) > + return NULL; > + > + ovpn_sock = rcu_dereference_sk_user_data(sk); > + if (unlikely(!ovpn_sock)) > + return NULL; > + > + /* make sure that sk matches our stored transport socket */ > + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) > + return NULL; > + > + return ovpn_sock->ovpn; Now, returning of this pointer is safe. But the following TCP transport support calls the socket release via a scheduled work. What extends socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is it correct assumption? If the above is right then shall we set ->ovpn = NULL before scheduling the socket releasing work or somehow else mark the socket as half-destroyed? > +} > + > /** > * ovpn_socket_new - create a new socket and initialize it > * @sock: the kernel socket to embed > diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c > index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 > --- a/drivers/net/ovpn/udp.c > +++ b/drivers/net/ovpn/udp.c > @@ -21,9 +21,95 @@ > #include "bind.h" > #include "io.h" > #include "peer.h" > +#include "proto.h" > #include "socket.h" > #include "udp.h" > > +/** > + * ovpn_udp_encap_recv - Start processing a received UDP packet. > + * @sk: socket over which the packet was received > + * @skb: the received packet > + * > + * If the first byte of the payload is DATA_V2, the packet is further processed, > + * otherwise it is forwarded to the UDP stack for delivery to user space. > + * > + * Return: > + * 0 if skb was consumed or dropped > + * >0 if skb should be passed up to userspace as UDP (packet not consumed) > + * <0 if skb should be resubmitted as proto -N (packet not consumed) > + */ > +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) > +{ > + struct ovpn_peer *peer = NULL; > + struct ovpn_struct *ovpn; > + u32 peer_id; > + u8 opcode; > + > + ovpn = ovpn_from_udp_sock(sk); > + if (unlikely(!ovpn)) { > + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", > + __func__); Probably we should zero ovpn pointer in the ovpn_sock to survive scheduled socket release (see comment in ovpn_from_udp_sock). So, this print should be removed to avoid printing misguiding errors. > + goto drop_noovpn; > + } > + > + /* Make sure the first 4 bytes of the skb data buffer after the UDP > + * header are accessible. > + * They are required to fetch the OP code, the key ID and the peer ID. > + */ > + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + > + OVPN_OP_SIZE_V2))) { > + net_dbg_ratelimited("%s: packet too small\n", __func__); > + goto drop; > + } > + > + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); > + if (unlikely(opcode != OVPN_DATA_V2)) { > + /* DATA_V1 is not supported */ > + if (opcode == OVPN_DATA_V1) > + goto drop; This packet dropping makes protocol accelerator, intendent to speed up the data packets processing, a protocol enforcement entity, isn't it? Shall we follow the principle of beeing liberal in what we accept and just forward everything besides data packets upstream to a userspace application? > + > + /* unknown or control packet: let it bubble up to userspace */ > + return 1; > + } > + > + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); > + /* some OpenVPN server implementations send data packets with the > + * peer-id set to undef. In this case we skip the peer lookup by peer-id > + * and we try with the transport address > + */ > + if (peer_id != OVPN_PEER_ID_UNDEF) { > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > + if (!peer) { > + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", > + __func__, peer_id); Why do we consider a peer sending us garbage our problem? Meaning, this peer miss can be not our fault but a malformed packet from a 3rd party side. E.g. nowdays I can see a lot of traces of these "active probers" in my OpenVPN logs. Shall remove this message or at least make it debug to avoid bothering users with garbage traveling Internet? Anyway we can not do anything regarding incoming traffic. > + goto drop; > + } > + } > + > + if (!peer) { AFAIU, this condition can true only in case of peer_id beeing equal to OVPN_PEER_ID_UNDEF, right? In this case the condition check can be replaced by simple 'else' statement. And to make code more corresponding to the above comment regarding implementations that send undefined peer-id, can we swap sides of the lookup method selection? E.g. /* Comment about fancy implementations sending undefined peer-id */ if (peer_id == OVPN_PEER_ID_UNDEF) { /* Do transport address based loockup */ } else { /* Do peer-id based loockup */ } > + /* data packet with undef peer-id */ > + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); > + if (unlikely(!peer)) { > + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", > + __func__); > + goto drop; > + } > + } > + > + /* pop off outer UDP header */ > + __skb_pull(skb, sizeof(struct udphdr)); > + ovpn_recv(peer, skb); > + return 0; > + > +drop: > + if (peer) > + ovpn_peer_put(peer); AFAIU, the peer is alway NULL here. Shall we remove the above check? > + dev_core_stats_rx_dropped_inc(ovpn->dev); > +drop_noovpn: > + kfree_skb(skb); > + return 0; > +} > + > /** > * ovpn_udp4_output - send IPv4 packet over udp socket > * @ovpn: the openvpn instance > @@ -259,8 +345,12 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, > */ > int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) > { > + struct udp_tunnel_sock_cfg cfg = { > + .encap_type = UDP_ENCAP_OVPNINUDP, > + .encap_rcv = ovpn_udp_encap_recv, > + }; > struct ovpn_socket *old_data; > - int ret = 0; > + int ret; > > /* sanity check */ > if (sock->sk->sk_protocol != IPPROTO_UDP) { > @@ -274,6 +364,7 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) > if (!old_data) { > /* socket is currently unused - we can take it */ > rcu_read_unlock(); > + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); > return 0; > } > > @@ -302,3 +393,14 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) > > return ret; > } > + > +/** > + * ovpn_udp_socket_detach - clean udp-tunnel status for this socket > + * @sock: the socket to clean > + */ > +void ovpn_udp_socket_detach(struct socket *sock) > +{ > + struct udp_tunnel_sock_cfg cfg = { }; > + > + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); > +} > diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h > index e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4..fecb68464896bc1228315faf268453f9005e693d 100644 > --- a/drivers/net/ovpn/udp.h > +++ b/drivers/net/ovpn/udp.h > @@ -18,8 +18,9 @@ struct sk_buff; > struct socket; > > int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn); > - > +void ovpn_udp_socket_detach(struct socket *sock); > void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, > struct sk_buff *skb); > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk); > > #endif /* _NET_OVPN_UDP_H_ */ >
On 29.10.2024 12:47, Antonio Quartulli wrote: > +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + unsigned int pkt_len; > + > + /* we can't guarantee the packet wasn't corrupted before entering the > + * VPN, therefore we give other layers a chance to check that > + */ > + skb->ip_summed = CHECKSUM_NONE; > + > + /* skb hash for transport packet no longer valid after decapsulation */ > + skb_clear_hash(skb); > + > + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the > + * interface, based on __skb_tunnel_rx() in dst.h > + */ > + skb->dev = peer->ovpn->dev; > + skb_set_queue_mapping(skb, 0); > + skb_scrub_packet(skb, true); > + > + skb_reset_network_header(skb); > + skb_reset_transport_header(skb); > + skb_probe_transport_header(skb); > + skb_reset_inner_headers(skb); > + > + memset(skb->cb, 0, sizeof(skb->cb)); > + > + /* cause packet to be "received" by the interface */ > + pkt_len = skb->len; > + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, > + skb) == NET_RX_SUCCESS)) nit: to improve readability, the packet delivery call can be composed like this: pkt_len = skb->len; res = gro_cells_receive(&peer->ovpn->gro_cells, skb); if (likely(res == NET_RX_SUCCESS)) > + /* update RX stats with the size of decrypted packet */ > + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); > +}
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...] >> +/* Called after decrypt to write the IP packet to the device. >> + * This method is expected to manage/free the skb. >> + */ >> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff >> *skb) >> +{ >> + unsigned int pkt_len; >> + >> + /* we can't guarantee the packet wasn't corrupted before entering >> the >> + * VPN, therefore we give other layers a chance to check that >> + */ >> + skb->ip_summed = CHECKSUM_NONE; >> + >> + /* skb hash for transport packet no longer valid after >> decapsulation */ >> + skb_clear_hash(skb); >> + >> + /* post-decrypt scrub -- prepare to inject encapsulated packet >> onto the >> + * interface, based on __skb_tunnel_rx() in dst.h >> + */ >> + skb->dev = peer->ovpn->dev; >> + skb_set_queue_mapping(skb, 0); >> + skb_scrub_packet(skb, true); >> + > > The skb->protocol field is going to be updated in the upcoming patch in > the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, > why do not touch the protocol field here? Well, I would personally not document missing details in a partly implemented code path. > >> + skb_reset_network_header(skb); > > ovpn_decrypt_post() already reseted the network header. Why do we need > it here again? yeah, I think this can be removed. > >> + skb_reset_transport_header(skb); >> + skb_probe_transport_header(skb); >> + skb_reset_inner_headers(skb); >> + >> + memset(skb->cb, 0, sizeof(skb->cb)); > > Why do we need to zero the control buffer here? To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well. I think this was recommended by Sabrina as well. > >> + /* cause packet to be "received" by the interface */ >> + pkt_len = skb->len; >> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >> + skb) == NET_RX_SUCCESS)) >> + /* update RX stats with the size of decrypted packet */ >> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >> +} >> + >> +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) >> +{ >> + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; >> + >> + if (unlikely(ret < 0)) >> + goto drop; >> + >> + ovpn_netdev_write(peer, skb); >> + /* skb is passed to upper layer - don't free it */ >> + skb = NULL; >> +drop: >> + if (unlikely(skb)) >> + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); >> + ovpn_peer_put(peer); >> + kfree_skb(skb); >> +} >> + >> +/* pick next packet from RX queue, decrypt and forward it to the >> device */ > > The function now receives packets from externel callers. Should we > update the above comment? yap will do. [...] >> --- /dev/null >> +++ b/drivers/net/ovpn/proto.h >> @@ -0,0 +1,75 @@ >> +/* SPDX-License-Identifier: GPL-2.0-only */ >> +/* OpenVPN data channel offload >> + * >> + * Copyright (C) 2020-2024 OpenVPN, Inc. >> + * >> + * Author: Antonio Quartulli <antonio@openvpn.net> >> + * James Yonan <james@openvpn.net> >> + */ >> + >> +#ifndef _NET_OVPN_OVPNPROTO_H_ >> +#define _NET_OVPN_OVPNPROTO_H_ >> + >> +#include "main.h" >> + >> +#include <linux/skbuff.h> >> + >> +/* Methods for operating on the initial command >> + * byte of the OpenVPN protocol. >> + */ >> + >> +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in >> + * one byte >> + */ >> +#define OVPN_KEY_ID_MASK 0x07 >> +#define OVPN_OPCODE_SHIFT 3 >> +#define OVPN_OPCODE_MASK 0x1F > > Instead of defining mask(s) and shift(s), we can define only masks and > use bitfield API (see below). > >> +/* upper bounds on opcode and key ID */ >> +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) >> +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) >> +/* packet opcodes of interest to us */ >> +#define OVPN_DATA_V1 6 /* data channel V1 packet */ >> +#define OVPN_DATA_V2 9 /* data channel V2 packet */ >> +/* size of initial packet opcode */ >> +#define OVPN_OP_SIZE_V1 1 >> +#define OVPN_OP_SIZE_V2 4 >> +#define OVPN_PEER_ID_MASK 0x00FFFFFF >> +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF >> +/* first byte of keepalive message */ >> +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a >> +/* first byte of exit message */ >> +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28 > > From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, > OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and > OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and looks like should be > removed. ACK > >> +/** >> + * ovpn_opcode_from_skb - extract OP code from skb at specified offset >> + * @skb: the packet to extract the OP code from >> + * @offset: the offset in the data buffer where the OP code is located >> + * >> + * Note: this function assumes that the skb head was pulled enough >> + * to access the first byte. >> + * >> + * Return: the OP code >> + */ >> +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 >> offset) >> +{ >> + u8 byte = *(skb->data + offset); >> + >> + return byte >> OVPN_OPCODE_SHIFT; > > For example here, the shift can be replaced with bitfield macro: > > #define OVPN_OPCODE_PKTTYPE_MSK 0xf8000000 > #define OVPN_OPCODE_KEYID_MSK 0x07000000 > #define OVPN_OPCODE_PEERID_MSK 0x00ffffff > > static inline u8 ovpn_opcode_from_skb(...) > { > u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset)); > > return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode); > } > > And the upcoming ovpn_opcode_compose() can be implemented like this: > > static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id) > { > return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) | > FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) | > FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id); > } > > And with this size can be even embedded into ovpn_aead_encrypt() to make > the header composing more clear. I wasn't aware of the bitfield API. Yeah, it looks cleaner and gives a better definition of the first 4 bytes of the header. There is also GENMASK() that helps with creating MASKs instead of hardcofing the bits in hex. Will give it a try, thanks! > >> +} >> + >> +/** >> + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset >> + * @skb: the packet to extract the OP code from >> + * @offset: the offset in the data buffer where the OP code is located >> + * >> + * Note: this function assumes that the skb head was pulled enough >> + * to access the first 4 bytes. >> + * >> + * Return: the peer ID. >> + */ >> +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, >> u16 offset) >> +{ >> + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; >> +} >> + >> +#endif /* _NET_OVPN_OVPNPROTO_H_ */ >> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c >> index >> 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 >> --- a/drivers/net/ovpn/socket.c >> +++ b/drivers/net/ovpn/socket.c >> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) >> if (!sock) >> return; >> + if (sock->sk->sk_protocol == IPPROTO_UDP) >> + ovpn_udp_socket_detach(sock); >> + >> sockfd_put(sock); >> } >> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, >> struct ovpn_peer *peer) >> return ret; >> } >> +/* Retrieve the corresponding ovpn object from a UDP socket >> + * rcu_read_lock must be held on entry >> + */ >> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) >> +{ >> + struct ovpn_socket *ovpn_sock; >> + >> + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != >> UDP_ENCAP_OVPNINUDP)) >> + return NULL; >> + >> + ovpn_sock = rcu_dereference_sk_user_data(sk); >> + if (unlikely(!ovpn_sock)) >> + return NULL; >> + >> + /* make sure that sk matches our stored transport socket */ >> + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) >> + return NULL; >> + >> + return ovpn_sock->ovpn; > > Now, returning of this pointer is safe. But the following TCP transport > support calls the socket release via a scheduled work. What extends > socket lifetime and makes it possible to receive a UDP packet way after > the interface private data release. Is it correct assumption? Sorry you lost me when sayng "following *TCP* transp[ort support calls". This function is invoked only in UDP context. Was that a typ0? > > If the above is right then shall we set ->ovpn = NULL before scheduling > the socket releasing work or somehow else mark the socket as half- > destroyed? > >> +} >> + >> /** >> * ovpn_socket_new - create a new socket and initialize it >> * @sock: the kernel socket to embed >> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c >> index >> d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 >> --- a/drivers/net/ovpn/udp.c >> +++ b/drivers/net/ovpn/udp.c >> @@ -21,9 +21,95 @@ >> #include "bind.h" >> #include "io.h" >> #include "peer.h" >> +#include "proto.h" >> #include "socket.h" >> #include "udp.h" >> +/** >> + * ovpn_udp_encap_recv - Start processing a received UDP packet. >> + * @sk: socket over which the packet was received >> + * @skb: the received packet >> + * >> + * If the first byte of the payload is DATA_V2, the packet is further >> processed, >> + * otherwise it is forwarded to the UDP stack for delivery to user >> space. >> + * >> + * Return: >> + * 0 if skb was consumed or dropped >> + * >0 if skb should be passed up to userspace as UDP (packet not >> consumed) >> + * <0 if skb should be resubmitted as proto -N (packet not consumed) >> + */ >> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >> +{ >> + struct ovpn_peer *peer = NULL; >> + struct ovpn_struct *ovpn; >> + u32 peer_id; >> + u8 opcode; >> + >> + ovpn = ovpn_from_udp_sock(sk); >> + if (unlikely(!ovpn)) { >> + net_err_ratelimited("%s: cannot obtain ovpn object from UDP >> socket\n", >> + __func__); > > Probably we should zero ovpn pointer in the ovpn_sock to survive > scheduled socket release (see comment in ovpn_from_udp_sock). So, this > print should be removed to avoid printing misguiding errors. I am also not following this. ovpn is already NULL if we are entering this branch, no? And I think this condition is quite improbable as well. > >> + goto drop_noovpn; >> + } >> + >> + /* Make sure the first 4 bytes of the skb data buffer after the UDP >> + * header are accessible. >> + * They are required to fetch the OP code, the key ID and the >> peer ID. >> + */ >> + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + >> + OVPN_OP_SIZE_V2))) { >> + net_dbg_ratelimited("%s: packet too small\n", __func__); >> + goto drop; >> + } >> + >> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >> + if (unlikely(opcode != OVPN_DATA_V2)) { >> + /* DATA_V1 is not supported */ >> + if (opcode == OVPN_DATA_V1) >> + goto drop; > > This packet dropping makes protocol accelerator, intendent to speed up > the data packets processing, a protocol enforcement entity, isn't it? > Shall we follow the principle of beeing liberal in what we accept and > just forward everything besides data packets upstream to a userspace > application? 'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto expect any DATA packet to bubble up as it would not know what to do with it. So any decision regarding data packets should stay in 'ovpn'. We just decided to support the modern DATA_V2 (DATA_V1 is seldomly used nowadays). Moreover, it's quite impossible that a peer will send us DATA_V1 if it passed userspace handshake and negotiation. > >> + >> + /* unknown or control packet: let it bubble up to userspace */ >> + return 1; >> + } >> + >> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >> + /* some OpenVPN server implementations send data packets with the >> + * peer-id set to undef. In this case we skip the peer lookup by >> peer-id >> + * and we try with the transport address >> + */ >> + if (peer_id != OVPN_PEER_ID_UNDEF) { >> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >> + if (!peer) { >> + net_err_ratelimited("%s: received data from unknown peer >> (id: %d)\n", >> + __func__, peer_id); > > Why do we consider a peer sending us garbage our problem? Meaning, this > peer miss can be not our fault but a malformed packet from a 3rd party > side. E.g. nowdays I can see a lot of traces of these "active probers" > in my OpenVPN logs. Shall remove this message or at least make it debug > to avoid bothering users with garbage traveling Internet? Anyway we can > not do anything regarding incoming traffic. It could also be a peer that believes to be connected while 'ovpn' dropped it earlier on. So this message would help the admin/user understanding what's going on. no? Maybe make it an info/notice instead of error? > >> + goto drop; >> + } >> + } >> + >> + if (!peer) { > > AFAIU, this condition can true only in case of peer_id beeing equal to > OVPN_PEER_ID_UNDEF, right? In this case the condition check can be > replaced by simple 'else' statement. > This part was actually rewritten already, so better wait for v12 before further discussing. > And to make code more corresponding to the above comment regarding > implementations that send undefined peer-id, can we swap sides of the > lookup method selection? E.g. > > /* Comment about fancy implementations sending undefined peer-id */ > if (peer_id == OVPN_PEER_ID_UNDEF) { > /* Do transport address based loockup */ > } else { > /* Do peer-id based loockup */ > } > >> + /* data packet with undef peer-id */ >> + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); >> + if (unlikely(!peer)) { >> + net_dbg_ratelimited("%s: received data with undef peer-id >> from unknown source\n", >> + __func__); >> + goto drop; >> + } >> + } >> + >> + /* pop off outer UDP header */ >> + __skb_pull(skb, sizeof(struct udphdr)); >> + ovpn_recv(peer, skb); >> + return 0; >> + >> +drop: >> + if (peer) >> + ovpn_peer_put(peer); > > AFAIU, the peer is alway NULL here. Shall we remove the above check? yeah simplified as well already. Thanks! Regards,
On 12/11/2024 01:16, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: >> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff >> *skb) >> +{ >> + unsigned int pkt_len; >> + >> + /* we can't guarantee the packet wasn't corrupted before entering >> the >> + * VPN, therefore we give other layers a chance to check that >> + */ >> + skb->ip_summed = CHECKSUM_NONE; >> + >> + /* skb hash for transport packet no longer valid after >> decapsulation */ >> + skb_clear_hash(skb); >> + >> + /* post-decrypt scrub -- prepare to inject encapsulated packet >> onto the >> + * interface, based on __skb_tunnel_rx() in dst.h >> + */ >> + skb->dev = peer->ovpn->dev; >> + skb_set_queue_mapping(skb, 0); >> + skb_scrub_packet(skb, true); >> + >> + skb_reset_network_header(skb); >> + skb_reset_transport_header(skb); >> + skb_probe_transport_header(skb); >> + skb_reset_inner_headers(skb); >> + >> + memset(skb->cb, 0, sizeof(skb->cb)); >> + >> + /* cause packet to be "received" by the interface */ >> + pkt_len = skb->len; >> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >> + skb) == NET_RX_SUCCESS)) > > nit: to improve readability, the packet delivery call can be composed > like this: > > pkt_len = skb->len; > res = gro_cells_receive(&peer->ovpn->gro_cells, skb); > if (likely(res == NET_RX_SUCCESS)) > hm, you don't like calls on two lines? :-) ok, will change it. Regards, >> + /* update RX stats with the size of decrypted packet */ >> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >> +}
On 15.11.2024 17:02, Antonio Quartulli wrote: > On 11/11/2024 02:54, Sergey Ryazanov wrote: > [...] > >>> +/* Called after decrypt to write the IP packet to the device. >>> + * This method is expected to manage/free the skb. >>> + */ >>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff >>> *skb) >>> +{ >>> + unsigned int pkt_len; >>> + >>> + /* we can't guarantee the packet wasn't corrupted before >>> entering the >>> + * VPN, therefore we give other layers a chance to check that >>> + */ >>> + skb->ip_summed = CHECKSUM_NONE; >>> + >>> + /* skb hash for transport packet no longer valid after >>> decapsulation */ >>> + skb_clear_hash(skb); >>> + >>> + /* post-decrypt scrub -- prepare to inject encapsulated packet >>> onto the >>> + * interface, based on __skb_tunnel_rx() in dst.h >>> + */ >>> + skb->dev = peer->ovpn->dev; >>> + skb_set_queue_mapping(skb, 0); >>> + skb_scrub_packet(skb, true); >>> + >> >> The skb->protocol field is going to be updated in the upcoming patch >> in the caller (ovpn_decrypt_post). Shall we put a comment here >> clarifying, why do not touch the protocol field here? > > Well, I would personally not document missing details in a partly > implemented code path. Looks like the question wasn't precisely emphrased. By bad. Let me elaborate it in more details: 1. usually skb->protocol is updated just before a packet leaves a module 2. I've not found it were it was expected 3. skb->protocol is updated in the caller function - ovpn_decrypt_post(), along with the skb_reset_network_header() call. The question is, shall we put some comment here in the ovpn_netdev_write() function elaborating that this was done in the caller? Or is such comment odd? >>> + skb_reset_network_header(skb); >> >> ovpn_decrypt_post() already reseted the network header. Why do we need >> it here again? > > yeah, I think this can be removed. > >> >>> + skb_reset_transport_header(skb); >>> + skb_probe_transport_header(skb); >>> + skb_reset_inner_headers(skb); >>> + >>> + memset(skb->cb, 0, sizeof(skb->cb)); >> >> Why do we need to zero the control buffer here? > > To avoid the next layer to assume the cb is clean while it is not. > Other drivers do the same as well. AFAIR, there is no convention to clean the control buffer before the handing over. The common practice is a bit opposite, programmer shall not assume that the control buffer has been zeroed. Not a big deal to clean it here, we just can save some CPU cycles avoiding it. > I think this was recommended by Sabrina as well. Curious. It's macsec that does not zero it, or I've not understood how it was done. >>> + /* cause packet to be "received" by the interface */ >>> + pkt_len = skb->len; >>> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >>> + skb) == NET_RX_SUCCESS)) >>> + /* update RX stats with the size of decrypted packet */ >>> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >>> +} >>> + [...] >>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c >>> index >>> 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 >>> --- a/drivers/net/ovpn/socket.c >>> +++ b/drivers/net/ovpn/socket.c >>> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) >>> if (!sock) >>> return; >>> + if (sock->sk->sk_protocol == IPPROTO_UDP) >>> + ovpn_udp_socket_detach(sock); >>> + >>> sockfd_put(sock); >>> } >>> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, >>> struct ovpn_peer *peer) >>> return ret; >>> } >>> +/* Retrieve the corresponding ovpn object from a UDP socket >>> + * rcu_read_lock must be held on entry >>> + */ >>> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) >>> +{ >>> + struct ovpn_socket *ovpn_sock; >>> + >>> + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != >>> UDP_ENCAP_OVPNINUDP)) >>> + return NULL; >>> + >>> + ovpn_sock = rcu_dereference_sk_user_data(sk); >>> + if (unlikely(!ovpn_sock)) >>> + return NULL; >>> + >>> + /* make sure that sk matches our stored transport socket */ >>> + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) >>> + return NULL; >>> + >>> + return ovpn_sock->ovpn; >> >> Now, returning of this pointer is safe. But the following TCP >> transport support calls the socket release via a scheduled work. What >> extends socket lifetime and makes it possible to receive a UDP packet >> way after the interface private data release. Is it correct assumption? > > Sorry you lost me when sayng "following *TCP* transp[ort support calls". > This function is invoked only in UDP context. > Was that a typ0? Yeah, you are right. The question sounds like a riddle. I should eventually stop composing emails at midnight. Let me paraphrase it. The potential issue is tricky since we create it patch-by-patch. Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying: ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point free_netdev After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good. Then, the following patch 'ovpn: implement TCP transport' disjoin ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call: ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) And long time after the socket will be actually detached: ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock And until this detaching will take a place, UDP handler can call ovpn_udp_encap_recv() whatever number of times. So, we can end up with this scenario: ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock To address the issue, I see two possible solutions: 1. flush the workqueue somewhere before the netdev release 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach >> If the above is right then shall we set ->ovpn = NULL before >> scheduling the socket releasing work or somehow else mark the socket >> as half- destroyed? >> >>> +} >>> + >>> /** >>> * ovpn_socket_new - create a new socket and initialize it >>> * @sock: the kernel socket to embed >>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c >>> index >>> d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 >>> --- a/drivers/net/ovpn/udp.c >>> +++ b/drivers/net/ovpn/udp.c >>> @@ -21,9 +21,95 @@ >>> #include "bind.h" >>> #include "io.h" >>> #include "peer.h" >>> +#include "proto.h" >>> #include "socket.h" >>> #include "udp.h" >>> +/** >>> + * ovpn_udp_encap_recv - Start processing a received UDP packet. >>> + * @sk: socket over which the packet was received >>> + * @skb: the received packet >>> + * >>> + * If the first byte of the payload is DATA_V2, the packet is >>> further processed, >>> + * otherwise it is forwarded to the UDP stack for delivery to user >>> space. >>> + * >>> + * Return: >>> + * 0 if skb was consumed or dropped >>> + * >0 if skb should be passed up to userspace as UDP (packet not >>> consumed) >>> + * <0 if skb should be resubmitted as proto -N (packet not consumed) >>> + */ >>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >>> +{ >>> + struct ovpn_peer *peer = NULL; >>> + struct ovpn_struct *ovpn; >>> + u32 peer_id; >>> + u8 opcode; >>> + >>> + ovpn = ovpn_from_udp_sock(sk); >>> + if (unlikely(!ovpn)) { >>> + net_err_ratelimited("%s: cannot obtain ovpn object from UDP >>> socket\n", >>> + __func__); >> >> Probably we should zero ovpn pointer in the ovpn_sock to survive >> scheduled socket release (see comment in ovpn_from_udp_sock). So, this >> print should be removed to avoid printing misguiding errors. > > I am also not following this. ovpn is already NULL if we are entering > this branch, no? > > And I think this condition is quite improbable as well. Here, due to the scheduled nature of the detach function invocation, ovpn_from_udp_sock() can return us a pointer to the freed memory. So we should prevent ovpn_udp_encap_recv() invocation after the netdev release by flushing the workqueue. Or we can set ovpn_sock->ovpn = NULL even before scheduling the socket detaching. And in this case, ovpn_from_udp_sock() returning NULL will be a legitimate case and we should drop the error printing. >>> + goto drop_noovpn; >>> + } >>> + >>> + /* Make sure the first 4 bytes of the skb data buffer after the UDP >>> + * header are accessible. >>> + * They are required to fetch the OP code, the key ID and the >>> peer ID. >>> + */ >>> + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + >>> + OVPN_OP_SIZE_V2))) { >>> + net_dbg_ratelimited("%s: packet too small\n", __func__); >>> + goto drop; >>> + } >>> + >>> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >>> + if (unlikely(opcode != OVPN_DATA_V2)) { >>> + /* DATA_V1 is not supported */ >>> + if (opcode == OVPN_DATA_V1) >>> + goto drop; >> >> This packet dropping makes protocol accelerator, intendent to speed up >> the data packets processing, a protocol enforcement entity, isn't it? >> Shall we follow the principle of beeing liberal in what we accept and >> just forward everything besides data packets upstream to a userspace >> application? > > 'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto > expect any DATA packet to bubble up as it would not know what to do with > it. > > So any decision regarding data packets should stay in 'ovpn'. > > We just decided to support the modern DATA_V2 (DATA_V1 is seldomly used > nowadays). > > Moreover, it's quite impossible that a peer will send us DATA_V1 if it > passed userspace handshake and negotiation. The question was about the special handling of this packet type. If this packet type is unlikely, then why should the kernel take special care of it? Is this specific packet type going to crash the userspace application? >>> + >>> + /* unknown or control packet: let it bubble up to userspace */ >>> + return 1; >>> + } >>> + >>> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >>> + /* some OpenVPN server implementations send data packets with the >>> + * peer-id set to undef. In this case we skip the peer lookup by >>> peer-id >>> + * and we try with the transport address >>> + */ >>> + if (peer_id != OVPN_PEER_ID_UNDEF) { >>> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >>> + if (!peer) { >>> + net_err_ratelimited("%s: received data from unknown peer >>> (id: %d)\n", >>> + __func__, peer_id); >> >> Why do we consider a peer sending us garbage our problem? Meaning, >> this peer miss can be not our fault but a malformed packet from a 3rd >> party side. E.g. nowdays I can see a lot of traces of these "active >> probers" in my OpenVPN logs. Shall remove this message or at least >> make it debug to avoid bothering users with garbage traveling >> Internet? Anyway we can not do anything regarding incoming traffic. > > It could also be a peer that believes to be connected while 'ovpn' > dropped it earlier on. So this message would help the admin/user > understanding what's going on. no? It could help troubleshooting, potentionally. On the other hand, it will flood the kernel log with whatever junk is floating around the Internet. For sure. > Maybe make it an info/notice instead of error? At best it can be a debug message for developers. But IMHO the really best choice is to get rid of it. >>> + goto drop; >>> + } >>> + } -- Sergey
On 26/11/2024 01:32, Sergey Ryazanov wrote: > On 15.11.2024 17:02, Antonio Quartulli wrote: >> On 11/11/2024 02:54, Sergey Ryazanov wrote: >> [...] >> >>>> +/* Called after decrypt to write the IP packet to the device. >>>> + * This method is expected to manage/free the skb. >>>> + */ >>>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct >>>> sk_buff *skb) >>>> +{ >>>> + unsigned int pkt_len; >>>> + >>>> + /* we can't guarantee the packet wasn't corrupted before >>>> entering the >>>> + * VPN, therefore we give other layers a chance to check that >>>> + */ >>>> + skb->ip_summed = CHECKSUM_NONE; >>>> + >>>> + /* skb hash for transport packet no longer valid after >>>> decapsulation */ >>>> + skb_clear_hash(skb); >>>> + >>>> + /* post-decrypt scrub -- prepare to inject encapsulated packet >>>> onto the >>>> + * interface, based on __skb_tunnel_rx() in dst.h >>>> + */ >>>> + skb->dev = peer->ovpn->dev; >>>> + skb_set_queue_mapping(skb, 0); >>>> + skb_scrub_packet(skb, true); >>>> + >>> >>> The skb->protocol field is going to be updated in the upcoming patch >>> in the caller (ovpn_decrypt_post). Shall we put a comment here >>> clarifying, why do not touch the protocol field here? >> >> Well, I would personally not document missing details in a partly >> implemented code path. > > Looks like the question wasn't precisely emphrased. By bad. Let me > elaborate it in more details: > 1. usually skb->protocol is updated just before a packet leaves a module > 2. I've not found it were it was expected > 3. skb->protocol is updated in the caller function - > ovpn_decrypt_post(), along with the skb_reset_network_header() call. > > The question is, shall we put some comment here in the > ovpn_netdev_write() function elaborating that this was done in the > caller? Or is such comment odd? Ok, got it. Mah personally I don't think it's truly needed. But I have no strong opinion. > >>>> + skb_reset_network_header(skb); >>> >>> ovpn_decrypt_post() already reseted the network header. Why do we >>> need it here again? >> >> yeah, I think this can be removed. >> >>> >>>> + skb_reset_transport_header(skb); >>>> + skb_probe_transport_header(skb); >>>> + skb_reset_inner_headers(skb); >>>> + >>>> + memset(skb->cb, 0, sizeof(skb->cb)); >>> >>> Why do we need to zero the control buffer here? >> >> To avoid the next layer to assume the cb is clean while it is not. >> Other drivers do the same as well. > > AFAIR, there is no convention to clean the control buffer before the > handing over. The common practice is a bit opposite, programmer shall > not assume that the control buffer has been zeroed. > > Not a big deal to clean it here, we just can save some CPU cycles > avoiding it. If there is no convention, then I agree with you and I'd remove it. > >> I think this was recommended by Sabrina as well. > > Curious. It's macsec that does not zero it, or I've not understood how > it was done. I don't see it being zero'd. So I possibly misunderstood the suggestion. I'll remove the memset. > >>>> + /* cause packet to be "received" by the interface */ >>>> + pkt_len = skb->len; >>>> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >>>> + skb) == NET_RX_SUCCESS)) >>>> + /* update RX stats with the size of decrypted packet */ >>>> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >>>> +} >>>> + > > [...] > >>>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c >>>> index >>>> 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 >>>> --- a/drivers/net/ovpn/socket.c >>>> +++ b/drivers/net/ovpn/socket.c >>>> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) >>>> if (!sock) >>>> return; >>>> + if (sock->sk->sk_protocol == IPPROTO_UDP) >>>> + ovpn_udp_socket_detach(sock); >>>> + >>>> sockfd_put(sock); >>>> } >>>> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket >>>> *sock, struct ovpn_peer *peer) >>>> return ret; >>>> } >>>> +/* Retrieve the corresponding ovpn object from a UDP socket >>>> + * rcu_read_lock must be held on entry >>>> + */ >>>> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) >>>> +{ >>>> + struct ovpn_socket *ovpn_sock; >>>> + >>>> + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != >>>> UDP_ENCAP_OVPNINUDP)) >>>> + return NULL; >>>> + >>>> + ovpn_sock = rcu_dereference_sk_user_data(sk); >>>> + if (unlikely(!ovpn_sock)) >>>> + return NULL; >>>> + >>>> + /* make sure that sk matches our stored transport socket */ >>>> + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) >>>> + return NULL; >>>> + >>>> + return ovpn_sock->ovpn; >>> >>> Now, returning of this pointer is safe. But the following TCP >>> transport support calls the socket release via a scheduled work. What >>> extends socket lifetime and makes it possible to receive a UDP packet >>> way after the interface private data release. Is it correct assumption? >> >> Sorry you lost me when sayng "following *TCP* transp[ort support calls". >> This function is invoked only in UDP context. >> Was that a typ0? > > Yeah, you are right. The question sounds like a riddle. I should > eventually stop composing emails at midnight. Let me paraphrase it. :) > > The potential issue is tricky since we create it patch-by-patch. > > Up to this patch the socket releasing procedure looks solid and > reliable. E.g. the P2P netdev destroying: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > netdev_run_todo > rcu_barrier <- no running ovpn_udp_encap_recv after this point > free_netdev > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will > be spawned. And after the rcu_barrier() all running > ovpn_udp_encap_recv() will be done. All good. > ok > Then, the following patch 'ovpn: implement TCP transport' disjoin > ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the > socket detach function call: > > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > > And long time after the socket will be actually detached: > > ovpn_socket_release_work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > And until this detaching will take a place, UDP handler can call > ovpn_udp_encap_recv() whatever number of times. > > So, we can end up with this scenario: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > netdev_run_todo > rcu_barrier > free_netdev > > ovpn_udp_encap_recv <- called for an incoming UDP packet > ovpn_from_udp_sock <- returns pointer to freed memory > // Any access to ovpn pointer is the use-after-free > > ovpn_socket_release_work <- kernel finally ivoke the work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > To address the issue, I see two possible solutions: > 1. flush the workqueue somewhere before the netdev release yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?" > 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach This makes sense too. But 1 is definitely what we need. > >>> If the above is right then shall we set ->ovpn = NULL before >>> scheduling the socket releasing work or somehow else mark the socket >>> as half- destroyed? Will think about it, it may make sense to nullify ->ovpn as well. >>> >>>> +} >>>> + >>>> /** >>>> * ovpn_socket_new - create a new socket and initialize it >>>> * @sock: the kernel socket to embed >>>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c >>>> index >>>> d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 >>>> --- a/drivers/net/ovpn/udp.c >>>> +++ b/drivers/net/ovpn/udp.c >>>> @@ -21,9 +21,95 @@ >>>> #include "bind.h" >>>> #include "io.h" >>>> #include "peer.h" >>>> +#include "proto.h" >>>> #include "socket.h" >>>> #include "udp.h" >>>> +/** >>>> + * ovpn_udp_encap_recv - Start processing a received UDP packet. >>>> + * @sk: socket over which the packet was received >>>> + * @skb: the received packet >>>> + * >>>> + * If the first byte of the payload is DATA_V2, the packet is >>>> further processed, >>>> + * otherwise it is forwarded to the UDP stack for delivery to user >>>> space. >>>> + * >>>> + * Return: >>>> + * 0 if skb was consumed or dropped >>>> + * >0 if skb should be passed up to userspace as UDP (packet not >>>> consumed) >>>> + * <0 if skb should be resubmitted as proto -N (packet not consumed) >>>> + */ >>>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >>>> +{ >>>> + struct ovpn_peer *peer = NULL; >>>> + struct ovpn_struct *ovpn; >>>> + u32 peer_id; >>>> + u8 opcode; >>>> + >>>> + ovpn = ovpn_from_udp_sock(sk); >>>> + if (unlikely(!ovpn)) { >>>> + net_err_ratelimited("%s: cannot obtain ovpn object from UDP >>>> socket\n", >>>> + __func__); >>> >>> Probably we should zero ovpn pointer in the ovpn_sock to survive >>> scheduled socket release (see comment in ovpn_from_udp_sock). So, >>> this print should be removed to avoid printing misguiding errors. >> >> I am also not following this. ovpn is already NULL if we are entering >> this branch, no? >> >> And I think this condition is quite improbable as well. > > Here, due to the scheduled nature of the detach function invocation, > ovpn_from_udp_sock() can return us a pointer to the freed memory. > > So we should prevent ovpn_udp_encap_recv() invocation after the netdev > release by flushing the workqueue. Or we can set ovpn_sock->ovpn = NULL > even before scheduling the socket detaching. And in this case, > ovpn_from_udp_sock() returning NULL will be a legitimate case and we > should drop the error printing. ok got it. it is related with the comment above. > >>>> + goto drop_noovpn; >>>> + } >>>> + >>>> + /* Make sure the first 4 bytes of the skb data buffer after the >>>> UDP >>>> + * header are accessible. >>>> + * They are required to fetch the OP code, the key ID and the >>>> peer ID. >>>> + */ >>>> + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + >>>> + OVPN_OP_SIZE_V2))) { >>>> + net_dbg_ratelimited("%s: packet too small\n", __func__); >>>> + goto drop; >>>> + } >>>> + >>>> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >>>> + if (unlikely(opcode != OVPN_DATA_V2)) { >>>> + /* DATA_V1 is not supported */ >>>> + if (opcode == OVPN_DATA_V1) >>>> + goto drop; >>> >>> This packet dropping makes protocol accelerator, intendent to speed >>> up the data packets processing, a protocol enforcement entity, isn't >>> it? Shall we follow the principle of beeing liberal in what we accept >>> and just forward everything besides data packets upstream to a >>> userspace application? >> >> 'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto >> expect any DATA packet to bubble up as it would not know what to do >> with it. >> >> So any decision regarding data packets should stay in 'ovpn'. >> >> We just decided to support the modern DATA_V2 (DATA_V1 is seldomly >> used nowadays). >> >> Moreover, it's quite impossible that a peer will send us DATA_V1 if it >> passed userspace handshake and negotiation. > > The question was about the special handling of this packet type. If this > packet type is unlikely, then why should the kernel take special care of > it? Is this specific packet type going to crash the userspace application? Not crash (hopefully) but will create confusion because it is unexpected. The userspace dataplane path is technically inactive when 'ovpn' is in use. The idea is that any DATA_V* packet should be handled in kernelspace and userspace should not need to care. > >>>> + >>>> + /* unknown or control packet: let it bubble up to userspace */ >>>> + return 1; >>>> + } >>>> + >>>> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >>>> + /* some OpenVPN server implementations send data packets with the >>>> + * peer-id set to undef. In this case we skip the peer lookup >>>> by peer-id >>>> + * and we try with the transport address >>>> + */ >>>> + if (peer_id != OVPN_PEER_ID_UNDEF) { >>>> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >>>> + if (!peer) { >>>> + net_err_ratelimited("%s: received data from unknown >>>> peer (id: %d)\n", >>>> + __func__, peer_id); >>> >>> Why do we consider a peer sending us garbage our problem? Meaning, >>> this peer miss can be not our fault but a malformed packet from a 3rd >>> party side. E.g. nowdays I can see a lot of traces of these "active >>> probers" in my OpenVPN logs. Shall remove this message or at least >>> make it debug to avoid bothering users with garbage traveling >>> Internet? Anyway we can not do anything regarding incoming traffic. >> >> It could also be a peer that believes to be connected while 'ovpn' >> dropped it earlier on. So this message would help the admin/user >> understanding what's going on. no? > > It could help troubleshooting, potentionally. On the other hand, it will > flood the kernel log with whatever junk is floating around the Internet. > For sure. Well, only packets having the right opcode in it and being large enough. Because we have already dropped anything that doesn't look like a DATA_V2 packet at this point. > >> Maybe make it an info/notice instead of error? > > At best it can be a debug message for developers. But IMHO the really > best choice is to get rid of it. But yeah, I agree with you. Will just silently drop. > >>>> + goto drop; >>>> + } >>>> + } > > -- > Sergey Thanks, Regards,
On 26/11/2024 09:49, Antonio Quartulli wrote: [...] >> >> The potential issue is tricky since we create it patch-by-patch. >> >> Up to this patch the socket releasing procedure looks solid and >> reliable. E.g. the P2P netdev destroying: >> >> ovpn_netdev_notifier_call(NETDEV_UNREGISTER) >> ovpn_peer_release_p2p >> ovpn_peer_del_p2p >> ovpn_peer_put >> ovpn_peer_release_kref >> ovpn_peer_release >> ovpn_socket_put >> ovpn_socket_release_kref >> ovpn_socket_detach >> ovpn_udp_socket_detach >> setup_udp_tunnel_sock >> netdev_run_todo >> rcu_barrier <- no running ovpn_udp_encap_recv after this point >> free_netdev >> >> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() >> will be spawned. And after the rcu_barrier() all running >> ovpn_udp_encap_recv() will be done. All good. >> > > ok > >> Then, the following patch 'ovpn: implement TCP transport' disjoin >> ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the >> socket detach function call: >> >> ovpn_socket_release_kref >> ovpn_socket_schedule_release >> schedule_work(&sock->work) >> >> And long time after the socket will be actually detached: >> >> ovpn_socket_release_work >> ovpn_socket_detach >> ovpn_udp_socket_detach >> setup_udp_tunnel_sock >> >> And until this detaching will take a place, UDP handler can call >> ovpn_udp_encap_recv() whatever number of times. >> >> So, we can end up with this scenario: >> >> ovpn_netdev_notifier_call(NETDEV_UNREGISTER) >> ovpn_peer_release_p2p >> ovpn_peer_del_p2p >> ovpn_peer_put >> ovpn_peer_release_kref >> ovpn_peer_release >> ovpn_socket_put >> ovpn_socket_release_kref >> ovpn_socket_schedule_release >> schedule_work(&sock->work) >> netdev_run_todo >> rcu_barrier >> free_netdev >> >> ovpn_udp_encap_recv <- called for an incoming UDP packet >> ovpn_from_udp_sock <- returns pointer to freed memory >> // Any access to ovpn pointer is the use-after-free >> >> ovpn_socket_release_work <- kernel finally ivoke the work >> ovpn_socket_detach >> ovpn_udp_socket_detach >> setup_udp_tunnel_sock >> >> To address the issue, I see two possible solutions: >> 1. flush the workqueue somewhere before the netdev release > > yes! This is what I was missing. This will also solve the "how can the > module wait for all workers to be done before unloading?" > Actually there might be even a simpler solution: each ovpn_socket will hold a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). I can simply increase the refcounter those objects while they are referenced by the socket and decrease it when the socket is fully released (in the detach() function called by the worker). This way the netdev cannot be released until all socket (and all peers) are gone. This approach doesn't require any local workqueue or any other special coordination as we'll just force the whole cleanup to happen in a specific order. Does it make sense? Regards,
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,15 +9,79 @@ #include <linux/netdevice.h> #include <linux/skbuff.h> +#include <net/gro_cells.h> #include <net/gso.h> -#include "io.h" #include "ovpnstruct.h" #include "peer.h" +#include "io.h" +#include "netlink.h" +#include "proto.h" #include "udp.h" #include "skb.h" #include "socket.h" +/* Called after decrypt to write the IP packet to the device. + * This method is expected to manage/free the skb. + */ +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{ + unsigned int pkt_len; + + /* we can't guarantee the packet wasn't corrupted before entering the + * VPN, therefore we give other layers a chance to check that + */ + skb->ip_summed = CHECKSUM_NONE; + + /* skb hash for transport packet no longer valid after decapsulation */ + skb_clear_hash(skb); + + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the + * interface, based on __skb_tunnel_rx() in dst.h + */ + skb->dev = peer->ovpn->dev; + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, true); + + skb_reset_network_header(skb); + skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb); + + memset(skb->cb, 0, sizeof(skb->cb)); + + /* cause packet to be "received" by the interface */ + pkt_len = skb->len; + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, + skb) == NET_RX_SUCCESS)) + /* update RX stats with the size of decrypted packet */ + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); +} + +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) +{ + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + + if (unlikely(ret < 0)) + goto drop; + + ovpn_netdev_write(peer, skb); + /* skb is passed to upper layer - don't free it */ + skb = NULL; +drop: + if (unlikely(skb)) + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); + ovpn_peer_put(peer); + kfree_skb(skb); +} + +/* pick next packet from RX queue, decrypt and forward it to the device */ +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) +{ + ovpn_skb_cb(skb)->peer = peer; + ovpn_decrypt_post(skb, 0); +} + static void ovpn_encrypt_post(struct sk_buff *skb, int ret) { struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -12,4 +12,6 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); + #endif /* _NET_OVPN_OVPN_H_ */ diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -11,6 +11,7 @@ #include <linux/module.h> #include <linux/netdevice.h> #include <linux/inetdevice.h> +#include <net/gro_cells.h> #include <net/ip.h> #include <net/rtnetlink.h> #include <uapi/linux/if_arp.h> @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_init(struct net_device *dev) { - return 0; + struct ovpn_struct *ovpn = netdev_priv(dev); + + return gro_cells_init(&ovpn->gro_cells, dev); +} + +static void ovpn_net_uninit(struct net_device *dev) +{ + struct ovpn_struct *ovpn = netdev_priv(dev); + + gro_cells_destroy(&ovpn->gro_cells); } static int ovpn_net_open(struct net_device *dev) @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev) static const struct net_device_ops ovpn_netdev_ops = { .ndo_init = ovpn_net_init, + .ndo_uninit = ovpn_net_uninit, .ndo_open = ovpn_net_open, .ndo_stop = ovpn_net_stop, .ndo_start_xmit = ovpn_net_xmit, diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index a22c5083381c131db01a28c0f51e661d690d4998..4a48fc048890ab1cda78bc104fe3034b4a49d226 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -10,6 +10,7 @@ #ifndef _NET_OVPN_OVPNSTRUCT_H_ #define _NET_OVPN_OVPNSTRUCT_H_ +#include <net/gro_cells.h> #include <net/net_trackers.h> #include <uapi/linux/if_link.h> #include <uapi/linux/ovpn.h> @@ -23,6 +24,7 @@ * @lock: protect this object * @peer: in P2P mode, this is the only remote peer * @dev_list: entry for the module wide device list + * @gro_cells: pointer to the Generic Receive Offload cell */ struct ovpn_struct { struct net_device *dev; @@ -32,6 +34,7 @@ struct ovpn_struct { spinlock_t lock; /* protect writing to the ovpn_struct object */ struct ovpn_peer __rcu *peer; struct list_head dev_list; + struct gro_cells gro_cells; }; #endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h new file mode 100644 index 0000000000000000000000000000000000000000..69604cf26bbf82539ee5cd5a7ac9c23920f555de --- /dev/null +++ b/drivers/net/ovpn/proto.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli <antonio@openvpn.net> + * James Yonan <james@openvpn.net> + */ + +#ifndef _NET_OVPN_OVPNPROTO_H_ +#define _NET_OVPN_OVPNPROTO_H_ + +#include "main.h" + +#include <linux/skbuff.h> + +/* Methods for operating on the initial command + * byte of the OpenVPN protocol. + */ + +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in + * one byte + */ +#define OVPN_KEY_ID_MASK 0x07 +#define OVPN_OPCODE_SHIFT 3 +#define OVPN_OPCODE_MASK 0x1F +/* upper bounds on opcode and key ID */ +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) +/* packet opcodes of interest to us */ +#define OVPN_DATA_V1 6 /* data channel V1 packet */ +#define OVPN_DATA_V2 9 /* data channel V2 packet */ +/* size of initial packet opcode */ +#define OVPN_OP_SIZE_V1 1 +#define OVPN_OP_SIZE_V2 4 +#define OVPN_PEER_ID_MASK 0x00FFFFFF +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF +/* first byte of keepalive message */ +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a +/* first byte of exit message */ +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28 + +/** + * ovpn_opcode_from_skb - extract OP code from skb at specified offset + * @skb: the packet to extract the OP code from + * @offset: the offset in the data buffer where the OP code is located + * + * Note: this function assumes that the skb head was pulled enough + * to access the first byte. + * + * Return: the OP code + */ +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset) +{ + u8 byte = *(skb->data + offset); + + return byte >> OVPN_OPCODE_SHIFT; +} + +/** + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset + * @skb: the packet to extract the OP code from + * @offset: the offset in the data buffer where the OP code is located + * + * Note: this function assumes that the skb head was pulled enough + * to access the first 4 bytes. + * + * Return: the peer ID. + */ +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) +{ + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; +} + +#endif /* _NET_OVPN_OVPNPROTO_H_ */ diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) if (!sock) return; + if (sock->sk->sk_protocol == IPPROTO_UDP) + ovpn_udp_socket_detach(sock); + sockfd_put(sock); } @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) return ret; } +/* Retrieve the corresponding ovpn object from a UDP socket + * rcu_read_lock must be held on entry + */ +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock; + + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL; + + ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL; + + /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL; + + return ovpn_sock->ovpn; +} + /** * ovpn_socket_new - create a new socket and initialize it * @sock: the kernel socket to embed diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -21,9 +21,95 @@ #include "bind.h" #include "io.h" #include "peer.h" +#include "proto.h" #include "socket.h" #include "udp.h" +/** + * ovpn_udp_encap_recv - Start processing a received UDP packet. + * @sk: socket over which the packet was received + * @skb: the received packet + * + * If the first byte of the payload is DATA_V2, the packet is further processed, + * otherwise it is forwarded to the UDP stack for delivery to user space. + * + * Return: + * 0 if skb was consumed or dropped + * >0 if skb should be passed up to userspace as UDP (packet not consumed) + * <0 if skb should be resubmitted as proto -N (packet not consumed) + */ +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + struct ovpn_struct *ovpn; + u32 peer_id; + u8 opcode; + + ovpn = ovpn_from_udp_sock(sk); + if (unlikely(!ovpn)) { + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", + __func__); + goto drop_noovpn; + } + + /* Make sure the first 4 bytes of the skb data buffer after the UDP + * header are accessible. + * They are required to fetch the OP code, the key ID and the peer ID. + */ + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + + OVPN_OP_SIZE_V2))) { + net_dbg_ratelimited("%s: packet too small\n", __func__); + goto drop; + } + + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); + if (unlikely(opcode != OVPN_DATA_V2)) { + /* DATA_V1 is not supported */ + if (opcode == OVPN_DATA_V1) + goto drop; + + /* unknown or control packet: let it bubble up to userspace */ + return 1; + } + + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); + /* some OpenVPN server implementations send data packets with the + * peer-id set to undef. In this case we skip the peer lookup by peer-id + * and we try with the transport address + */ + if (peer_id != OVPN_PEER_ID_UNDEF) { + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", + __func__, peer_id); + goto drop; + } + } + + if (!peer) { + /* data packet with undef peer-id */ + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", + __func__); + goto drop; + } + } + + /* pop off outer UDP header */ + __skb_pull(skb, sizeof(struct udphdr)); + ovpn_recv(peer, skb); + return 0; + +drop: + if (peer) + ovpn_peer_put(peer); + dev_core_stats_rx_dropped_inc(ovpn->dev); +drop_noovpn: + kfree_skb(skb); + return 0; +} + /** * ovpn_udp4_output - send IPv4 packet over udp socket * @ovpn: the openvpn instance @@ -259,8 +345,12 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, */ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) { + struct udp_tunnel_sock_cfg cfg = { + .encap_type = UDP_ENCAP_OVPNINUDP, + .encap_rcv = ovpn_udp_encap_recv, + }; struct ovpn_socket *old_data; - int ret = 0; + int ret; /* sanity check */ if (sock->sk->sk_protocol != IPPROTO_UDP) { @@ -274,6 +364,7 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) if (!old_data) { /* socket is currently unused - we can take it */ rcu_read_unlock(); + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); return 0; } @@ -302,3 +393,14 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) return ret; } + +/** + * ovpn_udp_socket_detach - clean udp-tunnel status for this socket + * @sock: the socket to clean + */ +void ovpn_udp_socket_detach(struct socket *sock) +{ + struct udp_tunnel_sock_cfg cfg = { }; + + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); +} diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h index e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4..fecb68464896bc1228315faf268453f9005e693d 100644 --- a/drivers/net/ovpn/udp.h +++ b/drivers/net/ovpn/udp.h @@ -18,8 +18,9 @@ struct sk_buff; struct socket; int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn); - +void ovpn_udp_socket_detach(struct socket *sock); void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, struct sk_buff *skb); +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk); #endif /* _NET_OVPN_UDP_H_ */
Packets received over the socket are forwarded to the user device. Implementation is UDP only. TCP will be added by a later patch. Note: no decryption/decapsulation exists yet, packets are forwarded as they arrive without much processing. Signed-off-by: Antonio Quartulli <antonio@openvpn.net> --- drivers/net/ovpn/io.c | 66 ++++++++++++++++++++++++++- drivers/net/ovpn/io.h | 2 + drivers/net/ovpn/main.c | 13 +++++- drivers/net/ovpn/ovpnstruct.h | 3 ++ drivers/net/ovpn/proto.h | 75 ++++++++++++++++++++++++++++++ drivers/net/ovpn/socket.c | 24 ++++++++++ drivers/net/ovpn/udp.c | 104 +++++++++++++++++++++++++++++++++++++++++- drivers/net/ovpn/udp.h | 3 +- 8 files changed, 286 insertions(+), 4 deletions(-)