From patchwork Tue Oct 8 06:54:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825744
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:21 +0900
Subject: [PATCH RFC v5 01/10] virtio_net: Add functions for hashing
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Message-Id: <20241008-rss-v5-1-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
To: Jonathan Corbet, Willem de Bruijn, Jason Wang, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, "Michael S. Tsirkin",
 Xuan Zhuo, Shuah Khan, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
 kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 linux-kselftest@vger.kernel.org, Yuri Benditovich, Andrew Melnychenko,
 Stephen Hemminger, gur.stavi@huawei.com, Akihiko Odaki
X-Mailer: b4 0.14-dev-fd6e3

They are useful to implement VIRTIO_NET_F_RSS and
VIRTIO_NET_F_HASH_REPORT.
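
For readers less familiar with the Toeplitz hash behind VIRTIO_NET_F_RSS, here is a minimal userspace sketch. It is not part of this patch. It assumes the textbook definition (for every set input bit, XOR in the 32-bit key window that starts at that bit position) and compares it against a word-at-a-time loop shaped like the one in virtio_net_toeplitz_calc(). All names (toeplitz_bitwise, toeplitz_wordwise, load_be32, lowest_set_bit, sample_key, sample_flow) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample data for this sketch only: 16 key bytes, 12 input bytes. */
static const uint8_t sample_key[16] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
};
static const uint8_t sample_flow[12] = {
	66, 9, 149, 187,	/* source address */
	161, 142, 100, 80,	/* destination address */
	0x0a, 0xea, 0x06, 0xe6,	/* source port, destination port */
};

/* Read four bytes as one big-endian 32-bit word. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* 1-based index of the lowest set bit, like ffs(); v must be non-zero. */
static unsigned int lowest_set_bit(uint32_t v)
{
	unsigned int i = 1;

	while (!(v & 1)) {
		v >>= 1;
		i++;
	}
	return i;
}

/* Reference form: walk the input one bit at a time, most significant first. */
static uint32_t toeplitz_bitwise(const uint8_t *key, size_t key_len,
				 const uint8_t *data, size_t data_len)
{
	uint32_t window, hash = 0;
	size_t next_bit = 32;	/* next key bit to shift into the window */

	assert(key_len >= data_len + 4);
	window = load_be32(key);
	for (size_t i = 0; i < data_len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] >> b & 1)
				hash ^= window;
			window = window << 1 |
				 (key[next_bit / 8] >> (7 - next_bit % 8) & 1);
			next_bit++;
		}
	}
	return hash;
}

/* Word form: visit only the set bits of each 32-bit input word. */
static uint32_t toeplitz_wordwise(const uint8_t *key, size_t key_len,
				  const uint8_t *data, size_t data_len)
{
	uint32_t hash = 0;

	assert(data_len % 4 == 0 && key_len >= data_len + 4);
	for (size_t off = 0; off < data_len; off += 4) {
		uint32_t k0 = load_be32(key + off);
		uint32_t k1 = load_be32(key + off + 4);

		for (uint32_t map = load_be32(data + off); map; map &= map - 1) {
			unsigned int i = lowest_set_bit(map);

			/* The 64-bit cast keeps the shift defined when i == 32. */
			hash ^= k0 << (32 - i) | (uint32_t)((uint64_t)k1 >> i);
		}
	}
	return hash;
}
```

Both forms read one key word past the last input word, which matches the "len + 4" returned by virtio_net_hash_key_length() in the diff below.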

Signed-off-by: Akihiko Odaki
---
 include/linux/virtio_net.h | 188 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 276ca543ef44..6f192bb9ba1d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -9,6 +9,194 @@
 #include
 #include
 
+struct virtio_net_hash {
+	u32 value;
+	u16 report;
+};
+
+struct virtio_net_toeplitz_state {
+	u32 hash;
+	const u32 *key;
+};
+
+#define VIRTIO_NET_SUPPORTED_HASH_TYPES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
+					 VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
+					 VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
+					 VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
+					 VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
+					 VIRTIO_NET_RSS_HASH_TYPE_UDPv6)
+
+#define VIRTIO_NET_RSS_MAX_KEY_SIZE 40
+
+static inline void virtio_net_toeplitz_convert_key(u32 *input, size_t len)
+{
+	while (len >= sizeof(*input)) {
+		*input = be32_to_cpu((__force __be32)*input);
+		input++;
+		len -= sizeof(*input);
+	}
+}
+
+static inline void virtio_net_toeplitz_calc(struct virtio_net_toeplitz_state *state,
+					    const __be32 *input, size_t len)
+{
+	while (len >= sizeof(*input)) {
+		for (u32 map = be32_to_cpu(*input); map; map &= (map - 1)) {
+			u32 i = ffs(map);
+
+			state->hash ^= state->key[0] << (32 - i) |
+				       (u32)((u64)state->key[1] >> i);
+		}
+
+		state->key++;
+		input++;
+		len -= sizeof(*input);
+	}
+}
+
+static inline u8 virtio_net_hash_key_length(u32 types)
+{
+	size_t len = 0;
+
+	if (types & VIRTIO_NET_HASH_REPORT_IPv4)
+		len = max(len,
+			  sizeof(struct flow_dissector_key_ipv4_addrs));
+
+	if (types &
+	    (VIRTIO_NET_HASH_REPORT_TCPv4 | VIRTIO_NET_HASH_REPORT_UDPv4))
+		len = max(len,
+			  sizeof(struct flow_dissector_key_ipv4_addrs) +
+			  sizeof(struct flow_dissector_key_ports));
+
+	if (types & VIRTIO_NET_HASH_REPORT_IPv6)
+		len = max(len,
+			  sizeof(struct flow_dissector_key_ipv6_addrs));
+
+	if (types &
+	    (VIRTIO_NET_HASH_REPORT_TCPv6 | VIRTIO_NET_HASH_REPORT_UDPv6))
+		len = max(len,
+			  sizeof(struct flow_dissector_key_ipv6_addrs) +
+			  sizeof(struct flow_dissector_key_ports));
+
+	return len + 4;
+}
+
+static inline u32 virtio_net_hash_report(u32 types,
+					 const struct flow_keys_basic *keys)
+{
+	switch (keys->basic.n_proto) {
+	case cpu_to_be16(ETH_P_IP):
+		if (!(keys->control.flags & FLOW_DIS_IS_FRAGMENT)) {
+			if (keys->basic.ip_proto == IPPROTO_TCP &&
+			    (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4))
+				return VIRTIO_NET_HASH_REPORT_TCPv4;
+
+			if (keys->basic.ip_proto == IPPROTO_UDP &&
+			    (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4))
+				return VIRTIO_NET_HASH_REPORT_UDPv4;
+		}
+
+		if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv4)
+			return VIRTIO_NET_HASH_REPORT_IPv4;
+
+		return VIRTIO_NET_HASH_REPORT_NONE;
+
+	case cpu_to_be16(ETH_P_IPV6):
+		if (!(keys->control.flags & FLOW_DIS_IS_FRAGMENT)) {
+			if (keys->basic.ip_proto == IPPROTO_TCP &&
+			    (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6))
+				return VIRTIO_NET_HASH_REPORT_TCPv6;
+
+			if (keys->basic.ip_proto == IPPROTO_UDP &&
+			    (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6))
+				return VIRTIO_NET_HASH_REPORT_UDPv6;
+		}
+
+		if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv6)
+			return VIRTIO_NET_HASH_REPORT_IPv6;
+
+		return VIRTIO_NET_HASH_REPORT_NONE;
+
+	default:
+		return VIRTIO_NET_HASH_REPORT_NONE;
+	}
+}
+
+static inline void virtio_net_hash_rss(const struct sk_buff *skb,
+				       u32 types, const u32 *key,
+				       struct virtio_net_hash *hash)
+{
+	struct virtio_net_toeplitz_state toeplitz_state = { .key = key };
+	struct flow_keys flow;
+	struct flow_keys_basic flow_basic;
+	u16 report;
+
+	if (!skb_flow_dissect_flow_keys(skb, &flow, 0)) {
+		hash->report = VIRTIO_NET_HASH_REPORT_NONE;
+		return;
+	}
+
+	flow_basic = (struct flow_keys_basic) {
+		.control = flow.control,
+		.basic = flow.basic
+	};
+
+	report = virtio_net_hash_report(types, &flow_basic);
+
+	switch (report) {
+	case VIRTIO_NET_HASH_REPORT_IPv4:
+		virtio_net_toeplitz_calc(&toeplitz_state,
+					 (__be32 *)&flow.addrs.v4addrs,
+					 sizeof(flow.addrs.v4addrs));
+		break;
+
+	case VIRTIO_NET_HASH_REPORT_TCPv4:
+		virtio_net_toeplitz_calc(&toeplitz_state,
+					 (__be32 *)&flow.addrs.v4addrs,
+					 sizeof(flow.addrs.v4addrs));
+		virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
+					 sizeof(flow.ports.ports));
+		break;
+
+	case VIRTIO_NET_HASH_REPORT_UDPv4:
+		virtio_net_toeplitz_calc(&toeplitz_state,
+					 (__be32 *)&flow.addrs.v4addrs,
+					 sizeof(flow.addrs.v4addrs));
+		virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
+					 sizeof(flow.ports.ports));
+		break;
+
+	case VIRTIO_NET_HASH_REPORT_IPv6:
+		virtio_net_toeplitz_calc(&toeplitz_state,
+					 (__be32 *)&flow.addrs.v6addrs,
+					 sizeof(flow.addrs.v6addrs));
+		break;
+
+	case VIRTIO_NET_HASH_REPORT_TCPv6:
+		virtio_net_toeplitz_calc(&toeplitz_state,
+					 (__be32 *)&flow.addrs.v6addrs,
+					 sizeof(flow.addrs.v6addrs));
+		virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
+					 sizeof(flow.ports.ports));
+		break;
+
+	case VIRTIO_NET_HASH_REPORT_UDPv6:
+		virtio_net_toeplitz_calc(&toeplitz_state,
+					 (__be32 *)&flow.addrs.v6addrs,
+					 sizeof(flow.addrs.v6addrs));
+		virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
+					 sizeof(flow.ports.ports));
+		break;
+
+	default:
+		hash->report = VIRTIO_NET_HASH_REPORT_NONE;
+		return;
+	}
+
+	hash->value = toeplitz_state.hash;
+	hash->report = report;
+}
+
 static inline bool virtio_net_hdr_match_proto(__be16 protocol, __u8 gso_type)
 {
 	switch (gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {

From patchwork Tue Oct 8 06:54:22 2024
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825745
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:22 +0900
Subject: [PATCH RFC v5 02/10] skbuff: Introduce SKB_EXT_TUN_VNET_HASH
Message-Id: <20241008-rss-v5-2-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>

This new extension will be used by tun to carry the hash values and
types to report with virtio-net headers.

Signed-off-by: Akihiko Odaki
---
 include/linux/skbuff.h | 3 +++
 net/core/skbuff.c      | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 29c3ea5b6e93..a361c4150144 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4718,6 +4718,9 @@ enum skb_ext_id {
 #endif
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	SKB_EXT_MCTP,
+#endif
+#if IS_ENABLED(CONFIG_TUN)
+	SKB_EXT_TUN_VNET_HASH,
 #endif
 	SKB_EXT_NUM, /* must be last */
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 83f8cd8aa2d1..f0bf94cf458b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -64,6 +64,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -4979,6 +4980,9 @@ static const u8 skb_ext_type_len[] = {
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	[SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow),
 #endif
+#if IS_ENABLED(CONFIG_TUN)
+	[SKB_EXT_TUN_VNET_HASH] = SKB_EXT_CHUNKSIZEOF(struct virtio_net_hash),
+#endif
 };
 
 static __always_inline unsigned int skb_ext_total_length(void)

From patchwork Tue Oct 8 06:54:23 2024
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825746
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:23 +0900
Subject: [PATCH RFC v5 03/10] net: flow_dissector: Export flow_keys_dissector_symmetric
Message-Id: <20241008-rss-v5-3-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>

flow_keys_dissector_symmetric is useful to derive a symmetric hash
and to know its source such as IPv4, IPv6, TCP, and UDP.
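
As a rough illustration of what "symmetric hash" means here, the userspace sketch below puts the two endpoints of an IPv4 flow into a canonical order before hashing, so that both directions of a connection produce the same value. This is not the kernel's implementation and does not use flow_keys_dissector_symmetric; struct flow4, fnv1a_u32, flow4_hash_symmetric and flow4_check_symmetry are hypothetical names, and FNV-1a is only a stand-in mixing function chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow tuple for this sketch; not a kernel structure. */
struct flow4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Fold one 32-bit value into an FNV-1a hash, low byte first. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Hash a flow so that swapping source and destination gives the same result. */
static uint32_t flow4_hash_symmetric(struct flow4 f)
{
	uint32_t h = 2166136261u;

	/* Canonical order: the smaller (address, port) endpoint goes first. */
	if (f.daddr < f.saddr || (f.daddr == f.saddr && f.dport < f.sport)) {
		struct flow4 r = { f.daddr, f.saddr, f.dport, f.sport };

		f = r;
	}

	h = fnv1a_u32(h, f.saddr);
	h = fnv1a_u32(h, f.daddr);
	h = fnv1a_u32(h, (uint32_t)f.sport << 16 | f.dport);
	return h;
}

/* Usage check: both directions of one connection must hash identically. */
static void flow4_check_symmetry(struct flow4 f)
{
	struct flow4 r = { f.daddr, f.saddr, f.dport, f.sport };

	assert(flow4_hash_symmetric(f) == flow4_hash_symmetric(r));
}
```

Ordering the endpoints before hashing trades a compare-and-swap for the guarantee that request and reply traffic land in the same bucket.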

Signed-off-by: Akihiko Odaki
---
 include/net/flow_dissector.h | 1 +
 net/core/flow_dissector.c    | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index ced79dc8e856..d01c1ec77b7d 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -423,6 +423,7 @@ __be32 flow_get_u32_src(const struct flow_keys *flow);
 __be32 flow_get_u32_dst(const struct flow_keys *flow);
 
 extern struct flow_dissector flow_keys_dissector;
+extern struct flow_dissector flow_keys_dissector_symmetric;
 extern struct flow_dissector flow_keys_basic_dissector;
 
 /* struct flow_keys_digest:
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 0e638a37aa09..9822988f2d49 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1852,7 +1852,8 @@ void make_flow_keys_digest(struct flow_keys_digest *digest,
 }
 EXPORT_SYMBOL(make_flow_keys_digest);
 
-static struct flow_dissector flow_keys_dissector_symmetric __read_mostly;
+struct flow_dissector flow_keys_dissector_symmetric __read_mostly;
+EXPORT_SYMBOL(flow_keys_dissector_symmetric);
 
 u32 __skb_get_hash_symmetric_net(const struct net *net, const struct sk_buff *skb)
 {

From patchwork Tue Oct 8 06:54:24 2024
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825747
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:24 +0900
Subject: [PATCH RFC v5 04/10] tun: Unify vnet implementation
Message-Id: <20241008-rss-v5-4-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>

Both tun and tap expose the same set of virtio-net-related features.
Unify their implementations to ease future changes.

Signed-off-by: Akihiko Odaki
---
 MAINTAINERS            |   1 +
 drivers/net/tap.c      | 172 ++++++----------------------------------
 drivers/net/tun.c      | 208 ++++++++-----------------------------------------
 drivers/net/tun_vnet.h | 181 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 238 insertions(+), 324 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc40a9d9b8cd..209b4e1cccb1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23338,6 +23338,7 @@ F:	Documentation/networking/tuntap.rst
 F:	arch/um/os-Linux/drivers/
 F:	drivers/net/tap.c
 F:	drivers/net/tun.c
+F:	drivers/net/tun_vnet.h
 
 TURBOCHANNEL SUBSYSTEM
 M:	"Maciej W. Rozycki"
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 77574f7a3bd4..9a34ceed0c2c 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -26,74 +26,9 @@
 #include
 #include
 
-#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
-
-#define TAP_VNET_LE 0x80000000
-#define TAP_VNET_BE 0x40000000
-
-#ifdef CONFIG_TUN_VNET_CROSS_LE
-static inline bool tap_legacy_is_little_endian(struct tap_queue *q)
-{
-	return q->flags & TAP_VNET_BE ? false :
-		virtio_legacy_is_little_endian();
-}
-
-static long tap_get_vnet_be(struct tap_queue *q, int __user *sp)
-{
-	int s = !!(q->flags & TAP_VNET_BE);
-
-	if (put_user(s, sp))
-		return -EFAULT;
-
-	return 0;
-}
-
-static long tap_set_vnet_be(struct tap_queue *q, int __user *sp)
-{
-	int s;
-
-	if (get_user(s, sp))
-		return -EFAULT;
-
-	if (s)
-		q->flags |= TAP_VNET_BE;
-	else
-		q->flags &= ~TAP_VNET_BE;
-
-	return 0;
-}
-#else
-static inline bool tap_legacy_is_little_endian(struct tap_queue *q)
-{
-	return virtio_legacy_is_little_endian();
-}
-
-static long tap_get_vnet_be(struct tap_queue *q, int __user *argp)
-{
-	return -EINVAL;
-}
+#include "tun_vnet.h"
 
-static long tap_set_vnet_be(struct tap_queue *q, int __user *argp)
-{
-	return -EINVAL;
-}
-#endif /* CONFIG_TUN_VNET_CROSS_LE */
-
-static inline bool tap_is_little_endian(struct tap_queue *q)
-{
-	return q->flags & TAP_VNET_LE ||
-		tap_legacy_is_little_endian(q);
-}
-
-static inline u16 tap16_to_cpu(struct tap_queue *q, __virtio16 val)
-{
-	return __virtio16_to_cpu(tap_is_little_endian(q), val);
-}
-
-static inline __virtio16 cpu_to_tap16(struct tap_queue *q, u16 val)
-{
-	return __cpu_to_virtio16(tap_is_little_endian(q), val);
-}
+#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
 
 static struct proto tap_proto = {
 	.name = "tap",
@@ -641,10 +576,10 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	struct sk_buff *skb;
 	struct tap_dev *tap;
 	unsigned long total_len = iov_iter_count(from);
-	unsigned long len = total_len;
+	unsigned long len;
 	int err;
 	struct virtio_net_hdr vnet_hdr = { 0 };
-	int vnet_hdr_len = 0;
+	int hdr_len;
 	int copylen = 0;
 	int depth;
 	bool zerocopy = false;
@@ -652,38 +587,20 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	enum skb_drop_reason drop_reason;
 
 	if (q->flags & IFF_VNET_HDR) {
-		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
-
-		err = -EINVAL;
-		if (len < vnet_hdr_len)
-			goto err;
-		len -= vnet_hdr_len;
-
-		err = -EFAULT;
-		if (!copy_from_iter_full(&vnet_hdr, sizeof(vnet_hdr), from))
-			goto err;
-		iov_iter_advance(from, vnet_hdr_len - sizeof(vnet_hdr));
-		if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
-		     tap16_to_cpu(q, vnet_hdr.csum_start) +
-		     tap16_to_cpu(q, vnet_hdr.csum_offset) + 2 >
-			     tap16_to_cpu(q, vnet_hdr.hdr_len))
-			vnet_hdr.hdr_len = cpu_to_tap16(q,
-				 tap16_to_cpu(q, vnet_hdr.csum_start) +
-				 tap16_to_cpu(q, vnet_hdr.csum_offset) + 2);
-		err = -EINVAL;
-		if (tap16_to_cpu(q, vnet_hdr.hdr_len) > len)
+		hdr_len = tun_vnet_hdr_get(READ_ONCE(q->vnet_hdr_sz), q->flags, from, &vnet_hdr);
+		if (hdr_len < 0) {
+			err = hdr_len;
 			goto err;
+		}
+	} else {
+		hdr_len = 0;
 	}
 
-	err = -EINVAL;
-	if (unlikely(len < ETH_HLEN))
-		goto err;
-
+	len = iov_iter_count(from);
 	if (msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
 		struct iov_iter i;
 
-		copylen = vnet_hdr.hdr_len ?
-			tap16_to_cpu(q, vnet_hdr.hdr_len) : GOODCOPY_LEN;
+		copylen = hdr_len ? hdr_len : GOODCOPY_LEN;
 		if (copylen > good_linear)
 			copylen = good_linear;
 		else if (copylen < ETH_HLEN)
@@ -697,7 +614,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 
 	if (!zerocopy) {
 		copylen = len;
-		linear = tap16_to_cpu(q, vnet_hdr.hdr_len);
+		linear = hdr_len;
 		if (linear > good_linear)
 			linear = good_linear;
 		else if (linear < ETH_HLEN)
@@ -732,9 +649,8 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	}
 	skb->dev = tap->dev;
 
-	if (vnet_hdr_len) {
-		err = virtio_net_hdr_to_skb(skb, &vnet_hdr,
-					    tap_is_little_endian(q));
+	if (q->flags & IFF_VNET_HDR) {
+		err = tun_vnet_hdr_to_skb(q->flags, skb, &vnet_hdr);
 		if (err) {
 			rcu_read_unlock();
 			drop_reason = SKB_DROP_REASON_DEV_HDR;
@@ -797,23 +713,17 @@ static ssize_t tap_put_user(struct tap_queue *q,
 	int total;
 
 	if (q->flags & IFF_VNET_HDR) {
-		int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
 		struct virtio_net_hdr vnet_hdr;
 
 		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
-		if (iov_iter_count(iter) < vnet_hdr_len)
-			return -EINVAL;
-
-		if (virtio_net_hdr_from_skb(skb, &vnet_hdr,
-					    tap_is_little_endian(q), true,
-					    vlan_hlen))
-			BUG();
 
-		if (copy_to_iter(&vnet_hdr, sizeof(vnet_hdr), iter) !=
-		    sizeof(vnet_hdr))
-			return -EFAULT;
+		ret = tun_vnet_hdr_from_skb(q->flags, NULL, skb, &vnet_hdr);
+		if (ret < 0)
+			goto done;
 
-		iov_iter_advance(iter, vnet_hdr_len - sizeof(vnet_hdr));
+		ret = tun_vnet_hdr_put(vnet_hdr_len, iter, &vnet_hdr);
+		if (ret < 0)
+			goto done;
 	}
 	total = vnet_hdr_len;
 	total += skb->len;
@@ -1072,42 +982,6 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
 		q->sk.sk_sndbuf = s;
 		return 0;
 
-	case TUNGETVNETHDRSZ:
-		s = q->vnet_hdr_sz;
-		if (put_user(s, sp))
-			return -EFAULT;
-		return 0;
-
-	case TUNSETVNETHDRSZ:
-		if (get_user(s, sp))
-			return -EFAULT;
-		if (s < (int)sizeof(struct virtio_net_hdr))
-			return -EINVAL;
-
-		q->vnet_hdr_sz = s;
-		return 0;
-
-	case TUNGETVNETLE:
-		s = !!(q->flags & TAP_VNET_LE);
-		if (put_user(s, sp))
-			return -EFAULT;
-		return 0;
-
-	case TUNSETVNETLE:
-		if (get_user(s, sp))
-			return -EFAULT;
-		if (s)
-			q->flags |= TAP_VNET_LE;
-		else
-			q->flags &= ~TAP_VNET_LE;
-		return 0;
-
-	case TUNGETVNETBE:
-		return tap_get_vnet_be(q, sp);
-
-	case TUNSETVNETBE:
-		return tap_set_vnet_be(q, sp);
-
 	case TUNSETOFFLOAD:
 		/* let the user check for future flags */
 		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
@@ -1151,7 +1025,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	default:
-		return -EINVAL;
+		return tun_vnet_ioctl(&q->vnet_hdr_sz, &q->flags, cmd, sp);
 	}
 }
 
@@ -1199,7 +1073,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
 	skb->protocol = eth_hdr(skb)->h_proto;
 
 	if (vnet_hdr_len) {
-		err = virtio_net_hdr_to_skb(skb, gso, tap_is_little_endian(q));
+		err = tun_vnet_hdr_to_skb(q->flags, skb, gso);
 		if (err)
 			goto err_kfree;
 	}
diff --git a/drivers/net/tun.c
b/drivers/net/tun.c index 1d06c560c5e6..dd8799d19518 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -83,6 +83,8 @@ #include #include +#include "tun_vnet.h" + static void tun_default_link_ksettings(struct net_device *dev, struct ethtool_link_ksettings *cmd); @@ -94,9 +96,6 @@ static void tun_default_link_ksettings(struct net_device *dev, * overload it to mean fasync when stored there. */ #define TUN_FASYNC IFF_ATTACH_QUEUE -/* High bits in flags field are unused. */ -#define TUN_VNET_LE 0x80000000 -#define TUN_VNET_BE 0x40000000 #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \ IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS) @@ -298,70 +297,6 @@ static bool tun_napi_frags_enabled(const struct tun_file *tfile) return tfile->napi_frags_enabled; } -#ifdef CONFIG_TUN_VNET_CROSS_LE -static inline bool tun_legacy_is_little_endian(struct tun_struct *tun) -{ - return tun->flags & TUN_VNET_BE ? false : - virtio_legacy_is_little_endian(); -} - -static long tun_get_vnet_be(struct tun_struct *tun, int __user *argp) -{ - int be = !!(tun->flags & TUN_VNET_BE); - - if (put_user(be, argp)) - return -EFAULT; - - return 0; -} - -static long tun_set_vnet_be(struct tun_struct *tun, int __user *argp) -{ - int be; - - if (get_user(be, argp)) - return -EFAULT; - - if (be) - tun->flags |= TUN_VNET_BE; - else - tun->flags &= ~TUN_VNET_BE; - - return 0; -} -#else -static inline bool tun_legacy_is_little_endian(struct tun_struct *tun) -{ - return virtio_legacy_is_little_endian(); -} - -static long tun_get_vnet_be(struct tun_struct *tun, int __user *argp) -{ - return -EINVAL; -} - -static long tun_set_vnet_be(struct tun_struct *tun, int __user *argp) -{ - return -EINVAL; -} -#endif /* CONFIG_TUN_VNET_CROSS_LE */ - -static inline bool tun_is_little_endian(struct tun_struct *tun) -{ - return tun->flags & TUN_VNET_LE || - tun_legacy_is_little_endian(tun); -} - -static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val) -{ - return 
__virtio16_to_cpu(tun_is_little_endian(tun), val); -} - -static inline __virtio16 cpu_to_tun16(struct tun_struct *tun, u16 val) -{ - return __cpu_to_virtio16(tun_is_little_endian(tun), val); -} - static inline u32 tun_hashfn(u32 rxhash) { return rxhash & TUN_MASK_FLOW_ENTRIES; @@ -1751,8 +1686,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) }; struct sk_buff *skb; size_t total_len = iov_iter_count(from); - size_t len = total_len, align = tun->align, linear; + size_t len, align = tun->align, linear; struct virtio_net_hdr gso = { 0 }; + int hdr_len; int good_linear; int copylen; bool zerocopy = false; @@ -1763,37 +1699,25 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, enum skb_drop_reason drop_reason = SKB_DROP_REASON_NOT_SPECIFIED; if (!(tun->flags & IFF_NO_PI)) { - if (len < sizeof(pi)) + if (iov_iter_count(from) < sizeof(pi)) return -EINVAL; - len -= sizeof(pi); if (!copy_from_iter_full(&pi, sizeof(pi), from)) return -EFAULT; } if (tun->flags & IFF_VNET_HDR) { - int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz); - - if (len < vnet_hdr_sz) - return -EINVAL; - len -= vnet_hdr_sz; - - if (!copy_from_iter_full(&gso, sizeof(gso), from)) - return -EFAULT; - - if ((gso.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && - tun16_to_cpu(tun, gso.csum_start) + tun16_to_cpu(tun, gso.csum_offset) + 2 > tun16_to_cpu(tun, gso.hdr_len)) - gso.hdr_len = cpu_to_tun16(tun, tun16_to_cpu(tun, gso.csum_start) + tun16_to_cpu(tun, gso.csum_offset) + 2); - - if (tun16_to_cpu(tun, gso.hdr_len) > len) - return -EINVAL; - iov_iter_advance(from, vnet_hdr_sz - sizeof(gso)); + hdr_len = tun_vnet_hdr_get(READ_ONCE(tun->vnet_hdr_sz), tun->flags, from, &gso); + if (hdr_len < 0) + return hdr_len; + } else { + hdr_len = 0; } + len = iov_iter_count(from); if ((tun->flags & TUN_TYPE_MASK) == IFF_TAP) { align += NET_IP_ALIGN; - if (unlikely(len < ETH_HLEN || - (gso.hdr_len && tun16_to_cpu(tun, gso.hdr_len) < 
ETH_HLEN))) + if (unlikely(len < ETH_HLEN || (hdr_len && hdr_len < ETH_HLEN))) return -EINVAL; } @@ -1806,7 +1730,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, * enough room for skb expand head in case it is used. * The rest of the buffer is mapped from userspace. */ - copylen = gso.hdr_len ? tun16_to_cpu(tun, gso.hdr_len) : GOODCOPY_LEN; + copylen = hdr_len ? hdr_len : GOODCOPY_LEN; if (copylen > good_linear) copylen = good_linear; linear = copylen; @@ -1829,10 +1753,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, } else { if (!zerocopy) { copylen = len; - if (tun16_to_cpu(tun, gso.hdr_len) > good_linear) + if (hdr_len > good_linear) linear = good_linear; else - linear = tun16_to_cpu(tun, gso.hdr_len); + linear = hdr_len; } if (frags) { @@ -1867,7 +1791,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, } } - if (virtio_net_hdr_to_skb(skb, &gso, tun_is_little_endian(tun))) { + if (tun_vnet_hdr_to_skb(tun->flags, skb, &gso)) { atomic_long_inc(&tun->rx_frame_errors); err = -EINVAL; goto free_skb; @@ -2060,29 +1984,27 @@ static ssize_t tun_put_user_xdp(struct tun_struct *tun, struct xdp_frame *xdp_frame, struct iov_iter *iter) { + int ret; int vnet_hdr_sz = 0; size_t size = xdp_frame->len; - size_t ret; + size_t total; if (tun->flags & IFF_VNET_HDR) { struct virtio_net_hdr gso = { 0 }; vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz); - if (unlikely(iov_iter_count(iter) < vnet_hdr_sz)) - return -EINVAL; - if (unlikely(copy_to_iter(&gso, sizeof(gso), iter) != - sizeof(gso))) - return -EFAULT; - iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso)); + ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso); + if (ret < 0) + return ret; } - ret = copy_to_iter(xdp_frame->data, size, iter) + vnet_hdr_sz; + total = copy_to_iter(xdp_frame->data, size, iter) + vnet_hdr_sz; preempt_disable(); - dev_sw_netstats_tx_add(tun->dev, 1, ret); + dev_sw_netstats_tx_add(tun->dev, 1, total); 
preempt_enable(); - return ret; + return total; } /* Put packet to the user space buffer */ @@ -2096,6 +2018,7 @@ static ssize_t tun_put_user(struct tun_struct *tun, int vlan_offset = 0; int vlan_hlen = 0; int vnet_hdr_sz = 0; + int ret; if (skb_vlan_tag_present(skb)) vlan_hlen = VLAN_HLEN; @@ -2122,31 +2045,13 @@ static ssize_t tun_put_user(struct tun_struct *tun, if (vnet_hdr_sz) { struct virtio_net_hdr gso; - if (iov_iter_count(iter) < vnet_hdr_sz) - return -EINVAL; - - if (virtio_net_hdr_from_skb(skb, &gso, - tun_is_little_endian(tun), true, - vlan_hlen)) { - struct skb_shared_info *sinfo = skb_shinfo(skb); - - if (net_ratelimit()) { - netdev_err(tun->dev, "unexpected GSO type: 0x%x, gso_size %d, hdr_len %d\n", - sinfo->gso_type, tun16_to_cpu(tun, gso.gso_size), - tun16_to_cpu(tun, gso.hdr_len)); - print_hex_dump(KERN_ERR, "tun: ", - DUMP_PREFIX_NONE, - 16, 1, skb->head, - min((int)tun16_to_cpu(tun, gso.hdr_len), 64), true); - } - WARN_ON_ONCE(1); - return -EINVAL; - } - - if (copy_to_iter(&gso, sizeof(gso), iter) != sizeof(gso)) - return -EFAULT; + ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso); + if (ret < 0) + goto done; - iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso)); + ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso); + if (ret < 0) + goto done; } if (vlan_hlen) { @@ -2506,7 +2411,7 @@ static int tun_xdp_one(struct tun_struct *tun, skb_reserve(skb, xdp->data - xdp->data_hard_start); skb_put(skb, xdp->data_end - xdp->data); - if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) { + if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) { atomic_long_inc(&tun->rx_frame_errors); kfree_skb(skb); ret = -EINVAL; @@ -3090,8 +2995,6 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, kgid_t group; int ifindex; int sndbuf; - int vnet_hdr_sz; - int le; int ret; bool do_notify = false; @@ -3298,50 +3201,6 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, tun_set_sndbuf(tun); break; - case TUNGETVNETHDRSZ: - 
vnet_hdr_sz = tun->vnet_hdr_sz; - if (copy_to_user(argp, &vnet_hdr_sz, sizeof(vnet_hdr_sz))) - ret = -EFAULT; - break; - - case TUNSETVNETHDRSZ: - if (copy_from_user(&vnet_hdr_sz, argp, sizeof(vnet_hdr_sz))) { - ret = -EFAULT; - break; - } - if (vnet_hdr_sz < (int)sizeof(struct virtio_net_hdr)) { - ret = -EINVAL; - break; - } - - tun->vnet_hdr_sz = vnet_hdr_sz; - break; - - case TUNGETVNETLE: - le = !!(tun->flags & TUN_VNET_LE); - if (put_user(le, (int __user *)argp)) - ret = -EFAULT; - break; - - case TUNSETVNETLE: - if (get_user(le, (int __user *)argp)) { - ret = -EFAULT; - break; - } - if (le) - tun->flags |= TUN_VNET_LE; - else - tun->flags &= ~TUN_VNET_LE; - break; - - case TUNGETVNETBE: - ret = tun_get_vnet_be(tun, argp); - break; - - case TUNSETVNETBE: - ret = tun_set_vnet_be(tun, argp); - break; - case TUNATTACHFILTER: /* Can be set only for TAPs */ ret = -EINVAL; @@ -3397,8 +3256,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, break; default: - ret = -EINVAL; - break; + ret = tun_vnet_ioctl(&tun->vnet_hdr_sz, &tun->flags, cmd, argp); } if (do_notify) diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h new file mode 100644 index 000000000000..7c7f3f6d85e9 --- /dev/null +++ b/drivers/net/tun_vnet.h @@ -0,0 +1,181 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef TUN_VNET_H +#define TUN_VNET_H + +/* High bits in flags field are unused. 
*/ +#define TUN_VNET_LE 0x80000000 +#define TUN_VNET_BE 0x40000000 + +static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags) +{ + return !(IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) && (flags & TUN_VNET_BE)) && + virtio_legacy_is_little_endian(); +} + +static inline long tun_vnet_get_be(int flags, int __user *sp) +{ + int s = !!(flags & TUN_VNET_BE); + + if (!IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE)) + return -EINVAL; + + if (put_user(s, sp)) + return -EFAULT; + + return 0; +} + +static inline long tun_vnet_set_be(int *flags, int __user *sp) +{ + int s; + + if (!IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE)) + return -EINVAL; + + if (get_user(s, sp)) + return -EFAULT; + + if (s) + *flags |= TUN_VNET_BE; + else + *flags &= ~TUN_VNET_BE; + + return 0; +} + +static inline bool tun_vnet_is_little_endian(unsigned int flags) +{ + return flags & TUN_VNET_LE || tun_vnet_legacy_is_little_endian(flags); +} + +static inline u16 tun_vnet16_to_cpu(unsigned int flags, __virtio16 val) +{ + return __virtio16_to_cpu(tun_vnet_is_little_endian(flags), val); +} + +static inline __virtio16 cpu_to_tun_vnet16(unsigned int flags, u16 val) +{ + return __cpu_to_virtio16(tun_vnet_is_little_endian(flags), val); +} + +static inline long tun_vnet_ioctl(int *sz, unsigned int *flags, + unsigned int cmd, int __user *sp) +{ + int s; + + switch (cmd) { + case TUNGETVNETHDRSZ: + s = *sz; + if (put_user(s, sp)) + return -EFAULT; + return 0; + + case TUNSETVNETHDRSZ: + if (get_user(s, sp)) + return -EFAULT; + if (s < (int)sizeof(struct virtio_net_hdr)) + return -EINVAL; + + *sz = s; + return 0; + + case TUNGETVNETLE: + s = !!(*flags & TUN_VNET_LE); + if (put_user(s, sp)) + return -EFAULT; + return 0; + + case TUNSETVNETLE: + if (get_user(s, sp)) + return -EFAULT; + if (s) + *flags |= TUN_VNET_LE; + else + *flags &= ~TUN_VNET_LE; + return 0; + + case TUNGETVNETBE: + return tun_vnet_get_be(*flags, sp); + + case TUNSETVNETBE: + return tun_vnet_set_be(flags, sp); + + default: + return -EINVAL; + } +} + 
+static inline int tun_vnet_hdr_get(int sz, unsigned int flags, + struct iov_iter *from, + struct virtio_net_hdr *hdr) +{ + if (iov_iter_count(from) < sz) + return -EINVAL; + + if (!copy_from_iter_full(hdr, sizeof(*hdr), from)) + return -EFAULT; + + iov_iter_advance(from, sz - sizeof(*hdr)); + if ((hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && + tun_vnet16_to_cpu(flags, hdr->csum_start) + + tun_vnet16_to_cpu(flags, hdr->csum_offset) + 2 > + tun_vnet16_to_cpu(flags, hdr->hdr_len)) + hdr->hdr_len = cpu_to_tun_vnet16(flags, + tun_vnet16_to_cpu(flags, hdr->csum_start) + + tun_vnet16_to_cpu(flags, hdr->csum_offset) + 2); + if (tun_vnet16_to_cpu(flags, hdr->hdr_len) > iov_iter_count(from)) + return -EINVAL; + + return tun_vnet16_to_cpu(flags, hdr->hdr_len); +} + +static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter, + const struct virtio_net_hdr *hdr) +{ + if (iov_iter_count(iter) < sz) + return -EINVAL; + + if (copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)) + return -EFAULT; + + iov_iter_advance(iter, sz - sizeof(*hdr)); + + return 0; +} + +static inline int tun_vnet_hdr_to_skb(unsigned int flags, + struct sk_buff *skb, + const struct virtio_net_hdr *hdr) +{ + return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags)); +} + +static inline int tun_vnet_hdr_from_skb(unsigned int flags, + const struct net_device *dev, + const struct sk_buff *skb, + struct virtio_net_hdr *hdr) +{ + int vlan_hlen = skb_vlan_tag_present(skb) ? 
VLAN_HLEN : 0;
+
+	if (virtio_net_hdr_from_skb(skb, hdr,
+				    tun_vnet_is_little_endian(flags), true,
+				    vlan_hlen)) {
+		struct skb_shared_info *sinfo = skb_shinfo(skb);
+
+		if (net_ratelimit()) {
+			netdev_err(dev, "unexpected GSO type: 0x%x, gso_size %d, hdr_len %d\n",
+				   sinfo->gso_type, tun_vnet16_to_cpu(flags, hdr->gso_size),
+				   tun_vnet16_to_cpu(flags, hdr->hdr_len));
+			print_hex_dump(KERN_ERR, "tun: ",
+				       DUMP_PREFIX_NONE,
+				       16, 1, skb->head,
+				       min(tun_vnet16_to_cpu(flags, hdr->hdr_len), 64), true);
+		}
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+#endif /* TUN_VNET_H */

From patchwork Tue Oct 8 06:54:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825748
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:25 +0900
Subject: [PATCH RFC v5 05/10] tun: Pad virtio header with zero
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Message-Id: <20241008-rss-v5-5-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
To: Jonathan Corbet, Willem de Bruijn, Jason Wang, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, "Michael S. Tsirkin",
 Xuan Zhuo, Shuah Khan, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-kselftest@vger.kernel.org,
 Yuri Benditovich, Andrew Melnychenko, Stephen Hemminger,
 gur.stavi@huawei.com, Akihiko Odaki
X-Mailer: b4 0.14-dev-fd6e3

tun used to simply advance iov_iter when it needed to pad the virtio
header, which left whatever garbage was already in the buffer untouched.
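The difference between advancing past the padding and zeroing it can be
sketched in plain userspace C. This is only an illustration, not the
kernel code: the demo_* helpers and DEMO_HDR_LEN are hypothetical names,
and a flat byte buffer stands in for the iov_iter. The 10-byte length is
the size of the base struct virtio_net_hdr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(struct virtio_net_hdr), which is 10. */
#define DEMO_HDR_LEN 10

/* Old behaviour: copy the header, then skip the padding without writing it. */
static void demo_put_hdr_advance(unsigned char *buf, size_t sz,
				 const unsigned char *hdr)
{
	assert(sz >= DEMO_HDR_LEN);
	memcpy(buf, hdr, DEMO_HDR_LEN);
	/* buf[DEMO_HDR_LEN..sz) keeps whatever the caller left there. */
}

/* New behaviour: copy the header, then fill the padding with zeros. */
static void demo_put_hdr_zero(unsigned char *buf, size_t sz,
			      const unsigned char *hdr)
{
	assert(sz >= DEMO_HDR_LEN);
	memcpy(buf, hdr, DEMO_HDR_LEN);
	memset(buf + DEMO_HDR_LEN, 0, sz - DEMO_HDR_LEN);
}

/* Return 1 if every padding byte after the base header is zero. */
static int demo_padding_is_zero(const unsigned char *buf, size_t sz)
{
	for (size_t i = DEMO_HDR_LEN; i < sz; i++)
		if (buf[i])
			return 0;
	return 1;
}
```

With a 20-byte header size and a buffer pre-filled with a non-zero
pattern, only the zeroing variant lets a reader trust the bytes that
follow the base header.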
Leaving garbage in place is especially problematic once tun allows
enabling the hash reporting feature: even with the feature enabled, a
packet may lack a hash value and leave a hole in the virtio header,
either because the packet arrived before the feature was enabled or
because it does not contain the header fields to be hashed. If the hole
is not filled with zeros, it is impossible to tell whether the packet
lacks a hash value.

In theory, a user of tun can zero the buffer before calling read() to
avoid this problem, but leaving garbage in the buffer is awkward anyway,
so zero the padding in tun.

Signed-off-by: Akihiko Odaki
---
 drivers/net/tun_vnet.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index 7c7f3f6d85e9..c40bde0fdf8c 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -138,7 +138,8 @@ static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
 	if (copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr))
 		return -EFAULT;
 
-	iov_iter_advance(iter, sz - sizeof(*hdr));
+	if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
+		return -EFAULT;
 
 	return 0;
 }

From patchwork Tue Oct 8 06:54:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825749
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:26 +0900
Subject: [PATCH RFC v5 06/10] tun: Introduce virtio-net hash reporting feature
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Message-Id: <20241008-rss-v5-6-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
To: Jonathan Corbet, Willem de Bruijn, Jason Wang, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, "Michael S. Tsirkin",
 Xuan Zhuo, Shuah Khan, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-kselftest@vger.kernel.org,
 Yuri Benditovich, Andrew Melnychenko, Stephen Hemminger,
 gur.stavi@huawei.com, Akihiko Odaki
X-Mailer: b4 0.14-dev-fd6e3

Allow the guest to reuse the hash value to make receive steering
consistent between the host and guest, and to save hash computation.

Signed-off-by: Akihiko Odaki
---
 Documentation/networking/tuntap.rst |   7 +++
 drivers/net/Kconfig                 |   1 +
 drivers/net/tap.c                   |  45 ++++++++++++++--
 drivers/net/tun.c                   |  46 ++++++++++++----
 drivers/net/tun_vnet.h              | 102 +++++++++++++++++++++++++++++++-----
 include/linux/if_tap.h              |   2 +
 include/uapi/linux/if_tun.h         |  48 +++++++++++++++++
 7 files changed, 223 insertions(+), 28 deletions(-)

diff --git a/Documentation/networking/tuntap.rst b/Documentation/networking/tuntap.rst
index 4d7087f727be..86b4ae8caa8a 100644
--- a/Documentation/networking/tuntap.rst
+++ b/Documentation/networking/tuntap.rst
@@ -206,6 +206,13 @@ enable is true we enable it, otherwise we disable it::
       return ioctl(fd, TUNSETQUEUE, (void *)&ifr);
   }
 
+3.4 Reference
+-------------
+
+``linux/if_tun.h`` defines the interface described below:
+
+.. kernel-doc:: include/uapi/linux/if_tun.h
+
 Universal TUN/TAP device driver Frequently Asked Question
 =========================================================
 
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9920b3a68ed1..e2a7bd703550 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -395,6 +395,7 @@ config TUN
 	tristate "Universal TUN/TAP device driver support"
 	depends on INET
 	select CRC32
+	select SKB_EXTENSIONS
 	help
 	  TUN/TAP provides packet reception and transmission for user space programs.
 It can be viewed as a simple Point-to-Point or Ethernet
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 9a34ceed0c2c..5e2fbe63ca47 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -179,6 +179,16 @@ static void tap_put_queue(struct tap_queue *q)
 	sock_put(&q->sk);
 }
 
+static struct virtio_net_hash *tap_add_hash(struct sk_buff *skb)
+{
+	return (struct virtio_net_hash *)skb->cb;
+}
+
+static const struct virtio_net_hash *tap_find_hash(const struct sk_buff *skb)
+{
+	return (const struct virtio_net_hash *)skb->cb;
+}
+
 /*
  * Select a queue based on the rxq of the device on which this packet
  * arrived. If the incoming device is not mq, calculate a flow hash
@@ -189,6 +199,7 @@ static void tap_put_queue(struct tap_queue *q)
 static struct tap_queue *tap_get_queue(struct tap_dev *tap,
 				       struct sk_buff *skb)
 {
+	struct flow_keys_basic keys_basic;
 	struct tap_queue *queue = NULL;
 	/* Access to taps array is protected by rcu, but access to numvtaps
 	 * isn't. Below we use it to lookup a queue, but treat it as a hint
@@ -198,15 +209,32 @@ static struct tap_queue *tap_get_queue(struct tap_dev *tap,
 	int numvtaps = READ_ONCE(tap->numvtaps);
 	__u32 rxq;
 
+	*tap_add_hash(skb) = (struct virtio_net_hash) { .report = VIRTIO_NET_HASH_REPORT_NONE };
+
 	if (!numvtaps)
 		goto out;
 
 	if (numvtaps == 1)
 		goto single;
 
+	if (!skb->l4_hash && !skb->sw_hash) {
+		struct flow_keys keys;
+
+		skb_flow_dissect_flow_keys(skb, &keys, FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+		rxq = flow_hash_from_keys(&keys);
+		keys_basic = (struct flow_keys_basic) {
+			.control = keys.control,
+			.basic = keys.basic
+		};
+	} else {
+		skb_flow_dissect_flow_keys_basic(NULL, skb, &keys_basic, NULL, 0, 0, 0,
+						 FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+		rxq = skb->hash;
+	}
+
 	/* Check if we can use flow to select a queue */
-	rxq = skb_get_hash(skb);
 	if (rxq) {
+		tun_vnet_hash_report(&tap->vnet_hash, skb, &keys_basic, rxq, tap_add_hash);
 		queue = rcu_dereference(tap->taps[rxq % numvtaps]);
 		goto out;
 	}
@@ -713,15 +741,16 @@ static ssize_t tap_put_user(struct tap_queue *q,
 	int total;
 
 	if (q->flags & IFF_VNET_HDR) {
-		struct virtio_net_hdr vnet_hdr;
+		struct virtio_net_hdr_v1_hash vnet_hdr;
 
 		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 
-		ret = tun_vnet_hdr_from_skb(q->flags, NULL, skb, &vnet_hdr);
+		ret = tun_vnet_hdr_from_skb(vnet_hdr_len, q->flags, NULL, skb,
+					    tap_find_hash, &vnet_hdr);
 		if (ret < 0)
 			goto done;
 
-		ret = tun_vnet_hdr_put(vnet_hdr_len, iter, &vnet_hdr);
+		ret = tun_vnet_hdr_put(vnet_hdr_len, iter, &vnet_hdr, ret);
 		if (ret < 0)
 			goto done;
 	}
@@ -1025,7 +1054,13 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	default:
-		return tun_vnet_ioctl(&q->vnet_hdr_sz, &q->flags, cmd, sp);
+		rtnl_lock();
+		tap = rtnl_dereference(q->tap);
+		ret = tun_vnet_ioctl(&q->vnet_hdr_sz, &q->flags,
+				     tap ? &tap->vnet_hash : NULL, -EINVAL,
+				     cmd, sp);
+		rtnl_unlock();
+		return ret;
 	}
 }
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index dd8799d19518..27308417b834 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -209,6 +209,7 @@ struct tun_struct {
 	struct bpf_prog __rcu *xdp_prog;
 	struct tun_prog __rcu *steering_prog;
 	struct tun_prog __rcu *filter_prog;
+	struct tun_vnet_hash vnet_hash;
 	struct ethtool_link_ksettings link_ksettings;
 	/* init args */
 	struct file *file;
@@ -451,6 +452,16 @@ static inline void tun_flow_save_rps_rxhash(struct tun_flow_entry *e, u32 hash)
 		e->rps_rxhash = hash;
 }
 
+static struct virtio_net_hash *tun_add_hash(struct sk_buff *skb)
+{
+	return skb_ext_add(skb, SKB_EXT_TUN_VNET_HASH);
+}
+
+static const struct virtio_net_hash *tun_find_hash(const struct sk_buff *skb)
+{
+	return skb_ext_find(skb, SKB_EXT_TUN_VNET_HASH);
+}
+
 /* We try to identify a flow through its rxhash. The reason that
  * we do not check rxq no. is because some cards(e.g 82599), chooses
  * the rxq based on the txq where the last packet of the flow comes. As
@@ -459,12 +470,17 @@ static inline void tun_flow_save_rps_rxhash(struct tun_flow_entry *e, u32 hash)
  */
 static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 {
+	struct flow_keys keys;
+	struct flow_keys_basic keys_basic;
 	struct tun_flow_entry *e;
 	u32 txq, numqueues;
 
 	numqueues = READ_ONCE(tun->numqueues);
 
-	txq = __skb_get_hash_symmetric(skb);
+	memset(&keys, 0, sizeof(keys));
+	skb_flow_dissect(skb, &flow_keys_dissector_symmetric, &keys, 0);
+
+	txq = flow_hash_from_keys(&keys);
 	e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
 	if (e) {
 		tun_flow_save_rps_rxhash(e, txq);
@@ -473,6 +489,13 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 		txq = reciprocal_scale(txq, numqueues);
 	}
 
+	keys_basic = (struct flow_keys_basic) {
+		.control = keys.control,
+		.basic = keys.basic
+	};
+	tun_vnet_hash_report(&tun->vnet_hash, skb, &keys_basic, skb->l4_hash ? skb->hash : txq,
+			     tun_add_hash);
+
 	return txq;
 }
 
@@ -1990,10 +2013,8 @@ static ssize_t tun_put_user_xdp(struct tun_struct *tun,
 	size_t total;
 
 	if (tun->flags & IFF_VNET_HDR) {
-		struct virtio_net_hdr gso = { 0 };
-
 		vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
-		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
+		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, NULL, 0);
 		if (ret < 0)
 			return ret;
 	}
@@ -2018,7 +2039,6 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	int vlan_offset = 0;
 	int vlan_hlen = 0;
 	int vnet_hdr_sz = 0;
-	int ret;
 
 	if (skb_vlan_tag_present(skb))
 		vlan_hlen = VLAN_HLEN;
@@ -2043,13 +2063,15 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	}
 
 	if (vnet_hdr_sz) {
-		struct virtio_net_hdr gso;
+		struct virtio_net_hdr_v1_hash gso;
+		int ret;
 
-		ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
+		ret = tun_vnet_hdr_from_skb(vnet_hdr_sz, tun->flags, tun->dev, skb,
+					    tun_find_hash, &gso);
 		if (ret < 0)
 			goto done;
 
-		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
+		ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso, ret);
 		if (ret < 0)
 			goto done;
 	}
@@ -3055,9 +3077,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		goto unlock;
 	}
 
-	ret = -EBADFD;
-	if (!tun)
+	if (!tun) {
+		ret = tun_vnet_ioctl(NULL, NULL, NULL, -EBADFD, cmd, argp);
 		goto unlock;
+	}
 
 	netif_info(tun, drv, tun->dev, "tun_chr_ioctl cmd %u\n", cmd);
 
@@ -3256,7 +3279,8 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		break;
 
 	default:
-		ret = tun_vnet_ioctl(&tun->vnet_hdr_sz, &tun->flags, cmd, argp);
+		ret = tun_vnet_ioctl(&tun->vnet_hdr_sz, &tun->flags,
+				     &tun->vnet_hash, -EINVAL, cmd, argp);
 	}
 
 	if (do_notify)
diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index c40bde0fdf8c..589a97dd7d02 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -6,6 +6,9 @@
 #define TUN_VNET_LE 0x80000000
 #define TUN_VNET_BE 0x40000000
 
+typedef struct virtio_net_hash *(*tun_vnet_hash_add)(struct sk_buff *);
+typedef const struct virtio_net_hash *(*tun_vnet_hash_find)(const struct sk_buff *);
+
 static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
 {
 	return !(IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) && (flags & TUN_VNET_BE)) &&
@@ -59,18 +62,31 @@ static inline __virtio16 cpu_to_tun_vnet16(unsigned int flags, u16 val)
 }
 
 static inline long tun_vnet_ioctl(int *sz, unsigned int *flags,
-				  unsigned int cmd, int __user *sp)
+				  struct tun_vnet_hash *hash, long fallback,
+				  unsigned int cmd, void __user *argp)
 {
+	static const struct tun_vnet_hash cap = {
+		.flags = TUN_VNET_HASH_REPORT,
+		.types = VIRTIO_NET_SUPPORTED_HASH_TYPES
+	};
+	struct tun_vnet_hash hash_buf;
+	int __user *sp = argp;
 	int s;
 
 	switch (cmd) {
 	case TUNGETVNETHDRSZ:
+		if (!sz)
+			return -EBADFD;
+
 		s = *sz;
 		if (put_user(s, sp))
 			return -EFAULT;
 		return 0;
 
 	case TUNSETVNETHDRSZ:
+		if (!sz)
+			return -EBADFD;
+
 		if (get_user(s, sp))
 			return -EFAULT;
 		if (s < (int)sizeof(struct virtio_net_hdr))
@@ -80,12 +96,18 @@ static inline long tun_vnet_ioctl(int *sz, unsigned int *flags,
 		return 0;
 
 	case TUNGETVNETLE:
+		if (!flags)
+			return -EBADFD;
+
 		s = !!(*flags & TUN_VNET_LE);
 		if (put_user(s, sp))
 			return -EFAULT;
 		return 0;
 
 	case TUNSETVNETLE:
+		if (!flags)
+			return -EBADFD;
+
 		if (get_user(s, sp))
 			return -EFAULT;
 		if (s)
@@ -95,16 +117,56 @@ static inline long tun_vnet_ioctl(int *sz, unsigned int *flags,
 		return 0;
 
 	case TUNGETVNETBE:
+		if (!flags)
+			return -EBADFD;
+
 		return tun_vnet_get_be(*flags, sp);
 
 	case TUNSETVNETBE:
+		if (!flags)
+			return -EBADFD;
+
 		return tun_vnet_set_be(flags, sp);
 
+	case TUNGETVNETHASHCAP:
+		return copy_to_user(argp, &cap, sizeof(cap)) ? -EFAULT : 0;
+
+	case TUNSETVNETHASH:
+		if (!hash)
+			return -EBADFD;
+
+		if (copy_from_user(&hash_buf, argp, sizeof(hash_buf)))
+			return -EFAULT;
+
+		*hash = hash_buf;
+		return 0;
+
 	default:
-		return -EINVAL;
+		return fallback;
 	}
 }
 
+static inline void tun_vnet_hash_report(const struct tun_vnet_hash *hash,
+					struct sk_buff *skb,
+					const struct flow_keys_basic *keys,
+					u32 value,
+					tun_vnet_hash_add vnet_hash_add)
+{
+	struct virtio_net_hash *report;
+
+	if (!(hash->flags & TUN_VNET_HASH_REPORT))
+		return;
+
+	report = vnet_hash_add(skb);
+	if (!report)
+		return;
+
+	*report = (struct virtio_net_hash) {
+		.report = virtio_net_hash_report(hash->types, keys),
+		.value = value
+	};
+}
+
 static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
 				   struct iov_iter *from,
 				   struct virtio_net_hdr *hdr)
@@ -130,15 +192,15 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
 }
 
 static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
-				   const struct virtio_net_hdr *hdr)
+				   const void *hdr, int content_sz)
 {
 	if (iov_iter_count(iter) < sz)
 		return -EINVAL;
 
-	if (copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr))
+	if (copy_to_iter(hdr, content_sz, iter) != content_sz)
 		return -EFAULT;
 
-	if (iov_iter_zero(sz - sizeof(*hdr), iter) != sz - sizeof(*hdr))
+	if (iov_iter_zero(sz - content_sz, iter) != sz - content_sz)
 		return -EFAULT;
 
 	return 0;
@@ -151,32 +213,48 @@ static inline int tun_vnet_hdr_to_skb(unsigned int flags,
 	return virtio_net_hdr_to_skb(skb, hdr, tun_vnet_is_little_endian(flags));
 }
 
-static inline int tun_vnet_hdr_from_skb(unsigned int flags,
+static inline int tun_vnet_hdr_from_skb(int sz, unsigned int flags,
 					const struct net_device *dev,
 					const struct sk_buff *skb,
-					struct virtio_net_hdr *hdr)
+					tun_vnet_hash_find vnet_hash_find,
+					struct virtio_net_hdr_v1_hash *hdr)
 {
 	int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
+	const struct virtio_net_hash *report = sz < sizeof(struct virtio_net_hdr_v1_hash) ?
+					       NULL : vnet_hash_find(skb);
+	int content_sz;
+
+	if (report) {
+		content_sz = sizeof(struct virtio_net_hdr_v1_hash);
+
+		*hdr = (struct virtio_net_hdr_v1_hash) {
+			.hdr = { .num_buffers = __cpu_to_virtio16(true, 1) },
+			.hash_value = cpu_to_le32(report->value),
+			.hash_report = cpu_to_le16(report->report)
+		};
+	} else {
+		content_sz = sizeof(struct virtio_net_hdr);
+	}
 
-	if (virtio_net_hdr_from_skb(skb, hdr,
+	if (virtio_net_hdr_from_skb(skb, (struct virtio_net_hdr *)hdr,
 				    tun_vnet_is_little_endian(flags), true,
 				    vlan_hlen)) {
 		struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 		if (net_ratelimit()) {
 			netdev_err(dev, "unexpected GSO type: 0x%x, gso_size %d, hdr_len %d\n",
-				   sinfo->gso_type, tun_vnet16_to_cpu(flags, hdr->gso_size),
-				   tun_vnet16_to_cpu(flags, hdr->hdr_len));
+				   sinfo->gso_type, tun_vnet16_to_cpu(flags, hdr->hdr.gso_size),
+				   tun_vnet16_to_cpu(flags, hdr->hdr.hdr_len));
 			print_hex_dump(KERN_ERR, "tun: ",
 				       DUMP_PREFIX_NONE,
 				       16, 1, skb->head,
-				       min(tun_vnet16_to_cpu(flags, hdr->hdr_len), 64), true);
+				       min(tun_vnet16_to_cpu(flags, hdr->hdr.hdr_len), 64), true);
 		}
 		WARN_ON_ONCE(1);
 		return -EINVAL;
 	}
 
-	return 0;
+	return content_sz;
 }
 
 #endif /* TUN_VNET_H */
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 553552fa635c..5bbb343a6dba 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,6 +4,7 @@
 #include
 #include
+#include
 
 struct file;
 struct socket;
 
@@ -43,6 +44,7 @@ struct tap_dev {
 	int numqueues;
 	netdev_features_t tap_features;
 	int minor;
+	struct tun_vnet_hash vnet_hash;
 
 	void (*update_features)(struct tap_dev *tap, netdev_features_t features);
 	void (*count_tx_dropped)(struct tap_dev *tap);
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 287cdc81c939..d11e79b4e0dc 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -62,6 +62,34 @@
 #define TUNSETCARRIER _IOW('T', 226, int)
 #define TUNGETDEVNETNS _IO('T', 227)
 
+/**
+ * define TUNGETVNETHASHCAP - ioctl to get virtio_net hashing capability.
+ *
+ * The argument is a pointer to &struct tun_vnet_hash which will store the
+ * maximal virtio_net hashing configuration.
+ */
+#define TUNGETVNETHASHCAP _IOR('T', 228, struct tun_vnet_hash)
+
+/**
+ * define TUNSETVNETHASH - ioctl to configure virtio_net hashing
+ *
+ * The argument is a pointer to &struct tun_vnet_hash.
+ *
+ * The %TUN_VNET_HASH_REPORT flag set with this ioctl will be effective only
+ * after calling the %TUNSETVNETHDRSZ ioctl with a number greater than or equal
+ * to the size of &struct virtio_net_hdr_v1_hash.
+ *
+ * The members added to the legacy header by %TUN_VNET_HASH_REPORT flag will
+ * always be little-endian.
+ *
+ * This ioctl results in %EBADFD if the underlying device is deleted. It affects
+ * all queues attached to the same device.
+ *
+ * This ioctl currently has no effect on XDP packets and packets with
+ * queue_mapping set by TC.
+ */
+#define TUNSETVNETHASH _IOW('T', 229, struct tun_vnet_hash)
+
 /* TUNSETIFF ifr flags */
 #define IFF_TUN 0x0001
 #define IFF_TAP 0x0002
@@ -115,4 +143,24 @@ struct tun_filter {
 	__u8 addr[][ETH_ALEN];
 };
 
+/**
+ * define TUN_VNET_HASH_REPORT - Request virtio_net hash reporting for vhost
+ */
+#define TUN_VNET_HASH_REPORT 0x0001
+
+/**
+ * struct tun_vnet_hash - virtio_net hashing configuration
+ * @flags:
+ *		Bitmask consists of %TUN_VNET_HASH_REPORT and %TUN_VNET_HASH_RSS
+ * @pad:
+ *		Should be filled with zero before passing to %TUNSETVNETHASH
+ * @types:
+ *		Bitmask of allowed hash types
+ */
+struct tun_vnet_hash {
+	__u16 flags;
+	__u8 pad[2];
+	__u32 types;
+};
+
 #endif /* _UAPI__IF_TUN_H */

From patchwork Tue Oct 8 06:54:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825750
From: Akihiko Odaki
Date: Tue, 08 Oct 2024 15:54:27 +0900
Subject: [PATCH RFC v5 07/10] tun: Introduce virtio-net RSS
X-Mailing-List: kvm@vger.kernel.org
Message-Id: <20241008-rss-v5-7-f3cf68df005d@daynix.com>
References: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com>
To: Jonathan Corbet, Willem de Bruijn, Jason Wang, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, "Michael S. Tsirkin",
 Xuan Zhuo, Shuah Khan, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-kselftest@vger.kernel.org,
 Yuri Benditovich, Andrew Melnychenko, Stephen Hemminger,
 gur.stavi@huawei.com, Akihiko Odaki
X-Mailer: b4 0.14-dev-fd6e3

RSS is a receive steering algorithm that can be negotiated for use with
virtio_net. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.

Another approach is to use an eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of the eBPF steering program.

Introduce code to perform RSS in the kernel in order to overcome these
challenges. An alternative solution is to extend the eBPF steering
program so that it will be able to report the hash to userspace, but I
did not opt for it: extending the current eBPF steering program
mechanism as is would rely on legacy context rewriting, and introducing
kfunc-based eBPF would result in a non-UAPI dependency, while the other
relevant virtualization APIs such as KVM and vhost_net are UAPIs.

Signed-off-by: Akihiko Odaki
---
 drivers/net/tap.c           | 11 +++++-
 drivers/net/tun.c           | 57 ++++++++++++++++++++-------
 drivers/net/tun_vnet.h      | 96 +++++++++++++++++++++++++++++++++++++++++----
 include/linux/if_tap.h      |  4 +-
 include/uapi/linux/if_tun.h | 27 +++++++++++++
 5 files changed, 169 insertions(+), 26 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 5e2fbe63ca47..a58b83285af4 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -207,6 +207,7 @@ static struct tap_queue *tap_get_queue(struct tap_dev *tap,
 	 * racing against queue removal.
 	 */
 	int numvtaps = READ_ONCE(tap->numvtaps);
+	struct tun_vnet_hash_container *vnet_hash = rcu_dereference(tap->vnet_hash);
 	__u32 rxq;
 
 	*tap_add_hash(skb) = (struct virtio_net_hash) { .report = VIRTIO_NET_HASH_REPORT_NONE };
@@ -217,6 +218,12 @@ static struct tap_queue *tap_get_queue(struct tap_dev *tap,
 	if (numvtaps == 1)
 		goto single;
 
+	if (vnet_hash && (vnet_hash->common.flags & TUN_VNET_HASH_RSS)) {
+		rxq = tun_vnet_rss_select_queue(numvtaps, vnet_hash, skb, tap_add_hash);
+		queue = rcu_dereference(tap->taps[rxq]);
+		goto out;
+	}
+
 	if (!skb->l4_hash && !skb->sw_hash) {
 		struct flow_keys keys;
 
@@ -234,7 +241,7 @@ static struct tap_queue *tap_get_queue(struct tap_dev *tap,
 
 	/* Check if we can use flow to select a queue */
 	if (rxq) {
-		tun_vnet_hash_report(&tap->vnet_hash, skb, &keys_basic, rxq, tap_add_hash);
+		tun_vnet_hash_report(vnet_hash, skb, &keys_basic, rxq, tap_add_hash);
 		queue = rcu_dereference(tap->taps[rxq % numvtaps]);
 		goto out;
 	}
@@ -1058,7 +1065,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
 		tap = rtnl_dereference(q->tap);
 		ret = tun_vnet_ioctl(&q->vnet_hdr_sz, &q->flags,
 				     tap ? &tap->vnet_hash : NULL, -EINVAL,
-				     cmd, sp);
+				     true, cmd, sp);
 		rtnl_unlock();
 		return ret;
 	}
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 27308417b834..18528568aed7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -209,7 +209,7 @@ struct tun_struct {
 	struct bpf_prog __rcu *xdp_prog;
 	struct tun_prog __rcu *steering_prog;
 	struct tun_prog __rcu *filter_prog;
-	struct tun_vnet_hash vnet_hash;
+	struct tun_vnet_hash_container __rcu *vnet_hash;
 	struct ethtool_link_ksettings link_ksettings;
 	/* init args */
 	struct file *file;
@@ -468,7 +468,9 @@ static const struct virtio_net_hash *tun_find_hash(const struct sk_buff *skb)
  * the userspace application move between processors, we may get a
  * different rxq no. here.
  */
-static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
+static u16 tun_automq_select_queue(struct tun_struct *tun,
+				   const struct tun_vnet_hash_container *vnet_hash,
+				   struct sk_buff *skb)
 {
 	struct flow_keys keys;
 	struct flow_keys_basic keys_basic;
@@ -493,7 +495,7 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 		.control = keys.control,
 		.basic = keys.basic
 	};
-	tun_vnet_hash_report(&tun->vnet_hash, skb, &keys_basic, skb->l4_hash ? skb->hash : txq,
+	tun_vnet_hash_report(vnet_hash, skb, &keys_basic, skb->l4_hash ? skb->hash : txq,
 			     tun_add_hash);
 
 	return txq;
@@ -523,10 +525,17 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
 	u16 ret;
 
 	rcu_read_lock();
-	if (rcu_dereference(tun->steering_prog))
+	if (rcu_dereference(tun->steering_prog)) {
 		ret = tun_ebpf_select_queue(tun, skb);
-	else
-		ret = tun_automq_select_queue(tun, skb);
+	} else {
+		struct tun_vnet_hash_container *vnet_hash = rcu_dereference(tun->vnet_hash);
+
+		if (vnet_hash && (vnet_hash->common.flags & TUN_VNET_HASH_RSS))
+			ret = tun_vnet_rss_select_queue(READ_ONCE(tun->numqueues), vnet_hash,
+							skb, tun_add_hash);
+		else
+			ret = tun_automq_select_queue(tun, vnet_hash, skb);
+	}
 	rcu_read_unlock();
 
 	return ret;
@@ -2248,6 +2257,9 @@ static void tun_free_netdev(struct net_device *dev)
 	security_tun_dev_free_security(tun->security);
 	__tun_set_ebpf(tun, &tun->steering_prog, NULL);
 	__tun_set_ebpf(tun, &tun->filter_prog, NULL);
+	rtnl_lock();
+	kfree_rcu_mightsleep(rtnl_dereference(tun->vnet_hash));
+	rtnl_unlock();
 }
 
 static void tun_setup(struct net_device *dev)
@@ -2946,13 +2958,9 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 }
 
 static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
-			void __user *data)
+			int fd)
 {
 	struct bpf_prog *prog;
-	int fd;
-
-	if (copy_from_user(&fd, data, sizeof(fd)))
-		return -EFAULT;
 
 	if (fd == -1) {
 		prog = NULL;
@@ -3019,6 +3027,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	int sndbuf;
 	int ret;
 	bool do_notify = false;
+	struct tun_vnet_hash_container *vnet_hash;
 
 	if (cmd == TUNSETIFF || cmd == TUNSETQUEUE ||
 	    (_IOC_TYPE(cmd) == SOCK_IOC_TYPE && cmd != SIOCGSKNS)) {
@@ -3078,7 +3087,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	}
 
 	if (!tun) {
-		ret = tun_vnet_ioctl(NULL, NULL, NULL, -EBADFD, cmd, argp);
+		ret = tun_vnet_ioctl(NULL, NULL, NULL, -EBADFD, true, cmd, argp);
 		goto unlock;
 	}
 
@@ -3256,11 +3265,27 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		break;
 
 	case TUNSETSTEERINGEBPF:
-		ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
+		if (get_user(ret, (int __user *)argp)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		vnet_hash = rtnl_dereference(tun->vnet_hash);
+		if (ret != -1 && vnet_hash && (vnet_hash->common.flags & TUN_VNET_HASH_RSS)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		ret = tun_set_ebpf(tun, &tun->steering_prog, ret);
 		break;
 
 	case TUNSETFILTEREBPF:
-		ret = tun_set_ebpf(tun, &tun->filter_prog, argp);
+		if (get_user(ret, (int __user *)argp)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = tun_set_ebpf(tun, &tun->filter_prog, ret);
 		break;
 
 	case TUNSETCARRIER:
@@ -3280,7 +3305,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 
 	default:
 		ret = tun_vnet_ioctl(&tun->vnet_hdr_sz, &tun->flags,
-				     &tun->vnet_hash, -EINVAL, cmd, argp);
+				     &tun->vnet_hash, -EINVAL,
+				     !rtnl_dereference(tun->steering_prog),
+				     cmd, argp);
 	}
 
 	if (do_notify)
diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index 589a97dd7d02..f5de4fe9d14e 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -9,6 +9,13 @@
 typedef struct virtio_net_hash *(*tun_vnet_hash_add)(struct sk_buff *);
 typedef const struct virtio_net_hash *(*tun_vnet_hash_find)(const struct sk_buff *);
 
+struct tun_vnet_hash_container {
+	struct tun_vnet_hash common;
+	struct tun_vnet_hash_rss rss;
+	u32 rss_key[VIRTIO_NET_RSS_MAX_KEY_SIZE];
+	u16 rss_indirection_table[];
+};
+
 static inline bool tun_vnet_legacy_is_little_endian(unsigned int flags)
 {
 	return !(IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) && (flags & TUN_VNET_BE)) &&
@@ -62,14 +69,16 @@ static inline __virtio16 cpu_to_tun_vnet16(unsigned int flags, u16 val)
 }
 
 static inline long tun_vnet_ioctl(int *sz, unsigned int *flags,
-				  struct tun_vnet_hash *hash, long fallback,
+				  struct tun_vnet_hash_container __rcu **hashp,
+				  long fallback, bool can_rss,
 				  unsigned int cmd, void __user *argp)
 {
 	static const struct tun_vnet_hash cap = {
-		.flags = TUN_VNET_HASH_REPORT,
+		.flags = TUN_VNET_HASH_REPORT | TUN_VNET_HASH_RSS,
 		.types = VIRTIO_NET_SUPPORTED_HASH_TYPES
 	};
 	struct tun_vnet_hash hash_buf;
+	struct tun_vnet_hash_container *hash;
 	int __user *sp = argp;
 	int s;
 
@@ -132,13 +141,57 @@ static inline long tun_vnet_ioctl(int *sz, unsigned int *flags,
 		return copy_to_user(argp, &cap, sizeof(cap)) ? -EFAULT : 0;
 
 	case TUNSETVNETHASH:
-		if (!hash)
+		if (!hashp)
 			return -EBADFD;
 
 		if (copy_from_user(&hash_buf, argp, sizeof(hash_buf)))
 			return -EFAULT;
+		argp = (struct tun_vnet_hash __user *)argp + 1;
+
+		if (hash_buf.flags & TUN_VNET_HASH_RSS) {
+			struct tun_vnet_hash_rss rss;
+			size_t indirection_table_size;
+			size_t key_size;
+			size_t size;
+
+			if (!can_rss)
+				return -EBUSY;
+
+			if (copy_from_user(&rss, argp, sizeof(rss)))
+				return -EFAULT;
+			argp = (struct tun_vnet_hash_rss __user *)argp + 1;
+
+			indirection_table_size = ((size_t)rss.indirection_table_mask + 1) * 2;
+			key_size = virtio_net_hash_key_length(hash_buf.types);
+			size = struct_size(hash, rss_indirection_table,
+					   (size_t)rss.indirection_table_mask + 1);
+
+			hash = kmalloc(size, GFP_KERNEL);
+			if (!hash)
+				return -ENOMEM;
+
+			if (copy_from_user(hash->rss_indirection_table,
+					   argp, indirection_table_size)) {
+				kfree(hash);
+				return -EFAULT;
+			}
+			argp = (u16 __user *)argp + rss.indirection_table_mask + 1;
+
+			if (copy_from_user(hash->rss_key, argp, key_size)) {
+				kfree(hash);
+				return -EFAULT;
+			}
+
+			virtio_net_toeplitz_convert_key(hash->rss_key, key_size);
+			hash->rss = rss;
+		} else {
+			hash = kmalloc(sizeof(hash->common), GFP_KERNEL);
+			if (!hash)
+				return -ENOMEM;
+		}
 
-		*hash = hash_buf;
+		hash->common = hash_buf;
+		kfree_rcu_mightsleep(rcu_replace_pointer_rtnl(*hashp, hash));
 		return 0;
 
 	default:
@@ -146,7 +199,7 @@ static inline long tun_vnet_ioctl(int *sz, unsigned int *flags,
 	}
 }
 
-static inline void tun_vnet_hash_report(const struct tun_vnet_hash *hash,
+static inline void tun_vnet_hash_report(const struct tun_vnet_hash_container *hash,
 					struct sk_buff *skb,
 					const struct flow_keys_basic *keys,
 					u32 value,
@@ -154,7 +207,7 @@ static inline void tun_vnet_hash_report(const struct tun_vnet_hash *hash,
 {
 	struct virtio_net_hash *report;
 
-	if (!(hash->flags & TUN_VNET_HASH_REPORT))
+	if (!hash || !(hash->common.flags & TUN_VNET_HASH_REPORT))
 		return;
 
 	report = vnet_hash_add(skb);
@@ -162,11 +215,40 @@ static inline void tun_vnet_hash_report(const struct tun_vnet_hash *hash,
 		return;
 
 	*report = (struct virtio_net_hash) {
-		.report = virtio_net_hash_report(hash->types, keys),
+		.report = virtio_net_hash_report(hash->common.types, keys),
 		.value = value
 	};
 }
 
+static inline u16 tun_vnet_rss_select_queue(u32 numqueues,
+					    const struct tun_vnet_hash_container *hash,
+					    struct sk_buff *skb,
+					    tun_vnet_hash_add vnet_hash_add)
+{
+	struct virtio_net_hash *report;
+	struct virtio_net_hash ret;
+	u16 txq, index;
+
+	if (!numqueues)
+		return 0;
+
+	virtio_net_hash_rss(skb, hash->common.types, hash->rss_key, &ret);
+
+	if (!ret.report)
+		return hash->rss.unclassified_queue % numqueues;
+
+	if (hash->common.flags & TUN_VNET_HASH_REPORT) {
+		report = vnet_hash_add(skb);
+		if (report)
+			*report = ret;
+	}
+
+	index = ret.value & hash->rss.indirection_table_mask;
+	txq = READ_ONCE(hash->rss_indirection_table[index]);
+
+	return txq % numqueues;
+}
+
 static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
 				   struct iov_iter *from,
 				   struct virtio_net_hdr *hdr)
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 5bbb343a6dba..7334c46a3f10 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,7 +4,6 @@
 #include
 #include
-#include
 
 struct file;
 struct socket;
 
@@ -32,6 +31,7 @@ static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
 #define MAX_TAP_QUEUES 256
 
 struct tap_queue;
+struct tun_vnet_hash_container;
 
 struct tap_dev {
 	struct net_device *dev;
@@ -44,7 +44,7 @@ struct tap_dev {
 	int numqueues;
 	netdev_features_t tap_features;
 	int minor;
-	struct tun_vnet_hash vnet_hash;
+	struct tun_vnet_hash_container __rcu *vnet_hash;
 
 	void (*update_features)(struct tap_dev *tap, netdev_features_t features);
 	void (*count_tx_dropped)(struct tap_dev *tap);
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index d11e79b4e0dc..4887f97500a8 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -75,6 +75,14 @@
  *
  * The argument is a pointer to &struct tun_vnet_hash.
  *
+ * The argument is a pointer to the compound of the following in order if
+ * %TUN_VNET_HASH_RSS is set:
+ *
+ * 1. &struct tun_vnet_hash
+ * 2. &struct tun_vnet_hash_rss
+ * 3. Indirection table
+ * 4. Key
+ *
  * The %TUN_VNET_HASH_REPORT flag set with this ioctl will be effective only
  * after calling the %TUNSETVNETHDRSZ ioctl with a number greater than or equal
  * to the size of &struct virtio_net_hdr_v1_hash.
@@ -148,6 +156,13 @@ struct tun_filter {
  */
 #define TUN_VNET_HASH_REPORT 0x0001
 
+/**
+ * define TUN_VNET_HASH_RSS - Request virtio_net RSS
+ *
+ * This is mutually exclusive with eBPF steering program.
+ */
+#define TUN_VNET_HASH_RSS 0x0002
+
 /**
  * struct tun_vnet_hash - virtio_net hashing configuration
  * @flags:
@@ -163,4 +178,16 @@ struct tun_vnet_hash {
 	__u32 types;
 };
 
+/**
+ * struct tun_vnet_hash_rss - virtio_net RSS configuration
+ * @indirection_table_mask:
+ *		Bitmask to be applied to the indirection table index
+ * @unclassified_queue:
+ *		The index of the queue to place unclassified packets in
+ */
+struct tun_vnet_hash_rss {
+	__u16 indirection_table_mask;
+	__u16 unclassified_queue;
+};
+
 #endif /* _UAPI__IF_TUN_H */

From patchwork Tue Oct 8 06:54:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Akihiko Odaki
X-Patchwork-Id: 13825751
header.d=daynix-com.20230601.gappssmtp.com header.i=@daynix-com.20230601.gappssmtp.com header.b=NjPvVRKC; arc=none smtp.client-ip=209.85.210.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=daynix.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=daynix.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=daynix-com.20230601.gappssmtp.com header.i=@daynix-com.20230601.gappssmtp.com header.b="NjPvVRKC" Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-71def8abc2fso2641543b3a.1 for ; Mon, 07 Oct 2024 23:55:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=daynix-com.20230601.gappssmtp.com; s=20230601; t=1728370522; x=1728975322; darn=vger.kernel.org; h=to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=+uKsejVxLmGPlZDZ2YAoWDts+QfkdC5shkCFnDsQphM=; b=NjPvVRKCUMKmtijp5aTQtZxfAxbPT0v+EX5xlri2uVDsgwqFNWrluIKDuoveCu5Kyj 9FNaVU6NcOx6XN+frNm0RFCd5qbiyujwbKXHg4G5q+0YI0nsRZ2CoerJg8FV9Mg3riC0 o8gh8xoSdhAP9BLSAksXIhyt9aaIzYKH5J7ysWaj3Kim+iII4HpnZhLiRGf/p+1U0Ycg XB/Zf2kqQ7m6GWTqX4OozHr50QV2n0oKEQTyLWeb3flRtJsx3DAOYYn8/dR5Smtm7lJc MuCtT+IXMkTWGdVFu3p1+vvVBI208p/qecd/ElYDB/8oqwgx4FcEBfQGv1gNflAcBH5e yQJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728370522; x=1728975322; h=to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+uKsejVxLmGPlZDZ2YAoWDts+QfkdC5shkCFnDsQphM=; b=KjQjq8lcp7Vc27szaHVDK2DAsaif97dy5Q3f/WF+sYasaanMofeB6WVM7RenIcnKhJ 4BMxqbYZSXzlw/RqHJdYWPippy7q6s8Idvp+TgYRrgpCxgNU3Sc8D1t89jMQp0BJvnpi N7U2DYDKmqfmArp0nHmhApWM5YYcM1gkfEROphLgPbMgVQa1ZZ3bq9C7HH3tWNi0aY/b IEsCRRM35JYPUdtAXljjZViJjk0CejeRUP24BPz8fxXQPV+zezJu8CJYPzZgOKKMW7Ep 
UehIb1maJgakddQ8pWdYa4HKTloLvEh4q4WaeUBoePa4QYZmOYsHE+xNn/YbTrsBHB5v pv1w== X-Forwarded-Encrypted: i=1; AJvYcCViDUBXBNkppDbFEpIyM+9RAqpMnA9Va7pd+vU8ARYugtNhZsG3mzUZQMtucNP/qeTMaLU=@vger.kernel.org X-Gm-Message-State: AOJu0YyeCLb7DEauw7A5gH4yf+FEs9VdYDNhnYJFRnOagZeknqSVZlp4 eDXbgZUHJzWMW+As6z2aXfdWHOJn61lZAmPaTcCnvpmBZlwV4IO0y6XpwiX+T2U= X-Google-Smtp-Source: AGHT+IHpl6coI+ZO3xm7Go5SLIOAf4hTK2yXXSof1uFkX8umSQONLP1U9ZQGcbX4LKYOwCaQ5jp55w== X-Received: by 2002:a05:6a00:84b:b0:71e:183d:6e74 with SMTP id d2e1a72fcca58-71e183d789bmr246332b3a.4.1728370522610; Mon, 07 Oct 2024 23:55:22 -0700 (PDT) Received: from localhost ([157.82.207.107]) by smtp.gmail.com with UTF8SMTPSA id d2e1a72fcca58-71df0d65246sm5463299b3a.169.2024.10.07.23.55.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 07 Oct 2024 23:55:22 -0700 (PDT) From: Akihiko Odaki Date: Tue, 08 Oct 2024 15:54:28 +0900 Subject: [PATCH RFC v5 08/10] selftest: tun: Test vnet ioctls without device Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241008-rss-v5-8-f3cf68df005d@daynix.com> References: <20241008-rss-v5-0-f3cf68df005d@daynix.com> In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com> To: Jonathan Corbet , Willem de Bruijn , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Xuan Zhuo , Shuah Khan , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, Yuri Benditovich , Andrew Melnychenko , Stephen Hemminger , gur.stavi@huawei.com, Akihiko Odaki X-Mailer: b4 0.14-dev-fd6e3 Ensure that vnet ioctls result in EBADFD when the underlying device is deleted. 
Signed-off-by: Akihiko Odaki
---
 tools/testing/selftests/net/tun.c | 74 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/tools/testing/selftests/net/tun.c b/tools/testing/selftests/net/tun.c
index fa83918b62d1..463dd98f2b80 100644
--- a/tools/testing/selftests/net/tun.c
+++ b/tools/testing/selftests/net/tun.c
@@ -159,4 +159,78 @@ TEST_F(tun, reattach_close_delete) {
 	EXPECT_EQ(tun_delete(self->ifname), 0);
 }
 
+FIXTURE(tun_deleted)
+{
+	char ifname[IFNAMSIZ];
+	int fd;
+};
+
+FIXTURE_SETUP(tun_deleted)
+{
+	self->ifname[0] = 0;
+	self->fd = tun_alloc(self->ifname);
+	ASSERT_LE(0, self->fd);
+
+	ASSERT_EQ(0, tun_delete(self->ifname))
+		EXPECT_EQ(0, close(self->fd));
+}
+
+FIXTURE_TEARDOWN(tun_deleted)
+{
+	EXPECT_EQ(0, close(self->fd));
+}
+
+TEST_F(tun_deleted, getvnethdrsz)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNGETVNETHDRSZ));
+	EXPECT_EQ(EBADFD, errno);
+}
+
+TEST_F(tun_deleted, setvnethdrsz)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNSETVNETHDRSZ));
+	EXPECT_EQ(EBADFD, errno);
+}
+
+TEST_F(tun_deleted, getvnetle)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNGETVNETLE));
+	EXPECT_EQ(EBADFD, errno);
+}
+
+TEST_F(tun_deleted, setvnetle)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNSETVNETLE));
+	EXPECT_EQ(EBADFD, errno);
+}
+
+TEST_F(tun_deleted, getvnetbe)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNGETVNETBE));
+	EXPECT_EQ(EBADFD, errno);
+}
+
+TEST_F(tun_deleted, setvnetbe)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNSETVNETBE));
+	EXPECT_EQ(EBADFD, errno);
+}
+
+TEST_F(tun_deleted, getvnethashcap)
+{
+	struct tun_vnet_hash cap;
+	int i = ioctl(self->fd, TUNGETVNETHASHCAP, &cap);
+
+	if (i == -1 && errno == EBADFD)
+		SKIP(return, "TUNGETVNETHASHCAP not supported");
+
+	EXPECT_EQ(0, i);
+}
+
+TEST_F(tun_deleted, setvnethash)
+{
+	ASSERT_EQ(-1, ioctl(self->fd, TUNSETVNETHASH));
+	EXPECT_EQ(EBADFD, errno);
+}
+
 TEST_HARNESS_MAIN
From patchwork Tue Oct 8 06:54:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Akihiko Odaki X-Patchwork-Id: 13825752 Received: from mail-pg1-f174.google.com (mail-pg1-f174.google.com [209.85.215.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE2BC1E9096 for ; Tue, 8 Oct 2024 06:55:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728370530; cv=none; b=pIq38qYEqhAyH6TiW0hqMk/l9sdPeP45vmNi1gLyGubXeFHMB3vZbVyA0VVxDqMFzAHbpzeOwDofzYsqil+J7avuxpdoWLSM7Hcr6VIOtXxIPfv9Rwfc65OA+2kaYuRpjTlOhInGx3BUNJKPBRbPEWndvpukciFvnWZW01XcQ1I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728370530; c=relaxed/simple; bh=adMwG98SgMSp8ecwFj1slwxm7LM1xpgJPGK9U2k2Ctw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To; b=TVhFPr2YrhKbmhJO0kvzabNItofCFTaVF9KiLE6IJLYzwMYr1OGZW7w6gZTiesQWRM0QeVTfolSQ+ZRtEFznldqXBilbPuCtTADnkNQQSXRdXMJ3W+cR9BhRwBU/TQo/Cd/oq3WkeXt3iyK+GgSICyJmZw/1mjr/BTbHuW/FN/4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=daynix.com; spf=none smtp.mailfrom=daynix.com; dkim=pass (2048-bit key) header.d=daynix-com.20230601.gappssmtp.com header.i=@daynix-com.20230601.gappssmtp.com header.b=nA+HaYf6; arc=none smtp.client-ip=209.85.215.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=daynix.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=daynix.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=daynix-com.20230601.gappssmtp.com header.i=@daynix-com.20230601.gappssmtp.com header.b="nA+HaYf6" Received: by mail-pg1-f174.google.com with SMTP id 41be03b00d2f7-7ea0728475dso1265346a12.0 for ; Mon, 07 Oct 2024 23:55:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=daynix-com.20230601.gappssmtp.com; s=20230601; t=1728370528; x=1728975328; darn=vger.kernel.org; h=to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=9nCWKkukKnjJAcJ7oj5HSWnFT1oMn0FLf/HPPDOOuOE=; b=nA+HaYf6D7jAu/eg+dqeh6Xp64qf9SnL1MaXYGIz5xVUO1HqVBiYYDUlseHwdQfdi/ iiF1aA6ZlfCvkcw9yC1CwlX5jf9XrRzvZt5iQKpQcq4Rzbd+h+eExvFmHNjH6KF2Gg4k djg+FQOPwg+laLWPBe5YWFanKhyuiVLac3ZWZNzTKMXtmZVhwX50DM8+vbeO9oqdzcaD hQoJGE/A/CYYPIHQulmtBtshCmPOTz4coy43Qx+tdZdWk3U4SjjW3rbKCCprKIfmflTb vMacjM5N/N1NIvqX24jDOuG/ruTntUZCva9dStZl68XqzGju3DTm99xxsuB11MGvtcx+ MJ6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728370528; x=1728975328; h=to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9nCWKkukKnjJAcJ7oj5HSWnFT1oMn0FLf/HPPDOOuOE=; b=qXfUFC0qnD3kGPPnJlcvzxAz7sWwAV5Ja8SM2yQ+vpnOyogGpVQ/YvN9HquAOnuEUB VYqOWlU2C1leGtBTGLgso0SVuI0oGVhBukNqU/k48O5CFCvh9z8+sSzq2sF3XhvGzQjW 0swZGAk4O9ufPmaGBlVY/R56z03YUJLaKFUtzAOiChNYnkwD0F+7WuI3j+FUBC0Z5DBm HV0dmSDiaSY4AoNUHp47YuOSoqOAdiDdLUERpJTJArTNGUR1NdrC/1kmGYPMmWmD3knK 4BSgEl95E+31jCVFBnv34zDftxAbUiaO+HIz5/QvOuH+Ex0vp4Pxfmf2cy9SIhSaG3p3 fiBw== X-Forwarded-Encrypted: i=1; AJvYcCWCBvwc6QOAtPMqUe7AjLm1tCr6QwhZcjqDP8Aj2kSvxw7q8aiGhFKBXu3IoREbLFNnRSM=@vger.kernel.org X-Gm-Message-State: AOJu0YwRRhwIyR8ZHjxy6ouJM62lScPzaMTP2VVW2l38MnZWOUzMEWby 9qOYynSC4qpeShOWb4jAIyAB8h4cV6x0Uv+NUjkFPVYCrkSIblS7E2+B8JUUKyg= X-Google-Smtp-Source: AGHT+IFYZlmOm0QuthUmcVdw1qAEvXQ3gIQetl7LsJY9KhF0BUsSDAjZ/JPusVbOQExTm5FJbCmJqw== X-Received: by 2002:a05:6a21:3944:b0:1d4:fafb:845d with SMTP id adf61e73a8af0-1d7073bc233mr3527333637.2.1728370528243; Mon, 07 Oct 2024 23:55:28 -0700 (PDT) Received: from localhost ([157.82.207.107]) by smtp.gmail.com with UTF8SMTPSA id 
d2e1a72fcca58-71df0cbbaddsm5449672b3a.40.2024.10.07.23.55.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 07 Oct 2024 23:55:27 -0700 (PDT) From: Akihiko Odaki Date: Tue, 08 Oct 2024 15:54:29 +0900 Subject: [PATCH RFC v5 09/10] selftest: tun: Add tests for virtio-net hashing Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241008-rss-v5-9-f3cf68df005d@daynix.com> References: <20241008-rss-v5-0-f3cf68df005d@daynix.com> In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com> To: Jonathan Corbet , Willem de Bruijn , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Xuan Zhuo , Shuah Khan , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, Yuri Benditovich , Andrew Melnychenko , Stephen Hemminger , gur.stavi@huawei.com, Akihiko Odaki X-Mailer: b4 0.14-dev-fd6e3 The added tests confirm tun can perform RSS and hash reporting, and reject invalid configurations for them. 
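For readers who want to check hash values such as the ones expected by these tests, the following is a minimal sketch of the Toeplitz calculation commonly used for RSS. It is an illustration only and not code from this series: toeplitz_hash() and the two sample helpers are hypothetical names, and the sketch assumes the input is laid out as source address, destination address, source port, destination port in network byte order. The sample inputs are the first IPv4 vector from Microsoft's published RSS verification data, not the addresses used by the selftest; the key is the same 40-byte key used as rss_key in the fixture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Microsoft's sample RSS key; identical to rss_key in the test fixture. */
static const uint8_t sample_rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/*
 * Hypothetical helper: Toeplitz hash of data[0..len) under key.
 * A 32-bit window slides over the key one bit at a time; for every
 * set input bit (most significant bit first) the current window is
 * XORed into the result. The key must cover len + 4 bytes.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t window, hash = 0;
	size_t i;
	int bit;

	assert(key_len >= len + 4);

	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
		 (uint32_t)key[2] << 8 | (uint32_t)key[3];

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;

			/* Shift in the next key bit after the window. */
			window = (window << 1) | ((key[i + 4] >> bit) & 1u);
		}
	}

	return hash;
}

/* 66.9.149.187 -> 161.142.100.80, addresses only. */
static uint32_t sample_ipv4_hash(void)
{
	static const uint8_t input[] = {
		66, 9, 149, 187, 161, 142, 100, 80
	};

	return toeplitz_hash(sample_rss_key, sizeof(sample_rss_key),
			     input, sizeof(input));
}

/* Same addresses plus source port 2794 and destination port 1766. */
static uint32_t sample_tcpv4_hash(void)
{
	static const uint8_t input[] = {
		66, 9, 149, 187, 161, 142, 100, 80,
		0x0a, 0xea, 0x06, 0xe6
	};

	return toeplitz_hash(sample_rss_key, sizeof(sample_rss_key),
			     input, sizeof(input));
}
```

The published expected values for this vector are 0x323e8fc2 (addresses only) and 0x51ccc178 (addresses and TCP ports). With indirection_table_mask set to 1 as in the fixture, only the least significant bit of the hash selects the indirection table entry.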
Signed-off-by: Akihiko Odaki --- tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/tun.c | 558 ++++++++++++++++++++++++++++++++++- 2 files changed, 551 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 9d5aa817411b..8e2ab5068171 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -110,6 +110,6 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto $(OUTPUT)/tcp_inq: LDLIBS += -lpthread $(OUTPUT)/bind_bhash: LDLIBS += -lpthread -$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ +$(OUTPUT)/io_uring_zerocopy_tx $(OUTPUT)/tun: CFLAGS += -I../../../include/ include bpf.mk diff --git a/tools/testing/selftests/net/tun.c b/tools/testing/selftests/net/tun.c index 463dd98f2b80..ac3858744841 100644 --- a/tools/testing/selftests/net/tun.c +++ b/tools/testing/selftests/net/tun.c @@ -2,21 +2,37 @@ #define _GNU_SOURCE +#include #include #include +#include #include #include #include #include -#include +#include +#include +#include +#include +#include +#include +#include #include +#include #include #include -#include -#include +#include +#include +#include +#include #include "../kselftest_harness.h" +#define TUN_HWADDR_SOURCE { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00 } +#define TUN_HWADDR_DEST { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 } +#define TUN_IPADDR_SOURCE htonl((172 << 24) | (17 << 16) | 0) +#define TUN_IPADDR_DEST htonl((172 << 24) | (17 << 16) | 1) + static int tun_attach(int fd, char *dev) { struct ifreq ifr; @@ -39,7 +55,7 @@ static int tun_detach(int fd, char *dev) return ioctl(fd, TUNSETQUEUE, (void *) &ifr); } -static int tun_alloc(char *dev) +static int tun_alloc(char *dev, short flags) { struct ifreq ifr; int fd, err; @@ -52,7 +68,8 @@ static int tun_alloc(char *dev) memset(&ifr, 0, sizeof(ifr)); strcpy(ifr.ifr_name, dev); - ifr.ifr_flags = IFF_TAP | IFF_NAPI | IFF_MULTI_QUEUE; + 
ifr.ifr_flags = flags | IFF_TAP | IFF_NAPI | IFF_NO_PI | + IFF_MULTI_QUEUE; err = ioctl(fd, TUNSETIFF, (void *) &ifr); if (err < 0) { @@ -64,6 +81,40 @@ static int tun_alloc(char *dev) return fd; } +static bool tun_add_to_bridge(int local_fd, const char *name) +{ + struct ifreq ifreq = { + .ifr_name = "xbridge", + .ifr_ifindex = if_nametoindex(name) + }; + + if (!ifreq.ifr_ifindex) { + perror("if_nametoindex"); + return false; + } + + if (ioctl(local_fd, SIOCBRADDIF, &ifreq)) { + perror("SIOCBRADDIF"); + return false; + } + + return true; +} + +static bool tun_set_flags(int local_fd, const char *name, short flags) +{ + struct ifreq ifreq = { .ifr_flags = flags }; + + strcpy(ifreq.ifr_name, name); + + if (ioctl(local_fd, SIOCSIFFLAGS, &ifreq)) { + perror("SIOCSIFFLAGS"); + return false; + } + + return true; +} + static int tun_delete(char *dev) { struct { @@ -102,6 +153,159 @@ static int tun_delete(char *dev) return ret; } +static uint32_t tun_sum(const void *buf, size_t len) +{ + const uint16_t *sbuf = buf; + uint32_t sum = 0; + + while (len > 1) { + sum += *sbuf++; + len -= 2; + } + + if (len) + sum += *(uint8_t *)sbuf; + + return sum; +} + +static uint16_t tun_build_ip_check(uint32_t sum) +{ + return ~((sum & 0xffff) + (sum >> 16)); +} + +static uint32_t tun_build_ip_pseudo_sum(const void *iphdr) +{ + uint16_t tot_len = ntohs(((struct iphdr *)iphdr)->tot_len); + + return tun_sum((char *)iphdr + offsetof(struct iphdr, saddr), 8) + + htons(((struct iphdr *)iphdr)->protocol) + + htons(tot_len - sizeof(struct iphdr)); +} + +static uint32_t tun_build_ipv6_pseudo_sum(const void *ipv6hdr) +{ + return tun_sum((char *)ipv6hdr + offsetof(struct ipv6hdr, saddr), 32) + + ((struct ipv6hdr *)ipv6hdr)->payload_len + + htons(((struct ipv6hdr *)ipv6hdr)->nexthdr); +} + +static void tun_build_ethhdr(struct ethhdr *ethhdr, uint16_t proto) +{ + *ethhdr = (struct ethhdr) { + .h_dest = TUN_HWADDR_DEST, + .h_source = TUN_HWADDR_SOURCE, + .h_proto = htons(proto) + }; +} + +static void 
tun_build_iphdr(void *dest, uint16_t len, uint8_t protocol) +{ + struct iphdr iphdr = { + .ihl = sizeof(iphdr) / 4, + .version = 4, + .tot_len = htons(sizeof(iphdr) + len), + .ttl = 255, + .protocol = protocol, + .saddr = TUN_IPADDR_SOURCE, + .daddr = TUN_IPADDR_DEST + }; + + iphdr.check = tun_build_ip_check(tun_sum(&iphdr, sizeof(iphdr))); + memcpy(dest, &iphdr, sizeof(iphdr)); +} + +static void tun_build_ipv6hdr(void *dest, uint16_t len, uint8_t protocol) +{ + struct ipv6hdr ipv6hdr = { + .version = 6, + .payload_len = htons(len), + .nexthdr = protocol, + .saddr = { + .s6_addr32 = { + htonl(0xffff0000), 0, 0, TUN_IPADDR_SOURCE + } + }, + .daddr = { + .s6_addr32 = { + htonl(0xffff0000), 0, 0, TUN_IPADDR_DEST + } + }, + }; + + memcpy(dest, &ipv6hdr, sizeof(ipv6hdr)); +} + +static void tun_build_tcphdr(void *dest, uint32_t sum) +{ + struct tcphdr tcphdr = { + .source = htons(9), + .dest = htons(9), + .fin = 1, + .doff = sizeof(tcphdr) / 4, + }; + uint32_t tcp_sum = tun_sum(&tcphdr, sizeof(tcphdr)); + + tcphdr.check = tun_build_ip_check(sum + tcp_sum); + memcpy(dest, &tcphdr, sizeof(tcphdr)); +} + +static void tun_build_udphdr(void *dest, uint32_t sum) +{ + struct udphdr udphdr = { + .source = htons(9), + .dest = htons(9), + .len = htons(sizeof(udphdr)), + }; + uint32_t udp_sum = tun_sum(&udphdr, sizeof(udphdr)); + + udphdr.check = tun_build_ip_check(sum + udp_sum); + memcpy(dest, &udphdr, sizeof(udphdr)); +} + +static bool tun_vnet_hash_check(int source_fd, const int *dest_fds, + const void *buffer, size_t len, + uint8_t flags, + uint16_t hash_report, uint32_t hash_value) +{ + size_t read_len = sizeof(struct virtio_net_hdr_v1_hash) + len; + struct virtio_net_hdr_v1_hash *read_buffer; + struct virtio_net_hdr_v1_hash hdr = { + .hdr = { + .flags = flags, + .num_buffers = hash_report ? htole16(1) : 0 + }, + .hash_value = htole32(hash_value), + .hash_report = htole16(hash_report) + }; + int ret; + int txq = hash_report ? 
hash_value & 1 : 2; + + if (write(source_fd, buffer, len) != len) { + perror("write"); + return false; + } + + read_buffer = malloc(read_len); + if (!read_buffer) { + perror("malloc"); + return false; + } + + ret = read(dest_fds[txq], read_buffer, read_len); + if (ret != read_len) { + perror("read"); + free(read_buffer); + return false; + } + + ret = !memcmp(read_buffer, &hdr, sizeof(*read_buffer)) && + !memcmp(read_buffer + 1, buffer, len); + + free(read_buffer); + return ret; +} + FIXTURE(tun) { char ifname[IFNAMSIZ]; @@ -112,10 +316,10 @@ FIXTURE_SETUP(tun) { memset(self->ifname, 0, sizeof(self->ifname)); - self->fd = tun_alloc(self->ifname); + self->fd = tun_alloc(self->ifname, 0); ASSERT_GE(self->fd, 0); - self->fd2 = tun_alloc(self->ifname); + self->fd2 = tun_alloc(self->ifname, 0); ASSERT_GE(self->fd2, 0); } @@ -168,7 +372,7 @@ FIXTURE(tun_deleted) FIXTURE_SETUP(tun_deleted) { self->ifname[0] = 0; - self->fd = tun_alloc(self->ifname); + self->fd = tun_alloc(self->ifname, 0); ASSERT_LE(0, self->fd); ASSERT_EQ(0, tun_delete(self->ifname)) @@ -233,4 +437,342 @@ TEST_F(tun_deleted, setvnethash) EXPECT_EQ(EBADFD, errno); } +FIXTURE(tun_vnet_hash) +{ + int local_fd; + int source_fd; + int dest_fds[3]; +}; + +FIXTURE_SETUP(tun_vnet_hash) +{ + static const struct { + struct tun_vnet_hash hdr; + struct tun_vnet_hash_rss rss; + uint16_t rss_indirection_table[2]; + uint8_t rss_key[40]; + } vnet_hash = { + .hdr = { + .flags = TUN_VNET_HASH_REPORT | TUN_VNET_HASH_RSS, + .types = VIRTIO_NET_RSS_HASH_TYPE_IPv4 | + VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | + VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | + VIRTIO_NET_RSS_HASH_TYPE_IPv6 | + VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | + VIRTIO_NET_RSS_HASH_TYPE_UDPv6 + }, + .rss = { .indirection_table_mask = 1, .unclassified_queue = 5 }, + .rss_indirection_table = { 3, 4 }, + .rss_key = { + 0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, + 0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, + 0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4, + 0x77, 0xcb, 0x2d, 
0xa3, 0x80, 0x30, 0xf2, 0x0c, + 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa + } + }; + + struct { + struct virtio_net_hdr_v1_hash vnet_hdr; + struct ethhdr ethhdr; + struct arphdr arphdr; + unsigned char sender_hwaddr[6]; + uint32_t sender_ipaddr; + unsigned char target_hwaddr[6]; + uint32_t target_ipaddr; + } __packed packet = { + .ethhdr = { + .h_source = TUN_HWADDR_SOURCE, + .h_dest = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .h_proto = htons(ETH_P_ARP) + }, + .arphdr = { + .ar_hrd = htons(ARPHRD_ETHER), + .ar_pro = htons(ETH_P_IP), + .ar_hln = ETH_ALEN, + .ar_pln = 4, + .ar_op = htons(ARPOP_REQUEST) + }, + .sender_hwaddr = TUN_HWADDR_DEST, + .sender_ipaddr = TUN_IPADDR_DEST, + .target_ipaddr = TUN_IPADDR_DEST + }; + + struct tun_vnet_hash cap; + char source_ifname[IFNAMSIZ] = ""; + char dest_ifname[IFNAMSIZ] = ""; + int i; + + self->local_fd = socket(AF_LOCAL, SOCK_STREAM, 0); + ASSERT_LE(0, self->local_fd); + + self->source_fd = tun_alloc(source_ifname, 0); + ASSERT_LE(0, self->source_fd) { + EXPECT_EQ(0, close(self->local_fd)); + } + + i = ioctl(self->source_fd, TUNGETVNETHASHCAP, &cap); + if (i == -1 && errno == EINVAL) { + EXPECT_EQ(0, close(self->local_fd)); + SKIP(return, "TUNGETVNETHASHCAP not supported"); + } + + ASSERT_EQ(0, i) + EXPECT_EQ(0, close(self->local_fd)); + + if ((cap.flags & vnet_hash.hdr.flags) != vnet_hash.hdr.flags) { + EXPECT_EQ(0, close(self->local_fd)); + SKIP(return, "Lacks some hash flag support"); + } + + if ((cap.types & vnet_hash.hdr.types) != vnet_hash.hdr.types) { + EXPECT_EQ(0, close(self->local_fd)); + SKIP(return, "Lacks some hash type support"); + } + + ASSERT_TRUE(tun_set_flags(self->local_fd, source_ifname, IFF_UP)) + EXPECT_EQ(0, close(self->local_fd)); + + self->dest_fds[0] = tun_alloc(dest_ifname, IFF_VNET_HDR); + ASSERT_LE(0, self->dest_fds[0]) { + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + i = sizeof(struct virtio_net_hdr_v1_hash); + ASSERT_EQ(0, ioctl(self->dest_fds[0], 
TUNSETVNETHDRSZ, &i)) { + EXPECT_EQ(0, close(self->dest_fds[0])); + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + i = 1; + ASSERT_EQ(0, ioctl(self->dest_fds[0], TUNSETVNETLE, &i)) { + EXPECT_EQ(0, close(self->dest_fds[0])); + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + ASSERT_TRUE(tun_set_flags(self->local_fd, dest_ifname, IFF_UP)) { + EXPECT_EQ(0, close(self->dest_fds[0])); + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + ASSERT_EQ(sizeof(packet), + write(self->dest_fds[0], &packet, sizeof(packet))) { + EXPECT_EQ(0, close(self->dest_fds[0])); + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + ASSERT_EQ(0, ioctl(self->dest_fds[0], TUNSETVNETHASH, &vnet_hash)) { + EXPECT_EQ(0, close(self->dest_fds[0])); + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + for (i = 1; i < ARRAY_SIZE(self->dest_fds); i++) { + self->dest_fds[i] = tun_alloc(dest_ifname, IFF_VNET_HDR); + ASSERT_LE(0, self->dest_fds[i]) { + while (i) { + i--; + EXPECT_EQ(0, close(self->dest_fds[i])); + } + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + } + + ASSERT_EQ(0, ioctl(self->local_fd, SIOCBRADDBR, "xbridge")) { + EXPECT_EQ(0, ioctl(self->local_fd, SIOCBRDELBR, "xbridge")); + + for (i = 0; i < ARRAY_SIZE(self->dest_fds); i++) + EXPECT_EQ(0, close(self->dest_fds[i])); + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + ASSERT_TRUE(tun_add_to_bridge(self->local_fd, source_ifname)) { + EXPECT_EQ(0, ioctl(self->local_fd, SIOCBRDELBR, "xbridge")); + + for (i = 0; i < ARRAY_SIZE(self->dest_fds); i++) + EXPECT_EQ(0, close(self->dest_fds[i])); + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + ASSERT_TRUE(tun_add_to_bridge(self->local_fd, dest_ifname)) { + EXPECT_EQ(0, ioctl(self->local_fd, SIOCBRDELBR,
"xbridge")); + + for (i = 0; i < ARRAY_SIZE(self->dest_fds); i++) + EXPECT_EQ(0, close(self->dest_fds[i])); + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + ASSERT_TRUE(tun_set_flags(self->local_fd, "xbridge", IFF_UP)) { + EXPECT_EQ(0, ioctl(self->local_fd, SIOCBRDELBR, "xbridge")); + + for (i = 0; i < ARRAY_SIZE(self->dest_fds); i++) + EXPECT_EQ(0, close(self->dest_fds[i])); + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } +} + +FIXTURE_TEARDOWN(tun_vnet_hash) +{ + ASSERT_TRUE(tun_set_flags(self->local_fd, "xbridge", 0)) { + for (size_t i = 0; i < ARRAY_SIZE(self->dest_fds); i++) + EXPECT_EQ(0, close(self->dest_fds[i])); + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); + } + + EXPECT_EQ(0, ioctl(self->local_fd, SIOCBRDELBR, "xbridge")); + + for (size_t i = 0; i < ARRAY_SIZE(self->dest_fds); i++) + EXPECT_EQ(0, close(self->dest_fds[i])); + + EXPECT_EQ(0, close(self->source_fd)); + EXPECT_EQ(0, close(self->local_fd)); +} + +TEST_F(tun_vnet_hash, unclassified) +{ + struct { + struct ethhdr ethhdr; + struct iphdr iphdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_LOOPBACK); + + EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), 0, + VIRTIO_NET_HASH_REPORT_NONE, 0)); +} + +TEST_F(tun_vnet_hash, ipv4) +{ + struct { + struct ethhdr ethhdr; + struct iphdr iphdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_IP); + tun_build_iphdr(&packet.iphdr, 0, 253); + + EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), 0, + VIRTIO_NET_HASH_REPORT_IPv4, + 0x6e45d952)); +} + +TEST_F(tun_vnet_hash, tcpv4) +{ + struct { + struct ethhdr ethhdr; + struct iphdr iphdr; + struct tcphdr tcphdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_IP); + tun_build_iphdr(&packet.iphdr, sizeof(struct tcphdr), IPPROTO_TCP); + + tun_build_tcphdr(&packet.tcphdr, + 
tun_build_ip_pseudo_sum(&packet.iphdr)); + + EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), + VIRTIO_NET_HDR_F_DATA_VALID, + VIRTIO_NET_HASH_REPORT_TCPv4, + 0xfb63539a)); +} + +TEST_F(tun_vnet_hash, udpv4) +{ + struct { + struct ethhdr ethhdr; + struct iphdr iphdr; + struct udphdr udphdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_IP); + tun_build_iphdr(&packet.iphdr, sizeof(struct udphdr), IPPROTO_UDP); + + tun_build_udphdr(&packet.udphdr, + tun_build_ip_pseudo_sum(&packet.iphdr)); + + EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), + VIRTIO_NET_HDR_F_DATA_VALID, + VIRTIO_NET_HASH_REPORT_UDPv4, + 0xfb63539a)); +} + +TEST_F(tun_vnet_hash, ipv6) +{ + struct { + struct ethhdr ethhdr; + struct ipv6hdr ipv6hdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_IPV6); + tun_build_ipv6hdr(&packet.ipv6hdr, 0, 253); + + EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), 0, + VIRTIO_NET_HASH_REPORT_IPv6, + 0xd6eb560f)); +} + +TEST_F(tun_vnet_hash, tcpv6) +{ + struct { + struct ethhdr ethhdr; + struct ipv6hdr ipv6hdr; + struct tcphdr tcphdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_IPV6); + tun_build_ipv6hdr(&packet.ipv6hdr, sizeof(struct tcphdr), IPPROTO_TCP); + + tun_build_tcphdr(&packet.tcphdr, + tun_build_ipv6_pseudo_sum(&packet.ipv6hdr)); + + EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), + VIRTIO_NET_HDR_F_DATA_VALID, + VIRTIO_NET_HASH_REPORT_TCPv6, + 0xc2b9f251)); +} + +TEST_F(tun_vnet_hash, udpv6) +{ + struct { + struct ethhdr ethhdr; + struct ipv6hdr ipv6hdr; + struct udphdr udphdr; + } __packed packet; + + tun_build_ethhdr(&packet.ethhdr, ETH_P_IPV6); + tun_build_ipv6hdr(&packet.ipv6hdr, sizeof(struct udphdr), IPPROTO_UDP); + + tun_build_udphdr(&packet.udphdr, + tun_build_ipv6_pseudo_sum(&packet.ipv6hdr)); + + 
EXPECT_TRUE(tun_vnet_hash_check(self->source_fd, self->dest_fds, + &packet, sizeof(packet), + VIRTIO_NET_HDR_F_DATA_VALID, + VIRTIO_NET_HASH_REPORT_UDPv6, + 0xc2b9f251)); +} + TEST_HARNESS_MAIN From patchwork Tue Oct 8 06:54:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Akihiko Odaki X-Patchwork-Id: 13825753 Received: from mail-pj1-f49.google.com (mail-pj1-f49.google.com [209.85.216.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 137D61EBFFA for ; Tue, 8 Oct 2024 06:55:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728370536; cv=none; b=LHGoi2E6Oyk+l1okqEQvYu3iaS/SbcC9iOjAyCo5MjTsCZGa3aPfAuQKMjgY00bvuNkI3Btb2/RwPh1Xee16gZKMgeS7qCcO/RD/2jHYyMMcX5po2ouxSOoUqwjh0k39nKozo7sELDgKq9jFpfFM9IrhtsSPj+4qUckUHRxIfjw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728370536; c=relaxed/simple; bh=xPXdqMNBwYBxjRVy67D/GdzAQuV+QqFH17klGcbYhos=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To; b=nXoQDSfNxdwRZLwHiP2ug/d5AXmQEHXrY7zompXEt/3x40+uU73InmeuzgXp4NNA6M8pafDJ6txNEALB+hVldw9Xu0DY/Gbcet3NStEGoRuE78j6tSJkOazFI7R+c/3B0p+wELiignlOk+RYS95v0C+0RG5+X8BMt7FXCsi6OYo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=daynix.com; spf=none smtp.mailfrom=daynix.com; dkim=pass (2048-bit key) header.d=daynix-com.20230601.gappssmtp.com header.i=@daynix-com.20230601.gappssmtp.com header.b=bbH3sJnM; arc=none smtp.client-ip=209.85.216.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=daynix.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=daynix.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=daynix-com.20230601.gappssmtp.com header.i=@daynix-com.20230601.gappssmtp.com header.b="bbH3sJnM" Received: by mail-pj1-f49.google.com with SMTP id 98e67ed59e1d1-2e28bd0bd13so120137a91.1 for ; Mon, 07 Oct 2024 23:55:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=daynix-com.20230601.gappssmtp.com; s=20230601; t=1728370534; x=1728975334; darn=vger.kernel.org; h=to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=XNVaNoDirrDF/K5gG8N1Rw46L/tBZORPSDyk4tebyaM=; b=bbH3sJnM3ecJ5UeG8LgP5mWH2Yixw5L7kf+CJ6h0PTE1XXCg5GfKTpldDu3u2E6lps 0d+XsuhfV1txtCcPJXK2jTmt2pfwJw82BC2BU7cEDMvsUkja2K+kiYq2gl7MWCQefDom TQa/EJv0adWdL+xnyqKfwB8p5s+5TkRsllawZ4csczyvEKPDebWxNRK5C5B/cCs+klwv xuJGUi4snW2efsrC+43CzN81eci9eWpCBIG/6jJXrh2miTNCgsgifVayT3MfU2+Yu3C0 eUpOo0eDo4LcZvDo9280NXy/b7f1/yX7z6uvTKU+NnxoY5a46hgmuU3WUM4ShqVsGmnF uXeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728370534; x=1728975334; h=to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XNVaNoDirrDF/K5gG8N1Rw46L/tBZORPSDyk4tebyaM=; b=Xf+WpRcsz+mbn3O7C0dc+xNWGu+QU4Hh+3Mwe6GyU31OaAM4L8UbnLYaavKy3Z3B8Y znzqnv8JV8qBA06wImvPoG3CXlma6zYxT5beOhQ4+urgZWNE+XKV4oFdobC6hzQk8Lkg QbjIEIe5dedJ9Du44ZAG5Enqo63qK60cZEEJyXerhyripgtihp9TvMIikwCAgk7pKaJz 0iPKKT6txV9AzWPfWEKXyxG7dXFzQ+pzP5BLWo3Ygh9wJ4aJQpsXFMVhMkHJ5Fy6pFku 9WM8LIgFKJhGNdS6q5+JTZZkLfVKJ1XmJ2eq33w3aDvb4UF/VVMGUHEd0xuHkKB4CVWh SJgQ== X-Forwarded-Encrypted: i=1; AJvYcCUBVnQCSrqDARoasSQmVgK7XslG2DvoK6MMjwXi7Wwtp5728tPkqMrLCsU4b9N2pMPD2ps=@vger.kernel.org X-Gm-Message-State: AOJu0Yx6p98FVoDZI2gYUk5HzBYk4arr3LFu+mU+NMAVM+nCXmZP/Es7 NheBWQfal7UBhgw5SgQ3RDcHh0/qL71N3v/K8M4QKvPmuR1vPVImLxAORniU8jg= X-Google-Smtp-Source: 
AGHT+IHkb1uMKsN/5XlbEXxGBI6s7AmtVvSir5FKrdkZc8sYxsL2ddqvfgRpOey0uZac+T06eJdEyg== X-Received: by 2002:a17:90b:4a92:b0:2cb:4e14:fd5d with SMTP id 98e67ed59e1d1-2e1e627f3camr15847213a91.17.1728370534383; Mon, 07 Oct 2024 23:55:34 -0700 (PDT) Received: from localhost ([157.82.207.107]) by smtp.gmail.com with UTF8SMTPSA id 98e67ed59e1d1-2e20b0f64cdsm6721041a91.35.2024.10.07.23.55.30 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 07 Oct 2024 23:55:34 -0700 (PDT) From: Akihiko Odaki Date: Tue, 08 Oct 2024 15:54:30 +0900 Subject: [PATCH RFC v5 10/10] vhost/net: Support VIRTIO_NET_F_HASH_REPORT Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241008-rss-v5-10-f3cf68df005d@daynix.com> References: <20241008-rss-v5-0-f3cf68df005d@daynix.com> In-Reply-To: <20241008-rss-v5-0-f3cf68df005d@daynix.com> To: Jonathan Corbet , Willem de Bruijn , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Xuan Zhuo , Shuah Khan , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, Yuri Benditovich , Andrew Melnychenko , Stephen Hemminger , gur.stavi@huawei.com, Akihiko Odaki X-Mailer: b4 0.14-dev-fd6e3 VIRTIO_NET_F_HASH_REPORT allows reporting hash values calculated on the host. When VHOST_NET_F_VIRTIO_NET_HDR is employed, no hash values are reported (i.e., the hash_report member is always set to VIRTIO_NET_HASH_REPORT_NONE). Otherwise, the values provided by the underlying socket are reported. VIRTIO_NET_F_HASH_REPORT requires VIRTIO_F_VERSION_1.
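The header-length selection and the VIRTIO_F_VERSION_1 requirement described above can be sketched outside the kernel as follows. This is a userspace illustration under stated assumptions, not the vhost code itself: the sketch_* structures and functions are hypothetical stand-ins for struct virtio_net_hdr, struct virtio_net_hdr_mrg_rxbuf and struct virtio_net_hdr_v1_hash, and the feature bit numbers are those assigned by the virtio specification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers as assigned by the virtio specification. */
#define VIRTIO_NET_F_MRG_RXBUF		15
#define VIRTIO_F_VERSION_1		32
#define VIRTIO_NET_F_HASH_REPORT	57

/* Hypothetical mirrors of the three virtio-net header layouts. */
struct sketch_hdr {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

struct sketch_hdr_mrg_rxbuf {
	struct sketch_hdr hdr;
	uint16_t num_buffers;
};

struct sketch_hdr_v1_hash {
	struct sketch_hdr_mrg_rxbuf hdr;
	uint32_t hash_value;
	uint16_t hash_report;
	uint16_t padding;
};

static_assert(sizeof(struct sketch_hdr) == 10, "legacy header is 10 bytes");
static_assert(sizeof(struct sketch_hdr_mrg_rxbuf) == 12, "mrg_rxbuf header is 12 bytes");
static_assert(sizeof(struct sketch_hdr_v1_hash) == 20, "hash header is 20 bytes");

/* Pick the header length: hash reporting takes precedence. */
static size_t sketch_hdr_len(uint64_t features)
{
	if (features & (1ULL << VIRTIO_NET_F_HASH_REPORT))
		return sizeof(struct sketch_hdr_v1_hash);

	if (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
			(1ULL << VIRTIO_F_VERSION_1)))
		return sizeof(struct sketch_hdr_mrg_rxbuf);

	return sizeof(struct sketch_hdr);
}

/* Reject HASH_REPORT when VERSION_1 is not negotiated with it. */
static int sketch_check_features(uint64_t features)
{
	const uint64_t mask = (1ULL << VIRTIO_F_VERSION_1) |
			      (1ULL << VIRTIO_NET_F_HASH_REPORT);

	if ((features & mask) == (1ULL << VIRTIO_NET_F_HASH_REPORT))
		return -EINVAL;

	return 0;
}
```

Masking with both bits and comparing against HASH_REPORT alone is true only when HASH_REPORT is set and VERSION_1 is clear, which is the one combination the patch rejects.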
Signed-off-by: Akihiko Odaki
---
 drivers/vhost/net.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f16279351db5..ec1167a782ec 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -73,6 +73,7 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+			 (1ULL << VIRTIO_NET_F_HASH_REPORT) |
 			 (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
 			 (1ULL << VIRTIO_F_RING_RESET)
 };
@@ -1604,10 +1605,13 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
 	size_t vhost_hlen, sock_hlen, hdr_len;
 	int i;
 
-	hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			       (1ULL << VIRTIO_F_VERSION_1))) ?
-			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
-			sizeof(struct virtio_net_hdr);
+	if (features & (1ULL << VIRTIO_NET_F_HASH_REPORT))
+		hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
+	else if (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+			     (1ULL << VIRTIO_F_VERSION_1)))
+		hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hdr_len = sizeof(struct virtio_net_hdr);
 	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
@@ -1688,6 +1692,10 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			return -EFAULT;
 		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
+		if ((features & ((1ULL << VIRTIO_F_VERSION_1) |
+				 (1ULL << VIRTIO_NET_F_HASH_REPORT))) ==
+		    (1ULL << VIRTIO_NET_F_HASH_REPORT))
+			return -EINVAL;
 		return vhost_net_set_features(n, features);
 	case VHOST_GET_BACKEND_FEATURES:
 		features = VHOST_NET_BACKEND_FEATURES;