From patchwork Thu Feb 27 04:12:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 13993685 Received: from mail-ot1-f73.google.com (mail-ot1-f73.google.com [209.85.210.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7EA001D90DB for ; Thu, 27 Feb 2025 04:12:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740629535; cv=none; b=Z1GU2zmOb2rfufIAC0avLeD037RiZxs41BSUJElCkxNkr3IfvYRF9C9QF0wdB60mxIiEjrk3nSBobLdZ1Nc1hkTEoBgKHz/Nn5xiET6dNSK0GLqIXrYcE9hcVTB4SvbA1pkMuUzE6mylDVbjCojN9+HgjnoHAw7hBejY5kkM4fU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740629535; c=relaxed/simple; bh=Z1x9cBAckNAGgXsh1KnLoyHiDSqdyqilemzDU/Dtrzw=; h=Date:Mime-Version:Message-ID:Subject:From:To:Cc:Content-Type; b=OeuLBrgTzIYcsOKgLUCUW1Oau0Y7dVgtN+dhfTOxqghjd/LN38NdHQ4QQmCs2oI2PalJJrqH0zO4OoWJQhDIkQLuEvm/8zDJnFFvFcvNkL7rqHNOjectnXHnNsbIsD2rRsNr3d0ommmGOaImxpnp63L0nY8GR0Eo3y7spMXOau0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=PVKUrzeV; arc=none smtp.client-ip=209.85.210.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="PVKUrzeV" Received: by mail-ot1-f73.google.com with SMTP id 46e09a7af769-728a436797bso328919a34.1 for ; Wed, 26 Feb 2025 20:12:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740629531; x=1741234331; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=3g5LjBldgpfT3/dJj9/PNfchsdoxXJ5s3FXfsdFv20U=; b=PVKUrzeVeAtM5IhujWcTBhshzUSqPoLhuQuTgfeEL9bTeULUUCUm/X/ugp8keQSIIi A0OPz33GYAxh6jleS8fbW6Cyg7Ub1LfiHqcrdUe/XfgMb1RyCj+irFj8ADTmkmBfnSGp 64adCads07Zb6YWcRXyvpLYk+OAwOV58zImpwSILI5/v6FiGadb0C26hSGMWFjmfcn7D 4eqT5E4etlVF52/QgAMtxViMk5su/oVk+Hsd4PWy30I/BIDkyyG7fW+GbWkt+BdJvGHj DZ8GqyaL85GHtoNetXUuIdjsbhSUTuHALBs6qVaWTOm+G3Bf4ilauI83scBI6uMTBKVX dbdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740629531; x=1741234331; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=3g5LjBldgpfT3/dJj9/PNfchsdoxXJ5s3FXfsdFv20U=; b=p6CTO+x6RZBYajkkuLgNzRtyTrVKYEv8t+HqoXBWPnR0FsAnhXjrGal5qIBoRcALXv /u/AaxOWQJNbCVEYslpdBYO3rQpuCDXwMzAmAr6hSMBPg6Uevkg5yVjz3z9C4NU6SyWl BLsjfZWLHJ0YbrhUaUWvF4N8oQk8mL6YuzDYg2WLHRpgDTXeghRDvtyIcWOUyDnDU8Ve npBEhuqszShIM595C+o+7tv9RhPrb9QZbmPsOzaE+VrzBxOpoBS/8O3HsVL20ToMqZy3 QuHZH+yujLqUST9wWMHzuIIacGQS4vozQoFNC2wyFz32o8mEaO3L1dh1lK1Yyb1qeYWe obKg== X-Forwarded-Encrypted: i=1; AJvYcCVCkH6MWWOWYieGirdTfjGs3SEJfqqWpqZhTrLUE59UahSnL1JjQ7zt5w+b4t6KvJCQCc9JGvEZuvUTHCBC4e4=@vger.kernel.org X-Gm-Message-State: AOJu0YxcAL7zxGKbspF18zFvIzrcFIPEi7A5UF26TXAn2RiBq3eGvWyb FHinHJfrUMA/GeVdjInm2nDp+dvWsp/8IqpU7zrg25PTudOKYwwYFYetoAS4rfdbBrvLHIbaeOW wBgKu1jYKREoaT7KVIX/C3g== X-Google-Smtp-Source: AGHT+IH783qYzy8zqnAN9qy+ZgGK9lxDY+qeDhB++EerNVKBi0iW0nYVUAd0UY8jwXAselzgbS5EQUX7E0eXV9iZuQ== X-Received: from oabxe11.prod.google.com ([2002:a05:6870:ce8b:b0:295:f44d:8dfa]) (user=almasrymina job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6830:6a90:b0:727:2f79:ce32 with SMTP id 46e09a7af769-7274c563d40mr15643474a34.22.1740629531553; Wed, 26 Feb 2025 20:12:11 -0800 (PST) Date: Thu, 27 Feb 2025 04:12:01 +0000 Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Mailer: git-send-email 2.48.1.658.g4767266eb4-goog Message-ID: <20250227041209.2031104-1-almasrymina@google.com> Subject: [PATCH net-next v6 0/8] Device memory TCP TX From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux.dev, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Jonathan Corbet , Andrew Lunn , Jeroen de Borst , Harshitha Ramamurthy , Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Neal Cardwell , "Michael S. Tsirkin" , Jason Wang , Xuan Zhuo , " =?utf-8?q?Eugenio_P=C3=A9rez?= " , Stefan Hajnoczi , Stefano Garzarella , Shuah Khan , sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim , Victor Nogueira , Pedro Tammela , Samiullah Khawaja v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.com/ === v6 has no major changes. Addressed a few issues from Paolo and David, and collected Acks from Stan. Thank you everyone for the review! Changes: - retain behavior to process MSG_FASTOPEN even if the provided cmsg is invalid (Paolo). - Rework the freeing of tx_vec slightly (it now has its own err label). (Paolo). - Squash the commit that makes dmabuf unbinding scheduled work into the same one which implements the TX path so we don't run into future errors on bisecting (Paolo). - Fix/add comments to explain how dmabuf binding refcounting works (David). v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.com/ === v5 has no major changes; it clears up the relatively minor issues pointed out to in v4, and rebases the series on top of net-next to resolve the conflict with a patch that raced to the tree. It also collects the review tags from v4. Changes: - Rebase to net-next - Fix issues in selftest (Stan). - Address comments in the devmem and netmem driver docs (Stan and Bagas) - Fix zerocopy_fill_skb_from_devmem return error code (Stan). v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.com/ === v4 mainly addresses the critical driver support issue surfaced in v3 by Paolo and Stan. Drivers aiming to support netmem_tx should make sure not to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs may come from dma-bufs. Additionally other feedback from v3 is addressed. Major changes: - Add helpers to handle netmem dma-addrs. Add GVE support for netmem_tx. - Fix binding->tx_vec not being freed on error paths during the tx binding. - Add a minimal devmem_tx test to devmem.py. - Clean up everything obsolete from the cover letter (Paolo). v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=* === Address minor comments from RFCv2 and fix a few build warnings and ynl-regen issues. No major changes. RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=* ======= RFC v2 addresses much of the feedback from RFC v1. I plan on sending something close to this as net-next reopens, sending it slightly early to get feedback if any. Major changes: -------------- - much improved UAPI as suggested by Stan. We now interpret the iov_base of the passed in iov from userspace as the offset into the dmabuf to send from. This removes the need to set iov.iov_base = NULL which may be confusing to users, and enables us to send multiple iovs in the same sendmsg() call. ncdevmem and the docs show a sample use of that. - Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think this is good improvment as it was confusing to keep track of 2 iterators for the same sendmsg, and mistracking both iterators caused a couple of bugs reported in the last iteration that are now resolved with this streamlining. - Improved test coverage in ncdevmem. Now multiple sendmsg() are tested, and sending multiple iovs in the same sendmsg() is tested. - Fixed issue where dmabuf unmapping was happening in invalid context (Stan). ==================================================================== The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1]. Full outline on usage of the TX path is detailed in the documentation included with this series. Test example is available via the kselftest included in the series as well. The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation. Patch Overview: --------------- 1. Documentation & tests to give high level overview of the feature being added. 1. Add netmem refcounting needed for the TX path. 2. Devmem TX netlink API. 3. Devmem TX net stack implementation. 4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep. 5. Add devmem TX documentation. 6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support. 7. Guard netmem_tx against being enabled against drivers that don't support it. 8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py. Testing: -------- Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augemented with client functionality to test TX path. * Test Setup: Kernel: net-next with this RFC and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running. Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit. [1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.com/ [2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#m066dd407fbed108828e2c40ae50e3f4376ef57fd Cc: sdf@fomichev.me Cc: asml.silence@gmail.com Cc: dw@davidwei.uk Cc: Jamal Hadi Salim Cc: Victor Nogueira Cc: Pedro Tammela Cc: Samiullah Khawaja Mina Almasry (7): net: add get_netmem/put_netmem support net: devmem: Implement TX path net: add devmem TCP TX documentation net: enable driver support for netmem TX gve: add netmem TX support to GVE DQO-RDA mode net: check for driver support in netmem TX selftests: ncdevmem: Implement devmem TCP TX Stanislav Fomichev (1): net: devmem: TCP tx netlink api Documentation/netlink/specs/netdev.yaml | 12 + Documentation/networking/devmem.rst | 150 ++++++++- .../networking/net_cachelines/net_device.rst | 1 + Documentation/networking/netdev-features.rst | 5 + Documentation/networking/netmem.rst | 23 +- drivers/net/ethernet/google/gve/gve_main.c | 4 + drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +- include/linux/netdevice.h | 2 + include/linux/skbuff.h | 17 +- include/linux/skbuff_ref.h | 4 +- include/net/netmem.h | 23 ++ include/net/sock.h | 1 + include/uapi/linux/netdev.h | 1 + net/core/datagram.c | 48 ++- net/core/dev.c | 3 + net/core/devmem.c | 115 ++++++- net/core/devmem.h | 77 ++++- net/core/netdev-genl-gen.c | 13 + net/core/netdev-genl-gen.h | 1 + net/core/netdev-genl.c | 73 ++++- net/core/skbuff.c | 48 ++- net/core/sock.c | 6 + net/ipv4/ip_output.c | 3 +- net/ipv4/tcp.c | 50 ++- net/ipv6/ip6_output.c | 3 +- net/vmw_vsock/virtio_transport_common.c | 5 +- tools/include/uapi/linux/netdev.h | 1 + .../selftests/drivers/net/hw/devmem.py | 26 +- .../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++- 29 files changed, 950 insertions(+), 73 deletions(-) base-commit: 80c4a0015ce249cf0917a04dbb3cc652a6811079