From: Mina Almasry
Date: Mon, 5 Aug 2024 21:25:19 +0000
Subject: [PATCH net-next v18 06/14] page_pool: devmem support
Message-ID: <20240805212536.2172174-7-almasrymina@google.com>
In-Reply-To: <20240805212536.2172174-1-almasrymina@google.com>
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org
Cc: Mina Almasry, Donald Hunter, Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni, Jonathan Corbet, Richard Henderson, Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer, "James E.J. Bottomley", Helge Deller, Andreas Larsson, Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann, Steffen Klassert, Herbert Xu, David Ahern, Willem de Bruijn, Shuah Khan, Sumit Semwal, Christian König, Bagas Sanjaya, Christoph Hellwig, Nikolay Aleksandrov, Taehee Yoo, Pavel Begunkov, David Wei, Jason Gunthorpe, Yunsheng Lin, Shailend Chand, Harshitha Ramamurthy, Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi, linux-mm@kvack.org, Matthew Wilcox

Convert netmem to be a union of struct page and struct net_iov. Overload
the LSB of the netmem_ref to indicate that it's a net_iov; otherwise it's
a page.

Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:

struct {
	unsigned long pp_magic;
	struct page_pool *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr;
	atomic_long_t pp_ref_count;
};

Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.

Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.

Signed-off-by: Mina Almasry
Reviewed-by: Pavel Begunkov

---

v17:
- Rename netmem_to_pfn to netmem_pfn_trace (Jakub)
- Move some low level netmem helpers to netmem_priv.h (Jakub).

v13:
- Move NET_IOV dependent changes to this patch.
- Fixed comment (Pavel)
- Applied Reviewed-by from Pavel.

v9: https://lore.kernel.org/netdev/20240403002053.2376017-8-almasrymina@google.com/
- Remove CONFIG checks in netmem_is_net_iov() (Pavel/David/Jens)

v7:
- Remove static_branch_unlikely from netmem_to_net_iov(). We're getting
  better results from the fast path in bench_page_pool_simple tests
  without the static_branch_unlikely, and the addition of
  static_branch_unlikely doesn't improve performance of devmem TCP.

  Additionally, only check netmem_to_net_iov() if
  CONFIG_DMA_SHARED_BUFFER is enabled; otherwise dmabuf net_iovs cannot
  exist anyway.

  net-next base: 8 cycle fast path.
  with static_branch_unlikely: 10 cycle fast path.
  without static_branch_unlikely: 9 cycle fast path.
  CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path as baseline.

  Performance of devmem TCP is at 95% line rate regardless of whether
  static_branch_unlikely is used.

v6:
- Rebased on top of the merged netmem_ref type.
- Rebased on top of the merged skb_pp_frag_ref() changes.

v5:
- Use netmem instead of page* with LSB set.
- Use pp_ref_count for refcounting net_iov.
- Removed many of the custom checks for netmem.

v1:
- Disable fragmentation support for iov properly.
- fix napi_pp_put_page() path (Yunsheng).
- Use pp_frag_count for devmem refcounting.
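
The LSB tagging and field mirroring described in the commit message can be
modeled in userspace C. This is a minimal sketch, not kernel code:
fake_page, fake_net_iov, fake_netmem_ref and the fake_* helpers are
hypothetical stand-ins that model only the mirrored page_pool fields, and
the field accessor reads by offset through memcpy() to stay within ISO C
aliasing rules, whereas the kernel dereferences the untagged pointer
directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins (not the kernel structs). uintptr_t plays the
 * role of the kernel's unsigned long, which is pointer-sized on Linux. */
struct fake_page {
	uintptr_t flags;            /* mm-private word, never mirrored */
	uintptr_t pp_magic;
	void *pp;
	uintptr_t _pp_mapping_pad;
	uintptr_t dma_addr;
	intptr_t pp_ref_count;
};

struct fake_net_iov {
	uintptr_t unused_padding;   /* covers the slot of page->flags */
	uintptr_t pp_magic;
	void *pp;
	void *owner;                /* overlays _pp_mapping_pad */
	uintptr_t dma_addr;
	intptr_t pp_ref_count;
};

/* Compile-time layout check in the spirit of NET_IOV_ASSERT_OFFSET(). */
#define FAKE_ASSERT_OFFSET(field)                                \
	static_assert(offsetof(struct fake_page, field) ==       \
		      offsetof(struct fake_net_iov, field),      \
		      #field " offset mismatch")
FAKE_ASSERT_OFFSET(pp_magic);
FAKE_ASSERT_OFFSET(pp);
FAKE_ASSERT_OFFSET(dma_addr);
FAKE_ASSERT_OFFSET(pp_ref_count);

/* Bit 0 is free for the tag only because both types are >= 2-byte aligned. */
static_assert(_Alignof(struct fake_page) >= 2, "LSB must be free");
static_assert(_Alignof(struct fake_net_iov) >= 2, "LSB must be free");

#define FAKE_NET_IOV ((uintptr_t)1)
typedef uintptr_t fake_netmem_ref;

static inline fake_netmem_ref fake_page_to_ref(struct fake_page *page)
{
	return (fake_netmem_ref)page;                /* pages stay untagged */
}

static inline fake_netmem_ref fake_niov_to_ref(struct fake_net_iov *niov)
{
	return (fake_netmem_ref)niov | FAKE_NET_IOV; /* set LSB for net_iov */
}

static inline bool fake_ref_is_net_iov(fake_netmem_ref ref)
{
	return ref & FAKE_NET_IOV;
}

/* Like netmem_to_page(): never hand a net_iov to page-only code. */
static inline struct fake_page *fake_ref_to_page(fake_netmem_ref ref)
{
	if (fake_ref_is_net_iov(ref))
		return NULL;
	return (struct fake_page *)ref;
}

/* Like netmem_get_pp(): clear the tag, then read the mirrored field at
 * the offset both types share. */
static inline void *fake_ref_get_pp(fake_netmem_ref ref)
{
	const char *base = (const char *)(ref & ~FAKE_NET_IOV);
	void *pp;

	memcpy(&pp, base + offsetof(struct fake_net_iov, pp), sizeof(pp));
	return pp;
}
```

Because the tag lives in bit 0, code that only touches the mirrored fields
can stay type-agnostic, while anything that needs mm-private state has to
go through the page conversion, which rejects net_iovs.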

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox
---
 include/net/netmem.h             | 108 +++++++++++++++++++++++++++++--
 include/net/page_pool/helpers.h  |  12 ++--
 include/trace/events/page_pool.h |  12 ++--
 net/core/devmem.c                |   3 +
 net/core/netmem_priv.h           |  36 +++++++++++
 net/core/page_pool.c             |  38 ++++++-----
 net/core/skbuff.c                |  23 ++++---
 7 files changed, 181 insertions(+), 51 deletions(-)
 create mode 100644 net/core/netmem_priv.h

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 664df8325ece5..99531780e53af 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -9,14 +9,51 @@
 #define _NET_NETMEM_H
 
 #include
+#include
 
 /* net_iov */
 
+DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
+
+/* We overload the LSB of the struct page pointer to indicate whether it's
+ * a page or net_iov.
+ */
+#define NET_IOV 0x01UL
+
 struct net_iov {
+	unsigned long __unused_padding;
+	unsigned long pp_magic;
+	struct page_pool *pp;
 	struct dmabuf_genpool_chunk_owner *owner;
 	unsigned long dma_addr;
+	atomic_long_t pp_ref_count;
 };
 
+/* These fields in struct page are used by the page_pool and net stack:
+ *
+ * struct {
+ *	unsigned long pp_magic;
+ *	struct page_pool *pp;
+ *	unsigned long _pp_mapping_pad;
+ *	unsigned long dma_addr;
+ *	atomic_long_t pp_ref_count;
+ * };
+ *
+ * We mirror the page_pool fields here so the page_pool can access these fields
+ * without worrying whether the underlying fields belong to a page or net_iov.
+ *
+ * The non-net stack fields of struct page are private to the mm stack and must
+ * never be mirrored to net_iov.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
+	static_assert(offsetof(struct page, pg) == \
+		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
+NET_IOV_ASSERT_OFFSET(pp, pp);
+NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
+NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+#undef NET_IOV_ASSERT_OFFSET
+
 static inline struct dmabuf_genpool_chunk_owner *
 net_iov_owner(const struct net_iov *niov)
 {
@@ -47,20 +84,22 @@ net_iov_binding(const struct net_iov *niov)
  */
 typedef unsigned long __bitwise netmem_ref;
 
+static inline bool netmem_is_net_iov(const netmem_ref netmem)
+{
+	return (__force unsigned long)netmem & NET_IOV;
+}
+
 /* This conversion fails (returns NULL) if the netmem_ref is not struct page
  * backed.
- *
- * Currently struct page is the only possible netmem, and this helper never
- * fails.
  */
 static inline struct page *netmem_to_page(netmem_ref netmem)
 {
+	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
+		return NULL;
+
 	return (__force struct page *)netmem;
 }
 
-/* Converting from page to netmem is always safe, because a page can always be
- * a netmem.
- */
 static inline netmem_ref page_to_netmem(struct page *page)
 {
 	return (__force netmem_ref)page;
@@ -68,17 +107,72 @@ static inline netmem_ref page_to_netmem(struct page *page)
 
 static inline int netmem_ref_count(netmem_ref netmem)
 {
+	/* The non-pp refcount of net_iov is always 1. On net_iov, we only
+	 * support pp refcounting which uses the pp_ref_count field.
+	 */
+	if (netmem_is_net_iov(netmem))
+		return 1;
+
 	return page_ref_count(netmem_to_page(netmem));
 }
 
-static inline unsigned long netmem_to_pfn(netmem_ref netmem)
+static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
 {
+	if (netmem_is_net_iov(netmem))
+		return 0;
+
 	return page_to_pfn(netmem_to_page(netmem));
 }
 
+static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
+{
+	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
+}
+
+static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->pp;
+}
+
+static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
+{
+	return &__netmem_clear_lsb(netmem)->pp_ref_count;
+}
+
+static inline bool netmem_is_pref_nid(netmem_ref netmem, int pref_nid)
+{
+	/* Assume net_iov are on the preferred node without actually
+	 * checking...
+	 *
+	 * This check is only used to check for recycling memory in the page
+	 * pool's fast paths. Currently the only implementation of net_iov
+	 * is dmabuf device memory. It's a deliberate decision by the user to
+	 * bind a certain dmabuf to a certain netdev, and the netdev rx queue
+	 * would not be able to reallocate memory from another dmabuf that
+	 * exists on the preferred node, so, this check doesn't make much sense
+	 * in this case. Assume all net_iovs can be recycled for now.
+	 */
+	if (netmem_is_net_iov(netmem))
+		return true;
+
+	return page_to_nid(netmem_to_page(netmem)) == pref_nid;
+}
+
 static inline netmem_ref netmem_compound_head(netmem_ref netmem)
 {
+	/* niov are never compounded */
+	if (netmem_is_net_iov(netmem))
+		return netmem;
+
 	return page_to_netmem(compound_head(netmem_to_page(netmem)));
 }
 
+static inline void *netmem_address(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem))
+		return NULL;
+
+	return page_address(netmem_to_page(netmem));
+}
+
 #endif /* _NET_NETMEM_H */
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 8f27ecc00bb16..50c1efd4018fd 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -216,7 +216,7 @@ page_pool_get_dma_dir(const struct page_pool *pool)
 
 static inline void page_pool_fragment_netmem(netmem_ref netmem, long nr)
 {
-	atomic_long_set(&netmem_to_page(netmem)->pp_ref_count, nr);
+	atomic_long_set(netmem_get_pp_ref_count_ref(netmem), nr);
 }
 
 /**
@@ -244,7 +244,7 @@ static inline void page_pool_fragment_page(struct page *page, long nr)
 
 static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 {
-	struct page *page = netmem_to_page(netmem);
+	atomic_long_t *pp_ref_count = netmem_get_pp_ref_count_ref(netmem);
 	long ret;
 
 	/* If nr == pp_ref_count then we have cleared all remaining
@@ -261,19 +261,19 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 	 * initially, and only overwrite it when the page is partitioned into
 	 * more than one piece.
 	 */
-	if (atomic_long_read(&page->pp_ref_count) == nr) {
+	if (atomic_long_read(pp_ref_count) == nr) {
 		/* As we have ensured nr is always one for constant case using
 		 * the BUILD_BUG_ON(), only need to handle the non-constant case
 		 * here for pp_ref_count draining, which is a rare case.
 		 */
 		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
 		if (!__builtin_constant_p(nr))
-			atomic_long_set(&page->pp_ref_count, 1);
+			atomic_long_set(pp_ref_count, 1);
 
 		return 0;
 	}
 
-	ret = atomic_long_sub_return(nr, &page->pp_ref_count);
+	ret = atomic_long_sub_return(nr, pp_ref_count);
 	WARN_ON(ret < 0);
 
 	/* We are the last user here too, reset pp_ref_count back to 1 to
@@ -282,7 +282,7 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 	 * page_pool_unref_page() currently.
 	 */
 	if (unlikely(!ret))
-		atomic_long_set(&page->pp_ref_count, 1);
+		atomic_long_set(pp_ref_count, 1);
 
 	return ret;
 }
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 543e54e432a18..31825ed300324 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -57,12 +57,12 @@ TRACE_EVENT(page_pool_state_release,
 		__entry->pool		= pool;
 		__entry->netmem		= (__force unsigned long)netmem;
 		__entry->release	= release;
-		__entry->pfn		= netmem_to_pfn(netmem);
+		__entry->pfn		= netmem_pfn_trace(netmem);
 	),
 
-	TP_printk("page_pool=%p netmem=%p pfn=0x%lx release=%u",
+	TP_printk("page_pool=%p netmem=%p is_net_iov=%lu pfn=0x%lx release=%u",
 		  __entry->pool, (void *)__entry->netmem,
-		  __entry->pfn, __entry->release)
+		  __entry->netmem & NET_IOV, __entry->pfn, __entry->release)
 );
 
 TRACE_EVENT(page_pool_state_hold,
@@ -83,12 +83,12 @@ TRACE_EVENT(page_pool_state_hold,
 		__entry->pool	= pool;
 		__entry->netmem	= (__force unsigned long)netmem;
 		__entry->hold	= hold;
-		__entry->pfn	= netmem_to_pfn(netmem);
+		__entry->pfn	= netmem_pfn_trace(netmem);
 	),
 
-	TP_printk("page_pool=%p netmem=%p pfn=0x%lx hold=%u",
+	TP_printk("page_pool=%p netmem=%p is_net_iov=%lu, pfn=0x%lx hold=%u",
 		  __entry->pool, (void *)__entry->netmem,
-		  __entry->pfn, __entry->hold)
+		  __entry->netmem & NET_IOV, __entry->pfn, __entry->hold)
 );
 
 TRACE_EVENT(page_pool_update_nid,
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 3f73d0bda023f..befff59a2ee64 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -80,7 +80,10 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 	index = offset / PAGE_SIZE;
 	niov = &owner->niovs[index];
 
+	niov->pp_magic = 0;
+	niov->pp = NULL;
 	niov->dma_addr = 0;
+	atomic_long_set(&niov->pp_ref_count, 0);
 
 	return niov;
 }
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
new file mode 100644
index 0000000000000..d18eaca38cdaa
--- /dev/null
+++ b/net/core/netmem_priv.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __NETMEM_PRIV_H
+#define __NETMEM_PRIV_H
+
+static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->pp_magic;
+}
+
+static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
+{
+	__netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
+}
+
+static inline void netmem_clear_pp_magic(netmem_ref netmem)
+{
+	__netmem_clear_lsb(netmem)->pp_magic = 0;
+}
+
+static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
+{
+	__netmem_clear_lsb(netmem)->pp = pool;
+}
+
+static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->dma_addr;
+}
+
+static inline void netmem_set_dma_addr(netmem_ref netmem,
+				       unsigned long dma_addr)
+{
+	__netmem_clear_lsb(netmem)->dma_addr = dma_addr;
+}
+#endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a032f731d4146..c5c303746d494 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -25,6 +25,9 @@
 #include
 
 #include "page_pool_priv.h"
+#include "netmem_priv.h"
+
+DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
 
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
@@ -358,7 +361,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 		if (unlikely(!netmem))
 			break;
 
-		if (likely(page_to_nid(netmem_to_page(netmem)) == pref_nid)) {
+		if (likely(netmem_is_pref_nid(netmem, pref_nid))) {
 			pool->alloc.cache[pool->alloc.count++] = netmem;
 		} else {
 			/* NUMA mismatch;
@@ -454,10 +457,8 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 
 static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page->pp = pool;
-	page->pp_magic |= PP_SIGNATURE;
+	netmem_set_pp(netmem, pool);
+	netmem_or_pp_magic(netmem, PP_SIGNATURE);
 
 	/* Ensuring all pages have been split into one fragment initially:
 	 * page_pool_set_pp_info() is only called once for every page when it
@@ -472,10 +473,8 @@ static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 
 static void page_pool_clear_pp_info(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page->pp_magic = 0;
-	page->pp = NULL;
+	netmem_clear_pp_magic(netmem);
+	netmem_set_pp(netmem, NULL);
 }
 
 static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
@@ -692,8 +691,9 @@ static bool page_pool_recycle_in_cache(netmem_ref netmem,
 
 static bool __page_pool_page_can_be_recycled(netmem_ref netmem)
 {
-	return page_ref_count(netmem_to_page(netmem)) == 1 &&
-	       !page_is_pfmemalloc(netmem_to_page(netmem));
+	return netmem_is_net_iov(netmem) ||
+	       (page_ref_count(netmem_to_page(netmem)) == 1 &&
+		!page_is_pfmemalloc(netmem_to_page(netmem)));
 }
 
 /* If the page refcnt == 1, this will try to recycle the page.
@@ -728,6 +728,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 		/* Page found as candidate for recycling */
 		return netmem;
 	}
+
 	/* Fallback/non-XDP mode: API user have elevated refcnt.
 	 *
 	 * Many drivers split up the page into fragments, and some
@@ -949,7 +950,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 	/* Empty recycle ring */
 	while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) {
 		/* Verify the refcnt invariant of cached pages */
-		if (!(page_ref_count(netmem_to_page(netmem)) == 1))
+		if (!(netmem_ref_count(netmem) == 1))
 			pr_crit("%s() page_pool refcnt %d violation\n",
 				__func__, netmem_ref_count(netmem));
 
@@ -1102,9 +1103,7 @@ EXPORT_SYMBOL(page_pool_update_nid);
 
 dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	dma_addr_t ret = page->dma_addr;
+	dma_addr_t ret = netmem_get_dma_addr(netmem);
 
 	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
 		ret <<= PAGE_SHIFT;
@@ -1115,18 +1114,17 @@ EXPORT_SYMBOL(page_pool_get_dma_addr_netmem);
 
 bool page_pool_set_dma_addr_netmem(netmem_ref netmem, dma_addr_t addr)
 {
-	struct page *page = netmem_to_page(netmem);
-
 	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
-		page->dma_addr = addr >> PAGE_SHIFT;
+		netmem_set_dma_addr(netmem, addr >> PAGE_SHIFT);
 
 		/* We assume page alignment to shave off bottom bits,
 		 * if this "compression" doesn't work we need to drop.
 		 */
-		return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
+		return addr != (dma_addr_t)netmem_get_dma_addr(netmem)
+				       << PAGE_SHIFT;
 	}
 
-	page->dma_addr = addr;
+	netmem_set_dma_addr(netmem, addr);
 	return false;
 }
 EXPORT_SYMBOL(page_pool_set_dma_addr_netmem);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index de2a044cc6656..9e2b283427934 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,7 @@
 
 #include "dev.h"
 #include "sock_destructor.h"
+#include "netmem_priv.h"
 
 #ifdef CONFIG_SKB_EXTENSIONS
 static struct kmem_cache *skbuff_ext_cache __ro_after_init;
@@ -920,9 +921,9 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
-static bool is_pp_page(struct page *page)
+static bool is_pp_netmem(netmem_ref netmem)
 {
-	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
+	return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
 }
 
 int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
@@ -1020,9 +1021,7 @@ EXPORT_SYMBOL(skb_cow_data_for_xdp);
 #if IS_ENABLED(CONFIG_PAGE_POOL)
 bool napi_pp_put_page(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page = compound_head(page);
+	netmem = netmem_compound_head(netmem);
 
 	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
 	 * in order to preserve any existing bits, such as bit 0 for the
@@ -1031,10 +1030,10 @@ bool napi_pp_put_page(netmem_ref netmem)
 	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
 	 * to avoid recycling the pfmemalloc page.
 	 */
-	if (unlikely(!is_pp_page(page)))
+	if (unlikely(!is_pp_netmem(netmem)))
 		return false;
 
-	page_pool_put_full_netmem(page->pp, page_to_netmem(page), false);
+	page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
 
 	return true;
 }
@@ -1061,7 +1060,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
 static int skb_pp_frag_ref(struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo;
-	struct page *head_page;
+	netmem_ref head_netmem;
 	int i;
 
 	if (!skb->pp_recycle)
@@ -1070,11 +1069,11 @@ static int skb_pp_frag_ref(struct sk_buff *skb)
 	shinfo = skb_shinfo(skb);
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
-		head_page = compound_head(skb_frag_page(&shinfo->frags[i]));
-		if (likely(is_pp_page(head_page)))
-			page_pool_ref_page(head_page);
+		head_netmem = netmem_compound_head(shinfo->frags[i].netmem);
+		if (likely(is_pp_netmem(head_netmem)))
+			page_pool_ref_netmem(head_netmem);
 		else
-			page_ref_inc(head_page);
+			page_ref_inc(netmem_to_page(head_netmem));
 	}
 	return 0;
 }
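
When PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA is true,
page_pool_set_dma_addr_netmem() stores the DMA address shifted right by
PAGE_SHIFT in the mirrored dma_addr word and reports failure if shifting it
back does not reproduce the original address. A minimal userspace sketch of
that round-trip check, assuming 4 KiB pages and a 32-bit storage word; the
sketch_* names and SKETCH_PAGE_SHIFT are hypothetical and not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB pages and a 32-bit storage word,
 * as on a 32-bit architecture whose DMA addresses are 64-bit. */
#define SKETCH_PAGE_SHIFT 12

/* 32 bits of page frame number cover 2^(32 + 12) = 2^44 bytes. */
static_assert(32 + SKETCH_PAGE_SHIFT == 44, "addressable DMA range");

struct sketch_mem {
	uint32_t dma_addr;  /* stands in for the mirrored dma_addr field */
};

/* Store addr >> PAGE_SHIFT; return true if the value does not
 * round-trip, i.e. the caller would have to drop the buffer. */
static inline bool sketch_set_dma_addr(struct sketch_mem *m, uint64_t addr)
{
	m->dma_addr = (uint32_t)(addr >> SKETCH_PAGE_SHIFT);
	return addr != ((uint64_t)m->dma_addr << SKETCH_PAGE_SHIFT);
}

static inline uint64_t sketch_get_dma_addr(const struct sketch_mem *m)
{
	return (uint64_t)m->dma_addr << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions, page-aligned addresses up to 2^44 - 4096
round-trip; an unaligned address, or any address at or above 2^44, is
rejected because the low bits or the truncated high bits are lost.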