From patchwork Wed Apr 3 00:20:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 13614773 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C276C6FD1F for ; Wed, 3 Apr 2024 00:21:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2A7CA6B0087; Tue, 2 Apr 2024 20:21:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 231506B0089; Tue, 2 Apr 2024 20:21:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 05C476B008C; Tue, 2 Apr 2024 20:21:15 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id CD98A6B0089 for ; Tue, 2 Apr 2024 20:21:15 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 88675160AE3 for ; Wed, 3 Apr 2024 00:21:15 +0000 (UTC) X-FDA: 81966316110.29.5F7E64E Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf09.hostedemail.com (Postfix) with ESMTP id 9BE8F14001C for ; Wed, 3 Apr 2024 00:21:13 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=nr9aBx9z; spf=pass (imf09.hostedemail.com: domain of 3-KAMZgsKCKMDOPDVUbPLQDJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--almasrymina.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3-KAMZgsKCKMDOPDVUbPLQDJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1712103673; a=rsa-sha256; cv=none; b=ruXrDl70IVzn9wdsdwtM4GQGkczdTzKoOJ4qHKIuSYEO0Ph4UDMKZ4yiQeYDkOpE0/HdLU 0Zpo5Kbx58inO0D7VHdPo5Txk1WnoPvQExAJcP95ElKMZMPTpmn2pTCw7RBhpJCZkztolp ASDu/Ih1ojFZC2LplSl/fRcAYsoy0HY= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=nr9aBx9z; spf=pass (imf09.hostedemail.com: domain of 3-KAMZgsKCKMDOPDVUbPLQDJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--almasrymina.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3-KAMZgsKCKMDOPDVUbPLQDJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1712103673; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uMYateSukf7H0oOJ+C+iNu7/xSffwSrWgNekXtbJjIw=; b=T38tahOvlNfKkbDdQkZnSKsJQuwhG4yHLy5qHzRpzyap+Y0B4R2IcagD7Rn1ajOz+TUOm/ AQK6pnZ1AaNhZ2GPpzYNtKD7YWLws+vtaCUhHmzqa9qq94SUEolWMYtksNUdxkQ5x9DxZw tKs8oJHrTDZqjiUkochQR9Tc7Eoo3mI= Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-60ccc3cfa39so91473667b3.2 for ; Tue, 02 Apr 2024 17:21:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1712103672; x=1712708472; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=uMYateSukf7H0oOJ+C+iNu7/xSffwSrWgNekXtbJjIw=; b=nr9aBx9z6uaRBvG0abC6Jq7rMUWCwC82AFvB0C0+hqg66M5lqKMOaSEPtXgN1377kS 3Q05KqChp7EKPp0WNUxzLORAnXqpDjWUexnwmltKXi7zc4Jei+Ff3AW1bbw/5RuDp+1m GfTjCiadHcYpidq7dOmwPkvR5HZN6A24z6irbAkp2lGK3xpHbACdZOO8TxE5gPczkMvs Y+m86iuvj59HIqVeYPC84GeTMbuTkHqgK5ypQmYO9UWG4JVvKgU4PnId/kzwnfky/Jc0 DfSYcKMkWAyWboly8DdhOvPVZ7xWVSCrCi8YWuV6vhADat2iG4DcBslixZEJYa4xZNKc lbKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712103672; x=1712708472; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uMYateSukf7H0oOJ+C+iNu7/xSffwSrWgNekXtbJjIw=; b=JnkAeL3lsNLv77OZ8oeXgBAWpSBtBFJUD6L0W4kIS9Xq4Q5bpbw6aHSHXmfuDOn/eO Xk6LqDUZ9SUNlVH4/dhJBLLc7fxk2GJB9kQpfIclBCJpjQRKk2xLmvhGC1msrIWlp2al tmnbzqPhSk9gDi7ye8ZBpc/wsso1qrsr1KucXXKnqZdwGmysFhltc8R07E2s64ucvQHf wGPxNXrku5oKLNNgw/8WeOfOwwasJb9wGprbPkG3duSMpg1qe8CVqLqLMvrLekf5l7HT r5vLnvIj/SIeMukoX7cUL7XOT3FusGKd+kZTIRqIXnGRyA4TKX7ylVpPvuqDx6K0PsT+ IiHg== X-Forwarded-Encrypted: i=1; AJvYcCVx5zKhE1OSTFtLFbFzIZvtpca3nreJ6F/6TIjv2PGe6GrzR9F+p/0zZc75E0j6pFSfVTeD2PTY5hs2bM9cvpl6bIs= X-Gm-Message-State: AOJu0Yy4kkv0/FAKOwKqdbvwhkWzPARNr38nc0cx/FUziF9kcYMAdjIi FgkNHsrKIfZpRSdOuID/fCwPDQEIK0P0IwhFVC3eaMFAUhkD+rzLIpPtJTvgO6ZJuHZYb7suiRv GmXzWpfdm7aY0jXu1x72nXw== X-Google-Smtp-Source: AGHT+IE29e4Cqcglhku7rywClT9Csi6OiqnXhu3lkv6USU0uJOVELHcNpLBCqb3h2v/a9XjCg26VZiSI3g7d13RlKA== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2c4:200:1726:7620:90a1:78b9]) (user=almasrymina job=sendgmr) by 2002:a81:9255:0:b0:615:8a1:9a94 with SMTP id j82-20020a819255000000b0061508a19a94mr1086916ywg.0.1712103672382; Tue, 02 Apr 2024 17:21:12 -0700 (PDT) Date: Tue, 2 Apr 2024 17:20:43 -0700 In-Reply-To: <20240403002053.2376017-1-almasrymina@google.com> Mime-Version: 1.0 References: <20240403002053.2376017-1-almasrymina@google.com> X-Mailer: git-send-email 2.44.0.478.gd926399ef9-goog Message-ID: <20240403002053.2376017-7-almasrymina@google.com> Subject: [RFC PATCH net-next v8 06/14] page_pool: convert to use netmem From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org Cc: Mina Almasry , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jonathan Corbet , Richard Henderson , Ivan Kokshaysky , Matt Turner , Thomas Bogendoerfer , "James E.J. Bottomley" , Helge Deller , Andreas Larsson , Jesper Dangaard Brouer , Ilias Apalodimas , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Arnd Bergmann , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Steffen Klassert , Herbert Xu , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Amritha Nambiar , Maciej Fijalkowski , Alexander Mikhalitsyn , Kaiyuan Zhang , Christian Brauner , Simon Horman , David Howells , Florian Westphal , Yunsheng Lin , Kuniyuki Iwashima , Jens Axboe , Arseniy Krasnov , Aleksander Lobakin , Michael Lass , Jiri Pirko , Sebastian Andrzej Siewior , Lorenzo Bianconi , Richard Gobert , Sridhar Samudrala , Xuan Zhuo , Johannes Berg , Abel Wu , Breno Leitao , Pavel Begunkov , David Wei , Jason Gunthorpe , Shailend Chand , Harshitha Ramamurthy , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi , linux-mm@kvack.org, Matthew Wilcox X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 9BE8F14001C X-Stat-Signature: n5n43cpjb68e8jrdx3xwhdka48roup9f X-Rspam-User: X-HE-Tag: 1712103673-653131 X-HE-Meta: U2FsdGVkX19yLcIzynkcDlJD+VjHMm2RkBnN252EaTXeMSoqymFh67ouhNDrmb8IVyYSZ18n3tr8EZT8GOXOArew0IyM7rGOBMkKosYKdWZeAIBUEOKox47SCfTWk2oHiSarE3gck5W+j8gi6HoObOci8IvQXJ0mUKY+e/8vx4MciBbm9AHPzK4+pDi/leWfJj3kRsKAePubAk/7rArB/2YUfv7fI9DW0fGj6JM4XSH/9esi8xU/UPxnAiqpPLga/zYHHNVww3rEMX+g3L4apxq7obC4QEQE6+O6GTVGMqVJgggxUIsqGQopn2aOKXLof7CpTWbdocSzD+071cQhJHCY+8Y6h7Wy0w6sAowWb6PX5O76Dns3/wNYHifMijcpiON67T6s7XB0n1DacJ/9vT4NL5st718ejBsV8dhVC7lSDysf7i3XkKmUZ1tmCwxsC33XBuBRN42o/VeNj6GJDXLo/E0Q9NXTEngCO3R8M9NIZx6eY5jHgjZm57ULBGE/2vGrbA7MR6VZaybYoIHqj9iBwQdSq43GtolyNwwMqBvm415GlU8v3dxnBC8Y9nWghU/zYKr3BKeYMlTK/wcTZtIHwxNuKw/Fy8lUUWdiZ6WIlr+B2+wQvCXTIpYCEYFKR6D1VWzyIf6i19zwqobs5nndArdfq5DuXj9/aaSSsFxpXS5oNg2bSyx4+pMdwBoeTTfSJ37PQDM+ui19MM4MlRTMC4mlMa4F1xwy+EnpaiHhDnfr6prgZjYMY+bisNhJaf+V86I6wEtpA5OEUGiQJPOOk8qnaVKAlCd6x3N6tc53H5A3/s3cCrzx4jo4+Wq+h4bvBlyP9AnflyCZ3bPd6OJJ9pgulGBU0VsDItAKOEdJ/WUciR6s/S6isJfT6HSlng7OryIXUChHE9DlyCDhZxgPG0BC+p+28jdPCafdlXoABcQbhTFeNlrDmzCv+5yZWy71JrAosF0lS4B5EFU Q6P+fZwK Admbs9hRuPq5eGPV1o2Pm9ir/QuOFqsl9IZJDIds20ECQs/l9UG/PRTZaVe3asFNdmJ4ErqjBlYXyOlRzp7S10wxSwfeLt79zc8QWwJxsqi4NBjxm8wFG4raEjxadlHXW8rz6Uwx2GskQE+OLeF/ztCpfhOFk1VdyNedzJNO6lQGgGmr/qncArJiNgOuohXo89QHUGmvqRb/BTh56zuNc3p7hBFrw7z27duFmxD7Vz+P0H4vZT/XgLMNAmAoEIwH/lSeZSy0GXjrq1Ic9PuzOSVQr2N6cgPa3lINPAnrQjBNlnjtd+AC9agWJkAxeiqlcniVUb6B/6UfGcXGrvbpXomjqrSDmH8Oi4X8FxNtfULJW1yNGElaokGLuTAeSTabQGCZQCQXfl8fWYOOZNEIeKXS2myG8gqhEa5U4Z0RccAUDHNI5xaA/rUaIX0ZncMYsrevNE+upRm+IZ46EWVeGARf25/Lk5CrxfbCZ32Hgxh6sKAryJ/+8nJSJL3KKDWtnuKad6e2dYlCHA7XVq/yzfMZGlHWAsn3GSwwqZgNCCVuDUS8Cb8Mk0TmO0XVUPrD2r3Lj+uQiDdnPbozfp6DUqw6Qkqg5s/2+sHBosO/O7F14T8J4lSawc7U58j+HXbr1FU06lbX668cGUCyEPJI90FXPHidKn69G7n56PgvuETZb3r3t8+Lb3OaCgxfuLxC6KzBP+esV+o4bDVVAAPOO4jyBfy3DUQOdQU3Be9gMg1AOCjLV9uCVfAc4BRjsnYLOZAPcUa4HekOFOHp7znK7t+Q41bAdFbFFM1feajn1PEnkpXlSGBa+Fo9gQQ9sScSEbhvH96vESVIvXLmb9zwaNmbW5B/pmVDah15y77ly0ggr/QUL4CzwhhtC6Ygc00q5+iFr X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Abstrace the memory type from the page_pool so we can later add support for new memory types. Convert the page_pool to use the new netmem type abstraction, rather than use struct page directly. As of this patch the netmem type is a no-op abstraction: it's always a struct page underneath. All the page pool internals are converted to use struct netmem instead of struct page, and the page pool now exports 2 APIs: 1. The existing struct page API. 2. The new struct netmem API. Keeping the existing API is transitional; we do not want to refactor all the current drivers using the page pool at once. The netmem abstraction is currently a no-op. The page_pool uses page_to_netmem() to convert allocated pages to netmem, and uses netmem_to_page() to convert the netmem back to pages to pass to mm APIs, Follow up patches to this series add non-paged netmem support to the page_pool. This change is factored out on its own to limit the code churn to this 1 patch, for ease of code review. Signed-off-by: Mina Almasry --- v8: - Fix napi_pp_put_page() taking netmem instead of page to fix patch-by-patch build error. - Add net/netmem.h include in this patch to fix patch-by-patch build error. v6: - Rebased on top of the merged netmem_ref type. Cc: linux-mm@kvack.org Cc: Matthew Wilcox --- include/linux/skbuff.h | 6 +- include/net/netmem.h | 15 ++ include/net/page_pool/helpers.h | 122 +++++++++---- include/net/page_pool/types.h | 18 +- include/trace/events/page_pool.h | 29 +-- net/bpf/test_run.c | 5 +- net/core/page_pool.c | 303 +++++++++++++++++-------------- net/core/skbuff.c | 7 +- 8 files changed, 304 insertions(+), 201 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b7f1ecdaec38..c9659e9d843e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3510,13 +3510,13 @@ int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb, unsigned int headroom); int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, struct bpf_prog *prog); -bool napi_pp_put_page(struct page *page, bool napi_safe); +bool napi_pp_put_page(netmem_ref netmem, bool napi_safe); static inline void skb_page_unref(const struct sk_buff *skb, struct page *page, bool napi_safe) { #ifdef CONFIG_PAGE_POOL - if (skb->pp_recycle && napi_pp_put_page(page, napi_safe)) + if (skb->pp_recycle && napi_pp_put_page(page_to_netmem(page), napi_safe)) return; #endif put_page(page); @@ -3528,7 +3528,7 @@ napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) struct page *page = skb_frag_page(frag); #ifdef CONFIG_PAGE_POOL - if (recycle && napi_pp_put_page(page, napi_safe)) + if (recycle && napi_pp_put_page(page_to_netmem(page), napi_safe)) return; #endif put_page(page); diff --git a/include/net/netmem.h b/include/net/netmem.h index 33014370a885..5f1c728618f2 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -88,4 +88,19 @@ static inline netmem_ref page_to_netmem(struct page *page) return (__force netmem_ref)page; } +static inline int netmem_ref_count(netmem_ref netmem) +{ + return page_ref_count(netmem_to_page(netmem)); +} + +static inline unsigned long netmem_to_pfn(netmem_ref netmem) +{ + return page_to_pfn(netmem_to_page(netmem)); +} + +static inline netmem_ref netmem_compound_head(netmem_ref netmem) +{ + return page_to_netmem(compound_head(netmem_to_page(netmem))); +} + #endif /* _NET_NETMEM_H */ diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h index 1d397c1a0043..61814f91a458 100644 --- a/include/net/page_pool/helpers.h +++ b/include/net/page_pool/helpers.h @@ -53,6 +53,8 @@ #define _NET_PAGE_POOL_HELPERS_H #include +#include +#include #ifdef CONFIG_PAGE_POOL_STATS /* Deprecated driver-facing API, use netlink instead */ @@ -101,7 +103,7 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) * Get a page fragment from the page allocator or page_pool caches. * * Return: - * Return allocated page fragment, otherwise return NULL. + * Return allocated page fragment, otherwise return 0. */ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, unsigned int *offset, @@ -112,22 +114,22 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, return page_pool_alloc_frag(pool, offset, size, gfp); } -static inline struct page *page_pool_alloc(struct page_pool *pool, - unsigned int *offset, - unsigned int *size, gfp_t gfp) +static inline netmem_ref page_pool_alloc(struct page_pool *pool, + unsigned int *offset, + unsigned int *size, gfp_t gfp) { unsigned int max_size = PAGE_SIZE << pool->p.order; - struct page *page; + netmem_ref netmem; if ((*size << 1) > max_size) { *size = max_size; *offset = 0; - return page_pool_alloc_pages(pool, gfp); + return page_pool_alloc_netmem(pool, gfp); } - page = page_pool_alloc_frag(pool, offset, *size, gfp); - if (unlikely(!page)) - return NULL; + netmem = page_pool_alloc_frag_netmem(pool, offset, *size, gfp); + if (unlikely(!netmem)) + return 0; /* There is very likely not enough space for another fragment, so append * the remaining size to the current fragment to avoid truesize @@ -138,7 +140,7 @@ static inline struct page *page_pool_alloc(struct page_pool *pool, pool->frag_offset = max_size; } - return page; + return netmem; } /** @@ -152,7 +154,7 @@ static inline struct page *page_pool_alloc(struct page_pool *pool, * utilization and performance penalty. * * Return: - * Return allocated page or page fragment, otherwise return NULL. + * Return allocated page or page fragment, otherwise return 0. */ static inline struct page *page_pool_dev_alloc(struct page_pool *pool, unsigned int *offset, @@ -160,7 +162,7 @@ static inline struct page *page_pool_dev_alloc(struct page_pool *pool, { gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); - return page_pool_alloc(pool, offset, size, gfp); + return netmem_to_page(page_pool_alloc(pool, offset, size, gfp)); } static inline void *page_pool_alloc_va(struct page_pool *pool, @@ -170,9 +172,10 @@ static inline void *page_pool_alloc_va(struct page_pool *pool, struct page *page; /* Mask off __GFP_HIGHMEM to ensure we can use page_address() */ - page = page_pool_alloc(pool, &offset, size, gfp & ~__GFP_HIGHMEM); + page = netmem_to_page( + page_pool_alloc(pool, &offset, size, gfp & ~__GFP_HIGHMEM)); if (unlikely(!page)) - return NULL; + return 0; return page_address(page) + offset; } @@ -187,7 +190,7 @@ static inline void *page_pool_alloc_va(struct page_pool *pool, * it returns va of the allocated page or page fragment. * * Return: - * Return the va for the allocated page or page fragment, otherwise return NULL. + * Return the va for the allocated page or page fragment, otherwise return 0. */ static inline void *page_pool_dev_alloc_va(struct page_pool *pool, unsigned int *size) @@ -210,6 +213,11 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool) return pool->p.dma_dir; } +static inline void page_pool_fragment_netmem(netmem_ref netmem, long nr) +{ + atomic_long_set(&netmem_to_page(netmem)->pp_ref_count, nr); +} + /** * page_pool_fragment_page() - split a fresh page into fragments * @page: page to split @@ -230,11 +238,12 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool) */ static inline void page_pool_fragment_page(struct page *page, long nr) { - atomic_long_set(&page->pp_ref_count, nr); + page_pool_fragment_netmem(page_to_netmem(page), nr); } -static inline long page_pool_unref_page(struct page *page, long nr) +static inline long page_pool_unref_netmem(netmem_ref netmem, long nr) { + struct page *page = netmem_to_page(netmem); long ret; /* If nr == pp_ref_count then we have cleared all remaining @@ -277,15 +286,41 @@ static inline long page_pool_unref_page(struct page *page, long nr) return ret; } +static inline long page_pool_unref_page(struct page *page, long nr) +{ + return page_pool_unref_netmem(page_to_netmem(page), nr); +} + +static inline void page_pool_ref_netmem(netmem_ref netmem) +{ + atomic_long_inc(&netmem_to_page(netmem)->pp_ref_count); +} + static inline void page_pool_ref_page(struct page *page) { - atomic_long_inc(&page->pp_ref_count); + page_pool_ref_netmem(page_to_netmem(page)); } -static inline bool page_pool_is_last_ref(struct page *page) +static inline bool page_pool_is_last_ref(netmem_ref netmem) { /* If page_pool_unref_page() returns 0, we were the last user */ - return page_pool_unref_page(page, 1) == 0; + return page_pool_unref_netmem(netmem, 1) == 0; +} + +static inline void page_pool_put_netmem(struct page_pool *pool, + netmem_ref netmem, + unsigned int dma_sync_size, + bool allow_direct) +{ + /* When page_pool isn't compiled-in, net/core/xdp.c doesn't + * allow registering MEM_TYPE_PAGE_POOL, but shield linker. + */ +#ifdef CONFIG_PAGE_POOL + if (!page_pool_is_last_ref(netmem)) + return; + + page_pool_put_unrefed_netmem(pool, netmem, dma_sync_size, allow_direct); +#endif } /** @@ -306,15 +341,15 @@ static inline void page_pool_put_page(struct page_pool *pool, unsigned int dma_sync_size, bool allow_direct) { - /* When page_pool isn't compiled-in, net/core/xdp.c doesn't - * allow registering MEM_TYPE_PAGE_POOL, but shield linker. - */ -#ifdef CONFIG_PAGE_POOL - if (!page_pool_is_last_ref(page)) - return; + page_pool_put_netmem(pool, page_to_netmem(page), dma_sync_size, + allow_direct); +} - page_pool_put_unrefed_page(pool, page, dma_sync_size, allow_direct); -#endif +static inline void page_pool_put_full_netmem(struct page_pool *pool, + netmem_ref netmem, + bool allow_direct) +{ + page_pool_put_netmem(pool, netmem, -1, allow_direct); } /** @@ -329,7 +364,7 @@ static inline void page_pool_put_page(struct page_pool *pool, static inline void page_pool_put_full_page(struct page_pool *pool, struct page *page, bool allow_direct) { - page_pool_put_page(pool, page, -1, allow_direct); + page_pool_put_netmem(pool, page_to_netmem(page), -1, allow_direct); } /** @@ -363,6 +398,18 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va, page_pool_put_page(pool, virt_to_head_page(va), -1, allow_direct); } +static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem) +{ + struct page *page = netmem_to_page(netmem); + + dma_addr_t ret = page->dma_addr; + + if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) + ret <<= PAGE_SHIFT; + + return ret; +} + /** * page_pool_get_dma_addr() - Retrieve the stored DMA address. * @page: page allocated from a page pool @@ -372,16 +419,14 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va, */ static inline dma_addr_t page_pool_get_dma_addr(struct page *page) { - dma_addr_t ret = page->dma_addr; - - if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) - ret <<= PAGE_SHIFT; - - return ret; + return page_pool_get_dma_addr_netmem(page_to_netmem(page)); } -static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +static inline bool page_pool_set_dma_addr_netmem(netmem_ref netmem, + dma_addr_t addr) { + struct page *page = netmem_to_page(netmem); + if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) { page->dma_addr = addr >> PAGE_SHIFT; @@ -395,6 +440,11 @@ static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr) return false; } +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +{ + return page_pool_set_dma_addr_netmem(page_to_netmem(page), addr); +} + static inline bool page_pool_put(struct page_pool *pool) { return refcount_dec_and_test(&pool->user_cnt); diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 07e6afafedbe..f04af1613f59 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -6,6 +6,7 @@ #include #include #include +#include #define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA * map/unmap @@ -40,7 +41,7 @@ #define PP_ALLOC_CACHE_REFILL 64 struct pp_alloc_cache { u32 count; - struct page *cache[PP_ALLOC_CACHE_SIZE]; + netmem_ref cache[PP_ALLOC_CACHE_SIZE]; }; /** @@ -73,7 +74,7 @@ struct page_pool_params { struct_group_tagged(page_pool_params_slow, slow, struct net_device *netdev; /* private: used by test code only */ - void (*init_callback)(struct page *page, void *arg); + void (*init_callback)(netmem_ref netmem, void *arg); void *init_arg; ); }; @@ -131,8 +132,8 @@ struct page_pool_stats { struct memory_provider_ops { int (*init)(struct page_pool *pool); void (*destroy)(struct page_pool *pool); - struct page *(*alloc_pages)(struct page_pool *pool, gfp_t gfp); - bool (*release_page)(struct page_pool *pool, struct page *page); + netmem_ref (*alloc_pages)(struct page_pool *pool, gfp_t gfp); + bool (*release_page)(struct page_pool *pool, netmem_ref netmem); }; struct pp_memory_provider_params { @@ -147,7 +148,7 @@ struct page_pool { bool has_init_callback; long frag_users; - struct page *frag_page; + netmem_ref frag_page; unsigned int frag_offset; u32 pages_state_hold_cnt; @@ -219,8 +220,12 @@ struct page_pool { }; struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); +netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp); struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, unsigned int size, gfp_t gfp); +netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool, + unsigned int *offset, unsigned int size, + gfp_t gfp); struct page_pool *page_pool_create(const struct page_pool_params *params); struct page_pool *page_pool_create_percpu(const struct page_pool_params *params, int cpuid); @@ -250,6 +255,9 @@ static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data, } #endif +void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem, + unsigned int dma_sync_size, + bool allow_direct); void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, unsigned int dma_sync_size, bool allow_direct); diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h index 6834356b2d2a..c5b6383ff276 100644 --- a/include/trace/events/page_pool.h +++ b/include/trace/events/page_pool.h @@ -42,51 +42,52 @@ TRACE_EVENT(page_pool_release, TRACE_EVENT(page_pool_state_release, TP_PROTO(const struct page_pool *pool, - const struct page *page, u32 release), + netmem_ref netmem, u32 release), - TP_ARGS(pool, page, release), + TP_ARGS(pool, netmem, release), TP_STRUCT__entry( __field(const struct page_pool *, pool) - __field(const struct page *, page) + __field(netmem_ref, netmem) __field(u32, release) __field(unsigned long, pfn) ), TP_fast_assign( __entry->pool = pool; - __entry->page = page; + __entry->netmem = netmem; __entry->release = release; - __entry->pfn = page_to_pfn(page); + __entry->pfn = netmem_to_pfn(netmem); ), - TP_printk("page_pool=%p page=%p pfn=0x%lx release=%u", - __entry->pool, __entry->page, __entry->pfn, __entry->release) + TP_printk("page_pool=%p netmem=%lu pfn=0x%lx release=%u", + __entry->pool, (__force unsigned long)__entry->netmem, + __entry->pfn, __entry->release) ); TRACE_EVENT(page_pool_state_hold, TP_PROTO(const struct page_pool *pool, - const struct page *page, u32 hold), + netmem_ref netmem, u32 hold), - TP_ARGS(pool, page, hold), + TP_ARGS(pool, netmem, hold), TP_STRUCT__entry( __field(const struct page_pool *, pool) - __field(const struct page *, page) + __field(netmem_ref, netmem) __field(u32, hold) __field(unsigned long, pfn) ), TP_fast_assign( __entry->pool = pool; - __entry->page = page; + __entry->netmem = netmem; __entry->hold = hold; - __entry->pfn = page_to_pfn(page); + __entry->pfn = netmem_to_pfn(netmem); ), - TP_printk("page_pool=%p page=%p pfn=0x%lx hold=%u", - __entry->pool, __entry->page, __entry->pfn, __entry->hold) + TP_printk("page_pool=%p netmem=%lu pfn=0x%lx hold=%u", + __entry->pool, __entry->netmem, __entry->pfn, __entry->hold) ); TRACE_EVENT(page_pool_update_nid, diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 61efeadaff8d..fc300e807e1d 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -127,9 +127,10 @@ struct xdp_test_data { #define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)) #define TEST_XDP_MAX_BATCH 256 -static void xdp_test_run_init_page(struct page *page, void *arg) +static void xdp_test_run_init_page(netmem_ref netmem, void *arg) { - struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); + struct xdp_page_head *head = + phys_to_virt(page_to_phys(netmem_to_page(netmem))); struct xdp_buff *new_ctx, *orig_ctx; u32 headroom = XDP_PACKET_HEADROOM; struct xdp_test_data *xdp = arg; diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 795b7ff1c01f..c8125be3a6e2 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -329,19 +329,18 @@ struct page_pool *page_pool_create(const struct page_pool_params *params) } EXPORT_SYMBOL(page_pool_create); -static void page_pool_return_page(struct page_pool *pool, struct page *page); +static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem); -noinline -static struct page *page_pool_refill_alloc_cache(struct page_pool *pool) +static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool) { struct ptr_ring *r = &pool->ring; - struct page *page; + netmem_ref netmem; int pref_nid; /* preferred NUMA node */ /* Quicker fallback, avoid locks when ring is empty */ if (__ptr_ring_empty(r)) { alloc_stat_inc(pool, empty); - return NULL; + return 0; } /* Softirq guarantee CPU and thus NUMA node is stable. This, @@ -356,56 +355,56 @@ static struct page *page_pool_refill_alloc_cache(struct page_pool *pool) /* Refill alloc array, but only if NUMA match */ do { - page = __ptr_ring_consume(r); - if (unlikely(!page)) + netmem = (__force netmem_ref)__ptr_ring_consume(r); + if (unlikely(!netmem)) break; - if (likely(page_to_nid(page) == pref_nid)) { - pool->alloc.cache[pool->alloc.count++] = page; + if (likely(page_to_nid(netmem_to_page(netmem)) == pref_nid)) { + pool->alloc.cache[pool->alloc.count++] = netmem; } else { /* NUMA mismatch; * (1) release 1 page to page-allocator and * (2) break out to fallthrough to alloc_pages_node. * This limit stress on page buddy alloactor. */ - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); alloc_stat_inc(pool, waive); - page = NULL; + netmem = 0; break; } } while (pool->alloc.count < PP_ALLOC_CACHE_REFILL); /* Return last page */ if (likely(pool->alloc.count > 0)) { - page = pool->alloc.cache[--pool->alloc.count]; + netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, refill); } - return page; + return netmem; } /* fast path */ -static struct page *__page_pool_get_cached(struct page_pool *pool) +static netmem_ref __page_pool_get_cached(struct page_pool *pool) { - struct page *page; + netmem_ref netmem; /* Caller MUST guarantee safe non-concurrent access, e.g. softirq */ if (likely(pool->alloc.count)) { /* Fast-path */ - page = pool->alloc.cache[--pool->alloc.count]; + netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, fast); } else { - page = page_pool_refill_alloc_cache(pool); + netmem = page_pool_refill_alloc_cache(pool); } - return page; + return netmem; } static void page_pool_dma_sync_for_device(struct page_pool *pool, - struct page *page, + netmem_ref netmem, unsigned int dma_sync_size) { - dma_addr_t dma_addr = page_pool_get_dma_addr(page); + dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem); dma_sync_size = min(dma_sync_size, pool->p.max_len); dma_sync_single_range_for_device(pool->p.dev, dma_addr, @@ -413,7 +412,7 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool, pool->p.dma_dir); } -static bool page_pool_dma_map(struct page_pool *pool, struct page *page) +static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) { dma_addr_t dma; @@ -422,18 +421,18 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) * into page private data (i.e 32bit cpu with 64bit DMA caps) * This mapping is kept for lifetime of page, until leaving pool. */ - dma = dma_map_page_attrs(pool->p.dev, page, 0, - (PAGE_SIZE << pool->p.order), - pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | - DMA_ATTR_WEAK_ORDERING); + dma = dma_map_page_attrs(pool->p.dev, netmem_to_page(netmem), 0, + (PAGE_SIZE << pool->p.order), pool->p.dma_dir, + DMA_ATTR_SKIP_CPU_SYNC | + DMA_ATTR_WEAK_ORDERING); if (dma_mapping_error(pool->p.dev, dma)) return false; - if (page_pool_set_dma_addr(page, dma)) + if (page_pool_set_dma_addr_netmem(netmem, dma)) goto unmap_failed; if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) - page_pool_dma_sync_for_device(pool, page, pool->p.max_len); + page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len); return true; @@ -445,9 +444,10 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) return false; } -static void page_pool_set_pp_info(struct page_pool *pool, - struct page *page) +static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) { + struct page *page = netmem_to_page(netmem); + page->pp = pool; page->pp_magic |= PP_SIGNATURE; @@ -457,13 +457,15 @@ static void page_pool_set_pp_info(struct page_pool *pool, * is dirtying the same cache line as the page->pp_magic above, so * the overhead is negligible. */ - page_pool_fragment_page(page, 1); + page_pool_fragment_netmem(netmem, 1); if (pool->has_init_callback) - pool->slow.init_callback(page, pool->slow.init_arg); + pool->slow.init_callback(netmem, pool->slow.init_arg); } -static void page_pool_clear_pp_info(struct page *page) +static void page_pool_clear_pp_info(netmem_ref netmem) { + struct page *page = netmem_to_page(netmem); + page->pp_magic = 0; page->pp = NULL; } @@ -479,34 +481,34 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool, return NULL; if ((pool->p.flags & PP_FLAG_DMA_MAP) && - unlikely(!page_pool_dma_map(pool, page))) { + unlikely(!page_pool_dma_map(pool, page_to_netmem(page)))) { put_page(page); return NULL; } alloc_stat_inc(pool, slow_high_order); - page_pool_set_pp_info(pool, page); + page_pool_set_pp_info(pool, page_to_netmem(page)); /* Track how many pages are held 'in-flight' */ pool->pages_state_hold_cnt++; - trace_page_pool_state_hold(pool, page, pool->pages_state_hold_cnt); + trace_page_pool_state_hold(pool, page_to_netmem(page), + pool->pages_state_hold_cnt); return page; } /* slow path */ -noinline -static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, - gfp_t gfp) +static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool, + gfp_t gfp) { const int bulk = PP_ALLOC_CACHE_REFILL; unsigned int pp_flags = pool->p.flags; unsigned int pp_order = pool->p.order; - struct page *page; + netmem_ref netmem; int i, nr_pages; /* Don't support bulk alloc for high-order pages */ if (unlikely(pp_order)) - return __page_pool_alloc_page_order(pool, gfp); + return page_to_netmem(__page_pool_alloc_page_order(pool, gfp)); /* Unnecessary as alloc cache is empty, but guarantees zero count */ if (unlikely(pool->alloc.count > 0)) @@ -515,60 +517,67 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */ memset(&pool->alloc.cache, 0, sizeof(void *) * bulk); - nr_pages = alloc_pages_bulk_array_node(gfp, pool->p.nid, bulk, - pool->alloc.cache); + nr_pages = alloc_pages_bulk_array_node(gfp, + pool->p.nid, bulk, + (struct page **)pool->alloc.cache); if (unlikely(!nr_pages)) - return NULL; + return 0; /* Pages have been filled into alloc.cache array, but count is zero and * page element have not been (possibly) DMA mapped. */ for (i = 0; i < nr_pages; i++) { - page = pool->alloc.cache[i]; + netmem = pool->alloc.cache[i]; if ((pp_flags & PP_FLAG_DMA_MAP) && - unlikely(!page_pool_dma_map(pool, page))) { - put_page(page); + unlikely(!page_pool_dma_map(pool, netmem))) { + put_page(netmem_to_page(netmem)); continue; } - page_pool_set_pp_info(pool, page); - pool->alloc.cache[pool->alloc.count++] = page; + page_pool_set_pp_info(pool, netmem); + pool->alloc.cache[pool->alloc.count++] = netmem; /* Track how many pages are held 'in-flight' */ pool->pages_state_hold_cnt++; - trace_page_pool_state_hold(pool, page, + trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt); } /* Return last page */ if (likely(pool->alloc.count > 0)) { - page = pool->alloc.cache[--pool->alloc.count]; + netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, slow); } else { - page = NULL; + netmem = 0; } /* When page just alloc'ed is should/must have refcnt 1. */ - return page; + return netmem; } /* For using page_pool replace: alloc_pages() API calls, but provide * synchronization guarantee for allocation side. */ -struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) +netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) { - struct page *page; + netmem_ref netmem; /* Fast-path: Get a page from cache */ - page = __page_pool_get_cached(pool); - if (page) - return page; + netmem = __page_pool_get_cached(pool); + if (netmem) + return netmem; /* Slow-path: cache empty, do real allocation */ if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) - page = pool->mp_ops->alloc_pages(pool, gfp); + netmem = pool->mp_ops->alloc_pages(pool, gfp); else - page = __page_pool_alloc_pages_slow(pool, gfp); - return page; + netmem = __page_pool_alloc_pages_slow(pool, gfp); + return netmem; +} +EXPORT_SYMBOL(page_pool_alloc_netmem); + +struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) +{ + return netmem_to_page(page_pool_alloc_netmem(pool, gfp)); } EXPORT_SYMBOL(page_pool_alloc_pages); @@ -596,8 +605,8 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict) return inflight; } -static __always_inline -void __page_pool_release_page_dma(struct page_pool *pool, struct page *page) +static __always_inline void __page_pool_release_page_dma(struct page_pool *pool, + netmem_ref netmem) { dma_addr_t dma; @@ -607,13 +616,13 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page) */ return; - dma = page_pool_get_dma_addr(page); + dma = page_pool_get_dma_addr_netmem(netmem); /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); - page_pool_set_dma_addr(page, 0); + page_pool_set_dma_addr_netmem(netmem, 0); } /* Disconnects a page (from a page_pool). API users can have a need @@ -621,26 +630,26 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page) * a regular page (that will eventually be returned to the normal * page-allocator via put_page). */ -void page_pool_return_page(struct page_pool *pool, struct page *page) +void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) { int count; bool put; put = true; if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) - put = pool->mp_ops->release_page(pool, page); + put = pool->mp_ops->release_page(pool, netmem); else - __page_pool_release_page_dma(pool, page); + __page_pool_release_page_dma(pool, netmem); /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards. */ count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt); - trace_page_pool_state_release(pool, page, count); + trace_page_pool_state_release(pool, netmem, count); if (put) { - page_pool_clear_pp_info(page); - put_page(page); + page_pool_clear_pp_info(netmem); + put_page(netmem_to_page(netmem)); } /* An optimization would be to call __free_pages(page, pool->p.order) * knowing page is not part of page-cache (thus avoiding a @@ -648,14 +657,14 @@ void page_pool_return_page(struct page_pool *pool, struct page *page) */ } -static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page) +static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem) { int ret; /* BH protection not needed if current is softirq */ if (in_softirq()) - ret = ptr_ring_produce(&pool->ring, page); + ret = ptr_ring_produce(&pool->ring, (__force void *)netmem); else - ret = ptr_ring_produce_bh(&pool->ring, page); + ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem); if (!ret) { recycle_stat_inc(pool, ring); @@ -670,7 +679,7 @@ static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page) * * Caller must provide appropriate safe context. */ -static bool page_pool_recycle_in_cache(struct page *page, +static bool page_pool_recycle_in_cache(netmem_ref netmem, struct page_pool *pool) { if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) { @@ -679,14 +688,15 @@ static bool page_pool_recycle_in_cache(struct page *page, } /* Caller MUST have verified/know (page_ref_count(page) == 1) */ - pool->alloc.cache[pool->alloc.count++] = page; + pool->alloc.cache[pool->alloc.count++] = netmem; recycle_stat_inc(pool, cached); return true; } -static bool __page_pool_page_can_be_recycled(const struct page *page) +static bool __page_pool_page_can_be_recycled(netmem_ref netmem) { - return page_ref_count(page) == 1 && !page_is_pfmemalloc(page); + return page_ref_count(netmem_to_page(netmem)) == 1 && + !page_is_pfmemalloc(netmem_to_page(netmem)); } /* If the page refcnt == 1, this will try to recycle the page. @@ -695,8 +705,8 @@ static bool __page_pool_page_can_be_recycled(const struct page *page) * If the page refcnt != 1, then the page will be returned to memory * subsystem. */ -static __always_inline struct page * -__page_pool_put_page(struct page_pool *pool, struct page *page, +static __always_inline netmem_ref +__page_pool_put_page(struct page_pool *pool, netmem_ref netmem, unsigned int dma_sync_size, bool allow_direct) { lockdep_assert_no_hardirq(); @@ -710,19 +720,19 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, * page is NOT reusable when allocated when system is under * some pressure. (page_is_pfmemalloc) */ - if (likely(__page_pool_page_can_be_recycled(page))) { + if (likely(__page_pool_page_can_be_recycled(netmem))) { /* Read barrier done in page_ref_count / READ_ONCE */ if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) - page_pool_dma_sync_for_device(pool, page, + page_pool_dma_sync_for_device(pool, netmem, dma_sync_size); if (allow_direct && in_softirq() && - page_pool_recycle_in_cache(page, pool)) - return NULL; + page_pool_recycle_in_cache(netmem, pool)) + return 0; /* Page found as candidate for recycling */ - return page; + return netmem; } /* Fallback/non-XDP mode: API user have elevated refcnt. * @@ -738,21 +748,30 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, * will be invoking put_page. */ recycle_stat_inc(pool, released_refcnt); - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); - return NULL; + return 0; } -void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, - unsigned int dma_sync_size, bool allow_direct) +void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem, + unsigned int dma_sync_size, bool allow_direct) { - page = __page_pool_put_page(pool, page, dma_sync_size, allow_direct); - if (page && !page_pool_recycle_in_ring(pool, page)) { + netmem = + __page_pool_put_page(pool, netmem, dma_sync_size, allow_direct); + if (netmem && !page_pool_recycle_in_ring(pool, netmem)) { /* Cache full, fallback to free pages */ recycle_stat_inc(pool, ring_full); - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); } } +EXPORT_SYMBOL(page_pool_put_unrefed_netmem); + +void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, + unsigned int dma_sync_size, bool allow_direct) +{ + page_pool_put_unrefed_netmem(pool, page_to_netmem(page), dma_sync_size, + allow_direct); +} EXPORT_SYMBOL(page_pool_put_unrefed_page); /** @@ -777,16 +796,16 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data, bool in_softirq; for (i = 0; i < count; i++) { - struct page *page = virt_to_head_page(data[i]); + netmem_ref netmem = page_to_netmem(virt_to_head_page(data[i])); /* It is not the last user for the page frag case */ - if (!page_pool_is_last_ref(page)) + if (!page_pool_is_last_ref(netmem)) continue; - page = __page_pool_put_page(pool, page, -1, false); + netmem = __page_pool_put_page(pool, netmem, -1, false); /* Approved for bulk recycling in ptr_ring cache */ - if (page) - data[bulk_len++] = page; + if (netmem) + data[bulk_len++] = (__force void *)netmem; } if (unlikely(!bulk_len)) @@ -812,100 +831,108 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data, * since put_page() with refcnt == 1 can be an expensive operation */ for (; i < bulk_len; i++) - page_pool_return_page(pool, data[i]); + page_pool_return_page(pool, (__force netmem_ref)data[i]); } EXPORT_SYMBOL(page_pool_put_page_bulk); -static struct page *page_pool_drain_frag(struct page_pool *pool, - struct page *page) +static netmem_ref page_pool_drain_frag(struct page_pool *pool, + netmem_ref netmem) { long drain_count = BIAS_MAX - pool->frag_users; /* Some user is still using the page frag */ - if (likely(page_pool_unref_page(page, drain_count))) - return NULL; + if (likely(page_pool_unref_netmem(netmem, drain_count))) + return 0; - if (__page_pool_page_can_be_recycled(page)) { + if (__page_pool_page_can_be_recycled(netmem)) { if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) - page_pool_dma_sync_for_device(pool, page, -1); + page_pool_dma_sync_for_device(pool, netmem, -1); - return page; + return netmem; } - page_pool_return_page(pool, page); - return NULL; + page_pool_return_page(pool, netmem); + return 0; } static void page_pool_free_frag(struct page_pool *pool) { long drain_count = BIAS_MAX - pool->frag_users; - struct page *page = pool->frag_page; + netmem_ref netmem = pool->frag_page; - pool->frag_page = NULL; + pool->frag_page = 0; - if (!page || page_pool_unref_page(page, drain_count)) + if (!netmem || page_pool_unref_netmem(netmem, drain_count)) return; - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); } -struct page *page_pool_alloc_frag(struct page_pool *pool, - unsigned int *offset, - unsigned int size, gfp_t gfp) +netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool, + unsigned int *offset, unsigned int size, + gfp_t gfp) { unsigned int max_size = PAGE_SIZE << pool->p.order; - struct page *page = pool->frag_page; + netmem_ref netmem = pool->frag_page; if (WARN_ON(size > max_size)) - return NULL; + return 0; size = ALIGN(size, dma_get_cache_alignment()); *offset = pool->frag_offset; - if (page && *offset + size > max_size) { - page = page_pool_drain_frag(pool, page); - if (page) { + if (netmem && *offset + size > max_size) { + netmem = page_pool_drain_frag(pool, netmem); + if (netmem) { alloc_stat_inc(pool, fast); goto frag_reset; } } - if (!page) { - page = page_pool_alloc_pages(pool, gfp); - if (unlikely(!page)) { - pool->frag_page = NULL; - return NULL; + if (!netmem) { + netmem = page_pool_alloc_netmem(pool, gfp); + if (unlikely(!netmem)) { + pool->frag_page = 0; + return 0; } - pool->frag_page = page; + pool->frag_page = netmem; frag_reset: pool->frag_users = 1; *offset = 0; pool->frag_offset = size; - page_pool_fragment_page(page, BIAS_MAX); - return page; + page_pool_fragment_netmem(netmem, BIAS_MAX); + return netmem; } pool->frag_users++; pool->frag_offset = *offset + size; alloc_stat_inc(pool, fast); - return page; + return netmem; +} +EXPORT_SYMBOL(page_pool_alloc_frag_netmem); + +struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, + unsigned int size, gfp_t gfp) +{ + return netmem_to_page(page_pool_alloc_frag_netmem(pool, offset, size, + gfp)); } EXPORT_SYMBOL(page_pool_alloc_frag); static void page_pool_empty_ring(struct page_pool *pool) { - struct page *page; + netmem_ref netmem; /* Empty recycle ring */ - while ((page = ptr_ring_consume_bh(&pool->ring))) { + while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) { /* Verify the refcnt invariant of cached pages */ - if (!(page_ref_count(page) == 1)) + if (!(page_ref_count(netmem_to_page(netmem)) == 1)) pr_crit("%s() page_pool refcnt %d violation\n", - __func__, page_ref_count(page)); + __func__, netmem_ref_count(netmem)); - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); } } @@ -927,7 +954,7 @@ static void __page_pool_destroy(struct page_pool *pool) static void page_pool_empty_alloc_cache_once(struct page_pool *pool) { - struct page *page; + netmem_ref netmem; if (pool->destroy_cnt) return; @@ -937,8 +964,8 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool) * call concurrently. */ while (pool->alloc.count) { - page = pool->alloc.cache[--pool->alloc.count]; - page_pool_return_page(pool, page); + netmem = pool->alloc.cache[--pool->alloc.count]; + page_pool_return_page(pool, netmem); } } @@ -1044,15 +1071,15 @@ EXPORT_SYMBOL(page_pool_destroy); /* Caller must provide appropriate safe context, e.g. NAPI. */ void page_pool_update_nid(struct page_pool *pool, int new_nid) { - struct page *page; + netmem_ref netmem; trace_page_pool_update_nid(pool, new_nid); pool->p.nid = new_nid; /* Flush pool alloc cache, as refill will check NUMA node */ while (pool->alloc.count) { - page = pool->alloc.cache[--pool->alloc.count]; - page_pool_return_page(pool, page); + netmem = pool->alloc.cache[--pool->alloc.count]; + page_pool_return_page(pool, netmem); } } EXPORT_SYMBOL(page_pool_update_nid); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index a1be84be5d35..dc6b1f6435e2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1004,8 +1004,9 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, EXPORT_SYMBOL(skb_cow_data_for_xdp); #if IS_ENABLED(CONFIG_PAGE_POOL) -bool napi_pp_put_page(struct page *page, bool napi_safe) +bool napi_pp_put_page(netmem_ref netmem, bool napi_safe) { + struct page *page = netmem_to_page(netmem); bool allow_direct = false; struct page_pool *pp; @@ -1042,7 +1043,7 @@ bool napi_pp_put_page(struct page *page, bool napi_safe) * The page will be returned to the pool here regardless of the * 'flipped' fragment being in use or not. */ - page_pool_put_full_page(pp, page, allow_direct); + page_pool_put_full_netmem(pp, page_to_netmem(page), allow_direct); return true; } @@ -1053,7 +1054,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe) { if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle) return false; - return napi_pp_put_page(virt_to_page(data), napi_safe); + return napi_pp_put_page(page_to_netmem(virt_to_page(data)), napi_safe); } /** From patchwork Wed Apr 3 00:20:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 13614774 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6D32C6FD1F for ; Wed, 3 Apr 2024 00:21:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4CBB26B008C; Tue, 2 Apr 2024 20:21:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4542A6B0092; Tue, 2 Apr 2024 20:21:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2A6AD6B0093; Tue, 2 Apr 2024 20:21:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 050B36B008C for ; Tue, 2 Apr 2024 20:21:21 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id B403240C2A for ; Wed, 3 Apr 2024 00:21:21 +0000 (UTC) X-FDA: 81966316362.15.EEC7754 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf30.hostedemail.com (Postfix) with ESMTP id 096F880006 for ; Wed, 3 Apr 2024 00:21:19 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=zDxtF5a+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of 3_qAMZgsKCKkJUVJbahVRWJPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--almasrymina.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3_qAMZgsKCKkJUVJbahVRWJPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--almasrymina.bounces.google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1712103680; a=rsa-sha256; cv=none; b=tq5O1Oud5NpDsBDwKJH4kMhmTP3WJBza4v0M87AyuR5fw4SUVMd6F/5FA8kocfGLOWCvUF D56X+NoIa39gSbkXr+vC2FlIKHXyIdnAb+cTIchYR2jS5kKIPRBE2Yb0Tw/+0l5AxqLs2M p5qxIYLtH9eJBGNaLUNCEcnnmKSIKTM= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=zDxtF5a+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of 3_qAMZgsKCKkJUVJbahVRWJPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--almasrymina.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3_qAMZgsKCKkJUVJbahVRWJPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--almasrymina.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1712103680; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=D2IjTcVsfReMHiEB8lNFg968fB2NbJM0LK6XUJhALBA=; b=4Rlr0SVin9CIJsYRx5OHeNuam0g7prQ0Ri4W6z1N3bS/jCWtKoz+C4/GD2kvxgqC+gVMBM gz+6c1GDdkU8sxf3KheyNEsJ0LgceLSNp8wGrjLFYdgIvnnMjOHpMt5iI1kMX7rEw+MjWv hEmckJru+2MrgWqmkCyk0UkAcuIHc7g= Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dc6b26ce0bbso10429144276.1 for ; Tue, 02 Apr 2024 17:21:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1712103679; x=1712708479; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=D2IjTcVsfReMHiEB8lNFg968fB2NbJM0LK6XUJhALBA=; b=zDxtF5a+/x8kGxYIouGn7LLqJR9AwHIr4NHkTNVU6iJvS06/G7HlyJiwMnnVg6sDou zduHlwiabzX6HHylnpAIyhck20GHCFw0q+RtFsFuJhysburW052LBomVfocINaDwuc5i RiEOdC4pDTg2RiurTjdGQhZZMOFFKURzfscjKowR60udnqIOEEMfXuOw3Yj1xVcF2Dvd GiexUQJT1nL8iwobgJhjtPWJoJu75M+jDwBcYkQlOmaKxSiG15ScghCACqfraf5Q8J8t F0nI0BTDNAYfh6nPCQNQ4zJeYW8145OBQZYd1xEBtQxQYySdFU2We0giOtmMPeemeyx1 1JEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712103679; x=1712708479; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=D2IjTcVsfReMHiEB8lNFg968fB2NbJM0LK6XUJhALBA=; b=OD8BthIKSZcC9ire1gmfVxVCzvQ5IDA22UcRiNEMVU1IEX4vYuPlLunQblyHDES4zG h7IsS3kL0oTIFbdU18mF8wUtSPN2OiL1/Ef5EiiIkKEETIYL5yCPrWd9NwPRxX84T1AQ mQ5tjDgmaK3XZOhtwSy7G/n10hcV0FPjX/HK6YumduT+a5Mmz0TgfKVTRVP9M6o76gBe 3ZsvvXhJir/Iny74aXh3wiTDOnRDIQ7mRySLLhXiMjIrKEuC+I4INohRLMUy9iJmkbd/ F2AHufvkJ4wx61GPhCEJHnMBKdwUnZLr2FfaLsoERV9yWWBz4apNepGdJJcWdwrtGss+ Y4Mg== X-Forwarded-Encrypted: i=1; AJvYcCXnCAOlHLthyxhe9NKxFOELd897Tb+ErkBT4PQb3/uLHB1CtOqA0jp/s8x8VqqsURq1iaZoJauqvAzzWNp415GfqC0= X-Gm-Message-State: AOJu0Yya/trQccxFLWDXDOTCDa3Fo52Fwp/EY4oNuoUwuWGEWYTTnP80 NEU2wcg97a7KX6XTX+q3L0iK2bsg+UAswuFHAZcdYR4PwHnSlrI4uO4h3ByMqfTVB4CtPFsmMWN YbwCkxLjqgDMePYi1PcHnyA== X-Google-Smtp-Source: AGHT+IFHKPDZ+jhDCTy+dea1EJBlVfqzABuulCJLiRa/ry1KKZnqAruX9TA2wlbEP3WJef1smJBLI0pMmE8KJf0hBw== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2c4:200:1726:7620:90a1:78b9]) (user=almasrymina job=sendgmr) by 2002:a05:6902:1109:b0:dc7:9218:df47 with SMTP id o9-20020a056902110900b00dc79218df47mr4459516ybu.5.1712103678710; Tue, 02 Apr 2024 17:21:18 -0700 (PDT) Date: Tue, 2 Apr 2024 17:20:44 -0700 In-Reply-To: <20240403002053.2376017-1-almasrymina@google.com> Mime-Version: 1.0 References: <20240403002053.2376017-1-almasrymina@google.com> X-Mailer: git-send-email 2.44.0.478.gd926399ef9-goog Message-ID: <20240403002053.2376017-8-almasrymina@google.com> Subject: [RFC PATCH net-next v8 07/14] page_pool: devmem support From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org Cc: Mina Almasry , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jonathan Corbet , Richard Henderson , Ivan Kokshaysky , Matt Turner , Thomas Bogendoerfer , "James E.J. Bottomley" , Helge Deller , Andreas Larsson , Jesper Dangaard Brouer , Ilias Apalodimas , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Arnd Bergmann , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Steffen Klassert , Herbert Xu , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Amritha Nambiar , Maciej Fijalkowski , Alexander Mikhalitsyn , Kaiyuan Zhang , Christian Brauner , Simon Horman , David Howells , Florian Westphal , Yunsheng Lin , Kuniyuki Iwashima , Jens Axboe , Arseniy Krasnov , Aleksander Lobakin , Michael Lass , Jiri Pirko , Sebastian Andrzej Siewior , Lorenzo Bianconi , Richard Gobert , Sridhar Samudrala , Xuan Zhuo , Johannes Berg , Abel Wu , Breno Leitao , Pavel Begunkov , David Wei , Jason Gunthorpe , Shailend Chand , Harshitha Ramamurthy , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi , linux-mm@kvack.org, Matthew Wilcox X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 096F880006 X-Stat-Signature: r67nead8irbn6ffb3z4eij66tbrmtgat X-HE-Tag: 1712103679-386881 X-HE-Meta: U2FsdGVkX1+qD+EzOZdpVn6cNe3c9W42Ok7Rw9XREKRDD5q9Aj16Fde5gRknufZFyv+fEjaJ/RE8zIuM8K5Ih96HMRhzgp+Un92b2WAD7Scztqespc65zlCTigJ5sDMT7rbe5pfpd5RPYE9V7xhohIyGZlY06tp6ltrTjOoKY2Hmn76KD2UGpZnXQMsm+PKLpSlidrCJo6VfHianEK8VURY4Jwpnm3SQVYLVIUsSOLxx1QAKcnoyc5r3j4ugkW+6TjFV/PRIlUq68OYwobQ34GQ0oSrDS105R0fA+ThOtxk3+Ln+y8wDWFFvMtNGN1Yq+nho3hbLjuPdcQ/HotJqDqle0xkdLF4rAQ3W02k/NyskMT2Ie3gULOVLdr8QYdE80gU3xi/M8jOy1V5abDny5nd39OXfGEN/ueJCugESDeG9duXutVT0H3HsKZybnl4KW2VIdnQvM9Ug6l+Ym2ehXeXjLaiEQa/9wNwUDal8oTrL0VrUd3q3rKGVILky23gUazZaxPnVaiCijxI05Yh7acKdG2UPPSoPwW1Y2pX7CclQSmlXNo0BW/hZjuxV4WM4z8J7XERD5kur10/ZoX56zVOXzh4688OkDvdfKcc51dFleNVBfVp10gqJPME2kQM+OSsyzZ0IdMHZ8ocP8uhNCKbYahKTOImO4Y4ri8TB9LIaPhdEb9rQrluyFfwAYQqKDgc5dRLffZJ/ipFvjxAtIlZ6/M4ctVBYVMsjfsOnigaWhK8OjvhgSFNODctzzP0UopY+jPEKwJf/2ZCcPJeLDo1NDpRWVcv3NUVDG8rayQzXMEQZeiqRk/zaPB6eNbFhn7SQYrDumb1VKZ6CJD5S+8cfSEnj3wAOoikv8QCAlgPMVd9Vf+iAm7CnwsuqU1UEnwv0rnK5J/43mZWvB6uFhrUD8Sb2skaumuxrUXUw85NdrzNngWE7+MgotF7xLMKpOTIl3pe5rZAsLfYW1yb RLBZmXpL F2tKRl9bn04YREIdxwkJjE8I9Pvu3ZpaM2vIzou73v9YfMmPjzXdrohDoelByg1q46ic5tUJnH+VHAAIEFnAhmofX6tkrX+LQKnIow7FE0kSJP4sITkxbbSOfWtjXB7SrWqseQAbfpNGLeYS0bFwOV9oguUV1i9iRyi+LMLzQiU6O2hFWKSvKkED2wEmGPfgWbwCXGkEGDNZdEAZiE823EQpuQyxdYzrPqnsDcWIrh5qI3a8BWghaPFH5D+/PN9GsyciaIpal1AkQPvjfO3gYy/v7EoZpKi6gFMYMZhDTk3IxXyAu4rT1zA+JjZF7EUmo4Me6rKAhb91znR3MwtTz7rHPbglz2OyWru09e8u8+HLbhR0ddg7hMGQFl+1MBlZRJ4C1s+iVsCy/ooqeKb2orgDR6Pd7mNbFkXsPWGq4CGCGTU7pwJy+JojR/PbAUuLOjIX7j6T+7N1ECYTkLUIxP7IlOPVI+aHEQI0TR9ktScqoEowWbAhtD9fDld49a62xran+ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Convert netmem to be a union of struct page and struct netmem. Overload the LSB of struct netmem* to indicate that it's a net_iov, otherwise it's a page. Currently these entries in struct page are rented by the page_pool and used exclusively by the net stack: struct { unsigned long pp_magic; struct page_pool *pp; unsigned long _pp_mapping_pad; unsigned long dma_addr; atomic_long_t pp_ref_count; }; Mirror these (and only these) entries into struct net_iov and implement netmem helpers that can access these common fields regardless of whether the underlying type is page or net_iov. Implement checks for net_iov in netmem helpers which delegate to mm APIs, to ensure net_iov are never passed to the mm stack. Signed-off-by: Mina Almasry --- v7: - Remove static_branch_unlikely from netmem_to_net_iov(). We're getting better results from the fast path in bench_page_pool_simple tests without the static_branch_unlikely, and the addition of static_branch_unlikely doesn't improve performance of devmem TCP. Additionally only check netmem_to_net_iov() if CONFIG_DMA_SHARED_BUFFER is enabled, otherwise dmabuf net_iovs cannot exist anyway. net-next base: 8 cycle fast path. with static_branch_unlikely: 10 cycle fast path. without static_branch_unlikely: 9 cycle fast path. CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path as baseline. Performance of devmem TCP is at 95% line rate is regardless of static_branch_unlikely or not. v6: - Rebased on top of the merged netmem_ref type. - Rebased on top of the merged skb_pp_frag_ref() changes. v5: - Use netmem instead of page* with LSB set. - Use pp_ref_count for refcounting net_iov. - Removed many of the custom checks for netmem. v1: - Disable fragmentation support for iov properly. - fix napi_pp_put_page() path (Yunsheng). - Use pp_frag_count for devmem refcounting. Cc: linux-mm@kvack.org Cc: Matthew Wilcox --- include/net/netmem.h | 141 ++++++++++++++++++++++++++++++-- include/net/page_pool/helpers.h | 25 +++--- net/core/devmem.c | 3 + net/core/page_pool.c | 26 +++--- net/core/skbuff.c | 23 +++--- 5 files changed, 172 insertions(+), 46 deletions(-) diff --git a/include/net/netmem.h b/include/net/netmem.h index 5f1c728618f2..74eeaa34883e 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -9,14 +9,51 @@ #define _NET_NETMEM_H #include +#include /* net_iov */ +DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers); + +/* We overload the LSB of the struct page pointer to indicate whether it's + * a page or net_iov. + */ +#define NET_IOV 0x01UL + struct net_iov { + unsigned long __unused_padding; + unsigned long pp_magic; + struct page_pool *pp; struct dmabuf_genpool_chunk_owner *owner; unsigned long dma_addr; + atomic_long_t pp_ref_count; }; +/* These fields in struct page are used by the page_pool and net stack: + * + * struct { + * unsigned long pp_magic; + * struct page_pool *pp; + * unsigned long _pp_mapping_pad; + * unsigned long dma_addr; + * atomic_long_t pp_ref_count; + * }; + * + * We mirror the page_pool fields here so the page_pool can access these fields + * without worrying whether the underlying fields belong to a page or net_iov. + * + * The non-net stack fields of struct page are private to the mm stack and must + * never be mirrored to net_iov. + */ +#define NET_IOV_ASSERT_OFFSET(pg, iov) \ + static_assert(offsetof(struct page, pg) == \ + offsetof(struct net_iov, iov)) +NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic); +NET_IOV_ASSERT_OFFSET(pp, pp); +NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr); +NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count); +#undef NET_IOV_ASSERT_OFFSET + static inline struct dmabuf_genpool_chunk_owner * net_iov_owner(const struct net_iov *niov) { @@ -69,20 +106,26 @@ net_iov_binding(const struct net_iov *niov) */ typedef unsigned long __bitwise netmem_ref; +static inline bool netmem_is_net_iov(const netmem_ref netmem) +{ +#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_DMA_SHARED_BUFFER) + return (__force unsigned long)netmem & NET_IOV; +#else + return false; +#endif +} + /* This conversion fails (returns NULL) if the netmem_ref is not struct page * backed. - * - * Currently struct page is the only possible netmem, and this helper never - * fails. */ static inline struct page *netmem_to_page(netmem_ref netmem) { + if (WARN_ON_ONCE(netmem_is_net_iov(netmem))) + return NULL; + return (__force struct page *)netmem; } -/* Converting from page to netmem is always safe, because a page can always be - * a netmem. - */ static inline netmem_ref page_to_netmem(struct page *page) { return (__force netmem_ref)page; @@ -90,17 +133,103 @@ static inline netmem_ref page_to_netmem(struct page *page) static inline int netmem_ref_count(netmem_ref netmem) { + /* The non-pp refcount of net_iov is always 1. On net_iov, we only + * support pp refcounting which uses the pp_ref_count field. + */ + if (netmem_is_net_iov(netmem)) + return 1; + return page_ref_count(netmem_to_page(netmem)); } static inline unsigned long netmem_to_pfn(netmem_ref netmem) { + if (netmem_is_net_iov(netmem)) + return 0; + return page_to_pfn(netmem_to_page(netmem)); } +static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem) +{ + return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV); +} + +static inline unsigned long netmem_get_pp_magic(netmem_ref netmem) +{ + return __netmem_clear_lsb(netmem)->pp_magic; +} + +static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic) +{ + __netmem_clear_lsb(netmem)->pp_magic |= pp_magic; +} + +static inline void netmem_clear_pp_magic(netmem_ref netmem) +{ + __netmem_clear_lsb(netmem)->pp_magic = 0; +} + +static inline struct page_pool *netmem_get_pp(netmem_ref netmem) +{ + return __netmem_clear_lsb(netmem)->pp; +} + +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool) +{ + __netmem_clear_lsb(netmem)->pp = pool; +} + +static inline unsigned long netmem_get_dma_addr(netmem_ref netmem) +{ + return __netmem_clear_lsb(netmem)->dma_addr; +} + +static inline void netmem_set_dma_addr(netmem_ref netmem, + unsigned long dma_addr) +{ + __netmem_clear_lsb(netmem)->dma_addr = dma_addr; +} + +static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem) +{ + return &__netmem_clear_lsb(netmem)->pp_ref_count; +} + +static inline bool netmem_is_pref_nid(netmem_ref netmem, int pref_nid) +{ + /* Assume net_iov are on the preferred node without actually + * checking... + * + * This check is only used to check for recycling memory in the page + * pool's fast paths. Currently the only implementation of net_iov + * is dmabuf device memory. It's a deliberate decision by the user to + * bind a certain dmabuf to a certain netdev, and the netdev rx queue + * would not be able to reallocate memory from another dmabuf that + * exists on the preferred node, so, this check doesn't make much sense + * in this case. Assume all net_iovs can be recycled for now. + */ + if (netmem_is_net_iov(netmem)) + return true; + + return page_to_nid(netmem_to_page(netmem)) == pref_nid; +} + static inline netmem_ref netmem_compound_head(netmem_ref netmem) { + /* niov are never compounded */ + if (netmem_is_net_iov(netmem)) + return netmem; + return page_to_netmem(compound_head(netmem_to_page(netmem))); } +static inline void *netmem_address(netmem_ref netmem) +{ + if (netmem_is_net_iov(netmem)) + return NULL; + + return page_address(netmem_to_page(netmem)); +} + #endif /* _NET_NETMEM_H */ diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h index 61814f91a458..c6a55eddefae 100644 --- a/include/net/page_pool/helpers.h +++ b/include/net/page_pool/helpers.h @@ -215,7 +215,7 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool) static inline void page_pool_fragment_netmem(netmem_ref netmem, long nr) { - atomic_long_set(&netmem_to_page(netmem)->pp_ref_count, nr); + atomic_long_set(netmem_get_pp_ref_count_ref(netmem), nr); } /** @@ -243,7 +243,7 @@ static inline void page_pool_fragment_page(struct page *page, long nr) static inline long page_pool_unref_netmem(netmem_ref netmem, long nr) { - struct page *page = netmem_to_page(netmem); + atomic_long_t *pp_ref_count = netmem_get_pp_ref_count_ref(netmem); long ret; /* If nr == pp_ref_count then we have cleared all remaining @@ -260,19 +260,19 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr) * initially, and only overwrite it when the page is partitioned into * more than one piece. */ - if (atomic_long_read(&page->pp_ref_count) == nr) { + if (atomic_long_read(pp_ref_count) == nr) { /* As we have ensured nr is always one for constant case using * the BUILD_BUG_ON(), only need to handle the non-constant case * here for pp_ref_count draining, which is a rare case. */ BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1); if (!__builtin_constant_p(nr)) - atomic_long_set(&page->pp_ref_count, 1); + atomic_long_set(pp_ref_count, 1); return 0; } - ret = atomic_long_sub_return(nr, &page->pp_ref_count); + ret = atomic_long_sub_return(nr, pp_ref_count); WARN_ON(ret < 0); /* We are the last user here too, reset pp_ref_count back to 1 to @@ -281,7 +281,7 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr) * page_pool_unref_page() currently. */ if (unlikely(!ret)) - atomic_long_set(&page->pp_ref_count, 1); + atomic_long_set(pp_ref_count, 1); return ret; } @@ -400,9 +400,7 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va, static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem) { - struct page *page = netmem_to_page(netmem); - - dma_addr_t ret = page->dma_addr; + dma_addr_t ret = netmem_get_dma_addr(netmem); if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) ret <<= PAGE_SHIFT; @@ -425,18 +423,17 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page) static inline bool page_pool_set_dma_addr_netmem(netmem_ref netmem, dma_addr_t addr) { - struct page *page = netmem_to_page(netmem); - if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) { - page->dma_addr = addr >> PAGE_SHIFT; + netmem_set_dma_addr(netmem, addr >> PAGE_SHIFT); /* We assume page alignment to shave off bottom bits, * if this "compression" doesn't work we need to drop. */ - return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT; + return addr != (dma_addr_t)netmem_get_dma_addr(netmem) + << PAGE_SHIFT; } - page->dma_addr = addr; + netmem_set_dma_addr(netmem, addr); return false; } diff --git a/net/core/devmem.c b/net/core/devmem.c index 268fc8455a6d..c25ede5f6fb9 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -121,7 +121,10 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding) index = offset / PAGE_SIZE; niov = &owner->niovs[index]; + niov->pp_magic = 0; + niov->pp = NULL; niov->dma_addr = 0; + atomic_long_set(&niov->pp_ref_count, 0); net_devmem_dmabuf_binding_get(binding); diff --git a/net/core/page_pool.c b/net/core/page_pool.c index c8125be3a6e2..c7bffd08218b 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -25,7 +25,7 @@ #include "page_pool_priv.h" -static DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers); +DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers); #define DEFER_TIME (msecs_to_jiffies(1000)) #define DEFER_WARN_INTERVAL (60 * HZ) @@ -359,7 +359,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool) if (unlikely(!netmem)) break; - if (likely(page_to_nid(netmem_to_page(netmem)) == pref_nid)) { + if (likely(netmem_is_pref_nid(netmem, pref_nid))) { pool->alloc.cache[pool->alloc.count++] = netmem; } else { /* NUMA mismatch; @@ -446,10 +446,8 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) { - struct page *page = netmem_to_page(netmem); - - page->pp = pool; - page->pp_magic |= PP_SIGNATURE; + netmem_set_pp(netmem, pool); + netmem_or_pp_magic(netmem, PP_SIGNATURE); /* Ensuring all pages have been split into one fragment initially: * page_pool_set_pp_info() is only called once for every page when it @@ -464,10 +462,8 @@ static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) static void page_pool_clear_pp_info(netmem_ref netmem) { - struct page *page = netmem_to_page(netmem); - - page->pp_magic = 0; - page->pp = NULL; + netmem_clear_pp_magic(netmem); + netmem_set_pp(netmem, NULL); } static struct page *__page_pool_alloc_page_order(struct page_pool *pool, @@ -695,8 +691,9 @@ static bool page_pool_recycle_in_cache(netmem_ref netmem, static bool __page_pool_page_can_be_recycled(netmem_ref netmem) { - return page_ref_count(netmem_to_page(netmem)) == 1 && - !page_is_pfmemalloc(netmem_to_page(netmem)); + return netmem_is_net_iov(netmem) || + (page_ref_count(netmem_to_page(netmem)) == 1 && + !page_is_pfmemalloc(netmem_to_page(netmem))); } /* If the page refcnt == 1, this will try to recycle the page. @@ -718,7 +715,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem, * refcnt == 1 means page_pool owns page, and can recycle it. * * page is NOT reusable when allocated when system is under - * some pressure. (page_is_pfmemalloc) + * some pressure. (page_pool_page_is_pfmemalloc) */ if (likely(__page_pool_page_can_be_recycled(netmem))) { /* Read barrier done in page_ref_count / READ_ONCE */ @@ -734,6 +731,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem, /* Page found as candidate for recycling */ return netmem; } + /* Fallback/non-XDP mode: API user have elevated refcnt. * * Many drivers split up the page into fragments, and some @@ -928,7 +926,7 @@ static void page_pool_empty_ring(struct page_pool *pool) /* Empty recycle ring */ while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) { /* Verify the refcnt invariant of cached pages */ - if (!(page_ref_count(netmem_to_page(netmem)) == 1)) + if (!(netmem_ref_count(netmem) == 1)) pr_crit("%s() page_pool refcnt %d violation\n", __func__, netmem_ref_count(netmem)); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index dc6b1f6435e2..753d61680d69 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -906,9 +906,9 @@ static void skb_clone_fraglist(struct sk_buff *skb) skb_get(list); } -static bool is_pp_page(struct page *page) +static bool is_pp_netmem(netmem_ref netmem) { - return (page->pp_magic & ~0x3UL) == PP_SIGNATURE; + return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE; } int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb, @@ -1006,11 +1006,10 @@ EXPORT_SYMBOL(skb_cow_data_for_xdp); #if IS_ENABLED(CONFIG_PAGE_POOL) bool napi_pp_put_page(netmem_ref netmem, bool napi_safe) { - struct page *page = netmem_to_page(netmem); bool allow_direct = false; struct page_pool *pp; - page = compound_head(page); + netmem = netmem_compound_head(netmem); /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation * in order to preserve any existing bits, such as bit 0 for the @@ -1019,10 +1018,10 @@ bool napi_pp_put_page(netmem_ref netmem, bool napi_safe) * and page_is_pfmemalloc() is checked in __page_pool_put_page() * to avoid recycling the pfmemalloc page. */ - if (unlikely(!is_pp_page(page))) + if (unlikely(!is_pp_netmem(netmem))) return false; - pp = page->pp; + pp = netmem_get_pp(netmem); /* Allow direct recycle if we have reasons to believe that we are * in the same context as the consumer would run, so there's @@ -1043,7 +1042,7 @@ bool napi_pp_put_page(netmem_ref netmem, bool napi_safe) * The page will be returned to the pool here regardless of the * 'flipped' fragment being in use or not. */ - page_pool_put_full_netmem(pp, page_to_netmem(page), allow_direct); + page_pool_put_full_netmem(pp, netmem, allow_direct); return true; } @@ -1070,7 +1069,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe) static int skb_pp_frag_ref(struct sk_buff *skb) { struct skb_shared_info *shinfo; - struct page *head_page; + netmem_ref head_netmem; int i; if (!skb->pp_recycle) @@ -1079,11 +1078,11 @@ static int skb_pp_frag_ref(struct sk_buff *skb) shinfo = skb_shinfo(skb); for (i = 0; i < shinfo->nr_frags; i++) { - head_page = compound_head(skb_frag_page(&shinfo->frags[i])); - if (likely(is_pp_page(head_page))) - page_pool_ref_page(head_page); + head_netmem = netmem_compound_head(shinfo->frags[i].netmem); + if (likely(is_pp_netmem(head_netmem))) + page_pool_ref_netmem(head_netmem); else - page_ref_inc(head_page); + page_ref_inc(netmem_to_page(head_netmem)); } return 0; }