From patchwork Tue Jul 9 13:27:28 2024
X-Patchwork-Submitter: Yunsheng Lin
X-Patchwork-Id: 13727973
From: Yunsheng Lin
CC: Yunsheng Lin, Alexander Duyck, Andrew Morton
Subject: [PATCH net-next v10 03/15] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Date: Tue, 9 Jul 2024 21:27:28 +0800
Message-ID: <20240709132741.47751-4-linyunsheng@huawei.com>
In-Reply-To: <20240709132741.47751-1-linyunsheng@huawei.com>
References: <20240709132741.47751-1-linyunsheng@huawei.com>

We are about to use the page_frag_alloc_*() API not just to allocate
memory for skb->data, but also to allocate memory for skb frags.

Currently the implementation of page_frag in the mm subsystem runs the
offset as a countdown rather than a count-up value. There may be
several advantages to that, as mentioned in [1], but it also has some
disadvantages: for example, it may prevent skb frag coalescing and
more correct cache prefetching.

We have a trade-off to make in order to have a unified implementation
and API for page_frag, so use an initial zero offset in this patch.
The following patch will try to make some optimizations to avoid the
disadvantages as much as possible.

Rename 'offset' to 'remaining' to retain the 'countdown' behavior as a
'remaining countdown' instead of an 'offset countdown'. The renaming
also enables us to do a single 'fragsz > remaining' check for the case
of the cache not being big enough, which should be the fast path if we
ensure 'remaining' is zero when 'va' == NULL by memset'ing
'struct page_frag_cache' in page_frag_cache_init() and
page_frag_cache_drain().

1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/

CC: Alexander Duyck
Signed-off-by: Yunsheng Lin
---
 include/linux/page_frag_cache.h |  4 +--
 mm/page_frag_cache.c            | 50 +++++++++++++++++++++------------
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 325872cec8a4..ed8bacbb877b 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -14,10 +14,10 @@
 struct page_frag_cache {
 	void *va;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	__u16 offset;
+	__u16 remaining;
 	__u16 size;
 #else
-	__u32 offset;
+	__u32 remaining;
 #endif
 	/* we maintain a pagecount bias, so that we dont dirty cache line
 	 * containing page->_refcount every time we allocate a fragment.
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 609a485cd02a..ef0a02f12acc 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -22,6 +22,7 @@
 static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 					     gfp_t gfp_mask)
 {
+	unsigned int page_size = PAGE_FRAG_CACHE_MAX_SIZE;
 	struct page *page = NULL;
 	gfp_t gfp = gfp_mask;

@@ -30,12 +31,21 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
 	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
 				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
 #endif
-	if (unlikely(!page))
+	if (unlikely(!page)) {
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+		if (unlikely(!page)) {
+			nc->va = NULL;
+			return NULL;
+		}

-	nc->va = page ? page_address(page) : NULL;
+		page_size = PAGE_SIZE;
+	}
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	nc->size = page_size;
+#endif
+	nc->va = page_address(page);

 	return page;
 }
@@ -63,9 +73,9 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			      unsigned int fragsz, gfp_t gfp_mask,
 			      unsigned int align_mask)
 {
+	int aligned_remaining, remaining;
 	unsigned int size = PAGE_SIZE;
 	struct page *page;
-	int offset;

 	if (unlikely(!nc->va)) {
 refill:
@@ -82,14 +92,20 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		 */
 		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);

-		/* reset page count bias and offset to start of new frag */
+		/* reset page count bias and remaining to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->remaining = size;
 	}

-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	/* if size can vary use size else just use PAGE_SIZE */
+	size = nc->size;
+#endif
+
+	aligned_remaining = nc->remaining & align_mask;
+	remaining = aligned_remaining - fragsz;
+	if (unlikely(remaining < 0)) {
 		page = virt_to_page(nc->va);

 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -100,17 +116,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			goto refill;
 		}

-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* OK, page count is 0, we can safely set it */
 		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);

-		/* reset page count bias and offset to start of new frag */
+		/* reset page count bias and remaining to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
+		nc->remaining = size;
+
+		aligned_remaining = size;
+		remaining = aligned_remaining - fragsz;
+		if (unlikely(remaining < 0)) {
 			/*
 			 * The caller is trying to allocate a fragment
 			 * with fragsz > PAGE_SIZE but the cache isn't big
@@ -125,10 +140,9 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 	}

 	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
+	nc->remaining = remaining;

-	return nc->va + offset;
+	return nc->va + (size - aligned_remaining);
 }
 EXPORT_SYMBOL(__page_frag_alloc_align);
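
For readers who want to check the arithmetic outside the kernel, the
'remaining' countdown used by the patch can be sketched in userspace
roughly as follows. This is only an illustrative model, not part of the
patch: struct frag_cache_model and frag_model_alloc() are hypothetical
names, refill and refcount handling are left out, and the buffer size
is assumed to be a multiple of the requested alignment.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct page_frag_cache. */
struct frag_cache_model {
	unsigned int size;	/* total bytes in the backing buffer */
	unsigned int remaining;	/* bytes not yet handed out */
};

/*
 * Return the offset of a new fragment within the buffer, or -1 if the
 * request does not fit. 'align' must be a power of two, and 'size' is
 * assumed to be a multiple of 'align'. No refill is attempted here.
 */
static long frag_model_alloc(struct frag_cache_model *nc,
			     unsigned int fragsz, unsigned int align)
{
	unsigned int align_mask, aligned_remaining;

	assert(align && !(align & (align - 1)));
	align_mask = ~(align - 1u);	/* same value as the kernel's -align */

	/* Rounding 'remaining' down rounds the resulting offset up. */
	aligned_remaining = nc->remaining & align_mask;

	/* Single check for "cache not big enough". */
	if (fragsz > aligned_remaining)
		return -1;

	nc->remaining = aligned_remaining - fragsz;

	/* Offsets count up from zero while 'remaining' counts down. */
	return (long)(nc->size - aligned_remaining);
}
```

Starting from size == remaining == 4096, a 100-byte request with align 1
returns offset 0 and leaves remaining at 3996. A following 100-byte
request with align 64 rounds remaining down to 3968, returns offset 128
and leaves remaining at 3868.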