From patchwork Mon Oct 16 05:29:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 13422490
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven,
	Huang Ying, Mel Gorman, Sudeep Holla, Vlastimil Babka,
	David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko,
	Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: [PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages
Date: Mon, 16 Oct 2023 13:29:56 +0800
Message-Id: <20231016053002.756205-4-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231016053002.756205-1-ying.huang@intel.com>
References: <20231016053002.756205-1-ying.huang@intel.com>
MIME-Version: 1.0
Since commit f26b3fa04611 ("mm/page_alloc: limit number of high-order
pages on PCP during bulk free"), the PCP (Per-CPU Pageset) is drained
when the PCP is mostly used for freeing high-order pages, to improve
the reuse of cache-hot pages between the allocating and the freeing
CPUs.

On a system with a small per-CPU data cache slice, no pages should be
cached in the PCP before draining, to guarantee that the transferred
pages are still cache-hot.  But on a system with a large per-CPU data
cache slice, some pages can be cached before draining, to reduce zone
lock contention.

So, with this patch, instead of draining without any caching,
"pcp->batch" pages are cached in the PCP before draining if the size
of the per-CPU data cache slice is more than "3 * batch".  In theory,
a per-CPU data cache slice larger than "2 * batch" is already enough
to reuse cache-hot pages between CPUs; "3 * batch" is used to leave
room for the other cache users (code, other data accesses, etc.).

Note: "3 * batch" was chosen to make sure the optimization works on
recent x86_64 server CPUs.  If you want to increase it, please check
whether it breaks the optimization.
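The heuristic above can be sketched as a small stand-alone model.  This is
illustrative only: `pcp_free_high_batch()`, `should_free_high()`,
`MODEL_PAGE_SHIFT` and the slice/batch values are hypothetical, mirroring
the patch's logic rather than reproducing kernel API:

```c
#include <stdbool.h>

/* Hypothetical stand-alone model of the patch's heuristic (not kernel code). */
#define MODEL_PAGE_SHIFT 12	/* 4KB pages, as on x86_64 */

#define PCPF_PREV_FREE_HIGH_ORDER	(1u << 0)
#define PCPF_FREE_HIGH_BATCH		(1u << 1)

/*
 * PCPF_FREE_HIGH_BATCH is set when the CPU's data cache slice holds
 * more than "3 * batch" pages, i.e. is large enough to keep one batch
 * of freed pages cache-hot alongside the other cache users.
 */
static bool pcp_free_high_batch(unsigned long slice_size_bytes,
				unsigned long batch)
{
	return (slice_size_bytes >> MODEL_PAGE_SHIFT) > 3 * batch;
}

/*
 * Model of the modified free_high check: drain only when high-order
 * frees are consecutive and, if batch preservation is enabled, the PCP
 * already holds at least one batch of pages.
 */
static bool should_free_high(unsigned int flags, int free_factor,
			     int count, int batch)
{
	return free_factor &&
	       (flags & PCPF_PREV_FREE_HIGH_ORDER) &&
	       (!(flags & PCPF_FREE_HIGH_BATCH) || count >= batch);
}
```

For example, with a 2MB data cache slice and batch = 63, the slice holds
512 pages and 512 > 3 * 63 = 189, so batch preservation would be enabled;
a 512KB slice (128 pages) would not qualify.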
On a 2-socket Intel server with 128 logical CPUs, with this patch, the
network bandwidth of the UNIX (AF_UNIX) test case of the lmbench test
suite with 16-pair processes increases by 70.5%.  The cycles% of
spinlock contention (mostly on the zone lock) decreases from 46.1% to
21.3%.  The number of PCP drainings for high-order page freeing
(free_high) decreases by 89.9%.  The cache miss rate stays at about
0.2%.

Signed-off-by: "Huang, Ying"
Acked-by: Mel Gorman
Cc: Andrew Morton
Cc: Sudeep Holla
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Matthew Wilcox
Cc: Christoph Lameter
---
 drivers/base/cacheinfo.c |  2 ++
 include/linux/gfp.h      |  1 +
 include/linux/mmzone.h   |  6 ++++++
 mm/page_alloc.c          | 38 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 585c66fce9d9..f1e79263fe61 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -950,6 +950,7 @@ static int cacheinfo_cpu_online(unsigned int cpu)
 	if (rc)
 		goto err;
 	update_per_cpu_data_slice_size(true, cpu);
+	setup_pcp_cacheinfo();
 	return 0;
 err:
 	free_cache_attributes(cpu);
@@ -963,6 +964,7 @@ static int cacheinfo_cpu_pre_down(unsigned int cpu)
 
 	free_cache_attributes(cpu);
 	update_per_cpu_data_slice_size(false, cpu);
+	setup_pcp_cacheinfo();
 	return 0;
 }
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 665f06675c83..665edc11fb9f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -325,6 +325,7 @@ void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
+void setup_pcp_cacheinfo(void);
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 19c40a6f7e45..cdff247e8c6f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -682,8 +682,14 @@ enum zone_watermarks {
  * PCPF_PREV_FREE_HIGH_ORDER: a high-order page is freed in the
  * previous page freeing.  To avoid to drain PCP for an accident
  * high-order page freeing.
+ *
+ * PCPF_FREE_HIGH_BATCH: preserve "pcp->batch" pages in PCP before
+ * draining PCP for consecutive high-order pages freeing without
+ * allocation if data cache slice of CPU is large enough.  To reduce
+ * zone lock contention and keep cache-hot pages reusing.
  */
 #define	PCPF_PREV_FREE_HIGH_ORDER	BIT(0)
+#define	PCPF_FREE_HIGH_BATCH		BIT(1)
 
 struct per_cpu_pages {
 	spinlock_t lock;	/* Protects lists field */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 295e61f0c49d..ba2d8f06523e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include <linux/cacheinfo.h>
 #include
 #include "internal.h"
 #include "shuffle.h"
@@ -2385,7 +2386,9 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
 		free_high = (pcp->free_factor &&
-			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER));
+			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
+			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
+			      pcp->count >= READ_ONCE(pcp->batch)));
 		pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
@@ -5418,6 +5421,39 @@ static void zone_pcp_update(struct zone *zone, int cpu_online)
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
+static void zone_pcp_update_cacheinfo(struct zone *zone)
+{
+	int cpu;
+	struct per_cpu_pages *pcp;
+	struct cpu_cacheinfo *cci;
+
+	for_each_online_cpu(cpu) {
+		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+		cci = get_cpu_cacheinfo(cpu);
+		/*
+		 * If data cache slice of CPU is large enough, "pcp->batch"
+		 * pages can be preserved in PCP before draining PCP for
+		 * consecutive high-order pages freeing without allocation.
+		 * This can reduce zone lock contention without hurting
+		 * cache-hot pages sharing.
+		 */
+		spin_lock(&pcp->lock);
+		if ((cci->per_cpu_data_slice_size >> PAGE_SHIFT) > 3 * pcp->batch)
+			pcp->flags |= PCPF_FREE_HIGH_BATCH;
+		else
+			pcp->flags &= ~PCPF_FREE_HIGH_BATCH;
+		spin_unlock(&pcp->lock);
+	}
+}
+
+void setup_pcp_cacheinfo(void)
+{
+	struct zone *zone;
+
+	for_each_populated_zone(zone)
+		zone_pcp_update_cacheinfo(zone);
+}
+
 /*
  * Allocate per cpu pagesets and initialize them.
  * Before this call only boot pagesets were available.