From patchwork Tue Sep 26 06:09:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 13398712
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Arjan Van De Ven, Huang Ying, Mel Gorman, Vlastimil Babka,
 David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko,
 Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: [PATCH -V2 03/10] mm, pcp: reduce lock contention for draining
 high-order pages
Date: Tue, 26 Sep 2023 14:09:04 +0800
Message-Id: <20230926060911.266511-4-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230926060911.266511-1-ying.huang@intel.com>
References: <20230926060911.266511-1-ying.huang@intel.com>

Commit f26b3fa04611 ("mm/page_alloc: limit number of high-order pages on
PCP during bulk free") drains the PCP (Per-CPU Pageset) when it is
mostly used for freeing high-order pages, to improve the reuse of
cache-hot pages between the page allocating and freeing CPUs.

On a system with a small per-CPU data cache, pages shouldn't be cached
before draining, to guarantee that they stay cache-hot. But on a system
with a large per-CPU data cache, more pages can be cached before
draining to reduce zone lock contention.

So, in this patch, instead of draining without any caching, up to
"batch" pages will be cached in the PCP before draining if the per-CPU
data cache size, counted in pages, is more than "4 * batch".

On a 2-socket Intel server with 128 logical CPUs, with the patch, the
network bandwidth of the UNIX (AF_UNIX) test case of the lmbench test
suite with 16-pair processes increases by 72.2%. The cycles% of the
spinlock contention (mostly for the zone lock) decreases from 45.8% to
21.2%. The number of PCP drains for high-order page freeing (free_high)
decreases by 89.8%. The cache miss rate stays at 0.3%.
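
For illustration only, the rule described above can be modelled in
userspace C roughly as follows. This is a sketch, not kernel code: the
helper names, MODEL_PAGE_SHIFT, and the 4 KiB page size are assumptions
made for the example, and the real check runs under pcp->lock on the
per-CPU structures.

```c
#include <stdbool.h>

/* Assumption for this sketch: 4 KiB pages (the kernel uses PAGE_SHIFT). */
#define MODEL_PAGE_SHIFT 12

/*
 * Hypothetical helper: true if the per-CPU data cache, counted in pages,
 * is more than "4 * batch", i.e. up to "batch" high-order pages may be
 * kept in the PCP before draining.
 */
static bool pcp_may_cache_batch(unsigned long data_cache_bytes,
				unsigned int batch)
{
	return (data_cache_bytes >> MODEL_PAGE_SHIFT) > 4UL * batch;
}

/*
 * Hypothetical helper modelling the drain decision for a high-order free:
 * drain only if free_factor is non-zero, the previous free was also
 * high-order, and either batching is not allowed or at least "batch"
 * pages are already cached.
 */
static bool pcp_should_drain(int free_factor, bool prev_free_high_order,
			     bool may_cache_batch, unsigned int count,
			     unsigned int batch)
{
	return free_factor && prev_free_high_order &&
	       (!may_cache_batch || count >= batch);
}
```

Under these assumptions and an assumed batch of 63, a 2 MiB data cache
holds 512 pages, which is above the 252-page threshold, so draining is
deferred until 63 pages are cached; a 256 KiB data cache holds only 64
pages, so the PCP is drained without caching, as before the patch.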

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Matthew Wilcox
Cc: Christoph Lameter
---
 drivers/base/cacheinfo.c |  2 ++
 include/linux/gfp.h      |  1 +
 include/linux/mmzone.h   |  1 +
 mm/page_alloc.c          | 37 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 3e8951a3fbab..a55b2f83958b 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -943,6 +943,7 @@ static int cacheinfo_cpu_online(unsigned int cpu)
 	if (rc)
 		goto err;
 	update_data_cache_size(true, cpu);
+	setup_pcp_cacheinfo();
 	return 0;
 err:
 	free_cache_attributes(cpu);
@@ -956,6 +957,7 @@ static int cacheinfo_cpu_pre_down(unsigned int cpu)
 
 	free_cache_attributes(cpu);
 	update_data_cache_size(false, cpu);
+	setup_pcp_cacheinfo();
 	return 0;
 }
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 665f06675c83..665edc11fb9f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -325,6 +325,7 @@ void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
+void setup_pcp_cacheinfo(void);
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 64d5ed2bb724..4132e7490b49 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -677,6 +677,7 @@ enum zone_watermarks {
 #define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost)
 
 #define PCPF_PREV_FREE_HIGH_ORDER	0x01
+#define PCPF_FREE_HIGH_BATCH		0x02
 
 struct per_cpu_pages {
 	spinlock_t lock;	/* Protects lists field */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 295e61f0c49d..e97814985710 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
 #include "shuffle.h"
@@ -2385,7 +2386,9 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
 		free_high = (pcp->free_factor &&
-			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER));
+			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
+			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
+			      pcp->count >= READ_ONCE(pcp->batch)));
 		pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
@@ -5418,6 +5421,38 @@ static void zone_pcp_update(struct zone *zone, int cpu_online)
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
+static void zone_pcp_update_cacheinfo(struct zone *zone)
+{
+	int cpu;
+	struct per_cpu_pages *pcp;
+	struct cpu_cacheinfo *cci;
+
+	for_each_online_cpu(cpu) {
+		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+		cci = get_cpu_cacheinfo(cpu);
+		/*
+		 * If per-CPU data cache is large enough, up to
+		 * "batch" high-order pages can be cached in PCP for
+		 * consecutive freeing. This can reduce zone lock
+		 * contention without hurting cache-hot pages sharing.
+		 */
+		spin_lock(&pcp->lock);
+		if ((cci->size_data >> PAGE_SHIFT) > 4 * pcp->batch)
+			pcp->flags |= PCPF_FREE_HIGH_BATCH;
+		else
+			pcp->flags &= ~PCPF_FREE_HIGH_BATCH;
+		spin_unlock(&pcp->lock);
+	}
+}
+
+void setup_pcp_cacheinfo(void)
+{
+	struct zone *zone;
+
+	for_each_populated_zone(zone)
+		zone_pcp_update_cacheinfo(zone);
+}
+
 /*
  * Allocate per cpu pagesets and initialize them.
  * Before this call only boot pagesets were available.