From patchwork Fri Jan 26 08:19:44 2024
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 13532183
From: Huang Ying
To: Andrew Morton, Sudeep Holla
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Kyle Meyer, Mel Gorman
Subject: [PATCH] mm and cache_info: remove unnecessary CPU cache info update
Date: Fri, 26 Jan 2024 16:19:44 +0800
Message-Id: <20240126081944.414520-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
For each CPU hotplug event, we update the per-CPU data cache slice size
and the corresponding PCP (per-CPU pages) configuration for every
online CPU, to keep the implementation simple.  But Kyle reported that
this takes tens of seconds during boot on a machine with 34 zones and
3840 CPUs.  So, with this patch, for each CPU hotplug event we only
update the per-CPU data cache slice size and the corresponding PCP
configuration for the CPUs that share caches with the hotplugged CPU.
With the patch, system boot time is reduced by 67 seconds on that
machine.
Fixes: 362d37a106dd ("mm, pcp: reduce lock contention for draining high-order pages")
Signed-off-by: "Huang, Ying"
Originally-by: Kyle Meyer
Reported-and-tested-by: Kyle Meyer
Cc: Andrew Morton
Cc: Sudeep Holla
Cc: Mel Gorman
---
 drivers/base/cacheinfo.c | 50 +++++++++++++++++++++++++++++++++++-----
 include/linux/gfp.h      |  2 +-
 mm/page_alloc.c          | 39 +++++++++++++++----------------
 3 files changed, 63 insertions(+), 28 deletions(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index f1e79263fe61..23b8cba4a2a3 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -898,6 +898,37 @@ static int cache_add_dev(unsigned int cpu)
 	return rc;
 }
 
+static unsigned int cpu_map_shared_cache(bool online, unsigned int cpu,
+					 cpumask_t **map)
+{
+	struct cacheinfo *llc, *sib_llc;
+	unsigned int sibling;
+
+	if (!last_level_cache_is_valid(cpu))
+		return 0;
+
+	llc = per_cpu_cacheinfo_idx(cpu, cache_leaves(cpu) - 1);
+
+	if (llc->type != CACHE_TYPE_DATA && llc->type != CACHE_TYPE_UNIFIED)
+		return 0;
+
+	if (online) {
+		*map = &llc->shared_cpu_map;
+		return cpumask_weight(*map);
+	}
+
+	/* shared_cpu_map of offlined CPU will be cleared, so use sibling map */
+	for_each_cpu(sibling, &llc->shared_cpu_map) {
+		if (sibling == cpu || !last_level_cache_is_valid(sibling))
+			continue;
+		sib_llc = per_cpu_cacheinfo_idx(sibling, cache_leaves(sibling) - 1);
+		*map = &sib_llc->shared_cpu_map;
+		return cpumask_weight(*map);
+	}
+
+	return 0;
+}
+
 /*
  * Calculate the size of the per-CPU data cache slice.  This can be
  * used to estimate the size of the data cache slice that can be used
@@ -929,28 +960,31 @@ static void update_per_cpu_data_slice_size_cpu(unsigned int cpu)
 	ci->per_cpu_data_slice_size = llc->size / nr_shared;
 }
 
-static void update_per_cpu_data_slice_size(bool cpu_online, unsigned int cpu)
+static void update_per_cpu_data_slice_size(bool cpu_online, unsigned int cpu,
+					   cpumask_t *cpu_map)
 {
 	unsigned int icpu;
 
-	for_each_online_cpu(icpu) {
+	for_each_cpu(icpu, cpu_map) {
 		if (!cpu_online && icpu == cpu)
 			continue;
 		update_per_cpu_data_slice_size_cpu(icpu);
+		setup_pcp_cacheinfo(icpu);
 	}
 }
 
 static int cacheinfo_cpu_online(unsigned int cpu)
 {
 	int rc = detect_cache_attributes(cpu);
+	cpumask_t *cpu_map;
 
 	if (rc)
 		return rc;
 	rc = cache_add_dev(cpu);
 	if (rc)
 		goto err;
-	update_per_cpu_data_slice_size(true, cpu);
-	setup_pcp_cacheinfo();
+	if (cpu_map_shared_cache(true, cpu, &cpu_map))
+		update_per_cpu_data_slice_size(true, cpu, cpu_map);
 	return 0;
 err:
 	free_cache_attributes(cpu);
@@ -959,12 +993,16 @@ static int cacheinfo_cpu_online(unsigned int cpu)
 
 static int cacheinfo_cpu_pre_down(unsigned int cpu)
 {
+	cpumask_t *cpu_map;
+	unsigned int nr_shared;
+
+	nr_shared = cpu_map_shared_cache(false, cpu, &cpu_map);
 	if (cpumask_test_and_clear_cpu(cpu, &cache_dev_map))
 		cpu_cache_sysfs_exit(cpu);
 
 	free_cache_attributes(cpu);
-	update_per_cpu_data_slice_size(false, cpu);
-	setup_pcp_cacheinfo();
+	if (nr_shared > 1)
+		update_per_cpu_data_slice_size(false, cpu, cpu_map);
 	return 0;
 }
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..09e22091f1b0 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -334,7 +334,7 @@ void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
-void setup_pcp_cacheinfo(void);
+void setup_pcp_cacheinfo(unsigned int cpu);
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..9faca05d124e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5572,37 +5572,34 @@ static void zone_pcp_update(struct zone *zone, int cpu_online)
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
-static void zone_pcp_update_cacheinfo(struct zone *zone)
+static void zone_pcp_update_cacheinfo(struct zone *zone, unsigned int cpu)
 {
-	int cpu;
 	struct per_cpu_pages *pcp;
 	struct cpu_cacheinfo *cci;
 
-	for_each_online_cpu(cpu) {
-		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-		cci = get_cpu_cacheinfo(cpu);
-		/*
-		 * If data cache slice of CPU is large enough, "pcp->batch"
-		 * pages can be preserved in PCP before draining PCP for
-		 * consecutive high-order pages freeing without allocation.
-		 * This can reduce zone lock contention without hurting
-		 * cache-hot pages sharing.
-		 */
-		spin_lock(&pcp->lock);
-		if ((cci->per_cpu_data_slice_size >> PAGE_SHIFT) > 3 * pcp->batch)
-			pcp->flags |= PCPF_FREE_HIGH_BATCH;
-		else
-			pcp->flags &= ~PCPF_FREE_HIGH_BATCH;
-		spin_unlock(&pcp->lock);
-	}
+	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+	cci = get_cpu_cacheinfo(cpu);
+	/*
+	 * If data cache slice of CPU is large enough, "pcp->batch"
+	 * pages can be preserved in PCP before draining PCP for
+	 * consecutive high-order pages freeing without allocation.
+	 * This can reduce zone lock contention without hurting
+	 * cache-hot pages sharing.
+	 */
+	spin_lock(&pcp->lock);
+	if ((cci->per_cpu_data_slice_size >> PAGE_SHIFT) > 3 * pcp->batch)
+		pcp->flags |= PCPF_FREE_HIGH_BATCH;
+	else
+		pcp->flags &= ~PCPF_FREE_HIGH_BATCH;
+	spin_unlock(&pcp->lock);
 }
 
-void setup_pcp_cacheinfo(void)
+void setup_pcp_cacheinfo(unsigned int cpu)
 {
 	struct zone *zone;
 
 	for_each_populated_zone(zone)
-		zone_pcp_update_cacheinfo(zone);
+		zone_pcp_update_cacheinfo(zone, cpu);
 }
 
 /*