From patchwork Tue Sep 26 06:09:06 2023
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 13398714
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Arjan Van De Ven, Huang Ying, Mel Gorman, Vlastimil Babka,
 David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko,
 Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: [PATCH -V2 05/10] mm, page_alloc: scale the number of pages that are batch allocated
Date: Tue, 26 Sep 2023 14:09:06 +0800
Message-Id: <20230926060911.266511-6-ying.huang@intel.com>
In-Reply-To: <20230926060911.266511-1-ying.huang@intel.com>
References: <20230926060911.266511-1-ying.huang@intel.com>

When a task is allocating a large number of order-0 pages, it may
acquire the zone->lock multiple times while allocating pages in batches.
This may cause unnecessary contention on the zone lock when allocating a
very large number of pages.
This patch adapts the batch size to the recent allocation pattern,
scaling the batch size for subsequent allocations.

On a 2-socket Intel server with 224 logical CPUs, we run 8 kbuild
instances in parallel (each with `make -j 28`) in 8 cgroups.  This
simulates the kbuild server that is used by the 0-Day kbuild service.
With the patch, the cycles% of the spinlock contention (mostly for the
zone lock) decreases from 11.7% to 10.0% (with PCP size == 361).

Signed-off-by: "Huang, Ying"
Suggested-by: Mel Gorman
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Matthew Wilcox
Cc: Christoph Lameter
---
 include/linux/mmzone.h |  3 ++-
 mm/page_alloc.c        | 52 ++++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4132e7490b49..4f7420e35fbb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -685,9 +685,10 @@ struct per_cpu_pages {
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
 	u8 flags;		/* protected by pcp->lock */
+	u8 alloc_factor;	/* batch scaling factor during allocate */
 	u8 free_factor;		/* batch scaling factor during free */
 #ifdef CONFIG_NUMA
-	short expire;		/* When 0, remote pagesets are drained */
+	u8 expire;		/* When 0, remote pagesets are drained */
 #endif
 
 	/* Lists of pages, one per migrate type stored on the pcp-lists */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4b601f505401..b9226845abf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2376,6 +2376,12 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	int pindex;
 	bool free_high = false;
 
+	/*
+	 * On freeing, reduce the number of pages that are batch allocated.
+	 * See nr_pcp_alloc() where alloc_factor is increased for subsequent
+	 * allocations.
+	 */
+	pcp->alloc_factor >>= 1;
 	__count_vm_events(PGFREE, 1 << order);
 	pindex = order_to_pindex(migratetype, order);
 	list_add(&page->pcp_list, &pcp->lists[pindex]);
@@ -2682,6 +2688,41 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 	return page;
 }
 
+static int nr_pcp_alloc(struct per_cpu_pages *pcp, int order)
+{
+	int high, batch, max_nr_alloc;
+
+	high = READ_ONCE(pcp->high);
+	batch = READ_ONCE(pcp->batch);
+
+	/* Check for PCP disabled or boot pageset */
+	if (unlikely(high < batch))
+		return 1;
+
+	/*
+	 * Double the number of pages allocated each time there is subsequent
+	 * refilling of order-0 pages without drain.
+	 */
+	if (!order) {
+		max_nr_alloc = max(high - pcp->count - batch, batch);
+		batch <<= pcp->alloc_factor;
+		if (batch <= max_nr_alloc && pcp->alloc_factor < PCP_BATCH_SCALE_MAX)
+			pcp->alloc_factor++;
+		batch = min(batch, max_nr_alloc);
+	}
+
+	/*
+	 * Scale batch relative to order if batch implies free pages
+	 * can be stored on the PCP. Batch can be 1 for small zones or
+	 * for boot pagesets which should never store free pages as
+	 * the pages may belong to arbitrary zones.
+	 */
+	if (batch > 1)
+		batch = max(batch >> order, 2);
+
+	return batch;
+}
+
 /* Remove page from the per-cpu list, caller must protect the list */
 static inline
 struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
@@ -2694,18 +2735,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 
 	do {
 		if (list_empty(list)) {
-			int batch = READ_ONCE(pcp->batch);
+			int batch = nr_pcp_alloc(pcp, order);
 			int alloced;
 
-			/*
-			 * Scale batch relative to order if batch implies
-			 * free pages can be stored on the PCP. Batch can
-			 * be 1 for small zones or for boot pagesets which
-			 * should never store free pages as the pages may
-			 * belong to arbitrary zones.
-			 */
-			if (batch > 1)
-				batch = max(batch >> order, 2);
 			alloced = rmqueue_bulk(zone, order,
 					batch, list,
 					migratetype, alloc_flags);
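
Illustration only, not part of the patch: the user-space sketch below models
how nr_pcp_alloc() and the halving in free_unref_page_commit() interact.
It is a simplified model under stated assumptions, not kernel code.  The
struct pcp_model type and the model_* helpers are hypothetical names, the
value 5 for PCP_BATCH_SCALE_MAX is assumed here (the real constant is
defined elsewhere in the series), and READ_ONCE()/unlikely() and all
locking are dropped.

```c
/* Assumed value for illustration; not taken from this patch. */
#define PCP_BATCH_SCALE_MAX 5

/* Hypothetical, trimmed-down stand-in for struct per_cpu_pages. */
struct pcp_model {
	int high;			/* high watermark */
	int batch;			/* base chunk size */
	int count;			/* pages currently on the list */
	unsigned char alloc_factor;	/* batch scaling factor during allocate */
};

static int model_max(int a, int b) { return a > b ? a : b; }
static int model_min(int a, int b) { return a < b ? a : b; }

/* Mirrors the arithmetic of nr_pcp_alloc() from the patch. */
static int model_nr_pcp_alloc(struct pcp_model *pcp, int order)
{
	int high = pcp->high;
	int batch = pcp->batch;
	int max_nr_alloc;

	/* PCP disabled or boot pageset: allocate one page at a time. */
	if (high < batch)
		return 1;

	if (!order) {
		/* Never refill beyond what the list may hold below high. */
		max_nr_alloc = model_max(high - pcp->count - batch, batch);
		batch <<= pcp->alloc_factor;
		/* Grow the factor only while the scaled batch still fits. */
		if (batch <= max_nr_alloc &&
		    pcp->alloc_factor < PCP_BATCH_SCALE_MAX)
			pcp->alloc_factor++;
		batch = model_min(batch, max_nr_alloc);
	}

	/* Scale down for high-order pages, but keep at least 2. */
	if (batch > 1)
		batch = model_max(batch >> order, 2);

	return batch;
}

/* Models the free path: every free halves the scaling factor. */
static void model_free(struct pcp_model *pcp)
{
	pcp->alloc_factor >>= 1;
}
```

With high = 361 (the PCP size quoted in the changelog), an assumed base
batch of 63, and count held at 0, successive order-0 refills in this model
return 63, 126, 252 and then 298 (capped at high - count - batch).  A
single model_free() drops alloc_factor from 3 to 1, so the next refill
returns 126.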