From patchwork Mon Oct 16 05:30:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 13422496
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Arjan Van De Ven, Huang Ying, Mel Gorman, Vlastimil Babka,
 David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko,
 Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: [PATCH -V3 9/9] mm, pcp: reduce detecting time of consecutive high order page freeing
Date: Mon, 16 Oct 2023 13:30:02 +0800
Message-Id: <20231016053002.756205-10-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231016053002.756205-1-ying.huang@intel.com>
References: <20231016053002.756205-1-ying.huang@intel.com>

In the current PCP auto-tuning design, if the number of pages allocated
on a CPU is much larger than the number of pages freed, the PCP high
may reach its maximal value even if the allocating/freeing depth is
small, for example, in the sender of network workloads.  If a CPU was
originally used as the sender and is then used as the receiver after a
context switch, the whole PCP must be filled up to the maximal high
before PCP draining is triggered for consecutive high-order freeing.
This hurts the performance of some network workloads.

To solve the issue, this patch tracks consecutive page freeing with a
counter instead of relying on PCP draining, so consecutive page
freeing can be detected much earlier.

On a 2-socket Intel server with 128 logical CPUs, we tested the
SCTP_STREAM_MANY test case of the netperf test suite with 64 pairs of
processes.  With the patch, the network bandwidth improves by 5.0%.
This restores the performance drop caused by PCP auto-tuning.

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Matthew Wilcox
Cc: Christoph Lameter
---
 include/linux/mmzone.h |  2 +-
 mm/page_alloc.c        | 27 +++++++++++++++------------
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c88770381aaf..57086c57b8e4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -700,10 +700,10 @@ struct per_cpu_pages {
 	int batch;		/* chunk size for buddy add/remove */
 	u8 flags;		/* protected by pcp->lock */
 	u8 alloc_factor;	/* batch scaling factor during allocate */
-	u8 free_factor;		/* batch scaling factor during free */
 #ifdef CONFIG_NUMA
 	u8 expire;		/* When 0, remote pagesets are drained */
 #endif
+	short free_count;	/* consecutive free count */
 
 	/* Lists of pages, one per migrate type stored on the pcp-lists */
 	struct list_head lists[NR_PCP_LISTS];
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 253fc7d0498e..28088dd7a968 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2369,13 +2369,10 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int batch, int high, bool free
 	max_nr_free = high - batch;
 
 	/*
-	 * Double the number of pages freed each time there is subsequent
-	 * freeing of pages without any allocation.
+	 * Increase the batch number to the number of the consecutive
+	 * freed pages to reduce zone lock contention.
 	 */
-	batch <<= pcp->free_factor;
-	if (batch <= max_nr_free && pcp->free_factor < CONFIG_PCP_BATCH_SCALE_MAX)
-		pcp->free_factor++;
-	batch = clamp(batch, min_nr_free, max_nr_free);
+	batch = clamp_t(int, pcp->free_count, min_nr_free, max_nr_free);
 
 	return batch;
 }
@@ -2403,7 +2400,9 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 	 * stored on pcp lists
 	 */
 	if (test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags)) {
-		pcp->high = max(high - (batch << pcp->free_factor), high_min);
+		int free_count = max_t(int, pcp->free_count, batch);
+
+		pcp->high = max(high - free_count, high_min);
 		return min(batch << 2, pcp->high);
 	}
 
@@ -2411,10 +2410,12 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 		return high;
 
 	if (test_bit(ZONE_BELOW_HIGH, &zone->flags)) {
-		pcp->high = max(high - (batch << pcp->free_factor), high_min);
+		int free_count = max_t(int, pcp->free_count, batch);
+
+		pcp->high = max(high - free_count, high_min);
 		high = max(pcp->count, high_min);
 	} else if (pcp->count >= high) {
-		int need_high = (batch << pcp->free_factor) + batch;
+		int need_high = pcp->free_count + batch;
 
 		/* pcp->high should be large enough to hold batch freed pages */
 		if (pcp->high < need_high)
@@ -2451,7 +2452,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 * stops will be drained from vmstat refresh context.
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
-		free_high = (pcp->free_factor &&
+		free_high = (pcp->free_count >= batch &&
 			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
 			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
 			      pcp->count >= READ_ONCE(batch)));
@@ -2459,6 +2460,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
 	}
+	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
+		pcp->free_count += (1 << order);
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2855,7 +2858,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	 * See nr_pcp_free() where free_factor is increased for subsequent
 	 * frees.
 	 */
-	pcp->free_factor >>= 1;
+	pcp->free_count >>= 1;
 	list = &pcp->lists[order_to_pindex(migratetype, order)];
 	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
 	pcp_spin_unlock(pcp);
@@ -5488,7 +5491,7 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 	pcp->high_min = BOOT_PAGESET_HIGH;
 	pcp->high_max = BOOT_PAGESET_HIGH;
 	pcp->batch = BOOT_PAGESET_BATCH;
-	pcp->free_factor = 0;
+	pcp->free_count = 0;
 }
 
 static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high_min,
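
For readers following the free_count accounting in the hunks above, here
is a minimal userspace sketch of the counter, not kernel code.  The
struct pcp_model type, the model_*() helpers and MODEL_BATCH_SCALE_MAX
are hypothetical names introduced only for illustration;
MODEL_BATCH_SCALE_MAX stands in for CONFIG_PCP_BATCH_SCALE_MAX and its
value of 5 is an assumption.  The sketch covers only the counter: it
grows by the number of pages freed until it saturates, is halved on each
allocation, and is clamped when used as the free batch.

```c
/* Assumed stand-in for CONFIG_PCP_BATCH_SCALE_MAX; the value 5 is a guess. */
#define MODEL_BATCH_SCALE_MAX 5

/* Hypothetical, stripped-down stand-in for struct per_cpu_pages. */
struct pcp_model {
	int batch;       /* chunk size for buddy add/remove */
	int free_count;  /* consecutive free count, in pages */
};

/* Plain-C replacement for the kernel's clamp_t(int, ...). */
static int clamp_int(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Mirrors the lines added to free_unref_page_commit(): count freed pages. */
static void model_free(struct pcp_model *pcp, int order)
{
	if (pcp->free_count < (pcp->batch << MODEL_BATCH_SCALE_MAX))
		pcp->free_count += 1 << order;
}

/* Mirrors the change in rmqueue_pcplist(): an allocation halves the count. */
static void model_alloc(struct pcp_model *pcp)
{
	pcp->free_count >>= 1;
}

/* Mirrors the clamp in nr_pcp_free(): the count becomes the free batch. */
static int model_nr_free(const struct pcp_model *pcp,
			 int min_nr_free, int max_nr_free)
{
	return clamp_int(pcp->free_count, min_nr_free, max_nr_free);
}

/* First term of the free_high test: enough consecutive frees were seen. */
static int model_detects_consecutive(const struct pcp_model *pcp)
{
	return pcp->free_count >= pcp->batch;
}
```

Under these assumptions, with batch set to 63, sixteen order-2 frees
raise free_count to 64, which already satisfies free_count >= batch
without first filling the PCP up to its high mark; a single allocation
then halves it to 32.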