From patchwork Mon Jul 22 02:10:59 2024
X-Patchwork-Submitter: Li Zhijian
X-Patchwork-Id: 13738213
From: Li Zhijian
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 Yasunori Gotou, Li Zhijian, David Hildenbrand, Vlastimil Babka,
 Yao Xingtao
Subject: [PATCH v2] mm/page_alloc: Fix pcp->count race between
 drain_pages_zone() vs __rmqueue_pcplist()
Date: Mon, 22 Jul 2024 10:10:59 +0800
Message-Id: <20240722021059.1076399-1-lizhijian@fujitsu.com>
X-Mailer: git-send-email 2.31.1

It's expected that no page should be left in pcp_list after calling
zone_pcp_disable() in offline_pages(). Previously, it was observed that
offline_pages() gets stuck [1] due to some pages remaining in pcp_list.

Cause:
There is a race condition between drain_pages_zone() and
__rmqueue_pcplist() involving the pcp->count variable.
See the scenario below:

         CPU0                              CPU1
    ----------------                    ---------------
                                      spin_lock(&pcp->lock);
                                      __rmqueue_pcplist() {
zone_pcp_disable() {
                                        /* list is empty */
                                        if (list_empty(list)) {
                                          /* add pages to pcp_list */
                                          alloced = rmqueue_bulk()
  mutex_lock(&pcp_batch_high_lock)
  ...
  __drain_all_pages() {
    drain_pages_zone() {
      /* read pcp->count, it's 0 here */
      count = READ_ONCE(pcp->count)
      /* 0 means nothing to drain */
                                          /* update pcp->count */
                                          pcp->count += alloced << order;
      ...
                                      ...
                                      spin_unlock(&pcp->lock);

In this case, even after zone_pcp_disable() has been called, some pages
remain in pcp_list. Since these pages are neither movable nor isolated,
offline_pages() gets stuck as a result.

Solution:
Expand the scope of the pcp->lock to also protect pcp->count in
drain_pages_zone(), to ensure no pages are left in the pcp list after
zone_pcp_disable().

[1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/

Cc: David Hildenbrand
Cc: Vlastimil Babka (SUSE)
Reported-by: Yao Xingtao
Signed-off-by: Li Zhijian
---
V2:
    - Narrow down the scope of the spin_lock() to limit the draining
      latency. # Vlastimil and David
    - In the above scenario, it's sufficient to read pcp->count once with
      the lock held; this fully fixed my issue [1] in thousands of runs
      (it previously reproduced in more than 5% of runs).
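For illustration only (not part of the patch): the interleaving above can
be replayed in user space with a small single-threaded model. This is a
minimal sketch under stated assumptions; struct pcp_model and every
function name below are hypothetical stand-ins, not kernel APIs, and the
spinning on pcp->lock is modeled by simply letting the lock holder finish
first. It ignores the page order (treats it as 0).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct per_cpu_pages. */
struct pcp_model {
	bool locked;		/* models pcp->lock being held by CPU1 */
	int count;		/* models pcp->count */
	int pages_on_list;	/* pages really sitting on the pcp list */
};

/* CPU1, first half: take the lock and refill the list; count is stale. */
static void alloc_begin(struct pcp_model *pcp, int alloced)
{
	assert(!pcp->locked);
	pcp->locked = true;
	pcp->pages_on_list += alloced;
}

/* CPU1, second half: update the count and drop the lock. */
static void alloc_finish(struct pcp_model *pcp, int alloced)
{
	pcp->count += alloced;
	pcp->locked = false;
}

/* CPU0 before the patch: lockless snapshot of the count. */
static int snapshot_unlocked(const struct pcp_model *pcp)
{
	return pcp->count;
}

/* CPU0 after the patch: the snapshot is taken only once the holder unlocks. */
static int snapshot_locked(struct pcp_model *pcp, int alloced)
{
	if (pcp->locked)
		alloc_finish(pcp, alloced);	/* models waiting for CPU1 */
	return pcp->count;
}

/* Replays the scenario; returns the pages left on the list after the drain. */
static int pages_left_after_drain(bool use_lock, int alloced)
{
	struct pcp_model pcp = { false, 0, 0 };
	int to_drain;

	alloc_begin(&pcp, alloced);
	to_drain = use_lock ? snapshot_locked(&pcp, alloced)
			    : snapshot_unlocked(&pcp);
	if (pcp.locked)
		alloc_finish(&pcp, alloced);	/* CPU1 completes after the read */

	/* The drain frees only as many pages as the snapshot reported. */
	pcp.pages_on_list -= to_drain;
	pcp.count -= to_drain;
	return pcp.pages_on_list;
}
```

In this model, calling pages_left_after_drain(false, 4) replays the
pre-patch ordering, where the drain sees a stale count of 0 and the
refilled pages stay behind; pages_left_after_drain(true, 4) replays the
ordering enforced by reading the count under the lock.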
RFC:
https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/
---
 mm/page_alloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ecf99190ea2..5388a35c4e9c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
 	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	int count = READ_ONCE(pcp->count);
+	int count;
 
+	spin_lock(&pcp->lock);
+	count = pcp->count;
+	spin_unlock(&pcp->lock);
 	while (count) {
 		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
 		count -= to_drain;