From patchwork Wed Mar 6 09:52:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13583716
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: chrisl@kernel.org, david@redhat.com, hanchuanhua@oppo.com,
 hughd@google.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 ryan.roberts@arm.com, shy828301@gmail.com, v-songbaohua@oppo.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com
Subject: [PATCH v2] mm: hold PTL from the first PTE while reclaiming a large folio
Date: Wed, 6 Mar 2024 22:52:19 +1300
Message-Id: <20240306095219.71086-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1

From: Barry Song

Within try_to_unmap_one(), page_vma_mapped_walk() races with other
PTE modifications preceded by pte clear. While iterating over PTEs
of a large folio, it only starts acquiring PTL from the first valid
(present) PTE. PTE modifications can temporarily set PTEs to pte_none.
Consequently, the initial PTEs of a large folio might be skipped in
try_to_unmap_one().

For example, for an anon folio, if we skip PTE0, we may have PTE0 which
is still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
try_to_unmap_one(). The folio is then still mapped, fails to be
reclaimed, and is put back on the LRU in this round. This also breaks
PTE optimizations such as CONT-PTE on this large folio and may lead to
an accidental folio_split() afterwards. And since some of the PTEs are
now swap entries, accessing those parts incurs the overhead of
do_swap_page().

Although the kernel can withstand all of the above issues, the
situation is still awkward and worth improving.

The same race also occurs with small folios, but they have only one
PTE, so they cannot be partially unmapped.

This patch holds PTL from PTE0, allowing us to avoid reading PTE values
that are in the process of being transformed. With stable PTE values,
we can ensure that this large folio is either completely reclaimed or
that all PTEs remain untouched in this round.

A corner case is that if we hold PTL from PTE0 and most initial PTEs
have already been unmapped before that, we may increase the duration of
holding PTL. Thus we only apply this optimization to folios which are
still entirely mapped (not on the deferred_split list).

Cc: Hugh Dickins
Signed-off-by: Barry Song
Acked-by: David Hildenbrand
---
v2:
 * Refine commit message and code comment after reading all comments
   from Ryan and David, thanks!
 * Avoid increasing the PTL hold time by applying the optimization only
   to folios not on the deferred_split list, per Ying's comment, thanks!

 mm/vmscan.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0b888a2afa58..7106741387e8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1270,6 +1270,18 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 			if (folio_test_pmd_mappable(folio))
 				flags |= TTU_SPLIT_HUGE_PMD;
+			/*
+			 * Without TTU_SYNC, try_to_unmap will only begin to hold PTL
+			 * from the first present PTE within a large folio. Some initial
+			 * PTEs might be skipped due to races with parallel PTE writes
+			 * in which PTEs can be cleared temporarily before being written
+			 * new present values. This can leave a large folio still mapped
+			 * while some of its subpages have been unmapped after
+			 * try_to_unmap; TTU_SYNC helps try_to_unmap acquire PTL from the
+			 * first PTE, eliminating the influence of temporary PTE values.
+			 */
+			if (folio_test_large(folio) && list_empty(&folio->_deferred_list))
+				flags |= TTU_SYNC;
 
 			try_to_unmap(folio, flags);
 			if (folio_mapped(folio)) {
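
The partial unmap described in the commit message can be illustrated
with a small userspace model. This is only a sketch under simplifying
assumptions, not kernel code: model_reclaim_pass(), NR_PAGES, the
pte_state enum and both parameters are hypothetical names invented for
this example. It assumes that a PTE being rewritten by a concurrent
writer looks like pte_none to a lockless read, and that once the lock
is held every PTE read afterwards is stable.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 4	/* hypothetical folio size, in PTEs */

enum pte_state { PTE_PRESENT, PTE_SWAP };

/*
 * Toy model, not kernel code. Runs one reclaim pass over a folio mapped
 * by NR_PAGES PTEs. in_flux_idx names a PTE that a concurrent writer has
 * cleared and is about to rewrite as present (-1: no such PTE).
 * lock_from_first models taking the lock before looking at PTE0.
 * Returns how many PTEs are still present after the pass.
 */
static size_t model_reclaim_pass(int in_flux_idx, bool lock_from_first)
{
	enum pte_state pte[NR_PAGES];
	bool locked = lock_from_first;
	size_t still_present = 0;
	int i;

	for (i = 0; i < NR_PAGES; i++)
		pte[i] = PTE_PRESENT;

	for (i = 0; i < NR_PAGES; i++) {
		if (!locked) {
			/* Lockless peek: a PTE in flux looks like pte_none. */
			if (i == in_flux_idx)
				continue;
			locked = true;	/* first present PTE: take the lock */
		}
		/* Under the lock the writer has finished; the PTE is stable. */
		pte[i] = PTE_SWAP;
	}

	for (i = 0; i < NR_PAGES; i++)
		if (pte[i] == PTE_PRESENT)
			still_present++;
	return still_present;
}
```

In this model, model_reclaim_pass(0, false) leaves PTE0 present while
PTE1 ~ PTE3 become swap entries, which is the partially unmapped state
the patch wants to avoid. model_reclaim_pass(0, true) leaves nothing
mapped. A PTE in flux after the first present one, for example
model_reclaim_pass(2, false), is harmless because the lock is already
held by the time it is read.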