From patchwork Wed Jul 12 04:30:40 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13309558
Date: Tue, 11 Jul 2023 21:30:40 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
 David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
 Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
 Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Lorenzo Stoakes,
 Huang Ying, Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe,
 Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
 Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
 "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
 Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
 Gerald Schaefer, Vasily Gorbik, Jann Horn, Vishal Moola, Vlastimil Babka,
 Zi Yan, linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 01/13] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Before putting them to use (several commits later), add rcu_read_lock()
to pte_offset_map(), and rcu_read_unlock() to pte_unmap().  Make this a
separate commit, since it risks exposing imbalances: prior commits have
fixed all the known imbalances, but we may find some have been missed.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pgtable.h | 4 ++--
 mm/pgtable-generic.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5063b482e34f..5134edcec668 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -99,7 +99,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
 #define pte_unmap(pte)	do {	\
 	kunmap_local((pte));	\
-	/* rcu_read_unlock() to be added later */	\
+	rcu_read_unlock();	\
 } while (0)
 #else
 static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
@@ -108,7 +108,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 }

 static inline void pte_unmap(pte_t *pte)
 {
-	/* rcu_read_unlock() to be added later */
+	rcu_read_unlock();
 }
 #endif
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4d454953046f..400e5a045848 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -236,7 +236,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;

-	/* rcu_read_lock() to be added later */
+	rcu_read_lock();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +250,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	/* rcu_read_unlock() to be added later */
+	rcu_read_unlock();
 	return NULL;
 }

From patchwork Wed Jul 12 04:32:05 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13309559
Date: Tue, 11 Jul 2023 21:32:05 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Subject: [PATCH v3 02/13] mm/pgtable: add PAE safety to __pte_offset_map()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Message-ID: <3adcd8f-9191-2df1-d7ea-c4877698aad@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
There is a faint risk that __pte_offset_map(), on a 32-bit architecture
with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=y, would succeed on
a pmdval assembled from a pmd_low and a pmd_high which never belonged
together: their combination not pointing to a page table at all, perhaps
not even a valid pfn.
pmdp_get_lockless() is not enough to prevent that.  Guard against it (on
such configs) by using local_irq_save() to block the TLB flush between
present updates, as linux/pgtable.h suggests.  It is only needed around
the pmdp_get_lockless() in __pte_offset_map(): a race when
__pte_offset_map_lock() repeats the pmdp_get_lockless() after getting
the lock would just send it back to __pte_offset_map() again.

Complement pmdp_get_lockless_start() and pmdp_get_lockless_end(), used
only locally in __pte_offset_map(), with a pmdp_get_lockless_sync()
synonym for tlb_remove_table_sync_one(): to send the necessary interrupt
at the right moment on those configs which do not already send it.

CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and
x86.  It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is
that Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for
arm.  It is not enabled by arc, but its pmd_t is 32-bit even when its
pte_t is 64-bit.

Could the IRQ disablement be limited to CONFIG_HIGHPTE?  Perhaps, but
that would need a little more work, to retry if pmd_low is good for a
page table but pmd_high is non-zero from THP (and that might be making
x86-specific assumptions).
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pgtable.h |  4 ++++
 mm/pgtable-generic.c    | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5134edcec668..7f2db400f653 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -390,6 +390,7 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 	return pmd;
 }
 #define pmdp_get_lockless pmdp_get_lockless
+#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
 #endif /* CONFIG_PGTABLE_LEVELS > 2 */
 #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */

@@ -408,6 +409,9 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 {
 	return pmdp_get(pmdp);
 }
+static inline void pmdp_get_lockless_sync(void)
+{
+}
 #endif

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 400e5a045848..b9a0c2137cc1 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,12 +232,41 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+#if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \
+	(defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU))
+/*
+ * See the comment above ptep_get_lockless() in include/linux/pgtable.h:
+ * the barriers in pmdp_get_lockless() cannot guarantee that the value in
+ * pmd_high actually belongs with the value in pmd_low; but holding interrupts
+ * off blocks the TLB flush between present updates, which guarantees that a
+ * successful __pte_offset_map() points to a page from matched halves.
+ */
+static unsigned long pmdp_get_lockless_start(void)
+{
+	unsigned long irqflags;
+
+	local_irq_save(irqflags);
+	return irqflags;
+}
+static void pmdp_get_lockless_end(unsigned long irqflags)
+{
+	local_irq_restore(irqflags);
+}
+#else
+static unsigned long pmdp_get_lockless_start(void) { return 0; }
+static void pmdp_get_lockless_end(unsigned long irqflags) { }
+#endif
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
+	unsigned long irqflags;
 	pmd_t pmdval;

 	rcu_read_lock();
+	irqflags = pmdp_get_lockless_start();
 	pmdval = pmdp_get_lockless(pmd);
+	pmdp_get_lockless_end(irqflags);
+
 	if (pmdvalp)
 		*pmdvalp = pmdval;
 	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))

From patchwork Wed Jul 12 04:33:08 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13309560
Date: Tue, 11 Jul 2023 21:33:08 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Subject: [PATCH v3 03/13] arm: adjust_pte() use pte_offset_map_nolock()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Message-ID: <4d5258bd-ffa0-018-253a-25f2c9b783f7@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
in adjust_pte(): because it gives the not-locked ptl for precisely that
pte, which the caller can then safely lock; whereas pte_lockptr() is not
so tightly coupled, because it dereferences the pmd pointer again.
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/arm/mm/fault-armv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index ca5302b0b7ee..7cb125497976 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -117,11 +117,10 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map(pmd, address);
+	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
 	if (!pte)
 		return 0;

-	ptl = pte_lockptr(vma->vm_mm, pmd);
 	do_pte_lock(ptl);
 	ret = do_adjust_pte(vma, address, pfn, pte);

From patchwork Wed Jul 12 04:34:25 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13309561
Date: Tue, 11 Jul 2023 21:34:25 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Lorenzo Stoakes, Huang Ying, Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King, "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens, Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev, Gerald Schaefer, Vasily Gorbik, Jann Horn, Vishal Moola, Vlastimil Babka, Zi Yan, linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in assert_pte_locked().
BUG if pte_offset_map_nolock() fails: this is stricter than the previous implementation, which skipped when pmd_none() (with a comment on khugepaged collapse transitions): but wouldn't we want to know, if an assert_pte_locked() caller can be racing such transitions?

This mod might cause new crashes: which either expose my ignorance, or indicate issues to be fixed, or limit the usage of assert_pte_locked().

Signed-off-by: Hugh Dickins
---
 arch/powerpc/mm/pgtable.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..16b061af86d7 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;

 	if (mm == &init_mm)
 		return;
@@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	pud = pud_offset(p4d, addr);
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, addr);
-	/*
-	 * khugepaged to collapse normal pages to hugepage, first set
-	 * pmd to none to force page fault/gup to take mmap_lock. After
-	 * pmd is set to none, we do a pte_clear which does this assertion
-	 * so if we find pmd none, return.
-	 */
-	if (pmd_none(*pmd))
-		return;
-	BUG_ON(!pmd_present(*pmd));
-	assert_spin_locked(pte_lockptr(mm, pmd));
+	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+	BUG_ON(!pte);
+	assert_spin_locked(ptl);
+	pte_unmap(pte);
 }
 #endif /* CONFIG_DEBUG_VM */

From patchwork Wed Jul 12 04:35:59 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309562
Date: Tue, 11 Jul 2023 21:35:59 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v3 05/13] powerpc: add pte_free_defer() for pgtables sharing page
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Message-ID: <6e3ca5f1-334d-4b14-b92d-fc8e99914fcb@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add powerpc-specific pte_free_defer(), to free table page via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t.

This is awkward because the struct page contains only one rcu_head, but that page may be shared between PTE_FRAG_NR pagetables, each wanting to use the rcu_head at the same time. But powerpc never reuses a fragment once it has been freed: so mark the page Active in pte_free_defer(), before calling pte_fragment_free() directly; and there call_rcu() to pte_free_now() when the last fragment is freed and the page is PageActive.
Suggested-by: Jason Gunthorpe
Signed-off-by: Hugh Dickins
---
 arch/powerpc/include/asm/pgalloc.h |  4 ++++
 arch/powerpc/mm/pgtable-frag.c     | 29 ++++++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 3360cad78ace..3a971e2a8c73 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -45,6 +45,10 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 	pte_fragment_free((unsigned long *)ptepage, 0);
 }

+/* arch use pte_free_defer() implementation in arch/powerpc/mm/pgtable-frag.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /*
  * Functions that deal with pagetables that could be at any level of
  * the table need to be passed an "index_size" so they know how to
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e..0c6b68130025 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -106,6 +106,15 @@ pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel)
 	return __alloc_for_ptecache(mm, kernel);
 }

+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pgtable_pte_page_dtor(page);
+	__free_page(page);
+}
+
 void pte_fragment_free(unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
@@ -115,8 +124,22 @@ void pte_fragment_free(unsigned long *table, int kernel)
 	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
 	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
-		if (!kernel)
-			pgtable_pte_page_dtor(page);
-		__free_page(page);
+		if (kernel)
+			__free_page(page);
+		else if (TestClearPageActive(page))
+			call_rcu(&page->rcu_head, pte_free_now);
+		else
+			pte_free_now(&page->rcu_head);
 	}
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	SetPageActive(page);
+	pte_fragment_free((unsigned long *)pgtable, 0);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

From patchwork Wed Jul 12 04:37:24 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309563
Date: Tue, 11 Jul 2023 21:37:24 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v3 06/13] sparc: add pte_free_defer() for pte_t *pgtable_t
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add sparc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t.

sparc32 supports pagetables sharing a page, but does not support THP; sparc64 supports THP, but does not support pagetables sharing a page. So the sparc-specific pte_free_defer() is as simple as the generic one, except for converting between pte_t *pgtable_t and struct page *.
Signed-off-by: Hugh Dickins
---
 arch/sparc/include/asm/pgalloc_64.h |  4 ++++
 arch/sparc/mm/init_64.c             | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 7b5561d17ab1..caa7632be4c2 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -65,6 +65,10 @@ pgtable_t pte_alloc_one(struct mm_struct *mm);
 void pte_free_kernel(struct mm_struct *mm, pte_t *pte);
 void pte_free(struct mm_struct *mm, pgtable_t ptepage);

+/* arch use pte_free_defer() implementation in arch/sparc/mm/init_64.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 #define pmd_populate_kernel(MM, PMD, PTE)	pmd_set(MM, PMD, PTE)
 #define pmd_populate(MM, PMD, PTE)		pmd_set(MM, PMD, PTE)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..0d7fd793924c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2930,6 +2930,22 @@ void pgtable_free(void *table, bool is_page)
 }

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	__pte_free((pgtable_t)page_address(page));
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+
 void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 			  pmd_t *pmd)
 {

From patchwork Wed Jul 12 04:38:35 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309564
Date: Tue, 11 Jul 2023 21:38:35 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 07/13] s390: add pte_free_defer() for pgtables sharing page
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Message-ID: <94eccf5f-264c-8abe-4567-e77f4b4e14a@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add s390-specific pte_free_defer(), to free table page via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This precedes
the generic version to avoid build breakage from incompatible pgtable_t.

This version is more complicated than others: because s390 fits two 2K
page tables into one 4K page (so page->rcu_head must be shared between
both halves), and already uses page->lru (which page->rcu_head overlays)
to list any free halves, with clever management by page->_refcount bits.
Build upon the existing management, adjusted to follow a new rule: that
a page is never on the free list if pte_free_defer() was used on either
half (marked by PageActive).  And for simplicity, delay calling RCU until
both halves are freed.

Not adding back unallocated fragments to the list in pte_free_defer()
can result in wasting some amount of memory for pagetables, depending on
how long the allocated fragment will stay in use.  In practice, this
effect is expected to be insignificant, and not to justify a far more
complex approach, which might allow adding the fragments back later in
__tlb_remove_table(), where we might not have a stable mm any more.
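The two-halves bookkeeping described above can be pictured with a small userspace sketch, which is an illustration only and not kernel code: the real s390 implementation uses atomic_xor_bits() on page->_refcount, while here the "AA" allocation bits live in a plain unsigned int, and all function names are hypothetical.

```c
#include <assert.h>

/*
 * Userspace sketch of the fragment bookkeeping described above: a 4K
 * page holds two 2K page-table fragments, tracked by two "AA" bits in
 * the upper byte of a refcount-like word.  Illustrative only.
 */
#define AA_SHIFT 24

static unsigned int xor_bits(unsigned int *v, unsigned int bits)
{
	*v ^= bits;	/* stands in for the kernel's atomic_xor_bits() */
	return *v;
}

/* Mark half 0 or half 1 of the page as allocated. */
static void alloc_half(unsigned int *refcount, int half)
{
	xor_bits(refcount, 0x01U << (half + AA_SHIFT));
}

/* Free one half; nonzero return means the other half is still in use. */
static unsigned int free_half(unsigned int *refcount, int half)
{
	return (xor_bits(refcount, 0x01U << (half + AA_SHIFT)) >> AA_SHIFT) & 0x03U;
}
```

Only once free_half() returns 0 for the second fragment would the real code hand the whole page to call_rcu(), matching the rule above of delaying RCU until both halves are freed.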
Signed-off-by: Hugh Dickins
Reviewed-by: Gerald Schaefer
Tested-by: Alexander Gordeev
Acked-by: Alexander Gordeev
---
 arch/s390/include/asm/pgalloc.h |  4 ++
 arch/s390/mm/pgalloc.c          | 80 +++++++++++++++++++++++++++++------
 2 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..89a9d5ef94f8 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -143,6 +143,10 @@ static inline void pmd_populate(struct mm_struct *mm,
 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
 #define pte_free(mm, pte) page_table_free(mm, (unsigned long *) pte)
 
+/* arch use pte_free_defer() implementation in arch/s390/mm/pgalloc.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 void vmem_map_init(void);
 void *vmem_crst_alloc(unsigned long val);
 pte_t *vmem_pte_alloc(void);

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..760b4ace475e 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -229,6 +229,15 @@ void page_table_free_pgste(struct page *page)
  * logic described above. Both AA bits are set to 1 to denote a 4KB-pgtable
  * while the PP bits are never used, nor such a page is added to or removed
  * from mm_context_t::pgtable_list.
+ *
+ * pte_free_defer() overrides those rules: it takes the page off pgtable_list,
+ * and prevents both 2K fragments from being reused. pte_free_defer() has to
+ * guarantee that its pgtable cannot be reused before the RCU grace period
+ * has elapsed (which page_table_free_rcu() does not actually guarantee).
+ * But for simplicity, because page->rcu_head overlays page->lru, and because
+ * the RCU callback might not be called before the mm_context_t has been freed,
+ * pte_free_defer() in this implementation prevents both fragments from being
+ * reused, and delays making the call to RCU until both fragments are freed.
  */
 unsigned long *page_table_alloc(struct mm_struct *mm)
 {
@@ -261,7 +270,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 			table += PTRS_PER_PTE;
 			atomic_xor_bits(&page->_refcount,
 					0x01U << (bit + 24));
-			list_del(&page->lru);
+			list_del_init(&page->lru);
 		}
 	}
 	spin_unlock_bh(&mm->context.lock);
@@ -281,6 +290,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	table = (unsigned long *) page_to_virt(page);
 	if (mm_alloc_pgste(mm)) {
 		/* Return 4K page table with PGSTEs */
+		INIT_LIST_HEAD(&page->lru);
 		atomic_xor_bits(&page->_refcount, 0x03U << 24);
 		memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
 		memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
@@ -300,7 +310,9 @@ static void page_table_release_check(struct page *page, void *table,
 	char msg[128];
 
-	if (!IS_ENABLED(CONFIG_DEBUG_VM) || !mask)
+	if (!IS_ENABLED(CONFIG_DEBUG_VM))
+		return;
+	if (!mask && list_empty(&page->lru))
 		return;
 	snprintf(msg, sizeof(msg),
 		 "Invalid pgtable %p release half 0x%02x mask 0x%02x",
@@ -308,6 +320,15 @@
 	dump_page(page, msg);
 }
 
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pgtable_pte_page_dtor(page);
+	__free_page(page);
+}
+
 void page_table_free(struct mm_struct *mm, unsigned long *table)
 {
 	unsigned int mask, bit, half;
@@ -325,10 +346,17 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	 */
 	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
 	mask >>= 24;
-	if (mask & 0x03U)
+	if ((mask & 0x03U) && !PageActive(page)) {
+		/*
+		 * Other half is allocated, and neither half has had
+		 * its free deferred: add page to head of list, to make
+		 * this freed half available for immediate reuse.
+		 */
 		list_add(&page->lru, &mm->context.pgtable_list);
-	else
-		list_del(&page->lru);
+	} else {
+		/* If page is on list, now remove it. */
+		list_del_init(&page->lru);
+	}
 	spin_unlock_bh(&mm->context.lock);
 	mask = atomic_xor_bits(&page->_refcount, 0x10U << (bit + 24));
 	mask >>= 24;
@@ -342,8 +370,10 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	}
 
 	page_table_release_check(page, table, half, mask);
-	pgtable_pte_page_dtor(page);
-	__free_page(page);
+	if (TestClearPageActive(page))
+		call_rcu(&page->rcu_head, pte_free_now);
+	else
+		pte_free_now(&page->rcu_head);
 }
 
 void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
@@ -370,10 +400,18 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 	 */
 	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
 	mask >>= 24;
-	if (mask & 0x03U)
+	if ((mask & 0x03U) && !PageActive(page)) {
+		/*
+		 * Other half is allocated, and neither half has had
+		 * its free deferred: add page to end of list, to make
+		 * this freed half available for reuse once its pending
+		 * bit has been cleared by __tlb_remove_table().
+		 */
 		list_add_tail(&page->lru, &mm->context.pgtable_list);
-	else
-		list_del(&page->lru);
+	} else {
+		/* If page is on list, now remove it. */
+		list_del_init(&page->lru);
+	}
 	spin_unlock_bh(&mm->context.lock);
 	table = (unsigned long *) ((unsigned long) table | (0x01U << bit));
 	tlb_remove_table(tlb, table);
@@ -403,10 +441,28 @@ void __tlb_remove_table(void *_table)
 	}
 
 	page_table_release_check(page, table, half, mask);
-	pgtable_pte_page_dtor(page);
-	__free_page(page);
+	if (TestClearPageActive(page))
+		call_rcu(&page->rcu_head, pte_free_now);
+	else
+		pte_free_now(&page->rcu_head);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	SetPageActive(page);
+	page_table_free(mm, (unsigned long *)pgtable);
+	/*
+	 * page_table_free() does not do the pgste gmap_unlink() which
+	 * page_table_free_rcu() does: warn us if pgste ever reaches here.
+	 */
+	WARN_ON_ONCE(mm_alloc_pgste(mm));
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 /*
  * Base infrastructure required to generate basic asces, region, segment,
  * and page tables that do not make use of enhanced features like EDAT1.
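Each pte_free_now() in this series recovers the enclosing page from a pointer to its embedded rcu_head via container_of(). A minimal userspace sketch of that pattern follows; "struct fake_page" and "last_freed" are illustrative stand-ins, not the kernel's struct page or its freeing path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(), as provided by the kernel headers. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rcu_head {
	void (*func)(struct rcu_head *head);
};

/* Stand-in for struct page: an rcu_head embedded partway into the struct. */
struct fake_page {
	unsigned long flags;
	struct rcu_head rcu_head;	/* overlays lru in the real struct page */
};

static struct fake_page *last_freed;

/* RCU-callback shape: recover the enclosing page from the rcu_head pointer. */
static void fake_pte_free_now(struct rcu_head *head)
{
	last_freed = container_of(head, struct fake_page, rcu_head);
}
```

The subtraction of offsetof() is what lets call_rcu() hand the callback only a struct rcu_head * while the callback still frees the whole page.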
From patchwork Wed Jul 12 04:39:48 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309565
Date: Tue, 11 Jul 2023 21:39:48 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 08/13] mm/pgtable: add pte_free_defer() for pgtable as page
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Message-ID: <78e921b0-b681-a1b0-dc20-44c9efa4ef3c@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add the generic pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This version
suits all those architectures which use an unfragmented page for one page
table (none of whose pte_free()s use the mm arg which was passed to it).

Signed-off-by: Hugh Dickins
---
 include/linux/mm_types.h |  4 ++++
 include/linux/pgtable.h  |  2 ++
 mm/pgtable-generic.c     | 20 ++++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de10fc797c8e..17a7868f00bd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -144,6 +144,10 @@ struct page {
 		struct {	/* Page table pages */
 			unsigned long _pt_pad_1;	/* compound_head */
 			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+			/*
+			 * A PTE page table page might be freed by use of
+			 * rcu_head: which overlays those two fields above.
+			 */
 			unsigned long _pt_pad_2;	/* mapping */
 			union {
 				struct mm_struct *pt_mm; /* x86 pgds only */

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 7f2db400f653..9fa34be65159 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
 }
 #endif
 
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index b9a0c2137cc1..fa9d4d084291 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /*
@@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	return pmd;
 }
 #endif
+
+/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
+#ifndef pte_free_defer
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = pgtable;
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* pte_free_defer */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \

From patchwork Wed Jul 12 04:41:04 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309567
Date: Tue, 11 Jul 2023 21:41:04 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 09/13] mm/khugepaged: retract_page_tables() without mmap or vma lock
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Simplify shmem and file THP collapse's retract_page_tables(), and relax
its locking: to improve its success rate and to lessen impact on others.

Instead of its MADV_COLLAPSE case doing set_huge_pmd() at target_addr of
target_mm, leave that part of the work to madvise_collapse() calling
collapse_pte_mapped_thp() afterwards: just adjust collapse_file()'s
result code to arrange for that.  That spares retract_page_tables() four
arguments; and since it will be successful in retracting all of the page
tables expected of it, no need to track and return a result code itself.

It needs i_mmap_lock_read(mapping) for traversing the vma interval tree,
but it does not need i_mmap_lock_write() for that: page_vma_mapped_walk()
allows for pte_offset_map_lock() etc to fail, and uses pmd_lock() for
THPs.  retract_page_tables() just needs to use those same spinlocks to
exclude it briefly, while transitioning pmd from page table to none: so
restore its use of pmd_lock() inside of which pte lock is nested.

Users of pte_offset_map_lock() etc all now allow for them to fail: so
retract_page_tables() now has no use for mmap_write_trylock() or
vma_try_start_write().  In common with rmap and page_vma_mapped_walk(),
it does not even need the mmap_read_lock().
But those users do expect the page table to remain a good page table, until they unlock and rcu_read_unlock(): so the page table cannot be freed immediately, but rather by the recently added pte_free_defer().

Use the (usually a no-op) pmdp_get_lockless_sync() to send an interrupt when PAE, and pmdp_collapse_flush() did not already do so: to make sure that the start,pmdp_get_lockless(),end sequence in __pte_offset_map() cannot pick up a pmd entry with mismatched pmd_low and pmd_high.

retract_page_tables() can be enhanced to replace_page_tables(), which inserts the final huge pmd without mmap lock: going through an invalid state instead of pmd_none() followed by fault. But that enhancement does raise some more questions: leave it until a later release.

Signed-off-by: Hugh Dickins
---
 mm/khugepaged.c | 184 ++++++++++++++++++++------------------------
 1 file changed, 75 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 78c8d5d8b628..3bb05147961b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1615,9 +1615,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		break;
 	case SCAN_PMD_NONE:
 		/*
-		 * In MADV_COLLAPSE path, possible race with khugepaged where
-		 * all pte entries have been removed and pmd cleared. If so,
-		 * skip all the pte checks and just update the pmd mapping.
+		 * All pte entries have been removed and pmd cleared.
+		 * Skip all the pte checks and just update the pmd mapping.
 		 */
 		goto maybe_install_pmd;
 	default:
@@ -1748,123 +1747,88 @@ static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_sl
 	mmap_write_unlock(mm);
 }
 
-static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
-			       struct mm_struct *target_mm,
-			       unsigned long target_addr, struct page *hpage,
-			       struct collapse_control *cc)
+static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
-	int target_result = SCAN_FAIL;
 
-	i_mmap_lock_write(mapping);
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
-		int result = SCAN_FAIL;
-		struct mm_struct *mm = NULL;
-		unsigned long addr = 0;
-		pmd_t *pmd;
-		bool is_target = false;
+		struct mmu_notifier_range range;
+		struct mm_struct *mm;
+		unsigned long addr;
+		pmd_t *pmd, pgt_pmd;
+		spinlock_t *pml;
+		spinlock_t *ptl;
+		bool skipped_uffd = false;
 
 		/*
 		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
-		 * got written to. These VMAs are likely not worth investing
-		 * mmap_write_lock(mm) as PMD-mapping is likely to be split
-		 * later.
-		 *
-		 * Note that vma->anon_vma check is racy: it can be set up after
-		 * the check but before we took mmap_lock by the fault path.
-		 * But page lock would prevent establishing any new ptes of the
-		 * page, so we are safe.
-		 *
-		 * An alternative would be drop the check, but check that page
-		 * table is clear before calling pmdp_collapse_flush() under
-		 * ptl. It has higher chance to recover THP for the VMA, but
-		 * has higher cost too. It would also probably require locking
-		 * the anon_vma.
+		 * got written to. These VMAs are likely not worth removing
+		 * page tables from, as PMD-mapping is likely to be split later.
 		 */
-		if (READ_ONCE(vma->anon_vma)) {
-			result = SCAN_PAGE_ANON;
-			goto next;
-		}
+		if (READ_ONCE(vma->anon_vma))
+			continue;
+
 		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		if (addr & ~HPAGE_PMD_MASK ||
-		    vma->vm_end < addr + HPAGE_PMD_SIZE) {
-			result = SCAN_VMA_CHECK;
-			goto next;
-		}
-		mm = vma->vm_mm;
-		is_target = mm == target_mm && addr == target_addr;
-		result = find_pmd_or_thp_or_none(mm, addr, &pmd);
-		if (result != SCAN_SUCCEED)
-			goto next;
-		/*
-		 * We need exclusive mmap_lock to retract page table.
-		 *
-		 * We use trylock due to lock inversion: we need to acquire
-		 * mmap_lock while holding page lock. Fault path does it in
-		 * reverse order. Trylock is a way to avoid deadlock.
-		 *
-		 * Also, it's not MADV_COLLAPSE's job to collapse other
-		 * mappings - let khugepaged take care of them later.
-		 */
-		result = SCAN_PTE_MAPPED_HUGEPAGE;
-		if ((cc->is_khugepaged || is_target) &&
-		    mmap_write_trylock(mm)) {
-			/* trylock for the same lock inversion as above */
-			if (!vma_try_start_write(vma))
-				goto unlock_next;
-
-			/*
-			 * Re-check whether we have an ->anon_vma, because
-			 * collapse_and_free_pmd() requires that either no
-			 * ->anon_vma exists or the anon_vma is locked.
-			 * We already checked ->anon_vma above, but that check
-			 * is racy because ->anon_vma can be populated under the
-			 * mmap lock in read mode.
-			 */
-			if (vma->anon_vma) {
-				result = SCAN_PAGE_ANON;
-				goto unlock_next;
-			}
-			/*
-			 * When a vma is registered with uffd-wp, we can't
-			 * recycle the pmd pgtable because there can be pte
-			 * markers installed. Skip it only, so the rest mm/vma
-			 * can still have the same file mapped hugely, however
-			 * it'll always mapped in small page size for uffd-wp
-			 * registered ranges.
-			 */
-			if (hpage_collapse_test_exit(mm)) {
-				result = SCAN_ANY_PROCESS;
-				goto unlock_next;
-			}
-			if (userfaultfd_wp(vma)) {
-				result = SCAN_PTE_UFFD_WP;
-				goto unlock_next;
-			}
-			collapse_and_free_pmd(mm, vma, addr, pmd);
-			if (!cc->is_khugepaged && is_target)
-				result = set_huge_pmd(vma, addr, pmd, hpage);
-			else
-				result = SCAN_SUCCEED;
-
-unlock_next:
-			mmap_write_unlock(mm);
-			goto next;
-		}
-		/*
-		 * Calling context will handle target mm/addr. Otherwise, let
-		 * khugepaged try again later.
-		 */
-		if (!is_target) {
-			khugepaged_add_pte_mapped_thp(mm, addr);
+		    vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
+
+		mm = vma->vm_mm;
+		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
+			continue;
+
+		if (hpage_collapse_test_exit(mm))
+			continue;
+		/*
+		 * When a vma is registered with uffd-wp, we cannot recycle
+		 * the page table because there may be pte markers installed.
+		 * Other vmas can still have the same file mapped hugely, but
+		 * skip this one: it will always be mapped in small page size
+		 * for uffd-wp registered ranges.
+		 */
+		if (userfaultfd_wp(vma))
+			continue;
+
+		/* PTEs were notified when unmapped; but now for the PMD? */
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+					addr, addr + HPAGE_PMD_SIZE);
+		mmu_notifier_invalidate_range_start(&range);
+
+		pml = pmd_lock(mm, pmd);
+		ptl = pte_lockptr(mm, pmd);
+		if (ptl != pml)
+			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * Huge page lock is still held, so normally the page table
+		 * must remain empty; and we have already skipped anon_vma
+		 * and userfaultfd_wp() vmas. But since the mmap_lock is not
+		 * held, it is still possible for a racing userfaultfd_ioctl()
+		 * to have inserted ptes or markers. Now that we hold ptlock,
+		 * repeating the anon_vma check protects from one category,
+		 * and repeating the userfaultfd_wp() check from another.
+		 */
+		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
+			skipped_uffd = true;
+		} else {
+			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
+			pmdp_get_lockless_sync();
+		}
+
+		if (ptl != pml)
+			spin_unlock(ptl);
+		spin_unlock(pml);
+
+		mmu_notifier_invalidate_range_end(&range);
+
+		if (!skipped_uffd) {
+			mm_dec_nr_ptes(mm);
+			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
+			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
 		}
-next:
-		if (is_target)
-			target_result = result;
 	}
-	i_mmap_unlock_write(mapping);
-	return target_result;
+	i_mmap_unlock_read(mapping);
 }
 
 /**
@@ -2259,9 +2223,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
+	 * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
 	 */
-	result = retract_page_tables(mapping, start, mm, addr, hpage,
-				     cc);
+	retract_page_tables(mapping, start);
+	if (cc && !cc->is_khugepaged)
+		result = SCAN_PTE_MAPPED_HUGEPAGE;
 	unlock_page(hpage);
 
 	/*

From patchwork Wed Jul 12 04:42:19 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309568
Date: Tue, 11 Jul 2023 21:42:19 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
MIME-Version: 1.0
Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp(). It does need mmap_read_lock(), but it does not need mmap_write_lock(), nor vma_start_write() nor i_mmap lock nor anon_vma lock. All racing paths are relying on pte_offset_map_lock() and pmd_lock(), so use those.
Follow the pattern in retract_page_tables(); and using pte_free_defer() removes most of the need for tlb_remove_table_sync_one() here; but call pmdp_get_lockless_sync() to use it in the PAE case.

First check the VMA, in case page tables are being torn down: from JannH. Confirm the preliminary find_pmd_or_thp_or_none() once page lock has been acquired and the page looks suitable: from then on its state is stable.

However, collapse_pte_mapped_thp() was doing something others don't: freeing a page table still containing "valid" entries. i_mmap lock did stop a racing truncate from double-freeing those pages, but we prefer collapse_pte_mapped_thp() to clear the entries as usual. Their TLB flush can wait until the pmdp_collapse_flush() which follows, but the mmu_notifier_invalidate_range_start() has to be done earlier.

Do the "step 1" checking loop without mmu_notifier: it wouldn't be good for khugepaged to keep on repeatedly invalidating a range which is then found unsuitable e.g. contains COWs. "step 2", which does the clearing, must then be more careful (after dropping ptl to do mmu_notifier), with abort prepared to correct the accounting like "step 3". But with those entries now cleared, "step 4" (after dropping ptl to do pmd_lock) is kept safe by the huge page lock, which stops new PTEs from being faulted in.
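The check-first, clear-second structure of steps 1-3 can be sketched as a small self-contained C toy. This is not kernel code: the array of `long`s standing in for a page table, `toy_collapse`, and the refcount parameter are all invented for illustration, assuming only the same shape (a side-effect-free checking pass, so aborting before clearing undoes nothing; then a clearing pass that counts exactly what it removed, so the accounting can subtract exactly that).

```c
#include <assert.h>

#define NR_PTES 8

/* step 1 only checks that every present entry points at the expected
 * page (no side effects, so aborting here is cheap); step 2 clears
 * entries and counts them, so step 3 (and any later abort path) can
 * correct the shared refcount by exactly the number cleared. */
static int toy_collapse(long ptes[NR_PTES], long first_page, long *refcount)
{
    int i, nr_ptes = 0;

    /* step 1: check all mapped entries target the expected pages */
    for (i = 0; i < NR_PTES; i++) {
        if (ptes[i] == 0)
            continue;                   /* "pte_none": holes are fine */
        if (ptes[i] != first_page + i)
            return -1;                  /* unsuitable: abort, nothing to undo */
    }

    /* step 2: clear entries, counting how many were present */
    for (i = 0; i < NR_PTES; i++) {
        if (ptes[i] == 0)
            continue;
        ptes[i] = 0;
        nr_ptes++;
    }

    /* step 3: fix the counter by exactly the number cleared */
    *refcount -= nr_ptes;
    return nr_ptes;
}
```

The patch's real version is subtler because it must drop and re-take the pte lock between the two passes (for the mmu_notifier), so step 2 re-validates each entry; the toy keeps only the accounting discipline.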
Signed-off-by: Hugh Dickins
Reviewed-by: Qi Zheng
---
 mm/khugepaged.c | 172 ++++++++++++++++++++++----------------------
 1 file changed, 77 insertions(+), 95 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3bb05147961b..46986eb4eebb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1483,7 +1483,7 @@ static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 	return ret;
 }
 
-/* hpage must be locked, and mmap_lock must be held in write */
+/* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmdp, struct page *hpage)
 {
@@ -1495,7 +1495,7 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	};
 
 	VM_BUG_ON(!PageTransHuge(hpage));
-	mmap_assert_write_locked(vma->vm_mm);
+	mmap_assert_locked(vma->vm_mm);
 
 	if (do_set_pmd(&vmf, hpage))
 		return SCAN_FAIL;
@@ -1504,48 +1504,6 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	return SCAN_SUCCEED;
 }
 
-/*
- * A note about locking:
- * Trying to take the page table spinlocks would be useless here because those
- * are only used to synchronize:
- *
- * - modifying terminal entries (ones that point to a data page, not to another
- *   page table)
- * - installing *new* non-terminal entries
- *
- * Instead, we need roughly the same kind of protection as free_pgtables() or
- * mm_take_all_locks() (but only for a single VMA):
- * The mmap lock together with this VMA's rmap locks covers all paths towards
- * the page table entries we're messing with here, except for hardware page
- * table walks and lockless_pages_from_mm().
- */
-static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
-				  unsigned long addr, pmd_t *pmdp)
-{
-	pmd_t pmd;
-	struct mmu_notifier_range range;
-
-	mmap_assert_write_locked(mm);
-	if (vma->vm_file)
-		lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem);
-	/*
-	 * All anon_vmas attached to the VMA have the same root and are
-	 * therefore locked by the same lock.
-	 */
-	if (vma->anon_vma)
-		lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
-
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
-				addr + HPAGE_PMD_SIZE);
-	mmu_notifier_invalidate_range_start(&range);
-	pmd = pmdp_collapse_flush(vma, addr, pmdp);
-	tlb_remove_table_sync_one();
-	mmu_notifier_invalidate_range_end(&range);
-	mm_dec_nr_ptes(mm);
-	page_table_check_pte_clear_range(mm, addr, pmd);
-	pte_free(mm, pmd_pgtable(pmd));
-}
-
 /**
  * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at
  * address haddr.
@@ -1561,26 +1519,29 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v
 int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 			    bool install_pmd)
 {
+	struct mmu_notifier_range range;
+	bool notified = false;
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = vma_lookup(mm, haddr);
 	struct page *hpage;
 	pte_t *start_pte, *pte;
-	pmd_t *pmd;
-	spinlock_t *ptl;
-	int count = 0, result = SCAN_FAIL;
+	pmd_t *pmd, pgt_pmd;
+	spinlock_t *pml, *ptl;
+	int nr_ptes = 0, result = SCAN_FAIL;
 	int i;
 
-	mmap_assert_write_locked(mm);
+	mmap_assert_locked(mm);
+
+	/* First check VMA found, in case page tables are being torn down */
+	if (!vma || !vma->vm_file ||
+	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
+		return SCAN_VMA_CHECK;
 
 	/* Fast check before locking page if already PMD-mapped */
 	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
 	if (result == SCAN_PMD_MAPPED)
 		return result;
 
-	if (!vma || !vma->vm_file ||
-	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
-		return SCAN_VMA_CHECK;
-
 	/*
 	 * If we are here, we've succeeded in replacing all the native pages
 	 * in the page cache with a single hugepage. If a mm were to fault-in
@@ -1610,6 +1571,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
 	switch (result) {
 	case SCAN_SUCCEED:
 		break;
@@ -1623,27 +1585,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
-	/* Lock the vma before taking i_mmap and page table locks */
-	vma_start_write(vma);
-
-	/*
-	 * We need to lock the mapping so that from here on, only GUP-fast and
-	 * hardware page walks can access the parts of the page tables that
-	 * we're operating on.
-	 * See collapse_and_free_pmd().
-	 */
-	i_mmap_lock_write(vma->vm_file->f_mapping);
-
-	/*
-	 * This spinlock should be unnecessary: Nobody else should be accessing
-	 * the page tables under spinlock protection here, only
-	 * lockless_pages_from_mm() and the hardware page walker can access page
-	 * tables while all the high-level locks are held in write mode.
-	 */
 	result = SCAN_FAIL;
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
-	if (!start_pte)
-		goto drop_immap;
+	if (!start_pte)		/* mmap_lock + page lock should prevent this */
+		goto drop_hpage;
 
 	/* step 1: check all mapped PTEs are to the right huge page */
 	for (i = 0, addr = haddr, pte = start_pte;
@@ -1670,10 +1615,18 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		 */
 		if (hpage + i != page)
 			goto abort;
-		count++;
 	}
 
-	/* step 2: adjust rmap */
+	pte_unmap_unlock(start_pte, ptl);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+	notified = true;
+	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
+	if (!start_pte)		/* mmap_lock + page lock should prevent this */
+		goto abort;
+
+	/* step 2: clear page table and adjust rmap */
 	for (i = 0, addr = haddr, pte = start_pte;
 	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
 		struct page *page;
@@ -1681,47 +1634,76 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 
 		if (pte_none(ptent))
 			continue;
-		page = vm_normal_page(vma, addr, ptent);
-		if (WARN_ON_ONCE(page && is_zone_device_page(page)))
+		/*
+		 * We dropped ptl after the first scan, to do the mmu_notifier:
+		 * page lock stops more PTEs of the hpage being faulted in, but
+		 * does not stop write faults COWing anon copies from existing
+		 * PTEs; and does not stop those being swapped out or migrated.
+		 */
+		if (!pte_present(ptent)) {
+			result = SCAN_PTE_NON_PRESENT;
 			goto abort;
+		}
+		page = vm_normal_page(vma, addr, ptent);
+		if (hpage + i != page)
+			goto abort;
 
+		/*
+		 * Must clear entry, or a racing truncate may re-remove it.
+		 * TLB flush can be left until pmdp_collapse_flush() does it.
+		 * PTE dirty? Shmem page is already dirty; file is read-only.
+		 */
+		pte_clear(mm, addr, pte);
 		page_remove_rmap(page, vma, false);
+		nr_ptes++;
 	}
 
 	pte_unmap_unlock(start_pte, ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (count) {
-		page_ref_sub(hpage, count);
-		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
+	if (nr_ptes) {
+		page_ref_sub(hpage, nr_ptes);
+		add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
 	}
 
-	/* step 4: remove pte entries */
-	/* we make no change to anon, but protect concurrent anon page lookup */
-	if (vma->anon_vma)
-		anon_vma_lock_write(vma->anon_vma);
+	/* step 4: remove page table */
 
-	collapse_and_free_pmd(mm, vma, haddr, pmd);
+	/* Huge page lock is still held, so page table must remain empty */
+	pml = pmd_lock(mm, pmd);
+	if (ptl != pml)
+		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+	pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	pmdp_get_lockless_sync();
+	if (ptl != pml)
+		spin_unlock(ptl);
+	spin_unlock(pml);
 
-	if (vma->anon_vma)
-		anon_vma_unlock_write(vma->anon_vma);
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	mmu_notifier_invalidate_range_end(&range);
+
+	mm_dec_nr_ptes(mm);
+	page_table_check_pte_clear_range(mm, haddr, pgt_pmd);
+	pte_free_defer(mm, pmd_pgtable(pgt_pmd));
 
 maybe_install_pmd:
 	/* step 5: install pmd entry */
 	result = install_pmd ? set_huge_pmd(vma, haddr, pmd, hpage)
 			     : SCAN_SUCCEED;
-
+	goto drop_hpage;
+abort:
+	if (nr_ptes) {
+		flush_tlb_mm(mm);
+		page_ref_sub(hpage, nr_ptes);
+		add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
+	}
+	if (start_pte)
+		pte_unmap_unlock(start_pte, ptl);
+	if (notified)
+		mmu_notifier_invalidate_range_end(&range);
 drop_hpage:
 	unlock_page(hpage);
 	put_page(hpage);
 	return result;
-
-abort:
-	pte_unmap_unlock(start_pte, ptl);
-drop_immap:
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
-	goto drop_hpage;
 }
 
 static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
@@ -2855,9 +2837,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	case SCAN_PTE_MAPPED_HUGEPAGE:
 		BUG_ON(mmap_locked);
 		BUG_ON(*prev);
-		mmap_write_lock(mm);
+		mmap_read_lock(mm);
 		result = collapse_pte_mapped_thp(mm, addr, true);
-		mmap_write_unlock(mm);
+		mmap_locked = true;
 		goto handle_result;
 	/* Whitelisted set of results where continuing OK */
 	case SCAN_PMD_NULL:

From patchwork Wed Jul 12 04:43:36 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13309569
Date: Tue, 11 Jul 2023 21:43:36 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v3 11/13] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
MIME-Version: 1.0
U2FsdGVkX1+UljHJhxdF4nH6JdN0ny1dZ1kPUPr4ZMLOaxknWZfPty9iiXV4KXviSI4Kdsl+ZjxkMpOokr0U8GcizvPg2+1kXNXHv/NSNWbvklVFIxVraTCjZmxbXaZgAm9EN9D6EPvvO59/yGy99r3lSPlOtvtoRDT6mFBfw9MoucY1/GJ/qfiFQHo8n8J8nM1DqyiZJqBaiUS0SVBbuctsAMu2gjq5K++yDmui/3UXIw4EZEw1sTK76uCO9XgYSTJKeKZ6OMgp+diqrBUi2GUeHKBIod7CHQ902jrbB9xmBsKcYY9HfqXAPi5XcFLbqyXMdogZRnQzffaWaOqErEjy+Qe+OKHQHRUwRhQygt/31DkngBSyMMsAIlx6wjrlkRRuT+4R6EoBE1sZhdQdQcJ15J6EsV9bWqPVeEIljZGWJKJRmHSjJg0+Qb1Xu0dVQblFtYT8nkmHPdEuEJ5P5/8oNuGNOPLAQUMysbubtHkWwb4S7u/r3ivKBAbpSV/LwAq56ZJftRCDFG/R66IIURhTxRfwZj+IdOkun0jZk0W9Qjz1ZZiWn/i5NZLxjfhNQ/+RNO27eStDfyeegJEU8qfGacWnFUCq9Qgvkkd6zhaQMEz7b+wpbiSKLL05vwhQP+vqJs0eioV7DIzkVt5Bg4haaoGm1aGdE1P4KT5pEqwdlOtZGKpiMJA/ht6Rg+Al3mxwKOl27niWH1HZigZkYNZ25mYSdcvaIuXBwvI140xGTzCSO9+Yi2bgMX6Buf4lIfAtSEvX29dICstNNJ2opYA9pECNCuV3RkXJcRY1ddbJF1PgV7tJA48GviqmAlNbAJymqNqauaH3WsXnryIz/AZsb9ld3cz2z10Fyu4uGrzRXCB21XFYauJKEYrgJTaXOBvNNzC759E3z6fa1y8jJ4ccWWldcp2IsINELCb7MlZ8kWIsyiVKdWq9n73CNNH/K652hPfw+esPhtZCYO5 h3wL2XaU crEbCDEFxOiUBMKN14759OnE1C8qaeyZmfRMimgqHItsIUso5pNxQmoV243XZV0IeAxqbp8y9gI1LYJNikLRbU9MFhi75TSqw8agwY/I3vmlFY3Ve46dVvh5+MCE25JuN2QlLCKdq+9nb4cwa7LWVwuRl/yuQm4YrG1y/n2LtH5ppfBWas3Z4US1u8V8u3obDmHvjYmbarT0H++rxxNYaFuSaRs/67QYG8/e5xf8OLm4XggNjO/RyJ4j1WLNxSPPPTE1J0YsJpFpr3VtFt5gudZRpN3enIGoMyVh2jzST3ijytlU7wT1lfHl2aG34DHFFglEke60pPAb+T5kPb3QR+sHkYv7X3s4qTBpMlad5rlsKS8ZjIxKqGpPmxzQJa4+tPvLjWw5IsfVmYnuzMONdQAlg6jbneg9Y60mdhsnwpDDEViSX5y0DaUbexYm3i7gFXb/K0S49GvdhVwvSK3bB1fw5GD7ZHYUC+CdFbc0add7Z9gpDtfi0wXMyiorDBAXtc2hH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that retract_page_tables() can retract page tables reliably, without depending on trylocks, delete all the apparatus for khugepaged to try again later: khugepaged_collapse_pte_mapped_thps() etc; and free up the per-mm memory which was set aside for that in the khugepaged_mm_slot. 
But one part of that is worth keeping: when hpage_collapse_scan_file()
found SCAN_PTE_MAPPED_HUGEPAGE, that address was noted in the mm_slot to
be tried for retraction later - catching, for example, page tables where
a reversible mprotect() of a portion had required splitting the pmd, but
now it can be recollapsed.  Call collapse_pte_mapped_thp() directly in
this case (why was it deferred before?  I assume an issue with needing
mmap_lock for write, but now it's only needed for read).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/khugepaged.c | 125 +++++++------------------------------------------
 1 file changed, 16 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 46986eb4eebb..7c7aaddbe130 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,8 +92,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __read_mostly;
 
-#define MAX_PTE_MAPPED_THP 8
-
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -107,15 +105,9 @@ struct collapse_control {
 /**
  * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
  * @slot: hash lookup from mm to mm_slot
- * @nr_pte_mapped_thp: number of pte mapped THP
- * @pte_mapped_thp: address array corresponding pte mapped THP
  */
 struct khugepaged_mm_slot {
 	struct mm_slot slot;
-
-	/* pte-mapped THP in this mm */
-	int nr_pte_mapped_thp;
-	unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
 };
 
 /**
@@ -1439,50 +1431,6 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
 }
 
 #ifdef CONFIG_SHMEM
-/*
- * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
- * khugepaged should try to collapse the page table.
- *
- * Note that following race exists:
- * (1) khugepaged calls khugepaged_collapse_pte_mapped_thps() for mm_struct A,
- *     emptying the A's ->pte_mapped_thp[] array.
- * (2) MADV_COLLAPSE collapses some file extent with target mm_struct B, and
- *     retract_page_tables() finds a VMA in mm_struct A mapping the same extent
- *     (at virtual address X) and adds an entry (for X) into mm_struct A's
- *     ->pte-mapped_thp[] array.
- * (3) khugepaged calls khugepaged_collapse_scan_file() for mm_struct A at X,
- *     sees a pte-mapped THP (SCAN_PTE_MAPPED_HUGEPAGE) and adds an entry
- *     (for X) into mm_struct A's ->pte-mapped_thp[] array.
- * Thus, it's possible the same address is added multiple times for the same
- * mm_struct.  Should this happen, we'll simply attempt
- * collapse_pte_mapped_thp() multiple times for the same address, under the same
- * exclusive mmap_lock, and assuming the first call is successful, subsequent
- * attempts will return quickly (without grabbing any additional locks) when
- * a huge pmd is found in find_pmd_or_thp_or_none().  Since this is a cheap
- * check, and since this is a rare occurrence, the cost of preventing this
- * "multiple-add" is thought to be more expensive than just handling it, should
- * it occur.
- */
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	struct khugepaged_mm_slot *mm_slot;
-	struct mm_slot *slot;
-	bool ret = false;
-
-	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
-
-	spin_lock(&khugepaged_mm_lock);
-	slot = mm_slot_lookup(mm_slots_hash, mm);
-	mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
-	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
-		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
-		ret = true;
-	}
-	spin_unlock(&khugepaged_mm_lock);
-	return ret;
-}
-
 /* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmdp, struct page *hpage)
@@ -1706,29 +1654,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-	struct mm_slot *slot = &mm_slot->slot;
-	struct mm_struct *mm = slot->mm;
-	int i;
-
-	if (likely(mm_slot->nr_pte_mapped_thp == 0))
-		return;
-
-	if (!mmap_write_trylock(mm))
-		return;
-
-	if (unlikely(hpage_collapse_test_exit(mm)))
-		goto out;
-
-	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
-		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false);
-
-out:
-	mm_slot->nr_pte_mapped_thp = 0;
-	mmap_write_unlock(mm);
-}
-
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
@@ -2370,16 +2295,6 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 {
 	BUILD_BUG();
 }
-
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-}
-
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	return false;
-}
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
@@ -2409,7 +2324,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		khugepaged_scan.mm_slot = mm_slot;
 	}
 	spin_unlock(&khugepaged_mm_lock);
-	khugepaged_collapse_pte_mapped_thps(mm_slot);
 
 	mm = slot->mm;
 	/*
@@ -2462,36 +2376,29 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 						khugepaged_scan.address);
 
 				mmap_read_unlock(mm);
-				*result = hpage_collapse_scan_file(mm,
-								   khugepaged_scan.address,
-								   file, pgoff, cc);
 				mmap_locked = false;
+				*result = hpage_collapse_scan_file(mm,
+					khugepaged_scan.address, file, pgoff, cc);
+				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
+					mmap_read_lock(mm);
+					mmap_locked = true;
+					if (hpage_collapse_test_exit(mm)) {
+						fput(file);
+						goto breakouterloop;
+					}
+					*result = collapse_pte_mapped_thp(mm,
+						khugepaged_scan.address, false);
+					if (*result == SCAN_PMD_MAPPED)
+						*result = SCAN_SUCCEED;
+				}
 				fput(file);
 			} else {
 				*result = hpage_collapse_scan_pmd(mm, vma,
-								  khugepaged_scan.address,
-								  &mmap_locked,
-								  cc);
+					khugepaged_scan.address, &mmap_locked, cc);
 			}
-			switch (*result) {
-			case SCAN_PTE_MAPPED_HUGEPAGE: {
-				pmd_t *pmd;
-
-				*result = find_pmd_or_thp_or_none(mm,
-								  khugepaged_scan.address,
-								  &pmd);
-				if (*result != SCAN_SUCCEED)
-					break;
-				if (!khugepaged_add_pte_mapped_thp(mm,
-						khugepaged_scan.address))
-					break;
-			} fallthrough;
-			case SCAN_SUCCEED:
+
+			if (*result == SCAN_SUCCEED)
 				++khugepaged_pages_collapsed;
-				break;
-			default:
-				break;
-			}
 
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;

From patchwork Wed Jul 12 04:44:57 2023
Date: Tue, 11 Jul 2023 21:44:57 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Subject: [PATCH v3 12/13] mm: delete mmap_write_trylock() and vma_try_start_write()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
Message-ID: <728dae79-5110-e3c4-df27-ce3df525aaef@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
MIME-Version: 1.0
mmap_write_trylock() and vma_try_start_write() were added just for
khugepaged, but now it has no use for them: delete.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h        | 17 -----------------
 include/linux/mmap_lock.h | 10 ----------
 2 files changed, 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2dd73e4f3d8e..b7b45be616ad 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -692,21 +692,6 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	up_write(&vma->vm_lock->lock);
 }
 
-static inline bool vma_try_start_write(struct vm_area_struct *vma)
-{
-	int mm_lock_seq;
-
-	if (__is_vma_write_locked(vma, &mm_lock_seq))
-		return true;
-
-	if (!down_write_trylock(&vma->vm_lock->lock))
-		return false;
-
-	vma->vm_lock_seq = mm_lock_seq;
-	up_write(&vma->vm_lock->lock);
-	return true;
-}
-
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
 	int mm_lock_seq;
@@ -731,8 +716,6 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
-static inline bool vma_try_start_write(struct vm_area_struct *vma)
-		{ return true; }
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
 				     bool detached) {}

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index aab8f1b28d26..d1191f02c7fa 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -112,16 +112,6 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm)
 	return ret;
 }
 
-static inline bool mmap_write_trylock(struct mm_struct *mm)
-{
-	bool ret;
-
-	__mmap_lock_trace_start_locking(mm, true);
-	ret = down_write_trylock(&mm->mmap_lock) != 0;
-	__mmap_lock_trace_acquire_returned(mm, true, ret);
-	return ret;
-}
-
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_released(mm, true);

From patchwork Wed Jul 12 04:46:23 2023
Date: Tue, 11 Jul 2023 21:46:23 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Subject: [PATCH v3 13/13] mm/pgtable: notes on pte_offset_map[_lock]()
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>
MIME-Version: 1.0
Add a block of comments on pte_offset_map_lock(), pte_offset_map() and
pte_offset_map_nolock() to mm/pgtable-generic.c, to help explain them.
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/pgtable-generic.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index fa9d4d084291..4fcd959dcc4d 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -315,6 +315,50 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
+/*
+ * pte_offset_map_lock(mm, pmd, addr, ptlp), and its internal implementation
+ * __pte_offset_map_lock() below, is usually called with the pmd pointer for
+ * addr, reached by walking down the mm's pgd, p4d, pud for addr: either while
+ * holding mmap_lock or vma lock for read or for write; or in truncate or rmap
+ * context, while holding file's i_mmap_lock or anon_vma lock for read (or for
+ * write).  In a few cases, it may be used with pmd pointing to a pmd_t already
+ * copied to or constructed on the stack.
+ *
+ * When successful, it returns the pte pointer for addr, with its page table
+ * kmapped if necessary (when CONFIG_HIGHPTE), and locked against concurrent
+ * modification by software, with a pointer to that spinlock in ptlp (in some
+ * configs mm->page_table_lock, in SPLIT_PTLOCK configs a spinlock in table's
+ * struct page).  pte_unmap_unlock(pte, ptl) to unlock and unmap afterwards.
+ *
+ * But it is unsuccessful, returning NULL with *ptlp unchanged, if there is no
+ * page table at *pmd: if, for example, the page table has just been removed,
+ * or replaced by the huge pmd of a THP.  (When successful, *pmd is rechecked
+ * after acquiring the ptlock, and retried internally if it changed: so that a
+ * page table can be safely removed or replaced by THP while holding its lock.)
+ *
+ * pte_offset_map(pmd, addr), and its internal helper __pte_offset_map() above,
+ * just returns the pte pointer for addr, its page table kmapped if necessary;
+ * or NULL if there is no page table at *pmd.  It does not attempt to lock the
+ * page table, so cannot normally be used when the page table is to be updated,
+ * or when entries read must be stable.  But it does take rcu_read_lock(): so
+ * that even when page table is racily removed, it remains a valid though empty
+ * and disconnected table.  Until pte_unmap(pte) unmaps and rcu_read_unlock()s
+ * afterwards.
+ *
+ * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_map();
+ * but when successful, it also outputs a pointer to the spinlock in ptlp - as
+ * pte_offset_map_lock() does, but in this case without locking it.  This helps
+ * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that time
+ * act on a changed *pmd: pte_offset_map_nolock() provides the correct spinlock
+ * pointer for the page table that it returns.  In principle, the caller should
+ * recheck *pmd once the lock is taken; in practice, no callsite needs that -
+ * either the mmap_lock for write, or pte_same() check on contents, is enough.
+ *
+ * Note that free_pgtables(), used after unmapping detached vmas, or when
+ * exiting the whole mm, does not take page table lock before freeing a page
+ * table, and may not use RCU at all: "outsiders" like khugepaged should avoid
+ * pte_offset_map() and co once the vma is detached from mm or mm_users is zero.
+ */
 pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 			     unsigned long addr, spinlock_t **ptlp)
 {