From patchwork Tue Jun 20 07:40:00 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13285272
Date: Tue, 20 Jun 2023 00:40:00 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Cc: Gerald Schaefer, Vasily Gorbik, Mike Kravetz, Mike Rapoport,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Lorenzo Stoakes, Huang Ying, Naoya Horiguchi, Christophe Leroy,
    Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual,
    Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig,
    Song Liu, Thomas Hellstrom, Russell King, "David S. Miller",
    Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
    Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
    Jann Horn, Vishal Moola, Vlastimil Babka,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <53514a65-9053-1e8a-c76a-c158f8965@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 23C6E120012 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: bm1rz6re5e5aurwyd3g7amsuh7tu3p6u X-HE-Tag: 1687246813-204660 X-HE-Meta: U2FsdGVkX18bXqNrK2o+J9LhGHebLpkoD5deQmI0D9jnC7YWbYlNeM+uaHYYPhUOqfTR/nqph1NI0742Ya8rDasQ9mwwsPcHzZfcE3YiaKv5F7Tbz8BKW82dOXgQdEq9Bx0YHzan1RzMdYtFJlMDni/wBI8nI+VLcksQYPj/8r0A8dX+nAR0uYDPw/6c6xBSKpA6oLBQwUS/YttQqkPfnYzOJgsidFBF8WhZaF7CYXBo1KjYR2eSjj/qwItW5ihAC/UW4hwRN6Wk3o/XJemqDiXxKiiKiBSayRcJCHdbSuyJm9nyrJKfUDzkGZpzE8f58ScellaqB5FHTZjWkUIEA1tHTIlftWTBWdYUItR251XTpQZLgmVv0ItHLPZLusrlvRhF68XMUd4OBBoWT4XzBFhYp+78VtCbE5kkXI3lGe63Xz7v6sEPN3YG34O350aN+PxPOCQ+c3VIr0EyH7DVVTkhHp3GnQmF9PmGpWvTZVITcgO8Inc1fzNqqJ9z7lE7C/D/9kY/uUnC2bPyjvpELr0GWXi/syBSC9Uraiq9WIz9kwZuak7f/9PmAdjtPpTVtX9B3tEg38u8xzV3h6PcoThM2CDfjC/h1WLhylHKsOVbby5TI7YiNqYWDkWY+yNMZ8sbIZKxkNOplHNSENM0f7VewK5rus9Ouvqmpig4yNHHS8mQ8CNwPI/1l9rPBYt3ShE8rveAt38Krv+BJ2Q9CGKgkWa50zGY+/627pFTV46XssGWbDwUdYNvSkNvPVl0MFRUmbaOXDTLXg0iGfcgBShKzOngbbzOWh0G3Mmjjwg/uPpuSAJBJ3vOfAjeD6bOqY/tz//eC/6KSuPxOHCLSko5ZI+7EcVy/6rkqm+aX7ku+Iga/LiykHL1mH67H5IgaNVWecvdhi9YJae/gRq8AfABPQ8f5iG25C2CGdsHqw1liNBE3Xj0VHnIJ1NA76qsAdFPJAZhxX9RMCZRyHJ LcDn6N/U nlfQ4DXXiIQQ15N0cD0AH3bSQSIqLk6zgUOxOFSAFBcDcbHnaKaOwTI+XgaQmR9FGH1992VGfbVR4b+3la6kAUiTZULdmC5rUgx1OEBwbyaT5VpbioUW+OqcS3iZ5lWnLIX6K9AShinykJoNoh4Fbz6Uw1uu/LDKZdbeB3RdteWbBOL5s3ZHduHMMIf2I+VXcH9ayPaxMdpcqqkRy9DZM9QvfULnGivhnR56fwwBs72U/jdf0wLMTgk6JNmz8xsXA3DKL5zQDkEbynUdLokHFKw5eQrSvh+B9VJP6Wat6mK6bfQXeoNGmZzhmOoKAacdDQ+40EJlMv22HDbCZ/JDCfEPS6sNqypNpUB6Pd12KG5akyIwY69OZ/otX4oac+AjEbcy5BvMEm1NuuGFYggH8vsOok2G2oOKRpofPaFWC++Rih5dbscNxpAOq8UO42416T5M/r7omlpyqKu7JFtlLaDzj63vWjEiuPxm6E4EkpL1eg6szf8xw0FzC+VK2Gmer76Nd X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Before putting them to use (several commits later), add rcu_read_lock() to pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a separate commit, since it risks exposing imbalances: prior commits have fixed all the known imbalances, but we may find some have been missed. 
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pgtable.h | 4 ++--
 mm/pgtable-generic.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a1326e61d7ee..8b0fc7fdc46f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -99,7 +99,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
 #define pte_unmap(pte)	do {	\
 	kunmap_local((pte));	\
-	/* rcu_read_unlock() to be added later */	\
+	rcu_read_unlock();	\
 } while (0)
 #else
 static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
@@ -108,7 +108,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 }
 static inline void pte_unmap(pte_t *pte)
 {
-	/* rcu_read_unlock() to be added later */
+	rcu_read_unlock();
 }
 #endif
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index c7ab18a5fb77..674671835631 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -236,7 +236,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;

-	/* rcu_read_lock() to be added later */
+	rcu_read_lock();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +250,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	/* rcu_read_unlock() to be added later */
+	rcu_read_unlock();
 	return NULL;
 }
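[A minimal caller sketch, not part of the patch, showing the balance it
relies on: a successful pte_offset_map() now enters an RCU read-side
critical section, so it must always be paired with pte_unmap().]

	pte_t *pte = pte_offset_map(pmd, addr);	/* takes rcu_read_lock() */

	if (!pte)
		return 0;	/* no page table here: nothing mapped, nothing to unmap */
	/* ... lockless examination of the ptes, e.g. ptep_get(pte) ... */
	pte_unmap(pte);		/* kunmap_local() if HIGHPTE, then rcu_read_unlock() */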
From patchwork Tue Jun 20 07:42:13 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13285274
Date: Tue, 20 Jun 2023 00:42:13 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David Sc. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: C7F58160020 X-Stat-Signature: ci5x7hd83jgk68ok5spu66eokzmdqk6a X-Rspam-User: X-HE-Tag: 1687246939-379034 X-HE-Meta: U2FsdGVkX19bj3FLQBl/xaEMsGHshitx4E67aH/nJkEU3gP+c/94gK+Wk+Cu13RgJf1IqF17kjUGvKPgHcAZSIcL6WBgagbxzz+tqV6XGEmE+DFsa9tiBJJArbUocw8qt6griDuP4NNZPQxAZkIasm5yZdbRwOaS2rQzYGGd3iIrjsycmXDXiZhe8Az3jyLGDmfEqZTU1kIzqKIn14CYVavcIIn1nqkxrT1AyH6ZONXJ/AB4fkD/yTVYktj3f+ld+qnBAyAjef6desNdUpbh/Pc+ageJWcjdFAdL15p4MVraE6nle1bWIqdoPxgkyOw0h9WfxjRn/exp89DknnIMBhWwSSks65X18qMHiWzuSXAikdnzWXOISpYLtP+NCLfbOeDbLkvVpqV+DrB+ZoTQztRHHyN9yEQ/13OODK1W5UQYgi6gaFSa7a0Zb6qDhHQraTquLnLB40DXr7WaPw2WOTTMVJ84bdPN4wkd2Str9UEHYAUqrV6q6T4ImjaoLBYkHOdd5iCOGVoEJJmdwHA/HfFmH22Q8J/6GXIq2NIQ1RxB1YB19RI9EHo/CQylHJb58yj+1AlwMWRrLnwrvJiOGrvWc5+TWu4gF8RPYYsadga+6EJfyeeOkRO6TAni4QsptnRTA3UeJ7WvgyRvnFXGOWaPj2yegusqA4GM+dRXcN/1IcnNdGRzPrsh5JnNSTij+VAVBAfbecMzhkYPWXyueSH5B7ZJ5EdnthCqEu5k9iyJBwjtVLxbc09ir9ycHh0fH86dCXnU0jC4ZyBn67dNfqnq8K/sF2oNqGXKx2ThFlp5uMaEz9rraAlum9vr8kBZVebtyzDu3nfbqOfiAMgPaouAxSSErkidq6aa9OH94ZB4HoY0c4bOr5XOzql13G36IBMJ54TjSVhvOHpCk4HyAY1kLncyA1TDHblGF4hmwoBshB3F/01pYZao3HVnv8Gemm9Cqd8C8YI6v8ZZk7i ++NvB1Xv tmhU3LR3GIDVCD5Ejd91/ZOlAhQXO3RiGaI02gAmQnjdbtTjQ0hfp3yRbbYghYiAsT5kpncjOogegFsqhWXGgJfJHl/MaByOjznkBUbWWjwrB5T2hxKt/dg2viIpaS065eEJrcjP8qQCajudhBQi5uCLB30KtpRerhhnXurv21t4qkT4YwQNuOrF2hu8hHJEhcigNFMc+pTVpO3mN5xJ/Aoc6bclra8knkccgJifPFE+ljIwqCMfX1kU5LqFoaVFkAF8j8i9cGh7QKkXfUbCPILpBkEx5pl7qytwk92wWOgF2aV93psohGjWMYA2T71fa1vYpVmqN1I1ood52fC7oHOV1b/Bx1wGfs9ify4NspMPhebxJJnxvyX+b6oQMow6SdDp9sjinfZ9hDbAgv63QW+AxVo54m4HDxCxHixcLsSpTcJT+XulCd+lyGeUu9hZ4SRd6r/8fTk7DUrmqK9I61ecM4NIIYTe1xV45GzEpVgPhUS76VWWIP/c7Gg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There is a faint risk that __pte_offset_map(), on a 32-bit architecture with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=y, would succeed on a pmdval assembled from a pmd_low and a pmd_high which never belonged together: their combination not pointing to a page table at all, perhaps not even a valid pfn. pmdp_get_lockless() is not enough to prevent that. Guard against that (on such configs) by local_irq_save() blocking TLB flush between present updates, as linux/pgtable.h suggests. 
It's only needed around the pmdp_get_lockless() in __pte_offset_map(): a
race when __pte_offset_map_lock() repeats the pmdp_get_lockless() after
getting the lock would just send it back to __pte_offset_map() again.

Complement these pmdp_get_lockless_start() and pmdp_get_lockless_end(),
used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync()
synonym for tlb_remove_table_sync_one(): to send the necessary interrupt
at the right moment on those configs which do not already send it.

CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and x86.
It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is that
Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for arm.
It is not enabled by arc, but its pmd_t is 32-bit even when pte_t is
64-bit.

Limit the IRQ disablement to CONFIG_HIGHPTE?  Perhaps, but that would
need a little more work: retrying if pmd_low looks good for a page table
but pmd_high is non-zero from THP (and that might be making x86-specific
assumptions).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pgtable.h |  4 ++++
 mm/pgtable-generic.c    | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8b0fc7fdc46f..525f1782b466 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -390,6 +390,7 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 	return pmd;
 }
 #define pmdp_get_lockless pmdp_get_lockless
+#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
 #endif /* CONFIG_PGTABLE_LEVELS > 2 */
 #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
@@ -408,6 +409,9 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 {
 	return pmdp_get(pmdp);
 }
+static inline void pmdp_get_lockless_sync(void)
+{
+}
 #endif

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 674671835631..5e85a625ab30 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,12 +232,41 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+#if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \
+	(defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU))
+/*
+ * See the comment above ptep_get_lockless() in include/linux/pgtable.h:
+ * the barriers in pmdp_get_lockless() cannot guarantee that the value in
+ * pmd_high actually belongs with the value in pmd_low; but holding interrupts
+ * off blocks the TLB flush between present updates, which guarantees that a
+ * successful __pte_offset_map() points to a page from matched halves.
+ */
+static unsigned long pmdp_get_lockless_start(void)
+{
+	unsigned long irqflags;
+
+	local_irq_save(irqflags);
+	return irqflags;
+}
+static void pmdp_get_lockless_end(unsigned long irqflags)
+{
+	local_irq_restore(irqflags);
+}
+#else
+static unsigned long pmdp_get_lockless_start(void) { return 0; }
+static void pmdp_get_lockless_end(unsigned long irqflags) { }
+#endif
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
+	unsigned long irqflags;
 	pmd_t pmdval;

 	rcu_read_lock();
+	irqflags = pmdp_get_lockless_start();
 	pmdval = pmdp_get_lockless(pmd);
+	pmdp_get_lockless_end(irqflags);
+
 	if (pmdvalp)
 		*pmdvalp = pmdval;
 	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
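[To make the hazard concrete, a rough user-space model of the torn read;
this is our illustration, not kernel code, and the names are invented.]

#include <stdint.h>

/* Stand-in for a PAE pmd: the entry is 64-bit, but a 32-bit CPU with no
 * suitable atomic load reads it as two 32-bit halves.
 */
struct pmd_model { volatile uint32_t lo, hi; };

static uint64_t read_pmd_halves(struct pmd_model *pmdp)
{
	uint32_t lo = pmdp->lo;
	/* A concurrent zap-and-refill may rewrite *pmdp right here:
	 * then hi below belongs to a different page table than lo.
	 */
	uint32_t hi = pmdp->hi;
	return ((uint64_t)hi << 32) | lo;
}

[Disabling interrupts across the two reads closes that window on the
affected configs: freeing a page table is preceded by a TLB flush that
interrupts every CPU, and since that interrupt cannot be serviced while
interrupts are off, a present value assembled inside the window must have
come from matched halves.]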
From patchwork Tue Jun 20 07:43:50 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13285275
Date: Tue, 20 Jun 2023 00:43:50 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 4810BC000E X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: 7jzfbsuiar9aymam9x98p46r1ae9s1za X-HE-Tag: 1687247036-489239 X-HE-Meta: U2FsdGVkX1+kyolfC7siF1FHpHdG79V4aXXFcUsGjP4NEaoon/hwfrCWV8DyIfuZ2PZAmjILk7Pu8iciyooTyG93VaM2uAHFBn2VE7PXniY05AFmuQkupFCtVsp4B69gDrM4s1xfReTfxY8OFGS/NZxnn4AgiaL45uo7SgclWuCR4BffZKNC4H91yrERFNC4aLN+ZdzfqUzqUKMEh/O84LUVVlpQ07E9Ij4W+urJsTB9PKr74foFxFZcaD8sqdKvcLHznLqSYmxU3oQJS/4XnBuMNf91NVkJOsQDmRFDpbMkTu/4nzqoiXK+O0rYs6pE7NgmdCLU/T1AJUGCuX2OQsWLvU45xb71jJsx7dbmssyoWTx6NG7iKVhg1ePQIZ4IglZbvE+vtkT8TJJueqDFrYb3eeeoxvQ7FvnPzbJ9OcGpXUW17YJtwhqA4D1tyipDQED/wx1nWXMHDjsQpk9+71mxlCEe/8MM9e5u7f+HJxbUUNu3HHckvq5jecp87oUYnlL4E8qMdYkSuWXxx54zd24Y7HPq60Rp+lVO0QKp824z44xFDnONlxIeZFtwe8EZQD4HeMxcY8G7+aIi0lMeTipb8/j3DrVTyoOIgCxmm3vQnhkughh6jTLqWMW20jCNbA9COrM26t8uEFLjKoIkeuIa9ulqaSPjC4MF3CMtaYSRFMq4wMPx+GUX1NHnQKFsCOdX60JtmF4p2ED2yW5UtdF8pzZUDhDmHWAx+GB+IBjGgMr+t64Liufi/YlrmKSGFpVX9CiW7JHEHIGp7N9KKTak52sghE/xNY/l030Of8epnhK5suDjo8S7UMxqYs96Cq0vzABkuQlhW/7g/0ZGWQfnQ6goXyDhuv/1k/BVge1VoZwSQUi67d87WvZlOhT8OXN+0qgIDroIWc0GUaRS273452stq4nEYlUQn8Bk5MjijEusqayIfMM/PDeQ2gMrn1daD+lc8i9L1YhFqoW g3kkW6+F DZ+FWDxvXDSeYZknhZxzsFx6M3gvb+YO7fHQ30qt18MrQgAknfZaz+9g8OgCpRo8TbluGcnb0dekIrp9bMJ5kWY6C+tFThazHfG31ywejWgJYgY80aId/EPjWCvG+Ea67p+ZFnOqEyCuNmCIS9ZhcPxBkj1IOiC66w6uGn/YIhMhVHcjws6pnM+BAvUU+UQE/HQc68ZgRDHc+k6bnvsx5ABUy9fLgNHtXQBWrPlMCovtjso6bIRs4YEG1Lo3WmS+jPoe6yp2S11VVl9J25ZG1hGlwiX5Ki9clqsn9g5dgmVIqaCCOHz4YAAmk6RxjVo1hZpyWWl+oqdzd3UZH7gQcU+4KdAg3xxpUBZSFXKfD/R83IrUkx6yK7hCxK/PnYdk2dtIqw8SvWq2jaRwKLObPzFdgOILF1eVunS5rTG5A7D90zzn6CzWtDzNXtiGn3FP121eXbeIcdWFjoBY95Kqqmt9daSdrI7JYqFkgLKMu1oF99fMvH6z95lURjWFPkz8+uqgY X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in adjust_pte(): because it gives the not-locked ptl for precisely that pte, which the caller can then safely lock; whereas pte_lockptr() is not so tightly coupled, because it dereferences the pmd pointer again. Signed-off-by: Hugh Dickins --- arch/arm/mm/fault-armv.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c index ca5302b0b7ee..7cb125497976 100644 --- a/arch/arm/mm/fault-armv.c +++ b/arch/arm/mm/fault-armv.c @@ -117,11 +117,10 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address, * must use the nested version. This also means we need to * open-code the spin-locking. 
 	 */
-	pte = pte_offset_map(pmd, address);
+	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
 	if (!pte)
 		return 0;

-	ptl = pte_lockptr(vma->vm_mm, pmd);
 	do_pte_lock(ptl);

 	ret = do_adjust_pte(vma, address, pfn, pte);
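[The general shape of the helper, as a sketch of ours rather than code
from the series: pte_offset_map_nolock() maps the pte and reports, without
taking, the spinlock guarding exactly the page table it validated.]

	spinlock_t *ptl;
	pte_t *pte;

	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
	if (!pte)
		return 0;	/* no page table, or it changed underneath */
	spin_lock(ptl);		/* ptl matches the pmdval just validated */
	/* ... read or modify the pte safely ... */
	spin_unlock(ptl);
	pte_unmap(pte);

[adjust_pte() uses do_pte_lock() rather than a bare spin_lock() because of
the nested-lock case noted in its comment, but the map/lock/unmap shape is
the same.]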
From patchwork Tue Jun 20 07:45:26 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13285276
Date: Tue, 20 Jun 2023 00:45:26 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 04/12] powerpc: assert_pte_locked() use pte_offset_map_nolock() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <7ae6836b-b612-23f1-63e0-babda6e96e2c@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: AA0884000B X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: f5s44m7g8hqjcrg8wsyxs7ucp5apms9g X-HE-Tag: 1687247132-613256 X-HE-Meta: U2FsdGVkX18J7WXDzkDQI/s00/Rnh7uA0A1igJTP6oswuDMX8CASfdrS8tA9g8q5dwpsiF7RtckYsH6yaJOE+6v4OZIYfAAVsn/uOTaycYid8uQc7RqZXAqL0xplpjLr2YFxy7rEs9LB1gEliL0JD7nhFg4XG6fRGPslsj/XWqmLzRSscSdfw++crwFFmVPcYINpcgRaCOEGAvwTc9LUk2oDQeJITeAgTcLoZbFhlyioRZhKSeKKlETEChyUH3AHV+BgibYdMp+4fsgqZbkgslkS3Cwrt9OsI2TbIrfGzd2N5DxvRZ4PZM3hC/fVdma/nHVbwB3bgOWtVDVzk1rulSBfKuiAjKGy9htqKwqosAYAQ6MMS/qHg1lnZylLBYFJTgG2yAdbh/scvuGzp3LxdMK3WknCxURqXrqZi+SBy+8uX61HARqApo8lVgz1oEdSQqU2r3J6bwl0z7ul/3XjZaIIqrxhHw0w9OHj4e/e4ERulk8hxLTKGfzqPwQcsCGLK6W92Fi85OY8jCy3IbcCW+HpJuxDwFayudEOj7TaqYU5XApyE/41yiPQwXXzNTTOlYRs8W7rrSkWBBJra80Sz/83VC8A8wVnurOVYudslImfN72froN57ffiyM90gPBAvD+ak9U2Bn90Wl9GJ+iUYoptY44czpVyLkyx3q7LXDB+RZcC02RpvAOTCCgaEkEYbleaJ8vG4alHws2fRnJ2CqYzS3NHYMcaUDhYtzL+m36kjyoKFHo1LOgkws/OlCgAu6FN8yT9Ynug1Q9alvQYnt0VjIRxe3KW4aa/PoOnFdZaVfI058a3O6Sm5sFoAi3rWVz8EOP15c2++Xvdd8qV9Of7sIi1Ez0wOj/FEOsn7onvpsbZ2FA6pP0F3F/CDmwC2B6g/jo1KWnG8WMemQxc5qXSMhCsicbrv411pZMGAfBh78V332ULCQ0mP4c3AfBJUNXQeWutk4NYsSpodJp PsDwICJ8 xYZhMITMmULuwOlBkaDFMLiTnLdcs2iPbyfMLRQta9lNbCM+zk5w7T+PslNvrlvskRVxaEDHxEmMNDXiqgMOCQnwGdXCSj8czX4EYfZmkjL2TEFPGOfklKb6X3bik+lFSAeflbcokwqaXKeb/RlIKkFAUM2r+1EBJJhOwOFQdrseSySc+x7CP0OWHglLri090YLr5+n0+dWl/2ihBgG0U8KYlnLqpaBPd2d1q+UCfOh4SaIgQlBybZ+ZBEyPkVRUXw/LW+wUrjgc/ovOPPdJlNfIeBgChJO8xa5EulTh1eTdKUUC22QFP8j+h2+QITk1CMNARFWGdH53w5wTydLeP/p3RUMs02IHZTgNXQFiJj5DVcnHBDGALpOt/sVsDlIG7JD3EL9/Le1DhtsD0XqsSg8WDN2SkYOOhwgvDOxw9NTeMwItl6mrn/voCfo6tCW8yQ/AVJ2aS6BMPIupnB1ojk6g9Dq+M0m/leh+e0eU3HA08JF8vQaVGM+3fUtZUf0F3+qyb X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is stricter than the previous implementation, which skipped when pmd_none() (with a comment on khugepaged collapse transitions): but wouldn't we want to know, if an assert_pte_locked() caller can be racing such transitions? This mod might cause new crashes: which either expose my ignorance, or indicate issues to be fixed, or limit the usage of assert_pte_locked(). 
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/powerpc/mm/pgtable.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..16b061af86d7 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;

 	if (mm == &init_mm)
 		return;
@@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	pud = pud_offset(p4d, addr);
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, addr);
-	/*
-	 * khugepaged to collapse normal pages to hugepage, first set
-	 * pmd to none to force page fault/gup to take mmap_lock. After
-	 * pmd is set to none, we do a pte_clear which does this assertion
-	 * so if we find pmd none, return.
-	 */
-	if (pmd_none(*pmd))
-		return;
-	BUG_ON(!pmd_present(*pmd));
-	assert_spin_locked(pte_lockptr(mm, pmd));
+	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+	BUG_ON(!pte);
+	assert_spin_locked(ptl);
+	pte_unmap(pte);
 }
 #endif /* CONFIG_DEBUG_VM */
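[In condensed form, our summary of the semantic change, not code from the
patch: the old assertion silently tolerated a cleared pmd, the new one
reports it.]

	/* before: a racing collapse that cleared the pmd made the assert a no-op */
	if (pmd_none(*pmd))
		return;
	assert_spin_locked(pte_lockptr(mm, pmd));

	/* after: the same race is treated as a caller bug and crashes loudly */
	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
	BUG_ON(!pte);
	assert_spin_locked(ptl);
	pte_unmap(pte);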
From patchwork Tue Jun 20 07:47:54 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13285278
Date: Tue, 20 Jun 2023 00:47:54 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <5cd9f442-61da-4c3d-eca-b7f44d22aa5f@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 14C1C40003 X-Rspam-User: X-Stat-Signature: j6igrq3a65oom3d4amg8hpm8uk1y1bcg X-Rspamd-Server: rspam03 X-HE-Tag: 1687247279-428892 X-HE-Meta: U2FsdGVkX1/EO83QnqajjNgc3zvyXh4gcV8JN6PsUuI7QcJhShqbTjGHl3SF1uBpwkwLgLfRlSIEzMMcQfKVZ33dUY4tPzB9r+q4FD2oBOhsgf/YZlq9arApMNkwyopSX2HjrBTTrD0TEYGJ/viU1c5wyiG6wEgFODLoaxsXTXWCvX+tVCYqNM+NHtf0yllQfC6aPIf0tM84m+NQ92On98YhxSKzCRyepiMojqK5gFfOS3XtnNvr20rLdcfdfwP50KwWC2k8eQLvx1Us//d7el7/z5a1Czsm1iuDNiM9tKm20kWqyxQaJxav35N8uf1C+hI2C+EkzMs+V9vHqcl5M4iNQTRx+Y8cZl6DL/l7d0HCU6AGxG0w74lXaSVSOdXzFXBvpfXcXSLLBXVbS7LsXDkiT84RNRO318zYNoydxye+jnAiqId+Aarc11HZRdNfP2uyw9S5GzoxctSrtNyk5FGtaqU6RFq/RTv3dgUVwYaT6KDeinA74oVRf/N7iS+F885aECPPT7oI91We8jN+zZCtvq7pBnj15CyBd/8Z0NKigdQn2VH+QHgbyh2dfNsNgULvmHqx6o5BJuIYSGrq5FMztXwxq1qf5JXKpEQJTTLm54ZbcyPPpPSR/YuVLdInWbDWtSQ3IfAN0lSXd5/avElfOcTC+t1xfkwM7RRWG7l/w9glCnFVBv9nb/VQ0kocNayZ2Q6VjXzwxXrm0erquofvo07xc7wisJlAZPG1wnDOO7UMITi5MZ+gWPbrVrqL966Tm0VQs3sw7ehd5I8Rx5ZuUNROw4YwF7TEiClVPxSwx+RNYmR3JvGzK72L/dZ7eO9FvglyRsQOQ+63sEftKWAvEI5W6tV9Z1IiKdC0J3LiZuHr6XXDRJHJc89FbyIpRya3nS4ZBPM/ZESjVj5q0CCRzMA9OoGyLxgjodYuMT95cRdThmNd39IIm77MILA+B6tqIfmuI2vj9fiyxCE 72yBVZnW CRTBwk+1zZFuYvg2I+5ssMgpbgAhOObKdfTWugf4AW6NXpn1VKyLe11jQrYcRB3v2jwIYadMU+XN2hwwN9RFOcWGjMdmGB57ayoo0Ks0tv2R8h1SsgAy0N+KhHvM1RC+iVw1ZDF5YwjZOmTN/MKHb87DmB7EYGrOpgWLt7KpWD4T024WWldf8acJWM6pQA0asd4bjwecllUBEb9UDKO9ZkpLhxg0M66iCeVY3KzIbeXATbewJiEK6PWHQm4fEk/kNZrhsf8B0hsM4W4o3jsMGFWHbHbFRApcV9tXewNc8AUqjKF3Jpk/9eL3MKC6Bx7EFVEB0bOxfKl2h2SPcMN+MUZth29VJgrx5JdnyUHplsnrYKCdiwnIGWrkjEET9qKlyWgEaDtLRjX1I1LufIXwqYHP7iDoCayGPCVe81R6x6TCZj+LMrIs2hWx97q8dyA0fFiPgW73bOBC+3NwT2dBmwjmj4j7saayNygPWvNtKnfy1MVEtr6wtKrsnwmc60hwMUReF X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add powerpc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. This is awkward because the struct page contains only one rcu_head, but that page may be shared between PTE_FRAG_NR pagetables, each wanting to use the rcu_head at the same time: account concurrent deferrals with a heightened refcount, only the first making use of the rcu_head, but re-deferring if more deferrals arrived during its grace period. 
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/powerpc/include/asm/pgalloc.h |  4 +++
 arch/powerpc/mm/pgtable-frag.c     | 51 ++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 3360cad78ace..3a971e2a8c73 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -45,6 +45,10 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 	pte_fragment_free((unsigned long *)ptepage, 0);
 }

+/* arch use pte_free_defer() implementation in arch/powerpc/mm/pgtable-frag.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /*
  * Functions that deal with pagetables that could be at any level of
  * the table need to be passed an "index_size" so they know how to
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e..e4f58c5fc2ac 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -120,3 +120,54 @@ void pte_fragment_free(unsigned long *table, int kernel)
 		__free_page(page);
 	}
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PTE_FREE_DEFERRED 0x10000 /* beyond any PTE_FRAG_NR */
+
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+	int refcount;
+
+	page = container_of(head, struct page, rcu_head);
+	refcount = atomic_sub_return(PTE_FREE_DEFERRED - 1,
+				     &page->pt_frag_refcount);
+	if (refcount < PTE_FREE_DEFERRED) {
+		pte_fragment_free((unsigned long *)page_address(page), 0);
+		return;
+	}
+	/*
+	 * One page may be shared between PTE_FRAG_NR pagetables.
+	 * At least one more call to pte_free_defer() came in while we
+	 * were already deferring, so the free must be deferred again;
+	 * but just for one grace period, however many calls came in.
+	 */
+	while (refcount >= PTE_FREE_DEFERRED + PTE_FREE_DEFERRED) {
+		refcount = atomic_sub_return(PTE_FREE_DEFERRED,
+					     &page->pt_frag_refcount);
+	}
+	/* Remove that refcount of 1 left for fragment freeing above */
+	atomic_dec(&page->pt_frag_refcount);
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	/*
+	 * One page may be shared between PTE_FRAG_NR pagetables: only queue
+	 * it once for freeing, but note whenever the free must be deferred.
+	 *
+	 * (This would be much simpler if the struct page had an rcu_head for
+	 * each fragment, or if we could allocate a separate array for that.)
+	 *
+	 * Convert our refcount of 1 to a refcount of PTE_FREE_DEFERRED, and
+	 * proceed to call_rcu() only when the rcu_head is not already in use.
+	 */
+	if (atomic_add_return(PTE_FREE_DEFERRED - 1, &page->pt_frag_refcount) <
+	    PTE_FREE_DEFERRED + PTE_FREE_DEFERRED)
+		call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
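[To check the accounting, a worked trace of ours, using the patch's
PTE_FREE_DEFERRED = 0x10000, for a page with two fragments in use, both
deferred within one grace period:]

	pt_frag_refcount == 2                      two fragments still in use
	pte_free_defer(A):  2 + 0xffff == 0x10001  < 0x20000: call_rcu() queued
	pte_free_defer(B):    + 0xffff == 0x20000  >= 0x20000: rcu_head busy, not queued
	pte_free_now #1:      - 0xffff == 0x10001  >= 0x10000: B still pending, so
	                      - 1      == 0x10000  drop the freeing ref, requeue call_rcu()
	pte_free_now #2:      - 0xffff == 0x00001  < 0x10000: pte_fragment_free() frees
	                                           the fragment, and the page when last

[So however many deferrals pile up, the rcu_head is in flight at most once
at a time, and the final free waits at least one full grace period after
the last deferral.]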
From patchwork Tue Jun 20 07:49:34 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13285279
Date: Tue, 20 Jun 2023 00:49:34 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspam-User: X-Stat-Signature: fu4cz9cdnwfzson731xuwpqbwidxiqip X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 58825C0005 X-HE-Tag: 1687247380-259687 X-HE-Meta: U2FsdGVkX19PFl38W3LHDHEoR0PMCqgkkwq+S6lb5rDwACXiKBS39FWW3WJ17jZkc6Ajt1zEuaDLYH5HPEytxbA1ucf5gas3DF8I29KBClI6s4CtJwGsB3mBNLkOiUKlQO69IEEu+lKzqtryBkG89jmZrwvdonLappnkrewdqdTPkOjPoPVB8aCilD64lbpKG7kdJ1OFLv8TDqP2vTsDI2qHlil7NWJTCL4KtbhqURrr1QgwE+i0LbZPTvMkoD+YqrLKu8Cjn18KOfeEDOd9L4Euxw3BBZ/tjutraXwrCn8q6mVodaadxEgV6t7guN/tM+HGmd9wD5fRr7hMHIdJLlUxdP8KRj+K7BKdH/PBm8HFoDlwhKpQ09qbNu1+mLJsYtXWyeT6mLdwh2lEM/vPyHnQqlENR3Wv3sXCp2JdiwoF06BKyw1NPLIOtkBp3JHoeioyTCWxwYaAl9BJ1rS4iBPV0KaqM6ppqijIH1Y1ZZgYp3gNs9EuMcwEtCDMYVr40TRuupFgXhBryuLpajW0nF5ShMs0JKMgRyAqIM2OEA0KnYLOTV42wr4twffRoBktCacmOtH/MTi7PNve9jd9W95r3EooPhgPk7ELy5Cly5ytuGpNaMN6RaKRF/h/sE6XQ4XX+k2hbNyTTrChKBUnBXTzr/En8t5qOIF2CpKM46QyejUsE18xQDvUS/mvhRh2QRZ5S+XOiN/0Dk7/rQWHYu3HKbEYDFfx8E0kqytwFBlujugdMRj5eWgCC7GJRD95sq4dWsljvgua5sLYwfuFeCZbdIuJFq4UHG3F604AyZ+ntq5b9yiSMypOqVsWvUV1s9F+ns01gPBa7P566Xnr8tVXYkAppNzC8+tYBAdtY9KBX3rrSxC6eL6T18UUq/ygb8LjqC+OOz32wVFc2qBizhEU32ldlGmxXe/8q89lI7Acbx0DKIzGsiN0xnGjWEFFQ47hqvcReMulV9MenoZ Is0B0ozu 2B4vAR2xUyVUDOiGDWafCQPVVvKkJ98k4B6L9atPClEBUSk4OiEsypJie127RGfRXb8guwpvsojNhxSj6EygUQ/DwcPpDlIj5aNLz4s1XQpiqVVCQEpP6ekxw5MPeOOMe3jwdKDBGkEKG1oGgwp30PC20VSbcZJoinpbpZMh4S9QL0cKQ7+1uM8pZa42rOvqi1UK0WwkIq5LQttIZ6+lLW5B0d34WqM6JzfCRHUlyRg5CGFy04pmD6twlvlJS5Mx/abUl+apcc0p0j7lhoYFRkSCNMl/zCSS4JapnLXHie5HCd1KcyzyHZbxql6K9I0njAoUSJKIbrHvG4VRBIcPWt2ke0o7Sw8wVRvpLA3rENKbE6XypLWzHYrjZ30HDPrs/24DXdqrS4uMvXdZ7jRZUSqwqy8fTnOUHDOI73xRAJRqKxeNDp7sCogkG7fn3lgLUdUbNxVirh5XxjcTqyHt/JzC2P0s0nda9MGYEubBywgesQ0aNm8k3Sk6PHrov44wSRsvl X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add sparc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. sparc32 supports pagetables sharing a page, but does not support THP; sparc64 supports THP, but does not support pagetables sharing a page. So the sparc-specific pte_free_defer() is as simple as the generic one, except for converting between pte_t *pgtable_t and struct page *. 
Signed-off-by: Hugh Dickins
---
 arch/sparc/include/asm/pgalloc_64.h |  4 ++++
 arch/sparc/mm/init_64.c             | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 7b5561d17ab1..caa7632be4c2 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -65,6 +65,10 @@ pgtable_t pte_alloc_one(struct mm_struct *mm);
 void pte_free_kernel(struct mm_struct *mm, pte_t *pte);
 void pte_free(struct mm_struct *mm, pgtable_t ptepage);
 
+/* arch use pte_free_defer() implementation in arch/sparc/mm/init_64.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 #define pmd_populate_kernel(MM, PMD, PTE)	pmd_set(MM, PMD, PTE)
 #define pmd_populate(MM, PMD, PTE)		pmd_set(MM, PMD, PTE)
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..0d7fd793924c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2930,6 +2930,22 @@ void pgtable_free(void *table, bool is_page)
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	__pte_free((pgtable_t)page_address(page));
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+
 void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 			  pmd_t *pmd)
 {
From patchwork Tue Jun 20 07:51:19 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:51:19 -0700 (PDT)
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 12AB1100009 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: wh7fcfztyjewpp969sgmuo9f8bb3tap5 X-HE-Tag: 1687247485-262029 X-HE-Meta: U2FsdGVkX1+DfyNhK7dZvt/1jjVyvhcZ+LybtzDj5WNSBl29F0Hq3IYI3PmAtS51FybZijl9D5sAN7Lv9/rH3kVJuzy7TXlwDXouejoSzS8IJGKA0bNDsgmvM8RDSfP18ig+7VfxIdmJUMCef+AoTtO/kInEzGPydE7+Nj+6N9HK5DwydYEfWldfKPiB1nfEuJXI1ogFUGpU8hsQ+0rqnDdri5ZTI8mitGtvQnylWQqBj0ZTGlKybHlXsj+IhNAGWzAf9QuwyQ2Py52Rol7FiwGgQ/CHiCkLar89G+69lpSj53bjUoB6BM0rUlWuVhSBzcCWmhJuGwH+JX7UebEORjjyZT1Hgzon3ciy+h4XK5RTYq/w7lDQ1IzXsSF5YfLRl493b4xB9Srx22HeFjDekTrZZr7y51AgAVGLq2wm4/OZ8Edh5aWDyIZ7VDr1/qSKsCULrwLaQz3PC6vAz3+8Pfm9hBI5Vl21H0myaQh0LQN1Vyg/iQCOkVaEoa5Ehv4fADYdcJwdR/tRtQ09YenuZ0KrrAOSg/+SLnJ55oATaVuvup4oKKaPxI10HUOc7qFj7E0hXZP31SSvAq+/V+vfYE8g7/ELfxLMgPmaZWe8mJzgIaHdTMY0n25DSa27Od2CD2wwplHksWh6ZttovL6LiEwPpweWun0gO+vEmZYhutyBljUkptYeITX+gg0lhXN+t30FhvwVgMVcy693V0bikNuRZXnOE5m4BJ7E/Ln7oR26iZV53l3vl6f29ru1c0lcvz+67KW+UrG9siOtYyv8Gb7RhZ7MjjTxgfI/pgtZpXfVZ8r7EkLWuZtBo2czIwigbbYEw5IC+rOoV+IuyZdSjqVJTUjk2mikJwnPa3JEG11OQHIeA1DCHJY1sjgU2H0I9aH3QnnvmY+4qmgnQUiJRjZ53KQQSrS6pxx0TtJgmibkTLwXfu6WxEAaRfWLiy+ta95igokFyFISOWk8Yic 0GV05Tbe 93jEUmXqz06Dgjrsg/RDDNA5IuRscAvSKH2GpBlLH71hcWzFPfQODLEz6fxAvOASItX/H/6749wal/BvJgVu2tkjJuUTBukfnnkEUXIBYtRjoN/Wj7VQPRvcIWJ3iR4pYnYGdv/SmfboslVVuVJ6mjMNGZlN3N2s0a3XBnqWehCR5qKGrxxQesD0seb+9GjP+R2/ssRG5YAHrGn7FYtTETXipDx9C+NNb4A8b/veEDb6dWjkoQBT+2HrQqx3DLR/TMeNG9ZpSS+kY9m3LvAgpLCX9wa9n/4wUoyW28/ZNUM2r29vvidMirLNT5wl8qyqcFjTj2SI9NHN9V4wEIC6WxTCWPVvM2vOa3+lbFHEDNyg6t75oQkBQsVYgxZ5IGecfQdP6uUjL97tnMAtP2bb13nmTVKF0c66pM3CFbfZou/Ps4hhGQHL3Sih9JJplEgTDflCfOKF9TOpIRclvJAuUnu7jhSHMIQlarf0VhR3u8X6tKOGxhzZipgcTCMdlwmy2Sf8F/BmSX+rYPmqrIFopIxrR6A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add s390-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. This version is more complicated than others: because s390 fits two 2K page tables into one 4K page (so page->rcu_head must be shared between both halves), and already uses page->lru (which page->rcu_head overlays) to list any free halves; with clever management by page->_refcount bits. 
Build upon the existing management, adjusted to follow a new rule: that a
page is not linked to mm_context_t::pgtable_list while either half is
pending free, by either tlb_remove_table() or pte_free_defer(); but is
afterwards either relinked to the list (if the other half is allocated), or
freed (if the other half is free): by __tlb_remove_table() in both cases.

This rule ensures that page->lru is no longer in use while page->rcu_head
may be needed for use by pte_free_defer().  And a fortuitous byproduct of
following this rule is that page_table_free() no longer needs its curious
two-step manipulation of _refcount - read commit c2c224932fd0 ("s390/mm:
fix 2KB pgtable release race") for what to think of there.  But it does
not solve the problem that two halves may need rcu_head at the same time.

For that, add HHead bits between s390's AAllocated and PPending bits in the
upper byte of page->_refcount: then the second pte_free_defer() can see
that rcu_head is already in use, and the RCU callee pte_free_half() can see
that it needs to make a further call_rcu() for that other half.

page_table_alloc() sets the page->pt_mm field, so __tlb_remove_table()
knows where to link the freed half while its other half is allocated.  But
linking to the list needs mm->context.lock: and although the AA bit set
guarantees that pt_mm must still be valid, it does not guarantee that mm is
still valid an instant later: so acquiring mm->context.lock would not be
safe.  For now, use a static global mm_pgtable_list_lock instead: then a
soon-to-follow commit will split it per-mm as before (probably by using a
SLAB_TYPESAFE_BY_RCU structure for the list head and its lock); and update
the commentary on the pgtable_list.

Signed-off-by: Hugh Dickins
Signed-off-by: Gerald Schaefer
Reviewed-by: Gerald Schaefer
---
 arch/s390/include/asm/pgalloc.h |   4 +
 arch/s390/mm/pgalloc.c          | 205 +++++++++++++++++++++++---------
 include/linux/mm_types.h        |   2 +-
 3 files changed, 154 insertions(+), 57 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..89a9d5ef94f8 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -143,6 +143,10 @@ static inline void pmd_populate(struct mm_struct *mm,
 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
 #define pte_free(mm, pte) page_table_free(mm, (unsigned long *) pte)
 
+/* arch use pte_free_defer() implementation in arch/s390/mm/pgalloc.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 void vmem_map_init(void);
 void *vmem_crst_alloc(unsigned long val);
 pte_t *vmem_pte_alloc(void);
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..11983a3ff95a 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -159,6 +159,11 @@ void page_table_free_pgste(struct page *page)
 
 #endif /* CONFIG_PGSTE */
 
+/*
+ * Temporarily use a global spinlock instead of mm->context.lock.
+ * This will be replaced by a per-mm spinlock in a followup commit.
+ */
+static DEFINE_SPINLOCK(mm_pgtable_list_lock);
 /*
  * A 2KB-pgtable is either upper or lower half of a normal page.
  * The second half of the page may be unused or used as another
@@ -172,7 +177,7 @@ void page_table_free_pgste(struct page *page)
  * When a parent page gets fully allocated it contains 2KB-pgtables in both
  * upper and lower halves and is removed from mm_context_t::pgtable_list.
  *
- * When 2KB-pgtable is freed from to fully allocated parent page that
+ * When 2KB-pgtable is freed from the fully allocated parent page that
 * page turns partially allocated and added to mm_context_t::pgtable_list.
 *
 * If 2KB-pgtable is freed from the partially allocated parent page that
@@ -182,16 +187,24 @@ void page_table_free_pgste(struct page *page)
 * As follows from the above, no unallocated or fully allocated parent
 * pages are contained in mm_context_t::pgtable_list.
 *
+ * NOTE NOTE NOTE: The commentary above and below has not yet been updated:
+ * the new rule is that a page is not linked to mm_context_t::pgtable_list
+ * while either half is pending free by any method; but afterwards is
+ * either relinked to it, or freed, by __tlb_remove_table().  This allows
+ * pte_free_defer() to use the page->rcu_head (which overlays page->lru).
+ *
 * The upper byte (bits 24-31) of the parent page _refcount is used
 * for tracking contained 2KB-pgtables and has the following format:
 *
- *   PP  AA
- * 01234567    upper byte (bits 24-31) of struct page::_refcount
- *   ||  ||
- *   ||  |+--- upper 2KB-pgtable is allocated
- *   ||  +---- lower 2KB-pgtable is allocated
- *   |+------- upper 2KB-pgtable is pending for removal
- *   +-------- lower 2KB-pgtable is pending for removal
+ *   PPHHAA
+ * 76543210    upper byte (bits 24-31) of struct page::_refcount
+ *   ||||||
+ *   |||||+--- lower 2KB-pgtable is allocated
+ *   ||||+---- upper 2KB-pgtable is allocated
+ *   |||+----- lower 2KB-pgtable is pending free by page->rcu_head
+ *   ||+------ upper 2KB-pgtable is pending free by page->rcu_head
+ *   |+------- lower 2KB-pgtable is pending free by any method
+ *   +-------- upper 2KB-pgtable is pending free by any method
 *
 * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
 * using _refcount is possible).
@@ -200,7 +213,7 @@ void page_table_free_pgste(struct page *page)
 * The parent page is either:
 *  - added to mm_context_t::pgtable_list in case the second half of the
 *    parent page is still unallocated;
- *  - removed from mm_context_t::pgtable_list in case both hales of the
+ *  - removed from mm_context_t::pgtable_list in case both halves of the
 *    parent page are allocated;
 * These operations are protected with mm_context_t::lock.
 *
@@ -239,32 +252,22 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	/* Try to get a fragment of a 4K page as a 2K page table */
 	if (!mm_alloc_pgste(mm)) {
 		table = NULL;
-		spin_lock_bh(&mm->context.lock);
+		spin_lock_bh(&mm_pgtable_list_lock);
 		if (!list_empty(&mm->context.pgtable_list)) {
 			page = list_first_entry(&mm->context.pgtable_list,
 						struct page, lru);
 			mask = atomic_read(&page->_refcount) >> 24;
-			/*
-			 * The pending removal bits must also be checked.
-			 * Failure to do so might lead to an impossible
-			 * value of (i.e 0x13 or 0x23) written to _refcount.
-			 * Such values violate the assumption that pending and
-			 * allocation bits are mutually exclusive, and the rest
-			 * of the code unrails as result. That could lead to
-			 * a whole bunch of races and corruptions.
-			 */
-			mask = (mask | (mask >> 4)) & 0x03U;
-			if (mask != 0x03U) {
-				table = (unsigned long *) page_to_virt(page);
-				bit = mask & 1;	/* =1 -> second 2K */
-				if (bit)
-					table += PTRS_PER_PTE;
-				atomic_xor_bits(&page->_refcount,
-							0x01U << (bit + 24));
-				list_del(&page->lru);
-			}
+			/* Cannot be on this list if either half pending free */
+			WARN_ON_ONCE(mask & ~0x03U);
+			/* One or other half must be available, but not both */
+			WARN_ON_ONCE(mask == 0x00U || mask == 0x03U);
+			table = (unsigned long *)page_to_virt(page);
+			bit = mask & 0x01U;	/* =1 -> second 2K available */
+			table += bit * PTRS_PER_PTE;
+			atomic_xor_bits(&page->_refcount, 0x01U << (bit + 24));
+			list_del(&page->lru);
 		}
-		spin_unlock_bh(&mm->context.lock);
+		spin_unlock_bh(&mm_pgtable_list_lock);
 		if (table)
 			return table;
 	}
@@ -278,6 +281,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	}
 	arch_set_page_dat(page, 0);
 	/* Initialize page table */
+	page->pt_mm = mm;
 	table = (unsigned long *) page_to_virt(page);
 	if (mm_alloc_pgste(mm)) {
 		/* Return 4K page table with PGSTEs */
@@ -288,14 +292,14 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	/* Return the first 2K fragment of the page */
 	atomic_xor_bits(&page->_refcount, 0x01U << 24);
 	memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
-	spin_lock_bh(&mm->context.lock);
+	spin_lock_bh(&mm_pgtable_list_lock);
 	list_add(&page->lru, &mm->context.pgtable_list);
-	spin_unlock_bh(&mm->context.lock);
+	spin_unlock_bh(&mm_pgtable_list_lock);
 	return table;
 }
 
-static void page_table_release_check(struct page *page, void *table,
+static void page_table_release_check(struct page *page, unsigned long *table,
 				     unsigned int half, unsigned int mask)
 {
 	char msg[128];
@@ -317,21 +321,18 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	if (!mm_alloc_pgste(mm)) {
 		/* Free 2K page table fragment of a 4K page */
 		bit = ((unsigned long) table & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t));
-		spin_lock_bh(&mm->context.lock);
+		spin_lock_bh(&mm_pgtable_list_lock);
 		/*
-		 * Mark the page for delayed release. The actual release
-		 * will happen outside of the critical section from this
-		 * function or from __tlb_remove_table()
+		 * Mark the page for release. The actual release will happen
+		 * below from this function, or later from __tlb_remove_table().
 		 */
-		mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
+		mask = atomic_xor_bits(&page->_refcount, 0x01U << (bit + 24));
 		mask >>= 24;
-		if (mask & 0x03U)
+		if (mask & 0x03U)		/* other half is allocated */
 			list_add(&page->lru, &mm->context.pgtable_list);
-		else
+		else if (!(mask & 0x30U))	/* other half not pending */
 			list_del(&page->lru);
-		spin_unlock_bh(&mm->context.lock);
-		mask = atomic_xor_bits(&page->_refcount, 0x10U << (bit + 24));
-		mask >>= 24;
+		spin_unlock_bh(&mm_pgtable_list_lock);
 		if (mask != 0x00U)
 			return;
 		half = 0x01U << bit;
@@ -362,19 +363,17 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 		return;
 	}
 	bit = ((unsigned long) table & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t));
-	spin_lock_bh(&mm->context.lock);
+	spin_lock_bh(&mm_pgtable_list_lock);
 	/*
-	 * Mark the page for delayed release. The actual release will happen
-	 * outside of the critical section from __tlb_remove_table() or from
-	 * page_table_free()
+	 * Mark the page for delayed release.
+	 * The actual release will happen later, from __tlb_remove_table().
 	 */
 	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
 	mask >>= 24;
-	if (mask & 0x03U)
-		list_add_tail(&page->lru, &mm->context.pgtable_list);
-	else
+	/* Other half not allocated? Other half not already pending free? */
+	if ((mask & 0x03U) == 0x00U && (mask & 0x30U) != 0x30U)
 		list_del(&page->lru);
-	spin_unlock_bh(&mm->context.lock);
+	spin_unlock_bh(&mm_pgtable_list_lock);
 	table = (unsigned long *) ((unsigned long) table | (0x01U << bit));
 	tlb_remove_table(tlb, table);
 }
@@ -382,17 +381,40 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 void __tlb_remove_table(void *_table)
 {
 	unsigned int mask = (unsigned long) _table & 0x03U, half = mask;
-	void *table = (void *)((unsigned long) _table ^ mask);
+	unsigned long *table = (unsigned long *)((unsigned long) _table ^ mask);
 	struct page *page = virt_to_page(table);
 
 	switch (half) {
 	case 0x00U:	/* pmd, pud, or p4d */
-		free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+		__free_pages(page, CRST_ALLOC_ORDER);
 		return;
 	case 0x01U:	/* lower 2K of a 4K page table */
-	case 0x02U:	/* higher 2K of a 4K page table */
-		mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24));
-		mask >>= 24;
+	case 0x02U:	/* upper 2K of a 4K page table */
+		/*
+		 * If the other half is marked as allocated, page->pt_mm must
+		 * still be valid, page->rcu_head no longer in use so page->lru
+		 * good for use, so now make the freed half available for reuse.
+		 * But be wary of races with that other half being freed.
+		 */
+		if (atomic_read(&page->_refcount) & (0x03U << 24)) {
+			struct mm_struct *mm = page->pt_mm;
+			/*
+			 * It is safe to use page->pt_mm when the other half
+			 * is seen allocated while holding pgtable_list lock;
+			 * but how will it be safe to acquire that spinlock?
+			 * Global mm_pgtable_list_lock is safe and easy for
+			 * now, then a followup commit will split it per-mm.
+			 */
+			spin_lock_bh(&mm_pgtable_list_lock);
+			mask = atomic_xor_bits(&page->_refcount, mask << 28);
+			mask >>= 24;
+			if (mask & 0x03U)
+				list_add(&page->lru, &mm->context.pgtable_list);
+			spin_unlock_bh(&mm_pgtable_list_lock);
+		} else {
+			mask = atomic_xor_bits(&page->_refcount, mask << 28);
+			mask >>= 24;
+		}
 		if (mask != 0x00U)
 			return;
 		break;
@@ -407,6 +429,77 @@ void __tlb_remove_table(void *_table)
 	__free_page(page);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now0(struct rcu_head *head);
+static void pte_free_now1(struct rcu_head *head);
+
+static void pte_free_pgste(struct rcu_head *head)
+{
+	unsigned long *table;
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	table = (unsigned long *)page_to_virt(page);
+	table = (unsigned long *)((unsigned long)table | 0x03U);
+	__tlb_remove_table(table);
+}
+
+static void pte_free_half(struct rcu_head *head, unsigned int bit)
+{
+	unsigned long *table;
+	struct page *page;
+	unsigned int mask;
+
+	page = container_of(head, struct page, rcu_head);
+	mask = atomic_xor_bits(&page->_refcount, 0x04U << (bit + 24));
+
+	table = (unsigned long *)page_to_virt(page);
+	table += bit * PTRS_PER_PTE;
+	table = (unsigned long *)((unsigned long)table | (0x01U << bit));
+	__tlb_remove_table(table);
+
+	/* If pte_free_defer() of the other half came in, queue it now */
+	if (mask & 0x0CU)
+		call_rcu(&page->rcu_head, bit ? pte_free_now0 : pte_free_now1);
+}
+
+static void pte_free_now0(struct rcu_head *head)
+{
+	pte_free_half(head, 0);
+}
+
+static void pte_free_now1(struct rcu_head *head)
+{
+	pte_free_half(head, 1);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	unsigned int bit, mask;
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	if (mm_alloc_pgste(mm)) {
+		call_rcu(&page->rcu_head, pte_free_pgste);
+		return;
+	}
+	bit = ((unsigned long)pgtable & ~PAGE_MASK) /
+			(PTRS_PER_PTE * sizeof(pte_t));
+
+	spin_lock_bh(&mm_pgtable_list_lock);
+	mask = atomic_xor_bits(&page->_refcount, 0x15U << (bit + 24));
+	mask >>= 24;
+	/* Other half not allocated? Other half not already pending free? */
+	if ((mask & 0x03U) == 0x00U && (mask & 0x30U) != 0x30U)
+		list_del(&page->lru);
+	spin_unlock_bh(&mm_pgtable_list_lock);
+
+	/* Do not relink on rcu_head if other half already linked on rcu_head */
+	if ((mask & 0x0CU) != 0x0CU)
+		call_rcu(&page->rcu_head, bit ? pte_free_now1 : pte_free_now0);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 /*
  * Base infrastructure required to generate basic asces, region, segment,
  * and page tables that do not make use of enhanced features like EDAT1.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..1667a1bdb8a8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,7 +146,7 @@ struct page {
 		pgtable_t pmd_huge_pte; /* protected by page->ptl */
 		unsigned long _pt_pad_2;	/* mapping */
 		union {
-			struct mm_struct *pt_mm; /* x86 pgds only */
+			struct mm_struct *pt_mm; /* x86 pgd, s390 */
 			atomic_t pt_frag_refcount; /* powerpc */
 		};
 #if ALLOC_SPLIT_PTLOCKS
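To make the PPHHAA encoding concrete, here is a worked trace (illustrative
only, not taken from the patch) of pte_free_defer() on the lower 2K half of
a page whose halves are both allocated, so not on pgtable_list:

	/* upper byte before:  PPHHAA = 000011 (0x03): both halves allocated */
	mask = atomic_xor_bits(&page->_refcount, 0x15U << 24);	/* bit = 0 */
	/* toggles AA-lower (0x01), HH-lower (0x04) and PP-lower (0x10) */
	/* upper byte after:   PPHHAA = 010110 (0x16): lower half now pending
	   free on rcu_head; mask & 0x0CU != 0x0CU, so call_rcu() is queued */

If the upper half is then pte_free_defer()'d before the grace period ends,
its xor of 0x2AU yields 0x3C: both HH bits set, so no second call_rcu() -
instead pte_free_half() for the lower half sees HH-upper still set, and
queues the now-free rcu_head again on behalf of the upper half.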
From patchwork Tue Jun 20 07:53:03 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:53:03 -0700 (PDT)
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <3e5961a2-26e5-d1ab-5c4c-527e273e3cc5@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: F1EE020006 X-Stat-Signature: qko5bnpm16oj9mytcg6n8bzitk5mnpt1 X-Rspam-User: X-HE-Tag: 1687247588-957121 X-HE-Meta: U2FsdGVkX18OsawjSSubkQDhWJpegh7Q8Ii5XgUMaaT0UNRXGau9wIGYVnKsSiWMfcMnr7aUpsCEjCuOhwbg3LHfc4PWI4fw7eldsQpCBL+3ugMGt+d7yQftfYjNRQF5XkLaWD0cuNB3AlZFd65GpQkwGk68ACgR0hvAKc/PoAIPRNHao2YaX8v2QqEhaUY1DKoJdn48VZv1JYFDZvMg3Lrwz3bL6nOZnVzUr4vgUnHPHskNh1na5DOkKqlmgBXNN9AMFJuMLRBuz7KmHF5XNngpLPZ3cRLZOCUaopjM5JdFqJ7U3275MJT6lDgw4y/SglEyvGJP1oGRBCFIbxWEMNEwvAM5D1ElBqzw75q3PC+TNYb7+BNQB8FZA4Kwe7jWpF7QZwuv2i+lmrzWSHs/f/7EJ4fAlDwZbGoCzjSL9rCrhwI3aOaHOMEMySya2qZmm4IHIcKLebvY3zaAkih2tiHGUOj4A5FsTUkNkxWAB1Ip/R7zxWV8wHTlgLmUMHEIBQExPd2GMJYx+J7/xw1ab8DYBPi63grmhZB0yqRF3WZhXJFXXGTXSeB+UDYq9D9kWvn6Z6azhxh09NZBwyCWU+MKGBF5xyzme0go8ul7JhAbECT0ovgdNtUaFXNxbo0mprasZQ27xW/9J+QCOrUXqxGRG4DiY++DDorwTHZDe/ISM1SYM39bbU2qYAl+ovivh9K65QHprLhNq98y0hbsoXO3V9v4+UIRJl64JsutnAV0aGxcQdCKMQqGJ5ed0C0Iaks4BVX+wjCUFo3lQ3msKuIGZwBGBROEDbZLWLznvaUmQj0LYCDR1JqyZnVawZ6+2Hy6OkjPLrAx7iw+LOaDLWyF3NEzjWMrybqcFUDRrVQHEXBp+3Xo+wNBAoQ3tBwKITc9j+l2JhhqnwOp1/spg0kE6/xOmP9MaQe0JRiL9RiXuh/jXv4YYrQIWblMTwvVg71W54PduRvK5TPIYOp UVFNFKFK cKqsL1Yv8Z5qwsgM9V1/etgGhpQF8jnpVaDYqg/c1aT9GdjZIKbzmWaFTil83MoigL1PoZKvGu8sPwUZio6IwzkKhuf6q5+KSvOPHcej9SQEaaDYCWtmvCH4q7IvpwM98LpXsWJda+1wBUbXmV57E/lFCni7hFR9BZVDwA3tTQ5BoWO/kFvjO1ICazPC8aVVCj4uGwONEa77Mg40j5+9u7nbERi9sfXs3cUS0wXzk4DCt/YjtcRiaW70WwvhsZN4JwYlXyQRWR8jbjUmv1XdPoTcMIY/C4KFkcR9jGUxa0mdVdTF3pFIHhPvYgAfZ+H3jEVPvJEks1T5isLdDW7RH/YohMlF3mwvEqBsWt+DvjyxQqDKV3GWZ2JQoyFWyUC/bCWgsxARyRPXnCfQerRHAn/lbZmozlGmBZ9EV2zoLane38/SwK+bxAdV8Y5webdmzCsahyOfPbNeeJWd8+wvAmh/Pd5LenuOH309MGxGM3rWLv7VGFGLx9OULsSGEywTjApa/IpsYQUkanLS2KUJcCIHRq3T/mn55R6d9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add the generic pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This version suits all those architectures which use an unfragmented page for one page table (none of whose pte_free()s use the mm arg which was passed to it). 
Signed-off-by: Hugh Dickins
---
 include/linux/mm_types.h |  4 ++++
 include/linux/pgtable.h  |  2 ++
 mm/pgtable-generic.c     | 20 ++++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1667a1bdb8a8..09335fa28c41 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -144,6 +144,10 @@ struct page {
 		struct {	/* Page table pages */
 			unsigned long _pt_pad_1;	/* compound_head */
 			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+			/*
+			 * A PTE page table page might be freed by use of
+			 * rcu_head: which overlays those two fields above.
+			 */
 			unsigned long _pt_pad_2;	/* mapping */
 			union {
 				struct mm_struct *pt_mm; /* x86 pgd, s390 */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 525f1782b466..d18d3e963967 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
 }
 #endif
 
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 5e85a625ab30..ab3741064bb8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -13,6 +13,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mm_inline.h>
+#include <asm/pgalloc.h>
 #include <asm/tlb.h>
 
 /*
@@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	return pmd;
 }
 #endif
+
+/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
+#ifndef pte_free_defer
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = pgtable;
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* pte_free_defer */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \
From patchwork Tue Jun 20 07:54:56 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:54:56 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Simplify shmem and file THP collapse's retract_page_tables(), and relax
its locking: to improve its success rate and to lessen impact on others.
Instead of its MADV_COLLAPSE case doing set_huge_pmd() at target_addr of
target_mm, leave that part of the work to madvise_collapse() calling
collapse_pte_mapped_thp() afterwards: just adjust collapse_file()'s result
code to arrange for that.  That spares retract_page_tables() four
arguments; and since it will be successful in retracting all of the page
tables expected of it, no need to track and return a result code itself.

It needs i_mmap_lock_read(mapping) for traversing the vma interval tree,
but it does not need i_mmap_lock_write() for that: page_vma_mapped_walk()
allows for pte_offset_map_lock() etc to fail, and uses pmd_lock() for
THPs.  retract_page_tables() just needs to use those same spinlocks to
exclude it briefly, while transitioning pmd from page table to none: so
restore its use of pmd_lock() inside of which pte lock is nested.

Users of pte_offset_map_lock() etc all now allow for them to fail: so
retract_page_tables() now has no use for mmap_write_trylock() or
vma_try_start_write().  In common with rmap and page_vma_mapped_walk(), it
does not even need the mmap_read_lock().

But those users do expect the page table to remain a good page table,
until they unlock and rcu_read_unlock(): so the page table cannot be freed
immediately, but rather by the recently added pte_free_defer().

Use the (usually a no-op) pmdp_get_lockless_sync() to send an interrupt
when PAE, and pmdp_collapse_flush() did not already do so: to make sure
that the start,pmdp_get_lockless(),end sequence in __pte_offset_map()
cannot pick up a pmd entry with mismatched pmd_low and pmd_high.

retract_page_tables() can be enhanced to replace_page_tables(), which
inserts the final huge pmd without mmap lock: going through an invalid
state instead of pmd_none() followed by fault.  But that enhancement does
raise some more questions: leave it until a later release.

Signed-off-by: Hugh Dickins
---
 mm/khugepaged.c | 184 ++++++++++++++++++++----------------------------
 1 file changed, 75 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1083f0e38a07..f7a0f7673127 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1617,9 +1617,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		break;
 	case SCAN_PMD_NONE:
 		/*
-		 * In MADV_COLLAPSE path, possible race with khugepaged where
-		 * all pte entries have been removed and pmd cleared.  If so,
-		 * skip all the pte checks and just update the pmd mapping.
+		 * All pte entries have been removed and pmd cleared.
+		 * Skip all the pte checks and just update the pmd mapping.
 		 */
 		goto maybe_install_pmd;
 	default:
@@ -1748,123 +1747,88 @@ static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
 	mmap_write_unlock(mm);
 }
 
-static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
-			       struct mm_struct *target_mm,
-			       unsigned long target_addr, struct page *hpage,
-			       struct collapse_control *cc)
+static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
-	int target_result = SCAN_FAIL;
 
-	i_mmap_lock_write(mapping);
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
-		int result = SCAN_FAIL;
-		struct mm_struct *mm = NULL;
-		unsigned long addr = 0;
-		pmd_t *pmd;
-		bool is_target = false;
+		struct mmu_notifier_range range;
+		struct mm_struct *mm;
+		unsigned long addr;
+		pmd_t *pmd, pgt_pmd;
+		spinlock_t *pml;
+		spinlock_t *ptl;
+		bool skipped_uffd = false;
 
 		/*
 		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
-		 * got written to. These VMAs are likely not worth investing
-		 * mmap_write_lock(mm) as PMD-mapping is likely to be split
-		 * later.
-		 *
-		 * Note that vma->anon_vma check is racy: it can be set up after
-		 * the check but before we took mmap_lock by the fault path.
-		 * But page lock would prevent establishing any new ptes of the
-		 * page, so we are safe.
-		 *
-		 * An alternative would be drop the check, but check that page
-		 * table is clear before calling pmdp_collapse_flush() under
-		 * ptl. It has higher chance to recover THP for the VMA, but
-		 * has higher cost too. It would also probably require locking
-		 * the anon_vma.
+		 * got written to. These VMAs are likely not worth removing
+		 * page tables from, as PMD-mapping is likely to be split later.
 		 */
-		if (READ_ONCE(vma->anon_vma)) {
-			result = SCAN_PAGE_ANON;
-			goto next;
-		}
+		if (READ_ONCE(vma->anon_vma))
+			continue;
+
 		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		if (addr & ~HPAGE_PMD_MASK ||
-		    vma->vm_end < addr + HPAGE_PMD_SIZE) {
-			result = SCAN_VMA_CHECK;
-			goto next;
-		}
-		mm = vma->vm_mm;
-		is_target = mm == target_mm && addr == target_addr;
-		result = find_pmd_or_thp_or_none(mm, addr, &pmd);
-		if (result != SCAN_SUCCEED)
-			goto next;
-		/*
-		 * We need exclusive mmap_lock to retract page table.
-		 *
-		 * We use trylock due to lock inversion: we need to acquire
-		 * mmap_lock while holding page lock. Fault path does it in
-		 * reverse order. Trylock is a way to avoid deadlock.
-		 *
-		 * Also, it's not MADV_COLLAPSE's job to collapse other
-		 * mappings - let khugepaged take care of them later.
-		 */
-		result = SCAN_PTE_MAPPED_HUGEPAGE;
-		if ((cc->is_khugepaged || is_target) &&
-		    mmap_write_trylock(mm)) {
-			/* trylock for the same lock inversion as above */
-			if (!vma_try_start_write(vma))
-				goto unlock_next;
-
-			/*
-			 * Re-check whether we have an ->anon_vma, because
-			 * collapse_and_free_pmd() requires that either no
-			 * ->anon_vma exists or the anon_vma is locked.
-			 * We already checked ->anon_vma above, but that check
-			 * is racy because ->anon_vma can be populated under the
-			 * mmap lock in read mode.
-			 */
-			if (vma->anon_vma) {
-				result = SCAN_PAGE_ANON;
-				goto unlock_next;
-			}
-			/*
-			 * When a vma is registered with uffd-wp, we can't
-			 * recycle the pmd pgtable because there can be pte
-			 * markers installed. Skip it only, so the rest mm/vma
-			 * can still have the same file mapped hugely, however
-			 * it'll always mapped in small page size for uffd-wp
-			 * registered ranges.
-			 */
-			if (hpage_collapse_test_exit(mm)) {
-				result = SCAN_ANY_PROCESS;
-				goto unlock_next;
-			}
-			if (userfaultfd_wp(vma)) {
-				result = SCAN_PTE_UFFD_WP;
-				goto unlock_next;
-			}
-			collapse_and_free_pmd(mm, vma, addr, pmd);
-			if (!cc->is_khugepaged && is_target)
-				result = set_huge_pmd(vma, addr, pmd, hpage);
-			else
-				result = SCAN_SUCCEED;
-
-unlock_next:
-			mmap_write_unlock(mm);
-			goto next;
-		}
-		/*
-		 * Calling context will handle target mm/addr. Otherwise, let
-		 * khugepaged try again later.
-		 */
-		if (!is_target) {
-			khugepaged_add_pte_mapped_thp(mm, addr);
+		    vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
+
+		mm = vma->vm_mm;
+		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
+			continue;
+
+		if (hpage_collapse_test_exit(mm))
+			continue;
+		/*
+		 * When a vma is registered with uffd-wp, we cannot recycle
+		 * the page table because there may be pte markers installed.
+		 * Other vmas can still have the same file mapped hugely, but
+		 * skip this one: it will always be mapped in small page size
+		 * for uffd-wp registered ranges.
+		 */
+		if (userfaultfd_wp(vma))
+			continue;
+
+		/* PTEs were notified when unmapped; but now for the PMD? */
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+					addr, addr + HPAGE_PMD_SIZE);
+		mmu_notifier_invalidate_range_start(&range);
+
+		pml = pmd_lock(mm, pmd);
+		ptl = pte_lockptr(mm, pmd);
+		if (ptl != pml)
+			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * Huge page lock is still held, so normally the page table
+		 * must remain empty; and we have already skipped anon_vma
+		 * and userfaultfd_wp() vmas.  But since the mmap_lock is not
+		 * held, it is still possible for a racing userfaultfd_ioctl()
+		 * to have inserted ptes or markers.  Now that we hold ptlock,
+		 * repeating the anon_vma check protects from one category,
+		 * and repeating the userfaultfd_wp() check from another.
+		 */
+		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
+			skipped_uffd = true;
+		} else {
+			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
+			pmdp_get_lockless_sync();
+		}
+
+		if (ptl != pml)
+			spin_unlock(ptl);
+		spin_unlock(pml);
+
+		mmu_notifier_invalidate_range_end(&range);
+
+		if (!skipped_uffd) {
+			mm_dec_nr_ptes(mm);
+			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
+			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
 		}
-next:
-		if (is_target)
-			target_result = result;
 	}
-	i_mmap_unlock_write(mapping);
-	return target_result;
+	i_mmap_unlock_read(mapping);
 }
 
 /**
@@ -2261,9 +2225,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
+	 * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
 	 */
-	result = retract_page_tables(mapping, start, mm, addr, hpage,
-				     cc);
+	retract_page_tables(mapping, start);
+	if (cc && !cc->is_khugepaged)
+		result = SCAN_PTE_MAPPED_HUGEPAGE;
 	unlock_page(hpage);
 
 	/*
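As background for that pmdp_get_lockless_sync() call, here is a sketch of
the PAE race it closes (illustrative only: the reader shown is modelled on
__pte_offset_map(), and it assumes that on PAE pmdp_get_lockless_start()
and pmdp_get_lockless_end() disable and re-enable local interrupts):

	/* Hypothetical lockless reader, as in __pte_offset_map(): */
	pmdp_get_lockless_start();		/* irqs off on this CPU */
	pmdval = pmdp_get_lockless(pmd);	/* reads pmd_low, then pmd_high */
	pmdp_get_lockless_end();		/* irqs back on */

pmdp_collapse_flush() may write the two 32-bit halves non-atomically; the
interrupt sent by pmdp_get_lockless_sync() cannot be serviced until such a
reader re-enables interrupts, so waiting for it guarantees that no CPU is
still midway through the low/high pair of reads.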
From patchwork Tue Jun 20 07:56:31 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13285285
From: Hugh Dickins <hughd@google.com>
Date: Tue, 20 Jun 2023 00:56:31 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().
It does need mmap_read_lock(), but it does not need mmap_write_lock(),
nor vma_start_write() nor i_mmap lock nor anon_vma lock.  All racing
paths rely on pte_offset_map_lock() and pmd_lock(), so use those.

Follow the pattern in retract_page_tables(); and using pte_free_defer()
removes most of the need for tlb_remove_table_sync_one() here; but call
pmdp_get_lockless_sync() to use it in the PAE case.

First check the VMA, in case page tables are being torn down: suggested
by Jann Horn.  Confirm the preliminary find_pmd_or_thp_or_none() once
page lock has been acquired and the page looks suitable: from then on
its state is stable.

However, collapse_pte_mapped_thp() was doing something others don't:
freeing a page table still containing "valid" entries.  i_mmap lock did
stop a racing truncate from double-freeing those pages, but we prefer
collapse_pte_mapped_thp() to clear the entries as usual.  Their TLB
flush can wait until the pmdp_collapse_flush() which follows, but the
mmu_notifier_invalidate_range_start() has to be done earlier.

Do the "step 1" checking loop without mmu_notifier: it wouldn't be good
for khugepaged to keep on repeatedly invalidating a range which is then
found unsuitable (e.g. because it contains COWs).  "step 2", which does
the clearing, must then be more careful (after dropping ptl to do
mmu_notifier), with abort prepared to correct the accounting like
"step 3".  But with those entries now cleared, "step 4" (after dropping
ptl to do pmd_lock) is kept safe by the huge page lock, which stops new
PTEs from being faulted in.

Signed-off-by: Hugh Dickins <hughd@google.com>
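The four steps described above give the function this shape (a condensed
sketch of the code in the diff below, not the complete function; abort
paths and per-PTE checks abbreviated):

        /* Sketch only: shape of collapse_pte_mapped_thp() after this patch */
        start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
        /* step 1: check all mapped PTEs point into hpage; no notifier yet */
        pte_unmap_unlock(start_pte, ptl);

        mmu_notifier_invalidate_range_start(&range); /* only once step 1 passed */
        start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
        /* step 2: re-check each PTE, pte_clear() it, page_remove_rmap() */
        pte_unmap_unlock(start_pte, ptl);

        /* step 3: page_ref_sub(hpage, nr_ptes); adjust mm_counter_file */

        pml = pmd_lock(mm, pmd);                /* step 4: remove page table */
        if (ptl != pml)
                spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
        pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
        pmdp_get_lockless_sync();
        if (ptl != pml)
                spin_unlock(ptl);
        spin_unlock(pml);
        mmu_notifier_invalidate_range_end(&range);
        pte_free_defer(mm, pmd_pgtable(pgt_pmd)); /* RCU-deferred free */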
---
 mm/khugepaged.c | 172 ++++++++++++++++++++++--------------------------
 1 file changed, 77 insertions(+), 95 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f7a0f7673127..060ac8789a1e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1485,7 +1485,7 @@ static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
         return ret;
 }
 
-/* hpage must be locked, and mmap_lock must be held in write */
+/* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
                         pmd_t *pmdp, struct page *hpage)
 {
@@ -1497,7 +1497,7 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
         };
 
         VM_BUG_ON(!PageTransHuge(hpage));
-        mmap_assert_write_locked(vma->vm_mm);
+        mmap_assert_locked(vma->vm_mm);
 
         if (do_set_pmd(&vmf, hpage))
                 return SCAN_FAIL;
@@ -1506,48 +1506,6 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
         return SCAN_SUCCEED;
 }
 
-/*
- * A note about locking:
- * Trying to take the page table spinlocks would be useless here because those
- * are only used to synchronize:
- *
- *  - modifying terminal entries (ones that point to a data page, not to another
- *    page table)
- *  - installing *new* non-terminal entries
- *
- * Instead, we need roughly the same kind of protection as free_pgtables() or
- * mm_take_all_locks() (but only for a single VMA):
- * The mmap lock together with this VMA's rmap locks covers all paths towards
- * the page table entries we're messing with here, except for hardware page
- * table walks and lockless_pages_from_mm().
- */
-static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
-                                  unsigned long addr, pmd_t *pmdp)
-{
-        pmd_t pmd;
-        struct mmu_notifier_range range;
-
-        mmap_assert_write_locked(mm);
-        if (vma->vm_file)
-                lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem);
-        /*
-         * All anon_vmas attached to the VMA have the same root and are
-         * therefore locked by the same lock.
-         */
-        if (vma->anon_vma)
-                lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
-
-        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
-                                addr + HPAGE_PMD_SIZE);
-        mmu_notifier_invalidate_range_start(&range);
-        pmd = pmdp_collapse_flush(vma, addr, pmdp);
-        tlb_remove_table_sync_one();
-        mmu_notifier_invalidate_range_end(&range);
-        mm_dec_nr_ptes(mm);
-        page_table_check_pte_clear_range(mm, addr, pmd);
-        pte_free(mm, pmd_pgtable(pmd));
-}
-
 /**
  * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at
  * address haddr.
@@ -1563,26 +1521,29 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v
 int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
                             bool install_pmd)
 {
+        struct mmu_notifier_range range;
+        bool notified = false;
         unsigned long haddr = addr & HPAGE_PMD_MASK;
         struct vm_area_struct *vma = vma_lookup(mm, haddr);
         struct page *hpage;
         pte_t *start_pte, *pte;
-        pmd_t *pmd;
-        spinlock_t *ptl;
-        int count = 0, result = SCAN_FAIL;
+        pmd_t *pmd, pgt_pmd;
+        spinlock_t *pml, *ptl;
+        int nr_ptes = 0, result = SCAN_FAIL;
         int i;
 
-        mmap_assert_write_locked(mm);
+        mmap_assert_locked(mm);
+
+        /* First check VMA found, in case page tables are being torn down */
+        if (!vma || !vma->vm_file ||
+            !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
+                return SCAN_VMA_CHECK;
 
         /* Fast check before locking page if already PMD-mapped */
         result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
         if (result == SCAN_PMD_MAPPED)
                 return result;
 
-        if (!vma || !vma->vm_file ||
-            !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
-                return SCAN_VMA_CHECK;
-
         /*
         * If we are here, we've succeeded in replacing all the native pages
         * in the page cache with a single hugepage. If a mm were to fault-in
@@ -1612,6 +1573,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
                 goto drop_hpage;
         }
 
+        result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
         switch (result) {
         case SCAN_SUCCEED:
                 break;
@@ -1625,27 +1587,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
                 goto drop_hpage;
         }
 
-        /* Lock the vma before taking i_mmap and page table locks */
-        vma_start_write(vma);
-
-        /*
-         * We need to lock the mapping so that from here on, only GUP-fast and
-         * hardware page walks can access the parts of the page tables that
-         * we're operating on.
-         * See collapse_and_free_pmd().
-         */
-        i_mmap_lock_write(vma->vm_file->f_mapping);
-
-        /*
-         * This spinlock should be unnecessary: Nobody else should be accessing
-         * the page tables under spinlock protection here, only
-         * lockless_pages_from_mm() and the hardware page walker can access page
-         * tables while all the high-level locks are held in write mode.
-         */
         result = SCAN_FAIL;
         start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
-        if (!start_pte)
-                goto drop_immap;
+        if (!start_pte)         /* mmap_lock + page lock should prevent this */
+                goto drop_hpage;
 
         /* step 1: check all mapped PTEs are to the right huge page */
         for (i = 0, addr = haddr, pte = start_pte;
@@ -1671,57 +1616,94 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
         */
                 if (hpage + i != page)
                         goto abort;
-                count++;
         }
 
-        /* step 2: adjust rmap */
+        pte_unmap_unlock(start_pte, ptl);
+        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+                                haddr, haddr + HPAGE_PMD_SIZE);
+        mmu_notifier_invalidate_range_start(&range);
+        notified = true;
+        start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
+        if (!start_pte)         /* mmap_lock + page lock should prevent this */
+                goto abort;
+
+        /* step 2: clear page table and adjust rmap */
         for (i = 0, addr = haddr, pte = start_pte;
              i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
                 struct page *page;
 
                 if (pte_none(*pte))
                         continue;
-                page = vm_normal_page(vma, addr, *pte);
-                if (WARN_ON_ONCE(page && is_zone_device_page(page)))
+                /*
+                 * We dropped ptl after the first scan, to do the mmu_notifier:
+                 * page lock stops more PTEs of the hpage being faulted in, but
+                 * does not stop write faults COWing anon copies from existing
+                 * PTEs; and does not stop those being swapped out or migrated.
+                 */
+                if (!pte_present(*pte)) {
+                        result = SCAN_PTE_NON_PRESENT;
                         goto abort;
+                }
+                page = vm_normal_page(vma, addr, *pte);
+                if (hpage + i != page)
+                        goto abort;
+
+                /*
+                 * Must clear entry, or a racing truncate may re-remove it.
+                 * TLB flush can be left until pmdp_collapse_flush() does it.
+                 * PTE dirty? Shmem page is already dirty; file is read-only.
+                 */
+                pte_clear(mm, addr, pte);
                 page_remove_rmap(page, vma, false);
+                nr_ptes++;
         }
 
         pte_unmap_unlock(start_pte, ptl);
 
         /* step 3: set proper refcount and mm_counters. */
-        if (count) {
-                page_ref_sub(hpage, count);
-                add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
+        if (nr_ptes) {
+                page_ref_sub(hpage, nr_ptes);
+                add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
         }
 
-        /* step 4: remove pte entries */
-        /* we make no change to anon, but protect concurrent anon page lookup */
-        if (vma->anon_vma)
-                anon_vma_lock_write(vma->anon_vma);
+        /* step 4: remove page table */
 
-        collapse_and_free_pmd(mm, vma, haddr, pmd);
+        /* Huge page lock is still held, so page table must remain empty */
+        pml = pmd_lock(mm, pmd);
+        if (ptl != pml)
+                spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+        pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+        pmdp_get_lockless_sync();
+        if (ptl != pml)
+                spin_unlock(ptl);
+        spin_unlock(pml);
 
-        if (vma->anon_vma)
-                anon_vma_unlock_write(vma->anon_vma);
-        i_mmap_unlock_write(vma->vm_file->f_mapping);
+        mmu_notifier_invalidate_range_end(&range);
+
+        mm_dec_nr_ptes(mm);
+        page_table_check_pte_clear_range(mm, haddr, pgt_pmd);
+        pte_free_defer(mm, pmd_pgtable(pgt_pmd));
 
 maybe_install_pmd:
         /* step 5: install pmd entry */
         result = install_pmd
                         ? set_huge_pmd(vma, haddr, pmd, hpage)
                         : SCAN_SUCCEED;
-
+        goto drop_hpage;
+abort:
+        if (nr_ptes) {
+                flush_tlb_mm(mm);
+                page_ref_sub(hpage, nr_ptes);
+                add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
+        }
+        if (start_pte)
+                pte_unmap_unlock(start_pte, ptl);
+        if (notified)
+                mmu_notifier_invalidate_range_end(&range);
 drop_hpage:
         unlock_page(hpage);
         put_page(hpage);
         return result;
-
-abort:
-        pte_unmap_unlock(start_pte, ptl);
-drop_immap:
-        i_mmap_unlock_write(vma->vm_file->f_mapping);
-        goto drop_hpage;
 }
 
 static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
@@ -2857,9 +2839,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
                 case SCAN_PTE_MAPPED_HUGEPAGE:
                         BUG_ON(mmap_locked);
                         BUG_ON(*prev);
-                        mmap_write_lock(mm);
+                        mmap_read_lock(mm);
                         result = collapse_pte_mapped_thp(mm, addr, true);
-                        mmap_write_unlock(mm);
+                        mmap_locked = true;
                         goto handle_result;
                 /* Whitelisted set of results where continuing OK */
                 case SCAN_PMD_NULL:
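The madvise_collapse() hunk at the end of the diff is worth reading in
isolation (a condensed sketch of that hunk; the surrounding cases omitted):

        case SCAN_PTE_MAPPED_HUGEPAGE:
                mmap_read_lock(mm);     /* was mmap_write_lock(mm) */
                result = collapse_pte_mapped_thp(mm, addr, true);
                mmap_locked = true;     /* lock is handed back, not dropped */
                goto handle_result;

Note that the lock is deliberately left held: setting mmap_locked rather
than unlocking lets madvise_collapse()'s common exit path account for it,
presumably because the caller expects mmap_lock to be held on return.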
From patchwork Tue Jun 20 07:58:07 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13285286
From: Hugh Dickins <hughd@google.com>
Date: Tue, 20 Jun 2023 00:58:07 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
Message-ID: <90cd6860-eb92-db66-9a8-5fa7b494a10@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Now that retract_page_tables() can retract page tables reliably, without
depending on trylocks, delete all the apparatus for khugepaged to try
again later: khugepaged_collapse_pte_mapped_thps() etc; and free up the
per-mm memory which was set aside for that in the khugepaged_mm_slot.

But one part of that is worth keeping: when hpage_collapse_scan_file()
found SCAN_PTE_MAPPED_HUGEPAGE, that address was noted in the mm_slot to
be tried for retraction later - catching, for example, page tables where
a reversible mprotect() of a portion had required splitting the pmd, but
now it can be recollapsed.  Call collapse_pte_mapped_thp() directly in
this case (why was it deferred before? I assume an issue with needing
mmap_lock for write, but now it's only needed for read).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/khugepaged.c | 125 +++++++----------------------------------------
 1 file changed, 16 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 060ac8789a1e..06c659e6a89e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,8 +92,6 @@ static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __read_mostly;
 
-#define MAX_PTE_MAPPED_THP 8
-
 struct collapse_control {
         bool is_khugepaged;
 
@@ -107,15 +105,9 @@ struct collapse_control {
 /**
  * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
  * @slot: hash lookup from mm to mm_slot
- * @nr_pte_mapped_thp: number of pte mapped THP
- * @pte_mapped_thp: address array corresponding pte mapped THP
  */
 struct khugepaged_mm_slot {
         struct mm_slot slot;
-
-        /* pte-mapped THP in this mm */
-        int nr_pte_mapped_thp;
-        unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
 };
 
 /**
@@ -1441,50 +1433,6 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
 }
 
 #ifdef CONFIG_SHMEM
-/*
- * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
- * khugepaged should try to collapse the page table.
- *
- * Note that following race exists:
- * (1) khugepaged calls khugepaged_collapse_pte_mapped_thps() for mm_struct A,
- *     emptying the A's ->pte_mapped_thp[] array.
- * (2) MADV_COLLAPSE collapses some file extent with target mm_struct B, and
- *     retract_page_tables() finds a VMA in mm_struct A mapping the same extent
- *     (at virtual address X) and adds an entry (for X) into mm_struct A's
- *     ->pte-mapped_thp[] array.
- * (3) khugepaged calls khugepaged_collapse_scan_file() for mm_struct A at X,
- *     sees a pte-mapped THP (SCAN_PTE_MAPPED_HUGEPAGE) and adds an entry
- *     (for X) into mm_struct A's ->pte-mapped_thp[] array.
- * Thus, it's possible the same address is added multiple times for the same
- * mm_struct.  Should this happen, we'll simply attempt
- * collapse_pte_mapped_thp() multiple times for the same address, under the same
- * exclusive mmap_lock, and assuming the first call is successful, subsequent
- * attempts will return quickly (without grabbing any additional locks) when
- * a huge pmd is found in find_pmd_or_thp_or_none().  Since this is a cheap
- * check, and since this is a rare occurrence, the cost of preventing this
- * "multiple-add" is thought to be more expensive than just handling it, should
- * it occur.
- */
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-                                          unsigned long addr)
-{
-        struct khugepaged_mm_slot *mm_slot;
-        struct mm_slot *slot;
-        bool ret = false;
-
-        VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
-
-        spin_lock(&khugepaged_mm_lock);
-        slot = mm_slot_lookup(mm_slots_hash, mm);
-        mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
-        if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
-                mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
-                ret = true;
-        }
-        spin_unlock(&khugepaged_mm_lock);
-        return ret;
-}
-
 /* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
                         pmd_t *pmdp, struct page *hpage)
@@ -1706,29 +1654,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
         return result;
 }
 
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-        struct mm_slot *slot = &mm_slot->slot;
-        struct mm_struct *mm = slot->mm;
-        int i;
-
-        if (likely(mm_slot->nr_pte_mapped_thp == 0))
-                return;
-
-        if (!mmap_write_trylock(mm))
-                return;
-
-        if (unlikely(hpage_collapse_test_exit(mm)))
-                goto out;
-
-        for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
-                collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false);
-
-out:
-        mm_slot->nr_pte_mapped_thp = 0;
-        mmap_write_unlock(mm);
-}
-
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
         struct vm_area_struct *vma;
@@ -2372,16 +2297,6 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 {
         BUILD_BUG();
 }
-
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-}
-
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-                                          unsigned long addr)
-{
-        return false;
-}
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
@@ -2411,7 +2326,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
                 khugepaged_scan.mm_slot = mm_slot;
         }
         spin_unlock(&khugepaged_mm_lock);
-        khugepaged_collapse_pte_mapped_thps(mm_slot);
 
         mm = slot->mm;
         /*
@@ -2464,36 +2378,29 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
                                                 khugepaged_scan.address);
 
                                 mmap_read_unlock(mm);
-                                *result = hpage_collapse_scan_file(mm,
-                                                khugepaged_scan.address,
-                                                file, pgoff, cc);
                                 mmap_locked = false;
+                                *result = hpage_collapse_scan_file(mm,
+                                        khugepaged_scan.address, file, pgoff, cc);
+                                if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
+                                        mmap_read_lock(mm);
+                                        mmap_locked = true;
+                                        if (hpage_collapse_test_exit(mm)) {
+                                                fput(file);
+                                                goto breakouterloop;
+                                        }
+                                        *result = collapse_pte_mapped_thp(mm,
+                                                khugepaged_scan.address, false);
+                                        if (*result == SCAN_PMD_MAPPED)
+                                                *result = SCAN_SUCCEED;
+                                }
                                 fput(file);
                         } else {
                                 *result = hpage_collapse_scan_pmd(mm, vma,
-                                                khugepaged_scan.address,
-                                                &mmap_locked,
-                                                cc);
+                                        khugepaged_scan.address, &mmap_locked, cc);
                         }
-                        switch (*result) {
-                        case SCAN_PTE_MAPPED_HUGEPAGE: {
-                                pmd_t *pmd;
-                                *result = find_pmd_or_thp_or_none(mm,
-                                                khugepaged_scan.address,
-                                                &pmd);
-                                if (*result != SCAN_SUCCEED)
-                                        break;
-                                if (!khugepaged_add_pte_mapped_thp(mm,
-                                                khugepaged_scan.address))
-                                        break;
-                        } fallthrough;
-                        case SCAN_SUCCEED:
+
+                        if (*result == SCAN_SUCCEED)
                                 ++khugepaged_pages_collapsed;
-                                break;
-                        default:
-                                break;
-                        }
 
                         /* move to next address */
                         khugepaged_scan.address += HPAGE_PMD_SIZE;
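The replacement for the deferred retry is visible at the bottom of the diff:
when file scanning reports SCAN_PTE_MAPPED_HUGEPAGE, khugepaged now collapses
the page table on the spot.  A condensed sketch of that hunk (fput() and
loop-exit handling abbreviated, and khugepaged_scan.address shortened to addr):

        *result = hpage_collapse_scan_file(mm, addr, file, pgoff, cc);
        if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
                mmap_read_lock(mm);     /* read lock is now enough */
                mmap_locked = true;
                if (!hpage_collapse_test_exit(mm))
                        *result = collapse_pte_mapped_thp(mm, addr, false);
                if (*result == SCAN_PMD_MAPPED)
                        *result = SCAN_SUCCEED; /* already huge: treat as success */
        }

Since collapse_pte_mapped_thp() is cheap when find_pmd_or_thp_or_none() finds
a huge pmd already installed, retrying immediately costs little, and the whole
pte_mapped_thp[] bookkeeping can go.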
From patchwork Tue Jun 20 07:59:48 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13285287
From: Hugh Dickins <hughd@google.com>
Date: Tue, 20 Jun 2023 00:59:48 -0700 (PDT)
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write()
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
Message-ID: <27505a8-e717-61ce-ab70-5f79d9bf646b@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

mmap_write_trylock() and vma_try_start_write() were added just for
khugepaged, but khugepaged now has no use for them: delete.
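For contrast, a condensed sketch of the old and new caller patterns (the old
one from khugepaged_collapse_pte_mapped_thps(), deleted in the previous patch;
not current code):

        /* Old retry path: retraction only happened if the write lock was free */
        if (!mmap_write_trylock(mm))
                return;                 /* give up, retry on a later scan */

        /* New pattern: callers simply take the read lock and call in */
        mmap_read_lock(mm);
        result = collapse_pte_mapped_thp(mm, addr, false);
        mmap_read_unlock(mm);

With no trylock users left, the helpers below (and their lock-tracing
wrappers) are dead code.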
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h        | 17 -----------------
 include/linux/mmap_lock.h | 10 ----------
 2 files changed, 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3c2e56980853..9b24f8fbf899 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -690,21 +690,6 @@ static inline void vma_start_write(struct vm_area_struct *vma)
         up_write(&vma->vm_lock->lock);
 }
 
-static inline bool vma_try_start_write(struct vm_area_struct *vma)
-{
-        int mm_lock_seq;
-
-        if (__is_vma_write_locked(vma, &mm_lock_seq))
-                return true;
-
-        if (!down_write_trylock(&vma->vm_lock->lock))
-                return false;
-
-        vma->vm_lock_seq = mm_lock_seq;
-        up_write(&vma->vm_lock->lock);
-        return true;
-}
-
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
         int mm_lock_seq;
@@ -730,8 +715,6 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
         { return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
-static inline bool vma_try_start_write(struct vm_area_struct *vma)
-        { return true; }
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
                                      bool detached) {}
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index aab8f1b28d26..d1191f02c7fa 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -112,16 +112,6 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm)
         return ret;
 }
 
-static inline bool mmap_write_trylock(struct mm_struct *mm)
-{
-        bool ret;
-
-        __mmap_lock_trace_start_locking(mm, true);
-        ret = down_write_trylock(&mm->mmap_lock) != 0;
-        __mmap_lock_trace_acquire_returned(mm, true, ret);
-        return ret;
-}
-
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
         __mmap_lock_trace_released(mm, true);