From patchwork Wed Jul 17 22:02:14 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735864
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador,
    linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams,
    Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
    Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple,
    linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
    Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 1/6] mm/treewide: Remove pgd_devmap()
Date: Wed, 17 Jul 2024 18:02:14 -0400
Message-ID: <20240717220219.3743374-2-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

pgd_devmap() is always 0 for all archs, and there's no sign of even
supporting p4d entries in the near future. Remove it until it's needed
for real.
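As the hunks below show, every architecture's definition was the same no-op
stub, and the only fast-GUP caller merely asserted it via BUILD_BUG_ON(), so
dropping both changes nothing at runtime:

    /* The per-arch stub removed by this patch, shown here for reference. */
    static inline int pgd_devmap(pgd_t pgd)
    {
            return 0;
    }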
Signed-off-by: Peter Xu
---
 arch/arm64/include/asm/pgtable.h             | 5 -----
 arch/powerpc/include/asm/book3s/64/pgtable.h | 5 -----
 arch/x86/include/asm/pgtable.h               | 5 -----
 include/linux/pgtable.h                      | 4 ----
 mm/gup.c                                     | 2 --
 5 files changed, 21 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8efbc128446..5d5d1b18b837 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1119,11 +1119,6 @@ static inline int pud_devmap(pud_t pud)
 {
 	return 0;
 }
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 5da92ba68a45..051b1b6d729c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1431,11 +1431,6 @@ static inline int pud_devmap(pud_t pud)
 {
 	return pte_devmap(pud_pte(pud));
 }
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 701593c53f3b..0d234f48ceeb 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -311,11 +311,6 @@ static inline int pud_devmap(pud_t pud)
 	return 0;
 }
 #endif
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2289e9f7aa1b..0a904300ac90 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1626,10 +1626,6 @@ static inline int pud_devmap(pud_t pud)
 {
 	return 0;
 }
-static inline int pgd_devmap(pgd_t pgd)
-{
-	return 0;
-}
 #endif
 
 #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
diff --git a/mm/gup.c b/mm/gup.c
index 54d0dc3831fb..b023bcd38235 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3149,8 +3149,6 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
-	BUILD_BUG_ON(pgd_devmap(orig));
-
 	page = pgd_page(orig);
 	refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);

From patchwork Wed Jul 17 22:02:15 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735859
From: Peter Xu
Subject: [PATCH RFC 2/6] mm: PGTABLE_HAS_P[MU]D_LEAVES config options
Date: Wed, 17 Jul 2024 18:02:15 -0400
Message-ID: <20240717220219.3743374-3-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

Introduce two more sub-options for PGTABLE_HAS_HUGE_LEAVES:

  - PGTABLE_HAS_PMD_LEAVES: set when there can be PMD mappings
  - PGTABLE_HAS_PUD_LEAVES: set when there can be PUD mappings

These help to identify whether the current build may only want the PMD
helpers but not the PUD ones, as the sub-options also check the arch
support via HAVE_ARCH_TRANSPARENT_HUGEPAGE[_PUD].

Note that having them depend on HAVE_ARCH_TRANSPARENT_HUGEPAGE[_PUD] is
still an intermediate step; the best way is to have an option saying
"arch XXX supports PMD/PUD mappings" and so on. However, let's leave that
for later, as that's the easy part. So far, we use these options to stably
detect per-arch huge mapping support.
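Since the mm/Kconfig hunk is cut off in this archive, the following is only an
illustrative sketch (not taken from the patch) of how code is expected to key
off the new sub-options instead of the THP symbols; can_map_pud_leaves() is a
made-up name:

    #ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
    /* PMD-level huge mapping helpers may be used here. */
    #endif

    /* Sketch: compile-time/runtime check for PUD-level huge mapping support. */
    static inline bool can_map_pud_leaves(void)
    {
            return IS_ENABLED(CONFIG_PGTABLE_HAS_PUD_LEAVES);
    }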
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 10 +++++++---
 mm/Kconfig              |  6 ++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 711632df7edf..37482c8445d1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -96,14 +96,18 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
 
-#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
-#define HPAGE_PMD_SHIFT PMD_SHIFT
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 #define HPAGE_PUD_SHIFT PUD_SHIFT
 #else
-#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SHIFT ({ BUILD_BUG(); 0; })
 #endif
 
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
+#define HPAGE_PMD_SHIFT PMD_SHIFT
+#else
+#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
+#endif
+
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1<

From patchwork Wed Jul 17 22:02:16 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735860
From: Peter Xu
Subject: [PATCH RFC 3/6] mm/treewide: Make pgtable-generic.c THP agnostic
Date: Wed, 17 Jul 2024 18:02:16 -0400
Message-ID: <20240717220219.3743374-4-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

Make the pmd/pud helpers rely on the new PGTABLE_HAS_*_LEAVES options
rather than on THP alone, as THP is only one form of huge mapping.

Signed-off-by: Peter Xu
---
 arch/arm64/include/asm/pgtable.h             |  6 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c           |  2 +-
 arch/riscv/include/asm/pgtable.h             |  4 +--
 arch/s390/include/asm/pgtable.h              |  2 +-
 arch/s390/mm/pgtable.c                       |  4 +--
 arch/sparc/mm/tlb.c                          |  2 +-
 arch/x86/mm/pgtable.c                        | 15 ++++-----
 include/linux/mm_types.h                     |  2 +-
 include/linux/pgtable.h                      |  4 +--
 mm/memory.c                                  |  2 +-
 mm/pgtable-generic.c                         | 32 ++++++++++----------
 12 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 5d5d1b18b837..b93c03256ada 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1105,7 +1105,7 @@ extern int __ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, pte_t entry, int dirty); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS static inline int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, @@ -1114,7 +1114,9 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma, return __ptep_set_access_flags(vma, address, (pte_t *)pmdp, pmd_pte(entry), dirty); } +#endif +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES static inline int pud_devmap(pud_t pud) { return 0; @@ -1178,7 +1180,7 @@ static inline int __ptep_clear_flush_young(struct vm_area_struct *vma, return young; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long address, diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 051b1b6d729c..84cf55e18334 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ 
b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1119,7 +1119,7 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write) return pte_access_permitted(pmd_pte(pmd), write); } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot); extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot); extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot); diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 5a4a75369043..d6a5457627df 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__pmd_frag_nr); unsigned long __pmd_frag_size_shift; EXPORT_SYMBOL(__pmd_frag_size_shift); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES /* * This is called when relaxing access to a hugepage. It's also called in the page * fault path when we don't hit any of the major fault cases, ie, a minor diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index ebfe8faafb79..8c28f15f601b 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -752,7 +752,7 @@ static inline bool pud_user_accessible_page(pud_t pud) } #endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES static inline int pmd_trans_huge(pmd_t pmd) { return pmd_leaf(pmd); @@ -802,7 +802,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma, #define pmdp_collapse_flush pmdp_collapse_flush extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ /* * Encode/decode swap entries and swap PTEs. 
Swap PTEs are all PTEs that diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index fb6870384b97..398bbed20dee 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1710,7 +1710,7 @@ pmd_t pmdp_xchg_direct(struct mm_struct *, unsigned long, pmd_t *, pmd_t); pmd_t pmdp_xchg_lazy(struct mm_struct *, unsigned long, pmd_t *, pmd_t); pud_t pudp_xchg_direct(struct mm_struct *, unsigned long, pud_t *, pud_t); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #define __HAVE_ARCH_PGTABLE_DEPOSIT void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 2c944bafb030..c4481068734e 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -561,7 +561,7 @@ pud_t pudp_xchg_direct(struct mm_struct *mm, unsigned long addr, } EXPORT_SYMBOL(pudp_xchg_direct); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, pgtable_t pgtable) { @@ -600,7 +600,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) set_pte(ptep, __pte(_PAGE_INVALID)); return pgtable; } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ #ifdef CONFIG_PGSTE void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr, diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index 8648a50afe88..140813d07c9f 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -143,7 +143,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr, tlb_batch_add_one(mm, vaddr, pte_exec(orig), hugepage_shift); } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr, pmd_t pmd) { diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index fa77411bb266..7b10d4a0c0cd 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -511,7 +511,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, return changed; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, pmd_t entry, int dirty) @@ -532,7 +532,9 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, return changed; } +#endif /* PGTABLE_HAS_PMD_LEAVES */ +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address, pud_t *pudp, pud_t entry, int dirty) { @@ -552,7 +554,7 @@ int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address, return changed; } -#endif +#endif /* PGTABLE_HAS_PUD_LEAVES */ int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) @@ -566,7 +568,7 @@ int ptep_test_and_clear_young(struct vm_area_struct *vma, return ret; } -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp) { @@ -580,7 +582,7 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma, } #endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES int pudp_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pud_t *pudp) { @@ -613,7 +615,7 @@ int ptep_clear_flush_young(struct vm_area_struct *vma, return 
ptep_test_and_clear_young(vma, address, ptep); } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES int pmdp_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { @@ -641,8 +643,7 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, } #endif -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ - defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, pud_t *pudp) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index ef09c4eef6d3..44ef91ce720c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -942,7 +942,7 @@ struct mm_struct { #ifdef CONFIG_MMU_NOTIFIER struct mmu_notifier_subscriptions *notifier_subscriptions; #endif -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) && !USE_SPLIT_PMD_PTLOCKS pgtable_t pmd_huge_pte; /* protected by page_table_lock */ #endif #ifdef CONFIG_NUMA_BALANCING diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0a904300ac90..5a5aaee5fa1c 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -362,7 +362,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, #endif #ifndef __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) @@ -383,7 +383,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, BUILD_BUG(); return 0; } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */ #endif #ifndef __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH diff --git a/mm/memory.c b/mm/memory.c index 802d0d8a40f9..126ee0903c79 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -666,7 +666,7 @@ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr, return NULL; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t pmd) { diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index a78a4adf711a..e9fc3f6774a6 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -103,7 +103,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, } #endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #ifndef __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS int pmdp_set_access_flags(struct vm_area_struct *vma, @@ -145,20 +145,6 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return pmd; } - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, - pud_t *pudp) -{ - pud_t pud; - - VM_BUG_ON(address & ~HPAGE_PUD_MASK); - VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); - pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); - flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); - return pud; -} -#endif #endif #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT @@ -252,7 +238,21 @@ void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) 
call_rcu(&page->rcu_head, pte_free_now); } #endif /* pte_free_defer */ -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + pud_t pud; + + VM_BUG_ON(address & ~HPAGE_PUD_MASK); + VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); + pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); + flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + return pud; +} +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU))

From patchwork Wed Jul 17 22:02:17 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735861
From: Peter Xu
Subject: [PATCH RFC 4/6] mm: Move huge mapping declarations from internal.h to huge_mm.h
Date: Wed, 17 Jul 2024 18:02:17 -0400
Message-ID: <20240717220219.3743374-5-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

Most of the huge-mapping helpers are declared in huge_mm.h, not internal.h.
Move the remaining few from internal.h into huge_mm.h.

To move pmd_needs_soft_dirty_wp() over, we'll also need to move
vma_soft_dirty_enabled() into mm.h, as it'll later be needed by two headers
(internal.h and huge_mm.h).
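For context, here is a sketch (not taken from this series, with a made-up
caller name) of how code can use the moved helpers once
vma_soft_dirty_enabled() lives in mm.h and pmd_needs_soft_dirty_wp() in
huge_mm.h:

    #include <linux/mm.h>        /* vma_soft_dirty_enabled() after this patch */
    #include <linux/huge_mm.h>   /* pmd_needs_soft_dirty_wp() after this patch */

    /*
     * Sketch: keep a huge PMD write-protected when soft-dirty tracking is
     * enabled but the entry is not yet marked soft-dirty, so that the next
     * write faults and the soft-dirty bit gets set.
     */
    static bool keep_huge_pmd_write_protected(struct vm_area_struct *vma, pmd_t pmd)
    {
            return pmd_needs_soft_dirty_wp(vma, pmd);
    }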
Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 10 ++++++++++ include/linux/mm.h | 18 ++++++++++++++++++ mm/internal.h | 33 --------------------------------- 3 files changed, 28 insertions(+), 33 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 37482c8445d1..d8b642ad512d 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -8,6 +8,11 @@ #include /* only for vma_is_dax() */ #include +void touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write); +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write); +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, @@ -629,4 +634,9 @@ static inline int split_folio_to_order(struct folio *folio, int new_order) #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0) #define split_folio(f) split_folio_to_order(f, 0) +static inline bool pmd_needs_soft_dirty_wp(struct vm_area_struct *vma, pmd_t pmd) +{ + return vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd); +} + #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 5f1075d19600..fa10802d8faa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1117,6 +1117,24 @@ static inline unsigned int folio_order(struct folio *folio) return folio->_flags_1 & 0xff; } +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma) +{ + /* + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty + * enablements, because when without soft-dirty being compiled in, + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY) + * will be constantly true. + */ + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) + return false; + + /* + * Soft-dirty is kind of special: its tracking is enabled when the + * vma flags not set. + */ + return !(vma->vm_flags & VM_SOFTDIRTY); +} + #include /* diff --git a/mm/internal.h b/mm/internal.h index b4d86436565b..e49941747749 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -917,8 +917,6 @@ bool need_mlock_drain(int cpu); void mlock_drain_local(void); void mlock_drain_remote(int cpu); -extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); - /** * vma_address - Find the virtual address a page range is mapped at * @vma: The vma which maps this object. @@ -1229,14 +1227,6 @@ int migrate_device_coherent_page(struct page *page); int __must_check try_grab_folio(struct folio *folio, int refs, unsigned int flags); -/* - * mm/huge_memory.c - */ -void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, bool write); -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write); - /* * mm/mmap.c */ @@ -1342,29 +1332,6 @@ static __always_inline void vma_set_range(struct vm_area_struct *vma, vma->vm_pgoff = pgoff; } -static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma) -{ - /* - * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty - * enablements, because when without soft-dirty being compiled in, - * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY) - * will be constantly true. - */ - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) - return false; - - /* - * Soft-dirty is kind of special: its tracking is enabled when the - * vma flags not set. 
- */ - return !(vma->vm_flags & VM_SOFTDIRTY); -} - -static inline bool pmd_needs_soft_dirty_wp(struct vm_area_struct *vma, pmd_t pmd) -{ - return vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd); -} - static inline bool pte_needs_soft_dirty_wp(struct vm_area_struct *vma, pte_t pte) { return vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte);

From patchwork Wed Jul 17 22:02:18 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735865
From: Peter Xu
Subject: [PATCH RFC 5/6] mm/huge_mapping: Create huge_mapping_pxx.c
Date: Wed, 17 Jul 2024 18:02:18 -0400
Message-ID: <20240717220219.3743374-6-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

At some point we need to decouple "huge mapping" from THP, for any non-THP
huge mappings in the future (hugetlb, pfnmap, etc.). This is the first step
towards it.

Or rather, we already started to do this when the PGTABLE_HAS_HUGE_LEAVES
option was introduced: that was the first time Linux started to describe
leaves rather than THPs when talking about huge mappings. Before that,
almost any huge mapping had THP involved, like devmap. Hugetlb is special
only because we duplicated the whole world there, but there is also a
demand to decouple that now.

Linux has long had huge_memory.c, which only compiles with THP enabled; I
wish it had been called thp.c from the start.
In reality, it contains more than THP processing: any huge mapping (even one
not falling into the THP category) can leverage many of these helpers, but
unfortunately the file is not compiled if !THP. These helpers are normally
only about pgtable operations, which may not directly depend on what type of
huge folio (e.g. THP) sits underneath, or even on whether there is vmemmap to
back it. It's better to move them out of the THP world.

Create a new set of files, huge_mapping_p[mu]d.c. This patch starts to move
quite a few essential helpers from huge_memory.c into these new files, so
that they work and compile relying on PGTABLE_HAS_PXX_LEAVES rather than on
THP. Split them into two files by nature, so that e.g. archs that only
support PMD huge mappings can avoid compiling the whole -pud file, with the
hope of reducing the size of the objects compiled and linked.

No functional change intended, only code movement. That said, there will be
some "ifdef" machinery changes to pass all kinds of compilations.

Cc: Jason Gunthorpe
Cc: Matthew Wilcox
Cc: Oscar Salvador
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h             |  318 +++++---
 include/linux/pgtable.h             |   23 +-
 include/trace/events/huge_mapping.h |   41 +
 include/trace/events/thp.h          |   28 -
 mm/Makefile                         |    2 +
 mm/huge_mapping_pmd.c               |  979 +++++++++++++++++++++++
 mm/huge_mapping_pud.c               |  235 ++++++
 mm/huge_memory.c                    | 1125 +--------------------------
 8 files changed, 1472 insertions(+), 1279 deletions(-)
 create mode 100644 include/trace/events/huge_mapping.h
 create mode 100644 mm/huge_mapping_pmd.c
 create mode 100644 mm/huge_mapping_pud.c

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index d8b642ad512d..aea2784df8ef 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -8,43 +8,214 @@ #include /* only for vma_is_dax() */ #include +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); void touch_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pud, bool write); -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write); -pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); -vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); -int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, - struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); -void huge_pmd_set_accessed(struct vm_fault *vmf); int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, pud_t *dst_pud, pud_t *src_pud, unsigned long addr, struct vm_area_struct *vma); +int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, + unsigned long addr); +int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags); +void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, + unsigned long address); +spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); -#else -static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +static inline spinlock_t * +pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { + if (pud_trans_huge(*pud) || pud_devmap(*pud)) + return __pud_trans_huge_lock(pud, vma); + else + return NULL; } -#endif +#define split_huge_pud(__vma, __pud, __address) \ + do { \ + pud_t
*____pud = (__pud); \ + if (pud_trans_huge(*____pud) || pud_devmap(*____pud)) \ + __split_huge_pud(__vma, __pud, __address); \ + } while (0) +#else /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ +static inline void +huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +{ +} + +static inline int +change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + return 0; +} + +static inline spinlock_t * +pud_trans_huge_lock(pud_t *pud, + struct vm_area_struct *vma) +{ + return NULL; +} + +static inline void +touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write) +{ +} + +static inline int +copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pud_t *dst_pud, pud_t *src_pud, unsigned long addr, + struct vm_area_struct *vma) +{ + return 0; +} + +static inline int +zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, + unsigned long addr) +{ + return 0; +} + +static inline void +__split_huge_pud(struct vm_area_struct *vma, pud_t *pud, unsigned long address) +{ +} + +#define split_huge_pud(__vma, __pud, __address) do {} while (0) +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ + +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write); +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); +void huge_pmd_set_accessed(struct vm_fault *vmf); vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf); -bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr, unsigned long next); int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr); -int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, - unsigned long addr); bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd); int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, unsigned long cp_flags); +void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio); +void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio); +void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio); +spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); +bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, + pmd_t pmd); +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd); + +static inline int is_swap_pmd(pmd_t pmd) +{ + return !pmd_none(pmd) && !pmd_present(pmd); +} + +/* mmap_lock must be held on entry */ +static inline spinlock_t * +pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + return __pmd_trans_huge_lock(pmd, vma); + else + return NULL; +} + +#define split_huge_pmd(__vma, __pmd, __address) \ + do { \ + pmd_t *____pmd = (__pmd); \ + if (is_swap_pmd(*____pmd) || pmd_is_leaf(*____pmd)) \ + __split_huge_pmd(__vma, __pmd, __address, \ + false, NULL); \ + } while (0) +#else /* 
CONFIG_PGTABLE_HAS_PMD_LEAVES */ +static inline spinlock_t * +pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + return NULL; +} + +static inline int is_swap_pmd(pmd_t pmd) +{ + return 0; +} +static inline void +__split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio) +{ +} +#define split_huge_pmd(__vma, __pmd, __address) do {} while (0) + +static inline int +copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + return 0; +} + +static inline int +zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr) +{ + return 0; +} + +static inline vm_fault_t +do_huge_pmd_wp_page(struct vm_fault *vmf) +{ + return 0; +} + +static inline void +huge_pmd_set_accessed(struct vm_fault *vmf) +{ +} + +static inline int +change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + return 0; +} + +static inline bool +move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) +{ + return false; +} + +static inline void +split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio) +{ +} + +static inline void +split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio) +{ +} +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ + +bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, unsigned long next); vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write); vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write); +struct folio *mm_get_huge_zero_folio(struct mm_struct *mm); enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_UNSUPPORTED, @@ -130,6 +301,9 @@ extern unsigned long huge_anon_orders_always; extern unsigned long huge_anon_orders_madvise; extern unsigned long huge_anon_orders_inherit; +void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd); + static inline bool hugepage_global_enabled(void) { return transparent_hugepage_flags & @@ -332,44 +506,6 @@ static inline int split_huge_page(struct page *page) } void deferred_split_folio(struct folio *folio); -void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio); - -#define split_huge_pmd(__vma, __pmd, __address) \ - do { \ - pmd_t *____pmd = (__pmd); \ - if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \ - || pmd_devmap(*____pmd)) \ - __split_huge_pmd(__vma, __pmd, __address, \ - false, NULL); \ - } while (0) - - -void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct folio *folio); - -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address); - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags); -#else -static inline int -change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) { return 0; } -#endif - -#define split_huge_pud(__vma, __pud, 
__address) \ - do { \ - pud_t *____pud = (__pud); \ - if (pud_trans_huge(*____pud) \ - || pud_devmap(*____pud)) \ - __split_huge_pud(__vma, __pud, __address); \ - } while (0) - int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); int madvise_collapse(struct vm_area_struct *vma, @@ -377,31 +513,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); -spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); -spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); - -static inline int is_swap_pmd(pmd_t pmd) -{ - return !pmd_none(pmd) && !pmd_present(pmd); -} - -/* mmap_lock must be held on entry */ -static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, - struct vm_area_struct *vma) -{ - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) - return __pmd_trans_huge_lock(pmd, vma); - else - return NULL; -} -static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, - struct vm_area_struct *vma) -{ - if (pud_trans_huge(*pud) || pud_devmap(*pud)) - return __pud_trans_huge_lock(pud, vma); - else - return NULL; -} /** * folio_test_pmd_mappable - Can we map this folio with a PMD? @@ -416,6 +527,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap); vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); +vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); extern struct folio *huge_zero_folio; extern unsigned long huge_zero_pfn; @@ -445,13 +557,17 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } -void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, - pmd_t *pmd, bool freeze, struct folio *folio); bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct folio *folio); #else /* CONFIG_TRANSPARENT_HUGEPAGE */ +static inline void +__split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) +{ +} + static inline bool folio_test_pmd_mappable(struct folio *folio) { return false; @@ -505,16 +621,6 @@ static inline int split_huge_page(struct page *page) return 0; } static inline void deferred_split_folio(struct folio *folio) {} -#define split_huge_pmd(__vma, __pmd, __address) \ - do { } while (0) - -static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio) {} -static inline void split_huge_pmd_address(struct vm_area_struct *vma, - unsigned long address, bool freeze, struct folio *folio) {} -static inline void split_huge_pmd_locked(struct vm_area_struct *vma, - unsigned long address, pmd_t *pmd, - bool freeze, struct folio *folio) {} static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, @@ -523,9 +629,6 @@ static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, return false; } -#define split_huge_pud(__vma, __pmd, __address) \ - do { } while (0) - static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) { @@ -545,20 +648,6 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, long adjust_next) { } -static inline int is_swap_pmd(pmd_t pmd) -{ - return 0; -} -static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, - struct vm_area_struct 
*vma) -{ - return NULL; -} -static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, - struct vm_area_struct *vma) -{ - return NULL; -} static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { @@ -606,15 +695,8 @@ static inline int next_order(unsigned long *orders, int prev) return 0; } -static inline void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ -} - -static inline int change_huge_pud(struct mmu_gather *tlb, - struct vm_area_struct *vma, pud_t *pudp, - unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) +static inline vm_fault_t +do_huge_pmd_anonymous_page(struct vm_fault *vmf) { return 0; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5a5aaee5fa1c..5e505373b113 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -628,8 +628,8 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, #endif /* __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) && \ + !defined(__HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL) static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, int full) @@ -638,14 +638,14 @@ static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, } #endif -#ifndef __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL +#if defined(CONFIG_PGTABLE_HAS_PUD_LEAVES) && \ + !defined(__HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL) static inline pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pud_t *pudp, int full) { return pudp_huge_get_and_clear(vma->vm_mm, address, pudp); } -#endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL @@ -894,9 +894,9 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif + #ifndef __HAVE_ARCH_PUDP_SET_WRPROTECT -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES static inline void pudp_set_wrprotect(struct mm_struct *mm, unsigned long address, pud_t *pudp) { @@ -910,8 +910,7 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm, { BUILD_BUG(); } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ #endif #ifndef pmdp_collapse_flush @@ -1735,7 +1734,6 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE -#ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * ARCHes with special requirements for evicting THP backing TLB entries can * implement this. Otherwise also, it can help optimize normal TLB flush in @@ -1745,10 +1743,15 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) * invalidate the entire TLB which is not desirable. * e.g. 
see arch/arc: flush_pmd_tlb_range */ +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #define flush_pmd_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) -#define flush_pud_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) #else #define flush_pmd_tlb_range(vma, addr, end) BUILD_BUG() +#endif + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +#define flush_pud_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) +#else #define flush_pud_tlb_range(vma, addr, end) BUILD_BUG() #endif #endif diff --git a/include/trace/events/huge_mapping.h b/include/trace/events/huge_mapping.h new file mode 100644 index 000000000000..20036d090ce5 --- /dev/null +++ b/include/trace/events/huge_mapping.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM huge_mapping + +#if !defined(_TRACE_HUGE_MAPPING_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HUGE_MAPPING_H + +#include +#include + +DECLARE_EVENT_CLASS(migration_pmd, + + TP_PROTO(unsigned long addr, unsigned long pmd), + + TP_ARGS(addr, pmd), + + TP_STRUCT__entry( + __field(unsigned long, addr) + __field(unsigned long, pmd) + ), + + TP_fast_assign( + __entry->addr = addr; + __entry->pmd = pmd; + ), + TP_printk("addr=%lx, pmd=%lx", __entry->addr, __entry->pmd) +); + +DEFINE_EVENT(migration_pmd, set_migration_pmd, + TP_PROTO(unsigned long addr, unsigned long pmd), + TP_ARGS(addr, pmd) +); + +DEFINE_EVENT(migration_pmd, remove_migration_pmd, + TP_PROTO(unsigned long addr, unsigned long pmd), + TP_ARGS(addr, pmd) +); +#endif /* _TRACE_HUGE_MAPPING_H */ + +/* This part must be outside protection */ +#include diff --git a/include/trace/events/thp.h b/include/trace/events/thp.h index f50048af5fcc..395b574b1c79 100644 --- a/include/trace/events/thp.h +++ b/include/trace/events/thp.h @@ -66,34 +66,6 @@ DEFINE_EVENT(hugepage_update, hugepage_update_pud, TP_PROTO(unsigned long addr, unsigned long pud, unsigned long clr, unsigned long set), TP_ARGS(addr, pud, clr, set) ); - -DECLARE_EVENT_CLASS(migration_pmd, - - TP_PROTO(unsigned long addr, unsigned long pmd), - - TP_ARGS(addr, pmd), - - TP_STRUCT__entry( - __field(unsigned long, addr) - __field(unsigned long, pmd) - ), - - TP_fast_assign( - __entry->addr = addr; - __entry->pmd = pmd; - ), - TP_printk("addr=%lx, pmd=%lx", __entry->addr, __entry->pmd) -); - -DEFINE_EVENT(migration_pmd, set_migration_pmd, - TP_PROTO(unsigned long addr, unsigned long pmd), - TP_ARGS(addr, pmd) -); - -DEFINE_EVENT(migration_pmd, remove_migration_pmd, - TP_PROTO(unsigned long addr, unsigned long pmd), - TP_ARGS(addr, pmd) -); #endif /* _TRACE_THP_H */ /* This part must be outside protection */ diff --git a/mm/Makefile b/mm/Makefile index d2915f8c9dc0..3a846121b1f5 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -95,6 +95,8 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o +obj-$(CONFIG_PGTABLE_HAS_PMD_LEAVES) += huge_mapping_pmd.o +obj-$(CONFIG_PGTABLE_HAS_PUD_LEAVES) += huge_mapping_pud.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o diff --git a/mm/huge_mapping_pmd.c b/mm/huge_mapping_pmd.c new file mode 100644 index 000000000000..7b85e2a564d6 --- /dev/null +++ b/mm/huge_mapping_pmd.c @@ -0,0 +1,979 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Red Hat, Inc. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "internal.h" +#include "swap.h" + +#define CREATE_TRACE_POINTS +#include + +/* + * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise. + * + * Note that if it returns page table lock pointer, this routine returns without + * unlocking page table lock. So callers must unlock it. + */ +spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + spinlock_t *ptl; + + ptl = pmd_lock(vma->vm_mm, pmd); + if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || + pmd_devmap(*pmd))) + return ptl; + spin_unlock(ptl); + return NULL; +} + +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_WRITE)) + pmd = pmd_mkwrite(pmd, vma); + return pmd; +} + +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write) +{ + pmd_t _pmd; + + _pmd = pmd_mkyoung(*pmd); + if (write) + _pmd = pmd_mkdirty(_pmd); + if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, + pmd, _pmd, write)) + update_mmu_cache_pmd(vma, addr, pmd); +} + +int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + spinlock_t *dst_ptl, *src_ptl; + struct page *src_page; + struct folio *src_folio; + pmd_t pmd; + pgtable_t pgtable = NULL; + int ret = -ENOMEM; + + /* Skip if can be re-fill on fault */ + if (!vma_is_anonymous(dst_vma)) + return 0; + + pgtable = pte_alloc_one(dst_mm); + if (unlikely(!pgtable)) + goto out; + + dst_ptl = pmd_lock(dst_mm, dst_pmd); + src_ptl = pmd_lockptr(src_mm, src_pmd); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + + ret = -EAGAIN; + pmd = *src_pmd; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (unlikely(is_swap_pmd(pmd))) { + swp_entry_t entry = pmd_to_swp_entry(pmd); + + VM_BUG_ON(!is_pmd_migration_entry(pmd)); + if (!is_readable_migration_entry(entry)) { + entry = make_readable_migration_entry( + swp_offset(entry)); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + if (pmd_swp_uffd_wp(*src_pmd)) + pmd = pmd_swp_mkuffd_wp(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + if (!userfaultfd_wp(dst_vma)) + pmd = pmd_swp_clear_uffd_wp(pmd); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; + } +#endif + + if (unlikely(!pmd_trans_huge(pmd))) { + pte_free(dst_mm, pgtable); + goto out_unlock; + } + /* + * When page table lock is held, the huge zero pmd should not be + * under splitting since we don't split the page itself, only pmd to + * a page table. + */ + if (is_huge_zero_pmd(pmd)) { + /* + * mm_get_huge_zero_folio() will never allocate a new + * folio here, since we already have a zero page to + * copy. It just takes a reference. 
+ */ + mm_get_huge_zero_folio(dst_mm); + goto out_zero_page; + } + + src_page = pmd_page(pmd); + VM_BUG_ON_PAGE(!PageHead(src_page), src_page); + src_folio = page_folio(src_page); + + folio_get(src_folio); + if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { + /* Page maybe pinned: split and retry the fault on PTEs. */ + folio_put(src_folio); + pte_free(dst_mm, pgtable); + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + __split_huge_pmd(src_vma, src_pmd, addr, false, NULL); + return -EAGAIN; + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); +out_zero_page: + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + pmdp_set_wrprotect(src_mm, addr, src_pmd); + if (!userfaultfd_wp(dst_vma)) + pmd = pmd_clear_uffd_wp(pmd); + pmd = pmd_mkold(pmd_wrprotect(pmd)); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + + ret = 0; +out_unlock: + spin_unlock(src_ptl); + spin_unlock(dst_ptl); +out: + return ret; +} + +void huge_pmd_set_accessed(struct vm_fault *vmf) +{ + bool write = vmf->flags & FAULT_FLAG_WRITE; + + vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) + goto unlock; + + touch_pmd(vmf->vma, vmf->address, vmf->pmd, write); + +unlock: + spin_unlock(vmf->ptl); +} + +vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) +{ + const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE; + struct vm_area_struct *vma = vmf->vma; + struct folio *folio; + struct page *page; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + pmd_t orig_pmd = vmf->orig_pmd; + + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + VM_BUG_ON_VMA(!vma->anon_vma, vma); + + if (is_huge_zero_pmd(orig_pmd)) + goto fallback; + + spin_lock(vmf->ptl); + + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + + page = pmd_page(orig_pmd); + folio = page_folio(page); + VM_BUG_ON_PAGE(!PageHead(page), page); + + /* Early check when only holding the PT lock. */ + if (PageAnonExclusive(page)) + goto reuse; + + if (!folio_trylock(folio)) { + folio_get(folio); + spin_unlock(vmf->ptl); + folio_lock(folio); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + folio_unlock(folio); + folio_put(folio); + return 0; + } + folio_put(folio); + } + + /* Recheck after temporarily dropping the PT lock. */ + if (PageAnonExclusive(page)) { + folio_unlock(folio); + goto reuse; + } + + /* + * See do_wp_page(): we can only reuse the folio exclusively if + * there are no additional references. Note that we always drain + * the LRU cache immediately after adding a THP. 
+ */ + if (folio_ref_count(folio) > + 1 + folio_test_swapcache(folio) * folio_nr_pages(folio)) + goto unlock_fallback; + if (folio_test_swapcache(folio)) + folio_free_swap(folio); + if (folio_ref_count(folio) == 1) { + pmd_t entry; + + folio_move_anon_rmap(folio, vma); + SetPageAnonExclusive(page); + folio_unlock(folio); +reuse: + if (unlikely(unshare)) { + spin_unlock(vmf->ptl); + return 0; + } + entry = pmd_mkyoung(orig_pmd); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); + return 0; + } + +unlock_fallback: + folio_unlock(folio); + spin_unlock(vmf->ptl); +fallback: + __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); + return VM_FAULT_FALLBACK; +} + +bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, + pmd_t pmd) +{ + struct page *page; + + if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) + return false; + + /* Don't touch entries that are not even readable (NUMA hinting). */ + if (pmd_protnone(pmd)) + return false; + + /* Do we need write faults for softdirty tracking? */ + if (pmd_needs_soft_dirty_wp(vma, pmd)) + return false; + + /* Do we need write faults for uffd-wp tracking? */ + if (userfaultfd_huge_pmd_wp(vma, pmd)) + return false; + + if (!(vma->vm_flags & VM_SHARED)) { + /* See can_change_pte_writable(). */ + page = vm_normal_page_pmd(vma, addr, pmd); + return page && PageAnon(page) && PageAnonExclusive(page); + } + + /* See can_change_pte_writable(). */ + return pmd_dirty(pmd); +} + +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) +{ + pgtable_t pgtable; + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pte_free(mm, pgtable); + mm_dec_nr_ptes(mm); +} + +int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr) +{ + pmd_t orig_pmd; + spinlock_t *ptl; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + + ptl = __pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + /* + * For architectures like ppc64 we look at deposited pgtable + * when calling pmdp_huge_get_and_clear. So do the + * pgtable_trans_huge_withdraw after finishing pmdp related + * operations. 
+ */ + orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, + tlb->fullmm); + arch_check_zapped_pmd(vma, orig_pmd); + tlb_remove_pmd_tlb_entry(tlb, pmd, addr); + if (vma_is_special_huge(vma)) { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); + spin_unlock(ptl); + } else if (is_huge_zero_pmd(orig_pmd)) { + zap_deposited_table(tlb->mm, pmd); + spin_unlock(ptl); + } else { + struct folio *folio = NULL; + int flush_needed = 1; + + if (pmd_present(orig_pmd)) { + struct page *page = pmd_page(orig_pmd); + + folio = page_folio(page); + folio_remove_rmap_pmd(folio, page, vma); + WARN_ON_ONCE(folio_mapcount(folio) < 0); + VM_BUG_ON_PAGE(!PageHead(page), page); + } else if (thp_migration_supported()) { + swp_entry_t entry; + + VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); + entry = pmd_to_swp_entry(orig_pmd); + folio = pfn_swap_entry_folio(entry); + flush_needed = 0; + } else + WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + + if (folio_test_anon(folio)) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); + } else { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, mm_counter_file(folio), + -HPAGE_PMD_NR); + } + + spin_unlock(ptl); + if (flush_needed) + tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE); + } + return 1; +} + +static pmd_t move_soft_dirty_pmd(pmd_t pmd) +{ +#ifdef CONFIG_MEM_SOFT_DIRTY + if (unlikely(is_pmd_migration_entry(pmd))) + pmd = pmd_swp_mksoft_dirty(pmd); + else if (pmd_present(pmd)) + pmd = pmd_mksoft_dirty(pmd); +#endif + return pmd; +} + +#ifndef pmd_move_must_withdraw +static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, + spinlock_t *old_pmd_ptl, + struct vm_area_struct *vma) +{ + /* + * With split pmd lock we also need to move preallocated + * PTE page table if new_pmd is on different PMD page table. + * + * We also don't deposit and withdraw tables for file pages. + */ + return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma); +} +#endif + +bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) +{ + spinlock_t *old_ptl, *new_ptl; + pmd_t pmd; + struct mm_struct *mm = vma->vm_mm; + bool force_flush = false; + + /* + * The destination pmd shouldn't be established, free_pgtables() + * should have released it; but move_page_tables() might have already + * inserted a page table, if racing against shmem/file collapse. + */ + if (!pmd_none(*new_pmd)) { + VM_BUG_ON(pmd_trans_huge(*new_pmd)); + return false; + } + + /* + * We don't have to worry about the ordering of src and dst + * ptlocks because exclusive mmap_lock prevents deadlock. 
+ */ + old_ptl = __pmd_trans_huge_lock(old_pmd, vma); + if (old_ptl) { + new_ptl = pmd_lockptr(mm, new_pmd); + if (new_ptl != old_ptl) + spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); + pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd); + if (pmd_present(pmd)) + force_flush = true; + VM_BUG_ON(!pmd_none(*new_pmd)); + + if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { + pgtable_t pgtable; + pgtable = pgtable_trans_huge_withdraw(mm, old_pmd); + pgtable_trans_huge_deposit(mm, new_pmd, pgtable); + } + pmd = move_soft_dirty_pmd(pmd); + set_pmd_at(mm, new_addr, new_pmd, pmd); + if (force_flush) + flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); + if (new_ptl != old_ptl) + spin_unlock(new_ptl); + spin_unlock(old_ptl); + return true; + } + return false; +} + +/* + * Returns + * - 0 if PMD could not be locked + * - 1 if PMD was locked but protections unchanged and TLB flush unnecessary + * or if prot_numa but THP migration is not supported + * - HPAGE_PMD_NR if protections changed and TLB flush necessary + */ +int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + pmd_t oldpmd, entry; + bool prot_numa = cp_flags & MM_CP_PROT_NUMA; + bool uffd_wp = cp_flags & MM_CP_UFFD_WP; + bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; + int ret = 1; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + + if (prot_numa && !thp_migration_supported()) + return 1; + + ptl = __pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_swap_pmd(*pmd)) { + swp_entry_t entry = pmd_to_swp_entry(*pmd); + struct folio *folio = pfn_swap_entry_folio(entry); + pmd_t newpmd; + + VM_BUG_ON(!is_pmd_migration_entry(*pmd)); + if (is_writable_migration_entry(entry)) { + /* + * A protection check is difficult so + * just be safe and disable write + */ + if (folio_test_anon(folio)) + entry = make_readable_exclusive_migration_entry(swp_offset(entry)); + else + entry = make_readable_migration_entry(swp_offset(entry)); + newpmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*pmd)) + newpmd = pmd_swp_mksoft_dirty(newpmd); + } else { + newpmd = *pmd; + } + + if (uffd_wp) + newpmd = pmd_swp_mkuffd_wp(newpmd); + else if (uffd_wp_resolve) + newpmd = pmd_swp_clear_uffd_wp(newpmd); + if (!pmd_same(*pmd, newpmd)) + set_pmd_at(mm, addr, pmd, newpmd); + goto unlock; + } +#endif + + if (prot_numa) { + struct folio *folio; + bool toptier; + /* + * Avoid trapping faults against the zero page. The read-only + * data is likely to be read-cached on the local CPU and + * local/remote hits to the zero page are not interesting. + */ + if (is_huge_zero_pmd(*pmd)) + goto unlock; + + if (pmd_protnone(*pmd)) + goto unlock; + + folio = pmd_folio(*pmd); + toptier = node_is_toptier(folio_nid(folio)); + /* + * Skip scanning top tier node if normal numa + * balancing is disabled + */ + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && + toptier) + goto unlock; + + if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && + !toptier) + folio_xchg_access_time(folio, + jiffies_to_msecs(jiffies)); + } + /* + * In case prot_numa, we are under mmap_read_lock(mm). 
It's critical + * to not clear pmd intermittently to avoid race with MADV_DONTNEED + * which is also under mmap_read_lock(mm): + * + * CPU0: CPU1: + * change_huge_pmd(prot_numa=1) + * pmdp_huge_get_and_clear_notify() + * madvise_dontneed() + * zap_pmd_range() + * pmd_trans_huge(*pmd) == 0 (without ptl) + * // skip the pmd + * set_pmd_at(); + * // pmd is re-established + * + * The race makes MADV_DONTNEED miss the huge pmd and don't clear it + * which may break userspace. + * + * pmdp_invalidate_ad() is required to make sure we don't miss + * dirty/young flags set by hardware. + */ + oldpmd = pmdp_invalidate_ad(vma, addr, pmd); + + entry = pmd_modify(oldpmd, newprot); + if (uffd_wp) + entry = pmd_mkuffd_wp(entry); + else if (uffd_wp_resolve) + /* + * Leave the write bit to be handled by PF interrupt + * handler, then things like COW could be properly + * handled. + */ + entry = pmd_clear_uffd_wp(entry); + + /* See change_pte_range(). */ + if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && + can_change_pmd_writable(vma, addr, entry)) + entry = pmd_mkwrite(entry, vma); + + ret = HPAGE_PMD_NR; + set_pmd_at(mm, addr, pmd, entry); + + if (huge_pmd_needs_flush(oldpmd, entry)) + tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE); +unlock: + spin_unlock(ptl); + return ret; +} + +static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long haddr, bool freeze) +{ + struct mm_struct *mm = vma->vm_mm; + struct folio *folio; + struct page *page; + pgtable_t pgtable; + pmd_t old_pmd, _pmd; + bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; + bool anon_exclusive = false, dirty = false; + unsigned long addr; + pte_t *pte; + int i; + + VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); + VM_BUG_ON_VMA(vma->vm_start > haddr, vma); + VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); + VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) && + !pmd_devmap(*pmd)); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + count_vm_event(THP_SPLIT_PMD); +#endif + + if (!vma_is_anonymous(vma)) { + old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd); + /* + * We are going to unmap this huge page. So + * just go ahead and zap it + */ + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmd); + if (vma_is_special_huge(vma)) + return; + if (unlikely(is_pmd_migration_entry(old_pmd))) { + swp_entry_t entry; + + entry = pmd_to_swp_entry(old_pmd); + folio = pfn_swap_entry_folio(entry); + } else { + page = pmd_page(old_pmd); + folio = page_folio(page); + if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) + folio_mark_dirty(folio); + if (!folio_test_referenced(folio) && pmd_young(old_pmd)) + folio_set_referenced(folio); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); + } + add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); + return; + } + + if (is_huge_zero_pmd(*pmd)) { + /* + * FIXME: Do we want to invalidate secondary mmu by calling + * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below + * inside __split_huge_pmd() ? + * + * We are going from a zero huge page write protected to zero + * small page also write protected so it does not seems useful + * to invalidate secondary mmu at this time. 
+ */ + return __split_huge_zero_page_pmd(vma, haddr, pmd); + } + + pmd_migration = is_pmd_migration_entry(*pmd); + if (unlikely(pmd_migration)) { + swp_entry_t entry; + + old_pmd = *pmd; + entry = pmd_to_swp_entry(old_pmd); + page = pfn_swap_entry_to_page(entry); + write = is_writable_migration_entry(entry); + if (PageAnon(page)) + anon_exclusive = is_readable_exclusive_migration_entry(entry); + young = is_migration_entry_young(entry); + dirty = is_migration_entry_dirty(entry); + soft_dirty = pmd_swp_soft_dirty(old_pmd); + uffd_wp = pmd_swp_uffd_wp(old_pmd); + } else { + /* + * Up to this point the pmd is present and huge and userland has + * the whole access to the hugepage during the split (which + * happens in place). If we overwrite the pmd with the not-huge + * version pointing to the pte here (which of course we could if + * all CPUs were bug free), userland could trigger a small page + * size TLB miss on the small sized TLB while the hugepage TLB + * entry is still established in the huge TLB. Some CPU doesn't + * like that. See + * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum + * 383 on page 105. Intel should be safe but is also warns that + * it's only safe if the permission and cache attributes of the + * two entries loaded in the two TLB is identical (which should + * be the case here). But it is generally safer to never allow + * small and huge TLB entries for the same virtual address to be + * loaded simultaneously. So instead of doing "pmd_populate(); + * flush_pmd_tlb_range();" we first mark the current pmd + * notpresent (atomically because here the pmd_trans_huge must + * remain set at all times on the pmd until the split is + * complete for this pmd), then we flush the SMP TLB and finally + * we write the non-huge version of the pmd entry with + * pmd_populate. + */ + old_pmd = pmdp_invalidate(vma, haddr, pmd); + page = pmd_page(old_pmd); + folio = page_folio(page); + if (pmd_dirty(old_pmd)) { + dirty = true; + folio_set_dirty(folio); + } + write = pmd_write(old_pmd); + young = pmd_young(old_pmd); + soft_dirty = pmd_soft_dirty(old_pmd); + uffd_wp = pmd_uffd_wp(old_pmd); + + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + + /* + * Without "freeze", we'll simply split the PMD, propagating the + * PageAnonExclusive() flag for each PTE by setting it for + * each subpage -- no need to (temporarily) clear. + * + * With "freeze" we want to replace mapped pages by + * migration entries right away. This is only possible if we + * managed to clear PageAnonExclusive() -- see + * set_pmd_migration_entry(). + * + * In case we cannot clear PageAnonExclusive(), split the PMD + * only and let try_to_migrate_one() fail later. + * + * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. + */ + anon_exclusive = PageAnonExclusive(page); + if (freeze && anon_exclusive && + folio_try_share_anon_rmap_pmd(folio, page)) + freeze = false; + if (!freeze) { + rmap_t rmap_flags = RMAP_NONE; + + folio_ref_add(folio, HPAGE_PMD_NR - 1); + if (anon_exclusive) + rmap_flags |= RMAP_EXCLUSIVE; + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, + vma, haddr, rmap_flags); + } + } + + /* + * Withdraw the table only after we mark the pmd entry invalid. + * This's critical for some architectures (Power). 
+ */ + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); + + /* + * Note that NUMA hinting access restrictions are not transferred to + * avoid any possibility of altering permissions across VMAs. + */ + if (freeze || pmd_migration) { + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { + pte_t entry; + swp_entry_t swp_entry; + + if (write) + swp_entry = make_writable_migration_entry( + page_to_pfn(page + i)); + else if (anon_exclusive) + swp_entry = make_readable_exclusive_migration_entry( + page_to_pfn(page + i)); + else + swp_entry = make_readable_migration_entry( + page_to_pfn(page + i)); + if (young) + swp_entry = make_migration_entry_young(swp_entry); + if (dirty) + swp_entry = make_migration_entry_dirty(swp_entry); + entry = swp_entry_to_pte(swp_entry); + if (soft_dirty) + entry = pte_swp_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_swp_mkuffd_wp(entry); + + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + set_pte_at(mm, addr, pte + i, entry); + } + } else { + pte_t entry; + + entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); + if (write) + entry = pte_mkwrite(entry, vma); + if (!young) + entry = pte_mkold(entry); + /* NOTE: this may set soft-dirty too on some archs */ + if (dirty) + entry = pte_mkdirty(entry); + if (soft_dirty) + entry = pte_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_mkuffd_wp(entry); + + for (i = 0; i < HPAGE_PMD_NR; i++) + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + + set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); + } + pte_unmap(pte); + + if (!pmd_migration) + folio_remove_rmap_pmd(folio, page, vma); + if (freeze) + put_page(page); + + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + +void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio) +{ + VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); + VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); + VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); + VM_BUG_ON(freeze && !folio); + + /* + * When the caller requests to set up a migration entry, we + * require a folio to check the PMD against. Otherwise, there + * is a risk of replacing the wrong folio. 
+ */ + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || + is_pmd_migration_entry(*pmd)) { + if (folio && folio != pmd_folio(*pmd)) + return; + __split_huge_pmd_locked(vma, pmd, address, freeze); + } +} + +void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio) +{ + spinlock_t *ptl; + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + address & HPAGE_PMD_MASK, + (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); + ptl = pmd_lock(vma->vm_mm, pmd); + split_huge_pmd_locked(vma, range.start, pmd, freeze, folio); + spin_unlock(ptl); + mmu_notifier_invalidate_range_end(&range); +} + +void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio) +{ + pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); + + if (!pmd) + return; + + __split_huge_pmd(vma, pmd, address, freeze, folio); +} + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, + struct page *page) +{ + struct folio *folio = page_folio(page); + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long address = pvmw->address; + bool anon_exclusive; + pmd_t pmdval; + swp_entry_t entry; + pmd_t pmdswp; + + if (!(pvmw->pmd && !pvmw->pte)) + return 0; + + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); + pmdval = pmdp_invalidate(vma, address, pvmw->pmd); + + /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ + anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); + if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return -EBUSY; + } + + if (pmd_dirty(pmdval)) + folio_mark_dirty(folio); + if (pmd_write(pmdval)) + entry = make_writable_migration_entry(page_to_pfn(page)); + else if (anon_exclusive) + entry = make_readable_exclusive_migration_entry(page_to_pfn(page)); + else + entry = make_readable_migration_entry(page_to_pfn(page)); + if (pmd_young(pmdval)) + entry = make_migration_entry_young(entry); + if (pmd_dirty(pmdval)) + entry = make_migration_entry_dirty(entry); + pmdswp = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + pmdswp = pmd_swp_mksoft_dirty(pmdswp); + if (pmd_uffd_wp(pmdval)) + pmdswp = pmd_swp_mkuffd_wp(pmdswp); + set_pmd_at(mm, address, pvmw->pmd, pmdswp); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); + trace_set_migration_pmd(address, pmd_val(pmdswp)); + + return 0; +} + +void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) +{ + struct folio *folio = page_folio(new); + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long address = pvmw->address; + unsigned long haddr = address & HPAGE_PMD_MASK; + pmd_t pmde; + swp_entry_t entry; + + if (!(pvmw->pmd && !pvmw->pte)) + return; + + entry = pmd_to_swp_entry(*pvmw->pmd); + folio_get(folio); + pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); + if (pmd_swp_soft_dirty(*pvmw->pmd)) + pmde = pmd_mksoft_dirty(pmde); + if (is_writable_migration_entry(entry)) + pmde = pmd_mkwrite(pmde, vma); + if (pmd_swp_uffd_wp(*pvmw->pmd)) + pmde = pmd_mkuffd_wp(pmde); + if (!is_migration_entry_young(entry)) + pmde = pmd_mkold(pmde); + /* NOTE: this may contain setting soft-dirty on some archs */ + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) + pmde = pmd_mkdirty(pmde); + + if 
(folio_test_anon(folio)) { + rmap_t rmap_flags = RMAP_NONE; + + if (!is_readable_migration_entry(entry)) + rmap_flags |= RMAP_EXCLUSIVE; + + folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); + } else { + folio_add_file_rmap_pmd(folio, new, vma); + } + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); + set_pmd_at(mm, haddr, pvmw->pmd, pmde); + + /* No need to invalidate - it was non-present before */ + update_mmu_cache_pmd(vma, address, pvmw->pmd); + trace_remove_migration_pmd(address, pmd_val(pmde)); +} +#endif diff --git a/mm/huge_mapping_pud.c b/mm/huge_mapping_pud.c new file mode 100644 index 000000000000..c3a6bffe2871 --- /dev/null +++ b/mm/huge_mapping_pud.c @@ -0,0 +1,235 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Red Hat, Inc. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "internal.h" +#include "swap.h" + +/* + * Returns page table lock pointer if a given pud maps a thp, NULL otherwise. + * + * Note that if it returns page table lock pointer, this routine returns without + * unlocking page table lock. So callers must unlock it. + */ +spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) +{ + spinlock_t *ptl; + + ptl = pud_lock(vma->vm_mm, pud); + if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) + return ptl; + spin_unlock(ptl); + return NULL; +} + +void touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write) +{ + pud_t _pud; + + _pud = pud_mkyoung(*pud); + if (write) + _pud = pud_mkdirty(_pud); + if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, + pud, _pud, write)) + update_mmu_cache_pud(vma, addr, pud); +} + +int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pud_t *dst_pud, pud_t *src_pud, unsigned long addr, + struct vm_area_struct *vma) +{ + spinlock_t *dst_ptl, *src_ptl; + pud_t pud; + int ret; + + dst_ptl = pud_lock(dst_mm, dst_pud); + src_ptl = pud_lockptr(src_mm, src_pud); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + + ret = -EAGAIN; + pud = *src_pud; + if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) + goto out_unlock; + + /* + * When page table lock is held, the huge zero pud should not be + * under splitting since we don't split the page itself, only pud to + * a page table. + */ + if (is_huge_zero_pud(pud)) { + /* No huge zero pud yet */ + } + + /* + * TODO: once we support anonymous pages, use + * folio_try_dup_anon_rmap_*() and split if duplicating fails. 
+ */ + pudp_set_wrprotect(src_mm, addr, src_pud); + pud = pud_mkold(pud_wrprotect(pud)); + set_pud_at(dst_mm, addr, dst_pud, pud); + + ret = 0; +out_unlock: + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + return ret; +} + +void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +{ + bool write = vmf->flags & FAULT_FLAG_WRITE; + + vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud); + if (unlikely(!pud_same(*vmf->pud, orig_pud))) + goto unlock; + + touch_pud(vmf->vma, vmf->address, vmf->pud, write); +unlock: + spin_unlock(vmf->ptl); +} + +/* + * Returns: + * + * - 0: if pud leaf changed from under us + * - 1: if pud can be skipped + * - HPAGE_PUD_NR: if pud was successfully processed + */ +int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + struct mm_struct *mm = vma->vm_mm; + pud_t oldpud, entry; + spinlock_t *ptl; + + tlb_change_page_size(tlb, HPAGE_PUD_SIZE); + + /* NUMA balancing doesn't apply to dax */ + if (cp_flags & MM_CP_PROT_NUMA) + return 1; + + /* + * Huge entries on userfault-wp only works with anonymous, while we + * don't have anonymous PUDs yet. + */ + if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL)) + return 1; + + ptl = __pud_trans_huge_lock(pudp, vma); + if (!ptl) + return 0; + + /* + * Can't clear PUD or it can race with concurrent zapping. See + * change_huge_pmd(). + */ + oldpud = pudp_invalidate(vma, addr, pudp); + entry = pud_modify(oldpud, newprot); + set_pud_at(mm, addr, pudp, entry); + tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE); + + spin_unlock(ptl); + return HPAGE_PUD_NR; +} + +int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pud, unsigned long addr) +{ + spinlock_t *ptl; + pud_t orig_pud; + + ptl = __pud_trans_huge_lock(pud, vma); + if (!ptl) + return 0; + + orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm); + arch_check_zapped_pud(vma, orig_pud); + tlb_remove_pud_tlb_entry(tlb, pud, addr); + if (vma_is_special_huge(vma)) { + spin_unlock(ptl); + /* No zero page support yet */ + } else { + /* No support for anonymous PUD pages yet */ + BUG(); + } + return 1; +} + +static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, + unsigned long haddr) +{ + VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); + VM_BUG_ON_VMA(vma->vm_start > haddr, vma); + VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); + VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud)); + +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) + count_vm_event(THP_SPLIT_PUD); +#endif + + pudp_huge_clear_flush(vma, haddr, pud); +} + +void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, + unsigned long address) +{ + spinlock_t *ptl; + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + address & HPAGE_PUD_MASK, + (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); + mmu_notifier_invalidate_range_start(&range); + ptl = pud_lock(vma->vm_mm, pud); + if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + goto out; + __split_huge_pud_locked(vma, pud, range.start); + +out: + spin_unlock(ptl); + mmu_notifier_invalidate_range_end(&range); +} diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 554dec14b768..11aee24ce21a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -838,13 +838,6 @@ static int __init setup_transparent_hugepage(char *str) } __setup("transparent_hugepage=", setup_transparent_hugepage); -pmd_t 
maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) -{ - if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd, vma); - return pmd; -} - #ifdef CONFIG_MEMCG static inline struct deferred_split *get_deferred_split_queue(struct folio *folio) @@ -1313,19 +1306,6 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write) EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud); #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write) -{ - pmd_t _pmd; - - _pmd = pmd_mkyoung(*pmd); - if (write) - _pmd = pmd_mkdirty(_pmd); - if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, - pmd, _pmd, write)) - update_mmu_cache_pmd(vma, addr, pmd); -} - struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap) { @@ -1366,309 +1346,6 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, return page; } -int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, - struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) -{ - spinlock_t *dst_ptl, *src_ptl; - struct page *src_page; - struct folio *src_folio; - pmd_t pmd; - pgtable_t pgtable = NULL; - int ret = -ENOMEM; - - /* Skip if can be re-fill on fault */ - if (!vma_is_anonymous(dst_vma)) - return 0; - - pgtable = pte_alloc_one(dst_mm); - if (unlikely(!pgtable)) - goto out; - - dst_ptl = pmd_lock(dst_mm, dst_pmd); - src_ptl = pmd_lockptr(src_mm, src_pmd); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - - ret = -EAGAIN; - pmd = *src_pmd; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - if (unlikely(is_swap_pmd(pmd))) { - swp_entry_t entry = pmd_to_swp_entry(pmd); - - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (!is_readable_migration_entry(entry)) { - entry = make_readable_migration_entry( - swp_offset(entry)); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - pmd = pmd_swp_mkuffd_wp(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - if (!userfaultfd_wp(dst_vma)) - pmd = pmd_swp_clear_uffd_wp(pmd); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } -#endif - - if (unlikely(!pmd_trans_huge(pmd))) { - pte_free(dst_mm, pgtable); - goto out_unlock; - } - /* - * When page table lock is held, the huge zero pmd should not be - * under splitting since we don't split the page itself, only pmd to - * a page table. - */ - if (is_huge_zero_pmd(pmd)) { - /* - * mm_get_huge_zero_folio() will never allocate a new - * folio here, since we already have a zero page to - * copy. It just takes a reference. - */ - mm_get_huge_zero_folio(dst_mm); - goto out_zero_page; - } - - src_page = pmd_page(pmd); - VM_BUG_ON_PAGE(!PageHead(src_page), src_page); - src_folio = page_folio(src_page); - - folio_get(src_folio); - if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { - /* Page maybe pinned: split and retry the fault on PTEs. 
*/ - folio_put(src_folio); - pte_free(dst_mm, pgtable); - spin_unlock(src_ptl); - spin_unlock(dst_ptl); - __split_huge_pmd(src_vma, src_pmd, addr, false, NULL); - return -EAGAIN; - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); -out_zero_page: - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - pmdp_set_wrprotect(src_mm, addr, src_pmd); - if (!userfaultfd_wp(dst_vma)) - pmd = pmd_clear_uffd_wp(pmd); - pmd = pmd_mkold(pmd_wrprotect(pmd)); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - - ret = 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); -out: - return ret; -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, bool write) -{ - pud_t _pud; - - _pud = pud_mkyoung(*pud); - if (write) - _pud = pud_mkdirty(_pud); - if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, - pud, _pud, write)) - update_mmu_cache_pud(vma, addr, pud); -} - -int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pud_t *dst_pud, pud_t *src_pud, unsigned long addr, - struct vm_area_struct *vma) -{ - spinlock_t *dst_ptl, *src_ptl; - pud_t pud; - int ret; - - dst_ptl = pud_lock(dst_mm, dst_pud); - src_ptl = pud_lockptr(src_mm, src_pud); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - - ret = -EAGAIN; - pud = *src_pud; - if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) - goto out_unlock; - - /* - * When page table lock is held, the huge zero pud should not be - * under splitting since we don't split the page itself, only pud to - * a page table. - */ - if (is_huge_zero_pud(pud)) { - /* No huge zero pud yet */ - } - - /* - * TODO: once we support anonymous pages, use - * folio_try_dup_anon_rmap_*() and split if duplicating fails. - */ - pudp_set_wrprotect(src_mm, addr, src_pud); - pud = pud_mkold(pud_wrprotect(pud)); - set_pud_at(dst_mm, addr, dst_pud, pud); - - ret = 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); - return ret; -} - -void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) -{ - bool write = vmf->flags & FAULT_FLAG_WRITE; - - vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud); - if (unlikely(!pud_same(*vmf->pud, orig_pud))) - goto unlock; - - touch_pud(vmf->vma, vmf->address, vmf->pud, write); -unlock: - spin_unlock(vmf->ptl); -} -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ - -void huge_pmd_set_accessed(struct vm_fault *vmf) -{ - bool write = vmf->flags & FAULT_FLAG_WRITE; - - vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) - goto unlock; - - touch_pmd(vmf->vma, vmf->address, vmf->pmd, write); - -unlock: - spin_unlock(vmf->ptl); -} - -vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) -{ - const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE; - struct vm_area_struct *vma = vmf->vma; - struct folio *folio; - struct page *page; - unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - pmd_t orig_pmd = vmf->orig_pmd; - - vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); - VM_BUG_ON_VMA(!vma->anon_vma, vma); - - if (is_huge_zero_pmd(orig_pmd)) - goto fallback; - - spin_lock(vmf->ptl); - - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { - spin_unlock(vmf->ptl); - return 0; - } - - page = pmd_page(orig_pmd); - folio = page_folio(page); - VM_BUG_ON_PAGE(!PageHead(page), page); - - /* Early check when only holding the PT lock. 
*/ - if (PageAnonExclusive(page)) - goto reuse; - - if (!folio_trylock(folio)) { - folio_get(folio); - spin_unlock(vmf->ptl); - folio_lock(folio); - spin_lock(vmf->ptl); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { - spin_unlock(vmf->ptl); - folio_unlock(folio); - folio_put(folio); - return 0; - } - folio_put(folio); - } - - /* Recheck after temporarily dropping the PT lock. */ - if (PageAnonExclusive(page)) { - folio_unlock(folio); - goto reuse; - } - - /* - * See do_wp_page(): we can only reuse the folio exclusively if - * there are no additional references. Note that we always drain - * the LRU cache immediately after adding a THP. - */ - if (folio_ref_count(folio) > - 1 + folio_test_swapcache(folio) * folio_nr_pages(folio)) - goto unlock_fallback; - if (folio_test_swapcache(folio)) - folio_free_swap(folio); - if (folio_ref_count(folio) == 1) { - pmd_t entry; - - folio_move_anon_rmap(folio, vma); - SetPageAnonExclusive(page); - folio_unlock(folio); -reuse: - if (unlikely(unshare)) { - spin_unlock(vmf->ptl); - return 0; - } - entry = pmd_mkyoung(orig_pmd); - entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) - update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); - spin_unlock(vmf->ptl); - return 0; - } - -unlock_fallback: - folio_unlock(folio); - spin_unlock(vmf->ptl); -fallback: - __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); - return VM_FAULT_FALLBACK; -} - -static inline bool can_change_pmd_writable(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd) -{ - struct page *page; - - if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) - return false; - - /* Don't touch entries that are not even readable (NUMA hinting). */ - if (pmd_protnone(pmd)) - return false; - - /* Do we need write faults for softdirty tracking? */ - if (pmd_needs_soft_dirty_wp(vma, pmd)) - return false; - - /* Do we need write faults for uffd-wp tracking? */ - if (userfaultfd_huge_pmd_wp(vma, pmd)) - return false; - - if (!(vma->vm_flags & VM_SHARED)) { - /* See can_change_pte_writable(). */ - page = vm_normal_page_pmd(vma, addr, pmd); - return page && PageAnon(page) && PageAnonExclusive(page); - } - - /* See can_change_pte_writable(). */ - return pmd_dirty(pmd); -} - /* NUMA hinting page fault entry point for trans huge pmds */ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { @@ -1830,342 +1507,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return ret; } -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) -{ - pgtable_t pgtable; - - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pte_free(mm, pgtable); - mm_dec_nr_ptes(mm); -} - -int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr) -{ - pmd_t orig_pmd; - spinlock_t *ptl; - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - - ptl = __pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - /* - * For architectures like ppc64 we look at deposited pgtable - * when calling pmdp_huge_get_and_clear. So do the - * pgtable_trans_huge_withdraw after finishing pmdp related - * operations. 
- */ - orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, - tlb->fullmm); - arch_check_zapped_pmd(vma, orig_pmd); - tlb_remove_pmd_tlb_entry(tlb, pmd, addr); - if (vma_is_special_huge(vma)) { - if (arch_needs_pgtable_deposit()) - zap_deposited_table(tlb->mm, pmd); - spin_unlock(ptl); - } else if (is_huge_zero_pmd(orig_pmd)) { - zap_deposited_table(tlb->mm, pmd); - spin_unlock(ptl); - } else { - struct folio *folio = NULL; - int flush_needed = 1; - - if (pmd_present(orig_pmd)) { - struct page *page = pmd_page(orig_pmd); - - folio = page_folio(page); - folio_remove_rmap_pmd(folio, page, vma); - WARN_ON_ONCE(folio_mapcount(folio) < 0); - VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - folio = pfn_swap_entry_folio(entry); - flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); - - if (folio_test_anon(folio)) { - zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); - } else { - if (arch_needs_pgtable_deposit()) - zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, mm_counter_file(folio), - -HPAGE_PMD_NR); - } - - spin_unlock(ptl); - if (flush_needed) - tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE); - } - return 1; -} - -#ifndef pmd_move_must_withdraw -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, - spinlock_t *old_pmd_ptl, - struct vm_area_struct *vma) -{ - /* - * With split pmd lock we also need to move preallocated - * PTE page table if new_pmd is on different PMD page table. - * - * We also don't deposit and withdraw tables for file pages. - */ - return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma); -} -#endif - -static pmd_t move_soft_dirty_pmd(pmd_t pmd) -{ -#ifdef CONFIG_MEM_SOFT_DIRTY - if (unlikely(is_pmd_migration_entry(pmd))) - pmd = pmd_swp_mksoft_dirty(pmd); - else if (pmd_present(pmd)) - pmd = pmd_mksoft_dirty(pmd); -#endif - return pmd; -} - -bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) -{ - spinlock_t *old_ptl, *new_ptl; - pmd_t pmd; - struct mm_struct *mm = vma->vm_mm; - bool force_flush = false; - - /* - * The destination pmd shouldn't be established, free_pgtables() - * should have released it; but move_page_tables() might have already - * inserted a page table, if racing against shmem/file collapse. - */ - if (!pmd_none(*new_pmd)) { - VM_BUG_ON(pmd_trans_huge(*new_pmd)); - return false; - } - - /* - * We don't have to worry about the ordering of src and dst - * ptlocks because exclusive mmap_lock prevents deadlock. 
- */ - old_ptl = __pmd_trans_huge_lock(old_pmd, vma); - if (old_ptl) { - new_ptl = pmd_lockptr(mm, new_pmd); - if (new_ptl != old_ptl) - spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); - pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd); - if (pmd_present(pmd)) - force_flush = true; - VM_BUG_ON(!pmd_none(*new_pmd)); - - if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { - pgtable_t pgtable; - pgtable = pgtable_trans_huge_withdraw(mm, old_pmd); - pgtable_trans_huge_deposit(mm, new_pmd, pgtable); - } - pmd = move_soft_dirty_pmd(pmd); - set_pmd_at(mm, new_addr, new_pmd, pmd); - if (force_flush) - flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); - if (new_ptl != old_ptl) - spin_unlock(new_ptl); - spin_unlock(old_ptl); - return true; - } - return false; -} - -/* - * Returns - * - 0 if PMD could not be locked - * - 1 if PMD was locked but protections unchanged and TLB flush unnecessary - * or if prot_numa but THP migration is not supported - * - HPAGE_PMD_NR if protections changed and TLB flush necessary - */ -int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) -{ - struct mm_struct *mm = vma->vm_mm; - spinlock_t *ptl; - pmd_t oldpmd, entry; - bool prot_numa = cp_flags & MM_CP_PROT_NUMA; - bool uffd_wp = cp_flags & MM_CP_UFFD_WP; - bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; - int ret = 1; - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - - if (prot_numa && !thp_migration_supported()) - return 1; - - ptl = __pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - if (is_swap_pmd(*pmd)) { - swp_entry_t entry = pmd_to_swp_entry(*pmd); - struct folio *folio = pfn_swap_entry_folio(entry); - pmd_t newpmd; - - VM_BUG_ON(!is_pmd_migration_entry(*pmd)); - if (is_writable_migration_entry(entry)) { - /* - * A protection check is difficult so - * just be safe and disable write - */ - if (folio_test_anon(folio)) - entry = make_readable_exclusive_migration_entry(swp_offset(entry)); - else - entry = make_readable_migration_entry(swp_offset(entry)); - newpmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*pmd)) - newpmd = pmd_swp_mksoft_dirty(newpmd); - } else { - newpmd = *pmd; - } - - if (uffd_wp) - newpmd = pmd_swp_mkuffd_wp(newpmd); - else if (uffd_wp_resolve) - newpmd = pmd_swp_clear_uffd_wp(newpmd); - if (!pmd_same(*pmd, newpmd)) - set_pmd_at(mm, addr, pmd, newpmd); - goto unlock; - } -#endif - - if (prot_numa) { - struct folio *folio; - bool toptier; - /* - * Avoid trapping faults against the zero page. The read-only - * data is likely to be read-cached on the local CPU and - * local/remote hits to the zero page are not interesting. - */ - if (is_huge_zero_pmd(*pmd)) - goto unlock; - - if (pmd_protnone(*pmd)) - goto unlock; - - folio = pmd_folio(*pmd); - toptier = node_is_toptier(folio_nid(folio)); - /* - * Skip scanning top tier node if normal numa - * balancing is disabled - */ - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && - toptier) - goto unlock; - - if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && - !toptier) - folio_xchg_access_time(folio, - jiffies_to_msecs(jiffies)); - } - /* - * In case prot_numa, we are under mmap_read_lock(mm). 
It's critical - * to not clear pmd intermittently to avoid race with MADV_DONTNEED - * which is also under mmap_read_lock(mm): - * - * CPU0: CPU1: - * change_huge_pmd(prot_numa=1) - * pmdp_huge_get_and_clear_notify() - * madvise_dontneed() - * zap_pmd_range() - * pmd_trans_huge(*pmd) == 0 (without ptl) - * // skip the pmd - * set_pmd_at(); - * // pmd is re-established - * - * The race makes MADV_DONTNEED miss the huge pmd and don't clear it - * which may break userspace. - * - * pmdp_invalidate_ad() is required to make sure we don't miss - * dirty/young flags set by hardware. - */ - oldpmd = pmdp_invalidate_ad(vma, addr, pmd); - - entry = pmd_modify(oldpmd, newprot); - if (uffd_wp) - entry = pmd_mkuffd_wp(entry); - else if (uffd_wp_resolve) - /* - * Leave the write bit to be handled by PF interrupt - * handler, then things like COW could be properly - * handled. - */ - entry = pmd_clear_uffd_wp(entry); - - /* See change_pte_range(). */ - if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && - can_change_pmd_writable(vma, addr, entry)) - entry = pmd_mkwrite(entry, vma); - - ret = HPAGE_PMD_NR; - set_pmd_at(mm, addr, pmd, entry); - - if (huge_pmd_needs_flush(oldpmd, entry)) - tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE); -unlock: - spin_unlock(ptl); - return ret; -} - -/* - * Returns: - * - * - 0: if pud leaf changed from under us - * - 1: if pud can be skipped - * - HPAGE_PUD_NR: if pud was successfully processed - */ -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) -{ - struct mm_struct *mm = vma->vm_mm; - pud_t oldpud, entry; - spinlock_t *ptl; - - tlb_change_page_size(tlb, HPAGE_PUD_SIZE); - - /* NUMA balancing doesn't apply to dax */ - if (cp_flags & MM_CP_PROT_NUMA) - return 1; - - /* - * Huge entries on userfault-wp only works with anonymous, while we - * don't have anonymous PUDs yet. - */ - if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL)) - return 1; - - ptl = __pud_trans_huge_lock(pudp, vma); - if (!ptl) - return 0; - - /* - * Can't clear PUD or it can race with concurrent zapping. See - * change_huge_pmd(). - */ - oldpud = pudp_invalidate(vma, addr, pudp); - entry = pud_modify(oldpud, newprot); - set_pud_at(mm, addr, pudp, entry); - tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE); - - spin_unlock(ptl); - return HPAGE_PUD_NR; -} -#endif - #ifdef CONFIG_USERFAULTFD /* * The PT lock for src_pmd and dst_vma/src_vma (for reading) are locked by @@ -2306,105 +1647,8 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm } #endif /* CONFIG_USERFAULTFD */ -/* - * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise. - * - * Note that if it returns page table lock pointer, this routine returns without - * unlocking page table lock. So callers must unlock it. - */ -spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) -{ - spinlock_t *ptl; - ptl = pmd_lock(vma->vm_mm, pmd); - if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || - pmd_devmap(*pmd))) - return ptl; - spin_unlock(ptl); - return NULL; -} - -/* - * Returns page table lock pointer if a given pud maps a thp, NULL otherwise. - * - * Note that if it returns page table lock pointer, this routine returns without - * unlocking page table lock. So callers must unlock it. 
- */ -spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) -{ - spinlock_t *ptl; - - ptl = pud_lock(vma->vm_mm, pud); - if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) - return ptl; - spin_unlock(ptl); - return NULL; -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pud, unsigned long addr) -{ - spinlock_t *ptl; - pud_t orig_pud; - - ptl = __pud_trans_huge_lock(pud, vma); - if (!ptl) - return 0; - - orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm); - arch_check_zapped_pud(vma, orig_pud); - tlb_remove_pud_tlb_entry(tlb, pud, addr); - if (vma_is_special_huge(vma)) { - spin_unlock(ptl); - /* No zero page support yet */ - } else { - /* No support for anonymous PUD pages yet */ - BUG(); - } - return 1; -} - -static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, - unsigned long haddr) -{ - VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); - VM_BUG_ON_VMA(vma->vm_start > haddr, vma); - VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); - VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud)); - - count_vm_event(THP_SPLIT_PUD); - - pudp_huge_clear_flush(vma, haddr, pud); -} - -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ - spinlock_t *ptl; - struct mmu_notifier_range range; - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, - address & HPAGE_PUD_MASK, - (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); - mmu_notifier_invalidate_range_start(&range); - ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) - goto out; - __split_huge_pud_locked(vma, pud, range.start); - -out: - spin_unlock(ptl); - mmu_notifier_invalidate_range_end(&range); -} -#else -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ -} -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ - -static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, - unsigned long haddr, pmd_t *pmd) +void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -2444,274 +1688,6 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long haddr, bool freeze) -{ - struct mm_struct *mm = vma->vm_mm; - struct folio *folio; - struct page *page; - pgtable_t pgtable; - pmd_t old_pmd, _pmd; - bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; - bool anon_exclusive = false, dirty = false; - unsigned long addr; - pte_t *pte; - int i; - - VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); - VM_BUG_ON_VMA(vma->vm_start > haddr, vma); - VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) - && !pmd_devmap(*pmd)); - - count_vm_event(THP_SPLIT_PMD); - - if (!vma_is_anonymous(vma)) { - old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd); - /* - * We are going to unmap this huge page. 
So - * just go ahead and zap it - */ - if (arch_needs_pgtable_deposit()) - zap_deposited_table(mm, pmd); - if (vma_is_special_huge(vma)) - return; - if (unlikely(is_pmd_migration_entry(old_pmd))) { - swp_entry_t entry; - - entry = pmd_to_swp_entry(old_pmd); - folio = pfn_swap_entry_folio(entry); - } else { - page = pmd_page(old_pmd); - folio = page_folio(page); - if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) - folio_mark_dirty(folio); - if (!folio_test_referenced(folio) && pmd_young(old_pmd)) - folio_set_referenced(folio); - folio_remove_rmap_pmd(folio, page, vma); - folio_put(folio); - } - add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); - return; - } - - if (is_huge_zero_pmd(*pmd)) { - /* - * FIXME: Do we want to invalidate secondary mmu by calling - * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below - * inside __split_huge_pmd() ? - * - * We are going from a zero huge page write protected to zero - * small page also write protected so it does not seems useful - * to invalidate secondary mmu at this time. - */ - return __split_huge_zero_page_pmd(vma, haddr, pmd); - } - - pmd_migration = is_pmd_migration_entry(*pmd); - if (unlikely(pmd_migration)) { - swp_entry_t entry; - - old_pmd = *pmd; - entry = pmd_to_swp_entry(old_pmd); - page = pfn_swap_entry_to_page(entry); - write = is_writable_migration_entry(entry); - if (PageAnon(page)) - anon_exclusive = is_readable_exclusive_migration_entry(entry); - young = is_migration_entry_young(entry); - dirty = is_migration_entry_dirty(entry); - soft_dirty = pmd_swp_soft_dirty(old_pmd); - uffd_wp = pmd_swp_uffd_wp(old_pmd); - } else { - /* - * Up to this point the pmd is present and huge and userland has - * the whole access to the hugepage during the split (which - * happens in place). If we overwrite the pmd with the not-huge - * version pointing to the pte here (which of course we could if - * all CPUs were bug free), userland could trigger a small page - * size TLB miss on the small sized TLB while the hugepage TLB - * entry is still established in the huge TLB. Some CPU doesn't - * like that. See - * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum - * 383 on page 105. Intel should be safe but is also warns that - * it's only safe if the permission and cache attributes of the - * two entries loaded in the two TLB is identical (which should - * be the case here). But it is generally safer to never allow - * small and huge TLB entries for the same virtual address to be - * loaded simultaneously. So instead of doing "pmd_populate(); - * flush_pmd_tlb_range();" we first mark the current pmd - * notpresent (atomically because here the pmd_trans_huge must - * remain set at all times on the pmd until the split is - * complete for this pmd), then we flush the SMP TLB and finally - * we write the non-huge version of the pmd entry with - * pmd_populate. - */ - old_pmd = pmdp_invalidate(vma, haddr, pmd); - page = pmd_page(old_pmd); - folio = page_folio(page); - if (pmd_dirty(old_pmd)) { - dirty = true; - folio_set_dirty(folio); - } - write = pmd_write(old_pmd); - young = pmd_young(old_pmd); - soft_dirty = pmd_soft_dirty(old_pmd); - uffd_wp = pmd_uffd_wp(old_pmd); - - VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); - VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); - - /* - * Without "freeze", we'll simply split the PMD, propagating the - * PageAnonExclusive() flag for each PTE by setting it for - * each subpage -- no need to (temporarily) clear. 
- * - * With "freeze" we want to replace mapped pages by - * migration entries right away. This is only possible if we - * managed to clear PageAnonExclusive() -- see - * set_pmd_migration_entry(). - * - * In case we cannot clear PageAnonExclusive(), split the PMD - * only and let try_to_migrate_one() fail later. - * - * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. - */ - anon_exclusive = PageAnonExclusive(page); - if (freeze && anon_exclusive && - folio_try_share_anon_rmap_pmd(folio, page)) - freeze = false; - if (!freeze) { - rmap_t rmap_flags = RMAP_NONE; - - folio_ref_add(folio, HPAGE_PMD_NR - 1); - if (anon_exclusive) - rmap_flags |= RMAP_EXCLUSIVE; - folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, - vma, haddr, rmap_flags); - } - } - - /* - * Withdraw the table only after we mark the pmd entry invalid. - * This's critical for some architectures (Power). - */ - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pmd_populate(mm, &_pmd, pgtable); - - pte = pte_offset_map(&_pmd, haddr); - VM_BUG_ON(!pte); - - /* - * Note that NUMA hinting access restrictions are not transferred to - * avoid any possibility of altering permissions across VMAs. - */ - if (freeze || pmd_migration) { - for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { - pte_t entry; - swp_entry_t swp_entry; - - if (write) - swp_entry = make_writable_migration_entry( - page_to_pfn(page + i)); - else if (anon_exclusive) - swp_entry = make_readable_exclusive_migration_entry( - page_to_pfn(page + i)); - else - swp_entry = make_readable_migration_entry( - page_to_pfn(page + i)); - if (young) - swp_entry = make_migration_entry_young(swp_entry); - if (dirty) - swp_entry = make_migration_entry_dirty(swp_entry); - entry = swp_entry_to_pte(swp_entry); - if (soft_dirty) - entry = pte_swp_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_swp_mkuffd_wp(entry); - - VM_WARN_ON(!pte_none(ptep_get(pte + i))); - set_pte_at(mm, addr, pte + i, entry); - } - } else { - pte_t entry; - - entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); - if (write) - entry = pte_mkwrite(entry, vma); - if (!young) - entry = pte_mkold(entry); - /* NOTE: this may set soft-dirty too on some archs */ - if (dirty) - entry = pte_mkdirty(entry); - if (soft_dirty) - entry = pte_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_mkuffd_wp(entry); - - for (i = 0; i < HPAGE_PMD_NR; i++) - VM_WARN_ON(!pte_none(ptep_get(pte + i))); - - set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); - } - pte_unmap(pte); - - if (!pmd_migration) - folio_remove_rmap_pmd(folio, page, vma); - if (freeze) - put_page(page); - - smp_wmb(); /* make pte visible before pmd */ - pmd_populate(mm, pmd, pgtable); -} - -void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, - pmd_t *pmd, bool freeze, struct folio *folio) -{ - VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); - VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); - VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); - VM_BUG_ON(freeze && !folio); - - /* - * When the caller requests to set up a migration entry, we - * require a folio to check the PMD against. Otherwise, there - * is a risk of replacing the wrong folio. 
- */ - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || - is_pmd_migration_entry(*pmd)) { - if (folio && folio != pmd_folio(*pmd)) - return; - __split_huge_pmd_locked(vma, pmd, address, freeze); - } -} - -void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio) -{ - spinlock_t *ptl; - struct mmu_notifier_range range; - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, - address & HPAGE_PMD_MASK, - (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - ptl = pmd_lock(vma->vm_mm, pmd); - split_huge_pmd_locked(vma, range.start, pmd, freeze, folio); - spin_unlock(ptl); - mmu_notifier_invalidate_range_end(&range); -} - -void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct folio *folio) -{ - pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); - - if (!pmd) - return; - - __split_huge_pmd(vma, pmd, address, freeze, folio); -} - static inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned long address) { /* @@ -3772,100 +2748,3 @@ static int __init split_huge_pages_debugfs(void) late_initcall(split_huge_pages_debugfs); #endif -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION -int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, - struct page *page) -{ - struct folio *folio = page_folio(page); - struct vm_area_struct *vma = pvmw->vma; - struct mm_struct *mm = vma->vm_mm; - unsigned long address = pvmw->address; - bool anon_exclusive; - pmd_t pmdval; - swp_entry_t entry; - pmd_t pmdswp; - - if (!(pvmw->pmd && !pvmw->pte)) - return 0; - - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); - pmdval = pmdp_invalidate(vma, address, pvmw->pmd); - - /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. 
*/ - anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); - if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { - set_pmd_at(mm, address, pvmw->pmd, pmdval); - return -EBUSY; - } - - if (pmd_dirty(pmdval)) - folio_mark_dirty(folio); - if (pmd_write(pmdval)) - entry = make_writable_migration_entry(page_to_pfn(page)); - else if (anon_exclusive) - entry = make_readable_exclusive_migration_entry(page_to_pfn(page)); - else - entry = make_readable_migration_entry(page_to_pfn(page)); - if (pmd_young(pmdval)) - entry = make_migration_entry_young(entry); - if (pmd_dirty(pmdval)) - entry = make_migration_entry_dirty(entry); - pmdswp = swp_entry_to_pmd(entry); - if (pmd_soft_dirty(pmdval)) - pmdswp = pmd_swp_mksoft_dirty(pmdswp); - if (pmd_uffd_wp(pmdval)) - pmdswp = pmd_swp_mkuffd_wp(pmdswp); - set_pmd_at(mm, address, pvmw->pmd, pmdswp); - folio_remove_rmap_pmd(folio, page, vma); - folio_put(folio); - trace_set_migration_pmd(address, pmd_val(pmdswp)); - - return 0; -} - -void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) -{ - struct folio *folio = page_folio(new); - struct vm_area_struct *vma = pvmw->vma; - struct mm_struct *mm = vma->vm_mm; - unsigned long address = pvmw->address; - unsigned long haddr = address & HPAGE_PMD_MASK; - pmd_t pmde; - swp_entry_t entry; - - if (!(pvmw->pmd && !pvmw->pte)) - return; - - entry = pmd_to_swp_entry(*pvmw->pmd); - folio_get(folio); - pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); - if (pmd_swp_soft_dirty(*pvmw->pmd)) - pmde = pmd_mksoft_dirty(pmde); - if (is_writable_migration_entry(entry)) - pmde = pmd_mkwrite(pmde, vma); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde = pmd_mkuffd_wp(pmde); - if (!is_migration_entry_young(entry)) - pmde = pmd_mkold(pmde); - /* NOTE: this may contain setting soft-dirty on some archs */ - if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) - pmde = pmd_mkdirty(pmde); - - if (folio_test_anon(folio)) { - rmap_t rmap_flags = RMAP_NONE; - - if (!is_readable_migration_entry(entry)) - rmap_flags |= RMAP_EXCLUSIVE; - - folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); - } else { - folio_add_file_rmap_pmd(folio, new, vma); - } - VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); - set_pmd_at(mm, haddr, pvmw->pmd, pmde); - - /* No need to invalidate - it was non-present before */ - update_mmu_cache_pmd(vma, address, pvmw->pmd); - trace_remove_migration_pmd(address, pmd_val(pmde)); -} -#endif From patchwork Wed Jul 17 22:02:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13735862 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9663AC3DA5D for ; Wed, 17 Jul 2024 22:03:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; 
bh=vZZaRY0sEMOIar5EFA39ZfILbWs7o64kSIDuRG3wCnA=; b=hkJto+Iw2HudN5 JXkcz6Gr8BxEtkbQiBCDiujxZxrKEi+RuwQ0RYMPu+b4/LtlX004ZgHzy2K2ITQ8GyGiIuzWcZ6Xk +d/DwH6Umi9Ai3LEGh0AuWjrKn+KsevtEbwxP7jX2ycYngsENQOjdwRd6RlQZjjsscloct2MqOGfN sHvAaQGuU8UrHpu2SGHPms19b95wy+V/gPBJqDEQSTNO+eJf1pIE0xNSLl+lzR6WChR+OiPp8ji6E 2Iv7s9guEkDXrlyPp+zLZ88DGZ5o0RnQYhiQlaTwk0RoQTvcI04IYgjCARvmuB5Ji55FuLmMLkd8T OX+eeRmJ2sn9VNAzwIfw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sUCk5-0000000F3TD-40tE; Wed, 17 Jul 2024 22:03:21 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sUCjS-0000000F3FP-1aEs for linux-riscv@bombadil.infradead.org; Wed, 17 Jul 2024 22:02:42 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:Content-Transfer-Encoding :MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Sender:Reply-To:Content-ID:Content-Description; bh=cM3uv1r2lWcW+xf+zoFP/LGFvU433fJ+6ETuZi6JTtc=; b=YGFyMMP/EApp6Dm9b8oPVyp0hJ DGe9B7qiWdI480t2Wb7rZJcv6ggexL29wV42N9k6dZxTUcJnVUfztsfLb/8A7HI9rLf0Hc3PieBUx hu1NR6vMreLwjMuV88A/ttnO1Ot+poC91SrTE8Dvhmkh1+Q3el2tpWxuplz+EThhkwkZEYs3vsgfW HZWw1vmb2jhqTsuHzV+wNSv6qDKfgnGp5tzpr7UTB7I5WpzoO4ltoh2Ox976oZ7rlWpmtcSpNGg1e Lr9itW9++cdDmmTC497B2k83QVPdmMwyihW6F9mKRo8bTyIpaCufc90UcX62986rJI4zIntX44Nuh VZfu+R5w==; Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by desiato.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sUCjO-00000002Q2w-1QNG for linux-riscv@lists.infradead.org; Wed, 17 Jul 2024 22:02:41 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721253757; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cM3uv1r2lWcW+xf+zoFP/LGFvU433fJ+6ETuZi6JTtc=; b=X4x4G6uceckUzUEFGydtnyMyFkRjLVoqyUUMh/QrfNuBPjzC9RjvYSOEIxX7VF0qrmHG1L RU6fyidvyYkpp1Lt2j+kEsYvydnRlVukrOufvWQQ/KI4aMiJNOYvm9RwuSVL2eDTGweXnt ICK1+wdNZr4ufiu3xQP6BO7pWI//sbg= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721253757; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cM3uv1r2lWcW+xf+zoFP/LGFvU433fJ+6ETuZi6JTtc=; b=X4x4G6uceckUzUEFGydtnyMyFkRjLVoqyUUMh/QrfNuBPjzC9RjvYSOEIxX7VF0qrmHG1L RU6fyidvyYkpp1Lt2j+kEsYvydnRlVukrOufvWQQ/KI4aMiJNOYvm9RwuSVL2eDTGweXnt ICK1+wdNZr4ufiu3xQP6BO7pWI//sbg= Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-90-pJQZUHekMLimCbSNKGjinw-1; Wed, 17 Jul 2024 18:02:35 -0400 X-MC-Unique: pJQZUHekMLimCbSNKGjinw-1 Received: by mail-qt1-f199.google.com with SMTP id d75a77b69052e-447e34d7437so240561cf.3 for ; Wed, 17 Jul 2024 15:02:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721253755; x=1721858555; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc 
:subject:date:message-id:reply-to; bh=cM3uv1r2lWcW+xf+zoFP/LGFvU433fJ+6ETuZi6JTtc=; b=c0WMZm5F5zd/2a6nPCTY2hWjXDZ3is65tdK0e5zrX/9GB4ZLoeDI9k+/qiB5c9e46s 95579+i3g6Zqw81BMDFO+kemJ4EH7NO//Li9Nw2yibdADw5+bCwwsVXTZDn0A4aN7Wpi 8x5sdiZUWHvR47QPiCv1t2Bg9K75wdFwwPHgH+NQt8iV4XqFtXan7ohmGbylqg0EQeUu qRj8t4MTs9kk0wav+q5YecFXtWDPGjeTSYtK7x/T7AMz4YDuxW5TRhJxhOjWj/0CJGp1 Ln6ueuR/xpjH+5ykchQvNP7eJ84vLl2nclgsGEHyGG+PAqWXwu1sv48Q52pwieChMR8S eJZA== X-Forwarded-Encrypted: i=1; AJvYcCVo+LPPGBWK4P8+kYms6DGZ3rx0CIM/njcrbtf+oIU4M6jbsP+Yks0u94xQr7TkU1gfy8kG07tQm4//UQ==@lists.infradead.org X-Gm-Message-State: AOJu0Yz3aUhiZNnVBAwGb77ziRvLQ/ILpNkmFHCyC9kw3NHD9gN3DLJ+ igy/+/Nmyi1y1L0pgmj7lXGMcM4MRiw75UQSdPVaoisRA+TY/XoGVWCI/Eef4WXt3KheApbD/Z7 OhVOFXELK2GHIU2fMPjxqJBgEmf7DMZFwcHrWmeEPvrtDBPlJWf4Fns1Ot5AYAGfuXg== X-Received: by 2002:a05:622a:19a8:b0:446:5a29:c501 with SMTP id d75a77b69052e-44f864afa6cmr22373011cf.1.1721253755329; Wed, 17 Jul 2024 15:02:35 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH8QsOTDoliMPT3fo+/7Outqv1nF7W8rzQA7PybYEjXwyS1q7X3t8IYBAe2znVX4XcYe4OF/A== X-Received: by 2002:a05:622a:19a8:b0:446:5a29:c501 with SMTP id d75a77b69052e-44f864afa6cmr22372781cf.1.1721253754848; Wed, 17 Jul 2024 15:02:34 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-44f5b83f632sm53071651cf.85.2024.07.17.15.02.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jul 2024 15:02:34 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Vlastimil Babka , peterx@redhat.com, David Hildenbrand , Oscar Salvador , linux-s390@vger.kernel.org, Andrew Morton , Matthew Wilcox , Dan Williams , Michal Hocko , linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org, Alex Williamson , Jason Gunthorpe , x86@kernel.org, Alistair Popple , linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, Ryan Roberts , Hugh Dickins , Axel Rasmussen Subject: [PATCH RFC 6/6] mm: Convert "*_trans_huge() || *_devmap()" to use *_leaf() Date: Wed, 17 Jul 2024 18:02:19 -0400 Message-ID: <20240717220219.3743374-7-peterx@redhat.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com> References: <20240717220219.3743374-1-peterx@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240717_230238_672156_2089E88B X-CRM114-Status: GOOD ( 25.92 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org This patch converted all such checks into one *_leaf() check under common mm/, as "thp+devmap" should compose everything for a *_leaf() for now. I didn't yet touch arch code in other directories, as some arch may need some special attention, so I left those separately. It should start to save some cycles on such check and pave way for the new leaf types. E.g., when a new type of leaf is introduced, it'll naturally go the same route to what we have now for thp+devmap. Here one issue with pxx_leaf() API is that such API will be defined by arch but it doesn't consider kernel config. For example, below "if" branch cannot be automatically optimized: if (pmd_leaf()) { ... 
} Even if both THP and HUGETLB are disabled (so that pmd_leaf() can never return true), the compiler cannot eliminate that branch on its own. To give compilers a chance to optimize and omit such code when possible, introduce light wrappers for these helpers, named pxx_is_leaf(). The wrappers take the kernel config into account and allow branches to be omitted when the compiler knows they will always be false. This mimics what we used to have with pxx_trans_huge() when !THP, and now applies the same treatment to the pxx_leaf() API. Cc: Alistair Popple Cc: Dan Williams Cc: Jason Gunthorpe Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 6 +++--- include/linux/pgtable.h | 30 +++++++++++++++++++++++++++++- mm/hmm.c | 4 ++-- mm/huge_mapping_pmd.c | 9 +++------ mm/huge_mapping_pud.c | 6 +++--- mm/mapping_dirty_helpers.c | 4 ++-- mm/memory.c | 14 ++++++-------- mm/migrate_device.c | 2 +- mm/mprotect.c | 4 ++-- mm/mremap.c | 5 ++--- mm/page_vma_mapped.c | 5 ++--- mm/pgtable-generic.c | 7 +++---- 12 files changed, 58 insertions(+), 38 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index aea2784df8ef..a5b026d0731e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -27,7 +27,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); static inline spinlock_t * pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { - if (pud_trans_huge(*pud) || pud_devmap(*pud)) + if (pud_is_leaf(*pud)) return __pud_trans_huge_lock(pud, vma); else return NULL; @@ -36,7 +36,7 @@ pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) #define split_huge_pud(__vma, __pud, __address) \ do { \ pud_t *____pud = (__pud); \ - if (pud_trans_huge(*____pud) || pud_devmap(*____pud)) \ + if (pud_is_leaf(*____pud)) \ __split_huge_pud(__vma, __pud, __address); \ } while (0) #else /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ @@ -125,7 +125,7 @@ static inline int is_swap_pmd(pmd_t pmd) static inline spinlock_t * pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) { - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + if (is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)) return __pmd_trans_huge_lock(pmd, vma); else return NULL; diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5e505373b113..af7709a132aa 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1641,7 +1641,7 @@ static inline int pud_trans_unstable(pud_t *pud) defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) pud_t pudval = READ_ONCE(*pud); - if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) + if (pud_none(pudval) || pud_leaf(pudval)) return 1; if (unlikely(pud_bad(pudval))) { pud_clear_bad(pud); @@ -1901,6 +1901,34 @@ typedef unsigned int pgtbl_mod_mask; #define pmd_leaf(x) false #endif +/* + * Wrapper of pxx_leaf() helpers. + * + * Comparing to pxx_leaf() API, the only difference is: using these macros + * can help code generation, so unnecessary code can be omitted when the + * specific level of leaf is not possible due to kernel config. It is + * needed because normally pxx_leaf() can be defined in arch code without + * knowing the kernel config. + * + * Currently we only need pmd/pud versions, because the largest leaf Linux + * supports so far is pud. + * + * Defining here also means that in arch's pgtable headers these macros + * cannot be used, pxx_leaf()s need to be used instead, because this file + * will not be included in arch's pgtable headers.
+ */ +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES +#define pmd_is_leaf(x) pmd_leaf(x) +#else +#define pmd_is_leaf(x) false +#endif + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +#define pud_is_leaf(x) pud_leaf(x) +#else +#define pud_is_leaf(x) false +#endif + #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) #endif diff --git a/mm/hmm.c b/mm/hmm.c index 7e0229ae4a5a..8d985bbbfee9 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -351,7 +351,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR); } - if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) { + if (pmd_is_leaf(pmd)) { /* * No need to take pmd_lock here, even if some other thread * is splitting the huge pmd we will get that event through @@ -362,7 +362,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, * values. */ pmd = pmdp_get_lockless(pmdp); - if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd)) + if (!pmd_is_leaf(pmd)) goto again; return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd); diff --git a/mm/huge_mapping_pmd.c b/mm/huge_mapping_pmd.c index 7b85e2a564d6..d30c60685f66 100644 --- a/mm/huge_mapping_pmd.c +++ b/mm/huge_mapping_pmd.c @@ -60,8 +60,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) spinlock_t *ptl; ptl = pmd_lock(vma->vm_mm, pmd); - if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || - pmd_devmap(*pmd))) + if (likely(is_swap_pmd(*pmd) || pmd_is_leaf(*pmd))) return ptl; spin_unlock(ptl); return NULL; @@ -627,8 +626,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) && - !pmd_devmap(*pmd)); + VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_is_leaf(*pmd)); #ifdef CONFIG_TRANSPARENT_HUGEPAGE count_vm_event(THP_SPLIT_PMD); @@ -845,8 +843,7 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, * require a folio to check the PMD against. Otherwise, there * is a risk of replacing the wrong folio. 
*/ - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || - is_pmd_migration_entry(*pmd)) { + if (pmd_is_leaf(*pmd) || is_pmd_migration_entry(*pmd)) { if (folio && folio != pmd_folio(*pmd)) return; __split_huge_pmd_locked(vma, pmd, address, freeze); diff --git a/mm/huge_mapping_pud.c b/mm/huge_mapping_pud.c index c3a6bffe2871..58871dd74df2 100644 --- a/mm/huge_mapping_pud.c +++ b/mm/huge_mapping_pud.c @@ -57,7 +57,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) spinlock_t *ptl; ptl = pud_lock(vma->vm_mm, pud); - if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) + if (likely(pud_is_leaf(*pud))) return ptl; spin_unlock(ptl); return NULL; @@ -90,7 +90,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pud = *src_pud; - if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) + if (unlikely(!pud_leaf(pud))) goto out_unlock; /* @@ -225,7 +225,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); mmu_notifier_invalidate_range_start(&range); ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + if (unlikely(!pud_is_leaf(*pud))) goto out; __split_huge_pud_locked(vma, pud, range.start); diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c index 2f8829b3541a..a9ea767d2d73 100644 --- a/mm/mapping_dirty_helpers.c +++ b/mm/mapping_dirty_helpers.c @@ -129,7 +129,7 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end, pmd_t pmdval = pmdp_get_lockless(pmd); /* Do not split a huge pmd, present or migrated */ - if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) { + if (pmd_is_leaf(pmdval)) { WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval)); walk->action = ACTION_CONTINUE; } @@ -152,7 +152,7 @@ static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned long end, pud_t pudval = READ_ONCE(*pud); /* Do not split a huge pud */ - if (pud_trans_huge(pudval) || pud_devmap(pudval)) { + if (pud_is_leaf(pudval)) { WARN_ON(pud_write(pudval) || pud_dirty(pudval)); walk->action = ACTION_CONTINUE; } diff --git a/mm/memory.c b/mm/memory.c index 126ee0903c79..6dc92c514bb7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1235,8 +1235,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, src_pmd = pmd_offset(src_pud, addr); do { next = pmd_addr_end(addr, end); - if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd) - || pmd_devmap(*src_pmd)) { + if (is_swap_pmd(*src_pmd) || pmd_is_leaf(*src_pmd)) { int err; VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma); err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd, @@ -1272,7 +1271,7 @@ copy_pud_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, src_pud = pud_offset(src_p4d, addr); do { next = pud_addr_end(addr, end); - if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) { + if (pud_is_leaf(*src_pud)) { int err; VM_BUG_ON_VMA(next-addr != HPAGE_PUD_SIZE, src_vma); @@ -1710,7 +1709,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb, pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { + if (is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)) { if (next - addr != HPAGE_PMD_SIZE) __split_huge_pmd(vma, pmd, addr, false, NULL); else if (zap_huge_pmd(tlb, vma, pmd, addr)) { @@ -1752,7 +1751,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb, pud = pud_offset(p4d, addr); do { next = pud_addr_end(addr, end); - 
if (pud_trans_huge(*pud) || pud_devmap(*pud)) { + if (pud_is_leaf(*pud)) { if (next - addr != HPAGE_PUD_SIZE) { mmap_assert_locked(tlb->mm); split_huge_pud(vma, pud, addr); @@ -5605,8 +5604,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, pud_t orig_pud = *vmf.pud; barrier(); - if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) { - + if (pud_is_leaf(orig_pud)) { /* * TODO once we support anonymous PUDs: NUMA case and * FAULT_FLAG_UNSHARE handling. @@ -5646,7 +5644,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, pmd_migration_entry_wait(mm, vmf.pmd); return 0; } - if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) { + if (pmd_is_leaf(vmf.orig_pmd)) { if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) return do_huge_pmd_numa_page(&vmf); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 6d66dc1c6ffa..1fbeee9619c8 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -596,7 +596,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, pmdp = pmd_alloc(mm, pudp, addr); if (!pmdp) goto abort; - if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp)) + if (pmd_leaf(*pmdp)) goto abort; if (pte_alloc(mm, pmdp)) goto abort; diff --git a/mm/mprotect.c b/mm/mprotect.c index 694f13b83864..ddfee216a02b 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -381,7 +381,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb, goto next; _pmd = pmdp_get_lockless(pmd); - if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) { + if (is_swap_pmd(_pmd) || pmd_is_leaf(_pmd)) { if ((next - addr != HPAGE_PMD_SIZE) || pgtable_split_needed(vma, cp_flags)) { __split_huge_pmd(vma, pmd, addr, false, NULL); @@ -452,7 +452,7 @@ static inline long change_pud_range(struct mmu_gather *tlb, mmu_notifier_invalidate_range_start(&range); } - if (pud_leaf(pud)) { + if (pud_is_leaf(pud)) { if ((next - addr != PUD_SIZE) || pgtable_split_needed(vma, cp_flags)) { __split_huge_pud(vma, pudp, addr); diff --git a/mm/mremap.c b/mm/mremap.c index e7ae140fc640..f5c9884ea1f8 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -587,7 +587,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma, new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr); if (!new_pud) break; - if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) { + if (pud_is_leaf(*old_pud)) { if (extent == HPAGE_PUD_SIZE) { move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr, old_pud, new_pud, need_rmap_locks); @@ -609,8 +609,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma, if (!new_pmd) break; again: - if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || - pmd_devmap(*old_pmd)) { + if (is_swap_pmd(*old_pmd) || pmd_is_leaf(*old_pmd)) { if (extent == HPAGE_PMD_SIZE && move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr, old_pmd, new_pmd, need_rmap_locks)) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index ae5cc42aa208..891bea8062d2 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -235,8 +235,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) */ pmde = pmdp_get_lockless(pvmw->pmd); - if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) || - (pmd_present(pmde) && pmd_devmap(pmde))) { + if (pmd_is_leaf(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); pmde = *pvmw->pmd; if (!pmd_present(pmde)) { @@ -251,7 +250,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return not_found(pvmw); return true; } - if (likely(pmd_trans_huge(pmde) || pmd_devmap(pmde))) { + if 
(likely(pmd_is_leaf(pmde))) { if (pvmw->flags & PVMW_MIGRATION) return not_found(pvmw); if (!check_pmd(pmd_pfn(pmde), pvmw)) diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index e9fc3f6774a6..c7b7a803f4ad 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -139,8 +139,7 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, { pmd_t pmd; VM_BUG_ON(address & ~HPAGE_PMD_MASK); - VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) && - !pmd_devmap(*pmdp)); + VM_BUG_ON(pmd_present(*pmdp) && !pmd_leaf(*pmdp)); pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp); flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return pmd; @@ -247,7 +246,7 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, pud_t pud; VM_BUG_ON(address & ~HPAGE_PUD_MASK); - VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); + VM_BUG_ON(!pud_leaf(*pudp)); pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); return pud; @@ -293,7 +292,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp) *pmdvalp = pmdval; if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval))) goto nomap; - if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval))) + if (unlikely(pmd_leaf(pmdval))) goto nomap; if (unlikely(pmd_bad(pmdval))) { pmd_clear_bad(pmd);