From patchwork Wed May 22 12:57:11 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13670846
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 Vincent Donnefort, Dan Williams
Subject: [PATCH v2 1/3] mm/memory: move page_count() check into
 validate_page_before_insert()
Date: Wed, 22 May 2024 14:57:11 +0200
Message-ID: <20240522125713.775114-2-david@redhat.com>
In-Reply-To: <20240522125713.775114-1-david@redhat.com>
References: <20240522125713.775114-1-david@redhat.com>

We'll now also cover the case where insert_page() is called from
__vm_insert_mixed(), which sounds like the right thing to do.
Signed-off-by: David Hildenbrand
---
 mm/memory.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b5453b86ec4b..a3aad7e58914 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1987,6 +1987,8 @@ static int validate_page_before_insert(struct page *page)
 {
 	struct folio *folio = page_folio(page);
 
+	if (!folio_ref_count(folio))
+		return -EINVAL;
 	if (folio_test_anon(folio) || folio_test_slab(folio) ||
 	    page_has_type(page))
 		return -EINVAL;
@@ -2041,8 +2043,6 @@ static int insert_page_in_batch_locked(struct vm_area_struct *vma, pte_t *pte,
 {
 	int err;
 
-	if (!page_count(page))
-		return -EINVAL;
 	err = validate_page_before_insert(page);
 	if (err)
 		return err;
@@ -2176,8 +2176,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 {
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return -EFAULT;
-	if (!page_count(page))
-		return -EINVAL;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
 		BUG_ON(mmap_read_trylock(vma->vm_mm));
 		BUG_ON(vma->vm_flags & VM_PFNMAP);

From patchwork Wed May 22 12:57:12 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13670847
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 Vincent Donnefort, Dan Williams
Subject: [PATCH v2 2/3] mm/memory: cleanly support zeropage in
 vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()
Date: Wed, 22 May 2024 14:57:12 +0200
Message-ID: <20240522125713.775114-3-david@redhat.com>
In-Reply-To: <20240522125713.775114-1-david@redhat.com>
References: <20240522125713.775114-1-david@redhat.com>
For now we only get the (small) zeropage mapped to user space in four
cases (excluding VM_PFNMAP mappings, such as /proc/vmstat):

(1) Read page faults in anonymous VMAs (MAP_PRIVATE|MAP_ANON):
    do_anonymous_page() will not refcount it and map it pte_mkspecial().

(2) UFFDIO_ZEROPAGE on an anonymous VMA or COW mapping of shmem
    (MAP_PRIVATE). mfill_atomic_pte_zeropage() will not refcount it and
    map it pte_mkspecial().

(3) KSM in a mergeable VMA (anonymous VMA or COW mapping).
    cmp_and_merge_page() will not refcount it and map it
    pte_mkspecial().
(4) FSDAX as an optimization for holes.
    vmf_insert_mixed()->__vm_insert_mixed() might end up calling
    insert_page() without CONFIG_ARCH_HAS_PTE_SPECIAL, refcounting the
    zeropage and not mapping it pte_mkspecial(). With
    CONFIG_ARCH_HAS_PTE_SPECIAL, we'll call insert_pfn() where we will
    not refcount it and map it pte_mkspecial().

In case (4), we might not have VM_MIXEDMAP set: while fs/fuse/dax.c sets
VM_MIXEDMAP, we removed it for ext4 fsdax in commit e1fb4a086495 ("dax:
remove VM_MIXEDMAP for fsdax and device dax") and for XFS in commit
e1fb4a086495 ("dax: remove VM_MIXEDMAP for fsdax and device dax").

Without CONFIG_ARCH_HAS_PTE_SPECIAL and with VM_MIXEDMAP,
vm_normal_page() would currently return the zeropage. We'll refcount the
zeropage when mapping and when unmapping.

Without CONFIG_ARCH_HAS_PTE_SPECIAL and without VM_MIXEDMAP,
vm_normal_page() would currently refuse to return the zeropage. So we'd
refcount it when mapping but not when unmapping it ... do we have fsdax
without CONFIG_ARCH_HAS_PTE_SPECIAL in practice? Hard to tell.

Independent of that, we should never refcount the zeropage when we might
be holding that reference for a long time, because even without an
accounting imbalance we might overflow the refcount. As there is
interest in using the zeropage also in other VM_MIXEDMAP mappings, let's
add clean support for that in the cases where it makes sense:

(A) Never refcount the zeropage when mapping it:

    In insert_page(), special-case the zeropage, do not refcount it, and
    use pte_mkspecial(). Don't involve insert_pfn(); adjusting
    insert_page() looks cleaner than branching off to insert_pfn().

(B) Never refcount the zeropage when unmapping it:

    In vm_normal_page(), also don't return the zeropage in a VM_MIXEDMAP
    mapping without CONFIG_ARCH_HAS_PTE_SPECIAL. Add a VM_WARN_ON_ONCE()
    sanity check in case we'd ever return the zeropage, which could
    happen if someone forgets to set pte_mkspecial() when mapping the
    zeropage. Document that.
(C) Allow the zeropage only where reasonable:

    s390x never wants the zeropage in some processes running legacy KVM
    guests that make use of storage keys. So disallow that.

    Further, using the zeropage in COW mappings is unproblematic (just
    what we do for other COW mappings), because FAULT_FLAG_UNSHARE can
    just unshare it and GUP with FOLL_LONGTERM would work as expected.

    Similarly, mappings that can never have writable PTEs (implying no
    write faults) are also not problematic, because nothing could end up
    mapping the PTE writable by mistake later. But in case we could have
    writable PTEs, we'll only allow the zeropage in FSDAX VMAs, which
    are incompatible with GUP and are blocked there completely.

    We'll always require the zeropage to be mapped with pte_special().
    GUP-fast will reject the zeropage that way, but GUP-slow will allow
    it. (Note that GUP does not refcount the zeropage with FOLL_PIN,
    because there were issues with overflowing the refcount in the
    past.)

Add sanity checks to can_change_pte_writable() and wp_page_reuse(), to
catch early during testing if we'd ever find a zeropage unexpectedly in
code that wants to upgrade write permissions.

Convert the BUG_ON in vm_mixed_ok() to an ordinary check and simply fail
with VM_FAULT_SIGBUS, like we do for other sanity checks. Drop the stale
comment regarding reserved pages from insert_page().

Note that:
* we won't mess with VM_PFNMAP mappings for now. remap_pfn_range() and
  vmf_insert_pfn() would allow the zeropage in some cases and not
  refcount it.
* vmf_insert_pfn*() will reject the zeropage in VM_MIXEDMAP mappings,
  and we'll leave that alone for now. People can simply use one of the
  other interfaces.
* we won't bother with the huge zeropage for now. It's never PTE-mapped
  and also GUP does not special-case it yet.
Signed-off-by: David Hildenbrand
---
 mm/memory.c   | 91 +++++++++++++++++++++++++++++++++++++++------------
 mm/mprotect.c |  2 ++
 2 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a3aad7e58914..863c24f861aa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -575,10 +575,13 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
  * page" backing, however the difference is that _all_ pages with a struct
  * page (that is, those where pfn_valid is true) are refcounted and considered
- * normal pages by the VM. The disadvantage is that pages are refcounted
- * (which can be slower and simply not an option for some PFNMAP users). The
- * advantage is that we don't have to follow the strict linearity rule of
- * PFNMAP mappings in order to support COWable mappings.
+ * normal pages by the VM. The only exception are zeropages, which are
+ * *never* refcounted.
+ *
+ * The disadvantage is that pages are refcounted (which can be slower and
+ * simply not an option for some PFNMAP users). The advantage is that we
+ * don't have to follow the strict linearity rule of PFNMAP mappings in
+ * order to support COWable mappings.
  *
  */
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
@@ -616,6 +619,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	if (vma->vm_flags & VM_MIXEDMAP) {
 		if (!pfn_valid(pfn))
 			return NULL;
+		if (is_zero_pfn(pfn))
+			return NULL;
 		goto out;
 	} else {
 		unsigned long off;
@@ -641,6 +646,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	 * eg. VDSO mappings can cause them to exist.
 	 */
 out:
+	VM_WARN_ON_ONCE(is_zero_pfn(pfn));
 	return pfn_to_page(pfn);
 }
 
@@ -1983,12 +1989,48 @@ pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
 	return pte_alloc_map_lock(mm, pmd, addr, ptl);
 }
 
-static int validate_page_before_insert(struct page *page)
+static bool vm_mixed_zeropage_allowed(struct vm_area_struct *vma)
+{
+	VM_WARN_ON_ONCE(vma->vm_flags & VM_PFNMAP);
+	/*
+	 * Whoever wants to forbid the zeropage after some zeropages
+	 * might already have been mapped has to scan the page tables and
+	 * bail out on any zeropages. Zeropages in COW mappings can
+	 * be unshared using FAULT_FLAG_UNSHARE faults.
+	 */
+	if (mm_forbids_zeropage(vma->vm_mm))
+		return false;
+	/* zeropages in COW mappings are common and unproblematic. */
+	if (is_cow_mapping(vma->vm_flags))
+		return true;
+	/* Mappings that do not allow for writable PTEs are unproblematic. */
+	if (!(vma->vm_flags & (VM_WRITE | VM_MAYWRITE)))
+		return true;
+	/*
+	 * Why not allow any VMA that has vm_ops->pfn_mkwrite? GUP could
+	 * find the shared zeropage and longterm-pin it, which would
+	 * be problematic as soon as the zeropage gets replaced by a different
+	 * page due to vma->vm_ops->pfn_mkwrite, because what's mapped would
+	 * now differ to what GUP looked up. FSDAX is incompatible to
+	 * FOLL_LONGTERM and VM_IO is incompatible to GUP completely (see
+	 * check_vma_flags).
+	 */
+	return vma->vm_ops && vma->vm_ops->pfn_mkwrite &&
+	       (vma_is_fsdax(vma) || vma->vm_flags & VM_IO);
+}
+
+static int validate_page_before_insert(struct vm_area_struct *vma,
+				       struct page *page)
 {
 	struct folio *folio = page_folio(page);
 
 	if (!folio_ref_count(folio))
 		return -EINVAL;
+	if (unlikely(is_zero_folio(folio))) {
+		if (!vm_mixed_zeropage_allowed(vma))
+			return -EINVAL;
+		return 0;
+	}
 	if (folio_test_anon(folio) || folio_test_slab(folio) ||
 	    page_has_type(page))
 		return -EINVAL;
@@ -2000,24 +2042,23 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
 		unsigned long addr, struct page *page, pgprot_t prot)
 {
 	struct folio *folio = page_folio(page);
+	pte_t pteval;
 
 	if (!pte_none(ptep_get(pte)))
 		return -EBUSY;
 	/* Ok, finally just insert the thing.. */
-	folio_get(folio);
-	inc_mm_counter(vma->vm_mm, mm_counter_file(folio));
-	folio_add_file_rmap_pte(folio, page, vma);
-	set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
+	pteval = mk_pte(page, prot);
+	if (unlikely(is_zero_folio(folio))) {
+		pteval = pte_mkspecial(pteval);
+	} else {
+		folio_get(folio);
+		inc_mm_counter(vma->vm_mm, mm_counter_file(folio));
+		folio_add_file_rmap_pte(folio, page, vma);
+	}
+	set_pte_at(vma->vm_mm, addr, pte, pteval);
 	return 0;
 }
 
-/*
- * This is the old fallback for page remapping.
- *
- * For historical reasons, it only allows reserved pages. Only
- * old drivers should use this, and they needed to mark their
- * pages reserved for the old functions anyway.
- */
 static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 			struct page *page, pgprot_t prot)
 {
@@ -2025,7 +2066,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	retval = validate_page_before_insert(page);
+	retval = validate_page_before_insert(vma, page);
 	if (retval)
 		goto out;
 	retval = -ENOMEM;
@@ -2043,7 +2084,7 @@ static int insert_page_in_batch_locked(struct vm_area_struct *vma, pte_t *pte,
 {
 	int err;
 
-	err = validate_page_before_insert(page);
+	err = validate_page_before_insert(vma, page);
 	if (err)
 		return err;
 	return insert_page_into_pte_locked(vma, pte, addr, page, prot);
@@ -2149,7 +2190,8 @@ EXPORT_SYMBOL(vm_insert_pages);
 * @page: source kernel page
 *
 * This allows drivers to insert individual pages they've allocated
- * into a user vma.
+ * into a user vma. The zeropage is supported in some VMAs,
+ * see vm_mixed_zeropage_allowed().
 *
 * The page has to be a nice clean _individual_ kernel allocation.
 * If you allocate a compound page, you need to have marked it as
@@ -2193,6 +2235,8 @@ EXPORT_SYMBOL(vm_insert_page);
 * @offset: user's requested vm_pgoff
 *
 * This allows drivers to map range of kernel pages into a user vma.
+ * The zeropage is supported in some VMAs, see
+ * vm_mixed_zeropage_allowed().
 *
 * Return: 0 on success and error code otherwise.
 */
@@ -2408,8 +2452,11 @@ vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vmf_insert_pfn);
 
-static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn)
+static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn, bool mkwrite)
 {
+	if (unlikely(is_zero_pfn(pfn_t_to_pfn(pfn))) &&
+	    (mkwrite || !vm_mixed_zeropage_allowed(vma)))
+		return false;
 	/* these checks mirror the abort conditions in vm_normal_page */
 	if (vma->vm_flags & VM_MIXEDMAP)
 		return true;
@@ -2428,7 +2475,8 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
 	pgprot_t pgprot = vma->vm_page_prot;
 	int err;
 
-	BUG_ON(!vm_mixed_ok(vma, pfn));
+	if (!vm_mixed_ok(vma, pfn, mkwrite))
+		return VM_FAULT_SIGBUS;
 
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return VM_FAULT_SIGBUS;
@@ -3176,6 +3224,7 @@ static inline void wp_page_reuse(struct vm_fault *vmf, struct folio *folio)
 	pte_t entry;
 
 	VM_BUG_ON(!(vmf->flags & FAULT_FLAG_WRITE));
+	VM_WARN_ON(is_zero_pfn(pte_pfn(vmf->orig_pte)));
 
 	if (folio) {
 		VM_BUG_ON(folio_test_anon(folio) &&
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94878c39ee32..f7b7d107edf5 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -70,6 +70,8 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
 		return page && PageAnon(page) && PageAnonExclusive(page);
 	}
 
+	VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
+
 	/*
 	 * Writable MAP_SHARED mapping: "clean" might indicate that the FS still
 	 * needs a real write-fault for writenotify

From patchwork Wed May 22 12:57:13 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13670848
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 Vincent Donnefort, Dan Williams
Subject: [PATCH v2 3/3] mm/rmap: sanity check that zeropages are not
 passed to RMAP
Date: Wed, 22 May 2024 14:57:13 +0200
Message-ID: <20240522125713.775114-4-david@redhat.com>
In-Reply-To: <20240522125713.775114-1-david@redhat.com>
References: <20240522125713.775114-1-david@redhat.com>

Using insert_page() we might have previously ended up passing the
zeropage into rmap code.
Make sure that won't happen again.

Note that we won't check the huge zeropage for now, which might still
end up in RMAP code.

Signed-off-by: David Hildenbrand
---
 include/linux/rmap.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 7229b9baf20d..5cb0d419a1d7 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -200,6 +200,9 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
 	/* hugetlb folios are handled separately. */
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 
+	/* When (un)mapping zeropages, we should never touch ref+mapcount. */
+	VM_WARN_ON_FOLIO(is_zero_folio(folio), folio);
+
 	/*
 	 * TODO: we get driver-allocated folios that have nothing to do with
 	 * the rmap using vm_insert_page(); therefore, we cannot assume that