From patchwork Fri Nov 24 13:26:15 2023
X-Patchwork-Submitter: David Hildenbrand <david@redhat.com>
X-Patchwork-Id: 13467663
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds,
 Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi,
 Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon,
 Waiman Long, "Paul E. McKenney"
Subject: [PATCH WIP v1 10/20] mm/memory: COW reuse support for PTE-mapped
 THP with rmap IDs
Date: Fri, 24 Nov 2023 14:26:15 +0100
Message-ID: <20231124132626.235350-11-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>
MIME-Version: 1.0

For now, we only end up reusing small folios and PMD-mapped large folios
(i.e., THP) after fork(); PTE-mapped THPs are never reused, except when
only a single page of the folio remains mapped. Instead, we end up
copying each subpage even though the THP might be exclusive to the MM.

The logic we're using for small folios and PMD-mapped THPs is the
following: Is the only reference to the folio from a single page table
mapping? Then:
 (a) There are no other references to the folio from other MMs
     (e.g., page table mapping, GUP).
 (b) There are no other references to the folio from page migration/
     swapout/swapcache that might temporarily unmap the folio.
Consequently, the folio is exclusive to that process and can be reused.

In that case, we end up with folio_ref_count(folio) == 1 and an implied
folio_mapcount(folio) == 1, while holding the page table lock and the
page lock to protect against possible races.
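For reference, that existing reuse check has roughly the following shape
(a condensed sketch of the current wp_can_reuse_anon_folio() slow path,
eliding the LRU cache draining and the early bail-out heuristics; not
the verbatim mainline code):

	if (!folio_trylock(folio))
		return false;
	if (folio_test_swapcache(folio))
		/* Drop the swapcache reference, if any. */
		folio_free_swap(folio);
	if (folio_test_ksm(folio) || folio_ref_count(folio) != 1) {
		folio_unlock(folio);
		return false;
	}
	/* The single remaining reference stems from our PTE: reuse. */
	folio_move_anon_rmap(folio, vma);
	folio_unlock(folio);
	return true;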
For PTE-mapped THP, however, we have not one but multiple references
from page tables, whereby such THPs can be mapped into multiple page
tables in the MM. Reusing the logic that we use for small folios and
PMD-mapped THPs means that, when reusing a PTE-mapped THP, we want to
make sure that:
 (1) All folio references are from page table mappings.
 (2) All page table mappings belong to the same MM.
 (3) We didn't race with (un)mapping of the page related to other page
     tables, such that the mapcount and refcount are stable.

For (1), we can check folio_ref_count(folio) == folio_mapcount(folio).
For (2) and (3), we can use our new rmap ID infrastructure.

We won't bother with the swapcache and LRU cache for now.

Add some sanity checks under CONFIG_DEBUG_VM, to identify any obvious
problems early.
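Putting (1) through (3) together, the reuse check for a PTE-mapped THP
then has the following shape (a condensed sketch of the hunk below;
raw_read_atomic_seqcount(), ATOMIC_SEQCOUNT_WRITERS_MASK and
__folio_has_large_matching_rmap_val() are introduced by earlier patches
in this series):

	seq = raw_read_atomic_seqcount(&folio->_rmap_atomic_seqcount);
	if (seq & ATOMIC_SEQCOUNT_WRITERS_MASK)
		return false;		/* (un)mapping in flight */
	mapcount = folio_mapcount(folio);
	if (folio_ref_count(folio) != mapcount)
		return false;		/* (1) not all refs are mappings */
	if (!__folio_has_large_matching_rmap_val(folio, mapcount, vma->vm_mm))
		return false;		/* (2) not all mappings are ours */
	if (raw_read_atomic_seqcount_retry(&folio->_rmap_atomic_seqcount, seq))
		return false;		/* (3) raced with (un)mapping */
	return true;			/* exclusive to this MM: reuse */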
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 5048d58d6174..fb533995ff68 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3360,6 +3360,95 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
 static bool wp_can_reuse_anon_folio(struct folio *folio,
 				    struct vm_area_struct *vma)
 {
+#ifdef CONFIG_RMAP_ID
+	if (folio_test_large(folio)) {
+		bool retried = false;
+		unsigned long start;
+		int mapcount, i;
+
+		/*
+		 * The assumption for anonymous folios is that each page can
+		 * only get mapped once into a MM. This also holds for
+		 * small folios -- except when KSM is involved. KSM does
+		 * currently not apply to large folios.
+		 *
+		 * Further, each taken mapcount must be paired with exactly one
+		 * taken reference, whereby references must be incremented
+		 * before the mapcount when mapping a page, and references must
+		 * be decremented after the mapcount when unmapping a page.
+		 *
+		 * So if all references to a folio are from mappings, and all
+		 * mappings are due to our (MM) page tables, and there was no
+		 * concurrent (un)mapping, this folio is certainly exclusive.
+		 *
+		 * We currently don't optimize for:
+		 * (a) folio is mapped into multiple page tables in this
+		 *     MM (e.g., mremap) and other page tables are
+		 *     concurrently (un)mapping the folio.
+		 * (b) the folio is in the swapcache. Likely the other PTEs
+		 *     are still swap entries and folio_free_swap() would fail.
+		 * (c) the folio is in the LRU cache.
+		 */
+retry:
+		start = raw_read_atomic_seqcount(&folio->_rmap_atomic_seqcount);
+		if (start & ATOMIC_SEQCOUNT_WRITERS_MASK)
+			return false;
+		mapcount = folio_mapcount(folio);
+
+		/* Is this folio possibly exclusive ... */
+		if (mapcount > folio_nr_pages(folio) || folio_entire_mapcount(folio))
+			return false;
+
+		/* ... and are all references from mappings ... */
+		if (folio_ref_count(folio) != mapcount)
+			return false;
+
+		/* ... and do all mappings belong to us ... */
+		if (!__folio_has_large_matching_rmap_val(folio, mapcount, vma->vm_mm))
+			return false;
+
+		/* ... and was there no concurrent (un)mapping ? */
+		if (raw_read_atomic_seqcount_retry(&folio->_rmap_atomic_seqcount,
+						   start))
+			return false;
+
+		/* Safety checks we might want to drop in the future. */
+		if (IS_ENABLED(CONFIG_DEBUG_VM)) {
+			unsigned int mapcount;
+
+			if (WARN_ON_ONCE(folio_test_ksm(folio)))
+				return false;
+			/*
+			 * We might have raced against swapout code adding
+			 * the folio to the swapcache (which, by itself, is not
+			 * problematic). Let's simply check again if we would
+			 * properly detect the additional reference now and
+			 * properly fail.
+			 */
+			if (unlikely(folio_test_swapcache(folio))) {
+				if (WARN_ON_ONCE(retried))
+					return false;
+				retried = true;
+				goto retry;
+			}
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				mapcount = page_mapcount(folio_page(folio, i));
+				if (WARN_ON_ONCE(mapcount > 1))
+					return false;
+			}
+		}
+
+		/*
+		 * This folio is exclusive to us. Do we need the page lock?
+		 * Likely not, and a trylock would be unfortunate if this
+		 * folio is mapped into multiple page tables and we get
+		 * concurrent page faults. If there would be references from
+		 * page migration/swapout/swapcache, we would have detected
+		 * an additional reference and never ended up here.
+		 */
+		return true;
+	}
+#endif /* CONFIG_RMAP_ID */
 	/*
 	 * We have to verify under folio lock: these early checks are
 	 * just an optimization to avoid locking the folio and freeing