From patchwork Mon Mar 3 16:30:06 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13999203
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    David Hildenbrand, Andrew Morton, "Matthew Wilcox (Oracle)",
    Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný, Jonathan Corbet,
    Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, Muchun Song, "Liam R. Howlett", Lorenzo Stoakes,
    Vlastimil Babka, Jann Horn
Subject: [PATCH v3 13/20] mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
Date: Mon, 3 Mar 2025 17:30:06 +0100
Message-ID: <20250303163014.1128035-14-david@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250303163014.1128035-1-david@redhat.com>
References: <20250303163014.1128035-1-david@redhat.com>

Currently, we never end up reusing PTE-mapped THPs after fork. This
wasn't really a problem with PMD-sized THPs, because they would have to
be PTE-mapped first, but it is becoming a problem with the smaller THP
sizes that are effectively always PTE-mapped.

With our new "mapped exclusively" vs. "maybe mapped shared" logic for
large folios, implementing CoW reuse for PTE-mapped THPs is
straightforward: if the folio is mapped exclusively, make sure that all
references are from these (our) mappings. Add some helpful comments to
explain the details.

CONFIG_TRANSPARENT_HUGEPAGE selects CONFIG_MM_ID. If we spot an anon
large folio without CONFIG_TRANSPARENT_HUGEPAGE in that code, something
is seriously messed up.

There are plenty of things we can optimize in the future: for example,
we could remember that the folio is fully exclusive, so we could speed
up the next fault further. Also, we could try "faulting around", turning
surrounding PTEs that map the same folio writable. But especially the
latter might increase COW latency, so it would need further
investigation.
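To see why comparing the large mapcount against the refcount is a
sufficient exclusivity test, consider the pairing invariant in
isolation. The standalone userspace C sketch below models it; every
name in the sketch is invented for illustration, and none of it is
kernel code.

#include <stdbool.h>
#include <stdio.h>

/* A toy model of a folio's reference/mapping accounting. */
struct folio_model {
	int refcount;		  /* all references: mappings, GUP, swapcache, ... */
	int mapcount;		  /* references taken by page table mappings */
	bool maybe_mapped_shared; /* sticky: may be mapped into another MM */
};

/* Mapping a page: the reference is taken before the mapcount is raised. */
static void map_page(struct folio_model *f)
{
	f->refcount++;
	f->mapcount++;
}

/* Unmapping a page: the mapcount drops before the reference is put. */
static void unmap_page(struct folio_model *f)
{
	f->mapcount--;
	f->refcount--;
}

/*
 * The core test from the patch, in miniature: if the folio cannot be
 * mapped by another MM and every reference is explained by a mapping,
 * then all mappings are ours and the folio may be reused for CoW.
 */
static bool can_reuse(const struct folio_model *f)
{
	if (f->maybe_mapped_shared)
		return false;
	return f->mapcount == f->refcount;
}

int main(void)
{
	struct folio_model f = { 0 };

	map_page(&f);
	map_page(&f);
	printf("two mappings, one MM: reuse=%d\n", can_reuse(&f)); /* 1 */

	f.refcount++;	/* an extra reference, e.g. a concurrent GUP */
	printf("extra reference:      reuse=%d\n", can_reuse(&f)); /* 0 */
	return 0;
}

The real code must additionally stabilize the two counters against
concurrent changes, which is what the folio_lock_large_mapcount()
section in the patch below is for.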
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 75 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 73b783c7d7d51..bb245a8fe04bc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3729,19 +3729,86 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
 	return ret;
 }
 
-static bool wp_can_reuse_anon_folio(struct folio *folio,
-				    struct vm_area_struct *vma)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+		struct vm_area_struct *vma)
 {
+	bool exclusive = false;
+
+	/* Let's just free up a large folio if only a single page is mapped. */
+	if (folio_large_mapcount(folio) <= 1)
+		return false;
+
 	/*
-	 * We could currently only reuse a subpage of a large folio if no
-	 * other subpages of the large folios are still mapped. However,
-	 * let's just consistently not reuse subpages even if we could
-	 * reuse in that scenario, and give back a large folio a bit
-	 * sooner.
+	 * The assumption for anonymous folios is that each page can only get
+	 * mapped once into each MM. The only exception are KSM folios, which
+	 * are always small.
+	 *
+	 * Each taken mapcount must be paired with exactly one taken reference,
+	 * whereby the refcount must be incremented before the mapcount when
+	 * mapping a page, and the refcount must be decremented after the
+	 * mapcount when unmapping a page.
+	 *
+	 * If all folio references are from mappings, and all mappings are in
+	 * the page tables of this MM, then this folio is exclusive to this MM.
 	 */
-	if (folio_test_large(folio))
+	if (folio_test_large_maybe_mapped_shared(folio))
+		return false;
+
+	VM_WARN_ON_ONCE(folio_test_ksm(folio));
+	VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
+	VM_WARN_ON_ONCE(folio_entire_mapcount(folio));
+
+	if (unlikely(folio_test_swapcache(folio))) {
+		/*
+		 * Note: freeing up the swapcache will fail if some PTEs are
+		 * still swap entries.
+		 */
+		if (!folio_trylock(folio))
+			return false;
+		folio_free_swap(folio);
+		folio_unlock(folio);
+	}
+
+	if (folio_large_mapcount(folio) != folio_ref_count(folio))
 		return false;
+
+	/* Stabilize the mapcount vs. refcount and recheck. */
+	folio_lock_large_mapcount(folio);
+	VM_WARN_ON_ONCE(folio_large_mapcount(folio) < folio_ref_count(folio));
+
+	if (folio_test_large_maybe_mapped_shared(folio))
+		goto unlock;
+	if (folio_large_mapcount(folio) != folio_ref_count(folio))
+		goto unlock;
+
+	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != vma->vm_mm->mm_id &&
+			folio_mm_id(folio, 1) != vma->vm_mm->mm_id);
+
+	/*
+	 * Do we need the folio lock? Likely not. If there would have been
+	 * references from page migration/swapout, we would have detected
+	 * an additional folio reference and never ended up here.
+	 */
+	exclusive = true;
+unlock:
+	folio_unlock_large_mapcount(folio);
+	return exclusive;
+}
+#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+		struct vm_area_struct *vma)
+{
+	BUILD_BUG();
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static bool wp_can_reuse_anon_folio(struct folio *folio,
+				    struct vm_area_struct *vma)
+{
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
+		return __wp_can_reuse_large_anon_folio(folio, vma);
+
 	/*
 	 * We have to verify under folio lock: these early checks are
 	 * just an optimization to avoid locking the folio and freeing
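A note on the !CONFIG_TRANSPARENT_HUGEPAGE stub ending in BUILD_BUG():
it is safe because wp_can_reuse_anon_folio() only calls it behind
IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE). The condition is a compile-time
constant, so the optimizer discards the call while the stub is still
parsed and type-checked. Below is a minimal standalone sketch of the
same idiom; the macro and function names are invented, and an undefined
extern stands in for BUILD_BUG(). Build with optimizations (e.g. -O2)
so the dead call is removed before linking.

#include <stdio.h>

#define MY_FEATURE_ENABLED 0	/* stand-in for IS_ENABLED(CONFIG_...) */

/* Emulate BUILD_BUG(): if this call survives dead-code elimination,
 * linking fails with an undefined reference. */
extern void my_build_bug(void);

#if MY_FEATURE_ENABLED
static int feature_path(int x)
{
	return x * 2;
}
#else /* !MY_FEATURE_ENABLED */
static int feature_path(int x)
{
	my_build_bug();	/* never reachable; must be optimized away */
	return 0;
}
#endif

static int dispatch(int x)
{
	/* Constant condition: the compiler type-checks both branches but
	 * emits only one, so the stub never reaches the linker. */
	if (MY_FEATURE_ENABLED)
		return feature_path(x);
	return x;
}

int main(void)
{
	printf("%d\n", dispatch(21));
	return 0;
}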