From patchwork Mon Jan 29 14:32:12 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13535737
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 Matthew Wilcox, Ryan Roberts, Catalin Marinas, Will Deacon,
 "Aneesh Kumar K.V", Nick Piggin, Peter Zijlstra, Michael Ellerman,
 Christophe Leroy, "Naveen N. Rao", Heiko Carstens, Vasily Gorbik,
 Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
 Arnd Bergmann, linux-arch@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org
Subject: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP
Date: Mon, 29 Jan 2024 15:32:12 +0100
Message-ID: <20240129143221.263763-1-david@redhat.com>

This series is based on [1] and must be applied on top of it.
Similar to what we did with fork(), let's implement PTE batching
during unmap/zap when processing PTE-mapped THPs.

We collect consecutive PTEs that map consecutive pages of the same large
folio, making sure that the other PTE bits are compatible, and (a) adjust
the refcount only once per batch, (b) call rmap handling functions only
once per batch, (c) perform batch PTE setting/updates and (d) perform TLB
entry removal once per batch.

Ryan was previously working on this in the context of cont-pte for
arm64, in the latest iteration [2] with a focus on arm64 with cont-pte
only. This series implements the optimization for all architectures,
independent of such PTE bits, teaches MMU gather/TLB code to be fully
aware of such large-folio-pages batches as well, and makes use of our new
rmap batching function when removing the rmap.

To achieve that, we have to enlighten MMU gather / page freeing code
(i.e., everything that consumes encoded_page) to process unmapping
of consecutive pages that all belong to the same large folio. I'm being
very careful to not degrade order-0 performance, and it looks like I
managed to achieve that.

While this series should -- similar to [1] -- be beneficial for adding
cont-pte support on arm64[2], it's one of the requirements for maintaining
a total mapcount[3] for large folios with minimal added overhead and
further changes[4] that build on top of the total mapcount.

Independent of all that, this series results in a speedup during munmap()
and similar unmapping (process teardown, MADV_DONTNEED on larger ranges)
with PTE-mapped THP, which is the default with THPs that are smaller than
a PMD (for example, 16KiB to 1024KiB mTHPs for anonymous memory[5]).
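
To illustrate the batching idea described above, here is a minimal
user-space sketch in C. It is not the code from this series: every name
(toy_pte, toy_folio, toy_pte_batch(), toy_zap_range()) is hypothetical,
and the choice of which flag bits count as "compatible" (dirty and young
are ignored here) is an assumption made only for the example. It models a
page table as a plain array and counts how many per-batch operations
(standing in for refcount, rmap and TLB work) a zap would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy PTE: a page frame number plus a few flag bits. */
struct toy_pte {
	uint64_t pfn;
	uint32_t flags;
	bool present;
};

#define TOY_PTE_WRITE	(1u << 0)
#define TOY_PTE_DIRTY	(1u << 1)	/* assumption: ignored when comparing */
#define TOY_PTE_YOUNG	(1u << 2)	/* assumption: ignored when comparing */

/* Hypothetical large folio: nr_pages physically consecutive pages. */
struct toy_folio {
	uint64_t first_pfn;
	unsigned int nr_pages;
};

struct toy_zap_stats {
	size_t ptes_zapped;	/* PTEs cleared in total */
	size_t batches;		/* per-batch operations performed */
};

static bool toy_pfn_in_folio(const struct toy_folio *folio, uint64_t pfn)
{
	return pfn >= folio->first_pfn &&
	       pfn - folio->first_pfn < folio->nr_pages;
}

/*
 * Return how many PTEs, starting at ptes[0], map consecutive pages of
 * @folio with compatible flags. ptes[0] must be present; the result is
 * always at least 1 and never exceeds @max_nr.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr,
			    const struct toy_folio *folio)
{
	const uint32_t ignore = TOY_PTE_DIRTY | TOY_PTE_YOUNG;
	size_t nr = 1;

	assert(max_nr >= 1 && ptes[0].present);
	if (!toy_pfn_in_folio(folio, ptes[0].pfn))
		return 1;

	while (nr < max_nr) {
		const struct toy_pte *pte = &ptes[nr];

		if (!pte->present || pte->pfn != ptes[0].pfn + nr)
			break;
		if (!toy_pfn_in_folio(folio, pte->pfn))
			break;
		if ((pte->flags & ~ignore) != (ptes[0].flags & ~ignore))
			break;
		nr++;
	}
	return nr;
}

/* Clear all present PTEs in the range, one batch at a time. */
static struct toy_zap_stats toy_zap_range(struct toy_pte *ptes, size_t nr_ptes,
					  const struct toy_folio *folio)
{
	struct toy_zap_stats stats = { 0, 0 };
	size_t i = 0;

	while (i < nr_ptes) {
		size_t nr, j;

		if (!ptes[i].present) {
			i++;
			continue;
		}
		nr = toy_pte_batch(&ptes[i], nr_ptes - i, folio);
		for (j = 0; j < nr; j++)
			ptes[i + j] = (struct toy_pte){ 0 };
		/* Refcount, rmap and TLB work would happen once here. */
		stats.batches++;
		stats.ptes_zapped += nr;
		i += nr;
	}
	return stats;
}
```

In this model, a fully PTE-mapped 64KiB folio with uniform flags (16
PTEs with 4KiB pages) is handled as a single batch instead of 16
separate steps, while an order-0 page still takes the batch-of-one path.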

On an Intel Xeon Silver 4210R CPU, munmap'ing a 1GiB VMA backed by
PTE-mapped folios of the same size (stddev < 1%) results in the following
runtimes for munmap() in seconds (shorter is better):

Folio Size | mm-unstable |      New | Change
---------------------------------------------
      4KiB |    0.058110 | 0.057715 |   - 1%
     16KiB |    0.044198 | 0.035469 |   -20%
     32KiB |    0.034216 | 0.023522 |   -31%
     64KiB |    0.029207 | 0.018434 |   -37%
    128KiB |    0.026579 | 0.014026 |   -47%
    256KiB |    0.025130 | 0.011756 |   -53%
    512KiB |    0.024292 | 0.010703 |   -56%
   1024KiB |    0.023812 | 0.010294 |   -57%
   2048KiB |    0.023785 | 0.009910 |   -58%

CCing especially s390x folks, because they have a TLB freeing hook that
needs adjustment. Only tested on x86-64 for now; I will have to do some
more stress testing. Compile-tested on most other architectures. The PPC
change is negligible and makes my cross-compiler happy.

[1] https://lkml.kernel.org/r/20240129124649.189745-1-david@redhat.com
[2] https://lkml.kernel.org/r/20231218105100.172635-1-ryan.roberts@arm.com
[3] https://lkml.kernel.org/r/20230809083256.699513-1-david@redhat.com
[4] https://lkml.kernel.org/r/20231124132626.235350-1-david@redhat.com
[5] https://lkml.kernel.org/r/20231207161211.2374093-1-ryan.roberts@arm.com

Cc: Andrew Morton
Cc: Matthew Wilcox (Oracle)
Cc: Ryan Roberts
Cc: Catalin Marinas
Cc: Will Deacon
Cc: "Aneesh Kumar K.V"
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Michael Ellerman
Cc: Christophe Leroy
Cc: "Naveen N. Rao"
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Alexander Gordeev
Cc: Christian Borntraeger
Cc: Sven Schnelle
Cc: Arnd Bergmann
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org

David Hildenbrand (9):
  mm/memory: factor out zapping of present pte into zap_present_pte()
  mm/memory: handle !page case in zap_present_pte() separately
  mm/memory: further separate anon and pagecache folio handling in
    zap_present_pte()
  mm/memory: factor out zapping folio pte into zap_present_folio_pte()
  mm/mmu_gather: pass "delay_rmap" instead of encoded page to
    __tlb_remove_page_size()
  mm/mmu_gather: define ENCODED_PAGE_FLAG_DELAY_RMAP
  mm/mmu_gather: add __tlb_remove_folio_pages()
  mm/mmu_gather: add tlb_remove_tlb_entries()
  mm/memory: optimize unmap/zap with PTE-mapped THP

 arch/powerpc/include/asm/tlb.h |   2 +
 arch/s390/include/asm/tlb.h    |  30 ++++--
 include/asm-generic/tlb.h      |  40 ++++++--
 include/linux/mm_types.h       |  37 ++++++--
 include/linux/pgtable.h        |  66 +++++++++++++
 mm/memory.c                    | 167 +++++++++++++++++++++++----------
 mm/mmu_gather.c                |  63 +++++++++++--
 mm/swap.c                      |  12 ++-
 mm/swap_state.c                |  12 ++-
 9 files changed, 347 insertions(+), 82 deletions(-)