From patchwork Tue Mar 26 18:50:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13604897
From: Kairui Song
To: linux-mm@kvack.org
Cc: "Huang, Ying", Chris Li, Minchan Kim, Barry Song, Ryan Roberts,
	Yu Zhao, SeongJae Park, David Hildenbrand, Yosry Ahmed,
	Johannes Weiner, Matthew Wilcox, Nhat Pham, Chengming Zhou,
	Andrew Morton, linux-kernel@vger.kernel.org, Kairui Song
Subject: [RFC PATCH 04/10] mm/swap: remove cache bypass swapin
Date: Wed, 27 Mar 2024 02:50:26 +0800
Message-ID: <20240326185032.72159-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240326185032.72159-1-ryncsn@gmail.com>
References: <20240326185032.72159-1-ryncsn@gmail.com>
Reply-To: Kairui Song
From: Kairui Song

We used to have the cache bypass swapin path for better performance, but
removing it allows more optimizations to be applied, giving better overall
performance and less hackish code. These optimizations are not easily
doable, or not doable at all, while the bypass path is still in place.

This patch simply removes it. Performance drops heavily for simple swapin;
real workloads won't regress this much, but the slowdown is still
observable. Following commits will fix this and achieve better performance.

Swapout/in 30G zero pages from ZRAM (this mostly measures the overhead of
the swap path itself, because zero pages are not compressed but simply
recorded in ZRAM, and performance drops more as the SWAP device gets full):

Test result of sequential swapin/out (a rough userspace sketch of this kind
of test is appended after the diff for reference):

                Before (us)    After (us)
Swapout:        33619409       33624641
Swapin:         32393771       41614858 (-28.4%)
Swapout (THP):   7817909        7795530
Swapin (THP):   32452387       41708471 (-28.4%)

Signed-off-by: Kairui Song
---
 mm/memory.c     | 18 ++++-------------
 mm/swap.h       | 10 +++++-----
 mm/swap_state.c | 53 ++++++++++---------------------------------
 mm/swapfile.c   | 13 ------------
 4 files changed, 19 insertions(+), 75 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dfdb620a9123..357d239ee2f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3932,7 +3932,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	struct page *page;
 	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
-	bool need_clear_cache = false;
 	bool exclusive = false;
 	swp_entry_t entry;
 	pte_t pte;
@@ -4000,14 +3999,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/* skip swapcache and readahead */
 			folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
-			if (PTR_ERR(folio) == -EBUSY)
-				goto out;
-			need_clear_cache = true;
 		} else {
 			folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
-			swapcache = folio;
 		}
 
 		if (!folio) {
@@ -4023,6 +4017,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			goto unlock;
 		}
 
+		swapcache = folio;
 		page = folio_file_page(folio, swp_offset(entry));
 
 		/* Had to read the page from swap area: Major fault */
@@ -4187,7 +4182,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	vmf->orig_pte = pte;
 
 	/* ksm created a completely new copy */
-	if (unlikely(folio != swapcache && swapcache)) {
+	if (unlikely(folio != swapcache)) {
 		folio_add_new_anon_rmap(folio, vma, vmf->address);
 		folio_add_lru_vma(folio, vma);
 	} else {
@@ -4201,7 +4196,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
 
 	folio_unlock(folio);
-	if (folio != swapcache && swapcache) {
+	if (folio != swapcache) {
 		/*
 		 * Hold the lock to avoid the swap entry to be reused
 		 * until we take the PT lock for the pte_same() check
@@ -4227,9 +4222,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
-	/* Clear the swap cache pin for direct swapin after PTL unlock */
-	if (need_clear_cache)
-		swapcache_clear(si, entry);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4240,12 +4232,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	folio_unlock(folio);
 out_release:
 	folio_put(folio);
-	if (folio != swapcache && swapcache) {
+	if (folio != swapcache) {
 		folio_unlock(swapcache);
 		folio_put(swapcache);
 	}
-	if (need_clear_cache)
-		swapcache_clear(si, entry);
 	if (si)
 		put_swap_device(si);
 	return ret;
diff --git a/mm/swap.h b/mm/swap.h
index aee134907a70..ac9573b03432 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct folio *folio,
 void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
 		struct vm_area_struct *vma, unsigned long addr);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
@@ -100,14 +99,15 @@ static inline struct folio *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
 {
 	return NULL;
 }
-
-static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
+static inline struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+					  struct vm_fault *vmf)
 {
-	return 0;
+	return NULL;
 }
 
-static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
 {
+	return 0;
 }
 
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2a9c6bdff5ea..49ef6250f676 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -880,61 +880,28 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 }
 
 /**
- * swapin_direct - swap in folios skipping swap cache and readahead
+ * swapin_direct - swap in folios skipping readahead
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @vmf: fault information
  *
- * Returns the struct folio for entry and addr after the swap entry is read
- * in.
+ * Returns the folio for entry after it is read in.
  */
 struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 			    struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = vmf->vma;
+	struct mempolicy *mpol;
 	struct folio *folio;
-	void *shadow = NULL;
-
-	/*
-	 * Prevent parallel swapin from proceeding with
-	 * the cache flag. Otherwise, another thread may
-	 * finish swapin first, free the entry, and swapout
-	 * reusing the same entry. It's undetectable as
-	 * pte_same() returns true due to entry reuse.
-	 */
-	if (swapcache_prepare(entry)) {
-		/* Relax a bit to prevent rapid repeated page faults */
-		schedule_timeout_uninterruptible(1);
-		return ERR_PTR(-EBUSY);
-	}
-
-	/* skip swapcache */
-	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-				vma, vmf->address, false);
-	if (folio) {
-		__folio_set_locked(folio);
-		__folio_set_swapbacked(folio);
-
-		if (mem_cgroup_swapin_charge_folio(folio,
-					vma->vm_mm, GFP_KERNEL,
-					entry)) {
-			folio_unlock(folio);
-			folio_put(folio);
-			return NULL;
-		}
-		mem_cgroup_swapin_uncharge_swap(entry);
-
-		shadow = get_shadow_from_swap_cache(entry);
-		if (shadow)
-			workingset_refault(folio, shadow);
+	bool page_allocated;
+	pgoff_t ilx;
 
-		folio_add_lru(folio);
+	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
+	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
+					&page_allocated, false);
+	mpol_cond_put(mpol);
 
-		/* To provide entry to swap_read_folio() */
-		folio->swap = entry;
+	if (page_allocated)
 		swap_read_folio(folio, true, NULL);
-		folio->private = NULL;
-	}
 
 	return folio;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4dd894395a0f..ae8d3aa05df7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3389,19 +3389,6 @@ int swapcache_prepare(swp_entry_t entry)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
-{
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
-	unsigned char usage;
-
-	ci = lock_cluster_or_swap_info(si, offset);
-	usage = __swap_entry_free_locked(si, offset, SWAP_HAS_CACHE);
-	unlock_cluster_or_swap_info(si, ci);
-	if (!usage)
-		free_swap_slot(entry);
-}
-
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
 	return swap_type_to_swap_info(swp_type(entry));
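
The test program behind the numbers in the commit message is not included in
the patch. As a rough illustration only, here is a minimal userspace sketch
of a sequential swapout/swapin timing run against a ZRAM swap device; the
1 GiB size, the zero-filled anonymous mapping, and the use of MADV_PAGEOUT
to force swapout are assumptions, not part of this series:

/* swapbench.c - hypothetical sketch, not part of this patch */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT	21	/* reclaim hint, Linux 5.4+ */
#endif

#define TEST_SIZE	(1UL << 30)	/* 1 GiB here; the numbers above used 30 GiB */
#define PAGE_SZ		4096UL

static long long now_us(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1000000LL + tv.tv_usec;
}

int main(void)
{
	volatile unsigned char sink = 0;
	long long t0, t1, t2;
	unsigned char *buf;
	size_t i;

	buf = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Populate with zero pages; ZRAM records these as same-filled pages. */
	memset(buf, 0, TEST_SIZE);

	t0 = now_us();
	/* Push the whole range out to swap (assumes ZRAM swap is enabled). */
	if (madvise(buf, TEST_SIZE, MADV_PAGEOUT)) {
		perror("madvise(MADV_PAGEOUT)");
		return 1;
	}
	t1 = now_us();

	/* Sequentially fault every page back in. */
	for (i = 0; i < TEST_SIZE; i += PAGE_SZ)
		sink += buf[i];
	t2 = now_us();

	printf("swapout: %lld us, swapin: %lld us\n", t1 - t0, t2 - t1);
	return 0;
}

Built with something like "gcc -O2 -o swapbench swapbench.c" and run on a
system where ZRAM is the only enabled swap device, the two printed times
correspond to the non-THP Swapout/Swapin rows above; the THP rows would
additionally require transparent hugepages to be enabled for the mapping.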