From patchwork Tue Jan 2 17:53:38 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13509234
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 9/9] mm/swap, shmem: use new swapin helper to skip readahead conditionally
Date: Wed, 3 Jan 2024 01:53:38 +0800
Message-ID: <20240102175338.62012-10-ryncsn@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>
Reply-To: Kairui Song
From: Kairui Song

Currently, shmem uses cluster readahead for all swap backends. Cluster
readahead is not a good solution for ramdisk-based devices (e.g. ZRAM)
at all. After switching to the new helper, most benchmarks showed a
good result:

- Single file sequential read:

  perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
  (/tmpfs/test is a zero-filled file, using brd as swap, 4G memcg limit)

  Before: 22.248 +- 0.549
  After:  22.021 +- 0.684 (-1.1%)

- Random read stress test:

  fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
    --size=256m --ioengine=mmap --rw=randread --random_distribution=random \
    --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)

  Before: 1818MiB/s
  After:  1888MiB/s (+3.85%)

- Zipf biased random read stress test:

  fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
    --size=256m --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
    --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)

  Before: 31.1GiB/s
  After:  32.3GiB/s (+3.86%)

So cluster readahead doesn't help much even for sequential reads, and
for the random stress tests, performance is better without it.

Considering that both memory and the swap device slowly grow more
fragmented over time, and that the commonly used ZRAM consumes much more
CPU than a plain ramdisk, false readahead would occur more frequently
and waste more CPU. Direct swapin is cheaper, so use the new helper and
skip readahead for SWP_SYNCHRONOUS_IO devices.

Signed-off-by: Kairui Song
---
 mm/shmem.c      | 67 +++++++++++++++++++++++++------------------------
 mm/swap.h       |  9 -------
 mm/swap_state.c | 11 ++++++--
 3 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 9da9f7a0e620..3c0729fe934d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1564,20 +1564,6 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
 static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info,
 			pgoff_t index, unsigned int order, pgoff_t *ilx);
 
-static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
-		struct shmem_inode_info *info, pgoff_t index)
-{
-	struct mempolicy *mpol;
-	pgoff_t ilx;
-	struct folio *folio;
-
-	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
-	folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
-	mpol_cond_put(mpol);
-
-	return folio;
-}
-
 /*
  * Make sure huge_gfp is always more limited than limit_gfp.
  * Some of the flags set permissions, while others set limitations.
@@ -1851,9 +1837,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	enum swap_cache_result cache_result;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
+	struct mempolicy *mpol;
 	swp_entry_t swap;
+	pgoff_t ilx;
 	int error;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
@@ -1871,36 +1860,40 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EINVAL;
 	}
 
-	/* Look it up and read it in.. */
-	folio = swap_cache_get_folio(swap, NULL, 0, NULL);
+	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
+	folio = swapin_entry_mpol(swap, gfp, mpol, ilx, &cache_result);
+	mpol_cond_put(mpol);
+
 	if (!folio) {
-		/* Or update major stats only when swapin succeeds?? */
+		error = -ENOMEM;
+		goto failed;
+	}
+	if (cache_result != SWAP_CACHE_HIT) {
 		if (fault_type) {
 			*fault_type |= VM_FAULT_MAJOR;
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
-		/* Here we actually start the io */
-		folio = shmem_swapin_cluster(swap, gfp, info, index);
-		if (!folio) {
-			error = -ENOMEM;
-			goto failed;
-		}
 	}
 
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
-	if (!folio_test_swapcache(folio) ||
-	    folio->swap.val != swap.val ||
-	    !shmem_confirm_swap(mapping, index, swap)) {
+	if (cache_result != SWAP_CACHE_BYPASS) {
+		/* With cache bypass, folio is newly allocated, sync, and not in cache */
+		if (!folio_test_swapcache(folio) || folio->swap.val != swap.val) {
+			error = -EEXIST;
+			goto unlock;
+		}
+		if (!folio_test_uptodate(folio)) {
+			error = -EIO;
+			goto failed;
+		}
+		folio_wait_writeback(folio);
+	}
+	if (!shmem_confirm_swap(mapping, index, swap)) {
 		error = -EEXIST;
 		goto unlock;
 	}
-	if (!folio_test_uptodate(folio)) {
-		error = -EIO;
-		goto failed;
-	}
-	folio_wait_writeback(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
@@ -1908,12 +1901,19 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 */
 	arch_swap_restore(swap, folio);
 
-	if (shmem_should_replace_folio(folio, gfp)) {
+	/* With cache bypass, folio is newly allocated and always respects gfp flags */
+	if (cache_result != SWAP_CACHE_BYPASS && shmem_should_replace_folio(folio, gfp)) {
 		error = shmem_replace_folio(&folio, gfp, info, index);
 		if (error)
 			goto failed;
 	}
 
+	/*
+	 * The expected value checking below should be enough to ensure
+	 * only one up-to-date swapin success. swap_free() is called after
+	 * this, so the entry can't be reused. As long as the mapping still
+	 * has the old entry value, it's never swapped in or modified.
+	 */
 	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
@@ -1924,7 +1924,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	delete_from_swap_cache(folio);
+	if (cache_result != SWAP_CACHE_BYPASS)
+		delete_from_swap_cache(folio);
 	folio_mark_dirty(folio);
 	swap_free(swap);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index 8f790a67b948..20f4048c971c 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -57,9 +57,6 @@ void __delete_from_swap_cache(struct folio *folio,
 void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
-struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr,
-		void **shadowp);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
 		pgoff_t index);
 
@@ -123,12 +120,6 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
 	return 0;
 }
 
-static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr)
-{
-	return NULL;
-}
-
 static inline struct folio *filemap_get_incore_folio(struct address_space *mapping,
 		pgoff_t index)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3edf4b63158d..10eec68475dd 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -318,7 +318,14 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 static inline bool swap_use_no_readahead(struct swap_info_struct *si,
 					 swp_entry_t entry)
 {
-	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
+	int count;
+
+	if (!data_race(si->flags & SWP_SYNCHRONOUS_IO))
+		return false;
+
+	count = __swap_count(entry);
+
+	return (count == 1 || count == SWAP_MAP_SHMEM);
 }
 
 static inline bool swap_use_vma_readahead(void)
@@ -334,7 +341,7 @@ static inline bool swap_use_vma_readahead(void)
  *
  * Caller must lock the swap device or hold a reference to keep it valid.
  */
-struct folio *swap_cache_get_folio(swp_entry_t entry,
+static struct folio *swap_cache_get_folio(swp_entry_t entry,
 		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
 {
 	struct folio *folio;
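
For reviewers who want the shape of the change without walking the hunks,
below is a condensed sketch of the swap-in flow shmem_swapin_folio() ends
up with after this patch. It is hand-written illustration, not an excerpt
from the tree: locking, refcounting and most error paths are elided, and
the comments reflect my reading of the three swapin_entry_mpol() outcomes
as described above.

	/* Sketch only: condensed post-patch flow, not compilable as-is. */
	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
	folio = swapin_entry_mpol(swap, gfp, mpol, ilx, &cache_result);
	mpol_cond_put(mpol);
	if (!folio)
		return -ENOMEM;

	switch (cache_result) {
	case SWAP_CACHE_HIT:
		/* Folio was already in the swap cache: no major fault
		 * is accounted. */
		break;
	case SWAP_CACHE_BYPASS:
		/* SWP_SYNCHRONOUS_IO device and an unshared entry
		 * (__swap_count() is 1 or SWAP_MAP_SHMEM): the folio was
		 * newly allocated and read synchronously, and never
		 * entered the swap cache, so the swap cache re-checks
		 * and the final delete_from_swap_cache() are skipped. */
		break;
	default:
		/* Read through the swap cache, possibly with readahead:
		 * account a major fault, then re-verify the folio is
		 * still the swap cache entry for this slot before use. */
		break;
	}

The SWAP_MAP_SHMEM half of the new swap_use_no_readahead() check is what
makes the bypass reachable here at all: shmem-owned entries carry the
dedicated SWAP_MAP_SHMEM count value, so the old __swap_count(entry) == 1
test could never match them.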