From patchwork Tue May 16 05:29:57 2023
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 13242547
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    David Hildenbrand, Hugh Dickins, Johannes Weiner, Matthew Wilcox,
    Michal Hocko, Minchan Kim, Tim Chen, Yang Shi, Yu Zhao
Subject: [PATCH] swap: cleanup get/put_swap_device usage
Date: Tue, 16 May 2023 13:29:57 +0800
Message-Id: <20230516052957.175432-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
The general rule to use a swap entry is as follows.

When we get a swap entry, if there isn't some other way to prevent
swapoff, such as the page lock for swap cache, the page table lock, etc.,
the swap entry may become invalid because of swapoff.  Then, we need
to enclose all swap related functions with get_swap_device() and
put_swap_device(), unless the swap functions call
get/put_swap_device() by themselves.

Add the rule as comments of get_swap_device(), and clean up some
functions which call get/put_swap_device():

1. Enlarge the get/put_swap_device() protection range in
   __read_swap_cache_async().  This makes the function a little easier
   to understand because we don't need to consider swapoff, and it
   makes it possible to remove the get/put_swap_device() calls in some
   functions called by __read_swap_cache_async().

2. Remove get/put_swap_device() in __swap_count().  It is called only
   by do_swap_page(), which already encloses the call with
   get/put_swap_device().

3. Remove get/put_swap_device() in __swp_swapcount().  It is called
   only by __read_swap_cache_async(), which already encloses the call
   with get/put_swap_device().

4. Remove get/put_swap_device() in __swap_duplicate(), which is
   called by:
   - swap_shmem_alloc(): the swap cache is locked.
   - copy_nonpresent_pte() -> swap_duplicate() and
     try_to_unmap_one() -> swap_duplicate(): the page table lock is
     held.
   - __read_swap_cache_async() -> swapcache_prepare(): enclosed with
     get/put_swap_device() already.

Other get/put_swap_device() usages are checked too.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: David Hildenbrand
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Tim Chen
Cc: Yang Shi
Cc: Yu Zhao
---
 include/linux/swap.h |  4 ++--
 mm/swap_state.c      | 33 ++++++++++++++++++++-----------
 mm/swapfile.c        | 47 ++++++++++++--------------------------------
 3 files changed, 37 insertions(+), 47 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3c69cb653cb9..f6bd51aa05ea 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -512,7 +512,7 @@ int find_first_swap(dev_t *device);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int __swap_count(swp_entry_t entry);
-extern int __swp_swapcount(swp_entry_t entry);
+extern int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
@@ -590,7 +590,7 @@ static inline int __swap_count(swp_entry_t entry)
 	return 0;
 }
 
-static inline int __swp_swapcount(swp_entry_t entry)
+static inline int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
 {
 	return 0;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b76a65ac28b3..a1028fe7214e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -417,9 +417,13 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 {
 	struct swap_info_struct *si;
 	struct folio *folio;
+	struct page *page;
 	void *shadow = NULL;
 
 	*new_page_allocated = false;
+	si = get_swap_device(entry);
+	if (!si)
+		return NULL;
 
 	for (;;) {
 		int err;
@@ -428,14 +432,12 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * called after swap_cache_get_folio() failed, re-calling
 		 * that would confuse statistics.
 		 */
-		si = get_swap_device(entry);
-		if (!si)
-			return NULL;
 		folio = filemap_get_folio(swap_address_space(entry),
 						swp_offset(entry));
-		put_swap_device(si);
-		if (!IS_ERR(folio))
-			return folio_file_page(folio, swp_offset(entry));
+		if (!IS_ERR(folio)) {
+			page = folio_file_page(folio, swp_offset(entry));
+			goto got_page;
+		}
 
 		/*
 		 * Just skip read ahead for unused swap slot.
@@ -445,8 +447,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * as SWAP_HAS_CACHE.  That's done in later part of code or
 		 * else swap_off will be aborted if we return NULL.
 		 */
-		if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
-			return NULL;
+		if (!swap_swapcount(si, entry) && swap_slot_cache_enabled)
+			goto fail;
 
 		/*
 		 * Get a new page to read into from swap.  Allocate it now,
@@ -455,7 +457,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 */
 		folio = vma_alloc_folio(gfp_mask, 0, vma, addr, false);
 		if (!folio)
-			return NULL;
+			goto fail;
 
 		/*
 		 * Swap entry may have been freed since our caller observed it.
@@ -466,7 +468,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 		folio_put(folio);
 		if (err != -EEXIST)
-			return NULL;
+			goto fail;
 
 		/*
 		 * We might race against __delete_from_swap_cache(), and
@@ -500,12 +502,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	/* Caller will initiate read into locked folio */
 	folio_add_lru(folio);
 	*new_page_allocated = true;
-	return &folio->page;
+	page = &folio->page;
+got_page:
+	put_swap_device(si);
+	return page;
 
 fail_unlock:
 	put_swap_folio(folio, entry);
 	folio_unlock(folio);
 	folio_put(folio);
+fail:
+	put_swap_device(si);
 	return NULL;
 }
 
@@ -514,6 +521,10 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
  * and reading the disk if it is not already cached.
  * A failure return means that either the page allocation failed or that
  * the swap entry is no longer in use.
+ *
+ * get/put_swap_device() aren't needed to call this function, because
+ * __read_swap_cache_async() calls them and swap_readpage() holds the
+ * swap cache folio lock.
  */
 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma,
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 274bbf797480..0c1cb935b2eb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1219,6 +1219,13 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
 }
 
 /*
+ * When we get a swap entry, if there isn't some other way to prevent
+ * swapoff, such as page lock for swap cache, page table lock, etc.,
+ * the swap entry may become invalid because of swapoff.  Then, we
+ * need to enclose all swap related functions with get_swap_device()
+ * and put_swap_device(), unless the swap functions call
+ * get/put_swap_device() by themselves.
+ *
  * Check whether swap entry is valid in the swap device.  If so,
  * return pointer to swap_info_struct, and keep the swap entry valid
  * via preventing the swap device from being swapoff, until
@@ -1227,9 +1234,8 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
  * Notice that swapoff or swapoff+swapon can still happen before the
  * percpu_ref_tryget_live() in get_swap_device() or after the
  * percpu_ref_put() in put_swap_device() if there isn't any other way
- * to prevent swapoff, such as page lock, page table lock, etc.  The
- * caller must be prepared for that.  For example, the following
- * situation is possible.
+ * to prevent swapoff.  The caller must be prepared for that.  For
+ * example, the following situation is possible.
  *
  * CPU1				CPU2
  * do_swap_page()
@@ -1432,16 +1438,10 @@ void swapcache_free_entries(swp_entry_t *entries, int n)
 
 int __swap_count(swp_entry_t entry)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swp_swap_info(entry);
 	pgoff_t offset = swp_offset(entry);
-	int count = 0;
 
-	si = get_swap_device(entry);
-	if (si) {
-		count = swap_count(si->swap_map[offset]);
-		put_swap_device(si);
-	}
-	return count;
+	return swap_count(si->swap_map[offset]);
 }
 
 /*
@@ -1449,7 +1449,7 @@ int __swap_count(swp_entry_t entry)
  * This does not give an exact answer when swap count is continued,
  * but does include the high COUNT_CONTINUED flag to allow for that.
  */
-static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
+int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
 {
 	pgoff_t offset = swp_offset(entry);
 	struct swap_cluster_info *ci;
@@ -1461,24 +1461,6 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
 	return count;
 }
 
-/*
- * How many references to @entry are currently swapped out?
- * This does not give an exact answer when swap count is continued,
- * but does include the high COUNT_CONTINUED flag to allow for that.
- */
-int __swp_swapcount(swp_entry_t entry)
-{
-	int count = 0;
-	struct swap_info_struct *si;
-
-	si = get_swap_device(entry);
-	if (si) {
-		count = swap_swapcount(si, entry);
-		put_swap_device(si);
-	}
-	return count;
-}
-
 /*
  * How many references to @entry are currently swapped out?
  * This considers COUNT_CONTINUED so it returns exact answer.
@@ -3288,9 +3270,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	unsigned char has_cache;
 	int err;
 
-	p = get_swap_device(entry);
-	if (!p)
-		return -EINVAL;
+	p = swp_swap_info(entry);
 	offset = swp_offset(entry);
 
 	ci = lock_cluster_or_swap_info(p, offset);
@@ -3337,7 +3317,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
 unlock_out:
 	unlock_cluster_or_swap_info(p, ci);
-	put_swap_device(p);
 	return err;
 }