From patchwork Fri Mar 22 16:42:21 2024
X-Patchwork-Submitter: Johannes Weiner <hannes@cmpxchg.org>
X-Patchwork-Id: 13600317
From: Johannes Weiner <hannes@cmpxchg.org>
To: linux-mm@kvack.org
Cc: ying.huang@intel.com, david@redhat.com, hughd@google.com, osandov@fb.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH] mm: swapfile: fix SSD detection with swapfile on btrfs
Date: Fri, 22 Mar 2024 12:42:21 -0400
Message-ID: <20240322164956.422815-1-hannes@cmpxchg.org>
btrfs sets si->bdev during swap activation, which currently happens
after swapon's SSD detection and cluster setup. Thus none of the SSD
optimizations and cluster lock splitting are enabled for btrfs swap.
Rearrange the swapon sequence so that filesystem activation happens
before determining swap behavior based on the backing device.

Afterwards, the nonrotational drive is detected correctly:

- Adding 2097148k swap on /mnt/swapfile.  Priority:-3 extents:1 across:2097148k
+ Adding 2097148k swap on /mnt/swapfile.  Priority:-3 extents:1 across:2097148k SS

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/swapfile.c | 141 +++++++++++++++++++++++++-------------------------
 1 file changed, 70 insertions(+), 71 deletions(-)

This survives a swapping smoketest, but I would really appreciate
more eyes on the swap and fs implications of this change.

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4919423cce76..4dd5f2e8190d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2919,22 +2919,15 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 static int setup_swap_map_and_extents(struct swap_info_struct *p,
 					union swap_header *swap_header,
 					unsigned char *swap_map,
-					struct swap_cluster_info *cluster_info,
 					unsigned long maxpages,
 					sector_t *span)
 {
-	unsigned int j, k;
 	unsigned int nr_good_pages;
+	unsigned long i;
 	int nr_extents;
-	unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
-	unsigned long col = p->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS;
-	unsigned long i, idx;
 
 	nr_good_pages = maxpages - 1;	/* omit header page */
 
-	cluster_list_init(&p->free_clusters);
-	cluster_list_init(&p->discard_clusters);
-
 	for (i = 0; i < swap_header->info.nr_badpages; i++) {
 		unsigned int page_nr = swap_header->info.badpages[i];
 		if (page_nr == 0 || page_nr > swap_header->info.last_page)
@@ -2942,25 +2935,11 @@ static int setup_swap_map_and_extents(struct swap_info_struct *p,
 		if (page_nr < maxpages) {
 			swap_map[page_nr] = SWAP_MAP_BAD;
 			nr_good_pages--;
-			/*
-			 * Haven't marked the cluster free yet, no list
-			 * operation involved
-			 */
-			inc_cluster_info_page(p, cluster_info, page_nr);
 		}
 	}
 
-	/* Haven't marked the cluster free yet, no list operation involved */
-	for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++)
-		inc_cluster_info_page(p, cluster_info, i);
-
 	if (nr_good_pages) {
 		swap_map[0] = SWAP_MAP_BAD;
-		/*
-		 * Not mark the cluster free yet, no list
-		 * operation involved
-		 */
-		inc_cluster_info_page(p, cluster_info, 0);
 		p->max = maxpages;
 		p->pages = nr_good_pages;
 		nr_extents = setup_swap_extents(p, span);
@@ -2973,9 +2952,55 @@ static int setup_swap_map_and_extents(struct swap_info_struct *p,
 		return -EINVAL;
 	}
 
+	return nr_extents;
+}
+
+static struct swap_cluster_info *setup_clusters(struct swap_info_struct *p,
+						unsigned char *swap_map)
+{
+	unsigned long nr_clusters = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
+	unsigned long col = p->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS;
+	struct swap_cluster_info *cluster_info;
+	unsigned long i, j, k, idx;
+	int cpu, err = -ENOMEM;
+
+	cluster_info = kvcalloc(nr_clusters, sizeof(*cluster_info), GFP_KERNEL);
 	if (!cluster_info)
-		return nr_extents;
+		goto err;
+
+	for (i = 0; i < nr_clusters; i++)
+		spin_lock_init(&cluster_info[i].lock);
 
+	p->cluster_next_cpu = alloc_percpu(unsigned int);
+	if (!p->cluster_next_cpu)
+		goto err_free;
+
+	/* Random start position to help with wear leveling */
+	for_each_possible_cpu(cpu)
+		per_cpu(*p->cluster_next_cpu, cpu) =
+			get_random_u32_inclusive(1, p->highest_bit);
+
+	p->percpu_cluster = alloc_percpu(struct percpu_cluster);
+	if (!p->percpu_cluster)
+		goto err_free;
+
+	for_each_possible_cpu(cpu) {
+		struct percpu_cluster *cluster;
+
+		cluster = per_cpu_ptr(p->percpu_cluster, cpu);
+		cluster_set_null(&cluster->index);
+	}
+
+	/*
+	 * Mark unusable pages as unavailable. The clusters aren't
+	 * marked free yet, so no list operations are involved yet.
+	 */
+	for (i = 0; i < round_up(p->max, SWAPFILE_CLUSTER); i++)
+		if (i >= p->max || swap_map[i] == SWAP_MAP_BAD)
+			inc_cluster_info_page(p, cluster_info, i);
+
+	cluster_list_init(&p->free_clusters);
+	cluster_list_init(&p->discard_clusters);
 
 	/*
 	 * Reduce false cache line sharing between cluster_info and
@@ -2994,7 +3019,13 @@ static int setup_swap_map_and_extents(struct swap_info_struct *p,
 					idx);
 		}
 	}
-	return nr_extents;
+
+	return cluster_info;
+
+err_free:
+	kvfree(cluster_info);
+err:
+	return ERR_PTR(err);
 }
 
 SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
@@ -3090,6 +3121,17 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		goto bad_swap_unlock_inode;
 	}
 
+	error = swap_cgroup_swapon(p->type, maxpages);
+	if (error)
+		goto bad_swap_unlock_inode;
+
+	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
+						maxpages, &span);
+	if (unlikely(nr_extents < 0)) {
+		error = nr_extents;
+		goto bad_swap_unlock_inode;
+	}
+
 	if (p->bdev && bdev_stable_writes(p->bdev))
 		p->flags |= SWP_STABLE_WRITES;
 
@@ -3097,61 +3139,18 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		p->flags |= SWP_SYNCHRONOUS_IO;
 
 	if (p->bdev && bdev_nonrot(p->bdev)) {
-		int cpu;
-		unsigned long ci, nr_cluster;
-
 		p->flags |= SWP_SOLIDSTATE;
-		p->cluster_next_cpu = alloc_percpu(unsigned int);
-		if (!p->cluster_next_cpu) {
-			error = -ENOMEM;
-			goto bad_swap_unlock_inode;
-		}
-		/*
-		 * select a random position to start with to help wear leveling
-		 * SSD
-		 */
-		for_each_possible_cpu(cpu) {
-			per_cpu(*p->cluster_next_cpu, cpu) =
-				get_random_u32_inclusive(1, p->highest_bit);
-		}
-		nr_cluster = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
-
-		cluster_info = kvcalloc(nr_cluster, sizeof(*cluster_info),
-					GFP_KERNEL);
-		if (!cluster_info) {
-			error = -ENOMEM;
-			goto bad_swap_unlock_inode;
-		}
-
-		for (ci = 0; ci < nr_cluster; ci++)
-			spin_lock_init(&((cluster_info + ci)->lock));
-
-		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
-		if (!p->percpu_cluster) {
-			error = -ENOMEM;
+		cluster_info = setup_clusters(p, swap_map);
+		if (IS_ERR(cluster_info)) {
+			error = PTR_ERR(cluster_info);
+			cluster_info = NULL;
 			goto bad_swap_unlock_inode;
 		}
-		for_each_possible_cpu(cpu) {
-			struct percpu_cluster *cluster;
-
-			cluster = per_cpu_ptr(p->percpu_cluster, cpu);
-			cluster_set_null(&cluster->index);
-		}
 	} else {
 		atomic_inc(&nr_rotate_swap);
 		inced_nr_rotate_swap = true;
 	}
 
-	error = swap_cgroup_swapon(p->type, maxpages);
-	if (error)
-		goto bad_swap_unlock_inode;
-
-	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
-						cluster_info, maxpages, &span);
-	if (unlikely(nr_extents < 0)) {
-		error = nr_extents;
-		goto bad_swap_unlock_inode;
-	}
-
 	if ((swap_flags & SWAP_FLAG_DISCARD) && p->bdev &&
 	    bdev_max_discard_sectors(p->bdev)) {
 		/*