From patchwork Sat Jan 27 07:47:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533990 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E3CB1E883 for ; Sat, 27 Jan 2024 07:52:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341941; cv=none; b=gLS21I/QnzpbOat3qJZL5StDef4IEYUtElYxIT6T1zPVX4sSgfmiVjyZolM4/JaK7TNlqKnRIspbzyU9/1dtFooUA1uTMofQYDwXCzC2D+vA+B9PJVIZybsEN0HsLfNyihHlveQHnAAUNz8xqXqpzblQbpH5oZ1sfWwPRtHDHv8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341941; c=relaxed/simple; bh=niaIFrprWvPwC2Lv0RN5Cl1tFCytXopzeriQpGSoAMg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=NJgJFvivQKJ++AdHVsrjcjcoO1/XKsGfktjjeWIoA28kIAHYUZDtQO0RtziGINrt1PUMkQjSvGjoSnd6iiRVRnkp6bEUej3pO3MKqjJCZ5Bgi2ouOlcHTVc70XcBDkLDRg7dG06vDslFCI/Rgsh3v5n6Abad9vZYBG1EQX4QFX0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4TMRbx2f8Tz4f3k63 for ; Sat, 27 Jan 2024 15:52:13 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id C68811A0172 for ; Sat, 27 Jan 2024 15:52:15 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S5; Sat, 27 Jan 2024 15:52:04 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 01/12] md: don't ignore suspended array in md_check_recovery() Date: Sat, 27 Jan 2024 15:47:43 +0800 Message-Id: <20240127074754.2380890-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S5 X-Coremail-Antispam: 1UD129KBjvJXoWxJryDAF43ArW7JrWrZr1xGrg_yoW8ZFy7pa yIkF1YyrWjyFZ7Aa4qka4UZa4rAr1jqrW5AFy3u34rCa4fKw43CrWYgFyDXFyqkFyxKrZY vw4rJa15uw18KF7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBK14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r4j6ryUM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UM2 8EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0DM2AI xVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20x vE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xv r2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I0E8cxan2IY04 v7MxAIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_ Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x 0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8 JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIx AIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7VUjFdgJUUUUU= = X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai mddev_suspend() never stop sync_thread, hence it doesn't make sense to ignore suspended array in md_check_recovery(), which might cause sync_thread can't be unregistered. After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following hang can be triggered by test shell/integrity-caching.sh: 1) suspend the array: raid_postsuspend mddev_suspend 2) stop the array: raid_dtr md_stop __md_stop_writes stop_sync_thread set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_wakeup_thread_directly(mddev->sync_thread); wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) 3) sync thread done: md_do_sync set_bit(MD_RECOVERY_DONE, &mddev->recovery); md_wakeup_thread(mddev->thread); 4) daemon thread can't unregister sync thread: md_check_recovery if (mddev->suspended) return; -> return directly md_read_sync_thread clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hang; This problem is not just related to dm-raid, fix it by ignoring suspended array in md_check_recovery(). And follow up patches will improve dm-raid better to frozen sync thread during suspend. Reported-by: Mikulas Patocka Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@redhat.com/ Fixes: 68866e425be2 ("MD: no sync IO while suspended") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Signed-off-by: Yu Kuai --- drivers/md/md.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 2266358d8074..07b80278eaa5 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws) */ void md_check_recovery(struct mddev *mddev) { - if (READ_ONCE(mddev->suspended)) - return; - if (mddev->bitmap) md_bitmap_daemon_work(mddev); From patchwork Sat Jan 27 07:47:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533991 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A83E41DFC5 for ; Sat, 27 Jan 2024 07:52:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341945; cv=none; b=q+uZoJj791FQCo5Gh318D7zLfpMlqw4MbTDp1GCpLOWWi7rt0VOngGvLvT09eIcJ2a2b14TBQUoCLFxQ/eEW8rNb46NsrtX+Ri9zcE8WwlplNNntozjULxgw+AMqgZSieiSaN3yGYqfY4PKVy4e1NkklGFMpO5i9x+yuoBKmus0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341945; c=relaxed/simple; bh=h4mAM5munNDh64f9BT7eKjkJVG7xHUqubnznwxYlHRM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=DsW35ibwYQUybx4n6fLQv6UeKXLy0WIVu119lfNOXyhE8VsZuXcZ4xCsXak/qwnLXUB7tNLxXbcUF7cTl7YdNgDPhmNr3WkGvFqie7TM4qhz3UHq0OfcnQ3QYc1jWru2gR2QuEeT3b6EnwAoSPB6fzswIYeShr6kSJIxwRpGiNY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRc05T1Zz4f3jYJ for ; Sat, 27 Jan 2024 15:52:16 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id C0F191A0272 for ; Sat, 27 Jan 2024 15:52:20 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S6; Sat, 27 Jan 2024 15:52:16 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 02/12] md: don't ignore read-only array in md_check_recovery() Date: Sat, 27 Jan 2024 15:47:44 +0800 Message-Id: <20240127074754.2380890-3-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S6 X-Coremail-Antispam: 1UD129KBjvJXoWxCF1xZw47XF4fCr17tFyUGFg_yoWrXw45pa yfKas8Cr4UtFy3urZrK3WDXayYvr10qrZ0yFy3u34fAw1rtr4Sk34ruFyUJFWDtFyFqw4Y vw4rJF43ZFy8Kw7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPj14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jryl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUCVW8JwCI42IY6I8E87Iv67AKxVWUJVW8Jw CI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjfUOJPEUUUU U X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai Usually if the array is not read-write, md_check_recovery() won't register new sync_thread in the first place. And if the array is read-write and sync_thread is registered, md_set_readonly() will unregister sync_thread before setting the array read-only. md/raid follow this behavior hence there is no problem. After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following hang can be triggered by test shell/integrity-caching.sh: 1) array is read-only. dm-raid update super block: rs_update_sbs ro = mddev->ro mddev->ro = 0 -> set array read-write md_update_sb 2) register new sync thread concurrently. 3) dm-raid set array back to read-only: rs_update_sbs mddev->ro = ro 4) stop the array: raid_dtr md_stop stop_sync_thread set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_wakeup_thread_directly(mddev->sync_thread); wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) 5) sync thread done: md_do_sync set_bit(MD_RECOVERY_DONE, &mddev->recovery); md_wakeup_thread(mddev->thread); 6) daemon thread can't unregister sync thread: md_check_recovery if (!md_is_rdwr(mddev) && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) return; -> -> MD_RECOVERY_RUNNING can't be cleared, hence step 4 hang; The root cause is that dm-raid manipulate 'mddev->ro' by itself, however, dm-raid really should stop sync thread before setting the array read-only. Unfortunately, I need to read more code before I can refacter the handler of 'mddev->ro' in dm-raid, hence let's fix the problem the easy way for now to prevent dm-raid regression. Reported-by: Mikulas Patocka Closes: https://lore.kernel.org/all/9801e40-8ac7-e225-6a71-309dcf9dc9aa@redhat.com/ Fixes: ecbfb9f118bc ("dm raid: add raid level takeover support") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Signed-off-by: Yu Kuai --- drivers/md/md.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 07b80278eaa5..6906d023f1d6 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9445,6 +9445,20 @@ static void md_start_sync(struct work_struct *ws) sysfs_notify_dirent_safe(mddev->sysfs_action); } +static void unregister_sync_thread(struct mddev *mddev) +{ + if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) { + /* resync/recovery still happening */ + clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + return; + } + + if (WARN_ON_ONCE(!mddev->sync_thread)) + return; + + md_reap_sync_thread(mddev); +} + /* * This routine is regularly called by all per-raid-array threads to * deal with generic issues like resync and super-block update. @@ -9482,7 +9496,8 @@ void md_check_recovery(struct mddev *mddev) } if (!md_is_rdwr(mddev) && - !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) + !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) && + !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) return; if ( ! ( (mddev->sb_flags & ~ (1<recovery)) { - /* sync_work already queued. */ - clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + unregister_sync_thread(mddev); goto unlock; } @@ -9568,16 +9582,7 @@ void md_check_recovery(struct mddev *mddev) * still set. */ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) { - if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) { - /* resync/recovery still happening */ - clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - goto unlock; - } - - if (WARN_ON_ONCE(!mddev->sync_thread)) - goto unlock; - - md_reap_sync_thread(mddev); + unregister_sync_thread(mddev); goto unlock; } From patchwork Sat Jan 27 07:47:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533992 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 933A6200C3 for ; Sat, 27 Jan 2024 07:52:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341950; cv=none; b=YlUhaxuIdC3SaU/TjDS55GOY6x/+b+7heK20XqwK8nPLwA3vc2GlNkpuPa80T3sRPUgoFpN8oRPGoMSoE9Xc10r+pTJfVjv3Uo3J2bWEVY3uSHMsGiiOpMf3damjd8YdjFsavBlgKDzXlG5BxuL2Jd2Ig7opIE0Dl1LvnffYIlw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341950; c=relaxed/simple; bh=uuJ8tRuvIPDNbFzYKdOhbv8+B/r7ZvglzLK7hF7sQ34=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=b4xfrDrmVE1A20gvnXesxMrD4l/2GO/OKiSPb20Pz1syZRZCTJESq49t7KSNLlfz4OTmTdXgnFBAkCoquKDWHomOGc51cHhhL44Nu/ijH/wiT0TBr0yg8lEJYpDJTznHkkVebn6qeanfSroLQ+J6a8+su5HWv57GprRA5ycZGuA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4TMRc62ZTzz4f3k68 for ; Sat, 27 Jan 2024 15:52:22 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id C4C2E1A017A for ; Sat, 27 Jan 2024 15:52:24 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S7; Sat, 27 Jan 2024 15:52:21 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 03/12] md: make sure md_do_sync() will set MD_RECOVERY_DONE Date: Sat, 27 Jan 2024 15:47:45 +0800 Message-Id: <20240127074754.2380890-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S7 X-Coremail-Antispam: 1UD129KBjvJXoW7WF48JF18KF48Xr1fZFW7twb_yoW8tr1rpF WkCF98ZrW8ArW7urW2qa4UZFy5Ar10qrW7CFyfW34rAF13Kw4akryjkFyUXFWqkFWxJw4F vFs8JFZ8uF95C3DanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPj14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JrWl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8Jw CI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjfUojjgUUUU U X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must set MD_RECOVERY_DONE, so that follow up md_check_recovery() will unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up stop_sync_thread(). If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4 ("md: fix stopping sync thread"), dm-raid switch from md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread from md_stop() and md_stop_writes(), causing the test shell/lvconvert-raid-reshape.sh hang. We shouldn't switch back to md_reap_sync_thread() because it's problematic in the first place. Fix the problem by making sure md_do_sync() will set MD_RECOVERY_DONE. Reported-by: Mikulas Patocka Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/ Fixes: d5d885fd514f ("md: introduce new personality funciton start()") Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Signed-off-by: Yu Kuai --- drivers/md/md.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 6906d023f1d6..c65dfd156090 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread) int ret; /* just incase thread restarts... */ - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || - test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) + if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) return; - if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */ + + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) + goto skip; + + if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || + !md_is_rdwr(mddev)) {/* never try to sync a read-only array */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); - return; + goto skip; } if (mddev_is_clustered(mddev)) { From patchwork Sat Jan 27 07:47:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533993 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 99B8120DFC for ; Sat, 27 Jan 2024 07:52:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341952; cv=none; b=gCxluZFX2cTN/0NlzeyiUFIfFmdd8Dx7OnDWHqqEbuf79yYx4QpS/FspmToBLzQqoH1o3j0bnPiF/SYlCn0K8nEGiHviRvbhFB5vSOie6KFmjs4YZbxrUKN3rCeEU3pditE2VE4NucGIgyjJArl/rGKJ4k/TTh0W8YLXESZBDYw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341952; c=relaxed/simple; bh=dZCpGLgs1JsjUcuAOwp41TG3RiXfam80H98EBcaCbRk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=cimZr+jfrxMtk/2HoQWei3PvZCPVVysjqnJHJ8SbwdOAuRrEgKw1j/umO2dIHn2ODLTtc4JNIU44MtEbcnRgUm3LS+6Uc/hj3KjE9osRomfuJoNGeIQJtvQ0tDP3lVBoCfRMGcl4juYEWlUn06VGQ7tPkzhqmqZFS55gFPQtlbY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRc75TZDz4f3jXb for ; Sat, 27 Jan 2024 15:52:23 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id C0E6D1A016E for ; Sat, 27 Jan 2024 15:52:27 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S8; Sat, 27 Jan 2024 15:52:25 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 04/12] md: don't register sync_thread for reshape directly Date: Sat, 27 Jan 2024 15:47:46 +0800 Message-Id: <20240127074754.2380890-5-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S8 X-Coremail-Antispam: 1UD129KBjvJXoW3Xry3Kw4kuF1xAry3KryrCrg_yoW7KF1fpa yfKF9xJr48A343ZrWUta4DXFW5uw1jqrWqyry3W3s5A3ZYyrZ3JFy5uFyUJFWkAa4kta15 ta45tFWDZFyFgw7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUP214x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJV W8JwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7VUbmZ X7UUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai Currently, if reshape is interrupted, then reassemble the array will register sync_thread directly from pers->run(), in this case 'MD_RECOVERY_RUNNING' is set directly, however, there is no guarantee that md_do_sync() will be executed, hence stop_sync_thread() will hang because 'MD_RECOVERY_RUNNING' can't be cleared. Last patch make sure that md_do_sync() will set MD_RECOVERY_DONE, however, following hang can still be triggered by dm-raid test shell/lvconvert-raid-reshape.sh occasionally: [root@fedora ~]# cat /proc/1982/stack [<0>] stop_sync_thread+0x1ab/0x270 [md_mod] [<0>] md_frozen_sync_thread+0x5c/0xa0 [md_mod] [<0>] raid_presuspend+0x1e/0x70 [dm_raid] [<0>] dm_table_presuspend_targets+0x40/0xb0 [dm_mod] [<0>] __dm_destroy+0x2a5/0x310 [dm_mod] [<0>] dm_destroy+0x16/0x30 [dm_mod] [<0>] dev_remove+0x165/0x290 [dm_mod] [<0>] ctl_ioctl+0x4bb/0x7b0 [dm_mod] [<0>] dm_ctl_ioctl+0x11/0x20 [dm_mod] [<0>] vfs_ioctl+0x21/0x60 [<0>] __x64_sys_ioctl+0xb9/0xe0 [<0>] do_syscall_64+0xc6/0x230 [<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74 Meanwhile mddev->recovery is: MD_RECOVERY_RUNNING | MD_RECOVERY_INTR | MD_RECOVERY_RESHAPE | MD_RECOVERY_FROZEN Fix this problem by remove the code to register sync_thread directly from raid10 and raid5. And let md_check_recovery() to register sync_thread. Fixes: f67055780caa ("[PATCH] md: Checkpoint and allow restart of raid5 reshape") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Signed-off-by: Yu Kuai --- drivers/md/md.c | 5 ++++- drivers/md/raid10.c | 16 ++-------------- drivers/md/raid5.c | 29 ++--------------------------- 3 files changed, 8 insertions(+), 42 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index c65dfd156090..6c5d0a372927 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9372,6 +9372,7 @@ static void md_start_sync(struct work_struct *ws) struct mddev *mddev = container_of(ws, struct mddev, sync_work); int spares = 0; bool suspend = false; + char *name; if (md_spares_need_change(mddev)) suspend = true; @@ -9404,8 +9405,10 @@ static void md_start_sync(struct work_struct *ws) if (spares) md_bitmap_write_all(mddev->bitmap); + name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ? + "reshape" : "resync"; rcu_assign_pointer(mddev->sync_thread, - md_register_thread(md_do_sync, mddev, "resync")); + md_register_thread(md_do_sync, mddev, name)); if (!mddev->sync_thread) { pr_warn("%s: could not start resync thread...\n", mdname(mddev)); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 7412066ea22c..a5f8419e2df1 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -4175,11 +4175,7 @@ static int raid10_run(struct mddev *mddev) clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - rcu_assign_pointer(mddev->sync_thread, - md_register_thread(md_do_sync, mddev, "reshape")); - if (!mddev->sync_thread) - goto out_free_conf; + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); } return 0; @@ -4573,16 +4569,8 @@ static int raid10_start_reshape(struct mddev *mddev) clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); clear_bit(MD_RECOVERY_DONE, &mddev->recovery); set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - - rcu_assign_pointer(mddev->sync_thread, - md_register_thread(md_do_sync, mddev, "reshape")); - if (!mddev->sync_thread) { - ret = -EAGAIN; - goto abort; - } + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); conf->reshape_checkpoint = jiffies; - md_wakeup_thread(mddev->sync_thread); md_new_event(); return 0; diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 8497880135ee..6a7a32f7fb91 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -7936,11 +7936,7 @@ static int raid5_run(struct mddev *mddev) clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - rcu_assign_pointer(mddev->sync_thread, - md_register_thread(md_do_sync, mddev, "reshape")); - if (!mddev->sync_thread) - goto abort; + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); } /* Ok, everything is just fine now */ @@ -8506,29 +8502,8 @@ static int raid5_start_reshape(struct mddev *mddev) clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); clear_bit(MD_RECOVERY_DONE, &mddev->recovery); set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - rcu_assign_pointer(mddev->sync_thread, - md_register_thread(md_do_sync, mddev, "reshape")); - if (!mddev->sync_thread) { - mddev->recovery = 0; - spin_lock_irq(&conf->device_lock); - write_seqcount_begin(&conf->gen_lock); - mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks; - mddev->new_chunk_sectors = - conf->chunk_sectors = conf->prev_chunk_sectors; - mddev->new_layout = conf->algorithm = conf->prev_algo; - rdev_for_each(rdev, mddev) - rdev->new_data_offset = rdev->data_offset; - smp_wmb(); - conf->generation --; - conf->reshape_progress = MaxSector; - mddev->reshape_position = MaxSector; - write_seqcount_end(&conf->gen_lock); - spin_unlock_irq(&conf->device_lock); - return -EAGAIN; - } + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); conf->reshape_checkpoint = jiffies; - md_wakeup_thread(mddev->sync_thread); md_new_event(); return 0; } From patchwork Sat Jan 27 07:47:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533994 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45AE021116 for ; Sat, 27 Jan 2024 07:52:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341952; cv=none; b=u26lfJCECHLFjcAmzNvR5o4UqPV7eX+XEOxdM/zOHDFuAHmqZs8AKqsu9aZSYj0h/zOzqT0WBBLyR6uDLE1EonmlrrH1af2ZqreoQ9iBwhQx5f5SqH7usmyQrsnqVBFHYebM+rPr8/FWojPFEN5XzbPzxq6Bwm3b2fOcngKSMhc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341952; c=relaxed/simple; bh=YYyayD+v2iivP+R6gR4731n5JA1gzwDoJc52NdiV/do=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=NSWXOMpFg6ME8rObqRVqXvV4f6I4YFijDcqDIlK4YJCecR6+znB0sZwYQZAkowj+JGzax9o6v/EElR+82jhEi1TxHUF1hI4eaJyspX+pvkV//qk+etRWAePBcerRZy5LmjnpE438BOlQwfD+MW799fIAK6USVG8AZlqUkOnKlMM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRc832d5z4f3jsH for ; Sat, 27 Jan 2024 15:52:24 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 6D35A1A0272 for ; Sat, 27 Jan 2024 15:52:28 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S9; Sat, 27 Jan 2024 15:52:28 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 05/12] md: export helpers to stop sync_thread Date: Sat, 27 Jan 2024 15:47:47 +0800 Message-Id: <20240127074754.2380890-6-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S9 X-Coremail-Antispam: 1UD129KBjvJXoWxZryrJF1rCF1fuFyDGF1Utrb_yoW5CFWrpr WIqF95Ar4YyrZxZry7Ja4Duay5Zwn2gFyqyry3C3yfJ3Z3trsrGF1Yva4UAFWkGa4fXr1U ta15KFW3uFy8Kr7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr 0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai The new heleprs will be used in dm-raid in later patches to fix regressions and prevent calling md_reap_sync_thread() directly. Signed-off-by: Yu Kuai --- drivers/md/md.c | 33 +++++++++++++++++++++++++++++++-- drivers/md/md.h | 3 +++ 2 files changed, 34 insertions(+), 2 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 6c5d0a372927..6a76bd27e381 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4915,16 +4915,45 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq) mddev_lock_nointr(mddev); } +void md_idle_sync_thread(struct mddev *mddev) +{ + lockdep_assert_held(&mddev->reconfig_mutex); + + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); + stop_sync_thread(mddev, true, true); +} +EXPORT_SYMBOL_GPL(md_idle_sync_thread); + +void md_frozen_sync_thread(struct mddev *mddev) +{ + lockdep_assert_held(&mddev->reconfig_mutex); + + set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); + stop_sync_thread(mddev, true, false); +} +EXPORT_SYMBOL_GPL(md_frozen_sync_thread); + +void md_unfrozen_sync_thread(struct mddev *mddev) +{ + lockdep_assert_held(&mddev->reconfig_mutex); + + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + md_wakeup_thread(mddev->thread); + sysfs_notify_dirent_safe(mddev->sysfs_action); +} +EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread); + static void idle_sync_thread(struct mddev *mddev) { mutex_lock(&mddev->sync_mutex); - clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); if (mddev_lock(mddev)) { mutex_unlock(&mddev->sync_mutex); return; } + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); stop_sync_thread(mddev, false, true); mutex_unlock(&mddev->sync_mutex); } @@ -4932,13 +4961,13 @@ static void idle_sync_thread(struct mddev *mddev) static void frozen_sync_thread(struct mddev *mddev) { mutex_lock(&mddev->sync_mutex); - set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); if (mddev_lock(mddev)) { mutex_unlock(&mddev->sync_mutex); return; } + set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); stop_sync_thread(mddev, false, false); mutex_unlock(&mddev->sync_mutex); } diff --git a/drivers/md/md.h b/drivers/md/md.h index 8d881cc59799..437ab70ce79b 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -781,6 +781,9 @@ extern void md_rdev_clear(struct md_rdev *rdev); extern void md_handle_request(struct mddev *mddev, struct bio *bio); extern int mddev_suspend(struct mddev *mddev, bool interruptible); extern void mddev_resume(struct mddev *mddev); +extern void md_idle_sync_thread(struct mddev *mddev); +extern void md_frozen_sync_thread(struct mddev *mddev); +extern void md_unfrozen_sync_thread(struct mddev *mddev); extern void md_reload_sb(struct mddev *mddev, int raid_disk); extern void md_update_sb(struct mddev *mddev, int force); From patchwork Sat Jan 27 07:47:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533995 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F3CD72111B for ; Sat, 27 Jan 2024 07:52:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341953; cv=none; b=go9h7MObkSOUoqnCGHHJ7I5Hjz+8FUoaOQvpYvLbg/J5x4oML91v6QDgk7StRrG3850bq3O1UG/QutlEMug9A+Ak45HTHyvfyq+Rz9EszE2zk3fAvryVhxzsmqWBqZkJGJjjUfoQsp0uC8wUSIDWZhW90MmDsESN6kTQTjZkEqs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341953; c=relaxed/simple; bh=6BrWeORSG3spzobIm2H1STSTTDZ6J8PBa6b1+irB9U4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=JR8nkNdaDZuyskiHG+zQBjS85ZpMx1/l5b7vkB6SKm66KW8yKEucSWXyMPiWBOMTQP7wRaURjgVkZH5/DqTu+2DI7kdtqquwba04KDU/8Q1iR4K9m9FUGB5fzqWlas8HlOv2aKlVCy6RRh/vIrI4c43OhUC9/ud97g94AI0MFlU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRc90gLvz4f3jsQ for ; Sat, 27 Jan 2024 15:52:25 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 1B6A31A0272 for ; Sat, 27 Jan 2024 15:52:29 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S10; Sat, 27 Jan 2024 15:52:28 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 06/12] dm-raid: really frozen sync_thread during suspend Date: Sat, 27 Jan 2024 15:47:48 +0800 Message-Id: <20240127074754.2380890-7-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S10 X-Coremail-Antispam: 1UD129KBjvJXoWxur1xXF4fWry8tryDtFW3trb_yoWrKw15pa yrtws0vw48JrW7Za9Fy3WkXFWYvwnIgrWUtr93WayrJ3WSkws3ury8Kw47ZFWDtFyxJa4Y yr4Dtw45uFWjgFJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr 0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai 1) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen, it only prevent new sync_thread to start, and it can't stop the running sync thread; 2) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use it as condition for md_stop_writes() in raid_postsuspend() doesn't look correct. 3) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime, and if MD_RECOVERY_FROZEN is cleared while the array is suspended, new sync_thread can start unexpected. Fix above problems by using the new helper to suspend the array during suspend, also disallow raid_message() to change sync_thread status during suspend. Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(), and with previous fixes, the test won't hang there anymore, however, the test will still fail and complain that ext4 is corrupted. And with this patch, the test won't hang due to stop_sync_thread() or fail due to ext4 is corrupted anymore. However, there is still a deadlock related to dm-raid456 that will be fixed in following patches. Reported-by: Mikulas Patocka Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/ Fixes: 1af2048a3e87 ("dm raid: fix deadlock caused by premature md_stop_writes()") Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Signed-off-by: Yu Kuai --- drivers/md/dm-raid.c | 38 +++++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index eb009d6bb03a..5ce3c6020b1b 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3240,11 +3240,12 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv) rs->md.ro = 1; rs->md.in_sync = 1; - /* Keep array frozen until resume. */ - set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery); - /* Has to be held on running the array */ mddev_suspend_and_lock_nointr(&rs->md); + + /* Keep array frozen until resume. */ + md_frozen_sync_thread(&rs->md); + r = md_run(&rs->md); rs->md.in_sync = 0; /* Assume already marked dirty */ if (r) { @@ -3722,6 +3723,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv, if (!mddev->pers || !mddev->pers->sync_request) return -EINVAL; + if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) + return -EBUSY; + if (!strcasecmp(argv[0], "frozen")) set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); else @@ -3791,15 +3795,31 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits) blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs)); } +static void raid_presuspend(struct dm_target *ti) +{ + struct raid_set *rs = ti->private; + + mddev_lock_nointr(&rs->md); + md_frozen_sync_thread(&rs->md); + mddev_unlock(&rs->md); +} + +static void raid_presuspend_undo(struct dm_target *ti) +{ + struct raid_set *rs = ti->private; + + mddev_lock_nointr(&rs->md); + md_unfrozen_sync_thread(&rs->md); + mddev_unlock(&rs->md); +} + static void raid_postsuspend(struct dm_target *ti) { struct raid_set *rs = ti->private; if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { /* Writes have to be stopped before suspending to avoid deadlocks. */ - if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery)) - md_stop_writes(&rs->md); - + md_stop_writes(&rs->md); mddev_suspend(&rs->md, false); } } @@ -4012,8 +4032,6 @@ static int raid_preresume(struct dm_target *ti) } /* Check for any resize/reshape on @rs and adjust/initiate */ - /* Be prepared for mddev_resume() in raid_resume() */ - set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) { set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); mddev->resync_min = mddev->recovery_cp; @@ -4056,9 +4074,9 @@ static void raid_resume(struct dm_target *ti) rs_set_capacity(rs); mddev_lock_nointr(mddev); - clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); mddev->ro = 0; mddev->in_sync = 0; + md_unfrozen_sync_thread(mddev); mddev_unlock_and_resume(mddev); } } @@ -4074,6 +4092,8 @@ static struct target_type raid_target = { .message = raid_message, .iterate_devices = raid_iterate_devices, .io_hints = raid_io_hints, + .presuspend = raid_presuspend, + .presuspend_undo = raid_presuspend_undo, .postsuspend = raid_postsuspend, .preresume = raid_preresume, .resume = raid_resume, From patchwork Sat Jan 27 07:47:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533996 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A46CB2111C for ; Sat, 27 Jan 2024 07:52:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341954; cv=none; b=WxyGflvExo8E5lSIXFJHpsaMDLNvQ6/2qpAwOwm0RWop/LkklW2Nrh3DYNkpO9PEwBMuovvhfICgvg69njNlm2Wh9TzIm/TtJg5hpO38N35tWyskBOajOr66ZXm/Y4orz9WAZe5caFmBtMpXkrUZR9Pxf3lX4m0ZUqIh/F54xxo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341954; c=relaxed/simple; bh=orTUf8OIgVO5EnJ8e56Og6bXoRyUdV4mYFLO+shF788=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=J2WRh8qg/SWfWEn3gpCvI3nhlrab9WzLAnlRirQO+TMOYFBxde0tsYA6jr5Kx2/WqqxYoe8IVdh6HnmJEuQCyXvi72jBNXniWaYRCRnBROCp5kQ/gbC1cZNRDY3Hg7SsT/84hJF2WOmF7N+b0Iajz9TFV4xKc+iP3mv/+o8qAYU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4TMRc72kKGz4f3lWH for ; Sat, 27 Jan 2024 15:52:23 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id BBB881A027B for ; Sat, 27 Jan 2024 15:52:29 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S11; Sat, 27 Jan 2024 15:52:29 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 07/12] md/dm-raid: don't call md_reap_sync_thread() directly Date: Sat, 27 Jan 2024 15:47:49 +0800 Message-Id: <20240127074754.2380890-8-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S11 X-Coremail-Antispam: 1UD129KBjvJXoW7AF1xAF18KFyDuFy5Ww48JFb_yoW8Kw4fpa yakas8Ar48JrW3ZFsrt3WDWayFv3ZFq3yqyFWfG3y3JF1fKF43WF1j9a17ZFWDJFWfta4U XFZ8tF45uF4YqFJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr 0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai Currently md_reap_sync_thread() is called from raid_message() directly without holding 'reconfig_mutex', this is definitely unsafe because md_reap_sync_thread() can change many fields that is protected by 'reconfig_mutex'. However, hold 'reconfig_mutex' here is still problematic because this will cause deadlock, for example, commit 130443d60b1b ("md: refactor idle/frozen_sync_thread() to fix deadlock"). Fix this problem by using stop_sync_thread() to unregister sync_thread, like md/raid did. Fixes: be83651f0050 ("DM RAID: Add message/status support for changing sync action") Signed-off-by: Yu Kuai --- drivers/md/dm-raid.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 5ce3c6020b1b..6b6c011d9f69 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3719,6 +3719,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv, { struct raid_set *rs = ti->private; struct mddev *mddev = &rs->md; + int ret = 0; if (!mddev->pers || !mddev->pers->sync_request) return -EINVAL; @@ -3726,17 +3727,24 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv, if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) return -EBUSY; - if (!strcasecmp(argv[0], "frozen")) - set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); - else - clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); + if (!strcasecmp(argv[0], "frozen")) { + ret = mddev_lock(mddev); + if (ret) + return ret; - if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) { - if (mddev->sync_thread) { - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); - } - } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle) + md_frozen_sync_thread(mddev); + mddev_unlock(mddev); + } else if (!strcasecmp(argv[0], "idle")) { + ret = mddev_lock(mddev); + if (ret) + return ret; + + md_idle_sync_thread(mddev); + mddev_unlock(mddev); + } + + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); + if (decipher_sync_action(mddev, mddev->recovery) != st_idle) return -EBUSY; else if (!strcasecmp(argv[0], "resync")) ; /* MD_RECOVERY_NEEDED set below */ From patchwork Sat Jan 27 07:47:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533997 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7D411DFEB for ; Sat, 27 Jan 2024 07:52:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341954; cv=none; b=j/OYH5KS5xgmHnYRb19O0AmNWU+KnfjO+FDfSzJFmrHOnW26VUAzjIdvb4mEdrYo+pcNisvbrWyug1/qK7KootvKlp5kS89DTFsEFdYZbdMFoSVIYSJg+5kijDFfXfrdfHmh4TK2KZqBL5dCSiG/MX1HgbbnIsQomyDuOL+BEsc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341954; c=relaxed/simple; bh=pAubPNXHPlM/0y/n8diT6F9qbOAS9X4ipmJVWOgC2Tk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=g1ojw/fKQIpUL1U98y2AgZlnCDFWpW17OndOGIJVvVa5eUoPyL17IcWH0Ga9qPTqQFJQ9iv00oNv59pRtD0EmlQdjkeLPhZAkJ/hH4zkiKbbDM1Nv5V1Ri2RdH4bo9kXNYrIWSmNzYhdJeNptV11Rh2OJ5OSfCEetadKP9ZjJ3I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4TMRc80Fwvz4f3lfF for ; Sat, 27 Jan 2024 15:52:24 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 66DBB1A0171 for ; Sat, 27 Jan 2024 15:52:30 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S12; Sat, 27 Jan 2024 15:52:30 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 08/12] dm-raid: remove mddev_suspend/resume() Date: Sat, 27 Jan 2024 15:47:50 +0800 Message-Id: <20240127074754.2380890-9-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S12 X-Coremail-Antispam: 1UD129KBjvJXoW7CF4fCw4fWr1fCr4kZw4fKrg_yoW8XFykpr 4rWayYvw48JrWjq3WDAan2gFy3t3ZYgrWjyrZ3Gas3u3W3Kw1Sgry8tw4jqFWqyFWxJ3W5 Aa1Yyw4kuryUKrJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr 0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai dm_suspend() already make sure that no new IO can be issued and will wait for all dispatched IO to be done. There is no need to call mddev_suspend() to make sure that again. Signed-off-by: Yu Kuai --- drivers/md/dm-raid.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 6b6c011d9f69..f1637cf88559 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3241,7 +3241,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv) rs->md.in_sync = 1; /* Has to be held on running the array */ - mddev_suspend_and_lock_nointr(&rs->md); + mddev_lock_nointr(&rs->md); /* Keep array frozen until resume. */ md_frozen_sync_thread(&rs->md); @@ -3825,11 +3825,9 @@ static void raid_postsuspend(struct dm_target *ti) { struct raid_set *rs = ti->private; - if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { + if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) /* Writes have to be stopped before suspending to avoid deadlocks. */ md_stop_writes(&rs->md); - mddev_suspend(&rs->md, false); - } } static void attempt_restore_of_faulty_devices(struct raid_set *rs) @@ -4085,7 +4083,7 @@ static void raid_resume(struct dm_target *ti) mddev->ro = 0; mddev->in_sync = 0; md_unfrozen_sync_thread(mddev); - mddev_unlock_and_resume(mddev); + mddev_unlock(mddev); } } From patchwork Sat Jan 27 07:47:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533998 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B91892136A for ; Sat, 27 Jan 2024 07:52:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341955; cv=none; b=huIjCAy3Sz5i5RWvmpRDVrh/5TQtZWYn1j9YDNzwwV7BzDWJJbgv+Eo/zNz68TrP8QMEAd/j7b38VG9MVosmbfFl2wPT6YEYn4/KU4ZaBQhWqfrijfhnIb/+ddMdk9VvzKSQbzVJABZu9CgfXGCWnNSCL7hfniaayEtYf4/ah0o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341955; c=relaxed/simple; bh=QXQ0+F8evE2D3KqXW332EFOwbheuBZ0/b6C8srK6I/w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=dq6mPefrNblmFfB78+lrbtcr8h+pkl/1PxSmDeHFWPebUTXPsJH3HVivDa6iUN+ILAdfQ/DZkLL8dnQM/rMuLbbeLbYBTXzU5FPzKlGGbJgtHPYMpWcdorXclP8Cx1ZLTNHI3GnLrT949Q2NjsrjlKc+RDfiUCBzf07mtSQU2Z0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRcC0QBjz4f3jsP for ; Sat, 27 Jan 2024 15:52:27 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 1349E1A0172 for ; Sat, 27 Jan 2024 15:52:31 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S13; Sat, 27 Jan 2024 15:52:30 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 09/12] dm-raid: add a new helper prepare_suspend() in md_personality Date: Sat, 27 Jan 2024 15:47:51 +0800 Message-Id: <20240127074754.2380890-10-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S13 X-Coremail-Antispam: 1UD129KBjvJXoW7Kr47Kr4xtrykGw13uF4fGrg_yoW8Wr48pa yIqay5Ar4UJa9Iqw1DXF4kZa4aq3ZIgrWqyryfJayfZa4Igrn3W3WFqayDZrZ0kFy3CF13 Aa1Ut3ykuF109rDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr 0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai There are no functional changes for now, prepare to fix a deadlock for dm-raid456. Signed-off-by: Yu Kuai --- drivers/md/dm-raid.c | 10 +++++++--- drivers/md/md.h | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index f1637cf88559..ede2e4574ee5 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3806,10 +3806,14 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits) static void raid_presuspend(struct dm_target *ti) { struct raid_set *rs = ti->private; + struct mddev *mddev = &rs->md; - mddev_lock_nointr(&rs->md); - md_frozen_sync_thread(&rs->md); - mddev_unlock(&rs->md); + mddev_lock_nointr(mddev); + md_frozen_sync_thread(mddev); + mddev_unlock(mddev); + + if (mddev->pers && mddev->pers->prepare_suspend) + mddev->pers->prepare_suspend(mddev); } static void raid_presuspend_undo(struct dm_target *ti) diff --git a/drivers/md/md.h b/drivers/md/md.h index 437ab70ce79b..29b476ff3b9f 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -617,6 +617,7 @@ struct md_personality int (*start_reshape) (struct mddev *mddev); void (*finish_reshape) (struct mddev *mddev); void (*update_reshape_pos) (struct mddev *mddev); + void (*prepare_suspend) (struct mddev *mddev); /* quiesce suspends or resumes internal processing. * 1 - stop new actions and wait for action io to complete * 0 - return to normal behaviour From patchwork Sat Jan 27 07:47:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13533999 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A22982231D for ; Sat, 27 Jan 2024 07:52:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341956; cv=none; b=R2OCrX5TEuiuClBTozIPb76mOw6TVwfI6qxQUSe0QxgO0yOeWpRxDa13vS3+q66JkGz1sAglHDSSrIsUXPXNSVhOfwNAmGMJqKaIuq8MtxW5dBKiT0K+pIk87jfpYP/Cg0+HVEmxpVUd9kCb7Mbni52KR8JWab2Plehi9xdz7hU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341956; c=relaxed/simple; bh=uMmvtP4Kk96xFStFUNR5RYqJMKlOONO/eRXGKcqJhr0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Ck/tKOBp788lkdA7Mf0mx40VibTtEvOFlzWGJc+c7BH2lM5pxoCHhwhYM4eweiUyk+J6Em9Lk/l5PK6Ptby1ClHr6VL+KNH2CwwwDwnpNk5fKVEpZIFFhC2pcufLz519J6G0jGC7IztcxoK/Dvgv4TODQNLIsc9kyeZOp6VtI/A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4TMRc92Vptz4f3lfR for ; Sat, 27 Jan 2024 15:52:25 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id B4BEE1A01E9 for ; Sat, 27 Jan 2024 15:52:31 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S14; Sat, 27 Jan 2024 15:52:31 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 10/12] md: export helper md_is_rdwr() Date: Sat, 27 Jan 2024 15:47:52 +0800 Message-Id: <20240127074754.2380890-11-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S14 X-Coremail-Antispam: 1UD129KBjvJXoW7Kr47Kr4xtrykGw13uF4fGrg_yoW8XFyfpa 9agFy3ArW5ZF43X3yUJayDua45Z3WfKrWqkry3Z3ykXa43Jrs8CF4rGFWDJrWDWFyfAF1a ga15JF1UuF10ywUanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr 0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai There are no functional changes for now, prepare to fix a deadlock for dm-raid456. Signed-off-by: Yu Kuai --- drivers/md/md.c | 12 ------------ drivers/md/md.h | 12 ++++++++++++ 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 6a76bd27e381..b942cc126ce0 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -99,18 +99,6 @@ static void mddev_detach(struct mddev *mddev); static void export_rdev(struct md_rdev *rdev, struct mddev *mddev); static void md_wakeup_thread_directly(struct md_thread __rcu *thread); -enum md_ro_state { - MD_RDWR, - MD_RDONLY, - MD_AUTO_READ, - MD_MAX_STATE -}; - -static bool md_is_rdwr(struct mddev *mddev) -{ - return (mddev->ro == MD_RDWR); -} - /* * Default number of read corrections we'll attempt on an rdev * before ejecting it from the array. We divide the read error diff --git a/drivers/md/md.h b/drivers/md/md.h index 29b476ff3b9f..98da86d38ba8 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -558,6 +558,18 @@ enum recovery_flags { MD_RESYNCING_REMOTE, /* remote node is running resync thread */ }; +enum md_ro_state { + MD_RDWR, + MD_RDONLY, + MD_AUTO_READ, + MD_MAX_STATE +}; + +static bool md_is_rdwr(struct mddev *mddev) +{ + return (mddev->ro == MD_RDWR); +} + static inline int __must_check mddev_lock(struct mddev *mddev) { return mutex_lock_interruptible(&mddev->reconfig_mutex); From patchwork Sat Jan 27 07:47:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13534000 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 27C99241FF for ; Sat, 27 Jan 2024 07:52:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341957; cv=none; b=Incw/49kbxA9NG1eYRwhd9R9dLd9DnHPT+0hFz3OqTGtvD/SXh5VG4DF+wjHagY2VVWMb44xZsrKCNpBLvmEUmkR9dZA0kjX1ZNcAOL/68bk70No7tQs/W7P+B3REAwmvrOrh1JD25fFxAOZa1EuN8wBR37cFwGMBcvryr7Bym8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341957; c=relaxed/simple; bh=Y2CyAULl8CDT1e354bpgbi56zPjrKUGKSSoo0xNiEag=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Ul0vXcPBKuM9BzuMYEp1qyEGz2zCDzkSHZQ9aK2P1ItkR/lS4m8lmLN6eVHJ/uOZC74aFsXqah3sPj/Bw8Ivb8X2+40NY/60OgDynv84T0cpnnDlFa8l02t2vc9E9DEbetiTnjVrjhUs4KDGMqlFddyKgILBJD6PkO2ThfVjVzM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRcD2kQCz4f3jsK for ; Sat, 27 Jan 2024 15:52:28 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 61A7E1A016E for ; Sat, 27 Jan 2024 15:52:32 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S15; Sat, 27 Jan 2024 15:52:32 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 11/12] md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape Date: Sat, 27 Jan 2024 15:47:53 +0800 Message-Id: <20240127074754.2380890-12-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S15 X-Coremail-Antispam: 1UD129KBjvJXoW3Gr13WF4xJry7XF4UXw13Jwb_yoW7uF1fpa 9Fkas8Zr40v34akrWDJF4kWa4F9Fs7Kry3ZF93Cw4fZ3W3Xr1kA3W5Jay5ur98AFy8J3y5 JFW5tFyDurs8KFDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr 0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai For raid456, if reshape is still in progress, then IO across reshape position will wait for reshape to make progress. However, for dm-raid, in following cases reshape will never make progress hence IO will hang: 1) the array is read-only; 2) MD_RECOVERY_WAIT is set; 3) MD_RECOVERY_FROZEN is set; After commit c467e97f079f ("md/raid6: use valid sector values to determine if an I/O should wait on the reshape") fix the problem that IO across reshape position doesn't wait for reshape, the dm-raid test shell/lvconvert-raid-reshape.sh start to hang: [root@fedora ~]# cat /proc/979/stack [<0>] wait_woken+0x7d/0x90 [<0>] raid5_make_request+0x929/0x1d70 [raid456] [<0>] md_handle_request+0xc2/0x3b0 [md_mod] [<0>] raid_map+0x2c/0x50 [dm_raid] [<0>] __map_bio+0x251/0x380 [dm_mod] [<0>] dm_submit_bio+0x1f0/0x760 [dm_mod] [<0>] __submit_bio+0xc2/0x1c0 [<0>] submit_bio_noacct_nocheck+0x17f/0x450 [<0>] submit_bio_noacct+0x2bc/0x780 [<0>] submit_bio+0x70/0xc0 [<0>] mpage_readahead+0x169/0x1f0 [<0>] blkdev_readahead+0x18/0x30 [<0>] read_pages+0x7c/0x3b0 [<0>] page_cache_ra_unbounded+0x1ab/0x280 [<0>] force_page_cache_ra+0x9e/0x130 [<0>] page_cache_sync_ra+0x3b/0x110 [<0>] filemap_get_pages+0x143/0xa30 [<0>] filemap_read+0xdc/0x4b0 [<0>] blkdev_read_iter+0x75/0x200 [<0>] vfs_read+0x272/0x460 [<0>] ksys_read+0x7a/0x170 [<0>] __x64_sys_read+0x1c/0x30 [<0>] do_syscall_64+0xc6/0x230 [<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74 This is because reshape can't make progress. For md/raid, the problem doesn't exist because register new sync_thread doesn't rely on the IO to be done any more: 1) If array is read-only, it can switch to read-write by ioctl/sysfs; 2) md/raid never set MD_RECOVERY_WAIT; 3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold 'reconfig_mutex', hence it can be cleared and reshape can continue by sysfs api 'sync_action'. However, I'm not sure yet how to avoid the problem in dm-raid yet. This patch detect the above 3 cases in dm_suspend(), and fail those IO directly. If user really meet the IO error, then it means they're reading the wrong data before c467e97f079f. And it's safe to read/write the array after reshape make progress successfully. Signed-off-by: Yu Kuai --- drivers/md/md.h | 2 +- drivers/md/raid5.c | 32 +++++++++++++++++++++++++++++++- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/drivers/md/md.h b/drivers/md/md.h index 98da86d38ba8..8e81f9e2fb20 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -565,7 +565,7 @@ enum md_ro_state { MD_MAX_STATE }; -static bool md_is_rdwr(struct mddev *mddev) +static inline bool md_is_rdwr(struct mddev *mddev) { return (mddev->ro == MD_RDWR); } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 6a7a32f7fb91..812d7ec64da5 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5915,6 +5915,13 @@ static int add_all_stripe_bios(struct r5conf *conf, return ret; } +static bool reshape_disabled(struct mddev *mddev) +{ + return !md_is_rdwr(mddev) || + test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || + test_bit(MD_RECOVERY_FROZEN, &mddev->recovery); +} + static enum stripe_result make_stripe_request(struct mddev *mddev, struct r5conf *conf, struct stripe_request_ctx *ctx, sector_t logical_sector, struct bio *bi) @@ -5946,7 +5953,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, if (ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)) { spin_unlock_irq(&conf->device_lock); - return STRIPE_SCHEDULE_AND_RETRY; + ret = STRIPE_SCHEDULE_AND_RETRY; + goto out; } } spin_unlock_irq(&conf->device_lock); @@ -6025,6 +6033,13 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, out_release: raid5_release_stripe(sh); +out: + if (ret == STRIPE_SCHEDULE_AND_RETRY && !mddev->gendisk && + reshape_disabled(mddev)) { + bi->bi_status = BLK_STS_IOERR; + ret = STRIPE_FAIL; + pr_err("dm-raid456: io failed across reshape position while reshape can't make progress"); + } return ret; } @@ -8909,6 +8924,18 @@ static int raid5_start(struct mddev *mddev) return r5l_start(conf->log); } +/* + * This is only used for dm-raid456, caller already frozen sync_thread, hence + * if rehsape is still in progress, io that is waiting for reshape can never be + * done now, hence wake up and handle those IO. + */ +static void raid5_prepare_suspend(struct mddev *mddev) +{ + struct r5conf *conf = mddev->private; + + wake_up(&conf->wait_for_overlap); +} + static struct md_personality raid6_personality = { .name = "raid6", @@ -8932,6 +8959,7 @@ static struct md_personality raid6_personality = .quiesce = raid5_quiesce, .takeover = raid6_takeover, .change_consistency_policy = raid5_change_consistency_policy, + .prepare_suspend = raid5_prepare_suspend, }; static struct md_personality raid5_personality = { @@ -8956,6 +8984,7 @@ static struct md_personality raid5_personality = .quiesce = raid5_quiesce, .takeover = raid5_takeover, .change_consistency_policy = raid5_change_consistency_policy, + .prepare_suspend = raid5_prepare_suspend, }; static struct md_personality raid4_personality = @@ -8981,6 +9010,7 @@ static struct md_personality raid4_personality = .quiesce = raid5_quiesce, .takeover = raid4_takeover, .change_consistency_policy = raid5_change_consistency_policy, + .prepare_suspend = raid5_prepare_suspend, }; static int __init raid5_init(void) From patchwork Sat Jan 27 07:47:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13534001 X-Patchwork-Delegate: snitzer@redhat.com Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA40524B50 for ; Sat, 27 Jan 2024 07:52:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341957; cv=none; b=s0KaMxRGZouwrb+412ppbSQTwoklHNGTxGN4yXryjojffxfa4v1bGnEH6+wc3DJXdePLV+mu8k9aWr2S3caDr1z8+nErXL8tMGvRxK1PXah3n7M1SML2pfVLI2dXUgIvJnQVUbBk/OISDsg767qL+7iASUR2zpTq77X4An8m3Ug= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706341957; c=relaxed/simple; bh=jdkIPGev0C5+tMmvWTmPjir2mBpBE6qhDyE0vWrNCf0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=cZSE9F+98VmARKRIL9zQU585LLFbLTirQYo6qQSdO5IqbDHDcu9syVxvu8W/A07HqrTZbSZt4C+vS/PSuEJFbYIkfLD8zxJl9Fw7aJF6WUucEmWPANeIGNhIGqB+aPe4KCKgIMhkRE+g1yR1tjgt+SGIF2q2QHgj44ES/7lzOUY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TMRcF0F9Cz4f3kF8 for ; Sat, 27 Jan 2024 15:52:29 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 0DEEB1A038B for ; Sat, 27 Jan 2024 15:52:33 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAetrRlWjeACA--.31980S16; Sat, 27 Jan 2024 15:52:32 +0800 (CST) From: Yu Kuai To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v3 12/12] dm-raid: fix lockdep waring in "pers->hot_add_disk" Date: Sat, 27 Jan 2024 15:47:54 +0800 Message-Id: <20240127074754.2380890-13-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240127074754.2380890-1-yukuai1@huaweicloud.com> References: <20240127074754.2380890-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: dm-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAetrRlWjeACA--.31980S16 X-Coremail-Antispam: 1UD129KBjvJXoW7Cr45Jw1UZF45JF13GFW5GFg_yoW8Jw4Up3 ZrK343Kw4DJr48Za1qvw1q9a45Jan8K3ySyrZxG395ZFy7ZrZI9ws8Ga1agFWDAFZ3ta98 AFW3J398W3Z8K37anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr 0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai The lockdep assert is added by commit a448af25becf ("md/raid10: remove rcu protection to access rdev from conf") in print_conf(). And I didn't notice that dm-raid is calling "pers->hot_add_disk" without holding 'reconfig_mutex'. "pers->hot_add_disk" read and write many fields that is protected by 'reconfig_mutex', and raid_resume() already grab the lock in other contex. Hence fix this problem by protecting "pers->host_add_disk" with the lock. Fixes: 9092c02d9435 ("DM RAID: Add ability to restore transiently failed devices on resume") Fixes: a448af25becf ("md/raid10: remove rcu protection to access rdev from conf") Signed-off-by: Yu Kuai --- drivers/md/dm-raid.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index ede2e4574ee5..ed8c28952b14 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -4075,7 +4075,9 @@ static void raid_resume(struct dm_target *ti) * Take this opportunity to check whether any failed * devices are reachable again. */ + mddev_lock_nointr(mddev); attempt_restore_of_faulty_devices(rs); + mddev_unlock(mddev); } if (test_and_clear_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {