From patchwork Wed Sep 5 17:36:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Heinz Mauelshagen X-Patchwork-Id: 10589213 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8BB8C14E0 for ; Wed, 5 Sep 2018 17:36:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 74BB3290B3 for ; Wed, 5 Sep 2018 17:36:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 68CBF29353; Wed, 5 Sep 2018 17:36:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0B2B7290B3 for ; Wed, 5 Sep 2018 17:36:55 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B18A7308FBA2; Wed, 5 Sep 2018 17:36:53 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.21]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 74E98608EE; Wed, 5 Sep 2018 17:36:53 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 330274BB7F; Wed, 5 Sep 2018 17:36:53 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id w85HaphB027950 for ; Wed, 5 Sep 2018 13:36:51 -0400 Received: by smtp.corp.redhat.com (Postfix) id 12673FA96C; Wed, 5 Sep 2018 17:36:51 +0000 (UTC) Delivered-To: dm-devel@redhat.com Received: from o.ww.redhat.com (ovpn-204-65.brq.redhat.com [10.40.204.65]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6CE0CFA96D; Wed, 5 Sep 2018 17:36:50 +0000 (UTC) From: Heinz Mauelshagen To: heinzm@redhat.com, dm-devel@redhat.com Date: Wed, 5 Sep 2018 19:36:42 +0200 Message-Id: <344f4b6d298b2e4058bd57e7197c3914a9040ba5.1536167421.git.heinzm@redhat.com> In-Reply-To: References: In-Reply-To: References: X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-loop: dm-devel@redhat.com Subject: [dm-devel] [PATCH 2/5] dm-raid: fix stripe adding reshape deadlock X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Wed, 05 Sep 2018 17:36:54 +0000 (UTC) X-Virus-Scanned: ClamAV using ClamSMTP When initiating a stripe adding reshape, a deadlock between md_stop_writes() waiting for the sync thread to stop and the running sync thread waiting for inactive stripes occurs (this frequently happens on single-core but rarely on multi-core systems). Resolve by setting MD_RECOVERY_WAIT to request the main MD resynchronization thread worker function md_do_sync() to bail out when initiating the reshape via constructor arguments. Don't set the flag when reloading without those arguments and avoid superfluous mddev_{suspend,resume} setting up reshape. Passes all lvm2 raid tests. Signed-off-by: Heinz Mauelshagen --- Documentation/device-mapper/dm-raid.txt | 1 + drivers/md/dm-raid.c | 13 ++++--------- 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt index f68d06d6f28b..efb73f521568 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.txt @@ -349,3 +349,4 @@ Version History state races. 1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen 1.13.3 Fix reshape race on small devices +1.14.0 Fix stripe adding reshape deadlock/potential data corruption diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index ecb7706f7330..03dd915eff9e 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3871,14 +3871,13 @@ static int rs_start_reshape(struct raid_set *rs) struct mddev *mddev = &rs->md; struct md_personality *pers = mddev->pers; + /* Don't allow the sync thread to work until the table gets reloaded. */ + set_bit(MD_RECOVERY_WAIT, &mddev->recovery); + r = rs_setup_reshape(rs); if (r) return r; - /* Need to be resumed to be able to start reshape, recovery is frozen until raid_resume() though */ - if (test_and_clear_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) - mddev_resume(mddev); - /* * Check any reshape constraints enforced by the personalility * @@ -3902,10 +3901,6 @@ static int rs_start_reshape(struct raid_set *rs) } } - /* Suspend because a resume will happen in raid_resume() */ - set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags); - mddev_suspend(mddev); - /* * Now reshape got set up, update superblocks to * reflect the fact so that a table reload will @@ -4002,7 +3997,7 @@ static void raid_resume(struct dm_target *ti) static struct target_type raid_target = { .name = "raid", - .version = {1, 13, 3}, + .version = {1, 14, 0}, .module = THIS_MODULE, .ctr = raid_ctr, .dtr = raid_dtr,