From patchwork Sun Nov 24 23:30:43 2013
X-Patchwork-Id: 3227241
From: Jonathan Brassow
To: jbrassow@redhat.com, neilb@suse.de, linux-raid@vger.kernel.org,
 dm-devel@redhat.com
Date: Sun, 24 Nov 2013 17:30:43 -0600
Message-Id: <1385335843-14021-2-git-send-email-jbrassow@redhat.com>
In-Reply-To: <1385335843-14021-1-git-send-email-jbrassow@redhat.com>
References: <1385335843-14021-1-git-send-email-jbrassow@redhat.com>
Subject: [dm-devel] [PATCH 1/1] MD/DM
 RAID: Fix hang due to recent RAID5 locking changes

When commit 773ca82 was made in v3.12-rc1, it caused RAID4/5/6 devices
that were created via device-mapper (dm-raid.c) to hang on creation.
This is not necessarily the fault of that commit, but perhaps of the
way dm-raid.c was setting up and activating devices.

Device-mapper allows I/O and memory allocations in the constructor
(i.e. raid_ctr()), but nominal and recovery I/O should not be allowed
until a 'resume' is issued (i.e. raid_resume()).  It has been
problematic (at least in the past) to call mddev_resume before
mddev_suspend was called, but this is how DM behaves - CTR then resume.
To solve the problem, raid_ctr() was setting up the structures, calling
md_run(), and then also calling mddev_suspend().  The stage was then
set for raid_resume() to call mddev_resume().

Commit 773ca82 caused a change in behavior during raid5.c:run().
'setup_conf->grow_stripes->grow_one_stripe' is called, which creates
the stripe cache and increments 'active_stripes'.
'grow_one_stripe->release_stripe' doesn't actually decrement
'active_stripes' anymore.  The side effect of this is that when
raid_ctr calls mddev_suspend, it waits for 'active_stripes' to reduce
to 0 - which never happens.

You could argue that the MD personalities should be able to handle
either a suspend or a resume after 'md_run' is called, but they can't
really handle either.
To fix this, I've removed the call to mddev_suspend in raid_ctr and
I've made the call to the personality's 'quiesce' function within
mddev_resume dependent on whether the device is currently suspended.

This patch is suitable and recommended for 3.12.

Signed-off-by: Jonathan Brassow
---
 drivers/md/dm-raid.c | 1 -
 drivers/md/md.c      | 5 ++++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 4880b69..cdad87c 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -1249,7 +1249,6 @@ static int raid_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	rs->callbacks.congested_fn = raid_is_congested;
 	dm_table_add_target_callbacks(ti->table, &rs->callbacks);
 
-	mddev_suspend(&rs->md);
 	return 0;
 
 size_mismatch:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 561a65f..383980d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -359,9 +359,12 @@ EXPORT_SYMBOL_GPL(mddev_suspend);
 
 void mddev_resume(struct mddev *mddev)
 {
+	int should_quiesce = mddev->suspended;
+
 	mddev->suspended = 0;
 	wake_up(&mddev->sb_wait);
-	mddev->pers->quiesce(mddev, 0);
+	if (should_quiesce)
+		mddev->pers->quiesce(mddev, 0);
 
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
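
For readers who want to see the intended call ordering outside the kernel,
here is a minimal userspace sketch of the resume logic in the md.c hunk
above. It is a toy model, not kernel code: struct model_mddev, the
model_*() helpers, and the quiesce_depth/unbalanced counters are all
hypothetical names invented for illustration, and treating a quiesce(0)
without a prior quiesce(1) as "unbalanced" is an assumption of the model.
The real mddev_suspend() also waits for outstanding I/O, which is left out
here.

```c
#include <assert.h>

/* Hypothetical stand-in for struct mddev; only the fields the sketch needs. */
struct model_mddev {
	int suspended;     /* mirrors mddev->suspended */
	int quiesce_depth; /* outstanding quiesce(.., 1) calls (model only) */
	int unbalanced;    /* quiesce(.., 0) calls with nothing to undo (model only) */
};

/* Stand-in for pers->quiesce(): state 1 quiesces, state 0 releases. */
static void model_quiesce(struct model_mddev *m, int state)
{
	if (state) {
		m->quiesce_depth++;
		return;
	}
	if (m->quiesce_depth == 0)
		m->unbalanced++; /* released without a matching quiesce */
	else
		m->quiesce_depth--;
}

/* Simplified suspend: mark suspended, then quiesce the personality. */
static void model_suspend(struct model_mddev *m)
{
	m->suspended = 1;
	model_quiesce(m, 1);
}

/* Resume as in the patch: only un-quiesce if we were actually suspended. */
static void model_resume(struct model_mddev *m)
{
	int should_quiesce = m->suspended;

	m->suspended = 0;
	if (should_quiesce)
		model_quiesce(m, 0);
}

/* DM ordering after the patch: CTR (no suspend), then resume. Returns 0 if balanced. */
static int model_ctr_then_resume(void)
{
	struct model_mddev m = { 0, 0, 0 };

	model_resume(&m);
	return m.unbalanced;
}
```

In this model, model_ctr_then_resume() follows the "CTR then resume" ordering
with no suspend in the constructor, and the resume leaves the quiesce
bookkeeping untouched. A later model_suspend()/model_resume() pair still
balances, which is the behaviour the patch is aiming for.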