From: heinzm@redhat.com
To: dm-devel@redhat.com
Date: Mon, 27 Feb 2017 20:46:27 +0100
Message-Id: <20170227194627.22657-1-heinzm@redhat.com>
Cc: heinzm@redhat.com
Subject: [dm-devel] [PATCH] dm raid: fix data corruption on reshape request

From: Heinz Mauelshagen

The lvm2 sequence to process constructor flags triggering a rebuild or
a reshape is defined as:

- load the table with the flags (e.g. rebuild/delta_disks/data_offset)
- clear out the flags in lvm2
- store the lvm2 metadata and reload the adjusted mapping in order to
  prevent requesting a rebuild or a reshape over and over again on
  activation

Currently, when an inactive table with those flags is loaded, dm-raid
starts the rebuild/reshape directly on resume and keeps updating the
progress in the raid metadata.

The aforementioned second reload, which resets the flags, reads the
volatile progress state kept in the raid superblocks from within its
constructor. Because the active mapping is still processing the
reshape, the position read there is stale by the time the device is
resumed.

In case of reshaping, this causes data corruption by processing
already reshaped stripes again. In case of rebuilds, it does _not_
cause data corruption but involves superfluous rebuilds.

Fix by keeping the raid set frozen on resume of the first table
(loaded with the flags) and unfreezing it on resume of the second
table (with the flags cleared).
This patch is based on:
https://patchwork.kernel.org/patch/9485615 "dm raid: fix transient device failure processing"
https://patchwork.kernel.org/patch/9454975 "dm raid: journal device support"

Signed-off-by: Heinz Mauelshagen
---
 drivers/md/dm-raid.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index b8f978e..f750493 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -92,6 +92,8 @@ struct raid_dev {
 #define CTR_FLAG_DATA_OFFSET		(1 << __CTR_FLAG_DATA_OFFSET)
 #define CTR_FLAG_RAID10_USE_NEAR_SETS	(1 << __CTR_FLAG_RAID10_USE_NEAR_SETS)
+#define RESUME_STAY_FROZEN_FLAGS (CTR_FLAG_DELTA_DISKS | \
+				  CTR_FLAG_DATA_OFFSET)
 
 /*
  * Definitions of various constructor flags to
  * be used in checks of valid / invalid flags
@@ -3643,7 +3645,15 @@ static void raid_resume(struct dm_target *ti)
 	mddev->ro = 0;
 	mddev->in_sync = 0;
 
-	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	/*
+	 * Keep the RAID set frozen in case flags respective to
+	 * reshape or rebuild are set until an imminent inactive
+	 * table load/resume occurs. This ensures that the
+	 * constructor for the inactive table retrieves an
+	 * up-to-date reshape_position.
+	 */
+	if (!(rs->ctr_flags & RESUME_STAY_FROZEN_FLAGS))
+		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 
 	if (mddev->suspended)
 		mddev_resume(mddev);
@@ -3651,7 +3661,7 @@ static void raid_resume(struct dm_target *ti)
 
 static struct target_type raid_target = {
 	.name = "raid",
-	.version = {1, 9, 1},
+	.version = {1, 10, 2},
 	.module = THIS_MODULE,
 	.ctr = raid_ctr,
 	.dtr = raid_dtr,