From: Eric Sandeen
To: linux-xfs, dm-devel@redhat.com
Cc: Mike Snitzer
Date: Tue, 31 Jan 2017 21:12:07 -0600
Subject: [dm-devel] error propagation problem on xfs over dm stripe

xfstest generic/108 creates a stripe, then fails one leg to see whether I/O errors on only a subset of an I/O get reported to userspace:

# Test partial block device failure. Calls like fsync() should report failure
# on partial I/O failure, e.g. a single failed disk in a raid 0 stripe.
#
# Test motivated by an XFS bug, and this commit fixed the issue
# xfs: return errors from partial I/O failures to files

This started failing with a recent xfs commit, but that change wasn't expected to affect this behavior at all. My best guess was that allocation and I/O patterns shifted over the stripe, and I think that turned out to be right.

I tracked this down to being unique to dm; an md stripe of the same geometry doesn't have this issue.

The root cause seems to be that dm's dec_pending() overwrites a bio's bi_error regardless of its current state, and in some cases will overwrite an -EIO with a zero.

This seems to fix it for me:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5..3555ba8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -808,7 +808,9 @@ static void dec_pending(struct dm_io *io, int error)
 		} else {
 			/* done with normal IO or empty flush */
 			trace_block_bio_complete(md->queue, bio, io_error);
-			bio->bi_error = io_error;
+			/* don't overwrite or clear existing errors */
+			if (!bio->bi_error)
+				bio->bi_error = io_error;
 			bio_endio(bio);
 		}
 	}

However, Mike was a little uneasy, not knowing for sure how we end up overwriting this bio's error here (hopefully I'm representing his concerns fairly and correctly). One other clue is that I think this is a chained bio, something xfs does use.

I'll submit the above as a proper dm patch if it seems sane, but I figured I'd throw this out on the lists as a heads-up / question / RFC before I do that.

-Eric