From patchwork Tue Jul 8 00:55:14 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Junichi Nomura
X-Patchwork-Id: 4501211
X-Patchwork-Delegate: snitzer@redhat.com
From: Junichi Nomura
To: Bart Van Assche, device-mapper development
Cc: Mike Snitzer
Subject: Re: [dm-devel] v3.15 dm-mpath regression: cable pull test causes I/O hang
Date: Tue, 8 Jul 2014 00:55:14 +0000
Message-ID: <11AF7C027C4C02408624617A498607840132038A@BPXM12GP.gisp.nec.co.jp>
In-Reply-To: <53BAA35B.30204@acm.org>
References: <53AD6B62.2020407@acm.org> <20140627133345.GA6150@redhat.com>
 <20140702220223.GA23894@redhat.com> <53B56120.8040802@acm.org>
 <20140703140516.GB28104@redhat.com> <53B569E1.1010405@acm.org>
 <11AF7C027C4C02408624617A4986078401311EA8@BPXM12GP.gisp.nec.co.jp>
 <53BAA35B.30204@acm.org>

On 07/07/14 22:40, Bart Van Assche wrote:
> Thanks for looking into this issue. I have attached the requested
> information to this e-mail for a test run with kernel 3.16-rc4 at the
> initiator side.

Thank you, Bart. The information is helpful.

From the "dmsetup table" output, no hardware handler is used in your setup,
so pg_init is not involved.

> # cat /proc/diskstats
..
> 253 1 dm-1 127 0 1016 1070 0 0 0 0 1 278360 278820

In-flight IO remains.

> # dmsetup info -c
> Name Maj Min Stat Open Targ Event UUID
...
> 26466363537333266 253 1 L--w 1 1 0 mpath-26466363537333266

A typical reason for remaining IO is that the device stays suspended,
but the device state is OK here.

> # dmsetup status
...
> 26466363537333266: 0 256000 multipath 2 1 0 0 1 1 E 0 2 2 8:48 A 0 0 1 8:160 A 0 0 1

There is a single path group on dm-1 and both of its paths are active,
but the path group itself is not active.

I suspect what's happening here is that nobody clears m->queue_io:
multipath_busy() returns busy when queue_io=1, while clearing queue_io
needs __choose_pgpath(), which won't be called if multipath_busy() is true.

I think if you run 'sg_inq /dev/dm-1', for example, in this case,
the device will start working again.
Since ioctl is not affected by multipath_busy(), the problem was
probably hidden in many cases by other activity such as udev.

The attached patch should fix the problem.
Could you give it a try?
- Jun'ichi Nomura, NEC Corporation


pg_ready() checks the current state of the multipath and may return
false even if a new IO is needed to change the state.

OTOH, if multipath_busy() returns busy, a new IO will not be sent
to multipath target and the state change won't happen. That results
in lock up.

The intent of multipath_busy() is to avoid unnecessary cycles of
dequeue + request_fn + requeue if it is known that multipath device
will requeue.

Such situation would be:
  - path group is being activated
  - there is no path and the multipath is setup to requeue if no path

This patch should fix the problem introduced as a part of this commit:
  commit e809917735ebf1b9a56c24e877ce0d320baee2ec
  dm mpath: push back requests instead of queueing

---
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ebfa411..d58343e 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1620,8 +1620,9 @@ static int multipath_busy(struct dm_target *ti)
 
 	spin_lock_irqsave(&m->lock, flags);
 
-	/* pg_init in progress, requeue until done */
-	if (!pg_ready(m)) {
+	/* pg_init in progress or no paths available */
+	if (m->pg_init_in_progress ||
+	    (!m->nr_valid_paths && m->queue_if_no_path)) {
 		busy = 1;
 		goto out;
 	}
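
For readers who want to see the lock-up in isolation, here is a rough user-space sketch of the circular dependency described above. It is not kernel code. struct mpath_model, busy_old(), busy_new(), map_request() and dispatch() are hypothetical names made up for illustration. busy_old() only models the one property of the old check that matters here (busy while queue_io is set), and map_request() assumes a setup without a hardware handler, where reaching the map path is enough to clear queue_io.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a few fields of struct multipath. */
struct mpath_model {
	bool queue_io;            /* set until a path group has been chosen */
	bool pg_init_in_progress; /* path group activation is running */
	unsigned nr_valid_paths;  /* number of usable paths */
	bool queue_if_no_path;    /* requeue instead of failing when no path */
};

typedef bool (*busy_fn)(const struct mpath_model *m);

/* Assumed model of the old check: busy as long as queue_io is set. */
static bool busy_old(const struct mpath_model *m)
{
	return m->queue_io || m->pg_init_in_progress;
}

/* Mirrors the condition introduced by the patch. */
static bool busy_new(const struct mpath_model *m)
{
	return m->pg_init_in_progress ||
	       (!m->nr_valid_paths && m->queue_if_no_path);
}

/*
 * Assumed effect of the map path when no hardware handler is configured:
 * choosing a path group clears queue_io.
 */
static void map_request(struct mpath_model *m)
{
	m->queue_io = false;
}

/*
 * Offer max_tries requests to the target. A request reaches map_request()
 * only if the busy check lets it through. Returns how many were mapped.
 */
static int dispatch(struct mpath_model *m, busy_fn busy, int max_tries)
{
	int mapped = 0;

	assert(m != NULL && busy != NULL);
	for (int i = 0; i < max_tries; i++) {
		if (busy(m))
			continue; /* request stays queued; map is never reached */
		map_request(m);
		mapped++;
	}
	return mapped;
}
```

Starting from queue_io set, two valid paths and no pg_init in progress, dispatch() with busy_old never maps a request and queue_io stays set, which corresponds to the hang. With busy_new the first request goes through and clears queue_io.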