From patchwork Mon Apr 20 18:05:00 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: babu moger
X-Patchwork-Id: 19038
Received: from hormel.redhat.com (hormel1.redhat.com [209.132.177.33])
	by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n3KI6Q9i015816
	for ; Mon, 20 Apr 2009 18:06:27 GMT
Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com [10.8.4.110])
	by hormel.redhat.com (Postfix) with ESMTP id 381246196D3;
	Mon, 20 Apr 2009 14:06:26 -0400 (EDT)
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by listman.util.phx.redhat.com (8.13.1/8.13.1) with ESMTP id n3KI6Otd019193
	for ; Mon, 20 Apr 2009 14:06:25 -0400
Received: from mx1.redhat.com (mx1.redhat.com [172.16.48.31])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n3KI6NDO013883
	for ; Mon, 20 Apr 2009 14:06:23 -0400
Received: from chip3mo2-old.postini.com (chip3mo2-old.postini.com [64.18.14.205])
	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n3KI54wf002565
	for ; Mon, 20 Apr 2009 14:05:04 -0400
Received: from source ([147.145.40.20])
	by chip3ob64.postini.com ([64.18.6.12]) with SMTP
	ID DSNKSey5UOEmYP7TpMbIOETJAiOt3aqqWH4f@postini.com;
	Mon, 20 Apr 2009 11:05:06 PDT
Received: from milmhbs0.lsil.com (mhbs.lsil.com [147.145.1.30])
	by mail0.lsil.com (8.12.11/8.12.11) with ESMTP id n3KI53Up014629;
	Mon, 20 Apr 2009 11:05:03 -0700 (PDT)
Received: from coshub01.lsi.com (coshub01.co.lsil.com [172.21.36.64])
	by milmhbs0.lsil.com (8.12.11/8.12.11) with ESMTP id n3KI57Pp012007;
	Mon, 20 Apr 2009 11:05:07 -0700
Received: from cosmail01.lsi.com ([172.21.36.24])
	by coshub01.lsi.com ([172.21.36.64]) with mapi;
	Mon, 20 Apr 2009 11:05:02 -0700
From: "Moger, Babu"
To: "'dm-devel@redhat.com'" , "linux-scsi@vger.kernel.org"
Date: Mon, 20 Apr 2009 11:05:00 -0700
Thread-Topic: [PATCH] dm mpath: Try recover from I/O failure by
	re-initializing the PG if device is running on one path
Thread-Index: AcnB4oWtAzRXCFXMTm6QtCM5EfXlaQ==
Message-ID:
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
X-Scanned-By: MIMEDefang 2.63 on 172.16.48.31
X-Scanned-By: MIMEDefang 2.39
X-RedHat-Spam-Score: -4
X-MIME-Autoconverted: from quoted-printable to 8bit by
	listman.util.phx.redhat.com id n3KI6Otd019193
X-loop: dm-devel@redhat.com
Cc: "Chauhan, Vijay"
Subject: [dm-devel] [PATCH] dm mpath: Try recover from I/O failure by
	re-initializing the PG if device is running on one path
X-BeenThere: dm-devel@redhat.com
X-Mailman-Version: 2.1.5
Precedence: junk
Reply-To: device-mapper development
List-Id: device-mapper development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com

This patch introduces a mechanism to recover from I/O failures by
re-initializing the path group (PG) when the device is running on only
one path.

Problem:
Device mapper fails the path on every I/O error, regardless of the type
of error. Some errors can be recovered from by re-initializing the path.
I saw this problem while testing the rdac device handler: I/O errors
occur when LUN ownership changes. When ownership changes, the device
returns a check condition with sense 0x05/0x94/0x01 (SK/ASC/ASCQ,
meaning LUN ownership changed). Currently, device mapper fails the path
for this error, which eventually leads to an I/O error. We do not want
an I/O error in this case.

The patch sets the pg_init_required flag if the device is running on a
single path and a hardware handler is configured. process_queued_ios
then re-initializes the path if required.

I have tested this patch with the LSI rdac handler.
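
For readers who want the new decision in isolation, the check can be
modelled in userspace roughly as follows. This is only a sketch, not
kernel code: struct mpath_model, enum end_io_action and on_io_error()
are hypothetical names made up for illustration, and only the three
fields the patch touches are represented. Locking and the actual
requeue are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct multipath fields used by the patch. */
struct mpath_model {
	unsigned int nr_valid_paths;	/* paths still considered usable */
	const char *hw_handler_name;	/* NULL when no hardware handler is set */
	bool pg_init_required;		/* request for a PG re-initialization */
};

/* Hypothetical result type: what the error path decides to do next. */
enum end_io_action {
	ACTION_REQUEUE_REINIT,		/* requeue the I/O and re-initialize the PG */
	ACTION_EXISTING_HANDLING	/* fall through to the unchanged handling */
};

/*
 * Models only the check added by the patch: on the last valid path,
 * with a hardware handler configured, flag a PG re-initialization and
 * requeue instead of continuing with the existing error handling.
 */
static enum end_io_action on_io_error(struct mpath_model *m)
{
	if (m->nr_valid_paths == 1 && m->hw_handler_name) {
		m->pg_init_required = true;
		return ACTION_REQUEUE_REINIT;
	}
	return ACTION_EXISTING_HANDLING;
}
```

With two or more valid paths, or with no hardware handler, the model
leaves pg_init_required untouched and falls through to the existing
handling.
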
Signed-off-by: Babu Moger
---

--- linux-2.6.30-rc2/drivers/md/dm-mpath.c.orig	2009-04-17 16:49:33.000000000 -0500
+++ linux-2.6.30-rc2/drivers/md/dm-mpath.c	2009-04-17 17:09:51.000000000 -0500
@@ -1152,6 +1152,15 @@ static int do_end_io(struct multipath *m
 		return error;
 
 	spin_lock_irqsave(&m->lock, flags);
+	/*
+	 * If this is the only path left, then lets try to
+	 * re-initialize the PG one last time..
+	 */
+	if (m->nr_valid_paths == 1 && m->hw_handler_name) {
+		m->pg_init_required = 1;
+		spin_unlock_irqrestore(&m->lock, flags);
+		goto requeue;
+	}
 	if (!m->nr_valid_paths) {
 		if (__must_push_back(m)) {
 			spin_unlock_irqrestore(&m->lock, flags);