From patchwork Wed Jun 10 23:34:52 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Christie <michaelc@cs.wisc.edu>
X-Patchwork-Id: 29444
X-Patchwork-Delegate: christophe.varoqui@free.fr
Received: from hormel.redhat.com (hormel1.redhat.com [209.132.177.33])
	by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n5ANZI1H020918
	for <patchwork-dm-devel@patchwork.kernel.org>;
	Wed, 10 Jun 2009 23:35:18 GMT
Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com
	[10.8.4.110])
	by hormel.redhat.com (Postfix) with ESMTP id 0016B61A056;
	Wed, 10 Jun 2009 19:35:16 -0400 (EDT)
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com
	[172.16.52.254])
	by listman.util.phx.redhat.com (8.13.1/8.13.1) with ESMTP id
	n5ANZF5w023491 for <dm-devel@listman.util.phx.redhat.com>;
	Wed, 10 Jun 2009 19:35:15 -0400
Received: from mx1.redhat.com (mx1.redhat.com [172.16.48.31])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id
	n5ANZETa003195
	for <dm-devel@redhat.com>; Wed, 10 Jun 2009 19:35:14 -0400
Received: from sabe.cs.wisc.edu (sabe.cs.wisc.edu [128.105.6.20])
	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n5ANZ0R1001461
	for <dm-devel@redhat.com>; Wed, 10 Jun 2009 19:35:00 -0400
Received: from [20.15.0.9] (c-75-73-66-60.hsd1.mn.comcast.net [75.73.66.60])
	(authenticated bits=0)
	by sabe.cs.wisc.edu (8.14.1/8.14.1) with ESMTP id n5ANYv4G026053
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 10 Jun 2009 18:34:58 -0500
Message-ID: <4A30431C.3030809@cs.wisc.edu>
Date: Wed, 10 Jun 2009 18:34:52 -0500
From: Mike Christie <michaelc@cs.wisc.edu>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11
	Thunderbird/3.0b2
MIME-Version: 1.0
To: device-mapper development <dm-devel@redhat.com>
Subject: Re: [dm-devel] do Symmetrix multipath-tools defaults need update
	? or scsi-to-blk errors management ?
References: 
 <1766094670.1725191244670599365.JavaMail.root@zimbra16-e3.priv.proxad.net>
In-Reply-To: 
 <1766094670.1725191244670599365.JavaMail.root@zimbra16-e3.priv.proxad.net>
X-RedHat-Spam-Score: -0.328 
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
X-Scanned-By: MIMEDefang 2.63 on 172.16.48.31
X-loop: dm-devel@redhat.com
Cc: Levy_Jerome@emc.com, linux-scsi@vger.kernel.org
X-BeenThere: dm-devel@redhat.com
X-Mailman-Version: 2.1.5
Precedence: junk
Reply-To: device-mapper development <dm-devel@redhat.com>
List-Id: device-mapper development <dm-devel.redhat.com>
List-Unsubscribe: <https://www.redhat.com/mailman/listinfo/dm-devel>,
	<mailto:dm-devel-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/dm-devel>
List-Post: <mailto:dm-devel@redhat.com>
List-Help: <mailto:dm-devel-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/dm-devel>,
	<mailto:dm-devel-request@redhat.com?subject=subscribe>
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com

On 06/10/2009 04:49 PM, christophe.varoqui@free.fr wrote:
> Hi Jerome,
>
> EMC recently asked my/one-of-your client to activate "queue_if_no_path" on Symmetrix logical units, which is not the current default setting in the upstream multipath-tools package.
>
> I'd like to know if you intend to submit a patch to change the default setting accordingly, or if you'd rather leave the no-queueing default unchanged and work on fixing the root cause of this issue.
>
> ::: Background information, root cause :::
>
> The Symmetrix array proved to return SCSI errors to I/O submitters in certain circumstances (I was told of errors on the R1+R2 network link). The Linux kernel, lacking finesse in SCSI->DM error reporting, ends up invalidating each path of the multipath in turn before the multipathd daemon gets a chance to revalidate them. With "queue_if_no_path" disabled, the I/O errors end up in the FS layer and in the userspace submitter.
>
> ::: error log on a 2.6.9 (rhel 4.7) kernel :::
>

For RH 4.9 I did the attached patch, so this error is no longer
fast-failed (upstream currently does not fast-fail this type of error
when using dm-multipath). The SCSI layer will now retry its normal 5
times, then fail.
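
To illustrate the retry behaviour described above, here is a minimal
user-space sketch, not kernel code. It assumes that the maybe_retry path
gives up immediately on a fast-fail request, and that check_retry_count
(whose body is not shown in the attached hunk) only compares the retry
counter against the allowed count. The names cmd_model, disposition,
model_maybe_retry, model_check_retry_count and count_retries are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's disposition codes. */
enum disposition { DISP_NEEDS_RETRY, DISP_DONE };

/* Hypothetical, minimal model of a SCSI command's retry state. */
struct cmd_model {
	int retries;   /* retries issued so far */
	int allowed;   /* retry budget; 5 matches "its normal 5 times" */
	bool failfast; /* assumed set for I/O submitted through dm-multipath */
};

/* Assumed pre-patch behaviour: a fast-fail request is never retried. */
static enum disposition model_maybe_retry(struct cmd_model *c)
{
	if (++c->retries <= c->allowed && !c->failfast)
		return DISP_NEEDS_RETRY;
	return DISP_DONE;
}

/* Assumed post-patch behaviour: only the retry budget is consulted. */
static enum disposition model_check_retry_count(struct cmd_model *c)
{
	if (++c->retries <= c->allowed)
		return DISP_NEEDS_RETRY;
	return DISP_DONE;
}

typedef enum disposition (*decide_fn)(struct cmd_model *);

/* Count how many times a command is retried before it is completed. */
static int count_retries(decide_fn decide, bool failfast, int allowed)
{
	struct cmd_model c = { .retries = 0, .allowed = allowed,
			       .failfast = failfast };
	int n = 0;

	assert(allowed >= 0);
	while (decide(&c) == DISP_NEEDS_RETRY)
		n++;
	return n;
}
```

Under these assumptions, a fast-fail command with a budget of 5 is never
retried through model_maybe_retry, but is retried 5 times through
model_check_retry_count before the error is passed up.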


> SCSI error :<h b t l>  return code 0x8000002
> current sday: sense key Aborted Command
> Additional sense: Internal target failure
> end_request: I/O error, dev sday, sector XXXXX
> device-mapper: dm-multipath: Failing path 67:32.
>
> ::: unfortunate side effect of queue_if_no_path :::
>
> Activating "queue_if_no_path" is certainly an efficient workaround for this kind of short-lived, retriable error, but this feature compromises data protection on clusters relying on persistent reservations to fence I/Os from passive nodes. Ironically, the reason is quite similar: SCSI return codes for reservation conflicts also end up invalidating each path of a multipath and, worse, the I/O causing the conflict gets queued and retried until the poor active node drops its reservation, unleashing data-corrupting I/Os from passive node queues on the logical unit.
>
> ::: error log on a 2.6.29.x kernel for a reservation conflict :::
>
> sd h:b:t:l: reservation conflict
> sd h:b:t:l: [sdu] Unhandled error code
> sd h:b:t:l: [sdu] Result: hostbyte=DID_OK driver_byte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdu, sector XXXXX
> device-mapper: dm-multipath: Failing path 65:64.
>
> ::: persistent reservation + queue_if_no_path, possible solution ? :::
>
> Seems to me scsi_lib.c::scsi_io_completion() should be able to cancel an I/O that hits a reservation conflict and signal blk_end_request() with no error reported.
>

I was just about to post new blkerr patches. For this we just want
multipath to fail this IO right away, right? So SCSI would return some
fatal error, and dm-multipath would see it and not retry that IO?
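
A rough sketch of the decision being proposed, as a user-space model
only. The error classes and action names below (io_err, mp_action,
mp_end_io) are hypothetical and are not the identifiers used by the
blkerr patches or by dm-multipath. The idea is that a fatal error
completes the IO with an error immediately, without failing the path or
queueing, while a path error keeps today's behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error classes reported by the SCSI layer. */
enum io_err { IO_ERR_NONE, IO_ERR_PATH, IO_ERR_FATAL };

/* Hypothetical actions the multipath layer could take on completion. */
enum mp_action {
	MP_COMPLETE_OK,      /* IO succeeded */
	MP_RETRY_OTHER_PATH, /* fail this path, resubmit on another one */
	MP_QUEUE,            /* no path left, hold the IO (queue_if_no_path) */
	MP_FAIL_IO           /* return the error to the submitter */
};

/*
 * other_paths: usable paths left besides the one that reported err.
 * A fatal error (e.g. a reservation conflict, under this proposal) is
 * never retried or queued, whatever the queue_if_no_path setting is.
 */
static enum mp_action mp_end_io(enum io_err err, int other_paths,
				bool queue_if_no_path)
{
	assert(other_paths >= 0);

	if (err == IO_ERR_NONE)
		return MP_COMPLETE_OK;
	if (err == IO_ERR_FATAL)
		return MP_FAIL_IO;
	if (other_paths > 0)
		return MP_RETRY_OTHER_PATH;
	return queue_if_no_path ? MP_QUEUE : MP_FAIL_IO;
}
```

With this model, mp_end_io(IO_ERR_FATAL, 3, true) fails the IO right
away even though queue_if_no_path is set and other paths are usable,
which is the behaviour asked about above.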
---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7309f12..d5a3390 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1390,7 +1390,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	case CHECK_CONDITION:
 		rtn = scsi_check_sense(scmd);
 		if (rtn == NEEDS_RETRY)
-			goto maybe_retry;
+			goto check_retry_count;
 		/* if rtn == FAILED, we have no sense information;
 		 * returning FAILED will wake the error handler thread
 		 * to collect the sense and redo the decide