dm-mpath: always return reservation conflict

Message ID 1436959404-14035-1-git-send-email-hare@suse.de (mailing list archive)
State New, archived
Commit Message

Hannes Reinecke July 15, 2015, 11:23 a.m. UTC
If dm-mpath encounters a reservation conflict it should not
fail the path (as communication with the target is not affected)
but should rather retry on another path.
However, in doing so we might be inducing a ping-pong between
paths, with no guarantee of any forward progress.
And arguably a reservation conflict is an unexpected error,
so we should be passing it upwards to allow the application
to take appropriate steps.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-mpath.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
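
The behaviour change described above can be modelled in user space. The following is a minimal sketch, not the kernel code: `struct model_mpath`, `model_fail_path()`, `model_end_io_old()`, `model_end_io_new()` and `MODEL_REQUEUE` are hypothetical names, the `noretry_error()` check is left out, and the "no valid paths left" test is assumed to guard the `queue_if_no_path` branch, since that line lies outside the hunks shown in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EBADE
#define EBADE 52            /* Linux value; fallback for other platforms */
#endif

#define MODEL_REQUEUE 1     /* hypothetical stand-in for DM_ENDIO_REQUEUE */

/* Hypothetical, heavily reduced model of the multipath state. */
struct model_mpath {
	int nr_valid_paths;     /* assumed counter of paths not yet failed */
	bool queue_if_no_path;
	bool must_push_back;    /* stands in for __must_push_back(m) */
};

/* Stand-in for fail_path(): one more path is marked as failed. */
static void model_fail_path(struct model_mpath *m)
{
	assert(m != NULL);
	if (m->nr_valid_paths > 0)
		m->nr_valid_paths--;
}

/* Before the patch: fail the path, report -EBADE only once no path is left. */
int model_end_io_old(struct model_mpath *m, int error)
{
	int r = MODEL_REQUEUE;

	if (!error)
		return 0;
	model_fail_path(m);
	if (m->nr_valid_paths == 0) {
		if (!m->queue_if_no_path) {
			if (!m->must_push_back)
				r = -EIO;
		} else if (error == -EBADE) {
			r = error;
		}
	}
	return r;
}

/* After the patch: -EBADE is returned at once and the path stays usable. */
int model_end_io_new(struct model_mpath *m, int error)
{
	int r = MODEL_REQUEUE;

	if (!error)
		return 0;
	if (error == -EBADE)
		r = error;
	else
		model_fail_path(m);
	if (m->nr_valid_paths == 0 && !m->queue_if_no_path && !m->must_push_back)
		r = -EIO;
	return r;
}
```

With two valid paths and `queue_if_no_path` set, the old model requeues the first conflict and only returns `-EBADE` once the second path has been failed as well; the new model returns `-EBADE` on the first conflict and leaves both paths valid.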

Comments

James Bottomley July 15, 2015, 11:35 a.m. UTC | #1
On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> If dm-mpath encounters a reservation conflict it should not
> fail the path (as communication with the target is not affected)
> but should rather retry on another path.
> However, in doing so we might be inducing a ping-pong between
> paths, with no guarantee of any forward progress.
> And arguably a reservation conflict is an unexpected error,
> so we should be passing it upwards to allow the application
> to take appropriate steps.

If I interpret the code correctly, you've changed the behaviour from the
current "try all paths and fail them, ultimately passing the reservation
conflict up if all paths fail" to "return reservation conflict
immediately, keeping all paths running".  This assumes that the
reservation isn't path specific, because if we encounter a path specific
reservation, you've altered the behaviour from "route around" to "fail".

The case I think the original code was for is SAN Volume controllers
which use path specific SCSI-3 reservations effectively to do traffic
control and allow favoured paths.  Have you verified that nothing we
encounter in the enterprise uses path specific reservations for
multipath shaping any more?

James

> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-mpath.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 5a67671..e65d266 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1269,7 +1269,16 @@ static int do_end_io(struct multipath *m, struct request *clone,
>  	if (noretry_error(error))
>  		return error;
>  
> -	if (mpio->pgpath)
> +	/*
> +	 * EBADE signals an reservation conflict.
> +	 * We shouldn't fail the path here as we can communicate with
> +	 * the target. We should failover to the next path, but in
> +	 * doing so we might be causing a ping-pong between paths.
> +	 * So just return the reservation conflict error.
> +	 */
> +	if (error == -EBADE)
> +		r = error;
> +	else if (mpio->pgpath)
>  		fail_path(mpio->pgpath);
>  
>  	spin_lock_irqsave(&m->lock, flags);
> @@ -1277,9 +1286,6 @@ static int do_end_io(struct multipath *m, struct request *clone,
>  		if (!m->queue_if_no_path) {
>  			if (!__must_push_back(m))
>  				r = -EIO;
> -		} else {
> -			if (error == -EBADE)
> -				r = error;
>  		}
>  	}
>  	spin_unlock_irqrestore(&m->lock, flags);



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke July 15, 2015, 11:52 a.m. UTC | #2
On 07/15/2015 01:35 PM, James Bottomley wrote:
> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
>> If dm-mpath encounters a reservation conflict it should not
>> fail the path (as communication with the target is not affected)
>> but should rather retry on another path.
>> However, in doing so we might be inducing a ping-pong between
>> paths, with no guarantee of any forward progress.
>> And arguably a reservation conflict is an unexpected error,
>> so we should be passing it upwards to allow the application
>> to take appropriate steps.
> 
> If I interpret the code correctly, you've changed the behaviour from the
> current try all paths and fail them, ultimately passing the reservation
> conflict up if all paths fail to return reservation conflict
> immediately, keeping all paths running.  This assumes that the
> reservation isn't path specific because if we encounter a path specific
> reservation, you've altered the behaviour from route around to fail.
> 
That is correct.
As mentioned in the patch, the 'correct' solution would be to retry
the offending I/O on another path.
However, the current multipath design doesn't allow us to do that
without failing the path first.
If we were just retrying I/O on another path without failing the
path first (and all paths would return a reservation conflict) we
wouldn't know when we've exhausted all paths.
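
The exhaustion problem can be illustrated with a small user-space sketch. It is not dm-mpath code: `struct sketch_paths`, `sketch_pick()` and `sketch_attempts()` are hypothetical, the selector is a plain round-robin, and every path is assumed to answer with a reservation conflict. The `budget` argument exists only so the non-failing variant terminates at all.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PATHS 8

/* Hypothetical path set: n paths, each either usable or failed. */
struct sketch_paths {
	int n;
	bool usable[SKETCH_MAX_PATHS];
	int next;               /* round-robin cursor */
};

/* Return the next usable path index, or -1 if every path has been failed. */
static int sketch_pick(struct sketch_paths *p)
{
	for (int i = 0; i < p->n; i++) {
		int idx = (p->next + i) % p->n;

		if (p->usable[idx]) {
			p->next = (idx + 1) % p->n;
			return idx;
		}
	}
	return -1;
}

/*
 * Count how many times one I/O is submitted when every path returns a
 * reservation conflict. Stops when no path is left or the budget is used up.
 */
int sketch_attempts(struct sketch_paths *p, bool fail_on_conflict, int budget)
{
	int attempts = 0;

	assert(p->n > 0 && p->n <= SKETCH_MAX_PATHS);
	while (attempts < budget) {
		int idx = sketch_pick(p);

		if (idx < 0)
			break;          /* all paths failed: exhaustion is visible */
		attempts++;
		if (fail_on_conflict)
			p->usable[idx] = false;
	}
	return attempts;
}
```

With three paths, failing on conflict ends after three attempts because the selector runs out of usable paths. Without failing, nothing ever signals exhaustion and the loop only stops at the artificial budget, which is the ping-pong the commit message refers to.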

> The case I think the original code was for is SAN Volume controllers
> which use path specific SCSI-3 reservations effectively to do traffic
> control and allow favoured paths.  Have you verified that nothing we
> encounter in the enterprise uses path specific reservations for
> multipath shaping any more?
> 
Ah. That was some input I was looking for.
With that patch I've assumed that persistent reservations are done
primarily from userland / filesystem, where the reservation would
effectively be done on a per-LUN basis.
If it's being used from the storage array internally this is a
different matter.
(Although I'd be very interested how this behaviour would play
together with applications which use persistent reservations
internally; GPFS springs to mind here ...)

But apparently this specific behaviour wasn't seen that often in the
field; I certainly never got any customer reports about mysteriously
failing paths.

Anyway. I'll see if I can come up with something to restore the
original behaviour.

Cheers,

Hannes
Christoph Hellwig July 15, 2015, 11:56 a.m. UTC | #3
An array can't issue a reservation; the initiator needs to register
it.  Right now the only way to do it is through SG_IO passthrough,
which is a best-effort approach if I/O isn't also using SG_IO and can't
be properly supported because of that.

However I will submit an in-kernel reservation API soon which will
allow us to have that sort of control.  My current prototype only allows
for all-path reservations as I couldn't come up with a use case for
per-path reservations, but if such a need should arise we can add it
and take that into account in the multipathing code.
James Bottomley July 15, 2015, 12:01 p.m. UTC | #4
On Wed, 2015-07-15 at 13:52 +0200, Hannes Reinecke wrote:
> On 07/15/2015 01:35 PM, James Bottomley wrote:
> > On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> >> If dm-mpath encounters a reservation conflict it should not
> >> fail the path (as communication with the target is not affected)
> >> but should rather retry on another path.
> >> However, in doing so we might be inducing a ping-pong between
> >> paths, with no guarantee of any forward progress.
> >> And arguably a reservation conflict is an unexpected error,
> >> so we should be passing it upwards to allow the application
> >> to take appropriate steps.
> > 
> > If I interpret the code correctly, you've changed the behaviour from the
> > current try all paths and fail them, ultimately passing the reservation
> > conflict up if all paths fail to return reservation conflict
> > immediately, keeping all paths running.  This assumes that the
> > reservation isn't path specific because if we encounter a path specific
> > reservation, you've altered the behaviour from route around to fail.
> > 
> That is correct.
> As mentioned in the patch, the 'correct' solution would be to retry
> the offending I/O on another path.
> However, the current multipath design doesn't allow us to do that
> without failing the path first.
> If we were just retrying I/O on another path without failing the
> path first (and all paths would return a reservation conflict) we
> wouldn't know when we've exhausted all paths.
> 
> > The case I think the original code was for is SAN Volume controllers
> > which use path specific SCSI-3 reservations effectively to do traffic
> > control and allow favoured paths.  Have you verified that nothing we
> > encounter in the enterprise uses path specific reservations for
> > multipath shaping any more?
> > 
> Ah. That was some input I was looking for.
> With that patch I've assumed that persistent reservations are done
> primarily from userland / filesystem, where the reservation would
> effectively be done on a per-LUN basis.
> If it's being used from the storage array internally this is a
> different matter.
> (Although I'd be very interested how this behaviour would play
> together with applications which use persistent reservations
> internally; GPFS springs to mind here ...)
> 
> But apparently this specific behaviour wasn't seen that often in the
> field; I certainly never got any customer reports about mysteriously
> failing paths.

Have you already got this patch in SLES, if so, for how long?

> Anyway. I'll see if I can come up with something to restore the
> original behaviour.

Or a way of verifying that nothing in the current enterprise uses path
specific reservations ...  we can change the current behaviour as long
as nothing notices.

James


Hannes Reinecke July 15, 2015, 12:02 p.m. UTC | #5
On 07/15/2015 01:56 PM, Christoph Hellwig wrote:
> An array can't issue a reservation; the initiator needs to register
> it.  Right now the only way to do it is through SG_IO passthrough,
> which is a best-effort approach if I/O isn't also using SG_IO and can't
> be properly supported because of that.
> 
> However I will submit an in-kernel reservation API soon which will
> allow us to have that sort of control.  My current prototype only allows
> for all-path reservations as I couldn't come up with a use case for
> per-path reservations, but if such a need should arise we can add it
> and take that into account in the multipathing code.
> 
Which was my reasoning as well.
I would consider a per-path reservation in a multipath setup an
error, as the current multipath code is not able to handle this.
With the current code we will fail a path due to the reservation
conflict error, but whatever happens next depends on the type of
reservation and the used prioritizer/path checker.
It can be anything from 'just working' to recurrent path drops to
an I/O stall (as SET TARGET PORT GROUPS might return a reservation
conflict, too, so we wouldn't be able to switch to a working path...)

And implementing a per-path reservation in multipath is far from
trivial, so I'd rather not attempt this.
_Especially_ not as you're working on in-kernel reservation code.

Cheers,

Hannes
Hannes Reinecke July 15, 2015, 12:15 p.m. UTC | #6
On 07/15/2015 02:01 PM, James Bottomley wrote:
> On Wed, 2015-07-15 at 13:52 +0200, Hannes Reinecke wrote:
>> On 07/15/2015 01:35 PM, James Bottomley wrote:
>>> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
>>>> If dm-mpath encounters a reservation conflict it should not
>>>> fail the path (as communication with the target is not affected)
>>>> but should rather retry on another path.
>>>> However, in doing so we might be inducing a ping-pong between
>>>> paths, with no guarantee of any forward progress.
>>>> And arguably a reservation conflict is an unexpected error,
>>>> so we should be passing it upwards to allow the application
>>>> to take appropriate steps.
>>>
>>> If I interpret the code correctly, you've changed the behaviour from the
>>> current try all paths and fail them, ultimately passing the reservation
>>> conflict up if all paths fail to return reservation conflict
>>> immediately, keeping all paths running.  This assumes that the
>>> reservation isn't path specific because if we encounter a path specific
>>> reservation, you've altered the behaviour from route around to fail.
>>>
>> That is correct.
>> As mentioned in the patch, the 'correct' solution would be to retry
>> the offending I/O on another path.
>> However, the current multipath design doesn't allow us to do that
>> without failing the path first.
>> If we were just retrying I/O on another path without failing the
>> path first (and all paths would return a reservation conflict) we
>> wouldn't know when we've exhausted all paths.
>>
>>> The case I think the original code was for is SAN Volume controllers
>>> which use path specific SCSI-3 reservations effectively to do traffic
>>> control and allow favoured paths.  Have you verified that nothing we
>>> encounter in the enterprise uses path specific reservations for
>>> multipath shaping any more?
>>>
>> Ah. That was some input I was looking for.
>> With that patch I've assumed that persistent reservations are done
>> primarily from userland / filesystem, where the reservation would
>> effectively be done on a per-LUN basis.
>> If it's being used from the storage array internally this is a
>> different matter.
>> (Although I'd be very interested how this behaviour would play
>> together with applications which use persistent reservations
>> internally; GPFS springs to mind here ...)
>>
>> But apparently this specific behaviour wasn't seen that often in the
>> field; I certainly never got any customer reports about mysteriously
>> failing paths.
> 
> Have you already got this patch in SLES, if so, for how long?
> 
We haven't as of yet; I've come across this behaviour due to another
issue. And before I were to put this into SLES I thought I should be
asking those in the know ... persistent reservations _is_ an arcane
topic, after all.
I was just referring to the fact that I rarely got customer issues
with persistent reservations; and those I get tend to be tape-centric.

>> Anyway. I'll see if I can come up with something to restore the
>> original behaviour.
> 
> Or a way of verifying that nothing in the current enterprise uses path
> specific reservations ...  we can change the current behaviour as long
> as nothing notices.
> 
The only instance I know of is GPFS; someone in our company once
wrote an HA agent using persistent reservations, but I'm not sure if
it's deployed anywhere. But that agent is certainly aware of
multipathing, and doesn't issue per-path reservations.
(Well, actually it does, but it does it for every path :-)
I would think the same goes for GPFS.

Incidentally, the SVC docs have a section about persistent
reservations, but do not mention anything about internal use.
So if it does it'll be opaque to the user, otherwise I would assume
it to be mentioned there.

Cheers,

Hannes
Mike Snitzer July 15, 2015, 1:20 p.m. UTC | #7
On Wed, Jul 15 2015 at  8:15am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 07/15/2015 02:01 PM, James Bottomley wrote:
> > On Wed, 2015-07-15 at 13:52 +0200, Hannes Reinecke wrote:
> >> On 07/15/2015 01:35 PM, James Bottomley wrote:
> >>> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> >>>> If dm-mpath encounters a reservation conflict it should not
> >>>> fail the path (as communication with the target is not affected)
> >>>> but should rather retry on another path.
> >>>> However, in doing so we might be inducing a ping-pong between
> >>>> paths, with no guarantee of any forward progress.
> >>>> And arguably a reservation conflict is an unexpected error,
> >>>> so we should be passing it upwards to allow the application
> >>>> to take appropriate steps.
> >>>
> >>> If I interpret the code correctly, you've changed the behaviour from the
> >>> current try all paths and fail them, ultimately passing the reservation
> >>> conflict up if all paths fail to return reservation conflict
> >>> immediately, keeping all paths running.  This assumes that the
> >>> reservation isn't path specific because if we encounter a path specific
> >>> reservation, you've altered the behaviour from route around to fail.
> >>>
> >> That is correct.
> >> As mentioned in the patch, the 'correct' solution would be to retry
> >> the offending I/O on another path.
> >> However, the current multipath design doesn't allow us to do that
> >> without failing the path first.
> >> If we were just retrying I/O on another path without failing the
> >> path first (and all paths would return a reservation conflict) we
> >> wouldn't know when we've exhausted all paths.
> >>
> >>> The case I think the original code was for is SAN Volume controllers
> >>> which use path specific SCSI-3 reservations effectively to do traffic
> >>> control and allow favoured paths.  Have you verified that nothing we
> >>> encounter in the enterprise uses path specific reservations for
> >>> multipath shaping any more?
> >>>
> >> Ah. That was some input I was looking for.
> >> With that patch I've assumed that persistent reservations are done
> >> primarily from userland / filesystem, where the reservation would
> >> effectively be done on a per-LUN basis.
> >> If it's being used from the storage array internally this is a
> >> different matter.
> >> (Although I'd be very interested how this behaviour would play
> >> together with applications which use persistent reservations
> >> internally; GPFS springs to mind here ...)
> >>
> >> But apparently this specific behaviour wasn't seen that often in the
> >> field; I certainly never got any customer reports about mysteriously
> >> failing paths.
> > 
> > Have you already got this patch in SLES, if so, for how long?
> > 
> We haven't as of yet; I've come across this behaviour due to another
> issue. And before I were to put this into SLES I thought I should be
> asking those in the know ... persistent reservations _is_ an arcane
> topic, after all.
> I was just referring to the fact that I rarely got customer issues
> with persistent reservations; and those I get tend to be tape-centric.
> 
> >> Anyway. I'll see if I can come up with something to restore the
> >> original behaviour.
> > 
> > Or a way of verifying that nothing in the current enterprise uses path
> > specific reservations ...  we can change the current behaviour as long
> > as nothing notices.
> > 
> The only instance I know of is GPFS; someone in our company once
> wrote an HA agent using persistent reservations, but I'm not sure if
> it's deployed anywhere. But that agent is certainly aware of
> multipathing, and doesn't issue per-path reservations.
> (Well, actually it does, but it does it for every path :-)
> I would think the same goes for GPFS.
> 
> Incidentally, the SVC docs have a section about persistent
> reservations, but do not mention anything about internal use.
> So if it does it'll be opaque to the user, otherwise I would assume
> it to be mentioned there.

The main consumer of SCSI PR that I'm aware of is fence_scsi.  I don't
have specifics on whether the Clustering layers that use fence_scsi
(e.g. pacemaker) ever make use of per-path SCSI PR (cc'ing Ryan O'hara
who AFAIK maintains fence_scsi).

Mike
Christoph Hellwig July 16, 2015, 7:54 a.m. UTC | #8
On Thu, Jul 16, 2015 at 05:07:03AM +0000, Christophe Varoqui wrote:
> For reference the opensvc crm does use type 5 pr, and aims for all paths
> registered. It still does not make use of the multipathd pr janitoring
> features, and uses sg_persist directly for pr status and actions.

The type doesn't matter here.  It's important to set the ALL_TG_PT bit
when registering the key.  As dm-mpath opens the underlying devices
exclusively, and doesn't give you a choice which path to send to, you're
in a world of pain without that.
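
For illustration, this is roughly where the ALL_TG_PT bit sits when registering a key. The sketch below only fills the 24-byte PERSISTENT RESERVE OUT parameter list for a REGISTER service action, following the SPC-3 layout as I understand it (service action reservation key in bytes 8 to 15, ALL_TG_PT as bit 2 of byte 20); the offsets should be checked against the spec before relying on them. The function and macro names are hypothetical, and building the CDB and issuing it through SG_IO are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PROUT_PARAM_LEN 24
#define SKETCH_PROUT_ALL_TG_PT 0x04     /* byte 20, bit 2 (assumed per SPC-3) */

/*
 * Fill a PERSISTENT RESERVE OUT parameter list for registering new_key.
 * Bytes 0-7 (the existing reservation key) stay zero, as for a first
 * registration; bytes 8-15 carry the new key in big-endian order.
 */
void sketch_build_register_params(uint8_t buf[SKETCH_PROUT_PARAM_LEN],
				  uint64_t new_key, bool all_tg_pt)
{
	assert(buf != NULL);
	memset(buf, 0, SKETCH_PROUT_PARAM_LEN);

	for (int i = 0; i < 8; i++)
		buf[8 + i] = (uint8_t)(new_key >> (56 - 8 * i));

	/* Ask the target to apply the registration to all target ports. */
	if (all_tg_pt)
		buf[20] |= SKETCH_PROUT_ALL_TG_PT;
}
```

When using sg_persist from userland rather than building the buffer by hand, the `--param-alltgpt` option should set the same bit.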
Hannes Reinecke July 16, 2015, 2:40 p.m. UTC | #9
On 07/16/2015 09:54 AM, Christoph Hellwig wrote:
> On Thu, Jul 16, 2015 at 05:07:03AM +0000, Christophe Varoqui wrote:
>> For reference the opensvc crm does use type 5 pr, and aims for all paths
>> registered. It still does not make use of the multipathd pr janitoring
>> features, and uses sg_persist directly for pr status and actions.
> 
> The type doesn't matter here.  It's important to set the ALL_TG_PT bit
> when registering the key.  As dm-mpath opens the underlying devices
> exclusively, and doesn't give you a choice which path to send to, you're
> in a world of pain without that.
> 
Second that.

I would even put this in the manpage somewhere.

Cheers,

Hannes
Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 5a67671..e65d266 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1269,7 +1269,16 @@  static int do_end_io(struct multipath *m, struct request *clone,
 	if (noretry_error(error))
 		return error;
 
-	if (mpio->pgpath)
+	/*
+	 * EBADE signals an reservation conflict.
+	 * We shouldn't fail the path here as we can communicate with
+	 * the target. We should failover to the next path, but in
+	 * doing so we might be causing a ping-pong between paths.
+	 * So just return the reservation conflict error.
+	 */
+	if (error == -EBADE)
+		r = error;
+	else if (mpio->pgpath)
 		fail_path(mpio->pgpath);
 
 	spin_lock_irqsave(&m->lock, flags);
@@ -1277,9 +1286,6 @@  static int do_end_io(struct multipath *m, struct request *clone,
 		if (!m->queue_if_no_path) {
 			if (!__must_push_back(m))
 				r = -EIO;
-		} else {
-			if (error == -EBADE)
-				r = error;
 		}
 	}
 	spin_unlock_irqrestore(&m->lock, flags);