diff mbox series

[2/3] dm: Start pr_reserve from the same starting path

Message ID 20220707202711.10836-3-michael.christie@oracle.com (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show
Series dm pr_ops fixes | expand

Commit Message

Mike Christie July 7, 2022, 8:27 p.m. UTC
When an app does a pr_reserve it will go to whatever path we happen to
be using at the time. This can result in errors where the app does a
second pr_reserve call and expects success but gets a failure becuase
the reserve is not done on the holder's path. This patch has us always
start trying to do reserves from the first path in the first group.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/md/dm.c | 46 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 14 deletions(-)

Comments

Mike Christie July 9, 2022, 3:06 p.m. UTC | #1
On 7/7/22 3:27 PM, Mike Christie wrote:
> When an app does a pr_reserve it will go to whatever path we happen to
> be using at the time. This can result in errors where the app does a
> second pr_reserve call and expects success but gets a failure becuase
> the reserve is not done on the holder's path. This patch has us always
> start trying to do reserves from the first path in the first group.
> 

Hi,

Giving myself a review comment. pr_preempt can also establish a reservation.
I meant to send a patch for that as well. If the approach in this patchset is
ok, I'll send a patch for that as well.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer July 14, 2022, 6:49 p.m. UTC | #2
On Sat, Jul 09 2022 at 11:06P -0400,
Mike Christie <michael.christie@oracle.com> wrote:

> On 7/7/22 3:27 PM, Mike Christie wrote:
> > When an app does a pr_reserve it will go to whatever path we happen to
> > be using at the time. This can result in errors where the app does a
> > second pr_reserve call and expects success but gets a failure becuase
> > the reserve is not done on the holder's path. This patch has us always
> > start trying to do reserves from the first path in the first group.
> > 
> 
> Hi,
> 
> Giving myself a review comment. pr_preempt can also establish a reservation.
> I meant to send a patch for that as well. If the approach in this patchset is
> ok, I'll send a patch for that as well.
> 

It'd be nice to have Christoph weigh-in on these changes but I'm OK
with them in general.

But please give details on what you've tested them against.  I assume
the Windows cluster? How about pacemaker? And all looks good on other
systems that don't have the requirement to pin the PR to a device?

Once I have this context on testing I can then work through the
changes more closely and get them staged.  Please do feel free to send
a v2 that conveys what testing was done and you're welcome to sned the
patch for pr_preempt too.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Christie July 15, 2022, 12:34 a.m. UTC | #3
On 7/14/22 1:49 PM, Mike Snitzer wrote:
> On Sat, Jul 09 2022 at 11:06P -0400,
> Mike Christie <michael.christie@oracle.com> wrote:
> 
>> On 7/7/22 3:27 PM, Mike Christie wrote:
>>> When an app does a pr_reserve it will go to whatever path we happen to
>>> be using at the time. This can result in errors where the app does a
>>> second pr_reserve call and expects success but gets a failure becuase
>>> the reserve is not done on the holder's path. This patch has us always
>>> start trying to do reserves from the first path in the first group.
>>>
>>
>> Hi,
>>
>> Giving myself a review comment. pr_preempt can also establish a reservation.
>> I meant to send a patch for that as well. If the approach in this patchset is
>> ok, I'll send a patch for that as well.
>>
> 
> It'd be nice to have Christoph weigh-in on these changes but I'm OK
> with them in general.
> 
> But please give details on what you've tested them against.  I assume
> the Windows cluster? How about pacemaker? And all looks good on other

Yeah, I tested with windows clustering. With the patches we pass its
validation test suite.

I did not run pacemaker. I was reviewing their scsi/multipath fence
agents and noticed they use a Registrants Only reservation like windows.
I just ran the commands they run manually. It works ok with the preempt
change I mentioned I forgot above.

I also ran the libiscsi PGR tests. We pass all of those tests except the
Write Exclusive and Exclusive Access tests. I don't think those reservation
types make sense for multipath devices though. And as I write this I'm
thinking we should add a check for these types and just return failure).


> systems that don't have the requirement to pin the PR to a device?

I didn't find any real applications that use the All Registrants type of
reservation where every registered path is a reservation holder. However,
libiscsi has PGR tests for that type of reservation and the code works ok.

> 
> Once I have this context on testing I can then work through the
> changes more closely and get them staged.  Please do feel free to send
> a v2 that conveys what testing was done and you're welcome to sned the
> patch for pr_preempt too.
> 

Ok. I'll send an updated patchset and add more details about what I tested
in the commits so we have it in there.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Christie July 15, 2022, 8:38 p.m. UTC | #4
On 7/14/22 7:34 PM, Mike Christie wrote:> I also ran the libiscsi PGR tests. We pass all of those tests except the
> Write Exclusive and Exclusive Access tests. I don't think those reservation
> types make sense for multipath devices though. And as I write this I'm
> thinking we should add a check for these types and just return failure).

Let me tack that last part back.

I forgot people use dm-multipath with a single path so they can use the
queue_if_no_path/no_path_retry feature and then do path management using
some other non-multipathd daemon that does stuff like IP takover uses iscsi's
redirect feature.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Christoph Hellwig July 18, 2022, 5:55 a.m. UTC | #5
On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote:
> I also ran the libiscsi PGR tests. We pass all of those tests except the
> Write Exclusive and Exclusive Access tests. I don't think those reservation
> types make sense for multipath devices though. And as I write this I'm
> thinking we should add a check for these types and just return failure).

Why would any reservation type make less sense for multipath vs
non-multipath setups/

> 
> 
> > systems that don't have the requirement to pin the PR to a device?
> 
> I didn't find any real applications that use the All Registrants type of
> reservation where every registered path is a reservation holder. However,
> libiscsi has PGR tests for that type of reservation and the code works ok.

Well.  In general ALL_TG_PT would usually be the preferred method
everywhere.  But that assumes:

 1) it actually is supported by the target
 2) the target actually has a useful concept of the Linux system being
    a single initiator, which outside of a few setups like software
    only iSCSI are rarely true

so we usually have to fall back to just registering every path
separately.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Christie July 18, 2022, 5:39 p.m. UTC | #6
On 7/18/22 12:55 AM, Christoph Hellwig wrote:
> On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote:
>> I also ran the libiscsi PGR tests. We pass all of those tests except the
>> Write Exclusive and Exclusive Access tests. I don't think those reservation
>> types make sense for multipath devices though. And as I write this I'm
>> thinking we should add a check for these types and just return failure).
> 
> Why would any reservation type make less sense for multipath vs
> non-multipath setups/

I think those 2 types only work for really specific use cases in multipath
setups.

1. Let's say you do an active-active setup with 2 paths in one priority group,
and the Write Exclusive or Exclusive Access reservation went down pathA so it's
the holder. When the app does IO to /dev/dm-0 the path selectors aren't PGR aware
so IO can go down any path. For Write Exclusive, when WRITEs go down pathB they
get failed with reservation conflicts (sbc4r22 table 13). So this type of
reservation and active-active would only be useful for read-only work loads.

For Exclusive Access READ/WRITEs can only go down pathA ok. If they go down
PathB we will get reservation conflicts. So it's really useless in an active-
active setup.

For All Reg and Reg Only this is not a problem because all paths are registered
and the spec says for those types of reservations we can do R/W IO through them
(sbc4r22 table 13).

2. For an active-passive setup with 2 priority groups and 1 path per group we have
a similar issue when there is a path failure. PathA in PG0 will be the holder and
since we have the one path in use, all IO will execute ok. If pathA fails and we
switch to pathB in PG1, then similar to above, depending on the reservation type
and IO type, we will get conflicts since PathA was the holder.

I was thinking maybe for this one, some specific user sometimes wants this
behavior (use PG1 as a last resort). However, I think the user would normally
expect the All Reg or Reg only behavior where when we switch to pathB and don't get
conflicts for a path failure.

For a non-multipath setup we don't hit 1 or 2 with Write Exclusive or Exclusive
Access since each host just has the one path to the device.

>>
>>
>>> systems that don't have the requirement to pin the PR to a device?
>>
>> I didn't find any real applications that use the All Registrants type of
>> reservation where every registered path is a reservation holder. However,
>> libiscsi has PGR tests for that type of reservation and the code works ok.
> 
> Well.  In general ALL_TG_PT would usually be the preferred method
> everywhere.  But that assumes:
> 
>  1) it actually is supported by the target
>  2) the target actually has a useful concept of the Linux system being
>     a single initiator, which outside of a few setups like software
>     only iSCSI are rarely true
> 
> so we usually have to fall back to just registering every path
> separately.

Yeah, windows sends a register down each path like us too.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Christie July 18, 2022, 6:28 p.m. UTC | #7
On 7/18/22 12:39 PM, Mike Christie wrote:
> On 7/18/22 12:55 AM, Christoph Hellwig wrote:
>> On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote:
>>> I also ran the libiscsi PGR tests. We pass all of those tests except the
>>> Write Exclusive and Exclusive Access tests. I don't think those reservation
>>> types make sense for multipath devices though. And as I write this I'm
>>> thinking we should add a check for these types and just return failure).
>>
>> Why would any reservation type make less sense for multipath vs
>> non-multipath setups/
> 
> I think those 2 types only work for really specific use cases in multipath
> setups.
> 

...

>>  2) the target actually has a useful concept of the Linux system being
>>     a single initiator, which outside of a few setups like software
>>     only iSCSI are rarely true
>>
Oh yeah, if we are talking about the same type of thing then I've seen a
target that has some smarts and treats Write Exclusive and Exclusive Access like
a All Reg or Reg Only variants when it detects or is setup for linux and it sees
all an initiator's paths registered.

So yeah I guess we should not fail those reservations types.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Christoph Hellwig July 20, 2022, 6:16 a.m. UTC | #8
On Mon, Jul 18, 2022 at 12:39:12PM -0500, Mike Christie wrote:
> 1. Let's say you do an active-active setup with 2 paths in one priority group,
> and the Write Exclusive or Exclusive Access reservation went down pathA so it's
> the holder. When the app does IO to /dev/dm-0 the path selectors aren't PGR aware
> so IO can go down any path. For Write Exclusive, when WRITEs go down pathB they
> get failed with reservation conflicts (sbc4r22 table 13). So this type of
> reservation and active-active would only be useful for read-only work loads.
> 
> For Exclusive Access READ/WRITEs can only go down pathA ok. If they go down
> PathB we will get reservation conflicts. So it's really useless in an active-
> active setup.

It's not useless.  It just needs all paths to be registered.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Christie July 20, 2022, 5:02 p.m. UTC | #9
On 7/20/22 1:16 AM, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 12:39:12PM -0500, Mike Christie wrote:
>> 1. Let's say you do an active-active setup with 2 paths in one priority group,
>> and the Write Exclusive or Exclusive Access reservation went down pathA so it's
>> the holder. When the app does IO to /dev/dm-0 the path selectors aren't PGR aware
>> so IO can go down any path. For Write Exclusive, when WRITEs go down pathB they
>> get failed with reservation conflicts (sbc4r22 table 13). So this type of
>> reservation and active-active would only be useful for read-only work loads.
>>
>> For Exclusive Access READ/WRITEs can only go down pathA ok. If they go down
>> PathB we will get reservation conflicts. So it's really useless in an active-
>> active setup.
> 
> It's not useless.  It just needs all paths to be registered.

I don't think that's correct. I think you misunderstood me or I misunderstood
table 13 in sbc4r22.

For just "Exclusive Access", even if the path is registered the table says
to fail commands with a reservation conflict if the path is not the reservation
holder. So there is no way to use active-active with more than 1 path, or every
time the path selector switches paths to the non holding path you get IO errors.

For "Exclusive Access All Registrants" or "Exclusive Access Registrants Only"
I agree you are correct, and all registered paths can execute IO ok.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 67b46ae896b1..cdd1656b99d6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3048,6 +3048,7 @@  struct dm_pr {
 	u32	flags;
 	bool	fail_early;
 	int	ret;
+	enum pr_type type;
 };
 
 static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
@@ -3139,25 +3140,42 @@  static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 	return ret;
 }
 
+
+static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
+			   sector_t start, sector_t len, void *data)
+{
+	struct dm_pr *pr = data;
+	const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
+
+	if (!ops || !ops->pr_reserve) {
+		pr->ret = -EOPNOTSUPP;
+		return -1;
+	}
+
+	pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags);
+	if (!pr->ret)
+		return -1;
+
+	return 0;
+}
+
 static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 			 u32 flags)
 {
-	struct mapped_device *md = bdev->bd_disk->private_data;
-	const struct pr_ops *ops;
-	int r, srcu_idx;
+	struct dm_pr pr = {
+		.old_key	= key,
+		.flags		= flags,
+		.type		= type,
+		.fail_early	= false,
+		.ret		= 0,
+	};
+	int ret;
 
-	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
-	if (r < 0)
-		goto out;
+	ret = dm_call_pr(bdev, __dm_pr_reserve, &pr);
+	if (ret)
+		return ret;
 
-	ops = bdev->bd_disk->fops->pr_ops;
-	if (ops && ops->pr_reserve)
-		r = ops->pr_reserve(bdev, key, type, flags);
-	else
-		r = -EOPNOTSUPP;
-out:
-	dm_unprepare_ioctl(md, srcu_idx);
-	return r;
+	return pr.ret;
 }
 
 static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)