Message ID | 20220707202711.10836-3-michael.christie@oracle.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Headers | show |
Series | dm pr_ops fixes | expand |
On 7/7/22 3:27 PM, Mike Christie wrote: > When an app does a pr_reserve it will go to whatever path we happen to > be using at the time. This can result in errors where the app does a > second pr_reserve call and expects success but gets a failure becuase > the reserve is not done on the holder's path. This patch has us always > start trying to do reserves from the first path in the first group. > Hi, Giving myself a review comment. pr_preempt can also establish a reservation. I meant to send a patch for that as well. If the approach in this patchset is ok, I'll send a patch for that as well. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Sat, Jul 09 2022 at 11:06P -0400, Mike Christie <michael.christie@oracle.com> wrote: > On 7/7/22 3:27 PM, Mike Christie wrote: > > When an app does a pr_reserve it will go to whatever path we happen to > > be using at the time. This can result in errors where the app does a > > second pr_reserve call and expects success but gets a failure becuase > > the reserve is not done on the holder's path. This patch has us always > > start trying to do reserves from the first path in the first group. > > > > Hi, > > Giving myself a review comment. pr_preempt can also establish a reservation. > I meant to send a patch for that as well. If the approach in this patchset is > ok, I'll send a patch for that as well. > It'd be nice to have Christoph weigh-in on these changes but I'm OK with them in general. But please give details on what you've tested them against. I assume the Windows cluster? How about pacemaker? And all looks good on other systems that don't have the requirement to pin the PR to a device? Once I have this context on testing I can then work through the changes more closely and get them staged. Please do feel free to send a v2 that conveys what testing was done and you're welcome to sned the patch for pr_preempt too. Thanks, Mike -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 7/14/22 1:49 PM, Mike Snitzer wrote: > On Sat, Jul 09 2022 at 11:06P -0400, > Mike Christie <michael.christie@oracle.com> wrote: > >> On 7/7/22 3:27 PM, Mike Christie wrote: >>> When an app does a pr_reserve it will go to whatever path we happen to >>> be using at the time. This can result in errors where the app does a >>> second pr_reserve call and expects success but gets a failure becuase >>> the reserve is not done on the holder's path. This patch has us always >>> start trying to do reserves from the first path in the first group. >>> >> >> Hi, >> >> Giving myself a review comment. pr_preempt can also establish a reservation. >> I meant to send a patch for that as well. If the approach in this patchset is >> ok, I'll send a patch for that as well. >> > > It'd be nice to have Christoph weigh-in on these changes but I'm OK > with them in general. > > But please give details on what you've tested them against. I assume > the Windows cluster? How about pacemaker? And all looks good on other Yeah, I tested with windows clustering. With the patches we pass its validation test suite. I did not run pacemaker. I was reviewing their scsi/multipath fence agents and noticed they use a Registrants Only reservation like windows. I just ran the commands they run manually. It works ok with the preempt change I mentioned I forgot above. I also ran the libiscsi PGR tests. We pass all of those tests except the Write Exclusive and Exclusive Access tests. I don't think those reservation types make sense for multipath devices though. And as I write this I'm thinking we should add a check for these types and just return failure). > systems that don't have the requirement to pin the PR to a device? I didn't find any real applications that use the All Registrants type of reservation where every registered path is a reservation holder. However, libiscsi has PGR tests for that type of reservation and the code works ok. > > Once I have this context on testing I can then work through the > changes more closely and get them staged. Please do feel free to send > a v2 that conveys what testing was done and you're welcome to sned the > patch for pr_preempt too. > Ok. I'll send an updated patchset and add more details about what I tested in the commits so we have it in there. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 7/14/22 7:34 PM, Mike Christie wrote:> I also ran the libiscsi PGR tests. We pass all of those tests except the > Write Exclusive and Exclusive Access tests. I don't think those reservation > types make sense for multipath devices though. And as I write this I'm > thinking we should add a check for these types and just return failure). Let me tack that last part back. I forgot people use dm-multipath with a single path so they can use the queue_if_no_path/no_path_retry feature and then do path management using some other non-multipathd daemon that does stuff like IP takover uses iscsi's redirect feature. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote: > I also ran the libiscsi PGR tests. We pass all of those tests except the > Write Exclusive and Exclusive Access tests. I don't think those reservation > types make sense for multipath devices though. And as I write this I'm > thinking we should add a check for these types and just return failure). Why would any reservation type make less sense for multipath vs non-multipath setups/ > > > > systems that don't have the requirement to pin the PR to a device? > > I didn't find any real applications that use the All Registrants type of > reservation where every registered path is a reservation holder. However, > libiscsi has PGR tests for that type of reservation and the code works ok. Well. In general ALL_TG_PT would usually be the preferred method everywhere. But that assumes: 1) it actually is supported by the target 2) the target actually has a useful concept of the Linux system being a single initiator, which outside of a few setups like software only iSCSI are rarely true so we usually have to fall back to just registering every path separately. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 7/18/22 12:55 AM, Christoph Hellwig wrote: > On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote: >> I also ran the libiscsi PGR tests. We pass all of those tests except the >> Write Exclusive and Exclusive Access tests. I don't think those reservation >> types make sense for multipath devices though. And as I write this I'm >> thinking we should add a check for these types and just return failure). > > Why would any reservation type make less sense for multipath vs > non-multipath setups/ I think those 2 types only work for really specific use cases in multipath setups. 1. Let's say you do an active-active setup with 2 paths in one priority group, and the Write Exclusive or Exclusive Access reservation went down pathA so it's the holder. When the app does IO to /dev/dm-0 the path selectors aren't PGR aware so IO can go down any path. For Write Exclusive, when WRITEs go down pathB they get failed with reservation conflicts (sbc4r22 table 13). So this type of reservation and active-active would only be useful for read-only work loads. For Exclusive Access READ/WRITEs can only go down pathA ok. If they go down PathB we will get reservation conflicts. So it's really useless in an active- active setup. For All Reg and Reg Only this is not a problem because all paths are registered and the spec says for those types of reservations we can do R/W IO through them (sbc4r22 table 13). 2. For an active-passive setup with 2 priority groups and 1 path per group we have a similar issue when there is a path failure. PathA in PG0 will be the holder and since we have the one path in use, all IO will execute ok. If pathA fails and we switch to pathB in PG1, then similar to above, depending on the reservation type and IO type, we will get conflicts since PathA was the holder. I was thinking maybe for this one, some specific user sometimes wants this behavior (use PG1 as a last resort). However, I think the user would normally expect the All Reg or Reg only behavior where when we switch to pathB and don't get conflicts for a path failure. For a non-multipath setup we don't hit 1 or 2 with Write Exclusive or Exclusive Access since each host just has the one path to the device. >> >> >>> systems that don't have the requirement to pin the PR to a device? >> >> I didn't find any real applications that use the All Registrants type of >> reservation where every registered path is a reservation holder. However, >> libiscsi has PGR tests for that type of reservation and the code works ok. > > Well. In general ALL_TG_PT would usually be the preferred method > everywhere. But that assumes: > > 1) it actually is supported by the target > 2) the target actually has a useful concept of the Linux system being > a single initiator, which outside of a few setups like software > only iSCSI are rarely true > > so we usually have to fall back to just registering every path > separately. Yeah, windows sends a register down each path like us too. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 7/18/22 12:39 PM, Mike Christie wrote: > On 7/18/22 12:55 AM, Christoph Hellwig wrote: >> On Thu, Jul 14, 2022 at 07:34:57PM -0500, Mike Christie wrote: >>> I also ran the libiscsi PGR tests. We pass all of those tests except the >>> Write Exclusive and Exclusive Access tests. I don't think those reservation >>> types make sense for multipath devices though. And as I write this I'm >>> thinking we should add a check for these types and just return failure). >> >> Why would any reservation type make less sense for multipath vs >> non-multipath setups/ > > I think those 2 types only work for really specific use cases in multipath > setups. > ... >> 2) the target actually has a useful concept of the Linux system being >> a single initiator, which outside of a few setups like software >> only iSCSI are rarely true >> Oh yeah, if we are talking about the same type of thing then I've seen a target that has some smarts and treats Write Exclusive and Exclusive Access like a All Reg or Reg Only variants when it detects or is setup for linux and it sees all an initiator's paths registered. So yeah I guess we should not fail those reservations types. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Mon, Jul 18, 2022 at 12:39:12PM -0500, Mike Christie wrote: > 1. Let's say you do an active-active setup with 2 paths in one priority group, > and the Write Exclusive or Exclusive Access reservation went down pathA so it's > the holder. When the app does IO to /dev/dm-0 the path selectors aren't PGR aware > so IO can go down any path. For Write Exclusive, when WRITEs go down pathB they > get failed with reservation conflicts (sbc4r22 table 13). So this type of > reservation and active-active would only be useful for read-only work loads. > > For Exclusive Access READ/WRITEs can only go down pathA ok. If they go down > PathB we will get reservation conflicts. So it's really useless in an active- > active setup. It's not useless. It just needs all paths to be registered. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 7/20/22 1:16 AM, Christoph Hellwig wrote: > On Mon, Jul 18, 2022 at 12:39:12PM -0500, Mike Christie wrote: >> 1. Let's say you do an active-active setup with 2 paths in one priority group, >> and the Write Exclusive or Exclusive Access reservation went down pathA so it's >> the holder. When the app does IO to /dev/dm-0 the path selectors aren't PGR aware >> so IO can go down any path. For Write Exclusive, when WRITEs go down pathB they >> get failed with reservation conflicts (sbc4r22 table 13). So this type of >> reservation and active-active would only be useful for read-only work loads. >> >> For Exclusive Access READ/WRITEs can only go down pathA ok. If they go down >> PathB we will get reservation conflicts. So it's really useless in an active- >> active setup. > > It's not useless. It just needs all paths to be registered. I don't think that's correct. I think you misunderstood me or I misunderstood table 13 in sbc4r22. For just "Exclusive Access", even if the path is registered the table says to fail commands with a reservation conflict if the path is not the reservation holder. So there is no way to use active-active with more than 1 path, or every time the path selector switches paths to the non holding path you get IO errors. For "Exclusive Access All Registrants" or "Exclusive Access Registrants Only" I agree you are correct, and all registered paths can execute IO ok. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 67b46ae896b1..cdd1656b99d6 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -3048,6 +3048,7 @@ struct dm_pr { u32 flags; bool fail_early; int ret; + enum pr_type type; }; static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn, @@ -3139,25 +3140,42 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, return ret; } + +static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) +{ + struct dm_pr *pr = data; + const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops; + + if (!ops || !ops->pr_reserve) { + pr->ret = -EOPNOTSUPP; + return -1; + } + + pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags); + if (!pr->ret) + return -1; + + return 0; +} + static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, u32 flags) { - struct mapped_device *md = bdev->bd_disk->private_data; - const struct pr_ops *ops; - int r, srcu_idx; + struct dm_pr pr = { + .old_key = key, + .flags = flags, + .type = type, + .fail_early = false, + .ret = 0, + }; + int ret; - r = dm_prepare_ioctl(md, &srcu_idx, &bdev); - if (r < 0) - goto out; + ret = dm_call_pr(bdev, __dm_pr_reserve, &pr); + if (ret) + return ret; - ops = bdev->bd_disk->fops->pr_ops; - if (ops && ops->pr_reserve) - r = ops->pr_reserve(bdev, key, type, flags); - else - r = -EOPNOTSUPP; -out: - dm_unprepare_ioctl(md, srcu_idx); - return r; + return pr.ret; } static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
When an app does a pr_reserve it will go to whatever path we happen to be using at the time. This can result in errors where the app does a second pr_reserve call and expects success but gets a failure becuase the reserve is not done on the holder's path. This patch has us always start trying to do reserves from the first path in the first group. Signed-off-by: Mike Christie <michael.christie@oracle.com> --- drivers/md/dm.c | 46 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 32 insertions(+), 14 deletions(-)