diff mbox series

[v5,01/18] block: Add PR callouts for read keys and reservation

Message ID 20230324181741.13908-2-michael.christie@oracle.com (mailing list archive)
State New, archived
Headers show
Series Use block pr_ops in LIO | expand

Commit Message

Mike Christie March 24, 2023, 6:17 p.m. UTC
Add callouts for reading keys and reservations. This allows LIO to support
the READ_KEYS and READ_RESERVATION commands and will allow dm-multipath
to optimize it's error handling so it can check if it's getting an error
because there's an existing reservation or if we need to retry different
paths.

Note: This only initially adds the struct definitions in the kernel as I'm
not sure if we wanted to export the interface to userspace yet. read_keys
and read_reservation are exactly what dm-multipath and LIO need, but for a
userspace interface we may want something like SCSI's READ_FULL_STATUS and
NVMe's report reservation commands. Those are overkill for dm/LIO and
READ_FULL_STATUS is sometimes broken for SCSI devices.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/pr.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Bart Van Assche March 24, 2023, 7:45 p.m. UTC | #1
On 3/24/23 11:17, Mike Christie wrote:
> Add callouts for reading keys and reservations. This allows LIO to support
> the READ_KEYS and READ_RESERVATION commands and will allow dm-multipath
> to optimize it's error handling so it can check if it's getting an error
> because there's an existing reservation or if we need to retry different
> paths.
> 
> Note: This only initially adds the struct definitions in the kernel as I'm
> not sure if we wanted to export the interface to userspace yet. read_keys
> and read_reservation are exactly what dm-multipath and LIO need, but for a
> userspace interface we may want something like SCSI's READ_FULL_STATUS and
> NVMe's report reservation commands. Those are overkill for dm/LIO and
> READ_FULL_STATUS is sometimes broken for SCSI devices.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Mike Snitzer March 28, 2023, 4:36 p.m. UTC | #2
On Fri, Mar 24 2023 at  2:17P -0400,
Mike Christie <michael.christie@oracle.com> wrote:

> Add callouts for reading keys and reservations. This allows LIO to support
> the READ_KEYS and READ_RESERVATION commands and will allow dm-multipath
> to optimize it's error handling so it can check if it's getting an error
> because there's an existing reservation or if we need to retry different
> paths.

Not seeing anything in DM's path selectors that adds these
optimizations. Is there accompanying multipath-tools callouts to get
at this data to optimize path selection (via DM table reloads)?

Mike
Mike Christie March 28, 2023, 5:11 p.m. UTC | #3
On 3/28/23 11:36 AM, Mike Snitzer wrote:
> On Fri, Mar 24 2023 at  2:17P -0400,
> Mike Christie <michael.christie@oracle.com> wrote:
> 
>> Add callouts for reading keys and reservations. This allows LIO to support
>> the READ_KEYS and READ_RESERVATION commands and will allow dm-multipath
>> to optimize it's error handling so it can check if it's getting an error
>> because there's an existing reservation or if we need to retry different
>> paths.
> 
> Not seeing anything in DM's path selectors that adds these
> optimizations. Is there accompanying multipath-tools callouts to get
> at this data to optimize path selection (via DM table reloads)?
> 

You can ignore that comment. The comment was meant for the dm pr callouts
and not for normal IO/path handling, and now I think I can fix in a different
way.

I originally added the comment about better dm error handling for something
like __dm_pr_register getting a failure when it did:

        ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);

Right now, we fail the entire operation if just one call on one path fails. With the
pr_read_keys/reservation callouts we could check if we got a failure because there
was an existing reservation vs a retryable/ignorable error.

I forgot I wrote that comment about dm in the mail and we actually don't need the
pr_read_* callouts for that type of thing now, because I ended up changing
the existing callouts to return common error codes last kernel. So I have another
patchset that I'm working on, but am still debating about some issues like:

ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);
switch (ret) {
case PR_STS_RESERVATION_CONFLICT:
	pr->ret = ret;
	return -1;
case PR_STS_RETRY_PATH_FAILURE:
	/*
	 * We probably want to retry like how we do for the pg_init.
	 */
	....
case PR_STS_PATH_FAILED:
case PR_STS_PATH_FAST_FAILED:
	/*
	 * I'm still not sure what to do here, because if this is the last
	 * host then we might want to try and register the rest of the paths
	 * and limp on. It probably needs a user config option.
	 */
	....
diff mbox series

Patch

diff --git a/include/linux/pr.h b/include/linux/pr.h
index 94ceec713afe..3003daec28a5 100644
--- a/include/linux/pr.h
+++ b/include/linux/pr.h
@@ -4,6 +4,18 @@ 
 
 #include <uapi/linux/pr.h>
 
+struct pr_keys {
+	u32	generation;
+	u32	num_keys;
+	u64	keys[];
+};
+
+struct pr_held_reservation {
+	u64		key;
+	u32		generation;
+	enum pr_type	type;
+};
+
 struct pr_ops {
 	int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
 			u32 flags);
@@ -14,6 +26,19 @@  struct pr_ops {
 	int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
 			enum pr_type type, bool abort);
 	int (*pr_clear)(struct block_device *bdev, u64 key);
+	/*
+	 * pr_read_keys - Read the registered keys and return them in the
+	 * pr_keys->keys array. The keys array will have been allocated at the
+	 * end of the pr_keys struct, and pr_keys->num_keys must be set to the
+	 * number of keys the array can hold. If there are more than can fit
+	 * in the array, success will still be returned and pr_keys->num_keys
+	 * will reflect the total number of keys the device contains, so the
+	 * caller can retry with a larger array.
+	 */
+	int (*pr_read_keys)(struct block_device *bdev,
+			struct pr_keys *keys_info);
+	int (*pr_read_reservation)(struct block_device *bdev,
+			struct pr_held_reservation *rsv);
 };
 
 #endif /* LINUX_PR_H */