[RFC,v1,0/8] scsi: Multipath support for scsi disk devices.

Message ID 20241109044529.992935-1-himanshu.madhani@oracle.com (mailing list archive)

Message

Himanshu Madhani Nov. 9, 2024, 4:45 a.m. UTC
From: Himanshu Madhani <himanshu.madhani@oracle.com>

Hello Folks,

Here is a very early RFC for multipath support in the SCSI layer. This patch series
implements native multipath support for SCSI disk devices.

In this series, I am providing conceptual changes which still need work. However,
I wanted to get this RFC out to get community feedback on the direction of the changes.

This RFC closely follows the NVMe multipath implementation. Currently, SCSI multipath
only supports disk devices that advertise ALUA (Asymmetric Logical Unit Access)
capability in the Inquiry response data.
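
As an illustration of the ALUA check described above: SPC defines a two-bit
TPGS (Target Port Group Support) field in bits 5:4 of byte 5 of the standard
INQUIRY data, and a non-zero value indicates some form of ALUA support. The
following is a userspace sketch only, not code from this series; the helper
names inquiry_tpgs() and supports_alua() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TPGS values from standard INQUIRY data, byte 5, bits 5:4. */
enum tpgs_mode {
	TPGS_NONE     = 0x0,	/* no asymmetric access support */
	TPGS_IMPLICIT = 0x1,	/* implicit ALUA only */
	TPGS_EXPLICIT = 0x2,	/* explicit ALUA only */
	TPGS_BOTH     = 0x3,	/* implicit and explicit ALUA */
};

/* Hypothetical helper: extract TPGS from a raw INQUIRY response buffer. */
static enum tpgs_mode inquiry_tpgs(const uint8_t *inq, size_t len)
{
	assert(inq != NULL);
	if (len < 6)
		return TPGS_NONE;	/* too short to contain byte 5 */
	return (enum tpgs_mode)((inq[5] >> 4) & 0x3);
}

/* Hypothetical helper: non-zero if the device advertises any ALUA mode. */
static int supports_alua(const uint8_t *inq, size_t len)
{
	return inquiry_tpgs(inq, len) != TPGS_NONE;
}
```

A device whose INQUIRY byte 5 is 0x10 would report implicit ALUA only in this
sketch, while a buffer shorter than 6 bytes is treated as having no ALUA support.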

The patches are split as follows:

Patch 1: Adds new SCSI multipath files and Makefile changes to enable multipath support.
Patch 2: Adds changes to the scsi_host structure for multipath support.
Patch 3: Adds error handling capability to the multipath changes.
Patch 4: Wires up the completion path for the request.
Patch 5: Adds sysfs hooks for displaying iopolicy and state.
Patch 6: Adds changes to use the ALUA handler for multipath.
Patch 7: Adds changes in the sd driver for multipath.
Patch 8: Adds changes to the scsi_debug driver for ALUA testing.

Here is the list of TO-DO items that will be addressed in the next RFC version:

1. Clean up the sysfs directory structure and only show the first multipath device.
2. Test failover scenarios with multiple disks while injecting errors during I/O.
3. Test updating iopolicy while running I/O and make sure path failover happens.
4. Clean up the ALUA code to integrate more closely with the new multipath code.
5. Collect performance numbers for the multipath disks.
6. Handle PR (persistent reservation) ops, which are not yet covered by this series
   and will be added in the next RFC.

Thanks,
Himanshu

Himanshu Madhani (8):
  scsi: Add multipath device support
  scsi: create multipath capable scsi host
  scsi: Add error handling capability for multipath
  scsi: Complete multipath request
  scsi: Add scsi multipath sysfs hooks
  scsi: Add multipath suppport for device handler
  scsi: Add multipath disk init code for sd driver
  scsi_debug: Add module parameter for ALUA multipath

 drivers/scsi/Kconfig                       |  12 +
 drivers/scsi/Makefile                      |   2 +
 drivers/scsi/device_handler/scsi_dh_alua.c |  15 +
 drivers/scsi/hosts.c                       |  12 +
 drivers/scsi/scsi_debug.c                  |  16 +-
 drivers/scsi/scsi_dh.c                     |   3 +
 drivers/scsi/scsi_error.c                  |   8 +
 drivers/scsi/scsi_lib.c                    |  25 +
 drivers/scsi/scsi_multipath.c              | 896 +++++++++++++++++++++
 drivers/scsi/scsi_sysfs.c                  | 104 +++
 drivers/scsi/sd.c                          |  83 ++
 include/scsi/scsi.h                        |   1 +
 include/scsi/scsi_device.h                 |  64 ++
 include/scsi/scsi_host.h                   |   7 +
 include/scsi/scsi_multipath.h              |  86 ++
 15 files changed, 1332 insertions(+), 2 deletions(-)
 create mode 100644 drivers/scsi/scsi_multipath.c
 create mode 100644 include/scsi/scsi_multipath.h


base-commit: 128faa1845a2d5b0178b986f3bd18fb38cc08cc2

Comments

Bart Van Assche Nov. 10, 2024, 9:15 p.m. UTC | #1
On 11/8/24 8:45 PM, himanshu.madhani@oracle.com wrote:
> Here is a very early RFC for multipath support in the scsi layer. This patch series
> implements native multipath support for scsi disks devices.
> 
> In this series, I am providing conceptual changes which still needs work. However,
> I wanted to get this RFC out to get community feedback on the direction of changes.
> 
> This RFC follows NVMe multipath implementation closely for SCSI multipath. Currently,
> SCSI multipath only supports disk devices which advertises ALUA (Asymmetric Logical
> Unit Access) capability in the Inquiry response data.

Something very important is missing from the cover letter, namely the
motivation for starting this initiative. Why add native
multipath support to the SCSI core instead of using dm-multipath? Isn't
one of the goals of the Linux kernel not to duplicate functionality that
already exists? How does the new infrastructure compare with
dm-multipath from the point of view of performance and functionality?

Thanks,

Bart.

Himanshu Madhani Nov. 12, 2024, 8:46 p.m. UTC | #2
Hi Bart,

On 11/10/24 13:15, Bart Van Assche wrote:
> 
> On 11/8/24 8:45 PM, himanshu.madhani@oracle.com wrote:
>> Here is a very early RFC for multipath support in the scsi layer. This 
>> patch series
>> implements native multipath support for scsi disks devices.
>>
>> In this series, I am providing conceptual changes which still needs 
>> work. However,
>> I wanted to get this RFC out to get community feedback on the 
>> direction of changes.
>>
>> This RFC follows NVMe multipath implementation closely for SCSI 
>> multipath. Currently,
>> SCSI multipath only supports disk devices which advertises ALUA 
>> (Asymmetric Logical
>> Unit Access) capability in the Inquiry response data.
> 
> Something very important is missing from the cover letter, namely a
> motivation of why this initiative has been started. Why to add native
> multipath support to the SCSI core instead of using dm-multipath? Isn't
> one of the goals of the Linux kernel not to duplicate functionality that
> already exists? How does the new infrastructure compare with
> dm-multipath from the point of view of performance and functionality?
> 

Sorry about the missing motivation section in the cover letter. I'll add
it in v2 when I am ready to send an updated version of this RFC.

Here is the motivation:

1. Native multipath provides seamless multipath configuration and setup
for SCSI without any other dependencies, especially for discovery and
assembly of RAID arrays. My motivation for native SCSI multipath is to
avoid needing a third-party daemon to discover and assemble multipath
devices, which can sometimes create issues if devices are not discovered
properly. A native implementation avoids those additional steps and
thereby provides plug-and-play capability for SCSI multipath
configurations. Native support will also help modernize the SCSI code
with respect to multipath and provide tighter integration with the SCSI
stack.


2. From a performance point of view, I believe that switching to
RCU-based path selection logic will provide faster path failover and
improve overall I/O latency. I have not spent time collecting
performance data for this RFC. I am hoping to provide more comprehensive
data with the next RFC update.

I do not believe this duplicates functionality, since I am providing an
in-kernel multipath option that gives users a choice between the native
and the out-of-kernel multipath implementation based on their needs.


> Thanks,
> 
> Bart.
>

Hannes Reinecke Nov. 22, 2024, 2:27 p.m. UTC | #3
On 11/9/24 05:45, himanshu.madhani@oracle.com wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
> 
> Hello Folks,
> 
> Here is a very early RFC for multipath support in the scsi layer. This patch series
> implements native multipath support for scsi disks devices.
> 
> In this series, I am providing conceptual changes which still needs work. However,
> I wanted to get this RFC out to get community feedback on the direction of changes.
> 
> This RFC follows NVMe multipath implementation closely for SCSI multipath. Currently,
> SCSI multipath only supports disk devices which advertises ALUA (Asymmetric Logical
> Unit Access) capability in the Inquiry response data.
> 
First of all, thank you for doing this.
It has been on my to-do list for a long time.

However, the one crucial thing that made me keep pushing it back is:

Residuals.

NVMe native multipathing works because NVMe is an 'all-or-nothing'
protocol, i.e. either the entire I/O has been completed, or nothing has
happened.
This means that for any failure we can safely retry the entire I/O on a
different path (that's the 'steal_bio' thingie), knowing that it's safe
to do so.

For SCSI, however, this is not the case; it's perfectly valid for a
target to do a partial completion and ask the initiator to retry the
remainder. And this partial completion might end at any position within
the bvec, requiring us to resend the bio from an arbitrary starting position.
Meaning we cannot do a blind 'steal_bio' thing.

So: have you evaluated your series with respect to residuals?
Have you _measured_ whether residuals are happening?
Have you considered how your patchset could handle
residuals?
(It _might_ be possible to resend the entire I/O over another path,
even if the command has been partially completed. That's perfectly safe
for reads, but for writes you have to be extremely careful not to cause
data corruption. We had some fun discussions about this over on the NVMe
side ...)
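
To make the residual problem concrete, here is a small userspace sketch (not
code from this series) of the bookkeeping a partial completion forces on the
initiator. struct io_span and remaining_after_partial() are hypothetical
names; the only thing assumed is that the completion reports a residual
count of bytes that were not transferred.

```c
#include <assert.h>
#include <stdint.h>

struct io_span {
	uint64_t offset;	/* byte offset of the I/O on the device */
	uint32_t len;		/* length of the I/O in bytes */
};

/*
 * Given the original request and the residual (bytes NOT transferred),
 * return the span that still has to be issued. A zero-length result
 * means the command completed in full.
 */
static struct io_span remaining_after_partial(struct io_span req,
					       uint32_t resid)
{
	struct io_span rem;
	uint32_t done;

	assert(resid <= req.len);	/* residual cannot exceed the request */
	done = req.len - resid;		/* bytes the target did transfer */
	rem.offset = req.offset + done;
	rem.len = resid;
	return rem;
}
```

For a 64 KiB request at offset 1 MiB that completes with a 16 KiB residual,
the remainder starts 48 KiB into the request. The sketch only covers the
arithmetic; mapping that offset back into the middle of a bvec is the part
that a blind 'steal_bio' does not handle.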

And: please drop the device handler thingie for this, and concentrate
on ALUA. No point in carrying legacy stuff around.
_AND_ you have to evaluate the ALUA settings anyway to get a decent
path selection.
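
As a rough illustration of what evaluating the ALUA settings for path
selection could look like, the userspace sketch below prefers
active/optimized paths over active/non-optimized ones and skips everything
else. The state values are the asymmetric access state codes from SPC;
struct mp_path and select_path() are hypothetical and not taken from the
series.

```c
#include <assert.h>
#include <stddef.h>

/* ALUA asymmetric access states (values as defined by SPC). */
enum alua_state {
	ALUA_ACTIVE_OPTIMIZED     = 0x0,
	ALUA_ACTIVE_NON_OPTIMIZED = 0x1,
	ALUA_STANDBY              = 0x2,
	ALUA_UNAVAILABLE          = 0x3,
	ALUA_LBA_DEPENDENT        = 0x4,
	ALUA_OFFLINE              = 0xe,
	ALUA_TRANSITIONING        = 0xf,
};

struct mp_path {		/* hypothetical path descriptor */
	int id;
	enum alua_state state;
};

/* Return the best usable path, or NULL if no path can take I/O now. */
static const struct mp_path *select_path(const struct mp_path *paths,
					 size_t n)
{
	const struct mp_path *best = NULL;
	size_t i;

	assert(paths != NULL || n == 0);
	for (i = 0; i < n; i++) {
		enum alua_state s = paths[i].state;

		if (s != ALUA_ACTIVE_OPTIMIZED &&
		    s != ALUA_ACTIVE_NON_OPTIMIZED)
			continue;	/* skip standby, unavailable, etc. */
		/* Optimized (0x0) sorts before non-optimized (0x1). */
		if (!best || s < best->state)
			best = &paths[i];
	}
	return best;
}
```

With one standby, one non-optimized and one optimized path, this picks the
optimized one; if only the standby path is left, it returns NULL so the
caller can queue or fail the I/O. Ties go to the first path found, which a
real iopolicy would presumably replace with round-robin or similar.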

Cheers,

Hannes