| Message ID | 20181108144458.29012-2-maier@linux.ibm.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Series | zfcp updates for v4.21 |
Steffen,

> There are too many unresolved issues with DIX outside of zfcp such as
> wrong protection data on writesame/discard (over device-mapper)

We don't configure protected transfers for anything but read and write commands. There is currently no protection information generated for WRITE SAME. So if you guys are seeing failures, it must be due to zfcp not handling the scsi_cmnd prot_op/prot_flags or the command PROTECT bit correctly.

> or due to unstable page writes.

BDI_CAP_STABLE_WRITES should take care of this. What's the configuration that fails?
Hi Martin,

On 11/09/2018 03:07 AM, Martin K. Petersen wrote:
>> There are too many unresolved issues with DIX outside of zfcp such as
>> wrong protection data on writesame/discard (over device-mapper)
>
> We don't configure protected transfers for anything but read and write
> commands. There is currently no protection information generated for
> WRITE SAME. So if you guys are seeing failures, it must be due to zfcp
> not handling the scsi_cmnd prot_op/prot_flags or the command PROTECT bit
> correctly.

I think we're good since ("scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled"):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/s390/scsi?id=71b8e45da51a7b64a23378221c0a5868bd79da4f
Previously, at least regular (non-recovery) I/O should have been fine, because it checked at least scsi_prot_sg_count().

>> or due to unstable page writes.
>
> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
> that fails?

Apologies if the commit description sounds unfair; I did not mean to blame anyone. It's just the collection of issues we saw in distros over the years. Some of the old issues might be fixed by the above zfcp patch or common-code changes. Unfortunately, I could not resolve the DIX issues we saw. I think DIF by itself provides much of the protection benefit and was not affected by the encountered issues. We would like to give users an easy way to operate in such a setup.
Hi Steffen,

Sorry about the delay. Travel got in the way.

>> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
>> that fails?
>
> Apologies, if the commit description sounds unfair. I did not mean to
> blame anyone. It's just the collection of issues we saw in distros over
> the years. Some of the old issues might be fixed with above zfcp patch
> or common code changes. Unfortunately, I could not handle the DIX things
> we saw. I think, DIF by itself provides a lot of the protection benefit
> and was not affected by the encountered issues. We would like to give
> users an easy way to operate in such setup.

I don't have a problem with zfcp having a parameter that affects the host protection mask; the other drivers do that too. However, these knobs exist exclusively for debugging and testing purposes. They are not something regular users should twiddle to switch features on or off.

So DIF and DIX should always be enabled in the driver. And there is no point in ever operating without DIF enabled if the hardware is capable.

If there is a desire to disable DIX protection for whatever reason (legacy code doing bad things), do so using the block layer sysfs knobs. That's where the policy of whether to generate and verify protection information resides, not in the HBA driver.

And if there are unaddressed issues in the I/O stack that prevent you from having integrity enabled, I'd prefer to know about them so they can be fixed rather than circumventing them through a driver module parameter.

Hope that makes sense.
Hi Martin,

On 11/21/2018 07:13 PM, Martin K. Petersen wrote:
> Sorry about the delay. Travel got in the way.

No problem.

>>> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
>>> that fails?
>>
>> Apologies, if the commit description sounds unfair. I did not mean to
>> blame anyone. It's just the collection of issues we saw in distros over
>> the years. Some of the old issues might be fixed with above zfcp patch
>> or common code changes. Unfortunately, I could not handle the DIX things
>> we saw. I think, DIF by itself provides a lot of the protection benefit
>> and was not affected by the encountered issues. We would like to give
>> users an easy way to operate in such setup.
>
> I don't have a problem with zfcp having a parameter that affects the
> host protection mask, the other drivers do that too. However, these
> knobs exist exclusively for debugging and testing purposes. They are not
> something regular users should twiddle to switch features on or off.
>
> So DIF and DIX should always be enabled in the driver. And there is no
> point in ever operating without DIF enabled if the hardware is capable.

Our long-term plan is to make the new zfcp.dif (for DIF only) default to enabled once we have gained enough experience with zfcp stability in this mode.

> If there is a desire to disable DIX protection for whatever reason
> (legacy code doing bad things), do so using the block layer sysfs
> knobs. That's where the policy of whether to generate and verify
> protection information resides, not in the HBA driver.

Yes, we came up with udev rules that set read_verify and write_generate to 0 in order to get DIF without DIX. However, this seems complicated for users, especially since we always have at least dm-multipath, and maybe other dm targets such as LVM, on top. The setting that matters is the one on the top-level block device of such a dm (or maybe mdraid) virtual block device stack.
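For illustration, such a udev rule might look roughly like the sketch below; the match expressions are purely illustrative and would need to be narrowed to the zfcp-attached disks in a real setup.

```
# Illustrative sketch only: disable host-side protection-information
# generation/verification (DIX) on the block layer, while the device
# itself still runs DIF on the wire. Matching all sd* disks here is a
# placeholder; real rules would have to match only zfcp-attached devices.
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", \
  ATTR{integrity/write_generate}="0", \
  ATTR{integrity/read_verify}="0"
```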
Getting this right gets more complicated if there are also disks not attached through zfcp, which may need different settings, so the udev rules would need somewhat involved matching. The new zfcp.dif parameter makes this simpler because the SCSI disk comes up with the desired limits, and anything on top automatically inherits these block queue limits.

There's one more important thing that has a performance impact: we need to pack payload and protection data into the same queue of limited length. So for the worst case with DIX, we have to use half the size for sg_tablesize to get the other half for sg_prot_tablesize. This limits the maximum I/O request size and thus throughput. Using read_verify and write_generate does not change the table sizes, as zfcp would still announce support for DIF and DIX. With the new zfcp.dif=1 and zfcp.dix=0, we can use the full sg_tablesize for payload data and sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists, of course.)

Are there other ways of accomplishing this which I'm not aware of?

> And if there are unaddressed issues in the I/O stack that prevents you
> from having integrity enabled, I'd prefer to know about them so they can
> be fixed rather than circumventing them through driver module parameter.

Sure.
Steffen,

As I said, I don't have a problem with having module parameters.

> There's one more important thing that has performance impact: We need to
> pack payload and protection data into the same queue of limited
> length. So for the worst case with DIX, we have to use half the size for
> sg_tablesize to get the other half for sg_prot_tablesize.

Interesting. With the exception of NVMe over PCIe, it's kind of unusual for modern controllers to have real limits in this area.

> This limits the maximum I/O request size and thus throughput. Using
> read_verify and write_generate does not change the tablesizes, as zfcp
> would still announce support for DIF and DIX. With the new zfcp.dif=1
> and zfcp.dix=0, we can use the full sg_tablesize for payload data and
> sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists of
> course.)
>
> Are there other ways for accomplishing this which I'm not aware of?

You can set your shost->sg_prot_tablesize. There are pathological corner cases like dd to the raw block device where you'll suffer if there is not a 1:1 mapping between data and protection segments. But for regular I/O where the protection is generated over a whole bio at a time, you can get away with something smaller.

Anyway, I don't have any problems with you making DIX experimental for zfcp. Just want to make sure it's done for the right reasons (i.e. not problems in SCSI or the block layer).
```diff
diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index 94f4d8fe85e0..08cdc00e8299 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -124,6 +124,9 @@ static int __init zfcp_module_init(void)
 {
 	int retval = -ENOMEM;
 
+	if (zfcp_experimental_dix)
+		pr_warn("DIX is enabled. It is experimental and might cause problems\n");
+
 	zfcp_fsf_qtcb_cache = zfcp_cache_hw_align("zfcp_fsf_qtcb",
 						  sizeof(struct fsf_qtcb));
 	if (!zfcp_fsf_qtcb_cache)
diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h
index bd0c5a9f04cb..0940bef35020 100644
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -144,6 +144,7 @@ extern void zfcp_qdio_close(struct zfcp_qdio *);
 extern void zfcp_qdio_siosl(struct zfcp_adapter *);
 
 /* zfcp_scsi.c */
+extern bool zfcp_experimental_dix;
 extern struct scsi_transport_template *zfcp_scsi_transport_template;
 extern int zfcp_scsi_adapter_register(struct zfcp_adapter *);
 extern void zfcp_scsi_adapter_unregister(struct zfcp_adapter *);
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index a8efcb330bc1..2b8c33627460 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -27,7 +27,11 @@ MODULE_PARM_DESC(queue_depth, "Default queue depth for new SCSI devices");
 
 static bool enable_dif;
 module_param_named(dif, enable_dif, bool, 0400);
-MODULE_PARM_DESC(dif, "Enable DIF/DIX data integrity support");
+MODULE_PARM_DESC(dif, "Enable DIF data integrity support (default off)");
+
+bool zfcp_experimental_dix;
+module_param_named(dix, zfcp_experimental_dix, bool, 0400);
+MODULE_PARM_DESC(dix, "Enable experimental DIX (data integrity extension) support which implies DIF support (default off)");
 
 static bool allow_lun_scan = true;
 module_param(allow_lun_scan, bool, 0600);
@@ -788,11 +792,11 @@ void zfcp_scsi_set_prot(struct zfcp_adapter *adapter)
 	data_div = atomic_read(&adapter->status) &
 		   ZFCP_STATUS_ADAPTER_DATA_DIV_ENABLED;
 
-	if (enable_dif &&
+	if ((enable_dif || zfcp_experimental_dix) &&
 	    adapter->adapter_features & FSF_FEATURE_DIF_PROT_TYPE1)
 		mask |= SHOST_DIF_TYPE1_PROTECTION;
 
-	if (enable_dif && data_div &&
+	if (zfcp_experimental_dix && data_div &&
 	    adapter->adapter_features & FSF_FEATURE_DIX_PROT_TCPIP) {
 		mask |= SHOST_DIX_TYPE1_PROTECTION;
 		scsi_host_set_guard(shost, SHOST_DIX_GUARD_IP);
```