diff mbox series

[01/23] zfcp: make DIX experimental, disabled, and independent of DIF

Message ID 20181108144458.29012-2-maier@linux.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series zfcp updates for v4.21 | expand

Commit Message

Steffen Maier Nov. 8, 2018, 2:44 p.m. UTC
From: Fedor Loshakov <loshakov@linux.ibm.com>

There are too many unresolved issues with DIX outside of zfcp
such as wrong protection data on writesame/discard (over device-mapper)
or due to unstable page writes.
This can cause I/O stalls or endless loops or even kernel panics,
or I/O errors due to erroneously failed logical block guard checks.

Therefore, introduce separate zfcp module parameters to individually
select support for:
DIF which should work (zfcp.dif, which used to be DIF+DIX, disabled) or
DIX+DIF which causes trouble (zfcp.dix, new, disabled).

If DIX is enabled, we warn on zfcp driver initialization.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Co-developed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Signed-off-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
---
 drivers/s390/scsi/zfcp_aux.c  |  3 +++
 drivers/s390/scsi/zfcp_ext.h  |  1 +
 drivers/s390/scsi/zfcp_scsi.c | 10 +++++++---
 3 files changed, 11 insertions(+), 3 deletions(-)

Comments

Martin K. Petersen Nov. 9, 2018, 2:07 a.m. UTC | #1
Steffen,

> There are too many unresolved issues with DIX outside of zfcp such as
> wrong protection data on writesame/discard (over device-mapper)

We don't configure protected transfers for anything but read and write
commands. There is currently no protection information generated for
WRITE SAME. So if you guys are seeing failures, it must be due to zfcp
not handling the scsi_cmnd prot_op/prot_flags or the command PROTECT bit
correctly.

> or due to unstable page writes.

BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
that fails?
Steffen Maier Nov. 9, 2018, 11:17 a.m. UTC | #2
Hi Martin,

On 11/09/2018 03:07 AM, Martin K. Petersen wrote:
>> There are too many unresolved issues with DIX outside of zfcp such as
>> wrong protection data on writesame/discard (over device-mapper)
> 
> We don't configure protected transfers for anything but read and write
> commands. There is currently no protection information generated for
> WRITE SAME.

> So if you guys are seeing failures, it must be due to zfcp
> not handling the scsi_cmnd prot_op/prot_flags or the command PROTECT bit
> correctly.

I think we're good since
("scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/s390/scsi?id=71b8e45da51a7b64a23378221c0a5868bd79da4f.

Previously, at least regular (non-recovery) I/O should have been good by 
having checked at least scsi_prot_sg_count().

>> or due to unstable page writes.
> 
> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
> that fails?

Apologies, if the commit description sounds unfair. I did not mean to 
blame anyone. It's just the collection of issues we saw in distros over 
the years. Some of the old issues might be fixed with above zfcp patch 
or common code changes. Unfortunately, I could not handle the DIX things 
we saw. I think, DIF by itself provides a lot of the protection benefit 
and was not affected by the encountered issues. We would like to give 
users an easy way to operate in such setup.
Martin K. Petersen Nov. 21, 2018, 6:13 p.m. UTC | #3
Hi Steffen,

Sorry about the delay. Travel got in the way.

>> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
>> that fails?
>
> Apologies, if the commit description sounds unfair. I did not mean to
> blame anyone. It's just the collection of issues we saw in distros over
> the years. Some of the old issues might be fixed with above zfcp patch
> or common code changes. Unfortunately, I could not handle the DIX things
> we saw. I think, DIF by itself provides a lot of the protection benefit
> and was not affected by the encountered issues. We would like to give
> users an easy way to operate in such setup.

I don't have a problem with zfcp having a parameter that affects the
host protection mask, the other drivers do that too. However, these
knobs exist exclusively for debugging and testing purposes. They are not
something regular users should twiddle to switch features on or off.

So DIF and DIX should always be enabled in the driver. And there is no
point in ever operating without DIF enabled if the hardware is capable.

If there is a desire to disable DIX protection for whatever reason
(legacy code doing bad things), do so using the block layer sysfs
knobs. That's where the policy of whether to generate and verify
protection information resides, not in the HBA driver.

And if there are unaddressed issues in the I/O stack that prevents you
from having integrity enabled, I'd prefer to know about them so they can
be fixed rather than circumventing them through driver module parameter.

Hope that makes sense.
Steffen Maier Nov. 22, 2018, 11:58 a.m. UTC | #4
Hi Martin,

On 11/21/2018 07:13 PM, Martin K. Petersen wrote:
> Sorry about the delay. Travel got in the way.

No problem.

>>> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
>>> that fails?
>>
>> Apologies, if the commit description sounds unfair. I did not mean to
>> blame anyone. It's just the collection of issues we saw in distros over
>> the years. Some of the old issues might be fixed with above zfcp patch
>> or common code changes. Unfortunately, I could not handle the DIX things
>> we saw. I think, DIF by itself provides a lot of the protection benefit
>> and was not affected by the encountered issues. We would like to give
>> users an easy way to operate in such setup.
> 
> I don't have a problem with zfcp having a parameter that affects the
> host protection mask, the other drivers do that too. However, these
> knobs exist exclusively for debugging and testing purposes. They are not
> something regular users should twiddle to switch features on or off.
> 
> So DIF and DIX should always be enabled in the driver. And there is no
> point in ever operating without DIF enabled if the hardware is capable.

Our long term plan is to make the new zfcp.dif (for DIF only) default to 
enabled once we got enough experience about zfcp stability in this mode.

> If there is a desire to disable DIX protection for whatever reason
> (legacy code doing bad things), do so using the block layer sysfs
> knobs. That's where the policy of whether to generate and verify
> protection information resides, not in the HBA driver.

Yes, we came up with udev rules to set read_verify and write_generate to 
0 in order to get DIF without DIX. However, this seems complicated for 
users, especially since we always have at least dm-multipath and maybe 
other dm targets such as LVM on top. The setting that matters is on the 
top level block device of some dm (or maybe mdraid) virtual block device 
stack. Getting this right, gets more complicated if there are also disks 
not attached through zfcp, and which may need different settings, so the 
udev rules would need somewhat involved matching. The new zfcp.dif 
parameter makes it simpler because the SCSI disk comes up with the 
desired limits and anything on top automatically inherits these block 
queue limits.

There's one more important thing that has performance impact: We need to 
pack payload and protection data into the same queue of limited length. 
So for the worst case with DIX, we have to use half the size for 
sg_tablesize to get the other half for sg_prot_tablesize. This limits 
the maximum I/O request size and thus throughput. Using read_verify and 
write_generate does not change the tablesizes, as zfcp would still 
announce support for DIF and DIX. With the new zfcp.dif=1 and 
zfcp.dix=0, we can use the full sg_tablesize for payload data and 
sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists of 
course.)

Are there other ways for accomplishing this which I'm not aware of?

> And if there are unaddressed issues in the I/O stack that prevents you
> from having integrity enabled, I'd prefer to know about them so they can
> be fixed rather than circumventing them through driver module parameter.

Sure.
Martin K. Petersen Nov. 29, 2018, 1:40 a.m. UTC | #5
Steffen,

As I said, I don't have a problem with having module parameters.

> There's one more important thing that has performance impact: We need to
> pack payload and protection data into the same queue of limited
> length. So for the worst case with DIX, we have to use half the size for
> sg_tablesize to get the other half for sg_prot_tablesize.

Interesting. With the exception of NVMe over PCIe, it's kind of unusual
for modern controllers to have real limits in this area.

> This limits the maximum I/O request size and thus throughput. Using
> read_verify and write_generate does not change the tablesizes, as zfcp
> would still announce support for DIF and DIX. With the new zfcp.dif=1
> and zfcp.dix=0, we can use the full sg_tablesize for payload data and
> sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists of
> course.)
>
> Are there other ways for accomplishing this which I'm not aware of?

You can set your shost->sg_prot_tablesize. There are pathological corner
cases like dd to the raw block device where you'll suffer if there is
not a 1:1 mapping between data and protection segments. But for regular
I/O where the protection is generated over a whole bio at a time, you
can get away with something smaller.

Anyway. I don't have any problems with you making DIX experimental for
zfcp. Just want to make sure it's done for the right reasons (i.e. not
problems in SCSI or the block layer).
diff mbox series

Patch

diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index 94f4d8fe85e0..08cdc00e8299 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -124,6 +124,9 @@  static int __init zfcp_module_init(void)
 {
 	int retval = -ENOMEM;
 
+	if (zfcp_experimental_dix)
+		pr_warn("DIX is enabled. It is experimental and might cause problems\n");
+
 	zfcp_fsf_qtcb_cache = zfcp_cache_hw_align("zfcp_fsf_qtcb",
 						  sizeof(struct fsf_qtcb));
 	if (!zfcp_fsf_qtcb_cache)
diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h
index bd0c5a9f04cb..0940bef35020 100644
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -144,6 +144,7 @@  extern void zfcp_qdio_close(struct zfcp_qdio *);
 extern void zfcp_qdio_siosl(struct zfcp_adapter *);
 
 /* zfcp_scsi.c */
+extern bool zfcp_experimental_dix;
 extern struct scsi_transport_template *zfcp_scsi_transport_template;
 extern int zfcp_scsi_adapter_register(struct zfcp_adapter *);
 extern void zfcp_scsi_adapter_unregister(struct zfcp_adapter *);
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index a8efcb330bc1..2b8c33627460 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -27,7 +27,11 @@  MODULE_PARM_DESC(queue_depth, "Default queue depth for new SCSI devices");
 
 static bool enable_dif;
 module_param_named(dif, enable_dif, bool, 0400);
-MODULE_PARM_DESC(dif, "Enable DIF/DIX data integrity support");
+MODULE_PARM_DESC(dif, "Enable DIF data integrity support (default off)");
+
+bool zfcp_experimental_dix;
+module_param_named(dix, zfcp_experimental_dix, bool, 0400);
+MODULE_PARM_DESC(dix, "Enable experimental DIX (data integrity extension) support which implies DIF support (default off)");
 
 static bool allow_lun_scan = true;
 module_param(allow_lun_scan, bool, 0600);
@@ -788,11 +792,11 @@  void zfcp_scsi_set_prot(struct zfcp_adapter *adapter)
 	data_div = atomic_read(&adapter->status) &
 		   ZFCP_STATUS_ADAPTER_DATA_DIV_ENABLED;
 
-	if (enable_dif &&
+	if ((enable_dif || zfcp_experimental_dix) &&
 	    adapter->adapter_features & FSF_FEATURE_DIF_PROT_TYPE1)
 		mask |= SHOST_DIF_TYPE1_PROTECTION;
 
-	if (enable_dif && data_div &&
+	if (zfcp_experimental_dix && data_div &&
 	    adapter->adapter_features & FSF_FEATURE_DIX_PROT_TCPIP) {
 		mask |= SHOST_DIX_TYPE1_PROTECTION;
 		scsi_host_set_guard(shost, SHOST_DIX_GUARD_IP);