[3/3] scsi: target: core: Change ASCQ for residual write

Message ID	20201022172011.42367-4-a.kovaleva@yadro.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=4YdN=D5=vger.kernel.org=linux-scsi-owner@kernel.org> From: Anastasia Kovaleva <a.kovaleva@yadro.com> To: <target-devel@vger.kernel.org> CC: <linux-scsi@vger.kernel.org>, <linux@yadro.com>, Anastasia Kovaleva <a.kovaleva@yadro.com>, Roman Bolshakov <r.bolshakov@yadro.com> Subject: [PATCH 3/3] scsi: target: core: Change ASCQ for residual write Date: Thu, 22 Oct 2020 20:20:11 +0300 Message-ID: <20201022172011.42367-4-a.kovaleva@yadro.com> In-Reply-To: <20201022172011.42367-1-a.kovaleva@yadro.com> References: <20201022172011.42367-1-a.kovaleva@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk
Series	scsi: target: Set correct residual data \| expand [0/3] scsi: target: Set correct residual data [1/3] scsi: target: core: Set residuals for 4Kn devices [2/3] scsi: target: core: Signal WRITE residuals [3/3] scsi: target: core: Change ASCQ for residual write

Anastasia Kovaleva Oct. 22, 2020, 5:20 p.m. UTC

According to FCP-4 (9.4.2):

  If the command requested that data beyond the length specified by the
  FCP_DL field be transferred, then the device server shall set the
  FCP_RESID_OVER bit (see 9.5.8) to one in the FCP_RSP IU and:

  a) process the command normally except that data beyond the FCP_DL
  count shall not be requested or transferred;

  b) transfer no data and return CHECK CONDITION status with the sense
  key set to ILLEGAL REQUEST and the additional sense code set to INVALID
  FIELD IN COMMAND INFORMATION UNIT; or

  c) may transfer data and return CHECK CONDITION status with the sense
  key set to ABORTED COMMAND and the additional sense code set to
  INVALID FIELD IN COMMAND INFORMATION UNIT.

TCM follows b) and transfers no data for residual writes but returns
INVALID FIELD IN CDB instead of INVALID FIELD IN COMMAND INFORMATION
UNIT.

Change the ASCQ to INVALID FIELD IN COMMAND INFORMATION UNIT to meet the
standart.

Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/target/target_core_transport.c | 8 +++++++-
 include/target/target_core_base.h      | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

Bart Van Assche Oct. 24, 2020, 4:07 a.m. UTC | #1

On 10/22/20 10:20 AM, Anastasia Kovaleva wrote:
> According to FCP-4 (9.4.2):
> 
>   If the command requested that data beyond the length specified by the
>   FCP_DL field be transferred, then the device server shall set the
>   FCP_RESID_OVER bit (see 9.5.8) to one in the FCP_RSP IU and:
> 
>   a) process the command normally except that data beyond the FCP_DL
>   count shall not be requested or transferred;
> 
>   b) transfer no data and return CHECK CONDITION status with the sense
>   key set to ILLEGAL REQUEST and the additional sense code set to INVALID
>   FIELD IN COMMAND INFORMATION UNIT; or
> 
>   c) may transfer data and return CHECK CONDITION status with the sense
>   key set to ABORTED COMMAND and the additional sense code set to
>   INVALID FIELD IN COMMAND INFORMATION UNIT.
> 
> TCM follows b) and transfers no data for residual writes but returns
> INVALID FIELD IN CDB instead of INVALID FIELD IN COMMAND INFORMATION
> UNIT.
> 
> Change the ASCQ to INVALID FIELD IN COMMAND INFORMATION UNIT to meet the
> standart.

Is FCP the only standard that requires to report INVALID FIELD IN COMMAND
INFORMATION UNIT for residual overflow? I haven't found any similar
requirement in the iSCSI RFC nor in the SRP standard.

Additionally, what benefits does it provide to report a CHECK CONDITION
upon residual overflow? The SCST QLogic FC target driver doesn't do this
as far as I know, is more than ten years old, is widely used and so far
nobody complained about this.

Bart.

Roman Bolshakov Oct. 24, 2020, 12:13 p.m. UTC | #2

On Fri, Oct 23, 2020 at 09:07:38PM -0700, Bart Van Assche wrote:
> On 10/22/20 10:20 AM, Anastasia Kovaleva wrote:
> > According to FCP-4 (9.4.2):
> > 
> >   If the command requested that data beyond the length specified by the
> >   FCP_DL field be transferred, then the device server shall set the
> >   FCP_RESID_OVER bit (see 9.5.8) to one in the FCP_RSP IU and:
> > 
> >   a) process the command normally except that data beyond the FCP_DL
> >   count shall not be requested or transferred;
> > 
> >   b) transfer no data and return CHECK CONDITION status with the sense
> >   key set to ILLEGAL REQUEST and the additional sense code set to INVALID
> >   FIELD IN COMMAND INFORMATION UNIT; or
> > 
> >   c) may transfer data and return CHECK CONDITION status with the sense
> >   key set to ABORTED COMMAND and the additional sense code set to
> >   INVALID FIELD IN COMMAND INFORMATION UNIT.
> > 
> > TCM follows b) and transfers no data for residual writes but returns
> > INVALID FIELD IN CDB instead of INVALID FIELD IN COMMAND INFORMATION
> > UNIT.
> > 
> > Change the ASCQ to INVALID FIELD IN COMMAND INFORMATION UNIT to meet the
> > standart.
> 
> Is FCP the only standard that requires to report INVALID FIELD IN COMMAND
> INFORMATION UNIT for residual overflow? I haven't found any similar
> requirement in the iSCSI RFC nor in the SRP standard.
> 

Hi Bart,

iSCSI doesn't specify a specific code but mentions a possibility of CHECK
CONDITION for residuals (11.4.5.1.  Field Semantics):

  Targets may set the residual count, and initiators may use it when the
  response code is Command Completed at Target (even if the status
  returned is not GOOD).

I have skimmed over SRP and haven't found if it's explicitly disallowed
to send CHECK CONDITION for READ/WRITEs with residuals.

> Additionally, what benefits does it provide to report a CHECK CONDITION
> upon residual overflow?

Typical use case for CHECK CONDITION in case of Underflow/Overflow is
extra robustness against buggy initiators [1][2]. Failing both READ and
WRITE is the most solid approach in that sense [3][4][5] as it prevents
data corruption at all costs.

Suppose an initiator wants to WRITE 8 LBA. For 512-byte formatted LUN,
8 LBAs need a buffer of 4K bytes. For 4096-byte formatted LUN the
command would need 32K data buffer.

An Overflow happens if initiator treats 4Kn device like 512n one but
provides a buffer of 4K. i.e. to complete the WRITE target needs to
consume 28K more data, otherwise only 1 LBA would be written and the
rest 7 LBAs would have indeterminate content.

An Underflow happens if initiator confuses 512n device with 4Kn one and
provides a buffer of 32K, i.e. target doesn't utilize all buffer for the
command.

> The SCST QLogic FC target driver doesn't do this as far as I know, is
> more than ten years old, is widely used and so far nobody complained
> about this.
> 

It might be true but there are no public tests for FC. We're planning to
extend libiscsi using SG_IO v4 to cover more corner cases, like FC
residuals and aborts/error recovery. Also, SLER (Sequence level error
recovery) is comming to FCP-5/FC-NVMe-2, it be useful to test it too.

1. https://www.t10.org/pipermail/t10/2014-June/017369.html
2. https://www.t10.org/pipermail/t10/2014-June/017370.html
3. https://www.t10.org/pipermail/t10/2012-September/016340.html
4. https://www.t10.org/pipermail/t10/2012-September/016350.html
5. https://mailarchive.ietf.org/arch/msg/ips/PpMCMQw-HzKQ5gwteFD54CN_0EY/

Regards,
Roman

Bart Van Assche Oct. 25, 2020, 12:25 a.m. UTC | #3

On 10/24/20 5:13 AM, Roman Bolshakov wrote:
> iSCSI doesn't specify a specific code but mentions a possibility of CHECK
> CONDITION for residuals (11.4.5.1.  Field Semantics):
> 
>   Targets may set the residual count, and initiators may use it when the
>   response code is Command Completed at Target (even if the status
>   returned is not GOOD).

My interpretation of the above text is that it neither allows nor
requires to change the status GOOD into something else if there is a
residue.

>> Additionally, what benefits does it provide to report a CHECK CONDITION
>> upon residual overflow?
> 
> Typical use case for CHECK CONDITION in case of Underflow/Overflow is
> extra robustness against buggy initiators [1][2]. Failing both READ and
> WRITE is the most solid approach in that sense [3][4][5] as it prevents
> data corruption at all costs.
> 
> Suppose an initiator wants to WRITE 8 LBA. For 512-byte formatted LUN,
> 8 LBAs need a buffer of 4K bytes. For 4096-byte formatted LUN the
> command would need 32K data buffer.
> 
> An Overflow happens if initiator treats 4Kn device like 512n one but
> provides a buffer of 4K. i.e. to complete the WRITE target needs to
> consume 28K more data, otherwise only 1 LBA would be written and the
> rest 7 LBAs would have indeterminate content.
> 
> An Underflow happens if initiator confuses 512n device with 4Kn one and
> provides a buffer of 32K, i.e. target doesn't utilize all buffer for the
> command.

Thanks for the additional background information, this really helps. How
about only rejecting SCSI commands for which the data buffer size is not
a multiple of the block size? I'm concerned that flagging all SCSI
commands that have a residue as invalid will break SCSI tape software.

Thanks,

Bart.

Roman Bolshakov Oct. 26, 2020, 1:12 p.m. UTC | #4

On Sat, Oct 24, 2020 at 05:25:17PM -0700, Bart Van Assche wrote:
> On 10/24/20 5:13 AM, Roman Bolshakov wrote:
> >> Additionally, what benefits does it provide to report a CHECK CONDITION
> >> upon residual overflow?
> > 
> > Typical use case for CHECK CONDITION in case of Underflow/Overflow is
> > extra robustness against buggy initiators [1][2]. Failing both READ and
> > WRITE is the most solid approach in that sense [3][4][5] as it prevents
> > data corruption at all costs.
> > 
> > Suppose an initiator wants to WRITE 8 LBA. For 512-byte formatted LUN,
> > 8 LBAs need a buffer of 4K bytes. For 4096-byte formatted LUN the
> > command would need 32K data buffer.
> > 
> > An Overflow happens if initiator treats 4Kn device like 512n one but
> > provides a buffer of 4K. i.e. to complete the WRITE target needs to
> > consume 28K more data, otherwise only 1 LBA would be written and the
> > rest 7 LBAs would have indeterminate content.
> > 
> > An Underflow happens if initiator confuses 512n device with 4Kn one and
> > provides a buffer of 32K, i.e. target doesn't utilize all buffer for the
> > command.
> 
> Thanks for the additional background information, this really helps. How
> about only rejecting SCSI commands for which the data buffer size is not
> a multiple of the block size? I'm concerned that flagging all SCSI
> commands that have a residue as invalid will break SCSI tape software.
> 

Hi Bart,

Could you please elaborate on how tape software will be broken?
I have no experience with tapes but I've looked into SSC-5 draft.

I haven't found anything concerning the writes but there are tape
variants of overflow/underflow for reads (G.3 General read rules) called
overlength and underlegth, respectively:

  If the read command requests fewer bytes than are available for
  transfer, then the read is an overlength read. If the read requests
  more bytes than are available, then the read is an underlength read.

  The amount of data returned is the smaller of the bytes available and
  the allocation length.

And the next paragraph defines cases where CHECK CONDITION should be
reported for such reads. However, GOOD status is also possible, the next
chapter of the annex (G.4 Examples from figure G.1 using variable-block
transfers and various SILI and BLOCK LENGTH settings) refines many cases
depending on SILI bit, whether block protection is enabled, if the
transfer is FIXED or variable-length and if BLOCK LENGTH is
zero/non-zero.

As far as I understand underlength and overlength are always suppressed
(status is GOOD) for devices where no "default" block size is defined
per SPC (7.5.7.1 General block descriptor format):

  For sequential access devices, a block length of zero indicates that the
  logical block size written to the medium is specified by the TRANSFER
  LENGTH field in the CDB (see SSC-4).

The cases are also summarized in annex D (D.3 Summary of length error
conditions on read type commands).

Note, that if we talk about SSC over FCP, then "9.4.2 FCP_DATA IUs for
read and write operations" does additionally apply. Perhaps a) from
"9.4.2 FCP_DATA IUs for read and write operations" works well for SSC:

  a) process the command normally except that data beyond the FCP_DL count
  shall not be requested or transferred;

The clause allows to accomodate variable-block tranfers from SSC.

So, what if we return CHECK CONDITION only for SBC WRITEs with
residuals?  Then it has no impact on SSC and other device types. In
future, we might also add a patch that would fail SBC READs with
residuals for sake of consistency. That behaviour would be beneficial
for SBC devices as no host could corrupt data or itself by forming,
requesting invalid data buffer.

Thanks,
Roman

Bodo Stroesser Oct. 26, 2020, 5:33 p.m. UTC | #5

Am 25.10.20 um 02:25 schrieb Bart Van Assche:
> On 10/24/20 5:13 AM, Roman Bolshakov wrote:
>> iSCSI doesn't specify a specific code but mentions a possibility of CHECK
>> CONDITION for residuals (11.4.5.1.  Field Semantics):
>>
>>    Targets may set the residual count, and initiators may use it when the
>>    response code is Command Completed at Target (even if the status
>>    returned is not GOOD).
> 
> My interpretation of the above text is that it neither allows nor
> requires to change the status GOOD into something else if there is a
> residue.
> 
>>> Additionally, what benefits does it provide to report a CHECK CONDITION
>>> upon residual overflow?
>>
>> Typical use case for CHECK CONDITION in case of Underflow/Overflow is
>> extra robustness against buggy initiators [1][2]. Failing both READ and
>> WRITE is the most solid approach in that sense [3][4][5] as it prevents
>> data corruption at all costs.
>>
>> Suppose an initiator wants to WRITE 8 LBA. For 512-byte formatted LUN,
>> 8 LBAs need a buffer of 4K bytes. For 4096-byte formatted LUN the
>> command would need 32K data buffer.
>>
>> An Overflow happens if initiator treats 4Kn device like 512n one but
>> provides a buffer of 4K. i.e. to complete the WRITE target needs to
>> consume 28K more data, otherwise only 1 LBA would be written and the
>> rest 7 LBAs would have indeterminate content.
>>
>> An Underflow happens if initiator confuses 512n device with 4Kn one and
>> provides a buffer of 32K, i.e. target doesn't utilize all buffer for the
>> command.
> 
> Thanks for the additional background information, this really helps. How
> about only rejecting SCSI commands for which the data buffer size is not
> a multiple of the block size? I'm concerned that flagging all SCSI
> commands that have a residue as invalid will break SCSI tape software.

AFAICS, there is no risk to break tape handling. target_cmd_size_check() is mainly used by sbc_parse_cdb(),
while passthrough_parse_cdb() optionally calls it for PERSISTENT_RESERVFE_IN/_OUT and RESERVE(_10)/RELEASE(_10) only.

sbc_parse_cdb is not usable for tape devices anyway, since CDB 'length' field in READ/WRITE for SSC devices needs special processing.
Depending on current state of the device, length 1 can have the meaning 1 byte or 1 times the optionally set fixed block size.
So the only way to set up a tape target LUN is to use tcmu or pscsi.

> 
> Thanks,
> 
> Bart.
>

Bart Van Assche Oct. 27, 2020, 2:42 a.m. UTC | #6

On 10/26/20 6:12 AM, Roman Bolshakov wrote:
> Could you please elaborate on how tape software will be broken?
> I have no experience with tapes but I've looked into SSC-5 draft.
> 
> I haven't found anything concerning the writes but there are tape
> variants of overflow/underflow for reads (G.3 General read rules) called
> overlength and underlegth, respectively:
> 
>   If the read command requests fewer bytes than are available for
>   transfer, then the read is an overlength read. If the read requests
>   more bytes than are available, then the read is an underlength read.
> 
>   The amount of data returned is the smaller of the bytes available and
>   the allocation length.
> 
> And the next paragraph defines cases where CHECK CONDITION should be
> reported for such reads. However, GOOD status is also possible, the next
> chapter of the annex (G.4 Examples from figure G.1 using variable-block
> transfers and various SILI and BLOCK LENGTH settings) refines many cases
> depending on SILI bit, whether block protection is enabled, if the
> transfer is FIXED or variable-length and if BLOCK LENGTH is
> zero/non-zero.
> 
> As far as I understand underlength and overlength are always suppressed
> (status is GOOD) for devices where no "default" block size is defined
> per SPC (7.5.7.1 General block descriptor format):
> 
>   For sequential access devices, a block length of zero indicates that the
>   logical block size written to the medium is specified by the TRANSFER
>   LENGTH field in the CDB (see SSC-4).
> 
> The cases are also summarized in annex D (D.3 Summary of length error
> conditions on read type commands).
> 
> Note, that if we talk about SSC over FCP, then "9.4.2 FCP_DATA IUs for
> read and write operations" does additionally apply. Perhaps a) from
> "9.4.2 FCP_DATA IUs for read and write operations" works well for SSC:
> 
>   a) process the command normally except that data beyond the FCP_DL count
>   shall not be requested or transferred;
> 
> The clause allows to accomodate variable-block tranfers from SSC.
> 
> So, what if we return CHECK CONDITION only for SBC WRITEs with
> residuals?  Then it has no impact on SSC and other device types. In
> future, we might also add a patch that would fail SBC READs with
> residuals for sake of consistency. That behaviour would be beneficial
> for SBC devices as no host could corrupt data or itself by forming,
> requesting invalid data buffer.

Hi Roman,

Maybe I'm overly concerned. I do not know for sure which applications
rely on the current behavior of residual handling. All I know about
these applications is based on what others wrote about these
applications. An example from
https://www.t10.org/pipermail/t10/2003-November/009317.html: "We have
customers who also use overlength and underlength transfers as a normal
mode of operation."

An additional question is what behavior other operating systems than
Linux expect? There are probably setups in which another operating
system than Linux communicates with a LIO SCSI target?

Thanks,

Bart.

Roman Bolshakov Oct. 27, 2020, 11:46 p.m. UTC | #7

On Mon, Oct 26, 2020 at 07:42:55PM -0700, Bart Van Assche wrote:
> On 10/26/20 6:12 AM, Roman Bolshakov wrote:
> > Note, that if we talk about SSC over FCP, then "9.4.2 FCP_DATA IUs for
> > read and write operations" does additionally apply. Perhaps a) from
> > "9.4.2 FCP_DATA IUs for read and write operations" works well for SSC:
> > 
> >   a) process the command normally except that data beyond the FCP_DL count
> >   shall not be requested or transferred;
> > 
> > The clause allows to accomodate variable-block tranfers from SSC.
> > 
> > So, what if we return CHECK CONDITION only for SBC WRITEs with
> > residuals?  Then it has no impact on SSC and other device types. In
> > future, we might also add a patch that would fail SBC READs with
> > residuals for sake of consistency. That behaviour would be beneficial
> > for SBC devices as no host could corrupt data or itself by forming,
> > requesting invalid data buffer.
> 
> Maybe I'm overly concerned. I do not know for sure which applications
> rely on the current behavior of residual handling. All I know about
> these applications is based on what others wrote about these
> applications. An example from
> https://www.t10.org/pipermail/t10/2003-November/009317.html: "We have
> customers who also use overlength and underlength transfers as a normal
> mode of operation."
> 

Hi Bart,

Thanks for raising the point about overlength/underlength. If you wish
we can add an extra check that fails DMA_TO_DEVICE && DATA with
residuals only for SBC devices but note that before the series,
underflow/overflow for WRITE didn't return GOOD status. The particular
patch only changes sense code to more meaningful from the former INVALID
FIELD IN CDB.

Theoretically, it could be good to have a configurable switch how LIO
handles overflows/underflows for a LUN. Then it'd be possible to
configure desired behaviour on a per-LUN basis. But there should be a
clear need & demand for the feature to avoid maintenance of dead code.

> An additional question is what behavior other operating systems than
> Linux expect? There are probably setups in which another operating
> system than Linux communicates with a LIO SCSI target?
> 

TBH I don't know any hosts that do SBC WRITE with residuals as normal
course of operation. They wouldn't be able to work with LIO because it
never returns GOOD status on WRITE with residuals. I can send an update
later if the series works fine with modern hosts (~1 month, after a few
cycles of system testing).

Fun fact, ~60 years ago WRITE overflows were used to achieve behaviour
similar to disk zeroing/WRITE SAME [1].

1. https://mailarchive.ietf.org/arch/msg/ips/135ycRlgwUg1sb3gRrUQ3-lSXg0/

Thanks,
Roman

Bart Van Assche Oct. 28, 2020, 3:41 a.m. UTC | #8

On 10/27/20 4:46 PM, Roman Bolshakov wrote:
> Thanks for raising the point about overlength/underlength. If you wish
> we can add an extra check that fails DMA_TO_DEVICE && DATA with
> residuals only for SBC devices but note that before the series,
> underflow/overflow for WRITE didn't return GOOD status. The particular
> patch only changes sense code to more meaningful from the former INVALID
> FIELD IN CDB.
> 
> Theoretically, it could be good to have a configurable switch how LIO
> handles overflows/underflows for a LUN. Then it'd be possible to
> configure desired behaviour on a per-LUN basis. But there should be a
> clear need & demand for the feature to avoid maintenance of dead code.
>>> An additional question is what behavior other operating systems than
>> Linux expect? There are probably setups in which another operating
>> system than Linux communicates with a LIO SCSI target?
> 
> TBH I don't know any hosts that do SBC WRITE with residuals as normal
> course of operation. They wouldn't be able to work with LIO because it
> never returns GOOD status on WRITE with residuals. I can send an update
> later if the series works fine with modern hosts (~1 month, after a few
> cycles of system testing).

Hi Roman,

I'm not sure adding a new kernel switch is the best choice. That would
be an additional parameter users have to know about and have to learn
how to use.

Bodo seems to be in favor of this patch series. Are there other people
who want to share their opinion about this patch series?

Thanks,

Bart.

Anastasia Kovaleva Nov. 10, 2020, 4:57 p.m. UTC | #9

On 28 Oct 2020, at 06:41, Bart Van Assche <bvanassche@acm.org> wrote:
> On 10/27/20 4:46 PM, Roman Bolshakov wrote:
>> Thanks for raising the point about overlength/underlength. If you wish
>> we can add an extra check that fails DMA_TO_DEVICE && DATA with
>> residuals only for SBC devices but note that before the series,
>> underflow/overflow for WRITE didn't return GOOD status. The particular
>> patch only changes sense code to more meaningful from the former INVALID
>> FIELD IN CDB.
>> 
>> Theoretically, it could be good to have a configurable switch how LIO
>> handles overflows/underflows for a LUN. Then it'd be possible to
>> configure desired behaviour on a per-LUN basis. But there should be a
>> clear need & demand for the feature to avoid maintenance of dead code.
>>>> An additional question is what behavior other operating systems than
>>> Linux expect? There are probably setups in which another operating
>>> system than Linux communicates with a LIO SCSI target?
>> 
>> TBH I don't know any hosts that do SBC WRITE with residuals as normal
>> course of operation. They wouldn't be able to work with LIO because it
>> never returns GOOD status on WRITE with residuals. I can send an update
>> later if the series works fine with modern hosts (~1 month, after a few
>> cycles of system testing).
> 
> Hi Roman,
> 
> I'm not sure adding a new kernel switch is the best choice. That would
> be an additional parameter users have to know about and have to learn
> how to use.
> 
> Bodo seems to be in favor of this patch series. Are there other people
> who want to share their opinion about this patch series?

Hi Bart,

Is this patch series good enough to be accepted in this form, without
the kernel switch? As far as i can see, no one has shared their opinion
about this changes. 

Thanks,

Anastasia

Bart Van Assche Nov. 15, 2020, 3:17 a.m. UTC | #10

On 11/10/20 8:57 AM, Anastasia Kovaleva wrote:
> Is this patch series good enough to be accepted in this form, without
> the kernel switch? As far as i can see, no one has shared their opinion
> about this changes. 

Hi Anastasia,

I will leave it to others to review this patch series.

Thanks,

Bart.

[3/3] scsi: target: core: Change ASCQ for residual write

Commit Message

Comments

Patch