diff mbox

[PATCH-v2,2/2] target/iblock: Use -EAGAIN/-ENOMEM to propigate SAM BUSY/TASK_SET_FULL

Message ID 1457161634-15756-2-git-send-email-nab@daterainc.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Nicholas A. Bellinger March 5, 2016, 7:07 a.m. UTC
From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch updates target/iblock backend driver code to
check for bio completion status of -EAGAIN / -ENOMEM
provided by raw block drivers and scsi devices, in order
to generate the following SCSI status to initiators:

  - SAM_STAT_BUSY
  - SAM_STAT_TASK_SET_FULL

Note these two SAM status codes are useful to signal to
Linux SCSI host side that se_cmd should be retried
again, or with TASK_SET_FULL case to attempt to lower
our internal host LLD queue_depth and scsi_cmnd retry.

It also updates target_complete_cmd() to parse status
when signalling success to target_completion_wq.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Andy Grover <agrover@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/target/target_core_iblock.c    | 38 +++++++++++++++++++++++++++-------
 drivers/target/target_core_iblock.h    |  1 +
 drivers/target/target_core_transport.c | 13 ++++++++++--
 3 files changed, 43 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig March 5, 2016, 9:01 p.m. UTC | #1
> -	if (atomic_read(&ibr->ib_bio_err_cnt))
> -		status = SAM_STAT_CHECK_CONDITION;
> -	else
> +	/*
> +	 * Propigate use these two bio completion values from raw block
> +	 * drivers to signal up BUSY and TASK_SET_FULL status to the
> +	 * host side initiator.  The latter for Linux/iSCSI initiators
> +	 * means the Linux/SCSI LLD will begin to reduce it's internal
> +	 * per session queue_depth.
> +	 */
> +	if (atomic_read(&ibr->ib_bio_err_cnt)) {
> +		switch (ibr->ib_bio_retry) {
> +		case -EAGAIN:
> +			status = SAM_STAT_BUSY;
> +			break;
> +		case -ENOMEM:
> +			status = SAM_STAT_TASK_SET_FULL;
> +			break;
> +		default:
> +			status = SAM_STAT_CHECK_CONDITION;
> +			break;
> +		}
> +	} else {
>  		status = SAM_STAT_GOOD;
> +	}

I think you;d be much better off killing ib_bio_err_cnt and having
an ib_error that gets set to the last / most server error.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger March 5, 2016, 10:51 p.m. UTC | #2
On Sat, 2016-03-05 at 22:01 +0100, Christoph Hellwig wrote:
> > -	if (atomic_read(&ibr->ib_bio_err_cnt))
> > -		status = SAM_STAT_CHECK_CONDITION;
> > -	else
> > +	/*
> > +	 * Propigate use these two bio completion values from raw block
> > +	 * drivers to signal up BUSY and TASK_SET_FULL status to the
> > +	 * host side initiator.  The latter for Linux/iSCSI initiators
> > +	 * means the Linux/SCSI LLD will begin to reduce it's internal
> > +	 * per session queue_depth.
> > +	 */
> > +	if (atomic_read(&ibr->ib_bio_err_cnt)) {
> > +		switch (ibr->ib_bio_retry) {
> > +		case -EAGAIN:
> > +			status = SAM_STAT_BUSY;
> > +			break;
> > +		case -ENOMEM:
> > +			status = SAM_STAT_TASK_SET_FULL;
> > +			break;
> > +		default:
> > +			status = SAM_STAT_CHECK_CONDITION;
> > +			break;
> > +		}
> > +	} else {
> >  		status = SAM_STAT_GOOD;
> > +	}
> 
> I think you;d be much better off killing ib_bio_err_cnt and having
> an ib_error that gets set to the last / most server error.

That's what I was originally thinking too..

However, that means if one bio completed successfully and another got
-EAGAIN / -ENOMEM for the same se_cmd, IBLOCK would still complete
se_cmd with GOOD status.

I don't see how completing se_cmd with GOOD status, when one bio in the
set requested retry depending on completion order is a good idea.



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig March 6, 2016, 6:19 a.m. UTC | #3
On Sat, Mar 05, 2016 at 02:51:27PM -0800, Nicholas A. Bellinger wrote:
> > I think you;d be much better off killing ib_bio_err_cnt and having
> > an ib_error that gets set to the last / most server error.
> 
> That's what I was originally thinking too..
> 
> However, that means if one bio completed successfully and another got
> -EAGAIN / -ENOMEM for the same se_cmd, IBLOCK would still complete
> se_cmd with GOOD status.
> 
> I don't see how completing se_cmd with GOOD status, when one bio in the
> set requested retry depending on completion order is a good idea.

Oh, I took a look at the patch again and it looks bogus - block drivers
should never return EAGAIN or ENOMEM from ->bi_end_io.  Those are errors
that should happen before submission if at all.  Which driver ever returns
these?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger March 6, 2016, 9:55 p.m. UTC | #4
On Sat, 2016-03-05 at 22:19 -0800, Christoph Hellwig wrote:
> On Sat, Mar 05, 2016 at 02:51:27PM -0800, Nicholas A. Bellinger wrote:
> > > I think you;d be much better off killing ib_bio_err_cnt and having
> > > an ib_error that gets set to the last / most server error.
> > 
> > That's what I was originally thinking too..
> > 
> > However, that means if one bio completed successfully and another got
> > -EAGAIN / -ENOMEM for the same se_cmd, IBLOCK would still complete
> > se_cmd with GOOD status.
> > 
> > I don't see how completing se_cmd with GOOD status, when one bio in the
> > set requested retry depending on completion order is a good idea.
> 
> Oh, I took a look at the patch again and it looks bogus - block drivers
> should never return EAGAIN or ENOMEM from ->bi_end_io.  Those are errors
> that should happen before submission if at all.  Which driver ever returns
> these?

The intended use is for any make_request_fn() based driver that invokes
bio_endio() completion directly, and sets bi_error != 0 to signal
non GOOD status to target/iblock.

This is helpful when a block drivers knows it won't be able to complete
I/O before host dependent SCSI timeouts kick in, and it needs to signal
retry with BUSY status or in the case of Linux/SCSI with TASK_SET_FULL,
to reduce host queue_depth.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig March 7, 2016, 7:55 a.m. UTC | #5
On Sun, Mar 06, 2016 at 01:55:15PM -0800, Nicholas A. Bellinger wrote:
> The intended use is for any make_request_fn() based driver that invokes
> bio_endio() completion directly, and sets bi_error != 0 to signal
> non GOOD status to target/iblock.

But -EAGAIN and -ENOMEM are not valid drivers for bio_endio, and as far as I
can tell no driver every returns them.  So as-is this might be well intended
but either useless or broken.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger March 7, 2016, 8:03 a.m. UTC | #6
On Mon, 2016-03-07 at 08:55 +0100, Christoph Hellwig wrote:
> On Sun, Mar 06, 2016 at 01:55:15PM -0800, Nicholas A. Bellinger wrote:
> > The intended use is for any make_request_fn() based driver that invokes
> > bio_endio() completion directly, and sets bi_error != 0 to signal
> > non GOOD status to target/iblock.
> 
> But -EAGAIN and -ENOMEM are not valid drivers for bio_endio,

Why..?

>  and as far as I can tell no driver every returns them.

Correct, it's a new capability for make_request_fn() based drivers using
target/iblock export.

>  So as-is this might be well intended but either useless or broken.
> --

No, it useful for hosts that have an aggressive SCSI timeout, and it
works as expected with Linux/SCSI hosts that either retry on BUSY
status, or retry + reduce queue_depth on TASK_SET_FULL status.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig March 7, 2016, 4:18 p.m. UTC | #7
On Mon, Mar 07, 2016 at 12:03:56AM -0800, Nicholas A. Bellinger wrote:
> > On Sun, Mar 06, 2016 at 01:55:15PM -0800, Nicholas A. Bellinger wrote:
> > > The intended use is for any make_request_fn() based driver that invokes
> > > bio_endio() completion directly, and sets bi_error != 0 to signal
> > > non GOOD status to target/iblock.
> > 
> > But -EAGAIN and -ENOMEM are not valid drivers for bio_endio,
> 
> Why..?
> 
> >  and as far as I can tell no driver every returns them.
> 
> Correct, it's a new capability for make_request_fn() based drivers using
> target/iblock export.

Please only use it once drivers, filesystem and the block layer
can deal with it.

Right now -EAGAIN and -ENOMEM are treated as an unknown error by all
consumers of bios, so you will get a hard error and file system shutdown.

What is your driver that is going to return this, and how does it know
it's ?afe to do so?

> >  So as-is this might be well intended but either useless or broken.
> > --
> 
> No, it useful for hosts that have an aggressive SCSI timeout, and it
> works as expected with Linux/SCSI hosts that either retry on BUSY
> status, or retry + reduce queue_depth on TASK_SET_FULL status.

I explicitly wrote "as-is".  We need a way to opt into this behavior,
and we also somehow need to communicate the timeout.  I think allowing
timeouts for bios is useful, but it needs a lot more work than this
quick hack, which seems to still be missing a driver to actually
generate these errors.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley March 7, 2016, 4:40 p.m. UTC | #8
On Mon, 2016-03-07 at 00:03 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2016-03-07 at 08:55 +0100, Christoph Hellwig wrote:
> > On Sun, Mar 06, 2016 at 01:55:15PM -0800, Nicholas A. Bellinger

> >  So as-is this might be well intended but either useless or broken.
> > --
> 
> No, it useful for hosts that have an aggressive SCSI timeout, and it
> works as expected with Linux/SCSI hosts that either retry on BUSY
> status, or retry + reduce queue_depth on TASK_SET_FULL status.

I'm with Christoph on this: BUSY and QUEUE_FULL are already handled
generically in SCSI.  All drivers should use the generics: to handle
separately, the driver has to intercept the error code, which I thought
I checked that none did (although it was a while ago).  Additionally,
the timeout on these operations is retries * command timeout.  So for
the default 5 retries and 30 seconds, you actually get to tolerate
BUSY/QUEUE_FULL for 2.5 minutes before you get an error.  If this is a
problem, you can bump up the timer in

/sys/class/scsi_device/<id>/device/timeout

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger March 7, 2016, 10:39 p.m. UTC | #9
On Mon, 2016-03-07 at 17:18 +0100, Christoph Hellwig wrote:
> On Mon, Mar 07, 2016 at 12:03:56AM -0800, Nicholas A. Bellinger wrote:
> > > On Sun, Mar 06, 2016 at 01:55:15PM -0800, Nicholas A. Bellinger wrote:
> > > > The intended use is for any make_request_fn() based driver that invokes
> > > > bio_endio() completion directly, and sets bi_error != 0 to signal
> > > > non GOOD status to target/iblock.
> > > 
> > > But -EAGAIN and -ENOMEM are not valid drivers for bio_endio,
> > 
> > Why..?
> > 
> > >  and as far as I can tell no driver every returns them.
> > 
> > Correct, it's a new capability for make_request_fn() based drivers using
> > target/iblock export.
> 
> Please only use it once drivers, filesystem and the block layer
> can deal with it.
> 
> Right now -EAGAIN and -ENOMEM are treated as an unknown error by all
> consumers of bios, so you will get a hard error and file system shutdown.
> 

Yes, the driver needs a way to determine when a bio was submitted via
target/iblock, and it's completion consumer is capable of processing a
non-zero bi_error as retryable.

> What is your driver that is going to return this, and how does it know
> it's ?afe to do so?

I've been using this with an out-of-tree driver for a while now, but the
most obvious upstream candidate who can benefit from this is RBD.

> 
> > >  So as-is this might be well intended but either useless or broken.
> > > --
> > 
> > No, it useful for hosts that have an aggressive SCSI timeout, and it
> > works as expected with Linux/SCSI hosts that either retry on BUSY
> > status, or retry + reduce queue_depth on TASK_SET_FULL status.
> 
> I explicitly wrote "as-is".  We need a way to opt into this behavior,
> and we also somehow need to communicate the timeout.

What did you have in mind..?

>  I think allowing
> timeouts for bios is useful, but it needs a lot more work than this
> quick hack,

From the target side, it's not a quick hack.

These initial target/iblock changes for processing non-zero bi_error +
propagating up to target-core won't change regardless of how the
underlying driver determines if a completion consumer supports retryable
bio status, or not.

> which seems to still be missing a driver to actually
> generate these errors.

I'll include the BRD patch I've been using as the first user of this
code.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger March 7, 2016, 10:44 p.m. UTC | #10
On Mon, 2016-03-07 at 11:40 -0500, James Bottomley wrote:
> On Mon, 2016-03-07 at 00:03 -0800, Nicholas A. Bellinger wrote:
> > On Mon, 2016-03-07 at 08:55 +0100, Christoph Hellwig wrote:
> > > On Sun, Mar 06, 2016 at 01:55:15PM -0800, Nicholas A. Bellinger
> 
> > >  So as-is this might be well intended but either useless or broken.
> > > --
> > 
> > No, it useful for hosts that have an aggressive SCSI timeout, and it
> > works as expected with Linux/SCSI hosts that either retry on BUSY
> > status, or retry + reduce queue_depth on TASK_SET_FULL status.
> 
> I'm with Christoph on this: BUSY and QUEUE_FULL are already handled
> generically in SCSI.  All drivers should use the generics: to handle
> separately, the driver has to intercept the error code, which I thought
> I checked that none did (although it was a while ago).  Additionally,
> the timeout on these operations is retries * command timeout.  So for
> the default 5 retries and 30 seconds, you actually get to tolerate
> BUSY/QUEUE_FULL for 2.5 minutes before you get an error.  If this is a
> problem, you can bump up the timer in
> 
> /sys/class/scsi_device/<id>/device/timeout
> 

Yes, Linux/SCSI hosts have a sane default timeout + retries, and aren't
the ones who really need SAM_STAT_BUSY + SAM_STAT_TASK_SET_FULL
responses intermittently to avoid going nuts.

It's for ESX 5+ hosts, that with iscsi use a hard-coded (non user
configurable) timeout of 5 seconds, before attempting ABORT_TASK and
friends.  For FC, it's LLD dependent, and IIRC the default for qla2xxx
is 20 seconds.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig March 8, 2016, 7:04 a.m. UTC | #11
On Mon, Mar 07, 2016 at 02:39:29PM -0800, Nicholas A. Bellinger wrote:
> > I explicitly wrote "as-is".  We need a way to opt into this behavior,
> > and we also somehow need to communicate the timeout.
> 
> What did you have in mind..?

You need an interface to tell the driver that it can return a timeout
status, and preferably also set the actual timeout.   The obvious
candidate would be a new method on the queue to set a user timeout,
and if one is set we could get these errors back.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 026a758..ce754f1 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -282,11 +282,28 @@  static void iblock_complete_cmd(struct se_cmd *cmd)
 
 	if (!atomic_dec_and_test(&ibr->pending))
 		return;
-
-	if (atomic_read(&ibr->ib_bio_err_cnt))
-		status = SAM_STAT_CHECK_CONDITION;
-	else
+	/*
+	 * Propigate use these two bio completion values from raw block
+	 * drivers to signal up BUSY and TASK_SET_FULL status to the
+	 * host side initiator.  The latter for Linux/iSCSI initiators
+	 * means the Linux/SCSI LLD will begin to reduce it's internal
+	 * per session queue_depth.
+	 */
+	if (atomic_read(&ibr->ib_bio_err_cnt)) {
+		switch (ibr->ib_bio_retry) {
+		case -EAGAIN:
+			status = SAM_STAT_BUSY;
+			break;
+		case -ENOMEM:
+			status = SAM_STAT_TASK_SET_FULL;
+			break;
+		default:
+			status = SAM_STAT_CHECK_CONDITION;
+			break;
+		}
+	} else {
 		status = SAM_STAT_GOOD;
+	}
 
 	target_complete_cmd(cmd, status);
 	kfree(ibr);
@@ -298,7 +315,15 @@  static void iblock_bio_done(struct bio *bio)
 	struct iblock_req *ibr = cmd->priv;
 
 	if (bio->bi_error) {
-		pr_err("bio error: %p,  err: %d\n", bio, bio->bi_error);
+		pr_debug_ratelimited("test_bit(BIO_UPTODATE) failed for bio: %p,"
+			" err: %d\n", bio, bio->bi_error);
+		/*
+		 * Save the retryable status provided and translate into
+		 * SAM status in iblock_complete_cmd().
+		 */
+		if (bio->bi_error == -EAGAIN || bio->bi_error == -ENOMEM) {
+			ibr->ib_bio_retry = bio->bi_error;
+		}
 		/*
 		 * Bump the ib_bio_err_cnt and release bio.
 		 */
@@ -677,8 +702,7 @@  iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	struct scatterlist *sg;
 	u32 sg_num = sgl_nents;
 	unsigned bio_cnt;
-	int rw = 0;
-	int i;
+	int i, rw = 0;
 
 	if (data_direction == DMA_TO_DEVICE) {
 		struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
index 01c2afd..ff3cfdd 100644
--- a/drivers/target/target_core_iblock.h
+++ b/drivers/target/target_core_iblock.h
@@ -9,6 +9,7 @@ 
 struct iblock_req {
 	atomic_t pending;
 	atomic_t ib_bio_err_cnt;
+	int ib_bio_retry;
 } ____cacheline_aligned;
 
 #define IBDF_HAS_UDEV_PATH		0x01
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 784dd22..df01997 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -731,11 +731,20 @@  static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd)
 void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 {
 	struct se_device *dev = cmd->se_dev;
-	int success = scsi_status == GOOD;
+	int success;
 	unsigned long flags;
 
 	cmd->scsi_status = scsi_status;
-
+	switch (cmd->scsi_status) {
+	case SAM_STAT_GOOD:
+	case SAM_STAT_BUSY:
+	case SAM_STAT_TASK_SET_FULL:
+		success = 1;
+		break;
+	default:
+		success = 0;
+		break;
+	}
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
 	cmd->transport_state &= ~CMD_T_BUSY;