Avoid that ATA error handling hangs

Message ID	1519266202.16203.5.camel@wdc.com (mailing list archive)
State	Not Applicable
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Damien Le Moal <Damien.LeMoal@wdc.com> To: "jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>, Bart Van Assche <Bart.VanAssche@wdc.com>, "martin.petersen@oracle.com" <martin.petersen@oracle.com> CC: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, "hare@suse.com" <hare@suse.com>, "jthumshirn@suse.de" <jthumshirn@suse.de>, "ptikhomirov@virtuozzo.com" <ptikhomirov@virtuozzo.com>, "ncopa@alpinelinux.org" <ncopa@alpinelinux.org>, "stable@vger.kernel.org" <stable@vger.kernel.org> Subject: Re: [PATCH] Avoid that ATA error handling hangs Thread-Topic: [PATCH] Avoid that ATA error handling hangs Thread-Index: AQHTqziu3psYMuehqk2Tf6b81oIkFKOvscYA Date: Thu, 22 Feb 2018 02:23:25 +0000 Message-ID: <1519266202.16203.5.camel@wdc.com> References: <20180221172316.11884-1-bart.vanassche@wdc.com> In-Reply-To: <20180221172316.11884-1-bart.vanassche@wdc.com> Accept-Language: en-US Content-Language: en-US wdcipoutbound: EOP-TRUE spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="utf-8" Content-ID: <2814518F2B73EB418908138337C4D526@namprd04.prod.outlook.com> Content-Transfer-Encoding: base64 MIME-Version: 1.0 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk

Damien Le Moal Feb. 22, 2018, 2:23 a.m. UTC

Bart,

On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:
> Avoid that the recently introduced call_rcu() call in the SCSI core

> causes the RCU core to complain about double call_rcu() calls.

> 

> Reported-by: Natanael Copa <ncopa@alpinelinux.org>

> Reported-by: Damien Le Moal <damien.lemoal@wdc.com>

> References: https://bugzilla.kernel.org/show_bug.cgi?id=198861

> Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets

> woken up")

> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>

> Cc: Natanael Copa <ncopa@alpinelinux.org>

> Cc: Damien Le Moal <damien.lemoal@wdc.com>

> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

> Cc: Hannes Reinecke <hare@suse.com>

> Cc: Johannes Thumshirn <jthumshirn@suse.de>

> Cc: <stable@vger.kernel.org>

> ---

>  drivers/scsi/scsi_error.c | 5 +++--

>  include/scsi/scsi_cmnd.h  | 3 +++

>  include/scsi/scsi_host.h  | 2 --

>  3 files changed, 6 insertions(+), 4 deletions(-)

> 

> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c

> index ae325985eac1..ac9ce099530e 100644

> --- a/drivers/scsi/scsi_error.c

> +++ b/drivers/scsi/scsi_error.c

> @@ -229,7 +229,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)

>  

>  static void scsi_eh_inc_host_failed(struct rcu_head *head)

>  {

> -	struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu);

> +	struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);

> +	struct Scsi_Host *shost = scmd->device->host;

>  	unsigned long flags;

>  

>  	spin_lock_irqsave(shost->host_lock, flags);

> @@ -265,7 +266,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)

>  	 * Ensure that all tasks observe the host state change before the

>  	 * host_failed change.

>  	 */

> -	call_rcu(&shost->rcu, scsi_eh_inc_host_failed);

> +	call_rcu(&scmd->rcu, scsi_eh_inc_host_failed);

>  }

>  

>  /**

> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h

> index d8d4a902a88d..2280b2351739 100644

> --- a/include/scsi/scsi_cmnd.h

> +++ b/include/scsi/scsi_cmnd.h

> @@ -68,6 +68,9 @@ struct scsi_cmnd {

>  	struct list_head list;  /* scsi_cmnd participates in queue lists */

>  	struct list_head eh_entry; /* entry for the host eh_cmd_q */

>  	struct delayed_work abort_work;

> +

> +	struct rcu_head rcu;

> +

>  	int eh_eflags;		/* Used by error handlr */

>  

>  	/*

> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h

> index 1a1df0d21ee3..a8b7bf879ced 100644

> --- a/include/scsi/scsi_host.h

> +++ b/include/scsi/scsi_host.h

> @@ -571,8 +571,6 @@ struct Scsi_Host {

>  		struct blk_mq_tag_set	tag_set;

>  	};

>  

> -	struct rcu_head rcu;

> -

>  	atomic_t host_busy;		   /* commands actually active

> on low-level */

>  	atomic_t host_blocked;


This does not compile. You missed the init_rcu_head() and destroy_rcu_head()
changes. Adding this:


And it compiles.

Testing this, the rcu hang is now gone.

However, the behavior of the error recovery  is still different from what I
see in 4.15 and 4.14. For my test case, an unaligned write to a sequential
zone on a ZAC drive connected to an AHCI port, the report zone issued during
the disk revalidation after the write error fails with a timeout, which causes
capacity change to 0, port reset and recovery again. Eventually, everything
comes back up OK, but it takes some time.

I am investigating to make sure I am not hitting a device FW bug to confirm if
this is a kernel problem.

Best regards.

-- 
Damien Le Moal
Western Digital

Bart Van Assche Feb. 22, 2018, 3:53 a.m. UTC | #1

On Thu, 2018-02-22 at 02:23 +0000, Damien Le Moal wrote:
> On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:

> > [ ... ]

> This does not compile.


This patch depends on another patch that is not yet in Martin's tree. See also
https://marc.info/?l=linux-scsi&m=151675130615597. I should have mentioned this
in the patch description.

> Testing this, the rcu hang is now gone.


Thanks for the testing :-)

> However, the behavior of the error recovery  is still different from what I

> see in 4.15 and 4.14. For my test case, an unaligned write to a sequential

> zone on a ZAC drive connected to an AHCI port, the report zone issued during

> the disk revalidation after the write error fails with a timeout, which causes

> capacity change to 0, port reset and recovery again. Eventually, everything

> comes back up OK, but it takes some time.

> 

> I am investigating to make sure I am not hitting a device FW bug to confirm if

> this is a kernel problem.


This patch was tested with the SRP protocol. I'm not an ATA expert but I hope
that someone who is more familiar with ATA than I can chime in.

Bart.

Martin K. Petersen Feb. 22, 2018, 4:06 a.m. UTC | #2

Bart,

> This patch depends on another patch that is not yet in Martin's
> tree.

Nobody reviewed it. Same goes for your queuecommand tweak :/

I'm pretty picky about getting at least one other person than me to look
over core changes.

Reviewers: Fame and fortune awaits!

Damien Le Moal Feb. 22, 2018, 4:19 a.m. UTC | #3

Martin,

On 2/22/18 13:06, Martin K. Petersen wrote:
> 

> Bart,

> 

>> This patch depends on another patch that is not yet in Martin's

>> tree.

> 

> Nobody reviewed it. Same goes for your queuecommand tweak :/

> 

> I'm pretty picky about getting at least one other person than me to look

> over core changes.

> 

> Reviewers: Fame and fortune awaits!


It looks OK to me, at least if CONFIG_DEBUG_OBJECTS_RCU_HEAD is turned
off since the init_rcu_head() and destroy_rcu_head() functions only care
about that.

With rcu head debug turned on, I am not so sure. The object debug code
will have references to unused rcu heads left behind for unused scsi
cmds, which are indeed dynamically allocated for a device together with
requests when the device is initialized, but they are never freed until
the device is removed. So "dynamically allocated object", yes, but that
does not match the use of the object done in scsi (i.e. alloc before use
+ free after use).

Because of this doubt, No reviewed-by from me. I will miss fame and
fortune this time :)

Best regards.

-- 
Damien Le Moal,
Western Digital

Bart Van Assche Feb. 22, 2018, 4:39 a.m. UTC | #4

On Thu, 2018-02-22 at 04:19 +0000, Damien Le Moal wrote:
> It looks OK to me, at least if CONFIG_DEBUG_OBJECTS_RCU_HEAD is turned

> off since the init_rcu_head() and destroy_rcu_head() functions only care

> about that.

> 

> With rcu head debug turned on, I am not so sure. The object debug code

> will have references to unused rcu heads left behind for unused scsi

> cmds, which are indeed dynamically allocated for a device together with

> requests when the device is initialized, but they are never freed until

> the device is removed. So "dynamically allocated object", yes, but that

> does not match the use of the object done in scsi (i.e. alloc before use

> + free after use).


Hello Damien,

Please have a look at the following part of
Documentation/RCU/Design/Requirements/Requirements.html:

	Similarly, statically allocated non-stack <tt>rcu_head</tt>
	structures must be initialized with <tt>init_rcu_head()</tt>
	and cleaned up with <tt>destroy_rcu_head()</tt>.

Thanks,

Bart.

Bart Van Assche Feb. 22, 2018, 4:39 a.m. UTC | #5

On Thu, 2018-02-22 at 04:39 +0000, Bart Van Assche wrote:
> On Thu, 2018-02-22 at 04:19 +0000, Damien Le Moal wrote:

> > It looks OK to me, at least if CONFIG_DEBUG_OBJECTS_RCU_HEAD is turned

> > off since the init_rcu_head() and destroy_rcu_head() functions only care

> > about that.

> > 

> > With rcu head debug turned on, I am not so sure. The object debug code

> > will have references to unused rcu heads left behind for unused scsi

> > cmds, which are indeed dynamically allocated for a device together with

> > requests when the device is initialized, but they are never freed until

> > the device is removed. So "dynamically allocated object", yes, but that

> > does not match the use of the object done in scsi (i.e. alloc before use

> > + free after use).

> 

> Hello Damien,

> 

> Please have a look at the following part of

> Documentation/RCU/Design/Requirements/Requirements.html:

> 

> 	Similarly, statically allocated non-stack <tt>rcu_head</tt>

> 	structures must be initialized with <tt>init_rcu_head()</tt>

> 	and cleaned up with <tt>destroy_rcu_head()</tt>.


And from <linux/rcupdate.h>:

 * rcu_head structures
 * allocated dynamically in the heap or defined statically don't need any
 * initialization.

Bart.

Damien Le Moal Feb. 22, 2018, 4:55 a.m. UTC | #6

Bart,

On 2/22/18 13:39, Bart Van Assche wrote:
> On Thu, 2018-02-22 at 04:39 +0000, Bart Van Assche wrote:

>> On Thu, 2018-02-22 at 04:19 +0000, Damien Le Moal wrote:

>>> It looks OK to me, at least if CONFIG_DEBUG_OBJECTS_RCU_HEAD is turned

>>> off since the init_rcu_head() and destroy_rcu_head() functions only care

>>> about that.

>>>

>>> With rcu head debug turned on, I am not so sure. The object debug code

>>> will have references to unused rcu heads left behind for unused scsi

>>> cmds, which are indeed dynamically allocated for a device together with

>>> requests when the device is initialized, but they are never freed until

>>> the device is removed. So "dynamically allocated object", yes, but that

>>> does not match the use of the object done in scsi (i.e. alloc before use

>>> + free after use).

>>

>> Hello Damien,

>>

>> Please have a look at the following part of

>> Documentation/RCU/Design/Requirements/Requirements.html:

>>

>> 	Similarly, statically allocated non-stack <tt>rcu_head</tt>

>> 	structures must be initialized with <tt>init_rcu_head()</tt>

>> 	and cleaned up with <tt>destroy_rcu_head()</tt>.

> 

> And from <linux/rcupdate.h>:

> 

>  * rcu_head structures

>  * allocated dynamically in the heap or defined statically don't need any

>  * initialization.

Yes, I understood that. But my guess is this comment implies that the
objects are freed after use, which clears any reference to it from the
memory object debug hash automatically. That is not the case with scsi
command structs: there are allocated dynamically with the device, but
they are not freed after use. And here by use, I mean the normal use
cycle of a request+cmd: get unused request -> issue command -> command
completed -> return request in free state.
That is not an alloc+free cycle, so the memory object debug code will
never be involved and the scsi command rcu head never destroyed.

Considering that, I am not sure if it is really safe to remove the
init/destroy rcu head functions. At the very least, that will make the
memory object debug table grow larger with the first use of any scsi
command.

Cheers.

-- 
Damien Le Moal,
Western Digital

Damien Le Moal Feb. 22, 2018, 5:08 a.m. UTC | #7

Bart,

On 2/22/18 12:53, Bart Van Assche wrote:
> On Thu, 2018-02-22 at 02:23 +0000, Damien Le Moal wrote:

>> On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:

>>> [ ... ]

>> This does not compile.

> 

> This patch depends on another patch that is not yet in Martin's tree. See also

> https://marc.info/?l=linux-scsi&m=151675130615597. I should have mentioned this

> in the patch description.


OK. Got it.

>> Testing this, the rcu hang is now gone.

> 

> Thanks for the testing :-)

> 

>> However, the behavior of the error recovery  is still different from what I

>> see in 4.15 and 4.14. For my test case, an unaligned write to a sequential

>> zone on a ZAC drive connected to an AHCI port, the report zone issued during

>> the disk revalidation after the write error fails with a timeout, which causes

>> capacity change to 0, port reset and recovery again. Eventually, everything

>> comes back up OK, but it takes some time.

>>

>> I am investigating to make sure I am not hitting a device FW bug to confirm if

>> this is a kernel problem.

> 

> This patch was tested with the SRP protocol. I'm not an ATA expert but I hope

> that someone who is more familiar with ATA than I can chime in.


Well, I would expect the retry of the unaligned write to fail
immediately while the report zones is still on-going. These are both NCQ
commands, so that would result in the queue to be aborted and eh to see
the failed write retry and the aborted report zones, which should be
restarted right away. But report zones timeout after 30sec...

Example with CONFIG_DEBUG_OBJECTS_RCU_HEAD turned on (check time stamps):

[   43.978457] ata6.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
[   43.985239] ata6.00: irq_stat 0x40000008
[   43.989272] ata6.00: failed command: WRITE FPDMA QUEUED
[   43.994657] ata6.00: cmd 61/08:28:08:00:60/00:00:10:00:00/40 tag 5
ncq dma 4096 out
[   43.994657]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[   44.011157] ata6.00: status: { DRDY SENSE ERR }
[   44.015825] ata6.00: error: { ABRT }
[   44.268565] ata6.00: configured for UDMA/133
[   44.273208] ata6: EH complete
[   44.303440] ata6.00: exception Emask 0x0 SAct 0x10020 SErr 0x0 action 0x0
[   44.310743] ata6.00: irq_stat 0x40000008
[   44.314989] ata6.00: failed command: WRITE FPDMA QUEUED
[   44.320634] ata6.00: cmd 61/08:28:08:00:60/00:00:10:00:00/40 tag 5
ncq dma 4096 out
[   44.320634]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[   44.337787] ata6.00: status: { DRDY SENSE ERR }
[   44.342576] ata6.00: error: { ABRT }
[   44.374416] ata6.00: configured for UDMA/133
[   44.379094] ata6: EH complete
[   74.638316] ata6.00: exception Emask 0x1 SAct 0x110 SErr 0x0 action
0x6 frozen
[   74.645999] ata6.00: irq_stat 0x40000008
[   74.650001] ata6.00: failed command: RECEIVE FPDMA QUEUED
[   74.655544] ata6.00: cmd 65/00:20:00:00:00/01:02:00:00:00/40 tag 4
ncq dma 131072 in
[   74.655544]          res 40/00:00:00:00:38/00:00:3f:06:00/40 Emask
0x5 (timeout)
[   74.671001] ata6.00: status: { DRDY }
[   74.674768] ata6.00: failed command: WRITE FPDMA QUEUED
[   74.680140] ata6.00: cmd 61/08:40:08:00:60/00:00:10:00:00/40 tag 8
ncq dma 4096 out
[   74.680140]          res 40/00:00:08:00:60/00:00:10:00:00/40 Emask
0x1 (device error)
[   74.695966] ata6.00: status: { DRDY }
[   74.699730] ata6: hard resetting link
[   75.013606] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   75.067939] ata6.00: configured for UDMA/133
[   75.072465] ata6.00: device reported invalid CHS sector 0
[   75.078167] ata6: EH complete
[   75.078188] sd 5:0:0:0: [sdd] REPORT ZONES lba 0 failed with 0/8
[   75.087601] sd 5:0:0:0: [sdd] 0 512-byte logical blocks: (0 B/0 B)
[   75.093903] sd 5:0:0:0: [sdd] 4096-byte physical blocks
[   75.099276] sdd: detected capacity change from 14000519643136 to 0
[  106.189951] ata6.00: exception Emask 0x1 SAct 0x400010 SErr 0x0
action 0x6 frozen
[  106.197683] ata6.00: irq_stat 0x40000008
[  106.201730] ata6.00: failed command: WRITE FPDMA QUEUED
[  106.207096] ata6.00: cmd 61/08:20:08:00:60/00:00:10:00:00/40 tag 4
ncq dma 4096 out
[  106.207096]          res 40/00:00:08:00:60/00:00:10:00:00/40 Emask
0x1 (device error)
[  106.222930] ata6.00: status: { DRDY }
[  106.226690] ata6.00: failed command: RECEIVE FPDMA QUEUED
[  106.232235] ata6.00: cmd 65/00:b0:00:00:00/01:02:00:00:00/40 tag 22
ncq dma 131072 in
[  106.232235]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask
0x5 (timeout)
[  106.247834] ata6.00: status: { DRDY }
[  106.251641] ata6: hard resetting link
[  106.565274] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  106.600020] ata6.00: configured for UDMA/133
[  106.604501] ata6: EH complete
[  106.604526] sd 5:0:0:0: [sdd] REPORT ZONES lba 0 failed with 0/8
[  106.607669] print_req_error: I/O error, dev sdd, sector 274726920
[  107.037705] sdd: detected capacity change from 0 to 14000519643136
[  110.117135] sd 5:0:0:0: [sdd] 27344764928 512-byte logical blocks:
(14.0 TB/12.7 TiB)
[  110.125441] sd 5:0:0:0: [sdd] 4096-byte physical blocks
[  110.130782] sd 5:0:0:0: [sdd] 52156 zones of 524288 logical blocks


As can be seen at the end of the log, the disk comes back up and the
write failure reported back to the application (dd in this case).
Re-running dd again result in the same behavior once more. But
surprisingly, running it a third time, I get directly a scsi error (no
ata error message). No report zones failure.

[  386.748116] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  386.759463] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
[current]
[  386.767276] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
[  386.774805] sd 5:0:0:0: [sdd] tag#26 CDB: Write(16) 8a 00 00 00 00 00
10 60 00 08 00 00 00 08 00 00
[  386.784700] print_req_error: I/O error, dev sdd, sector 274726920

Which look like the result of a previous command since nothing seems to
have been sent to libata... (the dd call is repeated exactly every time
with the same target sector that is after the zone write pointer).

Finally, running a fourth time, dd simply hangs with nothing happening,
and dd cannot be killed. Reboot to recover.

Now, the same test with CONFIG_DEBUG_OBJECTS_RCU_HEAD turned off, I get
this:

[  463.680615] ata6.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
[  463.687884] ata6.00: irq_stat 0x40000008
[  463.691904] ata6.00: failed command: WRITE FPDMA QUEUED
[  463.697284] ata6.00: cmd 61/08:90:08:00:60/00:00:10:00:00/40 tag 18
ncq dma 4096 out
[  463.697284]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[  463.713798] ata6.00: status: { DRDY SENSE ERR }
[  463.718462] ata6.00: error: { ABRT }
[  463.912968] ata6.00: configured for UDMA/133
[  463.917706] ata6: EH complete
[  463.944550] ata6.00: exception Emask 0x0 SAct 0x800010 SErr 0x0
action 0x0
[  463.951962] ata6.00: irq_stat 0x40000008
[  463.956293] ata6.00: failed command: WRITE FPDMA QUEUED
[  463.961963] ata6.00: cmd 61/08:b8:08:00:60/00:00:10:00:00/40 tag 23
ncq dma 4096 out
[  463.961963]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[  463.979399] ata6.00: status: { DRDY SENSE ERR }
[  463.984336] ata6.00: error: { ABRT }
[  464.016416] ata6.00: configured for UDMA/133
[  464.020894] ata6: EH complete
[  464.050581] ata6.00: exception Emask 0x0 SAct 0x3000 SErr 0x0 action 0x0
[  464.057602] ata6.00: irq_stat 0x40000008
[  464.061761] ata6.00: failed command: WRITE FPDMA QUEUED
[  464.067318] ata6.00: cmd 61/08:60:08:00:60/00:00:10:00:00/40 tag 12
ncq dma 4096 out
[  464.067318]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[  464.084371] ata6.00: status: { DRDY SENSE ERR }
[  464.089324] ata6.00: error: { ABRT }
[  464.121189] ata6.00: configured for UDMA/133
[  464.125640] ata6: EH complete
[  464.363596] ata6.00: exception Emask 0x0 SAct 0x800400 SErr 0x0
action 0x0
[  464.371012] ata6.00: irq_stat 0x40000008
[  464.375257] ata6.00: failed command: WRITE FPDMA QUEUED
[  464.380937] ata6.00: cmd 61/08:50:08:00:60/00:00:10:00:00/40 tag 10
ncq dma 4096 out
[  464.380937]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[  464.398337] ata6.00: status: { DRDY SENSE ERR }
[  464.403186] ata6.00: error: { ABRT }
[  464.495690] ata6.00: configured for UDMA/133
[  464.500367] ata6: EH complete
[  464.739616] ata6.00: exception Emask 0x0 SAct 0x4800 SErr 0x0 action 0x0
[  464.746735] ata6.00: irq_stat 0x40000008
[  464.750729] ata6.00: failed command: WRITE FPDMA QUEUED
[  464.756095] ata6.00: cmd 61/08:70:08:00:60/00:00:10:00:00/40 tag 14
ncq dma 4096 out
[  464.756095]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[  464.772637] ata6.00: status: { DRDY SENSE ERR }
[  464.777294] ata6.00: error: { ABRT }
[  464.809308] ata6.00: configured for UDMA/133
[  464.813726] ata6: EH complete
[  464.843549] ata6.00: exception Emask 0x0 SAct 0xc000 SErr 0x0 action 0x0
[  464.850525] ata6.00: irq_stat 0x40000008
[  464.854622] ata6.00: failed command: WRITE FPDMA QUEUED
[  464.860050] ata6.00: cmd 61/08:70:08:00:60/00:00:10:00:00/40 tag 14
ncq dma 4096 out
[  464.860050]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask
0x400 (unknown error) <F>
[  464.876819] ata6.00: status: { DRDY SENSE ERR }
[  464.881488] ata6.00: error: { ABRT }
[  464.913349] ata6.00: configured for UDMA/133
[  464.917912] sd 5:0:0:0: [sdd] tag#14 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  464.926472] sd 5:0:0:0: [sdd] tag#14 Sense Key : Illegal Request
[current]
[  464.933569] sd 5:0:0:0: [sdd] tag#14 Add. Sense: Unaligned write command
[  464.940472] sd 5:0:0:0: [sdd] tag#14 CDB: Write(16) 8a 00 00 00 00 00
10 60 00 08 00 00 00 08 00 00
[  464.949740] print_req_error: I/O error, dev sdd, sector 274726920
[  464.956040] ata6: EH complete

quick error recovery and return to dd. But running a second time and dd
hangs. Nothing happening. debugfs information gives me this:

./hctx0/cpu7/completed:13 2
./hctx0/cpu7/merged:0
./hctx0/cpu7/dispatched:13 2
./hctx0/cpu6/completed:0 0
./hctx0/cpu6/merged:0
./hctx0/cpu6/dispatched:0 0
./hctx0/cpu5/completed:5 3
./hctx0/cpu5/merged:0
./hctx0/cpu5/dispatched:5 3
./hctx0/cpu4/completed:47 4
./hctx0/cpu4/merged:0
./hctx0/cpu4/dispatched:48 4
./hctx0/cpu3/completed:6 85
./hctx0/cpu3/merged:0
./hctx0/cpu3/dispatched:6 85
./hctx0/cpu2/completed:42 8
./hctx0/cpu2/merged:0
./hctx0/cpu2/dispatched:42 8
./hctx0/cpu1/completed:3 92
./hctx0/cpu1/merged:0
./hctx0/cpu1/dispatched:3 92
./hctx0/cpu0/completed:7 4
./hctx0/cpu0/merged:0
./hctx0/cpu0/dispatched:7 4
./hctx0/active:0
./hctx0/run:307
./hctx0/queued:322
./hctx0/dispatched:       0	12
./hctx0/dispatched:       1	331
./hctx0/dispatched:       2	0
./hctx0/dispatched:       4	0
./hctx0/dispatched:       8	0
./hctx0/dispatched:      16	0
./hctx0/dispatched:      32+	0
./hctx0/sched_tags_bitmap:00000000: 0000 0000 0000 0800
./hctx0/sched_tags:nr_tags=62
./hctx0/sched_tags:nr_reserved_tags=0
./hctx0/sched_tags:active_queues=0
./hctx0/sched_tags:bitmap_tags:
./hctx0/sched_tags:depth=62
./hctx0/sched_tags:busy=1
./hctx0/sched_tags:bits_per_word=8
./hctx0/sched_tags:map_nr=8
./hctx0/sched_tags:alloc_hint={11, 34, 50, 31, 52, 8, 0, 15}
./hctx0/sched_tags:wake_batch=7
./hctx0/sched_tags:wake_index=0
./hctx0/sched_tags:ws={
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},
./hctx0/sched_tags:}
./hctx0/sched_tags:round_robin=1
./hctx0/tags_bitmap:00000000: 0000 0000
./hctx0/tags:nr_tags=31
./hctx0/tags:nr_reserved_tags=0
./hctx0/tags:active_queues=0
./hctx0/tags:bitmap_tags:
./hctx0/tags:depth=31
./hctx0/tags:busy=0
./hctx0/tags:bits_per_word=4
./hctx0/tags:map_nr=8
./hctx0/tags:alloc_hint={15, 5, 0, 10, 21, 8, 0, 15}
./hctx0/tags:wake_batch=3
./hctx0/tags:wake_index=0
./hctx0/tags:ws={
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:	{.wait_cnt=3, .wait=inactive},
./hctx0/tags:}
./hctx0/tags:round_robin=1
./hctx0/ctx_map:00000000: 00
./hctx0/flags:alloc_policy=RR SHOULD_MERGE|SG_MERGE
./sched/starved:0
./sched/batching:1
./sched/write_fifo_list:000000003d5af16e {.op=WRITE,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT|HASHED, complete=0, .tag=-1,
.internal_tag=51, .cmd=Mode Sense(10) 5a 00 08 00 00 00 00 00 08 00,
.retries=0, .result = 0x0, .flags=TAGGED, .timeout=0.000, allocated
282.418 s ago}
./state:SAME_COMP|IO_STAT|INIT_DONE|POLL|WC|FLUSH_NQ|STATS|REGISTERED|SCSI_PASSTHROUGH

Clearly, the write command is seating in the scheduler queue but nothing
is happening. I checked if this would be a bug with mq-deadline zone
locking, but it does not appear to be so (the target zone is unlocked).
Since turning on/off CONFIG_DEBUG_OBJECTS_RCU_HEAD changes the behavior,
I am inclined to think of a timing/race issue somewhere. Running with
scsi_debug turned on (and a raiser high log level) also suppresses the
problem, so another indicator of a timing issue.

I do not fully understand what is going on yet. Still looking.
Hannes, Tejun, help !

Best regards.

-- 
Damien Le Moal,
Western Digital

Damien Le Moal Feb. 22, 2018, 7:20 a.m. UTC | #8

On 2/22/18 14:08, Damien Le Moal wrote:
> Bart,

> 

> On 2/22/18 12:53, Bart Van Assche wrote:

>> On Thu, 2018-02-22 at 02:23 +0000, Damien Le Moal wrote:

>>> On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:

>>>> [ ... ]

>>> This does not compile.

>>

>> This patch depends on another patch that is not yet in Martin's tree. See also

>> https://marc.info/?l=linux-scsi&m=151675130615597. I should have mentioned this

>> in the patch description.

> 

> OK. Got it.

> 

>>> Testing this, the rcu hang is now gone.

>>

>> Thanks for the testing :-)

>>

>>> However, the behavior of the error recovery  is still different from what I

>>> see in 4.15 and 4.14. For my test case, an unaligned write to a sequential

>>> zone on a ZAC drive connected to an AHCI port, the report zone issued during

>>> the disk revalidation after the write error fails with a timeout, which causes

>>> capacity change to 0, port reset and recovery again. Eventually, everything

>>> comes back up OK, but it takes some time.

>>>

>>> I am investigating to make sure I am not hitting a device FW bug to confirm if

>>> this is a kernel problem.

>>

>> This patch was tested with the SRP protocol. I'm not an ATA expert but I hope

>> that someone who is more familiar with ATA than I can chime in.

> 

> Well, I would expect the retry of the unaligned write to fail

> immediately while the report zones is still on-going. These are both NCQ

> commands, so that would result in the queue to be aborted and eh to see

> the failed write retry and the aborted report zones, which should be

> restarted right away. But report zones timeout after 30sec...

> 

> Example with CONFIG_DEBUG_OBJECTS_RCU_HEAD turned on (check time stamps):

> 

> [   43.978457] ata6.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0

> [   43.985239] ata6.00: irq_stat 0x40000008

> [   43.989272] ata6.00: failed command: WRITE FPDMA QUEUED

> [   43.994657] ata6.00: cmd 61/08:28:08:00:60/00:00:10:00:00/40 tag 5

> ncq dma 4096 out

> [   43.994657]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [   44.011157] ata6.00: status: { DRDY SENSE ERR }

> [   44.015825] ata6.00: error: { ABRT }

> [   44.268565] ata6.00: configured for UDMA/133

> [   44.273208] ata6: EH complete

> [   44.303440] ata6.00: exception Emask 0x0 SAct 0x10020 SErr 0x0 action 0x0

> [   44.310743] ata6.00: irq_stat 0x40000008

> [   44.314989] ata6.00: failed command: WRITE FPDMA QUEUED

> [   44.320634] ata6.00: cmd 61/08:28:08:00:60/00:00:10:00:00/40 tag 5

> ncq dma 4096 out

> [   44.320634]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [   44.337787] ata6.00: status: { DRDY SENSE ERR }

> [   44.342576] ata6.00: error: { ABRT }

> [   44.374416] ata6.00: configured for UDMA/133

> [   44.379094] ata6: EH complete

> [   74.638316] ata6.00: exception Emask 0x1 SAct 0x110 SErr 0x0 action

> 0x6 frozen

> [   74.645999] ata6.00: irq_stat 0x40000008

> [   74.650001] ata6.00: failed command: RECEIVE FPDMA QUEUED

> [   74.655544] ata6.00: cmd 65/00:20:00:00:00/01:02:00:00:00/40 tag 4

> ncq dma 131072 in

> [   74.655544]          res 40/00:00:00:00:38/00:00:3f:06:00/40 Emask

> 0x5 (timeout)

> [   74.671001] ata6.00: status: { DRDY }

> [   74.674768] ata6.00: failed command: WRITE FPDMA QUEUED

> [   74.680140] ata6.00: cmd 61/08:40:08:00:60/00:00:10:00:00/40 tag 8

> ncq dma 4096 out

> [   74.680140]          res 40/00:00:08:00:60/00:00:10:00:00/40 Emask

> 0x1 (device error)

> [   74.695966] ata6.00: status: { DRDY }

> [   74.699730] ata6: hard resetting link

> [   75.013606] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

> [   75.067939] ata6.00: configured for UDMA/133

> [   75.072465] ata6.00: device reported invalid CHS sector 0

> [   75.078167] ata6: EH complete

> [   75.078188] sd 5:0:0:0: [sdd] REPORT ZONES lba 0 failed with 0/8

> [   75.087601] sd 5:0:0:0: [sdd] 0 512-byte logical blocks: (0 B/0 B)

> [   75.093903] sd 5:0:0:0: [sdd] 4096-byte physical blocks

> [   75.099276] sdd: detected capacity change from 14000519643136 to 0

> [  106.189951] ata6.00: exception Emask 0x1 SAct 0x400010 SErr 0x0

> action 0x6 frozen

> [  106.197683] ata6.00: irq_stat 0x40000008

> [  106.201730] ata6.00: failed command: WRITE FPDMA QUEUED

> [  106.207096] ata6.00: cmd 61/08:20:08:00:60/00:00:10:00:00/40 tag 4

> ncq dma 4096 out

> [  106.207096]          res 40/00:00:08:00:60/00:00:10:00:00/40 Emask

> 0x1 (device error)

> [  106.222930] ata6.00: status: { DRDY }

> [  106.226690] ata6.00: failed command: RECEIVE FPDMA QUEUED

> [  106.232235] ata6.00: cmd 65/00:b0:00:00:00/01:02:00:00:00/40 tag 22

> ncq dma 131072 in

> [  106.232235]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask

> 0x5 (timeout)

> [  106.247834] ata6.00: status: { DRDY }

> [  106.251641] ata6: hard resetting link

> [  106.565274] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

> [  106.600020] ata6.00: configured for UDMA/133

> [  106.604501] ata6: EH complete

> [  106.604526] sd 5:0:0:0: [sdd] REPORT ZONES lba 0 failed with 0/8

> [  106.607669] print_req_error: I/O error, dev sdd, sector 274726920

> [  107.037705] sdd: detected capacity change from 0 to 14000519643136

> [  110.117135] sd 5:0:0:0: [sdd] 27344764928 512-byte logical blocks:

> (14.0 TB/12.7 TiB)

> [  110.125441] sd 5:0:0:0: [sdd] 4096-byte physical blocks

> [  110.130782] sd 5:0:0:0: [sdd] 52156 zones of 524288 logical blocks

> 

> 

> As can be seen at the end of the log, the disk comes back up and the

> write failure reported back to the application (dd in this case).

> Re-running dd again result in the same behavior once more. But

> surprisingly, running it a third time, I get directly a scsi error (no

> ata error message). No report zones failure.

> 

> [  386.748116] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK

> driverbyte=DRIVER_SENSE

> [  386.759463] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request

> [current]

> [  386.767276] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command

> [  386.774805] sd 5:0:0:0: [sdd] tag#26 CDB: Write(16) 8a 00 00 00 00 00

> 10 60 00 08 00 00 00 08 00 00

> [  386.784700] print_req_error: I/O error, dev sdd, sector 274726920

> 

> Which look like the result of a previous command since nothing seems to

> have been sent to libata... (the dd call is repeated exactly every time

> with the same target sector that is after the zone write pointer).

> 

> Finally, running a fourth time, dd simply hangs with nothing happening,

> and dd cannot be killed. Reboot to recover.

> 

> Now, the same test with CONFIG_DEBUG_OBJECTS_RCU_HEAD turned off, I get

> this:

> 

> [  463.680615] ata6.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0

> [  463.687884] ata6.00: irq_stat 0x40000008

> [  463.691904] ata6.00: failed command: WRITE FPDMA QUEUED

> [  463.697284] ata6.00: cmd 61/08:90:08:00:60/00:00:10:00:00/40 tag 18

> ncq dma 4096 out

> [  463.697284]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [  463.713798] ata6.00: status: { DRDY SENSE ERR }

> [  463.718462] ata6.00: error: { ABRT }

> [  463.912968] ata6.00: configured for UDMA/133

> [  463.917706] ata6: EH complete

> [  463.944550] ata6.00: exception Emask 0x0 SAct 0x800010 SErr 0x0

> action 0x0

> [  463.951962] ata6.00: irq_stat 0x40000008

> [  463.956293] ata6.00: failed command: WRITE FPDMA QUEUED

> [  463.961963] ata6.00: cmd 61/08:b8:08:00:60/00:00:10:00:00/40 tag 23

> ncq dma 4096 out

> [  463.961963]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [  463.979399] ata6.00: status: { DRDY SENSE ERR }

> [  463.984336] ata6.00: error: { ABRT }

> [  464.016416] ata6.00: configured for UDMA/133

> [  464.020894] ata6: EH complete

> [  464.050581] ata6.00: exception Emask 0x0 SAct 0x3000 SErr 0x0 action 0x0

> [  464.057602] ata6.00: irq_stat 0x40000008

> [  464.061761] ata6.00: failed command: WRITE FPDMA QUEUED

> [  464.067318] ata6.00: cmd 61/08:60:08:00:60/00:00:10:00:00/40 tag 12

> ncq dma 4096 out

> [  464.067318]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [  464.084371] ata6.00: status: { DRDY SENSE ERR }

> [  464.089324] ata6.00: error: { ABRT }

> [  464.121189] ata6.00: configured for UDMA/133

> [  464.125640] ata6: EH complete

> [  464.363596] ata6.00: exception Emask 0x0 SAct 0x800400 SErr 0x0

> action 0x0

> [  464.371012] ata6.00: irq_stat 0x40000008

> [  464.375257] ata6.00: failed command: WRITE FPDMA QUEUED

> [  464.380937] ata6.00: cmd 61/08:50:08:00:60/00:00:10:00:00/40 tag 10

> ncq dma 4096 out

> [  464.380937]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [  464.398337] ata6.00: status: { DRDY SENSE ERR }

> [  464.403186] ata6.00: error: { ABRT }

> [  464.495690] ata6.00: configured for UDMA/133

> [  464.500367] ata6: EH complete

> [  464.739616] ata6.00: exception Emask 0x0 SAct 0x4800 SErr 0x0 action 0x0

> [  464.746735] ata6.00: irq_stat 0x40000008

> [  464.750729] ata6.00: failed command: WRITE FPDMA QUEUED

> [  464.756095] ata6.00: cmd 61/08:70:08:00:60/00:00:10:00:00/40 tag 14

> ncq dma 4096 out

> [  464.756095]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [  464.772637] ata6.00: status: { DRDY SENSE ERR }

> [  464.777294] ata6.00: error: { ABRT }

> [  464.809308] ata6.00: configured for UDMA/133

> [  464.813726] ata6: EH complete

> [  464.843549] ata6.00: exception Emask 0x0 SAct 0xc000 SErr 0x0 action 0x0

> [  464.850525] ata6.00: irq_stat 0x40000008

> [  464.854622] ata6.00: failed command: WRITE FPDMA QUEUED

> [  464.860050] ata6.00: cmd 61/08:70:08:00:60/00:00:10:00:00/40 tag 14

> ncq dma 4096 out

> [  464.860050]          res 43/04:08:08:00:60/00:00:10:00:00/00 Emask

> 0x400 (unknown error) <F>

> [  464.876819] ata6.00: status: { DRDY SENSE ERR }

> [  464.881488] ata6.00: error: { ABRT }

> [  464.913349] ata6.00: configured for UDMA/133

> [  464.917912] sd 5:0:0:0: [sdd] tag#14 FAILED Result: hostbyte=DID_OK

> driverbyte=DRIVER_SENSE

> [  464.926472] sd 5:0:0:0: [sdd] tag#14 Sense Key : Illegal Request

> [current]

> [  464.933569] sd 5:0:0:0: [sdd] tag#14 Add. Sense: Unaligned write command

> [  464.940472] sd 5:0:0:0: [sdd] tag#14 CDB: Write(16) 8a 00 00 00 00 00

> 10 60 00 08 00 00 00 08 00 00

> [  464.949740] print_req_error: I/O error, dev sdd, sector 274726920

> [  464.956040] ata6: EH complete

> 

> quick error recovery and return to dd. But running a second time and dd

> hangs. Nothing happening. debugfs information gives me this:

> 

> ./hctx0/cpu7/completed:13 2

> ./hctx0/cpu7/merged:0

> ./hctx0/cpu7/dispatched:13 2

> ./hctx0/cpu6/completed:0 0

> ./hctx0/cpu6/merged:0

> ./hctx0/cpu6/dispatched:0 0

> ./hctx0/cpu5/completed:5 3

> ./hctx0/cpu5/merged:0

> ./hctx0/cpu5/dispatched:5 3

> ./hctx0/cpu4/completed:47 4

> ./hctx0/cpu4/merged:0

> ./hctx0/cpu4/dispatched:48 4

> ./hctx0/cpu3/completed:6 85

> ./hctx0/cpu3/merged:0

> ./hctx0/cpu3/dispatched:6 85

> ./hctx0/cpu2/completed:42 8

> ./hctx0/cpu2/merged:0

> ./hctx0/cpu2/dispatched:42 8

> ./hctx0/cpu1/completed:3 92

> ./hctx0/cpu1/merged:0

> ./hctx0/cpu1/dispatched:3 92

> ./hctx0/cpu0/completed:7 4

> ./hctx0/cpu0/merged:0

> ./hctx0/cpu0/dispatched:7 4

> ./hctx0/active:0

> ./hctx0/run:307

> ./hctx0/queued:322

> ./hctx0/dispatched:       0	12

> ./hctx0/dispatched:       1	331

> ./hctx0/dispatched:       2	0

> ./hctx0/dispatched:       4	0

> ./hctx0/dispatched:       8	0

> ./hctx0/dispatched:      16	0

> ./hctx0/dispatched:      32+	0

> ./hctx0/sched_tags_bitmap:00000000: 0000 0000 0000 0800

> ./hctx0/sched_tags:nr_tags=62

> ./hctx0/sched_tags:nr_reserved_tags=0

> ./hctx0/sched_tags:active_queues=0

> ./hctx0/sched_tags:bitmap_tags:

> ./hctx0/sched_tags:depth=62

> ./hctx0/sched_tags:busy=1

> ./hctx0/sched_tags:bits_per_word=8

> ./hctx0/sched_tags:map_nr=8

> ./hctx0/sched_tags:alloc_hint={11, 34, 50, 31, 52, 8, 0, 15}

> ./hctx0/sched_tags:wake_batch=7

> ./hctx0/sched_tags:wake_index=0

> ./hctx0/sched_tags:ws={

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:	{.wait_cnt=7, .wait=inactive},

> ./hctx0/sched_tags:}

> ./hctx0/sched_tags:round_robin=1

> ./hctx0/tags_bitmap:00000000: 0000 0000

> ./hctx0/tags:nr_tags=31

> ./hctx0/tags:nr_reserved_tags=0

> ./hctx0/tags:active_queues=0

> ./hctx0/tags:bitmap_tags:

> ./hctx0/tags:depth=31

> ./hctx0/tags:busy=0

> ./hctx0/tags:bits_per_word=4

> ./hctx0/tags:map_nr=8

> ./hctx0/tags:alloc_hint={15, 5, 0, 10, 21, 8, 0, 15}

> ./hctx0/tags:wake_batch=3

> ./hctx0/tags:wake_index=0

> ./hctx0/tags:ws={

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:	{.wait_cnt=3, .wait=inactive},

> ./hctx0/tags:}

> ./hctx0/tags:round_robin=1

> ./hctx0/ctx_map:00000000: 00

> ./hctx0/flags:alloc_policy=RR SHOULD_MERGE|SG_MERGE

> ./sched/starved:0

> ./sched/batching:1

> ./sched/write_fifo_list:000000003d5af16e {.op=WRITE,

> .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT|HASHED, complete=0, .tag=-1,

> .internal_tag=51, .cmd=Mode Sense(10) 5a 00 08 00 00 00 00 00 08 00,

> .retries=0, .result = 0x0, .flags=TAGGED, .timeout=0.000, allocated

> 282.418 s ago}

> ./state:SAME_COMP|IO_STAT|INIT_DONE|POLL|WC|FLUSH_NQ|STATS|REGISTERED|SCSI_PASSTHROUGH

> 

> Clearly, the write command is seating in the scheduler queue but nothing

> is happening. I checked if this would be a bug with mq-deadline zone

> locking, but it does not appear to be so (the target zone is unlocked).

> Since turning on/off CONFIG_DEBUG_OBJECTS_RCU_HEAD changes the behavior,

> I am inclined to think of a timing/race issue somewhere. Running with

> scsi_debug turned on (and a raiser high log level) also suppresses the

> problem, so another indicator of a timing issue.

> 

> I do not fully understand what is going on yet. Still looking.

> Hannes, Tejun, help !


More information: when in this "hanged" state, issuing another unaligned
write to another zone works as expected (the write quickly fails, with a
similar quick recovery sequence as show above), and the hanged dd also
returns, that is, the write command hanging in the sched is issued and
fails too.

So there is definitely something fishy going on with the recovery state
and/or queue rerun. Can't figure out exactly what yet.

Another information: the same tests on a SAS drive are perfectly fine.
No problems whatsoever. It seems to be the interaction between ata-eh
and scsi-eh that causes the problem.

Best regards.

-- 
Damien Le Moal,
Western Digital

Natanael Copa Feb. 22, 2018, 5:15 p.m. UTC | #9

On Thu, 22 Feb 2018 03:53:19 +0000
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Thu, 2018-02-22 at 02:23 +0000, Damien Le Moal wrote:
> > On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:  
> > > [ ... ]  
> > This does not compile.  
> 
> This patch depends on another patch that is not yet in Martin's tree. See also
> https://marc.info/?l=linux-scsi&m=151675130615597. I should have mentioned this
> in the patch description.

I applied the two patches on top of 4.14.20 and it solves my problem[1]. I
think those two should be included in the 4.14.21 release.

Thanks!

-nc

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=198861

Avoid that ATA error handling hangs

Commit Message

Comments

Patch