diff mbox

scsi: aacraid: Add a small delay after IOP reset

Message ID 20170919151156.29412-1-gpiccoli@linux.vnet.ibm.com (mailing list archive)
State Accepted
Headers show

Commit Message

Guilherme G. Piccoli Sept. 19, 2017, 3:11 p.m. UTC
Commit 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset
status") changed the way driver checks if a reset succeeded. Now, after an
IOP reset, aacraid immediately start polling a register to verify the reset
is complete.

This behavior cause regressions on the reset path in PowerPC (at least).
Since the delay after the IOP reset was removed by the aforementioned patch,
the fact driver just starts to read a register instantly after the reset
was issued (by writing in another register) "corrupts" the reset procedure,
which ends up failing all the time.

The issue highly impacted kdump on PowerPC, since on kdump path we
proactively issue a reset in adapter (through the reset_devices kernel
parameter).

This patch (re-)adds a delay right after IOP reset is issued. Empirically
we measured that 3 seconds is enough, but for safety reasons we delay
for 5s (and since it was 30s before, 5s is still a small amount).

For reference, without this patch we observe the following messages
on kdump kernel boot process:

  [ 76.294] aacraid 0003:01:00.0: IOP reset failed
  [ 76.294] aacraid 0003:01:00.0: ARC Reset attempt failed
  [ 86.524] aacraid 0003:01:00.0: adapter kernel panic'd ff.
  [ 86.524] aacraid 0003:01:00.0: Controller reset type is 3
  [ 86.524] aacraid 0003:01:00.0: Issuing IOP reset
  [146.534] aacraid 0003:01:00.0: IOP reset failed
  [146.534] aacraid 0003:01:00.0: ARC Reset attempt failed

Fixes: 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset status")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 drivers/scsi/aacraid/src.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Christoph Hellwig Sept. 19, 2017, 3:37 p.m. UTC | #1
On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
> +
> +	msleep(5000);

src_writel is a writel, and thus a posted MMIO write.  You'll need
to have to a read first to make it a reliable timing base.
Guilherme G. Piccoli Sept. 19, 2017, 3:49 p.m. UTC | #2
On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
>>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
>> +
>> +	msleep(5000);
> 
> src_writel is a writel, and thus a posted MMIO write.  You'll need
> to have to a read first to make it a reliable timing base.
> 

Just for my full understanding - you're saying a readl BEFORE
src_writel() or AFTER src_writel() ?

I could add a read to some dummy register, but notice it was a sequence
of readl's on aac_is_ctrl_up_and_running() that caused the failure of
reset...

Thanks
Christoph Hellwig Sept. 19, 2017, 3:52 p.m. UTC | #3
On Tue, Sep 19, 2017 at 12:49:21PM -0300, Guilherme G. Piccoli wrote:
> On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
> > On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
> >>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
> >> +
> >> +	msleep(5000);
> > 
> > src_writel is a writel, and thus a posted MMIO write.  You'll need
> > to have to a read first to make it a reliable timing base.
> > 
> 
> Just for my full understanding - you're saying a readl BEFORE
> src_writel() or AFTER src_writel() ?

AFTER.

> I could add a read to some dummy register, but notice it was a sequence
> of readl's on aac_is_ctrl_up_and_running() that caused the failure of
> reset...

Oh, ouch.  I guess in that case we'll need to do the writel and pray,
but that would need a big comment explaining what's going on there.
Guilherme G. Piccoli Sept. 19, 2017, 3:58 p.m. UTC | #4
On 09/19/2017 12:52 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 12:49:21PM -0300, Guilherme G. Piccoli wrote:
>> On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
>>> On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
>>>>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
>>>> +
>>>> +	msleep(5000);
>>>
>>> src_writel is a writel, and thus a posted MMIO write.  You'll need
>>> to have to a read first to make it a reliable timing base.
>>>
>>
>> Just for my full understanding - you're saying a readl BEFORE
>> src_writel() or AFTER src_writel() ?
> 
> AFTER.

Thanks!

> 
>> I could add a read to some dummy register, but notice it was a sequence
>> of readl's on aac_is_ctrl_up_and_running() that caused the failure of
>> reset...
> 
> Oh, ouch.  I guess in that case we'll need to do the writel and pray,
> but that would need a big comment explaining what's going on there.
> 

Heheh you're right!

But do you remember that quirk added on nvme? Basically, it was a
similar scenario in which some adapters weren't happy in poll a register
in nvme_wait_ready() right after we wrote in the Controller Config
register when disabling a controller.

Seems the same thing here, the controller is not being able to handle a
read right after some internal procedure (reset) were initiated.

If you have suggestion to improve the commit msg, let me know :)
Thanks!
James Bottomley Sept. 19, 2017, 5:05 p.m. UTC | #5
On Tue, 2017-09-19 at 08:52 -0700, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 12:49:21PM -0300, Guilherme G. Piccoli wrote:
> > 
> > On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
> > > 
> > > On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli
> > > wrote:
> > > > 
> > > >  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
> > > > +
> > > > +	msleep(5000);
> > > 
> > > src_writel is a writel, and thus a posted MMIO write.  You'll
> > > need
> > > to have to a read first to make it a reliable timing base.
> > > 
> > 
> > Just for my full understanding - you're saying a readl BEFORE
> > src_writel() or AFTER src_writel() ?
> 
> AFTER.

Actually, the whole problem sounds like a posted write.  Likely the
write that causes the reset doesn't get flushed until the read checking
if the reset has succeeded, which might explain the 100% initial
failure.  Why not throw away that first value if it's a failure and
then do your polled wait and timeout on the reset success.  We should
anyway be waiting some time for a reset to be issued, so even on non-
posted write systems we could see this problem intermittently.

James
Guilherme G. Piccoli Sept. 19, 2017, 7:15 p.m. UTC | #6
On 09/19/2017 02:05 PM, James Bottomley wrote:
> Actually, the whole problem sounds like a posted write.  Likely the
> write that causes the reset doesn't get flushed until the read checking
> if the reset has succeeded, which might explain the 100% initial
> failure.  Why not throw away that first value if it's a failure and
> then do your polled wait and timeout on the reset success.  We should
> anyway be waiting some time for a reset to be issued, so even on non-
> posted write systems we could see this problem intermittently.
> 
> James
> 

Thanks for this suggestion James.

I tried to remove the sleep and did a dummy read to register using
readl() - issue reproduced.

I did expect that, since in aac_is_ctrl_up_and_running() we indeed read
a register and if it shows us reset is not complete, we wait and read it
again. So, we can think in this 1st read as a dummy one heheh

My theory here is that we're observing a failure similar to one we
already did in some specific NVMe adapters - the readl before some delay
(in nvme it was 2s) corrupts the adapter FW procedure. It's as if the
adapter doesn't like to deal with this read while the reset procedure is
ongoing. So, we wait a bit to issue a readl and everything goes fine.

Cheers,


Guilherme
Dave Carroll Sept. 21, 2017, 4:19 p.m. UTC | #7
> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
> Sent: Tuesday, September 19, 2017 9:12 AM
> To: dl-esc-Aacraid Linux Driver <aacraid@microsemi.com>; linux-
> scsi@vger.kernel.org
> Cc: gpiccoli@linux.vnet.ibm.com; Raghava Aditya Renukunta
> <RaghavaAditya.Renukunta@microsemi.com>; Dave Carroll
> <david.carroll@microsemi.com>; brking@linux.vnet.ibm.com;
> dougmill@linux.vnet.ibm.com; stable@vger.kernel.org
> Subject: [PATCH] scsi: aacraid: Add a small delay after IOP reset
> 
> Commit 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset
> status") changed the way driver checks if a reset succeeded. Now, after an IOP
> reset, aacraid immediately start polling a register to verify the reset is complete.
> 
> This behavior cause regressions on the reset path in PowerPC (at least).
> Since the delay after the IOP reset was removed by the aforementioned patch,
> the fact driver just starts to read a register instantly after the reset was issued
> (by writing in another register) "corrupts" the reset procedure, which ends up
> failing all the time.
> 
> The issue highly impacted kdump on PowerPC, since on kdump path we
> proactively issue a reset in adapter (through the reset_devices kernel
> parameter).
> 
> This patch (re-)adds a delay right after IOP reset is issued. Empirically we
> measured that 3 seconds is enough, but for safety reasons we delay for 5s (and
> since it was 30s before, 5s is still a small amount).
> 
> For reference, without this patch we observe the following messages on kdump
> kernel boot process:
> 
>   [ 76.294] aacraid 0003:01:00.0: IOP reset failed
>   [ 76.294] aacraid 0003:01:00.0: ARC Reset attempt failed
>   [ 86.524] aacraid 0003:01:00.0: adapter kernel panic'd ff.
>   [ 86.524] aacraid 0003:01:00.0: Controller reset type is 3
>   [ 86.524] aacraid 0003:01:00.0: Issuing IOP reset
>   [146.534] aacraid 0003:01:00.0: IOP reset failed
>   [146.534] aacraid 0003:01:00.0: ARC Reset attempt failed
> 
> Fixes: 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset
> status")
> Cc: stable@vger.kernel.org # v4.13+
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Guilherme G. Piccoli Sept. 25, 2017, 9:09 p.m. UTC | #8
On 09/21/2017 01:19 PM, Dave Carroll wrote:
>> [...]
>> ---
> Acked-by: Dave Carroll <david.carroll@microsemi.com>
> 

Thanks Dave!

James/Martin, am I expected to send a v2 with some change? Perhaps with
Dave's ack?
Sorry to annoy, thanks in advance for any advice!

Cheers,


Guilherme
Martin K. Petersen Sept. 25, 2017, 9:34 p.m. UTC | #9
Guilherme,

> James/Martin, am I expected to send a v2 with some change? Perhaps
> with Dave's ack?  Sorry to annoy, thanks in advance for any advice!

I was just about to mail Dave and ask for confirmation that your
interpretation of the controller behavior is correct.

Dave?
Dave Carroll Sept. 27, 2017, 7:26 p.m. UTC | #10
> 
> 
> Guilherme,
> 
> > James/Martin, am I expected to send a v2 with some change? Perhaps
> > with Dave's ack?  Sorry to annoy, thanks in advance for any advice!
> 
> I was just about to mail Dave and ask for confirmation that your interpretation
> of the controller behavior is correct.
> 
> Dave?
> 
 Hi Martin,

Previously we had used sleep to delay until the controller got its mind back, 
but early testing indicated it wasn't needed. I'm  good with this.

-Dave
Martin K. Petersen Sept. 28, 2017, 1:42 a.m. UTC | #11
Dave,

>> 
> Previously we had used sleep to delay until the controller got its
> mind back, but early testing indicated it wasn't needed. I'm good with
> this.

Applied to 4.14/scsi-fixes. Thanks!
diff mbox

Patch

diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 48c2b2b34b72..0c9361c87ec8 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -740,6 +740,8 @@  static void aac_send_iop_reset(struct aac_dev *dev)
 	aac_set_intx_mode(dev);
 
 	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
+
+	msleep(5000);
 }
 
 static void aac_send_hardware_soft_reset(struct aac_dev *dev)