diff mbox

[v2] mtd: nand: mtk: fix infinite ECC decode IRQ issue

Message ID 1509331196-29412-2-git-send-email-xiaolei.li@mediatek.com (mailing list archive)
State New, archived
Headers show

Commit Message

xiaolei li Oct. 30, 2017, 2:39 a.m. UTC
For MT2701 NAND Controller, there may generate infinite ECC decode IRQ
during long time burn test on some platforms. Once this issue occurred,
the ECC decode IRQ status cannot be cleared in the IRQ handler function,
and threads cannot be scheduled.

ECC HW generates decode IRQ each sector, so there will have more than one
decode IRQ if read one page of large page NAND.

Currently, ECC IRQ handle flow is that we will check whether it is decode
IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear
type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be
cleared at the same time.
Secondly, we will check whether all sectors are decoded by reading the
register ECC_DECDONE. This is because the current IRQ may be not dealed
in time, and the next sectors have been decoded before reading the
register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not
be generated.
Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we
will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by
programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the
next ECC IRQ.

But, there is a timing issue between step one and two. When we read the
reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector,
and the ECC IRQ signal is cleared. But the last sector is decoded before
reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and
it means we will receive one extra ECC IRQ later. In step three, we will
find that all sectors were decoded, then disable ECC IRQ and return.
When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared
anymore. That is because the register ECC_DECIRQ_STA can only be cleared
when the register ECC_IRQ_REG(op) is enabled. But actually we have
disabled ECC IRQ in the previous ECC IRQ handle. So, there will
keep receiving ECC decode IRQ.

Now, we read the register ECC_DECIRQ_STA once again before completing the
ecc done event. This ensures that there will be no extra ECC decode IRQ.

Also, remove writel(0, ecc->regs + ECC_IRQ_REG(op)) from irq handler,
because ECC IRQ is disabled in mtk_ecc_disable(). And clear ECC_DECIRQ_STA
in mtk_ecc_disable() in case there is a timeout to wait decode IRQ.

Fixes: 1d6b1e464950 ("mtd: mediatek: driver for MTK Smart Device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
---
 drivers/mtd/nand/mtk_ecc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Boris BREZILLON Oct. 30, 2017, 9:27 a.m. UTC | #1
On Mon, 30 Oct 2017 10:39:56 +0800
Xiaolei Li <xiaolei.li@mediatek.com> wrote:

> For MT2701 NAND Controller, there may generate infinite ECC decode IRQ
> during long time burn test on some platforms. Once this issue occurred,
> the ECC decode IRQ status cannot be cleared in the IRQ handler function,
> and threads cannot be scheduled.
> 
> ECC HW generates decode IRQ each sector, so there will have more than one
> decode IRQ if read one page of large page NAND.
> 
> Currently, ECC IRQ handle flow is that we will check whether it is decode
> IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear
> type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be
> cleared at the same time.
> Secondly, we will check whether all sectors are decoded by reading the
> register ECC_DECDONE. This is because the current IRQ may be not dealed
> in time, and the next sectors have been decoded before reading the
> register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not
> be generated.
> Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we
> will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by
> programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the
> next ECC IRQ.
> 
> But, there is a timing issue between step one and two. When we read the
> reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector,
> and the ECC IRQ signal is cleared. But the last sector is decoded before
> reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and
> it means we will receive one extra ECC IRQ later. In step three, we will
> find that all sectors were decoded, then disable ECC IRQ and return.
> When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared
> anymore. That is because the register ECC_DECIRQ_STA can only be cleared
> when the register ECC_IRQ_REG(op) is enabled. But actually we have
> disabled ECC IRQ in the previous ECC IRQ handle. So, there will
> keep receiving ECC decode IRQ.
> 
> Now, we read the register ECC_DECIRQ_STA once again before completing the
> ecc done event. This ensures that there will be no extra ECC decode IRQ.
> 
> Also, remove writel(0, ecc->regs + ECC_IRQ_REG(op)) from irq handler,
> because ECC IRQ is disabled in mtk_ecc_disable(). And clear ECC_DECIRQ_STA
> in mtk_ecc_disable() in case there is a timeout to wait decode IRQ.
> 
> Fixes: 1d6b1e464950 ("mtd: mediatek: driver for MTK Smart Device")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>

Applied.

Thanks,

Boris

> ---
>  drivers/mtd/nand/mtk_ecc.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c
> index 7f3b065..c51d214 100644
> --- a/drivers/mtd/nand/mtk_ecc.c
> +++ b/drivers/mtd/nand/mtk_ecc.c
> @@ -115,6 +115,11 @@ static irqreturn_t mtk_ecc_irq(int irq, void *id)
>  		op = ECC_DECODE;
>  		dec = readw(ecc->regs + ECC_DECDONE);
>  		if (dec & ecc->sectors) {
> +			/*
> +			 * Clear decode IRQ status once again to ensure that
> +			 * there will be no extra IRQ.
> +			 */
> +			readw(ecc->regs + ECC_DECIRQ_STA);
>  			ecc->sectors = 0;
>  			complete(&ecc->done);
>  		} else {
> @@ -130,8 +135,6 @@ static irqreturn_t mtk_ecc_irq(int irq, void *id)
>  		}
>  	}
>  
> -	writel(0, ecc->regs + ECC_IRQ_REG(op));
> -
>  	return IRQ_HANDLED;
>  }
>  
> @@ -307,6 +310,12 @@ void mtk_ecc_disable(struct mtk_ecc *ecc)
>  
>  	/* disable it */
>  	mtk_ecc_wait_idle(ecc, op);
> +	if (op == ECC_DECODE)
> +		/*
> +		 * Clear decode IRQ status in case there is a timeout to wait
> +		 * decode IRQ.
> +		 */
> +		readw(ecc->regs + ECC_DECIRQ_STA);
>  	writew(0, ecc->regs + ECC_IRQ_REG(op));
>  	writew(ECC_OP_DISABLE, ecc->regs + ECC_CTL_REG(op));
>
diff mbox

Patch

diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c
index 7f3b065..c51d214 100644
--- a/drivers/mtd/nand/mtk_ecc.c
+++ b/drivers/mtd/nand/mtk_ecc.c
@@ -115,6 +115,11 @@  static irqreturn_t mtk_ecc_irq(int irq, void *id)
 		op = ECC_DECODE;
 		dec = readw(ecc->regs + ECC_DECDONE);
 		if (dec & ecc->sectors) {
+			/*
+			 * Clear decode IRQ status once again to ensure that
+			 * there will be no extra IRQ.
+			 */
+			readw(ecc->regs + ECC_DECIRQ_STA);
 			ecc->sectors = 0;
 			complete(&ecc->done);
 		} else {
@@ -130,8 +135,6 @@  static irqreturn_t mtk_ecc_irq(int irq, void *id)
 		}
 	}
 
-	writel(0, ecc->regs + ECC_IRQ_REG(op));
-
 	return IRQ_HANDLED;
 }
 
@@ -307,6 +310,12 @@  void mtk_ecc_disable(struct mtk_ecc *ecc)
 
 	/* disable it */
 	mtk_ecc_wait_idle(ecc, op);
+	if (op == ECC_DECODE)
+		/*
+		 * Clear decode IRQ status in case there is a timeout to wait
+		 * decode IRQ.
+		 */
+		readw(ecc->regs + ECC_DECIRQ_STA);
 	writew(0, ecc->regs + ECC_IRQ_REG(op));
 	writew(ECC_OP_DISABLE, ecc->regs + ECC_CTL_REG(op));