diff mbox

mtd: fix writing incorrect ECC parity data in OOB region.

Message ID 1472443093.27061.4.camel@mtkswgap22 (mailing list archive)
State New, archived
Headers show

Commit Message

RogerCC.Lin Aug. 29, 2016, 3:58 a.m. UTC
Hi Boris,

I found a problem in current Mediatek upstream NAND driver (of nand/next
branch). When testing the upstream driver with UBIFS file system, it has
some chance to create incorrect ECC data in the second subpage, this
makes UBIFS failed to mount. Below patch fixes this problem. Please help
to review the patch and upstream. Thank you.

Best regards,
Roger



Description:
When mtk_ecc_encode() is writing the ECC parity data to the OOB region,
because each register is 4 bytes in length, but the len's unit is in
bytes, the operation in the for loop will cross the ECC's boundary.

And when mtk_nfc_do_write_page() comparing the sector number, because
the sector number field is at the 12th-bit position of NFI_BYTELEN
register, the masked register should be shifted 12 bits before being
compared. The result of this bug may cause the second subpage has
incomplete ECC parity bytes.

Test:
The patch passed the test of UBIFS file-system read/write on Mediatek's
RFB. The tested driver is checked-out from LEDE OpenWRT project's
upstream driver, which is pretty much same as nand/next branch upstream
driver(git clone https://git.lede-project.org/source.git). 


Signed-off-by: RogerCC Lin <rogercc.lin@mediatek.com>
---
        if (ret)
                dev_err(dev, "hwecc write timeout\n");

@@ -902,8 +902,8 @@ static int mtk_nfc_read_subpage(struct mtd_info
*mtd, struct nand_chip *chip,
                dev_warn(nfc->dev, "read ahb/dma done timeout\n");

        rc = readl_poll_timeout_atomic(nfc->regs + NFI_BYTELEN, reg,
-                                      (reg & CNTR_MASK) >= sectors, 10,
-                                      MTK_TIMEOUT);
+                                      ((reg & CNTR_MASK) >> 12) >=
sectors,
+                                      10, MTK_TIMEOUT);
        if (rc < 0) {
                dev_err(nfc->dev, "subpage done timeout\n");
                bitflips = -EIO;

Comments

Boris BREZILLON Aug. 29, 2016, 8:10 a.m. UTC | #1
+Jorge

Hi Roger,

On Mon, 29 Aug 2016 11:58:13 +0800
mtk04561 <rogercc.lin@mediatek.com> wrote:

> Hi Boris,
> 
> I found a problem in current Mediatek upstream NAND driver (of nand/next
> branch). When testing the upstream driver with UBIFS file system, it has
> some chance to create incorrect ECC data in the second subpage, this
> makes UBIFS failed to mount. Below patch fixes this problem. Please help
> to review the patch and upstream. Thank you.
> 
> Best regards,
> Roger

Can you put this message after the '---' (below your SoB), so that it
does not appear when applying the patch.

> 
> 
> 
> Description:

Drop the 'Description:' line.

> When mtk_ecc_encode() is writing the ECC parity data to the OOB region,
> because each register is 4 bytes in length, but the len's unit is in
> bytes, the operation in the for loop will cross the ECC's boundary.
> 
> And when mtk_nfc_do_write_page() comparing the sector number, because
> the sector number field is at the 12th-bit position of NFI_BYTELEN
> register, the masked register should be shifted 12 bits before being
> compared. The result of this bug may cause the second subpage has
> incomplete ECC parity bytes.

Maybe you could split those changes in 2 patches (not a strong
requirement).

> 
> Test:

Drop the 'Test:' line.

> The patch passed the test of UBIFS file-system read/write on Mediatek's
> RFB. The tested driver is checked-out from LEDE OpenWRT project's
> upstream driver, which is pretty much same as nand/next branch upstream
> driver(git clone https://git.lede-project.org/source.git). 
> 
> 
> Signed-off-by: RogerCC Lin <rogercc.lin@mediatek.com>
> ---
> diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c
> index 25a4fbd..0a7ce8d 100644
> --- a/drivers/mtd/nand/mtk_ecc.c
> +++ b/drivers/mtd/nand/mtk_ecc.c
> @@ -366,7 +366,8 @@ int mtk_ecc_encode(struct mtk_ecc *ecc, struct
> mtk_ecc_config *config,
>                    u8 *data, u32 bytes)
>  {
>         dma_addr_t addr;
> -       u32 *p, len, i;
> +       u8 *p;
> +       u32 len, i, val;
>         int ret = 0;
> 
>         addr = dma_map_single(ecc->dev, data, bytes, DMA_TO_DEVICE);
> @@ -395,8 +396,11 @@ int mtk_ecc_encode(struct mtk_ecc *ecc, struct
> mtk_ecc_config *config,
>         p = (u32 *)(data + bytes);

Hm, you should now do:

	p = data + bytes;

> 
>         /* write the parity bytes generated by the ECC back to the OOB
> region */
> -       for (i = 0; i < len; i++)
> -               p[i] = readl(ecc->regs + ECC_ENCPAR(i));
> +       for (i = 0; i < len; i++) {
> +               if ((i % 4) == 0)
> +                       val = readl(ecc->regs + ECC_ENCPAR(i >> 2));
> +               p[i] = *((u8 *)&val + (i % 4));

Please, use a shift operator here:

		p[i] = (val >> ((i % 4) * 8)) & 0xff;

Otherwise you're exposed to endianness conversion problems (this works
fine in your case because you're using a little-endian kernel).

> +       }
>  timeout:
> 
>         dma_unmap_single(ecc->dev, addr, bytes, DMA_TO_DEVICE);
> diff --git a/drivers/mtd/nand/mtk_nand.c b/drivers/mtd/nand/mtk_nand.c
> index ddaa2ac..9c49121 100644
> --- a/drivers/mtd/nand/mtk_nand.c
> +++ b/drivers/mtd/nand/mtk_nand.c
> @@ -699,8 +699,8 @@ static int mtk_nfc_do_write_page(struct mtd_info
> *mtd, struct nand_chip *chip,
>         }
> 
>         ret = readl_poll_timeout_atomic(nfc->regs + NFI_ADDRCNTR, reg,
> -                                       (reg & CNTR_MASK) >=
> chip->ecc.steps,
> -                                       10, MTK_TIMEOUT);
> +                                       ((reg & CNTR_MASK) >> 12) >=
> +                                       chip->ecc.steps, 10,

Please keep '((reg & CNTR_MASK) >> 12) >= chip->ecc.steps' on the same
line.

If you want to comply with the 80 char lines, you can do:

	ret = readl_poll_timeout_atomic(nfc->regs + NFI_ADDRCNTR, reg,
				((reg & CNTR_MASK) >> 12) >= chip->ecc.steps,
				10, MTK_TIMEOUT);


>         if (ret)
>                 dev_err(dev, "hwecc write timeout\n");
> 
> @@ -902,8 +902,8 @@ static int mtk_nfc_read_subpage(struct mtd_info
> *mtd, struct nand_chip *chip,
>                 dev_warn(nfc->dev, "read ahb/dma done timeout\n");
> 
>         rc = readl_poll_timeout_atomic(nfc->regs + NFI_BYTELEN, reg,
> -                                      (reg & CNTR_MASK) >= sectors, 10,
> -                                      MTK_TIMEOUT);
> +                                      ((reg & CNTR_MASK) >> 12) >=
> sectors,
> +                                      10, MTK_TIMEOUT);
>         if (rc < 0) {
>                 dev_err(nfc->dev, "subpage done timeout\n");
>                 bitflips = -EIO;
> 
>
Jorge Ramirez-Ortiz Aug. 29, 2016, 8:56 a.m. UTC | #2
On 08/29/2016 10:10 AM, Boris Brezillon wrote:
> +Jorge

Thanks Boris. Xiaolei is looking into this - he is aware of the patch.
RogerCC.Lin Aug. 29, 2016, 11:40 a.m. UTC | #3
This series fix chances to write incorrect ECC data which may cause
uncorrectable ECC error when reading.

changes since v1:
- separate patches into 2.
- use shift operator with byte access to avoid endianness conversion
problems.
- follow linux coding style.

The patch passed the test of UBIFS file-system read/write on Mediatek's
RFB. The tested driver is checked-out from LEDE OpenWRT project's
upstream driver, which is pretty much same as nand/next branch upstream
driver(git clone https://git.lede-project.org/source.git).

RogerCC Lin (2):
  mtd: nand: fix generating over-boundary ECC data when writing.
  mtd: nand: fix chances to create incomplete ECC data when writing.

 drivers/mtd/nand/mtk_ecc.c  |   12 ++++++++----
 drivers/mtd/nand/mtk_nand.c |    8 ++++----
 2 files changed, 12 insertions(+), 8 deletions(-)
RogerCC.Lin Aug. 29, 2016, 11:53 a.m. UTC | #4
Hi Boris and all,

I am sorry that though I have change the sender name in the email, but
It seems the patchwork still show the wrong name, I will check what
wrong with my email, sorry for the inconvenience.

Roger
diff mbox

Patch

diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c
index 25a4fbd..0a7ce8d 100644
--- a/drivers/mtd/nand/mtk_ecc.c
+++ b/drivers/mtd/nand/mtk_ecc.c
@@ -366,7 +366,8 @@  int mtk_ecc_encode(struct mtk_ecc *ecc, struct
mtk_ecc_config *config,
                   u8 *data, u32 bytes)
 {
        dma_addr_t addr;
-       u32 *p, len, i;
+       u8 *p;
+       u32 len, i, val;
        int ret = 0;

        addr = dma_map_single(ecc->dev, data, bytes, DMA_TO_DEVICE);
@@ -395,8 +396,11 @@  int mtk_ecc_encode(struct mtk_ecc *ecc, struct
mtk_ecc_config *config,
        p = (u32 *)(data + bytes);

        /* write the parity bytes generated by the ECC back to the OOB
region */
-       for (i = 0; i < len; i++)
-               p[i] = readl(ecc->regs + ECC_ENCPAR(i));
+       for (i = 0; i < len; i++) {
+               if ((i % 4) == 0)
+                       val = readl(ecc->regs + ECC_ENCPAR(i >> 2));
+               p[i] = *((u8 *)&val + (i % 4));
+       }
 timeout:

        dma_unmap_single(ecc->dev, addr, bytes, DMA_TO_DEVICE);
diff --git a/drivers/mtd/nand/mtk_nand.c b/drivers/mtd/nand/mtk_nand.c
index ddaa2ac..9c49121 100644
--- a/drivers/mtd/nand/mtk_nand.c
+++ b/drivers/mtd/nand/mtk_nand.c
@@ -699,8 +699,8 @@  static int mtk_nfc_do_write_page(struct mtd_info
*mtd, struct nand_chip *chip,
        }

        ret = readl_poll_timeout_atomic(nfc->regs + NFI_ADDRCNTR, reg,
-                                       (reg & CNTR_MASK) >=
chip->ecc.steps,
-                                       10, MTK_TIMEOUT);
+                                       ((reg & CNTR_MASK) >> 12) >=
+                                       chip->ecc.steps, 10,
MTK_TIMEOUT);