Message ID | 20240430-loongson1-nand-v7-2-60787c314fa4@gmail.com (mailing list archive)
---|---
State | Handled Elsewhere
Series | Add support for Loongson-1 NAND
Hi,

devnull+keguang.zhang.gmail.com@kernel.org wrote on Tue, 30 Apr 2024
19:11:11 +0800:

> From: Keguang Zhang <keguang.zhang@gmail.com>
>
> nand_read_subpage() reads data and ECC data by two separate
> operations.
> This patch allows the NAND controllers who support
> monolithic page read to do subpage read by a single operation,
> which is more effective than nand_read_subpage().

I am a bit puzzled by this change. Usually nand_read_subpage is used
for optimizations (when less data than a full page must be retrieved).
I know it may be used in other cases (because it's easier for the core
to support a wide range of controllers). Can you please show speed test
results before I consider merging this patch?

The monolithic read was not supposed to improve throughput but to help
with very limited controllers.

Thanks,
Miquèl
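For context, here is a minimal sketch of what a "monolithic" page read boils down to, assuming a controller that can clock out the main data and the OOB area in a single transfer. This is illustrative only, not the core's or the Loongson-1 driver's actual code, and the function name is made up.

```c
#include <linux/mtd/mtd.h>
#include <linux/mtd/rawnand.h>

/*
 * Illustrative sketch only: a "monolithic" read issues a single
 * READ0/READSTART sequence and transfers the main data plus the OOB
 * area in one go, instead of splitting the access into a data part
 * and a separate ECC/OOB part.
 */
static int monolithic_page_read_sketch(struct nand_chip *chip, u8 *buf,
				       int page)
{
	struct mtd_info *mtd = nand_to_mtd(chip);

	/* One operation: the whole page plus OOB in a single transfer */
	return nand_read_page_op(chip, page, 0, buf,
				 mtd->writesize + mtd->oobsize);
}
```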
On Mon, May 6, 2024 at 3:17 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi,
>
> devnull+keguang.zhang.gmail.com@kernel.org wrote on Tue, 30 Apr 2024
> 19:11:11 +0800:
>
> > From: Keguang Zhang <keguang.zhang@gmail.com>
> >
> > nand_read_subpage() reads data and ECC data by two separate
> > operations.
> > This patch allows the NAND controllers who support
> > monolithic page read to do subpage read by a single operation,
> > which is more effective than nand_read_subpage().
>
> I am a bit puzzled by this change. Usually nand_read_subpage is used
> for optimizations (when less data than a full page must be retrieved).
> I know it may be used in other cases (because it's easier for the core
> to support a wide range of controllers). Can you please show speed test
> results before I consider merging this patch?
>
With this patch:
# flash_speed -c 128 -d /dev/mtd1
scanning for bad eraseblocks
scanned 128 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 2112 KiB/s
testing eraseblock read speed
eraseblock read speed is 3454 KiB/s
testing page write speed
page write speed is 1915 KiB/s
testing page read speed
page read speed is 2999 KiB/s
testing 2 page write speed
2 page write speed is 2000 KiB/s
testing 2 page read speed
2 page read speed is 3207 KiB/s
Testing erase speed
erase speed is 72495 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 74135 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 74812 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 75502 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 75851 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 75851 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 76204 KiB/s
finished

Without this patch:
# flash_speed -c 128 -d /dev/mtd1
scanning for bad eraseblocks
scanned 128 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 2074 KiB/s
testing eraseblock read speed
eraseblock read speed is 2895 KiB/s
testing page write speed
page write speed is 998 KiB/s
testing page read speed
page read speed is 1499 KiB/s
testing 2 page write speed
2 page write speed is 1002 KiB/s
testing 2 page read speed
2 page read speed is 1554 KiB/s
Testing erase speed
erase speed is 76560 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 74019 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 74769 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 75149 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 75921 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 75921 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 75921 KiB/s
finished

The throughput of the former is twice that of the latter.

> The monolithic read was not supposed to improve throughput but to help
> with very limited controllers.
>
> Thanks,
> Miquèl
Hi Keguang,

keguang.zhang@gmail.com wrote on Mon, 20 May 2024 18:42:30 +0800:

> On Mon, May 6, 2024 at 3:17 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi,
> >
> > devnull+keguang.zhang.gmail.com@kernel.org wrote on Tue, 30 Apr 2024
> > 19:11:11 +0800:
> >
> > > From: Keguang Zhang <keguang.zhang@gmail.com>
> > >
> > > nand_read_subpage() reads data and ECC data by two separate
> > > operations.
> > > This patch allows the NAND controllers who support
> > > monolithic page read to do subpage read by a single operation,
> > > which is more effective than nand_read_subpage().
> >
> > I am a bit puzzled by this change. Usually nand_read_subpage is used
> > for optimizations (when less data than a full page must be retrieved).
> > I know it may be used in other cases (because it's easier for the core
> > to support a wide range of controllers). Can you please show speed test
> > results before I consider merging this patch?
> >
> With this patch:
> # flash_speed -c 128 -d /dev/mtd1
> [...]
>
> Without this patch:
> # flash_speed -c 128 -d /dev/mtd1
> [...]
>
> The throughput of the former is twice that of the latter.

And what is your NAND controller driver?

Subpage reads are used when you only want to read a subset of a NAND
page.

Otherwise the core may use the RNDOUT command to change the pointer in
the chip's SRAM to read from a different location, but I don't see
what is impacting throughput so much, unless the driver implementation
is really sub-optimized.

Thanks,
Miquèl
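By contrast, the split access pattern Miquel describes looks roughly like the sketch below: read only the ECC steps that cover the requested range, then issue a column change (the RNDOUT/RNDOUTSTART sequence) to jump to the ECC bytes in the OOB area. This is a simplified illustration of the idea, not the nand_base.c implementation, and the actual ECC correction step is omitted.

```c
#include <linux/kernel.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/rawnand.h>

/*
 * Simplified illustration of a subpage read (ECC correction omitted):
 * one operation for the covered data steps, then a column change
 * (RNDOUT) to fetch the OOB/ECC bytes.
 */
static int subpage_read_sketch(struct nand_chip *chip, u32 data_offs,
			       u32 readlen, u8 *bufpoi, int page)
{
	struct mtd_info *mtd = nand_to_mtd(chip);
	/* Round the requested range out to full ECC steps */
	u32 start = round_down(data_offs, chip->ecc.size);
	u32 end = round_up(data_offs + readlen, chip->ecc.size);
	int ret;

	/* First operation: READ0/READSTART plus only the covered bytes */
	ret = nand_read_page_op(chip, page, start, bufpoi + start,
				end - start);
	if (ret)
		return ret;

	/* Second operation: RNDOUT to the OOB area holding the ECC bytes */
	return nand_change_read_column_op(chip, mtd->writesize, chip->oob_poi,
					  mtd->oobsize, false);
}
```

For small requests this avoids transferring the whole page, which is the optimization Miquel refers to; on a controller that only handles full-page transfers efficiently, the two separate operations can instead cost more than one monolithic read.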
Sorry! I admit this is taking a shortcut. I will drop this patch and
implement subpage read instead. Thanks!

On Mon, May 20, 2024 at 11:33 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Keguang,
>
> keguang.zhang@gmail.com wrote on Mon, 20 May 2024 18:42:30 +0800:
>
> > On Mon, May 6, 2024 at 3:17 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > Hi,
> > >
> > > devnull+keguang.zhang.gmail.com@kernel.org wrote on Tue, 30 Apr 2024
> > > 19:11:11 +0800:
> > >
> > > > From: Keguang Zhang <keguang.zhang@gmail.com>
> > > >
> > > > nand_read_subpage() reads data and ECC data by two separate
> > > > operations.
> > > > This patch allows the NAND controllers who support
> > > > monolithic page read to do subpage read by a single operation,
> > > > which is more effective than nand_read_subpage().
> > >
> > > I am a bit puzzled by this change. Usually nand_read_subpage is used
> > > for optimizations (when less data than a full page must be retrieved).
> > > I know it may be used in other cases (because it's easier for the core
> > > to support a wide range of controllers). Can you please show speed test
> > > results before I consider merging this patch?
> > >
> > With this patch:
> > # flash_speed -c 128 -d /dev/mtd1
> > [...]
> >
> > Without this patch:
> > # flash_speed -c 128 -d /dev/mtd1
> > [...]
> >
> > The throughput of the former is twice that of the latter.
>
> And what is your NAND controller driver?
>
> Subpage reads are used when you only want to read a subset of a NAND
> page.
>
> Otherwise the core may use the RNDOUT command to change the pointer in
> the chip's SRAM to read from a different location, but I don't see
> what is impacting throughput so much, unless the driver implementation
> is really sub-optimized.
>
> Thanks,
> Miquèl

--
Best regards,
Keguang Zhang
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index d7dbbd469b89..eeb654c6b4fc 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -3630,7 +3630,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
 							      oob_required,
 							      page);
 			else if (!aligned && NAND_HAS_SUBPAGE_READ(chip) &&
-				 !oob)
+				 !NAND_HAS_MONOLITHIC_READ(chip) && !oob)
 				ret = chip->ecc.read_subpage(chip, col, bytes,
 							     bufpoi, page);
 			else
@@ -3648,7 +3648,8 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
 			 * partial pages or when a bounce buffer is required.
 			 */
 			if (use_bounce_buf) {
-				if (!NAND_HAS_SUBPAGE_READ(chip) && !oob &&
+				if ((!NAND_HAS_SUBPAGE_READ(chip) ||
+				     NAND_HAS_MONOLITHIC_READ(chip)) && !oob &&
 				    !(mtd->ecc_stats.failed - ecc_stats.failed) &&
 				    (ops->mode != MTD_OPS_RAW)) {
 					chip->pagecache.page = realpage;
diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h
index e84522e31301..92d3ab491c9c 100644
--- a/include/linux/mtd/rawnand.h
+++ b/include/linux/mtd/rawnand.h
@@ -150,6 +150,11 @@ struct gpio_desc;
 /* Device needs 3rd row address cycle */
 #define NAND_ROW_ADDR_3		BIT(14)
 
+/* Device supports monolithic reads */
+#define NAND_MONOLITHIC_READ	BIT(15)
+/* Macros to identify the above */
+#define NAND_HAS_MONOLITHIC_READ(chip) ((chip->options & NAND_MONOLITHIC_READ))
+
 /* Non chip related options */
 /* This option skips the bbt scan during initialization. */
 #define NAND_SKIP_BBTSCAN	BIT(16)
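Had the proposal been kept, a controller driver would presumably have opted in from its ->attach_chip() hook, roughly as sketched below. This is hypothetical: the NAND_MONOLITHIC_READ flag only exists with the (ultimately dropped) patch above applied, and the function name is made up.

```c
#include <linux/mtd/rawnand.h>

/* Hypothetical ->attach_chip() opting in to the proposed flag */
static int example_nand_attach_chip(struct nand_chip *chip)
{
	/* Usual ECC setup for an on-host (controller-side) ECC engine */
	chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;

	/*
	 * Only meaningful with the patch above applied: tells
	 * nand_do_read_ops() to use full-page reads instead of
	 * nand_read_subpage() for this chip.
	 */
	chip->options |= NAND_MONOLITHIC_READ;

	return 0;
}
```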