Message ID | 20131202103305.GB2466@localhost (mailing list archive) |
---|---|
State | New, archived |
Hi Ezequiel,

Ezequiel Garcia <ezequiel.garcia@free-electrons.com> writes:

>> Here is the nandtest run:
>>
>> nandtest /dev/mtd4
>> ECC corrections: 0
>> ECC failures : 0
>> Bad blocks : 8
>> BBT blocks : 0
>> Bad block at 0x06700000
>> Bad block at 0x06720000
>> Bad block at 0x06740000
>> Bad block at 0x06760000
>> Bad block at 0x06780000
>> Bad block at 0x067a0000
>> Bad block at 0x067c0000
>> Bad block at 0x067e0000
>>
>> Finished pass 1 successfully
>>
>
> Hm, so something *is* working properly. Notice that mtd4 has its 8 last blocks
> marked 'bad' because it holds the bad block table.

I think I am missing something here: u-boot reports that the BBTs are
located here:

>> Bad block table found at page 65472, version 0x01
>> Bad block table found at page 65408, version 0x01

which seems to indicate that they are each 64 pages (i.e. 128KB)
long. AFAICT, the log output seems to indicate that blocks are 128KB in
size:

>> Bad block at 0x06700000
>> Bad block at 0x06720000
>> Bad block at 0x06740000
>> Bad block at 0x06760000
>> Bad block at 0x06780000
>> Bad block at 0x067a0000
>> Bad block at 0x067c0000
>> Bad block at 0x067e0000

which is also what mtdinfo reports:

root@mood:~# mtdinfo /dev/mtd4
mtd4
Name: jffs2
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 832 (109051904 bytes, 104.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 2048 bytes
OOB size: 64 bytes
Character device major/minor: 90:8
Bad blocks are allowed: true
Device is writable: true

So unless I am missing something, *according* to u-boot there are 2 BBTs
and each BBT is 1 block long. Additionally - and this is a point I would
like to understand - the reported location of this BBT is not after the
128MB of the chip but before the end of those 128MB. This also means
before the end of the last partition. So the definition of the partition
itself gives the impression that all the blocks in the partition are
available when in fact, the last two blocks cannot be used at all (or 8
if you are right).

Put in a different manner, if I nandwrite a 104.0 MiB image to the last
partition (/dev/mtd4), I am guaranteed to try and overwrite the BBT.

Do not hesitate to tell me what I am missing here or if it is expected
that the partition definition (start address and length) *includes* the
bbt pages/blocks.

> If you don't mind running a few more rounds, then it would be nice to do:
>
> $ nandtest --passes {N}
>
> So we run the test a few more times, just to be sure.

root@mood:~# nandtest --passes 2 /dev/mtd4
ECC corrections: 0
ECC failures : 0
Bad blocks : 8
BBT blocks : 0
Bad block at 0x06700000
Bad block at 0x06720000
Bad block at 0x06740000
Bad block at 0x06760000
Bad block at 0x06780000
Bad block at 0x067a0000
Bad block at 0x067c0000
Bad block at 0x067e0000
Finished pass 1 successfully
Bad block at 0x06700000
Bad block at 0x06720000
Bad block at 0x06740000
Bad block at 0x06760000
Bad block at 0x06780000
Bad block at 0x067a0000
Bad block at 0x067c0000
Bad block at 0x067e0000
Finished pass 2 successfully

>> root@mood:~# nandwrite -p /dev/mtd4 /tmp/toto
>> ...
>> Writing data to block 795 at offset 0x6360000
>> Writing data to block 796 at offset 0x6380000
>> Writing data to block 797 at offset 0x63a0000
>> Writing data to block 798 at offset 0x63c0000
>> Writing data to block 799 at offset 0x63e0000
>> Writing data to block 800 at offset 0x6400000
>> [ 1509.210395] pxa3xx-nand d00d0000.nand: Ready time out!!!
>> libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 800, offset 0)
>> error 5 (Input/output error)
> [..]
>
>> [ 1513.810387] pxa3xx-nand d00d0000.nand: Ready time out!!!
>> libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 823, offset 0)
>> error 5 (Input/output error)
>> Erasing failed write from 0x66e0000 to 0x66fffff
>
> Hm.. so you get errors when writing to mtd4 blocks from 800 to 823.
>
> Is that completely reproducible, IOW do you get always the error
> on those blocks?

Well, I did the same test again, and I think the answer is 'no':

root@mood:~# dd if=/dev/mtd4 of=/tmp/toto
212992+0 records in
212992+0 records out
109051904 bytes (109 MB) copied, 11.136 s, 9.8 MB/s
root@mood:~# flash_erase /dev/mtd4 0 0
Erasing 128 Kibyte @ 66e0000 -- 98 % complete flash_erase: Skipping bad block at 06700000
flash_erase: Skipping bad block at 06720000
flash_erase: Skipping bad block at 06740000
flash_erase: Skipping bad block at 06760000
flash_erase: Skipping bad block at 06780000
flash_erase: Skipping bad block at 067a0000
flash_erase: Skipping bad block at 067c0000
flash_erase: Skipping bad block at 067e0000
Erasing 128 Kibyte @ 67e0000 -- 100 % complete
root@mood:~# nandwrite -p /dev/mtd4 /tmp/toto
Writing data to block 0 at offset 0x0
Writing data to block 1 at offset 0x20000
Writing data to block 2 at offset 0x40000
Writing data to block 3 at offset 0x60000
Writing data to block 4 at offset 0x80000
Writing data to block 5 at offset 0xa0000
Writing data to block 6 at offset 0xc0000
Writing data to block 7 at offset 0xe0000
...
Writing data to block 818 at offset 0x6640000
Writing data to block 819 at offset 0x6660000
Writing data to block 820 at offset 0x6680000
Writing data to block 821 at offset 0x66a0000
Writing data to block 822 at offset 0x66c0000
Writing data to block 823 at offset 0x66e0000
Writing data to block 824 at offset 0x6700000
Bad block at 6700000, 1 block(s) from 6700000 will be skipped
Writing data to block 825 at offset 0x6720000
Bad block at 6720000, 1 block(s) from 6720000 will be skipped
Writing data to block 826 at offset 0x6740000
Bad block at 6740000, 1 block(s) from 6740000 will be skipped
Writing data to block 827 at offset 0x6760000
Bad block at 6760000, 1 block(s) from 6760000 will be skipped
Writing data to block 828 at offset 0x6780000
Bad block at 6780000, 1 block(s) from 6780000 will be skipped
Writing data to block 829 at offset 0x67a0000
Bad block at 67a0000, 1 block(s) from 67a0000 will be skipped
Writing data to block 830 at offset 0x67c0000
Bad block at 67c0000, 1 block(s) from 67c0000 will be skipped
Writing data to block 831 at offset 0x67e0000
Bad block at 67e0000, 1 block(s) from 67e0000 will be skipped
Writing data to block 832 at offset 0x6800000
libmtd: error!: bad eraseblock number 832, mtd4 has 832 eraseblocks
nandwrite: error!: /dev/mtd4: MTD get bad block failed
error 22 (Invalid argument)
nandwrite: error!: Data was only partially written due to error
error 22 (Invalid argument)
root@mood:~#

Well, this time I don't get the "ready timeout" error I had last time
around block 800. I did the test a second time (flash_erase and then
nandwrite) and got the same result, i.e. no error before 824.

>> Writing data to block 824 at offset 0x6700000
>> Bad block at 6700000, 1 block(s) from 6700000 will be skipped
>> Writing data to block 825 at offset 0x6720000
>> Bad block at 6720000, 1 block(s) from 6720000 will be skipped
>> Writing data to block 826 at offset 0x6740000
>> Bad block at 6740000, 1 block(s) from 6740000 will be skipped
>> Writing data to block 827 at offset 0x6760000
>> Bad block at 6760000, 1 block(s) from 6760000 will be skipped
>> Writing data to block 828 at offset 0x6780000
>> Bad block at 6780000, 1 block(s) from 6780000 will be skipped
>> Writing data to block 829 at offset 0x67a0000
>> Bad block at 67a0000, 1 block(s) from 67a0000 will be skipped
>> Writing data to block 830 at offset 0x67c0000
>> Bad block at 67c0000, 1 block(s) from 67c0000 will be skipped
>> Writing data to block 831 at offset 0x67e0000
>> Bad block at 67e0000, 1 block(s) from 67e0000 will be skipped
>> Writing data to block 832 at offset 0x6800000
>> libmtd: error!: bad eraseblock number 832, mtd4 has 832 eraseblocks
>> nandwrite: error!: /dev/mtd4: MTD get bad block failed
>> error 22 (Invalid argument)
>> nandwrite: error!: Data was only partially written due to error
>> error 22 (Invalid argument)
>>
>
> These 8 blocks (824-832) that have been skipped are the ones marked
> as 'bad' because they hold the bad block table.
>
>> This is the kind of error I got last time but I think I am starting to
>> understand the root cause now. Tell me if I get it right: what is
>> understood as bad blocks above (and in nandtest) is in fact the two bad
>> block tables reported during boot:
>>
>> NAND device: Manufacturer ID: 0xad, Chip ID: 0xf1 (Hynix H27U1G8F2BTR-BC)
>> NAND device: 128MiB, SLC, page size: 2048, OOB size: 64
>> Bad block table found at page 65472, version 0x01
>> Bad block table found at page 65408, version 0x01
>>
>
> Yes and no :-) The bad block table consists of 8 blocks at the end
> of the flash device. These blocks are marked as 'reserved' and nandtest
> or any other userspace writing/erasing tool will skip them.
>
> Hence, the bad block table is what explains the skipping of the group
> of blocks [824..832]. However, you're getting errors when writing
> data to [800..823], and it's a "Ready timeout" condition. I'm not sure
> exactly what's going on, but we can say that:
>
> * Either the waiting time is not enough, or ...
>
> * The commands (maybe some race) were badly issued so there's nothing
> to wait at all.

As I can not reproduce previous behavior (I am on the exact same
kernel), I guess it's difficult to go any further yet.

Cheers,

a+
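The geometry figures quoted in this message can be cross-checked with a few lines of arithmetic. The following is only a sketch in Python: the numbers are copied from the mtdinfo, U-Boot and dd output above, the variable names are illustrative, and the 512-byte dd record size is an assumption (dd's default, since no bs= option was given).

```python
# Values copied from the mtdinfo, U-Boot and dd output quoted above.
PAGE_SIZE = 2048             # bytes ("Minimum input/output unit size")
ERASEBLOCK_SIZE = 131072     # bytes ("Eraseblock size")
ERASEBLOCKS = 832            # "Amount of eraseblocks" in mtd4
BBT_PAGES = (65472, 65408)   # pages reported by U-Boot
DD_RECORDS = 212992          # "212992+0 records in"
DD_RECORD_SIZE = 512         # assumption: dd default block size

pages_per_block = ERASEBLOCK_SIZE // PAGE_SIZE
bbt_gap_pages = BBT_PAGES[0] - BBT_PAGES[1]
partition_bytes = ERASEBLOCKS * ERASEBLOCK_SIZE
dd_bytes = DD_RECORDS * DD_RECORD_SIZE

print("pages per eraseblock:", pages_per_block)
print("pages between the two tables:", bbt_gap_pages)
print("partition size:", partition_bytes, "bytes =", partition_bytes / 2**20, "MiB")
print("dd copied the whole partition:", dd_bytes == partition_bytes)
```

The two tables sit exactly one eraseblock apart, and the dd image covers all 832 eraseblocks, reserved ones included.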
Hi Arnaud,

I'll try to explain the numbers you're observing. If I say something stupid
along the way, feel free to correct me.

When I wrote this driver I got puzzled by the same questions you're asking
so rest assured I completely understand your confusion ;-)

Let me begin by writing your NAND layout, just for reference.

0x000000000000-0x000000180000 : "u-boot"
0x000000180000-0x0000001a0000 : "u-boot-env"
0x000000200000-0x000000800000 : "uImage"
0x000000800000-0x000001800000 : "minirootfs"
0x000001800000-0x000008000000 : "jffs2"

Your NAND device has a page size of 2048 bytes. Therefore, the following
message:

> >> Bad block table found at page 65472, version 0x01
> >> Bad block table found at page 65408, version 0x01

is telling us that there are two bad block tables (the table and its
so-called mirror): one at page 65472 and another at page 65408.
These page numbers correspond to offsets 0x7fe0000 and 0x7fc0000
respectively.

Then, considering the mtd4 (aka "jffs2") partition starts at 0x1800000,
we can say the bad block tables start at offsets 0x67e0000 and 0x67c0000,
from the start of mtd4.

And since you run 'nandtest' on a partition and not on the entire device
you get the bad block locations as mtd4 offsets:

> >> Bad block at 0x06700000
> >> Bad block at 0x06720000
> >> Bad block at 0x06740000
> >> Bad block at 0x06760000
> >> Bad block at 0x06780000
> >> Bad block at 0x067a0000
> >> Bad block at 0x067c0000 <-- Mirror bad block table
> >> Bad block at 0x067e0000 <-- Bad block table

The reason the numbers appear as "upside-down" is because the pxa3xx-nand
driver uses these BBT options:

	.options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
		| NAND_BBT_2BIT | NAND_BBT_VERSION,

So, the NAND core code writes the table to the last block (as per
NAND_BBT_LASTBLOCK option). Of course, it must write the bad block
table *inside* the flash (before the end of its last partition) and
to prevent any user I/O operation from losing the BBT the NAND core
marks those blocks as bad using a special 'reserved' mark.

Now, why does NAND reserve eight blocks, if there are only two tables?
Well, you'll be able to find this in the driver:

static struct nand_bbt_descr bbt_main_descr = {
	/* stuff */
	.maxblocks = 8, /* Last 8 blocks in each chip */
};

The snippet above asks the NAND core to scan the last 8 blocks when searching
for the in-flash bad block table. The NAND core will also reserve these
8 blocks as the maximum amount of blocks that can be used to store a bad
block table (I guess that's in case one block gets 'really' bad).

In conclusion, answering your question, yes, this is correct:

> it's expected that the partition definition (start address and length)
> *includes* the bbt pages/blocks.
>

The flash reserves 8 x 128 KiB = 1 MiB for the bad block table and so
whenever you prepare your filesystem, you must know that the last
partition is actually 1 MiB smaller.

On the other issue, about the timeout...

> root@mood:~# nandwrite -p /dev/mtd4 /tmp/toto
> Writing data to block 0 at offset 0x0
> Writing data to block 1 at offset 0x20000
> Writing data to block 2 at offset 0x40000
> Writing data to block 3 at offset 0x60000
> Writing data to block 4 at offset 0x80000
> Writing data to block 5 at offset 0xa0000
> Writing data to block 6 at offset 0xc0000
> Writing data to block 7 at offset 0xe0000
> ...
> Writing data to block 818 at offset 0x6640000
> Writing data to block 819 at offset 0x6660000
> Writing data to block 820 at offset 0x6680000
> Writing data to block 821 at offset 0x66a0000
> Writing data to block 822 at offset 0x66c0000
> Writing data to block 823 at offset 0x66e0000
> Writing data to block 824 at offset 0x6700000
> Bad block at 6700000, 1 block(s) from 6700000 will be skipped
> Writing data to block 825 at offset 0x6720000
> Bad block at 6720000, 1 block(s) from 6720000 will be skipped
> Writing data to block 826 at offset 0x6740000
> Bad block at 6740000, 1 block(s) from 6740000 will be skipped
> Writing data to block 827 at offset 0x6760000
> Bad block at 6760000, 1 block(s) from 6760000 will be skipped
> Writing data to block 828 at offset 0x6780000
> Bad block at 6780000, 1 block(s) from 6780000 will be skipped
> Writing data to block 829 at offset 0x67a0000
> Bad block at 67a0000, 1 block(s) from 67a0000 will be skipped
> Writing data to block 830 at offset 0x67c0000
> Bad block at 67c0000, 1 block(s) from 67c0000 will be skipped
> Writing data to block 831 at offset 0x67e0000
> Bad block at 67e0000, 1 block(s) from 67e0000 will be skipped
> Writing data to block 832 at offset 0x6800000
> libmtd: error!: bad eraseblock number 832, mtd4 has 832 eraseblocks
> nandwrite: error!: /dev/mtd4: MTD get bad block failed
> error 22 (Invalid argument)
> nandwrite: error!: Data was only partially written due to error
> error 22 (Invalid argument)
> root@mood:~#
>

OK... Then I guess the driver works fine but you're trying to write
an image bigger than possible. Where did you get such a filesystem?

Maybe reserving 8 blocks is *a lot* (I personally think it is a lot,
but maybe it's more robust that way).

> > Hence, the bad block table is what explains the skipping of the group
> > of blocks [824..832]. However, you're getting errors when writing
> > data to [800..823], and it's a "Ready timeout" condition. I'm not sure
> > exactly what's going on, but we can say that:
> >
> > * Either the waiting time is not enough, or ...
> >
> > * The commands (maybe some race) were badly issued so there's nothing
> > to wait at all.
>
> As I can not reproduce previous behavior (I am on the exact same
> kernel), I guess it's difficult to go any further yet.
>

Hm.. that's not cool. The fact that you got those timeouts, but now
you don't have them anymore only worries me even more.

Feel free to ask any more questions about the NAND stuff, although I've
shared with you pretty much all I could collect from my reading of
the code.
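The page-to-offset arithmetic in this explanation can be reproduced as follows. This is a minimal sketch in Python, not driver code: the constants come from the boot log and partition layout quoted above, and the helper names (`page_to_device_offset`, `device_to_mtd4_offset`) are made up for the illustration.

```python
PAGE_SIZE = 2048                 # bytes, from the boot log
BLOCK_SIZE = 128 * 1024          # eraseblock size reported by mtdinfo
CHIP_SIZE = 128 * 1024 * 1024    # "NAND device: 128MiB"
MTD4_START = 0x1800000           # start of the "jffs2" partition
MAXBLOCKS = 8                    # .maxblocks in the pxa3xx-nand BBT descriptor

def page_to_device_offset(page):
    """Byte offset of a page from the start of the chip (hypothetical helper)."""
    return page * PAGE_SIZE

def device_to_mtd4_offset(offset):
    """Same location expressed relative to the start of mtd4 (hypothetical helper)."""
    return offset - MTD4_START

for page in (65472, 65408):
    dev = page_to_device_offset(page)
    print(f"page {page}: device 0x{dev:07x}, mtd4 0x{device_to_mtd4_offset(dev):07x}")

# The reserved window is the last MAXBLOCKS eraseblocks of the chip.
reserved_start = CHIP_SIZE - MAXBLOCKS * BLOCK_SIZE
reserved_in_mtd4 = [
    device_to_mtd4_offset(reserved_start + i * BLOCK_SIZE) for i in range(MAXBLOCKS)
]
print([f"0x{off:08x}" for off in reserved_in_mtd4])

# Everything in mtd4 below the reserved window is available for data.
usable_bytes = device_to_mtd4_offset(reserved_start)
print(usable_bytes // 2**20, "MiB usable in mtd4")
```

The list of reserved offsets matches the eight "Bad block at" lines printed by nandtest, and the usable size is the 104 MiB partition minus the 1 MiB reserved for the tables.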
Hi Ezequiel,

Ezequiel Garcia <ezequiel.garcia@free-electrons.com> writes:

> I'll try to explain the numbers you're observing. If I say something stupid
> along the way, feel free to correct me.
>
> When I wrote this driver I got puzzled by the same questions you're asking
> so rest assured I completely understand your confusion ;-)

Thanks. Unless one spends the time being taught it or reading the code,
the logic of BBT mapping and reservation is not exactly
straightforward. Thanks for the explanation. More news below.

> The reason the numbers appear as "upside-down" is because the pxa3xx-nand
> driver uses these BBT options:
>
> 	.options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
> 		| NAND_BBT_2BIT | NAND_BBT_VERSION,
>
> So, the NAND core code writes the table to the last block (as per
> NAND_BBT_LASTBLOCK option). Of course, it must write the bad block
> table *inside* the flash (before the end of its last partition) and
> to prevent any user I/O operation from losing the BBT the NAND core
> marks those blocks as bad using a special 'reserved' mark.

ok.

> Now, why does NAND reserve eight blocks, if there are only two tables?
> Well, you'll be able to find this in the driver:
>
> static struct nand_bbt_descr bbt_main_descr = {
> 	/* stuff */
> 	.maxblocks = 8, /* Last 8 blocks in each chip */
> };
>
> The snippet above asks the NAND core to scan the last 8 blocks when searching
> for the in-flash bad block table. The NAND core will also reserve these
> 8 blocks as the maximum amount of blocks that can be used to store a bad
> block table (I guess that's in case one block gets 'really' bad).
>
> In conclusion, answering your question, yes, this is correct:
>
>> it's expected that the partition definition (start address and length)
>> *includes* the bbt pages/blocks.
>>
>
> The flash reserves 8 x 128 KiB = 1 MiB for the bad block table and so
> whenever you prepare your filesystem, you must know that the last
> partition is actually 1 MiB smaller.

ok. Will put a comment in the .dts about that.

> On the other issue, about the timeout...
>
>> root@mood:~# nandwrite -p /dev/mtd4 /tmp/toto
>> Writing data to block 0 at offset 0x0
>> Writing data to block 1 at offset 0x20000
>> Writing data to block 2 at offset 0x40000
>> Writing data to block 3 at offset 0x60000
>> Writing data to block 4 at offset 0x80000
>> Writing data to block 5 at offset 0xa0000
>> Writing data to block 6 at offset 0xc0000
>> Writing data to block 7 at offset 0xe0000
>> ...
>> Writing data to block 818 at offset 0x6640000
>> Writing data to block 819 at offset 0x6660000
>> Writing data to block 820 at offset 0x6680000
>> Writing data to block 821 at offset 0x66a0000
>> Writing data to block 822 at offset 0x66c0000
>> Writing data to block 823 at offset 0x66e0000
>> Writing data to block 824 at offset 0x6700000
>> Bad block at 6700000, 1 block(s) from 6700000 will be skipped
>> Writing data to block 825 at offset 0x6720000
>> Bad block at 6720000, 1 block(s) from 6720000 will be skipped
>> Writing data to block 826 at offset 0x6740000
>> Bad block at 6740000, 1 block(s) from 6740000 will be skipped
>> Writing data to block 827 at offset 0x6760000
>> Bad block at 6760000, 1 block(s) from 6760000 will be skipped
>> Writing data to block 828 at offset 0x6780000
>> Bad block at 6780000, 1 block(s) from 6780000 will be skipped
>> Writing data to block 829 at offset 0x67a0000
>> Bad block at 67a0000, 1 block(s) from 67a0000 will be skipped
>> Writing data to block 830 at offset 0x67c0000
>> Bad block at 67c0000, 1 block(s) from 67c0000 will be skipped
>> Writing data to block 831 at offset 0x67e0000
>> Bad block at 67e0000, 1 block(s) from 67e0000 will be skipped
>> Writing data to block 832 at offset 0x6800000
>> libmtd: error!: bad eraseblock number 832, mtd4 has 832 eraseblocks
>> nandwrite: error!: /dev/mtd4: MTD get bad block failed
>> error 22 (Invalid argument)
>> nandwrite: error!: Data was only partially written due to error
>> error 22 (Invalid argument)
>> root@mood:~#
>>
>
> OK... Then I guess the driver works fine but you're trying to write
> an image bigger than possible. Where did you get such a filesystem?

The image is a dd from the partition itself. Trying to put it back will
obviously not work for the reason you provide above, i.e. because dd
reads to the end of the partition (including the BBT) but then during
the write operation the reserved blocks cannot be written back.

> Maybe reserving 8 blocks is *a lot* (I personally think it is a lot,
> but maybe it's more robust that way).
>
>> > Hence, the bad block table is what explains the skipping of the group
>> > of blocks [824..832]. However, you're getting errors when writing
>> > data to [800..823], and it's a "Ready timeout" condition. I'm not sure
>> > exactly what's going on, but we can say that:
>> >
>> > * Either the waiting time is not enough, or ...
>> >
>> > * The commands (maybe some race) were badly issued so there's nothing
>> > to wait at all.
>>
>> As I can not reproduce previous behavior (I am on the exact same
>> kernel), I guess it's difficult to go any further yet.
>>
>
> Hm.. that's not cool. The fact that you got those timeouts, but now
> you don't have them anymore only worries me even more.

Well, before submitting the patches for Armada-based ReadyNAS .dts
files, I decided to give each of them a try on their respective hardware
and I was somewhat lucky (if I can say so) on my RN2120 and got the
following. Note that the NAND chip is the same but the SoC on the NAS is
different (dual-core Armada XP MV78230 on RN2120 instead of Armada 370 on
RN102). The same tree is used in each case with the same MTD related
config options:

root@thin:~# flash_erase /dev/mtd4 0 0
...
root@thin:~# nandwrite -p /dev/mtd4 /tmp/mtd4ro
Writing data to block 0 at offset 0x0
[ 782.766892] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 0, offset 2048)
error 5 (Input/output error)
Erasing failed write from 00000000 to 0x01ffff
Writing data to block 1 at offset 0x20000
[ 782.966891] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 1, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x020000 to 0x03ffff
Writing data to block 2 at offset 0x40000
[ 783.166891] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 2, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x040000 to 0x05ffff
Writing data to block 3 at offset 0x60000
[ 783.366889] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 3, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x060000 to 0x07ffff
Writing data to block 4 at offset 0x80000
[ 783.566890] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 4, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x080000 to 0x09ffff
Writing data to block 5 at offset 0xa0000
[ 783.766890] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 5, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x0a0000 to 0x0bffff
Writing data to block 6 at offset 0xc0000
[ 783.966890] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 6, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x0c0000 to 0x0dffff
Writing data to block 7 at offset 0xe0000
[ 784.166890] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 7, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x0e0000 to 0x0fffff
Writing data to block 8 at offset 0x100000
[ 784.366889] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 8, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x100000 to 0x11ffff
Writing data to block 9 at offset 0x120000
^C[ 784.566889] pxa3xx-nand d00d0000.nand: Ready time out!!!

So I decided to increase CHIP_DELAY_TIMEOUT in the driver as you
suggested in a previous email (HZ instead of 2*HZ/10):

root@thin:~# nandwrite -p /dev/mtd4 mtd4ro
Writing data to block 0 at offset 0x0
[ 273.917098] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 0, offset 2048)
error 5 (Input/output error)
Erasing failed write from 00000000 to 0x01ffff
Writing data to block 1 at offset 0x20000
[ 274.917094] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 1, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x020000 to 0x03ffff
Writing data to block 2 at offset 0x40000
[ 275.917094] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 2, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x040000 to 0x05ffff
Writing data to block 3 at offset 0x60000
^C[ 276.917094] pxa3xx-nand d00d0000.nand: Ready time out!!!

And then to 25*HZ/10 (I know it's a lot):

root@thin:~# nandwrite /dev/mtd4 mtd4ro
Writing data to block 0 at offset 0x0
[ 192.118574] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 0, offset 2048)
error 5 (Input/output error)
Erasing failed write from 00000000 to 0x01ffff
Writing data to block 1 at offset 0x20000
[ 194.618569] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 1, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x020000 to 0x03ffff
Writing data to block 2 at offset 0x40000
^C[ 197.118570] pxa3xx-nand d00d0000.nand: Ready time out!!!

So I guess, if you have any (non hardware destructive ;-) ) idea, I now
have a platform on which the problem is reproducible. Meanwhile, I will
try and do the same on my RN104 (should be the same as the RN102).

Cheers,

a+
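One thing the three runs above allow is a comparison between the configured CHIP_DELAY_TIMEOUT and the spacing of the "Ready time out!!!" messages. The sketch below (Python, illustrative only) converts each timeout expression to seconds and compares it with the kernel timestamps copied from the logs. HZ is not stated in the thread, so the value of 250 is an assumption; the result in seconds is the same for the usual values 100, 250 and 1000.

```python
def timeout_seconds(numerator, denominator=10, hz=250):
    """Evaluate (numerator * HZ / denominator) jiffies in seconds.

    Mirrors C integer arithmetic, where `2 * HZ/10` is `(2 * HZ) / 10`.
    hz=250 is an assumed value; the thread does not give the kernel's HZ.
    """
    jiffies = (numerator * hz) // denominator
    return jiffies / hz

# Kernel timestamps of consecutive "Ready time out!!!" messages, per run,
# keyed by the numerator used in CHIP_DELAY_TIMEOUT (n * HZ/10).
runs = {
    2: [782.766892, 782.966891, 783.166891],    # default 2*HZ/10
    10: [273.917098, 274.917094, 275.917094],   # HZ, i.e. 10*HZ/10
    25: [192.118574, 194.618569, 197.118570],   # 25*HZ/10
}

for numerator, stamps in runs.items():
    expected = timeout_seconds(numerator)
    deltas = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    print(f"{numerator}*HZ/10 -> {expected:.1f} s, observed gaps:",
          [round(delta, 3) for delta in deltas])
```

In every run the gap between errors tracks the configured timeout almost exactly, so each failed write appears to wait out the full timeout whatever its length. That would fit a completion that never arrives rather than one that arrives late, but this is only a reading of the logs, not something the thread confirms.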
Hi Ezequiel,

arno@natisbad.org (Arnaud Ebalard) writes:

> So I guess, if you have any (non hardware destructive ;-) ) idea, I now
> have a platform on which the problem is reproducible. Meanwhile, I will
> try and do the same on my RN104 (should be the same as the RN102).

Well, on the RN104 (same chip, same kernel and same SoC as the RN102), I
have the following (I get the same with a patched driver to increase the
timeout):

root@humble:~# flash_erase /dev/mtd4 0 0
Erasing 128 Kibyte @ 72e0000 -- 99 % complete flash_erase: Skipping bad block at 07300000
flash_erase: Skipping bad block at 07320000
flash_erase: Skipping bad block at 07340000
flash_erase: Skipping bad block at 07360000
flash_erase: Skipping bad block at 07380000
flash_erase: Skipping bad block at 073a0000
flash_erase: Skipping bad block at 073c0000
flash_erase: Skipping bad block at 073e0000
Erasing 128 Kibyte @ 73e0000 -- 100 % complete
root@humble:~# nandwrite -p /dev/mtd4 mtd4ro
Writing data to block 0 at offset 0x0
[ 449.915173] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 0, offset 2048)
error 5 (Input/output error)
Erasing failed write from 00000000 to 0x01ffff
Writing data to block 1 at offset 0x20000
[ 450.115172] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 1, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x020000 to 0x03ffff
Writing data to block 2 at offset 0x40000
[ 450.315171] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 2, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x040000 to 0x05ffff
Writing data to block 3 at offset 0x60000
[ 450.515171] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 3, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x060000 to 0x07ffff
Writing data to block 4 at offset 0x80000
[ 450.715169] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 4, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x080000 to 0x09ffff
Writing data to block 5 at offset 0xa0000
[ 450.915171] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 5, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x0a0000 to 0x0bffff
Writing data to block 6 at offset 0xc0000
[ 451.115171] pxa3xx-nand d00d0000.nand: Ready time out!!!
libmtd: error!: cannot write 2048 bytes to mtd4 (eraseblock 6, offset 2048)
error 5 (Input/output error)
Erasing failed write from 0x0c0000 to 0x0dffff
Writing data to block 7 at offset 0xe0000

So I then did the same test again on the RN102 by doing a flash_erase on
/dev/mtd4 and then a nandwrite: it worked as expected (errors only at
the end when trying to overwrite the blocks reserved for the BBT).

The only difference I can see between RN104/RN2120 and RN102 is the
Debian install: all three are jessie, but the one I used on the RN102 is
armel while the ones on the RN104 and RN2120 are armhf (all kernels are
armel).

Cheers,

a+

ps: I can provide the config files for each device's kernel if you think
a specific option (debug or something else) may have an impact on how
the driver behaves.
On Mon, Dec 02, 2013 at 09:22:26PM -0300, Ezequiel Garcia wrote:
> Now, why does NAND reserve eight blocks, if there are only two tables?
> Well, you'll be able to find this in the driver:
>
> static struct nand_bbt_descr bbt_main_descr = {
> 	/* stuff */
> 	.maxblocks = 8, /* Last 8 blocks in each chip */
> };
>
> The snippet above asks the NAND core to scan the last 8 blocks when searching
> for the in-flash bad block table. The NAND core will also reserve these
> 8 blocks as the maximum amount of blocks that can be used to store a bad
> block table (I guess that's in case one block gets 'really' bad).

That doesn't reflect mainline, where you'll see:

static struct nand_bbt_descr bbt_main_descr = {
	...
	.maxblocks = NAND_BBT_SCAN_MAXBLOCKS,
	...
};

Where NAND_BBT_SCAN_MAXBLOCKS == 4.

Do you have local modifications that make .maxblocks = 8?

Brian
On Thu, Dec 05, 2013 at 01:23:33PM -0800, Brian Norris wrote:
> On Mon, Dec 02, 2013 at 09:22:26PM -0300, Ezequiel Garcia wrote:
> > Now, why does NAND reserve eight blocks, if there are only two tables?
> > Well, you'll be able to find this in the driver:
> >
> > static struct nand_bbt_descr bbt_main_descr = {
> > 	/* stuff */
> > 	.maxblocks = 8, /* Last 8 blocks in each chip */
> > };
> >
> > The snippet above asks the NAND core to scan the last 8 blocks when searching
> > for the in-flash bad block table. The NAND core will also reserve these
> > 8 blocks as the maximum amount of blocks that can be used to store a bad
> > block table (I guess that's in case one block gets 'really' bad).
>
> That doesn't reflect mainline, where you'll see:
>

I wasn't referring to nand_bbt.c but rather to the pxa3xx-nand custom
nand_bbt_descr.

$ grep "maxblocks =" drivers/mtd/nand/pxa3xx_nand.c
	.maxblocks = 8, /* Last 8 blocks in each chip */
	.maxblocks = 8, /* Last 8 blocks in each chip */

> static struct nand_bbt_descr bbt_main_descr = {
> 	...
> 	.maxblocks = NAND_BBT_SCAN_MAXBLOCKS,
> 	...
> };
>
> Where NAND_BBT_SCAN_MAXBLOCKS == 4.
>

Do you think using 8 is too much? (I'd agree to changing it)
Does it break anything to lower it to 4?
On Thu, Dec 05, 2013 at 07:23:53PM -0300, Ezequiel Garcia wrote:
> On Thu, Dec 05, 2013 at 01:23:33PM -0800, Brian Norris wrote:
> > On Mon, Dec 02, 2013 at 09:22:26PM -0300, Ezequiel Garcia wrote:
> > > Now, why does NAND reserve eight blocks, if there are only two tables?
> > > Well, you'll be able to find this in the driver:
> > >
> > > static struct nand_bbt_descr bbt_main_descr = {
> > > 	/* stuff */
> > > 	.maxblocks = 8, /* Last 8 blocks in each chip */
> > > };
> > >
> > > The snippet above asks the NAND core to scan the last 8 blocks when searching
> > > for the in-flash bad block table. The NAND core will also reserve these
> > > 8 blocks as the maximum amount of blocks that can be used to store a bad
> > > block table (I guess that's in case one block gets 'really' bad).
> >
> > That doesn't reflect mainline, where you'll see:
> >
>
> I wasn't referring to nand_bbt.c but rather to the pxa3xx-nand custom
> nand_bbt_descr.
>
> $ grep "maxblocks =" drivers/mtd/nand/pxa3xx_nand.c
> 	.maxblocks = 8, /* Last 8 blocks in each chip */
> 	.maxblocks = 8, /* Last 8 blocks in each chip */

Ah, I forgot about that.

> > static struct nand_bbt_descr bbt_main_descr = {
> > 	...
> > 	.maxblocks = NAND_BBT_SCAN_MAXBLOCKS,
> > 	...
> > };
> >
> > Where NAND_BBT_SCAN_MAXBLOCKS == 4.
> >
>
> Do you think using 8 is too much? (I'd agree to changing it)
> Does it break anything to lower it to 4?

I think it's primarily a concern if you have too many consecutive bad
blocks at the end of your flash (factory, or from wear). On a pristine
flash, the table will only ever be in the last 2 blocks, and so 4 blocks
is a reasonable guess at the maximum needed, leaving room for two blocks
to go bad. So, you'd only have a problem if 3 or more of the last 4
blocks go bad.

Brian
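The trade-off discussed here can be put in numbers. The sketch below (Python) is a deliberately simplified model, not the nand_bbt.c algorithm: it only counts how many good blocks are left in the scanned window and how much flash the window reserves. The function names are invented for the illustration, and the 128 KiB block size is the one from this thread.

```python
def bbt_fits(maxblocks, bad_blocks_in_window, tables=2):
    """True if enough good blocks remain in the window for the table and its mirror.

    Simplified model (assumption): each table needs exactly one good block.
    """
    return maxblocks - bad_blocks_in_window >= tables

def reserved_kib(maxblocks, block_kib=128):
    """Flash set aside at the end of the chip for the BBT window, in KiB."""
    return maxblocks * block_kib

for maxblocks in (4, 8):
    tolerated = max(bad for bad in range(maxblocks + 1) if bbt_fits(maxblocks, bad))
    print(f"maxblocks={maxblocks}: reserves {reserved_kib(maxblocks)} KiB, "
          f"tolerates up to {tolerated} bad blocks in the window")
```

Under this model, lowering .maxblocks from 8 to 4 gives back 512 KiB of the last partition, at the cost of tolerating two bad blocks in the window instead of six.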
diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
index 3d143fe..9bb7d35 100644
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -39,7 +39,7 @@
 #include <linux/platform_data/mtd-nand-pxa3xx.h>

 #define NAND_DEV_READY_TIMEOUT 50
-#define CHIP_DELAY_TIMEOUT (2 * HZ/10)
+#define CHIP_DELAY_TIMEOUT (10 * HZ/10)
 #define NAND_STOP_DELAY (2 * HZ/50)
 #define PAGE_CHUNK_SIZE (2048)