[RFC,nand-next,0/2] meson-nand: support for older SoCs

Message ID 20190301182922.8309-1-martin.blumenstingl@googlemail.com (mailing list archive)

Message

Martin Blumenstingl March 1, 2019, 6:29 p.m. UTC
Hi Liang,

I am trying to add support for older SoCs to the meson-nand driver.
Back when the driver was in development I used an early revision (of
your driver) and did some modifications to make it work on older SoCs.

Now that the driver is upstream I wanted to give it another try and
make a real patch out of it. Unfortunately it's not working anymore.

As far as I know the NFC IP block revision on GXL is similar (or even
the same?) as on all older SoCs. As far as I can tell only the clock
setup is different on the older SoCs (which have a dedicated NAND
clock):
- we don't need the "amlogic,mmc-syscon" property on the older SoCs
  because we don't need to set up any muxing (common clock framework
  will do everything for us)
- "rx" and "tx" clocks don't exist
- I could not find any other differences between Meson8, Meson8b,
  Meson8m2, GXBB and GXL
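
The differences listed above can be summarised as per-SoC data. The
following is a minimal Python sketch, not driver code. The family keys,
the NfcVariant type and the "core"/"device" clock names are assumptions
made for illustration; only the "rx"/"tx" clocks and the
"amlogic,mmc-syscon" property are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NfcVariant:
    """Resources one NFC variant expects from its device tree node."""
    clock_names: tuple
    needs_mmc_syscon: bool


# Hypothetical table: "core" and "device" are assumed clock names.
VARIANTS = {
    # Older SoCs: dedicated NAND clock, no muxing through the MMC syscon.
    "meson8": NfcVariant(("core", "device"), needs_mmc_syscon=False),
    # Newer SoCs: additional "rx"/"tx" clocks plus the syscon for muxing.
    "gxl": NfcVariant(("core", "device", "rx", "tx"), needs_mmc_syscon=True),
}


def missing_resources(family, clocks, has_mmc_syscon):
    """Return the names of the resources a node lacks for the given family."""
    variant = VARIANTS[family]
    missing = [name for name in variant.clock_names if name not in clocks]
    if variant.needs_mmc_syscon and not has_mmc_syscon:
        missing.append("amlogic,mmc-syscon")
    return missing
```

With such a table a node for an older SoC that provides only the two
assumed clocks is complete, while the same node would be rejected for a
newer SoC because "rx", "tx" and the syscon are absent.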

In this series I'm sending two patches which add support for the older
SoCs.

Unfortunately these patches are currently not working for me (hence the
"RFC" prefix). I get a (strange) crash which is triggered by the
kzalloc() in meson_nfc_read_buf() - see below for more details.

Can you please help me on this one? I'd like to know whether:
- the meson-nand driver works for you on GXL or AXG on linux-next?
  (I was running these patches on top of next-20190301 on my M8S
  board which uses a 32-bit Meson8m2 SoC. I don't have any board using
  a GXL SoC which also has NAND)
- you see any issue with my patches? (maybe I missed more differences
  between GXL and the older SoCs)


kernel log extract:
[...]
Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
ONFI parameter recovery failed, aborting
Unable to handle kernel paging request at virtual address 80110000
pgd = (ptrval)
[80110000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc8-next-20190301-00053-g50ac6f7757e2 #4145
Hardware name: Amlogic Meson platform
PC is at kmem_cache_alloc_trace+0xc8/0x268
LR is at kmem_cache_alloc_trace+0x2c/0x268
pc : [<c046479c>]    lr : [<c0464700>]    psr: 60000013
sp : c02adc58  ip : e9e7a440  fp : 00004ee2
r10: 80110000  r9 : ffffe000  r8 : c110918c
r7 : 00000008  r6 : c08967c0  r5 : 00000dc0  r4 : c0201e40
r3 : c109dd30  r2 : 00000000  r1 : 00004ee2  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 0020404a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
Stack: (0xc02adc58 to 0xc02ae000)
dc40:                                                       e9d84048 00000003
dc60: e9e7a440 c02add18 00000028 e9e68680 c02adcf0 00000003 e9d84048 00000002
dc80: e9e7a440 c08967c0 c1108cb4 c02adcdc 10624dd3 c02adce4 10624dd3 00000005
dca0: e9e7a440 c0f5c310 00000028 c0f09998 00000005 e9d84048 00000005 c02add57
dcc0: c1108c88 e9e7a440 c02adcf0 e9d84428 e9d843b0 c0882258 000000ff 40000000
dce0: c02adce8 00000000 c02adcf0 00000003 00000000 00000090 00000000 00000000
dd00: 00000000 00000001 00000001 c02adcdf 00000000 00000190 00000002 00000005
dd20: c02add57 00000001 00000000 a10a1ef7 00000000 c1108c88 c1180250 e9d843c0
dd40: 00000001 000000de 00000000 c088d114 c0f08db4 00d843c0 00000000 a10a1ef7
dd60: 00000015 e9d84048 c1108c88 c088d470 e9d84048 c08888b4 00000000 60000013
dd80: c0eebc9c 000000ad c0da01ac 00000000 e9e7a48c c0cc6d50 c12122cc a10a1ef7
dda0: e9e7a440 e9e7a440 e9d84040 c0eebc9c e987f410 eafd6748 e9d84048 c1108c88
ddc0: e9e7a48c c0895c70 00000000 e9871f00 e9e7a440 c04eef60 00000000 eafd64bc
dde0: e9e7a534 00000000 00000000 00000000 00000001 a10a1ef7 00000000 e987f410
de00: 00000000 c1180728 00000000 00000000 c1180728 00000000 c1071854 c07fd388
de20: c120da38 e987f410 c120da3c 00000000 00000000 c07fb410 e987f410 c1180728
de40: c1180728 c07fb910 00000000 c1071834 c10004a8 c07fb65c c10004a8 c0a917b4
de60: c0da1914 e987f410 00000000 c1180728 c07fb910 00000000 c1071834 c10004a8
de80: c1071854 c07fb908 00000000 c1180728 e987f410 c07fb968 e98b8eb4 c1108c88
dea0: c1180728 c07f97d4 c1175260 c029a958 e98b8eb4 a10a1ef7 c029a96c c1180728
dec0: e9e1fa80 c1175260 00000000 c07fa844 c0f09c64 c1108c88 ffffe000 c1180728
dee0: c1108c88 ffffe000 c103b788 c07fc494 c11c2ca0 c1108c88 ffffe000 c0302f1c
df00: ebfffd96 c0346f90 c0fab8a0 c0f2ad00 00000000 00000006 00000006 c0e9e26c
df20: 00000000 c1108c88 c0eb11b0 c0e9e2e0 c11d1300 ebfffd84 ebfffd89 a10a1ef7
df40: c1071838 c11c2ca0 c10914b8 a10a1ef7 c11c2ca0 c1091804 00000007 c11d1300
df60: c11d1300 c1001180 00000006 00000006 00000000 c10004a8 0000013d 00000000
df80: c02c0504 00000000 c0cbf6e8 00000000 00000000 00000000 00000000 00000000
dfa0: 00000000 c0cbf6f0 00000000 c03010e8 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c046479c>] (kmem_cache_alloc_trace) from [<c08967c0>] (meson_nfc_exec_op+0x2c4/0x3e8)
[<c08967c0>] (meson_nfc_exec_op) from [<c0882258>] (nand_readid_op+0x128/0x1c4)
[<c0882258>] (nand_readid_op) from [<c088d114>] (hynix_nand_has_valid_jedecid+0x34/0x78)
[<c088d114>] (hynix_nand_has_valid_jedecid) from [<c088d470>] (hynix_nand_decode_id+0x64/0x3fc)
[<c088d470>] (hynix_nand_decode_id) from [<c08888b4>] (nand_scan_with_ids+0xa04/0x171c)
[<c08888b4>] (nand_scan_with_ids) from [<c0895c70>] (meson_nfc_probe+0x460/0x690)
[<c0895c70>] (meson_nfc_probe) from [<c07fd388>] (platform_drv_probe+0x48/0x98)
[<c07fd388>] (platform_drv_probe) from [<c07fb410>] (really_probe+0x1e0/0x2cc)
[<c07fb410>] (really_probe) from [<c07fb65c>] (driver_probe_device+0x60/0x16c)
[<c07fb65c>] (driver_probe_device) from [<c07fb908>] (device_driver_attach+0x58/0x60)
[<c07fb908>] (device_driver_attach) from [<c07fb968>] (__driver_attach+0x58/0xcc)
[<c07fb968>] (__driver_attach) from [<c07f97d4>] (bus_for_each_dev+0x74/0xb4)
[<c07f97d4>] (bus_for_each_dev) from [<c07fa844>] (bus_add_driver+0x1b8/0x1d8)
[<c07fa844>] (bus_add_driver) from [<c07fc494>] (driver_register+0x74/0x108)
[<c07fc494>] (driver_register) from [<c0302f1c>] (do_one_initcall+0x54/0x284)
[<c0302f1c>] (do_one_initcall) from [<c1001180>] (kernel_init_freeable+0x2d4/0x36c)
[<c1001180>] (kernel_init_freeable) from [<c0cbf6f0>] (kernel_init+0x8/0x110)
[<c0cbf6f0>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xc02adfb0 to 0xc02adff8)
dfa0:                                     00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e5943000 e5942014 e3130007 1a000038 (e79ae002) 
---[ end trace 28d391ed14b0f021 ]---
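
The faulting instruction can be cross-checked against the register dump.
The sketch below (Python, illustrative only, the helper names are made up
here) decodes the word in parentheses on the "Code:" line as an A32
load/store with register offset and recomputes the accessed address from
the r2 and r10 values printed above. It handles only this one encoding
with an LSL shift.

```python
def decode_ldr_str_reg(insn):
    """Decode an A32 LDR/STR (register offset) instruction word."""
    is_single_transfer = ((insn >> 26) & 0x3) == 0b01
    has_reg_offset = ((insn >> 25) & 1) == 1 and ((insn >> 4) & 1) == 0
    if not (is_single_transfer and has_reg_offset):
        raise ValueError("not an LDR/STR register-offset encoding")
    return {
        "load": bool((insn >> 20) & 1),   # L bit: 1 = LDR, 0 = STR
        "add": bool((insn >> 23) & 1),    # U bit: 1 = base + offset
        "rn": (insn >> 16) & 0xF,         # base register
        "rd": (insn >> 12) & 0xF,         # destination register
        "rm": insn & 0xF,                 # offset register
        "shift_type": (insn >> 5) & 0x3,  # 0 = LSL
        "shift_imm": (insn >> 7) & 0x1F,
    }


def accessed_address(fields, regs):
    """Compute base +/- (offset << shift_imm); only LSL is handled."""
    if fields["shift_type"] != 0:
        raise NotImplementedError("only LSL shifts are handled")
    offset = (regs[fields["rm"]] << fields["shift_imm"]) & 0xFFFFFFFF
    base = regs[fields["rn"]]
    result = base + offset if fields["add"] else base - offset
    return result & 0xFFFFFFFF


# Values copied from the oops: the faulting word and registers r2/r10.
fields = decode_ldr_str_reg(0xE79AE002)
regs = {2: 0x00000000, 10: 0x80110000}
print(fields["rn"], fields["rd"], fields["rm"])
print(hex(accessed_address(fields, regs)))
```

The word decodes to a load into lr from [r10 + r2], which is the
faulting address 80110000 reported at the top of the log. That value is
below the 0xc0000000-and-above kernel addresses seen in the rest of the
dump, so r10 presumably held a corrupted pointer inside the allocator.
This would point at earlier memory corruption rather than at the
allocation itself, but that is an inference and not something the log
proves.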


Martin Blumenstingl (2):
  dt-bindings: nand: meson: add support for more SoCs
  mtd: rawnand: meson: support for older SoCs up to Meson8

 .../bindings/mtd/amlogic,meson-nand.txt       | 14 ++++--
 drivers/mtd/nand/raw/meson_nand.c             | 46 +++++++++++++------
 2 files changed, 42 insertions(+), 18 deletions(-)

Comments

Liang Yang March 4, 2019, 4:56 a.m. UTC | #1
Hello Martin,

On 2019/3/2 2:29, Martin Blumenstingl wrote:
> Hi Liang,
> 
> I am trying to add support for older SoCs to the meson-nand driver.
> Back when the driver was in development I used an early revision (of
> your driver) and did some modifications to make it work on older SoCs.
> 
> Now that the driver is upstream I wanted to give it another try and
> make a real patch out of it. Unfortunately it's not working anymore.
> 
> As far as I know the NFC IP block revision on GXL is similar (or even
> the same?) as on all older SoCs. As far as I can tell only the clock
> setup is different on the older SoCs (which have a dedicated NAND
> clock):
> - we don't need the "amlogic,mmc-syscon" property on the older SoCs
>    because we don't need to setup any muxing (common clock framework
>    will do everything for us)
> - "rx" and "tx" clocks don't exist
> - I could not find any other differences between Meson8, Meson8b,
>    Meson8m2, GXBB and GXL
> 
That is right. The NFC is almost the same across the series, except for:
1) The clock control and source, which on the M8 series are not shared
   with eMMC.
2) The base register address.
3) The DMA encryption option, which we don't care about in the NFC driver.

> In this series I'm sending two patches which add support for the older
> SoCs.
> 
> Unfortunately these patches are currently not working for me (hence the
> "RFC" prefix). I get a (strange) crash which is triggered by the
> kzalloc() in meson_nfc_read_buf() - see below for more details.
> 
> Can you please help me on this one? I'd like to know whether:
> - the meson-nand driver works for you on GXL or AXG on linux-next?
>    (I was running these patches on top of next-20190301 on my M8S
>    board which uses a 32-bit Meson8m2 SoC. I don't have any board using
>    a GXL SoC which also has NAND)
Yes, it works on the AXG platform with a MXIC SLC NAND flash (MX30LF4G),
but I am not sure it follows the same flow as yours, because I see the
message "Could not find a valid ONFI parameter page, ...." in your log.
I will try to reproduce it on AXG (I don't have an M8 platform now).

> - you see any issue with my patches? (maybe I missed more differences
>    between GXL and the older SoCs)
> 
I think it is OK now.
> 
> kernel log extract:
> [...]
> 
> 
> Martin Blumenstingl (2):
>    dt-bindings: nand: meson: add support for more SoCs
>    mtd: rawnand: meson: support for older SoCs up to Meson8
> 
>   .../bindings/mtd/amlogic,meson-nand.txt       | 14 ++++--
>   drivers/mtd/nand/raw/meson_nand.c             | 46 +++++++++++++------
>   2 files changed, 42 insertions(+), 18 deletions(-)
>

Martin Blumenstingl March 5, 2019, 10:12 p.m. UTC | #2
Hi Liang,

On Mon, Mar 4, 2019 at 5:55 AM Liang Yang <liang.yang@amlogic.com> wrote:
>
> Hello Martin,
>
> On 2019/3/2 2:29, Martin Blumenstingl wrote:
> > Hi Liang,
> >
> > I am trying to add support for older SoCs to the meson-nand driver.
> > Back when the driver was in development I used an early revision (of
> > your driver) and did some modifications to make it work on older SoCs.
> >
> > Now that the driver is upstream I wanted to give it another try and
> > make a real patch out of it. Unfortunately it's not working anymore.
> >
> > As far as I know the NFC IP block revision on GXL is similar (or even
> > the same?) as on all older SoCs. As far as I can tell only the clock
> > setup is different on the older SoCs (which have a dedicated NAND
> > clock):
> > - we don't need the "amlogic,mmc-syscon" property on the older SoCs
> >    because we don't need to setup any muxing (common clock framework
> >    will do everything for us)
> > - "rx" and "tx" clocks don't exist
> > - I could not find any other differences between Meson8, Meson8b,
> >    Meson8m2, GXBB and GXL
> >
> That is right. the serials NFC is almost the same except:
> 1) The clock control and source that M8-serials are not share with EMMC.
> 2) The base register address
> 3) DMA encryption option which we don't care on NFC driver.
great, thank you for confirming this!

> > In this series I'm sending two patches which add support for the older
> > SoCs.
> >
> > Unfortunately these patches are currently not working for me (hence the
> > "RFC" prefix). I get a (strange) crash which is triggered by the
> > kzalloc() in meson_nfc_read_buf() - see below for more details.
> >
> > Can you please help me on this one? I'd like to know whether:
> > - the meson-nand driver works for you on GXL or AXG on linux-next?
> >    (I was running these patches on top of next-20190301 on my M8S
> >    board which uses a 32-bit Meson8m2 SoC. I don't have any board using
> >    a GXL SoC which also has NAND)
> Yes, it works on AXG platform using a MXIC slc nand flash(MX30LF4G); but
> i an not sure it runs the same flow with yours. because i see the print
> "Counld not find a valid ONFI parameter page, ...." in yours. i will try
> to reproduce it on AXG(i don't have a M8 platform now).
I'm looking forward to hearing about the test results on your AXG boards.
For reference: my board has a SK Hynix H27UCG8T2B (ID bytes: 0xad 0xde
0x94 0xeb 0x74 0x44, 20nm MLC).
I have another board (where I haven't tested the NFC driver yet) with
a SK Hynix H27UCG8T2E (ID bytes: 0xad 0xde 0x14 0xa7 0x42 0x4a, 1Ynm
MLC). If it helps with your analysis I can test on that board as well.

> > - you see any issue with my patches? (maybe I missed more differences
> >    between GXL and the older SoCs)
> >
> i think it is ok now.
many thanks for checking my patches!


Regards
Martin

Miquel Raynal March 7, 2019, 1:09 p.m. UTC | #3
Hello,

Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote on Tue,
5 Mar 2019 23:12:51 +0100:

> Hi Liang,
> 
> On Mon, Mar 4, 2019 at 5:55 AM Liang Yang <liang.yang@amlogic.com> wrote:
> >
> > Hello Martin,
> >
> > On 2019/3/2 2:29, Martin Blumenstingl wrote:  
> > > Hi Liang,
> > >
> > > I am trying to add support for older SoCs to the meson-nand driver.
> > > Back when the driver was in development I used an early revision (of
> > > your driver) and did some modifications to make it work on older SoCs.
> > >
> > > Now that the driver is upstream I wanted to give it another try and
> > > make a real patch out of it. Unfortunately it's not working anymore.
> > >
> > > As far as I know the NFC IP block revision on GXL is similar (or even
> > > the same?) as on all older SoCs. As far as I can tell only the clock
> > > setup is different on the older SoCs (which have a dedicated NAND
> > > clock):
> > > - we don't need the "amlogic,mmc-syscon" property on the older SoCs
> > >    because we don't need to setup any muxing (common clock framework
> > >    will do everything for us)
> > > - "rx" and "tx" clocks don't exist
> > > - I could not find any other differences between Meson8, Meson8b,
> > >    Meson8m2, GXBB and GXL
> > >  
> > That is right. the serials NFC is almost the same except:
> > 1) The clock control and source that M8-serials are not share with EMMC.
> > 2) The base register address
> > 3) DMA encryption option which we don't care on NFC driver.  
> great, thank you for confirming this!
> 
> > > In this series I'm sending two patches which add support for the older
> > > SoCs.
> > >
> > > Unfortunately these patches are currently not working for me (hence the
> > > "RFC" prefix). I get a (strange) crash which is triggered by the
> > > kzalloc() in meson_nfc_read_buf() - see below for more details.
> > >
> > > Can you please help me on this one? I'd like to know whether:
> > > - the meson-nand driver works for you on GXL or AXG on linux-next?
> > >    (I was running these patches on top of next-20190301 on my M8S
> > >    board which uses a 32-bit Meson8m2 SoC. I don't have any board using
> > >    a GXL SoC which also has NAND)  
> > Yes, it works on AXG platform using a MXIC slc nand flash(MX30LF4G); but
> > i an not sure it runs the same flow with yours. because i see the print
> > "Counld not find a valid ONFI parameter page, ...." in yours. i will try
> > to reproduce it on AXG(i don't have a M8 platform now).  
> I'm looking forward to hear about the test results on your AXG boards
> for reference: my board has a SK Hynix H27UCG8T2B (ID bytes: 0xad 0xde
> 0x94 0xeb 0x74 0x44, 20nm MLC)
> I have another board (where I haven't tested the NFC driver yet) with
> a SK Hynix H27UCG8T2E (ID bytes: 0xad 0xde 0x14 0xa7 0x42 0x4a, 1Ynm
> MLC). if it helps with your analysis I can test on that board as well

Liang, you just have to fake the output of the ONFI page detection and
you will probably run into this error which will then be easy to
reproduce.
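
For readers following along, the two ONFI messages in the log come from
the detection fallback. The sketch below models the idea in Python under
stated assumptions: a parameter page copy is accepted when it starts with
the "ONFI" signature and its CRC-16 (polynomial 0x8005, initial value
0x4F4E, computed over the first 254 bytes and stored little-endian in the
last two bytes) matches; if no copy passes, a bit-wise majority vote over
the copies is tried. The function names are invented for this
illustration and this is not the kernel's code.

```python
def onfi_crc16(data, crc=0x4F4E):
    """CRC-16 with polynomial 0x8005, MSB first, ONFI initial value."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = (crc << 1) ^ (0x8005 if crc & 0x8000 else 0)
            crc &= 0xFFFF
    return crc


def page_is_valid(page):
    """Check the signature and the CRC of one 256-byte parameter page copy."""
    if len(page) < 256 or page[:4] != b"ONFI":
        return False
    return onfi_crc16(page[:254]) == int.from_bytes(page[254:256], "little")


def bitwise_majority(copies):
    """Rebuild a page by taking, for every bit, the value most copies agree on."""
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            votes = sum((copy[i] >> bit) & 1 for copy in copies)
            if 2 * votes > len(copies):
                out[i] |= 1 << bit
    return bytes(out)


def recover_param_page(copies):
    """Return a valid page, trying each copy first and the majority vote last."""
    for copy in copies:
        if page_is_valid(copy):
            return copy
    candidate = bitwise_majority(copies)
    return candidate if page_is_valid(candidate) else None


def make_page():
    """Build a synthetic, otherwise empty page with a correct CRC (demo only)."""
    body = b"ONFI" + bytes(250)
    return body + onfi_crc16(body).to_bytes(2, "little")
```

Feeding three identical copies without the signature (for example all
0xff bytes) into recover_param_page() returns None, which corresponds to
the "ONFI parameter recovery failed, aborting" path that precedes the
crash in Martin's log.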


Thanks,
Miquèl

Liang Yang March 7, 2019, 1:36 p.m. UTC | #4
Hi Martin,

On 2019/3/7 21:09, Miquel Raynal wrote:
> Hello,
> 
> Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote on Tue,
> 5 Mar 2019 23:12:51 +0100:
> 
>> Hi Liang,
>>
>> On Mon, Mar 4, 2019 at 5:55 AM Liang Yang <liang.yang@amlogic.com> wrote:
>>>
>>> Hello Martin,
>>>
>>> On 2019/3/2 2:29, Martin Blumenstingl wrote:
>>>> Hi Liang,
>>>>
>>>> I am trying to add support for older SoCs to the meson-nand driver.
>>>> Back when the driver was in development I used an early revision (of
>>>> your driver) and did some modifications to make it work on older SoCs.
>>>>
>>>> Now that the driver is upstream I wanted to give it another try and
>>>> make a real patch out of it. Unfortunately it's not working anymore.
>>>>
>>>> As far as I know the NFC IP block revision on GXL is similar (or even
>>>> the same?) as on all older SoCs. As far as I can tell only the clock
>>>> setup is different on the older SoCs (which have a dedicated NAND
>>>> clock):
>>>> - we don't need the "amlogic,mmc-syscon" property on the older SoCs
>>>>     because we don't need to setup any muxing (common clock framework
>>>>     will do everything for us)
>>>> - "rx" and "tx" clocks don't exist
>>>> - I could not find any other differences between Meson8, Meson8b,
>>>>     Meson8m2, GXBB and GXL
>>>>   
>>> That is right. the serials NFC is almost the same except:
>>> 1) The clock control and source that M8-serials are not share with EMMC.
>>> 2) The base register address
>>> 3) DMA encryption option which we don't care on NFC driver.
>> great, thank you for confirming this!
>>
>>>> In this series I'm sending two patches which add support for the older
>>>> SoCs.
>>>>
>>>> Unfortunately these patches are currently not working for me (hence the
>>>> "RFC" prefix). I get a (strange) crash which is triggered by the
>>>> kzalloc() in meson_nfc_read_buf() - see below for more details.
>>>>
>>>> Can you please help me on this one? I'd like to know whether:
>>>> - the meson-nand driver works for you on GXL or AXG on linux-next?
>>>>     (I was running these patches on top of next-20190301 on my M8S
>>>>     board which uses a 32-bit Meson8m2 SoC. I don't have any board using
>>>>     a GXL SoC which also has NAND)
>>> Yes, it works on AXG platform using a MXIC slc nand flash(MX30LF4G); but
>>> i an not sure it runs the same flow with yours. because i see the print
>>> "Counld not find a valid ONFI parameter page, ...." in yours. i will try
>>> to reproduce it on AXG(i don't have a M8 platform now).
>> I'm looking forward to hear about the test results on your AXG boards
>> for reference: my board has a SK Hynix H27UCG8T2B (ID bytes: 0xad 0xde
>> 0x94 0xeb 0x74 0x44, 20nm MLC)
>> I have another board (where I haven't tested the NFC driver yet) with
>> a SK Hynix H27UCG8T2E (ID bytes: 0xad 0xde 0x14 0xa7 0x42 0x4a, 1Ynm
>> MLC). if it helps with your analysis I can test on that board as well
> 
> Liang, you just have to fake the output of the ONFI page detection and
> you will probably run into this error which will then be easy to
> reproduce.
>

I have tested it on the AXG platform. I find that the MX30LF4G also enters
this flow, but it doesn't crash. The log is as follows:
[    1.018056] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
[    1.021057] ONFI parameter recovery failed, aborting
[    1.025966] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
[    1.032237] nand: Macronix NAND 512MiB 3,3V 8-bit
[    1.036889] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    1.045741] Bad block table not found for chip 0
[    1.050077] Bad block table not found for chip 0
[    1.053538] Scanning device for bad blocks
[    1.069094] Bad eraseblock 20 at 0x000000280000
[    1.071074] Bad eraseblock 24 at 0x000000300000
[    1.127494] random: fast init done
[    1.348754] Bad eraseblock 519 at 0x0000040e0000
[    1.632819] Bad eraseblock 1028 at 0x000008080000
[    2.405420] Bad eraseblock 2411 at 0x000012d60000
[    3.349276] Bad block table written to 0x00001ffe0000, version 0x01
[    3.350967] Bad block table written to 0x00001ffc0000, version 0x01
[    3.356429] 5 fixed-partitions partitions found on MTD device ffe07800.nfc
[    3.362925] Creating 5 MTD partitions on "ffe07800.nfc":
[    3.368188] 0x000000000000-0x000000200000 : "boot"
[    3.373970] 0x000000200000-0x000000600000 : "env"
[    3.378564] 0x000000600000-0x000001000000 : "system"
[    3.383511] 0x000001000000-0x000004000000 : "rootfs"
[    3.388525] 0x000004000000-0x00000c000000 : "media"

I am looking for a Hynix NAND flash to test on the GXL platform, since
there should be some difference from the MXIC flash in ONFI page
detection. I will report the result ASAP next week.
Do you have another type of NAND flash to test on the M8 platform?
> 
> Thanks,
> Miquèl
> 

Liang Yang March 12, 2019, 9:06 a.m. UTC | #5
Hi Martin and Miquel,

On 2019/3/7 21:09, Miquel Raynal wrote:
> Hello,
> 
> Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote on Tue,
> 5 Mar 2019 23:12:51 +0100:
> 
>> Hi Liang,
>>
>> On Mon, Mar 4, 2019 at 5:55 AM Liang Yang <liang.yang@amlogic.com> wrote:
>>>
>>> Hello Martin,
>>>
>>> On 2019/3/2 2:29, Martin Blumenstingl wrote:
>>>> Hi Liang,
>>>>
>>>> I am trying to add support for older SoCs to the meson-nand driver.
>>>> Back when the driver was in development I used an early revision (of
>>>> your driver) and did some modifications to make it work on older SoCs.
>>>>
>>>> Now that the driver is upstream I wanted to give it another try and
>>>> make a real patch out of it. Unfortunately it's not working anymore.
>>>>
>>>> As far as I know the NFC IP block revision on GXL is similar (or even
>>>> the same?) as on all older SoCs. As far as I can tell only the clock
>>>> setup is different on the older SoCs (which have a dedicated NAND
>>>> clock):
>>>> - we don't need the "amlogic,mmc-syscon" property on the older SoCs
>>>>     because we don't need to setup any muxing (common clock framework
>>>>     will do everything for us)
>>>> - "rx" and "tx" clocks don't exist
>>>> - I could not find any other differences between Meson8, Meson8b,
>>>>     Meson8m2, GXBB and GXL
>>>>   
>>> That is right. the serials NFC is almost the same except:
>>> 1) The clock control and source that M8-serials are not share with EMMC.
>>> 2) The base register address
>>> 3) DMA encryption option which we don't care on NFC driver.
>> great, thank you for confirming this!
>>
>>>> In this series I'm sending two patches which add support for the older
>>>> SoCs.
>>>>
>>>> Unfortunately these patches are currently not working for me (hence the
>>>> "RFC" prefix). I get a (strange) crash which is triggered by the
>>>> kzalloc() in meson_nfc_read_buf() - see below for more details.
>>>>
>>>> Can you please help me on this one? I'd like to know whether:
>>>> - the meson-nand driver works for you on GXL or AXG on linux-next?
>>>>     (I was running these patches on top of next-20190301 on my M8S
>>>>     board which uses a 32-bit Meson8m2 SoC. I don't have any board using
>>>>     a GXL SoC which also has NAND)
>>> Yes, it works on AXG platform using a MXIC slc nand flash(MX30LF4G); but
>>> i an not sure it runs the same flow with yours. because i see the print
>>> "Counld not find a valid ONFI parameter page, ...." in yours. i will try
>>> to reproduce it on AXG(i don't have a M8 platform now).
>> I'm looking forward to hear about the test results on your AXG boards
>> for reference: my board has a SK Hynix H27UCG8T2B (ID bytes: 0xad 0xde
>> 0x94 0xeb 0x74 0x44, 20nm MLC)
>> I have another board (where I haven't tested the NFC driver yet) with
>> a SK Hynix H27UCG8T2E (ID bytes: 0xad 0xde 0x14 0xa7 0x42 0x4a, 1Ynm
>> MLC). if it helps with your analysis I can test on that board as well
> 
> Liang, you just have to fake the output of the ONFI page detection and
> you will probably run into this error which will then be easy to
> reproduce.
> 
I could not reproduce it using a SK Hynix NAND flash H27UCG8T2E on the
GXL platform. It runs well.
[......]
[    0.977127] loop: module loaded
[    0.998625] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
[    1.001619] ONFI parameter recovery failed, aborting
[    1.006684] Could not find valid JEDEC parameter page; aborting
[    1.012391] nand: device found, Manufacturer ID: 0xad, Chip ID: 0xde
[    1.018660] nand: Hynix NAND 8GiB 3,3V 8-bit
[    1.022885] nand: 8192 MiB, MLC, erase size: 4096 KiB, page size: 16384, OOB size: 1664
[    1.047033] Bad block table not found for chip 0
[    1.054950] Bad block table not found for chip 0
[    1.054970] Scanning device for bad blocks
[    1.522664] random: fast init done
[    4.893731] Bad eraseblock 1985 at 0x0001f07fc000
[    5.020637] Bad block table written to 0x0001ffc00000, version 0x01
[    5.028258] Bad block table written to 0x0001ff800000, version 0x01
[    5.029905] 5 fixed-partitions partitions found on MTD device d0074800.nfc
[    5.035714] Creating 5 MTD partitions on "d0074800.nfc":
[......]

Martin, now I am not sure whether the NFC driver is what leads to the
kernel panic when calling kmem_cache_alloc_trace.

Martin Blumenstingl March 16, 2019, 10:55 a.m. UTC | #6
Hi Liang,

On Tue, Mar 12, 2019 at 10:05 AM Liang Yang <liang.yang@amlogic.com> wrote:
>
> Hi Martin and Miquel,
>
> On 2019/3/7 21:09, Miquel Raynal wrote:
> > Hello,
> >
> > Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote on Tue,
> > 5 Mar 2019 23:12:51 +0100:
> >
> >> Hi Liang,
> >>
> >> On Mon, Mar 4, 2019 at 5:55 AM Liang Yang <liang.yang@amlogic.com> wrote:
> >>>
> >>> Hello Martin,
> >>>
> >>> On 2019/3/2 2:29, Martin Blumenstingl wrote:
> >>>> Hi Liang,
> >>>>
> >>>> I am trying to add support for older SoCs to the meson-nand driver.
> >>>> Back when the driver was in development I used an early revision (of
> >>>> your driver) and did some modifications to make it work on older SoCs.
> >>>>
> >>>> Now that the driver is upstream I wanted to give it another try and
> >>>> make a real patch out of it. Unfortunately it's not working anymore.
> >>>>
> >>>> As far as I know the NFC IP block revision on GXL is similar (or even
> >>>> the same?) as on all older SoCs. As far as I can tell only the clock
> >>>> setup is different on the older SoCs (which have a dedicated NAND
> >>>> clock):
> >>>> - we don't need the "amlogic,mmc-syscon" property on the older SoCs
> >>>>     because we don't need to setup any muxing (common clock framework
> >>>>     will do everything for us)
> >>>> - "rx" and "tx" clocks don't exist
> >>>> - I could not find any other differences between Meson8, Meson8b,
> >>>>     Meson8m2, GXBB and GXL
> >>>>
> > >>> That is right. The NFC is almost the same across the series, except for:
> > >>> 1) The clock control and source, which on the M8 series are not shared with eMMC.
> > >>> 2) The base register address
> > >>> 3) The DMA encryption option, which we don't care about in the NFC driver.
> >> great, thank you for confirming this!
> >>
> >>>> In this series I'm sending two patches which add support for the older
> >>>> SoCs.
> >>>>
> >>>> Unfortunately these patches are currently not working for me (hence the
> >>>> "RFC" prefix). I get a (strange) crash which is triggered by the
> >>>> kzalloc() in meson_nfc_read_buf() - see below for more details.
> >>>>
> >>>> Can you please help me on this one? I'd like to know whether:
> >>>> - the meson-nand driver works for you on GXL or AXG on linux-next?
> >>>>     (I was running these patches on top of next-20190301 on my M8S
> >>>>     board which uses a 32-bit Meson8m2 SoC. I don't have any board using
> >>>>     a GXL SoC which also has NAND)
> > >>> Yes, it works on the AXG platform using an MXIC SLC NAND flash (MX30LF4G), but
> > >>> I am not sure it runs the same flow as yours, because I see the print
> > >>> "Could not find a valid ONFI parameter page, ...." in yours. I will try
> > >>> to reproduce it on AXG (I don't have an M8 platform now).
> >> I'm looking forward to hear about the test results on your AXG boards
> >> for reference: my board has a SK Hynix H27UCG8T2B (ID bytes: 0xad 0xde
> >> 0x94 0xeb 0x74 0x44, 20nm MLC)
> >> I have another board (where I haven't tested the NFC driver yet) with
> >> a SK Hynix H27UCG8T2E (ID bytes: 0xad 0xde 0x14 0xa7 0x42 0x4a, 1Ynm
> >> MLC). if it helps with your analysis I can test on that board as well
> >
> > Liang, you just have to fake the output of the ONFI page detection and
> > you will probably run into this error which will then be easy to
> > reproduce.
> >
> I could not reproduce it using an SK Hynix NAND flash (H27UCG8T2E) on the GXL
> platform; it runs well.
> [......]
> [    0.977127] loop: module loaded
> [    0.998625] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
> [    1.001619] ONFI parameter recovery failed, aborting
> [    1.006684] Could not find valid JEDEC parameter page; aborting
> [    1.012391] nand: device found, Manufacturer ID: 0xad, Chip ID: 0xde
> [    1.018660] nand: Hynix NAND 8GiB 3,3V 8-bit
> [    1.022885] nand: 8192 MiB, MLC, erase size: 4096 KiB, page size: 16384, OOB size: 1664
> [    1.047033] Bad block table not found for chip 0
> [    1.054950] Bad block table not found for chip 0
> [    1.054970] Scanning device for bad blocks
> [    1.522664] random: fast init done
> [    4.893731] Bad eraseblock 1985 at 0x0001f07fc000
> [    5.020637] Bad block table written to 0x0001ffc00000, version 0x01
> [    5.028258] Bad block table written to 0x0001ff800000, version 0x01
> [    5.029905] 5 fixed-partitions partitions found on MTD device d0074800.nfc
> [    5.035714] Creating 5 MTD partitions on "d0074800.nfc":
> [......]
>
> Martin, now I am not sure whether it is the NFC driver that leads to the kernel
> panic when calling kmem_cache_alloc_trace.
thank you for confirming that it works for you on GXL

I'm not sure that this is a NFC driver problem.
after enabling CONFIG_SLAB_FREELIST_HARDENED in my kernel config the
crash moves. it's now crashing in slub.c's kfree() at
BUG_ON(!PageCompound(page));

maybe this is related to some difference in 32-bit ARM and arm64
or it could even be some memory management issue
I'm not sure yet so I'll try to dig deeper


Regards
Martin
[    2.080461] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
[    2.084140] ONFI parameter recovery failed, aborting
[    2.089154] ------------[ cut here ]------------
[    2.093631] kernel BUG at mm/slub.c:3950!
[    2.097619] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[    2.103427] Modules linked in:
[    2.106464] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.0.0-11944-g4479fa4728e9-dirty #4196
[    2.114781] Hardware name: Amlogic Meson platform
[    2.119470] PC is at kfree+0x298/0x2c4
[    2.123195] LR is at meson_nfc_exec_op+0x34c/0x3e8
[    2.127958] pc : [<c048e9b4>]    lr : [<c08c2108>]    psr: 40000013
[    2.134199] sp : c02afc60  ip : eafd9000  fp : e9e36e40
[    2.139400] r10: 00000002  r9 : e9d6c048  r8 : ee36434b
[    2.144601] r7 : eb59fc80  r6 : ee36434b  r5 : c08c2108  r4 : c02afd18
[    2.151102] r3 : eb59fc84  r2 : c12089c0  r1 : ee364340  r0 : ee36434b
[    2.157605] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    2.164712] Control: 10c5387d  Table: 0020404a  DAC: 00000051
[    2.170433] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[    2.176413] Stack: (0xc02afc60 to 0xc02b0000)
[    2.180752] fc60: ee36434b c08c0c74 00000000 c02afd18 00000028 e9e92680 c02afcf0 ee36434b
[    2.188899] fc80: e9d6c048 c08c2108 00000008 00000002 10624dd3 c02afce4 10624dd3 00000005
[    2.197047] fca0: e9e36e40 c0f67310 00220005 c0f152e8 00000005 e9d6c048 00000005 c02afd57
[    2.205195] fcc0: c1108c88 e9e36e40 c02afcf0 e9d6c428 e9d6c3b0 c08adb18 00000000 40000000
[    2.213343] fce0: c02afce8 00000000 c02afcf0 00000003 00000000 00000090 00000000 00000000
[    2.221491] fd00: 00000000 00000001 00000001 c02afcdf 00000000 00000190 00000002 00000005
[    2.229639] fd20: c02afd57 00000001 00000000 5b1da8e9 00000000 c1108c88 c11812f0 e9d6c3c0
[    2.237787] fd40: 00000001 000000de 00000000 c08b89d4 c0f14704 00d6c3c0 00000000 5b1da8e9
[    2.245936] fd60: 00000015 e9d6c048 c1108c88 c08b8d30 e9d6c048 c08b4174 00000000 60000013
[    2.254084] fd80: c0ef7604 000000ad c0da27ac 00000000 e9e36e8c c0cf8950 c121b50c 5b1da8e9
[    2.262232] fda0: e9e36e40 e9e36e40 e9d6c040 c0ef7604 e987f810 eafd6a00 e9d6c048 c1108c88
[    2.270380] fdc0: e9e36e8c c08c1530 00000000 e9874e80 e9e36e40 c0517fd0 00000000 eafd6774
[    2.278528] fde0: e9e36f34 00000000 00000000 00000000 00000001 5b1da8e9 00000000 e987f810
[    2.286676] fe00: 00000000 c11817c8 00000000 00000000 c11817c8 00000000 c1071854 c0828b28
[    2.294824] fe20: c1216c78 e987f810 c1216c7c 00000000 00000000 c0826bb0 e987f810 c11817c8
[    2.302973] fe40: c11817c8 c08270b0 00000000 c1071834 c10004a8 c0826dfc c10004a8 c0abf0ac
[    2.311121] fe60: c0da3f14 e987f810 00000000 c11817c8 c08270b0 00000000 c1071834 c10004a8
[    2.319269] fe80: c1071854 c08270a8 00000000 c11817c8 e987f810 c0827108 e98bafb4 c1108c88
[    2.327417] fea0: c11817c8 c0824f78 c1176300 c029c958 e98bafb4 5b1da8e9 c029c96c c11817c8
[    2.335565] fec0: e9e39e80 c1176300 00000000 c0825fe8 c0f155b4 c1108c88 ffffe000 c11817c8
[    2.343713] fee0: c1108c88 ffffe000 c103b854 c0827c34 c11c3e40 c1108c88 ffffe000 c0302f54
[    2.351861] ff00: ebfffdc0 c0347218 c0fb6d80 c0f36800 00000000 00000006 00000006 c0ea6330
[    2.360010] ff20: 00000000 c1108c88 c0eb3dc4 c0ea63a4 c11da500 ebfffdae ebfffdb3 5b1da8e9
[    2.368158] ff40: c1071838 c11c3e40 c10914a4 5b1da8e9 c11c3e40 c10917f0 00000007 c11da500
[    2.376306] ff60: c11da500 c1001180 00000006 00000006 00000000 c10004a8 0000013d 00000000
[    2.384454] ff80: c02c0504 00000000 c0cf12e8 00000000 00000000 00000000 00000000 00000000
[    2.392602] ffa0: 00000000 c0cf12f0 00000000 c03010e8 00000000 00000000 00000000 00000000
[    2.400750] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.408898] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    2.417052] [<c048e9b4>] (kfree) from [<c08c2108>] (meson_nfc_exec_op+0x34c/0x3e8)
[    2.424592] [<c08c2108>] (meson_nfc_exec_op) from [<c08adb18>] (nand_readid_op+0x128/0x1c4)
[    2.432914] [<c08adb18>] (nand_readid_op) from [<c08b89d4>] (hynix_nand_has_valid_jedecid+0x34/0x78)
[    2.442013] [<c08b89d4>] (hynix_nand_has_valid_jedecid) from [<c08b8d30>] (hynix_nand_decode_id+0x64/0x3fc)
[    2.451721] [<c08b8d30>] (hynix_nand_decode_id) from [<c08b4174>] (nand_scan_with_ids+0xa04/0x171c)
[    2.460735] [<c08b4174>] (nand_scan_with_ids) from [<c08c1530>] (meson_nfc_probe+0x460/0x690)
[    2.469232] [<c08c1530>] (meson_nfc_probe) from [<c0828b28>] (platform_drv_probe+0x48/0x98)
[    2.477553] [<c0828b28>] (platform_drv_probe) from [<c0826bb0>] (really_probe+0x1e0/0x2cc)
[    2.485786] [<c0826bb0>] (really_probe) from [<c0826dfc>] (driver_probe_device+0x60/0x16c)
[    2.494021] [<c0826dfc>] (driver_probe_device) from [<c08270a8>] (device_driver_attach+0x58/0x60)
[    2.502862] [<c08270a8>] (device_driver_attach) from [<c0827108>] (__driver_attach+0x58/0xcc)
[    2.511357] [<c0827108>] (__driver_attach) from [<c0824f78>] (bus_for_each_dev+0x74/0xb4)
[    2.519505] [<c0824f78>] (bus_for_each_dev) from [<c0825fe8>] (bus_add_driver+0x1b8/0x1d8)
[    2.527740] [<c0825fe8>] (bus_add_driver) from [<c0827c34>] (driver_register+0x74/0x108)
[    2.535804] [<c0827c34>] (driver_register) from [<c0302f54>] (do_one_initcall+0x54/0x284)
[    2.543956] [<c0302f54>] (do_one_initcall) from [<c1001180>] (kernel_init_freeable+0x2d4/0x36c)
[    2.552621] [<c1001180>] (kernel_init_freeable) from [<c0cf12f0>] (kernel_init+0x8/0x110)
[    2.560768] [<c0cf12f0>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
[    2.568304] Exception stack(0xc02affb0 to 0xc02afff8)
[    2.573333] ffa0:                                     00000000 00000000 00000000 00000000
[    2.581483] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.589630] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    2.596220] Code: 1a000003 e5973004 e3130001 1a000000 (e7f001f2)  
[    2.602295] ---[ end trace 0bdf5d4bfd4b3fb1 ]---
Martin Blumenstingl March 19, 2019, 8:27 p.m. UTC | #7
Hello Liang,

On Sat, Mar 16, 2019 at 11:55 AM Martin Blumenstingl
<martin.blumenstingl@googlemail.com> wrote:
[...]
> > Martin, now I am not sure whether it is the NFC driver that leads to the kernel
> > panic when calling kmem_cache_alloc_trace.
> thank you for confirming that it works for you on GXL
>
> I'm not sure that this is a NFC driver problem.
> after enabling CONFIG_SLAB_FREELIST_HARDENED in my kernel config the
> crash moves. it's now crashing in slub.c's kfree() at
> BUG_ON(!PageCompound(page));
I added some debug prints in meson_nfc_read_buf() to get some details
about the info buffer before the crash,
format is: meson_nfc_read_buf <virtual address> <physical address>

during my first test three different addresses are used:
- meson_nfc_read_buf e9e6c640 0x29e6c640 (works fine)
- meson_nfc_read_buf e9e6c680 0x29e6c680 (works fine)
- meson_nfc_read_buf ee39a34b 0x2e39a34b (crashes during kfree)

so I tried playing around with the allocation size (see the attached
patch) and changed it to:
  kzalloc(PER_INFO_BYTE + 64, GFP_KERNEL)
this results in the following addresses being used:
- meson_nfc_read_buf e9ea4280 0x29ea4280 (works fine)
- meson_nfc_read_buf e9ea4300 0x29ea4300 (works fine)
(there is no crash anymore)
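
A side observation on the addresses above: consecutive info buffers are 0x40 bytes apart in the first run and 0x80 bytes apart in the second. That would be consistent with an allocator that rounds requests up to power-of-two size classes with a 64-byte minimum, so that adding 64 bytes moves the request into the next class. The sketch below models only that rounding. The 64-byte minimum and the value 8 for PER_INFO_BYTE are assumptions, kmalloc_bucket_model() is a hypothetical helper rather than a kernel function, and real slab allocators may also have non-power-of-two classes that this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed smallest size class; not taken from the thread. */
#define MIN_BUCKET 64u

/*
 * Hypothetical model: round a request up to the next power-of-two
 * size class, never going below MIN_BUCKET.
 */
size_t kmalloc_bucket_model(size_t size)
{
	size_t bucket = MIN_BUCKET;

	assert(size > 0);
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}
```

Under these assumptions kmalloc_bucket_model(8) is 64 and kmalloc_bucket_model(8 + 64) is 128, which matches the 0x40 and 0x80 strides seen in the two logs. It does not explain the odd address ee39a34b.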

Liang, are there any special requirements on the "info address" like
the alignment?
also do you know why the PER_INFO_BYTE buffer is allocated dynamically
in meson_nfc_read_buf() instead of allocating it at initialization?
I'm not saying that it should be changed! I'm curious because there's
per-meson_nfc_nand_chip info and data buffers which are allocated at
initialization time.


meson_nfc_read_buf debug log with PER_INFO_BYTE allocation:
[    2.032914] meson_nfc_read_buf e9e6c640 0x29e6c640
[    2.033005] meson_nfc_dma_buffer_setup 0x29e6c640
[    2.037717] meson_nfc_read_buf: about to kfree info
[    2.042535] meson_nfc_read_buf: kfree'd info
[    2.046794] meson_nfc_read_buf e9e6c640 0x29e6c640
[    2.051552] meson_nfc_dma_buffer_setup 0x29e6c640
[    2.056261] meson_nfc_read_buf: about to kfree info
[    2.061086] meson_nfc_read_buf: kfree'd info
[    2.065356] meson_nfc_read_buf e9e6c680 0x29e6c680
[    2.070102] meson_nfc_dma_buffer_setup 0x29e6c680
[    2.074810] meson_nfc_read_buf: about to kfree info
[    2.079635] meson_nfc_read_buf: kfree'd info
[    2.083978] meson_nfc_read_buf e9e6c640 0x29e6c640
[    2.088684] meson_nfc_dma_buffer_setup 0x29e6c640
[    2.093334] meson_nfc_read_buf: about to kfree info
[    2.098199] meson_nfc_read_buf: kfree'd info
[    2.102446] meson_nfc_read_buf e9e6c640 0x29e6c640
[    2.107208] meson_nfc_dma_buffer_setup 0x29e6c640
[    2.111883] meson_nfc_read_buf: about to kfree info
[    2.116765] meson_nfc_read_buf: kfree'd info
[    2.120996] meson_nfc_read_buf e9e6c640 0x29e6c640
[    2.125762] meson_nfc_dma_buffer_setup 0x29e6c640
[    2.130433] meson_nfc_read_buf: about to kfree info
[    2.135294] meson_nfc_read_buf: kfree'd info
[    2.139545] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
[    2.148173] ONFI parameter recovery failed, aborting
[    2.153058] meson_nfc_read_buf e9e6c680 0x29e6c680
[    2.157831] meson_nfc_dma_buffer_setup 0x29e6c680
[    2.162527] meson_nfc_read_buf: about to kfree info
[    2.167369] meson_nfc_read_buf: kfree'd info
[    2.171611] meson_nfc_read_buf ee39a34b 0x2e39a34b
[    2.176383] meson_nfc_dma_buffer_setup 0x2e39a34b
[    2.181076] meson_nfc_read_buf: about to kfree info
[    2.185932] ------------[ cut here ]------------
[    2.190503] kernel BUG at mm/slub.c:3950!
[    2.194491] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...

meson_nfc_read_buf debug log with PER_INFO_BYTE+64 allocation:
[    2.033019] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.033112] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.037847] meson_nfc_read_buf: about to kfree info
[    2.042642] meson_nfc_read_buf: kfree'd info
[    2.046909] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.051659] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.056374] meson_nfc_read_buf: about to kfree info
[    2.061192] meson_nfc_read_buf: kfree'd info
[    2.065461] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.070208] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.074922] meson_nfc_read_buf: about to kfree info
[    2.079742] meson_nfc_read_buf: kfree'd info
[    2.084087] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.088789] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.093440] meson_nfc_read_buf: about to kfree info
[    2.098303] meson_nfc_read_buf: kfree'd info
[    2.102553] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.107316] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.111990] meson_nfc_read_buf: about to kfree info
[    2.116870] meson_nfc_read_buf: kfree'd info
[    2.121103] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.125868] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.130540] meson_nfc_read_buf: about to kfree info
[    2.135400] meson_nfc_read_buf: kfree'd info
[    2.139652] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
[    2.148276] ONFI parameter recovery failed, aborting
[    2.153165] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.157938] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.162634] meson_nfc_read_buf: about to kfree info
[    2.167475] meson_nfc_read_buf: kfree'd info
[    2.171717] meson_nfc_read_buf e9ea4280 0x29ea4280
[    2.176489] meson_nfc_dma_buffer_setup 0x29ea4280
[    2.181183] meson_nfc_read_buf: about to kfree info
[    2.186025] meson_nfc_read_buf: kfree'd info
[    2.190265] nand: device found, Manufacturer ID: 0xad, Chip ID: 0xde
[    2.196598] nand: Hynix NAND 8GiB 3,3V 8-bit
[    2.200840] nand: 8192 MiB, MLC, erase size: 4096 KiB, page size: 16384, OOB size: 1280
[    2.208829] meson_nfc_read_buf e9ea4300 0x29ea4300
[    2.213581] meson_nfc_dma_buffer_setup 0x29ea4300
[    2.218291] meson_nfc_read_buf: about to kfree info
[    2.223115] meson_nfc_read_buf: kfree'd info
[    2.227374] ------------[ cut here ]------------
[    2.231968] WARNING: CPU: 1 PID: 1 at drivers/mtd/nand/raw/nand_base.c:5503 nand_scan_with_ids+0x1718/0x171c
[    2.241760] No oob scheme defined for oobsize 1280
...
(the "No oob scheme defined for oobsize 1280" message is expected)


Regards
Martin
diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c
index b49a45f255f8..cdc426cd0a43 100644
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -493,6 +493,7 @@ static int meson_nfc_dma_buffer_setup(struct nand_chip *nand, u8 *databuf,
 
 	if (infobuf) {
 		nfc->iaddr = dma_map_single(nfc->dev, infobuf, infolen, dir);
+		printk("%s 0x%08x\n", __func__, nfc->iaddr);
 		ret = dma_mapping_error(nfc->dev, nfc->iaddr);
 		if (ret) {
 			dev_err(nfc->dev, "DMA mapping error\n");
@@ -528,10 +529,10 @@ static int meson_nfc_read_buf(struct nand_chip *nand, u8 *buf, int len)
 	u32 cmd;
 	u8 *info;
 
-	info = kzalloc(PER_INFO_BYTE, GFP_KERNEL);
+	info = kzalloc(PER_INFO_BYTE + 64, GFP_KERNEL);
 	if (!info)
 		return -ENOMEM;
-
+printk("%s %px 0x%08x\n", __func__, info, virt_to_phys(info));
 	ret = meson_nfc_dma_buffer_setup(nand, buf, len, info,
 					 PER_INFO_BYTE, DMA_FROM_DEVICE);
 	if (ret)
@@ -545,7 +546,9 @@ static int meson_nfc_read_buf(struct nand_chip *nand, u8 *buf, int len)
 	meson_nfc_dma_buffer_release(nand, len, PER_INFO_BYTE, DMA_FROM_DEVICE);
 
 out:
+printk("%s: about to kfree info\n", __func__);
 	kfree(info);
+printk("%s: kfree'd info\n", __func__);
 
 	return ret;
 }
Liang Yang March 20, 2019, 3:33 a.m. UTC | #8
Hi Martin,

Thanks for your time.
On 2019/3/20 4:27, Martin Blumenstingl wrote:
> Hello Liang,
> 
> On Sat, Mar 16, 2019 at 11:55 AM Martin Blumenstingl
> <martin.blumenstingl@googlemail.com> wrote:
> [...]
>>> Martin, now I am not sure whether it is the NFC driver that leads to the kernel
>>> panic when calling kmem_cache_alloc_trace.
>> thank you for confirming that it works for you on GXL
>>
>> I'm not sure that this is a NFC driver problem.
>> after enabling CONFIG_SLAB_FREELIST_HARDENED in my kernel config the
>> crash moves. it's now crashing in slub.c's kfree() at
>> BUG_ON(!PageCompound(page));
> I added some debug prints in meson_nfc_read_buf() to get some details
> about the info buffer before the crash,
> format is: meson_nfc_read_buf <virtual address> <physical address>
> 
> during my first test three different addresses are used:
> - meson_nfc_read_buf e9e6c640 0x29e6c640 (works fine)
> - meson_nfc_read_buf e9e6c680 0x29e6c680 (works fine)
> - meson_nfc_read_buf ee39a34b 0x2e39a34b (crashes during kfree)
> 
> so I tried playing around with the allocation size (see the attached
> patch) and changed it to:
>    kzalloc(PER_INFO_BYTE + 64, GFP_KERNEL)
> this results in the following addresses being used:
> - meson_nfc_read_buf e9ea4280 0x29ea4280 (works fine)
> - meson_nfc_read_buf e9ea4300 0x29ea4300 (works fine)
> (there is no crash anymore)
> 
> Liang, are there any special requirements on the "info address" like
> the alignment?
It must be 4-byte aligned. I ran into this previously when debugging the
NFC driver on the AXG platform, but it is not stated in the spec. Now I am
confused about how kmalloc can return the unaligned address "0xee39a34b";
I think it should return an aligned address, shouldn't it?
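
To make the alignment point concrete, here is a minimal userspace sketch that applies a 4-byte alignment test to the physical addresses printed in the debug log. addr_is_aligned() is a hypothetical helper written for this illustration; it is not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when addr is a multiple of align.
 * align must be a non-zero power of two.
 */
bool addr_is_aligned(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

With align set to 4, the addresses 0x29e6c640 and 0x29e6c680 pass the check, while 0x2e39a34b does not: its low two bits are 0x3.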

> also do you know why the PER_INFO_BYTE buffer is allocated dynamically
> in meson_nfc_read_buf() instead of allocating it at initialization?
> I'm not saying that it should be changed! I'm curious because there's
> per-meson_nfc_nand_chip info and data buffers which are allocated at
> initialization time.
NAND scan or initialization is divided into three stages:
nand_scan_ident -> nand_attach -> nand_scan_tail. The info and data buffers,
which depend on the result of nand_scan_ident, are allocated in nand_attach, so
nand_scan_ident cannot use the info buffer in meson_nfc_nand_chip.
There are two options: allocate a fixed-size info buffer before
nand_scan_ident and attach it to struct meson_nfc, or avoid DMA when
reading fewer than 8 bytes of data. Both would reduce the number of
kmalloc and kfree calls. Thanks.
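
As a rough illustration of the first option, the following userspace sketch allocates one aligned info buffer at init time and reuses it for every read instead of allocating per call. It is only a model under stated assumptions: struct nfc_model and the nfc_model_*() helpers are hypothetical names, INFO_BYTES assumes PER_INFO_BYTE is 8, and the 64-byte alignment is an arbitrary choice that satisfies the 4-byte requirement. Real driver code would use kernel allocation and DMA mapping calls instead of aligned_alloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INFO_BYTES  8    /* assumed value of PER_INFO_BYTE */
#define INFO_ALIGN  64   /* assumed alignment, a multiple of 4 */

/* Hypothetical stand-in for the controller state. */
struct nfc_model {
	uint8_t *info;   /* info buffer shared by all short reads */
};

/* Allocate the info buffer once, before any identification read. */
int nfc_model_init(struct nfc_model *nfc)
{
	nfc->info = aligned_alloc(INFO_ALIGN, INFO_ALIGN);
	if (!nfc->info)
		return -1;
	memset(nfc->info, 0, INFO_ALIGN);
	return 0;
}

/* Hand out the same buffer, zeroed, for every read. */
uint8_t *nfc_model_get_info(struct nfc_model *nfc)
{
	assert(nfc->info != NULL);
	memset(nfc->info, 0, INFO_BYTES);
	return nfc->info;
}

/* Release the buffer when the controller goes away. */
void nfc_model_exit(struct nfc_model *nfc)
{
	free(nfc->info);
	nfc->info = NULL;
}
```

The read path then performs no allocation and no free, and the buffer address is fixed and aligned for the lifetime of the controller.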
> 
> meson_nfc_read_buf debug log with PER_INFO_BYTE allocation:
> [    2.032914] meson_nfc_read_buf e9e6c640 0x29e6c640
> [    2.033005] meson_nfc_dma_buffer_setup 0x29e6c640
> [    2.037717] meson_nfc_read_buf: about to kfree info
> [    2.042535] meson_nfc_read_buf: kfree'd info
> [    2.046794] meson_nfc_read_buf e9e6c640 0x29e6c640
> [    2.051552] meson_nfc_dma_buffer_setup 0x29e6c640
> [    2.056261] meson_nfc_read_buf: about to kfree info
> [    2.061086] meson_nfc_read_buf: kfree'd info
> [    2.065356] meson_nfc_read_buf e9e6c680 0x29e6c680
> [    2.070102] meson_nfc_dma_buffer_setup 0x29e6c680
> [    2.074810] meson_nfc_read_buf: about to kfree info
> [    2.079635] meson_nfc_read_buf: kfree'd info
> [    2.083978] meson_nfc_read_buf e9e6c640 0x29e6c640
> [    2.088684] meson_nfc_dma_buffer_setup 0x29e6c640
> [    2.093334] meson_nfc_read_buf: about to kfree info
> [    2.098199] meson_nfc_read_buf: kfree'd info
> [    2.102446] meson_nfc_read_buf e9e6c640 0x29e6c640
> [    2.107208] meson_nfc_dma_buffer_setup 0x29e6c640
> [    2.111883] meson_nfc_read_buf: about to kfree info
> [    2.116765] meson_nfc_read_buf: kfree'd info
> [    2.120996] meson_nfc_read_buf e9e6c640 0x29e6c640
> [    2.125762] meson_nfc_dma_buffer_setup 0x29e6c640
> [    2.130433] meson_nfc_read_buf: about to kfree info
> [    2.135294] meson_nfc_read_buf: kfree'd info
> [    2.139545] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
> [    2.148173] ONFI parameter recovery failed, aborting
> [    2.153058] meson_nfc_read_buf e9e6c680 0x29e6c680
> [    2.157831] meson_nfc_dma_buffer_setup 0x29e6c680
> [    2.162527] meson_nfc_read_buf: about to kfree info
> [    2.167369] meson_nfc_read_buf: kfree'd info
> [    2.171611] meson_nfc_read_buf ee39a34b 0x2e39a34b
> [    2.176383] meson_nfc_dma_buffer_setup 0x2e39a34b
> [    2.181076] meson_nfc_read_buf: about to kfree info
> [    2.185932] ------------[ cut here ]------------
> [    2.190503] kernel BUG at mm/slub.c:3950!
> [    2.194491] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> ...
> 
> meson_nfc_read_buf debug log with PER_INFO_BYTE+64 allocation:
> [    2.033019] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.033112] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.037847] meson_nfc_read_buf: about to kfree info
> [    2.042642] meson_nfc_read_buf: kfree'd info
> [    2.046909] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.051659] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.056374] meson_nfc_read_buf: about to kfree info
> [    2.061192] meson_nfc_read_buf: kfree'd info
> [    2.065461] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.070208] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.074922] meson_nfc_read_buf: about to kfree info
> [    2.079742] meson_nfc_read_buf: kfree'd info
> [    2.084087] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.088789] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.093440] meson_nfc_read_buf: about to kfree info
> [    2.098303] meson_nfc_read_buf: kfree'd info
> [    2.102553] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.107316] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.111990] meson_nfc_read_buf: about to kfree info
> [    2.116870] meson_nfc_read_buf: kfree'd info
> [    2.121103] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.125868] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.130540] meson_nfc_read_buf: about to kfree info
> [    2.135400] meson_nfc_read_buf: kfree'd info
> [    2.139652] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
> [    2.148276] ONFI parameter recovery failed, aborting
> [    2.153165] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.157938] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.162634] meson_nfc_read_buf: about to kfree info
> [    2.167475] meson_nfc_read_buf: kfree'd info
> [    2.171717] meson_nfc_read_buf e9ea4280 0x29ea4280
> [    2.176489] meson_nfc_dma_buffer_setup 0x29ea4280
> [    2.181183] meson_nfc_read_buf: about to kfree info
> [    2.186025] meson_nfc_read_buf: kfree'd info
> [    2.190265] nand: device found, Manufacturer ID: 0xad, Chip ID: 0xde
> [    2.196598] nand: Hynix NAND 8GiB 3,3V 8-bit
> [    2.200840] nand: 8192 MiB, MLC, erase size: 4096 KiB, page size: 16384, OOB size: 1280
> [    2.208829] meson_nfc_read_buf e9ea4300 0x29ea4300
> [    2.213581] meson_nfc_dma_buffer_setup 0x29ea4300
> [    2.218291] meson_nfc_read_buf: about to kfree info
> [    2.223115] meson_nfc_read_buf: kfree'd info
> [    2.227374] ------------[ cut here ]------------
> [    2.231968] WARNING: CPU: 1 PID: 1 at drivers/mtd/nand/raw/nand_base.c:5503 nand_scan_with_ids+0x1718/0x171c
> [    2.241760] No oob scheme defined for oobsize 1280
> ...
> (the "No oob scheme defined for oobsize 1280" message is expected)
> 
The mtd_set_ooblayout(mtd, &meson_ooblayout_ops) call is missing from
meson_nand_attach_chip().
> 
> Regards
> Martin
>
Martin Blumenstingl March 20, 2019, 8:48 p.m. UTC | #9
Hi Liang,

On Wed, Mar 20, 2019 at 4:32 AM Liang Yang <liang.yang@amlogic.com> wrote:
>
> Hi Martin,
>
> Thanks for your time.
> On 2019/3/20 4:27, Martin Blumenstingl wrote:
> > Hello Liang,
> >
> > On Sat, Mar 16, 2019 at 11:55 AM Martin Blumenstingl
> > <martin.blumenstingl@googlemail.com> wrote:
> > [...]
> >>> Martin, now I am not sure whether it is the NFC driver that leads to the kernel
> >>> panic when calling kmem_cache_alloc_trace.
> >> thank you for confirming that it works for you on GXL
> >>
> >> I'm not sure that this is a NFC driver problem.
> >> after enabling CONFIG_SLAB_FREELIST_HARDENED in my kernel config the
> >> crash moves. it's now crashing in slub.c's kfree() at
> >> BUG_ON(!PageCompound(page));
> > I added some debug prints in meson_nfc_read_buf() to get some details
> > about the info buffer before the crash,
> > format is: meson_nfc_read_buf <virtual address> <physical address>
> >
> > during my first test three different addresses are used:
> > - meson_nfc_read_buf e9e6c640 0x29e6c640 (works fine)
> > - meson_nfc_read_buf e9e6c680 0x29e6c680 (works fine)
> > - meson_nfc_read_buf ee39a34b 0x2e39a34b (crashes during kfree)
> >
> > so I tried playing around with the allocation size (see the attached
> > patch) and changed it to:
> >    kzalloc(PER_INFO_BYTE + 64, GFP_KERNEL)
> > this results in the following addresses being used:
> > - meson_nfc_read_buf e9ea4280 0x29ea4280 (works fine)
> > - meson_nfc_read_buf e9ea4300 0x29ea4300 (works fine)
> > (there is no crash anymore)
> >
> > Liang, are there any special requirements on the "info address" like
> > the alignment?
> It must be 4-byte aligned. I ran into this previously when debugging the
> NFC driver on the AXG platform, but it is not stated in the spec. Now I am
> confused about how kmalloc can return the unaligned address "0xee39a34b";
> I think it should return an aligned address, shouldn't it?
thank you for confirming the 4-byte alignment requirement!
I have no explanation for the unaligned address returned by kzalloc().
I'll ask on the linux-mm mailing list if they have a hint why this
happens.

> > also do you know why the PER_INFO_BYTE buffer is allocated dynamically
> > in meson_nfc_read_buf() instead of allocating it at initialization?
> > I'm not saying that it should be changed! I'm curious because there's
> > per-meson_nfc_nand_chip info and data buffers which are allocated at
> > initialization time.
> NAND scan or initialization is divided into three stages:
> nand_scan_ident -> nand_attach -> nand_scan_tail. The info and data buffers,
> which depend on the result of nand_scan_ident, are allocated in nand_attach, so
> nand_scan_ident cannot use the info buffer in meson_nfc_nand_chip.
thank you for the explanation!

> There are two options: allocate a fixed-size info buffer before
> nand_scan_ident and attach it to struct meson_nfc, or avoid DMA when
> reading fewer than 8 bytes of data. Both would reduce the number of
> kmalloc and kfree calls. Thanks.
both suggestions sound reasonable.
however, I will search for the root cause of the unaligned address
first before changing the Meson NFC driver.

[...]
> > [    2.227374] ------------[ cut here ]------------
> > [    2.231968] WARNING: CPU: 1 PID: 1 at drivers/mtd/nand/raw/nand_base.c:5503 nand_scan_with_ids+0x1718/0x171c
> > [    2.241760] No oob scheme defined for oobsize 1280
> > ...
> > (the "No oob scheme defined for oobsize 1280" message is expected)
> >
> The mtd_set_ooblayout(mtd, &meson_ooblayout_ops) call is missing from
> meson_nand_attach_chip().
thank you for the suggestion. I didn't have time to test this on my
board yet but I'll let you know about my results during the weekend.
Does the missing mtd_set_ooblayout() call also affect GXL or AXG boards?


Regards
Martin
Liang Yang March 21, 2019, 12:10 p.m. UTC | #10
Hi Martin,

On 2019/3/21 4:48, Martin Blumenstingl wrote:
> Hi Liang,
> 
> On Wed, Mar 20, 2019 at 4:32 AM Liang Yang <liang.yang@amlogic.com> wrote:
>>
>> Hi Martin,
>>
>> Thanks for your time.
>> On 2019/3/20 4:27, Martin Blumenstingl wrote:
>>> Hello Liang,
>>>
>>> On Sat, Mar 16, 2019 at 11:55 AM Martin Blumenstingl
>>> <martin.blumenstingl@googlemail.com> wrote:
>>> [...]
> 
>> There are two options: allocate a fixed-size info buffer before
>> nand_scan_ident and attach it to struct meson_nfc, or avoid DMA when
>> reading fewer than 8 bytes of data. Both would reduce the number of
>> kmalloc and kfree calls. Thanks.
> both suggestions sound reasonable.
> however, I will search for the root cause of the unaligned address
> first before changing the Meson NFC driver.
That is good. I will implement one of the two options mentioned above soon.
> 
> [...]
>>> [    2.227374] ------------[ cut here ]------------
>>> [    2.231968] WARNING: CPU: 1 PID: 1 at drivers/mtd/nand/raw/nand_base.c:5503 nand_scan_with_ids+0x1718/0x171c
>>> [    2.241760] No oob scheme defined for oobsize 1280
>>> ...
>>> (the "No oob scheme defined for oobsize 1280" message is expected)
>>>
>> The mtd_set_ooblayout(mtd, &meson_ooblayout_ops) call is missing from
>> meson_nand_attach_chip().
> thank you for the suggestion. I didn't have time to test this on my
> board yet but I'll let you know about my results during the weekend.
> Does the missing mtd_set_ooblayout() call also affect GXL or AXG boards?
> 
Yes. I deleted it unintentionally.
> 
> Regards
> Martin
> 