Message ID | 20230130153247.445027-1-alexander.stein@ew.tq-group.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] i2c: imx-lpi2c: clean rx/tx buffers upon new message | expand |
Hi, any feedback on this series? Best regards, Alexander Am Montag, 30. Januar 2023, 16:32:46 CET schrieb Alexander Stein: > When start sending a new message clear the Rx & Tx buffer pointers in > order to avoid using stale pointers. > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > --- > I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. > > [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack is > corrupted in: pcf85063_rtc_read_time+0x10/0x100 [ 2.706669] CPU: 1 PID: > 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 > ca067559321ae817c063baccdba80d328f10f73 [ 2.718331] Hardware name: > TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) [ 2.724866] Workqueue: > events_unbound deferred_probe_work_func > [ 2.730712] Call trace: > [ 2.733161] dump_backtrace+0x9c/0x11c > [ 2.736914] show_stack+0x14/0x1c > [ 2.740232] dump_stack_lvl+0x5c/0x78 > [ 2.743907] dump_stack+0x14/0x1c > [ 2.747225] panic+0x34c/0x39c > [ 2.750283] __ktime_get_real_seconds+0x0/0xc > [ 2.754653] pcf85063_ioctl+0x0/0xf0 > [ 2.758232] __rtc_read_time+0x44/0x114 > [ 2.762081] __rtc_read_alarm+0x258/0x460 > [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 > [ 2.770986] pcf85063_probe+0x258/0x4d4 > [ 2.774825] i2c_device_probe+0x100/0x33c > > The backtrace did not indicate the actual cause of it. Checking the code the > RTC driver seemed to be ok, so it has to be in the i2c bus driver. At some > point I noticed that I see both Rx and Tx interrupts at the same time, > which is odd. Also both rx_buf and tx_buf was set simultaneously. Clearly a > bug to me. > Clearing the buffer pointers upon each new i2c message triggered a NULL > pointer dereference: > > [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual > address 0000000000000001 [ 2.703730] Mem abort info: > [ 2.706525] ESR = 0x0000000096000004 > [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits > [ 2.715595] SET = 0, FnV = 0 > [ 2.718653] EA = 0, S1PTW = 0 > [ 2.721798] FSC = 0x04: level 0 translation fault > [ 2.726680] Data abort info: > [ 2.729556] ISV = 0, ISS = 0x00000004 > [ 2.733387] CM = 0, WnR = 0 > [ 2.736358] [0000000000000001] user address but active_mm is swapper > [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > [ 2.748990] Modules linked in: > [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce 47da7f92cec1 > [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) > [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS > BTYPE=--) [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 > [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c > [ 2.785512] sp : ffff800008003ef0 > [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: > 00000000bfe632c8 [ 2.795980] x26: 0000000000000000 x25: ffff800009b935ed > x24: ffff800009a4d4c0 [ 2.803130] x23: ffff00000365e800 x22: > 0000000000000128 x21: 0000000000000000 [ 2.810280] x20: ffff0000033f4080 > x19: 0000000003000103 x18: 0000000000000000 [ 2.817430] x17: > ffff80003688a000 x16: ffff800008000000 x15: 0000000000000000 [ 2.824579] > x14: 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 [ > 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : ffff8000099b3d20 > [ 2.838879] x8 : ffff8000099c29a0 x7 : 00000000000000c0 x6 : > ffff000002838028 [ 2.846029] x5 : 0000000000000002 x4 : 0000000000000000 > x3 : 0000000000000000 [ 2.849626] imx-scu system-controller: RPC send > msg timeout > [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : > ffff0000033f4080 [ 2.858764] enet1: failed to power off resource 252 > ret -110 > [ 2.865897] Call trace: > [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 > [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 > [ 2.883151] handle_irq_event+0x48/0xb0 > > $ ./scripts/faddr2line build_arm64/vmlinux lpi2c_imx_write_txfifo+0x44/0xb0 > lpi2c_imx_write_txfifo+0x44/0xb0: > lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 > > This now clearly pinpoints the wrong access which previously corrupted the > stack. The error leading to this wrong access is addressed in the > following patch. > > drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c > b/drivers/i2c/busses/i2c-imx-lpi2c.c index 188f2a36d2fd..c6d0225246e6 > 100644 > --- a/drivers/i2c/busses/i2c-imx-lpi2c.c > +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c > @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, > if (num == 1 && msgs[0].len == 0) > goto stop; > > + lpi2c_imx->rx_buf = NULL; > + lpi2c_imx->tx_buf = NULL; > lpi2c_imx->delivered = 0; > lpi2c_imx->msglen = msgs[i].len; > init_completion(&lpi2c_imx->complete);
On 30/01/2023 16:32, Alexander Stein wrote: > When start sending a new message clear the Rx & Tx buffer pointers in > order to avoid using stale pointers. > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > --- > I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. > > [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pcf85063_rtc_read_time+0x10/0x100 > [ 2.706669] CPU: 1 PID: 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 ca067559321ae817c063baccdba80d328f10f73 > [ 2.718331] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) > [ 2.724866] Workqueue: events_unbound deferred_probe_work_func > [ 2.730712] Call trace: > [ 2.733161] dump_backtrace+0x9c/0x11c > [ 2.736914] show_stack+0x14/0x1c > [ 2.740232] dump_stack_lvl+0x5c/0x78 > [ 2.743907] dump_stack+0x14/0x1c > [ 2.747225] panic+0x34c/0x39c > [ 2.750283] __ktime_get_real_seconds+0x0/0xc > [ 2.754653] pcf85063_ioctl+0x0/0xf0 > [ 2.758232] __rtc_read_time+0x44/0x114 > [ 2.762081] __rtc_read_alarm+0x258/0x460 > [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 > [ 2.770986] pcf85063_probe+0x258/0x4d4 > [ 2.774825] i2c_device_probe+0x100/0x33c > > The backtrace did not indicate the actual cause of it. Checking the code the > RTC driver seemed to be ok, so it has to be in the i2c bus driver. > At some point I noticed that I see both Rx and Tx interrupts at the same time, > which is odd. Also both rx_buf and tx_buf was set simultaneously. Clearly > a bug to me. > Clearing the buffer pointers upon each new i2c message triggered a NULL > pointer dereference: > > [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001 > [ 2.703730] Mem abort info: > [ 2.706525] ESR = 0x0000000096000004 > [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits > [ 2.715595] SET = 0, FnV = 0 > [ 2.718653] EA = 0, S1PTW = 0 > [ 2.721798] FSC = 0x04: level 0 translation fault > [ 2.726680] Data abort info: > [ 2.729556] ISV = 0, ISS = 0x00000004 > [ 2.733387] CM = 0, WnR = 0 > [ 2.736358] [0000000000000001] user address but active_mm is swapper > [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > [ 2.748990] Modules linked in: > [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce > 47da7f92cec1 > [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) > [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 > [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c > [ 2.785512] sp : ffff800008003ef0 > [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: 00000000bfe632c8 > [ 2.795980] x26: 0000000000000000 x25: ffff800009b935ed x24: ffff800009a4d4c0 > [ 2.803130] x23: ffff00000365e800 x22: 0000000000000128 x21: 0000000000000000 > [ 2.810280] x20: ffff0000033f4080 x19: 0000000003000103 x18: 0000000000000000 > [ 2.817430] x17: ffff80003688a000 x16: ffff800008000000 x15: 0000000000000000 > [ 2.824579] x14: 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 > [ 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : ffff8000099b3d20 > [ 2.838879] x8 : ffff8000099c29a0 x7 : 00000000000000c0 x6 : ffff000002838028 > [ 2.846029] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 > [ 2.849626] imx-scu system-controller: RPC send msg timeout > [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : ffff0000033f4080 > [ 2.858764] enet1: failed to power off resource 252 ret -110 > [ 2.865897] Call trace: > [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 > [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 > [ 2.883151] handle_irq_event+0x48/0xb0 > > $ ./scripts/faddr2line build_arm64/vmlinux lpi2c_imx_write_txfifo+0x44/0xb0 > lpi2c_imx_write_txfifo+0x44/0xb0: > lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 > > This now clearly pinpoints the wrong access which previously corrupted the > stack. The error leading to this wrong access is addressed in the > following patch. > > drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c > index 188f2a36d2fd..c6d0225246e6 100644 > --- a/drivers/i2c/busses/i2c-imx-lpi2c.c > +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c > @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, > if (num == 1 && msgs[0].len == 0) > goto stop; > > + lpi2c_imx->rx_buf = NULL; > + lpi2c_imx->tx_buf = NULL; > lpi2c_imx->delivered = 0; > lpi2c_imx->msglen = msgs[i].len; > init_completion(&lpi2c_imx->complete); Hello, I have same problem with rtc-ds1307 driver and NXP imx8x (using ic2-imx-lpi2c.c bus driver). I do not have the full stack trace but I'm sure is very similar: [ 10.750015] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ds1307_get_time+0x2a4/0x2c4 [rtc_ds1307] Your patches are fixing this too and they seem good to me. About the [2/2] patch your approach sound better to me than the downstream approach. Emanuele Ghidoli
Hi Emanuele, Am Donnerstag, 2. März 2023, 12:06:18 CET schrieb Emanuele Ghidoli: > On 30/01/2023 16:32, Alexander Stein wrote: > > When start sending a new message clear the Rx & Tx buffer pointers in > > order to avoid using stale pointers. > > > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > > --- > > I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. > > > > [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack > > is corrupted in: pcf85063_rtc_read_time+0x10/0x100 [ 2.706669] CPU: 1 > > PID: 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 > > ca067559321ae817c063baccdba80d328f10f73 [ 2.718331] Hardware name: > > TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) [ 2.724866] Workqueue: > > events_unbound deferred_probe_work_func > > [ 2.730712] Call trace: > > [ 2.733161] dump_backtrace+0x9c/0x11c > > [ 2.736914] show_stack+0x14/0x1c > > [ 2.740232] dump_stack_lvl+0x5c/0x78 > > [ 2.743907] dump_stack+0x14/0x1c > > [ 2.747225] panic+0x34c/0x39c > > [ 2.750283] __ktime_get_real_seconds+0x0/0xc > > [ 2.754653] pcf85063_ioctl+0x0/0xf0 > > [ 2.758232] __rtc_read_time+0x44/0x114 > > [ 2.762081] __rtc_read_alarm+0x258/0x460 > > [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 > > [ 2.770986] pcf85063_probe+0x258/0x4d4 > > [ 2.774825] i2c_device_probe+0x100/0x33c > > > > The backtrace did not indicate the actual cause of it. Checking the code > > the RTC driver seemed to be ok, so it has to be in the i2c bus driver. At > > some point I noticed that I see both Rx and Tx interrupts at the same > > time, which is odd. Also both rx_buf and tx_buf was set simultaneously. > > Clearly a bug to me. > > Clearing the buffer pointers upon each new i2c message triggered a NULL > > pointer dereference: > > > > [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual > > address 0000000000000001 [ 2.703730] Mem abort info: > > [ 2.706525] ESR = 0x0000000096000004 > > [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits > > [ 2.715595] SET = 0, FnV = 0 > > [ 2.718653] EA = 0, S1PTW = 0 > > [ 2.721798] FSC = 0x04: level 0 translation fault > > [ 2.726680] Data abort info: > > [ 2.729556] ISV = 0, ISS = 0x00000004 > > [ 2.733387] CM = 0, WnR = 0 > > [ 2.736358] [0000000000000001] user address but active_mm is swapper > > [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > > [ 2.748990] Modules linked in: > > [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > > 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce 47da7f92cec1 > > [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) > > [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS > > BTYPE=--) [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 > > [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c > > [ 2.785512] sp : ffff800008003ef0 > > [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: > > 00000000bfe632c8 [ 2.795980] x26: 0000000000000000 x25: > > ffff800009b935ed x24: ffff800009a4d4c0 [ 2.803130] x23: > > ffff00000365e800 x22: 0000000000000128 x21: 0000000000000000 [ > > 2.810280] x20: ffff0000033f4080 x19: 0000000003000103 x18: > > 0000000000000000 [ 2.817430] x17: ffff80003688a000 x16: > > ffff800008000000 x15: 0000000000000000 [ 2.824579] x14: > > 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 [ > > 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : > > ffff8000099b3d20 [ 2.838879] x8 : ffff8000099c29a0 x7 : > > 00000000000000c0 x6 : ffff000002838028 [ 2.846029] x5 : > > 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 [ > > 2.849626] imx-scu system-controller: RPC send msg timeout > > [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : > > ffff0000033f4080 [ 2.858764] enet1: failed to power off resource 252 > > ret -110 > > [ 2.865897] Call trace: > > [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 > > [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 > > [ 2.883151] handle_irq_event+0x48/0xb0 > > > > $ ./scripts/faddr2line build_arm64/vmlinux > > lpi2c_imx_write_txfifo+0x44/0xb0 > > lpi2c_imx_write_txfifo+0x44/0xb0: > > lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 > > > > This now clearly pinpoints the wrong access which previously corrupted the > > stack. The error leading to this wrong access is addressed in the > > following patch. > > > > drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c > > b/drivers/i2c/busses/i2c-imx-lpi2c.c index 188f2a36d2fd..c6d0225246e6 > > 100644 > > --- a/drivers/i2c/busses/i2c-imx-lpi2c.c > > +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c > > @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, > > > > if (num == 1 && msgs[0].len == 0) > > > > goto stop; > > > > + lpi2c_imx->rx_buf = NULL; > > + lpi2c_imx->tx_buf = NULL; > > > > lpi2c_imx->delivered = 0; > > lpi2c_imx->msglen = msgs[i].len; > > init_completion(&lpi2c_imx->complete); > > Hello, > I have same problem with rtc-ds1307 driver and NXP imx8x (using > ic2-imx-lpi2c.c bus driver). I do not have the full stack trace but I'm > sure is very similar: > [ 10.750015] Kernel panic - not syncing: stack-protector: Kernel stack is > corrupted in: ds1307_get_time+0x2a4/0x2c4 [rtc_ds1307] > > Your patches are fixing this too and they seem good to me. > About the [2/2] patch your approach sound better to me than the downstream > approach. > > Emanuele Ghidoli Thanks for the feedback. Could you provide then a Tested-by tag? Best regards, Alexander
On 02/03/2023 12:15, Alexander Stein wrote: > Hi Emanuele, > > Am Donnerstag, 2. März 2023, 12:06:18 CET schrieb Emanuele Ghidoli: >> On 30/01/2023 16:32, Alexander Stein wrote: >>> When start sending a new message clear the Rx & Tx buffer pointers in >>> order to avoid using stale pointers. >>> >>> Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> >>> --- >>> I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. >>> >>> [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack >>> is corrupted in: pcf85063_rtc_read_time+0x10/0x100 [ 2.706669] CPU: 1 >>> PID: 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 >>> ca067559321ae817c063baccdba80d328f10f73 [ 2.718331] Hardware name: >>> TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) [ 2.724866] Workqueue: >>> events_unbound deferred_probe_work_func >>> [ 2.730712] Call trace: >>> [ 2.733161] dump_backtrace+0x9c/0x11c >>> [ 2.736914] show_stack+0x14/0x1c >>> [ 2.740232] dump_stack_lvl+0x5c/0x78 >>> [ 2.743907] dump_stack+0x14/0x1c >>> [ 2.747225] panic+0x34c/0x39c >>> [ 2.750283] __ktime_get_real_seconds+0x0/0xc >>> [ 2.754653] pcf85063_ioctl+0x0/0xf0 >>> [ 2.758232] __rtc_read_time+0x44/0x114 >>> [ 2.762081] __rtc_read_alarm+0x258/0x460 >>> [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 >>> [ 2.770986] pcf85063_probe+0x258/0x4d4 >>> [ 2.774825] i2c_device_probe+0x100/0x33c >>> >>> The backtrace did not indicate the actual cause of it. Checking the code >>> the RTC driver seemed to be ok, so it has to be in the i2c bus driver. At >>> some point I noticed that I see both Rx and Tx interrupts at the same >>> time, which is odd. Also both rx_buf and tx_buf was set simultaneously. >>> Clearly a bug to me. >>> Clearing the buffer pointers upon each new i2c message triggered a NULL >>> pointer dereference: >>> >>> [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual >>> address 0000000000000001 [ 2.703730] Mem abort info: >>> [ 2.706525] ESR = 0x0000000096000004 >>> [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits >>> [ 2.715595] SET = 0, FnV = 0 >>> [ 2.718653] EA = 0, S1PTW = 0 >>> [ 2.721798] FSC = 0x04: level 0 translation fault >>> [ 2.726680] Data abort info: >>> [ 2.729556] ISV = 0, ISS = 0x00000004 >>> [ 2.733387] CM = 0, WnR = 0 >>> [ 2.736358] [0000000000000001] user address but active_mm is swapper >>> [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP >>> [ 2.748990] Modules linked in: >>> [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted >>> 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce 47da7f92cec1 >>> [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) >>> [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS >>> BTYPE=--) [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 >>> [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c >>> [ 2.785512] sp : ffff800008003ef0 >>> [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: >>> 00000000bfe632c8 [ 2.795980] x26: 0000000000000000 x25: >>> ffff800009b935ed x24: ffff800009a4d4c0 [ 2.803130] x23: >>> ffff00000365e800 x22: 0000000000000128 x21: 0000000000000000 [ >>> 2.810280] x20: ffff0000033f4080 x19: 0000000003000103 x18: >>> 0000000000000000 [ 2.817430] x17: ffff80003688a000 x16: >>> ffff800008000000 x15: 0000000000000000 [ 2.824579] x14: >>> 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 [ >>> 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : >>> ffff8000099b3d20 [ 2.838879] x8 : ffff8000099c29a0 x7 : >>> 00000000000000c0 x6 : ffff000002838028 [ 2.846029] x5 : >>> 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 [ >>> 2.849626] imx-scu system-controller: RPC send msg timeout >>> [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : >>> ffff0000033f4080 [ 2.858764] enet1: failed to power off resource 252 >>> ret -110 >>> [ 2.865897] Call trace: >>> [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 >>> [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 >>> [ 2.883151] handle_irq_event+0x48/0xb0 >>> >>> $ ./scripts/faddr2line build_arm64/vmlinux >>> lpi2c_imx_write_txfifo+0x44/0xb0 >>> lpi2c_imx_write_txfifo+0x44/0xb0: >>> lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 >>> >>> This now clearly pinpoints the wrong access which previously corrupted the >>> stack. The error leading to this wrong access is addressed in the >>> following patch. >>> >>> drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c >>> b/drivers/i2c/busses/i2c-imx-lpi2c.c index 188f2a36d2fd..c6d0225246e6 >>> 100644 >>> --- a/drivers/i2c/busses/i2c-imx-lpi2c.c >>> +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c >>> @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, >>> >>> if (num == 1 && msgs[0].len == 0) >>> >>> goto stop; >>> >>> + lpi2c_imx->rx_buf = NULL; >>> + lpi2c_imx->tx_buf = NULL; >>> >>> lpi2c_imx->delivered = 0; >>> lpi2c_imx->msglen = msgs[i].len; >>> init_completion(&lpi2c_imx->complete); >> >> Hello, >> I have same problem with rtc-ds1307 driver and NXP imx8x (using >> ic2-imx-lpi2c.c bus driver). I do not have the full stack trace but I'm >> sure is very similar: >> [ 10.750015] Kernel panic - not syncing: stack-protector: Kernel stack is >> corrupted in: ds1307_get_time+0x2a4/0x2c4 [rtc_ds1307] >> >> Your patches are fixing this too and they seem good to me. >> About the [2/2] patch your approach sound better to me than the downstream >> approach. >> >> Emanuele Ghidoli > > Thanks for the feedback. Could you provide then a Tested-by tag? > > Best regards, > Alexander > Yes, for sure tested and reviewed: Tested-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com> Best regards, Emanuele
On 02/03/2023 12:06, Emanuele Ghidoli wrote: > On 30/01/2023 16:32, Alexander Stein wrote: >> When start sending a new message clear the Rx & Tx buffer pointers in >> order to avoid using stale pointers. >> >> Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> >> --- >> I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. >> >> [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pcf85063_rtc_read_time+0x10/0x100 >> [ 2.706669] CPU: 1 PID: 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 ca067559321ae817c063baccdba80d328f10f73 >> [ 2.718331] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) >> [ 2.724866] Workqueue: events_unbound deferred_probe_work_func >> [ 2.730712] Call trace: >> [ 2.733161] dump_backtrace+0x9c/0x11c >> [ 2.736914] show_stack+0x14/0x1c >> [ 2.740232] dump_stack_lvl+0x5c/0x78 >> [ 2.743907] dump_stack+0x14/0x1c >> [ 2.747225] panic+0x34c/0x39c >> [ 2.750283] __ktime_get_real_seconds+0x0/0xc >> [ 2.754653] pcf85063_ioctl+0x0/0xf0 >> [ 2.758232] __rtc_read_time+0x44/0x114 >> [ 2.762081] __rtc_read_alarm+0x258/0x460 >> [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 >> [ 2.770986] pcf85063_probe+0x258/0x4d4 >> [ 2.774825] i2c_device_probe+0x100/0x33c >> >> The backtrace did not indicate the actual cause of it. Checking the code the >> RTC driver seemed to be ok, so it has to be in the i2c bus driver. >> At some point I noticed that I see both Rx and Tx interrupts at the same time, >> which is odd. Also both rx_buf and tx_buf was set simultaneously. Clearly >> a bug to me. >> Clearing the buffer pointers upon each new i2c message triggered a NULL >> pointer dereference: >> >> [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001 >> [ 2.703730] Mem abort info: >> [ 2.706525] ESR = 0x0000000096000004 >> [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits >> [ 2.715595] SET = 0, FnV = 0 >> [ 2.718653] EA = 0, S1PTW = 0 >> [ 2.721798] FSC = 0x04: level 0 translation fault >> [ 2.726680] Data abort info: >> [ 2.729556] ISV = 0, ISS = 0x00000004 >> [ 2.733387] CM = 0, WnR = 0 >> [ 2.736358] [0000000000000001] user address but active_mm is swapper >> [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP >> [ 2.748990] Modules linked in: >> [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce >> 47da7f92cec1 >> [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) >> [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 >> [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c >> [ 2.785512] sp : ffff800008003ef0 >> [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: 00000000bfe632c8 >> [ 2.795980] x26: 0000000000000000 x25: ffff800009b935ed x24: ffff800009a4d4c0 >> [ 2.803130] x23: ffff00000365e800 x22: 0000000000000128 x21: 0000000000000000 >> [ 2.810280] x20: ffff0000033f4080 x19: 0000000003000103 x18: 0000000000000000 >> [ 2.817430] x17: ffff80003688a000 x16: ffff800008000000 x15: 0000000000000000 >> [ 2.824579] x14: 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 >> [ 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : ffff8000099b3d20 >> [ 2.838879] x8 : ffff8000099c29a0 x7 : 00000000000000c0 x6 : ffff000002838028 >> [ 2.846029] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 >> [ 2.849626] imx-scu system-controller: RPC send msg timeout >> [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : ffff0000033f4080 >> [ 2.858764] enet1: failed to power off resource 252 ret -110 >> [ 2.865897] Call trace: >> [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 >> [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 >> [ 2.883151] handle_irq_event+0x48/0xb0 >> >> $ ./scripts/faddr2line build_arm64/vmlinux lpi2c_imx_write_txfifo+0x44/0xb0 >> lpi2c_imx_write_txfifo+0x44/0xb0: >> lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 >> >> This now clearly pinpoints the wrong access which previously corrupted the >> stack. The error leading to this wrong access is addressed in the >> following patch. >> >> drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c >> index 188f2a36d2fd..c6d0225246e6 100644 >> --- a/drivers/i2c/busses/i2c-imx-lpi2c.c >> +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c >> @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, >> if (num == 1 && msgs[0].len == 0) >> goto stop; >> + lpi2c_imx->rx_buf = NULL; >> + lpi2c_imx->tx_buf = NULL; >> lpi2c_imx->delivered = 0; >> lpi2c_imx->msglen = msgs[i].len; >> init_completion(&lpi2c_imx->complete); > > Hello, > I have same problem with rtc-ds1307 driver and NXP imx8x (using ic2-imx-lpi2c.c bus driver). > I do not have the full stack trace but I'm sure is very similar: > [ 10.750015] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ds1307_get_time+0x2a4/0x2c4 [rtc_ds1307] > > Your patches are fixing this too and they seem good to me. > About the [2/2] patch your approach sound better to me than the downstream approach. > > Emanuele Ghidoli > > > > Hello, maybe it is worth to tag also with "Fixes:"... I do not know if it was here from the first commit introducing the driver. Best regards, Emanuele
On Mon, Jan 30, 2023 at 04:32:46PM +0100, Alexander Stein wrote: > When start sending a new message clear the Rx & Tx buffer pointers in > order to avoid using stale pointers. > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Dong Aisheng, what do you think of these patches? > --- > I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. > > [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pcf85063_rtc_read_time+0x10/0x100 > [ 2.706669] CPU: 1 PID: 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 ca067559321ae817c063baccdba80d328f10f73 > [ 2.718331] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) > [ 2.724866] Workqueue: events_unbound deferred_probe_work_func > [ 2.730712] Call trace: > [ 2.733161] dump_backtrace+0x9c/0x11c > [ 2.736914] show_stack+0x14/0x1c > [ 2.740232] dump_stack_lvl+0x5c/0x78 > [ 2.743907] dump_stack+0x14/0x1c > [ 2.747225] panic+0x34c/0x39c > [ 2.750283] __ktime_get_real_seconds+0x0/0xc > [ 2.754653] pcf85063_ioctl+0x0/0xf0 > [ 2.758232] __rtc_read_time+0x44/0x114 > [ 2.762081] __rtc_read_alarm+0x258/0x460 > [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 > [ 2.770986] pcf85063_probe+0x258/0x4d4 > [ 2.774825] i2c_device_probe+0x100/0x33c > > The backtrace did not indicate the actual cause of it. Checking the code the > RTC driver seemed to be ok, so it has to be in the i2c bus driver. > At some point I noticed that I see both Rx and Tx interrupts at the same time, > which is odd. Also both rx_buf and tx_buf was set simultaneously. Clearly > a bug to me. > Clearing the buffer pointers upon each new i2c message triggered a NULL > pointer dereference: > > [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001 > [ 2.703730] Mem abort info: > [ 2.706525] ESR = 0x0000000096000004 > [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits > [ 2.715595] SET = 0, FnV = 0 > [ 2.718653] EA = 0, S1PTW = 0 > [ 2.721798] FSC = 0x04: level 0 translation fault > [ 2.726680] Data abort info: > [ 2.729556] ISV = 0, ISS = 0x00000004 > [ 2.733387] CM = 0, WnR = 0 > [ 2.736358] [0000000000000001] user address but active_mm is swapper > [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > [ 2.748990] Modules linked in: > [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce > 47da7f92cec1 > [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) > [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 > [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c > [ 2.785512] sp : ffff800008003ef0 > [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: 00000000bfe632c8 > [ 2.795980] x26: 0000000000000000 x25: ffff800009b935ed x24: ffff800009a4d4c0 > [ 2.803130] x23: ffff00000365e800 x22: 0000000000000128 x21: 0000000000000000 > [ 2.810280] x20: ffff0000033f4080 x19: 0000000003000103 x18: 0000000000000000 > [ 2.817430] x17: ffff80003688a000 x16: ffff800008000000 x15: 0000000000000000 > [ 2.824579] x14: 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 > [ 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : ffff8000099b3d20 > [ 2.838879] x8 : ffff8000099c29a0 x7 : 00000000000000c0 x6 : ffff000002838028 > [ 2.846029] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 > [ 2.849626] imx-scu system-controller: RPC send msg timeout > [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : ffff0000033f4080 > [ 2.858764] enet1: failed to power off resource 252 ret -110 > [ 2.865897] Call trace: > [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 > [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 > [ 2.883151] handle_irq_event+0x48/0xb0 > > $ ./scripts/faddr2line build_arm64/vmlinux lpi2c_imx_write_txfifo+0x44/0xb0 > lpi2c_imx_write_txfifo+0x44/0xb0: > lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 > > This now clearly pinpoints the wrong access which previously corrupted the > stack. The error leading to this wrong access is addressed in the > following patch. > > drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c > index 188f2a36d2fd..c6d0225246e6 100644 > --- a/drivers/i2c/busses/i2c-imx-lpi2c.c > +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c > @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, > if (num == 1 && msgs[0].len == 0) > goto stop; > > + lpi2c_imx->rx_buf = NULL; > + lpi2c_imx->tx_buf = NULL; > lpi2c_imx->delivered = 0; > lpi2c_imx->msglen = msgs[i].len; > init_completion(&lpi2c_imx->complete); > -- > 2.34.1 >
On Mon, Jan 30, 2023 at 04:32:46PM +0100, Alexander Stein wrote: > When start sending a new message clear the Rx & Tx buffer pointers in > order to avoid using stale pointers. > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Applied to for-current, thanks!
On Thu, Mar 09, 2023 at 10:19:46PM +0100, Wolfram Sang wrote: > On Mon, Jan 30, 2023 at 04:32:46PM +0100, Alexander Stein wrote: > > When start sending a new message clear the Rx & Tx buffer pointers in > > order to avoid using stale pointers. > > > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > > Dong Aisheng, what do you think of these patches? Dong Aisheng, are you still with us?
diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c index 188f2a36d2fd..c6d0225246e6 100644 --- a/drivers/i2c/busses/i2c-imx-lpi2c.c +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c @@ -463,6 +463,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, if (num == 1 && msgs[0].len == 0) goto stop; + lpi2c_imx->rx_buf = NULL; + lpi2c_imx->tx_buf = NULL; lpi2c_imx->delivered = 0; lpi2c_imx->msglen = msgs[i].len; init_completion(&lpi2c_imx->complete);
When start sending a new message clear the Rx & Tx buffer pointers in order to avoid using stale pointers. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> --- I noticed an ambigous stack corruption once my rtc-pcf85063 driver probes. [ 2.695684] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pcf85063_rtc_read_time+0x10/0x100 [ 2.706669] CPU: 1 PID: 63 Comm: kworker/u8:2 Not tainted 6.2.0-rc6-next-20230130+ #1185 ca067559321ae817c063baccdba80d328f10f73 [ 2.718331] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) [ 2.724866] Workqueue: events_unbound deferred_probe_work_func [ 2.730712] Call trace: [ 2.733161] dump_backtrace+0x9c/0x11c [ 2.736914] show_stack+0x14/0x1c [ 2.740232] dump_stack_lvl+0x5c/0x78 [ 2.743907] dump_stack+0x14/0x1c [ 2.747225] panic+0x34c/0x39c [ 2.750283] __ktime_get_real_seconds+0x0/0xc [ 2.754653] pcf85063_ioctl+0x0/0xf0 [ 2.758232] __rtc_read_time+0x44/0x114 [ 2.762081] __rtc_read_alarm+0x258/0x460 [ 2.766095] __devm_rtc_register_device+0x174/0x2b4 [ 2.770986] pcf85063_probe+0x258/0x4d4 [ 2.774825] i2c_device_probe+0x100/0x33c The backtrace did not indicate the actual cause of it. Checking the code the RTC driver seemed to be ok, so it has to be in the i2c bus driver. At some point I noticed that I see both Rx and Tx interrupts at the same time, which is odd. Also both rx_buf and tx_buf was set simultaneously. Clearly a bug to me. Clearing the buffer pointers upon each new i2c message triggered a NULL pointer dereference: [ 2.694923] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001 [ 2.703730] Mem abort info: [ 2.706525] ESR = 0x0000000096000004 [ 2.710278] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.715595] SET = 0, FnV = 0 [ 2.718653] EA = 0, S1PTW = 0 [ 2.721798] FSC = 0x04: level 0 translation fault [ 2.726680] Data abort info: [ 2.729556] ISV = 0, ISS = 0x00000004 [ 2.733387] CM = 0, WnR = 0 [ 2.736358] [0000000000000001] user address but active_mm is swapper [ 2.742719] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 2.748990] Modules linked in: [ 2.752051] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc6-next-20230130+ #1184 44a8abebca6bfabc93e20ac52bce 47da7f92cec1 [ 2.763368] Hardware name: TQ-Systems i.MX8QXP TQMa8XQP on MBa8Xx (DT) [ 2.769902] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.776868] pc : lpi2c_imx_write_txfifo+0x44/0xb0 [ 2.781585] lr : lpi2c_imx_isr+0x60/0x8c [ 2.785512] sp : ffff800008003ef0 [ 2.788831] x29: ffff800008003ef0 x28: ffff8000099c1ec0 x27: 00000000bfe632c8 [ 2.795980] x26: 0000000000000000 x25: ffff800009b935ed x24: ffff800009a4d4c0 [ 2.803130] x23: ffff00000365e800 x22: 0000000000000128 x21: 0000000000000000 [ 2.810280] x20: ffff0000033f4080 x19: 0000000003000103 x18: 0000000000000000 [ 2.817430] x17: ffff80003688a000 x16: ffff800008000000 x15: 0000000000000000 [ 2.824579] x14: 0000000000000000 x13: ffff8000099d1db8 x12: 0000000000000000 [ 2.831729] x11: ffff800009503180 x10: 0000000000000a80 x9 : ffff8000099b3d20 [ 2.838879] x8 : ffff8000099c29a0 x7 : 00000000000000c0 x6 : ffff000002838028 [ 2.846029] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 [ 2.849626] imx-scu system-controller: RPC send msg timeout [ 2.853178] x2 : ffff800009c88060 x1 : 0000000000000001 x0 : ffff0000033f4080 [ 2.858764] enet1: failed to power off resource 252 ret -110 [ 2.865897] Call trace: [ 2.865901] lpi2c_imx_write_txfifo+0x44/0xb0 [ 2.878443] __handle_irq_event_percpu+0x5c/0x188 [ 2.883151] handle_irq_event+0x48/0xb0 $ ./scripts/faddr2line build_arm64/vmlinux lpi2c_imx_write_txfifo+0x44/0xb0 lpi2c_imx_write_txfifo+0x44/0xb0: lpi2c_imx_write_txfifo at drivers/i2c/busses/i2c-imx-lpi2c.c:364 This now clearly pinpoints the wrong access which previously corrupted the stack. The error leading to this wrong access is addressed in the following patch. drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ 1 file changed, 2 insertions(+)