Message ID | 20240507201028.564630-1-andriy.shevchenko@linux.intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 8cc3bad9d9d6a4735a8c8998c6daa1ef31cbf708 |
Headers | show |
Series | [v1,1/1] spi: Remove unneded check for orig_nents | expand |
On Tue, 07 May 2024 23:10:27 +0300, Andy Shevchenko wrote: > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > have checks for orig_nents against 0. No need to duplicate this. > All the same applies to other DMA mapping API calls. > > Also note, there is no other user in the kernel that does this kind of > checks. > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: Remove unneded check for orig_nents commit: 8cc3bad9d9d6a4735a8c8998c6daa1ef31cbf708 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > have checks for orig_nents against 0. No need to duplicate this. > All the same applies to other DMA mapping API calls. > > Also note, there is no other user in the kernel that does this kind of > checks. > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Hi, this commit caused a regression which I reported here: https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano along with some thoughts on the cause and a possible solution, though I'm not familiar with this code base at all and would really appreciate any feedback you may have. Thanks, Nícolas
On Wed, May 15, 2024 at 05:09:33PM -0400, Nícolas F. R. A. Prado wrote: > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > have checks for orig_nents against 0. No need to duplicate this. > > All the same applies to other DMA mapping API calls. > > > > Also note, there is no other user in the kernel that does this kind of > > checks. > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > this commit caused a regression which I reported here: > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > along with some thoughts on the cause and a possible solution, though I'm not > familiar with this code base at all and would really appreciate any feedback you > may have. Thanks for the report and preliminary analysis! I'll look at it hopefully sooner than later. But at least what I think now is that my change revealed a problem somewhere else, because that's how DMA mapping / streaming APIs designed, it's extremely rare to check orig_nents field.
… > Applied to > > https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next … > [1/1] spi: Remove unneded check for orig_nents > commit: 8cc3bad9d9d6a4735a8c8998c6daa1ef31cbf708 … Are there any chances left over to avoid a typo in the summary phrase? Regards, Markus
On Thu, May 16, 2024 at 01:18:04PM +0300, Andy Shevchenko wrote: > On Wed, May 15, 2024 at 05:09:33PM -0400, Nícolas F. R. A. Prado wrote: > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > have checks for orig_nents against 0. No need to duplicate this. > > > All the same applies to other DMA mapping API calls. > > > > > > Also note, there is no other user in the kernel that does this kind of > > > checks. > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > this commit caused a regression which I reported here: > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > along with some thoughts on the cause and a possible solution, though I'm not > > familiar with this code base at all and would really appreciate any feedback you > > may have. > > Thanks for the report and preliminary analysis! > I'll look at it hopefully sooner than later. > > But at least what I think now is that my change revealed a problem somewhere > else, because that's how DMA mapping / streaming APIs designed, it's extremely > rare to check orig_nents field. Can you test the below patch? diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index b2efd4964f7c..51811f04e463 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1243,6 +1243,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) else rx_dev = ctlr->dev.parent; + ret = -ENOMSG; list_for_each_entry(xfer, &msg->transfers, transfer_list) { /* The sync is done before each transfer. */ unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; @@ -1272,6 +1273,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) } } } + /* No transfer has been mapped, bail out with success */ + if (ret) + return 0; ctlr->cur_rx_dma_dev = rx_dev; ctlr->cur_tx_dma_dev = tx_dev; If it fixes the issue, I will submit it properly.
On Thu, May 16, 2024 at 04:25:35PM +0300, Andy Shevchenko wrote: > On Thu, May 16, 2024 at 01:18:04PM +0300, Andy Shevchenko wrote: > > On Wed, May 15, 2024 at 05:09:33PM -0400, Nícolas F. R. A. Prado wrote: > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > All the same applies to other DMA mapping API calls. > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > checks. > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > this commit caused a regression which I reported here: > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > familiar with this code base at all and would really appreciate any feedback you > > > may have. > > > > Thanks for the report and preliminary analysis! > > I'll look at it hopefully sooner than later. > > > > But at least what I think now is that my change revealed a problem somewhere > > else, because that's how DMA mapping / streaming APIs designed, it's extremely > > rare to check orig_nents field. > > Can you test the below patch? > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index b2efd4964f7c..51811f04e463 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1243,6 +1243,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > else > rx_dev = ctlr->dev.parent; > > + ret = -ENOMSG; > list_for_each_entry(xfer, &msg->transfers, transfer_list) { > /* The sync is done before each transfer. */ > unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; > @@ -1272,6 +1273,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > } > } > } > + /* No transfer has been mapped, bail out with success */ > + if (ret) > + return 0; > > ctlr->cur_rx_dma_dev = rx_dev; > ctlr->cur_tx_dma_dev = tx_dev; Hi Andy, thank you for the patch. Unfortunately it didn't completely solve the issue. Now the stack trace is slightly different and points at the next line: dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); So now we're hitting the case where only the tx buffer was DMA mapped, but the rx is still uninitialized, though the cur_msg_mapped flag is set to true, since it is shared between them. The original code checked for the initialization of each scatterlist individually, which is why it worked. Thanks, Nícolas
On Thu, May 16, 2024 at 12:25:19PM -0400, Nícolas F. R. A. Prado wrote: > On Thu, May 16, 2024 at 04:25:35PM +0300, Andy Shevchenko wrote: > > On Thu, May 16, 2024 at 01:18:04PM +0300, Andy Shevchenko wrote: > > > On Wed, May 15, 2024 at 05:09:33PM -0400, Nícolas F. R. A. Prado wrote: > > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > > All the same applies to other DMA mapping API calls. > > > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > > checks. > > > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > > > this commit caused a regression which I reported here: > > > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > > familiar with this code base at all and would really appreciate any feedback you > > > > may have. > > > > > > Thanks for the report and preliminary analysis! > > > I'll look at it hopefully sooner than later. > > > > > > But at least what I think now is that my change revealed a problem somewhere > > > else, because that's how DMA mapping / streaming APIs designed, it's extremely > > > rare to check orig_nents field. > > > > Can you test the below patch? > > > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > index b2efd4964f7c..51811f04e463 100644 > > --- a/drivers/spi/spi.c > > +++ b/drivers/spi/spi.c > > @@ -1243,6 +1243,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > else > > rx_dev = ctlr->dev.parent; > > > > + ret = -ENOMSG; > > list_for_each_entry(xfer, &msg->transfers, transfer_list) { > > /* The sync is done before each transfer. */ > > unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; > > @@ -1272,6 +1273,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > } > > } > > } > > + /* No transfer has been mapped, bail out with success */ > > + if (ret) > > + return 0; > > > > ctlr->cur_rx_dma_dev = rx_dev; > > ctlr->cur_tx_dma_dev = tx_dev; > > Hi Andy, > > thank you for the patch. Unfortunately it didn't completely solve the issue. Now > the stack trace is slightly different and points at the next line: > > dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); > > So now we're hitting the case where only the tx buffer was DMA mapped, but the > rx is still uninitialized, though the cur_msg_mapped flag is set to true, since > it is shared between them. The original code checked for the initialization of > each scatterlist individually, which is why it worked. I was kinda expecting that, and already have another patch to try (should applied on top): diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 51811f04e463..5c607dd21fe7 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1258,6 +1258,8 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) attrs); if (ret != 0) return ret; + } else { + memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); } if (xfer->rx_buf != NULL) { @@ -1271,6 +1273,8 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) return ret; } + } else { + memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); } } /* No transfer has been mapped, bail out with success */ If my understanding is correct, my patch revealed two issues: - for non-mapped message at all; - for unidirect transfers. But I will wait for your test results to make the final conclusion.
On Thu, May 16, 2024 at 08:46:23PM +0300, Andy Shevchenko wrote: > On Thu, May 16, 2024 at 12:25:19PM -0400, Nícolas F. R. A. Prado wrote: > > On Thu, May 16, 2024 at 04:25:35PM +0300, Andy Shevchenko wrote: > > > On Thu, May 16, 2024 at 01:18:04PM +0300, Andy Shevchenko wrote: > > > > On Wed, May 15, 2024 at 05:09:33PM -0400, Nícolas F. R. A. Prado wrote: > > > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > > > All the same applies to other DMA mapping API calls. > > > > > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > > > checks. > > > > > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > > > > > this commit caused a regression which I reported here: > > > > > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > > > familiar with this code base at all and would really appreciate any feedback you > > > > > may have. > > > > > > > > Thanks for the report and preliminary analysis! > > > > I'll look at it hopefully sooner than later. > > > > > > > > But at least what I think now is that my change revealed a problem somewhere > > > > else, because that's how DMA mapping / streaming APIs designed, it's extremely > > > > rare to check orig_nents field. > > > > > > Can you test the below patch? > > > > > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > > index b2efd4964f7c..51811f04e463 100644 > > > --- a/drivers/spi/spi.c > > > +++ b/drivers/spi/spi.c > > > @@ -1243,6 +1243,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > > else > > > rx_dev = ctlr->dev.parent; > > > > > > + ret = -ENOMSG; > > > list_for_each_entry(xfer, &msg->transfers, transfer_list) { > > > /* The sync is done before each transfer. */ > > > unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; > > > @@ -1272,6 +1273,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > > } > > > } > > > } > > > + /* No transfer has been mapped, bail out with success */ > > > + if (ret) > > > + return 0; > > > > > > ctlr->cur_rx_dma_dev = rx_dev; > > > ctlr->cur_tx_dma_dev = tx_dev; > > > > Hi Andy, > > > > thank you for the patch. Unfortunately it didn't completely solve the issue. Now > > the stack trace is slightly different and points at the next line: > > > > dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); > > > > So now we're hitting the case where only the tx buffer was DMA mapped, but the > > rx is still uninitialized, though the cur_msg_mapped flag is set to true, since > > it is shared between them. The original code checked for the initialization of > > each scatterlist individually, which is why it worked. > > I was kinda expecting that, and already have another patch to try (should > applied on top): > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index 51811f04e463..5c607dd21fe7 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1258,6 +1258,8 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > attrs); > if (ret != 0) > return ret; > + } else { > + memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); > } > > if (xfer->rx_buf != NULL) { > @@ -1271,6 +1273,8 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > return ret; > } > + } else { > + memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); > } > } > /* No transfer has been mapped, bail out with success */ Still the same issue. I've attached the backtrace at the end for reference. But I don't see how a memset would help here. As far as I can see, there's nothing in the DMA API protecting it from a null pointer to be passed in. So when dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); is called with xfer->tx_sg.sgl being null, that will get passed all the way to iommu_dma_sync_sg_for_device() and sg_dma_is_swiotlb(), where it'll be dereferenced and cause the issue. So it seems to me that either the DMA API functions should check for the null pointer, or if the API doesn't want to handle those cases (like sync being called before the buffer has been mapped), then the caller needs to do the check, as was done in the original code. The same applies for the change in spi_unmap_buf_attrs(). I see sg_free_table() does handle a null sgl, but dma_unmap_sgtable() doesn't (and indeed I verified null pointer dereference happens there too if I avoid this one). Thanks, Nícolas [ 3.418996] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c [ 3.425173] Mem abort info: [ 3.425175] ESR = 0x0000000096000004 [ 3.425178] EC = 0x25: DABT (current EL), IL = 32 bits [ 3.425183] SET = 0, FnV = 0 [ 3.425186] EA = 0, S1PTW = 0 [ 3.437092] FSC = 0x04: level 0 translation fault [ 3.446403] Data abort info: [ 3.446405] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 3.452779] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 3.460666] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 3.467487] [000000000000001c] user address but active_mm is swapper [ 3.478571] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 3.486896] Modules linked in: [ 3.495580] CPU: 6 PID: 68 Comm: kworker/u32:2 Tainted: G W 6.9.0-next-20240515-00003-g3f984d58a25f #400 [ 3.507103] Hardware name: Google Kingoftown (DT) [ 3.507108] Workqueue: events_unbound deferred_probe_work_func [ 3.516498] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3.525275] pc : iommu_dma_sync_sg_for_device+0x28/0x100 [ 3.580070] lr : __dma_sync_sg_for_device+0x28/0x4c [ 3.585089] sp : ffff800080942dc0 [ 3.588495] x29: ffff800080942dc0 x28: ffff5a9cc33ac000 x27: ffff5a9cc1482010 [ 3.595821] x26: ffff800080943008 x25: ffff5a9cc33ac480 x24: ffffb11b69bcfc00 [ 3.603146] x23: ffff5a9cc1482010 x22: 0000000000000001 x21: 0000000000000000 [ 3.610473] x20: ffffb11b69f3d718 x19: 0000000000000000 x18: ffffb11b6ac19c48 [ 3.617797] x17: 0000000000010108 x16: 0000000000000000 x15: 0000000000000002 [ 3.625130] x14: 0000000000000001 x13: 0000000000161361 x12: 0000000000000001 [ 3.632454] x11: ffff800080942cd0 x10: ffff5a9cc3dafff8 x9 : ffff5a9cc33ac469 [ 3.639782] x8 : ffff5a9cc148d704 x7 : 00000000ffffffff x6 : 0000000000000001 [ 3.647109] x5 : ffffb11b6933a780 x4 : ffffb11b68504b34 x3 : 0000000000000001 [ 3.654442] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff5a9cc1482010 [ 3.661769] Call trace: [ 3.664285] iommu_dma_sync_sg_for_device+0x28/0x100 [ 3.669388] __dma_sync_sg_for_device+0x28/0x4c [ 3.674052] spi_transfer_one_message+0x378/0x6e4 [ 3.678898] __spi_pump_transfer_message+0x1dc/0x504 [ 3.684002] __spi_sync+0x2a0/0x3c4 [ 3.687589] spi_sync+0x30/0x54 [ 3.690825] spi_mem_exec_op+0x26c/0x41c [ 3.694866] spi_nor_spimem_read_data+0x148/0x158 [ 3.699708] spi_nor_read_data+0x30/0x3c [ 3.703745] spi_nor_read_sfdp+0x74/0xe4 [ 3.707784] spi_nor_parse_sfdp+0x120/0x11d0 [ 3.712174] spi_nor_sfdp_init_params_deprecated+0x3c/0x8c [ 3.717807] spi_nor_scan+0x7ac/0xef8 [ 3.721579] spi_nor_probe+0x94/0x2ec [ 3.725352] spi_mem_probe+0x6c/0xac [ 3.729038] spi_probe+0x84/0xe4 [ 3.732359] really_probe+0xbc/0x2a0 [ 3.736035] __driver_probe_device+0x78/0x12c [ 3.740511] driver_probe_device+0x40/0x160 [ 3.744814] __device_attach_driver+0xb8/0x134 [ 3.749381] bus_for_each_drv+0x84/0xe0 [ 3.753330] __device_attach+0xa8/0x1b0 [ 3.757280] device_initial_probe+0x14/0x20 [ 3.761580] bus_probe_device+0xa8/0xac [ 3.765526] device_add+0x590/0x750 [ 3.769108] __spi_add_device+0x138/0x208 [ 3.773234] of_register_spi_device+0x394/0x57c [ 3.777898] spi_register_controller+0x394/0x760 [ 3.782651] qcom_qspi_probe+0x328/0x390 [ 3.786691] platform_probe+0x68/0xd8 [ 3.790464] really_probe+0xbc/0x2a0 [ 3.794147] __driver_probe_device+0x78/0x12c [ 3.798622] driver_probe_device+0x40/0x160 [ 3.802923] __device_attach_driver+0xb8/0x134 [ 3.807487] bus_for_each_drv+0x84/0xe0 [ 3.811434] __device_attach+0xa8/0x1b0 [ 3.815383] device_initial_probe+0x14/0x20 [ 3.819682] bus_probe_device+0xa8/0xac [ 3.823629] deferred_probe_work_func+0x88/0xc0 [ 3.828281] process_one_work+0x154/0x298 [ 3.832409] worker_thread+0x304/0x408 [ 3.836272] kthread+0x118/0x11c [ 3.839596] ret_from_fork+0x10/0x20 [ 3.843284] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c20) [ 3.849541] ---[ end trace 0000000000000000 ]--- > > > If my understanding is correct, my patch revealed two issues: > - for non-mapped message at all; > - for unidirect transfers. > > But I will wait for your test results to make the final conclusion. > > -- > With Best Regards, > Andy Shevchenko > >
On Thu, May 16, 2024 at 05:11:21PM -0400, Nícolas F. R. A. Prado wrote: > On Thu, May 16, 2024 at 08:46:23PM +0300, Andy Shevchenko wrote: > > On Thu, May 16, 2024 at 12:25:19PM -0400, Nícolas F. R. A. Prado wrote: > > > On Thu, May 16, 2024 at 04:25:35PM +0300, Andy Shevchenko wrote: > > > > On Thu, May 16, 2024 at 01:18:04PM +0300, Andy Shevchenko wrote: > > > > > On Wed, May 15, 2024 at 05:09:33PM -0400, Nícolas F. R. A. Prado wrote: > > > > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > > > > All the same applies to other DMA mapping API calls. > > > > > > > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > > > > checks. > > > > > > > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > > > > > > > this commit caused a regression which I reported here: > > > > > > > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > > > > familiar with this code base at all and would really appreciate any feedback you > > > > > > may have. > > > > > > > > > > Thanks for the report and preliminary analysis! > > > > > I'll look at it hopefully sooner than later. > > > > > > > > > > But at least what I think now is that my change revealed a problem somewhere > > > > > else, because that's how DMA mapping / streaming APIs designed, it's extremely > > > > > rare to check orig_nents field. > > > > > > > > Can you test the below patch? > > > > > > > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > > > index b2efd4964f7c..51811f04e463 100644 > > > > --- a/drivers/spi/spi.c > > > > +++ b/drivers/spi/spi.c > > > > @@ -1243,6 +1243,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > > > else > > > > rx_dev = ctlr->dev.parent; > > > > > > > > + ret = -ENOMSG; > > > > list_for_each_entry(xfer, &msg->transfers, transfer_list) { > > > > /* The sync is done before each transfer. */ > > > > unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; > > > > @@ -1272,6 +1273,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > > > } > > > > } > > > > } > > > > + /* No transfer has been mapped, bail out with success */ > > > > + if (ret) > > > > + return 0; > > > > > > > > ctlr->cur_rx_dma_dev = rx_dev; > > > > ctlr->cur_tx_dma_dev = tx_dev; > > > > > > Hi Andy, > > > > > > thank you for the patch. Unfortunately it didn't completely solve the issue. Now > > > the stack trace is slightly different and points at the next line: > > > > > > dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); > > > > > > So now we're hitting the case where only the tx buffer was DMA mapped, but the > > > rx is still uninitialized, though the cur_msg_mapped flag is set to true, since > > > it is shared between them. The original code checked for the initialization of > > > each scatterlist individually, which is why it worked. (So the above patch is okay, the below is wrong, but read at the bottom as well) > > I was kinda expecting that, and already have another patch to try (should > > applied on top): > > > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > index 51811f04e463..5c607dd21fe7 100644 > > --- a/drivers/spi/spi.c > > +++ b/drivers/spi/spi.c > > @@ -1258,6 +1258,8 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > attrs); > > if (ret != 0) > > return ret; > > + } else { > > + memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); > > } > > > > if (xfer->rx_buf != NULL) { > > @@ -1271,6 +1273,8 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > > > > return ret; > > } > > + } else { > > + memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); > > } > > } > > /* No transfer has been mapped, bail out with success */ > > Still the same issue. I've attached the backtrace at the end for reference. But > I don't see how a memset would help here. As far as I can see, there's nothing > in the DMA API protecting it from a null pointer to be passed in. So when > > dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); > > is called with xfer->tx_sg.sgl being null, that will get passed all the way to > iommu_dma_sync_sg_for_device() and sg_dma_is_swiotlb(), where it'll be > dereferenced and cause the issue. Right, sorry I was missing that piece. > So it seems to me that either the DMA API > functions should check for the null pointer, or if the API doesn't want to > handle those cases (like sync being called before the buffer has been mapped), > then the caller needs to do the check, as was done in the original code. The dma-api.rst seems to imply that sync calls done after the mapping: "With the sync_sg API, all the parameters must be the same as those passed into the sg mapping API." The dma-api-howto.rst is clearer on this: "So, firstly, just map it with dma_map_{single,sg}(), and after each DMA transfer call either:: dma_sync_single_for_cpu(dev, dma_handle, size, direction); or:: dma_sync_sg_for_cpu(dev, sglist, nents, direction); as appropriate." So, it means the calling sync APIs on unprepared resources is a shooting in a foot. OTOH > The same applies for the change in spi_unmap_buf_attrs(). I see sg_free_table() > does handle a null sgl, but dma_unmap_sgtable() doesn't (and indeed I verified > null pointer dereference happens there too if I avoid this one). Taking into account the above, I think those memset()'s has actually to be paired with a dummy SG table, which is empty. --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1220,6 +1220,11 @@ void spi_unmap_buf(struct spi_controller *ctlr, struct device *dev, spi_unmap_buf_attrs(ctlr, dev, sgt, dir, 0); } +/* Dummy SG for unidirect transfers */ +static struct scatterlist dummy_sg = { + .page_link = SG_END, +}; + static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) { struct device *tx_dev, *rx_dev; @@ -1260,6 +1265,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) return ret; } else { memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); + xfer->tx_sg.sgl = &dummy_sg; } if (xfer->rx_buf != NULL) { @@ -1275,6 +1281,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) } } else { memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); + xfer->rx_sg.sgl = &dummy_sg; } } /* No transfer has been mapped, bail out with success */ But the best shot is to fix IOMMU for nents == 0 case in my opinion. Neglecting nents before accessing the SG is not a good idea. Catalin? The commit in question here is this one 861370f49ce4 ("iommu/dma: force bouncing if the size is not cacheline-aligned").
Hi, On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: >> Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() >> have checks for orig_nents against 0. No need to duplicate this. >> All the same applies to other DMA mapping API calls. >> >> Also note, there is no other user in the kernel that does this kind of >> checks. >> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > Hi, > > this commit caused a regression which I reported here: > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > along with some thoughts on the cause and a possible solution, though I'm not > familiar with this code base at all and would really appreciate any feedback you > may have. I also see the same regression on the SM8550 and SM8650 platforms, please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. Thanks, Neil > > Thanks, > Nícolas >
On Wed, May 22, 2024 at 12:03:33PM +0200, Neil Armstrong wrote: > On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > have checks for orig_nents against 0. No need to duplicate this. > > > All the same applies to other DMA mapping API calls. > > > > > > Also note, there is no other user in the kernel that does this kind of > > > checks. > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > Hi, > > > > this commit caused a regression which I reported here: > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > along with some thoughts on the cause and a possible solution, though I'm not > > familiar with this code base at all and would really appreciate any feedback you > > may have. > > I also see the same regression on the SM8550 and SM8650 platforms, > please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. There is still no answer from IOMMU patch author. Do you have the same trace due to IOMMU calls? Anyway, I guess it would be nice to see it. Meanwhile, I have three changes I posted in the replies to the initial report, can you combine them all and test? This will be a plan B (? or A, depending on the culprit).
On 22/05/2024 13:33, Andy Shevchenko wrote: > On Wed, May 22, 2024 at 12:03:33PM +0200, Neil Armstrong wrote: >> On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: >>> On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: >>>> Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() >>>> have checks for orig_nents against 0. No need to duplicate this. >>>> All the same applies to other DMA mapping API calls. >>>> >>>> Also note, there is no other user in the kernel that does this kind of >>>> checks. >>>> >>>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >>> >>> Hi, >>> >>> this commit caused a regression which I reported here: >>> >>> https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano >>> >>> along with some thoughts on the cause and a possible solution, though I'm not >>> familiar with this code base at all and would really appreciate any feedback you >>> may have. >> >> I also see the same regression on the SM8550 and SM8650 platforms, >> please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. > > There is still no answer from IOMMU patch author. Do you have the same trace > due to IOMMU calls? Anyway, I guess it would be nice to see it. Yes : [ 6.404623] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c <snip> [ 6.641597] lr : __dma_sync_sg_for_device+0x3c/0x40 <snip> [ 6.688286] Call trace: [ 6.688287] iommu_dma_sync_sg_for_device+0x28/0x100 [ 6.717582] __dma_sync_sg_for_device+0x3c/0x40 [ 6.717585] spi_transfer_one_message+0x358/0x680 [ 6.732229] __spi_pump_transfer_message+0x188/0x494 [ 6.732232] __spi_sync+0x2a8/0x3c4 [ 6.732234] spi_sync+0x30/0x54 [ 6.732236] goodix_berlin_spi_write+0xf8/0x164 [goodix_berlin_spi] [ 6.739854] _regmap_raw_write_impl+0x538/0x674 [ 6.750053] _regmap_raw_write+0xb4/0x144 [ 6.750056] regmap_raw_write+0x7c/0xc0 [ 6.750058] goodix_berlin_power_on+0xb0/0x1b0 [goodix_berlin_core] [ 6.765520] goodix_berlin_probe+0xc0/0x660 [goodix_berlin_core] [ 6.765522] goodix_berlin_spi_probe+0x12c/0x14c [goodix_berlin_spi] [ 6.772339] spi_probe+0x84/0xe4 [ 6.772342] really_probe+0xbc/0x29c [ 6.784313] __driver_probe_device+0x78/0x12c [ 6.784316] driver_probe_device+0x3c/0x15c [ 6.784319] __driver_attach+0x90/0x19c [ 6.784322] bus_for_each_dev+0x7c/0xdc [ 6.794520] driver_attach+0x24/0x30 [ 6.794523] bus_add_driver+0xe4/0x208 [ 6.794526] driver_register+0x5c/0x124 [ 6.802586] __spi_register_driver+0xa4/0xe4 [ 6.802589] goodix_berlin_spi_driver_init+0x20/0x1000 [goodix_berlin_spi] [ 6.802591] do_one_initcall+0x80/0x1c8 [ 6.902310] do_init_module+0x60/0x218 [ 6.921988] load_module+0x1bcc/0x1d8c [ 6.925847] init_module_from_file+0x88/0xcc [ 6.930238] __arm64_sys_finit_module+0x1dc/0x2e4 [ 6.935074] invoke_syscall+0x48/0x114 [ 6.938944] el0_svc_common.constprop.0+0xc0/0xe0 [ 6.943781] do_el0_svc+0x1c/0x28 [ 6.947195] el0_svc+0x34/0xd8 [ 6.950348] el0t_64_sync_handler+0x120/0x12c [ 6.954833] el0t_64_sync+0x190/0x194 [ 6.958600] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c2 Reverting 8cc3bad9d9d6 ("spi: Remove unneded check for orig_nents") removes the crash. > > Meanwhile, I have three changes I posted in the replies to the initial report, > can you combine them all and test? This will be a plan B (? or A, depending on > the culprit). > I'll try to apply them and test. Neil
Hi, On 22/05/2024 13:53, Neil Armstrong wrote: > On 22/05/2024 13:33, Andy Shevchenko wrote: >> On Wed, May 22, 2024 at 12:03:33PM +0200, Neil Armstrong wrote: >>> On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: >>>> On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: >>>>> Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() >>>>> have checks for orig_nents against 0. No need to duplicate this. >>>>> All the same applies to other DMA mapping API calls. >>>>> >>>>> Also note, there is no other user in the kernel that does this kind of >>>>> checks. >>>>> >>>>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >>>> >>>> Hi, >>>> >>>> this commit caused a regression which I reported here: >>>> >>>> https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano >>>> >>>> along with some thoughts on the cause and a possible solution, though I'm not >>>> familiar with this code base at all and would really appreciate any feedback you >>>> may have. >>> >>> I also see the same regression on the SM8550 and SM8650 platforms, >>> please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. >> >> There is still no answer from IOMMU patch author. Do you have the same trace >> due to IOMMU calls? Anyway, I guess it would be nice to see it. > > Yes : > [ 6.404623] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c > <snip> > [ 6.641597] lr : __dma_sync_sg_for_device+0x3c/0x40 > <snip> > [ 6.688286] Call trace: > [ 6.688287] iommu_dma_sync_sg_for_device+0x28/0x100 > [ 6.717582] __dma_sync_sg_for_device+0x3c/0x40 > [ 6.717585] spi_transfer_one_message+0x358/0x680 > [ 6.732229] __spi_pump_transfer_message+0x188/0x494 > [ 6.732232] __spi_sync+0x2a8/0x3c4 > [ 6.732234] spi_sync+0x30/0x54 > [ 6.732236] goodix_berlin_spi_write+0xf8/0x164 [goodix_berlin_spi] > [ 6.739854] _regmap_raw_write_impl+0x538/0x674 > [ 6.750053] _regmap_raw_write+0xb4/0x144 > [ 6.750056] regmap_raw_write+0x7c/0xc0 > [ 6.750058] goodix_berlin_power_on+0xb0/0x1b0 [goodix_berlin_core] > [ 6.765520] goodix_berlin_probe+0xc0/0x660 [goodix_berlin_core] > [ 6.765522] goodix_berlin_spi_probe+0x12c/0x14c [goodix_berlin_spi] > [ 6.772339] spi_probe+0x84/0xe4 > [ 6.772342] really_probe+0xbc/0x29c > [ 6.784313] __driver_probe_device+0x78/0x12c > [ 6.784316] driver_probe_device+0x3c/0x15c > [ 6.784319] __driver_attach+0x90/0x19c > [ 6.784322] bus_for_each_dev+0x7c/0xdc > [ 6.794520] driver_attach+0x24/0x30 > [ 6.794523] bus_add_driver+0xe4/0x208 > [ 6.794526] driver_register+0x5c/0x124 > [ 6.802586] __spi_register_driver+0xa4/0xe4 > [ 6.802589] goodix_berlin_spi_driver_init+0x20/0x1000 [goodix_berlin_spi] > [ 6.802591] do_one_initcall+0x80/0x1c8 > [ 6.902310] do_init_module+0x60/0x218 > [ 6.921988] load_module+0x1bcc/0x1d8c > [ 6.925847] init_module_from_file+0x88/0xcc > [ 6.930238] __arm64_sys_finit_module+0x1dc/0x2e4 > [ 6.935074] invoke_syscall+0x48/0x114 > [ 6.938944] el0_svc_common.constprop.0+0xc0/0xe0 > [ 6.943781] do_el0_svc+0x1c/0x28 > [ 6.947195] el0_svc+0x34/0xd8 > [ 6.950348] el0t_64_sync_handler+0x120/0x12c > [ 6.954833] el0t_64_sync+0x190/0x194 > [ 6.958600] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c2 > > Reverting 8cc3bad9d9d6 ("spi: Remove unneded check for orig_nents") removes the crash. > >> >> Meanwhile, I have three changes I posted in the replies to the initial report, >> can you combine them all and test? This will be a plan B (? or A, depending on >> the culprit). >> > > I'll try to apply them and test. I stacked the 3 changes, and it works: Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD For reference, the changeset looks like: ============><============================================================================================ diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 289feccca376..0851c5e1fd1f 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1220,6 +1220,11 @@ void spi_unmap_buf(struct spi_controller *ctlr, struct device *dev, spi_unmap_buf_attrs(ctlr, dev, sgt, dir, 0); } +/* Dummy SG for unidirect transfers */ +static struct scatterlist dummy_sg = { + .page_link = SG_END, +}; + static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) { struct device *tx_dev, *rx_dev; @@ -1243,6 +1248,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) else rx_dev = ctlr->dev.parent; + ret = -ENOMSG; list_for_each_entry(xfer, &msg->transfers, transfer_list) { /* The sync is done before each transfer. */ unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; @@ -1257,6 +1263,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) attrs); if (ret != 0) return ret; + } else { + memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); + xfer->tx_sg.sgl = &dummy_sg; } if (xfer->rx_buf != NULL) { @@ -1270,8 +1279,14 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) return ret; } + } else { + memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); + xfer->rx_sg.sgl = &dummy_sg; } } + /* No transfer has been mapped, bail out with success */ + if (ret) + return 0; ctlr->cur_rx_dma_dev = rx_dev; ctlr->cur_tx_dma_dev = tx_dev; ============><============================================================================================ Thanks, Neil > > Neil >
On Wed, May 22, 2024 at 03:18:18PM +0200, Neil Armstrong wrote: > On 22/05/2024 13:53, Neil Armstrong wrote: > > On 22/05/2024 13:33, Andy Shevchenko wrote: > > > On Wed, May 22, 2024 at 12:03:33PM +0200, Neil Armstrong wrote: > > > > On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: > > > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > > > All the same applies to other DMA mapping API calls. > > > > > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > > > checks. > > > > > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > > > > > Hi, > > > > > > > > > > this commit caused a regression which I reported here: > > > > > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > > > familiar with this code base at all and would really appreciate any feedback you > > > > > may have. > > > > > > > > I also see the same regression on the SM8550 and SM8650 platforms, > > > > please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. > > > > > > There is still no answer from IOMMU patch author. Do you have the same trace > > > due to IOMMU calls? Anyway, I guess it would be nice to see it. > > > > Yes : > > [ 6.404623] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c > > <snip> > > [ 6.641597] lr : __dma_sync_sg_for_device+0x3c/0x40 > > <snip> > > [ 6.688286] Call trace: > > [ 6.688287] iommu_dma_sync_sg_for_device+0x28/0x100 > > [ 6.717582] __dma_sync_sg_for_device+0x3c/0x40 > > [ 6.717585] spi_transfer_one_message+0x358/0x680 > > [ 6.732229] __spi_pump_transfer_message+0x188/0x494 > > [ 6.732232] __spi_sync+0x2a8/0x3c4 > > [ 6.732234] spi_sync+0x30/0x54 > > [ 6.732236] goodix_berlin_spi_write+0xf8/0x164 [goodix_berlin_spi] > > [ 6.739854] _regmap_raw_write_impl+0x538/0x674 > > [ 6.750053] _regmap_raw_write+0xb4/0x144 > > [ 6.750056] regmap_raw_write+0x7c/0xc0 > > [ 6.750058] goodix_berlin_power_on+0xb0/0x1b0 [goodix_berlin_core] > > [ 6.765520] goodix_berlin_probe+0xc0/0x660 [goodix_berlin_core] > > [ 6.765522] goodix_berlin_spi_probe+0x12c/0x14c [goodix_berlin_spi] > > [ 6.772339] spi_probe+0x84/0xe4 > > [ 6.772342] really_probe+0xbc/0x29c > > [ 6.784313] __driver_probe_device+0x78/0x12c > > [ 6.784316] driver_probe_device+0x3c/0x15c > > [ 6.784319] __driver_attach+0x90/0x19c > > [ 6.784322] bus_for_each_dev+0x7c/0xdc > > [ 6.794520] driver_attach+0x24/0x30 > > [ 6.794523] bus_add_driver+0xe4/0x208 > > [ 6.794526] driver_register+0x5c/0x124 > > [ 6.802586] __spi_register_driver+0xa4/0xe4 > > [ 6.802589] goodix_berlin_spi_driver_init+0x20/0x1000 [goodix_berlin_spi] > > [ 6.802591] do_one_initcall+0x80/0x1c8 > > [ 6.902310] do_init_module+0x60/0x218 > > [ 6.921988] load_module+0x1bcc/0x1d8c > > [ 6.925847] init_module_from_file+0x88/0xcc > > [ 6.930238] __arm64_sys_finit_module+0x1dc/0x2e4 > > [ 6.935074] invoke_syscall+0x48/0x114 > > [ 6.938944] el0_svc_common.constprop.0+0xc0/0xe0 > > [ 6.943781] do_el0_svc+0x1c/0x28 > > [ 6.947195] el0_svc+0x34/0xd8 > > [ 6.950348] el0t_64_sync_handler+0x120/0x12c > > [ 6.954833] el0t_64_sync+0x190/0x194 > > [ 6.958600] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c2 > > > > Reverting 8cc3bad9d9d6 ("spi: Remove unneded check for orig_nents") removes the crash. > > > > > > > > Meanwhile, I have three changes I posted in the replies to the initial report, > > > can you combine them all and test? This will be a plan B (? or A, depending on > > > the culprit). > > > > > > > I'll try to apply them and test. > > I stacked the 3 changes, and it works: > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD Thank you! I will try to develop and submit a fix against IOMMU which I believe is the correct place to address this. So, this one will be plan B in case the IOMMU folks will refuse the other one.
On Wed, May 22, 2024 at 05:24:40PM +0300, Andy Shevchenko wrote: > On Wed, May 22, 2024 at 03:18:18PM +0200, Neil Armstrong wrote: > > On 22/05/2024 13:53, Neil Armstrong wrote: > > > On 22/05/2024 13:33, Andy Shevchenko wrote: > > > > On Wed, May 22, 2024 at 12:03:33PM +0200, Neil Armstrong wrote: > > > > > On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: > > > > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > > > > All the same applies to other DMA mapping API calls. > > > > > > > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > > > > checks. > > > > > > > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > > > > > > > Hi, > > > > > > > > > > > > this commit caused a regression which I reported here: > > > > > > > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > > > > familiar with this code base at all and would really appreciate any feedback you > > > > > > may have. > > > > > > > > > > I also see the same regression on the SM8550 and SM8650 platforms, > > > > > please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. > > > > > > > > There is still no answer from IOMMU patch author. Do you have the same trace > > > > due to IOMMU calls? Anyway, I guess it would be nice to see it. [..] > > > > > > > > Meanwhile, I have three changes I posted in the replies to the initial report, > > > > can you combine them all and test? This will be a plan B (? or A, depending on > > > > the culprit). > > > > > > > > > > I'll try to apply them and test. > > > > I stacked the 3 changes, and it works: > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD > > Thank you! > > I will try to develop and submit a fix against IOMMU which I believe is the > correct place to address this. So, this one will be plan B in case the IOMMU > folks will refuse the other one. Hi, that change did not work for me. Stack trace follows at the end. But adding the following on top did fix it: diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 0851c5e1fd1f..5d3972d9d1da 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1253,8 +1253,13 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) /* The sync is done before each transfer. */ unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; - if (!ctlr->can_dma(ctlr, msg->spi, xfer)) + if (!ctlr->can_dma(ctlr, msg->spi, xfer)) { + memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); + xfer->tx_sg.sgl = &dummy_sg; + memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); + xfer->rx_sg.sgl = &dummy_sg; continue; + } if (xfer->tx_buf != NULL) { ret = spi_map_buf_attrs(ctlr, tx_dev, &xfer->tx_sg, The thing is that even with all the previous changes applied, if one of the transfers inside the message doesn't support DMA, the null pointer would still be passed to the DMA API. Alternatively, I think a better way to achieve the same is (I have verified this also works): diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 0851c5e1fd1f..3f2ee70d080a 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1322,7 +1322,7 @@ static int __spi_unmap_msg(struct spi_controller *ctlr, struct spi_message *msg) return 0; } -static void spi_dma_sync_for_device(struct spi_controller *ctlr, +static void spi_dma_sync_for_device(struct spi_controller *ctlr, struct spi_message *msg, struct spi_transfer *xfer) { struct device *rx_dev = ctlr->cur_rx_dma_dev; @@ -1331,11 +1331,14 @@ static void spi_dma_sync_for_device(struct spi_controller *ctlr, if (!ctlr->cur_msg_mapped) return; + if (!ctlr->can_dma(ctlr, msg->spi, xfer)) + return; + dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); } -static void spi_dma_sync_for_cpu(struct spi_controller *ctlr, +static void spi_dma_sync_for_cpu(struct spi_controller *ctlr, struct spi_message *msg, struct spi_transfer *xfer) { struct device *rx_dev = ctlr->cur_rx_dma_dev; @@ -1344,6 +1347,9 @@ static void spi_dma_sync_for_cpu(struct spi_controller *ctlr, if (!ctlr->cur_msg_mapped) return; + if (!ctlr->can_dma(ctlr, msg->spi, xfer)) + return; + dma_sync_sgtable_for_cpu(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); dma_sync_sgtable_for_cpu(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); } @@ -1637,10 +1643,10 @@ static int spi_transfer_one_message(struct spi_controller *ctlr, reinit_completion(&ctlr->xfer_completion); fallback_pio: - spi_dma_sync_for_device(ctlr, xfer); + spi_dma_sync_for_device(ctlr, msg, xfer); ret = ctlr->transfer_one(ctlr, msg->spi, xfer); if (ret < 0) { - spi_dma_sync_for_cpu(ctlr, xfer); + spi_dma_sync_for_cpu(ctlr, msg, xfer); if (ctlr->cur_msg_mapped && (xfer->error & SPI_TRANS_FAIL_NO_START)) { @@ -1665,7 +1671,7 @@ static int spi_transfer_one_message(struct spi_controller *ctlr, msg->status = ret; } - spi_dma_sync_for_cpu(ctlr, xfer); + spi_dma_sync_for_cpu(ctlr, msg, xfer); } else { if (xfer->len) dev_err(&msg->spi->dev, I'll let you decide what is the best fix for the issue (what has been posted so far in this thread or another fix in IOMMU). If you go with this, feel free to add my Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Thanks, Nícolas Stack trace: [ 3.086173] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c [ 3.103002] Mem abort info: [ 3.105885] ESR = 0x0000000096000004 [ 3.109738] EC = 0x25: DABT (current EL), IL = 32 bits [ 3.115198] SET = 0, FnV = 0 [ 3.118342] EA = 0, S1PTW = 0 [ 3.121570] FSC = 0x04: level 0 translation fault [ 3.126584] Data abort info: [ 3.129545] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 3.135188] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 3.140383] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 3.145837] [000000000000001c] user address but active_mm is swapper [ 3.152369] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 3.156423] input: cros_ec as /devices/platform/soc@0/ac0000.geniqup/a80000.spi/spi_master/spi6/spi6.0/a80000.spi:ec@0:keyboard-controller/input/input0 [ 3.158806] Modules linked in: [ 3.158812] CPU: 6 PID: 68 Comm: kworker/u32:2 Not tainted 6.9.0-next-20240515-00005-gad5f430e51d2 #424 [ 3.158820] Hardware name: Google Kingoftown (DT) [ 3.158823] Workqueue: events_unbound deferred_probe_work_func [ 3.175091] input: cros_ec_buttons as /devices/platform/soc@0/ac0000.geniqup/a80000.spi/spi_master/spi6/spi6.0/a80000.spi:ec@0:keyboard-controller/input/input1 [ 3.175859] [ 3.175861] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3.186858] cros-ec-spi spi6.0: Chrome EC device registered [ 3.190317] pc : iommu_dma_sync_sg_for_device+0x28/0x100 [ 3.190327] lr : __dma_sync_sg_for_device+0x28/0x4c [ 3.211853] tpm_tis_spi spi0.0: Cr50 firmware version: B2-C:0 RO_A:0.0.11/bc74f7dc RW_A:0.6.70/cr50_v2.0.3361-b70f24b1b [ 3.212461] sp : ffff800080942dc0 [ 3.212464] x29: ffff800080942dc0 x28: ffff7eb103be2000 x27: ffff7eb101012010 [ 3.257562] x26: ffff800080943008 x25: ffff7eb103be2480 x24: ffffb819fada5180 [ 3.264887] x23: ffff7eb101012010 x22: 0000000000000001 x21: 0000000000000000 [ 3.272213] x20: ffffb819fb10c718 x19: 0000000000000000 x18: ffffb819fbde8988 [ 3.279539] x17: 0000000000010108 x16: 0000000000000000 x15: 0000000000000002 [ 3.286863] x14: 0000000000000001 x13: 000000000016356d x12: 0000000000000001 [ 3.294188] x11: ffff800080942cd0 x10: ffff7eb103c74ff8 x9 : ffff7eb103be2469 [ 3.301515] x8 : ffff7eb10101cf04 x7 : 00000000ffffffff x6 : 0000000000000001 [ 3.308849] x5 : ffffb819fa51a780 x4 : ffffb819f9704acc x3 : 0000000000000001 [ 3.316182] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff7eb101012010 [ 3.323509] Call trace: [ 3.326023] iommu_dma_sync_sg_for_device+0x28/0x100 [ 3.331125] __dma_sync_sg_for_device+0x28/0x4c [ 3.335790] spi_transfer_one_message+0x378/0x6e4 [ 3.340634] __spi_pump_transfer_message+0x1e4/0x504 [ 3.345737] __spi_sync+0x2a0/0x3c4 [ 3.349334] spi_sync+0x30/0x54 [ 3.352568] spi_mem_exec_op+0x26c/0x41c [ 3.356610] spi_nor_spimem_read_data+0x148/0x158 [ 3.361454] spi_nor_read_data+0x30/0x3c [ 3.365494] spi_nor_read_sfdp+0x74/0xe4 [ 3.369532] spi_nor_parse_sfdp+0x120/0x11d0 [ 3.373921] spi_nor_sfdp_init_params_deprecated+0x3c/0x8c [ 3.379562] spi_nor_scan+0x7ac/0xef8 [ 3.383336] spi_nor_probe+0x94/0x2ec [ 3.387109] spi_mem_probe+0x6c/0xac [ 3.390796] spi_probe+0x84/0xe4 [ 3.394119] really_probe+0xbc/0x2a0 [ 3.397793] __driver_probe_device+0x78/0x12c [ 3.402270] driver_probe_device+0x40/0x160 [ 3.406569] __device_attach_driver+0xb8/0x134 [ 3.411144] bus_for_each_drv+0x84/0xe0 [ 3.415091] __device_attach+0xa8/0x1b0 [ 3.419040] device_initial_probe+0x14/0x20 [ 3.423340] bus_probe_device+0xa8/0xac [ 3.427288] device_add+0x590/0x750 [ 3.430872] __spi_add_device+0x138/0x208 [ 3.434999] of_register_spi_device+0x394/0x57c [ 3.439656] spi_register_controller+0x394/0x760 [ 3.444399] qcom_qspi_probe+0x328/0x390 [ 3.448442] platform_probe+0x68/0xd8 [ 3.452216] really_probe+0xbc/0x2a0 [ 3.455898] __driver_probe_device+0x78/0x12c [ 3.460382] driver_probe_device+0x40/0x160 [ 3.464682] __device_attach_driver+0xb8/0x134 [ 3.469246] bus_for_each_drv+0x84/0xe0 [ 3.473193] __device_attach+0xa8/0x1b0 [ 3.477137] device_initial_probe+0x14/0x20 [ 3.481435] bus_probe_device+0xa8/0xac [ 3.485383] deferred_probe_work_func+0x88/0xc0 [ 3.490044] process_one_work+0x154/0x298 [ 3.494172] worker_thread+0x304/0x408 [ 3.498035] kthread+0x118/0x11c [ 3.501358] ret_from_fork+0x10/0x20 [ 3.505046] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c20) [ 3.511301] ---[ end trace 0000000000000000 ]---
On Wed, May 22, 2024 at 11:12:43AM -0400, Nícolas F. R. A. Prado wrote: > On Wed, May 22, 2024 at 05:24:40PM +0300, Andy Shevchenko wrote: > > On Wed, May 22, 2024 at 03:18:18PM +0200, Neil Armstrong wrote: > > > On 22/05/2024 13:53, Neil Armstrong wrote: > > > > On 22/05/2024 13:33, Andy Shevchenko wrote: > > > > > On Wed, May 22, 2024 at 12:03:33PM +0200, Neil Armstrong wrote: > > > > > > On 15/05/2024 23:09, Nícolas F. R. A. Prado wrote: > > > > > > > On Tue, May 07, 2024 at 11:10:27PM +0300, Andy Shevchenko wrote: > > > > > > > > Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() > > > > > > > > have checks for orig_nents against 0. No need to duplicate this. > > > > > > > > All the same applies to other DMA mapping API calls. > > > > > > > > > > > > > > > > Also note, there is no other user in the kernel that does this kind of > > > > > > > > checks. > > > > > > > > > > > > > > > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > this commit caused a regression which I reported here: > > > > > > > > > > > > > > https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano > > > > > > > > > > > > > > along with some thoughts on the cause and a possible solution, though I'm not > > > > > > > familiar with this code base at all and would really appreciate any feedback you > > > > > > > may have. > > > > > > > > > > > > I also see the same regression on the SM8550 and SM8650 platforms, > > > > > > please CC linux-arm-msm@vger.kernel.org and me for a potential fix to test on those platforms. > > > > > > > > > > There is still no answer from IOMMU patch author. Do you have the same trace > > > > > due to IOMMU calls? Anyway, I guess it would be nice to see it. > [..] > > > > > > > > > > Meanwhile, I have three changes I posted in the replies to the initial report, > > > > > can you combine them all and test? This will be a plan B (? or A, depending on > > > > > the culprit). > > > > > > > > > > > > > I'll try to apply them and test. > > > > > > I stacked the 3 changes, and it works: > > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD > > > > Thank you! > > > > I will try to develop and submit a fix against IOMMU which I believe is the > > correct place to address this. So, this one will be plan B in case the IOMMU > > folks will refuse the other one. > > Hi, > > that change did not work for me. Stack trace follows at the end. > > But adding the following on top did fix it: > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index 0851c5e1fd1f..5d3972d9d1da 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1253,8 +1253,13 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) > /* The sync is done before each transfer. */ > unsigned long attrs = DMA_ATTR_SKIP_CPU_SYNC; > > - if (!ctlr->can_dma(ctlr, msg->spi, xfer)) > + if (!ctlr->can_dma(ctlr, msg->spi, xfer)) { > + memset(&xfer->tx_sg, 0, sizeof(xfer->tx_sg)); > + xfer->tx_sg.sgl = &dummy_sg; > + memset(&xfer->rx_sg, 0, sizeof(xfer->rx_sg)); > + xfer->rx_sg.sgl = &dummy_sg; > continue; > + } > > if (xfer->tx_buf != NULL) { > ret = spi_map_buf_attrs(ctlr, tx_dev, &xfer->tx_sg, > > > The thing is that even with all the previous changes applied, if one of the > transfers inside the message doesn't support DMA, the null pointer would still > be passed to the DMA API. Ah, indeed, the conditionals also can be translated to can_dma = can_dma(); if (can_dma && _buf) ... else ... but... > Alternatively, I think a better way to achieve the same is (I have verified this > also works): I agree that this one is much better approach. I will clean it up and send as a quick fix. IOMMU will still be on the table as I think it's wrong to dereference SG when nents = 0. > I'll let you decide what is the best fix for the issue (what has been posted so > far in this thread or another fix in IOMMU). If you go with this, feel free to > add my > > Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Thank you!
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index b7e40898590a..cb27d9e551a4 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1221,12 +1221,10 @@ static void spi_unmap_buf_attrs(struct spi_controller *ctlr, enum dma_data_direction dir, unsigned long attrs) { - if (sgt->orig_nents) { - dma_unmap_sgtable(dev, sgt, dir, attrs); - sg_free_table(sgt); - sgt->orig_nents = 0; - sgt->nents = 0; - } + dma_unmap_sgtable(dev, sgt, dir, attrs); + sg_free_table(sgt); + sgt->orig_nents = 0; + sgt->nents = 0; } void spi_unmap_buf(struct spi_controller *ctlr, struct device *dev, @@ -1331,10 +1329,8 @@ static void spi_dma_sync_for_device(struct spi_controller *ctlr, if (!ctlr->cur_msg_mapped) return; - if (xfer->tx_sg.orig_nents) - dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); - if (xfer->rx_sg.orig_nents) - dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); + dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); + dma_sync_sgtable_for_device(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); } static void spi_dma_sync_for_cpu(struct spi_controller *ctlr, @@ -1346,10 +1342,8 @@ static void spi_dma_sync_for_cpu(struct spi_controller *ctlr, if (!ctlr->cur_msg_mapped) return; - if (xfer->rx_sg.orig_nents) - dma_sync_sgtable_for_cpu(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); - if (xfer->tx_sg.orig_nents) - dma_sync_sgtable_for_cpu(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); + dma_sync_sgtable_for_cpu(rx_dev, &xfer->rx_sg, DMA_FROM_DEVICE); + dma_sync_sgtable_for_cpu(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE); } #else /* !CONFIG_HAS_DMA */ static inline int __spi_map_msg(struct spi_controller *ctlr,
Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() have checks for orig_nents against 0. No need to duplicate this. All the same applies to other DMA mapping API calls. Also note, there is no other user in the kernel that does this kind of checks. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> --- drivers/spi/spi.c | 22 ++++++++-------------- 1 file changed, 8 insertions(+), 14 deletions(-)