[RFC] mmc: tmio: Protect the asynchronous usage of mrq by a lock

Message ID 20240220061356.3001761-1-dirk.behme@de.bosch.com (mailing list archive)
State Under Review
Delegated to: Geert Uytterhoeven
Series [RFC] mmc: tmio: Protect the asynchronous usage of mrq by a lock

Commit Message

Dirk Behme Feb. 20, 2024, 6:13 a.m. UTC
Analyzing the KASAN report [1] shows that the access in

mmc_request_done+0xcc/0x30c

resolves to the read of

mrq->cap_cmd_during_tfr

in mmc_command_done(), which is inlined into mmc_request_done(). At
that point "mrq" has already become invalid.

The driver schedules two work items, tmio_mmc_reset_work() and
tmio_mmc_done_work(). Both operate on mrq.

Synchronize them by extending the spinlock-protected sections.
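For illustration, this is the pre-patch tail of tmio_mmc_reset_work()
(the same lines the first hunk below modifies), with comments marking
where the window opens; the annotations are this report's
interpretation, not driver source:

	host->cmd = NULL;
	host->data = NULL;

	spin_unlock_irqrestore(&host->lock, flags);	/* host->lock dropped */

	tmio_mmc_reset(host, true);

	/* Ready for new calls */
	host->mrq = NULL;			/* cleared outside host->lock... */
	mmc_request_done(host->mmc, mrq);	/* ...so tmio_mmc_done_work() may
						 * complete the same mrq concurrently
						 * and the caller's (possibly
						 * stack-allocated) request can already
						 * be gone when mrq->cap_cmd_during_tfr
						 * is read */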

[1]

BUG: KASAN: stack-out-of-bounds in mmc_request_done+0xcc/0x30c
Read of size 1 at addr ffff8005df517738 by task kworker/4:1/2308

CPU: 4 PID: 2308 Comm: kworker/4:1 Tainted: G         C      4.14.327-ltsi
Hardware name: custom board based on r8a7796 (DT)
Workqueue: events tmio_mmc_reset_work
Call trace:
  dump_backtrace+0x0/0x1fc
  show_stack+0x14/0x1c
  dump_stack+0xcc/0x114
  print_address_description+0x54/0x238
  kasan_report+0x274/0x29c
  __asan_load1+0x24/0x50
  mmc_request_done+0xcc/0x30c
  tmio_mmc_reset_work+0x1fc/0x248
  process_one_work+0x324/0x6c0
  worker_thread+0x358/0x4d4
  kthread+0x1a4/0x1bc
  ret_from_fork+0x10/0x18

The buggy address belongs to the page:
page:ffff7e00177d45c0 pfn:61f517 refcount:0 mapcount:0 mapping: (null) index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffff7e00177d45e0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8005df517600: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00
  ffff8005df517680: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff8005df517700: 00 00 f1 f1 f1 f1 04 f2 f2 f2 00 00 00 00 00 00
                                         ^
  ffff8005df517780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8005df517800: 00 00 f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00
==================================================================
Disabling lock debugging due to kernel taint
------------[ cut here ]------------

Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
---

Notes: Instead of extending the irqsave spinlock sections, and with
       that blocking interrupts for even longer, we could discuss
       whether a new "mrq synchronization" spinlock should be added
       instead. In any case, the 'host->mrq = NULL;' needs to move
       into the host->lock protected section.
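       As a rough sketch of that alternative (purely hypothetical; the
       field name "mrq_lock" below is made up for illustration and not
       part of this patch), a new 'spinlock_t mrq_lock' member in
       struct tmio_mmc_host would keep the host->lock section short
       and only serialize the completion itself against
       tmio_mmc_done_work(), which would take the same new lock around
       its own completion path:

	spin_lock_irqsave(&host->lock, flags);
	host->cmd = NULL;
	host->data = NULL;
	/* Ready for new calls */
	host->mrq = NULL;		/* must stay under host->lock */
	spin_unlock_irqrestore(&host->lock, flags);

	tmio_mmc_reset(host, true);

	/* hypothetical dedicated lock, only serializing mrq completion */
	spin_lock_irqsave(&host->mrq_lock, flags);
	mmc_request_done(host->mmc, mrq);
	spin_unlock_irqrestore(&host->mrq_lock, flags);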

 drivers/mmc/host/tmio_mmc_core.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Comments

Wolfram Sang Feb. 20, 2024, 9:34 a.m. UTC | #1
> BUG: KASAN: stack-out-of-bounds in mmc_request_done+0xcc/0x30c
> Read of size 1 at addr ffff8005df517738 by task kworker/4:1/2308

Do you have a reproducer for this report? I.e. can you say that this
patch fixes the issue?
Wolfram Sang Feb. 28, 2024, 9:54 a.m. UTC | #2
Hi Dirk,

thanks a ton for this report and RFC!

On Tue, Feb 20, 2024 at 07:13:56AM +0100, Dirk Behme wrote:
> Analyzing the KASAN report [1] shows that the access in
> 
> mmc_request_done+0xcc/0x30c
> 
> resolves to the read of
> 
> mrq->cap_cmd_during_tfr
> 
> in mmc_command_done(), which is inlined into mmc_request_done(). At
> that point "mrq" has already become invalid.
> 
> The driver schedules two work items, tmio_mmc_reset_work() and
> tmio_mmc_done_work(). Both operate on mrq.
> 
> Synchronize them by extending the spinlock-protected sections.

As discussed further in private, we both see the problem but want to
keep the critical sections minimal. So, after some more investigation,
a counter-patch is coming in a minute.

   Wolfram

Patch

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index be7f18fd4836..d876af57db48 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -261,14 +261,15 @@  static void tmio_mmc_reset_work(struct work_struct *work)
 
 	host->cmd = NULL;
 	host->data = NULL;
-
+	/* Ready for new calls */
+	host->mrq = NULL;
 	spin_unlock_irqrestore(&host->lock, flags);
 
 	tmio_mmc_reset(host, true);
 
-	/* Ready for new calls */
-	host->mrq = NULL;
+	spin_lock_irqsave(&host->lock, flags);
 	mmc_request_done(host->mmc, mrq);
+	spin_unlock_irqrestore(&host->lock, flags);
 }
 
 /* These are the bitmasks the tmio chip requires to implement the MMC response
@@ -812,9 +813,9 @@  static void tmio_mmc_request(struct mmc_host *mmc, struct mmc_request *mrq)
 	wmb();
 	host->mrq = mrq;
 
-	spin_unlock_irqrestore(&host->lock, flags);
-
 	tmio_process_mrq(host, mrq);
+
+	spin_unlock_irqrestore(&host->lock, flags);
 }
 
 static void tmio_mmc_finish_request(struct tmio_mmc_host *host)
@@ -841,8 +842,6 @@  static void tmio_mmc_finish_request(struct tmio_mmc_host *host)
 
 	cancel_delayed_work(&host->delayed_reset_work);
 
-	spin_unlock_irqrestore(&host->lock, flags);
-
 	if (mrq->cmd->error || (mrq->data && mrq->data->error)) {
 		tmio_mmc_ack_mmc_irqs(host, TMIO_MASK_IRQ); /* Clear all */
 		tmio_mmc_abort_dma(host);
@@ -855,6 +854,7 @@  static void tmio_mmc_finish_request(struct tmio_mmc_host *host)
 	/* If SET_BLOCK_COUNT, continue with main command */
 	if (host->mrq && !mrq->cmd->error) {
 		tmio_process_mrq(host, mrq);
+		spin_unlock_irqrestore(&host->lock, flags);
 		return;
 	}
 
@@ -862,6 +862,7 @@  static void tmio_mmc_finish_request(struct tmio_mmc_host *host)
 		host->fixup_request(host, mrq);
 
 	mmc_request_done(host->mmc, mrq);
+	spin_unlock_irqrestore(&host->lock, flags);
 }
 
 static void tmio_mmc_done_work(struct work_struct *work)