From patchwork Wed Aug 21 20:18:32 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Grant Grundler X-Patchwork-Id: 2847896 Return-Path: X-Original-To: patchwork-linux-mmc@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 9DDD99F271 for ; Wed, 21 Aug 2013 20:18:37 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 5FEB2204EB for ; Wed, 21 Aug 2013 20:18:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0ADE7204E2 for ; Wed, 21 Aug 2013 20:18:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752542Ab3HUUSe (ORCPT ); Wed, 21 Aug 2013 16:18:34 -0400 Received: from mail-oa0-f49.google.com ([209.85.219.49]:64559 "EHLO mail-oa0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752289Ab3HUUSd (ORCPT ); Wed, 21 Aug 2013 16:18:33 -0400 Received: by mail-oa0-f49.google.com with SMTP id n10so1847197oag.36 for ; Wed, 21 Aug 2013 13:18:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=4uG7P4CcO7g3kbHcFhMv4QvLHNExVie4nKAhLVQdrZg=; b=WxZ8EFndU8sVQQEZo+KNd63xbZvcfilUBIgvSYLp6QxQa5CBeiSi8wGwWPa2d6INWG d1j6cyXlzzs4D8LPNDePPM7+yZRlEGv4NydjmzWu6TVIQ1iw8jOd6Lh+QhWeoO0KbDOa rLlc5A+fWEJ86K4LK+QEVVMtbIz60hDU91Y1Qy7EevClg8056WLYUJNhoZQ6S49nkeKu 3yncr+8CBHhNZuhld68GqnzlhuEj7q3vkyB5oY07iGloyo4dQSMEn9szxVEoNj+rQ3eh Sl1UOW2LCWVI+hI5YhblO0y6lkLZkZr8O+tWPXrxkjFUWFB+jLbtIiELEcQcBBxWWLV6 GNCQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=4uG7P4CcO7g3kbHcFhMv4QvLHNExVie4nKAhLVQdrZg=; b=QGY2DZW9789UZVvqFKF6vawvQYswshgTK7Xkd98baMrfE13GUGYKNSBfrz+8dKfW6v sTOHNSws3aIVcXfOhXstyx1sYORZu2BDCDqYVpd8I/FwMljqW/RUA9wNLjvT6tIluivY tmWPs20ewDt/28XdVR4YyrBv3Xu9dJfXon0Ok= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=4uG7P4CcO7g3kbHcFhMv4QvLHNExVie4nKAhLVQdrZg=; b=AKeBemiBZXk1gHyyAfpNqDnKjUh1DfSUUxFr/TGsozK0mc+sl9F5qNUguG/nZWm5Z3 5y4wA5d4WFkqO30KpiuQ0QhECyEYQ8w2NgmrvMtYvxFDkubTfDWVCprm2oqMnxeyODFw tIHF9Yf14XTx3677jMXJbrQzAdxL3BKXQumizOf9M8/5/7Z/EVsYhz0iAIzl5XGnaizJ tgazW6+Pa4i7goJkNf/yEOSJXi4uphVm+NC+J60kIaCj7O1tLJoSCIKOD3PSx4WBw0ol TXpicS6r56lH9mzGP6xxDpgZEmigdGe3vhWPuu57gsv7BhnKGrKKsin0Ka7ORcrnd8uK IEyQ== X-Gm-Message-State: ALoCoQlxID6w6s5WJH/m0pa3S4UVZ7pN4pr/JXlBAVCXq8quATEmvuD0qwSjplSWv23D33KGpUOUMcX357wsAMySV7WsmHl0llll9DxbVJh41knmrCJldh1DiY97u8GctBkQCk+iYTWVKDpCl8gw1+83WH0nox7DgVKxRVROKaPWYXOss3dcnViBBCF2EXMpAA6e8Ojkvz4i7oI17bu0YEEPY+TcF5f/Ig== MIME-Version: 1.0 X-Received: by 10.50.44.83 with SMTP id c19mr6445032igm.2.1377116312567; Wed, 21 Aug 2013 13:18:32 -0700 (PDT) Received: by 10.231.202.135 with HTTP; Wed, 21 Aug 2013 13:18:32 -0700 (PDT) In-Reply-To: References: Date: Wed, 21 Aug 2013 13:18:32 -0700 X-Google-Sender-Auth: xcZx3Lw1rdaZwgJ5MDa48znenZc Message-ID: Subject: Re: MMC locking: mmc_request accesses from dw_mmc driver ok? From: Grant Grundler To: Chris Ball , Jaehoon Chung , Doug Anderson Cc: "linux-mmc@vger.kernel.org" Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-mmc@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Mon, Aug 12, 2013 at 4:45 PM, Grant Grundler wrote: > I've been working on an "task mmcqd/0:84 blocked for more than 120 > seconds" panic for the past month or so in the chromeos-3.4 kernel > tree. Stack trace below. Feel free to tell me "fixed in v3.x". :) I've added this change: cmd->retries = 0; and the "WARN_ON(host->claimer != current)" is triggering with this stack trace (as I suspected): WARNING: at /mnt/host/source/src/third_party/kernel/files/drivers/mmc/core/core.c:139 mmc_request_done+0x6c/0xf0() Modules linked in: i2c_dev uinput asix usbnet nf_conntrack_ipv6 nf_defrag_ipv6 uvcvideo videobuf2_vmalloc sbs_battery mwifiex_sdio mwifiex cfg80211 btmrvl_sdio btmrvl rtc_s3c bluetooth zram(C) zsmalloc(C) fuse ip6table_filter xt_mark ip6_tables joydev [<800150a4>] (unwind_backtrace+0x0/0x114) from [<804ee9a0>] (dump_stack+0x20/0x24) [<804ee9a0>] (dump_stack+0x20/0x24) from [<8002bff8>] (warn_slowpath_null+0x44/0x5c) [<8002bff8>] (warn_slowpath_null+0x44/0x5c) from [<803a3e74>] (mmc_request_done+0x6c/0xf0) [<803a3e74>] (mmc_request_done+0x6c/0xf0) from [<803b9c4c>] (dw_mci_request_end+0xc4/0xec) [<803b9c4c>] (dw_mci_request_end+0xc4/0xec) from [<803ba274>] (dw_mci_tasklet_func+0x354/0x3a8) [<803ba274>] (dw_mci_tasklet_func+0x354/0x3a8) from [<80034044>] (tasklet_action+0xac/0x12c) [<80034044>] (tasklet_action+0xac/0x12c) from [<800334f8>] (__do_softirq+0xc4/0x208) [<800334f8>] (__do_softirq+0xc4/0x208) from [<80033a54>] (irq_exit+0x54/0x94) [<80033a54>] (irq_exit+0x54/0x94) from [<8000ef68>] (handle_IRQ+0x8c/0xc8) [<8000ef68>] (handle_IRQ+0x8c/0xc8) from [<800085ec>] (gic_handle_irq+0x4c/0x70) [<800085ec>] (gic_handle_irq+0x4c/0x70) from [<8000e200>] (__irq_svc+0x40/0x60) Is this expected behavior? It feels wrong to me since it means the dw_mmc tasklet and whatever thread starts the IO can access the mmc_request data structure. Do I have this right? If I have this right, any "obvious" fixes? e.g. add locking to mmc_start_request() and mmc_request_done()? thanks, grant > After staring at the 14 MMC and DW driver data structures, I now > think dw_mmc driver is accessing MMC generic data structures > (mmc_request and mmc_queue_req) without grabbing either > mmc_blk_data->lock or mmc_queue->thread_sem and it needs to. I don't > have a specific stack trace yet where dw_mmc driver is accessing MMC > generic data without protection. This is where I need some guidance. > > I am confident dw_mmc driver is always acquiring dw_mci->lock when > accessing data in dw_mci structure(s). I don't see any locking around > access into the struct mmc_request by dw_mci_slot[]->mrq though - not > sure where that belongs. > > Two questions: > 1) is there interest in adding "assert_spin_locked()" calls to > document locking dependencies? > 2) Does anyone understand this code well enough to confirm I'm on the > right track and which code path I should be looking at? > > > Back to the bug: mmc_start_req() is sleeping, waiting for the > "previous" (in flight) "async" IO to complete. (1) This IO never > completes (unlikely) OR (2) already did (e.g. mmc_host->areq is stale) > OR (3) mmc_host->areq is non-zero garbage. I'll add some code to > confirm (3) not the last case. > > I have confirmed with the stress test I'm running (many async IO in > flight with two antagonist processes that burns CPU cycles) gets about > 4 completions per second that are "done" before we call > mmc_start_req(). So I know the race in (2) could happen. > > > thanks, > grant > > > INFO: task mmcqd/0:84 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > mmcqd/0 D 804f4890 0 84 2 0x00000000 > [<804f4890>] (__schedule+0x614/0x780) from [<804f4dc0>] (schedule+0x94/0x98) > [<804f4dc0>] (schedule+0x94/0x98) from [<804f2ae0>] > (schedule_timeout+0x38/0x2d0) > [<804f2ae0>] (schedule_timeout+0x38/0x2d0) from [<804f4c14>] > (wait_for_common+0x164/0x1a0) > [<804f4c14>] (wait_for_common+0x164/0x1a0) from [<804f4d28>] > (wait_for_completion+0x20/0x24) > [<804f4d28>] (wait_for_completion+0x20/0x24) from [<803a39f8>] > (mmc_wait_for_req_done+0x2c/0x84) > [<803a39f8>] (mmc_wait_for_req_done+0x2c/0x84) from [<803a4b50>] > (mmc_start_req+0x60/0x120) > [<803a4b50>] (mmc_start_req+0x60/0x120) from [<803b09bc>] > (mmc_blk_issue_rw_rq+0xa0/0x3a8) > [<803b09bc>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803b10e8>] > (mmc_blk_issue_rq+0x424/0x478) > [<803b10e8>] (mmc_blk_issue_rq+0x424/0x478) from [<803b220c>] > (mmc_queue_thread+0xb0/0x118) > [<803b220c>] (mmc_queue_thread+0xb0/0x118) from [<8004d620>] (kthread+0xa8/0xbc) > [<8004d620>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8) > Kernel panic - not syncing: hung_task: blocked tasks > [<800150a4>] (unwind_backtrace+0x0/0x114) from [<804ee160>] > (dump_stack+0x20/0x24) > [<804ee160>] (dump_stack+0x20/0x24) from [<804ee2d0>] (panic+0xa8/0x1f4) > [<804ee2d0>] (panic+0xa8/0x1f4) from [<80086d44>] (watchdog+0x1f4/0x25c) > [<80086d44>] (watchdog+0x1f4/0x25c) from [<8004d620>] (kthread+0xa8/0xbc) > [<8004d620>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8) --- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 1cf4900..a127ce1 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -135,6 +135,9 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq) struct mmc_command *cmd = mrq->cmd; int err = cmd->error; +WARN_ON(!host->claimed); +WARN_ON(host->claimer != current); + if (err && cmd->retries && mmc_host_is_spi(host)) { if (cmd->resp[0] & R1_SPI_ILLEGAL_COMMAND)