From patchwork Sun Mar 26 02:43:18 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Cvek X-Patchwork-Id: 9644933 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 3C639601D7 for ; Sun, 26 Mar 2017 02:43:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2963326220 for ; Sun, 26 Mar 2017 02:43:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1AF6A27FA0; Sun, 26 Mar 2017 02:43:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_TVD_MIME_EPI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 928E926220 for ; Sun, 26 Mar 2017 02:43:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751264AbdCZCnL (ORCPT ); Sat, 25 Mar 2017 22:43:11 -0400 Received: from bubo.tul.cz ([147.230.16.1]:48116 "EHLO bubo.tul.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751227AbdCZCnK (ORCPT ); Sat, 25 Mar 2017 22:43:10 -0400 X-Virus-Scanned: amavisd-new at tul.cz Received: from [IPv6:2001:1ae9:ff1:f191:86a:280f:c019:1caf] (unknown [IPv6:2001:1ae9:ff1:f191:86a:280f:c019:1caf]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by bubo.tul.cz (Postfix) with ESMTPSA id 5BA20180383C0; Sun, 26 Mar 2017 04:42:57 +0200 (CEST) Subject: Re: [BUG] dmaengine: pxa_dma: + mmc: pxamci: race condition with DMA error on tx channel To: Robert Jarzmik References: <877f3zwwgo.fsf@belgarion.home> <871stzh8bx.fsf@belgarion.home> Cc: vinod.koul@intel.com, Ulf Hansson , Daniel Mack , Haojian Zhuang , linux-mmc@vger.kernel.org, linux-arm-kernel@lists.infradead.org, dmaengine@vger.kernel.org From: Petr Cvek Message-ID: Date: Sun, 26 Mar 2017 04:43:18 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <871stzh8bx.fsf@belgarion.home> Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-mmc@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Dne 14.3.2017 v 22:11 Robert Jarzmik napsal(a): > Petr Cvek writes: > > Ok Petr, I've been trying for days to reproduce without any luck. Hi, I think I was able to finally find the problem with the PXA MCI and DMA. It seems to be a problem with a race condition with vchan_complete() tasklet. The pxa_dma driver handles IRQ with pxad_chan_handler(), which calls vchan_cookie_complete(), which schedules the vchan_complete() tasklet. Starting the tasklet may take a long time. The race condition appeared during the heavy IRQ load. During that time, the pxamci driver can start another data write transmit. This another transmit with again schedule the tasklet. But as tasklet schedules are not cumulative, it will (probably) add item to a list. After some time the tasklet is finally scheduled and for every item in the list the callback pxamci_dma_irq() is called. And there is the main problem, the pxamci_dma_irq() is using __actual__ transmit variables (e.g. host->data). My debug printk shows the code tests a cookie of the actual transmit, which may be in a DMA_IN_PROGRESS state (every failure is during this state). Solution: I commented the error handling parts of the callback function and the driver works, but it is only for the testing purposes, there can be a partially filled FIFO (BUF_PART_FULL) which will not be handled. Problem is the driver won't wait on the completion before starting a new transmission. But this waiting on completion will probably slow down the MMC communication :-/. The best thing would be to handle the partial buffer somewhere else and get rid of the callback completely. If it is possible, probably not as I assume the partially filled buffer will not create pxamci interrupt? In the other case then maybe in pxamci_data_done()? Log: [ 2669.917946] dma dma0chan1: pxad_chan_handler(): checking txd c18c2f20[135f]: completed=1 dcsr=0x2000000c ^^^ schedules the tasklet [ 2669.924255] dma dma0chan1: pxad_chan_handler(): checking txd c1a6b740[1360]: completed=1 dcsr=0x2000000c ^^^ reschedule the tasklet [ 2669.934441] dma dma0chan1: pxad_chan_handler(): checking txd c1a6b880[1361]: completed=1 dcsr=0x2000000c ^^^ reschedule the tasklet [ 2669.944893] dma dma0chan1: pxad_chan_handler(): checking txd c1a78ba0[1362]: completed=1 dcsr=0x2000000c ^^^ reschedule the tasklet [ 2670.081114] ###pre ^^^ tasklet has finally started [ 2670.081187] ###post ^^^ first item of the list, callback [ 2670.081369] !!!cookie=1364 complete=1363 used=1364 ... status=1 ^^^ There it would fail with "DMA error on tx channel" [ 2670.081608] ###post ^^^ The second item of the list [ 2670.081678] !!!cookie=1364 complete=1363 used=1364 ... status=1 ^^^ Again called with the same host->data, notice same cookie, status=1 == DMA_IN_PROGRESS (always) The full log and used debug patch are attached. The machine is PXA272 @ 416MHz, during logging (over irda or usb ssh console) I created multiple interrupt sources by touching a touchscreen and pressing GPIO buttons. A higher bug occurrence was observed with sync-written files of unusual lengths: while : ; do dd if=/dev/urandom bs=7777 count=11 of=/tmp/file conv=fsync done > > I had a look at your traces, and I'd like something else when it happens : > 1) The patch I provided earlier applied Yeah it is DMA_IN_PROGRESS in the "bug" case and a few DMA_COMPLETE in the "normal" case, but I don't know if it is a valid value due to the race condition "aliasing". Anyway the status value is written in the "!!!cookie=" printk. > 2) This done (the 'cat' after the bug) : > mount -t debugfs none /sys/kernel/debug/ > cat /sys/kernel/debug/pxa-dma.0/channels/4/[sd]* > This wasn't possible as everything freezes on the vanilla kernel and with my debug patch the DMA transactions just continue. Cheers, Petr From 070a9c01d94e5efbdf0afae9ad2ddab380a2513e Mon Sep 17 00:00:00 2001 From: Petr Cvek Date: Thu, 23 Feb 2017 01:07:51 +0100 Subject: [PATCH] DMA/MMC debugs --- drivers/dma/virt-dma.c | 6 +++++- drivers/mmc/host/pxamci.c | 18 ++++++++++++++++-- 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/dma/virt-dma.c b/drivers/dma/virt-dma.c index e47fc9b0944f..c7ea39ea97fb 100644 --- a/drivers/dma/virt-dma.c +++ b/drivers/dma/virt-dma.c @@ -101,15 +101,19 @@ static void vchan_complete(unsigned long arg) } spin_unlock_irq(&vc->lock); +pr_info("###pre\n"); dmaengine_desc_callback_invoke(&cb, NULL); while (!list_empty(&head)) { vd = list_first_entry(&head, struct virt_dma_desc, node); dmaengine_desc_get_callback(&vd->tx, &cb); +pr_info("###post\n"); list_del(&vd->node); - if (dmaengine_desc_test_reuse(&vd->tx)) + if (dmaengine_desc_test_reuse(&vd->tx)) { + pr_info("###reuse\n"); list_add(&vd->node, &vc->desc_allocated); + } else vc->desc_free(vd); diff --git a/drivers/mmc/host/pxamci.c b/drivers/mmc/host/pxamci.c index c763b404510f..d7f04b11e9e6 100644 --- a/drivers/mmc/host/pxamci.c +++ b/drivers/mmc/host/pxamci.c @@ -191,6 +191,8 @@ static void pxamci_setup_data(struct pxamci_host *host, struct mmc_data *data) host->data = data; +pr_info("##pxamci_setup_data\n"); + writel(nob, host->base + MMC_NOB); writel(data->blksz, host->base + MMC_BLKLEN); @@ -300,8 +302,10 @@ static int pxamci_cmd_done(struct pxamci_host *host, unsigned int stat) int i; u32 v; - if (!cmd) + if (!cmd) { +pr_info("###pxamci_cmd_done NONE\n"); return 0; + } host->cmd = NULL; @@ -354,8 +358,10 @@ static int pxamci_data_done(struct pxamci_host *host, unsigned int stat) struct mmc_data *data = host->data; struct dma_chan *chan; - if (!data) + if (!data) { +pr_info("###pxamci_data_done NONE\n"); return 0; + } if (data->flags & MMC_DATA_READ) chan = host->dma_chan_rx; @@ -385,6 +391,7 @@ static int pxamci_data_done(struct pxamci_host *host, unsigned int stat) host->data = NULL; if (host->mrq->stop) { pxamci_stop_clock(host); +pr_info("##pxamci_start_cmd DD\n"); pxamci_start_cmd(host, host->mrq->stop, host->cmdat); } else { pxamci_finish_request(host, host->mrq); @@ -441,6 +448,7 @@ static void pxamci_request(struct mmc_host *mmc, struct mmc_request *mrq) if (mrq->data->flags & MMC_DATA_WRITE) cmdat |= CMDAT_WRITE; } +pr_info("##pxamci_start_cmd RQ\n"); pxamci_start_cmd(host, mrq->cmd, cmdat); } @@ -568,13 +576,19 @@ static void pxamci_dma_irq(void *param) status = dmaengine_tx_status(chan, host->dma_cookie, &state); +pr_info("!!!cookie=%x complete=%x used=%x ... status=%i\n", + host->dma_cookie,chan->completed_cookie,chan->cookie, + status); + if (likely(status == DMA_COMPLETE)) { writel(BUF_PART_FULL, host->base + MMC_PRTBUF); +/* } else { pr_err("%s: DMA error on %s channel\n", mmc_hostname(host->mmc), host->data->flags & MMC_DATA_READ ? "rx" : "tx"); host->data->error = -EIO; pxamci_data_done(host, 0); +*/ } out_unlock: -- 2.11.0