From patchwork Thu Mar 9 11:19:56 2017
Subject: Re: [Question] devm_kmalloc() for DMA ?
From: Robin Murphy
To: Masahiro Yamada
Cc: dmaengine@vger.kernel.org, linux-arm-kernel,
    Russell King - ARM Linux, Arnd Bergmann,
    Linux Kernel Mailing List, "James E.J. Bottomley",
    Tejun Heo, "David S. Miller"
Message-ID: <894e8867-57b5-39ab-4dbd-3761d485cb62@arm.com>
Date: Thu, 9 Mar 2017 11:19:56 +0000
X-Mailing-List: dmaengine@vger.kernel.org

On 08/03/17 18:06, Masahiro Yamada wrote:
> Hi Robin,
>
> 2017-03-08 20:15 GMT+09:00 Robin Murphy :
>> On 08/03/17 10:59, Masahiro Yamada wrote:
>>> Hi experts,
>>>
>>> I have a question about how to allocate a DMA-safe buffer.
>>>
>>> In my understanding, kmalloc() returns memory with DMA-safe
>>> alignment in order to avoid the cache-sharing problem when
>>> used for DMA.
>>>
>>> The alignment is decided by ARCH_DMA_MINALIGN.
>>> For example, on modern 32-bit ARM boards, this value is
>>> typically 64, so memory returned by kmalloc() has at least
>>> 64-byte alignment.
>>>
>>> On the other hand, devm_kmalloc() does not return
>>> sufficiently aligned memory.
>>
>> How so? If anything returned by kmalloc() is guaranteed to occupy some
>> multiple of ARCH_DMA_MINALIGN bytes in order to avoid two allocations
>> falling into the same cache line, I don't see how stealing the first 16
>> bytes *of a single allocation* could make it start sharing cache lines
>> with another? :/
>
> I just thought of a traversal of the linked list of devres_node
> entries on a different thread, but that should not happen.
>
> Please forget my stupid question.

On the contrary, my apologies for overlooking the subtlety here and
speaking too soon. It's not strictly the alignment of the allocation
that's at issue; it's more that the streaming-DMA notion of "ownership"
of that particular allocation can be unwittingly violated by other
agents.
Another thread traversing the list isn't actually a problem, as it's
effectively no different from the cache itself doing a speculative
prefetch, which we have to accommodate anyway. However, there could in
theory be data loss if the devres node is *modified* (e.g. an adjacent
entry is added or removed) whilst the buffer is mapped for
DMA_FROM_DEVICE.

Now, as Russell says, doing streaming DMA on a managed allocation tied
to the device's lifetime doesn't make a great deal of sense, and if
you're allocating or freeing device-lifetime resources *while DMA is
active* you're almost certainly doing something wrong, so thankfully I
don't think we should need to worry about this in practice.

Looking at the Denali driver specifically, it does indeed look a bit
bogus. At least it's not doing DMA with a uint8-aligned array in the
middle of a live structure as it apparently once did, but in terms of
how it's actually using that buffer I think something along the lines
of the below would be appropriate (if anyone wants to try spinning a
proper patch out of it, feel free - I have no way of testing and am not
familiar with MTD in general).

Robin.
----->8-----
diff --git a/drivers/mtd/nand/denali.c b/drivers/mtd/nand/denali.c
index 73b9d4e2dca0..b2b4650a3923 100644
--- a/drivers/mtd/nand/denali.c
+++ b/drivers/mtd/nand/denali.c
@@ -1046,8 +1046,6 @@ static int write_page(struct mtd_info *mtd, struct nand_chip *chip,
 			mtd->oobsize);
 	}
 
-	dma_sync_single_for_device(denali->dev, addr, size, DMA_TO_DEVICE);
-
 	clear_interrupts(denali);
 	denali_enable_dma(denali, true);
 
@@ -1063,7 +1061,6 @@ static int write_page(struct mtd_info *mtd, struct nand_chip *chip,
 	}
 
 	denali_enable_dma(denali, false);
-	dma_sync_single_for_cpu(denali->dev, addr, size, DMA_TO_DEVICE);
 
 	return 0;
 }
@@ -1139,7 +1136,6 @@ static int denali_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	setup_ecc_for_xfer(denali, true, false);
 
 	denali_enable_dma(denali, true);
-	dma_sync_single_for_device(denali->dev, addr, size, DMA_FROM_DEVICE);
 
 	clear_interrupts(denali);
 	denali_setup_dma(denali, DENALI_READ);
@@ -1147,8 +1143,6 @@ static int denali_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	/* wait for operation to complete */
 	irq_status = wait_for_irq(denali, irq_mask);
 
-	dma_sync_single_for_cpu(denali->dev, addr, size, DMA_FROM_DEVICE);
-
 	memcpy(buf, denali->buf.buf, mtd->writesize);
 
 	check_erased_page = handle_ecc(denali, buf, irq_status, &max_bitflips);
@@ -1186,16 +1180,12 @@ static int denali_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
 	setup_ecc_for_xfer(denali, false, true);
 
 	denali_enable_dma(denali, true);
-	dma_sync_single_for_device(denali->dev, addr, size, DMA_FROM_DEVICE);
-
 	clear_interrupts(denali);
 	denali_setup_dma(denali, DENALI_READ);
 
 	/* wait for operation to complete */
 	wait_for_irq(denali, irq_mask);
 
-	dma_sync_single_for_cpu(denali->dev, addr, size, DMA_FROM_DEVICE);
-
 	denali_enable_dma(denali, false);
 
 	memcpy(buf, denali->buf.buf, mtd->writesize);
@@ -1466,16 +1456,6 @@ int denali_init(struct denali_nand_info *denali)
 	if (ret)
 		goto failed_req_irq;
 
-	/* allocate the right size buffer now */
-	devm_kfree(denali->dev, denali->buf.buf);
-	denali->buf.buf = devm_kzalloc(denali->dev,
-				       mtd->writesize + mtd->oobsize,
-				       GFP_KERNEL);
-	if (!denali->buf.buf) {
-		ret = -ENOMEM;
-		goto failed_req_irq;
-	}
-
 	/* Is 32-bit DMA supported? */
 	ret = dma_set_mask(denali->dev, DMA_BIT_MASK(32));
 	if (ret) {
@@ -1483,12 +1463,14 @@ int denali_init(struct denali_nand_info *denali)
 		goto failed_req_irq;
 	}
 
-	denali->buf.dma_buf = dma_map_single(denali->dev, denali->buf.buf,
-					     mtd->writesize + mtd->oobsize,
-					     DMA_BIDIRECTIONAL);
-	if (dma_mapping_error(denali->dev, denali->buf.dma_buf)) {
-		dev_err(denali->dev, "Failed to map DMA buffer\n");
-		ret = -EIO;
+	/* allocate the right size buffer now */
+	devm_kfree(denali->dev, denali->buf.buf);
+	denali->buf.buf = dmam_alloc_coherent(denali->dev,
+					      mtd->writesize + mtd->oobsize,
+					      &denali->buf.dma_buf,
+					      GFP_KERNEL);
+	if (!denali->buf.buf) {
+		ret = -ENOMEM;
 		goto failed_req_irq;
 	}
 
@@ -1598,7 +1580,5 @@ void denali_remove(struct denali_nand_info *denali)
 	nand_release(mtd);
 
 	denali_irq_cleanup(denali->irq, denali);
-	dma_unmap_single(denali->dev, denali->buf.dma_buf, bufsize,
-			 DMA_BIDIRECTIONAL);
 }
 EXPORT_SYMBOL(denali_remove);