From patchwork Tue Jun 21 07:53:19 2011
X-Patchwork-Submitter: Russell King - ARM Linux
X-Patchwork-Id: 900382
Date: Tue, 21 Jun 2011 08:53:19 +0100
From: Russell King - ARM Linux
To: Per Forlin
Cc: Nicolas Pitre, Chris Ball, linaro-dev@lists.linaro.org,
 linux-mmc@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency
Message-ID: <20110621075319.GN26089@n2100.arm.linux.org.uk>
References: <1308518257-9783-1-git-send-email-per.forlin@linaro.org>
In-Reply-To: <1308518257-9783-1-git-send-email-per.forlin@linaro.org>

On Sun, Jun 19, 2011 at 11:17:26PM +0200, Per Forlin wrote:
> How significant is the cache maintenance overhead?

Per,

Can you measure how much difference this makes before and after your
patch set, please?  This moves the dsb() out of the individual cache
maintenance functions, so that we only perform one dsb() per dma_*_sg
call rather than one per SG entry.

Thanks.

 arch/arm/include/asm/dma-mapping.h |   11 +++++++++++
 arch/arm/mm/cache-fa.S             |    6 ------
 arch/arm/mm/cache-v4wb.S           |    2 --
 arch/arm/mm/cache-v6.S             |    6 ------
 arch/arm/mm/cache-v7.S             |    3 ---
 arch/arm/mm/dma-mapping.c          |    8 ++++++++
 6 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..853eba5 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -115,6 +115,11 @@ static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 		___dma_page_dev_to_cpu(page, off, size, dir);
 }
 
+static inline void __dma_sync(void)
+{
+	dsb();
+}
+
 /*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
@@ -378,6 +383,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
 
 	BUG_ON(!valid_dma_direction(dir));
 	addr = __dma_map_single(dev, cpu_addr, size, dir);
+	__dma_sync();
 	debug_dma_map_page(dev, virt_to_page(cpu_addr),
 			(unsigned long)cpu_addr & ~PAGE_MASK, size,
 			dir, addr, true);
@@ -407,6 +413,7 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
 
 	BUG_ON(!valid_dma_direction(dir));
 	addr = __dma_map_page(dev, page, offset, size, dir);
+	__dma_sync();
 	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
 
 	return addr;
@@ -431,6 +438,7 @@ static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
 {
 	debug_dma_unmap_page(dev, handle, size, dir, true);
 	__dma_unmap_single(dev, handle, size, dir);
+	__dma_sync();
 }
 
 /**
@@ -452,6 +460,7 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 {
 	debug_dma_unmap_page(dev, handle, size, dir, false);
 	__dma_unmap_page(dev, handle, size, dir);
+	__dma_sync();
 }
 
 /**
@@ -484,6 +493,7 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
 		return;
 
 	__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+	__dma_sync();
 }
 
 static inline void dma_sync_single_range_for_device(struct device *dev,
@@ -498,6 +508,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
 		return;
 
 	__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
+	__dma_sync();
 }
 
 static inline void dma_sync_single_for_cpu(struct device *dev,
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
index 1fa6f71..6eeb734 100644
--- a/arch/arm/mm/cache-fa.S
+++ b/arch/arm/mm/cache-fa.S
@@ -179,8 +179,6 @@ fa_dma_inv_range:
 	add	r0, r0, #CACHE_DLINESIZE
 	cmp	r0, r1
 	blo	1b
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
@@ -197,8 +195,6 @@ fa_dma_clean_range:
 	add	r0, r0, #CACHE_DLINESIZE
 	cmp	r0, r1
 	blo	1b
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
@@ -212,8 +208,6 @@ ENTRY(fa_dma_flush_range)
 	add	r0, r0, #CACHE_DLINESIZE
 	cmp	r0, r1
 	blo	1b
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index f40c696..523c0cb 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -194,7 +194,6 @@ v4wb_dma_inv_range:
 	add	r0, r0, #CACHE_DLINESIZE
 	cmp	r0, r1
 	blo	1b
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
@@ -211,7 +210,6 @@ v4wb_dma_clean_range:
 	add	r0, r0, #CACHE_DLINESIZE
 	cmp	r0, r1
 	blo	1b
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 73b4a8b..7a842dd 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -239,8 +239,6 @@ v6_dma_inv_range:
 	strlo	r2, [r0]			@ write for ownership
 #endif
 	blo	1b
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
@@ -262,8 +260,6 @@ v6_dma_clean_range:
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
 	blo	1b
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
@@ -290,8 +286,6 @@ ENTRY(v6_dma_flush_range)
 	strlob	r2, [r0]			@ write for ownership
 #endif
 	blo	1b
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr
 
 /*
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index d32f02b..18dcef6 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -257,7 +257,6 @@ v7_dma_inv_range:
 	add	r0, r0, r2
 	cmp	r0, r1
 	blo	1b
-	dsb
 	mov	pc, lr
 ENDPROC(v7_dma_inv_range)
 
@@ -275,7 +274,6 @@ v7_dma_clean_range:
 	add	r0, r0, r2
 	cmp	r0, r1
 	blo	1b
-	dsb
 	mov	pc, lr
 ENDPROC(v7_dma_clean_range)
 
@@ -293,7 +291,6 @@ ENTRY(v7_dma_flush_range)
 	add	r0, r0, r2
 	cmp	r0, r1
 	blo	1b
-	dsb
 	mov	pc, lr
 ENDPROC(v7_dma_flush_range)
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..042b056 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -97,6 +97,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
 	memset(ptr, 0, size);
 	dmac_flush_range(ptr, ptr + size);
 	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_sync();
 
 	return page;
 }
@@ -572,6 +573,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		if (dma_mapping_error(dev, s->dma_address))
 			goto bad_mapping;
 	}
+	__dma_sync();
 	debug_dma_map_sg(dev, sg, nents, nents, dir);
 
 	return nents;
@@ -602,6 +604,8 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 
 	for_each_sg(sg, s, nents, i)
 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+
+	__dma_sync();
 }
 EXPORT_SYMBOL(dma_unmap_sg);
 
@@ -627,6 +631,8 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 					    s->length, dir);
 	}
 
+	__dma_sync();
+
 	debug_dma_sync_sg_for_cpu(dev, sg, nents, dir);
 }
 EXPORT_SYMBOL(dma_sync_sg_for_cpu);
 
@@ -653,6 +659,8 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 					    s->length, dir);
 	}
 
+	__dma_sync();
+
 	debug_dma_sync_sg_for_device(dev, sg, nents, dir);
 }
 EXPORT_SYMBOL(dma_sync_sg_for_device);
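
To make the structural change explicit, here's the idea in illustrative
pseudo-C (not code from the patch; clean_dcache_entry() is a made-up
stand-in for the per-entry cache maintenance op):

	/* Before: each per-entry cache operation drains the write buffer */
	for_each_sg(sg, s, nents, i) {
		clean_dcache_entry(s);	/* hypothetical per-entry cache op */
		dsb();			/* one barrier per SG entry */
	}

	/* After: the cache ops no longer drain; a single barrier suffices */
	for_each_sg(sg, s, nents, i)
		clean_dcache_entry(s);
	dsb();				/* one barrier per dma_map_sg() call */

So for an nents-entry scatterlist this saves nents-1 barriers per call.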
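
If it helps, one rough way to measure it (just a sketch, untested; it
assumes you already have a DMA-capable struct device *dev, an initialised
sg table, and ITERATIONS is whatever your test rig uses):

	/* time ITERATIONS map/unmap cycles over the same sg table */
	ktime_t t0 = ktime_get();
	for (i = 0; i < ITERATIONS; i++) {
		dma_map_sg(dev, sg, nents, DMA_TO_DEVICE);
		dma_unmap_sg(dev, sg, nents, DMA_TO_DEVICE);
	}
	pr_info("avg %lld ns per map/unmap cycle\n",
		ktime_to_ns(ktime_sub(ktime_get(), t0)) / ITERATIONS);

Comparing that with and without the patch above, and with and without your
series, should show what the per-entry barriers cost.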