From patchwork Fri Jan 8 23:05:30 2016
From: Douglas Anderson
To: Russell King, Mauro Carvalho Chehab, Robin Murphy, Tomasz Figa,
	Marek Szyprowski
Subject: [PATCH v5 3/5] ARM: dma-mapping: Use DMA_ATTR_NO_HUGE_PAGE hint to
	optimize allocation
Date: Fri, 8 Jan 2016 15:05:30 -0800
Message-Id: <1452294332-23415-4-git-send-email-dianders@chromium.org>
In-Reply-To: <1452294332-23415-1-git-send-email-dianders@chromium.org>
References: <1452294332-23415-1-git-send-email-dianders@chromium.org>
Cc: laurent.pinchart+renesas@ideasonboard.com, Pawel Osciak,
	mike.looijmans@topic.nl, lorenx4@gmail.com, Dmitry Torokhov,
	will.deacon@arm.com, Douglas Anderson, linux-kernel@vger.kernel.org,
	carlo@caione.org, dan.j.williams@intel.com, akpm@linux-foundation.org,
	linux-arm-kernel@lists.infradead.org

If we know that TLB efficiency will not be an issue when memory is
accessed, then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating big chunks was to make TLB usage
efficient.

As Marek Szyprowski indicated:

    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation

Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation: we
don't expect to be jumping all over memory during decode.  Decoding
video is also computationally heavy, so the TLB misses aren't a huge
deal.  Presumably most HW video acceleration users of dma-mapping will
not care about huge pages and will set DMA_ATTR_NO_HUGE_PAGE.

Allocating big chunks of memory is quite expensive, especially when
we're doing it repeatedly and memory is full.  In one (out of tree)
usage model it is common for arm_iommu_alloc_attrs() to be called 16
times in a row, each call trying to allocate 4 MB of memory.  This
happens whenever the system encounters a new video, which can easily
occur while the memory system is stressed out.  In fact, on certain
social media websites that auto-play video and have infinite scrolling,
it's quite common to see not just one of these 16 x 4 MB allocation
bursts but two or three right after another.  Asking the system to do
even a small amount of extra work to give us big chunks in this case is
simply not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all the big chunks in the system.
Presumably there are other users in the system that aren't as flexible
and that actually need these big chunks.  By eating all the big chunks
we're causing extra work for the rest of the system, and we may start
making other memory allocations fail.  While the system may be robust
to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson
Acked-by: Marek Szyprowski
---
Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE

Changes in v4:
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE
- added Marek's ack

Changes in v3:
- "Use DMA_ATTR_SEQUENTIAL hint" patch is new for v3.

Changes in v2: None
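For callers, opting in is a one-liner with the struct dma_attrs API.  A
minimal sketch; the dev and buf_size names and the surrounding driver
context are hypothetical:

	#include <linux/dma-attrs.h>
	#include <linux/dma-mapping.h>

	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t dma_handle;
	void *vaddr;

	/* Access is sequential; skip the expensive hunt for big chunks. */
	dma_set_attr(DMA_ATTR_NO_HUGE_PAGE, &attrs);

	vaddr = dma_alloc_attrs(dev, buf_size, &dma_handle, GFP_KERNEL,
				&attrs);
	if (!vaddr)
		return -ENOMEM;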
 arch/arm/mm/dma-mapping.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bc9cebfa0891..e9fb2929cb7b 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1158,6 +1158,10 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 		return pages;
 	}
 
+	/* Go straight to 4K chunks if caller says it's OK. */
+	if (dma_get_attr(DMA_ATTR_NO_HUGE_PAGE, attrs))
+		order_idx = ARRAY_SIZE(iommu_order_array) - 1;
+
 	/*
 	 * IOMMU can map any pages, so himem can also be used here
 	 */
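For reference, a rough sketch of the fallback loop this hunk feeds
into, simplified from __iommu_alloc_buffer() and assuming the order
array introduced earlier in this series (the real loop also handles the
__GFP_NORETRY first attempt and splitting of compound pages):

	static const int iommu_order_array[] = { 9, 8, 4, 0 };
	...
	while (count) {
		int order = iommu_order_array[order_idx];

		/* Drop down to a smaller order when little is left. */
		if (__fls(count) < order) {
			order_idx++;
			continue;
		}

		/*
		 * ... try a 2^order allocation; on failure, bump
		 * order_idx and retry with the next smaller order ...
		 */
	}

Forcing order_idx to ARRAY_SIZE(iommu_order_array) - 1 therefore makes
the loop start (and stay) at order 0, i.e. single 4K pages, instead of
first contending for 2 MB and 1 MB blocks.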