From patchwork Wed Jan 6 19:36:45 2016
X-Patchwork-Submitter: Doug Anderson
X-Patchwork-Id: 7970401
From: Douglas Anderson
To: Russell King
Subject: [PATCH v3 3/3] ARM: dma-mapping: Use DMA_ATTR_SEQUENTIAL hint to optimize allocation
Date: Wed, 6 Jan 2016 11:36:45 -0800
Message-Id: <1452109005-19517-4-git-send-email-dianders@chromium.org>
In-Reply-To: <1452109005-19517-1-git-send-email-dianders@chromium.org>
References: <1452109005-19517-1-git-send-email-dianders@chromium.org>
Cc: laurent.pinchart+renesas@ideasonboard.com, Pawel Osciak,
 mike.looijmans@topic.nl, linux-kernel@vger.kernel.org, Dmitry Torokhov,
 will.deacon@arm.com, Douglas Anderson, Tomasz Figa,
 penguin-kernel@i-love.sakura.ne.jp, carlo@caione.org,
 akpm@linux-foundation.org, Robin Murphy, dan.j.williams@intel.com,
 linux-arm-kernel@lists.infradead.org, Marek Szyprowski

If we know that memory will be accessed sequentially, then it's not
terribly important to allocate big chunks of memory.  The whole point
of allocating the big chunks was to make TLB usage efficient.  As
Marek Szyprowski indicated:

    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache.  This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation.

Image rotation is distinctly not a sequential operation, so it makes
sense that TLB efficiency is important there.  Video decoding, on the
other hand, is a fairly sequential operation: during video decoding we
don't expect to be jumping all over memory.  Thus, if we know we're
setting up DMA for a video decode operation, we can indicate
DMA_ATTR_SEQUENTIAL.

Allocating big chunks of memory is quite expensive, especially if
we're doing it repeatedly and memory is full.  In one (out of tree)
usage model it is common for arm_iommu_alloc_attrs() to be called 16
times in a row, each call trying to allocate 4 MB of memory.  This
happens whenever the system encounters a new video, which could easily
occur while the memory system is stressed out.  In fact, on certain
social media websites that auto-play video and have infinite
scrolling, it's quite common to see not just one of these 16x4MB
allocation bursts but 2 or 3 right after one another.
Asking the system to do even a small amount of extra work to give us
big chunks in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all the big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the
big chunks we're causing extra work for the rest of the system.  We
also may start making other memory allocations fail.  While the system
may be robust to such failures (as is the case with dwc2 USB trying to
allocate buffers for Ethernet data and with WiFi trying to allocate
buffers for WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson
---
Changes in v3:
- Use DMA_ATTR_SEQUENTIAL hint patch new for v3.

Changes in v2: None

 arch/arm/mm/dma-mapping.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bc9cebfa0891..58298221ce3e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1159,6 +1159,13 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	}
 
 	/*
+	 * Go straight to 4K chunks if it's sequential to ease the burden on
+	 * the memory manager and to leave big chunks available for others.
+	 */
+	if (dma_get_attr(DMA_ATTR_SEQUENTIAL, attrs))
+		order_idx = ARRAY_SIZE(iommu_order_array) - 1;
+
+	/*
 	 * IOMMU can map any pages, so himem can also be used here
 	 */
 	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;