From patchwork Thu Jul 26 18:54:21 2018
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH 1/3] dmapool: improve scalability of dma_pool_alloc
To: Christoph Hellwig, Marek Szyprowski, Matthew Wilcox, Sathya Prakash,
    Chaitra P B, Suganath Prabu Subramani, iommu@lists.linux-foundation.org,
    linux-mm@kvack.org, linux-scsi@vger.kernel.org,
    MPT-FusionLinux.pdl@broadcom.com
Message-ID: <15ff502d-d840-1003-6c45-bc17f0d81262@cybernetics.com>
Date: Thu, 26 Jul 2018 14:54:21 -0400

dma_pool_alloc() scales poorly when allocating a large number of pages
because it does a linear scan of all previously-allocated pages before
allocating a new one.  Improve its scalability by maintaining a separate
list of pages that have free blocks ready to (re)allocate.  In big O
notation, this improves the algorithm from O(n^2) to O(n).

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

Using list_del_init() in dma_pool_alloc() makes it safe to call
list_del() unconditionally when freeing the page.
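For readers less familiar with the idiom, the minimal sketch below
(illustrative only; demo_page and the other names are invented and are not
part of the patch) shows why list_del_init() pairs safely with a later
list_empty() membership test:

/* Illustrative only -- invented names, not part of the patch. */
#include <linux/list.h>

struct demo_page {
	struct list_head avail_link;	/* links pages that have free blocks */
};

static LIST_HEAD(demo_avail_list);

static void demo_remove_from_avail(struct demo_page *p)
{
	/* list_del_init() leaves avail_link pointing at itself... */
	list_del_init(&p->avail_link);
}

static void demo_return_to_avail(struct demo_page *p)
{
	/* ...so list_empty() is a reliable "not currently on the list" test. */
	if (list_empty(&p->avail_link))
		list_add(&p->avail_link, &demo_avail_list);
}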
In dma_pool_free(), the check for being already in avail_page_list could be written several different ways. The most obvious way is: if (page->offset >= pool->allocation) list_add(&page->avail_page_link, &pool->avail_page_list); Another way would be to check page->in_use. But since it is already using list_del_init(), checking the list pointers directly is safest to prevent any possible list corruption in case the caller misuses the API (e.g. double-dma_pool_free()) with DMAPOOL_DEBUG disabled. --- a/mm/dmapool.c +++ b/mm/dmapool.c @@ -20,6 +20,10 @@ * least 'size' bytes. Free blocks are tracked in an unsorted singly-linked * list of free blocks within the page. Used blocks aren't tracked, but we * keep a count of how many are currently allocated from each page. + * + * The avail_page_list keeps track of pages that have one or more free blocks + * available to (re)allocate. Pages are moved in and out of avail_page_list + * as their blocks are allocated and freed. */ #include @@ -44,6 +48,7 @@ struct dma_pool { /* the pool */ struct list_head page_list; + struct list_head avail_page_list; spinlock_t lock; size_t size; struct device *dev; @@ -55,6 +60,7 @@ struct dma_pool { /* the pool */ struct dma_page { /* cacheable header for 'allocation' bytes */ struct list_head page_list; + struct list_head avail_page_link; void *vaddr; dma_addr_t dma; unsigned int in_use; @@ -164,6 +170,7 @@ struct dma_pool *dma_pool_create(const c retval->dev = dev; INIT_LIST_HEAD(&retval->page_list); + INIT_LIST_HEAD(&retval->avail_page_list); spin_lock_init(&retval->lock); retval->size = size; retval->boundary = boundary; @@ -256,6 +263,7 @@ static void pool_free_page(struct dma_po #endif dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma); list_del(&page->page_list); + list_del(&page->avail_page_link); kfree(page); } @@ -298,6 +306,7 @@ void dma_pool_destroy(struct dma_pool *p pool->name, page->vaddr); /* leak the still-in-use consistent memory */ list_del(&page->page_list); + list_del(&page->avail_page_link); kfree(page); } else pool_free_page(pool, page); @@ -328,9 +337,11 @@ void *dma_pool_alloc(struct dma_pool *po might_sleep_if(gfpflags_allow_blocking(mem_flags)); spin_lock_irqsave(&pool->lock, flags); - list_for_each_entry(page, &pool->page_list, page_list) { - if (page->offset < pool->allocation) - goto ready; + if (!list_empty(&pool->avail_page_list)) { + page = list_first_entry(&pool->avail_page_list, + struct dma_page, + avail_page_link); + goto ready; } /* pool_alloc_page() might sleep, so temporarily drop &pool->lock */ @@ -343,10 +354,13 @@ void *dma_pool_alloc(struct dma_pool *po spin_lock_irqsave(&pool->lock, flags); list_add(&page->page_list, &pool->page_list); + list_add(&page->avail_page_link, &pool->avail_page_list); ready: page->in_use++; offset = page->offset; page->offset = *(int *)(page->vaddr + offset); + if (page->offset >= pool->allocation) + list_del_init(&page->avail_page_link); retval = offset + page->vaddr; *handle = offset + page->dma; #ifdef DMAPOOL_DEBUG @@ -461,6 +475,10 @@ void dma_pool_free(struct dma_pool *pool memset(vaddr, POOL_POISON_FREED, pool->size); #endif + /* This test checks if the page is already in avail_page_list. 
*/ + if (list_empty(&page->avail_page_link)) + list_add(&page->avail_page_link, &pool->avail_page_list); + page->in_use--; *(int *)vaddr = page->offset; page->offset = offset;

From patchwork Thu Jul 26 18:54:56 2018
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH 2/3] dmapool: improve scalability of dma_pool_free
To: Christoph Hellwig, Marek Szyprowski, Matthew Wilcox, Sathya Prakash,
    Chaitra P B, Suganath Prabu Subramani, iommu@lists.linux-foundation.org,
    linux-mm@kvack.org, linux-scsi@vger.kernel.org,
    MPT-FusionLinux.pdl@broadcom.com
Message-ID: <1288e597-a67a-25b3-b7c6-db883ca67a25@cybernetics.com>
Date: Thu, 26 Jul 2018 14:54:56 -0400

dma_pool_free() scales poorly when the pool contains many pages because
pool_find_page() does a linear scan of all allocated pages.  Improve its
scalability by replacing the linear scan with a red-black tree lookup.
In big O notation, this improves the algorithm from O(n^2) to
O(n * log n).
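Before diving into the diff, the standalone sketch below restates the lookup
idea (illustrative only; demo_range and demo_lookup are invented names, and
the real implementation is the pool_find_page()/pool_insert_page() pair added
below): pages are keyed by their base DMA address, every range is a fixed
pool->allocation bytes long, and ranges never overlap, so each step of the
descent can decide left, right, or found in O(log n) overall.

/* Illustrative only -- invented names, not part of the patch. */
#include <linux/rbtree.h>
#include <linux/types.h>

struct demo_range {
	struct rb_node node;
	dma_addr_t base;		/* key: start of the range */
};

static struct demo_range *demo_lookup(struct rb_root *root, dma_addr_t addr,
				      size_t range_size)
{
	struct rb_node *n = root->rb_node;

	while (n) {
		struct demo_range *r = rb_entry(n, struct demo_range, node);

		if (addr < r->base)
			n = n->rb_left;
		else if (addr - r->base >= range_size)
			n = n->rb_right;
		else
			return r;	/* r->base <= addr < r->base + range_size */
	}
	return NULL;
}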
Signed-off-by: Tony Battersby --- I moved some code from dma_pool_destroy() into pool_free_page() to avoid code repetition. --- a/mm/dmapool.c +++ b/mm/dmapool.c @@ -15,11 +15,12 @@ * Many older drivers still have their own code to do this. * * The current design of this allocator is fairly simple. The pool is - * represented by the 'struct dma_pool' which keeps a doubly-linked list of - * allocated pages. Each page in the page_list is split into blocks of at - * least 'size' bytes. Free blocks are tracked in an unsorted singly-linked - * list of free blocks within the page. Used blocks aren't tracked, but we - * keep a count of how many are currently allocated from each page. + * represented by the 'struct dma_pool' which keeps a red-black tree of all + * allocated pages, keyed by DMA address for fast lookup when freeing. + * Each page in the page_tree is split into blocks of at least 'size' bytes. + * Free blocks are tracked in an unsorted singly-linked list of free blocks + * within the page. Used blocks aren't tracked, but we keep a count of how + * many are currently allocated from each page. * * The avail_page_list keeps track of pages that have one or more free blocks * available to (re)allocate. Pages are moved in and out of avail_page_list @@ -41,13 +42,14 @@ #include #include #include +#include #if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB_DEBUG_ON) #define DMAPOOL_DEBUG 1 #endif struct dma_pool { /* the pool */ - struct list_head page_list; + struct rb_root page_tree; struct list_head avail_page_list; spinlock_t lock; size_t size; @@ -59,7 +61,7 @@ struct dma_pool { /* the pool */ }; struct dma_page { /* cacheable header for 'allocation' bytes */ - struct list_head page_list; + struct rb_node page_node; struct list_head avail_page_link; void *vaddr; dma_addr_t dma; @@ -78,6 +80,7 @@ show_pools(struct device *dev, struct de char *next; struct dma_page *page; struct dma_pool *pool; + struct rb_node *node; next = buf; size = PAGE_SIZE; @@ -92,7 +95,10 @@ show_pools(struct device *dev, struct de unsigned blocks = 0; spin_lock_irq(&pool->lock); - list_for_each_entry(page, &pool->page_list, page_list) { + for (node = rb_first(&pool->page_tree); + node; + node = rb_next(node)) { + page = rb_entry(node, struct dma_page, page_node); pages++; blocks += page->in_use; } @@ -169,7 +175,7 @@ struct dma_pool *dma_pool_create(const c retval->dev = dev; - INIT_LIST_HEAD(&retval->page_list); + retval->page_tree = RB_ROOT; INIT_LIST_HEAD(&retval->avail_page_list); spin_lock_init(&retval->lock); retval->size = size; @@ -210,6 +216,65 @@ struct dma_pool *dma_pool_create(const c } EXPORT_SYMBOL(dma_pool_create); +/* + * Find the dma_page that manages the given DMA address. + */ +static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma) +{ + struct rb_node *node = pool->page_tree.rb_node; + + while (node) { + struct dma_page *page = + container_of(node, struct dma_page, page_node); + + if (dma < page->dma) + node = node->rb_left; + else if ((dma - page->dma) >= pool->allocation) + node = node->rb_right; + else + return page; + } + return NULL; +} + +/* + * Insert a dma_page into the page_tree. 
+ */ +static int pool_insert_page(struct dma_pool *pool, struct dma_page *new_page) +{ + dma_addr_t dma = new_page->dma; + struct rb_node **node = &(pool->page_tree.rb_node), *parent = NULL; + + while (*node) { + struct dma_page *this_page = + container_of(*node, struct dma_page, page_node); + + parent = *node; + if (dma < this_page->dma) + node = &((*node)->rb_left); + else if (likely((dma - this_page->dma) >= pool->allocation)) + node = &((*node)->rb_right); + else { + /* + * A page that overlaps the new DMA range is already + * present in the tree. This should not happen. + */ + WARN(1, + "%s: %s: DMA address overlap: old 0x%llx new 0x%llx len %zu\n", + pool->dev ? dev_name(pool->dev) : "(nodev)", + pool->name, (u64) this_page->dma, (u64) dma, + pool->allocation); + return -1; + } + } + + /* Add new node and rebalance tree. */ + rb_link_node(&new_page->page_node, parent, node); + rb_insert_color(&new_page->page_node, &pool->page_tree); + + return 0; +} + static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page) { unsigned int offset = 0; @@ -254,15 +319,36 @@ static inline bool is_page_busy(struct d return page->in_use != 0; } -static void pool_free_page(struct dma_pool *pool, struct dma_page *page) +static void pool_free_page(struct dma_pool *pool, + struct dma_page *page, + bool destroying_pool) { - dma_addr_t dma = page->dma; - + if (destroying_pool && is_page_busy(page)) { + if (pool->dev) + dev_err(pool->dev, + "dma_pool_destroy %s, %p busy\n", + pool->name, page->vaddr); + else + pr_err("dma_pool_destroy %s, %p busy\n", + pool->name, page->vaddr); + /* leak the still-in-use consistent memory */ + } else { #ifdef DMAPOOL_DEBUG - memset(page->vaddr, POOL_POISON_FREED, pool->allocation); + memset(page->vaddr, POOL_POISON_FREED, pool->allocation); #endif - dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma); - list_del(&page->page_list); + dma_free_coherent(pool->dev, + pool->allocation, + page->vaddr, + page->dma); + } + + /* + * If the pool is being destroyed, it is not safe to modify the + * page_tree while iterating over it, and it is also unnecessary since + * the whole tree will be discarded anyway. 
+ */ + if (!destroying_pool) + rb_erase(&page->page_node, &pool->page_tree); list_del(&page->avail_page_link); kfree(page); } @@ -277,6 +363,7 @@ static void pool_free_page(struct dma_po */ void dma_pool_destroy(struct dma_pool *pool) { + struct dma_page *page, *tmp_page; bool empty = false; if (unlikely(!pool)) @@ -292,24 +379,11 @@ void dma_pool_destroy(struct dma_pool *p device_remove_file(pool->dev, &dev_attr_pools); mutex_unlock(&pools_reg_lock); - while (!list_empty(&pool->page_list)) { - struct dma_page *page; - page = list_entry(pool->page_list.next, - struct dma_page, page_list); - if (is_page_busy(page)) { - if (pool->dev) - dev_err(pool->dev, - "dma_pool_destroy %s, %p busy\n", - pool->name, page->vaddr); - else - pr_err("dma_pool_destroy %s, %p busy\n", - pool->name, page->vaddr); - /* leak the still-in-use consistent memory */ - list_del(&page->page_list); - list_del(&page->avail_page_link); - kfree(page); - } else - pool_free_page(pool, page); + rbtree_postorder_for_each_entry_safe(page, + tmp_page, + &pool->page_tree, + page_node) { + pool_free_page(pool, page, true); } kfree(pool); @@ -353,7 +427,15 @@ void *dma_pool_alloc(struct dma_pool *po spin_lock_irqsave(&pool->lock, flags); - list_add(&page->page_list, &pool->page_list); + if (unlikely(pool_insert_page(pool, page))) { + /* + * This should not happen, so something must have gone horribly + * wrong. Instead of crashing, intentionally leak the memory + * and make for the exit. + */ + spin_unlock_irqrestore(&pool->lock, flags); + return NULL; + } list_add(&page->avail_page_link, &pool->avail_page_list); ready: page->in_use++; @@ -400,19 +482,6 @@ void *dma_pool_alloc(struct dma_pool *po } EXPORT_SYMBOL(dma_pool_alloc); -static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma) -{ - struct dma_page *page; - - list_for_each_entry(page, &pool->page_list, page_list) { - if (dma < page->dma) - continue; - if ((dma - page->dma) < pool->allocation) - return page; - } - return NULL; -} - /** * dma_pool_free - put block back into dma pool * @pool: the dma pool holding the block @@ -484,7 +553,7 @@ void dma_pool_free(struct dma_pool *pool page->offset = offset; /* * Resist a temptation to do - * if (!is_page_busy(page)) pool_free_page(pool, page); + * if (!is_page_busy(page)) pool_free_page(pool, page, false); * Better have a few empty pages hang around. 
*/ spin_unlock_irqrestore(&pool->lock, flags);

From patchwork Thu Jul 26 18:55:31 2018
From: Tony Battersby <tonyb@cybernetics.com>
Subject: [PATCH 3/3] [SCSI] mpt3sas: replace chain_dma_pool
To: Christoph Hellwig, Marek Szyprowski, Matthew Wilcox, Sathya Prakash,
    Chaitra P B, Suganath Prabu Subramani, iommu@lists.linux-foundation.org,
    linux-mm@kvack.org, linux-scsi@vger.kernel.org,
    MPT-FusionLinux.pdl@broadcom.com
Date: Thu, 26 Jul 2018 14:55:31 -0400

Replace chain_dma_pool with direct calls to dma_alloc_coherent() and
dma_free_coherent().  Since the chain lookup can involve hundreds of
thousands of allocations, it is worthwhile to avoid the overhead of the
dma_pool API.
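The replacement, shown in the diff below, hands out many aligned chain
buffers from each large dma_alloc_coherent() block instead of making one
dma_pool_alloc() call per buffer.  The following sketch shows that carving
pattern in isolation (illustrative only; demo_carve, demo_slice and the rest
are invented names, not the driver code):

/* Illustrative only -- invented names, not the driver code. */
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/kernel.h>

struct demo_slice {
	void *vaddr;
	dma_addr_t dma;
};

static int demo_carve(struct device *dev, size_t buf_sz, unsigned int align,
		      struct demo_slice *out, unsigned int count)
{
	size_t slice_sz = ALIGN(buf_sz, align);
	size_t alloc_sz = max_t(size_t, slice_sz, PAGE_SIZE);
	unsigned int per_alloc = alloc_sz / slice_sz;
	unsigned int avail = 0;
	dma_addr_t dma = 0;
	void *vaddr = NULL;
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (avail == 0) {
			/* Start a new coherent block; earlier blocks stay live. */
			vaddr = dma_alloc_coherent(dev, alloc_sz, &dma,
						   GFP_KERNEL);
			if (!vaddr)
				return -ENOMEM;	/* caller frees what was set up */
			avail = per_alloc;
		}
		out[i].vaddr = vaddr;	/* hand out the next aligned slice */
		out[i].dma = dma;
		vaddr += slice_sz;
		dma += slice_sz;
		avail--;
	}
	return 0;
}

With, say, a 128-byte chain_segment_sz and 4 KiB pages, each coherent block
yields 32 chain buffers, so the number of dma_alloc_coherent() calls drops by
roughly that factor; a matching freeing sketch appears after the end of the
patch.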
Signed-off-by: Tony Battersby --- The original code called _base_release_memory_pools() before "goto out" if dma_pool_alloc() failed, but this was unnecessary because mpt3sas_base_attach() will call _base_release_memory_pools() after "goto out_free_resources". It may have been that way because the out-of-tree vendor driver (from https://www.broadcom.com/support/download-search) has a slightly-more-complicated error handler there that adjusts max_request_credit, calls _base_release_memory_pools() and then does "goto retry_allocation" under some circumstances, but that is missing from the in-tree driver. diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 569392d..2cb567a 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -4224,6 +4224,134 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc, } /** + * _base_release_chain_lookup - release chain_lookup memory pools + * @ioc: per adapter object + * + * Free memory allocated from _base_allocate_chain_lookup. + */ +static void +_base_release_chain_lookup(struct MPT3SAS_ADAPTER *ioc) +{ + unsigned int chains_avail = 0; + struct chain_tracker *ct; + int i, j; + + if (!ioc->chain_lookup) + return; + + /* + * NOTE + * + * To make this code easier to understand and maintain, the for loops + * and the management of the chains_avail value are designed to be + * similar to the _base_allocate_chain_lookup() function. That way, + * the code for freeing the memory is similar to the code for + * allocating the memory. + */ + for (i = 0; i < ioc->scsiio_depth; i++) { + if (!ioc->chain_lookup[i].chains_per_smid) + break; + + for (j = ioc->chains_per_prp_buffer; + j < ioc->chains_needed_per_io; j++) { + /* + * If chains_avail is 0, then the chain represents a + * real allocation, so free it. + * + * If chains_avail is nonzero, then the chain was + * initialized at an offset from a previous allocation, + * so don't free it. + */ + if (chains_avail == 0) { + ct = &ioc->chain_lookup[i].chains_per_smid[j]; + if (ct->chain_buffer) + dma_free_coherent( + &ioc->pdev->dev, + ioc->chain_allocation_sz, + ct->chain_buffer, + ct->chain_buffer_dma); + chains_avail = ioc->chains_per_allocation; + } + chains_avail--; + } + kfree(ioc->chain_lookup[i].chains_per_smid); + } + + kfree(ioc->chain_lookup); + ioc->chain_lookup = NULL; +} + +/** + * _base_allocate_chain_lookup - allocate chain_lookup memory pools + * @ioc: per adapter object + * @total_sz: external value that tracks total amount of memory allocated + * + * Return: 0 success, anything else error + */ +static int +_base_allocate_chain_lookup(struct MPT3SAS_ADAPTER *ioc, u32 *total_sz) +{ + unsigned int aligned_chain_segment_sz; + const unsigned int align = 16; + unsigned int chains_avail = 0; + struct chain_tracker *ct; + dma_addr_t dma_addr = 0; + void *vaddr = NULL; + int i, j; + + /* Round up the allocation size for alignment. */ + aligned_chain_segment_sz = ioc->chain_segment_sz; + if (aligned_chain_segment_sz % align != 0) + aligned_chain_segment_sz = + ALIGN(aligned_chain_segment_sz, align); + + /* Allocate a page of chain buffers at a time. */ + ioc->chain_allocation_sz = + max_t(unsigned int, aligned_chain_segment_sz, PAGE_SIZE); + + /* Calculate how many chain buffers we can get from one allocation. 
*/ + ioc->chains_per_allocation = + ioc->chain_allocation_sz / aligned_chain_segment_sz; + + for (i = 0; i < ioc->scsiio_depth; i++) { + for (j = ioc->chains_per_prp_buffer; + j < ioc->chains_needed_per_io; j++) { + /* + * Check if there are any chain buffers left in the + * previously-allocated block. + */ + if (chains_avail == 0) { + /* Allocate a new block of chain buffers. */ + vaddr = dma_alloc_coherent( + &ioc->pdev->dev, + ioc->chain_allocation_sz, + &dma_addr, + GFP_KERNEL); + if (!vaddr) { + pr_err(MPT3SAS_FMT + "chain_lookup: dma_alloc_coherent failed\n", + ioc->name); + return -1; + } + chains_avail = ioc->chains_per_allocation; + } + + ct = &ioc->chain_lookup[i].chains_per_smid[j]; + ct->chain_buffer = vaddr; + ct->chain_buffer_dma = dma_addr; + + /* Go to the next chain buffer in the block. */ + vaddr += aligned_chain_segment_sz; + dma_addr += aligned_chain_segment_sz; + *total_sz += ioc->chain_segment_sz; + chains_avail--; + } + } + + return 0; +} + +/** * _base_release_memory_pools - release memory * @ioc: per adapter object * @@ -4235,8 +4363,6 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc, _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc) { int i = 0; - int j = 0; - struct chain_tracker *ct; struct reply_post_struct *rps; dexitprintk(ioc, pr_info(MPT3SAS_FMT "%s\n", ioc->name, @@ -4326,22 +4452,7 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc, kfree(ioc->hpr_lookup); kfree(ioc->internal_lookup); - if (ioc->chain_lookup) { - for (i = 0; i < ioc->scsiio_depth; i++) { - for (j = ioc->chains_per_prp_buffer; - j < ioc->chains_needed_per_io; j++) { - ct = &ioc->chain_lookup[i].chains_per_smid[j]; - if (ct && ct->chain_buffer) - dma_pool_free(ioc->chain_dma_pool, - ct->chain_buffer, - ct->chain_buffer_dma); - } - kfree(ioc->chain_lookup[i].chains_per_smid); - } - dma_pool_destroy(ioc->chain_dma_pool); - kfree(ioc->chain_lookup); - ioc->chain_lookup = NULL; - } + _base_release_chain_lookup(ioc); } /** @@ -4784,29 +4895,8 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc, total_sz += sz * ioc->scsiio_depth; } - ioc->chain_dma_pool = dma_pool_create("chain pool", &ioc->pdev->dev, - ioc->chain_segment_sz, 16, 0); - if (!ioc->chain_dma_pool) { - pr_err(MPT3SAS_FMT "chain_dma_pool: dma_pool_create failed\n", - ioc->name); + if (_base_allocate_chain_lookup(ioc, &total_sz)) goto out; - } - for (i = 0; i < ioc->scsiio_depth; i++) { - for (j = ioc->chains_per_prp_buffer; - j < ioc->chains_needed_per_io; j++) { - ct = &ioc->chain_lookup[i].chains_per_smid[j]; - ct->chain_buffer = dma_pool_alloc( - ioc->chain_dma_pool, GFP_KERNEL, - &ct->chain_buffer_dma); - if (!ct->chain_buffer) { - pr_err(MPT3SAS_FMT "chain_lookup: " - " pci_pool_alloc failed\n", ioc->name); - _base_release_memory_pools(ioc); - goto out; - } - } - total_sz += ioc->chain_segment_sz; - } dinitprintk(ioc, pr_info(MPT3SAS_FMT "chain pool depth(%d), frame_size(%d), pool_size(%d kB)\n", diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index f02974c..7ee81d5 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -1298,7 +1298,6 @@ struct MPT3SAS_ADAPTER { /* chain */ struct chain_lookup *chain_lookup; struct list_head free_chain_list; - struct dma_pool *chain_dma_pool; ulong chain_pages; u16 max_sges_in_main_message; u16 max_sges_in_chain_message; @@ -1306,6 +1305,8 @@ struct MPT3SAS_ADAPTER { u32 chain_depth; u16 chain_segment_sz; u16 chains_per_prp_buffer; + u32 chain_allocation_sz; + u32 chains_per_allocation; /* 
hi-priority queue */ u16 hi_priority_smid;
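To round out the demo_carve() sketch shown with the patch description above
(again illustrative only, with invented names; the driver's real teardown is
_base_release_chain_lookup() in the diff), the freeing side releases only the
slice that begins each coherent block, assuming the slice array was
zero-initialized:

/* Illustrative only -- counterpart to demo_carve(), invented names. */
static void demo_release(struct device *dev, size_t buf_sz, unsigned int align,
			 struct demo_slice *slices, unsigned int count)
{
	size_t slice_sz = ALIGN(buf_sz, align);
	size_t alloc_sz = max_t(size_t, slice_sz, PAGE_SIZE);
	unsigned int per_alloc = alloc_sz / slice_sz;
	unsigned int i;

	for (i = 0; i < count; i++) {
		/* Only the first slice of each block owns the allocation. */
		if ((i % per_alloc) == 0 && slices[i].vaddr)
			dma_free_coherent(dev, alloc_sz, slices[i].vaddr,
					  slices[i].dma);
	}
}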