Message ID | 1288e597-a67a-25b3-b7c6-db883ca67a25@cybernetics.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | mpt3sas and dmapool scalability |
On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
> dma_pool_free() scales poorly when the pool contains many pages because
> pool_find_page() does a linear scan of all allocated pages. Improve its
> scalability by replacing the linear scan with a red-black tree lookup.
> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).

This is a lot of code to get us to O(n * log(n)) when we can use less
code to go to O(n).  dma_pool_free() is passed the virtual address.
We can go from virtual address to struct page with virt_to_page().
In struct page, we have 5 words available (20/40 bytes), and it's trivial
to use one of them to point to the struct dma_page.
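[Editor's note: a minimal sketch of the O(1) lookup being suggested here. The dma_pool/dma_page back-pointer fields on struct page are hypothetical (Tony describes adding something like them later in the thread); they are not existing struct page members.]

	/*
	 * Hypothetical sketch only: dma_pool_alloc() would stash these two
	 * back-pointers in spare struct page words when it maps a page, so
	 * dma_pool_free() can find its bookkeeping without any list or
	 * tree walk.
	 */
	static struct dma_page *pool_find_page_by_vaddr(struct dma_pool *pool,
							void *vaddr)
	{
		struct page *pg = virt_to_page(vaddr);

		if (unlikely(pg->dma_pool != pool))	/* hypothetical field */
			return NULL;			/* not from this pool */
		return pg->dma_page;			/* hypothetical field */
	}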
On Thu, Jul 26, 2018 at 9:54 PM, Tony Battersby <tonyb@cybernetics.com> wrote:
> dma_pool_free() scales poorly when the pool contains many pages because
> pool_find_page() does a linear scan of all allocated pages. Improve its
> scalability by replacing the linear scan with a red-black tree lookup.
> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).

A few style-related comments.

> I moved some code from dma_pool_destroy() into pool_free_page() to avoid code
> repetition.

I would rather split that part out as a separate preparatory change which
doesn't change the behaviour.

> #include <linux/string.h>
> #include <linux/types.h>
> #include <linux/wait.h>
> +#include <linux/rbtree.h>

It looks misordered.

> +		struct dma_page *this_page =
> +			container_of(*node, struct dma_page, page_node);

#define to_dma_page() container_of() ?

> +			WARN(1,
> +			     "%s: %s: DMA address overlap: old 0x%llx new 0x%llx len %zu\n",
> +			     pool->dev ? dev_name(pool->dev) : "(nodev)",
> +			     pool->name, (u64) this_page->dma, (u64) dma,

Use the proper %p extensions for the DMA addresses:
https://elixir.bootlin.com/linux/latest/source/Documentation/core-api/printk-formats.rst#L150
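[Editor's note: as a reading aid for that last comment, a hedged sketch of the same WARN() using the %pad extension from the printk-formats document, which takes a pointer to a dma_addr_t and avoids the (u64) casts and %llx:]

	WARN(1, "%s: %s: DMA address overlap: old %pad new %pad len %zu\n",
	     pool->dev ? dev_name(pool->dev) : "(nodev)",
	     pool->name, &this_page->dma, &dma, pool->allocation);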
On 07/26/2018 03:42 PM, Matthew Wilcox wrote:
> On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
>> dma_pool_free() scales poorly when the pool contains many pages because
>> pool_find_page() does a linear scan of all allocated pages. Improve its
>> scalability by replacing the linear scan with a red-black tree lookup.
>> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
> This is a lot of code to get us to O(n * log(n)) when we can use less
> code to go to O(n).  dma_pool_free() is passed the virtual address.
> We can go from virtual address to struct page with virt_to_page().
> In struct page, we have 5 words available (20/40 bytes), and it's trivial
> to use one of them to point to the struct dma_page.
>
Thanks for the tip.  I will give that a try.
On Thu, Jul 26, 2018 at 04:06:05PM -0400, Tony Battersby wrote:
> On 07/26/2018 03:42 PM, Matthew Wilcox wrote:
> > On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
> >> dma_pool_free() scales poorly when the pool contains many pages because
> >> pool_find_page() does a linear scan of all allocated pages. Improve its
> >> scalability by replacing the linear scan with a red-black tree lookup.
> >> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
> > This is a lot of code to get us to O(n * log(n)) when we can use less
> > code to go to O(n).  dma_pool_free() is passed the virtual address.
> > We can go from virtual address to struct page with virt_to_page().
> > In struct page, we have 5 words available (20/40 bytes), and it's trivial
> > to use one of them to point to the struct dma_page.
> >
> Thanks for the tip.  I will give that a try.

If you're up for more major surgery, then I think we can put all the
information currently stored in dma_page into struct page.  Something
like this:

+++ b/include/linux/mm_types.h
@@ -152,6 +152,12 @@ struct page {
                        unsigned long hmm_data;
                        unsigned long _zd_pad_1;        /* uses mapping */
                };
+               struct {        /* dma_pool pages */
+                       struct list_head dma_list;
+                       unsigned short in_use;
+                       unsigned short offset;
+                       dma_addr_t dma;
+               };

                /** @rcu_head: You can use this to free a page by RCU. */
                struct rcu_head rcu_head;

page_list -> dma_list
vaddr goes away (page_to_virt() exists)
dma -> dma
in_use and offset shrink from 4 bytes to 2.

Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
> On Thu, Jul 26, 2018 at 04:06:05PM -0400, Tony Battersby wrote:
>> On 07/26/2018 03:42 PM, Matthew Wilcox wrote:
>>> On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
>>>> dma_pool_free() scales poorly when the pool contains many pages because
>>>> pool_find_page() does a linear scan of all allocated pages. Improve its
>>>> scalability by replacing the linear scan with a red-black tree lookup.
>>>> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
>>> This is a lot of code to get us to O(n * log(n)) when we can use less
>>> code to go to O(n).  dma_pool_free() is passed the virtual address.
>>> We can go from virtual address to struct page with virt_to_page().
>>> In struct page, we have 5 words available (20/40 bytes), and it's trivial
>>> to use one of them to point to the struct dma_page.
>>>
>> Thanks for the tip.  I will give that a try.
> If you're up for more major surgery, then I think we can put all the
> information currently stored in dma_page into struct page.  Something
> like this:
>
> +++ b/include/linux/mm_types.h
> @@ -152,6 +152,12 @@ struct page {
>                        unsigned long hmm_data;
>                        unsigned long _zd_pad_1;        /* uses mapping */
>                };
> +               struct {        /* dma_pool pages */
> +                       struct list_head dma_list;
> +                       unsigned short in_use;
> +                       unsigned short offset;
> +                       dma_addr_t dma;
> +               };
>
>                /** @rcu_head: You can use this to free a page by RCU. */
>                struct rcu_head rcu_head;
>
> page_list -> dma_list
> vaddr goes away (page_to_virt() exists)
> dma -> dma
> in_use and offset shrink from 4 bytes to 2.
>
> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
> this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>

offset at least needs more bits, since allocations can be multi-page.

See the following from mpt3sas:

cat /sys/devices/pci0000:80/0000:80:07.0/0000:85:00.0/pools
(manually cleaned up column alignment)

poolinfo - 0.1
reply_post_free_array pool       1      21     192      1
reply_free pool                  1       1   41728      1
reply pool                       1       1 1335296      1
sense pool                       1       1  970272      1
chain pool                  373959  386048     128  12064
reply_post_free pool            12      12  166528     12
                                            ^size^

In my initial implementation I also added a pointer from struct page to
the dma_pool so that dma_pool_free() could sanity-check that the page
really belongs to the pool, as in:

	pg = virt_to_page(vaddr);
	if (unlikely(pg->dma_pool != pool)) {
		handle error...
	}
	page = pg->dma_page;

The linked-list search previously implemented that check as a
by-product, and I didn't want to lose it.

It all seems to be working so far.
On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote:
> On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
> > If you're up for more major surgery, then I think we can put all the
> > information currently stored in dma_page into struct page.  Something
> > like this:
> >
> > +++ b/include/linux/mm_types.h
> > @@ -152,6 +152,12 @@ struct page {
> >                        unsigned long hmm_data;
> >                        unsigned long _zd_pad_1;        /* uses mapping */
> >                };
> > +               struct {        /* dma_pool pages */
> > +                       struct list_head dma_list;
> > +                       unsigned short in_use;
> > +                       unsigned short offset;
> > +                       dma_addr_t dma;
> > +               };
> >
> >                /** @rcu_head: You can use this to free a page by RCU. */
> >                struct rcu_head rcu_head;
> >
> > page_list -> dma_list
> > vaddr goes away (page_to_virt() exists)
> > dma -> dma
> > in_use and offset shrink from 4 bytes to 2.
> >
> > Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
> > this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
> > 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>
> offset at least needs more bits, since allocations can be multi-page.

Ah, rats.  That means we have to use the mapcount union too:

+++ b/include/linux/mm_types.h
@@ -152,6 +152,11 @@ struct page {
                        unsigned long hmm_data;
                        unsigned long _zd_pad_1;        /* uses mapping */
                };
+               struct {        /* dma_pool pages */
+                       struct list_head dma_list;
+                       unsigned int dma_in_use;
+                       dma_addr_t dma;
+               };

                /** @rcu_head: You can use this to free a page by RCU. */
                struct rcu_head rcu_head;
@@ -174,6 +179,7 @@ struct page {

                unsigned int active;            /* SLAB */
                int units;                      /* SLOB */
+               unsigned int dma_offset;        /* dma_pool */
        };

        /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */

> See the following from mpt3sas:
>
> cat /sys/devices/pci0000:80/0000:80:07.0/0000:85:00.0/pools
> (manually cleaned up column alignment)
>
> poolinfo - 0.1
> reply_post_free_array pool       1      21     192      1
> reply_free pool                  1       1   41728      1
> reply pool                       1       1 1335296      1
> sense pool                       1       1  970272      1
> chain pool                  373959  386048     128  12064
> reply_post_free pool            12      12  166528     12
>                                             ^size^

Wow, that's a pretty weird way to use the dmapool.  It'd be more
efficient to just call dma_alloc_coherent() directly.
On 07/27/2018 11:23 AM, Matthew Wilcox wrote:
> On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote:
>> On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
>>> If you're up for more major surgery, then I think we can put all the
>>> information currently stored in dma_page into struct page.  Something
>>> like this:
>>>
>>> +++ b/include/linux/mm_types.h
>>> @@ -152,6 +152,12 @@ struct page {
>>>                        unsigned long hmm_data;
>>>                        unsigned long _zd_pad_1;        /* uses mapping */
>>>                };
>>> +               struct {        /* dma_pool pages */
>>> +                       struct list_head dma_list;
>>> +                       unsigned short in_use;
>>> +                       unsigned short offset;
>>> +                       dma_addr_t dma;
>>> +               };
>>>
>>>                /** @rcu_head: You can use this to free a page by RCU. */
>>>                struct rcu_head rcu_head;
>>>
>>> page_list -> dma_list
>>> vaddr goes away (page_to_virt() exists)
>>> dma -> dma
>>> in_use and offset shrink from 4 bytes to 2.
>>>
>>> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
>>> this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
>>> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>>>
>> offset at least needs more bits, since allocations can be multi-page.
> Ah, rats.  That means we have to use the mapcount union too:
>

Actually, on second thought, if I understand it correctly, a multi-page
allocation will never be split up and returned as multiple
sub-allocations, so the offset shouldn't be needed for that case.  The
offset should only be needed when splitting a PAGE_SIZE allocation into
smaller sub-allocations.  The current code uses the offset
unconditionally, though, so it would need major changes to remove the
dependence.  So a 16-bit offset might work.

As for sanity checking, I suppose having the dma address in the page
could provide something for dma_pool_free() to check against (in fact it
is already there under DMAPOOL_DEBUG).

But the bigger problem is that my first patch adds another list_head to
the dma_page for the avail_page_link to make allocations faster.  I
suppose we could make the lists singly-linked instead of doubly-linked
to save space.

Wouldn't using the mapcount union make it problematic for userspace to
mmap() the returned DMA buffers?  I am not sure if any drivers allow
that to be done or not.  I have heard of drivers in userspace, drivers
with DMA ring buffers, etc.  I don't want to audit the whole kernel tree
to know whether it would be safe.  As you have seen, at least mpt3sas is
doing unexpected things with dma pools.  So maybe it could be done, but
you are right, it would involve major surgery.

My current in-development patch to implement your initial suggestion is
pretty small, and it works.  So I'm not sure if I want to take it
further or not.  Lots of other things to do...
On 07/27/2018 03:38 PM, Tony Battersby wrote:
> But the bigger problem is that my first patch adds another list_head to
> the dma_page for the avail_page_link to make allocations faster.  I
> suppose we could make the lists singly-linked instead of doubly-linked
> to save space.
>

I managed to redo my dma_pool_alloc() patch to make avail_page_list
singly-linked instead of doubly-linked.  But the problem with making
either list singly-linked is that it would no longer be possible to call
pool_free_page() at any time other than dma_pool_destroy() without
scanning the lists to remove the page from them, which would make
pruning arbitrary free pages slower (adding back an O(n^2)).  But the
current code doesn't do that anyway, and in fact it has a comment in
dma_pool_free() to "resist the temptation" to prune free pages.  And yet
it seems like it might be reasonable for someone to add such code in the
future if there are a whole lot of free pages, so I am hesitant to make
it more difficult.  (See the sketch below for the trade-off.)

So my question is: when I post v2 of the patchset, should I send the
doubly-linked version or the singly-linked version, in anticipation that
someone else might want to take it further and move everything into
struct page as you suggest?
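[Editor's note: for illustration only, a minimal sketch of a singly-linked avail list of the kind being discussed; the next_avail/avail_pages fields and the helper names are hypothetical, not taken from the posted patches.]

	/*
	 * Hypothetical single-link replacement for avail_page_link: pushing
	 * and popping at the head is O(1), but removing an arbitrary page
	 * (e.g. to prune an empty page early) would require walking the
	 * list, which is exactly the trade-off discussed above.
	 */
	static void pool_push_avail(struct dma_pool *pool, struct dma_page *page)
	{
		page->next_avail = pool->avail_pages;	/* hypothetical fields */
		pool->avail_pages = page;
	}

	static struct dma_page *pool_pop_avail(struct dma_pool *pool)
	{
		struct dma_page *page = pool->avail_pages;

		if (page)
			pool->avail_pages = page->next_avail;
		return page;
	}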
On Sat, Jul 28, 2018 at 12:27 AM, Tony Battersby <tonyb@cybernetics.com> wrote:
> On 07/27/2018 03:38 PM, Tony Battersby wrote:
>> But the bigger problem is that my first patch adds another list_head to
>> the dma_page for the avail_page_link to make allocations faster.  I
>> suppose we could make the lists singly-linked instead of doubly-linked
>> to save space.
>>
>
> I managed to redo my dma_pool_alloc() patch to make avail_page_list
> singly-linked instead of doubly-linked.

Are you relying on the llist.h implementation?

Btw, did you see quicklist.h?
On 07/27/2018 05:35 PM, Andy Shevchenko wrote:
> On Sat, Jul 28, 2018 at 12:27 AM, Tony Battersby <tonyb@cybernetics.com> wrote:
>> On 07/27/2018 03:38 PM, Tony Battersby wrote:
>>> But the bigger problem is that my first patch adds another list_head to
>>> the dma_page for the avail_page_link to make allocations faster.  I
>>> suppose we could make the lists singly-linked instead of doubly-linked
>>> to save space.
>>>
>> I managed to redo my dma_pool_alloc() patch to make avail_page_list
>> singly-linked instead of doubly-linked.
> Are you relying on the llist.h implementation?
>
> Btw, did you see quicklist.h?
>

I looked over include/linux/*list* to see if there was a suitable
implementation I could use.  llist.h makes a big deal about having a
lock-less implementation with atomic instructions, which seemed like
overkill.  I didn't see anything else suitable, so I just went with my
own implementation.  Singly-linked lists are simple enough, and a quick
"grep -i singly include/linux/*" shows precedent in bi_next, fl_next,
fa_next, etc.

Thanks for pointing out quicklist.h.  At first I thought you were
confused, since you were talking about linked-list implementations and
quicklist.h sounds like a linked-list implementation but isn't.  But now
I see that it does simple memory allocation/free, so that is its
relevance to dmapool.  Incidentally, it looks like it also uses a
singly-linked list to store its free pages, but it is much simpler
because it doesn't try to sub-divide the pages into smaller allocations.
On 07/27/2018 05:27 PM, Tony Battersby wrote:
> On 07/27/2018 03:38 PM, Tony Battersby wrote:
>> But the bigger problem is that my first patch adds another list_head to
>> the dma_page for the avail_page_link to make allocations faster.  I
>> suppose we could make the lists singly-linked instead of doubly-linked
>> to save space.
>>
> I managed to redo my dma_pool_alloc() patch to make avail_page_list
> singly-linked instead of doubly-linked.  But the problem with making
> either list singly-linked is that it would no longer be possible to call
> pool_free_page() at any time other than dma_pool_destroy() without
> scanning the lists to remove the page from them, which would make
> pruning arbitrary free pages slower (adding back an O(n^2)).  But the
> current code doesn't do that anyway, and in fact it has a comment in
> dma_pool_free() to "resist the temptation" to prune free pages.  And yet
> it seems like it might be reasonable for someone to add such code in the
> future if there are a whole lot of free pages, so I am hesitant to make
> it more difficult.
>
> So my question is: when I post v2 of the patchset, should I send the
> doubly-linked version or the singly-linked version, in anticipation that
> someone else might want to take it further and move everything into
> struct page as you suggest?
>

Over the weekend I came up with a better solution.  Instead of having
the page listed in two singly-linked lists at the same time, move the
page between two doubly-linked lists: one list dedicated to pages that
have all of their blocks allocated, and one list for pages that still
have some blocks free.  Since a page is only on one list at a time, it
needs only one set of list pointers.

I also implemented the code to make the offset 16-bit, while ignoring
the offset in the cases where it is not needed (where it would overflow
anyway).

So now I have an implementation that eliminates struct dma_page.  I will
post it once it is ready for review.
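[Editor's note: a minimal sketch of the two-list scheme described above; the list names, helper names, and the exact fullness test are illustrative assumptions, and the real v2 patches may differ.]

	/*
	 * Sketch of the two-list idea: a page lives on exactly one list at a
	 * time ("some blocks free" vs. "all blocks in use"), so a single
	 * list_head is enough, and moving between lists is O(1) with
	 * list_move().
	 */
	static void pool_page_maybe_full(struct dma_pool *pool,
					 struct dma_page *page)
	{
		/* Called after a block was handed out and page->offset advanced. */
		if (page->offset >= pool->allocation)	/* no free blocks left */
			list_move(&page->page_list, &pool->full_page_list);
	}

	static void pool_page_has_space(struct dma_pool *pool,
					struct dma_page *page)
	{
		/* Called before a freed block is pushed back onto the page. */
		if (page->offset >= pool->allocation)	/* page was full until now */
			list_move(&page->page_list, &pool->avail_page_list);
	}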
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -15,11 +15,12 @@
  * Many older drivers still have their own code to do this.
  *
  * The current design of this allocator is fairly simple.  The pool is
- * represented by the 'struct dma_pool' which keeps a doubly-linked list of
- * allocated pages.  Each page in the page_list is split into blocks of at
- * least 'size' bytes.  Free blocks are tracked in an unsorted singly-linked
- * list of free blocks within the page.  Used blocks aren't tracked, but we
- * keep a count of how many are currently allocated from each page.
+ * represented by the 'struct dma_pool' which keeps a red-black tree of all
+ * allocated pages, keyed by DMA address for fast lookup when freeing.
+ * Each page in the page_tree is split into blocks of at least 'size' bytes.
+ * Free blocks are tracked in an unsorted singly-linked list of free blocks
+ * within the page.  Used blocks aren't tracked, but we keep a count of how
+ * many are currently allocated from each page.
  *
  * The avail_page_list keeps track of pages that have one or more free blocks
  * available to (re)allocate.  Pages are moved in and out of avail_page_list
@@ -41,13 +42,14 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/rbtree.h>
 
 #if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB_DEBUG_ON)
 #define DMAPOOL_DEBUG 1
 #endif
 
 struct dma_pool {		/* the pool */
-	struct list_head page_list;
+	struct rb_root page_tree;
 	struct list_head avail_page_list;
 	spinlock_t lock;
 	size_t size;
@@ -59,7 +61,7 @@ struct dma_pool {		/* the pool */
 };
 
 struct dma_page {		/* cacheable header for 'allocation' bytes */
-	struct list_head page_list;
+	struct rb_node page_node;
 	struct list_head avail_page_link;
 	void *vaddr;
 	dma_addr_t dma;
@@ -78,6 +80,7 @@ show_pools(struct device *dev, struct de
 	char *next;
 	struct dma_page *page;
 	struct dma_pool *pool;
+	struct rb_node *node;
 
 	next = buf;
 	size = PAGE_SIZE;
@@ -92,7 +95,10 @@ show_pools(struct device *dev, struct de
 		unsigned blocks = 0;
 
 		spin_lock_irq(&pool->lock);
-		list_for_each_entry(page, &pool->page_list, page_list) {
+		for (node = rb_first(&pool->page_tree);
+		     node;
+		     node = rb_next(node)) {
+			page = rb_entry(node, struct dma_page, page_node);
 			pages++;
 			blocks += page->in_use;
 		}
@@ -169,7 +175,7 @@ struct dma_pool *dma_pool_create(const c
 
 	retval->dev = dev;
 
-	INIT_LIST_HEAD(&retval->page_list);
+	retval->page_tree = RB_ROOT;
 	INIT_LIST_HEAD(&retval->avail_page_list);
 	spin_lock_init(&retval->lock);
 	retval->size = size;
@@ -210,6 +216,65 @@ struct dma_pool *dma_pool_create(const c
 }
 EXPORT_SYMBOL(dma_pool_create);
 
+/*
+ * Find the dma_page that manages the given DMA address.
+ */
+static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma)
+{
+	struct rb_node *node = pool->page_tree.rb_node;
+
+	while (node) {
+		struct dma_page *page =
+			container_of(node, struct dma_page, page_node);
+
+		if (dma < page->dma)
+			node = node->rb_left;
+		else if ((dma - page->dma) >= pool->allocation)
+			node = node->rb_right;
+		else
+			return page;
+	}
+	return NULL;
+}
+
+/*
+ * Insert a dma_page into the page_tree.
+ */
+static int pool_insert_page(struct dma_pool *pool, struct dma_page *new_page)
+{
+	dma_addr_t dma = new_page->dma;
+	struct rb_node **node = &(pool->page_tree.rb_node), *parent = NULL;
+
+	while (*node) {
+		struct dma_page *this_page =
+			container_of(*node, struct dma_page, page_node);
+
+		parent = *node;
+		if (dma < this_page->dma)
+			node = &((*node)->rb_left);
+		else if (likely((dma - this_page->dma) >= pool->allocation))
+			node = &((*node)->rb_right);
+		else {
+			/*
+			 * A page that overlaps the new DMA range is already
+			 * present in the tree.  This should not happen.
+			 */
+			WARN(1,
+			     "%s: %s: DMA address overlap: old 0x%llx new 0x%llx len %zu\n",
+			     pool->dev ? dev_name(pool->dev) : "(nodev)",
+			     pool->name, (u64) this_page->dma, (u64) dma,
+			     pool->allocation);
+			return -1;
+		}
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&new_page->page_node, parent, node);
+	rb_insert_color(&new_page->page_node, &pool->page_tree);
+
+	return 0;
+}
+
 static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
 	unsigned int offset = 0;
@@ -254,15 +319,36 @@ static inline bool is_page_busy(struct d
 	return page->in_use != 0;
 }
 
-static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
+static void pool_free_page(struct dma_pool *pool,
+			   struct dma_page *page,
+			   bool destroying_pool)
 {
-	dma_addr_t dma = page->dma;
-
+	if (destroying_pool && is_page_busy(page)) {
+		if (pool->dev)
+			dev_err(pool->dev,
+				"dma_pool_destroy %s, %p busy\n",
+				pool->name, page->vaddr);
+		else
+			pr_err("dma_pool_destroy %s, %p busy\n",
+			       pool->name, page->vaddr);
+		/* leak the still-in-use consistent memory */
+	} else {
 #ifdef DMAPOOL_DEBUG
-	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
+		memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-	dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma);
-	list_del(&page->page_list);
+		dma_free_coherent(pool->dev,
+				  pool->allocation,
+				  page->vaddr,
+				  page->dma);
+	}
+
+	/*
+	 * If the pool is being destroyed, it is not safe to modify the
+	 * page_tree while iterating over it, and it is also unnecessary since
+	 * the whole tree will be discarded anyway.
+	 */
+	if (!destroying_pool)
+		rb_erase(&page->page_node, &pool->page_tree);
 	list_del(&page->avail_page_link);
 	kfree(page);
 }
@@ -277,6 +363,7 @@ static void pool_free_page(struct dma_po
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
+	struct dma_page *page, *tmp_page;
 	bool empty = false;
 
 	if (unlikely(!pool))
@@ -292,24 +379,11 @@ void dma_pool_destroy(struct dma_pool *p
 		device_remove_file(pool->dev, &dev_attr_pools);
 	mutex_unlock(&pools_reg_lock);
 
-	while (!list_empty(&pool->page_list)) {
-		struct dma_page *page;
-		page = list_entry(pool->page_list.next,
-				  struct dma_page, page_list);
-		if (is_page_busy(page)) {
-			if (pool->dev)
-				dev_err(pool->dev,
-					"dma_pool_destroy %s, %p busy\n",
-					pool->name, page->vaddr);
-			else
-				pr_err("dma_pool_destroy %s, %p busy\n",
-				       pool->name, page->vaddr);
-			/* leak the still-in-use consistent memory */
-			list_del(&page->page_list);
-			list_del(&page->avail_page_link);
-			kfree(page);
-		} else
-			pool_free_page(pool, page);
+	rbtree_postorder_for_each_entry_safe(page,
+					     tmp_page,
+					     &pool->page_tree,
+					     page_node) {
+		pool_free_page(pool, page, true);
 	}
 
 	kfree(pool);
@@ -353,7 +427,15 @@ void *dma_pool_alloc(struct dma_pool *po
 
 	spin_lock_irqsave(&pool->lock, flags);
 
-	list_add(&page->page_list, &pool->page_list);
+	if (unlikely(pool_insert_page(pool, page))) {
+		/*
+		 * This should not happen, so something must have gone horribly
+		 * wrong.  Instead of crashing, intentionally leak the memory
+		 * and make for the exit.
+		 */
+		spin_unlock_irqrestore(&pool->lock, flags);
+		return NULL;
+	}
 	list_add(&page->avail_page_link, &pool->avail_page_list);
  ready:
 	page->in_use++;
@@ -400,19 +482,6 @@ void *dma_pool_alloc(struct dma_pool *po
 }
 EXPORT_SYMBOL(dma_pool_alloc);
 
-static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma)
-{
-	struct dma_page *page;
-
-	list_for_each_entry(page, &pool->page_list, page_list) {
-		if (dma < page->dma)
-			continue;
-		if ((dma - page->dma) < pool->allocation)
-			return page;
-	}
-	return NULL;
-}
-
 /**
  * dma_pool_free - put block back into dma pool
  * @pool: the dma pool holding the block
@@ -484,7 +553,7 @@ void dma_pool_free(struct dma_pool *pool
 	page->offset = offset;
 	/*
 	 * Resist a temptation to do
-	 *	if (!is_page_busy(page)) pool_free_page(pool, page);
+	 *	if (!is_page_busy(page)) pool_free_page(pool, page, false);
 	 * Better have a few empty pages hang around.
 	 */
 	spin_unlock_irqrestore(&pool->lock, flags);
dma_pool_free() scales poorly when the pool contains many pages because
pool_find_page() does a linear scan of all allocated pages.  Improve its
scalability by replacing the linear scan with a red-black tree lookup.
In big O notation, this improves the algorithm from O(n^2) to O(n * log n).

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
I moved some code from dma_pool_destroy() into pool_free_page() to avoid
code repetition.
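[Editor's note: for context, a hedged sketch of the dma_pool API usage pattern whose free path this patch speeds up; the function name, device pointer, and sizes are placeholders, not taken from any driver in the series.]

	/*
	 * Sketch of typical dmapool usage.  With many outstanding blocks the
	 * pool holds many pages, and each dma_pool_free() previously walked
	 * the whole page list to find the page owning 'dma'; the red-black
	 * tree makes that lookup O(log n).
	 */
	static int example_use_dma_pool(struct device *dev)
	{
		struct dma_pool *pool;
		dma_addr_t dma;
		void *vaddr;

		pool = dma_pool_create("example", dev, 128 /* block size */,
				       8 /* align */, 0 /* boundary */);
		if (!pool)
			return -ENOMEM;

		vaddr = dma_pool_alloc(pool, GFP_KERNEL, &dma);
		if (vaddr) {
			/* ... hand 'dma' to the device, use 'vaddr' from the CPU ... */
			dma_pool_free(pool, vaddr, dma);	/* lookup optimized here */
		}

		dma_pool_destroy(pool);
		return vaddr ? 0 : -ENOMEM;
	}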