| Message ID | 20241119205529.3871048-2-bjohannesmeyer@gmail.com |
|---|---|
| State | New |
| Series | dmapool: Mitigate device-controllable mem. corruption |
On Tue, Nov 19, 2024 at 09:55:28PM +0100, Brian Johannesmeyer wrote:
> +	page->blocks_per_page = pool->allocation / pool->size;
> +	page->blocks = kmalloc_array(page->blocks_per_page,
> +				     sizeof(struct dma_block), GFP_KERNEL);

Given that you now need an array of the blocks anyway, it might make
sense to switch from a linked list to a bitmap for tracking free state,
which would be a lot more efficient, as you only need a bit per block
as tracking overhead instead of two pointers and a dma_addr_t.

E.g. do a find_first_zero_bit() to find the free slot, then calculate
the dma_addr and virt address by simply offsetting into the dma_page
ones with bitnr * pool->size.
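A minimal sketch of what that bitmap scheme could look like (not part of the submitted patch; `page->bitmap` and the helper name are assumptions, the other fields are the existing `dma_pool`/`dma_page` ones):

```
/*
 * Hypothetical helper for the bitmap scheme described above: one bit per
 * block, no per-block struct. page->bitmap is an assumed unsigned long *
 * bitmap sized to pool->allocation / pool->size bits.
 */
static void *pool_page_alloc_block(struct dma_pool *pool,
				   struct dma_page *page, dma_addr_t *handle)
{
	unsigned int nbits = pool->allocation / pool->size;
	unsigned int bitnr = find_first_zero_bit(page->bitmap, nbits);

	if (bitnr >= nbits)
		return NULL;			/* page is full */

	set_bit(bitnr, page->bitmap);
	/* virt and dma addresses are plain offsets into the page */
	*handle = page->dma + (dma_addr_t)bitnr * pool->size;
	return page->vaddr + bitnr * pool->size;
}
```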
> Given that you now need an array of the blocks anyway, it might make
> sense to switch from a linked list to a bitmap for tracking free state,
> which would be a lot more efficient, as you only need a bit per block
> as tracking overhead instead of two pointers and a dma_addr_t.
>
> E.g. do a find_first_zero_bit() to find the free slot, then calculate
> the dma_addr and virt address by simply offsetting into the dma_page
> ones with bitnr * pool->size.

Thank you for the suggestion. I hacked together a bitmap-based approach
as you proposed, and while it does improve memory efficiency by reducing
the per-block metadata overhead, it unfortunately appears to
significantly hurt runtime performance.

Here are the performance results, with DMAPOOL_DEBUG disabled. The first
two sets of numbers are the same as in my latest response in the other
thread (i.e., [RFC v2 0/2]), and the last set is with the bitmap approach
applied:

**With no patches applied:**
```
dmapool test: size:16 align:16 blocks:8192 time:11860
dmapool test: size:64 align:64 blocks:8192 time:11951
dmapool test: size:256 align:256 blocks:8192 time:12287
dmapool test: size:1024 align:1024 blocks:2048 time:3134
dmapool test: size:4096 align:4096 blocks:1024 time:1686
dmapool test: size:68 align:32 blocks:8192 time:12050
```

**With the submitted patches applied:**
```
dmapool test: size:16 align:16 blocks:8192 time:34432
dmapool test: size:64 align:64 blocks:8192 time:62262
dmapool test: size:256 align:256 blocks:8192 time:238137
dmapool test: size:1024 align:1024 blocks:2048 time:61386
dmapool test: size:4096 align:4096 blocks:1024 time:75342
dmapool test: size:68 align:32 blocks:8192 time:88243
```

**With the submitted patches applied AND using a bitmap approach:**
```
dmapool test: size:16 align:16 blocks:8192 time:82733
dmapool test: size:64 align:64 blocks:8192 time:198460
dmapool test: size:256 align:256 blocks:8192 time:710316
dmapool test: size:1024 align:1024 blocks:2048 time:177801
dmapool test: size:4096 align:4096 blocks:1024 time:192297
dmapool test: size:68 align:32 blocks:8192 time:274931
```

My guess as to why: the current linked-list implementation finds the
next free block in constant time (`O(1)`) by directly dereferencing
`pool->next_block`, and then following the `next_block` pointers for
subsequent free blocks. In contrast, the bitmap approach requires
iterating over all pages in `pool->page_list` and, for each page,
scanning its bitmap to find the first zero bit. This results in a
worst-case complexity of `O(n * b)`, where `n` is the number of pages
and `b` is the number of bits in each page's bitmap.

If you have ideas for mitigating this runtime overhead, I'd be happy to
explore them further.

Thanks,
Brian Johannesmeyer
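To make the contrast concrete, here is roughly the constant-time pop the free list allows (paraphrased from current mm/dmapool.c, accounting details omitted) next to the scan the bitmap variant needs (a hypothetical helper reusing the assumed `page->bitmap` field from the sketch above):

```
/* Free-list pop: O(1), roughly what mm/dmapool.c does today. */
static struct dma_block *pool_block_pop(struct dma_pool *pool)
{
	struct dma_block *block = pool->next_block;

	if (block)
		pool->next_block = block->next_block;
	return block;
}

/*
 * Bitmap variant: in the worst case every page is visited (O(n)) and
 * every bitmap is scanned (O(b)) before a free block is found.
 */
static struct dma_page *pool_find_page_with_space(struct dma_pool *pool)
{
	unsigned int nbits = pool->allocation / pool->size;
	struct dma_page *page;

	list_for_each_entry(page, &pool->page_list, page_list) {
		if (find_first_zero_bit(page->bitmap, nbits) < nbits)
			return page;
	}
	return NULL;
}
```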
On Wed, Nov 20, 2024 at 04:46:40PM -0700, Brian Johannesmeyer wrote:
> Thank you for the suggestion. I hacked together a bitmap-based approach
> as you proposed, and while it does improve memory efficiency by reducing
> the per-block metadata overhead, it unfortunately appears to
> significantly hurt runtime performance.
>
> My guess as to why: the current linked-list implementation finds the
> next free block in constant time (`O(1)`) by directly dereferencing
> `pool->next_block`, and then following the `next_block` pointers for
> subsequent free blocks. In contrast, the bitmap approach requires
> iterating over all pages in `pool->page_list` and, for each page,
> scanning its bitmap to find the first zero bit. This results in a
> worst-case complexity of `O(n * b)`, where `n` is the number of pages
> and `b` is the number of bits in each page's bitmap.

Indeed. At a minimum, you'd probably need to split the linkage of the
pages into a list of those that have free blocks and a list of those
that don't.

Can you share your current version?
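A minimal sketch of that split (building on the hypothetical bitmap fields above; the list and helper names are illustrative, not from the patch):

```
/*
 * Keep partially free pages on pool->avail_page_list and full pages on
 * pool->full_page_list (both assumed fields), so allocation never scans
 * pages that have no free block.
 */
static void *pool_alloc_from_avail(struct dma_pool *pool, dma_addr_t *handle)
{
	unsigned int nbits = pool->allocation / pool->size;
	struct dma_page *page;
	unsigned int bitnr;

	/* Only pages with at least one free block live on this list. */
	page = list_first_entry_or_null(&pool->avail_page_list,
					struct dma_page, page_list);
	if (!page)
		return NULL;	/* caller falls back to pool_alloc_page() */

	bitnr = find_first_zero_bit(page->bitmap, nbits);
	set_bit(bitnr, page->bitmap);

	/* Last free block consumed: park the page on the full list. */
	if (find_first_zero_bit(page->bitmap, nbits) >= nbits)
		list_move(&page->page_list, &pool->full_page_list);

	*handle = page->dma + (dma_addr_t)bitnr * pool->size;
	return page->vaddr + bitnr * pool->size;
}

/*
 * dma_pool_free() would do the reverse: clear the block's bit and, if the
 * page was on full_page_list, move it back to avail_page_list.
 */
```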
diff --git a/mm/dmapool.c b/mm/dmapool.c
index a151a21e571b..25005a9fc201 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -43,6 +43,7 @@
 struct dma_block {
 	struct dma_block *next_block;
 	dma_addr_t dma;
+	void *vaddr;
 };
 
 struct dma_pool {		/* the pool */
@@ -64,6 +65,8 @@ struct dma_page {	/* cacheable header for 'allocation' bytes */
 	struct list_head page_list;
 	void *vaddr;
 	dma_addr_t dma;
+	struct dma_block *blocks;
+	size_t blocks_per_page;
 };
 
 static DEFINE_MUTEX(pools_lock);
@@ -91,14 +94,35 @@ static ssize_t pools_show(struct device *dev, struct device_attribute *attr, cha
 
 static DEVICE_ATTR_RO(pools);
 
+static struct dma_block *pool_find_block(struct dma_pool *pool, void *vaddr)
+{
+	struct dma_page *page;
+	size_t offset, index;
+
+	list_for_each_entry(page, &pool->page_list, page_list) {
+		if (vaddr < page->vaddr)
+			continue;
+		offset = vaddr - page->vaddr;
+		if (offset >= pool->allocation)
+			continue;
+
+		index = offset / pool->size;
+		if (index >= page->blocks_per_page)
+			return NULL;
+
+		return &page->blocks[index];
+	}
+	return NULL;
+}
+
 #ifdef DMAPOOL_DEBUG
 static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
 			     gfp_t mem_flags)
 {
-	u8 *data = (void *)block;
+	u8 *data = (void *)block->vaddr;
 	int i;
 
-	for (i = sizeof(struct dma_block); i < pool->size; i++) {
+	for (i = 0; i < pool->size; i++) {
 		if (data[i] == POOL_POISON_FREED)
 			continue;
 		dev_err(pool->dev, "%s %s, %p (corrupted)\n", __func__,
@@ -114,7 +138,7 @@ static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
 	}
 
 	if (!want_init_on_alloc(mem_flags))
-		memset(block, POOL_POISON_ALLOCATED, pool->size);
+		memset(block->vaddr, POOL_POISON_ALLOCATED, pool->size);
 }
 
 static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma)
@@ -143,7 +167,7 @@ static bool pool_block_err(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 	}
 
 	while (block) {
-		if (block != vaddr) {
+		if (block->vaddr != vaddr) {
 			block = block->next_block;
 			continue;
 		}
@@ -301,6 +325,7 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
 	unsigned int next_boundary = pool->boundary, offset = 0;
 	struct dma_block *block, *first = NULL, *last = NULL;
+	size_t i = 0;
 
 	pool_init_page(pool, page);
 	while (offset + pool->size <= pool->allocation) {
@@ -310,7 +335,8 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 			continue;
 		}
 
-		block = page->vaddr + offset;
+		block = &page->blocks[i];
+		block->vaddr = page->vaddr + offset;
 		block->dma = page->dma + offset;
 		block->next_block = NULL;
 
@@ -322,6 +348,7 @@
 
 		offset += pool->size;
 		pool->nr_blocks++;
+		i++;
 	}
 
 	last->next_block = pool->next_block;
@@ -339,9 +366,18 @@ static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 	if (!page)
 		return NULL;
 
+	page->blocks_per_page = pool->allocation / pool->size;
+	page->blocks = kmalloc_array(page->blocks_per_page,
+				     sizeof(struct dma_block), GFP_KERNEL);
+	if (!page->blocks) {
+		kfree(page);
+		return NULL;
+	}
+
 	page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
 					 &page->dma, mem_flags);
 	if (!page->vaddr) {
+		kfree(page->blocks);
 		kfree(page);
 		return NULL;
 	}
@@ -383,6 +419,7 @@ void dma_pool_destroy(struct dma_pool *pool)
 		if (!busy)
 			dma_free_coherent(pool->dev, pool->allocation,
 					  page->vaddr, page->dma);
+		kfree(page->blocks);
 		list_del(&page->page_list);
 		kfree(page);
 	}
@@ -432,9 +469,9 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 	*handle = block->dma;
 	pool_check_block(pool, block, mem_flags);
 	if (want_init_on_alloc(mem_flags))
-		memset(block, 0, pool->size);
+		memset(block->vaddr, 0, pool->size);
 
-	return block;
+	return block->vaddr;
 }
 EXPORT_SYMBOL(dma_pool_alloc);
 
@@ -449,9 +486,16 @@ EXPORT_SYMBOL(dma_pool_alloc);
  */
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 {
-	struct dma_block *block = vaddr;
+	struct dma_block *block;
 	unsigned long flags;
 
+	block = pool_find_block(pool, vaddr);
+	if (!block) {
+		dev_err(pool->dev, "%s %s, invalid vaddr %p\n",
+			__func__, pool->name, vaddr);
+		return;
+	}
+
 	spin_lock_irqsave(&pool->lock, flags);
 	if (!pool_block_err(pool, vaddr, dma)) {
 		pool_block_push(pool, block, dma);
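For context, the caller-visible contract is unchanged by the diff above: `dma_pool_alloc()` still returns the block's own virtual address (now `block->vaddr` rather than a pointer that doubles as in-band metadata), and `dma_pool_free()` takes that same pointer back, which `pool_find_block()` maps to its out-of-line `struct dma_block`. A typical caller looks like this (illustrative driver snippet; `dev` and the size/align values are assumptions):

```
struct dma_pool *pool;
void *buf;
dma_addr_t dma;

pool = dma_pool_create("example", dev, 64, 64, 0);	/* size, align, boundary */
if (!pool)
	return -ENOMEM;

buf = dma_pool_alloc(pool, GFP_KERNEL, &dma);	/* buf == block->vaddr */
if (!buf) {
	dma_pool_destroy(pool);
	return -ENOMEM;
}

/* ... program 'dma' into the device, touch 'buf' from the CPU ... */

dma_pool_free(pool, buf, dma);	/* pool_find_block() maps buf back to its block */
dma_pool_destroy(pool);
```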