
[RFC,v2,1/2] dmapool: Move pool metadata into non-DMA memory

Message ID 20241119205529.3871048-2-bjohannesmeyer@gmail.com (mailing list archive)
State New
Series dmapool: Mitigate device-controllable mem. corruption

Commit Message

Brian Johannesmeyer Nov. 19, 2024, 8:55 p.m. UTC
If a `struct dma_block` object resides in DMA memory, a malicious
peripheral device can corrupt its metadata, specifically its
`next_block` pointer, which links blocks in a DMA pool. By corrupting these
pointers, an attacker can manipulate `dma_pool_alloc()` into returning
attacker-controlled pointers, which can lead to kernel memory corruption
in any driver that calls it.

To prevent this, move the `struct dma_block` metadata into non-DMA memory,
ensuring that devices cannot tamper with the internal pointers of the DMA
pool allocator. Specifically:

- Add a `vaddr` field to `struct dma_block` to point to the actual
  DMA-accessible block.
- Maintain an array of `struct dma_block` objects in `struct dma_page` to
  track the metadata of each block within an allocated page.

This change secures the DMA pool allocator by keeping its metadata in
kernel memory, inaccessible to peripheral devices, thereby preventing
potential attacks that could corrupt kernel memory through DMA operations.
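The resulting layout can be sketched with a simplified userspace model (the struct and field names mirror the patch, but the DMA plumbing is stubbed out with plain `malloc()` and a made-up device address):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model: metadata lives in normal (kernel) memory,
 * while only the payload bytes live in the DMA-visible buffer. */
struct dma_block {
	struct dma_block *next_block; /* free-list link, never seen by the device */
	unsigned long dma;            /* device address of the payload */
	void *vaddr;                  /* CPU address of the payload */
};

struct dma_page {
	void *vaddr;              /* stands in for dma_alloc_coherent() memory */
	unsigned long dma;
	struct dma_block *blocks; /* metadata array, allocated outside DMA memory */
	size_t blocks_per_page;
};

/* Build the per-page metadata array and thread the blocks onto a free list. */
static struct dma_block *init_page(struct dma_page *page, size_t allocation,
				   size_t size)
{
	struct dma_block *head = NULL;
	size_t i;

	page->blocks_per_page = allocation / size;
	page->blocks = calloc(page->blocks_per_page, sizeof(*page->blocks));
	page->vaddr = malloc(allocation);   /* would be dma_alloc_coherent() */
	page->dma = 0x1000;                 /* hypothetical device address */

	for (i = 0; i < page->blocks_per_page; i++) {
		struct dma_block *b = &page->blocks[i];

		b->vaddr = (char *)page->vaddr + i * size;
		b->dma = page->dma + i * size;
		b->next_block = head;
		head = b;
	}
	return head;
}
```

Because `next_block` now lives in the `blocks` array rather than inside `page->vaddr`, a device writing through the mapping can scribble on payload bytes but can no longer redirect the allocator's free list.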

Co-developed-by: Raphael Isemann <teemperor@gmail.com>
Signed-off-by: Raphael Isemann <teemperor@gmail.com>
Signed-off-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
---
 mm/dmapool.c | 60 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 8 deletions(-)

Comments

Christoph Hellwig Nov. 20, 2024, 9:37 a.m. UTC | #1
On Tue, Nov 19, 2024 at 09:55:28PM +0100, Brian Johannesmeyer wrote:

> +	page->blocks_per_page = pool->allocation / pool->size;
> +	page->blocks = kmalloc_array(page->blocks_per_page,
> +				     sizeof(struct dma_block), GFP_KERNEL);

Given that you now need an array of the blocks anyway, it might make
sense to switch from a linked list to a bitmap for tracking free state,
which would be a lot more efficient as you only need a bit per block
of tracking overhead instead of two pointers and a dma_addr_t.

e.g. do a find_first_zero_bit() to find the free slot, then calculate
the dma_addr and virt address by simply offsetting into the dma_page
ones with bitnr * pool->size.
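A userspace sketch of the scheme Christoph describes (with a plain-C stand-in for the kernel's `find_first_zero_bit()`; the function names and page addresses here are only illustrative):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Plain-C stand-in for the kernel's find_first_zero_bit(). */
static size_t find_first_zero_bit(const unsigned long *map, size_t nbits)
{
	size_t i;

	for (i = 0; i < nbits; i++)
		if (!(map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))))
			return i;
	return nbits; /* no free slot */
}

/* Allocate one block: find a clear bit, set it, and derive both the
 * virtual and device address by offsetting with bitnr * size. */
static void *bitmap_alloc_block(unsigned long *map, size_t nbits,
				void *page_vaddr, unsigned long page_dma,
				size_t size, unsigned long *dma_out)
{
	size_t bitnr = find_first_zero_bit(map, nbits);

	if (bitnr == nbits)
		return NULL; /* page is full */
	map[bitnr / BITS_PER_LONG] |= 1UL << (bitnr % BITS_PER_LONG);
	*dma_out = page_dma + bitnr * size;
	return (char *)page_vaddr + bitnr * size;
}
```

Freeing is the inverse: recover `bitnr` as `(vaddr - page_vaddr) / size` and clear the bit, so the only per-block tracking state is one bit.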
Brian Johannesmeyer Nov. 20, 2024, 11:46 p.m. UTC | #2
> Given that you now need an array of the blocks anyway, it might make
> sense to switch from a linked list to a bitmap for tracking free state,
> which would be a lot more efficient as you only need a bit per block
> of tracking overhead instead of two pointers and a dma_addr_t.
>
> e.g. do a find_first_zero_bit() to find the free slot, then calculate
> the dma_addr and virt address by simply offsetting into the dma_page
> ones with bitnr * pool->size.

Thank you for the suggestion. I hacked together a bitmap-based
approach as you proposed, and while it does improve memory efficiency
by reducing the per-block metadata overhead, it unfortunately appears
to significantly impact the runtime performance.

Here are the performance results, with DMAPOOL_DEBUG disabled. The
first two sets of numbers are the same as my latest response in the
other thread (i.e., [RFC v2 0/2]), and the last set of numbers is with
the bitmap approach applied:

**With no patches applied:**
```
dmapool test: size:16   align:16   blocks:8192 time:11860
dmapool test: size:64   align:64   blocks:8192 time:11951
dmapool test: size:256  align:256  blocks:8192 time:12287
dmapool test: size:1024 align:1024 blocks:2048 time:3134
dmapool test: size:4096 align:4096 blocks:1024 time:1686
dmapool test: size:68   align:32   blocks:8192 time:12050
```

**With the submitted patches applied:**
```
dmapool test: size:16   align:16   blocks:8192 time:34432
dmapool test: size:64   align:64   blocks:8192 time:62262
dmapool test: size:256  align:256  blocks:8192 time:238137
dmapool test: size:1024 align:1024 blocks:2048 time:61386
dmapool test: size:4096 align:4096 blocks:1024 time:75342
dmapool test: size:68   align:32   blocks:8192 time:88243
```

**With the submitted patches applied AND using a bitmap approach:**
```
dmapool test: size:16   align:16   blocks:8192 time:82733
dmapool test: size:64   align:64   blocks:8192 time:198460
dmapool test: size:256  align:256  blocks:8192 time:710316
dmapool test: size:1024 align:1024 blocks:2048 time:177801
dmapool test: size:4096 align:4096 blocks:1024 time:192297
dmapool test: size:68   align:32   blocks:8192 time:274931
```

My guess as to why: The current linked list implementation allows us
to find the next free block in constant time (`O(1)`) by directly
dereferencing `pool->next_block`, and then following the `next_block`
pointers for subsequent free blocks. In contrast, the bitmap approach
requires iterating over all pages in `page->page_list` and, for each
page, iterating through its bitmap to find the first zero bit. This
results in a worst-case complexity of `O(n * b)`, where `n` is the
number of pages and `b` is the number of bits in each page's bitmap.

If you have ideas for mitigating this runtime overhead, I’d be happy
to explore them further.

Thanks,

Brian Johannesmeyer
Christoph Hellwig Nov. 21, 2024, 5:03 a.m. UTC | #3
On Wed, Nov 20, 2024 at 04:46:40PM -0700, Brian Johannesmeyer wrote:
> Thank you for the suggestion. I hacked together a bitmap-based
> approach as you proposed, and while it does improve memory efficiency
> by reducing the per-block metadata overhead, it unfortunately appears
> to significantly impact the runtime performance.
>
> My guess as to why: The current linked list implementation allows us
> to find the next free block in constant time (`O(1)`) by directly
> dereferencing `pool->next_block`, and then following the `next_block`
> pointers for subsequent free blocks. In contrast, the bitmap approach
> requires iterating over all pages in `page->page_list` and, for each
> page, iterating through its bitmap to find the first zero bit. This
> results in a worst-case complexity of `O(n * b)`, where `n` is the
> number of pages and `b` is the number of bits in each page's bitmap.

Indeed.  You'd probably need to split the linkage of the pages into
a list of those that have free blocks and those that don't as a minimum.

Can you share your current version?
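The two-list idea above can be sketched roughly as follows (a userspace model with hypothetical names, not the dmapool code): pages with free blocks sit on a `partial` list and exhausted pages on a `full` list, so allocation only ever looks at the head of `partial` and stays O(1) in the number of pages.

```c
#include <assert.h>
#include <stddef.h>

struct model_page {
	struct model_page *next;
	unsigned int free_blocks;
};

struct model_pool {
	struct model_page *partial; /* pages with at least one free block */
	struct model_page *full;    /* pages with none */
};

static void push(struct model_page **list, struct model_page *page)
{
	page->next = *list;
	*list = page;
}

/* Take one block from the first partial page; if that empties the
 * page, migrate it to the full list so it is never scanned again. */
static struct model_page *pool_take_block(struct model_pool *pool)
{
	struct model_page *page = pool->partial;

	if (!page)
		return NULL; /* caller would allocate a fresh page here */
	page->free_blocks--;
	if (page->free_blocks == 0) {
		pool->partial = page->next;
		push(&pool->full, page);
	}
	return page;
}

/* On free, a page that was full regains a block and moves back.
 * (A real version would unlink from anywhere in 'full'; this
 * simplified model assumes the page is at the head.) */
static void pool_return_block(struct model_pool *pool, struct model_page *page)
{
	if (page->free_blocks == 0) {
		pool->full = page->next;
		push(&pool->partial, page);
	}
	page->free_blocks++;
}
```

With this split, the bitmap scan from the earlier suggestion is bounded to a single page known to have a free bit, instead of walking every page in the pool.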

Patch

diff --git a/mm/dmapool.c b/mm/dmapool.c
index a151a21e571b..25005a9fc201 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -43,6 +43,7 @@ 
 struct dma_block {
 	struct dma_block *next_block;
 	dma_addr_t dma;
+	void *vaddr;
 };
 
 struct dma_pool {		/* the pool */
@@ -64,6 +65,8 @@  struct dma_page {		/* cacheable header for 'allocation' bytes */
 	struct list_head page_list;
 	void *vaddr;
 	dma_addr_t dma;
+	struct dma_block *blocks;
+	size_t blocks_per_page;
 };
 
 static DEFINE_MUTEX(pools_lock);
@@ -91,14 +94,35 @@  static ssize_t pools_show(struct device *dev, struct device_attribute *attr, cha
 
 static DEVICE_ATTR_RO(pools);
 
+static struct dma_block *pool_find_block(struct dma_pool *pool, void *vaddr)
+{
+	struct dma_page *page;
+	size_t offset, index;
+
+	list_for_each_entry(page, &pool->page_list, page_list) {
+		if (vaddr < page->vaddr)
+			continue;
+		offset = vaddr - page->vaddr;
+		if (offset >= pool->allocation)
+			continue;
+
+		index = offset / pool->size;
+		if (index >= page->blocks_per_page)
+			return NULL;
+
+		return &page->blocks[index];
+	}
+	return NULL;
+}
+
 #ifdef DMAPOOL_DEBUG
 static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
 			     gfp_t mem_flags)
 {
-	u8 *data = (void *)block;
+	u8 *data = (void *)block->vaddr;
 	int i;
 
-	for (i = sizeof(struct dma_block); i < pool->size; i++) {
+	for (i = 0; i < pool->size; i++) {
 		if (data[i] == POOL_POISON_FREED)
 			continue;
 		dev_err(pool->dev, "%s %s, %p (corrupted)\n", __func__,
@@ -114,7 +138,7 @@  static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
 	}
 
 	if (!want_init_on_alloc(mem_flags))
-		memset(block, POOL_POISON_ALLOCATED, pool->size);
+		memset(block->vaddr, POOL_POISON_ALLOCATED, pool->size);
 }
 
 static struct dma_page *pool_find_page(struct dma_pool *pool, dma_addr_t dma)
@@ -143,7 +167,7 @@  static bool pool_block_err(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 	}
 
 	while (block) {
-		if (block != vaddr) {
+		if (block->vaddr != vaddr) {
 			block = block->next_block;
 			continue;
 		}
@@ -301,6 +325,7 @@  static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
 	unsigned int next_boundary = pool->boundary, offset = 0;
 	struct dma_block *block, *first = NULL, *last = NULL;
+	size_t i = 0;
 
 	pool_init_page(pool, page);
 	while (offset + pool->size <= pool->allocation) {
@@ -310,7 +335,8 @@  static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 			continue;
 		}
 
-		block = page->vaddr + offset;
+		block = &page->blocks[i];
+		block->vaddr = page->vaddr + offset;
 		block->dma = page->dma + offset;
 		block->next_block = NULL;
 
@@ -322,6 +348,7 @@  static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 
 		offset += pool->size;
 		pool->nr_blocks++;
+		i++;
 	}
 
 	last->next_block = pool->next_block;
@@ -339,9 +366,18 @@  static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 	if (!page)
 		return NULL;
 
+	page->blocks_per_page = pool->allocation / pool->size;
+	page->blocks = kmalloc_array(page->blocks_per_page,
+				     sizeof(struct dma_block), GFP_KERNEL);
+	if (!page->blocks) {
+		kfree(page);
+		return NULL;
+	}
+
 	page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
 					 &page->dma, mem_flags);
 	if (!page->vaddr) {
+		kfree(page->blocks);
 		kfree(page);
 		return NULL;
 	}
@@ -383,6 +419,7 @@  void dma_pool_destroy(struct dma_pool *pool)
 		if (!busy)
 			dma_free_coherent(pool->dev, pool->allocation,
 					  page->vaddr, page->dma);
+		kfree(page->blocks);
 		list_del(&page->page_list);
 		kfree(page);
 	}
@@ -432,9 +469,9 @@  void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 	*handle = block->dma;
 	pool_check_block(pool, block, mem_flags);
 	if (want_init_on_alloc(mem_flags))
-		memset(block, 0, pool->size);
+		memset(block->vaddr, 0, pool->size);
 
-	return block;
+	return block->vaddr;
 }
 EXPORT_SYMBOL(dma_pool_alloc);
 
@@ -449,9 +486,16 @@  EXPORT_SYMBOL(dma_pool_alloc);
  */
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 {
-	struct dma_block *block = vaddr;
+	struct dma_block *block;
 	unsigned long flags;
 
+	block = pool_find_block(pool, vaddr);
+	if (!block) {
+		dev_err(pool->dev, "%s %s, invalid vaddr %p\n",
+			__func__, pool->name, vaddr);
+		return;
+	}
+
 	spin_lock_irqsave(&pool->lock, flags);
 	if (!pool_block_err(pool, vaddr, dma)) {
 		pool_block_push(pool, block, dma);