| Message ID | 6b42014d04258c706c7c43ae739efb30e32496b9.1445994839.git.luto@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
Hi Andy,

On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski <luto@amacapital.net>
>
> virtio_ring currently sends the device (usually a hypervisor)
> physical addresses of its I/O buffers. This is okay when DMA
> addresses and physical addresses are the same thing, but this isn't
> always the case. For example, this never works on Xen guests, and
> it is likely to fail if a physical "virtio" device ever ends up
> behind an IOMMU or swiotlb.

The overall code looks good, but I haven't seen any dma_sync* calls.
When swiotlb=force is in use this would break.

> +	vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, vring_map_single(
> +		vq,
> +		desc, total_sg * sizeof(struct vring_desc),
> +		DMA_TO_DEVICE));

Nit: I think readability could be improved here by using a temporary
variable for the return value of vring_map_single().


	Joerg
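The refactor Joerg suggests would presumably look something like the sketch below; it reuses the patch's vring_map_single() and vring_mapping_error() helpers and its unmap_free_indir error label, but the temporary variable name is illustrative rather than taken from any posted version:

	dma_addr_t addr = vring_map_single(vq, desc,
					   total_sg * sizeof(struct vring_desc),
					   DMA_TO_DEVICE);

	/* Check the mapping before it is written into the descriptor. */
	if (vring_mapping_error(vq, addr))
		goto unmap_free_indir;

	vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);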
On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel <jroedel@suse.de> wrote:
> Hi Andy,
>
> On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
>> From: Andy Lutomirski <luto@amacapital.net>
>>
>> virtio_ring currently sends the device (usually a hypervisor)
>> physical addresses of its I/O buffers. This is okay when DMA
>> addresses and physical addresses are the same thing, but this isn't
>> always the case. For example, this never works on Xen guests, and
>> it is likely to fail if a physical "virtio" device ever ends up
>> behind an IOMMU or swiotlb.
>
> The overall code looks good, but I haven't seen any dma_sync* calls.
> When swiotlb=force is in use this would break.
>
>> +	vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, vring_map_single(
>> +		vq,
>> +		desc, total_sg * sizeof(struct vring_desc),
>> +		DMA_TO_DEVICE));
>

Are you talking about a dma_sync call on the descriptor ring itself?
Isn't dma_alloc_coherent supposed to make that unnecessary? I should
move the allocation into the virtqueue code.

The docs suggest that I might need to "flush the processor's write
buffers before telling devices to read that memory". I'm not sure how
to do that.

> Nit: I think readability could be improved here by using a temporary
> variable for the return value of vring_map_single().
>

I'll do something like that for v2.

--Andy
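Moving the ring allocation into the virtqueue code, as Andy proposes, would mean carving the vring out of coherent DMA memory rather than ordinary kernel memory. A minimal sketch of the idea follows; the function name, parameters, and error handling are assumptions for illustration, not code from any posted version of the patch:

#include <linux/dma-mapping.h>
#include <linux/virtio_ring.h>

/*
 * Allocate a vring from coherent DMA memory: the device then sees a
 * coherent view of the descriptors, so no dma_sync_* calls are needed
 * for the ring itself.
 */
static void *example_alloc_coherent_ring(struct device *dev, unsigned int num,
					 unsigned long vring_align,
					 struct vring *vring_out,
					 dma_addr_t *dma_out)
{
	void *queue = dma_alloc_coherent(dev, vring_size(num, vring_align),
					 dma_out, GFP_KERNEL);

	if (!queue)
		return NULL;

	vring_init(vring_out, num, queue, vring_align);
	/* The transport must hand *dma_out to the device, not
	 * virt_to_phys(queue). */
	return queue;
}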
On Tue, Oct 27, 2015 at 07:13:56PM -0700, Andy Lutomirski wrote:
> On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel <jroedel@suse.de> wrote:
> > Hi Andy,
> >
> > On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
> >> From: Andy Lutomirski <luto@amacapital.net>
> >>
> >> virtio_ring currently sends the device (usually a hypervisor)
> >> physical addresses of its I/O buffers. This is okay when DMA
> >> addresses and physical addresses are the same thing, but this isn't
> >> always the case. For example, this never works on Xen guests, and
> >> it is likely to fail if a physical "virtio" device ever ends up
> >> behind an IOMMU or swiotlb.
> >
> > The overall code looks good, but I haven't seen any dma_sync* calls.
> > When swiotlb=force is in use this would break.
> >
> >> +	vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, vring_map_single(
> >> +		vq,
> >> +		desc, total_sg * sizeof(struct vring_desc),
> >> +		DMA_TO_DEVICE));
> >
>
> Are you talking about a dma_sync call on the descriptor ring itself?
> Isn't dma_alloc_coherent supposed to make that unnecessary? I should
> move the allocation into the virtqueue code.
>
> The docs suggest that I might need to "flush the processor's write
> buffers before telling devices to read that memory". I'm not sure how
> to do that.

The write buffers should be flushed by the dma-api functions if
necessary. For dma_alloc_coherent allocations you don't need to call
dma_sync*, but you do for the map_single/map_page/map_sg ones, as these
might be bounce-buffered.


	Joerg
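The distinction Joerg draws here is the standard streaming-DMA discipline from the kernel's DMA-API documentation: coherent allocations need no syncing, while streaming mappings may be bounce-buffered and must be synced if the CPU touches the buffer after mapping it. A generic illustration, not taken from the virtio code, with "dev" and "buf" as placeholders:

#include <linux/dma-mapping.h>

static int example_streaming_dma(struct device *dev, void *buf, size_t len)
{
	/* Streaming mapping: with swiotlb=force this lands in a bounce buffer. */
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* If the CPU writes buf after the mapping was made, sync so the
	 * bounce buffer is refreshed before the device reads it. */
	dma_sync_single_for_device(dev, dma, len, DMA_TO_DEVICE);

	/* ... tell the device to read the buffer and wait for it ... */

	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}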
On 28.10.2015 at 10:17, Andy Lutomirski wrote:
@@ -423,27 +522,42 @@ EXPORT_SYMBOL_GPL(virtqueue_kick);
>
> static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
> {
> -	unsigned int i;
> +	unsigned int i, j;
> +	u16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>
> 	/* Clear data ptr. */
> -	vq->data[head] = NULL;
> +	vq->desc_state[head].data = NULL;
>
> -	/* Put back on free list: find end */
> +	/* Put back on free list: unmap first-level descriptors and find end */
> 	i = head;
>
> -	/* Free the indirect table */
> -	if (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT))
> -		kfree(phys_to_virt(virtio64_to_cpu(vq->vq.vdev, vq->vring.desc[i].addr)));
> -
> -	while (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT)) {
> +	while (vq->vring.desc[i].flags & nextflag) {
> +		vring_unmap_one(vq, &vq->vring.desc[i]);
> 		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
> 		vq->vq.num_free++;
> 	}
>
> +	vring_unmap_one(vq, &vq->vring.desc[i]);
> 	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
> 	vq->free_head = head;
> +
> 	/* Plus final descriptor */
> 	vq->vq.num_free++;
> +
> +	/* Free the indirect table, if any, now that it's unmapped. */
> +	if (vq->desc_state[head].indir_desc) {
> +		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
> +		u32 len = vq->vring.desc[head].len;
> +
> +		BUG_ON(!(vq->vring.desc[head].flags & VRING_DESC_F_INDIRECT));
> +		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> +
> +		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +			vring_unmap_one(vq, &indir_desc[j]);
> +
> +		kfree(vq->desc_state[head].indir_desc);
> +		vq->desc_state[head].indir_desc = NULL;
> +	}
> }

something seems to be broken with indirect descriptors with that change:

[    1.885451] ------------[ cut here ]------------
[    1.885454] kernel BUG at drivers/virtio/virtio_ring.c:552!
[    1.885487] illegal operation: 0001 ilc:1 [#1] SMP
[    1.885489] Modules linked in:
[    1.885491] CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc3+ #250
[    1.885493] task: 000000000be49988 ti: 000000000be54000 task.ti: 000000000be54000
[    1.885514] Krnl PSW : 0404c00180000000 000000000059b9d2 (detach_buf+0x1ca/0x1d0)
[    1.885519]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000000 0000000030000000 000000000c0c8c00 000000000a89c000
[    1.885522]            000000000059b8fa 0000000000000001 0000000000000000 0000000000000000
[    1.885523]            0000000000000000 0000000000000000 0000000000000100 000000000a89c000
[    1.885525]            000000000c082000 000000000c082000 000000000059b8fa 000000000c7fbc88
[    1.885531] Krnl Code: 000000000059b9c6: a7f4ff40       brc     15,59b846
                          000000000059b9ca: a7f40001       brc     15,59b9cc
                         #000000000059b9ce: a7f40001       brc     15,59b9d0
                         >000000000059b9d2: 0707           bcr     0,%r7
                          000000000059b9d4: 0707           bcr     0,%r7
                          000000000059b9d6: 0707           bcr     0,%r7
                          000000000059b9d8: c00400000000   brcl    0,59b9d8
                          000000000059b9de: ebcff0780024   stmg    %r12,%r15,120(%r15)
[    1.885542] Call Trace:
[    1.885544] ([<000000000059b8fa>] detach_buf+0xf2/0x1d0)
[    1.885546]  [<000000000059bbb4>] virtqueue_get_buf+0xcc/0x218
[    1.885549]  [<00000000005dd8fe>] virtblk_done+0xa6/0x120
[    1.885550]  [<000000000059b66e>] vring_interrupt+0x7e/0x108
[    1.885552]  [<00000000006c21a4>] virtio_airq_handler+0x7c/0x120
[    1.885554]  [<0000000000657e54>] do_airq_interrupt+0xa4/0xc8
[    1.885558]  [<00000000001b79a0>] handle_irq_event_percpu+0x60/0x1f0
[    1.885560]  [<00000000001bbbea>] handle_percpu_irq+0x72/0xa0
[    1.885561]  [<00000000001b6fa4>] generic_handle_irq+0x4c/0x78
[    1.885564]  [<000000000010cc7c>] do_IRQ+0x64/0x88
[    1.885566] Btrfs loaded
[    1.885600]  [<000000000080e30a>] io_int_handler+0x10a/0x218
[    1.885603]  [<0000000000186b9e>] wake_up_new_task+0x1be/0x260
[    1.885604] ([<0000000000186b6a>] wake_up_new_task+0x18a/0x260)
[    1.885606]  [<0000000000156182>] _do_fork+0xfa/0x338
[    1.885608]  [<000000000015644e>] kernel_thread+0x4e/0x60
[    1.885609]  [<0000000000179202>] kthreadd+0x1a2/0x238
[    1.885611]  [<000000000080e056>] kernel_thread_starter+0x6/0xc
[    1.885613]  [<000000000080e050>] kernel_thread_starter+0x0/0xc
[    1.885614] Last Breaking-Event-Address:
[    1.885615]  [<000000000059b9ce>] detach_buf+0x1c6/0x1d0
[    1.885617]
[    1.885618] Kernel panic - not syncing: Fatal exception in interrupt
On Tue, Oct 27, 2015 at 7:27 PM, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> On 28.10.2015 at 10:17, Andy Lutomirski wrote:
> @@ -423,27 +522,42 @@ EXPORT_SYMBOL_GPL(virtqueue_kick);
>>
>> static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
>> {
>> -	unsigned int i;
>> +	unsigned int i, j;
>> +	u16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>>
>> 	/* Clear data ptr. */
>> -	vq->data[head] = NULL;
>> +	vq->desc_state[head].data = NULL;
>>
>> -	/* Put back on free list: find end */
>> +	/* Put back on free list: unmap first-level descriptors and find end */
>> 	i = head;
>>
>> -	/* Free the indirect table */
>> -	if (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT))
>> -		kfree(phys_to_virt(virtio64_to_cpu(vq->vq.vdev, vq->vring.desc[i].addr)));
>> -
>> -	while (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT)) {
>> +	while (vq->vring.desc[i].flags & nextflag) {
>> +		vring_unmap_one(vq, &vq->vring.desc[i]);
>> 		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
>> 		vq->vq.num_free++;
>> 	}
>>
>> +	vring_unmap_one(vq, &vq->vring.desc[i]);
>> 	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
>> 	vq->free_head = head;
>> +
>> 	/* Plus final descriptor */
>> 	vq->vq.num_free++;
>> +
>> +	/* Free the indirect table, if any, now that it's unmapped. */
>> +	if (vq->desc_state[head].indir_desc) {
>> +		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
>> +		u32 len = vq->vring.desc[head].len;
>> +
>> +		BUG_ON(!(vq->vring.desc[head].flags & VRING_DESC_F_INDIRECT));
>> +		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>> +
>> +		for (j = 0; j < len / sizeof(struct vring_desc); j++)
>> +			vring_unmap_one(vq, &indir_desc[j]);
>> +
>> +		kfree(vq->desc_state[head].indir_desc);
>> +		vq->desc_state[head].indir_desc = NULL;
>> +	}
>> }
>
> something seems to be broken with indirect descriptors with that change

You're big endian, right? I had an endian handling bug, and I'll fix
it in v2.

--Andy
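Andy doesn't spell the bug out here, but the new indirect-table cleanup in detach_buf() reads desc[head].len and desc[head].flags without the virtio endian accessors, so on a big-endian guest talking to a little-endian (virtio 1.0) device the length check trips the BUG_ON(). An endian-clean version of that block would presumably look like the sketch below; this is inferred from the backtrace, not taken from the actual v2 patch:

	u32 len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);

	BUG_ON(!(vq->vring.desc[head].flags &
		 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
	BUG_ON(len == 0 || len % sizeof(struct vring_desc));

	for (j = 0; j < len / sizeof(struct vring_desc); j++)
		vring_unmap_one(vq, &indir_desc[j]);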
On Tue, Oct 27, 2015 at 7:21 PM, Joerg Roedel <jroedel@suse.de> wrote:
> On Tue, Oct 27, 2015 at 07:13:56PM -0700, Andy Lutomirski wrote:
>> On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel <jroedel@suse.de> wrote:
>> > Hi Andy,
>> >
>> > On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
>> >> From: Andy Lutomirski <luto@amacapital.net>
>> >>
>> >> virtio_ring currently sends the device (usually a hypervisor)
>> >> physical addresses of its I/O buffers. This is okay when DMA
>> >> addresses and physical addresses are the same thing, but this isn't
>> >> always the case. For example, this never works on Xen guests, and
>> >> it is likely to fail if a physical "virtio" device ever ends up
>> >> behind an IOMMU or swiotlb.
>> >
>> > The overall code looks good, but I haven't seen any dma_sync* calls.
>> > When swiotlb=force is in use this would break.
>> >
>> >> +	vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, vring_map_single(
>> >> +		vq,
>> >> +		desc, total_sg * sizeof(struct vring_desc),
>> >> +		DMA_TO_DEVICE));
>> >
>>
>> Are you talking about a dma_sync call on the descriptor ring itself?
>> Isn't dma_alloc_coherent supposed to make that unnecessary? I should
>> move the allocation into the virtqueue code.
>>
>> The docs suggest that I might need to "flush the processor's write
>> buffers before telling devices to read that memory". I'm not sure how
>> to do that.
>
> The write buffers should be flushed by the dma-api functions if
> necessary. For dma_alloc_coherent allocations you don't need to call
> dma_sync*, but you do for the map_single/map_page/map_sg ones, as these
> might be bounce-buffered.

I think that all the necessary barriers are already there. I had a
nasty bug that swiotlb=force exposed, though, which I've fixed.

--Andy
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index cab9f3f63a38..77590320d44c 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -60,7 +60,7 @@ config VIRTIO_INPUT
 
 config VIRTIO_MMIO
 	tristate "Platform bus driver for memory mapped virtio devices"
-	depends on HAS_IOMEM
+	depends on HAS_IOMEM && HAS_DMA
 	select VIRTIO
 	---help---
 	 This drivers provides support for memory mapped virtio
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 096b857e7b75..911fbaa7a260 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/hrtimer.h>
 #include <linux/kmemleak.h>
+#include <linux/dma-mapping.h>
 
 #ifdef DEBUG
 /* For development, we want to crash whenever the ring is screwed. */
@@ -54,7 +55,14 @@
 #define END_USE(vq)
 #endif
 
-struct vring_virtqueue {
+struct vring_desc_state
+{
+	void *data;			/* Data for callback. */
+	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+};
+
+struct vring_virtqueue
+{
 	struct virtqueue vq;
 
 	/* Actual memory layout for this queue */
@@ -92,12 +100,71 @@ struct vring_virtqueue {
 	ktime_t last_add_time;
 #endif
 
-	/* Tokens for callbacks. */
-	void *data[];
+	/* Per-descriptor state. */
+	struct vring_desc_state desc_state[];
 };
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+/*
+ * The DMA ops on various arches are rather gnarly right now, and
+ * making all of the arch DMA ops work on the vring device itself
+ * is a mess. For now, we use the parent device for DMA ops.
+ */
+struct device *vring_dma_dev(const struct vring_virtqueue *vq)
+{
+	return vq->vq.vdev->dev.parent;
+}
+
+/* Map one sg entry. */
+static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
+				   struct scatterlist *sg,
+				   enum dma_data_direction direction)
+{
+	/*
+	 * We can't use dma_map_sg, because we don't use scatterlists in
+	 * the way it expects (we don't guarantee that the scatterlist
+	 * will exist for the lifetime of the mapping).
+	 */
+	return dma_map_page(vring_dma_dev(vq),
+			    sg_page(sg), sg->offset, sg->length,
+			    direction);
+}
+
+static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
+				   void *cpu_addr, size_t size,
+				   enum dma_data_direction direction)
+{
+	return dma_map_single(vring_dma_dev(vq),
+			      cpu_addr, size, direction);
+}
+
+static void vring_unmap_one(const struct vring_virtqueue *vq,
+			    struct vring_desc *desc)
+{
+	u16 flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+	if (flags & VRING_DESC_F_INDIRECT) {
+		dma_unmap_single(vring_dma_dev(vq),
+				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
+				 virtio32_to_cpu(vq->vq.vdev, desc->len),
+				 (flags & VRING_DESC_F_WRITE) ?
+				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	} else {
+		dma_unmap_page(vring_dma_dev(vq),
+			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
+			       virtio32_to_cpu(vq->vq.vdev, desc->len),
+			       (flags & VRING_DESC_F_WRITE) ?
+			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+}
+
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+			       dma_addr_t addr)
+{
+	return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
 static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
 					 unsigned int total_sg, gfp_t gfp)
 {
@@ -131,7 +198,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
 	struct vring_desc *desc;
-	unsigned int i, n, avail, descs_used, uninitialized_var(prev);
+	unsigned int i, n, avail, descs_used, uninitialized_var(prev), err_idx;
 	int head;
 	bool indirect;
 
@@ -171,21 +238,25 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
 	if (desc) {
 		/* Use a single buffer which doesn't continue */
+		indirect = true;
 		vq->vring.desc[head].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT);
-		vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, virt_to_phys(desc));
-		/* avoid kmemleak false positive (hidden by virt_to_phys) */
-		kmemleak_ignore(desc);
+		vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, vring_map_single(
+			vq,
+			desc, total_sg * sizeof(struct vring_desc),
+			DMA_TO_DEVICE));
+		if (vring_mapping_error(vq, vq->vring.desc[head].addr))
+			goto unmap_free_indir;
+
 		vq->vring.desc[head].len = cpu_to_virtio32(_vq->vdev, total_sg * sizeof(struct vring_desc));
 
 		/* Set up rest to use this indirect table. */
 		i = 0;
 		descs_used = 1;
-		indirect = true;
 	} else {
+		indirect = false;
 		desc = vq->vring.desc;
 		i = head;
 		descs_used = total_sg;
-		indirect = false;
 	}
 
 	if (vq->vq.num_free < descs_used) {
@@ -200,14 +271,13 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 		return -ENOSPC;
 	}
 
-	/* We're about to use some buffers from the free list. */
-	vq->vq.num_free -= descs_used;
-
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
-			desc[i].addr = cpu_to_virtio64(_vq->vdev, sg_phys(sg));
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, vring_map_one_sg(vq, sg, DMA_TO_DEVICE));
 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			if (vring_mapping_error(vq, desc[i].addr))
+				goto unmap_release;
 			prev = i;
 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
 		}
@@ -215,8 +285,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
-			desc[i].addr = cpu_to_virtio64(_vq->vdev, sg_phys(sg));
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, vring_map_one_sg(vq, sg, DMA_FROM_DEVICE));
 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			if (vring_mapping_error(vq, desc[i].addr))
+				goto unmap_release;
 			prev = i;
 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
 		}
@@ -224,14 +296,19 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	/* Last one doesn't continue. */
 	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
 
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= descs_used;
+
 	/* Update free pointer */
 	if (indirect)
 		vq->free_head = virtio16_to_cpu(_vq->vdev, vq->vring.desc[head].next);
 	else
 		vq->free_head = i;
 
-	/* Set token. */
-	vq->data[head] = data;
+	/* Store token and indirect buffer state. */
+	vq->desc_state[head].data = data;
+	if (indirect)
+		vq->desc_state[head].indir_desc = desc;
 
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
@@ -253,6 +330,28 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 		virtqueue_kick(_vq);
 
 	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_one(vq, &desc[i]);
+		i = vq->vring.desc[i].next;
+	}
+
+	vq->vq.num_free += total_sg;
+
+	if (indirect)
+		vring_unmap_one(vq, desc);
+
+unmap_free_indir:
+	if (indirect)
+		kfree(desc);
+
+	return -EIO;
 }
 
 /**
@@ -423,27 +522,42 @@ EXPORT_SYMBOL_GPL(virtqueue_kick);
 
 static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 {
-	unsigned int i;
+	unsigned int i, j;
+	u16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
 
 	/* Clear data ptr. */
-	vq->data[head] = NULL;
+	vq->desc_state[head].data = NULL;
 
-	/* Put back on free list: find end */
+	/* Put back on free list: unmap first-level descriptors and find end */
 	i = head;
 
-	/* Free the indirect table */
-	if (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT))
-		kfree(phys_to_virt(virtio64_to_cpu(vq->vq.vdev, vq->vring.desc[i].addr)));
-
-	while (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT)) {
+	while (vq->vring.desc[i].flags & nextflag) {
+		vring_unmap_one(vq, &vq->vring.desc[i]);
 		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
 		vq->vq.num_free++;
 	}
 
+	vring_unmap_one(vq, &vq->vring.desc[i]);
 	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
 	vq->free_head = head;
+
 	/* Plus final descriptor */
 	vq->vq.num_free++;
+
+	/* Free the indirect table, if any, now that it's unmapped. */
+	if (vq->desc_state[head].indir_desc) {
+		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
+		u32 len = vq->vring.desc[head].len;
+
+		BUG_ON(!(vq->vring.desc[head].flags & VRING_DESC_F_INDIRECT));
+		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+		for (j = 0; j < len / sizeof(struct vring_desc); j++)
+			vring_unmap_one(vq, &indir_desc[j]);
+
+		kfree(vq->desc_state[head].indir_desc);
+		vq->desc_state[head].indir_desc = NULL;
+	}
 }
 
 static inline bool more_used(const struct vring_virtqueue *vq)
@@ -498,13 +612,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 		BAD_RING(vq, "id %u out of range\n", i);
 		return NULL;
 	}
-	if (unlikely(!vq->data[i])) {
+	if (unlikely(!vq->desc_state[i].data)) {
 		BAD_RING(vq, "id %u is not a head!\n", i);
 		return NULL;
 	}
 
 	/* detach_buf clears data, so grab it now. */
-	ret = vq->data[i];
+	ret = vq->desc_state[i].data;
 	detach_buf(vq, i);
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
@@ -665,10 +779,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 	START_USE(vq);
 
 	for (i = 0; i < vq->vring.num; i++) {
-		if (!vq->data[i])
+		if (!vq->desc_state[i].data)
 			continue;
 		/* detach_buf clears data, so grab it now. */
-		buf = vq->data[i];
+		buf = vq->desc_state[i].data;
 		detach_buf(vq, i);
 		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - 1);
 		END_USE(vq);
@@ -721,7 +835,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 		return NULL;
 	}
 
-	vq = kmalloc(sizeof(*vq) + sizeof(void *)*num, GFP_KERNEL);
+	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
+		     GFP_KERNEL);
 	if (!vq)
 		return NULL;
 
@@ -751,11 +866,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 
 	/* Put everything in free lists. */
 	vq->free_head = 0;
-	for (i = 0; i < num-1; i++) {
+	for (i = 0; i < num-1; i++)
 		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
-		vq->data[i] = NULL;
-	}
-	vq->data[i] = NULL;
+	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
 
 	return &vq->vq;
 }
diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
new file mode 100644
index 000000000000..4f93af89ae16
--- /dev/null
+++ b/tools/virtio/linux/dma-mapping.h
@@ -0,0 +1,17 @@
+#ifndef _LINUX_DMA_MAPPING_H
+#define _LINUX_DMA_MAPPING_H
+
+#ifdef CONFIG_HAS_DMA
+# error Virtio userspace code does not support CONFIG_HAS_DMA
+#endif
+
+#define PCI_DMA_BUS_IS_PHYS 1
+
+enum dma_data_direction {
+	DMA_BIDIRECTIONAL = 0,
+	DMA_TO_DEVICE = 1,
+	DMA_FROM_DEVICE = 2,
+	DMA_NONE = 3,
+};
+
+#endif