Message ID | 20090528044808.205238362@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Marcelo Tosatti wrote:
> Move coalesced_mmio locking to its own device, instead of relying on
> kvm->lock.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
> ===================================================================
> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
> @@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
>  	if (!is_write)
>  		return 0;
>
> -	/* kvm->lock is taken by the caller and must be not released before
> -	 * dev.read/write
> -	 */
> +	spin_lock(&dev->lock);
>

This unbalanced locking is still very displeasing. At a minimum you need
a sparse annotation to indicate it.

But I think it really indicates a problem with the io_device API.

Potential solutions:
- fold in_range() into ->write and ->read. Make those functions
  responsible for both determining whether they can handle the range
  and performing the I/O.
- have a separate rwlock for the device list.
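For reference, the sparse annotation Avi asks for would look roughly like the sketch below, assuming the lock is taken in in_range() and released in write() as in the patch. The signatures are abbreviated from the diff; __acquires()/__releases() are the standard sparse context markers, and since in_range() only takes the lock when it claims the access, the call site would additionally need a __cond_lock() wrapper for sparse to track it exactly.

```c
/* Sketch: document the unbalanced locking for sparse.  in_range()
 * returns with dev->lock held when it claims the access, and the
 * subsequent write() releases it.
 */
static int coalesced_mmio_in_range(struct kvm_io_device *this,
				   gpa_t addr, int len, int is_write)
	__acquires(dev->lock);		/* held on success only */

static void coalesced_mmio_write(struct kvm_io_device *this,
				 gpa_t addr, int len, const void *val)
	__releases(dev->lock);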
On Sun, May 31, 2009 at 03:14:36PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Move coalesced_mmio locking to its own device, instead of relying on
>> kvm->lock.
>>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
>> ===================================================================
>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>> @@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
>>  	if (!is_write)
>>  		return 0;
>>
>> -	/* kvm->lock is taken by the caller and must be not released before
>> -	 * dev.read/write
>> -	 */
>> +	spin_lock(&dev->lock);
>>
> This unbalanced locking is still very displeasing. At a minimum you
> need a sparse annotation to indicate it.
>
> But I think it really indicates a problem with the io_device API.
>
> Potential solutions:
> - fold in_range() into ->write and ->read. Make those functions
>   responsible for both determining whether they can handle the range
>   and performing the I/O.
> - have a separate rwlock for the device list.

IMO the problem is the coalesced_mmio device. The unbalanced locking is
a result of the abuse of the in_range() and read/write() methods.

Normally you'd expect parallel accesses to in_range() to be allowed,
since it's just checking whether (aha) the access is in range, returning
a pointer to the device if positive. read/write() are the ones that need
serialization, since they touch the device's internal state.

coalesced_mmio abuses in_range() to do more than it should.

Ideally we should fix coalesced_mmio, but I'm not going to do that now
(sorry, I'm not confident in changing it without seeing it go through
intense torture testing).

That said, is a sparse annotation enough to convince you?
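Avi's first alternative, folding the range check into the accessor so that lock and unlock stay balanced inside one function, might look something like the following. This is a hypothetical sketch, not the io_device API of the time: the -EOPNOTSUPP convention and the coalesced_in_zone()/coalesced_mmio_has_room() helpers are assumptions for illustration.

```c
/* Hypothetical helpers, assumed for this sketch. */
static int coalesced_in_zone(struct kvm_coalesced_mmio_dev *dev,
			     gpa_t addr, int len);
static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev);

/* Hypothetical io_device variant: ->write() both decides whether it
 * can handle the range and performs the access, so the locking is
 * balanced within the function.  Returns 0 if the access was consumed,
 * -EOPNOTSUPP if this device does not claim the address (assumption:
 * the bus would then try the next device or exit to userspace).
 */
static int coalesced_mmio_write(struct kvm_io_device *this,
				gpa_t addr, int len, const void *val)
{
	struct kvm_coalesced_mmio_dev *dev =
		(struct kvm_coalesced_mmio_dev *)this->private;
	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
	int ret = -EOPNOTSUPP;

	spin_lock(&dev->lock);
	if (coalesced_in_zone(dev, addr, len) &&
	    coalesced_mmio_has_room(dev)) {
		/* Same ring append as the patch, now fully under the lock. */
		ring->coalesced_mmio[ring->last].phys_addr = addr;
		ring->coalesced_mmio[ring->last].len = len;
		memcpy(ring->coalesced_mmio[ring->last].data, val, len);
		smp_wmb();
		ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
		ret = 0;
	}
	spin_unlock(&dev->lock);
	return ret;
}
```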
Marcelo Tosatti wrote:
> On Sun, May 31, 2009 at 03:14:36PM +0300, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> Move coalesced_mmio locking to its own device, instead of relying on
>>> kvm->lock.
>>>
>>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>
>>> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
>>> ===================================================================
>>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>>> @@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
>>>  	if (!is_write)
>>>  		return 0;
>>>
>>> -	/* kvm->lock is taken by the caller and must be not released before
>>> -	 * dev.read/write
>>> -	 */
>>> +	spin_lock(&dev->lock);
>>>
>> This unbalanced locking is still very displeasing. At a minimum you
>> need a sparse annotation to indicate it.
>>
>> But I think it really indicates a problem with the io_device API.
>>
>> Potential solutions:
>> - fold in_range() into ->write and ->read. Make those functions
>>   responsible for both determining whether they can handle the range
>>   and performing the I/O.
>> - have a separate rwlock for the device list.
>
> IMO the problem is the coalesced_mmio device. The unbalanced locking is
> a result of the abuse of the in_range() and read/write() methods.

Okay, the penny has dropped. I understand now.

> Normally you'd expect parallel accesses to in_range() to be allowed,
> since it's just checking whether (aha) the access is in range, returning
> a pointer to the device if positive. read/write() are the ones that need
> serialization, since they touch the device's internal state.
>
> coalesced_mmio abuses in_range() to do more than it should.
>
> Ideally we should fix coalesced_mmio, but I'm not going to do that now
> (sorry, I'm not confident in changing it without seeing it go through
> intense torture testing).

It's not trivial since it's userspace that clears the ring, and we can't
wait on userspace.

> That said, is a sparse annotation enough to convince you?

Let me have a look at fixing coalesced_mmio first. We might allow
->write to fail, causing a fallback to userspace. Or we could fail if
n_avail < MAX_VCPUS, so even the worst-case race leaves us one entry.
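The second option, leaving KVM_MAX_VCPUS entries of headroom in the ring, could be expressed as a helper along these lines. This is a sketch only (it is also the coalesced_mmio_has_room() assumed in the earlier sketch); the name and the occupancy arithmetic are assumptions, written so the modulo is correct for any ring size.

```c
/* Sketch: consider the ring full while fewer than KVM_MAX_VCPUS free
 * entries remain.  Even if every vcpu races past the check at once
 * (the worst case), each still has a reserved slot, so no entry is
 * overwritten before userspace drains the ring.
 */
static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev)
{
	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
	u32 avail = (ring->first + KVM_COALESCED_MMIO_MAX - ring->last - 1)
			% KVM_COALESCED_MMIO_MAX;

	return avail >= KVM_MAX_VCPUS;
}
```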
This fixes a deadlock reported by Alex Williamson, while at the same time making it easier to allow PIO/MMIO regions to be registered/unregistered while a guest is alive.
Marcelo Tosatti wrote:
> This fixes a deadlock reported by Alex Williamson, while at the same
> time making it easier to allow PIO/MMIO regions to be
> registered/unregistered while a guest is alive.

Applied all, thanks. I also changed the coalesced_mmio overflow check
to account for KVM_MAX_VCPUS.
Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
+++ kvm-irqlock/virt/kvm/coalesced_mmio.c
@@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
 	if (!is_write)
 		return 0;
 
-	/* kvm->lock is taken by the caller and must be not released before
-	 * dev.read/write
-	 */
+	spin_lock(&dev->lock);
 
 	/* Are we able to batch it ? */
@@ -41,7 +39,7 @@ static int coalesced_mmio_in_range(struc
 						KVM_COALESCED_MMIO_MAX;
 	if (next == dev->kvm->coalesced_mmio_ring->first) {
 		/* full */
-		return 0;
+		goto out_denied;
 	}
 
 	/* is it in a batchable area ? */
@@ -57,6 +55,8 @@ static int coalesced_mmio_in_range(struc
 		    addr + len <= zone->addr + zone->size)
 			return 1;
 	}
+out_denied:
+	spin_unlock(&dev->lock);
 	return 0;
 }
 
@@ -67,8 +67,6 @@ static void coalesced_mmio_write(struct
 			(struct kvm_coalesced_mmio_dev*)this->private;
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
 
-	/* kvm->lock must be taken by caller before call to in_range()*/
-
 	/* copy data in first free entry of the ring */
 
 	ring->coalesced_mmio[ring->last].phys_addr = addr;
@@ -76,6 +74,7 @@ static void coalesced_mmio_write(struct
 	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
 	smp_wmb();
 	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+	spin_unlock(&dev->lock);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -90,6 +89,8 @@ int kvm_coalesced_mmio_init(struct kvm *
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
+	spin_lock_init(&dev->lock);
+
 	dev->dev.write = coalesced_mmio_write;
 	dev->dev.in_range = coalesced_mmio_in_range;
 	dev->dev.destructor = coalesced_mmio_destructor;
Index: kvm-irqlock/virt/kvm/coalesced_mmio.h
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.h
+++ kvm-irqlock/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@ struct kvm_coalesced_mmio_dev {
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	spinlock_t lock;
 	int nb_zones;
 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };
Move coalesced_mmio locking to its own device, instead of relying on
kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
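For completeness, the caller-side pattern the patch relies on looks roughly like this. It is a simplified, hypothetical sketch of io-bus dispatch (the bus structure, loop, and function name here are approximations, not the exact kvm_io_bus code of the time): the point is that a successful in_range() now returns with dev->lock held and write() drops it, so kvm->lock no longer has to be held across the pair.

```c
/* Simplified sketch of MMIO dispatch after the patch.  For the
 * coalesced device, in_range() returns 1 with dev->lock held and
 * write() releases it; kvm->lock is not needed across the two calls.
 */
static int sketch_mmio_write(struct kvm_io_bus *bus,
			     gpa_t addr, int len, const void *val)
{
	int i;

	for (i = 0; i < bus->dev_count; i++) {
		struct kvm_io_device *dev = bus->devs[i];

		if (dev->in_range(dev, addr, len, 1)) {	/* may take lock */
			dev->write(dev, addr, len, val); /* drops it */
			return 0;
		}
	}
	return -EOPNOTSUPP;	/* unhandled: exit to userspace */
}
```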