virtio: guard vring access when setting notification

Message ID	20170228152411.81609-1-cornelia.huck@de.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> Gateway: Authorized Use Only! Violators will be prosecuted for <qemu-devel@nongnu.org> from <cornelia.huck@de.ibm.com>; Tue, 28 Feb 2017 15:24:14 -0000 Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 28 Feb 2017 15:24:13 -0000 From: Cornelia Huck <cornelia.huck@de.ibm.com> To: qemu-devel@nongnu.org Date: Tue, 28 Feb 2017 16:24:11 +0100 Message-Id: <20170228152411.81609-1-cornelia.huck@de.ibm.com> Subject: [Qemu-devel] [PATCH] virtio: guard vring access when setting notification Precedence: list Cc: Cornelia Huck <cornelia.huck@de.ibm.com>, pbonzini@redhat.com, borntraeger@de.ibm.com, mst@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Cornelia Huck Feb. 28, 2017, 3:24 p.m. UTC

Switching to vring caches exposed an existing bug in
virtio_queue_set_notification(): We can't access vring structures
if they have not been set up yet. This may happen, for example,
for virtio-blk devices with multiple queues: The code will try to
switch notifiers for every queue, but the guest may have only set up
a subset of them.

Fix this by (1) guarding access to the vring memory by checking
for vring.desc and (2) triggering an update to the vring flags
for consistency with the configured notification state once the
queue is actually configured.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 hw/virtio/virtio.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

Paolo Bonzini Feb. 28, 2017, 3:43 p.m. UTC | #1

On 28/02/2017 16:24, Cornelia Huck wrote:
> Switching to vring caches exposed an existing bug in
> virtio_queue_set_notification(): We can't access vring structures
> if they have not been set up yet. This may happen, for example,
> for virtio-blk devices with multiple queues: The code will try to
> switch notifiers for every queue, but the guest may have only set up
> a subset of them.
> 
> Fix this by (1) guarding access to the vring memory by checking
> for vring.desc and (2) triggering an update to the vring flags
> for consistency with the configured notification state once the
> queue is actually configured.
> 
> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> ---
>  hw/virtio/virtio.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e487e36..d2ecd64 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>      virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
>  }
>  
> -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +static void vring_set_notification(VirtQueue *vq, int enable)
>  {
> -    vq->notification = enable;
> -
> +    if (!vq->vring.desc) {
> +        return;
> +    }
>      rcu_read_lock();
>      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>          vring_set_avail_event(vq, vring_avail_idx(vq));
> @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
>      rcu_read_unlock();
>  }
>  
> +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +{
> +    vq->notification = enable;
> +
> +    vring_set_notification(vq, enable);
> +}
> +
>  int virtio_queue_ready(VirtQueue *vq)
>  {
>      return vq->vring.avail != 0;
> @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
>  {
>      vdev->vq[n].vring.desc = addr;
>      virtio_queue_update_rings(vdev, n);
> +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
>  }
>  
>  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
>      vdev->vq[n].vring.avail = avail;
>      vdev->vq[n].vring.used = used;
>      virtio_init_region_cache(vdev, n);
> +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
>  }
>  
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> 

Looks good, thanks.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Michael S. Tsirkin March 1, 2017, 3:55 p.m. UTC | #2

On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote:
> Switching to vring caches exposed an existing bug in
> virtio_queue_set_notification(): We can't access vring structures
> if they have not been set up yet. This may happen, for example,
> for virtio-blk devices with multiple queues: The code will try to
> switch notifiers for every queue, but the guest may have only set up
> a subset of them.
> 
> Fix this by (1) guarding access to the vring memory by checking
> for vring.desc and (2) triggering an update to the vring flags
> for consistency with the configured notification state once the
> queue is actually configured.
> 
> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> ---
>  hw/virtio/virtio.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e487e36..d2ecd64 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>      virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
>  }
>  
> -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +static void vring_set_notification(VirtQueue *vq, int enable)
>  {
> -    vq->notification = enable;
> -
> +    if (!vq->vring.desc) {
> +        return;
> +    }
>      rcu_read_lock();
>      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>          vring_set_avail_event(vq, vring_avail_idx(vq));
> @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
>      rcu_read_unlock();
>  }
>  
> +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +{
> +    vq->notification = enable;
> +
> +    vring_set_notification(vq, enable);
> +}
> +
>  int virtio_queue_ready(VirtQueue *vq)
>  {
>      return vq->vring.avail != 0;
> @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
>  {
>      vdev->vq[n].vring.desc = addr;
>      virtio_queue_update_rings(vdev, n);
> +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
>  }
>  
>  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
>      vdev->vq[n].vring.avail = avail;
>      vdev->vq[n].vring.used = used;
>      virtio_init_region_cache(vdev, n);
> +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
>  }
>  
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)

There's a problem here, this violates the spec:
- for legacy devices, we shouldn't touch rings until we get a first kick
- for virtio 1 devices, we should not do it until DRIVER_OK

This is the real problem therefore: aio poll should not even
start before these events. Pls fix that and then you will not
need to call vring_set_notification from set rings.


> -- 
> 2.8.4
>

Cornelia Huck March 1, 2017, 4:15 p.m. UTC | #3

On Wed, 1 Mar 2017 17:55:24 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote:
> > Switching to vring caches exposed an existing bug in
> > virtio_queue_set_notification(): We can't access vring structures
> > if they have not been set up yet. This may happen, for example,
> > for virtio-blk devices with multiple queues: The code will try to
> > switch notifiers for every queue, but the guest may have only set up
> > a subset of them.
> > 
> > Fix this by (1) guarding access to the vring memory by checking
> > for vring.desc and (2) triggering an update to the vring flags
> > for consistency with the configured notification state once the
> > queue is actually configured.
> > 
> > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > ---
> >  hw/virtio/virtio.c | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index e487e36..d2ecd64 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> >      virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
> >  }
> >  
> > -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > +static void vring_set_notification(VirtQueue *vq, int enable)
> >  {
> > -    vq->notification = enable;
> > -
> > +    if (!vq->vring.desc) {
> > +        return;
> > +    }
> >      rcu_read_lock();
> >      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >          vring_set_avail_event(vq, vring_avail_idx(vq));
> > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >      rcu_read_unlock();
> >  }
> >  
> > +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > +{
> > +    vq->notification = enable;
> > +
> > +    vring_set_notification(vq, enable);
> > +}
> > +
> >  int virtio_queue_ready(VirtQueue *vq)
> >  {
> >      return vq->vring.avail != 0;
> > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
> >  {
> >      vdev->vq[n].vring.desc = addr;
> >      virtio_queue_update_rings(vdev, n);
> > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> >  }
> >  
> >  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> >      vdev->vq[n].vring.avail = avail;
> >      vdev->vq[n].vring.used = used;
> >      virtio_init_region_cache(vdev, n);
> > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> >  }
> >  
> >  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> 
> There's a problem here, this violates the spec:
> - for legacy devices, we shouldn't touch rings until we get a first kick
> - for virtio 1 devices, we should not do it until DRIVER_OK
> 
> This is the real problem therefore: aio poll should not even
> start before these events. Pls fix that and then you will not
> need to call vring_set_notification from set rings.

Hooking into set_status for virtio-1 devices should not be a problem.

For legacy, we probably need to track and do a delayed switch.  Let me
see what I can come up with.

Michael S. Tsirkin March 1, 2017, 4:50 p.m. UTC | #4

On Wed, Mar 01, 2017 at 05:15:54PM +0100, Cornelia Huck wrote:
> On Wed, 1 Mar 2017 17:55:24 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote:
> > > Switching to vring caches exposed an existing bug in
> > > virtio_queue_set_notification(): We can't access vring structures
> > > if they have not been set up yet. This may happen, for example,
> > > for virtio-blk devices with multiple queues: The code will try to
> > > switch notifiers for every queue, but the guest may have only set up
> > > a subset of them.
> > > 
> > > Fix this by (1) guarding access to the vring memory by checking
> > > for vring.desc and (2) triggering an update to the vring flags
> > > for consistency with the configured notification state once the
> > > queue is actually configured.
> > > 
> > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > ---
> > >  hw/virtio/virtio.c | 16 +++++++++++++---
> > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > index e487e36..d2ecd64 100644
> > > --- a/hw/virtio/virtio.c
> > > +++ b/hw/virtio/virtio.c
> > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> > >      virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
> > >  }
> > >  
> > > -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > +static void vring_set_notification(VirtQueue *vq, int enable)
> > >  {
> > > -    vq->notification = enable;
> > > -
> > > +    if (!vq->vring.desc) {
> > > +        return;
> > > +    }
> > >      rcu_read_lock();
> > >      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> > >          vring_set_avail_event(vq, vring_avail_idx(vq));
> > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > >      rcu_read_unlock();
> > >  }
> > >  
> > > +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > +{
> > > +    vq->notification = enable;
> > > +
> > > +    vring_set_notification(vq, enable);
> > > +}
> > > +
> > >  int virtio_queue_ready(VirtQueue *vq)
> > >  {
> > >      return vq->vring.avail != 0;
> > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
> > >  {
> > >      vdev->vq[n].vring.desc = addr;
> > >      virtio_queue_update_rings(vdev, n);
> > > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> > >  }
> > >  
> > >  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> > >      vdev->vq[n].vring.avail = avail;
> > >      vdev->vq[n].vring.used = used;
> > >      virtio_init_region_cache(vdev, n);
> > > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> > >  }
> > >  
> > >  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> > 
> > There's a problem here, this violates the spec:
> > - for legacy devices, we shouldn't touch rings until we get a first kick
> > - for virtio 1 devices, we should not do it until DRIVER_OK
> > 
> > This is the real problem therefore: aio poll should not even
> > start before these events. Pls fix that and then you will not
> > need to call vring_set_notification from set rings.
> 
> Hooking into set_status for virtio-1 devices should not be a problem.
> 
> For legacy, we probably need to track and do a delayed switch.  Let me
> see what I can come up with.


Yes all these callbacks complicate code to the point it's barely
readable. I really don't see why are we poking at device beforehand at
all. IMHO this is closer to the root of the problem. Don't do it.  One
way to look at it is to say that we start aio too early. Do it at
driver_ok for virtio 1 and on kick for virtio 0 and problems will go
away.

In fact, is there even a need to call vring_set_notification that early?
queues are initialized by guest to 0 so you get a notification on first
buffer unconditionally.

Would just
+    if (!vq->vring.desc) {
+        return;
+    }
be enough?

Cornelia Huck March 1, 2017, 5 p.m. UTC | #5

On Wed, 1 Mar 2017 18:50:34 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Mar 01, 2017 at 05:15:54PM +0100, Cornelia Huck wrote:
> > On Wed, 1 Mar 2017 17:55:24 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote:
> > > > Switching to vring caches exposed an existing bug in
> > > > virtio_queue_set_notification(): We can't access vring structures
> > > > if they have not been set up yet. This may happen, for example,
> > > > for virtio-blk devices with multiple queues: The code will try to
> > > > switch notifiers for every queue, but the guest may have only set up
> > > > a subset of them.
> > > > 
> > > > Fix this by (1) guarding access to the vring memory by checking
> > > > for vring.desc and (2) triggering an update to the vring flags
> > > > for consistency with the configured notification state once the
> > > > queue is actually configured.
> > > > 
> > > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > > ---
> > > >  hw/virtio/virtio.c | 16 +++++++++++++---
> > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > index e487e36..d2ecd64 100644
> > > > --- a/hw/virtio/virtio.c
> > > > +++ b/hw/virtio/virtio.c
> > > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> > > >      virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
> > > >  }
> > > >  
> > > > -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > > +static void vring_set_notification(VirtQueue *vq, int enable)
> > > >  {
> > > > -    vq->notification = enable;
> > > > -
> > > > +    if (!vq->vring.desc) {
> > > > +        return;
> > > > +    }
> > > >      rcu_read_lock();
> > > >      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> > > >          vring_set_avail_event(vq, vring_avail_idx(vq));
> > > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > >      rcu_read_unlock();
> > > >  }
> > > >  
> > > > +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > > +{
> > > > +    vq->notification = enable;
> > > > +
> > > > +    vring_set_notification(vq, enable);
> > > > +}
> > > > +
> > > >  int virtio_queue_ready(VirtQueue *vq)
> > > >  {
> > > >      return vq->vring.avail != 0;
> > > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
> > > >  {
> > > >      vdev->vq[n].vring.desc = addr;
> > > >      virtio_queue_update_rings(vdev, n);
> > > > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> > > >  }
> > > >  
> > > >  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> > > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> > > >      vdev->vq[n].vring.avail = avail;
> > > >      vdev->vq[n].vring.used = used;
> > > >      virtio_init_region_cache(vdev, n);
> > > > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> > > >  }
> > > >  
> > > >  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> > > 
> > > There's a problem here, this violates the spec:
> > > - for legacy devices, we shouldn't touch rings until we get a first kick
> > > - for virtio 1 devices, we should not do it until DRIVER_OK
> > > 
> > > This is the real problem therefore: aio poll should not even
> > > start before these events. Pls fix that and then you will not
> > > need to call vring_set_notification from set rings.
> > 
> > Hooking into set_status for virtio-1 devices should not be a problem.
> > 
> > For legacy, we probably need to track and do a delayed switch.  Let me
> > see what I can come up with.
> 
> 
> Yes all these callbacks complicate code to the point it's barely
> readable. I really don't see why are we poking at device beforehand at
> all. IMHO this is closer to the root of the problem. Don't do it.  One
> way to look at it is to say that we start aio too early. Do it at
> driver_ok for virtio 1 and on kick for virtio 0 and problems will go
> away.

The problem exists for the case where the guest sets up only the first
queue, but the host has more than one queue. The guest setting up other
queues later is probably patholotical, but not really forbidden by the
spec (I think).

> 
> In fact, is there even a need to call vring_set_notification that early?
> queues are initialized by guest to 0 so you get a notification on first
> buffer unconditionally.
> 
> Would just
> +    if (!vq->vring.desc) {
> +        return;
> +    }
> be enough?
>

Michael S. Tsirkin March 1, 2017, 5:04 p.m. UTC | #6

On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote:
> On Wed, 1 Mar 2017 18:50:34 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Wed, Mar 01, 2017 at 05:15:54PM +0100, Cornelia Huck wrote:
> > > On Wed, 1 Mar 2017 17:55:24 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote:
> > > > > Switching to vring caches exposed an existing bug in
> > > > > virtio_queue_set_notification(): We can't access vring structures
> > > > > if they have not been set up yet. This may happen, for example,
> > > > > for virtio-blk devices with multiple queues: The code will try to
> > > > > switch notifiers for every queue, but the guest may have only set up
> > > > > a subset of them.
> > > > > 
> > > > > Fix this by (1) guarding access to the vring memory by checking
> > > > > for vring.desc and (2) triggering an update to the vring flags
> > > > > for consistency with the configured notification state once the
> > > > > queue is actually configured.
> > > > > 
> > > > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > > > ---
> > > > >  hw/virtio/virtio.c | 16 +++++++++++++---
> > > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > > index e487e36..d2ecd64 100644
> > > > > --- a/hw/virtio/virtio.c
> > > > > +++ b/hw/virtio/virtio.c
> > > > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> > > > >      virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
> > > > >  }
> > > > >  
> > > > > -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > > > +static void vring_set_notification(VirtQueue *vq, int enable)
> > > > >  {
> > > > > -    vq->notification = enable;
> > > > > -
> > > > > +    if (!vq->vring.desc) {
> > > > > +        return;
> > > > > +    }
> > > > >      rcu_read_lock();
> > > > >      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> > > > >          vring_set_avail_event(vq, vring_avail_idx(vq));
> > > > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > > >      rcu_read_unlock();
> > > > >  }
> > > > >  
> > > > > +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> > > > > +{
> > > > > +    vq->notification = enable;
> > > > > +
> > > > > +    vring_set_notification(vq, enable);
> > > > > +}
> > > > > +
> > > > >  int virtio_queue_ready(VirtQueue *vq)
> > > > >  {
> > > > >      return vq->vring.avail != 0;
> > > > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
> > > > >  {
> > > > >      vdev->vq[n].vring.desc = addr;
> > > > >      virtio_queue_update_rings(vdev, n);
> > > > > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> > > > >  }
> > > > >  
> > > > >  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> > > > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> > > > >      vdev->vq[n].vring.avail = avail;
> > > > >      vdev->vq[n].vring.used = used;
> > > > >      virtio_init_region_cache(vdev, n);
> > > > > +    vring_set_notification(&vdev->vq[n], vdev->vq[n].notification);
> > > > >  }
> > > > >  
> > > > >  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> > > > 
> > > > There's a problem here, this violates the spec:
> > > > - for legacy devices, we shouldn't touch rings until we get a first kick
> > > > - for virtio 1 devices, we should not do it until DRIVER_OK
> > > > 
> > > > This is the real problem therefore: aio poll should not even
> > > > start before these events. Pls fix that and then you will not
> > > > need to call vring_set_notification from set rings.
> > > 
> > > Hooking into set_status for virtio-1 devices should not be a problem.
> > > 
> > > For legacy, we probably need to track and do a delayed switch.  Let me
> > > see what I can come up with.
> > 
> > 
> > Yes all these callbacks complicate code to the point it's barely
> > readable. I really don't see why are we poking at device beforehand at
> > all. IMHO this is closer to the root of the problem. Don't do it.  One
> > way to look at it is to say that we start aio too early. Do it at
> > driver_ok for virtio 1 and on kick for virtio 0 and problems will go
> > away.
> 
> The problem exists for the case where the guest sets up only the first
> queue, but the host has more than one queue. The guest setting up other
> queues later is probably patholotical, but not really forbidden by the
> spec (I think).

I think it's forbidden.  spec explains how queues are set up:

1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.

And it goes on to mention a bug in legacy drivers:
Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes
even before writing the feature bits to the device.



> > 
> > In fact, is there even a need to call vring_set_notification that early?
> > queues are initialized by guest to 0 so you get a notification on first
> > buffer unconditionally.
> > 
> > Would just
> > +    if (!vq->vring.desc) {
> > +        return;
> > +    }
> > be enough?
> >

Cornelia Huck March 1, 2017, 5:14 p.m. UTC | #7

On Wed, 1 Mar 2017 19:04:56 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote:
> > On Wed, 1 Mar 2017 18:50:34 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:

> > > Yes all these callbacks complicate code to the point it's barely
> > > readable. I really don't see why are we poking at device beforehand at
> > > all. IMHO this is closer to the root of the problem. Don't do it.  One
> > > way to look at it is to say that we start aio too early. Do it at
> > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go
> > > away.
> > 
> > The problem exists for the case where the guest sets up only the first
> > queue, but the host has more than one queue. The guest setting up other
> > queues later is probably patholotical, but not really forbidden by the
> > spec (I think).
> 
> I think it's forbidden.  spec explains how queues are set up:
> 
> 1. Reset the device.
> 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> fields to check that it can support the device before accepting it.
> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> support our subset of features and the device is unusable.
> 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is “live”.

It seems I managed to read over this statement in step 7 several
times... So "make sure that we only start aio after DRIVER_OK, and
forbid any setup for !desc permanently" will take care of virtio-1
devices.

> 
> And it goes on to mention a bug in legacy drivers:
> Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes
> even before writing the feature bits to the device.

I wonder whether we need to accommodate legacy drivers doing both that
_and_ setting up virtqueues between using the device for the first time
and setting DRIVER_OK? Just using the logic of "if there were no rings
setup when we first try to start aio, ignore that queue until reset"
would cover all but those very odd drivers, and I highly doubt anybody
cares about such broken setups.

Michael S. Tsirkin March 1, 2017, 5:19 p.m. UTC | #8

On Wed, Mar 01, 2017 at 06:14:52PM +0100, Cornelia Huck wrote:
> On Wed, 1 Mar 2017 19:04:56 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote:
> > > On Wed, 1 Mar 2017 18:50:34 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > > Yes all these callbacks complicate code to the point it's barely
> > > > readable. I really don't see why are we poking at device beforehand at
> > > > all. IMHO this is closer to the root of the problem. Don't do it.  One
> > > > way to look at it is to say that we start aio too early. Do it at
> > > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go
> > > > away.
> > > 
> > > The problem exists for the case where the guest sets up only the first
> > > queue, but the host has more than one queue. The guest setting up other
> > > queues later is probably patholotical, but not really forbidden by the
> > > spec (I think).
> > 
> > I think it's forbidden.  spec explains how queues are set up:
> > 
> > 1. Reset the device.
> > 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
> > 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> > support our subset of features and the device is unusable.
> > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> > reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> 
> It seems I managed to read over this statement in step 7 several
> times... So "make sure that we only start aio after DRIVER_OK, and
> forbid any setup for !desc permanently" will take care of virtio-1
> devices.
> 
> > 
> > And it goes on to mention a bug in legacy drivers:
> > Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes
> > even before writing the feature bits to the device.
> 
> I wonder whether we need to accommodate legacy drivers doing both that
> _and_ setting up virtqueues between using the device for the first time
> and setting DRIVER_OK? Just using the logic of "if there were no rings
> setup when we first try to start aio, ignore that queue until reset"
> would cover all but those very odd drivers, and I highly doubt anybody
> cares about such broken setups.


kick on a vq before this vq is setup would be a broken thing to do.

They did this though

	for each ring
		add buffer
		kick
	driver ok

so it's a per-ring thing.

Cornelia Huck March 1, 2017, 5:43 p.m. UTC | #9

On Wed, 1 Mar 2017 19:19:40 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Mar 01, 2017 at 06:14:52PM +0100, Cornelia Huck wrote:
> > On Wed, 1 Mar 2017 19:04:56 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote:
> > > > On Wed, 1 Mar 2017 18:50:34 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > > > Yes all these callbacks complicate code to the point it's barely
> > > > > readable. I really don't see why are we poking at device beforehand at
> > > > > all. IMHO this is closer to the root of the problem. Don't do it.  One
> > > > > way to look at it is to say that we start aio too early. Do it at
> > > > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go
> > > > > away.
> > > > 
> > > > The problem exists for the case where the guest sets up only the first
> > > > queue, but the host has more than one queue. The guest setting up other
> > > > queues later is probably patholotical, but not really forbidden by the
> > > > spec (I think).
> > > 
> > > I think it's forbidden.  spec explains how queues are set up:
> > > 
> > > 1. Reset the device.
> > > 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
> > > 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> > > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> > > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> > > fields to check that it can support the device before accepting it.
> > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> > > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> > > support our subset of features and the device is unusable.
> > > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> > > reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > 
> > It seems I managed to read over this statement in step 7 several
> > times... So "make sure that we only start aio after DRIVER_OK, and
> > forbid any setup for !desc permanently" will take care of virtio-1
> > devices.
> > 
> > > 
> > > And it goes on to mention a bug in legacy drivers:
> > > Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes
> > > even before writing the feature bits to the device.
> > 
> > I wonder whether we need to accommodate legacy drivers doing both that
> > _and_ setting up virtqueues between using the device for the first time
> > and setting DRIVER_OK? Just using the logic of "if there were no rings
> > setup when we first try to start aio, ignore that queue until reset"
> > would cover all but those very odd drivers, and I highly doubt anybody
> > cares about such broken setups.
> 
> 
> kick on a vq before this vq is setup would be a broken thing to do.
> 
> They did this though
> 
> 	for each ring
> 		add buffer
> 		kick
> 	driver ok
> 
> so it's a per-ring thing.

Hm... If we just check for !desc, we should be fine after the first aio
poll, I think...

virtio: guard vring access when setting notification

Commit Message

Comments

Patch