Message ID | 20170228152411.81609-1-cornelia.huck@de.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 28/02/2017 16:24, Cornelia Huck wrote: > Switching to vring caches exposed an existing bug in > virtio_queue_set_notification(): We can't access vring structures > if they have not been set up yet. This may happen, for example, > for virtio-blk devices with multiple queues: The code will try to > switch notifiers for every queue, but the guest may have only set up > a subset of them. > > Fix this by (1) guarding access to the vring memory by checking > for vring.desc and (2) triggering an update to the vring flags > for consistency with the configured notification state once the > queue is actually configured. > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > --- > hw/virtio/virtio.c | 16 +++++++++++++--- > 1 file changed, 13 insertions(+), 3 deletions(-) > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index e487e36..d2ecd64 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) > virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); > } > > -void virtio_queue_set_notification(VirtQueue *vq, int enable) > +static void vring_set_notification(VirtQueue *vq, int enable) > { > - vq->notification = enable; > - > + if (!vq->vring.desc) { > + return; > + } > rcu_read_lock(); > if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { > vring_set_avail_event(vq, vring_avail_idx(vq)); > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) > rcu_read_unlock(); > } > > +void virtio_queue_set_notification(VirtQueue *vq, int enable) > +{ > + vq->notification = enable; > + > + vring_set_notification(vq, enable); > +} > + > int virtio_queue_ready(VirtQueue *vq) > { > return vq->vring.avail != 0; > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) > { > vdev->vq[n].vring.desc = addr; > virtio_queue_update_rings(vdev, n); > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > } > > hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, > vdev->vq[n].vring.avail = avail; > vdev->vq[n].vring.used = used; > virtio_init_region_cache(vdev, n); > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > } > > void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) > Looks good, thanks. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote: > Switching to vring caches exposed an existing bug in > virtio_queue_set_notification(): We can't access vring structures > if they have not been set up yet. This may happen, for example, > for virtio-blk devices with multiple queues: The code will try to > switch notifiers for every queue, but the guest may have only set up > a subset of them. > > Fix this by (1) guarding access to the vring memory by checking > for vring.desc and (2) triggering an update to the vring flags > for consistency with the configured notification state once the > queue is actually configured. > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > --- > hw/virtio/virtio.c | 16 +++++++++++++--- > 1 file changed, 13 insertions(+), 3 deletions(-) > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index e487e36..d2ecd64 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) > virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); > } > > -void virtio_queue_set_notification(VirtQueue *vq, int enable) > +static void vring_set_notification(VirtQueue *vq, int enable) > { > - vq->notification = enable; > - > + if (!vq->vring.desc) { > + return; > + } > rcu_read_lock(); > if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { > vring_set_avail_event(vq, vring_avail_idx(vq)); > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) > rcu_read_unlock(); > } > > +void virtio_queue_set_notification(VirtQueue *vq, int enable) > +{ > + vq->notification = enable; > + > + vring_set_notification(vq, enable); > +} > + > int virtio_queue_ready(VirtQueue *vq) > { > return vq->vring.avail != 0; > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) > { > vdev->vq[n].vring.desc = addr; > virtio_queue_update_rings(vdev, n); > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > } > > hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, > vdev->vq[n].vring.avail = avail; > vdev->vq[n].vring.used = used; > virtio_init_region_cache(vdev, n); > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > } > > void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) There's a problem here, this violates the spec: - for legacy devices, we shouldn't touch rings until we get a first kick - for virtio 1 devices, we should not do it until DRIVER_OK This is the real problem therefore: aio poll should not even start before these events. Pls fix that and then you will not need to call vring_set_notification from set rings. > -- > 2.8.4 >
On Wed, 1 Mar 2017 17:55:24 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote: > > Switching to vring caches exposed an existing bug in > > virtio_queue_set_notification(): We can't access vring structures > > if they have not been set up yet. This may happen, for example, > > for virtio-blk devices with multiple queues: The code will try to > > switch notifiers for every queue, but the guest may have only set up > > a subset of them. > > > > Fix this by (1) guarding access to the vring memory by checking > > for vring.desc and (2) triggering an update to the vring flags > > for consistency with the configured notification state once the > > queue is actually configured. > > > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > > --- > > hw/virtio/virtio.c | 16 +++++++++++++--- > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > index e487e36..d2ecd64 100644 > > --- a/hw/virtio/virtio.c > > +++ b/hw/virtio/virtio.c > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) > > virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); > > } > > > > -void virtio_queue_set_notification(VirtQueue *vq, int enable) > > +static void vring_set_notification(VirtQueue *vq, int enable) > > { > > - vq->notification = enable; > > - > > + if (!vq->vring.desc) { > > + return; > > + } > > rcu_read_lock(); > > if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { > > vring_set_avail_event(vq, vring_avail_idx(vq)); > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) > > rcu_read_unlock(); > > } > > > > +void virtio_queue_set_notification(VirtQueue *vq, int enable) > > +{ > > + vq->notification = enable; > > + > > + vring_set_notification(vq, enable); > > +} > > + > > int virtio_queue_ready(VirtQueue *vq) > > { > > return vq->vring.avail != 0; > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) > > { > > vdev->vq[n].vring.desc = addr; > > virtio_queue_update_rings(vdev, n); > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > } > > > > hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, > > vdev->vq[n].vring.avail = avail; > > vdev->vq[n].vring.used = used; > > virtio_init_region_cache(vdev, n); > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > } > > > > void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) > > There's a problem here, this violates the spec: > - for legacy devices, we shouldn't touch rings until we get a first kick > - for virtio 1 devices, we should not do it until DRIVER_OK > > This is the real problem therefore: aio poll should not even > start before these events. Pls fix that and then you will not > need to call vring_set_notification from set rings. Hooking into set_status for virtio-1 devices should not be a problem. For legacy, we probably need to track and do a delayed switch. Let me see what I can come up with.
On Wed, Mar 01, 2017 at 05:15:54PM +0100, Cornelia Huck wrote: > On Wed, 1 Mar 2017 17:55:24 +0200 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote: > > > Switching to vring caches exposed an existing bug in > > > virtio_queue_set_notification(): We can't access vring structures > > > if they have not been set up yet. This may happen, for example, > > > for virtio-blk devices with multiple queues: The code will try to > > > switch notifiers for every queue, but the guest may have only set up > > > a subset of them. > > > > > > Fix this by (1) guarding access to the vring memory by checking > > > for vring.desc and (2) triggering an update to the vring flags > > > for consistency with the configured notification state once the > > > queue is actually configured. > > > > > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > > > --- > > > hw/virtio/virtio.c | 16 +++++++++++++--- > > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > > index e487e36..d2ecd64 100644 > > > --- a/hw/virtio/virtio.c > > > +++ b/hw/virtio/virtio.c > > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) > > > virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); > > > } > > > > > > -void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > +static void vring_set_notification(VirtQueue *vq, int enable) > > > { > > > - vq->notification = enable; > > > - > > > + if (!vq->vring.desc) { > > > + return; > > > + } > > > rcu_read_lock(); > > > if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { > > > vring_set_avail_event(vq, vring_avail_idx(vq)); > > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > rcu_read_unlock(); > > > } > > > > > > +void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > +{ > > > + vq->notification = enable; > > > + > > > + vring_set_notification(vq, enable); > > > +} > > > + > > > int virtio_queue_ready(VirtQueue *vq) > > > { > > > return vq->vring.avail != 0; > > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) > > > { > > > vdev->vq[n].vring.desc = addr; > > > virtio_queue_update_rings(vdev, n); > > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > > } > > > > > > hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) > > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, > > > vdev->vq[n].vring.avail = avail; > > > vdev->vq[n].vring.used = used; > > > virtio_init_region_cache(vdev, n); > > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > > } > > > > > > void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) > > > > There's a problem here, this violates the spec: > > - for legacy devices, we shouldn't touch rings until we get a first kick > > - for virtio 1 devices, we should not do it until DRIVER_OK > > > > This is the real problem therefore: aio poll should not even > > start before these events. Pls fix that and then you will not > > need to call vring_set_notification from set rings. > > Hooking into set_status for virtio-1 devices should not be a problem. > > For legacy, we probably need to track and do a delayed switch. Let me > see what I can come up with. Yes all these callbacks complicate code to the point it's barely readable. I really don't see why are we poking at device beforehand at all. IMHO this is closer to the root of the problem. Don't do it. One way to look at it is to say that we start aio too early. Do it at driver_ok for virtio 1 and on kick for virtio 0 and problems will go away. In fact, is there even a need to call vring_set_notification that early? queues are initialized by guest to 0 so you get a notification on first buffer unconditionally. Would just + if (!vq->vring.desc) { + return; + } be enough?
On Wed, 1 Mar 2017 18:50:34 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Mar 01, 2017 at 05:15:54PM +0100, Cornelia Huck wrote: > > On Wed, 1 Mar 2017 17:55:24 +0200 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote: > > > > Switching to vring caches exposed an existing bug in > > > > virtio_queue_set_notification(): We can't access vring structures > > > > if they have not been set up yet. This may happen, for example, > > > > for virtio-blk devices with multiple queues: The code will try to > > > > switch notifiers for every queue, but the guest may have only set up > > > > a subset of them. > > > > > > > > Fix this by (1) guarding access to the vring memory by checking > > > > for vring.desc and (2) triggering an update to the vring flags > > > > for consistency with the configured notification state once the > > > > queue is actually configured. > > > > > > > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > > > > --- > > > > hw/virtio/virtio.c | 16 +++++++++++++--- > > > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > > > index e487e36..d2ecd64 100644 > > > > --- a/hw/virtio/virtio.c > > > > +++ b/hw/virtio/virtio.c > > > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) > > > > virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); > > > > } > > > > > > > > -void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > > +static void vring_set_notification(VirtQueue *vq, int enable) > > > > { > > > > - vq->notification = enable; > > > > - > > > > + if (!vq->vring.desc) { > > > > + return; > > > > + } > > > > rcu_read_lock(); > > > > if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { > > > > vring_set_avail_event(vq, vring_avail_idx(vq)); > > > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > > rcu_read_unlock(); > > > > } > > > > > > > > +void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > > +{ > > > > + vq->notification = enable; > > > > + > > > > + vring_set_notification(vq, enable); > > > > +} > > > > + > > > > int virtio_queue_ready(VirtQueue *vq) > > > > { > > > > return vq->vring.avail != 0; > > > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) > > > > { > > > > vdev->vq[n].vring.desc = addr; > > > > virtio_queue_update_rings(vdev, n); > > > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > > > } > > > > > > > > hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) > > > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, > > > > vdev->vq[n].vring.avail = avail; > > > > vdev->vq[n].vring.used = used; > > > > virtio_init_region_cache(vdev, n); > > > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > > > } > > > > > > > > void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) > > > > > > There's a problem here, this violates the spec: > > > - for legacy devices, we shouldn't touch rings until we get a first kick > > > - for virtio 1 devices, we should not do it until DRIVER_OK > > > > > > This is the real problem therefore: aio poll should not even > > > start before these events. Pls fix that and then you will not > > > need to call vring_set_notification from set rings. > > > > Hooking into set_status for virtio-1 devices should not be a problem. > > > > For legacy, we probably need to track and do a delayed switch. Let me > > see what I can come up with. > > > Yes all these callbacks complicate code to the point it's barely > readable. I really don't see why are we poking at device beforehand at > all. IMHO this is closer to the root of the problem. Don't do it. One > way to look at it is to say that we start aio too early. Do it at > driver_ok for virtio 1 and on kick for virtio 0 and problems will go > away. The problem exists for the case where the guest sets up only the first queue, but the host has more than one queue. The guest setting up other queues later is probably patholotical, but not really forbidden by the spec (I think). > > In fact, is there even a need to call vring_set_notification that early? > queues are initialized by guest to 0 so you get a notification on first > buffer unconditionally. > > Would just > + if (!vq->vring.desc) { > + return; > + } > be enough? >
On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote: > On Wed, 1 Mar 2017 18:50:34 +0200 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Wed, Mar 01, 2017 at 05:15:54PM +0100, Cornelia Huck wrote: > > > On Wed, 1 Mar 2017 17:55:24 +0200 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Tue, Feb 28, 2017 at 04:24:11PM +0100, Cornelia Huck wrote: > > > > > Switching to vring caches exposed an existing bug in > > > > > virtio_queue_set_notification(): We can't access vring structures > > > > > if they have not been set up yet. This may happen, for example, > > > > > for virtio-blk devices with multiple queues: The code will try to > > > > > switch notifiers for every queue, but the guest may have only set up > > > > > a subset of them. > > > > > > > > > > Fix this by (1) guarding access to the vring memory by checking > > > > > for vring.desc and (2) triggering an update to the vring flags > > > > > for consistency with the configured notification state once the > > > > > queue is actually configured. > > > > > > > > > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > > > > > --- > > > > > hw/virtio/virtio.c | 16 +++++++++++++--- > > > > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > > > > index e487e36..d2ecd64 100644 > > > > > --- a/hw/virtio/virtio.c > > > > > +++ b/hw/virtio/virtio.c > > > > > @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) > > > > > virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); > > > > > } > > > > > > > > > > -void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > > > +static void vring_set_notification(VirtQueue *vq, int enable) > > > > > { > > > > > - vq->notification = enable; > > > > > - > > > > > + if (!vq->vring.desc) { > > > > > + return; > > > > > + } > > > > > rcu_read_lock(); > > > > > if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { > > > > > vring_set_avail_event(vq, vring_avail_idx(vq)); > > > > > @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > > > rcu_read_unlock(); > > > > > } > > > > > > > > > > +void virtio_queue_set_notification(VirtQueue *vq, int enable) > > > > > +{ > > > > > + vq->notification = enable; > > > > > + > > > > > + vring_set_notification(vq, enable); > > > > > +} > > > > > + > > > > > int virtio_queue_ready(VirtQueue *vq) > > > > > { > > > > > return vq->vring.avail != 0; > > > > > @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) > > > > > { > > > > > vdev->vq[n].vring.desc = addr; > > > > > virtio_queue_update_rings(vdev, n); > > > > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > > > > } > > > > > > > > > > hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) > > > > > @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, > > > > > vdev->vq[n].vring.avail = avail; > > > > > vdev->vq[n].vring.used = used; > > > > > virtio_init_region_cache(vdev, n); > > > > > + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); > > > > > } > > > > > > > > > > void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) > > > > > > > > There's a problem here, this violates the spec: > > > > - for legacy devices, we shouldn't touch rings until we get a first kick > > > > - for virtio 1 devices, we should not do it until DRIVER_OK > > > > > > > > This is the real problem therefore: aio poll should not even > > > > start before these events. Pls fix that and then you will not > > > > need to call vring_set_notification from set rings. > > > > > > Hooking into set_status for virtio-1 devices should not be a problem. > > > > > > For legacy, we probably need to track and do a delayed switch. Let me > > > see what I can come up with. > > > > > > Yes all these callbacks complicate code to the point it's barely > > readable. I really don't see why are we poking at device beforehand at > > all. IMHO this is closer to the root of the problem. Don't do it. One > > way to look at it is to say that we start aio too early. Do it at > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go > > away. > > The problem exists for the case where the guest sets up only the first > queue, but the host has more than one queue. The guest setting up other > queues later is probably patholotical, but not really forbidden by the > spec (I think). I think it's forbidden. spec explains how queues are set up: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. And it goes on to mention a bug in legacy drivers: Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes even before writing the feature bits to the device. > > > > In fact, is there even a need to call vring_set_notification that early? > > queues are initialized by guest to 0 so you get a notification on first > > buffer unconditionally. > > > > Would just > > + if (!vq->vring.desc) { > > + return; > > + } > > be enough? > >
On Wed, 1 Mar 2017 19:04:56 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote: > > On Wed, 1 Mar 2017 18:50:34 +0200 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > Yes all these callbacks complicate code to the point it's barely > > > readable. I really don't see why are we poking at device beforehand at > > > all. IMHO this is closer to the root of the problem. Don't do it. One > > > way to look at it is to say that we start aio too early. Do it at > > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go > > > away. > > > > The problem exists for the case where the guest sets up only the first > > queue, but the host has more than one queue. The guest setting up other > > queues later is probably patholotical, but not really forbidden by the > > spec (I think). > > I think it's forbidden. spec explains how queues are set up: > > 1. Reset the device. > 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device. > 3. Set the DRIVER status bit: the guest OS knows how to drive the device. > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration > fields to check that it can support the device before accepting it. > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not > support our subset of features and the device is unusable. > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, > reading and possibly writing the device’s virtio configuration space, and population of virtqueues. > 8. Set the DRIVER_OK status bit. At this point the device is “live”. It seems I managed to read over this statement in step 7 several times... So "make sure that we only start aio after DRIVER_OK, and forbid any setup for !desc permanently" will take care of virtio-1 devices. > > And it goes on to mention a bug in legacy drivers: > Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes > even before writing the feature bits to the device. I wonder whether we need to accommodate legacy drivers doing both that _and_ setting up virtqueues between using the device for the first time and setting DRIVER_OK? Just using the logic of "if there were no rings setup when we first try to start aio, ignore that queue until reset" would cover all but those very odd drivers, and I highly doubt anybody cares about such broken setups.
On Wed, Mar 01, 2017 at 06:14:52PM +0100, Cornelia Huck wrote: > On Wed, 1 Mar 2017 19:04:56 +0200 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote: > > > On Wed, 1 Mar 2017 18:50:34 +0200 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > Yes all these callbacks complicate code to the point it's barely > > > > readable. I really don't see why are we poking at device beforehand at > > > > all. IMHO this is closer to the root of the problem. Don't do it. One > > > > way to look at it is to say that we start aio too early. Do it at > > > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go > > > > away. > > > > > > The problem exists for the case where the guest sets up only the first > > > queue, but the host has more than one queue. The guest setting up other > > > queues later is probably patholotical, but not really forbidden by the > > > spec (I think). > > > > I think it's forbidden. spec explains how queues are set up: > > > > 1. Reset the device. > > 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device. > > 3. Set the DRIVER status bit: the guest OS knows how to drive the device. > > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the > > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration > > fields to check that it can support the device before accepting it. > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. > > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not > > support our subset of features and the device is unusable. > > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, > > reading and possibly writing the device’s virtio configuration space, and population of virtqueues. > > 8. Set the DRIVER_OK status bit. At this point the device is “live”. > > It seems I managed to read over this statement in step 7 several > times... So "make sure that we only start aio after DRIVER_OK, and > forbid any setup for !desc permanently" will take care of virtio-1 > devices. > > > > > And it goes on to mention a bug in legacy drivers: > > Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes > > even before writing the feature bits to the device. > > I wonder whether we need to accommodate legacy drivers doing both that > _and_ setting up virtqueues between using the device for the first time > and setting DRIVER_OK? Just using the logic of "if there were no rings > setup when we first try to start aio, ignore that queue until reset" > would cover all but those very odd drivers, and I highly doubt anybody > cares about such broken setups. kick on a vq before this vq is setup would be a broken thing to do. They did this though for each ring add buffer kick driver ok so it's a per-ring thing.
On Wed, 1 Mar 2017 19:19:40 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Mar 01, 2017 at 06:14:52PM +0100, Cornelia Huck wrote: > > On Wed, 1 Mar 2017 19:04:56 +0200 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Wed, Mar 01, 2017 at 06:00:44PM +0100, Cornelia Huck wrote: > > > > On Wed, 1 Mar 2017 18:50:34 +0200 > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > Yes all these callbacks complicate code to the point it's barely > > > > > readable. I really don't see why are we poking at device beforehand at > > > > > all. IMHO this is closer to the root of the problem. Don't do it. One > > > > > way to look at it is to say that we start aio too early. Do it at > > > > > driver_ok for virtio 1 and on kick for virtio 0 and problems will go > > > > > away. > > > > > > > > The problem exists for the case where the guest sets up only the first > > > > queue, but the host has more than one queue. The guest setting up other > > > > queues later is probably patholotical, but not really forbidden by the > > > > spec (I think). > > > > > > I think it's forbidden. spec explains how queues are set up: > > > > > > 1. Reset the device. > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device. > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the device. > > > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the > > > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration > > > fields to check that it can support the device before accepting it. > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not > > > support our subset of features and the device is unusable. > > > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, > > > reading and possibly writing the device’s virtio configuration space, and population of virtqueues. > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”. > > > > It seems I managed to read over this statement in step 7 several > > times... So "make sure that we only start aio after DRIVER_OK, and > > forbid any setup for !desc permanently" will take care of virtio-1 > > devices. > > > > > > > > And it goes on to mention a bug in legacy drivers: > > > Legacy driver implementations often used the device before setting the DRIVER_OK bit, and sometimes > > > even before writing the feature bits to the device. > > > > I wonder whether we need to accommodate legacy drivers doing both that > > _and_ setting up virtqueues between using the device for the first time > > and setting DRIVER_OK? Just using the logic of "if there were no rings > > setup when we first try to start aio, ignore that queue until reset" > > would cover all but those very odd drivers, and I highly doubt anybody > > cares about such broken setups. > > > kick on a vq before this vq is setup would be a broken thing to do. > > They did this though > > for each ring > add buffer > kick > driver ok > > so it's a per-ring thing. Hm... If we just check for !desc, we should be fine after the first aio poll, I think...
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index e487e36..d2ecd64 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -284,10 +284,11 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val) virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val); } -void virtio_queue_set_notification(VirtQueue *vq, int enable) +static void vring_set_notification(VirtQueue *vq, int enable) { - vq->notification = enable; - + if (!vq->vring.desc) { + return; + } rcu_read_lock(); if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) { vring_set_avail_event(vq, vring_avail_idx(vq)); @@ -303,6 +304,13 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable) rcu_read_unlock(); } +void virtio_queue_set_notification(VirtQueue *vq, int enable) +{ + vq->notification = enable; + + vring_set_notification(vq, enable); +} + int virtio_queue_ready(VirtQueue *vq) { return vq->vring.avail != 0; @@ -1348,6 +1356,7 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr) { vdev->vq[n].vring.desc = addr; virtio_queue_update_rings(vdev, n); + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); } hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) @@ -1362,6 +1371,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, vdev->vq[n].vring.avail = avail; vdev->vq[n].vring.used = used; virtio_init_region_cache(vdev, n); + vring_set_notification(&vdev->vq[n], vdev->vq[n].notification); } void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
Switching to vring caches exposed an existing bug in virtio_queue_set_notification(): We can't access vring structures if they have not been set up yet. This may happen, for example, for virtio-blk devices with multiple queues: The code will try to switch notifiers for every queue, but the guest may have only set up a subset of them. Fix this by (1) guarding access to the vring memory by checking for vring.desc and (2) triggering an update to the vring flags for consistency with the configured notification state once the queue is actually configured. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> --- hw/virtio/virtio.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)