
[1/3] kvm tools: Make virtio-net kvm__irq_line thread safe

Message ID 1304058985-13833-1-git-send-email-asias.hejun@gmail.com (mailing list archive)
State New, archived

Commit Message

Asias He April 29, 2011, 6:36 a.m. UTC
This patch fixes a random virtio-net stall:

host $ scp guest:/root/big.guest .
big.guest 42%  440MB  67.7KB/s - stalled -

Signed-off-by: Asias He <asias.hejun@gmail.com>
---
 tools/kvm/virtio-net.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

Comments

Ingo Molnar April 29, 2011, 6:46 a.m. UTC | #1
* Asias He <asias.hejun@gmail.com> wrote:

> This patch fixes a random virtio-net stall:
> 
> host $ scp guest:/root/big.guest .
> big.guest 42%  440MB  67.7KB/s - stalled -
> 
> Signed-off-by: Asias He <asias.hejun@gmail.com>
> ---
>  tools/kvm/virtio-net.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
> index 58b3de4..efe06cb 100644
> --- a/tools/kvm/virtio-net.c
> +++ b/tools/kvm/virtio-net.c
> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param)
>  		head = virt_queue__get_iov(vq, iov, &out, &in, self);
>  		len = readv(net_device.tap_fd, iov, in);
>  		virt_queue__set_used_elem(vq, head, len);
> -	}
>  
> -	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +		/* We should interrupt guest right now, otherwise latency is huge. */
> +		mutex_lock(&net_device.mutex);
> +		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +		mutex_unlock(&net_device.mutex);
> +	}
>  }
>  
>  static void virtio_net_tx_callback(struct kvm *self, void *param)
> @@ -98,7 +101,9 @@ static void virtio_net_tx_callback(struct kvm *self, void *param)
>  		virt_queue__set_used_elem(vq, head, len);
>  	}
>  
> +	mutex_lock(&net_device.mutex);
>  	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +	mutex_unlock(&net_device.mutex);

I do not find this explanation adequate either. This file too could use some
comments about what the SMP behavior looks like.

Thanks,

	Ingo
Pekka Enberg April 29, 2011, 6:52 a.m. UTC | #2
On Fri, Apr 29, 2011 at 9:36 AM, Asias He <asias.hejun@gmail.com> wrote:
> This patch fixes a random virtio-net stall:
>
> host $ scp guest:/root/big.guest .
> big.guest 42%  440MB  67.7KB/s - stalled -
>
> Signed-off-by: Asias He <asias.hejun@gmail.com>
> ---
>  tools/kvm/virtio-net.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
> index 58b3de4..efe06cb 100644
> --- a/tools/kvm/virtio-net.c
> +++ b/tools/kvm/virtio-net.c
> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param)
>                head = virt_queue__get_iov(vq, iov, &out, &in, self);
>                len = readv(net_device.tap_fd, iov, in);
>                virt_queue__set_used_elem(vq, head, len);
> -       }
>
> -       kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +               /* We should interrupt guest right now, otherwise latency is huge. */
> +               mutex_lock(&net_device.mutex);
> +               kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +               mutex_unlock(&net_device.mutex);
> +       }
>  }
>
>  static void virtio_net_tx_callback(struct kvm *self, void *param)

Please make that IRQ latency fix a separate patch. Don't we need to do
it for TX path as well, btw?
Asias He April 29, 2011, 7:13 a.m. UTC | #3
On 04/29/2011 02:52 PM, Pekka Enberg wrote:
> Please make that IRQ latency fix a separate patch. Don't we need to do
> it for TX path as well, btw?
> 

No. We only need it in RX path. Sasha's threadpool patch breaks this.
I'm just moving it back.
Pekka Enberg April 29, 2011, 7:15 a.m. UTC | #4
On Fri, Apr 29, 2011 at 10:13 AM, Asias He <asias.hejun@gmail.com> wrote:
> No. We only need it in RX path. Sasha's threadpool patch breaks this.
> I'm just moving it back.

OK, cool! Can you send the fix as a separate patch?
Ingo Molnar April 29, 2011, 7:30 a.m. UTC | #5
* Asias He <asias.hejun@gmail.com> wrote:

> This patch fixes a random virtio-net stall:
> 
> host $ scp guest:/root/big.guest .
> big.guest 42%  440MB  67.7KB/s - stalled -

> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param)
>  		head = virt_queue__get_iov(vq, iov, &out, &in, self);
>  		len = readv(net_device.tap_fd, iov, in);
>  		virt_queue__set_used_elem(vq, head, len);
> -	}
>  
> -	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +		/* We should interrupt guest right now, otherwise latency is huge. */
> +		mutex_lock(&net_device.mutex);
> +		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +		mutex_unlock(&net_device.mutex);

The movement of the irq-line assertion call (which fixes the hangs) should be 
separated from the mutex_lock() additions (which look unnecessary).

Could you please check whether the addition of the mutex really is necessary to 
solve the hang/stall?

Thanks,

	Ingo
Asias He April 29, 2011, 7:47 a.m. UTC | #6
On 04/29/2011 03:30 PM, Ingo Molnar wrote:
> 
> * Asias He <asias.hejun@gmail.com> wrote:
> 
>> This patch fixes a random virtio-net stall:
>>
>> host $ scp guest:/root/big.guest .
>> big.guest 42%  440MB  67.7KB/s - stalled -
> 
>> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param)
>>  		head = virt_queue__get_iov(vq, iov, &out, &in, self);
>>  		len = readv(net_device.tap_fd, iov, in);
>>  		virt_queue__set_used_elem(vq, head, len);
>> -	}
>>  
>> -	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
>> +		/* We should interrupt guest right now, otherwise latency is huge. */
>> +		mutex_lock(&net_device.mutex);
>> +		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
>> +		mutex_unlock(&net_device.mutex);
> 
> The movement of the irq-line assertion call (which fixes the hangs) should be 
> separated from the mutex_lock() additions (which look unnecessary).

I've sent out another patch to move this irq assertion.

> 
> Could you please check whether the addition of the mutex really is necessary to 
> solve the hang/stall?


Without the mutex around the kvm__irq_line, I am still seeing the
hangs (with the irq assertion movement patch).

I can reproduce this by:

host$ scp guest:/root/big.guest .

guest$ ping host

Not sure if the mutex thing is necessary.

> 
> Thanks,
> 
> 	Ingo
>
Sasha Levin April 29, 2011, 7:53 a.m. UTC | #7
On Fri, 2011-04-29 at 15:13 +0800, Asias He wrote:
> On 04/29/2011 02:52 PM, Pekka Enberg wrote:
> > Please make that IRQ latency fix a separate patch. Don't we need to do
> > it for TX path as well, btw?
> > 
> 
> No. We only need it in RX path. Sasha's threadpool patch breaks this.
> I'm just moving it back.
> 

I've moved the kvm__irq_line() call out because what would happen
sometimes is the following:
 - Call kvm__irq_line() to signal completion.
 - virt_queue__available() would return true.
 - readv() call blocks.

I figured it happens because we catch the virt_queue while it's being
updated by the guest and use an erroneous state of it.
Cyrill Gorcunov April 29, 2011, 10:02 a.m. UTC | #8
On Fri, Apr 29, 2011 at 11:53 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Fri, 2011-04-29 at 15:13 +0800, Asias He wrote:
>> On 04/29/2011 02:52 PM, Pekka Enberg wrote:
>> > Please make that IRQ latency fix a separate patch. Don't we need to do
>> > it for TX path as well, btw?
>> >
>>
>> No. We only need it in RX path. Sasha's threadpool patch breaks this.
>> I'm just moving it back.
>>
>
> I've moved the kvm__irq_line() call out because what would happen
> sometimes is the following:
>  - Call kvm__irq_line() to signal completion.
>  - virt_queue__available() would return true.
>  - readv() call blocks.
>
> I figured it happens because we catch the virt_queue while it's being
> updated by the guest and use an erroneous state of it.
>
> --
>
> Sasha.
>
>

So, if I understand things correctly -- making virtio devices use separate
irqs exposed some race conditions on read/write operations between host and
guest, and adding the thread pool revealed it, right? (Because previously we
were doing all the work inside the i/o path on the guest side.)
Ingo Molnar April 29, 2011, 10:22 a.m. UTC | #9
* Sasha Levin <levinsasha928@gmail.com> wrote:

> On Fri, 2011-04-29 at 15:13 +0800, Asias He wrote:
> > On 04/29/2011 02:52 PM, Pekka Enberg wrote:
> > > Please make that IRQ latency fix a separate patch. Don't we need to do
> > > it for TX path as well, btw?
> > > 
> > 
> > No. We only need it in RX path. Sasha's threadpool patch breaks this.
> > I'm just moving it back.
> > 
> 
> I've moved the kvm__irq_line() call out because what would happen
> sometimes is the following:
>  - Call kvm__irq_line() to signal completion.
>  - virt_queue__available() would return true.
>  - readv() call blocks.
> 
> I figured it happens because we catch the virt_queue while it's being
> updated by the guest and use an erroneous state of it.

How can the vring.avail flag be 'erroneous'?

Can virt_queue__available() really be true while it's not really true? How is 
that possible and what are the semantics / expected behavior of interaction 
with the virtual ring?

It's this code AFAICS:

        if (!vq->vring.avail)
                return 0;
        return vq->vring.avail->idx !=  vq->last_avail_idx;

And it's used like this:

static void virtio_net_rx_callback(struct kvm *self, void *param)
{
        struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
        struct virt_queue *vq;
        uint16_t out, in;
        uint16_t head;
        int len;

        vq = param;

        while (virt_queue__available(vq)) {
                head = virt_queue__get_iov(vq, iov, &out, &in, self);
                len = readv(net_device.tap_fd, iov, in);
                virt_queue__set_used_elem(vq, head, len);
        }

        kvm__irq_line(self, VIRTIO_NET_IRQ, 1);

the guest kernel has access to this same virt_queue structure and updates it -
possibly in parallel to the virtio-net thread.

( One thing that sticks out is that there are no memory barriers used -
  although very likely they are not needed right now and are not related to
  the stalls. )

the rx_callback() is the one where the host receives packets from the TAP device,
right? I.e. the host receives brand new packets here and forwards them to the
guest.

If that is so then indeed the right approach might be to signal the guest every 
time we manage to readv() something - there might not come any other packet for 
a long time. But the reason is not some 'erroneous state' - all state is 
perfectly fine, this is simply a basic property of the event loop that the rx 
thread implements ...

And no, the mutex lock is not needed around the guest signalling - why would it 
be needed?
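
( For context: kvm__irq_line() boils down to a single KVM_IRQ_LINE ioctl into
  the kernel - roughly the sketch below, which is an assumption about what
  tools/kvm/kvm.c does rather than verbatim code - so there is no user-space
  state that a mutex would protect:

	void kvm__irq_line(struct kvm *self, int irq, int level)
	{
		struct kvm_irq_level irq_level = {
			.irq	= irq,
			.level	= level,
		};

		/* One hop into the kernel; KVM serializes this internally: */
		if (ioctl(self->vm_fd, KVM_IRQ_LINE, &irq_level) < 0)
			perror("KVM_IRQ_LINE");
	}
)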

The thing is, the queueing and state machine code within virtio-net.c is rather
hard to read - do people actually understand it? It took me several tries until
i convinced myself that with this fix it would work. The whole file does not
have a *single* comment!

So please document it much better, document the state machine, document the 
interaction between the rx and tx threads and the rx and tx queue, document the 
interaction between guest and host, outline what happens when any of the queues 
gets full, etc. etc. Parallel state machines are exceedingly complex and 
subtle, not documenting them is one of the seven deadly sins ...

Thanks,

	Ingo
Ingo Molnar April 29, 2011, 10:32 a.m. UTC | #10
* Asias He <asias.hejun@gmail.com> wrote:

> On 04/29/2011 03:30 PM, Ingo Molnar wrote:
> > 
> > * Asias He <asias.hejun@gmail.com> wrote:
> > 
> >> This patch fixes a random virtio-net stall:
> >>
> >> host $ scp guest:/root/big.guest .
> >> big.guest 42%  440MB  67.7KB/s - stalled -
> > 
> >> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param)
> >>  		head = virt_queue__get_iov(vq, iov, &out, &in, self);
> >>  		len = readv(net_device.tap_fd, iov, in);
> >>  		virt_queue__set_used_elem(vq, head, len);
> >> -	}
> >>  
> >> -	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> >> +		/* We should interrupt guest right now, otherwise latency is huge. */
> >> +		mutex_lock(&net_device.mutex);
> >> +		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> >> +		mutex_unlock(&net_device.mutex);
> > 
> > The movement of the irq-line assertion call (which fixes the hangs) should be 
> > separated from the mutex_lock() additions (which look unnecessary).
> 
> I've sent out another patch to move this irq assertion.
> 
> > 
> > Could you please check whether the addition of the mutex really is necessary to 
> > solve the hang/stall?
> 
> 
> Without the mutex around the kvm__irq_line, I am still seeing the
> hangs (with the irq assertion movement patch).

since the mutex has no effect on the other party that modifies the virtual 
queue (the guest OS), this at best only changes timings and hides the bug.

The queues are private to the worker thread they belong to - and are shared 
with the guest OS. So we need to document the queueing protocol, who does what, 
when, and then maybe it will become more obvious why the hangs are occurring.

For example, what happens when the RX queue gets full? The rx thread:

        while (virt_queue__available(vq)) {

drops out of the TAP-read() loop and sleeps indefinitely. It will only be woken
up if the guest signals that the queue is available again (there are free slots
in it):

        case VIRTIO_PCI_QUEUE_NOTIFY: {
                uint16_t queue_index;
                queue_index     = ioport__read16(data);
                virtio_net_handle_callback(self, queue_index);
                break;
        }

...
static void virtio_net_handle_callback(struct kvm *self, uint16_t queue_index)
{
        thread_pool__signal_work(net_device.jobs[queue_index]);
}

with queue_index == the RX vring.

Now if this is correct then virtio_net_rx_callback() is only ever called
with a ring that is not full. You could test this by adding an assert like this
to the head of that function:

	BUG_ON(!virt_queue__available(vq))

does this BUG_ON() ever trigger?

Pick up BUG_ON() from tools/perf/:

  tools/perf/util/include/linux/kernel.h:#define BUG_ON(cond) assert(!(cond))
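
With the callback as quoted earlier, that would look something like this (a
sketch - the BUG_ON() is the only change):

	static void virtio_net_rx_callback(struct kvm *self, void *param)
	{
		struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
		struct virt_queue *vq;
		uint16_t out, in;
		uint16_t head;
		int len;

		vq = param;

		/* If the notify -> signal_work path is correct, this must hold on entry: */
		BUG_ON(!virt_queue__available(vq));

		while (virt_queue__available(vq)) {
			head = virt_queue__get_iov(vq, iov, &out, &in, self);
			len = readv(net_device.tap_fd, iov, in);
			virt_queue__set_used_elem(vq, head, len);
		}

		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
	}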

Thanks,

	Ingo
Pekka Enberg April 29, 2011, 10:37 a.m. UTC | #11
On Fri, Apr 29, 2011 at 1:02 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> So, if I understand things correctly -- making virtio devices use separate
> irqs exposed some race conditions on read/write operations between host and
> guest, and adding the thread pool revealed it, right? (Because previously we
> were doing all the work inside the i/o path on the guest side.)

So does reverting commit a37089da817ce7aad9789aeb9fc09b68e088ad9a
("kvm tools: Use threadpool for virtio-net") fix things? I think the
problem here is that the RX path now relies on VIRTIO_PCI_QUEUE_NOTIFY
happening in order for it to trigger KVM_IRQ_LINE, which is wrong. Using
shared IRQs obviously masks the problem, which is why reverting
Cyrill's commit makes the problem go away.

                        Pekka
Pekka Enberg April 29, 2011, 10:42 a.m. UTC | #12
On Fri, Apr 29, 2011 at 1:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> Without the mutex around the kvm__irq_line, I am still seeing the
>> hangs (with the irq assertion movement patch).
>
> since the mutex has no effect on the other party that modifies the virtual
> queue (the guest OS), this at best only changes timings and hides the bug.

Yes, I don't think it has anything to do with the queues - it's simply
a bug in the RX path which requires other work to happen to raise the
IRQ.
Cyrill Gorcunov April 29, 2011, 10:47 a.m. UTC | #13
I suspect so - the threadpool makes things more async than they were
before, so proper locking does matter. I could test any patches in the
evening, and I must admit I never got a good understanding of virtio
device operations yet (in terms of the virtio ring :)

On Friday, April 29, 2011, Pekka Enberg <penberg@kernel.org> wrote:
> On Fri, Apr 29, 2011 at 1:02 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> So, if I understand things correctly -- making virtio devices use separate
>> irqs exposed some race conditions on read/write operations between host and
>> guest, and adding the thread pool revealed it, right? (Because previously we
>> were doing all the work inside the i/o path on the guest side.)
>
> So does reverting commit a37089da817ce7aad9789aeb9fc09b68e088ad9a
> ("kvm tools: Use threadpool for virtio-net") fix things? I think the
> problem here is that now RX path relies on VIRTIO_PCI_QUEUE_NOTIFY to
> happen in order for it to trigger KVM_IRQ_LINE which is wrong. Using
> shared IRQs obviously masks the problem which is why reverting
> Cyrill's commit makes the problem go away.
>
>                         Pekka
>
Sasha Levin April 29, 2011, 4:52 p.m. UTC | #14
On Fri, 2011-04-29 at 12:22 +0200, Ingo Molnar wrote:
> If that is so then indeed the right approach might be to signal the guest every 
> time we manage to readv() something - there might not come any other packet for 
> a long time. But the reason is not some 'erroneous state' - all state is 
> perfectly fine, this is simply a basic property of the event loop that the rx 
> thread implements ...

My idea about the 'erroneous state' was as follows:

We have 2 virt queues: one for RX and one for TX, each on its own
thread.

RX Thread:
 - Runs readv() and reads data.
 - Calls virt_queue__set_used_elem() which starts updating the RX virt
queue.

TX Thread:
 - Runs readv() and reads data.
 - Calls and returns from virt_queue__set_used_elem().
 - Calls kvm__irq_line().

At this point, the RX queue state is still being updated, but since the IRQ
was signaled (same IRQ for both TX and RX) the guest virtio-net checks the
RX queue and finds a virt queue that wasn't fully updated.
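
In other words, something like this interleaving (a hypothetical timeline, not
code from the tree):

	/*
	 *  RX thread                       TX thread                  guest
	 *  ---------                       ---------                  -----
	 *  readv() returns data
	 *  enters set_used_elem(),
	 *  RX used ring half-written ...
	 *                                  finishes set_used_elem()
	 *                                  kvm__irq_line()  ------->  IRQ handler checks
	 *                                                             both queues, sees the
	 *                                                             half-written RX queue
	 *  ... finishes writing RX used ring
	 */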
Pekka Enberg April 29, 2011, 5:43 p.m. UTC | #15
On Fri, Apr 29, 2011 at 1:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
>>> Without the mutex around the kvm__irq_line, I am still seeing the
>>> hangs (with the irq assertion movement patch).
>>
>> since the mutex has no effect on the other party that modifies the virtual
>> queue (the guest OS), this at best only changes timings and hides the bug.

On Fri, Apr 29, 2011 at 1:42 PM, Pekka Enberg <penberg@kernel.org> wrote:
> Yes, I don't think it has anything to do with the queues - it's simply
> a bug in the RX path which requires other work to happen to raise the
> IRQ.

Turns out thread pool isn't related to the problem. Asias and Sasha
reported that the problem triggers even with all the thread pool code
reverted. I'm guessing it's a latent bug in the virtio network driver
that got exposed by Cyrill's patch. It seems to be related to IRQ
handling and I still think it's related to assuming
VIRTIO_PCI_QUEUE_NOTIFY will be triggered for the RX path.

                         Pekka
Ingo Molnar April 29, 2011, 7:59 p.m. UTC | #16
* Pekka Enberg <penberg@kernel.org> wrote:

> On Fri, Apr 29, 2011 at 1:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >>> Without the mutex around the kvm__irq_line, I am still seeing the
> >>> hangs (with the irq assertion movement patch).
> >>
> >> since the mutex has no effect on the other party that modifies the virtual
> >> queue (the guest OS), this at best only changes timings and hides the bug.
> 
> On Fri, Apr 29, 2011 at 1:42 PM, Pekka Enberg <penberg@kernel.org> wrote:
> > Yes, I don't think it has anything to do with the queues - it's simply
> > a bug in the RX path which requires other work to happen to raise the
> > IRQ.
> 
> Turns out thread pool isn't related to the problem. Asias and Sasha reported 
> that the problem triggers even with all the thread pool code reverted. I'm 
> guessing it's a latent bug in the virtio network driver that got exposed by 
> Cyrill's patch. It seems to be related to IRQ handling and I still think it's 
> related to assuming VIRTIO_PCI_QUEUE_NOTIFY will be triggered for the RX 
> path.

Some sort of timeout would make it more robust i suspect, although the guest
can always 'hang' both the rx and tx flows by not clearing ring entries.

So i think it needs to be understood how those stalls occur - who fails to
issue a notification?

Also, does TX activity perhaps assume a wakeup for RX processing? Right now the
TX and RX threads are independent. So it would be nice to document how the 
virtio protocol works.

Thanks,

	Ingo
Ingo Molnar April 29, 2011, 8:39 p.m. UTC | #17
* Sasha Levin <levinsasha928@gmail.com> wrote:

> On Fri, 2011-04-29 at 12:22 +0200, Ingo Molnar wrote:
> > If that is so then indeed the right approach might be to signal the guest every 
> > time we manage to readv() something - there might not come any other packet for 
> > a long time. But the reason is not some 'erroneous state' - all state is 
> > perfectly fine, this is simply a basic property of the event loop that the rx 
> > thread implements ...
> 
> My idea about the 'erroneous state' was as follows:
> 
> We have 2 virt queues: one for RX and one for TX, each on its own
> thread.
> 
> RX Thread:
>  - Runs readv() and reads data.
>  - Calls virt_queue__set_used_elem() which starts updating the RX virt
> queue.
> 
> TX Thread:
>  - Runs readv() and reads data.

Doesn't the TX thread do writev()?

>  - Calls and returns from virt_queue__set_used_elem().
>  - Calls kvm__irq_line().
> 
> At this point, the RX queue state is still being updated, but since the IRQ
> was signaled (same IRQ for both TX and RX) the guest virtio-net checks the
> RX queue and finds a virt queue that wasn't fully updated.

Well, the guest side can look at the queue *anytime* - that is key to NAPI-style
network queue processing, for example.

So virt_queue__set_used_elem() needs to update the RX queue very carefully, and 
i think it's buggy there.

Right now it does:

struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, uint32_t head, uint32_t len)
{
        struct vring_used_elem *used_elem;
        used_elem       = &queue->vring.used->ring[queue->vring.used->idx++ % queue->vring.num];
        used_elem->id   = head;
        used_elem->len  = len;
        return used_elem;

that looks dangerous - what is the right protocol for touching the ring? It 
sure cannot be this.

Firstly, write barriers are needed (wmb() in the kernel), to make sure the 
fields are modified in the right order and become visible to the guest in that 
order as well.

Secondly, and more importantly, the queue index needs to be updated *after* the 
ring element has been modified. This ring's vring.used index is owned by the 
host side, so the guest will not update it in parallel - it will only look at 
it. So the sequence should be something like:

        struct vring_used_elem *used_elem;
	struct vring *vr;
	int new_idx;

	vr = &vq->vring;
	new_idx = (vr->used->idx + 1) % vr->num;

	/* Fill in the new ring descriptor: */
        used_elem       = vr->used->ring + new_idx;
        used_elem->id   = head;
        used_elem->len  = len;

	/* Write barrier, make sure the descriptor updates are visible first: */
	wmb();

	/*
	 * Update the index to make the new element visible to the guest:
	 */
	vr->used->idx = new_idx;

	/* Write barrier, make sure the index update is visible to the guest before any IRQ: */
	wmb();

        return used_elem;

Note that given current IRQ signalling techniques the second wmb() is
unnecessary, i only added it for completeness.
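
( tools/kvm does not define wmb() yet, btw - if we add the barriers, something
  along these lines would do as a first cut. This is an assumption, not existing
  code; __sync_synchronize() is a full barrier, stronger than needed but safe:

	#define wmb()	__sync_synchronize()
)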

Could you try this variant even without the barriers? The reordering alone
could solve your hangs.

Thanks,

	Ingo
Pekka Enberg April 30, 2011, 7:44 a.m. UTC | #18
On Fri, 2011-04-29 at 21:59 +0200, Ingo Molnar wrote:
> Also, does TX activity perhaps assume a wakeup for RX processing? Right now the
> TX and RX threads are independent. So it would be nice to document how the 
> virtio protocol works.

So in

http://ozlabs.org/~rusty/virtio-spec/virtio-spec.pdf

check out

Section 2.4.1 Supplying Buffers to The Device

and

Section 2.4.2 Receiving Used Buffers From The Device
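
For the RX path, section 2.4.1 boils down to the guest doing roughly this (a
sketch following the spec and <linux/virtio_ring.h>, not code from our tree;
notify_host() stands for whatever performs the VIRTIO_PCI_QUEUE_NOTIFY write):

	/* Publish a device-writable buffer (for RX) in descriptor slot d: */
	vring.desc[d].addr	= buf_gpa;
	vring.desc[d].len	= buf_len;
	vring.desc[d].flags	= VRING_DESC_F_WRITE;

	vring.avail->ring[vring.avail->idx % vring.num] = d;
	wmb();				/* descriptor visible before the index bump */
	vring.avail->idx++;

	/* Only notify if the host did not suppress notifications: */
	if (!(vring.used->flags & VRING_USED_F_NO_NOTIFY))
		notify_host();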

			Pekka


Patch

diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
index 58b3de4..efe06cb 100644
--- a/tools/kvm/virtio-net.c
+++ b/tools/kvm/virtio-net.c
@@ -77,9 +77,12 @@  static void virtio_net_rx_callback(struct kvm *self, void *param)
 		head = virt_queue__get_iov(vq, iov, &out, &in, self);
 		len = readv(net_device.tap_fd, iov, in);
 		virt_queue__set_used_elem(vq, head, len);
-	}
 
-	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+		/* We should interrupt guest right now, otherwise latency is huge. */
+		mutex_lock(&net_device.mutex);
+		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+		mutex_unlock(&net_device.mutex);
+	}
 }
 
 static void virtio_net_tx_callback(struct kvm *self, void *param)
@@ -98,7 +101,9 @@  static void virtio_net_tx_callback(struct kvm *self, void *param)
 		virt_queue__set_used_elem(vq, head, len);
 	}
 
+	mutex_lock(&net_device.mutex);
 	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+	mutex_unlock(&net_device.mutex);
 }
 
 static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long offset, int size, uint32_t count)