Message ID | 20221212091029.54390-1-jasowang@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] virtio-net: correctly enable callback during start_xmit |
On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> virtqueue callback via the following statement:
>
>         do {
>                 ......
>         } while (use_napi && kick &&
>                  unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> This will cause a missing call to virtqueue_enable_cb_delayed() when
> kick is false. Fixing this by removing the checking of the kick from
> the condition to make sure callback is enabled correctly.
>
> Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> The patch is needed for -stable.

stable rules don't allow for theoretical fixes. Was a problem observed?

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 86e52454b5b5..44d7daf0267b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>
>  		free_old_xmit_skbs(sq, false);
>
> -	} while (use_napi && kick &&
> -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> +	} while (use_napi &&
> +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>

A bit more explanation pls. kick simply means !netdev_xmit_more -
if it's false we know there will be another packet, then transmitting
that packet will invoke virtqueue_enable_cb_delayed. No?

>  	/* timestamp packet in software */
>  	skb_tx_timestamp(skb);
> --
> 2.25.1
On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > virtqueue callback via the following statement:
> >
> >         do {
> >                 ......
> >         } while (use_napi && kick &&
> >                  unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > kick is false. Fixing this by removing the checking of the kick from
> > the condition to make sure callback is enabled correctly.
> >
> > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > The patch is needed for -stable.
>
> stable rules don't allow for theoretical fixes. Was a problem observed?
>
> > ---
> >  drivers/net/virtio_net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 86e52454b5b5..44d7daf0267b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >
> >  		free_old_xmit_skbs(sq, false);
> >
> > -	} while (use_napi && kick &&
> > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > +	} while (use_napi &&
> > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
>
> A bit more explanation pls. kick simply means !netdev_xmit_more -
> if it's false we know there will be another packet, then transmitting
> that packet will invoke virtqueue_enable_cb_delayed. No?

It's just that there may be a next packet, but in fact there may not be.
For example, the vq is full, and the driver stops the queue.

Thanks.

> >  	/* timestamp packet in software */
> >  	skb_tx_timestamp(skb);
> > --
> > 2.25.1
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > virtqueue callback via the following statement:
> > >
> > >         do {
> > >                 ......
> > >         } while (use_napi && kick &&
> > >                  unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > >
> > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > kick is false. Fixing this by removing the checking of the kick from
> > > the condition to make sure callback is enabled correctly.
> > >
> > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > The patch is needed for -stable.
> >
> > stable rules don't allow for theoretical fixes. Was a problem observed?

Yes, running a pktgen sample script can lead to a tx timeout.

> >
> > > ---
> > >  drivers/net/virtio_net.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 86e52454b5b5..44d7daf0267b 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >
> > >  		free_old_xmit_skbs(sq, false);
> > >
> > > -	} while (use_napi && kick &&
> > > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > +	} while (use_napi &&
> > > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > >
> >
> > A bit more explanation pls. kick simply means !netdev_xmit_more -
> > if it's false we know there will be another packet, then transmitting
> > that packet will invoke virtqueue_enable_cb_delayed. No?
>
> It's just that there may be a next packet, but in fact there may not be.
> For example, the vq is full, and the driver stops the queue.

Exactly, when the queue is about to be full we disable tx and wait for
the next tx interrupt to re-enable tx.

Thanks

>
> Thanks.
>
> > >  	/* timestamp packet in software */
> > >  	skb_tx_timestamp(skb);
> > > --
> > > 2.25.1
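To make the missed-enable case concrete, here is a minimal user-space sketch of the loop condition before and after the patch. It is only a toy model: `struct toy_vq`, `toy_disable_cb()`, `toy_enable_cb_delayed()` and `toy_cb_armed_after_xmit()` are hypothetical names invented for illustration, and the loop body (disabling the callback before reclaiming) is an assumption, since the commit message elides it with "......".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the TX virtqueue; not the kernel's struct virtqueue. */
struct toy_vq {
	bool cb_enabled;	/* will a completion raise a TX interrupt? */
	int late_completions;	/* buffers used while the callback is re-armed */
};

static void toy_disable_cb(struct toy_vq *vq)
{
	vq->cb_enabled = false;
}

/*
 * Toy version of virtqueue_enable_cb_delayed(): arms the callback and
 * returns false if more buffers were used in the meantime, in which
 * case the caller is expected to reclaim again.
 */
static bool toy_enable_cb_delayed(struct toy_vq *vq)
{
	assert(vq->late_completions >= 0);
	vq->cb_enabled = true;
	if (vq->late_completions > 0) {
		vq->late_completions--;
		return false;
	}
	return true;
}

/*
 * Runs the reclaim loop for one packet and reports whether the callback
 * is armed afterwards. check_kick selects the pre-patch condition (true)
 * or the patched condition (false).
 */
static bool toy_cb_armed_after_xmit(bool use_napi, bool kick, bool check_kick)
{
	struct toy_vq vq = { .cb_enabled = true, .late_completions = 1 };

	do {
		if (use_napi)
			toy_disable_cb(&vq);	/* assumed loop body */
		/* free_old_xmit_skbs() would reclaim used buffers here */
	} while (use_napi && (!check_kick || kick) &&
		 !toy_enable_cb_delayed(&vq));

	return vq.cb_enabled;
}
```

In this model, `toy_cb_armed_after_xmit(true, false, true)` leaves the callback disabled, which is harmless only if another packet really follows. If the queue is stopped instead, nothing is left to wake it.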
On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > virtqueue callback via the following statement:
> > > >
> > > >         do {
> > > >                 ......
> > > >         } while (use_napi && kick &&
> > > >                  unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > >
> > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > kick is false. Fixing this by removing the checking of the kick from
> > > > the condition to make sure callback is enabled correctly.
> > > >
> > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > The patch is needed for -stable.
> > >
> > > stable rules don't allow for theoretical fixes. Was a problem observed?
>
> Yes, running a pktgen sample script can lead to a tx timeout.

Since April 2021 and we only noticed now? Are you sure it's the
right Fixes tag?

> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > >
> > > >  		free_old_xmit_skbs(sq, false);
> > > >
> > > > -	} while (use_napi && kick &&
> > > > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > +	} while (use_napi &&
> > > > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > >
> > >
> > > A bit more explanation pls. kick simply means !netdev_xmit_more -
> > > if it's false we know there will be another packet, then transmitting
> > > that packet will invoke virtqueue_enable_cb_delayed. No?
> >
> > It's just that there may be a next packet, but in fact there may not be.
> > For example, the vq is full, and the driver stops the queue.
>
> Exactly, when the queue is about to be full we disable tx and wait for
> the next tx interrupt to re-enable tx.
>
> Thanks

OK, it's a good idea to document that.
And we should enable callbacks at that point, not here on data path.

> >
> > Thanks.
> >
> > > >  	/* timestamp packet in software */
> > > >  	skb_tx_timestamp(skb);
> > > > --
> > > > 2.25.1
On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > virtqueue callback via the following statement:
> > > > >
> > > > >         do {
> > > > >                 ......
> > > > >         } while (use_napi && kick &&
> > > > >                  unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > >
> > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > kick is false. Fixing this by removing the checking of the kick from
> > > > > the condition to make sure callback is enabled correctly.
> > > > >
> > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > > > The patch is needed for -stable.
> > > >
> > > > stable rules don't allow for theoretical fixes. Was a problem observed?
> >
> > Yes, running a pktgen sample script can lead to a tx timeout.
>
> Since April 2021 and we only noticed now? Are you sure it's the
> right Fixes tag?

Well, reverting a7766ef18b33 makes pktgen work again.

The reason we didn't notice is probably because:

1) We don't support BQL, so no bulk dequeuing (skb list) in normal traffic
2) When burst is enabled for pktgen, it can do bulk xmit via skb list on its own

> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >
> > > > >  		free_old_xmit_skbs(sq, false);
> > > > >
> > > > > -	} while (use_napi && kick &&
> > > > > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > +	} while (use_napi &&
> > > > > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > >
> > > >
> > > > A bit more explanation pls. kick simply means !netdev_xmit_more -
> > > > if it's false we know there will be another packet, then transmitting
> > > > that packet will invoke virtqueue_enable_cb_delayed. No?
> > >
> > > It's just that there may be a next packet, but in fact there may not be.
> > > For example, the vq is full, and the driver stops the queue.
> >
> > Exactly, when the queue is about to be full we disable tx and wait for
> > the next tx interrupt to re-enable tx.
> >
> > Thanks
>
> OK, it's a good idea to document that.

Will do.

> And we should enable callbacks at that point, not here on data path.

I'm not sure I understand here. Are you suggesting removing the
!use_napi check here?

        if (!use_napi &&
            unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
                /* More just got used, free them then recheck. */
                free_old_xmit_skbs(sq, false);
                if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
                        netif_start_subqueue(dev, qnum);
                        virtqueue_disable_cb(sq->vq);
                }
        }

Btw, it doesn't differ too much as kick is always true without pktgen
and that may even need more comments or make the code even harder to
read. We need a patch for -stable at least so I prefer to let this
patch go first and do optimization on top.

Thanks

> > >
> > > Thanks.
> > >
> > > > >  	/* timestamp packet in software */
> > > > >  	skb_tx_timestamp(skb);
> > > > > --
> > > > > 2.25.1
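The burst case described above (pktgen handing the driver several packets with xmit_more set) can be sketched as a small user-space model. Everything here is hypothetical and simplified: `TOY_RING_SIZE`, `struct toy_txq`, `toy_xmit()` and `toy_burst_stalls()` are invented names, each packet is assumed to use exactly one descriptor, and the queue is stopped only when the ring is completely full rather than at the driver's real threshold.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 4	/* hypothetical tiny ring, one descriptor per packet */

struct toy_txq {
	int num_free;		/* free descriptors left in the ring */
	bool stopped;		/* models netif_stop_subqueue() having run */
	bool cb_enabled;	/* models the TX callback being armed */
};

/* Queues one packet; `more` plays the role of netdev_xmit_more(). */
static void toy_xmit(struct toy_txq *q, bool more, bool check_kick)
{
	bool kick = !more;

	assert(q->num_free > 0);

	q->cb_enabled = false;		/* callback disabled on entry */
	if (!check_kick || kick)
		q->cb_enabled = true;	/* virtqueue_enable_cb_delayed() */

	q->num_free--;
	if (q->num_free == 0)
		q->stopped = true;	/* ring full: stop the queue */
}

/*
 * Sends a burst of packets and reports whether the queue ends up stopped
 * with the callback disabled, i.e. with nothing left to wake it up.
 */
static bool toy_burst_stalls(int burst, bool check_kick)
{
	struct toy_txq q = {
		.num_free = TOY_RING_SIZE,
		.stopped = false,
		.cb_enabled = true,
	};

	for (int i = 0; i < burst && !q.stopped; i++)
		toy_xmit(&q, i < burst - 1, check_kick);

	return q.stopped && !q.cb_enabled;
}
```

With the pre-patch condition, a burst longer than the ring (`toy_burst_stalls(8, true)`) fills the ring while xmit_more is still set, so the queue stops with the callback off. Single-packet sends, where kick is always true, never hit this path in the model, which matches the observation that normal traffic did not expose the problem.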
On Tue, Dec 13, 2022 at 02:57:54PM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > > virtqueue callback via the following statement:
> > > > > >
> > > > > >         do {
> > > > > >                 ......
> > > > > >         } while (use_napi && kick &&
> > > > > >                  unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > > kick is false. Fixing this by removing the checking of the kick from
> > > > > > the condition to make sure callback is enabled correctly.
> > > > > >
> > > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > > The patch is needed for -stable.
> > > > >
> > > > > stable rules don't allow for theoretical fixes. Was a problem observed?
> > >
> > > Yes, running a pktgen sample script can lead to a tx timeout.
> >
> > Since April 2021 and we only noticed now? Are you sure it's the
> > right Fixes tag?
>
> Well, reverting a7766ef18b33 makes pktgen work again.
>
> The reason we didn't notice is probably because:
>
> 1) We don't support BQL, so no bulk dequeuing (skb list) in normal traffic
> 2) When burst is enabled for pktgen, it can do bulk xmit via skb list on its own
>
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >
> > > > > >  		free_old_xmit_skbs(sq, false);
> > > > > >
> > > > > > -	} while (use_napi && kick &&
> > > > > > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > > +	} while (use_napi &&
> > > > > > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > >
> > > > > A bit more explanation pls. kick simply means !netdev_xmit_more -
> > > > > if it's false we know there will be another packet, then transmitting
> > > > > that packet will invoke virtqueue_enable_cb_delayed. No?
> > > >
> > > > It's just that there may be a next packet, but in fact there may not be.
> > > > For example, the vq is full, and the driver stops the queue.
> > >
> > > Exactly, when the queue is about to be full we disable tx and wait for
> > > the next tx interrupt to re-enable tx.
> > >
> > > Thanks
> >
> > OK, it's a good idea to document that.
>
> Will do.
>
> > And we should enable callbacks at that point, not here on data path.
>
> I'm not sure I understand here. Are you suggesting removing the
> !use_napi check here?
>
>         if (!use_napi &&
>             unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>                 /* More just got used, free them then recheck. */
>                 free_old_xmit_skbs(sq, false);
>                 if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>                         netif_start_subqueue(dev, qnum);
>                         virtqueue_disable_cb(sq->vq);
>                 }
>         }

At least, I suggest calling virtqueue_enable_cb_delayed around this area
of code. I have not really thought all this path through and how all the
corner cases interact.

> Btw, it doesn't differ too much as kick is always true without pktgen
> and that may even need more comments or make the code even harder to
> read. We need a patch for -stable at least so I prefer to let this
> patch go first and do optimization on top.
>
> Thanks

There's a chance of perf regression here too. Let's write the full patch
first of all. If you want to make it a 2 patch series that is fine, but
it has been here since 2021, so I don't see why we should rush a fix.
Worry about backporting later.

> > > >
> > > > Thanks.
> > > >
> > > > > >  	/* timestamp packet in software */
> > > > > >  	skb_tx_timestamp(skb);
> > > > > > --
> > > > > > 2.25.1
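One possible reading of "enable callbacks at that point, not here on data path" is to arm the callback only where the queue is stopped. The sketch below is a toy user-space model of that idea and is not taken from any posted patch. `TOY_MAX_SKB_FRAGS`, `struct toy_txq`, `toy_queue_packet()`, `toy_stop_if_nearly_full()` and `toy_xmit_one()` are all hypothetical, and the model ignores the race where buffers complete while the callback is being armed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SKB_FRAGS 17	/* hypothetical stand-in for MAX_SKB_FRAGS */

struct toy_txq {
	int num_free;		/* free descriptors in the ring */
	bool stopped;		/* models netif_stop_subqueue() having run */
	bool cb_enabled;	/* models the TX callback being armed */
};

/* Data path: callbacks stay off, descriptors are consumed. */
static void toy_queue_packet(struct toy_txq *q, int descs)
{
	assert(descs > 0 && descs <= q->num_free);
	q->cb_enabled = false;
	q->num_free -= descs;
}

/*
 * Stop path: when the ring can no longer hold a worst-case packet,
 * stop the queue and arm the callback in the same place, so a stopped
 * queue always has an interrupt pending to wake it.
 */
static void toy_stop_if_nearly_full(struct toy_txq *q)
{
	if (q->num_free < 2 + TOY_MAX_SKB_FRAGS) {
		q->stopped = true;	/* netif_stop_subqueue() */
		q->cb_enabled = true;	/* virtqueue_enable_cb_delayed() */
	}
}

/* Queues one packet on a fresh ring and returns the resulting state. */
static struct toy_txq toy_xmit_one(int num_free, int descs)
{
	struct toy_txq q = {
		.num_free = num_free,
		.stopped = false,
		.cb_enabled = true,
	};

	toy_queue_packet(&q, descs);
	toy_stop_if_nearly_full(&q);
	return q;
}
```

In this model the data path never re-arms the callback, so the extra enable cost is paid only when the queue is about to stop. Whether that trade-off holds in the real driver depends on the corner cases mentioned above, which the sketch does not cover.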
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 86e52454b5b5..44d7daf0267b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		free_old_xmit_skbs(sq, false);
 
-	} while (use_napi && kick &&
-	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
+	} while (use_napi &&
+		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
virtqueue callback via the following statement:

        do {
                ......
        } while (use_napi && kick &&
                 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

This will cause a missing call to virtqueue_enable_cb_delayed() when
kick is false. Fixing this by removing the checking of the kick from
the condition to make sure callback is enabled correctly.

Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch is needed for -stable.
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)