[v9,31/32] virtio_net: support rx/tx queue resize

Message ID	20220406034346.74409-32-xuanzhuo@linux.alibaba.com (mailing list archive)
State	Not Applicable
Headers	From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
	To: virtualization@lists.linux-foundation.org
	Cc: Jeff Dike <jdike@addtoit.com>, Richard Weinberger <richard@nod.at>, Anton Ivanov <anton.ivanov@cambridgegreys.com>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Hans de Goede <hdegoede@redhat.com>, Mark Gross <markgross@kernel.org>, Vadim Pasternak <vadimp@nvidia.com>, Bjorn Andersson <bjorn.andersson@linaro.org>, Mathieu Poirier <mathieu.poirier@linaro.org>, Cornelia Huck <cohuck@redhat.com>, Halil Pasic <pasic@linux.ibm.com>, Heiko Carstens <hca@linux.ibm.com>, Vasily Gorbik <gor@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Alexander Gordeev <agordeev@linux.ibm.com>, Sven Schnelle <svens@linux.ibm.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, Johannes Berg <johannes.berg@intel.com>, Xuan Zhuo <xuanzhuo@linux.alibaba.com>, Vincent Whitchurch <vincent.whitchurch@axis.com>, linux-um@lists.infradead.org, netdev@vger.kernel.org, platform-driver-x86@vger.kernel.org, linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org, kvm@vger.kernel.org, bpf@vger.kernel.org
	Subject: [PATCH v9 31/32] virtio_net: support rx/tx queue resize
	Date: Wed, 6 Apr 2022 11:43:45 +0800
	In-Reply-To: <20220406034346.74409-1-xuanzhuo@linux.alibaba.com>
Series	virtio pci support VIRTIO_F_RING_RESET (refactor vring)
	[v9,00/32] virtio pci support VIRTIO_F_RING_RESET (refactor vring)
	[v9,01/32] virtio: add helper virtqueue_get_vring_max_size()
	[v9,02/32] virtio: struct virtio_config_ops add callbacks for queue_reset
	[v9,03/32] virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
	[v9,04/32] virtio_ring: remove the arg vq of vring_alloc_desc_extra()
	[v9,05/32] virtio_ring: extract the logic of freeing vring
	[v9,06/32] virtio_ring: split: extract the logic of alloc queue
	[v9,07/32] virtio_ring: split: extract the logic of alloc state and extra
	[v9,08/32] virtio_ring: split: extract the logic of attach vring
	[v9,09/32] virtio_ring: split: extract the logic of vq init
	[v9,10/32] virtio_ring: split: introduce virtqueue_reinit_split()
	[v9,11/32] virtio_ring: split: introduce virtqueue_resize_split()
	[v9,12/32] virtio_ring: packed: extract the logic of alloc queue
	[v9,13/32] virtio_ring: packed: extract the logic of alloc state and extra
	[v9,14/32] virtio_ring: packed: extract the logic of attach vring
	[v9,15/32] virtio_ring: packed: extract the logic of vq init
	[v9,16/32] virtio_ring: packed: introduce virtqueue_reinit_packed()
	[v9,17/32] virtio_ring: packed: introduce virtqueue_resize_packed()
	[v9,18/32] virtio_ring: introduce virtqueue_resize()
	[v9,19/32] virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
	[v9,20/32] virtio: queue_reset: add VIRTIO_F_RING_RESET
	[v9,21/32] virtio_pci: queue_reset: update struct virtio_pci_common_cfg and option functions
	[v9,22/32] virtio_pci: queue_reset: extract the logic of active vq for modern pci
	[v9,23/32] virtio_pci: queue_reset: support VIRTIO_F_RING_RESET
	[v9,24/32] virtio: find_vqs() add arg sizes
	[v9,25/32] virtio_pci: support the arg sizes of find_vqs()
	[v9,26/32] virtio_mmio: support the arg sizes of find_vqs()
	[v9,27/32] virtio: add helper virtio_find_vqs_ctx_size()
	[v9,28/32] virtio_net: set the default max ring size by find_vqs()
	[v9,29/32] virtio_net: get ringparam by virtqueue_get_vring_max_size()
	[v9,30/32] virtio_net: split free_unused_bufs()
	[v9,31/32] virtio_net: support rx/tx queue resize
	[v9,32/32] virtio_net: support set_ringparam

Xuan Zhuo April 6, 2022, 3:43 a.m. UTC

This patch implements resizing of the rx and tx queues. With it, the
ring num of a queue can be changed.

The resize process may hit an exception: the resize may fail, or the vq
may no longer be usable. Either way, we must call napi_enable().
napi_disable() behaves like a lock, so napi_enable() must always be
called after napi_disable().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
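
The pairing rule from the commit message can be sketched in isolation. The following is a minimal userspace model, not the driver code: `mock_napi`, `mock_resize()` and `mock_rx_resize()` are hypothetical stand-ins, and the only property shown is that the enable call is reached on both the success and the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context: it only tracks the disabled state. */
struct mock_napi {
	bool disabled;
};

static void mock_napi_disable(struct mock_napi *n)
{
	/* In this model a second disable without an enable is a bug. */
	assert(!n->disabled);
	n->disabled = true;
}

static void mock_napi_enable(struct mock_napi *n)
{
	assert(n->disabled);
	n->disabled = false;
}

/* Hypothetical resize step: rejects a ring size of 0, accepts anything else. */
static int mock_resize(unsigned int ring_num)
{
	return ring_num ? 0 : -EINVAL;
}

static int mock_rx_resize(struct mock_napi *n, unsigned int ring_num)
{
	int err;

	mock_napi_disable(n);
	err = mock_resize(ring_num);
	/* Success or failure, the disable above is paired with an enable. */
	mock_napi_enable(n);
	return err;
}
```

Calling `mock_rx_resize()` with a valid and then an invalid size leaves `disabled` false both times, which is the invariant the commit message asks for.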

Jason Wang April 13, 2022, 8 a.m. UTC | #1

On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b8bf00525177..ba6859f305f7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
>   	char padding[4];
>   };
>   
> +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> +
>   static bool is_xdp_frame(void *ptr)
>   {
>   	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
>   {
>   	napi_enable(napi);
>   
> +	/* Check if vq is in reset state. The normal reset/resize process will
> +	 * be protected by napi. However, the protection of napi is only enabled
> +	 * during the operation, and the protection of napi will end after the
> +	 * operation is completed. If re-enable fails during the process, vq
> +	 * will remain unavailable with reset state.
> +	 */
> +	if (vq->reset)
> +		return;


I don't see when we could hit this condition.


> +
>   	/* If all buffers were filled by other side before we napi_enabled, we
>   	 * won't get another interrupt, so process any outstanding packets now.
>   	 * Call local_bh_enable after to trigger softIRQ processing.
> @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
>   		struct receive_queue *rq = &vi->rq[i];
>   
>   		napi_disable(&rq->napi);
> +
> +		/* Check if vq is in reset state. See more in
> +		 * virtnet_napi_enable()
> +		 */
> +		if (rq->vq->reset) {
> +			virtnet_napi_enable(rq->vq, &rq->napi);
> +			continue;
> +		}


Can we do something similar in virtnet_close() by canceling the work?


> +
>   		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
>   		virtnet_napi_enable(rq->vq, &rq->napi);
>   
> @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   	if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
>   		return;
>   
> +	/* Check if vq is in reset state. See more in virtnet_napi_enable() */
> +	if (sq->vq->reset)
> +		return;


We've disabled TX NAPI; is there any chance we can still hit this?


> +
>   	if (__netif_tx_trylock(txq)) {
>   		do {
>   			virtqueue_disable_cb(sq->vq);
> @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	return NETDEV_TX_OK;
>   }
>   
> +static int virtnet_rx_resize(struct virtnet_info *vi,
> +			     struct receive_queue *rq, u32 ring_num)
> +{
> +	int err;
> +
> +	napi_disable(&rq->napi);
> +
> +	err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> +	if (err)
> +		goto err;
> +
> +	if (!try_fill_recv(vi, rq, GFP_KERNEL))
> +		schedule_delayed_work(&vi->refill, 0);
> +
> +	virtnet_napi_enable(rq->vq, &rq->napi);
> +	return 0;
> +
> +err:
> +	netdev_err(vi->dev,
> +		   "reset rx reset vq fail: rx queue index: %td err: %d\n",
> +		   rq - vi->rq, err);
> +	virtnet_napi_enable(rq->vq, &rq->napi);
> +	return err;
> +}
> +
> +static int virtnet_tx_resize(struct virtnet_info *vi,
> +			     struct send_queue *sq, u32 ring_num)
> +{
> +	struct netdev_queue *txq;
> +	int err, qindex;
> +
> +	qindex = sq - vi->sq;
> +
> +	virtnet_napi_tx_disable(&sq->napi);
> +
> +	txq = netdev_get_tx_queue(vi->dev, qindex);
> +	__netif_tx_lock_bh(txq);
> +	netif_stop_subqueue(vi->dev, qindex);
> +	__netif_tx_unlock_bh(txq);
> +
> +	err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> +	if (err)
> +		goto err;
> +
> +	netif_start_subqueue(vi->dev, qindex);
> +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> +	return 0;
> +
> +err:


I guess we can still start the queue in this case? (Since we don't 
change the queue if resize fails).


> +	netdev_err(vi->dev,
> +		   "reset tx reset vq fail: tx queue index: %td err: %d\n",
> +		   sq - vi->sq, err);
> +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> +	return err;
> +}
> +
>   /*
>    * Send command via the control virtqueue and check status.  Commands
>    * supported by the hypervisor, as indicated by feature bits, should
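
The last review comment (restart the tx queue even when the resize fails) can be illustrated with a small model. This is a sketch under the assumption stated in the comment itself, namely that a failed resize leaves the ring unchanged; `mock_txq`, `mock_virtqueue_resize()` and `mock_tx_resize()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a tx queue: just the state the discussion is about. */
struct mock_txq {
	bool napi_disabled;
	bool stopped;
	unsigned int ring_num;
};

/* Assumed behaviour: on failure the ring is left exactly as it was. */
static int mock_virtqueue_resize(struct mock_txq *q, unsigned int ring_num)
{
	if (ring_num == 0)
		return -EINVAL;
	q->ring_num = ring_num;
	return 0;
}

static int mock_tx_resize(struct mock_txq *q, unsigned int ring_num)
{
	int err;

	q->napi_disabled = true;
	q->stopped = true;

	err = mock_virtqueue_resize(q, ring_num);

	/*
	 * The old ring is still valid when the resize fails, so the queue is
	 * restarted and NAPI re-enabled on both paths.
	 */
	q->stopped = false;
	q->napi_disabled = false;
	return err;
}
```

With this shape the error path and the success path share the same restart code, so a failed resize cannot leave the subqueue stopped.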

Xuan Zhuo April 13, 2022, 8:35 a.m. UTC | #2

On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
>
> On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> >   {
> >   	napi_enable(napi);
> >
> > +	/* Check if vq is in reset state. The normal reset/resize process will
> > +	 * be protected by napi. However, the protection of napi is only enabled
> > +	 * during the operation, and the protection of napi will end after the
> > +	 * operation is completed. If re-enable fails during the process, vq
> > +	 * will remain unavailable with reset state.
> > +	 */
> > +	if (vq->reset)
> > +		return;
>
>
> I don't see when we could hit this condition.


In patch 23, the code that re-enables the vq is as follows:

+static int vp_modern_enable_reset_vq(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+	struct virtio_pci_vq_info *info;
+	unsigned long flags, index;
+	int err;
+
+	if (!vq->reset)
+		return -EBUSY;
+
+	index = vq->index;
+	info = vp_dev->vqs[index];
+
+	/* check queue reset status */
+	if (vp_modern_get_queue_reset(mdev, index) != 1)
+		return -EBUSY;
+
+	err = vp_active_vq(vq, info->msix_vector);
+	if (err)
+		return err;
+
+	if (vq->callback) {
+		spin_lock_irqsave(&vp_dev->lock, flags);
+		list_add(&info->node, &vp_dev->virtqueues);
+		spin_unlock_irqrestore(&vp_dev->lock, flags);
+	} else {
+		INIT_LIST_HEAD(&info->node);
+	}
+
+	vp_modern_set_queue_enable(&vp_dev->mdev, index, true);
+
+	if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR)
+		enable_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
+
+	vq->reset = false;
+
+	return 0;
+}


There are three situations where an error will be returned. These are the
situations I want to handle.

But on reflection, I think you're right. Even if the hardware setup fails and
we can no longer sync with the hardware, using it as a normal vq doesn't cause
any problems.
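
The three error returns mentioned above can be traced with a reduced model of the quoted function. This is only a sketch: `mock_vq` and `mock_enable_reset_vq()` are hypothetical, the register read and the activation step are replaced by plain fields, and the list and IRQ handling are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: fields replace the register read and vp_active_vq(). */
struct mock_vq {
	bool reset;          /* driver-side "vq is in reset state" flag */
	int hw_queue_reset;  /* value the device reports for queue_reset */
	int activate_err;    /* result of the activation step, 0 on success */
};

static int mock_enable_reset_vq(struct mock_vq *vq)
{
	/* 1: the vq was never put into the reset state */
	if (!vq->reset)
		return -EBUSY;

	/* 2: the device does not report the queue as reset */
	if (vq->hw_queue_reset != 1)
		return -EBUSY;

	/* 3: activating the ring failed */
	if (vq->activate_err)
		return vq->activate_err;

	vq->reset = false;
	return 0;
}
```

In cases 2 and 3 the function returns before `reset` is cleared, which is how a vq could stay marked as reset after a failed re-enable.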

>
>
> > +
> >   	/* If all buffers were filled by other side before we napi_enabled, we
> >   	 * won't get another interrupt, so process any outstanding packets now.
> >   	 * Call local_bh_enable after to trigger softIRQ processing.
> > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> >   		struct receive_queue *rq = &vi->rq[i];
> >
> >   		napi_disable(&rq->napi);
> > +
> > +		/* Check if vq is in reset state. See more in
> > +		 * virtnet_napi_enable()
> > +		 */
> > +		if (rq->vq->reset) {
> > +			virtnet_napi_enable(rq->vq, &rq->napi);
> > +			continue;
> > +		}
>
>
> Can we do something similar in virtnet_close() by canceling the work?

I think there is no need to cancel the work here, because napi_disable() will
wait for the napi_enable() of the resize. So if a vq whose re-enable failed is
used as a normal vq, this check can be removed.


>
>
> > +
> >   		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> >   		virtnet_napi_enable(rq->vq, &rq->napi);
> >
> > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >   	if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> >   		return;
> >
> > +	/* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > +	if (sq->vq->reset)
> > +		return;
>
>
> We've disabled TX NAPI; is there any chance we can still hit this?

Same as above.

>
>
> > +
> >   	if (__netif_tx_trylock(txq)) {
> >   		do {
> >   			virtqueue_disable_cb(sq->vq);
> > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   	return NETDEV_TX_OK;
> >   }
> >
> > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > +			     struct receive_queue *rq, u32 ring_num)
> > +{
> > +	int err;
> > +
> > +	napi_disable(&rq->napi);
> > +
> > +	err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > +	if (err)
> > +		goto err;
> > +
> > +	if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > +		schedule_delayed_work(&vi->refill, 0);
> > +
> > +	virtnet_napi_enable(rq->vq, &rq->napi);
> > +	return 0;
> > +
> > +err:
> > +	netdev_err(vi->dev,
> > +		   "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > +		   rq - vi->rq, err);
> > +	virtnet_napi_enable(rq->vq, &rq->napi);
> > +	return err;
> > +}
> > +
> > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > +			     struct send_queue *sq, u32 ring_num)
> > +{
> > +	struct netdev_queue *txq;
> > +	int err, qindex;
> > +
> > +	qindex = sq - vi->sq;
> > +
> > +	virtnet_napi_tx_disable(&sq->napi);
> > +
> > +	txq = netdev_get_tx_queue(vi->dev, qindex);
> > +	__netif_tx_lock_bh(txq);
> > +	netif_stop_subqueue(vi->dev, qindex);
> > +	__netif_tx_unlock_bh(txq);
> > +
> > +	err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > +	if (err)
> > +		goto err;
> > +
> > +	netif_start_subqueue(vi->dev, qindex);
> > +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > +	return 0;
> > +
> > +err:
>
>
> I guess we can still start the queue in this case? (Since we don't
> change the queue if resize fails).

Yes, you are right.

Thanks.

>
>
> > +	netdev_err(vi->dev,
> > +		   "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > +		   sq - vi->sq, err);
> > +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > +	return err;
> > +}
> > +
> >   /*
> >    * Send command via the control virtqueue and check status.  Commands
> >    * supported by the hypervisor, as indicated by feature bits, should
>

Jason Wang April 14, 2022, 9:30 a.m. UTC | #3

On Wed, Apr 13, 2022 at 4:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >
> > On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> > >   {
> > >     napi_enable(napi);
> > >
> > > +   /* Check if vq is in reset state. The normal reset/resize process will
> > > +    * be protected by napi. However, the protection of napi is only enabled
> > > +    * during the operation, and the protection of napi will end after the
> > > +    * operation is completed. If re-enable fails during the process, vq
> > > +    * will remain unavailable with reset state.
> > > +    */
> > > +   if (vq->reset)
> > > +           return;
> >
> >
> > I don't see when we could hit this condition.
>
>
> In patch 23, the code that re-enables the vq is as follows:
>
> +static int vp_modern_enable_reset_vq(struct virtqueue *vq)
> +{
> +       struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> +       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> +       struct virtio_pci_vq_info *info;
> +       unsigned long flags, index;
> +       int err;
> +
> +       if (!vq->reset)
> +               return -EBUSY;
> +
> +       index = vq->index;
> +       info = vp_dev->vqs[index];
> +
> +       /* check queue reset status */
> +       if (vp_modern_get_queue_reset(mdev, index) != 1)
> +               return -EBUSY;
> +
> +       err = vp_active_vq(vq, info->msix_vector);
> +       if (err)
> +               return err;
> +
> +       if (vq->callback) {
> +               spin_lock_irqsave(&vp_dev->lock, flags);
> +               list_add(&info->node, &vp_dev->virtqueues);
> +               spin_unlock_irqrestore(&vp_dev->lock, flags);
> +       } else {
> +               INIT_LIST_HEAD(&info->node);
> +       }
> +
> +       vp_modern_set_queue_enable(&vp_dev->mdev, index, true);
> +
> +       if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR)
> +               enable_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
> +
> +       vq->reset = false;
> +
> +       return 0;
> +}
>
>
> There are three situations where an error will be returned. These are the
> situations I want to handle.

Right, but it looks harmless if we just schedule the NAPI without the check.

>
> But on reflection, I think you're right. Even if the hardware setup fails and
> we can no longer sync with the hardware, using it as a normal vq doesn't cause
> any problems.

Note that we should make sure a buggy (or malicious) device can't crash
the code by changing the queue_reset value at will.
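
One common way to keep a misbehaving device from hanging or confusing the driver is to bound every wait on a device-controlled value and to act only on a single snapshot per read. The sketch below shows that idea only; it is not what the series does. `fake_reg` and `poll_untrusted_reg()` are hypothetical, and the expected value is a parameter so that no claim is made about the queue_reset semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical scripted register: replays vals[], then repeats the last one. */
struct fake_reg {
	const unsigned int *vals;
	size_t n;
	size_t pos;
};

static unsigned int fake_reg_read(struct fake_reg *r)
{
	unsigned int v;

	assert(r->n > 0 && r->pos < r->n);
	v = r->vals[r->pos];
	if (r->pos + 1 < r->n)
		r->pos++;
	return v;
}

/*
 * Poll an untrusted register until it reads `expected`.
 * Returns 0 on success, -ETIMEDOUT once the poll budget is used up,
 * so a device that never reports the value cannot stall the caller.
 */
static int poll_untrusted_reg(struct fake_reg *r, unsigned int expected,
			      unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		/* Read once per iteration and decide on that snapshot only. */
		unsigned int snapshot = fake_reg_read(r);

		if (snapshot == expected)
			return 0;
	}
	return -ETIMEDOUT;
}
```

A device that eventually reports the expected value yields 0; one that never does yields `-ETIMEDOUT` after `max_polls` reads instead of looping forever.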

>
> >
> >
> > > +
> > >     /* If all buffers were filled by other side before we napi_enabled, we
> > >      * won't get another interrupt, so process any outstanding packets now.
> > >      * Call local_bh_enable after to trigger softIRQ processing.
> > > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> > >             struct receive_queue *rq = &vi->rq[i];
> > >
> > >             napi_disable(&rq->napi);
> > > +
> > > +           /* Check if vq is in reset state. See more in
> > > +            * virtnet_napi_enable()
> > > +            */
> > > +           if (rq->vq->reset) {
> > > +                   virtnet_napi_enable(rq->vq, &rq->napi);
> > > +                   continue;
> > > +           }
> >
> >
> > Can we do something similar in virtnet_close() by canceling the work?
>
> I think there is no need to cancel the work here, because napi_disable() will
> wait for the napi_enable() of the resize. So if a vq whose re-enable failed is
> used as a normal vq, this check can be removed.

Actually I meant the part of virtnet_rx_resize().

If we don't synchronize with the refill work, it might enable NAPI unexpectedly?

Thanks

>
>
> >
> >
> > > +
> > >             still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > >             virtnet_napi_enable(rq->vq, &rq->napi);
> > >
> > > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >     if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> > >             return;
> > >
> > > +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > > +   if (sq->vq->reset)
> > > +           return;
> >
> >
> > We've disabled TX NAPI; is there any chance we can still hit this?
>
> Same as above.
>
> >
> >
> > > +
> > >     if (__netif_tx_trylock(txq)) {
> > >             do {
> > >                     virtqueue_disable_cb(sq->vq);
> > > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >     return NETDEV_TX_OK;
> > >   }
> > >
> > > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > > +                        struct receive_queue *rq, u32 ring_num)
> > > +{
> > > +   int err;
> > > +
> > > +   napi_disable(&rq->napi);
> > > +
> > > +   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > > +   if (err)
> > > +           goto err;
> > > +
> > > +   if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > > +           schedule_delayed_work(&vi->refill, 0);
> > > +
> > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > +   return 0;
> > > +
> > > +err:
> > > +   netdev_err(vi->dev,
> > > +              "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > > +              rq - vi->rq, err);
> > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > +   return err;
> > > +}
> > > +
> > > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > > +                        struct send_queue *sq, u32 ring_num)
> > > +{
> > > +   struct netdev_queue *txq;
> > > +   int err, qindex;
> > > +
> > > +   qindex = sq - vi->sq;
> > > +
> > > +   virtnet_napi_tx_disable(&sq->napi);
> > > +
> > > +   txq = netdev_get_tx_queue(vi->dev, qindex);
> > > +   __netif_tx_lock_bh(txq);
> > > +   netif_stop_subqueue(vi->dev, qindex);
> > > +   __netif_tx_unlock_bh(txq);
> > > +
> > > +   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > > +   if (err)
> > > +           goto err;
> > > +
> > > +   netif_start_subqueue(vi->dev, qindex);
> > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > +   return 0;
> > > +
> > > +err:
> >
> >
> > I guess we can still start the queue in this case? (Since we don't
> > change the queue if resize fails).
>
> Yes, you are right.
>
> Thanks.
>
> >
> >
> > > +   netdev_err(vi->dev,
> > > +              "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > > +              sq - vi->sq, err);
> > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > +   return err;
> > > +}
> > > +
> > >   /*
> > >    * Send command via the control virtqueue and check status.  Commands
> > >    * supported by the hypervisor, as indicated by feature bits, should
> >
>

Xuan Zhuo April 15, 2022, 2:18 a.m. UTC | #4

On Thu, 14 Apr 2022 17:30:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 13, 2022 at 4:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > > > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> > > >   {
> > > >     napi_enable(napi);
> > > >
> > > > +   /* Check if vq is in reset state. The normal reset/resize process will
> > > > +    * be protected by napi. However, the protection of napi is only enabled
> > > > +    * during the operation, and the protection of napi will end after the
> > > > +    * operation is completed. If re-enable fails during the process, vq
> > > > +    * will remain unavailable with reset state.
> > > > +    */
> > > > +   if (vq->reset)
> > > > +           return;
> > >
> > >
> > > I don't see when we could hit this condition.
> >
> >
> > In patch 23, the code that re-enables the vq is as follows:
> >
> > +static int vp_modern_enable_reset_vq(struct virtqueue *vq)
> > +{
> > +       struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> > +       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> > +       struct virtio_pci_vq_info *info;
> > +       unsigned long flags, index;
> > +       int err;
> > +
> > +       if (!vq->reset)
> > +               return -EBUSY;
> > +
> > +       index = vq->index;
> > +       info = vp_dev->vqs[index];
> > +
> > +       /* check queue reset status */
> > +       if (vp_modern_get_queue_reset(mdev, index) != 1)
> > +               return -EBUSY;
> > +
> > +       err = vp_active_vq(vq, info->msix_vector);
> > +       if (err)
> > +               return err;
> > +
> > +       if (vq->callback) {
> > +               spin_lock_irqsave(&vp_dev->lock, flags);
> > +               list_add(&info->node, &vp_dev->virtqueues);
> > +               spin_unlock_irqrestore(&vp_dev->lock, flags);
> > +       } else {
> > +               INIT_LIST_HEAD(&info->node);
> > +       }
> > +
> > +       vp_modern_set_queue_enable(&vp_dev->mdev, index, true);
> > +
> > +       if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR)
> > +               enable_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
> > +
> > +       vq->reset = false;
> > +
> > +       return 0;
> > +}
> >
> >
> > There are three situations where an error will be returned. These are the
> > situations I want to handle.
>
> Right, but it looks harmless if we just schedule the NAPI without the check.

Yes.

> >
> > But on reflection, I think you're right. Even if the hardware setup fails and
> > we can no longer sync with the hardware, using it as a normal vq doesn't cause
> > any problems.
>
> Note that we should make sure a buggy (or malicious) device can't crash
> the code by changing the queue_reset value at will.

I will keep an eye on this situation.

>
> >
> > >
> > >
> > > > +
> > > >     /* If all buffers were filled by other side before we napi_enabled, we
> > > >      * won't get another interrupt, so process any outstanding packets now.
> > > >      * Call local_bh_enable after to trigger softIRQ processing.
> > > > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> > > >             struct receive_queue *rq = &vi->rq[i];
> > > >
> > > >             napi_disable(&rq->napi);
> > > > +
> > > > +           /* Check if vq is in reset state. See more in
> > > > +            * virtnet_napi_enable()
> > > > +            */
> > > > +           if (rq->vq->reset) {
> > > > +                   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > +                   continue;
> > > > +           }
> > >
> > >
> > > Can we do something similar in virtnet_close() by canceling the work?
> >
> > I think there is no need to cancel the work here, because napi_disable() will
> > wait for the napi_enable() of the resize. So if a vq whose re-enable failed is
> > used as a normal vq, this check can be removed.
>
> Actually I meant the part of virtnet_rx_resize().
>
> If we don't synchronize with the refill work, it might enable NAPI unexpectedly?

I don't think we will hit this situation: napi_disable() is mutually
exclusive, so NAPI cannot be enabled unexpectedly.

Is there something I misunderstood?

Thanks.
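
The mutual-exclusion argument can be modelled with a single atomic flag. This is a simplified sketch, not the kernel implementation: `mock_napi` is hypothetical, and the blocking wait of napi_disable() is replaced by a non-blocking `mock_napi_try_disable()` so that the ownership rule can be observed from one thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one flag stands in for "NAPI is currently disabled". */
struct mock_napi {
	atomic_flag owned;
};

/*
 * Non-blocking stand-in for napi_disable(): returns true when the caller
 * became the owner, false when someone else already holds the disable.
 * The real call would wait instead of returning false.
 */
static bool mock_napi_try_disable(struct mock_napi *n)
{
	return !atomic_flag_test_and_set(&n->owned);
}

/* Stand-in for napi_enable(): releases ownership. */
static void mock_napi_enable(struct mock_napi *n)
{
	atomic_flag_clear(&n->owned);
}
```

If the resize path holds the disable, a second caller such as the refill work cannot also acquire it until the first one calls enable, so in this model the two never run their critical sections at the same time.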

>
> Thanks
>
> >
> >
> > >
> > >
> > > > +
> > > >             still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > > >             virtnet_napi_enable(rq->vq, &rq->napi);
> > > >
> > > > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > > >     if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> > > >             return;
> > > >
> > > > +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > > > +   if (sq->vq->reset)
> > > > +           return;
> > >
> > >
> > > We've disabled TX napi, any chance we can still hit this?
> >
> > Same as above.
> >
> > >
> > >
> > > > +
> > > >     if (__netif_tx_trylock(txq)) {
> > > >             do {
> > > >                     virtqueue_disable_cb(sq->vq);
> > > > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > >     return NETDEV_TX_OK;
> > > >   }
> > > >
> > > > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > > > +                        struct receive_queue *rq, u32 ring_num)
> > > > +{
> > > > +   int err;
> > > > +
> > > > +   napi_disable(&rq->napi);
> > > > +
> > > > +   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > > > +   if (err)
> > > > +           goto err;
> > > > +
> > > > +   if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > > > +           schedule_delayed_work(&vi->refill, 0);
> > > > +
> > > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > +   return 0;
> > > > +
> > > > +err:
> > > > +   netdev_err(vi->dev,
> > > > +              "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > > > +              rq - vi->rq, err);
> > > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > +   return err;
> > > > +}
> > > > +
> > > > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > > > +                        struct send_queue *sq, u32 ring_num)
> > > > +{
> > > > +   struct netdev_queue *txq;
> > > > +   int err, qindex;
> > > > +
> > > > +   qindex = sq - vi->sq;
> > > > +
> > > > +   virtnet_napi_tx_disable(&sq->napi);
> > > > +
> > > > +   txq = netdev_get_tx_queue(vi->dev, qindex);
> > > > +   __netif_tx_lock_bh(txq);
> > > > +   netif_stop_subqueue(vi->dev, qindex);
> > > > +   __netif_tx_unlock_bh(txq);
> > > > +
> > > > +   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > > > +   if (err)
> > > > +           goto err;
> > > > +
> > > > +   netif_start_subqueue(vi->dev, qindex);
> > > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > > +   return 0;
> > > > +
> > > > +err:
> > >
> > >
> > > I guess we can still start the queue in this case? (Since we don't
> > > change the queue if resize fails).
> >
> > Yes, you are right.
> >
> > Thanks.
> >
> > >
> > >
> > > > +   netdev_err(vi->dev,
> > > > +              "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > > > +              sq - vi->sq, err);
> > > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > > +   return err;
> > > > +}
> > > > +
> > > >   /*
> > > >    * Send command via the control virtqueue and check status.  Commands
> > > >    * supported by the hypervisor, as indicated by feature bits, should
> > >
> >
>

Jason Wang April 15, 2022, 5:53 a.m. UTC | #5

On Fri, Apr 15, 2022 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 14 Apr 2022 17:30:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 13, 2022 at 4:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > > On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > > > > This patch implements the resize function of the rx, tx queues.
> > > > > Based on this function, it is possible to modify the ring num of the
> > > > > queue.
> > > > >
> > > > > There may be an exception during the resize process, the resize may
> > > > > fail, or the vq can no longer be used. Either way, we must execute
> > > > > napi_enable(). Because napi_disable is similar to a lock, napi_enable
> > > > > must be called after calling napi_disable.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >   drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
> > > > >   1 file changed, 81 insertions(+)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index b8bf00525177..ba6859f305f7 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
> > > > >     char padding[4];
> > > > >   };
> > > > >
> > > > > +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > > +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > > +
> > > > >   static bool is_xdp_frame(void *ptr)
> > > > >   {
> > > > >     return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > > > > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> > > > >   {
> > > > >     napi_enable(napi);
> > > > >
> > > > > +   /* Check if vq is in reset state. The normal reset/resize process will
> > > > > +    * be protected by napi. However, the protection of napi is only enabled
> > > > > +    * during the operation, and the protection of napi will end after the
> > > > > +    * operation is completed. If re-enable fails during the process, vq
> > > > > +    * will remain unavailable with reset state.
> > > > > +    */
> > > > > +   if (vq->reset)
> > > > > +           return;
> > > >
> > > >
> > > > I don't get when could we hit this condition.
> > >
> > >
> > > In patch 23, the code to implement re-enable vq is as follows:
> > >
> > > +static int vp_modern_enable_reset_vq(struct virtqueue *vq)
> > > +{
> > > +       struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> > > +       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> > > +       struct virtio_pci_vq_info *info;
> > > +       unsigned long flags, index;
> > > +       int err;
> > > +
> > > +       if (!vq->reset)
> > > +               return -EBUSY;
> > > +
> > > +       index = vq->index;
> > > +       info = vp_dev->vqs[index];
> > > +
> > > +       /* check queue reset status */
> > > +       if (vp_modern_get_queue_reset(mdev, index) != 1)
> > > +               return -EBUSY;
> > > +
> > > +       err = vp_active_vq(vq, info->msix_vector);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       if (vq->callback) {
> > > +               spin_lock_irqsave(&vp_dev->lock, flags);
> > > +               list_add(&info->node, &vp_dev->virtqueues);
> > > +               spin_unlock_irqrestore(&vp_dev->lock, flags);
> > > +       } else {
> > > +               INIT_LIST_HEAD(&info->node);
> > > +       }
> > > +
> > > +       vp_modern_set_queue_enable(&vp_dev->mdev, index, true);
> > > +
> > > +       if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR)
> > > +               enable_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
> > > +
> > > +       vq->reset = false;
> > > +
> > > +       return 0;
> > > +}
> > >
> > >
> > > There are three situations where an error will be returned. These are the
> > > situations I want to handle.
> >
> > Right, but it looks harmless if we just schedule the NAPI without the check.
>
> Yes.
>
> > >
> > > > But rethinking the question, I feel you're right. Although the hardware
> > > > setup may fail and we can no longer sync with the hardware, using it as a
> > > > normal vq doesn't cause any problems.
> >
> > > Note that we should make sure a buggy (malicious) device can't crash
> > > the code by changing the queue_reset value at will.
>
> I will keep an eye on this situation.
>
> >
> > >
> > > >
> > > >
> > > > > +
> > > > >     /* If all buffers were filled by other side before we napi_enabled, we
> > > > >      * won't get another interrupt, so process any outstanding packets now.
> > > > >      * Call local_bh_enable after to trigger softIRQ processing.
> > > > > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> > > > >             struct receive_queue *rq = &vi->rq[i];
> > > > >
> > > > >             napi_disable(&rq->napi);
> > > > > +
> > > > > +           /* Check if vq is in reset state. See more in
> > > > > +            * virtnet_napi_enable()
> > > > > +            */
> > > > > +           if (rq->vq->reset) {
> > > > > +                   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > +                   continue;
> > > > > +           }
> > > >
> > > >
> > > > Can we do something similar in virtnet_close() by canceling the work?
> > >
> > > I think there is no need to cancel the work here, because napi_disable will wait
> > > for the napi_enable of the resize. So if a vq whose re-enable failed is treated
> > > as a normal vq, this logic can be removed.
> >
> > Actually I meant the part of virtnet_rx_resize().
> >
> > If we don't synchronize with the refill work, it might enable NAPI unexpectedly?
>
> I don't think this situation will be encountered, because napi_disable is
> mutually exclusive, so there will be no unexpected napi enable.
>
> Is there something I misunderstood?

So in virtnet_rx_resize() we do:

napi_disable()
...
resize()
...
napi_enable()

How can we guarantee that the work is not run after the napi_disable()?

Thanks

>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > >
> > > >
> > > >
> > > > > +
> > > > >             still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > > > >             virtnet_napi_enable(rq->vq, &rq->napi);
> > > > >
> > > > > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > > > >     if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> > > > >             return;
> > > > >
> > > > > +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > > > > +   if (sq->vq->reset)
> > > > > +           return;
> > > >
> > > >
> > > > We've disabled TX napi, any chance we can still hit this?
> > >
> > > Same as above.
> > >
> > > >
> > > >
> > > > > +
> > > > >     if (__netif_tx_trylock(txq)) {
> > > > >             do {
> > > > >                     virtqueue_disable_cb(sq->vq);
> > > > > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >     return NETDEV_TX_OK;
> > > > >   }
> > > > >
> > > > > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > > > > +                        struct receive_queue *rq, u32 ring_num)
> > > > > +{
> > > > > +   int err;
> > > > > +
> > > > > +   napi_disable(&rq->napi);
> > > > > +
> > > > > +   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > > > > +   if (err)
> > > > > +           goto err;
> > > > > +
> > > > > +   if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > > > > +           schedule_delayed_work(&vi->refill, 0);
> > > > > +
> > > > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > +   return 0;
> > > > > +
> > > > > +err:
> > > > > +   netdev_err(vi->dev,
> > > > > +              "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > > > > +              rq - vi->rq, err);
> > > > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > +   return err;
> > > > > +}
> > > > > +
> > > > > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > > > > +                        struct send_queue *sq, u32 ring_num)
> > > > > +{
> > > > > +   struct netdev_queue *txq;
> > > > > +   int err, qindex;
> > > > > +
> > > > > +   qindex = sq - vi->sq;
> > > > > +
> > > > > +   virtnet_napi_tx_disable(&sq->napi);
> > > > > +
> > > > > +   txq = netdev_get_tx_queue(vi->dev, qindex);
> > > > > +   __netif_tx_lock_bh(txq);
> > > > > +   netif_stop_subqueue(vi->dev, qindex);
> > > > > +   __netif_tx_unlock_bh(txq);
> > > > > +
> > > > > +   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > > > > +   if (err)
> > > > > +           goto err;
> > > > > +
> > > > > +   netif_start_subqueue(vi->dev, qindex);
> > > > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > > > +   return 0;
> > > > > +
> > > > > +err:
> > > >
> > > >
> > > > I guess we can still start the queue in this case? (Since we don't
> > > > change the queue if resize fails).
> > >
> > > Yes, you are right.
> > >
> > > Thanks.
> > >
> > > >
> > > >
> > > > > +   netdev_err(vi->dev,
> > > > > +              "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > > > > +              sq - vi->sq, err);
> > > > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > > > +   return err;
> > > > > +}
> > > > > +
> > > > >   /*
> > > > >    * Send command via the control virtqueue and check status.  Commands
> > > > >    * supported by the hypervisor, as indicated by feature bits, should
> > > >
> > >
> >
>

Xuan Zhuo April 15, 2022, 9:17 a.m. UTC | #6

On Fri, 15 Apr 2022 13:53:54 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 15, 2022 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 14 Apr 2022 17:30:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 13, 2022 at 4:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > > On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > > > > > This patch implements the resize function of the rx, tx queues.
> > > > > > Based on this function, it is possible to modify the ring num of the
> > > > > > queue.
> > > > > >
> > > > > > There may be an exception during the resize process, the resize may
> > > > > > fail, or the vq can no longer be used. Either way, we must execute
> > > > > > napi_enable(). Because napi_disable is similar to a lock, napi_enable
> > > > > > must be called after calling napi_disable.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >   drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
> > > > > >   1 file changed, 81 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index b8bf00525177..ba6859f305f7 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
> > > > > >     char padding[4];
> > > > > >   };
> > > > > >
> > > > > > +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > > > +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > > > +
> > > > > >   static bool is_xdp_frame(void *ptr)
> > > > > >   {
> > > > > >     return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > > > > > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> > > > > >   {
> > > > > >     napi_enable(napi);
> > > > > >
> > > > > > +   /* Check if vq is in reset state. The normal reset/resize process will
> > > > > > +    * be protected by napi. However, the protection of napi is only enabled
> > > > > > +    * during the operation, and the protection of napi will end after the
> > > > > > +    * operation is completed. If re-enable fails during the process, vq
> > > > > > +    * will remain unavailable with reset state.
> > > > > > +    */
> > > > > > +   if (vq->reset)
> > > > > > +           return;
> > > > >
> > > > >
> > > > > I don't get when could we hit this condition.
> > > >
> > > >
> > > > In patch 23, the code to implement re-enable vq is as follows:
> > > >
> > > > +static int vp_modern_enable_reset_vq(struct virtqueue *vq)
> > > > +{
> > > > +       struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> > > > +       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
> > > > +       struct virtio_pci_vq_info *info;
> > > > +       unsigned long flags, index;
> > > > +       int err;
> > > > +
> > > > +       if (!vq->reset)
> > > > +               return -EBUSY;
> > > > +
> > > > +       index = vq->index;
> > > > +       info = vp_dev->vqs[index];
> > > > +
> > > > +       /* check queue reset status */
> > > > +       if (vp_modern_get_queue_reset(mdev, index) != 1)
> > > > +               return -EBUSY;
> > > > +
> > > > +       err = vp_active_vq(vq, info->msix_vector);
> > > > +       if (err)
> > > > +               return err;
> > > > +
> > > > +       if (vq->callback) {
> > > > +               spin_lock_irqsave(&vp_dev->lock, flags);
> > > > +               list_add(&info->node, &vp_dev->virtqueues);
> > > > +               spin_unlock_irqrestore(&vp_dev->lock, flags);
> > > > +       } else {
> > > > +               INIT_LIST_HEAD(&info->node);
> > > > +       }
> > > > +
> > > > +       vp_modern_set_queue_enable(&vp_dev->mdev, index, true);
> > > > +
> > > > +       if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR)
> > > > +               enable_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
> > > > +
> > > > +       vq->reset = false;
> > > > +
> > > > +       return 0;
> > > > +}
> > > >
> > > >
> > > > There are three situations where an error will be returned. These are the
> > > > situations I want to handle.
> > >
> > > Right, but it looks harmless if we just schedule the NAPI without the check.
> >
> > Yes.
> >
> > > >
> > > > > But rethinking the question, I feel you're right. Although the hardware
> > > > > setup may fail and we can no longer sync with the hardware, using it as a
> > > > > normal vq doesn't cause any problems.
> > >
> > > > Note that we should make sure a buggy (malicious) device can't crash
> > > > the code by changing the queue_reset value at will.
> >
> > I will keep an eye on this situation.
> >
> > >
> > > >
> > > > >
> > > > >
> > > > > > +
> > > > > >     /* If all buffers were filled by other side before we napi_enabled, we
> > > > > >      * won't get another interrupt, so process any outstanding packets now.
> > > > > >      * Call local_bh_enable after to trigger softIRQ processing.
> > > > > > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> > > > > >             struct receive_queue *rq = &vi->rq[i];
> > > > > >
> > > > > >             napi_disable(&rq->napi);
> > > > > > +
> > > > > > +           /* Check if vq is in reset state. See more in
> > > > > > +            * virtnet_napi_enable()
> > > > > > +            */
> > > > > > +           if (rq->vq->reset) {
> > > > > > +                   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > > +                   continue;
> > > > > > +           }
> > > > >
> > > > >
> > > > > Can we do something similar in virtnet_close() by canceling the work?
> > > >
> > > > I think there is no need to cancel the work here, because napi_disable will wait
> > > > for the napi_enable of the resize. So if a vq whose re-enable failed is treated
> > > > as a normal vq, this logic can be removed.
> > >
> > > Actually I meant the part of virtnet_rx_resize().
> > >
> > > If we don't synchronize with the refill work, it might enable NAPI unexpectedly?
> >
> > I don't think this situation will be encountered, because napi_disable is
> > mutually exclusive, so there will be no unexpected napi enable.
> >
> > Is there something I misunderstood?
>
> So in virtnet_rx_resize() we do:
>
> napi_disable()
> ...
> resize()
> ...
> napi_enable()
>
> How can we guarantee that the work is not run after the napi_disable()?


I think you're talking about a situation like this:

virtnet_rx_resize          refill work
-----------------------------------------------------------
 napi_disable()
 ...                       napi_disable()
 resize()                      ...
                           napi_enable()
 ...
 napi_enable()


But in fact:

virtnet_rx_resize          refill work
-----------------------------------------------------------
 napi_disable()
 ...                       napi_disable() <----[0]
 resize()                       |
 ...                            |
 napi_enable()                  |
                           napi_disable() <---- [1] succeeds here
                           napi_enable()

Because virtnet_rx_resize() has already executed napi_disable(), the
napi_disable() at [0] blocks and only completes at [1], after
virtnet_rx_resize() has called napi_enable().
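
The timeline above can be replayed as a small single-threaded userspace model.
This is only a sketch under assumptions: struct model_napi,
model_napi_disable_try(), model_napi_enable() and replay_resize_vs_refill() are
hypothetical names, not kernel APIs, and the boolean "sched" merely stands in
for the state bit that napi_disable() has to acquire before it returns. One
call to model_napi_disable_try() represents one polling attempt of the real,
sleeping napi_disable().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; not kernel code. "sched" stands in for the
 * state bit that napi_disable() has to acquire before returning.
 */
struct model_napi {
	bool sched;
};

/* One polling attempt of napi_disable(): returns true when the caller
 * acquired the bit, false when it would have to keep waiting.
 */
static bool model_napi_disable_try(struct model_napi *n)
{
	if (n->sched)
		return false;
	n->sched = true;
	return true;
}

static void model_napi_enable(struct model_napi *n)
{
	/* enable must pair with an earlier successful disable */
	assert(n->sched);
	n->sched = false;
}

/* Replays the second timeline. Returns how many attempts of the refill
 * work failed while the resize path owned the queue.
 */
static int replay_resize_vs_refill(void)
{
	struct model_napi napi = { .sched = false };
	int refill_blocked = 0;
	bool ok;

	/* virtnet_rx_resize: napi_disable() */
	ok = model_napi_disable_try(&napi);
	assert(ok);

	/* refill work: the attempt at [0] cannot succeed yet */
	if (!model_napi_disable_try(&napi))
		refill_blocked++;

	/* virtnet_rx_resize: resize(), then napi_enable() */
	model_napi_enable(&napi);

	/* refill work: the retry at [1] succeeds */
	ok = model_napi_disable_try(&napi);
	assert(ok);
	model_napi_enable(&napi);

	return refill_blocked;
}
```

In this model the refill work's first attempt fails while the resize owns the
queue, and only the retry after napi_enable() succeeds, which is the ordering
marked [0] and [1] above.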

I'm not sure if my understanding is correct.

Thanks.

>
> Thanks
>
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > > +
> > > > > >             still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > > > > >             virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > >
> > > > > > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > > > > >     if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> > > > > >             return;
> > > > > >
> > > > > > +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > > > > > +   if (sq->vq->reset)
> > > > > > +           return;
> > > > >
> > > > >
> > > > > We've disabled TX napi, any chance we can still hit this?
> > > >
> > > > Same as above.
> > > >
> > > > >
> > > > >
> > > > > > +
> > > > > >     if (__netif_tx_trylock(txq)) {
> > > > > >             do {
> > > > > >                     virtqueue_disable_cb(sq->vq);
> > > > > > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >     return NETDEV_TX_OK;
> > > > > >   }
> > > > > >
> > > > > > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > > > > > +                        struct receive_queue *rq, u32 ring_num)
> > > > > > +{
> > > > > > +   int err;
> > > > > > +
> > > > > > +   napi_disable(&rq->napi);
> > > > > > +
> > > > > > +   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > > > > > +   if (err)
> > > > > > +           goto err;
> > > > > > +
> > > > > > +   if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > > > > > +           schedule_delayed_work(&vi->refill, 0);
> > > > > > +
> > > > > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > > +   return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +   netdev_err(vi->dev,
> > > > > > +              "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > > > > > +              rq - vi->rq, err);
> > > > > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > > > +   return err;
> > > > > > +}
> > > > > > +
> > > > > > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > > > > > +                        struct send_queue *sq, u32 ring_num)
> > > > > > +{
> > > > > > +   struct netdev_queue *txq;
> > > > > > +   int err, qindex;
> > > > > > +
> > > > > > +   qindex = sq - vi->sq;
> > > > > > +
> > > > > > +   virtnet_napi_tx_disable(&sq->napi);
> > > > > > +
> > > > > > +   txq = netdev_get_tx_queue(vi->dev, qindex);
> > > > > > +   __netif_tx_lock_bh(txq);
> > > > > > +   netif_stop_subqueue(vi->dev, qindex);
> > > > > > +   __netif_tx_unlock_bh(txq);
> > > > > > +
> > > > > > +   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > > > > > +   if (err)
> > > > > > +           goto err;
> > > > > > +
> > > > > > +   netif_start_subqueue(vi->dev, qindex);
> > > > > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > > > > +   return 0;
> > > > > > +
> > > > > > +err:
> > > > >
> > > > >
> > > > > I guess we can still start the queue in this case? (Since we don't
> > > > > change the queue if resize fails).
> > > >
> > > > Yes, you are right.
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > > > +   netdev_err(vi->dev,
> > > > > > +              "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > > > > > +              sq - vi->sq, err);
> > > > > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > > > > +   return err;
> > > > > > +}
> > > > > > +
> > > > > >   /*
> > > > > >    * Send command via the control virtqueue and check status.  Commands
> > > > > >    * supported by the hypervisor, as indicated by feature bits, should
> > > > >
> > > >
> > >
> >
>

Xuan Zhuo April 18, 2022, 3:21 a.m. UTC | #7

On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
>
> On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > This patch implements the resize function of the rx, tx queues.
> > Based on this function, it is possible to modify the ring num of the
> > queue.
> >
> > There may be an exception during the resize process, the resize may
> > fail, or the vq can no longer be used. Either way, we must execute
> > napi_enable(). Because napi_disable is similar to a lock, napi_enable
> > must be called after calling napi_disable.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 81 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b8bf00525177..ba6859f305f7 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
> >   	char padding[4];
> >   };
> >
> > +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> > +
> >   static bool is_xdp_frame(void *ptr)
> >   {
> >   	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> >   {
> >   	napi_enable(napi);
> >
> > +	/* Check if vq is in reset state. The normal reset/resize process will
> > +	 * be protected by napi. However, the protection of napi is only enabled
> > +	 * during the operation, and the protection of napi will end after the
> > +	 * operation is completed. If re-enable fails during the process, vq
> > +	 * will remain unavailable with reset state.
> > +	 */
> > +	if (vq->reset)
> > +		return;
>
>
> I don't get when could we hit this condition.
>
>
> > +
> >   	/* If all buffers were filled by other side before we napi_enabled, we
> >   	 * won't get another interrupt, so process any outstanding packets now.
> >   	 * Call local_bh_enable after to trigger softIRQ processing.
> > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> >   		struct receive_queue *rq = &vi->rq[i];
> >
> >   		napi_disable(&rq->napi);
> > +
> > +		/* Check if vq is in reset state. See more in
> > +		 * virtnet_napi_enable()
> > +		 */
> > +		if (rq->vq->reset) {
> > +			virtnet_napi_enable(rq->vq, &rq->napi);
> > +			continue;
> > +		}
>
>
> Can we do something similar in virtnet_close() by canceling the work?
>
>
> > +
> >   		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> >   		virtnet_napi_enable(rq->vq, &rq->napi);
> >
> > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >   	if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> >   		return;
> >
> > +	/* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > +	if (sq->vq->reset)
> > +		return;
>
>
> We've disabled TX napi, any chance we can still hit this?


static int virtnet_poll(struct napi_struct *napi, int budget)
{
	struct receive_queue *rq =
		container_of(napi, struct receive_queue, napi);
	struct virtnet_info *vi = rq->vq->vdev->priv;
	struct send_queue *sq;
	unsigned int received;
	unsigned int xdp_xmit = 0;

	virtnet_poll_cleantx(rq);
...
}

virtnet_poll_cleantx() is called from the rx poll. Although it implements tx
logic, it is driven by rx napi rather than tx napi.
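
A minimal userspace sketch of that call path, assuming hypothetical model_*
types and functions rather than the real driver structures: the tx cleanup is
reached from the rx poll even when tx napi is disabled, so in this model the
reset check on the tx vq is the only thing that stops it.

```c
#include <stdbool.h>

/* Userspace model only; all names here are hypothetical. */
struct model_vq {
	bool reset;
};

struct model_queue_pair {
	struct model_vq tx_vq;
	bool tx_napi_enabled;	/* deliberately not consulted below */
	int tx_cleaned;		/* counts tx cleanup passes */
};

/* Mirrors the shape of the check added to virtnet_poll_cleantx(). */
static void model_poll_cleantx(struct model_queue_pair *qp)
{
	if (qp->tx_vq.reset)
		return;
	qp->tx_cleaned++;
}

/* The rx poll calls the tx cleanup directly, regardless of tx napi. */
static void model_rx_poll(struct model_queue_pair *qp)
{
	model_poll_cleantx(qp);
}

/* Runs one rx poll with tx napi disabled and reports the cleanup count. */
static int model_cleanups_from_rx_poll(bool tx_vq_in_reset)
{
	struct model_queue_pair qp = {
		.tx_vq = { .reset = tx_vq_in_reset },
		.tx_napi_enabled = false,
		.tx_cleaned = 0,
	};

	model_rx_poll(&qp);
	return qp.tx_cleaned;
}
```

With tx napi disabled and the tx vq not in reset, one rx poll still performs a
cleanup pass; with the tx vq in reset, the pass is skipped.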

Thanks.


>
>
> > +
> >   	if (__netif_tx_trylock(txq)) {
> >   		do {
> >   			virtqueue_disable_cb(sq->vq);
> > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >   	return NETDEV_TX_OK;
> >   }
> >
> > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > +			     struct receive_queue *rq, u32 ring_num)
> > +{
> > +	int err;
> > +
> > +	napi_disable(&rq->napi);
> > +
> > +	err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > +	if (err)
> > +		goto err;
> > +
> > +	if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > +		schedule_delayed_work(&vi->refill, 0);
> > +
> > +	virtnet_napi_enable(rq->vq, &rq->napi);
> > +	return 0;
> > +
> > +err:
> > +	netdev_err(vi->dev,
> > +		   "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > +		   rq - vi->rq, err);
> > +	virtnet_napi_enable(rq->vq, &rq->napi);
> > +	return err;
> > +}
> > +
> > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > +			     struct send_queue *sq, u32 ring_num)
> > +{
> > +	struct netdev_queue *txq;
> > +	int err, qindex;
> > +
> > +	qindex = sq - vi->sq;
> > +
> > +	virtnet_napi_tx_disable(&sq->napi);
> > +
> > +	txq = netdev_get_tx_queue(vi->dev, qindex);
> > +	__netif_tx_lock_bh(txq);
> > +	netif_stop_subqueue(vi->dev, qindex);
> > +	__netif_tx_unlock_bh(txq);
> > +
> > +	err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > +	if (err)
> > +		goto err;
> > +
> > +	netif_start_subqueue(vi->dev, qindex);
> > +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > +	return 0;
> > +
> > +err:
>
>
> I guess we can still start the queue in this case? (Since we don't
> change the queue if resize fails).
>
>
> > +	netdev_err(vi->dev,
> > +		   "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > +		   sq - vi->sq, err);
> > +	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > +	return err;
> > +}
> > +
> >   /*
> >    * Send command via the control virtqueue and check status.  Commands
> >    * supported by the hypervisor, as indicated by feature bits, should
>

Jason Wang April 18, 2022, 7:49 a.m. UTC | #8

On Mon, Apr 18, 2022 at 11:24 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >
> > On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > > This patch implements the resize function of the rx, tx queues.
> > > Based on this function, it is possible to modify the ring num of the
> > > queue.
> > >
> > > There may be an exception during the resize process, the resize may
> > > fail, or the vq can no longer be used. Either way, we must execute
> > > napi_enable(). Because napi_disable is similar to a lock, napi_enable
> > > must be called after calling napi_disable.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >   drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 81 insertions(+)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index b8bf00525177..ba6859f305f7 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
> > >     char padding[4];
> > >   };
> > >
> > > +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > +
> > >   static bool is_xdp_frame(void *ptr)
> > >   {
> > >     return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> > >   {
> > >     napi_enable(napi);
> > >
> > > +   /* Check if vq is in reset state. The normal reset/resize process will
> > > +    * be protected by napi. However, the protection of napi is only enabled
> > > +    * during the operation, and the protection of napi will end after the
> > > +    * operation is completed. If re-enable fails during the process, vq
> > > +    * will remain unavailable with reset state.
> > > +    */
> > > +   if (vq->reset)
> > > +           return;
> >
> >
> > I don't get when could we hit this condition.
> >
> >
> > > +
> > >     /* If all buffers were filled by other side before we napi_enabled, we
> > >      * won't get another interrupt, so process any outstanding packets now.
> > >      * Call local_bh_enable after to trigger softIRQ processing.
> > > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> > >             struct receive_queue *rq = &vi->rq[i];
> > >
> > >             napi_disable(&rq->napi);
> > > +
> > > +           /* Check if vq is in reset state. See more in
> > > +            * virtnet_napi_enable()
> > > +            */
> > > +           if (rq->vq->reset) {
> > > +                   virtnet_napi_enable(rq->vq, &rq->napi);
> > > +                   continue;
> > > +           }
> >
> >
> > Can we do something similar in virtnet_close() by canceling the work?
> >
> >
> > > +
> > >             still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > >             virtnet_napi_enable(rq->vq, &rq->napi);
> > >
> > > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >     if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> > >             return;
> > >
> > > +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > > +   if (sq->vq->reset)
> > > +           return;
> >
> >
> > We've disabled TX napi, any chance we can still hit this?
>
>
> static int virtnet_poll(struct napi_struct *napi, int budget)
> {
>         struct receive_queue *rq =
>                 container_of(napi, struct receive_queue, napi);
>         struct virtnet_info *vi = rq->vq->vdev->priv;
>         struct send_queue *sq;
>         unsigned int received;
>         unsigned int xdp_xmit = 0;
>
>         virtnet_poll_cleantx(rq);
> ...
> }
>
> This is called by rx poll. Although it is the logic of tx, it is not driven by
> tx napi, but is called in rx poll.

OK, but we need to guarantee the memory ordering in this case. Disabling
RX napi could be a solution for this.

Thanks

>
> Thanks.
>
>
> >
> >
> > > +
> > >     if (__netif_tx_trylock(txq)) {
> > >             do {
> > >                     virtqueue_disable_cb(sq->vq);
> > > @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >     return NETDEV_TX_OK;
> > >   }
> > >
> > > +static int virtnet_rx_resize(struct virtnet_info *vi,
> > > +                        struct receive_queue *rq, u32 ring_num)
> > > +{
> > > +   int err;
> > > +
> > > +   napi_disable(&rq->napi);
> > > +
> > > +   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
> > > +   if (err)
> > > +           goto err;
> > > +
> > > +   if (!try_fill_recv(vi, rq, GFP_KERNEL))
> > > +           schedule_delayed_work(&vi->refill, 0);
> > > +
> > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > +   return 0;
> > > +
> > > +err:
> > > +   netdev_err(vi->dev,
> > > +              "reset rx reset vq fail: rx queue index: %td err: %d\n",
> > > +              rq - vi->rq, err);
> > > +   virtnet_napi_enable(rq->vq, &rq->napi);
> > > +   return err;
> > > +}
> > > +
> > > +static int virtnet_tx_resize(struct virtnet_info *vi,
> > > +                        struct send_queue *sq, u32 ring_num)
> > > +{
> > > +   struct netdev_queue *txq;
> > > +   int err, qindex;
> > > +
> > > +   qindex = sq - vi->sq;
> > > +
> > > +   virtnet_napi_tx_disable(&sq->napi);
> > > +
> > > +   txq = netdev_get_tx_queue(vi->dev, qindex);
> > > +   __netif_tx_lock_bh(txq);
> > > +   netif_stop_subqueue(vi->dev, qindex);
> > > +   __netif_tx_unlock_bh(txq);
> > > +
> > > +   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
> > > +   if (err)
> > > +           goto err;
> > > +
> > > +   netif_start_subqueue(vi->dev, qindex);
> > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > +   return 0;
> > > +
> > > +err:
> >
> >
> > I guess we can still start the queue in this case? (Since we don't
> > change the queue if resize fails).
> >
> >
> > > +   netdev_err(vi->dev,
> > > +              "reset tx reset vq fail: tx queue index: %td err: %d\n",
> > > +              sq - vi->sq, err);
> > > +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> > > +   return err;
> > > +}
> > > +
> > >   /*
> > >    * Send command via the control virtqueue and check status.  Commands
> > >    * supported by the hypervisor, as indicated by feature bits, should
> >
>

Jason Wang April 18, 2022, 7:57 a.m. UTC | #9

On 2022/4/15 17:17, Xuan Zhuo wrote:
> On Fri, 15 Apr 2022 13:53:54 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> On Fri, Apr 15, 2022 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>> On Thu, 14 Apr 2022 17:30:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Wed, Apr 13, 2022 at 4:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>> On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
>>>>>>> This patch implements the resize function of the rx, tx queues.
>>>>>>> Based on this function, it is possible to modify the ring num of the
>>>>>>> queue.
>>>>>>>
>>>>>>> There may be an exception during the resize process, the resize may
>>>>>>> fail, or the vq can no longer be used. Either way, we must execute
>>>>>>> napi_enable(). Because napi_disable is similar to a lock, napi_enable
>>>>>>> must be called after calling napi_disable.
>>>>>>>
>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>> ---
>>>>>>>    drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
>>>>>>>    1 file changed, 81 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index b8bf00525177..ba6859f305f7 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
>>>>>>>      char padding[4];
>>>>>>>    };
>>>>>>>
>>>>>>> +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
>>>>>>> +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
>>>>>>> +
>>>>>>>    static bool is_xdp_frame(void *ptr)
>>>>>>>    {
>>>>>>>      return (unsigned long)ptr & VIRTIO_XDP_FLAG;
>>>>>>> @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
>>>>>>>    {
>>>>>>>      napi_enable(napi);
>>>>>>>
>>>>>>> +   /* Check if vq is in reset state. The normal reset/resize process will
>>>>>>> +    * be protected by napi. However, the protection of napi is only enabled
>>>>>>> +    * during the operation, and the protection of napi will end after the
>>>>>>> +    * operation is completed. If re-enable fails during the process, vq
>>>>>>> +    * will remain unavailable with reset state.
>>>>>>> +    */
>>>>>>> +   if (vq->reset)
>>>>>>> +           return;
>>>>>>
>>>>>> I don't get when we could hit this condition.
>>>>>
>>>>> In patch 23, the code to implement re-enable vq is as follows:
>>>>>
>>>>> +static int vp_modern_enable_reset_vq(struct virtqueue *vq)
>>>>> +{
>>>>> +       struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>>>>> +       struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
>>>>> +       struct virtio_pci_vq_info *info;
>>>>> +       unsigned long flags, index;
>>>>> +       int err;
>>>>> +
>>>>> +       if (!vq->reset)
>>>>> +               return -EBUSY;
>>>>> +
>>>>> +       index = vq->index;
>>>>> +       info = vp_dev->vqs[index];
>>>>> +
>>>>> +       /* check queue reset status */
>>>>> +       if (vp_modern_get_queue_reset(mdev, index) != 1)
>>>>> +               return -EBUSY;
>>>>> +
>>>>> +       err = vp_active_vq(vq, info->msix_vector);
>>>>> +       if (err)
>>>>> +               return err;
>>>>> +
>>>>> +       if (vq->callback) {
>>>>> +               spin_lock_irqsave(&vp_dev->lock, flags);
>>>>> +               list_add(&info->node, &vp_dev->virtqueues);
>>>>> +               spin_unlock_irqrestore(&vp_dev->lock, flags);
>>>>> +       } else {
>>>>> +               INIT_LIST_HEAD(&info->node);
>>>>> +       }
>>>>> +
>>>>> +       vp_modern_set_queue_enable(&vp_dev->mdev, index, true);
>>>>> +
>>>>> +       if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR)
>>>>> +               enable_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
>>>>> +
>>>>> +       vq->reset = false;
>>>>> +
>>>>> +       return 0;
>>>>> +}
>>>>>
>>>>>
>>>>> There are three situations where an error will be returned. These are the
>>>>> situations I want to handle.
>>>> Right, but it looks harmless if we just schedule the NAPI without the check.
>>> Yes.
>>>
>>>>> But I'm rethinking the question, and I feel like you're right, although the
>>>>> hardware setup may fail. We can no longer sync with the hardware. But using it
>>>>> as a normal vq doesn't have any problems.
>>>> Note that we should make sure a buggy (malicious) device can't crash
>>>> the code by changing the queue_reset value at will.
>>> I will keep an eye on this situation.
>>>
>>>>>>
>>>>>>> +
>>>>>>>      /* If all buffers were filled by other side before we napi_enabled, we
>>>>>>>       * won't get another interrupt, so process any outstanding packets now.
>>>>>>>       * Call local_bh_enable after to trigger softIRQ processing.
>>>>>>> @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
>>>>>>>              struct receive_queue *rq = &vi->rq[i];
>>>>>>>
>>>>>>>              napi_disable(&rq->napi);
>>>>>>> +
>>>>>>> +           /* Check if vq is in reset state. See more in
>>>>>>> +            * virtnet_napi_enable()
>>>>>>> +            */
>>>>>>> +           if (rq->vq->reset) {
>>>>>>> +                   virtnet_napi_enable(rq->vq, &rq->napi);
>>>>>>> +                   continue;
>>>>>>> +           }
>>>>>>
>>>>>> Can we do something similar in virtnet_close() by canceling the work?
>>>>> I think there is no need to cancel the work here, because napi_disable will wait
>>>>> for the napi_enable of the resize. So if a vq whose re-enable failed is used as a
>>>>> normal vq, this logic can be removed.
>>>> Actually I meant the part of virtnet_rx_resize().
>>>>
>>>> If we don't synchronize with the refill work, it might enable NAPI unexpectedly?
>>> I don't think this situation will be encountered, because napi_disable is
>>> mutually exclusive, so there will be no unexpected napi enable.
>>>
>>> Is there something I misunderstood?
>> So in virtnet_rx_resize() we do:
>>
>> napi_disable()
>> ...
>> resize()
>> ...
>> napi_enable()
>>
>> How can we guarantee that the work is not run after the napi_disable()?
>
> I think you're talking about a situation like this:
>
> virtnet_rx_resize          refill work
> -----------------------------------------------------------
>   napi_disable()
>   ...                       napi_disable()
>   resize()                      ...
>                             napi_enable()
>   ...
>   napi_enable()
>
>
> But in fact:
>
> virtnet_rx_resize          refill work
> -----------------------------------------------------------
>   napi_disable()
>   ...                       napi_disable() <----[0]
>   resize()                       |
>   ...                            |
>   napi_enable()                  |
>                             napi_disable() <---- [1] here success
>                             napi_enable()
>
> Because virtnet_rx_resize() has already executed napi_disable(), the
> napi_disable() call issued at [0] blocks and only completes at [1], after
> virtnet_rx_resize() has called napi_enable().
>
> I'm not sure if my understanding is correct.


I think you're right here.

Thanks
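
The ordering argument above can be replayed in a small userspace model. This
is only a sketch, not kernel code: it assumes napi_disable() behaves like
taking ownership of a single "scheduled" bit and napi_enable() like releasing
it. The names struct model_napi, model_napi_try_disable(), model_napi_enable()
and model_resize_excludes_refill() are hypothetical. A single attempt that
returns false stands in for the point where the real napi_disable() would
sleep and retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the NAPI ownership bit; not kernel code. */
struct model_napi {
	atomic_bool sched;	/* true while some context owns the NAPI */
};

/* One attempt at napi_disable(): succeeds only if nobody owns the bit.
 * The real napi_disable() would sleep and retry instead of returning false.
 */
static bool model_napi_try_disable(struct model_napi *n)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&n->sched, &expected, true);
}

/* napi_enable(): release ownership so that a waiter can proceed.
 * Enabling a NAPI that was never disabled is treated as a bug.
 */
static void model_napi_enable(struct model_napi *n)
{
	assert(atomic_load(&n->sched));
	atomic_store(&n->sched, false);
}

/* Replays the second timeline from the mail. Returns true if the refill
 * work was held off for the whole resize window and got in afterwards.
 */
static bool model_resize_excludes_refill(void)
{
	struct model_napi n;
	bool resize_owns, refill_early, refill_late;

	atomic_init(&n.sched, false);

	resize_owns = model_napi_try_disable(&n);	/* resize: napi_disable() */
	refill_early = model_napi_try_disable(&n);	/* refill at [0]: must wait */
	model_napi_enable(&n);				/* resize: napi_enable() */
	refill_late = model_napi_try_disable(&n);	/* refill at [1]: succeeds */
	model_napi_enable(&n);				/* refill: napi_enable() */

	return resize_owns && !refill_early && refill_late;
}
```

Calling model_resize_excludes_refill() from a test program checks that the
second napi_disable() cannot get in between the resize path's
napi_disable()/napi_enable() pair, which is the property the timeline relies
on.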


>
> Thanks.
>
>> Thanks
>>
>>> Thanks.
>>>
>>>> Thanks
>>>>
>>>>>
>>>>>>
>>>>>>> +
>>>>>>>              still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
>>>>>>>              virtnet_napi_enable(rq->vq, &rq->napi);
>>>>>>>
>>>>>>> @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>>>>>>>      if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
>>>>>>>              return;
>>>>>>>
>>>>>>> +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
>>>>>>> +   if (sq->vq->reset)
>>>>>>> +           return;
>>>>>>
>>>>>> We've disabled TX napi, any chance we can still hit this?
>>>>> Same as above.
>>>>>
>>>>>>
>>>>>>> +
>>>>>>>      if (__netif_tx_trylock(txq)) {
>>>>>>>              do {
>>>>>>>                      virtqueue_disable_cb(sq->vq);
>>>>>>> @@ -1769,6 +1794,62 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>>      return NETDEV_TX_OK;
>>>>>>>    }
>>>>>>>
>>>>>>> +static int virtnet_rx_resize(struct virtnet_info *vi,
>>>>>>> +                        struct receive_queue *rq, u32 ring_num)
>>>>>>> +{
>>>>>>> +   int err;
>>>>>>> +
>>>>>>> +   napi_disable(&rq->napi);
>>>>>>> +
>>>>>>> +   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
>>>>>>> +   if (err)
>>>>>>> +           goto err;
>>>>>>> +
>>>>>>> +   if (!try_fill_recv(vi, rq, GFP_KERNEL))
>>>>>>> +           schedule_delayed_work(&vi->refill, 0);
>>>>>>> +
>>>>>>> +   virtnet_napi_enable(rq->vq, &rq->napi);
>>>>>>> +   return 0;
>>>>>>> +
>>>>>>> +err:
>>>>>>> +   netdev_err(vi->dev,
>>>>>>> +              "reset rx reset vq fail: rx queue index: %td err: %d\n",
>>>>>>> +              rq - vi->rq, err);
>>>>>>> +   virtnet_napi_enable(rq->vq, &rq->napi);
>>>>>>> +   return err;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int virtnet_tx_resize(struct virtnet_info *vi,
>>>>>>> +                        struct send_queue *sq, u32 ring_num)
>>>>>>> +{
>>>>>>> +   struct netdev_queue *txq;
>>>>>>> +   int err, qindex;
>>>>>>> +
>>>>>>> +   qindex = sq - vi->sq;
>>>>>>> +
>>>>>>> +   virtnet_napi_tx_disable(&sq->napi);
>>>>>>> +
>>>>>>> +   txq = netdev_get_tx_queue(vi->dev, qindex);
>>>>>>> +   __netif_tx_lock_bh(txq);
>>>>>>> +   netif_stop_subqueue(vi->dev, qindex);
>>>>>>> +   __netif_tx_unlock_bh(txq);
>>>>>>> +
>>>>>>> +   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
>>>>>>> +   if (err)
>>>>>>> +           goto err;
>>>>>>> +
>>>>>>> +   netif_start_subqueue(vi->dev, qindex);
>>>>>>> +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
>>>>>>> +   return 0;
>>>>>>> +
>>>>>>> +err:
>>>>>>
>>>>>> I guess we can still start the queue in this case? (Since we don't
>>>>>> change the queue if resize fails).
>>>>> Yes, you are right.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>>
>>>>>>> +   netdev_err(vi->dev,
>>>>>>> +              "reset tx reset vq fail: tx queue index: %td err: %d\n",
>>>>>>> +              sq - vi->sq, err);
>>>>>>> +   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
>>>>>>> +   return err;
>>>>>>> +}
>>>>>>> +
>>>>>>>    /*
>>>>>>>     * Send command via the control virtqueue and check status.  Commands
>>>>>>>     * supported by the hypervisor, as indicated by feature bits, should

Xuan Zhuo April 18, 2022, 8:48 a.m. UTC | #10

On Mon, 18 Apr 2022 15:49:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 18, 2022 at 11:24 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 13 Apr 2022 16:00:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On 2022/4/6 11:43 AM, Xuan Zhuo wrote:
> > > > This patch implements the resize function of the rx, tx queues.
> > > > Based on this function, it is possible to modify the ring num of the
> > > > queue.
> > > >
> > > > There may be an exception during the resize process, the resize may
> > > > fail, or the vq can no longer be used. Either way, we must execute
> > > > napi_enable(). Because napi_disable is similar to a lock, napi_enable
> > > > must be called after calling napi_disable.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >   drivers/net/virtio_net.c | 81 ++++++++++++++++++++++++++++++++++++++++
> > > >   1 file changed, 81 insertions(+)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index b8bf00525177..ba6859f305f7 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -251,6 +251,9 @@ struct padded_vnet_hdr {
> > > >     char padding[4];
> > > >   };
> > > >
> > > > +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > +
> > > >   static bool is_xdp_frame(void *ptr)
> > > >   {
> > > >     return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > > > @@ -1369,6 +1372,15 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
> > > >   {
> > > >     napi_enable(napi);
> > > >
> > > > +   /* Check if vq is in reset state. The normal reset/resize process will
> > > > +    * be protected by napi. However, the protection of napi is only enabled
> > > > +    * during the operation, and the protection of napi will end after the
> > > > +    * operation is completed. If re-enable fails during the process, vq
> > > > +    * will remain unavailable with reset state.
> > > > +    */
> > > > +   if (vq->reset)
> > > > +           return;
> > >
> > >
> > > > I don't get when we could hit this condition.
> > >
> > >
> > > > +
> > > >     /* If all buffers were filled by other side before we napi_enabled, we
> > > >      * won't get another interrupt, so process any outstanding packets now.
> > > >      * Call local_bh_enable after to trigger softIRQ processing.
> > > > @@ -1413,6 +1425,15 @@ static void refill_work(struct work_struct *work)
> > > >             struct receive_queue *rq = &vi->rq[i];
> > > >
> > > >             napi_disable(&rq->napi);
> > > > +
> > > > +           /* Check if vq is in reset state. See more in
> > > > +            * virtnet_napi_enable()
> > > > +            */
> > > > +           if (rq->vq->reset) {
> > > > +                   virtnet_napi_enable(rq->vq, &rq->napi);
> > > > +                   continue;
> > > > +           }
> > >
> > >
> > > Can we do something similar in virtnet_close() by canceling the work?
> > >
> > >
> > > > +
> > > >             still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > > >             virtnet_napi_enable(rq->vq, &rq->napi);
> > > >
> > > > @@ -1523,6 +1544,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > > >     if (!sq->napi.weight || is_xdp_raw_buffer_queue(vi, index))
> > > >             return;
> > > >
> > > > +   /* Check if vq is in reset state. See more in virtnet_napi_enable() */
> > > > +   if (sq->vq->reset)
> > > > +           return;
> > >
> > >
> > > We've disabled TX napi, any chance we can still hit this?
> >
> >
> > static int virtnet_poll(struct napi_struct *napi, int budget)
> > {
> >         struct receive_queue *rq =
> >                 container_of(napi, struct receive_queue, napi);
> >         struct virtnet_info *vi = rq->vq->vdev->priv;
> >         struct send_queue *sq;
> >         unsigned int received;
> >         unsigned int xdp_xmit = 0;
> >
> >         virtnet_poll_cleantx(rq);
> > ...
> > }
> >
> > This is called by rx poll. Although it is the logic of tx, it is not driven by
> > tx napi, but is called in rx poll.
>
> OK, but we need to guarantee the memory ordering in this case. Disabling
> RX napi could be a solution for this.

Yes, I have realized this too. I have two solutions: disable rx napi, or the
following patch.

Thanks.


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bf1b6530b38..7764d1dcb831 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -135,6 +135,7 @@ struct send_queue {
 	struct virtnet_sq_stats stats;

 	struct napi_struct napi;
+	bool reset;
 };

 /* Internal representation of a receive virtqueue */
@@ -1583,6 +1587,11 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 		return;

 	if (__netif_tx_trylock(txq)) {
+		if (sq->reset) {
+			__netif_tx_unlock(txq);
+			return;
+		}
+
 		do {
 			virtqueue_disable_cb(sq->vq);
 			free_old_xmit_skbs(sq, true);
@@ -1828,6 +1837,56 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }

+static int virtnet_tx_resize(struct virtnet_info *vi,
+			     struct send_queue *sq, u32 ring_num)
+{
+	struct netdev_queue *txq;
+	int err, qindex;
+
+	qindex = sq - vi->sq;
+
+	virtnet_napi_tx_disable(&sq->napi);
+
+	txq = netdev_get_tx_queue(vi->dev, qindex);
+
+	__netif_tx_lock_bh(txq);
+	netif_stop_subqueue(vi->dev, qindex);
+	sq->reset = true;
+	__netif_tx_unlock_bh(txq);
+
+	err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+	if (err)
+		netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+
+	__netif_tx_lock_bh(txq);
+	sq->reset = false;
+	netif_start_subqueue(vi->dev, qindex);
+	__netif_tx_unlock_bh(txq);
+
+	virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
+	return err;
+}
+
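
The idea in the diff above is that sq->reset is only read or written while the
tx queue lock is held, so the rx poll path either sees the flag or cannot take
the lock at all. A minimal userspace sketch of that shape follows. It is an
illustration under assumptions, not the driver code: struct model_sq, the
atomic_flag standing in for the netdev tx queue lock, and all model_*
functions are hypothetical, and the replay is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct send_queue plus its txq lock. */
struct model_sq {
	atomic_flag tx_lock;	/* models the netdev tx queue lock */
	bool reset;		/* only touched with tx_lock held */
	int cleaned;		/* number of completed clean passes */
};

static bool model_tx_trylock(struct model_sq *sq)
{
	return !atomic_flag_test_and_set_explicit(&sq->tx_lock,
						  memory_order_acquire);
}

static void model_tx_unlock(struct model_sq *sq)
{
	atomic_flag_clear_explicit(&sq->tx_lock, memory_order_release);
}

/* Shape of virtnet_poll_cleantx() in the diff: give up if the lock is
 * busy, and also give up (after unlocking) if a resize is in progress.
 */
static bool model_poll_cleantx(struct model_sq *sq)
{
	if (!model_tx_trylock(sq))
		return false;

	if (sq->reset) {
		model_tx_unlock(sq);
		return false;
	}

	sq->cleaned++;		/* stands in for free_old_xmit_skbs() */
	model_tx_unlock(sq);
	return true;
}

/* Flip the flag under the lock, as virtnet_tx_resize() does before and
 * after virtqueue_resize(). Start and end of a resize must alternate.
 */
static void model_set_reset(struct model_sq *sq, bool val)
{
	while (!model_tx_trylock(sq))
		;		/* spin; the kernel uses __netif_tx_lock_bh() */
	assert(sq->reset != val);
	sq->reset = val;
	model_tx_unlock(sq);
}

/* Replays: clean, begin resize, clean (skipped), end resize, clean. */
static int model_replay(void)
{
	struct model_sq sq = { .tx_lock = ATOMIC_FLAG_INIT };

	model_poll_cleantx(&sq);	/* runs */
	model_set_reset(&sq, true);	/* resize begins */
	model_poll_cleantx(&sq);	/* skipped: reset is set */
	model_set_reset(&sq, false);	/* resize ends */
	model_poll_cleantx(&sq);	/* runs again */

	return sq.cleaned;
}
```

In this replay only the first and third clean passes do any work, so
model_replay() returns 2. The acquire/release pair on the lock is what orders
the flag write against the reader, which is the memory-ordering guarantee
asked for earlier in the thread.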
