Message ID: 20240116075924.42798-1-xuanzhuo@linux.alibaba.com (mailing list archive)
Series: virtio-net: sq support premapped mode
On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This is the second part of virtio-net's support for AF_XDP zero copy.
>
> The whole patch set:
> http://lore.kernel.org/all/20231229073108.57778-1-xuanzhuo@linux.alibaba.com
>
> ## About the branch
>
> This patch set is pushed to the net-next branch, but some patches are about
> the virtio core. Because the entire patch set for virtio-net to support AF_XDP
> should go through net-next, I hope these patches will be merged into net-next
> with the virtio core maintainers' Acked-by.
>
> ============================================================================
>
> ## AF_XDP
>
> XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The
> zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
> and its performance is very good. mlx5 and Intel ixgbe already support this
> feature. This patch set allows virtio-net to support xsk's zero-copy xmit
> feature.
>
> At present, we have completed some preparation:
>
> 1. vq-reset (virtio spec and kernel code)
> 2. virtio-core premapped dma
> 3. virtio-net xdp refactor
>
> So it is time for virtio-net to complete its support for XDP socket zero copy.
>
> Virtio-net cannot increase the queue number at will, so xsk shares queues
> with the kernel.
>
> On the other hand, virtio-net does not support generating an interrupt from
> the driver manually, so when we wake up TX xmit we use a trick: if TX NAPI
> last ran on another CPU, we use an IPI to wake up NAPI on that remote CPU;
> if it last ran on the local CPU, we wake up NAPI directly.
>
> This patch set includes some refactoring of virtio-net to let it support
> AF_XDP.
>
> ## performance
>
> ENV: Qemu with vhost-user (polling mode).
> Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
>
> ### virtio PMD in guest with testpmd
>
> testpmd> show port stats all
>
>   ######################## NIC statistics for port 0  ########################
>   RX-packets: 19531092064  RX-missed: 0      RX-bytes:  1093741155584
>   RX-errors: 0
>   RX-nombuf: 0
>   TX-packets: 5959955552   TX-errors: 0      TX-bytes:  371030645664
>
>   Throughput (since last show)
>   Rx-pps: 8861574          Rx-bps: 3969985208
>   Tx-pps: 8861493          Tx-bps: 3969962736
>   ############################################################################
>
> ### AF_XDP PMD in guest with testpmd
>
> testpmd> show port stats all
>
>   ######################## NIC statistics for port 0  ########################
>   RX-packets: 68152727     RX-missed: 0      RX-bytes:  3816552712
>   RX-errors: 0
>   RX-nombuf: 0
>   TX-packets: 68114967     TX-errors: 33216  TX-bytes:  3814438152
>
>   Throughput (since last show)
>   Rx-pps: 6333196          Rx-bps: 2837272088
>   Tx-pps: 6333227          Tx-bps: 2837285936
>   ############################################################################
>
> But AF_XDP consumes more CPU for TX and RX NAPI (100% and 86%).
>
> ## maintain
>
> I am currently a reviewer for virtio-net. I commit to maintaining AF_XDP
> support in virtio-net.
>
> Please review.

Rethinking the whole design, I have one question:

The reason we need to store DMA information is to harden the virtqueue to
make sure the DMA unmap is safe. This seems redundant when the buffers were
premapped by the driver; for example, the receive queue maintains DMA
information itself, so it doesn't need desc_extra to work.

So can we simply

1) store DMA information in the driver itself when premapping is enabled
2) not store DMA information in desc_extra

Would this be simpler?

Thanks
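To make option 1) concrete, here is a minimal sketch of driver-side DMA
bookkeeping for a premapped queue. The names (struct virtnet_dma_info,
virtnet_premap_buf, virtnet_unmap_buf) are illustrative assumptions, not the
existing virtio-net code:

	/*
	 * Sketch only: with premapped mode the driver maps the buffer itself
	 * and remembers addr/len per descriptor slot, so the virtio core no
	 * longer has to keep DMA info in desc_extra for the unmap.
	 */
	struct virtnet_dma_info {
		dma_addr_t addr;
		u32 len;
	};

	static int virtnet_premap_buf(struct device *dma_dev,
				      struct virtnet_dma_info *slot,
				      void *buf, u32 len)
	{
		dma_addr_t addr = dma_map_single(dma_dev, buf, len, DMA_FROM_DEVICE);

		if (dma_mapping_error(dma_dev, addr))
			return -ENOMEM;

		/* The driver, not the vring, owns the unmap information. */
		slot->addr = addr;
		slot->len = len;
		return 0;
	}

	static void virtnet_unmap_buf(struct device *dma_dev,
				      struct virtnet_dma_info *slot)
	{
		dma_unmap_single(dma_dev, slot->addr, slot->len, DMA_FROM_DEVICE);
	}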
On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> [...]
>
> Rethinking the whole design, I have one question:
>
> The reason we need to store DMA information is to harden the virtqueue to
> make sure the DMA unmap is safe. This seems redundant when the buffers were
> premapped by the driver; for example, the receive queue maintains DMA
> information itself, so it doesn't need desc_extra to work.
>
> So can we simply
>
> 1) store DMA information in the driver itself when premapping is enabled

Yes, this is simpler, and it is more convenient.
But the driver must allocate memory to store the DMA info.

> 2) not store DMA information in desc_extra

Yes. But then the desc_extra memory is wasted: the "next" item is still used.
Do you think we should free desc_extra when the vq is in premapped mode?

Thanks.

>
> Would this be simpler?
>
> Thanks
On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >
> > [...]
> >
> > So can we simply
> >
> > 1) store DMA information in the driver itself when premapping is enabled
>
> Yes, this is simpler, and it is more convenient.
> But the driver must allocate memory to store the DMA info.
>
> > 2) not store DMA information in desc_extra
>
> Yes. But then the desc_extra memory is wasted: the "next" item is still used.
> Do you think we should free desc_extra when the vq is in premapped mode?

	struct vring_desc_extra {
		dma_addr_t addr;	/* Descriptor DMA addr. */
		u32 len;		/* Descriptor length. */
		u16 flags;		/* Descriptor flags. */
		u16 next;		/* The next desc state in a list. */
	};

The flags and next fields are used whether the vq is premapped or not.

So I think we can add a new array to store the addr and len.
If the vq is premapped, that memory can be freed:

	struct vring_desc_extra {
		u16 flags;		/* Descriptor flags. */
		u16 next;		/* The next desc state in a list. */
	};

	struct vring_desc_dma {
		dma_addr_t addr;	/* Descriptor DMA addr. */
		u32 len;		/* Descriptor length. */
	};

Thanks.

>
> Thanks.
>
> >
> > Would this be simpler?
> >
> > Thanks
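A rough sketch of how that split could look at allocation time. The function
name vring_alloc_desc_arrays, the desc_dma field, and the premapped argument
are illustrative assumptions, not the current virtio_ring.c API, and the
vring_virtqueue layout is simplified:

	/*
	 * Sketch only: flags/next are always needed, while the addr/len array
	 * can be skipped when the driver premaps buffers itself.
	 */
	static int vring_alloc_desc_arrays(struct vring_virtqueue *vq, u32 num,
					   bool premapped)
	{
		vq->desc_extra = kcalloc(num, sizeof(*vq->desc_extra), GFP_KERNEL);
		if (!vq->desc_extra)
			return -ENOMEM;

		if (premapped) {
			/* The driver tracks DMA addr/len; nothing to unmap here. */
			vq->desc_dma = NULL;
			return 0;
		}

		vq->desc_dma = kcalloc(num, sizeof(*vq->desc_dma), GFP_KERNEL);
		if (!vq->desc_dma) {
			kfree(vq->desc_extra);
			vq->desc_extra = NULL;
			return -ENOMEM;
		}

		return 0;
	}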
On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> [...]
>
> > > 1) store DMA information in the driver itself when premapping is enabled
> >
> > Yes, this is simpler, and it is more convenient.
> > But the driver must allocate memory to store the DMA info.

Right, and this looks like the common practice for most NIC drivers.

> > > 2) not store DMA information in desc_extra
> >
> > Yes. But then the desc_extra memory is wasted: the "next" item is still used.
> > Do you think we should free desc_extra when the vq is in premapped mode?
>
> [...]
>
> The flags and next fields are used whether the vq is premapped or not.
>
> So I think we can add a new array to store the addr and len.

Yes.

> If the vq is premapped, that memory can be freed.

Then we need to make sure premapped is set before find_vqs() etc.

> [...]
>
> Thanks.

Thanks
On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
>
> [...]
>
> > So I think we can add a new array to store the addr and len.
>
> Yes.
>
> > If the vq is premapped, that memory can be freed.
>
> Then we need to make sure premapped is set before find_vqs() etc.

Yes. We can start from the parameters of find_vqs().

But actually we can free the DMA array when the driver sets premapped mode.

> [...]

As we discussed, you may wait for my next patch set with the new design.

Could you review the first patch set of this series?
http://lore.kernel.org/all/20240116062842.67874-1-xuanzhuo@linux.alibaba.com

I work on top of this.

PS.

There is another patch set, "device stats". I hope that is on your list.
http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com

Thanks.
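For reference, a sketch of the "free the DMA array when the driver switches to
premapped mode" idea being discussed here. The desc_dma and premapped fields
and this freeing behavior are illustrative assumptions, not what the in-tree
virtqueue_set_dma_premapped() does:

	/*
	 * Hypothetical sketch: drop the per-descriptor DMA bookkeeping once
	 * the driver declares that it premaps buffers itself.
	 */
	static int vring_switch_to_premapped(struct vring_virtqueue *vq, u32 num)
	{
		/*
		 * Only safe while the vq is empty: otherwise the stored
		 * addr/len are still needed to unmap in-flight buffers.
		 */
		if (vq->vq.num_free != num)
			return -EBUSY;

		kfree(vq->desc_dma);
		vq->desc_dma = NULL;
		vq->premapped = true;
		return 0;
	}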
On Thu, Jan 25, 2024 at 2:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> [...]
>
> > Then we need to make sure premapped is set before find_vqs() etc.
>
> Yes. We can start from the parameters of find_vqs().
>
> But actually we can free the DMA array when the driver sets premapped mode.

Probably, but that's kind of odd:

init()
alloc()

set_premapped()
free()

> As we discussed, you may wait for my next patch set with the new design.
>
> Could you review the first patch set of this series?
> http://lore.kernel.org/all/20240116062842.67874-1-xuanzhuo@linux.alibaba.com
>
> I work on top of this.

Actually, I'm a little confused about the dependencies. We have three:

1) move virtio-net to a dedicated directory
2) premapped mode
3) AF_XDP

It looks to me like the current series is posted in that dependency order.

Then I have questions:

1) do we agree with moving to a directory (I don't have a preference)?
2) if 3) depends on 2), I'd suggest making sure 2) is finalized before
   posting 3), because we have gone through several rounds of AF_XDP and
   most concerns were about the API introduced in 2).

Does this make sense?

> PS.
>
> There is another patch set, "device stats". I hope that is on your list.
> http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com

Yes, it is. (If it doesn't depend on the moving of the source.)

Thanks
On Mon, 29 Jan 2024 11:14:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
>
> [...]
>
> > But actually we can free the DMA array when the driver sets premapped mode.
>
> Probably, but that's kind of odd:
>
> init()
> alloc()
>
> set_premapped()
> free()

If so, the premapped option will become a find_vqs() parameter and the
virtqueue_set_dma_premapped() API will be removed. And we can put
buffer_is_premapped into struct virtqueue, so the user can access it on the
fly. (You asked about this on #4.)

> Actually, I'm a little confused about the dependencies. We have three:
>
> 1) move virtio-net to a dedicated directory
> 2) premapped mode
> 3) AF_XDP
>
> [...]
>
> 2) if 3) depends on 2), I'd suggest making sure 2) is finalized before
>    posting 3), because we have gone through several rounds of AF_XDP and
>    most concerns were about the API introduced in 2).
>
> Does this make sense?

Yes, this makes sense.

I did this to reduce the work of rebasing the code again.
But you are right, this is the right order.

> > There is another patch set, "device stats". I hope that is on your list.
> > http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com
>
> Yes, it is. (If it doesn't depend on the moving of the source.)

It does not depend on that.

Thanks.
On Mon, 29 Jan 2024 11:37:56 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> [...]
>
> If so, the premapped option will become a find_vqs() parameter and the
> virtqueue_set_dma_premapped() API will be removed. And we can put
> buffer_is_premapped into struct virtqueue, so the user can access it on the
> fly. (You asked about this on #4.)

I am trying to pass the option to find_vqs().

You know, find_vqs() has too many parameters, and every time we try to add a
new parameter it is difficult work: many places need to be changed.

	int (*find_vqs)(struct virtio_device *, unsigned nvqs,
			struct virtqueue *vqs[], vq_callback_t *callbacks[],
			const char * const names[], const bool *ctx,
			const bool *premapped,
			struct irq_affinity *desc);

Do you have any preference if I try to refactor this to pass a struct?

Thanks.

> [...]
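One possible shape for such a refactor, purely as an illustrative sketch
(struct virtqueue_config and its fields are assumptions, not an existing
virtio core definition):

	/*
	 * Sketch only: bundle the per-virtqueue arguments so that adding a
	 * new per-vq option (such as premapped) no longer changes every
	 * transport's find_vqs() implementation and every caller.
	 */
	struct virtqueue_config {
		vq_callback_t *callback;	/* NULL: no callback for this vq */
		const char *name;		/* NULL: this vq is unused */
		bool ctx;			/* store per-buffer context pointers */
		bool premapped;			/* buffers are DMA-mapped by the driver */
	};

	int (*find_vqs)(struct virtio_device *vdev, unsigned int nvqs,
			struct virtqueue *vqs[],
			struct virtqueue_config cfgs[],
			struct irq_affinity *desc);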
On Mon, Jan 29, 2024 at 2:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> [...]
>
> I am trying to pass the option to find_vqs().
>
> You know, find_vqs() has too many parameters, and every time we try to add a
> new parameter it is difficult work: many places need to be changed.
>
> 	int (*find_vqs)(struct virtio_device *, unsigned nvqs,
> 			struct virtqueue *vqs[], vq_callback_t *callbacks[],
> 			const char * const names[], const bool *ctx,
> 			const bool *premapped,
> 			struct irq_affinity *desc);
>
> Do you have any preference if I try to refactor this to pass a struct?

This should be fine.

Thanks
On Tue, 30 Jan 2024 10:51:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 29, 2024 at 2:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > [...]
> >
> > Do you have any preference if I try to refactor this to pass a struct?
> >
> > Thanks.
>
> This should be fine.

The patch set is sent. I will introduce that in the next version.

Thanks.

>
> Thanks
>
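The "pass a struct" refactor referred to above is not shown in this thread; a
rough sketch of the idea, with illustrative names only, might look like this:

/* Illustrative only: fold the per-virtqueue arrays that find_vqs() takes
 * today into one struct, so adding a new per-vq option (e.g. premapped)
 * no longer means changing every transport and every caller.
 */
struct virtqueue_info {
	vq_callback_t *callback;
	const char *name;
	bool ctx;
	bool premapped;
};

int (*find_vqs)(struct virtio_device *vdev, unsigned int nvqs,
		struct virtqueue *vqs[],
		struct virtqueue_info vqs_info[],
		struct irq_affinity *desc);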
On Tue, Jan 16, 2024 at 03:59:19PM +0800, Xuan Zhuo wrote:
> This is the second part of virtio-net support AF_XDP zero copy.
My understanding is, there's going to be another version of all
this work?
On Thu, 22 Feb 2024 14:45:08 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 03:59:19PM +0800, Xuan Zhuo wrote:
> > This is the second part of virtio-net support AF_XDP zero copy.
>
> My understanding is, there's going to be another version of all
> this work?

YES.

http://lore.kernel.org/all/20240202093951.120283-1-xuanzhuo@linux.alibaba.com

Thanks

>
> --
> MST
>