Message ID | 20230430131518.2708471-1-alvaro.karsz@solid-run.com (mailing list archive) |
---|---|
Headers | show |
Series | virtio-net: allow usage of small vrings | expand |
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote: > At the moment, if a virtio network device uses vrings with less than > MAX_SKB_FRAGS + 2 entries, the device won't be functional. > > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always > evaluate to false, leading to TX timeouts. > > This patchset attempts this fix this bug, and to allow small rings down > to 4 entries. > The first patch introduces a new mechanism in virtio core - it allows to > block features in probe time. > > If a virtio drivers blocks features and fails probe, virtio core will > reset the device, re-negotiate the features and probe again. > > This is needed since some virtio net features are not supported with > small rings. > > This patchset follows a discussion in the mailing list [1]. > > This fixes only part of the bug, rings with less than 4 entries won't > work. Why the difference? > My intention is to split the effort and fix the RING_SIZE < 4 case in a > follow up patchset. > > Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset? I'd keep current behaviour. > I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed > and split VQs, with rings down to 4 entries, with and without > VIRTIO_NET_F_MRG_RXBUF, with big MTUs. > > I would appreciate more testing. > Xuan: I wasn't able to test XDP with my setup, maybe you can help with > that? > > [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/ > > Alvaro Karsz (3): > virtio: re-negotiate features if probe fails and features are blocked > virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2 > virtio-net: block ethtool from converting a ring to a small ring > > drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++-- > drivers/virtio/virtio.c | 73 +++++++++++++----- > include/linux/virtio.h | 3 + > 3 files changed, 212 insertions(+), 25 deletions(-) > > -- > 2.34.1
> > This patchset follows a discussion in the mailing list [1]. > > > > This fixes only part of the bug, rings with less than 4 entries won't > > work. > > Why the difference? > Because the RING_SIZE < 4 case requires much more adjustments. * We may need to squeeze the virtio header into the headroom. * We may need to squeeze the GSO header into the headroom, or block the features. * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. * We may need to disable the control command and all the features depending on it. * We may need to disable NAPI? There may be more changes.. I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4.
On Sun, Apr 30, 2023 at 06:15:03PM +0000, Alvaro Karsz wrote: > > > > This patchset follows a discussion in the mailing list [1]. > > > > > > This fixes only part of the bug, rings with less than 4 entries won't > > > work. > > > > Why the difference? > > > > Because the RING_SIZE < 4 case requires much more adjustments. > > * We may need to squeeze the virtio header into the headroom. > * We may need to squeeze the GSO header into the headroom, or block the features. We alread do this though no? I think we'll need to tweak hard_header_len to guarantee it's there as opposed to needed_headroom ... > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. You are saying clearing NETIF_F_SG does not guarantee a linear skb? > * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. > * We may need to disable the control command and all the features depending on it. well if we don't commands just fail as we can't add them right? no corruption or stalls ... > * We may need to disable NAPI? hmm why napi? > There may be more changes.. > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4. it's ok but I'm just trying to figure out where does 4 come from.
> > > Why the difference? > > > > > > > Because the RING_SIZE < 4 case requires much more adjustments. > > > > * We may need to squeeze the virtio header into the headroom. > > * We may need to squeeze the GSO header into the headroom, or block the features. > > We alread do this though no? > I think we'll need to tweak hard_header_len to guarantee it's there > as opposed to needed_headroom ... > > > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. > > You are saying clearing NETIF_F_SG does not guarantee a linear skb? > I don't know.. I'm not sure what is the cause, but using this patchset, without any host GSO feature, I can get a chain of 3 descriptors. Posing an example of a 4 entries ring during iperf3, acting as a client: (TX descriptors) len=86 flags 0x1 addr 0xf738d000 len=1448 flags 0x0 addr 0xf738d800 len=86 flags 0x8081 addr 0xf738e000 len=1184, flags 0x8081 addr 0xf738e800 len=264 flags 0x8080 addr 0xf738f000 len=86 flags 0x8081 addr 0xf738f800 len=1448 flags 0x0 addr 0xf7390000 len=86 flags 0x1 addr 0xf7390800 len=1448 flags 0x0 addr 0xf7391000 len=86 flags 0x1 addr 0xf716a800 len=1448 flags 0x8080 addr 0xf716b000 len=86 flags 0x8081 addr 0xf7391800 len=1448 flags 0x8080 addr 0xf7392000 We got a chain of 3 in here. This happens often. Now, when negotiating host GSO features, I can get up to 4: len=86 flags 0x1 addr 0xf71fc800 len=21328 flags 0x1 addr 0xf6e00800 len=32768 flags 0x8081 addr 0xf6e06000 len=11064 flags 0x8080 addr 0xf6e0e000 len=86 flags 0x8081 addr 0xf738b000 len=1 flags 0x8080 addr 0xf738b800 len=86 flags 0x1 addr 0xf738c000 len=21704 flags 0x1 addr 0xf738c800 len=32768 flags 0x1 addr 0xf7392000 len=10688 flags 0x0 addr 0xf739a000 len=86 flags 0x8081 addr 0xf739d000 len=22080 flags 0x8081 addr 0xf739d800 len=32768 flags 0x8081 addr 0xf73a3000 len=10312 flags 0x8080 addr 0xf73ab000 TBH, I thought that this behaviour was expected until you mentioned it, This is why virtnet_calc_max_descs returns 3 if no host_gso feature is negotiated, and 4 otherwise. I was thinking that we may need to use another skb to hold the TSO template (for headers generation)... Any ideas? > > * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. > > * We may need to disable the control command and all the features depending on it. > > well if we don't commands just fail as we can't add them right? > no corruption or stalls ... > > > * We may need to disable NAPI? > > hmm why napi? > I'm not sure if it's required to disable it, but I'm not sure what's the point having napi if the ring size is 1.. Will it work? > > There may be more changes.. > > > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4. > > > it's ok but I'm just trying to figure out where does 4 come from. > I guess this part was not clear, sorry.. In case of split vqs, we have ring size 2 before 4. And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I thought was expected).
On Mon, May 01, 2023 at 11:41:44AM +0000, Alvaro Karsz wrote: > > > > Why the difference? > > > > > > > > > > Because the RING_SIZE < 4 case requires much more adjustments. > > > > > > * We may need to squeeze the virtio header into the headroom. > > > * We may need to squeeze the GSO header into the headroom, or block the features. > > > > We alread do this though no? > > I think we'll need to tweak hard_header_len to guarantee it's there > > as opposed to needed_headroom ... > > > > > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. > > > > You are saying clearing NETIF_F_SG does not guarantee a linear skb? > > > > I don't know.. > I'm not sure what is the cause, but using this patchset, without any host GSO feature, I can get a chain of 3 descriptors. > Posing an example of a 4 entries ring during iperf3, acting as a client: > (TX descriptors) > > len=86 flags 0x1 addr 0xf738d000 > len=1448 flags 0x0 addr 0xf738d800 > len=86 flags 0x8081 addr 0xf738e000 > len=1184, flags 0x8081 addr 0xf738e800 > len=264 flags 0x8080 addr 0xf738f000 > len=86 flags 0x8081 addr 0xf738f800 > len=1448 flags 0x0 addr 0xf7390000 > len=86 flags 0x1 addr 0xf7390800 > len=1448 flags 0x0 addr 0xf7391000 > len=86 flags 0x1 addr 0xf716a800 > len=1448 flags 0x8080 addr 0xf716b000 > len=86 flags 0x8081 addr 0xf7391800 > len=1448 flags 0x8080 addr 0xf7392000 > > We got a chain of 3 in here. > This happens often. > > Now, when negotiating host GSO features, I can get up to 4: > > len=86 flags 0x1 addr 0xf71fc800 > len=21328 flags 0x1 addr 0xf6e00800 > len=32768 flags 0x8081 addr 0xf6e06000 > len=11064 flags 0x8080 addr 0xf6e0e000 > len=86 flags 0x8081 addr 0xf738b000 > len=1 flags 0x8080 addr 0xf738b800 > len=86 flags 0x1 addr 0xf738c000 > len=21704 flags 0x1 addr 0xf738c800 > len=32768 flags 0x1 addr 0xf7392000 > len=10688 flags 0x0 addr 0xf739a000 > len=86 flags 0x8081 addr 0xf739d000 > len=22080 flags 0x8081 addr 0xf739d800 > len=32768 flags 0x8081 addr 0xf73a3000 > len=10312 flags 0x8080 addr 0xf73ab000 > > TBH, I thought that this behaviour was expected until you mentioned it, > This is why virtnet_calc_max_descs returns 3 if no host_gso feature is negotiated, and 4 otherwise. > I was thinking that we may need to use another skb to hold the TSO template (for headers generation)... > > Any ideas? Something's wrong with the code apparently. Want to try sticking printk's in the driver to see what is going on? Documentation/networking/netdev-features.rst says: Those features say that ndo_start_xmit can handle fragmented skbs: NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST --- chained skbs (skb->next/prev list). of course we can always linearize if we want to ... > > > * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. > > > * We may need to disable the control command and all the features depending on it. > > > > well if we don't commands just fail as we can't add them right? > > no corruption or stalls ... > > > > > * We may need to disable NAPI? > > > > hmm why napi? > > > > I'm not sure if it's required to disable it, but I'm not sure what's the point having napi if the ring size is 1.. > Will it work? Not much point in it but it's simpler to just keep things consistent. > > > There may be more changes.. > > > > > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4. > > > > > > it's ok but I'm just trying to figure out where does 4 come from. > > > > I guess this part was not clear, sorry.. > In case of split vqs, we have ring size 2 before 4. > And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I thought was expected). > Worth figuring out where do these come from.
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote: > At the moment, if a virtio network device uses vrings with less than > MAX_SKB_FRAGS + 2 entries, the device won't be functional. > > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always > evaluate to false, leading to TX timeouts. > > This patchset attempts this fix this bug, and to allow small rings down > to 4 entries. > > The first patch introduces a new mechanism in virtio core - it allows to > block features in probe time. > > If a virtio drivers blocks features and fails probe, virtio core will > reset the device, re-negotiate the features and probe again. > > This is needed since some virtio net features are not supported with > small rings. > > This patchset follows a discussion in the mailing list [1]. > > This fixes only part of the bug, rings with less than 4 entries won't > work. > My intention is to split the effort and fix the RING_SIZE < 4 case in a > follow up patchset. > > Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset? > > I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed > and split VQs, with rings down to 4 entries, with and without > VIRTIO_NET_F_MRG_RXBUF, with big MTUs. > > I would appreciate more testing. > Xuan: I wasn't able to test XDP with my setup, maybe you can help with > that? > > [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/ the work is orphaned for now. Jason do you want to pick this up? Related to all the hardening I guess ... > Alvaro Karsz (3): > virtio: re-negotiate features if probe fails and features are blocked > virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2 > virtio-net: block ethtool from converting a ring to a small ring > > drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++-- > drivers/virtio/virtio.c | 73 +++++++++++++----- > include/linux/virtio.h | 3 + > 3 files changed, 212 insertions(+), 25 deletions(-) > > -- > 2.34.1