Message ID | 1610685980-38608-1-git-send-email-wangyunjian@huawei.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Series | [net-next,v7] vhost_net: avoid tx queue stuck when sendmsg fails |

Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 1 maintainers not CCed: kvm@vger.kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 82 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On 2021/1/15 12:46 PM, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
>
> Currently the driver doesn't drop a packet which can't be sent by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
>
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver
>    can skip this descriptor by ignoring the error.
> 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
>    the driver schedules the worker to try again.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
> v7:
>   * code rebase
> v6:
>   * update code styles and commit log
> ---
>  drivers/vhost/net.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 3b744031ec8f..df82b124170e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -828,14 +828,15 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  				msg.msg_flags &= ~MSG_MORE;
>  		}
>
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> -		}
> -		if (err != len)
> +			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
> +			pr_debug("Fail to send packet: err %d", err);
> +		} else if (unlikely(err != len))
>  			pr_debug("Truncated TX packet: len %d != %zd\n",
>  				 err, len);
>  done:
> @@ -924,7 +925,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  			msg.msg_flags &= ~MSG_MORE;
>  		}
>
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
>  			if (zcopy_used) {
> @@ -933,11 +933,13 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>  					% UIO_MAXIOV;
>  			}
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> -		}
> -		if (err != len)
> +			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
> +			pr_debug("Fail to send packet: err %d", err);
> +		} else if (unlikely(err != len))
>  			pr_debug("Truncated TX packet: "
>  				 " len %d != %zd\n", err, len);
>  		if (!zcopy_used)
On Fri, Jan 15, 2021 at 1:12 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/15 12:46 PM, wangyunjian wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Currently the driver doesn't drop a packet which can't be sent by tun
> > (e.g bad packet). In this case, the driver will always process the
> > same packet lead to the tx queue stuck.
> >
> > To fix this issue:
> > 1. in the case of persistent failure (e.g bad packet), the driver
> >    can skip this descriptor by ignoring the error.
> > 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
> >    the driver schedules the worker to try again.
> >
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>
>
> Acked-by: Jason Wang <jasowang@redhat.com>

Acked-by: Willem de Bruijn <willemb@google.com>
On Fri, 15 Jan 2021 12:46:20 +0800 wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
>
> Currently the driver doesn't drop a packet which can't be sent by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
>
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver
>    can skip this descriptor by ignoring the error.
> 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
>    the driver schedules the worker to try again.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Michael, LMK if you want to have a closer look otherwise I'll apply tomorrow.
On Fri, Jan 15, 2021 at 12:46:20PM +0800, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
>
> Currently the driver doesn't drop a packet which can't be sent by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
>
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver
>    can skip this descriptor by ignoring the error.
> 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
>    the driver schedules the worker to try again.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> v7:
>   * code rebase
> v6:
>   * update code styles and commit log
> ---
>  drivers/vhost/net.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 3b744031ec8f..df82b124170e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -828,14 +828,15 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  				msg.msg_flags &= ~MSG_MORE;
>  		}
>
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> -		}
> -		if (err != len)
> +			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
> +			pr_debug("Fail to send packet: err %d", err);
> +		} else if (unlikely(err != len))
>  			pr_debug("Truncated TX packet: len %d != %zd\n",
>  				 err, len);
>  done:
> @@ -924,7 +925,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  			msg.msg_flags &= ~MSG_MORE;
>  		}
>
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
>  			if (zcopy_used) {
> @@ -933,11 +933,13 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>  					% UIO_MAXIOV;
>  			}
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> -		}
> -		if (err != len)
> +			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
> +			pr_debug("Fail to send packet: err %d", err);
> +		} else if (unlikely(err != len))
>  			pr_debug("Truncated TX packet: "
>  				 " len %d != %zd\n", err, len);
>  		if (!zcopy_used)
> --
> 2.23.0
On Mon, Jan 18, 2021 at 02:33:29PM -0800, Jakub Kicinski wrote:
> On Fri, 15 Jan 2021 12:46:20 +0800 wangyunjian wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Currently the driver doesn't drop a packet which can't be sent by tun
> > (e.g bad packet). In this case, the driver will always process the
> > same packet lead to the tx queue stuck.
> >
> > To fix this issue:
> > 1. in the case of persistent failure (e.g bad packet), the driver
> >    can skip this descriptor by ignoring the error.
> > 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
> >    the driver schedules the worker to try again.
> >
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>
> Michael, LMK if you want to have a closer look otherwise I'll apply
> tomorrow.

Thanks for the reminder. Acked.
On Tue, 19 Jan 2021 04:56:59 -0500 Michael S. Tsirkin wrote:
> On Mon, Jan 18, 2021 at 02:33:29PM -0800, Jakub Kicinski wrote:
> > On Fri, 15 Jan 2021 12:46:20 +0800 wangyunjian wrote:
> > > From: Yunjian Wang <wangyunjian@huawei.com>
> > >
> > > Currently the driver doesn't drop a packet which can't be sent by tun
> > > (e.g bad packet). In this case, the driver will always process the
> > > same packet lead to the tx queue stuck.
> > >
> > > To fix this issue:
> > > 1. in the case of persistent failure (e.g bad packet), the driver
> > >    can skip this descriptor by ignoring the error.
> > > 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
> > >    the driver schedules the worker to try again.
> > >
> > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Michael, LMK if you want to have a closer look otherwise I'll apply
> > tomorrow.
>
> Thanks for the reminder. Acked.

Applied, thanks everyone!
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 3b744031ec8f..df82b124170e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -828,14 +828,15 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 				msg.msg_flags &= ~MSG_MORE;
 		}
 
-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq, 1);
-			vhost_net_enable_vq(net, vq);
-			break;
-		}
-		if (err != len)
+			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
+			pr_debug("Fail to send packet: err %d", err);
+		} else if (unlikely(err != len))
 			pr_debug("Truncated TX packet: len %d != %zd\n",
 				 err, len);
 done:
@@ -924,7 +925,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			msg.msg_flags &= ~MSG_MORE;
 		}
 
-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
 			if (zcopy_used) {
@@ -933,11 +933,13 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
 					% UIO_MAXIOV;
 			}
-			vhost_discard_vq_desc(vq, 1);
-			vhost_net_enable_vq(net, vq);
-			break;
-		}
-		if (err != len)
+			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
+			pr_debug("Fail to send packet: err %d", err);
+		} else if (unlikely(err != len))
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
 		if (!zcopy_used)
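The error handling described in the commit message can be illustrated with a small userspace model. This is only a sketch and not kernel code: `tx_error_is_transient()` and `toy_handle_tx()` are hypothetical names, the `results` array stands in for the return values of `sendmsg`, and `-EINVAL` is assumed here as an example of a persistent "bad packet" error. Only the three errno values tested in the patch are treated as transient.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the same errno test the patch adds inline. */
static bool tx_error_is_transient(int err)
{
	return err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS;
}

/*
 * Toy model of the TX loop. results[i] is what a fake sendmsg() returns
 * for descriptor i: a negative errno on failure, a byte count on success.
 *
 * Returns how many descriptors were consumed (sent or dropped) before the
 * loop stopped. On a transient error the loop stops without consuming the
 * current descriptor, so a later run can retry it. On any other error the
 * descriptor is counted in *dropped and the loop moves on.
 */
static size_t toy_handle_tx(const int *results, size_t n, size_t *dropped)
{
	size_t consumed = 0;

	*dropped = 0;
	for (size_t i = 0; i < n; i++) {
		int err = results[i];

		if (err < 0) {
			if (tx_error_is_transient(err))
				break;		/* keep descriptor, retry later */
			(*dropped)++;		/* persistent failure: skip it */
		}
		consumed++;
	}
	return consumed;
}
```

With the inputs `{100, -EINVAL, 60, -EAGAIN, 40}`, the model consumes three descriptors (one of them dropped) and stops at the `-EAGAIN` entry. Before the patch, the equivalent loop would have stopped at the `-EINVAL` entry on every run, which is the stuck-queue behaviour the commit message describes.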