Message ID | 20230809070532.3252464-2-sumang@marvell.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Fix PFC related issues | expand |
On Wed, 9 Aug 2023 12:35:29 +0530 Suman Ghosh wrote: > + otx2_stop(dev); > + otx2_open(dev); If there is any error in open() this will silently take the interface down. Can't you force a NAPI poll or some such, if the concern is a missed IRQ?
>---------------------------------------------------------------------- >On Wed, 9 Aug 2023 12:35:29 +0530 Suman Ghosh wrote: >> + otx2_stop(dev); >> + otx2_open(dev); > >If there is any error in open() this will silently take the interface >down. Can't you force a NAPI poll or some such, if the concern is a >missed IRQ? [Suman] I can check the return type of open() and report in case of error. But even if we force NAPI poll we might not be able to control the watchdog reset. If we have a running traffic and interface is up, when we force NAPI poll, then the new packets will have updated scheduler queue and we will still loose the completion interrupts of the previous packets. Do you see any issue if I handle the error situation during open() call? >-- >pw-bot: cr
Thanks for replying a week late, always a good use of maintainers time to swap back all the context from random conversations! On Fri, 18 Aug 2023 06:54:52 +0000 Suman Ghosh wrote: > >If there is any error in open() this will silently take the interface > >down. Can't you force a NAPI poll or some such, if the concern is a > >missed IRQ? > [Suman] I can check the return type of open() and report in case of > error. But even if we force NAPI poll we might not be able to control > the watchdog reset. If we have a running traffic and interface is up, > when we force NAPI poll, then the new packets will have updated > scheduler queue and we will still loose the completion interrupts of > the previous packets. Why does it matter that you lost an interrupt if the poll has happened. Can you describe the problem in more detail? > Do you see any issue if I handle the error situation during open() call? No, for years we have been rejecting code which does this. If the machine is under memory pressure allocating all the buffers for rings can easily fail and make the machine drop off the network. You either have to refuse to change this setting at runtime or implement prepare/commit reconfiguration model like other modern drivers, where allocations are done before the stop().
>Thanks for replying a week late, always a good use of maintainers time >to swap back all the context from random conversations! [Suman] Sorry for being late this time Jakub. I will remove this patch from the patch set and will push a new version with the other three patches. I will analyze the issue in more detail and will produce a proper fix. > >On Fri, 18 Aug 2023 06:54:52 +0000 Suman Ghosh wrote: >> >If there is any error in open() this will silently take the interface >> >down. Can't you force a NAPI poll or some such, if the concern is a >> >missed IRQ? >> [Suman] I can check the return type of open() and report in case of >> error. But even if we force NAPI poll we might not be able to control >> the watchdog reset. If we have a running traffic and interface is up, >> when we force NAPI poll, then the new packets will have updated >> scheduler queue and we will still loose the completion interrupts of >> the previous packets. > >Why does it matter that you lost an interrupt if the poll has happened. >Can you describe the problem in more detail? > >> Do you see any issue if I handle the error situation during open() >call? > >No, for years we have been rejecting code which does this. >If the machine is under memory pressure allocating all the buffers for >rings can easily fail and make the machine drop off the network. >You either have to refuse to change this setting at runtime or implement >prepare/commit reconfiguration model like other modern drivers, where >allocations are done before the stop().
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.c index ccaf97bb1ce0..d54edfa8fcc9 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.c @@ -406,6 +406,7 @@ static int otx2_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc) static int otx2_dcbnl_ieee_setpfc(struct net_device *dev, struct ieee_pfc *pfc) { struct otx2_nic *pfvf = netdev_priv(dev); + bool if_up = netif_running(dev); int err; /* Save PFC configuration to interface */ @@ -426,14 +427,9 @@ static int otx2_dcbnl_ieee_setpfc(struct net_device *dev, struct ieee_pfc *pfc) if (err) return err; - /* Request Per channel Bpids */ - if (pfc->pfc_en) - otx2_nix_config_bp(pfvf, true); - - err = otx2_pfc_txschq_update(pfvf); - if (err) { - dev_err(pfvf->dev, "%s failed to update TX schedulers\n", __func__); - return err; + if (if_up) { + otx2_stop(dev); + otx2_open(dev); } return 0;
As of now we are creating/deleting Tx schedulers when user is setting PFC on/off. The problem is if we have a running traffic on the interface and as we are updating the sq->smq mapping on the fly, we might loose completion interrupt for some packets. As a result of that a watchdog reset is hit from BQL. This patch solves the issue by simply calling interface off/on APIs which will reconfigure all the queues. We might loss the running traffic momentarily but that should be fine. Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support") Signed-off-by: Suman Ghosh <sumang@marvell.com> --- .../net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)