Message ID | 20230922173626.23790-3-asmaa@nvidia.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | mlxbf_gige: Fix several bugs | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
On 9/22/2023 10:36 AM, Asmaa Mnebhi wrote: > Although the link is up, there is no ip assigned on a setup with high background > traffic. Nothing is transmitted nor received. > The RX error count keeps on increasing. After several minutes, the RX error count > stagnates and the GigE interface finally gets an ip. > > The issue is in the mlxbf_gige_rx_init function. As soon as the RX DMA is enabled, > the RX CI reaches the max of 128, and it becomes equal to RX PI. RX CI doesn't decrease > since the code hasn't ran phy_start yet. > > The solution is to move the rx init after phy_start. > > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") > Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> > Reviewed-by: David Thompson <davthompson@nvidia.com> This seems fine, but your description of the problem still looks like there may be a more fundamental ordering issue when you enable your RX pipe here. It seems to me like you should enable it from "inner" as in closest to the CPU/DMA subsystem towards "outer" which is the MAC and finally the PHY. It should be fine to enable your RX DMA as long as you keep the MAC's RX disabled, and then you can enable your MAC's RX enable and later start the PHY.
> > Although the link is up, there is no ip assigned on a setup with high > > background traffic. Nothing is transmitted nor received. > > The RX error count keeps on increasing. After several minutes, the RX > > error count stagnates and the GigE interface finally gets an ip. > > > > The issue is in the mlxbf_gige_rx_init function. As soon as the RX DMA > > is enabled, the RX CI reaches the max of 128, and it becomes equal to > > RX PI. RX CI doesn't decrease since the code hasn't ran phy_start yet. > > > > The solution is to move the rx init after phy_start. > > > > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") > > Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> > > Reviewed-by: David Thompson <davthompson@nvidia.com> > > This seems fine, but your description of the problem still looks like there may > be a more fundamental ordering issue when you enable your RX pipe here. > > It seems to me like you should enable it from "inner" as in closest to the > CPU/DMA subsystem towards "outer" which is the MAC and finally the PHY. > > It should be fine to enable your RX DMA as long as you keep the MAC's RX > disabled, and then you can enable your MAC's RX enable and later start the > PHY. Thanks for your feedback Florian. I will take a look and address your comments shortly. Sorry for the delayed response, I was OOO.
> > > Although the link is up, there is no ip assigned on a setup with > > > high background traffic. Nothing is transmitted nor received. > > > The RX error count keeps on increasing. After several minutes, the > > > RX error count stagnates and the GigE interface finally gets an ip. > > > > > > The issue is in the mlxbf_gige_rx_init function. As soon as the RX > > > DMA is enabled, the RX CI reaches the max of 128, and it becomes > > > equal to RX PI. RX CI doesn't decrease since the code hasn't ran phy_start > yet. > > > > > > The solution is to move the rx init after phy_start. > > > > > > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet > > > driver") > > > Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> > > > Reviewed-by: David Thompson <davthompson@nvidia.com> > > > > This seems fine, but your description of the problem still looks like > > there may be a more fundamental ordering issue when you enable your RX > pipe here. > > > > It seems to me like you should enable it from "inner" as in closest to > > the CPU/DMA subsystem towards "outer" which is the MAC and finally the > PHY. > > > > It should be fine to enable your RX DMA as long as you keep the MAC's > > RX disabled, and then you can enable your MAC's RX enable and later > > start the PHY. > > Thanks for your feedback Florian. I will take a look and address your > comments shortly. Sorry for the delayed response, I was OOO. Hi Florian, We would like to maintain the code as is because we need to set the RX DMA after the MAC RX filters and the RX rings are setup (in mlxbf_gige_rx_init()). The PHY start logic needs to be done before that, otherwise, there is a chance we would encounter this bug where our MAC RX consumer index (CI) equals our MAC RX production index (PI) and that results in a MAC state that cannot be solved until we cleanup the MAC again. Note that this bug is difficult to reproduce. Our QA had to run the reboot test and have a setup with really high background traffic. Thanks. Asmaa
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index 74185b02daa0..fd4fac1ca26c 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -147,14 +147,14 @@ static int mlxbf_gige_open(struct net_device *netdev) */ priv->valid_polarity = 0; - err = mlxbf_gige_rx_init(priv); + phy_start(phydev); + + err = mlxbf_gige_tx_init(priv); if (err) goto free_irqs; - err = mlxbf_gige_tx_init(priv); + err = mlxbf_gige_rx_init(priv); if (err) - goto rx_deinit; - - phy_start(phydev); + goto tx_deinit; netif_napi_add(netdev, &priv->napi, mlxbf_gige_poll); napi_enable(&priv->napi); @@ -176,8 +176,8 @@ static int mlxbf_gige_open(struct net_device *netdev) return 0; -rx_deinit: - mlxbf_gige_rx_deinit(priv); +tx_deinit: + mlxbf_gige_tx_deinit(priv); free_irqs: mlxbf_gige_free_irqs(priv); diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c index cfb8fb957f0c..d82feeabb061 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c @@ -142,6 +142,9 @@ int mlxbf_gige_rx_init(struct mlxbf_gige *priv) writeq(MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS_EN, priv->base + MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS); + writeq(ilog2(priv->rx_q_entries), + priv->base + MLXBF_GIGE_RX_WQE_SIZE_LOG2); + /* Clear MLXBF_GIGE_INT_MASK 'receive pkt' bit to * indicate readiness to receive interrupts */ @@ -154,9 +157,6 @@ int mlxbf_gige_rx_init(struct mlxbf_gige *priv) data |= MLXBF_GIGE_RX_DMA_EN; writeq(data, priv->base + MLXBF_GIGE_RX_DMA); - writeq(ilog2(priv->rx_q_entries), - priv->base + MLXBF_GIGE_RX_WQE_SIZE_LOG2); - return 0; free_wqe_and_skb: