Message ID | 1595246298-29260-1-git-send-email-yoshihiro.shimoda.uh@renesas.com (mailing list archive) |
---|---|
State | Under Review |
Delegated to: | Geert Uytterhoeven |
Series | [PATCH/RFC,v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout |
Hello!

On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:

> According to the report of [1], this driver is possible to cause
> the following error in ravb_tx_timeout_work().
>
> ravb e6800000.ethernet ethernet: failed to switch device to config mode

Hmm, maybe we need a larger timeout there? The current one amounts to only
~100 ms for all cases (maybe we should parametrize the timeout?)...

> This error means that the hardware could not change the state
> from "Operation" to "Configuration" while some tx and/or rx queue
> are operating. After that, ravb_config() in ravb_dmac_init() will fail,

Are we seeing double messages from ravb_config()? I think we aren't...

> and then any descriptors will be not allocated anymore so that NULL
> pointer dereference happens after that on ravb_start_xmit().
>
> To fix the issue, the ravb_tx_timeout_work() should check
> the return value of ravb_stop_dma() whether this hardware can be
> re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> re-enables TX and RX and just exits.
>
> [1]
> https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
>
> Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Assuming the comment below is fixed:

Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>

> ---
> Changes from RFC v1:
> - Check the return value of ravb_stop_dma() and exit if the hardware
>   condition can not be initialized in the tx timeout.
> - Update the commit subject and description.
> - Fix some typo.
> https://patchwork.kernel.org/patch/11570217/
>
> Unfortunately, I still didn't reproduce the issue yet. So, I still
> marked RFC on this patch.

I think the Bosch people should test this patch, as they reported the kernel oops...

>
> drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index a442bcf6..500f5c1 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  		ravb_ptp_stop(ndev);
>
>  	/* Wait for DMA stopping */
> -	ravb_stop_dma(ndev);
> +	if (ravb_stop_dma(ndev)) {
> +		/* If ravb_stop_dma() fails, the hardware is still in-progress
> +		 * as "Operation" mode for TX and/or RX. So, this should not

s/in-progress as "Operation" mode/operating/.

> +		 * call the following functions because ravb_dmac_init() is
> +		 * possible to fail too. Also, this should not retry
> +		 * ravb_stop_dma() again and again here because it's possible
> +		 * to wait forever. So, this just re-enables the TX and RX and
> +		 * skip the following re-initialization procedure.
> +		 */
> +		ravb_rcv_snd_enable(ndev);
> +		goto out;
> +	}
>
>  	ravb_ring_free(ndev, RAVB_BE);
>  	ravb_ring_free(ndev, RAVB_NC);
> @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  	ravb_dmac_init(ndev);

BTW, that one also may fail...

>  	ravb_emac_init(ndev);
>
> +out:
>  	/* Initialise PTP Clock driver */
>  	if (priv->chip_id == RCAR_GEN2)
>  		ravb_ptp_init(ndev, priv->pdev);
>
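The "parametrize the timeout" idea raised in the review above can be illustrated with a small user-space model. This is only a sketch: `struct fake_reg`, `fake_reg_read()` and `poll_until()` are hypothetical names, not ravb driver functions, and the delay between polls is left out. As an assumed example of the arithmetic, 10000 polls with a 10 µs delay each would give the ~100 ms budget mentioned in the review.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a status register: it reads back busy_value
 * for the first ready_after reads and ready_value afterwards.
 */
struct fake_reg {
	uint32_t busy_value;
	uint32_t ready_value;
	unsigned int ready_after;
	unsigned int reads;	/* number of reads performed so far */
};

static uint32_t fake_reg_read(struct fake_reg *reg)
{
	assert(reg != NULL);
	return reg->reads++ < reg->ready_after ? reg->busy_value
					       : reg->ready_value;
}

/* Poll until (register & mask) == value, giving up after max_polls reads.
 * A driver would also delay between reads, so the total budget would be
 * roughly max_polls * delay; the delay is left out of this model.
 * Returns 0 on success and -ETIMEDOUT on timeout.
 */
static int poll_until(struct fake_reg *reg, uint32_t mask, uint32_t value,
		      unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if ((fake_reg_read(reg) & mask) == value)
			return 0;
	}
	return -ETIMEDOUT;
}
```

With `max_polls` as a parameter, each caller can choose its own budget. A larger budget only helps if the hardware eventually goes idle, which is the concern raised in the reply that follows.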
Hello!

Thank you for your review!

> From: Sergei Shtylyov, Sent: Tuesday, July 21, 2020 2:15 AM
>
> Hello!
>
> On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:
>
> > According to the report of [1], this driver is possible to cause
> > the following error in ravb_tx_timeout_work().
> >
> > ravb e6800000.ethernet ethernet: failed to switch device to config mode
>
> Hmm, maybe we need a larger timeout there? The current one amounts to only
> ~100 ms for all cases (maybe we should parametrize the timeout?)...

I don't think so, because we cannot assume when RX will finish.
For example, if another device floods this hardware by using "ping -f",
the hardware keeps operating in RX for as long as the ping is running.

> > This error means that the hardware could not change the state
> > from "Operation" to "Configuration" while some tx and/or rx queue
> > are operating. After that, ravb_config() in ravb_dmac_init() will fail,
>
> Are we seeing double messages from ravb_config()? I think we aren't...

No, we are not seeing double messages from ravb_config(), because
ravb_stop_dma() can fail before ravb_config() is called if TCCR or CSR
is in a specific condition.

> > and then any descriptors will be not allocated anymore so that NULL
> > pointer dereference happens after that on ravb_start_xmit().
> >
> > To fix the issue, the ravb_tx_timeout_work() should check
> > the return value of ravb_stop_dma() whether this hardware can be
> > re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> > re-enables TX and RX and just exits.
> >
> > [1]
> > https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
> >
> > Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
>
> Assuming the comment below is fixed:
>
> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>

Thanks!

> > ---
> > Changes from RFC v1:
> > - Check the return value of ravb_stop_dma() and exit if the hardware
> >   condition can not be initialized in the tx timeout.
> > - Update the commit subject and description.
> > - Fix some typo.
> > https://patchwork.kernel.org/patch/11570217/
> >
> > Unfortunately, I still didn't reproduce the issue yet. So, I still
> > marked RFC on this patch.
>
> I think the Bosch people should test this patch, as they reported the kernel oops...
>
> >
> > drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index a442bcf6..500f5c1 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >  		ravb_ptp_stop(ndev);
> >
> >  	/* Wait for DMA stopping */
> > -	ravb_stop_dma(ndev);
> > +	if (ravb_stop_dma(ndev)) {
> > +		/* If ravb_stop_dma() fails, the hardware is still in-progress
> > +		 * as "Operation" mode for TX and/or RX. So, this should not
>
> s/in-progress as "Operation" mode/operating/.

I'll fix it.

> > +		 * call the following functions because ravb_dmac_init() is
> > +		 * possible to fail too. Also, this should not retry
> > +		 * ravb_stop_dma() again and again here because it's possible
> > +		 * to wait forever. So, this just re-enables the TX and RX and
> > +		 * skip the following re-initialization procedure.
> > +		 */
> > +		ravb_rcv_snd_enable(ndev);
> > +		goto out;
> > +	}
> >
> >  	ravb_ring_free(ndev, RAVB_BE);
> >  	ravb_ring_free(ndev, RAVB_NC);
> > @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >  	ravb_dmac_init(ndev);
>
> BTW, that one also may fail...

Yes, that's true... In that case, I think this should print an error message
and stop TX and RX to avoid unexpected behavior such as a kernel panic.
So, I'll add such code.

Best regards,
Yoshihiro Shimoda

> >  	ravb_emac_init(ndev);
> >
> > +out:
> >  	/* Initialise PTP Clock driver */
> >  	if (priv->chip_id == RCAR_GEN2)
> >  		ravb_ptp_init(ndev, priv->pdev);
> >
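The follow-up proposed in the reply above (print an error and stop TX and RX if the DMAC re-initialization fails) might be shaped roughly as follows. This is a user-space sketch under stated assumptions, not the change that was eventually posted: every `sketch_*` name and `struct sketch_dev` are hypothetical stand-ins, and the real driver would report the error through the kernel's logging helpers rather than `fprintf()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the device state touched by re-initialization. */
struct sketch_dev {
	bool dmac_init_fails;	/* knob: make the DMAC init step fail */
	bool rx_tx_enabled;
	bool emac_initialized;
};

static int sketch_dmac_init(const struct sketch_dev *dev)
{
	return dev->dmac_init_fails ? -1 : 0;
}

static void sketch_emac_init(struct sketch_dev *dev)
{
	dev->emac_initialized = true;
	dev->rx_tx_enabled = true;
}

static void sketch_rcv_snd_disable(struct sketch_dev *dev)
{
	dev->rx_tx_enabled = false;
}

/* Re-initialize after DMA has been stopped. If the DMAC step fails,
 * report it, leave TX and RX stopped, and skip the E-MAC step.
 */
static int sketch_reinit(struct sketch_dev *dev)
{
	int error;

	assert(dev != NULL);

	error = sketch_dmac_init(dev);
	if (error) {
		fprintf(stderr, "sketch: DMAC re-initialization failed (%d)\n",
			error);
		sketch_rcv_snd_disable(dev);
		return error;
	}

	sketch_emac_init(dev);
	return 0;
}
```

Returning the error lets the caller decide whether later steps should still run, which the patch under review leaves open.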
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index a442bcf6..500f5c1 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
 		ravb_ptp_stop(ndev);

 	/* Wait for DMA stopping */
-	ravb_stop_dma(ndev);
+	if (ravb_stop_dma(ndev)) {
+		/* If ravb_stop_dma() fails, the hardware is still in-progress
+		 * as "Operation" mode for TX and/or RX. So, this should not
+		 * call the following functions because ravb_dmac_init() is
+		 * possible to fail too. Also, this should not retry
+		 * ravb_stop_dma() again and again here because it's possible
+		 * to wait forever. So, this just re-enables the TX and RX and
+		 * skip the following re-initialization procedure.
+		 */
+		ravb_rcv_snd_enable(ndev);
+		goto out;
+	}

 	ravb_ring_free(ndev, RAVB_BE);
 	ravb_ring_free(ndev, RAVB_NC);
@@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
 	ravb_dmac_init(ndev);
 	ravb_emac_init(ndev);

+out:
 	/* Initialise PTP Clock driver */
 	if (priv->chip_id == RCAR_GEN2)
 		ravb_ptp_init(ndev, priv->pdev);
According to the report of [1], this driver is possible to cause
the following error in ravb_tx_timeout_work().

ravb e6800000.ethernet ethernet: failed to switch device to config mode

This error means that the hardware could not change the state
from "Operation" to "Configuration" while some tx and/or rx queue
are operating. After that, ravb_config() in ravb_dmac_init() will fail,
and then any descriptors will be not allocated anymore so that NULL
pointer dereference happens after that on ravb_start_xmit().

To fix the issue, the ravb_tx_timeout_work() should check
the return value of ravb_stop_dma() whether this hardware can be
re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
re-enables TX and RX and just exits.

[1]
https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/

Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

---
Changes from RFC v1:
- Check the return value of ravb_stop_dma() and exit if the hardware
  condition can not be initialized in the tx timeout.
- Update the commit subject and description.
- Fix some typo.
https://patchwork.kernel.org/patch/11570217/

Unfortunately, I still didn't reproduce the issue yet. So, I still
marked RFC on this patch.

drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
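The failure sequence and the fix described in the commit message can be traced with a small user-space model. It is a simplified sketch, not the driver code: `struct model_hw` and the `model_*` functions are hypothetical, and the model assumes that a failed stop leaves TX and RX disabled (the worst case) and that the E-MAC step is what re-enables them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state relevant to the TX timeout handler. */
struct model_hw {
	bool stuck_in_operation;	/* cannot leave "Operation" mode */
	bool rx_tx_enabled;
	bool rings_valid;		/* descriptor rings are allocated */
	unsigned int reinit_count;	/* successful re-initializations */
};

static int model_stop_dma(struct model_hw *hw)
{
	hw->rx_tx_enabled = false;	/* modelled as disabled even on failure */
	return hw->stuck_in_operation ? -1 : 0;
}

static int model_dmac_init(struct model_hw *hw)
{
	if (hw->stuck_in_operation)
		return -1;	/* config mode not reached, no rings allocated */
	hw->rings_valid = true;
	hw->reinit_count++;
	return 0;
}

/* check_stop_dma == false models the behaviour before the patch,
 * check_stop_dma == true models the behaviour with the patch applied.
 */
static void model_tx_timeout_work(struct model_hw *hw, bool check_stop_dma)
{
	int error;

	assert(hw != NULL);

	error = model_stop_dma(hw);
	if (error && check_stop_dma) {
		/* Hardware is still operating: keep the existing rings,
		 * re-enable TX and RX, and skip re-initialization.
		 */
		hw->rx_tx_enabled = true;
		return;
	}

	hw->rings_valid = false;	/* rings are freed */
	(void)model_dmac_init(hw);	/* result ignored, as in the diff above */
	hw->rx_tx_enabled = true;	/* E-MAC step runs unconditionally */
}
```

In the unpatched path a stuck device ends up with TX and RX enabled but no valid rings, which corresponds to the NULL pointer dereference in ravb_start_xmit() that the commit message describes. In the patched path the rings stay in place.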