Message ID | 20220318185644.517164-8-miquel.raynal@bootlin.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ieee802154: Better Tx error handling | expand |
Hi, On Fri, Mar 18, 2022 at 2:56 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Either the spi operation failed, or the offloaded transmit operation > failed and returned a TRAC value. Use this value when available or use > the default "SYSTEM_ERROR" otherwise, in order to propagate one step > above the error. > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > --- > drivers/net/ieee802154/at86rf230.c | 25 +++++++++++++++++++++++-- > 1 file changed, 23 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > index d3cf6d23b57e..34d199f597c9 100644 > --- a/drivers/net/ieee802154/at86rf230.c > +++ b/drivers/net/ieee802154/at86rf230.c > @@ -358,7 +358,23 @@ static inline void > at86rf230_async_error(struct at86rf230_local *lp, > struct at86rf230_state_change *ctx, int rc) > { > - dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > + int reason; > + > + switch (rc) { I think there was a miscommunication last time, this rc variable is not a trac register value, it is a linux errno. Also the error here has nothing to do with a trac error. A trac error is the result of the offloaded transmit functionality on the transceiver, here we dealing about bus communication errors produced by the spi subsystem. What we need is to report it to the softmac layer as "IEEE802154_SYSTEM_ERROR" (as we decided that this is a user specific error and can be returned by the transceiver for non 802.15.4 "error" return code. > + case TRAC_CHANNEL_ACCESS_FAILURE: > + reason = IEEE802154_CHANNEL_ACCESS_FAILURE; > + break; > + case TRAC_NO_ACK: > + reason = IEEE802154_NO_ACK; > + break; > + default: > + reason = IEEE802154_SYSTEM_ERROR; > + } > + > + if (rc < 0) > + dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > + else > + dev_err(&lp->spi->dev, "xceiver error %d\n", reason); > > at86rf230_async_state_change(lp, ctx, STATE_FORCE_TRX_OFF, > at86rf230_async_error_recover); > @@ -666,10 +682,15 @@ at86rf230_tx_trac_check(void *context) > case TRAC_SUCCESS: > case TRAC_SUCCESS_DATA_PENDING: > at86rf230_async_state_change(lp, ctx, STATE_TX_ON, at86rf230_tx_on); > + return; > + case TRAC_CHANNEL_ACCESS_FAILURE: > + case TRAC_NO_ACK: > break; > default: > - at86rf230_async_error(lp, ctx, -EIO); > + trac = TRAC_INVALID; > } > + > + at86rf230_async_error(lp, ctx, trac); That makes no sense, at86rf230_async_error() is not a trac error handling, it is a bus error handling. As noted above. With this change you mix bus errors and trac errors (which are not bus errors). If there are no bus errors then trac should be evaluated and should either deliver some 802.15.4 $SUCCESS_CODE or $ERROR_CODE to the softmac stack, which is xmit_complete() or xmit_error(). - Alex
Hi Alexander, alex.aring@gmail.com wrote on Sun, 27 Mar 2022 11:46:12 -0400: > Hi, > > On Fri, Mar 18, 2022 at 2:56 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Either the spi operation failed, or the offloaded transmit operation > > failed and returned a TRAC value. Use this value when available or use > > the default "SYSTEM_ERROR" otherwise, in order to propagate one step > > above the error. > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > --- > > drivers/net/ieee802154/at86rf230.c | 25 +++++++++++++++++++++++-- > > 1 file changed, 23 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > index d3cf6d23b57e..34d199f597c9 100644 > > --- a/drivers/net/ieee802154/at86rf230.c > > +++ b/drivers/net/ieee802154/at86rf230.c > > @@ -358,7 +358,23 @@ static inline void > > at86rf230_async_error(struct at86rf230_local *lp, > > struct at86rf230_state_change *ctx, int rc) > > { > > - dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > + int reason; > > + > > + switch (rc) { > > I think there was a miscommunication last time, this rc variable is > not a trac register value, it is a linux errno. Also the error here > has nothing to do with a trac error. A trac error is the result of the > offloaded transmit functionality on the transceiver, here we dealing > about bus communication errors produced by the spi subsystem. What we > need is to report it to the softmac layer as "IEEE802154_SYSTEM_ERROR" > (as we decided that this is a user specific error and can be returned > by the transceiver for non 802.15.4 "error" return code. > > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > + reason = IEEE802154_CHANNEL_ACCESS_FAILURE; > > + break; > > + case TRAC_NO_ACK: > > + reason = IEEE802154_NO_ACK; > > + break; > > + default: > > + reason = IEEE802154_SYSTEM_ERROR; I went for the solution: if it is a bus error, I return SYSTEM ERROR, otherwise I return a trac error. > > + } > > + > > + if (rc < 0) > > + dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > + else > > + dev_err(&lp->spi->dev, "xceiver error %d\n", reason); > > > > at86rf230_async_state_change(lp, ctx, STATE_FORCE_TRX_OFF, > > at86rf230_async_error_recover); > > @@ -666,10 +682,15 @@ at86rf230_tx_trac_check(void *context) > > case TRAC_SUCCESS: > > case TRAC_SUCCESS_DATA_PENDING: > > at86rf230_async_state_change(lp, ctx, STATE_TX_ON, at86rf230_tx_on); > > + return; > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > + case TRAC_NO_ACK: > > break; > > default: > > - at86rf230_async_error(lp, ctx, -EIO); > > + trac = TRAC_INVALID; > > } > > + > > + at86rf230_async_error(lp, ctx, trac); > > That makes no sense, at86rf230_async_error() is not a trac error > handling, it is a bus error handling. As noted above. With this change > you mix bus errors and trac errors (which are not bus errors). If > there are no bus errors then trac should be evaluated and should > either deliver some 802.15.4 $SUCCESS_CODE or $ERROR_CODE to the > softmac stack, which is xmit_complete() or xmit_error(). There is no specific path for bus errors, everything is supposedly asynchronous and all the function return void. In both cases I need to free the skb. So I am questioning myself about the right solution (need to think further...) Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Sun, 27 Mar 2022 11:46:12 -0400: > Hi, > > On Fri, Mar 18, 2022 at 2:56 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Either the spi operation failed, or the offloaded transmit operation > > failed and returned a TRAC value. Use this value when available or use > > the default "SYSTEM_ERROR" otherwise, in order to propagate one step > > above the error. > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > --- > > drivers/net/ieee802154/at86rf230.c | 25 +++++++++++++++++++++++-- > > 1 file changed, 23 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > index d3cf6d23b57e..34d199f597c9 100644 > > --- a/drivers/net/ieee802154/at86rf230.c > > +++ b/drivers/net/ieee802154/at86rf230.c > > @@ -358,7 +358,23 @@ static inline void > > at86rf230_async_error(struct at86rf230_local *lp, > > struct at86rf230_state_change *ctx, int rc) > > { > > - dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > + int reason; > > + > > + switch (rc) { > > I think there was a miscommunication last time, this rc variable is > not a trac register value, it is a linux errno. Also the error here > has nothing to do with a trac error. A trac error is the result of the > offloaded transmit functionality on the transceiver, here we dealing > about bus communication errors produced by the spi subsystem. What we > need is to report it to the softmac layer as "IEEE802154_SYSTEM_ERROR" > (as we decided that this is a user specific error and can be returned > by the transceiver for non 802.15.4 "error" return code. I think we definitely need to handle both, see below. > > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > + reason = IEEE802154_CHANNEL_ACCESS_FAILURE; > > + break; > > + case TRAC_NO_ACK: > > + reason = IEEE802154_NO_ACK; > > + break; > > + default: > > + reason = IEEE802154_SYSTEM_ERROR; > > + } > > + > > + if (rc < 0) > > + dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > + else > > + dev_err(&lp->spi->dev, "xceiver error %d\n", reason); > > > > at86rf230_async_state_change(lp, ctx, STATE_FORCE_TRX_OFF, > > at86rf230_async_error_recover); > > @@ -666,10 +682,15 @@ at86rf230_tx_trac_check(void *context) > > case TRAC_SUCCESS: > > case TRAC_SUCCESS_DATA_PENDING: > > at86rf230_async_state_change(lp, ctx, STATE_TX_ON, at86rf230_tx_on); > > + return; > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > + case TRAC_NO_ACK: > > break; > > default: > > - at86rf230_async_error(lp, ctx, -EIO); > > + trac = TRAC_INVALID; > > } > > + > > + at86rf230_async_error(lp, ctx, trac); > > That makes no sense, at86rf230_async_error() is not a trac error > handling, it is a bus error handling. Both will have to be handled asynchronously, which means we have to tell the soft mac layer that something bad happened in each case. > As noted above. With this change > you mix bus errors and trac errors (which are not bus errors). In the case of a SPI error, it will happen asynchronously, which means the tx call is over and something bad happened. We are aware that something bad happened and there was a bus error. We need to: - Free the skb - Restart the internal machinery - Somehow tell the soft mac layer something bad happened and the packet will not be transmitted as expected (IOW, balance the "end" calls with the "start" calls, just because we did not return immediately when we got the transmit request). In the case of a transmission error, this is a trac condition that is reported to us by an IRQ. We know it is a trac error, we can look at a buffer to find which trac error exactly happened. In this case we need to go through exactly the same steps as above. But you are right that a spi_async() error is not a trac error, hence my choice in the switch statement to default to the IEEE80154_SYSTEM_ERROR flag in this case. Should I ignore spi bus errors? I don't think I can, so I don't really see how to handle it differently. Thanks, Miquèl
Hello, miquel.raynal@bootlin.com wrote on Tue, 29 Mar 2022 18:35:06 +0200: > Hi Alexander, > > alex.aring@gmail.com wrote on Sun, 27 Mar 2022 11:46:12 -0400: > > > Hi, > > > > On Fri, Mar 18, 2022 at 2:56 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > Either the spi operation failed, or the offloaded transmit operation > > > failed and returned a TRAC value. Use this value when available or use > > > the default "SYSTEM_ERROR" otherwise, in order to propagate one step > > > above the error. > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > --- > > > drivers/net/ieee802154/at86rf230.c | 25 +++++++++++++++++++++++-- > > > 1 file changed, 23 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > index d3cf6d23b57e..34d199f597c9 100644 > > > --- a/drivers/net/ieee802154/at86rf230.c > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > @@ -358,7 +358,23 @@ static inline void > > > at86rf230_async_error(struct at86rf230_local *lp, > > > struct at86rf230_state_change *ctx, int rc) > > > { > > > - dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > > + int reason; > > > + > > > + switch (rc) { > > > > I think there was a miscommunication last time, this rc variable is > > not a trac register value, it is a linux errno. Also the error here > > has nothing to do with a trac error. A trac error is the result of the > > offloaded transmit functionality on the transceiver, here we dealing > > about bus communication errors produced by the spi subsystem. What we > > need is to report it to the softmac layer as "IEEE802154_SYSTEM_ERROR" > > (as we decided that this is a user specific error and can be returned > > by the transceiver for non 802.15.4 "error" return code. > > I think we definitely need to handle both, see below. > > > > > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > > + reason = IEEE802154_CHANNEL_ACCESS_FAILURE; > > > + break; > > > + case TRAC_NO_ACK: > > > + reason = IEEE802154_NO_ACK; > > > + break; > > > + default: > > > + reason = IEEE802154_SYSTEM_ERROR; > > > + } > > > + > > > + if (rc < 0) > > > + dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > > + else > > > + dev_err(&lp->spi->dev, "xceiver error %d\n", reason); > > > > > > at86rf230_async_state_change(lp, ctx, STATE_FORCE_TRX_OFF, > > > at86rf230_async_error_recover); > > > @@ -666,10 +682,15 @@ at86rf230_tx_trac_check(void *context) > > > case TRAC_SUCCESS: > > > case TRAC_SUCCESS_DATA_PENDING: > > > at86rf230_async_state_change(lp, ctx, STATE_TX_ON, at86rf230_tx_on); > > > + return; > > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > > + case TRAC_NO_ACK: > > > break; > > > default: > > > - at86rf230_async_error(lp, ctx, -EIO); > > > + trac = TRAC_INVALID; > > > } > > > + > > > + at86rf230_async_error(lp, ctx, trac); > > > > That makes no sense, at86rf230_async_error() is not a trac error > > handling, it is a bus error handling. > > Both will have to be handled asynchronously, which means we have to > tell the soft mac layer that something bad happened in each case. > > > As noted above. With this change > > you mix bus errors and trac errors (which are not bus errors). > > In the case of a SPI error, it will happen asynchronously, which means > the tx call is over and something bad happened. We are aware that > something bad happened and there was a bus error. We need to: > - Free the skb > - Restart the internal machinery > - Somehow tell the soft mac layer something bad happened and the packet > will not be transmitted as expected (IOW, balance the "end" calls > with the "start" calls, just because we did not return immediately > when we got the transmit request). > > In the case of a transmission error, this is a trac condition that is > reported to us by an IRQ. We know it is a trac error, we can look at a > buffer to find which trac error exactly happened. In this case we need > to go through exactly the same steps as above. > > But you are right that a spi_async() error is not a trac error, hence > my choice in the switch statement to default to the > IEEE80154_SYSTEM_ERROR flag in this case. > > Should I ignore spi bus errors? I don't think I can, so I don't really > see how to handle it differently. Sorry to bother you again, but in the end, do you agree on returning IEEE802154_SYSTEM_ERROR upon asynchronous bus errors? Any other modification of the driver in favor of having two distinct paths would be really costly in term of time spent and probability of breaking something, so I would rather avoid it, unless I am missing something simpler? Cheers, Miquèl
Hi, On Mon, Apr 4, 2022 at 8:40 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hello, > > miquel.raynal@bootlin.com wrote on Tue, 29 Mar 2022 18:35:06 +0200: > > > Hi Alexander, > > > > alex.aring@gmail.com wrote on Sun, 27 Mar 2022 11:46:12 -0400: > > > > > Hi, > > > > > > On Fri, Mar 18, 2022 at 2:56 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > > > Either the spi operation failed, or the offloaded transmit operation > > > > failed and returned a TRAC value. Use this value when available or use > > > > the default "SYSTEM_ERROR" otherwise, in order to propagate one step > > > > above the error. > > > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > > --- > > > > drivers/net/ieee802154/at86rf230.c | 25 +++++++++++++++++++++++-- > > > > 1 file changed, 23 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > > index d3cf6d23b57e..34d199f597c9 100644 > > > > --- a/drivers/net/ieee802154/at86rf230.c > > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > > @@ -358,7 +358,23 @@ static inline void > > > > at86rf230_async_error(struct at86rf230_local *lp, > > > > struct at86rf230_state_change *ctx, int rc) > > > > { > > > > - dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > > > + int reason; > > > > + > > > > + switch (rc) { > > > > > > I think there was a miscommunication last time, this rc variable is > > > not a trac register value, it is a linux errno. Also the error here > > > has nothing to do with a trac error. A trac error is the result of the > > > offloaded transmit functionality on the transceiver, here we dealing > > > about bus communication errors produced by the spi subsystem. What we > > > need is to report it to the softmac layer as "IEEE802154_SYSTEM_ERROR" > > > (as we decided that this is a user specific error and can be returned > > > by the transceiver for non 802.15.4 "error" return code. > > > > I think we definitely need to handle both, see below. > > > > > > > > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > > > + reason = IEEE802154_CHANNEL_ACCESS_FAILURE; > > > > + break; > > > > + case TRAC_NO_ACK: > > > > + reason = IEEE802154_NO_ACK; > > > > + break; > > > > + default: > > > > + reason = IEEE802154_SYSTEM_ERROR; > > > > + } > > > > + > > > > + if (rc < 0) > > > > + dev_err(&lp->spi->dev, "spi_async error %d\n", rc); > > > > + else > > > > + dev_err(&lp->spi->dev, "xceiver error %d\n", reason); > > > > > > > > at86rf230_async_state_change(lp, ctx, STATE_FORCE_TRX_OFF, > > > > at86rf230_async_error_recover); > > > > @@ -666,10 +682,15 @@ at86rf230_tx_trac_check(void *context) > > > > case TRAC_SUCCESS: > > > > case TRAC_SUCCESS_DATA_PENDING: > > > > at86rf230_async_state_change(lp, ctx, STATE_TX_ON, at86rf230_tx_on); > > > > + return; > > > > + case TRAC_CHANNEL_ACCESS_FAILURE: > > > > + case TRAC_NO_ACK: > > > > break; > > > > default: > > > > - at86rf230_async_error(lp, ctx, -EIO); > > > > + trac = TRAC_INVALID; > > > > } > > > > + > > > > + at86rf230_async_error(lp, ctx, trac); > > > > > > That makes no sense, at86rf230_async_error() is not a trac error > > > handling, it is a bus error handling. > > > > Both will have to be handled asynchronously, which means we have to > > tell the soft mac layer that something bad happened in each case. > > > > > As noted above. With this change > > > you mix bus errors and trac errors (which are not bus errors). > > > > In the case of a SPI error, it will happen asynchronously, which means > > the tx call is over and something bad happened. We are aware that > > something bad happened and there was a bus error. We need to: > > - Free the skb > > - Restart the internal machinery > > - Somehow tell the soft mac layer something bad happened and the packet > > will not be transmitted as expected (IOW, balance the "end" calls > > with the "start" calls, just because we did not return immediately > > when we got the transmit request). > > > > In the case of a transmission error, this is a trac condition that is > > reported to us by an IRQ. We know it is a trac error, we can look at a > > buffer to find which trac error exactly happened. In this case we need > > to go through exactly the same steps as above. > > > > But you are right that a spi_async() error is not a trac error, hence > > my choice in the switch statement to default to the > > IEEE80154_SYSTEM_ERROR flag in this case. > > > > Should I ignore spi bus errors? I don't think I can, so I don't really > > see how to handle it differently. > > Sorry to bother you again, but in the end, do you agree on returning > IEEE802154_SYSTEM_ERROR upon asynchronous bus errors? > yes, I said nothing different here. What I said is that bus errors and trac status get mixed here and this patch breaks things. Really this is just changing either call xmit_complete() when trac was one of the successful codes or xmit_error($REASON) when trac was one which failed. In case of bus error and it was "tx" then call xmit_error(SYSTEM_ERROR) in at86rf230_async_error_recover_complete(). You might need to store the last trac register to decide what to call at the current places of "xmit_complete()". I also would like to see a helper here which statically sends SYSTEM_ERROR in case of bus error because I am worried that somebody is choosing any other 802.15.4 error to return which might be interpreted differently by SoftMAC. > Any other modification of the driver in favor of having two distinct > paths would be really costly in term of time spent and probability of > breaking something, so I would rather avoid it, unless I am missing > something simpler? If it's too much time, then just update the driver like any others and don't use the new feature, somebody else will send patches for it to update the driver then. - Alex
diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c index d3cf6d23b57e..34d199f597c9 100644 --- a/drivers/net/ieee802154/at86rf230.c +++ b/drivers/net/ieee802154/at86rf230.c @@ -358,7 +358,23 @@ static inline void at86rf230_async_error(struct at86rf230_local *lp, struct at86rf230_state_change *ctx, int rc) { - dev_err(&lp->spi->dev, "spi_async error %d\n", rc); + int reason; + + switch (rc) { + case TRAC_CHANNEL_ACCESS_FAILURE: + reason = IEEE802154_CHANNEL_ACCESS_FAILURE; + break; + case TRAC_NO_ACK: + reason = IEEE802154_NO_ACK; + break; + default: + reason = IEEE802154_SYSTEM_ERROR; + } + + if (rc < 0) + dev_err(&lp->spi->dev, "spi_async error %d\n", rc); + else + dev_err(&lp->spi->dev, "xceiver error %d\n", reason); at86rf230_async_state_change(lp, ctx, STATE_FORCE_TRX_OFF, at86rf230_async_error_recover); @@ -666,10 +682,15 @@ at86rf230_tx_trac_check(void *context) case TRAC_SUCCESS: case TRAC_SUCCESS_DATA_PENDING: at86rf230_async_state_change(lp, ctx, STATE_TX_ON, at86rf230_tx_on); + return; + case TRAC_CHANNEL_ACCESS_FAILURE: + case TRAC_NO_ACK: break; default: - at86rf230_async_error(lp, ctx, -EIO); + trac = TRAC_INVALID; } + + at86rf230_async_error(lp, ctx, trac); } static void
Either the spi operation failed, or the offloaded transmit operation failed and returned a TRAC value. Use this value when available or use the default "SYSTEM_ERROR" otherwise, in order to propagate one step above the error. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> --- drivers/net/ieee802154/at86rf230.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)