Message ID | 1565703400-10623-1-git-send-email-sgruszka@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 34b0e9b767bfa09ae233ca0d6ceb299bf2e24600 |
Delegated to: | Kalle Valo |
Headers | show |
Series | [5.3] mt76: mt76x0e: don't use hw encryption for MT7630E | expand |
> > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > I can observe firmware hangs on MT7630E on station mode: tx stop > functioning after minor activity (rx keep working) and on module > unload device fail to stop with messages: > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > Loading module again results in failure to associate with AP. > Only machine power off / power on cycle can make device work again. > > It's unclear why commit 41634aa8d6db causes the problem, but it is > related to HW encryption. Since issue is a firmware hang, that is super > hard to debug, just disable HW encryption as fix for the issue. > > Fixes: 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> > --- > drivers/net/wireless/mediatek/mt76/mt76x0/pci.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c > index 4585e1b756c2..6117e6ca08cb 100644 > --- a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c > +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c > @@ -62,6 +62,19 @@ static void mt76x0e_stop(struct ieee80211_hw *hw) > mt76x0e_stop_hw(dev); > } > > +static int > +mt76x0e_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, > + struct ieee80211_vif *vif, struct ieee80211_sta *sta, > + struct ieee80211_key_conf *key) > +{ > + struct mt76x02_dev *dev = hw->priv; > + > + if (is_mt7630(dev)) > + return -EOPNOTSUPP; Hi Stanislaw, Can you please try if disabling/enabling the tx tasklet during hw key configuration fixes the issue? Doing something like: tasklet_disable(tx_tasklet) mt76x02_set_key() tasklet_enable(tx_tasklet) Moreover, have you double checked if there is any performance impact of not using hw encryption? If so, I guess it is better to just redefine mt76_wake_tx_queue for mt76x0e and run mt76_txq_schedule for 7630e: void mt76x0e_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) { if (is_mt7630(dev)) { mt76_txq_schedule(dev, txq->ac); } else { tasklet_schedule(&dev->tx_tasklet); } } Regards, Lorenzo > + > + return mt76x02_set_key(hw, cmd, vif, sta, key); > +} > + > static void > mt76x0e_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif, > u32 queues, bool drop) > @@ -78,7 +91,7 @@ static void mt76x0e_stop(struct ieee80211_hw *hw) > .configure_filter = mt76x02_configure_filter, > .bss_info_changed = mt76x02_bss_info_changed, > .sta_state = mt76_sta_state, > - .set_key = mt76x02_set_key, > + .set_key = mt76x0e_set_key, > .conf_tx = mt76x02_conf_tx, > .sw_scan_start = mt76x02_sw_scan, > .sw_scan_complete = mt76x02_sw_scan_complete, > -- > 1.9.3 >
On Wed, Aug 14, 2019 at 11:50:12AM +0200, Lorenzo Bianconi wrote: > > > > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > > I can observe firmware hangs on MT7630E on station mode: tx stop > > functioning after minor activity (rx keep working) and on module > > unload device fail to stop with messages: > > > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > > > Loading module again results in failure to associate with AP. > > Only machine power off / power on cycle can make device work again. > > > > It's unclear why commit 41634aa8d6db causes the problem, but it is > > related to HW encryption. Since issue is a firmware hang, that is super > > hard to debug, just disable HW encryption as fix for the issue. > > > > Fixes: 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> > > --- > > drivers/net/wireless/mediatek/mt76/mt76x0/pci.c | 15 ++++++++++++++- > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c > > index 4585e1b756c2..6117e6ca08cb 100644 > > --- a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c > > @@ -62,6 +62,19 @@ static void mt76x0e_stop(struct ieee80211_hw *hw) > > mt76x0e_stop_hw(dev); > > } > > > > +static int > > +mt76x0e_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, > > + struct ieee80211_vif *vif, struct ieee80211_sta *sta, > > + struct ieee80211_key_conf *key) > > +{ > > + struct mt76x02_dev *dev = hw->priv; > > + > > + if (is_mt7630(dev)) > > + return -EOPNOTSUPP; > > Hi Stanislaw, > > Can you please try if disabling/enabling the tx tasklet during hw key > configuration fixes the issue? > Doing something like: > > tasklet_disable(tx_tasklet) > mt76x02_set_key() > tasklet_enable(tx_tasklet) It does not help with the problem. > Moreover, have you double checked if there is any performance impact > of not using hw encryption? I didn't observe any, but realized on this machine I have aesni_intel encryption accelerator. After rebuild kernel without CONFIG_CRYPTO_AES_NI_INTEL, 'perf top' showed extra 20% of cpu usage in aes_encrypt() when sending data with HW encryption disabled. > If so, I guess it is better to just redefine mt76_wake_tx_queue for > mt76x0e and run mt76_txq_schedule for 7630e: > > void mt76x0e_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > { > if (is_mt7630(dev)) { > mt76_txq_schedule(dev, txq->ac); > } else { > tasklet_schedule(&dev->tx_tasklet); > } > } Not sure about reduction of lock contention for which the tx_tasklet was introduced here, but looks ok for me as fix. Stanislaw
On 2019-08-15 12:09, Stanislaw Gruszka wrote: >> Hi Stanislaw, >> >> Can you please try if disabling/enabling the tx tasklet during hw key >> configuration fixes the issue? >> Doing something like: >> >> tasklet_disable(tx_tasklet) >> mt76x02_set_key() >> tasklet_enable(tx_tasklet) > > It does not help with the problem. > >> Moreover, have you double checked if there is any performance impact >> of not using hw encryption? > > I didn't observe any, but realized on this machine I have > aesni_intel encryption accelerator. After rebuild kernel without > CONFIG_CRYPTO_AES_NI_INTEL, 'perf top' showed extra 20% of cpu usage > in aes_encrypt() when sending data with HW encryption disabled. > >> If so, I guess it is better to just redefine mt76_wake_tx_queue for >> mt76x0e and run mt76_txq_schedule for 7630e: >> >> void mt76x0e_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) >> { >> if (is_mt7630(dev)) { >> mt76_txq_schedule(dev, txq->ac); >> } else { >> tasklet_schedule(&dev->tx_tasklet); >> } >> } > > Not sure about reduction of lock contention for which the tx_tasklet > was introduced here, but looks ok for me as fix. I think if we work around the bug like this, it can easily come back to bite us again later. I don't see any logical explanation as to how this makes a difference with hardware encryption. Also, I think it would be helpful to figure out what key operation (if any) triggers this, adding or removing keys. Maybe it could also help if we change the order in which the WCID table entries are updated, i.e. changing MT_WCID_ATTR first when removing keys. Maybe temporarily clearing MT_MAC_SYS_CTRL_ENABLE_TX before the key update and setting it again afterwards could also help. - Felix
On Thu, Aug 15, 2019 at 12:20:54PM +0200, Felix Fietkau wrote: > On 2019-08-15 12:09, Stanislaw Gruszka wrote: > >> Hi Stanislaw, > >> > >> Can you please try if disabling/enabling the tx tasklet during hw key > >> configuration fixes the issue? > >> Doing something like: > >> > >> tasklet_disable(tx_tasklet) > >> mt76x02_set_key() > >> tasklet_enable(tx_tasklet) > > > > It does not help with the problem. > > > >> Moreover, have you double checked if there is any performance impact > >> of not using hw encryption? > > > > I didn't observe any, but realized on this machine I have > > aesni_intel encryption accelerator. After rebuild kernel without > > CONFIG_CRYPTO_AES_NI_INTEL, 'perf top' showed extra 20% of cpu usage > > in aes_encrypt() when sending data with HW encryption disabled. > > > >> If so, I guess it is better to just redefine mt76_wake_tx_queue for > >> mt76x0e and run mt76_txq_schedule for 7630e: > >> > >> void mt76x0e_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > >> { > >> if (is_mt7630(dev)) { > >> mt76_txq_schedule(dev, txq->ac); > >> } else { > >> tasklet_schedule(&dev->tx_tasklet); > >> } > >> } > > > > Not sure about reduction of lock contention for which the tx_tasklet > > was introduced here, but looks ok for me as fix. > I think if we work around the bug like this, it can easily come back to > bite us again later. I'm not into workarounds any kind, but this is really strange issue, maybe FW bug that triggers just by slightly different driver behaviour. > I don't see any logical explanation as to how this > makes a difference with hardware encryption. > Also, I think it would be helpful to figure out what key operation (if > any) triggers this, adding or removing keys. Seems not to be related with set_key operation at all. We set 2 HW keys at the beginning and hang happen after some tx/rx traffic without any re-keyring. I'm not sure why disabling HW encryption helps. Maybe it is due to ordering or timing. With SW encryption we spend more time in mac80211 before pass skb's to the driver. Or maybe we just mix some HW keys and SW (group) keys in way that FW does not like. > Maybe it could also help if we change the order in which the WCID table > entries are updated, i.e. changing MT_WCID_ATTR first when removing keys. > > Maybe temporarily clearing MT_MAC_SYS_CTRL_ENABLE_TX before the key > update and setting it again afterwards could also help. I tested below patch and it did not help. Stanislaw diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c index 82bafb5ac326..846652f2b3f1 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mac.c @@ -105,11 +105,14 @@ int mt76x02_mac_wcid_set_key(struct mt76x02_dev *dev, u8 idx, if (cipher == MT_CIPHER_NONE && key) return -EOPNOTSUPP; + if (!key) + mt76_rmw_field(dev, MT_WCID_ATTR(idx), MT_WCID_ATTR_PKEY_MODE, MT_CIPHER_NONE); + mt76_wr_copy(dev, MT_WCID_KEY(idx), key_data, sizeof(key_data)); - mt76_rmw_field(dev, MT_WCID_ATTR(idx), MT_WCID_ATTR_PKEY_MODE, cipher); memset(iv_data, 0, sizeof(iv_data)); if (key) { + mt76_rmw_field(dev, MT_WCID_ATTR(idx), MT_WCID_ATTR_PKEY_MODE, cipher); mt76_rmw_field(dev, MT_WCID_ATTR(idx), MT_WCID_ATTR_PAIRWISE, !!(key->flags & IEEE80211_KEY_FLAG_PAIRWISE)); diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c index fa45ed280ab1..6f624e3329f9 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c @@ -391,7 +391,7 @@ int mt76x02_ampdu_action(struct ieee80211_hw *hw, struct ieee80211_vif *vif, } EXPORT_SYMBOL_GPL(mt76x02_ampdu_action); -int mt76x02_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, +int __mt76x02_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, struct ieee80211_vif *vif, struct ieee80211_sta *sta, struct ieee80211_key_conf *key) { @@ -466,6 +466,35 @@ int mt76x02_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, return mt76x02_mac_wcid_set_key(dev, msta->wcid.idx, key); } + +int mt76x02_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, + struct ieee80211_vif *vif, struct ieee80211_sta *sta, + struct ieee80211_key_conf *key) +{ + struct mt76x02_dev *dev = hw->priv; + int ret; + int i = 200; + + tasklet_disable(&dev->mt76.tx_tasklet); + + /* Page count on TxQ */ + while (i-- && ((mt76_rr(dev, 0x0438) & 0xffffffff) || + (mt76_rr(dev, 0x0a30) & 0x000000ff) || + (mt76_rr(dev, 0x0a34) & 0x00ff00ff))) + msleep(10); + + if (!mt76_poll(dev, MT_MAC_STATUS, MT_MAC_STATUS_TX, 0, 1000)) + dev_warn(dev->mt76.dev, "Warning: SET KEY: MAC TX did not stop!\n"); + + mt76_clear(dev, MT_MAC_SYS_CTRL, MT_MAC_SYS_CTRL_ENABLE_TX); + + ret = __mt76x02_set_key(hw, cmd, vif, sta, key); + + mt76_set(dev, MT_MAC_SYS_CTRL, MT_MAC_SYS_CTRL_ENABLE_TX); + tasklet_enable(&dev->mt76.tx_tasklet); + + return ret; +} EXPORT_SYMBOL_GPL(mt76x02_set_key); int mt76x02_conf_tx(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
On 2019-08-19 13:06, Stanislaw Gruszka wrote: > On Thu, Aug 15, 2019 at 12:20:54PM +0200, Felix Fietkau wrote: >> On 2019-08-15 12:09, Stanislaw Gruszka wrote: >> >> Hi Stanislaw, >> >> >> >> Can you please try if disabling/enabling the tx tasklet during hw key >> >> configuration fixes the issue? >> >> Doing something like: >> >> >> >> tasklet_disable(tx_tasklet) >> >> mt76x02_set_key() >> >> tasklet_enable(tx_tasklet) >> > >> > It does not help with the problem. >> > >> >> Moreover, have you double checked if there is any performance impact >> >> of not using hw encryption? >> > >> > I didn't observe any, but realized on this machine I have >> > aesni_intel encryption accelerator. After rebuild kernel without >> > CONFIG_CRYPTO_AES_NI_INTEL, 'perf top' showed extra 20% of cpu usage >> > in aes_encrypt() when sending data with HW encryption disabled. >> > >> >> If so, I guess it is better to just redefine mt76_wake_tx_queue for >> >> mt76x0e and run mt76_txq_schedule for 7630e: >> >> >> >> void mt76x0e_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) >> >> { >> >> if (is_mt7630(dev)) { >> >> mt76_txq_schedule(dev, txq->ac); >> >> } else { >> >> tasklet_schedule(&dev->tx_tasklet); >> >> } >> >> } >> > >> > Not sure about reduction of lock contention for which the tx_tasklet >> > was introduced here, but looks ok for me as fix. >> I think if we work around the bug like this, it can easily come back to >> bite us again later. > > I'm not into workarounds any kind, but this is really strange issue, > maybe FW bug that triggers just by slightly different driver behaviour. > >> I don't see any logical explanation as to how this >> makes a difference with hardware encryption. >> Also, I think it would be helpful to figure out what key operation (if >> any) triggers this, adding or removing keys. > > Seems not to be related with set_key operation at all. We set 2 HW > keys at the beginning and hang happen after some tx/rx traffic > without any re-keyring. > > I'm not sure why disabling HW encryption helps. Maybe it is due to > ordering or timing. With SW encryption we spend more time in mac80211 > before pass skb's to the driver. Or maybe we just mix some HW keys > and SW (group) keys in way that FW does not like. > >> Maybe it could also help if we change the order in which the WCID table >> entries are updated, i.e. changing MT_WCID_ATTR first when removing keys. >> >> Maybe temporarily clearing MT_MAC_SYS_CTRL_ENABLE_TX before the key >> update and setting it again afterwards could also help. > > I tested below patch and it did not help. Can you test if disabling hw encryption only for shared or only for pairwise keys makes any difference? - Felix
On Tue, Aug 20, 2019 at 12:31:56PM +0200, Felix Fietkau wrote: > >> >> void mt76x0e_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > >> >> { > >> >> if (is_mt7630(dev)) { > >> >> mt76_txq_schedule(dev, txq->ac); > >> >> } else { > >> >> tasklet_schedule(&dev->tx_tasklet); > >> >> } > >> >> } > >> > > >> > Not sure about reduction of lock contention for which the tx_tasklet > >> > was introduced here, but looks ok for me as fix. > >> I think if we work around the bug like this, it can easily come back to > >> bite us again later. > > > > I'm not into workarounds any kind, but this is really strange issue, > > maybe FW bug that triggers just by slightly different driver behaviour. > > > >> I don't see any logical explanation as to how this > >> makes a difference with hardware encryption. > >> Also, I think it would be helpful to figure out what key operation (if > >> any) triggers this, adding or removing keys. > > > > Seems not to be related with set_key operation at all. We set 2 HW > > keys at the beginning and hang happen after some tx/rx traffic > > without any re-keyring. > > > > I'm not sure why disabling HW encryption helps. Maybe it is due to > > ordering or timing. With SW encryption we spend more time in mac80211 > > before pass skb's to the driver. Or maybe we just mix some HW keys > > and SW (group) keys in way that FW does not like. > > > >> Maybe it could also help if we change the order in which the WCID table > >> entries are updated, i.e. changing MT_WCID_ATTR first when removing keys. > >> > >> Maybe temporarily clearing MT_MAC_SYS_CTRL_ENABLE_TX before the key > >> update and setting it again afterwards could also help. > > > > I tested below patch and it did not help. > Can you test if disabling hw encryption only for shared or only for > pairwise keys makes any difference? Disabling only pairwise keys helps. Disabling only shared keys does not help. Not sure if this will be helpful information or make things more confusing, but seems the difference between mt76_txq_schedule() and tasklet_schedule() in mt76_wake_tx_queue() is that on mt76_txq_schedule() some tx packets are serialized by dev->rx_lock (because some ARP and TCP packets are sent via network stack as response of incoming packet within ieee80211_rx_napi() call). Removing spin_lock(&dev->rx_lock) in mt76_rx_complete() make the problem reproducible again with mt76_txq_schedule() & HW encryption. Stanislaw
On Tue, Aug 20, 2019 at 01:24:39PM +0200, Stanislaw Gruszka wrote: > > Can you test if disabling hw encryption only for shared or only for > > pairwise keys makes any difference? > > Disabling only pairwise keys helps. Disabling only shared keys does > not help. > > Not sure if this will be helpful information or make things more > confusing, but seems the difference between mt76_txq_schedule() > and tasklet_schedule() in mt76_wake_tx_queue() is that on > mt76_txq_schedule() some tx packets are serialized by dev->rx_lock > (because some ARP and TCP packets are sent via network stack as response > of incoming packet within ieee80211_rx_napi() call). Removing > spin_lock(&dev->rx_lock) in mt76_rx_complete() make the problem > reproducible again with mt76_txq_schedule() & HW encryption. So, I think this is FW/HW issue related with encryption and ordering and we should apply patch originally posted in this thread that disable HW encryption for MT7630E. I do not think we should disable HW encryption only for pairwise keys, because FW/HW can have the same bug for shared keys, but is not triggered in my test, as we do not sent lot of group frames. Stanislaw
On 2019-08-21 10:47, Stanislaw Gruszka wrote: > On Tue, Aug 20, 2019 at 01:24:39PM +0200, Stanislaw Gruszka wrote: >> > Can you test if disabling hw encryption only for shared or only for >> > pairwise keys makes any difference? >> >> Disabling only pairwise keys helps. Disabling only shared keys does >> not help. >> >> Not sure if this will be helpful information or make things more >> confusing, but seems the difference between mt76_txq_schedule() >> and tasklet_schedule() in mt76_wake_tx_queue() is that on >> mt76_txq_schedule() some tx packets are serialized by dev->rx_lock >> (because some ARP and TCP packets are sent via network stack as response >> of incoming packet within ieee80211_rx_napi() call). Removing >> spin_lock(&dev->rx_lock) in mt76_rx_complete() make the problem >> reproducible again with mt76_txq_schedule() & HW encryption. > > So, I think this is FW/HW issue related with encryption and ordering > and we should apply patch originally posted in this thread that > disable HW encryption for MT7630E. > > I do not think we should disable HW encryption only for pairwise keys, > because FW/HW can have the same bug for shared keys, but is not > triggered in my test, as we do not sent lot of group frames. I'm still not convinced that this is just the hardware implementation of hw crypto being faulty. I think it's more likely that there's a bug in the tx path somewhere, which causes hangs on MT7630E but remains hidden (or at least recoverable) on other devices. I'm currently reviewing key handling in the mac80211 fast-xmit codepath and get the feeling that something might be racy there. I will let you know when I make some progress with that review. If we can't find the bug soon, then I'm fine with merging this patch. Right now, I would like to see first if we can fix it properly. - Felix
On 2019-08-21 11:03, Felix Fietkau wrote: > On 2019-08-21 10:47, Stanislaw Gruszka wrote: >> On Tue, Aug 20, 2019 at 01:24:39PM +0200, Stanislaw Gruszka wrote: >>> > Can you test if disabling hw encryption only for shared or only for >>> > pairwise keys makes any difference? >>> >>> Disabling only pairwise keys helps. Disabling only shared keys does >>> not help. >>> >>> Not sure if this will be helpful information or make things more >>> confusing, but seems the difference between mt76_txq_schedule() >>> and tasklet_schedule() in mt76_wake_tx_queue() is that on >>> mt76_txq_schedule() some tx packets are serialized by dev->rx_lock >>> (because some ARP and TCP packets are sent via network stack as response >>> of incoming packet within ieee80211_rx_napi() call). Removing >>> spin_lock(&dev->rx_lock) in mt76_rx_complete() make the problem >>> reproducible again with mt76_txq_schedule() & HW encryption. >> >> So, I think this is FW/HW issue related with encryption and ordering >> and we should apply patch originally posted in this thread that >> disable HW encryption for MT7630E. >> >> I do not think we should disable HW encryption only for pairwise keys, >> because FW/HW can have the same bug for shared keys, but is not >> triggered in my test, as we do not sent lot of group frames. > I'm still not convinced that this is just the hardware implementation of > hw crypto being faulty. I think it's more likely that there's a bug in > the tx path somewhere, which causes hangs on MT7630E but remains hidden > (or at least recoverable) on other devices. > I'm currently reviewing key handling in the mac80211 fast-xmit codepath > and get the feeling that something might be racy there. > I will let you know when I make some progress with that review. > If we can't find the bug soon, then I'm fine with merging this patch. > Right now, I would like to see first if we can fix it properly. Another question: Does a watchdog restart happen before tx fails? - Felix
On Wed, Aug 21, 2019 at 12:40:14PM +0200, Felix Fietkau wrote: > On 2019-08-21 11:03, Felix Fietkau wrote: > > On 2019-08-21 10:47, Stanislaw Gruszka wrote: > >> On Tue, Aug 20, 2019 at 01:24:39PM +0200, Stanislaw Gruszka wrote: > >>> > Can you test if disabling hw encryption only for shared or only for > >>> > pairwise keys makes any difference? > >>> > >>> Disabling only pairwise keys helps. Disabling only shared keys does > >>> not help. > >>> > >>> Not sure if this will be helpful information or make things more > >>> confusing, but seems the difference between mt76_txq_schedule() > >>> and tasklet_schedule() in mt76_wake_tx_queue() is that on > >>> mt76_txq_schedule() some tx packets are serialized by dev->rx_lock > >>> (because some ARP and TCP packets are sent via network stack as response > >>> of incoming packet within ieee80211_rx_napi() call). Removing > >>> spin_lock(&dev->rx_lock) in mt76_rx_complete() make the problem > >>> reproducible again with mt76_txq_schedule() & HW encryption. > >> > >> So, I think this is FW/HW issue related with encryption and ordering > >> and we should apply patch originally posted in this thread that > >> disable HW encryption for MT7630E. > >> > >> I do not think we should disable HW encryption only for pairwise keys, > >> because FW/HW can have the same bug for shared keys, but is not > >> triggered in my test, as we do not sent lot of group frames. > > I'm still not convinced that this is just the hardware implementation of > > hw crypto being faulty. I think it's more likely that there's a bug in > > the tx path somewhere, which causes hangs on MT7630E but remains hidden > > (or at least recoverable) on other devices. > > I'm currently reviewing key handling in the mac80211 fast-xmit codepath > > and get the feeling that something might be racy there. > > I will let you know when I make some progress with that review. > > If we can't find the bug soon, then I'm fine with merging this patch. > > Right now, I would like to see first if we can fix it properly. > Another question: Does a watchdog restart happen before tx fails? No, we do not run wdt_work for mt76x0e. Stanislaw
On 2019-08-21 13:22, Stanislaw Gruszka wrote: > On Wed, Aug 21, 2019 at 12:40:14PM +0200, Felix Fietkau wrote: >> On 2019-08-21 11:03, Felix Fietkau wrote: >> > On 2019-08-21 10:47, Stanislaw Gruszka wrote: >> >> On Tue, Aug 20, 2019 at 01:24:39PM +0200, Stanislaw Gruszka wrote: >> >>> > Can you test if disabling hw encryption only for shared or only for >> >>> > pairwise keys makes any difference? >> >>> >> >>> Disabling only pairwise keys helps. Disabling only shared keys does >> >>> not help. >> >>> >> >>> Not sure if this will be helpful information or make things more >> >>> confusing, but seems the difference between mt76_txq_schedule() >> >>> and tasklet_schedule() in mt76_wake_tx_queue() is that on >> >>> mt76_txq_schedule() some tx packets are serialized by dev->rx_lock >> >>> (because some ARP and TCP packets are sent via network stack as response >> >>> of incoming packet within ieee80211_rx_napi() call). Removing >> >>> spin_lock(&dev->rx_lock) in mt76_rx_complete() make the problem >> >>> reproducible again with mt76_txq_schedule() & HW encryption. >> >> >> >> So, I think this is FW/HW issue related with encryption and ordering >> >> and we should apply patch originally posted in this thread that >> >> disable HW encryption for MT7630E. >> >> >> >> I do not think we should disable HW encryption only for pairwise keys, >> >> because FW/HW can have the same bug for shared keys, but is not >> >> triggered in my test, as we do not sent lot of group frames. >> > I'm still not convinced that this is just the hardware implementation of >> > hw crypto being faulty. I think it's more likely that there's a bug in >> > the tx path somewhere, which causes hangs on MT7630E but remains hidden >> > (or at least recoverable) on other devices. >> > I'm currently reviewing key handling in the mac80211 fast-xmit codepath >> > and get the feeling that something might be racy there. >> > I will let you know when I make some progress with that review. >> > If we can't find the bug soon, then I'm fine with merging this patch. >> > Right now, I would like to see first if we can fix it properly. >> Another question: Does a watchdog restart happen before tx fails? > > No, we do not run wdt_work for mt76x0e. I didn't find anything conclusive yet, so I agree with merging this patch for now until we have a better fix. - Felix
Stanislaw Gruszka <sgruszka@redhat.com> wrote: > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > I can observe firmware hangs on MT7630E on station mode: tx stop > functioning after minor activity (rx keep working) and on module > unload device fail to stop with messages: > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > Loading module again results in failure to associate with AP. > Only machine power off / power on cycle can make device work again. > > It's unclear why commit 41634aa8d6db causes the problem, but it is > related to HW encryption. Since issue is a firmware hang, that is super > hard to debug, just disable HW encryption as fix for the issue. > > Fixes: 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Patch applied to wireless-drivers.git, thanks. 34b0e9b767bf mt76: mt76x0e: don't use hw encryption for MT7630E
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c index 4585e1b756c2..6117e6ca08cb 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c @@ -62,6 +62,19 @@ static void mt76x0e_stop(struct ieee80211_hw *hw) mt76x0e_stop_hw(dev); } +static int +mt76x0e_set_key(struct ieee80211_hw *hw, enum set_key_cmd cmd, + struct ieee80211_vif *vif, struct ieee80211_sta *sta, + struct ieee80211_key_conf *key) +{ + struct mt76x02_dev *dev = hw->priv; + + if (is_mt7630(dev)) + return -EOPNOTSUPP; + + return mt76x02_set_key(hw, cmd, vif, sta, key); +} + static void mt76x0e_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif, u32 queues, bool drop) @@ -78,7 +91,7 @@ static void mt76x0e_stop(struct ieee80211_hw *hw) .configure_filter = mt76x02_configure_filter, .bss_info_changed = mt76x02_bss_info_changed, .sta_state = mt76_sta_state, - .set_key = mt76x02_set_key, + .set_key = mt76x0e_set_key, .conf_tx = mt76x02_conf_tx, .sw_scan_start = mt76x02_sw_scan, .sw_scan_complete = mt76x02_sw_scan_complete,
Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") I can observe firmware hangs on MT7630E on station mode: tx stop functioning after minor activity (rx keep working) and on module unload device fail to stop with messages: [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop Loading module again results in failure to associate with AP. Only machine power off / power on cycle can make device work again. It's unclear why commit 41634aa8d6db causes the problem, but it is related to HW encryption. Since issue is a firmware hang, that is super hard to debug, just disable HW encryption as fix for the issue. Fixes: 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> --- drivers/net/wireless/mediatek/mt76/mt76x0/pci.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)