Message ID | 1468924452-23877-5-git-send-email-michal.kazior@tieto.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 47b1848d9fde5daf102f599be6e589a1d3c8da7d |
Delegated to: | Kalle Valo |
On 07/19/2016 03:34 AM, Michal Kazior wrote:
> HW Rx filters and masks are not configured
> properly by firmware during boot sequences.
> [...]
> Reported-by: Marek Puzyniak <marek.puzyniak@tieto.com>
> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>

I was looking at firmware to make sure that I fixed what I could there....

From what I can tell, 10.4 should not have this bug. Did you see this only
on 10.1/10.2 firmware? It is of course possible that I am mis-understanding
10.4....

Thanks,
Ben
On 24 August 2016 at 19:20, Ben Greear <greearb@candelatech.com> wrote:
> On 07/19/2016 03:34 AM, Michal Kazior wrote:
>>
>> HW Rx filters and masks are not configured
>> properly by firmware during boot sequences.
>> [...]
>> Reported-by: Marek Puzyniak <marek.puzyniak@tieto.com>
>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>
> I was looking at firmware to make sure that I fixed what I could there....
>
> From what I can tell, 10.4 should not have this bug. Did you see this only
> on 10.1/10.2 firmware? It is of course possible that I am mis-understanding
> 10.4....

I did see it on 10.1 and 10.2. Don't recall seeing it on 10.4 though.
If you didn't see warnings on 10.4 even after adding msleep() as per
commit log then I guess it doesn't suffer from the bug.

Michał
On 08/24/2016 11:18 PM, Michal Kazior wrote:
>> I was looking at firmware to make sure that I fixed what I could there....
>>
>> From what I can tell, 10.4 should not have this bug. Did you see this only
>> on 10.1/10.2 firmware? It is of course possible that I am mis-understanding
>> 10.4....
>
> I did see it on 10.1 and 10.2. Don't recall seeing it on 10.4 though.
> If you didn't see warnings on 10.4 even after adding msleep() as per
> commit log then I guess it doesn't suffer from the bug.

I can still occasionally see that message with a 15000 ms sleep.

Based on debugging, it seems my firmware is now setting the mac-mask properly.
But, as you mention, the rxfilter is enabled very early. So, probably it is
still possible to see packets early if they are multicast, bcast, etc.

I don't think it is worth re-working the entire rx-filter calc in the concurrency
logic properly for 10.1 firmware, so I'm going to figure my fix is good enough
as is as long as it sets the mac-mask properly.

Thanks,
Ben
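Ben's observation separates two decisions: whether the address mask makes a frame look addressed to the device (and therefore worth an ACK), and whether the rx filter hands the frame to the host at all. The C sketch below models only the second decision. It assumes a hypothetical filter with separate accept bits for unicast-to-self, broadcast and multicast frames; the `model_*` names and the bit layout are invented for illustration and are not the real MAC_PCU_RX_FILTER format. Under that assumption, an early-enabled filter still delivers broadcast and multicast frames even when the mask is correct, which would be consistent with the occasional warning he still sees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ADDR_LEN 6

/* Hypothetical filter bits for this model only; they are not the real
 * MAC_PCU_RX_FILTER bit layout, which the thread does not describe. */
enum model_rx_filter_bits {
	MODEL_RX_FILTER_UCAST_ME = 1u << 0,
	MODEL_RX_FILTER_BCAST    = 1u << 1,
	MODEL_RX_FILTER_MCAST    = 1u << 2,
};

enum model_addr_class {
	MODEL_ADDR_UCAST_ME,
	MODEL_ADDR_UCAST_OTHER,
	MODEL_ADDR_BCAST,
	MODEL_ADDR_MCAST,
};

/* Classify the receiver address (ADDR1) of a frame. The group bit is the
 * least significant bit of the first octet of a MAC address. */
static enum model_addr_class
model_classify_addr1(const uint8_t addr1[MODEL_ADDR_LEN],
		     const uint8_t own[MODEL_ADDR_LEN])
{
	static const uint8_t bcast[MODEL_ADDR_LEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	assert(addr1 != NULL && own != NULL);

	if (memcmp(addr1, bcast, MODEL_ADDR_LEN) == 0)
		return MODEL_ADDR_BCAST;
	if (addr1[0] & 0x01)
		return MODEL_ADDR_MCAST;
	if (memcmp(addr1, own, MODEL_ADDR_LEN) == 0)
		return MODEL_ADDR_UCAST_ME;
	return MODEL_ADDR_UCAST_OTHER;
}

/* Decide whether the modelled rx filter passes the frame up to the host.
 * The address mask plays no part in this decision. */
static bool model_rx_filter_delivers(uint32_t filter, enum model_addr_class c)
{
	switch (c) {
	case MODEL_ADDR_UCAST_ME:
		return (filter & MODEL_RX_FILTER_UCAST_ME) != 0;
	case MODEL_ADDR_BCAST:
		return (filter & MODEL_RX_FILTER_BCAST) != 0;
	case MODEL_ADDR_MCAST:
		return (filter & MODEL_RX_FILTER_MCAST) != 0;
	default:
		return false;
	}
}
```

For example, with a filter of `MODEL_RX_FILTER_UCAST_ME | MODEL_RX_FILTER_BCAST`, a beacon (broadcast ADDR1) is delivered to the host while a unicast frame for another station is not.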
On 07/19/2016 03:34 AM, Michal Kazior wrote:
>
> +static int ath10k_core_reset_rx_filter(struct ath10k *ar)
> +{
> +	int ret;
> +	int vdev_id;
> +	int vdev_type;
> +	int vdev_subtype;
> +	const u8 *vdev_addr;
> +
> +	vdev_id = 0;
> +	vdev_type = WMI_VDEV_TYPE_STA;
> +	vdev_subtype = ath10k_wmi_get_vdev_subtype(ar, WMI_VDEV_SUBTYPE_NONE);
> +	vdev_addr = ar->mac_addr;
> +
> +	ret = ath10k_wmi_vdev_create(ar, vdev_id, vdev_type, vdev_subtype,
> +				     vdev_addr);
> +	if (ret) {
> +		ath10k_err(ar, "failed to create dummy vdev: %d\n", ret);
> +		return ret;
> +	}
> +
> +	ret = ath10k_wmi_vdev_delete(ar, vdev_id);
> +	if (ret) {
> +		ath10k_err(ar, "failed to delete dummy vdev: %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* WMI and HTT may use separate HIF pipes and are not guaranteed to be
> +	 * serialized properly implicitly.
> +	 *
> +	 * Moreover (most) WMI commands have no explicit acknowledges. It is
> +	 * possible to infer it implicitly by poking firmware with echo
> +	 * command - getting a reply means all preceding comments have been
> +	 * (mostly) processed.
> +	 *
> +	 * In case of vdev create/delete this is sufficient.
> +	 *
> +	 * Without this it's possible to end up with a race when HTT Rx ring is
> +	 * started before vdev create/delete hack is complete allowing a short
> +	 * window of opportunity to receive (and Tx ACK) a bunch of frames.
> +	 */
> +	ret = ath10k_wmi_barrier(ar);

QCA6174 UTF firmware seems doesn't support the WMI_ECHO command.

[16460.274822] ath10k_pci 0000:04:00.0: wmi tlv echo value 0x0ba991e9
...
[16463.461970] ath10k_pci 0000:04:00.0: failed to ping firmware: -110
[16463.461975] ath10k_pci 0000:04:00.0: failed to reset rx filter: -110

Has anyone verified any AP solution to see if UTF mode is still working
with after this patch?

Anyway, I would like to exclude the workaround from all solution's UTF mode.

Michal any concerns? (or maybe just for QCA61x4 if any...)

> +	if (ret) {
> +		ath10k_err(ar, "failed to ping firmware: %d\n", ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
>
On 17 September 2016 at 00:37, Hsu, Ryan <ryanhsu@qca.qualcomm.com> wrote:
[...]
>> +	/* WMI and HTT may use separate HIF pipes and are not guaranteed to be
>> +	 * serialized properly implicitly.
>> +	 *
>> +	 * Moreover (most) WMI commands have no explicit acknowledges. It is
>> +	 * possible to infer it implicitly by poking firmware with echo
>> +	 * command - getting a reply means all preceding comments have been
>> +	 * (mostly) processed.
>> +	 *
>> +	 * In case of vdev create/delete this is sufficient.
>> +	 *
>> +	 * Without this it's possible to end up with a race when HTT Rx ring is
>> +	 * started before vdev create/delete hack is complete allowing a short
>> +	 * window of opportunity to receive (and Tx ACK) a bunch of frames.
>> +	 */
>> +	ret = ath10k_wmi_barrier(ar);
> QCA6174 UTF firmware seems doesn't support the WMI_ECHO command.
>
> [16460.274822] ath10k_pci 0000:04:00.0: wmi tlv echo value 0x0ba991e9
> ...
> [16463.461970] ath10k_pci 0000:04:00.0: failed to ping firmware: -110
> [16463.461975] ath10k_pci 0000:04:00.0: failed to reset rx filter: -110
>
> Has anyone verified any AP solution to see if UTF mode is still working
> with after this patch?
>
> Anyway, I would like to exclude the workaround from all solution's UTF mode.
>
> Michal any concerns? (or maybe just for QCA61x4 if any...)

I didn't expect UTF wouldn't support echo.. Sorry!

If you skip this workaround for UTF I guess the device will (again) be
able to generate some bogus traffic on boot for UTF case. Not sure how
much of a problem that is (assuming it is at all).

Michal
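The exclusion Ryan proposes, and the trade-off Michal describes, can be sketched as a small C model. Everything here is hypothetical: the `model_*` types and functions stand in for the firmware mode and for `ath10k_core_reset_rx_filter()`, and the sketch makes no claim about how the UTF issue was actually resolved upstream. It shows that a firmware without echo support turns the barrier into a timeout (the `-110` in the log is `-ETIMEDOUT` on Linux), and that gating on the mode avoids the failure at the cost of leaving the boot-time ACK window open in UTF mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; not the ath10k definitions. */
enum model_fw_mode {
	MODEL_FW_MODE_NORMAL,
	MODEL_FW_MODE_UTF,
};

struct model_fw {
	bool supports_echo;	/* false models the QCA6174 UTF report */
	int dummy_vdev_cycles;	/* dummy vdev create/delete pairs sent */
};

/* Models the rx filter reset: the dummy vdev create/delete are assumed to
 * be sent successfully, then the echo barrier waits for a reply. Firmware
 * without echo support never answers, so the wait times out. */
static int model_reset_rx_filter(struct model_fw *fw)
{
	assert(fw != NULL);

	fw->dummy_vdev_cycles++;
	if (!fw->supports_echo)
		return -ETIMEDOUT;
	return 0;
}

/* One possible policy from the thread: skip the workaround in UTF mode.
 * UTF boots then keep the window in which the device may ACK frames. */
static int model_core_start(struct model_fw *fw, enum model_fw_mode mode)
{
	int ret;

	assert(fw != NULL);

	if (mode == MODEL_FW_MODE_UTF)
		return 0;

	ret = model_reset_rx_filter(fw);
	if (ret)
		return ret;

	return 0;
}
```

Gating on the mode rather than on the chip keeps the check simple; restricting it to QCA61x4 only, as Ryan also suggests, would need an extra hardware check that this sketch does not model.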
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index e88982921aa3..d2e255418d1b 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -1705,6 +1705,55 @@ static int ath10k_core_init_firmware_features(struct ath10k *ar)
 	return 0;
 }
 
+static int ath10k_core_reset_rx_filter(struct ath10k *ar)
+{
+	int ret;
+	int vdev_id;
+	int vdev_type;
+	int vdev_subtype;
+	const u8 *vdev_addr;
+
+	vdev_id = 0;
+	vdev_type = WMI_VDEV_TYPE_STA;
+	vdev_subtype = ath10k_wmi_get_vdev_subtype(ar, WMI_VDEV_SUBTYPE_NONE);
+	vdev_addr = ar->mac_addr;
+
+	ret = ath10k_wmi_vdev_create(ar, vdev_id, vdev_type, vdev_subtype,
+				     vdev_addr);
+	if (ret) {
+		ath10k_err(ar, "failed to create dummy vdev: %d\n", ret);
+		return ret;
+	}
+
+	ret = ath10k_wmi_vdev_delete(ar, vdev_id);
+	if (ret) {
+		ath10k_err(ar, "failed to delete dummy vdev: %d\n", ret);
+		return ret;
+	}
+
+	/* WMI and HTT may use separate HIF pipes and are not guaranteed to be
+	 * serialized properly implicitly.
+	 *
+	 * Moreover (most) WMI commands have no explicit acknowledges. It is
+	 * possible to infer it implicitly by poking firmware with echo
+	 * command - getting a reply means all preceding comments have been
+	 * (mostly) processed.
+	 *
+	 * In case of vdev create/delete this is sufficient.
+	 *
+	 * Without this it's possible to end up with a race when HTT Rx ring is
+	 * started before vdev create/delete hack is complete allowing a short
+	 * window of opportunity to receive (and Tx ACK) a bunch of frames.
+	 */
+	ret = ath10k_wmi_barrier(ar);
+	if (ret) {
+		ath10k_err(ar, "failed to ping firmware: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
 int ath10k_core_start(struct ath10k *ar, enum ath10k_firmware_mode mode,
 		      const struct ath10k_fw_components *fw)
 {
@@ -1872,6 +1921,25 @@ int ath10k_core_start(struct ath10k *ar, enum ath10k_firmware_mode mode,
 		goto err_hif_stop;
 	}
 
+	/* Some firmware revisions do not properly set up hardware rx filter
+	 * registers.
+	 *
+	 * A known example from QCA9880 and 10.2.4 is that MAC_PCU_ADDR1_MASK
+	 * is filled with 0s instead of 1s allowing HW to respond with ACKs to
+	 * any frames that matches MAC_PCU_RX_FILTER which is also
+	 * misconfigured to accept anything.
+	 *
+	 * The ADDR1 is programmed using internal firmware structure field and
+	 * can't be (easily/sanely) reached from the driver explicitly. It is
+	 * possible to implicitly make it correct by creating a dummy vdev and
+	 * then deleting it.
+	 */
+	status = ath10k_core_reset_rx_filter(ar);
+	if (status) {
+		ath10k_err(ar, "failed to reset rx filter: %d\n", status);
+		goto err_hif_stop;
+	}
+
 	/* If firmware indicates Full Rx Reorder support it must be used in a
 	 * slightly different manner. Let HTT code know.
 	 */
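The barrier comment in the patch rests on an ordering argument: if firmware handles WMI commands in the order they were sent, then a reply to an echo queued after the vdev create/delete proves both were handled before the HTT Rx ring starts. The self-contained C model below illustrates only that argument. The queue, the `model_*` names and the token scheme are invented for this sketch and are not how `ath10k_wmi_barrier()` is implemented; it also assumes strictly in-order processing, whereas the patch comment itself hedges with "(mostly)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_QUEUE_CAP 16

enum model_cmd_id {
	MODEL_CMD_VDEV_CREATE,
	MODEL_CMD_VDEV_DELETE,
	MODEL_CMD_ECHO,
};

struct model_cmd {
	enum model_cmd_id id;
	uint32_t value;		/* only used by MODEL_CMD_ECHO */
};

/* Toy command pipe: "firmware" consumes commands strictly in submission
 * order. That ordering is the assumption the echo barrier relies on. */
struct model_pipe {
	struct model_cmd q[MODEL_QUEUE_CAP];
	int head;		/* next command firmware will process */
	int tail;		/* next free slot for the host */
	int vdevs_created;
	int vdevs_deleted;
	bool echo_seen;
	uint32_t echo_value;
};

static void model_send(struct model_pipe *p, enum model_cmd_id id,
		       uint32_t value)
{
	assert(p != NULL && p->tail < MODEL_QUEUE_CAP);
	p->q[p->tail].id = id;
	p->q[p->tail].value = value;
	p->tail++;
}

/* Let firmware process up to n queued commands, modelling partial progress. */
static void model_fw_process(struct model_pipe *p, int n)
{
	while (n-- > 0 && p->head < p->tail) {
		const struct model_cmd *c = &p->q[p->head++];

		switch (c->id) {
		case MODEL_CMD_VDEV_CREATE:
			p->vdevs_created++;
			break;
		case MODEL_CMD_VDEV_DELETE:
			p->vdevs_deleted++;
			break;
		case MODEL_CMD_ECHO:
			p->echo_seen = true;
			p->echo_value = c->value;
			break;
		}
	}
}

/* Barrier, step 1: queue an echo carrying a caller-chosen token. */
static void model_barrier_send(struct model_pipe *p, uint32_t token)
{
	model_send(p, MODEL_CMD_ECHO, token);
}

/* Barrier, step 2: once the matching echo is seen, everything queued
 * before it has been processed, because processing is in order. */
static bool model_barrier_done(const struct model_pipe *p, uint32_t token)
{
	return p->echo_seen && p->echo_value == token;
}
```

If the firmware has only processed the create when the host checks, the barrier is not yet satisfied; only after the echo itself is consumed does `model_barrier_done()` return true, by which point the delete has necessarily been handled too.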
HW Rx filters and masks are not configured properly by firmware during
boot sequences. The MAC_PCU_ADDR1 mask is set to 0s instead of 1s, which
allows the HW to ACK any frame that passes through MAC_PCU_RX_FILTER. The
MAC_PCU_RX_FILTER itself is misconfigured on boot as well.

The combination of these bugs ended up with the following manifestations:
 - "no channel configured; ignoring frame(s)!" warnings in the driver
 - spurious ACKs (transmissions) on the air during firmware bootup sequences

The former was a long-standing, known bug that was originally thought to be
mostly harmless.

However, Marek recently discovered that this problem also involves ACKing
*all* frames the HW receives (including beacons ;). Such frames are
delivered to the host and generate the former warning as well.

This could be a problem for regulatory compliance in some rare cases
(e.g. Taiwan, which forbids transmissions on channel 36, the default bootup
channel on 5 GHz band cards). The good news is that someone else would have
to violate regulatory rules first to coerce our device into generating and
transmitting an ACK.

The problem could be reproduced in a rather busy environment with a lot of
APs. The likelihood could be increased by injecting an msleep() of 5000 or
longer immediately after ath10k_htt_setup() in ath10k_core_start().

The reason the former warnings showed up only rarely is that the device was
either quickly reset again (i.e. during firmware probing) or a WMI vdev was
created (which fixes the HW and FW states).

It is technically possible for the host driver to override the relevant HW
registers; however, this can't work reliably because the root cause of the
bug lies in incorrect firmware state on boot (the internal structure used to
program MAC_PCU_ADDR1 is not properly initialized), and only vdev
create/delete events can fix it. This is why the patch takes the dummy vdev
approach.
This could be fixed in firmware as well, but fixing it in the driver is more
robust, most notably for users of older firmware such as 999.999.0.636.

Reported-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/core.c | 68 ++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
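To make the mask failure described in the commit message concrete, here is a simplified C model of masked ADDR1 matching. It assumes that a 1 bit in the mask means "compare this address bit", which is what makes 0s instead of 1s permissive; the function name and byte-wise layout are hypothetical and do not describe the actual MAC_PCU_ADDR1_MASK register format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ADDR_LEN 6

/* Simplified model, not the real MAC PCU logic: ADDR1 "matches" when every
 * bit selected by the mask (mask bit = 1) agrees with our own address. Bits
 * whose mask bit is 0 are ignored, so an all-zero mask matches any ADDR1. */
static bool model_addr1_matches(const uint8_t addr1[MODEL_ADDR_LEN],
				const uint8_t own[MODEL_ADDR_LEN],
				const uint8_t mask[MODEL_ADDR_LEN])
{
	assert(addr1 != NULL && own != NULL && mask != NULL);

	for (size_t i = 0; i < MODEL_ADDR_LEN; i++) {
		if ((addr1[i] & mask[i]) != (own[i] & mask[i]))
			return false;
	}
	return true;
}
```

With an all-ones mask only frames sent to the device's own address match. With the all-zero mask described above, a broadcast beacon matches as well, so in this model it would be treated as addressed to the device and ACKed.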