[4/4] ath10k: fix spurious tx/rx during boot

Message ID 1468924452-23877-5-git-send-email-michal.kazior@tieto.com (mailing list archive)
State Accepted
Commit 47b1848d9fde5daf102f599be6e589a1d3c8da7d
Delegated to: Kalle Valo

Commit Message

Michal Kazior July 19, 2016, 10:34 a.m. UTC

HW Rx filters and masks are not configured
properly by firmware during boot sequences. The
MAC_PCU_ADDR1 is set to 0s instead of 1s which
allows the HW to ACK any frame that passes through
MAC_PCU_RX_FILTER. The MAC_PCU_RX_FILTER itself
is misconfigured on boot as well.
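
To illustrate why an all-zero mask matters, here is a minimal user-space sketch of masked address matching. It assumes the hardware compares only the receiver-address bits selected by the mask; addr1_matches() and ADDR_LEN are hypothetical names, not ath10k or hardware definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_LEN 6 /* length of a MAC address in bytes */

/* Hypothetical model of the hardware check: a frame counts as addressed
 * to us (and would be ACKed) when its receiver address equals our own
 * address in every bit position that the mask selects.
 */
bool addr1_matches(const uint8_t ra[ADDR_LEN],
		   const uint8_t own[ADDR_LEN],
		   const uint8_t mask[ADDR_LEN])
{
	for (int i = 0; i < ADDR_LEN; i++) {
		if ((ra[i] & mask[i]) != (own[i] & mask[i]))
			return false;
	}

	/* With an all-zero mask no bit is ever compared, so every
	 * receiver address ends up here.
	 */
	return true;
}
```

Under these assumptions a mask of 0s makes addr1_matches() return true for any receiver address, which mirrors the "ACK anything" behaviour described above, while a mask of 1s only matches the device's own address.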

The combination of these bugs ended up with the
following manifestations:
 - "no channel configured; ignoring frame(s)!"
   warnings in the driver
 - spurious ACKs (transmission) on the air during
   firmware bootup sequences

The former was a long-standing, known bug
originally thought to be mostly harmless.

However Marek recently discovered that this
problem also involves ACKing *all* frames the HW
receives (including beacons ;). Such frames
are delivered to host and generate the former
warning as well.

This could be a problem with regulatory compliance
in some rare cases (e.g. Taiwan, which forbids
transmissions on channel 36, the default bootup
channel on 5 GHz band cards). The good news is
that someone else would have to violate regulatory
rules first to coerce our device into generating
and transmitting an ACK.

The problem could be reproduced in a rather busy
environment that has a lot of APs. The likelihood
could be increased by injecting an msleep() of
5000 or longer immediately after
ath10k_htt_setup() in ath10k_core_start().

The former warnings showed up only seldom because
the device was either quickly reset again (i.e.
during firmware probing) or a wmi vdev was created
(which fixes hw and fw states).

It is technically possible for the host driver to
override the relevant hw registers. However, this
can't work reliably because the root cause lies in
incorrect firmware state on boot (the internal
structure used to program MAC_PCU_ADDR1 is not
properly initialized) and only vdev create/delete
events can fix it. This is why the patch takes the
dummy vdev approach.
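
The reasoning above can be sketched with a toy model. It is not driver or firmware code: struct toy_fw and every toy_* function are hypothetical, and the model simply assumes that firmware rewrites the register from its internal structure at times the host does not control, and that vdev creation is what initializes that structure.

```c
#include <stdint.h>

#define TOY_MASK_ALL_ONES 0xffffffffffffULL /* 48-bit address mask */

/* Hypothetical firmware/hardware state, for illustration only. */
struct toy_fw {
	uint64_t addr1_mask_shadow; /* firmware's internal structure field */
	uint64_t addr1_mask_reg;    /* value currently in the hw register */
};

/* Buggy boot: the internal field is left at 0s and so is the register. */
void toy_fw_boot(struct toy_fw *fw)
{
	fw->addr1_mask_shadow = 0;
	fw->addr1_mask_reg = 0;
}

/* Assumption: firmware may reprogram the register from its own state. */
void toy_fw_reprogram(struct toy_fw *fw)
{
	fw->addr1_mask_reg = fw->addr1_mask_shadow;
}

/* Host-side override touches only the register, not the firmware state. */
void toy_host_write_reg(struct toy_fw *fw, uint64_t value)
{
	fw->addr1_mask_reg = value;
}

/* Assumption: creating a vdev initializes the internal field properly. */
void toy_vdev_create(struct toy_fw *fw)
{
	fw->addr1_mask_shadow = TOY_MASK_ALL_ONES;
	toy_fw_reprogram(fw);
}

/* Deleting the vdev leaves the (now sane) internal field in place. */
void toy_vdev_delete(struct toy_fw *fw)
{
	toy_fw_reprogram(fw);
}
```

In this model a host write to the register is lost as soon as toy_fw_reprogram() runs, whereas a create/delete pair leaves both the internal field and the register at 1s.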

This could be fixed in firmware as well but having
this fixed in driver is more robust, most notably
when thinking of users of older firmware such as
999.999.0.636.

Reported-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/core.c | 68 ++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

Comments

Ben Greear Aug. 24, 2016, 5:20 p.m. UTC | #1
On 07/19/2016 03:34 AM, Michal Kazior wrote:
[...]

I was looking at firmware to make sure that I fixed what I could there....

 From what I can tell, 10.4 should not have this bug.  Did you see this only
on 10.1/10.2 firmware?  It is of course possible that I am mis-understanding
10.4....

Thanks,
Ben

Michal Kazior Aug. 25, 2016, 6:18 a.m. UTC | #2
On 24 August 2016 at 19:20, Ben Greear <greearb@candelatech.com> wrote:
[...]
> I was looking at firmware to make sure that I fixed what I could there....
>
> From what I can tell, 10.4 should not have this bug.  Did you see this only
> on 10.1/10.2 firmware?  It is of course possible that I am mis-understanding
> 10.4....

I did see it on 10.1 and 10.2. Don't recall seeing it on 10.4 though.
If you didn't see warnings on 10.4 even after adding msleep() as per
commit log then I guess it doesn't suffer from the bug.


Michał

Ben Greear Aug. 25, 2016, 7:19 p.m. UTC | #3
On 08/24/2016 11:18 PM, Michal Kazior wrote:

>> I was looking at firmware to make sure that I fixed what I could there....
>>
>> From what I can tell, 10.4 should not have this bug.  Did you see this only
>> on 10.1/10.2 firmware?  It is of course possible that I am mis-understanding
>> 10.4....
>
> I did see it on 10.1 and 10.2. Don't recall seeing it on 10.4 though.
> If you didn't see warnings on 10.4 even after adding msleep() as per
> commit log then I guess it doesn't suffer from the bug.

I can still occasionally see that message with a 15000 ms sleep.

Based on debugging, it seems my firmware is now setting the mac-mask properly.

But, as you mention, the rxfilter is enabled very early.  So, probably
it is still possible to see packets early if they are multicast, bcast, etc.

I don't think it is worth re-working the entire rx-filter calc in
the concurrency logic properly for 10.1 firmware, so I'm going to figure
my fix is good enough as is as long as it sets the mac-mask properly.

Thanks,
Ben

Ryan Hsu Sept. 16, 2016, 10:37 p.m. UTC | #4
On 07/19/2016 03:34 AM, Michal Kazior wrote:
>   
> +static int ath10k_core_reset_rx_filter(struct ath10k *ar)
> +{
> +	int ret;
> +	int vdev_id;
> +	int vdev_type;
> +	int vdev_subtype;
> +	const u8 *vdev_addr;
> +
> +	vdev_id = 0;
> +	vdev_type = WMI_VDEV_TYPE_STA;
> +	vdev_subtype = ath10k_wmi_get_vdev_subtype(ar, WMI_VDEV_SUBTYPE_NONE);
> +	vdev_addr = ar->mac_addr;
> +
> +	ret = ath10k_wmi_vdev_create(ar, vdev_id, vdev_type, vdev_subtype,
> +				     vdev_addr);
> +	if (ret) {
> +		ath10k_err(ar, "failed to create dummy vdev: %d\n", ret);
> +		return ret;
> +	}
> +
> +	ret = ath10k_wmi_vdev_delete(ar, vdev_id);
> +	if (ret) {
> +		ath10k_err(ar, "failed to delete dummy vdev: %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* WMI and HTT may use separate HIF pipes and are not guaranteed to be
> +	 * serialized properly implicitly.
> +	 *
> +	 * Moreover (most) WMI commands have no explicit acknowledges. It is
> +	 * possible to infer it implicitly by poking firmware with echo
> +	 * command - getting a reply means all preceding commands have been
> +	 * (mostly) processed.
> +	 *
> +	 * In case of vdev create/delete this is sufficient.
> +	 *
> +	 * Without this it's possible to end up with a race when HTT Rx ring is
> +	 * started before vdev create/delete hack is complete allowing a short
> +	 * window of opportunity to receive (and Tx ACK) a bunch of frames.
> +	 */
> +	ret = ath10k_wmi_barrier(ar);
QCA6174 UTF firmware doesn't seem to support the WMI_ECHO command.

[16460.274822] ath10k_pci 0000:04:00.0: wmi tlv echo value 0x0ba991e9
...
[16463.461970] ath10k_pci 0000:04:00.0: failed to ping firmware: -110
[16463.461975] ath10k_pci 0000:04:00.0: failed to reset rx filter: -110

Has anyone verified any AP solution to see if UTF mode still works
after this patch?

Anyway, I would like to exclude the workaround from UTF mode for all solutions.

Michal, any concerns? (or maybe just for QCA61x4 if any...)

> +	if (ret) {
> +		ath10k_err(ar, "failed to ping firmware: %d\n", ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
>

Michal Kazior Sept. 19, 2016, 9:22 a.m. UTC | #5
On 17 September 2016 at 00:37, Hsu, Ryan <ryanhsu@qca.qualcomm.com> wrote:
[...]
>> +     /* WMI and HTT may use separate HIF pipes and are not guaranteed to be
>> +      * serialized properly implicitly.
>> +      *
>> +      * Moreover (most) WMI commands have no explicit acknowledges. It is
>> +      * possible to infer it implicitly by poking firmware with echo
>> +      * command - getting a reply means all preceding commands have been
>> +      * (mostly) processed.
>> +      *
>> +      * In case of vdev create/delete this is sufficient.
>> +      *
>> +      * Without this it's possible to end up with a race when HTT Rx ring is
>> +      * started before vdev create/delete hack is complete allowing a short
>> +      * window of opportunity to receive (and Tx ACK) a bunch of frames.
>> +      */
>> +     ret = ath10k_wmi_barrier(ar);
> QCA6174 UTF firmware doesn't seem to support the WMI_ECHO command.
>
> [16460.274822] ath10k_pci 0000:04:00.0: wmi tlv echo value 0x0ba991e9
> ...
> [16463.461970] ath10k_pci 0000:04:00.0: failed to ping firmware: -110
> [16463.461975] ath10k_pci 0000:04:00.0: failed to reset rx filter: -110
>
> Has anyone verified any AP solution to see if UTF mode still works
> after this patch?
>
> Anyway, I would like to exclude the workaround from UTF mode for all solutions.
>
> Michal, any concerns? (or maybe just for QCA61x4 if any...)

I didn't expect that UTF wouldn't support echo. Sorry!

If you skip this workaround for UTF I guess the device will (again) be
able to generate some bogus traffic on boot for UTF case. Not sure how
much of a problem that is (assuming it is at all).


Michal
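
For reference, the exclusion Ryan proposes could look roughly like the sketch below. This is a standalone illustration, not a patch against core.c: enum toy_fw_mode and all toy_* functions are hypothetical stand-ins, and toy_reset_times_out() merely models the -110 timeout seen in the log above when the firmware never answers the echo.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's firmware mode argument. */
enum toy_fw_mode {
	TOY_FW_MODE_NORMAL,
	TOY_FW_MODE_UTF,
};

/* Models a firmware that never replies to the echo: the rx filter
 * reset fails with -110, as in the log above.
 */
int toy_reset_times_out(void)
{
	return -110;
}

/* The workaround relies on an echo reply, so skip it in UTF mode. */
bool toy_should_reset_rx_filter(enum toy_fw_mode mode)
{
	return mode != TOY_FW_MODE_UTF;
}

/* Sketch of the relevant step of the start sequence. */
int toy_core_start_step(enum toy_fw_mode mode, int (*reset_rx_filter)(void))
{
	if (!toy_should_reset_rx_filter(mode))
		return 0; /* UTF: accept possible bogus boot-time traffic */

	return reset_rx_filter();
}
```

The trade-off is the one Michal describes: skipping the reset avoids the timeout in UTF mode, but the device may again ACK stray frames during boot in that mode.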

Patch

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index e88982921aa3..d2e255418d1b 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -1705,6 +1705,55 @@  static int ath10k_core_init_firmware_features(struct ath10k *ar)
 	return 0;
 }
 
+static int ath10k_core_reset_rx_filter(struct ath10k *ar)
+{
+	int ret;
+	int vdev_id;
+	int vdev_type;
+	int vdev_subtype;
+	const u8 *vdev_addr;
+
+	vdev_id = 0;
+	vdev_type = WMI_VDEV_TYPE_STA;
+	vdev_subtype = ath10k_wmi_get_vdev_subtype(ar, WMI_VDEV_SUBTYPE_NONE);
+	vdev_addr = ar->mac_addr;
+
+	ret = ath10k_wmi_vdev_create(ar, vdev_id, vdev_type, vdev_subtype,
+				     vdev_addr);
+	if (ret) {
+		ath10k_err(ar, "failed to create dummy vdev: %d\n", ret);
+		return ret;
+	}
+
+	ret = ath10k_wmi_vdev_delete(ar, vdev_id);
+	if (ret) {
+		ath10k_err(ar, "failed to delete dummy vdev: %d\n", ret);
+		return ret;
+	}
+
+	/* WMI and HTT may use separate HIF pipes and are not guaranteed to be
+	 * serialized properly implicitly.
+	 *
+	 * Moreover (most) WMI commands have no explicit acknowledges. It is
+	 * possible to infer it implicitly by poking firmware with echo
+	 * command - getting a reply means all preceding commands have been
+	 * (mostly) processed.
+	 *
+	 * In case of vdev create/delete this is sufficient.
+	 *
+	 * Without this it's possible to end up with a race when HTT Rx ring is
+	 * started before vdev create/delete hack is complete allowing a short
+	 * window of opportunity to receive (and Tx ACK) a bunch of frames.
+	 */
+	ret = ath10k_wmi_barrier(ar);
+	if (ret) {
+		ath10k_err(ar, "failed to ping firmware: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
 int ath10k_core_start(struct ath10k *ar, enum ath10k_firmware_mode mode,
 		      const struct ath10k_fw_components *fw)
 {
@@ -1872,6 +1921,25 @@  int ath10k_core_start(struct ath10k *ar, enum ath10k_firmware_mode mode,
 		goto err_hif_stop;
 	}
 
+	/* Some firmware revisions do not properly set up hardware rx filter
+	 * registers.
+	 *
+	 * A known example from QCA9880 and 10.2.4 is that MAC_PCU_ADDR1_MASK
+	 * is filled with 0s instead of 1s allowing HW to respond with ACKs to
+	 * any frame that matches MAC_PCU_RX_FILTER which is also
+	 * misconfigured to accept anything.
+	 *
+	 * The ADDR1 is programmed using internal firmware structure field and
+	 * can't be (easily/sanely) reached from the driver explicitly. It is
+	 * possible to implicitly make it correct by creating a dummy vdev and
+	 * then deleting it.
+	 */
+	status = ath10k_core_reset_rx_filter(ar);
+	if (status) {
+		ath10k_err(ar, "failed to reset rx filter: %d\n", status);
+		goto err_hif_stop;
+	}
+
 	/* If firmware indicates Full Rx Reorder support it must be used in a
 	 * slightly different manner. Let HTT code know.
 	 */
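
The barrier comment in ath10k_core_reset_rx_filter() relies on firmware processing WMI commands in order, so that an echo reply implies the earlier vdev create/delete have been handled. The toy model below sketches that idea only; struct toy_wmi and the toy_* functions are hypothetical and do not reflect the real WMI transport or the ath10k_wmi_barrier() implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

enum toy_cmd {
	TOY_CMD_VDEV_CREATE,
	TOY_CMD_VDEV_DELETE,
	TOY_CMD_ECHO,
};

/* Hypothetical in-order command queue between host and firmware. */
struct toy_wmi {
	enum toy_cmd queue[TOY_QUEUE_LEN];
	size_t head;                     /* next command firmware handles */
	size_t tail;                     /* next free slot for the host */
	unsigned int processed_vdev_cmds;
	bool echo_replied;
};

/* Host queues a command; there is no per-command acknowledgement. */
bool toy_wmi_send(struct toy_wmi *w, enum toy_cmd cmd)
{
	if (w->tail == TOY_QUEUE_LEN)
		return false;

	w->queue[w->tail++] = cmd;
	return true;
}

/* Firmware handles exactly one queued command, strictly in order. */
bool toy_fw_step(struct toy_wmi *w)
{
	enum toy_cmd cmd;

	if (w->head == w->tail)
		return false;

	cmd = w->queue[w->head++];
	if (cmd == TOY_CMD_ECHO)
		w->echo_replied = true;
	else
		w->processed_vdev_cmds++;

	return true;
}

/* Barrier: queue an echo and let firmware run until the reply arrives.
 * Because processing is in order, everything queued before the echo has
 * been handled by the time this returns true.
 */
bool toy_wmi_barrier(struct toy_wmi *w)
{
	if (!toy_wmi_send(w, TOY_CMD_ECHO))
		return false;

	while (!w->echo_replied) {
		if (!toy_fw_step(w))
			return false;
	}

	return true;
}
```

In this model, queuing a create and a delete and then calling toy_wmi_barrier() guarantees both were processed before the caller continues, which is the property the patch needs before the HTT Rx ring is started.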