Message ID | 201608030349.u733nRPn000595@sdf.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
On Wednesday, August 3, 2016 3:49:26 AM CEST Alan Curry wrote:
> Al Viro wrote:
> >
> > Which just might mean that we have *three* issues here -
> > (1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
> > (2) your ssl-only corruption
> > (3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
> > anywhere, and no multi-segment recvmsg(). Which would strongly argue in
> > favour of some kind of copy_page_to_iter() breakage triggered when handling
> > a fragmented skb, as in (1). Except that I don't see anything similar in
> > x86_64 uaccess primitives...
> >
>
> I think I've solved (3) at least...
>
> Using the twin weapons of printk and stubbornness, I have built a working
> theory of the bug. I haven't traced it all the way through, so my explanation
> may be partly wrong. I do have a patch that eliminates the symptom in all my
> tests though. Here's what happens:
>
> A corrupted packet somehow arrives in skb_copy_and_csum_datagram_msg().
> During downloads at reasonably high speed, about 0.1% of my incoming
> packets are bad. Probably because the access point is that suspicious
> Comcast thing.

Thanks for being very persistent with this.

I think I'm able to reproduce this now (on any hardware, such as r8169
ethernet) as long as the following "traffic policy" is enacted on the
HTTP server:

    # tc qdisc add dev eth0 root netem corrupt 0.1%

(This needs the "Network Emulation" scheduler option,
CONFIG_NET_SCH_NETEM [0].)

With your tool (changed to point to my local Apache server), I'm seeing
corruptions in the "noselect" case. In "select" mode, however, the
resulting files have no corruptions.

About AR9170 corruption issues:
I know of one report that the AR9170's Encryption Engine can cause
corruptions [1]. In this case outgoing data was corrupted, which led to
deauths/disassocs since the AP was basically sending out multicast
deauths/disassocs with bad addresses. However, "nohwcrypt" should have
made a difference there, since the software decryption would discard the
faulty packet due to the message integrity checks.

Another source of corruptions could be the USB-PHY (FUSB200) in the
AR9170 [2]. I know it's causing problems for the ath9k_htc. However,
not everyone is affected.

One thing I noticed in your previous post is that you "might" not have
draft-802.11n enabled. Do you see any "disabling HT/VHT due to WEP/TKIP use."
in your dmesg logs? If so, check if you can force your AP to use WPA2
with CCMP/AES only.

Regards,
Christian

[0] <https://wiki.linuxfoundation.org/networking/netem>
[1] <http://www.spinics.net/lists/linux-wireless/msg60104.html>
[2] <https://github.com/qca/open-ath9k-htc-firmware/wiki/usb-related-issues>
Christian Lamparter wrote:
>
> One thing I noticed in your previous post is that you "might" not have
> draft-802.11n enabled. Do you see any "disabling HT/VHT due to WEP/TKIP use."
> in your dmesg logs? If so, check if you can force your AP to use WPA2
> with CCMP/AES only.
>

Yes, I've had that message. The reason wasn't on the AP, though. My
wpa_supplicant.conf only had TKIP enabled, because that's what was in the
sample configuration file I started with. Adding CCMP there worked, and in
that mode I'm no longer getting any corrupted packets.

If I'd paid attention to the encryption options when setting up this network
originally, I would have had CCMP the whole time, with no corrupted packets,
and never would have found the iov iterator bug...
diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..574d4bf 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -730,6 +730,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 {
 	__wsum csum;
 	int chunk = skb->len - hlen;
+	struct iov_iter save_iter;
 
 	if (!chunk)
 		return 0;
@@ -741,11 +742,14 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 			goto fault;
 	} else {
 		csum = csum_partial(skb->data, hlen, skb->csum);
+		memcpy(&save_iter, &msg->msg_iter, sizeof save_iter);
 		if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
 					       chunk, &csum))
 			goto fault;
-		if (csum_fold(csum))
+		if (csum_fold(csum)) {
+			memcpy(&msg->msg_iter, &save_iter, sizeof save_iter);
 			goto csum_error;
+		}
 		if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
 			netdev_rx_csum_fault(skb->dev);
 	}