From patchwork Thu Aug 10 12:43:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arend van Spriel
X-Patchwork-Id: 9893631
X-Patchwork-Delegate: kvalo@adurom.com
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers
To: Kalle Valo, Mahesh Bandewar, Andy Gospodarek
Cc: David Miller, netdev@vger.kernel.org, linux-wireless@vger.kernel.org, James Feeney
References: <87shh0gewn.fsf@kamboji.qca.qualcomm.com>
From: Arend van Spriel
Message-ID: <8845e49b-3165-e6df-5935-c86278d220d9@broadcom.com>
Date: Thu, 10 Aug 2017 14:43:06 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
In-Reply-To: <87shh0gewn.fsf@kamboji.qca.qualcomm.com>
Content-Language: en-US
Sender: linux-wireless-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-wireless@vger.kernel.org

On 10-08-17 07:39, Kalle Valo wrote:
> Hi Mahesh and Andy,
>
> James Feeney reported that there's a serious regression in bonding
> module since v4.12, it doesn't work with wireless drivers anymore as
> wireless drivers don't report the link speed via ethtool:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=196547
>
> In the bug report it's said that this commit is the culprit:
>
> 3f3c278c94dd bonding: fix active-backup transition

This commit references another one, i.e. commit c4adfc822bf5 ("bonding:
make speed, duplex setting consistent with link state"). Before commit
c4adfc822bf5 the result of __ethtool_get_link_ksettings() was simply
ignored. With it, a netdev for which that call fails has its link marked
as down, ruling it out to be used as active bond slave. To the end-users
who were using bonding this is simply a regression. So to fix that, both
changes should be reverted in my opinion.

Now specifically for wireless interfaces we could implement the
get_link_ksettings callback, although most of the fields requested are
meaningless in a wireless context.

Regarding the speed and duplex values, we raised some concerns in an
earlier discussion with James. Wireless is always half-duplex as there
can be only one (unintended ref to [1]). Reporting the speed in wifi is
difficult. In wifi we have txrate and rxrate, which are inherently
asynchronous, and the rate is a per-packet value so it is going to
change a lot. Seeing only 4 call sites in the bonding code tells me that
is not taken into account. All in all this shenanigan seems netconf
material to me.

Regards,
Arend

[1] https://en.wikipedia.org/wiki/Highlander_(film)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -365,9 +365,10 @@ int bond_set_carrier(struct bonding *bond)
 /* Get link speed and duplex from the slave's base driver
  * using ethtool. If for some reason the call fails or the
  * values are invalid, set speed and duplex to -1,
- * and return.
+ * and return. Return 1 if speed or duplex settings are
+ * UNKNOWN; 0 otherwise.
  */
-static void bond_update_speed_duplex(struct slave *slave)
+static int bond_update_speed_duplex(struct slave *slave)
 {
 	struct net_device *slave_dev = slave->dev;
 	struct ethtool_link_ksettings ecmd;
@@ -377,24 +378,27 @@ static void bond_update_speed_duplex(struct slave *slave)
 	slave->duplex = DUPLEX_UNKNOWN;

 	res = __ethtool_get_link_ksettings(slave_dev, &ecmd);
-	if (res < 0)
-		return;
-
-	if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1))
-		return;
-
+	if (res < 0) {
+		slave->link = BOND_LINK_DOWN;
+		return 1;
+	}
+	if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) {
+		slave->link = BOND_LINK_DOWN;
+		return 1;
+	}

Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves
setting the link state to the call sites of bond_update_speed_duplex(),
just not all call sites.

> Is there a fix for this or should that commit be reverted? This seems to
> be a serious regression as there are multiple reports already and we
> should get it fixed for v4.13, and the fix backported to v4.12 stable
> release.

The ethtool callbacks really seem optional. At least in brcmfmac, the
wireless driver I maintain, I only provide the get_drvinfo callback and
there is no warning triggered upon registering the netdev. The changes
above now require each netdev to implement the get_link_ksettings
callback (get_settings is deprecated) or the link is marked as DOWN.
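
To make the effect concrete, here is a minimal userspace sketch of the
post-change logic. It is not the kernel code: the mock_* types, the
MOCK_LINK_* values and wired_get_link_ksettings() are hypothetical
stand-ins, and only the checks on the return value and on a speed of 0
or (__u32)-1 mirror the diff above. It also assumes that the ethtool
helper fails with -EOPNOTSUPP when a driver provides no link settings
callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum mock_link { MOCK_LINK_UP, MOCK_LINK_DOWN };

struct mock_netdev {
	/* NULL models a driver without a get_link_ksettings callback. */
	int (*get_link_ksettings)(uint32_t *speed);
};

struct mock_slave {
	struct mock_netdev *dev;
	enum mock_link link;
};

/* Assumption: the helper fails with -EOPNOTSUPP when no callback exists. */
int mock_ethtool_get_link_ksettings(struct mock_netdev *dev, uint32_t *speed)
{
	if (!dev->get_link_ksettings)
		return -EOPNOTSUPP;
	return dev->get_link_ksettings(speed);
}

/* Mirrors the checks in the diff: returns 1 and marks the link down when
 * the call fails or the speed is 0 or (uint32_t)-1; returns 0 otherwise.
 */
int mock_update_speed_duplex(struct mock_slave *slave)
{
	uint32_t speed = 0;
	int res = mock_ethtool_get_link_ksettings(slave->dev, &speed);

	if (res < 0 || speed == 0 || speed == (uint32_t)-1) {
		slave->link = MOCK_LINK_DOWN;
		return 1;
	}
	return 0;
}

/* A wired-style driver that reports a fixed speed of 1000 Mbit/s. */
int wired_get_link_ksettings(uint32_t *speed)
{
	*speed = 1000;
	return 0;
}

/* Runs both cases; the asserts document the expected outcome. */
void mock_demo(void)
{
	struct mock_netdev wired = { .get_link_ksettings = wired_get_link_ksettings };
	struct mock_netdev wifi = { .get_link_ksettings = NULL };
	struct mock_slave a = { .dev = &wired, .link = MOCK_LINK_UP };
	struct mock_slave b = { .dev = &wifi, .link = MOCK_LINK_UP };

	assert(mock_update_speed_duplex(&a) == 0 && a.link == MOCK_LINK_UP);
	assert(mock_update_speed_duplex(&b) == 1 && b.link == MOCK_LINK_DOWN);
}
```

Calling mock_demo() exercises both cases: the wired-style device keeps
its link up, while the device without a callback ends up marked down,
which corresponds to the brcmfmac situation described above.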