From: chengyechun
To: netdev@vger.kernel.org
Cc: Jay Vosburgh, Andy Gospodarek
Subject: Re: [Discuss]Questions about active slave select in bonding 8023ad
Date: Thu, 26 Sep 2024 01:24:45 +0000
Message-ID: <7b0d827769974176835440fe5211522a@huawei.com>
X-Patchwork-Id: 13812674

I hope to get a reply. If anything in my reasoning is wrong, please let me know. Thank you.

-----Original Message-----
From: chengyechun
Sent: September 19, 2024 16:43
To: 'netdev@vger.kernel.org'
Cc: 'Jay Vosburgh'; 'Andy Gospodarek'
Subject: Re: [Discuss]Questions about active slave select in bonding 8023ad

Here is the patch:

Subject: [PATCH] bonding: enable best slave after switch under condition 3a
---
 drivers/net/bonding/bond_3ad.c | 2 ++
 1 file changed, 2 insertions(+)

--
-----Original Message-----
From: chengyechun
Sent: September 19, 2024 15:22
To: 'netdev@vger.kernel.org'
Cc: 'Jay Vosburgh'; 'Andy Gospodarek'
Subject: [Discuss]Questions about active slave select in bonding 8023ad

Hi all,

Recently I have been hitting an intermittent problem when bringing up a bond. After the slaves and the bond are configured, restarting the network occasionally fails.
The failure message is as follows:

"/etc/sysconfig/network-scripts/ifup-eth[2747129]: Error, some other host () already uses address 1.1.1.39."

The network scripts use arping to check for an IP address conflict. That check reports an error even though no address conflict actually exists, which is very strange.

The kernel version is 5.10. The bond configuration is as follows:

BONDING_OPTS='mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4'
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=static
NM_CONTROLLED=no
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond0
DEVICE=bond0
ONBOOT=yes
IPADDR=1.1.1.38
NETMASK=255.255.0.0
IPV6ADDR=1:1:1::39/64

The slave configuration is as follows. I have four slaves; enp13s0, enp14s0 and enp15s0 are configured the same way as enp12s0:

NAME=enp12s0
DEVICE=enp12s0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no

After discovering this problem, I restarted the network multiple times, and the failure always reappeared once or twice.

After some debugging, I found that the bond interface has no usable slave at the moment the arping packet is sent, so the packet cannot be transmitted. ifup-eth then treats the failed arping as an address conflict, which would also explain why the host shown in the error message is empty.

When the problem occurs, the active slave is switched from enp12s0 to enp13s0. However, the backup flag of enp13s0 does not change from 1 to 0 immediately after the switchover completes. Is this intended behavior or a bug?

After thinking about it, I have a question about how the active slave is selected. In ad_agg_selection_test(), condition 3a is the check

    if (__agg_has_partner(curr) && !__agg_has_partner(best))

When the new aggregator is chosen by this condition and the active slave switch succeeds, why doesn't ad_agg_selection_logic() call __enable_port() on the best slave?
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index ae0393dff..8494420ed 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1819,6 +1819,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 				__disable_port(port);
 			}
 		}
+		port = best->lag_ports;
+		__enable_port(port);
 		/* Slave array needs update. */
 		*update_slave_arr = true;
 	}