[net-next,0/6] net: nexthop: Increase weight to u16

Message ID

cover.1722519021.git.petrm@nvidia.com

Headers

Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates
 216.228.117.160 as permitted sender) receiver=protection.outlook.com;
 client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C
From: Petr Machata <petrm@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>, Eric Dumazet
	<edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni
	<pabeni@redhat.com>, <netdev@vger.kernel.org>
CC: Ido Schimmel <idosch@nvidia.com>, Petr Machata <petrm@nvidia.com>, "David
 Ahern" <dsahern@kernel.org>, Donald Sharp <sharpd@nvidia.com>,
	<mlxsw@nvidia.com>
Subject: [PATCH net-next 0/6] net: nexthop: Increase weight to u16
Date: Thu, 1 Aug 2024 18:23:56 +0200
Message-ID: <cover.1722519021.git.petrm@nvidia.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-Patchwork-Delegate: kuba@kernel.org

Series

net: nexthop: Increase weight to u16

Message

Petr Machata Aug. 1, 2024, 4:23 p.m. UTC

In Clos networks, as link failures occur at various points in the network,
the ECMP weights of the affected nodes are adjusted to compensate. With high
fan-out of those nodes and a high overall number of nodes, the (non-equal)
weight ratio that we would like to configure may not fit into 8 bits: instead
of, say, 255:254, we might like to configure something like 1000:999.
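
As a rough illustration of the resolution argument (not code from this
series), the sketch below computes how far from an even 50/50 split the
finest non-equal ratio max:(max-1) lands for a given weight cap. The helper
names are made up for this example, and the cap is a parameter because the
exact maximum weight is not stated here.

```c
#include <stdint.h>

/* Share of traffic taken by the first next hop, in parts per million,
 * when two next hops carry weights a and b. Integer division truncates. */
static inline uint32_t share_ppm(uint32_t a, uint32_t b)
{
	return (uint32_t)(((uint64_t)a * 1000000u) / ((uint64_t)a + b));
}

/* Smallest non-zero imbalance expressible between two next hops when
 * weights are capped at max_weight: the ratio max_weight:(max_weight-1),
 * reported as ppm above an even 500000 ppm split. */
static inline uint32_t finest_imbalance_ppm(uint32_t max_weight)
{
	return share_ppm(max_weight, max_weight - 1) - 500000u;
}
```

With a cap of 255 the finest step is 982 ppm, whereas 1000:999 calls for
250 ppm; a cap of 65535 brings the finest step down to 3 ppm.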

To that end, this patchset increases the next hop weight from u8 to u16.

Patch #1 adds a flag that indicates whether the reserved fields are zeroed.
This is a follow-up to a fix recently merged as commit 6d745cd0e972 ("net:
nexthop: Initialize all fields in dumped nexthops"). The reasoning behind this
patch is that there is a strict ordering between the fields actually being
zeroed, the kernel declaring that they are, and the kernel repurposing the
fields. Clients can therefore use the flag to tell whether it is safe to
interpret the reserved fields at all.
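
A minimal sketch of how a client might act on such a flag, under stated
assumptions: the struct, the field names and the flag value below are
hypothetical stand-ins rather than the definitions from
include/uapi/linux/nexthop.h, and placing the upper weight bits in the first
reserved byte is an assumption made only for illustration. The actual
encoding is described in the commit message of patch #2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of one next hop group entry as dumped to userspace.
 * Names are illustrative and not copied from the uapi header. */
struct grp_entry {
	uint32_t id;      /* next hop ID */
	uint8_t  weight;  /* low 8 bits of the weight */
	uint8_t  resvd1;  /* reserved; ASSUMED here to carry the high 8 bits */
	uint16_t resvd2;  /* reserved */
};
static_assert(sizeof(struct grp_entry) == 8, "entry layout must stay 8 bytes");

/* Hypothetical response flag: "the kernel zeroed the reserved fields". */
#define RESP_FLAG_RESVD_ZERO (1u << 31)

/* Return the raw weight value of an entry. The reserved byte is consulted
 * only when the kernel has declared it meaningful; otherwise it may hold
 * uninitialized data and is ignored. */
static inline uint16_t decode_raw_weight(const struct grp_entry *e,
					 uint32_t resp_flags)
{
	uint16_t w = e->weight;

	if (resp_flags & RESP_FLAG_RESVD_ZERO)
		w |= (uint16_t)(e->resvd1 << 8);
	return w;
}
```

Without the flag, the client falls back to the 8-bit interpretation, which
matches the ordering above: the fields are zeroed first, then declared
zeroed, and only then given a new meaning.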

Patch #2 contains the substantive code changes; its commit message covers the
details.

Patches #3 to #6 add selftests.

Petr Machata (6):
  net: nexthop: Add flag to assert that NHGRP reserved fields are zero
  net: nexthop: Increase weight to u16
  selftests: router_mpath: Sleep after MZ
  selftests: router_mpath_nh: Test 16-bit next hop weights
  selftests: router_mpath_nh_res: Test 16-bit next hop weights
  selftests: fib_nexthops: Test 16-bit next hop weights

 include/net/nexthop.h                         |  4 +-
 include/uapi/linux/nexthop.h                  | 10 +++-
 net/ipv4/nexthop.c                            | 49 ++++++++++------
 tools/testing/selftests/net/fib_nexthops.sh   | 55 +++++++++++++++++-
 tools/testing/selftests/net/forwarding/lib.sh |  7 +++
 .../net/forwarding/router_mpath_nh.sh         | 40 ++++++++++---
 .../net/forwarding/router_mpath_nh_lib.sh     | 13 +++++
 .../net/forwarding/router_mpath_nh_res.sh     | 58 ++++++++++++++++---
 .../net/forwarding/router_multipath.sh        |  2 +
 9 files changed, 201 insertions(+), 37 deletions(-)

Comments

Jakub Kicinski Aug. 1, 2024, 10:52 p.m. UTC | #1

On Thu, 1 Aug 2024 18:23:56 +0200 Petr Machata wrote:
> Patches #3 to #6 add selftests.

Could you share the iproute2 patches?

Petr Machata Aug. 5, 2024, 8:06 a.m. UTC | #2

Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 1 Aug 2024 18:23:56 +0200 Petr Machata wrote:
>> Patches #3 to #6 add selftests.
>
> Could you share the iproute2 patches?

https://github.com/pmachata/iproute2/commits/nhgw16/

David Ahern Aug. 5, 2024, 2:31 p.m. UTC | #3

On 8/1/24 10:23 AM, Petr Machata wrote:
> In CLOS networks, as link failures occur at various points in the network,
> ECMP weights of the involved nodes are adjusted to compensate. With high
> fan-out of the involved nodes, and overall high number of nodes,
> a (non-)ECMP weight ratio that we would like to configure does not fit into
> 8 bits. Instead of, say, 255:254, we might like to configure something like
> 1000:999. For these deployments, the 8-bit weight may not be enough.
> 
> To that end, in this patchset increase the next hop weight from u8 to u16.
> 
> Patch #1 adds a flag that indicates whether the reserved fields are zeroed.
> This is a follow-up to a new fix merged in commit 6d745cd0e972 ("net:
> nexthop: Initialize all fields in dumped nexthops"). The theory behind this
> patch is that there is a strict ordering between the fields actually being
> zeroed, the kernel declaring that they are, and the kernel repurposing the
> fields. Thus clients can use the flag to tell if it is safe to interpret
> the reserved fields in any way.
> 
> Patch #2 contains the substantial code and the commit message covers the
> details of the changes.
> 
> Patches #3 to #6 add selftests.
> 

LGTM. For the set
Reviewed-by: David Ahern <dsahern@kernel.org>