[v3] udp: Avoid call to compute_score on multiple sites

Message ID	20240412212004.17181-1-krisman@suse.de (mailing list archive)
State	Accepted
Commit	50aee97d15113b95a68848db1f0cb2a6c09f753a
Delegated to:	Netdev Maintainers
Headers	show Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 573494438E for <netdev@vger.kernel.org>; Fri, 12 Apr 2024 21:20:16 +0000 (UTC) From: Gabriel Krisman Bertazi <krisman@suse.de> To: willemdebruijn.kernel@gmail.com, davem@davemloft.net, kuniyu@amazon.com Cc: netdev@vger.kernel.org, martin.lau@kernel.org, Gabriel Krisman Bertazi <krisman@suse.de>, Lorenz Bauer <lmb@isovalent.com> Subject: [PATCH v3] udp: Avoid call to compute_score on multiple sites Date: Fri, 12 Apr 2024 17:20:04 -0400 Message-ID: <20240412212004.17181-1-krisman@suse.de> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit default: False [-1.30 / 50.00]; BAYES_HAM(-3.00)[100.00%]; SUSPICIOUS_RECIPS(1.50)[]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email,imap1.dmz-prg2.suse.org:helo,imap1.dmz-prg2.suse.org:rdns]; FROM_HAS_DN(0.00)[]; TAGGED_RCPT(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; FREEMAIL_TO(0.00)[gmail.com,davemloft.net,amazon.com]; RCPT_COUNT_SEVEN(0.00)[7]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCVD_TLS_ALL(0.00)[]; FUZZY_BLOCKED(0.00)[rspamd.com]; TO_DN_SOME(0.00)[]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; FREEMAIL_ENVRCPT(0.00)[gmail.com]
Series	[v3] udp: Avoid call to compute_score on multiple sites \| expand [v3] udp: Avoid call to compute_score on multiple sites

Message ID

20240412212004.17181-1-krisman@suse.de (mailing list archive)

State

Accepted

Commit

50aee97d15113b95a68848db1f0cb2a6c09f753a

Delegated to:

Netdev Maintainers

Headers

show

Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 573494438E
	for <netdev@vger.kernel.org>; Fri, 12 Apr 2024 21:20:16 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=195.135.223.131
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1712956818; cv=none;
 b=GOwUMRZKY+Nr8fpn12GduF63Rvgy3Ogb3xFfGWvDv5seJ5ePDHMws4uqjzU4XZ8IwoqRpH3fw8n5/uP7dVzIzsn1mdlXv8fjpZ2XAJ88zu6hhw5EDv6qAm7icST8g9ovjAZHnKToaFS4fJXa5TlSW2W9ilKtWNTLsetQeu0L1G0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1712956818; c=relaxed/simple;
	bh=oR9m0ITxLnlMZnvZs5BXLoEM86xEX4hJ0V0xq7eUYYg=;
	h=From:To:Cc:Subject:Date:Message-ID:MIME-Version;
 b=mBX6NPzZ3WnJr0VLdif3H5Wl7RDA+06cxi4HrFDcrnvxnMKbXfPdc1k4tPXvXNf4udnFHbsdIWSizBse6ALUyTAP/mh10zKeRjeXE6IC4Jw8Wc/c9GLPlsPipintV8EllDU0Y9hJhLkSKU9m6l6qM4f90muQvLgTIC8BNETHLYg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=suse.de;
 spf=pass smtp.mailfrom=suse.de;
 dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de
 header.b=l1HAesxr;
 dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de
 header.b=kNA3SdHn;
 dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de
 header.b=J8ffIOiK;
 dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de
 header.b=VCFnGxNc; arc=none smtp.client-ip=195.135.223.131
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=suse.de
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de
 header.b="l1HAesxr";
	dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de
 header.b="kNA3SdHn";
	dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de
 header.b="J8ffIOiK";
	dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de
 header.b="VCFnGxNc"
Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org
 [10.150.64.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest
 SHA256)
	(No client certificate requested)
	by smtp-out2.suse.de (Postfix) with ESMTPS id 8DE771FD01;
	Fri, 12 Apr 2024 21:20:13 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
	t=1712956814;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
  content-transfer-encoding:content-transfer-encoding;
	bh=mSZhS4KhLP2ZkHX8tqVCwGcDOr64lSHAPUTtMgFThfM=;
	b=l1HAesxrYx/3+kPSasf+RfQC5Y/Hi0MZDiNsyq0ZnV3HngOm4W5DKfAnzLWXhKY3qFa+Oi
	z0gZoN3fP/qCYtH/9zdzjNBI30oU0BtwXLwViuq5RgU5Rlg0445XJYP3kSgGWrFGBkHEku
	ijv7aMojL9balIjvgDnEyhtrIMojzgc=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1712956814;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
  content-transfer-encoding:content-transfer-encoding;
	bh=mSZhS4KhLP2ZkHX8tqVCwGcDOr64lSHAPUTtMgFThfM=;
	b=kNA3SdHnEvlsm8HrHbnHOJ6GhTYJlAGKvhOb0sCrBPgu/wC4NXFj6IPa/DZjp8LPSX3s1f
	CAZK0NINp1MgsyCA==
Authentication-Results: smtp-out2.suse.de;
	none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
	t=1712956813;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
  content-transfer-encoding:content-transfer-encoding;
	bh=mSZhS4KhLP2ZkHX8tqVCwGcDOr64lSHAPUTtMgFThfM=;
	b=J8ffIOiKZqtXPUPRKzPqGDvO6mFpoJdOzZYY6hM3VVKZMQ6U5mwstq/cMU6xPGCoKZmurw
	TYNgUB6LoeGMkPEo/AVDFofMEHvFzyj2lp2fmNm67+q0ihZYd93I1oUqW6XcTVgXBhQZqM
	HBGTIMUzI5szJVMqdEmnMGvZxf1rFxI=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1712956813;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
  content-transfer-encoding:content-transfer-encoding;
	bh=mSZhS4KhLP2ZkHX8tqVCwGcDOr64lSHAPUTtMgFThfM=;
	b=VCFnGxNcz9G9saMvtCgWVPAF8S66lSOiBK1F4y4YpD9p1GYD6Yuh3icJdizveV6ZaS32oD
	xx3NNSRbq8GJWpAQ==
Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest
 SHA256)
	(No client certificate requested)
	by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 4186C1368B;
	Fri, 12 Apr 2024 21:20:13 +0000 (UTC)
Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167])
	by imap1.dmz-prg2.suse.org with ESMTPSA
	id xD2pBY2lGWawZwAAD6G6ig
	(envelope-from <krisman@suse.de>); Fri, 12 Apr 2024 21:20:13 +0000
From: Gabriel Krisman Bertazi <krisman@suse.de>
To: willemdebruijn.kernel@gmail.com,
	davem@davemloft.net,
	kuniyu@amazon.com
Cc: netdev@vger.kernel.org,
	martin.lau@kernel.org,
	Gabriel Krisman Bertazi <krisman@suse.de>,
	Lorenz Bauer <lmb@isovalent.com>
Subject: [PATCH v3] udp: Avoid call to compute_score on multiple sites
Date: Fri, 12 Apr 2024 17:20:04 -0400
Message-ID: <20240412212004.17181-1-krisman@suse.de>
X-Mailer: git-send-email 2.44.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Spam-Level: 
X-Spamd-Result: default: False [-1.30 / 50.00];
	BAYES_HAM(-3.00)[100.00%];
	SUSPICIOUS_RECIPS(1.50)[];
	MID_CONTAINS_FROM(1.00)[];
	NEURAL_HAM_LONG(-1.00)[-1.000];
	R_MISSING_CHARSET(0.50)[];
	NEURAL_HAM_SHORT(-0.20)[-1.000];
	MIME_GOOD(-0.10)[text/plain];
	DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email,imap1.dmz-prg2.suse.org:helo,imap1.dmz-prg2.suse.org:rdns];
	FROM_HAS_DN(0.00)[];
	TAGGED_RCPT(0.00)[];
	RCVD_VIA_SMTP_AUTH(0.00)[];
	MIME_TRACE(0.00)[0:+];
	ARC_NA(0.00)[];
	FROM_EQ_ENVFROM(0.00)[];
	FREEMAIL_TO(0.00)[gmail.com,davemloft.net,amazon.com];
	RCPT_COUNT_SEVEN(0.00)[7];
	RCVD_COUNT_TWO(0.00)[2];
	TO_MATCH_ENVRCPT_ALL(0.00)[];
	RCVD_TLS_ALL(0.00)[];
	FUZZY_BLOCKED(0.00)[rspamd.com];
	TO_DN_SOME(0.00)[];
	DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];
	FREEMAIL_ENVRCPT(0.00)[gmail.com]
X-Spam-Score: -1.30
X-Spam-Flag: NO
X-Patchwork-Delegate: kuba@kernel.org

Series

[v3] udp: Avoid call to compute_score on multiple sites | expand

Context	Check	Description
netdev/series_format	warning	Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 934 this patch: 934
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	warning	4 maintainers not CCed: pabeni@redhat.com edumazet@google.com kuba@kernel.org dsahern@kernel.org
netdev/build_clang	success	Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 945 this patch: 945
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 79 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 3 this patch: 3
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-04-13--15-00 (tests: 960)

Context

Check

Description

netdev/series_format

warning

Single patches do not need cover letters; Target tree name not specified in the subject

netdev/tree_selection

success

Guessed tree name to be net-next

netdev/ynl

success

Generated files up to date; no warnings/errors; no diff in generated;

netdev/fixes_present

success

Fixes tag not required for -next series

netdev/header_inline

success

No static functions without inline keyword in header files

netdev/build_32bit

success

Errors and warnings before: 934 this patch: 934

netdev/build_tools

success

No tools touched, skip

netdev/cc_maintainers

warning

4 maintainers not CCed: pabeni@redhat.com edumazet@google.com kuba@kernel.org dsahern@kernel.org

netdev/build_clang

success

Errors and warnings before: 938 this patch: 938

netdev/verify_signedoff

success

Signed-off-by tag matches author and committer

netdev/deprecated_api

success

None detected

netdev/check_selftest

success

No net selftest shell script

netdev/verify_fixes

success

Fixes tag looks correct

netdev/build_allmodconfig_warn

success

Errors and warnings before: 945 this patch: 945

netdev/checkpatch

success

total: 0 errors, 0 warnings, 0 checks, 79 lines checked

netdev/build_clang_rust

success

No Rust files in patch. Skipping build

netdev/kdoc

success

Errors and warnings before: 3 this patch: 3

netdev/source_inline

success

Was 0 now: 0

netdev/contest

success

net-next-2024-04-13--15-00 (tests: 960)

Commit Message

Gabriel Krisman Bertazi April 12, 2024, 9:20 p.m. UTC

We've observed a 7-12% performance regression in iperf3 UDP ipv4 and
ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to
commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected
sockets are present").  The failing tests were those that would spawn
UDP sockets per-cpu on systems that have a high number of cpus.

Unsurprisingly, it is not caused by the extra re-scoring of the reused
socket, but due to the compiler no longer inlining compute_score, once
it has the extra call site in udp4_lib_lookup2.  This is augmented by
the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus.

We could just explicitly inline it, but compute_score() is quite a large
function, around 300b.  Inlining in two sites would almost double
udp4_lib_lookup2, which is a silly thing to do just to workaround a
mitigation.  Instead, this patch shuffles the code a bit to avoid the
multiple calls to compute_score.  Since it is a static function used in
one spot, the compiler can safely fold it in, as it did before, without
increasing the text size.

With this patch applied I ran my original iperf3 testcases.  The failing
cases all looked like this (ipv4):
	iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2

where $R is either 1G/10G/0 (max, unlimited).  I ran 3 times each.
baseline is v6.9-rc3. harmean == harmonic mean; CV == coefficient of
variation.

ipv4:
                 1G                10G                  MAX
	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
baseline 1743852.66(0.0208) 1725933.02(0.0167) 1705203.78(0.0386)
patched  1968727.61(0.0035) 1962283.22(0.0195) 1923853.50(0.0256)

ipv6:
                 1G                10G                  MAX
	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
baseline 1729020.03(0.0028) 1691704.49(0.0243) 1692251.34(0.0083)
patched  1900422.19(0.0067) 1900968.01(0.0067) 1568532.72(0.1519)

This restores the performance we had before the change above with this
benchmark.  We obviously don't expect any real impact when mitigations
are disabled, but just to be sure it also doesn't regresses:

mitigations=off ipv4:
                 1G                10G                  MAX
	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
baseline 3230279.97(0.0066) 3229320.91(0.0060) 2605693.19(0.0697)
patched  3242802.36(0.0073) 3239310.71(0.0035) 2502427.19(0.0882)

Cc: Lorenz Bauer <lmb@isovalent.com>
Fixes: f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>

---
Changes since v2:
(me)
  - recollected performance data after changes below only for the
  mitigations=auto case.
(suggested by Willem de Bruijn)
  - Explicitly continue the loop after a rescore
  - rename rescore variable to not clash with jump label
  - disable rescore for new loop iteration
(suggested by Kuniyuki Iwashima)
  - sort stack variables
  - drop unneeded ()

Changes since v1:
(me)
  - recollected performance data after changes below only for the
  mitigations enabled case.
(suggested by Willem de Bruijn)
  - Drop __always_inline in compute_score
  - Simplify logic by replacing third struct sock pointer with bool
  - Fix typo in commit message
  - Don't explicitly break out of loop after rescore
---
 net/ipv4/udp.c | 21 ++++++++++++++++-----
 net/ipv6/udp.c | 20 ++++++++++++++++----
 2 files changed, 32 insertions(+), 9 deletions(-)

Comments

Kuniyuki Iwashima April 13, 2024, 1:25 a.m. UTC | #1

From: Gabriel Krisman Bertazi <krisman@suse.de>
Date: Fri, 12 Apr 2024 17:20:04 -0400
> We've observed a 7-12% performance regression in iperf3 UDP ipv4 and
> ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to
> commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected
> sockets are present").  The failing tests were those that would spawn
> UDP sockets per-cpu on systems that have a high number of cpus.
> 
> Unsurprisingly, it is not caused by the extra re-scoring of the reused
> socket, but due to the compiler no longer inlining compute_score, once
> it has the extra call site in udp4_lib_lookup2.  This is augmented by
> the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus.
> 
> We could just explicitly inline it, but compute_score() is quite a large
> function, around 300b.  Inlining in two sites would almost double
> udp4_lib_lookup2, which is a silly thing to do just to workaround a
> mitigation.  Instead, this patch shuffles the code a bit to avoid the
> multiple calls to compute_score.  Since it is a static function used in
> one spot, the compiler can safely fold it in, as it did before, without
> increasing the text size.
> 
> With this patch applied I ran my original iperf3 testcases.  The failing
> cases all looked like this (ipv4):
> 	iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2
> 
> where $R is either 1G/10G/0 (max, unlimited).  I ran 3 times each.
> baseline is v6.9-rc3. harmean == harmonic mean; CV == coefficient of
> variation.
> 
> ipv4:
>                  1G                10G                  MAX
> 	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
> baseline 1743852.66(0.0208) 1725933.02(0.0167) 1705203.78(0.0386)
> patched  1968727.61(0.0035) 1962283.22(0.0195) 1923853.50(0.0256)
> 
> ipv6:
>                  1G                10G                  MAX
> 	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
> baseline 1729020.03(0.0028) 1691704.49(0.0243) 1692251.34(0.0083)
> patched  1900422.19(0.0067) 1900968.01(0.0067) 1568532.72(0.1519)
> 
> This restores the performance we had before the change above with this
> benchmark.  We obviously don't expect any real impact when mitigations
> are disabled, but just to be sure it also doesn't regresses:
> 
> mitigations=off ipv4:
>                  1G                10G                  MAX
> 	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
> baseline 3230279.97(0.0066) 3229320.91(0.0060) 2605693.19(0.0697)
> patched  3242802.36(0.0073) 3239310.71(0.0035) 2502427.19(0.0882)
> 
> Cc: Lorenz Bauer <lmb@isovalent.com>
> Fixes: f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present")
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Thanks!

> 
> ---
> Changes since v2:
> (me)
>   - recollected performance data after changes below only for the
>   mitigations=auto case.
> (suggested by Willem de Bruijn)
>   - Explicitly continue the loop after a rescore
>   - rename rescore variable to not clash with jump label
>   - disable rescore for new loop iteration
> (suggested by Kuniyuki Iwashima)
>   - sort stack variables
>   - drop unneeded ()
> 
> Changes since v1:
> (me)
>   - recollected performance data after changes below only for the
>   mitigations enabled case.
> (suggested by Willem de Bruijn)
>   - Drop __always_inline in compute_score
>   - Simplify logic by replacing third struct sock pointer with bool
>   - Fix typo in commit message
>   - Don't explicitly break out of loop after rescore
> ---
>  net/ipv4/udp.c | 21 ++++++++++++++++-----
>  net/ipv6/udp.c | 20 ++++++++++++++++----
>  2 files changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index c02bf011d4a6..4eff1e145c63 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -427,15 +427,21 @@ static struct sock *udp4_lib_lookup2(struct net *net,
>  {
>  	struct sock *sk, *result;
>  	int score, badness;
> +	bool need_rescore;
>  
>  	result = NULL;
>  	badness = 0;
>  	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> -		score = compute_score(sk, net, saddr, sport,
> -				      daddr, hnum, dif, sdif);
> +		need_rescore = false;
> +rescore:
> +		score = compute_score(need_rescore ? result : sk, net, saddr,
> +				      sport, daddr, hnum, dif, sdif);
>  		if (score > badness) {
>  			badness = score;
>  
> +			if (need_rescore)
> +				continue;
> +
>  			if (sk->sk_state == TCP_ESTABLISHED) {
>  				result = sk;
>  				continue;
> @@ -456,9 +462,14 @@ static struct sock *udp4_lib_lookup2(struct net *net,
>  			if (IS_ERR(result))
>  				continue;
>  
> -			badness = compute_score(result, net, saddr, sport,
> -						daddr, hnum, dif, sdif);
> -
> +			/* compute_score is too long of a function to be
> +			 * inlined, and calling it again here yields
> +			 * measureable overhead for some
> +			 * workloads. Work around it by jumping
> +			 * backwards to rescore 'result'.
> +			 */
> +			need_rescore = true;
> +			goto rescore;
>  		}
>  	}
>  	return result;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 8b1dd7f51249..e80e8b1d2000 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -168,15 +168,21 @@ static struct sock *udp6_lib_lookup2(struct net *net,
>  {
>  	struct sock *sk, *result;
>  	int score, badness;
> +	bool need_rescore;
>  
>  	result = NULL;
>  	badness = -1;
>  	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> -		score = compute_score(sk, net, saddr, sport,
> -				      daddr, hnum, dif, sdif);
> +		need_rescore = false;
> +rescore:
> +		score = compute_score(need_rescore ? result : sk, net, saddr,
> +				      sport, daddr, hnum, dif, sdif);
>  		if (score > badness) {
>  			badness = score;
>  
> +			if (need_rescore)
> +				continue;
> +
>  			if (sk->sk_state == TCP_ESTABLISHED) {
>  				result = sk;
>  				continue;
> @@ -197,8 +203,14 @@ static struct sock *udp6_lib_lookup2(struct net *net,
>  			if (IS_ERR(result))
>  				continue;
>  
> -			badness = compute_score(sk, net, saddr, sport,
> -						daddr, hnum, dif, sdif);
> +			/* compute_score is too long of a function to be
> +			 * inlined, and calling it again here yields
> +			 * measureable overhead for some
> +			 * workloads. Work around it by jumping
> +			 * backwards to rescore 'result'.
> +			 */
> +			need_rescore = true;
> +			goto rescore;
>  		}
>  	}
>  	return result;
> -- 
> 2.44.0

Willem de Bruijn April 13, 2024, 5:26 p.m. UTC | #2

Kuniyuki Iwashima wrote:
> From: Gabriel Krisman Bertazi <krisman@suse.de>
> Date: Fri, 12 Apr 2024 17:20:04 -0400
> > We've observed a 7-12% performance regression in iperf3 UDP ipv4 and
> > ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to
> > commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected
> > sockets are present").  The failing tests were those that would spawn
> > UDP sockets per-cpu on systems that have a high number of cpus.
> > 
> > Unsurprisingly, it is not caused by the extra re-scoring of the reused
> > socket, but due to the compiler no longer inlining compute_score, once
> > it has the extra call site in udp4_lib_lookup2.  This is augmented by
> > the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus.
> > 
> > We could just explicitly inline it, but compute_score() is quite a large
> > function, around 300b.  Inlining in two sites would almost double
> > udp4_lib_lookup2, which is a silly thing to do just to workaround a
> > mitigation.  Instead, this patch shuffles the code a bit to avoid the
> > multiple calls to compute_score.  Since it is a static function used in
> > one spot, the compiler can safely fold it in, as it did before, without
> > increasing the text size.
> > 
> > With this patch applied I ran my original iperf3 testcases.  The failing
> > cases all looked like this (ipv4):
> > 	iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2
> > 
> > where $R is either 1G/10G/0 (max, unlimited).  I ran 3 times each.
> > baseline is v6.9-rc3. harmean == harmonic mean; CV == coefficient of
> > variation.
> > 
> > ipv4:
> >                  1G                10G                  MAX
> > 	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
> > baseline 1743852.66(0.0208) 1725933.02(0.0167) 1705203.78(0.0386)
> > patched  1968727.61(0.0035) 1962283.22(0.0195) 1923853.50(0.0256)
> > 
> > ipv6:
> >                  1G                10G                  MAX
> > 	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
> > baseline 1729020.03(0.0028) 1691704.49(0.0243) 1692251.34(0.0083)
> > patched  1900422.19(0.0067) 1900968.01(0.0067) 1568532.72(0.1519)
> > 
> > This restores the performance we had before the change above with this
> > benchmark.  We obviously don't expect any real impact when mitigations
> > are disabled, but just to be sure it also doesn't regresses:
> > 
> > mitigations=off ipv4:
> >                  1G                10G                  MAX
> > 	    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
> > baseline 3230279.97(0.0066) 3229320.91(0.0060) 2605693.19(0.0697)
> > patched  3242802.36(0.0073) 3239310.71(0.0035) 2502427.19(0.0882)
> > 
> > Cc: Lorenz Bauer <lmb@isovalent.com>
> > Fixes: f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present")
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

patchwork-bot+netdevbpf@kernel.org April 15, 2024, 11:10 a.m. UTC | #3

Hello:

This patch was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 12 Apr 2024 17:20:04 -0400 you wrote:
> We've observed a 7-12% performance regression in iperf3 UDP ipv4 and
> ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to
> commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected
> sockets are present").  The failing tests were those that would spawn
> UDP sockets per-cpu on systems that have a high number of cpus.
> 
> Unsurprisingly, it is not caused by the extra re-scoring of the reused
> socket, but due to the compiler no longer inlining compute_score, once
> it has the extra call site in udp4_lib_lookup2.  This is augmented by
> the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus.
> 
> [...]

Here is the summary with links:
  - [v3] udp: Avoid call to compute_score on multiple sites
    https://git.kernel.org/netdev/net-next/c/50aee97d1511

You are awesome, thank you!

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c02bf011d4a6..4eff1e145c63 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -427,15 +427,21 @@  static struct sock *udp4_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
+	bool need_rescore;
 
 	result = NULL;
 	badness = 0;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-		score = compute_score(sk, net, saddr, sport,
-				      daddr, hnum, dif, sdif);
+		need_rescore = false;
+rescore:
+		score = compute_score(need_rescore ? result : sk, net, saddr,
+				      sport, daddr, hnum, dif, sdif);
 		if (score > badness) {
 			badness = score;
 
+			if (need_rescore)
+				continue;
+
 			if (sk->sk_state == TCP_ESTABLISHED) {
 				result = sk;
 				continue;
@@ -456,9 +462,14 @@  static struct sock *udp4_lib_lookup2(struct net *net,
 			if (IS_ERR(result))
 				continue;
 
-			badness = compute_score(result, net, saddr, sport,
-						daddr, hnum, dif, sdif);
-
+			/* compute_score is too long of a function to be
+			 * inlined, and calling it again here yields
+			 * measureable overhead for some
+			 * workloads. Work around it by jumping
+			 * backwards to rescore 'result'.
+			 */
+			need_rescore = true;
+			goto rescore;
 		}
 	}
 	return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8b1dd7f51249..e80e8b1d2000 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -168,15 +168,21 @@  static struct sock *udp6_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
+	bool need_rescore;
 
 	result = NULL;
 	badness = -1;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-		score = compute_score(sk, net, saddr, sport,
-				      daddr, hnum, dif, sdif);
+		need_rescore = false;
+rescore:
+		score = compute_score(need_rescore ? result : sk, net, saddr,
+				      sport, daddr, hnum, dif, sdif);
 		if (score > badness) {
 			badness = score;
 
+			if (need_rescore)
+				continue;
+
 			if (sk->sk_state == TCP_ESTABLISHED) {
 				result = sk;
 				continue;
@@ -197,8 +203,14 @@  static struct sock *udp6_lib_lookup2(struct net *net,
 			if (IS_ERR(result))
 				continue;
 
-			badness = compute_score(sk, net, saddr, sport,
-						daddr, hnum, dif, sdif);
+			/* compute_score is too long of a function to be
+			 * inlined, and calling it again here yields
+			 * measureable overhead for some
+			 * workloads. Work around it by jumping
+			 * backwards to rescore 'result'.
+			 */
+			need_rescore = true;
+			goto rescore;
 		}
 	}
 	return result;

[v3] udp: Avoid call to compute_score on multiple sites

Checks

Commit Message

Comments

Patch