Message ID | 20220203015140.3022854-10-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: BIG TCP implementation | expand |
Hi Eric,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 52dae93f3bad842c6d585700460a0dea4d70e096
config: arc-randconfig-r043-20220130 (https://download.01.org/0day-ci/archive/20220203/202202031206.1nNLT568-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/64ec6b0260be94b2ed90ee6d139591bdbd49c82d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336
git checkout 64ec6b0260be94b2ed90ee6d139591bdbd49c82d
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash kernel/bpf/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from include/linux/container_of.h:5,
from include/linux/list.h:5,
from include/linux/rculist.h:10,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/ptrace.h:6,
from include/uapi/asm-generic/bpf_perf_event.h:4,
from ./arch/arc/include/generated/uapi/asm/bpf_perf_event.h:1,
from include/uapi/linux/bpf_perf_event.h:11,
from kernel/bpf/btf.c:6:
>> include/linux/build_bug.h:78:41: error: static assertion failed: "BITS_PER_LONG >= NR_MSG_FRAG_IDS"
78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
| ^~~~~~~~~~~~~~
include/linux/build_bug.h:77:34: note: in expansion of macro '__static_assert'
77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
| ^~~~~~~~~~~~~~~
include/linux/skmsg.h:41:1: note: in expansion of macro 'static_assert'
41 | static_assert(BITS_PER_LONG >= NR_MSG_FRAG_IDS);
| ^~~~~~~~~~~~~
kernel/bpf/btf.c: In function 'btf_seq_show':
kernel/bpf/btf.c:6049:29: warning: function 'btf_seq_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
6049 | seq_vprintf((struct seq_file *)show->target, fmt, args);
| ^~~~~~~~
kernel/bpf/btf.c: In function 'btf_snprintf_show':
kernel/bpf/btf.c:6086:9: warning: function 'btf_snprintf_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
6086 | len = vsnprintf(show->target, ssnprintf->len_left, fmt, args);
| ^~~
vim +78 include/linux/build_bug.h
bc6245e5efd70c4 Ian Abbott 2017-07-10 60
6bab69c65013bed Rasmus Villemoes 2019-03-07 61 /**
6bab69c65013bed Rasmus Villemoes 2019-03-07 62 * static_assert - check integer constant expression at build time
6bab69c65013bed Rasmus Villemoes 2019-03-07 63 *
6bab69c65013bed Rasmus Villemoes 2019-03-07 64 * static_assert() is a wrapper for the C11 _Static_assert, with a
6bab69c65013bed Rasmus Villemoes 2019-03-07 65 * little macro magic to make the message optional (defaulting to the
6bab69c65013bed Rasmus Villemoes 2019-03-07 66 * stringification of the tested expression).
6bab69c65013bed Rasmus Villemoes 2019-03-07 67 *
6bab69c65013bed Rasmus Villemoes 2019-03-07 68 * Contrary to BUILD_BUG_ON(), static_assert() can be used at global
6bab69c65013bed Rasmus Villemoes 2019-03-07 69 * scope, but requires the expression to be an integer constant
6bab69c65013bed Rasmus Villemoes 2019-03-07 70 * expression (i.e., it is not enough that __builtin_constant_p() is
6bab69c65013bed Rasmus Villemoes 2019-03-07 71 * true for expr).
6bab69c65013bed Rasmus Villemoes 2019-03-07 72 *
6bab69c65013bed Rasmus Villemoes 2019-03-07 73 * Also note that BUILD_BUG_ON() fails the build if the condition is
6bab69c65013bed Rasmus Villemoes 2019-03-07 74 * true, while static_assert() fails the build if the expression is
6bab69c65013bed Rasmus Villemoes 2019-03-07 75 * false.
6bab69c65013bed Rasmus Villemoes 2019-03-07 76 */
6bab69c65013bed Rasmus Villemoes 2019-03-07 77 #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
6bab69c65013bed Rasmus Villemoes 2019-03-07 @78 #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
6bab69c65013bed Rasmus Villemoes 2019-03-07 79
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Wed, Feb 2, 2022 at 9:02 PM kernel test robot <lkp@intel.com> wrote: > > Hi Eric, > > I love your patch! Yet something to improve: > > [auto build test ERROR on net-next/master] > > url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336 > base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 52dae93f3bad842c6d585700460a0dea4d70e096 > config: arc-randconfig-r043-20220130 (https://download.01.org/0day-ci/archive/20220203/202202031206.1nNLT568-lkp@intel.com/config) > compiler: arc-elf-gcc (GCC) 11.2.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # https://github.com/0day-ci/linux/commit/64ec6b0260be94b2ed90ee6d139591bdbd49c82d > git remote add linux-review https://github.com/0day-ci/linux > git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336 > git checkout 64ec6b0260be94b2ed90ee6d139591bdbd49c82d > # save the config file to linux build tree > mkdir build_dir > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash kernel/bpf/ > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > > All errors (new ones prefixed by >>): > > In file included from include/linux/container_of.h:5, > from include/linux/list.h:5, > from include/linux/rculist.h:10, > from include/linux/pid.h:5, > from include/linux/sched.h:14, > from include/linux/ptrace.h:6, > from include/uapi/asm-generic/bpf_perf_event.h:4, > from ./arch/arc/include/generated/uapi/asm/bpf_perf_event.h:1, > from include/uapi/linux/bpf_perf_event.h:11, > from kernel/bpf/btf.c:6: > >> include/linux/build_bug.h:78:41: error: static assertion failed: "BITS_PER_LONG >= NR_MSG_FRAG_IDS" > 78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) > | ^~~~~~~~~~~~~~ > include/linux/build_bug.h:77:34: note: in expansion of macro '__static_assert' > 77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) > | ^~~~~~~~~~~~~~~ > include/linux/skmsg.h:41:1: note: in expansion of macro 'static_assert' > 41 | static_assert(BITS_PER_LONG >= NR_MSG_FRAG_IDS); Not clear why we have this assertion. Do we use a bitmap in an "unsigned long" in skmsg ? We could still use the old 17 limit for 32bit arches/builds. > | ^~~~~~~~~~~~~ > kernel/bpf/btf.c: In function 'btf_seq_show': > kernel/bpf/btf.c:6049:29: warning: function 'btf_seq_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] > 6049 | seq_vprintf((struct seq_file *)show->target, fmt, args); > | ^~~~~~~~ > kernel/bpf/btf.c: In function 'btf_snprintf_show': > kernel/bpf/btf.c:6086:9: warning: function 'btf_snprintf_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] > 6086 | len = vsnprintf(show->target, ssnprintf->len_left, fmt, args); > | ^~~ > > > vim +78 include/linux/build_bug.h > > bc6245e5efd70c4 Ian Abbott 2017-07-10 60 > 6bab69c65013bed Rasmus Villemoes 2019-03-07 61 /** > 6bab69c65013bed Rasmus Villemoes 2019-03-07 62 * static_assert - check integer constant expression at build time > 6bab69c65013bed Rasmus Villemoes 2019-03-07 63 * > 6bab69c65013bed Rasmus Villemoes 2019-03-07 64 * static_assert() is a wrapper for the C11 _Static_assert, with a > 6bab69c65013bed Rasmus Villemoes 2019-03-07 65 * little macro magic to make the message optional (defaulting to the > 6bab69c65013bed Rasmus Villemoes 2019-03-07 66 * stringification of the tested expression). > 6bab69c65013bed Rasmus Villemoes 2019-03-07 67 * > 6bab69c65013bed Rasmus Villemoes 2019-03-07 68 * Contrary to BUILD_BUG_ON(), static_assert() can be used at global > 6bab69c65013bed Rasmus Villemoes 2019-03-07 69 * scope, but requires the expression to be an integer constant > 6bab69c65013bed Rasmus Villemoes 2019-03-07 70 * expression (i.e., it is not enough that __builtin_constant_p() is > 6bab69c65013bed Rasmus Villemoes 2019-03-07 71 * true for expr). > 6bab69c65013bed Rasmus Villemoes 2019-03-07 72 * > 6bab69c65013bed Rasmus Villemoes 2019-03-07 73 * Also note that BUILD_BUG_ON() fails the build if the condition is > 6bab69c65013bed Rasmus Villemoes 2019-03-07 74 * true, while static_assert() fails the build if the expression is > 6bab69c65013bed Rasmus Villemoes 2019-03-07 75 * false. > 6bab69c65013bed Rasmus Villemoes 2019-03-07 76 */ > 6bab69c65013bed Rasmus Villemoes 2019-03-07 77 #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) > 6bab69c65013bed Rasmus Villemoes 2019-03-07 @78 #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) > 6bab69c65013bed Rasmus Villemoes 2019-03-07 79 > > --- > 0-DAY CI Kernel Test Service, Intel Corporation > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Eric,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 52dae93f3bad842c6d585700460a0dea4d70e096
config: hexagon-randconfig-r045-20220130 (https://download.01.org/0day-ci/archive/20220203/202202031315.B425Ipe8-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project a73e4ce6a59b01f0e37037761c1e6889d539d233)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/64ec6b0260be94b2ed90ee6d139591bdbd49c82d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336
git checkout 64ec6b0260be94b2ed90ee6d139591bdbd49c82d
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/bpf/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from kernel/bpf/btf.c:22:
>> include/linux/skmsg.h:41:1: error: static_assert failed due to requirement '32 >= (45UL + 1)' "BITS_PER_LONG >= NR_MSG_FRAG_IDS"
static_assert(BITS_PER_LONG >= NR_MSG_FRAG_IDS);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:77:34: note: expanded from macro 'static_assert'
#define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:78:41: note: expanded from macro '__static_assert'
#define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
^ ~~~~
1 error generated.
vim +41 include/linux/skmsg.h
604326b41a6fb9 Daniel Borkmann 2018-10-13 25
604326b41a6fb9 Daniel Borkmann 2018-10-13 26 struct sk_msg_sg {
604326b41a6fb9 Daniel Borkmann 2018-10-13 27 u32 start;
604326b41a6fb9 Daniel Borkmann 2018-10-13 28 u32 curr;
604326b41a6fb9 Daniel Borkmann 2018-10-13 29 u32 end;
604326b41a6fb9 Daniel Borkmann 2018-10-13 30 u32 size;
604326b41a6fb9 Daniel Borkmann 2018-10-13 31 u32 copybreak;
163ab96b52ae2b Jakub Kicinski 2019-10-06 32 unsigned long copy;
031097d9e079e4 Jakub Kicinski 2019-11-27 33 /* The extra two elements:
031097d9e079e4 Jakub Kicinski 2019-11-27 34 * 1) used for chaining the front and sections when the list becomes
031097d9e079e4 Jakub Kicinski 2019-11-27 35 * partitioned (e.g. end < start). The crypto APIs require the
031097d9e079e4 Jakub Kicinski 2019-11-27 36 * chaining;
031097d9e079e4 Jakub Kicinski 2019-11-27 37 * 2) to chain tailer SG entries after the message.
d3b18ad31f93d0 John Fastabend 2018-10-13 38 */
031097d9e079e4 Jakub Kicinski 2019-11-27 39 struct scatterlist data[MAX_MSG_FRAGS + 2];
604326b41a6fb9 Daniel Borkmann 2018-10-13 40 };
031097d9e079e4 Jakub Kicinski 2019-11-27 @41 static_assert(BITS_PER_LONG >= NR_MSG_FRAG_IDS);
604326b41a6fb9 Daniel Borkmann 2018-10-13 42
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Wed, 2 Feb 2022 21:20:32 -0800 Eric Dumazet wrote: > Not clear why we have this assertion. Do we use a bitmap in an > "unsigned long" in skmsg ? > > We could still use the old 17 limit for 32bit arches/builds. git blame points at me but I just adjusted it. Looks like its struct sk_msg_sg::copy that's the reason. On a quick look we can make it an array of unsigned longs without a problem.
Hi Eric, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 52dae93f3bad842c6d585700460a0dea4d70e096 config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20220203/202202031344.0FFfnywX-lkp@intel.com/config) compiler: arceb-elf-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/64ec6b0260be94b2ed90ee6d139591bdbd49c82d git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220203-095336 git checkout 64ec6b0260be94b2ed90ee6d139591bdbd49c82d # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/net/ethernet/3com/ drivers/net/ethernet/agere/ drivers/net/ethernet/mellanox/mlx5/core/ drivers/net/wireguard/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): drivers/net/wireguard/send.c: In function 'encrypt_packet': >> drivers/net/wireguard/send.c:219:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=] 219 | } | ^ -- drivers/net/wireguard/receive.c: In function 'decrypt_packet': >> drivers/net/wireguard/receive.c:299:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=] 299 | } | ^ -- >> drivers/net/ethernet/3com/typhoon.c:142:2: warning: #warning Typhoon only supports 32 entries in its SG list for TSO, disabling TSO [-Wcpp] 142 | #warning Typhoon only supports 32 entries in its SG list for TSO, disabling TSO | ^~~~~~~ vim +219 drivers/net/wireguard/send.c e7096c131e5161 Jason A. Donenfeld 2019-12-09 161 e7096c131e5161 Jason A. Donenfeld 2019-12-09 162 static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair) e7096c131e5161 Jason A. Donenfeld 2019-12-09 163 { e7096c131e5161 Jason A. Donenfeld 2019-12-09 164 unsigned int padding_len, plaintext_len, trailer_len; e7096c131e5161 Jason A. Donenfeld 2019-12-09 165 struct scatterlist sg[MAX_SKB_FRAGS + 8]; e7096c131e5161 Jason A. Donenfeld 2019-12-09 166 struct message_data *header; e7096c131e5161 Jason A. Donenfeld 2019-12-09 167 struct sk_buff *trailer; e7096c131e5161 Jason A. Donenfeld 2019-12-09 168 int num_frags; e7096c131e5161 Jason A. Donenfeld 2019-12-09 169 c78a0b4a78839d Jason A. Donenfeld 2020-05-19 170 /* Force hash calculation before encryption so that flow analysis is c78a0b4a78839d Jason A. Donenfeld 2020-05-19 171 * consistent over the inner packet. c78a0b4a78839d Jason A. Donenfeld 2020-05-19 172 */ c78a0b4a78839d Jason A. Donenfeld 2020-05-19 173 skb_get_hash(skb); c78a0b4a78839d Jason A. Donenfeld 2020-05-19 174 e7096c131e5161 Jason A. Donenfeld 2019-12-09 175 /* Calculate lengths. */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 176 padding_len = calculate_skb_padding(skb); e7096c131e5161 Jason A. Donenfeld 2019-12-09 177 trailer_len = padding_len + noise_encrypted_len(0); e7096c131e5161 Jason A. Donenfeld 2019-12-09 178 plaintext_len = skb->len + padding_len; e7096c131e5161 Jason A. Donenfeld 2019-12-09 179 e7096c131e5161 Jason A. Donenfeld 2019-12-09 180 /* Expand data section to have room for padding and auth tag. */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 181 num_frags = skb_cow_data(skb, trailer_len, &trailer); e7096c131e5161 Jason A. Donenfeld 2019-12-09 182 if (unlikely(num_frags < 0 || num_frags > ARRAY_SIZE(sg))) e7096c131e5161 Jason A. Donenfeld 2019-12-09 183 return false; e7096c131e5161 Jason A. Donenfeld 2019-12-09 184 e7096c131e5161 Jason A. Donenfeld 2019-12-09 185 /* Set the padding to zeros, and make sure it and the auth tag are part e7096c131e5161 Jason A. Donenfeld 2019-12-09 186 * of the skb. e7096c131e5161 Jason A. Donenfeld 2019-12-09 187 */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 188 memset(skb_tail_pointer(trailer), 0, padding_len); e7096c131e5161 Jason A. Donenfeld 2019-12-09 189 e7096c131e5161 Jason A. Donenfeld 2019-12-09 190 /* Expand head section to have room for our header and the network e7096c131e5161 Jason A. Donenfeld 2019-12-09 191 * stack's headers. e7096c131e5161 Jason A. Donenfeld 2019-12-09 192 */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 193 if (unlikely(skb_cow_head(skb, DATA_PACKET_HEAD_ROOM) < 0)) e7096c131e5161 Jason A. Donenfeld 2019-12-09 194 return false; e7096c131e5161 Jason A. Donenfeld 2019-12-09 195 e7096c131e5161 Jason A. Donenfeld 2019-12-09 196 /* Finalize checksum calculation for the inner packet, if required. */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 197 if (unlikely(skb->ip_summed == CHECKSUM_PARTIAL && e7096c131e5161 Jason A. Donenfeld 2019-12-09 198 skb_checksum_help(skb))) e7096c131e5161 Jason A. Donenfeld 2019-12-09 199 return false; e7096c131e5161 Jason A. Donenfeld 2019-12-09 200 e7096c131e5161 Jason A. Donenfeld 2019-12-09 201 /* Only after checksumming can we safely add on the padding at the end e7096c131e5161 Jason A. Donenfeld 2019-12-09 202 * and the header. e7096c131e5161 Jason A. Donenfeld 2019-12-09 203 */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 204 skb_set_inner_network_header(skb, 0); e7096c131e5161 Jason A. Donenfeld 2019-12-09 205 header = (struct message_data *)skb_push(skb, sizeof(*header)); e7096c131e5161 Jason A. Donenfeld 2019-12-09 206 header->header.type = cpu_to_le32(MESSAGE_DATA); e7096c131e5161 Jason A. Donenfeld 2019-12-09 207 header->key_idx = keypair->remote_index; e7096c131e5161 Jason A. Donenfeld 2019-12-09 208 header->counter = cpu_to_le64(PACKET_CB(skb)->nonce); e7096c131e5161 Jason A. Donenfeld 2019-12-09 209 pskb_put(skb, trailer, trailer_len); e7096c131e5161 Jason A. Donenfeld 2019-12-09 210 e7096c131e5161 Jason A. Donenfeld 2019-12-09 211 /* Now we can encrypt the scattergather segments */ e7096c131e5161 Jason A. Donenfeld 2019-12-09 212 sg_init_table(sg, num_frags); e7096c131e5161 Jason A. Donenfeld 2019-12-09 213 if (skb_to_sgvec(skb, sg, sizeof(struct message_data), e7096c131e5161 Jason A. Donenfeld 2019-12-09 214 noise_encrypted_len(plaintext_len)) <= 0) e7096c131e5161 Jason A. Donenfeld 2019-12-09 215 return false; e7096c131e5161 Jason A. Donenfeld 2019-12-09 216 return chacha20poly1305_encrypt_sg_inplace(sg, plaintext_len, NULL, 0, e7096c131e5161 Jason A. Donenfeld 2019-12-09 217 PACKET_CB(skb)->nonce, e7096c131e5161 Jason A. Donenfeld 2019-12-09 218 keypair->sending.key); e7096c131e5161 Jason A. Donenfeld 2019-12-09 @219 } e7096c131e5161 Jason A. Donenfeld 2019-12-09 220 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Wed, Feb 2, 2022 at 9:31 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 2 Feb 2022 21:20:32 -0800 Eric Dumazet wrote: > > Not clear why we have this assertion. Do we use a bitmap in an > > "unsigned long" in skmsg ? > > > > We could still use the old 17 limit for 32bit arches/builds. > > git blame points at me but I just adjusted it. Looks like its > struct sk_msg_sg::copy that's the reason. On a quick look we > can make it an array of unsigned longs without a problem. Oh right, thanks for the pointer.
On Wed, 2022-02-02 at 17:51 -0800, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Currently, MAX_SKB_FRAGS value is 17. > > For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() > attempts order-3 allocations, stuffing 32768 bytes per frag. > > But with zero copy, we use order-0 pages. > > For BIG TCP to show its full potential, we increase MAX_SKB_FRAGS > to be able to fit 45 segments per skb. > > This is also needed for BIG TCP rx zerocopy, as zerocopy currently > does not support skbs with frag list. > > We have used this MAX_SKB_FRAGS value for years at Google before > we deployed 4K MTU, with no adverse effect. > Back then, goal was to be able to receive full size (64KB) GRO > packets without the frag_list overhead. IIRC, while backporting some changes to an older RHEL kernel, we had to increase the skb overhead due to kabi issue. That caused some measurable regressions because some drivers (e.g. ixgbe) where not able any more to allocate multiple (skb) heads from the same page. All the above subject to some noise - it's a fainting memory. I'll try to do some tests with the H/W I have handy, but it could take a little time due to conflicting scheduling here. Thanks, Paolo
On Wed, 2022-02-02 at 17:51 -0800, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Currently, MAX_SKB_FRAGS value is 17. > > For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() > attempts order-3 allocations, stuffing 32768 bytes per frag. > > But with zero copy, we use order-0 pages. > > For BIG TCP to show its full potential, we increase MAX_SKB_FRAGS > to be able to fit 45 segments per skb. > > This is also needed for BIG TCP rx zerocopy, as zerocopy currently > does not support skbs with frag list. > > We have used this MAX_SKB_FRAGS value for years at Google before > we deployed 4K MTU, with no adverse effect. > Back then, goal was to be able to receive full size (64KB) GRO > packets without the frag_list overhead. > > Signed-off-by: Eric Dumazet <edumazet@google.com> So a big issue I see with this patch is the potential queueing issues it may introduce on Tx queues. I suspect it will cause a number of performance regressions and deadlocks as it will change the Tx queueing behavior for many NICs. As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of the ingredients for DESC_NEEDED in order to determine if the Tx queue needs to stop. With this change the value for igb for instance is jumping from 21 to 49, and the wake threshold is twice that, 98. As such the minimum Tx descriptor threshold for the driver would need to be updated beyond 80 otherwise it is likely to deadlock the first time it has to pause.
On Thu, Feb 3, 2022 at 9:26 AM Alexander H Duyck <alexander.duyck@gmail.com> wrote: > > On Wed, 2022-02-02 at 17:51 -0800, Eric Dumazet wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > Currently, MAX_SKB_FRAGS value is 17. > > > > For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() > > attempts order-3 allocations, stuffing 32768 bytes per frag. > > > > But with zero copy, we use order-0 pages. > > > > For BIG TCP to show its full potential, we increase MAX_SKB_FRAGS > > to be able to fit 45 segments per skb. > > > > This is also needed for BIG TCP rx zerocopy, as zerocopy currently > > does not support skbs with frag list. > > > > We have used this MAX_SKB_FRAGS value for years at Google before > > we deployed 4K MTU, with no adverse effect. > > Back then, goal was to be able to receive full size (64KB) GRO > > packets without the frag_list overhead. > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > So a big issue I see with this patch is the potential queueing issues > it may introduce on Tx queues. I suspect it will cause a number of > performance regressions and deadlocks as it will change the Tx queueing > behavior for many NICs. > > As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of > the ingredients for DESC_NEEDED in order to determine if the Tx queue > needs to stop. With this change the value for igb for instance is > jumping from 21 to 49, and the wake threshold is twice that, 98. As > such the minimum Tx descriptor threshold for the driver would need to > be updated beyond 80 otherwise it is likely to deadlock the first time > it has to pause. Are these limits hard coded in Intel drivers and firmware, or do you think this can be changed ? I could make MAX_SKB_FRAGS a config option, and default to 17, until all drivers have been fixed. Alternative is that I remove this patch from the series and we apply it to Google production kernels, as we did before.
On Thu, Feb 3, 2022 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Feb 3, 2022 at 9:26 AM Alexander H Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Wed, 2022-02-02 at 17:51 -0800, Eric Dumazet wrote: > > > From: Eric Dumazet <edumazet@google.com> > > > > > > Currently, MAX_SKB_FRAGS value is 17. > > > > > > For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() > > > attempts order-3 allocations, stuffing 32768 bytes per frag. > > > > > > But with zero copy, we use order-0 pages. > > > > > > For BIG TCP to show its full potential, we increase MAX_SKB_FRAGS > > > to be able to fit 45 segments per skb. > > > > > > This is also needed for BIG TCP rx zerocopy, as zerocopy currently > > > does not support skbs with frag list. > > > > > > We have used this MAX_SKB_FRAGS value for years at Google before > > > we deployed 4K MTU, with no adverse effect. > > > Back then, goal was to be able to receive full size (64KB) GRO > > > packets without the frag_list overhead. > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > So a big issue I see with this patch is the potential queueing issues > > it may introduce on Tx queues. I suspect it will cause a number of > > performance regressions and deadlocks as it will change the Tx queueing > > behavior for many NICs. > > > > As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of > > the ingredients for DESC_NEEDED in order to determine if the Tx queue > > needs to stop. With this change the value for igb for instance is > > jumping from 21 to 49, and the wake threshold is twice that, 98. As > > such the minimum Tx descriptor threshold for the driver would need to > > be updated beyond 80 otherwise it is likely to deadlock the first time > > it has to pause. > > Are these limits hard coded in Intel drivers and firmware, or do you > think this can be changed ? This is all code in the drivers. Most drivers have them as the logic is used to avoid having to return NETIDEV_TX_BUSY. Basically the assumption is there is a 1:1 correlation between descriptors and individual frags. So most drivers would need to increase the size of their Tx descriptor rings if they were optimized for a lower value. The other thing is that most of the tuning for things like interrupt moderation assume a certain fill level on the queues and those would likely need to be updated to account for this change. > I could make MAX_SKB_FRAGS a config option, and default to 17, until > all drivers have been fixed. > > Alternative is that I remove this patch from the series and we apply > it to Google production kernels, > as we did before. A config option would probably be preferred. The big issue as I see it is that changing MAX_SKB_FRAGS is going to have ripples throughout the ecosystem as the shared info size will be increasing and the queueing behavior for most drivers will be modified as a result.
On Thu, 3 Feb 2022 09:56:42 -0800 Alexander Duyck wrote: > > I could make MAX_SKB_FRAGS a config option, and default to 17, until > > all drivers have been fixed. > > > > Alternative is that I remove this patch from the series and we apply > > it to Google production kernels, > > as we did before. > > A config option would probably be preferred. The big issue as I see it > is that changing MAX_SKB_FRAGS is going to have ripples throughout the > ecosystem as the shared info size will be increasing and the queueing > behavior for most drivers will be modified as a result. I'd vote for making the change and dealing with the fall out. Unlikely many people would turn this knob otherwise and it's a major difference. Better not to fork the characteristics of the stack, IMHO.
On Thu, Feb 3, 2022 at 11:18 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 3 Feb 2022 09:56:42 -0800 Alexander Duyck wrote: > > > I could make MAX_SKB_FRAGS a config option, and default to 17, until > > > all drivers have been fixed. > > > > > > Alternative is that I remove this patch from the series and we apply > > > it to Google production kernels, > > > as we did before. > > > > A config option would probably be preferred. The big issue as I see it > > is that changing MAX_SKB_FRAGS is going to have ripples throughout the > > ecosystem as the shared info size will be increasing and the queueing > > behavior for most drivers will be modified as a result. > > I'd vote for making the change and dealing with the fall out. Unlikely > many people would turn this knob otherwise and it's a major difference. > Better not to fork the characteristics of the stack, IMHO. Another issue with CONFIG_ options is that they are integer. Trying the following did not work #define MAX_SKB_FRAGS ((unsigned long)CONFIG_MAX_SKB_FRAGS) Because in some places we have #if ( MAX_SKB_FRAGS > ...) (MAX_SKB_FRAGS is UL currently, making it an integer might cause some signed/unsigned operations buggy)
On Thu, Feb 3, 2022 at 11:20 AM Eric Dumazet <edumazet@google.com> wrote: > > Another issue with CONFIG_ options is that they are integer. > > Trying the following did not work > > #define MAX_SKB_FRAGS ((unsigned long)CONFIG_MAX_SKB_FRAGS) > > Because in some places we have > > #if ( MAX_SKB_FRAGS > ...) > > (MAX_SKB_FRAGS is UL currently, making it an integer might cause some > signed/unsigned operations buggy) I came to something like this, clearly this a bit ugly. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 08c12c41c5a5907dccc7389f396394d8132d962e..cc3cac3ee109f95c8a51eb90ba4a3bf7bebe86eb 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -323,7 +323,15 @@ enum skb_drop_reason { SKB_DROP_REASON_MAX, }; +#ifdef CONFIG_MAX_SKB_FRAGS_17 +#define MAX_SKB_FRAGS 17UL +#endif +#ifdef CONFIG_MAX_SKB_FRAGS_25 +#define MAX_SKB_FRAGS 25UL +#endif +#ifdef CONFIG_MAX_SKB_FRAGS_45 #define MAX_SKB_FRAGS 45UL +#endif extern int sysctl_max_skb_frags; diff --git a/net/Kconfig b/net/Kconfig index 8a1f9d0287de3c32040eee03b60114c6e6d150bc..d91027a654c2aad7bfa55152ef81c882bf394aff 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -253,6 +253,29 @@ config PCPU_DEV_REFCNT network device refcount are using per cpu variables if this option is set. This can be forced to N to detect underflows (with a performance drop). +choice + prompt "Maximum number of fragments per skb_shared_info" + default MAX_SKB_FRAGS_17 + +config MAX_SKB_FRAGS_17 + bool "17 fragments per skb_shared_info" + help + Some drivers have assumptions about MAX_SKB_FRAGS being 17. + Until they are fixed, it is safe to adopt the old limit. + +config MAX_SKB_FRAGS_25 + bool "25 fragments per skb_shared_info" + help + Helps BIG TCP workloads, but might expose bugs in some legacy drivers. + +config MAX_SKB_FRAGS_45 + bool "45 fragments per skb_shared_info" + help + Helps BIG TCP workloads, but might expose bugs in some legacy drivers. + This also increase memory overhead of small packets. + +endchoice + config RPS bool depends on SMP && SYSFS
From: Alexander Duyck > Sent: 03 February 2022 17:57 ... > > > So a big issue I see with this patch is the potential queueing issues > > > it may introduce on Tx queues. I suspect it will cause a number of > > > performance regressions and deadlocks as it will change the Tx queueing > > > behavior for many NICs. > > > > > > As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of > > > the ingredients for DESC_NEEDED in order to determine if the Tx queue > > > needs to stop. With this change the value for igb for instance is > > > jumping from 21 to 49, and the wake threshold is twice that, 98. As > > > such the minimum Tx descriptor threshold for the driver would need to > > > be updated beyond 80 otherwise it is likely to deadlock the first time > > > it has to pause. > > > > Are these limits hard coded in Intel drivers and firmware, or do you > > think this can be changed ? > > This is all code in the drivers. Most drivers have them as the logic > is used to avoid having to return NETIDEV_TX_BUSY. Basically the > assumption is there is a 1:1 correlation between descriptors and > individual frags. So most drivers would need to increase the size of > their Tx descriptor rings if they were optimized for a lower value. Maybe the drivers can be a little less conservative about the number of fragments they expect in the next message? There is little point requiring 49 free descriptors when the workload never has more than 2 or 3 fragments. Clearly you don't want to re-enable things unless there are enough descriptors for an skb that has generated NETDEV_TX_BUSY, but the current logic of 'trying to never actually return NETDEV_TX_BUSY' is probably over cautious. Does Linux allow skb to have a lot of short fragments? If dma_map isn't cheap (probably anything with an iommu or non-coherent memory) them copying/merging short fragments into a pre-mapped buffer can easily be faster. Many years ago we found it was worth copying anything under 1k on a sparc mbus+sbus system. I don't think Linux can generate what I've seen elsewhere - the mac driver being asked to transmit something with 1000+ one byte fragmemts! David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Feb 4, 2022 at 2:18 AM David Laight <David.Laight@aculab.com> wrote: > > From: Alexander Duyck > > Sent: 03 February 2022 17:57 > ... > > > > So a big issue I see with this patch is the potential queueing issues > > > > it may introduce on Tx queues. I suspect it will cause a number of > > > > performance regressions and deadlocks as it will change the Tx queueing > > > > behavior for many NICs. > > > > > > > > As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of > > > > the ingredients for DESC_NEEDED in order to determine if the Tx queue > > > > needs to stop. With this change the value for igb for instance is > > > > jumping from 21 to 49, and the wake threshold is twice that, 98. As > > > > such the minimum Tx descriptor threshold for the driver would need to > > > > be updated beyond 80 otherwise it is likely to deadlock the first time > > > > it has to pause. > > > > > > Are these limits hard coded in Intel drivers and firmware, or do you > > > think this can be changed ? > > > > This is all code in the drivers. Most drivers have them as the logic > > is used to avoid having to return NETIDEV_TX_BUSY. Basically the > > assumption is there is a 1:1 correlation between descriptors and > > individual frags. So most drivers would need to increase the size of > > their Tx descriptor rings if they were optimized for a lower value. > > Maybe the drivers can be a little less conservative about the number > of fragments they expect in the next message? > There is little point requiring 49 free descriptors when the workload > never has more than 2 or 3 fragments. > > Clearly you don't want to re-enable things unless there are enough > descriptors for an skb that has generated NETDEV_TX_BUSY, but the > current logic of 'trying to never actually return NETDEV_TX_BUSY' > is probably over cautious. The problem is that NETDEV_TX_BUSY can cause all sorts of issues in terms of the flow of packets. Basically when you start having to push packets back from the device to the qdisc you can essentially create head-of-line blocking type scenarios which can make things like traffic shaping that much more difficult. > Does Linux allow skb to have a lot of short fragments? > If dma_map isn't cheap (probably anything with an iommu or non-coherent > memory) them copying/merging short fragments into a pre-mapped > buffer can easily be faster. I know Linux skbs can have a lot of short fragments. The i40e has a workaround for cases where more than 8 fragments are needed to transmit a single frame for instance, see __i40e_chk_linearize(). > Many years ago we found it was worth copying anything under 1k on > a sparc mbus+sbus system. > I don't think Linux can generate what I've seen elsewhere - the mac > driver being asked to transmit something with 1000+ one byte fragmemts! > > David Linux cannot generate the 1000+ fragments, mainly because it is limited by the frags. However as I pointed out above it isn't uncommon to see an skb composed of a number of smaller fragments. That said, I don't know if we really need to be rewriting the code for NETDEV_TX_BUSY handling on the drivers. It would just be a matter of reserving more memory in the descriptor rings since the counts would be going from 42 to 98 in order to unblock a Tx queue in the case of igb for instance, and currently the minimum ring size is 80. So in this case it would just be a matter of increasing the minimum so that it cannot be configured into a deadlock. Ultimately that is the trade-off with this approach. What we are doing is increasing the memory footprint of the drivers and skbs in order to allow for more buffering in the skb to increase throughput. I wonder if it wouldn't make sense to just make MAX_SKB_FRAGS a driver level setting like gso_max_size so that low end NICs out there aren't having to reserve a ton of memory to store fragments they will never use. - Alex
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a27bcc4f7e9a92ea4f4f4f6e5f454bb4f8099f66..08c12c41c5a5907dccc7389f396394d8132d962e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -323,18 +323,8 @@ enum skb_drop_reason { SKB_DROP_REASON_MAX, }; -/* To allow 64K frame to be packed as single skb without frag_list we - * require 64K/PAGE_SIZE pages plus 1 additional page to allow for - * buffers which do not start on a page boundary. - * - * Since GRO uses frags we allocate at least 16 regardless of page - * size. - */ -#if (65536/PAGE_SIZE + 1) < 16 -#define MAX_SKB_FRAGS 16UL -#else -#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1) -#endif +#define MAX_SKB_FRAGS 45UL + extern int sysctl_max_skb_frags; /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to