From patchwork Wed Apr 9 18:22:30 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14045218
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jordan Rife
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Aditi Ghag, Daniel Borkmann, Martin KaFai Lau, Willem de Bruijn, Kuniyuki Iwashima
Subject: [PATCH v1 bpf-next 1/5] bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items
Date: Wed, 9 Apr 2025 11:22:30 -0700
Message-ID: <20250409182237.441532-2-jordan@jrife.io>
In-Reply-To: <20250409182237.441532-1-jordan@jrife.io>
References: <20250409182237.441532-1-jordan@jrife.io>

Prepare for the next commit, which tracks cookies between iterations, by
converting struct sock **batch to union bpf_udp_iter_batch_item *batch
inside struct bpf_udp_iter_state.
Signed-off-by: Jordan Rife
Reviewed-by: Kuniyuki Iwashima
---
 net/ipv4/udp.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d0bffcfa56d8..59c3281962b9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3384,13 +3384,17 @@ struct bpf_iter__udp {
     int bucket __aligned(8);
 };
 
+union bpf_udp_iter_batch_item {
+    struct sock *sock;
+};
+
 struct bpf_udp_iter_state {
     struct udp_iter_state state;
     unsigned int cur_sk;
     unsigned int end_sk;
     unsigned int max_sk;
     int offset;
-    struct sock **batch;
+    union bpf_udp_iter_batch_item *batch;
     bool st_bucket_done;
 };
 
@@ -3449,7 +3453,7 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
             }
             if (iter->end_sk < iter->max_sk) {
                 sock_hold(sk);
-                iter->batch[iter->end_sk++] = sk;
+                iter->batch[iter->end_sk++].sock = sk;
             }
             batch_sks++;
         }
@@ -3479,7 +3483,7 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
         goto again;
     }
 done:
-    return iter->batch[0];
+    return iter->batch[0].sock;
 }
 
 static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
@@ -3491,7 +3495,7 @@ static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
      * done with seq_show(), so unref the iter->cur_sk.
      */
     if (iter->cur_sk < iter->end_sk) {
-        sock_put(iter->batch[iter->cur_sk++]);
+        sock_put(iter->batch[iter->cur_sk++].sock);
         ++iter->offset;
     }
 
@@ -3499,7 +3503,7 @@
      * available in the current bucket batch.
      */
     if (iter->cur_sk < iter->end_sk)
-        sk = iter->batch[iter->cur_sk];
+        sk = iter->batch[iter->cur_sk].sock;
     else
         /* Prepare a new batch.
          */
         sk = bpf_iter_udp_batch(seq);
 
@@ -3564,7 +3568,7 @@ static int bpf_iter_udp_seq_show(struct seq_file *seq, void *v)
 
 static void bpf_iter_udp_put_batch(struct bpf_udp_iter_state *iter)
 {
     while (iter->cur_sk < iter->end_sk)
-        sock_put(iter->batch[iter->cur_sk++]);
+        sock_put(iter->batch[iter->cur_sk++].sock);
 }
 
 static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
@@ -3827,7 +3831,7 @@ DEFINE_BPF_ITER_FUNC(udp, struct bpf_iter_meta *meta,
 static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
                                       unsigned int new_batch_sz)
 {
-    struct sock **new_batch;
+    union bpf_udp_iter_batch_item *new_batch;
 
     new_batch = kvmalloc_array(new_batch_sz, sizeof(*new_batch),
                                GFP_USER | __GFP_NOWARN);

From patchwork Wed Apr 9 18:22:31 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14045220
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jordan Rife
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Aditi Ghag, Daniel Borkmann, Martin KaFai Lau, Willem de Bruijn, Kuniyuki Iwashima
Subject: [PATCH v1 bpf-next 2/5] bpf: udp: Avoid socket skips and repeats during iteration
Date: Wed, 9 Apr 2025 11:22:31 -0700
Message-ID: <20250409182237.441532-3-jordan@jrife.io>
In-Reply-To: <20250409182237.441532-1-jordan@jrife.io>
References: <20250409182237.441532-1-jordan@jrife.io>

Replace the offset-based approach for tracking progress through a bucket
in the UDP table with one based on socket cookies. Remember the cookies
of unprocessed sockets from the last batch and use this list to pick up
where we left off or, if the next socket disappears between reads, find
the first socket after that point that still exists in the bucket and
resume from there.

To make the control flow easier to follow inside bpf_iter_udp_batch,
introduce the udp_portaddr_for_each_entry_from macro and use it to split
bucket processing into two stages: finding the starting point and adding
items to the next batch. Originally, I implemented this patch inside a
single udp_portaddr_for_each_entry loop, as it was before, but found the
resulting logic a bit messy. Overall, this version seems more readable.
Signed-off-by: Jordan Rife
---
 include/linux/udp.h |  3 ++
 net/ipv4/udp.c      | 78 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0807e21cfec9..a69da9c4c1c5 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -209,6 +209,9 @@ static inline void udp_allow_gso(struct sock *sk)
 #define udp_portaddr_for_each_entry(__sk, list) \
     hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
+#define udp_portaddr_for_each_entry_from(__sk) \
+    hlist_for_each_entry_from(__sk, __sk_common.skc_portaddr_node)
+
 #define udp_portaddr_for_each_entry_rcu(__sk, list) \
     hlist_for_each_entry_rcu(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 59c3281962b9..f6a579d61717 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -93,6 +93,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3386,6 +3387,7 @@ struct bpf_iter__udp {
 
 union bpf_udp_iter_batch_item {
     struct sock *sock;
+    __u64 cookie;
 };
 
 struct bpf_udp_iter_state {
@@ -3393,26 +3395,42 @@ struct bpf_udp_iter_state {
     unsigned int cur_sk;
     unsigned int end_sk;
     unsigned int max_sk;
-    int offset;
     union bpf_udp_iter_batch_item *batch;
     bool st_bucket_done;
 };
 
 static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
                                       unsigned int new_batch_sz);
+static struct sock *bpf_iter_udp_resume(struct sock *first_sk,
+                                        union bpf_udp_iter_batch_item *cookies,
+                                        int n_cookies)
+{
+    struct sock *sk = NULL;
+    int i = 0;
+
+    for (; i < n_cookies; i++) {
+        sk = first_sk;
+        udp_portaddr_for_each_entry_from(sk)
+            if (cookies[i].cookie == atomic64_read(&sk->sk_cookie))
+                goto done;
+    }
+done:
+    return sk;
+}
+
 static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
 {
     struct bpf_udp_iter_state *iter = seq->private;
     struct udp_iter_state *state = &iter->state;
+    unsigned int find_cookie, end_cookie = 0;
     struct net *net = seq_file_net(seq);
-    int resume_bucket, resume_offset;
     struct udp_table *udptable;
     unsigned int batch_sks = 0;
     bool resized = false;
+    int resume_bucket;
     struct sock *sk;
 
     resume_bucket = state->bucket;
-    resume_offset = iter->offset;
 
     /* The current batch is done, so advance the bucket. */
     if (iter->st_bucket_done)
@@ -3428,6 +3446,8 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
      * before releasing the bucket lock. This allows BPF programs that are
      * called in seq_show to acquire the bucket lock if needed.
      */
+    find_cookie = iter->cur_sk;
+    end_cookie = iter->end_sk;
     iter->cur_sk = 0;
     iter->end_sk = 0;
     iter->st_bucket_done = false;
@@ -3439,18 +3459,26 @@
         if (hlist_empty(&hslot2->head))
             continue;
 
-        iter->offset = 0;
         spin_lock_bh(&hslot2->lock);
-        udp_portaddr_for_each_entry(sk, &hslot2->head) {
+        /* Initialize sk to the first socket in hslot2. */
+        sk = hlist_entry_safe(hslot2->head.first, struct sock,
+                              __sk_common.skc_portaddr_node);
+        /* Resume from the first (in iteration order) unseen socket from
+         * the last batch that still exists in resume_bucket. Most of
+         * the time this will just be where the last iteration left off
+         * in resume_bucket unless that socket disappeared between
+         * reads.
+         *
+         * Skip this if end_cookie isn't set; this is the first
+         * batch, we're on bucket zero, and we want to start from the
+         * beginning.
+         */
+        if (state->bucket == resume_bucket && end_cookie)
+            sk = bpf_iter_udp_resume(sk,
+                                     &iter->batch[find_cookie],
+                                     end_cookie - find_cookie);
+        udp_portaddr_for_each_entry_from(sk) {
             if (seq_sk_match(seq, sk)) {
-                /* Resume from the last iterated socket at the
-                 * offset in the bucket before iterator was stopped.
-                 */
-                if (state->bucket == resume_bucket &&
-                    iter->offset < resume_offset) {
-                    ++iter->offset;
-                    continue;
-                }
                 if (iter->end_sk < iter->max_sk) {
                     sock_hold(sk);
                     iter->batch[iter->end_sk++].sock = sk;
@@ -3494,10 +3522,8 @@ static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
     /* Whenever seq_next() is called, the iter->cur_sk is
      * done with seq_show(), so unref the iter->cur_sk.
      */
-    if (iter->cur_sk < iter->end_sk) {
+    if (iter->cur_sk < iter->end_sk)
         sock_put(iter->batch[iter->cur_sk++].sock);
-        ++iter->offset;
-    }
 
     /* After updating iter->cur_sk, check if there are more sockets
      * available in the current bucket batch.
@@ -3567,8 +3593,19 @@ static int bpf_iter_udp_seq_show(struct seq_file *seq, void *v)
 
 static void bpf_iter_udp_put_batch(struct bpf_udp_iter_state *iter)
 {
-    while (iter->cur_sk < iter->end_sk)
-        sock_put(iter->batch[iter->cur_sk++].sock);
+    union bpf_udp_iter_batch_item *item;
+    unsigned int cur_sk = iter->cur_sk;
+    __u64 cookie;
+
+    /* Remember the cookies of the sockets we haven't seen yet, so we can
+     * pick up where we left off next time around.
+     */
+    while (cur_sk < iter->end_sk) {
+        item = &iter->batch[cur_sk++];
+        cookie = __sock_gen_cookie(item->sock);
+        sock_put(item->sock);
+        item->cookie = cookie;
+    }
 }
 
 static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
@@ -3839,6 +3876,11 @@ static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
         return -ENOMEM;
 
     bpf_iter_udp_put_batch(iter);
+    WARN_ON_ONCE(new_batch_sz < iter->max_sk);
+    /* Make sure the new batch has the cookies of the sockets we haven't
+     * visited yet.
+     */
+    memcpy(new_batch, iter->batch, iter->end_sk * sizeof(*iter->batch));
     kvfree(iter->batch);
     iter->batch = new_batch;
     iter->max_sk = new_batch_sz;

From patchwork Wed Apr 9 18:22:32 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14045221
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jordan Rife
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Aditi Ghag, Daniel Borkmann, Martin KaFai Lau, Willem de Bruijn, Kuniyuki Iwashima
Subject: [PATCH v1 bpf-next 3/5] bpf: udp: Propagate ENOMEM up from bpf_iter_udp_batch
Date: Wed, 9 Apr 2025 11:22:32 -0700
Message-ID: <20250409182237.441532-4-jordan@jrife.io>
In-Reply-To: <20250409182237.441532-1-jordan@jrife.io>
References: <20250409182237.441532-1-jordan@jrife.io>

Stop iteration if the current bucket can't be contained in a single
batch, to avoid choosing between skipping and repeating sockets. When
none of the saved cookies can be found in the current bucket and the
batch isn't big enough to hold all the sockets in the bucket, there are
really only two choices, neither of which is desirable:

1. Start from the beginning, assuming we haven't seen any sockets in
   the current bucket, in which case we might repeat a socket we've
   already seen.
2. Go to the next bucket to avoid repeating a socket we may have
   already seen, in which case we may accidentally skip a socket that
   we didn't yet visit.

To avoid this tradeoff, enforce the invariant that the batch always
contains a full snapshot of the bucket from last time by returning
-ENOMEM if bpf_iter_udp_realloc_batch() can't grab enough memory to fit
all sockets in the current bucket.

To test this code path, I forced bpf_iter_udp_realloc_batch() to return
-ENOMEM when called from within bpf_iter_udp_batch() and observed that
read() fails in userspace with errno set to ENOMEM. Otherwise, this
scenario is hard to test.

Link: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/
Signed-off-by: Jordan Rife
Reviewed-by: Kuniyuki Iwashima
---
 net/ipv4/udp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f6a579d61717..de58dae6ff3c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3429,6 +3429,7 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
     bool resized = false;
     int resume_bucket;
     struct sock *sk;
+    int err = 0;
 
     resume_bucket = state->bucket;
 
@@ -3503,7 +3504,11 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
         iter->st_bucket_done = true;
         goto done;
     }
-    if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
+    if (!resized) {
+        err = bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2);
+        if (err)
+            return ERR_PTR(err);
+
         resized = true;
         /* After allocating a larger batch, retry one more time to grab
          * the whole bucket.
From patchwork Wed Apr 9 18:22:33 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14045222
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jordan Rife
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Aditi Ghag, Daniel Borkmann, Martin KaFai Lau, Willem de Bruijn, Kuniyuki Iwashima
Subject: [PATCH v1 bpf-next 4/5] selftests/bpf: Return socket cookies from sock_iter_batch progs
Date: Wed, 9 Apr 2025 11:22:33 -0700
Message-ID: <20250409182237.441532-5-jordan@jrife.io>
In-Reply-To: <20250409182237.441532-1-jordan@jrife.io>
References: <20250409182237.441532-1-jordan@jrife.io>

Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the
cookie of the current socket, so that we can track the identity of the
sockets that the iterator has seen so far. Update the existing do_test
function to account for this change to the iterator program output. At
the same time, teach both programs to work with AF_INET as well.
Signed-off-by: Jordan Rife --- .../bpf/prog_tests/sock_iter_batch.c | 33 +++++++++++-------- .../selftests/bpf/progs/bpf_tracing_net.h | 1 + .../selftests/bpf/progs/sock_iter_batch.c | 24 +++++++++++--- 3 files changed, 41 insertions(+), 17 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c b/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c index d56e18b25528..74dbe91806a0 100644 --- a/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c +++ b/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c @@ -9,12 +9,18 @@ static const int nr_soreuse = 4; +struct iter_out { + int idx; + __u64 cookie; +} __packed; + static void do_test(int sock_type, bool onebyone) { int err, i, nread, to_read, total_read, iter_fd = -1; - int first_idx, second_idx, indices[nr_soreuse]; + struct iter_out outputs[nr_soreuse]; struct bpf_link *link = NULL; struct sock_iter_batch *skel; + int first_idx, second_idx; int *fds[2] = {}; skel = sock_iter_batch__open(); @@ -34,6 +40,7 @@ static void do_test(int sock_type, bool onebyone) goto done; skel->rodata->ports[i] = ntohs(local_port); } + skel->rodata->sf = AF_INET6; err = sock_iter_batch__load(skel); if (!ASSERT_OK(err, "sock_iter_batch__load")) @@ -55,38 +62,38 @@ static void do_test(int sock_type, bool onebyone) * from a bucket and leave one socket out from * that bucket on purpose. */ - to_read = (nr_soreuse - 1) * sizeof(*indices); + to_read = (nr_soreuse - 1) * sizeof(*outputs); total_read = 0; first_idx = -1; do { - nread = read(iter_fd, indices, onebyone ? sizeof(*indices) : to_read); - if (nread <= 0 || nread % sizeof(*indices)) + nread = read(iter_fd, outputs, onebyone ? 
sizeof(*outputs) : to_read); + if (nread <= 0 || nread % sizeof(*outputs)) break; total_read += nread; if (first_idx == -1) - first_idx = indices[0]; - for (i = 0; i < nread / sizeof(*indices); i++) - ASSERT_EQ(indices[i], first_idx, "first_idx"); + first_idx = outputs[0].idx; + for (i = 0; i < nread / sizeof(*outputs); i++) + ASSERT_EQ(outputs[i].idx, first_idx, "first_idx"); } while (total_read < to_read); - ASSERT_EQ(nread, onebyone ? sizeof(*indices) : to_read, "nread"); + ASSERT_EQ(nread, onebyone ? sizeof(*outputs) : to_read, "nread"); ASSERT_EQ(total_read, to_read, "total_read"); free_fds(fds[first_idx], nr_soreuse); fds[first_idx] = NULL; /* Read the "whole" second bucket */ - to_read = nr_soreuse * sizeof(*indices); + to_read = nr_soreuse * sizeof(*outputs); total_read = 0; second_idx = !first_idx; do { - nread = read(iter_fd, indices, onebyone ? sizeof(*indices) : to_read); - if (nread <= 0 || nread % sizeof(*indices)) + nread = read(iter_fd, outputs, onebyone ? sizeof(*outputs) : to_read); + if (nread <= 0 || nread % sizeof(*outputs)) break; total_read += nread; - for (i = 0; i < nread / sizeof(*indices); i++) - ASSERT_EQ(indices[i], second_idx, "second_idx"); + for (i = 0; i < nread / sizeof(*outputs); i++) + ASSERT_EQ(outputs[i].idx, second_idx, "second_idx"); } while (total_read <= to_read); ASSERT_EQ(nread, 0, "nread"); /* Both so_reuseport ports should be in different buckets, so diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h index 659694162739..17db400f0e0d 100644 --- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h +++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h @@ -128,6 +128,7 @@ #define sk_refcnt __sk_common.skc_refcnt #define sk_state __sk_common.skc_state #define sk_net __sk_common.skc_net +#define sk_rcv_saddr __sk_common.skc_rcv_saddr #define sk_v6_daddr __sk_common.skc_v6_daddr #define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr #define sk_flags 
__sk_common.skc_flags diff --git a/tools/testing/selftests/bpf/progs/sock_iter_batch.c b/tools/testing/selftests/bpf/progs/sock_iter_batch.c index 96531b0d9d55..8f483337e103 100644 --- a/tools/testing/selftests/bpf/progs/sock_iter_batch.c +++ b/tools/testing/selftests/bpf/progs/sock_iter_batch.c @@ -17,6 +17,12 @@ static bool ipv6_addr_loopback(const struct in6_addr *a) a->s6_addr32[2] | (a->s6_addr32[3] ^ bpf_htonl(1))) == 0; } +static bool ipv4_addr_loopback(__be32 a) +{ + return a == bpf_ntohl(0x7f000001); +} + +volatile const unsigned int sf; volatile const __u16 ports[2]; unsigned int bucket[2]; @@ -26,16 +32,20 @@ int iter_tcp_soreuse(struct bpf_iter__tcp *ctx) struct sock *sk = (struct sock *)ctx->sk_common; struct inet_hashinfo *hinfo; unsigned int hash; + __u64 sock_cookie; struct net *net; int idx; if (!sk) return 0; + sock_cookie = bpf_get_socket_cookie(sk); sk = bpf_core_cast(sk, struct sock); - if (sk->sk_family != AF_INET6 || + if (sk->sk_family != sf || sk->sk_state != TCP_LISTEN || - !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr)) + sk->sk_family == AF_INET6 ? + !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr) : + !ipv4_addr_loopback(sk->sk_rcv_saddr)) return 0; if (sk->sk_num == ports[0]) @@ -52,6 +62,7 @@ int iter_tcp_soreuse(struct bpf_iter__tcp *ctx) hinfo = net->ipv4.tcp_death_row.hashinfo; bucket[idx] = hash & hinfo->lhash2_mask; bpf_seq_write(ctx->meta->seq, &idx, sizeof(idx)); + bpf_seq_write(ctx->meta->seq, &sock_cookie, sizeof(sock_cookie)); return 0; } @@ -63,14 +74,18 @@ int iter_udp_soreuse(struct bpf_iter__udp *ctx) { struct sock *sk = (struct sock *)ctx->udp_sk; struct udp_table *udptable; + __u64 sock_cookie; int idx; if (!sk) return 0; + sock_cookie = bpf_get_socket_cookie(sk); sk = bpf_core_cast(sk, struct sock); - if (sk->sk_family != AF_INET6 || - !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr)) + if (sk->sk_family != sf || + sk->sk_family == AF_INET6 ? 
+ !ipv6_addr_loopback(&sk->sk_v6_rcv_saddr) : + !ipv4_addr_loopback(sk->sk_rcv_saddr)) return 0; if (sk->sk_num == ports[0]) @@ -84,6 +99,7 @@ int iter_udp_soreuse(struct bpf_iter__udp *ctx) udptable = sk->sk_net.net->ipv4.udp_table; bucket[idx] = udp_sk(sk)->udp_portaddr_hash & udptable->mask; bpf_seq_write(ctx->meta->seq, &idx, sizeof(idx)); + bpf_seq_write(ctx->meta->seq, &sock_cookie, sizeof(sock_cookie)); return 0; }

From patchwork Wed Apr 9 18:22:34 2025
X-Patchwork-Submitter: Jordan Rife
X-Patchwork-Id: 14045223
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jordan Rife
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Jordan Rife, Aditi Ghag, Daniel Borkmann, Martin KaFai Lau, Willem de Bruijn, Kuniyuki Iwashima
Subject: [PATCH v1 bpf-next 5/5] selftests/bpf: Add tests for bucket resume logic in UDP socket iterators
Date: Wed, 9 Apr 2025 11:22:34 -0700
Message-ID: <20250409182237.441532-6-jordan@jrife.io>
In-Reply-To: <20250409182237.441532-1-jordan@jrife.io>
References: <20250409182237.441532-1-jordan@jrife.io>

Introduce a set of tests that exercise various bucket resume scenarios:

* remove_seen resumes iteration after removing a socket from the bucket
  that we've already processed.
Before, with the offset-based approach, this test would have skipped an
  unseen socket after resuming iteration. With the cookie-based
  approach, we now see all sockets exactly once.

* remove_unseen exercises the condition where the next socket that we
  would have seen is removed from the bucket before we resume iteration.
  This tests the scenario where we need to scan past the first cookie in
  our list of remembered cookies to find the socket from which to resume
  iteration.

* remove_all exercises the condition where all the sockets we remembered
  were removed from the bucket, to make sure iteration terminates and
  returns no more results.

* add_some exercises the condition where a few sockets, but not enough
  to trigger a realloc, are added to the head of the current bucket
  between reads. Before, with the offset-based approach, this test would
  have repeated sockets we had already seen. With the cookie-based
  approach, we now see all sockets exactly once.

* force_realloc exercises the condition where we need to realloc the
  batch on a subsequent read, since more sockets were added to the
  current bucket than the current batch array can hold. This exercises
  the logic inside bpf_iter_udp_realloc_batch that copies cookies into
  the new batch to make sure nothing is skipped or repeated.
Signed-off-by: Jordan Rife --- .../bpf/prog_tests/sock_iter_batch.c | 418 ++++++++++++++++++ 1 file changed, 418 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c b/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c index 74dbe91806a0..93b992fa5efe 100644 --- a/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c +++ b/tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c @@ -7,6 +7,7 @@ #define TEST_NS "sock_iter_batch_netns" +static const int init_batch_size = 16; static const int nr_soreuse = 4; struct iter_out { @@ -14,6 +15,422 @@ struct iter_out { __u64 cookie; } __packed; +struct sock_count { + __u64 cookie; + int count; +}; + +static int insert(__u64 cookie, struct sock_count counts[], int counts_len) +{ + int insert = -1; + int i = 0; + + for (; i < counts_len; i++) { + if (!counts[i].cookie) { + insert = i; + } else if (counts[i].cookie == cookie) { + insert = i; + break; + } + } + if (insert < 0) + return insert; + + counts[insert].cookie = cookie; + counts[insert].count++; + + return counts[insert].count; +} + +static int read_n(int iter_fd, int n, struct sock_count counts[], + int counts_len) +{ + struct iter_out out; + int nread = 1; + int i = 0; + + for (; nread > 0 && (n < 0 || i < n); i++) { + nread = read(iter_fd, &out, sizeof(out)); + if (!nread || !ASSERT_GE(nread, 1, "nread")) + break; + ASSERT_GE(insert(out.cookie, counts, counts_len), 0, "insert"); + } + + ASSERT_TRUE(n < 0 || i == n, "n < 0 || i == n"); + + return i; +} + +static __u64 socket_cookie(int fd) +{ + __u64 cookie; + socklen_t cookie_len = sizeof(cookie); + static __u32 duration; /* for CHECK macro */ + + if (CHECK(getsockopt(fd, SOL_SOCKET, SO_COOKIE, &cookie, &cookie_len) < 0, + "getsockopt(SO_COOKIE)", "%s\n", strerror(errno))) + return 0; + return cookie; +} + +static bool was_seen(int fd, struct sock_count counts[], int counts_len) +{ + __u64 cookie = socket_cookie(fd); + int i = 0; + + for (; cookie && i < counts_len; i++) + if 
(cookie == counts[i].cookie) + return true; + + return false; +} + +static int get_seen_socket(int *fds, struct sock_count counts[], int n) +{ + int i = 0; + + for (; i < n; i++) + if (was_seen(fds[i], counts, n)) + return i; + return -1; +} + +static int get_nth_socket(int *fds, int fds_len, struct bpf_link *link, int n) +{ + int i, nread, iter_fd; + int nth_sock_idx = -1; + struct iter_out out; + + iter_fd = bpf_iter_create(bpf_link__fd(link)); + if (!ASSERT_GE(iter_fd, 0, "bpf_iter_create")) + return -1; + + for (; n >= 0; n--) { + nread = read(iter_fd, &out, sizeof(out)); + if (!nread || !ASSERT_GE(nread, 1, "nread")) + goto done; + } + + for (i = 0; i < fds_len && nth_sock_idx < 0; i++) + if (fds[i] >= 0 && socket_cookie(fds[i]) == out.cookie) + nth_sock_idx = i; +done: + if (iter_fd >= 0) + close(iter_fd); + return nth_sock_idx; +} + +static int get_seen_count(int fd, struct sock_count counts[], int n) +{ + __u64 cookie = socket_cookie(fd); + int count = 0; + int i = 0; + + for (; cookie && !count && i < n; i++) + if (cookie == counts[i].cookie) + count = counts[i].count; + + return count; +} + +static void check_n_were_seen_once(int *fds, int fds_len, int n, + struct sock_count counts[], int counts_len) +{ + int seen_once = 0; + int seen_cnt; + int i = 0; + + for (; i < fds_len; i++) { + /* Skip any sockets that were closed or that weren't seen + * exactly once. + */ + if (fds[i] < 0) + continue; + seen_cnt = get_seen_count(fds[i], counts, counts_len); + if (seen_cnt && ASSERT_EQ(seen_cnt, 1, "seen_cnt")) + seen_once++; + } + + ASSERT_EQ(seen_once, n, "seen_once"); +} + +static void remove_seen(int family, int sock_type, const char *addr, __u16 port, + int *socks, int socks_len, struct sock_count *counts, + int counts_len, struct bpf_link *link, int iter_fd) +{ + int close_idx; + + /* Iterate through the first socks_len - 1 sockets. */ + read_n(iter_fd, socks_len - 1, counts, counts_len); + + /* Make sure we saw socks_len - 1 sockets exactly once. 
*/ + check_n_were_seen_once(socks, socks_len, socks_len - 1, counts, + counts_len); + + /* Close a socket we've already seen to remove it from the bucket. */ + close_idx = get_seen_socket(socks, counts, counts_len); + if (!ASSERT_GE(close_idx, 0, "close_idx")) + return; + close(socks[close_idx]); + socks[close_idx] = -1; + + /* Iterate through the rest of the sockets. */ + read_n(iter_fd, -1, counts, counts_len); + + /* Make sure the last socket wasn't skipped and that there were no + * repeats. + */ + check_n_were_seen_once(socks, socks_len, socks_len - 1, counts, + counts_len); +} + +static void remove_unseen(int family, int sock_type, const char *addr, + __u16 port, int *socks, int socks_len, + struct sock_count *counts, int counts_len, + struct bpf_link *link, int iter_fd) +{ + int close_idx; + + /* Iterate through the first socket. */ + read_n(iter_fd, 1, counts, counts_len); + + /* Make sure we saw a socket from fds. */ + check_n_were_seen_once(socks, socks_len, 1, counts, counts_len); + + /* Close what would be the next socket in the bucket to exercise the + * condition where we need to skip past the first cookie we remembered. + */ + close_idx = get_nth_socket(socks, socks_len, link, 1); + if (!ASSERT_GE(close_idx, 0, "close_idx")) + return; + close(socks[close_idx]); + socks[close_idx] = -1; + + /* Iterate through the rest of the sockets. */ + read_n(iter_fd, -1, counts, counts_len); + + /* Make sure the remaining sockets were seen exactly once and that we + * didn't repeat the socket that was already seen. + */ + check_n_were_seen_once(socks, socks_len, socks_len - 1, counts, + counts_len); +} + +static void remove_all(int family, int sock_type, const char *addr, + __u16 port, int *socks, int socks_len, + struct sock_count *counts, int counts_len, + struct bpf_link *link, int iter_fd) +{ + int close_idx, i; + + /* Iterate through the first socket. */ + read_n(iter_fd, 1, counts, counts_len); + + /* Make sure we saw a socket from fds. 
*/ + check_n_were_seen_once(socks, socks_len, 1, counts, counts_len); + + /* Close all remaining sockets to exhaust the list of saved cookies and + * exit without putting any sockets into the batch on the next read. + */ + for (i = 0; i < socks_len - 1; i++) { + close_idx = get_nth_socket(socks, socks_len, link, 1); + if (!ASSERT_GE(close_idx, 0, "close_idx")) + return; + close(socks[close_idx]); + socks[close_idx] = -1; + } + + /* Make sure there are no more sockets returned */ + ASSERT_EQ(read_n(iter_fd, -1, counts, counts_len), 0, "read_n"); +} + +static void add_some(int family, int sock_type, const char *addr, __u16 port, + int *socks, int socks_len, struct sock_count *counts, + int counts_len, struct bpf_link *link, int iter_fd) +{ + int *new_socks = NULL; + + /* Iterate through the first socks_len - 1 sockets. */ + read_n(iter_fd, socks_len - 1, counts, counts_len); + + /* Make sure we saw socks_len - 1 sockets exactly once. */ + check_n_were_seen_once(socks, socks_len, socks_len - 1, counts, + counts_len); + + /* Double the number of sockets in the bucket. */ + new_socks = start_reuseport_server(family, sock_type, addr, port, 0, + socks_len); + if (!ASSERT_OK_PTR(new_socks, "start_reuseport_server")) + goto done; + + /* Iterate through the rest of the sockets. */ + read_n(iter_fd, -1, counts, counts_len); + + /* Make sure each of the original sockets was seen exactly once. */ + check_n_were_seen_once(socks, socks_len, socks_len, counts, + counts_len); +done: + if (new_socks) + free_fds(new_socks, socks_len); +} + +static void force_realloc(int family, int sock_type, const char *addr, + __u16 port, int *socks, int socks_len, + struct sock_count *counts, int counts_len, + struct bpf_link *link, int iter_fd) +{ + int *new_socks = NULL; + + /* Iterate through the first socket just to initialize the batch. */ + read_n(iter_fd, 1, counts, counts_len); + + /* Double the number of sockets in the bucket to force a realloc on the + * next read. 
+ */ + new_socks = start_reuseport_server(family, sock_type, addr, port, 0, + socks_len); + if (!ASSERT_OK_PTR(new_socks, "start_reuseport_server")) + goto done; + + /* Iterate through the rest of the sockets. */ + read_n(iter_fd, -1, counts, counts_len); + + /* Make sure each socket from the first set was seen exactly once. */ + check_n_were_seen_once(socks, socks_len, socks_len, counts, + counts_len); +done: + if (new_socks) + free_fds(new_socks, socks_len); +} + +struct test_case { + void (*test)(int family, int sock_type, const char *addr, __u16 port, + int *socks, int socks_len, struct sock_count *counts, + int counts_len, struct bpf_link *link, int iter_fd); + const char *description; + int init_socks; + int max_socks; + int sock_type; + int family; +}; + +static struct test_case resume_tests[] = { + { + .description = "udp: resume after removing a seen socket", + .init_socks = nr_soreuse, + .max_socks = nr_soreuse, + .sock_type = SOCK_DGRAM, + .family = AF_INET6, + .test = remove_seen, + }, + { + .description = "udp: resume after removing one unseen socket", + .init_socks = nr_soreuse, + .max_socks = nr_soreuse, + .sock_type = SOCK_DGRAM, + .family = AF_INET6, + .test = remove_unseen, + }, + { + .description = "udp: resume after removing all unseen sockets", + .init_socks = nr_soreuse, + .max_socks = nr_soreuse, + .sock_type = SOCK_DGRAM, + .family = AF_INET6, + .test = remove_all, + }, + { + .description = "udp: resume after adding a few sockets", + .init_socks = nr_soreuse, + .max_socks = nr_soreuse, + .sock_type = SOCK_DGRAM, + /* Use AF_INET so that new sockets are added to the head of the + * bucket's list. + */ + .family = AF_INET, + .test = add_some, + }, + { + .description = "udp: force a realloc to occur", + .init_socks = init_batch_size, + .max_socks = init_batch_size * 2, + .sock_type = SOCK_DGRAM, + /* Use AF_INET6 so that new sockets are added to the tail of the + * bucket's list, needing to be added to the next batch to force + * a realloc. 
+ */ + .family = AF_INET6, + .test = force_realloc, + }, +}; + +static void do_resume_test(struct test_case *tc) +{ + static const __u16 port = 10001; + struct bpf_link *link = NULL; + struct sock_iter_batch *skel; + struct sock_count *counts; + int err, iter_fd = -1; + const char *addr; + int *fds; + + counts = calloc(tc->max_socks, sizeof(*counts)); + if (!counts) + return; + skel = sock_iter_batch__open(); + if (!ASSERT_OK_PTR(skel, "sock_iter_batch__open")) + return; + + /* Prepare a bucket of sockets in the kernel hashtable */ + int local_port; + + addr = tc->family == AF_INET6 ? "::1" : "127.0.0.1"; + fds = start_reuseport_server(tc->family, tc->sock_type, addr, port, 0, + tc->init_socks); + if (!ASSERT_OK_PTR(fds, "start_reuseport_server")) + goto done; + local_port = get_socket_local_port(*fds); + if (!ASSERT_GE(local_port, 0, "get_socket_local_port")) + goto done; + skel->rodata->ports[0] = ntohs(local_port); + skel->rodata->sf = tc->family; + + err = sock_iter_batch__load(skel); + if (!ASSERT_OK(err, "sock_iter_batch__load")) + goto done; + + link = bpf_program__attach_iter(tc->sock_type == SOCK_STREAM ? 
+ skel->progs.iter_tcp_soreuse : + skel->progs.iter_udp_soreuse, + NULL); + if (!ASSERT_OK_PTR(link, "bpf_program__attach_iter")) + goto done; + + iter_fd = bpf_iter_create(bpf_link__fd(link)); + if (!ASSERT_GE(iter_fd, 0, "bpf_iter_create")) + goto done; + + tc->test(tc->family, tc->sock_type, addr, port, fds, tc->init_socks, + counts, tc->max_socks, link, iter_fd); +done: + free_fds(fds, tc->init_socks); + if (iter_fd >= 0) + close(iter_fd); + bpf_link__destroy(link); + sock_iter_batch__destroy(skel); +} + +static void do_resume_tests(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(resume_tests); i++) { + if (test__start_subtest(resume_tests[i].description)) { + do_resume_test(&resume_tests[i]); + } + } +} + static void do_test(int sock_type, bool onebyone) { int err, i, nread, to_read, total_read, iter_fd = -1; @@ -135,6 +552,7 @@ void test_sock_iter_batch(void) do_test(SOCK_DGRAM, true); do_test(SOCK_DGRAM, false); } + do_resume_tests(); close_netns(nstoken); done: