[bpf-next,v3] libbpf: handle producer position overflow

Message ID	20230824220907.1172808-1-awerner32@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0AD8AA93E for <bpf@vger.kernel.org>; Thu, 24 Aug 2023 22:17:13 +0000 (UTC) From: Andrew Werner <awerner32@gmail.com> To: bpf@vger.kernel.org Cc: kernel-team@dataexmachina.dev, alexei.starovoitov@gmail.com, andrii@kernel.org, olsajiri@gmail.com, houtao@huaweicloud.com, void@manifault.com, Andrew Werner <awerner32@gmail.com> Subject: [PATCH bpf-next v3] libbpf: handle producer position overflow Date: Thu, 24 Aug 2023 18:09:08 -0400 Message-Id: <20230824220907.1172808-1-awerner32@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[bpf-next,v3] libbpf: handle producer position overflow \| expand [bpf-next,v3] libbpf: handle producer position overflow

Context	Check	Description
bpf/vmtest-bpf-next-VM_Test-0	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for veristat
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR	fail	PR summary
bpf/vmtest-bpf-next-VM_Test-7	fail	Logs for test_maps on s390x with gcc
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 1332 this patch: 1332
netdev/cc_maintainers	warning	9 maintainers not CCed: daniel@iogearbox.net kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com song@kernel.org sdf@google.com yonghong.song@linux.dev jolsa@kernel.org haoluo@google.com
netdev/build_clang	success	Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1355 this patch: 1355
netdev/checkpatch	warning	WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Andrew Werner Aug. 24, 2023, 10:09 p.m. UTC

Before this patch, the producer position could overflow `unsigned
long`, in which case libbpf would forever stop processing new writes to
the ringbuf. Similarly, overflows of the producer position could result
in __bpf_user_ringbuf_peek not discovering available data. This patch
addresses that bug by computing using the signed delta between the
consumer and producer position to determine if data is available; the
delta computation is robust to overflow.

A more defensive check could be to ensure that the delta is within
the allowed range, but such defensive checks are neither present in
the kernel side code nor in libbpf. The overflow that this patch
handles can occur while the producer and consumer follow a correct
protocol.

Secondarily, the type used to represent the positions in the
user_ring_buffer functions in both libbpf and the kernel has been
changed from u64 to unsigned long to match the type used in the
kernel's representation of the structure. The change occurs in the
same patch because it's required to align the data availability
calculations between the userspace producing ringbuf and the bpf
producing ringbuf.

Not included in this patch, a selftest was written to demonstrate the
bug, and indeed this patch allows the test to continue to make progress
past the overflow. The shape of the self test was as follows:

 a) Set up ringbuf of 2GB size (the maximum permitted size).
 b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
    fast as possible.

With 1 million records per second repro time should be about 4.7 hours.
Such a test duration is impractical to run, hence the omission.

Additionally, this patch adds commentary around a separate point to note
that the modular arithmetic is valid in the face of overflows, as that
fact may not be obvious to future readers.

v2->v3:
  - Changed the representation of the consumer and producer positions
    from u64 to unsigned long in user_ring_buffer functions. 
  - Addressed overflow in __bpf_user_ringbuf_peek.
  - Changed data availability computations to use the signed delta
    between the consumer and producer positions rather than merely
    checking whether their values were unequal.
v1->v2:
  - Fixed comment grammar.
  - Properly formatted subject line.

Signed-off-by: Andrew Werner <awerner32@gmail.com>
---
 kernel/bpf/ringbuf.c    | 11 ++++++++---
 tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
 2 files changed, 21 insertions(+), 6 deletions(-)

David Vernet Aug. 24, 2023, 10:46 p.m. UTC | #1

On Thu, Aug 24, 2023 at 06:09:08PM -0400, Andrew Werner wrote:
> Before this patch, the producer position could overflow `unsigned
> long`, in which case libbpf would forever stop processing new writes to
> the ringbuf. Similarly, overflows of the producer position could result
> in __bpf_user_ringbuf_peek not discovering available data. This patch
> addresses that bug by computing using the signed delta between the
> consumer and producer position to determine if data is available; the
> delta computation is robust to overflow.
> 
> A more defensive check could be to ensure that the delta is within
> the allowed range, but such defensive checks are neither present in
> the kernel side code nor in libbpf. The overflow that this patch
> handles can occur while the producer and consumer follow a correct
> protocol.
> 
> Secondarily, the type used to represent the positions in the
> user_ring_buffer functions in both libbpf and the kernel has been
> changed from u64 to unsigned long to match the type used in the
> kernel's representation of the structure. The change occurs in the
> same patch because it's required to align the data availability
> calculations between the userspace producing ringbuf and the bpf
> producing ringbuf.
> 
> Not included in this patch, a selftest was written to demonstrate the
> bug, and indeed this patch allows the test to continue to make progress
> past the overflow. The shape of the self test was as follows:
> 
>  a) Set up ringbuf of 2GB size (the maximum permitted size).
>  b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
>     fast as possible.
> 
> With 1 million records per second repro time should be about 4.7 hours.
> Such a test duration is impractical to run, hence the omission.
> 
> Additionally, this patch adds commentary around a separate point to note
> that the modular arithmetic is valid in the face of overflows, as that
> fact may not be obvious to future readers.
> 
> v2->v3:
>   - Changed the representation of the consumer and producer positions
>     from u64 to unsigned long in user_ring_buffer functions. 
>   - Addressed overflow in __bpf_user_ringbuf_peek.
>   - Changed data availability computations to use the signed delta
>     between the consumer and producer positions rather than merely
>     checking whether their values were unequal.
> v1->v2:
>   - Fixed comment grammar.
>   - Properly formatted subject line.
> 
> Signed-off-by: Andrew Werner <awerner32@gmail.com>

Hey Andrew,

This LGTM, thanks for finding and fixing this. I left a comment below
about the cast >= vs. equality comparison, but I won't block on it given
that it's already been discussed on another thread. Here's my tag:

Reviewed-by: David Vernet<void@manifault.com>

> ---
>  kernel/bpf/ringbuf.c    | 11 ++++++++---
>  tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index f045fde632e5..0c48673520fb 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  {
>  	int err;
>  	u32 hdr_len, sample_len, total_len, flags, *hdr;
> -	u64 cons_pos, prod_pos;
> +	unsigned long cons_pos, prod_pos;
>  
>  	/* Synchronizes with smp_store_release() in user-space producer. */
>  	prod_pos = smp_load_acquire(&rb->producer_pos);
> @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
>  	cons_pos = smp_load_acquire(&rb->consumer_pos);
> -	if (cons_pos >= prod_pos)
> +
> +	/* Check if there's data available by computing the signed delta between
> +	 * cons_pos and prod_pos; a negative delta indicates that the consumer has
> +	 * not caught up. This formulation is robust to prod_pos wrapping around.
> +	 */
> +	if ((long)(cons_pos - prod_pos) >= 0)

I see that Andrii suggested doing it this way in [0] so I won't insist
on changing it, but IMO this is much less readable and more confusing
than just doing an if (cond_pos == prod_pos) with a comment. The way
it's written this way, it makes it look like there could be a situation
where cond_pos could be ahead of prod_pos, whereas that would actually
just be a bug elsewhere that we'd be papering over. I guess this is more
defensive. In any case, I won't insit on it needing to change.

[0]: https://lore.kernel.org/all/CAEf4BzZQQ=fz+NqFHhJcqKoVAvh4=XbH7HWaHKjUg5OOzi-PTw@mail.gmail.com/

>  		return -ENODATA;
>  
>  	hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
>  {
> -	u64 consumer_pos;
> +	unsigned long consumer_pos;
>  	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
>  
>  	/* Using smp_load_acquire() is unnecessary here, as the busy-bit
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index 02199364db13..141030a89370 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
>  	do {
>  		got_new_data = false;
>  		prod_pos = smp_load_acquire(r->producer_pos);
> -		while (cons_pos < prod_pos) {
> +
> +		/* Check if there's data available by computing the signed delta
> +		 * between cons_pos and prod_pos; a negative delta indicates that the
> +		 * consumer has not caught up. This formulation is robust to prod_pos
> +		 * wrapping around.
> +		 */
> +		while ((long)(cons_pos - prod_pos) < 0) {

Same here. Doing a != is much more clear, in my opinion. Not a blocker
though.

[...]

Thanks,
David

Andrew Werner Aug. 24, 2023, 11:42 p.m. UTC | #2

On Thu, Aug 24, 2023 at 6:46 PM David Vernet <void@manifault.com> wrote:
>
> On Thu, Aug 24, 2023 at 06:09:08PM -0400, Andrew Werner wrote:
> > Before this patch, the producer position could overflow `unsigned
> > long`, in which case libbpf would forever stop processing new writes to
> > the ringbuf. Similarly, overflows of the producer position could result
> > in __bpf_user_ringbuf_peek not discovering available data. This patch
> > addresses that bug by computing using the signed delta between the
> > consumer and producer position to determine if data is available; the
> > delta computation is robust to overflow.
> >
> > A more defensive check could be to ensure that the delta is within
> > the allowed range, but such defensive checks are neither present in
> > the kernel side code nor in libbpf. The overflow that this patch
> > handles can occur while the producer and consumer follow a correct
> > protocol.
> >
> > Secondarily, the type used to represent the positions in the
> > user_ring_buffer functions in both libbpf and the kernel has been
> > changed from u64 to unsigned long to match the type used in the
> > kernel's representation of the structure. The change occurs in the
> > same patch because it's required to align the data availability
> > calculations between the userspace producing ringbuf and the bpf
> > producing ringbuf.
> >
> > Not included in this patch, a selftest was written to demonstrate the
> > bug, and indeed this patch allows the test to continue to make progress
> > past the overflow. The shape of the self test was as follows:
> >
> >  a) Set up ringbuf of 2GB size (the maximum permitted size).
> >  b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
> >     fast as possible.
> >
> > With 1 million records per second repro time should be about 4.7 hours.
> > Such a test duration is impractical to run, hence the omission.
> >
> > Additionally, this patch adds commentary around a separate point to note
> > that the modular arithmetic is valid in the face of overflows, as that
> > fact may not be obvious to future readers.
> >
> > v2->v3:
> >   - Changed the representation of the consumer and producer positions
> >     from u64 to unsigned long in user_ring_buffer functions.
> >   - Addressed overflow in __bpf_user_ringbuf_peek.
> >   - Changed data availability computations to use the signed delta
> >     between the consumer and producer positions rather than merely
> >     checking whether their values were unequal.
> > v1->v2:
> >   - Fixed comment grammar.
> >   - Properly formatted subject line.
> >
> > Signed-off-by: Andrew Werner <awerner32@gmail.com>
>
> Hey Andrew,
>
> This LGTM, thanks for finding and fixing this. I left a comment below
> about the cast >= vs. equality comparison, but I won't block on it given
> that it's already been discussed on another thread. Here's my tag:
>
> Reviewed-by: David Vernet<void@manifault.com>
>
> > ---
> >  kernel/bpf/ringbuf.c    | 11 ++++++++---
> >  tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
> >  2 files changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> > index f045fde632e5..0c48673520fb 100644
> > --- a/kernel/bpf/ringbuf.c
> > +++ b/kernel/bpf/ringbuf.c
> > @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> >  {
> >       int err;
> >       u32 hdr_len, sample_len, total_len, flags, *hdr;
> > -     u64 cons_pos, prod_pos;
> > +     unsigned long cons_pos, prod_pos;
> >
> >       /* Synchronizes with smp_store_release() in user-space producer. */
> >       prod_pos = smp_load_acquire(&rb->producer_pos);
> > @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> >
> >       /* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
> >       cons_pos = smp_load_acquire(&rb->consumer_pos);
> > -     if (cons_pos >= prod_pos)
> > +
> > +     /* Check if there's data available by computing the signed delta between
> > +      * cons_pos and prod_pos; a negative delta indicates that the consumer has
> > +      * not caught up. This formulation is robust to prod_pos wrapping around.
> > +      */
> > +     if ((long)(cons_pos - prod_pos) >= 0)
>
> I see that Andrii suggested doing it this way in [0] so I won't insist
> on changing it, but IMO this is much less readable and more confusing
> than just doing an if (cond_pos == prod_pos) with a comment. The way
> it's written this way, it makes it look like there could be a situation
> where cond_pos could be ahead of prod_pos, whereas that would actually
> just be a bug elsewhere that we'd be papering over. I guess this is more
> defensive. In any case, I won't insit on it needing to change.

I am also happy with the !=/== checks vs this delta checking. Any cast
in C makes me worry I'm doing something wrong. My understanding is
that the C spec doesn't state that `unsigned long` and `long` have to
be the same bit width even in the same program, but that's not
practically going to happen. Even the wraparound behavior and the
representation of negative numbers aren't defined in the spec as I
understand.

I'm going to wait for Andrii to opine before doing anything to avoid
excessive churn.

>
> [0]: https://lore.kernel.org/all/CAEf4BzZQQ=fz+NqFHhJcqKoVAvh4=XbH7HWaHKjUg5OOzi-PTw@mail.gmail.com/
>
> >               return -ENODATA;
> >
> >       hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> > @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> >
> >  static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
> >  {
> > -     u64 consumer_pos;
> > +     unsigned long consumer_pos;
> >       u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> >
> >       /* Using smp_load_acquire() is unnecessary here, as the busy-bit
> > diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> > index 02199364db13..141030a89370 100644
> > --- a/tools/lib/bpf/ringbuf.c
> > +++ b/tools/lib/bpf/ringbuf.c
> > @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
> >       do {
> >               got_new_data = false;
> >               prod_pos = smp_load_acquire(r->producer_pos);
> > -             while (cons_pos < prod_pos) {
> > +
> > +             /* Check if there's data available by computing the signed delta
> > +              * between cons_pos and prod_pos; a negative delta indicates that the
> > +              * consumer has not caught up. This formulation is robust to prod_pos
> > +              * wrapping around.
> > +              */
> > +             while ((long)(cons_pos - prod_pos) < 0) {
>
> Same here. Doing a != is much more clear, in my opinion. Not a blocker
> though.
>
> [...]
>
> Thanks,
> David

Hou Tao Aug. 25, 2023, 3:17 a.m. UTC | #3

Hi,

On 8/25/2023 6:09 AM, Andrew Werner wrote:
> Before this patch, the producer position could overflow `unsigned
> long`, in which case libbpf would forever stop processing new writes to
> the ringbuf. Similarly, overflows of the producer position could result
> in __bpf_user_ringbuf_peek not discovering available data. This patch
> addresses that bug by computing using the signed delta between the
> consumer and producer position to determine if data is available; the
> delta computation is robust to overflow.
>
> A more defensive check could be to ensure that the delta is within
> the allowed range, but such defensive checks are neither present in
> the kernel side code nor in libbpf. The overflow that this patch
> handles can occur while the producer and consumer follow a correct
> protocol.
>
> Secondarily, the type used to represent the positions in the
> user_ring_buffer functions in both libbpf and the kernel has been
> changed from u64 to unsigned long to match the type used in the
> kernel's representation of the structure. The change occurs in the
> same patch because it's required to align the data availability
> calculations between the userspace producing ringbuf and the bpf
> producing ringbuf.

Because the changes include both the change for ring buffer and user
ring buffer. I think it is better to split the changes into three
patches to ease the backports of these changes: one patch for change in
libbpf for ring buffer, and another two patches for changes in libbpf
and kernel for user ring buffer.
>
> Not included in this patch, a selftest was written to demonstrate the
> bug, and indeed this patch allows the test to continue to make progress
> past the overflow. The shape of the self test was as follows:
>
>  a) Set up ringbuf of 2GB size (the maximum permitted size).
>  b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
>     fast as possible.

ULONG_MAX -> UINT_MAX ?
>
> With 1 million records per second repro time should be about 4.7 hours.
> Such a test duration is impractical to run, hence the omission.
>
> Additionally, this patch adds commentary around a separate point to note
> that the modular arithmetic is valid in the face of overflows, as that
> fact may not be obvious to future readers.
>
> v2->v3:
>   - Changed the representation of the consumer and producer positions
>     from u64 to unsigned long in user_ring_buffer functions. 
>   - Addressed overflow in __bpf_user_ringbuf_peek.
>   - Changed data availability computations to use the signed delta
>     between the consumer and producer positions rather than merely
>     checking whether their values were unequal.
> v1->v2:
>   - Fixed comment grammar.
>   - Properly formatted subject line.
>
> Signed-off-by: Andrew Werner <awerner32@gmail.com>
> ---
>  kernel/bpf/ringbuf.c    | 11 ++++++++---
>  tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
>  2 files changed, 21 insertions(+), 6 deletions(-)

Otherwise, these changes look good to me:

Acked-by: Hou Tao <houtao1@huawei.com>

>
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index f045fde632e5..0c48673520fb 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  {
>  	int err;
>  	u32 hdr_len, sample_len, total_len, flags, *hdr;
> -	u64 cons_pos, prod_pos;
> +	unsigned long cons_pos, prod_pos;
>  
>  	/* Synchronizes with smp_store_release() in user-space producer. */
>  	prod_pos = smp_load_acquire(&rb->producer_pos);
> @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
>  	cons_pos = smp_load_acquire(&rb->consumer_pos);
> -	if (cons_pos >= prod_pos)
> +
> +	/* Check if there's data available by computing the signed delta between
> +	 * cons_pos and prod_pos; a negative delta indicates that the consumer has
> +	 * not caught up. This formulation is robust to prod_pos wrapping around.
> +	 */
> +	if ((long)(cons_pos - prod_pos) >= 0)
>  		return -ENODATA;
>  
>  	hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>  
>  static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
>  {
> -	u64 consumer_pos;
> +	unsigned long consumer_pos;
>  	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
>  
>  	/* Using smp_load_acquire() is unnecessary here, as the busy-bit
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index 02199364db13..141030a89370 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
>  	do {
>  		got_new_data = false;
>  		prod_pos = smp_load_acquire(r->producer_pos);
> -		while (cons_pos < prod_pos) {
> +
> +		/* Check if there's data available by computing the signed delta
> +		 * between cons_pos and prod_pos; a negative delta indicates that the
> +		 * consumer has not caught up. This formulation is robust to prod_pos
> +		 * wrapping around.
> +		 */
> +		while ((long)(cons_pos - prod_pos) < 0) {
>  			len_ptr = r->data + (cons_pos & r->mask);
>  			len = smp_load_acquire(len_ptr);
>  
> @@ -482,8 +488,7 @@ void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample)
>  void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
>  {
>  	__u32 avail_size, total_size, max_size;
> -	/* 64-bit to avoid overflow in case of extreme application behavior */
> -	__u64 cons_pos, prod_pos;
> +	unsigned long cons_pos, prod_pos;
>  	struct ringbuf_hdr *hdr;
>  
>  	/* The top two bits are used as special flags */
> @@ -498,6 +503,11 @@ void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
>  	prod_pos = smp_load_acquire(rb->producer_pos);
>  
>  	max_size = rb->mask + 1;
> +
> +	/* Note that this formulation is valid in the face of overflow of
> +	 * prod_pos so long as the delta between prod_pos and cons_pos is
> +	 * no greater than max_size.
> +	 */
>  	avail_size = max_size - (prod_pos - cons_pos);
>  	/* Round up total size to a multiple of 8. */
>  	total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;

Daniel Borkmann Aug. 25, 2023, 3:27 p.m. UTC | #4

On 8/25/23 12:09 AM, Andrew Werner wrote:
> Before this patch, the producer position could overflow `unsigned
> long`, in which case libbpf would forever stop processing new writes to
> the ringbuf. Similarly, overflows of the producer position could result
> in __bpf_user_ringbuf_peek not discovering available data. This patch
> addresses that bug by computing using the signed delta between the
> consumer and producer position to determine if data is available; the
> delta computation is robust to overflow.
> 
> A more defensive check could be to ensure that the delta is within
> the allowed range, but such defensive checks are neither present in
> the kernel side code nor in libbpf. The overflow that this patch
> handles can occur while the producer and consumer follow a correct
> protocol.
> 
> Secondarily, the type used to represent the positions in the
> user_ring_buffer functions in both libbpf and the kernel has been
> changed from u64 to unsigned long to match the type used in the
> kernel's representation of the structure. The change occurs in the

Hm, but won't this mismatch for 64bit kernel and 32bit user space? Why
not fixate both on u64 instead so types are consistent?

> same patch because it's required to align the data availability
> calculations between the userspace producing ringbuf and the bpf
> producing ringbuf.
> 
> Not included in this patch, a selftest was written to demonstrate the
> bug, and indeed this patch allows the test to continue to make progress
> past the overflow. The shape of the self test was as follows:
> 
>   a) Set up ringbuf of 2GB size (the maximum permitted size).
>   b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
>      fast as possible.
> 
> With 1 million records per second repro time should be about 4.7 hours.
> Such a test duration is impractical to run, hence the omission.
> 
> Additionally, this patch adds commentary around a separate point to note
> that the modular arithmetic is valid in the face of overflows, as that
> fact may not be obvious to future readers.
> 
> v2->v3:
>    - Changed the representation of the consumer and producer positions
>      from u64 to unsigned long in user_ring_buffer functions.
>    - Addressed overflow in __bpf_user_ringbuf_peek.
>    - Changed data availability computations to use the signed delta
>      between the consumer and producer positions rather than merely
>      checking whether their values were unequal.
> v1->v2:
>    - Fixed comment grammar.
>    - Properly formatted subject line.
> 
> Signed-off-by: Andrew Werner <awerner32@gmail.com>
> ---
>   kernel/bpf/ringbuf.c    | 11 ++++++++---
>   tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
>   2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index f045fde632e5..0c48673520fb 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>   {
>   	int err;
>   	u32 hdr_len, sample_len, total_len, flags, *hdr;
> -	u64 cons_pos, prod_pos;
> +	unsigned long cons_pos, prod_pos;
>   
>   	/* Synchronizes with smp_store_release() in user-space producer. */
>   	prod_pos = smp_load_acquire(&rb->producer_pos);
> @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>   
>   	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
>   	cons_pos = smp_load_acquire(&rb->consumer_pos);
> -	if (cons_pos >= prod_pos)
> +
> +	/* Check if there's data available by computing the signed delta between
> +	 * cons_pos and prod_pos; a negative delta indicates that the consumer has
> +	 * not caught up. This formulation is robust to prod_pos wrapping around.
> +	 */
> +	if ((long)(cons_pos - prod_pos) >= 0)
>   		return -ENODATA;
>   
>   	hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
>   
>   static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
>   {
> -	u64 consumer_pos;
> +	unsigned long consumer_pos;
>   	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
>   
>   	/* Using smp_load_acquire() is unnecessary here, as the busy-bit
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index 02199364db13..141030a89370 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
>   	do {
>   		got_new_data = false;
>   		prod_pos = smp_load_acquire(r->producer_pos);
> -		while (cons_pos < prod_pos) {
> +
> +		/* Check if there's data available by computing the signed delta
> +		 * between cons_pos and prod_pos; a negative delta indicates that the
> +		 * consumer has not caught up. This formulation is robust to prod_pos
> +		 * wrapping around.
> +		 */
> +		while ((long)(cons_pos - prod_pos) < 0) {
>   			len_ptr = r->data + (cons_pos & r->mask);
>   			len = smp_load_acquire(len_ptr);
>   
> @@ -482,8 +488,7 @@ void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample)
>   void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
>   {
>   	__u32 avail_size, total_size, max_size;
> -	/* 64-bit to avoid overflow in case of extreme application behavior */
> -	__u64 cons_pos, prod_pos;
> +	unsigned long cons_pos, prod_pos;
>   	struct ringbuf_hdr *hdr;
>   
>   	/* The top two bits are used as special flags */
> @@ -498,6 +503,11 @@ void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
>   	prod_pos = smp_load_acquire(rb->producer_pos);
>   
>   	max_size = rb->mask + 1;
> +
> +	/* Note that this formulation is valid in the face of overflow of
> +	 * prod_pos so long as the delta between prod_pos and cons_pos is
> +	 * no greater than max_size.
> +	 */
>   	avail_size = max_size - (prod_pos - cons_pos);
>   	/* Round up total size to a multiple of 8. */
>   	total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
>

Daniel Borkmann Aug. 25, 2023, 4 p.m. UTC | #5

On 8/25/23 5:17 AM, Hou Tao wrote:
> On 8/25/2023 6:09 AM, Andrew Werner wrote:
>> Before this patch, the producer position could overflow `unsigned
>> long`, in which case libbpf would forever stop processing new writes to
>> the ringbuf. Similarly, overflows of the producer position could result
>> in __bpf_user_ringbuf_peek not discovering available data. This patch
>> addresses that bug by computing using the signed delta between the
>> consumer and producer position to determine if data is available; the
>> delta computation is robust to overflow.
>>
>> A more defensive check could be to ensure that the delta is within
>> the allowed range, but such defensive checks are neither present in
>> the kernel side code nor in libbpf. The overflow that this patch
>> handles can occur while the producer and consumer follow a correct
>> protocol.
>>
>> Secondarily, the type used to represent the positions in the
>> user_ring_buffer functions in both libbpf and the kernel has been
>> changed from u64 to unsigned long to match the type used in the
>> kernel's representation of the structure. The change occurs in the
>> same patch because it's required to align the data availability
>> calculations between the userspace producing ringbuf and the bpf
>> producing ringbuf.
> 
> Because the changes include both the change for ring buffer and user
> ring buffer. I think it is better to split the changes into three
> patches to ease the backports of these changes: one patch for change in
> libbpf for ring buffer, and another two patches for changes in libbpf
> and kernel for user ring buffer.

Splitting off the kernel parts into a separate patch would indeed be
good so that stable team can pull it. (Pls also add a Fixes tag.)

Thanks,
Daniel

Andrew Werner Aug. 25, 2023, 4:39 p.m. UTC | #6

On Fri, Aug 25, 2023 at 11:28 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/25/23 12:09 AM, Andrew Werner wrote:
> > Before this patch, the producer position could overflow `unsigned
> > long`, in which case libbpf would forever stop processing new writes to
> > the ringbuf. Similarly, overflows of the producer position could result
> > in __bpf_user_ringbuf_peek not discovering available data. This patch
> > addresses that bug by computing using the signed delta between the
> > consumer and producer position to determine if data is available; the
> > delta computation is robust to overflow.
> >
> > A more defensive check could be to ensure that the delta is within
> > the allowed range, but such defensive checks are neither present in
> > the kernel side code nor in libbpf. The overflow that this patch
> > handles can occur while the producer and consumer follow a correct
> > protocol.
> >
> > Secondarily, the type used to represent the positions in the
> > user_ring_buffer functions in both libbpf and the kernel has been
> > changed from u64 to unsigned long to match the type used in the
> > kernel's representation of the structure. The change occurs in the
>
> Hm, but won't this mismatch for 64bit kernel and 32bit user space? Why
> not fixate both on u64 instead so types are consistent?

Sure. It feels like if we do that then we'd break existing 32bit
big-endian clients, though I am not sure those exist. Concretely, the
request here would be to change the kernel structure and all library
usages to use u64, right?

>
> > same patch because it's required to align the data availability
> > calculations between the userspace producing ringbuf and the bpf
> > producing ringbuf.
> >
> > Not included in this patch, a selftest was written to demonstrate the
> > bug, and indeed this patch allows the test to continue to make progress
> > past the overflow. The shape of the self test was as follows:
> >
> >   a) Set up ringbuf of 2GB size (the maximum permitted size).
> >   b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
> >      fast as possible.
> >
> > With 1 million records per second repro time should be about 4.7 hours.
> > Such a test duration is impractical to run, hence the omission.
> >
> > Additionally, this patch adds commentary around a separate point to note
> > that the modular arithmetic is valid in the face of overflows, as that
> > fact may not be obvious to future readers.
> >
> > v2->v3:
> >    - Changed the representation of the consumer and producer positions
> >      from u64 to unsigned long in user_ring_buffer functions.
> >    - Addressed overflow in __bpf_user_ringbuf_peek.
> >    - Changed data availability computations to use the signed delta
> >      between the consumer and producer positions rather than merely
> >      checking whether their values were unequal.
> > v1->v2:
> >    - Fixed comment grammar.
> >    - Properly formatted subject line.
> >
> > Signed-off-by: Andrew Werner <awerner32@gmail.com>
> > ---
> >   kernel/bpf/ringbuf.c    | 11 ++++++++---
> >   tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
> >   2 files changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> > index f045fde632e5..0c48673520fb 100644
> > --- a/kernel/bpf/ringbuf.c
> > +++ b/kernel/bpf/ringbuf.c
> > @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> >   {
> >       int err;
> >       u32 hdr_len, sample_len, total_len, flags, *hdr;
> > -     u64 cons_pos, prod_pos;
> > +     unsigned long cons_pos, prod_pos;
> >
> >       /* Synchronizes with smp_store_release() in user-space producer. */
> >       prod_pos = smp_load_acquire(&rb->producer_pos);
> > @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> >
> >       /* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
> >       cons_pos = smp_load_acquire(&rb->consumer_pos);
> > -     if (cons_pos >= prod_pos)
> > +
> > +     /* Check if there's data available by computing the signed delta between
> > +      * cons_pos and prod_pos; a negative delta indicates that the consumer has
> > +      * not caught up. This formulation is robust to prod_pos wrapping around.
> > +      */
> > +     if ((long)(cons_pos - prod_pos) >= 0)
> >               return -ENODATA;
> >
> >       hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> > @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> >
> >   static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
> >   {
> > -     u64 consumer_pos;
> > +     unsigned long consumer_pos;
> >       u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> >
> >       /* Using smp_load_acquire() is unnecessary here, as the busy-bit
> > diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> > index 02199364db13..141030a89370 100644
> > --- a/tools/lib/bpf/ringbuf.c
> > +++ b/tools/lib/bpf/ringbuf.c
> > @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
> >       do {
> >               got_new_data = false;
> >               prod_pos = smp_load_acquire(r->producer_pos);
> > -             while (cons_pos < prod_pos) {
> > +
> > +             /* Check if there's data available by computing the signed delta
> > +              * between cons_pos and prod_pos; a negative delta indicates that the
> > +              * consumer has not caught up. This formulation is robust to prod_pos
> > +              * wrapping around.
> > +              */
> > +             while ((long)(cons_pos - prod_pos) < 0) {
> >                       len_ptr = r->data + (cons_pos & r->mask);
> >                       len = smp_load_acquire(len_ptr);
> >
> > @@ -482,8 +488,7 @@ void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample)
> >   void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
> >   {
> >       __u32 avail_size, total_size, max_size;
> > -     /* 64-bit to avoid overflow in case of extreme application behavior */
> > -     __u64 cons_pos, prod_pos;
> > +     unsigned long cons_pos, prod_pos;
> >       struct ringbuf_hdr *hdr;
> >
> >       /* The top two bits are used as special flags */
> > @@ -498,6 +503,11 @@ void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
> >       prod_pos = smp_load_acquire(rb->producer_pos);
> >
> >       max_size = rb->mask + 1;
> > +
> > +     /* Note that this formulation is valid in the face of overflow of
> > +      * prod_pos so long as the delta between prod_pos and cons_pos is
> > +      * no greater than max_size.
> > +      */
> >       avail_size = max_size - (prod_pos - cons_pos);
> >       /* Round up total size to a multiple of 8. */
> >       total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
> >
>

Andrew Werner Aug. 25, 2023, 4:40 p.m. UTC | #7

On Fri, Aug 25, 2023 at 12:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/25/23 5:17 AM, Hou Tao wrote:
> > On 8/25/2023 6:09 AM, Andrew Werner wrote:
> >> Before this patch, the producer position could overflow `unsigned
> >> long`, in which case libbpf would forever stop processing new writes to
> >> the ringbuf. Similarly, overflows of the producer position could result
> >> in __bpf_user_ringbuf_peek not discovering available data. This patch
> >> addresses that bug by computing using the signed delta between the
> >> consumer and producer position to determine if data is available; the
> >> delta computation is robust to overflow.
> >>
> >> A more defensive check could be to ensure that the delta is within
> >> the allowed range, but such defensive checks are neither present in
> >> the kernel side code nor in libbpf. The overflow that this patch
> >> handles can occur while the producer and consumer follow a correct
> >> protocol.
> >>
> >> Secondarily, the type used to represent the positions in the
> >> user_ring_buffer functions in both libbpf and the kernel has been
> >> changed from u64 to unsigned long to match the type used in the
> >> kernel's representation of the structure. The change occurs in the
> >> same patch because it's required to align the data availability
> >> calculations between the userspace producing ringbuf and the bpf
> >> producing ringbuf.
> >
> > Because the changes include both the change for ring buffer and user
> > ring buffer. I think it is better to split the changes into three
> > patches to ease the backports of these changes: one patch for change in
> > libbpf for ring buffer, and another two patches for changes in libbpf
> > and kernel for user ring buffer.
>
> Splitting off the kernel parts into a separate patch would indeed be
> good so that stable team can pull it. (Pls also add a Fixes tag.)

Ack, will do this for the next revision when the dust settles.

>
> Thanks,
> Daniel

Daniel Borkmann Aug. 25, 2023, 5:23 p.m. UTC | #8

On 8/25/23 6:39 PM, Andrew Werner wrote:
> On Fri, Aug 25, 2023 at 11:28 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 8/25/23 12:09 AM, Andrew Werner wrote:
>>> Before this patch, the producer position could overflow `unsigned
>>> long`, in which case libbpf would forever stop processing new writes to
>>> the ringbuf. Similarly, overflows of the producer position could result
>>> in __bpf_user_ringbuf_peek not discovering available data. This patch
>>> addresses that bug by computing using the signed delta between the
>>> consumer and producer position to determine if data is available; the
>>> delta computation is robust to overflow.
>>>
>>> A more defensive check could be to ensure that the delta is within
>>> the allowed range, but such defensive checks are neither present in
>>> the kernel side code nor in libbpf. The overflow that this patch
>>> handles can occur while the producer and consumer follow a correct
>>> protocol.
>>>
>>> Secondarily, the type used to represent the positions in the
>>> user_ring_buffer functions in both libbpf and the kernel has been
>>> changed from u64 to unsigned long to match the type used in the
>>> kernel's representation of the structure. The change occurs in the
>>
>> Hm, but won't this mismatch for 64bit kernel and 32bit user space? Why
>> not fixate both on u64 instead so types are consistent?
> 
> Sure. It feels like if we do that then we'd break existing 32bit
> big-endian clients, though I am not sure those exist. Concretely, the
> request here would be to change the kernel structure and all library
> usages to use u64, right?

Yes, to align all consistently on u64. From your diff, I read that for
the kernel its the case already.

Thanks,
Daniel

Andrew Werner Aug. 25, 2023, 5:49 p.m. UTC | #9

On Fri, Aug 25, 2023 at 1:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/25/23 6:39 PM, Andrew Werner wrote:
> > On Fri, Aug 25, 2023 at 11:28 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 8/25/23 12:09 AM, Andrew Werner wrote:
> >>> Before this patch, the producer position could overflow `unsigned
> >>> long`, in which case libbpf would forever stop processing new writes to
> >>> the ringbuf. Similarly, overflows of the producer position could result
> >>> in __bpf_user_ringbuf_peek not discovering available data. This patch
> >>> addresses that bug by computing using the signed delta between the
> >>> consumer and producer position to determine if data is available; the
> >>> delta computation is robust to overflow.
> >>>
> >>> A more defensive check could be to ensure that the delta is within
> >>> the allowed range, but such defensive checks are neither present in
> >>> the kernel side code nor in libbpf. The overflow that this patch
> >>> handles can occur while the producer and consumer follow a correct
> >>> protocol.
> >>>
> >>> Secondarily, the type used to represent the positions in the
> >>> user_ring_buffer functions in both libbpf and the kernel has been
> >>> changed from u64 to unsigned long to match the type used in the
> >>> kernel's representation of the structure. The change occurs in the
> >>
> >> Hm, but won't this mismatch for 64bit kernel and 32bit user space? Why
> >> not fixate both on u64 instead so types are consistent?
> >
> > Sure. It feels like if we do that then we'd break existing 32bit
> > big-endian clients, though I am not sure those exist. Concretely, the
> > request here would be to change the kernel structure and all library
> > usages to use u64, right?
>
> Yes, to align all consistently on u64. From your diff, I read that for
> the kernel its the case already.

I would not say that for the kernel it's the case that these values
are u64. Today the kernel representation of the positions are both
`unsigned long`, and almost all functions which read or write to those
locations in memory use `unsigned long`. Two kernel functions related
to the user_ring_buffer and one libbpf function related to the
user_ring_buffer were reading the data at those addresses in the data
structure into u64.

>
> Thanks,
> Daniel

Andrii Nakryiko Aug. 25, 2023, 6:25 p.m. UTC | #10

On Thu, Aug 24, 2023 at 4:42 PM Andrew Werner <awerner32@gmail.com> wrote:
>
> On Thu, Aug 24, 2023 at 6:46 PM David Vernet <void@manifault.com> wrote:
> >
> > On Thu, Aug 24, 2023 at 06:09:08PM -0400, Andrew Werner wrote:
> > > Before this patch, the producer position could overflow `unsigned
> > > long`, in which case libbpf would forever stop processing new writes to
> > > the ringbuf. Similarly, overflows of the producer position could result
> > > in __bpf_user_ringbuf_peek not discovering available data. This patch
> > > addresses that bug by computing using the signed delta between the
> > > consumer and producer position to determine if data is available; the
> > > delta computation is robust to overflow.
> > >
> > > A more defensive check could be to ensure that the delta is within
> > > the allowed range, but such defensive checks are neither present in
> > > the kernel side code nor in libbpf. The overflow that this patch
> > > handles can occur while the producer and consumer follow a correct
> > > protocol.
> > >
> > > Secondarily, the type used to represent the positions in the
> > > user_ring_buffer functions in both libbpf and the kernel has been
> > > changed from u64 to unsigned long to match the type used in the
> > > kernel's representation of the structure. The change occurs in the
> > > same patch because it's required to align the data availability
> > > calculations between the userspace producing ringbuf and the bpf
> > > producing ringbuf.
> > >
> > > Not included in this patch, a selftest was written to demonstrate the
> > > bug, and indeed this patch allows the test to continue to make progress
> > > past the overflow. The shape of the self test was as follows:
> > >
> > >  a) Set up ringbuf of 2GB size (the maximum permitted size).
> > >  b) reserve+commit maximum-sized records (ULONG_MAX/4) constantly as
> > >     fast as possible.
> > >
> > > With 1 million records per second repro time should be about 4.7 hours.
> > > Such a test duration is impractical to run, hence the omission.
> > >
> > > Additionally, this patch adds commentary around a separate point to note
> > > that the modular arithmetic is valid in the face of overflows, as that
> > > fact may not be obvious to future readers.
> > >
> > > v2->v3:
> > >   - Changed the representation of the consumer and producer positions
> > >     from u64 to unsigned long in user_ring_buffer functions.
> > >   - Addressed overflow in __bpf_user_ringbuf_peek.
> > >   - Changed data availability computations to use the signed delta
> > >     between the consumer and producer positions rather than merely
> > >     checking whether their values were unequal.
> > > v1->v2:
> > >   - Fixed comment grammar.
> > >   - Properly formatted subject line.
> > >
> > > Signed-off-by: Andrew Werner <awerner32@gmail.com>
> >
> > Hey Andrew,
> >
> > This LGTM, thanks for finding and fixing this. I left a comment below
> > about the cast >= vs. equality comparison, but I won't block on it given
> > that it's already been discussed on another thread. Here's my tag:
> >
> > Reviewed-by: David Vernet<void@manifault.com>
> >
> > > ---
> > >  kernel/bpf/ringbuf.c    | 11 ++++++++---
> > >  tools/lib/bpf/ringbuf.c | 16 +++++++++++++---
> > >  2 files changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> > > index f045fde632e5..0c48673520fb 100644
> > > --- a/kernel/bpf/ringbuf.c
> > > +++ b/kernel/bpf/ringbuf.c
> > > @@ -658,7 +658,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> > >  {
> > >       int err;
> > >       u32 hdr_len, sample_len, total_len, flags, *hdr;
> > > -     u64 cons_pos, prod_pos;
> > > +     unsigned long cons_pos, prod_pos;
> > >
> > >       /* Synchronizes with smp_store_release() in user-space producer. */
> > >       prod_pos = smp_load_acquire(&rb->producer_pos);
> > > @@ -667,7 +667,12 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> > >
> > >       /* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
> > >       cons_pos = smp_load_acquire(&rb->consumer_pos);
> > > -     if (cons_pos >= prod_pos)
> > > +
> > > +     /* Check if there's data available by computing the signed delta between
> > > +      * cons_pos and prod_pos; a negative delta indicates that the consumer has
> > > +      * not caught up. This formulation is robust to prod_pos wrapping around.
> > > +      */
> > > +     if ((long)(cons_pos - prod_pos) >= 0)
> >
> > I see that Andrii suggested doing it this way in [0] so I won't insist
> > on changing it, but IMO this is much less readable and more confusing
> > than just doing an if (cond_pos == prod_pos) with a comment. The way

this is a canonical way of comparing two counters that can wrap over,
just like time_before() for jiffies. We can add a small macro similar
to time_before(), but I figured given it's in one place in kernel's
ringbuf, it should be fine

as for just using inequality here, I still would prefer to preserve
original semantics (cons_pos >= prod_pos) instead of having to analyze
anew whether just doing inequality here and handling potentially
garbage data later on would be always correct.

For libbpf side, given we expect kernel to behave correctly always,
inequality would be a bit more acceptable, but I'd still keep
inequality just like in the original code (except taking into account
wraparound possibility)


> > it's written this way, it makes it look like there could be a situation
> > where cond_pos could be ahead of prod_pos, whereas that would actually
> > just be a bug elsewhere that we'd be papering over. I guess this is more
> > defensive. In any case, I won't insit on it needing to change.
>
> I am also happy with the !=/== checks vs this delta checking. Any cast
> in C makes me worry I'm doing something wrong. My understanding is
> that the C spec doesn't state that `unsigned long` and `long` have to
> be the same bit width even in the same program, but that's not
> practically going to happen. Even the wraparound behavior and the
> representation of negative numbers aren't defined in the spec as I
> understand.

I haven't studied C spec and don't really want to :) but if ever
signed vs unsigned changed the size of int/long/long long, so much
code would be broken, that I refuse to worry about this :) Similarly
such comparison-in-the-face-of-wrapping is done all the time in kernel
for jiffies comparison and such, so we are not charting a new
territory here.

>
> I'm going to wait for Andrii to opine before doing anything to avoid
> excessive churn.
>
> >
> > [0]: https://lore.kernel.org/all/CAEf4BzZQQ=fz+NqFHhJcqKoVAvh4=XbH7HWaHKjUg5OOzi-PTw@mail.gmail.com/
> >
> > >               return -ENODATA;
> > >
> > >       hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
> > > @@ -711,7 +716,7 @@ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *s
> > >
> > >  static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
> > >  {
> > > -     u64 consumer_pos;
> > > +     unsigned long consumer_pos;
> > >       u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> > >
> > >       /* Using smp_load_acquire() is unnecessary here, as the busy-bit
> > > diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> > > index 02199364db13..141030a89370 100644
> > > --- a/tools/lib/bpf/ringbuf.c
> > > +++ b/tools/lib/bpf/ringbuf.c
> > > @@ -237,7 +237,13 @@ static int64_t ringbuf_process_ring(struct ring *r)
> > >       do {
> > >               got_new_data = false;
> > >               prod_pos = smp_load_acquire(r->producer_pos);
> > > -             while (cons_pos < prod_pos) {
> > > +
> > > +             /* Check if there's data available by computing the signed delta
> > > +              * between cons_pos and prod_pos; a negative delta indicates that the
> > > +              * consumer has not caught up. This formulation is robust to prod_pos
> > > +              * wrapping around.
> > > +              */
> > > +             while ((long)(cons_pos - prod_pos) < 0) {
> >
> > Same here. Doing a != is much more clear, in my opinion. Not a blocker
> > though.
> >
> > [...]
> >
> > Thanks,
> > David

Andrii Nakryiko Aug. 25, 2023, 6:28 p.m. UTC | #11

On Fri, Aug 25, 2023 at 10:23 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/25/23 6:39 PM, Andrew Werner wrote:
> > On Fri, Aug 25, 2023 at 11:28 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 8/25/23 12:09 AM, Andrew Werner wrote:
> >>> Before this patch, the producer position could overflow `unsigned
> >>> long`, in which case libbpf would forever stop processing new writes to
> >>> the ringbuf. Similarly, overflows of the producer position could result
> >>> in __bpf_user_ringbuf_peek not discovering available data. This patch
> >>> addresses that bug by computing using the signed delta between the
> >>> consumer and producer position to determine if data is available; the
> >>> delta computation is robust to overflow.
> >>>
> >>> A more defensive check could be to ensure that the delta is within
> >>> the allowed range, but such defensive checks are neither present in
> >>> the kernel side code nor in libbpf. The overflow that this patch
> >>> handles can occur while the producer and consumer follow a correct
> >>> protocol.
> >>>
> >>> Secondarily, the type used to represent the positions in the
> >>> user_ring_buffer functions in both libbpf and the kernel has been
> >>> changed from u64 to unsigned long to match the type used in the
> >>> kernel's representation of the structure. The change occurs in the
> >>
> >> Hm, but won't this mismatch for 64bit kernel and 32bit user space? Why
> >> not fixate both on u64 instead so types are consistent?
> >
> > Sure. It feels like if we do that then we'd break existing 32bit
> > big-endian clients, though I am not sure those exist. Concretely, the
> > request here would be to change the kernel structure and all library
> > usages to use u64, right?
>
> Yes, to align all consistently on u64. From your diff, I read that for
> the kernel its the case already.
>

I don't think we can change it. It was intentionally specified as
`long` for consumer and producer positions, to match native word size
for atomic operations and smp_load_acquire/smp_store_release. Using
u64 was a mistake that slipped through.

Further, using u64 doesn't really help with anything, just makes
32-bit code slower (and maybe/probably would allow teared reads/writes
for position counters). Switching to unsigned long is the right move.

> Thanks,
> Daniel
>

[bpf-next,v3] libbpf: handle producer position overflow

Checks

Commit Message

Comments

Patch