diff mbox series

[bpf-next,1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide

Message ID 20220127172448.155686-2-jakub@cloudflare.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Split bpf_sock dst_port field | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1808 this patch: 1808
netdev/cc_maintainers warning 6 maintainers not CCed: kuba@kernel.org kpsingh@kernel.org john.fastabend@gmail.com songliubraving@fb.com yhs@fb.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 218 this patch: 218
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1825 this patch: 1825
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 36 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next success VM_Test

Commit Message

Jakub Sitnicki Jan. 27, 2022, 5:24 p.m. UTC
Menglong Dong reports that the documentation for the dst_port field in
struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
field is a zero-padded 16-bit integer in network byte order. The value
appears to the BPF user as if laid out in memory as so:

  offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
                                      + 8  <port LSB>
                                      +16  0x00
                                      +24  0x00

32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
the offset into the field is 0.

32-bit wide loads from dst_port are especially confusing. The loaded value,
after converting to host byte order with bpf_ntohl(dst_port), contains the
port number in the upper 16-bits.

Remove the confusion by splitting the field into two 16-bit fields. For
backward compatibility, allow 32-bit wide loads from offsetof(struct
bpf_sock, dst_port).

While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.

Reported-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/uapi/linux/bpf.h | 3 ++-
 net/core/filter.c        | 9 ++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Alexei Starovoitov Jan. 28, 2022, 6:17 p.m. UTC | #1
On Thu, Jan 27, 2022 at 9:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Menglong Dong reports that the documentation for the dst_port field in
> struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
> field is a zero-padded 16-bit integer in network byte order. The value
> appears to the BPF user as if laid out in memory as so:
>
>   offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
>                                       + 8  <port LSB>
>                                       +16  0x00
>                                       +24  0x00
>
> 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
> the offset into the field is 0.
>
> 32-bit wide loads from dst_port are especially confusing. The loaded value,
> after converting to host byte order with bpf_ntohl(dst_port), contains the
> port number in the upper 16-bits.
>
> Remove the confusion by splitting the field into two 16-bit fields. For
> backward compatibility, allow 32-bit wide loads from offsetof(struct
> bpf_sock, dst_port).
>
> While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
>
> Reported-by: Menglong Dong <imagedong@tencent.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/uapi/linux/bpf.h | 3 ++-
>  net/core/filter.c        | 9 ++++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4a2f7041ebae..027e84b18b51 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5574,7 +5574,8 @@ struct bpf_sock {
>         __u32 src_ip4;
>         __u32 src_ip6[4];
>         __u32 src_port;         /* host byte order */
> -       __u32 dst_port;         /* network byte order */
> +       __be16 dst_port;        /* network byte order */
> +       __u16 zero_padding;

I was wondering can we do '__u16 :16' here ?

Should we do the same for bpf_sk_lookup->remote_port as well
for consistency?

Thanks for the idea and the patches!
Jakub Sitnicki Jan. 30, 2022, 11:58 a.m. UTC | #2
On Fri, Jan 28, 2022 at 07:17 PM CET, Alexei Starovoitov wrote:
> On Thu, Jan 27, 2022 at 9:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Menglong Dong reports that the documentation for the dst_port field in
>> struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
>> field is a zero-padded 16-bit integer in network byte order. The value
>> appears to the BPF user as if laid out in memory as so:
>>
>>   offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
>>                                       + 8  <port LSB>
>>                                       +16  0x00
>>                                       +24  0x00
>>
>> 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
>> the offset into the field is 0.
>>
>> 32-bit wide loads from dst_port are especially confusing. The loaded value,
>> after converting to host byte order with bpf_ntohl(dst_port), contains the
>> port number in the upper 16-bits.
>>
>> Remove the confusion by splitting the field into two 16-bit fields. For
>> backward compatibility, allow 32-bit wide loads from offsetof(struct
>> bpf_sock, dst_port).
>>
>> While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
>>
>> Reported-by: Menglong Dong <imagedong@tencent.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/uapi/linux/bpf.h | 3 ++-
>>  net/core/filter.c        | 9 ++++++++-
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 4a2f7041ebae..027e84b18b51 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5574,7 +5574,8 @@ struct bpf_sock {
>>         __u32 src_ip4;
>>         __u32 src_ip6[4];
>>         __u32 src_port;         /* host byte order */
>> -       __u32 dst_port;         /* network byte order */
>> +       __be16 dst_port;        /* network byte order */
>> +       __u16 zero_padding;
>
> I was wondering can we do '__u16 :16' here ?

Great idea. Now done in v2:

https://lore.kernel.org/bpf/20220130115518.213259-1-jakub@cloudflare.com/

> Should we do the same for bpf_sk_lookup->remote_port as well
> for consistency?

I can tend to that this upcoming week.
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4a2f7041ebae..027e84b18b51 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5574,7 +5574,8 @@  struct bpf_sock {
 	__u32 src_ip4;
 	__u32 src_ip6[4];
 	__u32 src_port;		/* host byte order */
-	__u32 dst_port;		/* network byte order */
+	__be16 dst_port;	/* network byte order */
+	__u16 zero_padding;
 	__u32 dst_ip4;
 	__u32 dst_ip6[4];
 	__u32 state;
diff --git a/net/core/filter.c b/net/core/filter.c
index a06931c27eeb..ef8473163696 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8265,6 +8265,7 @@  bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 			      struct bpf_insn_access_aux *info)
 {
 	const int size_default = sizeof(__u32);
+	int field_size;
 
 	if (off < 0 || off >= sizeof(struct bpf_sock))
 		return false;
@@ -8276,7 +8277,6 @@  bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 	case offsetof(struct bpf_sock, family):
 	case offsetof(struct bpf_sock, type):
 	case offsetof(struct bpf_sock, protocol):
-	case offsetof(struct bpf_sock, dst_port):
 	case offsetof(struct bpf_sock, src_port):
 	case offsetof(struct bpf_sock, rx_queue_mapping):
 	case bpf_ctx_range(struct bpf_sock, src_ip4):
@@ -8285,6 +8285,13 @@  bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 	case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
 		bpf_ctx_record_field_size(info, size_default);
 		return bpf_ctx_narrow_access_ok(off, size, size_default);
+	case bpf_ctx_range(struct bpf_sock, dst_port):
+		field_size = size == size_default ?
+			size_default : sizeof_field(struct bpf_sock, dst_port);
+		bpf_ctx_record_field_size(info, field_size);
+		return bpf_ctx_narrow_access_ok(off, size, field_size);
+	case bpf_ctx_range(struct bpf_sock, zero_padding):
+		return false;
 	}
 
 	return size == size_default;