Message ID | 20201201144418.35045-7-kuniyu@amazon.co.jp (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | Socket migration for SO_REUSEPORT. | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 15755 this patch: 15755 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 30 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 15667 this patch: 15667 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to > check if the attached eBPF program is capable of migrating sockets. > > When the eBPF program is attached, the kernel runs it for socket migration > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. > The kernel will change the behaviour depending on the returned value: > > - SK_PASS with selected_sk, select it as a new listener > - SK_PASS with selected_sk NULL, fall back to the random selection > - SK_DROP, cancel the migration > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ > Suggested-by: Martin KaFai Lau <kafai@fb.com> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > --- > include/uapi/linux/bpf.h | 2 ++ > kernel/bpf/syscall.c | 8 ++++++++ > tools/include/uapi/linux/bpf.h | 2 ++ > 3 files changed, 12 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 85278deff439..cfc207ae7782 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -241,6 +241,8 @@ enum bpf_attach_type { > BPF_XDP_CPUMAP, > BPF_SK_LOOKUP, > BPF_XDP, > + BPF_SK_REUSEPORT_SELECT, > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > __MAX_BPF_ATTACH_TYPE > }; > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index f3fe9f53f93c..a0796a8de5ea 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > if (expected_attach_type == BPF_SK_LOOKUP) > return 0; > return -EINVAL; > + case BPF_PROG_TYPE_SK_REUSEPORT: > + switch (expected_attach_type) { > + case BPF_SK_REUSEPORT_SELECT: > + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: > + return 0; > + default: > + return -EINVAL; > + } this is a kernel regression, previously expected_attach_type wasn't enforced, so user-space could have provided any number without an error. > case BPF_PROG_TYPE_EXT: > if (expected_attach_type) > return -EINVAL; > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index 85278deff439..cfc207ae7782 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -241,6 +241,8 @@ enum bpf_attach_type { > BPF_XDP_CPUMAP, > BPF_SK_LOOKUP, > BPF_XDP, > + BPF_SK_REUSEPORT_SELECT, > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > __MAX_BPF_ATTACH_TYPE > }; > > -- > 2.17.2 (Apple Git-113) >
On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote: > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to > > check if the attached eBPF program is capable of migrating sockets. > > > > When the eBPF program is attached, the kernel runs it for socket migration > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. > > The kernel will change the behaviour depending on the returned value: > > > > - SK_PASS with selected_sk, select it as a new listener > > - SK_PASS with selected_sk NULL, fall back to the random selection > > - SK_DROP, cancel the migration > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ > > Suggested-by: Martin KaFai Lau <kafai@fb.com> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > > --- > > include/uapi/linux/bpf.h | 2 ++ > > kernel/bpf/syscall.c | 8 ++++++++ > > tools/include/uapi/linux/bpf.h | 2 ++ > > 3 files changed, 12 insertions(+) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 85278deff439..cfc207ae7782 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > BPF_XDP_CPUMAP, > > BPF_SK_LOOKUP, > > BPF_XDP, > > + BPF_SK_REUSEPORT_SELECT, > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > __MAX_BPF_ATTACH_TYPE > > }; > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > index f3fe9f53f93c..a0796a8de5ea 100644 > > --- a/kernel/bpf/syscall.c > > +++ b/kernel/bpf/syscall.c > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > > if (expected_attach_type == BPF_SK_LOOKUP) > > return 0; > > return -EINVAL; > > + case BPF_PROG_TYPE_SK_REUSEPORT: > > + switch (expected_attach_type) { > > + case BPF_SK_REUSEPORT_SELECT: > > + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: > > + return 0; > > + default: > > + return -EINVAL; > > + } > > this is a kernel regression, previously expected_attach_type wasn't > enforced, so user-space could have provided any number without an > error. I also think this change alone will break things like when the usual attr->expected_attach_type == 0 case. At least changes is needed in bpf_prog_load_fixup_attach_type() which is also handling a similar situation for BPF_PROG_TYPE_CGROUP_SOCK. I now think there is no need to expose new bpf_attach_type to the UAPI. Since the prog->expected_attach_type is not used, it can be cleared at load time and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined internally at filter.[c|h]) in the is_valid_access() when "migration" is accessed. When "migration" is accessed, the bpf prog can handle migration (and the original not-migration) case. > > > case BPF_PROG_TYPE_EXT: > > if (expected_attach_type) > > return -EINVAL; > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > index 85278deff439..cfc207ae7782 100644 > > --- a/tools/include/uapi/linux/bpf.h > > +++ b/tools/include/uapi/linux/bpf.h > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > BPF_XDP_CPUMAP, > > BPF_SK_LOOKUP, > > BPF_XDP, > > + BPF_SK_REUSEPORT_SELECT, > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > __MAX_BPF_ATTACH_TYPE > > }; > > > > -- > > 2.17.2 (Apple Git-113) > >
On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote: > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote: > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to > > > check if the attached eBPF program is capable of migrating sockets. > > > > > > When the eBPF program is attached, the kernel runs it for socket migration > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. > > > The kernel will change the behaviour depending on the returned value: > > > > > > - SK_PASS with selected_sk, select it as a new listener > > > - SK_PASS with selected_sk NULL, fall back to the random selection > > > - SK_DROP, cancel the migration > > > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ > > > Suggested-by: Martin KaFai Lau <kafai@fb.com> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > > > --- > > > include/uapi/linux/bpf.h | 2 ++ > > > kernel/bpf/syscall.c | 8 ++++++++ > > > tools/include/uapi/linux/bpf.h | 2 ++ > > > 3 files changed, 12 insertions(+) > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > index 85278deff439..cfc207ae7782 100644 > > > --- a/include/uapi/linux/bpf.h > > > +++ b/include/uapi/linux/bpf.h > > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > > BPF_XDP_CPUMAP, > > > BPF_SK_LOOKUP, > > > BPF_XDP, > > > + BPF_SK_REUSEPORT_SELECT, > > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > > __MAX_BPF_ATTACH_TYPE > > > }; > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > index f3fe9f53f93c..a0796a8de5ea 100644 > > > --- a/kernel/bpf/syscall.c > > > +++ b/kernel/bpf/syscall.c > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > > > if (expected_attach_type == BPF_SK_LOOKUP) > > > return 0; > > > return -EINVAL; > > > + case BPF_PROG_TYPE_SK_REUSEPORT: > > > + switch (expected_attach_type) { > > > + case BPF_SK_REUSEPORT_SELECT: > > > + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: > > > + return 0; > > > + default: > > > + return -EINVAL; > > > + } > > > > this is a kernel regression, previously expected_attach_type wasn't > > enforced, so user-space could have provided any number without an > > error. > I also think this change alone will break things like when the usual > attr->expected_attach_type == 0 case. At least changes is needed in > bpf_prog_load_fixup_attach_type() which is also handling a > similar situation for BPF_PROG_TYPE_CGROUP_SOCK. > > I now think there is no need to expose new bpf_attach_type to the UAPI. > Since the prog->expected_attach_type is not used, it can be cleared at load time > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined > internally at filter.[c|h]) in the is_valid_access() when "migration" > is accessed. When "migration" is accessed, the bpf prog can handle > migration (and the original not-migration) case. Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea. I think there will be cases that bpf prog wants to do both without accessing any field from sk_reuseport_md. Lets go back to the discussion on using a similar idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type(). I am not aware there is loader setting a random number in expected_attach_type, so the chance of breaking is very low. There was a similar discussion earlier [0]. [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/ > > > > > > case BPF_PROG_TYPE_EXT: > > > if (expected_attach_type) > > > return -EINVAL; > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > > index 85278deff439..cfc207ae7782 100644 > > > --- a/tools/include/uapi/linux/bpf.h > > > +++ b/tools/include/uapi/linux/bpf.h > > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > > BPF_XDP_CPUMAP, > > > BPF_SK_LOOKUP, > > > BPF_XDP, > > > + BPF_SK_REUSEPORT_SELECT, > > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > > __MAX_BPF_ATTACH_TYPE > > > }; > > > > > > -- > > > 2.17.2 (Apple Git-113) > > >
From: Martin KaFai Lau <kafai@fb.com> Date: Wed, 2 Dec 2020 20:24:02 -0800 > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote: > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote: > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > > > > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to > > > > check if the attached eBPF program is capable of migrating sockets. > > > > > > > > When the eBPF program is attached, the kernel runs it for socket migration > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. > > > > The kernel will change the behaviour depending on the returned value: > > > > > > > > - SK_PASS with selected_sk, select it as a new listener > > > > - SK_PASS with selected_sk NULL, fall back to the random selection > > > > - SK_DROP, cancel the migration > > > > > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com> > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > > > > --- > > > > include/uapi/linux/bpf.h | 2 ++ > > > > kernel/bpf/syscall.c | 8 ++++++++ > > > > tools/include/uapi/linux/bpf.h | 2 ++ > > > > 3 files changed, 12 insertions(+) > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > index 85278deff439..cfc207ae7782 100644 > > > > --- a/include/uapi/linux/bpf.h > > > > +++ b/include/uapi/linux/bpf.h > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > > > BPF_XDP_CPUMAP, > > > > BPF_SK_LOOKUP, > > > > BPF_XDP, > > > > + BPF_SK_REUSEPORT_SELECT, > > > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > > > __MAX_BPF_ATTACH_TYPE > > > > }; > > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > > index f3fe9f53f93c..a0796a8de5ea 100644 > > > > --- a/kernel/bpf/syscall.c > > > > +++ b/kernel/bpf/syscall.c > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > > > > if (expected_attach_type == BPF_SK_LOOKUP) > > > > return 0; > > > > return -EINVAL; > > > > + case BPF_PROG_TYPE_SK_REUSEPORT: > > > > + switch (expected_attach_type) { > > > > + case BPF_SK_REUSEPORT_SELECT: > > > > + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: > > > > + return 0; > > > > + default: > > > > + return -EINVAL; > > > > + } > > > > > > this is a kernel regression, previously expected_attach_type wasn't > > > enforced, so user-space could have provided any number without an > > > error. > > I also think this change alone will break things like when the usual > > attr->expected_attach_type == 0 case. At least changes is needed in > > bpf_prog_load_fixup_attach_type() which is also handling a > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK. > > > > I now think there is no need to expose new bpf_attach_type to the UAPI. > > Since the prog->expected_attach_type is not used, it can be cleared at load time > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined > > internally at filter.[c|h]) in the is_valid_access() when "migration" > > is accessed. When "migration" is accessed, the bpf prog can handle > > migration (and the original not-migration) case. > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea. > I think there will be cases that bpf prog wants to do both > without accessing any field from sk_reuseport_md. > > Lets go back to the discussion on using a similar > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type(). > I am not aware there is loader setting a random number > in expected_attach_type, so the chance of breaking > is very low. There was a similar discussion earlier [0]. > > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/ Thank you for the idea and reference. I will remove the change in bpf_prog_load_check_attach() and set the default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type() for backward compatibility if expected_attach_type is 0. > > > > case BPF_PROG_TYPE_EXT: > > > > if (expected_attach_type) > > > > return -EINVAL; > > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > > > index 85278deff439..cfc207ae7782 100644 > > > > --- a/tools/include/uapi/linux/bpf.h > > > > +++ b/tools/include/uapi/linux/bpf.h > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > > > BPF_XDP_CPUMAP, > > > > BPF_SK_LOOKUP, > > > > BPF_XDP, > > > > + BPF_SK_REUSEPORT_SELECT, > > > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > > > __MAX_BPF_ATTACH_TYPE > > > > }; > > > > > > > > -- > > > > 2.17.2 (Apple Git-113)
On Thu, Dec 03, 2020 at 11:16:08PM +0900, Kuniyuki Iwashima wrote: > From: Martin KaFai Lau <kafai@fb.com> > Date: Wed, 2 Dec 2020 20:24:02 -0800 > > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote: > > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote: > > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > > > > > > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to > > > > > check if the attached eBPF program is capable of migrating sockets. > > > > > > > > > > When the eBPF program is attached, the kernel runs it for socket migration > > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. > > > > > The kernel will change the behaviour depending on the returned value: > > > > > > > > > > - SK_PASS with selected_sk, select it as a new listener > > > > > - SK_PASS with selected_sk NULL, fall back to the random selection > > > > > - SK_DROP, cancel the migration > > > > > > > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ > > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com> > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > > > > > --- > > > > > include/uapi/linux/bpf.h | 2 ++ > > > > > kernel/bpf/syscall.c | 8 ++++++++ > > > > > tools/include/uapi/linux/bpf.h | 2 ++ > > > > > 3 files changed, 12 insertions(+) > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > index 85278deff439..cfc207ae7782 100644 > > > > > --- a/include/uapi/linux/bpf.h > > > > > +++ b/include/uapi/linux/bpf.h > > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > > > > BPF_XDP_CPUMAP, > > > > > BPF_SK_LOOKUP, > > > > > BPF_XDP, > > > > > + BPF_SK_REUSEPORT_SELECT, > > > > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > > > > __MAX_BPF_ATTACH_TYPE > > > > > }; > > > > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > > > index f3fe9f53f93c..a0796a8de5ea 100644 > > > > > --- a/kernel/bpf/syscall.c > > > > > +++ b/kernel/bpf/syscall.c > > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > > > > > if (expected_attach_type == BPF_SK_LOOKUP) > > > > > return 0; > > > > > return -EINVAL; > > > > > + case BPF_PROG_TYPE_SK_REUSEPORT: > > > > > + switch (expected_attach_type) { > > > > > + case BPF_SK_REUSEPORT_SELECT: > > > > > + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: > > > > > + return 0; > > > > > + default: > > > > > + return -EINVAL; > > > > > + } > > > > > > > > this is a kernel regression, previously expected_attach_type wasn't > > > > enforced, so user-space could have provided any number without an > > > > error. > > > I also think this change alone will break things like when the usual > > > attr->expected_attach_type == 0 case. At least changes is needed in > > > bpf_prog_load_fixup_attach_type() which is also handling a > > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK. > > > > > > I now think there is no need to expose new bpf_attach_type to the UAPI. > > > Since the prog->expected_attach_type is not used, it can be cleared at load time > > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined > > > internally at filter.[c|h]) in the is_valid_access() when "migration" > > > is accessed. When "migration" is accessed, the bpf prog can handle > > > migration (and the original not-migration) case. > > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea. > > I think there will be cases that bpf prog wants to do both > > without accessing any field from sk_reuseport_md. > > > > Lets go back to the discussion on using a similar > > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type(). > > I am not aware there is loader setting a random number > > in expected_attach_type, so the chance of breaking > > is very low. There was a similar discussion earlier [0]. > > > > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/ > > Thank you for the idea and reference. > > I will remove the change in bpf_prog_load_check_attach() and set the > default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type() > for backward compatibility if expected_attach_type is 0. check_attach_type() can be kept. You can refer to commit aac3fc320d94 for a similar situation.
I'm sending this mail just for logging because I failed to send mails only to LKML, netdev, and bpf yesterday. From: Martin KaFai Lau <kafai@fb.com> Date: Thu, 3 Dec 2020 21:56:53 -0800 > On Thu, Dec 03, 2020 at 11:16:08PM +0900, Kuniyuki Iwashima wrote: > > From: Martin KaFai Lau <kafai@fb.com> > > Date: Wed, 2 Dec 2020 20:24:02 -0800 > > > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote: > > > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote: > > > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > > > > > > > > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to > > > > > > check if the attached eBPF program is capable of migrating sockets. > > > > > > > > > > > > When the eBPF program is attached, the kernel runs it for socket migration > > > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. > > > > > > The kernel will change the behaviour depending on the returned value: > > > > > > > > > > > > - SK_PASS with selected_sk, select it as a new listener > > > > > > - SK_PASS with selected_sk NULL, fall back to the random selection > > > > > > - SK_DROP, cancel the migration > > > > > > > > > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ > > > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com> > > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > > > > > > --- > > > > > > include/uapi/linux/bpf.h | 2 ++ > > > > > > kernel/bpf/syscall.c | 8 ++++++++ > > > > > > tools/include/uapi/linux/bpf.h | 2 ++ > > > > > > 3 files changed, 12 insertions(+) > > > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > index 85278deff439..cfc207ae7782 100644 > > > > > > --- a/include/uapi/linux/bpf.h > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type { > > > > > > BPF_XDP_CPUMAP, > > > > > > BPF_SK_LOOKUP, > > > > > > BPF_XDP, > > > > > > + BPF_SK_REUSEPORT_SELECT, > > > > > > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, > > > > > > __MAX_BPF_ATTACH_TYPE > > > > > > }; > > > > > > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > > > > index f3fe9f53f93c..a0796a8de5ea 100644 > > > > > > --- a/kernel/bpf/syscall.c > > > > > > +++ b/kernel/bpf/syscall.c > > > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > > > > > > if (expected_attach_type == BPF_SK_LOOKUP) > > > > > > return 0; > > > > > > return -EINVAL; > > > > > > + case BPF_PROG_TYPE_SK_REUSEPORT: > > > > > > + switch (expected_attach_type) { > > > > > > + case BPF_SK_REUSEPORT_SELECT: > > > > > > + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: > > > > > > + return 0; > > > > > > + default: > > > > > > + return -EINVAL; > > > > > > + } > > > > > > > > > > this is a kernel regression, previously expected_attach_type wasn't > > > > > enforced, so user-space could have provided any number without an > > > > > error. > > > > I also think this change alone will break things like when the usual > > > > attr->expected_attach_type == 0 case. At least changes is needed in > > > > bpf_prog_load_fixup_attach_type() which is also handling a > > > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK. > > > > > > > > I now think there is no need to expose new bpf_attach_type to the UAPI. > > > > Since the prog->expected_attach_type is not used, it can be cleared at load time > > > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined > > > > internally at filter.[c|h]) in the is_valid_access() when "migration" > > > > is accessed. When "migration" is accessed, the bpf prog can handle > > > > migration (and the original not-migration) case. > > > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea. > > > I think there will be cases that bpf prog wants to do both > > > without accessing any field from sk_reuseport_md. > > > > > > Lets go back to the discussion on using a similar > > > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type(). > > > I am not aware there is loader setting a random number > > > in expected_attach_type, so the chance of breaking > > > is very low. There was a similar discussion earlier [0]. > > > > > > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/ > > > > Thank you for the idea and reference. > > > > I will remove the change in bpf_prog_load_check_attach() and set the > > default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type() > > for backward compatibility if expected_attach_type is 0. > check_attach_type() can be kept. You can refer to > commit aac3fc320d94 for a similar situation. I confirmed bpf_prog_load_fixup_attach_type() is called just before bpf_prog_load_check_attach(), so I will add the fixup code to this patch. Thank you.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 85278deff439..cfc207ae7782 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -241,6 +241,8 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_REUSEPORT_SELECT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f3fe9f53f93c..a0796a8de5ea 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, if (expected_attach_type == BPF_SK_LOOKUP) return 0; return -EINVAL; + case BPF_PROG_TYPE_SK_REUSEPORT: + switch (expected_attach_type) { + case BPF_SK_REUSEPORT_SELECT: + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: + return 0; + default: + return -EINVAL; + } case BPF_PROG_TYPE_EXT: if (expected_attach_type) return -EINVAL; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 85278deff439..cfc207ae7782 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -241,6 +241,8 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_REUSEPORT_SELECT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, __MAX_BPF_ATTACH_TYPE };
This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to check if the attached eBPF program is capable of migrating sockets. When the eBPF program is attached, the kernel runs it for socket migration only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. The kernel will change the behaviour depending on the returned value: - SK_PASS with selected_sk, select it as a new listener - SK_PASS with selected_sk NULL, fall back to the random selection - SK_DROP, cancel the migration Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> --- include/uapi/linux/bpf.h | 2 ++ kernel/bpf/syscall.c | 8 ++++++++ tools/include/uapi/linux/bpf.h | 2 ++ 3 files changed, 12 insertions(+)