Message ID | 20231219000520.34178-1-alexei.starovoitov@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c49b292d031e385abf764ded32cd953c77e73f2d |
Series | pull-request: bpf-next 2023-12-18 |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Pull request for net-next, async |
netdev/build_32bit | fail | Errors and warnings before: 7908 this patch: 7917 |
netdev/build_clang | fail | Errors and warnings before: 1330 this patch: 1330 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 8444 this patch: 8453 |
netdev/build_clang_rust | success | No Rust files in patch. Skipping build |
On Mon, Dec 18, 2023 at 4:05 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > Please consider pulling these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git tags/for-netdev Forgot to mention that there are two conflicts in include/linux/skbuff.h and tools/net/ynl/generated/netdev-user.c that should be easy to resolve. First is due to typo fix in the word 'variant' and code move.
On Mon, 18 Dec 2023 16:05:20 -0800 Alexei Starovoitov wrote:
> 2) Introduce BPF token object, from Andrii Nakryiko.
> It adds an ability to delegate a subset of BPF features from privileged daemon
> (e.g., systemd) through special mount options for userns-bound BPF FS to a
> trusted unprivileged application. The design accommodates suggestions from
> Christian Brauner and Paul Moore.
> Example:
> $ sudo mkdir -p /sys/fs/bpf/token
> $ sudo mount -t bpf bpffs /sys/fs/bpf/token \
> -o delegate_cmds=prog_load:MAP_CREATE \
> -o delegate_progs=kprobe \
> -o delegate_attachs=xdp

LGTM, but what do I know about file systems.. Adding LKML to the CC list, if anyone has any late comments on the BPF token come forward now, pretty please?
Hello: This pull request was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Mon, 18 Dec 2023 16:05:20 -0800 you wrote: > Hi David, hi Jakub, hi Paolo, hi Eric, > > The following pull-request contains BPF updates for your *net-next* tree. > > This PR is larger than usual and contains changes in various parts of the kernel. > > The main changes are: > > [...] Here is the summary with links: - pull-request: bpf-next 2023-12-18 https://git.kernel.org/netdev/net-next/c/c49b292d031e You are awesome, thank you!
On Mon, 18 Dec 2023 at 16:05, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> 2) Introduce BPF token object, from Andrii Nakryiko.

I assume this is why I and some other unusual recipients are cc'd, because the networking people feel like they can't judge this and shouldn't merge non-networking code like this.

Honestly, I was told - and expected - that this part would come in a branch of its own, so that it would be sanely reviewable.

Now it's mixed in with everything else.

This is *literally* why we have branches in git, so that people can make more independent changes and judgements, and so that we don't have to be in a situation where "look, here's ten different things, pull it all or nothing".

Many of the changes *look* like they are in branches, but they've been the "fake branches" that are just done as "patch series in a branch, with the cover letter as the merge message".

Which is great for maintaining that cover letter information and a certain amount of historical clarity, but not helpful AT ALL for the "independent changes" thing when it is all mixed up in history, where independent things are mostly serialized and not actually independent in history at all.

So now it appears to be one big mess, and exactly that "all or nothing" thing that isn't great, since the whole point was that the networking people weren't comfortable with reviewing the filesystem side.

And honestly, the bpf side *still* seems to be absolutely confused and complete crap when it comes to file descriptors.

I took a quick look, and I *still* see new code being introduced there that thinks that file descriptor zero is special, and we told you a *year* ago that that wasn't true, and that you need to fix this.

I literally see complete garbage like this:

        ..
        __u32 btf_token_fd;
        ...
        if (attr->btf_token_fd) {
                token = bpf_token_get_from_fd(attr->btf_token_fd);

and this is all *new* code that makes that same bogus sh*t-for-brains mistake that was wrong the first time.

So now I'm saying NAK. Enough is enough. No more of this crazy "I don't understand even the _basics_ of file descriptors, and yet I'm introducing new random interfaces".

I know you thought fd zero was something invalid. You were told otherwise. Apparently you just ignored being wrong, and have decided to double down on being wrong.

We don't take this kind of flat-Earther crap.

File descriptors don't start at 1. Deal with reality. Stop making the same mistake over and over. If you want to have a "no file descriptor" flag, you use a signed type, and a signed value for that, because file descriptor zero is perfectly valid, and I don't want to hear any more uninformed denialism.

Stop polluting the kernel with incorrect assumptions.

So yes, I will keep NAK'ing this until this kind of fundamental mistake is fixed. This is not rocket science, and this is not something that wasn't discussed before. Your ignorance has now turned from "I didn't know" to "I didn't care", and at that point I really don't want to see new code any more.

Linus
On Mon, 18 Dec 2023 at 16:55, Jakub Kicinski <kuba@kernel.org> wrote:
>
> LGTM, but what do I know about file systems.. Adding LKML to the CC
> list, if anyone has any late comments on the BPF token come forward
> now, pretty please?

See my crossed email reply.

The file descriptor handling is FUNDAMENTALLY wrong. The first time that happened, we chalked it up to a mistake. Now it's something worse.

Please don't pull until at least that part is fixed. I tried to review the token patches, but honestly, I got to that part and I just gave up.

We had this whole discussion more than 6 months ago:

https://lore.kernel.org/all/20230517-allabendlich-umgekehrt-8cc81f8313ac@brauner/

and I really thought the bpf people had *understood* that their special use of "fd == 0" was wrong.

But it seems that they never did. Once is a mistake. Twice is a choice. And the bpf people have chosen insanity.

Linus
On Mon, Dec 18, 2023 at 5:11 PM Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>
> I literally see complete garbage like this:
>
> ..
> __u32 btf_token_fd;
> ...
> if (attr->btf_token_fd) {
> token = bpf_token_get_from_fd(attr->btf_token_fd);
>
> and this is all *new* code that makes that same bogus sh*t-for-brains
> mistake that was wrong the first time.

Point taken.
We can do s/__u32 token_fd/__u64 token/
and waste upper 32-bit as flags that indicate that lower 32-bit is an FD
or
are you ok with __u32 token that is 'fd + 1'.
zero - invalid
one - FD==0
two - FD==1
?

Naming is hard. 'token_handle' maybe?
On Mon, 18 Dec 2023 at 17:48, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> Point taken.
> We can do s/__u32 token_fd/__u64 token/
> and waste upper 32-bit as flags that indicate that lower 32-bit is an FD
> or
> are you ok with __u32 token that is 'fd + 1'.

No, you make it follow the standard pattern that Unix has always had: file descriptors are _signed_ integer, and negative means error (or special cases).

Now, traditionally a 'fd' is literally just of type "int", but for structures it's actually good to make it be a sized entity, so just make it be __s32, and make any special cases be actual negative numbers.

Because I'll just go out on a limb and say that two billion file descriptors is enough for anybody, and if we ever were to hit that number, we'll have *way* more serious problems elsewhere long long before. And in practice, "int" is 32-bit on all current and near-future architectures, so "__s32" really is the same as "int" in all practical respects, and making the size explicit is just a good idea.

You might want to perhaps pre-reserve a few negative numbers for actual special cases, eg "openat()" uses

        #define AT_FDCWD -100

which I don't think is a great example to follow in the details: it should have parentheses, and "100" is a rather odd number to choose, but it's certainly an example of a not-fundamentally-broken "not a file descriptor, but a special case".

Now, if you have a 'flags' or 'cmd' field for *other* reasons, then you can certainly just use one of the flags for "I have a file descriptor". But don't do some odd "translate values", and don't add 32 bits just for that.

That's also a perfectly fine traditional unix use (example: socket control messages - "struct cmsghdr" with "cmsg_type = SCM_RIGHTS" in unix domain sockets).

But if you don't have some other reason for having a separate flag for "I also have a file descriptor you should use", then just make a negative number mean "no file descriptor".

It's easy to test for the number being negative, but it's also just easy to *not* test for, ie it's also perfectly fine to just do something like

        struct fd f = fdget(fd);

without ever even bothering to test whether 'fd' is negative or not. It is guaranteed to fail for negative numbers and just look exactly like the "not open" case, so if you don't care about the difference between "invalid" and "not open", then a negative fd also works just as-is with no extra code at all.

Linus
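The signed-descriptor convention described in this mail can be sketched in plain C. This is a minimal user-space illustration, not kernel code: `struct example_attr`, `EXAMPLE_NO_FD`, and both helper functions are hypothetical names invented here, and `int32_t` stands in for the kernel's `__s32`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for "no file descriptor"; parenthesized,
 * as the mail suggests a macro like AT_FDCWD should have been. */
#define EXAMPLE_NO_FD (-1)

/* Hypothetical attribute struct; NOT the real union bpf_attr. */
struct example_attr {
	int32_t token_fd; /* stands in for __s32; any negative value means "absent" */
};

/* Signed convention: every non-negative value, including 0, is a real fd. */
static bool example_has_fd(const struct example_attr *attr)
{
	assert(attr != NULL);
	return attr->token_fd >= 0;
}

/* The pattern criticized in the thread, kept only for contrast:
 * an unsigned field where 0 doubles as "absent", so fd 0 is unusable. */
static bool example_has_fd_broken(uint32_t token_fd)
{
	return token_fd != 0;
}
```

With this layout a caller that passes descriptor 0 is treated like any other caller, while `EXAMPLE_NO_FD` (or any other negative value) reads as "nothing supplied". The broken variant cannot tell "descriptor 0" apart from "not set".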
On Mon, Dec 18, 2023 at 7:58 PM Linus Torvalds <torvalds@linuxfoundation.org> wrote: > > On Mon, 18 Dec 2023 at 17:48, Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > > > Point taken. > > We can do s/__u32 token_fd/__u64 token/ > > and waste upper 32-bit as flags that indicate that lower 32-bit is an FD > > or > > are you ok with __u32 token that is 'fd + 1'. > > No, you make it follow the standard pattern that Unix has always had: > file descriptors are _signed_ integer, and negative means error (or > special cases). > > Now, traditionally a 'fd' is literally just of type "int", but for > structures it's actually good to make it be a sized entity, so just > make it be __s32, and make any special cases be actual negative > numbers. > > Because I'll just go out on a limb and say that two billion file > descriptors is enough for anybody, and if we ever were to hit that > number, we'll have *way* more serious problems elsewhere long long > before. And in practice, "int" is 32-bit on all current and > near-future architectures, so "__s32" really is the same as "int" in > all practical respects, and making the size explicit is just a good > idea. > > You might want to perhaps pre-reserve a few negative numbers for > actual special cases, eg "openat()" uses > > #define AT_FDCWD -100 > > which I don't think is a great example to follow in the details: it > should have parenthesis, and "100" is a rather odd number to choose, > but it's certainly an example of a not-fundamentally-broken "not a > file descriptor, but a special case". > > Now, if you have a 'flags' or 'cmd' field for *other* reasons, then > you can certainly just use one of the flags for "I have a file > descriptor". But don't do some odd "translate values", and don't add > 32 bits just for that. > Makes sense. Yes, we do have flags for all commands accepting token FD, except for one, BPF_BTF_LOAD, but it's trivial to add flags there as well. I'll prepare a patch. 
On Mon, Dec 18, 2023 at 8:34 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Mon, Dec 18, 2023 at 7:58 PM Linus Torvalds > <torvalds@linuxfoundation.org> wrote: > > > > On Mon, 18 Dec 2023 at 17:48, Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > Point taken. > > > We can do s/__u32 token_fd/__u64 token/ > > > and waste upper 32-bit as flags that indicate that lower 32-bit is an FD > > > or > > > are you ok with __u32 token that is 'fd + 1'. > > > > No, you make it follow the standard pattern that Unix has always had: > > file descriptors are _signed_ integer, and negative means error (or > > special cases). > > > > Now, traditionally a 'fd' is literally just of type "int", but for > > structures it's actually good to make it be a sized entity, so just > > make it be __s32, and make any special cases be actual negative > > numbers. > > > > Because I'll just go out on a limb and say that two billion file > > descriptors is enough for anybody, and if we ever were to hit that > > number, we'll have *way* more serious problems elsewhere long long > > before. And in practice, "int" is 32-bit on all current and > > near-future architectures, so "__s32" really is the same as "int" in > > all practical respects, and making the size explicit is just a good > > idea. > > > > You might want to perhaps pre-reserve a few negative numbers for > > actual special cases, eg "openat()" uses > > > > #define AT_FDCWD -100 > > > > which I don't think is a great example to follow in the details: it > > should have parenthesis, and "100" is a rather odd number to choose, > > but it's certainly an example of a not-fundamentally-broken "not a > > file descriptor, but a special case". > > > > Now, if you have a 'flags' or 'cmd' field for *other* reasons, then > > you can certainly just use one of the flags for "I have a file > > descriptor". But don't do some odd "translate values", and don't add > > 32 bits just for that. > > > > Makes sense. 
Yes, we do have flags for all commands accepting token > FD, except for one, BPF_BTF_LOAD, but it's trivial to add flags there > as well. I'll prepare a patch. The patch is at [0], thanks. [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231219053150.336991-1-andrii@kernel.org/ > > > That's also a perfectly fine traditional unix use (example: socket > > control messages - "struct cmsghdr" with "cmsg_type = SCM_RIGHTS" in > > unix domain sockets). > > > > But if you don't have some other reason for having a separate flag for > > "I also have a file descriptor you should use", then just make a > > negative number mean "no file descriptor". > > > > It's easy to test for the number being negative, but it's also just > > easy to *not* test for, ie it's also perfectly fine to just do > > something like > > > > struct fd f = fdget(fd); > > > > without ever even bothering to test whether 'fd' is negative or not. > > It is guaranteed to fail for negative numbers and just look exactly > > like the "not open" case, so if you don't care about the difference > > between "invalid" and "not open", then a negative fd also works just > > as-is with no extra code at all. > > > > Linus > > > > Linus
On Mon, Dec 18, 2023 at 05:11:23PM -0800, Linus Torvalds wrote: > On Mon, 18 Dec 2023 at 16:05, Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > 2) Introduce BPF token object, from Andrii Nakryiko. > > I assume this is why I and some other unusual recipients are cc'd, > because the networking people feel like they can't judge this and > shouldn't merge non-networking code like this. > > Honestly, I was told - and expected - that this part would come in a > branch of its own, so that it would be sanely reviewable. > > Now it's mixed in with everything else. > > This is *literally* why we have branches in git, so that people can > make more independent changes and judgements, and so that we don't > have to be in a situation where "look, here's ten different things, > pull it all or nothing". > > Many of the changes *look* like they are in branches, but they've been > the "fake branches" that are just done as "patch series in a branch, > with the cover letter as the merge message". > > Which is great for maintaining that cover letter information and a > certain amount of historical clarity, but not helpful AT ALL for the > "independent changes" thing when it is all mixed up in history, where > independent things are mostly serialized and not actually independent > in history at all. > > So now it appears to be one big mess, and exactly that "all or > nothing" thing that isn't great, since the whole point was that the > networking people weren't comfortable with the reviewing filesystem > side. > > And honestly, the bpf side *still* seems to be absolutely conbfused > and complkete crap when it comes to file descriptors. > > I took a quick look, and I *still* see new code being introduced there > that thinks that file descriptor zero is special, and we tols you a > *year* ago that that wasn't true, and that you need to fix this. > > I literally see complete garbage like tghis: > > .. > __u32 btf_token_fd; > ... 
> if (attr->btf_token_fd) { > token = bpf_token_get_from_fd(attr->btf_token_fd); > > and this is all *new* code that makes that same bogus sh*t-for-brains > mistake that was wrong the first time. > > So now I'm saying NAK. Enough is enough. No more of this crazy "I > don't understand even the _basics_ of file descriptors, and yet I'm > introducing new random interfaces". > > I know you thought fd zero was something invalid. You were told > otherwise. Apparently you just ignored being wrong, and have decided > to double down on being wrong. > > We don't take this kind of flat-Earther crap. > > File descriptors don't start at 1. Deal with reality. Stop making the > same mistake over and over. If you ant to have a "no file descriptor" > flag, you use a signed type, and a signed value for that, because file > descriptor zero is perfectly valid, and I don't want to hear any more > uninformed denialism. > > Stop polluting the kernel with incorrect assumptions. > > So yes, I will keep NAK'ing this until this kind of fundamental > mistake is fixed. This is not rocket science, and this is not > something that wasn't discussed before. Your ignorance has now turned > from "I didn't know" to "I didn 't care", and at that point I really > don't want to see new code any more. Alexei, Andrii, this is a massive breach of trust and flatout disrespectful. I barely reword mails and believe me I've reworded this mail many times. I'm furious. Over the last couple of months since LSFMM in May 2023 until almost last week I've given you extensive design and review for this whole approach to get this into even remotely sane shape from a VFS perspective. The VFS maintainers including Linus have explicitly NAKed this "zero is not a valid fd nonsense" and told you to stop doing that. We told you that such fundamental VFS semantics are not yours to decide. 
And yet you put a patch into a series that did exactly that and then had the unbelievable audacity to repeatedly ask me to put my ACK under this - both in person and on list.

I'm glad I only gave my ACK to the two patches that I extensively reviewed and never to the whole series.

@Linus, I'd like to ask you to please not pull any BPF code that touches fs/ in any way without an explicit ACK/RVB from Al, Jan, or myself. For now, everything BPF related to fs/ is proactively NAKed by me.

This is disrespectful to the whole fs community and to me personally and precisely why we will keep resisting meaningful BPF integration in fs/ until we can be sure that we can trust this subsystem.

Pleasant in-person interactions are one thing. But they're meaningless if they're inconsistent with on-list behavior and matters of trust.

Disgraceful and tasteless is what I keep coming back to.
On Tue, Dec 19, 2023 at 11:23:50AM +0100, Christian Brauner wrote: > Alexei, Andrii, this is a massive breach of trust and flatout > disrespectful. I barely reword mails and believe me I've reworded this > mail many times. I'm furious. > > Over the last couple of months since LSFMM in May 2023 until almost last > week I've given you extensive design and review for this whole approach > to get this into even remotely sane shape from a VFS perspective. This isn't new behaviour from the BPF people. They always go their own way on everything. They refuse to collaborate with anyone in MM to make the memory allocators work with their constraints; instead they implement their own. It feels like they're on a Mission From God to implement the BPF Operating System and dealing with everyone else is an inconvenience. https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/
On 12/19/23 11:23 AM, Christian Brauner wrote: > On Mon, Dec 18, 2023 at 05:11:23PM -0800, Linus Torvalds wrote: >> On Mon, 18 Dec 2023 at 16:05, Alexei Starovoitov >> <alexei.starovoitov@gmail.com> wrote: >>> >>> 2) Introduce BPF token object, from Andrii Nakryiko. >> >> I assume this is why I and some other unusual recipients are cc'd, >> because the networking people feel like they can't judge this and >> shouldn't merge non-networking code like this. >> >> Honestly, I was told - and expected - that this part would come in a >> branch of its own, so that it would be sanely reviewable. >> >> Now it's mixed in with everything else. >> >> This is *literally* why we have branches in git, so that people can >> make more independent changes and judgements, and so that we don't >> have to be in a situation where "look, here's ten different things, >> pull it all or nothing". >> >> Many of the changes *look* like they are in branches, but they've been >> the "fake branches" that are just done as "patch series in a branch, >> with the cover letter as the merge message". >> >> Which is great for maintaining that cover letter information and a >> certain amount of historical clarity, but not helpful AT ALL for the >> "independent changes" thing when it is all mixed up in history, where >> independent things are mostly serialized and not actually independent >> in history at all. >> >> So now it appears to be one big mess, and exactly that "all or >> nothing" thing that isn't great, since the whole point was that the >> networking people weren't comfortable with the reviewing filesystem >> side. >> >> And honestly, the bpf side *still* seems to be absolutely conbfused >> and complkete crap when it comes to file descriptors. >> >> I took a quick look, and I *still* see new code being introduced there >> that thinks that file descriptor zero is special, and we tols you a >> *year* ago that that wasn't true, and that you need to fix this. 
>> >> I literally see complete garbage like tghis: >> >> .. >> __u32 btf_token_fd; >> ... >> if (attr->btf_token_fd) { >> token = bpf_token_get_from_fd(attr->btf_token_fd); >> >> and this is all *new* code that makes that same bogus sh*t-for-brains >> mistake that was wrong the first time. >> >> So now I'm saying NAK. Enough is enough. No more of this crazy "I >> don't understand even the _basics_ of file descriptors, and yet I'm >> introducing new random interfaces". >> >> I know you thought fd zero was something invalid. You were told >> otherwise. Apparently you just ignored being wrong, and have decided >> to double down on being wrong. >> >> We don't take this kind of flat-Earther crap. >> >> File descriptors don't start at 1. Deal with reality. Stop making the >> same mistake over and over. If you ant to have a "no file descriptor" >> flag, you use a signed type, and a signed value for that, because file >> descriptor zero is perfectly valid, and I don't want to hear any more >> uninformed denialism. >> >> Stop polluting the kernel with incorrect assumptions. >> >> So yes, I will keep NAK'ing this until this kind of fundamental >> mistake is fixed. This is not rocket science, and this is not >> something that wasn't discussed before. Your ignorance has now turned >> from "I didn't know" to "I didn 't care", and at that point I really >> don't want to see new code any more. > > Alexei, Andrii, this is a massive breach of trust and flatout > disrespectful. I barely reword mails and believe me I've reworded this > mail many times. I'm furious. > > Over the last couple of months since LSFMM in May 2023 until almost last > week I've given you extensive design and review for this whole approach > to get this into even remotely sane shape from a VFS perspective. > > The VFS maintainers including Linus have explicitly NAKed this "zero is > not a valid fd nonsense" and told you to stop doing that. 
We told you > that such fundamental VFS semantics are not yours to decide. > > And yet you put a patch into a series that did exactly that and then had > the unbelievable audacity to repeatedly ask me to put my ACK under this > - both in person and on list. > > I'm glad I only gave my ACK to the two patches that I extensivly > reviewed and never to the whole series. Sincere apologies for this whole mess. All token series related patches have been reverted in bpf-next now, and I'm prepping a PR for net-next so that this is also fully removed from there. Thanks, Daniel
On Tue, Dec 19, 2023 at 2:23 AM Christian Brauner <brauner@kernel.org> wrote: > > On Mon, Dec 18, 2023 at 05:11:23PM -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2023 at 16:05, Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > 2) Introduce BPF token object, from Andrii Nakryiko. > > > > I assume this is why I and some other unusual recipients are cc'd, > > because the networking people feel like they can't judge this and > > shouldn't merge non-networking code like this. > > > > Honestly, I was told - and expected - that this part would come in a > > branch of its own, so that it would be sanely reviewable. > > > > Now it's mixed in with everything else. > > > > This is *literally* why we have branches in git, so that people can > > make more independent changes and judgements, and so that we don't > > have to be in a situation where "look, here's ten different things, > > pull it all or nothing". > > > > Many of the changes *look* like they are in branches, but they've been > > the "fake branches" that are just done as "patch series in a branch, > > with the cover letter as the merge message". > > > > Which is great for maintaining that cover letter information and a > > certain amount of historical clarity, but not helpful AT ALL for the > > "independent changes" thing when it is all mixed up in history, where > > independent things are mostly serialized and not actually independent > > in history at all. > > > > So now it appears to be one big mess, and exactly that "all or > > nothing" thing that isn't great, since the whole point was that the > > networking people weren't comfortable with the reviewing filesystem > > side. > > > > And honestly, the bpf side *still* seems to be absolutely conbfused > > and complkete crap when it comes to file descriptors. 
> > > > I took a quick look, and I *still* see new code being introduced there > > that thinks that file descriptor zero is special, and we tols you a > > *year* ago that that wasn't true, and that you need to fix this. > > > > I literally see complete garbage like tghis: > > > > .. > > __u32 btf_token_fd; > > ... > > if (attr->btf_token_fd) { > > token = bpf_token_get_from_fd(attr->btf_token_fd); > > > > and this is all *new* code that makes that same bogus sh*t-for-brains > > mistake that was wrong the first time. > > > > So now I'm saying NAK. Enough is enough. No more of this crazy "I > > don't understand even the _basics_ of file descriptors, and yet I'm > > introducing new random interfaces". > > > > I know you thought fd zero was something invalid. You were told > > otherwise. Apparently you just ignored being wrong, and have decided > > to double down on being wrong. > > > > We don't take this kind of flat-Earther crap. > > > > File descriptors don't start at 1. Deal with reality. Stop making the > > same mistake over and over. If you ant to have a "no file descriptor" > > flag, you use a signed type, and a signed value for that, because file > > descriptor zero is perfectly valid, and I don't want to hear any more > > uninformed denialism. > > > > Stop polluting the kernel with incorrect assumptions. > > > > So yes, I will keep NAK'ing this until this kind of fundamental > > mistake is fixed. This is not rocket science, and this is not > > something that wasn't discussed before. Your ignorance has now turned > > from "I didn't know" to "I didn 't care", and at that point I really > > don't want to see new code any more. > > Alexei, Andrii, this is a massive breach of trust and flatout > disrespectful. I barely reword mails and believe me I've reworded this > mail many times. I'm furious. 
> > Over the last couple of months since LSFMM in May 2023 until almost last > week I've given you extensive design and review for this whole approach > to get this into even remotely sane shape from a VFS perspective. > > The VFS maintainers including Linus have explicitly NAKed this "zero is > not a valid fd nonsense" and told you to stop doing that. We told you > that such fundamental VFS semantics are not yours to decide. > > And yet you put a patch into a series that did exactly that and then had > the unbelievable audacity to repeatedly ask me to put my ACK under this > - both in person and on list. > > I'm glad I only gave my ACK to the two patches that I extensivly > reviewed and never to the whole series. fwiw to three patches: https://lore.kernel.org/bpf/20231208-besessen-vibrieren-4e963e3ca3ba@brauner/ which are all the main bits of it. The patch 4 that does: if (attr->map_token_fd) wasn't sneaked in in any way. You were cc-ed on it just like linux-fsdevel@vger during all 12 revisions of the token series over many months. So this accusation of breach of trust is baseless. Indeed we didn't internalize that you guys hate fd=0 so much. In the past you made it clear fd=0 shouldn't be an alias to AT_FDCWD. We got that part. Meaning of fd=0 here wasn't a special new thing. We made this mistake in the past and assumed it's ok-ish to continue in similar situations. As I said. Point taken. We'll use flag+fd approach as Linus suggested.
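The "flag+fd" approach mentioned at the end of this mail can be illustrated with a small user-space sketch. Everything named here is an assumption made for illustration: `EXAMPLE_F_TOKEN_FD`, `struct example_cmd_attr`, and `example_get_token_fd()` are hypothetical and are not taken from the actual patch. The idea shown is only that an existing flags field says whether the descriptor field is meaningful, so descriptor 0 needs no special treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "token_fd below is valid". */
#define EXAMPLE_F_TOKEN_FD (1U << 0)

/* Hypothetical command attributes; NOT the real union bpf_attr. */
struct example_cmd_attr {
	uint32_t flags;    /* a flags field the command already has for other reasons */
	int32_t  token_fd; /* only consulted when EXAMPLE_F_TOKEN_FD is set */
};

/*
 * Returns 1 and stores the descriptor in *fd_out when a token was passed,
 * 0 when the flag is clear (no token), and -1 when the flag is set but the
 * descriptor is negative and therefore cannot be valid.
 */
static int example_get_token_fd(const struct example_cmd_attr *attr,
				int32_t *fd_out)
{
	assert(attr != NULL && fd_out != NULL);

	if (!(attr->flags & EXAMPLE_F_TOKEN_FD))
		return 0;
	if (attr->token_fd < 0)
		return -1;
	*fd_out = attr->token_fd;
	return 1;
}
```

A zero-initialized struct means "no token" because the flag is clear, not because the descriptor happens to be 0. That keeps old callers working without assigning any meaning to a particular descriptor value.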
On Tue, Dec 19, 2023 at 8:06 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Dec 19, 2023 at 11:23:50AM +0100, Christian Brauner wrote:
> > Alexei, Andrii, this is a massive breach of trust and flatout
> > disrespectful. I barely reword mails and believe me I've reworded this
> > mail many times. I'm furious.
> >
> > Over the last couple of months since LSFMM in May 2023 until almost last
> > week I've given you extensive design and review for this whole approach
> > to get this into even remotely sane shape from a VFS perspective.
>
> This isn't new behaviour from the BPF people. They always go their own
> way on everything. They refuse to collaborate with anyone in MM to make
> the memory allocators work with their constraints; instead they implement
> their own. It feels like they're on a Mission From God to implement the
> BPF Operating System and dealing with everyone else is an inconvenience.
>
> https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/

Matthew,

I thought I answered in that thread that it is not a memory allocator. It's small free list of cached elements that bpf prog peeks from when prog runs in unknown context == tracing deep inside the kernel.

Do you want to design a memory allocator that is fully re-entrant ? Meaning that kmalloc(GFP_REENTRANT) can be called from any context deep inside slab, inside arch code, inside _any_ and all code of the kernel? If the answer is yes, please go ahead. We'll happily switch to your thing.

We used to preallocate all memory for such tracing use cases which was wasteful. This thingy is preallocating a few elements instead of preallocating them all. That's all there is.
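The idea of "a small free list of cached elements" can be sketched as follows. This is a simplified, single-threaded user-space illustration and not the kernel code under discussion: all names (`example_cache`, `example_elem`, `EXAMPLE_CACHE_SLOTS`) are hypothetical, and the hard parts of the real problem, namely re-entrancy, per-CPU state, and refilling the cache from a safe context, are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_CACHE_SLOTS 4 /* hypothetical: "a few elements", not all of them */

/* One cached element; the link pointer is only meaningful while it is free. */
struct example_elem {
	struct example_elem *next;
	char payload[56];
};

/* A handful of preallocated elements threaded onto a free list. */
struct example_cache {
	struct example_elem *free_list;
	struct example_elem slots[EXAMPLE_CACHE_SLOTS];
};

/* Push every preallocated slot onto the free list. */
static void example_cache_init(struct example_cache *c)
{
	c->free_list = NULL;
	for (size_t i = 0; i < EXAMPLE_CACHE_SLOTS; i++) {
		c->slots[i].next = c->free_list;
		c->free_list = &c->slots[i];
	}
}

/* Pop one element, or return NULL when the cache is empty.
 * It never falls back to a general-purpose allocator. */
static struct example_elem *example_cache_alloc(struct example_cache *c)
{
	struct example_elem *e = c->free_list;

	if (e)
		c->free_list = e->next;
	return e;
}

/* Return an element to the cache. */
static void example_cache_free(struct example_cache *c, struct example_elem *e)
{
	assert(e != NULL);
	e->next = c->free_list;
	c->free_list = e;
}
```

Taking or returning an element is a couple of pointer updates on memory that already exists, which is why such a cache can be consulted from contexts where calling into a general allocator is not safe. The cost is that allocation can fail once the few cached elements are in use.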
On Tue, Dec 19, 2023 at 8:43 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Dec 19, 2023 at 2:23 AM Christian Brauner <brauner@kernel.org> wrote: > > > > On Mon, Dec 18, 2023 at 05:11:23PM -0800, Linus Torvalds wrote: > > > On Mon, 18 Dec 2023 at 16:05, Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > 2) Introduce BPF token object, from Andrii Nakryiko. > > > > > > I assume this is why I and some other unusual recipients are cc'd, > > > because the networking people feel like they can't judge this and > > > shouldn't merge non-networking code like this. > > > > > > Honestly, I was told - and expected - that this part would come in a > > > branch of its own, so that it would be sanely reviewable. > > > > > > Now it's mixed in with everything else. > > > > > > This is *literally* why we have branches in git, so that people can > > > make more independent changes and judgements, and so that we don't > > > have to be in a situation where "look, here's ten different things, > > > pull it all or nothing". > > > > > > Many of the changes *look* like they are in branches, but they've been > > > the "fake branches" that are just done as "patch series in a branch, > > > with the cover letter as the merge message". > > > > > > Which is great for maintaining that cover letter information and a > > > certain amount of historical clarity, but not helpful AT ALL for the > > > "independent changes" thing when it is all mixed up in history, where > > > independent things are mostly serialized and not actually independent > > > in history at all. > > > > > > So now it appears to be one big mess, and exactly that "all or > > > nothing" thing that isn't great, since the whole point was that the > > > networking people weren't comfortable with reviewing the filesystem > > > side. > > > > > > And honestly, the bpf side *still* seems to be absolutely confused > > > and complete crap when it comes to file descriptors.
> > > > > > I took a quick look, and I *still* see new code being introduced there > > > that thinks that file descriptor zero is special, and we told you a > > > *year* ago that that wasn't true, and that you need to fix this. > > > > > > I literally see complete garbage like this: > > > > > > .. > > > __u32 btf_token_fd; > > > ... > > > if (attr->btf_token_fd) { > > > token = bpf_token_get_from_fd(attr->btf_token_fd); > > > > > > and this is all *new* code that makes that same bogus sh*t-for-brains > > > mistake that was wrong the first time. > > > > > > So now I'm saying NAK. Enough is enough. No more of this crazy "I > > > don't understand even the _basics_ of file descriptors, and yet I'm > > > introducing new random interfaces". > > > > > > I know you thought fd zero was something invalid. You were told > > > otherwise. Apparently you just ignored being wrong, and have decided > > > to double down on being wrong. > > > > > > We don't take this kind of flat-Earther crap. > > > > > > File descriptors don't start at 1. Deal with reality. Stop making the > > > same mistake over and over. If you want to have a "no file descriptor" > > > flag, you use a signed type, and a signed value for that, because file > > > descriptor zero is perfectly valid, and I don't want to hear any more > > > uninformed denialism. > > > > > > Stop polluting the kernel with incorrect assumptions. > > > > > > So yes, I will keep NAK'ing this until this kind of fundamental > > > mistake is fixed. This is not rocket science, and this is not > > > something that wasn't discussed before. Your ignorance has now turned > > > from "I didn't know" to "I didn't care", and at that point I really > > > don't want to see new code any more. > > > > Alexei, Andrii, this is a massive breach of trust and flatout > > disrespectful. I barely reword mails and believe me I've reworded this > > mail many times. I'm furious.
> > > > Over the last couple of months since LSFMM in May 2023 until almost last > > week I've given you extensive design and review for this whole approach > > to get this into even remotely sane shape from a VFS perspective. Yes, and I appreciate your reviews and feedback a lot. There was never an intent to clandestinely land anything bad or outlandish, so I'm sorry you feel this way. I've cc'ed you and fsdevel mailing list, just like LSM folks, on every relevant patch set, each and every patch in them, and incorporated all the feedback I got over the last multiple months. > > > > The VFS maintainers including Linus have explicitly NAKed this "zero is > > not a valid fd nonsense" and told you to stop doing that. We told you > > that such fundamental VFS semantics are not yours to decide. It's on me to have interpreted FD=0 as AT_FDCWD in my original patch set for BPF_OBJ_PIN/BPF_OBJ_GET ([0]). It was totally my fault not thinking through all the negative consequences of defaulting to AT_FDCWD, and I acknowledged that and fixed it. It's also my bad that I kept using the "fd=0 means no FD was specified" approach which has been a consistent approach within bpf() syscall API without really thinking twice about this and how much it might irritate kernel people. I sent a fix ([1]) and going forward I'll remember to always add a flag for any new FD-based field in BPF UAPI. [0] https://lore.kernel.org/all/20230516001348.286414-1-andrii@kernel.org/ [1] https://patchwork.kernel.org/project/netdevbpf/patch/20231219053150.336991-1-andrii@kernel.org/ > > > > And yet you put a patch into a series that did exactly that and then had > > the unbelievable audacity to repeatedly ask me to put my ACK under this > > - both in person and on list. I did ask for a review and an ack as a sign that it looks good to you. Precisely to make sure that *everything* looks good overall from the POV of people outside of the BPF subsystem.
I didn't ask for rubber stamping anything, if that's what is implied here. > > > > I'm glad I only gave my ACK to the two patches that I extensively > > reviewed and never to the whole series. > > fwiw to three patches: > https://lore.kernel.org/bpf/20231208-besessen-vibrieren-4e963e3ca3ba@brauner/ > which are all the main bits of it. > > The patch 4 that does: > if (attr->map_token_fd) > wasn't sneaked in in any way. > You were cc-ed on it just like linux-fsdevel@vger > during all 12 revisions of the token series over many months. > > So this accusation of breach of trust is baseless. > > Indeed we didn't internalize that you guys hate fd=0 so much. > In the past you made it clear fd=0 shouldn't be an alias to AT_FDCWD. Right, and the AT_FDCWD interpretation for fd=0 had security implications, which was a clear and bad bug. > We got that part. The meaning of fd=0 here wasn't a special new thing. > We made this mistake in the past and assumed it's ok-ish to continue > in similar situations. > As I said, point taken. We'll use the flag+fd approach as Linus suggested. Yes, I second the above.
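For readers following along, the flag+fd approach agreed on above could look roughly like the sketch below. It is a user-space illustration under stated assumptions, not the BPF UAPI: the struct ex_attr, the flag EX_F_TOKEN_FD and the helper ex_resolve_token_fd() are hypothetical names made up for this example. The idea is that a flag says whether an fd was passed at all, and the fd field is signed, so 0 remains an ordinary valid descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: "token_fd carries a real file descriptor". */
#define EX_F_TOKEN_FD (1u << 0)

/* Hypothetical attribute struct; callers are expected to zero it first. */
struct ex_attr {
	uint32_t flags;
	int32_t token_fd;	/* signed; only meaningful with EX_F_TOKEN_FD */
};

/*
 * Returns 0 on success and stores the fd to use in *fd_out
 * (-1 when no token was passed). Returns a negative errno on bad input.
 */
static int ex_resolve_token_fd(const struct ex_attr *attr, int *fd_out)
{
	assert(attr != NULL && fd_out != NULL);

	if (!(attr->flags & EX_F_TOKEN_FD)) {
		/* Flag not set: the field must have been left zeroed. */
		if (attr->token_fd != 0)
			return -EINVAL;
		*fd_out = -1;
		return 0;
	}

	/* Flag set: any non-negative value, including 0, is a valid fd. */
	if (attr->token_fd < 0)
		return -EBADF;
	*fd_out = attr->token_fd;
	return 0;
}
```

With this shape, a caller that sets EX_F_TOKEN_FD and token_fd = 0 gets fd 0 back, while a caller that leaves both fields zeroed gets -1, so "no fd" and "fd 0" are no longer confused.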
> The patch 4 that does: > if (attr->map_token_fd) > wasn't sneaked in in any way. > You were cc-ed on it just like linux-fsdevel@vger > during all 12 revisions of the token series over many months. > > So this accusation of breach of trust is baseless. I was expecting this reply and I'm still disappointed. Both of you were explicitly told in very clear words that special-casing fd 0 is not ok. Fast forward a few weeks, you chose to not just add patches that forbid fd 0 again, no, the heinous part is that you chose to not lose a single word about this: not in the cover letter, not in the relevant commit, not in all the discussions we had around this. You were absolutely aware how opposed we are to this. It cannot get any more sneaky than this. And it's frankly insulting that you choose to defend this by feigning ignorance. No one is buying this. But let's assume for a second that both you and Andrii somehow managed to forget the very clear and heated discussion on-list last time, the resulting LWN article written about it and the in-person discussion around this we had in November at LPC. You still would have put a major deviation from file descriptor semantics in the bpf specific parts of the patches yet failed to lose a single word on this anywhere. Yet we explicitly requested in the last thread that if bpf does deviate from core fs semantics you clearly communicate this. But shame on me as well. I should've caught this during review. I trusted you both enough that I only focussed on the parts that matter for the VFS which were the two patches I ACKed. I didn't think it necessary to wade through the completely uninteresting BPF bits that I couldn't care less about. That won't happen again. What I want for the future is for bpf to clearly, openly, and explicitly communicate any decisions that affect core fs semantics. It's the exact same request I put forward last time. This is a path forward.
On Wed, Dec 20, 2023 at 3:18 AM Christian Brauner <brauner@kernel.org> wrote: > > > The patch 4 that does: > > if (attr->map_token_fd) > > wasn't sneaked in in any way. > > You were cc-ed on it just like linux-fsdevel@vger > > during all 12 revisions of the token series over many months. > > > > So this accusation of breach of trust is baseless. > > I was expecting this reply and I'm still disappointed. > > Both of you were explicitly told in very clear words that special-casing > fd 0 is not ok. > > Fast forward a few weeks, you chose to not just add patches that forbid > fd 0 again, no, the heinous part is that you chose to not lose a single > word about this: not in the cover letter, not in the relevant commit, > not in all the discussions we had around this. > > You were absolutely aware how opposed we are to this. It cannot get any > more sneaky than this. And it's frankly insulting that you choose to > defend this by feigning ignorance. No one is buying this. Christian, I'm sorry you feel this way, but I refuse to accept the blame of malicious or heinous intent. There was neither intent nor attempt to mislead you (or anyone else for that matter) or silently sneak anything in. Yes, we did continue to use the convention that FD=0 means "no FD is specified", which was consistent throughout BPF UAPI. But only when it comes to passing BPF objects around (program, BTF, map, link, and token). You hate it, I get it. But user-space already deals with this, because it is present in BPF UAPI in many commands, and it's never a problem in practice. This is *very different in implications* compared to passing VFS path FDs, like it was during the BPF_OBJ_PIN/BPF_OBJ_GET patch set ([0]). Back then I acknowledged the wrongness of treating FD=0 as AT_FDCWD due to security implications and fixed that with a special flag. It was a special case as far as BPF UAPI goes, and it was the first one of that kind. 
In my mind, path FDs (and any other FD that points to a file or other objects that didn't originate in bpf() syscall) and BPF FDs are two completely different classes of cases. And that's why I didn't give it too much thought when adding bpf_token_fd with default (for bpf() syscall) semantics of treating FD=0 as "no FD was provided". It *is* a *quirk* of the BPF UAPI that users have to take into account (and libbpf does take into account and never returns FD=0 to users, so in practice it is never a problem), but we are not defining any new semantics here. We do say "dup your FD=0, if you happen to want to pass it to BPF UAPI", a quirk that libbpf (and I presume other BPF libraries) hides and doesn't even mention in API. I'm elaborating on this so much just to explain *the thought process* (and not to make excuses) and why this was done the way it was done. This discussion made it very clear that this BPF FD special treatment won't be tolerated. OK, ack. We are going to add new flags whenever any FD field is added to BPF UAPI from now on. Back in [0] I didn't remember such a strong "we forbid fd 0 again" wording, tbh. At least before the discussion devolved into an unfortunate "let's prevent kernel from returning fd<3" discussion. Quoting [0] a bit: > If it was discussed then great but if not then I would like to make it > very clear that if in the future you decide to introduce custom > semantics for vfs provided infrastructure - especially when exposed to > userspace - that you please Cc us. And I did CC you. > I personally find this extremely weird to treat fd 0 as anything other > than a random fd number as it goes against any userspace assumptions and > drastically deviates from basically every file descriptor interface we > have. I mean, you're not just saying fd 0 is invalid you're even saying > it means AT_FDCWD. Yes, the AT_FDCWD thing was completely wrong. 
You did express general dissatisfaction with BPF UAPI's choice to treat optional FD fields as "no FD" if FD=0, and we can't fix it without breaking user-space. Still, the way I read it was that your main concern was, justifiably, AT_FDCWD treatment. > For every other interface, including those that pass fds in structs > whose extensibility is premised on unknown fields being set to zero, > have ways to make fd 0 work just fine. You could've done that too without > inventing custom fd semantics. You did explicitly ask this, but still, not in a "I forbid" fashion. Especially, taking this into account: > This is not a rant I'm really just trying to make sure that we agree on > common ground when it comes to touching each other's code or semantic > assumptions. I didn't feel like bpf() syscall UAPI was "touching each other's code". But I'll be honest, I'm not sure how broadly I should have interpreted "custom semantics for vfs provided infrastructure" and "inventing custom fd semantics", and so I chose UAPI consistency, relegating the FD=0 convention to a BPF UAPI quirk. Are the BPF kernel objects the VFS infrastructure? Or was that about path FDs, and other "standard" FS files? It's a bit of a moot point now, though, as we agreed to do it for any FD field going forward, but still, clarity would be helpful. Again, there was no bad faith in my actions and everything was done in the open, with plenty of opportunities and time to raise concerns. [0] https://lore.kernel.org/all/20230517-allabendlich-umgekehrt-8cc81f8313ac@brauner/ > > But let's assume for a second that both you and Andrii somehow managed > to forget the very clear and heated discussion on-list last time, the > resulting LWN article written about it and the in-person discussion > around this we had in November at LPC. As far as I can remember LPC, we never touched on passing the BPF token FD aspect at all during discussions.
We did talk about BPF_OBJ_PIN and how wrong it was to assume AT_FDCWD, which, again, is objectively wrong due to potential security attacks. It seems like in your mind AT_FDCWD and generally passing BPF object FDs is exactly the same problem, which I disagree with, but I think that's where your accusations come from. You can say both cases are problems, but they are different problems (security vs deviation of API from the rest of kernel APIs). > > You still would have put a major deviation from file descriptor > semantics in the bpf specific parts of the patches yet failed to lose a > single word on this anywhere. Yet we explicitly requested in the last > thread that if bpf does deviate from core fs semantics you clearly > communicate this. FD 0 is still a valid file descriptor, in general. No one claims otherwise, there is no change in semantics. BPF syscall won't see FD=0 for some optional fields, and that's a deviation from other APIs, yes, but with no security implications. And it might be hard to believe, but it's been like that for so long and is just such an ingrained default behavior, that yes, out of habit I didn't even think to highlight that in commit messages or cover letters. My bad, certainly, but hardly a heinous act. > > But shame on me as well. I should've caught this during review. I > trusted you both enough that I only focussed on the parts that matter > for the VFS which were the two patches I ACKed. I didn't think it > necessary to wade through the completely uninteresting BPF bits that I > couldn't care less about. That won't happen again. > > What I want for the future is for bpf to clearly, openly, and explicitly > communicate any decisions that affect core fs semantics. It's the exact > same request I put forward last time. This is a path forward. Of course, and you'll be CC'ed on all the BPF token patches I will resend after the holidays. 
And just to be clear for the future, by "core fs semantics" you also mean any BPF UAPI FD field, right?
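The "dup your FD=0" quirk described earlier in this message can be illustrated with a small user-space sketch. ex_avoid_fd_zero() is a hypothetical helper written for this example and is not taken from libbpf; it relies only on the standard fcntl(F_DUPFD_CLOEXEC) call to move a descriptor away from number 0 before it is handed to an API that treats 0 as "no FD".

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: if fd is 0, duplicate it to the lowest free
 * descriptor numbered 3 or higher and close the original. Returns the
 * fd to use, or -1 (errno set by fcntl) on failure. Any other fd is
 * returned unchanged.
 */
static int ex_avoid_fd_zero(int fd)
{
	int new_fd;

	if (fd != 0)
		return fd;

	new_fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
	if (new_fd < 0)
		return -1;

	/* F_DUPFD_CLOEXEC never returns a number below its third argument. */
	assert(new_fd >= 3);
	close(fd);
	return new_fd;
}
```

This is the kind of workaround a library has to carry when an interface cannot tell "fd 0" from "no fd"; the flag+fd approach discussed in the thread would make it unnecessary for new fields.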
> Of course, and you'll be CC'ed on all the BPF token patches I will > resend after the holidays. > > And just to be clear for the future, by "core fs semantics" you also > mean any BPF UAPI FD field, right? Yes, because ultimately you end up calling: fdget()/fdget_raw()/fget() to turn a userspace handle in the form of an fd into a struct file. And that is uniform across the kernel. And therein lies the beauty of it all imo. IMHO, a file descriptor is one of the most widely used generic abstractions we have across all of the kernel. It is almost literally used everywhere. And everyone has the same contract: a non-negative integer is a valid fd, a negative one is invalid. It's simple, there aren't corner cases, there aren't custom semantics. And it's also arguably one of the most successful ones as we keep implementing new APIs on top of this abstraction (pidfd, seccomp, process_*(), memfd_*(), endless kvm ioctls etc etc).
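The contract stated above (non-negative means valid, negative means invalid) can be observed from user space with a short sketch. Both helpers below, ex_fd_is_open() and ex_reopen_fd_zero(), are hypothetical names invented for this illustration; they assume a single-threaded process and that /dev/null exists.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd refers to an open descriptor. */
static int ex_fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/*
 * Hypothetical helper: close descriptor 0 and open /dev/null. open()
 * returns the lowest-numbered free descriptor, so in a single-threaded
 * process the result is 0 again (or -1 if the open itself failed).
 */
static int ex_reopen_fd_zero(void)
{
	int fd;

	close(0);
	fd = open("/dev/null", O_RDONLY);
	assert(fd <= 0);
	return fd;
}
```

After ex_reopen_fd_zero() succeeds, ex_fd_is_open(0) reports an open descriptor while ex_fd_is_open(-1) does not, which is why a negative value works as a "no fd" sentinel and zero does not.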