Message ID | 20241020110345.1468595-9-zijianzhang@bytedance.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | Fixes to bpf_msg_push/pop_data and test_sockmap | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for bpf |
netdev/apply | fail | Patch does not apply to bpf-0 |
zijianzhang@ wrote: > From: Zijian Zhang <zijianzhang@bytedance.com> > > Found in the test_txmsg_pull in test_sockmap, > ``` > txmsg_cork = 512; > opt->iov_length = 3; > opt->iov_count = 1; > opt->rate = 512; > ``` > The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data > will be invoked the first time. sk_msg_reset_curr will reset the copybreak > from 3 to 0, then the second sendmsg will write into copybreak starting at > 0 which overwrites the first sendmsg. The same problem happens in push and > pop test. Thus, fix sk_msg_reset_curr to restore the correct copybreak. > > Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr") > Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Hi Zijian, question on below. > --- > net/core/filter.c | 20 +++++++++----------- > 1 file changed, 9 insertions(+), 11 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 8e1a8a8d8d55..b725d3a2fdb8 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2619,18 +2619,16 @@ BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg *, msg, u32, bytes) > I find push_data a bit easier to think through so allow me to walk through a push example. If we setup so that curr=0 and copybreak=3 then call push_data(skmsg, 2, 2); When we get to the sk_msg_reset_curr we should have a layout, msg->sg.data[0] = length(2) equal to original [0,2] msg->sg.data[1] = length(2) msg->sg.data[2] = legnth(1) equal to original [3] The current before the reset curr will be, curr = 1 copybreak = 3 > static void sk_msg_reset_curr(struct sk_msg *msg) > { > - u32 i = msg->sg.start; > - u32 len = 0; > - with above context i = 0 > - do { > - len += sk_msg_elem(msg, i)->length; > - sk_msg_iter_var_next(i); > - if (len >= msg->sg.size) > - break; > - } while (i != msg->sg.end); When we exit loop, i = 3 len = 5 msg->sg.curr = 3 msg->sg.copybreak = 0 So we zero the copy break and set curr = 3. The next send should happen over sg.curr=3? What did I miss? > + if (!msg->sg.size) { > + msg->sg.curr = msg->sg.start; > + msg->sg.copybreak = 0; > + } else { > + u32 i = msg->sg.end; > > - msg->sg.curr = i; > - msg->sg.copybreak = 0; > + sk_msg_iter_var_prev(i); With this curr will always point to the end-1 but I'm not sure this can handle the case where we have done sk_msg_alloc() so we have start/end setup. And then on a copy fault for example we might have curr pointing somewhere in the middle of that. I think I will need to construct the example, but I believe this is originally why the 'i' is discovered by sg walk vs simpler end. > + msg->sg.curr = i; > + msg->sg.copybreak = msg->sg.data[i].length; This does seem more accurate then simply zero'ing out the copybreak which is a good thing. > + } > } > > static const struct bpf_func_proto bpf_msg_cork_bytes_proto = { > -- > 2.20.1 >
On 10/25/24 10:05 PM, John Fastabend wrote: > zijianzhang@ wrote: >> From: Zijian Zhang <zijianzhang@bytedance.com> >> >> Found in the test_txmsg_pull in test_sockmap, >> ``` >> txmsg_cork = 512; >> opt->iov_length = 3; >> opt->iov_count = 1; >> opt->rate = 512; >> ``` >> The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data >> will be invoked the first time. sk_msg_reset_curr will reset the copybreak >> from 3 to 0, then the second sendmsg will write into copybreak starting at >> 0 which overwrites the first sendmsg. The same problem happens in push and >> pop test. Thus, fix sk_msg_reset_curr to restore the correct copybreak. >> >> Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr") >> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> > > Hi Zijian, question on below. > > > I find push_data a bit easier to think through so allow me to walk > through a push example. > > If we setup so that curr=0 and copybreak=3 then call > > push_data(skmsg, 2, 2); > > When we get to the sk_msg_reset_curr we should have a layout, > > msg->sg.data[0] = length(2) equal to original [0,2] > msg->sg.data[1] = length(2) > msg->sg.data[2] = legnth(1) equal to original [3] > > The current before the reset curr will be, > > curr = 1 > copybreak = 3 > >> static void sk_msg_reset_curr(struct sk_msg *msg) >> { >> - u32 i = msg->sg.start; >> - u32 len = 0; >> - > > with above context i = 0 > >> - do { >> - len += sk_msg_elem(msg, i)->length; >> - sk_msg_iter_var_next(i); >> - if (len >= msg->sg.size) >> - break; >> - } while (i != msg->sg.end); > > When we exit loop, > > i = 3 > len = 5 > > msg->sg.curr = 3 > msg->sg.copybreak = 0 > > So we zero the copy break and set curr = 3. The next send > should happen over sg.curr=3? What did I miss? > That's true, for common cases without corking, it should work well. For this fix, we have to consider cork at the same time. I still use pull here for simplicity, for example, ``` // user program txmsg_cork = 8; opt->iov_length = 3; opt->iov_count = 1; opt->rate = 3; // eBPF program pull_data(sk_msg, 0, 1); ``` In the first sendmsg, pull_data will be invoked and reset msg->sg.copybreak from 3 to 0, msg->sg.curr to 0, which is the same. However, because of the corking, the data will not be sent out, and it is stored in the psock->cork. In the second sendmsg, since we are in the stage of corking, psock->cork will be reused in func sk_msg_alloc. msg->sg.copybreak is 0 now, the second msg will overwrite the first msg. As a result, we could not pass the data integrity test. push + cork and pop + cork may have the same problem, with the same reason that corking may have different sendmsgs reuse the same skmsg. >> + if (!msg->sg.size) { >> + msg->sg.curr = msg->sg.start; >> + msg->sg.copybreak = 0; >> + } else { >> + u32 i = msg->sg.end; >> >> - msg->sg.curr = i; >> - msg->sg.copybreak = 0; >> + sk_msg_iter_var_prev(i); > > With this curr will always point to the end-1 but I'm not sure this can > handle the case where we have done sk_msg_alloc() so we have start/end > setup. And then on a copy fault for example we might have curr pointing > somewhere in the middle of that. I think I will need to construct the > example, but I believe this is originally why the 'i' is discovered > by sg walk vs simpler end. > Good point! I am not sure if corking + pull/push/pop are common cases. I guess they are not? If we take these into account, then instead of resetting, we may need to carefully maintain the curr and copybreak when we shift the sgs? >> + msg->sg.curr = i; >> + msg->sg.copybreak = msg->sg.data[i].length; > > This does seem more accurate then simply zero'ing out the copybreak > which is a good thing. > >> + } >> } >> >> static const struct bpf_func_proto bpf_msg_cork_bytes_proto = { >> -- >> 2.20.1 >> > >
diff --git a/net/core/filter.c b/net/core/filter.c index 8e1a8a8d8d55..b725d3a2fdb8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2619,18 +2619,16 @@ BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg *, msg, u32, bytes) static void sk_msg_reset_curr(struct sk_msg *msg) { - u32 i = msg->sg.start; - u32 len = 0; - - do { - len += sk_msg_elem(msg, i)->length; - sk_msg_iter_var_next(i); - if (len >= msg->sg.size) - break; - } while (i != msg->sg.end); + if (!msg->sg.size) { + msg->sg.curr = msg->sg.start; + msg->sg.copybreak = 0; + } else { + u32 i = msg->sg.end; - msg->sg.curr = i; - msg->sg.copybreak = 0; + sk_msg_iter_var_prev(i); + msg->sg.curr = i; + msg->sg.copybreak = msg->sg.data[i].length; + } } static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {