Message ID | 20221222001343.489117-1-paul@paul-moore.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | bpf: restore the ebpf audit UNLOAD id field | expand |
On 12/21, Paul Moore wrote: > When changing the ebpf program put() routines to support being called > from within IRQ context the program ID was reset to zero prior to > generating the audit UNLOAD record, which obviously rendered the ID > field bogus (always zero). This patch resolves this by adding a new > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > allocated an ID and never reset, ensuring a valid ID field, > regardless of the state of the original ID field, bpf_prox_aud::id. > I also modified the bpf_audit_prog() logic used to associate the > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > Instead of keying off the operation, it now keys off the execution > context, e.g. '!in_irg && !irqs_disabled()', which is much more > appropriate and should help better connect the UNLOAD operations with > the associated audit state (other audit records). [..] > As an note to future bug hunters, I did briefly consider removing the > ID reset in bpf_prog_free_id(), as it would seem that once the > program is removed from the idr pool it can no longer be found by its > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > when device disappears") seems to imply that it is beneficial to > reset the ID value. Perhaps as a secondary indicator that the ebpf > program is unbound/orphaned. That seems like the way to go imho. Can we have some extra 'invalid_id' bitfield in the bpf_prog so we can set it in bpf_prog_free_id and check in bpf_prog_free_id (for this offloaded use-case)? Because having two ids and then keeping track about which one to use, depending on the context, seems more fragile? > Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq > context.") > Reported-by: Burn Alting <burn.alting@iinet.net.au> > Signed-off-by: Paul Moore <paul@paul-moore.com> > --- > include/linux/bpf.h | 1 + > kernel/bpf/syscall.c | 8 +++++--- > 2 files changed, 6 insertions(+), 3 deletions(-) > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 9e7d46d16032..a22001ceb2c3 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1103,6 +1103,7 @@ struct bpf_prog_aux { > u32 max_tp_access; > u32 stack_depth; > u32 id; > + u32 id_audit; /* preserves the id for use by audit */ > u32 func_cnt; /* used by non-func prog as the number of func progs */ > u32 func_idx; /* 0 for non-func prog, the index in func array for func > prog */ > u32 attach_btf_id; /* in-kernel BTF type id to attach to */ > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 7b373a5e861f..3ec09f4dba18 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1958,13 +1958,13 @@ static void bpf_audit_prog(const struct bpf_prog > *prog, unsigned int op) > return; > if (audit_enabled == AUDIT_OFF) > return; > - if (op == BPF_AUDIT_LOAD) > + if (!in_irq() && !irqs_disabled()) > ctx = audit_context(); > ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF); > if (unlikely(!ab)) > return; > audit_log_format(ab, "prog-id=%u op=%s", > - prog->aux->id, bpf_audit_str[op]); > + prog->aux->id_audit, bpf_audit_str[op]); > audit_log_end(ab); > } > @@ -1975,8 +1975,10 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog) > idr_preload(GFP_KERNEL); > spin_lock_bh(&prog_idr_lock); > id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC); > - if (id > 0) > + if (id > 0) { > prog->aux->id = id; > + prog->aux->id_audit = id; > + } > spin_unlock_bh(&prog_idr_lock); > idr_preload_end(); > -- > 2.39.0
On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > On 12/21, Paul Moore wrote: > > When changing the ebpf program put() routines to support being called > > from within IRQ context the program ID was reset to zero prior to > > generating the audit UNLOAD record, which obviously rendered the ID > > field bogus (always zero). This patch resolves this by adding a new > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > allocated an ID and never reset, ensuring a valid ID field, > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > I also modified the bpf_audit_prog() logic used to associate the > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > Instead of keying off the operation, it now keys off the execution > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > appropriate and should help better connect the UNLOAD operations with > > the associated audit state (other audit records). > > [..] > > > As an note to future bug hunters, I did briefly consider removing the > > ID reset in bpf_prog_free_id(), as it would seem that once the > > program is removed from the idr pool it can no longer be found by its > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > when device disappears") seems to imply that it is beneficial to > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > program is unbound/orphaned. > > That seems like the way to go imho. Can we have some extra 'invalid_id' > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > check in bpf_prog_free_id (for this offloaded use-case)? Because > having two ids and then keeping track about which one to use, depending > on the context, seems more fragile? I would definitely prefer to keep just a single ID value, and that was the first approach I took when drafting this patch, but when looking through the git log it looked like there was some desire to reset the ID to zero on free. Not being an expert on the ebpf kernel code I figured I would just write the patch up this way and make a comment about not zero'ing out the ID in the commit description so we could have a discussion about it. I'm not seeing any other comments, so I'll go ahead with putting together a v2 that sets an invalid flag/bit and I'll post that for further discussion/review. > > Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq > > context.") > > Reported-by: Burn Alting <burn.alting@iinet.net.au> > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > --- > > include/linux/bpf.h | 1 + > > kernel/bpf/syscall.c | 8 +++++--- > > 2 files changed, 6 insertions(+), 3 deletions(-)
On 12/22, Paul Moore wrote: > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > On 12/21, Paul Moore wrote: > > > When changing the ebpf program put() routines to support being called > > > from within IRQ context the program ID was reset to zero prior to > > > generating the audit UNLOAD record, which obviously rendered the ID > > > field bogus (always zero). This patch resolves this by adding a new > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > allocated an ID and never reset, ensuring a valid ID field, > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > Instead of keying off the operation, it now keys off the execution > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > appropriate and should help better connect the UNLOAD operations with > > > the associated audit state (other audit records). > > > > [..] > > > > > As an note to future bug hunters, I did briefly consider removing the > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > program is removed from the idr pool it can no longer be found by its > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > when device disappears") seems to imply that it is beneficial to > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > program is unbound/orphaned. > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > having two ids and then keeping track about which one to use, depending > > on the context, seems more fragile? > I would definitely prefer to keep just a single ID value, and that was > the first approach I took when drafting this patch, but when looking > through the git log it looked like there was some desire to reset the > ID to zero on free. Not being an expert on the ebpf kernel code I > figured I would just write the patch up this way and make a comment > about not zero'ing out the ID in the commit description so we could > have a discussion about it. Yeah, the commit you reference is resetting the id for the offloaded progs. But it also mentions that even though we reset the id, it won't leak into the userspace: Note that orphaned offload programs will return -ENODEV on BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0. It talks about the "if (!aux->offload)" check in bpf_prog_offload_info_fill. So I'm assuming that having some extra "this id is already free" signal in the bpf_prog shouldn't be a problem here. > I'm not seeing any other comments, so I'll go ahead with putting > together a v2 that sets an invalid flag/bit and I'll post that for > further discussion/review. > > > Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from > irq > > > context.") > > > Reported-by: Burn Alting <burn.alting@iinet.net.au> > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > --- > > > include/linux/bpf.h | 1 + > > > kernel/bpf/syscall.c | 8 +++++--- > > > 2 files changed, 6 insertions(+), 3 deletions(-) > -- > paul-moore.com
On Thu, Dec 22, 2022 at 2:40 PM <sdf@google.com> wrote: > On 12/22, Paul Moore wrote: > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > On 12/21, Paul Moore wrote: > > > > When changing the ebpf program put() routines to support being called > > > > from within IRQ context the program ID was reset to zero prior to > > > > generating the audit UNLOAD record, which obviously rendered the ID > > > > field bogus (always zero). This patch resolves this by adding a new > > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > > allocated an ID and never reset, ensuring a valid ID field, > > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > > Instead of keying off the operation, it now keys off the execution > > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > > appropriate and should help better connect the UNLOAD operations with > > > > the associated audit state (other audit records). > > > > > > [..] > > > > > > > As an note to future bug hunters, I did briefly consider removing the > > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > > program is removed from the idr pool it can no longer be found by its > > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > > when device disappears") seems to imply that it is beneficial to > > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > > program is unbound/orphaned. > > > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > > having two ids and then keeping track about which one to use, depending > > > on the context, seems more fragile? > > > I would definitely prefer to keep just a single ID value, and that was > > the first approach I took when drafting this patch, but when looking > > through the git log it looked like there was some desire to reset the > > ID to zero on free. Not being an expert on the ebpf kernel code I > > figured I would just write the patch up this way and make a comment > > about not zero'ing out the ID in the commit description so we could > > have a discussion about it. > > Yeah, the commit you reference is resetting the id for the offloaded > progs. But it also mentions that even though we reset the id, > it won't leak into the userspace: > > Note that orphaned offload programs will return -ENODEV on > BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0. > > It talks about the "if (!aux->offload)" check in bpf_prog_offload_info_fill. > So I'm assuming that having some extra "this id is already free" signal > in the bpf_prog shouldn't be a problem here. FWIW, the currently-work-in-progress v2 patch adds a getter for the ID with a WARN() check to flag callers who are trying to access a bad/free'd bpf_prog. Unfortunately it touches a decent chunk of code, but I think it might be a nice additional check at runtime. +u32 bpf_prog_get_id(const struct bpf_prog *prog) +{ + if (WARN(!prog->valid_id, "Attempting to use invalid eBPF program")) + return 0; + return prog->aux->__id; +} > > I'm not seeing any other comments, so I'll go ahead with putting > > together a v2 that sets an invalid flag/bit and I'll post that for > > further discussion/review.
On Thu, Dec 22, 2022 at 2:59 PM Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Dec 22, 2022 at 2:40 PM <sdf@google.com> wrote: > > On 12/22, Paul Moore wrote: > > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > > On 12/21, Paul Moore wrote: > > > > > When changing the ebpf program put() routines to support being called > > > > > from within IRQ context the program ID was reset to zero prior to > > > > > generating the audit UNLOAD record, which obviously rendered the ID > > > > > field bogus (always zero). This patch resolves this by adding a new > > > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > > > allocated an ID and never reset, ensuring a valid ID field, > > > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > > > Instead of keying off the operation, it now keys off the execution > > > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > > > appropriate and should help better connect the UNLOAD operations with > > > > > the associated audit state (other audit records). > > > > > > > > [..] > > > > > > > > > As an note to future bug hunters, I did briefly consider removing the > > > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > > > program is removed from the idr pool it can no longer be found by its > > > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > > > when device disappears") seems to imply that it is beneficial to > > > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > > > program is unbound/orphaned. > > > > > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > > > having two ids and then keeping track about which one to use, depending > > > > on the context, seems more fragile? > > > > > I would definitely prefer to keep just a single ID value, and that was > > > the first approach I took when drafting this patch, but when looking > > > through the git log it looked like there was some desire to reset the > > > ID to zero on free. Not being an expert on the ebpf kernel code I > > > figured I would just write the patch up this way and make a comment > > > about not zero'ing out the ID in the commit description so we could > > > have a discussion about it. > > > > Yeah, the commit you reference is resetting the id for the offloaded > > progs. But it also mentions that even though we reset the id, > > it won't leak into the userspace: > > > > Note that orphaned offload programs will return -ENODEV on > > BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0. > > > > It talks about the "if (!aux->offload)" check in bpf_prog_offload_info_fill. > > So I'm assuming that having some extra "this id is already free" signal > > in the bpf_prog shouldn't be a problem here. > > FWIW, the currently-work-in-progress v2 patch adds a getter for the ID > with a WARN() check to flag callers who are trying to access a > bad/free'd bpf_prog. Unfortunately it touches a decent chunk of code, > but I think it might be a nice additional check at runtime. > > +u32 bpf_prog_get_id(const struct bpf_prog *prog) > +{ > + if (WARN(!prog->valid_id, "Attempting to use invalid eBPF program")) > + return 0; > + return prog->aux->__id; > +} I should add that the getter is currently a static inline in bpf.h. > > > I'm not seeing any other comments, so I'll go ahead with putting > > > together a v2 that sets an invalid flag/bit and I'll post that for > > > further discussion/review.
On Thu, Dec 22, 2022 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Dec 22, 2022 at 2:59 PM Paul Moore <paul@paul-moore.com> wrote: > > > > On Thu, Dec 22, 2022 at 2:40 PM <sdf@google.com> wrote: > > > On 12/22, Paul Moore wrote: > > > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > > > On 12/21, Paul Moore wrote: > > > > > > When changing the ebpf program put() routines to support being called > > > > > > from within IRQ context the program ID was reset to zero prior to > > > > > > generating the audit UNLOAD record, which obviously rendered the ID > > > > > > field bogus (always zero). This patch resolves this by adding a new > > > > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > > > > allocated an ID and never reset, ensuring a valid ID field, > > > > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > > > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > > > > Instead of keying off the operation, it now keys off the execution > > > > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > > > > appropriate and should help better connect the UNLOAD operations with > > > > > > the associated audit state (other audit records). > > > > > > > > > > [..] > > > > > > > > > > > As an note to future bug hunters, I did briefly consider removing the > > > > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > > > > program is removed from the idr pool it can no longer be found by its > > > > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > > > > when device disappears") seems to imply that it is beneficial to > > > > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > > > > program is unbound/orphaned. > > > > > > > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > > > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > > > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > > > > having two ids and then keeping track about which one to use, depending > > > > > on the context, seems more fragile? > > > > > > > I would definitely prefer to keep just a single ID value, and that was > > > > the first approach I took when drafting this patch, but when looking > > > > through the git log it looked like there was some desire to reset the > > > > ID to zero on free. Not being an expert on the ebpf kernel code I > > > > figured I would just write the patch up this way and make a comment > > > > about not zero'ing out the ID in the commit description so we could > > > > have a discussion about it. > > > > > > Yeah, the commit you reference is resetting the id for the offloaded > > > progs. But it also mentions that even though we reset the id, > > > it won't leak into the userspace: > > > > > > Note that orphaned offload programs will return -ENODEV on > > > BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0. > > > > > > It talks about the "if (!aux->offload)" check in bpf_prog_offload_info_fill. > > > So I'm assuming that having some extra "this id is already free" signal > > > in the bpf_prog shouldn't be a problem here. > > > > FWIW, the currently-work-in-progress v2 patch adds a getter for the ID > > with a WARN() check to flag callers who are trying to access a > > bad/free'd bpf_prog. Unfortunately it touches a decent chunk of code, > > but I think it might be a nice additional check at runtime. > > > > +u32 bpf_prog_get_id(const struct bpf_prog *prog) > > +{ > > + if (WARN(!prog->valid_id, "Attempting to use invalid eBPF program")) > > + return 0; > > + return prog->aux->__id; > > +} > > I should add that the getter is currently a static inline in bpf.h. I don't see why we need to WARN on !valid_id, but I might be missing something. There are no places currently where we report 'id == 0' to the userspace, so we only need to take care of the offloaded case that resets id to zero early (instead of resetting it during regular __bpf_prog_put path). > > > > I'm not seeing any other comments, so I'll go ahead with putting > > > > together a v2 that sets an invalid flag/bit and I'll post that for > > > > further discussion/review. > > -- > paul-moore.com
On Thu, Dec 22, 2022 at 02:03:41PM -0500, Paul Moore wrote: > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > On 12/21, Paul Moore wrote: > > > When changing the ebpf program put() routines to support being called > > > from within IRQ context the program ID was reset to zero prior to > > > generating the audit UNLOAD record, which obviously rendered the ID > > > field bogus (always zero). This patch resolves this by adding a new > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > allocated an ID and never reset, ensuring a valid ID field, > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > Instead of keying off the operation, it now keys off the execution > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > appropriate and should help better connect the UNLOAD operations with > > > the associated audit state (other audit records). > > > > [..] > > > > > As an note to future bug hunters, I did briefly consider removing the > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > program is removed from the idr pool it can no longer be found by its > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > when device disappears") seems to imply that it is beneficial to > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > program is unbound/orphaned. > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > having two ids and then keeping track about which one to use, depending > > on the context, seems more fragile? > > I would definitely prefer to keep just a single ID value, and that was > the first approach I took when drafting this patch, but when looking > through the git log it looked like there was some desire to reset the > ID to zero on free. Not being an expert on the ebpf kernel code I > figured I would just write the patch up this way and make a comment > about not zero'ing out the ID in the commit description so we could > have a discussion about it. > > I'm not seeing any other comments, so I'll go ahead with putting > together a v2 that sets an invalid flag/bit and I'll post that for > further discussion/review. great, perf suffers the same issue: https://lore.kernel.org/bpf/Y3SRWVoycV290S16@krava/ any chance you could include it as well? I can send a patch later if needed thanks, jirka > > > > Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq > > > context.") > > > Reported-by: Burn Alting <burn.alting@iinet.net.au> > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > --- > > > include/linux/bpf.h | 1 + > > > kernel/bpf/syscall.c | 8 +++++--- > > > 2 files changed, 6 insertions(+), 3 deletions(-) > > -- > paul-moore.com
On Thu, Dec 22, 2022 at 4:28 PM Stanislav Fomichev <sdf@google.com> wrote: > On Thu, Dec 22, 2022 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Dec 22, 2022 at 2:59 PM Paul Moore <paul@paul-moore.com> wrote: > > > On Thu, Dec 22, 2022 at 2:40 PM <sdf@google.com> wrote: > > > > On 12/22, Paul Moore wrote: > > > > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > > > > On 12/21, Paul Moore wrote: ... > > > FWIW, the currently-work-in-progress v2 patch adds a getter for the ID > > > with a WARN() check to flag callers who are trying to access a > > > bad/free'd bpf_prog. Unfortunately it touches a decent chunk of code, > > > but I think it might be a nice additional check at runtime. > > > > > > +u32 bpf_prog_get_id(const struct bpf_prog *prog) > > > +{ > > > + if (WARN(!prog->valid_id, "Attempting to use invalid eBPF program")) > > > + return 0; > > > + return prog->aux->__id; > > > +} > > > > I should add that the getter is currently a static inline in bpf.h. > > I don't see why we need to WARN on !valid_id, but I might be missing something. > There are no places currently where we report 'id == 0' to the > userspace, so we only need to take care of the offloaded case that > resets id to zero early (instead of resetting it during regular > __bpf_prog_put path). I put the WARN there, in place of a normal 'if (!prog->valid_id)', as an extra runtime check/debug-tool for those who have CONFIG_BUG enabled. I'm sure everything works properly now with respect to not using a bpf_prog reference after it has been free'd/released, but mistakes do happen - look at the regression/bug that started this thread :) If you really don't want the WARN() there, I can replace it with the simple '!prog->valid_id' check, let me know. It's your code, you should maintain it how you want; I just want to make sure we are generating audit records correctly. > > > > > I'm not seeing any other comments, so I'll go ahead with putting > > > > > together a v2 that sets an invalid flag/bit and I'll post that for > > > > > further discussion/review.
On Thu, Dec 22, 2022 at 6:20 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > On Thu, Dec 22, 2022 at 02:03:41PM -0500, Paul Moore wrote: > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > On 12/21, Paul Moore wrote: > > > > When changing the ebpf program put() routines to support being called > > > > from within IRQ context the program ID was reset to zero prior to > > > > generating the audit UNLOAD record, which obviously rendered the ID > > > > field bogus (always zero). This patch resolves this by adding a new > > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > > allocated an ID and never reset, ensuring a valid ID field, > > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > > Instead of keying off the operation, it now keys off the execution > > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > > appropriate and should help better connect the UNLOAD operations with > > > > the associated audit state (other audit records). > > > > > > [..] > > > > > > > As an note to future bug hunters, I did briefly consider removing the > > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > > program is removed from the idr pool it can no longer be found by its > > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > > when device disappears") seems to imply that it is beneficial to > > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > > program is unbound/orphaned. > > > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > > having two ids and then keeping track about which one to use, depending > > > on the context, seems more fragile? > > > > I would definitely prefer to keep just a single ID value, and that was > > the first approach I took when drafting this patch, but when looking > > through the git log it looked like there was some desire to reset the > > ID to zero on free. Not being an expert on the ebpf kernel code I > > figured I would just write the patch up this way and make a comment > > about not zero'ing out the ID in the commit description so we could > > have a discussion about it. > > > > I'm not seeing any other comments, so I'll go ahead with putting > > together a v2 that sets an invalid flag/bit and I'll post that for > > further discussion/review. > > great, perf suffers the same issue: > https://lore.kernel.org/bpf/Y3SRWVoycV290S16@krava/ > > any chance you could include it as well? I can send a patch > later if needed Hi Jiri, I'm pretty sure the current approach recommended by Stanislav, to never reset/zero the ID and instead mark it as invalid via a flag in the bpf_prog struct, should resolve the perf problem as well. My time is a little short at the moment due to the holidays, but perhaps with a little luck I'll get a new revision of the patch posted soon (today?) and you can take a look and give it a test. Are you subscribed to the linux-audit and/or bpf mailing lists? If not I can CC you directly on the next revision.
On Fri, Dec 23, 2022 at 10:37 AM Paul Moore <paul@paul-moore.com> wrote: > On Thu, Dec 22, 2022 at 6:20 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > On Thu, Dec 22, 2022 at 02:03:41PM -0500, Paul Moore wrote: > > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > > On 12/21, Paul Moore wrote: > > > > > When changing the ebpf program put() routines to support being called > > > > > from within IRQ context the program ID was reset to zero prior to > > > > > generating the audit UNLOAD record, which obviously rendered the ID > > > > > field bogus (always zero). This patch resolves this by adding a new > > > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > > > allocated an ID and never reset, ensuring a valid ID field, > > > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > > > Instead of keying off the operation, it now keys off the execution > > > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > > > appropriate and should help better connect the UNLOAD operations with > > > > > the associated audit state (other audit records). > > > > > > > > [..] > > > > > > > > > As an note to future bug hunters, I did briefly consider removing the > > > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > > > program is removed from the idr pool it can no longer be found by its > > > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > > > when device disappears") seems to imply that it is beneficial to > > > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > > > program is unbound/orphaned. > > > > > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > > > having two ids and then keeping track about which one to use, depending > > > > on the context, seems more fragile? > > > > > > I would definitely prefer to keep just a single ID value, and that was > > > the first approach I took when drafting this patch, but when looking > > > through the git log it looked like there was some desire to reset the > > > ID to zero on free. Not being an expert on the ebpf kernel code I > > > figured I would just write the patch up this way and make a comment > > > about not zero'ing out the ID in the commit description so we could > > > have a discussion about it. > > > > > > I'm not seeing any other comments, so I'll go ahead with putting > > > together a v2 that sets an invalid flag/bit and I'll post that for > > > further discussion/review. > > > > great, perf suffers the same issue: > > https://lore.kernel.org/bpf/Y3SRWVoycV290S16@krava/ > > > > any chance you could include it as well? I can send a patch > > later if needed > > Hi Jiri, > > I'm pretty sure the current approach recommended by Stanislav, to > never reset/zero the ID and instead mark it as invalid via a flag in > the bpf_prog struct, should resolve the perf problem as well. I probably should elaborate on this a bit more, in the case of perf_event_bpf_event() the getter which checks the valid_id flag isn't used, rather a direct access of bpf_prog_aux::__id is done so that the ID can be retrieved even after it is free'd/marked-invalid. Here is the relevant code snippet for the patch: diff --git a/kernel/events/core.c b/kernel/events/core.c index aefc1e08e015..c24e897d27f1 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9001,7 +9001,11 @@ void perf_event_bpf_event(struct bpf_prog *prog, }, .type = type, .flags = flags, - .id = prog->aux->id, + /* + * don't use bpf_prog_get_id() as the id may be marked + * invalid on PERF_BPF_EVENT_PROG_UNLOAD events + */ + .id = prog->aux->__id, }, }; > My time > is a little short at the moment due to the holidays, but perhaps with > a little luck I'll get a new revision of the patch posted soon > (today?) and you can take a look and give it a test. Are you > subscribed to the linux-audit and/or bpf mailing lists? If not I can > CC you directly on the next revision.
On Fri, Dec 23, 2022 at 10:58:37AM -0500, Paul Moore wrote: > On Fri, Dec 23, 2022 at 10:37 AM Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Dec 22, 2022 at 6:20 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > On Thu, Dec 22, 2022 at 02:03:41PM -0500, Paul Moore wrote: > > > > On Thu, Dec 22, 2022 at 12:19 PM <sdf@google.com> wrote: > > > > > On 12/21, Paul Moore wrote: > > > > > > When changing the ebpf program put() routines to support being called > > > > > > from within IRQ context the program ID was reset to zero prior to > > > > > > generating the audit UNLOAD record, which obviously rendered the ID > > > > > > field bogus (always zero). This patch resolves this by adding a new > > > > > > field, bpf_prog_aux::id_audit, which is set when the ebpf program is > > > > > > allocated an ID and never reset, ensuring a valid ID field, > > > > > > regardless of the state of the original ID field, bpf_prox_aud::id. > > > > > > > > > > > I also modified the bpf_audit_prog() logic used to associate the > > > > > > AUDIT_BPF record with other associated records, e.g. @ctx != NULL. > > > > > > Instead of keying off the operation, it now keys off the execution > > > > > > context, e.g. '!in_irg && !irqs_disabled()', which is much more > > > > > > appropriate and should help better connect the UNLOAD operations with > > > > > > the associated audit state (other audit records). > > > > > > > > > > [..] > > > > > > > > > > > As an note to future bug hunters, I did briefly consider removing the > > > > > > ID reset in bpf_prog_free_id(), as it would seem that once the > > > > > > program is removed from the idr pool it can no longer be found by its > > > > > > ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id > > > > > > when device disappears") seems to imply that it is beneficial to > > > > > > reset the ID value. Perhaps as a secondary indicator that the ebpf > > > > > > program is unbound/orphaned. > > > > > > > > > > That seems like the way to go imho. Can we have some extra 'invalid_id' > > > > > bitfield in the bpf_prog so we can set it in bpf_prog_free_id and > > > > > check in bpf_prog_free_id (for this offloaded use-case)? Because > > > > > having two ids and then keeping track about which one to use, depending > > > > > on the context, seems more fragile? > > > > > > > > I would definitely prefer to keep just a single ID value, and that was > > > > the first approach I took when drafting this patch, but when looking > > > > through the git log it looked like there was some desire to reset the > > > > ID to zero on free. Not being an expert on the ebpf kernel code I > > > > figured I would just write the patch up this way and make a comment > > > > about not zero'ing out the ID in the commit description so we could > > > > have a discussion about it. > > > > > > > > I'm not seeing any other comments, so I'll go ahead with putting > > > > together a v2 that sets an invalid flag/bit and I'll post that for > > > > further discussion/review. > > > > > > great, perf suffers the same issue: > > > https://lore.kernel.org/bpf/Y3SRWVoycV290S16@krava/ > > > > > > any chance you could include it as well? I can send a patch > > > later if needed > > > > Hi Jiri, > > > > I'm pretty sure the current approach recommended by Stanislav, to > > never reset/zero the ID and instead mark it as invalid via a flag in > > the bpf_prog struct, should resolve the perf problem as well. ok, I misunderstood > > I probably should elaborate on this a bit more, in the case of > perf_event_bpf_event() the getter which checks the valid_id flag isn't > used, rather a direct access of bpf_prog_aux::__id is done so that the > ID can be retrieved even after it is free'd/marked-invalid. Here is > the relevant code snippet for the patch: > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index aefc1e08e015..c24e897d27f1 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -9001,7 +9001,11 @@ void perf_event_bpf_event(struct bpf_prog *prog, > }, > .type = type, > .flags = flags, > - .id = prog->aux->id, > + /* > + * don't use bpf_prog_get_id() as the id may be marked > + * invalid on PERF_BPF_EVENT_PROG_UNLOAD events > + */ > + .id = prog->aux->__id, looks good > }, > }; > > > My time > > is a little short at the moment due to the holidays, but perhaps with > > a little luck I'll get a new revision of the patch posted soon > > (today?) and you can take a look and give it a test. Are you > > subscribed to the linux-audit and/or bpf mailing lists? If not I can > > CC you directly on the next revision. bpf list is fine thanks, jirka
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 9e7d46d16032..a22001ceb2c3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1103,6 +1103,7 @@ struct bpf_prog_aux { u32 max_tp_access; u32 stack_depth; u32 id; + u32 id_audit; /* preserves the id for use by audit */ u32 func_cnt; /* used by non-func prog as the number of func progs */ u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */ u32 attach_btf_id; /* in-kernel BTF type id to attach to */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 7b373a5e861f..3ec09f4dba18 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1958,13 +1958,13 @@ static void bpf_audit_prog(const struct bpf_prog *prog, unsigned int op) return; if (audit_enabled == AUDIT_OFF) return; - if (op == BPF_AUDIT_LOAD) + if (!in_irq() && !irqs_disabled()) ctx = audit_context(); ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF); if (unlikely(!ab)) return; audit_log_format(ab, "prog-id=%u op=%s", - prog->aux->id, bpf_audit_str[op]); + prog->aux->id_audit, bpf_audit_str[op]); audit_log_end(ab); } @@ -1975,8 +1975,10 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog) idr_preload(GFP_KERNEL); spin_lock_bh(&prog_idr_lock); id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC); - if (id > 0) + if (id > 0) { prog->aux->id = id; + prog->aux->id_audit = id; + } spin_unlock_bh(&prog_idr_lock); idr_preload_end();
When changing the ebpf program put() routines to support being called from within IRQ context the program ID was reset to zero prior to generating the audit UNLOAD record, which obviously rendered the ID field bogus (always zero). This patch resolves this by adding a new field, bpf_prog_aux::id_audit, which is set when the ebpf program is allocated an ID and never reset, ensuring a valid ID field, regardless of the state of the original ID field, bpf_prox_aud::id. I also modified the bpf_audit_prog() logic used to associate the AUDIT_BPF record with other associated records, e.g. @ctx != NULL. Instead of keying off the operation, it now keys off the execution context, e.g. '!in_irg && !irqs_disabled()', which is much more appropriate and should help better connect the UNLOAD operations with the associated audit state (other audit records). As an note to future bug hunters, I did briefly consider removing the ID reset in bpf_prog_free_id(), as it would seem that once the program is removed from the idr pool it can no longer be found by its ID value, but commit ad8ad79f4f60 ("bpf: offload: free program id when device disappears") seems to imply that it is beneficial to reset the ID value. Perhaps as a secondary indicator that the ebpf program is unbound/orphaned. Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.") Reported-by: Burn Alting <burn.alting@iinet.net.au> Signed-off-by: Paul Moore <paul@paul-moore.com> --- include/linux/bpf.h | 1 + kernel/bpf/syscall.c | 8 +++++--- 2 files changed, 6 insertions(+), 3 deletions(-)