Message ID | 20230101083403.332783-4-memxor@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | Dynptr fixes | expand |
On Sun, Jan 1, 2023 at 12:34 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > Currently, while reads are disallowed for dynptr stack slots, writes are > not. Reads don't work from both direct access and helpers, while writes > do work in both cases, but have the effect of overwriting the slot_type. > > While this is fine, handling for a few edge cases is missing. Firstly, > a user can overwrite the stack slots of dynptr partially. > > Consider the following layout: > spi: [d][d][?] > 2 1 0 > > First slot is at spi 2, second at spi 1. > Now, do a write of 1 to 8 bytes for spi 1. > > This will essentially either write STACK_MISC for all slot_types or > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write > of zeroes). The end result is that slot is scrubbed. > > Now, the layout is: > spi: [d][m][?] > 2 1 0 > > Suppose if user initializes spi = 1 as dynptr. > We get: > spi: [d][d][d] > 2 1 0 > > But this time, both spi 2 and spi 1 have first_slot = true. > > Now, when passing spi 2 to dynptr helper, it will consider it as > initialized as it does not check whether second slot has first_slot == > false. And spi 1 should already work as normal. > > This effectively replaced size + offset of first dynptr, hence allowing > invalid OOB reads and writes. > > Make a few changes to protect against this: > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a > STACK_DYNPTR type, mark both first and second slot (regardless of which > slot we touch) as STACK_INVALID. Reads are already prevented. > > Second, prevent writing to stack memory from helpers if the range may > contain any STACK_DYNPTR slots. Reads are already prevented. > > For helpers, we cannot allow it to destroy dynptrs from the writes as > depending on arguments, helper may take uninit_mem and dynptr both at > the same time. This would mean that helper may write to uninit_mem > before it reads the dynptr, which would be bad. > > PTR_TO_MEM: [?????dd] > > Depending on the code inside the helper, it may end up overwriting the > dynptr contents first and then read those as the dynptr argument. > > Verifier would only simulate destruction when it does byte by byte > access simulation in check_helper_call for meta.access_size, and > fail to catch this case, as it happens after argument checks. > > The same would need to be done for any other non-trivial objects created > on the stack in the future, such as bpf_list_head on stack, or > bpf_rb_root on stack. > > A common misunderstanding in the current code is that MEM_UNINIT means > writes, but note that writes may also be performed even without > MEM_UNINIT in case of helpers, in that case the code after handling meta > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that > invalid read case also covers writes to potential STACK_DYNPTR slots. > The only loophole was in case of meta->raw_mode which simulated writes > through instructions which could overwrite them. > > A future series sequenced after this will focus on the clean up of > helper access checks and bugs around that. > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > --- > kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 73 insertions(+) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index ca970f80e395..b985d90505cc 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, > __mark_dynptr_reg(reg, type, true); > } > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > + struct bpf_func_state *state, int spi); > > static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, > enum bpf_arg_type arg_type, int insn_idx) > @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > return 0; > } > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > + struct bpf_func_state *state, int spi) > +{ > + int i; > + > + /* We always ensure that STACK_DYNPTR is never set partially, > + * hence just checking for slot_type[0] is enough. This is > + * different for STACK_SPILL, where it may be only set for > + * 1 byte, so code has to use is_spilled_reg. > + */ > + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) > + return; > + /* Reposition spi to first slot */ > + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) > + spi = spi + 1; > + > + mark_stack_slot_scratched(env, spi); > + mark_stack_slot_scratched(env, spi - 1); > + > + /* Writing partially to one dynptr stack slot destroys both. */ > + for (i = 0; i < BPF_REG_SIZE; i++) { > + state->stack[spi].slot_type[i] = STACK_INVALID; > + state->stack[spi - 1].slot_type[i] = STACK_INVALID; > + } > + > + /* Do not release reference state, we are destroying dynptr on stack, > + * not using some helper to release it. Just reset register. > + */ > + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > + > + /* Same reason as unmark_stack_slots_dynptr above */ > + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > + > + return; > +} > + > static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > { > struct bpf_func_state *state = func(env, reg); > @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > env->insn_aux_data[insn_idx].sanitize_stack_spill = true; > } > > + destroy_stack_slots_dynptr(env, state, spi); > + subjective, but it feels like having an explicit slot_type != STACK_DYNPTR here is better, then "destroy_stack_slots_dynptr" actually is doing destruction, not "maybe_destroy_stack_slots_dynptr", which you effectively are implementing here also, shouldn't overwrite of dynptrs w/ ref_obj_id be prevented early on with a meaningful error, instead of waiting for "unreleased reference" error later on? for ref_obj_id dynptrs we know that you have to call helper with OBJ_RELEASE semantics, at which point we'll reset stack slots am I missing something? > mark_stack_slot_scratched(env, spi); > if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > !register_is_null(reg) && env->bpf_capable) { > @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > if (err) > return err; > > + for (i = min_off; i < max_off; i++) { > + int slot, spi; > + > + slot = -i - 1; > + spi = slot / BPF_REG_SIZE; > + destroy_stack_slots_dynptr(env, state, spi); > + } > > /* Variable offset writes destroy any spilled pointers in range. */ > for (i = min_off; i < max_off; i++) { > @@ -5524,6 +5573,30 @@ static int check_stack_range_initialized( > } > > if (meta && meta->raw_mode) { > + /* Ensure we won't be overwriting dynptrs when simulating byte > + * by byte access in check_helper_call using meta.access_size. > + * This would be a problem if we have a helper in the future > + * which takes: > + * > + * helper(uninit_mem, len, dynptr) > + * > + * Now, uninint_mem may overlap with dynptr pointer. Hence, it > + * may end up writing to dynptr itself when touching memory from > + * arg 1. This can be relaxed on a case by case basis for known > + * safe cases, but reject due to the possibilitiy of aliasing by > + * default. > + */ > + for (i = min_off; i < max_off + access_size; i++) { > + slot = -i - 1; nit: slot name is misleading, we normally call entire 8-byte slot a "slot", while here slot is actually off, right? same above. > + spi = slot / BPF_REG_SIZE; > + /* raw_mode may write past allocated_stack */ > + if (state->allocated_stack <= slot) > + continue; > + if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) { > + verbose(env, "potential write to dynptr at off=%d disallowed\n", i); > + return -EACCES; > + } > + } > meta->access_size = access_size; > meta->regno = regno; > return 0; > -- > 2.39.0 >
On Sun, Jan 01, 2023 at 02:03:57PM +0530, Kumar Kartikeya Dwivedi wrote: > Currently, while reads are disallowed for dynptr stack slots, writes are > not. Reads don't work from both direct access and helpers, while writes > do work in both cases, but have the effect of overwriting the slot_type. Unrelated to this patch set, but disallowing reads from dynptr slots seems like unnecessary restriction. We allow reads from spilled slots and conceptually dynptr slots should fall in is_spilled_reg() category in check_stack_read_*(). We already can do: d = bpf_rdonly_cast(dynptr, bpf_core_type_id_kernel(struct bpf_dynptr_kern)) d->size; and there is really no need to add bpf_dynptr* accessors either as helpers or as kfuncs. All accessors can simply be 'static inline' pure bpf functions in bpf_helpers.h. Automatic inlining and zero kernel side maintenance. With verifier allowing reads into dynptr we can also enable bpf_cast_to_kern_ctx() to convert struct bpf_dynptr to struct bpf_dynptr_kern and enable even faster reads.
On Sun, Jan 1, 2023 at 12:34 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > Currently, while reads are disallowed for dynptr stack slots, writes are > not. Reads don't work from both direct access and helpers, while writes > do work in both cases, but have the effect of overwriting the slot_type. > > While this is fine, handling for a few edge cases is missing. Firstly, > a user can overwrite the stack slots of dynptr partially. > > Consider the following layout: > spi: [d][d][?] > 2 1 0 > > First slot is at spi 2, second at spi 1. > Now, do a write of 1 to 8 bytes for spi 1. > > This will essentially either write STACK_MISC for all slot_types or > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write > of zeroes). The end result is that slot is scrubbed. > > Now, the layout is: > spi: [d][m][?] > 2 1 0 > > Suppose if user initializes spi = 1 as dynptr. > We get: > spi: [d][d][d] > 2 1 0 > > But this time, both spi 2 and spi 1 have first_slot = true. > > Now, when passing spi 2 to dynptr helper, it will consider it as > initialized as it does not check whether second slot has first_slot == > false. And spi 1 should already work as normal. > > This effectively replaced size + offset of first dynptr, hence allowing > invalid OOB reads and writes. > > Make a few changes to protect against this: > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a > STACK_DYNPTR type, mark both first and second slot (regardless of which > slot we touch) as STACK_INVALID. Reads are already prevented. > > Second, prevent writing to stack memory from helpers if the range may > contain any STACK_DYNPTR slots. Reads are already prevented. > > For helpers, we cannot allow it to destroy dynptrs from the writes as > depending on arguments, helper may take uninit_mem and dynptr both at > the same time. This would mean that helper may write to uninit_mem > before it reads the dynptr, which would be bad. > > PTR_TO_MEM: [?????dd] > > Depending on the code inside the helper, it may end up overwriting the > dynptr contents first and then read those as the dynptr argument. > > Verifier would only simulate destruction when it does byte by byte > access simulation in check_helper_call for meta.access_size, and > fail to catch this case, as it happens after argument checks. > > The same would need to be done for any other non-trivial objects created > on the stack in the future, such as bpf_list_head on stack, or > bpf_rb_root on stack. > > A common misunderstanding in the current code is that MEM_UNINIT means > writes, but note that writes may also be performed even without > MEM_UNINIT in case of helpers, in that case the code after handling meta > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that > invalid read case also covers writes to potential STACK_DYNPTR slots. > The only loophole was in case of meta->raw_mode which simulated writes > through instructions which could overwrite them. > > A future series sequenced after this will focus on the clean up of > helper access checks and bugs around that. > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > --- > kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 73 insertions(+) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index ca970f80e395..b985d90505cc 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, > __mark_dynptr_reg(reg, type, true); > } > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > + struct bpf_func_state *state, int spi); > > static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, > enum bpf_arg_type arg_type, int insn_idx) > @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > return 0; > } > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > + struct bpf_func_state *state, int spi) > +{ > + int i; > + > + /* We always ensure that STACK_DYNPTR is never set partially, > + * hence just checking for slot_type[0] is enough. This is > + * different for STACK_SPILL, where it may be only set for > + * 1 byte, so code has to use is_spilled_reg. > + */ > + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) > + return; nit: an empty line here helps readability > + /* Reposition spi to first slot */ > + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) > + spi = spi + 1; > + > + mark_stack_slot_scratched(env, spi); > + mark_stack_slot_scratched(env, spi - 1); > + > + /* Writing partially to one dynptr stack slot destroys both. */ > + for (i = 0; i < BPF_REG_SIZE; i++) { > + state->stack[spi].slot_type[i] = STACK_INVALID; > + state->stack[spi - 1].slot_type[i] = STACK_INVALID; > + } > + > + /* Do not release reference state, we are destroying dynptr on stack, > + * not using some helper to release it. Just reset register. > + */ I agree with Andrii's point - I think it'd be more helpful if we error out here if the dynptr is refcounted. It'd be easy to check too, we already have dynptr_type_refcounted(). > + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > + > + /* Same reason as unmark_stack_slots_dynptr above */ > + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > + > + return; I think we should also invalidate any data slices associated with the dynptrs? It seems natural that once a dynptr is invalidated, none of its data slices should be usable. > +} > + > static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > { > struct bpf_func_state *state = func(env, reg); > @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > env->insn_aux_data[insn_idx].sanitize_stack_spill = true; > } > > + destroy_stack_slots_dynptr(env, state, spi); > + > mark_stack_slot_scratched(env, spi); > if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > !register_is_null(reg) && env->bpf_capable) { > @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > if (err) > return err; > > + for (i = min_off; i < max_off; i++) { > + int slot, spi; > + > + slot = -i - 1; > + spi = slot / BPF_REG_SIZE; I think you can just use __get_spi() here > + destroy_stack_slots_dynptr(env, state, spi); I think here too, if (state->stack[spi].slot_type[0] == STACK_DYNPTR) destroy_stack_slots_dynptr(env, state, spi) makes it more readable. And if it is a STACK_DYNPTR, we can also fast-forward i. > + } > > /* Variable offset writes destroy any spilled pointers in range. */ > for (i = min_off; i < max_off; i++) { > @@ -5524,6 +5573,30 @@ static int check_stack_range_initialized( > } > > if (meta && meta->raw_mode) { > + /* Ensure we won't be overwriting dynptrs when simulating byte > + * by byte access in check_helper_call using meta.access_size. > + * This would be a problem if we have a helper in the future > + * which takes: > + * > + * helper(uninit_mem, len, dynptr) > + * > + * Now, uninint_mem may overlap with dynptr pointer. Hence, it > + * may end up writing to dynptr itself when touching memory from > + * arg 1. This can be relaxed on a case by case basis for known > + * safe cases, but reject due to the possibilitiy of aliasing by > + * default. > + */ > + for (i = min_off; i < max_off + access_size; i++) { > + slot = -i - 1; > + spi = slot / BPF_REG_SIZE; nit: here too, we can use __get_spi() > + /* raw_mode may write past allocated_stack */ > + if (state->allocated_stack <= slot) > + continue; > + if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) { > + verbose(env, "potential write to dynptr at off=%d disallowed\n", i); > + return -EACCES; > + } > + } > meta->access_size = access_size; > meta->regno = regno; > return 0; > -- > 2.39.0 >
On Fri, Jan 6, 2023 at 11:16 AM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Sun, Jan 1, 2023 at 12:34 AM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: > > > > Currently, while reads are disallowed for dynptr stack slots, writes are > > not. Reads don't work from both direct access and helpers, while writes > > do work in both cases, but have the effect of overwriting the slot_type. > > > > While this is fine, handling for a few edge cases is missing. Firstly, > > a user can overwrite the stack slots of dynptr partially. > > > > Consider the following layout: > > spi: [d][d][?] > > 2 1 0 > > > > First slot is at spi 2, second at spi 1. > > Now, do a write of 1 to 8 bytes for spi 1. > > > > This will essentially either write STACK_MISC for all slot_types or > > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write > > of zeroes). The end result is that slot is scrubbed. > > > > Now, the layout is: > > spi: [d][m][?] > > 2 1 0 > > > > Suppose if user initializes spi = 1 as dynptr. > > We get: > > spi: [d][d][d] > > 2 1 0 > > > > But this time, both spi 2 and spi 1 have first_slot = true. > > > > Now, when passing spi 2 to dynptr helper, it will consider it as > > initialized as it does not check whether second slot has first_slot == > > false. And spi 1 should already work as normal. > > > > This effectively replaced size + offset of first dynptr, hence allowing > > invalid OOB reads and writes. > > > > Make a few changes to protect against this: > > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a > > STACK_DYNPTR type, mark both first and second slot (regardless of which > > slot we touch) as STACK_INVALID. Reads are already prevented. > > > > Second, prevent writing to stack memory from helpers if the range may > > contain any STACK_DYNPTR slots. Reads are already prevented. > > > > For helpers, we cannot allow it to destroy dynptrs from the writes as > > depending on arguments, helper may take uninit_mem and dynptr both at > > the same time. This would mean that helper may write to uninit_mem > > before it reads the dynptr, which would be bad. > > > > PTR_TO_MEM: [?????dd] > > > > Depending on the code inside the helper, it may end up overwriting the > > dynptr contents first and then read those as the dynptr argument. > > > > Verifier would only simulate destruction when it does byte by byte > > access simulation in check_helper_call for meta.access_size, and > > fail to catch this case, as it happens after argument checks. > > > > The same would need to be done for any other non-trivial objects created > > on the stack in the future, such as bpf_list_head on stack, or > > bpf_rb_root on stack. > > > > A common misunderstanding in the current code is that MEM_UNINIT means > > writes, but note that writes may also be performed even without > > MEM_UNINIT in case of helpers, in that case the code after handling meta > > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that > > invalid read case also covers writes to potential STACK_DYNPTR slots. > > The only loophole was in case of meta->raw_mode which simulated writes > > through instructions which could overwrite them. > > > > A future series sequenced after this will focus on the clean up of > > helper access checks and bugs around that. > > > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > > --- > > kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 73 insertions(+) > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index ca970f80e395..b985d90505cc 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, > > __mark_dynptr_reg(reg, type, true); > > } > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > + struct bpf_func_state *state, int spi); > > > > static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, > > enum bpf_arg_type arg_type, int insn_idx) > > @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > > return 0; > > } > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > + struct bpf_func_state *state, int spi) > > +{ > > + int i; > > + > > + /* We always ensure that STACK_DYNPTR is never set partially, > > + * hence just checking for slot_type[0] is enough. This is > > + * different for STACK_SPILL, where it may be only set for > > + * 1 byte, so code has to use is_spilled_reg. > > + */ > > + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) > > + return; > > nit: an empty line here helps readability > > > + /* Reposition spi to first slot */ > > + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) > > + spi = spi + 1; > > + > > + mark_stack_slot_scratched(env, spi); > > + mark_stack_slot_scratched(env, spi - 1); > > + > > + /* Writing partially to one dynptr stack slot destroys both. */ > > + for (i = 0; i < BPF_REG_SIZE; i++) { > > + state->stack[spi].slot_type[i] = STACK_INVALID; > > + state->stack[spi - 1].slot_type[i] = STACK_INVALID; > > + } > > + > > + /* Do not release reference state, we are destroying dynptr on stack, > > + * not using some helper to release it. Just reset register. > > + */ > > I agree with Andrii's point - I think it'd be more helpful if we error > out here if the dynptr is refcounted. It'd be easy to check too, we > already have dynptr_type_refcounted(). Actually, since __mark_reg_unknown sets reg->ref_obj_id to 0, it's imperative that we error out here if the dynptr is refcounted > > > + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > > + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > > + > > + /* Same reason as unmark_stack_slots_dynptr above */ > > + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > > + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > > + > > + return; > > I think we should also invalidate any data slices associated with the > dynptrs? It seems natural that once a dynptr is invalidated, none of > its data slices should be usable. > > > +} > > + > > static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > { > > struct bpf_func_state *state = func(env, reg); > > @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > > env->insn_aux_data[insn_idx].sanitize_stack_spill = true; > > } > > > > + destroy_stack_slots_dynptr(env, state, spi); > > + > > mark_stack_slot_scratched(env, spi); > > if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > > !register_is_null(reg) && env->bpf_capable) { > > @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > > if (err) > > return err; > > > > + for (i = min_off; i < max_off; i++) { > > + int slot, spi; > > + > > + slot = -i - 1; > > + spi = slot / BPF_REG_SIZE; > > I think you can just use __get_spi() here > > > + destroy_stack_slots_dynptr(env, state, spi); > > I think here too, > > if (state->stack[spi].slot_type[0] == STACK_DYNPTR) > destroy_stack_slots_dynptr(env, state, spi) > > makes it more readable. > > And if it is a STACK_DYNPTR, we can also fast-forward i. > > > + } > > > > /* Variable offset writes destroy any spilled pointers in range. */ > > for (i = min_off; i < max_off; i++) { > > @@ -5524,6 +5573,30 @@ static int check_stack_range_initialized( > > } > > > > if (meta && meta->raw_mode) { > > + /* Ensure we won't be overwriting dynptrs when simulating byte > > + * by byte access in check_helper_call using meta.access_size. > > + * This would be a problem if we have a helper in the future > > + * which takes: > > + * > > + * helper(uninit_mem, len, dynptr) > > + * > > + * Now, uninint_mem may overlap with dynptr pointer. Hence, it > > + * may end up writing to dynptr itself when touching memory from > > + * arg 1. This can be relaxed on a case by case basis for known > > + * safe cases, but reject due to the possibilitiy of aliasing by > > + * default. > > + */ > > + for (i = min_off; i < max_off + access_size; i++) { > > + slot = -i - 1; > > + spi = slot / BPF_REG_SIZE; > > nit: here too, we can use __get_spi() > > > + /* raw_mode may write past allocated_stack */ > > + if (state->allocated_stack <= slot) > > + continue; > > + if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) { > > + verbose(env, "potential write to dynptr at off=%d disallowed\n", i); > > + return -EACCES; > > + } > > + } > > meta->access_size = access_size; > > meta->regno = regno; > > return 0; > > -- > > 2.39.0 > >
On Thu, Jan 05, 2023 at 04:12:52AM IST, Andrii Nakryiko wrote: > On Sun, Jan 1, 2023 at 12:34 AM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: > > > > Currently, while reads are disallowed for dynptr stack slots, writes are > > not. Reads don't work from both direct access and helpers, while writes > > do work in both cases, but have the effect of overwriting the slot_type. > > > > While this is fine, handling for a few edge cases is missing. Firstly, > > a user can overwrite the stack slots of dynptr partially. > > > > Consider the following layout: > > spi: [d][d][?] > > 2 1 0 > > > > First slot is at spi 2, second at spi 1. > > Now, do a write of 1 to 8 bytes for spi 1. > > > > This will essentially either write STACK_MISC for all slot_types or > > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write > > of zeroes). The end result is that slot is scrubbed. > > > > Now, the layout is: > > spi: [d][m][?] > > 2 1 0 > > > > Suppose if user initializes spi = 1 as dynptr. > > We get: > > spi: [d][d][d] > > 2 1 0 > > > > But this time, both spi 2 and spi 1 have first_slot = true. > > > > Now, when passing spi 2 to dynptr helper, it will consider it as > > initialized as it does not check whether second slot has first_slot == > > false. And spi 1 should already work as normal. > > > > This effectively replaced size + offset of first dynptr, hence allowing > > invalid OOB reads and writes. > > > > Make a few changes to protect against this: > > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a > > STACK_DYNPTR type, mark both first and second slot (regardless of which > > slot we touch) as STACK_INVALID. Reads are already prevented. > > > > Second, prevent writing to stack memory from helpers if the range may > > contain any STACK_DYNPTR slots. Reads are already prevented. > > > > For helpers, we cannot allow it to destroy dynptrs from the writes as > > depending on arguments, helper may take uninit_mem and dynptr both at > > the same time. This would mean that helper may write to uninit_mem > > before it reads the dynptr, which would be bad. > > > > PTR_TO_MEM: [?????dd] > > > > Depending on the code inside the helper, it may end up overwriting the > > dynptr contents first and then read those as the dynptr argument. > > > > Verifier would only simulate destruction when it does byte by byte > > access simulation in check_helper_call for meta.access_size, and > > fail to catch this case, as it happens after argument checks. > > > > The same would need to be done for any other non-trivial objects created > > on the stack in the future, such as bpf_list_head on stack, or > > bpf_rb_root on stack. > > > > A common misunderstanding in the current code is that MEM_UNINIT means > > writes, but note that writes may also be performed even without > > MEM_UNINIT in case of helpers, in that case the code after handling meta > > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that > > invalid read case also covers writes to potential STACK_DYNPTR slots. > > The only loophole was in case of meta->raw_mode which simulated writes > > through instructions which could overwrite them. > > > > A future series sequenced after this will focus on the clean up of > > helper access checks and bugs around that. > > > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > > --- > > kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 73 insertions(+) > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index ca970f80e395..b985d90505cc 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, > > __mark_dynptr_reg(reg, type, true); > > } > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > + struct bpf_func_state *state, int spi); > > > > static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, > > enum bpf_arg_type arg_type, int insn_idx) > > @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > > return 0; > > } > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > + struct bpf_func_state *state, int spi) > > +{ > > + int i; > > + > > + /* We always ensure that STACK_DYNPTR is never set partially, > > + * hence just checking for slot_type[0] is enough. This is > > + * different for STACK_SPILL, where it may be only set for > > + * 1 byte, so code has to use is_spilled_reg. > > + */ > > + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) > > + return; > > + /* Reposition spi to first slot */ > > + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) > > + spi = spi + 1; > > + > > + mark_stack_slot_scratched(env, spi); > > + mark_stack_slot_scratched(env, spi - 1); > > + > > + /* Writing partially to one dynptr stack slot destroys both. */ > > + for (i = 0; i < BPF_REG_SIZE; i++) { > > + state->stack[spi].slot_type[i] = STACK_INVALID; > > + state->stack[spi - 1].slot_type[i] = STACK_INVALID; > > + } > > + > > + /* Do not release reference state, we are destroying dynptr on stack, > > + * not using some helper to release it. Just reset register. > > + */ > > + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > > + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > > + > > + /* Same reason as unmark_stack_slots_dynptr above */ > > + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > > + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > > + > > + return; > > +} > > + > > static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > { > > struct bpf_func_state *state = func(env, reg); > > @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > > env->insn_aux_data[insn_idx].sanitize_stack_spill = true; > > } > > > > + destroy_stack_slots_dynptr(env, state, spi); > > + > > subjective, but it feels like having an explicit slot_type != > STACK_DYNPTR here is better, then "destroy_stack_slots_dynptr" > actually is doing destruction, not "maybe_destroy_stack_slots_dynptr", > which you effectively are implementing here > The intent of the function is to destroy any dynptr which the spi belongs to. If there is nothing, it will just return. I don't mind pulling the check out, but I think it's going to be done before each call for this function, so it felt better to keep it inside it and make non STACK_DYNPTR case a no-op. > also, shouldn't overwrite of dynptrs w/ ref_obj_id be prevented early > on with a meaningful error, instead of waiting for "unreleased > reference" error later on? for ref_obj_id dynptrs we know that you > have to call helper with OBJ_RELEASE semantics, at which point we'll > reset stack slots > > am I missing something? > Yes, I can make that change. > > > mark_stack_slot_scratched(env, spi); > > if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > > !register_is_null(reg) && env->bpf_capable) { > > @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > > if (err) > > return err; > > > > + for (i = min_off; i < max_off; i++) { > > + int slot, spi; > > + > > + slot = -i - 1; > > + spi = slot / BPF_REG_SIZE; > > + destroy_stack_slots_dynptr(env, state, spi); > > + } > > > > /* Variable offset writes destroy any spilled pointers in range. */ > > for (i = min_off; i < max_off; i++) { > > @@ -5524,6 +5573,30 @@ static int check_stack_range_initialized( > > } > > > > if (meta && meta->raw_mode) { > > + /* Ensure we won't be overwriting dynptrs when simulating byte > > + * by byte access in check_helper_call using meta.access_size. > > + * This would be a problem if we have a helper in the future > > + * which takes: > > + * > > + * helper(uninit_mem, len, dynptr) > > + * > > + * Now, uninint_mem may overlap with dynptr pointer. Hence, it > > + * may end up writing to dynptr itself when touching memory from > > + * arg 1. This can be relaxed on a case by case basis for known > > + * safe cases, but reject due to the possibilitiy of aliasing by > > + * default. > > + */ > > + for (i = min_off; i < max_off + access_size; i++) { > > + slot = -i - 1; > > nit: slot name is misleading, we normally call entire 8-byte slot a > "slot", while here slot is actually off, right? same above. > The same naming has been used in multiple places, probably because these functions also get an off parameter passed in from the caller. I guess stack_off sounds better? > > + spi = slot / BPF_REG_SIZE; > > + /* raw_mode may write past allocated_stack */ > > + if (state->allocated_stack <= slot) > > + continue; > > + if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) { > > + verbose(env, "potential write to dynptr at off=%d disallowed\n", i); > > + return -EACCES; > > + } > > + } > > meta->access_size = access_size; > > meta->regno = regno; > > return 0; > > -- > > 2.39.0 > >
On Sat, Jan 07, 2023 at 12:46:23AM IST, Joanne Koong wrote: > On Sun, Jan 1, 2023 at 12:34 AM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: > > > > Currently, while reads are disallowed for dynptr stack slots, writes are > > not. Reads don't work from both direct access and helpers, while writes > > do work in both cases, but have the effect of overwriting the slot_type. > > > > While this is fine, handling for a few edge cases is missing. Firstly, > > a user can overwrite the stack slots of dynptr partially. > > > > Consider the following layout: > > spi: [d][d][?] > > 2 1 0 > > > > First slot is at spi 2, second at spi 1. > > Now, do a write of 1 to 8 bytes for spi 1. > > > > This will essentially either write STACK_MISC for all slot_types or > > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write > > of zeroes). The end result is that slot is scrubbed. > > > > Now, the layout is: > > spi: [d][m][?] > > 2 1 0 > > > > Suppose if user initializes spi = 1 as dynptr. > > We get: > > spi: [d][d][d] > > 2 1 0 > > > > But this time, both spi 2 and spi 1 have first_slot = true. > > > > Now, when passing spi 2 to dynptr helper, it will consider it as > > initialized as it does not check whether second slot has first_slot == > > false. And spi 1 should already work as normal. > > > > This effectively replaced size + offset of first dynptr, hence allowing > > invalid OOB reads and writes. > > > > Make a few changes to protect against this: > > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a > > STACK_DYNPTR type, mark both first and second slot (regardless of which > > slot we touch) as STACK_INVALID. Reads are already prevented. > > > > Second, prevent writing to stack memory from helpers if the range may > > contain any STACK_DYNPTR slots. Reads are already prevented. > > > > For helpers, we cannot allow it to destroy dynptrs from the writes as > > depending on arguments, helper may take uninit_mem and dynptr both at > > the same time. This would mean that helper may write to uninit_mem > > before it reads the dynptr, which would be bad. > > > > PTR_TO_MEM: [?????dd] > > > > Depending on the code inside the helper, it may end up overwriting the > > dynptr contents first and then read those as the dynptr argument. > > > > Verifier would only simulate destruction when it does byte by byte > > access simulation in check_helper_call for meta.access_size, and > > fail to catch this case, as it happens after argument checks. > > > > The same would need to be done for any other non-trivial objects created > > on the stack in the future, such as bpf_list_head on stack, or > > bpf_rb_root on stack. > > > > A common misunderstanding in the current code is that MEM_UNINIT means > > writes, but note that writes may also be performed even without > > MEM_UNINIT in case of helpers, in that case the code after handling meta > > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that > > invalid read case also covers writes to potential STACK_DYNPTR slots. > > The only loophole was in case of meta->raw_mode which simulated writes > > through instructions which could overwrite them. > > > > A future series sequenced after this will focus on the clean up of > > helper access checks and bugs around that. > > > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > > --- > > kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 73 insertions(+) > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index ca970f80e395..b985d90505cc 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, > > __mark_dynptr_reg(reg, type, true); > > } > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > + struct bpf_func_state *state, int spi); > > > > static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, > > enum bpf_arg_type arg_type, int insn_idx) > > @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > > return 0; > > } > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > + struct bpf_func_state *state, int spi) > > +{ > > + int i; > > + > > + /* We always ensure that STACK_DYNPTR is never set partially, > > + * hence just checking for slot_type[0] is enough. This is > > + * different for STACK_SPILL, where it may be only set for > > + * 1 byte, so code has to use is_spilled_reg. > > + */ > > + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) > > + return; > > nit: an empty line here helps readability > Ok. > > + /* Reposition spi to first slot */ > > + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) > > + spi = spi + 1; > > + > > + mark_stack_slot_scratched(env, spi); > > + mark_stack_slot_scratched(env, spi - 1); > > + > > + /* Writing partially to one dynptr stack slot destroys both. */ > > + for (i = 0; i < BPF_REG_SIZE; i++) { > > + state->stack[spi].slot_type[i] = STACK_INVALID; > > + state->stack[spi - 1].slot_type[i] = STACK_INVALID; > > + } > > + > > + /* Do not release reference state, we are destroying dynptr on stack, > > + * not using some helper to release it. Just reset register. > > + */ > > I agree with Andrii's point - I think it'd be more helpful if we error > out here if the dynptr is refcounted. It'd be easy to check too, we > already have dynptr_type_refcounted(). > Ack, I'll change it to return an error early. > > + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > > + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > > + > > + /* Same reason as unmark_stack_slots_dynptr above */ > > + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > > + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > > + > > + return; > > I think we should also invalidate any data slices associated with the > dynptrs? It seems natural that once a dynptr is invalidated, none of > its data slices should be usable. > Great catch, will fix. > > +} > > + > > static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > { > > struct bpf_func_state *state = func(env, reg); > > @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > > env->insn_aux_data[insn_idx].sanitize_stack_spill = true; > > } > > > > + destroy_stack_slots_dynptr(env, state, spi); > > + > > mark_stack_slot_scratched(env, spi); > > if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > > !register_is_null(reg) && env->bpf_capable) { > > @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > > if (err) > > return err; > > > > + for (i = min_off; i < max_off; i++) { > > + int slot, spi; > > + > > + slot = -i - 1; > > + spi = slot / BPF_REG_SIZE; > > I think you can just use __get_spi() here > Ack. > > + destroy_stack_slots_dynptr(env, state, spi); > > I think here too, > > if (state->stack[spi].slot_type[0] == STACK_DYNPTR) > destroy_stack_slots_dynptr(env, state, spi) > > makes it more readable. > > And if it is a STACK_DYNPTR, we can also fast-forward i. > No issues with such a change, but it's going to precede almost every call to this function. I don't have a strong preference, but we could also call it destroy_if_dynptr_stack_slot to make it more clear the destructon is conditional and move it inside the function. > [...]
On Thu, Jan 05, 2023 at 08:36:07AM IST, Alexei Starovoitov wrote: > On Sun, Jan 01, 2023 at 02:03:57PM +0530, Kumar Kartikeya Dwivedi wrote: > > Currently, while reads are disallowed for dynptr stack slots, writes are > > not. Reads don't work from both direct access and helpers, while writes > > do work in both cases, but have the effect of overwriting the slot_type. > > Unrelated to this patch set, but disallowing reads from dynptr slots > seems like unnecessary restriction. > We allow reads from spilled slots and conceptually dynptr slots should > fall in is_spilled_reg() category in check_stack_read_*(). > > We already can do: > d = bpf_rdonly_cast(dynptr, bpf_core_type_id_kernel(struct bpf_dynptr_kern)) > d->size; Not sure this cast is required, it can just be reads from the stack and clang will generate CO-RE relocatable accesses when casted to the right struct with preserve_access_index attribute set? Or did I miss something? > and there is really no need to add bpf_dynptr* accessors either as helpers or as kfuncs. > All accessors can simply be 'static inline' pure bpf functions in bpf_helpers.h. > Automatic inlining and zero kernel side maintenance. Yeah, it could be made to work (perhaps even portably using CO-RE and relocatable enum values which check for set bits etc.). But in the end how do you define such an interface, will it be UAPI like xdp_md or __sk_buff, or unstable like kfunc, or just best-effort stable as long as user is making use of CO-RE relocs? > > With verifier allowing reads into dynptr we can also enable bpf_cast_to_kern_ctx() > to convert struct bpf_dynptr to struct bpf_dynptr_kern and enable > even faster reads. I think rdonly_cast is unnecessary, just enabling reads into stack will be enough to enable this.
On Mon, Jan 9, 2023 at 3:52 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Thu, Jan 05, 2023 at 08:36:07AM IST, Alexei Starovoitov wrote: > > On Sun, Jan 01, 2023 at 02:03:57PM +0530, Kumar Kartikeya Dwivedi wrote: > > > Currently, while reads are disallowed for dynptr stack slots, writes are > > > not. Reads don't work from both direct access and helpers, while writes > > > do work in both cases, but have the effect of overwriting the slot_type. > > > > Unrelated to this patch set, but disallowing reads from dynptr slots > > seems like unnecessary restriction. > > We allow reads from spilled slots and conceptually dynptr slots should > > fall in is_spilled_reg() category in check_stack_read_*(). > > > > We already can do: > > d = bpf_rdonly_cast(dynptr, bpf_core_type_id_kernel(struct bpf_dynptr_kern)) > > d->size; > > Not sure this cast is required, it can just be reads from the stack and clang > will generate CO-RE relocatable accesses when casted to the right struct with > preserve_access_index attribute set? Or did I miss something? rdonly_cast is required today, because the verifier rejects raw reads. > > > > With verifier allowing reads into dynptr we can also enable bpf_cast_to_kern_ctx() > > to convert struct bpf_dynptr to struct bpf_dynptr_kern and enable > > even faster reads. > > I think rdonly_cast is unnecessary, just enabling reads into stack will be > enough to enable this. Right. I was thinking that bpf_cast_to_kern_ctx will make things more robust, but plain vanilla cast to struct bpf_dynptr_kern * and using CO-RE will work when the verifier enables reads. > relocatable enum values which check for set bits etc.). But in the end how do > you define such an interface, will it be UAPI like xdp_md or __sk_buff, or > unstable like kfunc, or just best-effort stable as long as user is making use of > CO-RE relocs? We can start best-effort with CORE. All our fixed uapi things like __sk_buff and xdp_md have ongoing usability issues. People always need something new and we keep adding to those structs every now and then. struct __sk_buff is quite scary. Its vlan_* fields was a bit of a pain to adjust when corresponding vlan changes were made in the sk_buff and networking core. Eventually we still had to do: /* Simulate the following kernel macro: * #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) */ shared_info = bpf_rdonly_cast(kskb->head + kskb->end, bpf_core_type_id_kernel(struct skb_shared_info)); I'm arguing that everything we expose to bpf progs should be unstable in the beginning.
On Mon, Jan 9, 2023 at 3:30 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Sat, Jan 07, 2023 at 12:46:23AM IST, Joanne Koong wrote: > > On Sun, Jan 1, 2023 at 12:34 AM Kumar Kartikeya Dwivedi > > <memxor@gmail.com> wrote: > > > > > > Currently, while reads are disallowed for dynptr stack slots, writes are > > > not. Reads don't work from both direct access and helpers, while writes > > > do work in both cases, but have the effect of overwriting the slot_type. > > > > > > While this is fine, handling for a few edge cases is missing. Firstly, > > > a user can overwrite the stack slots of dynptr partially. > > > > > > Consider the following layout: > > > spi: [d][d][?] > > > 2 1 0 > > > > > > First slot is at spi 2, second at spi 1. > > > Now, do a write of 1 to 8 bytes for spi 1. > > > > > > This will essentially either write STACK_MISC for all slot_types or > > > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write > > > of zeroes). The end result is that slot is scrubbed. > > > > > > Now, the layout is: > > > spi: [d][m][?] > > > 2 1 0 > > > > > > Suppose if user initializes spi = 1 as dynptr. > > > We get: > > > spi: [d][d][d] > > > 2 1 0 > > > > > > But this time, both spi 2 and spi 1 have first_slot = true. > > > > > > Now, when passing spi 2 to dynptr helper, it will consider it as > > > initialized as it does not check whether second slot has first_slot == > > > false. And spi 1 should already work as normal. > > > > > > This effectively replaced size + offset of first dynptr, hence allowing > > > invalid OOB reads and writes. > > > > > > Make a few changes to protect against this: > > > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a > > > STACK_DYNPTR type, mark both first and second slot (regardless of which > > > slot we touch) as STACK_INVALID. Reads are already prevented. > > > > > > Second, prevent writing to stack memory from helpers if the range may > > > contain any STACK_DYNPTR slots. Reads are already prevented. > > > > > > For helpers, we cannot allow it to destroy dynptrs from the writes as > > > depending on arguments, helper may take uninit_mem and dynptr both at > > > the same time. This would mean that helper may write to uninit_mem > > > before it reads the dynptr, which would be bad. > > > > > > PTR_TO_MEM: [?????dd] > > > > > > Depending on the code inside the helper, it may end up overwriting the > > > dynptr contents first and then read those as the dynptr argument. > > > > > > Verifier would only simulate destruction when it does byte by byte > > > access simulation in check_helper_call for meta.access_size, and > > > fail to catch this case, as it happens after argument checks. > > > > > > The same would need to be done for any other non-trivial objects created > > > on the stack in the future, such as bpf_list_head on stack, or > > > bpf_rb_root on stack. > > > > > > A common misunderstanding in the current code is that MEM_UNINIT means > > > writes, but note that writes may also be performed even without > > > MEM_UNINIT in case of helpers, in that case the code after handling meta > > > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that > > > invalid read case also covers writes to potential STACK_DYNPTR slots. > > > The only loophole was in case of meta->raw_mode which simulated writes > > > through instructions which could overwrite them. > > > > > > A future series sequenced after this will focus on the clean up of > > > helper access checks and bugs around that. > > > > > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > > > --- > > > kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 73 insertions(+) > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > index ca970f80e395..b985d90505cc 100644 > > > --- a/kernel/bpf/verifier.c > > > +++ b/kernel/bpf/verifier.c > > > @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, > > > __mark_dynptr_reg(reg, type, true); > > > } > > > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > > + struct bpf_func_state *state, int spi); > > > > > > static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, > > > enum bpf_arg_type arg_type, int insn_idx) > > > @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > > > return 0; > > > } > > > > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, > > > + struct bpf_func_state *state, int spi) > > > +{ > > > + int i; > > > + > > > + /* We always ensure that STACK_DYNPTR is never set partially, > > > + * hence just checking for slot_type[0] is enough. This is > > > + * different for STACK_SPILL, where it may be only set for > > > + * 1 byte, so code has to use is_spilled_reg. > > > + */ > > > + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) > > > + return; > > > > nit: an empty line here helps readability > > > > Ok. > > > > + /* Reposition spi to first slot */ > > > + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) > > > + spi = spi + 1; > > > + > > > + mark_stack_slot_scratched(env, spi); > > > + mark_stack_slot_scratched(env, spi - 1); > > > + > > > + /* Writing partially to one dynptr stack slot destroys both. */ > > > + for (i = 0; i < BPF_REG_SIZE; i++) { > > > + state->stack[spi].slot_type[i] = STACK_INVALID; > > > + state->stack[spi - 1].slot_type[i] = STACK_INVALID; > > > + } > > > + > > > + /* Do not release reference state, we are destroying dynptr on stack, > > > + * not using some helper to release it. Just reset register. > > > + */ > > > > I agree with Andrii's point - I think it'd be more helpful if we error > > out here if the dynptr is refcounted. It'd be easy to check too, we > > already have dynptr_type_refcounted(). > > > > Ack, I'll change it to return an error early. > > > > + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > > > + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > > > + > > > + /* Same reason as unmark_stack_slots_dynptr above */ > > > + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > > > + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > > > + > > > + return; > > > > I think we should also invalidate any data slices associated with the > > dynptrs? It seems natural that once a dynptr is invalidated, none of > > its data slices should be usable. > > > > Great catch, will fix. > > > > +} > > > + > > > static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > > { > > > struct bpf_func_state *state = func(env, reg); > > > @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > > > env->insn_aux_data[insn_idx].sanitize_stack_spill = true; > > > } > > > > > > + destroy_stack_slots_dynptr(env, state, spi); > > > + > > > mark_stack_slot_scratched(env, spi); > > > if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > > > !register_is_null(reg) && env->bpf_capable) { > > > @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > > > if (err) > > > return err; > > > > > > + for (i = min_off; i < max_off; i++) { > > > + int slot, spi; > > > + > > > + slot = -i - 1; > > > + spi = slot / BPF_REG_SIZE; > > > > I think you can just use __get_spi() here > > > > Ack. > > > > + destroy_stack_slots_dynptr(env, state, spi); > > > > I think here too, > > > > if (state->stack[spi].slot_type[0] == STACK_DYNPTR) > > destroy_stack_slots_dynptr(env, state, spi) > > > > makes it more readable. > > > > And if it is a STACK_DYNPTR, we can also fast-forward i. > > > > No issues with such a change, but it's going to precede almost every call to > this function. I don't have a strong preference, but we could also call it > destroy_if_dynptr_stack_slot to make it more clear the destructon is conditional > and move it inside the function. I don't have a strong preference either. Calling it destroy_if_dynptr_stack_slot() sounds good as well. > > > [...]
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ca970f80e395..b985d90505cc 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -769,6 +769,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg, __mark_dynptr_reg(reg, type, true); } +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, + struct bpf_func_state *state, int spi); static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, enum bpf_arg_type arg_type, int insn_idx) @@ -858,6 +860,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re return 0; } +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env, + struct bpf_func_state *state, int spi) +{ + int i; + + /* We always ensure that STACK_DYNPTR is never set partially, + * hence just checking for slot_type[0] is enough. This is + * different for STACK_SPILL, where it may be only set for + * 1 byte, so code has to use is_spilled_reg. + */ + if (state->stack[spi].slot_type[0] != STACK_DYNPTR) + return; + /* Reposition spi to first slot */ + if (!state->stack[spi].spilled_ptr.dynptr.first_slot) + spi = spi + 1; + + mark_stack_slot_scratched(env, spi); + mark_stack_slot_scratched(env, spi - 1); + + /* Writing partially to one dynptr stack slot destroys both. */ + for (i = 0; i < BPF_REG_SIZE; i++) { + state->stack[spi].slot_type[i] = STACK_INVALID; + state->stack[spi - 1].slot_type[i] = STACK_INVALID; + } + + /* Do not release reference state, we are destroying dynptr on stack, + * not using some helper to release it. Just reset register. + */ + __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); + __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); + + /* Same reason as unmark_stack_slots_dynptr above */ + state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; + state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; + + return; +} + static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) { struct bpf_func_state *state = func(env, reg); @@ -3384,6 +3424,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, env->insn_aux_data[insn_idx].sanitize_stack_spill = true; } + destroy_stack_slots_dynptr(env, state, spi); + mark_stack_slot_scratched(env, spi); if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && !register_is_null(reg) && env->bpf_capable) { @@ -3497,6 +3539,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, if (err) return err; + for (i = min_off; i < max_off; i++) { + int slot, spi; + + slot = -i - 1; + spi = slot / BPF_REG_SIZE; + destroy_stack_slots_dynptr(env, state, spi); + } /* Variable offset writes destroy any spilled pointers in range. */ for (i = min_off; i < max_off; i++) { @@ -5524,6 +5573,30 @@ static int check_stack_range_initialized( } if (meta && meta->raw_mode) { + /* Ensure we won't be overwriting dynptrs when simulating byte + * by byte access in check_helper_call using meta.access_size. + * This would be a problem if we have a helper in the future + * which takes: + * + * helper(uninit_mem, len, dynptr) + * + * Now, uninint_mem may overlap with dynptr pointer. Hence, it + * may end up writing to dynptr itself when touching memory from + * arg 1. This can be relaxed on a case by case basis for known + * safe cases, but reject due to the possibilitiy of aliasing by + * default. + */ + for (i = min_off; i < max_off + access_size; i++) { + slot = -i - 1; + spi = slot / BPF_REG_SIZE; + /* raw_mode may write past allocated_stack */ + if (state->allocated_stack <= slot) + continue; + if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) { + verbose(env, "potential write to dynptr at off=%d disallowed\n", i); + return -EACCES; + } + } meta->access_size = access_size; meta->regno = regno; return 0;
Currently, while reads are disallowed for dynptr stack slots, writes are not. Reads don't work from both direct access and helpers, while writes do work in both cases, but have the effect of overwriting the slot_type. While this is fine, handling for a few edge cases is missing. Firstly, a user can overwrite the stack slots of dynptr partially. Consider the following layout: spi: [d][d][?] 2 1 0 First slot is at spi 2, second at spi 1. Now, do a write of 1 to 8 bytes for spi 1. This will essentially either write STACK_MISC for all slot_types or STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write of zeroes). The end result is that slot is scrubbed. Now, the layout is: spi: [d][m][?] 2 1 0 Suppose if user initializes spi = 1 as dynptr. We get: spi: [d][d][d] 2 1 0 But this time, both spi 2 and spi 1 have first_slot = true. Now, when passing spi 2 to dynptr helper, it will consider it as initialized as it does not check whether second slot has first_slot == false. And spi 1 should already work as normal. This effectively replaced size + offset of first dynptr, hence allowing invalid OOB reads and writes. Make a few changes to protect against this: When writing to PTR_TO_STACK using BPF insns, when we touch spi of a STACK_DYNPTR type, mark both first and second slot (regardless of which slot we touch) as STACK_INVALID. Reads are already prevented. Second, prevent writing to stack memory from helpers if the range may contain any STACK_DYNPTR slots. Reads are already prevented. For helpers, we cannot allow it to destroy dynptrs from the writes as depending on arguments, helper may take uninit_mem and dynptr both at the same time. This would mean that helper may write to uninit_mem before it reads the dynptr, which would be bad. PTR_TO_MEM: [?????dd] Depending on the code inside the helper, it may end up overwriting the dynptr contents first and then read those as the dynptr argument. Verifier would only simulate destruction when it does byte by byte access simulation in check_helper_call for meta.access_size, and fail to catch this case, as it happens after argument checks. The same would need to be done for any other non-trivial objects created on the stack in the future, such as bpf_list_head on stack, or bpf_rb_root on stack. A common misunderstanding in the current code is that MEM_UNINIT means writes, but note that writes may also be performed even without MEM_UNINIT in case of helpers, in that case the code after handling meta && meta->raw_mode will complain when it sees STACK_DYNPTR. So that invalid read case also covers writes to potential STACK_DYNPTR slots. The only loophole was in case of meta->raw_mode which simulated writes through instructions which could overwrite them. A future series sequenced after this will focus on the clean up of helper access checks and bugs around that. Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> --- kernel/bpf/verifier.c | 73 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+)