Message ID | 20211025130809.314707-1-toke@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf,v2] bpf: fix potential race in tail call compatibility check | expand |
> Lorenzo noticed that the code testing for program type compatibility of > tail call maps is potentially racy in that two threads could encounter a > map with an unset type simultaneously and both return true even though they > are inserting incompatible programs. > > The race window is quite small, but artificially enlarging it by adding a > usleep_range() inside the check in bpf_prog_array_compatible() makes it > trivial to trigger from userspace with a program that does, essentially: > > map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); > pid = fork(); > if (pid) { > key = 0; > value = xdp_fd; > } else { > key = 1; > value = tc_fd; > } > err = bpf_map_update_elem(map_fd, &key, &value, 0); > > While the race window is small, it has potentially serious ramifications in > that triggering it would allow a BPF program to tail call to a program of a > different type. So let's get rid of it by protecting the update with a > spinlock. The commit in the Fixes tag is the last commit that touches the > code in question. > > v2: > - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) > > Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") > Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > --- > include/linux/bpf.h | 1 + > kernel/bpf/arraymap.c | 1 + > kernel/bpf/core.c | 14 ++++++++++---- > kernel/bpf/syscall.c | 2 ++ > 4 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 020a7d5bf470..98d906176d89 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -929,6 +929,7 @@ struct bpf_array_aux { > * stored in the map to make sure that all callers and callees have > * the same prog type and JITed flag. > */ > + spinlock_t type_check_lock; I was wondering if we can use a mutex instead of a spinlock here since it is run from a syscall AFAIU. The only downside is mutex_lock is run inside aux->used_maps_mutex critical section. Am I missing something? Regards, Lorenzo > enum bpf_prog_type type; > bool jited; > /* Programs with direct jumps into programs part of this array. */ > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > index cebd4fb06d19..da9b1e96cadc 100644 > --- a/kernel/bpf/arraymap.c > +++ b/kernel/bpf/arraymap.c > @@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr) > INIT_WORK(&aux->work, prog_array_map_clear_deferred); > INIT_LIST_HEAD(&aux->poke_progs); > mutex_init(&aux->poke_mutex); > + spin_lock_init(&aux->type_check_lock); > > map = array_map_alloc(attr); > if (IS_ERR(map)) { > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index c1e7eb3f1876..9439c839d279 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, > bool bpf_prog_array_compatible(struct bpf_array *array, > const struct bpf_prog *fp) > { > + bool ret; > + > if (fp->kprobe_override) > return false; > > + spin_lock(&array->aux->type_check_lock); > + > if (!array->aux->type) { > /* There's no owner yet where we could check for > * compatibility. > */ > array->aux->type = fp->type; > array->aux->jited = fp->jited; > - return true; > + ret = true; > + } else { > + ret = array->aux->type == fp->type && > + array->aux->jited == fp->jited; > } > - > - return array->aux->type == fp->type && > - array->aux->jited == fp->jited; > + spin_unlock(&array->aux->type_check_lock); > + return ret; > } > > static int bpf_check_tail_call(const struct bpf_prog *fp) > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 4e50c0bfdb7d..955011c7df29 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp) > > if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { > array = container_of(map, struct bpf_array, map); > + spin_lock(&array->aux->type_check_lock); > type = array->aux->type; > jited = array->aux->jited; > + spin_unlock(&array->aux->type_check_lock); > } > > seq_printf(m, > -- > 2.33.0 >
On 10/25/21 4:28 PM, Lorenzo Bianconi wrote: >> Lorenzo noticed that the code testing for program type compatibility of >> tail call maps is potentially racy in that two threads could encounter a >> map with an unset type simultaneously and both return true even though they >> are inserting incompatible programs. >> >> The race window is quite small, but artificially enlarging it by adding a >> usleep_range() inside the check in bpf_prog_array_compatible() makes it >> trivial to trigger from userspace with a program that does, essentially: >> >> map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); >> pid = fork(); >> if (pid) { >> key = 0; >> value = xdp_fd; >> } else { >> key = 1; >> value = tc_fd; >> } >> err = bpf_map_update_elem(map_fd, &key, &value, 0); >> >> While the race window is small, it has potentially serious ramifications in >> that triggering it would allow a BPF program to tail call to a program of a >> different type. So let's get rid of it by protecting the update with a >> spinlock. The commit in the Fixes tag is the last commit that touches the >> code in question. >> >> v2: >> - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) >> >> Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") >> Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >> --- >> include/linux/bpf.h | 1 + >> kernel/bpf/arraymap.c | 1 + >> kernel/bpf/core.c | 14 ++++++++++---- >> kernel/bpf/syscall.c | 2 ++ >> 4 files changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index 020a7d5bf470..98d906176d89 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -929,6 +929,7 @@ struct bpf_array_aux { >> * stored in the map to make sure that all callers and callees have >> * the same prog type and JITed flag. >> */ >> + spinlock_t type_check_lock; > > I was wondering if we can use a mutex instead of a spinlock here since it is > run from a syscall AFAIU. The only downside is mutex_lock is run inside > aux->used_maps_mutex critical section. Am I missing something? Hm, potentially it could work, but then it's also 32 vs 4 extra bytes. There's also poke_mutex or freeze_mutex, but feels to hacky to 'generalize for reuse', so I think the spinlock in bpf_array_aux is fine. >> enum bpf_prog_type type; >> bool jited; >> /* Programs with direct jumps into programs part of this array. */ >> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c >> index cebd4fb06d19..da9b1e96cadc 100644 >> --- a/kernel/bpf/arraymap.c >> +++ b/kernel/bpf/arraymap.c >> @@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr) >> INIT_WORK(&aux->work, prog_array_map_clear_deferred); >> INIT_LIST_HEAD(&aux->poke_progs); >> mutex_init(&aux->poke_mutex); >> + spin_lock_init(&aux->type_check_lock); Just as a tiny nit, I would probably name it slightly different, since type_check_lock mainly refers to the type property but there's also jit vs non-jit and as pointed out there could be other extensions that need checking in future as well. Maybe 'compat_lock' would be a more generic one or just: struct { enum bpf_prog_type type; bool jited; spinlock_t lock; } owner; >> map = array_map_alloc(attr); >> if (IS_ERR(map)) { >> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >> index c1e7eb3f1876..9439c839d279 100644 >> --- a/kernel/bpf/core.c >> +++ b/kernel/bpf/core.c >> @@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, >> bool bpf_prog_array_compatible(struct bpf_array *array, >> const struct bpf_prog *fp) >> { >> + bool ret; >> + >> if (fp->kprobe_override) >> return false; >> >> + spin_lock(&array->aux->type_check_lock); >> + >> if (!array->aux->type) { >> /* There's no owner yet where we could check for >> * compatibility. >> */ >> array->aux->type = fp->type; >> array->aux->jited = fp->jited; >> - return true; >> + ret = true; >> + } else { >> + ret = array->aux->type == fp->type && >> + array->aux->jited == fp->jited; >> } >> - >> - return array->aux->type == fp->type && >> - array->aux->jited == fp->jited; >> + spin_unlock(&array->aux->type_check_lock); >> + return ret; >> } >> >> static int bpf_check_tail_call(const struct bpf_prog *fp) >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c >> index 4e50c0bfdb7d..955011c7df29 100644 >> --- a/kernel/bpf/syscall.c >> +++ b/kernel/bpf/syscall.c >> @@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp) >> >> if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { >> array = container_of(map, struct bpf_array, map); >> + spin_lock(&array->aux->type_check_lock); >> type = array->aux->type; >> jited = array->aux->jited; >> + spin_unlock(&array->aux->type_check_lock); >> } >> >> seq_printf(m, >> -- >> 2.33.0 >>
Daniel Borkmann <daniel@iogearbox.net> writes: > On 10/25/21 4:28 PM, Lorenzo Bianconi wrote: >>> Lorenzo noticed that the code testing for program type compatibility of >>> tail call maps is potentially racy in that two threads could encounter a >>> map with an unset type simultaneously and both return true even though they >>> are inserting incompatible programs. >>> >>> The race window is quite small, but artificially enlarging it by adding a >>> usleep_range() inside the check in bpf_prog_array_compatible() makes it >>> trivial to trigger from userspace with a program that does, essentially: >>> >>> map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); >>> pid = fork(); >>> if (pid) { >>> key = 0; >>> value = xdp_fd; >>> } else { >>> key = 1; >>> value = tc_fd; >>> } >>> err = bpf_map_update_elem(map_fd, &key, &value, 0); >>> >>> While the race window is small, it has potentially serious ramifications in >>> that triggering it would allow a BPF program to tail call to a program of a >>> different type. So let's get rid of it by protecting the update with a >>> spinlock. The commit in the Fixes tag is the last commit that touches the >>> code in question. >>> >>> v2: >>> - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) >>> >>> Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") >>> Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> >>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >>> --- >>> include/linux/bpf.h | 1 + >>> kernel/bpf/arraymap.c | 1 + >>> kernel/bpf/core.c | 14 ++++++++++---- >>> kernel/bpf/syscall.c | 2 ++ >>> 4 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >>> index 020a7d5bf470..98d906176d89 100644 >>> --- a/include/linux/bpf.h >>> +++ b/include/linux/bpf.h >>> @@ -929,6 +929,7 @@ struct bpf_array_aux { >>> * stored in the map to make sure that all callers and callees have >>> * the same prog type and JITed flag. >>> */ >>> + spinlock_t type_check_lock; >> >> I was wondering if we can use a mutex instead of a spinlock here since it is >> run from a syscall AFAIU. The only downside is mutex_lock is run inside >> aux->used_maps_mutex critical section. Am I missing something? > > Hm, potentially it could work, but then it's also 32 vs 4 extra bytes. There's > also poke_mutex or freeze_mutex, but feels to hacky to 'generalize for reuse', > so I think the spinlock in bpf_array_aux is fine. > >>> enum bpf_prog_type type; >>> bool jited; >>> /* Programs with direct jumps into programs part of this array. */ >>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c >>> index cebd4fb06d19..da9b1e96cadc 100644 >>> --- a/kernel/bpf/arraymap.c >>> +++ b/kernel/bpf/arraymap.c >>> @@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr) >>> INIT_WORK(&aux->work, prog_array_map_clear_deferred); >>> INIT_LIST_HEAD(&aux->poke_progs); >>> mutex_init(&aux->poke_mutex); >>> + spin_lock_init(&aux->type_check_lock); > > Just as a tiny nit, I would probably name it slightly different, since type_check_lock > mainly refers to the type property but there's also jit vs non-jit and as pointed out > there could be other extensions that need checking in future as well. Maybe 'compat_lock' > would be a more generic one or just: > > struct { > enum bpf_prog_type type; > bool jited; > spinlock_t lock; > } owner; Uh, I like that! Makes it easier to move as well (which we're doing as part of the xdp_mb series). Will send a v3 with this :) -Toke
> On 10/25/21 4:28 PM, Lorenzo Bianconi wrote: > > > Lorenzo noticed that the code testing for program type compatibility of > > > tail call maps is potentially racy in that two threads could encounter a > > > map with an unset type simultaneously and both return true even though they > > > are inserting incompatible programs. > > > > > > The race window is quite small, but artificially enlarging it by adding a > > > usleep_range() inside the check in bpf_prog_array_compatible() makes it > > > trivial to trigger from userspace with a program that does, essentially: > > > > > > map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); > > > pid = fork(); > > > if (pid) { > > > key = 0; > > > value = xdp_fd; > > > } else { > > > key = 1; > > > value = tc_fd; > > > } > > > err = bpf_map_update_elem(map_fd, &key, &value, 0); > > > > > > While the race window is small, it has potentially serious ramifications in > > > that triggering it would allow a BPF program to tail call to a program of a > > > different type. So let's get rid of it by protecting the update with a > > > spinlock. The commit in the Fixes tag is the last commit that touches the > > > code in question. > > > > > > v2: > > > - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) > > > > > > Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") > > > Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> > > > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > > > --- > > > include/linux/bpf.h | 1 + > > > kernel/bpf/arraymap.c | 1 + > > > kernel/bpf/core.c | 14 ++++++++++---- > > > kernel/bpf/syscall.c | 2 ++ > > > 4 files changed, 14 insertions(+), 4 deletions(-) > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > index 020a7d5bf470..98d906176d89 100644 > > > --- a/include/linux/bpf.h > > > +++ b/include/linux/bpf.h > > > @@ -929,6 +929,7 @@ struct bpf_array_aux { > > > * stored in the map to make sure that all callers and callees have > > > * the same prog type and JITed flag. > > > */ > > > + spinlock_t type_check_lock; > > > > I was wondering if we can use a mutex instead of a spinlock here since it is > > run from a syscall AFAIU. The only downside is mutex_lock is run inside > > aux->used_maps_mutex critical section. Am I missing something? > > Hm, potentially it could work, but then it's also 32 vs 4 extra bytes. There's > also poke_mutex or freeze_mutex, but feels to hacky to 'generalize for reuse', > so I think the spinlock in bpf_array_aux is fine. I was wondering if in the future we would need to protect something not supported by a spinlock but it is probably not the case. I am fine with the spinlock :) Regards, Lorenzo > > > > enum bpf_prog_type type; > > > bool jited; > > > /* Programs with direct jumps into programs part of this array. */ > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > > index cebd4fb06d19..da9b1e96cadc 100644 > > > --- a/kernel/bpf/arraymap.c > > > +++ b/kernel/bpf/arraymap.c > > > @@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr) > > > INIT_WORK(&aux->work, prog_array_map_clear_deferred); > > > INIT_LIST_HEAD(&aux->poke_progs); > > > mutex_init(&aux->poke_mutex); > > > + spin_lock_init(&aux->type_check_lock); > > Just as a tiny nit, I would probably name it slightly different, since type_check_lock > mainly refers to the type property but there's also jit vs non-jit and as pointed out > there could be other extensions that need checking in future as well. Maybe 'compat_lock' > would be a more generic one or just: > > struct { > enum bpf_prog_type type; > bool jited; > spinlock_t lock; > } owner; > > > > map = array_map_alloc(attr); > > > if (IS_ERR(map)) { > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > index c1e7eb3f1876..9439c839d279 100644 > > > --- a/kernel/bpf/core.c > > > +++ b/kernel/bpf/core.c > > > @@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, > > > bool bpf_prog_array_compatible(struct bpf_array *array, > > > const struct bpf_prog *fp) > > > { > > > + bool ret; > > > + > > > if (fp->kprobe_override) > > > return false; > > > + spin_lock(&array->aux->type_check_lock); > > > + > > > if (!array->aux->type) { > > > /* There's no owner yet where we could check for > > > * compatibility. > > > */ > > > array->aux->type = fp->type; > > > array->aux->jited = fp->jited; > > > - return true; > > > + ret = true; > > > + } else { > > > + ret = array->aux->type == fp->type && > > > + array->aux->jited == fp->jited; > > > } > > > - > > > - return array->aux->type == fp->type && > > > - array->aux->jited == fp->jited; > > > + spin_unlock(&array->aux->type_check_lock); > > > + return ret; > > > } > > > static int bpf_check_tail_call(const struct bpf_prog *fp) > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > index 4e50c0bfdb7d..955011c7df29 100644 > > > --- a/kernel/bpf/syscall.c > > > +++ b/kernel/bpf/syscall.c > > > @@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp) > > > if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { > > > array = container_of(map, struct bpf_array, map); > > > + spin_lock(&array->aux->type_check_lock); > > > type = array->aux->type; > > > jited = array->aux->jited; > > > + spin_unlock(&array->aux->type_check_lock); > > > } > > > seq_printf(m, > > > -- > > > 2.33.0 > > > >
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 020a7d5bf470..98d906176d89 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -929,6 +929,7 @@ struct bpf_array_aux { * stored in the map to make sure that all callers and callees have * the same prog type and JITed flag. */ + spinlock_t type_check_lock; enum bpf_prog_type type; bool jited; /* Programs with direct jumps into programs part of this array. */ diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index cebd4fb06d19..da9b1e96cadc 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr) INIT_WORK(&aux->work, prog_array_map_clear_deferred); INIT_LIST_HEAD(&aux->poke_progs); mutex_init(&aux->poke_mutex); + spin_lock_init(&aux->type_check_lock); map = array_map_alloc(attr); if (IS_ERR(map)) { diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index c1e7eb3f1876..9439c839d279 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp) { + bool ret; + if (fp->kprobe_override) return false; + spin_lock(&array->aux->type_check_lock); + if (!array->aux->type) { /* There's no owner yet where we could check for * compatibility. */ array->aux->type = fp->type; array->aux->jited = fp->jited; - return true; + ret = true; + } else { + ret = array->aux->type == fp->type && + array->aux->jited == fp->jited; } - - return array->aux->type == fp->type && - array->aux->jited == fp->jited; + spin_unlock(&array->aux->type_check_lock); + return ret; } static int bpf_check_tail_call(const struct bpf_prog *fp) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 4e50c0bfdb7d..955011c7df29 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp) if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { array = container_of(map, struct bpf_array, map); + spin_lock(&array->aux->type_check_lock); type = array->aux->type; jited = array->aux->jited; + spin_unlock(&array->aux->type_check_lock); } seq_printf(m,
Lorenzo noticed that the code testing for program type compatibility of tail call maps is potentially racy in that two threads could encounter a map with an unset type simultaneously and both return true even though they are inserting incompatible programs. The race window is quite small, but artificially enlarging it by adding a usleep_range() inside the check in bpf_prog_array_compatible() makes it trivial to trigger from userspace with a program that does, essentially: map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); pid = fork(); if (pid) { key = 0; value = xdp_fd; } else { key = 1; value = tc_fd; } err = bpf_map_update_elem(map_fd, &key, &value, 0); While the race window is small, it has potentially serious ramifications in that triggering it would allow a BPF program to tail call to a program of a different type. So let's get rid of it by protecting the update with a spinlock. The commit in the Fixes tag is the last commit that touches the code in question. v2: - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> --- include/linux/bpf.h | 1 + kernel/bpf/arraymap.c | 1 + kernel/bpf/core.c | 14 ++++++++++---- kernel/bpf/syscall.c | 2 ++ 4 files changed, 14 insertions(+), 4 deletions(-)