Message ID | f5eed57b42cc077d24807fc6f2f7b961d65691e5.1605134506.git.dxu@dxuuu.xyz (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Series | Fix bpf_probe_read_user_str() overcopying
On Wed, Nov 11, 2020 at 02:45:54PM -0800, Daniel Xu wrote:
> do_strncpy_from_user() may copy some extra bytes after the NUL
> terminator into the destination buffer. This usually does not matter
> for normal string operations. However, when BPF programs key BPF maps
> with strings, this matters a lot.
>
> A BPF program may read strings from user memory by calling the
> bpf_probe_read_user_str() helper which eventually calls
> do_strncpy_from_user(). The program can then key a map with the
> resulting string. BPF map keys are fixed-width and string-agnostic,
> meaning that map keys are treated as a set of bytes.
>
> The issue is when do_strncpy_from_user() overcopies bytes after the
> NUL terminator, it can result in seemingly identical strings occupying
> multiple slots in a BPF map. This behavior is subtle and totally
> unexpected by the user.
>
> This commit has strncpy start copying a byte at a time if a NUL is
> spotted.
>
> Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  lib/strncpy_from_user.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index e6d5fcc2cdf3..83180742e729 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -40,12 +40,11 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src,
>  		/* Fall back to byte-at-a-time if we get a page fault */
>  		unsafe_get_user(c, (unsigned long __user *)(src+res), byte_at_a_time);
>
> +		if (has_zero(c, &data, &constants))
> +			goto byte_at_a_time;
> +
>  		*(unsigned long *)(dst+res) = c;
> -		if (has_zero(c, &data, &constants)) {
> -			data = prep_zero_mask(c, data, &constants);
> -			data = create_zero_mask(data);
> -			return res + find_zero(data);
> -		}
> +
>  		res += sizeof(unsigned long);
>  		max -= sizeof(unsigned long);
>  	}

The fix looks good to me. It's indeed better than the v4 approach.
Linus, I think you might have an opinion about it. Please see commit log for the reason we need this fix.
On Fri, Nov 13, 2020 at 9:03 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Linus,
> I think you might have an opinion about it.
> Please see commit log for the reason we need this fix.

Why is BPF doing this?

The thing is, if you care about the data after the strncpy, you should
be clearing it yourself.

The kernel "strncpy_from_user()" is *NOT* the same as "strncpy()",
despite the name. It never has been, and it never will be. Just the
return value being different should have given it away.

In particular, "strncpy()" is documented to zero-pad the end of the
area. strncpy_from_user() in contrast, is documented to NOT do that.
You cannot - and must not - depend on it.

If you want the zero-padding, you need to do it yourself. We're not
slowing down strncpy_from_user() because you want it, because NOBODY
ELSE cares, and you're depending on semantics that do not exist, and
have never existed.

So if you want padding, you do something like

  long strncpy_from_user_pad(char *dst, const char __user *src, long count)
  {
        long res = strncpy_from_user(dst, src, count);
        if (res >= 0)
                memset(dst+res, 0, count-res);
        return res;
  }

because BPF is doing things wrong as-is, and the problem is very much
that BPF is relying on undefined data *after* the string.

And again - we're not slowing down the good cases just because BPF
depends on bad behavior. You should feel bad.

Linus
On Fri, Nov 13, 2020 at 10:08:02AM -0800, Linus Torvalds wrote:
> On Fri, Nov 13, 2020 at 9:03 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > Linus,
> > I think you might have an opinion about it.
> > Please see commit log for the reason we need this fix.
>
> Why is BPF doing this?
>
> The thing is, if you care about the data after the strncpy, you should
> be clearing it yourself.
>
> The kernel "strncpy_from_user()" is *NOT* the same as "strncpy()",
> despite the name. It never has been, and it never will be. Just the
> return value being different should have given it away.
>
> In particular, "strncpy()" is documented to zero-pad the end of the
> area. strncpy_from_user() in contrast, is documented to NOT do that.
> You cannot - and must not - depend on it.
>
> If you want the zero-padding, you need to do it yourself. We're not
> slowing down strncpy_from_user() because you want it, because NOBODY
> ELSE cares, and you're depending on semantics that do not exist, and
> have never existed.
>
> So if you want padding, you do something like
>
>   long strncpy_from_user_pad(char *dst, const char __user *src, long count)
>   {
>         long res = strncpy_from_user(dst, src, count);
>         if (res >= 0)
>                 memset(dst+res, 0, count-res);
>         return res;
>   }
>
> because BPF is doing things wrong as-is, and the problem is very much
> that BPF is relying on undefined data *after* the string.
>
> And again - we're not slowing down the good cases just because BPF
> depends on bad behavior.

You misunderstood.
BPF side does not depend on zero padding.
The destination buffer was already initialized with zeros before the call.
What BPF didn't expect is strncpy_from_user() copying extra garbage
after the NUL byte. I bet some kernel routine would be equally
surprised by such behavior. Hence I still think the fix belongs in this
function. I think the theoretical performance degradation is exactly
that: theoretical. The v4 approach preserves performance.
It wasn't switching to byte_at_a_time:
https://patchwork.kernel.org/project/netdevbpf/patch/4ff12d0c19de63e7172d25922adfb83ae7c8691f.1604620776.git.dxu@dxuuu.xyz/
but it caused KASAN splats, since kernel usage of strncpy_from_user()
doesn't initialize the dest array the way bpf does.
On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> You misunderstood.
> BPF side does not depend on zero padding.
> The destination buffer was already initialized with zeros before the call.
> What BPF didn't expect is strncpy_from_user() copying extra garbage after NUL byte.

BPF made the wrong expectation. Those bytes are not defined, and it's
faster the way it is written. Nobody else cares. BPF needs to fix its
usage.

It really is that simple.

strncpy_from_user() is one of the hottest functions in the whole kernel
(under certain not very uncommon loads), and it's been optimized for
performance.

You told it that the destination buffer was some amount of bytes, and
strncpy_from_user() will use up to that maximum number of bytes. That's
the only guarantee you have - it won't write _past_ the buffer you gave
it.

The fact that you then use the string not as a string, but as something
else, that's why *you* need to change your code.

Linus
On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> The destination buffer was already initialized with zeros before the call.

Side note: this is another example of you using the interface
incorrectly. You should *not* initialize the buffer with zeroes. That's
just extra work.

One of the points of the strncpy_from_user() interface is that it is
*not* the incredibly broken garbage that "strncpy()" is.

strncpy_from_user() returns the size of the resulting string, *EXACTLY*
so that people who care can then use that information directly and
efficiently. Most of the time it's to avoid a strlen() on the result
(and check for overflow), of course, but the other use-case is exactly
that "maybe I need to pad out the result", so that you don't need to
initialize the buffer beforehand.

I'm not sure exactly which strncpy_from_user() user you have issues
with, but I did a

  git grep strncpy_from_user -- '*bpf*'

and several of them look quite questionable.

All of the ones in kernel/bpf/syscall.c seem to handle overflow
incorrectly, for example, with a silent truncation instead of error.
Maybe that's fine, but it's questionable.

And the bpf_trace_copy_string() thing doesn't check the result at all,
so the end result may not be NUL-terminated. Maybe that's ok. I didn't
check the call chain.

Linus
On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> The v4 approach preserves performance. It wasn't switching to byte_at_a_time:

That v4 looks better, but still pointless.

But it might be acceptable, with that final

  *out = (*out & ~mask) | (c & mask);

just replaced with

  *out = c & mask;

which still writes past the end, but now it only writes zeroes.

But the only reason for that to be done is if you have exposed the
destination buffer to another thread before (and you zeroed it before
exposing it), and you want to make sure that any concurrent reader can
never see anything past the end of the string.

Again - the *only* case that could possibly matter is when you
pre-zeroed the buffer, because if you didn't, then a concurrent reader
would see random garbage *anyway*, particularly since there is no SMP
memory ordering imposed with the strncpy. So nothing but "pre-zeroed"
makes any possible sense, which means that the whole "(*out & ~mask)"
in that v4 patch is completely and utterly meaningless. There's
absolutely zero reason to try to preserve any old data.

In other words, you have two cases:

 (a) no threaded and unlocked accesses to the resulting string

 (b) you _do_ have concurrent threaded accesses to the string and no
     locking (really? That's seriously questionable)

If you have case (a), then the only correct thing to do is to
explicitly pad afterwards. It's optimal, and doesn't make any
assumptions about the implementation of strncpy_from_user().

If you really have that case (b), and you absolutely require that the
filling be done without exposing any temporary garbage, and thus the
"pad afterwards" doesn't work, then you are doing something seriously
odd.

But in that seriously odd (b) case, the _only_ possibly valid thing you
can do is to pre-zero the buffer, since strncpy_from_user() doesn't
even imply any memory ordering in its accesses, so any concurrent
reader by definition will see a randomly ordered partial string being
copied.

That strikes me as completely insane, but at least a careful reader
could see a valid partial string being possibly in the process of being
built up. But again, that use-case is only possible if the buffer is
pre-zeroed, so doing that "(*out & ~mask)" cannot be relevant or sane.

If you really do have that (b) case, then I'd accept that modified v4
patch, together with an absolutely *huge* comment both in
strncpy_from_user() and very much at the call-site, talking about that
non-locked concurrent access to the destination buffer.

Linus
On Fri, Nov 13, 2020 at 12:10:57PM -0800, Linus Torvalds wrote:
> On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > The v4 approach preserves performance. It wasn't switching to byte_at_a_time:
>
> That v4 looks better, but still pointless.
>
> But it might be acceptable, with that final
>
>   *out = (*out & ~mask) | (c & mask);
>
> just replaced with
>
>   *out = c & mask;
>
> which still writes past the end, but now it only writes zeroes.
>
> But the only reason for that to be done is if you have exposed the
> destination buffer to another thread before (and you zeroed it before
> exposing it), and you want to make sure that any concurrent reader can
> never see anything past the end of the string.
>
> Again - the *only* case that could possibly matter is when you
> pre-zeroed the buffer, because if you didn't, then a concurrent reader
> would see random garbage *anyway*, particularly since there is no SMP
> memory ordering imposed with the strncpy. So nothing but "pre-zeroed"
> makes any possible sense, which means that the whole "(*out & ~mask)"
> in that v4 patch is completely and utterly meaningless. There's
> absolutely zero reason to try to preserve any old data.
>
> In other words, you have two cases:
>
>  (a) no threaded and unlocked accesses to the resulting string
>
>  (b) you _do_ have concurrent threaded accesses to the string and no
>      locking (really? That's seriously questionable)
>
> If you have case (a), then the only correct thing to do is to
> explicitly pad afterwards. It's optimal, and doesn't make any
> assumptions about the implementation of strncpy_from_user().

(a) is the only case. There is no concurrent access to dest.
Theoretically it's possible for two bpf progs on different cpus to
write into the same dest, but it's completely racy and buggy for other
reasons.
I've looked through most of the kernel code where strncpy_from_user()
is used and couldn't find a case where dest is not used as a string.
In particular I was worried about code like:

  struct foo {
        ...
        char name[64];
        ...
  } *f;

  f = kcalloc();
  ...
  ret = strncpy_from_user(f->name, user_addr, sizeof(f->name));
  if (ret <= 0)
        goto ...;
  f->name[ret] = 0;

and then the whole *f would be passed to a hash function that walks
sizeof(struct foo) bytes, assuming that strncpy_from_user() didn't add
garbage. The extra zeroing:

  f->name[ret] = 0;

didn't save it from the garbage in "name".

I can convince myself that your new definition of strncpy_from_user():

  "You told it that the destination buffer was some amount of bytes,
   and strncpy_from_user() will use up to that maximum number of bytes.
   That's the only guarantee you have - it won't write _past_ the
   buffer you gave it."

makes sense from the performance point of view. But I think if glibc's
strncpy() did something like this it would probably have caused a lot
of pain for user space.

The hash element example above is typical bpf usage. One real bpf prog
was converted as a test: tools/testing/selftests/bpf/progs/pyperf.h
There it's populating:

  typedef struct {
        char name[FUNCTION_NAME_LEN];
        char file[FILE_NAME_LEN];
  } Symbol;

with two calls:

  bpf_probe_read_user_str(&symbol->name, sizeof(symbol->name),
                          frame->co_name + pidData->offsets.String_data);

These are the function names inside the python interpreter. The 'len'
result is ignored by the bpf prog, and later the whole 'Symbol' is
passed into a hash map.

The kernel code doesn't seem to be doing anything like this, so it's
fine to adopt your definition of strncpy_from_user(). The garbage
copying part can be cleared on the bpf side. It's easy enough to do in
bpf_probe_read_user_str(). We can just zero up to sizeof(long) bytes
there.
On Fri, Nov 13, 2020 at 12:57 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> (a) is the only case.

Ok, good.

The (b) case is certainly valid in theory (and we might even
traditionally have had something like that for things like ->comm[]
accesses, although I think we got rid of it). But the (b) case is _so_
hard to think about and so easy to get wrong - readers have to be very
careful to only read each byte of the source exactly once - that it's
much much better to try to avoid it.

> But I think if glibc's strncpy() did something like this it would
> probably have caused a lot of pain for user space.

Oh, absolutely. The standard strncpy() function has some very strict
behavior issues, including that zero-padding of the *whole* destination
buffer, which would be absolutely horrid for things like fetching
pathnames from user space (our buffer is generally close to a page in
size).

In fact, the kernel strncpy() (ie the one that doesn't copy from user)
does do the whole "pad all zeroes at the end" exactly because people
might depend on that. So the _actual_ strncpy() function conforms to
the standard use - but you generally shouldn't use it, exactly because
it's such a horrible interface. Only good for very small buffers.

> The hash element example above is typical bpf usage.

The core kernel does have one very common string hash case, but it's
for path components, and never the whole string - so it already has to
deal with the fact that the string is very much delimited in place by
not just NUL at the end, but also '/' characters etc.

So no "copy it as a string from user space, and then use it as a block"
that I'm aware of.

Linus
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index e6d5fcc2cdf3..83180742e729 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -40,12 +40,11 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src,
 		/* Fall back to byte-at-a-time if we get a page fault */
 		unsafe_get_user(c, (unsigned long __user *)(src+res), byte_at_a_time);

+		if (has_zero(c, &data, &constants))
+			goto byte_at_a_time;
+
 		*(unsigned long *)(dst+res) = c;
-		if (has_zero(c, &data, &constants)) {
-			data = prep_zero_mask(c, data, &constants);
-			data = create_zero_mask(data);
-			return res + find_zero(data);
-		}
+
 		res += sizeof(unsigned long);
 		max -= sizeof(unsigned long);
 	}
do_strncpy_from_user() may copy some extra bytes after the NUL
terminator into the destination buffer. This usually does not matter
for normal string operations. However, when BPF programs key BPF maps
with strings, this matters a lot.

A BPF program may read strings from user memory by calling the
bpf_probe_read_user_str() helper which eventually calls
do_strncpy_from_user(). The program can then key a map with the
resulting string. BPF map keys are fixed-width and string-agnostic,
meaning that map keys are treated as a set of bytes.

The issue is when do_strncpy_from_user() overcopies bytes after the NUL
terminator, it can result in seemingly identical strings occupying
multiple slots in a BPF map. This behavior is subtle and totally
unexpected by the user.

This commit has strncpy start copying a byte at a time if a NUL is
spotted.

Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 lib/strncpy_from_user.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)