Message ID | 20230607223755.1610-1-richard@nod.at (mailing list archive) |
---|---|
Headers | show |
Series | Integer overflows while scanning for integers | expand |
On Thu, Jun 08, 2023 at 12:37:54AM +0200, Richard Weinberger wrote: > Hi! > > Lately I wondered whether users of integer scanning functions check > for overflows. > To detect such overflows around scanf I came up with the following > patch. It simply triggers a WARN_ON_ONCE() upon an overflow. > > After digging into various scanf users I found that the network device > naming code can trigger an overflow. > > e.g: > $ ip link add 1 type veth peer name 9999999999 > $ ip link set name "%d" dev 1 > > It will trigger the following WARN_ON_ONCE(): > ------------[ cut here ]------------ > WARNING: CPU: 2 PID: 433 at lib/vsprintf.c:3701 vsscanf+0x6ce/0x990 Hm, it's considered a bug if a WARN or BUG can be reached from userspace, so this probably needs to be rearranged (or callers fixed). Do we need to change the scanf API for sane use inside the kernel? -Kees
On Wed 2023-06-07 16:36:12, Kees Cook wrote: > On Thu, Jun 08, 2023 at 12:37:54AM +0200, Richard Weinberger wrote: > > Hi! > > > > Lately I wondered whether users of integer scanning functions check > > for overflows. > > To detect such overflows around scanf I came up with the following > > patch. It simply triggers a WARN_ON_ONCE() upon an overflow. > > > > After digging into various scanf users I found that the network device > > naming code can trigger an overflow. > > > > e.g: > > $ ip link add 1 type veth peer name 9999999999 > > $ ip link set name "%d" dev 1 > > > > It will trigger the following WARN_ON_ONCE(): > > ------------[ cut here ]------------ > > WARNING: CPU: 2 PID: 433 at lib/vsprintf.c:3701 vsscanf+0x6ce/0x990 > > Hm, it's considered a bug if a WARN or BUG can be reached from > userspace, Good point. WARN() does not look like the right way in this case. Another problem is that some users use panic_on_warn. In this case, the above "ip" command calls would trigger panic(). And it does not look like an optimal behavior. I know there already are some WARN_ONs for similar situations, e.g. set_field_width() or set_precision(). But these do not get random values. And it would actually be nice to introduce something like INFO() that would be usable for these less serious problems where the backtrace is useful but they should never trigger panic(). > so this probably needs to be rearranged (or callers fixed). > Do we need to change the scanf API for sane use inside the kernel? It seems that userspace implementation of sscanf() and vsscanf() returns -ERANGE in this case. It might be a reasonable solution. Well, there is a risk of introducing security problems. The error value might cause an underflow/overflow when the caller does not expect a negative value. Alternative solution would be to update the "ip" code so that it reads the number separately and treat zero return value as -EINVAL. Best Regards, Petr
----- Ursprüngliche Mail ----- > Von: "Petr Mladek" <pmladek@suse.com> > On Wed 2023-06-07 16:36:12, Kees Cook wrote: >> On Thu, Jun 08, 2023 at 12:37:54AM +0200, Richard Weinberger wrote: >> > Hi! >> > >> > Lately I wondered whether users of integer scanning functions check >> > for overflows. >> > To detect such overflows around scanf I came up with the following >> > patch. It simply triggers a WARN_ON_ONCE() upon an overflow. >> > >> > After digging into various scanf users I found that the network device >> > naming code can trigger an overflow. >> > >> > e.g: >> > $ ip link add 1 type veth peer name 9999999999 >> > $ ip link set name "%d" dev 1 >> > >> > It will trigger the following WARN_ON_ONCE(): >> > ------------[ cut here ]------------ >> > WARNING: CPU: 2 PID: 433 at lib/vsprintf.c:3701 vsscanf+0x6ce/0x990 >> >> Hm, it's considered a bug if a WARN or BUG can be reached from >> userspace, > > Good point. WARN() does not look like the right way in this case. Well, the whole point of my RFC(!) patch is showing the issue and providing a way to actually find such call sites, like the netdev code. > Another problem is that some users use panic_on_warn. In this case, > the above "ip" command calls would trigger panic(). And it does not > look like an optimal behavior. Only if we don't fix netdev code. > I know there already are some WARN_ONs for similar situations, e.g. > set_field_width() or set_precision(). But these do not get random > values. And it would actually be nice to introduce something like > INFO() that would be usable for these less serious problems where > the backtrace is useful but they should never trigger panic(). > >> so this probably needs to be rearranged (or callers fixed). >> Do we need to change the scanf API for sane use inside the kernel? > > It seems that userspace implementation of sscanf() and vsscanf() > returns -ERANGE in this case. It might be a reasonable solution. > > Well, there is a risk of introducing security problems. The error > value might cause an underflow/overflow when the caller does not expect > a negative value. Agreed. Without inspecting all users of scanf we cannot change the API. > Alternative solution would be to update the "ip" code so that it > reads the number separately and treat zero return value as > -EINVAL. The kernel needs fixing, not userspace. Thanks, //richard
On Thu, Jun 08, 2023 at 05:27:40PM +0200, Petr Mladek wrote: > On Wed 2023-06-07 16:36:12, Kees Cook wrote: > > On Thu, Jun 08, 2023 at 12:37:54AM +0200, Richard Weinberger wrote: > > > Hi! > > > > > > Lately I wondered whether users of integer scanning functions check > > > for overflows. > > > To detect such overflows around scanf I came up with the following > > > patch. It simply triggers a WARN_ON_ONCE() upon an overflow. > > > > > > After digging into various scanf users I found that the network device > > > naming code can trigger an overflow. > > > > > > e.g: > > > $ ip link add 1 type veth peer name 9999999999 > > > $ ip link set name "%d" dev 1 > > > > > > It will trigger the following WARN_ON_ONCE(): > > > ------------[ cut here ]------------ > > > WARNING: CPU: 2 PID: 433 at lib/vsprintf.c:3701 vsscanf+0x6ce/0x990 > > > > Hm, it's considered a bug if a WARN or BUG can be reached from > > userspace, > > Good point. WARN() does not look like the right way in this case. > > Another problem is that some users use panic_on_warn. In this case, > the above "ip" command calls would trigger panic(). And it does not > look like an optimal behavior. "some users" == "most major cloud providers and a few billion Android phones" So in pure numbers, the huge majority of Linux systems running in the world have that option enabled. So please don't use WARN() to catch issues that can be triggered by userspace, that can cause data loss and worse at times. thanks, greg k-h
----- Ursprüngliche Mail ----- > Von: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org> > "some users" == "most major cloud providers and a few billion Android > phones" So in pure numbers, the huge majority of Linux systems running > in the world have that option enabled. > > So please don't use WARN() to catch issues that can be triggered by > userspace, that can cause data loss and worse at times. Sorry for being unclear. My goal is not having the WARN patch immediately applied without fixing known call sites which can trigger it. Thanks, //richard
On 08/06/2023 17.27, Petr Mladek wrote: > On Wed 2023-06-07 16:36:12, Kees Cook wrote: > It seems that userspace implementation of sscanf() and vsscanf() > returns -ERANGE in this case. It might be a reasonable solution. Well. _Some_ userspace implementation does that. It's not in POSIX. While "man scanf" lists that ERANGE error, it also explicitly says that: CONFORMING TO The functions fscanf(), scanf(), and sscanf() conform to C89 and C99 and POSIX.1-2001. These standards do not specify the ERANGE error. I can't figure out what POSIX actually says should or could happen with sscanf("99999999999999", "%i", &x); > Well, there is a risk of introducing security problems. The error > value might cause an underflow/overflow when the caller does not expect > a negative value. There is absolutely no way we can start letting sscanf() return a negative err value, in exactly the same way we cannot possibly let vsnprintf() do that. We can stop early, possibly with a WARNing if it's the format string we're unhappy about ('cause that should be compile-time constant or, e.g. in the netdevice name case, carefully checked by the caller) and return "number of succesful conversions so far" (scanf) / "number of bytes written to buffer" (printf). > Alternative solution would be to update the "ip" code so that it > reads the number separately and treat zero return value as > -EINVAL. The netdev naming code _could_ be updated to just not use scanf at all or the bitmap of in-use numbers, just do the "sprintf(buf, fmt, i)" in a loop and stop when the name is not in use. That's a win as long as there are less than ~256 names already matching the pattern, but the performance absolutely tanks if there are many more than that. So I won't actually suggest that. Rasmus
Added Linus into CC. Quick summary for him: 1. The original problem is best described in the cover letter, see https://lore.kernel.org/r/20230607223755.1610-1-richard@nod.at 2. I think that we actually have two problems: a) How to find a code where sscanf() might get invalid output. Such a code might require some special/better handling of this situation. b) How to eventually provide a useful message back to userspace. On Fri 2023-06-09 12:10:29, Rasmus Villemoes wrote: > On 08/06/2023 17.27, Petr Mladek wrote: > > On Wed 2023-06-07 16:36:12, Kees Cook wrote: > > > It seems that userspace implementation of sscanf() and vsscanf() > > returns -ERANGE in this case. It might be a reasonable solution. > > Well. _Some_ userspace implementation does that. It's not in POSIX. > While "man scanf" lists that ERANGE error, it also explicitly says that: > > CONFORMING TO > The functions fscanf(), scanf(), and sscanf() conform to C89 and > C99 and POSIX.1-2001. These standards do not specify the > ERANGE error. > > I can't figure out what POSIX actually says should or could happen with > sscanf("99999999999999", "%i", &x); It looks to me that it does not say anything about it. Even the latest IEEE Std 1003.1-2017 says just this: --- cut --- RETURN VALUE Upon successful completion, these functions shall return the number of successfully matched and assigned input items; this number can be zero in the event of an early matching failure. If the input ends before the first conversion (if any) has completed, and without a matching failure having occurred, EOF shall be returned. If an error occurs before the first conversion (if any) has completed, and without a matching failure having occurred, EOF shall be returned and errno shall be set to indicate the error. If a read error occurs, the error indicator for the stream shall be set. ERRORS For the conditions under which the fscanf() functions fail and may fail, refer to fgetc or fgetwc. In addition, the fscanf() function shall fail if: [EILSEQ] Input byte sequence does not form a valid character. [ENOMEM] Insufficient storage space is available. In addition, the fscanf() function may fail if: [EINVAL] There are insufficient arguments. --- cut --- I do not see anything about overflow anywhere. Well, the -ERANGE returned by the Linux implementation looks quite practical. > > Well, there is a risk of introducing security problems. The error > > value might cause an underflow/overflow when the caller does not expect > > a negative value. > > There is absolutely no way we can start letting sscanf() return a > negative err value, in exactly the same way we cannot possibly let > vsnprintf() do that. I agree. > We can stop early, possibly with a WARNing if it's Thinking more about it, the WARN() is a really big hammer even if we introduced a never-panicking alternative, e.g. INFO() or REPORT(). sscanf() does not know in which situation it is used and how serious the overflow would be. I mean that it does not know how to provide useful report and if it is needed at all. Also WARN()-like report would only help with the 1st problem, how find the code where the overflow might happen. But it would not help to handle it a reasonable way. > > Alternative solution would be to update the "ip" code so that it > > reads the number separately and treat zero return value as > > -EINVAL. > > The netdev naming code _could_ be updated to just not use scanf at all > or the bitmap of in-use numbers, just do the "sprintf(buf, fmt, i)" in a > loop and stop when the name is not in use. That's a win as long as there > are less than ~256 names already matching the pattern, but the > performance absolutely tanks if there are many more than that. So I > won't actually suggest that. Yes, sscanf() might be replaced by a tricky code to better handle possible wrong input. Alternatively, we could provide sscanf_with_err() [*] which would return the error codes. It might be used in situations where the caller is ready to handle it. For example, __dev_alloc_name() might return -ERANGE or -EINVAL. As a result "ip" might exit with 2 status. It means an error from kernel. __dev_alloc_name() might eventually print some useful message into the kernel log. Normal user would not be able to read it. But they might ask admin to check the kernel log... My opinion: 1. I would prefer to avoid WARN() even when a non-panic alternative is used. It might be too noisy. Well, it might be acceptable when it is enabled only with some CONFIG_DEBUG_SSCANF. But the question is who would enable it. If a developer knew that the problem might be in sscanf() then they probably are close to the buggy code anyway. 2. The sscanf_with_err() [*] variant looks practical to actually handle the wrong input. [*] sscanf_with_error() is super ugly name. Any better idea is welcome. Best Regards, Petr
On Fri, Jun 9, 2023 at 10:03 AM Petr Mladek <pmladek@suse.com> wrote: > > Added Linus into CC. Quick summary for him: > > 1. The original problem is best described in the cover letter, see > https://lore.kernel.org/r/20230607223755.1610-1-richard@nod.at Well, honestly, I'm not convinced this is a huge problem, but: > > I can't figure out what POSIX actually says should or could happen with > > sscanf("99999999999999", "%i", &x); > > It looks to me that it does not say anything about it. Even the latest > IEEE Std 1003.1-2017 says just this: We really don't need to language-lawyer POSIX. It's one of those "let's follow it to minimize confusion" things, but no need to think our internal library decisions need to actually care deeply. For example, we did all the '%pXYZ" extensions on the printf() side, which are very much *not* POSIX. They have been a huge success. And even when it comes to the user space ABI, we've always prioritized backwards compatibility over POSIX. In some cases have actively disagreed with POSIX for being actively wrong. For example, POSIX at one point thought that 'accept()', 'bind()' and friends should take a 'size_t' for the address length, which was complete garbage. It's "int", and I absolutely refused to follow broken specs. They ended up fixing their completely broken spec to use 'socklen_t' instead, which is still completely wrong (it's "int", and anything else is wrong), but at least you can now be POSIX-compatible without being broken. In this case, I would suggest: - absolutely do *NOT* add a WARNING() for this, because that just allows user space to arbitrarily cause kernel warnings. Not ok. - fail overflows by default - to be able to deal with any compatibility issues, add a way to say "don't fail overflows" in the format specs, maybe using '!'. IOW, make sscanf("999999999999", "%d", &i); return 0 (because it couldn't parse a single field - go ahead and set errno to ERANGE too if you care), but allow people to do sscanf("999999999999", "%!d", &i); which would return 1 and set 'i' to -727379969. That makes us error out on overflow by default, but gives us an easy way to say "in this case, I'll take the overflow value" if it turns out that we have some situation where people gave bad input and it "just worked". There's a special case of "overflow" that we may need to deal with: passing in "-1" as a way to set all bits to one in an unsigned value is not necessarily uncommon. So I suspect that even if the conversion is something like "%lu" (or "%x"), which is technically meant for unsigned values, we need to still allow negative values, and just say "it's not overflow, it's 2's complement". Linus
From: Rasmus Villemoes > Sent: 09 June 2023 11:10 > > On 08/06/2023 17.27, Petr Mladek wrote: > > On Wed 2023-06-07 16:36:12, Kees Cook wrote: > > > It seems that userspace implementation of sscanf() and vsscanf() > > returns -ERANGE in this case. It might be a reasonable solution. > > Well. _Some_ userspace implementation does that. It's not in POSIX. > While "man scanf" lists that ERANGE error, it also explicitly says that: > > CONFORMING TO > The functions fscanf(), scanf(), and sscanf() conform to C89 and > C99 and POSIX.1-2001. These standards do not specify the > ERANGE error. > > I can't figure out what POSIX actually says should or could happen with > sscanf("99999999999999", "%i", &x); Possibly 'undefined behaviour' - they like that one. But I'm sure I remember the ToG 'Unix' definition not requiring that 'utilities' check for overflow on numeric command line parameters. (It might even have said they wouldn't check.) So it was perfectly valid for a stupidly out of range value to be treated as a different (possibly valid) value. What is clearly wrong is to just stop processing the input string. sscanf("9999999999999999999scale", "%i%s", &x, &scale) writing any '9' to scale is clearly broken. Personally I avoid scanf() - it is far too easy for it to do something you didn't really intend. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)