Message ID | 20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com (mailing list archive) |
---|---|
State | Mainlined |
Commit | 576b75f93b3d3c408235808f689453f1ed891486 |
Headers | show |
Series | nvme-fabrics: replace deprecated strncpy with strscpy | expand |
On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. If we want that we need to stop pretendening direct manipulation of nul-terminate strings is a good idea. I suspect the churn of replacing one helper with another, maybe slightly better, one probably introduces more bugs than it fixes. If we want to attack the issue for real we need to use something better. lib/seq_buf.c is a good start for a lot of simple cases that just append to strings including creating complex ones. Kent had a bunch of good ideas on how to improve it, but couldn't be convinced to contribute to it instead of duplicating the functionality which is a bit sad, but I think we need to switch to something like seq_buf that actually has a counted string instead of all this messing around with the null-terminated strings.
On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. > > We expect both data->subsysnqn and data->hostnqn to be NUL-terminated > based on their usage with format specifier ("%s"): > fabrics.c: > 322: dev_err(ctrl->device, > 323: "%s, subsysnqn \"%s\"\n", > 324: inv_data, data->subsysnqn); > ... > 349: dev_err(ctrl->device, > 350: "Connect for subsystem %s is not allowed, hostnqn: %s\n", > 351: data->subsysnqn, data->hostnqn); > > Moreover, there's no need to NUL-pad since `data` is zero-allocated > already in fabrics.c: > 383: data = kzalloc(sizeof(*data), GFP_KERNEL); > ... therefore any further NUL-padding is rendered useless. > > Considering the above, a suitable replacement is `strscpy` [2] due to > the fact that it guarantees NUL-termination on the destination buffer > without unnecessarily NUL-padding. > > I opted not to switch NVMF_NQN_SIZE to sizeof(data->xyz) because the > size is defined as: > | /* NQN names in commands fields specified one size */ > | #define NVMF_NQN_FIELD_LEN 256 > > ... while NVMF_NQN_SIZE is defined as: > | /* However the max length of a qualified name is another size */ > | #define NVMF_NQN_SIZE 223 > > Since 223 seems pretty magic, I'm not going to touch it. struct nvmf_connect_data { ... char subsysnqn[NVMF_NQN_FIELD_LEN]; char hostnqn[NVMF_NQN_FIELD_LEN]; Honestly, the use of NVMF_NQN_SIZE as the length arg looks like a bug. struct nvmf_ctrl_options { ... char *subsysnqn; ... struct nvmf_host *host; struct nvmf_host { ... char nqn[NVMF_NQN_SIZE]; ctrl->opts->host->nqn is sized as NVMF_NQN_SIZE, so this is like saying: strscpy(dest, src, sizeof(src)); Which can go wrong when dest is smaller than src, but that's not the case here. It seems ctrl->opts->host->nqn is expected to be a C string: drivers/nvme/host/fabrics.h: strcmp(opts->host->nqn, ctrl->opts->host->nqn) || And subsysnqn seems to be the same size: drivers/nvme/target/core.c: subsys->subsysnqn = kstrndup(subsysnqn, NVMF_NQN_SIZE, drivers/nvme/target/core.c- GFP_KERNEL); So these really look like bugs to me. Perhaps a follow-up patch to fix this, if nvme maintainers agree? Either way, strscpy is an improvement: Reviewed-by: Kees Cook <keescook@chromium.org> -Kees > > Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] > Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] > Link: https://github.com/KSPP/linux/issues/90 > Cc: linux-hardening@vger.kernel.org > Signed-off-by: Justin Stitt <justinstitt@google.com> > --- > Note: build-tested only. > > Found with: $ rg "strncpy\(" > --- > drivers/nvme/host/fabrics.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c > index 8175d49f2909..57aad3ce311a 100644 > --- a/drivers/nvme/host/fabrics.c > +++ b/drivers/nvme/host/fabrics.c > @@ -386,8 +386,8 @@ static struct nvmf_connect_data *nvmf_connect_data_prep(struct nvme_ctrl *ctrl, > > uuid_copy(&data->hostid, &ctrl->opts->host->id); > data->cntlid = cpu_to_le16(cntlid); > - strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); > - strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); > + strscpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); > + strscpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); > > return data; > } > > --- > base-commit: 58720809f52779dc0f08e53e54b014209d13eebb > change-id: 20231018-strncpy-drivers-nvme-host-fabrics-c-416258a22598 > > Best regards, > -- > Justin Stitt <justinstitt@google.com> > >
On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > strncpy() is deprecated for use on NUL-terminated destination strings > > [1] and as such we should prefer more robust and less ambiguous string > > interfaces. > > If we want that we need to stop pretendening direct manipulation of > nul-terminate strings is a good idea. I suspect the churn of replacing > one helper with another, maybe slightly better, one probably > introduces more bugs than it fixes. > > If we want to attack the issue for real we need to use something > better. > > lib/seq_buf.c is a good start for a lot of simple cases that just > append to strings including creating complex ones. Kent had a bunch > of good ideas on how to improve it, but couldn't be convinced to > contribute to it instead of duplicating the functionality which > is a bit sad, but I think we need to switch to something like > seq_buf that actually has a counted string instead of all this messing > around with the null-terminated strings. When doing more complex string creation, I agree. I spent some time doing this while I was looking at removing strcat() and strlcat(); this is where seq_buf shines. (And seq_buf is actually both: it maintains its %NUL termination _and_ does the length counting.) The only thing clunky about it was initialization, but all the conversions I experimented with were way cleaner using seq_buf. I even added a comment to strlcat()'s kern-doc to aim folks at seq_buf. :) /** * strlcat - Append a string to an existing string ... * Do not use this function. While FORTIFY_SOURCE tries to avoid * read and write overflows, this is only possible when the sizes * of @p and @q are known to the compiler. Prefer building the * string with formatting, via scnprintf(), seq_buf, or similar. Almost all of the remaining strncpy() usage is just string to string copying, but the corner cases that are being spun out that aren't strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), and memcpy(). Each of these are a clear improvement since they remove the ambiguity of the intended behavior. Using seq_buf ends up being way more overhead than is needed. -Kees
On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > [1] and as such we should prefer more robust and less ambiguous string > > > interfaces. > > > > If we want that we need to stop pretendening direct manipulation of > > nul-terminate strings is a good idea. I suspect the churn of replacing > > one helper with another, maybe slightly better, one probably > > introduces more bugs than it fixes. > > > > If we want to attack the issue for real we need to use something > > better. > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > append to strings including creating complex ones. Kent had a bunch > > of good ideas on how to improve it, but couldn't be convinced to > > contribute to it instead of duplicating the functionality which > > is a bit sad, but I think we need to switch to something like > > seq_buf that actually has a counted string instead of all this messing > > around with the null-terminated strings. > > When doing more complex string creation, I agree. I spent some time > doing this while I was looking at removing strcat() and strlcat(); this > is where seq_buf shines. (And seq_buf is actually both: it maintains its > %NUL termination _and_ does the length counting.) The only thing clunky > about it was initialization, but all the conversions I experimented with > were way cleaner using seq_buf. (...) I also agree. I'm using several other schemes based on pointer+length in other projects and despite not being complete in terms of API (due to the slow migration of old working code), over time it proves much easier to use and requires far less controls. With NUL-teminated strings you need to perform checks for each and every operation. When the length is known and controlled, most often you can get rid of many tests on intermediate operations and perform a check at the end, thus you end up with less "if" and "goto fail" in the code, because the checks are no longer for "not crashing nor introducing vulnerabilities", but just "returning a correct result", which can often be detected more easily. Another benefit I found by accident is that when you need to compare some tokens against multiple ones (say some keywords for example), it becomes much faster than strcmp()-based if/else series because in this case you start by comparing lengths instead of comparing contents. And when your macros allow you to constify string constants, the compiler will replace long "if" series with checks against constant values, and may even arrange them as a tree since all are constants, sometimes mixing with the first char as the discriminator. Typically on the test below I observe a 10x speedup at -O3 and ~5x at -O2 when I convert this: if (!strcmp(name, "host") || !strcmp(name, "content-length") || !strcmp(name, "connection") || !strcmp(name, "proxy-connection") || !strcmp(name, "keep-alive") || !strcmp(name, "upgrade") || !strcmp(name, "te") || !strcmp(name, "transfer-encoding")) return 1; to this: if (isteq(name, ist("host")) || isteq(name, ist("content-length")) || isteq(name, ist("connection")) || isteq(name, ist("proxy-connection")) || isteq(name, ist("keep-alive")) || isteq(name, ist("upgrade")) || isteq(name, ist("te")) || isteq(name, ist("transfer-encoding"))) return 1; The code is larger but when compiled at -Os, it instead becomes smaller. Another interesting property I'm using in the API above, that might or might not apply there is that for most archs we care about, functions can take a struct of two words passed as registers, and can return such a struct as a pair of registers as well. This allows to chain functions by passing one function's return as the argument to another one, which is what users often want to do to avoid intermediate variables. All this to say that length-based strings do offer quite a lot of benefits over the long term. Willy
On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote: > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > > [1] and as such we should prefer more robust and less ambiguous string > > > > interfaces. > > > > > > If we want that we need to stop pretendening direct manipulation of > > > nul-terminate strings is a good idea. I suspect the churn of replacing > > > one helper with another, maybe slightly better, one probably > > > introduces more bugs than it fixes. > > > > > > If we want to attack the issue for real we need to use something > > > better. > > > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > > append to strings including creating complex ones. Kent had a bunch > > > of good ideas on how to improve it, but couldn't be convinced to > > > contribute to it instead of duplicating the functionality which > > > is a bit sad, but I think we need to switch to something like > > > seq_buf that actually has a counted string instead of all this messing > > > around with the null-terminated strings. > > > > When doing more complex string creation, I agree. I spent some time > > doing this while I was looking at removing strcat() and strlcat(); this > > is where seq_buf shines. (And seq_buf is actually both: it maintains its > > %NUL termination _and_ does the length counting.) The only thing clunky > > about it was initialization, but all the conversions I experimented with > > were way cleaner using seq_buf. > (...) > > I also agree. I'm using several other schemes based on pointer+length in > other projects and despite not being complete in terms of API (due to the > slow migration of old working code), over time it proves much easier to > use and requires far less controls. > > With NUL-teminated strings you need to perform checks for each and every > operation. When the length is known and controlled, most often you can > get rid of many tests on intermediate operations and perform a check at > the end, thus you end up with less "if" and "goto fail" in the code, > because the checks are no longer for "not crashing nor introducing > vulnerabilities", but just "returning a correct result", which can often > be detected more easily. > > Another benefit I found by accident is that when you need to compare some > tokens against multiple ones (say some keywords for example), it becomes > much faster than strcmp()-based if/else series because in this case you > start by comparing lengths instead of comparing contents. And when your > macros allow you to constify string constants, the compiler will replace > long "if" series with checks against constant values, and may even arrange > them as a tree since all are constants, sometimes mixing with the first > char as the discriminator. Typically on the test below I observe a 10x > speedup at -O3 and ~5x at -O2 when I convert this: > > if (!strcmp(name, "host") || > !strcmp(name, "content-length") || > !strcmp(name, "connection") || > !strcmp(name, "proxy-connection") || > !strcmp(name, "keep-alive") || > !strcmp(name, "upgrade") || > !strcmp(name, "te") || > !strcmp(name, "transfer-encoding")) > return 1; > > to this: > > if (isteq(name, ist("host")) || > isteq(name, ist("content-length")) || > isteq(name, ist("connection")) || > isteq(name, ist("proxy-connection")) || > isteq(name, ist("keep-alive")) || > isteq(name, ist("upgrade")) || > isteq(name, ist("te")) || > isteq(name, ist("transfer-encoding"))) > return 1; > > The code is larger but when compiled at -Os, it instead becomes smaller. > > Another interesting property I'm using in the API above, that might or > might not apply there is that for most archs we care about, functions > can take a struct of two words passed as registers, and can return > such a struct as a pair of registers as well. This allows to chain > functions by passing one function's return as the argument to another > one, which is what users often want to do to avoid intermediate > variables. Chaining should be nice cherry on top for very specific cases but certainly not promoted or advertised. Deleting intermediate variables promotes implementation-defined behaviour because of unspecified order of evaluation of function arguments. Second, debuggers still operate with lines in mind, so jumping to the next statement written like this f(g(), h()) can be problematic. Intermediate variables are much less of a problem now that -Wdeclaration-after-statement has been finally abolished! They don't consume LOC anymore. > All this to say that length-based strings do offer quite a lot of > benefits over the long term. As long as they are named kstring :-) Or std_string, he-he.
On Thu, Oct 19, 2023 at 02:40:52PM +0300, Alexey Dobriyan wrote: > On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote: > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > > > [1] and as such we should prefer more robust and less ambiguous string > > > > > interfaces. > > > > > > > > If we want that we need to stop pretendening direct manipulation of > > > > nul-terminate strings is a good idea. I suspect the churn of replacing > > > > one helper with another, maybe slightly better, one probably > > > > introduces more bugs than it fixes. > > > > > > > > If we want to attack the issue for real we need to use something > > > > better. > > > > > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > > > append to strings including creating complex ones. Kent had a bunch > > > > of good ideas on how to improve it, but couldn't be convinced to > > > > contribute to it instead of duplicating the functionality which > > > > is a bit sad, but I think we need to switch to something like > > > > seq_buf that actually has a counted string instead of all this messing > > > > around with the null-terminated strings. > > > > > > When doing more complex string creation, I agree. I spent some time > > > doing this while I was looking at removing strcat() and strlcat(); this > > > is where seq_buf shines. (And seq_buf is actually both: it maintains its > > > %NUL termination _and_ does the length counting.) The only thing clunky > > > about it was initialization, but all the conversions I experimented with > > > were way cleaner using seq_buf. > > (...) > > > > I also agree. I'm using several other schemes based on pointer+length in > > other projects and despite not being complete in terms of API (due to the > > slow migration of old working code), over time it proves much easier to > > use and requires far less controls. > > > > With NUL-teminated strings you need to perform checks for each and every > > operation. When the length is known and controlled, most often you can > > get rid of many tests on intermediate operations and perform a check at > > the end, thus you end up with less "if" and "goto fail" in the code, > > because the checks are no longer for "not crashing nor introducing > > vulnerabilities", but just "returning a correct result", which can often > > be detected more easily. > > > > Another benefit I found by accident is that when you need to compare some > > tokens against multiple ones (say some keywords for example), it becomes > > much faster than strcmp()-based if/else series because in this case you > > start by comparing lengths instead of comparing contents. And when your > > macros allow you to constify string constants, the compiler will replace > > long "if" series with checks against constant values, and may even arrange > > them as a tree since all are constants, sometimes mixing with the first > > char as the discriminator. Typically on the test below I observe a 10x > > speedup at -O3 and ~5x at -O2 when I convert this: > > > > if (!strcmp(name, "host") || > > !strcmp(name, "content-length") || > > !strcmp(name, "connection") || > > !strcmp(name, "proxy-connection") || > > !strcmp(name, "keep-alive") || > > !strcmp(name, "upgrade") || > > !strcmp(name, "te") || > > !strcmp(name, "transfer-encoding")) > > return 1; > > > > to this: > > > > if (isteq(name, ist("host")) || > > isteq(name, ist("content-length")) || > > isteq(name, ist("connection")) || > > isteq(name, ist("proxy-connection")) || > > isteq(name, ist("keep-alive")) || > > isteq(name, ist("upgrade")) || > > isteq(name, ist("te")) || > > isteq(name, ist("transfer-encoding"))) > > return 1; > > > > The code is larger but when compiled at -Os, it instead becomes smaller. > > > > Another interesting property I'm using in the API above, that might or > > might not apply there is that for most archs we care about, functions > > can take a struct of two words passed as registers, and can return > > such a struct as a pair of registers as well. This allows to chain > > functions by passing one function's return as the argument to another > > one, which is what users often want to do to avoid intermediate > > variables. > > Chaining should be nice cherry on top for very specific cases but certainly > not promoted or advertised. Deleting intermediate variables promotes > implementation-defined behaviour because of unspecified order of evaluation > of function arguments. Second, debuggers still operate with lines in mind, > so jumping to the next statement written like this > > f(g(), h()) > > can be problematic. It obviously depends what these functions do, but that remains true for lots of other use cases applying to a shared memory location, if that's the case. Also it happens that a lot of string functions that are used as arguments to other ones are in fact lookups, skip, trim etc which only manipulate the pointer and the length and not the contents. > Intermediate variables are much less of a problem now > that -Wdeclaration-after-statement has been finally abolished! They don't > consume LOC anymore. Intermediate variables declared after statements remain an abomination which turn a visual lookup from O(indent_levels) to O(lines) because normally you only have to quickly glance a the previous opening brace and if you don't find, you repeat, but with them you have to visually scan every single line. They're now allowed for macros and iterators which can make a good use of them but it's not a reason for abusing them in code supposed to be reviewable by humans. > > All this to say that length-based strings do offer quite a lot of > > benefits over the long term. > > As long as they are named kstring :-) Or std_string, he-he. That point is the last of my concerns ;-) Willy
On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > Almost all of the remaining strncpy() usage is just string to string > copying, but the corner cases that are being spun out that aren't > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > and memcpy(). Each of these are a clear improvement since they remove > the ambiguity of the intended behavior. Using seq_buf ends up being way > more overhead than is needed. I'm really not sure strscpy is much of an improvement. In this particular case in most other places we simply use a snprintf for nqns, which seems useful here to if we don't want the full buf. But switching to a completely undocumented helper like strscpy seems not useful at all.
On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > Almost all of the remaining strncpy() usage is just string to string > > copying, but the corner cases that are being spun out that aren't > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > > and memcpy(). Each of these are a clear improvement since they remove > > the ambiguity of the intended behavior. Using seq_buf ends up being way > > more overhead than is needed. > > I'm really not sure strscpy is much of an improvement. In this particular > case in most other places we simply use a snprintf for nqns, which seems > useful here to if we don't want the full buf. > > But switching to a completely undocumented helper like strscpy seems not > useful at all. There's some docs at [1]. Perhaps there could be more? [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote: > > There's some docs at [1]. Perhaps there could be more? > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 Note that we have so few 'strlcpy()' calls that we really should remove that horrid horrid interface. It's a buggy piece of sh*t. 'strlcpy()' is fundamentally unsafe BY DESIGN if you don't trust the source string - which is one of the alleged reasons to use it. That said, while 'strscpy()' fixes the fundamental conceptual bugs of strlcpy(), it's worth noting that it has *two* differences wrt strlcpy: - it doesn't have the broken return value - it copies things in word-size chunks to be more efficient And because of that word-size thing it will possibly insert more than one NUL character at the end of the result. To give an example, if you have dst[64] = "abcdefghijklmnopqrstuvwxyz"; src[64] = "123\0........"; strlcpy(dst, src, 64); then the strlcpy() will return 3 (the size of the copy), but the destination will end up looking like dst[64] = "123\0\0\0\0\0ijklmnopqrstuvwxyz..."; This odd "possibly word padding" is basically never relevant (and it's obviously always also limited by the size you gave it), but *if* you are doing something really odd, and you expect the end of the destination string to not be modified at all past the final NUL of the copy, you need to be aware of this. It does mean that if you used to have dst[4]; strlcpy(dst, "abc", 8); then that *used* to work (because it would copy four bytes: "abc\0" and that fits in 'dst[]'). But dst[4]; strscpy(dst, "abc", 8); will overflow dst[], because it will do a word-copy and you told 'strscpy()' that you had a 8-byte buffer, and it will try to write "abc\0\0\0\0\0" into the destination. The above is insane code, but it's an example of why a blind strlcpy->strscpy conversion might change semantics. Linus
On Fri, Oct 20, 2023 at 10:56:31AM -0700, Linus Torvalds wrote: > On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote: > > > > There's some docs at [1]. Perhaps there could be more? > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > Note that we have so few 'strlcpy()' calls that we really should > remove that horrid horrid interface. It's a buggy piece of sh*t. Yup, that's on-going. There's just a few left; Azeem has been chipping away at strlcpy. > It does mean that if you used to have > > dst[4]; > strlcpy(dst, "abc", 8); > > then that *used* to work (because it would copy four bytes: "abc\0" > and that fits in 'dst[]'). But > > dst[4]; > strscpy(dst, "abc", 8); > > will overflow dst[], because it will do a word-copy and you told > 'strscpy()' that you had a 8-byte buffer, and it will try to write > "abc\0\0\0\0\0" into the destination. Luckily, we already have checks for these mismatched sizes at compile time (i.e. CONFIG_FORTIFY_SOURCE will already check for pathological cases like above where 8 > sizeof(dst)). > The above is insane code, but it's an example of why a blind > strlcpy->strscpy conversion might change semantics. Totally agreed. All of the recent string conversions have been paying close attention to the behavioral differences.
On Fri, Oct 20, 2023 at 10:40:12AM -0700, Justin Stitt wrote: > On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote: > > > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > > Almost all of the remaining strncpy() usage is just string to string > > > copying, but the corner cases that are being spun out that aren't > > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > > > and memcpy(). Each of these are a clear improvement since they remove > > > the ambiguity of the intended behavior. Using seq_buf ends up being way > > > more overhead than is needed. > > > > I'm really not sure strscpy is much of an improvement. In this particular > > case in most other places we simply use a snprintf for nqns, which seems > > useful here to if we don't want the full buf. > > > > But switching to a completely undocumented helper like strscpy seems not > > useful at all. I'm curious where you looked and didn't find documentation -- perhaps there is an improvement to be made to aim one to where the existing documentation lives? > > There's some docs at [1]. Perhaps there could be more? > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 Right, And it's even valid kern-doc, which gets rendered in the kernel API docs, along with all the other string functions: https://docs.kernel.org/core-api/kernel-api.html#c.strscpy
On Fri, Oct 20, 2023 at 11:30:49AM -0700, Kees Cook wrote: > I'm curious where you looked and didn't find documentation -- perhaps > there is an improvement to be made to aim one to where the existing > documentation lives? My order was the following: - look for kernel doc on the main function implementation in lib/string.c (as found by a grep for an EXPORT_SYMBOL for it) - after not finding it there, but seeing that it has an ifdef for an arch override, which turns out to be unused - then I grepped the Documentation/ directory for it, and while there are quite a few matches for strscpy, they are largely in examples, with the only text referring to strscpy being Documentation/process/deprecated.rst that tells you to use it instead of strcpy, but not how it actually works - after that I realized that some people put the kerneldoc on the declaration, so I looked at that in string.h, but couldn't find it. > > There's some docs at [1]. Perhaps there could be more? > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > Right, And it's even valid kern-doc, which gets rendered in the kernel > API docs, along with all the other string functions: > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy Well, I never use the generated kerneldoc because it's much harder than just grepping the tree, but indeed it exists even if it's hidden in the most obsfucated way. But at least I know now!
On Thu, 2023-10-26 at 12:01 +0200, Christoph Hellwig wrote: > > > There's some docs at [1]. Perhaps there could be more? > > > > > > [1]: > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > > > Right, And it's even valid kern-doc, which gets rendered in the > > kernel API docs, along with all the other string functions: > > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy > > Well, I never use the generated kerneldoc because it's much harder > than just grepping the tree, but indeed it exists even if it's hidden > in the most obsfucated way. But at least I know now! This, honestly, is one of the really annoying problems of kerneldoc. When looking for structures or functions git grep "<function> -" usually finds it. However, I recently asked on linux-scsi if we could point to the doc about system_state and what it meant. However, either we all suck or there's no such documentation because no-one could find it. While it's nice in theory to have everything documented, it's not much use if no one can actually find the information ... James
> > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
I found that https://elixir.bootlin.com/linux is the best way to find
Documentation for functions and structures. I would suggest try it
first, and only when what fails to start using grep.
Andrew
On Thu, Oct 26, 2023 at 03:44:22PM +0200, Andrew Lunn wrote: > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > I found that https://elixir.bootlin.com/linux is the best way to find > Documentation for functions and structures. I would suggest try it > first, and only when what fails to start using grep. It's painful to have to open the HTML documentation generated by the kernel build system when developing, and even more painful to have to use a particular website. If the documentation is difficult to locate in the source tree, we're doing something wrong.
On Thu, 26 Oct 2023 07:39:44 -0400 James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > While it's nice in theory to have everything documented, it's not much > use if no one can actually find the information ... Does kerneldoc provide an automated index? That is, if we had a single file that had every function in the kernel that is documented, with the path to the file that documents it, it would make finding documentation much simpler. Maybe it already does? Which would mean we need a way to find the index too! -- Steve
Hi Steven, On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 26 Oct 2023 07:39:44 -0400 > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > > While it's nice in theory to have everything documented, it's not much > > use if no one can actually find the information ... > > Does kerneldoc provide an automated index? That is, if we had a single file > that had every function in the kernel that is documented, with the path to > the file that documents it, it would make finding documentation much > simpler. > > Maybe it already does? Which would mean we need a way to find the index too! ctags? Although "git grep" is faster (assumed you use the "correct" search pattern, which can sometimes be challenging, indeed). Gr{oetje,eeting}s, Geert
Steven Rostedt <rostedt@goodmis.org> writes: > On Thu, 26 Oct 2023 07:39:44 -0400 > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >> While it's nice in theory to have everything documented, it's not much >> use if no one can actually find the information ... > > Does kerneldoc provide an automated index? If you go to https://www.kernel.org/doc/html/latest/ and type a symbol into the search box on the left, you'll get directed to the right place (if it exists). For James's system_state example, it makes it clear that there's only one reference - in the coding-style document, of all places... I've never looked into that index to see how hard it would be to access without a browser. jon
On Thu, 2023-10-26 at 15:44 +0200, Andrew Lunn wrote: > > > > [1]: > > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > I found that https://elixir.bootlin.com/linux That's a 404, I think you mean https://elixir.bootlin.com/linux/latest/source > is the best way to find Documentation for functions and structures. > I would suggest try it first, and only when what fails to start using > grep. I just tried it with system_state and it doesn't even find the definition. I think it might be because it has annotations which confuse the searcher (it's in init/main.c as enum system_states system_state __read_mostly; ). If there's any meaningful doc about it, elixir also doesn't find it. James
Jonathan Corbet <corbet@lwn.net> writes: > Steven Rostedt <rostedt@goodmis.org> writes: > >> On Thu, 26 Oct 2023 07:39:44 -0400 >> James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >> >>> While it's nice in theory to have everything documented, it's not much >>> use if no one can actually find the information ... >> >> Does kerneldoc provide an automated index? > > If you go to https://www.kernel.org/doc/html/latest/ BTW there's now a shorter URL for this whic is really nice: https://docs.kernel.org/
On Thu, Oct 26, 2023 at 03:59:28PM +0200, Geert Uytterhoeven wrote: > Hi Steven, > > On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 26 Oct 2023 07:39:44 -0400 > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > > > > While it's nice in theory to have everything documented, it's not much > > > use if no one can actually find the information ... > > > > Does kerneldoc provide an automated index? That is, if we had a single file > > that had every function in the kernel that is documented, with the path to > > the file that documents it, it would make finding documentation much > > simpler. > > > > Maybe it already does? Which would mean we need a way to find the index too! > > ctags? ctags is a tool from previous century. It doesn't help that "make tags" is single-threaded. It needs constant babysitting (loop-like macros, ignore attibute annotations which masquerade as identifiers). I think "make tags" became much slower because ignore-list is one giant regexp which only grows bigger. > Although "git grep" is faster (assumed you use the "correct" search > pattern, which can sometimes be challenging, indeed). I tried QT Creator indexing at some point (which is parallel), it needs to be told that headers are C not C++. I didn't find a way to tell it that .c files are C too but F2 jumped to definitions quite well. Also hovering over identifier/name works (being IDE it understands popular doc styles). It can be made to work reasonably well provided that you did "make allmodconfig" and added few header locations. clangd parses like compiler, not like human and kernel uses a lot of CONFIG defines so some config must be chosen. But I need to recheck all this stuff now that new version was propagated to distros. It should be better (and less segfaulty :-)
On Wed, 18 Oct 2023 22:48:49 +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. > > We expect both data->subsysnqn and data->hostnqn to be NUL-terminated > based on their usage with format specifier ("%s"): > fabrics.c: > 322: dev_err(ctrl->device, > 323: "%s, subsysnqn \"%s\"\n", > 324: inv_data, data->subsysnqn); > ... > 349: dev_err(ctrl->device, > 350: "Connect for subsystem %s is not allowed, hostnqn: %s\n", > 351: data->subsysnqn, data->hostnqn); > > [...] Applied to for-next/hardening, thanks! [1/1] nvme-fabrics: replace deprecated strncpy with strscpy https://git.kernel.org/kees/c/5ef935fd5520 Take care,
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 8175d49f2909..57aad3ce311a 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -386,8 +386,8 @@ static struct nvmf_connect_data *nvmf_connect_data_prep(struct nvme_ctrl *ctrl, uuid_copy(&data->hostid, &ctrl->opts->host->id); data->cntlid = cpu_to_le16(cntlid); - strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); - strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); + strscpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); + strscpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); return data; }
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect both data->subsysnqn and data->hostnqn to be NUL-terminated based on their usage with format specifier ("%s"): fabrics.c: 322: dev_err(ctrl->device, 323: "%s, subsysnqn \"%s\"\n", 324: inv_data, data->subsysnqn); ... 349: dev_err(ctrl->device, 350: "Connect for subsystem %s is not allowed, hostnqn: %s\n", 351: data->subsysnqn, data->hostnqn); Moreover, there's no need to NUL-pad since `data` is zero-allocated already in fabrics.c: 383: data = kzalloc(sizeof(*data), GFP_KERNEL); ... therefore any further NUL-padding is rendered useless. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. I opted not to switch NVMF_NQN_SIZE to sizeof(data->xyz) because the size is defined as: | /* NQN names in commands fields specified one size */ | #define NVMF_NQN_FIELD_LEN 256 ... while NVMF_NQN_SIZE is defined as: | /* However the max length of a qualified name is another size */ | #define NVMF_NQN_SIZE 223 Since 223 seems pretty magic, I'm not going to touch it. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> --- Note: build-tested only. Found with: $ rg "strncpy\(" --- drivers/nvme/host/fabrics.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- base-commit: 58720809f52779dc0f08e53e54b014209d13eebb change-id: 20231018-strncpy-drivers-nvme-host-fabrics-c-416258a22598 Best regards, -- Justin Stitt <justinstitt@google.com>