Message ID | alpine.DEB.2.21.2001141757490.108121@chino.kir.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm, thp: fix defrag setting if newline is not used |

On 1/15/20 2:58 AM, David Rientjes wrote:
> If thp defrag setting "defer" is used and a newline is *not* used when
> writing to the sysfs file, this is interpreted as the "defer+madvise"
> option.
>
> This is because we do prefix matching and if five characters are written
> without a newline, the current code ends up comparing to the first five
> bytes of the "defer+madvise" option and using that instead.
>
> Find the length of what the user is writing and use that to guide our
> decision on which string comparison to do.
>
> Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option")
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  This can be done in *many* different ways including extracting logic to
>  a helper function.  If someone would like this to be implemented
>  differently, please suggest it.

I've come up with this:

diff --git mm/huge_memory.c mm/huge_memory.c
index 41a0fbddc96b..f36b93334874 100644
--- mm/huge_memory.c
+++ mm/huge_memory.c
@@ -256,7 +256,7 @@ static ssize_t defrag_store(struct kobject *kobj,
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("defer+madvise", buf,
+	} else if (count > sizeof("defer")-1 && !memcmp("defer+madvise", buf,
 		    min(sizeof("defer+madvise")-1, count))) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);

It's smaller, but more hacky. But it doesn't add new restrictions. E.g. this
still works:

# echo -n 'alw' > /sys/kernel/mm/transparent_hugepage/defrag
# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] defer defer+madvise madvise never

But whether anyone does that, I don't know (it doesn't work without -n).
Also this still works:

# echo -n 'defer ' > /sys/kernel/mm/transparent_hugepage/defrag
# cat /sys/kernel/mm/transparent_hugepage/defrag
always [defer] defer+madvise madvise never

Ideally we would have had strict matching as you propose (no matching of
prefixes) since the beginning and use e.g. strstrip() to remove all
whitespace from buffer first. But it's 'const char *' and I'm not sure if
it's null-terminated.

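The matching behaviour discussed above can be reproduced outside the kernel. The following is only a userspace sketch, not the kernel code: `min_size()`, `old_match()` and `old_lookup()` are hypothetical names introduced for illustration, and the option order is assumed to mirror the if/else chain in defrag_store().

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's min() macro (hypothetical helper). */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Mimics the pre-patch test: compare only as many bytes as the shorter
 * of the option name and the written buffer.
 */
static int old_match(const char *opt, const char *buf, size_t count)
{
	return memcmp(opt, buf, min_size(strlen(opt), count)) == 0;
}

/*
 * Tries the options in the order assumed from defrag_store() and returns
 * the first one that matches, or NULL if none does.
 */
static const char *old_lookup(const char *buf, size_t count)
{
	static const char *const opts[] = {
		"always", "defer+madvise", "defer", "madvise", "never",
	};
	size_t i;

	for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
		if (old_match(opts[i], buf, count))
			return opts[i];
	return NULL;
}
```

Under these assumptions, old_lookup("defer", 5) returns "defer+madvise" because only five bytes are compared and "defer+madvise" is tried first, while old_lookup("defer\n", 6) returns "defer" because the sixth byte ('\n' versus '+') rules out the longer option. Likewise old_lookup("alw", 3) returns "always", but old_lookup("alw\n", 4) returns NULL, which is why the prefix form only works with echo -n.
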
On Tue, 14 Jan 2020 17:58:36 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> If thp defrag setting "defer" is used and a newline is *not* used when
> writing to the sysfs file, this is interpreted as the "defer+madvise"
> option.
>
> This is because we do prefix matching and if five characters are written
> without a newline, the current code ends up comparing to the first five
> bytes of the "defer+madvise" option and using that instead.
>
> Find the length of what the user is writing and use that to guide our
> decision on which string comparison to do.

Gee, why is this code so complicated?  Can't we just do

	if (sysfs_streq(buf, "always")) {
		...
	} else if (sysfs_streq(buf, "defer+madvise")) {
		...
	}
	...

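For readers unfamiliar with sysfs_streq(), its semantics can be approximated in userspace as below. This is a sketch under assumptions rather than the kernel source: `streq_nl()` is a hypothetical name, and both arguments are assumed to be NUL-terminated strings. Two strings compare equal if they are identical or differ only by one trailing newline on either side.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace approximation of a sysfs_streq()-style compare.
 * Both arguments must be NUL-terminated.
 */
static bool streq_nl(const char *s1, const char *s2)
{
	/* Skip the common prefix. */
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}

	/* Both ended at the same place: exact match. */
	if (*s1 == *s2)
		return true;
	/* s1 ended, s2 has exactly one trailing newline left. */
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	/* s2 ended, s1 has exactly one trailing newline left. */
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}
```

With this comparison, "defer" and "defer\n" both match "defer" and neither matches "defer+madvise". Prefixes such as "alw" and inputs with other trailing whitespace such as "defer " no longer match anything.
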

On 1/17/20 4:16 AM, Andrew Morton wrote:
> On Tue, 14 Jan 2020 17:58:36 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
>
>> If thp defrag setting "defer" is used and a newline is *not* used when
>> writing to the sysfs file, this is interpreted as the "defer+madvise"
>> option.
>>
>> This is because we do prefix matching and if five characters are written
>> without a newline, the current code ends up comparing to the first five
>> bytes of the "defer+madvise" option and using that instead.
>>
>> Find the length of what the user is writing and use that to guide our
>> decision on which string comparison to do.
>
> Gee, why is this code so complicated?  Can't we just do
>
> 	if (sysfs_streq(buf, "always")) {
> 		...
> 	} else if (sysfs_streq(buf, "defer+madvise")) {
> 		...
> 	}
> 	...

Yeah, if we knew this existed :)

We would lose the prefix matching but hopefully nobody will complain.

On Fri, 17 Jan 2020, Vlastimil Babka wrote:

> >> If thp defrag setting "defer" is used and a newline is *not* used when
> >> writing to the sysfs file, this is interpreted as the "defer+madvise"
> >> option.
> >>
> >> This is because we do prefix matching and if five characters are written
> >> without a newline, the current code ends up comparing to the first five
> >> bytes of the "defer+madvise" option and using that instead.
> >>
> >> Find the length of what the user is writing and use that to guide our
> >> decision on which string comparison to do.
> >
> > Gee, why is this code so complicated?  Can't we just do
> >
> > 	if (sysfs_streq(buf, "always")) {
> > 		...
> > 	} else if (sysfs_streq(buf, "defer+madvise")) {
> > 		...
> > 	}
> > 	...
>
> Yeah, if we knew this existed :)
>
> We would lose the prefix matching but hopefully nobody will complain.
>

I tested Vlastimil's patch and it works as intended so I was about to
modify the changelog and send his patch and ask for a sign-off line
because I think I agree the *partial* prefix matching has ~0.1% chance of
breaking userspace and that 0.1% chance outweighs my desire to make the
code consistent for all options.

But if userspace were broken by this, then at least it was already broken
for "defer" depending on newline vs no newline.  (What we do know is that
nobody has used "defer" for the past couple years without a newline :).

If nobody objects, I'll test and send Andrew's version with the changelog
because I think we all agree the risk of breakage here is very minimal and
actually fixes the case for defer.


On 1/17/20 10:43 AM, David Rientjes wrote:
> On Fri, 17 Jan 2020, Vlastimil Babka wrote:
>
>>>> If thp defrag setting "defer" is used and a newline is *not* used when
>>>> writing to the sysfs file, this is interpreted as the "defer+madvise"
>>>> option.
>>>>
>>>> This is because we do prefix matching and if five characters are written
>>>> without a newline, the current code ends up comparing to the first five
>>>> bytes of the "defer+madvise" option and using that instead.
>>>>
>>>> Find the length of what the user is writing and use that to guide our
>>>> decision on which string comparison to do.
>>>
>>> Gee, why is this code so complicated?  Can't we just do
>>>
>>> 	if (sysfs_streq(buf, "always")) {
>>> 		...
>>> 	} else if (sysfs_streq(buf, "defer+madvise")) {
>>> 		...
>>> 	}
>>> 	...
>>
>> Yeah, if we knew this existed :)
>>
>> We would lose the prefix matching but hopefully nobody will complain.
>>
>
> I tested Vlastimil's patch and it works as intended so I was about to
> modify the changelog and send his patch and ask for a sign-off line
> because I think I agree the *partial* prefix matching has ~0.1% chance of
> breaking userspace and that 0.1% chance outweighs my desire to make the
> code consistent for all options.

If prefix matching worked with "echo alw > /sys..." then I would expect
some script out there relies on it, but since it only works with "echo -n
alw > /..." then perhaps there's no such script :)

> But if userspace were broken by this, then at least it was already broken
> for "defer" depending on newline vs no newline.  (What we do know is that
> nobody has used "defer" for the past couple years without a newline :).
>
> If nobody objects, I'll test and send Andrew's version with the changelog
> because I think we all agree the risk of breakage here is very minimal and
> actually fixes the case for defer.

Agreed.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -250,32 +250,33 @@ static ssize_t defrag_store(struct kobject *kobj,
 			    struct kobj_attribute *attr,
 			    const char *buf, size_t count)
 {
-	if (!memcmp("always", buf,
-		    min(sizeof("always")-1, count))) {
+	size_t len = count;
+
+	/* For prefix matching, find the length of interest */
+	if (buf[len-1] == '\n')
+		len--;
+
+	if (len == sizeof("always")-1 && !memcmp("always", buf, len)) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("defer+madvise", buf,
-		    min(sizeof("defer+madvise")-1, count))) {
+	} else if (len == sizeof("defer+madvise")-1 && !memcmp("defer+madvise", buf, len)) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("defer", buf,
-		    min(sizeof("defer")-1, count))) {
+	} else if (len == sizeof("defer")-1 && !memcmp("defer", buf, len)) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("madvise", buf,
-		    min(sizeof("madvise")-1, count))) {
+	} else if (len == sizeof("madvise")-1 && !memcmp("madvise", buf, len)) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("never", buf,
-		    min(sizeof("never")-1, count))) {
+	} else if (len == sizeof("never")-1 && !memcmp("never", buf, len)) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);

If thp defrag setting "defer" is used and a newline is *not* used when
writing to the sysfs file, this is interpreted as the "defer+madvise"
option.

This is because we do prefix matching and if five characters are written
without a newline, the current code ends up comparing to the first five
bytes of the "defer+madvise" option and using that instead.

Find the length of what the user is writing and use that to guide our
decision on which string comparison to do.

Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option")
Signed-off-by: David Rientjes <rientjes@google.com>
---
 This can be done in *many* different ways including extracting logic to
 a helper function.  If someone would like this to be implemented
 differently, please suggest it.

 mm/huge_memory.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)
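
One possible shape for the helper-function variant mentioned in the note above is a small table-driven parser. The sketch below is userspace C and purely illustrative: `enum defrag_mode`, `parse_defrag()` and the table are hypothetical names that do not exist in the kernel, which sets and clears flag bits instead. The `len > 0` guard is an addition of this sketch and is not part of the posted patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for illustration; the kernel uses flag bits instead. */
enum defrag_mode {
	DEFRAG_INVALID = -1,
	DEFRAG_ALWAYS,
	DEFRAG_DEFER_MADVISE,
	DEFRAG_DEFER,
	DEFRAG_MADVISE,
	DEFRAG_NEVER,
};

/*
 * Exact-length matching: strip at most one trailing newline, then accept
 * an option only if both the length and the bytes match.
 */
static enum defrag_mode parse_defrag(const char *buf, size_t count)
{
	static const struct {
		const char *name;
		enum defrag_mode mode;
	} table[] = {
		{ "always",        DEFRAG_ALWAYS },
		{ "defer+madvise", DEFRAG_DEFER_MADVISE },
		{ "defer",         DEFRAG_DEFER },
		{ "madvise",       DEFRAG_MADVISE },
		{ "never",         DEFRAG_NEVER },
	};
	size_t len = count;
	size_t i;

	/* Guard added in this sketch so an empty write never reads buf[-1]. */
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (len == strlen(table[i].name) &&
		    !memcmp(table[i].name, buf, len))
			return table[i].mode;

	return DEFRAG_INVALID;
}
```

Because the length must match exactly, the order of the table entries no longer matters: "defer" with or without a trailing newline maps to DEFRAG_DEFER, and a prefix such as "alw" is rejected.
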