
[v2,1/2] string: Add load_unaligned_zeropad() code path to sized_strscpy()

Message ID 20250318214035.481950-2-pcc@google.com (mailing list archive)
State New
Series: string: Add load_unaligned_zeropad() code path to sized_strscpy()

Commit Message

Peter Collingbourne March 18, 2025, 9:40 p.m. UTC
The call to read_word_at_a_time() in sized_strscpy() is problematic
with MTE because it may trigger a tag check fault when reading
across a tag granule (16 bytes) boundary. To make this code
MTE compatible, let's start using load_unaligned_zeropad()
on architectures where it is available (i.e. architectures that
define CONFIG_DCACHE_WORD_ACCESS). Because load_unaligned_zeropad()
takes care of page boundaries as well as tag granule boundaries,
also disable the code preventing crossing page boundaries when using
load_unaligned_zeropad().

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If4b22e43b5a4ca49726b4bf98ada827fdf755548
Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Cc: stable@vger.kernel.org
---
v2:
- new approach

 lib/string.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Kees Cook March 19, 2025, 3:06 a.m. UTC | #1
On Tue, Mar 18, 2025 at 02:40:32PM -0700, Peter Collingbourne wrote:
> The call to read_word_at_a_time() in sized_strscpy() is problematic
> with MTE because it may trigger a tag check fault when reading
> across a tag granule (16 bytes) boundary. To make this code
> MTE compatible, let's start using load_unaligned_zeropad()
> on architectures where it is available (i.e. architectures that
> define CONFIG_DCACHE_WORD_ACCESS). Because load_unaligned_zeropad()
> takes care of page boundaries as well as tag granule boundaries,
> also disable the code preventing crossing page boundaries when using
> load_unaligned_zeropad().
> 
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> Link: https://linux-review.googlesource.com/id/If4b22e43b5a4ca49726b4bf98ada827fdf755548
> Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
> Cc: stable@vger.kernel.org
> ---
> v2:
> - new approach
> 
>  lib/string.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/string.c b/lib/string.c
> index eb4486ed40d25..b632c71df1a50 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -119,6 +119,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
>  	if (count == 0 || WARN_ON_ONCE(count > INT_MAX))
>  		return -E2BIG;
>  
> +#ifndef CONFIG_DCACHE_WORD_ACCESS
>  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS

I would prefer this were written as:

#if !defined(CONFIG_DCACHE_WORD_ACCESS) && \
    defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)

Having 2 #ifs makes me think there is some reason for having them
separable. But the logic here is for a single check.

>  	/*
>  	 * If src is unaligned, don't cross a page boundary,
> @@ -133,12 +134,14 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
>  	/* If src or dest is unaligned, don't do word-at-a-time. */
>  	if (((long) dest | (long) src) & (sizeof(long) - 1))
>  		max = 0;
> +#endif
>  #endif

(Then no second #endif needed)

>  
>  	/*
> -	 * read_word_at_a_time() below may read uninitialized bytes after the
> -	 * trailing zero and use them in comparisons. Disable this optimization
> -	 * under KMSAN to prevent false positive reports.
> +	 * load_unaligned_zeropad() or read_word_at_a_time() below may read
> +	 * uninitialized bytes after the trailing zero and use them in
> +	 * comparisons. Disable this optimization under KMSAN to prevent
> +	 * false positive reports.
>  	 */
>  	if (IS_ENABLED(CONFIG_KMSAN))
>  		max = 0;
> @@ -146,7 +149,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
>  	while (max >= sizeof(unsigned long)) {
>  		unsigned long c, data;
>  
> +#ifdef CONFIG_DCACHE_WORD_ACCESS
> +		c = load_unaligned_zeropad(src+res);
> +#else
>  		c = read_word_at_a_time(src+res);
> +#endif
>  		if (has_zero(c, &data, &constants)) {
>  			data = prep_zero_mask(c, data, &constants);
>  			data = create_zero_mask(data);

The rest seems good. Though I do wonder: what happens on a page boundary
for read_word_at_a_time(), then? We get back zero-filled remainder? Will
that hide a missing NUL terminator? As in, it's not actually there
because of the end of the page/granule, but a zero was put in, so now
it looks like it's been terminated and the exception got eaten? And
doesn't this hide MTE faults since we can't differentiate "overran MTE
tag" from "overran granule while over-reading"?
Peter Collingbourne March 19, 2025, 3:35 a.m. UTC | #2
On Tue, Mar 18, 2025 at 8:06 PM Kees Cook <kees@kernel.org> wrote:
>
> On Tue, Mar 18, 2025 at 02:40:32PM -0700, Peter Collingbourne wrote:
> > The call to read_word_at_a_time() in sized_strscpy() is problematic
> > with MTE because it may trigger a tag check fault when reading
> > across a tag granule (16 bytes) boundary. To make this code
> > MTE compatible, let's start using load_unaligned_zeropad()
> > on architectures where it is available (i.e. architectures that
> > define CONFIG_DCACHE_WORD_ACCESS). Because load_unaligned_zeropad()
> > takes care of page boundaries as well as tag granule boundaries,
> > also disable the code preventing crossing page boundaries when using
> > load_unaligned_zeropad().
> >
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > Link: https://linux-review.googlesource.com/id/If4b22e43b5a4ca49726b4bf98ada827fdf755548
> > Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
> > Cc: stable@vger.kernel.org
> > ---
> > v2:
> > - new approach
> >
> >  lib/string.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/string.c b/lib/string.c
> > index eb4486ed40d25..b632c71df1a50 100644
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -119,6 +119,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
> >       if (count == 0 || WARN_ON_ONCE(count > INT_MAX))
> >               return -E2BIG;
> >
> > +#ifndef CONFIG_DCACHE_WORD_ACCESS
> >  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>
> I would prefer this were written as:
>
> #if !defined(CONFIG_DCACHE_WORD_ACCESS) && \
>     defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
>
> Having 2 #ifs makes me think there is some reason for having them
> separable. But the logic here is for a single check.

There is indeed a reason for having two: there's an #else in the
middle (which pertains to CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
only).
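For reference, the nesting Peter is defending looks roughly like this (paraphrased from lib/string.c after the patch, with the page-boundary body elided):

```c
#ifndef CONFIG_DCACHE_WORD_ACCESS	/* added by this patch */
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	/* If src is unaligned, don't cross a page boundary: cap max. */
	...
#else
	/* If src or dest is unaligned, don't do word-at-a-time. */
	if (((long) dest | (long) src) & (sizeof(long) - 1))
		max = 0;
#endif
#endif	/* matching #endif added by this patch */
```

A single combined `#if !defined(...) && defined(...)` would leave that `#else` branch without a home, which is the separability Peter is pointing at.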

>
> >       /*
> >        * If src is unaligned, don't cross a page boundary,
> > @@ -133,12 +134,14 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
> >       /* If src or dest is unaligned, don't do word-at-a-time. */
> >       if (((long) dest | (long) src) & (sizeof(long) - 1))
> >               max = 0;
> > +#endif
> >  #endif
>
> (Then no second #endif needed)
>
> >
> >       /*
> > -      * read_word_at_a_time() below may read uninitialized bytes after the
> > -      * trailing zero and use them in comparisons. Disable this optimization
> > -      * under KMSAN to prevent false positive reports.
> > +      * load_unaligned_zeropad() or read_word_at_a_time() below may read
> > +      * uninitialized bytes after the trailing zero and use them in
> > +      * comparisons. Disable this optimization under KMSAN to prevent
> > +      * false positive reports.
> >        */
> >       if (IS_ENABLED(CONFIG_KMSAN))
> >               max = 0;
> > @@ -146,7 +149,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
> >       while (max >= sizeof(unsigned long)) {
> >               unsigned long c, data;
> >
> > +#ifdef CONFIG_DCACHE_WORD_ACCESS
> > +             c = load_unaligned_zeropad(src+res);
> > +#else
> >               c = read_word_at_a_time(src+res);
> > +#endif
> >               if (has_zero(c, &data, &constants)) {
> >                       data = prep_zero_mask(c, data, &constants);
> >                       data = create_zero_mask(data);
>
> The rest seems good. Though I do wonder: what happens on a page boundary
> for read_word_at_a_time(), then? We get back zero-filled remainder? Will
> that hide a missing NUL terminator? As in, it's not actually there
> because of the end of the page/granule, but a zero was put in, so now
> it looks like it's been terminated and the exception got eaten? And
> doesn't this hide MTE faults since we can't differentiate "overran MTE
> tag" from "overran granule while over-reading"?

Correct. The behavior I implemented seems good enough for now IMO (and
at least good enough for backports). If we did want to detect this
case, we would need an alternative to "load_unaligned_zeropad" that
would do something other than fill the unreadable bytes with zeros in
the fault handler. For example, it could fill them with all-ones. That
would prevent the loop from being terminated by the tag check fault,
and we would proceed to the next granule and take another tag check
fault, which would recur in the fault handler and cause the bug to be
detected.

Peter

Patch

diff --git a/lib/string.c b/lib/string.c
index eb4486ed40d25..b632c71df1a50 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -119,6 +119,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 	if (count == 0 || WARN_ON_ONCE(count > INT_MAX))
 		return -E2BIG;
 
+#ifndef CONFIG_DCACHE_WORD_ACCESS
 #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 	/*
 	 * If src is unaligned, don't cross a page boundary,
@@ -133,12 +134,14 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 	/* If src or dest is unaligned, don't do word-at-a-time. */
 	if (((long) dest | (long) src) & (sizeof(long) - 1))
 		max = 0;
+#endif
 #endif
 
 	/*
-	 * read_word_at_a_time() below may read uninitialized bytes after the
-	 * trailing zero and use them in comparisons. Disable this optimization
-	 * under KMSAN to prevent false positive reports.
+	 * load_unaligned_zeropad() or read_word_at_a_time() below may read
+	 * uninitialized bytes after the trailing zero and use them in
+	 * comparisons. Disable this optimization under KMSAN to prevent
+	 * false positive reports.
 	 */
 	if (IS_ENABLED(CONFIG_KMSAN))
 		max = 0;
@@ -146,7 +149,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 	while (max >= sizeof(unsigned long)) {
 		unsigned long c, data;
 
+#ifdef CONFIG_DCACHE_WORD_ACCESS
+		c = load_unaligned_zeropad(src+res);
+#else
 		c = read_word_at_a_time(src+res);
+#endif
 		if (has_zero(c, &data, &constants)) {
 			data = prep_zero_mask(c, data, &constants);
 			data = create_zero_mask(data);