diff mbox series

[v2,1/6] fs/readdir: Fix filldir() and filldir64() use of user_access_begin()

Message ID 12a4be679e43de1eca6e5e2173163f27e2f25236.1579715466.git.christophe.leroy@c-s.fr (mailing list archive)
State New, archived
Headers show
Series [v2,1/6] fs/readdir: Fix filldir() and filldir64() use of user_access_begin() | expand

Commit Message

Christophe Leroy Jan. 22, 2020, 5:52 p.m. UTC
Some architectures grand full access to userspace regardless of the
address/len passed to user_access_begin(), but other architectures
only grand access to the requested area.

For exemple, on 32 bits powerpc (book3s/32), access is granted by
segments of 256 Mbytes.

Modify filldir() and filldir64() to request the real area they need
to get access to, i.e. the area covering the parent dirent (if any)
and the contiguous current dirent.

Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v2: have user_access_begin() cover both parent dirent (if any) and current dirent
---
 fs/readdir.c | 50 ++++++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

Comments

Linus Torvalds Jan. 22, 2020, 8 p.m. UTC | #1
On Wed, Jan 22, 2020 at 10:24 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Patch looks better, but those names are horrid.

Hmm.

A bit more re-organization also allows us to do the unsafe_put_user()
unconditionally.

In particular, if we remove 'previous' as a pointer from the filldir
data structure, and replace it with 'prev_reclen', then we can do

        prev_reclen = buf->prev_reclen;
        dirent = buf->current_dir;
        prev = (void __user *) dirent - prev_reclen;
        if (!user_access_begin(prev, reclen + prev_reclen))
                goto efault;

and instead of checking that 'previous' pointer for NULL, we just
check prev_reclen for not being zero.

Yes, it replaces a few other

        lastdirent = buf.previous;

with the slightly more complicated

        lastdirent = (void __user *)buf.current_dir - buf.prev_reclen;

but on the whole it makes the _important_ code more streamlined, and
avoids having to have those if-else cases.

Something like the attached.

COMPLETELY UNTESTED! It compiles for me. The generated assembly looks
ok from a quick look.

Christophe, does this work for you on your ppc test-case?

Side note: I think verify_dirent_name() should check that 'len' is in
the appropriate range too, because right now a corrupted filesystem is
only noticed for a zero length. But a negative one, or one where the
reclen calculations would overflow, is not noticed.

Most filesystems have the source of 'len' being something like an
'unsigned char' so that it's pretty bounded anyway, which is likely
why nobody cared when we added that check, but..

               Linus
Linus Torvalds Jan. 22, 2020, 8:15 p.m. UTC | #2
On Wed, Jan 22, 2020 at 12:00 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> A bit more re-organization also allows us to do the unsafe_put_user()
> unconditionally.

I meant the "user_access_begin()", of course.

Code was right, explanation was wrong.

That said, with this model, we _could_ make the

        unsafe_put_user(offset, &prev->d_off, efault_end);

be unconditional too, since now 'prev' will actually be a valid
pointer - it will match 'dirent' if there was no prev.

But since we want to test whether we had a previous entry anyway (for
the signal handling latency issue), making the write to the previous
d_reclen unconditional (and then overwriting it the next iteration)
doesn't actually buy us anything.

It was the user_access_begin() I'd rather have unconditional, since
otherwise it gets duplicated in two (very slightly) different versions
and we have unnecessary code bloat.

           Linus
Linus Torvalds Jan. 22, 2020, 8:37 p.m. UTC | #3
[ Talking to myself ]

On Wed, Jan 22, 2020 at 12:00 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> COMPLETELY UNTESTED! It compiles for me. The generated assembly looks
> ok from a quick look.

Some more testing shows that objtool is unhappy about how we do that
signal_pending(current) inside the user access region.

I didn't notice because my test builds were with sane kernel
configurations so that I could look at the generated code.

But with KASAN enabled, the signal check causes accesses that KASAN
wants to check, and I get

  objtool: filldir()+0x395: call to __kasan_check_read() with UACCESS enabled

warnings.

So that patch of mine isn't acceptable for silly reasons, and the
signal check itself would need to be done outside of the user access
area.

That actually makes the whole "let's do the &prev->d_off setting
unconditionally" much more interesting.

So here's a slightly updated patch that does exactly that, and avoids
the objtool warning.

It actually generates better code than the last one too, because now
we don't duplicate the user_access_end() for the EINTR case.

So test this one instead, please.

                 Linus
Christophe Leroy Jan. 23, 2020, 6:27 a.m. UTC | #4
Le 22/01/2020 à 21:37, Linus Torvalds a écrit :
> [ Talking to myself ]
> 
> On Wed, Jan 22, 2020 at 12:00 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> COMPLETELY UNTESTED! It compiles for me. The generated assembly looks
>> ok from a quick look.
> 
> 
> So here's a slightly updated patch that does exactly that, and avoids
> the objtool warning.
> 
> It actually generates better code than the last one too, because now
> we don't duplicate the user_access_end() for the EINTR case.
> 
> So test this one instead, please.

This patch works on my ppc board, thanks

Christophe
Michael Ellerman Jan. 23, 2020, 11:56 a.m. UTC | #5
Hi Christophe,

This patch is independent of the rest of the series AFAICS, and it looks
like Linus has modified it quite a bit down thread.

So I'll take patches 2-6 via powerpc and assume this patch will go via
Linus or Al or elsewhere.

Also a couple of minor spelling fixes below.

cheers

Christophe Leroy <christophe.leroy@c-s.fr> writes:
> Some architectures grand full access to userspace regardless of the
                     ^
                     grant
> address/len passed to user_access_begin(), but other architectures
> only grand access to the requested area.
       ^
       grant
>
> For exemple, on 32 bits powerpc (book3s/32), access is granted by
      ^
      example
> segments of 256 Mbytes.
>
> Modify filldir() and filldir64() to request the real area they need
> to get access to, i.e. the area covering the parent dirent (if any)
> and the contiguous current dirent.
>
> Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> v2: have user_access_begin() cover both parent dirent (if any) and current dirent
> ---
>  fs/readdir.c | 50 ++++++++++++++++++++++++++++----------------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/fs/readdir.c b/fs/readdir.c
> index d26d5ea4de7b..3f9b4488d9b7 100644
> --- a/fs/readdir.c
> +++ b/fs/readdir.c
> @@ -214,7 +214,7 @@ struct getdents_callback {
>  static int filldir(struct dir_context *ctx, const char *name, int namlen,
>  		   loff_t offset, u64 ino, unsigned int d_type)
>  {
> -	struct linux_dirent __user * dirent;
> +	struct linux_dirent __user * dirent, *dirent0;
>  	struct getdents_callback *buf =
>  		container_of(ctx, struct getdents_callback, ctx);
>  	unsigned long d_ino;
> @@ -232,19 +232,22 @@ static int filldir(struct dir_context *ctx, const char *name, int namlen,
>  		buf->error = -EOVERFLOW;
>  		return -EOVERFLOW;
>  	}
> -	dirent = buf->previous;
> -	if (dirent && signal_pending(current))
> +	dirent0 = buf->previous;
> +	if (dirent0 && signal_pending(current))
>  		return -EINTR;
>  
> -	/*
> -	 * Note! This range-checks 'previous' (which may be NULL).
> -	 * The real range was checked in getdents
> -	 */
> -	if (!user_access_begin(dirent, sizeof(*dirent)))
> -		goto efault;
> -	if (dirent)
> -		unsafe_put_user(offset, &dirent->d_off, efault_end);
>  	dirent = buf->current_dir;
> +	if (dirent0) {
> +		int sz = (void __user *)dirent + reclen -
> +			 (void __user *)dirent0;
> +
> +		if (!user_access_begin(dirent0, sz))
> +			goto efault;
> +		unsafe_put_user(offset, &dirent0->d_off, efault_end);
> +	} else {
> +		if (!user_access_begin(dirent, reclen))
> +			goto efault;
> +	}
>  	unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
>  	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
>  	unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
> @@ -307,7 +310,7 @@ struct getdents_callback64 {
>  static int filldir64(struct dir_context *ctx, const char *name, int namlen,
>  		     loff_t offset, u64 ino, unsigned int d_type)
>  {
> -	struct linux_dirent64 __user *dirent;
> +	struct linux_dirent64 __user *dirent, *dirent0;
>  	struct getdents_callback64 *buf =
>  		container_of(ctx, struct getdents_callback64, ctx);
>  	int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1,
> @@ -319,19 +322,22 @@ static int filldir64(struct dir_context *ctx, const char *name, int namlen,
>  	buf->error = -EINVAL;	/* only used if we fail.. */
>  	if (reclen > buf->count)
>  		return -EINVAL;
> -	dirent = buf->previous;
> -	if (dirent && signal_pending(current))
> +	dirent0 = buf->previous;
> +	if (dirent0 && signal_pending(current))
>  		return -EINTR;
>  
> -	/*
> -	 * Note! This range-checks 'previous' (which may be NULL).
> -	 * The real range was checked in getdents
> -	 */
> -	if (!user_access_begin(dirent, sizeof(*dirent)))
> -		goto efault;
> -	if (dirent)
> -		unsafe_put_user(offset, &dirent->d_off, efault_end);
>  	dirent = buf->current_dir;
> +	if (dirent0) {
> +		int sz = (void __user *)dirent + reclen -
> +			 (void __user *)dirent0;
> +
> +		if (!user_access_begin(dirent0, sz))
> +			goto efault;
> +		unsafe_put_user(offset, &dirent0->d_off, efault_end);
> +	} else {
> +		if (!user_access_begin(dirent, reclen))
> +			goto efault;
> +	}
>  	unsafe_put_user(ino, &dirent->d_ino, efault_end);
>  	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
>  	unsafe_put_user(d_type, &dirent->d_type, efault_end);
> -- 
> 2.25.0
Michael Ellerman Jan. 23, 2020, noon UTC | #6
Michael Ellerman <mpe@ellerman.id.au> writes:
> Hi Christophe,
>
> This patch is independent of the rest of the series AFAICS

And of course having hit send I immediately realise that's not true.

> So I'll take patches 2-6 via powerpc and assume this patch will go via
> Linus or Al or elsewhere.

So I guess I'll wait and see what happens with patch 1.

cheers
Christophe Leroy Jan. 23, 2020, 12:43 p.m. UTC | #7
Le 23/01/2020 à 13:00, Michael Ellerman a écrit :
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Hi Christophe,
>>
>> This patch is independent of the rest of the series AFAICS
> 
> And of course having hit send I immediately realise that's not true.

Without this, book3s/32 fails booting. (And without patch 2, it even 
hangs, looping forever in do_page_fault()).


> 
>> So I'll take patches 2-6 via powerpc and assume this patch will go via
>> Linus or Al or elsewhere.
> 
> So I guess I'll wait and see what happens with patch 1.

We could eventually opt out user_access_begin() for 
CONFIG_PPC_BOOK3S_32, then you could take patches 3 and 6. That's enough 
to have user_access_begin() and stuff for 8xx and RADIX.

Patch 2 should be taken as well as a fix, and can be kept independant of 
the series (once we have patch 1, we normally don't hit the problem 
fixed by patch 2).

Won't don't need patch 4 until we want user_access_begin() supported by 
book3s/32


However, I'm about to send out a v3 with a different approach. It 
modifies the core part where user_access_begin() is returning an opaque 
value used by user_access_end(). And it also tells user_access_begin() 
whether it's a read or a write, so that we can limit unlocking to write 
acccesses on book3s/32, and fine grain rights on book3s/64.

Maybe you would prefer this change on top of first step, in which case 
I'll be able to make a v4 rebasing all this on top of patch 3 and 6 of 
v3 series. Tell me what you prefer.

Christophe
Linus Torvalds Jan. 23, 2020, 6:38 p.m. UTC | #8
On Thu, Jan 23, 2020 at 4:00 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> So I guess I'll wait and see what happens with patch 1.

I've committed my fixes to filldir[64]() directly - they really were
fixing me being lazy about the range, and the name length checking
really is a theoretical "access wrong user space pointer" issue with
corrupted filesystems regardless (even though I suspect it's entirely
theoretical - even a corrupt filesystem hopefully won't be passing in
negative directory entry lengths or something like that).

The "pass in read/write" part I'm not entirely convinced about.
Honestly, if this is just for ppc32 and nobody else really needs it,
make the ppc32s thing always just enable both user space reads and
writes. That's the semantics for x86 and arm as is, I'm not convinced
that we should complicate this for a legacy platform.

                Linus
Michael Ellerman Jan. 24, 2020, 10:42 a.m. UTC | #9
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, Jan 23, 2020 at 4:00 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> So I guess I'll wait and see what happens with patch 1.
>
> I've committed my fixes to filldir[64]() directly - they really were
> fixing me being lazy about the range, and the name length checking
> really is a theoretical "access wrong user space pointer" issue with
> corrupted filesystems regardless (even though I suspect it's entirely
> theoretical - even a corrupt filesystem hopefully won't be passing in
> negative directory entry lengths or something like that).

Great, thanks.

> The "pass in read/write" part I'm not entirely convinced about.
> Honestly, if this is just for ppc32 and nobody else really needs it,
> make the ppc32s thing always just enable both user space reads and
> writes. That's the semantics for x86 and arm as is, I'm not convinced
> that we should complicate this for a legacy platform.

We can use the read/write info on Power9 too. That's a niche platform
but hopefully not legacy status yet :P

But it's entirely optional, as you say we can just enable read/write if
we aren't passed the read/write info from the upper-level API.

I think our priority should be getting objtool going on powerpc to check
our user access regions are well contained. Once we have that working
maybe then we can look at plumbing the direction through
user_access_begin() etc.

cheers
diff mbox series

Patch

diff --git a/fs/readdir.c b/fs/readdir.c
index d26d5ea4de7b..3f9b4488d9b7 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -214,7 +214,7 @@  struct getdents_callback {
 static int filldir(struct dir_context *ctx, const char *name, int namlen,
 		   loff_t offset, u64 ino, unsigned int d_type)
 {
-	struct linux_dirent __user * dirent;
+	struct linux_dirent __user * dirent, *dirent0;
 	struct getdents_callback *buf =
 		container_of(ctx, struct getdents_callback, ctx);
 	unsigned long d_ino;
@@ -232,19 +232,22 @@  static int filldir(struct dir_context *ctx, const char *name, int namlen,
 		buf->error = -EOVERFLOW;
 		return -EOVERFLOW;
 	}
-	dirent = buf->previous;
-	if (dirent && signal_pending(current))
+	dirent0 = buf->previous;
+	if (dirent0 && signal_pending(current))
 		return -EINTR;
 
-	/*
-	 * Note! This range-checks 'previous' (which may be NULL).
-	 * The real range was checked in getdents
-	 */
-	if (!user_access_begin(dirent, sizeof(*dirent)))
-		goto efault;
-	if (dirent)
-		unsafe_put_user(offset, &dirent->d_off, efault_end);
 	dirent = buf->current_dir;
+	if (dirent0) {
+		int sz = (void __user *)dirent + reclen -
+			 (void __user *)dirent0;
+
+		if (!user_access_begin(dirent0, sz))
+			goto efault;
+		unsafe_put_user(offset, &dirent0->d_off, efault_end);
+	} else {
+		if (!user_access_begin(dirent, reclen))
+			goto efault;
+	}
 	unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
 	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
 	unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
@@ -307,7 +310,7 @@  struct getdents_callback64 {
 static int filldir64(struct dir_context *ctx, const char *name, int namlen,
 		     loff_t offset, u64 ino, unsigned int d_type)
 {
-	struct linux_dirent64 __user *dirent;
+	struct linux_dirent64 __user *dirent, *dirent0;
 	struct getdents_callback64 *buf =
 		container_of(ctx, struct getdents_callback64, ctx);
 	int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1,
@@ -319,19 +322,22 @@  static int filldir64(struct dir_context *ctx, const char *name, int namlen,
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
-	dirent = buf->previous;
-	if (dirent && signal_pending(current))
+	dirent0 = buf->previous;
+	if (dirent0 && signal_pending(current))
 		return -EINTR;
 
-	/*
-	 * Note! This range-checks 'previous' (which may be NULL).
-	 * The real range was checked in getdents
-	 */
-	if (!user_access_begin(dirent, sizeof(*dirent)))
-		goto efault;
-	if (dirent)
-		unsafe_put_user(offset, &dirent->d_off, efault_end);
 	dirent = buf->current_dir;
+	if (dirent0) {
+		int sz = (void __user *)dirent + reclen -
+			 (void __user *)dirent0;
+
+		if (!user_access_begin(dirent0, sz))
+			goto efault;
+		unsafe_put_user(offset, &dirent0->d_off, efault_end);
+	} else {
+		if (!user_access_begin(dirent, reclen))
+			goto efault;
+	}
 	unsafe_put_user(ino, &dirent->d_ino, efault_end);
 	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
 	unsafe_put_user(d_type, &dirent->d_type, efault_end);