diff mbox series

ext4: Give 32bit personalities 32bit hashes

Message ID 20200317113153.7945-1-linus.walleij@linaro.org (mailing list archive)
State New, archived
Headers show
Series ext4: Give 32bit personalities 32bit hashes | expand

Commit Message

Linus Walleij March 17, 2020, 11:31 a.m. UTC
It was brought to my attention that this bug from 2018 was
still unresolved: 32 bit emulators like QEMU were given
64 bit hashes when running 32 bit emulation on 64 bit systems.

The personality(2) system call supports to let processes
indicate that they are 32 bit Linux to the kernel. This
was suggested by Teo in the original thread, so I just wired
it up and it solves the problem.

Programs that need the 32 bit hash only need to issue the
personality(PER_LINUX32) call and things start working.

I made a test program like this:

  #include <dirent.h>
  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/types.h>
  #include <sys/personality.h>

  int main(int argc, char** argv) {
    DIR* dir;
    personality(PER_LINUX32);
    dir = opendir("/boot");
    printf("dir=%p\n", dir);
    printf("readdir(dir)=%p\n", readdir(dir));
    printf("errno=%d: %s\n", errno, strerror(errno));
    return 0;
  }

This was compiled with an ARM32 toolchain from Bootlin using
glibc 2.28 and thus suffering from the bug.

Before the patch:

  $ ./readdir-bug
  dir=0x86000
  readdir(dir)=(nil)
  errno=75: Value too large for defined data type

After the patch:

  $ ./readdir-bug
  dir=0x86000
  readdir(dir)=0x86020
  errno=0: Success

Problem solved.

Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Link: https://bugs.launchpad.net/qemu/+bug/1805913
Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 fs/ext4/dir.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Florian Weimer March 17, 2020, 11:52 a.m. UTC | #1
* Linus Walleij:

> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> The personality(2) system call supports to let processes
> indicate that they are 32 bit Linux to the kernel. This
> was suggested by Teo in the original thread, so I just wired
> it up and it solves the problem.
>
> Programs that need the 32 bit hash only need to issue the
> personality(PER_LINUX32) call and things start working.
>
> I made a test program like this:
>
>   #include <dirent.h>
>   #include <errno.h>
>   #include <stdio.h>
>   #include <string.h>
>   #include <sys/types.h>
>   #include <sys/personality.h>
>
>   int main(int argc, char** argv) {
>     DIR* dir;
>     personality(PER_LINUX32);
>     dir = opendir("/boot");
>     printf("dir=%p\n", dir);
>     printf("readdir(dir)=%p\n", readdir(dir));
>     printf("errno=%d: %s\n", errno, strerror(errno));
>     return 0;
>   }
>
> This was compiled with an ARM32 toolchain from Bootlin using
> glibc 2.28 and thus suffering from the bug.

Just be sure: Is it possible to move the PER_LINUX32 setting into QEMU?
(I see why not.)

However, this does not solve the issue with network file systems and
other scenarios.  I still think need to add a workaround to the glibc
implementation.
Peter Maydell March 17, 2020, 11:58 a.m. UTC | #2
On Tue, 17 Mar 2020 at 11:31, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> The personality(2) system call supports to let processes
> indicate that they are 32 bit Linux to the kernel. This
> was suggested by Teo in the original thread, so I just wired
> it up and it solves the problem.

Thanks for having a look at this. I'm not sure this is what
QEMU needs, though. When QEMU runs, it is not a 32-bit
process, it's a 64-bit process. Some of the syscalls
it makes are on behalf of the guest and would need 32-bit
semantics (including this one of wanting 32-bit hash sizes
in directory reads). But some syscalls it makes for itself
(either directly, or via libraries it's linked against
including glibc and glib) -- those would still want the
usual 64-bit semantics, I would have thought.

> Programs that need the 32 bit hash only need to issue the
> personality(PER_LINUX32) call and things start working.

What in particular does this personality setting affect?
My copy of the personality(2) manpage just says:

       PER_LINUX32 (since Linux 2.2)
              [To be documented.]

which isn't very informative.

thanks
-- PMM
Linus Walleij March 17, 2020, 12:38 p.m. UTC | #3
On Tue, Mar 17, 2020 at 12:53 PM Florian Weimer <fw@deneb.enyo.de> wrote:

> Just be sure: Is it possible to move the PER_LINUX32 setting into QEMU?
> (I see why not.)

I set it in the program explicitly, but what actually happens when
I run it is that the binfmt handler invokes qemu-user so certainly
that program can set the flag, any process can.

Yours,
Linus Walleij
Andreas Dilger March 17, 2020, 10:28 p.m. UTC | #4
On Mar 17, 2020, at 5:31 AM, Linus Walleij <linus.walleij@linaro.org> wrote:
> 
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
> 
> The personality(2) system call supports to let processes
> indicate that they are 32 bit Linux to the kernel. This
> was suggested by Teo in the original thread, so I just wired
> it up and it solves the problem.
> 
> Programs that need the 32 bit hash only need to issue the
> personality(PER_LINUX32) call and things start working.

I'm generally with with this from the ext4 point of view.

That said, I'd think it would be preferable for ease of use and
compatibility that applications didn't have to be modified
(e.g. have QEMU or glibc internally set PER_LINUX32 for this
process before the 32-bit syscall is called, given that it knows
whether it is emulating a 32-bit runtime or not).

The other way to handle this would be for ARM32 to check the
PER_LINUX32 flag via is_compat_task() so that there wouldn't
need to be any changes to the ext4 code at all?

Cheers, Andreas


> I made a test program like this:
> 
>  #include <dirent.h>
>  #include <errno.h>
>  #include <stdio.h>
>  #include <string.h>
>  #include <sys/types.h>
>  #include <sys/personality.h>
> 
>  int main(int argc, char** argv) {
>    DIR* dir;
>    personality(PER_LINUX32);
>    dir = opendir("/boot");
>    printf("dir=%p\n", dir);
>    printf("readdir(dir)=%p\n", readdir(dir));
>    printf("errno=%d: %s\n", errno, strerror(errno));
>    return 0;
>  }
> 
> This was compiled with an ARM32 toolchain from Bootlin using
> glibc 2.28 and thus suffering from the bug.
> 
> Before the patch:
> 
>  $ ./readdir-bug
>  dir=0x86000
>  readdir(dir)=(nil)
>  errno=75: Value too large for defined data type
> 
> After the patch:
> 
>  $ ./readdir-bug
>  dir=0x86000
>  readdir(dir)=0x86020
>  errno=0: Success
> 
> Problem solved.
> 
> Cc: Florian Weimer <fw@deneb.enyo.de>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: stable@vger.kernel.org
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Link: https://bugs.launchpad.net/qemu/+bug/1805913
> Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> fs/ext4/dir.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 9aa1f75409b0..3faf9edf3e92 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -27,6 +27,7 @@
> #include <linux/slab.h>
> #include <linux/iversion.h>
> #include <linux/unicode.h>
> +#include <linux/personality.h>
> #include "ext4.h"
> #include "xattr.h"
> 
> @@ -618,6 +619,14 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
> 
> static int ext4_dir_open(struct inode * inode, struct file * filp)
> {
> +	/*
> +	 * If we are currently running e.g. a 32 bit emulator on
> +	 * a 64 bit machine, the emulator will indicate that it needs
> +	 * a 32 bit personality and thus 32 bit hashes from the file
> +	 * system.
> +	 */
> +	if (personality(current->personality) == PER_LINUX32)
> +		filp->f_mode |= FMODE_32BITHASH;
> 	if (IS_ENCRYPTED(inode))
> 		return fscrypt_get_encryption_info(inode) ? -EACCES : 0;
> 	return 0;
> --
> 2.24.1
> 


Cheers, Andreas
Linus Walleij March 19, 2020, 3:13 p.m. UTC | #5
On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell <peter.maydell@linaro.org> wrote:
> On Tue, 17 Mar 2020 at 11:31, Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > It was brought to my attention that this bug from 2018 was
> > still unresolved: 32 bit emulators like QEMU were given
> > 64 bit hashes when running 32 bit emulation on 64 bit systems.
> >
> > The personality(2) system call supports to let processes
> > indicate that they are 32 bit Linux to the kernel. This
> > was suggested by Teo in the original thread, so I just wired
> > it up and it solves the problem.
>
> Thanks for having a look at this. I'm not sure this is what
> QEMU needs, though. When QEMU runs, it is not a 32-bit
> process, it's a 64-bit process. Some of the syscalls
> it makes are on behalf of the guest and would need 32-bit
> semantics (including this one of wanting 32-bit hash sizes
> in directory reads). But some syscalls it makes for itself
> (either directly, or via libraries it's linked against
> including glibc and glib) -- those would still want the
> usual 64-bit semantics, I would have thought.

The "personality" thing is a largely unused facility that
a process can use to say it has this generic behaviour.
In this case we say we have the PER_LINUX32 personality
so it will make the process evoke 32bit behaviours inside
the kernel when dealing with this process.

Which I (loosely) appreciate that we want to achieve.

> > Programs that need the 32 bit hash only need to issue the
> > personality(PER_LINUX32) call and things start working.
>
> What in particular does this personality setting affect?
> My copy of the personality(2) manpage just says:
>
>        PER_LINUX32 (since Linux 2.2)
>               [To be documented.]
>
> which isn't very informative.

It's not a POSIX thing (not part of the Single Unix Specification)
so as with most Linux things it has some fuzzy semantics
defined by the community...

I usually just go to the source.

If you grep the kernel what turns up is a bunch of
architecture-specific behaviors on arm64, ia64, mips, parisc,
powerpc, s390, sparc. They are very minor.

On x86_64 the semantic effect is
none so this would work for any kind of 32bit userspace
on x86_64. It would be the first time this flag has any
effect at all on x86_64, but arguably a useful one.

I would not be surprised if running say i586 on x86_64
has the same problem, just that noone has reported
it yet. But what do I know. If they have the same problem
they can use the same solution. Hm QEMU supports
emulating i586 or even i386 ... maybe you could test?
Or tell me how to test? We might be solving a bigger
issue here.

Yours,
Linus Walleij
Linus Walleij March 19, 2020, 3:18 p.m. UTC | #6
On Tue, Mar 17, 2020 at 11:29 PM Andreas Dilger <adilger@dilger.ca> wrote:

> That said, I'd think it would be preferable for ease of use and
> compatibility that applications didn't have to be modified
> (e.g. have QEMU or glibc internally set PER_LINUX32 for this
> process before the 32-bit syscall is called, given that it knows
> whether it is emulating a 32-bit runtime or not).

I think the plan is that QEMU set this, either directly when
you run qemu-user or through the binfmt handler.
https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html

IIUC the binfmt handler is invoked when you try to
execute ELF so-and-so-for-arch-so-and-so when you
are not that arch yourself. So that can set up this
personality.

Not that I know how the binfmt handler works, I am just
assuming that thing is some program that can issue
syscalls. It JustWorks for me after installing the QEMU
packages...

The problem still stands that userspace need to somehow
inform kernelspace that 32bit is going on, and this was the
best I could think of.

Yours,
Linus Walleij
Peter Maydell March 19, 2020, 3:25 p.m. UTC | #7
On Thu, 19 Mar 2020 at 15:13, Linus Walleij <linus.walleij@linaro.org> wrote:
> On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell <peter.maydell@linaro.org> wrote:
> > What in particular does this personality setting affect?
> > My copy of the personality(2) manpage just says:
> >
> >        PER_LINUX32 (since Linux 2.2)
> >               [To be documented.]
> >
> > which isn't very informative.
>
> It's not a POSIX thing (not part of the Single Unix Specification)
> so as with most Linux things it has some fuzzy semantics
> defined by the community...
>
> I usually just go to the source.

If we're going to decide that this is the way to say
"give me 32-bit semantics" we need to actually document
that and define in at least broad terms what we mean
by it, so that when new things are added that might or
might not check against the setting there is a reference
defining whether they should or not, and so that
userspace knows what it's opting into by setting the flag.
The kernel loves undocumented APIs but userspace
consumers of them are not so enamoured :-)

As a concrete example, should "give me 32-bit semantics
via PER_LINUX32" mean "mmap should always return addresses
within 4GB" ? That would seem like it would make sense --
but on the other hand it would make it absolutely unusable
for QEMU's purposes, because we want to be able to
do full 64-bit mmap() for our own internal allocations.
(This is a specific example of the general case that
I'm dubious about having this be a process-wide switch,
because QEMU is a single process which sometimes
makes syscalls on its own behalf and sometimes makes
syscalls on behalf of the guest program it is emulating.
We want 32-bit semantics for the latter but if we
also get them for the former then there might be
unintended side effects.)

> I would not be surprised if running say i586 on x86_64
> has the same problem, just that noone has reported
> it yet. But what do I know. If they have the same problem
> they can use the same solution. Hm QEMU supports
> emulating i586 or even i386 ... maybe you could test?

Native i586 code on x86-64 should be fine, because it
will be using the compat syscalls, which ext4 already
ensures get the 32-bit sized hash (see hash2pos() and
friends).

thanks
-- PMM
Linus Walleij March 19, 2020, 10:23 p.m. UTC | #8
On Thu, Mar 19, 2020 at 4:25 PM Peter Maydell <peter.maydell@linaro.org> wrote:
> On Thu, 19 Mar 2020 at 15:13, Linus Walleij <linus.walleij@linaro.org> wrote:
> > On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell <peter.maydell@linaro.org> wrote:
> > > What in particular does this personality setting affect?
> > > My copy of the personality(2) manpage just says:
> > >
> > >        PER_LINUX32 (since Linux 2.2)
> > >               [To be documented.]
> > >
> > > which isn't very informative.
> >
> > It's not a POSIX thing (not part of the Single Unix Specification)
> > so as with most Linux things it has some fuzzy semantics
> > defined by the community...
> >
> > I usually just go to the source.
>
> If we're going to decide that this is the way to say
> "give me 32-bit semantics" we need to actually document
> that and define in at least broad terms what we mean
> by it, so that when new things are added that might or
> might not check against the setting there is a reference
> defining whether they should or not, and so that
> userspace knows what it's opting into by setting the flag.
> The kernel loves undocumented APIs but userspace
> consumers of them are not so enamoured :-)

OK I guess we can at least take this opportunity to add
some kerneldoc to the include file.

> As a concrete example, should "give me 32-bit semantics
> via PER_LINUX32" mean "mmap should always return addresses
> within 4GB" ? That would seem like it would make sense --

Incidentally that thing in particular has its own personality
flag (personalities are additive, it's a bit schizophrenic)
so PER_LINUX_32BIT is defined as:
PER_LINUX_32BIT =       0x0000 | ADDR_LIMIT_32BIT,
and that is specifically for limiting the address space to
32bit.

There is also PER_LINUX32_3GB for a 3GB lowmem
limit.

Since the personality is kind of additive, if
we want a flag *specifically* for indicating that we want
32bit hashes from the file system, there are bits left so we
can provide that.

Is this what we want to do? I just think we shouldn't
decide on that lightly as we will be using up personality
bug bits, but sometimes you have to use them.

PER_LINUX32 as it stands means 32bit personality
but very specifically does not include memory range
limitations since that has its own flags.

Yours,
Linus Walleij
Theodore Ts'o March 24, 2020, 2:34 a.m. UTC | #9
On Thu, Mar 19, 2020 at 11:23:33PM +0100, Linus Walleij wrote:
> OK I guess we can at least take this opportunity to add
> some kerneldoc to the include file.
> 
> > As a concrete example, should "give me 32-bit semantics
> > via PER_LINUX32" mean "mmap should always return addresses
> > within 4GB" ? That would seem like it would make sense --
> 
> Incidentally that thing in particular has its own personality
> flag (personalities are additive, it's a bit schizophrenic)
> so PER_LINUX_32BIT is defined as:
> PER_LINUX_32BIT =       0x0000 | ADDR_LIMIT_32BIT,
> and that is specifically for limiting the address space to
> 32bit.
> 
> There is also PER_LINUX32_3GB for a 3GB lowmem
> limit.
> 
> Since the personality is kind of additive, if
> we want a flag *specifically* for indicating that we want
> 32bit hashes from the file system, there are bits left so we
> can provide that.
> 
> Is this what we want to do? I just think we shouldn't
> decide on that lightly as we will be using up personality
> bug bits, but sometimes you have to use them.

I've been looking at the personality bug bits more detailed, and it
looks... messy.  Do we pick a new personality, or do we grab another
unique flag?

Another possibility, which would be messier for qemu, would be use a
flag set via fcntl.  That would require qemu from noticing when the
guest is calling open, openat, or openat2, and then inserting a fcntl
system call to set the 32-bit readdir mode.  That's cleaner from the
kernel interface complexity perspective, but it's messier for qemu.

       		 	    		     	  - Ted
Peter Maydell March 24, 2020, 9:29 a.m. UTC | #10
On Tue, 24 Mar 2020 at 02:34, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> Another possibility, which would be messier for qemu, would be use a
> flag set via fcntl.  That would require qemu from noticing when the
> guest is calling open, openat, or openat2, and then inserting a fcntl
> system call to set the 32-bit readdir mode.  That's cleaner from the
> kernel interface complexity perspective, but it's messier for qemu.

On the contrary, that would be a much better interface for QEMU.
We always know when we're doing an open-syscall on behalf
of the guest, and it would be trivial to make the fcntl() call then.
That would ensure that we don't accidentally get the
'32-bit semantics' on file descriptors QEMU opens for its own
purposes, and wouldn't leave us open to the risk in future that
setting the PER_LINUX32 flag for all of QEMU causes
unexpected extra behaviour in future kernels that would be correct
for the guest binary but wrong/broken for QEMU's own internals.

thanks
-- PMM
Theodore Ts'o March 24, 2020, 6:47 p.m. UTC | #11
On Tue, Mar 24, 2020 at 09:29:58AM +0000, Peter Maydell wrote:
> 
> On the contrary, that would be a much better interface for QEMU.
> We always know when we're doing an open-syscall on behalf
> of the guest, and it would be trivial to make the fcntl() call then.
> That would ensure that we don't accidentally get the
> '32-bit semantics' on file descriptors QEMU opens for its own
> purposes, and wouldn't leave us open to the risk in future that
> setting the PER_LINUX32 flag for all of QEMU causes
> unexpected extra behaviour in future kernels that would be correct
> for the guest binary but wrong/broken for QEMU's own internals.

If using a flag set by fcntl is better for qemu, then by all means
let's go with that instead of using a personality flag/number.

Linus, do you have what you need to do a respin of the patch?

       	         	      	   	    	 - Ted
Linus Walleij March 24, 2020, 9:17 p.m. UTC | #12
On Tue, Mar 24, 2020 at 7:48 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> On Tue, Mar 24, 2020 at 09:29:58AM +0000, Peter Maydell wrote:
> >
> > On the contrary, that would be a much better interface for QEMU.
> > We always know when we're doing an open-syscall on behalf
> > of the guest, and it would be trivial to make the fcntl() call then.
> > That would ensure that we don't accidentally get the
> > '32-bit semantics' on file descriptors QEMU opens for its own
> > purposes, and wouldn't leave us open to the risk in future that
> > setting the PER_LINUX32 flag for all of QEMU causes
> > unexpected extra behaviour in future kernels that would be correct
> > for the guest binary but wrong/broken for QEMU's own internals.
>
> If using a flag set by fcntl is better for qemu, then by all means
> let's go with that instead of using a personality flag/number.
>
> Linus, do you have what you need to do a respin of the patch?

Absolutely, I'm a bit occupied this week but I will try to get to it
early next week!

Thanks a lot for the directions here, it's highly valuable.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 9aa1f75409b0..3faf9edf3e92 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -27,6 +27,7 @@ 
 #include <linux/slab.h>
 #include <linux/iversion.h>
 #include <linux/unicode.h>
+#include <linux/personality.h>
 #include "ext4.h"
 #include "xattr.h"
 
@@ -618,6 +619,14 @@  static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
 
 static int ext4_dir_open(struct inode * inode, struct file * filp)
 {
+	/*
+	 * If we are currently running e.g. a 32 bit emulator on
+	 * a 64 bit machine, the emulator will indicate that it needs
+	 * a 32 bit personality and thus 32 bit hashes from the file
+	 * system.
+	 */
+	if (personality(current->personality) == PER_LINUX32)
+		filp->f_mode |= FMODE_32BITHASH;
 	if (IS_ENCRYPTED(inode))
 		return fscrypt_get_encryption_info(inode) ? -EACCES : 0;
 	return 0;