Message ID | 20201012220620.124408-1-linus.walleij@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Series | [v3,RESEND] fcntl: Add 32bit filesystem mode |
On 10/12/20 5:06 PM, Linus Walleij wrote:
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> This adds a flag to the fcntl() F_GETFD and F_SETFD operations
> to set the underlying filesystem into 32bit mode even if the
> file handle was opened using 64bit mode without the compat
> syscalls.
>
> Programs that need the 32 bit file system behavior need to
> issue a fcntl() system call such as in this example:
>
>   #define FD_32BIT_MODE 2
>
>   int main(int argc, char** argv) {
>     DIR* dir;
>     int err;
>     int fd;
>
>     dir = opendir("/boot");
>     fd = dirfd(dir);
>     err = fcntl(fd, F_SETFD, FD_32BIT_MODE);

This is a blind set, and wipes out FD_CLOEXEC. Better would be to do a
proper demonstration of the read-modify-write with F_GETFD that portable
programs will have to use in practice.
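The read-modify-write asked for in the review comment above could look
roughly like the sketch below. The helper name fd_set_flags() is made up
for illustration, and FD_32BIT_MODE (value 2) is only the flag proposed
by this patch; it is not defined by mainline headers, and an unpatched
kernel ignores the bit.

```c
#include <fcntl.h>

/* Value proposed by the patch under review; an assumption, not a mainline flag. */
#ifndef FD_32BIT_MODE
#define FD_32BIT_MODE 2
#endif

/*
 * Hypothetical helper: set additional descriptor flag bits while keeping
 * the ones already set (for example FD_CLOEXEC).
 * Returns 0 on success, -1 with errno set on failure.
 */
int fd_set_flags(int fd, int bits)
{
	int flags = fcntl(fd, F_GETFD);		/* read the current flags */

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFD, flags | bits) == -1)	/* modify and write back */
		return -1;
	return 0;
}
```

With this helper the example would call fd_set_flags(fd, FD_32BIT_MODE)
instead of passing FD_32BIT_MODE directly to F_SETFD, so a descriptor
that was opened close-on-exec stays that way.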
On Tue, Oct 13, 2020 at 12:06:20AM +0200, Linus Walleij wrote:
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> This adds a flag to the fcntl() F_GETFD and F_SETFD operations
> to set the underlying filesystem into 32bit mode even if the
> file handle was opened using 64bit mode without the compat
> syscalls.
>
> Programs that need the 32 bit file system behavior need to
> issue a fcntl() system call such as in this example:
>
>   #define FD_32BIT_MODE 2
>
>   int main(int argc, char** argv) {
>     DIR* dir;
>     int err;
>     int fd;
>
>     dir = opendir("/boot");
>     fd = dirfd(dir);
>     err = fcntl(fd, F_SETFD, FD_32BIT_MODE);
>     if (err) {
>       printf("fcntl() failed! err=%d\n", err);
>       return 1;
>     }
>     printf("dir=%p\n", dir);
>     printf("readdir(dir)=%p\n", readdir(dir));
>     printf("errno=%d: %s\n", errno, strerror(errno));
>     return 0;
>   }
>
> This can be pretty hard to test since C libraries and linux
> userspace security extensions aggressively filter the parameters
> that are passed down and allowed to commit into actual system
> calls.
>
> Cc: Florian Weimer <fw@deneb.enyo.de>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Link: https://bugs.launchpad.net/qemu/+bug/1805913
> Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v3->v3 RESEND 1:
> - Resending during the v5.10 merge window to get attention.
> ChangeLog v2->v3:
> - Realized that I also have to clear the flag correspondingly
>   if someone ask for !FD_32BIT_MODE after setting it the
>   first time.
> ChangeLog v1->v2:
> - Use a new flag FD_32BIT_MODE to F_GETFD and F_SETFD
>   instead of a new fcntl operation, there is already a fcntl
>   operation to set random flags.
> - Sorry for taking forever to respin this patch :(
> ---
>  fs/fcntl.c                       | 7 +++++++
>  include/uapi/asm-generic/fcntl.h | 8 ++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 19ac5baad50f..6c32edc4099a 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -335,10 +335,17 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  		break;
>  	case F_GETFD:
>  		err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
> +		/* Report 32bit file system mode */
> +		if (filp->f_mode & FMODE_32BITHASH)
> +			err |= FD_32BIT_MODE;
>  		break;
>  	case F_SETFD:
>  		err = 0;
>  		set_close_on_exec(fd, arg & FD_CLOEXEC);
> +		if (arg & FD_32BIT_MODE)
> +			filp->f_mode |= FMODE_32BITHASH;
> +		else
> +			filp->f_mode &= ~FMODE_32BITHASH;

This seems inconsistent? F_SETFD is for setting flags on a file
descriptor. Won't setting a flag on filp here instead cause the
behaviour to change for all file descriptors across the system that are
open on this struct file? Compare set_close_on_exec().

I don't see any discussion on whether this should be an F_SETFL or an
F_SETFD, though I see F_SETFD was Ted's suggestion originally.

[...]

Cheers
---Dave
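The difference in scope that the review above points at can be observed
from userspace with flags that already exist. The sketch below is only
an illustration (the function name flag_scope_demo() is made up): it
duplicates a pipe descriptor, sets FD_CLOEXEC with F_SETFD and
O_NONBLOCK with F_SETFL on the original, and reports what the duplicate
sees. Descriptor flags are kept per descriptor, while file status flags
live in the open file description (the kernel's struct file) that both
descriptors share.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Set one flag of each kind through the original descriptor and report
 * whether the dup()ed copy observes it.
 * Returns 0 on success, -1 if any system call fails.
 */
int flag_scope_demo(int *dup_has_cloexec, int *dup_has_nonblock)
{
	int p[2], copy, fl, fdfl, ret = -1;

	if (pipe(p) == -1)
		return -1;
	copy = dup(p[0]);	/* new descriptor, same open file description */
	if (copy == -1)
		goto out_pipe;

	/* Descriptor flag: stored per descriptor. */
	if (fcntl(p[0], F_SETFD, FD_CLOEXEC) == -1)
		goto out;

	/* File status flag: stored in the shared open file description. */
	fl = fcntl(p[0], F_GETFL);
	if (fl == -1 || fcntl(p[0], F_SETFL, fl | O_NONBLOCK) == -1)
		goto out;

	/* Read both kinds of flags back through the duplicate. */
	fdfl = fcntl(copy, F_GETFD);
	fl = fcntl(copy, F_GETFL);
	if (fdfl == -1 || fl == -1)
		goto out;

	*dup_has_cloexec = (fdfl & FD_CLOEXEC) != 0;
	*dup_has_nonblock = (fl & O_NONBLOCK) != 0;
	ret = 0;
out:
	close(copy);
out_pipe:
	close(p[0]);
	close(p[1]);
	return ret;
}
```

The duplicate does not pick up FD_CLOEXEC but does pick up O_NONBLOCK.
A bit stored in filp->f_mode behaves like the second case, which is why
setting it through F_SETFD looks inconsistent.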
On Tue, Oct 13, 2020 at 6:09 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> This adds a flag to the fcntl() F_GETFD and F_SETFD operations
> to set the underlying filesystem into 32bit mode even if the
> file handle was opened using 64bit mode without the compat
> syscalls.

I've also seen this when running i386 under QEMU inside WSL; it seems
to be the same issue I am facing.

>
> Programs that need the 32 bit file system behavior need to
> issue a fcntl() system call such as in this example:
>
>   #define FD_32BIT_MODE 2
>
>   int main(int argc, char** argv) {
>     DIR* dir;
>     int err;
>     int fd;
>
>     dir = opendir("/boot");
>     fd = dirfd(dir);
>     err = fcntl(fd, F_SETFD, FD_32BIT_MODE);
>     if (err) {
>       printf("fcntl() failed! err=%d\n", err);
>       return 1;
>     }
>     printf("dir=%p\n", dir);
>     printf("readdir(dir)=%p\n", readdir(dir));
>     printf("errno=%d: %s\n", errno, strerror(errno));
>     return 0;
>   }
>
> This can be pretty hard to test since C libraries and linux
> userspace security extensions aggressively filter the parameters
> that are passed down and allowed to commit into actual system
> calls.
>
> Cc: Florian Weimer <fw@deneb.enyo.de>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Link: https://bugs.launchpad.net/qemu/+bug/1805913
> Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v3->v3 RESEND 1:
> - Resending during the v5.10 merge window to get attention.
> ChangeLog v2->v3:
> - Realized that I also have to clear the flag correspondingly
>   if someone ask for !FD_32BIT_MODE after setting it the
>   first time.
> ChangeLog v1->v2:
> - Use a new flag FD_32BIT_MODE to F_GETFD and F_SETFD
>   instead of a new fcntl operation, there is already a fcntl
>   operation to set random flags.
> - Sorry for taking forever to respin this patch :(
> ---
>  fs/fcntl.c                       | 7 +++++++
>  include/uapi/asm-generic/fcntl.h | 8 ++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 19ac5baad50f..6c32edc4099a 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -335,10 +335,17 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  		break;
>  	case F_GETFD:
>  		err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
> +		/* Report 32bit file system mode */
> +		if (filp->f_mode & FMODE_32BITHASH)
> +			err |= FD_32BIT_MODE;
>  		break;
>  	case F_SETFD:
>  		err = 0;
>  		set_close_on_exec(fd, arg & FD_CLOEXEC);
> +		if (arg & FD_32BIT_MODE)
> +			filp->f_mode |= FMODE_32BITHASH;
> +		else
> +			filp->f_mode &= ~FMODE_32BITHASH;
>  		break;
>  	case F_GETFL:
>  		err = filp->f_flags;
> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index 9dc0bf0c5a6e..edd3573cb7ef 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -160,6 +160,14 @@ struct f_owner_ex {
>
>  /* for F_[GET|SET]FL */
>  #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
> +/*
> + * This instructs the kernel to provide 32bit semantics (such as hashes) from
> + * the file system layer, when running a userland that depend on 32bit
> + * semantics on a kernel that supports 64bit userland, but does not use the
> + * compat ioctl() for e.g. open(), so that the kernel would otherwise assume
> + * that the userland process is capable of dealing with 64bit semantics.
> + */
> +#define FD_32BIT_MODE	2
>
>  /* for posix fcntl() and lockf() */
>  #ifndef F_RDLCK
> --
> 2.26.2
>

--
Yours sincerely,
Yonggang Luo (罗勇刚)
On Tue, Oct 13, 2020 at 11:22 AM Dave Martin <Dave.Martin@arm.com> wrote:

> >  	case F_SETFD:
> >  		err = 0;
> >  		set_close_on_exec(fd, arg & FD_CLOEXEC);
> > +		if (arg & FD_32BIT_MODE)
> > +			filp->f_mode |= FMODE_32BITHASH;
> > +		else
> > +			filp->f_mode &= ~FMODE_32BITHASH;
>
> This seems inconsistent? F_SETFD is for setting flags on a file
> descriptor. Won't setting a flag on filp here instead cause the
> behaviour to change for all file descriptors across the system that are
> open on this struct file? Compare set_close_on_exec().
>
> I don't see any discussion on whether this should be an F_SETFL or an
> F_SETFD, though I see F_SETFD was Ted's suggestion originally.

I cannot honestly say I know the semantic difference.

I would ask the QEMU people how a user program would expect
the flag to behave.

Yours,
Linus Walleij
On Wed, Nov 18, 2020 at 12:38 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Tue, Oct 13, 2020 at 11:22 AM Dave Martin <Dave.Martin@arm.com> wrote:
>
> > >  	case F_SETFD:
> > >  		err = 0;
> > >  		set_close_on_exec(fd, arg & FD_CLOEXEC);
> > > +		if (arg & FD_32BIT_MODE)
> > > +			filp->f_mode |= FMODE_32BITHASH;
> > > +		else
> > > +			filp->f_mode &= ~FMODE_32BITHASH;
> >
> > This seems inconsistent? F_SETFD is for setting flags on a file
> > descriptor. Won't setting a flag on filp here instead cause the
> > behaviour to change for all file descriptors across the system that are
> > open on this struct file? Compare set_close_on_exec().
> >
> > I don't see any discussion on whether this should be an F_SETFL or an
> > F_SETFD, though I see F_SETFD was Ted's suggestion originally.
>
> I cannot honestly say I know the semantic difference.
>
> I would ask the QEMU people how a user program would expect
> the flag to behave.

I agree it should either use F_SETFD to set a bit in the fdtable
structure like set_close_on_exec() or it should use F_SETFL to set
a bit in filp->f_mode.

It appears the reason FMODE_32BITHASH is part of filp->f_mode is that
the only user today is nfsd, which does not have a file descriptor but
only has a struct file. Similarly, the only code that understands the
difference (ext4_readdir()) has no reference to the file descriptor.

If this becomes an O_DIR32BITHASH flag for F_SETFL, I suppose it should
also be supported by openat2().

Arnd
On Tue, 17 Nov 2020 at 23:38, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Tue, Oct 13, 2020 at 11:22 AM Dave Martin <Dave.Martin@arm.com> wrote:
>
> > >  	case F_SETFD:
> > >  		err = 0;
> > >  		set_close_on_exec(fd, arg & FD_CLOEXEC);
> > > +		if (arg & FD_32BIT_MODE)
> > > +			filp->f_mode |= FMODE_32BITHASH;
> > > +		else
> > > +			filp->f_mode &= ~FMODE_32BITHASH;
> >
> > This seems inconsistent? F_SETFD is for setting flags on a file
> > descriptor. Won't setting a flag on filp here instead cause the
> > behaviour to change for all file descriptors across the system that are
> > open on this struct file? Compare set_close_on_exec().
> >
> > I don't see any discussion on whether this should be an F_SETFL or an
> > F_SETFD, though I see F_SETFD was Ted's suggestion originally.
>
> I cannot honestly say I know the semantic difference.
>
> I would ask the QEMU people how a user program would expect
> the flag to behave.

Apologies for the very late response -- I hadn't noticed that
this thread had stalled out waiting for an answer to this, and was
only reminded of it recently when another QEMU user ran into
the problem that this kernel patch is trying to resolve.

If I understand the distinction here correctly, I think QEMU
wouldn't care about it in practice. We want the "32 bit readdir
offsets" behaviour on all file descriptors that correspond to where
we're emulating "the guest opened this file descriptor". We don't
want (but probably won't notice if we get) that behaviour on file
descriptors that QEMU has opened for its own purposes. But we'll
never open a file descriptor for the guest and then dup it into one
for QEMU's purposes. (I guess there might be some weird
unlikely-to-happen edge cases where an emulated guest binary opens
an fd for a directory and then passes it via exec to a host binary:
but even there I expect the host binary wouldn't notice it was
getting 32-bit hashes.)

But overall I think that the more natural behaviour would be that
it is per-file-descriptor.

-- PMM
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 19ac5baad50f..6c32edc4099a 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -335,10 +335,17 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		break;
 	case F_GETFD:
 		err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
+		/* Report 32bit file system mode */
+		if (filp->f_mode & FMODE_32BITHASH)
+			err |= FD_32BIT_MODE;
 		break;
 	case F_SETFD:
 		err = 0;
 		set_close_on_exec(fd, arg & FD_CLOEXEC);
+		if (arg & FD_32BIT_MODE)
+			filp->f_mode |= FMODE_32BITHASH;
+		else
+			filp->f_mode &= ~FMODE_32BITHASH;
 		break;
 	case F_GETFL:
 		err = filp->f_flags;
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 9dc0bf0c5a6e..edd3573cb7ef 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -160,6 +160,14 @@ struct f_owner_ex {
 
 /* for F_[GET|SET]FL */
 #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
+/*
+ * This instructs the kernel to provide 32bit semantics (such as hashes) from
+ * the file system layer, when running a userland that depend on 32bit
+ * semantics on a kernel that supports 64bit userland, but does not use the
+ * compat ioctl() for e.g. open(), so that the kernel would otherwise assume
+ * that the userland process is capable of dealing with 64bit semantics.
+ */
+#define FD_32BIT_MODE	2
 
 /* for posix fcntl() and lockf() */
 #ifndef F_RDLCK
It was brought to my attention that this bug from 2018 was
still unresolved: 32 bit emulators like QEMU were given
64 bit hashes when running 32 bit emulation on 64 bit systems.

This adds a flag to the fcntl() F_GETFD and F_SETFD operations
to set the underlying filesystem into 32bit mode even if the
file handle was opened using 64bit mode without the compat
syscalls.

Programs that need the 32 bit file system behavior need to
issue a fcntl() system call such as in this example:

  #define FD_32BIT_MODE 2

  int main(int argc, char** argv) {
    DIR* dir;
    int err;
    int fd;

    dir = opendir("/boot");
    fd = dirfd(dir);
    err = fcntl(fd, F_SETFD, FD_32BIT_MODE);
    if (err) {
      printf("fcntl() failed! err=%d\n", err);
      return 1;
    }
    printf("dir=%p\n", dir);
    printf("readdir(dir)=%p\n", readdir(dir));
    printf("errno=%d: %s\n", errno, strerror(errno));
    return 0;
  }

This can be pretty hard to test since C libraries and linux
userspace security extensions aggressively filter the parameters
that are passed down and allowed to commit into actual system
calls.

Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Link: https://bugs.launchpad.net/qemu/+bug/1805913
Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v3->v3 RESEND 1:
- Resending during the v5.10 merge window to get attention.
ChangeLog v2->v3:
- Realized that I also have to clear the flag correspondingly
  if someone ask for !FD_32BIT_MODE after setting it the
  first time.
ChangeLog v1->v2:
- Use a new flag FD_32BIT_MODE to F_GETFD and F_SETFD
  instead of a new fcntl operation, there is already a fcntl
  operation to set random flags.
- Sorry for taking forever to respin this patch :(
---
 fs/fcntl.c                       | 7 +++++++
 include/uapi/asm-generic/fcntl.h | 8 ++++++++
 2 files changed, 15 insertions(+)