diff mbox series

[v2] fs: use acquire ordering in __fget_light()

Message ID 20221031175256.2813280-1-jannh@google.com (mailing list archive)
State New, archived
Headers show
Series [v2] fs: use acquire ordering in __fget_light() | expand

Commit Message

Jann Horn Oct. 31, 2022, 5:52 p.m. UTC
We must prevent the CPU from reordering the files->count read with the
FD table access like this, on architectures where read-read reordering is
possible:

    files_lookup_fd_raw()
                                  close_fd()
                                  put_files_struct()
    atomic_read(&files->count)

I would like to mark this for stable, but the stable rules explicitly say
"no theoretical races", and given that the FD table pointer and
files->count are explicitly stored in the same cacheline, this sort of
reordering seems quite unlikely in practice...

Signed-off-by: Jann Horn <jannh@google.com>
---
 fs/file.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)


base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763

Comments

Al Viro Oct. 31, 2022, 6:08 p.m. UTC | #1
On Mon, Oct 31, 2022 at 06:52:56PM +0100, Jann Horn wrote:
> We must prevent the CPU from reordering the files->count read with the
> FD table access like this, on architectures where read-read reordering is
> possible:
> 
>     files_lookup_fd_raw()
>                                   close_fd()
>                                   put_files_struct()
>     atomic_read(&files->count)
> 
> I would like to mark this for stable, but the stable rules explicitly say
> "no theoretical races", and given that the FD table pointer and
> files->count are explicitly stored in the same cacheline, this sort of
> reordering seems quite unlikely in practice...

Looks sane, but looking at the definition of atomic_read_acquire...  ouch.

static __always_inline int
atomic_read_acquire(const atomic_t *v)
{
        instrument_atomic_read(v, sizeof(*v));
	return arch_atomic_read_acquire(v);
}

OK...

; git grep -n -w arch_atomic_read_acquire
include/linux/atomic/atomic-arch-fallback.h:220:#ifndef arch_atomic_read_acquire
include/linux/atomic/atomic-arch-fallback.h:222:arch_atomic_read_acquire(const atomic_t *v)
include/linux/atomic/atomic-arch-fallback.h:235:#define arch_atomic_read_acquire arch_atomic_read_acquire
include/linux/atomic/atomic-instrumented.h:35:  return arch_atomic_read_acquire(v);
include/linux/atomic/atomic-long.h:529: return arch_atomic_read_acquire(v);

No arch-specific instances, so...
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
{
	int ret;

	if (__native_word(atomic_t)) {
		ret = smp_load_acquire(&(v)->counter);
	} else {
		ret = arch_atomic_read(v);
		__atomic_acquire_fence();
	}

	return ret;
}

OK, but when would that test not be true?  We have unconditional
typedef struct {
        int counter;
} atomic_t;
and
#define __native_word(t) \
        (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
         sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

Do we really have any architectures where a structure with one
int field does *not* have a size that would satisfy that check?

Is it future-proofing for masturbation sake, or am I missing something
real here?
Jann Horn Oct. 31, 2022, 6:13 p.m. UTC | #2
On Mon, Oct 31, 2022 at 7:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
[...]
> No arch-specific instances, so...
> static __always_inline int
> arch_atomic_read_acquire(const atomic_t *v)
> {
>         int ret;
>
>         if (__native_word(atomic_t)) {
>                 ret = smp_load_acquire(&(v)->counter);
>         } else {
>                 ret = arch_atomic_read(v);
>                 __atomic_acquire_fence();
>         }
>
>         return ret;
> }
[...]
> Do we really have any architectures where a structure with one
> int field does *not* have a size that would satisfy that check?
>
> Is it future-proofing for masturbation sake, or am I missing something
> real here?

include/linux/atomic/atomic-arch-fallback.h has a comment at the top that says:

// Generated by scripts/atomic/gen-atomic-fallback.sh
// DO NOT MODIFY THIS FILE DIRECTLY
Linus Torvalds Oct. 31, 2022, 6:45 p.m. UTC | #3
On Mon, Oct 31, 2022 at 11:08 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Looks sane, but looking at the definition of atomic_read_acquire...  ouch.

The compiler should sort all that out and the mess shouldn't affect
any code generation.

But I also wouldn't mind somebody fixing things up, because I do agree
that checking whether 'atomic_t' is a native word size is kind of
pointless and probably just makes our build times unnecessarily
longer.

                Linus
Al Viro Oct. 31, 2022, 6:48 p.m. UTC | #4
On Mon, Oct 31, 2022 at 07:13:30PM +0100, Jann Horn wrote:
> On Mon, Oct 31, 2022 at 7:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> [...]
> > No arch-specific instances, so...
> > static __always_inline int
> > arch_atomic_read_acquire(const atomic_t *v)
> > {
> >         int ret;
> >
> >         if (__native_word(atomic_t)) {
> >                 ret = smp_load_acquire(&(v)->counter);
> >         } else {
> >                 ret = arch_atomic_read(v);
> >                 __atomic_acquire_fence();
> >         }
> >
> >         return ret;
> > }
> [...]
> > Do we really have any architectures where a structure with one
> > int field does *not* have a size that would satisfy that check?
> >
> > Is it future-proofing for masturbation sake, or am I missing something
> > real here?
> 
> include/linux/atomic/atomic-arch-fallback.h has a comment at the top that says:
> 
> // Generated by scripts/atomic/gen-atomic-fallback.sh
> // DO NOT MODIFY THIS FILE DIRECTLY

Hmm...  Apparently, the source is shared for atomic and atomic64, and the
check is intended for atomic64 counterpart of that thing on 32bit boxen.
Might make sense to put a comment in there...

The question about architectures with non-default implementations still
stands, though.

Anyway, it's unrelated to the patch itself.  I'm fine with it in the current
form.  Will apply for the next merge window, unless Linus wants it in right
now.
Linus Torvalds Oct. 31, 2022, 7:07 p.m. UTC | #5
On Mon, Oct 31, 2022 at 11:48 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Anyway, it's unrelated to the patch itself.  I'm fine with it in the current
> form.  Will apply for the next merge window, unless Linus wants it in right
> now.

It doesn't strike me as hugely critical, so I'm fine with it being put
in any random pile of "fixes to be applied" as long as it doesn't get
lost entirely. But if y ou have a "fixes" branch that may end up
coming to me before this release is over, that's not the wrong place
either.

I would tend to agree with Jann that the re-ordering doesn't look very
likely because it's the same cacheline, so even an aggressively
out-of-order core really doesn't seem to be very likely to trigger any
issues. So you have a really unlikely situation to begin with, and
even less reason for it then triggering the re-ordering.

The "original situation is unlikely" can probably be made quite likely
with an active attack, but that active attacker would then also also
have to rely on "that re-ordering looks sketchy", and actively hunt
for hardware where it can happen.

And said hardware may simply not exist, even if the race is certainly
theoretically possible on any weakly ordered CPU.

             Linus
Al Viro Oct. 31, 2022, 7:31 p.m. UTC | #6
On Mon, Oct 31, 2022 at 12:07:36PM -0700, Linus Torvalds wrote:
> On Mon, Oct 31, 2022 at 11:48 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Anyway, it's unrelated to the patch itself.  I'm fine with it in the current
> > form.  Will apply for the next merge window, unless Linus wants it in right
> > now.
> 
> It doesn't strike me as hugely critical, so I'm fine with it being put
> in any random pile of "fixes to be applied" as long as it doesn't get
> lost entirely. But if y ou have a "fixes" branch that may end up
> coming to me before this release is over, that's not the wrong place
> either.

Applied to #fixes and pushed out...
diff mbox series

Patch

diff --git a/fs/file.c b/fs/file.c
index 5f9c802a5d8d3..c942c89ca4cda 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1003,7 +1003,16 @@  static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	struct files_struct *files = current->files;
 	struct file *file;
 
-	if (atomic_read(&files->count) == 1) {
+	/*
+	 * If another thread is concurrently calling close_fd() followed
+	 * by put_files_struct(), we must not observe the old table
+	 * entry combined with the new refcount - otherwise we could
+	 * return a file that is concurrently being freed.
+	 *
+	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
+	 * put_files_struct().
+	 */
+	if (atomic_read_acquire(&files->count) == 1) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
 			return 0;