[v5] fs: clear file privilege bits when mmap writing

Message ID 20151210070635.GC31922@1wt.eu (mailing list archive)
State New, archived

Commit Message

Willy Tarreau Dec. 10, 2015, 7:06 a.m. UTC
Hi Kees,

Why not add a new file flag instead?

Something like this (editing your patch by hand to illustrate):


Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Willy Tarreau Dec. 10, 2015, 7:10 a.m. UTC | #1
On Thu, Dec 10, 2015 at 08:06:35AM +0100, Willy Tarreau wrote:
> Hi Kees,
> 
> Why not add a new file flag instead ?
> 
> Something like this (editing your patch by hand to illustrate) :
(...)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3aa514254161..409bd7047e7e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -913,3 +913,4 @@
>  #define FL_OFDLCK       1024    /* lock is "owned" by struct file */
>  #define FL_LAYOUT       2048    /* outstanding pNFS layout */
> +#define FL_DROP_PRIVS   4096    /* lest something weird decides that 2 is OK */

Crap, the FL_* ones are for locks, so we need to use an O_* flag instead.
But anyway you get the idea: there are probably many spare bits over
there.

Another option I was thinking about was to change f_mode and detect the
change on close. But I don't know what to compare it against.

Willy

Kees Cook Dec. 10, 2015, 6:05 p.m. UTC | #2
On Wed, Dec 9, 2015 at 11:06 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Kees,
>
> Why not add a new file flag instead ?
>
> Something like this (editing your patch by hand to illustrate) :
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index ad17e05ebf95..3a7eee76ea90 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -191,6 +191,17 @@ static void __fput(struct file *file)
>
>         might_sleep();
>
> +       /*
> +        * XXX: While avoiding mmap_sem, we've already been written to.
> +        * We must ignore the return value, since we can't reject the
> +        * write.
> +        */
> +       if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
> +               mutex_lock(&inode->i_mutex);
> +               file_remove_privs(file);
> +               mutex_unlock(&inode->i_mutex);
> +       }
> +
>         fsnotify_close(file);
>         /*
>          * The function eventpoll_release() should be the first called
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3aa514254161..409bd7047e7e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -913,3 +913,4 @@
>  #define FL_OFDLCK       1024    /* lock is "owned" by struct file */
>  #define FL_LAYOUT       2048    /* outstanding pNFS layout */
> +#define FL_DROP_PRIVS   4096    /* lest something weird decides that 2 is OK */
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c387430f06c3..08a77e0cf65f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>
>                 if (!page_mkwrite)
>                         file_update_time(vma->vm_file);
> +               vma->vm_file->f_flags |= FL_DROP_PRIVS;
>         }
>
>         return VM_FAULT_WRITE;
>
> Willy
>

Is f_flags safe to write like this without holding a lock?

-Kees
Willy Tarreau Dec. 10, 2015, 6:16 p.m. UTC | #3
On Thu, Dec 10, 2015 at 10:05:50AM -0800, Kees Cook wrote:
> Is f_flags safe to write like this without holding a lock?

Unfortunately I have no idea. I've seen places where it's written without
taking a lock such as in blkdev_open() and I don't think that this one is
called with a lock held.

The comment in fs.h says that spinlock f_lock is here to protect f_flags
(among others) and that it must not be taken from IRQ context. Thus I'd
think we "just" have to take it to remain safe. That would be just one
spinlock per first write via mmap() to a file, I don't know if that's
reasonable or not :-/

Willy

Kees Cook Dec. 10, 2015, 6:18 p.m. UTC | #4
On Thu, Dec 10, 2015 at 10:16 AM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Dec 10, 2015 at 10:05:50AM -0800, Kees Cook wrote:
>> Is f_flags safe to write like this without holding a lock?
>
> Unfortunately I have no idea. I've seen places where it's written without
> taking a lock such as in blkdev_open() and I don't think that this one is
> called with a lock held.
>
> The comment in fs.h says that spinlock f_lock is here to protect f_flags
> (among others) and that it must not be taken from IRQ context. Thus I'd
> think we "just" have to take it to remain safe. That would be just one
> spinlock per first write via mmap() to a file, I don't know if that's
> reasonable or not :-/

Al, what's the best way forward here? I created a separate flag
variable so it could be used effectively write-only, with the read
happening only at final fput.
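
A rough userspace sketch of that idea, for illustration only: the field
name f_remove_privs, the demo_* helpers and the privs_removed marker are
all hypothetical stand-ins, and C11 atomics replace the kernel's own
primitives. The write path only ever stores to the flag; the single read
happens when the last reference is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; not the kernel definition. */
struct demo_file {
	atomic_int f_count;          /* reference count */
	atomic_bool f_remove_privs;  /* hypothetical name: set by mmap writes */
	int privs_removed;           /* stands in for the effect of file_remove_privs() */
};

/* Write side: called from the (simulated) mmap write-fault path. Store only. */
static void demo_note_mmap_write(struct demo_file *f)
{
	atomic_store_explicit(&f->f_remove_privs, true, memory_order_relaxed);
}

/* Read side: the flag is only looked at when the last reference goes away. */
static void demo_fput(struct demo_file *f)
{
	/* fetch_sub returns the old value; only the final put continues. */
	if (atomic_fetch_sub_explicit(&f->f_count, 1, memory_order_acq_rel) != 1)
		return;

	if (atomic_load_explicit(&f->f_remove_privs, memory_order_relaxed))
		f->privs_removed = 1;
}
```

With two references held, the first demo_fput() does nothing and the
second one performs the deferred privilege drop.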

-Kees
Al Viro Dec. 10, 2015, 7:33 p.m. UTC | #5
On Thu, Dec 10, 2015 at 07:16:11PM +0100, Willy Tarreau wrote:

> > Is f_flags safe to write like this without holding a lock?
> 
> Unfortunately I have no idea. I've seen places where it's written without
> taking a lock such as in blkdev_open() and I don't think that this one is
> called with a lock held.

In any ->open() we obviously have nobody else able to find that struct file,
let alone modify it, so there the damn thing is essentially caller-private
and no locking is needed.
Kees Cook Dec. 10, 2015, 7:47 p.m. UTC | #6
On Thu, Dec 10, 2015 at 11:33 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Dec 10, 2015 at 07:16:11PM +0100, Willy Tarreau wrote:
>
>> > Is f_flags safe to write like this without holding a lock?
>>
>> Unfortunately I have no idea. I've seen places where it's written without
>> taking a lock such as in blkdev_open() and I don't think that this one is
>> called with a lock held.
>
> In any ->open() we obviously have nobody else able to find that struct file,
> let alone modify it, so there the damn thing is essentially caller-private
> and no locking is needed.

In open, sure, but what about under mm/memory.c where we're trying to
twiddle it from vma->vm_file->f_flags as in my patch? That seemed like it
would want atomic safety.

-Kees
Al Viro Dec. 10, 2015, 8:27 p.m. UTC | #7
On Thu, Dec 10, 2015 at 11:47:18AM -0800, Kees Cook wrote:

> In open, sure, but what about under mm/memory.c where we're trying to
> twiddle it from vma->vm_file->f_flags as in my patch? That seemed like it
> would want atomic safety.

Sigh...  Again, I'm not at all convinced that this is the right approach,
but generally you need ->f_lock.  And in situations where the bit can
go only off->on, check it lockless, skip the whole thing entirely if it's
already set and grab the spinlock otherwise.
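
A minimal userspace sketch of that pattern, under stated assumptions:
DEMO_DROP_PRIVS, struct demo_file and the demo_* helpers are hypothetical
names, and a C11 atomic_flag spin loop stands in for the kernel's
spin_lock()/spin_unlock() on ->f_lock. It is not the code from the patch.

```c
#include <stdatomic.h>

#define DEMO_DROP_PRIVS 0x1000u  /* hypothetical bit, not a real kernel flag */

/* Hypothetical stand-in for struct file. */
struct demo_file {
	atomic_flag f_lock;   /* stands in for the f_lock spinlock */
	atomic_uint f_flags;  /* other bits are only modified under f_lock */
};

static void demo_lock(struct demo_file *f)
{
	while (atomic_flag_test_and_set_explicit(&f->f_lock, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void demo_unlock(struct demo_file *f)
{
	atomic_flag_clear_explicit(&f->f_lock, memory_order_release);
}

/* Returns 1 if this call set the bit, 0 if it was already set. */
static int demo_mark_drop_privs(struct demo_file *f)
{
	unsigned int flags;
	int newly_set = 0;

	/* Lockless check: the bit only goes off->on, so "set" is final. */
	if (atomic_load_explicit(&f->f_flags, memory_order_relaxed) & DEMO_DROP_PRIVS)
		return 0;

	/* Slow path: take the lock so we don't race with other f_flags writers. */
	demo_lock(f);
	flags = atomic_load_explicit(&f->f_flags, memory_order_relaxed);
	if (!(flags & DEMO_DROP_PRIVS)) {
		atomic_store_explicit(&f->f_flags, flags | DEMO_DROP_PRIVS,
				      memory_order_relaxed);
		newly_set = 1;
	}
	demo_unlock(f);
	return newly_set;
}
```

Only the first mmap write to a given file pays for the lock; every later
call returns from the lockless check.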
Kees Cook Dec. 10, 2015, 9:45 p.m. UTC | #8
On Thu, Dec 10, 2015 at 12:27 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Dec 10, 2015 at 11:47:18AM -0800, Kees Cook wrote:
>
>> In open, sure, but what about under mm/memory.c where we're trying to
>> twiddle it from vma->vm_file->f_flags as in my patch? That seemed like it
>> would want atomic safety.
>
> Sigh...  Again, I'm not at all convinced that this is the right approach,

I'm open to any suggestions. Every path I've tried has been ultimately
blocked by mmap_sem. :(

> but generally you need ->f_lock.  And in situations where the bit can
> go only off->on, check it lockless, skip the whole thing entirely if it's
> already set and grab the spinlock otherwise.

And I can take f_lock safely under mmap_sem?

-Kees
Al Viro Dec. 10, 2015, 9:56 p.m. UTC | #9
On Thu, Dec 10, 2015 at 01:45:09PM -0800, Kees Cook wrote:
> > but generally you need ->f_lock.  And in situations where the bit can
> > go only off->on, check it lockless, skip the whole thing entirely if it's
> > already set and grab the spinlock otherwise.
> 
> And I can take f_lock safely under mmap_sem?

Are you asking whether it's safe to take a spinlock under an rwsem?
Kees Cook Dec. 10, 2015, 10 p.m. UTC | #10
On Thu, Dec 10, 2015 at 1:56 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Dec 10, 2015 at 01:45:09PM -0800, Kees Cook wrote:
>> > but generally you need ->f_lock.  And in situations where the bit can
>> > go only off->on, check it lockless, skip the whole thing entirely if it's
>> > already set and grab the spinlock otherwise.
>>
>> And I can take f_lock safely under mmap_sem?
>
> Are you asking whether it's safe to take a spinlock under an rwsem?

I keep running into surprises while trying to implement this change,
so yeah, I just want to make sure I won't waste my time adding the
spinlock to the patch.

-Kees
Patch

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..3a7eee76ea90 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -191,6 +191,17 @@  static void __fput(struct file *file)
 
 	might_sleep();
 
+	/*
+	 * XXX: While avoiding mmap_sem, we've already been written to.
+	 * We must ignore the return value, since we can't reject the
+	 * write.
+	 */
+	if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
+		mutex_lock(&inode->i_mutex);
+		file_remove_privs(file);
+		mutex_unlock(&inode->i_mutex);
+	}
+
 	fsnotify_close(file);
 	/*
 	 * The function eventpoll_release() should be the first called
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..409bd7047e7e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -913,3 +913,4 @@ 
 #define FL_OFDLCK       1024    /* lock is "owned" by struct file */
 #define FL_LAYOUT       2048    /* outstanding pNFS layout */
+#define FL_DROP_PRIVS   4096    /* lest something weird decides that 2 is OK */
 
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..08a77e0cf65f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2036,6 +2036,7 @@  static inline int wp_page_reuse(struct mm_struct *mm,
 
 		if (!page_mkwrite)
 			file_update_time(vma->vm_file);
+		vma->vm_file->f_flags |= FL_DROP_PRIVS;
 	}
 
 	return VM_FAULT_WRITE;