Message ID | 20151210070635.GC31922@1wt.eu (mailing list archive) |
---|---|
State | New, archived |
On Thu, Dec 10, 2015 at 08:06:35AM +0100, Willy Tarreau wrote:
> Hi Kees,
>
> Why not add a new file flag instead ?
>
> Something like this (editing your patch by hand to illustrate) :
(...)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3aa514254161..409bd7047e7e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -913,3 +913,4 @@
>  #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
>  #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> +#define FL_DROP_PRIVS 4096 /* lest something weird decides that 2 is OK */

Crap, these ones are for locks, we need to use O_* instead. But anyway you
get the idea, I mean there are probably many spare bits over there.

Another option I was thinking about was to change f_mode and detect the
change on close. But I don't know what to compare it against.

Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Wed, Dec 9, 2015 at 11:06 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Kees,
>
> Why not add a new file flag instead ?
>
> Something like this (editing your patch by hand to illustrate) :
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index ad17e05ebf95..3a7eee76ea90 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -191,6 +191,17 @@ static void __fput(struct file *file)
>
>  	might_sleep();
>
> +	/*
> +	 * XXX: While avoiding mmap_sem, we've already been written to.
> +	 * We must ignore the return value, since we can't reject the
> +	 * write.
> +	 */
> +	if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
> +		mutex_lock(&inode->i_mutex);
> +		file_remove_privs(file);
> +		mutex_unlock(&inode->i_mutex);
> +	}
> +
>  	fsnotify_close(file);
>  	/*
>  	 * The function eventpoll_release() should be the first called
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3aa514254161..409bd7047e7e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -913,3 +913,4 @@
>  #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
>  #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> +#define FL_DROP_PRIVS 4096 /* lest something weird decides that 2 is OK */
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c387430f06c3..08a77e0cf65f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>
>  		if (!page_mkwrite)
>  			file_update_time(vma->vm_file);
> +		vma->vm_file->f_flags |= FL_DROP_PRIVS;
>  	}
>
>  	return VM_FAULT_WRITE;
>
> Willy
>

Is f_flags safe to write like this without holding a lock?

-Kees

On Thu, Dec 10, 2015 at 10:05:50AM -0800, Kees Cook wrote:
> On Wed, Dec 9, 2015 at 11:06 PM, Willy Tarreau <w@1wt.eu> wrote:
> > Hi Kees,
> >
> > Why not add a new file flag instead ?
> >
> > Something like this (editing your patch by hand to illustrate) :
> >
> > diff --git a/fs/file_table.c b/fs/file_table.c
> > index ad17e05ebf95..3a7eee76ea90 100644
> > --- a/fs/file_table.c
> > +++ b/fs/file_table.c
> > @@ -191,6 +191,17 @@ static void __fput(struct file *file)
> >
> >  	might_sleep();
> >
> > +	/*
> > +	 * XXX: While avoiding mmap_sem, we've already been written to.
> > +	 * We must ignore the return value, since we can't reject the
> > +	 * write.
> > +	 */
> > +	if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
> > +		mutex_lock(&inode->i_mutex);
> > +		file_remove_privs(file);
> > +		mutex_unlock(&inode->i_mutex);
> > +	}
> > +
> >  	fsnotify_close(file);
> >  	/*
> >  	 * The function eventpoll_release() should be the first called
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 3aa514254161..409bd7047e7e 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -913,3 +913,4 @@
> >  #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
> >  #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> > +#define FL_DROP_PRIVS 4096 /* lest something weird decides that 2 is OK */
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index c387430f06c3..08a77e0cf65f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
> >
> >  		if (!page_mkwrite)
> >  			file_update_time(vma->vm_file);
> > +		vma->vm_file->f_flags |= FL_DROP_PRIVS;
> >  	}
> >
> >  	return VM_FAULT_WRITE;
> >
> > Willy
> >
>
> Is f_flags safe to write like this without holding a lock?

Unfortunately I have no idea. I've seen places where it's written without
taking a lock such as in blkdev_open() and I don't think that this one is
called with a lock held.

The comment in fs.h says that spinlock f_lock is here to protect f_flags
(among others) and that it must not be taken from IRQ context. Thus I'd
think we "just" have to take it to remain safe. That would be just one
spinlock per first write via mmap() to a file, I don't know if that's
reasonable or not :-/

Willy

On Thu, Dec 10, 2015 at 10:16 AM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Dec 10, 2015 at 10:05:50AM -0800, Kees Cook wrote:
>> On Wed, Dec 9, 2015 at 11:06 PM, Willy Tarreau <w@1wt.eu> wrote:
>> > Hi Kees,
>> >
>> > Why not add a new file flag instead ?
>> >
>> > Something like this (editing your patch by hand to illustrate) :
>> >
>> > diff --git a/fs/file_table.c b/fs/file_table.c
>> > index ad17e05ebf95..3a7eee76ea90 100644
>> > --- a/fs/file_table.c
>> > +++ b/fs/file_table.c
>> > @@ -191,6 +191,17 @@ static void __fput(struct file *file)
>> >
>> >  	might_sleep();
>> >
>> > +	/*
>> > +	 * XXX: While avoiding mmap_sem, we've already been written to.
>> > +	 * We must ignore the return value, since we can't reject the
>> > +	 * write.
>> > +	 */
>> > +	if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
>> > +		mutex_lock(&inode->i_mutex);
>> > +		file_remove_privs(file);
>> > +		mutex_unlock(&inode->i_mutex);
>> > +	}
>> > +
>> >  	fsnotify_close(file);
>> >  	/*
>> >  	 * The function eventpoll_release() should be the first called
>> > diff --git a/include/linux/fs.h b/include/linux/fs.h
>> > index 3aa514254161..409bd7047e7e 100644
>> > --- a/include/linux/fs.h
>> > +++ b/include/linux/fs.h
>> > @@ -913,3 +913,4 @@
>> >  #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
>> >  #define FL_LAYOUT 2048 /* outstanding pNFS layout */
>> > +#define FL_DROP_PRIVS 4096 /* lest something weird decides that 2 is OK */
>> >
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index c387430f06c3..08a77e0cf65f 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>> >
>> >  		if (!page_mkwrite)
>> >  			file_update_time(vma->vm_file);
>> > +		vma->vm_file->f_flags |= FL_DROP_PRIVS;
>> >  	}
>> >
>> >  	return VM_FAULT_WRITE;
>> >
>> > Willy
>> >
>>
>> Is f_flags safe to write like this without holding a lock?
>
> Unfortunately I have no idea. I've seen places where it's written without
> taking a lock such as in blkdev_open() and I don't think that this one is
> called with a lock held.
>
> The comment in fs.h says that spinlock f_lock is here to protect f_flags
> (among others) and that it must not be taken from IRQ context. Thus I'd
> think we "just" have to take it to remain safe. That would be just one
> spinlock per first write via mmap() to a file, I don't know if that's
> reasonable or not :-/

Al, what's the best way forward here? I created a separate flag variable
so it could be used effectively write-only, with the read happening only
at final fput.

-Kees
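
The "set-only flag, read once at final fput" idea Kees describes can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code and not Kees's actual patch: `struct fake_file`, `fake_fget()`, `fake_fput()`, `mark_written()` and the `privs_dropped` field are all hypothetical names invented for this illustration, and C11 atomics stand in for whatever synchronization the kernel would really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct file; not the kernel structure. */
struct fake_file {
    atomic_int  refcount;         /* plays the role of the file reference count */
    atomic_bool needs_priv_drop;  /* separate flag: set during use, read at release */
    bool        privs_dropped;    /* records that the deferred work ran */
};

/* Write path: only ever sets the flag, never reads or clears it. */
static void mark_written(struct fake_file *f)
{
    atomic_store_explicit(&f->needs_priv_drop, true, memory_order_release);
}

/* Take an extra reference. */
static void fake_fget(struct fake_file *f)
{
    atomic_fetch_add(&f->refcount, 1);
}

/* Drop a reference; the flag is examined only when the last one goes away. */
static void fake_fput(struct fake_file *f)
{
    int old = atomic_fetch_sub(&f->refcount, 1);

    assert(old > 0);   /* catch an unbalanced put */
    if (old != 1)
        return;        /* other users remain, nothing to do yet */

    /* Final put: this is the single reader of the flag. */
    if (atomic_load_explicit(&f->needs_priv_drop, memory_order_acquire))
        f->privs_dropped = true;  /* stand-in for file_remove_privs() */
}
```

With two references held, calling `mark_written()` and then `fake_fput()` once leaves `privs_dropped` false; only the second, final `fake_fput()` performs the deferred work.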

On Thu, Dec 10, 2015 at 07:16:11PM +0100, Willy Tarreau wrote:
> > Is f_flags safe to write like this without holding a lock?
>
> Unfortunately I have no idea. I've seen places where it's written without
> taking a lock such as in blkdev_open() and I don't think that this one is
> called with a lock held.

In any ->open() we obviously have nobody else able to find that struct file,
let alone modify it, so there the damn thing is essentially caller-private
and no locking is needed.

On Thu, Dec 10, 2015 at 11:33 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Dec 10, 2015 at 07:16:11PM +0100, Willy Tarreau wrote:
>
>> > Is f_flags safe to write like this without holding a lock?
>>
>> Unfortunately I have no idea. I've seen places where it's written without
>> taking a lock such as in blkdev_open() and I don't think that this one is
>> called with a lock held.
>
> In any ->open() we obviously have nobody else able to find that struct file,
> let alone modify it, so there the damn thing is essentially caller-private
> and no locking is needed.

In open, sure, but what about under mm/memory.c where we're trying to
twiddle it from vma->file->f_flags as in my patch? That seemed like it
would want atomic safety.

-Kees

On Thu, Dec 10, 2015 at 11:47:18AM -0800, Kees Cook wrote:
> In open, sure, but what about under mm/memory.c where we're trying to
> twiddle it from vma->file->f_flags as in my patch? That seemed like it
> would want atomic safety.

Sigh... Again, I'm not at all convinced that this is the right approach,
but generally you need ->f_lock. And in situations where the bit can
go only off->on, check it lockless, skip the whole thing entirely if it's
already set and grab the spinlock otherwise.
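
The pattern Al describes (check the bit without the lock, return early if it is already set, otherwise take the lock and set it) might look roughly like the userspace sketch below. Everything here is an assumption made for illustration: `struct demo_file`, `DEMO_DROP_PRIVS`, `demo_lock()`, `demo_unlock()` and `set_sticky_flag()` are hypothetical names, and a tiny C11 `atomic_flag` spinlock stands in for the kernel's `->f_lock`. Real kernel code would presumably use `spin_lock(&file->f_lock)` and its own memory-access helpers instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_DROP_PRIVS 0x1000u  /* hypothetical bit, for illustration only */

/* Hypothetical userspace stand-in for struct file. */
struct demo_file {
    atomic_flag lock;   /* minimal spinlock, plays the role of ->f_lock */
    atomic_uint flags;  /* plays the role of ->f_flags */
};

static void demo_lock(struct demo_file *f)
{
    /* Spin until the flag was previously clear, i.e. we acquired it. */
    while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire))
        ;
}

static void demo_unlock(struct demo_file *f)
{
    atomic_flag_clear_explicit(&f->lock, memory_order_release);
}

/*
 * Set a bit that only ever goes off -> on.
 * Returns true if this call set it, false if it was already set.
 */
static bool set_sticky_flag(struct demo_file *f, unsigned int bit)
{
    unsigned int old;

    assert(bit != 0);

    /*
     * Lockless fast path: the bit never clears, so seeing it set is
     * always correct; a stale "not set" only costs a trip to the slow path.
     */
    if (atomic_load_explicit(&f->flags, memory_order_relaxed) & bit)
        return false;

    /* Slow path: read-modify-write under the lock so that concurrent
     * updates to other bits made under the same lock are not lost. */
    demo_lock(f);
    old = atomic_load_explicit(&f->flags, memory_order_relaxed);
    atomic_store_explicit(&f->flags, old | bit, memory_order_relaxed);
    demo_unlock(f);

    return !(old & bit);
}
```

The lockless check is only valid because the bit is sticky. If anything could clear it again, the early return could act on a value that is about to change, and the check would have to move under the lock.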

On Thu, Dec 10, 2015 at 12:27 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Dec 10, 2015 at 11:47:18AM -0800, Kees Cook wrote:
>
>> In open, sure, but what about under mm/memory.c where we're trying to
>> twiddle it from vma->file->f_flags as in my patch? That seemed like it
>> would want atomic safety.
>
> Sigh... Again, I'm not at all convinced that this is the right approach,

I'm open to any suggestions. Every path I've tried has been ultimately
blocked by mmap_sem. :(

> but generally you need ->f_lock. And in situations where the bit can
> go only off->on, check it lockless, skip the whole thing entirely if it's
> already set and grab the spinlock otherwise.

And I can take f_lock safely under mmap_sem?

-Kees

On Thu, Dec 10, 2015 at 01:45:09PM -0800, Kees Cook wrote:
> > but generally you need ->f_lock. And in situations where the bit can
> > go only off->on, check it lockless, skip the whole thing entirely if it's
> > already set and grab the spinlock otherwise.
>
> And I can take f_lock safely under mmap_sem?

Are you asking whether it's safe to take a spinlock under an rwsem?

On Thu, Dec 10, 2015 at 1:56 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Dec 10, 2015 at 01:45:09PM -0800, Kees Cook wrote:
>> > but generally you need ->f_lock. And in situations where the bit can
>> > go only off->on, check it lockless, skip the whole thing entirely if it's
>> > already set and grab the spinlock otherwise.
>>
>> And I can take f_lock safely under mmap_sem?
>
> Are you asking whether it's safe to take a spinlock under an rwsem?

I keep getting various surprises while trying to implement this change,
so yeah, I just want to make sure I won't waste my time adding taking the
spinlock to the patch.

-Kees

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..3a7eee76ea90 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -191,6 +191,17 @@ static void __fput(struct file *file)

 	might_sleep();

+	/*
+	 * XXX: While avoiding mmap_sem, we've already been written to.
+	 * We must ignore the return value, since we can't reject the
+	 * write.
+	 */
+	if (unlikely(file->f_flags & FL_DROP_PRIVS)) {
+		mutex_lock(&inode->i_mutex);
+		file_remove_privs(file);
+		mutex_unlock(&inode->i_mutex);
+	}
+
 	fsnotify_close(file);
 	/*
 	 * The function eventpoll_release() should be the first called
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..409bd7047e7e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -913,3 +913,4 @@
 #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
 #define FL_LAYOUT 2048 /* outstanding pNFS layout */
+#define FL_DROP_PRIVS 4096 /* lest something weird decides that 2 is OK */
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..08a77e0cf65f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,

 		if (!page_mkwrite)
 			file_update_time(vma->vm_file);
+		vma->vm_file->f_flags |= FL_DROP_PRIVS;
 	}

 	return VM_FAULT_WRITE;