Message ID | 20240502223341.1835070-5-keescook@chromium.org |
---|---|
State | New, archived |
Series | fs: Do not allow get_file() to resurrect 0 f_count |
On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> Underflow of f_count needs to be more carefully detected than it
> currently is. The results of get_file() should be checked, but the
> first step is detection. Redefine f_count from atomic_long_t to
> refcount_long_t.

It is used on fairly hot paths. What's more, it's not at all obvious
what the hell would the right semantics be.

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>
On Thu, May 02, 2024 at 11:42:50PM +0100, Al Viro wrote:
> It is used on fairly hot paths. What's more, it's not at all obvious
> what the hell would the right semantics be.

I think we've put performance concerns between refcount_t and atomic_t
to rest long ago. If there is a real workload where it's a problem,
let's find it! :)

As for semantics, what do you mean? Detecting dec-below-zero means we
catch underflow, and detected inc-from-zero means we catch resurrection
attempts. In both cases we avoid double-free, but we have already lost
to a potential dangling reference to a freed struct file. But just
letting f_count go bad seems dangerous.
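The two checks Kees describes (dec-below-zero and inc-from-zero) can be modeled in userspace. The sketch below is a simplified C11 model, not the kernel's refcount_t: `struct model_ref`, the `model_*` functions and `MODEL_SATURATED` are hypothetical names. It assumes the strategy refcount_t uses on either event: report the bug and pin the counter at a negative saturation value, so the object is leaked instead of being freed a second time.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of a saturating reference count.
 * Not the kernel's refcount_t implementation.
 */
#define MODEL_SATURATED (LONG_MIN / 2)

struct model_ref {
	atomic_long count;
};

static int model_warnings; /* stands in for the kernel's WARN reports */

static void model_init(struct model_ref *r, long n)
{
	atomic_init(&r->count, n);
}

static long model_read(struct model_ref *r)
{
	return atomic_load_explicit(&r->count, memory_order_relaxed);
}

/* Report the bug and pin the counter so it can never reach zero again. */
static void model_saturate(struct model_ref *r, const char *why)
{
	atomic_store_explicit(&r->count, MODEL_SATURATED, memory_order_relaxed);
	model_warnings++;
	fprintf(stderr, "refcount model: %s\n", why);
}

/* Increment: one unconditional fetch_add, then inspect the old value. */
static void model_inc(struct model_ref *r)
{
	long old = atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);

	if (old == 0)
		model_saturate(r, "addition on 0; use-after-free");
	else if (old < 0 || old == LONG_MAX)
		model_saturate(r, "saturated; leaking memory");
}

/* Returns true only for the caller that dropped the last reference. */
static bool model_dec_and_test(struct model_ref *r)
{
	long old = atomic_fetch_sub_explicit(&r->count, 1, memory_order_release);

	if (old == 1) {
		atomic_thread_fence(memory_order_acquire);
		return true;    /* caller may now free the object */
	}
	if (old <= 0)       /* dec-below-zero, or already saturated */
		model_saturate(r, "underflow; use-after-free");
	return false;
}
```

In the model, once the counter is saturated neither an increment nor a decrement can bring it back to zero, which is the sense in which double-free is avoided while, as Kees notes, the dangling reference itself remains.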
On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> As for semantics, what do you mean? Detecting dec-below-zero means we
> catch underflow, and detected inc-from-zero means we catch resurrection
> attempts. In both cases we avoid double-free, but we have already lost
> to a potential dangling reference to a freed struct file. But just
> letting f_count go bad seems dangerous.

Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
in the middle of getting closed. And it's more subtle than that, actually,
thanks to SLAB_TYPESAFE_BY_RCU for struct file.
On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> in the middle of getting closed. And it's more subtle than that, actually,
> thanks to SLAB_TYPESAFE_BY_RCU for struct file.

But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
impossible for a simple get_file() to ever see a 0 f_count under normal
conditions?
On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> impossible for a simple get_file() to ever see a 0 f_count under normal
> conditions?

For get_file() it is impossible. The comment about semantics had been
about the sane ways to recover if such crap gets detected.

__get_file_rcu() is a separate story - consider the comment in there:
	 * atomic_long_inc_not_zero() above provided a full memory
	 * barrier when we acquired a reference.
	 *
	 * This is paired with the write barrier from assigning to the
	 * __rcu protected file pointer so that if that pointer still
	 * matches the current file, we know we have successfully
	 * acquired a reference to the right file.

and IIRC, refcount_t is weaker wrt barriers.
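A userspace model of the lookup pattern that comment describes may make Al's point concrete: take the reference with an inc-not-zero, then re-read the published pointer, and keep the reference only if it still names the same object. Under SLAB_TYPESAFE_BY_RCU the memory can be recycled for a different file between the first load and the increment, so the re-check is what shows the reference belongs to the right file, and it is only meaningful if the increment is ordered before the re-read. The sketch uses C11 atomics with a sequentially consistent compare-exchange to stand in for that full barrier. `struct model_file`, `model_get_file_rcu()` and the `MODEL_EAGAIN` sentinel are hypothetical; there is no real RCU or slab cache here, and dropping the reference is a plain decrement rather than fput().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; no RCU or slab cache involved. */
struct model_file {
	atomic_long f_count;
};

/* Sentinel standing in for ERR_PTR(-EAGAIN): "retry the lookup". */
static struct model_file model_retry;
#define MODEL_EAGAIN (&model_retry)

/* inc-not-zero as a CAS loop; seq_cst on success models the full
 * barrier that the __get_file_rcu() comment relies on. */
static bool model_inc_not_zero(atomic_long *v)
{
	long old = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (old == 0)
			return false;   /* object is dying or being recycled */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
							memory_order_seq_cst,
							memory_order_relaxed));
	return true;
}

static struct model_file *
model_get_file_rcu(_Atomic(struct model_file *) *slot)
{
	struct model_file *file, *reloaded;

	file = atomic_load_explicit(slot, memory_order_acquire);
	if (!file)
		return NULL;

	if (!model_inc_not_zero(&file->f_count))
		return MODEL_EAGAIN;

	/* Ordered after the increment: if the slot still holds the same
	 * object, the reference we took is to the file we looked up. */
	reloaded = atomic_load_explicit(slot, memory_order_acquire);
	if (reloaded == file)
		return file;

	/* Memory now belongs to another file: drop our reference and retry.
	 * Simplified: a real fput() would also free on the last put. */
	atomic_fetch_sub_explicit(&file->f_count, 1, memory_order_release);
	return MODEL_EAGAIN;
}
```

A single-threaded example cannot show the recycled-object case, but it does show the other two outcomes: a successful lookup takes a reference, and a lookup that finds f_count at 0 backs off without resurrecting the file.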
On Fri, May 03, 2024 at 12:41:52AM +0100, Al Viro wrote:
> For get_file() it is impossible. The comment about semantics had been
> about the sane ways to recover if such crap gets detected.
>
> __get_file_rcu() is a separate story - consider the comment in there:
> 	 * atomic_long_inc_not_zero() above provided a full memory
> 	 * barrier when we acquired a reference.
> 	 *
> 	 * This is paired with the write barrier from assigning to the
> 	 * __rcu protected file pointer so that if that pointer still
> 	 * matches the current file, we know we have successfully
> 	 * acquired a reference to the right file.
>
> and IIRC, refcount_t is weaker wrt barriers.

I think that was also fixed for refcount_t. I'll need to go dig out the
commit...

But anyway, there needs to be a general "oops I hit 0"-aware form of
get_file(), and it seems like it should just be get_file() itself...
On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> But anyway, there needs to be a general "oops I hit 0"-aware form of
> get_file(), and it seems like it should just be get_file() itself...

... which brings back the question of what's the sane damage mitigation
for that. Adding arseloads of never-exercised failure exits is generally
a bad idea - it's asking for bitrot and making the thing harder to review
in future.
On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> ... which brings back the question of what's the sane damage mitigation
> for that. Adding arseloads of never-exercised failure exits is generally
> a bad idea - it's asking for bitrot and making the thing harder to review
> in future.

Linus seems to prefer best-effort error recovery to sprinkling BUG()s
around. But if that's really the solution, then how about get_file()
switching to use inc_not_zero and BUG on 0?
On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> around. But if that's really the solution, then how about get_file()
> switching to use inc_not_zero and BUG on 0?

Making get_file() return an error is not an option. For all current
callers that's pointless churn for a condition that's not supposed to
happen at all.

Additionally, iirc *_inc_not_zero() variants are implemented with
try_cmpxchg() which scales poorly under contention for a condition
that's not supposed to happen.
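Christian's scalability point follows from how an inc-not-zero has to be built: it cannot blindly add, so it loads the counter, checks for zero, and publishes old + 1 with a compare-exchange, which fails and must be retried whenever another CPU changed the counter in between. The C11 sketch below counts those attempts. `model_inc_not_zero()`, `model_race_hook` and `model_concurrent_get()` are hypothetical; the hook injects exactly one concurrent increment, deterministically, because a single-threaded example cannot produce real contention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical hook: runs once, between our load and our first
 * compare-exchange, to simulate another CPU updating the counter.
 */
typedef void (*model_race_hook)(atomic_long *v);

/* Example race: a concurrent get_file() on another CPU. */
static void model_concurrent_get(atomic_long *v)
{
	atomic_fetch_add_explicit(v, 1, memory_order_relaxed);
}

/* inc-not-zero built from compare-exchange; *attempts counts CAS tries. */
static bool model_inc_not_zero(atomic_long *v, model_race_hook race,
			       int *attempts)
{
	long old = atomic_load_explicit(v, memory_order_relaxed);

	*attempts = 0;
	for (;;) {
		if (old == 0)
			return false;
		if (race && *attempts == 0)
			race(v);        /* simulated concurrent update */
		++*attempts;
		/* Strong CAS: fails only if the value really changed. */
		if (atomic_compare_exchange_strong_explicit(v, &old, old + 1,
							    memory_order_seq_cst,
							    memory_order_relaxed))
			return true;
		/* 'old' now holds the current value; re-check it and retry. */
	}
}
```

With the hook, the first compare-exchange fails because the counter moved from 2 to 3 underneath it, and a second attempt is needed; under real contention each failed attempt typically costs another trip for the cache line as well. A plain fetch-and-add, by contrast, always completes as a single read-modify-write at the C level.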
On Fri, May 03, 2024 at 11:37:25AM +0200, Christian Brauner wrote:
> Making get_file() return an error is not an option. For all current
> callers that's pointless churn for a condition that's not supposed to
> happen at all.
>
> Additionally, iirc *_inc_not_zero() variants are implemented with
> try_cmpxchg() which scales poorly under contention for a condition
> that's not supposed to happen.

	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
	WARN_ON(!old);

Or somesuch might be an option?
On Fri, May 03, 2024 at 12:36:14PM +0200, Peter Zijlstra wrote:
> 	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
> 	WARN_ON(!old);
>
> Or somesuch might be an option?

Yeah, I'd be fine with that. WARN_ON() (or WARN_ON_ONCE() even?) and
then people can do their panic_on_warn stuff to get the BUG_ON() behavior
if they want to.
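Peter's suggestion keeps get_file() infallible: one unconditional fetch-and-increment, with the old value used only for reporting. The sketch below models it in userspace C11. `struct model_file`, `model_get_file()` and `model_warn_on_once()` are hypothetical; the last one only imitates WARN_ON_ONCE() (the kernel latches once per call site, while the model uses one global flag, and nothing here consults panic_on_warn).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of the suggested get_file(). */
struct model_file {
	atomic_long f_count;
};

static int model_warn_count;   /* how many times the condition fired */
static bool model_warned;      /* WARN_ON_ONCE()-style latch */

/* Returns cond, like WARN_ON_ONCE(); only the first hit is printed. */
static bool model_warn_on_once(bool cond, const char *what)
{
	if (cond) {
		model_warn_count++;
		if (!model_warned) {
			model_warned = true;
			fprintf(stderr, "WARNING: %s\n", what);
		}
	}
	return cond;
}

static struct model_file *model_get_file(struct model_file *f)
{
	/* One unconditional RMW, no compare-exchange loop. Relaxed, as in
	 * the suggestion: the caller is expected to hold a reference. */
	long old = atomic_fetch_add_explicit(&f->f_count, 1,
					     memory_order_relaxed);

	model_warn_on_once(old == 0, "get_file() on a file with f_count == 0");
	return f;   /* no new error path for callers */
}
```

Note what the model does not do: after the warning the counter has still gone from 0 to 1, so the resurrection is reported but not prevented. That is the trade-off agreed on in the thread: no never-exercised failure exits for callers, no compare-exchange loop on the hot path, and anyone who wants a hard stop can turn the warning into a panic with panic_on_warn.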
diff --git a/fs/file.c b/fs/file.c
index 3b683b9101d8..570424dd634b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -865,7 +865,7 @@ static struct file *__get_file_rcu(struct file __rcu **f)
 	if (!file)
 		return NULL;
 
-	if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+	if (unlikely(!refcount_long_inc_not_zero(&file->f_count)))
 		return ERR_PTR(-EAGAIN);
 
 	file_reloaded = rcu_dereference_raw(*f);
@@ -987,7 +987,7 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 * barrier. We only really need an 'acquire' one to
 		 * protect the loads below, but we don't have that.
 		 */
-		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+		if (unlikely(!refcount_long_inc_not_zero(&file->f_count)))
 			continue;
 
 		/*
diff --git a/fs/file_table.c b/fs/file_table.c
index 4f03beed4737..f29e7b94bca1 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -167,7 +167,7 @@ static int init_file(struct file *f, int flags, const struct cred *cred)
 	 * fget-rcu pattern users need to be able to handle spurious
 	 * refcount bumps we should reinitialize the reused file first.
 	 */
-	atomic_long_set(&f->f_count, 1);
+	refcount_long_set(&f->f_count, 1);
 	return 0;
 }
 
@@ -470,7 +470,7 @@ static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
 void fput(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
+	if (refcount_long_dec_and_test(&file->f_count)) {
 		struct task_struct *task = current;
 
 		if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
@@ -503,7 +503,7 @@ void fput(struct file *file)
  */
 void __fput_sync(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count))
+	if (refcount_long_dec_and_test(&file->f_count))
 		__fput(file);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 210bbbfe9b83..b8f6cce7c39d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1001,7 +1001,7 @@ struct file {
 	 */
 	spinlock_t		f_lock;
 	fmode_t			f_mode;
-	atomic_long_t		f_count;
+	refcount_long_t		f_count;
 	struct mutex		f_pos_lock;
 	loff_t			f_pos;
 	unsigned int		f_flags;
@@ -1038,7 +1038,7 @@ struct file_handle {
 
 static inline struct file *get_file(struct file *f)
 {
-	if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
+	if (unlikely(!refcount_long_inc_not_zero(&f->f_count)))
 		return NULL;
 	return f;
 }
@@ -1046,7 +1046,7 @@ static inline struct file *get_file(struct file *f)
 struct file *get_file_rcu(struct file __rcu **f);
 struct file *get_file_active(struct file **f);
 
-#define file_count(x)	atomic_long_read(&(x)->f_count)
+#define file_count(x)	refcount_long_read(&(x)->f_count)
 
 #define	MAX_NON_LFS	((1UL<<31) - 1)
 
Underflow of f_count needs to be more carefully detected than it
currently is. The results of get_file() should be checked, but the
first step is detection. Redefine f_count from atomic_long_t to
refcount_long_t.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/file.c          | 4 ++--
 fs/file_table.c    | 6 +++---
 include/linux/fs.h | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)