Message ID | 20150619223223.94775FFA@viggo.jf.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri 19-06-15 15:32:23, Dave Hansen wrote: > If I sit in a loop and do write()s to small tmpfs files, > __sb_end_write() is third-hottest kernel function due to its > smp_mb(). > > The stated purpose for the smp_mb() in __sb_end_write() is to > ensure "s_writers are updated before we wake up waiters". We > only wake up waiters if waitqueue_active(), but we do the > smp_mb() unconditionally. > > It seems like we should be able to avoid it unless we are > actually doing the wake_up(). ... > diff -puN fs/super.c~selectively-do-barriers-in-__sb_end_write fs/super.c > --- a/fs/super.c~selectively-do-barriers-in-__sb_end_write 2015-06-19 15:20:37.953726659 -0700 > +++ b/fs/super.c 2015-06-19 15:20:37.956726794 -0700 > @@ -1147,13 +1147,14 @@ out: > void __sb_end_write(struct super_block *sb, int level) > { > percpu_counter_dec(&sb->s_writers.counter[level-1]); > - /* > - * Make sure s_writers are updated before we wake up waiters in > - * freeze_super(). > - */ > - smp_mb(); > - if (waitqueue_active(&sb->s_writers.wait)) > + if (waitqueue_active(&sb->s_writers.wait)) { > + /* > + * Make sure other CPUs can see our s_writers update > + * before we wake up waiters in freeze_super(). > + */ > + smp_mb(); I think this is actually wrong. The barrier has to be before the waitqueue_active() check. Otherwise that read can be reordered before the percpu counter increment and a race window opens... But we could make things faster by something like: __sb_end_write() rcu_read_lock(); percpu_counter_dec(&sb->s_writers.counter[level-1]); if (unlikely(sb->s_writers.frozen >= level)) wake_up(&sb->s_writers.wait); rcu_read_unlock(); So the synchronize_rcu() calls you've added in the first patch will make sure that all __sb_end_write() calls after we've started the freeze procedure will end up calling wake_up() and so the process waiting in sb_wait_write() will be woken as necessary. But please add a detailed comment about the synchronization because its tricky and uncommon... Honza > wake_up(&sb->s_writers.wait); > + } > rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_); > } > EXPORT_SYMBOL(__sb_end_write); > _
diff -puN fs/super.c~selectively-do-barriers-in-__sb_end_write fs/super.c --- a/fs/super.c~selectively-do-barriers-in-__sb_end_write 2015-06-19 15:20:37.953726659 -0700 +++ b/fs/super.c 2015-06-19 15:20:37.956726794 -0700 @@ -1147,13 +1147,14 @@ out: void __sb_end_write(struct super_block *sb, int level) { percpu_counter_dec(&sb->s_writers.counter[level-1]); - /* - * Make sure s_writers are updated before we wake up waiters in - * freeze_super(). - */ - smp_mb(); - if (waitqueue_active(&sb->s_writers.wait)) + if (waitqueue_active(&sb->s_writers.wait)) { + /* + * Make sure other CPUs can see our s_writers update + * before we wake up waiters in freeze_super(). + */ + smp_mb(); wake_up(&sb->s_writers.wait); + } rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_); } EXPORT_SYMBOL(__sb_end_write);