Message ID | 20150619223223.10B658AD@viggo.jf.intel.com (mailing list archive) |
---|---|
State | New, archived |

On Fri 19-06-15 15:32:23, Dave Hansen wrote:
>
> Currently, __sb_start_write() and freeze_super() can race with
> each other. __sb_start_write() uses a smp_mb() to ensure that
> freeze_super() can see its write to sb->s_writers.counter and
> that it can see freeze_super()'s update to sb->s_writers.frozen.
> This all seems to work fine.
>
> But, this smp_mb() makes __sb_start_write() the single hottest
> function in the kernel if I sit in a loop and do tiny write()s to
> tmpfs over and over. This is on a very small 2-core system, so
> it will only get worse on larger systems.
>
> This _seems_ like an ideal case for RCU. __sb_start_write() is
> the RCU read-side and is in a very fast, performance-sensitive
> path. freeze_super() is the RCU writer and is in an extremely
> rare non-performance-sensitive path.
>
> Instead of doing an smp_mb() in __sb_start_write(), we do
> rcu_read_lock(). This ensures that a CPU doing freeze_super()
> can not proceed past its synchronize_rcu() until the grace
> period has ended and the 's_writers.frozen = SB_FREEZE_WRITE'
> is visible to __sb_start_write().
>
> One question here: Does the work that __sb_start_write() does in
> a previous grace period become visible to freeze_super() after
> its call to synchronize_rcu()? It _seems_ like it should, but it
> seems backwards to me since __sb_start_write() is the "reader" in
> this case.
>
> This patch increases the number of writes/second that I can do
> by 10.4%.
>
> Does anybody see any holes with this?

Nice speed up and looks good to me. Just one question below.

> @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
>  			printk(KERN_ERR
>  				"VFS:Filesystem freeze failed\n");
>  			sb->s_writers.frozen = SB_UNFROZEN;
> -			smp_wmb();
> +			synchronize_rcu();

Do we really need synchronize_rcu() here? We just need to make sure write
to sb->s_writers.frozen happens before we start waking processes...

>  			wake_up(&sb->s_writers.wait_unfrozen);
>  			deactivate_locked_super(sb);
>  			return ret;
> @@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb)
>
>  out:
>  	sb->s_writers.frozen = SB_UNFROZEN;
> -	smp_wmb();
> +	synchronize_rcu();
>  	wake_up(&sb->s_writers.wait_unfrozen);

And here as well...

Honza

On 06/23/2015 04:09 AM, Jan Kara wrote:
>> @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
>>  			printk(KERN_ERR
>>  				"VFS:Filesystem freeze failed\n");
>>  			sb->s_writers.frozen = SB_UNFROZEN;
>> -			smp_wmb();
>> +			synchronize_rcu();
>
> Do we really need synchronize_rcu() here? We just need to make sure write
> to sb->s_writers.frozen happens before we start waking processes...

I don't think it is necessary. We only need to be concerned in practice if
someone could be inside a critical section when we are executing this. I
*think* the only case that we have that really matters will be taken care
of by the _first_ synchronize_rcu().

It's definitely worth adding a comment.

diff -puN fs/super.c~rcu-__sb_start_write fs/super.c
--- a/fs/super.c~rcu-__sb_start_write	2015-06-19 14:50:53.081869092 -0700
+++ b/fs/super.c	2015-06-19 15:19:03.000473047 -0700
@@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s
  */
 int __sb_start_write(struct super_block *sb, int level, bool wait)
 {
-retry:
-	if (unlikely(sb->s_writers.frozen >= level)) {
+	/*
+	 * RCU keeps freeze_super() from proceeding
+	 * while we are in here.
+	 */
+	rcu_read_lock();
+	while (unlikely(sb->s_writers.frozen >= level)) {
+		rcu_read_unlock();
 		if (!wait)
-			return 0;
+			return 0;
 		wait_event(sb->s_writers.wait_unfrozen,
 			   sb->s_writers.frozen < level);
+		rcu_read_lock();
 	}
 
 #ifdef CONFIG_LOCKDEP
 	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
 #endif
 	percpu_counter_inc(&sb->s_writers.counter[level-1]);
-	/*
-	 * Make sure counter is updated before we check for frozen.
-	 * freeze_super() first sets frozen and then checks the counter.
-	 */
-	smp_mb();
-	if (unlikely(sb->s_writers.frozen >= level)) {
-		__sb_end_write(sb, level);
-		goto retry;
-	}
+	rcu_read_unlock();
 	return 1;
 }
 EXPORT_SYMBOL(__sb_start_write);
@@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb)
 
 	/* From now on, no new normal writers can start */
 	sb->s_writers.frozen = SB_FREEZE_WRITE;
-	smp_wmb();
+	/*
+	 * After we synchronize_rcu(), we have ensured that everyone
+	 * who reads sb->s_writers.frozen under rcu_read_lock() can
+	 * now see our update. This pretty much means that
+	 * __sb_start_write() will not allow any new writers.
+	 */
+	synchronize_rcu();
 
 	/* Release s_umount to preserve sb_start_write -> s_umount ordering */
 	up_write(&sb->s_umount);
@@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb)
 	/* Now we go and block page faults... */
 	down_write(&sb->s_umount);
 	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
-	smp_wmb();
+	synchronize_rcu();
 
 	sb_wait_write(sb, SB_FREEZE_PAGEFAULT);
 
@@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb)
 
 	/* Now wait for internal filesystem counter */
 	sb->s_writers.frozen = SB_FREEZE_FS;
-	smp_wmb();
+	synchronize_rcu();
 	sb_wait_write(sb, SB_FREEZE_FS);
 
 	if (sb->s_op->freeze_fs) {
@@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
 			sb->s_writers.frozen = SB_UNFROZEN;
-			smp_wmb();
+			synchronize_rcu();
 			wake_up(&sb->s_writers.wait_unfrozen);
 			deactivate_locked_super(sb);
 			return ret;
@@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb)
 
 out:
 	sb->s_writers.frozen = SB_UNFROZEN;
-	smp_wmb();
+	synchronize_rcu();
 	wake_up(&sb->s_writers.wait_unfrozen);
 	deactivate_locked_super(sb);
 
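
For readers following the barrier discussion, here is a rough userspace sketch of the pre-patch handshake that the diff removes from __sb_start_write(): the writer bumps a counter, issues a full barrier, then re-checks the frozen state, while the freezer sets the state and then looks at the counter. This is not kernel code. `struct writers_model` and the `model_*` helpers are hypothetical names, C11 `atomic_thread_fence(memory_order_seq_cst)` only stands in for smp_mb(), a single counter replaces the per-CPU counter, and the full fence on the freezer side is an assumption (the textbook pairing), not a claim about what fs/super.c does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's freeze levels. */
enum { SB_UNFROZEN = 0, SB_FREEZE_WRITE = 1 };

/* Hypothetical, simplified model of sb->s_writers (one level only). */
struct writers_model {
	atomic_int frozen;	/* models s_writers.frozen */
	atomic_long counter;	/* models s_writers.counter[level-1] */
};

/* Models __sb_end_write(): drop this writer's reference. */
static void model_end_write(struct writers_model *w)
{
	long old = atomic_fetch_sub_explicit(&w->counter, 1,
					     memory_order_release);
	assert(old > 0);	/* must pair with a successful start */
}

/*
 * Models the pre-patch __sb_start_write() with wait == false.
 * Returns true if the caller may go ahead and write.
 */
static bool model_start_write(struct writers_model *w, int level)
{
	if (atomic_load_explicit(&w->frozen, memory_order_relaxed) >= level)
		return false;

	atomic_fetch_add_explicit(&w->counter, 1, memory_order_relaxed);
	/* Stands in for smp_mb(): counter update before the re-check. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&w->frozen, memory_order_relaxed) >= level) {
		model_end_write(w);	/* raced with the freezer: back out */
		return false;
	}
	return true;
}

/*
 * Models the freezer side: publish the new state, then sample the
 * counter. Returns true if no writer is currently in flight.
 */
static bool model_freeze_begin(struct writers_model *w, int level)
{
	atomic_store_explicit(&w->frozen, level, memory_order_relaxed);
	/* Assumption: a full fence here, pairing with the writer's fence. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&w->counter, memory_order_relaxed) == 0;
}
```

With a fence on both sides, either the writer's re-check sees the frozen state or the freezer's read sees the writer's increment, so the two cannot both miss each other. The patch above drops the writer-side barrier and instead pays for a synchronize_rcu() on the rare freeze path.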