[v1,11/11] locks: give the blocked_hash its own spinlock

Message ID 1370056054-25449-12-git-send-email-jlayton@redhat.com (mailing list archive)
State New, archived

Commit Message

Jeff Layton June 1, 2013, 3:07 a.m. UTC
There's no reason we have to protect the blocked_hash and file_lock_list
with the same spinlock. With the tests I have, breaking it in two gives
a barely measurable performance benefit, but it seems reasonable to make
this locking as granular as possible.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 Documentation/filesystems/Locking |   16 ++++++++--------
 fs/locks.c                        |   17 ++++++++++-------
 2 files changed, 18 insertions(+), 15 deletions(-)

Comments

Stefan Metzmacher June 4, 2013, 2:19 p.m. UTC | #1
Hi Jeff,

> There's no reason we have to protect the blocked_hash and file_lock_list
> with the same spinlock. With the tests I have, breaking it in two gives
> a barely measurable performance benefit, but it seems reasonable to make
> this locking as granular as possible.

As file_lock_{list,lock} are only used for debugging (/proc/locks) after this
change, I guess it would be possible to use RCU instead of a spinlock.
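
For illustration, here is a minimal sketch (not part of the posted series) of
what that RCU conversion could look like: writers would still serialize on a
spinlock, but the /proc/locks side could walk the list under rcu_read_lock()
alone. It assumes the fs/locks.c context (a struct file_lock with an fl_link
hlist_node); the reader helper is hypothetical.

/* writers still serialize on the spinlock; readers rely on RCU */
static DEFINE_SPINLOCK(file_lock_lock);
static HLIST_HEAD(file_lock_list);

static void locks_insert_global_locks(struct file_lock *fl)
{
	spin_lock(&file_lock_lock);
	hlist_add_head_rcu(&fl->fl_link, &file_lock_list);
	spin_unlock(&file_lock_lock);
}

static void locks_delete_global_locks(struct file_lock *fl)
{
	spin_lock(&file_lock_lock);
	hlist_del_init_rcu(&fl->fl_link);
	spin_unlock(&file_lock_lock);
	/* fl must not be freed until an RCU grace period has elapsed,
	 * e.g. by freeing it from an RCU callback */
}

/* hypothetical lockless reader, roughly what /proc/locks iteration would do */
static void file_lock_list_walk(void (*show)(struct file_lock *))
{
	struct file_lock *fl;

	rcu_read_lock();
	hlist_for_each_entry_rcu(fl, &file_lock_list, fl_link)
		show(fl);
	rcu_read_unlock();
}

As the IRC log below notes, this list is modified on every lock and unlock and
read almost never, so the extra cost on the write side is why RCU measured
slower in Jeff's testing.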

@others: this was the related discussion on IRC
(http://irclog.samba.org/) about this:

16:02 < metze> jlayton: do you have time to discuss your file_lock_lock
changes?
16:02 < jlayton> metze: sure, what's up?
16:03 < jlayton> metze: note that it won't help vl's thundering herd
problems...
16:03 < metze> is it correct that after your last patch file_lock_lock
is only used for /proc/locks?
16:03 < jlayton> well, it's only used to protect the list that is used
for /proc/locks
16:04 < jlayton> it still gets taken whenever a lock is acquired or
released in order to manipulate that list
16:04 < metze> would it be a good idea to use rcu instead of a spin lock?
16:04 < jlayton> I tried using RCU, but it turned out to slow everything
down
16:04 < jlayton> this is not a read-mostly workload unfortunately
16:04 < jlayton> so doing it with mutual exclusion turns out to be faster
16:04 < metze> ok
16:05 < jlayton> I might play around with it again sometime, but I don't
think it really helps. What we need to ensure is
                 that we optimize the code that manipulates that list,
and RCU list manipulations have larger overhead
16:06 < jlayton> metze: that's a good question though so if you want to
ask it on the list, please do
16:06 < jlayton> others will probably be wondering the same thing
16:08 < metze> maybe it's worth a comment in commit message and the code
16:08 < metze> btw, why don't you remove the ' /* Protects the
file_lock_list and the blocked_hash */' comment?

metze
Jeff Layton June 4, 2013, 2:39 p.m. UTC | #2
On Tue, 04 Jun 2013 16:19:53 +0200
"Stefan (metze) Metzmacher" <metze@samba.org> wrote:

> Hi Jeff,
> 
> > There's no reason we have to protect the blocked_hash and file_lock_list
> > with the same spinlock. With the tests I have, breaking it in two gives
> > a barely measurable performance benefit, but it seems reasonable to make
> > this locking as granular as possible.
> 
> As file_lock_{list,lock} are only used for debugging (/proc/locks) after this
> change, I guess it would be possible to use RCU instead of a spinlock.
> 
> @others: this was the related discussion on IRC
> (http://irclog.samba.org/) about this:
> 
> 16:02 < metze> jlayton: do you have time to discuss your file_lock_lock
> changes?
> 16:02 < jlayton> metze: sure, what's up?
> 16:03 < jlayton> metze: note that it won't help vl's thundering herd
> problems...
> 16:03 < metze> is it correct that after your last patch file_lock_lock
> is only used for /proc/locks?
> 16:03 < jlayton> well, it's only used to protect the list that is used
> for /proc/locks
> 16:04 < jlayton> it still gets taken whenever a lock is acquired or
> released in order to manipulate that list
> 16:04 < metze> would it be a good idea to use rcu instead of a spin lock?
> 16:04 < jlayton> I tried using RCU, but it turned out to slow everything
> down
> 16:04 < jlayton> this is not a read-mostly workload unfortunately
> 16:04 < jlayton> so doing it with mutual exclusion turns out to be faster
> 16:04 < metze> ok
> 16:05 < jlayton> I might play around with it again sometime, but I don't
> think it really helps. What we need to ensure is
>                  that we optimize the code that manipulates that list,
> and RCU list manipulations have larger overhead
> 16:06 < jlayton> metze: that's a good question though so if you want to
> ask it on the list, please do
> 16:06 < jlayton> others will probably be wondering the same thing
> 16:08 < metze> maybe it's worth a comment in commit message and the code


I'm not sure it's worth commenting about RCU in the code. In the future
it may be possible to do this -- who knows. It does seem to have been
consistently slower in my testing here though.

> 16:08 < metze> btw, why don't you remove the ' /* Protects the
> file_lock_list and the blocked_hash */' comment?
> 

Removed in my git tree -- thanks for pointing that out.
Christoph Hellwig June 4, 2013, 2:46 p.m. UTC | #3
Having RCU for modification-mostly workloads is never a good idea, so
I don't think it makes sense to mention it here.

If you care about the overhead it's worth trying to use per-cpu lists,
though.

J. Bruce Fields June 4, 2013, 2:53 p.m. UTC | #4
On Tue, Jun 04, 2013 at 07:46:40AM -0700, Christoph Hellwig wrote:
> Having RCU for modification-mostly workloads is never a good idea, so
> I don't think it makes sense to mention it here.
> 
> If you care about the overhead it's worth trying to use per-cpu lists,
> though.

Yes.  The lock and unlock could happen on different CPUs--but I think
you can make the rule that the lock stays associated with the list it
was first put on, and then it's correct in general and hopefully quick
in the common case.

--b.
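
For illustration, a rough sketch (not from the posted series) of the per-cpu
arrangement described above: each CPU gets its own list and lock, and the
file_lock records which CPU's list it was inserted on so that removal takes
the matching lock even when it runs on a different CPU. The struct
file_lock_bucket type and fl_link_cpu field are hypothetical; each bucket's
spinlock would need spin_lock_init() at init time, and /proc/locks would then
have to visit every CPU's list.

struct file_lock_bucket {
	spinlock_t		lock;
	struct hlist_head	head;
};

static DEFINE_PER_CPU(struct file_lock_bucket, file_lock_buckets);

static void locks_insert_global_locks(struct file_lock *fl)
{
	struct file_lock_bucket *b;
	int cpu = get_cpu();		/* stay on one CPU while picking a list */

	fl->fl_link_cpu = cpu;		/* remember the "home" list */
	b = per_cpu_ptr(&file_lock_buckets, cpu);
	spin_lock(&b->lock);
	hlist_add_head(&fl->fl_link, &b->head);
	spin_unlock(&b->lock);
	put_cpu();
}

static void locks_delete_global_locks(struct file_lock *fl)
{
	/* may run on another CPU; use the list the lock was first put on */
	struct file_lock_bucket *b = per_cpu_ptr(&file_lock_buckets, fl->fl_link_cpu);

	spin_lock(&b->lock);
	hlist_del_init(&fl->fl_link);
	spin_unlock(&b->lock);
}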
Jeff Layton June 4, 2013, 2:56 p.m. UTC | #5
On Tue, 4 Jun 2013 07:46:40 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> Having RCU for modification-mostly workloads is never a good idea, so
> I don't think it makes sense to mention it here.
> 
> If you care about the overhead it's worth trying to use per-cpu lists,
> though.
> 

Yeah, I looked at those too in an earlier set and it did help some.
Moving to a hashtable for the blocked_list really seemed to help the
most there, but percpu lists with lglocks or something might help a lot
on the file_lock_list.
Jeff Layton June 4, 2013, 3:15 p.m. UTC | #6
On Tue, 4 Jun 2013 10:53:22 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Jun 04, 2013 at 07:46:40AM -0700, Christoph Hellwig wrote:
> > Having RCU for modification-mostly workloads is never a good idea, so
> > I don't think it makes sense to mention it here.
> > 
> > If you care about the overhead it's worth trying to use per-cpu lists,
> > though.
> 
> Yes.  The lock and unlock could happen on different CPUs--but I think
> you can make the rule that the lock stays associated with the list it
> was first put on, and then it's correct in general and hopefully quick
> in the common case.
> 

It's important to distinguish between the blocked_list/hash and the
file_lock_list. Yes, they use the same embedded list_head or hlist_node
in the file_lock, but they have very different characteristics and use
cases.

In the testing I did, having a hashtable for the blocked locks helped a
lot more than a percpu list. The trick with deadlock detection is to
ensure that you don't spend a lot of time walking the lists. Since we
do deadlock detection frequently, we need to optimize for that case.

For the file_lock_list, it might make sense to have percpu hlists with
an lglock however. The thing to note here is that the file_lock_list is
almost never read. Only /proc/locks uses it, so anything we can do to
optimize the list manipulation is probably worth it.

All that said, I'm leery about changing too much of this code too fast.
It's pretty old and poorly understood, so I think we need to be
cautious and take an incremental approach to changing it.

Patch

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index ee351ac..8d8d040 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -359,20 +359,20 @@  prototypes:
 
 locking rules:
 
-			inode->i_lock	file_lock_lock	may block
-lm_compare_owner:	yes		maybe		no
-lm_owner_key		yes		yes		no
-lm_notify:		yes		no		no
-lm_grant:		no		no		no
-lm_break:		yes		no		no
-lm_change		yes		no		no
+			inode->i_lock	blocked_hash_lock	may block
+lm_compare_owner:	yes		maybe			no
+lm_owner_key		yes		yes			no
+lm_notify:		yes		no			no
+lm_grant:		no		no			no
+lm_break:		yes		no			no
+lm_change		yes		no			no
 
 	->lm_compare_owner and ->lm_owner_key are generally called with
 *an* inode->i_lock held. It may not be the i_lock of the inode
 associated with either file_lock argument! This is the case with deadlock
 detection, since the code has to chase down the owners of locks that may
 be entirely unrelated to the one on which the lock is being acquired.
-For deadlock detection however, the file_lock_lock is also held. The
+For deadlock detection however, the blocked_hash_lock is also held. The
 fact that these locks are held ensures that the file_locks do not
 disappear out from under you while doing the comparison or generating an
 owner key.
diff --git a/fs/locks.c b/fs/locks.c
index 8219187..520f32b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -172,12 +172,13 @@  int lease_break_time = 45;
  */
 #define BLOCKED_HASH_BITS	7
 
+static DEFINE_SPINLOCK(blocked_hash_lock);
 static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
 
+static DEFINE_SPINLOCK(file_lock_lock);
 static HLIST_HEAD(file_lock_list);
 
 /* Protects the file_lock_list and the blocked_hash */
-static DEFINE_SPINLOCK(file_lock_lock);
 
 static struct kmem_cache *filelock_cache __read_mostly;
 
@@ -503,17 +504,17 @@  posix_owner_key(struct file_lock *fl)
 static inline void
 locks_insert_global_blocked(struct file_lock *waiter)
 {
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_hash_lock);
 	hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_hash_lock);
 }
 
 static inline void
 locks_delete_global_blocked(struct file_lock *waiter)
 {
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_hash_lock);
 	hash_del(&waiter->fl_link);
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_hash_lock);
 }
 
 static inline void
@@ -739,7 +740,7 @@  static int posix_locks_deadlock(struct file_lock *caller_fl,
 	int i = 0;
 	int ret = 0;
 
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_hash_lock);
 	while ((block_fl = what_owner_is_waiting_for(block_fl))) {
 		if (i++ > MAX_DEADLK_ITERATIONS)
 			break;
@@ -748,7 +749,7 @@  static int posix_locks_deadlock(struct file_lock *caller_fl,
 			break;
 		}
 	}
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_hash_lock);
 	return ret;
 }
 
@@ -2300,10 +2301,12 @@  static int locks_show(struct seq_file *f, void *v)
 
 	lock_get_status(f, fl, *((loff_t *)f->private), "");
 
+	spin_lock(&blocked_hash_lock);
 	hash_for_each(blocked_hash, bkt, bfl, fl_link) {
 		if (bfl->fl_next == fl)
 			lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
 	}
+	spin_unlock(&blocked_hash_lock);
 
 	return 0;
 }