Message ID: 153378028121.1220.4418653283078446336.stgit@noble (mailing list archive)
State: New, archived
Series: locks: avoid thundering-herd wake-ups
On Thu, 2018-08-09 at 12:04 +1000, NeilBrown wrote: > When we find an existing lock which conflicts with a request, > and the request wants to wait, we currently add the request > to a list. When the lock is removed, the whole list is woken. > This can cause the thundering-herd problem. > To reduce the problem, we make use of the (new) fact that > a pending request can itself have a list of blocked requests. > When we find a conflict, we look through the existing blocked requests. > If any one of them blocks the new request, the new request is attached > below that request. > This way, when the lock is released, only a set of non-conflicting > locks will be woken. The rest of the herd can stay asleep. > > Reported-and-tested-by: Martin Wilck <mwilck@suse.de> > Signed-off-by: NeilBrown <neilb@suse.com> > --- > fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 63 insertions(+), 6 deletions(-) > > diff --git a/fs/locks.c b/fs/locks.c > index fc64016d01ee..17843feb6f5b 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) > spin_unlock(&blocked_lock_lock); > } > > +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, > + enum conflict conflict(struct file_lock *, > + struct file_lock *)) > +{ > + struct file_lock *parent = waiter; > + struct file_lock *fl; > + struct file_lock *t; > + > + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); > +restart: > + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { > + switch (conflict(fl, blocker)) { > + default: BUG or WARN here too please. > + case FL_NO_CONFLICT: > + __locks_wake_one(fl); > + break; > + case FL_CONFLICT: > + /* Need to check children */ > + parent = fl; > + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); > + goto restart; > + case FL_TRANSITIVE_CONFLICT: > + /* all children must also conflict, no need to check */ > + continue; > + } > + } > + if (parent != waiter) { > + parent = parent->fl_blocker; > + fl = parent; > + goto restart; > + } > +} > + > /* Insert waiter into blocker's block list. > * We use a circular list so that processes can be easily woken up in > * the order they blocked. The documentation doesn't require this but > @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) > * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring > * that the flc_lock is also held on insertions we can avoid taking the > * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. > + * > + * Rather than just adding to the list, we check for conflicts with any existing > + * waiter, and add to that waiter instead. > + * Thus wakeups don't happen until needed. > */ > static void __locks_insert_block(struct file_lock *blocker, > - struct file_lock *waiter) > + struct file_lock *waiter, > + enum conflict conflict(struct file_lock *, > + struct file_lock *)) > { > + struct file_lock *fl; > BUG_ON(!list_empty(&waiter->fl_block)); > + > + /* Any request in waiter->fl_blocked is know to conflict with "known" > + * waiter, but it might not conflict with blocker. > + * If it doesn't, it needs to be woken now so it can find > + * somewhere else to wait, or possible it can get granted. 
"possibly it can be" > + */ > + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) > + wake_non_conflicts(waiter, blocker, conflict); > +new_blocker: > + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) > + if (conflict(fl, waiter)) { > + blocker = fl; > + goto new_blocker; > + } > > > waiter->fl_blocker = blocker; > list_add_tail(&waiter->fl_block, &blocker->fl_blocked); > if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) I wonder if it might be better to insert the blocker first before waking up other waiters? Consider that anything awoken will end up contending for the flc_lock that is held by "current" at this point. Doing most of what you need to get done before waking them might mean less spinning in other tasks. > @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, > > /* Must be called with flc_lock held. */ > static void locks_insert_block(struct file_lock *blocker, > - struct file_lock *waiter) > + struct file_lock *waiter, > + enum conflict conflict(struct file_lock *, > + struct file_lock *)) > { > spin_lock(&blocked_lock_lock); > - __locks_insert_block(blocker, waiter); > + __locks_insert_block(blocker, waiter, conflict); > spin_unlock(&blocked_lock_lock); > } > > @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) > if (!(request->fl_flags & FL_SLEEP)) > goto out; > error = FILE_LOCK_DEFERRED; > - locks_insert_block(fl, request); > + locks_insert_block(fl, request, flock_locks_conflict); > goto out; > } > if (request->fl_flags & FL_ACCESS) > @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, > spin_lock(&blocked_lock_lock); > if (likely(!posix_locks_deadlock(request, fl))) { > error = FILE_LOCK_DEFERRED; > - __locks_insert_block(fl, request); > + __locks_insert_block(fl, request, > + posix_locks_conflict); > } > spin_unlock(&blocked_lock_lock); > goto out; > @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) > break_time -= jiffies; > if (break_time == 0) > break_time++; > - locks_insert_block(fl, new_fl); > + locks_insert_block(fl, new_fl, leases_conflict); > trace_break_lease_block(inode, new_fl); > spin_unlock(&ctx->flc_lock); > percpu_up_read_preempt_enable(&file_rwsem); > >
On Thu, Aug 09, 2018 at 12:04:41PM +1000, NeilBrown wrote: > When we find an existing lock which conflicts with a request, > and the request wants to wait, we currently add the request > to a list. When the lock is removed, the whole list is woken. > This can cause the thundering-herd problem. > To reduce the problem, we make use of the (new) fact that > a pending request can itself have a list of blocked requests. > When we find a conflict, we look through the existing blocked requests. > If any one of them blocks the new request, the new request is attached > below that request. > This way, when the lock is released, only a set of non-conflicting > locks will be woken. The rest of the herd can stay asleep. That's not true any more--some of the locks you wake may conflict with each other. Is that right? Which is fine (the possibility of thundering herds in weird overlapping-range cases probably isn't a big deal). I just want to make sure I understand.... I think you could simplify the code a lot by maintaining the tree so that it always satisfies the condition that waiters are always strictly "weaker" than their descendents, so that finding a conflict with a waiter is always enough to know that the descendents also conflict. So, when you put a waiter to sleep, you don't add it below a child unless it's "stronger" than the child. You give up the property that siblings don't conflict, but again that just means thundering herds in weird cases, which is OK. --b. > > Reported-and-tested-by: Martin Wilck <mwilck@suse.de> > Signed-off-by: NeilBrown <neilb@suse.com> > --- > fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 63 insertions(+), 6 deletions(-) > > diff --git a/fs/locks.c b/fs/locks.c > index fc64016d01ee..17843feb6f5b 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) > spin_unlock(&blocked_lock_lock); > } > > +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, > + enum conflict conflict(struct file_lock *, > + struct file_lock *)) > +{ > + struct file_lock *parent = waiter; > + struct file_lock *fl; > + struct file_lock *t; > + > + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); > +restart: > + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { > + switch (conflict(fl, blocker)) { > + default: > + case FL_NO_CONFLICT: > + __locks_wake_one(fl); > + break; > + case FL_CONFLICT: > + /* Need to check children */ > + parent = fl; > + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); > + goto restart; > + case FL_TRANSITIVE_CONFLICT: > + /* all children must also conflict, no need to check */ > + continue; > + } > + } > + if (parent != waiter) { > + parent = parent->fl_blocker; > + fl = parent; > + goto restart; > + } > +} > + > /* Insert waiter into blocker's block list. > * We use a circular list so that processes can be easily woken up in > * the order they blocked. The documentation doesn't require this but > @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) > * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring > * that the flc_lock is also held on insertions we can avoid taking the > * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. > + * > + * Rather than just adding to the list, we check for conflicts with any existing > + * waiter, and add to that waiter instead. 
> + * Thus wakeups don't happen until needed. > */ > static void __locks_insert_block(struct file_lock *blocker, > - struct file_lock *waiter) > + struct file_lock *waiter, > + enum conflict conflict(struct file_lock *, > + struct file_lock *)) > { > + struct file_lock *fl; > BUG_ON(!list_empty(&waiter->fl_block)); > + > + /* Any request in waiter->fl_blocked is know to conflict with > + * waiter, but it might not conflict with blocker. > + * If it doesn't, it needs to be woken now so it can find > + * somewhere else to wait, or possible it can get granted. > + */ > + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) > + wake_non_conflicts(waiter, blocker, conflict); > +new_blocker: > + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) > + if (conflict(fl, waiter)) { > + blocker = fl; > + goto new_blocker; > + } > waiter->fl_blocker = blocker; > list_add_tail(&waiter->fl_block, &blocker->fl_blocked); > if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) > @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, > > /* Must be called with flc_lock held. */ > static void locks_insert_block(struct file_lock *blocker, > - struct file_lock *waiter) > + struct file_lock *waiter, > + enum conflict conflict(struct file_lock *, > + struct file_lock *)) > { > spin_lock(&blocked_lock_lock); > - __locks_insert_block(blocker, waiter); > + __locks_insert_block(blocker, waiter, conflict); > spin_unlock(&blocked_lock_lock); > } > > @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) > if (!(request->fl_flags & FL_SLEEP)) > goto out; > error = FILE_LOCK_DEFERRED; > - locks_insert_block(fl, request); > + locks_insert_block(fl, request, flock_locks_conflict); > goto out; > } > if (request->fl_flags & FL_ACCESS) > @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, > spin_lock(&blocked_lock_lock); > if (likely(!posix_locks_deadlock(request, fl))) { > error = FILE_LOCK_DEFERRED; > - __locks_insert_block(fl, request); > + __locks_insert_block(fl, request, > + posix_locks_conflict); > } > spin_unlock(&blocked_lock_lock); > goto out; > @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) > break_time -= jiffies; > if (break_time == 0) > break_time++; > - locks_insert_block(fl, new_fl); > + locks_insert_block(fl, new_fl, leases_conflict); > trace_break_lease_block(inode, new_fl); > spin_unlock(&ctx->flc_lock); > percpu_up_read_preempt_enable(&file_rwsem); >
On Thu, Aug 09 2018, J. Bruce Fields wrote: > On Thu, Aug 09, 2018 at 12:04:41PM +1000, NeilBrown wrote: >> When we find an existing lock which conflicts with a request, >> and the request wants to wait, we currently add the request >> to a list. When the lock is removed, the whole list is woken. >> This can cause the thundering-herd problem. >> To reduce the problem, we make use of the (new) fact that >> a pending request can itself have a list of blocked requests. >> When we find a conflict, we look through the existing blocked requests. >> If any one of them blocks the new request, the new request is attached >> below that request. >> This way, when the lock is released, only a set of non-conflicting >> locks will be woken. The rest of the herd can stay asleep. > > That's not true any more--some of the locks you wake may conflict > with each other. Is that right? Which is fine (the possibility of > thundering herds in weird overlapping-range cases probably isn't a big > deal). I just want to make sure I understand.... Yes, you are correct. Lock waiters will be woken if they were directly blocked by a lock that has been released, or if they were blocked (directly or indirectly) by a lock which is now blocked by a lock that they don't conflict with. The first set will be mutually non-conflicting. > > I think you could simplify the code a lot by maintaining the tree so > that it always satisfies the condition that waiters are always strictly > "weaker" than their descendents, so that finding a conflict with a > waiter is always enough to know that the descendents also conflict. Can you define "weaker" please. I suspect it is a partial ordering, in which case a tree would normally be more appropriate than trying to find a total ordering. Thanks, NeilBrown > > So, when you put a waiter to sleep, you don't add it below a child > unless it's "stronger" than the child. > > You give up the property that siblings don't conflict, but again that > just means thundering herds in weird cases, which is OK. > > --b. > >> >> Reported-and-tested-by: Martin Wilck <mwilck@suse.de> >> Signed-off-by: NeilBrown <neilb@suse.com> >> --- >> fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 63 insertions(+), 6 deletions(-) >> >> diff --git a/fs/locks.c b/fs/locks.c >> index fc64016d01ee..17843feb6f5b 100644 >> --- a/fs/locks.c >> +++ b/fs/locks.c >> @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) >> spin_unlock(&blocked_lock_lock); >> } >> >> +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, >> + enum conflict conflict(struct file_lock *, >> + struct file_lock *)) >> +{ >> + struct file_lock *parent = waiter; >> + struct file_lock *fl; >> + struct file_lock *t; >> + >> + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); >> +restart: >> + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { >> + switch (conflict(fl, blocker)) { >> + default: >> + case FL_NO_CONFLICT: >> + __locks_wake_one(fl); >> + break; >> + case FL_CONFLICT: >> + /* Need to check children */ >> + parent = fl; >> + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); >> + goto restart; >> + case FL_TRANSITIVE_CONFLICT: >> + /* all children must also conflict, no need to check */ >> + continue; >> + } >> + } >> + if (parent != waiter) { >> + parent = parent->fl_blocker; >> + fl = parent; >> + goto restart; >> + } >> +} >> + >> /* Insert waiter into blocker's block list. 
>> * We use a circular list so that processes can be easily woken up in >> * the order they blocked. The documentation doesn't require this but >> @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) >> * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring >> * that the flc_lock is also held on insertions we can avoid taking the >> * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. >> + * >> + * Rather than just adding to the list, we check for conflicts with any existing >> + * waiter, and add to that waiter instead. >> + * Thus wakeups don't happen until needed. >> */ >> static void __locks_insert_block(struct file_lock *blocker, >> - struct file_lock *waiter) >> + struct file_lock *waiter, >> + enum conflict conflict(struct file_lock *, >> + struct file_lock *)) >> { >> + struct file_lock *fl; >> BUG_ON(!list_empty(&waiter->fl_block)); >> + >> + /* Any request in waiter->fl_blocked is know to conflict with >> + * waiter, but it might not conflict with blocker. >> + * If it doesn't, it needs to be woken now so it can find >> + * somewhere else to wait, or possible it can get granted. >> + */ >> + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) >> + wake_non_conflicts(waiter, blocker, conflict); >> +new_blocker: >> + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) >> + if (conflict(fl, waiter)) { >> + blocker = fl; >> + goto new_blocker; >> + } >> waiter->fl_blocker = blocker; >> list_add_tail(&waiter->fl_block, &blocker->fl_blocked); >> if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) >> @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, >> >> /* Must be called with flc_lock held. */ >> static void locks_insert_block(struct file_lock *blocker, >> - struct file_lock *waiter) >> + struct file_lock *waiter, >> + enum conflict conflict(struct file_lock *, >> + struct file_lock *)) >> { >> spin_lock(&blocked_lock_lock); >> - __locks_insert_block(blocker, waiter); >> + __locks_insert_block(blocker, waiter, conflict); >> spin_unlock(&blocked_lock_lock); >> } >> >> @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) >> if (!(request->fl_flags & FL_SLEEP)) >> goto out; >> error = FILE_LOCK_DEFERRED; >> - locks_insert_block(fl, request); >> + locks_insert_block(fl, request, flock_locks_conflict); >> goto out; >> } >> if (request->fl_flags & FL_ACCESS) >> @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, >> spin_lock(&blocked_lock_lock); >> if (likely(!posix_locks_deadlock(request, fl))) { >> error = FILE_LOCK_DEFERRED; >> - __locks_insert_block(fl, request); >> + __locks_insert_block(fl, request, >> + posix_locks_conflict); >> } >> spin_unlock(&blocked_lock_lock); >> goto out; >> @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) >> break_time -= jiffies; >> if (break_time == 0) >> break_time++; >> - locks_insert_block(fl, new_fl); >> + locks_insert_block(fl, new_fl, leases_conflict); >> trace_break_lease_block(inode, new_fl); >> spin_unlock(&ctx->flc_lock); >> percpu_up_read_preempt_enable(&file_rwsem); >>
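To make the first point above concrete, an illustrative scenario (not from the thread; W[a-b] are write locks on bytes a..b held or requested by distinct owners):

    /*
     *  held W[1-100]             <- being released
     *    +- waiter W[1-50]       woken: its blocker has gone away
     *    |    +- waiter W[40-60]   stays asleep: conflicts with W[1-50]
     *    +- waiter W[60-100]     woken: it does not conflict with W[1-50],
     *                            so it was queued as a sibling
     *
     * W[1-50] and W[60-100] overlap nowhere, so the set woken when
     * W[1-100] is released is mutually non-conflicting, as stated above.
     */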
On Thu, Aug 09 2018, Jeff Layton wrote: > On Thu, 2018-08-09 at 12:04 +1000, NeilBrown wrote: >> When we find an existing lock which conflicts with a request, >> and the request wants to wait, we currently add the request >> to a list. When the lock is removed, the whole list is woken. >> This can cause the thundering-herd problem. >> To reduce the problem, we make use of the (new) fact that >> a pending request can itself have a list of blocked requests. >> When we find a conflict, we look through the existing blocked requests. >> If any one of them blocks the new request, the new request is attached >> below that request. >> This way, when the lock is released, only a set of non-conflicting >> locks will be woken. The rest of the herd can stay asleep. >> >> Reported-and-tested-by: Martin Wilck <mwilck@suse.de> >> Signed-off-by: NeilBrown <neilb@suse.com> >> --- >> fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 63 insertions(+), 6 deletions(-) >> >> diff --git a/fs/locks.c b/fs/locks.c >> index fc64016d01ee..17843feb6f5b 100644 >> --- a/fs/locks.c >> +++ b/fs/locks.c >> @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) >> spin_unlock(&blocked_lock_lock); >> } >> >> +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, >> + enum conflict conflict(struct file_lock *, >> + struct file_lock *)) >> +{ >> + struct file_lock *parent = waiter; >> + struct file_lock *fl; >> + struct file_lock *t; >> + >> + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); >> +restart: >> + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { >> + switch (conflict(fl, blocker)) { >> + default: > > BUG or WARN here too please. Maybe .... I'd rather not have the default case at all. I can remove this one, but if I remove the next one, gcc complains ../fs/locks.c: In function ‘posix_locks_conflict’: ../fs/locks.c:912:1: warning: control reaches end of non-void function [-Wreturn-type] even though control cannot reach the end of the function. Maybe: switch (locks_conflict(caller_fl, sys_fl)) { default: WARN(1, "locks_conflict returned impossible value"); /* fallthrough */ case FL_NO_CONFLICT: > >> + case FL_NO_CONFLICT: >> + __locks_wake_one(fl); >> + break; >> + case FL_CONFLICT: >> + /* Need to check children */ >> + parent = fl; >> + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); >> + goto restart; >> + case FL_TRANSITIVE_CONFLICT: >> + /* all children must also conflict, no need to check */ >> + continue; >> + } >> + } >> + if (parent != waiter) { >> + parent = parent->fl_blocker; >> + fl = parent; >> + goto restart; >> + } >> +} >> + >> /* Insert waiter into blocker's block list. >> * We use a circular list so that processes can be easily woken up in >> * the order they blocked. The documentation doesn't require this but >> @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) >> * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring >> * that the flc_lock is also held on insertions we can avoid taking the >> * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. >> + * >> + * Rather than just adding to the list, we check for conflicts with any existing >> + * waiter, and add to that waiter instead. >> + * Thus wakeups don't happen until needed. 
>> */ >> static void __locks_insert_block(struct file_lock *blocker, >> - struct file_lock *waiter) >> + struct file_lock *waiter, >> + enum conflict conflict(struct file_lock *, >> + struct file_lock *)) >> { >> + struct file_lock *fl; >> BUG_ON(!list_empty(&waiter->fl_block)); >> + >> + /* Any request in waiter->fl_blocked is know to conflict with > > "known" > >> + * waiter, but it might not conflict with blocker. >> + * If it doesn't, it needs to be woken now so it can find >> + * somewhere else to wait, or possible it can get granted. > > "possibly it can be" Both fixed, thanks. > >> + */ >> + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) >> + wake_non_conflicts(waiter, blocker, conflict); >> +new_blocker: >> + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) >> + if (conflict(fl, waiter)) { >> + blocker = fl; >> + goto new_blocker; >> + } >> >> > waiter->fl_blocker = blocker; >> list_add_tail(&waiter->fl_block, &blocker->fl_blocked); >> if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) > > I wonder if it might be better to insert the blocker first before waking > up other waiters? Consider that anything awoken will end up contending > for the flc_lock that is held by "current" at this point. Doing most of > what you need to get done before waking them might mean less spinning in > other tasks. > Maybe. I think you are suggesting we move the call to wake_non_conflicts() to the end of the function. The main reason I put it at the top is to use the original value of "blocker" before it gets changed. Even if we move it to the end, there are still quite a few other little tasks to be performed before the lock is dropped. Will all this get done before some other processor has a chance to wake up a process, and for that process to get to a spinlock ??? Maybe - though the first spinlock would be blocked_lock_lock in locks_delete_block(), and that is dropped almost immediately. I don't know ... it seems much of a muchness. If the process will be woken that quickly, then some other processor must be idle, and does it matter much if it spends a microsecond spinning on a lock rather than being idle a bit longer? Thanks. I want to do at least a little testing before I repost with any of these changes. NeilBrown >> @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, >> >> /* Must be called with flc_lock held. 
*/ >> static void locks_insert_block(struct file_lock *blocker, >> - struct file_lock *waiter) >> + struct file_lock *waiter, >> + enum conflict conflict(struct file_lock *, >> + struct file_lock *)) >> { >> spin_lock(&blocked_lock_lock); >> - __locks_insert_block(blocker, waiter); >> + __locks_insert_block(blocker, waiter, conflict); >> spin_unlock(&blocked_lock_lock); >> } >> >> @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) >> if (!(request->fl_flags & FL_SLEEP)) >> goto out; >> error = FILE_LOCK_DEFERRED; >> - locks_insert_block(fl, request); >> + locks_insert_block(fl, request, flock_locks_conflict); >> goto out; >> } >> if (request->fl_flags & FL_ACCESS) >> @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, >> spin_lock(&blocked_lock_lock); >> if (likely(!posix_locks_deadlock(request, fl))) { >> error = FILE_LOCK_DEFERRED; >> - __locks_insert_block(fl, request); >> + __locks_insert_block(fl, request, >> + posix_locks_conflict); >> } >> spin_unlock(&blocked_lock_lock); >> goto out; >> @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) >> break_time -= jiffies; >> if (break_time == 0) >> break_time++; >> - locks_insert_block(fl, new_fl); >> + locks_insert_block(fl, new_fl, leases_conflict); >> trace_break_lease_block(inode, new_fl); >> spin_unlock(&ctx->flc_lock); >> percpu_up_read_preempt_enable(&file_rwsem); >> >> > > -- > Jeff Layton <jlayton@kernel.org>
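Laid out properly, the shape NeilBrown sketches above would look something like this (a sketch only; locks_conflict() and the FL_* values come from earlier patches in this series, and the exact bodies of the cases in posix_locks_conflict() may differ):

    switch (locks_conflict(caller_fl, sys_fl)) {
    default:
            WARN(1, "locks_conflict returned impossible value");
            /* fallthrough */
    case FL_NO_CONFLICT:
            return FL_NO_CONFLICT;
    case FL_CONFLICT:
            return FL_CONFLICT;
    case FL_TRANSITIVE_CONFLICT:
            return FL_TRANSITIVE_CONFLICT;
    }

Every path now returns, which also silences the "control reaches end of non-void function" warning mentioned above.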
On Fri, Aug 10, 2018 at 08:19:26AM +1000, NeilBrown wrote: > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > I think you could simplify the code a lot by maintaining the tree so > > that it always satisfies the condition that waiters are always strictly > > "weaker" than their descendents, so that finding a conflict with a > > waiter is always enough to know that the descendents also conflict. > > Can you define "weaker" please. > I suspect it is a partial ordering, in which case a tree would normally > be more appropriate than trying to find a total ordering. Lock X is stronger than lock Y if any lock that would conflict with lock Y would also conflict with lock X. Equivalently, X is stronger than Y if lock X's range is a superset of lock Y's and if X is a write lock whenever Y is. Well, I *thought* that was equivalent until I thought about the owner problem. Ugh. --b. > > Thanks, > NeilBrown > > > > > So, when you put a waiter to sleep, you don't add it below a child > > unless it's "stronger" than the child. > > > > You give up the property that siblings don't conflict, but again that > > just means thundering herds in weird cases, which is OK. > > > > --b. > > > >> > >> Reported-and-tested-by: Martin Wilck <mwilck@suse.de> > >> Signed-off-by: NeilBrown <neilb@suse.com> > >> --- > >> fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- > >> 1 file changed, 63 insertions(+), 6 deletions(-) > >> > >> diff --git a/fs/locks.c b/fs/locks.c > >> index fc64016d01ee..17843feb6f5b 100644 > >> --- a/fs/locks.c > >> +++ b/fs/locks.c > >> @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) > >> spin_unlock(&blocked_lock_lock); > >> } > >> > >> +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, > >> + enum conflict conflict(struct file_lock *, > >> + struct file_lock *)) > >> +{ > >> + struct file_lock *parent = waiter; > >> + struct file_lock *fl; > >> + struct file_lock *t; > >> + > >> + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); > >> +restart: > >> + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { > >> + switch (conflict(fl, blocker)) { > >> + default: > >> + case FL_NO_CONFLICT: > >> + __locks_wake_one(fl); > >> + break; > >> + case FL_CONFLICT: > >> + /* Need to check children */ > >> + parent = fl; > >> + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); > >> + goto restart; > >> + case FL_TRANSITIVE_CONFLICT: > >> + /* all children must also conflict, no need to check */ > >> + continue; > >> + } > >> + } > >> + if (parent != waiter) { > >> + parent = parent->fl_blocker; > >> + fl = parent; > >> + goto restart; > >> + } > >> +} > >> + > >> /* Insert waiter into blocker's block list. > >> * We use a circular list so that processes can be easily woken up in > >> * the order they blocked. The documentation doesn't require this but > >> @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) > >> * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring > >> * that the flc_lock is also held on insertions we can avoid taking the > >> * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. > >> + * > >> + * Rather than just adding to the list, we check for conflicts with any existing > >> + * waiter, and add to that waiter instead. > >> + * Thus wakeups don't happen until needed. 
> >> */ > >> static void __locks_insert_block(struct file_lock *blocker, > >> - struct file_lock *waiter) > >> + struct file_lock *waiter, > >> + enum conflict conflict(struct file_lock *, > >> + struct file_lock *)) > >> { > >> + struct file_lock *fl; > >> BUG_ON(!list_empty(&waiter->fl_block)); > >> + > >> + /* Any request in waiter->fl_blocked is know to conflict with > >> + * waiter, but it might not conflict with blocker. > >> + * If it doesn't, it needs to be woken now so it can find > >> + * somewhere else to wait, or possible it can get granted. > >> + */ > >> + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) > >> + wake_non_conflicts(waiter, blocker, conflict); > >> +new_blocker: > >> + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) > >> + if (conflict(fl, waiter)) { > >> + blocker = fl; > >> + goto new_blocker; > >> + } > >> waiter->fl_blocker = blocker; > >> list_add_tail(&waiter->fl_block, &blocker->fl_blocked); > >> if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) > >> @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, > >> > >> /* Must be called with flc_lock held. */ > >> static void locks_insert_block(struct file_lock *blocker, > >> - struct file_lock *waiter) > >> + struct file_lock *waiter, > >> + enum conflict conflict(struct file_lock *, > >> + struct file_lock *)) > >> { > >> spin_lock(&blocked_lock_lock); > >> - __locks_insert_block(blocker, waiter); > >> + __locks_insert_block(blocker, waiter, conflict); > >> spin_unlock(&blocked_lock_lock); > >> } > >> > >> @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) > >> if (!(request->fl_flags & FL_SLEEP)) > >> goto out; > >> error = FILE_LOCK_DEFERRED; > >> - locks_insert_block(fl, request); > >> + locks_insert_block(fl, request, flock_locks_conflict); > >> goto out; > >> } > >> if (request->fl_flags & FL_ACCESS) > >> @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, > >> spin_lock(&blocked_lock_lock); > >> if (likely(!posix_locks_deadlock(request, fl))) { > >> error = FILE_LOCK_DEFERRED; > >> - __locks_insert_block(fl, request); > >> + __locks_insert_block(fl, request, > >> + posix_locks_conflict); > >> } > >> spin_unlock(&blocked_lock_lock); > >> goto out; > >> @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) > >> break_time -= jiffies; > >> if (break_time == 0) > >> break_time++; > >> - locks_insert_block(fl, new_fl); > >> + locks_insert_block(fl, new_fl, leases_conflict); > >> trace_break_lease_block(inode, new_fl); > >> spin_unlock(&ctx->flc_lock); > >> percpu_up_read_preempt_enable(&file_rwsem); > >>
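Bruce's relation, ignoring the owner problem he flags, could be written as follows (an illustrative sketch; lock_stronger() is a hypothetical helper, not part of the patch):

    /* X is "stronger" than Y if any lock that would conflict with Y
     * would also conflict with X: X's range covers Y's, and X is a
     * write lock whenever Y is.  Owners are deliberately ignored here,
     * which is exactly the complication noted above.
     */
    static bool lock_stronger(struct file_lock *x, struct file_lock *y)
    {
            if (x->fl_start > y->fl_start || x->fl_end < y->fl_end)
                    return false;   /* X must cover Y's byte range */
            if (y->fl_type == F_WRLCK && x->fl_type != F_WRLCK)
                    return false;   /* X must write-lock whenever Y does */
            return true;
    }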
diff --git a/fs/locks.c b/fs/locks.c index fc64016d01ee..17843feb6f5b 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) spin_unlock(&blocked_lock_lock); } +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, + enum conflict conflict(struct file_lock *, + struct file_lock *)) +{ + struct file_lock *parent = waiter; + struct file_lock *fl; + struct file_lock *t; + + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); +restart: + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { + switch (conflict(fl, blocker)) { + default: + case FL_NO_CONFLICT: + __locks_wake_one(fl); + break; + case FL_CONFLICT: + /* Need to check children */ + parent = fl; + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); + goto restart; + case FL_TRANSITIVE_CONFLICT: + /* all children must also conflict, no need to check */ + continue; + } + } + if (parent != waiter) { + parent = parent->fl_blocker; + fl = parent; + goto restart; + } +} + /* Insert waiter into blocker's block list. * We use a circular list so that processes can be easily woken up in * the order they blocked. The documentation doesn't require this but @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring * that the flc_lock is also held on insertions we can avoid taking the * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. + * + * Rather than just adding to the list, we check for conflicts with any existing + * waiter, and add to that waiter instead. + * Thus wakeups don't happen until needed. */ static void __locks_insert_block(struct file_lock *blocker, - struct file_lock *waiter) + struct file_lock *waiter, + enum conflict conflict(struct file_lock *, + struct file_lock *)) { + struct file_lock *fl; BUG_ON(!list_empty(&waiter->fl_block)); + + /* Any request in waiter->fl_blocked is know to conflict with + * waiter, but it might not conflict with blocker. + * If it doesn't, it needs to be woken now so it can find + * somewhere else to wait, or possible it can get granted. + */ + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) + wake_non_conflicts(waiter, blocker, conflict); +new_blocker: + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) + if (conflict(fl, waiter)) { + blocker = fl; + goto new_blocker; + } waiter->fl_blocker = blocker; list_add_tail(&waiter->fl_block, &blocker->fl_blocked); if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, /* Must be called with flc_lock held. 
*/ static void locks_insert_block(struct file_lock *blocker, - struct file_lock *waiter) + struct file_lock *waiter, + enum conflict conflict(struct file_lock *, + struct file_lock *)) { spin_lock(&blocked_lock_lock); - __locks_insert_block(blocker, waiter); + __locks_insert_block(blocker, waiter, conflict); spin_unlock(&blocked_lock_lock); } @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) if (!(request->fl_flags & FL_SLEEP)) goto out; error = FILE_LOCK_DEFERRED; - locks_insert_block(fl, request); + locks_insert_block(fl, request, flock_locks_conflict); goto out; } if (request->fl_flags & FL_ACCESS) @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, spin_lock(&blocked_lock_lock); if (likely(!posix_locks_deadlock(request, fl))) { error = FILE_LOCK_DEFERRED; - __locks_insert_block(fl, request); + __locks_insert_block(fl, request, + posix_locks_conflict); } spin_unlock(&blocked_lock_lock); goto out; @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) break_time -= jiffies; if (break_time == 0) break_time++; - locks_insert_block(fl, new_fl); + locks_insert_block(fl, new_fl, leases_conflict); trace_break_lease_block(inode, new_fl); spin_unlock(&ctx->flc_lock); percpu_up_read_preempt_enable(&file_rwsem);
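For reference, the three-valued result that the conflict() callbacks above return is introduced earlier in this series; roughly (a sketch of the idea, not the exact hunk):

    enum conflict {
            FL_NO_CONFLICT,         /* the two locks can coexist */
            FL_CONFLICT,            /* they conflict, but something blocked
                                     * by the waiter might still not
                                     * conflict with the blocker, so its
                                     * children need to be checked */
            FL_TRANSITIVE_CONFLICT, /* anything that conflicts with the
                                     * waiter must conflict with the
                                     * blocker too, so its whole subtree
                                     * can be left asleep */
    };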
When we find an existing lock which conflicts with a request, and the request wants to wait, we currently add the request to a list. When the lock is removed, the whole list is woken. This can cause the thundering-herd problem. To reduce the problem, we make use of the (new) fact that a pending request can itself have a list of blocked requests. When we find a conflict, we look through the existing blocked requests. If any one of them blocks the new request, the new request is attached below that request. This way, when the lock is released, only a set of non-conflicting locks will be woken. The rest of the herd can stay asleep. Reported-and-tested-by: Martin Wilck <mwilck@suse.de> Signed-off-by: NeilBrown <neilb@suse.com> --- fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 63 insertions(+), 6 deletions(-)
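The problem being addressed is easy to provoke from user space (an illustrative reproducer, not part of the series): many processes block on one write lock, and before this change releasing it wakes every one of them even though only one can win.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            struct flock fl = {
                    .l_type = F_WRLCK, .l_whence = SEEK_SET,
                    .l_start = 0, .l_len = 1,
            };
            int fd = open("/tmp/herd-test", O_RDWR | O_CREAT, 0600);
            int i;

            if (fd < 0 || fcntl(fd, F_SETLK, &fl) < 0) {
                    perror("setup");
                    return 1;
            }
            for (i = 0; i < 8; i++) {
                    if (fork() == 0) {
                            /* Each child blocks on the same byte.  POSIX
                             * locks are per-process, so all eight queue
                             * behind the parent's lock. */
                            fcntl(fd, F_SETLKW, &fl);
                            printf("child %d got the lock\n", (int)getpid());
                            fl.l_type = F_UNLCK;
                            fcntl(fd, F_SETLK, &fl);
                            exit(0);
                    }
            }
            sleep(1);                       /* let the children queue up */
            fl.l_type = F_UNLCK;
            fcntl(fd, F_SETLK, &fl);        /* one unlock, N wake-ups */
            while (wait(NULL) > 0)
                    ;
            return 0;
    }

The children still acquire the lock one at a time either way; the difference this patch makes, visible with tracing rather than in the program's output, is how many of them are needlessly woken per release. With the blocked requests kept as a tree of mutually conflicting waiters, each release wakes only the waiters that might actually succeed.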