Message ID | 153378012255.1220.6754153662007899557.stgit@noble (mailing list archive) |
---|---|
Headers | show |
Series | locks: avoid thundering-herd wake-ups | expand |
I think there's also a problem with multiple tasks sharing the same lock owner. So, all locks are exclusive locks for the same range. We have four tasks. Tasks 1 and 4 share the same owner, the others' owners are distinct. - Task 1 gets a lock. - Task 2 gets a conflicting lock. - Task 3 gets another conflicting lock. So now we the tree is 3->2->1. - Task 1's lock is released. - Before task 2 is scheduled, task 4 acquires a new lock. - Task 2 waits on task 4's lock, we now have 3->2->4. Task 3 shouldn't be waiting--the lock it's requesting has the same owner as the lock task 4 holds--but we fail to wake up task 3. --b.
On Thu, Aug 09 2018, J. Bruce Fields wrote: > I think there's also a problem with multiple tasks sharing the same > lock owner. > > So, all locks are exclusive locks for the same range. We have four > tasks. Tasks 1 and 4 share the same owner, the others' owners are > distinct. > > - Task 1 gets a lock. > - Task 2 gets a conflicting lock. > - Task 3 gets another conflicting lock. So now we the tree is > 3->2->1. > - Task 1's lock is released. > - Before task 2 is scheduled, task 4 acquires a new lock. > - Task 2 waits on task 4's lock, we now have > 3->2->4. > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner > as the lock task 4 holds--but we fail to wake up task 3. So task 1 and task 4 are threads in the one process - OK. Tasks 2 and 3 are threads in two other processes. So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be woken? I suspect you got the numbers bit mixed up, but in any case, the "conflict()" function that is passed around takes ownership into account when assessing if one lock conflicts with another. NeilBrown
On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote: > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > > I think there's also a problem with multiple tasks sharing the same > > lock owner. > > > > So, all locks are exclusive locks for the same range. We have four > > tasks. Tasks 1 and 4 share the same owner, the others' owners are > > distinct. > > > > - Task 1 gets a lock. > > - Task 2 gets a conflicting lock. > > - Task 3 gets another conflicting lock. So now we the tree is > > 3->2->1. > > - Task 1's lock is released. > > - Before task 2 is scheduled, task 4 acquires a new lock. > > - Task 2 waits on task 4's lock, we now have > > 3->2->4. > > > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner > > as the lock task 4 holds--but we fail to wake up task 3. > > So task 1 and task 4 are threads in the one process - OK. > Tasks 2 and 3 are threads in two other processes. > > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be > woken? > > I suspect you got the numbers bit mixed up, Whoops. > but in any case, the "conflict()" function that is passed around takes > ownership into account when assessing if one lock conflicts with > another. Right, I know, but, let me try again: All locks are exclusive locks for the same range. Only tasks 3 and 4 share the the same owner. - Task 1 gets a lock. - Task 2 requests a conflicting lock, so we have 2->1. - Task 3 requests a conflicting lock, so we have 3->2->1. - Task 1 unlocks. We wake up task 2, but it isn't scheduled yet. - Task 4 gets a new lock. - Task 2 runs, discovers the conflict, and waits. Now we have: 3->2->4. There is no conflict between the lock 3 requested and the lock 4 holds, but 3 is not woken up. This is another version of the first problem: there's information we need (the owners of the waiting locks in the tree) that we can't determine just from looking at the root of the tree. I'm not sure what to do about that. --b.
On Thu, Aug 09 2018, J. Bruce Fields wrote: > On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote: >> On Thu, Aug 09 2018, J. Bruce Fields wrote: >> >> > I think there's also a problem with multiple tasks sharing the same >> > lock owner. >> > >> > So, all locks are exclusive locks for the same range. We have four >> > tasks. Tasks 1 and 4 share the same owner, the others' owners are >> > distinct. >> > >> > - Task 1 gets a lock. >> > - Task 2 gets a conflicting lock. >> > - Task 3 gets another conflicting lock. So now we the tree is >> > 3->2->1. >> > - Task 1's lock is released. >> > - Before task 2 is scheduled, task 4 acquires a new lock. >> > - Task 2 waits on task 4's lock, we now have >> > 3->2->4. >> > >> > Task 3 shouldn't be waiting--the lock it's requesting has the same owner >> > as the lock task 4 holds--but we fail to wake up task 3. >> >> So task 1 and task 4 are threads in the one process - OK. >> Tasks 2 and 3 are threads in two other processes. >> >> So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be >> woken? >> >> I suspect you got the numbers bit mixed up, > > Whoops. > >> but in any case, the "conflict()" function that is passed around takes >> ownership into account when assessing if one lock conflicts with >> another. > > Right, I know, but, let me try again: > > All locks are exclusive locks for the same range. Only tasks 3 and 4 > share the the same owner. > > - Task 1 gets a lock. > - Task 2 requests a conflicting lock, so we have 2->1. > - Task 3 requests a conflicting lock, so we have 3->2->1. > - Task 1 unlocks. We wake up task 2, but it isn't scheduled yet. > - Task 4 gets a new lock. > - Task 2 runs, discovers the conflict, and waits. Now we have: > 3->2->4. > > There is no conflict between the lock 3 requested and the lock 4 holds, > but 3 is not woken up. > > This is another version of the first problem: there's information we > need (the owners of the waiting locks in the tree) that we can't > determine just from looking at the root of the tree. > > I'm not sure what to do about that. You're good at this game! So, because a locker with the same "owner" gets a free pass, you can *never* say that any lock which conflicts with A also conflicts with B, as a lock with the same owner as B will never conflict with B, even though it conflicts with A. I think there is still value in having the tree, but when a waiter is attached under a new blocker, we need to walk the whole tree beneath the waiter and detach/wake anything that is not blocked by the new blocker. Thanks, NeilBrown
On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote: > You're good at this game! Everybody's got to have a hobby, mine is pathological posix locking cases.... > So, because a locker with the same "owner" gets a free pass, you can > *never* say that any lock which conflicts with A also conflicts with B, > as a lock with the same owner as B will never conflict with B, even > though it conflicts with A. > > I think there is still value in having the tree, but when a waiter is > attached under a new blocker, we need to walk the whole tree beneath the > waiter and detach/wake anything that is not blocked by the new blocker. If you're walking the whole tree every time then it might as well be a flat list, I think? --b.
On Thu, Aug 09 2018, J. Bruce Fields wrote: > On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote: >> You're good at this game! > > Everybody's got to have a hobby, mine is pathological posix locking > cases.... > >> So, because a locker with the same "owner" gets a free pass, you can >> *never* say that any lock which conflicts with A also conflicts with B, >> as a lock with the same owner as B will never conflict with B, even >> though it conflicts with A. >> >> I think there is still value in having the tree, but when a waiter is >> attached under a new blocker, we need to walk the whole tree beneath the >> waiter and detach/wake anything that is not blocked by the new blocker. > > If you're walking the whole tree every time then it might as well be a > flat list, I think? The advantage of a tree is that it keeps over-lapping locks closer together. For it to make a difference you would need a load where lots of threads were locking several different small ranges, and other threads were locking large ranges that cover all the small ranges. I doubt this is common, but it doesn't seem as strange as other things I've seen in the wild. The other advantage, of course, is that I've already written the code, and I like it. Maybe I'll do a simple-list version, then a patch to convert that to the clever-tree version, and we can then have something concrete to assess. Thanks, NeilBrown
On Fri, Aug 10, 2018 at 01:17:14PM +1000, NeilBrown wrote: > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > > On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote: > >> You're good at this game! > > > > Everybody's got to have a hobby, mine is pathological posix locking > > cases.... > > > >> So, because a locker with the same "owner" gets a free pass, you can > >> *never* say that any lock which conflicts with A also conflicts with B, > >> as a lock with the same owner as B will never conflict with B, even > >> though it conflicts with A. > >> > >> I think there is still value in having the tree, but when a waiter is > >> attached under a new blocker, we need to walk the whole tree beneath the > >> waiter and detach/wake anything that is not blocked by the new blocker. > > > > If you're walking the whole tree every time then it might as well be a > > flat list, I think? > > The advantage of a tree is that it keeps over-lapping locks closer > together. > For it to make a difference you would need a load where lots of threads > were locking several different small ranges, and other threads were > locking large ranges that cover all the small ranges. OK, I'm not sure I understand, but I'll give another look at the next version.... > I doubt this is common, but it doesn't seem as strange as other things > I've seen in the wild. > The other advantage, of course, is that I've already written the code, > and I like it. > > Maybe I'll do a simple-list version, then a patch to convert that to the > clever-tree version, and we can then have something concrete to assess. That might help, thanks. --b.
On Thu, 2018-08-09 at 20:29 -0400, J. Bruce Fields wrote: > On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote: > > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > > > > I think there's also a problem with multiple tasks sharing the same > > > lock owner. > > > > > > So, all locks are exclusive locks for the same range. We have four > > > tasks. Tasks 1 and 4 share the same owner, the others' owners are > > > distinct. > > > > > > - Task 1 gets a lock. > > > - Task 2 gets a conflicting lock. > > > - Task 3 gets another conflicting lock. So now we the tree is > > > 3->2->1. > > > - Task 1's lock is released. > > > - Before task 2 is scheduled, task 4 acquires a new lock. > > > - Task 2 waits on task 4's lock, we now have > > > 3->2->4. > > > > > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner > > > as the lock task 4 holds--but we fail to wake up task 3. > > > > So task 1 and task 4 are threads in the one process - OK. > > Tasks 2 and 3 are threads in two other processes. > > > > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be > > woken? > > > > I suspect you got the numbers bit mixed up, > > Whoops. > > > but in any case, the "conflict()" function that is passed around takes > > ownership into account when assessing if one lock conflicts with > > another. > > Right, I know, but, let me try again: > > All locks are exclusive locks for the same range. Only tasks 3 and 4 > share the the same owner. > > - Task 1 gets a lock. > - Task 2 requests a conflicting lock, so we have 2->1. > - Task 3 requests a conflicting lock, so we have 3->2->1. > - Task 1 unlocks. We wake up task 2, but it isn't scheduled yet. > - Task 4 gets a new lock. > - Task 2 runs, discovers the conflict, and waits. Now we have: > 3->2->4. > > There is no conflict between the lock 3 requested and the lock 4 holds, > but 3 is not woken up. > > This is another version of the first problem: there's information we > need (the owners of the waiting locks in the tree) that we can't > determine just from looking at the root of the tree. > > I'm not sure what to do about that. > Is this still a problem in the v2 set? wake_non_conflicts walks the whole tree of requests that were blocked on it, so a. After task 2 discovers the conflict, it should wake any of its children that don't conflict. So in that last step, task 3 would be awoken before task 2 goes back to sleep.
On Fri, 2018-08-10 at 11:47 -0400, J. Bruce Fields wrote: > On Fri, Aug 10, 2018 at 01:17:14PM +1000, NeilBrown wrote: > > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > > > > On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote: > > > > You're good at this game! > > > > > > Everybody's got to have a hobby, mine is pathological posix locking > > > cases.... > > > > > > > So, because a locker with the same "owner" gets a free pass, you can > > > > *never* say that any lock which conflicts with A also conflicts with B, > > > > as a lock with the same owner as B will never conflict with B, even > > > > though it conflicts with A. > > > > > > > > I think there is still value in having the tree, but when a waiter is > > > > attached under a new blocker, we need to walk the whole tree beneath the > > > > waiter and detach/wake anything that is not blocked by the new blocker. > > > > > > If you're walking the whole tree every time then it might as well be a > > > flat list, I think? > > > > The advantage of a tree is that it keeps over-lapping locks closer > > together. > > For it to make a difference you would need a load where lots of threads > > were locking several different small ranges, and other threads were > > locking large ranges that cover all the small ranges. > > OK, I'm not sure I understand, but I'll give another look at the next > version.... > > > I doubt this is common, but it doesn't seem as strange as other things > > I've seen in the wild. > > The other advantage, of course, is that I've already written the code, > > and I like it. > > > > Maybe I'll do a simple-list version, then a patch to convert that to the > > clever-tree version, and we can then have something concrete to assess. > > That might help, thanks. > FWIW, I did a bit of testing with lockperf tests that I had written on an earlier rework of this code: https://git.samba.org/jlayton/linux.git/?p=jlayton/lockperf.git;a=summary The posix01 and flock01 tests in there show about a 10x speedup with this set in place. I think something closer to Neil's design will end up being what we want here. Consider the relatively common case where you have a whole-file POSIX write lock held with a bunch of different waiters blocked on it (all whole file write locks with different owners): With Neil's patches, you will just wake up a single waiter when the blocked lock is released, as they would all be in a long chain of waiters. If you keep all the locks in a single list, you'll either have to: a) wake up all the waiters on the list when the lock comes free: no lock is held at that point so none of them will conflict. ...or... b) keep track of what waiters have already been awoken, and compare any further candidate for waking against the current set of held locks and any lock requests by waiters that you just woke. b seems more expensive as you have to walk over a larger set of locks on every change.
On Sat, Aug 11, 2018 at 07:51:13AM -0400, Jeff Layton wrote: > On Thu, 2018-08-09 at 20:29 -0400, J. Bruce Fields wrote: > > On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote: > > > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > > > > > > I think there's also a problem with multiple tasks sharing the same > > > > lock owner. > > > > > > > > So, all locks are exclusive locks for the same range. We have four > > > > tasks. Tasks 1 and 4 share the same owner, the others' owners are > > > > distinct. > > > > > > > > - Task 1 gets a lock. > > > > - Task 2 gets a conflicting lock. > > > > - Task 3 gets another conflicting lock. So now we the tree is > > > > 3->2->1. > > > > - Task 1's lock is released. > > > > - Before task 2 is scheduled, task 4 acquires a new lock. > > > > - Task 2 waits on task 4's lock, we now have > > > > 3->2->4. > > > > > > > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner > > > > as the lock task 4 holds--but we fail to wake up task 3. > > > > > > So task 1 and task 4 are threads in the one process - OK. > > > Tasks 2 and 3 are threads in two other processes. > > > > > > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be > > > woken? > > > > > > I suspect you got the numbers bit mixed up, > > > > Whoops. > > > > > but in any case, the "conflict()" function that is passed around takes > > > ownership into account when assessing if one lock conflicts with > > > another. > > > > Right, I know, but, let me try again: > > > > All locks are exclusive locks for the same range. Only tasks 3 and 4 > > share the the same owner. > > > > - Task 1 gets a lock. > > - Task 2 requests a conflicting lock, so we have 2->1. > > - Task 3 requests a conflicting lock, so we have 3->2->1. > > - Task 1 unlocks. We wake up task 2, but it isn't scheduled yet. > > - Task 4 gets a new lock. > > - Task 2 runs, discovers the conflict, and waits. Now we have: > > 3->2->4. > > > > There is no conflict between the lock 3 requested and the lock 4 holds, > > but 3 is not woken up. > > > > This is another version of the first problem: there's information we > > need (the owners of the waiting locks in the tree) that we can't > > determine just from looking at the root of the tree. > > > > I'm not sure what to do about that. > > > > Is this still a problem in the v2 set? > > wake_non_conflicts walks the whole tree of requests that were blocked on > it, Not in the FL_TRANSITIVE_CONFLICT case, which is the case here. --b. > so a. After task 2 discovers the conflict, it should wake any of its > children that don't conflict. So in that last step, task 3 would be > awoken before task 2 goes back to sleep. > > -- > Jeff Layton <jlayton@kernel.org>
On Sat, Aug 11, 2018 at 07:56:25AM -0400, Jeff Layton wrote: > FWIW, I did a bit of testing with lockperf tests that I had written on > an earlier rework of this code: > > https://git.samba.org/jlayton/linux.git/?p=jlayton/lockperf.git;a=summary > > > The posix01 and flock01 tests in there show about a 10x speedup with > this set in place. > > I think something closer to Neil's design will end up being what we want > here. Consider the relatively common case where you have a whole-file > POSIX write lock held with a bunch of different waiters blocked on it > (all whole file write locks with different owners): > > With Neil's patches, you will just wake up a single waiter when the > blocked lock is released, as they would all be in a long chain of > waiters. Right, but you still need to walk the whole tree to make sure that it's the only one you need to wake. The tree structure means that you know all the other locks have non-overlapping ranges, but it doesn't tell you the lock owners. Maybe there's some reasonable way to rule out the shared-lockowner case more quickly too. I haven't thought about that much. > If you keep all the locks in a single list, you'll either have to: > > a) wake up all the waiters on the list when the lock comes free: no lock > is held at that point so none of them will conflict. > > ...or... > > b) keep track of what waiters have already been awoken, and compare any > further candidate for waking against the current set of held locks and > any lock requests by waiters that you just woke. Instead of keeping track of *every* waiter that you've woken, you could keep track of some subset. Worst case that just means waking more processes than you need to, which is wasteful but correct. In the common case that you give, that subset could just be "the first waiter you wake". You'd get the same result. The every-waiter-a-whole-file-write-lock case is pretty easy. To benefit from the tree you need a case where some of the waiters overlap and some don't. Might be worth it, sure. --b. > b seems more expensive as you have to walk over a larger set of locks > on every change. > -- > Jeff Layton <jlayton@kernel.org>
On Sat, 2018-08-11 at 08:21 -0400, J. Bruce Fields wrote: > On Sat, Aug 11, 2018 at 07:51:13AM -0400, Jeff Layton wrote: > > On Thu, 2018-08-09 at 20:29 -0400, J. Bruce Fields wrote: > > > On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote: > > > > On Thu, Aug 09 2018, J. Bruce Fields wrote: > > > > > > > > > I think there's also a problem with multiple tasks sharing the same > > > > > lock owner. > > > > > > > > > > So, all locks are exclusive locks for the same range. We have four > > > > > tasks. Tasks 1 and 4 share the same owner, the others' owners are > > > > > distinct. > > > > > > > > > > - Task 1 gets a lock. > > > > > - Task 2 gets a conflicting lock. > > > > > - Task 3 gets another conflicting lock. So now we the tree is > > > > > 3->2->1. > > > > > - Task 1's lock is released. > > > > > - Before task 2 is scheduled, task 4 acquires a new lock. > > > > > - Task 2 waits on task 4's lock, we now have > > > > > 3->2->4. > > > > > > > > > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner > > > > > as the lock task 4 holds--but we fail to wake up task 3. > > > > > > > > So task 1 and task 4 are threads in the one process - OK. > > > > Tasks 2 and 3 are threads in two other processes. > > > > > > > > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be > > > > woken? > > > > > > > > I suspect you got the numbers bit mixed up, > > > > > > Whoops. > > > > > > > but in any case, the "conflict()" function that is passed around takes > > > > ownership into account when assessing if one lock conflicts with > > > > another. > > > > > > Right, I know, but, let me try again: > > > > > > All locks are exclusive locks for the same range. Only tasks 3 and 4 > > > share the the same owner. > > > > > > - Task 1 gets a lock. > > > - Task 2 requests a conflicting lock, so we have 2->1. > > > - Task 3 requests a conflicting lock, so we have 3->2->1. > > > - Task 1 unlocks. We wake up task 2, but it isn't scheduled yet. > > > - Task 4 gets a new lock. > > > - Task 2 runs, discovers the conflict, and waits. Now we have: > > > 3->2->4. > > > > > > There is no conflict between the lock 3 requested and the lock 4 holds, > > > but 3 is not woken up. > > > > > > This is another version of the first problem: there's information we > > > need (the owners of the waiting locks in the tree) that we can't > > > determine just from looking at the root of the tree. > > > > > > I'm not sure what to do about that. > > > > > > > Is this still a problem in the v2 set? > > > > wake_non_conflicts walks the whole tree of requests that were blocked on > > it, > > Not in the FL_TRANSITIVE_CONFLICT case, which is the case here. > Got it. Yeah, I'm not sure that the idea of FL_TRANSITIVE_CONFLICT really works. I think you could fix this by just getting rid of that and folding it into the FL_CONFLICT case. In more complex situations (like the one you describe), you may end up cycling through several blocked requests before you hit one that can get the lock. It may slow things down some for those cases. In the more common locking scenarios (whole file locks, different owners), I think this will be much more efficient by avoiding so many wakeups.