Message ID | 20190820081902.24815-5-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Series | mmu notifier debug annotations/checks |
On Tue, Aug 20, 2019 at 10:19:02AM +0200, Daniel Vetter wrote:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.
>
> Inspired by an i915 patch series which did exactly that, because the
> rules haven't been entirely clear to us.
>
> v2: Use the shiny new non_block_start/end annotations instead of
> abusing preempt_disable/enable.
>
> v3: Rebase on top of Glisse's arg rework.
>
> v4: Rebase on top of more Glisse rework.
>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>  mm/mmu_notifier.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 538d3bb87f9b..856636d06ee0 100644
> +++ b/mm/mmu_notifier.c
> @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
>      id = srcu_read_lock(&srcu);
>      hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
>          if (mn->ops->invalidate_range_start) {
> -            int _ret = mn->ops->invalidate_range_start(mn, range);
> +            int _ret;
> +
> +            if (!mmu_notifier_range_blockable(range))
> +                non_block_start();
> +            _ret = mn->ops->invalidate_range_start(mn, range);
> +            if (!mmu_notifier_range_blockable(range))
> +                non_block_end();

If someone Acks all the sched changes then I can pick this for
hmm.git, but I still think the existing pre-emption debugging is fine
for this use case.

Also, same comment as for the lockdep map, this needs to apply to the
non-blocking range_end also.

Anyhow, since this series has conflicts with hmm.git it would be best
to flow the whole thing through that tree. If there are no
remarks on the first two patches I'll grab them in a few days.

Regards,
Jason
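[For context, an illustrative sketch, not code from this series: the "existing pre-emption debugging" Jason refers to is roughly the earlier approach that the v2 changelog above describes as abusing preempt_disable/enable -- disabling preemption around a non-blockable callback so the scheduler's existing atomic-context checks (might_sleep() and friends) fire if the implementation blocks:]

    if (!mmu_notifier_range_blockable(range))
        preempt_disable();    /* any sleep in the callback now trips the existing debug checks */
    _ret = mn->ops->invalidate_range_start(mn, range);
    if (!mmu_notifier_range_blockable(range))
        preempt_enable();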
On Tue, Aug 20, 2019 at 10:34:18AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 20, 2019 at 10:19:02AM +0200, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> >
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
> >
> > Inspired by an i915 patch series which did exactly that, because the
> > rules haven't been entirely clear to us.
> >
> > v2: Use the shiny new non_block_start/end annotations instead of
> > abusing preempt_disable/enable.
> >
> > v3: Rebase on top of Glisse's arg rework.
> >
> > v4: Rebase on top of more Glisse rework.
> >
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >  mm/mmu_notifier.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 538d3bb87f9b..856636d06ee0 100644
> > +++ b/mm/mmu_notifier.c
> > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> >      id = srcu_read_lock(&srcu);
> >      hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> >          if (mn->ops->invalidate_range_start) {
> > -            int _ret = mn->ops->invalidate_range_start(mn, range);
> > +            int _ret;
> > +
> > +            if (!mmu_notifier_range_blockable(range))
> > +                non_block_start();
> > +            _ret = mn->ops->invalidate_range_start(mn, range);
> > +            if (!mmu_notifier_range_blockable(range))
> > +                non_block_end();
>
> If someone Acks all the sched changes then I can pick this for
> hmm.git, but I still think the existing pre-emption debugging is fine
> for this use case.

Ok, I'll ping Peter Z. for an ack, iirc he was involved.

> Also, same comment as for the lockdep map, this needs to apply to the
> non-blocking range_end also.

Hm, I thought the page table locks we're holding there already prevent any
sleeping, so would be redundant? But reading through code I think that's
not guaranteed, so yeah makes sense to add it for invalidate_range_end
too. I'll respin once I have the ack/nack from scheduler people.

> Anyhow, since this series has conflicts with hmm.git it would be best
> to flow the whole thing through that tree. If there are no
> remarks on the first two patches I'll grab them in a few days.

Thanks, Daniel
On Tue, Aug 20, 2019 at 05:18:10PM +0200, Daniel Vetter wrote:
> > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > > index 538d3bb87f9b..856636d06ee0 100644
> > > +++ b/mm/mmu_notifier.c
> > > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > >      id = srcu_read_lock(&srcu);
> > >      hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> > >          if (mn->ops->invalidate_range_start) {
> > > -            int _ret = mn->ops->invalidate_range_start(mn, range);
> > > +            int _ret;
> > > +
> > > +            if (!mmu_notifier_range_blockable(range))
> > > +                non_block_start();
> > > +            _ret = mn->ops->invalidate_range_start(mn, range);
> > > +            if (!mmu_notifier_range_blockable(range))
> > > +                non_block_end();
> >
> > If someone Acks all the sched changes then I can pick this for
> > hmm.git, but I still think the existing pre-emption debugging is fine
> > for this use case.
>
> Ok, I'll ping Peter Z. for an ack, iirc he was involved.
>
> > Also, same comment as for the lockdep map, this needs to apply to the
> > non-blocking range_end also.
>
> Hm, I thought the page table locks we're holding there already prevent any
> sleeping, so would be redundant?

AFAIK no. All callers of invalidate_range_start/end pairs do so a few
lines apart and don't change their locking in between - thus since
start can block so can end.

Would love to know if that is not true??

Similarly I've also been idly wondering if we should add a
'might_sleep()' to invalidate_range_start/end() to make this constraint
clear & tested to the mm side?

Jason
On Wed, Aug 21, 2019 at 9:33 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Aug 20, 2019 at 05:18:10PM +0200, Daniel Vetter wrote:
> > > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > > > index 538d3bb87f9b..856636d06ee0 100644
> > > > +++ b/mm/mmu_notifier.c
> > > > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > > >      id = srcu_read_lock(&srcu);
> > > >      hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> > > >          if (mn->ops->invalidate_range_start) {
> > > > -            int _ret = mn->ops->invalidate_range_start(mn, range);
> > > > +            int _ret;
> > > > +
> > > > +            if (!mmu_notifier_range_blockable(range))
> > > > +                non_block_start();
> > > > +            _ret = mn->ops->invalidate_range_start(mn, range);
> > > > +            if (!mmu_notifier_range_blockable(range))
> > > > +                non_block_end();
> > >
> > > If someone Acks all the sched changes then I can pick this for
> > > hmm.git, but I still think the existing pre-emption debugging is fine
> > > for this use case.
> >
> > Ok, I'll ping Peter Z. for an ack, iirc he was involved.
> >
> > > Also, same comment as for the lockdep map, this needs to apply to the
> > > non-blocking range_end also.
> >
> > Hm, I thought the page table locks we're holding there already prevent any
> > sleeping, so would be redundant?
>
> AFAIK no. All callers of invalidate_range_start/end pairs do so a few
> lines apart and don't change their locking in between - thus since
> start can block so can end.
>
> Would love to know if that is not true??

Yeah I reviewed them, I think I mixed up a discussion I had a while ago
with Jerome. It's a bit tricky to follow in the code since in some places
->invalidate_range and ->invalidate_range_end seem to be called from the
same place, in others not at all.

> Similarly I've also been idly wondering if we should add a
> 'might_sleep()' to invalidate_range_start/end() to make this constraint
> clear & tested to the mm side?

Hm, sounds like a useful idea. Since in general you won't test with mmu
notifiers, but they could happen, and then they will block for at least
some mutex usually. I'll throw that as an idea on top for the next round.
-Daniel
On Tue, Aug 20, 2019 at 05:18:10PM +0200, Daniel Vetter wrote:
> On Tue, Aug 20, 2019 at 10:34:18AM -0300, Jason Gunthorpe wrote:
> > On Tue, Aug 20, 2019 at 10:19:02AM +0200, Daniel Vetter wrote:
> > > We need to make sure implementations don't cheat and don't have a
> > > possible schedule/blocking point deeply buried where review can't
> > > catch it.
> > >
> > > I'm not sure whether this is the best way to make sure all the
> > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > But it gets the job done.
> > >
> > > Inspired by an i915 patch series which did exactly that, because the
> > > rules haven't been entirely clear to us.
> > >
> > > v2: Use the shiny new non_block_start/end annotations instead of
> > > abusing preempt_disable/enable.
> > >
> > > v3: Rebase on top of Glisse's arg rework.
> > >
> > > v4: Rebase on top of more Glisse rework.
> > >
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: David Rientjes <rientjes@google.com>
> > > Cc: "Christian König" <christian.koenig@amd.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > > Cc: linux-mm@kvack.org
> > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > >  mm/mmu_notifier.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > > index 538d3bb87f9b..856636d06ee0 100644
> > > +++ b/mm/mmu_notifier.c
> > > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > >      id = srcu_read_lock(&srcu);
> > >      hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> > >          if (mn->ops->invalidate_range_start) {
> > > -            int _ret = mn->ops->invalidate_range_start(mn, range);
> > > +            int _ret;
> > > +
> > > +            if (!mmu_notifier_range_blockable(range))
> > > +                non_block_start();
> > > +            _ret = mn->ops->invalidate_range_start(mn, range);
> > > +            if (!mmu_notifier_range_blockable(range))
> > > +                non_block_end();
> >
> > If someone Acks all the sched changes then I can pick this for
> > hmm.git, but I still think the existing pre-emption debugging is fine
> > for this use case.
>
> Ok, I'll ping Peter Z. for an ack, iirc he was involved.
>
> > Also, same comment as for the lockdep map, this needs to apply to the
> > non-blocking range_end also.
>
> Hm, I thought the page table locks we're holding there already prevent any
> sleeping, so would be redundant? But reading through code I think that's
> not guaranteed, so yeah makes sense to add it for invalidate_range_end
> too. I'll respin once I have the ack/nack from scheduler people.

So I started to look into this, and I'm a bit confused. There's no
_nonblock version of this, so does this mean blocking is never allowed,
or always allowed?

From a quick look through implementations I've only seen spinlocks, and
one up_read. So I guess I should wrap this callback in some unconditional
non_block_start/end, but I'm not sure.

Thanks, Daniel

> > Anyhow, since this series has conflicts with hmm.git it would be best
> > to flow the whole thing through that tree. If there are no
> > remarks on the first two patches I'll grab them in a few days.
>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
On Wed, Aug 21, 2019 at 05:41:51PM +0200, Daniel Vetter wrote:
> > Hm, I thought the page table locks we're holding there already prevent any
> > sleeping, so would be redundant? But reading through code I think that's
> > not guaranteed, so yeah makes sense to add it for invalidate_range_end
> > too. I'll respin once I have the ack/nack from scheduler people.
>
> So I started to look into this, and I'm a bit confused. There's no
> _nonblock version of this, so does this mean blocking is never allowed,
> or always allowed?

RDMA has a mutex:

ib_umem_notifier_invalidate_range_end
  rbt_ib_umem_for_each_in_range
    invalidate_range_start_trampoline
      ib_umem_notifier_end_account
        mutex_lock(&umem_odp->umem_mutex);

I'm working to delete this path though!

nonblocking or not follows the start, the same flag gets placed into
the mmu_notifier_range struct passed to end.

> From a quick look through implementations I've only seen spinlocks, and
> one up_read. So I guess I should wrap this callback in some unconditional
> non_block_start/end, but I'm not sure.

For now, we should keep it the same as start, conditionally blocking.

Hopefully before LPC I can send an RFC series that eliminates most
invalidate_range_end users in favor of common locking..

Jason
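[For context, not part of the thread: the helper used in the patch reads the blockable information out of the range struct, and because the same struct is handed to both the _start and _end callbacks, the same check works on the end side too -- which is Jason's point above. The definition below is approximately what include/linux/mmu_notifier.h carried at the time:]

    static inline bool
    mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
    {
        return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
    }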
On Thu, Aug 22, 2019 at 10:16 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Aug 21, 2019 at 05:41:51PM +0200, Daniel Vetter wrote:
>
> > > Hm, I thought the page table locks we're holding there already prevent any
> > > sleeping, so would be redundant? But reading through code I think that's
> > > not guaranteed, so yeah makes sense to add it for invalidate_range_end
> > > too. I'll respin once I have the ack/nack from scheduler people.
> >
> > So I started to look into this, and I'm a bit confused. There's no
> > _nonblock version of this, so does this mean blocking is never allowed,
> > or always allowed?
>
> RDMA has a mutex:
>
> ib_umem_notifier_invalidate_range_end
>   rbt_ib_umem_for_each_in_range
>     invalidate_range_start_trampoline
>       ib_umem_notifier_end_account
>         mutex_lock(&umem_odp->umem_mutex);
>
> I'm working to delete this path though!
>
> nonblocking or not follows the start, the same flag gets placed into
> the mmu_notifier_range struct passed to end.

Ok, makes sense.

I guess that also means the might_sleep (I started on that) in
invalidate_range_end also needs to be conditional? Or not bother with
a might_sleep in invalidate_range_end since you're working on removing
the last sleep in there?

> > From a quick look through implementations I've only seen spinlocks, and
> > one up_read. So I guess I should wrap this callback in some unconditional
> > non_block_start/end, but I'm not sure.
>
> For now, we should keep it the same as start, conditionally blocking.
>
> Hopefully before LPC I can send an RFC series that eliminates most
> invalidate_range_end users in favor of common locking..

Thanks, Daniel
On Thu, Aug 22, 2019 at 10:42:39AM +0200, Daniel Vetter wrote:
> > RDMA has a mutex:
> >
> > ib_umem_notifier_invalidate_range_end
> >   rbt_ib_umem_for_each_in_range
> >     invalidate_range_start_trampoline
> >       ib_umem_notifier_end_account
> >         mutex_lock(&umem_odp->umem_mutex);
> >
> > I'm working to delete this path though!
> >
> > nonblocking or not follows the start, the same flag gets placed into
> > the mmu_notifier_range struct passed to end.
>
> Ok, makes sense.
>
> I guess that also means the might_sleep (I started on that) in
> invalidate_range_end also needs to be conditional? Or not bother with
> a might_sleep in invalidate_range_end since you're working on removing
> the last sleep in there?

I might suggest the same pattern as used for locked, the might_sleep
unconditionally on the start, and a 2nd might_sleep after the IF in
__mmu_notifier_invalidate_range_end()

Observing that by audit all the callers already have the same locking
context for start/end

Jason
On Thu, Aug 22, 2019 at 4:24 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Aug 22, 2019 at 10:42:39AM +0200, Daniel Vetter wrote:
> > > RDMA has a mutex:
> > >
> > > ib_umem_notifier_invalidate_range_end
> > >   rbt_ib_umem_for_each_in_range
> > >     invalidate_range_start_trampoline
> > >       ib_umem_notifier_end_account
> > >         mutex_lock(&umem_odp->umem_mutex);
> > >
> > > I'm working to delete this path though!
> > >
> > > nonblocking or not follows the start, the same flag gets placed into
> > > the mmu_notifier_range struct passed to end.
> >
> > Ok, makes sense.
> >
> > I guess that also means the might_sleep (I started on that) in
> > invalidate_range_end also needs to be conditional? Or not bother with
> > a might_sleep in invalidate_range_end since you're working on removing
> > the last sleep in there?
>
> I might suggest the same pattern as used for locked, the might_sleep
> unconditionally on the start, and a 2nd might_sleep after the IF in
> __mmu_notifier_invalidate_range_end()
>
> Observing that by audit all the callers already have the same locking
> context for start/end

My question was more about enforcing that going forward, since you're
working to remove all the sleeps from invalidate_range_end. I don't want
to add debug annotations which are stricter than what the other side
actually expects.

But since currently there are still sleeping locks in invalidate_range_end
I think I'll just stick them in both places. You can then (re)move it when
the cleanup lands.
-Daniel
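[For context, a rough sketch of the might_sleep() placement discussed above, under the reading that the start side gets an unconditional annotation and the end side a conditional one while sleeping locks still exist there; function bodies are elided, signatures are approximate, and this is not a patch from the thread:]

    int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
    {
        might_sleep();    /* unconditional, per the suggestion above */
        ...
    }

    void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range,
                                             bool only_end)
    {
        if (mmu_notifier_range_blockable(range))
            might_sleep();    /* conditional while end callbacks may still sleep */
        ...
    }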
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 538d3bb87f9b..856636d06ee0 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
     id = srcu_read_lock(&srcu);
     hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
         if (mn->ops->invalidate_range_start) {
-            int _ret = mn->ops->invalidate_range_start(mn, range);
+            int _ret;
+
+            if (!mmu_notifier_range_blockable(range))
+                non_block_start();
+            _ret = mn->ops->invalidate_range_start(mn, range);
+            if (!mmu_notifier_range_blockable(range))
+                non_block_end();
             if (_ret) {
                 pr_info("%pS callback failed with %d in %sblockable context.\n",
                     mn->ops->invalidate_range_start, _ret,
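[For context, a sketch of the matching annotation on the end side that the thread converges on ("keep it the same as start, conditionally blocking"); the surrounding loop paraphrases __mmu_notifier_invalidate_range_end() and is approximate, not a hunk actually posted here:]

    id = srcu_read_lock(&srcu);
    hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
        ...
        if (mn->ops->invalidate_range_end) {
            if (!mmu_notifier_range_blockable(range))
                non_block_start();
            mn->ops->invalidate_range_end(mn, range);
            if (!mmu_notifier_range_blockable(range))
                non_block_end();
        }
    }
    srcu_read_unlock(&srcu, id);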