Message ID | 20190614004450.20252-12-jgg@ziepe.ca (mailing list archive) |
---|---|
State | New, archived |
Series | mm/hmm: Various revisions from a locking/code review |
On Thu, Jun 13, 2019 at 09:44:49PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> hmm_release() is called exactly once per hmm. ops->release() cannot
> accidentally trigger any action that would recurse back onto
> hmm->mirrors_sem.

In linux-next amdgpu actually calls hmm_mirror_unregister from its
release function.  That whole release function looks rather sketchy,
but we probably need to sort that out first.
On Sat, Jun 15, 2019 at 07:21:06AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 13, 2019 at 09:44:49PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> >
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
>
> In linux-next amdgpu actually calls hmm_mirror_unregister from its
> release function.  That whole release function looks rather sketchy,
> but we probably need to sort that out first.

Does it? I see this:

static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
{
	struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);

	INIT_WORK(&amn->work, amdgpu_mn_destroy);
	schedule_work(&amn->work);
}

static struct hmm_mirror_ops amdgpu_hmm_mirror_ops[] = {
	[AMDGPU_MN_TYPE_GFX] = {
		.sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_gfx,
		.release = amdgpu_hmm_mirror_release
	},
	[AMDGPU_MN_TYPE_HSA] = {
		.sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_hsa,
		.release = amdgpu_hmm_mirror_release
	},
};

Am I looking at the wrong thing? Looks like it calls it through a work
queue, so it should be OK..

Though very strange that amdgpu only destroys the mirror via release,
that cannot be right.

Jason
On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote:
> Am I looking at the wrong thing? Looks like it calls it through a work
> queue, so it should be OK..

Yes, it calls it through a work queue.  I guess that is fine because
it needs to take the lock again.

> Though very strange that amdgpu only destroys the mirror via release,
> that cannot be right.

As said the whole thing looks rather odd to me.
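[The pattern being discussed, generalized: because ops->release() is invoked from hmm's own teardown path with its internals locked, a driver that wants to tear the mirror down must defer the actual hmm_mirror_unregister() to process context via a work queue. A minimal sketch of that deferral follows; only the amdgpu code quoted above is real, "my_mirror" and its helpers are invented for illustration.]

#include <linux/hmm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Hypothetical driver object; mirrors the amdgpu deferral pattern. */
struct my_mirror {
	struct hmm_mirror mirror;
	struct work_struct work;
};

static void my_mirror_destroy(struct work_struct *work)
{
	struct my_mirror *m = container_of(work, struct my_mirror, work);

	/*
	 * Runs from the workqueue, outside the hmm_release() call
	 * chain, so hmm_mirror_unregister() can take mirrors_sem
	 * again without deadlocking against the notifier.
	 */
	hmm_mirror_unregister(&m->mirror);
	kfree(m);
}

static void my_mirror_release(struct hmm_mirror *mirror)
{
	struct my_mirror *m = container_of(mirror, struct my_mirror, mirror);

	/* Called with hmm's locks held; just punt to process context. */
	INIT_WORK(&m->work, my_mirror_destroy);
	schedule_work(&m->work);
}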
On 2019-06-18 1:37, Christoph Hellwig wrote:
> On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote:
>> Am I looking at the wrong thing? Looks like it calls it through a work
>> queue, so it should be OK..
> Yes, it calls it through a work queue.  I guess that is fine because
> it needs to take the lock again.
>
>> Though very strange that amdgpu only destroys the mirror via release,
>> that cannot be right.
> As said the whole thing looks rather odd to me.

This code is derived from our old MMU notifier code. Before HMM we used
to register a single MMU notifier per mm_struct and look up virtual
address ranges that had been registered for mirroring via driver API
calls. The idea was to reuse a single MMU notifier for the lifetime of
the process. It would remain registered until we got a notifier_release.

hmm_mirror took the place of that when we converted the code to HMM.

I suppose we could destroy the mirror earlier, when we have no more
registered virtual address ranges, and create a new one if needed later.

Regards,
  Felix
On Wed, Jun 19, 2019 at 12:53:55AM +0000, Kuehling, Felix wrote:
> This code is derived from our old MMU notifier code. Before HMM we used
> to register a single MMU notifier per mm_struct and look up virtual
> address ranges that had been registered for mirroring via driver API
> calls. The idea was to reuse a single MMU notifier for the lifetime of
> the process. It would remain registered until we got a notifier_release.
>
> hmm_mirror took the place of that when we converted the code to HMM.
>
> I suppose we could destroy the mirror earlier, when we have no more
> registered virtual address ranges, and create a new one if needed later.

I didn't write the code, but if you look at hmm_mirror it already is
a multiplexer over the mmu notifier, and the intent clearly seems to be
that you register one per range that you want to mirror, and not
multiplex it once again.  In other words - I think each amdgpu_mn_node
should probably have its own hmm_mirror.  And while the amdgpu_mn_node
objects are currently stored in an interval tree it seems like they are
only linearly iterated anyway, so a list actually seems pretty suitable.
If not we need to improve the core data structures instead of working
around them.
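[A rough sketch of what is being proposed here: one hmm_mirror embedded in each mirrored-range object, so hmm itself does the multiplexing. All names below (mn_node, my_invalidate_node, etc.) are invented for illustration and are not the actual amdgpu code; the hmm_mirror_register()/sync_cpu_device_pagetables() signatures match the API as it stood in this thread.]

#include <linux/hmm.h>

/* Hypothetical range object carrying its own mirror. */
struct mn_node {
	struct hmm_mirror mirror;	/* one mirror per range */
	unsigned long start, last;	/* inclusive bounds */
	struct list_head head;		/* plain list, no interval tree */
};

int my_invalidate_node(struct mn_node *node, unsigned long start,
		       unsigned long end);	/* hypothetical */

static int mn_node_sync_pagetables(struct hmm_mirror *mirror,
				   const struct hmm_update *update)
{
	struct mn_node *node = container_of(mirror, struct mn_node, mirror);

	/*
	 * Every mirror registered on this mm sees every invalidation;
	 * with one mirror per range, each callback only has to test
	 * its own bounds instead of walking a shared interval tree.
	 */
	if (update->end <= node->start || update->start > node->last)
		return 0;
	return my_invalidate_node(node, update->start, update->end);
}

static const struct hmm_mirror_ops mn_node_mirror_ops = {
	.sync_cpu_device_pagetables = mn_node_sync_pagetables,
};

static int mn_node_create(struct mn_node *node, struct mm_struct *mm,
			  unsigned long start, unsigned long last)
{
	node->start = start;
	node->last = last;
	node->mirror.ops = &mn_node_mirror_ops;
	return hmm_mirror_register(&node->mirror, mm);
}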
On Wed, Jun 19, 2019 at 01:07:05AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2019 at 12:53:55AM +0000, Kuehling, Felix wrote:
> > This code is derived from our old MMU notifier code. Before HMM we used
> > to register a single MMU notifier per mm_struct and look up virtual
> > address ranges that had been registered for mirroring via driver API
> > calls. The idea was to reuse a single MMU notifier for the lifetime of
> > the process. It would remain registered until we got a notifier_release.
> >
> > hmm_mirror took the place of that when we converted the code to HMM.
> >
> > I suppose we could destroy the mirror earlier, when we have no more
> > registered virtual address ranges, and create a new one if needed later.
>
> I didn't write the code, but if you look at hmm_mirror it already is
> a multiplexer over the mmu notifier, and the intent clearly seems to be
> that you register one per range that you want to mirror, and not
> multiplex it once again.  In other words - I think each amdgpu_mn_node
> should probably have its own hmm_mirror.  And while the amdgpu_mn_node
> objects are currently stored in an interval tree it seems like they are
> only linearly iterated anyway, so a list actually seems pretty suitable.
> If not we need to improve the core data structures instead of working
> around them.

This looks a lot like the ODP code (amdgpu_mn_node == ib_umem_odp)

The interval tree is to quickly find the driver object(s) that have
the virtual pages during invalidation:

static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
			const struct hmm_update *update)
{
	it = interval_tree_iter_first(&amn->objects, start, end);
	while (it) {
		[..]
		amdgpu_mn_invalidate_node(node, start, end);

And following the ODP model there should be a single hmm_mirror per-mm
(user can fork and stuff, this is something I want to have core code
help with).

The hmm_mirror can either exist so long as objects exist, or it can
exist until the chardev is closed - but never longer than the
chardev's lifetime.

Maybe we should be considering providing a mmu notifier & interval
tree & lock abstraction since ODP & AMD are very similar here..

Jason
On Wed, Jun 19, 2019 at 08:56:32AM -0300, Jason Gunthorpe wrote:
> This looks a lot like the ODP code (amdgpu_mn_node == ib_umem_odp)
>
> The interval tree is to quickly find the driver object(s) that have
> the virtual pages during invalidation:
>
> static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
> 			const struct hmm_update *update)
> {
> 	it = interval_tree_iter_first(&amn->objects, start, end);
> 	while (it) {
> 		[..]
> 		amdgpu_mn_invalidate_node(node, start, end);
>
> And following the ODP model there should be a single hmm_mirror per-mm
> (user can fork and stuff, this is something I want to have core code
> help with).

That makes the hmm_mirror object pretty silly, though, as the scope is
then exactly the same as the mmu_notifier itself.

> The hmm_mirror can either exist so long as objects exist, or it can
> exist until the chardev is closed - but never longer than the
> chardev's lifetime.
>
> Maybe we should be considering providing a mmu notifier & interval
> tree & lock abstraction since ODP & AMD are very similar here..

It definitely sounds like a good idea to move this kind of object
management into common code.  Nouveau actually seems like the odd one
out here by not having a list of objects below the mirror, but then
again an interval tree with a single entry wouldn't really hurt it
either.
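[To make the proposal concrete, here is one possible shape for the shared "mmu notifier + interval tree + lock" abstraction being discussed: a per-mm notifier that owns the tree and the lock, with drivers registering range objects against it. Every name and signature below is invented for illustration; no such API existed at the time of this thread. The interval_tree_iter_first()/next() calls are the real kernel helpers, which take an inclusive last address, hence the end - 1.]

#include <linux/interval_tree.h>
#include <linux/mmu_notifier.h>
#include <linux/rwsem.h>

/* Hypothetical driver-visible range object. */
struct range_object {
	struct interval_tree_node node;	/* start/last live in the tree */
	/* driver callback invoked for overlapping invalidations */
	int (*invalidate)(struct range_object *obj,
			  unsigned long start, unsigned long end);
};

/* Hypothetical per-mm notifier owning tree and lock. */
struct range_notifier {
	struct mmu_notifier mn;		/* one per mm, like ODP/amdgpu */
	struct rb_root_cached objects;	/* interval tree of range_objects */
	struct rw_semaphore lock;	/* protects the tree */
};

/* Core code would do roughly this from its invalidate hook: */
static int range_notifier_invalidate(struct range_notifier *rn,
				     unsigned long start, unsigned long end)
{
	struct interval_tree_node *it;
	int ret = 0;

	down_read(&rn->lock);
	for (it = interval_tree_iter_first(&rn->objects, start, end - 1);
	     it; it = interval_tree_iter_next(it, start, end - 1)) {
		struct range_object *obj =
			container_of(it, struct range_object, node);

		ret = obj->invalidate(obj, start, end);
		if (ret)
			break;
	}
	up_read(&rn->lock);
	return ret;
}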
diff --git a/mm/hmm.c b/mm/hmm.c
index 26af511cbdd075..c0d43302fd6b2f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -137,26 +137,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	WARN_ON(!list_empty(&hmm->ranges));
 	mutex_unlock(&hmm->lock);
 
-	down_write(&hmm->mirrors_sem);
-	mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
-					  list);
-	while (mirror) {
-		list_del_init(&mirror->list);
-		if (mirror->ops->release) {
-			/*
-			 * Drop mirrors_sem so the release callback can wait
-			 * on any pending work that might itself trigger a
-			 * mmu_notifier callback and thus would deadlock with
-			 * us.
-			 */
-			up_write(&hmm->mirrors_sem);
+	down_read(&hmm->mirrors_sem);
+	list_for_each_entry(mirror, &hmm->mirrors, list) {
+		/*
+		 * Note: The driver is not allowed to trigger
+		 * hmm_mirror_unregister() from this thread.
+		 */
+		if (mirror->ops->release)
 			mirror->ops->release(mirror);
-			down_write(&hmm->mirrors_sem);
-		}
-		mirror = list_first_entry_or_null(&hmm->mirrors,
-						  struct hmm_mirror, list);
 	}
-	up_write(&hmm->mirrors_sem);
+	up_read(&hmm->mirrors_sem);
 
 	hmm_put(hmm);
 }
@@ -286,7 +276,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
 	struct hmm *hmm = mirror->hmm;
 
 	down_write(&hmm->mirrors_sem);
-	list_del_init(&mirror->list);
+	list_del(&mirror->list);
 	up_write(&hmm->mirrors_sem);
 	hmm_put(hmm);
 	memset(&mirror->hmm, POISON_INUSE, sizeof(mirror->hmm));
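[For contrast with the work-queue deferral shown earlier in the thread, a sketch of the self-deadlock the new comment in this patch forbids. hmm_release() now holds mirrors_sem for read while calling ops->release(), and hmm_mirror_unregister() takes it for write, so this hypothetical callback would block forever:]

#include <linux/hmm.h>

/* Hypothetical anti-example; do not do this. */
static void bad_release(struct hmm_mirror *mirror)
{
	/*
	 * We are called from hmm_release() with
	 * down_read(&hmm->mirrors_sem) held.  hmm_mirror_unregister()
	 * does down_write(&hmm->mirrors_sem), which must wait for all
	 * readers (including us) to drop the rwsem: self-deadlock.
	 */
	hmm_mirror_unregister(mirror);
}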