
[v3,hmm,11/12] mm/hmm: Remove confusing comment and logic from hmm_release

Message ID 20190614004450.20252-12-jgg@ziepe.ca (mailing list archive)
State New, archived
Series mm/hmm: Various revisions from a locking/code review

Commit Message

Jason Gunthorpe June 14, 2019, 12:44 a.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use-after-free race of the form:

       CPU0                                   CPU1
                                           hmm_release()
                                             up_write(&hmm->mirrors_sem);
 hmm_mirror_unregister(mirror)
  down_write(&hmm->mirrors_sem);
  up_write(&hmm->mirrors_sem);
  kfree(mirror)
                                             mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race, drivers are not allowed to
register/unregister mirrors from within a release op.
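
For illustration only, a release op that stays within this rule defers any
teardown that itself needs hmm_mirror_unregister() to a work item rather
than doing it inline. The names below are hypothetical and not taken from
any in-tree driver:

static void example_mirror_release(struct hmm_mirror *mirror)
{
	struct example_dev *edev = container_of(mirror, struct example_dev,
						mirror);

	/*
	 * hmm_release() holds mirrors_sem for read across this callback and
	 * hmm_mirror_unregister() takes it for write, so unregistering here
	 * would deadlock.  Defer the real teardown to process context.
	 */
	INIT_WORK(&edev->release_work, example_release_worker);
	schedule_work(&edev->release_work);
}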

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
---
 mm/hmm.c | 28 +++++++++-------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

Comments

Christoph Hellwig June 15, 2019, 2:21 p.m. UTC | #1
On Thu, Jun 13, 2019 at 09:44:49PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> hmm_release() is called exactly once per hmm. ops->release() cannot
> accidentally trigger any action that would recurse back onto
> hmm->mirrors_sem.

In linux-next amdgpu actually calls hmm_mirror_unregister from its
release function.  That whole release function looks rather sketchy,
but we probably need to sort that out first.
Jason Gunthorpe June 18, 2019, 12:45 a.m. UTC | #2
On Sat, Jun 15, 2019 at 07:21:06AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 13, 2019 at 09:44:49PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> 
> In linux-next amdgpu actually calls hmm_mirror_unregister from its
> release function.  That whole release function looks rather sketchy,
> but we probably need to sort that out first.

Does it? I see this:

static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
{
        struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);

        INIT_WORK(&amn->work, amdgpu_mn_destroy);
        schedule_work(&amn->work);
}

static struct hmm_mirror_ops amdgpu_hmm_mirror_ops[] = {
        [AMDGPU_MN_TYPE_GFX] = {
                .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_gfx,
                .release = amdgpu_hmm_mirror_release
        },
        [AMDGPU_MN_TYPE_HSA] = {
                .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_hsa,
                .release = amdgpu_hmm_mirror_release
        },
};


Am I looking at the wrong thing? Looks like it calls it through a work
queue, which should be OK.

Though very strange that amdgpu only destroys the mirror via release,
that cannot be right.

Jason
Christoph Hellwig June 18, 2019, 5:37 a.m. UTC | #3
On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote:
> Am I looking at the wrong thing? Looks like it calls it through a work
> queue, which should be OK.

Yes, it calls it through a work queue.  I guess that is fine because
it needs to take the lock again.

> Though very strange that amdgpu only destroys the mirror via release,
> that cannot be right.

As said, the whole thing looks rather odd to me.
Felix Kuehling June 19, 2019, 12:53 a.m. UTC | #4
On 2019-06-18 1:37, Christoph Hellwig wrote:
> On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote:
>> Am I looking at the wrong thing? Looks like it calls it through a work
>> queue, which should be OK.
> Yes, it calls it through a work queue.  I guess that is fine because
> it needs to take the lock again.
>
>> Though very strange that amdgpu only destroys the mirror via release,
>> that cannot be right.
> As said, the whole thing looks rather odd to me.

This code is derived from our old MMU notifier code. Before HMM we used 
to register a single MMU notifier per mm_struct and look up virtual 
address ranges that had been registered for mirroring via driver API 
calls. The idea was to reuse a single MMU notifier for the lifetime of
the process. It would remain registered until we got a notifier_release.

hmm_mirror took the place of that when we converted the code to HMM.

I suppose we could destroy the mirror earlier, when we have no more 
registered virtual address ranges, and create a new one if needed later.
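
A rough sketch of that, just to make the idea concrete (range_count and
the two helpers are made up, and locking around the count is elided):

static int example_add_range(struct amdgpu_mn *amn, struct mm_struct *mm)
{
	/* first mirrored VA range for this mm: create the mirror */
	if (amn->range_count++ == 0)
		return hmm_mirror_register(&amn->mirror, mm);
	return 0;
}

static void example_remove_range(struct amdgpu_mn *amn)
{
	/* last mirrored VA range gone: tear the mirror down */
	if (--amn->range_count == 0)
		hmm_mirror_unregister(&amn->mirror);
}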

Regards,
   Felix
Christoph Hellwig June 19, 2019, 8:07 a.m. UTC | #5
On Wed, Jun 19, 2019 at 12:53:55AM +0000, Kuehling, Felix wrote:
> This code is derived from our old MMU notifier code. Before HMM we used 
> to register a single MMU notifier per mm_struct and look up virtual 
> address ranges that had been registered for mirroring via driver API 
> calls. The idea was to reuse a single MMU notifier for the lifetime of
> the process. It would remain registered until we got a notifier_release.
> 
> hmm_mirror took the place of that when we converted the code to HMM.
> 
> I suppose we could destroy the mirror earlier, when we have no more 
> registered virtual address ranges, and create a new one if needed later.

I didn't write the code, but if you look at hmm_mirror it already is
a multiplexer over the mmu notifier, and the intent clearly seems to be
that you register one per range that you want to mirror, and not
multiplex it once again.  In other words - I think each amdgpu_mn_node
should probably have its own hmm_mirror.  And while the amdgpu_mn_node
objects are currently stored in an interval tree, it seems like they are
only linearly iterated anyway, so a list actually seems pretty suitable.
If not, we need to improve the core data structures instead of working
around them.
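
To make that concrete, something along these lines (the field layout is
invented for illustration, not a proposal for the actual struct):

struct amdgpu_mn_node {
	struct hmm_mirror	mirror;	/* one mirror per mirrored VA range */
	struct list_head	list;	/* linked off the owning device/fd */
	uint64_t		start;	/* mirrored VA range */
	uint64_t		last;
	struct list_head	bos;	/* BOs backed by this range */
};

Each node would register its own mirror, and its
sync_cpu_device_pagetables() would only have to check against its own
[start, last] instead of walking a per-mm interval tree.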
Jason Gunthorpe June 19, 2019, 11:56 a.m. UTC | #6
On Wed, Jun 19, 2019 at 01:07:05AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2019 at 12:53:55AM +0000, Kuehling, Felix wrote:
> > This code is derived from our old MMU notifier code. Before HMM we used 
> > to register a single MMU notifier per mm_struct and look up virtual 
> > address ranges that had been registered for mirroring via driver API 
> > calls. The idea was to reuse a single MMU notifier for the lifetime of
> > the process. It would remain registered until we got a notifier_release.
> > 
> > hmm_mirror took the place of that when we converted the code to HMM.
> > 
> > I suppose we could destroy the mirror earlier, when we have no more 
> > registered virtual address ranges, and create a new one if needed later.
> 
> I didn't write the code, but if you look at hmm_mirror it already is
> a multiplexer over the mmu notifier, and the intent clearly seems to be
> that you register one per range that you want to mirror, and not
> multiplex it once again.  In other words - I think each amdgpu_mn_node
> should probably have its own hmm_mirror.  And while the amdgpu_mn_node
> objects are currently stored in an interval tree, it seems like they are
> only linearly iterated anyway, so a list actually seems pretty suitable.
> If not, we need to improve the core data structures instead of working
> around them.

This looks a lot like the ODP code (amdgpu_mn_node == ib_umem_odp)

The interval tree is to quickly find the driver object(s) that have
the virtual pages during invalidation:

static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
                        const struct hmm_update *update)
{
        it = interval_tree_iter_first(&amn->objects, start, end);
        while (it) {
                [..]
                amdgpu_mn_invalidate_node(node, start, end);

And following the ODP model there should be a single hmm_mirror per-mm
(user can fork and stuff, this is something I want to have core code
help with). 

The hmm_mirror can either exist so long as objects exist, or it can
exist until the chardev is closed - but never longer than the
chardev's lifetime.

Maybe we should be considering providing a mmu notifier & interval
tree & lock abstraction since ODP & AMD are very similar here..
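
Very roughly, something of this shape (every name here is invented for
the sake of argument, not a real API):

struct mirror_range_tree {
	struct hmm_mirror	mirror;	/* one per mm */
	struct rb_root_cached	ranges;	/* interval tree of driver objects */
	struct mutex		lock;	/* insert/remove vs. invalidation */
};

struct mirror_range {
	struct interval_tree_node it;	/* start/last of the mirrored VA */
	void (*invalidate)(struct mirror_range *range,
			   unsigned long start, unsigned long end);
};

with a common sync_cpu_device_pagetables() that takes the lock, walks the
tree with interval_tree_iter_first()/interval_tree_iter_next() and calls
->invalidate() on each overlapping object, so ODP and amdgpu stop open
coding the same walk.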

Jason
Christoph Hellwig June 19, 2019, 12:03 p.m. UTC | #7
On Wed, Jun 19, 2019 at 08:56:32AM -0300, Jason Gunthorpe wrote:
> This looks a lot like the ODP code (amdgpu_mn_node == ib_umem_odp)
> 
> The interval tree is to quickly find the driver object(s) that have
> the virtual pages during invalidation:
> 
> static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
>                         const struct hmm_update *update)
> {
>         it = interval_tree_iter_first(&amn->objects, start, end);
>         while (it) {
>                 [..]
>                 amdgpu_mn_invalidate_node(node, start, end);
> 
> And following the ODP model there should be a single hmm_mirror per-mm
> (user can fork and stuff, this is something I want to have core code
> help with). 

That makes the hmm_mirror object pretty silly, though, as the scope
is then exactly the same as the mmu_notifier itself.

> The hmm_mirror can either exist so long as objects exist, or it can
> exist until the chardev is closed - but never longer than the
> chardev's lifetime.
> 
> Maybe we should be considering providing a mmu notifier & interval
> tree & lock abstraction since ODP & AMD are very similar here..

It definitely sounds like a good idea to move this kind of object
management into common code.  Nouveau actually seems like the odd one
out here by not having a list of objects below the mirror, but then
again an interval tree with a single entry wouldn't really hurt it
either.

Patch

diff --git a/mm/hmm.c b/mm/hmm.c
index 26af511cbdd075..c0d43302fd6b2f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -137,26 +137,16 @@  static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	WARN_ON(!list_empty(&hmm->ranges));
 	mutex_unlock(&hmm->lock);
 
-	down_write(&hmm->mirrors_sem);
-	mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
-					  list);
-	while (mirror) {
-		list_del_init(&mirror->list);
-		if (mirror->ops->release) {
-			/*
-			 * Drop mirrors_sem so the release callback can wait
-			 * on any pending work that might itself trigger a
-			 * mmu_notifier callback and thus would deadlock with
-			 * us.
-			 */
-			up_write(&hmm->mirrors_sem);
+	down_read(&hmm->mirrors_sem);
+	list_for_each_entry(mirror, &hmm->mirrors, list) {
+		/*
+		 * Note: The driver is not allowed to trigger
+		 * hmm_mirror_unregister() from this thread.
+		 */
+		if (mirror->ops->release)
 			mirror->ops->release(mirror);
-			down_write(&hmm->mirrors_sem);
-		}
-		mirror = list_first_entry_or_null(&hmm->mirrors,
-						  struct hmm_mirror, list);
 	}
-	up_write(&hmm->mirrors_sem);
+	up_read(&hmm->mirrors_sem);
 
 	hmm_put(hmm);
 }
@@ -286,7 +276,7 @@  void hmm_mirror_unregister(struct hmm_mirror *mirror)
 	struct hmm *hmm = mirror->hmm;
 
 	down_write(&hmm->mirrors_sem);
-	list_del_init(&mirror->list);
+	list_del(&mirror->list);
 	up_write(&hmm->mirrors_sem);
 	hmm_put(hmm);
 	memset(&mirror->hmm, POISON_INUSE, sizeof(mirror->hmm));