Message ID | 20200928103507.2164-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: do not rely on mm == current->mm in __get_user_pages_locked | expand |
Quoting Jason A. Donenfeld (2020-09-28 11:35:07) > It seems likely this block was pasted from internal_get_user_pages_fast, > which is not passed an mm struct and therefore uses current's. But > __get_user_pages_locked is passed an explicit mm, and current->mm is not > always valid. This was hit when being called from i915, which uses: > > pin_user_pages_remote-> > __get_user_pages_remote-> > __gup_longterm_locked-> > __get_user_pages_locked > > Before, this would lead to an OOPS: > > BUG: kernel NULL pointer dereference, address: 0000000000000064 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: 0002 [#1] SMP > CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140 > Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020 > Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915] > RIP: 0010:__get_user_pages_remote+0xd7/0x310 > Code: f5 01 00 00 83 7d 00 01 0f 85 ed 01 00 00 f7 c1 00 00 04 00 0f 84 58 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 40 03 00 00 <c7> 40 64 01 00 00 00 65 48 8b 04 25 00 6d 01 00 48 c7 44 24 18 00 > RSP: 0018:ffff888fdfe47de0 EFLAGS: 00010206 > RAX: 0000000000000000 RBX: 00007fe188531000 RCX: 0000000000040001 > RDX: 0000000000000001 RSI: 00007fe188531000 RDI: ffff888ff0748f00 > RBP: ffff888fdfe47e54 R08: ffff888fedc7d7c8 R09: 0000000000000000 > R10: 0000000000000018 R11: fefefefefefefeff R12: ffff888ff0748f00 > R13: ffff888fedc7d7c8 R14: ffff888f81fe3a40 R15: 0000000000042003 > FS: 0000000000000000(0000) GS:ffff888ffc480000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000064 CR3: 0000000002009003 CR4: 00000000003706e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915] > process_one_work+0x1ca/0x390 > worker_thread+0x48/0x3c0 > ? rescuer_thread+0x3d0/0x3d0 > kthread+0x114/0x130 > ? kthread_create_worker_on_cpu+0x40/0x40 > ret_from_fork+0x1f/0x30 > CR2: 0000000000000064 > > This commit fixes the problem by using the mm pointer passed to the > function rather than the bogus one in current. > > Fixes: 008cfe4418b3 ("mm: Introduce mm_struct.has_pinned") > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Peter Xu <peterx@redhat.com> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > --- > mm/gup.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/gup.c b/mm/gup.c > index dfe781d2ad4c..e869c634cc9a 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1256,7 +1256,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > } > > if (flags & FOLL_PIN) > - atomic_set(¤t->mm->has_pinned, 1); > + atomic_set(&mm->has_pinned, 1); That's literally the same diff as I was just testing :) I can attest that it fixes the i915 issue, but since that's also your test case, I'm not adding much information. -Chris
On Mon, Sep 28, 2020 at 12:35:07PM +0200, Jason A. Donenfeld wrote: > It seems likely this block was pasted from internal_get_user_pages_fast, > which is not passed an mm struct and therefore uses current's. But > __get_user_pages_locked is passed an explicit mm, and current->mm is not > always valid. This was hit when being called from i915, which uses: > > pin_user_pages_remote-> > __get_user_pages_remote-> > __gup_longterm_locked-> > __get_user_pages_locked > > Before, this would lead to an OOPS: > > BUG: kernel NULL pointer dereference, address: 0000000000000064 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: 0002 [#1] SMP > CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140 > Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020 > Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915] > RIP: 0010:__get_user_pages_remote+0xd7/0x310 > Code: f5 01 00 00 83 7d 00 01 0f 85 ed 01 00 00 f7 c1 00 00 04 00 0f 84 58 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 40 03 00 00 <c7> 40 64 01 00 00 00 65 48 8b 04 25 00 6d 01 00 48 c7 44 24 18 00 > RSP: 0018:ffff888fdfe47de0 EFLAGS: 00010206 > RAX: 0000000000000000 RBX: 00007fe188531000 RCX: 0000000000040001 > RDX: 0000000000000001 RSI: 00007fe188531000 RDI: ffff888ff0748f00 > RBP: ffff888fdfe47e54 R08: ffff888fedc7d7c8 R09: 0000000000000000 > R10: 0000000000000018 R11: fefefefefefefeff R12: ffff888ff0748f00 > R13: ffff888fedc7d7c8 R14: ffff888f81fe3a40 R15: 0000000000042003 > FS: 0000000000000000(0000) GS:ffff888ffc480000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000064 CR3: 0000000002009003 CR4: 00000000003706e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915] > process_one_work+0x1ca/0x390 > worker_thread+0x48/0x3c0 > ? rescuer_thread+0x3d0/0x3d0 > kthread+0x114/0x130 > ? kthread_create_worker_on_cpu+0x40/0x40 > ret_from_fork+0x1f/0x30 > CR2: 0000000000000064 > > This commit fixes the problem by using the mm pointer passed to the > function rather than the bogus one in current. > > Fixes: 008cfe4418b3 ("mm: Introduce mm_struct.has_pinned") > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Peter Xu <peterx@redhat.com> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > --- > mm/gup.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Yes this looks like the right fix Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Thanks, Jason
On Mon, Sep 28, 2020 at 12:35:07PM +0200, Jason A. Donenfeld wrote: > It seems likely this block was pasted from internal_get_user_pages_fast, > which is not passed an mm struct and therefore uses current's. But > __get_user_pages_locked is passed an explicit mm, and current->mm is not > always valid. This was hit when being called from i915, which uses: > > pin_user_pages_remote-> > __get_user_pages_remote-> > __gup_longterm_locked-> > __get_user_pages_locked Afaict it's not only an "current->mm can be NULL" issue - because this flag is used to mark "whether the mm pinned any page", so for remote pinning we definitely should mark the remote mm rather than the current mm, simply because it's the target mm page table that we'd want to stablize rather than the current->mm (even if current->mm always existed). > > Before, this would lead to an OOPS: > > BUG: kernel NULL pointer dereference, address: 0000000000000064 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: 0002 [#1] SMP > CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140 > Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020 > Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915] > RIP: 0010:__get_user_pages_remote+0xd7/0x310 > Code: f5 01 00 00 83 7d 00 01 0f 85 ed 01 00 00 f7 c1 00 00 04 00 0f 84 58 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 40 03 00 00 <c7> 40 64 01 00 00 00 65 48 8b 04 25 00 6d 01 00 48 c7 44 24 18 00 > RSP: 0018:ffff888fdfe47de0 EFLAGS: 00010206 > RAX: 0000000000000000 RBX: 00007fe188531000 RCX: 0000000000040001 > RDX: 0000000000000001 RSI: 00007fe188531000 RDI: ffff888ff0748f00 > RBP: ffff888fdfe47e54 R08: ffff888fedc7d7c8 R09: 0000000000000000 > R10: 0000000000000018 R11: fefefefefefefeff R12: ffff888ff0748f00 > R13: ffff888fedc7d7c8 R14: ffff888f81fe3a40 R15: 0000000000042003 > FS: 0000000000000000(0000) GS:ffff888ffc480000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000064 CR3: 0000000002009003 CR4: 00000000003706e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915] > process_one_work+0x1ca/0x390 > worker_thread+0x48/0x3c0 > ? rescuer_thread+0x3d0/0x3d0 > kthread+0x114/0x130 > ? kthread_create_worker_on_cpu+0x40/0x40 > ret_from_fork+0x1f/0x30 > CR2: 0000000000000064 > > This commit fixes the problem by using the mm pointer passed to the > function rather than the bogus one in current. > > Fixes: 008cfe4418b3 ("mm: Introduce mm_struct.has_pinned") > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Peter Xu <peterx@redhat.com> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > --- > mm/gup.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/gup.c b/mm/gup.c > index dfe781d2ad4c..e869c634cc9a 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1256,7 +1256,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > } > > if (flags & FOLL_PIN) > - atomic_set(¤t->mm->has_pinned, 1); > + atomic_set(&mm->has_pinned, 1); > > /* > * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior > -- > 2.28.0 > Thanks! And sorry for this silly mistake. I even didn't understand how it was written, because the normal gup change should have come earlier, anyway... Reviewed-by: Peter Xu <peterx@redhat.com>
diff --git a/mm/gup.c b/mm/gup.c index dfe781d2ad4c..e869c634cc9a 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1256,7 +1256,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, } if (flags & FOLL_PIN) - atomic_set(¤t->mm->has_pinned, 1); + atomic_set(&mm->has_pinned, 1); /* * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior
It seems likely this block was pasted from internal_get_user_pages_fast, which is not passed an mm struct and therefore uses current's. But __get_user_pages_locked is passed an explicit mm, and current->mm is not always valid. This was hit when being called from i915, which uses: pin_user_pages_remote-> __get_user_pages_remote-> __gup_longterm_locked-> __get_user_pages_locked Before, this would lead to an OOPS: BUG: kernel NULL pointer dereference, address: 0000000000000064 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140 Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020 Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915] RIP: 0010:__get_user_pages_remote+0xd7/0x310 Code: f5 01 00 00 83 7d 00 01 0f 85 ed 01 00 00 f7 c1 00 00 04 00 0f 84 58 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 40 03 00 00 <c7> 40 64 01 00 00 00 65 48 8b 04 25 00 6d 01 00 48 c7 44 24 18 00 RSP: 0018:ffff888fdfe47de0 EFLAGS: 00010206 RAX: 0000000000000000 RBX: 00007fe188531000 RCX: 0000000000040001 RDX: 0000000000000001 RSI: 00007fe188531000 RDI: ffff888ff0748f00 RBP: ffff888fdfe47e54 R08: ffff888fedc7d7c8 R09: 0000000000000000 R10: 0000000000000018 R11: fefefefefefefeff R12: ffff888ff0748f00 R13: ffff888fedc7d7c8 R14: ffff888f81fe3a40 R15: 0000000000042003 FS: 0000000000000000(0000) GS:ffff888ffc480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000064 CR3: 0000000002009003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915] process_one_work+0x1ca/0x390 worker_thread+0x48/0x3c0 ? rescuer_thread+0x3d0/0x3d0 kthread+0x114/0x130 ? kthread_create_worker_on_cpu+0x40/0x40 ret_from_fork+0x1f/0x30 CR2: 0000000000000064 This commit fixes the problem by using the mm pointer passed to the function rather than the bogus one in current. Fixes: 008cfe4418b3 ("mm: Introduce mm_struct.has_pinned") Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- mm/gup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)