Message ID | b61d5999a4fc6d50b7e073cc3c3efa8fe79bbd94.1684097002.git.lstoakes@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5,1/6] mm/gup: remove unused vmas parameter from get_user_pages() | expand |
On Sun, May 14, 2023, Lorenzo Stoakes wrote: > No invocation of get_user_pages() use the vmas parameter, so remove it. > > The GUP API is confusing and caveated. Recent changes have done much to > improve that, however there is more we can do. Exporting vmas is a prime > target as the caller has to be extremely careful to preclude their use > after the mmap_lock has expired or otherwise be left with dangling > pointers. > > Removing the vmas parameter focuses the GUP functions upon their primary > purpose - pinning (and outputting) pages as well as performing the actions > implied by the input flags. > > This is part of a patch series aiming to remove the vmas parameter > altogether. > > Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Acked-by: David Hildenbrand <david@redhat.com> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> > Acked-by: Christian K�nig <christian.koenig@amd.com> (for radeon parts) > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > --- > arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/misc/sgi-gru/grufault.c | 2 +- > include/linux/mm.h | 3 +-- > mm/gup.c | 9 +++------ > mm/gup_test.c | 5 ++--- > virt/kvm/kvm_main.c | 2 +- > 7 files changed, 10 insertions(+), 15 deletions(-) Acked-by: Sean Christopherson <seanjc@google.com> (KVM) > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index cb5c13eee193..eaa5bb8dbadc 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) > { > int rc, flags = FOLL_HWPOISON | FOLL_WRITE; > > - rc = get_user_pages(addr, 1, flags, NULL, NULL); > + rc = get_user_pages(addr, 1, flags, NULL); > return rc == -EHWPOISON; Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns a valid page, KVM will leak the refcount and unintentionally pin the page. That's highly unlikely as check_user_page_hwpoison() is called iff get_user_pages_unlocked() fails (called by hva_to_pfn_slow()), but it's theoretically possible that userspace could change the VMAs between hva_to_pfn_slow() and check_user_page_hwpoison() since KVM doesn't hold any relevant locks at this point. E.g. if there's no VMA during hva_to_pfn_{fast,slow}(), npages==-EFAULT and KVM will invoke check_user_page_hwpoison(). If userspace installs a valid mapping after hva_to_pfn_slow() but before KVM acquires mmap_lock, then gup() will find a valid page. I _think_ the fix is to simply delete this code. The bug was introduced by commit fafc3dbaac64 ("KVM: Replace is_hwpoison_address with __get_user_pages"). At that time, KVM didn't check for "npages == -EHWPOISON" from the first call to get_user_pages_unlocked(). Later on, commit 0857b9e95c1a ("KVM: Enable async page fault processing") reworked the caller to be: mmap_read_lock(current->mm); if (npages == -EHWPOISON || (!async && check_user_page_hwpoison(addr))) { pfn = KVM_PFN_ERR_HWPOISON; goto exit; } where async really means NOWAIT, so that the hwpoison use of gup() didn't sleep. KVM: Enable async page fault processing If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to avoid sleeping on IO. Check for hwpoison is done at the same time, otherwise check_user_page_hwpoison() will call GUP again and will put vcpu to sleep. There are other potential problems too, e.g. the hwpoison call doesn't honor the recently introduced @interruptible flag. I don't see any reason to keep check_user_page_hwpoison(), KVM can simply rely on the "npages == -EHWPOISON" check. get_user_pages_unlocked() is guaranteed to be called with roughly equivalent flags, and the flags that aren't equivalent are arguably bugs in check_user_page_hwpoison(), e.g. assuming FOLL_WRITE is wrong. TL;DR: Go ahead with this change, I'll submit a separate patch to delete the buggy KVM code.
On 15.05.23 21:07, Sean Christopherson wrote: > On Sun, May 14, 2023, Lorenzo Stoakes wrote: >> No invocation of get_user_pages() use the vmas parameter, so remove it. >> >> The GUP API is confusing and caveated. Recent changes have done much to >> improve that, however there is more we can do. Exporting vmas is a prime >> target as the caller has to be extremely careful to preclude their use >> after the mmap_lock has expired or otherwise be left with dangling >> pointers. >> >> Removing the vmas parameter focuses the GUP functions upon their primary >> purpose - pinning (and outputting) pages as well as performing the actions >> implied by the input flags. >> >> This is part of a patch series aiming to remove the vmas parameter >> altogether. >> >> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> >> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >> Acked-by: David Hildenbrand <david@redhat.com> >> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> >> Acked-by: Christian K�nig <christian.koenig@amd.com> (for radeon parts) >> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> >> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> >> --- >> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- >> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- >> drivers/misc/sgi-gru/grufault.c | 2 +- >> include/linux/mm.h | 3 +-- >> mm/gup.c | 9 +++------ >> mm/gup_test.c | 5 ++--- >> virt/kvm/kvm_main.c | 2 +- >> 7 files changed, 10 insertions(+), 15 deletions(-) > > Acked-by: Sean Christopherson <seanjc@google.com> (KVM) > >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index cb5c13eee193..eaa5bb8dbadc 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) >> { >> int rc, flags = FOLL_HWPOISON | FOLL_WRITE; >> >> - rc = get_user_pages(addr, 1, flags, NULL, NULL); >> + rc = get_user_pages(addr, 1, flags, NULL); >> return rc == -EHWPOISON; > > Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns > a valid page, KVM will leak the refcount and unintentionally pin the page. That's When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with the mapcount of the page. So even if get_user_pages() returns "1", we should be fine. Or am I misunderstanding your concern? At least hva_to_pfn_slow() most certainly didn't return "1" if we end up calling check_user_page_hwpoison(), so nothing would have been pinned there as well.
On Tue, May 16, 2023, David Hildenbrand wrote: > On 15.05.23 21:07, Sean Christopherson wrote: > > On Sun, May 14, 2023, Lorenzo Stoakes wrote: > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > > index cb5c13eee193..eaa5bb8dbadc 100644 > > > --- a/virt/kvm/kvm_main.c > > > +++ b/virt/kvm/kvm_main.c > > > @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) > > > { > > > int rc, flags = FOLL_HWPOISON | FOLL_WRITE; > > > - rc = get_user_pages(addr, 1, flags, NULL, NULL); > > > + rc = get_user_pages(addr, 1, flags, NULL); > > > return rc == -EHWPOISON; > > > > Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns > > a valid page, KVM will leak the refcount and unintentionally pin the page. That's > > When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() > won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with > the mapcount of the page. Ah, that's what I'm missing. > So even if get_user_pages() returns "1", we should be fine. > > > Or am I misunderstanding your concern? Nope, you covered everything. I do think we can drop the extra gup() though, AFAICT it's 100% redundant. But it's not a bug. Thanks!
On 16.05.23 16:30, Sean Christopherson wrote: > On Tue, May 16, 2023, David Hildenbrand wrote: >> On 15.05.23 21:07, Sean Christopherson wrote: >>> On Sun, May 14, 2023, Lorenzo Stoakes wrote: >>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>>> index cb5c13eee193..eaa5bb8dbadc 100644 >>>> --- a/virt/kvm/kvm_main.c >>>> +++ b/virt/kvm/kvm_main.c >>>> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) >>>> { >>>> int rc, flags = FOLL_HWPOISON | FOLL_WRITE; >>>> - rc = get_user_pages(addr, 1, flags, NULL, NULL); >>>> + rc = get_user_pages(addr, 1, flags, NULL); >>>> return rc == -EHWPOISON; >>> >>> Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns >>> a valid page, KVM will leak the refcount and unintentionally pin the page. That's >> >> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() >> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with >> the mapcount of the page. For completeness: s/mapcount/refcount/ :) > > Ah, that's what I'm missing.
On 5/16/23 07:35, David Hildenbrand wrote: ... >>> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() >>> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with >>> the mapcount of the page. > > For completeness: s/mapcount/refcount/ :) whew, you had me going there! Now it all adds up. :) thanks,
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 21ca0a831b70..5d390df21440 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, if (!(vma->vm_flags & VM_MAYEXEC)) return -EACCES; - ret = get_user_pages(src, 1, 0, &src_page, NULL); + ret = get_user_pages(src, 1, 0, &src_page); if (ret < 1) return -EFAULT; diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 2220cdf6a3f6..3a9db030f98f 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -359,7 +359,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm struct page **pages = ttm->pages + pinned; r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0, - pages, NULL); + pages); if (r < 0) goto release_pages; diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c index b836936e9747..378cf02a2aa1 100644 --- a/drivers/misc/sgi-gru/grufault.c +++ b/drivers/misc/sgi-gru/grufault.c @@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma, #else *pageshift = PAGE_SHIFT; #endif - if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0) + if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0) return -EFAULT; *paddr = page_to_phys(page); put_page(page); diff --git a/include/linux/mm.h b/include/linux/mm.h index db3f66ed2f32..2c1a92bf5626 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2382,8 +2382,7 @@ long pin_user_pages_remote(struct mm_struct *mm, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *locked); long get_user_pages(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas); + unsigned int gup_flags, struct page **pages); long pin_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); diff --git a/mm/gup.c b/mm/gup.c index 90d9b65ff35c..b8189396f435 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2294,8 +2294,6 @@ long get_user_pages_remote(struct mm_struct *mm, * @pages: array that receives pointers to the pages pinned. * Should be at least nr_pages long. Or NULL, if caller * only intends to ensure the pages are faulted in. - * @vmas: array of pointers to vmas corresponding to each page. - * Or NULL if the caller does not require them. * * This is the same as get_user_pages_remote(), just with a less-flexible * calling convention where we assume that the mm being operated on belongs to @@ -2303,16 +2301,15 @@ long get_user_pages_remote(struct mm_struct *mm, * obviously don't pass FOLL_REMOTE in here. */ long get_user_pages(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas) + unsigned int gup_flags, struct page **pages) { int locked = 1; - if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH)) + if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH)) return -EINVAL; return __get_user_pages_locked(current->mm, start, nr_pages, pages, - vmas, &locked, gup_flags); + NULL, &locked, gup_flags); } EXPORT_SYMBOL(get_user_pages); diff --git a/mm/gup_test.c b/mm/gup_test.c index 8ae7307a1bb6..9ba8ea23f84e 100644 --- a/mm/gup_test.c +++ b/mm/gup_test.c @@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd, pages + i); break; case GUP_BASIC_TEST: - nr = get_user_pages(addr, nr, gup->gup_flags, pages + i, - NULL); + nr = get_user_pages(addr, nr, gup->gup_flags, pages + i); break; case PIN_FAST_BENCHMARK: nr = pin_user_pages_fast(addr, nr, gup->gup_flags, @@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd, pages + i, NULL); else nr = get_user_pages(addr, nr, gup->gup_flags, - pages + i, NULL); + pages + i); break; default: ret = -EINVAL; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cb5c13eee193..eaa5bb8dbadc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) { int rc, flags = FOLL_HWPOISON | FOLL_WRITE; - rc = get_user_pages(addr, 1, flags, NULL, NULL); + rc = get_user_pages(addr, 1, flags, NULL); return rc == -EHWPOISON; }