Message ID | 20161025233609.5601-1-lstoakes@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
The holdout for unexporting __get_user_pages_unlocked() is its invocation in mm/process_vm_access.c: process_vm_rw_single_vec(), as this definitely _does_ seem to invoke VM_FAULT_RETRY behaviour which get_user_pages_remote() will not trigger if we were to replace it with the latter. I'm not sure how to proceed in this case - get_user_pages_remote() invocations assume mmap_sem is held so can't offer VM_FAULT_RETRY behaviour as the lock can't be assumed to be safe to release, and get_user_pages_unlocked() assumes tsk, mm are set to current, current->mm respectively so we can't use that here either. Is it important to retain VM_FAULT_RETRY behaviour here, does it matter? If it isn't so important then we can just go ahead and replace with get_user_pages_remote() and unexport away. Of course the whole idea of unexporting __get_user_pages_unlocked() might be bogus so let me know in that case also :) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 26, 2016 at 12:36:09AM +0100, Lorenzo Stoakes wrote: > In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with > get_user_pages_unlocked() since we can now pass gup_flags. > > In async_pf_execute() we need to pass different tsk, mm arguments so > get_user_pages_remote() is the sane replacement here (having added manual > acquisition and release of mmap_sem.) > > Since we pass a NULL pages parameter the subsequent call to > __get_user_pages_locked() will have previously bailed any attempt at > VM_FAULT_RETRY, so we do not change this behaviour by using > get_user_pages_remote() which does not invoke VM_FAULT_RETRY logic at all. > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Note that the use of get_user_pages_remote() in async_pf_execute() reintroduces the use of the FOLL_TOUCH flag - I don't think this is a problem as this flag was dropped by 1e987790 ("mm/gup: Introduce get_user_pages_remote()") which states 'Without protection keys, this patch should not change any behavior', so I don't think this was intentional. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 26-10-16 08:59:52, Lorenzo Stoakes wrote: > On Wed, Oct 26, 2016 at 12:36:09AM +0100, Lorenzo Stoakes wrote: > > In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with > > get_user_pages_unlocked() since we can now pass gup_flags. > > > > In async_pf_execute() we need to pass different tsk, mm arguments so > > get_user_pages_remote() is the sane replacement here (having added manual > > acquisition and release of mmap_sem.) > > > > Since we pass a NULL pages parameter the subsequent call to > > __get_user_pages_locked() will have previously bailed any attempt at > > VM_FAULT_RETRY, so we do not change this behaviour by using > > get_user_pages_remote() which does not invoke VM_FAULT_RETRY logic at all. > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > Note that the use of get_user_pages_remote() in async_pf_execute() reintroduces > the use of the FOLL_TOUCH flag - I don't think this is a problem as this flag > was dropped by 1e987790 ("mm/gup: Introduce get_user_pages_remote()") which > states 'Without protection keys, this patch should not change any behavior', so > I don't think this was intentional. Yes, I have already mentioned this in one of my previous emails. This indeed doesn't seem to be intentional
On Wed 26-10-16 00:36:09, Lorenzo Stoakes wrote: > In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with > get_user_pages_unlocked() since we can now pass gup_flags. > > In async_pf_execute() we need to pass different tsk, mm arguments so > get_user_pages_remote() is the sane replacement here (having added manual > acquisition and release of mmap_sem.) please also add a note about the FOLL_TOUCH reintroduced after it has been dropped by 1e9877902dc7e silently > Since we pass a NULL pages parameter the subsequent call to > __get_user_pages_locked() will have previously bailed any attempt at > VM_FAULT_RETRY, so we do not change this behaviour by using > get_user_pages_remote() which does not invoke VM_FAULT_RETRY logic at all. > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> > --- > virt/kvm/async_pf.c | 7 ++++--- > virt/kvm/kvm_main.c | 5 ++--- > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c > index 8035cc1..e8c832c 100644 > --- a/virt/kvm/async_pf.c > +++ b/virt/kvm/async_pf.c > @@ -82,10 +82,11 @@ static void async_pf_execute(struct work_struct *work) > /* > * This work is run asynchromously to the task which owns > * mm and might be done in another context, so we must > - * use FOLL_REMOTE. > + * access remotely. > */ > - __get_user_pages_unlocked(NULL, mm, addr, 1, NULL, > - FOLL_WRITE | FOLL_REMOTE); > + down_read(&mm->mmap_sem); > + get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL); > + up_read(&mm->mmap_sem); > > kvm_async_page_present_sync(vcpu, apf); > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 2907b7b..c45d951 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1415,13 +1415,12 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, > npages = get_user_page_nowait(addr, write_fault, page); > up_read(¤t->mm->mmap_sem); > } else { > - unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON; > + unsigned int flags = FOLL_HWPOISON; > > if (write_fault) > flags |= FOLL_WRITE; > > - npages = __get_user_pages_unlocked(current, current->mm, addr, 1, > - page, flags); > + npages = get_user_pages_unlocked(addr, 1, page, flags); > } > if (npages != 1) > return npages; > -- > 2.10.1
On Wed 26-10-16 00:46:31, Lorenzo Stoakes wrote: > The holdout for unexporting __get_user_pages_unlocked() is its invocation in > mm/process_vm_access.c: process_vm_rw_single_vec(), as this definitely _does_ > seem to invoke VM_FAULT_RETRY behaviour which get_user_pages_remote() will not > trigger if we were to replace it with the latter. I am not sure I understand. Prior to 1e9877902dc7e this used get_user_pages_unlocked. What prevents us from reintroducing it with FOLL_REMOVE which was meant to be added by the above commit? Or am I missing your point?
On Wed, Oct 26, 2016 at 11:15:43AM +0200, Michal Hocko wrote: > On Wed 26-10-16 00:46:31, Lorenzo Stoakes wrote: > > The holdout for unexporting __get_user_pages_unlocked() is its invocation in > > mm/process_vm_access.c: process_vm_rw_single_vec(), as this definitely _does_ > > seem to invoke VM_FAULT_RETRY behaviour which get_user_pages_remote() will not > > trigger if we were to replace it with the latter. > > I am not sure I understand. Prior to 1e9877902dc7e this used > get_user_pages_unlocked. What prevents us from reintroducing it with > FOLL_REMOVE which was meant to be added by the above commit? > > Or am I missing your point? The issue isn't the flags being passed, rather that in this case: a. Replacing __get_user_pages_unlocked() with get_user_pages_unlocked() won't work as the latter assumes task = current and mm = current->mm but process_vm_rw_single_vec() needs to pass different task, mm. b. Moving to get_user_pages_remote() _will_ allow us to pass different task, mm but won't however match existing behaviour precisely, since __get_user_pages_unlocked() acquires mmap_sem then passes a pointer to a local 'locked' variable to __get_user_pages_locked() which allows VM_FAULT_RETRY to trigger. The main issue I had here was not being sure whether we care about the VM_FAULT_RETRY functionality being used here or not, if we don't care then we can just move to get_user_pages_remote(), otherwise perhaps this should be left alone or maybe we need to consider adjusting the API to allow for remote access with VM_FAULT_RETRY functionality. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 26-10-16 10:39:13, Lorenzo Stoakes wrote: > On Wed, Oct 26, 2016 at 11:15:43AM +0200, Michal Hocko wrote: > > On Wed 26-10-16 00:46:31, Lorenzo Stoakes wrote: > > > The holdout for unexporting __get_user_pages_unlocked() is its invocation in > > > mm/process_vm_access.c: process_vm_rw_single_vec(), as this definitely _does_ > > > seem to invoke VM_FAULT_RETRY behaviour which get_user_pages_remote() will not > > > trigger if we were to replace it with the latter. > > > > I am not sure I understand. Prior to 1e9877902dc7e this used > > get_user_pages_unlocked. What prevents us from reintroducing it with > > FOLL_REMOVE which was meant to be added by the above commit? > > > > Or am I missing your point? > > The issue isn't the flags being passed, rather that in this case: > > a. Replacing __get_user_pages_unlocked() with get_user_pages_unlocked() won't > work as the latter assumes task = current and mm = current->mm but > process_vm_rw_single_vec() needs to pass different task, mm. Ohh, right. I should have checked more closely. > b. Moving to get_user_pages_remote() _will_ allow us to pass different task, mm > but won't however match existing behaviour precisely, since > __get_user_pages_unlocked() acquires mmap_sem then passes a pointer to a > local 'locked' variable to __get_user_pages_locked() which allows > VM_FAULT_RETRY to trigger. I do not see any reason why get_user_pages_remote should implicitely disallow VM_FAULT_RETRY. Releasing the mmap_sem on a remote task when we have to wait for IO is a good thing in general. So I would rather see a way to do allow that. Doing that implicitly sounds too dangerous and maybe we even have users which wouldn't cope with the mmap sem being dropped (get_arg_page sounds like a potential example) so I would rather add locked * parameter to get_user_pages_remote.
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index 8035cc1..e8c832c 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -82,10 +82,11 @@ static void async_pf_execute(struct work_struct *work) /* * This work is run asynchromously to the task which owns * mm and might be done in another context, so we must - * use FOLL_REMOTE. + * access remotely. */ - __get_user_pages_unlocked(NULL, mm, addr, 1, NULL, - FOLL_WRITE | FOLL_REMOTE); + down_read(&mm->mmap_sem); + get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL); + up_read(&mm->mmap_sem); kvm_async_page_present_sync(vcpu, apf); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2907b7b..c45d951 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1415,13 +1415,12 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, npages = get_user_page_nowait(addr, write_fault, page); up_read(¤t->mm->mmap_sem); } else { - unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON; + unsigned int flags = FOLL_HWPOISON; if (write_fault) flags |= FOLL_WRITE; - npages = __get_user_pages_unlocked(current, current->mm, addr, 1, - page, flags); + npages = get_user_pages_unlocked(addr, 1, page, flags); } if (npages != 1) return npages;
In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with get_user_pages_unlocked() since we can now pass gup_flags. In async_pf_execute() we need to pass different tsk, mm arguments so get_user_pages_remote() is the sane replacement here (having added manual acquisition and release of mmap_sem.) Since we pass a NULL pages parameter the subsequent call to __get_user_pages_locked() will have previously bailed any attempt at VM_FAULT_RETRY, so we do not change this behaviour by using get_user_pages_remote() which does not invoke VM_FAULT_RETRY logic at all. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> --- virt/kvm/async_pf.c | 7 ++++--- virt/kvm/kvm_main.c | 5 ++--- 2 files changed, 6 insertions(+), 6 deletions(-) -- 2.10.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html