Message ID | 20230414175444.1837474-1-surenb@google.com (mailing list archive) |
---|---|
State | New |
Series | [1/1] mm: do not increment pgfault stats when page fault handler retries |
On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> If the page fault handler requests a retry, we will count the fault
> multiple times. This is a relatively harmless problem as the retry paths
> are not often requested, and the only user-visible problem is that the
> fault counter will be slightly higher than it should be. Nevertheless,
> userspace only took one fault, and should not see the fact that the
> kernel had to retry the fault multiple times.
>
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")

I know I suggested this fixes line, but I think it's actually been
here much longer, perhaps since

Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")

Michel, what do you think?

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Patch applies cleanly over linux-next and mm-unstable
>
>  mm/memory.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c5b231fe6e3..d88f370eacd1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>  	__set_current_state(TASK_RUNNING);
>
> -	count_vm_event(PGFAULT);
> -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> -
>  	ret = sanitize_fault_flags(vma, &flags);
>  	if (ret)
> -		return ret;
> +		goto out;
>
>  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>  					    flags & FAULT_FLAG_INSTRUCTION,
> -					    flags & FAULT_FLAG_REMOTE))
> -		return VM_FAULT_SIGSEGV;
> +					    flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out;
> +	}
>
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
> @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	}
>
>  	mm_account_fault(regs, address, flags, ret);
> +out:
> +	if (!(ret & VM_FAULT_RETRY)) {
> +		count_vm_event(PGFAULT);
> +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
> +	}
>
>  	return ret;
>  }
> --
> 2.40.0.634.g4ca3ef3211-goog
>
Hi, Suren,

On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> If the page fault handler requests a retry, we will count the fault
> multiple times. This is a relatively harmless problem as the retry paths
> are not often requested, and the only user-visible problem is that the
> fault counter will be slightly higher than it should be. Nevertheless,
> userspace only took one fault, and should not see the fact that the
> kernel had to retry the fault multiple times.
>
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Patch applies cleanly over linux-next and mm-unstable
>
>  mm/memory.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c5b231fe6e3..d88f370eacd1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>  	__set_current_state(TASK_RUNNING);
>
> -	count_vm_event(PGFAULT);
> -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> -
>  	ret = sanitize_fault_flags(vma, &flags);
>  	if (ret)
> -		return ret;
> +		goto out;
>
>  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>  					    flags & FAULT_FLAG_INSTRUCTION,
> -					    flags & FAULT_FLAG_REMOTE))
> -		return VM_FAULT_SIGSEGV;
> +					    flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out;
> +	}
>
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
> @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	}
>
>  	mm_account_fault(regs, address, flags, ret);

Here is the mm_account_fault() function taking care of some other
accountings. Perhaps good to put things into it?

It also already ignores invalid faults:

	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
		return;

I see that you may also want to account for sigbus, however I really don't
know why. Explanations would be great when it would matter. So far it
makes sense to me if we skip both RETRY or ERROR cases.

> +out:
> +	if (!(ret & VM_FAULT_RETRY)) {
> +		count_vm_event(PGFAULT);
> +		count_memcg_event_mm(vma->vm_mm, PGFAULT);

There is one thing worth noticing is here vma may or may not be valid
depending on the retval of the fault.

RETRY is exactly one of the cases that accessing vma may be unsafe due to
releasing of mmap read lock. The other one is the recently added
VM_FAULT_COMPLETE. So if we want to move this chunk (or any vma reference)
to be later we need to consider a valid vma / mm being there first, or
we're prone to accessing a vma that has already been released, I think.

> +	}
>
>  	return ret;
>  }
> --
> 2.40.0.634.g4ca3ef3211-goog
>

Thanks,
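The lifetime concern raised in the message above can be illustrated with a small user-space model. This is only a sketch, not kernel code: every `demo_*` name and the flag value are hypothetical stand-ins, and it assumes the mm object outlives the call even when the vma does not. It shows one possible way to avoid touching the vma after a retry, namely reading the mm pointer before the fault is handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag value for this sketch only; not the kernel's definition. */
#define DEMO_FAULT_RETRY 0x1u

/* Hypothetical stand-ins for the mm and vma objects. */
struct demo_mm  { unsigned long pgfault_events; };
struct demo_vma { struct demo_mm *mm; bool valid; };

/*
 * Stand-in for the real fault handling. On retry the lock is assumed to
 * have been dropped, so the vma is marked as no longer safe to touch.
 */
static unsigned int demo_do_fault(struct demo_vma *vma, bool need_retry)
{
	if (need_retry) {
		vma->valid = false;
		return DEMO_FAULT_RETRY;
	}
	return 0;
}

static unsigned int demo_handle_fault(struct demo_vma *vma, bool need_retry)
{
	/* Read the mm pointer while the vma is still known to be valid. */
	struct demo_mm *mm = vma->mm;
	unsigned int ret = demo_do_fault(vma, need_retry);

	/* Account through the cached pointer; the vma is not dereferenced here. */
	if (!(ret & DEMO_FAULT_RETRY))
		mm->pgfault_events++;

	return ret;
}
```

In this model a retried attempt leaves the counter untouched and never reads the vma after `demo_do_fault()` returns; only the final attempt bumps the counter.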
On Fri, Apr 14, 2023 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,

Hi Peter,

>
> On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> > If the page fault handler requests a retry, we will count the fault
> > multiple times. This is a relatively harmless problem as the retry paths
> > are not often requested, and the only user-visible problem is that the
> > fault counter will be slightly higher than it should be. Nevertheless,
> > userspace only took one fault, and should not see the fact that the
> > kernel had to retry the fault multiple times.
> >
> > Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > Patch applies cleanly over linux-next and mm-unstable
> >
> >  mm/memory.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1c5b231fe6e3..d88f370eacd1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >
> >  	__set_current_state(TASK_RUNNING);
> >
> > -	count_vm_event(PGFAULT);
> > -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > -
> >  	ret = sanitize_fault_flags(vma, &flags);
> >  	if (ret)
> > -		return ret;
> > +		goto out;
> >
> >  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> >  					    flags & FAULT_FLAG_INSTRUCTION,
> > -					    flags & FAULT_FLAG_REMOTE))
> > -		return VM_FAULT_SIGSEGV;
> > +					    flags & FAULT_FLAG_REMOTE)) {
> > +		ret = VM_FAULT_SIGSEGV;
> > +		goto out;
> > +	}
> >
> >  	/*
> >  	 * Enable the memcg OOM handling for faults triggered in user
> > @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >  	}
> >
> >  	mm_account_fault(regs, address, flags, ret);
>
> Here is the mm_account_fault() function taking care of some other
> accountings. Perhaps good to put things into it?

That seems appropriate. Let me take a closer look.

>
> It also already ignores invalid faults:
>
> 	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> 		return;

Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
we need to retry but no errors happened? If so then this condition
would double-count pagefaults in such cases. If such return code is
impossible then it's the same as checking for VM_FAULT_RETRY.

>
> I see that you may also want to account for sigbus, however I really don't
> know why. Explanations would be great when it would matter. So far it
> makes sense to me if we skip both RETRY or ERROR cases.

Accounting in case of a sigbus is not affected by this patch I think.
We account for sigbus or any other error cases because there was a
pagefault and we need to account for it. Whether we failed to handle
it or not should not affect the count. We skip the retry case because
we know the same fault will be retried. If we don't skip then we will
double-count this fault.

>
> > +out:
> > +	if (!(ret & VM_FAULT_RETRY)) {
> > +		count_vm_event(PGFAULT);
> > +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
>
> There is one thing worth noticing is here vma may or may not be valid
> depending on the retval of the fault.
>
> RETRY is exactly one of the cases that accessing vma may be unsafe due to
> releasing of mmap read lock. The other one is the recently added
> VM_FAULT_COMPLETE. So if we want to move this chunk (or any vma reference)
> to be later we need to consider a valid vma / mm being there first, or
> we're prone to accessing a vma that has already been released, I think.

Good catch! I think you are right and I should have stored vma->vm_mm
in the beginning and used it when calling count_memcg_event_mm().
I'll prepare a new patch which handles this correctly.
Thanks,
Suren.

>
> > +	}
> >
> >  	return ret;
> >  }
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >
> >
>
> Thanks,
>
> --
> Peter Xu
>
On Fri, Apr 14, 2023 at 3:14 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 14, 2023 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
>
> Hi Peter,
>
> >
> > On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> > > If the page fault handler requests a retry, we will count the fault
> > > multiple times. This is a relatively harmless problem as the retry paths
> > > are not often requested, and the only user-visible problem is that the
> > > fault counter will be slightly higher than it should be. Nevertheless,
> > > userspace only took one fault, and should not see the fact that the
> > > kernel had to retry the fault multiple times.
> > >
> > > Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > > Patch applies cleanly over linux-next and mm-unstable
> > >
> > >  mm/memory.c | 16 ++++++++++------
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 1c5b231fe6e3..d88f370eacd1 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >
> > >  	__set_current_state(TASK_RUNNING);
> > >
> > > -	count_vm_event(PGFAULT);
> > > -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > > -
> > >  	ret = sanitize_fault_flags(vma, &flags);
> > >  	if (ret)
> > > -		return ret;
> > > +		goto out;
> > >
> > >  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> > >  					    flags & FAULT_FLAG_INSTRUCTION,
> > > -					    flags & FAULT_FLAG_REMOTE))
> > > -		return VM_FAULT_SIGSEGV;
> > > +					    flags & FAULT_FLAG_REMOTE)) {
> > > +		ret = VM_FAULT_SIGSEGV;
> > > +		goto out;
> > > +	}
> > >
> > >  	/*
> > >  	 * Enable the memcg OOM handling for faults triggered in user
> > > @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >  	}
> > >
> > >  	mm_account_fault(regs, address, flags, ret);
> >
> > Here is the mm_account_fault() function taking care of some other
> > accountings. Perhaps good to put things into it?
>
> That seems appropriate. Let me take a closer look.
>
> >
> > It also already ignores invalid faults:
> >
> > 	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > 		return;
>
> Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> we need to retry but no errors happened? If so then this condition
> would double-count pagefaults in such cases. If such return code is
> impossible then it's the same as checking for VM_FAULT_RETRY.
>
> >
> > I see that you may also want to account for sigbus, however I really don't
> > know why. Explanations would be great when it would matter. So far it
> > makes sense to me if we skip both RETRY or ERROR cases.
>
> Accounting in case of a sigbus is not affected by this patch I think.
> We account for sigbus or any other error cases because there was a
> pagefault and we need to account for it. Whether we failed to handle
> it or not should not affect the count. We skip the retry case because
> we know the same fault will be retried. If we don't skip then we will
> double-count this fault.

mm_account_fault() has a nice comment explaining why it skips errors
and that now makes sense to me. Let me move the accounting there and
see if others agree that's the right place.

>
> >
> > > +out:
> > > +	if (!(ret & VM_FAULT_RETRY)) {
> > > +		count_vm_event(PGFAULT);
> > > +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
> >
> > There is one thing worth noticing is here vma may or may not be valid
> > depending on the retval of the fault.
> >
> > RETRY is exactly one of the cases that accessing vma may be unsafe due to
> > releasing of mmap read lock. The other one is the recently added
> > VM_FAULT_COMPLETE. So if we want to move this chunk (or any vma reference)
> > to be later we need to consider a valid vma / mm being there first, or
> > we're prone to accessing a vma that has already been released, I think.
>
> Good catch! I think you are right and I should have stored vma->vm_mm
> in the beginning and used it when calling count_memcg_event_mm().
> I'll prepare a new patch which handles this correctly.
> Thanks,
> Suren.
>
> >
> > > +	}
> > >
> > >  	return ret;
> > >  }
> > > --
> > > 2.40.0.634.g4ca3ef3211-goog
> > >
> > >
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
Hi, Suren,

On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > It also already ignores invalid faults:
> >
> > 	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > 		return;
>
> Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> we need to retry but no errors happened? If so then this condition
> would double-count pagefaults in such cases.

If ret==VM_FAULT_RETRY it should return here already, so I assume
mm_account_fault() itself is fine regarding fault retries?

Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
"either ERROR or RETRY we'll skip the accounting".

IMHO we should have 3 cases here:

  - ERROR && !RETRY
    error triggered of any kind

  - RETRY && !ERROR
    we need to try one more time

  - !RETRY && !ERROR
    we finished the fault

I don't think ERROR & RETRY can even be set at the same time so I assume
there's no option 4) - a RETRY should imply no ERROR already, even though
it's still incomplete so need another attempt.

Thanks,
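The three cases listed in the message above, and the two accounting policies under discussion, can be written out as a small user-space sketch. The `DEMO_*` flag values and all function names here are hypothetical placeholders chosen for illustration; they are not the kernel's definitions of VM_FAULT_ERROR or VM_FAULT_RETRY.

```c
#include <stdbool.h>

/* Placeholder bit values for this sketch; the kernel defines its own. */
#define DEMO_FAULT_ERROR 0x1u	/* stands in for the VM_FAULT_ERROR mask */
#define DEMO_FAULT_RETRY 0x2u	/* stands in for VM_FAULT_RETRY */

enum demo_outcome { DEMO_DONE, DEMO_FAILED, DEMO_RETRYING };

/* Map a return value onto the three cases from the discussion. */
static enum demo_outcome demo_classify(unsigned int ret)
{
	if (ret & DEMO_FAULT_RETRY)
		return DEMO_RETRYING;	/* RETRY && !ERROR: try one more time */
	if (ret & DEMO_FAULT_ERROR)
		return DEMO_FAILED;	/* ERROR && !RETRY: an error of any kind */
	return DEMO_DONE;		/* !RETRY && !ERROR: fault finished */
}

/* Policy of the check quoted from mm_account_fault(): skip errors and retries. */
static bool demo_count_completed_only(unsigned int ret)
{
	return !(ret & (DEMO_FAULT_ERROR | DEMO_FAULT_RETRY));
}

/* Policy of the posted patch: skip only retries, so failed faults still count. */
static bool demo_count_all_but_retry(unsigned int ret)
{
	return !(ret & DEMO_FAULT_RETRY);
}
```

The two policies differ only for the error case: a fault that ends in an error is counted by `demo_count_all_but_retry()` and skipped by `demo_count_completed_only()`.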
On Fri, Apr 14, 2023 at 3:35 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > > It also already ignores invalid faults:
> > >
> > > 	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > > 		return;
> >
> > Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> > we need to retry but no errors happened? If so then this condition
> > would double-count pagefaults in such cases.
>
> If ret==VM_FAULT_RETRY it should return here already, so I assume
> mm_account_fault() itself is fine regarding fault retries?
>
> Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
> "either ERROR or RETRY we'll skip the accounting".
>
> IMHO we should have 3 cases here:
>
>   - ERROR && !RETRY
>     error triggered of any kind
>
>   - RETRY && !ERROR
>     we need to try one more time
>
>   - !RETRY && !ERROR
>     we finished the fault

After looking some more into mm_account_fault(), I think it would be
fine to count the faults which produced errors. IIUC these counters
represent the total number of faults, not the number of valid and
successful faults. If so then I think simply using VM_FAULT_RETRY
should be ok without considering all possible combinations. WDYT?

>
> I don't think ERROR & RETRY can even be set at the same time so I assume
> there's no option 4) - a RETRY should imply no ERROR already, even though
> it's still incomplete so need another attempt.
>
> Thanks,
>
> --
> Peter Xu
>
On Fri, Apr 14, 2023 at 4:49 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 14, 2023 at 3:35 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
> >
> > On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > > > It also already ignores invalid faults:
> > > >
> > > > 	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > > > 		return;
> > >
> > > Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> > > we need to retry but no errors happened? If so then this condition
> > > would double-count pagefaults in such cases.
> >
> > If ret==VM_FAULT_RETRY it should return here already, so I assume
> > mm_account_fault() itself is fine regarding fault retries?
> >
> > Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
> > "either ERROR or RETRY we'll skip the accounting".
> >
> > IMHO we should have 3 cases here:
> >
> >   - ERROR && !RETRY
> >     error triggered of any kind
> >
> >   - RETRY && !ERROR
> >     we need to try one more time
> >
> >   - !RETRY && !ERROR
> >     we finished the fault
>
> After looking some more into mm_account_fault(), I think it would be
> fine to count the faults which produced errors. IIUC these counters
> represent the total number of faults, not the number of valid and
> successful faults. If so then I think simply using VM_FAULT_RETRY
> should be ok without considering all possible combinations. WDYT?

I posted v2 at
https://lore.kernel.org/all/20230415000818.1955007-1-surenb@google.com/
Hopefully it's closer to what we want it to be.

>
> >
> > I don't think ERROR & RETRY can even be set at the same time so I assume
> > there's no option 4) - a RETRY should imply no ERROR already, even though
> > it's still incomplete so need another attempt.
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
diff --git a/mm/memory.c b/mm/memory.c
index 1c5b231fe6e3..d88f370eacd1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 
 	__set_current_state(TASK_RUNNING);
 
-	count_vm_event(PGFAULT);
-	count_memcg_event_mm(vma->vm_mm, PGFAULT);
-
 	ret = sanitize_fault_flags(vma, &flags);
 	if (ret)
-		return ret;
+		goto out;
 
 	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
 					    flags & FAULT_FLAG_INSTRUCTION,
-					    flags & FAULT_FLAG_REMOTE))
-		return VM_FAULT_SIGSEGV;
+					    flags & FAULT_FLAG_REMOTE)) {
+		ret = VM_FAULT_SIGSEGV;
+		goto out;
+	}
 
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
@@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	}
 
 	mm_account_fault(regs, address, flags, ret);
+out:
+	if (!(ret & VM_FAULT_RETRY)) {
+		count_vm_event(PGFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGFAULT);
+	}
 
 	return ret;
 }
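The effect of moving the counter from function entry to the `out:` label can be modelled in user space. The following is a rough sketch with hypothetical names and a placeholder flag value; it only mimics the counting behaviour described in the commit message and makes no claim about how the kernel's callers actually drive retries.

```c
/* Placeholder flag value for this sketch only; not the kernel's definition. */
#define DEMO_FAULT_RETRY 0x1u

/* Two counters so both placements can be compared on the same run. */
struct demo_stats {
	unsigned long at_entry;	/* counted on every call (before the patch) */
	unsigned long at_exit;	/* counted only when not retrying (after) */
};

/* One call into the handler; it asks for a retry while retries_left > 0. */
static unsigned int demo_handle_once(struct demo_stats *s, int retries_left)
{
	unsigned int ret = retries_left > 0 ? DEMO_FAULT_RETRY : 0;

	s->at_entry++;
	if (!(ret & DEMO_FAULT_RETRY))
		s->at_exit++;

	return ret;
}

/* Models a caller re-invoking the handler for one user-visible fault. */
static void demo_take_fault(struct demo_stats *s, int retries)
{
	while (demo_handle_once(s, retries) & DEMO_FAULT_RETRY)
		retries--;
}
```

For a single fault that is retried twice, the entry-side counter advances by three while the exit-side counter advances by one, which is the over-count the patch is meant to remove.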