| Message ID | 20200615221607.7764-7-peterx@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | None |
On Mon, Jun 15, 2020 at 06:15:48PM -0400, Peter Xu wrote:
> Use the new mm_fault_accounting() helper for page fault accounting.
>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will@kernel.org>
> CC: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/arm64/mm/fault.c | 17 ++---------------
>  1 file changed, 2 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index c9cedc0432d2..09af7d7a60ec 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -484,8 +484,6 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  			addr, esr, regs);
>  	}
>
> -	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
> -
>  	/*
>  	 * As per x86, we may deadlock here. However, since the kernel only
>  	 * validly references user space from well defined areas of the code,
> @@ -535,20 +533,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  					      VM_FAULT_BADACCESS)))) {
>  		/*
>  		 * Major/minor page fault accounting is only done
> -		 * once. If we go through a retry, it is extremely
> -		 * likely that the page will be found in page cache at
> -		 * that point.
> +		 * once.
>  		 */
> -		if (major) {
> -			current->maj_flt++;
> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
> -				      addr);
> -		} else {
> -			current->min_flt++;
> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
> -				      addr);
> -		}
> -
> +		mm_fault_accounting(current, regs, addr, major);

Please can you explain why it's ok to move the PERF_COUNT_SW_PAGE_FAULTS
update like this? Seems like a user-visible change to me, so some
justification would really help.

Will
Hi, Will,

On Tue, Jun 16, 2020 at 08:43:08AM +0100, Will Deacon wrote:
> Please can you explain why it's ok to move the PERF_COUNT_SW_PAGE_FAULTS
> update like this? Seems like a user-visible change to me, so some
> justification would really help.

Indeed this could be a functional change for PERF_COUNT_SW_PAGE_FAULTS on
some archs. E.g., on arm64, PERF_COUNT_SW_PAGE_FAULTS previously also
counted severe errors where we go into the "no_context" path. However,
that is not true on every arch; for example, xtensa only accounts the
correctly handled faults (arch/xtensa/mm/fault.c).

Having thought about this, I don't believe it's very useful (please
correct me if I missed something important) for PERF_COUNT_SW_PAGE_FAULTS
to mix fatal-error accounting in with all the correctly handled faults.
Those errors are extremely rare cases, and even when a process gets a
SIGBUS we will normally get something dumped in dmesg. So if we really
want to capture the error cases, there is always a better way: by
following things like dmesg we can learn not only how many error faults
triggered, but also the details of those errors. IOW, my understanding of
users of PERF_COUNT_SW_PAGE_FAULTS is that they want to trap
normal/correct page faults, and don't really care about rare errors.

Then when I went back to think about PERF_COUNT_SW_PAGE_FAULTS, it's
really about:

  A = PERF_COUNT_SW_PAGE_FAULTS
  B = PERF_COUNT_SW_PAGE_FAULTS_MAJ
  C = PERF_COUNT_SW_PAGE_FAULTS_MIN

And:

  A = B + C

If that's the case (which is simple and clear), it would be really
helpful to unify this definition across all the architectures; it would
then also be easier to provide a helper like mm_fault_accounting() so
that the accounting can be managed in generic code rather than in
arch-specific ways.

Thanks,
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c9cedc0432d2..09af7d7a60ec 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -484,8 +484,6 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			addr, esr, regs);
 	}
 
-	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
-
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -535,20 +533,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 					      VM_FAULT_BADACCESS)))) {
 		/*
 		 * Major/minor page fault accounting is only done
-		 * once. If we go through a retry, it is extremely
-		 * likely that the page will be found in page cache at
-		 * that point.
+		 * once.
 		 */
-		if (major) {
-			current->maj_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
-				      addr);
-		} else {
-			current->min_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
-				      addr);
-		}
-
+		mm_fault_accounting(current, regs, addr, major);
 		return 0;
 	}
Use the new mm_fault_accounting() helper for page fault accounting.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will@kernel.org>
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/mm/fault.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)