Message ID | 20190815205227.7949-1-Philip.Yang@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/hmm: hmm_range_fault handle pages swapped out |
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
> hmm_range_fault may return NULL pages because some of pfns are equal to
> HMM_PFN_NONE. This happens randomly under memory pressure. The reason is
> for swapped out page pte path, hmm_vma_handle_pte doesn't update fault
> variable from cpu_flags, so it failed to call hmm_vma_do_fault to swap
> the page in.
>
> The fix is to call hmm_pte_need_fault to update fault variable.
>
> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

> ---
>  mm/hmm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9f22562e2c43..7ca4fb39d3d8 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -544,6 +544,9 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  		swp_entry_t entry = pte_to_swp_entry(pte);
>
>  		if (!non_swap_entry(entry)) {
> +			cpu_flags = pte_to_hmm_pfn_flags(range, pte);
> +			hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
> +					   &fault, &write_fault);
>  			if (fault || write_fault)
>  				goto fault;
>  			return 0;
> --
> 2.17.1
>
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
> hmm_range_fault may return NULL pages because some of pfns are equal to
> HMM_PFN_NONE. This happens randomly under memory pressure. The reason is
> for swapped out page pte path, hmm_vma_handle_pte doesn't update fault
> variable from cpu_flags, so it failed to call hmm_vma_do_fault to swap
> the page in.
>
> The fix is to call hmm_pte_need_fault to update fault variable.

> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee

I'll fix it for you but please be careful not to send Change-id's to
the public lists.

Also what is the Fixes line for this?

> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>  mm/hmm.c | 3 +++
>  1 file changed, 3 insertions(+)

Ralph has also been looking at this area also so I'll give him a bit
to chime in, otherwise with Jerome's review this looks OK to go to
linux-next

Thanks,
Jason
On 2019-08-15 8:54 p.m., Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
>> hmm_range_fault may return NULL pages because some of pfns are equal to
>> HMM_PFN_NONE. This happens randomly under memory pressure. The reason is
>> for swapped out page pte path, hmm_vma_handle_pte doesn't update fault
>> variable from cpu_flags, so it failed to call hmm_vma_do_fault to swap
>> the page in.
>>
>> The fix is to call hmm_pte_need_fault to update fault variable.
>
>> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
>
> I'll fix it for you but please be careful not to send Change-id's to
> the public lists.
>
Thanks, the change-id was added by our Gerrit hook, I need generate
patch files, remove change-id line and then send out modified patch
files in future.

> Also what is the Fixes line for this?
>
This fixes the issue found by the internal rocrtst, the
rocrtstFunc.Memory_Max_Mem evicted some user buffers, and then following
test restore those user buffers failed because the buffers are swapped
out and application doesn't touch the buffers to swap it in.

>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>  mm/hmm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>
> Ralph has also been looking at this area also so I'll give him a bit
> to chime in, otherwise with Jerome's review this looks OK to go to
> linux-next
>
Ok, thanks for helping push this to hmm branch at
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

> Thanks,
> Jason
>
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
> hmm_range_fault may return NULL pages because some of pfns are equal to
> HMM_PFN_NONE. This happens randomly under memory pressure. The reason is
> for swapped out page pte path, hmm_vma_handle_pte doesn't update fault
> variable from cpu_flags, so it failed to call hmm_vma_do_fault to swap
> the page in.
>
> The fix is to call hmm_pte_need_fault to update fault variable.
>
> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>  mm/hmm.c | 3 +++
>  1 file changed, 3 insertions(+)

Applied to hmm.git, thanks

I fixed the commit message:

Author: Yang, Philip <Philip.Yang@amd.com>
Date:   Thu Aug 15 20:52:56 2019 +0000

    mm/hmm: fix hmm_range_fault()'s handling of swapped out pages

    hmm_range_fault() may return NULL pages because some of the pfns are
    equal to HMM_PFN_NONE. This happens randomly under memory pressure. The
    reason is during the swapped out page pte path, hmm_vma_handle_pte()
    doesn't update the fault variable from cpu_flags, so it failed to call
    hmm_vma_do_fault() to swap the page in.

    The fix is to call hmm_pte_need_fault() to update fault variable.

    Fixes: 74eee180b935 ("mm/hmm/mirror: device page fault handler")
    Link: https://lore.kernel.org/r/20190815205227.7949-1-Philip.Yang@amd.com
    Signed-off-by: Philip Yang <Philip.Yang@amd.com>
    Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
diff --git a/mm/hmm.c b/mm/hmm.c
index 9f22562e2c43..7ca4fb39d3d8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -544,6 +544,9 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		swp_entry_t entry = pte_to_swp_entry(pte);

 		if (!non_swap_entry(entry)) {
+			cpu_flags = pte_to_hmm_pfn_flags(range, pte);
+			hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
+					   &fault, &write_fault);
 			if (fault || write_fault)
 				goto fault;
 			return 0;
hmm_range_fault may return NULL pages because some of pfns are equal to
HMM_PFN_NONE. This happens randomly under memory pressure. The reason is
for swapped out page pte path, hmm_vma_handle_pte doesn't update fault
variable from cpu_flags, so it failed to call hmm_vma_do_fault to swap
the page in.

The fix is to call hmm_pte_need_fault to update fault variable.

Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 mm/hmm.c | 3 +++
 1 file changed, 3 insertions(+)
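
The effect of the three added lines can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `sketch_`/`SKETCH_` name is hypothetical, the flag bits are made up (the kernel takes them from the range's flag table), and `sketch_need_fault()` is a deliberately simplified stand-in for hmm_pte_need_fault(). It also assumes that a swapped-out pte yields cpu_flags of 0, since the entry is not present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_PFN_VALID (1u << 0) /* caller wants the page / page is present */
#define SKETCH_PFN_WRITE (1u << 1) /* caller wants write access / pte is writable */

struct sketch_fault {
	bool fault;
	bool write_fault;
};

enum sketch_result {
	SKETCH_LEAVE_NONE, /* entry stays unpopulated, like HMM_PFN_NONE */
	SKETCH_DO_FAULT    /* handler jumps to the fault path to swap the page in */
};

/* Simplified stand-in for hmm_pte_need_fault(): compare request with CPU state. */
static void sketch_need_fault(uint32_t requested, uint32_t cpu_flags,
			      struct sketch_fault *out)
{
	out->fault = false;
	out->write_fault = false;

	/* The caller did not ask for this page, so nothing needs faulting. */
	if (!(requested & SKETCH_PFN_VALID))
		return;

	/* Requested but not present on the CPU side: a fault is needed. */
	out->fault = !(cpu_flags & SKETCH_PFN_VALID);

	/* Write access requested but not available: a write fault is needed. */
	if ((requested & SKETCH_PFN_WRITE) && !(cpu_flags & SKETCH_PFN_WRITE)) {
		out->write_fault = true;
		out->fault = true;
	}
}

/* Models the swap-entry branch of the pte handler, before and after the patch. */
static enum sketch_result sketch_handle_swap_pte(uint32_t requested, bool with_fix)
{
	/* Both flags start out false and, before the patch, were never updated here. */
	struct sketch_fault f = { false, false };

	if (with_fix) {
		uint32_t cpu_flags = 0; /* assumption: a swapped-out pte maps to no flags */

		sketch_need_fault(requested, cpu_flags, &f);
	}

	if (f.fault || f.write_fault)
		return SKETCH_DO_FAULT;
	return SKETCH_LEAVE_NONE;
}
```

In this model, `sketch_handle_swap_pte(SKETCH_PFN_VALID, false)` leaves the entry unpopulated even though the caller asked for the page, which corresponds to the NULL pages described in the commit message. With `with_fix` set to true, the same request reaches the fault path.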