Message ID | 2edc4657-b6ff-3d6e-2342-6b60bfccc5b@google.com (mailing list archive)
---|---
State | New
Series | mm: allow pte_offset_map[_lock]() to fail
On 2023/5/22 13:05, Hugh Dickins wrote:
> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
> again loop of its own, so take part in that if pte_offset_map() fails.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/hmm.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index e23043345615..b1a9159d7c92 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  	}
>
>  	ptep = pte_offset_map(pmdp, addr);
> +	if (!ptep)
> +		goto again;
>  	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>  		int r;
>

I haven't read the entire patch set yet, but taking a note here.
hmm_vma_handle_pte() will unmap the pte and then call
migration_entry_wait() to remap it; that remap can now fail, so we
need to handle this case like below:

diff --git a/mm/hmm.c b/mm/hmm.c
index 6a151c09de5e..eb726ff0981c 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		if (is_migration_entry(entry)) {
 			pte_unmap(ptep);
 			hmm_vma_walk->last = addr;
-			migration_entry_wait(walk->mm, pmdp, addr);
+			if (!migration_entry_wait(walk->mm, pmdp, addr))
+				return -EAGAIN;
 			return -EBUSY;
 		}

@@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,

 		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep,
 				       hmm_pfns);
 		if (r) {
+			if (r == -EAGAIN)
+				goto again;
 			/* hmm_vma_handle_pte() did pte_unmap() */
 			return r;
 		}

Of course, migration_entry_wait() itself also needs to be modified.
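That last modification is left implicit in the note above. As a purely
illustrative sketch — the bool return, the failure condition, and the use
of pte_offset_map_lock() are assumptions based on Qi's diff, not code from
the posted series — the modified migration_entry_wait() might look like:

/*
 * Hypothetical sketch only: report whether the pte could be mapped and
 * waited on. The mainline function returns void; the bool return is
 * assumed from Qi's diff above. __migration_entry_wait() is the existing
 * helper that drops the ptl and unmaps the pte before sleeping.
 */
bool migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
			  unsigned long addr)
{
	spinlock_t *ptl;
	pte_t *ptep;

	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!ptep)
		return false;	/* page table went away: let caller retry */
	__migration_entry_wait(mm, ptep, ptl);
	return true;
}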
Qi Zheng <qi.zheng@linux.dev> writes:

> On 2023/5/22 13:05, Hugh Dickins wrote:
>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>> again loop of its own, so take part in that if pte_offset_map() fails.
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> ---
>> mm/hmm.c | 2 ++
>> 1 file changed, 2 insertions(+)
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index e23043345615..b1a9159d7c92 100644
>> --- a/mm/hmm.c
>> +++ b/mm/hmm.c
>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>  	}
>>  	ptep = pte_offset_map(pmdp, addr);
>> +	if (!ptep)
>> +		goto again;
>>  	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>>  		int r;
>>
>
> I haven't read the entire patch set yet, but taking a note here.
> hmm_vma_handle_pte() will unmap the pte and then call
> migration_entry_wait() to remap it; that remap can now fail, so we
> need to handle this case like below:

I don't see a problem here. Sure, hmm_vma_handle_pte() might return
-EBUSY, but that will get returned up to hmm_range_fault(), which will
retry the whole thing again and presumably fail when looking at the PMD.

> diff --git a/mm/hmm.c b/mm/hmm.c
> index 6a151c09de5e..eb726ff0981c 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  		if (is_migration_entry(entry)) {
>  			pte_unmap(ptep);
>  			hmm_vma_walk->last = addr;
> -			migration_entry_wait(walk->mm, pmdp, addr);
> +			if (!migration_entry_wait(walk->mm, pmdp, addr))
> +				return -EAGAIN;
>  			return -EBUSY;
>  		}
>
> @@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>
>  		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep,
>  				       hmm_pfns);
>  		if (r) {
> +			if (r == -EAGAIN)
> +				goto again;
>  			/* hmm_vma_handle_pte() did pte_unmap() */
>  			return r;
>  		}
>
> Of course, migration_entry_wait() itself also needs to be modified.
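For reference, the retry that Alistair describes lives in
hmm_range_fault() itself: any -EBUSY from the walk restarts it from
hmm_vma_walk.last. A lightly trimmed sketch of that loop as it appears in
mm/hmm.c around this time (details may differ slightly from the exact
tree under review):

int hmm_range_fault(struct hmm_range *range)
{
	struct hmm_vma_walk hmm_vma_walk = {
		.range = range,
		.last = range->start,
	};
	struct mm_struct *mm = range->notifier->mm;
	int ret;

	mmap_assert_locked(mm);

	do {
		/* If range is no longer valid, force retry. */
		if (mmu_interval_check_retry(range->notifier,
					     range->notifier_seq))
			return -EBUSY;
		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
				      &hmm_walk_ops, &hmm_vma_walk);
		/*
		 * On -EBUSY the loop restarts from .last: entries below
		 * it in the pfn array already hold their output values.
		 */
	} while (ret == -EBUSY);
	return ret;
}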
On 2023/5/23 10:39, Alistair Popple wrote:
>
> Qi Zheng <qi.zheng@linux.dev> writes:
>
>> On 2023/5/22 13:05, Hugh Dickins wrote:
>>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>>> again loop of its own, so take part in that if pte_offset_map() fails.
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>> mm/hmm.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>> index e23043345615..b1a9159d7c92 100644
>>> --- a/mm/hmm.c
>>> +++ b/mm/hmm.c
>>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>>  	}
>>>  	ptep = pte_offset_map(pmdp, addr);
>>> +	if (!ptep)
>>> +		goto again;
>>>  	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>>>  		int r;
>>>
>>
>> I haven't read the entire patch set yet, but taking a note here.
>> hmm_vma_handle_pte() will unmap the pte and then call
>> migration_entry_wait() to remap it; that remap can now fail, so we
>> need to handle this case like below:
>
> I don't see a problem here. Sure, hmm_vma_handle_pte() might return
> -EBUSY, but that will get returned up to hmm_range_fault(), which will
> retry the whole thing again and presumably fail when looking at the PMD.

Yeah. There is no problem with this, and the modification to
migration_entry_wait() can be simplified. My previous thought was that
we could finish the retry logic in hmm_vma_walk_pmd() without handing
it over to the caller. :)

>
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index 6a151c09de5e..eb726ff0981c 100644
>> --- a/mm/hmm.c
>> +++ b/mm/hmm.c
>> @@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>>  		if (is_migration_entry(entry)) {
>>  			pte_unmap(ptep);
>>  			hmm_vma_walk->last = addr;
>> -			migration_entry_wait(walk->mm, pmdp, addr);
>> +			if (!migration_entry_wait(walk->mm, pmdp, addr))
>> +				return -EAGAIN;
>>  			return -EBUSY;
>>  		}
>>
>> @@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>
>>  		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep,
>>  				       hmm_pfns);
>>  		if (r) {
>> +			if (r == -EAGAIN)
>> +				goto again;
>>  			/* hmm_vma_handle_pte() did pte_unmap() */
>>  			return r;
>>  		}
>>
>> Of course, migration_entry_wait() itself also needs to be modified.
>
On Tue, 23 May 2023, Qi Zheng wrote:
> On 2023/5/23 10:39, Alistair Popple wrote:
> > Qi Zheng <qi.zheng@linux.dev> writes:
> >> On 2023/5/22 13:05, Hugh Dickins wrote:
> >>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
> >>> again loop of its own, so take part in that if pte_offset_map() fails.
> >>> Signed-off-by: Hugh Dickins <hughd@google.com>
> >>> ---
> >>> mm/hmm.c | 2 ++
> >>> 1 file changed, 2 insertions(+)
> >>> diff --git a/mm/hmm.c b/mm/hmm.c
> >>> index e23043345615..b1a9159d7c92 100644
> >>> --- a/mm/hmm.c
> >>> +++ b/mm/hmm.c
> >>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >>>  	}
> >>>  	ptep = pte_offset_map(pmdp, addr);
> >>> +	if (!ptep)
> >>> +		goto again;
> >>>  	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
> >>>  		int r;
> >>>
> >>
> >> I haven't read the entire patch set yet, but taking a note here.
> >> hmm_vma_handle_pte() will unmap the pte and then call
> >> migration_entry_wait() to remap it; that remap can now fail, so we
> >> need to handle this case like below:
> >
> > I don't see a problem here. Sure, hmm_vma_handle_pte() might return
> > -EBUSY, but that will get returned up to hmm_range_fault(), which will
> > retry the whole thing again and presumably fail when looking at the PMD.
>
> Yeah. There is no problem with this, and the modification to
> migration_entry_wait() can be simplified. My previous thought was that
> we could finish the retry logic in hmm_vma_walk_pmd() without handing
> it over to the caller. :)

Okay, Alistair has resolved this one, thanks, I agree; but what is
"the modification to migration_entry_wait()" that you refer to there?

I don't think there's any need to make it a bool: it's normal for there
to be races on entry to migration_entry_wait(), and we're used to just
returning to the caller (and back up to userspace) when it does not wait.

Hugh
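Spelled out the same way as the sketch further up, the void-returning
behaviour Hugh prefers would be roughly this (again a sketch under the
same pte_offset_map_lock() assumption, not posted code): a failed map
simply does not wait, hmm_vma_handle_pte() still returns -EBUSY, and
hmm_range_fault() retries.

void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
			  unsigned long addr)
{
	spinlock_t *ptl;
	pte_t *ptep;

	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!ptep)
		return;	/* raced on entry: just return, caller retries */
	__migration_entry_wait(mm, ptep, ptl);
}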
Hugh Dickins <hughd@google.com> writes:

> On Tue, 23 May 2023, Qi Zheng wrote:
>> On 2023/5/23 10:39, Alistair Popple wrote:
>> > Qi Zheng <qi.zheng@linux.dev> writes:
>> >> On 2023/5/22 13:05, Hugh Dickins wrote:
>> >>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>> >>> again loop of its own, so take part in that if pte_offset_map() fails.
>> >>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> >>> ---
>> >>> mm/hmm.c | 2 ++
>> >>> 1 file changed, 2 insertions(+)
>> >>> diff --git a/mm/hmm.c b/mm/hmm.c
>> >>> index e23043345615..b1a9159d7c92 100644
>> >>> --- a/mm/hmm.c
>> >>> +++ b/mm/hmm.c
>> >>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>> >>>  	}
>> >>>  	ptep = pte_offset_map(pmdp, addr);
>> >>> +	if (!ptep)
>> >>> +		goto again;
>> >>>  	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>> >>>  		int r;
>> >>>
>> >>
>> >> I haven't read the entire patch set yet, but taking a note here.
>> >> hmm_vma_handle_pte() will unmap the pte and then call
>> >> migration_entry_wait() to remap it; that remap can now fail, so we
>> >> need to handle this case like below:
>> >
>> > I don't see a problem here. Sure, hmm_vma_handle_pte() might return
>> > -EBUSY, but that will get returned up to hmm_range_fault(), which will
>> > retry the whole thing again and presumably fail when looking at the PMD.
>>
>> Yeah. There is no problem with this, and the modification to
>> migration_entry_wait() can be simplified. My previous thought was that
>> we could finish the retry logic in hmm_vma_walk_pmd() without handing
>> it over to the caller. :)
>
> Okay, Alistair has resolved this one, thanks, I agree; but what is
> "the modification to migration_entry_wait()" that you refer to there?
>
> I don't think there's any need to make it a bool: it's normal for there
> to be races on entry to migration_entry_wait(), and we're used to just
> returning to the caller (and back up to userspace) when it does not wait.

Agreed. I didn't spot any places where returning to the caller without
actually waiting would cause looping. I assume any retries or refaults
will find the cleared PMD and fault or error out in some other manner
anyway. hmm_range_fault() is the only place that might have been a bit
special, but it looks fine to me, so:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> Hugh
diff --git a/mm/hmm.c b/mm/hmm.c
index e23043345615..b1a9159d7c92 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	}

 	ptep = pte_offset_map(pmdp, addr);
+	if (!ptep)
+		goto again;
 	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
 		int r;
hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
again loop of its own, so take part in that if pte_offset_map() fails.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/hmm.c | 2 ++
 1 file changed, 2 insertions(+)