Message ID | 20230217085439.2826375-1-stevensd@google.com (mailing list archive) |
---|---|
Headers | show |
Series | mm/khugepaged: fix khugepaged+shmem races | expand |
On Fri, 2023-02-17 at 17:54 +0900, David Stevens wrote: > From: David Stevens <stevensd@chromium.org> > > Fix two races in khugepaged+shmem that cause issues with userfaultfd > and > lseek, respectively. > > v3 -> v4: > - Base changes on mm-everything (fba720cb4dc0) > - Add patch to refactor error handling control flow in collapse_file > - Rebase userfaultfd patch with no significant logic changes > - Different approach for fixing lseek race > v2 -> v3: > - Use XA_RETRY_ENTRY to synchronize with reads from the page cache > under the RCU read lock in userfaultfd fix > - Add patch to fix lseek race > v1 -> v2: > - Different approach for userfaultfd fix > > David Stevens (3): > mm/khugepaged: refactor collapse_file control flow > mm/khugepaged: skip shmem with userfaultfd > mm/khugepaged: maintain page cache uptodate flag > > include/trace/events/huge_memory.h | 3 +- > mm/khugepaged.c | 263 ++++++++++++++++----------- > -- > 2 files changed, 144 insertions(+), 122 deletions(-) > Might want to Cc this to the stable mailing list.
On Fri, Feb 17, 2023 at 05:54:36PM +0900, David Stevens wrote: > From: David Stevens <stevensd@chromium.org> > > Fix two races in khugepaged+shmem that cause issues with userfaultfd and > lseek, respectively. > > v3 -> v4: > - Base changes on mm-everything (fba720cb4dc0) > - Add patch to refactor error handling control flow in collapse_file > - Rebase userfaultfd patch with no significant logic changes > - Different approach for fixing lseek race I just noticed this one hasn't landed unstable, so I guess I just posted a trivial cleanup that can conflict with this so it won't apply cleanly.. https://lore.kernel.org/r/20230303151218.311015-1-peterx@redhat.com The resolution will be fairly straightforward though, and I'm happy to rebase that one to this since this targets a real bug so should have higher priority. I guess it's possible Andrew thought the series has unsettled comment so Andrew could just have ignored that whole set in the mark ups. A repost could possibly clarify that. Again, it'll always great to get another eye on this slightly involved series. Matthew / Yang were already on the list, also copying Zach for his recent works on khugepaged just in case he spots anything wrong. Thanks,
On Mar 03 10:35, Peter Xu wrote: > On Fri, Feb 17, 2023 at 05:54:36PM +0900, David Stevens wrote: > > From: David Stevens <stevensd@chromium.org> > > > > Fix two races in khugepaged+shmem that cause issues with userfaultfd and > > lseek, respectively. > > > > v3 -> v4: > > - Base changes on mm-everything (fba720cb4dc0) > > - Add patch to refactor error handling control flow in collapse_file > > - Rebase userfaultfd patch with no significant logic changes > > - Different approach for fixing lseek race > > I just noticed this one hasn't landed unstable, so I guess I just posted a > trivial cleanup that can conflict with this so it won't apply cleanly.. > > https://lore.kernel.org/r/20230303151218.311015-1-peterx@redhat.com > > The resolution will be fairly straightforward though, and I'm happy to > rebase that one to this since this targets a real bug so should have higher > priority. > > I guess it's possible Andrew thought the series has unsettled comment so > Andrew could just have ignored that whole set in the mark ups. A repost > could possibly clarify that. > > Again, it'll always great to get another eye on this slightly involved > series. Matthew / Yang were already on the list, also copying Zach for his > recent works on khugepaged just in case he spots anything wrong. Thanks Peter. My email filters need some tinkering, since I missed this until yesterday when I came across it by chance. Thanks for Cc'ing me. I'm taking a look now, and likewise will if reposted. Best, Zach > > Thanks, > > -- > Peter Xu >
On Fri, Mar 3, 2023 at 7:35 AM Peter Xu <peterx@redhat.com> wrote: > > On Fri, Feb 17, 2023 at 05:54:36PM +0900, David Stevens wrote: > > From: David Stevens <stevensd@chromium.org> > > > > Fix two races in khugepaged+shmem that cause issues with userfaultfd and > > lseek, respectively. > > > > v3 -> v4: > > - Base changes on mm-everything (fba720cb4dc0) > > - Add patch to refactor error handling control flow in collapse_file > > - Rebase userfaultfd patch with no significant logic changes > > - Different approach for fixing lseek race > > I just noticed this one hasn't landed unstable, so I guess I just posted a > trivial cleanup that can conflict with this so it won't apply cleanly.. > > https://lore.kernel.org/r/20230303151218.311015-1-peterx@redhat.com > > The resolution will be fairly straightforward though, and I'm happy to > rebase that one to this since this targets a real bug so should have higher > priority. > > I guess it's possible Andrew thought the series has unsettled comment so > Andrew could just have ignored that whole set in the mark ups. A repost > could possibly clarify that. IIUC this series should need to be rebased on top of your clean up patch? > > Again, it'll always great to get another eye on this slightly involved > series. Matthew / Yang were already on the list, also copying Zach for his > recent works on khugepaged just in case he spots anything wrong. I had a quick look at it before, I didn't recall I spotted anything wrong about khugepaged, but I'm not familiar with userfaultfd. > > Thanks, > > -- > Peter Xu >
On Fri, 3 Mar 2023 10:35:53 -0500 Peter Xu <peterx@redhat.com> wrote: > On Fri, Feb 17, 2023 at 05:54:36PM +0900, David Stevens wrote: > > From: David Stevens <stevensd@chromium.org> > > > > Fix two races in khugepaged+shmem that cause issues with userfaultfd and > > lseek, respectively. > > > > v3 -> v4: > > - Base changes on mm-everything (fba720cb4dc0) > > - Add patch to refactor error handling control flow in collapse_file > > - Rebase userfaultfd patch with no significant logic changes > > - Different approach for fixing lseek race > > I just noticed this one hasn't landed unstable, so I guess I just posted a > trivial cleanup that can conflict with this so it won't apply cleanly.. > > https://lore.kernel.org/r/20230303151218.311015-1-peterx@redhat.com > > The resolution will be fairly straightforward though, and I'm happy to > rebase that one to this since this targets a real bug so should have higher > priority. Even without the above patch ("mm/khugepaged: Cleanup memcg uncharge for failure path") I'm seeing a big reject in khugepaged.c. Might be easily fixed, didn't look. > I guess it's possible Andrew thought the series has unsettled comment so > Andrew could just have ignored that whole set in the mark ups. A repost > could possibly clarify that. Yes please. Lets gather the acks thus far, rebase on git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm's mm-unstable branch and resend? > Again, it'll always great to get another eye on this slightly involved > series. Matthew / Yang were already on the list, also copying Zach for his > recent works on khugepaged just in case he spots anything wrong.
On Sat, Mar 4, 2023 at 7:53 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Fri, 3 Mar 2023 10:35:53 -0500 Peter Xu <peterx@redhat.com> wrote: > > > On Fri, Feb 17, 2023 at 05:54:36PM +0900, David Stevens wrote: > > > From: David Stevens <stevensd@chromium.org> > > > > > > Fix two races in khugepaged+shmem that cause issues with userfaultfd and > > > lseek, respectively. > > > > > > v3 -> v4: > > > - Base changes on mm-everything (fba720cb4dc0) > > > - Add patch to refactor error handling control flow in collapse_file > > > - Rebase userfaultfd patch with no significant logic changes > > > - Different approach for fixing lseek race > > > > I just noticed this one hasn't landed unstable, so I guess I just posted a > > trivial cleanup that can conflict with this so it won't apply cleanly.. > > > > https://lore.kernel.org/r/20230303151218.311015-1-peterx@redhat.com > > > > The resolution will be fairly straightforward though, and I'm happy to > > rebase that one to this since this targets a real bug so should have higher > > priority. > > Even without the above patch ("mm/khugepaged: Cleanup memcg uncharge > for failure path") I'm seeing a big reject in khugepaged.c. Might be > easily fixed, didn't look. > > > I guess it's possible Andrew thought the series has unsettled comment so > > Andrew could just have ignored that whole set in the mark ups. A repost > > could possibly clarify that. > > Yes please. Lets gather the acks thus far, rebase on > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm's mm-unstable > branch and resend? This conflicts pretty heavily with the "Memory poison recovery in khugepaged collapsing" series. This series was written on top of v9 of that series, but it looks like v9 of that series was dropped and is being replaced with v10. Which series should go in first? If we're confident that v10 of that series won't also be dropped, then rebasing this series onto v10 of that series should be pretty easy. Otherwise we could try reworking things to minimize conflicts between the two series (create a 0th refactoring series?). Andrew, what course would you prefer? -David > > Again, it'll always great to get another eye on this slightly involved > > series. Matthew / Yang were already on the list, also copying Zach for his > > recent works on khugepaged just in case he spots anything wrong. >
On Mon, 6 Mar 2023 11:44:59 +0900 David Stevens <stevensd@chromium.org> wrote: > > Yes please. Lets gather the acks thus far, rebase on > > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm's mm-unstable > > branch and resend? > > This conflicts pretty heavily with the "Memory poison recovery in > khugepaged collapsing" series. This series was written on top of v9 of > that series, but it looks like v9 of that series was dropped and is > being replaced with v10. Which series should go in first? If we're > confident that v10 of that series won't also be dropped, then rebasing > this series onto v10 of that series should be pretty easy. Otherwise > we could try reworking things to minimize conflicts between the two > series (create a 0th refactoring series?). Andrew, what course would > you prefer? I've merged v10 of "Memory poison recovery in khugepaged collapsing" so let's base on that please? "Memory poison recovery in khugepaged collapsing" shows no sign of review, which is worrisome for a series at version 10. Hopefully people will step up soon. So there's presently a risk that "Memory poison recovery in khugepaged collapsing" will go away again, so don't lose the pre-rebase series :(
From: David Stevens <stevensd@chromium.org> Fix two races in khugepaged+shmem that cause issues with userfaultfd and lseek, respectively. v3 -> v4: - Base changes on mm-everything (fba720cb4dc0) - Add patch to refactor error handling control flow in collapse_file - Rebase userfaultfd patch with no significant logic changes - Different approach for fixing lseek race v2 -> v3: - Use XA_RETRY_ENTRY to synchronize with reads from the page cache under the RCU read lock in userfaultfd fix - Add patch to fix lseek race v1 -> v2: - Different approach for userfaultfd fix David Stevens (3): mm/khugepaged: refactor collapse_file control flow mm/khugepaged: skip shmem with userfaultfd mm/khugepaged: maintain page cache uptodate flag include/trace/events/huge_memory.h | 3 +- mm/khugepaged.c | 263 ++++++++++++++++------------- 2 files changed, 144 insertions(+), 122 deletions(-)