Message ID | 20241113160103.48943-2-vbabka@suse.cz (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC,for,stable,5.15,and,5.10] mm/memory: only copy anonymous pages during fork() | expand |
On 13.11.24 17:01, Vlastimil Babka wrote: > When a combination of unfortunate factors occur, we might BUG in fork(): > > dup_mmap() > copy_page_range() > copy_***_range() > copy_present_pte() > copy_present_page() > page_add_new_anon_rmap() > __page_set_anon_rmap() > BUG_ON(!anon_vma); > > The factors are: > > - source vma is VM_MIXEDMAP otherwise copy_page_range() would bail out > when !src_vma->anon_vma > - I think this was due to gpfs, but can happen in-tree as well > - is_cow_mapping() is true because VM_MAYWRITE (even though the vma > was a read-only mapping of a .so file) > - MMF_HAS_PINNED is true, thus some actual pinning has happened > - page_maybe_dma_pinned() is true as a false positive, because mapcount > and thus refcount is >1024 > > That makes us reach page_needs_cow_for_dma() in copy_present_page() and > evaluate it as true and attempt to CoW a file page and hit the BUG_ON() > because we never had a reason to instantiate anon_vma for the source > vma. > > AFAICS this was fixed inadvertedly in 5.19 by commit fb3d824d1a46 > ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and > page_try_dup_anon_rmap()") or another commit in that series. What caught > my attention is this part of the changelog: > > We really only care about pins on anonymous pages, because they are prone > to getting replaced in the COW handler once mapped R/O. For !anon pages > in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really care about > that, at least not that I could come up with an example. > > And as part of that commit, an PageAnon() test is added in > copy_present_pte(). > > But the code is already refactored a lot, so this is an attempt at a > minimal fix for LTS kernels by placing the PageAnon() check to > copy_present_page(). > > Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes") > Cc: Peter Xu <peterx@redhat.com> > Cc: David Hildenbrand <david@redhat.com> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > --- > Hi, we've seen this in our 5.14 based kernel and it involved the out of > tree gpfs module, but I believe the same thing can happen in LTS's 5.10 > and 5.15 without out of tree modules as well. So I'd like your opinion > on this fix before I propose it to stable as a non-standard > version-specific fix (I don't think we'd want to backport fb3d824d1a46 > with prerequisities). Thanks. I recall seeing+discussing this exact patch already a couple years ago :D Ah, here is the 5.15 version https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com And the 5.10 version https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/ ... I could have sworn they got applied. ... and in linux-5.10.y I see commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32 Author: Yuanzheng Song <songyuanzheng@huawei.com> Date: Fri Oct 28 03:07:05 2022 +0000 mm/memory: add non-anonymous page check in the copy_present_page() The vma->anon_vma of the child process may be NULL because the entire vma does not contain anonymous pages. In this case, a BUG will occur when the copy_present_page() passes a copy of a non-anonymous page of that vma to the page_add_new_anon_rmap() to set up new anonymous rmap. Maybe you missed that the PageAnon() check is simply a couple of lines further down in there?
On 11/13/24 17:09, David Hildenbrand wrote: >> --- >> Hi, we've seen this in our 5.14 based kernel and it involved the out of >> tree gpfs module, but I believe the same thing can happen in LTS's 5.10 >> and 5.15 without out of tree modules as well. So I'd like your opinion >> on this fix before I propose it to stable as a non-standard >> version-specific fix (I don't think we'd want to backport fb3d824d1a46 >> with prerequisities). Thanks. > > I recall seeing+discussing this exact patch already a couple years ago :D > > Ah, here is the 5.15 version > > https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com > > And the 5.10 version > > https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/ > > > ... I could have sworn they got applied. > > ... and in linux-5.10.y I see > > commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32 > Author: Yuanzheng Song <songyuanzheng@huawei.com> > Date: Fri Oct 28 03:07:05 2022 +0000 > > mm/memory: add non-anonymous page check in the copy_present_page() > > The vma->anon_vma of the child process may be NULL because > the entire vma does not contain anonymous pages. In this > case, a BUG will occur when the copy_present_page() passes > a copy of a non-anonymous page of that vma to the > page_add_new_anon_rmap() to set up new anonymous rmap. > > > > Maybe you missed that the PageAnon() check is simply a couple of lines > further down in there? No I was only looking at the 5.15 branch so far, and seems it was never applied there, unlike 5.10. Weird. But thanks for the heads up.
On 13.11.24 17:18, Vlastimil Babka wrote: > On 11/13/24 17:09, David Hildenbrand wrote: >>> --- >>> Hi, we've seen this in our 5.14 based kernel and it involved the out of >>> tree gpfs module, but I believe the same thing can happen in LTS's 5.10 >>> and 5.15 without out of tree modules as well. So I'd like your opinion >>> on this fix before I propose it to stable as a non-standard >>> version-specific fix (I don't think we'd want to backport fb3d824d1a46 >>> with prerequisities). Thanks. >> >> I recall seeing+discussing this exact patch already a couple years ago :D >> >> Ah, here is the 5.15 version >> >> https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com >> >> And the 5.10 version >> >> https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/ >> >> >> ... I could have sworn they got applied. >> >> ... and in linux-5.10.y I see >> >> commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32 >> Author: Yuanzheng Song <songyuanzheng@huawei.com> >> Date: Fri Oct 28 03:07:05 2022 +0000 >> >> mm/memory: add non-anonymous page check in the copy_present_page() >> >> The vma->anon_vma of the child process may be NULL because >> the entire vma does not contain anonymous pages. In this >> case, a BUG will occur when the copy_present_page() passes >> a copy of a non-anonymous page of that vma to the >> page_add_new_anon_rmap() to set up new anonymous rmap. >> >> >> >> Maybe you missed that the PageAnon() check is simply a couple of lines >> further down in there? > > No I was only looking at the 5.15 branch so far, and seems it was never > applied there, unlike 5.10. Weird. But thanks for the heads up. Indeed, looks like it never got applied to 5.15 ...
On 11/13/24 17:25, David Hildenbrand wrote: > On 13.11.24 17:18, Vlastimil Babka wrote: >> On 11/13/24 17:09, David Hildenbrand wrote: >>>> --- >>>> Hi, we've seen this in our 5.14 based kernel and it involved the out of >>>> tree gpfs module, but I believe the same thing can happen in LTS's 5.10 >>>> and 5.15 without out of tree modules as well. So I'd like your opinion >>>> on this fix before I propose it to stable as a non-standard >>>> version-specific fix (I don't think we'd want to backport fb3d824d1a46 >>>> with prerequisities). Thanks. >>> >>> I recall seeing+discussing this exact patch already a couple years ago :D >>> >>> Ah, here is the 5.15 version >>> >>> https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com >>> >>> And the 5.10 version >>> >>> https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/ >>> >>> >>> ... I could have sworn they got applied. >>> >>> ... and in linux-5.10.y I see >>> >>> commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32 >>> Author: Yuanzheng Song <songyuanzheng@huawei.com> >>> Date: Fri Oct 28 03:07:05 2022 +0000 >>> >>> mm/memory: add non-anonymous page check in the copy_present_page() >>> >>> The vma->anon_vma of the child process may be NULL because >>> the entire vma does not contain anonymous pages. In this >>> case, a BUG will occur when the copy_present_page() passes >>> a copy of a non-anonymous page of that vma to the >>> page_add_new_anon_rmap() to set up new anonymous rmap. >>> >>> >>> >>> Maybe you missed that the PageAnon() check is simply a couple of lines >>> further down in there? >> >> No I was only looking at the 5.15 branch so far, and seems it was never >> applied there, unlike 5.10. Weird. But thanks for the heads up. > > Indeed, looks like it never got applied to 5.15 ... The author forgot to actually include stable@ will forward it
diff --git a/mm/memory.c b/mm/memory.c index 6d058973a97e..73871bac0e4c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -887,6 +887,10 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma { struct page *new_page; + /* We only care about pins on anonymous pages */ + if (!PageAnon(page)) + return 1; + /* * What we want to do is to check whether this page may * have been pinned by the parent process. If so,
When a combination of unfortunate factors occur, we might BUG in fork(): dup_mmap() copy_page_range() copy_***_range() copy_present_pte() copy_present_page() page_add_new_anon_rmap() __page_set_anon_rmap() BUG_ON(!anon_vma); The factors are: - source vma is VM_MIXEDMAP otherwise copy_page_range() would bail out when !src_vma->anon_vma - I think this was due to gpfs, but can happen in-tree as well - is_cow_mapping() is true because VM_MAYWRITE (even though the vma was a read-only mapping of a .so file) - MMF_HAS_PINNED is true, thus some actual pinning has happened - page_maybe_dma_pinned() is true as a false positive, because mapcount and thus refcount is >1024 That makes us reach page_needs_cow_for_dma() in copy_present_page() and evaluate it as true and attempt to CoW a file page and hit the BUG_ON() because we never had a reason to instantiate anon_vma for the source vma. AFAICS this was fixed inadvertedly in 5.19 by commit fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()") or another commit in that series. What caught my attention is this part of the changelog: We really only care about pins on anonymous pages, because they are prone to getting replaced in the COW handler once mapped R/O. For !anon pages in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really care about that, at least not that I could come up with an example. And as part of that commit, an PageAnon() test is added in copy_present_pte(). But the code is already refactored a lot, so this is an attempt at a minimal fix for LTS kernels by placing the PageAnon() check to copy_present_page(). Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes") Cc: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> --- Hi, we've seen this in our 5.14 based kernel and it involved the out of tree gpfs module, but I believe the same thing can happen in LTS's 5.10 and 5.15 without out of tree modules as well. So I'd like your opinion on this fix before I propose it to stable as a non-standard version-specific fix (I don't think we'd want to backport fb3d824d1a46 with prerequisities). Thanks. mm/memory.c | 4 ++++ 1 file changed, 4 insertions(+)