Message ID | c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] mm: page_mapped: don't assume compound page is huge or THP | expand |
On Fri 30-11-18 13:06:57, Jan Stancek wrote: > LTP proc01 testcase has been observed to rarely trigger crashes > on arm64: > page_mapped+0x78/0xb4 > stable_page_flags+0x27c/0x338 > kpageflags_read+0xfc/0x164 > proc_reg_read+0x7c/0xb8 > __vfs_read+0x58/0x178 > vfs_read+0x90/0x14c > SyS_read+0x60/0xc0 > > Issue is that page_mapped() assumes that if compound page is not > huge, then it must be THP. But if this is 'normal' compound page > (COMPOUND_PAGE_DTOR), then following loop can keep running > (for HPAGE_PMD_NR iterations) until it tries to read from memory > that isn't mapped and triggers a panic: > for (i = 0; i < hpage_nr_pages(page); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > with a custom kernel module [1] which: > - allocates compound page (PAGEC) of order 1 > - allocates 2 normal pages (COPY), which are initialized to 0xff > (to satisfy _mapcount >= 0) > - 2 PAGEC page structs are copied to address of first COPY page > - second page of COPY is marked as not present > - call to page_mapped(COPY) now triggers fault on access to 2nd > COPY page at offset 0x30 (_mapcount) > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > Fix the loop to iterate for "1 << compound_order" pages. This is much less magic than the previous version. It is still not clear to me how is mapping higher order pages to page tables other than THP though. So a more detailed information about the source would bre really welcome. Once we know that we can add a Fixes tag and also mark the patch for stable because that sounds like a stable material. > Debugged-by: Laszlo Ersek <lersek@redhat.com> > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > Signed-off-by: Jan Stancek <jstancek@redhat.com> The patch looks sensible to me Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > --- > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Changes in v2: > - change the loop instead so we check also mapcount of subpages > > diff --git a/mm/util.c b/mm/util.c > index 8bf08b5b5760..5c9c7359ee8a 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -478,7 +478,7 @@ bool page_mapped(struct page *page) > return true; > if (PageHuge(page)) > return false; > - for (i = 0; i < hpage_nr_pages(page); i++) { > + for (i = 0; i < (1 << compound_order(page)); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > -- > 1.8.3.1 >
On Fri, Nov 30, 2018 at 01:18:51PM +0100, Michal Hocko wrote: > On Fri 30-11-18 13:06:57, Jan Stancek wrote: > > LTP proc01 testcase has been observed to rarely trigger crashes > > on arm64: > > page_mapped+0x78/0xb4 > > stable_page_flags+0x27c/0x338 > > kpageflags_read+0xfc/0x164 > > proc_reg_read+0x7c/0xb8 > > __vfs_read+0x58/0x178 > > vfs_read+0x90/0x14c > > SyS_read+0x60/0xc0 > > > > Issue is that page_mapped() assumes that if compound page is not > > huge, then it must be THP. But if this is 'normal' compound page > > (COMPOUND_PAGE_DTOR), then following loop can keep running > > (for HPAGE_PMD_NR iterations) until it tries to read from memory > > that isn't mapped and triggers a panic: > > for (i = 0; i < hpage_nr_pages(page); i++) { > > if (atomic_read(&page[i]._mapcount) >= 0) > > return true; > > } > > > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > > with a custom kernel module [1] which: > > - allocates compound page (PAGEC) of order 1 > > - allocates 2 normal pages (COPY), which are initialized to 0xff > > (to satisfy _mapcount >= 0) > > - 2 PAGEC page structs are copied to address of first COPY page > > - second page of COPY is marked as not present > > - call to page_mapped(COPY) now triggers fault on access to 2nd > > COPY page at offset 0x30 (_mapcount) > > > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > > > Fix the loop to iterate for "1 << compound_order" pages. > > This is much less magic than the previous version. It is still not clear > to me how is mapping higher order pages to page tables other than THP > though. So a more detailed information about the source would bre really > welcome. Once we know that we can add a Fixes tag and also mark the > patch for stable because that sounds like a stable material. IIRC, sound subsystem can producuce custom mapped compound pages. The bug dates back to e1534ae95004. > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > > Signed-off-by: Jan Stancek <jstancek@redhat.com> > > The patch looks sensible to me > Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
On Fri 30-11-18 15:36:51, Kirill A. Shutemov wrote: > On Fri, Nov 30, 2018 at 01:18:51PM +0100, Michal Hocko wrote: > > On Fri 30-11-18 13:06:57, Jan Stancek wrote: > > > LTP proc01 testcase has been observed to rarely trigger crashes > > > on arm64: > > > page_mapped+0x78/0xb4 > > > stable_page_flags+0x27c/0x338 > > > kpageflags_read+0xfc/0x164 > > > proc_reg_read+0x7c/0xb8 > > > __vfs_read+0x58/0x178 > > > vfs_read+0x90/0x14c > > > SyS_read+0x60/0xc0 > > > > > > Issue is that page_mapped() assumes that if compound page is not > > > huge, then it must be THP. But if this is 'normal' compound page > > > (COMPOUND_PAGE_DTOR), then following loop can keep running > > > (for HPAGE_PMD_NR iterations) until it tries to read from memory > > > that isn't mapped and triggers a panic: > > > for (i = 0; i < hpage_nr_pages(page); i++) { > > > if (atomic_read(&page[i]._mapcount) >= 0) > > > return true; > > > } > > > > > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > > > with a custom kernel module [1] which: > > > - allocates compound page (PAGEC) of order 1 > > > - allocates 2 normal pages (COPY), which are initialized to 0xff > > > (to satisfy _mapcount >= 0) > > > - 2 PAGEC page structs are copied to address of first COPY page > > > - second page of COPY is marked as not present > > > - call to page_mapped(COPY) now triggers fault on access to 2nd > > > COPY page at offset 0x30 (_mapcount) > > > > > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > > > > > Fix the loop to iterate for "1 << compound_order" pages. > > > > This is much less magic than the previous version. It is still not clear > > to me how is mapping higher order pages to page tables other than THP > > though. So a more detailed information about the source would bre really > > welcome. Once we know that we can add a Fixes tag and also mark the > > patch for stable because that sounds like a stable material. > > IIRC, sound subsystem can producuce custom mapped compound pages. Do I assume correctly that consecutive ptes simply point to subpages? > The bug dates back to e1534ae95004. Thanks for the pointer. I thought this was a new and creative usage of the pte->page mapping but it really looks like your commit just changed the underlying semantic. > > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > > > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > > > Signed-off-by: Jan Stancek <jstancek@redhat.com> > > > > The patch looks sensible to me > > Acked-by: Michal Hocko <mhocko@suse.com> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > -- > Kirill A. Shutemov
On Fri, Nov 30, 2018 at 01:45:46PM +0100, Michal Hocko wrote: > On Fri 30-11-18 15:36:51, Kirill A. Shutemov wrote: > > On Fri, Nov 30, 2018 at 01:18:51PM +0100, Michal Hocko wrote: > > > On Fri 30-11-18 13:06:57, Jan Stancek wrote: > > > > LTP proc01 testcase has been observed to rarely trigger crashes > > > > on arm64: > > > > page_mapped+0x78/0xb4 > > > > stable_page_flags+0x27c/0x338 > > > > kpageflags_read+0xfc/0x164 > > > > proc_reg_read+0x7c/0xb8 > > > > __vfs_read+0x58/0x178 > > > > vfs_read+0x90/0x14c > > > > SyS_read+0x60/0xc0 > > > > > > > > Issue is that page_mapped() assumes that if compound page is not > > > > huge, then it must be THP. But if this is 'normal' compound page > > > > (COMPOUND_PAGE_DTOR), then following loop can keep running > > > > (for HPAGE_PMD_NR iterations) until it tries to read from memory > > > > that isn't mapped and triggers a panic: > > > > for (i = 0; i < hpage_nr_pages(page); i++) { > > > > if (atomic_read(&page[i]._mapcount) >= 0) > > > > return true; > > > > } > > > > > > > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > > > > with a custom kernel module [1] which: > > > > - allocates compound page (PAGEC) of order 1 > > > > - allocates 2 normal pages (COPY), which are initialized to 0xff > > > > (to satisfy _mapcount >= 0) > > > > - 2 PAGEC page structs are copied to address of first COPY page > > > > - second page of COPY is marked as not present > > > > - call to page_mapped(COPY) now triggers fault on access to 2nd > > > > COPY page at offset 0x30 (_mapcount) > > > > > > > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > > > > > > > Fix the loop to iterate for "1 << compound_order" pages. > > > > > > This is much less magic than the previous version. It is still not clear > > > to me how is mapping higher order pages to page tables other than THP > > > though. So a more detailed information about the source would bre really > > > welcome. Once we know that we can add a Fixes tag and also mark the > > > patch for stable because that sounds like a stable material. > > > > IIRC, sound subsystem can producuce custom mapped compound pages. > > Do I assume correctly that consecutive ptes simply point to subpages? Yes. No huge pages there.
On 30.11.18 13:06, Jan Stancek wrote: > LTP proc01 testcase has been observed to rarely trigger crashes > on arm64: > page_mapped+0x78/0xb4 > stable_page_flags+0x27c/0x338 > kpageflags_read+0xfc/0x164 > proc_reg_read+0x7c/0xb8 > __vfs_read+0x58/0x178 > vfs_read+0x90/0x14c > SyS_read+0x60/0xc0 > > Issue is that page_mapped() assumes that if compound page is not > huge, then it must be THP. But if this is 'normal' compound page > (COMPOUND_PAGE_DTOR), then following loop can keep running > (for HPAGE_PMD_NR iterations) until it tries to read from memory > that isn't mapped and triggers a panic: > for (i = 0; i < hpage_nr_pages(page); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > with a custom kernel module [1] which: > - allocates compound page (PAGEC) of order 1 > - allocates 2 normal pages (COPY), which are initialized to 0xff > (to satisfy _mapcount >= 0) > - 2 PAGEC page structs are copied to address of first COPY page > - second page of COPY is marked as not present > - call to page_mapped(COPY) now triggers fault on access to 2nd > COPY page at offset 0x30 (_mapcount) > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > Fix the loop to iterate for "1 << compound_order" pages. > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > Signed-off-by: Jan Stancek <jstancek@redhat.com> > --- > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Changes in v2: > - change the loop instead so we check also mapcount of subpages > > diff --git a/mm/util.c b/mm/util.c > index 8bf08b5b5760..5c9c7359ee8a 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -478,7 +478,7 @@ bool page_mapped(struct page *page) > return true; > if (PageHuge(page)) > return false; > - for (i = 0; i < hpage_nr_pages(page); i++) { > + for (i = 0; i < (1 << compound_order(page)); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > Reviewed-by: David Hildenbrand <david@redhat.com>
On 11/30/18 13:06, Jan Stancek wrote: > LTP proc01 testcase has been observed to rarely trigger crashes > on arm64: > page_mapped+0x78/0xb4 > stable_page_flags+0x27c/0x338 > kpageflags_read+0xfc/0x164 > proc_reg_read+0x7c/0xb8 > __vfs_read+0x58/0x178 > vfs_read+0x90/0x14c > SyS_read+0x60/0xc0 > > Issue is that page_mapped() assumes that if compound page is not > huge, then it must be THP. But if this is 'normal' compound page > (COMPOUND_PAGE_DTOR), then following loop can keep running > (for HPAGE_PMD_NR iterations) until it tries to read from memory > that isn't mapped and triggers a panic: > for (i = 0; i < hpage_nr_pages(page); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > with a custom kernel module [1] which: > - allocates compound page (PAGEC) of order 1 > - allocates 2 normal pages (COPY), which are initialized to 0xff > (to satisfy _mapcount >= 0) > - 2 PAGEC page structs are copied to address of first COPY page > - second page of COPY is marked as not present > - call to page_mapped(COPY) now triggers fault on access to 2nd > COPY page at offset 0x30 (_mapcount) > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > Fix the loop to iterate for "1 << compound_order" pages. > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > Signed-off-by: Jan Stancek <jstancek@redhat.com> > --- > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Changes in v2: > - change the loop instead so we check also mapcount of subpages > > diff --git a/mm/util.c b/mm/util.c > index 8bf08b5b5760..5c9c7359ee8a 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -478,7 +478,7 @@ bool page_mapped(struct page *page) > return true; > if (PageHuge(page)) > return false; > - for (i = 0; i < hpage_nr_pages(page); i++) { > + for (i = 0; i < (1 << compound_order(page)); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > Totally uninformed side-question: how large can the return value of compound_order() be? MAX_ORDER? Apparently, MAX_ORDER can be defined as CONFIG_FORCE_MAX_ZONEORDER. "config FORCE_MAX_ZONEORDER" is listed in a number of Kconfig files. Among those, "arch/mips/Kconfig" permits "ranges" (?) that extend up to 64. Same applies to "arch/powerpc/Kconfig" and "arch/sh/mm/Kconfig". If we left-shift "1" -- a signed int, which I assume in practice will always have two's complement representation, 1 sign bit, 31 value bits, and 0 padding bits --, by 31 or more bit positions, we get undefined behavior (as part of the left-shift operation). Is this a practical concern? Thanks, Laszlo
On Mon, Dec 03, 2018 at 11:23:58AM +0100, Laszlo Ersek wrote: > Totally uninformed side-question: > > how large can the return value of compound_order() be? MAX_ORDER? > > Apparently, MAX_ORDER can be defined as CONFIG_FORCE_MAX_ZONEORDER. > > "config FORCE_MAX_ZONEORDER" is listed in a number of Kconfig files. > Among those, "arch/mips/Kconfig" permits "ranges" (?) that extend up to > 64. Same applies to "arch/powerpc/Kconfig" and "arch/sh/mm/Kconfig". > > If we left-shift "1" -- a signed int, which I assume in practice will > always have two's complement representation, 1 sign bit, 31 value bits, > and 0 padding bits --, by 31 or more bit positions, we get undefined > behavior (as part of the left-shift operation). > > Is this a practical concern? Not really. Assuming 4k PAGE_SIZE, compound_order() == 31 means 8 TiB pages. I doubt we will see such allocation requests any time soon. Even with 1k base page size, it's still 2 TiB. We will see other limitations in page allocaiton path before the compund order type will be an issue.
[ CC'ed Andrew for potential inclusion in -mm ] On Fri, Nov 30, 2018 at 01:06:57PM +0100, Jan Stancek wrote: > LTP proc01 testcase has been observed to rarely trigger crashes > on arm64: > page_mapped+0x78/0xb4 > stable_page_flags+0x27c/0x338 > kpageflags_read+0xfc/0x164 > proc_reg_read+0x7c/0xb8 > __vfs_read+0x58/0x178 > vfs_read+0x90/0x14c > SyS_read+0x60/0xc0 > > Issue is that page_mapped() assumes that if compound page is not > huge, then it must be THP. But if this is 'normal' compound page > (COMPOUND_PAGE_DTOR), then following loop can keep running > (for HPAGE_PMD_NR iterations) until it tries to read from memory > that isn't mapped and triggers a panic: > for (i = 0; i < hpage_nr_pages(page); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > with a custom kernel module [1] which: > - allocates compound page (PAGEC) of order 1 > - allocates 2 normal pages (COPY), which are initialized to 0xff > (to satisfy _mapcount >= 0) > - 2 PAGEC page structs are copied to address of first COPY page > - second page of COPY is marked as not present > - call to page_mapped(COPY) now triggers fault on access to 2nd > COPY page at offset 0x30 (_mapcount) > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > Fix the loop to iterate for "1 << compound_order" pages. > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > Signed-off-by: Jan Stancek <jstancek@redhat.com> > --- > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Changes in v2: > - change the loop instead so we check also mapcount of subpages Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Thanks, Andrea
On Fri, Nov 30, 2018 at 1:07 PM Jan Stancek <jstancek@redhat.com> wrote: > > LTP proc01 testcase has been observed to rarely trigger crashes > on arm64: > page_mapped+0x78/0xb4 > stable_page_flags+0x27c/0x338 > kpageflags_read+0xfc/0x164 > proc_reg_read+0x7c/0xb8 > __vfs_read+0x58/0x178 > vfs_read+0x90/0x14c > SyS_read+0x60/0xc0 > > Issue is that page_mapped() assumes that if compound page is not > huge, then it must be THP. But if this is 'normal' compound page > (COMPOUND_PAGE_DTOR), then following loop can keep running > (for HPAGE_PMD_NR iterations) until it tries to read from memory > that isn't mapped and triggers a panic: > for (i = 0; i < hpage_nr_pages(page); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > with a custom kernel module [1] which: > - allocates compound page (PAGEC) of order 1 > - allocates 2 normal pages (COPY), which are initialized to 0xff > (to satisfy _mapcount >= 0) > - 2 PAGEC page structs are copied to address of first COPY page > - second page of COPY is marked as not present > - call to page_mapped(COPY) now triggers fault on access to 2nd > COPY page at offset 0x30 (_mapcount) > > [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > Fix the loop to iterate for "1 << compound_order" pages. > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > Signed-off-by: Jan Stancek <jstancek@redhat.com> > --- > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Changes in v2: > - change the loop instead so we check also mapcount of subpages > > diff --git a/mm/util.c b/mm/util.c > index 8bf08b5b5760..5c9c7359ee8a 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -478,7 +478,7 @@ bool page_mapped(struct page *page) > return true; > if (PageHuge(page)) > return false; > - for (i = 0; i < hpage_nr_pages(page); i++) { > + for (i = 0; i < (1 << compound_order(page)); i++) { > if (atomic_read(&page[i]._mapcount) >= 0) > return true; > } > -- > 1.8.3.1 Hi all This patch landed in the 4.9-stable tree starting from 4.9.151 and it broke our MIPS1004kc system with CONFIG_HIGHMEM=y. The breakage consists of random processes dying with SIGILL or SIGSEGV when we stress test the system with high memory pressure and explicit memory compaction requested through /proc/sys/vm/compact_memory. Reverting this patch fixes the crashes. We can put some effort on debugging if there are no obvious explanations for this. Keep in mind that this is 32-bit system with HIGHMEM. BR, Lars
----- Original Message ----- > On Fri, Nov 30, 2018 at 1:07 PM Jan Stancek <jstancek@redhat.com> wrote: > > > > LTP proc01 testcase has been observed to rarely trigger crashes > > on arm64: > > page_mapped+0x78/0xb4 > > stable_page_flags+0x27c/0x338 > > kpageflags_read+0xfc/0x164 > > proc_reg_read+0x7c/0xb8 > > __vfs_read+0x58/0x178 > > vfs_read+0x90/0x14c > > SyS_read+0x60/0xc0 > > > > Issue is that page_mapped() assumes that if compound page is not > > huge, then it must be THP. But if this is 'normal' compound page > > (COMPOUND_PAGE_DTOR), then following loop can keep running > > (for HPAGE_PMD_NR iterations) until it tries to read from memory > > that isn't mapped and triggers a panic: > > for (i = 0; i < hpage_nr_pages(page); i++) { > > if (atomic_read(&page[i]._mapcount) >= 0) > > return true; > > } > > > > I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only > > with a custom kernel module [1] which: > > - allocates compound page (PAGEC) of order 1 > > - allocates 2 normal pages (COPY), which are initialized to 0xff > > (to satisfy _mapcount >= 0) > > - 2 PAGEC page structs are copied to address of first COPY page > > - second page of COPY is marked as not present > > - call to page_mapped(COPY) now triggers fault on access to 2nd > > COPY page at offset 0x30 (_mapcount) > > > > [1] > > https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > > > Fix the loop to iterate for "1 << compound_order" pages. > > > > Debugged-by: Laszlo Ersek <lersek@redhat.com> > > Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> > > Signed-off-by: Jan Stancek <jstancek@redhat.com> > > --- > > mm/util.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Changes in v2: > > - change the loop instead so we check also mapcount of subpages > > > > diff --git a/mm/util.c b/mm/util.c > > index 8bf08b5b5760..5c9c7359ee8a 100644 > > --- a/mm/util.c > > +++ b/mm/util.c > > @@ -478,7 +478,7 @@ bool page_mapped(struct page *page) > > return true; > > if (PageHuge(page)) > > return false; > > - for (i = 0; i < hpage_nr_pages(page); i++) { > > + for (i = 0; i < (1 << compound_order(page)); i++) { > > if (atomic_read(&page[i]._mapcount) >= 0) > > return true; > > } > > -- > > 1.8.3.1 > > Hi all > > This patch landed in the 4.9-stable tree starting from 4.9.151 and it > broke our MIPS1004kc system with CONFIG_HIGHMEM=y. Hi, are you using THP (CONFIG_TRANSPARENT_HUGEPAGE)? The changed line should affect only THP and normal compound pages, so a test with THP disabled might be interesting. > > The breakage consists of random processes dying with SIGILL or SIGSEGV > when we stress test the system with high memory pressure and explicit > memory compaction requested through /proc/sys/vm/compact_memory. > Reverting this patch fixes the crashes. > > We can put some effort on debugging if there are no obvious > explanations for this. Keep in mind that this is 32-bit system with > HIGHMEM. Nothing obvious that I can see. I've been trying to reproduce on 32-bit x86 Fedora with no luck so far. Regards, Jan
On Tue, Feb 5, 2019 at 8:14 AM Jan Stancek <jstancek@redhat.com> wrote: > Hi, > > are you using THP (CONFIG_TRANSPARENT_HUGEPAGE)? > > The changed line should affect only THP and normal compound pages, > so a test with THP disabled might be interesting. > > > > > The breakage consists of random processes dying with SIGILL or SIGSEGV > > when we stress test the system with high memory pressure and explicit > > memory compaction requested through /proc/sys/vm/compact_memory. > > Reverting this patch fixes the crashes. > > > > We can put some effort on debugging if there are no obvious > > explanations for this. Keep in mind that this is 32-bit system with > > HIGHMEM. > > Nothing obvious that I can see. I've been trying to reproduce on > 32-bit x86 Fedora with no luck so far. > Hi Thanks for looking in to it. After some deep dive in MM code, I think it is safe to say this patch was innocent. All traces studied so far points to a missing cache coherency call in mm/migrate.c:migrate_page that is needed only for those evil MIPSes that lack I/D cache coherency. I will send a write-up to linux-mips about this. Basically for a non-mapped page it does only a copy of page data and metadata but no flush_dcache_page() call will be done. This races with subsequent use of the page. BR, Lars
On Mon 18-02-19 14:43:58, Lars Persson wrote: > On Tue, Feb 5, 2019 at 8:14 AM Jan Stancek <jstancek@redhat.com> wrote: > > Hi, > > > > are you using THP (CONFIG_TRANSPARENT_HUGEPAGE)? > > > > The changed line should affect only THP and normal compound pages, > > so a test with THP disabled might be interesting. > > > > > > > > The breakage consists of random processes dying with SIGILL or SIGSEGV > > > when we stress test the system with high memory pressure and explicit > > > memory compaction requested through /proc/sys/vm/compact_memory. > > > Reverting this patch fixes the crashes. > > > > > > We can put some effort on debugging if there are no obvious > > > explanations for this. Keep in mind that this is 32-bit system with > > > HIGHMEM. > > > > Nothing obvious that I can see. I've been trying to reproduce on > > 32-bit x86 Fedora with no luck so far. > > > > Hi > > Thanks for looking in to it. After some deep dive in MM code, I think > it is safe to say this patch was innocent. > > All traces studied so far points to a missing cache coherency call in > mm/migrate.c:migrate_page that is needed only for those evil MIPSes > that lack I/D cache coherency. I will send a write-up to linux-mips > about this. Basically for a non-mapped page it does only a copy of > page data and metadata but no flush_dcache_page() call will be done. > This races with subsequent use of the page. Please make sure to cc linux-mm for the patch
diff --git a/mm/util.c b/mm/util.c index 8bf08b5b5760..5c9c7359ee8a 100644 --- a/mm/util.c +++ b/mm/util.c @@ -478,7 +478,7 @@ bool page_mapped(struct page *page) return true; if (PageHuge(page)) return false; - for (i = 0; i < hpage_nr_pages(page); i++) { + for (i = 0; i < (1 << compound_order(page)); i++) { if (atomic_read(&page[i]._mapcount) >= 0) return true; }
LTP proc01 testcase has been observed to rarely trigger crashes on arm64: page_mapped+0x78/0xb4 stable_page_flags+0x27c/0x338 kpageflags_read+0xfc/0x164 proc_reg_read+0x7c/0xb8 __vfs_read+0x58/0x178 vfs_read+0x90/0x14c SyS_read+0x60/0xc0 Issue is that page_mapped() assumes that if compound page is not huge, then it must be THP. But if this is 'normal' compound page (COMPOUND_PAGE_DTOR), then following loop can keep running (for HPAGE_PMD_NR iterations) until it tries to read from memory that isn't mapped and triggers a panic: for (i = 0; i < hpage_nr_pages(page); i++) { if (atomic_read(&page[i]._mapcount) >= 0) return true; } I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only with a custom kernel module [1] which: - allocates compound page (PAGEC) of order 1 - allocates 2 normal pages (COPY), which are initialized to 0xff (to satisfy _mapcount >= 0) - 2 PAGEC page structs are copied to address of first COPY page - second page of COPY is marked as not present - call to page_mapped(COPY) now triggers fault on access to 2nd COPY page at offset 0x30 (_mapcount) [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c Fix the loop to iterate for "1 << compound_order" pages. Debugged-by: Laszlo Ersek <lersek@redhat.com> Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Jan Stancek <jstancek@redhat.com> --- mm/util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Changes in v2: - change the loop instead so we check also mapcount of subpages