Message ID | 20191008093711.3410-1-thomas_os@shipmail.org (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] mm: Fix a huge pud insertion race during faulting |
On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom <thellstrom@vmware.com>
>
> A huge pud page can theoretically be faulted in racing with pmd_alloc()
> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> similar to pmd_trans_unstable() and check whether the pud is really stable
> before using the pmd pointer.
>
> Race:
> Thread 1:             Thread 2:                 Comment
> create_huge_pud()                               Fallback - not taken.
>                       create_huge_pud()         Taken.
> pmd_alloc()                                     Returns an invalid pointer.
>
> Cc: Matthew Wilcox <willy@infradead.org>
> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> ---
> RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
>      Do the same for pmds?

I *think* it is correct and we should do the same for PMD, but I may be
wrong.

Dan, Matthew, could you comment on this?

> ---
>  include/asm-generic/pgtable.h | 25 +++++++++++++++++++++++++
>  mm/memory.c                   |  6 ++++++
>  2 files changed, 31 insertions(+)
>
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 818691846c90..70c2058230ba 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -912,6 +912,31 @@ static inline int pud_trans_huge(pud_t pud)
>  }
>  #endif
>
> +/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */
> +static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud)
> +{
> +	pud_t pudval = READ_ONCE(*pud);
> +
> +	if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
> +		return 1;
> +	if (unlikely(pud_bad(pudval))) {
> +		pud_clear_bad(pud);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +/* See pmd_trans_unstable for discussion. */
> +static inline int pud_trans_unstable(pud_t *pud)
> +{
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
> +	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
> +	return pud_none_or_trans_huge_or_dev_or_clear_bad(pud);
> +#else
> +	return 0;
> +#endif
> +}
> +
>  #ifndef pmd_read_atomic
>  static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
>  {
> diff --git a/mm/memory.c b/mm/memory.c
> index b1ca51a079f2..43ff372f4f07 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3914,6 +3914,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  	vmf.pud = pud_alloc(mm, p4d, address);
>  	if (!vmf.pud)
>  		return VM_FAULT_OOM;
> +retry_pud:
>  	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
>  		ret = create_huge_pud(&vmf);
>  		if (!(ret & VM_FAULT_FALLBACK))
> @@ -3940,6 +3941,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
>  	if (!vmf.pmd)
>  		return VM_FAULT_OOM;
> +
> +	/* Huge pud page fault raced with pmd_alloc? */
> +	if (pud_trans_unstable(vmf.pud))
> +		goto retry_pud;
> +
>  	if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
>  		ret = create_huge_pmd(&vmf);
>  		if (!(ret & VM_FAULT_FALLBACK))
> --
> 2.20.1
>
On Tue, Oct 15, 2019 at 3:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
> > From: Thomas Hellstrom <thellstrom@vmware.com>
> >
> > A huge pud page can theoretically be faulted in racing with pmd_alloc()
> > in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> > invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> > similar to pmd_trans_unstable() and check whether the pud is really stable
> > before using the pmd pointer.
> >
> > Race:
> > Thread 1:             Thread 2:                 Comment
> > create_huge_pud()                               Fallback - not taken.
> >                       create_huge_pud()         Taken.
> > pmd_alloc()                                     Returns an invalid pointer.
> >
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
> > Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> > ---
> > RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
> >      Do the same for pmds?
>
> I *think* it is correct and we should do the same for PMD, but I may be
> wrong.
>
> Dan, Matthew, could you comment on this?

The _devmap() check in these paths near _trans_unstable() has always
been about avoiding assumptions that the corresponding page might be
page cache or anonymous which for dax it's neither and does not behave
like a typical page.
Hi, Dan,

On 10/16/19 3:44 AM, Dan Williams wrote:
> On Tue, Oct 15, 2019 at 3:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>
>>> A huge pud page can theoretically be faulted in racing with pmd_alloc()
>>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
>>> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
>>> similar to pmd_trans_unstable() and check whether the pud is really stable
>>> before using the pmd pointer.
>>>
>>> Race:
>>> Thread 1:             Thread 2:                 Comment
>>> create_huge_pud()                               Fallback - not taken.
>>>                       create_huge_pud()         Taken.
>>> pmd_alloc()                                     Returns an invalid pointer.
>>>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
>>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>>> ---
>>> RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
>>>      Do the same for pmds?
>> I *think* it is correct and we should do the same for PMD, but I may be
>> wrong.
>>
>> Dan, Matthew, could you comment on this?
> The _devmap() check in these paths near _trans_unstable() has always
> been about avoiding assumptions that the corresponding page might be
> page cache or anonymous which for dax it's neither and does not behave
> like a typical page.

The concern here is that _trans_huge() returns false for _devmap()
pages, which means that also _trans_unstable() returns false.

Still, I figure someone could zap the entry at any time using madvise(),
so AFAICT the entry is indeed unstable, and it's a bug not to include
_devmap() in the _trans_unstable() functions?

Thanks,
Thomas
On Tue, Oct 15, 2019 at 10:59 PM Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
>
> Hi, Dan,
>
> On 10/16/19 3:44 AM, Dan Williams wrote:
> > On Tue, Oct 15, 2019 at 3:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >> On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
> >>> From: Thomas Hellstrom <thellstrom@vmware.com>
> >>>
> >>> A huge pud page can theoretically be faulted in racing with pmd_alloc()
> >>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> >>> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> >>> similar to pmd_trans_unstable() and check whether the pud is really stable
> >>> before using the pmd pointer.
> >>>
> >>> Race:
> >>> Thread 1:             Thread 2:                 Comment
> >>> create_huge_pud()                               Fallback - not taken.
> >>>                       create_huge_pud()         Taken.
> >>> pmd_alloc()                                     Returns an invalid pointer.
> >>>
> >>> Cc: Matthew Wilcox <willy@infradead.org>
> >>> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
> >>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> >>> ---
> >>> RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
> >>>      Do the same for pmds?
> >> I *think* it is correct and we should do the same for PMD, but I may be
> >> wrong.
> >>
> >> Dan, Matthew, could you comment on this?
> > The _devmap() check in these paths near _trans_unstable() has always
> > been about avoiding assumptions that the corresponding page might be
> > page cache or anonymous which for dax it's neither and does not behave
> > like a typical page.
>
> The concern here is that _trans_huge() returns false for _devmap()
> pages, which means that also _trans_unstable() returns false.
>
> Still, I figure someone could zap the entry at any time using madvise(),
> so AFAICT the entry is indeed unstable, and it's a bug not to include
> _devmap() in the _trans_unstable() functions?
Yes, I can't think of a case where it is wrong to include _devmap() in a _trans_unstable(). It may be unnecessary if the given path can't reasonably ever encounter a file-backed dax mapping, but it's otherwise ok to always consider _devmap().
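The point discussed above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `toy_*` names and flag bits are hypothetical, and `toy_trans_huge()` simply encodes the premise stated in the thread that a devmap entry is not reported as trans_huge. Under that assumption, a check without the devmap term treats a huge devmap entry as stable, while a check with the term does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy entry; real encodings are per-architecture. */
#define TOY_PRESENT (1u << 0)
#define TOY_HUGE    (1u << 1)
#define TOY_DEVMAP  (1u << 2)

typedef uint32_t toy_entry_t;

static bool toy_none(toy_entry_t e)
{
    return e == 0;
}

/* Encodes the thread's premise: a devmap entry is not reported as trans_huge. */
static bool toy_trans_huge(toy_entry_t e)
{
    return (e & TOY_HUGE) && !(e & TOY_DEVMAP);
}

static bool toy_devmap(toy_entry_t e)
{
    return (e & TOY_DEVMAP) != 0;
}

/* "Unstable" check that omits the devmap term. */
static bool toy_unstable_no_devmap(const toy_entry_t *entry)
{
    assert(entry != NULL);
    toy_entry_t e = *entry; /* read once, then test only the snapshot */
    return toy_none(e) || toy_trans_huge(e);
}

/* "Unstable" check that includes the devmap term, as the patch does for puds. */
static bool toy_unstable_with_devmap(const toy_entry_t *entry)
{
    assert(entry != NULL);
    toy_entry_t e = *entry; /* read once, then test only the snapshot */
    return toy_none(e) || toy_trans_huge(e) || toy_devmap(e);
}
```

In this model an entry with `TOY_PRESENT | TOY_HUGE | TOY_DEVMAP` is reported as stable by `toy_unstable_no_devmap()` and as unstable by `toy_unstable_with_devmap()`; an entry pointing to a lower-level table (`TOY_PRESENT` only) is stable under both.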
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 818691846c90..70c2058230ba 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -912,6 +912,31 @@ static inline int pud_trans_huge(pud_t pud)
 }
 #endif
 
+/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */
+static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud)
+{
+	pud_t pudval = READ_ONCE(*pud);
+
+	if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
+		return 1;
+	if (unlikely(pud_bad(pudval))) {
+		pud_clear_bad(pud);
+		return 1;
+	}
+	return 0;
+}
+
+/* See pmd_trans_unstable for discussion. */
+static inline int pud_trans_unstable(pud_t *pud)
+{
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
+	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+	return pud_none_or_trans_huge_or_dev_or_clear_bad(pud);
+#else
+	return 0;
+#endif
+}
+
 #ifndef pmd_read_atomic
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
diff --git a/mm/memory.c b/mm/memory.c
index b1ca51a079f2..43ff372f4f07 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3914,6 +3914,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pud = pud_alloc(mm, p4d, address);
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
+retry_pud:
 	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
@@ -3940,6 +3941,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
 	if (!vmf.pmd)
 		return VM_FAULT_OOM;
+
+	/* Huge pud page fault raced with pmd_alloc? */
+	if (pud_trans_unstable(vmf.pud))
+		goto retry_pud;
+
 	if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
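
The recheck-and-retry shape that the mm/memory.c hunk adds can be sketched as a deterministic, single-threaded user-space simulation. Everything here is an assumption made for illustration: the `sim_*` names are hypothetical, the racing thread is played by a hook that fires once in the window between the fallback and the allocation step, and the allocation step (the analogue of pmd_alloc()) only populates an entry that is still empty. It is not the kernel's fault path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy states of an upper-level entry (standing in for a pud). */
enum sim_state { SIM_NONE, SIM_TABLE, SIM_HUGE };

struct sim_upper {
    enum sim_state state;
};

enum sim_result { SIM_MAPPED_HUGE, SIM_FOUND_HUGE, SIM_USE_LOWER };

/* Hypothetical hook that plays the racing thread exactly once. */
typedef void (*sim_race_hook)(struct sim_upper *upper);

/* Racing thread: installs a huge entry at the upper level. */
static void sim_install_huge(struct sim_upper *upper)
{
    upper->state = SIM_HUGE;
}

static enum sim_result sim_fault(struct sim_upper *upper, bool can_map_huge,
                                 sim_race_hook hook, int *retries)
{
    assert(upper != NULL && retries != NULL);
retry:
    if (upper->state == SIM_NONE) {
        if (can_map_huge) {
            upper->state = SIM_HUGE;
            return SIM_MAPPED_HUGE;
        }
        /* Fallback - not taken: continue towards the lower level. */
    } else if (upper->state == SIM_HUGE) {
        /* A huge entry is already there; handle it at this level. */
        return SIM_FOUND_HUGE;
    }

    /* Window in which another thread may install a huge entry. */
    if (hook != NULL) {
        hook(upper);
        hook = NULL;
    }

    /* Analogue of pmd_alloc(): populate only if the entry is still empty. */
    if (upper->state == SIM_NONE)
        upper->state = SIM_TABLE;

    /* The added recheck: without a table here, a lower-level pointer is unusable. */
    if (upper->state != SIM_TABLE) {
        (*retries)++;
        goto retry;
    }
    return SIM_USE_LOWER;
}
```

Calling `sim_fault()` with no hook walks straight to the lower level with zero retries. Passing `sim_install_huge` as the hook reproduces the interleaving from the commit message: the recheck fires once, and the second pass finds the huge entry instead of using a lower-level pointer derived from it.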