Message ID | 20191115115808.21181-2-thomas_os@shipmail.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] mm: Move the backup x_devmap() functions to asm-generic/pgtable.h | expand |
On Fri, 15 Nov 2019 12:58:08 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote: > A huge pud page can theoretically be faulted in racing with pmd_alloc() > in __handle_mm_fault(). That will lead to pmd_alloc() returning an > invalid pmd pointer. Fix this by adding a pud_trans_unstable() function > similar to pmd_trans_unstable() and check whether the pud is really stable > before using the pmd pointer. > > Race: > Thread 1: Thread 2: Comment > create_huge_pud() Fallback - not taken. > create_huge_pud() Taken. > pmd_alloc() Returns an invalid pointer. What are the user-visible runtime effects of this change? Is a -stable backport warranted?
On Fri, Nov 15, 2019 at 11:58:00AM -0800, Andrew Morton wrote: > On Fri, 15 Nov 2019 12:58:08 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote: > > > A huge pud page can theoretically be faulted in racing with pmd_alloc() > > in __handle_mm_fault(). That will lead to pmd_alloc() returning an > > invalid pmd pointer. Fix this by adding a pud_trans_unstable() function > > similar to pmd_trans_unstable() and check whether the pud is really stable > > before using the pmd pointer. > > > > Race: > > Thread 1: Thread 2: Comment > > create_huge_pud() Fallback - not taken. > > create_huge_pud() Taken. > > pmd_alloc() Returns an invalid pointer. > > What are the user-visible runtime effects of this change? Data corruption: kernel writes to a huge page thing it's page table. > Is a -stable backport warranted? I believe it is. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
On 11/18/19 11:22 AM, Kirill A. Shutemov wrote: > On Fri, Nov 15, 2019 at 11:58:00AM -0800, Andrew Morton wrote: >> On Fri, 15 Nov 2019 12:58:08 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote: >> >>> A huge pud page can theoretically be faulted in racing with pmd_alloc() >>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an >>> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function >>> similar to pmd_trans_unstable() and check whether the pud is really stable >>> before using the pmd pointer. >>> >>> Race: >>> Thread 1: Thread 2: Comment >>> create_huge_pud() Fallback - not taken. >>> create_huge_pud() Taken. >>> pmd_alloc() Returns an invalid pointer. >> What are the user-visible runtime effects of this change? > Data corruption: kernel writes to a huge page thing it's page table. > >> Is a -stable backport warranted? > I believe it is. Note that this was caught during a code audit rather than a real experienced problem. It looks to me like the only implementation that currently creates huge pud pagetable entries is dev_dax_huge_fault() which doesn't appear to care much about private (COW) mappings or write-tracking which is, I believe, a prerequisite for create_huge_pud() falling back on thread 1, but not in thread 2. This means (assuming that's intentional) that a stable backport shouldn't be needed. For the WIP huge page support for graphics memory we'll be allowing both COW mappings and write-tracking, though, but that's still some time away. In any case, I think this patch needs -rc testing to catch potential pud_devmap issues before submitted to stable. Thanks, Thomas > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >
On Mon, 18 Nov 2019 13:58:04 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote: > >> Is a -stable backport warranted? > > I believe it is. > > Note that this was caught during a code audit rather than a real > experienced problem. It looks to me like the only implementation that > currently creates huge pud pagetable entries is dev_dax_huge_fault() > which doesn't appear to care much about private (COW) mappings or > write-tracking which is, I believe, a prerequisite for create_huge_pud() > falling back on thread 1, but not in thread 2. > > This means (assuming that's intentional) that a stable backport > shouldn't be needed. > > For the WIP huge page support for graphics memory we'll be allowing both > COW mappings and write-tracking, though, but that's still some time away. > > In any case, I think this patch needs -rc testing to catch potential > pud_devmap issues before submitted to stable. OK, thanks, I'll queue it for 5.5-rc1 with a -stable tag. Hopefully that way it will get a bit of exposure before the stable trees pick it up. Maybe this is optimistic..
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 8efa45580fd0..c40a0ced53bd 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -927,6 +927,31 @@ static inline int pud_trans_huge(pud_t pud) } #endif +/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */ +static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud) +{ + pud_t pudval = READ_ONCE(*pud); + + if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) + return 1; + if (unlikely(pud_bad(pudval))) { + pud_clear_bad(pud); + return 1; + } + return 0; +} + +/* See pmd_trans_unstable for discussion. */ +static inline int pud_trans_unstable(pud_t *pud) +{ +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) + return pud_none_or_trans_huge_or_dev_or_clear_bad(pud); +#else + return 0; +#endif +} + #ifndef pmd_read_atomic static inline pmd_t pmd_read_atomic(pmd_t *pmdp) { diff --git a/mm/memory.c b/mm/memory.c index b1ca51a079f2..43ff372f4f07 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3914,6 +3914,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, vmf.pud = pud_alloc(mm, p4d, address); if (!vmf.pud) return VM_FAULT_OOM; +retry_pud: if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) { ret = create_huge_pud(&vmf); if (!(ret & VM_FAULT_FALLBACK)) @@ -3940,6 +3941,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, vmf.pmd = pmd_alloc(mm, vmf.pud, address); if (!vmf.pmd) return VM_FAULT_OOM; + + /* Huge pud page fault raced with pmd_alloc? */ + if (pud_trans_unstable(vmf.pud)) + goto retry_pud; + if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) { ret = create_huge_pmd(&vmf); if (!(ret & VM_FAULT_FALLBACK))