Message ID | 20230707033859.16148-1-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: hugetlb_vmemmap: fix a race between vmemmap pmd split | expand |
On Fri, 7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd > page without holding page_table_lock may possiblely get the page table > page instead of a huge pmd page. The effect may be in set_pte_at() > since we may pass an invalid page struct, if set_pte_at() wants to > access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it > may crash the kernel. So fix it. And inline __split_vmemmap_huge_pmd() > since it only has one user. Is this likely enough to justify a backport? I'm thinking "add cc:stable and merge into 6.6-rc1", so it hits -stable after a couple of months of testing.
On Fri, 7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > And inline __split_vmemmap_huge_pmd() > since it only has one user. "open code" would be a better term than "inline" in this situation. If we are to offer this change to -stable then it would be better to do the open-coding of __split_vmemmap_huge_pmd() in a separate, later patch.
> On Jul 8, 2023, at 03:41, Andrew Morton <akpm@linux-foundation.org> wrote: > > On Fri, 7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > >> And inline __split_vmemmap_huge_pmd() >> since it only has one user. > > "open code" would be a better term than "inline" in this situation. > > If we are to offer this change to -stable then it would be better to do > the open-coding of __split_vmemmap_huge_pmd() in a separate, later > patch. > I see. Bug fix is better to "open code" instead of "inline". However, it is a simpler and cleaner way to fix this bug by using "inline". Because we must hold init_mm.page_table_lock to get the page table page in __split_vmemmap_huge_pmd(), then it is just a couple of duplicated code from split_vmemmap_huge_pmd(). Consequently, split_vmemmap_huge_pmd() is redundant, just remove it. And rename __split_vmemmap_huge_pmd() to split_vmemmap_huge_pmd(). The result is the same as the "inline" way. So I keep "inline" to fix this. Thanks.
> On Jul 8, 2023, at 03:38, Andrew Morton <akpm@linux-foundation.org> wrote: > > On Fri, 7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > >> The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd >> page without holding page_table_lock may possiblely get the page table >> page instead of a huge pmd page. The effect may be in set_pte_at() >> since we may pass an invalid page struct, if set_pte_at() wants to >> access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it >> may crash the kernel. So fix it. And inline __split_vmemmap_huge_pmd() >> since it only has one user. > > Is this likely enough to justify a backport? > > I'm thinking "add cc:stable and merge into 6.6-rc1", so it hits -stable > after a couple of months of testing. > Hi Andrew, It is better to backport it to stable. Could you help me add it? Thanks.
On Sat, 8 Jul 2023 09:42:57 +0800 Muchun Song <muchun.song@linux.dev> wrote: > > > > On Jul 8, 2023, at 03:38, Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Fri, 7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > > > >> The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd > >> page without holding page_table_lock may possiblely get the page table > >> page instead of a huge pmd page. The effect may be in set_pte_at() > >> since we may pass an invalid page struct, if set_pte_at() wants to > >> access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it > >> may crash the kernel. So fix it. And inline __split_vmemmap_huge_pmd() > >> since it only has one user. > > > > Is this likely enough to justify a backport? > > > > I'm thinking "add cc:stable and merge into 6.6-rc1", so it hits -stable > > after a couple of months of testing. > > > > Hi Andrew, > > It is better to backport it to stable. Could you help me add it? > I have added cc:stable to this and I have staged it for 6.6-rc1.
On 07/07/23 11:38, Muchun Song wrote: > The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd > page without holding page_table_lock may possiblely get the page table > page instead of a huge pmd page. The effect may be in set_pte_at() > since we may pass an invalid page struct, if set_pte_at() wants to > access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it > may crash the kernel. So fix it. And inline __split_vmemmap_huge_pmd() > since it only has one user. > > Fixes: d8d55f5616cf ("mm: sparsemem: use page table lock to protect kernel pmd operations") > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > --- > mm/hugetlb_vmemmap.c | 34 ++++++++++++++-------------------- > 1 file changed, 14 insertions(+), 20 deletions(-) Sorry, for the very late reply! This code looks fine to me, Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> The discussion about 'open code' or inline' has me a bit confused. I am perfectly happy with the code as it is. When I hear/see inline, I think of the inline function keyword. Since that is not happening in this patch, the mention of 'inline __split_vmemmap_huge_pmd()' does cause me to think a bit more than usual. Yes, the backports will need to be modified.
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index c2007ef5e9b0..4b9734777f69 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -36,14 +36,22 @@ struct vmemmap_remap_walk { struct list_head *vmemmap_pages; }; -static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) { pmd_t __pmd; int i; unsigned long addr = start; - struct page *page = pmd_page(*pmd); - pte_t *pgtable = pte_alloc_one_kernel(&init_mm); + struct page *head; + pte_t *pgtable; + + spin_lock(&init_mm.page_table_lock); + head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL; + spin_unlock(&init_mm.page_table_lock); + if (!head) + return 0; + + pgtable = pte_alloc_one_kernel(&init_mm); if (!pgtable) return -ENOMEM; @@ -53,7 +61,7 @@ static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) pte_t entry, *pte; pgprot_t pgprot = PAGE_KERNEL; - entry = mk_pte(page + i, pgprot); + entry = mk_pte(head + i, pgprot); pte = pte_offset_kernel(&__pmd, addr); set_pte_at(&init_mm, addr, pte, entry); } @@ -65,8 +73,8 @@ static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) * be treated as indepdenent small pages (as they can be freed * individually). */ - if (!PageReserved(page)) - split_page(page, get_order(PMD_SIZE)); + if (!PageReserved(head)) + split_page(head, get_order(PMD_SIZE)); /* Make pte visible before pmd. See comment in pmd_install(). */ smp_wmb(); @@ -80,20 +88,6 @@ static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) return 0; } -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) -{ - int leaf; - - spin_lock(&init_mm.page_table_lock); - leaf = pmd_leaf(*pmd); - spin_unlock(&init_mm.page_table_lock); - - if (!leaf) - return 0; - - return __split_vmemmap_huge_pmd(pmd, start); -} - static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct vmemmap_remap_walk *walk)
The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd page without holding page_table_lock may possiblely get the page table page instead of a huge pmd page. The effect may be in set_pte_at() since we may pass an invalid page struct, if set_pte_at() wants to access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it may crash the kernel. So fix it. And inline __split_vmemmap_huge_pmd() since it only has one user. Fixes: d8d55f5616cf ("mm: sparsemem: use page table lock to protect kernel pmd operations") Signed-off-by: Muchun Song <songmuchun@bytedance.com> --- mm/hugetlb_vmemmap.c | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-)