Message ID | 20250214030349.45524-1-zhengqi.arch@bytedance.com (mailing list archive)
---|---
State | New
Series | [v2] arm: pgtable: fix NULL pointer dereference issue
On 14.02.25 04:03, Qi Zheng wrote:
> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
> parameter is NULL, which will cause a NULL pointer dereference issue in
> adjust_pte():
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
> Hardware name: Atmel AT91SAM9
> PC is at update_mmu_cache_range+0x1e0/0x278
> LR is at pte_offset_map_rw_nolock+0x18/0x2c
> Call trace:
>  update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
>  remove_migration_pte from rmap_walk_file+0xcc/0x130
>  rmap_walk_file from remove_migration_ptes+0x90/0xa4
>  remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
>  migrate_pages_batch from migrate_pages+0x188/0x488
>  migrate_pages from compact_zone+0x56c/0x954
>  compact_zone from compact_node+0x90/0xf0
>  compact_node from kcompactd+0x1d4/0x204
>  kcompactd from kthread+0x120/0x12c
>  kthread from ret_from_fork+0x14/0x38
> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>
> To fix it, do not rely on whether 'ptl' is equal to decide whether to hold
> the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is
> enabled. In addition, if two vmas map to the same PTE page, there is no
> need to hold the pte lock again, otherwise a deadlock will occur. Just add
> the need_lock parameter to let adjust_pte() know this information.
>
> Reported-by: Ezra Buehler <ezra.buehler@husqvarnagroup.com>
> Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
> Changes in v2:
> - change Ezra's email address (Ezra Buehler)
> - some cleanups (David Hildenbrand)
>
>  arch/arm/mm/fault-armv.c | 38 ++++++++++++++++++++++++++------------
>  1 file changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 2bec87c3327d2..ea4c4e15f0d31 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
> }
>
> static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
> - unsigned long pfn, struct vm_fault *vmf)
> + unsigned long pfn, bool need_lock)
> {
> spinlock_t *ptl;
> pgd_t *pgd;
> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
> if (!pte)
> return 0;
>
> - /*
> - * If we are using split PTE locks, then we need to take the page
> - * lock here. Otherwise we are using shared mm->page_table_lock
> - * which is already locked, thus cannot take it.
> - */
> - if (ptl != vmf->ptl) {
> + if (need_lock) {
> + /*
> + * Use nested version here to indicate that we are already
> + * holding one similar spinlock.
> + */
> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> pte_unmap_unlock(pte, ptl);
> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>
> ret = do_adjust_pte(vma, address, pfn, pte);
>
> - if (ptl != vmf->ptl)
> + if (need_lock)
> spin_unlock(ptl);
> pte_unmap(pte);
>
> @@ -123,16 +122,18 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>
> static void
> make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> - unsigned long addr, pte_t *ptep, unsigned long pfn,
> - struct vm_fault *vmf)
> + unsigned long addr, pte_t *ptep, unsigned long pfn)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *mpnt;
> unsigned long offset;
> + unsigned long pmd_start_addr, pmd_end_addr;

Nit: reverse christmas tree would make us put this line at the very top.

Maybe do the initialization directly:

const unsigned long pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
const unsigned long pmd_end_addr = pmd_start_addr + PMD_SIZE;

> pgoff_t pgoff;
> int aliases = 0;
>
> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> + pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
> + pmd_end_addr = pmd_start_addr + PMD_SIZE;
>
> /*
> * If we have any shared mappings that are in the same mm
> @@ -141,6 +142,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> */
> flush_dcache_mmap_lock(mapping);
> vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> + /*
> + * If we are using split PTE locks, then we need to take the pte
> + * lock. Otherwise we are using shared mm->page_table_lock which
> + * is already locked, thus cannot take it.
> + */
> + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
> + unsigned long mpnt_addr;
> +
> /*
> * If this VMA is not in our MM, we can ignore it.
> * Note that we intentionally mask out the VMA
> @@ -151,7 +160,12 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
> if (!(mpnt->vm_flags & VM_MAYSHARE))
> continue;
> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
> + mpnt_addr = mpnt->vm_start + offset;
> +
> + /* Avoid deadlocks by not grabbing the same PTE lock again. */
> + if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
> + need_lock = false;
> + aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
> }
> flush_dcache_mmap_unlock(mapping);
> if (aliases)
> @@ -194,7 +208,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
> __flush_dcache_folio(mapping, folio);
> if (mapping) {
> if (cache_is_vivt())
> - make_coherent(mapping, vma, addr, ptep, pfn, vmf);
> + make_coherent(mapping, vma, addr, ptep, pfn);
> else if (vma->vm_flags & VM_EXEC)
> __flush_icache_all();
> }

Apart from that LGTM. Hoping it will work :)

Acked-by: David Hildenbrand <david@redhat.com>
On 2025/2/14 16:11, David Hildenbrand wrote:
> On 14.02.25 04:03, Qi Zheng wrote:
>> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
>> parameter is NULL, which will cause a NULL pointer dereference issue in
>> adjust_pte():
>>
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 00000030 when read
>> Hardware name: Atmel AT91SAM9
>> PC is at update_mmu_cache_range+0x1e0/0x278
>> LR is at pte_offset_map_rw_nolock+0x18/0x2c
>> Call trace:
>> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
>> remove_migration_pte from rmap_walk_file+0xcc/0x130
>> rmap_walk_file from remove_migration_ptes+0x90/0xa4
>> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
>> migrate_pages_batch from migrate_pages+0x188/0x488
>> migrate_pages from compact_zone+0x56c/0x954
>> compact_zone from compact_node+0x90/0xf0
>> compact_node from kcompactd+0x1d4/0x204
>> kcompactd from kthread+0x120/0x12c
>> kthread from ret_from_fork+0x14/0x38
>> Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
>>
>> To fix it, do not rely on whether 'ptl' is equal to decide whether to
>> hold
>> the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is
>> enabled. In addition, if two vmas map to the same PTE page, there is no
>> need to hold the pte lock again, otherwise a deadlock will occur. Just
>> add
>> the need_lock parameter to let adjust_pte() know this information.
>>
>> Reported-by: Ezra Buehler <ezra.buehler@husqvarnagroup.com>
>> Closes:
>> https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
>> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>> Changes in v2:
>> - change Ezra's email address (Ezra Buehler)
>> - some cleanups (David Hildenbrand)
>>
>> arch/arm/mm/fault-armv.c | 38 ++++++++++++++++++++++++++------------
>> 1 file changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>> index 2bec87c3327d2..ea4c4e15f0d31 100644
>> --- a/arch/arm/mm/fault-armv.c
>> +++ b/arch/arm/mm/fault-armv.c
>> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma,
>> unsigned long address,
>> }
>> static int adjust_pte(struct vm_area_struct *vma, unsigned long
>> address,
>> - unsigned long pfn, struct vm_fault *vmf)
>> + unsigned long pfn, bool need_lock)
>> {
>> spinlock_t *ptl;
>> pgd_t *pgd;
>> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma,
>> unsigned long address,
>> if (!pte)
>> return 0;
>> - /*
>> - * If we are using split PTE locks, then we need to take the page
>> - * lock here. Otherwise we are using shared mm->page_table_lock
>> - * which is already locked, thus cannot take it.
>> - */
>> - if (ptl != vmf->ptl) {
>> + if (need_lock) {
>> + /*
>> + * Use nested version here to indicate that we are already
>> + * holding one similar spinlock.
>> + */
>> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>> pte_unmap_unlock(pte, ptl);
>> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma,
>> unsigned long address,
>> ret = do_adjust_pte(vma, address, pfn, pte);
>> - if (ptl != vmf->ptl)
>> + if (need_lock)
>> spin_unlock(ptl);
>> pte_unmap(pte);
>> @@ -123,16 +122,18 @@ static int adjust_pte(struct vm_area_struct
>> *vma, unsigned long address,
>> static void
>> make_coherent(struct address_space *mapping, struct vm_area_struct
>> *vma,
>> - unsigned long addr, pte_t *ptep, unsigned long pfn,
>> - struct vm_fault *vmf)
>> + unsigned long addr, pte_t *ptep, unsigned long pfn)
>> {
>> struct mm_struct *mm = vma->vm_mm;
>> struct vm_area_struct *mpnt;
>> unsigned long offset;
>> + unsigned long pmd_start_addr, pmd_end_addr;
>
> Nit: reverse christmas tree would make us put this line at the very top.
>
> Maybe do the initialization directly:
>
> const unsigned long pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
> const unsigned long pmd_end_addr = pmd_start_addr + PMD_SIZE;

Agree, will do.

>
>> pgoff_t pgoff;
>> int aliases = 0;
>> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
>> + pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
>> + pmd_end_addr = pmd_start_addr + PMD_SIZE;
>> /*
>> * If we have any shared mappings that are in the same mm
>> @@ -141,6 +142,14 @@ make_coherent(struct address_space *mapping,
>> struct vm_area_struct *vma,
>> */
>> flush_dcache_mmap_lock(mapping);
>> vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
>> + /*
>> + * If we are using split PTE locks, then we need to take the pte
>> + * lock. Otherwise we are using shared mm->page_table_lock which
>> + * is already locked, thus cannot take it.
>> + */
>> + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
>> + unsigned long mpnt_addr;
>> +
>> /*
>> * If this VMA is not in our MM, we can ignore it.
>> * Note that we intentionally mask out the VMA
>> @@ -151,7 +160,12 @@ make_coherent(struct address_space *mapping,
>> struct vm_area_struct *vma,
>> if (!(mpnt->vm_flags & VM_MAYSHARE))
>> continue;
>> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
>> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
>> + mpnt_addr = mpnt->vm_start + offset;
>> +
>> + /* Avoid deadlocks by not grabbing the same PTE lock again. */
>> + if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
>> + need_lock = false;
>> + aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
>> }
>> flush_dcache_mmap_unlock(mapping);
>> if (aliases)
>> @@ -194,7 +208,7 @@ void update_mmu_cache_range(struct vm_fault *vmf,
>> struct vm_area_struct *vma,
>> __flush_dcache_folio(mapping, folio);
>> if (mapping) {
>> if (cache_is_vivt())
>> - make_coherent(mapping, vma, addr, ptep, pfn, vmf);
>> + make_coherent(mapping, vma, addr, ptep, pfn);
>> else if (vma->vm_flags & VM_EXEC)
>> __flush_icache_all();
>> }
>
>
> Apart from that LGTM. Hoping it will work :)

+1

>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

>
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2bec87c3327d2..ea4c4e15f0d31 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
 }
 
 static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
-                      unsigned long pfn, struct vm_fault *vmf)
+                      unsigned long pfn, bool need_lock)
 {
         spinlock_t *ptl;
         pgd_t *pgd;
@@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
         if (!pte)
                 return 0;
 
-        /*
-         * If we are using split PTE locks, then we need to take the page
-         * lock here. Otherwise we are using shared mm->page_table_lock
-         * which is already locked, thus cannot take it.
-         */
-        if (ptl != vmf->ptl) {
+        if (need_lock) {
+                /*
+                 * Use nested version here to indicate that we are already
+                 * holding one similar spinlock.
+                 */
                 spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
                 if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
                         pte_unmap_unlock(pte, ptl);
@@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
         ret = do_adjust_pte(vma, address, pfn, pte);
 
-        if (ptl != vmf->ptl)
+        if (need_lock)
                 spin_unlock(ptl);
         pte_unmap(pte);
 
@@ -123,16 +122,18 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 static void
 make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
-        unsigned long addr, pte_t *ptep, unsigned long pfn,
-        struct vm_fault *vmf)
+        unsigned long addr, pte_t *ptep, unsigned long pfn)
 {
         struct mm_struct *mm = vma->vm_mm;
         struct vm_area_struct *mpnt;
         unsigned long offset;
+        unsigned long pmd_start_addr, pmd_end_addr;
         pgoff_t pgoff;
         int aliases = 0;
 
         pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+        pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
+        pmd_end_addr = pmd_start_addr + PMD_SIZE;
 
         /*
          * If we have any shared mappings that are in the same mm
@@ -141,6 +142,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
          */
         flush_dcache_mmap_lock(mapping);
         vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+                /*
+                 * If we are using split PTE locks, then we need to take the pte
+                 * lock. Otherwise we are using shared mm->page_table_lock which
+                 * is already locked, thus cannot take it.
+                 */
+                bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
+                unsigned long mpnt_addr;
+
                 /*
                  * If this VMA is not in our MM, we can ignore it.
                  * Note that we intentionally mask out the VMA
@@ -151,7 +160,12 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
                 if (!(mpnt->vm_flags & VM_MAYSHARE))
                         continue;
                 offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-                aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+                mpnt_addr = mpnt->vm_start + offset;
+
+                /* Avoid deadlocks by not grabbing the same PTE lock again. */
+                if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
+                        need_lock = false;
+                aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
         }
         flush_dcache_mmap_unlock(mapping);
         if (aliases)
@@ -194,7 +208,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
                 __flush_dcache_folio(mapping, folio);
         if (mapping) {
                 if (cache_is_vivt())
-                        make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+                        make_coherent(mapping, vma, addr, ptep, pfn);
                 else if (vma->vm_flags & VM_EXEC)
                         __flush_icache_all();
         }
When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
parameter is NULL, which will cause a NULL pointer dereference issue in
adjust_pte():

Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
Hardware name: Atmel AT91SAM9
PC is at update_mmu_cache_range+0x1e0/0x278
LR is at pte_offset_map_rw_nolock+0x18/0x2c
Call trace:
 update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
 remove_migration_pte from rmap_walk_file+0xcc/0x130
 rmap_walk_file from remove_migration_ptes+0x90/0xa4
 remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
 migrate_pages_batch from migrate_pages+0x188/0x488
 migrate_pages from compact_zone+0x56c/0x954
 compact_zone from compact_node+0x90/0xf0
 compact_node from kcompactd+0x1d4/0x204
 kcompactd from kthread+0x120/0x12c
 kthread from ret_from_fork+0x14/0x38
Exception stack(0xc0d8bfb0 to 0xc0d8bff8)

To fix it, do not rely on whether 'ptl' is equal to vmf->ptl to decide
whether to hold the pte lock, but decide it by whether
CONFIG_SPLIT_PTE_PTLOCKS is enabled. In addition, if two vmas map to the
same PTE page, there is no need to hold the pte lock again, otherwise a
deadlock will occur. Just add the need_lock parameter to let adjust_pte()
know this information.

Reported-by: Ezra Buehler <ezra.buehler@husqvarnagroup.com>
Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
Cc: stable@vger.kernel.org
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
Changes in v2:
- change Ezra's email address (Ezra Buehler)
- some cleanups (David Hildenbrand)

 arch/arm/mm/fault-armv.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)
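For readers skimming the thread, the following is a condensed sketch of the locking decision this patch adds to make_coherent(), distilled from the diff above. It reuses the patch's own identifiers (need_lock, pmd_start_addr, pmd_end_addr, mpnt_addr, adjust_pte()) and leaves out the VMA filtering and cache flushing around it, so it illustrates the idea rather than reproducing the exact code in arch/arm/mm/fault-armv.c:

/* Per-alias locking decision inside the i_mmap interval-tree walk. */
vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
        /*
         * With split PTE locks the per-page-table lock must be taken
         * explicitly; without them the shared mm->page_table_lock is
         * already held by the caller and must not be taken again.
         */
        bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
        unsigned long mpnt_addr;

        offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
        mpnt_addr = mpnt->vm_start + offset;

        /*
         * An alias inside the same PMD range as the faulting address
         * shares the faulting address's PTE page, and therefore the
         * split PTE lock that is already held; taking it again would
         * deadlock.
         */
        if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
                need_lock = false;

        aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
}

The net effect is that need_lock is derived from the build configuration and the PMD-range check instead of from vmf->ptl, whose dereference is what crashed when update_mmu_cache() passed a NULL vmf.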