| Message ID | 1720614028-2260-1-git-send-email-zhang.chuna@h3c.com (mailing list archive) |
|---|---|
| State | New |
| Series | [v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock |
On Wed, 10 Jul 2024 20:20:28 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> Using kmap_high and kmap_XXX or kunmap_xxx on different cores at the
> same time may cause a deadlock. The issue is like this:

What is kmap_XXX?

> CPU 0:                                  CPU 1:
> kmap_high(){                            kmap_xxx() {
>  ...                                     irq_disable();
>  spin_lock(&kmap_lock)
>  ...
>  map_new_virtual                         ...
>   flush_all_zero_pkmaps
>    flush_tlb_kernel_range              /* CPU0 holds the kmap_lock */
>     smp_call_function_many              spin_lock(&kmap_lock)
>  ...                                     ....
>  spin_unlock(&kmap_lock)
>  ...
>
> CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI.
> But CPU 1 has disabled irqs while waiting for the kmap_lock, so it can
> never answer the IPI. Fix this by releasing the kmap_lock before
> calling flush_tlb_kernel_range, avoiding the kmap_lock deadlock.
>
> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")

Wow, that's 15 years old.  Has the deadlock been observed?

> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>  		set_page_address(page, NULL);
>  		need_flush = 1;
>  	}
> -	if (need_flush)
> +	if (need_flush) {
> +		unlock_kmap();
>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +		lock_kmap();
> +	}
>  }

Why is dropping the lock like this safe?  What data is it protecting,
and why is it OK to leave that data unprotected here?
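For reference, the failure mode described in the changelog can be reproduced outside the kernel. Below is a small userspace C analogue (an illustrative sketch, not kernel code): the atomic flag stands in for the IPI acknowledgement that flush_tlb_kernel_range() waits for, and the rule that "CPU 1" can only set the flag after taking the lock models a CPU that cannot service IPIs while spinning with interrupts disabled. Built with `cc -pthread`, it hangs by design.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_spinlock_t kmap_lock;	/* plays the kernel's kmap_lock */
static atomic_int ipi_acked;		/* plays the TLB-flush IPI ack */

/* "CPU 1": can only ack the IPI once it stops spinning on the lock,
 * just like a CPU that disabled interrupts before spin_lock(). */
static void *cpu1(void *arg)
{
	pthread_spin_lock(&kmap_lock);	/* spins forever: CPU 0 owns it */
	atomic_store(&ipi_acked, 1);	/* the ack that never happens */
	pthread_spin_unlock(&kmap_lock);
	return NULL;
}

int main(void)
{
	pthread_t b;

	pthread_spin_init(&kmap_lock, PTHREAD_PROCESS_PRIVATE);

	/* "CPU 0": take kmap_lock, then wait for the other CPU to ack,
	 * the same order kmap_high() -> flush_all_zero_pkmaps() uses. */
	pthread_spin_lock(&kmap_lock);
	pthread_create(&b, NULL, cpu1, NULL);
	puts("deadlocking by design: CPU 0 waits for an ack CPU 1 cannot send");
	while (!atomic_load(&ipi_acked))
		;			/* models smp_call_function_many() */
	pthread_spin_unlock(&kmap_lock);
	pthread_join(b, NULL);
	return 0;
}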
>> Using kmap_high and kmap_XXX or kunmap_xxx on different cores at the
>> same time may cause a deadlock. The issue is like this:

> What is kmap_XXX?

kmap/kunmap.

>> CPU 0:                                  CPU 1:
>> kmap_high(){                            kmap_xxx() {
>>  ...                                     irq_disable();
>>  spin_lock(&kmap_lock)
>>  ...
>>  map_new_virtual                         ...
>>   flush_all_zero_pkmaps
>>    flush_tlb_kernel_range              /* CPU0 holds the kmap_lock */
>>     smp_call_function_many              spin_lock(&kmap_lock)
>>  ...                                     ....
>>  spin_unlock(&kmap_lock)
>>  ...
>>
>> CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI.
>> But CPU 1 has disabled irqs while waiting for the kmap_lock, so it can
>> never answer the IPI. Fix this by releasing the kmap_lock before
>> calling flush_tlb_kernel_range, avoiding the kmap_lock deadlock.
>>
>> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")

> Wow, that's 15 years old.  Has the deadlock been observed?

Yes, a device crashed for exactly this reason.

>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>  		set_page_address(page, NULL);
>>  		need_flush = 1;
>>  	}
>> -	if (need_flush)
>> +	if (need_flush) {
>> +		unlock_kmap();
>>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +		lock_kmap();
>> +	}
>>  }

> Why is dropping the lock like this safe?  What data is it protecting,
> and why is it OK to leave that data unprotected here?

kmap_lock protects pkmap_count, pkmap_page_table and last_pkmap_nr (a
static variable). flush_tlb_kernel_range(PKMAP_ADDR(0),
PKMAP_ADDR(LAST_PKMAP)) neither reads nor modifies any of these
variables, so leaving them unprotected across that call is safe.
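For context, the data zhangchun lists is declared near the top of mm/highmem.c, roughly as follows (an abbreviated sketch; exact declarations and attributes vary by kernel version):

/* Abbreviated from mm/highmem.c (details vary by kernel version);
 * everything below is serialized by kmap_lock. */
static int pkmap_count[LAST_PKMAP];	/* per-entry mapping refcounts */
static unsigned int last_pkmap_nr;	/* rotating scan cursor */
pte_t *pkmap_page_table;		/* PTEs backing the pkmap area */
static DEFINE_SPINLOCK(kmap_lock);	/* protects all of the above */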
On Thu, 11 Jul 2024 15:07:56 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> >> --- a/mm/highmem.c
> >> +++ b/mm/highmem.c
> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
> >>  		set_page_address(page, NULL);
> >>  		need_flush = 1;
> >>  	}
> >> -	if (need_flush)
> >> +	if (need_flush) {
> >> +		unlock_kmap();
> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> +		lock_kmap();
> >> +	}
> >>  }
> >
> > Why is dropping the lock like this safe?  What data is it protecting,
> > and why is it OK to leave that data unprotected here?
>
> kmap_lock protects pkmap_count, pkmap_page_table and last_pkmap_nr (a
> static variable). flush_tlb_kernel_range(PKMAP_ADDR(0),
> PKMAP_ADDR(LAST_PKMAP)) neither reads nor modifies any of these
> variables, so leaving them unprotected across that call is safe.

No, the risk here is that when the lock is dropped, other threads will
modify the data.  And when this thread (the one running
flush_all_zero_pkmaps()) retakes the lock, that data may now be
unexpectedly altered.
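zhangchun's follow-up (next message) argues that the callers already do the revalidation this requires. In general, an unlock/relock window is only safe if every decision made before the unlock is rechecked after the lock is retaken. A generic sketch of that pattern in C (illustrative only, not the kernel's code; slot_count[] plays the role of pkmap_count[], blocking_flush() the role of flush_tlb_kernel_range()):

#include <pthread.h>

#define NSLOTS 64

static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;
static int slot_count[NSLOTS];		/* 0 = free, like pkmap_count[] */

static void blocking_flush(void)
{
	/* Stands in for flush_tlb_kernel_range(): may wait on other
	 * threads/CPUs, so it must not be called with 'l' held. */
}

static int claim_slot(void)
{
	int i = 0;

	pthread_mutex_lock(&l);
	for (;;) {
		i = (i + 1) % NSLOTS;	/* like get_next_pkmap_nr() */
		if (i == 0) {
			pthread_mutex_unlock(&l);
			blocking_flush();	/* others may take slots now */
			pthread_mutex_lock(&l);
			/* Fall through and REVALIDATE: any conclusion
			 * reached before the unlock may be stale. */
		}
		if (slot_count[i] == 0)
			break;		/* recheck under the lock: still free */
	}
	slot_count[i] = 1;		/* claim while the lock is held */
	pthread_mutex_unlock(&l);
	return i;
}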
>> >> --- a/mm/highmem.c
>> >> +++ b/mm/highmem.c
>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>> >>  		set_page_address(page, NULL);
>> >>  		need_flush = 1;
>> >>  	}
>> >> -	if (need_flush)
>> >> +	if (need_flush) {
>> >> +		unlock_kmap();
>> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> +		lock_kmap();
>> >> +	}
>> >>  }
>> >
>> > Why is dropping the lock like this safe?  What data is it protecting,
>> > and why is it OK to leave that data unprotected here?
>>
>> kmap_lock protects pkmap_count, pkmap_page_table and last_pkmap_nr (a
>> static variable). flush_tlb_kernel_range(PKMAP_ADDR(0),
>> PKMAP_ADDR(LAST_PKMAP)) neither reads nor modifies any of these
>> variables, so leaving them unprotected across that call is safe.

> No, the risk here is that when the lock is dropped, other threads will
> modify the data.  And when this thread (the one running
> flush_all_zero_pkmaps()) retakes the lock, that data may now be
> unexpectedly altered.

map_new_virtual aims to find a usable entry, pkmap_count[last_pkmap_nr],
and the kmap_lock is never dropped while pkmap_count[last_pkmap_nr] is
being read or modified. The test "if (!pkmap_count[last_pkmap_nr])"
decides whether the entry is usable; if it is not, the loop simply tries
again. Furthermore, the value of the static variable last_pkmap_nr is
copied into a local variable (also named last_pkmap_nr) while the
kmap_lock is held, which is thread-safe.

In the extreme case where thread A and thread B pick the same
last_pkmap_nr: thread A calls flush_tlb_kernel_range and releases the
kmap_lock, and thread B then acquires the kmap_lock and modifies
pkmap_count[last_pkmap_nr]. When thread A finishes
flush_tlb_kernel_range, it rechecks pkmap_count[last_pkmap_nr]; if it is
non-zero, thread A calls get_next_pkmap_nr and moves on to the next
last_pkmap_nr.

static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
	unsigned int last_pkmap_nr;	/* local copy of the static last_pkmap_nr */
	unsigned int color = get_pkmap_color(page);
start:
	...
		flush_all_zero_pkmaps();	/* drops kmap_lock, then re-acquires it */
		count = get_pkmap_entries_count(color);
	}
	...
	if (!pkmap_count[last_pkmap_nr])	/* is this entry still unused? */
		break;	/* Found a usable entry */
	if (--count)
		continue;
	...
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	...
	return vaddr;
}
Very sorry to disturb! Just a friendly ping to check on the status of the
patch "mm: Release kmap_lock before calling flush_tlb_kernel_range to
avoid kmap_high deadlock". Please let me know if any additional
information is needed from my side. I sincerely look forward to your
suggestions and guidance!

>>> >> --- a/mm/highmem.c
>>> >> +++ b/mm/highmem.c
>>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>> >>  		set_page_address(page, NULL);
>>> >>  		need_flush = 1;
>>> >>  	}
>>> >> -	if (need_flush)
>>> >> +	if (need_flush) {
>>> >> +		unlock_kmap();
>>> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>>> >> +		lock_kmap();
>>> >> +	}
>>> >>  }
>>> >
>>> > Why is dropping the lock like this safe?  What data is it protecting,
>>> > and why is it OK to leave that data unprotected here?
>>>
>>> kmap_lock protects pkmap_count, pkmap_page_table and last_pkmap_nr (a
>>> static variable). flush_tlb_kernel_range(PKMAP_ADDR(0),
>>> PKMAP_ADDR(LAST_PKMAP)) neither reads nor modifies any of these
>>> variables, so leaving them unprotected across that call is safe.

>> No, the risk here is that when the lock is dropped, other threads will
>> modify the data.  And when this thread (the one running
>> flush_all_zero_pkmaps()) retakes the lock, that data may now be
>> unexpectedly altered.

> map_new_virtual aims to find a usable entry, pkmap_count[last_pkmap_nr],
> and the kmap_lock is never dropped while pkmap_count[last_pkmap_nr] is
> being read or modified. The test "if (!pkmap_count[last_pkmap_nr])"
> decides whether the entry is usable; if it is not, the loop simply tries
> again. Furthermore, the value of the static variable last_pkmap_nr is
> copied into a local variable (also named last_pkmap_nr) while the
> kmap_lock is held, which is thread-safe.
>
> In the extreme case where thread A and thread B pick the same
> last_pkmap_nr: thread A calls flush_tlb_kernel_range and releases the
> kmap_lock, and thread B then acquires the kmap_lock and modifies
> pkmap_count[last_pkmap_nr]. When thread A finishes
> flush_tlb_kernel_range, it rechecks pkmap_count[last_pkmap_nr]; if it is
> non-zero, thread A calls get_next_pkmap_nr and moves on to the next
> last_pkmap_nr.
>
> static inline unsigned long map_new_virtual(struct page *page)
> {
> 	unsigned long vaddr;
> 	int count;
> 	unsigned int last_pkmap_nr;	/* local copy of the static last_pkmap_nr */
> 	unsigned int color = get_pkmap_color(page);
> start:
> 	...
> 		flush_all_zero_pkmaps();	/* drops kmap_lock, then re-acquires it */
> 		count = get_pkmap_entries_count(color);
> 	}
> 	...
> 	if (!pkmap_count[last_pkmap_nr])	/* is this entry still unused? */
> 		break;	/* Found a usable entry */
> 	if (--count)
> 		continue;
> 	...
> 	vaddr = PKMAP_ADDR(last_pkmap_nr);
> 	set_pte_at(&init_mm, vaddr,
> 		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
>
> 	pkmap_count[last_pkmap_nr] = 1;
> 	...
> 	return vaddr;
> }
On Fri, 19 Jul 2024 00:18:32 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> Very sorry to disturb! Just a friendly ping to check on the status of the
> patch "mm: Release kmap_lock before calling flush_tlb_kernel_range to
> avoid kmap_high deadlock". Please let me know if any additional
> information is needed from my side.

Please update the patch to include a code comment which explains why
we're dropping kmap_lock at this point.

And please update the changelog to explain why holding kmap_lock was not
necessary at this point, and why this is a safe change.

Then resend.
diff --git a/mm/highmem.c b/mm/highmem.c
index bd48ba4..841b370 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }
 
 void __kmap_flush_unused(void)
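Note that the resent diff above still lacks the code comment Andrew asked for. A revision incorporating one might look something like the following (a suggested sketch based on the points made in this thread, not the patch as applied):

	if (need_flush) {
		/*
		 * Drop kmap_lock across the flush: flush_tlb_kernel_range()
		 * may IPI all other CPUs and wait for them to respond, and
		 * a CPU spinning on kmap_lock with interrupts disabled can
		 * never respond.  The flush touches none of the data the
		 * lock protects (pkmap_count[], pkmap_page_table,
		 * last_pkmap_nr), and callers such as map_new_virtual()
		 * revalidate pkmap_count[] after flush_all_zero_pkmaps()
		 * returns.
		 */
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}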