
[v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock

Message ID 1720614028-2260-1-git-send-email-zhang.chuna@h3c.com (mailing list archive)
State New
Series [v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock

Commit Message

zhangchun July 10, 2024, 12:20 p.m. UTC
Using kmap_high together with kmap_XXX or kunmap_xxx on different cores at the
same time may cause a deadlock. The issue looks like this:

 CPU 0:                                                 CPU 1:
 kmap_high(){                                           kmap_xxx() {
               ...                                        irq_disable();
        spin_lock(&kmap_lock)
               ...
        map_new_virtual                                     ...
           flush_all_zero_pkmaps
              flush_tlb_kernel_range         /* CPU0 holds the kmap_lock */
                      smp_call_function_many         spin_lock(&kmap_lock)
                      ...                                   ....
        spin_unlock(&kmap_lock)
               ...

CPU 0 holds kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1
has disabled irqs and is spinning on kmap_lock, so it can never answer the
IPI. Fix this by releasing kmap_lock before calling flush_tlb_kernel_range,
which avoids the kmap_lock deadlock.

Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
 mm/highmem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Andrew Morton July 10, 2024, 5:36 p.m. UTC | #1
On Wed, 10 Jul 2024 20:20:28 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> Using kmap_high together with kmap_XXX or kunmap_xxx on different cores at the
> same time may cause a deadlock. The issue looks like this:


What is kmap_XXX?

>  CPU 0:                                                 CPU 1:
>  kmap_high(){                                           kmap_xxx() {
>                ...                                        irq_disable();
>         spin_lock(&kmap_lock)
>                ...
>         map_new_virtual                                     ...
>            flush_all_zero_pkmaps
>               flush_tlb_kernel_range         /* CPU0 holds the kmap_lock */
>                       smp_call_function_many         spin_lock(&kmap_lock)
>                       ...                                   ....
>         spin_unlock(&kmap_lock)
>                ...
> 
> CPU 0 holds kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1
> has disabled irqs and is spinning on kmap_lock, so it can never answer the
> IPI. Fix this by releasing kmap_lock before calling flush_tlb_kernel_range,
> which avoids the kmap_lock deadlock.
> 
> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")

Wow, that's 15 years old.  Has the deadlock been observed?

> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>  		set_page_address(page, NULL);
>  		need_flush = 1;
>  	}
> -	if (need_flush)
> +	if (need_flush) {
> +		unlock_kmap();
>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +		lock_kmap();
> +	}
>  }

Why is dropping the lock like this safe?  What data is it protecting
and why is it OK to leave that data unprotected here?
zhangchun July 11, 2024, 7:07 a.m. UTC | #2
>> Using kmap_high together with kmap_XXX or kunmap_xxx on different cores at the
>> same time may cause a deadlock. The issue looks like this:


>What is kmap_XXX?

kmap/kunmap.

>>  CPU 0:                                                 CPU 1:
>>  kmap_high(){                                           kmap_xxx() {
>>                ...                                        irq_disable();
>>         spin_lock(&kmap_lock)
>>                ...
>>         map_new_virtual                                     ...
>>            flush_all_zero_pkmaps
>>             flush_tlb_kernel_range        /* CPU0 holds the kmap_lock */
>>                 smp_call_function_many         spin_lock(&kmap_lock)
>>                       ...                                   ....
>>         spin_unlock(&kmap_lock)
>>                ...
>> 
>> CPU 0 holds kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1
>> has disabled irqs and is spinning on kmap_lock, so it can never answer the
>> IPI. Fix this by releasing kmap_lock before calling flush_tlb_kernel_range,
>> which avoids the kmap_lock deadlock.
>> 
>> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")

>Wow, that's 15 years old.  Has the deadlock been observed?

Yeah, the device crashed because of this deadlock.
 
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>  		set_page_address(page, NULL);
>>  		need_flush = 1;
>>  	}
>> -	if (need_flush)
>> +	if (need_flush) {
>> +		unlock_kmap();
>>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +		lock_kmap();
>> +	}
>>  }

>Why is dropping the lock like this safe?  What data is it protecting and why is it OK to 
>leave that data unprotected here?

kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static
variable). flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither modifies
nor reads any of these variables, so leaving them unprotected across that call is safe.
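
For reference, the data that kmap_lock guards looks roughly like this in mm/highmem.c
(a sketch, not verbatim; the exact declarations and the ARCH_NEEDS_KMAP_HIGH_GET variants
differ between kernel versions):

/* mm/highmem.c, approximate */
static int pkmap_count[LAST_PKMAP];            /* mapping count per pkmap slot */
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(kmap_lock);

pte_t *pkmap_page_table;                       /* PTEs backing the PKMAP area  */

#ifdef ARCH_NEEDS_KMAP_HIGH_GET                /* irq-disabling lock helpers   */
#define lock_kmap()             spin_lock_irq(&kmap_lock)
#define unlock_kmap()           spin_unlock_irq(&kmap_lock)
#define lock_kmap_any(flags)    spin_lock_irqsave(&kmap_lock, flags)
#define unlock_kmap_any(flags)  spin_unlock_irqrestore(&kmap_lock, flags)
#else                                          /* plain spinlock otherwise     */
#define lock_kmap()             spin_lock(&kmap_lock)
#define unlock_kmap()           spin_unlock(&kmap_lock)
#define lock_kmap_any(flags)    spin_lock(&kmap_lock)
#define unlock_kmap_any(flags)  spin_unlock(&kmap_lock)
#endif

flush_tlb_kernel_range() only performs the architecture's TLB flush for the given virtual
range; it touches none of the structures above.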
Andrew Morton July 11, 2024, 9:13 p.m. UTC | #3
On Thu, 11 Jul 2024 15:07:56 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> >> --- a/mm/highmem.c
> >> +++ b/mm/highmem.c
> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
> >>  		set_page_address(page, NULL);
> >>  		need_flush = 1;
> >>  	}
> >> -	if (need_flush)
> >> +	if (need_flush) {
> >> +		unlock_kmap();
> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> +		lock_kmap();
> >> +	}
> >>  }
> 
> >Why is dropping the lock like this safe?  What data is it protecting and why is it OK to 
> >leave that data unprotected here?
> 
> kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static
> variable). flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither modifies
> nor reads any of these variables, so leaving them unprotected across that call is safe.

No, the risk here is that when the lock is dropped, other threads will
modify the data.  And when this thread (the one running
flush_all_zero_pkmaps()) retakes the lock, that data may now be
unexpectedly altered.
zhangchun July 12, 2024, 7:54 a.m. UTC | #4
>> >> --- a/mm/highmem.c
>> >> +++ b/mm/highmem.c
>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>> >>       set_page_address(page, NULL);
>> >>       need_flush = 1;
>> >>   }
>> >> - if (need_flush)
>> >> + if (need_flush) {
>> >> +     unlock_kmap();
>> >>       flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> +     lock_kmap();
>> >> + }
>> >>  }
>> 
>> >Why is dropping the lock like this safe?  What data is it protecting 
>> >and why is it OK to leave that data unprotected here?
>> 
>> kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static
>> variable). flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither modifies
>> nor reads any of these variables, so leaving them unprotected across that call is safe.

>No, the risk here is that when the lock is dropped, other threads will modify the data.  And when this thread (the one running
>flush_all_zero_pkmaps()) retakes the lock, that data may now be unexpectedly altered.

map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr]. While
pkmap_count[last_pkmap_nr] is read and modified, the kmap_lock is not dropped.
The check "if (!pkmap_count[last_pkmap_nr])" determines whether the entry is usable;
if it is not, the loop tries again.

Furthermore, the value of the static variable last_pkmap_nr is copied into a local
variable last_pkmap_nr while kmap_lock is held, so this is thread-safe.

In an extreme case, if Thread A and Thread B pick the same last_pkmap_nr, Thread A calls
flush_tlb_kernel_range and releases kmap_lock, and Thread B then acquires kmap_lock and
modifies pkmap_count[last_pkmap_nr]. After Thread A returns from flush_tlb_kernel_range,
it re-checks pkmap_count[last_pkmap_nr]. If pkmap_count[last_pkmap_nr] != 0, Thread A
continues with get_next_pkmap_nr and gets the next last_pkmap_nr.

static inline unsigned long map_new_virtual(struct page *page)
{
        unsigned long vaddr;
        int count;
        unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
        unsigned int color = get_pkmap_color(page);

start:
        ...
                        flush_all_zero_pkmaps();// release kmap_lock, then acquire it
                        count = get_pkmap_entries_count(color);
                } 
                ...
                if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
                        break;  /* Found a usable entry */
                if (--count) 
                        continue;

               ...
        vaddr = PKMAP_ADDR(last_pkmap_nr);
        set_pte_at(&init_mm, vaddr,
                   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

        pkmap_count[last_pkmap_nr] = 1;
        ...
        return vaddr;
}
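
For completeness, the helper that advances the shared index and returns it into the local
last_pkmap_nr above looks roughly like this (again only a sketch; since pkmap coloring was
added, the static counter lives inside this helper):

/* mm/highmem.c, approximate */
static inline unsigned int get_next_pkmap_nr(unsigned int color)
{
        static unsigned int last_pkmap_nr;     /* shared cursor over pkmap slots */

        last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
        return last_pkmap_nr;                  /* copied into the caller's local */
}

So even if another thread advances the cursor or reuses a slot while kmap_lock is dropped,
this thread only trusts a slot after re-checking pkmap_count[] with the lock held again.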
zhangchun July 18, 2024, 4:18 p.m. UTC | #5
Very sorry to disturb! Just a friendly ping to check on the status of the patch
"mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock".
Please let me know if you need any additional information from my side.

I sincerely look forward to your suggestions and guidance!

>>> >> --- a/mm/highmem.c
>>> >> +++ b/mm/highmem.c
>>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>> >>       set_page_address(page, NULL);
>>> >>       need_flush = 1;
>>> >>   }
>>> >> - if (need_flush)
>>> >> + if (need_flush) {
>>> >> +     unlock_kmap();
>>> >>       flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>>> >> +     lock_kmap();
>>> >> + }
>>> >>  }
>>> 
>>> >Why is dropping the lock like this safe?  What data is it protecting 
>>> >and why is it OK to leave that data unprotected here?
>>> 
>>> kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static
>>> variable). flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither modifies
>>> nor reads any of these variables, so leaving them unprotected across that call is safe.

>>No, the risk here is that when the lock is dropped, other threads will modify the data.  And when this thread (the one running
>>flush_all_zero_pkmaps()) retakes the lock, that data may now be unexpectedly altered.

>map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr]. While
>pkmap_count[last_pkmap_nr] is read and modified, the kmap_lock is not dropped.
>The check "if (!pkmap_count[last_pkmap_nr])" determines whether the entry is usable;
>if it is not, the loop tries again.

>Furthermore, the value of the static variable last_pkmap_nr is copied into a local
>variable last_pkmap_nr while kmap_lock is held, so this is thread-safe.

>In an extreme case, if Thread A and Thread B pick the same last_pkmap_nr, Thread A calls
>flush_tlb_kernel_range and releases kmap_lock, and Thread B then acquires kmap_lock and
>modifies pkmap_count[last_pkmap_nr]. After Thread A returns from flush_tlb_kernel_range,
>it re-checks pkmap_count[last_pkmap_nr]. If pkmap_count[last_pkmap_nr] != 0, Thread A
>continues with get_next_pkmap_nr and gets the next last_pkmap_nr.

>static inline unsigned long map_new_virtual(struct page *page)
>{
>        unsigned long vaddr;
>        int count;
>        unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
>        unsigned int color = get_pkmap_color(page);

>start:
>        ...
>                        flush_all_zero_pkmaps();// release kmap_lock, then acquire it
>                        count = get_pkmap_entries_count(color);
>                } 
>                ...
>                if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
>                        break;  /* Found a usable entry */
>                if (--count) 
>                        continue;
>
>               ...
>        vaddr = PKMAP_ADDR(last_pkmap_nr);
>        set_pte_at(&init_mm, vaddr,
>                   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
>
>        pkmap_count[last_pkmap_nr] = 1;
>        ...
>        return vaddr;
>}
Andrew Morton July 24, 2024, 12:26 a.m. UTC | #6
On Fri, 19 Jul 2024 00:18:32 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> Very sorry to disturb! Just a friendly ping to check on the status of the patch
> "mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock".
> Please let me know if you need any additional information from my side.
> 

Please update the patch to include a code comment which explains why
we're dropping kmap_lock at this point.  And please update the
changelog to explain why holding kmap_lock was not necessary at this
point and why this is a safe change.  Then resend.
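
One possible shape for that comment, as a rough sketch for a future revision rather than
the hunk posted below:

	if (need_flush) {
		/*
		 * Drop kmap_lock around the flush: flush_tlb_kernel_range()
		 * may send IPIs via smp_call_function_many(), and a CPU that
		 * is spinning on kmap_lock with interrupts disabled would
		 * never acknowledge them, deadlocking both CPUs. The flush
		 * itself touches neither pkmap_count[], pkmap_page_table nor
		 * last_pkmap_nr, and map_new_virtual() re-checks
		 * pkmap_count[last_pkmap_nr] after the lock is retaken.
		 */
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}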

Patch

diff --git a/mm/highmem.c b/mm/highmem.c
index bd48ba4..841b370 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -220,8 +220,11 @@  static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }
 
 void __kmap_flush_unused(void)