x86/alternative: don't call text_poke() in lazy TLB mode

Message ID 20201009144225.12019-1-jgross@suse.com (mailing list archive)
State Accepted
Commit abee7c494d8c41bb388839bccc47e06247f0d7de
Series x86/alternative: don't call text_poke() in lazy TLB mode

Commit Message

Jürgen Groß Oct. 9, 2020, 2:42 p.m. UTC
When running in lazy TLB mode the currently active page tables might
be the ones of a previous process, e.g. when running a kernel thread.

This can be problematic in case kernel code is being modified via
text_poke() in a kernel thread, and on another processor exit_mmap()
is active for the process which was running on the first cpu before
the kernel thread.

As text_poke() is using a temporary address space and the former
address space (obtained via cpu_tlbstate.loaded_mm) is restored
afterwards, there is a race possible in case the cpu on which
exit_mmap() is running wants to make sure there are no stale
references to that address space on any cpu active (this e.g. is
required when running as a Xen PV guest, where this problem has been
observed and analyzed).

In order to avoid that, drop off TLB lazy mode before switching to the
temporary address space.

Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kernel/alternative.c | 9 +++++++++
 1 file changed, 9 insertions(+)
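
The sequence described in the commit message can be sketched as a small userspace model. This is not kernel code: the `model_*` functions, the `with_fix` flag and the `poking_mm` stand-in are illustrative names, and the model assumes only what the thread states, namely that switching to another mm ends lazy mode and that leave_mm() amounts to a switch to init_mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel objects; none of this is kernel code. */
struct mm_struct { const char *name; };

struct mm_struct init_mm = { "init_mm" };

/* Models the two per-CPU cpu_tlbstate fields the patch touches. */
struct tlb_state {
	struct mm_struct *loaded_mm;
	bool is_lazy;
};

typedef struct { struct mm_struct *mm; } temp_mm_state_t;

/* Model assumption: a switch loads the new mm and ends lazy mode. */
static void model_switch_mm(struct tlb_state *cpu, struct mm_struct *next)
{
	cpu->loaded_mm = next;
	cpu->is_lazy = false;
}

/* Model assumption: leave_mm() is just a switch to init_mm. */
static void model_leave_mm(struct tlb_state *cpu)
{
	if (cpu->loaded_mm != &init_mm)
		model_switch_mm(cpu, &init_mm);
}

/* with_fix selects between the code before and after the patch. */
static temp_mm_state_t model_use_temporary_mm(struct tlb_state *cpu,
					      struct mm_struct *temp,
					      bool with_fix)
{
	temp_mm_state_t temp_state;

	if (with_fix && cpu->is_lazy)
		model_leave_mm(cpu);

	temp_state.mm = cpu->loaded_mm;
	model_switch_mm(cpu, temp);
	return temp_state;
}

static void model_unuse_temporary_mm(struct tlb_state *cpu,
				     temp_mm_state_t prev)
{
	assert(prev.mm != NULL);
	model_switch_mm(cpu, prev.mm);
}

/*
 * One poke issued by a CPU that lazily holds user_mm. Returns the mm
 * that is loaded, no longer in lazy mode, once the poke has finished.
 */
struct mm_struct *model_poke_from_lazy(struct mm_struct *user_mm,
				       bool with_fix)
{
	struct mm_struct poking_mm = { "poking_mm" };
	struct tlb_state cpu = { .loaded_mm = user_mm, .is_lazy = true };
	temp_mm_state_t prev;

	prev = model_use_temporary_mm(&cpu, &poking_mm, with_fix);
	model_unuse_temporary_mm(&cpu, prev);
	assert(!cpu.is_lazy);
	return cpu.loaded_mm;
}
```

In this model, calling `model_poke_from_lazy()` with `with_fix` set to false leaves the CPU on the previous process's mm outside lazy mode, which is the stale reference a concurrent exit_mmap() cannot tolerate. With `with_fix` set to true the CPU ends up on init_mm instead.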

Comments

Peter Zijlstra Oct. 12, 2020, 10:13 a.m. UTC | #1
On Fri, Oct 09, 2020 at 04:42:25PM +0200, Juergen Gross wrote:
> When running in lazy TLB mode the currently active page tables might
> be the ones of a previous process, e.g. when running a kernel thread.
> 
> This can be problematic in case kernel code is being modified via
> text_poke() in a kernel thread, and on another processor exit_mmap()
> is active for the process which was running on the first cpu before
> the kernel thread.
> 
> As text_poke() is using a temporary address space and the former
> address space (obtained via cpu_tlbstate.loaded_mm) is restored
> afterwards, there is a race possible in case the cpu on which
> exit_mmap() is running wants to make sure there are no stale
> references to that address space on any cpu active (this e.g. is
> required when running as a Xen PV guest, where this problem has been
> observed and analyzed).
> 
> In order to avoid that, drop off TLB lazy mode before switching to the
> temporary address space.

Oh man, that must've been 'fun' :/

> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/kernel/alternative.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index cdaab30880b9..cd6be6f143e8 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>  	temp_mm_state_t temp_state;
>  
>  	lockdep_assert_irqs_disabled();
> +
> +	/*
> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> +	 * with a stale address space WITHOUT being in lazy mode after
> +	 * restoring the previous mm.
> +	 */
> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
> +		leave_mm(smp_processor_id());
> +
>  	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>  	switch_mm_irqs_off(NULL, mm, current);

Would it make sense to write it like:

	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);

Possibly with that wrapped in a conveniently named helper function.
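
One possible shape for such a helper, sketched as a userspace model rather than kernel code. The name `model_mm_to_restore()` and the other `model_*` names are hypothetical, and the sketch assumes that `this_state.mm` in the snippet above refers to the patch's `temp_state.mm`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel objects; none of this is kernel code. */
struct mm_struct { const char *name; };

struct mm_struct init_mm = { "init_mm" };

/* Models the two per-CPU cpu_tlbstate fields involved. */
struct tlb_state {
	struct mm_struct *loaded_mm;
	bool is_lazy;
};

typedef struct { struct mm_struct *mm; } temp_mm_state_t;

/*
 * Hypothetical helper: in lazy mode, report init_mm as the mm to restore
 * later, instead of the mm borrowed from the previous process.
 */
static struct mm_struct *model_mm_to_restore(const struct tlb_state *cpu)
{
	return cpu->is_lazy ? &init_mm : cpu->loaded_mm;
}

/* Model assumption: a switch loads the new mm and ends lazy mode. */
static void model_switch_mm(struct tlb_state *cpu, struct mm_struct *next)
{
	cpu->loaded_mm = next;
	cpu->is_lazy = false;
}

/* Variant of use_temporary_mm() that uses the helper, not leave_mm(). */
temp_mm_state_t model_use_temporary_mm(struct tlb_state *cpu,
				       struct mm_struct *temp)
{
	temp_mm_state_t temp_state;

	assert(temp != NULL);
	temp_state.mm = model_mm_to_restore(cpu);
	model_switch_mm(cpu, temp);
	return temp_state;
}
```

In this variant the CPU never calls leave_mm(). It records init_mm as the mm to return to, so the later restore lands on init_mm rather than on the previous process's mm.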

Jürgen Groß Oct. 12, 2020, 10:26 a.m. UTC | #2
On 12.10.20 12:13, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 04:42:25PM +0200, Juergen Gross wrote:
>> When running in lazy TLB mode the currently active page tables might
>> be the ones of a previous process, e.g. when running a kernel thread.
>>
>> This can be problematic in case kernel code is being modified via
>> text_poke() in a kernel thread, and on another processor exit_mmap()
>> is active for the process which was running on the first cpu before
>> the kernel thread.
>>
>> As text_poke() is using a temporary address space and the former
>> address space (obtained via cpu_tlbstate.loaded_mm) is restored
>> afterwards, there is a race possible in case the cpu on which
>> exit_mmap() is running wants to make sure there are no stale
>> references to that address space on any cpu active (this e.g. is
>> required when running as a Xen PV guest, where this problem has been
>> observed and analyzed).
>>
>> In order to avoid that, drop off TLB lazy mode before switching to the
>> temporary address space.
> 
> Oh man, that must've been 'fun' :/

Yeah.

> 
>> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   arch/x86/kernel/alternative.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index cdaab30880b9..cd6be6f143e8 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>   	temp_mm_state_t temp_state;
>>   
>>   	lockdep_assert_irqs_disabled();
>> +
>> +	/*
>> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
>> +	 * with a stale address space WITHOUT being in lazy mode after
>> +	 * restoring the previous mm.
>> +	 */
>> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
>> +		leave_mm(smp_processor_id());
>> +
>>   	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>   	switch_mm_irqs_off(NULL, mm, current);
> 
> Would it make sense to write it like:
> 
> 	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
> 			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);
> 
> Possibly with that wrapped in a conveniently named helper function.

Fine with me, but I don't think it matters that much.

For each batch of text_poke() it will be hit only once, and I'm not sure
it is really a good idea to use the knowledge that leave_mm() is just a
switch to init_mm here.

In case it is still the preferred way to do it I can send an update of
the patch.


Juergen

Peter Zijlstra Oct. 12, 2020, 10:45 a.m. UTC | #3
On Mon, Oct 12, 2020 at 12:26:06PM +0200, Jürgen Groß wrote:

> > > @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
> > >   	temp_mm_state_t temp_state;
> > >   	lockdep_assert_irqs_disabled();
> > > +
> > > +	/*
> > > +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> > > +	 * with a stale address space WITHOUT being in lazy mode after
> > > +	 * restoring the previous mm.
> > > +	 */
> > > +	if (this_cpu_read(cpu_tlbstate.is_lazy))
> > > +		leave_mm(smp_processor_id());
> > > +
> > >   	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> > >   	switch_mm_irqs_off(NULL, mm, current);
> > 
> > Would it make sense to write it like:
> > 
> > 	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
> > 			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);
> > 
> > Possibly with that wrapped in a conveniently named helper function.
> 
> Fine with me, but I don't think it matters that much.
> 
> For each batch of text_poke() it will be hit only once, and I'm not sure
> it is really a good idea to use the knowledge that leave_mm() is just a
> switch to init_mm here.

Yeah, I'm not sure either. But it's something I came up with when
looking at all this.

Andy, what's your preference?

Jürgen Groß Oct. 22, 2020, 9:24 a.m. UTC | #4
On 09.10.20 16:42, Juergen Gross wrote:
> When running in lazy TLB mode the currently active page tables might
> be the ones of a previous process, e.g. when running a kernel thread.
> 
> This can be problematic in case kernel code is being modified via
> text_poke() in a kernel thread, and on another processor exit_mmap()
> is active for the process which was running on the first cpu before
> the kernel thread.
> 
> As text_poke() is using a temporary address space and the former
> address space (obtained via cpu_tlbstate.loaded_mm) is restored
> afterwards, there is a race possible in case the cpu on which
> exit_mmap() is running wants to make sure there are no stale
> references to that address space on any cpu active (this e.g. is
> required when running as a Xen PV guest, where this problem has been
> observed and analyzed).
> 
> In order to avoid that, drop off TLB lazy mode before switching to the
> temporary address space.
> 
> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Can anyone look at this, please? It is fixing a real problem which has
been seen several times.


Juergen

> ---
>   arch/x86/kernel/alternative.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index cdaab30880b9..cd6be6f143e8 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>   	temp_mm_state_t temp_state;
>   
>   	lockdep_assert_irqs_disabled();
> +
> +	/*
> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> +	 * with a stale address space WITHOUT being in lazy mode after
> +	 * restoring the previous mm.
> +	 */
> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
> +		leave_mm(smp_processor_id());
> +
>   	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>   	switch_mm_irqs_off(NULL, mm, current);
>   
>

Peter Zijlstra Oct. 22, 2020, 10:45 a.m. UTC | #5
On Thu, Oct 22, 2020 at 11:24:39AM +0200, Jürgen Groß wrote:
> On 09.10.20 16:42, Juergen Gross wrote:
> > When running in lazy TLB mode the currently active page tables might
> > be the ones of a previous process, e.g. when running a kernel thread.
> > 
> > This can be problematic in case kernel code is being modified via
> > text_poke() in a kernel thread, and on another processor exit_mmap()
> > is active for the process which was running on the first cpu before
> > the kernel thread.
> > 
> > As text_poke() is using a temporary address space and the former
> > address space (obtained via cpu_tlbstate.loaded_mm) is restored
> > afterwards, there is a race possible in case the cpu on which
> > exit_mmap() is running wants to make sure there are no stale
> > references to that address space on any cpu active (this e.g. is
> > required when running as a Xen PV guest, where this problem has been
> > observed and analyzed).
> > 
> > In order to avoid that, drop off TLB lazy mode before switching to the
> > temporary address space.
> > 
> > Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Can anyone look at this, please? It is fixing a real problem which has
> been seen several times.

As it happens I picked it up yesterday, just pushed it out for you.

Thanks!

Jürgen Groß Oct. 22, 2020, 10:48 a.m. UTC | #6
On 22.10.20 12:45, Peter Zijlstra wrote:
> On Thu, Oct 22, 2020 at 11:24:39AM +0200, Jürgen Groß wrote:
>> On 09.10.20 16:42, Juergen Gross wrote:
>>> When running in lazy TLB mode the currently active page tables might
>>> be the ones of a previous process, e.g. when running a kernel thread.
>>>
>>> This can be problematic in case kernel code is being modified via
>>> text_poke() in a kernel thread, and on another processor exit_mmap()
>>> is active for the process which was running on the first cpu before
>>> the kernel thread.
>>>
>>> As text_poke() is using a temporary address space and the former
>>> address space (obtained via cpu_tlbstate.loaded_mm) is restored
>>> afterwards, there is a race possible in case the cpu on which
>>> exit_mmap() is running wants to make sure there are no stale
>>> references to that address space on any cpu active (this e.g. is
>>> required when running as a Xen PV guest, where this problem has been
>>> observed and analyzed).
>>>
>>> In order to avoid that, drop off TLB lazy mode before switching to the
>>> temporary address space.
>>>
>>> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Can anyone look at this, please? It is fixing a real problem which has
>> been seen several times.
> 
> As it happens I picked it up yesterday, just pushed it out for you.

Thank you very much!


Juergen
Patch

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cdaab30880b9..cd6be6f143e8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -807,6 +807,15 @@  static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 	temp_mm_state_t temp_state;
 
 	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+	 * with a stale address space WITHOUT being in lazy mode after
+	 * restoring the previous mm.
+	 */
+	if (this_cpu_read(cpu_tlbstate.is_lazy))
+		leave_mm(smp_processor_id());
+
 	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
 	switch_mm_irqs_off(NULL, mm, current);