Message ID | 20201009144225.12019-1-jgross@suse.com
---|---
State | Accepted
Commit | abee7c494d8c41bb388839bccc47e06247f0d7de
Series | x86/alternative: don't call text_poke() in lazy TLB mode
On Fri, Oct 09, 2020 at 04:42:25PM +0200, Juergen Gross wrote:
> When running in lazy TLB mode the currently active page tables might
> be the ones of a previous process, e.g. when running a kernel thread.
>
> This can be problematic in case kernel code is being modified via
> text_poke() in a kernel thread, and on another processor exit_mmap()
> is active for the process which was running on the first cpu before
> the kernel thread.
>
> As text_poke() is using a temporary address space and the former
> address space (obtained via cpu_tlbstate.loaded_mm) is restored
> afterwards, there is a race possible in case the cpu on which
> exit_mmap() is running wants to make sure there are no stale
> references to that address space on any cpu active (this e.g. is
> required when running as a Xen PV guest, where this problem has been
> observed and analyzed).
>
> In order to avoid that, drop off TLB lazy mode before switching to the
> temporary address space.

Oh man, that must've been 'fun' :/

> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/kernel/alternative.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index cdaab30880b9..cd6be6f143e8 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>  	temp_mm_state_t temp_state;
>
>  	lockdep_assert_irqs_disabled();
> +
> +	/*
> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> +	 * with a stale address space WITHOUT being in lazy mode after
> +	 * restoring the previous mm.
> +	 */
> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
> +		leave_mm(smp_processor_id());
> +
>  	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>  	switch_mm_irqs_off(NULL, mm, current);

Would it make sense to write it like:

	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);

Possibly with that wrapped in a conveniently named helper function.
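For illustration, the ternary suggested above can be modeled in plain user-space C. Everything in the sketch below is a toy stand-in invented for this example (`struct mm_struct` here is not the kernel's, `cpu_tlbstate` is a plain global rather than a per-CPU variable, and `restore_target()`/`check_restore_target()` are hypothetical names): it only demonstrates the invariant that a lazy CPU records `&init_mm` as the mm to restore instead of a possibly-dying `loaded_mm`.

```c
#include <stddef.h>

/* Toy stand-ins for kernel state; the real cpu_tlbstate is per-CPU
 * and mm_struct is far richer. */
struct mm_struct { const char *name; };

static struct mm_struct init_mm = { "init_mm" };

static struct {
	struct mm_struct *loaded_mm;
	int is_lazy;
} cpu_tlbstate;

/* The suggested helper: pick the mm that will later be restored.
 * A lazy CPU must not re-activate loaded_mm, since the owning process
 * may be exiting on another CPU, so init_mm is recorded instead. */
static struct mm_struct *restore_target(void)
{
	return cpu_tlbstate.is_lazy ? &init_mm : cpu_tlbstate.loaded_mm;
}

/* Exercise both cases; returns 1 when the lazy case picks init_mm and
 * the non-lazy case keeps the loaded mm. */
static int check_restore_target(void)
{
	struct mm_struct prev = { "previous process" };

	cpu_tlbstate.loaded_mm = &prev;

	cpu_tlbstate.is_lazy = 1;
	if (restore_target() != &init_mm)
		return 0;

	cpu_tlbstate.is_lazy = 0;
	if (restore_target() != &prev)
		return 0;

	return 1;
}
```

Driven from a trivial main() calling check_restore_target(), the sketch shows the ternary encodes the same invariant as the explicit leave_mm() call in the patch, just without performing an actual mm switch.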
On 12.10.20 12:13, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 04:42:25PM +0200, Juergen Gross wrote:
>> When running in lazy TLB mode the currently active page tables might
>> be the ones of a previous process, e.g. when running a kernel thread.
>>
>> This can be problematic in case kernel code is being modified via
>> text_poke() in a kernel thread, and on another processor exit_mmap()
>> is active for the process which was running on the first cpu before
>> the kernel thread.
>>
>> As text_poke() is using a temporary address space and the former
>> address space (obtained via cpu_tlbstate.loaded_mm) is restored
>> afterwards, there is a race possible in case the cpu on which
>> exit_mmap() is running wants to make sure there are no stale
>> references to that address space on any cpu active (this e.g. is
>> required when running as a Xen PV guest, where this problem has been
>> observed and analyzed).
>>
>> In order to avoid that, drop off TLB lazy mode before switching to the
>> temporary address space.
>
> Oh man, that must've been 'fun' :/

Yeah.

>
>> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  arch/x86/kernel/alternative.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index cdaab30880b9..cd6be6f143e8 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>  	temp_mm_state_t temp_state;
>>
>>  	lockdep_assert_irqs_disabled();
>> +
>> +	/*
>> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
>> +	 * with a stale address space WITHOUT being in lazy mode after
>> +	 * restoring the previous mm.
>> +	 */
>> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
>> +		leave_mm(smp_processor_id());
>> +
>>  	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>  	switch_mm_irqs_off(NULL, mm, current);
>
> Would it make sense to write it like:
>
> 	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
> 			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);
>
> Possibly with that wrapped in a conveniently named helper function.

Fine with me, but I don't think it matters that much.

For each batch of text_poke() it will be hit only once, and I'm not sure
it is really a good idea to use the knowledge that leave_mm() is just a
switch to init_mm here.

In case it is still the preferred way to do it I can send an update of
the patch.


Juergen
On Mon, Oct 12, 2020 at 12:26:06PM +0200, Jürgen Groß wrote:
> > > @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
> > >  	temp_mm_state_t temp_state;
> > >
> > >  	lockdep_assert_irqs_disabled();
> > > +
> > > +	/*
> > > +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> > > +	 * with a stale address space WITHOUT being in lazy mode after
> > > +	 * restoring the previous mm.
> > > +	 */
> > > +	if (this_cpu_read(cpu_tlbstate.is_lazy))
> > > +		leave_mm(smp_processor_id());
> > > +
> > >  	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> > >  	switch_mm_irqs_off(NULL, mm, current);
> >
> > Would it make sense to write it like:
> >
> > 	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
> > 			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);
> >
> > Possibly with that wrapped in a conveniently named helper function.
>
> Fine with me, but I don't think it matters that much.
>
> For each batch of text_poke() it will be hit only once, and I'm not sure
> it is really a good idea to use the knowledge that leave_mm() is just a
> switch to init_mm here.

Yeah, I'm not sure either. But it's something I came up with when
looking at all this.

Andy, what's your preference?
On 09.10.20 16:42, Juergen Gross wrote:
> When running in lazy TLB mode the currently active page tables might
> be the ones of a previous process, e.g. when running a kernel thread.
>
> This can be problematic in case kernel code is being modified via
> text_poke() in a kernel thread, and on another processor exit_mmap()
> is active for the process which was running on the first cpu before
> the kernel thread.
>
> As text_poke() is using a temporary address space and the former
> address space (obtained via cpu_tlbstate.loaded_mm) is restored
> afterwards, there is a race possible in case the cpu on which
> exit_mmap() is running wants to make sure there are no stale
> references to that address space on any cpu active (this e.g. is
> required when running as a Xen PV guest, where this problem has been
> observed and analyzed).
>
> In order to avoid that, drop off TLB lazy mode before switching to the
> temporary address space.
>
> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Can anyone look at this, please? It is fixing a real problem which has
been seen several times.


Juergen

> ---
>  arch/x86/kernel/alternative.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index cdaab30880b9..cd6be6f143e8 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>  	temp_mm_state_t temp_state;
>
>  	lockdep_assert_irqs_disabled();
> +
> +	/*
> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> +	 * with a stale address space WITHOUT being in lazy mode after
> +	 * restoring the previous mm.
> +	 */
> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
> +		leave_mm(smp_processor_id());
> +
>  	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>  	switch_mm_irqs_off(NULL, mm, current);
On Thu, Oct 22, 2020 at 11:24:39AM +0200, Jürgen Groß wrote:
> On 09.10.20 16:42, Juergen Gross wrote:
> > When running in lazy TLB mode the currently active page tables might
> > be the ones of a previous process, e.g. when running a kernel thread.
> >
> > This can be problematic in case kernel code is being modified via
> > text_poke() in a kernel thread, and on another processor exit_mmap()
> > is active for the process which was running on the first cpu before
> > the kernel thread.
> >
> > As text_poke() is using a temporary address space and the former
> > address space (obtained via cpu_tlbstate.loaded_mm) is restored
> > afterwards, there is a race possible in case the cpu on which
> > exit_mmap() is running wants to make sure there are no stale
> > references to that address space on any cpu active (this e.g. is
> > required when running as a Xen PV guest, where this problem has been
> > observed and analyzed).
> >
> > In order to avoid that, drop off TLB lazy mode before switching to the
> > temporary address space.
> >
> > Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
> > Signed-off-by: Juergen Gross <jgross@suse.com>
>
> Can anyone look at this, please? It is fixing a real problem which has
> been seen several times.

As it happens I picked it up yesterday, just pushed it out for you.

Thanks!
On 22.10.20 12:45, Peter Zijlstra wrote:
> On Thu, Oct 22, 2020 at 11:24:39AM +0200, Jürgen Groß wrote:
>> On 09.10.20 16:42, Juergen Gross wrote:
>>> When running in lazy TLB mode the currently active page tables might
>>> be the ones of a previous process, e.g. when running a kernel thread.
>>>
>>> This can be problematic in case kernel code is being modified via
>>> text_poke() in a kernel thread, and on another processor exit_mmap()
>>> is active for the process which was running on the first cpu before
>>> the kernel thread.
>>>
>>> As text_poke() is using a temporary address space and the former
>>> address space (obtained via cpu_tlbstate.loaded_mm) is restored
>>> afterwards, there is a race possible in case the cpu on which
>>> exit_mmap() is running wants to make sure there are no stale
>>> references to that address space on any cpu active (this e.g. is
>>> required when running as a Xen PV guest, where this problem has been
>>> observed and analyzed).
>>>
>>> In order to avoid that, drop off TLB lazy mode before switching to the
>>> temporary address space.
>>>
>>> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Can anyone look at this, please? It is fixing a real problem which has
>> been seen several times.
>
> As it happens I picked it up yesterday, just pushed it out for you.

Thank you very much!


Juergen
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cdaab30880b9..cd6be6f143e8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 	temp_mm_state_t temp_state;
 
 	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+	 * with a stale address space WITHOUT being in lazy mode after
+	 * restoring the previous mm.
+	 */
+	if (this_cpu_read(cpu_tlbstate.is_lazy))
+		leave_mm(smp_processor_id());
+
 	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
 	switch_mm_irqs_off(NULL, mm, current);
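The effect of the leave_mm() call in the diff above can be sketched as a user-space toy model. All names below are stand-ins invented for this illustration, not kernel APIs: the real leave_mm() and switch_mm_irqs_off() also rewrite CR3 and do TLB bookkeeping, which the sketch skips entirely.

```c
#include <stddef.h>

struct mm_struct { int live; };          /* toy mm: 1 while not torn down */

static struct mm_struct init_mm = { 1 }; /* the kernel mm never goes away */

static struct {
	struct mm_struct *loaded_mm;
	int is_lazy;
} cpu_tlbstate;

/* Toy leave_mm(): a lazy CPU switches over to init_mm for real. */
static void leave_mm_toy(void)
{
	cpu_tlbstate.loaded_mm = &init_mm;
	cpu_tlbstate.is_lazy = 0;
}

/* Patched use_temporary_mm(): drop lazy mode first, so the mm recorded
 * for the later restore can no longer be a dying process's mm. */
static struct mm_struct *use_temporary_mm_toy(struct mm_struct *poking_mm)
{
	struct mm_struct *prev;

	if (cpu_tlbstate.is_lazy)
		leave_mm_toy();

	prev = cpu_tlbstate.loaded_mm;
	cpu_tlbstate.loaded_mm = poking_mm;   /* switch_mm_irqs_off() */
	return prev;
}

/* Scenario from the commit message: a kernel thread runs lazily on a
 * dying process's mm, text_poke() switches to the poking mm, the old
 * mm exits on another CPU, then the previous mm is restored.
 * Returns 1 when the CPU ends up on a live mm afterwards. */
static int poke_scenario(void)
{
	struct mm_struct dying = { 1 };
	struct mm_struct poking = { 1 };
	struct mm_struct *prev;

	cpu_tlbstate.loaded_mm = &dying;
	cpu_tlbstate.is_lazy = 1;

	prev = use_temporary_mm_toy(&poking);
	dying.live = 0;                       /* exit_mmap() elsewhere */
	cpu_tlbstate.loaded_mm = prev;        /* unuse_temporary_mm() */

	return cpu_tlbstate.loaded_mm->live;
}
```

Without the leave_mm_toy() call, prev would be &dying and the restore would leave the CPU on a torn-down address space while no longer marked lazy, which is exactly the race the patch closes.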
When running in lazy TLB mode the currently active page tables might
be the ones of a previous process, e.g. when running a kernel thread.

This can be problematic in case kernel code is being modified via
text_poke() in a kernel thread, and on another processor exit_mmap()
is active for the process which was running on the first cpu before
the kernel thread.

As text_poke() is using a temporary address space and the former
address space (obtained via cpu_tlbstate.loaded_mm) is restored
afterwards, there is a race possible in case the cpu on which
exit_mmap() is running wants to make sure there are no stale
references to that address space on any cpu active (this e.g. is
required when running as a Xen PV guest, where this problem has been
observed and analyzed).

In order to avoid that, drop off TLB lazy mode before switching to the
temporary address space.

Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kernel/alternative.c | 9 +++++++++
 1 file changed, 9 insertions(+)