Message ID | 20191119221006.1021520-1-pasha.tatashin@soleen.com (mailing list archive) |
---|---|
State | Mainlined |
Commit | 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e |
Headers | show |
Series | arm64: kernel: memory corruptions due non-disabled PAN | expand |
Hi Pavel, On Tue, Nov 19, 2019 at 05:10:06PM -0500, Pavel Tatashin wrote: > Userland access functions (__arch_clear_user, __arch_copy_from_user, > __arch_copy_in_user, __arch_copy_to_user), enable and disable PAN > for the duration of copy. However, when copy fails for some reason, > i.e. access violation the code is transferred to fixedup section, > where we do not disable PAN. Thanks for reporting this. This is a very nasty bug. > The bug is a security violation as the access to userland is still > open when it should be disabled, but it also causes memory corruptions > when software emulated PAN is used: CONFIG_ARM64_SW_TTBR0_PAN=y. I see that with CONFIG_ARM64_SW_TTBR0_PAN=y, this means that we may leave the stale TTBR0 value installed across a context-switch (and have reproduced that locally), but I'm having some difficulty reproducing the corruption that you see. > I was able to reproduce memory corruption problem on Broadcom's SoC > ARMv8-A like this: > > Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's > stack is accessed and copied. IIUC this tickles the issue by performing a faulting uaccess in IRQ context. On the path to returnign from the exception, it directly calls into the scheduler as part of el1_preempt, erroneously passing the TTBR0 value to the next task. Note that a preemption would remove the stale TTBR0 value as part of kernel entry. It looks like if we're in this state, and return from an exception taken from EL1 with SW PAN enabled, we'll also leave the stale TTBR0 value installed. If PAN was disabled (e.g. in the middle of a uaccess region), then we'll restore the correct TTBR0. > The test program performed the following on every CPU and forking many > processes: > > unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, > MAP_SHARED | MAP_ANONYMOUS, -1, 0); > map[0] = getpid(); > sched_yield(); > if (map[0] != getpid()) { > fprintf(stderr, "Corruption detected!"); > } > munmap(map, PAGE_SIZE); Can you provide the whole test, please? And precisely how you're launching it? I've tried turning the above into a main() function, and spawning a number of instances in parallel while perf is running, but I haven't been able to reproduce the issue locally, and I'm concerned that I'm missing something. > From time to time I was getting map[0] to contain pid for a different > process. How often is "from time to time"? How many processes are you running, across how many CPUs? > > Fixes: 338d4f49d6f7114 ("arm64: kernel: Add support for Privileged...") > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > --- > arch/arm64/lib/clear_user.S | 1 + > arch/arm64/lib/copy_from_user.S | 1 + > arch/arm64/lib/copy_in_user.S | 1 + > arch/arm64/lib/copy_to_user.S | 1 + > 4 files changed, 4 insertions(+) FWIW, the diff below looks correct to me, but we might want to fold this into the C wrappers, so that this is consistent with the other uaccess cases (and done in one place in the code). Thanks, Mark. > > diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S > index 10415572e82f..322b55664cca 100644 > --- a/arch/arm64/lib/clear_user.S > +++ b/arch/arm64/lib/clear_user.S > @@ -48,5 +48,6 @@ EXPORT_SYMBOL(__arch_clear_user) > .section .fixup,"ax" > .align 2 > 9: mov x0, x2 // return the original size > + uaccess_disable_not_uao x2, x3 > ret > .previous > diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S > index 680e74409ff9..8472dc7798b3 100644 > --- a/arch/arm64/lib/copy_from_user.S > +++ b/arch/arm64/lib/copy_from_user.S > @@ -66,5 +66,6 @@ EXPORT_SYMBOL(__arch_copy_from_user) > .section .fixup,"ax" > .align 2 > 9998: sub x0, end, dst // bytes not copied > + uaccess_disable_not_uao x3, x4 > ret > .previous > diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S > index 0bedae3f3792..8e0355c1e318 100644 > --- a/arch/arm64/lib/copy_in_user.S > +++ b/arch/arm64/lib/copy_in_user.S > @@ -68,5 +68,6 @@ EXPORT_SYMBOL(__arch_copy_in_user) > .section .fixup,"ax" > .align 2 > 9998: sub x0, end, dst // bytes not copied > + uaccess_disable_not_uao x3, x4 > ret > .previous > diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S > index 2d88c736e8f2..6085214654dc 100644 > --- a/arch/arm64/lib/copy_to_user.S > +++ b/arch/arm64/lib/copy_to_user.S > @@ -65,5 +65,6 @@ EXPORT_SYMBOL(__arch_copy_to_user) > .section .fixup,"ax" > .align 2 > 9998: sub x0, end, dst // bytes not copied > + uaccess_disable_not_uao x3, x4 > ret > .previous > -- > 2.24.0 >
On Wed, Nov 20, 2019 at 11:43 AM Mark Rutland <mark.rutland@arm.com> wrote: > > Hi Pavel, > > On Tue, Nov 19, 2019 at 05:10:06PM -0500, Pavel Tatashin wrote: > > Userland access functions (__arch_clear_user, __arch_copy_from_user, > > __arch_copy_in_user, __arch_copy_to_user), enable and disable PAN > > for the duration of copy. However, when copy fails for some reason, > > i.e. access violation the code is transferred to fixedup section, > > where we do not disable PAN. > > Thanks for reporting this. This is a very nasty bug. Indeed, it was biting us randomly, and it took me awhile to understand the root cause. > > > The bug is a security violation as the access to userland is still > > open when it should be disabled, but it also causes memory corruptions > > when software emulated PAN is used: CONFIG_ARM64_SW_TTBR0_PAN=y. > > I see that with CONFIG_ARM64_SW_TTBR0_PAN=y, this means that we may > leave the stale TTBR0 value installed across a context-switch (and have > reproduced that locally), but I'm having some difficulty reproducing the > corruption that you see. I will send the full test shortly. Note, I was never able to reproduce it in QEMU, only on real hardware. Also, for some unknown reason after kexec I could not reproduce it only during first boot, so it is somewhat fragile, but I am sure it can be reproduced in other cases as well, it is just my reproducer is not tunes for that. > > > I was able to reproduce memory corruption problem on Broadcom's SoC > > ARMv8-A like this: > > > > Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's > > stack is accessed and copied. > > IIUC this tickles the issue by performing a faulting uaccess in IRQ > context. On the path to returnign from the exception, it directly calls > into the scheduler as part of el1_preempt, erroneously passing the TTBR0 > value to the next task. Note that a preemption would remove the stale > TTBR0 value as part of kernel entry. Correct. > > It looks like if we're in this state, and return from an exception taken > from EL1 with SW PAN enabled, we'll also leave the stale TTBR0 value > installed. If PAN was disabled (e.g. in the middle of a uaccess region), > then we'll restore the correct TTBR0. > > > The test program performed the following on every CPU and forking many > > processes: > > > > unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, > > MAP_SHARED | MAP_ANONYMOUS, -1, 0); > > map[0] = getpid(); > > sched_yield(); > > if (map[0] != getpid()) { > > fprintf(stderr, "Corruption detected!"); > > } > > munmap(map, PAGE_SIZE); > > Can you provide the whole test, please? And precisely how you're > launching it? I will shortly. > > I've tried turning the above into a main() function, and spawning a > number of instances in parallel while perf is running, but I haven't > been able to reproduce the issue locally, and I'm concerned that I'm > missing something. > > > From time to time I was getting map[0] to contain pid for a different > > process. > > How often is "from time to time"? How many processes are you running, > across how many CPUs? Less than a second on 8 CPU SoC it takes for a process to get access to address space of another process. > > > > > Fixes: 338d4f49d6f7114 ("arm64: kernel: Add support for Privileged...") > > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > --- > > arch/arm64/lib/clear_user.S | 1 + > > arch/arm64/lib/copy_from_user.S | 1 + > > arch/arm64/lib/copy_in_user.S | 1 + > > arch/arm64/lib/copy_to_user.S | 1 + > > 4 files changed, 4 insertions(+) > > FWIW, the diff below looks correct to me, but we might want to fold this > into the C wrappers, so that this is consistent with the other uaccess > cases (and done in one place in the code). I agree, and I actually have a patch for that, but I wanted my fix to be included into 5.4 if possible. This is why I sent it out. I will send out a C wrapper patch soon, but that one won't need to be backported to stable. Pasha
On Tue, Nov 19, 2019 at 05:10:06PM -0500, Pavel Tatashin wrote: > Userland access functions (__arch_clear_user, __arch_copy_from_user, > __arch_copy_in_user, __arch_copy_to_user), enable and disable PAN > for the duration of copy. However, when copy fails for some reason, > i.e. access violation the code is transferred to fixedup section, > where we do not disable PAN. > > The bug is a security violation as the access to userland is still > open when it should be disabled, but it also causes memory corruptions > when software emulated PAN is used: CONFIG_ARM64_SW_TTBR0_PAN=y. > > I was able to reproduce memory corruption problem on Broadcom's SoC > ARMv8-A like this: > > Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's > stack is accessed and copied. > > The test program performed the following on every CPU and forking many > processes: > > unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, > MAP_SHARED | MAP_ANONYMOUS, -1, 0); > map[0] = getpid(); > sched_yield(); > if (map[0] != getpid()) { > fprintf(stderr, "Corruption detected!"); > } > munmap(map, PAGE_SIZE); > > From time to time I was getting map[0] to contain pid for a different > process. > > Fixes: 338d4f49d6f7114 ("arm64: kernel: Add support for Privileged...") > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > --- > arch/arm64/lib/clear_user.S | 1 + > arch/arm64/lib/copy_from_user.S | 1 + > arch/arm64/lib/copy_in_user.S | 1 + > arch/arm64/lib/copy_to_user.S | 1 + > 4 files changed, 4 insertions(+) Thanks. I've pushed this and your other patch out [1], with some changes to the commit message. I'm annoyed that I didn't spot this during review of the initial PAN patches. Will [1] https://fixes.arm64.dev
> > I see that with CONFIG_ARM64_SW_TTBR0_PAN=y, this means that we may > > leave the stale TTBR0 value installed across a context-switch (and have > > reproduced that locally), but I'm having some difficulty reproducing the > > corruption that you see. > > I will send the full test shortly. Note, I was never able to reproduce > it in QEMU, only on real hardware. Also, for some unknown reason after > kexec I could not reproduce it only during first boot, so it is > somewhat fragile, but I am sure it can be reproduced in other cases as > well, it is just my reproducer is not tunes for that. > Attached is the test program that I used to reproduce memory corruption. Test on board with Broadcom's Stingray SoC. Without fix: # time /tmp/repro Corruption: pid 1474 map[0] 1488 cpu 3 Terminated real 0m0.088s user 0m0.004s sys 0m0.071s With the fix: # time /tmp/repro Test passed, all good Terminated real 1m1.286s user 0m0.004s sys 0m0.970s Pasha
> > Thanks. I've pushed this and your other patch out [1], with some changes > to the commit message. I'm annoyed that I didn't spot this during review > of the initial PAN patches. > > Will Great. Thank you, Pasha
On Wed, Nov 20, 2019 at 2:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote: > > > > I see that with CONFIG_ARM64_SW_TTBR0_PAN=y, this means that we may > > > leave the stale TTBR0 value installed across a context-switch (and have > > > reproduced that locally), but I'm having some difficulty reproducing the > > > corruption that you see. > > > > I will send the full test shortly. Note, I was never able to reproduce > > it in QEMU, only on real hardware. Also, for some unknown reason after > > kexec I could not reproduce it only during first boot, so it is > > somewhat fragile, but I am sure it can be reproduced in other cases as > > well, it is just my reproducer is not tunes for that. > > > > Attached is the test program that I used to reproduce memory corruption. > Test on board with Broadcom's Stingray SoC. I forgot to remove from repro.c some of the stuff that I used for debugging: get_pa() and sched_setaffinity() stuff are not needed. > > Without fix: > # time /tmp/repro > Corruption: pid 1474 map[0] 1488 cpu 3 > Terminated > > real 0m0.088s > user 0m0.004s > sys 0m0.071s > > With the fix: > > # time /tmp/repro > Test passed, all good > Terminated > > real 1m1.286s > user 0m0.004s > sys 0m0.970s > > > > Pasha
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S index 10415572e82f..322b55664cca 100644 --- a/arch/arm64/lib/clear_user.S +++ b/arch/arm64/lib/clear_user.S @@ -48,5 +48,6 @@ EXPORT_SYMBOL(__arch_clear_user) .section .fixup,"ax" .align 2 9: mov x0, x2 // return the original size + uaccess_disable_not_uao x2, x3 ret .previous diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S index 680e74409ff9..8472dc7798b3 100644 --- a/arch/arm64/lib/copy_from_user.S +++ b/arch/arm64/lib/copy_from_user.S @@ -66,5 +66,6 @@ EXPORT_SYMBOL(__arch_copy_from_user) .section .fixup,"ax" .align 2 9998: sub x0, end, dst // bytes not copied + uaccess_disable_not_uao x3, x4 ret .previous diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S index 0bedae3f3792..8e0355c1e318 100644 --- a/arch/arm64/lib/copy_in_user.S +++ b/arch/arm64/lib/copy_in_user.S @@ -68,5 +68,6 @@ EXPORT_SYMBOL(__arch_copy_in_user) .section .fixup,"ax" .align 2 9998: sub x0, end, dst // bytes not copied + uaccess_disable_not_uao x3, x4 ret .previous diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S index 2d88c736e8f2..6085214654dc 100644 --- a/arch/arm64/lib/copy_to_user.S +++ b/arch/arm64/lib/copy_to_user.S @@ -65,5 +65,6 @@ EXPORT_SYMBOL(__arch_copy_to_user) .section .fixup,"ax" .align 2 9998: sub x0, end, dst // bytes not copied + uaccess_disable_not_uao x3, x4 ret .previous
Userland access functions (__arch_clear_user, __arch_copy_from_user, __arch_copy_in_user, __arch_copy_to_user), enable and disable PAN for the duration of copy. However, when copy fails for some reason, i.e. access violation the code is transferred to fixedup section, where we do not disable PAN. The bug is a security violation as the access to userland is still open when it should be disabled, but it also causes memory corruptions when software emulated PAN is used: CONFIG_ARM64_SW_TTBR0_PAN=y. I was able to reproduce memory corruption problem on Broadcom's SoC ARMv8-A like this: Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's stack is accessed and copied. The test program performed the following on every CPU and forking many processes: unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); map[0] = getpid(); sched_yield(); if (map[0] != getpid()) { fprintf(stderr, "Corruption detected!"); } munmap(map, PAGE_SIZE); From time to time I was getting map[0] to contain pid for a different process. Fixes: 338d4f49d6f7114 ("arm64: kernel: Add support for Privileged...") Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> --- arch/arm64/lib/clear_user.S | 1 + arch/arm64/lib/copy_from_user.S | 1 + arch/arm64/lib/copy_in_user.S | 1 + arch/arm64/lib/copy_to_user.S | 1 + 4 files changed, 4 insertions(+)