Message ID | 20210308161434.33424-9-vincenzo.frascino@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: ARMv8.5-A: MTE: Add async mode support |
On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
> load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
> read past some buffer limits which may include some MTE granule with a
> different tag.
>
> When MTE async mode is enabled, if the load operation crosses the
> boundaries and the next granule has a different tag, the PE sets the
> TFSR_EL1.TF1 bit as if an asynchronous tag fault had happened:
>
> ==================================================================
> BUG: KASAN: invalid-access
> Asynchronous mode enabled: no access details available
>
> CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc1-ge1045c86620d-dirty #8
> Hardware name: FVP Base RevC (DT)
> Call trace:
>  dump_backtrace+0x0/0x1c0
>  show_stack+0x18/0x24
>  dump_stack+0xcc/0x14c
>  kasan_report_async+0x54/0x70
>  mte_check_tfsr_el1+0x48/0x4c
>  exit_to_user_mode+0x18/0x38
>  finish_ret_to_user+0x4/0x15c
> ==================================================================
>
> Verify that Tag Check Override (TCO) is enabled in these functions before
> the load and disable it afterwards to prevent this from happening.
>
> Note: The issue has been observed only with an MTE enabled userspace.

The above bug is all about kernel buffers. While userspace can trigger
the relevant code paths, it should not matter whether the user has MTE
enabled or not. Can you please confirm that you can still trigger the
fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
bug somewhere as the two are unrelated: load_unaligned_zeropad() only
acts on kernel buffers and is subject to the kernel MTE tag check fault
mode.

I don't think we should have a user-space selftest for this. The bug is
not about a user-kernel interface, so an in-kernel test is more
appropriate. Could we instead add this to the kasan tests and call
load_unaligned_zeropad() and other functions directly?
On 3/11/21 1:25 PM, Catalin Marinas wrote:
> On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
>> load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
>> read past some buffer limits which may include some MTE granule with a
>> different tag.
>>
>> When MTE async mode is enabled, if the load operation crosses the
>> boundaries and the next granule has a different tag, the PE sets the
>> TFSR_EL1.TF1 bit as if an asynchronous tag fault had happened:
>>
>> ==================================================================
>> BUG: KASAN: invalid-access
>> Asynchronous mode enabled: no access details available
>>
>> CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc1-ge1045c86620d-dirty #8
>> Hardware name: FVP Base RevC (DT)
>> Call trace:
>>  dump_backtrace+0x0/0x1c0
>>  show_stack+0x18/0x24
>>  dump_stack+0xcc/0x14c
>>  kasan_report_async+0x54/0x70
>>  mte_check_tfsr_el1+0x48/0x4c
>>  exit_to_user_mode+0x18/0x38
>>  finish_ret_to_user+0x4/0x15c
>> ==================================================================
>>
>> Verify that Tag Check Override (TCO) is enabled in these functions before
>> the load and disable it afterwards to prevent this from happening.
>>
>> Note: The issue has been observed only with an MTE enabled userspace.
>
> The above bug is all about kernel buffers. While userspace can trigger
> the relevant code paths, it should not matter whether the user has MTE
> enabled or not. Can you please confirm that you can still trigger the
> fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
> bug somewhere as the two are unrelated: load_unaligned_zeropad() only
> acts on kernel buffers and is subject to the kernel MTE tag check fault
> mode.
>

I retried and you are right, it does not matter if it is an MTE or non-MTE
user-space. The issue seems to be that this test does not trigger the problem
every time, which probably led me to the wrong conclusions.

> I don't think we should have a user-space selftest for this. The bug is
> not about a user-kernel interface, so an in-kernel test is more
> appropriate. Could we instead add this to the kasan tests and call
> load_unaligned_zeropad() and other functions directly?
>

I agree with you we should abandon this strategy of triggering the issue due to
my comment above. I will investigate the option of having a kasan test and try
to come up with one that calls the relevant functions directly. I would prefer
though, since the rest of the series is almost ready, to post it in a future
series. What do you think?
On Thu, Mar 11, 2021 at 03:00:26PM +0000, Vincenzo Frascino wrote:
> On 3/11/21 1:25 PM, Catalin Marinas wrote:
> > On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
> >> load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
> >> read past some buffer limits which may include some MTE granule with a
> >> different tag.
> >>
> >> When MTE async mode is enabled, if the load operation crosses the
> >> boundaries and the next granule has a different tag, the PE sets the
> >> TFSR_EL1.TF1 bit as if an asynchronous tag fault had happened:
> >>
> >> ==================================================================
> >> BUG: KASAN: invalid-access
> >> Asynchronous mode enabled: no access details available
> >>
> >> CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc1-ge1045c86620d-dirty #8
> >> Hardware name: FVP Base RevC (DT)
> >> Call trace:
> >>  dump_backtrace+0x0/0x1c0
> >>  show_stack+0x18/0x24
> >>  dump_stack+0xcc/0x14c
> >>  kasan_report_async+0x54/0x70
> >>  mte_check_tfsr_el1+0x48/0x4c
> >>  exit_to_user_mode+0x18/0x38
> >>  finish_ret_to_user+0x4/0x15c
> >> ==================================================================
> >>
> >> Verify that Tag Check Override (TCO) is enabled in these functions before
> >> the load and disable it afterwards to prevent this from happening.
> >>
> >> Note: The issue has been observed only with an MTE enabled userspace.
> >
> > The above bug is all about kernel buffers. While userspace can trigger
> > the relevant code paths, it should not matter whether the user has MTE
> > enabled or not. Can you please confirm that you can still trigger the
> > fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
> > bug somewhere as the two are unrelated: load_unaligned_zeropad() only
> > acts on kernel buffers and is subject to the kernel MTE tag check fault
> > mode.
>
> I retried and you are right, it does not matter if it is an MTE or non-MTE
> user-space. The issue seems to be that this test does not trigger the problem
> every time, which probably led me to the wrong conclusions.

Keep the test around for some quick checks before you get the kasan
test support.

> > I don't think we should have a user-space selftest for this. The bug is
> > not about a user-kernel interface, so an in-kernel test is more
> > appropriate. Could we instead add this to the kasan tests and call
> > load_unaligned_zeropad() and other functions directly?
>
> I agree with you we should abandon this strategy of triggering the issue due to
> my comment above. I will investigate the option of having a kasan test and try
> to come up with one that calls the relevant functions directly. I would prefer
> though, since the rest of the series is almost ready, to post it in a future
> series. What do you think?

That's fine by me.
On 3/11/21 4:28 PM, Catalin Marinas wrote:
> On Thu, Mar 11, 2021 at 03:00:26PM +0000, Vincenzo Frascino wrote:
>> On 3/11/21 1:25 PM, Catalin Marinas wrote:
>>> On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
>>>> load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
>>>> read past some buffer limits which may include some MTE granule with a
>>>> different tag.
>>>>
>>>> When MTE async mode is enabled, if the load operation crosses the
>>>> boundaries and the next granule has a different tag, the PE sets the
>>>> TFSR_EL1.TF1 bit as if an asynchronous tag fault had happened:
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: invalid-access
>>>> Asynchronous mode enabled: no access details available
>>>>
>>>> CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc1-ge1045c86620d-dirty #8
>>>> Hardware name: FVP Base RevC (DT)
>>>> Call trace:
>>>>  dump_backtrace+0x0/0x1c0
>>>>  show_stack+0x18/0x24
>>>>  dump_stack+0xcc/0x14c
>>>>  kasan_report_async+0x54/0x70
>>>>  mte_check_tfsr_el1+0x48/0x4c
>>>>  exit_to_user_mode+0x18/0x38
>>>>  finish_ret_to_user+0x4/0x15c
>>>> ==================================================================
>>>>
>>>> Verify that Tag Check Override (TCO) is enabled in these functions before
>>>> the load and disable it afterwards to prevent this from happening.
>>>>
>>>> Note: The issue has been observed only with an MTE enabled userspace.
>>>
>>> The above bug is all about kernel buffers. While userspace can trigger
>>> the relevant code paths, it should not matter whether the user has MTE
>>> enabled or not. Can you please confirm that you can still trigger the
>>> fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
>>> bug somewhere as the two are unrelated: load_unaligned_zeropad() only
>>> acts on kernel buffers and is subject to the kernel MTE tag check fault
>>> mode.
>>
>> I retried and you are right, it does not matter if it is an MTE or non-MTE
>> user-space. The issue seems to be that this test does not trigger the problem
>> every time, which probably led me to the wrong conclusions.
>
> Keep the test around for some quick checks before you get the kasan
> test support.
>

Of course, I never throw away my code.

>>> I don't think we should have a user-space selftest for this. The bug is
>>> not about a user-kernel interface, so an in-kernel test is more
>>> appropriate. Could we instead add this to the kasan tests and call
>>> load_unaligned_zeropad() and other functions directly?
>>
>> I agree with you we should abandon this strategy of triggering the issue due to
>> my comment above. I will investigate the option of having a kasan test and try
>> to come up with one that calls the relevant functions directly. I would prefer
>> though, since the rest of the series is almost ready, to post it in a future
>> series. What do you think?
>
> That's fine by me.
>
diff --git a/tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c b/tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c
new file mode 100644
index 000000000000..eb03cd52a58e
--- /dev/null
+++ b/tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2020 ARM Limited
+
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+#include <unistd.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "kselftest.h"
+#include "mte_common_util.h"
+#include "mte_def.h"
+
+#define NUM_DEVICES 8
+
+static char *dev[NUM_DEVICES] = {
+	"/proc/cmdline",
+	"/fstab.fvp",
+	"/dev/null",
+	"/proc/mounts",
+	"/proc/filesystems",
+	"/proc/cmdline",
+	"/proc/device-tre", /* incorrect path */
+	"",
+};
+
+#define FAKE_PERMISSION 0x88000
+#define MAX_DESCRIPTOR 0xffffffff
+
+int mte_read_beyond_buffer_test(void)
+{
+	int fd[NUM_DEVICES];
+	unsigned int _desc, _dev;
+
+	for (_desc = 0; _desc < MAX_DESCRIPTOR; _desc++) {
+		for (_dev = 0; _dev < NUM_DEVICES; _dev++) {
+#ifdef _TEST_DEBUG
+			printf("[TEST]: openat(0x%x, %s, 0x%x)\n", _desc, dev[_dev], FAKE_PERMISSION);
+#endif
+
+			fd[_dev] = openat(_desc, dev[_dev], FAKE_PERMISSION);
+		}
+
+		for (_dev = 0; _dev < NUM_DEVICES; _dev++)
+			close(fd[_dev]);
+	}
+
+	return KSFT_PASS;
+}
+
+int main(int argc, char *argv[])
+{
+	int err;
+
+	err = mte_default_setup();
+	if (err)
+		return err;
+
+	ksft_set_plan(1);
+
+	evaluate_test(mte_read_beyond_buffer_test(),
+		"Verify that TCO is enabled correctly if a read beyond buffer occurs\n");
+
+	mte_restore_setup();
+	ksft_print_cnts();
+
+	return ksft_get_fail_cnt() == 0 ? KSFT_PASS : KSFT_FAIL;
+}
load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
read past some buffer limits which may include some MTE granule with a
different tag.

When MTE async mode is enabled, if the load operation crosses the
boundaries and the next granule has a different tag, the PE sets the
TFSR_EL1.TF1 bit as if an asynchronous tag fault had happened:

==================================================================
BUG: KASAN: invalid-access
Asynchronous mode enabled: no access details available

CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc1-ge1045c86620d-dirty #8
Hardware name: FVP Base RevC (DT)
Call trace:
 dump_backtrace+0x0/0x1c0
 show_stack+0x18/0x24
 dump_stack+0xcc/0x14c
 kasan_report_async+0x54/0x70
 mte_check_tfsr_el1+0x48/0x4c
 exit_to_user_mode+0x18/0x38
 finish_ret_to_user+0x4/0x15c
==================================================================

Verify that Tag Check Override (TCO) is enabled in these functions before
the load and disable it afterwards to prevent this from happening.

Note: The issue has been observed only with an MTE enabled userspace.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reported-by: Branislav Rankov <Branislav.Rankov@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 .../arm64/mte/check_read_beyond_buffer.c      | 78 +++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c
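
As a rough illustration of the mechanism the commit message describes, here is a toy user-space model in C. It is not kernel code and does not touch real MTE hardware. It assumes 16-byte tag granules and represents PSTATE.TCO and the sticky TFSR_EL1.TF1 bit as plain booleans; `struct toy_mte`, `toy_load64()` and `toy_load64_unchecked()` are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE_SIZE 16u	/* MTE assigns one tag per 16-byte granule */
#define NUM_GRANULES 4u

/* Toy model state; every name here is hypothetical, not a kernel symbol. */
struct toy_mte {
	uint8_t mem[GRANULE_SIZE * NUM_GRANULES];
	uint8_t tag[NUM_GRANULES];	/* allocation tag of each granule */
	bool tco;			/* stands in for PSTATE.TCO */
	bool tf1;			/* stands in for the sticky TFSR_EL1.TF1 bit */
};

/*
 * Model an 8-byte load at byte offset 'off' through a pointer carrying
 * 'ptr_tag'. As in async mode, a tag mismatch does not stop the load;
 * it only latches tf1, and only when tag checks are not overridden.
 */
static uint64_t toy_load64(struct toy_mte *m, size_t off, uint8_t ptr_tag)
{
	uint64_t val;
	size_t first = off / GRANULE_SIZE;
	size_t last = (off + sizeof(val) - 1) / GRANULE_SIZE;

	assert(off + sizeof(val) <= sizeof(m->mem));

	for (size_t g = first; g <= last; g++)
		if (!m->tco && m->tag[g] != ptr_tag)
			m->tf1 = true;

	memcpy(&val, &m->mem[off], sizeof(val));
	return val;
}

/*
 * Model of the approach discussed in the thread: set TCO before a load
 * that may run past the buffer, and clear it again afterwards.
 */
static uint64_t toy_load64_unchecked(struct toy_mte *m, size_t off,
				     uint8_t ptr_tag)
{
	uint64_t val;

	m->tco = true;
	val = toy_load64(m, off, ptr_tag);
	m->tco = false;
	return val;
}
```

In this model, a load at offset 12 through a pointer tagged like granule 0 spills into granule 1. With a different tag on granule 1, `toy_load64()` latches `tf1`, while `toy_load64_unchecked()` leaves it clear, which mirrors why a spurious asynchronous report can appear only when the over-read is not covered by TCO.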