| Message ID | 20210118183033.41764-6-vincenzo.frascino@arm.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | arm64: ARMv8.5-A: MTE: Add async mode support |
On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
> mte_assign_mem_tag_range() is called on production KASAN HW hot
> paths. It makes sense to inline it in an attempt to reduce the
> overhead.
>
> Inline mte_assign_mem_tag_range() based on the indications provided at
> [1].
>
> [1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> ---
>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
>  arch/arm64/lib/mte.S         | 15 ---------------
>  2 files changed, 25 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 237bb2f7309d..1a6fd53f82c3 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
>  			 unsigned long addr, unsigned long data);
>
> -void mte_assign_mem_tag_range(void *addr, size_t size);
> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
> +{
> +	u64 _addr = (u64)addr;
> +	u64 _end = _addr + size;
> +
> +	/*
> +	 * This function must be invoked from an MTE enabled context.
> +	 *
> +	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> +	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
> +	 */
> +	do {
> +		/*
> +		 * 'asm volatile' is required to prevent the compiler from
> +		 * moving the statement outside of the loop.
> +		 */
> +		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
> +			     :
> +			     : "r" (_addr)
> +			     : "memory");
> +
> +		_addr += MTE_GRANULE_SIZE;
> +	} while (_addr != _end);
> +}

While I'm ok with moving this function to C, I don't think it solves the
inlining in the kasan code. The only interface we have to kasan is via
mte_{set,get}_mem_tag_range(), so the above function doesn't need to
live in a header.

If you do want inlining all the way to the kasan code, we should
probably move the mte_{set,get}_mem_tag_range() functions to the header
as well (and ideally backed by some numbers to show that it matters).

Moving it to mte.c also gives us more control over how it's called (we
have the WARN_ONs in place in the callers).
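Concretely, the header move Catalin describes could look something like the minimal sketch below. The signature, the __tag_set() helper and the WARN_ON placement are assumptions modelled on the v5.11-era mte_set_mem_tag_range() wrapper, not code posted in this thread:

```c
/*
 * Rough sketch of moving the KASAN-facing wrapper into the header so the
 * whole tagging path can inline. Signature and helpers are assumptions
 * based on the v5.11-era wrapper, not taken from this patch.
 */
static inline void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag)
{
	void *ptr = addr;

	if (!system_supports_mte() || size == 0)
		return addr;

	/* The WARN_ONs Catalin mentions: catch unaligned callers early. */
	WARN_ON((u64)addr & (MTE_GRANULE_SIZE - 1));
	WARN_ON(size & (MTE_GRANULE_SIZE - 1));

	/* Encode the tag in the pointer, then tag every granule it covers. */
	ptr = (void *)__tag_set(ptr, tag);
	mte_assign_mem_tag_range(ptr, size);

	return ptr;
}
```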
Hi Catalin,

On 1/19/21 2:45 PM, Catalin Marinas wrote:
> On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
>> mte_assign_mem_tag_range() is called on production KASAN HW hot
>> paths. It makes sense to inline it in an attempt to reduce the
>> overhead.
[...]
> While I'm ok with moving this function to C, I don't think it solves the
> inlining in the kasan code. The only interface we have to kasan is via
> mte_{set,get}_mem_tag_range(), so the above function doesn't need to
> live in a header.
>
> If you do want inlining all the way to the kasan code, we should
> probably move the mte_{set,get}_mem_tag_range() functions to the header
> as well (and ideally backed by some numbers to show that it matters).
>
> Moving it to mte.c also gives us more control over how it's called (we
> have the WARN_ONs in place in the callers).

Based on the thread [1], this patch is only an intermediate step to allow
KASAN to call mte_assign_mem_tag_range() directly in the future. At that
point I think mte_set_mem_tag_range() can be removed.

If you agree, I would leave things as they are to give Andrey a chance to
execute on the original plan in a separate series. I agree that this
change alone does not bring huge benefits, but it does not introduce any
regressions either.

If you want, I can add something to the commit message in the next
version to make this more explicit. Let me know how you want me to
proceed.
On Tue, Jan 19, 2021 at 4:45 PM Vincenzo Frascino
<vincenzo.frascino@arm.com> wrote:
>
> Hi Catalin,
>
> On 1/19/21 2:45 PM, Catalin Marinas wrote:
> > On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
[...]
> > While I'm ok with moving this function to C, I don't think it solves the
> > inlining in the kasan code. The only interface we have to kasan is via
> > mte_{set,get}_mem_tag_range(), so the above function doesn't need to
> > live in a header.
[...]
> Based on the thread [1], this patch is only an intermediate step to allow
> KASAN to call mte_assign_mem_tag_range() directly in the future. At that
> point I think mte_set_mem_tag_range() can be removed.
>
> If you agree, I would leave things as they are to give Andrey a chance to
> execute on the original plan in a separate series.

I think we should drop this patch from this series as it's unrelated.

I will pick it up into my future optimization series. Then it will be
easier to discuss it in that context. The important part that I needed
is an inlinable C implementation of mte_assign_mem_tag_range(), which
I now have with this patch.

Thanks, Vincenzo!
On Tue, Jan 19, 2021 at 07:12:40PM +0100, Andrey Konovalov wrote:
> On Tue, Jan 19, 2021 at 4:45 PM Vincenzo Frascino
> <vincenzo.frascino@arm.com> wrote:
> > On 1/19/21 2:45 PM, Catalin Marinas wrote:
> > > On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
[...]
> > Based on the thread [1], this patch is only an intermediate step to allow
> > KASAN to call mte_assign_mem_tag_range() directly in the future. At that
> > point I think mte_set_mem_tag_range() can be removed.
> >
> > If you agree, I would leave things as they are to give Andrey a chance to
> > execute on the original plan in a separate series.
>
> I think we should drop this patch from this series as it's unrelated.
>
> I will pick it up into my future optimization series. Then it will be
> easier to discuss it in that context. The important part that I needed
> is an inlinable C implementation of mte_assign_mem_tag_range(), which
> I now have with this patch.

That's fine by me, but we may want to add some forced alignment on the
addr and size, as the loop here depends on them being aligned and
otherwise gets stuck. The mte_set_mem_tag_range() wrapper at least had a
WARN_ON in place. Here we could do:

	addr &= MTE_GRANULE_MASK;
	size = ALIGN(size, MTE_GRANULE_SIZE);

(or maybe trim "size" with MTE_GRANULE_MASK)

That's unless the call sites are well known and guarantee this
alignment (I've only had a very brief look).
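Spelled out, Catalin's suggestion amounts to something like the sketch below. Note why unaligned inputs hang the C version: the removed assembly loop (subs x1, x1, #MTE_GRANULE_SIZE; b.gt 1b) terminated on any non-positive remainder, whereas the C do/while exits only on exact equality. The size == 0 early return here is an extra guard not discussed in the thread:

```c
/*
 * Sketch of the forced-alignment variant. The size == 0 early return is
 * an added guard: a do/while would otherwise tag one stray granule and
 * then keep looping until _addr wraps around.
 */
static inline void mte_assign_mem_tag_range(void *addr, size_t size)
{
	u64 _addr = (u64)addr & MTE_GRANULE_MASK;	  /* round address down */
	u64 _end = _addr + ALIGN(size, MTE_GRANULE_SIZE); /* round size up */

	if (_addr == _end)
		return;

	do {
		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
			     :
			     : "r" (_addr)
			     : "memory");
		_addr += MTE_GRANULE_SIZE;
	} while (_addr != _end);
}
```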
On Tue, Jan 19, 2021 at 8:00 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Jan 19, 2021 at 07:12:40PM +0100, Andrey Konovalov wrote:
> > On Tue, Jan 19, 2021 at 4:45 PM Vincenzo Frascino
> > <vincenzo.frascino@arm.com> wrote:
[...]
> > I will pick it up into my future optimization series. Then it will be
> > easier to discuss it in that context. The important part that I needed
> > is an inlinable C implementation of mte_assign_mem_tag_range(), which
> > I now have with this patch.
>
> That's fine by me, but we may want to add some forced alignment on the
> addr and size, as the loop here depends on them being aligned and
> otherwise gets stuck. The mte_set_mem_tag_range() wrapper at least had a
> WARN_ON in place. Here we could do:
>
>	addr &= MTE_GRANULE_MASK;
>	size = ALIGN(size, MTE_GRANULE_SIZE);
>
> (or maybe trim "size" with MTE_GRANULE_MASK)
>
> That's unless the call sites are well known and guarantee this
> alignment (I've only had a very brief look).

No problem. I'll either add the ALIGN or change the call site to ensure
alignment.
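The call-site option could look like the sketch below; the function name and surrounding context are illustrative assumptions, not code from Andrey's (then unposted) series:

```c
/*
 * Hypothetical call-site fix: guarantee granule alignment before entering
 * the tagging loop, so mte_assign_mem_tag_range() can keep its tight "!="
 * exit condition. hw_tags_set_range() is an invented name for illustration.
 */
static inline void hw_tags_set_range(void *addr, size_t size, u8 tag)
{
	/* addr comes from the allocator and is assumed granule-aligned */
	size = round_up(size, MTE_GRANULE_SIZE);
	mte_assign_mem_tag_range((void *)__tag_set(addr, tag), size);
}
```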
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 237bb2f7309d..1a6fd53f82c3 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
 int mte_ptrace_copy_tags(struct task_struct *child, long request,
 			 unsigned long addr, unsigned long data);
 
-void mte_assign_mem_tag_range(void *addr, size_t size);
+static inline void mte_assign_mem_tag_range(void *addr, size_t size)
+{
+	u64 _addr = (u64)addr;
+	u64 _end = _addr + size;
+
+	/*
+	 * This function must be invoked from an MTE enabled context.
+	 *
+	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
+	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
+	 */
+	do {
+		/*
+		 * 'asm volatile' is required to prevent the compiler from
+		 * moving the statement outside of the loop.
+		 */
+		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
+			     :
+			     : "r" (_addr)
+			     : "memory");
+
+		_addr += MTE_GRANULE_SIZE;
+	} while (_addr != _end);
+}
+
 
 #else /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index 9e1a12e10053..a0a650451510 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -150,18 +150,3 @@ SYM_FUNC_START(mte_restore_page_tags)
 	ret
 SYM_FUNC_END(mte_restore_page_tags)
 
-/*
- * Assign allocation tags for a region of memory based on the pointer tag
- * x0 - source pointer
- * x1 - size
- *
- * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
- * size must be non-zero and MTE_GRANULE_SIZE aligned.
- */
-SYM_FUNC_START(mte_assign_mem_tag_range)
-1:	stg	x0, [x0]
-	add	x0, x0, #MTE_GRANULE_SIZE
-	subs	x1, x1, #MTE_GRANULE_SIZE
-	b.gt	1b
-	ret
-SYM_FUNC_END(mte_assign_mem_tag_range)
mte_assign_mem_tag_range() is called on production KASAN HW hot
paths. It makes sense to inline it in an attempt to reduce the
overhead.

Inline mte_assign_mem_tag_range() based on the indications provided at
[1].

[1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
 arch/arm64/lib/mte.S         | 15 ---------------
 2 files changed, 25 insertions(+), 16 deletions(-)