
[RFC] arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature

Message ID 20200329141258.31172-1-ardb@kernel.org (mailing list archive)
State Mainlined
Commit e16e65a02913d29a7b27c4e3a415ceec967b0629
Series [RFC] arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature

Commit Message

Ard Biesheuvel March 29, 2020, 2:12 p.m. UTC
When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
different permissions (r-x for .text, r-- for .rodata, rw- for .data,
etc) are rounded up to 2 MiB so they can be mapped more efficiently.
In particular, it permits the segments to be mapped using level 2
block entries when using 4k pages, which is expected to result in less
TLB pressure.

However, the mappings for the bulk of the kernel will use level 2
entries anyway, and the misaligned fringes are organized such that they
can take advantage of the contiguous bit, and use far fewer level 3
entries than would be needed otherwise.

This makes the value of this feature dubious at best, and since it is not
enabled in defconfig or in the distro configs, it does not appear to be
in wide use either. So let's just remove it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig.debug                  | 13 -------------
 arch/arm64/include/asm/memory.h           | 12 +-----------
 drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
 3 files changed, 4 insertions(+), 29 deletions(-)
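The entry counts behind the alignment argument can be reproduced with a small
standalone calculation. The following user-space sketch (not kernel code)
assumes the architectural level 2 block sizes and contiguous-span lengths for
each granule, and prints the figures quoted in the
arch/arm64/include/asm/memory.h comments for both the old 2 MiB and the
remaining 64 KiB SEGMENT_ALIGN:

/*
 * Standalone arithmetic sketch (not kernel code): how many translation
 * entries it takes to map one SEGMENT_ALIGN-sized fringe under each
 * page-size granule, for the old 2 MiB alignment and for the 64 KiB
 * alignment that remains after this patch.
 */
#include <stdio.h>

struct granule {
	const char *name;
	unsigned long page;	/* bytes mapped by one level 3 entry   */
	unsigned long block;	/* bytes mapped by one level 2 block   */
	unsigned long cont;	/* level 3 entries per contiguous span */
};

int main(void)
{
	const struct granule g[] = {
		{ " 4 KB", 0x1000UL,  0x200000UL,    16 },
		{ "16 KB", 0x4000UL,  0x2000000UL,  128 },
		{ "64 KB", 0x10000UL, 0x20000000UL,  32 },
	};
	const unsigned long aligns[] = { 0x200000UL, 0x10000UL }; /* SZ_2M, SZ_64K */

	for (int a = 0; a < 2; a++) {
		printf("SEGMENT_ALIGN = %lu KiB\n", aligns[a] >> 10);
		for (int i = 0; i < 3; i++) {
			if (aligns[a] % g[i].block == 0) {
				/* Whole fringe fits in level 2 block entries. */
				printf("  %s granule: %lu level 2 block entries\n",
				       g[i].name, aligns[a] / g[i].block);
				continue;
			}
			unsigned long ptes = aligns[a] / g[i].page;
			printf("  %s granule: %lu level 3 entries, %s contiguous bit\n",
			       g[i].name, ptes,
			       ptes % g[i].cont == 0 ? "with" : "without");
		}
	}
	return 0;
}

With 64 KiB alignment, only the 4 KiB granule fringes can still use the
contiguous bit, which is the trade-off the commit message describes.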

Comments

Mark Rutland March 30, 2020, 11:29 a.m. UTC | #1
On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> In particular, it permits the segments to be mapped using level 2
> block entries when using 4k pages, which is expected to result in less
> TLB pressure.
> 
> However, the mappings for the bulk of the kernel will use level 2
> entries anyway, and the misaligned fringes are organized such that they
> can take advantage of the contiguous bit, and use far fewer level 3
> entries than would be needed otherwise.
> 
> This makes the value of this feature dubious at best, and since it is not
> enabled in defconfig or in the distro configs, it does not appear to be
> in wide use either. So let's just remove it.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

No strong feelings either way, but getting rid of code is usually good,
so:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/Kconfig.debug                  | 13 -------------
>  arch/arm64/include/asm/memory.h           | 12 +-----------
>  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
>  3 files changed, 4 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
> index 1c906d932d6b..a1efa246c9ed 100644
> --- a/arch/arm64/Kconfig.debug
> +++ b/arch/arm64/Kconfig.debug
> @@ -52,19 +52,6 @@ config DEBUG_WX
>  
>  	  If in doubt, say "Y".
>  
> -config DEBUG_ALIGN_RODATA
> -	depends on STRICT_KERNEL_RWX
> -	bool "Align linker sections up to SECTION_SIZE"
> -	help
> -	  If this option is enabled, sections that may potentially be marked as
> -	  read only or non-executable will be aligned up to the section size of
> -	  the kernel. This prevents sections from being split into pages and
> -	  avoids a potential TLB penalty. The downside is an increase in
> -	  alignment and potentially wasted space. Turn on this option if
> -	  performance is more important than memory pressure.
> -
> -	  If in doubt, say N.
> -
>  config DEBUG_EFI
>  	depends on EFI && DEBUG_INFO
>  	bool "UEFI debugging"
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 4d94676e5a8b..3b34f7bde2f2 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -119,22 +119,12 @@
>  
>  /*
>   * Alignment of kernel segments (e.g. .text, .data).
> - */
> -#if defined(CONFIG_DEBUG_ALIGN_RODATA)
> -/*
> - *  4 KB granule:   1 level 2 entry
> - * 16 KB granule: 128 level 3 entries, with contiguous bit
> - * 64 KB granule:  32 level 3 entries, with contiguous bit
> - */
> -#define SEGMENT_ALIGN		SZ_2M
> -#else
> -/*
> + *
>   *  4 KB granule:  16 level 3 entries, with contiguous bit
>   * 16 KB granule:   4 level 3 entries, without contiguous bit
>   * 64 KB granule:   1 level 3 entry
>   */
>  #define SEGMENT_ALIGN		SZ_64K
> -#endif
>  
>  /*
>   * Memory types available.
> diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
> index db0c1a9c1699..fc9f8ab533a7 100644
> --- a/drivers/firmware/efi/libstub/arm64-stub.c
> +++ b/drivers/firmware/efi/libstub/arm64-stub.c
> @@ -75,14 +75,12 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
>  
>  	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && phys_seed != 0) {
>  		/*
> -		 * If CONFIG_DEBUG_ALIGN_RODATA is not set, produce a
> -		 * displacement in the interval [0, MIN_KIMG_ALIGN) that
> -		 * doesn't violate this kernel's de-facto alignment
> +		 * Produce a displacement in the interval [0, MIN_KIMG_ALIGN)
> +		 * that doesn't violate this kernel's de-facto alignment
>  		 * constraints.
>  		 */
>  		u32 mask = (MIN_KIMG_ALIGN - 1) & ~(EFI_KIMG_ALIGN - 1);
> -		u32 offset = !IS_ENABLED(CONFIG_DEBUG_ALIGN_RODATA) ?
> -			     (phys_seed >> 32) & mask : TEXT_OFFSET;
> +		u32 offset = (phys_seed >> 32) & mask;
>  
>  		/*
>  		 * With CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET may not
> -- 
> 2.17.1
>
Ard Biesheuvel March 30, 2020, 12:36 p.m. UTC | #2
On Mon, 30 Mar 2020 at 13:29, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > In particular, it permits the segments to be mapped using level 2
> > block entries when using 4k pages, which is expected to result in less
> > TLB pressure.
> >
> > However, the mappings for the bulk of the kernel will use level 2
> > entries anyway, and the misaligned fringes are organized such that they
> > can take advantage of the contiguous bit, and use far fewer level 3
> > entries than would be needed otherwise.
> >
> > This makes the value of this feature dubious at best, and since it is not
> > enabled in defconfig or in the distro configs, it does not appear to be
> > in wide use either. So let's just remove it.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> No strong feelings either way, but getting rid of code is usually good,
> so:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks Mark.

This is related to [0], which increases the PE/COFF section alignment
to 64k so that a KASLR-enabled kernel always lands at an address at
which it can execute without being moved around first. This is an
improvement in itself, but also provides 5 bits (log2(2M / 64k)) of
wiggle room for the virtual as well as the physical placement of the
kernel. CONFIG_DEBUG_ALIGN_RODATA kind of interferes with that, so I'd
like to get rid of it.

Catalin, Will: if you have no objections, I can include this in my
series for v5.8 and take it via the EFI tree.

Thanks,
Ard.

[0] https://lore.kernel.org/linux-efi/20200326165905.2240-1-ardb@kernel.org/
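
The 5 bits of wiggle room quoted above are exactly what the EFI stub's mask
arithmetic yields once the PE/COFF alignment is 64 KiB. A minimal sketch,
assuming MIN_KIMG_ALIGN is 2 MiB and EFI_KIMG_ALIGN is 64 KiB, and using a
made-up seed value:

/*
 * Sketch of the physical randomization arithmetic, assuming
 * MIN_KIMG_ALIGN == 2 MiB and EFI_KIMG_ALIGN == 64 KiB (the PE/COFF
 * section alignment proposed in [0]).  The seed value is made up;
 * only the mask/offset computation mirrors the EFI stub.
 */
#include <stdio.h>

#define MIN_KIMG_ALIGN	0x200000U	/* 2 MiB  */
#define EFI_KIMG_ALIGN	0x10000U	/* 64 KiB */

int main(void)
{
	unsigned int mask = (MIN_KIMG_ALIGN - 1) & ~(EFI_KIMG_ALIGN - 1);
	unsigned long long phys_seed = 0xdeadbeefcafef00dULL; /* example only */
	unsigned int offset = (unsigned int)(phys_seed >> 32) & mask;

	printf("mask   = 0x%x -> %d bits of physical wiggle room\n",
	       mask, __builtin_popcount(mask));
	printf("offset = 0x%x (a multiple of EFI_KIMG_ALIGN below MIN_KIMG_ALIGN)\n",
	       offset);
	return 0;
}

The mask comes out as 0x1f0000, i.e. bits 16-20 of the upper half of the
KASLR seed, so the physical offset is always a 64 KiB multiple below 2 MiB.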
Will Deacon March 30, 2020, 1:51 p.m. UTC | #3
On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> In particular, it permits the segments to be mapped using level 2
> block entries when using 4k pages, which is expected to result in less
> TLB pressure.
> 
> However, the mappings for the bulk of the kernel will use level 2
> entries anyway, and the misaligned fringes are organized such that they
> can take advantage of the contiguous bit, and use far fewer level 3
> entries than would be needed otherwise.
> 
> This makes the value of this feature dubious at best, and since it is not
> enabled in defconfig or in the distro configs, it does not appear to be
> in wide use either. So let's just remove it.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/Kconfig.debug                  | 13 -------------
>  arch/arm64/include/asm/memory.h           | 12 +-----------
>  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
>  3 files changed, 4 insertions(+), 29 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

But I would really like to go a step further and rip out the block mapping
support altogether so that we can fix non-coherent DMA aliases:

https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de

Will
Ard Biesheuvel March 30, 2020, 1:53 p.m. UTC | #4
On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
>
> On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > In particular, it permits the segments to be mapped using level 2
> > block entries when using 4k pages, which is expected to result in less
> > TLB pressure.
> >
> > However, the mappings for the bulk of the kernel will use level 2
> > entries anyway, and the misaligned fringes are organized such that they
> > can take advantage of the contiguous bit, and use far fewer level 3
> > entries than would be needed otherwise.
> >
> > This makes the value of this feature dubious at best, and since it is not
> > enabled in defconfig or in the distro configs, it does not appear to be
> > in wide use either. So let's just remove it.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/Kconfig.debug                  | 13 -------------
> >  arch/arm64/include/asm/memory.h           | 12 +-----------
> >  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
> >  3 files changed, 4 insertions(+), 29 deletions(-)
>
> Acked-by: Will Deacon <will@kernel.org>
>
> But I would really like to go a step further and rip out the block mapping
> support altogether so that we can fix non-coherent DMA aliases:
>
> https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
>

I'm not sure I follow - is this about mapping parts of the static
kernel Image for non-coherent DMA?
Laura Abbott March 30, 2020, 1:57 p.m. UTC | #5
On 3/29/20 10:12 AM, Ard Biesheuvel wrote:
> When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> In particular, it permits the segments to be mapped using level 2
> block entries when using 4k pages, which is expected to result in less
> TLB pressure.
> 
> However, the mappings for the bulk of the kernel will use level 2
> entries anyway, and the misaligned fringes are organized such that they
> can take advantage of the contiguous bit, and use far fewer level 3
> entries than would be needed otherwise.
> 
> This makes the value of this feature dubious at best, and since it is not
> enabled in defconfig or in the distro configs, it does not appear to be
> in wide use either. So let's just remove it.
> 

This was supposed to avoid potential performance inconsistencies with
some memory being mapped with pages vs. other memory mapped with block
entries. It was always a minor concern, so if it's preventing other
work:

Acked-by: Laura Abbott <labbott@kernel.org>

> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>   arch/arm64/Kconfig.debug                  | 13 -------------
>   arch/arm64/include/asm/memory.h           | 12 +-----------
>   drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
>   3 files changed, 4 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
> index 1c906d932d6b..a1efa246c9ed 100644
> --- a/arch/arm64/Kconfig.debug
> +++ b/arch/arm64/Kconfig.debug
> @@ -52,19 +52,6 @@ config DEBUG_WX
>   
>   	  If in doubt, say "Y".
>   
> -config DEBUG_ALIGN_RODATA
> -	depends on STRICT_KERNEL_RWX
> -	bool "Align linker sections up to SECTION_SIZE"
> -	help
> -	  If this option is enabled, sections that may potentially be marked as
> -	  read only or non-executable will be aligned up to the section size of
> -	  the kernel. This prevents sections from being split into pages and
> -	  avoids a potential TLB penalty. The downside is an increase in
> -	  alignment and potentially wasted space. Turn on this option if
> -	  performance is more important than memory pressure.
> -
> -	  If in doubt, say N.
> -
>   config DEBUG_EFI
>   	depends on EFI && DEBUG_INFO
>   	bool "UEFI debugging"
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 4d94676e5a8b..3b34f7bde2f2 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -119,22 +119,12 @@
>   
>   /*
>    * Alignment of kernel segments (e.g. .text, .data).
> - */
> -#if defined(CONFIG_DEBUG_ALIGN_RODATA)
> -/*
> - *  4 KB granule:   1 level 2 entry
> - * 16 KB granule: 128 level 3 entries, with contiguous bit
> - * 64 KB granule:  32 level 3 entries, with contiguous bit
> - */
> -#define SEGMENT_ALIGN		SZ_2M
> -#else
> -/*
> + *
>    *  4 KB granule:  16 level 3 entries, with contiguous bit
>    * 16 KB granule:   4 level 3 entries, without contiguous bit
>    * 64 KB granule:   1 level 3 entry
>    */
>   #define SEGMENT_ALIGN		SZ_64K
> -#endif
>   
>   /*
>    * Memory types available.
> diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
> index db0c1a9c1699..fc9f8ab533a7 100644
> --- a/drivers/firmware/efi/libstub/arm64-stub.c
> +++ b/drivers/firmware/efi/libstub/arm64-stub.c
> @@ -75,14 +75,12 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
>   
>   	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && phys_seed != 0) {
>   		/*
> -		 * If CONFIG_DEBUG_ALIGN_RODATA is not set, produce a
> -		 * displacement in the interval [0, MIN_KIMG_ALIGN) that
> -		 * doesn't violate this kernel's de-facto alignment
> +		 * Produce a displacement in the interval [0, MIN_KIMG_ALIGN)
> +		 * that doesn't violate this kernel's de-facto alignment
>   		 * constraints.
>   		 */
>   		u32 mask = (MIN_KIMG_ALIGN - 1) & ~(EFI_KIMG_ALIGN - 1);
> -		u32 offset = !IS_ENABLED(CONFIG_DEBUG_ALIGN_RODATA) ?
> -			     (phys_seed >> 32) & mask : TEXT_OFFSET;
> +		u32 offset = (phys_seed >> 32) & mask;
>   
>   		/*
>   		 * With CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET may not
>
Robin Murphy March 30, 2020, 1:59 p.m. UTC | #6
On 2020-03-30 2:53 pm, Ard Biesheuvel wrote:
> On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
>>
>> On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
>>> When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
>>> different permissions (r-x for .text, r-- for .rodata, rw- for .data,
>>> etc) are rounded up to 2 MiB so they can be mapped more efficiently.
>>> In particular, it permits the segments to be mapped using level 2
>>> block entries when using 4k pages, which is expected to result in less
>>> TLB pressure.
>>>
>>> However, the mappings for the bulk of the kernel will use level 2
>>> entries anyway, and the misaligned fringes are organized such that they
>>> can take advantage of the contiguous bit, and use far fewer level 3
>>> entries than would be needed otherwise.
>>>
>>> This makes the value of this feature dubious at best, and since it is not
>>> enabled in defconfig or in the distro configs, it does not appear to be
>>> in wide use either. So let's just remove it.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> ---
>>>   arch/arm64/Kconfig.debug                  | 13 -------------
>>>   arch/arm64/include/asm/memory.h           | 12 +-----------
>>>   drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
>>>   3 files changed, 4 insertions(+), 29 deletions(-)
>>
>> Acked-by: Will Deacon <will@kernel.org>
>>
>> But I would really like to go a step further and rip out the block mapping
>> support altogether so that we can fix non-coherent DMA aliases:
>>
>> https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
>>
> 
> I'm not sure I follow - is this about mapping parts of the static
> kernel Image for non-coherent DMA?

Yikes, I hope not!

The concern there is about block entries in the linear map; I'd assume 
kernel text/data means not-linear-map, and is thus a different kettle of 
fish anyway.

Robin.
Will Deacon March 30, 2020, 2:04 p.m. UTC | #7
On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> >
> > On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > > In particular, it permits the segments to be mapped using level 2
> > > block entries when using 4k pages, which is expected to result in less
> > > TLB pressure.
> > >
> > > However, the mappings for the bulk of the kernel will use level 2
> > > entries anyway, and the misaligned fringes are organized such that they
> > > can take advantage of the contiguous bit, and use far fewer level 3
> > > entries than would be needed otherwise.
> > >
> > > This makes the value of this feature dubious at best, and since it is not
> > > enabled in defconfig or in the distro configs, it does not appear to be
> > > in wide use either. So let's just remove it.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/arm64/Kconfig.debug                  | 13 -------------
> > >  arch/arm64/include/asm/memory.h           | 12 +-----------
> > >  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
> > >  3 files changed, 4 insertions(+), 29 deletions(-)
> >
> > Acked-by: Will Deacon <will@kernel.org>
> >
> > But I would really like to go a step further and rip out the block mapping
> > support altogether so that we can fix non-coherent DMA aliases:
> >
> > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> >
> 
> I'm not sure I follow - is this about mapping parts of the static
> kernel Image for non-coherent DMA?

Sorry, it's not directly related to your patch, just that if we're removing
options relating to kernel mappings then I'd be quite keen on effectively
forcing page-granularity on the linear map, as is currently done by default
thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
aliases for non-coherent streaming DMA mappings by hooking into Christoph's
series above.

This series just reminded me of it because it's another
"off-by-default-behaviour-for-block-mappings-probably-because-of-performance-
but-never-actually-measured" type of thing which really just gets in the
way.

Will
Ard Biesheuvel March 30, 2020, 2:22 p.m. UTC | #8
On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> > >
> > > On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > > > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > > > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > > > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > > > In particular, it permits the segments to be mapped using level 2
> > > > block entries when using 4k pages, which is expected to result in less
> > > > TLB pressure.
> > > >
> > > > However, the mappings for the bulk of the kernel will use level 2
> > > > entries anyway, and the misaligned fringes are organized such that they
> > > > can take advantage of the contiguous bit, and use far fewer level 3
> > > > entries than would be needed otherwise.
> > > >
> > > > This makes the value of this feature dubious at best, and since it is not
> > > > enabled in defconfig or in the distro configs, it does not appear to be
> > > > in wide use either. So let's just remove it.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  arch/arm64/Kconfig.debug                  | 13 -------------
> > > >  arch/arm64/include/asm/memory.h           | 12 +-----------
> > > >  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
> > > >  3 files changed, 4 insertions(+), 29 deletions(-)
> > >
> > > Acked-by: Will Deacon <will@kernel.org>
> > >
> > > But I would really like to go a step further and rip out the block mapping
> > > support altogether so that we can fix non-coherent DMA aliases:
> > >
> > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> > >
> >
> > I'm not sure I follow - is this about mapping parts of the static
> > kernel Image for non-coherent DMA?
>
> Sorry, it's not directly related to your patch, just that if we're removing
> options relating to kernel mappings then I'd be quite keen on effectively
> forcing page-granularity on the linear map, as is currently done by default
> thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> series above.
>

Right. I don't remember seeing any complaints about
RODATA_FULL_DEFAULT_ENABLED, but maybe it's too early for that.

> This series just reminded me of it because it's another
> "off-by-default-behaviour-for-block-mappings-probably-because-of-performance-
> but-never-actually-measured" type of thing which really just gets in the
> way.
>

Well, even though I agree that the lack of actual numbers is a bit
disturbing here, I'd hate to penalize all systems even more than they
already are (due to ARCH_KMALLOC_MINALIGN == ARCH_DMA_MINALIGN) by
adding another workaround that is only needed on devices that have
non-coherent masters.
Will Deacon March 30, 2020, 2:28 p.m. UTC | #9
On Mon, Mar 30, 2020 at 04:22:24PM +0200, Ard Biesheuvel wrote:
> On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > > > > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > > > > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > > > > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > > > > In particular, it permits the segments to be mapped using level 2
> > > > > block entries when using 4k pages, which is expected to result in less
> > > > > TLB pressure.
> > > > >
> > > > > However, the mappings for the bulk of the kernel will use level 2
> > > > > entries anyway, and the misaligned fringes are organized such that they
> > > > > can take advantage of the contiguous bit, and use far fewer level 3
> > > > > entries than would be needed otherwise.
> > > > >
> > > > > This makes the value of this feature dubious at best, and since it is not
> > > > > enabled in defconfig or in the distro configs, it does not appear to be
> > > > > in wide use either. So let's just remove it.
> > > > >
> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > ---
> > > > >  arch/arm64/Kconfig.debug                  | 13 -------------
> > > > >  arch/arm64/include/asm/memory.h           | 12 +-----------
> > > > >  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
> > > > >  3 files changed, 4 insertions(+), 29 deletions(-)
> > > >
> > > > Acked-by: Will Deacon <will@kernel.org>
> > > >
> > > > But I would really like to go a step further and rip out the block mapping
> > > > support altogether so that we can fix non-coherent DMA aliases:
> > > >
> > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> > > >
> > >
> > > I'm not sure I follow - is this about mapping parts of the static
> > > kernel Image for non-coherent DMA?
> >
> > Sorry, it's not directly related to your patch, just that if we're removing
> > options relating to kernel mappings then I'd be quite keen on effectively
> > forcing page-granularity on the linear map, as is currently done by default
> > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > series above.
> >
> 
> Right. I don't remember seeing any complaints about
> RODATA_FULL_DEFAULT_ENABLED, but maybe it's too early for that.
> 
> > This series just reminded me of it because it's another
> > "off-by-default-behaviour-for-block-mappings-probably-because-of-performance-
> > but-never-actually-measured" type of thing which really just gets in the
> > way.
> >
> 
> Well, even though I agree that the lack of actual numbers is a bit
> disturbing here, I'd hate to penalize all systems even more than they
> already are (due to ARCH_KMALLOC_MINALIGN == ARCH_DMA_MINALIGN) by
> adding another workaround that is only needed on devices that have
> non-coherent masters.

Fair enough, but I'd still like to see some numbers. If they're compelling,
then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
that doesn't really help the kconfig maze :(

Will
Ard Biesheuvel March 30, 2020, 2:32 p.m. UTC | #10
On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Mar 30, 2020 at 04:22:24PM +0200, Ard Biesheuvel wrote:
> > On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@kernel.org> wrote:
> > >
> > > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > > > > > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > > > > > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > > > > > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > > > > > In particular, it permits the segments to be mapped using level 2
> > > > > > block entries when using 4k pages, which is expected to result in less
> > > > > > TLB pressure.
> > > > > >
> > > > > > However, the mappings for the bulk of the kernel will use level 2
> > > > > > entries anyway, and the misaligned fringes are organized such that they
> > > > > > can take advantage of the contiguous bit, and use far fewer level 3
> > > > > > entries than would be needed otherwise.
> > > > > >
> > > > > > This makes the value of this feature dubious at best, and since it is not
> > > > > > enabled in defconfig or in the distro configs, it does not appear to be
> > > > > > in wide use either. So let's just remove it.
> > > > > >
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > ---
> > > > > >  arch/arm64/Kconfig.debug                  | 13 -------------
> > > > > >  arch/arm64/include/asm/memory.h           | 12 +-----------
> > > > > >  drivers/firmware/efi/libstub/arm64-stub.c |  8 +++-----
> > > > > >  3 files changed, 4 insertions(+), 29 deletions(-)
> > > > >
> > > > > Acked-by: Will Deacon <will@kernel.org>
> > > > >
> > > > > But I would really like to go a step further and rip out the block mapping
> > > > > support altogether so that we can fix non-coherent DMA aliases:
> > > > >
> > > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> > > > >
> > > >
> > > > I'm not sure I follow - is this about mapping parts of the static
> > > > kernel Image for non-coherent DMA?
> > >
> > > Sorry, it's not directly related to your patch, just that if we're removing
> > > options relating to kernel mappings then I'd be quite keen on effectively
> > > forcing page-granularity on the linear map, as is currently done by default
> > > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > > series above.
> > >
> >
> > Right. I don't remember seeing any complaints about
> > RODATA_FULL_DEFAULT_ENABLED, but maybe it's too early for that.
> >
> > > This series just reminded me of it because it's another
> > > "off-by-default-behaviour-for-block-mappings-probably-because-of-performance-
> > > but-never-actually-measured" type of thing which really just gets in the
> > > way.
> > >
> >
> > Well, even though I agree that the lack of actual numbers is a bit
> > disturbing here, I'd hate to penalize all systems even more than they
> > already are (due to ARCH_KMALLOC_MINALIGN == ARCH_DMA_MINALIGN) by
> > adding another workaround that is only needed on devices that have
> > non-coherent masters.
>
> Fair enough, but I'd still like to see some numbers. If they're compelling,
> then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
> that doesn't really help the kconfig maze :(
>

Could we make this a runtime thing? E.g., remap the entire linear
region down to pages under stop_machine() the first time we probe a
device that uses non-coherent DMA?

(/me ducks)
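
A minimal sketch of what such a runtime switch could look like, purely
illustrative: the splitting helper is hypothetical, and only stop_machine()
itself is an existing interface:

/*
 * Hypothetical sketch only, not existing kernel code: split the linear
 * map down to page granularity once, under stop_machine(), the first
 * time a non-coherent DMA master is probed.
 */
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/stop_machine.h>

/* Assumed helper: rewrite block/contiguous linear map entries as ptes. */
int arm64_split_linear_map_to_pages(void *unused);

static atomic_t linear_map_is_pages = ATOMIC_INIT(0);

/* Hypothetically called from the probe path of a non-coherent device. */
void maybe_split_linear_map_for_noncoherent_dma(void)
{
	/* Only the first non-coherent master pays the remapping cost. */
	if (atomic_cmpxchg(&linear_map_is_pages, 0, 1) == 0)
		stop_machine(arm64_split_linear_map_to_pages, NULL,
			     cpu_online_mask);
}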
Catalin Marinas April 2, 2020, 11:15 a.m. UTC | #11
On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> In particular, it permits the segments to be mapped using level 2
> block entries when using 4k pages, which is expected to result in less
> TLB pressure.
> 
> However, the mappings for the bulk of the kernel will use level 2
> entries anyway, and the misaligned fringes are organized such that they
> can take advantage of the contiguous bit, and use far fewer level 3
> entries than would be needed otherwise.
> 
> This makes the value of this feature dubious at best, and since it is not
> enabled in defconfig or in the distro configs, it does not appear to be
> in wide use either. So let's just remove it.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Happy to take this patch via the arm64 tree for 5.7 (no new
functionality), unless you want it to go with your other relocation
logic in the EFI stub patches.
Ard Biesheuvel April 2, 2020, 11:24 a.m. UTC | #12
On Thu, 2 Apr 2020 at 13:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Sun, Mar 29, 2020 at 04:12:58PM +0200, Ard Biesheuvel wrote:
> > When CONFIG_DEBUG_ALIGN_RODATA is enabled, kernel segments mapped with
> > different permissions (r-x for .text, r-- for .rodata, rw- for .data,
> > etc) are rounded up to 2 MiB so they can be mapped more efficiently.
> > In particular, it permits the segments to be mapped using level 2
> > block entries when using 4k pages, which is expected to result in less
> > TLB pressure.
> >
> > However, the mappings for the bulk of the kernel will use level 2
> > entries anyway, and the misaligned fringes are organized such that they
> > can take advantage of the contiguous bit, and use far fewer level 3
> > entries than would be needed otherwise.
> >
> > This makes the value of this feature dubious at best, and since it is not
> > enabled in defconfig or in the distro configs, it does not appear to be
> > in wide use either. So let's just remove it.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Happy to take this patch via the arm64 tree for 5.7 (no new
> functionality), unless you want it to go with your other relocation
> logic in the EFI stub patches.
>

If you don't mind taking it for v5.7, please go ahead.
Catalin Marinas April 2, 2020, 11:30 a.m. UTC | #13
On Mon, Mar 30, 2020 at 04:32:31PM +0200, Ard Biesheuvel wrote:
> On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@kernel.org> wrote:
> > > On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> > > > > > But I would really like to go a step further and rip out the block mapping
> > > > > > support altogether so that we can fix non-coherent DMA aliases:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> > > > >
> > > > > I'm not sure I follow - is this about mapping parts of the static
> > > > > kernel Image for non-coherent DMA?
> > > >
> > > > Sorry, it's not directly related to your patch, just that if we're removing
> > > > options relating to kernel mappings then I'd be quite keen on effectively
> > > > forcing page-granularity on the linear map, as is currently done by default
> > > > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > > > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > > > series above.

Have we ever hit this issue in practice? At least from the CPU
perspective, we've assumed that a non-cacheable access would not hit in
the cache. Reading the ARM ARM rules, it doesn't seem to state this
explicitly but we can ask for clarification (I dug out an email from
2015, left unanswered).

Assuming that the CPU is behaving as we'd expect, are there other issues
with peripherals/SMMU?

> > Fair enough, but I'd still like to see some numbers. If they're compelling,
> > then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
> > that doesn't really help the kconfig maze :(

I'd prefer not to have a config option, we could easily break single
Image at some point.

> Could we make this a runtime thing? E.g., remap the entire linear
> region down to pages under stop_machine() the first time we probe a
> device that uses non-coherent DMA?

That could be pretty expensive at run-time. With the ARMv8.4-TTRem
feature, I wonder whether we could do this lazily when allocating
non-coherent DMA buffers.

(I still hope there isn't a problem at all with this mismatch ;)).
Mark Rutland April 2, 2020, 12:17 p.m. UTC | #14
On Thu, Apr 02, 2020 at 12:30:33PM +0100, Catalin Marinas wrote:
> On Mon, Mar 30, 2020 at 04:32:31PM +0200, Ard Biesheuvel wrote:
> > Could we make this a runtime thing? E.g., remap the entire linear
> > region down to pages under stop_machine() the first time we probe a
> > device that uses non-coherent DMA?
> 
> That could be pretty expensive at run-time. With the ARMv8.4-TTRem
> feature, I wonder whether we could do this lazily when allocating
> non-coherent DMA buffers.

It's worth noting that ARMv8.4-TTRem is optional and the "level 1" and
"level 2" behaviours still allows non-fatal faults to be taken while nT
is set (or until you perform the TLB invalidation). We can only safely
use it where we could use a full BBM sequence today.

Effectively, TTRem is an optimization where the CPU *might* be able to
use entries during the break stage, but is not guaranteed to do so.

Thanks,
Mark.
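
For reference, the full break-before-make ordering being contrasted with
TTRem looks roughly like the sketch below; the helpers are placeholders
rather than real kernel APIs:

/*
 * Rough sketch of the classic break-before-make (BBM) ordering; the
 * helpers (write_entry, barrier_sync, invalidate_tlb_entry) are
 * placeholders, not real kernel interfaces.
 */
typedef unsigned long long pte64_t;

void write_entry(pte64_t *slot, pte64_t val);   /* placeholder           */
void barrier_sync(void);                        /* placeholder: e.g. DSB */
void invalidate_tlb_entry(unsigned long vaddr); /* placeholder: TLBI     */

void bbm_replace_entry(pte64_t *slot, pte64_t new_entry, unsigned long vaddr)
{
	write_entry(slot, 0);		/* 1. break: install an invalid entry */
	barrier_sync();			/* 2. ensure the store is visible     */
	invalidate_tlb_entry(vaddr);	/* 3. remove any cached translation   */
	write_entry(slot, new_entry);	/* 4. make: install the new entry     */
	barrier_sync();			/* 5. ensure visibility before use    */
}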
Will Deacon April 3, 2020, 7:07 a.m. UTC | #15
On Thu, Apr 02, 2020 at 12:30:33PM +0100, Catalin Marinas wrote:
> On Mon, Mar 30, 2020 at 04:32:31PM +0200, Ard Biesheuvel wrote:
> > On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@kernel.org> wrote:
> > > > On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@kernel.org> wrote:
> > > > > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > > > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> > > > > > > But I would really like to go a step further and rip out the block mapping
> > > > > > > support altogether so that we can fix non-coherent DMA aliases:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> > > > > >
> > > > > > I'm not sure I follow - is this about mapping parts of the static
> > > > > > kernel Image for non-coherent DMA?
> > > > >
> > > > > Sorry, it's not directly related to your patch, just that if we're removing
> > > > > options relating to kernel mappings then I'd be quite keen on effectively
> > > > > forcing page-granularity on the linear map, as is currently done by default
> > > > > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > > > > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > > > > series above.
> 
> Have we ever hit this issue in practice? At least from the CPU
> perspective, we've assumed that a non-cacheable access would not hit in
> the cache. Reading the ARM ARM rules, it doesn't seem to state this
> explicitly but we can ask for clarification (I dug out an email from
> 2015, left unanswered).
> 
> Assuming that the CPU is behaving as we'd expect, are there other issues
> with peripherals/SMMU?

Any clarification would need to be architectural, I think, and not specific
to the CPU. Worth asking again though.

Will
Ard Biesheuvel April 3, 2020, 8:58 a.m. UTC | #16
On Thu, 2 Apr 2020 at 13:30, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Mon, Mar 30, 2020 at 04:32:31PM +0200, Ard Biesheuvel wrote:
> > On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@kernel.org> wrote:
> > > > On Mon, 30 Mar 2020 at 16:04, Will Deacon <will@kernel.org> wrote:
> > > > > On Mon, Mar 30, 2020 at 03:53:04PM +0200, Ard Biesheuvel wrote:
> > > > > > On Mon, 30 Mar 2020 at 15:51, Will Deacon <will@kernel.org> wrote:
> > > > > > > But I would really like to go a step further and rip out the block mapping
> > > > > > > support altogether so that we can fix non-coherent DMA aliases:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/20200224194446.690816-1-hch@lst.de
> > > > > >
> > > > > > I'm not sure I follow - is this about mapping parts of the static
> > > > > > kernel Image for non-coherent DMA?
> > > > >
> > > > > Sorry, it's not directly related to your patch, just that if we're removing
> > > > > options relating to kernel mappings then I'd be quite keen on effectively
> > > > > forcing page-granularity on the linear map, as is currently done by default
> > > > > thanks to RODATA_FULL_DEFAULT_ENABLED, so that we can nobble cacheable
> > > > > aliases for non-coherent streaming DMA mappings by hooking into Christoph's
> > > > > series above.
>
> Have we ever hit this issue in practice? At least from the CPU
> perspective, we've assumed that a non-cacheable access would not hit in
> the cache. Reading the ARM ARM rules, it doesn't seem to state this
> explicitly but we can ask for clarification (I dug out an email from
> 2015, left unanswered).
>

There is some wording in D4.4.5 (Behavior of caches at reset) that
suggests that implementations may permit cache hits in regions that
are mapped Non-cacheable (although the paragraph in question talks
about global controls and not page table attributes).

> Assuming that the CPU is behaving as we'd expect, are there other issues
> with peripherals/SMMU?
>

There is the NoSnoop PCIe issue as well: PCIe masters that are DMA
coherent in general can generate transactions with non-cacheable
attributes. I guess this is mostly orthogonal, but I'm sure it would
be much easier to reason about correctness if it is guaranteed that no
mappings with mismatched attributes exist anywhere.

> > > Fair enough, but I'd still like to see some numbers. If they're compelling,
> > > then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
> > > that doesn't really help the kconfig maze :(
>
> I'd prefer not to have a config option, we could easily break single
> Image at some point.
>
> > Could we make this a runtime thing? E.g., remap the entire linear
> > region down to pages under stop_machine() the first time we probe a
> > device that uses non-coherent DMA?
>
> That could be pretty expensive at run-time. With the ARMv8.4-TTRem
> feature, I wonder whether we could do this lazily when allocating
> non-coherent DMA buffers.
>
> (I still hope there isn't a problem at all with this mismatch ;)).
>

Now that we have the pieces to easily remap the linear region down to
pages, and [apparently] some generic infrastructure to manage the
linear aliases, the only downside is the alleged performance hit
resulting from increased TLB pressure. This is obviously highly
micro-architecture dependent, but with Xgene1 and ThunderX1 out of the
picture, I wonder if the tradeoffs are different now. Maybe by now, we
should just suck it up (Note that we had no complaints afaik regarding
the fact that we map the linear map down to pages by default now).
Will Deacon May 5, 2020, 10:44 a.m. UTC | #17
On Fri, Apr 03, 2020 at 10:58:51AM +0200, Ard Biesheuvel wrote:
> On Thu, 2 Apr 2020 at 13:30, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Mar 30, 2020 at 04:32:31PM +0200, Ard Biesheuvel wrote:
> > > On Mon, 30 Mar 2020 at 16:28, Will Deacon <will@kernel.org> wrote:
> > > > Fair enough, but I'd still like to see some numbers. If they're compelling,
> > > > then we could explore something like CONFIG_OF_DMA_DEFAULT_COHERENT, but
> > > > that doesn't really help the kconfig maze :(
> >
> > I'd prefer not to have a config option, we could easily break single
> > Image at some point.
> >
> > > Could we make this a runtime thing? E.g., remap the entire linear
> > > region down to pages under stop_machine() the first time we probe a
> > > device that uses non-coherent DMA?
> >
> > That could be pretty expensive at run-time. With the ARMv8.4-TTRem
> > feature, I wonder whether we could do this lazily when allocating
> > non-coherent DMA buffers.
> >
> > (I still hope there isn't a problem at all with this mismatch ;)).
> >
> 
> Now that we have the pieces to easily remap the linear region down to
> pages, and [apparently] some generic infrastructure to manage the
> linear aliases, the only downside is the alleged performance hit
> resulting from increased TLB pressure. This is obviously highly
> micro-architecture dependent, but with Xgene1 and ThunderX1 out of the
> picture, I wonder if the tradeoffs are different now. Maybe by now, we
> should just suck it up (Note that we had no complaints afaik regarding
> the fact that we map the linear map down to pages by default now)

I'd be in favour of that fwiw.

Catalin -- did you get anything back from the architects about the cache
hit behaviour?

Will
Catalin Marinas May 7, 2020, 1:43 p.m. UTC | #18
On Tue, May 05, 2020 at 11:44:06AM +0100, Will Deacon wrote:
> Catalin -- did you get anything back from the architects about the cache
> hit behaviour?

Any read from a non-cacheable alias would be coherent with writes using
the same non-cacheable attributes, irrespective of other cacheable
aliases (of course, subject to the cache lines having been previously
cleaned/invalidated to avoid dirty line evictions).

So as long as the hardware works as per the ARM ARM (B2.8), we don't
need to unmap the non-cacheable DMA buffers from the linear map.
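
In practice that clean/invalidate is what the streaming DMA API already
performs for a non-coherent device. A minimal illustration, with the driver
context invented for the example:

/*
 * Illustration only: a streaming DMA mapping to a non-coherent device.
 * The DMA API performs the cache maintenance on the cacheable (linear
 * map) alias that the rules above depend on; the surrounding driver
 * context is invented.
 */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

static int send_buffer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* Cleans the cacheable alias before the device reads the buffer. */
	dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... start the device transfer using 'dma' ... */

	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}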

Patch

diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
index 1c906d932d6b..a1efa246c9ed 100644
--- a/arch/arm64/Kconfig.debug
+++ b/arch/arm64/Kconfig.debug
@@ -52,19 +52,6 @@  config DEBUG_WX
 
 	  If in doubt, say "Y".
 
-config DEBUG_ALIGN_RODATA
-	depends on STRICT_KERNEL_RWX
-	bool "Align linker sections up to SECTION_SIZE"
-	help
-	  If this option is enabled, sections that may potentially be marked as
-	  read only or non-executable will be aligned up to the section size of
-	  the kernel. This prevents sections from being split into pages and
-	  avoids a potential TLB penalty. The downside is an increase in
-	  alignment and potentially wasted space. Turn on this option if
-	  performance is more important than memory pressure.
-
-	  If in doubt, say N.
-
 config DEBUG_EFI
 	depends on EFI && DEBUG_INFO
 	bool "UEFI debugging"
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 4d94676e5a8b..3b34f7bde2f2 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -119,22 +119,12 @@ 
 
 /*
  * Alignment of kernel segments (e.g. .text, .data).
- */
-#if defined(CONFIG_DEBUG_ALIGN_RODATA)
-/*
- *  4 KB granule:   1 level 2 entry
- * 16 KB granule: 128 level 3 entries, with contiguous bit
- * 64 KB granule:  32 level 3 entries, with contiguous bit
- */
-#define SEGMENT_ALIGN		SZ_2M
-#else
-/*
+ *
  *  4 KB granule:  16 level 3 entries, with contiguous bit
  * 16 KB granule:   4 level 3 entries, without contiguous bit
  * 64 KB granule:   1 level 3 entry
  */
 #define SEGMENT_ALIGN		SZ_64K
-#endif
 
 /*
  * Memory types available.
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index db0c1a9c1699..fc9f8ab533a7 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -75,14 +75,12 @@  efi_status_t handle_kernel_image(unsigned long *image_addr,
 
 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && phys_seed != 0) {
 		/*
-		 * If CONFIG_DEBUG_ALIGN_RODATA is not set, produce a
-		 * displacement in the interval [0, MIN_KIMG_ALIGN) that
-		 * doesn't violate this kernel's de-facto alignment
+		 * Produce a displacement in the interval [0, MIN_KIMG_ALIGN)
+		 * that doesn't violate this kernel's de-facto alignment
 		 * constraints.
 		 */
 		u32 mask = (MIN_KIMG_ALIGN - 1) & ~(EFI_KIMG_ALIGN - 1);
-		u32 offset = !IS_ENABLED(CONFIG_DEBUG_ALIGN_RODATA) ?
-			     (phys_seed >> 32) & mask : TEXT_OFFSET;
+		u32 offset = (phys_seed >> 32) & mask;
 
 		/*
 		 * With CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET may not