mbox series

[v3,0/2] arch_topology: Pre-allocate cacheinfo from primary CPU

Message ID 20230406233926.1670094-1-rrendec@redhat.com (mailing list archive)
Headers show
Series arch_topology: Pre-allocate cacheinfo from primary CPU | expand

Message

Radu Rendec April 6, 2023, 11:39 p.m. UTC
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
tries to build the cacheinfo from the primary CPU prior to secondary
CPUs boot, if the DT/ACPI description contains cache information.
However, if such information is not present, it still reverts to the old
behavior, which allocates the cacheinfo memory on each secondary CPU. On
RT kernels, this triggers a "BUG: sleeping function called from invalid
context" because the allocation is done before preemption is first
enabled on the secondary CPU.

The solution is to add cache information to DT/ACPI, but at least on
arm64 systems this can be avoided by leveraging automatic detection
(through the CLIDR_EL1 register), which is already implemented but
currently doesn't work on RT kernels for the reason described above.

This patch series attempts to enable automatic detection for RT kernels
when no DT/ACPI cache information is available, by pre-allocating
cacheinfo memory on the primary CPU.

The first patch adds an architecture independent infrastructure that
allows architecture specific code to take an early guess at the number
of cache leaves of the secodary CPUs, while it runs in preemptible
context on the primary CPU. At the same time, it gives architecture
specific code the opportunity to go back later, while it runs on the
secondary CPU, and reallocate the cacheinfo memory if the initial guess
proves to be wrong.

The second patch leverages the infrastructure implemented in the first
patch and enables early cache depth detection for arm64.

The patch series is based on an RFC patch that was posted to the
linux-arm-kernel mailing list and discussed with a smaller audience:
https://lore.kernel.org/all/20230323224242.31142-1-rrendec@redhat.com/

Changes to v2:
* Address minor coding style issue (unbalanced braces).
* Move cacheinfo reallocation logic from detect_cache_attributes() to a
  new function to improve code readability.
* Minor fix to cacheinfo reallocation logic to avoid a new detection of
  the cache level if/when detect_cache_attributes() is called again.

Radu Rendec (2):
  cacheinfo: Add arch specific early level initializer
  cacheinfo: Add arm64 early level initializer implementation

 arch/arm64/kernel/cacheinfo.c | 32 +++++++++++----
 drivers/base/cacheinfo.c      | 75 +++++++++++++++++++++++++----------
 include/linux/cacheinfo.h     |  2 +
 3 files changed, 79 insertions(+), 30 deletions(-)

Comments

Pierre Gondois April 11, 2023, 12:36 p.m. UTC | #1
Hello Radu,
Some additional points:
1-
For the record, Will made a comment about adding weak functions
(cf. https://lore.kernel.org/all/20230327121734.GB31342@willie-the-truck/)
but I don't see how it could be done otherwise ...

2-
The patch-set needs to be rebased on top of v6.3-rc6,
otherwise there is a merge conflict.

3-
When trying the patch-set on an ACPI platform with no PPTT, it seems that
fetch_cache_info() is not called from init_cpu_topology() because
parse_acpi_topology() returns an error code. This result in a
'sleeping function called from invalid context' message. The following made
it work for me:

--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -838,7 +838,6 @@ void __init init_cpu_topology(void)
                  * don't use partial information.
                  */
                 reset_cpu_topology();
-               return;
         }
  
         for_each_possible_cpu(cpu) {

With 2 and 3 addressed:
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>

Also maybe wait for Sudeep to have a look before sending a v4,
Regards,
Pierre


On 4/7/23 01:39, Radu Rendec wrote:
> Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
> tries to build the cacheinfo from the primary CPU prior to secondary
> CPUs boot, if the DT/ACPI description contains cache information.
> However, if such information is not present, it still reverts to the old
> behavior, which allocates the cacheinfo memory on each secondary CPU. On
> RT kernels, this triggers a "BUG: sleeping function called from invalid
> context" because the allocation is done before preemption is first
> enabled on the secondary CPU.
> 
> The solution is to add cache information to DT/ACPI, but at least on
> arm64 systems this can be avoided by leveraging automatic detection
> (through the CLIDR_EL1 register), which is already implemented but
> currently doesn't work on RT kernels for the reason described above.
> 
> This patch series attempts to enable automatic detection for RT kernels
> when no DT/ACPI cache information is available, by pre-allocating
> cacheinfo memory on the primary CPU.
> 
> The first patch adds an architecture independent infrastructure that
> allows architecture specific code to take an early guess at the number
> of cache leaves of the secodary CPUs, while it runs in preemptible
> context on the primary CPU. At the same time, it gives architecture
> specific code the opportunity to go back later, while it runs on the
> secondary CPU, and reallocate the cacheinfo memory if the initial guess
> proves to be wrong.
> 
> The second patch leverages the infrastructure implemented in the first
> patch and enables early cache depth detection for arm64.
> 
> The patch series is based on an RFC patch that was posted to the
> linux-arm-kernel mailing list and discussed with a smaller audience:
> https://lore.kernel.org/all/20230323224242.31142-1-rrendec@redhat.com/
> 
> Changes to v2:
> * Address minor coding style issue (unbalanced braces).
> * Move cacheinfo reallocation logic from detect_cache_attributes() to a
>    new function to improve code readability.
> * Minor fix to cacheinfo reallocation logic to avoid a new detection of
>    the cache level if/when detect_cache_attributes() is called again.
> 
> Radu Rendec (2):
>    cacheinfo: Add arch specific early level initializer
>    cacheinfo: Add arm64 early level initializer implementation
> 
>   arch/arm64/kernel/cacheinfo.c | 32 +++++++++++----
>   drivers/base/cacheinfo.c      | 75 +++++++++++++++++++++++++----------
>   include/linux/cacheinfo.h     |  2 +
>   3 files changed, 79 insertions(+), 30 deletions(-)
>
Radu Rendec April 11, 2023, 4:36 p.m. UTC | #2
Hello Pierre,

On Tue, 2023-04-11 at 14:36 +0200, Pierre Gondois wrote:
> Hello Radu,
> Some additional points:
> 1-
> For the record, Will made a comment about adding weak functions
> (cf. https://lore.kernel.org/all/20230327121734.GB31342@willie-the-truck/)
> but I don't see how it could be done otherwise ...

In that comment, Will suggested using static inline functions in a
header. It would probably work but for the sake of consistency with
init_cache_level() I would argue it's better to use a weak function in
this particular case.

> 2-
> The patch-set needs to be rebased on top of v6.3-rc6,
> otherwise there is a merge conflict.

Fair enough. I worked off of linux-rt-devel for obvious reasons, and
forgot to rebase. It's a trivial conflict, which both "git rebase" and
"git am -3" can fix automatically. I will keep this in mind for v4.

> 3-
> When trying the patch-set on an ACPI platform with no PPTT, it seems that
> fetch_cache_info() is not called from init_cpu_topology() because
> parse_acpi_topology() returns an error code. This result in a
> 'sleeping function called from invalid context' message. The following made
> it work for me:
> 
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -838,7 +838,6 @@ void __init init_cpu_topology(void)
>                   * don't use partial information.
>                   */
>                  reset_cpu_topology();
> -               return;
>          }
>   
>          for_each_possible_cpu(cpu) {

Good catch! I think this calls for a dedicated patch in the series to
do just that and explain why the "return" statement is being removed.

> With 2 and 3 addressed:
> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>

Thanks again for reviewing these patches and for all your input!

> Also maybe wait for Sudeep to have a look before sending a v4,

Sure. Looking at other patches, he seems to respond pretty quickly, so
I'll wait until tomorrow and then send v4 if I don't hear back.

Best regards,
Radu

> On 4/7/23 01:39, Radu Rendec wrote:
> > Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
> > tries to build the cacheinfo from the primary CPU prior to secondary
> > CPUs boot, if the DT/ACPI description contains cache information.
> > However, if such information is not present, it still reverts to the old
> > behavior, which allocates the cacheinfo memory on each secondary CPU. On
> > RT kernels, this triggers a "BUG: sleeping function called from invalid
> > context" because the allocation is done before preemption is first
> > enabled on the secondary CPU.
> > 
> > The solution is to add cache information to DT/ACPI, but at least on
> > arm64 systems this can be avoided by leveraging automatic detection
> > (through the CLIDR_EL1 register), which is already implemented but
> > currently doesn't work on RT kernels for the reason described above.
> > 
> > This patch series attempts to enable automatic detection for RT kernels
> > when no DT/ACPI cache information is available, by pre-allocating
> > cacheinfo memory on the primary CPU.
> > 
> > The first patch adds an architecture independent infrastructure that
> > allows architecture specific code to take an early guess at the number
> > of cache leaves of the secodary CPUs, while it runs in preemptible
> > context on the primary CPU. At the same time, it gives architecture
> > specific code the opportunity to go back later, while it runs on the
> > secondary CPU, and reallocate the cacheinfo memory if the initial guess
> > proves to be wrong.
> > 
> > The second patch leverages the infrastructure implemented in the first
> > patch and enables early cache depth detection for arm64.
> > 
> > The patch series is based on an RFC patch that was posted to the
> > linux-arm-kernel mailing list and discussed with a smaller audience:
> > https://lore.kernel.org/all/20230323224242.31142-1-rrendec@redhat.com/
> > 
> > Changes to v2:
> > * Address minor coding style issue (unbalanced braces).
> > * Move cacheinfo reallocation logic from detect_cache_attributes() to a
> >    new function to improve code readability.
> > * Minor fix to cacheinfo reallocation logic to avoid a new detection of
> >    the cache level if/when detect_cache_attributes() is called again.
> > 
> > Radu Rendec (2):
> >    cacheinfo: Add arch specific early level initializer
> >    cacheinfo: Add arm64 early level initializer implementation
> > 
> >   arch/arm64/kernel/cacheinfo.c | 32 +++++++++++----
> >   drivers/base/cacheinfo.c      | 75 +++++++++++++++++++++++++----------
> >   include/linux/cacheinfo.h     |  2 +
> >   3 files changed, 79 insertions(+), 30 deletions(-)
> > 
>