diff mbox

[v2,06/12] ARM: mcpm: change max clusters to 4

Message ID 1396944052-9887-7-git-send-email-haojian.zhuang@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Haojian Zhuang April 8, 2014, 8 a.m. UTC
In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
enlarge maximum clusters from 2 to 4 in MCPM.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
---
 arch/arm/include/asm/mcpm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dave Martin April 10, 2014, 9:56 a.m. UTC | #1
On Tue, Apr 08, 2014 at 04:00:46PM +0800, Haojian Zhuang wrote:
> In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
> enlarge maximum clusters from 2 to 4 in MCPM.

CC Nico on mcpm patches please.  I'm happy to be CC'd as well.

> 
> Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> ---
>  arch/arm/include/asm/mcpm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
> index 608516e..68f82cf 100644
> --- a/arch/arm/include/asm/mcpm.h
> +++ b/arch/arm/include/asm/mcpm.h
> @@ -20,7 +20,7 @@
>   * to consider dynamic allocation.
>   */
>  #define MAX_CPUS_PER_CLUSTER	4
> -#define MAX_NR_CLUSTERS		2
> +#define MAX_NR_CLUSTERS		4

Because of the need for alignment to the biggest cacheline size in the
system, the MCPM low-level locking structures consume a non-trivial
amount of memory.

Therefore, I'm not keen on the idea of simply increasing this #define
every time a platform appears with a larger number of clusters.
If hip04 is not built into the kernel, this just wastes memory.

I'll leave it to Nico to decide whether we can increase the #define
to 4 or whether this needs a proper fix now.  Ideally, we would have
a way of choosing the maximum value required by the set of boards built
into the kernel, or switch to dynamic allocation.

Cheers
---Dave
Nicolas Pitre April 11, 2014, 2:39 a.m. UTC | #2
On Thu, 10 Apr 2014, Dave Martin wrote:

> On Tue, Apr 08, 2014 at 04:00:46PM +0800, Haojian Zhuang wrote:
> > In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
> > enlarge maximum clusters from 2 to 4 in MCPM.
> 
> CC Nico on mcpm patches please.  I'm happy to be CC'd as well.

Indeed.

> > Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> > ---
> >  arch/arm/include/asm/mcpm.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
> > index 608516e..68f82cf 100644
> > --- a/arch/arm/include/asm/mcpm.h
> > +++ b/arch/arm/include/asm/mcpm.h
> > @@ -20,7 +20,7 @@
> >   * to consider dynamic allocation.
> >   */
> >  #define MAX_CPUS_PER_CLUSTER	4
> > -#define MAX_NR_CLUSTERS		2
> > +#define MAX_NR_CLUSTERS		4
> 
> Because of the need for alignment to the biggest cacheline size in the
> system, the MCPM low-level locking structures consume a non-trivial
> amount of memory.
> 
> Therefore, I'm not keen on the idea of simply increasing this #define
> every time a platform appears with a larger number of clusters.
> If hip04 is not built into the kernel, this just wastes memory.
> 
> I'll leave it to Nico to decide whether we can increase the #define
> to 4 or whether this needs a proper fix now.  Ideally, we would have
> a way of choosing the maximum value required by the set of boards built
> into the kernel, or switch to dynamic allocation.

I think we should go with the ability to select a maximum based on the 
configured platforms. That could be as simple as having a 
CONFIG_MCPM_QUAD_CLUSTER symbol to be selected by those platforms that 
need it.

The memory usage is still relatively low: 4 clusters containing 6 
independent cache lines each, so for a 64-byte cache line this means 
1536 bytes.  That is rather insignificant compared to the amount of 
memory fitted to typical multi-cluster systems, especially quad cluster 
systems.  Unless there is yet more expansion of clusters and/or CPUs per 
cluster to come amongst MCPM users, I don't think we've yet reached the 
tipping point where the complexity of dynamic memory allocation for this 
is worth it.

However, changing MAX_NR_CLUSTERS cannot be done without also modifying 
the code in bL_switcher_init() or the b.L switcher won't work anymore on 
dual cluster systems as soon as a quad cluster system is configured in. 
Simply removing the test against MAX_NR_CLUSTERS should be sufficient as 
there is already a runtime validation test in bL_switcher_halve_cpus().


Nicolas
Dave Martin April 11, 2014, 2:57 p.m. UTC | #3
On Thu, Apr 10, 2014 at 10:39:04PM -0400, Nicolas Pitre wrote:
> On Thu, 10 Apr 2014, Dave Martin wrote:
> 
> > On Tue, Apr 08, 2014 at 04:00:46PM +0800, Haojian Zhuang wrote:
> > > In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
> > > enlarge maximum clusters from 2 to 4 in MCPM.
> > 
> > CC Nico on mcpm patches please.  I'm happy to be CC'd as well.
> 
> Indeed.
> 
> > > Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> > > ---
> > >  arch/arm/include/asm/mcpm.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
> > > index 608516e..68f82cf 100644
> > > --- a/arch/arm/include/asm/mcpm.h
> > > +++ b/arch/arm/include/asm/mcpm.h
> > > @@ -20,7 +20,7 @@
> > >   * to consider dynamic allocation.
> > >   */
> > >  #define MAX_CPUS_PER_CLUSTER	4
> > > -#define MAX_NR_CLUSTERS		2
> > > +#define MAX_NR_CLUSTERS		4
> > 
> > Because of the need for alignment to the biggest cacheline size in the
> > system, the MCPM low-level locking structures consume a non-trivial
> > amount of memory.
> > 
> > Therefore, I'm not keen on the idea of simply increasing this #define
> > every time a platform appears with a larger number of clusters.
> > If hip04 is not built into the kernel, this just wastes memory.
> > 
> > I'll leave it to Nico to decide whether we can increase the #define
> > to 4 or whether this needs a proper fix now.  Ideally, we would have
> > a way of choosing the maximum value required by the set of boards built
> > into the kernel, or switch to dynamic allocation.
> 
> I think we should go with the ability to select a maximum based on the 
> configured platforms. That could be as simple as having a 
> CONFIG_MCPM_QUAD_CLUSTER symbol to be selected by those platforms that 
> need it.

Yes, that could work.

We could use a similar approach for __CACHE_WRITEBACK_ORDER, but at
the moment we are not aware of a platform that contains caches with
lines larger than 64 bytes.  We shouldn't address that until/unless
it's needed.

> The memory usage is still relatively low: 4 clusters containing 6 
> independent cache lines each, so for a 64-byte cache line this means 
> 1536 bytes.  That is rather insignificant compared to the amount of 
> memory fitted to typical multi-cluster systems, especially quad cluster 
> systems.  Unless there is yet more expansion of clusters and/or CPUs per 
> cluster to come amongst MCPM users, I don't think we've yet reached the 
> tipping point where the complexity of dynamic memory allocation for this 
> is worth it.

I agree with this.  If we se 16 clusters somewhere, it might be time to
rethink, but dynamic allocation seems like overkill for 4.

> However, changing MAX_NR_CLUSTERS cannot be done without also modifying 
> the code in bL_switcher_init() or the b.L switcher won't work anymore on 
> dual cluster systems as soon as a quad cluster system is configured in. 
> Simply removing the test against MAX_NR_CLUSTERS should be sufficient as 
> there is already a runtime validation test in bL_switcher_halve_cpus().

Agreed, that will need some attention, but it doesn't sound too painful.

Cheers
---Dave
Haojian Zhuang April 15, 2014, 6:45 a.m. UTC | #4
On 11 April 2014 10:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 10 Apr 2014, Dave Martin wrote:
>
>> On Tue, Apr 08, 2014 at 04:00:46PM +0800, Haojian Zhuang wrote:
>> > In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
>> > enlarge maximum clusters from 2 to 4 in MCPM.
>>
>> CC Nico on mcpm patches please.  I'm happy to be CC'd as well.
>
> Indeed.
>
>> > Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
>> > ---
>> >  arch/arm/include/asm/mcpm.h | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
>> > index 608516e..68f82cf 100644
>> > --- a/arch/arm/include/asm/mcpm.h
>> > +++ b/arch/arm/include/asm/mcpm.h
>> > @@ -20,7 +20,7 @@
>> >   * to consider dynamic allocation.
>> >   */
>> >  #define MAX_CPUS_PER_CLUSTER       4
>> > -#define MAX_NR_CLUSTERS            2
>> > +#define MAX_NR_CLUSTERS            4
>>
>> Because of the need for alignment to the biggest cacheline size in the
>> system, the MCPM low-level locking structures consume a non-trivial
>> amount of memory.
>>
>> Therefore, I'm not keen on the idea of simply increasing this #define
>> every time a platform appears with a larger number of clusters.
>> If hip04 is not built into the kernel, this just wastes memory.
>>
>> I'll leave it to Nico to decide whether we can increase the #define
>> to 4 or whether this needs a proper fix now.  Ideally, we would have
>> a way of choosing the maximum value required by the set of boards built
>> into the kernel, or switch to dynamic allocation.
>
> I think we should go with the ability to select a maximum based on the
> configured platforms. That could be as simple as having a
> CONFIG_MCPM_QUAD_CLUSTER symbol to be selected by those platforms that
> need it.
>

I'll set MAX_NR_CLUSTERS from Kconfig.

If BIG_LITTLE isn't enabled & HIP04 is enabled, MAX_NR_CLUSTERS could
be set to 4.
Otherwise, it's always 2. Since there're only 16 A15 cores in HiP04.

Regards
Haojian
Dave Martin April 15, 2014, 8:15 a.m. UTC | #5
On Tue, Apr 15, 2014 at 02:45:05PM +0800, Haojian Zhuang wrote:
> On 11 April 2014 10:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Thu, 10 Apr 2014, Dave Martin wrote:
> >
> >> On Tue, Apr 08, 2014 at 04:00:46PM +0800, Haojian Zhuang wrote:
> >> > In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
> >> > enlarge maximum clusters from 2 to 4 in MCPM.
> >>
> >> CC Nico on mcpm patches please.  I'm happy to be CC'd as well.
> >
> > Indeed.
> >
> >> > Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> >> > ---
> >> >  arch/arm/include/asm/mcpm.h | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
> >> > index 608516e..68f82cf 100644
> >> > --- a/arch/arm/include/asm/mcpm.h
> >> > +++ b/arch/arm/include/asm/mcpm.h
> >> > @@ -20,7 +20,7 @@
> >> >   * to consider dynamic allocation.
> >> >   */
> >> >  #define MAX_CPUS_PER_CLUSTER       4
> >> > -#define MAX_NR_CLUSTERS            2
> >> > +#define MAX_NR_CLUSTERS            4
> >>
> >> Because of the need for alignment to the biggest cacheline size in the
> >> system, the MCPM low-level locking structures consume a non-trivial
> >> amount of memory.
> >>
> >> Therefore, I'm not keen on the idea of simply increasing this #define
> >> every time a platform appears with a larger number of clusters.
> >> If hip04 is not built into the kernel, this just wastes memory.
> >>
> >> I'll leave it to Nico to decide whether we can increase the #define
> >> to 4 or whether this needs a proper fix now.  Ideally, we would have
> >> a way of choosing the maximum value required by the set of boards built
> >> into the kernel, or switch to dynamic allocation.
> >
> > I think we should go with the ability to select a maximum based on the
> > configured platforms. That could be as simple as having a
> > CONFIG_MCPM_QUAD_CLUSTER symbol to be selected by those platforms that
> > need it.
> >
> 
> I'll set MAX_NR_CLUSTERS from Kconfig.
> 
> If BIG_LITTLE isn't enabled & HIP04 is enabled, MAX_NR_CLUSTERS could
> be set to 4.

Perhaps the bL switcher should simply refuse to enable itself if there
are more than 2 clusters, or 2 clusters that are identical.

There's no reason why HIP04 and b.L switcher support shouldn't be built
into the same kernel image, since the decision about whether to enable
the switcher is made at runtime.

Cheers
---Dave
Nicolas Pitre April 15, 2014, 2:48 p.m. UTC | #6
On Tue, 15 Apr 2014, Haojian Zhuang wrote:

> On 11 April 2014 10:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Thu, 10 Apr 2014, Dave Martin wrote:
> >
> >> On Tue, Apr 08, 2014 at 04:00:46PM +0800, Haojian Zhuang wrote:
> >> > In order to support 4 clusters with 4 Cortex A15 Cores in each cluster,
> >> > enlarge maximum clusters from 2 to 4 in MCPM.
> >>
> >> CC Nico on mcpm patches please.  I'm happy to be CC'd as well.
> >
> > Indeed.
> >
> >> > Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> >> > ---
> >> >  arch/arm/include/asm/mcpm.h | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
> >> > index 608516e..68f82cf 100644
> >> > --- a/arch/arm/include/asm/mcpm.h
> >> > +++ b/arch/arm/include/asm/mcpm.h
> >> > @@ -20,7 +20,7 @@
> >> >   * to consider dynamic allocation.
> >> >   */
> >> >  #define MAX_CPUS_PER_CLUSTER       4
> >> > -#define MAX_NR_CLUSTERS            2
> >> > +#define MAX_NR_CLUSTERS            4
> >>
> >> Because of the need for alignment to the biggest cacheline size in the
> >> system, the MCPM low-level locking structures consume a non-trivial
> >> amount of memory.
> >>
> >> Therefore, I'm not keen on the idea of simply increasing this #define
> >> every time a platform appears with a larger number of clusters.
> >> If hip04 is not built into the kernel, this just wastes memory.
> >>
> >> I'll leave it to Nico to decide whether we can increase the #define
> >> to 4 or whether this needs a proper fix now.  Ideally, we would have
> >> a way of choosing the maximum value required by the set of boards built
> >> into the kernel, or switch to dynamic allocation.
> >
> > I think we should go with the ability to select a maximum based on the
> > configured platforms. That could be as simple as having a
> > CONFIG_MCPM_QUAD_CLUSTER symbol to be selected by those platforms that
> > need it.
> >
> 
> I'll set MAX_NR_CLUSTERS from Kconfig.
> 
> If BIG_LITTLE isn't enabled & HIP04 is enabled, MAX_NR_CLUSTERS could
> be set to 4.
> Otherwise, it's always 2. Since there're only 16 A15 cores in HiP04.

No.

Please let's avoid cross dependencies for things that are conceptually 
unrelated.

Like I suggested, just create a CONFIG_MCPM_QUAD_CLUSTER symbol and 
select it from the HIP04 config.

For multi-platform kernels we must be able to support both BIG_LITTLE 
and HIP04 in the same config.  Whether or not the big.LITTLE code is 
activated can be decided at run time.


Nicolas
diff mbox

Patch

diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
index 608516e..68f82cf 100644
--- a/arch/arm/include/asm/mcpm.h
+++ b/arch/arm/include/asm/mcpm.h
@@ -20,7 +20,7 @@ 
  * to consider dynamic allocation.
  */
 #define MAX_CPUS_PER_CLUSTER	4
-#define MAX_NR_CLUSTERS		2
+#define MAX_NR_CLUSTERS		4
 
 #ifndef __ASSEMBLY__