diff mbox

[V2,1/5] arm: mvebu: Added support for coherency fabric in mach-mvebu

Message ID 1351545108-18954-2-git-send-email-gregory.clement@free-electrons.com (mailing list archive)
State New, archived
Headers show

Commit Message

Gregory CLEMENT Oct. 29, 2012, 9:11 p.m. UTC
From: Yehuda Yitschak <yehuday@marvell.com>

Signed-off-by: Yehuda Yitschak <yehuday@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
 .../devicetree/bindings/arm/coherency-fabric.txt   |   16 ++++
 arch/arm/boot/dts/armada-370-xp.dtsi               |    5 ++
 arch/arm/mach-mvebu/Makefile                       |    2 +-
 arch/arm/mach-mvebu/coherency.c                    |   89 ++++++++++++++++++++
 arch/arm/mach-mvebu/coherency.h                    |   21 +++++
 arch/arm/mach-mvebu/common.h                       |    2 +
 6 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/arm/coherency-fabric.txt
 create mode 100644 arch/arm/mach-mvebu/coherency.c
 create mode 100644 arch/arm/mach-mvebu/coherency.h

Comments

Will Deacon Nov. 5, 2012, 2:02 p.m. UTC | #1
Hi Gregory,

On Mon, Oct 29, 2012 at 09:11:44PM +0000, Gregory CLEMENT wrote:
> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
> new file mode 100644
> index 0000000..69e130d
> --- /dev/null
> +++ b/arch/arm/mach-mvebu/coherency.c
> @@ -0,0 +1,89 @@
> +/*
> + * Coherency fabric (Aurora) support for Armada 370 and XP platforms.
> + *
> + * Copyright (C) 2012 Marvell
> + *
> + * Yehuda Yitschak <yehuday@marvell.com>
> + * Gregory Clement <gregory.clement@free-electrons.com>
> + * Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> + *
> + * This file is licensed under the terms of the GNU General Public
> + * License version 2.  This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + *
> + * The Armada 370 and Armada XP SOCs have a coherency fabric which is
> + * responsible for ensuring hardware coherency between all CPUs and between
> + * CPUs and I/O masters. This file initializes the coherency fabric and
> + * supplies basic routines for configuring and controlling hardware coherency
> + */

[...]

> +int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
> +{
> +	int reg;
> +
> +	if (!coherency_base) {
> +		pr_warn("Can't make CPU %d cache coherent.\n", hw_cpu_id);
> +		pr_warn("Coherency fabric is not initialized\n");
> +		return 1;
> +	}
> +
> +	/* Enable the CPU in coherency fabric */
> +	reg = readl(coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
> +	reg |= 1 << (24 + hw_cpu_id);
> +	writel(reg, coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
> +
> +	/* Add CPU to SMP group */
> +	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
> +	reg |= 1 << (16 + hw_cpu_id + (smp_group_id == 0 ? 8 : 0));
> +	writel(reg, coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
> +
> +	return 0;
> +}

These writels may expand to code containing calls to outer_sync(), which
will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
coherent, how does this play out with the exclusive store instruction in the
lock?

Will
Gregory CLEMENT Nov. 5, 2012, 11:53 p.m. UTC | #2
On 11/05/2012 03:02 PM, Will Deacon wrote:
> Hi Gregory,

Hi Will,

> 
> On Mon, Oct 29, 2012 at 09:11:44PM +0000, Gregory CLEMENT wrote:
>> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
>> new file mode 100644
>> index 0000000..69e130d
>> --- /dev/null
>> +++ b/arch/arm/mach-mvebu/coherency.c
>> @@ -0,0 +1,89 @@
>> +/*
>> + * Coherency fabric (Aurora) support for Armada 370 and XP platforms.
>> + *
>> + * Copyright (C) 2012 Marvell
>> + *
>> + * Yehuda Yitschak <yehuday@marvell.com>
>> + * Gregory Clement <gregory.clement@free-electrons.com>
>> + * Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
>> + *
>> + * This file is licensed under the terms of the GNU General Public
>> + * License version 2.  This program is licensed "as is" without any
>> + * warranty of any kind, whether express or implied.
>> + *
>> + * The Armada 370 and Armada XP SOCs have a coherency fabric which is
>> + * responsible for ensuring hardware coherency between all CPUs and between
>> + * CPUs and I/O masters. This file initializes the coherency fabric and
>> + * supplies basic routines for configuring and controlling hardware coherency
>> + */
> 
> [...]
> 
>> +int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
>> +{
>> +	int reg;
>> +
>> +	if (!coherency_base) {
>> +		pr_warn("Can't make CPU %d cache coherent.\n", hw_cpu_id);
>> +		pr_warn("Coherency fabric is not initialized\n");
>> +		return 1;
>> +	}
>> +
>> +	/* Enable the CPU in coherency fabric */
>> +	reg = readl(coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
>> +	reg |= 1 << (24 + hw_cpu_id);
>> +	writel(reg, coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
>> +
>> +	/* Add CPU to SMP group */
>> +	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
>> +	reg |= 1 << (16 + hw_cpu_id + (smp_group_id == 0 ? 8 : 0));
>> +	writel(reg, coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
>> +
>> +	return 0;
>> +}
> 
> These writels may expand to code containing calls to outer_sync(), which
> will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
> coherent, how does this play out with the exclusive store instruction in the
> lock?

I forward this question to the Marvell experts.

> 
> Will
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Gregory CLEMENT Nov. 12, 2012, 8:21 p.m. UTC | #3
Hi Will,

On 11/05/2012 03:02 PM, Will Deacon wrote:
> Hi Gregory,
> 
> On Mon, Oct 29, 2012 at 09:11:44PM +0000, Gregory CLEMENT wrote:
>> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
>> new file mode 100644
>> index 0000000..69e130d
>> --- /dev/null
>> +++ b/arch/arm/mach-mvebu/coherency.c
>> @@ -0,0 +1,89 @@
>> +/*
>> + * Coherency fabric (Aurora) support for Armada 370 and XP platforms.
>> + *
>> + * Copyright (C) 2012 Marvell
>> + *
>> + * Yehuda Yitschak <yehuday@marvell.com>
>> + * Gregory Clement <gregory.clement@free-electrons.com>
>> + * Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
>> + *
>> + * This file is licensed under the terms of the GNU General Public
>> + * License version 2.  This program is licensed "as is" without any
>> + * warranty of any kind, whether express or implied.
>> + *
>> + * The Armada 370 and Armada XP SOCs have a coherency fabric which is
>> + * responsible for ensuring hardware coherency between all CPUs and between
>> + * CPUs and I/O masters. This file initializes the coherency fabric and
>> + * supplies basic routines for configuring and controlling hardware coherency
>> + */
> 
> [...]
> 
>> +int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
>> +{
>> +	int reg;
>> +
>> +	if (!coherency_base) {
>> +		pr_warn("Can't make CPU %d cache coherent.\n", hw_cpu_id);
>> +		pr_warn("Coherency fabric is not initialized\n");
>> +		return 1;
>> +	}
>> +
>> +	/* Enable the CPU in coherency fabric */
>> +	reg = readl(coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
>> +	reg |= 1 << (24 + hw_cpu_id);
>> +	writel(reg, coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
>> +
>> +	/* Add CPU to SMP group */
>> +	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
>> +	reg |= 1 << (16 + hw_cpu_id + (smp_group_id == 0 ? 8 : 0));
>> +	writel(reg, coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
>> +
>> +	return 0;
>> +}
> 
> These writels may expand to code containing calls to outer_sync(), which
> will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
> coherent, how does this play out with the exclusive store instruction in the
> lock?

I dug a little this subject: and I am not sure there is problem. In SMP mode,
only the system cache mode of Aurora is used. In this mode, outer_cache.sync
is void then outer_sync() won't call any function, so there will be no
access to any spinlock.

Gregory
Will Deacon Nov. 13, 2012, 10:43 a.m. UTC | #4
On Mon, Nov 12, 2012 at 08:21:07PM +0000, Gregory CLEMENT wrote:
> On 11/05/2012 03:02 PM, Will Deacon wrote:
> > On Mon, Oct 29, 2012 at 09:11:44PM +0000, Gregory CLEMENT wrote:
> >> +int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
> >> +{
> >> +	int reg;
> >> +
> >> +	if (!coherency_base) {
> >> +		pr_warn("Can't make CPU %d cache coherent.\n", hw_cpu_id);
> >> +		pr_warn("Coherency fabric is not initialized\n");
> >> +		return 1;
> >> +	}
> >> +
> >> +	/* Enable the CPU in coherency fabric */
> >> +	reg = readl(coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
> >> +	reg |= 1 << (24 + hw_cpu_id);
> >> +	writel(reg, coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
> >> +
> >> +	/* Add CPU to SMP group */
> >> +	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
> >> +	reg |= 1 << (16 + hw_cpu_id + (smp_group_id == 0 ? 8 : 0));
> >> +	writel(reg, coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
> >> +
> >> +	return 0;
> >> +}
> > 
> > These writels may expand to code containing calls to outer_sync(), which
> > will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
> > coherent, how does this play out with the exclusive store instruction in the
> > lock?
> 
> I dug a little this subject: and I am not sure there is problem. In SMP mode,
> only the system cache mode of Aurora is used. In this mode, outer_cache.sync
> is void then outer_sync() won't call any function, so there will be no
> access to any spinlock.

Hmm, that is pretty subtle and it doesn't really solve the bigger picture.
printk takes logbuf_lock, for example, and I'm sure that by the time you get
to this code you will have relied on exclusives behaving correctly.

Will
Gregory CLEMENT Nov. 14, 2012, 8 p.m. UTC | #5
On 11/13/2012 11:43 AM, Will Deacon wrote:
> On Mon, Nov 12, 2012 at 08:21:07PM +0000, Gregory CLEMENT wrote:
>> On 11/05/2012 03:02 PM, Will Deacon wrote:
>>> On Mon, Oct 29, 2012 at 09:11:44PM +0000, Gregory CLEMENT wrote:
>>>> +int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
>>>> +{
>>>> +	int reg;
>>>> +
>>>> +	if (!coherency_base) {
>>>> +		pr_warn("Can't make CPU %d cache coherent.\n", hw_cpu_id);
>>>> +		pr_warn("Coherency fabric is not initialized\n");
>>>> +		return 1;
>>>> +	}
>>>> +
>>>> +	/* Enable the CPU in coherency fabric */
>>>> +	reg = readl(coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
>>>> +	reg |= 1 << (24 + hw_cpu_id);
>>>> +	writel(reg, coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
>>>> +
>>>> +	/* Add CPU to SMP group */
>>>> +	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
>>>> +	reg |= 1 << (16 + hw_cpu_id + (smp_group_id == 0 ? 8 : 0));
>>>> +	writel(reg, coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> These writels may expand to code containing calls to outer_sync(), which
>>> will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
>>> coherent, how does this play out with the exclusive store instruction in the
>>> lock?
>>
>> I dug a little this subject: and I am not sure there is problem. In SMP mode,
>> only the system cache mode of Aurora is used. In this mode, outer_cache.sync
>> is void then outer_sync() won't call any function, so there will be no
>> access to any spinlock.
> 
> Hmm, that is pretty subtle and it doesn't really solve the bigger picture.
> printk takes logbuf_lock, for example, and I'm sure that by the time you get
> to this code you will have relied on exclusives behaving correctly.
> 

Hi Will,
I get an answer from Marvell engineers:
"STREX on non-shareable and/or non-cacheable memory regions is supported."

Gregory
Will Deacon Nov. 15, 2012, 10:17 a.m. UTC | #6
On Wed, Nov 14, 2012 at 08:00:32PM +0000, Gregory CLEMENT wrote:
> On 11/13/2012 11:43 AM, Will Deacon wrote:
> > On Mon, Nov 12, 2012 at 08:21:07PM +0000, Gregory CLEMENT wrote:
> >> On 11/05/2012 03:02 PM, Will Deacon wrote:
> >>> These writels may expand to code containing calls to outer_sync(), which
> >>> will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
> >>> coherent, how does this play out with the exclusive store instruction in the
> >>> lock?
> >>
> >> I dug a little this subject: and I am not sure there is problem. In SMP mode,
> >> only the system cache mode of Aurora is used. In this mode, outer_cache.sync
> >> is void then outer_sync() won't call any function, so there will be no
> >> access to any spinlock.
> > 
> > Hmm, that is pretty subtle and it doesn't really solve the bigger picture.
> > printk takes logbuf_lock, for example, and I'm sure that by the time you get
> > to this code you will have relied on exclusives behaving correctly.
> > 
> 
> Hi Will,
> I get an answer from Marvell engineers:
> "STREX on non-shareable and/or non-cacheable memory regions is supported."

Interesting, thanks for asking them about this. Does this mean that:

	1. When not running coherently (i.e. before initialising the
	   coherency fabric), memory is treated as non-shareable,
	   non-cacheable?

	2. If (1), then are exclusive accesses the only way to achieve
	   coherent memory accesses in this scenario?

If so, you still have a problem with write locks, where the unlock code does
a regular str to clear the status. atomic_{read,set} also uses regular
memory accesses, so I think you'll get some surprises there when you add
explicit memory barriers and expect things to be visible between threads.

Do memory barriers have different semantics depending on the state of your
coherency fabric?

Cheers,

Will
Gregory CLEMENT Nov. 15, 2012, 3:54 p.m. UTC | #7
On 11/15/2012 11:17 AM, Will Deacon wrote:
> On Wed, Nov 14, 2012 at 08:00:32PM +0000, Gregory CLEMENT wrote:
>> On 11/13/2012 11:43 AM, Will Deacon wrote:
>>> On Mon, Nov 12, 2012 at 08:21:07PM +0000, Gregory CLEMENT wrote:
>>>> On 11/05/2012 03:02 PM, Will Deacon wrote:
>>>>> These writels may expand to code containing calls to outer_sync(), which
>>>>> will attempt to take a spinlock for the aurora l2. Given that the CPU isn't
>>>>> coherent, how does this play out with the exclusive store instruction in the
>>>>> lock?
>>>>
>>>> I dug a little this subject: and I am not sure there is problem. In SMP mode,
>>>> only the system cache mode of Aurora is used. In this mode, outer_cache.sync
>>>> is void then outer_sync() won't call any function, so there will be no
>>>> access to any spinlock.
>>>
>>> Hmm, that is pretty subtle and it doesn't really solve the bigger picture.
>>> printk takes logbuf_lock, for example, and I'm sure that by the time you get
>>> to this code you will have relied on exclusives behaving correctly.
>>>
>>
>> Hi Will,
>> I get an answer from Marvell engineers:
>> "STREX on non-shareable and/or non-cacheable memory regions is supported."
> 
> Interesting, thanks for asking them about this. Does this mean that:

Here come the answers to your new questions


>
> 	1. When not running coherently (i.e. before initialising the
> 	   coherency fabric), memory is treated as non-shareable,
> 	   non-cacheable?

It can be cacheable. The shared memory (as defined on the page table)
will NOT be coherent by HW.

>
> 	2. If (1), then are exclusive accesses the only way to achieve
> 	   coherent memory accesses in this scenario?

I quote: "I suspect there is terminology miss-use: exclusive accesses
are NOT used to achieve memory coherency - they are used to achieve
atomicity. To achieve memory coherency while fabric is configured to
be non-coherent, SW should use maintenance operations over the L1
caches.suspect there is terminology miss-use: exclusive accesses are
NOT used to achieve memory coherency - "they are used to achieve
atomicity. To achieve memory coherency while fabric is configured to
be non-coherent, SW should use maintenance operations over the L1
caches.

> If so, you still have a problem with write locks, where the unlock code does
> a regular str to clear the status. atomic_{read,set} also uses regular
> memory accesses, so I think you'll get some surprises there when you add
> explicit memory barriers and expect things to be visible between threads.
>
> Do memory barriers have different semantics depending on the state of your
> coherency fabric?

No
Will Deacon Nov. 15, 2012, 4:21 p.m. UTC | #8
Hi Gregory,

On Thu, Nov 15, 2012 at 03:54:39PM +0000, Gregory CLEMENT wrote:
> On 11/15/2012 11:17 AM, Will Deacon wrote:
> > Interesting, thanks for asking them about this. Does this mean that:
> 
> Here come the answers to your new questions

Great, thanks for the quick turn-around!

> > 	1. When not running coherently (i.e. before initialising the
> > 	   coherency fabric), memory is treated as non-shareable,
> > 	   non-cacheable?
> 
> It can be cacheable. The shared memory (as defined on the page table)
> will NOT be coherent by HW.

Ok, so we really are incoherent before enabling the fabric.

> > 	2. If (1), then are exclusive accesses the only way to achieve
> > 	   coherent memory accesses in this scenario?
> 
> I quote: "I suspect there is terminology miss-use: exclusive accesses
> are NOT used to achieve memory coherency - they are used to achieve
> atomicity. To achieve memory coherency while fabric is configured to
> be non-coherent, SW should use maintenance operations over the L1
> caches."

Ok, so if I'm understanding correctly then I don't really see the usefulness
of having working exclusives that are incoherent. Surely it means that you
can guarantee mutual exclusion on a lock variable, but the value you actually
end up reading from the lock is junk unless you litter the accessors with cache
clean operations?

Anyway, that's by-the-by as this is all called early enough that we
shouldn't care. The thing I don't like now is that the fabric initialisation
is done entirely differently on the primary CPU than the secondaries. The
primary probes the device-tree (well, it's also now hard-coded for v2) and
accesses the registers from a C function(armada_370_xp_set_cpu_coherent) whilst
the secondaries have hardcoded addresses and access via asm
(armada_xp_secondary_startup).

Will
Gregory CLEMENT Nov. 15, 2012, 4:49 p.m. UTC | #9
On 11/15/2012 05:21 PM, Will Deacon wrote:
> Hi Gregory,
> 
> On Thu, Nov 15, 2012 at 03:54:39PM +0000, Gregory CLEMENT wrote:
>> On 11/15/2012 11:17 AM, Will Deacon wrote:
>>> Interesting, thanks for asking them about this. Does this mean that:
>>
>> Here come the answers to your new questions
> 
> Great, thanks for the quick turn-around!
> 
>>> 	1. When not running coherently (i.e. before initialising the
>>> 	   coherency fabric), memory is treated as non-shareable,
>>> 	   non-cacheable?
>>
>> It can be cacheable. The shared memory (as defined on the page table)
>> will NOT be coherent by HW.
> 
> Ok, so we really are incoherent before enabling the fabric.
> 
>>> 	2. If (1), then are exclusive accesses the only way to achieve
>>> 	   coherent memory accesses in this scenario?
>>
>> I quote: "I suspect there is terminology miss-use: exclusive accesses
>> are NOT used to achieve memory coherency - they are used to achieve
>> atomicity. To achieve memory coherency while fabric is configured to
>> be non-coherent, SW should use maintenance operations over the L1
>> caches."
> 
> Ok, so if I'm understanding correctly then I don't really see the usefulness
> of having working exclusives that are incoherent. Surely it means that you
> can guarantee mutual exclusion on a lock variable, but the value you actually
> end up reading from the lock is junk unless you litter the accessors with cache
> clean operations?
> 
> Anyway, that's by-the-by as this is all called early enough that we
> shouldn't care. The thing I don't like now is that the fabric initialisation
> is done entirely differently on the primary CPU than the secondaries. The
> primary probes the device-tree (well, it's also now hard-coded for v2) and
> accesses the registers from a C function(armada_370_xp_set_cpu_coherent) whilst
> the secondaries have hardcoded addresses and access via asm
> (armada_xp_secondary_startup).


Now it is hardcoded in both case as you pointed it. So the last
difference is setup from a C function or via asm.

The differences between primary and secondary CPU when they enable the
coherency, is due to the fact that we really are in a different
situation. For primary CPU, as it is the only CPU online it doesn't
need to enable the coherency from the beginning, so we can wait to
have MMU enable and convenient feature. Whereas for the secondary CPU
they need the coherency from the very beginning are by definition they
won't be alone. That's why this very first instruction are written in
asm and they use physical address.

I don't see how to handle it in a different way.

Gregory
Will Deacon Nov. 16, 2012, 6:56 p.m. UTC | #10
On Thu, Nov 15, 2012 at 04:49:17PM +0000, Gregory CLEMENT wrote:
> On 11/15/2012 05:21 PM, Will Deacon wrote:
> > Anyway, that's by-the-by as this is all called early enough that we
> > shouldn't care. The thing I don't like now is that the fabric initialisation
> > is done entirely differently on the primary CPU than the secondaries. The
> > primary probes the device-tree (well, it's also now hard-coded for v2) and
> > accesses the registers from a C function(armada_370_xp_set_cpu_coherent) whilst
> > the secondaries have hardcoded addresses and access via asm
> > (armada_xp_secondary_startup).
> 
> 
> Now it is hardcoded in both case as you pointed it. So the last
> difference is setup from a C function or via asm.
> 
> The differences between primary and secondary CPU when they enable the
> coherency, is due to the fact that we really are in a different
> situation. For primary CPU, as it is the only CPU online it doesn't
> need to enable the coherency from the beginning, so we can wait to
> have MMU enable and convenient feature. Whereas for the secondary CPU
> they need the coherency from the very beginning are by definition they
> won't be alone. That's why this very first instruction are written in
> asm and they use physical address.
> 
> I don't see how to handle it in a different way.

The code paths are fine, I would just like to see less duplication. Can you
make the asm function PCS compliant and call it from C for the primary
(setting the link register to secondary_startup for the secondary cores)?

Will
Gregory CLEMENT Nov. 16, 2012, 7:25 p.m. UTC | #11
On 11/16/2012 07:56 PM, Will Deacon wrote:
> On Thu, Nov 15, 2012 at 04:49:17PM +0000, Gregory CLEMENT wrote:
>> On 11/15/2012 05:21 PM, Will Deacon wrote:
>>> Anyway, that's by-the-by as this is all called early enough that we
>>> shouldn't care. The thing I don't like now is that the fabric initialisation
>>> is done entirely differently on the primary CPU than the secondaries. The
>>> primary probes the device-tree (well, it's also now hard-coded for v2) and
>>> accesses the registers from a C function(armada_370_xp_set_cpu_coherent) whilst
>>> the secondaries have hardcoded addresses and access via asm
>>> (armada_xp_secondary_startup).
>>
>>
>> Now it is hardcoded in both case as you pointed it. So the last
>> difference is setup from a C function or via asm.
>>
>> The differences between primary and secondary CPU when they enable the
>> coherency, is due to the fact that we really are in a different
>> situation. For primary CPU, as it is the only CPU online it doesn't
>> need to enable the coherency from the beginning, so we can wait to
>> have MMU enable and convenient feature. Whereas for the secondary CPU
>> they need the coherency from the very beginning are by definition they
>> won't be alone. That's why this very first instruction are written in
>> asm and they use physical address.
>>
>> I don't see how to handle it in a different way.
> 
> The code paths are fine, I would just like to see less duplication. Can you
> make the asm function PCS compliant and call it from C for the primary
> (setting the link register to secondary_startup for the secondary cores)?

Have you a pointer on how to do it (make the asm function PCS compliant)?

I will also need to add a parameter, because the base address are not the
same between primary CPU and secondary CPUS. With the first we use virtual
address whereas with the second the physical address.


> 
> Will
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon Nov. 19, 2012, 10:32 a.m. UTC | #12
On Fri, Nov 16, 2012 at 07:25:53PM +0000, Gregory CLEMENT wrote:
> On 11/16/2012 07:56 PM, Will Deacon wrote:
> > 
> > The code paths are fine, I would just like to see less duplication. Can you
> > make the asm function PCS compliant and call it from C for the primary
> > (setting the link register to secondary_startup for the secondary cores)?
> 
> Have you a pointer on how to do it (make the asm function PCS compliant)?

Take a look at the PCS document:

  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf

For your case it's really simple though as you don't need to use the stack:
just take the base address in r0 and use a subset of r1-r3 for temporaries
while setting up the control registers. Then mov pc, lr at the end.

> I will also need to add a parameter, because the base address are not the
> same between primary CPU and secondary CPUS. With the first we use virtual
> address whereas with the second the physical address.

Should be simple enough to add a secondary entry point immediately before,
which initialises r0 and lr.

Will
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/arm/coherency-fabric.txt b/Documentation/devicetree/bindings/arm/coherency-fabric.txt
new file mode 100644
index 0000000..2bfbf67
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/coherency-fabric.txt
@@ -0,0 +1,16 @@ 
+Coherency fabric
+----------------
+Available on Marvell SOCs: Armada 370 and Armada XP
+
+Required properties:
+
+- compatible: "marvell,coherency-fabric"
+- reg: Should contain,coherency fabric registers location and length.
+
+Example:
+
+coherency-fabric@d0020200 {
+	compatible = "marvell,coherency-fabric";
+	reg = <0xd0020200 0xb0>;
+};
+
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 94b4b9e..b0d075b 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -36,6 +36,11 @@ 
 	      interrupt-controller;
 	};
 
+	coherency-fabric@d0020200 {
+		compatible = "marvell,coherency-fabric";
+		reg = <0xd0020200 0xb0>;
+	};
+
 	soc {
 		#address-cells = <1>;
 		#size-cells = <1>;
diff --git a/arch/arm/mach-mvebu/Makefile b/arch/arm/mach-mvebu/Makefile
index 57f996b..abd6d3b 100644
--- a/arch/arm/mach-mvebu/Makefile
+++ b/arch/arm/mach-mvebu/Makefile
@@ -2,4 +2,4 @@  ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := -I$(srctree)/$(src)/include \
 	-I$(srctree)/arch/arm/plat-orion/include
 
 obj-y += system-controller.o
-obj-$(CONFIG_MACH_ARMADA_370_XP) += armada-370-xp.o irq-armada-370-xp.o addr-map.o
+obj-$(CONFIG_MACH_ARMADA_370_XP) += armada-370-xp.o irq-armada-370-xp.o addr-map.o coherency.o
diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
new file mode 100644
index 0000000..69e130d
--- /dev/null
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -0,0 +1,89 @@ 
+/*
+ * Coherency fabric (Aurora) support for Armada 370 and XP platforms.
+ *
+ * Copyright (C) 2012 Marvell
+ *
+ * Yehuda Yitschak <yehuday@marvell.com>
+ * Gregory Clement <gregory.clement@free-electrons.com>
+ * Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2.  This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ *
+ * The Armada 370 and Armada XP SOCs have a coherency fabric which is
+ * responsible for ensuring hardware coherency between all CPUs and between
+ * CPUs and I/O masters. This file initializes the coherency fabric and
+ * supplies basic routines for configuring and controlling hardware coherency
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/of_address.h>
+#include <linux/io.h>
+#include <linux/smp.h>
+#include <asm/smp_plat.h>
+#include "armada-370-xp.h"
+
+/* Some functions in this file are called very early during SMP
+ * initialization. At that time the device tree framework is not yet
+ * ready, and it is not possible to get the register address to
+ * ioremap it. That's why the pointer below is given with an initial
+ * value matching its virtual mapping
+ */
+static void __iomem *coherency_base = ARMADA_370_XP_REGS_VIRT_BASE + 0x20200;
+
+/* Coherency fabric registers */
+#define COHERENCY_FABRIC_CTL_OFFSET		   0x0
+#define COHERENCY_FABRIC_CFG_OFFSET		   0x4
+
+static struct of_device_id of_coherency_table[] = {
+	{.compatible = "marvell,coherency-fabric"},
+	{ /* end of list */ },
+};
+
+int coherency_get_cpu_count(void)
+{
+	int reg, cnt;
+
+	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
+	cnt = (reg & 0xF) + 1;
+
+	return cnt;
+}
+
+int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
+{
+	int reg;
+
+	if (!coherency_base) {
+		pr_warn("Can't make CPU %d cache coherent.\n", hw_cpu_id);
+		pr_warn("Coherency fabric is not initialized\n");
+		return 1;
+	}
+
+	/* Enable the CPU in coherency fabric */
+	reg = readl(coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
+	reg |= 1 << (24 + hw_cpu_id);
+	writel(reg, coherency_base + COHERENCY_FABRIC_CTL_OFFSET);
+
+	/* Add CPU to SMP group */
+	reg = readl(coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
+	reg |= 1 << (16 + hw_cpu_id + (smp_group_id == 0 ? 8 : 0));
+	writel(reg, coherency_base + COHERENCY_FABRIC_CFG_OFFSET);
+
+	return 0;
+}
+
+int __init coherency_init(void)
+{
+	struct device_node *np;
+
+	np = of_find_matching_node(NULL, of_coherency_table);
+	if (np) {
+		pr_info("Initializing Coherency fabric\n");
+		coherency_base = of_iomap(np, 0);
+	}
+
+	return 0;
+}
diff --git a/arch/arm/mach-mvebu/coherency.h b/arch/arm/mach-mvebu/coherency.h
new file mode 100644
index 0000000..6182192
--- /dev/null
+++ b/arch/arm/mach-mvebu/coherency.h
@@ -0,0 +1,21 @@ 
+/*
+ * arch/arm/mach-mvebu/include/mach/coherency.h
+ *
+ *
+ * Coherency fabric (Aurora) support for Armada 370 and XP platforms.
+ *
+ * Copyright (C) 2012 Marvell
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2.  This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef __MACH_370_XP_COHERENCY_H
+#define __MACH_370_XP_COHERENCY_H
+
+int set_cpu_coherent(int cpu_id, int smp_group_id);
+int coherency_get_cpu_count(void);
+int coherency_init(void);
+
+#endif	/* __MACH_370_XP_COHERENCY_H */
diff --git a/arch/arm/mach-mvebu/common.h b/arch/arm/mach-mvebu/common.h
index 281fab3..ea08919 100644
--- a/arch/arm/mach-mvebu/common.h
+++ b/arch/arm/mach-mvebu/common.h
@@ -21,4 +21,6 @@  void mvebu_clocks_init(void);
 void armada_370_xp_init_irq(void);
 void armada_370_xp_handle_irq(struct pt_regs *regs);
 
+
+int armada_370_xp_coherency_init(void);
 #endif