diff mbox

[RFC] edac: add support for ARM PL310 L2 cache parity

Message ID a1a9e47a-069c-4285-9899-d5a58cf532c2@CO9EHSMHS003.ehs.local (mailing list archive)
State New, archived
Headers show

Commit Message

Punnaiah Choudary Kalluri March 2, 2014, 2:32 p.m. UTC
Add support for ARM Pl310 L2 cache controller parity error

Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
---
 .../devicetree/bindings/edac/pl310_edac_l2.txt     |   19 ++
 drivers/edac/Kconfig                               |    7 +
 drivers/edac/Makefile                              |    1 +
 drivers/edac/pl310_edac_l2.c                       |  236 ++++++++++++++++++++
 4 files changed, 263 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
 create mode 100644 drivers/edac/pl310_edac_l2.c

Comments

Michal Simek April 3, 2014, 3:02 p.m. UTC | #1
Hi Borislav,

On 03/02/2014 03:32 PM, Punnaiah Choudary Kalluri wrote:
> Add support for ARM Pl310 L2 cache controller parity error
> 
> Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
> ---
>  .../devicetree/bindings/edac/pl310_edac_l2.txt     |   19 ++
>  drivers/edac/Kconfig                               |    7 +
>  drivers/edac/Makefile                              |    1 +
>  drivers/edac/pl310_edac_l2.c                       |  236 ++++++++++++++++++++
>  4 files changed, 263 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>  create mode 100644 drivers/edac/pl310_edac_l2.c

Any comment about this driver?

Thanks,
Michal
Borislav Petkov April 3, 2014, 3:24 p.m. UTC | #2
On Thu, Apr 03, 2014 at 05:02:30PM +0200, Michal Simek wrote:
> Any comment about this driver?

It is all on the TODO list. I'll take a look after the merge window
closes.

Thanks.
Michal Simek April 3, 2014, 3:25 p.m. UTC | #3
On 04/03/2014 05:24 PM, Borislav Petkov wrote:
> On Thu, Apr 03, 2014 at 05:02:30PM +0200, Michal Simek wrote:
>> Any comment about this driver?
> 
> It is all on the TODO list. I'll take a look after the merge window
> closes.

Ok. Thank you,
Michal
Borislav Petkov April 9, 2014, 11:32 a.m. UTC | #4
On Sun, Mar 02, 2014 at 08:02:40PM +0530, Punnaiah Choudary Kalluri wrote:
> Add support for ARM Pl310 L2 cache controller parity error
> 
> Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
> ---
>  .../devicetree/bindings/edac/pl310_edac_l2.txt     |   19 ++
>  drivers/edac/Kconfig                               |    7 +
>  drivers/edac/Makefile                              |    1 +
>  drivers/edac/pl310_edac_l2.c                       |  236 ++++++++++++++++++++
>  4 files changed, 263 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>  create mode 100644 drivers/edac/pl310_edac_l2.c
> 
> diff --git a/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
> new file mode 100644
> index 0000000..94fbb8d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
> @@ -0,0 +1,19 @@
> +Pl310 L2 Cache EDAC driver, it does reports the data and tag ram parity errors.
> +
> +Required properties:
> +- compatible: Should be "arm,pl310-cache".
> +- intterupts: Interrupt number to the cpu.
> +- reg: Physical base address and size of cache controller's memory mapped
> +  registers
> +
> +Example:
> +++++++++
> +
> +	L2: cache-controller {
> +		compatible = "arm,pl310-cache";
> +		interrupts = <0 2 4>;
> +		reg = <0xf8f02000 0x1000>;
> +	};
> +
> +PL310 L2 Cache EDAC driver detects the Parity enable state by reading the
> +appropriate control register.
> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 878f090..059ac31 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -326,6 +326,13 @@ config EDAC_TILE
>  	  Support for error detection and correction on the
>  	  Tilera memory controller.
>  
> +config EDAC_PL310_L2
> +	tristate "Pl310 L2 Cache Controller"
> +	depends on EDAC_MM_EDAC && ARM
> +	help
> +	  Support for parity error detection on L2 cache controller
> +	  data and tag ram memory
> +


Ok, so I'm looking at this after having looked at the synopsys thing
and it looks very similar in functionality - it does the basic dance of
registering and setting up stuff, only using different devicetree nodes,
regs, etc.

However, it adds a new file under drivers/edac/ and I'm wondering if it
wouldn't be better to simply create a xilinx_edac.c and put all your
stuff in there, maybe even share code by abstracting it nicely. Having
a separate driver only for a single L2 cache controller seems kinda too
granulary for me.

Thanks.
Rob Herring April 9, 2014, 1:18 p.m. UTC | #5
On Wed, Apr 9, 2014 at 6:32 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Mar 02, 2014 at 08:02:40PM +0530, Punnaiah Choudary Kalluri wrote:
>> Add support for ARM Pl310 L2 cache controller parity error
>>
>> Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
>> ---
>>  .../devicetree/bindings/edac/pl310_edac_l2.txt     |   19 ++
>>  drivers/edac/Kconfig                               |    7 +
>>  drivers/edac/Makefile                              |    1 +
>>  drivers/edac/pl310_edac_l2.c                       |  236 ++++++++++++++++++++
>>  4 files changed, 263 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>>  create mode 100644 drivers/edac/pl310_edac_l2.c
>>
>> diff --git a/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>> new file mode 100644
>> index 0000000..94fbb8d
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>> @@ -0,0 +1,19 @@
>> +Pl310 L2 Cache EDAC driver, it does reports the data and tag ram parity errors.
>> +
>> +Required properties:
>> +- compatible: Should be "arm,pl310-cache".
>> +- intterupts: Interrupt number to the cpu.
>> +- reg: Physical base address and size of cache controller's memory mapped
>> +  registers
>> +
>> +Example:
>> +++++++++
>> +
>> +     L2: cache-controller {
>> +             compatible = "arm,pl310-cache";
>> +             interrupts = <0 2 4>;
>> +             reg = <0xf8f02000 0x1000>;
>> +     };
>> +
>> +PL310 L2 Cache EDAC driver detects the Parity enable state by reading the
>> +appropriate control register.
>> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
>> index 878f090..059ac31 100644
>> --- a/drivers/edac/Kconfig
>> +++ b/drivers/edac/Kconfig
>> @@ -326,6 +326,13 @@ config EDAC_TILE
>>         Support for error detection and correction on the
>>         Tilera memory controller.
>>
>> +config EDAC_PL310_L2
>> +     tristate "Pl310 L2 Cache Controller"
>> +     depends on EDAC_MM_EDAC && ARM
>> +     help
>> +       Support for parity error detection on L2 cache controller
>> +       data and tag ram memory
>> +
>
>
> Ok, so I'm looking at this after having looked at the synopsys thing
> and it looks very similar in functionality - it does the basic dance of
> registering and setting up stuff, only using different devicetree nodes,
> regs, etc.
>
> However, it adds a new file under drivers/edac/ and I'm wondering if it
> wouldn't be better to simply create a xilinx_edac.c and put all your
> stuff in there, maybe even share code by abstracting it nicely. Having
> a separate driver only for a single L2 cache controller seems kinda too
> granulary for me.

I don't think so, the PL310 is present on lots of ARM chips besides
Xilinx. I don't know how many support parity as that is optional. In
fact the highbank_l2_edac.c is for the PL310 as well, but the
registers it uses is all custom logic added for ECC and there is no
part of the PL310 h/w used by the driver.

If there is lots duplication, then that's a sign the framework needs
to handle more of the boilerplate pieces. There could be a "simple"
driver/library for devices which are no more than some registers, an
interrupt handler and static information about the type of EDAC
device.

Rob
Borislav Petkov April 9, 2014, 3:19 p.m. UTC | #6
On Wed, Apr 09, 2014 at 08:18:28AM -0500, Rob Herring wrote:
> I don't think so, the PL310 is present on lots of ARM chips besides
> Xilinx. I don't know how many support parity as that is optional. In
> fact the highbank_l2_edac.c is for the PL310 as well, but the
> registers it uses is all custom logic added for ECC and there is no
> part of the PL310 h/w used by the driver.

Oh ok, so highbank_l2 and PL310 could theoretically be merged together
in one compilation unit, even if they don't really share code at all...

> If there is lots duplication, then that's a sign the framework needs
> to handle more of the boilerplate pieces. There could be a "simple"
> driver/library for devices which are no more than some registers, an
> interrupt handler and static information about the type of EDAC
> device.

Yeah, it's not that - I'm just getting worried that I'm receiving an
EDAC driver for each piece of silicon out there and would like to still
keep drivers/edac/ sane and be able to control that wild growth.

I'm just thinking out loud here, bear with me pls:

Frankly, having a single compilation unit contain similar silicon
functionality could be a good way to put a hold on the growth but the
disadvantage of this is fatter drivers. Which wouldn't matter all too
much but after a certain level of fat, they might need splitting.

And the highbank version is nothing but the big probe routine and a
small irq handler.

And the PL310 is similar but also with a poller.

I guess, if they don't share functionality at all, putting them together
might not be worth it. Hohummm.

Thanks.
Punnaiah Choudary April 9, 2014, 5:29 p.m. UTC | #7
There is a driver file cache-l2x0.c under arch/arm/mm for pl310 cache
configuration and management. Russel king had suggested to use
single driver file for both pl310 edac implementation and cache-l2x0.c
Here is the thread
http://www.spinics.net/lists/arm-kernel/msg320407.html

please provide your inputs

Thanks,
Punnaiah

On Wed, Apr 9, 2014 at 8:49 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Apr 09, 2014 at 08:18:28AM -0500, Rob Herring wrote:
>> I don't think so, the PL310 is present on lots of ARM chips besides
>> Xilinx. I don't know how many support parity as that is optional. In
>> fact the highbank_l2_edac.c is for the PL310 as well, but the
>> registers it uses is all custom logic added for ECC and there is no
>> part of the PL310 h/w used by the driver.
>
> Oh ok, so highbank_l2 and PL310 could theoretically be merged together
> in one compilation unit, even if they don't really share code at all...
>
>> If there is lots duplication, then that's a sign the framework needs
>> to handle more of the boilerplate pieces. There could be a "simple"
>> driver/library for devices which are no more than some registers, an
>> interrupt handler and static information about the type of EDAC
>> device.
>
> Yeah, it's not that - I'm just getting worried that I'm receiving an
> EDAC driver for each piece of silicon out there and would like to still
> keep drivers/edac/ sane and be able to control that wild growth.
>
> I'm just thinking out loud here, bear with me pls:
>
> Frankly, having a single compilation unit contain similar silicon
> functionality could be a good way to put a hold on the growth but the
> disadvantage of this is fatter drivers. Which wouldn't matter all too
> much but after a certain level of fat, they might need splitting.
>
> And the highbank version is nothing but the big probe routine and a
> small irq handler.
>
> And the PL310 is similar but also with a poller.
>
> I guess, if they don't share functionality at all, putting them together
> might not be worth it. Hohummm.
>
> Thanks.
>
> --
> Regards/Gruss,
>     Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
Borislav Petkov April 9, 2014, 5:47 p.m. UTC | #8
On Wed, Apr 09, 2014 at 10:59:49PM +0530, Punnaiah Choudary wrote:
> There is a driver file cache-l2x0.c under arch/arm/mm for pl310 cache
> configuration and management. Russel king had suggested to use
> single driver file for both pl310 edac implementation and cache-l2x0.c
> Here is the thread
> http://www.spinics.net/lists/arm-kernel/msg320407.html
> 
> please provide your inputs

Well, it is very simple - you don't really need an EDAC compilation unit
just so that you can report errors - you can very well do that in your
cache-l2x0.c and boom, problem solved.

Simply move the irq handler and the parity error checker to that file.

:-)
Michal Simek April 10, 2014, 6:12 a.m. UTC | #9
On 04/09/2014 07:47 PM, Borislav Petkov wrote:
> On Wed, Apr 09, 2014 at 10:59:49PM +0530, Punnaiah Choudary wrote:
>> There is a driver file cache-l2x0.c under arch/arm/mm for pl310 cache
>> configuration and management. Russel king had suggested to use
>> single driver file for both pl310 edac implementation and cache-l2x0.c
>> Here is the thread
>> http://www.spinics.net/lists/arm-kernel/msg320407.html
>>
>> please provide your inputs
> 
> Well, it is very simple - you don't really need an EDAC compilation unit
> just so that you can report errors - you can very well do that in your
> cache-l2x0.c and boom, problem solved.
> 
> Simply move the irq handler and the parity error checker to that file.

I am just curious about this recommendation. Does it mean that we shouldn't
use edac interface just for reporting problems?

I didn't play with it but I guess that there is a record about edac driver
via sysfs, etc. You could read some useful information that you know
what it is protected and I hope that edac interface bring any value
to reporting errors comparing to just printk message from IRQ handler.

My question is if using printk in IRQ handler and report problem is equal to
reporting this via edac interface. Or it is just easy way to do but
using edac interface is right solution and how to do it properly
is different question.

Thanks,
Michal
Borislav Petkov April 10, 2014, 9:02 a.m. UTC | #10
On Thu, Apr 10, 2014 at 08:12:17AM +0200, Michal Simek wrote:
> I am just curious about this recommendation. Does it mean that we
> shouldn't use edac interface just for reporting problems?

Yes, we should. But, if you want to have two different drivers accessing
the hardware and have to build synchronization around it, I'm simply
putting the simplest solution on the table in case it is viable. If all
you want to do is report errors to dmesg, then you don't need the edac
stuff.

> I didn't play with it but I guess that there is a record about edac
> driver via sysfs, etc. You could read some useful information that you
> know what it is protected and I hope that edac interface bring any
> value to reporting errors comparing to just printk message from IRQ
> handler.

Yes, you can read error counts and DIMM layout. I'm adding a full dump
of sysfs on my machine at the end. Take a look at it and think whether
this could be something you could use/need/like.

But, if you have only one DIMM slot available on the hw (I'm not even
going to pretend to know your possible DIMM layouts) then maybe EDAC is
not needed at all - people would know which DIMM is at fault :-)

And so on and so on.

> My question is if using printk in IRQ handler and report problem is
> equal to reporting this via edac interface. Or it is just easy way
> to do but using edac interface is right solution and how to do it
> properly is different question.

Yeah, it would probably be easier if you would point out first what you
want to use the edac interface for and we can tell you whether it's
already there/doable/makes sense, etc.

Thanks.

/sys/devices/system/edac/mc/mc0/dbam:0x0000000000000077
/sys/devices/system/edac/mc/mc0/dhar:0x00000000c0004003
/sys/devices/system/edac/mc/mc0/inject_ecc_vector:0x0
/sys/devices/system/edac/mc/mc0/seconds_since_reset:169051
/sys/devices/system/edac/mc/mc0/ce_noinfo_count:0
/sys/devices/system/edac/mc/mc0/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank0/size:2048
/sys/devices/system/edac/mc/mc0/rank0/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank0/dimm_location:csrow 0 channel 0 
/sys/devices/system/edac/mc/mc0/rank0/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank0/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank0/dimm_label:mc#0csrow#0channel#0
/sys/devices/system/edac/mc/mc0/rank0/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank1/size:2048
/sys/devices/system/edac/mc/mc0/rank1/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank1/dimm_location:csrow 0 channel 1 
/sys/devices/system/edac/mc/mc0/rank1/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank1/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank1/dimm_label:mc#0csrow#0channel#1
/sys/devices/system/edac/mc/mc0/rank1/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank2/size:2048
/sys/devices/system/edac/mc/mc0/rank2/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank2/dimm_location:csrow 1 channel 0 
/sys/devices/system/edac/mc/mc0/rank2/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank2/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank2/dimm_label:mc#0csrow#1channel#0
/sys/devices/system/edac/mc/mc0/rank2/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank3/size:2048
/sys/devices/system/edac/mc/mc0/rank3/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank3/dimm_location:csrow 1 channel 1 
/sys/devices/system/edac/mc/mc0/rank3/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank3/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank3/dimm_label:mc#0csrow#1channel#1
/sys/devices/system/edac/mc/mc0/rank3/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank4/size:2048
/sys/devices/system/edac/mc/mc0/rank4/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank4/dimm_location:csrow 2 channel 0 
/sys/devices/system/edac/mc/mc0/rank4/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank4/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank4/dimm_label:mc#0csrow#2channel#0
/sys/devices/system/edac/mc/mc0/rank4/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank5/size:2048
/sys/devices/system/edac/mc/mc0/rank5/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank5/dimm_location:csrow 2 channel 1 
/sys/devices/system/edac/mc/mc0/rank5/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank5/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank5/dimm_label:mc#0csrow#2channel#1
/sys/devices/system/edac/mc/mc0/rank5/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank6/size:2048
/sys/devices/system/edac/mc/mc0/rank6/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank6/dimm_location:csrow 3 channel 0 
/sys/devices/system/edac/mc/mc0/rank6/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank6/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank6/dimm_label:mc#0csrow#3channel#0
/sys/devices/system/edac/mc/mc0/rank6/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank7/size:2048
/sys/devices/system/edac/mc/mc0/rank7/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank7/dimm_location:csrow 3 channel 1 
/sys/devices/system/edac/mc/mc0/rank7/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank7/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank7/dimm_label:mc#0csrow#3channel#1
/sys/devices/system/edac/mc/mc0/rank7/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/sdram_scrub_rate:195300
/sys/devices/system/edac/mc/mc0/inject_word:0x0
/sys/devices/system/edac/mc/mc0/size_mb:16384
/sys/devices/system/edac/mc/mc0/max_location:csrow 7 channel 1 
/sys/devices/system/edac/mc/mc0/mc_name:F15h
/sys/devices/system/edac/mc/mc0/csrow0/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:mc#0csrow#0channel#0
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:mc#0csrow#0channel#1
/sys/devices/system/edac/mc/mc0/csrow0/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow0/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow0/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow0/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow1/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow1/ch0_dimm_label:mc#0csrow#1channel#0
/sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch1_dimm_label:mc#0csrow#1channel#1
/sys/devices/system/edac/mc/mc0/csrow1/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow1/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow1/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow1/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow1/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow2/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow2/ch0_dimm_label:mc#0csrow#2channel#0
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch1_dimm_label:mc#0csrow#2channel#1
/sys/devices/system/edac/mc/mc0/csrow2/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow2/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow2/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow2/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow3/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow3/ch0_dimm_label:mc#0csrow#3channel#0
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch1_dimm_label:mc#0csrow#3channel#1
/sys/devices/system/edac/mc/mc0/csrow3/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow3/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow3/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow3/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow3/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/topmem2:0x000000043f000000
/sys/devices/system/edac/mc/mc0/ue_count:0
/sys/devices/system/edac/mc/mc0/topmem:0x00000000c0000000
/sys/devices/system/edac/mc/mc0/ue_noinfo_count:0
/sys/devices/system/edac/mc/mc0/ce_count:0
/sys/devices/system/edac/mc/mc0/dram_hole:c0000000 40000000 40000000
/sys/devices/system/edac/mc/mc0/inject_section:0x0
/sys/devices/system/edac/mc/power/async:disabled
/sys/devices/system/edac/pci/pci0/pe_count:0
/sys/devices/system/edac/pci/pci0/npe_count:0
/sys/devices/system/edac/pci/edac_pci_log_npe:1
/sys/devices/system/edac/pci/edac_pci_panic_on_pe:0
/sys/devices/system/edac/pci/edac_pci_log_pe:1
/sys/devices/system/edac/pci/pci_nonparity_count:0
/sys/devices/system/edac/pci/check_pci_errors:0
/sys/devices/system/edac/pci/pci_parity_count:0
/sys/devices/system/edac/power/async:disabled
Michal Simek April 10, 2014, 10:09 a.m. UTC | #11
Hi Borislav,

>> My question is if using printk in IRQ handler and report problem is
>> equal to reporting this via edac interface. Or it is just easy way
>> to do but using edac interface is right solution and how to do it
>> properly is different question.
> 
> Yeah, it would probably be easier if you would point out first what you
> want to use the edac interface for and we can tell you whether it's
> already there/doable/makes sense, etc.

OK. This is the v1 that's why we are having this discussion now
and I really appreciate your recommendation.
It wasn't too complicated to create this driver that's why not big deal
and it is always better to talk about the code and how to handle it properly.

I had a call with Punnaiah regarding this L2 edac driver and please
correct me if my understanding of pl310 parity error
is wrong (I didn't read pl310 specification).
There are 2 memories in pl310 - data and tag,
Through controller we are able to detect 2 errors:
data parity error and tag parity error.
Both of them are uncorrectable errors and could be just reported.
L2 contains data and instructions together that's why error can
happen in data or instruction part.

The question here is.
This driver is just reporting problem through edac interface
which is counting that errors and provide an unified way
how to report problems.

Maybe as you said we don't need to use edac interface at all
but by design because every error means that there is the problem and
error should be reported and system should be reset because we just
don't know where the problem is. We know that we have a problem.

The question also is if we should execute any code because the problem
can be with instructions and system should just reset.

Isn't there any security issue that even executing any code is a problem?

From the code I see that IRQ can be raised also for different things
because only errors are handled here (BTW: IRQ_HANDLED should be return when
IRQ is actually handled).

I just want to make sure that edac is right interface, we have right reactions
and this is how to handle this problem.

Thanks,
Michal
Borislav Petkov April 11, 2014, 1:14 p.m. UTC | #12
On Thu, Apr 10, 2014 at 12:09:03PM +0200, Michal Simek wrote:
> The question here is. This driver is just reporting problem through
> edac interface which is counting that errors and provide an unified
> way how to report problems.

Yes, normally you can use edac for reporting and error counting. But,
if, as I said earlier, it is easier to solve your issue of having two
entities touch one hardware and synchronizing around it is too much,
just for this one case, you can simply report the errors with simple
printk, without the edac interface.

This is why I was asking the practical question of why do you even need
the edac interface? If it is only for reporting, use printk and solve
the problem of having two drivers.

> Maybe as you said we don't need to use edac interface at all but by
> design because every error means that there is the problem and error
> should be reported and system should be reset because we just don't
> know where the problem is. We know that we have a problem.
>
> The question also is if we should execute any code because the problem
> can be with instructions and system should just reset.
>
> Isn't there any security issue that even executing any code is a
> problem?

Well, this is up to you to answer. If an UE (Uncorrectable Error) causes
data to get corrupted on your system, which, as a result, corrupts
visible state which lands on storage, you definitely want to stop
executing any code. x86 deals very rigorously with errors like those
by running an exception handler, on AMD there's also this thing called
syncflood which stops any execution and a warm reset happens.

So you have to think hard what those UEs cause on your systems and only
then act accordingly. If something bad like the above happens, the last
thing you want to do is report them to dmesg.

HTH.
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
new file mode 100644
index 0000000..94fbb8d
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
@@ -0,0 +1,19 @@ 
+Pl310 L2 Cache EDAC driver, it does reports the data and tag ram parity errors.
+
+Required properties:
+- compatible: Should be "arm,pl310-cache".
+- intterupts: Interrupt number to the cpu.
+- reg: Physical base address and size of cache controller's memory mapped
+  registers
+
+Example:
+++++++++
+
+	L2: cache-controller {
+		compatible = "arm,pl310-cache";
+		interrupts = <0 2 4>;
+		reg = <0xf8f02000 0x1000>;
+	};
+
+PL310 L2 Cache EDAC driver detects the Parity enable state by reading the
+appropriate control register.
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 878f090..059ac31 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -326,6 +326,13 @@  config EDAC_TILE
 	  Support for error detection and correction on the
 	  Tilera memory controller.
 
+config EDAC_PL310_L2
+	tristate "Pl310 L2 Cache Controller"
+	depends on EDAC_MM_EDAC && ARM
+	help
+	  Support for parity error detection on L2 cache controller
+	  data and tag ram memory
+
 config EDAC_HIGHBANK_MC
 	tristate "Highbank Memory Controller"
 	depends on EDAC_MM_EDAC && ARCH_HIGHBANK
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 4154ed6..179356f 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -64,3 +64,4 @@  obj-$(CONFIG_EDAC_OCTEON_PC)		+= octeon_edac-pc.o
 obj-$(CONFIG_EDAC_OCTEON_L2C)		+= octeon_edac-l2c.o
 obj-$(CONFIG_EDAC_OCTEON_LMC)		+= octeon_edac-lmc.o
 obj-$(CONFIG_EDAC_OCTEON_PCI)		+= octeon_edac-pci.o
+obj-$(CONFIG_EDAC_PL310_L2)		+= pl310_edac_l2.o
diff --git a/drivers/edac/pl310_edac_l2.c b/drivers/edac/pl310_edac_l2.c
new file mode 100644
index 0000000..649371a
--- /dev/null
+++ b/drivers/edac/pl310_edac_l2.c
@@ -0,0 +1,236 @@ 
+/*
+ * Pl310 L2 Cache EDAC Driver
+ *
+ * Copyright (C) 2013-2014 Xilinx, Inc.
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/edac.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <asm/hardware/cache-l2x0.h>
+#include "edac_core.h"
+
+/* Auxilary control register definitions */
+#define L2X0_AUX_CTRL_PARITY_MASK	BIT(21)
+
+/* Interrupt imask/status/clear register definitions */
+#define L2X0_INTR_PARRD_MASK	0x4
+#define L2X0_INTR_PARRT_MASK	0x2
+
+/**
+ * struct pl310_edac_l2_priv - Zynq L2 cache controller private instance data
+ * @base:		Base address of the controller
+ * @irq:		Interrupt number
+ */
+struct pl310_edac_l2_priv {
+	void __iomem *base;
+	int irq;
+};
+
+/**
+ * pl310_edac_l2_parityerr_check - Check controller staus for parity errors
+ * @dci:	Pointer to the edac device controller instance
+ *
+ * This routine is used to check and post parity errors
+ */
+static void pl310_edac_l2_parityerr_check(struct edac_device_ctl_info *dci)
+{
+	struct pl310_edac_l2_priv *priv = dci->pvt_info;
+	u32 regval;
+
+	regval = readl(priv->base + L2X0_RAW_INTR_STAT);
+	if (regval & L2X0_INTR_PARRD_MASK) {
+		/* Data parity error will be reported as correctable error */
+		writel(L2X0_INTR_PARRD_MASK, priv->base + L2X0_INTR_CLEAR);
+		edac_device_handle_ce(dci, 0, 0, dci->ctl_name);
+	}
+	if (regval & L2X0_INTR_PARRT_MASK) {
+		/* tag parity error will be reported as uncorrectable error */
+		writel(L2X0_INTR_PARRT_MASK, priv->base + L2X0_INTR_CLEAR);
+		edac_device_handle_ue(dci, 0, 0, dci->ctl_name);
+	}
+}
+
+/**
+ * pl310_edac_l2_int_handler - ISR fucntion for l2cahe controller
+ * @irq:	Irq Number
+ * @device:	Pointer to the edac device controller instance
+ *
+ * This routine is triggered whenever there is parity error detected
+ *
+ * Return: Always returns IRQ_HANDLED
+ */
+static irqreturn_t pl310_edac_l2_int_handler(int irq, void *device)
+{
+	pl310_edac_l2_parityerr_check((struct edac_device_ctl_info *)device);
+	return IRQ_HANDLED;
+}
+
+/**
+ * pl310_edac_l2_poll_handler - Poll the status reg for parity errors
+ * @dci:	Pointer to the edac device controller instance
+ *
+ * This routine is used to check and post parity errors and is called by
+ * the EDAC polling thread
+ */
+static void pl310_edac_l2_poll_handler(struct edac_device_ctl_info *dci)
+{
+	pl310_edac_l2_parityerr_check(dci);
+}
+
+/**
+ * pl310_edac_l2_get_paritystate - check the parity enable/disable status
+ * @base:	Pointer to the contoller base address
+ *
+ * This routine returns the parity enable/diable status for the controller
+ *
+ * Return: true/false -  parity enabled/disabled.
+ */
+static bool pl310_edac_l2_get_paritystate(void __iomem *base)
+{
+	u32 regval;
+
+	regval = readl(base + L2X0_AUX_CTRL);
+	if (regval & L2X0_AUX_CTRL_PARITY_MASK)
+		return true;
+
+	return false;
+}
+
+/**
+ * pl310_edac_l2_probe - Check controller and bind driver
+ * @pdev:	Pointer to the platform_device struct
+ *
+ * This routine probes a specific arm,pl310-cache instance for binding
+ * with the driver.
+ *
+ * Return: 0 if the controller instance was successfully bound to the
+ * driver; otherwise, < 0 on error.
+ */
+static int pl310_edac_l2_probe(struct platform_device *pdev)
+{
+	struct edac_device_ctl_info *dci;
+	struct pl310_edac_l2_priv *priv;
+	int rc;
+	struct resource *res;
+	void __iomem *baseaddr;
+	u32 regval;
+
+	/* Get the data from the platform device */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	baseaddr = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(baseaddr))
+		return PTR_ERR(baseaddr);
+
+	/* Check for the ecc enable status */
+	if (pl310_edac_l2_get_paritystate(baseaddr) == false) {
+		dev_err(&pdev->dev, "parity check not enabled\n");
+		return -ENXIO;
+	}
+
+	dci = edac_device_alloc_ctl_info(sizeof(*priv), "l2cache",
+					 1, "L", 1, 1, NULL, 0, 0);
+	if (IS_ERR(dci))
+		return PTR_ERR(dci);
+
+	priv = dci->pvt_info;
+	priv->base = baseaddr;
+	dci->dev = &pdev->dev;
+	dci->mod_name = "pl310_edac_l2";
+	dci->ctl_name = "pl310_l2_controller";
+	dci->dev_name = dev_name(&pdev->dev);
+
+	priv->irq = platform_get_irq(pdev, 0);
+	rc = devm_request_irq(&pdev->dev, priv->irq,
+				pl310_edac_l2_int_handler,
+				0, dev_name(&pdev->dev), (void *)dci);
+	if (rc < 0) {
+		dci->edac_check = pl310_edac_l2_poll_handler;
+		edac_op_state = EDAC_OPSTATE_POLL;
+	}
+
+	rc = edac_device_add_device(dci);
+	if (rc) {
+		dev_err(&pdev->dev, "failed to register with EDAC core\n");
+		goto del_edac_device;
+	}
+
+	if (edac_op_state != EDAC_OPSTATE_POLL) {
+		regval = readl(priv->base+L2X0_INTR_MASK);
+		regval |= (L2X0_INTR_PARRD_MASK | L2X0_INTR_PARRT_MASK);
+		writel(regval, priv->base+L2X0_INTR_MASK);
+	}
+
+	return rc;
+
+del_edac_device:
+	edac_device_del_device(&pdev->dev);
+	edac_device_free_ctl_info(dci);
+
+	return rc;
+}
+
+/**
+ * pl310_edac_l2_remove - Unbind driver from controller
+ * @pdev:	Pointer to the platform_device struct
+ *
+ * This routine unbinds the EDAC device controller instance associated
+ * with the specified arm,pl310-cache controller described by the
+ * OpenFirmware device tree node passed as a parameter.
+ *
+ * Return: Always returns 0
+ */
+static int pl310_edac_l2_remove(struct platform_device *pdev)
+{
+	struct edac_device_ctl_info *dci = platform_get_drvdata(pdev);
+	struct pl310_edac_l2_priv *priv = dci->pvt_info;
+	u32 regval;
+
+	if (edac_op_state != EDAC_OPSTATE_POLL) {
+		regval = readl(priv->base+L2X0_INTR_MASK);
+		regval &= ~(L2X0_INTR_PARRD_MASK | L2X0_INTR_PARRT_MASK);
+		writel(regval, priv->base+L2X0_INTR_MASK);
+	}
+
+	edac_device_del_device(&pdev->dev);
+	edac_device_free_ctl_info(dci);
+
+	return 0;
+}
+
+/* Device tree node type and compatible tuples this driver can match on */
+static struct of_device_id pl310_edac_l2_match[] = {
+	{ .compatible = "arm,pl310-cache", },
+	{ /* end of table */ }
+};
+
+MODULE_DEVICE_TABLE(of, pl310_edac_l2_match);
+
+static struct platform_driver pl310_edac_l2_driver = {
+	.driver = {
+		 .name = "pl310-edac-l2",
+		 .owner = THIS_MODULE,
+		 .of_match_table = pl310_edac_l2_match,
+	},
+	.probe = pl310_edac_l2_probe,
+	.remove = pl310_edac_l2_remove,
+};
+
+module_platform_driver(pl310_edac_l2_driver);
+
+MODULE_AUTHOR("Xilinx Inc.");
+MODULE_DESCRIPTION("pl310 L2 EDAC driver");
+MODULE_LICENSE("GPL v2");