diff mbox

[PATCHv3,2/7] EDAC, altera: Add panic flag check to A10 IRQ

Message ID 1465852752-11018-3-git-send-email-tthayer@opensource.altera.com (mailing list archive)
State New, archived
Headers show

Commit Message

tthayer@opensource.altera.com June 13, 2016, 9:19 p.m. UTC
From: Thor Thayer <tthayer@opensource.altera.com>

In preparation for additional memory module ECCs, the
IRQ function will check a panic flag before doing a
kernel panic on double bit errors. ECCs on buffers
will not cause a kernel panic on DBERRs.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
---
v2  New patch. Add panic flag to IRQ function.
v3  No change
---
 drivers/edac/altera_edac.c |    4 +++-
 drivers/edac/altera_edac.h |    1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Borislav Petkov June 17, 2016, 4:51 p.m. UTC | #1
On Mon, Jun 13, 2016 at 04:19:07PM -0500, tthayer@opensource.altera.com wrote:
> From: Thor Thayer <tthayer@opensource.altera.com>
> 
> In preparation for additional memory module ECCs, the
> IRQ function will check a panic flag before doing a
> kernel panic on double bit errors. ECCs on buffers
> will not cause a kernel panic on DBERRs.
> 
> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
> ---
> v2  New patch. Add panic flag to IRQ function.
> v3  No change
> ---
>  drivers/edac/altera_edac.c |    4 +++-
>  drivers/edac/altera_edac.h |    1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
> index 926bcaf..a9d8fa7 100644
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -897,7 +897,8 @@ static irqreturn_t altr_edac_a10_ecc_irq(int irq, void *dev_id)
>  		writel(ALTR_A10_ECC_DERRPENA,
>  		       base + ALTR_A10_ECC_INTSTAT_OFST);
>  		edac_device_handle_ue(dci->edac_dev, 0, 0, dci->edac_dev_name);
> -		panic("\nEDAC:ECC_DEVICE[Uncorrectable errors]\n");
> +		if (dci->data->panic)
> +			panic("\nEDAC:ECC_DEVICE[Uncorrectable errors]\n");
>  
>  		return IRQ_HANDLED;
>  	}
> @@ -936,6 +937,7 @@ const struct edac_device_prv_data a10_ocramecc_data = {
>  	.set_err_ofst = ALTR_A10_ECC_INTTEST_OFST,
>  	.ecc_irq_handler = altr_edac_a10_ecc_irq,
>  	.inject_fops = &altr_edac_a10_device_inject_fops,
> +	.panic = true,

So I could use a bit more detailed explanation here why OCRAM must panic
and the others don't. Consider me an external guy who doesn't know the
hardware and is looking at the driver and is wondering why this IP must
panic on double-bit errors and the others don't.

:-)

Thanks.
Borislav Petkov June 17, 2016, 5:02 p.m. UTC | #2
On Fri, Jun 17, 2016 at 12:05:41PM -0500, Thor Thayer wrote:
> That is a good question. We have 2 important uses for OCRAM 1) to hold our
> power-down/sleep and resume functions and 2) to hold our FPGA contents
> during sleep. If either of these is corrupted, it is better to panic than to
> load something that would cause incorrect.
> 
> In the cases of the FIFOs such as Ethernet and USB, the plan is to add code
> to drop the packet so that we'll get a re-transmission. In that case, it is
> sort of recoverable.

Much better. Now put that explanation in the code please!

:-)
tthayer@opensource.altera.com June 17, 2016, 5:05 p.m. UTC | #3
Hi Boris,

On 06/17/2016 11:51 AM, Borislav Petkov wrote:
> On Mon, Jun 13, 2016 at 04:19:07PM -0500, tthayer@opensource.altera.com wrote:
>> From: Thor Thayer <tthayer@opensource.altera.com>
>>
>> In preparation for additional memory module ECCs, the
>> IRQ function will check a panic flag before doing a
>> kernel panic on double bit errors. ECCs on buffers
>> will not cause a kernel panic on DBERRs.
>>
>> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
>> ---
>> v2  New patch. Add panic flag to IRQ function.
>> v3  No change
>> ---
>>   drivers/edac/altera_edac.c |    4 +++-
>>   drivers/edac/altera_edac.h |    1 +
>>   2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
>> index 926bcaf..a9d8fa7 100644
>> --- a/drivers/edac/altera_edac.c
>> +++ b/drivers/edac/altera_edac.c
>> @@ -897,7 +897,8 @@ static irqreturn_t altr_edac_a10_ecc_irq(int irq, void *dev_id)
>>   		writel(ALTR_A10_ECC_DERRPENA,
>>   		       base + ALTR_A10_ECC_INTSTAT_OFST);
>>   		edac_device_handle_ue(dci->edac_dev, 0, 0, dci->edac_dev_name);
>> -		panic("\nEDAC:ECC_DEVICE[Uncorrectable errors]\n");
>> +		if (dci->data->panic)
>> +			panic("\nEDAC:ECC_DEVICE[Uncorrectable errors]\n");
>>
>>   		return IRQ_HANDLED;
>>   	}
>> @@ -936,6 +937,7 @@ const struct edac_device_prv_data a10_ocramecc_data = {
>>   	.set_err_ofst = ALTR_A10_ECC_INTTEST_OFST,
>>   	.ecc_irq_handler = altr_edac_a10_ecc_irq,
>>   	.inject_fops = &altr_edac_a10_device_inject_fops,
>> +	.panic = true,
>
> So I could use a bit more detailed explanation here why OCRAM must panic
> and the others don't. Consider me an external guy who doesn't know the
> hardware and is looking at the driver and is wondering why this IP must
> panic on double-bit errors and the others don't.
>
> :-)
>
> Thanks.
>
That is a good question. We have 2 important uses for OCRAM 1) to hold 
our power-down/sleep and resume functions and 2) to hold our FPGA 
contents during sleep. If either of these is corrupted, it is better to 
panic than to load something that would cause incorrect.

In the cases of the FIFOs such as Ethernet and USB, the plan is to add 
code to drop the packet so that we'll get a re-transmission. In that 
case, it is sort of recoverable.

Thor
tthayer@opensource.altera.com June 17, 2016, 5:11 p.m. UTC | #4
On 06/17/2016 12:02 PM, Borislav Petkov wrote:
> On Fri, Jun 17, 2016 at 12:05:41PM -0500, Thor Thayer wrote:
>> That is a good question. We have 2 important uses for OCRAM 1) to hold our
>> power-down/sleep and resume functions and 2) to hold our FPGA contents
>> during sleep. If either of these is corrupted, it is better to panic than to
>> load something that would cause incorrect.
>>
>> In the cases of the FIFOs such as Ethernet and USB, the plan is to add code
>> to drop the packet so that we'll get a re-transmission. In that case, it is
>> sort of recoverable.
>
> Much better. Now put that explanation in the code please!
>
> :-)
>
I'll add that in the next revision. Thanks for reviewing.
diff mbox

Patch

diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index 926bcaf..a9d8fa7 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -897,7 +897,8 @@  static irqreturn_t altr_edac_a10_ecc_irq(int irq, void *dev_id)
 		writel(ALTR_A10_ECC_DERRPENA,
 		       base + ALTR_A10_ECC_INTSTAT_OFST);
 		edac_device_handle_ue(dci->edac_dev, 0, 0, dci->edac_dev_name);
-		panic("\nEDAC:ECC_DEVICE[Uncorrectable errors]\n");
+		if (dci->data->panic)
+			panic("\nEDAC:ECC_DEVICE[Uncorrectable errors]\n");
 
 		return IRQ_HANDLED;
 	}
@@ -936,6 +937,7 @@  const struct edac_device_prv_data a10_ocramecc_data = {
 	.set_err_ofst = ALTR_A10_ECC_INTTEST_OFST,
 	.ecc_irq_handler = altr_edac_a10_ecc_irq,
 	.inject_fops = &altr_edac_a10_device_inject_fops,
+	.panic = true,
 };
 
 #endif	/* CONFIG_EDAC_ALTERA_OCRAM */
diff --git a/drivers/edac/altera_edac.h b/drivers/edac/altera_edac.h
index 62b0fa0..cf4e8cb 100644
--- a/drivers/edac/altera_edac.h
+++ b/drivers/edac/altera_edac.h
@@ -298,6 +298,7 @@  struct edac_device_prv_data {
 	irqreturn_t (*ecc_irq_handler)(int irq, void *dev_id);
 	int trig_alloc_sz;
 	const struct file_operations *inject_fops;
+	bool panic;
 };
 
 struct altr_edac_device_dev {