
irqchip/armada-370-xp: Implement SoC Error interrupts

Message ID 20240814124537.29847-1-kabel@kernel.org (mailing list archive)
State New, archived
Series irqchip/armada-370-xp: Implement SoC Error interrupts

Commit Message

Marek Behún Aug. 14, 2024, 12:45 p.m. UTC
From: Pali Rohár <pali@kernel.org>

Interrupt 4 in MPIC is the SoC Error Summary interrupt. The SoC Error
Reporting provides another hierarchy of interrupts, the SoC Error
interrupts. These can be used for example for PCIe error reporting.

Implement a new IRQ chip and domain for accessing these interrupts.

The code binds this new interrupt domain to the same device-tree node as
the main interrupt domain. The main interrupt controller has its
interrupts described by one argument in device-tree
(#interrupt-cells = <1>), i.e.:

  interrupts-extended = <&mpic 8>;

Because of backwards compatibility we cannot change this number of
arguments, and so the SoC Error interrupts must also be described by
this one number.

Thus, to describe a SoC Error interrupt, one has to add an offset
to the SoC Error interrupt number. Offset 0x400 was chosen because the
main controller supports at most 1024 interrupts (in theory; in practice
it seems to be 116 interrupts on all supported platforms). An example of
describing a SoC Error interrupt is

  interrupts-extended = <&mpic 0x404>;

Signed-off-by: Pali Rohár <pali@kernel.org>
[ Marek: rebased, refactored, dropped the requirement for another
  interrupt-controller device-tree node by binding the new IRQ domain
  to the same node, changed commit message ]
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 drivers/irqchip/Kconfig             |   1 +
 drivers/irqchip/irq-armada-370-xp.c | 248 +++++++++++++++++++++++++++-
 2 files changed, 241 insertions(+), 8 deletions(-)

Comments

Thomas Gleixner Aug. 14, 2024, 3:51 p.m. UTC | #1
On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:

Cc+ device tree people.

> + The code binds this new interrupt domain to the same device-tree node as
> + the main interrupt domain. The main interrupt controller has its
> + interrupts described by one argument in device-tree
> + (#interrupt-cells = <1>), i.e.:
> + 
> +   interrupts-extended = <&mpic 8>;
> + 
> + Because of backwards compatibility we cannot change this number of
> + arguments, and so the SoC Error interrupts must also be described by
> + this one number.
> + 
> + Thus, to describe a SoC Error interrupt, one has to add an offset
> + to the SoC Error interrupt number. Offset 0x400 was chosen because the
> + main controller supports at most 1024 interrupts (in theory; in practice
> + it seems to be 116 interrupts on all supported platforms). An example of
> + describing a SoC Error interrupt is
> + 
> +   interrupts-extended = <&mpic 0x404>;

This looks like a horrible hack and I don't understand why this can't be
a separate interrupt controller, which it is in the hardware. That
controller utilizes interrupt 4 from the MPIC.

But then my DT foo is limited, so I let the DT folks comment on that.

> +static int mpic_soc_err_irq_set_affinity(struct irq_data *d, const struct cpumask *mask, bool force)
> +{
> +	unsigned int cpu;
> +
> +	/*
> +	 * TODO: The mpic->per_cpu region accesses CPU Local IC registers for CPU n when accessed
> +	 * from CPU n. Thus if we want to access these registers from another CPU, we need to request
> +	 * a function to be executed on CPU n. This is what we do here by calling smp_call_on_cpu().
> +	 *
> +	 * Instead, we could access CPU Local IC registers by having CPU Local region of each CPU
> +	 * mapped in the MPIC private data structure. We could do this either by extending the
> +	 * register resource in the device-tree, or by computing the physical base address of those
> +	 * regions relative to the main MPIC base address.

That requires locking for those registers obviously.

> +	 */
> +
> +	cpus_read_lock();

This code was clearly never tested with any debug enabled.

set_affinity() is invoked with interrupts disabled and irq_desc::lock
held. cpus_read_lock() can sleep... The mandatory debug options would
have told you loud and clearly.

> +	/* First, disable the ERR IRQ on all cores */
> +	for_each_online_cpu(cpu)
> +		smp_call_on_cpu(cpu, mpic_soc_err_irq_mask_on_cpu, d, true);

Again. smp_call_on_cpu() invokes wait_for_completion(), which obviously
can sleep.

Also why do you want to do that on _ALL_ CPUs if there is only one you
pick from the effective affinity mask?

> +	/* Then enable on one online core from the affinity mask */
> +	cpu = cpumask_any_and(mask, cpu_online_mask);
> +	smp_call_on_cpu(cpu, mpic_soc_err_irq_unmask_on_cpu, d, true);

Ditto.

So you really want to map the registers so they are accessible cross CPU
including locking. Alternatively pin the error interrupts to CPU0 which
cannot be unplugged and be done with it.
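
Something like this (untested sketch, reusing the names from the
patch) would be sufficient for the CPU0-pinning variant:

  static int mpic_soc_err_irq_set_affinity(struct irq_data *d,
                                           const struct cpumask *mask,
                                           bool force)
  {
          /* The mask registers are banked per CPU. Keep the error
           * interrupts on CPU0, which cannot be unplugged, and report
           * that as the effective affinity. */
          irq_data_update_effective_affinity(d, cpumask_of(0));
          return IRQ_SET_MASK_OK;
  }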

> +static int mpic_soc_err_irq_map(struct irq_domain *domain, unsigned int virq, irq_hw_number_t hwirq)
> +{
> +	struct mpic *mpic = domain->host_data;
> +
> +	irq_set_chip_data(virq, mpic);
> +
> +	mpic_soc_err_irq_mask(irq_get_irq_data(virq));

What for? It should be masked if it's not mapped, no?

> +	irq_set_status_flags(virq, IRQ_LEVEL);
> +	irq_set_chip_and_handler(virq, &mpic_soc_err_irq_chip, handle_level_irq);
> +	irq_set_probe(virq);
> +
> +	return 0;
> +}

> +static int mpic_soc_err_xlate(struct irq_domain *domain, struct device_node *np,
> +			      const u32 *spec, unsigned int count,
> +			      unsigned long *hwirq, unsigned int *type)
> +{
> +	int err = irq_domain_xlate_onecell(domain, np, spec, count, hwirq, type);
> +
> +	if (err)
> +		return err;
> +
> +	*hwirq -= MPIC_SOC_ERR_IRQS_OFFSET;
> +	return 0;
> +}

> +static int __init mpic_soc_err_init(struct mpic *mpic, struct device_node *np)
> +{
> +	unsigned int nr_irqs;
> +
> +	if (of_machine_is_compatible("marvell,armada-370-xp"))
> +		nr_irqs = 32;
> +	else
> +		nr_irqs = 64;
> +
> +	mpic->soc_err_domain = irq_domain_add_hierarchy(mpic->domain, 0, nr_irqs, np,
> +							&mpic_soc_err_irq_ops, mpic);

Why is this a hierarchical domain? That does not make any sense at
all. The MPIC domain provides only the demultiplexing interrupt and is
not involved in a domain hierarchy. Hierarchical domains are required
when the top level domain depends on resources from the parent domain
which need to be allocated when mapping an interrupt. e.g. on x86:

      vector - remap - PCI/MSI

So the top level PCI/MSI domain requires resources from the remap domain
and the remap domain requires a vector from the vector domain. These
resources are per interrupt.

During runtime there are also irq_*() callbacks which utilize callbacks
from the parent domains. I.e. mask() invokes mask(parent) ...

But that has nothing to do with demultiplexed interrupts because the
underlying demultiplex interrupt is already there and the same for all
interrupts in the demultiplexed domain. There is no callback dependency
and no resource dependency at all.
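
A demultiplexed case like this needs nothing more than a plain linear
domain (untested sketch, reusing the patch's names):

  mpic->soc_err_domain = irq_domain_add_linear(np, nr_irqs,
                                               &mpic_soc_err_irq_ops, mpic);
  if (!mpic->soc_err_domain)
          return -ENOMEM;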

Thanks,

        tglx
Conor Dooley Aug. 14, 2024, 4:43 p.m. UTC | #2
On Wed, Aug 14, 2024 at 05:51:33PM +0200, Thomas Gleixner wrote:
> On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:
> 
> Cc+ device tree people.
> 
> > + The code binds this new interrupt domain to the same device-tree node as
> > + the main interrupt domain. The main interrupt controller has its
> > + interrupts described by one argument in device-tree
> > + (#interrupt-cells = <1>), i.e.:
> > + 
> > +   interrupts-extended = <&mpic 8>;
> > + 
> > + Because of backwards compatibility we cannot change this number of
> > + arguments, and so the SoC Error interrupts must also be described by
> > + this one number.
> > + 
> > + Thus, to describe a SoC Error interrupt, one has to add an offset
> > + to the SoC Error interrupt number. Offset 0x400 was chosen because the
> > + main controller supports at most 1024 interrupts (in theory; in practice
> > + it seems to be 116 interrupts on all supported platforms). An example of
> > + describing a SoC Error interrupt is
> > + 
> > +   interrupts-extended = <&mpic 0x404>;
> 
> This looks like a horrible hack and I don't understand why this can't be
> a separate interrupt controller, which it is in the hardware. That
> controller utilizes interrupt 4 from the MPIC.
> 
> But then my DT foo is limited, so I let the DT folks comment on that.

I'm guessing, not being author and all that, that since the registers
for this SOC_ERR business are intermingled with those of the existing
interrupt controller it is not possible to create a standalone node for
the new controller with its own reg entry, without having to redo
the binding etc for the existing controller, ending up with some sort of
syscon.
Alternatively, I suppose a child node could work, but it wouldn't be much
less of a hack than deciding that numbers >= 0x400 are the SOC_ERR ones.
I see arguments for doing it either way and Marek must have an opinion
on doing it without a new node, given the comment suggesting that a
previous revision did have a dedicated node and that approach was
dropped.

Cheers,
Conor.
Marek Behún Aug. 15, 2024, 10:15 a.m. UTC | #3
On Wed, Aug 14, 2024 at 05:43:19PM +0100, Conor Dooley wrote:
> On Wed, Aug 14, 2024 at 05:51:33PM +0200, Thomas Gleixner wrote:
> > On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:
> > 
> > Cc+ device tree people.
> > 
> > > + The code binds this new interrupt domain to the same device-tree node as
> > > + the main interrupt domain. The main interrupt controller has its
> > > + interrupts described by one argument in device-tree
> > > + (#interrupt-cells = <1>), i.e.:
> > > + 
> > > +   interrupts-extended = <&mpic 8>;
> > > + 
> > > + Because of backwards compatibility we cannot change this number of
> > > + arguments, and so the SoC Error interrupts must also be described by
> > > + this one number.
> > > + 
> > > + Thus, to describe a SoC Error interrupt, one has to add an offset
> > > + to the SoC Error interrupt number. Offset 0x400 was chosen because the
> > > + main controller supports at most 1024 interrupts (in theory; in practice
> > > + it seems to be 116 interrupts on all supported platforms). An example of
> > > + describing a SoC Error interrupt is
> > > + 
> > > +   interrupts-extended = <&mpic 0x404>;
> > 
> > This looks like a horrible hack and I don't understand why this can't be
> > a separate interrupt controller, which it is in the hardware. That
> > controller utilizes interrupt 4 from the MPIC.
> > 
> > But then my DT foo is limited, so I let the DT folks comment on that.
> 
> I'm guessing, not being author and all that, that since the registers
> for this SOC_ERR business are intermingled with those of the existing
> interrupt controller it is not possible to create a standalone node for
> the new controller with its own reg entry, without having to redo
> the binding etc for the existing controller, ending up with some sort of
> syscon.
> Alternatively, I suppose a child node could work, but it wouldn't be much
> less of a hack than deciding that numbers >= 0x400 are the SOC_ERR ones.
> I see arguments for doing it either way and Marek must have an opinion
> on doing it without a new node, given the comment suggesting that a
> previous revision did have a dedicated node and that approach was
> dropped.

This is exactly the reason.

Pali's original code required creating another interrupt-controller node
(soc_err) within the mpic node:

   mpic: interrupt-controller@20a00 {
   	compatible = "marvell,mpic";
   	reg = <0x20a00 0x2d0>, <0x21070 0x58>;
   	#interrupt-cells = <1>;
   	interrupt-controller;
   	msi-controller;
   	interrupts = <GIC_PPI 15 IRQ_TYPE_LEVEL_HIGH>;

   	soc_err: interrupt-controller@20 {
   		interrupt-controller;
   		#interrupt-cells = <1>;
   	};
   };

I decided against it, because doing this correctly in device-tree
bindings gets unnecessarily complicated:
- it requires #address-cells and #size-cells within the mpic node
- the `interrupt-controller` property of the mpic node can in some
  interpretations clash with the `interrupt-controller@20` name of the
  child node
- the registers of soc_err and mpic overlap
- the child node should have a compatible string if we want to be proper

I also did consider syscon, but IMO it just adds unnecessary burden to
backwards compatibility of device-trees.

The solution with the 0x400 offset solves the backwards compatibility
issue.
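
For illustration, with the 0x400 offset both regular and SoC Error
interrupts are referenced through the same phandle; a made-up consumer
node (device name and interrupt numbers are illustrative, not from the
patch):

   pcie: pcie@80000 {
   	interrupts-extended = <&mpic 58>,	/* regular MPIC interrupt */
   			      <&mpic 0x404>;	/* SoC Error interrupt 4 */
   };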

Marek

Patch

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index d078bdc48c38..ac011e9df7f6 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -82,6 +82,7 @@  config IRQ_MSI_LIB
 config ARMADA_370_XP_IRQ
 	bool
 	select GENERIC_IRQ_CHIP
+	select IRQ_DOMAIN_HIERARCHY
 	select PCI_MSI if PCI
 	select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index d7c5ef248474..8075b25a66a5 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -27,6 +27,7 @@ 
 #include <linux/of_pci.h>
 #include <linux/irqdomain.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/syscore_ops.h>
 #include <linux/msi.h>
 #include <linux/types.h>
@@ -116,6 +117,8 @@ 
 #define MPIC_INT_CONTROL			0x00
 #define MPIC_INT_CONTROL_NUMINT_MASK		GENMASK(12, 2)
 #define MPIC_SW_TRIG_INT			0x04
+#define MPIC_INT_SOC_ERR_0_CAUSE		0x20
+#define MPIC_INT_SOC_ERR_1_CAUSE		0x24
 #define MPIC_INT_SET_ENABLE			0x30
 #define MPIC_INT_CLEAR_ENABLE			0x34
 #define MPIC_INT_SOURCE_CTL(hwirq)		(0x100 + (hwirq) * 4)
@@ -130,7 +133,9 @@ 
 #define MPIC_CPU_INTACK_IID_MASK		GENMASK(9, 0)
 #define MPIC_INT_SET_MASK			0x48
 #define MPIC_INT_CLEAR_MASK			0x4C
+#define MPIC_INT_SOC_ERR_0_MASK			0x50
 #define MPIC_INT_FABRIC_MASK			0x54
+#define MPIC_INT_SOC_ERR_1_MASK			0x58
 #define MPIC_INT_CAUSE_PERF(cpu)		BIT(cpu)
 
 #define MPIC_PER_CPU_IRQS_NR			29
@@ -149,12 +154,16 @@ 
 #define PCI_MSI_FULL_DOORBELL_SRC0_MASK		GENMASK(15, 0)
 #define PCI_MSI_FULL_DOORBELL_SRC1_MASK		GENMASK(31, 16)
 
+/* Device-tree binding offset for SoC Error interrupts */
+#define MPIC_SOC_ERR_IRQS_OFFSET		0x400
+
 /**
  * struct mpic - MPIC private data structure
  * @base:		MPIC registers base address
  * @per_cpu:		per-CPU registers base address
  * @parent_irq:		parent IRQ if MPIC is not top-level interrupt controller
  * @domain:		MPIC main interrupt domain
+ * @soc_err_domain:	SoC Error interrupt domain
  * @ipi_domain:		IPI domain
  * @msi_domain:		MSI domain
  * @msi_inner_domain:	MSI inner domain
@@ -172,6 +181,7 @@  struct mpic {
 	void __iomem *per_cpu;
 	int parent_irq;
 	struct irq_domain *domain;
+	struct irq_domain *soc_err_domain;
 #ifdef CONFIG_SMP
 	struct irq_domain *ipi_domain;
 #endif
@@ -542,19 +552,34 @@  static void mpic_smp_cpu_init(struct mpic *mpic)
 	writel(0, mpic->per_cpu + MPIC_INT_CLEAR_MASK);
 }
 
-static void mpic_reenable_percpu(struct mpic *mpic)
+static void mpic_reenable_percpu_domain(struct irq_domain *domain, irq_hw_number_t max,
+					void (*unmask_cb)(struct irq_data *))
 {
-	/* Re-enable per-CPU interrupts that were enabled before suspend */
-	for (irq_hw_number_t i = 0; i < MPIC_PER_CPU_IRQS_NR; i++) {
-		unsigned int virq = irq_linear_revmap(mpic->domain, i);
+	for (irq_hw_number_t i = 0; i < max; i++) {
+		unsigned int virq = irq_linear_revmap(domain, i);
 		struct irq_data *d;
 
 		if (!virq || !irq_percpu_is_enabled(virq))
 			continue;
 
 		d = irq_get_irq_data(virq);
-		mpic_irq_unmask(d);
+		unmask_cb(d);
 	}
+}
+
+static void mpic_soc_err_irq_unmask(struct irq_data *d);
+
+static void mpic_reenable_percpu(struct mpic *mpic)
+{
+	/* Re-enable per-CPU interrupts that were enabled before suspend */
+	mpic_reenable_percpu_domain(mpic->domain, MPIC_PER_CPU_IRQS_NR, mpic_irq_unmask);
+
+	/* Re-enable per-CPU SoC Error interrupts that were enabled before suspend */
+	mpic_reenable_percpu_domain(mpic->soc_err_domain, mpic->soc_err_domain->hwirq_max,
+				    mpic_soc_err_irq_unmask);
+
+	/* Unmask summary SoC Error Interrupt */
+	writel(4, mpic->per_cpu + MPIC_INT_CLEAR_MASK);
 
 	if (mpic_is_ipi_available(mpic))
 		mpic_ipi_resume(mpic);
@@ -599,12 +624,31 @@  static struct irq_chip mpic_irq_chip = {
 	.flags		= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_MASK_ON_SUSPEND,
 };
 
+static int mpic_domain_select(struct irq_domain *our_domain, irq_hw_number_t base,
+			      struct irq_domain *domain, struct irq_fwspec *fwspec,
+			      enum irq_domain_bus_token bus_token)
+{
+	u32 param = fwspec->param[0];
+
+	return domain == our_domain && domain->fwnode == fwspec->fwnode &&
+	       bus_token == DOMAIN_BUS_WIRED && fwspec->param_count == 1 &&
+	       param >= base && param < base + domain->hwirq_max;
+}
+
+static int mpic_irq_select(struct irq_domain *domain, struct irq_fwspec *fwspec,
+			   enum irq_domain_bus_token bus_token)
+{
+	struct mpic *mpic = domain->host_data;
+
+	return mpic_domain_select(mpic->domain, 0, domain, fwspec, bus_token);
+}
+
 static int mpic_irq_map(struct irq_domain *domain, unsigned int virq, irq_hw_number_t hwirq)
 {
 	struct mpic *mpic = domain->host_data;
 
-	/* IRQs 0 and 1 cannot be mapped, they are handled internally */
-	if (hwirq <= 1)
+	/* IRQs 0, 1 and 4 cannot be mapped, they are handled internally */
+	if (hwirq <= 1 || hwirq == 4)
 		return -EINVAL;
 
 	irq_set_chip_data(virq, mpic);
@@ -628,10 +672,173 @@  static int mpic_irq_map(struct irq_domain *domain, unsigned int virq, irq_hw_num
 }
 
 static const struct irq_domain_ops mpic_irq_ops = {
+	.select	= mpic_irq_select,
 	.map	= mpic_irq_map,
 	.xlate	= irq_domain_xlate_onecell,
 };
 
+static void mpic_soc_err_irq_mask_set(struct irq_data *d, bool mask)
+{
+	struct mpic *mpic = irq_data_get_irq_chip_data(d);
+	irq_hw_number_t hwirq = irqd_to_hwirq(d);
+	u32 bit = BIT(hwirq % 32);
+	u32 reg;
+
+	if (hwirq >= 32)
+		reg = MPIC_INT_SOC_ERR_1_MASK;
+	else
+		reg = MPIC_INT_SOC_ERR_0_MASK;
+
+	atomic_io_modify(mpic->per_cpu + reg, bit, mask ? 0 : bit);
+}
+
+static void mpic_soc_err_irq_mask(struct irq_data *d)
+{
+	mpic_soc_err_irq_mask_set(d, true);
+}
+
+static void mpic_soc_err_irq_unmask(struct irq_data *d)
+{
+	mpic_soc_err_irq_mask_set(d, false);
+}
+
+static int mpic_soc_err_irq_mask_on_cpu(void *d)
+{
+	mpic_soc_err_irq_mask(d);
+
+	return 0;
+}
+
+static int mpic_soc_err_irq_unmask_on_cpu(void *d)
+{
+	mpic_soc_err_irq_unmask(d);
+
+	return 0;
+}
+
+static int mpic_soc_err_irq_set_affinity(struct irq_data *d, const struct cpumask *mask, bool force)
+{
+	unsigned int cpu;
+
+	/*
+	 * TODO: The mpic->per_cpu region accesses CPU Local IC registers for CPU n when accessed
+	 * from CPU n. Thus if we want to access these registers from another CPU, we need to request
+	 * a function to be executed on CPU n. This is what we do here by calling smp_call_on_cpu().
+	 *
+	 * Instead, we could access CPU Local IC registers by having CPU Local region of each CPU
+	 * mapped in the MPIC private data structure. We could do this either by extending the
+	 * register resource in the device-tree, or by computing the physical base address of those
+	 * regions relative to the main MPIC base address.
+	 */
+
+	cpus_read_lock();
+
+	/* First, disable the ERR IRQ on all cores */
+	for_each_online_cpu(cpu)
+		smp_call_on_cpu(cpu, mpic_soc_err_irq_mask_on_cpu, d, true);
+
+	/* Then enable on one online core from the affinity mask */
+	cpu = cpumask_any_and(mask, cpu_online_mask);
+	smp_call_on_cpu(cpu, mpic_soc_err_irq_unmask_on_cpu, d, true);
+
+	cpus_read_unlock();
+
+	irq_data_update_effective_affinity(d, cpumask_of(cpu));
+
+	return IRQ_SET_MASK_OK;
+}
+
+static struct irq_chip mpic_soc_err_irq_chip = {
+	.name			= "MPIC SoC Errors",
+	.irq_mask		= mpic_soc_err_irq_mask,
+	.irq_unmask		= mpic_soc_err_irq_unmask,
+	.irq_set_affinity	= mpic_soc_err_irq_set_affinity,
+};
+
+static int mpic_soc_err_irq_select(struct irq_domain *domain, struct irq_fwspec *fwspec,
+				   enum irq_domain_bus_token bus_token)
+{
+	struct mpic *mpic = domain->host_data;
+
+	return mpic_domain_select(mpic->soc_err_domain, MPIC_SOC_ERR_IRQS_OFFSET, domain, fwspec,
+				  bus_token);
+}
+
+static int mpic_soc_err_irq_map(struct irq_domain *domain, unsigned int virq, irq_hw_number_t hwirq)
+{
+	struct mpic *mpic = domain->host_data;
+
+	irq_set_chip_data(virq, mpic);
+
+	mpic_soc_err_irq_mask(irq_get_irq_data(virq));
+	irq_set_status_flags(virq, IRQ_LEVEL);
+	irq_set_chip_and_handler(virq, &mpic_soc_err_irq_chip, handle_level_irq);
+	irq_set_probe(virq);
+
+	return 0;
+}
+
+static int mpic_soc_err_xlate(struct irq_domain *domain, struct device_node *np,
+			      const u32 *spec, unsigned int count,
+			      unsigned long *hwirq, unsigned int *type)
+{
+	int err = irq_domain_xlate_onecell(domain, np, spec, count, hwirq, type);
+
+	if (err)
+		return err;
+
+	*hwirq -= MPIC_SOC_ERR_IRQS_OFFSET;
+
+	return 0;
+}
+
+static const struct irq_domain_ops mpic_soc_err_irq_ops = {
+	.select	= mpic_soc_err_irq_select,
+	.map	= mpic_soc_err_irq_map,
+	.xlate	= mpic_soc_err_xlate,
+};
+
+static int __init mpic_soc_err_init(struct mpic *mpic, struct device_node *np)
+{
+	unsigned int nr_irqs;
+
+	if (of_machine_is_compatible("marvell,armada-370-xp"))
+		nr_irqs = 32;
+	else
+		nr_irqs = 64;
+
+	mpic->soc_err_domain = irq_domain_add_hierarchy(mpic->domain, 0, nr_irqs, np,
+							&mpic_soc_err_irq_ops, mpic);
+	if (WARN_ON(!mpic->soc_err_domain))
+		return -ENOMEM;
+
+	/* Unmask summary SoC Error Interrupt */
+	writel(4, mpic->per_cpu + MPIC_INT_CLEAR_MASK);
+
+	return 0;
+}
+
+static void mpic_handle_soc_err_reg(struct mpic *mpic, u32 mask_reg, u32 cause_reg,
+				    unsigned int offset)
+{
+	unsigned long cause;
+	unsigned int i;
+
+	cause = readl(mpic->base + cause_reg);
+	cause &= readl(mpic->per_cpu + mask_reg);
+
+	for_each_set_bit(i, &cause, BITS_PER_LONG)
+		generic_handle_domain_irq(mpic->soc_err_domain, i + offset);
+}
+
+static void mpic_handle_soc_err_irq(struct mpic *mpic)
+{
+	mpic_handle_soc_err_reg(mpic, MPIC_INT_SOC_ERR_0_MASK, MPIC_INT_SOC_ERR_0_CAUSE, 0);
+
+	if (mpic->soc_err_domain->hwirq_max > 32)
+		mpic_handle_soc_err_reg(mpic, MPIC_INT_SOC_ERR_1_MASK, MPIC_INT_SOC_ERR_1_CAUSE, 32);
+}
+
 #ifdef CONFIG_PCI_MSI
 static void mpic_handle_msi_irq(struct mpic *mpic)
 {
@@ -692,6 +899,11 @@  static void mpic_handle_cascade_irq(struct irq_desc *desc)
 			continue;
 		}
 
+		if (unlikely(i == 4)) {
+			mpic_handle_soc_err_irq(mpic);
+			continue;
+		}
+
 		generic_handle_domain_irq(mpic->domain, i);
 	}
 
@@ -711,9 +923,13 @@  static void __exception_irq_entry mpic_handle_irq(struct pt_regs *regs)
 		if (i > 1022)
 			break;
 
-		if (i > 1)
+		if (i > 1 && i != 4)
 			generic_handle_domain_irq(mpic->domain, i);
 
+		/* SoC Error handling */
+		if (unlikely(i == 4))
+			mpic_handle_soc_err_irq(mpic);
+
 		/* MSI handling */
 		if (i == 1)
 			mpic_handle_msi_irq(mpic);
@@ -766,6 +982,13 @@  static void mpic_resume(void)
 		}
 	}
 
+	/*
+	 * Re-enable per-CPU SoC Error interrupts on the current CPU, mpic_reenable_percpu() will
+	 * take care of secondary CPUs when they come up.
+	 */
+	mpic_reenable_percpu_domain(mpic->soc_err_domain, mpic->soc_err_domain->hwirq_max,
+				    mpic_soc_err_irq_unmask);
+
 	/* Reconfigure doorbells for IPIs and MSIs */
 	writel(mpic->doorbell_mask, mpic->per_cpu + MPIC_IN_DRBEL_MASK);
 
@@ -782,6 +1005,9 @@  static void mpic_resume(void)
 	if (src1)
 		writel(1, mpic->per_cpu + MPIC_INT_CLEAR_MASK);
 
+	/* Unmask summary SoC Error Interrupt */
+	writel(4, mpic->per_cpu + MPIC_INT_CLEAR_MASK);
+
 	if (mpic_is_ipi_available(mpic))
 		mpic_ipi_resume(mpic);
 }
@@ -873,6 +1099,12 @@  static int __init mpic_of_init(struct device_node *node, struct device_node *par
 	mpic_perf_init(mpic);
 	mpic_smp_cpu_init(mpic);
 
+	err = mpic_soc_err_init(mpic, node);
+	if (err) {
+		pr_err("%pOF: Unable to initialize SOC Error IRQs domain\n", node);
+		return err;
+	}
+
 	err = mpic_msi_init(mpic, node, phys_base);
 	if (err) {
 		pr_err("%pOF: Unable to initialize MSI domain\n", node);