Message ID | 20220324151258.943896-4-apatel@ventanamicro.com (mailing list archive) |
---|---|
State | New, archived |
Series | RISC-V IPI Improvements |
Anup,

On Thu, Mar 24 2022 at 20:42, Anup Patel wrote:
> All RISC-V platforms have a single HW IPI provided by the INTC local
> interrupt controller. The HW method to trigger INTC IPI can be through
> external irqchip (e.g. RISC-V AIA), through platform specific device
> (e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call).
>
> To support multiple IPIs on RISC-V, we need a generic mechanism to
> create multiple per-CPU vIRQs using a single HW IPI hence this patch.

git grep 'This patch' Documentation/process

> The generic IPI multiplex mechanism added by this patch can also be
> useful to other architectures.

Which ones? Sane architectures have more than one IPI.

> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index 848e1e12c5c6..cdce7eae2f37 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -1248,6 +1248,34 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
>  int ipi_send_single(unsigned int virq, unsigned int cpu);
>  int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
>
> +#define IPI_MUX_NR_IRQS BITS_PER_LONG
> +struct ipi_mux_ops {

This is unreadable. Newlines exist for a reason.

> +	void (*ipi_mux_clear)(unsigned int parent_virq);
> +	void (*ipi_mux_send)(unsigned int parent_virq,
> +			     const struct cpumask *mask);
> +};
> +
> +/* Process multiplexed IPIs */
> +void ipi_mux_process(void);
> +
> +/*
> + * Create multiple IPIs (total IPI_MUX_NR_IRQS) multiplexed on top of a
> + * single parent IPI.
> + *
> + * If the parent IPI > 0 then ipi_mux_process() will be automatically
> + * called via chained handler.
> + *
> + * If the parent IPI <= 0 then it is responsiblity of irqchip drivers
> + * to explicitly call ipi_mux_process() for processing muxed
> + * IPIs.
> + *
> + * Returns first virq of the muxed IPIs upon success or <=0 upon failure
> + */
> +int ipi_mux_create(unsigned int parent_virq, const struct ipi_mux_ops *ops);

While it is kinda sensible to have the documentation near the
declaration, I prefer it to be near the code because thats where it
matters and also has a higher chance to be updated when the code
changes.

Please use proper kernel doc while at it.

> +static unsigned int ipi_mux_parent_virq;
> +static struct irq_domain *ipi_mux_domain;
> +static const struct ipi_mux_ops *ipi_mux_ops;
> +static DEFINE_PER_CPU(unsigned long, ipi_mux_bits);
> +
> +static void ipi_mux_dummy(struct irq_data *d)
> +{
> +}
> +
> +static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
> +{
> +	int cpu;
> +
> +	/* Barrier before doing atomic bit update to IPI bits */
> +	smp_mb__before_atomic();
> +
> +	for_each_cpu(cpu, mask)
> +		set_bit(d->hwirq, per_cpu_ptr(&ipi_mux_bits, cpu));
> +
> +	/* Barrier after doing atomic bit update to IPI bits */
> +	smp_mb__after_atomic();
> +
> +	/* Trigger the parent IPI */
> +	ipi_mux_ops->ipi_mux_send(ipi_mux_parent_virq, mask);
> +}
> +
> +static struct irq_chip ipi_mux_chip = {
> +	.name = "RISC-V IPI Mux",

RISC-V IPI Mux is a truly generic name :)

> +	.irq_mask = ipi_mux_dummy,
> +	.irq_unmask = ipi_mux_dummy,
> +	.ipi_send_mask = ipi_mux_send_mask,
> +};
> +
> +static int ipi_mux_domain_map(struct irq_domain *d, unsigned int irq,
> +			      irq_hw_number_t hwirq)
> +{
> +	irq_set_percpu_devid(irq);
> +	irq_domain_set_info(d, irq, hwirq, &ipi_mux_chip, d->host_data,
> +			    handle_percpu_devid_irq, NULL, NULL);
> +
> +	return 0;
> +}
> +
> +static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
> +				unsigned int nr_irqs, void *arg)
> +{
> +	int i, ret;
> +	irq_hw_number_t hwirq;
> +	unsigned int type = IRQ_TYPE_NONE;
> +	struct irq_fwspec *fwspec = arg;

Documentation/process/maintainer-tip.rst #coding-style-notes

> +	ret = irq_domain_translate_onecell(d, fwspec, &hwirq, &type);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		ret = ipi_mux_domain_map(d, virq + i, hwirq + i);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct irq_domain_ops ipi_mux_domain_ops = {
> +	.translate = irq_domain_translate_onecell,
> +	.alloc = ipi_mux_domain_alloc,
> +	.free = irq_domain_free_irqs_top,
> +};
> +
> +void ipi_mux_process(void)
> +{
> +	int err;
> +	unsigned long irqs, *bits = this_cpu_ptr(&ipi_mux_bits);
> +	irq_hw_number_t hwirq;
> +
> +	while (true) {
> +		/* Clear the parent IPI */
> +		ipi_mux_ops->ipi_mux_clear(ipi_mux_parent_virq);

This being in a loop smells fishy at least without a comment. And the
more I read all of this the less I'm convinced that this code can be
used by anything else than RISCV.

> +		/* Order bit clearing and data access. */
> +		mb();

This mb() pairs with what? Memory barriers have a counterpart and it's
mandatory to document that in the comment.

> +		irqs = xchg(bits, 0);
> +		if (!irqs)
> +			break;
> +
> +		for_each_set_bit(hwirq, &irqs, IPI_MUX_NR_IRQS) {
> +			err = generic_handle_domain_irq(ipi_mux_domain,
> +							hwirq);
> +			if (unlikely(err))
> +				pr_warn_ratelimited(
> +					"can't find mapping for hwirq %lu\n",
> +					hwirq);
> +		}
> +	}
> +}
> +
> +
> +void ipi_mux_destroy(void)

Seriously? You provide a function to rip the IPI mechanism out in a
running system? What's that for?

> +{
> +	if (!ipi_mux_domain)
> +		return;
> +
> +	irq_domain_remove(ipi_mux_domain);
> +	ipi_mux_domain = NULL;
> +	ipi_mux_parent_virq = 0;

If it would be useful, then this would leak the hotplug callbacks, but
the good news is that after tearing down the IPI domain hotplug does not
work anymore :)

Thanks,

        tglx
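As an illustration of the kernel-doc request in the review above, here is a minimal sketch of how the comment for ipi_mux_create() might read in kernel-doc form when placed next to the definition. The stand-in declarations exist only so the snippet compiles outside the kernel, and the function body and the returned value 1 are hypothetical placeholders, not the code from the patch.

```c
#include <stddef.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
struct cpumask;

struct ipi_mux_ops {
	void (*ipi_mux_clear)(unsigned int parent_virq);
	void (*ipi_mux_send)(unsigned int parent_virq,
			     const struct cpumask *mask);
};

/**
 * ipi_mux_create - Create IPIs multiplexed on top of a single parent IPI
 * @parent_virq:	virq of the parent IPI, or 0 if the irqchip driver
 *			calls ipi_mux_process() itself
 * @ops:		callbacks used to clear and to send the parent IPI
 *
 * Return: first virq of the muxed IPIs on success, <= 0 on failure.
 */
int ipi_mux_create(unsigned int parent_virq, const struct ipi_mux_ops *ops)
{
	(void)parent_virq;

	/* Placeholder body: only the argument check is taken from the patch. */
	if (!ops || !ops->ipi_mux_send || !ops->ipi_mux_clear)
		return 0;

	return 1;	/* hypothetical first virq */
}
```

The `/**` opener, the `name - summary` first line, the `@argument:` entries and the `Return:` section are the parts that kernel-doc parses.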
On Mon, Apr 11, 2022 at 1:41 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Anup,
>
> On Thu, Mar 24 2022 at 20:42, Anup Patel wrote:
> > All RISC-V platforms have a single HW IPI provided by the INTC local
> > interrupt controller. The HW method to trigger INTC IPI can be through
> > external irqchip (e.g. RISC-V AIA), through platform specific device
> > (e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call).
> >
> > To support multiple IPIs on RISC-V, we need a generic mechanism to
> > create multiple per-CPU vIRQs using a single HW IPI hence this patch.
>
> git grep 'This patch' Documentation/process

Okay, I will update the commit description as per Documentation.

>
> > The generic IPI multiplex mechanism added by this patch can also be
> > useful to other architectures.
>
> Which ones? Sane architectures have more than one IPI.

Currently, the IPI muxing is shared code for various RISC-V drivers
(such as CLINT driver, SBI IPI irqchip driver, and AIA (coming soon)).
Overall, the IPI muxing seems independent of RISC-V so maybe it is
useful to have it as common selectable API.

>
> > diff --git a/include/linux/irq.h b/include/linux/irq.h
> > index 848e1e12c5c6..cdce7eae2f37 100644
> > --- a/include/linux/irq.h
> > +++ b/include/linux/irq.h
> > @@ -1248,6 +1248,34 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
> >  int ipi_send_single(unsigned int virq, unsigned int cpu);
> >  int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
> >
> > +#define IPI_MUX_NR_IRQS BITS_PER_LONG
> > +struct ipi_mux_ops {
>
> This is unreadable. Newlines exist for a reason.

Okay, I will add a newline above the struct.
>
> > +	void (*ipi_mux_clear)(unsigned int parent_virq);
> > +	void (*ipi_mux_send)(unsigned int parent_virq,
> > +			     const struct cpumask *mask);
> > +};
> > +
> > +/* Process multiplexed IPIs */
> > +void ipi_mux_process(void);
> > +
> > +/*
> > + * Create multiple IPIs (total IPI_MUX_NR_IRQS) multiplexed on top of a
> > + * single parent IPI.
> > + *
> > + * If the parent IPI > 0 then ipi_mux_process() will be automatically
> > + * called via chained handler.
> > + *
> > + * If the parent IPI <= 0 then it is responsiblity of irqchip drivers
> > + * to explicitly call ipi_mux_process() for processing muxed
> > + * IPIs.
> > + *
> > + * Returns first virq of the muxed IPIs upon success or <=0 upon failure
> > + */
> > +int ipi_mux_create(unsigned int parent_virq, const struct ipi_mux_ops *ops);
>
> While it is kinda sensible to have the documentation near the
> declaration, I prefer it to be near the code because thats where it
> matters and also has a higher chance to be updated when the code
> changes.

Okay, I will move documentation near the code.

>
> Please use proper kernel doc while at it.

Sure, I will update.
>
> > +static unsigned int ipi_mux_parent_virq;
> > +static struct irq_domain *ipi_mux_domain;
> > +static const struct ipi_mux_ops *ipi_mux_ops;
> > +static DEFINE_PER_CPU(unsigned long, ipi_mux_bits);
> > +
> > +static void ipi_mux_dummy(struct irq_data *d)
> > +{
> > +}
> > +
> > +static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
> > +{
> > +	int cpu;
> > +
> > +	/* Barrier before doing atomic bit update to IPI bits */
> > +	smp_mb__before_atomic();
> > +
> > +	for_each_cpu(cpu, mask)
> > +		set_bit(d->hwirq, per_cpu_ptr(&ipi_mux_bits, cpu));
> > +
> > +	/* Barrier after doing atomic bit update to IPI bits */
> > +	smp_mb__after_atomic();
> > +
> > +	/* Trigger the parent IPI */
> > +	ipi_mux_ops->ipi_mux_send(ipi_mux_parent_virq, mask);
> > +}
> > +
> > +static struct irq_chip ipi_mux_chip = {
> > +	.name = "RISC-V IPI Mux",
>
> RISC-V IPI Mux is a truly generic name :)

Aargh, I forgot to remove "RISC-V" from the name here. I will update.

>
> > +	.irq_mask = ipi_mux_dummy,
> > +	.irq_unmask = ipi_mux_dummy,
> > +	.ipi_send_mask = ipi_mux_send_mask,
> > +};
> > +
> > +static int ipi_mux_domain_map(struct irq_domain *d, unsigned int irq,
> > +			      irq_hw_number_t hwirq)
> > +{
> > +	irq_set_percpu_devid(irq);
> > +	irq_domain_set_info(d, irq, hwirq, &ipi_mux_chip, d->host_data,
> > +			    handle_percpu_devid_irq, NULL, NULL);
> > +
> > +	return 0;
> > +}
> > +
> > +static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
> > +				unsigned int nr_irqs, void *arg)
> > +{
> > +	int i, ret;
> > +	irq_hw_number_t hwirq;
> > +	unsigned int type = IRQ_TYPE_NONE;
> > +	struct irq_fwspec *fwspec = arg;
>
> Documentation/process/maintainer-tip.rst #coding-style-notes

Okay, I will refer and update.
>
> > +	ret = irq_domain_translate_onecell(d, fwspec, &hwirq, &type);
> > +	if (ret)
> > +		return ret;
> > +
> > +	for (i = 0; i < nr_irqs; i++) {
> > +		ret = ipi_mux_domain_map(d, virq + i, hwirq + i);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct irq_domain_ops ipi_mux_domain_ops = {
> > +	.translate = irq_domain_translate_onecell,
> > +	.alloc = ipi_mux_domain_alloc,
> > +	.free = irq_domain_free_irqs_top,
> > +};
> > +
> > +void ipi_mux_process(void)
> > +{
> > +	int err;
> > +	unsigned long irqs, *bits = this_cpu_ptr(&ipi_mux_bits);
> > +	irq_hw_number_t hwirq;
> > +
> > +	while (true) {
> > +		/* Clear the parent IPI */
> > +		ipi_mux_ops->ipi_mux_clear(ipi_mux_parent_virq);
>
> This being in a loop smells fishy at least without a comment. And the
> more I read all of this the less I'm convinced that this code can be
> used by anything else than RISCV.

The original IPI muxing code in RISC-V had this loop so I did not
remove it. Actually, the loop is redundant because if a CPU gets
another IPI while it was in ipi_mux_process() then another interrupt
will be taken and ipi_mux_process() will be called again.

I will test more and remove this loop.

>
> > +		/* Order bit clearing and data access. */
> > +		mb();
>
> This mb() pairs with what? Memory barriers have a counterpart and it's
> mandatory to document that in the comment.

It pairs with barriers in ipi_mux_send_mask(). I will update the
comment.

>
> > +		irqs = xchg(bits, 0);
> > +		if (!irqs)
> > +			break;
> > +
> > +		for_each_set_bit(hwirq, &irqs, IPI_MUX_NR_IRQS) {
> > +			err = generic_handle_domain_irq(ipi_mux_domain,
> > +							hwirq);
> > +			if (unlikely(err))
> > +				pr_warn_ratelimited(
> > +					"can't find mapping for hwirq %lu\n",
> > +					hwirq);
> > +		}
> > +	}
> > +}
> > +
> > +
> > +void ipi_mux_destroy(void)
>
> Seriously? You provide a function to rip the IPI mechanism out in a
> running system? What's that for?
>
> > +{
> > +	if (!ipi_mux_domain)
> > +		return;
> > +
> > +	irq_domain_remove(ipi_mux_domain);
> > +	ipi_mux_domain = NULL;
> > +	ipi_mux_parent_virq = 0;
>
> If it would be useful, then this would leak the hotplug callbacks, but
> the good news is that after tearing down the IPI domain hotplug does not
> work anymore :)

The only use of this function was to clean up in case the irqchip
driver failed after creating mux. I will certainly remove this function
in the next patch revision.

>
> Thanks,
>
>         tglx

Regards,
Anup
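The barrier-pairing question discussed in this reply can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming C11 atomics as a stand-in for the kernel primitives: the sequentially consistent atomic_fetch_or() plays the role of set_bit() together with its surrounding barriers, and atomic_exchange() plays the role of the xchg() in ipi_mux_process(). The names mux_bits, mux_send() and mux_drain() and the value of NR_CPUS are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define NR_CPUS		4
#define MUX_NR_IRQS	(sizeof(unsigned long) * CHAR_BIT)

/* One pending-IPI word per CPU, standing in for the per-CPU ipi_mux_bits. */
static _Atomic unsigned long mux_bits[NR_CPUS];

/* Sender side: mark hwirq as pending on the target CPU. */
static void mux_send(unsigned int cpu, unsigned int hwirq)
{
	assert(cpu < NR_CPUS && hwirq < MUX_NR_IRQS);

	/*
	 * Sequentially consistent read-modify-write: stores made by the
	 * sender before this point are visible to a receiver that observes
	 * the bit. Pairs with the atomic_exchange() in mux_drain().
	 */
	atomic_fetch_or(&mux_bits[cpu], 1UL << hwirq);

	/* A real driver would now trigger the single hardware IPI. */
}

/* Receiver side: atomically take all pending bits and reset the word. */
static unsigned long mux_drain(unsigned int cpu)
{
	assert(cpu < NR_CPUS);

	/* Pairs with the atomic_fetch_or() in mux_send(). */
	return atomic_exchange(&mux_bits[cpu], 0UL);
}
```

Each comment names its counterpart on the other side, which is the kind of documentation the review asks for. Because the exchange returns and clears the word in one step, a bit set concurrently is either included in the returned snapshot or left in place for the next drain.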
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 848e1e12c5c6..cdce7eae2f37 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1248,6 +1248,34 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
 int ipi_send_single(unsigned int virq, unsigned int cpu);
 int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
 
+#define IPI_MUX_NR_IRQS BITS_PER_LONG
+struct ipi_mux_ops {
+	void (*ipi_mux_clear)(unsigned int parent_virq);
+	void (*ipi_mux_send)(unsigned int parent_virq,
+			     const struct cpumask *mask);
+};
+
+/* Process multiplexed IPIs */
+void ipi_mux_process(void);
+
+/*
+ * Create multiple IPIs (total IPI_MUX_NR_IRQS) multiplexed on top of a
+ * single parent IPI.
+ *
+ * If the parent IPI > 0 then ipi_mux_process() will be automatically
+ * called via chained handler.
+ *
+ * If the parent IPI <= 0 then it is responsiblity of irqchip drivers
+ * to explicitly call ipi_mux_process() for processing muxed
+ * IPIs.
+ *
+ * Returns first virq of the muxed IPIs upon success or <=0 upon failure
+ */
+int ipi_mux_create(unsigned int parent_virq, const struct ipi_mux_ops *ops);
+
+/* Destroy multiplexed IPIs */
+void ipi_mux_destroy(void);
+
 #ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
 /*
  * Registers a generic IRQ handling function as the top-level IRQ handler in
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 10929eda9825..2388e7d40ed3 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -84,6 +84,10 @@ config GENERIC_IRQ_IPI
 	bool
 	select IRQ_DOMAIN_HIERARCHY
 
+# Generic IRQ IPI Mux support
+config GENERIC_IRQ_IPI_MUX
+	bool
+
 # Generic MSI interrupt support
 config GENERIC_MSI_IRQ
 	bool
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index b4f53717d143..f19d3080bf11 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_GENERIC_IRQ_MIGRATION) += cpuhotplug.o
 obj-$(CONFIG_PM_SLEEP) += pm.o
 obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
 obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
+obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
 obj-$(CONFIG_SMP) += affinity.o
 obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
 obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
diff --git a/kernel/irq/ipi-mux.c b/kernel/irq/ipi-mux.c
new file mode 100644
index 000000000000..7bf3d007b1e6
--- /dev/null
+++ b/kernel/irq/ipi-mux.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Multiplex several IPIs over a single HW IPI.
+ *
+ * Copyright (c) 2022 Ventana Micro Systems Inc.
+ */
+
+#define pr_fmt(fmt) "ipi-mux: " fmt
+#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/irq.h>
+#include <linux/irqchip.h>
+#include <linux/irqchip/chained_irq.h>
+#include <linux/irqdomain.h>
+#include <linux/smp.h>
+
+static unsigned int ipi_mux_parent_virq;
+static struct irq_domain *ipi_mux_domain;
+static const struct ipi_mux_ops *ipi_mux_ops;
+static DEFINE_PER_CPU(unsigned long, ipi_mux_bits);
+
+static void ipi_mux_dummy(struct irq_data *d)
+{
+}
+
+static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
+{
+	int cpu;
+
+	/* Barrier before doing atomic bit update to IPI bits */
+	smp_mb__before_atomic();
+
+	for_each_cpu(cpu, mask)
+		set_bit(d->hwirq, per_cpu_ptr(&ipi_mux_bits, cpu));
+
+	/* Barrier after doing atomic bit update to IPI bits */
+	smp_mb__after_atomic();
+
+	/* Trigger the parent IPI */
+	ipi_mux_ops->ipi_mux_send(ipi_mux_parent_virq, mask);
+}
+
+static struct irq_chip ipi_mux_chip = {
+	.name = "RISC-V IPI Mux",
+	.irq_mask = ipi_mux_dummy,
+	.irq_unmask = ipi_mux_dummy,
+	.ipi_send_mask = ipi_mux_send_mask,
+};
+
+static int ipi_mux_domain_map(struct irq_domain *d, unsigned int irq,
+			      irq_hw_number_t hwirq)
+{
+	irq_set_percpu_devid(irq);
+	irq_domain_set_info(d, irq, hwirq, &ipi_mux_chip, d->host_data,
+			    handle_percpu_devid_irq, NULL, NULL);
+
+	return 0;
+}
+
+static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
+				unsigned int nr_irqs, void *arg)
+{
+	int i, ret;
+	irq_hw_number_t hwirq;
+	unsigned int type = IRQ_TYPE_NONE;
+	struct irq_fwspec *fwspec = arg;
+
+	ret = irq_domain_translate_onecell(d, fwspec, &hwirq, &type);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < nr_irqs; i++) {
+		ret = ipi_mux_domain_map(d, virq + i, hwirq + i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static const struct irq_domain_ops ipi_mux_domain_ops = {
+	.translate = irq_domain_translate_onecell,
+	.alloc = ipi_mux_domain_alloc,
+	.free = irq_domain_free_irqs_top,
+};
+
+void ipi_mux_process(void)
+{
+	int err;
+	unsigned long irqs, *bits = this_cpu_ptr(&ipi_mux_bits);
+	irq_hw_number_t hwirq;
+
+	while (true) {
+		/* Clear the parent IPI */
+		ipi_mux_ops->ipi_mux_clear(ipi_mux_parent_virq);
+
+		/* Order bit clearing and data access. */
+		mb();
+
+		irqs = xchg(bits, 0);
+		if (!irqs)
+			break;
+
+		for_each_set_bit(hwirq, &irqs, IPI_MUX_NR_IRQS) {
+			err = generic_handle_domain_irq(ipi_mux_domain,
+							hwirq);
+			if (unlikely(err))
+				pr_warn_ratelimited(
+					"can't find mapping for hwirq %lu\n",
+					hwirq);
+		}
+	}
+}
+
+static void ipi_mux_handler(struct irq_desc *desc)
+{
+	struct irq_chip *chip = irq_desc_get_chip(desc);
+
+	chained_irq_enter(chip, desc);
+	ipi_mux_process();
+	chained_irq_exit(chip, desc);
+}
+
+static int ipi_mux_dying_cpu(unsigned int cpu)
+{
+	if (ipi_mux_parent_virq > 0)
+		disable_percpu_irq(ipi_mux_parent_virq);
+	return 0;
+}
+
+static int ipi_mux_starting_cpu(unsigned int cpu)
+{
+	if (ipi_mux_parent_virq > 0)
+		enable_percpu_irq(ipi_mux_parent_virq,
+				  irq_get_trigger_type(ipi_mux_parent_virq));
+	return 0;
+}
+
+int ipi_mux_create(unsigned int parent_virq, const struct ipi_mux_ops *ops)
+{
+	int virq;
+	struct irq_fwspec ipi;
+	struct irq_domain *domain;
+
+	if (ipi_mux_domain ||
+	    !ops || !ops->ipi_mux_send || !ops->ipi_mux_clear)
+		return 0;
+
+	domain = irq_domain_add_linear(NULL, IPI_MUX_NR_IRQS,
+				       &ipi_mux_domain_ops, NULL);
+	if (!domain) {
+		pr_err("unable to add IPI Mux domain\n");
+		return 0;
+	}
+
+	ipi.fwnode = domain->fwnode;
+	ipi.param_count = 1;
+	ipi.param[0] = 0;
+	virq = __irq_domain_alloc_irqs(domain, -1, IPI_MUX_NR_IRQS,
+				       NUMA_NO_NODE, &ipi, false, NULL);
+	if (virq <= 0) {
+		pr_err("unable to alloc IRQs from IPI Mux domain\n");
+		irq_domain_remove(domain);
+		return virq;
+	}
+
+	ipi_mux_domain = domain;
+	ipi_mux_parent_virq = parent_virq;
+	ipi_mux_ops = ops;
+
+	if (parent_virq > 0) {
+		irq_set_chained_handler(parent_virq, ipi_mux_handler);
+
+		cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				  "irqchip/ipi-mux:starting",
+				  ipi_mux_starting_cpu, ipi_mux_dying_cpu);
+	}
+
+	return virq;
+}
+
+void ipi_mux_destroy(void)
+{
+	if (!ipi_mux_domain)
+		return;
+
+	irq_domain_remove(ipi_mux_domain);
+	ipi_mux_domain = NULL;
+	ipi_mux_parent_virq = 0;
+}
All RISC-V platforms have a single HW IPI provided by the INTC local
interrupt controller. The HW method to trigger INTC IPI can be through
external irqchip (e.g. RISC-V AIA), through platform specific device
(e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call).

To support multiple IPIs on RISC-V, we need a generic mechanism to
create multiple per-CPU vIRQs using a single HW IPI hence this patch.

The generic IPI multiplex mechanism added by this patch can also be
useful to other architectures.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
---
 include/linux/irq.h  |  28 +++++++
 kernel/irq/Kconfig   |   4 +
 kernel/irq/Makefile  |   1 +
 kernel/irq/ipi-mux.c | 190 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 223 insertions(+)
 create mode 100644 kernel/irq/ipi-mux.c
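The demultiplexing step that the commit message describes, one hardware IPI fanned out to several virtual ones, can be mocked in userspace. The sketch below is an illustration under stated assumptions, not the kernel implementation: struct mux_ops mirrors only the clear callback of struct ipi_mux_ops, a plain handler function stands in for generic_handle_domain_irq(), and every demo_* name is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define MUX_NR_IRQS	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical mirror of the clear callback in struct ipi_mux_ops. */
struct mux_ops {
	void (*clear)(unsigned int parent);
};

typedef void (*mux_handler_t)(unsigned int hwirq);

/*
 * Process one snapshot of pending bits: acknowledge the parent IPI once,
 * then call the handler for every set bit, lowest hwirq first.
 * Returns the number of muxed IPIs handled.
 */
static unsigned int mux_process(const struct mux_ops *ops, unsigned int parent,
				unsigned long pending, mux_handler_t handler)
{
	unsigned int hwirq, handled = 0;

	ops->clear(parent);

	for (hwirq = 0; hwirq < MUX_NR_IRQS; hwirq++) {
		if (pending & (1UL << hwirq)) {
			handler(hwirq);
			handled++;
		}
	}

	return handled;
}

/* Demo callbacks that only record what happened. */
static unsigned int demo_clears, demo_last_hwirq;

static void demo_clear(unsigned int parent)
{
	(void)parent;
	demo_clears++;
}

static void demo_handler(unsigned int hwirq)
{
	demo_last_hwirq = hwirq;
}

static const struct mux_ops demo_ops = { .clear = demo_clear };
```

Calling `mux_process(&demo_ops, 1, 0x5UL, demo_handler)` acknowledges the parent once and dispatches hwirq 0 and hwirq 2, which is the single-pass behaviour the thread converges on once the loop is dropped.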