Message ID: f127e6d3f3ef64ea286b7b8f485e5d0c7f3faa6f.147018b3518.git.dalias@libc.org (mailing list archive)
State: New, archived
Hello Rich, On 08-04-16, Rich Felker wrote: > At the hardware level, the J-Core PIT is integrated with the interrupt > controller, but it is represented as its own device and has an > independent programming interface. It provides a 12-bit countdown > timer, which is not presently used, and a periodic timer. The interval > length for the latter is programmable via a 32-bit throttle register > whose units are determined by a bus-period register. The periodic > timer is used to implement both periodic and oneshot clock event > modes; in oneshot mode the interrupt handler simply disables the timer > as soon as it fires. > > Despite its device tree node representing an interrupt for the PIT, > the actual irq generated is programmable, not hard-wired. The driver > is responsible for programming the PIT to generate the hardware irq > number that the DT assigns to it. > > On SMP configurations, J-Core provides cpu-local instances of the PIT; > no broadcast timer is needed. This driver supports the creation of the > necessary per-cpu clock_event_device instances. The code has been > tested and works on SMP, but will not be usable without additional > J-Core SMP-support patches and appropriate hardware capable of running > SMP. > > A nanosecond-resolution clocksource is provided using the J-Core "RTC" > registers, which give a 64-bit seconds count and 32-bit nanoseconds. > The driver converts these to a 64-bit nanoseconds count. > > Signed-off-by: Rich Felker <dalias@libc.org> > ... > ... > ... 
> + > +static int jcore_pit_set_state_shutdown(struct clock_event_device *ced) > +{ > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > + > + return jcore_pit_disable(pit); > +} > + > +static int jcore_pit_set_state_oneshot(struct clock_event_device *ced) > +{ > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > + > + return jcore_pit_disable(pit); > +} Maybe to use only jcore_pit_set_state_shutdown() for both shutdown/oneshot states as it is implemented for some of others clocksource drivers? all the more so as described in your commit message, they do the same: > in oneshot mode the interrupt handler simply disables the timer > as soon as it fires right? > +static int __init jcore_pit_init(struct device_node *node) > +{ > + int err; > + unsigned pit_irq, cpu; > + unsigned long hwirq; > + u32 irqprio, enable_val; > + > + jcore_pit_base = of_iomap(node, 0); > + if (!jcore_pit_base) { > + pr_err("Error: Cannot map base address for J-Core PIT\n"); > + return -ENXIO; > + } > + > + pit_irq = irq_of_parse_and_map(node, 0); > + if (!pit_irq) { > + pr_err("Error: J-Core PIT has no IRQ\n"); > + return -ENXIO; > + } > + > + pr_info("Initializing J-Core PIT at %p IRQ %d\n", > + jcore_pit_base, pit_irq); > + > + jcore_cs.name = "jcore_pit_cs"; > + jcore_cs.rating = 400; > + jcore_cs.read = jcore_clocksource_read; > + jcore_cs.mult = 1; > + jcore_cs.shift = 0; > + jcore_cs.mask = CLOCKSOURCE_MASK(32); > + jcore_cs.flags = CLOCK_SOURCE_IS_CONTINUOUS; > + > + err = clocksource_register_hz(&jcore_cs, NSEC_PER_SEC); > + if (err) { > + pr_err("Error registering clocksource device: %d\n", err); > + return err; > + } > + > + sched_clock_register(jcore_sched_clock_read, 32, NSEC_PER_SEC); > + > + jcore_pit_percpu = alloc_percpu(struct jcore_pit); > + if (!jcore_pit_percpu) { > + pr_err("Failed to allocate memory for clock event device\n"); > + return -ENOMEM; > + } > + > + err = request_irq(pit_irq, jcore_timer_interrupt, > + IRQF_TIMER | 
IRQF_PERCPU, > + "jcore_pit", jcore_pit_percpu); > + if (err) { > + pr_err("pit irq request failed: %d\n", err); > + return err; > + } free_percpu() missed in a case when request_irq() failed. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 04, 2016 at 02:24:51PM +0600, Alexnader Kuleshov wrote: > Hello Rich, > > On 08-04-16, Rich Felker wrote: > > At the hardware level, the J-Core PIT is integrated with the interrupt > > controller, but it is represented as its own device and has an > > independent programming interface. It provides a 12-bit countdown > > timer, which is not presently used, and a periodic timer. The interval > > length for the latter is programmable via a 32-bit throttle register > > whose units are determined by a bus-period register. The periodic > > timer is used to implement both periodic and oneshot clock event > > modes; in oneshot mode the interrupt handler simply disables the timer > > as soon as it fires. > > > > Despite its device tree node representing an interrupt for the PIT, > > the actual irq generated is programmable, not hard-wired. The driver > > is responsible for programming the PIT to generate the hardware irq > > number that the DT assigns to it. > > > > On SMP configurations, J-Core provides cpu-local instances of the PIT; > > no broadcast timer is needed. This driver supports the creation of the > > necessary per-cpu clock_event_device instances. The code has been > > tested and works on SMP, but will not be usable without additional > > J-Core SMP-support patches and appropriate hardware capable of running > > SMP. > > > > A nanosecond-resolution clocksource is provided using the J-Core "RTC" > > registers, which give a 64-bit seconds count and 32-bit nanoseconds. > > The driver converts these to a 64-bit nanoseconds count. > > > > Signed-off-by: Rich Felker <dalias@libc.org> > > ... > > ... > > ... 
> > + > > +static int jcore_pit_set_state_shutdown(struct clock_event_device *ced) > > +{ > > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > > + > > + return jcore_pit_disable(pit); > > +} > > + > > +static int jcore_pit_set_state_oneshot(struct clock_event_device *ced) > > +{ > > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > > + > > + return jcore_pit_disable(pit); > > +} > > Maybe to use only jcore_pit_set_state_shutdown() for both shutdown/oneshot > states as it is implemented for some of others clocksource drivers? > > all the more so as described in your commit message, they do the same: > > > in oneshot mode the interrupt handler simply disables the timer > > as soon as it fires > > right? I separated these out semantically due to another request earlier in this patch's life cycle. At this point I feel like issues like this are really a bikeshed, and rather than changing trivial details back and forth over and over I'd like to see it go upstream so that I don't have to keep rebasing it on infrastructure that's changing underneath it. 
> > +static int __init jcore_pit_init(struct device_node *node) > > +{ > > + int err; > > + unsigned pit_irq, cpu; > > + unsigned long hwirq; > > + u32 irqprio, enable_val; > > + > > + jcore_pit_base = of_iomap(node, 0); > > + if (!jcore_pit_base) { > > + pr_err("Error: Cannot map base address for J-Core PIT\n"); > > + return -ENXIO; > > + } > > + > > + pit_irq = irq_of_parse_and_map(node, 0); > > + if (!pit_irq) { > > + pr_err("Error: J-Core PIT has no IRQ\n"); > > + return -ENXIO; > > + } > > + > > + pr_info("Initializing J-Core PIT at %p IRQ %d\n", > > + jcore_pit_base, pit_irq); > > + > > + jcore_cs.name = "jcore_pit_cs"; > > + jcore_cs.rating = 400; > > + jcore_cs.read = jcore_clocksource_read; > > + jcore_cs.mult = 1; > > + jcore_cs.shift = 0; > > + jcore_cs.mask = CLOCKSOURCE_MASK(32); > > + jcore_cs.flags = CLOCK_SOURCE_IS_CONTINUOUS; > > + > > + err = clocksource_register_hz(&jcore_cs, NSEC_PER_SEC); > > + if (err) { > > + pr_err("Error registering clocksource device: %d\n", err); > > + return err; > > + } > > + > > + sched_clock_register(jcore_sched_clock_read, 32, NSEC_PER_SEC); > > + > > + jcore_pit_percpu = alloc_percpu(struct jcore_pit); > > + if (!jcore_pit_percpu) { > > + pr_err("Failed to allocate memory for clock event device\n"); > > + return -ENOMEM; > > + } > > + > > + err = request_irq(pit_irq, jcore_timer_interrupt, > > + IRQF_TIMER | IRQF_PERCPU, > > + "jcore_pit", jcore_pit_percpu); > > + if (err) { > > + pr_err("pit irq request failed: %d\n", err); > > + return err; > > + } > > free_percpu() missed in a case when request_irq() failed. Shall I submit a new version with this change? I don't think it's particularly useful since you're not going to have a working system if the timer fails to initialize anyway, but I can do it if desired. Rich
On 08/04/2016 06:30 AM, Rich Felker wrote: > At the hardware level, the J-Core PIT is integrated with the interrupt > controller, but it is represented as its own device and has an > independent programming interface. It provides a 12-bit countdown > timer, which is not presently used, and a periodic timer. The interval > length for the latter is programmable via a 32-bit throttle register > whose units are determined by a bus-period register. The periodic > timer is used to implement both periodic and oneshot clock event > modes; in oneshot mode the interrupt handler simply disables the timer > as soon as it fires. > > Despite its device tree node representing an interrupt for the PIT, > the actual irq generated is programmable, not hard-wired. The driver > is responsible for programming the PIT to generate the hardware irq > number that the DT assigns to it. > > On SMP configurations, J-Core provides cpu-local instances of the PIT; > no broadcast timer is needed. This driver supports the creation of the > necessary per-cpu clock_event_device instances. The code has been > tested and works on SMP, but will not be usable without additional > J-Core SMP-support patches and appropriate hardware capable of running > SMP. > > A nanosecond-resolution clocksource is provided using the J-Core "RTC" > registers, which give a 64-bit seconds count and 32-bit nanoseconds. > The driver converts these to a 64-bit nanoseconds count. 
> > Signed-off-by: Rich Felker <dalias@libc.org> > --- > drivers/clocksource/Kconfig | 9 ++ > drivers/clocksource/Makefile | 1 + > drivers/clocksource/jcore-pit.c | 242 ++++++++++++++++++++++++++++++++++++++++ > include/linux/cpuhotplug.h | 1 + > 4 files changed, 253 insertions(+) > create mode 100644 drivers/clocksource/jcore-pit.c > > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig > index 5677886..3210ca5 100644 > --- a/drivers/clocksource/Kconfig > +++ b/drivers/clocksource/Kconfig > @@ -407,6 +407,15 @@ config SYS_SUPPORTS_SH_TMU > config SYS_SUPPORTS_EM_STI > bool > > +config CLKSRC_JCORE_PIT > + bool "J-Core PIT timer driver" > + depends on OF && (SUPERH || COMPILE_TEST) Even if this is correct, for the sake of consistency, it is preferable to change it to: bool "J-Core PIT timer driver" if COMPILE_TEST depends on SUPERH select CLKSRC_OF > + depends on GENERIC_CLOCKEVENTS > + depends on HAS_IOMEM > + help > + This enables build of clocksource and clockevent driver for > + the integrated PIT in the J-Core synthesizable, open source SoC. 
> + > config SH_TIMER_CMT > bool "Renesas CMT timer driver" if COMPILE_TEST > depends on GENERIC_CLOCKEVENTS > diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile > index fd9d6df..cf87f40 100644 > --- a/drivers/clocksource/Makefile > +++ b/drivers/clocksource/Makefile > @@ -5,6 +5,7 @@ obj-$(CONFIG_ATMEL_TCB_CLKSRC) += tcb_clksrc.o > obj-$(CONFIG_X86_PM_TIMER) += acpi_pm.o > obj-$(CONFIG_SCx200HR_TIMER) += scx200_hrt.o > obj-$(CONFIG_CS5535_CLOCK_EVENT_SRC) += cs5535-clockevt.o > +obj-$(CONFIG_CLKSRC_JCORE_PIT) += jcore-pit.o > obj-$(CONFIG_SH_TIMER_CMT) += sh_cmt.o > obj-$(CONFIG_SH_TIMER_MTU2) += sh_mtu2.o > obj-$(CONFIG_SH_TIMER_TMU) += sh_tmu.o > diff --git a/drivers/clocksource/jcore-pit.c b/drivers/clocksource/jcore-pit.c > new file mode 100644 > index 0000000..23dee50 > --- /dev/null > +++ b/drivers/clocksource/jcore-pit.c > @@ -0,0 +1,242 @@ > +/* > + * J-Core SoC PIT/clocksource driver > + * > + * Copyright (C) 2015-2016 Smart Energy Instruments, Inc. > + * > + * This file is subject to the terms and conditions of the GNU General Public > + * License. See the file "COPYING" in the main directory of this archive > + * for more details. 
> + */ > + > +#include <linux/kernel.h> > +#include <linux/slab.h> > +#include <linux/interrupt.h> > +#include <linux/clockchips.h> > +#include <linux/clocksource.h> > +#include <linux/sched_clock.h> > +#include <linux/cpu.h> > +#include <linux/cpuhotplug.h> > +#include <linux/of_address.h> > +#include <linux/of_irq.h> > + > +#define PIT_IRQ_SHIFT 12 > +#define PIT_PRIO_SHIFT 20 > +#define PIT_ENABLE_SHIFT 26 > +#define PIT_IRQ_MASK 0x3f > +#define PIT_PRIO_MASK 0xf > + > +#define REG_PITEN 0x00 > +#define REG_THROT 0x10 > +#define REG_COUNT 0x14 > +#define REG_BUSPD 0x18 > +#define REG_SECHI 0x20 > +#define REG_SECLO 0x24 > +#define REG_NSEC 0x28 > + > +struct jcore_pit { > + struct clock_event_device ced; > + __iomem void *base; > + unsigned long periodic_delta; > + unsigned cpu; > + u32 enable_val; > +}; > + > +static __iomem void *jcore_pit_base; > +static struct clocksource jcore_cs; > +struct jcore_pit __percpu *jcore_pit_percpu; > + > +static notrace u64 jcore_sched_clock_read(void) > +{ > + u32 seclo, nsec, seclo0; > + __iomem void *base = jcore_pit_base; > + > + seclo = __raw_readl(base + REG_SECLO); > + do { > + seclo0 = seclo; > + nsec = __raw_readl(base + REG_NSEC); > + seclo = __raw_readl(base + REG_SECLO); > + } while (seclo0 != seclo); > + > + return seclo * NSEC_PER_SEC + nsec; > +} > + > +static cycle_t jcore_clocksource_read(struct clocksource *cs) > +{ > + return jcore_sched_clock_read(); > +} > + > +static int jcore_pit_disable(struct jcore_pit *pit) > +{ > + __raw_writel(0, pit->base + REG_PITEN); > + return 0; > +} > + > +static int jcore_pit_set(unsigned long delta, struct jcore_pit *pit) > +{ > + jcore_pit_disable(pit); > + __raw_writel(delta, pit->base + REG_THROT); > + __raw_writel(pit->enable_val, pit->base + REG_PITEN); > + return 0; Why do you need to use __raw_writel ? 
s/__raw_writel/writel/ > +} > + > +static int jcore_pit_set_state_shutdown(struct clock_event_device *ced) > +{ > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > + > + return jcore_pit_disable(pit); > +} > + > +static int jcore_pit_set_state_oneshot(struct clock_event_device *ced) > +{ > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > + > + return jcore_pit_disable(pit); > +} > + > +static int jcore_pit_set_state_periodic(struct clock_event_device *ced) > +{ > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > + > + return jcore_pit_set(pit->periodic_delta, pit); > +} > + > +static int jcore_pit_set_next_event(unsigned long delta, > + struct clock_event_device *ced) > +{ > + struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced); > + > + return jcore_pit_set(delta, pit); > +} > + > +static int jcore_pit_local_init(unsigned cpu) > +{ > + struct jcore_pit *pit = this_cpu_ptr(jcore_pit_percpu); > + unsigned buspd, freq; > + > + pr_info("Local J-Core PIT init on cpu %u\n", pit->cpu); > + > + buspd = __raw_readl(pit->base + REG_BUSPD); s/__raw_readl/readl/ ? > + freq = DIV_ROUND_CLOSEST(NSEC_PER_SEC, buspd); > + pit->periodic_delta = DIV_ROUND_CLOSEST(NSEC_PER_SEC, HZ*buspd); ---> HZ * buspd > + > + clockevents_config_and_register(&pit->ced, freq, 1, ULONG_MAX); > + > + return 0; > +} > + > +static int jcore_pit_local_shutdown(unsigned cpu) > +{ > + return 0; > +} This function is useless I think. AFAIU, cpuhp_setup_state can have a NULL function for the cpu teardown. 
> + > +static irqreturn_t jcore_timer_interrupt(int irq, void *dev_id) > +{ > + struct jcore_pit *pit = this_cpu_ptr(dev_id); > + > + if (clockevent_state_oneshot(&pit->ced)) > + jcore_pit_disable(pit); > + > + pit->ced.event_handler(&pit->ced); > + > + return IRQ_HANDLED; > +} > + > +static int __init jcore_pit_init(struct device_node *node) > +{ > + int err; > + unsigned pit_irq, cpu; > + unsigned long hwirq; > + u32 irqprio, enable_val; > + > + jcore_pit_base = of_iomap(node, 0); > + if (!jcore_pit_base) { > + pr_err("Error: Cannot map base address for J-Core PIT\n"); > + return -ENXIO; > + } > + > + pit_irq = irq_of_parse_and_map(node, 0); > + if (!pit_irq) { > + pr_err("Error: J-Core PIT has no IRQ\n"); > + return -ENXIO; > + } > + > + pr_info("Initializing J-Core PIT at %p IRQ %d\n", > + jcore_pit_base, pit_irq); > + > + jcore_cs.name = "jcore_pit_cs"; > + jcore_cs.rating = 400; > + jcore_cs.read = jcore_clocksource_read; > + jcore_cs.mult = 1; > + jcore_cs.shift = 0; > + jcore_cs.mask = CLOCKSOURCE_MASK(32); > + jcore_cs.flags = CLOCK_SOURCE_IS_CONTINUOUS; > + > + err = clocksource_register_hz(&jcore_cs, NSEC_PER_SEC); > + if (err) { > + pr_err("Error registering clocksource device: %d\n", err); > + return err; > + } May be you can consider by replacing the above by: clocksource_mmio_init(jcore_pit_base, "jcore_pit_cs", NSEC_PER_SEC, 32, jcore_clocksource_read); > + > + sched_clock_register(jcore_sched_clock_read, 32, NSEC_PER_SEC); > + > + jcore_pit_percpu = alloc_percpu(struct jcore_pit); > + if (!jcore_pit_percpu) { > + pr_err("Failed to allocate memory for clock event device\n"); > + return -ENOMEM; > + } > + > + err = request_irq(pit_irq, jcore_timer_interrupt, > + IRQF_TIMER | IRQF_PERCPU, > + "jcore_pit", jcore_pit_percpu); > + if (err) { > + pr_err("pit irq request failed: %d\n", err); > + return err; > + } That is my major concern. 
Regarding the description of this timer, it appears there is one timer per cpu but we use request_irq instead of request_percpu_irq. IIUC, there is a problem with the interrupt controller where the per irq line are not working correctly. Is that correct ? Regarding Marc Zyngier comments about the irq controller driver being almost empty, I'm wondering if something in the irq controller driver which shouldn't be added before submitting this timer driver with SMP support (eg. irq domain ?). > + /* > + * The J-Core PIT is not hard-wired to a particular IRQ, but > + * integrated with the interrupt controller such that the IRQ it > + * generates is programmable. The programming interface has a > + * legacy field which was an interrupt priority for AIC1, but > + * which is OR'd onto bits 2-5 of the generated IRQ number when > + * used with J-Core AIC2, so set it to match these bits. > + */ > + hwirq = irq_get_irq_data(pit_irq)->hwirq; irq_hw_number_t hwirq; hwirq = irqd_to_hwirq(irq_get_irq_data(pit_irq)); > + irqprio = (hwirq >> 2) & PIT_PRIO_MASK; > + enable_val = (1U << PIT_ENABLE_SHIFT) > + | (hwirq << PIT_IRQ_SHIFT) > + | (irqprio << PIT_PRIO_SHIFT); I'm missing the connection between the description above and the enable value computed here. Can you elaborate ? 
> + > + for_each_present_cpu(cpu) { > + struct jcore_pit *pit = per_cpu_ptr(jcore_pit_percpu, cpu); > + > + pit->base = of_iomap(node, cpu); > + if (!pit->base) { > + pr_err("Unable to map PIT for cpu %u\n", cpu); > + continue; > + } > + > + pit->ced.name = "jcore_pit"; > + pit->ced.features = CLOCK_EVT_FEAT_PERIODIC > + | CLOCK_EVT_FEAT_ONESHOT > + | CLOCK_EVT_FEAT_PERCPU; > + pit->ced.cpumask = cpumask_of(cpu); > + pit->ced.rating = 400; > + pit->ced.irq = pit_irq; > + pit->ced.set_state_shutdown = jcore_pit_set_state_shutdown; > + pit->ced.set_state_periodic = jcore_pit_set_state_periodic; > + pit->ced.set_state_oneshot = jcore_pit_set_state_oneshot; > + pit->ced.set_next_event = jcore_pit_set_next_event; > + > + pit->cpu = cpu; > + pit->enable_val = enable_val; > + } > + > + cpuhp_setup_state(CPUHP_AP_JCORE_TIMER_STARTING, > + "AP_JCORE_TIMER_STARTING", > + jcore_pit_local_init, jcore_pit_local_shutdown); > + > + return 0; > +} > + > +CLOCKSOURCE_OF_DECLARE(jcore_pit, "jcore,pit", jcore_pit_init); > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h > index 242bf53..e95ed7d 100644 > --- a/include/linux/cpuhotplug.h > +++ b/include/linux/cpuhotplug.h > @@ -50,6 +50,7 @@ enum cpuhp_state { > CPUHP_AP_ARM_ARCH_TIMER_STARTING, > CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, > CPUHP_AP_DUMMY_TIMER_STARTING, > + CPUHP_AP_JCORE_TIMER_STARTING, > CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, > CPUHP_AP_ARM_TWD_STARTING, > CPUHP_AP_METAG_TIMER_STARTING, >
On Wed, Aug 24, 2016 at 06:42:05PM +0200, Daniel Lezcano wrote: > > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig > > index 5677886..3210ca5 100644 > > --- a/drivers/clocksource/Kconfig > > +++ b/drivers/clocksource/Kconfig > > @@ -407,6 +407,15 @@ config SYS_SUPPORTS_SH_TMU > > config SYS_SUPPORTS_EM_STI > > bool > > > > +config CLKSRC_JCORE_PIT > > + bool "J-Core PIT timer driver" > > + depends on OF && (SUPERH || COMPILE_TEST) > > Even if this is correct, for the sake of consistency, it is preferable > to change it to: > > bool "J-Core PIT timer driver" if COMPILE_TEST > depends on SUPERH > select CLKSRC_OF Is this functionally equivalent? If so that's non-obvious to me. What about the dropped OF dependency? The intent is that the option should always be available for SUPERH targets using OF, otherwise only available with COMPILE_TEST. > > +static int jcore_pit_set(unsigned long delta, struct jcore_pit *pit) > > +{ > > + jcore_pit_disable(pit); > > + __raw_writel(delta, pit->base + REG_THROT); > > + __raw_writel(pit->enable_val, pit->base + REG_PITEN); > > + return 0; > > Why do you need to use __raw_writel ? > > s/__raw_writel/writel/ I actually tried multiple times to find good resources on policy for which form to prefer, but didn't have much luck. My understanding is that __raw_writel/__raw_readl always performs a native-endian load/store, whereas writel/readl behavior depends on cpu endianness and whether the arch declares that "pci bus" (that was the term I found in the source, not relevant here) io is endian-swapped or not. Can you give me a better explanation of why we might prefer one form or the other? > > + freq = DIV_ROUND_CLOSEST(NSEC_PER_SEC, buspd); > > + pit->periodic_delta = DIV_ROUND_CLOSEST(NSEC_PER_SEC, HZ*buspd); > > ---> HZ * buspd OK. 
> > + clockevents_config_and_register(&pit->ced, freq, 1, ULONG_MAX); > > + > > + return 0; > > +} > > + > > +static int jcore_pit_local_shutdown(unsigned cpu) > > +{ > > + return 0; > > +} > > This function is useless I think. AFAIU, cpuhp_setup_state can have a > NULL function for the cpu teardown. OK, I wasn't sure if that was permitted. > > + jcore_cs.name = "jcore_pit_cs"; > > + jcore_cs.rating = 400; > > + jcore_cs.read = jcore_clocksource_read; > > + jcore_cs.mult = 1; > > + jcore_cs.shift = 0; > > + jcore_cs.mask = CLOCKSOURCE_MASK(32); > > + jcore_cs.flags = CLOCK_SOURCE_IS_CONTINUOUS; > > + > > + err = clocksource_register_hz(&jcore_cs, NSEC_PER_SEC); > > + if (err) { > > + pr_err("Error registering clocksource device: %d\n", err); > > + return err; > > + } > > May be you can consider by replacing the above by: > > clocksource_mmio_init(jcore_pit_base, "jcore_pit_cs", > NSEC_PER_SEC, 32, > jcore_clocksource_read); I think you're missing the rating argument. Otherwise it should work, but is there a good reason to prefer it? The code is slightly simpler; I'm not sure if using kzalloc vs static storage is better or worse. > > + sched_clock_register(jcore_sched_clock_read, 32, NSEC_PER_SEC); > > + > > + jcore_pit_percpu = alloc_percpu(struct jcore_pit); > > + if (!jcore_pit_percpu) { > > + pr_err("Failed to allocate memory for clock event device\n"); > > + return -ENOMEM; > > + } > > + > > + err = request_irq(pit_irq, jcore_timer_interrupt, > > + IRQF_TIMER | IRQF_PERCPU, > > + "jcore_pit", jcore_pit_percpu); > > + if (err) { > > + pr_err("pit irq request failed: %d\n", err); > > + return err; > > + } > > That is my major concern. Regarding the description of this timer, it > appears there is one timer per cpu but we use request_irq instead of > request_percpu_irq. > > IIUC, there is a problem with the interrupt controller where the per irq > line are not working correctly. Is that correct ? I don't think that's a correct characterization. 
Rather the percpu infrastructure just means something completely different from what you would expect it to mean. It has nothing to do with the hardware but rather with kernel-internal choice of whether to do percpu devid mapping inside the irq infrastructure, and the choice at the irq-requester side of whether to do this is required to match the irqchip driver's choice. I explained this better in another email which I could dig up if necessary, but the essence is that request_percpu_irq is a misnamed and unusably broken API. > Regarding Marc Zyngier comments about the irq controller driver being > almost empty, I'm wondering if something in the irq controller driver > which shouldn't be added before submitting this timer driver with SMP > support (eg. irq domain ?). I don't think so. At most I could make the driver hard-code the percpu devid model for certain irqs, but that _does not reflect_ anything about the hardware. Rather it just reflects bad kernel internals. It would also require writing a new percpu devid version of handle_simple_irq. In the other thread where I discussed this, I suggested irq framework changes that would clean this all up and make the percpu devid stuff work the way someone trying to use the API would expect, but such changes are not necessary for the J-Core drivers (or other existing drivers) to work. > > + /* > > + * The J-Core PIT is not hard-wired to a particular IRQ, but > > + * integrated with the interrupt controller such that the IRQ it > > + * generates is programmable. The programming interface has a > > + * legacy field which was an interrupt priority for AIC1, but > > + * which is OR'd onto bits 2-5 of the generated IRQ number when > > + * used with J-Core AIC2, so set it to match these bits. > > + */ > > + hwirq = irq_get_irq_data(pit_irq)->hwirq; > > irq_hw_number_t hwirq; > hwirq = irqd_to_hwirq(irq_get_irq_data(pit_irq)); OK. 
> > + irqprio = (hwirq >> 2) & PIT_PRIO_MASK; > > + enable_val = (1U << PIT_ENABLE_SHIFT) > > + | (hwirq << PIT_IRQ_SHIFT) > > + | (irqprio << PIT_PRIO_SHIFT); > > > I'm missing the connection between the description above and the enable value computed here. Can you elaborate ? The irqprio field is filled in using a value that matches bits 2 and up of hwirq; this is what the comment says and what the code does. Can you elaborate on what you don't understand? Rich
On Wed, 24 Aug 2016 13:40:01 -0400 Rich Felker <dalias@libc.org> wrote: [...] > > IIUC, there is a problem with the interrupt controller where the per irq > > line are not working correctly. Is that correct ? > > I don't think that's a correct characterization. Rather the percpu > infrastructure just means something completely different from what you > would expect it to mean. It has nothing to do with the hardware but > rather with kernel-internal choice of whether to do percpu devid > mapping inside the irq infrastructure, and the choice at the > irq-requester side of whether to do this is required to match the > irqchip driver's choice. I explained this better in another email > which I could dig up if necessary, but the essence is that > request_percpu_irq is a misnamed and unusably broken API. Or just one that simply doesn't fit your needs, because other architectures have different semantics than the ones you take for granted? > > > Regarding Marc Zyngier comments about the irq controller driver being > > almost empty, I'm wondering if something in the irq controller driver > > which shouldn't be added before submitting this timer driver with SMP > > support (eg. irq domain ?). > > I don't think so. At most I could make the driver hard-code the percpu > devid model for certain irqs, but that _does not reflect_ anything > about the hardware. Rather it just reflects bad kernel internals. It I'd appreciate it if instead of ranting about how broken the kernel is, you'd submit a patch fixing it, since you seem to have spotted something that we haven't in several years of using that code on a couple of ARM-related platforms. Thanks, M.
On Wed, Aug 24, 2016 at 08:01:52PM +0100, Marc Zyngier wrote: > On Wed, 24 Aug 2016 13:40:01 -0400 > Rich Felker <dalias@libc.org> wrote: > > [...] > > > > IIUC, there is a problem with the interrupt controller where > the per irq > > > line are not working correctly. Is that correct ? > > > > I don't think that's a correct characterization. Rather the percpu > > infrastructure just means something completely different from what you > > would expect it to mean. It has nothing to do with the hardware but > > rather with kernel-internal choice of whether to do percpu devid > > mapping inside the irq infrastructure, and the choice at the > > irq-requester side of whether to do this is required to match the > > irqchip driver's choice. I explained this better in another email > > which I could dig up if necessary, but the essence is that > > request_percpu_irq is a misnamed and unusably broken API. For reference, here's the thread I was referring to: https://lkml.org/lkml/2016/7/15/585 > Or just one that simply doesn't fit your needs, because other > architectures have different semantics than the ones you take for > granted? I don't think so. The choice of whether to have the irq layer or the driver's irq handler be responsible for handling a percpu pointer has nothing to do with the hardware. Perhaps the intent was that the irqchip driver always knows whether a given hwirq[-range] is associated with per-cpu events or global events for which it doesn't matter what cpu they're delivered on. In this case, the situations where you may want percpu dev_id mapping line up with some property of the hardware. However that need not be the case, and it's not when the choice of irq is programmable. > > > Regarding Marc Zyngier comments about the irq controller driver being > > > almost empty, I'm wondering if something in the irq controller driver > > > which shouldn't be added before submitting this timer driver with SMP > > > support (eg. irq domain ?). 
> > > > I don't think so. At most I could make the driver hard-code the percpu > > devid model for certain irqs, but that _does not reflect_ anything > > about the hardware. Rather it just reflects bad kernel internals. It > > I'd appreciate it if instead of ranting about how broken the kernel is, > you'd submit a patch fixing it, since you seem to have spotted > something that we haven't in several years of using that code on a > couple of ARM-related platforms. I didn't intend for this to be a rant. I'm not demanding that it be changed; I'm only objecting to being asked to make the driver use a framework that it doesn't need and that can't model what needs to be done. But I'm happy to discuss whether you would be open to such a change, and if so, to write and submit a patch. The ideas for what it would involve are in the linked email, quoted here: "... This is because the irq controller driver must, at irqdomain mapping time, decide whether to register the handler as handle_percpu_devid_irq (which interprets dev_id as a __percpu pointer and remaps it for the local cpu before invoking the driver's handler) or one of the other handlers that does not perform any percpu remapping. The right way for this to work would be for handle_irq_event_percpu to be responsible for the remapping, but do it conditionally on whether the irq was requested via request_irq or request_percpu_irq." Do you disagree with this assessment? Rich
On Wednesday, August 24, 2016 1:40:01 PM CEST Rich Felker wrote:
> On Wed, Aug 24, 2016 at 06:42:05PM +0200, Daniel Lezcano wrote:
> > > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> > > index 5677886..3210ca5 100644
> > > --- a/drivers/clocksource/Kconfig
> > > +++ b/drivers/clocksource/Kconfig
> > > @@ -407,6 +407,15 @@ config SYS_SUPPORTS_SH_TMU
> > >  config SYS_SUPPORTS_EM_STI
> > >  	bool
> > >
> > > +config CLKSRC_JCORE_PIT
> > > +	bool "J-Core PIT timer driver"
> > > +	depends on OF && (SUPERH || COMPILE_TEST)
> >
> > Even if this is correct, for the sake of consistency, it is preferable
> > to change it to:
> >
> > 	bool "J-Core PIT timer driver" if COMPILE_TEST
> > 	depends on SUPERH
> > 	select CLKSRC_OF
>
> Is this functionally equivalent? If so that's non-obvious to me. What
> about the dropped OF dependency? The intent is that the option should
> always be available for SUPERH targets using OF, otherwise only
> available with COMPILE_TEST.

I think your version is correct here, Daniel made a mistake with the
SUPERH dependency. Another way to write it would be

config CLKSRC_JCORE_PIT
	bool "J-Core PIT timer driver" if COMPILE_TEST && !JCORE
	depends on OF
	default JCORE

to make it impossible to disable when JCORE is set, but user-visible
on all other architectures, SUPERH or otherwise.

> > > +static int jcore_pit_set(unsigned long delta, struct jcore_pit *pit)
> > > +{
> > > +	jcore_pit_disable(pit);
> > > +	__raw_writel(delta, pit->base + REG_THROT);
> > > +	__raw_writel(pit->enable_val, pit->base + REG_PITEN);
> > > +	return 0;
> >
> > Why do you need to use __raw_writel ?
> >
> > s/__raw_writel/writel/
>
> I actually tried multiple times to find good resources on policy for
> which form to prefer, but didn't have much luck. My understanding is
> that __raw_writel/__raw_readl always performs a native-endian
> load/store, whereas writel/readl behavior depends on cpu endianness
> and whether the arch declares that "pci bus" (that was the term I
> found in the source, not relevant here) io is endian-swapped or not.
> Can you give me a better explanation of why we might prefer one form
> or the other?

In general, a portable driver should use readl/writel because they
are defined to have the behavior that x86 provides, which is what
most developers understand best. Compared to __raw_readl/__raw_writel,
it guarantees:

- correct endianness (in most cases, exceptions see below)
- does not leak out of spinlocks
- accesses are word-sized; the compiler cannot split them into byte
  accesses, or merge subsequent accesses into larger words
- ordering against dma_map_*/dma_unmap_*/dma_sync_* is maintained

The __raw_* accessors are used as building blocks for
readl/outl/ioread32/ioread32_be/readl_relaxed/... and they can
usually be used safely on device RAM such as framebuffers but not
much else in portable code. Some architectures also use the __raw_*
accessors for MMIO registers, but that code is generally not portable.

Endianness is always more complicated than one thinks, and it's
handled differently across architectures, sometimes because of
hardware differences, sometimes for historic reasons that differ only
in the Linux implementation.

A large majority of devices are fixed to little-endian MMIO registers
(as recommended by PCI), so that is a good default.

Exceptions to this are:

* Some architectures (e.g. PowerPC) typically run big-endian code,
  and hardware designers decided to make the MMIO registers the same.
  In this case, you can generally use ioread32_be/iowrite32_be.

* Some devices have configurable endianness with a device specific
  register. Sometimes those registers are set to match the CPU
  endianness, but for Linux it tends to be better to just set them to
  little-endian and use readl/writel.

* Some drivers are configurable at HW synthesis time, and we have a
  mix of devices. If you have this, you usually need to support both
  ways in the driver with a helper function that uses ioread32 or
  ioread32_be depending on the configuration. This can be done using
  either compile-time or runtime configuration.

* Some SoCs have boot-time configurable endianness and enforce that
  MMIO registers and the CPU always use the same way. We are
  inconsistent in handling those; I'd recommend creating a header
  file for that SoC that defines vendor specific helper functions.

* Some (usually very old) SoCs have configurable endianness, but
  don't actually change the behavior of the SoC or the MMIO devices,
  but only flip the address lines on the memory interface. This is
  similar to the previous one, but has additional problems with FIFO
  registers. You generally cannot use portable drivers here.

Arnd
On Wed, Aug 24, 2016 at 10:01:08PM +0200, Arnd Bergmann wrote:
> On Wednesday, August 24, 2016 1:40:01 PM CEST Rich Felker wrote:
> > On Wed, Aug 24, 2016 at 06:42:05PM +0200, Daniel Lezcano wrote:
> > > > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> > > > index 5677886..3210ca5 100644
> > > > --- a/drivers/clocksource/Kconfig
> > > > +++ b/drivers/clocksource/Kconfig
> > > > @@ -407,6 +407,15 @@ config SYS_SUPPORTS_SH_TMU
> > > >  config SYS_SUPPORTS_EM_STI
> > > >  	bool
> > > >
> > > > +config CLKSRC_JCORE_PIT
> > > > +	bool "J-Core PIT timer driver"
> > > > +	depends on OF && (SUPERH || COMPILE_TEST)
> > >
> > > Even if this is correct, for the sake of consistency, it is preferable
> > > to change it to:
> > >
> > > 	bool "J-Core PIT timer driver" if COMPILE_TEST
> > > 	depends on SUPERH
> > > 	select CLKSRC_OF
> >
> > Is this functionally equivalent? If so that's non-obvious to me. What
> > about the dropped OF dependency? The intent is that the option should
> > always be available for SUPERH targets using OF, otherwise only
> > available with COMPILE_TEST.
>
> I think your version is correct here, Daniel made a mistake with the
> SUPERH dependency. Another way to write it would be
>
> config CLKSRC_JCORE_PIT
> 	bool "J-Core PIT timer driver" if COMPILE_TEST && !JCORE
> 	depends on OF
> 	default JCORE
>
> to make it impossible to disable when JCORE is set, but user-visible
> on all other architectures, SUPERH or otherwise.

There's no JCORE config item and in theory you don't even need to
pick CPU_J2 if you don't want SMP; CPU_SH2 should work. In practice
the legacy hw support needs to be converted over to device tree
still, but the goal has been to _avoid_ introducing soc/board/model
specific config stuff and move the arch to completely DT-based.

> > > > +static int jcore_pit_set(unsigned long delta, struct jcore_pit *pit)
> > > > +{
> > > > +	jcore_pit_disable(pit);
> > > > +	__raw_writel(delta, pit->base + REG_THROT);
> > > > +	__raw_writel(pit->enable_val, pit->base + REG_PITEN);
> > > > +	return 0;
> > >
> > > Why do you need to use __raw_writel ?
> > >
> > > s/__raw_writel/writel/
> >
> > I actually tried multiple times to find good resources on policy for
> > which form to prefer, but didn't have much luck. My understanding is
> > that __raw_writel/__raw_readl always performs a native-endian
> > load/store, whereas writel/readl behavior depends on cpu endianness
> > and whether the arch declares that "pci bus" (that was the term I
> > found in the source, not relevant here) io is endian-swapped or not.
> > Can you give me a better explanation of why we might prefer one form
> > or the other?
>
> In general, a portable driver should use readl/writel because they
> are defined to have the behavior that x86 provides, which is what
> most developers understand best. Compared to __raw_readl/__raw_writel,
> it guarantees
>
> - correct endianness (in most cases, exceptions see below)
> - does not leak out of spinlocks

Not even normal non-volatile accesses can leak across spinlocks;
they're memory+compiler barriers.

> - accesses are word-sized, the compiler cannot split them into
>   byte accesses, or merge subsequent accesses into larger words

Based on my reading of the code that's also true for the raw ones. At
least GCC interprets volatile as disallowing split/combined
loads/stores when the hardware supports the nominal load/store size,
and this seems to agree with the intent of the C standard that
volatile be usable for memory-mapped devices.

> - ordering against dma_map_*/dma_unmap_*/dma_sync_* is maintained

Aren't they barriers too?

> The __raw_* accessors are used as building blocks for
> readl/outl/ioread32/ioread32_be/readl_relaxed/... and they can
> usually be used safely on device RAM such as framebuffers but
> not much else in portable code. Some architectures also use the
> __raw_* accessors for MMIO registers, but that code is generally
> not portable.

My concern is that, depending on some arcane defines, readl/writel
could start doing byte-reversal of values since the CPU is
big-endian. arch/sh seems to have them defined in such a way that
this doesn't happen, but I was hoping to avoid depending on that. If
you think it's safe to assume they do the right thing, though, I'm
okay with it. If anything breaks when we try a little-endian version
of the CPU/SoC later I can go back and fix it.

> Endianness is always more complicated than one thinks, and it's
> handled differently across architectures, sometimes because of
> hardware differences, sometimes for historic reasons that differ
> only in the Linux implementation.
>
> A large majority of devices are fixed to little-endian MMIO
> registers (as recommended by PCI), so that is a good default.
>
> Exceptions to this are:
>
> * Some architectures (e.g. PowerPC) typically run big-endian code,
>   and hardware designers decided to make the MMIO registers the
>   same. In this case, you can generally use ioread32_be/iowrite32_be.

This is something like our case, but the mmio registers are accessed
as 32-bit values and the endianness does not come into play. Sub-word
accesses just don't work at all for most of the mmio devices. This
will remain the same if/when we add little endian builds (there's
work on this on the jcore list).

Rich
On Wed, Aug 24, 2016 at 04:52:26PM -0400, Rich Felker wrote:
> On Wed, Aug 24, 2016 at 10:01:08PM +0200, Arnd Bergmann wrote:
> > On Wednesday, August 24, 2016 1:40:01 PM CEST Rich Felker wrote:
> > > I actually tried multiple times to find good resources on policy for
> > > which form to prefer, but didn't have much luck. My understanding is
> > > that __raw_writel/__raw_readl always performs a native-endian
> > > load/store, whereas writel/readl behavior depends on cpu endianness
> > > and whether the arch declares that "pci bus" (that was the term I
> > > found in the source, not relevant here) io is endian-swapped or not.
> > > Can you give me a better explanation of why we might prefer one form
> > > or the other?
> >
> > In general, a portable driver should use readl/writel because they
> > are defined to have the behavior that x86 provides, which is what
> > most developers understand best. Compared to __raw_readl/__raw_writel,
> > it guarantees
> >
> > - correct endianness (in most cases, exceptions see below)
> > - does not leak out of spinlocks
>
> Not even normal non-volatile accesses can leak across spinlocks;
> they're memory+compiler barriers.

Note that Arnd was talking in general, about cross-architectural
guarantees.

Across architectures, not all barriers are equivalent. On some
architectures, the barriers used by spinlocks don't necessarily order
accesses across disparate memory types (i.e. DRAM and IO), or might
not guarantee that the ordering is consistent to observers other than
(coherent) CPUs. Hence a portable driver should care about this
distinction.

If they're equivalent for you, you may as well use the portably
stronger form, as it's less likely to surprise others.

> > - accesses are word-sized, the compiler cannot split them into
> >   byte accesses, or merge subsequent accesses into larger words
>
> Based on my reading of the code that's also true for the raw ones. At
> least GCC interprets volatile as disallowing split/combined
> loads/stores when the hardware supports the nominal load/store size,
> and this seems to agree with the intent of the C standard that
> volatile be usable for memory-mapped devices.
>
> > - ordering against dma_map_*/dma_unmap_*/dma_sync_* is maintained
>
> Aren't they barriers too?

As above, not all barriers are equivalent. Your architecture may
provide stronger (or weaker) guarantees than others.

> > The __raw_* accessors are used as building blocks for
> > readl/outl/ioread32/ioread32_be/readl_relaxed/... and they can
> > usually be used safely on device RAM such as framebuffers but
> > not much else in portable code. Some architectures also use the
> > __raw_* accessors for MMIO registers, but that code is generally
> > not portable.
>
> My concern is that, depending on some arcane defines, readl/writel
> could start doing byte-reversal of values since the CPU is big-endian.
> arch/sh seems to have them defined in such a way that this doesn't
> happen, but I was hoping to avoid depending on that. If you think it's
> safe to assume they do the right thing, though, I'm okay with it. If
> anything breaks when we try a little-endian version of the CPU/SoC
> later I can go back and fix it.

I guess this really depends on the expectations you have w.r.t. the
endianness of devices vs the endianness of CPUs. If you expect the
two to always be matched, please add a comment in the driver to that
effect.

As Arnd and others mentioned, the vastly common case is that the
endianness of CPUs and devices is not fixed to each other, and
devices are typically little-endian.

> > Endianness is always more complicated than one thinks, and it's
> > handled differently across architectures, sometimes because of
> > hardware differences, sometimes for historic reasons that differ
> > only in the Linux implementation.
> >
> > A large majority of devices are fixed to little-endian MMIO
> > registers (as recommended by PCI), so that is a good default.
> >
> > Exceptions to this are:
> >
> > * Some architectures (e.g. PowerPC) typically run big-endian code,
> >   and hardware designers decided to make the MMIO registers the
> >   same. In this case, you can generally use ioread32_be/iowrite32_be.
>
> This is something like our case, but the mmio registers are accessed
> as 32-bit values and the endianness does not come into play.

This really depends on your endianness model, bus, etc. The above
sounds like BE32 rather than BE8, assuming I've understood correctly,
which is an unusual case nowadays.

Thanks,
Mark.
On Wed, Aug 24, 2016 at 10:22:13PM +0100, Mark Rutland wrote:
> On Wed, Aug 24, 2016 at 04:52:26PM -0400, Rich Felker wrote:
> > On Wed, Aug 24, 2016 at 10:01:08PM +0200, Arnd Bergmann wrote:
> > > On Wednesday, August 24, 2016 1:40:01 PM CEST Rich Felker wrote:
> > > > I actually tried multiple times to find good resources on policy for
> > > > which form to prefer, but didn't have much luck. My understanding is
> > > > that __raw_writel/__raw_readl always performs a native-endian
> > > > load/store, whereas writel/readl behavior depends on cpu endianness
> > > > and whether the arch declares that "pci bus" (that was the term I
> > > > found in the source, not relevant here) io is endian-swapped or not.
> > > > Can you give me a better explanation of why we might prefer one form
> > > > or the other?
> > >
> > > In general, a portable driver should use readl/writel because they
> > > are defined to have the behavior that x86 provides, which is what
> > > most developers understand best. Compared to __raw_readl/__raw_writel,
> > > it guarantees
> > >
> > > - correct endianness (in most cases, exceptions see below)
> > > - does not leak out of spinlocks
> >
> > Not even normal non-volatile accesses can leak across spinlocks;
> > they're memory+compiler barriers.
>
> Note that Arnd was talking in general, about cross-architectural
> guarantees.
>
> Across architectures, not all barriers are equivalent. On some
> architectures, the barriers used by spinlocks don't necessarily order
> accesses across disparate memory types (i.e. DRAM and IO), or might
> not guarantee that the ordering is consistent to observers other than
> (coherent) CPUs. Hence a portable driver should care about this
> distinction.
>
> If they're equivalent for you, you may as well use the portably
> stronger form, as it's less likely to surprise others.

*nod* Understood.

> > > - accesses are word-sized, the compiler cannot split them into
> > >   byte accesses, or merge subsequent accesses into larger words
> >
> > Based on my reading of the code that's also true for the raw ones. At
> > least GCC interprets volatile as disallowing split/combined
> > loads/stores when the hardware supports the nominal load/store size,
> > and this seems to agree with the intent of the C standard that
> > volatile be usable for memory-mapped devices.
> >
> > > - ordering against dma_map_*/dma_unmap_*/dma_sync_* is maintained
> >
> > Aren't they barriers too?
>
> As above, not all barriers are equivalent. Your architecture may
> provide stronger (or weaker) guarantees than others.

OK.

> > > The __raw_* accessors are used as building blocks for
> > > readl/outl/ioread32/ioread32_be/readl_relaxed/... and they can
> > > usually be used safely on device RAM such as framebuffers but
> > > not much else in portable code. Some architectures also use the
> > > __raw_* accessors for MMIO registers, but that code is generally
> > > not portable.
> >
> > My concern is that, depending on some arcane defines, readl/writel
> > could start doing byte-reversal of values since the CPU is big-endian.
> > arch/sh seems to have them defined in such a way that this doesn't
> > happen, but I was hoping to avoid depending on that. If you think it's
> > safe to assume they do the right thing, though, I'm okay with it. If
> > anything breaks when we try a little-endian version of the CPU/SoC
> > later I can go back and fix it.
>
> I guess this really depends on the expectations you have w.r.t. the
> endianness of devices vs the endianness of CPUs. If you expect the
> two to always be matched, please add a comment in the driver to that
> effect.
>
> As Arnd and others mentioned, the vastly common case is that the
> endianness of CPUs and devices is not fixed to each other, and
> devices are typically little-endian.

Since it's not used in other designs, I can't say anything meaningful
about how it hypothetically might be, but the intent on all current
and currently-planned systems is that 32-bit loads/stores be
performed using values that are meaningful as values on the cpu, i.e.
with no byte swapping regardless of cpu endianness. The current
hardware is designed as big-endian, and the proposed patches for
supporting little endian just involve flipping the low bits of the
address for 16- and 8-bit accesses (not relevant to mmio registers
that are only accessible in 32-bit units).

> > > Endianness is always more complicated than one thinks, and it's
> > > handled differently across architectures, sometimes because of
> > > hardware differences, sometimes for historic reasons that differ
> > > only in the Linux implementation.
> > >
> > > A large majority of devices are fixed to little-endian MMIO
> > > registers (as recommended by PCI), so that is a good default.
> > >
> > > Exceptions to this are:
> > >
> > > * Some architectures (e.g. PowerPC) typically run big-endian code,
> > >   and hardware designers decided to make the MMIO registers the
> > >   same. In this case, you can generally use ioread32_be/iowrite32_be.
> >
> > This is something like our case, but the mmio registers are accessed
> > as 32-bit values and the endianness does not come into play.
>
> This really depends on your endianness model, bus, etc. The above
> sounds like BE32 rather than BE8, assuming I've understood correctly,
> which is an unusual case nowadays.

I'm not familiar with those classifications, but from what I can
tell, BE32 describes it correctly. I'll see if I can get someone to
verify this. Is there a reason it's not widely used anymore? Perhaps
something related to supporting misaligned word-sized loads/stores?

Rich
On Wednesday, August 24, 2016 5:44:44 PM CEST Rich Felker wrote:
> On Wed, Aug 24, 2016 at 10:22:13PM +0100, Mark Rutland wrote:
> > On Wed, Aug 24, 2016 at 04:52:26PM -0400, Rich Felker wrote:
> > As Arnd and others mentioned, the vastly common case is that the
> > endianness of CPUs and devices is not fixed to each other, and
> > devices are typically little-endian.
>
> Since it's not used in other designs, I can't say anything meaningful
> about how it hypothetically might be, but the intent on all current
> and currently-planned systems is that 32-bit loads/stores be performed
> using values that are meaningful as values on the cpu, i.e. with no
> byte swapping regardless of cpu endianness. The current hardware is
> designed as big-endian, and the proposed patches for supporting little
> endian just involve flipping the low bits of the address for 16- and
> 8-bit accesses (not relevant to mmio registers that are only
> accessible in 32-bit units).
>
> > > > Exceptions to this are:
> > > >
> > > > * Some architectures (e.g. PowerPC) typically run big-endian code,
> > > >   and hardware designers decided to make the MMIO registers the
> > > >   same. In this case, you can generally use ioread32_be/iowrite32_be.
> > >
> > > This is something like our case, but the mmio registers are accessed
> > > as 32-bit values and the endianness does not come into play.
> >
> > This really depends on your endianness model, bus, etc. The above
> > sounds like BE32 rather than BE8, assuming I've understood correctly,
> > which is an unusual case nowadays.

Correct, which is the last case I had in my list, not the first one:

> > > * Some (usually very old) SoCs have configurable endianness, but
> > >   don't actually change the behavior of the SoC or the MMIO devices,
> > >   but only flip the address lines on the memory interface.
> > >   This is similar to the previous one, but has additional problems
> > >   with FIFO registers. You generally cannot use portable drivers
> > >   here.

With your description above, that would probably be called LE32,
given that the native ordering is big-endian, and the little-endian
mode is only approximated using address magic. ARM BE32 mode (e.g. on
IXP4xx) did it the other way round.

> I'm not familiar with those classifications, but from what I can tell,
> BE32 describes it correctly. I'll see if I can get someone to verify
> this. Is there a reason it's not widely used anymore? Perhaps
> something related to supporting misaligned word-sized loads/stores?

The main problem I see is that you can't easily use MMIO registers
that are not 32-bit wide -- you end up having to flip not just the
register contents but also the lower bits of the address in order to
reach the right contents of the register. If you can guarantee that
all registers are 32-bit wide, you can create a header file with your
own MMIO accessors that wrap around readl/writel or
ioread32_be/iowrite32_be depending on the kernel endianness, and
don't provide readb/writeb or readw/writew wrappers.

However, what everyone else does these days is to flip the register
values in load/store instructions when the opposite endianness is
selected, or to have two sets of load/store instructions and leave it
up to the compiler to pick which one to use (probably not an option
for you, since you use an existing instruction set). If you do that,
all portable device drivers should work fine on your architecture.

Arnd
Hi,

On Wed, Aug 24, 2016 at 03:20:09PM -0400, Rich Felker wrote:
> On Wed, Aug 24, 2016 at 08:01:52PM +0100, Marc Zyngier wrote:
> > On Wed, 24 Aug 2016 13:40:01 -0400
> > Rich Felker <dalias@libc.org> wrote:
> >
> > [...]
> >
> > > > IIUC, there is a problem with the interrupt controller where the per irq
> > > > line are not working correctly. Is that correct ?
> > >
> > > I don't think that's a correct characterization. Rather the percpu
> > > infrastructure just means something completely different from what you
> > > would expect it to mean. It has nothing to do with the hardware but
> > > rather with kernel-internal choice of whether to do percpu devid
> > > mapping inside the irq infrastructure, and the choice at the
> > > irq-requester side of whether to do this is required to match the
> > > irqchip driver's choice. I explained this better in another email
> > > which I could dig up if necessary, but the essence is that
> > > request_percpu_irq is a misnamed and unusably broken API.
>
> For reference, here's the thread I was referring to:
>
> https://lkml.org/lkml/2016/7/15/585

That reply seems to imply that the percpu_irq functionality is only
there to shift a percpu pointer, and that any IRQ can be treated
either as a 'regular' interrupt or a percpu one. That's not really
the case, and I'm worried that this implies that the way your
hardware and driver behaves violates (implicit) assumptions made by
core code. Please see below for more details and questions.

The percpu IRQ infrastructure was added to handle things like SPI
(Shared Peripheral Interrupt) vs PPI (Private Peripheral Interrupt)
on ARM GICs (Generic Interrupt Controllers), where the two classes of
interrupt are distinct, and the latter is very much a per-cpu
resource in hardware, while the former is not. In the GIC, particular
IRQ IDs are reserved for each class, making them possible to
distinguish easily.

Consider an ARM system with N CPUs, a GIC, some CPU-local timers
(wired as PPIs), and some other peripherals (wired as SPIs).

For each SPI, it is as if there is a single wire into the GIC. An SPI
can be programmatically configured to be routed to any arbitrary CPU
(or several, if one really wants that), but it's fundamentally a
single interrupt, and there is a single GIC state machine for it
(even if it's routed to multiple CPUs simultaneously). Any state for
this class of interrupt can/must be protected by some global lock or
other synchronisation mechanism, as two CPUs can't take the same SPI
simultaneously, though the affinity might be changed at any point.

For each PPI, it is as if there are N wires into the GIC, each
attached to a CPU-affine device. It's effectively a group of N
interrupts for a particular purpose, sharing the same ID. Each wire
is routed to a particular CPU (which cannot be changed), and there is
an independent GIC state machine per-cpu (accessed through registers
which are banked per-cpu). Any state for this class of interrupt
can/must be protected by some per-cpu lock or other synchronisation
mechanism, as the instances of the interrupt can be taken on multiple
CPUs simultaneously.

The above is roughly what Linux understands today as the distinction
between a 'regular' and percpu interrupt. As above, you cannot freely
treat either as the other -- it is a hardware property. If you took
the same regular interrupt on different CPUs simultaneously, the core
IRQ system is liable to be confused, as it does not expect this to
happen -- e.g. if for some reason we had some global lock or
accounting data structure.

Note that you could have some percpu resource using N SPIs. That
would not be a percpu irq, even if you only want to handle each
interrupt on a particular CPU.

With contrast to the above, can you explain how your interrupts
behave? Is the routing to a CPU or CPUs fixed? Are there independent
state machines per-cpu for your timer interrupt in the interrupt
controller hardware?

> > Or just one that simply doesn't fit your needs, because other
> > architectures have different semantics than the ones you take for
> > granted?
>
> I don't think so. The choice of whether to have the irq layer or the
> driver's irq handler be responsible for handling a percpu pointer has
> nothing to do with the hardware.

As above, percpu interrupts are not just about shifting pointers.
Using the correct class is critical to continued correct operation.

> Perhaps the intent was that the irqchip driver always knows whether a
> given hwirq[-range] is associated with per-cpu events or global events
> for which it doesn't matter what cpu they're delivered on.

So far, the assumption has been that the irqchip (driver) knows
whether a particular hwirq is regular or percpu.

> In this case, the situations where you may want percpu dev_id mapping
> line up with some property of the hardware. However that need not be
> the case, and it's not when the choice of irq is programmable.

I guess this depends on the behaviour of your HW. Does the timer
interrupt behave like the GIC PPI I explained above? I assume others
behave like the GIC SPI? What happens if you route one of those to
multiple CPUs?

If the distinction largely matches SPI/PPI, other than the lack of a
fixed ID, there are ways we can solve this, e.g. annotating percpu
interrupts in the DT with a flag, and deferring their initialisation
to the first request.

> > > > Regarding Marc Zyngier comments about the irq controller driver being
> > > > almost empty, I'm wondering if something in the irq controller driver
> > > > which shouldn't be added before submitting this timer driver with SMP
> > > > support (eg. irq domain ?).
> > >
> > > I don't think so. At most I could make the driver hard-code the percpu
> > > devid model for certain irqs, but that _does not reflect_ anything
> > > about the hardware. Rather it just reflects bad kernel internals. It
> >
> > I'd appreciate it if instead of ranting about how broken the kernel is,
> > you'd submit a patch fixing it, since you seem to have spotted
> > something that we haven't in several years of using that code on a
> > couple of ARM-related platforms.
>
> I didn't intend for this to be a rant. I'm not demanding that it be
> changed; I'm only objecting to being asked to make the driver use a
> framework that it doesn't need and that can't model what needs to be
> done. But I'm happy to discuss whether you would be open to such a
> change, and if so, to write and submit a patch. The ideas for what it
> would involve are in the linked email, quoted here:
>
> "... This is because the irq controller driver must, at irqdomain
> mapping time, decide whether to register the handler as
> handle_percpu_devid_irq (which interprets dev_id as a __percpu
> pointer and remaps it for the local cpu before invoking the
> driver's handler) or one of the other handlers that does not
> perform any percpu remapping.
>
> The right way for this to work would be for
> handle_irq_event_percpu to be responsible for the remapping, but
> do it conditionally on whether the irq was requested via
> request_irq or request_percpu_irq."

As above, I don't think that this is quite the right approach, as
this allows one to erroneously handle distinct classes of interrupt
as one another.

I agree that in the case that the percpu-ness of a hwirq number is
not a fixed property of the interrupt-controller, we may need to
defer initialisation until the request (where we can query the DT,
say).

Thanks,
Mark.
On Wed, Aug 24, 2016 at 05:44:44PM -0400, Rich Felker wrote: > On Wed, Aug 24, 2016 at 10:22:13PM +0100, Mark Rutland wrote: > > On Wed, Aug 24, 2016 at 04:52:26PM -0400, Rich Felker wrote: > > > On Wed, Aug 24, 2016 at 10:01:08PM +0200, Arnd Bergmann wrote: > > > > On Wednesday, August 24, 2016 1:40:01 PM CEST Rich Felker wrote: > > > > The __raw_* accessors are used as building blocks for > > > > readl/outl/ioread32/ioread32_be/readl_relaxed/... and they can > > > > usually be used safely on device RAM such as framebuffers but > > > > not much else in portable code. Some architectures also use the > > > > __raw_* accessors for MMIO registers, but that code is generally > > > > not portable. > > > > > > My concern is that, depending on some arcane defines, readl/writel > > > could start doing byte-reversal of values since the CPU is big-endian. > > > arch/sh seems to have them defined in such a way that this doesn't > > > happen, but I was hoping to avoid depending on that. If you think it's > > > safe to assume they do the right thing, though, I'm okay with it. If > > > anything breaks when we try a little-endian version of the CPU/SoC > > > later I can go back and fix it. > > > > I guess this really depends on the expectations you have w.r.t. the endianness > > of devices vs the endianness of CPUs. If you expect the two to always be > > matched, please add a comment in the driver to that effect. > > > > As Arnd and others mentioned, the vastly common case is that the endianness of > > CPUs and devices is not fixed to each other, and devices are typically > > little-endian. > > Since it's not used in other designs, I can't say anything meaningful > about how it hypothetically might be, but the intent on all current > and currently-planned systems is that 32-bit loads/stores be performed > using values that are meaningful as values on the cpu, i.e. with no > byte swapping regardless of cpu endianness. 
The current hardware is > designed as big-endian, and the proposed patches for supporting little > endian just involve flipping the low bits of the address for 16- and > 8-bit accesses (not relevant to mmio registers that are only > accessible in 32-bit units). Ok. Generally speaking, it's preferable (to the Linux community) that devices have some known fixed endianness, and that systems behave in a BE8 fashion (I'll clarify that below per your prior question), i.e. not just flipping the low bits of addresses. That allows common infrastructure like readl/writel to be used consistently and reliably, regardless of what peripherals are integrated into the system, and regardless of the endianness of the CPUs. The kernel has to know which endianness it is (for other reasons at least), and in general the endianness swap that may be required in some cases is less of a performance cost than any infrastructure required to handle varied integration possibilities (e.g. LE, BE, 'native'). It will also avoid having to modify each and every driver that already uses the existing infrastructure... > > > > Endianness is always more complicated than one thinks, and it's > > > > handled differently across architectures, sometimes because of > > > > hardware differences, sometimes for historic reasons that differ > > > > only in the Linux implementation. > > > > > > > > A large majority of devices are fixed to little-endian MMIO > > > > registers (as recommended by PCI), so that is a good default. > > > > > > > > Exceptions to this are: > > > > > > > > * Some architectures (e.g. PowerPC) typically run big-endian code, > > > > and hardware designers decided to make the MMIO registers the > > > > same. In this case, you can generally use ioread32_be/iowrite32_be. > > > > > > This is something like our case, but the mmio registers are accessed > > > as 32-bit values and the endianness does not come into play. > > This really depends on your endianness model, bus, etc.
The above sounds like > > BE32 rather than BE8, assuming I've understood correctly, which is an unusual > > case nowadays. > > I'm not familiar with those classifications, but from what I can tell, > BE32 describes it correctly. I'm not sure if it's an ARM term or a more general one. See [1] for a description. Per your description of flipping address lines, it sounds like you're (planning on) implementing BE32. > I'll see if I can get someone to verify this. Is there a reason it's not > widely used anymore? Perhaps something related to supporting misaligned > word-sized loads/stores? For one thing, BE8 is self-consistent as you scale to larger word sizes, rather than having to be special-cased at that boundary (i.e. one always has to endian swap rather than only swap the 32-bit halves of a 64-bit load with BE32). BE8 is generally seen as the 'true' big-endian, with BE32 being a particular 'middle-endian' variant [2]. Others are better equipped to provide rationale and/or war stories... Thanks, Mark. [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html [2] https://en.wikipedia.org/wiki/Endianness#Middle-endian
Rich, On Wed, 24 Aug 2016, Rich Felker wrote: > I don't think that's a correct characterization. Rather the percpu > infrastructure just means something completely different from what you > would expect it to mean. It has nothing to do with the hardware but > rather with kernel-internal choice of whether to do percpu devid > mapping inside the irq infrastructure, and the choice at the > irq-requester side of whether to do this is required to match the > irqchip driver's choice. I explained this better in another email > which I could dig up if necessary, but the essence is that > request_percpu_irq is a misnamed and unusably broken API. I slowly get tired of your permanent ranting about misnamed, misdesigned and unusable code in the kernel. The percpu infrastructure was designed explicitly to reflect the hardware and is not a random kernel-internal choice. It's there to handle the situation where a single interrupt number is actually available on each CPU. ARM has a very clear distinction between PPI (per-processor interrupts) and SPI (shared peripheral interrupts). So we end up with the following interrupt space:

  IRQ    CPU0   CPU1
   0     PPI    PPI
  ...    ...    ...
  15     PPI    PPI
  16     SPI    SPI
  ...
   N     SPI    SPI

The fundamental difference between PPI and SPI is that PPIs originate from the CPU itself or are injected as IPIs targeted to a particular CPU. Each CPU has its own interrupt controller to mask/ack/... PPIs. The SPIs have a shared interrupt controller and therefore are globally shared between all cpus. The percpu infrastructure reflects exactly this scenario and allows us to handle PPIs with the same irq number (which makes a lot of sense) separately on each CPU. So how does this not reflect the hardware? If your particular hardware has the old scheme of separate interrupt numbers for per cpu interrupts, then you can simply use the normal interrupt scheme and request a separate interrupt per cpu.
Just because something does not fit your needs and does not match your SH blinkered worldview does not make it misnamed, misdesigned and unusable. If you want to work with this community you really should start to talk constructively with us and stop this unjustified ranting attitude. Thanks, tglx
On Wednesday, August 24, 2016 11:57:08 PM CEST Arnd Bergmann wrote: > > I'm not familiar with those classifications, but from what I can tell, > > BE32 describes it correctly. I'll see if I can get someone to verify > > this. Is there a reason it's not widely used anymore? Perhaps > > something related to supporting misaligned word-sized loads/stores? > > The main problem I see is that you can't easily use MMIO registers that > are not 32-bit wide -- you end up having to flip not just the register > contents but also the lower bits of the address in order to reach > the right contents of the register. > Actually there is a much more serious problem with BE32/LE32 mode: doing unaligned access to RAM is a huge pain at the HW level, and also when emulating it in the kernel on CPUs that require aligned access. In ARM THUMB2 mode, you can even have unaligned 32-bit instructions. Arnd
On Thu, Aug 25, 2016 at 10:07:08AM +0200, Thomas Gleixner wrote: > Rich, > > On Wed, 24 Aug 2016, Rich Felker wrote: > > I don't think that's a correct characterization. Rather the percpu > > infrastructure just means something completely different from what you > > would expect it to mean. It has nothing to do with the hardware but > > rather with kernel-internal choice of whether to do percpu devid > > mapping inside the irq infrastructure, and the choice at the > > irq-requester side of whether to do this is required to match the > > irqchip driver's choice. I explained this better in another email > > which I could dig up if necessary, but the essence is that > > request_percpu_irq is a misnamed and unusably broken API. > > I slowly get tired of your permanent ranting about misnamed, misdesigned > and unusable code in the kernel. This is the only instance I'm aware of, and my intent was not to rant. If you're aware of other cases where I've done this and it's come across as abrasive to you or others, I'd appreciate it if you could point them out so I can try to avoid doing so in the future. As for this topic, what happened is that, after the first and second time of expressing confusion about how the infrastructure did not make sense and whether I should even be using it, I did not get anything resembling an explanation of why it's the way it is or why I might be wrong in thinking it's a bug, and so I went forward with the assumption that it was just a bug. Now that Mark Rutland has explained it well (and with your additional explanation below in your email), I see what the motivation was, but I still think it could be done in a less-confusing and more-consistent way that doesn't assume ARM-like irq architecture. > If your particular hardware has the old scheme of separate interrupt numbers > for per cpu interrupts, then you can simply use the normal interrupt scheme > and request a separate interrupt per cpu.
Nominally it uses the same range of hardware interrupt numbers for all (presently both) cpus, but some of them get delivered to a specific cpu associated with the event (presently, IPI and timer; IPI is on a fixed number at synthesis time but timer is runtime configurable) while others are conceptually deliverable to either cpu (presently only delivered to cpu0, but that's treated as an implementation detail). It currently works requesting the irq with flags that ensure the handler runs on the same cpu it was delivered on, without using any other percpu irq framework. If you have concerns about ways this could break and want me to make the drivers do something else, I'm open to suggestions. Rich
On Thu, 25 Aug 2016, Rich Felker wrote: > assumption that it was just a bug. Now that Mark Rutland has explained > it well (and with your additional explanation below in your email), I > see what the motivation was, but I still think it could be done in a > less-confusing and more-consistent way that doesn't assume ARM-like > irq architecture. It's not only ARM. Some MIPS Octeon stuff has the same layout and requirements to use a single irq number for interrupts which are delivered on a per cpu basis. Patches are welcome :) > > If your particular hardware has the old scheme of separate interrupt numbers > > for per cpu interrupts, then you can simply use the normal interrupt scheme > > and request a separate interrupt per cpu. > > Nominally it uses the same range of hardware interrupt numbers for all > (presently both) cpus, but some of them get delivered to a specific > cpu associated with the event (presently, IPI and timer; IPI is on a > fixed number at synthesis time but timer is runtime configurable) > while others are conceptually deliverable to either cpu (presently > only delivered to cpu0, but that's treated as an implementation > detail). If I understand correctly, then this is the classic scheme:

  CPU0   IPI0   IRQ-N
  CPU1   IPI1   IRQ-M

These and the timers or whatever are strictly per cpu and therefore not routable. Regular device interrupts can be routed to any CPU by setting the affinity. Correct? > It currently works requesting the irq with flags that ensure the > handler runs on the same cpu it was delivered on, without using any > other percpu irq framework. Which special flag are you referring to? I'm not aware of one. IRQF_PER_CPU is just telling the core that this is a non-routable per cpu interrupt. It's excluded from affinity setting and also on cpu hot unplug the per cpu interrupts are not touched and nothing tries to reroute them to one of the still online cpus. Regarding the interrupt handler.
It runs on the CPU on which the interrupt is delivered and there is nothing you can influence with a flag. > If you have concerns about ways this could break and want me to make the > drivers do something else, I'm open to suggestions. If I understand the hardware halfway right, then using request_irq() with IRQF_PER_CPU for these special interrupts is completely correct. The handler either uses this_cpu_xxx() for accessing the per cpu data related to the interrupt or you can hand in a percpu pointer as dev_id to request_irq() which then is handed to the interrupt function as a cookie. Hope that helps. Thanks, tglx
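Concretely, tglx's suggestion amounts to something like the sketch below. This is a hedged sketch, not a tested patch: it reuses the jcore_timer_interrupt and jcore_pit_percpu names from the posted driver and assumes the alloc_percpu() allocation shown there; the oneshot handling follows the commit message's description of disabling the timer when it fires.

```c
/* Sketch only: the handler resolves this cpu's instance from the
 * percpu dev_id cookie handed to request_irq(). */
static irqreturn_t jcore_timer_interrupt(int irq, void *dev_id)
{
	struct jcore_pit *pit = this_cpu_ptr((struct jcore_pit __percpu *)dev_id);

	/* per the commit message: in oneshot mode just disable the timer */
	if (clockevent_state_oneshot(&pit->ced))
		jcore_pit_disable(pit);

	pit->ced.event_handler(&pit->ced);
	return IRQ_HANDLED;
}

/* ...and registration marks the irq per-cpu (non-routable, exempt from
 * affinity changes and hotplug rerouting) and passes the percpu base
 * as the cookie: */
err = request_irq(pit_irq, jcore_timer_interrupt,
		  IRQF_TIMER | IRQF_PER_CPU, "jcore_pit",
		  jcore_pit_percpu);
```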
On Thu, Aug 25, 2016 at 10:56:50AM -0400, Rich Felker wrote: > On Thu, Aug 25, 2016 at 10:07:08AM +0200, Thomas Gleixner wrote: > > On Wed, 24 Aug 2016, Rich Felker wrote: > As for this topic, what happened is that, after the first and second > time of expressing confusion about how the infrastructure did not make > sense and whether I should even be using it, I did not get anything > resembling an explanation of why it's the way it is or why I might be > wrong in thinking it's a bug, and so I went forward with the > assumption that it was just a bug. Now that Mark Rutland has explained > it well (and with your additional explanation below in your email), I > see what the motivation was, but I still think it could be done in a > less-confusing and more-consistent way that doesn't assume ARM-like > irq architecture. The fun thing is that right until the point you sent this patch, it was consistent for all hardware that we support and were aware of. For others, it is your hardware that is the confusing case. ;) In future, it's better to state 'this doesn't seem to match my hardware' rather than 'this is misdesigned'. Ideally, with a description of how your hardware works, as that's the thing no-one else is familiar with. > > If your particular hardware has the old scheme of separate interrupt numbers > > for per cpu interrupts, then you can simply use the normal interrupt scheme > > and request a separate interrupt per cpu. > > Nominally it uses the same range of hardware interrupt numbers for all > (presently both) cpus, but some of them get delivered to a specific > cpu associated with the event (presently, IPI and timer; IPI is on a > fixed number at synthesis time but timer is runtime configurable) > while others are conceptually deliverable to either cpu (presently > only delivered to cpu0, but that's treated as an implementation > detail).
Given you say it's delivered to the CPU associated with the event (and you have different PIT bases per-cpu), it sounds like your timer interrupt is percpu, it's just that the hwirq number can be chosen by software. > It currently works requesting the irq with flags that ensure the > handler runs on the same cpu it was delivered on, without using any > other percpu irq framework. If you have concerns about ways this could > break and want me to make the drivers do something else, I'm open to > suggestions. As I suggested, I don't think that this is right, and you need some mechanism to describe to the kernel that the interrupt is percpu (e.g. a flag in the interrupt-specifier in DT). Thanks, Mark.
On Thu, Aug 25, 2016 at 05:41:29PM +0200, Thomas Gleixner wrote: > On Thu, 25 Aug 2016, Rich Felker wrote: > > assumption that it was just a bug. Now that Mark Rutland has explained > > it well (and with your additional explanation below in your email), I > > see what the motivation was, but I still think it could be done in a > > less-confusing and more-consistent way that doesn't assume ARM-like > > irq architecture. > > It's not only ARM. Some MIPS Octeon stuff has the same layout and requirements > to use a single irq number for interrupts which are delivered on a per cpu > basis. > > Patches are welcome :) I'm not opposed to working on changes, but based on your below comments I think maybe this (percpu request) is just infrastructure I shouldn't be using. I think the source of my frustration was the repeated (maybe by different people; I don't remember now) suggestions that I use it even when I found that it didn't currently match well with the hardware. > > > If your particular hardware has the old scheme of separate interrupt numbers > > > for per cpu interrupts, then you can simply use the normal interrupt scheme > > > and request a separate interrupt per cpu. > > > > Nominally it uses the same range of hardware interrupt numbers for all > > (presently both) cpus, but some of them get delivered to a specific > > cpu associated with the event (presently, IPI and timer; IPI is on a > > fixed number at synthesis time but timer is runtime configurable) > > while others are conceptually deliverable to either cpu (presently > > only delivered to cpu0, but that's treated as an implementation > > detail). > > If I understand correctly, then this is the classic scheme: > > CPU0 IPI0 IRQ-N > CPU1 IPI1 IRQ-M > > These and the timers or whatever are strictly per cpu and therefore not routable. > Regular device interrupts can be routed to any CPU by setting the > affinity. Correct?
IPI generates hw irq 97 on whichever cpu it's targeted at, and the timer generates whatever hw irq you program it to generate (by convention, currently 72) on the cpu associated with the timer that expired. Treating "cpu0's irq 97" and "cpu1's irq 97" as separate hw irq numbers would be possible at the kernel level (just by using the cpu id as part of the logical hw irq number) but this would require lots of (imo useless) infrastructure/overhead and hard-coded assumptions about which irq numbers are used for percpu events, and it would not model the hardware well. > > It currently works requesting the irq with flags that ensure the > > handler runs on the same cpu it was delivered on, without using any > > other percpu irq framework. > > Which special flag are you referring to? I'm not aware of one. > > IRQF_PER_CPU is just telling the core that this is a non routable per cpu > interrupt. It's excluded from affinity setting and also on cpu hot unplug the > per cpu interrupts are not touched and nothing tries to reroute them to one of > the still online cpus. > > Regarding the interrupt handler. It runs on the CPU on which the interrupt is > delivered and there is nothing you can influence with a flag. OK, I was not clear on whether there was such a guarantee in general but knew there must be one for IRQF_TIMER or IRQF_PER_CPU. (Without knowing the system doesn't do this, it's possible that the softirq/tasklet stuff could migrate handling to a different cpu than the hardware irq was delivered on.) > > If you have concerns about ways this could break and want me to make the > > drivers do something else, I'm open to suggestions. > > If I understand the hardware halfways right, then using request_irq() with > IRQF_PER_CPU for these special interrupts is completely correct. 
> > The handler either uses this_cpu_xxx() for accessing the per cpu data related > to the interrupt or you can hand in a percpu pointer as dev_id to > request_irq() which then is handed to the interrupt function as a cookie. Yes, that's exactly what my driver is doing now, and I'm happy to leave it that way. Can we move forward with that? If so I'll make the other changes requested and submit a new version of the patch. Rich
On Thu, Aug 25, 2016 at 05:38:06PM +0100, Mark Rutland wrote: > On Thu, Aug 25, 2016 at 10:56:50AM -0400, Rich Felker wrote: > > On Thu, Aug 25, 2016 at 10:07:08AM +0200, Thomas Gleixner wrote: > > > On Wed, 24 Aug 2016, Rich Felker wrote: > > As for this topic, what happened is that, after the first and second > > time of expressing confusion about how the infrastructure did not make > > sense and whether I should even be using it, I did not get anything > > resembling an explanation of why it's the way it is or why I might be > > wrong in thinking it's a bug, and so I went forward with the > > assumption that it was just a bug. Now that Mark Rutland has explained > > it well (and with your additional explanation below in your email), I > > see what the motivation was, but I still think it could be done in a > > less-confusing and more-consistent way that doesn't assume ARM-like > > irq architecture. > > The fun thing is that right until the point you sent this patch, it was > consistent for all hardware that we support and were aware of. For others, it > is your hardware that is the confusing case. ;) > > In future, it's better to state 'this doesn't seem to match my hardware' rather > than 'this is misdesigned'. Ideally, with a description of how your hardware > works, as that's the thing no-one else is familiar with. OK. What I meant is "it's not sufficiently general". I usually come at these things not from a particular preconception of how it is/should-be done on some archs I'm familiar with, but from minimizing unnecessary (i.e. not beneficial to performance or correctness or simplicity) assumptions at the subsystem level about how hardware could work. > > > If your particular hardware has the old scheme of separate interrupt numbers > > > for per cpu interrupts, then you can simply use the normal interrupt scheme > > > and request a separate interrupt per cpu.
> > Nominally it uses the same range of hardware interrupt numbers for all > > (presently both) cpus, but some of them get delivered to a specific > > cpu associated with the event (presently, IPI and timer; IPI is on a > > fixed number at synthesis time but timer is runtime configurable) > > while others are conceptually deliverable to either cpu (presently > > only delivered to cpu0, but that's treated as an implementation > > detail). > > Given you say it's delivered to the CPU associated with the event (and you have > different PIT bases per-cpu), it sounds like your timer interrupt is percpu, > it's just that the hwirq number can be chosen by software. It's what I would call percpu in the hardware, but I'm not convinced that the Linux irq subsystem's "percpu" stuff models it in a way that fits the hw, nor that it's in any way necessary. > > It currently works requesting the irq with flags that ensure the > > handler runs on the same cpu it was delivered on, without using any > > other percpu irq framework. If you have concerns about ways this could > > break and want me to make the drivers do something else, I'm open to > > suggestions. > > As I suggested, I don't think that this is right, and you need some mechanism > to describe to the kernel that the interrupt is percpu (e.g. a flag in the > interrupt-specifier in DT). Thomas seemed to think it's okay as-is. Can you describe what you expect could go wrong by using request_irq rather than the ARM-style percpu irq framework? Rich
On Thu, Aug 25, 2016 at 01:51:35PM -0400, Rich Felker wrote: > On Thu, Aug 25, 2016 at 05:38:06PM +0100, Mark Rutland wrote: > > On Thu, Aug 25, 2016 at 10:56:50AM -0400, Rich Felker wrote: > > > On Thu, Aug 25, 2016 at 10:07:08AM +0200, Thomas Gleixner wrote: > > > Nominally it uses the same range of hardware interrupt numbers for all > > > (presently both) cpus, but some of them get delivered to a specific > > > cpu associated with the event (presently, IPI and timer; IPI is on a > > > fixed number at synthesis time but timer is runtime configurable) > > > while others are conceptually deliverable to either cpu (presently > > > only delivered to cpu0, but that's treated as an implementation > > > detail). > > > > Given you say it's delivered to the CPU associated with the event (and you have > > different PIT bases per-cpu), it sounds like your timer interrupt is percpu, > > it's just that the hwirq number can be chosen by software. > > It's what I would call percpu in the hardware, but I'm not convinced > that the Linux irq subsystem's "percpu" stuff models it in a way > that fits the hw, nor that it's in any way necessary. My understanding was that you used the same hwirq number to handle interrupts from per-cpu resources being delivered to their relevant CPUs, independently of each other. That in my mind is a perfect match. The only difference, as I've stated a number of times, seems to be that you can choose the hwirq number from software. > > > It currently works requesting the irq with flags that ensure the > > > handler runs on the same cpu it was delivered on, without using any > > > other percpu irq framework. If you have concerns about ways this could > > > break and want me to make the drivers do something else, I'm open to > > > suggestions. > > > > As I suggested, I don't think that this is right, and you need some mechanism > > to describe to the kernel that the interrupt is percpu (e.g. a flag in the > > interrupt-specifier in DT). 
> > Thomas seemed to think it's okay as-is. Can you describe what you > expect could go wrong by using request_irq rather than the ARM-style > percpu irq framework? The percpu irq code is designed to expect a hwirq number being shared by banked, cpu-local interrupts, the regular request_irq code is not. Even if the latter happens to work today for your use-case, that is not by design. Relying on non-deliberate properties of request_irq makes it far harder for the generic code to be altered in future, with global vs percpu locking, synchronisation, accounting, etc being broken. Thanks, Mark.
On Thu, Aug 25, 2016 at 07:21:13PM +0100, Mark Rutland wrote: > On Thu, Aug 25, 2016 at 01:51:35PM -0400, Rich Felker wrote: > > On Thu, Aug 25, 2016 at 05:38:06PM +0100, Mark Rutland wrote: > > > On Thu, Aug 25, 2016 at 10:56:50AM -0400, Rich Felker wrote: > > > > On Thu, Aug 25, 2016 at 10:07:08AM +0200, Thomas Gleixner wrote: > > > > Nominally it uses the same range of hardware interrupt numbers for all > > > > (presently both) cpus, but some of them get delivered to a specific > > > > cpu associated with the event (presently, IPI and timer; IPI is on a > > > > fixed number at synthesis time but timer is runtime configurable) > > > > while others are conceptually deliverable to either cpu (presently > > > > only delivered to cpu0, but that's treated as an implementation > > > > detail). > > > > > > Given you say it's delivered to the CPU associated with the event (and you have > > > different PIT bases per-cpu), it sounds like your timer interrupt is percpu, > > > it's just that the hwirq number can be chosen by software. > > > > It's what I would call percpu in the hardware, but I'm not convinced > > that the Linux irq subsystem's "percpu" stuff models it in a way > > that fits the hw, nor that it's in any way necessary. > > My understanding was that you used the same hwirq number to handle interrupts > from per-cpu resources being delivered to their relevant CPUs, independently of > each other. > > That in my mind is a perfect match. > > The only difference, as I've stated a number of times, seems to be that you can > choose the hwirq number from software. I don't understand where you're getting that from. The timer irq number can be chosen from software, but by convention the device tree provides an otherwise-unoccupied irq number to be used. The IPI irq cannot be chosen by software; it's hard-coded at synthesis time. Either way, though, I don't see how being able to choose the number from software would impact the situation. 
The reason I don't think the current percpu irq request framework is well-suited is that, unlike ARM, we don't have any particular set of irq numbers that are designated as percpu vs global. In theory the DT could be extended to provide that information somehow, but it's not information you need to actually be able to use the hardware; if it were added, it would just be satisfying the Linux irq subsystem's (imo gratuitous) desire to know. From my perspective, a hwirq being percpu is logically a property that only the driver for the attached hardware needs to be aware of. In fact the drivers using the percpu request already encode this, but there's an unnecessary requirement that the irqchip driver also have this knowledge. > > > > It currently works requesting the irq with flags that ensure the > > > > handler runs on the same cpu it was delivered on, without using any > > > > other percpu irq framework. If you have concerns about ways this could > > > > break and want me to make the drivers do something else, I'm open to > > > > suggestions. > > > > > > As I suggested, I don't think that this is right, and you need some mechanism > > > to describe to the kernel that the interrupt is percpu (e.g. a flag in the > > > interrupt-specifier in DT). > > > > Thomas seemed to think it's okay as-is. Can you describe what you > > expect could go wrong by using request_irq rather than the ARM-style > > percpu irq framework? > > The percpu irq code is designed to expect a hwirq number being shared by > banked, cpu-local interrupts, the regular request_irq code is not. Even if the > latter happens to work today for your use-case, that is not by design. > > Relying on non-deliberate properties of request_irq makes it far harder for the > generic code to be altered in future, with global vs percpu locking, > synchronisation, accounting, etc being broken. 
I can see how the irq subsystem wrongly thinking an irq is percpu when it's actually global could cause problems (missing locking, etc.) but I fail to see how the opposite could. In an abstract sense, percpu irqs are just a special case of global ones. The general infrastructure has no reason to care whether it was delivered on cpuN because the event was associated with cpuN, or because cpuN happened to be where the hw decided to deliver it. Only the actual driver/handler needs to know that the cpu on which it was delivered is meaningful. Rich
On 08/24/2016 07:40 PM, Rich Felker wrote:

[ ... ]

>>> +config CLKSRC_JCORE_PIT
>>> +	bool "J-Core PIT timer driver"
>>> +	depends on OF && (SUPERH || COMPILE_TEST)
>>
>> Even if this is correct, for the sake of consistency, it is preferable
>> to change it to:
>>
>>	bool "J-Core PIT timer driver" if COMPILE_TEST
>>	depends on SUPERH
>>	select CLKSRC_OF
>
> Is this functionally equivalent? If so that's non-obvious to me. What
> about the dropped OF dependency? The intent is that the option should
> always be available for SUPERH targets using OF, otherwise only
> available with COMPILE_TEST.

It is not equivalent, but it is consistent with the other options, where it is not possible to manually set/unset the driver config without COMPILE_TEST.

>>> +static int jcore_pit_set(unsigned long delta, struct jcore_pit *pit)
>>> +{
>>> +	jcore_pit_disable(pit);
>>> +	__raw_writel(delta, pit->base + REG_THROT);
>>> +	__raw_writel(pit->enable_val, pit->base + REG_PITEN);
>>> +	return 0;
>>
>> Why do you need to use __raw_writel ?
>>
>> s/__raw_writel/writel/
>
> I actually tried multiple times to find good resources on policy for
> which form to prefer, but didn't have much luck. My understanding is
> that __raw_writel/__raw_readl always performs a native-endian
> load/store, whereas writel/readl behavior depends on cpu endianness
> and whether the arch declares that "pci bus" (that was the term I
> found in the source, not relevant here) io is endian-swapped or not.
> Can you give me a better explanation of why we might prefer one form
> or the other?

[ skipping this as it was very well commented by Arnd and Mark ]

>>> +	freq = DIV_ROUND_CLOSEST(NSEC_PER_SEC, buspd);
>>> +	pit->periodic_delta = DIV_ROUND_CLOSEST(NSEC_PER_SEC, HZ*buspd);
>>
>> ---> HZ * buspd
>
> OK.
>
>>> +	clockevents_config_and_register(&pit->ced, freq, 1, ULONG_MAX);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int jcore_pit_local_shutdown(unsigned cpu)
>>> +{
>>> +	return 0;
>>> +}
>>
>> This function is useless I think. AFAIU, cpuhp_setup_state can have a
>> NULL function for the cpu teardown.
>
> OK, I wasn't sure if that was permitted.
>
>>> +	jcore_cs.name = "jcore_pit_cs";
>>> +	jcore_cs.rating = 400;
>>> +	jcore_cs.read = jcore_clocksource_read;
>>> +	jcore_cs.mult = 1;
>>> +	jcore_cs.shift = 0;
>>> +	jcore_cs.mask = CLOCKSOURCE_MASK(32);
>>> +	jcore_cs.flags = CLOCK_SOURCE_IS_CONTINUOUS;
>>> +
>>> +	err = clocksource_register_hz(&jcore_cs, NSEC_PER_SEC);
>>> +	if (err) {
>>> +		pr_err("Error registering clocksource device: %d\n", err);
>>> +		return err;
>>> +	}
>>
>> Maybe you can consider replacing the above with:
>>
>>	clocksource_mmio_init(jcore_pit_base, "jcore_pit_cs",
>>			      NSEC_PER_SEC, 32,
>>			      jcore_clocksource_read);
>
> I think you're missing the rating argument. Otherwise it should work,
> but is there a good reason to prefer it? The code is slightly simpler;
> I'm not sure if using kzalloc vs static storage is better or worse.

Probably the clksrc field pointers are pre-calculated at compile time, while the allocation results in an indirection in the code to compute the pointers, but I don't think it is noticeable.

[ ... ]

>>> +	/*
>>> +	 * The J-Core PIT is not hard-wired to a particular IRQ, but
>>> +	 * integrated with the interrupt controller such that the IRQ it
>>> +	 * generates is programmable. The programming interface has a
>>> +	 * legacy field which was an interrupt priority for AIC1, but
>>> +	 * which is OR'd onto bits 2-5 of the generated IRQ number when
>>> +	 * used with J-Core AIC2, so set it to match these bits.
>>> +	 */
>>> +	hwirq = irq_get_irq_data(pit_irq)->hwirq;
>>
>>	irq_hw_number_t hwirq;
>>	hwirq = irqd_to_hwirq(irq_get_irq_data(pit_irq));
>
> OK.
>
>>> +	irqprio = (hwirq >> 2) & PIT_PRIO_MASK;
>>> +	enable_val = (1U << PIT_ENABLE_SHIFT)
>>> +		   | (hwirq << PIT_IRQ_SHIFT)
>>> +		   | (irqprio << PIT_PRIO_SHIFT);
>>
>> I'm missing the connection between the description above and the enable
>> value computed here. Can you elaborate ?
>
> The irqprio field is filled in using a value that matches bits 2 and
> up of hwirq; this is what the comment says and what the code does. Can
> you elaborate on what you don't understand?

The API to compute the 'enable_val'. Having a technical reference manual would help a lot.
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 5677886..3210ca5 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -407,6 +407,15 @@ config SYS_SUPPORTS_SH_TMU
 config SYS_SUPPORTS_EM_STI
 	bool
 
+config CLKSRC_JCORE_PIT
+	bool "J-Core PIT timer driver"
+	depends on OF && (SUPERH || COMPILE_TEST)
+	depends on GENERIC_CLOCKEVENTS
+	depends on HAS_IOMEM
+	help
+	  This enables build of clocksource and clockevent driver for
+	  the integrated PIT in the J-Core synthesizable, open source SoC.
+
 config SH_TIMER_CMT
 	bool "Renesas CMT timer driver" if COMPILE_TEST
 	depends on GENERIC_CLOCKEVENTS
diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
index fd9d6df..cf87f40 100644
--- a/drivers/clocksource/Makefile
+++ b/drivers/clocksource/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ATMEL_TCB_CLKSRC)	+= tcb_clksrc.o
 obj-$(CONFIG_X86_PM_TIMER)	+= acpi_pm.o
 obj-$(CONFIG_SCx200HR_TIMER)	+= scx200_hrt.o
 obj-$(CONFIG_CS5535_CLOCK_EVENT_SRC)	+= cs5535-clockevt.o
+obj-$(CONFIG_CLKSRC_JCORE_PIT)	+= jcore-pit.o
 obj-$(CONFIG_SH_TIMER_CMT)	+= sh_cmt.o
 obj-$(CONFIG_SH_TIMER_MTU2)	+= sh_mtu2.o
 obj-$(CONFIG_SH_TIMER_TMU)	+= sh_tmu.o
diff --git a/drivers/clocksource/jcore-pit.c b/drivers/clocksource/jcore-pit.c
new file mode 100644
index 0000000..23dee50
--- /dev/null
+++ b/drivers/clocksource/jcore-pit.c
@@ -0,0 +1,242 @@
+/*
+ * J-Core SoC PIT/clocksource driver
+ *
+ * Copyright (C) 2015-2016 Smart Energy Instruments, Inc.
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+#include <linux/clockchips.h>
+#include <linux/clocksource.h>
+#include <linux/sched_clock.h>
+#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+
+#define PIT_IRQ_SHIFT		12
+#define PIT_PRIO_SHIFT		20
+#define PIT_ENABLE_SHIFT	26
+#define PIT_IRQ_MASK		0x3f
+#define PIT_PRIO_MASK		0xf
+
+#define REG_PITEN		0x00
+#define REG_THROT		0x10
+#define REG_COUNT		0x14
+#define REG_BUSPD		0x18
+#define REG_SECHI		0x20
+#define REG_SECLO		0x24
+#define REG_NSEC		0x28
+
+struct jcore_pit {
+	struct clock_event_device	ced;
+	__iomem void			*base;
+	unsigned long			periodic_delta;
+	unsigned			cpu;
+	u32				enable_val;
+};
+
+static __iomem void		*jcore_pit_base;
+static struct clocksource	jcore_cs;
+struct jcore_pit __percpu	*jcore_pit_percpu;
+
+static notrace u64 jcore_sched_clock_read(void)
+{
+	u32 seclo, nsec, seclo0;
+	__iomem void *base = jcore_pit_base;
+
+	seclo = __raw_readl(base + REG_SECLO);
+	do {
+		seclo0 = seclo;
+		nsec  = __raw_readl(base + REG_NSEC);
+		seclo = __raw_readl(base + REG_SECLO);
+	} while (seclo0 != seclo);
+
+	return seclo * NSEC_PER_SEC + nsec;
+}
+
+static cycle_t jcore_clocksource_read(struct clocksource *cs)
+{
+	return jcore_sched_clock_read();
+}
+
+static int jcore_pit_disable(struct jcore_pit *pit)
+{
+	__raw_writel(0, pit->base + REG_PITEN);
+	return 0;
+}
+
+static int jcore_pit_set(unsigned long delta, struct jcore_pit *pit)
+{
+	jcore_pit_disable(pit);
+	__raw_writel(delta, pit->base + REG_THROT);
+	__raw_writel(pit->enable_val, pit->base + REG_PITEN);
+	return 0;
+}
+
+static int jcore_pit_set_state_shutdown(struct clock_event_device *ced)
+{
+	struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced);
+
+	return jcore_pit_disable(pit);
+}
+
+static int jcore_pit_set_state_oneshot(struct clock_event_device *ced)
+{
+	struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced);
+
+	return jcore_pit_disable(pit);
+}
+
+static int jcore_pit_set_state_periodic(struct clock_event_device *ced)
+{
+	struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced);
+
+	return jcore_pit_set(pit->periodic_delta, pit);
+}
+
+static int jcore_pit_set_next_event(unsigned long delta,
+				    struct clock_event_device *ced)
+{
+	struct jcore_pit *pit = container_of(ced, struct jcore_pit, ced);
+
+	return jcore_pit_set(delta, pit);
+}
+
+static int jcore_pit_local_init(unsigned cpu)
+{
+	struct jcore_pit *pit = this_cpu_ptr(jcore_pit_percpu);
+	unsigned buspd, freq;
+
+	pr_info("Local J-Core PIT init on cpu %u\n", pit->cpu);
+
+	buspd = __raw_readl(pit->base + REG_BUSPD);
+	freq = DIV_ROUND_CLOSEST(NSEC_PER_SEC, buspd);
+	pit->periodic_delta = DIV_ROUND_CLOSEST(NSEC_PER_SEC, HZ*buspd);
+
+	clockevents_config_and_register(&pit->ced, freq, 1, ULONG_MAX);
+
+	return 0;
+}
+
+static int jcore_pit_local_shutdown(unsigned cpu)
+{
+	return 0;
+}
+
+static irqreturn_t jcore_timer_interrupt(int irq, void *dev_id)
+{
+	struct jcore_pit *pit = this_cpu_ptr(dev_id);
+
+	if (clockevent_state_oneshot(&pit->ced))
+		jcore_pit_disable(pit);
+
+	pit->ced.event_handler(&pit->ced);
+
+	return IRQ_HANDLED;
+}
+
+static int __init jcore_pit_init(struct device_node *node)
+{
+	int err;
+	unsigned pit_irq, cpu;
+	unsigned long hwirq;
+	u32 irqprio, enable_val;
+
+	jcore_pit_base = of_iomap(node, 0);
+	if (!jcore_pit_base) {
+		pr_err("Error: Cannot map base address for J-Core PIT\n");
+		return -ENXIO;
+	}
+
+	pit_irq = irq_of_parse_and_map(node, 0);
+	if (!pit_irq) {
+		pr_err("Error: J-Core PIT has no IRQ\n");
+		return -ENXIO;
+	}
+
+	pr_info("Initializing J-Core PIT at %p IRQ %d\n",
+		jcore_pit_base, pit_irq);
+
+	jcore_cs.name = "jcore_pit_cs";
+	jcore_cs.rating = 400;
+	jcore_cs.read = jcore_clocksource_read;
+	jcore_cs.mult = 1;
+	jcore_cs.shift = 0;
+	jcore_cs.mask = CLOCKSOURCE_MASK(32);
+	jcore_cs.flags = CLOCK_SOURCE_IS_CONTINUOUS;
+
+	err = clocksource_register_hz(&jcore_cs, NSEC_PER_SEC);
+	if (err) {
+		pr_err("Error registering clocksource device: %d\n", err);
+		return err;
+	}
+
+	sched_clock_register(jcore_sched_clock_read, 32, NSEC_PER_SEC);
+
+	jcore_pit_percpu = alloc_percpu(struct jcore_pit);
+	if (!jcore_pit_percpu) {
+		pr_err("Failed to allocate memory for clock event device\n");
+		return -ENOMEM;
+	}
+
+	err = request_irq(pit_irq, jcore_timer_interrupt,
+			  IRQF_TIMER | IRQF_PERCPU,
+			  "jcore_pit", jcore_pit_percpu);
+	if (err) {
+		pr_err("pit irq request failed: %d\n", err);
+		return err;
+	}
+
+	/*
+	 * The J-Core PIT is not hard-wired to a particular IRQ, but
+	 * integrated with the interrupt controller such that the IRQ it
+	 * generates is programmable. The programming interface has a
+	 * legacy field which was an interrupt priority for AIC1, but
+	 * which is OR'd onto bits 2-5 of the generated IRQ number when
+	 * used with J-Core AIC2, so set it to match these bits.
+	 */
+	hwirq = irq_get_irq_data(pit_irq)->hwirq;
+	irqprio = (hwirq >> 2) & PIT_PRIO_MASK;
+	enable_val = (1U << PIT_ENABLE_SHIFT)
+		   | (hwirq << PIT_IRQ_SHIFT)
+		   | (irqprio << PIT_PRIO_SHIFT);
+
+	for_each_present_cpu(cpu) {
+		struct jcore_pit *pit = per_cpu_ptr(jcore_pit_percpu, cpu);
+
+		pit->base = of_iomap(node, cpu);
+		if (!pit->base) {
+			pr_err("Unable to map PIT for cpu %u\n", cpu);
+			continue;
+		}
+
+		pit->ced.name = "jcore_pit";
+		pit->ced.features = CLOCK_EVT_FEAT_PERIODIC
+				  | CLOCK_EVT_FEAT_ONESHOT
+				  | CLOCK_EVT_FEAT_PERCPU;
+		pit->ced.cpumask = cpumask_of(cpu);
+		pit->ced.rating = 400;
+		pit->ced.irq = pit_irq;
+		pit->ced.set_state_shutdown = jcore_pit_set_state_shutdown;
+		pit->ced.set_state_periodic = jcore_pit_set_state_periodic;
+		pit->ced.set_state_oneshot = jcore_pit_set_state_oneshot;
+		pit->ced.set_next_event = jcore_pit_set_next_event;
+
+		pit->cpu = cpu;
+		pit->enable_val = enable_val;
+	}
+
+	cpuhp_setup_state(CPUHP_AP_JCORE_TIMER_STARTING,
+			  "AP_JCORE_TIMER_STARTING",
+			  jcore_pit_local_init, jcore_pit_local_shutdown);
+
+	return 0;
+}
+
+CLOCKSOURCE_OF_DECLARE(jcore_pit, "jcore,pit", jcore_pit_init);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 242bf53..e95ed7d 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -50,6 +50,7 @@ enum cpuhp_state {
 	CPUHP_AP_ARM_ARCH_TIMER_STARTING,
 	CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
 	CPUHP_AP_DUMMY_TIMER_STARTING,
+	CPUHP_AP_JCORE_TIMER_STARTING,
 	CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
 	CPUHP_AP_ARM_TWD_STARTING,
 	CPUHP_AP_METAG_TIMER_STARTING,
At the hardware level, the J-Core PIT is integrated with the interrupt controller, but it is represented as its own device and has an independent programming interface. It provides a 12-bit countdown timer, which is not presently used, and a periodic timer. The interval length for the latter is programmable via a 32-bit throttle register whose units are determined by a bus-period register. The periodic timer is used to implement both periodic and oneshot clock event modes; in oneshot mode the interrupt handler simply disables the timer as soon as it fires.

Despite its device tree node representing an interrupt for the PIT, the actual irq generated is programmable, not hard-wired. The driver is responsible for programming the PIT to generate the hardware irq number that the DT assigns to it.

On SMP configurations, J-Core provides cpu-local instances of the PIT; no broadcast timer is needed. This driver supports the creation of the necessary per-cpu clock_event_device instances. The code has been tested and works on SMP, but will not be usable without additional J-Core SMP-support patches and appropriate hardware capable of running SMP.

A nanosecond-resolution clocksource is provided using the J-Core "RTC" registers, which give a 64-bit seconds count and 32-bit nanoseconds. The driver converts these to a 64-bit nanoseconds count.

Signed-off-by: Rich Felker <dalias@libc.org>
---
 drivers/clocksource/Kconfig     |   9 ++
 drivers/clocksource/Makefile    |   1 +
 drivers/clocksource/jcore-pit.c | 242 ++++++++++++++++++++++++++++++++++++++++
 include/linux/cpuhotplug.h      |   1 +
 4 files changed, 253 insertions(+)
 create mode 100644 drivers/clocksource/jcore-pit.c