Message ID | 1599472439-22770-1-git-send-email-qiangqing.zhang@nxp.com (mailing list archive) |
---|---|
State | New, archived |
Series | perf/imx_ddr: Add stop event counters support for i.MX8MP |
On Mon, Sep 07, 2020 at 05:53:59PM +0800, Joakim Zhang wrote:
> DDR Perf driver only supports free-running event counters(counter1/2/3)
> now, this patch adds support for stop event counters.
>
> Legacy SoCs:
> Cycle counter(counter0) is a special counter, only count cycles. When
> cycle counter overflow, it will lock all counters and generate an
> interrupt. In ddr_perf_irq_handler, disable cycle counter then all
> counters would stop at the same time, update all counters' count, then
> enable cycle counter that all counters count again. During this process,
> only clear cycle counter, no need to clear event counters since they are
> free-running counters. They would continue counting after overflow and
> do/while loop from ddr_perf_event_update can handle event counters
> overflow case.
>
> i.MX8MP:
> Almost all is the same as legacy SoCs, the only difference is that, event
> counters are not free-running any more. Like cycle counter, when event
> counters overflow, they would stop counting unless clear the counter,
> and no interrupt generate for event counters. So we should clear event
> counters that let them re-count when cycle counter overflow, which ensure
> event counters will not lose data.

Was this supposed to be an improvement over the "Legacy SoCs"
implementation? It seems even worse...

Do you _have_ to write zeroes back to the event counters to get them going
again, or will any value do?

> diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
> index 90884d14f95f..057e361eb391 100644
> --- a/drivers/perf/fsl_imx8_ddr_perf.c
> +++ b/drivers/perf/fsl_imx8_ddr_perf.c
> @@ -14,6 +14,7 @@
>  #include <linux/of_device.h>
>  #include <linux/of_irq.h>
>  #include <linux/perf_event.h>
> +#include <linux/spinlock.h>
>  #include <linux/slab.h>
>
>  #define COUNTER_CNTL	0x0
> @@ -82,6 +83,7 @@ struct ddr_pmu {
>  	const struct fsl_ddr_devtype_data *devtype_data;
>  	int irq;
>  	int id;
> +	spinlock_t lock;
>  };
>
>  enum ddr_perf_filter_capabilities {
> @@ -368,16 +370,19 @@ static void ddr_perf_event_update(struct perf_event *event)
>  	struct hw_perf_event *hwc = &event->hw;
>  	u64 delta, prev_raw_count, new_raw_count;
>  	int counter = hwc->idx;
> +	unsigned long flags;
>
> -	do {
> -		prev_raw_count = local64_read(&hwc->prev_count);
> -		new_raw_count = ddr_perf_read_counter(pmu, counter);
> -	} while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> -			new_raw_count) != prev_raw_count);
> +	spin_lock_irqsave(&pmu->lock, flags);
> +
> +	prev_raw_count = local64_read(&hwc->prev_count);
> +	new_raw_count = ddr_perf_read_counter(pmu, counter);
>
>  	delta = (new_raw_count - prev_raw_count) & 0xFFFFFFFF;
>
>  	local64_add(delta, &event->count);
> +	local64_set(&hwc->prev_count, new_raw_count);

Hmm, assuming that the event counters never overflow, why do we care about
the prev count at all? In other words, why don't we just add the counter
value to event->count and reset the hardware to zero every time?

Will
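For context, the do/while update that the patch removes tolerates a 32-bit wrap of a free-running counter purely arithmetically. Below is the same logic as the removed hunk, with explanatory comments added here for illustration (not part of the driver source):

	do {
		/* Snapshot the last value we accounted for and the current raw value... */
		prev_raw_count = local64_read(&hwc->prev_count);
		new_raw_count = ddr_perf_read_counter(pmu, counter);
		/* ...and retry if another updater raced us and moved prev_count. */
	} while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
				 new_raw_count) != prev_raw_count);

	/* Masking to 32 bits makes the subtraction come out right even when the
	 * free-running 32-bit hardware counter wrapped between the two reads. */
	delta = (new_raw_count - prev_raw_count) & 0xFFFFFFFF;
	local64_add(delta, &event->count);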
> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: September 8, 2020 1:07
> To: Joakim Zhang <qiangqing.zhang@nxp.com>
> Cc: mark.rutland@arm.com; robin.murphy@arm.com; dl-linux-imx
> <linux-imx@nxp.com>; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] perf/imx_ddr: Add stop event counters support for
> i.MX8MP
>
> On Mon, Sep 07, 2020 at 05:53:59PM +0800, Joakim Zhang wrote:
> > DDR Perf driver only supports free-running event
> > counters(counter1/2/3) now, this patch adds support for stop event counters.
> >
> > Legacy SoCs:
> > Cycle counter(counter0) is a special counter, only count cycles. When
> > cycle counter overflow, it will lock all counters and generate an
> > interrupt. In ddr_perf_irq_handler, disable cycle counter then all
> > counters would stop at the same time, update all counters' count, then
> > enable cycle counter that all counters count again. During this
> > process, only clear cycle counter, no need to clear event counters
> > since they are free-running counters. They would continue counting
> > after overflow and do/while loop from ddr_perf_event_update can handle
> > event counters overflow case.
> >
> > i.MX8MP:
> > Almost all is the same as legacy SoCs, the only difference is that,
> > event counters are not free-running any more. Like cycle counter, when
> > event counters overflow, they would stop counting unless clear the
> > counter, and no interrupt generate for event counters. So we should
> > clear event counters that let them re-count when cycle counter
> > overflow, which ensure event counters will not lose data.
>
> Was this supposed to be an improvement over the "Legacy SoCs"
> implementation? It seems even worse...

From the IC designers' perspective it is an improvement: they consider that
event counters should also stop counting when they overflow, so they fixed
it as a bug. From a software perspective we would prefer the event counters
to stay free-running. However, the IC designers did not inform us when they
made this change.

> Do you _have_ to write zeroes back to the event counters to get them going
> again, or will any value do?

No, the event counters also have a CLEAR bit; we only need to clear this
CLEAR bit and the event counters start counting again from zero.
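To make the CLEAR-bit remark concrete, restarting an event counter could look roughly like the sketch below. ddr_perf_event_counter_clear() is a hypothetical helper written for this illustration, and CNTL_CLEAR is assumed to be the driver's existing mask for that control bit; the bit behaviour follows Joakim's description above rather than the posted patch:

	/* Hypothetical helper: restart an event counter from zero by writing 0 to
	 * its CLEAR bit, as described in the reply above. */
	static void ddr_perf_event_counter_clear(struct ddr_pmu *pmu, int counter)
	{
		u32 val;

		val = readl_relaxed(pmu->base + counter * 4 + COUNTER_CNTL);
		val &= ~CNTL_CLEAR;
		writel(val, pmu->base + counter * 4 + COUNTER_CNTL);
	}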
> > diff --git a/drivers/perf/fsl_imx8_ddr_perf.c
> > b/drivers/perf/fsl_imx8_ddr_perf.c
> > index 90884d14f95f..057e361eb391 100644
> > --- a/drivers/perf/fsl_imx8_ddr_perf.c
> > +++ b/drivers/perf/fsl_imx8_ddr_perf.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/of_device.h>
> >  #include <linux/of_irq.h>
> >  #include <linux/perf_event.h>
> > +#include <linux/spinlock.h>
> >  #include <linux/slab.h>
> >
> >  #define COUNTER_CNTL	0x0
> > @@ -82,6 +83,7 @@ struct ddr_pmu {
> >  	const struct fsl_ddr_devtype_data *devtype_data;
> >  	int irq;
> >  	int id;
> > +	spinlock_t lock;
> >  };
> >
> >  enum ddr_perf_filter_capabilities {
> > @@ -368,16 +370,19 @@ static void ddr_perf_event_update(struct
> perf_event *event)
> >  	struct hw_perf_event *hwc = &event->hw;
> >  	u64 delta, prev_raw_count, new_raw_count;
> >  	int counter = hwc->idx;
> > +	unsigned long flags;
> >
> > -	do {
> > -		prev_raw_count = local64_read(&hwc->prev_count);
> > -		new_raw_count = ddr_perf_read_counter(pmu, counter);
> > -	} while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> > -			new_raw_count) != prev_raw_count);
> > +	spin_lock_irqsave(&pmu->lock, flags);
> > +
> > +	prev_raw_count = local64_read(&hwc->prev_count);
> > +	new_raw_count = ddr_perf_read_counter(pmu, counter);
> >
> >  	delta = (new_raw_count - prev_raw_count) & 0xFFFFFFFF;
> >
> >  	local64_add(delta, &event->count);
> > +	local64_set(&hwc->prev_count, new_raw_count);
>
> Hmm, assuming that the event counters never overflow, why do we care about
> the prev count at all? In other words, why don't we just add the counter value
> to event->count and reset the hardware to zero every time?

Do you mean: for the cycle counter, keep the original routine; for the event
counters, add the counter value to event->count and then clear the event
counters so they count from zero again?

Sounds great! I will give it a try. Thanks.

Best Regards,
Joakim Zhang

> Will
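The approach Joakim agrees to try could look something like the sketch below for the event-counter path. This is illustrative only, not the posted patch: ddr_perf_event_counter_update() is a hypothetical function name, it reuses the hypothetical ddr_perf_event_counter_clear() from the earlier sketch, and the cycle counter is assumed to keep the original free-running do/while update.

	/* Sketch: event-counter update only; the cycle counter keeps the
	 * original free-running update. */
	static void ddr_perf_event_counter_update(struct perf_event *event)
	{
		struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
		struct hw_perf_event *hwc = &event->hw;
		int counter = hwc->idx;
		u64 new_raw_count;

		/* Fold the raw value straight into the event count... */
		new_raw_count = ddr_perf_read_counter(pmu, counter);
		local64_add(new_raw_count, &event->count);

		/* ...then clear the hardware counter so it restarts from zero and,
		 * on i.MX8MP, cannot sit stopped at an overflowed value. */
		ddr_perf_event_counter_clear(pmu, counter);
	}

This removes the need to track prev_count for the event counters at all, which is what Will's question points at.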
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 90884d14f95f..057e361eb391 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -14,6 +14,7 @@
 #include <linux/of_device.h>
 #include <linux/of_irq.h>
 #include <linux/perf_event.h>
+#include <linux/spinlock.h>
 #include <linux/slab.h>
 
 #define COUNTER_CNTL	0x0
@@ -82,6 +83,7 @@ struct ddr_pmu {
 	const struct fsl_ddr_devtype_data *devtype_data;
 	int irq;
 	int id;
+	spinlock_t lock;
 };
 
 enum ddr_perf_filter_capabilities {
@@ -368,16 +370,19 @@ static void ddr_perf_event_update(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
 	int counter = hwc->idx;
+	unsigned long flags;
 
-	do {
-		prev_raw_count = local64_read(&hwc->prev_count);
-		new_raw_count = ddr_perf_read_counter(pmu, counter);
-	} while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-			new_raw_count) != prev_raw_count);
+	spin_lock_irqsave(&pmu->lock, flags);
+
+	prev_raw_count = local64_read(&hwc->prev_count);
+	new_raw_count = ddr_perf_read_counter(pmu, counter);
 
 	delta = (new_raw_count - prev_raw_count) & 0xFFFFFFFF;
 
 	local64_add(delta, &event->count);
+	local64_set(&hwc->prev_count, new_raw_count);
+
+	spin_unlock_irqrestore(&pmu->lock, flags);
 }
 
 static void ddr_perf_counter_enable(struct ddr_pmu *pmu, int config,
@@ -404,6 +409,15 @@ static void ddr_perf_counter_enable(struct ddr_pmu *pmu, int config,
 	}
 }
 
+static bool ddr_perf_counter_overflow(struct ddr_pmu *pmu, int counter)
+{
+	int val;
+
+	val = readl_relaxed(pmu->base + counter * 4 + COUNTER_CNTL);
+
+	return val & CNTL_OVER ? true : false;
+}
+
 static void ddr_perf_event_start(struct perf_event *event, int flags)
 {
 	struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
@@ -534,7 +548,7 @@ static int ddr_perf_init(struct ddr_pmu *pmu, void __iomem *base,
 
 static irqreturn_t ddr_perf_irq_handler(int irq, void *p)
 {
-	int i;
+	int i, ret;
 	struct ddr_pmu *pmu = (struct ddr_pmu *) p;
 	struct perf_event *event, *cycle_event = NULL;
 
@@ -546,7 +560,7 @@ static irqreturn_t ddr_perf_irq_handler(int irq, void *p)
 	/*
 	 * When the cycle counter overflows, all counters are stopped,
 	 * and an IRQ is raised. If any other counter overflows, it
-	 * continues counting, and no IRQ is raised.
+	 * continues counting (stop counting for i.MX8MP), and no IRQ is raised.
 	 *
 	 * Cycles occur at least 4 times as often as other events, so we
 	 * can update all events on a cycle counter overflow and not
@@ -566,6 +580,29 @@ static irqreturn_t ddr_perf_irq_handler(int irq, void *p)
 			cycle_event = event;
 	}
 
+	/* Clear event counters to avoid they stop counting when overflow, such as i.MX8MP */
+	spin_lock(&pmu->lock);
+	for (i = 0; i < NUM_COUNTERS; i++) {
+		if (!pmu->events[i])
+			continue;
+
+		event = pmu->events[i];
+
+		if (event->hw.idx == EVENT_CYCLES_COUNTER)
+			continue;
+
+		/* check event counters overflow */
+		ret = ddr_perf_counter_overflow(pmu, event->hw.idx);
+		if (ret)
+			dev_warn(pmu->dev, "Event Counter%d overflow happened, data incorrect!!\n", i);
+
+		/* clear event counters */
+		ddr_perf_counter_enable(pmu, event->attr.config, event->hw.idx, true);
+
+		local64_set(&event->hw.prev_count, 0);
+	}
+	spin_unlock(&pmu->lock);
+
 	ddr_perf_counter_enable(pmu,
 			      EVENT_CYCLES_ID,
 			      EVENT_CYCLES_COUNTER,
@@ -619,6 +656,7 @@ static int ddr_perf_probe(struct platform_device *pdev)
 	num = ddr_perf_init(pmu, base, &pdev->dev);
 
 	platform_set_drvdata(pdev, pmu);
+	spin_lock_init(&pmu->lock);
 
 	name = devm_kasprintf(&pdev->dev, GFP_KERNEL, DDR_PERF_DEV_NAME "%d",
 			      num);
The DDR Perf driver currently only supports free-running event counters
(counter1/2/3); this patch adds support for stop event counters.

Legacy SoCs:
The cycle counter (counter0) is a special counter that only counts cycles.
When the cycle counter overflows, it locks all counters and generates an
interrupt. In ddr_perf_irq_handler, we disable the cycle counter so that all
counters stop at the same time, update every counter's count, and then
re-enable the cycle counter so that all counters start counting again.
During this process only the cycle counter is cleared; there is no need to
clear the event counters since they are free-running: they keep counting
after an overflow, and the do/while loop in ddr_perf_event_update handles
the event counter overflow case.

i.MX8MP:
Almost everything is the same as on legacy SoCs; the only difference is that
the event counters are no longer free-running. Like the cycle counter, when
an event counter overflows it stops counting unless the counter is cleared,
and no interrupt is generated for event counters. So we should clear the
event counters when the cycle counter overflows, letting them count again,
which ensures the event counters do not lose data.

Consider one case: in the cycle counter interrupt context, we invoke
ddr_perf_counter_enable to clear an event counter but have not yet set
prev_count to 0. Concurrently, ddr_perf_event_update in another thread
context invokes ddr_perf_read_counter to read that event counter and gets 0,
so the delta calculation (new_raw_count - prev_raw_count) is incorrect. I
therefore add a spinlock so that clearing the event counters and updating
them never happen concurrently. It is safe for the cycle counter to be
cleared and then updated, since at that point it has just overflowed.

This patch adds stop event counter support in a way that stays compatible
with free-running event counters.

Hi Will,

I am resending the patch for your review since the time span from the last
mail is too long. I am not sure whether this is a proper solution or not; if
there is a better one, please share how to implement it. Thanks.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
---
 drivers/perf/fsl_imx8_ddr_perf.c | 52 +++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 7 deletions(-)
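To make the race described in the cover letter concrete, these are the two sequences the new pmu->lock serializes, distilled from the diff above. The example_* wrappers are illustrative only, not functions in the driver. Without the lock, the update side can read a just-cleared counter (value 0) while prev_count still holds the old value, so the masked subtraction produces a huge bogus delta.

	/* IRQ-handler side: clear the event counter and reset its prev_count. */
	static void example_clear_side(struct ddr_pmu *pmu, struct perf_event *event)
	{
		spin_lock(&pmu->lock);
		/* Re-enabling with the CLEAR handling restarts the counter from zero. */
		ddr_perf_counter_enable(pmu, event->attr.config, event->hw.idx, true);
		local64_set(&event->hw.prev_count, 0);
		spin_unlock(&pmu->lock);
	}

	/* Update side: the locked body of ddr_perf_event_update() from the patch. */
	static void example_update_side(struct ddr_pmu *pmu, struct perf_event *event)
	{
		struct hw_perf_event *hwc = &event->hw;
		u64 prev, now, delta;
		unsigned long flags;

		spin_lock_irqsave(&pmu->lock, flags);
		prev = local64_read(&hwc->prev_count);
		now = ddr_perf_read_counter(pmu, hwc->idx);
		delta = (now - prev) & 0xFFFFFFFF;
		local64_add(delta, &event->count);
		local64_set(&hwc->prev_count, now);
		spin_unlock_irqrestore(&pmu->lock, flags);
	}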