diff mbox

[1/3] arm/pmu: Reject groups spanning multiple hardware PMUs

Message ID 1425905192-10509-2-git-send-email-suzuki.poulose@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Suzuki K Poulose March 9, 2015, 12:46 p.m. UTC
From: "Suzuki K. Poulose" <suzuki.poulose@arm.com>

Don't allow grouping hardware events from different PMUs
 (eg. CCI + CPU).

Fixes a crash triggered by perf_fuzzer on Linux-4.0-rc2,
with CCI PMU turned on. The validate_event(), after certain checks,
assumes that the given hardware pmu event belongs to armpmu,
which may not be true always, with other hardware PMUs
around (CCI, CCN).

 ---
CPU: 0 PID: 1527 Comm: perf_fuzzer Not tainted 4.0.0-rc2 #57
Hardware name: ARM-Versatile Express
task: bd8484c0 ti: be676000 task.ti: be676000
PC is at 0xbf1bbc90
LR is at validate_event+0x34/0x5c
pc : [<bf1bbc90>]    lr : [<80016060>]    psr: 00000013
...
[<80016060>] (validate_event) from [<80016198>] (validate_group+0x28/0x90)
[<80016198>] (validate_group) from [<80016398>] (armpmu_event_init+0x150/0x218)
[<80016398>] (armpmu_event_init) from [<800882e4>] (perf_try_init_event+0x30/0x48)
[<800882e4>] (perf_try_init_event) from [<8008f544>] (perf_init_event+0x5c/0xf4)
[<8008f544>] (perf_init_event) from [<8008f8a8>] (perf_event_alloc+0x2cc/0x35c)
[<8008f8a8>] (perf_event_alloc) from [<8009015c>] (SyS_perf_event_open+0x498/0xa70)
[<8009015c>] (SyS_perf_event_open) from [<8000e420>] (ret_fast_syscall+0x0/0x34)
Code: bf1be000 bf1bb380 802a2664 00000000 (00000002)
---[ end trace 01aff0ff00926a0a ]---

Also cleans up the code to use the arm_pmu only when we know that
we are dealing with an arm pmu event.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kernel/perf_event.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Peter Zijlstra March 10, 2015, 11:27 a.m. UTC | #1
On Mon, Mar 09, 2015 at 12:46:30PM +0000, Suzuki K. Poulose wrote:
> From: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
> 
> Don't allow grouping hardware events from different PMUs
>  (eg. CCI + CPU).

Uhm, how does this work? If we have multiple hardware PMUs we'll stop
scheduling events after the first failed event schedule. This can leave
one of the PMUs severely under utilized.
Suzuki K Poulose March 10, 2015, noon UTC | #2
On 10/03/15 11:27, Peter Zijlstra wrote:
> On Mon, Mar 09, 2015 at 12:46:30PM +0000, Suzuki K. Poulose wrote:
>> From: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
>>
>> Don't allow grouping hardware events from different PMUs
>>   (eg. CCI + CPU).
>
> Uhm, how does this work? If we have multiple hardware PMUs we'll stop
> scheduling events after the first failed event schedule. This can leave
> one of the PMUs severely under utilized.
This is done from pmu->event_init(), where we haven't scheduled an
event yet. Do you think we need to solve it using a different approach
? What is the best way to handle this situation ? Is it OK
to allow different PMUs in the group ?

Suzuki
>
Mark Rutland March 10, 2015, 12:05 p.m. UTC | #3
On Tue, Mar 10, 2015 at 11:27:23AM +0000, Peter Zijlstra wrote:
> On Mon, Mar 09, 2015 at 12:46:30PM +0000, Suzuki K. Poulose wrote:
> > From: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
> > 
> > Don't allow grouping hardware events from different PMUs
> >  (eg. CCI + CPU).
> 
> Uhm, how does this work? If we have multiple hardware PMUs we'll stop
> scheduling events after the first failed event schedule. This can leave
> one of the PMUs severely under utilized.

The problem is here group validation at pmu::event_init() time, not
scheduling.

We don't allow grouping across disparate HW PMUs because we can't
provide group semantics anyway. Scheduling is not a problem in this case
(unlike the big.LITTLE case I have a patch for [1]).

We have a CPU PMU and an "uncore" CCI PMU. You can't create task-bound
events for the CCI, but you can create CPU-bound events for the CCI on
the nominal CPU the CCI is monitored from.

The context check you added in c3c87e770458aa00 "perf: Tighten (and fix)
the grouping condition" implicitly rejects groups that have CPU and CCI
events (each event::ctx will be the relevant pmu::pmu_cpu_context and
will differ), and this is sane -- you can't provide group semantics
across disparate HW PMUs.

Unfortunately that happens after we've done the
event->pmu->event_init(event) dance on each event, and in our event_init
function we try to verify the group is sane. In our verification we
ignore SW events, but assume that all !SW events are for the CPU PMU.
If you add a CPU event to a CCI group, that's not the case, and we use
container_of on an unsuitable object, derefence garbage, invoke the
eschaton and so on.

It would be nicer if we could prevent this in the core so we're not
reliant on every PMU driver doing the same verification. My initial
thought was that seemed like unnecessary duplication of the ctx checking
above, but if we're going to end up shoving it into several drivers
anyway perhaps it's the lesser evil.

Mark.

[1] https://lkml.org/lkml/2015/2/5/520
Peter Zijlstra March 10, 2015, 12:53 p.m. UTC | #4
On Tue, Mar 10, 2015 at 12:05:21PM +0000, Mark Rutland wrote:
> On Tue, Mar 10, 2015 at 11:27:23AM +0000, Peter Zijlstra wrote:
> > On Mon, Mar 09, 2015 at 12:46:30PM +0000, Suzuki K. Poulose wrote:
> > > From: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
> > > 
> > > Don't allow grouping hardware events from different PMUs
> > >  (eg. CCI + CPU).
> > 
> > Uhm, how does this work? If we have multiple hardware PMUs we'll stop
> > scheduling events after the first failed event schedule. This can leave
> > one of the PMUs severely under utilized.
> 
> The problem is here group validation at pmu::event_init() time, not
> scheduling.

Maybe make that a little more explicit.

> We don't allow grouping across disparate HW PMUs because we can't
> provide group semantics anyway. Scheduling is not a problem in this case
> (unlike the big.LITTLE case I have a patch for [1]).

Right, I remember that; I was wondering if this was related.

> We have a CPU PMU and an "uncore" CCI PMU. You can't create task-bound
> events for the CCI, but you can create CPU-bound events for the CCI on
> the nominal CPU the CCI is monitored from.

Indeed, ok.

> The context check you added in c3c87e770458aa00 "perf: Tighten (and fix)
> the grouping condition" implicitly rejects groups that have CPU and CCI
> events (each event::ctx will be the relevant pmu::pmu_cpu_context and
> will differ), and this is sane -- you can't provide group semantics
> across disparate HW PMUs.

Agreed.

> Unfortunately that happens after we've done the
> event->pmu->event_init(event) dance on each event, and in our event_init
> function we try to verify the group is sane. In our verification we
> ignore SW events, but assume that all !SW events are for the CPU PMU.
> If you add a CPU event to a CCI group, that's not the case, and we use
> container_of on an unsuitable object, derefence garbage, invoke the
> eschaton and so on.

Indeed, on x86 we explicitly ignore everything not an x86_pmu event.

> It would be nicer if we could prevent this in the core so we're not
> reliant on every PMU driver doing the same verification. My initial
> thought was that seemed like unnecessary duplication of the ctx checking
> above, but if we're going to end up shoving it into several drivers
> anyway perhaps it's the lesser evil.

Again, agreed, that would be better and less error prone. But I'm not
entirely sure how to go about doing it :/ I'll have to go think about
that; and conferences are not the best place for that.

Suggestions on that are welcome of course ;)
Peter Zijlstra March 10, 2015, 1 p.m. UTC | #5
On Tue, Mar 10, 2015 at 01:53:51PM +0100, Peter Zijlstra wrote:
> > It would be nicer if we could prevent this in the core so we're not
> > reliant on every PMU driver doing the same verification. My initial
> > thought was that seemed like unnecessary duplication of the ctx checking
> > above, but if we're going to end up shoving it into several drivers
> > anyway perhaps it's the lesser evil.
> 
> Again, agreed, that would be better and less error prone. But I'm not
> entirely sure how to go about doing it :/ I'll have to go think about
> that; and conferences are not the best place for that.
> 
> Suggestions on that are welcome of course ;)

So the problem is that event_init() is what will return the pmu, so we
cannot make decisions on it until after that returns.

Maybe we can pull out the validate step into its own funciton;
pmu->validate() or whatnot, to be called slightly later.
Mark Rutland March 10, 2015, 1:57 p.m. UTC | #6
> So the problem is that event_init() is what will return the pmu, so we
> cannot make decisions on it until after that returns.

I took a look into hacking something into perf_try_init_event, but it
ends up duplicating all of the existing tests and looks really out of
place.

> Maybe we can pull out the validate step into its own funciton;
> pmu->validate() or whatnot, to be called slightly later.

Perhaps the other way around: move the call to pmu::event_init() later
(after the other checks in perf_event_open), and in its original spot
add a new pmu::can_handle() that only tells us whether an event in
isolation is for this PMU, without validation of the specifics of the
event.

That would keep the split between the two callbacks more clearly
defined, though it would require updating most PMU drivers, so perhaps
that's too invasive.

I'll dig into this a bit.

Mark.
Mark Rutland March 10, 2015, 3:36 p.m. UTC | #7
On Tue, Mar 10, 2015 at 12:53:51PM +0000, Peter Zijlstra wrote:
> On Tue, Mar 10, 2015 at 12:05:21PM +0000, Mark Rutland wrote:
> > On Tue, Mar 10, 2015 at 11:27:23AM +0000, Peter Zijlstra wrote:
> > > On Mon, Mar 09, 2015 at 12:46:30PM +0000, Suzuki K. Poulose wrote:
> > > > From: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
> > > > 
> > > > Don't allow grouping hardware events from different PMUs
> > > >  (eg. CCI + CPU).
> > > 
> > > Uhm, how does this work? If we have multiple hardware PMUs we'll stop
> > > scheduling events after the first failed event schedule. This can leave
> > > one of the PMUs severely under utilized.
> > 
> > The problem is here group validation at pmu::event_init() time, not
> > scheduling.
> 
> Maybe make that a little more explicit.

On the assumption that the patch is otherwise OK, how about the commit
message below?

---->8----
arm/pmu: Reject groups spanning multiple hardware PMUs

The perf core implicitly rejects events spanning multiple HW PMUs, as in
these cases the event->ctx will differ. However this validation is
performed after pmu::event_init() is called in perf_init_event(), and
thus pmu::event_init() may be called with a group leader from a
different HW PMU.

The ARM PMU driver does not take this fact into account, and when
validating groups assumes that it can call to_arm_pmu(event->pmu) for
any HW event. When the event in question is from another HW PMU this is
wrong, and results in dereferencing garbage.

This patch updates the ARM PMU driver to first test for and reject
events from other PMUs, moving the to_arm_pmu and related logic after
this test. Fixes a crash triggered by perf_fuzzer on Linux-4.0-rc2, with
a CCI PMU present:

CPU: 0 PID: 1527 Comm: perf_fuzzer Not tainted 4.0.0-rc2 #57
Hardware name: ARM-Versatile Express
task: bd8484c0 ti: be676000 task.ti: be676000
PC is at 0xbf1bbc90
LR is at validate_event+0x34/0x5c
pc : [<bf1bbc90>]    lr : [<80016060>]    psr: 00000013
...
[<80016060>] (validate_event) from [<80016198>] (validate_group+0x28/0x90)
[<80016198>] (validate_group) from [<80016398>] (armpmu_event_init+0x150/0x218)
[<80016398>] (armpmu_event_init) from [<800882e4>] (perf_try_init_event+0x30/0x48)
[<800882e4>] (perf_try_init_event) from [<8008f544>] (perf_init_event+0x5c/0xf4)
[<8008f544>] (perf_init_event) from [<8008f8a8>] (perf_event_alloc+0x2cc/0x35c)
[<8008f8a8>] (perf_event_alloc) from [<8009015c>] (SyS_perf_event_open+0x498/0xa70)
[<8009015c>] (SyS_perf_event_open) from [<8000e420>] (ret_fast_syscall+0x0/0x34)
Code: bf1be000 bf1bb380 802a2664 00000000 (00000002)
---[ end trace 01aff0ff00926a0a ]---
---->8----

Mark.
Peter Zijlstra March 10, 2015, 3:44 p.m. UTC | #8
On Tue, Mar 10, 2015 at 03:36:08PM +0000, Mark Rutland wrote:
> On the assumption that the patch is otherwise OK, how about the commit
> message below?

Yeah, that'll do fine. Thanks!
diff mbox

Patch

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 557e128..4a86a01 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -259,20 +259,29 @@  out:
 }
 
 static int
-validate_event(struct pmu_hw_events *hw_events,
-	       struct perf_event *event)
+validate_event(struct pmu *pmu, struct pmu_hw_events *hw_events,
+			       struct perf_event *event)
 {
-	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+	struct arm_pmu *armpmu;
 
 	if (is_software_event(event))
 		return 1;
 
+	/*
+	 * Reject groups spanning multiple HW PMUs (e.g. CPU + CCI). The
+	 * core perf code won't check that the pmu->ctx == leader->ctx
+	 * until after pmu->event_init(event).
+	 */
+	if (event->pmu != pmu)
+		return 0;
+
 	if (event->state < PERF_EVENT_STATE_OFF)
 		return 1;
 
 	if (event->state == PERF_EVENT_STATE_OFF && !event->attr.enable_on_exec)
 		return 1;
 
+	armpmu = to_arm_pmu(event->pmu);
 	return armpmu->get_event_idx(hw_events, event) >= 0;
 }
 
@@ -288,15 +297,15 @@  validate_group(struct perf_event *event)
 	 */
 	memset(&fake_pmu.used_mask, 0, sizeof(fake_pmu.used_mask));
 
-	if (!validate_event(&fake_pmu, leader))
+	if (!validate_event(event->pmu, &fake_pmu, leader))
 		return -EINVAL;
 
 	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
-		if (!validate_event(&fake_pmu, sibling))
+		if (!validate_event(event->pmu, &fake_pmu, sibling))
 			return -EINVAL;
 	}
 
-	if (!validate_event(&fake_pmu, event))
+	if (!validate_event(event->pmu, &fake_pmu, event))
 		return -EINVAL;
 
 	return 0;