diff mbox

[3/3] trace points: power: remove 'cpu_id' from trace_cpu_idle

Message ID 1313766244-22313-3-git-send-email-tom.leiming@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ming Lei Aug. 19, 2011, 3:04 p.m. UTC
From: Ming Lei <tom.leiming@gmail.com>

This patch removes the 'cpu_id' parameter of the cpu_idle
trace point, based on the ideas below:

- the cpu_id which is passed to trace point is always the current
  cpu
- the current cpu info has been included into the trace result
  already
- smp_processor_id() can't be used safely in preemptible context.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 Documentation/trace/events-power.txt |    6 +++---
 arch/arm/mach-omap2/pm34xx.c         |    4 ++--
 arch/x86/kernel/process.c            |   12 ++++++------
 drivers/cpuidle/cpuidle.c            |    4 ++--
 include/trace/events/power.h         |   15 ++++++++++++---
 5 files changed, 25 insertions(+), 16 deletions(-)

Comments

Ming Lei Aug. 20, 2011, 2:40 a.m. UTC | #1
Hi,

2011/8/20 Thomas Renninger <trenn@suse.de>:
> On Friday, August 19, 2011 05:04:04 PM tom.leiming@gmail.com wrote:
>> From: Ming Lei <tom.leiming@gmail.com>
>>
>> This patch removes the 'cpu_id' parameter of the cpu_idle
>> trace point, based on the ideas below:
>>
>> - the cpu_id which is passed to trace point is always the current
>>   cpu
> Are you sure this will always be true?

It is sure at least now,  the only place to pass 'dev->cpu' is inside
cpuidle_idle_call,
but the cpuidle_device is got from __this_cpu_read(cpuidle_devices),
so it should
be true for such case.

>
>> - the current cpu info has been included into the trace result
>>   already
>> - smp_processor_id() can't be used safely in preemptible context.
>
> The cpuid has been added to idle events on purpose for example to be
> able to pass C-state dependencies.
> 2 cores may only enter a deeper sleep state if the 2nd core enters it
> as well.
> There may be architectures where you could trigger a sleep state on
> a foreign cpu.
>
> This may not be used now, but if the kernel does want to use it, you do
> not want to adjust a dozen userspace apps.
>
> Not sure how to quickly solve the:
> "smp_processor_id() can't be used safely in preemptible context."
> problem, though.
> A convention could be declared that if -1 is passed, it's the same cpu
> entering the sleep state as the running one. This will probably
> break userspace apps as well...
>
> Best would be to clean up x86 and let idle routines always be entered
> from cpuidle subsystem which knows the cpu id already and eliminate
> all idle functions in arch/x86/kernel/process.c.
>
>   Thomas
>
> PS: I do not care about patch 1 and 2 as these events are ARM specific
> afaik. But I vote against this one.
>


thanks,
Thomas Renninger Aug. 22, 2011, 8:27 a.m. UTC | #2
On Saturday, August 20, 2011 04:40:09 AM Ming Lei wrote:
> Hi,
> 
> 2011/8/20 Thomas Renninger <trenn@suse.de>:
> > On Friday, August 19, 2011 05:04:04 PM tom.leiming@gmail.com wrote:
> >> From: Ming Lei <tom.leiming@gmail.com>
> >>
> >> This patch removes the 'cpu_id' parameter of the cpu_idle
> >> trace point, based on the ideas below:
> >>
> >> - the cpu_id which is passed to trace point is always the current
> >>   cpu
> > Are you sure this will always be true?
> 
> It is sure at least now,  the only place to pass 'dev->cpu' is inside
> cpuidle_idle_call,
It was known that cpu_id is always the current cpu with current 
implementation when this got introduced.
But the perf events API must not change back and forth for userspace 
compatibility. Therefore the cpu_id was added in case
that future implementations want to pass info where the current cpu
is not the cpu which is sent to the sleep state.

> smp_processor_id() can't be used safely in preemptible context.
I expect the only side effect that could happen is that if smp_process_id
is interrupted you get the wrong core id on a cpu idle trace event.
This only happens if cpuidle is not used and even then should happen
very rarely, nothing to worry for a debug tool like that.
And it should get fixed if these idle functions get fully integrated into
cpuidle at some point of time.

   Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean Pihet Sept. 2, 2011, 7:26 a.m. UTC | #3
Ming Lei, Thomas,

Sorry if it is a bit late to jump in.

On Mon, Aug 22, 2011 at 10:27 AM, Thomas Renninger <trenn@suse.de> wrote:
> On Saturday, August 20, 2011 04:40:09 AM Ming Lei wrote:
>> Hi,
>>
>> 2011/8/20 Thomas Renninger <trenn@suse.de>:
>> > On Friday, August 19, 2011 05:04:04 PM tom.leiming@gmail.com wrote:
>> >> From: Ming Lei <tom.leiming@gmail.com>
>> >>
>> >> This patch removes the 'cpu_id' parameter of the cpu_idle
>> >> trace point, based on the ideas below:
>> >>
>> >> - the cpu_id which is passed to trace point is always the current
>> >>   cpu
>> > Are you sure this will always be true?
>>
>> It is sure at least now,  the only place to pass 'dev->cpu' is inside
>> cpuidle_idle_call,
> It was known that cpu_id is always the current cpu with current
> implementation when this got introduced.
> But the perf events API must not change back and forth for userspace
> compatibility. Therefore the cpu_id was added in case
> that future implementations want to pass info where the current cpu
> is not the cpu which is sent to the sleep state.
Agree. Let's keep the cpu_id field.

>
>> smp_processor_id() can't be used safely in preemptible context.
> I expect the only side effect that could happen is that if smp_process_id
> is interrupted you get the wrong core id on a cpu idle trace event.
> This only happens if cpuidle is not used and even then should happen
> very rarely, nothing to worry for a debug tool like that.
> And it should get fixed if these idle functions get fully integrated into
> cpuidle at some point of time.
>
>   Thomas
>

Regards,
Jean
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Sept. 2, 2011, 7:38 a.m. UTC | #4
Hi,

On Fri, Sep 2, 2011 at 3:26 PM, Jean Pihet <jean.pihet@newoldbits.com> wrote:
>> It was known that cpu_id is always the current cpu with current
>> implementation when this got introduced.
>> But the perf events API must not change back and forth for userspace
>> compatibility. Therefore the cpu_id was added in case
>> that future implementations want to pass info where the current cpu
>> is not the cpu which is sent to the sleep state.
> Agree. Let's keep the cpu_id field.

OK, let's keep it.

How about removing it in clock_enalbe/clock_disable/power_domain_target
as did in [1/3] and [2/3]?  I don't see any usefulness of 'cpu_id' in the tree
trace points.


thanks,
diff mbox

Patch

diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt
index e9d5fe3..9f2f96c 100644
--- a/Documentation/trace/events-power.txt
+++ b/Documentation/trace/events-power.txt
@@ -23,7 +23,7 @@  Cf. include/trace/events/power.h for the events definitions.
 A 'cpu' event class gathers the CPU-related events: cpuidle and
 cpufreq.
 
-cpu_idle		"state=%lu cpu_id=%lu"
+cpu_idle		"state=%lu"
 cpu_frequency		"state=%lu cpu_id=%lu"
 
 A suspend event is used to indicate the system going in and out of the
@@ -33,8 +33,8 @@  machine_suspend		"state=%lu"
 
 
 Note: the value of '-1' or '4294967295' for state means an exit from the current state,
-i.e. trace_cpu_idle(4, smp_processor_id()) means that the system
-enters the idle state 4, while trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id())
+i.e. trace_cpu_idle(4) means that the system
+enters the idle state 4, while trace_cpu_idle(PWR_EVENT_EXIT)
 means that the system exits the previous idle state.
 
 The event which has 'state=4294967295' in the trace is very important to the user
diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
index 7255d9b..cde9a11 100644
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -502,12 +502,12 @@  static void omap3_pm_idle(void)
 		goto out;
 
 	trace_power_start(POWER_CSTATE, 1, smp_processor_id());
-	trace_cpu_idle(1, smp_processor_id());
+	trace_cpu_idle(1);
 
 	omap_sram_idle();
 
 	trace_power_end(smp_processor_id());
-	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT);
 
 out:
 	local_fiq_enable();
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3b01..fb9e92b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -378,7 +378,7 @@  void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
-		trace_cpu_idle(1, smp_processor_id());
+		trace_cpu_idle(1);
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -392,7 +392,7 @@  void default_idle(void)
 			local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 		trace_power_end(smp_processor_id());
-		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT);
 	} else {
 		local_irq_enable();
 		/* loop is done by the caller */
@@ -443,7 +443,7 @@  static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
-		trace_cpu_idle(1, smp_processor_id());
+		trace_cpu_idle(1);
 		if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -454,7 +454,7 @@  static void mwait_idle(void)
 		else
 			local_irq_enable();
 		trace_power_end(smp_processor_id());
-		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT);
 	} else
 		local_irq_enable();
 }
@@ -467,12 +467,12 @@  static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
-	trace_cpu_idle(0, smp_processor_id());
+	trace_cpu_idle(0);
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
 	trace_power_end(smp_processor_id());
-	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT);
 }
 
 /*
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index d4c5423..7980732 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -106,12 +106,12 @@  int cpuidle_idle_call(void)
 	dev->last_state = target_state;
 
 	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
-	trace_cpu_idle(next_state, dev->cpu);
+	trace_cpu_idle(next_state);
 
 	dev->last_residency = target_state->enter(dev, target_state);
 
 	trace_power_end(dev->cpu);
-	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
+	trace_cpu_idle(PWR_EVENT_EXIT);
 
 	if (dev->last_state)
 		target_state = dev->last_state;
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 3878edc..8a579bd 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -27,11 +27,20 @@  DECLARE_EVENT_CLASS(cpu,
 		  (unsigned long)__entry->cpu_id)
 );
 
-DEFINE_EVENT(cpu, cpu_idle,
+TRACE_EVENT(cpu_idle,
+	TP_PROTO(unsigned int state),
 
-	TP_PROTO(unsigned int state, unsigned int cpu_id),
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
 
-	TP_ARGS(state, cpu_id)
+	TP_printk("state=%lu", (unsigned long)__entry->state)
 );
 
 /* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */