diff mbox

[v5] ARM: omap: edma: add suspend suspend/resume hooks

Message ID 1383863549-7438-1-git-send-email-zonque@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Mack Nov. 7, 2013, 10:32 p.m. UTC
This patch makes the edma driver resume correctly after suspend. Tested
on an AM33xx platform with cyclic audio streams and omap_hsmmc.

All information can be reconstructed by already known runtime
information.

As we now use some functions that were previously only used from __init
context, annotations had to be dropped.

Signed-off-by: Daniel Mack <zonque@gmail.com>
---
Ok, here is v5.

v4 -> v5:

	* dropped pm_runtime_* function calls entirely
	* moved the function pointers to .suspend/resume _noirq

Again, thanks for the reviews. I'm still uncertain which function order
is most appropriate though.


Thanks,
Daniel



 arch/arm/common/edma.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 3 deletions(-)

Comments

Kevin Hilman Nov. 8, 2013, 12:02 a.m. UTC | #1
Daniel Mack <zonque@gmail.com> writes:

> This patch makes the edma driver resume correctly after suspend. Tested
> on an AM33xx platform with cyclic audio streams and omap_hsmmc.
>
> All information can be reconstructed by already known runtime
> information.
>
> As we now use some functions that were previously only used from __init
> context, annotations had to be dropped.
>
> Signed-off-by: Daniel Mack <zonque@gmail.com>
> ---
> Ok, here is v5.
>
> v4 -> v5:
>
> 	* dropped pm_runtime_* function calls entirely
> 	* moved the function pointers to .suspend/resume _noirq

[...]

> +static const struct dev_pm_ops edma_pm_ops = {
> +	.suspend	= edma_pm_suspend,

I suspect you intended to use the _noirq version like the changelog
says?

> +	.resume_noirq	= edma_pm_resume,
> +};

Also, I believe it was already suggested by Nishanth, but the late/early
callbacks are probably more appropriate here than the noirq callbacks.
Unless there's a *really* good reason to use the noirq callbacks, they
should be avoided.

That being said, I wonder if the whole approach here is the right one.
I know you're basing your stuff on some TI tree, but that doesn't make
it the right way (usually, it's the opposite, but I digress...)  ;)

IMO, EDMA should be done like we currently do I2C and not implement
suspend/resume at all.  Instead, the driver should do runtime PM done on
a per xfer basis.  Then when suspend comes along, all that needs to be
done is ensure all in-flight xfers are done, then runtime PM will kick
in.

Kevin
Joel Fernandes Nov. 8, 2013, 4:58 a.m. UTC | #2
Hi Kevin,

On 11/07/2013 06:02 PM, Kevin Hilman wrote:
> Daniel Mack <zonque@gmail.com> writes:
[..]
>> +	.resume_noirq	= edma_pm_resume,
>> +};
> 
> Also, I believe it was already suggested by Nishanth, but the late/early
> callbacks are probably more appropriate here than the noirq callbacks.
> Unless there's a *really* good reason to use the noirq callbacks, they
> should be avoided.
> 
> That being said, I wonder if the whole approach here is the right one.
> I know you're basing your stuff on some TI tree, but that doesn't make
> it the right way (usually, it's the opposite, but I digress...)  ;)
> 
> IMO, EDMA should be done like we currently do I2C and not implement
> suspend/resume at all.  Instead, the driver should do runtime PM done on

But a potential problem with this is powering edma on or off between xfers may
slow things down quite a bit because xfers happen so much often and there is
some overhead in the pm_runtime calls which adds up overtime when dealing with
something as frequent as EDMA. Also we would lose the global EDMA context right
so we'd have to restore the context every time during runtime PM (?).

I had a patch that also made AES driver not pm_runtime_get/put specially when
the next crypto request was imminent. But instead adopted a softer approach
where the pm_runtime_put call would happen at a time when we are sure we're at
the end of all requests. This resulted in quite a speed up.

thanks,

-Joel
Kevin Hilman Nov. 8, 2013, 6:28 a.m. UTC | #3
Hi Joel,

On Thu, Nov 7, 2013 at 8:58 PM, Joel Fernandes <joelf@ti.com> wrote:

> On 11/07/2013 06:02 PM, Kevin Hilman wrote:
>> Daniel Mack <zonque@gmail.com> writes:
> [..]
>>> +    .resume_noirq   = edma_pm_resume,
>>> +};
>>
>> Also, I believe it was already suggested by Nishanth, but the late/early
>> callbacks are probably more appropriate here than the noirq callbacks.
>> Unless there's a *really* good reason to use the noirq callbacks, they
>> should be avoided.
>>
>> That being said, I wonder if the whole approach here is the right one.
>> I know you're basing your stuff on some TI tree, but that doesn't make
>> it the right way (usually, it's the opposite, but I digress...)  ;)
>>
>> IMO, EDMA should be done like we currently do I2C and not implement
>> suspend/resume at all.  Instead, the driver should do runtime PM done on
>
> But a potential problem with this is powering edma on or off between xfers may
> slow things down quite a bit because xfers happen so much often and there is
> some overhead in the pm_runtime calls which adds up overtime when dealing with
> something as frequent as EDMA. Also we would lose the global EDMA context right
> so we'd have to restore the context every time during runtime PM (?).

Have a look at the autosuspend feature of runtime PM.  (c.f.
Documentation/power/pm_runtime.txt)

> I had a patch that also made AES driver not pm_runtime_get/put specially when
> the next crypto request was imminent. But instead adopted a softer approach
> where the pm_runtime_put call would happen at a time when we are sure we're at
> the end of all requests. This resulted in quite a speed up.

Sounds like you partially re-invented autosuspend.

Kevin
Daniel Mack Nov. 8, 2013, 7:59 a.m. UTC | #4
On 11/08/2013 01:02 AM, Kevin Hilman wrote:
> Daniel Mack <zonque@gmail.com> writes:

>> +	.resume_noirq	= edma_pm_resume,
>> +};
> 
> Also, I believe it was already suggested by Nishanth, but the late/early
> callbacks are probably more appropriate here than the noirq callbacks.
> Unless there's a *really* good reason to use the noirq callbacks, they
> should be avoided.

Ok, no problem to change that. Regarding the function ordering,
late/early didn't show any lockup in my tests yet.

> That being said, I wonder if the whole approach here is the right one.
> I know you're basing your stuff on some TI tree, but that doesn't make
> it the right way (usually, it's the opposite, but I digress...)  ;)

:)

Well, I was primarily looking for a solution to bring the edma back to
life after suspend. As I wrote in the first version of this patch that I
sent, I didn't even intent that patch to go mainline at all, given that
arch/arm/common/edma.c woill go away soon.

> IMO, EDMA should be done like we currently do I2C and not implement
> suspend/resume at all.  Instead, the driver should do runtime PM done on
> a per xfer basis.  Then when suspend comes along, all that needs to be
> done is ensure all in-flight xfers are done, then runtime PM will kick
> in.

I see your point in general, but isn't runtime PM there to actually save
power at runtime? Does disabling unused channels in the EDMA really
reduce the consumption of that IP block?


Daniel
Joel Fernandes Nov. 8, 2013, 4:14 p.m. UTC | #5
On 11/08/2013 12:28 AM, Kevin Hilman wrote:
[..]
>>> Also, I believe it was already suggested by Nishanth, but the late/early
>>> callbacks are probably more appropriate here than the noirq callbacks.
>>> Unless there's a *really* good reason to use the noirq callbacks, they
>>> should be avoided.
>>>
>>> That being said, I wonder if the whole approach here is the right one.
>>> I know you're basing your stuff on some TI tree, but that doesn't make
>>> it the right way (usually, it's the opposite, but I digress...)  ;)
>>>
>>> IMO, EDMA should be done like we currently do I2C and not implement
>>> suspend/resume at all.  Instead, the driver should do runtime PM done on
>>
>> But a potential problem with this is powering edma on or off between xfers may
>> slow things down quite a bit because xfers happen so much often and there is
>> some overhead in the pm_runtime calls which adds up overtime when dealing with
>> something as frequent as EDMA. Also we would lose the global EDMA context right
>> so we'd have to restore the context every time during runtime PM (?).
> 
> Have a look at the autosuspend feature of runtime PM.  (c.f.
> Documentation/power/pm_runtime.txt)

Sure, will do that. Thanks :)

Just one more silly question, in very frequent operations is there no over-head
even when using autosuspend? Because when I traced last time (without
autosuspend), the depth of pm runtime calls were quite a lot but this could also
be because of the OMAP implementation.

>> I had a patch that also made AES driver not pm_runtime_get/put specially when
>> the next crypto request was imminent. But instead adopted a softer approach
>> where the pm_runtime_put call would happen at a time when we are sure we're at
>> the end of all requests. This resulted in quite a speed up.
> 
> Sounds like you partially re-invented autosuspend.

Hehe. :)

regards,

-Joel
Nishanth Menon Nov. 8, 2013, 6:03 p.m. UTC | #6
On 11/08/2013 10:14 AM, Joel Fernandes wrote:
> On 11/08/2013 12:28 AM, Kevin Hilman wrote:
> [..]
>>>> Also, I believe it was already suggested by Nishanth, but the late/early
>>>> callbacks are probably more appropriate here than the noirq callbacks.
>>>> Unless there's a *really* good reason to use the noirq callbacks, they
>>>> should be avoided.
>>>>
>>>> That being said, I wonder if the whole approach here is the right one.
>>>> I know you're basing your stuff on some TI tree, but that doesn't make
>>>> it the right way (usually, it's the opposite, but I digress...)  ;)
>>>>
>>>> IMO, EDMA should be done like we currently do I2C and not implement
>>>> suspend/resume at all.  Instead, the driver should do runtime PM done on
>>>
>>> But a potential problem with this is powering edma on or off between xfers may
>>> slow things down quite a bit because xfers happen so much often and there is
>>> some overhead in the pm_runtime calls which adds up overtime when dealing with
>>> something as frequent as EDMA. Also we would lose the global EDMA context right
>>> so we'd have to restore the context every time during runtime PM (?).
>>
>> Have a look at the autosuspend feature of runtime PM.  (c.f.
>> Documentation/power/pm_runtime.txt)
> 
> Sure, will do that. Thanks :)
> 
> Just one more silly question, in very frequent operations is there no over-head
> even when using autosuspend? Because when I traced last time (without
> autosuspend), the depth of pm runtime calls were quite a lot but this could also
> be because of the OMAP implementation.

that is the entire purpose of autosuspend - the back2back calls have
almost no overhead (other than to see that device suspend was not
invoked) - this is pretty fast. you can tweak autosuspend timeout for
the right configuration to meet up with what you need.
Kevin Hilman Nov. 8, 2013, 6:51 p.m. UTC | #7
Joel Fernandes <joelf@ti.com> writes:

> On 11/08/2013 12:28 AM, Kevin Hilman wrote:
> [..]
>>>> Also, I believe it was already suggested by Nishanth, but the late/early
>>>> callbacks are probably more appropriate here than the noirq callbacks.
>>>> Unless there's a *really* good reason to use the noirq callbacks, they
>>>> should be avoided.
>>>>
>>>> That being said, I wonder if the whole approach here is the right one.
>>>> I know you're basing your stuff on some TI tree, but that doesn't make
>>>> it the right way (usually, it's the opposite, but I digress...)  ;)
>>>>
>>>> IMO, EDMA should be done like we currently do I2C and not implement
>>>> suspend/resume at all.  Instead, the driver should do runtime PM done on
>>>
>>> But a potential problem with this is powering edma on or off between xfers may
>>> slow things down quite a bit because xfers happen so much often and there is
>>> some overhead in the pm_runtime calls which adds up overtime when dealing with
>>> something as frequent as EDMA. Also we would lose the global EDMA context right
>>> so we'd have to restore the context every time during runtime PM (?).
>> 
>> Have a look at the autosuspend feature of runtime PM.  (c.f.
>> Documentation/power/pm_runtime.txt)
>
> Sure, will do that. Thanks :)
>
> Just one more silly question, in very frequent operations is there no over-head
> even when using autosuspend? Because when I traced last time (without
> autosuspend), the depth of pm runtime calls were quite a lot but this could also
> be because of the OMAP implementation.

The depth of calls is shallow until it's actually time to really runtime
suspend (after the autosuspend timeout.)  IOW, you don't even get into
OMAP specific code until the autosuspend timeout expires, so the
overhead is only coming from the runtime PM core, and is very minimal.

We're already using this technique in the I2C driver which typically has
less data than EDMA, but does have frequent, bursty xfers.

Kevin
diff mbox

Patch

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 8e1a024..7deacde 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -239,6 +239,8 @@  struct edma {
 	/* list of channels with no even trigger; terminated by "-1" */
 	const s8	*noevent;
 
+	struct edma_soc_info *info;
+
 	/* The edma_inuse bit for each PaRAM slot is clear unless the
 	 * channel is in use ... by ARM or DSP, for QDMA, or whatever.
 	 */
@@ -290,13 +292,13 @@  static void map_dmach_queue(unsigned ctlr, unsigned ch_no,
 			~(0x7 << bit), queue_no << bit);
 }
 
-static void __init map_queue_tc(unsigned ctlr, int queue_no, int tc_no)
+static void map_queue_tc(unsigned ctlr, int queue_no, int tc_no)
 {
 	int bit = queue_no * 4;
 	edma_modify(ctlr, EDMA_QUETCMAP, ~(0x7 << bit), ((tc_no & 0x7) << bit));
 }
 
-static void __init assign_priority_to_queue(unsigned ctlr, int queue_no,
+static void assign_priority_to_queue(unsigned ctlr, int queue_no,
 		int priority)
 {
 	int bit = queue_no * 4;
@@ -315,7 +317,7 @@  static void __init assign_priority_to_queue(unsigned ctlr, int queue_no,
  * included in that particular EDMA variant (Eg : dm646x)
  *
  */
-static void __init map_dmach_param(unsigned ctlr)
+static void map_dmach_param(unsigned ctlr)
 {
 	int i;
 	for (i = 0; i < EDMA_MAX_DMACH; i++)
@@ -1785,15 +1787,85 @@  static int edma_probe(struct platform_device *pdev)
 			edma_write_array2(j, EDMA_DRAE, i, 1, 0x0);
 			edma_write_array(j, EDMA_QRAE, i, 0x0);
 		}
+		edma_cc[j]->info = info[j];
 		arch_num_cc++;
 	}
 
 	return 0;
 }
 
+static int edma_pm_suspend(struct device *dev)
+{
+	int j;
+
+	for (j = 0; j < arch_num_cc; j++) {
+		struct edma *ecc = edma_cc[j];
+
+		disable_irq(ecc->irq_res_start);
+		disable_irq(ecc->irq_res_end);
+	}
+
+	return 0;
+}
+
+static int edma_pm_resume(struct device *dev)
+{
+	int i, j;
+
+	for (j = 0; j < arch_num_cc; j++) {
+		struct edma *cc = edma_cc[j];
+
+		s8 (*queue_priority_mapping)[2];
+		s8 (*queue_tc_mapping)[2];
+
+		queue_tc_mapping = cc->info->queue_tc_mapping;
+		queue_priority_mapping = cc->info->queue_priority_mapping;
+
+		/* Event queue to TC mapping */
+		for (i = 0; queue_tc_mapping[i][0] != -1; i++)
+			map_queue_tc(j, queue_tc_mapping[i][0],
+					queue_tc_mapping[i][1]);
+
+		/* Event queue priority mapping */
+		for (i = 0; queue_priority_mapping[i][0] != -1; i++)
+			assign_priority_to_queue(j,
+						queue_priority_mapping[i][0],
+						queue_priority_mapping[i][1]);
+
+		/* Map the channel to param entry if channel mapping logic
+		 * exist
+		 */
+		if (edma_read(j, EDMA_CCCFG) & CHMAP_EXIST)
+			map_dmach_param(j);
+
+		for (i = 0; i < cc->num_channels; i++) {
+			if (test_bit(i, cc->edma_inuse)) {
+				/* ensure access through shadow region 0 */
+				edma_or_array2(j, EDMA_DRAE, 0, i >> 5,
+						BIT(i & 0x1f));
+
+				setup_dma_interrupt(i,
+						    cc->intr_data[i].callback,
+						    cc->intr_data[i].data);
+			}
+		}
+
+		enable_irq(cc->irq_res_start);
+		enable_irq(cc->irq_res_end);
+	}
+
+	return 0;
+}
+
+static const struct dev_pm_ops edma_pm_ops = {
+	.suspend	= edma_pm_suspend,
+	.resume_noirq	= edma_pm_resume,
+};
+
 static struct platform_driver edma_driver = {
 	.driver = {
 		.name	= "edma",
+		.pm	= &edma_pm_ops,
 		.of_match_table = edma_of_ids,
 	},
 	.probe = edma_probe,