diff mbox

[RFC] Consolidate SRAM support

Message ID 20110415130607.GM1611@n2100.arm.linux.org.uk (mailing list archive)
State Superseded
Headers show

Commit Message

Russell King - ARM Linux April 15, 2011, 1:06 p.m. UTC
This is work in progress.

We have two SoCs using SRAM, both with their own allocation systems,
and both with their own ways of copying functions into the SRAM.

Let's unify this before we have additional SoCs re-implementing this
obviously common functionality themselves.

Unfortunately, we end up with code growth through doing this, but that
will become a win when we have another SoC using this (which I know
there's at least one in the pipeline).

One of the considerations here is that we can easily convert sram-pool.c
to hook into device tree stuff, which can tell the sram allocator:
	- physical address of sram
	- size of sram
	- allocation granularity
and then we just need to ensure that it is appropriately mapped.

This uses the physical address, and unlike Davinci's dma address usage,
it always wants to have the physical address, and will always return
the corresponding physical address when passed that pointer.

OMAP could probably do with some more work to make the omapfb and other
allocations use the sram allocator, rather than hooking in before the
sram allocator is initialized - and then further cleanups so that we
have an initialization function which just does

sram_create(phys, size)
	virt = map sram(phys, size)
	create sram pool(virt, phys, size, min_alloc_order)

Another question is whether we should allow multiple SRAM pools or not -
this code does allow multiple pools, but so far we only have one pool
per SoC.  Overdesign?  Maybe, but it prevents SoCs wanting to duplicate
it if they want to partition the SRAM, or have peripheral-local SRAMs.

Lastly, uio_pruss should probably take the SRAM pool pointer via
platform data so that it doesn't have to include Davinci specific
includes.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/Kconfig                            |    2 +
 arch/arm/common/Kconfig                     |    4 ++
 arch/arm/common/Makefile                    |    1 +
 arch/arm/common/sram-pool.c                 |   69 +++++++++++++++++++++++++++
 arch/arm/include/asm/sram-pool.h            |   20 ++++++++
 arch/arm/mach-davinci/da850.c               |    2 +-
 arch/arm/mach-davinci/dm355.c               |    2 +-
 arch/arm/mach-davinci/dm365.c               |    2 +-
 arch/arm/mach-davinci/dm644x.c              |    2 +-
 arch/arm/mach-davinci/dm646x.c              |    2 +-
 arch/arm/mach-davinci/include/mach/common.h |    2 +-
 arch/arm/mach-davinci/include/mach/sram.h   |   13 +----
 arch/arm/mach-davinci/pm.c                  |   12 +----
 arch/arm/mach-davinci/sram.c                |   42 +++--------------
 arch/arm/plat-omap/include/plat/sram.h      |   17 ++++---
 arch/arm/plat-omap/sram.c                   |   34 +++++---------
 drivers/uio/uio_pruss.c                     |    8 ++-
 17 files changed, 141 insertions(+), 93 deletions(-)

Comments

Arnd Bergmann April 15, 2011, 2:40 p.m. UTC | #1
On Friday 15 April 2011 15:39:55 Rob Herring wrote:

> On 04/15/2011 08:06 AM, Russell King - ARM Linux wrote:
> > This is work in progress.
> >
> > We have two SoCs using SRAM, both with their own allocation systems,
> > and both with their own ways of copying functions into the SRAM.
> 
> It's more than that. Several i.MX chips use plat-mxc/iram_alloc.c.

There are also non-ARM systems doing something like that,
arch/blackfin/mm/sram-alloc.c comes to mind. On powerpc, the
ppc7410 also has this feature in hardware, but I believe it was
never supported in mainline Linux.

How about putting sram-pool.c into the top-level mm/ directory
and sram-pool.h into include/linux? They seem to be completely
generic to me, aside from the dependency on asm/fncpy.h which
should probably remain arch specific.

As long as CONFIG_ARM_SRAM_POOL (or perhaps just CONFIG_SRAM_POOL)
is selected only by platforms that support it, the dependency
should not hurt.

	Arnd
Russell King - ARM Linux April 15, 2011, 3:24 p.m. UTC | #2
On Fri, Apr 15, 2011 at 04:52:38PM +0300, Eduardo Valentin wrote:
> Hi Russel,
> 
> Just small comment,
> 
> On Fri, Apr 15, 2011 at 08:06:07AM -0500, Russell King wrote:
> 
> > diff --git a/arch/arm/plat-omap/sram.c b/arch/arm/plat-omap/sram.c
> > index a3f50b3..3588749 100644
> > --- a/arch/arm/plat-omap/sram.c
> > +++ b/arch/arm/plat-omap/sram.c
> > @@ -75,7 +75,6 @@
> >  static unsigned long omap_sram_start;
> >  static unsigned long omap_sram_base;
> >  static unsigned long omap_sram_size;
> > -static unsigned long omap_sram_ceil;
> 
> I think you missed one occurrence of omap_sram_ceil at omap3_sram_restore_context.
> Probably you have compile-tested w/o CONFIG_PM?

I built OMAP2 only, which did have CONFIG_PM=y.  Welcome to the problems
of ifdef'ing code.
Russell King - ARM Linux April 15, 2011, 3:26 p.m. UTC | #3
On Fri, Apr 15, 2011 at 04:40:00PM +0200, Arnd Bergmann wrote:
> On Friday 15 April 2011 15:39:55 Rob Herring wrote:
> 
> > On 04/15/2011 08:06 AM, Russell King - ARM Linux wrote:
> > > This is work in progress.
> > >
> > > We have two SoCs using SRAM, both with their own allocation systems,
> > > and both with their own ways of copying functions into the SRAM.
> > 
> > It's more than that. Several i.MX chips use plat-mxc/iram_alloc.c.
> 
> There are also non-ARM systems doing something like that,
> arch/blackfin/mm/sram-alloc.c comes to mind. On powerpc, the
> ppc7410 also has this feature in hardware, but I believe it was
> never supported in mainline Linux.

Yes, but there's some horrible bits out there.  PPC is one of them
which looks like it doesn't lend itself to consolidating with this
kind of implementation.

> How about putting sram-pool.c into the top-level mm/ directory
> and sram-pool.h into include/linux? They seem to be completely
> generic to me, aside from the dependency on asm/fncpy.h which
> should probably remain arch specific.

I've thought about lib - but first lets see whether other architectures
want to use it first.  I've asked the sh folk off-list about this and
waiting for a reply.

Blackfin and PPC look horrible to sort out though, so they can come at
a later date if they so wish.
Grant Likely April 15, 2011, 3:32 p.m. UTC | #4
On Fri, Apr 15, 2011 at 9:26 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Apr 15, 2011 at 04:40:00PM +0200, Arnd Bergmann wrote:
>> On Friday 15 April 2011 15:39:55 Rob Herring wrote:
>>
>> > On 04/15/2011 08:06 AM, Russell King - ARM Linux wrote:
>> > > This is work in progress.
>> > >
>> > > We have two SoCs using SRAM, both with their own allocation systems,
>> > > and both with their own ways of copying functions into the SRAM.
>> >
>> > It's more than that. Several i.MX chips use plat-mxc/iram_alloc.c.
>>
>> There are also non-ARM systems doing something like that,
>> arch/blackfin/mm/sram-alloc.c comes to mind. On powerpc, the
>> ppc7410 also has this feature in hardware, but I believe it was
>> never supported in mainline Linux.
>
> Yes, but there's some horrible bits out there.  PPC is one of them
> which looks like it doesn't lend itself to consolidating with this
> kind of implementation.
>
>> How about putting sram-pool.c into the top-level mm/ directory
>> and sram-pool.h into include/linux? They seem to be completely
>> generic to me, aside from the dependency on asm/fncpy.h which
>> should probably remain arch specific.
>
> I've thought about lib - but first lets see whether other architectures
> want to use it first.  I've asked the sh folk off-list about this and
> waiting for a reply.
>
> Blackfin and PPC look horrible to sort out though, so they can come at
> a later date if they so wish.

Yes, once the infrastructure is in place, powerpc can do its own
migration to the new code.  I vote for putting it in lib at the
outset.

g.
Russell King - ARM Linux April 15, 2011, 3:37 p.m. UTC | #5
On Fri, Apr 15, 2011 at 04:50:49PM +0200, Detlef Vollmann wrote:
> I'd love to have the mapping inside the create pool, but that might
> not be possible in general.

No, think about it.  What if you need to map the RAM area with some
special attributes - eg, where ioremap() doesn't work.  Eg, OMAP you
need SRAM mapped as normal memory, except for OMAP34xx where it must
be mapped normal memory non-cacheable.

It's best to leave the mapping to the architecture.

>> Another question is whether we should allow multiple SRAM pools or not -
>> this code does allow multiple pools, but so far we only have one pool
>> per SoC.  Overdesign?  Maybe, but it prevents SoCs wanting to duplicate
>> it if they want to partition the SRAM, or have peripheral-local SRAMs.
> Having the option to partition the SRAM is probably useful.
> What I'm missing is sram_pool_add: on AT91SAM9G20 you have two banks
> of SRAM, and you might want to combine them into a single pool.

Do you already combine them into a single pool, or is this a wishlist?
I'm really not interested in sorting out peoples wishlist items in
the process of consolidation.

Second point is that you'll notice that the code converts to a phys
address using this: phys = phys_base + (virt - virt_base).  As soon as
we start allowing multiple regions of SRAM, it becomes impossible to
provide the phys address without a lot more complexity.

>>   arch/arm/common/sram-pool.c                 |   69 +++++++++++++++++++++++++++
>>   arch/arm/include/asm/sram-pool.h            |   20 ++++++++
> Waht is ARM specific about this code?
> Why not put it into lib/ and include/linux, respectively?

Nothing, but this is the first stage of consolidation of this code.
I'm not trying to consolidate the universe down to one implementation
here - that's going to take _far_ too much effort in one go, and from
my previous experiences interacting with other arch maintainers, that's
the way to get precisely nothing done.

In my experience, arch maintainers tend to object to each others patches,
and the net result is no forward progress.  See dma_mmap API as a
brilliant example of that - the lack of which contines to cause problems
for drivers many years after I initially tried (and gave up) trying to
get agreement on that.

So, let's do what we can to consolidate our stuff and if we get some
agreement from other arch maintainers, look towards moving it out.
Moving stuff out once its properly modularized is trivial.
Russell King - ARM Linux April 15, 2011, 3:41 p.m. UTC | #6
On Fri, Apr 15, 2011 at 09:32:01AM -0600, Grant Likely wrote:
> Yes, once the infrastructure is in place, powerpc can do its own
> migration to the new code.  I vote for putting it in lib at the
> outset.

I don't agree with stuffing non-arch directories with code which people
haven't already agreed should be shared.  As I've already said, in my
experience it's hard to get agreement in the first place and even when
you can the API generally needs to be changed from what you first think
would be reasonable.

So, lets wait to see whether the SH folk reply.
Russell King - ARM Linux April 15, 2011, 4:03 p.m. UTC | #7
On Fri, Apr 15, 2011 at 08:39:55AM -0500, Rob Herring wrote:
> Russell,
>
> On 04/15/2011 08:06 AM, Russell King - ARM Linux wrote:
>> This is work in progress.
>>
>> We have two SoCs using SRAM, both with their own allocation systems,
>> and both with their own ways of copying functions into the SRAM.
>
> It's more than that. Several i.MX chips use plat-mxc/iram_alloc.c.

Hmm, that's nice - except for one issue.  According to my grep of
arch/arm/ and drivers/, nothing uses iram_alloc().  So, does anything in
the MX stuff use iram_alloc.c, or is it dead code left over from a
previous experiment?

The commit says:

    ARM: imx: Add iram allocator functions

    Add IRAM(Internal RAM) allocation functions using GENERIC_ALLOCATOR.
    The allocation size is 4KB multiples to guarantee alignment. The
    idea for these functions is for i.MX platforms to use them
    to dynamically allocate IRAM usage.

    Applies on 2.6.36-rc7

> lpc32xx and pnx4008 also use iram, but do not have an allocator (only 1  
> user). Both are doing a copy the suspend code to IRAM and run it which  
> may also be a good thing to have generic code for. Several i.MX chips  
> also need to run from IRAM for suspend.

We have support for copying functions to other bits of memory and getting
the Thumb-ness right - see asm/fncpy.h.  So that's a separate patch to
convert them over.
Russell King - ARM Linux April 15, 2011, 4:20 p.m. UTC | #8
On Fri, Apr 15, 2011 at 04:18:23PM +0000, Nguyen Dinh-R00091 wrote:
> >Hmm, that's nice - except for one issue.  According to my grep of
> >arch/arm/ and drivers/, nothing uses iram_alloc().  So, does anything in
> >the MX stuff use iram_alloc.c, or is it dead code left over from a
> >previous experiment?
> 
> This function will be used for suspend code in the mx5x series. I just
> got done submitting a series of patches to Sascha for a simple suspend
> that does not need running code out of IRAM yet. The next set of suspend
> patches will be using these iram functions.

Okay, so does iram_alloc() need to return a dma_addr_t or a phys_addr_t ?
Are you dealing with physical addresses for it or DMA addresses?
Russell King - ARM Linux April 15, 2011, 4:58 p.m. UTC | #9
On Fri, Apr 15, 2011 at 05:20:45PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 15, 2011 at 04:18:23PM +0000, Nguyen Dinh-R00091 wrote:
> > >Hmm, that's nice - except for one issue.  According to my grep of
> > >arch/arm/ and drivers/, nothing uses iram_alloc().  So, does anything in
> > >the MX stuff use iram_alloc.c, or is it dead code left over from a
> > >previous experiment?
> > 
> > This function will be used for suspend code in the mx5x series. I just
> > got done submitting a series of patches to Sascha for a simple suspend
> > that does not need running code out of IRAM yet. The next set of suspend
> > patches will be using these iram functions.
> 
> Okay, so does iram_alloc() need to return a dma_addr_t or a phys_addr_t ?
> Are you dealing with physical addresses for it or DMA addresses?

And another question: why does iram_alloc() return a void __iomem * for
the virtual address, and the physical address via a pointer argument.
However, iram_free() take an unsigned long for the address.

It seems rather inconsistent that the parameter for free is returned via
a pointer argument.

If I convert this to sram-pool, can we change this to:

static inline void *iram_alloc(size_t size, phys_addr_t *phys_addr)
{
	return sram_pool_alloc(iram_pool, size, phys_addr);
}

static void iram_free(void *addr, size_t size)
{
	sram_pool_free(iram_pool, addr, size);
}

and:

int __init iram_init(unsigned long base, unsigned long size)

becomes:

int __init iram_init(phys_addr_t base, size_t size)

?
Russell King - ARM Linux April 15, 2011, 6:21 p.m. UTC | #10
On Fri, Apr 15, 2011 at 08:12:07PM +0200, Detlef Vollmann wrote:
>> Second point is that you'll notice that the code converts to a phys
>> address using this: phys = phys_base + (virt - virt_base).  As soon as
>> we start allowing multiple regions of SRAM, it becomes impossible to
>> provide the phys address without a lot more complexity.
> Yes, I understand that.  This either requires some intrusive changes
> to gen_pool code or quite some additional code in sram_pool_alloc.

It would require sram_pool to track each individual buffer alongside
the similar tracking that gen_pool does, and then look up the returned
address to find out which buffer it corresponds with.

As it looks though, this functionality isn't required on AT91.  So lets
not complicate the code unless we know we have to.  While the alloc/free
API is such that it'll be able to cope if we have to add it later - the
initialization will have to become a little more complex.  And then I
start to wonder if gen_pool should just be extended for the phys address.
Russell King - ARM Linux April 15, 2011, 7:40 p.m. UTC | #11
On Fri, Apr 15, 2011 at 07:20:12PM +0000, Nguyen Dinh-R00091 wrote:
> 
> >-----Original Message-----
> >From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> >Sent: Friday, April 15, 2011 11:59 AM
> >To: Nguyen Dinh-R00091
> >Cc: Kevin Hilman; davinci-linux-open-source@linux.davincidsp.com; Tony Lindgren; Sekhar Nori; linux-
> >omap@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> >Subject: Re: [RFC PATCH] Consolidate SRAM support
> >
> >On Fri, Apr 15, 2011 at 05:20:45PM +0100, Russell King - ARM Linux wrote:
> >> On Fri, Apr 15, 2011 at 04:18:23PM +0000, Nguyen Dinh-R00091 wrote:
> >> > >Hmm, that's nice - except for one issue.  According to my grep of
> >> > >arch/arm/ and drivers/, nothing uses iram_alloc().  So, does anything in
> >> > >the MX stuff use iram_alloc.c, or is it dead code left over from a
> >> > >previous experiment?
> >> >
> >> > This function will be used for suspend code in the mx5x series. I just
> >> > got done submitting a series of patches to Sascha for a simple suspend
> >> > that does not need running code out of IRAM yet. The next set of suspend
> >> > patches will be using these iram functions.
> >>
> >> Okay, so does iram_alloc() need to return a dma_addr_t or a phys_addr_t ?
> >> Are you dealing with physical addresses for it or DMA addresses?
> >
> >And another question: why does iram_alloc() return a void __iomem * for
> >the virtual address, and the physical address via a pointer argument.
> >However, iram_free() take an unsigned long for the address.
> 
> Yes, should just be a void *iram_alloc().

Thanks.  As you didn't comment against the change below, I'm going to
assume that you're happy with it.  It means that the usage changes from:

	unsigned long dma;
	void __iomem *addr = iram_alloc(SZ_4K, &dma);
	...
	iram_free(dma, SZ_4K);

to:

	phys_addr_t phys;
	void *addr = iram_alloc(SZ_4K, &phys);
	...
	iram_free(addr, SZ_4K);

> >It seems rather inconsistent that the parameter for free is returned via
> >a pointer argument.
> >
> >If I convert this to sram-pool, can we change this to:
> >
> >static inline void *iram_alloc(size_t size, phys_addr_t *phys_addr)
> >{
> >	return sram_pool_alloc(iram_pool, size, phys_addr);
> >}
> >
> >static void iram_free(void *addr, size_t size)
> >{
> >	sram_pool_free(iram_pool, addr, size);
> >}
> >
> >and:
> >
> >int __init iram_init(unsigned long base, unsigned long size)
> >
> >becomes:
> >
> >int __init iram_init(phys_addr_t base, size_t size)
> >
> >?
> 
>
Russell King - ARM Linux April 15, 2011, 8:19 p.m. UTC | #12
On Fri, Apr 15, 2011 at 10:11:07PM +0200, Uwe Kleine-König wrote:
> On Fri, Apr 15, 2011 at 02:06:07PM +0100, Russell King - ARM Linux wrote:
> > This is work in progress.
> > 
> > We have two SoCs using SRAM, both with their own allocation systems,
> > and both with their own ways of copying functions into the SRAM.
> I havn't checked the details, but maybe the code in
> arch/arm/plat-mxc/iram_alloc.c could be migrated to your approach, too?

Its already in there now that I have replies from Nguyen Dinh.  It
looks like this presently:

 arch/arm/plat-mxc/Kconfig                   |    2 +-
 arch/arm/plat-mxc/include/mach/iram.h       |   24 +++++++--
 arch/arm/plat-mxc/iram_alloc.c              |   50 +++++---------------

and if we get rid of the iram_alloc/iram_free wrappers around the
sram_pool (now pv_pool) alloc/free in iram.h, in the same way I've
done for Davinci, then we get less new additions too.
Russell King - ARM Linux April 16, 2011, 1:09 p.m. UTC | #13
On Sat, Apr 16, 2011 at 09:01:26PM +0800, Haojian Zhuang wrote:
> On Fri, Apr 15, 2011 at 9:06 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > This is work in progress.
> >
> > We have two SoCs using SRAM, both with their own allocation systems,
> > and both with their own ways of copying functions into the SRAM.
> >
> > Let's unify this before we have additional SoCs re-implementing this
> > obviously common functionality themselves.
> >
> > Unfortunately, we end up with code growth through doing this, but that
> > will become a win when we have another SoC using this (which I know
> > there's at least one in the pipeline).
> >
> > One of the considerations here is that we can easily convert sram-pool.c
> > to hook into device tree stuff, which can tell the sram allocator:
> >        - physical address of sram
> >        - size of sram
> >        - allocation granularity
> > and then we just need to ensure that it is appropriately mapped.
> >
> > This uses the physical address, and unlike Davinci's dma address usage,
> > it always wants to have the physical address, and will always return
> > the corresponding physical address when passed that pointer.
> >
> > OMAP could probably do with some more work to make the omapfb and other
> > allocations use the sram allocator, rather than hooking in before the
> > sram allocator is initialized - and then further cleanups so that we
> > have an initialization function which just does
> >
> > sram_create(phys, size)
> >        virt = map sram(phys, size)
> >        create sram pool(virt, phys, size, min_alloc_order)
> >
> > Another question is whether we should allow multiple SRAM pools or not -
> > this code does allow multiple pools, but so far we only have one pool
> > per SoC.  Overdesign?  Maybe, but it prevents SoCs wanting to duplicate
> > it if they want to partition the SRAM, or have peripheral-local SRAMs.
> >
> > Lastly, uio_pruss should probably take the SRAM pool pointer via
> > platform data so that it doesn't have to include Davinci specific
> > includes.
> >
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> This common sram driver is good for us. It can benefit us on DMA usage.
> I just have one question on SRAM for storing instruction. We still need to
> copy code into SRAM and flush cache & TLB with this SRAM driver.

There is the fncpy API for copying code to other regions of memory,
including SRAM.  The fncpy API is explicitly designed to cope with
Thumb 2 and provide an appropriate function pointer for such code.

Such code needs to be position independent to cope with multiple SRAM
users, and also possible variability of SRAM mapping which may (eventually)
exist when DT comes along.

> TCM driver can allocate code into SRAM section in link stage. It needs to
> update link file and virtual memory layout. Is it worth to make SRAM driver
> support this behavior? The case of using SRAM as memory for instruction
> is switching frequency or entering/exiting low power mode in PXA silicons.

This is more or less the same scenario which OMAP is using its SRAM
for, although that's explicitly SRAM and not TCM.

I don't think we need position dependent code requiring fixup for
these applications as such things tend to be fairly easily coded in a
position independent way.

Also note that C functions can't be copied because C produces position
dependent code, and we have no way of extracting the relocation
dependencies from such code.
diff mbox

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5b9f78b..5c3401c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -850,6 +850,7 @@  config ARCH_DAVINCI
 	bool "TI DaVinci"
 	select GENERIC_CLOCKEVENTS
 	select ARCH_REQUIRE_GPIOLIB
+	select ARM_SRAM_POOL
 	select ZONE_DMA
 	select HAVE_IDE
 	select CLKDEV_LOOKUP
@@ -863,6 +864,7 @@  config ARCH_OMAP
 	select HAVE_CLK
 	select ARCH_REQUIRE_GPIOLIB
 	select ARCH_HAS_CPUFREQ
+	select ARM_SRAM_POOL
 	select GENERIC_CLOCKEVENTS
 	select HAVE_SCHED_CLOCK
 	select ARCH_HAS_HOLES_MEMORYMODEL
diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index ea5ee4d..ddbd20b 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -39,3 +39,7 @@  config SHARP_PARAM
 
 config SHARP_SCOOP
 	bool
+
+config ARM_SRAM_POOL
+	bool
+	select GENERIC_ALLOCATOR
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index e7521bca..b79ad68 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -18,3 +18,4 @@  obj-$(CONFIG_ARCH_IXP23XX)	+= uengine.o
 obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
 obj-$(CONFIG_COMMON_CLKDEV)	+= clkdev.o
 obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
+obj-$(CONFIG_ARM_SRAM_POOL)	+= sram-pool.o
diff --git a/arch/arm/common/sram-pool.c b/arch/arm/common/sram-pool.c
new file mode 100644
index 0000000..9ff1466
--- /dev/null
+++ b/arch/arm/common/sram-pool.c
@@ -0,0 +1,69 @@ 
+/*
+ * Unified SRAM allocator, based on mach-davinci/sram.c, which was
+ * Copyright (C) 2009 David Brownell.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include <linux/dma-mapping.h>
+#include <linux/genalloc.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include <asm/sram-pool.h>
+
+struct sram_pool {
+	struct gen_pool *genpool;
+	void *cpu_base;
+	phys_addr_t phys_base;
+};
+
+void *sram_pool_alloc(struct sram_pool *pool, size_t len, phys_addr_t *phys)
+{
+	void *addr = (void *)gen_pool_alloc(pool->genpool, len);
+
+	if (phys)
+		*phys = addr ? (pool->phys_base + (addr - pool->cpu_base)) :
+			 (phys_addr_t)-1ULL;
+
+	return addr;
+}
+EXPORT_SYMBOL_GPL(sram_pool_alloc);
+
+void sram_pool_free(struct sram_pool *pool, void *addr, size_t len)
+{
+	gen_pool_free(pool->genpool, (unsigned long)addr, len);
+}
+EXPORT_SYMBOL_GPL(sram_pool_free);
+
+struct sram_pool *sram_pool_create(void *addr, phys_addr_t phys, size_t len,
+	int min_alloc_order)
+{
+	struct sram_pool *pool = kzalloc(sizeof(struct sram_pool), GFP_KERNEL);
+
+	if (pool) {
+		pool->cpu_base = addr;
+		pool->phys_base = phys;
+		pool->genpool = gen_pool_create(min_alloc_order, -1);
+		if (!pool->genpool) {
+			kfree(pool);
+			pool = NULL;
+		} else {
+			WARN_ON(gen_pool_add(pool->genpool, (unsigned long)addr,
+						len, -1) < 0);
+		}
+	}
+
+	return pool;
+}
+EXPORT_SYMBOL_GPL(sram_pool_create);
+
+void sram_pool_destroy(struct sram_pool *pool)
+{
+	gen_pool_destroy(pool->genpool);
+	kfree(pool);
+}
+EXPORT_SYMBOL_GPL(sram_pool_destroy);
diff --git a/arch/arm/include/asm/sram-pool.h b/arch/arm/include/asm/sram-pool.h
new file mode 100644
index 0000000..b7ae871
--- /dev/null
+++ b/arch/arm/include/asm/sram-pool.h
@@ -0,0 +1,20 @@ 
+#ifndef __ASMARM_SRAM_POOL_H
+#define __ASMARM_SRAM_POOL_H
+
+#include <asm/fncpy.h>
+
+struct sram_pool;
+
+void *sram_pool_alloc(struct sram_pool *, size_t, phys_addr_t *);
+void sram_pool_free(struct sram_pool *, void *, size_t);
+struct sram_pool *sram_pool_create(void *, phys_addr_t, size_t, int);
+void sram_pool_destroy(struct sram_pool *);
+
+/* Macro to copy a function into SRAM, using the fncpy API */
+#define sram_pool_fncpy(pool, funcp, size) ({			\
+	size_t _sz = size;					\
+	void *_sram = sram_pool_alloc(pool, _sz, NULL);		\
+	(_sram ? fncpy(_sram, &(funcp), _sz) : NULL);		\
+})
+
+#endif
diff --git a/arch/arm/mach-davinci/da850.c b/arch/arm/mach-davinci/da850.c
index 68fe4c2..5eca128 100644
--- a/arch/arm/mach-davinci/da850.c
+++ b/arch/arm/mach-davinci/da850.c
@@ -1099,7 +1099,7 @@  static struct davinci_soc_info davinci_soc_info_da850 = {
 	.gpio_irq		= IRQ_DA8XX_GPIO0,
 	.serial_dev		= &da8xx_serial_device,
 	.emac_pdata		= &da8xx_emac_pdata,
-	.sram_dma		= DA8XX_ARM_RAM_BASE,
+	.sram_phys		= DA8XX_ARM_RAM_BASE,
 	.sram_len		= SZ_8K,
 	.reset_device		= &da8xx_wdt_device,
 };
diff --git a/arch/arm/mach-davinci/dm355.c b/arch/arm/mach-davinci/dm355.c
index 76364d1..3df8730 100644
--- a/arch/arm/mach-davinci/dm355.c
+++ b/arch/arm/mach-davinci/dm355.c
@@ -850,7 +850,7 @@  static struct davinci_soc_info davinci_soc_info_dm355 = {
 	.gpio_num		= 104,
 	.gpio_irq		= IRQ_DM355_GPIOBNK0,
 	.serial_dev		= &dm355_serial_device,
-	.sram_dma		= 0x00010000,
+	.sram_phys		= 0x00010000,
 	.sram_len		= SZ_32K,
 	.reset_device		= &davinci_wdt_device,
 };
diff --git a/arch/arm/mach-davinci/dm365.c b/arch/arm/mach-davinci/dm365.c
index 4604e72..d306034 100644
--- a/arch/arm/mach-davinci/dm365.c
+++ b/arch/arm/mach-davinci/dm365.c
@@ -1082,7 +1082,7 @@  static struct davinci_soc_info davinci_soc_info_dm365 = {
 	.gpio_unbanked		= 8,	/* really 16 ... skip muxed GPIOs */
 	.serial_dev		= &dm365_serial_device,
 	.emac_pdata		= &dm365_emac_pdata,
-	.sram_dma		= 0x00010000,
+	.sram_phys		= 0x00010000,
 	.sram_len		= SZ_32K,
 	.reset_device		= &davinci_wdt_device,
 };
diff --git a/arch/arm/mach-davinci/dm644x.c b/arch/arm/mach-davinci/dm644x.c
index 9a2376b..4ca7295 100644
--- a/arch/arm/mach-davinci/dm644x.c
+++ b/arch/arm/mach-davinci/dm644x.c
@@ -764,7 +764,7 @@  static struct davinci_soc_info davinci_soc_info_dm644x = {
 	.gpio_irq		= IRQ_GPIOBNK0,
 	.serial_dev		= &dm644x_serial_device,
 	.emac_pdata		= &dm644x_emac_pdata,
-	.sram_dma		= 0x00008000,
+	.sram_phys		= 0x00008000,
 	.sram_len		= SZ_16K,
 	.reset_device		= &davinci_wdt_device,
 };
diff --git a/arch/arm/mach-davinci/dm646x.c b/arch/arm/mach-davinci/dm646x.c
index 1e0f809..a4365f7 100644
--- a/arch/arm/mach-davinci/dm646x.c
+++ b/arch/arm/mach-davinci/dm646x.c
@@ -848,7 +848,7 @@  static struct davinci_soc_info davinci_soc_info_dm646x = {
 	.gpio_irq		= IRQ_DM646X_GPIOBNK0,
 	.serial_dev		= &dm646x_serial_device,
 	.emac_pdata		= &dm646x_emac_pdata,
-	.sram_dma		= 0x10010000,
+	.sram_phys		= 0x10010000,
 	.sram_len		= SZ_32K,
 	.reset_device		= &davinci_wdt_device,
 };
diff --git a/arch/arm/mach-davinci/include/mach/common.h b/arch/arm/mach-davinci/include/mach/common.h
index a57cba2..665d049 100644
--- a/arch/arm/mach-davinci/include/mach/common.h
+++ b/arch/arm/mach-davinci/include/mach/common.h
@@ -75,7 +75,7 @@  struct davinci_soc_info {
 	int				gpio_ctlrs_num;
 	struct platform_device		*serial_dev;
 	struct emac_platform_data	*emac_pdata;
-	dma_addr_t			sram_dma;
+	phys_addr_t			sram_phys;
 	unsigned			sram_len;
 	struct platform_device		*reset_device;
 	void				(*reset)(struct platform_device *);
diff --git a/arch/arm/mach-davinci/include/mach/sram.h b/arch/arm/mach-davinci/include/mach/sram.h
index 111f7cc..7d77e85 100644
--- a/arch/arm/mach-davinci/include/mach/sram.h
+++ b/arch/arm/mach-davinci/include/mach/sram.h
@@ -10,18 +10,11 @@ 
 #ifndef __MACH_SRAM_H
 #define __MACH_SRAM_H
 
+#include <asm/sram-pool.h>
+
 /* ARBITRARY:  SRAM allocations are multiples of this 2^N size */
 #define SRAM_GRANULARITY	512
 
-/*
- * SRAM allocations return a CPU virtual address, or NULL on error.
- * If a DMA address is requested and the SRAM supports DMA, its
- * mapped address is also returned.
- *
- * Errors include SRAM memory not being available, and requesting
- * DMA mapped SRAM on systems which don't allow that.
- */
-extern void *sram_alloc(size_t len, dma_addr_t *dma);
-extern void sram_free(void *addr, size_t len);
+extern struct sram_pool *davinci_sram_pool;
 
 #endif /* __MACH_SRAM_H */
diff --git a/arch/arm/mach-davinci/pm.c b/arch/arm/mach-davinci/pm.c
index 1bd73a0..f69cd7b 100644
--- a/arch/arm/mach-davinci/pm.c
+++ b/arch/arm/mach-davinci/pm.c
@@ -29,12 +29,6 @@ 
 static void (*davinci_sram_suspend) (struct davinci_pm_config *);
 static struct davinci_pm_config *pdata;
 
-static void davinci_sram_push(void *dest, void *src, unsigned int size)
-{
-	memcpy(dest, src, size);
-	flush_icache_range((unsigned long)dest, (unsigned long)(dest + size));
-}
-
 static void davinci_pm_suspend(void)
 {
 	unsigned val;
@@ -123,15 +117,13 @@  static int __init davinci_pm_probe(struct platform_device *pdev)
 		return -ENOENT;
 	}
 
-	davinci_sram_suspend = sram_alloc(davinci_cpu_suspend_sz, NULL);
+	davinci_sram_suspend = sram_pool_fncpy(davinci_sram_pool,
+				davinci_cpu_suspend, davinci_cpu_suspend_sz);
 	if (!davinci_sram_suspend) {
 		dev_err(&pdev->dev, "cannot allocate SRAM memory\n");
 		return -ENOMEM;
 	}
 
-	davinci_sram_push(davinci_sram_suspend, davinci_cpu_suspend,
-						davinci_cpu_suspend_sz);
-
 	suspend_set_ops(&davinci_pm_ops);
 
 	return 0;
diff --git a/arch/arm/mach-davinci/sram.c b/arch/arm/mach-davinci/sram.c
index db0f778..ebd4d67 100644
--- a/arch/arm/mach-davinci/sram.c
+++ b/arch/arm/mach-davinci/sram.c
@@ -10,40 +10,13 @@ 
  */
 #include <linux/module.h>
 #include <linux/init.h>
-#include <linux/genalloc.h>
+#include <asm/sram-pool.h>
 
 #include <mach/common.h>
 #include <mach/sram.h>
 
-static struct gen_pool *sram_pool;
-
-void *sram_alloc(size_t len, dma_addr_t *dma)
-{
-	unsigned long vaddr;
-	dma_addr_t dma_base = davinci_soc_info.sram_dma;
-
-	if (dma)
-		*dma = 0;
-	if (!sram_pool || (dma && !dma_base))
-		return NULL;
-
-	vaddr = gen_pool_alloc(sram_pool, len);
-	if (!vaddr)
-		return NULL;
-
-	if (dma)
-		*dma = dma_base + (vaddr - SRAM_VIRT);
-	return (void *)vaddr;
-
-}
-EXPORT_SYMBOL(sram_alloc);
-
-void sram_free(void *addr, size_t len)
-{
-	gen_pool_free(sram_pool, (unsigned long) addr, len);
-}
-EXPORT_SYMBOL(sram_free);
-
+struct sram_pool *davinci_sram_pool;
+EXPORT_SYMBOL_GPL(davinci_sram_pool);
 
 /*
  * REVISIT This supports CPU and DMA access to/from SRAM, but it
@@ -58,13 +31,12 @@  static int __init sram_init(void)
 
 	if (len) {
 		len = min_t(unsigned, len, SRAM_SIZE);
-		sram_pool = gen_pool_create(ilog2(SRAM_GRANULARITY), -1);
-		if (!sram_pool)
+		davinci_sram_pool = sram_pool_create((void *)SRAM_VIRT,
+				davinci_soc_info.sram_phys, len,
+				ilog2(SRAM_GRANULARITY));
+		if (!davinci_sram_pool)
 			status = -ENOMEM;
 	}
-	if (sram_pool)
-		status = gen_pool_add(sram_pool, SRAM_VIRT, len, -1);
-	WARN_ON(status < 0);
 	return status;
 }
 core_initcall(sram_init);
diff --git a/arch/arm/plat-omap/include/plat/sram.h b/arch/arm/plat-omap/include/plat/sram.h
index f500fc3..9fc27d0 100644
--- a/arch/arm/plat-omap/include/plat/sram.h
+++ b/arch/arm/plat-omap/include/plat/sram.h
@@ -12,16 +12,19 @@ 
 #define __ARCH_ARM_OMAP_SRAM_H
 
 #ifndef __ASSEMBLY__
-#include <asm/fncpy.h>
+#include <asm/sram-pool.h>
 
-extern void *omap_sram_push_address(unsigned long size);
+extern struct sram_pool *omap_sram_pool;
 
-/* Macro to push a function to the internal SRAM, using the fncpy API */
+/*
+ * Note that fncpy requires the SRAM address to be aligned to an 8-byte
+ * boundary, so the min_alloc_order for the pool is set appropriately.
+ */
 #define omap_sram_push(funcp, size) ({				\
-	typeof(&(funcp)) _res = NULL;				\
-	void *_sram_address = omap_sram_push_address(size);	\
-	if (_sram_address)					\
-		_res = fncpy(_sram_address, &(funcp), size);	\
+	typeof(&(funcp)) _res;					\
+	_res = sram_pool_fncpy(omap_sram_pool, funcp, size);	\
+	if (!_res)						\
+		pr_err("Not enough space in SRAM\n");		\
 	_res;							\
 })
 
diff --git a/arch/arm/plat-omap/sram.c b/arch/arm/plat-omap/sram.c
index a3f50b3..3588749 100644
--- a/arch/arm/plat-omap/sram.c
+++ b/arch/arm/plat-omap/sram.c
@@ -75,7 +75,6 @@ 
 static unsigned long omap_sram_start;
 static unsigned long omap_sram_base;
 static unsigned long omap_sram_size;
-static unsigned long omap_sram_ceil;
 
 /*
  * Depending on the target RAMFS firewall setup, the public usable amount of
@@ -104,6 +103,8 @@  static int is_sram_locked(void)
 		return 1; /* assume locked with no PPA or security driver */
 }
 
+struct sram_pool *omap_sram_pool;
+
 /*
  * The amount of SRAM depends on the core type.
  * Note that we cannot try to test for SRAM here because writes
@@ -182,7 +183,16 @@  static void __init omap_detect_sram(void)
 			omap_sram_size - SRAM_BOOTLOADER_SZ);
 	omap_sram_size -= reserved;
 
-	omap_sram_ceil = omap_sram_base + omap_sram_size;
+	{
+		/* The first SRAM_BOOTLOADER_SZ of SRAM are reserved */
+		void *base = (void *)omap_sram_base + SRAM_BOOTLOADER_SZ;
+		phys_addr_t phys = omap_sram_start + SRAM_BOOTLOADER_SZ;
+		size_t len = omap_sram_size - SRAM_BOOTLOADER_SZ;
+
+		omap_sram_pool = sram_pool_create(base, phys, len,
+						ilog2(FNCPY_ALIGN));
+		WARN_ON(!omap_sram_pool);
+	}
 }
 
 static struct map_desc omap_sram_io_desc[] __initdata = {
@@ -242,26 +252,6 @@  static void __init omap_map_sram(void)
 	       omap_sram_size - SRAM_BOOTLOADER_SZ);
 }
 
-/*
- * Memory allocator for SRAM: calculates the new ceiling address
- * for pushing a function using the fncpy API.
- *
- * Note that fncpy requires the returned address to be aligned
- * to an 8-byte boundary.
- */
-void *omap_sram_push_address(unsigned long size)
-{
-	if (size > (omap_sram_ceil - (omap_sram_base + SRAM_BOOTLOADER_SZ))) {
-		printk(KERN_ERR "Not enough space in SRAM\n");
-		return NULL;
-	}
-
-	omap_sram_ceil -= size;
-	omap_sram_ceil = ROUND_DOWN(omap_sram_ceil, FNCPY_ALIGN);
-
-	return (void *)omap_sram_ceil;
-}
-
 #ifdef CONFIG_ARCH_OMAP1
 
 static void (*_omap_sram_reprogram_clock)(u32 dpllctl, u32 ckctl);
diff --git a/drivers/uio/uio_pruss.c b/drivers/uio/uio_pruss.c
index daf6e77..2be3155 100644
--- a/drivers/uio/uio_pruss.c
+++ b/drivers/uio/uio_pruss.c
@@ -62,7 +62,7 @@  MODULE_PARM_DESC(extram_pool_sz, "external ram pool size to allocate");
 struct uio_pruss_dev {
 	struct uio_info *info;
 	struct clk *pruss_clk;
-	dma_addr_t sram_paddr;
+	phys_addr_t sram_paddr;
 	dma_addr_t ddr_paddr;
 	void __iomem *prussio_vaddr;
 	void *sram_vaddr;
@@ -106,7 +106,8 @@  static void pruss_cleanup(struct platform_device *dev,
 			gdev->ddr_paddr);
 	}
 	if (gdev->sram_vaddr)
-		sram_free(gdev->sram_vaddr, sram_pool_sz);
+		sram_pool_free(davinci_sram_pool, gdev->sram_vaddr,
+			       sram_pool_sz);
 	kfree(gdev->info);
 	clk_put(gdev->pruss_clk);
 	kfree(gdev);
@@ -152,7 +153,8 @@  static int __devinit pruss_probe(struct platform_device *dev)
 		goto out_free;
 	}
 
-	gdev->sram_vaddr = sram_alloc(sram_pool_sz, &(gdev->sram_paddr));
+	gdev->sram_vaddr = sram_pool_alloc(davinci_sram_pool, sram_pool_sz,
+					   &(gdev->sram_paddr));
 	if (!gdev->sram_vaddr) {
 		dev_err(&dev->dev, "Could not allocate SRAM pool\n");
 		goto out_free;