diff mbox

[v2,08/25] arch: introduce memremap()

Message ID 20150725023842.8664.97620.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams July 25, 2015, 2:38 a.m. UTC
Existing users of ioremap_cache() are mapping memory that is known in
advance to not have i/o side effects.  These users are forced to cast
away the __iomem annotation, or otherwise neglect to fix the sparse
errors thrown when dereferencing pointers to this memory.  Provide
memremap() as a non __iomem annotated ioremap_*() in the case when
ioremap is otherwise a pointer to memory. Outside of ioremap() and
ioremap_nocache(), the expectation is that most calls to
ioremap_<type>() are seeking memory-like semantics (e.g.  speculative
reads, and prefetching permitted).  These callsites can be moved to
memremap() over time.

memremap() is a break from the ioremap implementation pattern of adding
a new memremap_<type>() for each mapping type and having silent
compatibility fall backs.  Instead, the
implementation defines flags that are passed to the central memremap()
and if a mapping type is not supported by an arch memremap returns NULL.

The behavior change to return NULL on an unsupported request is reserved
for a later patch.  This initial implementation starts off by using
ioremap_cache() directly.  Once all ioremap_cache() and ioremap_wt()
instances have been converted the functionality for establishing these
mappings will be pushed to a per-architecture arch_memremap()
implementation.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/ia64/include/asm/io.h   |    1 +
 arch/sh/include/asm/io.h     |    1 +
 arch/xtensa/include/asm/io.h |    1 +
 include/linux/io.h           |    9 +++++
 kernel/Makefile              |    2 +
 kernel/memremap.c            |   82 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 96 insertions(+)
 create mode 100644 kernel/memremap.c

Comments

Christoph Hellwig July 26, 2015, 5:25 p.m. UTC | #1
On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
> The behavior change to return NULL on an unsupported request is reserved
> for a later patch.

Why?

> +enum {
> +	MEMREMAP_WB = 1 << 0,
> +	MEMREMAP_WT = 1 << 1,
> +	MEMREMAP_CACHE = MEMREMAP_WB,

What's the point of having this MEMREMAP_CACHE alias?

Also please document the meaning of the flags for the poor users.
Dan Williams July 26, 2015, 5:49 p.m. UTC | #2
On Sun, Jul 26, 2015 at 10:25 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
>> The behavior change to return NULL on an unsupported request is reserved
>> for a later patch.
>
> Why?

This is for drivers like pmem that care about the mapping type.  For
example, if pmem can't get a cache-enabled mapping it is potentially
putting the write durability of the persistent media at risk.

>> +enum {
>> +     MEMREMAP_WB = 1 << 0,
>> +     MEMREMAP_WT = 1 << 1,
>> +     MEMREMAP_CACHE = MEMREMAP_WB,
>
> What's the point of having this MEMREMAP_CACHE alias?

For developers that are used to seeing ioremap_cache()...

> Also please document the meaning of the flags for the poor users.

Will do.  I'll mostly borrow from the x86 mapping type definitions,
but these will also have architecture specific semantics /
constraints.
Christoph Hellwig July 27, 2015, 5:12 a.m. UTC | #3
On Sun, Jul 26, 2015 at 10:49:39AM -0700, Dan Williams wrote:
> > What's the point of having this MEMREMAP_CACHE alias?
> 
> For developers that are used to seeing ioremap_cache()...

Meh.  We've got git logs that show clearly what replaced a previous API.
Duplicate names for the same API are highly confusing.
Christoph Hellwig July 27, 2015, 5:12 a.m. UTC | #4
On Sun, Jul 26, 2015 at 10:49:39AM -0700, Dan Williams wrote:
> On Sun, Jul 26, 2015 at 10:25 AM, Christoph Hellwig <hch@lst.de> wrote:
> > On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
> >> The behavior change to return NULL on an unsupported request is reserved
> >> for a later patch.
> >
> > Why?
> 
> This is for drivers like pmem that care about the mapping type.  For
> example, if pmem can't get a cache-enabled mapping it is potentially
> putting the write durability of the persistent media at risk.

I understand that part, but the question is why the old behavior
is retained for now and only changed later.
Luis Chamberlain July 27, 2015, 11:17 p.m. UTC | #5
On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
> Existing users of ioremap_cache() are mapping memory that is known in
> advance to not have i/o side effects.  These users are forced to cast
> away the __iomem annotation, or otherwise neglect to fix the sparse
> errors thrown when dereferencing pointers to this memory.  Provide
> memremap() as a non __iomem annotated ioremap_*() in the case when
> ioremap is otherwise a pointer to memory.

Ok so a special use case.

> Outside of ioremap() and
> ioremap_nocache(), the expectation is that most calls to
> ioremap_<type>() are seeking memory-like semantics (e.g.  speculative
> reads, and prefetching permitted).  These callsites can be moved to
> memremap() over time.

Such memory-like smantics are not well defined yet and this has caused
issues over expectations over a slew of APIs. As you note above
your own defined 'semantics' so far for memremap are just that there are
no i/o side effects, when the mapped memory is just a pointer to memory,
as such I do not think its fair to set the excpectations that we'll
switch all other ioremap_*() callers to the same memremap() API. Now,
it may be a good idea to use something similar, ie, to pass flags,
but setting the expectations outright to move to memremap() without having
any agreement having been made over semantics seems uncalled for at this
point in time, specially when you are noting that the expectations for
both sets of calls are different.

So perhaps:

"
Outside of ioremap() and ioremap_nocache(), all other ioremap_<type>()
variant calls are seeking memory-like semantics (e.g.  speculative
reads, and prefetching permitted) and all calls expecations currently
differ depending on architecture. Once and if we get agreement on such
semantics we may be able to move such ioremap_*() variant calls to
a similar API where the semantics required are clearly spelled out
and well defined and passed as arguments.
"

> diff --git a/kernel/memremap.c b/kernel/memremap.c
> new file mode 100644
> index 000000000000..ba206fd11785
> --- /dev/null
> +++ b/kernel/memremap.c
> @@ -0,0 +1,82 @@
> +/*
> + * Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +#include <linux/types.h>
> +#include <linux/io.h>
> +#include <linux/mm.h>
> +
> +#ifndef ioremap_cache
> +/* temporary while we convert existing ioremap_cache users to memremap */
> +__weak void __iomem *ioremap_cache(resource_size_t offset, unsigned long size)
> +{
> +	return ioremap(offset, size);
> +}
> +#endif
> +
> +/*
> + * memremap() is "ioremap" for cases where it is known that the resource
> + * being mapped does not have i/o side effects and the __iomem
> + * annotation is not applicable.
> + */
> +void *memremap(resource_size_t offset, size_t size, unsigned long flags)
> +{
> +	int is_ram = region_is_ram(offset, size);
> +	void *addr = NULL;
> +
> +	if (is_ram < 0) {
> +		WARN_ONCE(1, "memremap attempted on mixed range %pa size: %zu\n",
> +				&offset, size);
> +		return NULL;
> +	}
> +
> +	/* Try all mapping types requested until one returns non-NULL */
> +	if (flags & MEMREMAP_CACHE) {
> +		flags &= ~MEMREMAP_CACHE;
> +		/*
> +		 * MEMREMAP_CACHE is special in that it can be satisifed
> +		 * from the direct map.  Some archs depend on the
> +		 * capability of memremap() to autodetect cases where
> +		 * the requested range is potentially in "System RAM"
> +		 */
> +		if (is_ram)
> +			addr = __va(offset);

The no MMU case seems to match this, can that be switch to this?

  Luis
Dan Williams July 27, 2015, 11:26 p.m. UTC | #6
On Sun, Jul 26, 2015 at 10:12 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Sun, Jul 26, 2015 at 10:49:39AM -0700, Dan Williams wrote:
>> On Sun, Jul 26, 2015 at 10:25 AM, Christoph Hellwig <hch@lst.de> wrote:
>> > On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
>> >> The behavior change to return NULL on an unsupported request is reserved
>> >> for a later patch.
>> >
>> > Why?
>>
>> This is for drivers like pmem that care about the mapping type.  For
>> example, if pmem can't get a cache-enabled mapping it is potentially
>> putting the write durability of the persistent media at risk.
>
> I understand that part, but the question is why the old behavior
> is retained for now and only changed later.

Oh, because all we have at this point is ioremap_cache() which
silently falls back.  It's not until the introduction of
arch_memremp() where we update the arch code to break that behavior.

That said, I think it may be beneficial to allow a fallback if the
user cares.  So maybe memremap() can call plain ioremap() if
MEMREMAP_STRICT is not set and none of the other mapping types are
satisfied.
Dan Williams July 27, 2015, 11:31 p.m. UTC | #7
On Mon, Jul 27, 2015 at 4:17 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
>> Existing users of ioremap_cache() are mapping memory that is known in
>> advance to not have i/o side effects.  These users are forced to cast
>> away the __iomem annotation, or otherwise neglect to fix the sparse
>> errors thrown when dereferencing pointers to this memory.  Provide
>> memremap() as a non __iomem annotated ioremap_*() in the case when
>> ioremap is otherwise a pointer to memory.
>
> Ok so a special use case.
>
>> Outside of ioremap() and
>> ioremap_nocache(), the expectation is that most calls to
>> ioremap_<type>() are seeking memory-like semantics (e.g.  speculative
>> reads, and prefetching permitted).  These callsites can be moved to
>> memremap() over time.
>
> Such memory-like smantics are not well defined yet and this has caused
> issues over expectations over a slew of APIs. As you note above
> your own defined 'semantics' so far for memremap are just that there are
> no i/o side effects, when the mapped memory is just a pointer to memory,
> as such I do not think its fair to set the excpectations that we'll
> switch all other ioremap_*() callers to the same memremap() API. Now,
> it may be a good idea to use something similar, ie, to pass flags,
> but setting the expectations outright to move to memremap() without having
> any agreement having been made over semantics seems uncalled for at this
> point in time, specially when you are noting that the expectations for
> both sets of calls are different.
>
> So perhaps:
>
> "
> Outside of ioremap() and ioremap_nocache(), all other ioremap_<type>()
> variant calls are seeking memory-like semantics (e.g.  speculative
> reads, and prefetching permitted) and all calls expecations currently
> differ depending on architecture. Once and if we get agreement on such
> semantics we may be able to move such ioremap_*() variant calls to
> a similar API where the semantics required are clearly spelled out
> and well defined and passed as arguments.

I still think ioremap_wc(), and now ioremap_uc(), are special and are
not obvious candidates for conversion to memremap().

> "
>
>> diff --git a/kernel/memremap.c b/kernel/memremap.c
>> new file mode 100644
>> index 000000000000..ba206fd11785
>> --- /dev/null
>> +++ b/kernel/memremap.c
>> @@ -0,0 +1,82 @@
>> +/*
>> + * Copyright(c) 2015 Intel Corporation. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of version 2 of the GNU General Public License as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + */
>> +#include <linux/types.h>
>> +#include <linux/io.h>
>> +#include <linux/mm.h>
>> +
>> +#ifndef ioremap_cache
>> +/* temporary while we convert existing ioremap_cache users to memremap */
>> +__weak void __iomem *ioremap_cache(resource_size_t offset, unsigned long size)
>> +{
>> +     return ioremap(offset, size);
>> +}
>> +#endif
>> +
>> +/*
>> + * memremap() is "ioremap" for cases where it is known that the resource
>> + * being mapped does not have i/o side effects and the __iomem
>> + * annotation is not applicable.
>> + */
>> +void *memremap(resource_size_t offset, size_t size, unsigned long flags)
>> +{
>> +     int is_ram = region_is_ram(offset, size);
>> +     void *addr = NULL;
>> +
>> +     if (is_ram < 0) {
>> +             WARN_ONCE(1, "memremap attempted on mixed range %pa size: %zu\n",
>> +                             &offset, size);
>> +             return NULL;
>> +     }
>> +
>> +     /* Try all mapping types requested until one returns non-NULL */
>> +     if (flags & MEMREMAP_CACHE) {
>> +             flags &= ~MEMREMAP_CACHE;
>> +             /*
>> +              * MEMREMAP_CACHE is special in that it can be satisifed
>> +              * from the direct map.  Some archs depend on the
>> +              * capability of memremap() to autodetect cases where
>> +              * the requested range is potentially in "System RAM"
>> +              */
>> +             if (is_ram)
>> +                     addr = __va(offset);
>
> The no MMU case seems to match this, can that be switch to this?

Hmm, it may be possible to consolidate all the NOMMU cases here.
That's a good incremental cleanup once these base patches are merged.
Luis Chamberlain July 27, 2015, 11:43 p.m. UTC | #8
On Mon, Jul 27, 2015 at 04:31:09PM -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 4:17 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
> >> Existing users of ioremap_cache() are mapping memory that is known in
> >> advance to not have i/o side effects.  These users are forced to cast
> >> away the __iomem annotation, or otherwise neglect to fix the sparse
> >> errors thrown when dereferencing pointers to this memory.  Provide
> >> memremap() as a non __iomem annotated ioremap_*() in the case when
> >> ioremap is otherwise a pointer to memory.
> >
> > Ok so a special use case.
> >
> >> Outside of ioremap() and
> >> ioremap_nocache(), the expectation is that most calls to
> >> ioremap_<type>() are seeking memory-like semantics (e.g.  speculative
> >> reads, and prefetching permitted).  These callsites can be moved to
> >> memremap() over time.
> >
> > Such memory-like smantics are not well defined yet and this has caused
> > issues over expectations over a slew of APIs. As you note above
> > your own defined 'semantics' so far for memremap are just that there are
> > no i/o side effects, when the mapped memory is just a pointer to memory,
> > as such I do not think its fair to set the excpectations that we'll
> > switch all other ioremap_*() callers to the same memremap() API. Now,
> > it may be a good idea to use something similar, ie, to pass flags,
> > but setting the expectations outright to move to memremap() without having
> > any agreement having been made over semantics seems uncalled for at this
> > point in time, specially when you are noting that the expectations for
> > both sets of calls are different.
> >
> > So perhaps:
> >
> > "
> > Outside of ioremap() and ioremap_nocache(), all other ioremap_<type>()
> > variant calls are seeking memory-like semantics (e.g.  speculative
> > reads, and prefetching permitted) and all calls expecations currently
> > differ depending on architecture. Once and if we get agreement on such
> > semantics we may be able to move such ioremap_*() variant calls to
> > a similar API where the semantics required are clearly spelled out
> > and well defined and passed as arguments.
> 
> I still think ioremap_wc(), and now ioremap_uc(), are special and are
> not obvious candidates for conversion to memremap().

OK great, then we're in strong agreement, so removing the "Outside of
ioremap() and... over time" might be best then? Otherwise what I posted
seems to reflect the state of affairs better?

> >> +void *memremap(resource_size_t offset, size_t size, unsigned long flags)
> >> +{
> >> +     int is_ram = region_is_ram(offset, size);
> >> +     void *addr = NULL;
> >> +
> >> +     if (is_ram < 0) {
> >> +             WARN_ONCE(1, "memremap attempted on mixed range %pa size: %zu\n",
> >> +                             &offset, size);
> >> +             return NULL;
> >> +     }
> >> +
> >> +     /* Try all mapping types requested until one returns non-NULL */
> >> +     if (flags & MEMREMAP_CACHE) {
> >> +             flags &= ~MEMREMAP_CACHE;
> >> +             /*
> >> +              * MEMREMAP_CACHE is special in that it can be satisifed
> >> +              * from the direct map.  Some archs depend on the
> >> +              * capability of memremap() to autodetect cases where
> >> +              * the requested range is potentially in "System RAM"
> >> +              */
> >> +             if (is_ram)
> >> +                     addr = __va(offset);
> >
> > The no MMU case seems to match this, can that be switch to this?
> 
> Hmm, it may be possible to consolidate all the NOMMU cases here.
> That's a good incremental cleanup once these base patches are merged.

Sure fine by me.

  Luis
Dan Williams July 28, 2015, 12:32 a.m. UTC | #9
On Mon, Jul 27, 2015 at 4:43 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Mon, Jul 27, 2015 at 04:31:09PM -0700, Dan Williams wrote:
>> On Mon, Jul 27, 2015 at 4:17 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Jul 24, 2015 at 10:38:42PM -0400, Dan Williams wrote:
>> >> Existing users of ioremap_cache() are mapping memory that is known in
>> >> advance to not have i/o side effects.  These users are forced to cast
>> >> away the __iomem annotation, or otherwise neglect to fix the sparse
>> >> errors thrown when dereferencing pointers to this memory.  Provide
>> >> memremap() as a non __iomem annotated ioremap_*() in the case when
>> >> ioremap is otherwise a pointer to memory.
>> >
>> > Ok so a special use case.
>> >
>> >> Outside of ioremap() and
>> >> ioremap_nocache(), the expectation is that most calls to
>> >> ioremap_<type>() are seeking memory-like semantics (e.g.  speculative
>> >> reads, and prefetching permitted).  These callsites can be moved to
>> >> memremap() over time.
>> >
>> > Such memory-like smantics are not well defined yet and this has caused
>> > issues over expectations over a slew of APIs. As you note above
>> > your own defined 'semantics' so far for memremap are just that there are
>> > no i/o side effects, when the mapped memory is just a pointer to memory,
>> > as such I do not think its fair to set the excpectations that we'll
>> > switch all other ioremap_*() callers to the same memremap() API. Now,
>> > it may be a good idea to use something similar, ie, to pass flags,
>> > but setting the expectations outright to move to memremap() without having
>> > any agreement having been made over semantics seems uncalled for at this
>> > point in time, specially when you are noting that the expectations for
>> > both sets of calls are different.
>> >
>> > So perhaps:
>> >
>> > "
>> > Outside of ioremap() and ioremap_nocache(), all other ioremap_<type>()
>> > variant calls are seeking memory-like semantics (e.g.  speculative
>> > reads, and prefetching permitted) and all calls expecations currently
>> > differ depending on architecture. Once and if we get agreement on such
>> > semantics we may be able to move such ioremap_*() variant calls to
>> > a similar API where the semantics required are clearly spelled out
>> > and well defined and passed as arguments.
>>
>> I still think ioremap_wc(), and now ioremap_uc(), are special and are
>> not obvious candidates for conversion to memremap().
>
> OK great, then we're in strong agreement, so removing the "Outside of
> ioremap() and... over time" might be best then? Otherwise what I posted
> seems to reflect the state of affairs better?
>

Ah yes, I need to clean that up.  Thanks!
Christoph Hellwig July 29, 2015, 6:50 a.m. UTC | #10
On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> Oh, because all we have at this point is ioremap_cache() which
> silently falls back.  It's not until the introduction of
> arch_memremp() where we update the arch code to break that behavior.

Ok, makes sense.  Might be worth to document in the changelog.

> That said, I think it may be beneficial to allow a fallback if the
> user cares.  So maybe memremap() can call plain ioremap() if
> MEMREMAP_STRICT is not set and none of the other mapping types are
> satisfied.

Is there a real use case for it?  Fallback APIs always seem confusing
and it might make more sense to do this in the caller(s) that actually
need it.
Luis Chamberlain July 29, 2015, 6:27 p.m. UTC | #11
On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> > Oh, because all we have at this point is ioremap_cache() which
> > silently falls back.  It's not until the introduction of
> > arch_memremp() where we update the arch code to break that behavior.
> 
> Ok, makes sense.  Might be worth to document in the changelog.
> 
> > That said, I think it may be beneficial to allow a fallback if the
> > user cares.  So maybe memremap() can call plain ioremap() if
> > MEMREMAP_STRICT is not set and none of the other mapping types are
> > satisfied.
> 
> Is there a real use case for it?  Fallback APIs always seem confusing
> and it might make more sense to do this in the caller(s) that actually
> need it.

It seems semantics-wise we are trying to separate these two really, so I agree
with this. Having a fallback would onloy make things more complicated for any
sanitizer / checker / etc, and I don't think the practical gains of having a
fallback outweight the gains of having a clear semantic separation on intended
memory type and interactions with it.

  Luis
Dan Williams July 29, 2015, 6:33 p.m. UTC | #12
On Wed, Jul 29, 2015 at 11:27 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
>> On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
>> > Oh, because all we have at this point is ioremap_cache() which
>> > silently falls back.  It's not until the introduction of
>> > arch_memremp() where we update the arch code to break that behavior.
>>
>> Ok, makes sense.  Might be worth to document in the changelog.
>>
>> > That said, I think it may be beneficial to allow a fallback if the
>> > user cares.  So maybe memremap() can call plain ioremap() if
>> > MEMREMAP_STRICT is not set and none of the other mapping types are
>> > satisfied.
>>
>> Is there a real use case for it?  Fallback APIs always seem confusing
>> and it might make more sense to do this in the caller(s) that actually
>> need it.
>
> It seems semantics-wise we are trying to separate these two really, so I agree
> with this. Having a fallback would onloy make things more complicated for any
> sanitizer / checker / etc, and I don't think the practical gains of having a
> fallback outweight the gains of having a clear semantic separation on intended
> memory type and interactions with it.
>

Yup, consider it dropped.  Drivers that want fallback behavior can do
it explicitly.
Toshi Kani July 29, 2015, 9 p.m. UTC | #13
On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
> On Wed, Jul 29, 2015 at 11:27 AM, Luis R. Rodriguez <mcgrof@suse.com> 
> wrote:
> > On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
> > > On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> > > > Oh, because all we have at this point is ioremap_cache() which
> > > > silently falls back.  It's not until the introduction of
> > > > arch_memremp() where we update the arch code to break that 
> > > > behavior.
> > > 
> > > Ok, makes sense.  Might be worth to document in the changelog.
> > > 
> > > > That said, I think it may be beneficial to allow a fallback if the
> > > > user cares.  So maybe memremap() can call plain ioremap() if
> > > > MEMREMAP_STRICT is not set and none of the other mapping types are
> > > > satisfied.
> > > 
> > > Is there a real use case for it?  Fallback APIs always seem confusing
> > > and it might make more sense to do this in the caller(s) that 
> > > actually
> > > need it.
> > 
> > It seems semantics-wise we are trying to separate these two really, so 
> > I agree
> > with this. Having a fallback would onloy make things more complicated 
> > for any
> > sanitizer / checker / etc, and I don't think the practical gains of 
> > having a
> > fallback outweight the gains of having a clear semantic separation on 
> > intended
> > memory type and interactions with it.
> > 
> 
> Yup, consider it dropped.  Drivers that want fallback behavior can do
> it explicitly.

I agree in general.  However, for the drivers to be able to fall back
properly, they will need to know the cache type they can fall back with. 
 ioremap() falls back to the cache type of an existing mapping to avoid
aliasing.

Thanks,
-Toshi
Toshi Kani July 29, 2015, 9:11 p.m. UTC | #14
On Wed, 2015-07-29 at 15:00 -0600, Toshi Kani wrote:
> On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
> > On Wed, Jul 29, 2015 at 11:27 AM, Luis R. Rodriguez <mcgrof@suse.com> 
> > wrote:
> > > On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
> > > > On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> > > > > Oh, because all we have at this point is ioremap_cache() which
> > > > > silently falls back.  It's not until the introduction of
> > > > > arch_memremp() where we update the arch code to break that 
> > > > > behavior.
> > > > 
> > > > Ok, makes sense.  Might be worth to document in the changelog.
> > > > 
> > > > > That said, I think it may be beneficial to allow a fallback if 
> > > > > the
> > > > > user cares.  So maybe memremap() can call plain ioremap() if
> > > > > MEMREMAP_STRICT is not set and none of the other mapping types 
> > > > > are
> > > > > satisfied.
> > > > 
> > > > Is there a real use case for it?  Fallback APIs always seem 
> > > > confusing
> > > > and it might make more sense to do this in the caller(s) that 
> > > > actually
> > > > need it.
> > > 
> > > It seems semantics-wise we are trying to separate these two really, 
> > > so 
> > > I agree
> > > with this. Having a fallback would onloy make things more complicated 
> > > 
> > > for any
> > > sanitizer / checker / etc, and I don't think the practical gains of 
> > > having a
> > > fallback outweight the gains of having a clear semantic separation on 
> > > 
> > > intended
> > > memory type and interactions with it.
> > > 
> > 
> > Yup, consider it dropped.  Drivers that want fallback behavior can do
> > it explicitly.
> 
> I agree in general.  However, for the drivers to be able to fall back
> properly, they will need to know the cache type they can fall back with. 
>  ioremap() falls back to the cache type of an existing mapping to avoid
> aliasing.

Never mind...  In this context, we are talking about fallback in case of "
API not implemented" on arch.  I was talking about fallback in case of
aliasing.

Thanks,
-Toshi
Luis Chamberlain July 29, 2015, 9:43 p.m. UTC | #15
On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
> On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
> > On Wed, Jul 29, 2015 at 11:27 AM, Luis R. Rodriguez <mcgrof@suse.com> 
> > wrote:
> > > On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
> > > > On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> > > > > Oh, because all we have at this point is ioremap_cache() which
> > > > > silently falls back.  It's not until the introduction of
> > > > > arch_memremp() where we update the arch code to break that 
> > > > > behavior.
> > > > 
> > > > Ok, makes sense.  Might be worth to document in the changelog.
> > > > 
> > > > > That said, I think it may be beneficial to allow a fallback if the
> > > > > user cares.  So maybe memremap() can call plain ioremap() if
> > > > > MEMREMAP_STRICT is not set and none of the other mapping types are
> > > > > satisfied.
> > > > 
> > > > Is there a real use case for it?  Fallback APIs always seem confusing
> > > > and it might make more sense to do this in the caller(s) that 
> > > > actually
> > > > need it.
> > > 
> > > It seems semantics-wise we are trying to separate these two really, so 
> > > I agree
> > > with this. Having a fallback would onloy make things more complicated 
> > > for any
> > > sanitizer / checker / etc, and I don't think the practical gains of 
> > > having a
> > > fallback outweight the gains of having a clear semantic separation on 
> > > intended
> > > memory type and interactions with it.
> > > 
> > 
> > Yup, consider it dropped.  Drivers that want fallback behavior can do
> > it explicitly.
> 
> I agree in general.  However, for the drivers to be able to fall back
> properly, they will need to know the cache type they can fall back with. 

That would depend on the purpose of the region and the driver developer should
in theory know best. One issue with this of course is that, as we've
discovered, the semantics of on the ioremap*() variant front regarding cache
types is not clearly well defined, or at least it may be only well defined
implicitly on x86 only, so the driver developer can only *hope* for the best
across architectures. Our ambiguity on our semantics on ioremap*() variants
therefore means driver developers can resonably be puzzled by what fallbacks to
use. That also means architectures maintainers should not whip driver developers
for any misuse. Such considerations are why although we're now revisiting
semantics for ioremap*() variants I was in hopes we could be at least somewhat
pedantic about memremap() semantics.

For instance since memremap() only has 2 types right now can a respective
fallback be documented as an alternative to help with this ? Or can we not
generalize this ?  One for MEMREMAP_WB and one for MEMREMAP_WT ?

> ioremap() falls back to the cache type of an existing mapping to avoid
> aliasing.

Does that assume an existing ioremap*() call was used on a bigger range?
Do you know if that happens to only be the case for x86 (I'd think so)
or if its the same for other architectures ?

While at it, Dan, will / should memremap() support overlapping calls ?
What is the expectation on behaviour ?

PS: I did see you reply about this being about lacking an arch implementation
for a memremap() type, not a cache type, but as the driver uses one memremap()
type the goal for a region is just as important as the resulting type.

  Luis
Dan Williams July 29, 2015, 9:47 p.m. UTC | #16
On Wed, Jul 29, 2015 at 2:43 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> While at it, Dan, will / should memremap() support overlapping calls ?
> What is the expectation on behaviour ?

It will be the same as ioremap().  Each mapping gets its own virtual
address and overlapping calls are permitted.  The only protection is
that RAM is never mapped with an alias.
Luis Chamberlain July 29, 2015, 9:52 p.m. UTC | #17
On Wed, Jul 29, 2015 at 02:47:40PM -0700, Dan Williams wrote:
> On Wed, Jul 29, 2015 at 2:43 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > While at it, Dan, will / should memremap() support overlapping calls ?
> > What is the expectation on behaviour ?
> 
> It will be the same as ioremap().  Each mapping gets its own virtual
> address and overlapping calls are permitted.  The only protection is
> that RAM is never mapped with an alias.

If you could document this as part of your next respin that'd be
highly appreciated.

  Luis
Toshi Kani July 30, 2015, midnight UTC | #18
On Wed, 2015-07-29 at 23:43 +0200, Luis R. Rodriguez wrote:
> On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
> > On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
> > > On Wed, Jul 29, 2015 at 11:27 AM, Luis R. Rodriguez <mcgrof@suse.com> 
> > > wrote:
> > > > On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
> > > > > On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> > > > > > Oh, because all we have at this point is ioremap_cache() which
> > > > > > silently falls back.  It's not until the introduction of
> > > > > > arch_memremp() where we update the arch code to break that 
> > > > > > behavior.
> > > > > 
> > > > > Ok, makes sense.  Might be worth to document in the changelog.
> > > > > 
> > > > > > That said, I think it may be beneficial to allow a fallback if 
> > > > > > the user cares.  So maybe memremap() can call plain ioremap() if
> > > > > > MEMREMAP_STRICT is not set and none of the other mapping types 
> > > > > > are satisfied.
> > > > > 
> > > > > Is there a real use case for it?  Fallback APIs always seem 
> > > > > confusing and it might make more sense to do this in the caller(s) 
> > > > > that actually need it.
> > > > 
> > > > It seems semantics-wise we are trying to separate these two really, 
> > > > so I agree with this. Having a fallback would onloy make things more
> > > > complicated for any sanitizer / checker / etc, and I don't think the 
> > > > practical gains of having a fallback outweight the gains of having a 
> > > > clear semantic separation on intended memory type and interactions 
> > > > with it.
> > > > 
> > > 
> > > Yup, consider it dropped.  Drivers that want fallback behavior can do
> > > it explicitly.
> > 
> > I agree in general.  However, for the drivers to be able to fall back
> > properly, they will need to know the cache type they can fall back with. 
> > 
> 
> That would depend on the purpose of the region and the driver developer 
> should in theory know best. One issue with this of course is that, as 
> we've discovered, the semantics of on the ioremap*() variant front 
> regarding cache types is not clearly well defined, or at least it may be 
> only well defined implicitly on x86 only, so the driver developer can only 
> *hope* for the best across architectures. Our ambiguity on our semantics 
> on ioremap*() variants therefore means driver developers can resonably be 
> puzzled by what fallbacks to use. That also means architectures 
> maintainers should not whip driver developers for any misuse. Such 
> considerations are why although we're now revisiting semantics for 
> ioremap*() variants I was in hopes we could be at least somewhat
> pedantic about memremap() semantics.

I agree.  However, there are a few exceptions like /dev/mem, which can map a
target range without knowledge.

> For instance since memremap() only has 2 types right now can a respective
> fallback be documented as an alternative to help with this ? Or can we not
> generalize this ?  One for MEMREMAP_WB and one for MEMREMAP_WT ?

Yes, if a target range can be only mapped by memremap().  However, there is
no restriction that a range can be mapped with a single interface alone. 
 For example, a range can be mapped with remap_pfn_range() to user space
with any cache type.  So, in theory, memremap() can overlap with any other
types.

> > ioremap() falls back to the cache type of an existing mapping to avoid
> > aliasing.
> 
> Does that assume an existing ioremap*() call was used on a bigger range?
> Do you know if that happens to only be the case for x86 (I'd think so)
> or if its the same for other architectures ?

In the /dev/mem example, it is remap_pfn_range().  I think other archs have
the same issue, but I do not know if they fall back in case of overlapping
call. 

> While at it, Dan, will / should memremap() support overlapping calls ?
> What is the expectation on behaviour ?
> 
> PS: I did see you reply about this being about lacking an arch 
> implementation for a memremap() type, not a cache type, but as the driver 
> uses one memremap() type the goal for a region is just as important as the 
> resulting type.

Agreed.  Drivers cannot tell if a fallback is due to lacking implementation
of overlapping, unless they check with #ifdef ioremap_xxx.

Thanks,
-Toshi
Luis Chamberlain Aug. 11, 2015, 9:30 p.m. UTC | #19
On Wed, Jul 29, 2015 at 06:00:04PM -0600, Toshi Kani wrote:
> On Wed, 2015-07-29 at 23:43 +0200, Luis R. Rodriguez wrote:
> > On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
> > > On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
> > > > On Wed, Jul 29, 2015 at 11:27 AM, Luis R. Rodriguez <mcgrof@suse.com> 
> > > > wrote:
> > > > > On Wed, Jul 29, 2015 at 08:50:04AM +0200, Christoph Hellwig wrote:
> > > > > > On Mon, Jul 27, 2015 at 04:26:03PM -0700, Dan Williams wrote:
> > > > > > > Oh, because all we have at this point is ioremap_cache() which
> > > > > > > silently falls back.  It's not until the introduction of
> > > > > > > arch_memremp() where we update the arch code to break that 
> > > > > > > behavior.
> > > > > > 
> > > > > > Ok, makes sense.  Might be worth to document in the changelog.
> > > > > > 
> > > > > > > That said, I think it may be beneficial to allow a fallback if 
> > > > > > > the user cares.  So maybe memremap() can call plain ioremap() if
> > > > > > > MEMREMAP_STRICT is not set and none of the other mapping types 
> > > > > > > are satisfied.
> > > > > > 
> > > > > > Is there a real use case for it?  Fallback APIs always seem 
> > > > > > confusing and it might make more sense to do this in the caller(s) 
> > > > > > that actually need it.
> > > > > 
> > > > > It seems semantics-wise we are trying to separate these two really, 
> > > > > so I agree with this. Having a fallback would onloy make things more
> > > > > complicated for any sanitizer / checker / etc, and I don't think the 
> > > > > practical gains of having a fallback outweight the gains of having a 
> > > > > clear semantic separation on intended memory type and interactions 
> > > > > with it.
> > > > > 
> > > > 
> > > > Yup, consider it dropped.  Drivers that want fallback behavior can do
> > > > it explicitly.
> > > 
> > > I agree in general.  However, for the drivers to be able to fall back
> > > properly, they will need to know the cache type they can fall back with. 
> > > 
> > 
> > That would depend on the purpose of the region and the driver developer 
> > should in theory know best. One issue with this of course is that, as 
> > we've discovered, the semantics of on the ioremap*() variant front 
> > regarding cache types is not clearly well defined, or at least it may be 
> > only well defined implicitly on x86 only, so the driver developer can only 
> > *hope* for the best across architectures. Our ambiguity on our semantics 
> > on ioremap*() variants therefore means driver developers can resonably be 
> > puzzled by what fallbacks to use. That also means architectures 
> > maintainers should not whip driver developers for any misuse. Such 
> > considerations are why although we're now revisiting semantics for 
> > ioremap*() variants I was in hopes we could be at least somewhat
> > pedantic about memremap() semantics.
> 
> I agree.  However, there are a few exceptions like /dev/mem, which can map a
> target range without knowledge.

Still, the expectation to require support for overlapping ioremap() calls would
seem to be more of an exception than the norm, so I'd argue that requiring
callers who know they do need overlapping support to be explicit about it may
help us simplify expecations on semantics in other areas of the kernel.

> > For instance since memremap() only has 2 types right now can a respective
> > fallback be documented as an alternative to help with this ? Or can we not
> > generalize this ?  One for MEMREMAP_WB and one for MEMREMAP_WT ?
> 
> Yes, if a target range can be only mapped by memremap().  However, there is
> no restriction that a range can be mapped with a single interface alone. 
>  For example, a range can be mapped with remap_pfn_range() to user space
> with any cache type.  So, in theory, memremap() can overlap with any other
> types.

Shouldn't that be an issue or area of concern ? It seems the flakiness on
ioremap() and its wide array flexibility would spill over the any semantics
which folks would be trying to set out with memremap(). That should not stop
the evolution of memremap() though, just pointing out that perhaps we should be
a bit more restrictive over how things can criss-cross and who areas can do it.

> > > ioremap() falls back to the cache type of an existing mapping to avoid
> > > aliasing.
> > 
> > Does that assume an existing ioremap*() call was used on a bigger range?
> > Do you know if that happens to only be the case for x86 (I'd think so)
> > or if its the same for other architectures ?
> 
> In the /dev/mem example, it is remap_pfn_range().  I think other archs have
> the same issue, but I do not know if they fall back in case of overlapping
> call. 

What should happen if remap_pfn_range() was used with pgprot_writecombine()
and later memremap() is used for instance with a smaller overlappin section,
or perhaps bigger?

  Luis
Toshi Kani Aug. 11, 2015, 10:40 p.m. UTC | #20
On Tue, 2015-08-11 at 23:30 +0200, Luis R. Rodriguez wrote:
> On Wed, Jul 29, 2015 at 06:00:04PM -0600, Toshi Kani wrote:
> > On Wed, 2015-07-29 at 23:43 +0200, Luis R. Rodriguez wrote:
> > > On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
> > > > On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
  :
> > > That would depend on the purpose of the region and the driver 
> > > developer should in theory know best. One issue with this of course is 
> > > that, as we've discovered, the semantics of on the ioremap*() variant 
> > > front regarding cache types is not clearly well defined, or at least 
> > > it may be only well defined implicitly on x86 only, so the driver 
> > > developer can only *hope* for the best across architectures. Our 
> > > ambiguity on our semantics on ioremap*() variants therefore means 
> > > driver developers can resonably be puzzled by what fallbacks to use. 
> > > That also means architectures maintainers should not whip driver 
> > > developers for any misuse. Such considerations are why although we're 
> > > now revisiting semantics for ioremap*() variants I was in hopes we 
> > > could be at least somewhat pedantic about memremap() semantics.
> > 
> > I agree.  However, there are a few exceptions like /dev/mem, which can 
> > map a target range without knowledge.
> 
> Still, the expectation to require support for overlapping ioremap() calls 
> would seem to be more of an exception than the norm, so I'd argue that 
> requiring callers who know they do need overlapping support to be explicit 
> about it may help us simplify expecations on semantics in other areas of 
> the kernel.

Again, I agree.  I am simply saying that the fallback in an overlapping case
may need to remain supported for such exceptional cases, possibly with a
separate interface.

> > > For instance since memremap() only has 2 types right now can a 
> > > respective fallback be documented as an alternative to help with this 
> > > ? Or can we not generalize this ?  One for MEMREMAP_WB and one for 
> > > MEMREMAP_WT ?
> > 
> > Yes, if a target range can be only mapped by memremap().  However, there 
> > is no restriction that a range can be mapped with a single interface 
> > alone.  For example, a range can be mapped with remap_pfn_range() to 
> > user space with any cache type.  So, in theory, memremap() can overlap 
> > with any other types.
> 
> Shouldn't that be an issue or area of concern ? It seems the flakiness on
> ioremap() and its wide array flexibility would spill over the any 
> semantics which folks would be trying to set out with memremap(). That 
> should not stop the evolution of memremap() though, just pointing out that 
> perhaps we should be a bit more restrictive over how things can criss
> -cross and who areas can do it.

reserve_pfn_range() allows the caller to specify if the fallback needs to be
enabled or disabled with 'strict_prot'.  track_pfn_remap() called from
remap_pfn_range() enables it, and track_pfn_copy() disables it.  I think we
can do similar for the memremap and ioremap families as well.  The fallback
could be set disabled in the regular interfaces, and enabled in some
interface as necessary.  This also allows gradual transition, ex. disable in
memremap while ioremap remains enabled for now. 

> > > > ioremap() falls back to the cache type of an existing mapping to 
> > > > avoid aliasing.
> > > 
> > > Does that assume an existing ioremap*() call was used on a bigger 
> > > range?  Do you know if that happens to only be the case for x86 (I'd 
> > > think so) or if its the same for other architectures ?
> > 
> > In the /dev/mem example, it is remap_pfn_range().  I think other archs 
> > have the same issue, but I do not know if they fall back in case of
> > overlapping call.
> 
> What should happen if remap_pfn_range() was used with 
> pgprot_writecombine() and later memremap() is used for instance with a 
> smaller overlappin section, or perhaps bigger?

With the fallback disabled, memremap() should fail in this case. 

Thanks,
-Toshi
Luis Chamberlain Aug. 11, 2015, 10:52 p.m. UTC | #21
On Tue, Aug 11, 2015 at 3:40 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Tue, 2015-08-11 at 23:30 +0200, Luis R. Rodriguez wrote:
>> On Wed, Jul 29, 2015 at 06:00:04PM -0600, Toshi Kani wrote:
>> > On Wed, 2015-07-29 at 23:43 +0200, Luis R. Rodriguez wrote:
>> > > On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
>> > > > On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
>   :
>> > > That would depend on the purpose of the region and the driver
>> > > developer should in theory know best. One issue with this of course is
>> > > that, as we've discovered, the semantics of on the ioremap*() variant
>> > > front regarding cache types is not clearly well defined, or at least
>> > > it may be only well defined implicitly on x86 only, so the driver
>> > > developer can only *hope* for the best across architectures. Our
>> > > ambiguity on our semantics on ioremap*() variants therefore means
>> > > driver developers can resonably be puzzled by what fallbacks to use.
>> > > That also means architectures maintainers should not whip driver
>> > > developers for any misuse. Such considerations are why although we're
>> > > now revisiting semantics for ioremap*() variants I was in hopes we
>> > > could be at least somewhat pedantic about memremap() semantics.
>> >
>> > I agree.  However, there are a few exceptions like /dev/mem, which can
>> > map a target range without knowledge.
>>
>> Still, the expectation to require support for overlapping ioremap() calls
>> would seem to be more of an exception than the norm, so I'd argue that
>> requiring callers who know they do need overlapping support to be explicit
>> about it may help us simplify expecations on semantics in other areas of
>> the kernel.
>
> Again, I agree.  I am simply saying that the fallback in an overlapping case
> may need to remain supported for such exceptional cases, possibly with a
> separate interface.

Great.

>> > > For instance since memremap() only has 2 types right now can a
>> > > respective fallback be documented as an alternative to help with this
>> > > ? Or can we not generalize this ?  One for MEMREMAP_WB and one for
>> > > MEMREMAP_WT ?
>> >
>> > Yes, if a target range can be only mapped by memremap().  However, there
>> > is no restriction that a range can be mapped with a single interface
>> > alone.  For example, a range can be mapped with remap_pfn_range() to
>> > user space with any cache type.  So, in theory, memremap() can overlap
>> > with any other types.
>>
>> Shouldn't that be an issue or area of concern ? It seems the flakiness on
>> ioremap() and its wide array flexibility would spill over the any
>> semantics which folks would be trying to set out with memremap(). That
>> should not stop the evolution of memremap() though, just pointing out that
>> perhaps we should be a bit more restrictive over how things can criss
>> -cross and who areas can do it.
>
> reserve_pfn_range() allows the caller to specify if the fallback needs to be
> enabled or disabled with 'strict_prot'.  track_pfn_remap() called from
> remap_pfn_range() enables it, and track_pfn_copy() disables it.  I think we
> can do similar for the memremap and ioremap families as well.  The fallback
> could be set disabled in the regular interfaces, and enabled in some
> interface as necessary.  This also allows gradual transition, ex. disable in
> memremap while ioremap remains enabled for now.

Sounds sexy to me.

>> > > > ioremap() falls back to the cache type of an existing mapping to
>> > > > avoid aliasing.
>> > >
>> > > Does that assume an existing ioremap*() call was used on a bigger
>> > > range?  Do you know if that happens to only be the case for x86 (I'd
>> > > think so) or if its the same for other architectures ?
>> >
>> > In the /dev/mem example, it is remap_pfn_range().  I think other archs
>> > have the same issue, but I do not know if they fall back in case of
>> > overlapping call.
>>
>> What should happen if remap_pfn_range() was used with
>> pgprot_writecombine() and later memremap() is used for instance with a
>> smaller overlappin section, or perhaps bigger?
>
> With the fallback disabled, memremap() should fail in this case.

Excellent.

 Luis
Dan Williams Aug. 11, 2015, 11:13 p.m. UTC | #22
On Tue, Aug 11, 2015 at 3:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Tue, Aug 11, 2015 at 3:40 PM, Toshi Kani <toshi.kani@hp.com> wrote:
>> On Tue, 2015-08-11 at 23:30 +0200, Luis R. Rodriguez wrote:
>>> On Wed, Jul 29, 2015 at 06:00:04PM -0600, Toshi Kani wrote:
>>> > On Wed, 2015-07-29 at 23:43 +0200, Luis R. Rodriguez wrote:
>>> > > On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
>>> > > > On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
>>   :
>>> > > That would depend on the purpose of the region and the driver
>>> > > developer should in theory know best. One issue with this of course is
>>> > > that, as we've discovered, the semantics of on the ioremap*() variant
>>> > > front regarding cache types is not clearly well defined, or at least
>>> > > it may be only well defined implicitly on x86 only, so the driver
>>> > > developer can only *hope* for the best across architectures. Our
>>> > > ambiguity on our semantics on ioremap*() variants therefore means
>>> > > driver developers can resonably be puzzled by what fallbacks to use.
>>> > > That also means architectures maintainers should not whip driver
>>> > > developers for any misuse. Such considerations are why although we're
>>> > > now revisiting semantics for ioremap*() variants I was in hopes we
>>> > > could be at least somewhat pedantic about memremap() semantics.
>>> >
>>> > I agree.  However, there are a few exceptions like /dev/mem, which can
>>> > map a target range without knowledge.
>>>
>>> Still, the expectation to require support for overlapping ioremap() calls
>>> would seem to be more of an exception than the norm, so I'd argue that
>>> requiring callers who know they do need overlapping support to be explicit
>>> about it may help us simplify expecations on semantics in other areas of
>>> the kernel.
>>
>> Again, I agree.  I am simply saying that the fallback in an overlapping case
>> may need to remain supported for such exceptional cases, possibly with a
>> separate interface.
>
> Great.
>
>>> > > For instance since memremap() only has 2 types right now can a
>>> > > respective fallback be documented as an alternative to help with this
>>> > > ? Or can we not generalize this ?  One for MEMREMAP_WB and one for
>>> > > MEMREMAP_WT ?
>>> >
>>> > Yes, if a target range can be only mapped by memremap().  However, there
>>> > is no restriction that a range can be mapped with a single interface
>>> > alone.  For example, a range can be mapped with remap_pfn_range() to
>>> > user space with any cache type.  So, in theory, memremap() can overlap
>>> > with any other types.
>>>
>>> Shouldn't that be an issue or area of concern ? It seems the flakiness on
>>> ioremap() and its wide array flexibility would spill over the any
>>> semantics which folks would be trying to set out with memremap(). That
>>> should not stop the evolution of memremap() though, just pointing out that
>>> perhaps we should be a bit more restrictive over how things can criss
>>> -cross and who areas can do it.
>>
>> reserve_pfn_range() allows the caller to specify if the fallback needs to be
>> enabled or disabled with 'strict_prot'.  track_pfn_remap() called from
>> remap_pfn_range() enables it, and track_pfn_copy() disables it.  I think we
>> can do similar for the memremap and ioremap families as well.  The fallback
>> could be set disabled in the regular interfaces, and enabled in some
>> interface as necessary.  This also allows gradual transition, ex. disable in
>> memremap while ioremap remains enabled for now.
>
> Sounds sexy to me.

Cool, sounds like something we can tackle in 4.4 along with the
ioremap_cache removal cleanups.
Luis Chamberlain April 21, 2016, 12:47 p.m. UTC | #23
On Wed, Aug 12, 2015 at 12:13 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Aug 11, 2015 at 3:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> On Tue, Aug 11, 2015 at 3:40 PM, Toshi Kani <toshi.kani@hp.com> wrote:
>>> On Tue, 2015-08-11 at 23:30 +0200, Luis R. Rodriguez wrote:
>>>> On Wed, Jul 29, 2015 at 06:00:04PM -0600, Toshi Kani wrote:
>>>> > On Wed, 2015-07-29 at 23:43 +0200, Luis R. Rodriguez wrote:
>>>> > > On Wed, Jul 29, 2015 at 03:00:38PM -0600, Toshi Kani wrote:
>>>> > > > On Wed, 2015-07-29 at 11:33 -0700, Dan Williams wrote:
>>>   :
>>>> > > That would depend on the purpose of the region and the driver
>>>> > > developer should in theory know best. One issue with this of course is
>>>> > > that, as we've discovered, the semantics of on the ioremap*() variant
>>>> > > front regarding cache types is not clearly well defined, or at least
>>>> > > it may be only well defined implicitly on x86 only, so the driver
>>>> > > developer can only *hope* for the best across architectures. Our
>>>> > > ambiguity on our semantics on ioremap*() variants therefore means
>>>> > > driver developers can resonably be puzzled by what fallbacks to use.
>>>> > > That also means architectures maintainers should not whip driver
>>>> > > developers for any misuse. Such considerations are why although we're
>>>> > > now revisiting semantics for ioremap*() variants I was in hopes we
>>>> > > could be at least somewhat pedantic about memremap() semantics.
>>>> >
>>>> > I agree.  However, there are a few exceptions like /dev/mem, which can
>>>> > map a target range without knowledge.
>>>>
>>>> Still, the expectation to require support for overlapping ioremap() calls
>>>> would seem to be more of an exception than the norm, so I'd argue that
>>>> requiring callers who know they do need overlapping support to be explicit
>>>> about it may help us simplify expecations on semantics in other areas of
>>>> the kernel.
>>>
>>> Again, I agree.  I am simply saying that the fallback in an overlapping case
>>> may need to remain supported for such exceptional cases, possibly with a
>>> separate interface.
>>
>> Great.
>>
>>>> > > For instance since memremap() only has 2 types right now can a
>>>> > > respective fallback be documented as an alternative to help with this
>>>> > > ? Or can we not generalize this ?  One for MEMREMAP_WB and one for
>>>> > > MEMREMAP_WT ?
>>>> >
>>>> > Yes, if a target range can be only mapped by memremap().  However, there
>>>> > is no restriction that a range can be mapped with a single interface
>>>> > alone.  For example, a range can be mapped with remap_pfn_range() to
>>>> > user space with any cache type.  So, in theory, memremap() can overlap
>>>> > with any other types.
>>>>
>>>> Shouldn't that be an issue or area of concern ? It seems the flakiness on
>>>> ioremap() and its wide array flexibility would spill over the any
>>>> semantics which folks would be trying to set out with memremap(). That
>>>> should not stop the evolution of memremap() though, just pointing out that
>>>> perhaps we should be a bit more restrictive over how things can criss
>>>> -cross and who areas can do it.
>>>
>>> reserve_pfn_range() allows the caller to specify if the fallback needs to be
>>> enabled or disabled with 'strict_prot'.  track_pfn_remap() called from
>>> remap_pfn_range() enables it, and track_pfn_copy() disables it.  I think we
>>> can do similar for the memremap and ioremap families as well.  The fallback
>>> could be set disabled in the regular interfaces, and enabled in some
>>> interface as necessary.  This also allows gradual transition, ex. disable in
>>> memremap while ioremap remains enabled for now.
>>
>> Sounds sexy to me.
>
> Cool, sounds like something we can tackle in 4.4 along with the
> ioremap_cache removal cleanups.

Just a reminder that we should expect some of these changes soon :D

 Luis
diff mbox

Patch

diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h
index 80a7e34be009..9041bbe2b7b4 100644
--- a/arch/ia64/include/asm/io.h
+++ b/arch/ia64/include/asm/io.h
@@ -435,6 +435,7 @@  static inline void __iomem * ioremap_cache (unsigned long phys_addr, unsigned lo
 {
 	return ioremap(phys_addr, size);
 }
+#define ioremap_cache ioremap_cache
 
 
 /*
diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h
index 728c4c571f40..6194e20fccca 100644
--- a/arch/sh/include/asm/io.h
+++ b/arch/sh/include/asm/io.h
@@ -342,6 +342,7 @@  ioremap_cache(phys_addr_t offset, unsigned long size)
 {
 	return __ioremap_mode(offset, size, PAGE_KERNEL);
 }
+#define ioremap_cache ioremap_cache
 
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 static inline void __iomem *
diff --git a/arch/xtensa/include/asm/io.h b/arch/xtensa/include/asm/io.h
index c39bb6e61911..867840f5400f 100644
--- a/arch/xtensa/include/asm/io.h
+++ b/arch/xtensa/include/asm/io.h
@@ -57,6 +57,7 @@  static inline void __iomem *ioremap_cache(unsigned long offset,
 	else
 		BUG();
 }
+#define ioremap_cache ioremap_cache
 
 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache
diff --git a/include/linux/io.h b/include/linux/io.h
index fb5a99800e77..dfed9d608bb3 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -121,4 +121,13 @@  static inline int arch_phys_wc_index(int handle)
 #endif
 #endif
 
+enum {
+	MEMREMAP_WB = 1 << 0,
+	MEMREMAP_WT = 1 << 1,
+	MEMREMAP_CACHE = MEMREMAP_WB,
+};
+
+extern void *memremap(resource_size_t offset, size_t size, unsigned long flags);
+extern void memunmap(void *addr);
+
 #endif /* _LINUX_IO_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 43c4c920f30a..92866d36e376 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -99,6 +99,8 @@  obj-$(CONFIG_JUMP_LABEL) += jump_label.o
 obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
 obj-$(CONFIG_TORTURE_TEST) += torture.o
 
+obj-$(CONFIG_HAS_IOMEM) += memremap.o
+
 $(obj)/configs.o: $(obj)/config_data.h
 
 # config_data.h contains the same information as ikconfig.h but gzipped.
diff --git a/kernel/memremap.c b/kernel/memremap.c
new file mode 100644
index 000000000000..ba206fd11785
--- /dev/null
+++ b/kernel/memremap.c
@@ -0,0 +1,82 @@ 
+/*
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/types.h>
+#include <linux/io.h>
+#include <linux/mm.h>
+
+#ifndef ioremap_cache
+/* temporary while we convert existing ioremap_cache users to memremap */
+__weak void __iomem *ioremap_cache(resource_size_t offset, unsigned long size)
+{
+	return ioremap(offset, size);
+}
+#endif
+
+/*
+ * memremap() is "ioremap" for cases where it is known that the resource
+ * being mapped does not have i/o side effects and the __iomem
+ * annotation is not applicable.
+ */
+void *memremap(resource_size_t offset, size_t size, unsigned long flags)
+{
+	int is_ram = region_is_ram(offset, size);
+	void *addr = NULL;
+
+	if (is_ram < 0) {
+		WARN_ONCE(1, "memremap attempted on mixed range %pa size: %zu\n",
+				&offset, size);
+		return NULL;
+	}
+
+	/* Try all mapping types requested until one returns non-NULL */
+	if (flags & MEMREMAP_CACHE) {
+		flags &= ~MEMREMAP_CACHE;
+		/*
+		 * MEMREMAP_CACHE is special in that it can be satisifed
+		 * from the direct map.  Some archs depend on the
+		 * capability of memremap() to autodetect cases where
+		 * the requested range is potentially in "System RAM"
+		 */
+		if (is_ram)
+			addr = __va(offset);
+		else
+			addr = ioremap_cache(offset, size);
+	}
+
+	/*
+	 * If we don't have a mapping yet and more request flags are
+	 * pending then we will be attempting to establish a new virtual
+	 * address mapping.  Enforce that this mapping is not aliasing
+	 * "System RAM"
+	 */
+	if (!addr && is_ram && flags) {
+		WARN_ONCE(1, "memremap attempted on ram %pa size: %zu\n",
+				&offset, size);
+		return NULL;
+	}
+
+	if (!addr && (flags & MEMREMAP_WT)) {
+		flags &= ~MEMREMAP_WT;
+		addr = ioremap_wt(offset, size);
+	}
+
+	return addr;
+}
+EXPORT_SYMBOL(memremap);
+
+void memunmap(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		iounmap((void __iomem *) addr);
+}
+EXPORT_SYMBOL(memunmap);