diff mbox

[v2,2/2] restrict /dev/mem to idle io memory ranges

Message ID 20151124000604.26360.56945.stgit@dwillia2-desk3.jf.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Nov. 24, 2015, 12:06 a.m. UTC
This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
semantics by default.  If userspace really believes it is safe to access
the memory region it can also perform the extra step of disabling an
active driver.  This protects device address ranges with read side
effects and otherwise directs userspace to use the driver.

Persistent memory presents a large "mistake surface" to /dev/mem as now
accidental writes can corrupt a filesystem.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/resource.c |   11 +++++++++--
 lib/Kconfig.debug |   23 ++++++++++++++++++++---
 2 files changed, 29 insertions(+), 5 deletions(-)

Comments

Andrew Morton Nov. 24, 2015, 10:25 p.m. UTC | #1
On Mon, 23 Nov 2015 16:06:04 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
> semantics by default.  If userspace really believes it is safe to access
> the memory region it can also perform the extra step of disabling an
> active driver.  This protects device address ranges with read side
> effects and otherwise directs userspace to use the driver.

I don't think I'm sufficiently understanding what this is all needed
for, sorry.  A better changelog would help: what's wrong with the
current code, how you propose it be changed, how the kernel's
externally-visible behaviour is altered, etc.

Please pay particular attention to the back-compatibility issues which
will be encountered when people enable these options.

Perhaps when all that material is described, I'll understand why the
heck we're doing this with a build-time switch rather than a runtime
one...

> Persistent memory presents a large "mistake surface" to /dev/mem as now
> accidental writes can corrupt a filesystem.

Is that the motivation?  root can come in and accidentally alter
persistent memory contents?  If so, 

- why do we care?  There are all sorts of ways in which root can muck
  up the persistent memory, starting with dd(1).  What's special about
  /dev/mem?

- why is the patch mucking with access to PCI and BIOS space?  Is the
  persistent memory even mappable in those regions?  Or is the concern
  that userspace can access control registers associated with the
  persistent memory?  What is the problem scenario?


IOW, a very good description of the problem-being-solved would help out
a lot here...
Dan Williams Nov. 25, 2015, 12:34 a.m. UTC | #2
On Tue, Nov 24, 2015 at 2:25 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 23 Nov 2015 16:06:04 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
>> semantics by default.  If userspace really believes it is safe to access
>> the memory region it can also perform the extra step of disabling an
>> active driver.  This protects device address ranges with read side
>> effects and otherwise directs userspace to use the driver.
>
> I don't think I'm sufficiently understanding what this is all needed
> for, sorry.  A better changelog would help: what's wrong with the
> current code, how you propose it be changed, how the kernel's
> externally-visible behaviour is altered, etc.
>

I should have duplicated the Kconfig description for IO_STRICT_DEVMEM
in the changelog, but the justification is simply that if the kernel
has a driver busily using a memory range, userspace needs to assert it
knows it is safe to access that range by disabling the driver.  This
makes the kernel safer by default.

> Please pay particular attention to the back-compatibility issues which
> will be encountered when people enable these options.

It certainly diminishes debug capabilities, mmap of sysfs pci
resources will also fail while a driver is active.  The only general
purpose application I know that uses /dev/mem is dosemu.  It should
continue to work fine as x86 "devmem_is_allowed()" permits access from
0-to-1MB by default.  The other stated user of /dev/mem legacy X
drivers.  With the prevalence of kernel modesetting in graphics
drivers I don't know how much of a concern this is anymore.

> Perhaps when all that material is described, I'll understand why the
> heck we're doing this with a build-time switch rather than a runtime
> one...

We have the "iomem=" kernel parameter.  I think it makes sense to have
that setting be configurable at runtime to augment this build time
decision.

>> Persistent memory presents a large "mistake surface" to /dev/mem as now
>> accidental writes can corrupt a filesystem.
>
> Is that the motivation?  root can come in and accidentally alter
> persistent memory contents?  If so,
>
> - why do we care?  There are all sorts of ways in which root can muck
>   up the persistent memory, starting with dd(1).  What's special about
>   /dev/mem?

dd through /dev/pmem and the driver will do all the proper flushing
and syncing to make the writes durable on media.  /dev/mem knows none
of those semantics.  /dev/pmem as a block device responds to O_EXCL
and prevents other attempts to open the device.

> - why is the patch mucking with access to PCI and BIOS space?  Is the
>   persistent memory even mappable in those regions?  Or is the concern
>   that userspace can access control registers associated with the
>   persistent memory?  What is the problem scenario?

It seems to me that letting /dev/mem do arbitrary access to any region
of memory is a dangerous capability for a production environment.
Drivers assume that request_mem_region() tells other parts of the
kernel to not touch their memory.  Having the option to extend that
protection to /dev/mem by default seemed a reasonable idea.

Of course, all of this assumes that you think it is worthwhile to have
some protections and safety measures even for root.

> IOW, a very good description of the problem-being-solved would help out
> a lot here...

I'll fold the eventual result of this discussion into the changelog if
I can convince you it's worth moving forward.

I also have the option of just tagging the pmem regions as
IORESOURCE_EXCLUSIVE, but I decided against that because I think our
current definition of STRICT_DEVMEM leaves a big hole if the goal is
"/dev/mem access is safe by default".
Andrew Morton Nov. 25, 2015, 12:47 a.m. UTC | #3
On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> > IOW, a very good description of the problem-being-solved would help out
> > a lot here...
> 
> I'll fold the eventual result of this discussion into the changelog if
> I can convince you it's worth moving forward.

I'm easily convinced ;) Please let's get all the info into the right
place, make sure it answers the thus-far-asked questions (at least) and
we'll take it from there.

And please do have a think about switching as much as possible over to
runtime-configurability.  Because "please echo foo > /proc" is a heck
of a lot nicer than "please reboot with iomem=" which is a heck of a lot
nicer than "please ask vendor for a new kernel".
Kees Cook Nov. 25, 2015, 12:50 a.m. UTC | #4
On Tue, Nov 24, 2015 at 4:47 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> > IOW, a very good description of the problem-being-solved would help out
>> > a lot here...
>>
>> I'll fold the eventual result of this discussion into the changelog if
>> I can convince you it's worth moving forward.
>
> I'm easily convinced ;) Please let's get all the info into the right
> place, make sure it answers the thus-far-asked questions (at least) and
> we'll take it from there.
>
> And please do have a think about switching as much as possible over to
> runtime-configurability.  Because "please echo foo > /proc" is a heck
> of a lot nicer than "please reboot with iomem=" which is a heck of a lot
> nicer than "please ask vendor for a new kernel".

I think run-time config for this should be an as-needed case. Nothing
should be fiddling with this memory from userspace anyway -- a driver
covering it should be unloaded first.

And, with my dosemu maintainer hat on, If you're using dosemu in a
mode where this will cause a problem, you are already running a custom
kernel. :)

And that said, if someone can actually produce a case where we need
this runtime configurable, I'm all for it. I just don't think we need
to design it in right now.

-Kees
Dan Williams Nov. 25, 2015, 1:28 a.m. UTC | #5
On Tue, Nov 24, 2015 at 4:47 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> > IOW, a very good description of the problem-being-solved would help out
>> > a lot here...
>>
>> I'll fold the eventual result of this discussion into the changelog if
>> I can convince you it's worth moving forward.
>
> I'm easily convinced ;) Please let's get all the info into the right
> place, make sure it answers the thus-far-asked questions (at least) and
> we'll take it from there.
>
> And please do have a think about switching as much as possible over to
> runtime-configurability.  Because "please echo foo > /proc" is a heck
> of a lot nicer than "please reboot with iomem=" which is a heck of a lot
> nicer than "please ask vendor for a new kernel".

Actually, we already have runtime configuration.  For example, if you
want to muck with pmem via /dev/mem, just do this first:

    echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind
Dan Williams Nov. 25, 2015, 6:54 p.m. UTC | #6
On Tue, Nov 24, 2015 at 5:28 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Nov 24, 2015 at 4:47 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>>
>>> > IOW, a very good description of the problem-being-solved would help out
>>> > a lot here...
>>>
>>> I'll fold the eventual result of this discussion into the changelog if
>>> I can convince you it's worth moving forward.
>>
>> I'm easily convinced ;) Please let's get all the info into the right
>> place, make sure it answers the thus-far-asked questions (at least) and
>> we'll take it from there.
>>
>> And please do have a think about switching as much as possible over to
>> runtime-configurability.  Because "please echo foo > /proc" is a heck
>> of a lot nicer than "please reboot with iomem=" which is a heck of a lot
>> nicer than "please ask vendor for a new kernel".
>
> Actually, we already have runtime configuration.  For example, if you
> want to muck with pmem via /dev/mem, just do this first:
>
>     echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind

Here's the summary of the thread that I will add to the changelog:

---

In general if a device driver is busily using a memory region it
already informs other parts of the kernel to not touch it via
request_mem_region().  /dev/mem should honor the same safety
restriction by default.  Debugging a device driver from userspace
becomes more
difficult with this enabled.  Any application using /dev/mem or mmap
of sysfs pci resources will now need to perform the extra step of
either:

1/ Disabling the driver, for example:

  echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind

2/ Rebooting with "iomem=relaxed" on the command line

3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n

Traditional users of /dev/mem like dosemu are unaffected because the
first 1MB of memory is not subject to the IO_STRICT_DEVMEM
restriction.
Legacy X configurations use /dev/mem to talk to graphics hardware, but
that functionality has since moved to kernel graphics drivers.
Ingo Molnar Nov. 26, 2015, 11:08 a.m. UTC | #7
* Dan Williams <dan.j.williams@intel.com> wrote:

> > - why is the patch mucking with access to PCI and BIOS space?  Is the
> >   persistent memory even mappable in those regions?  Or is the concern
> >   that userspace can access control registers associated with the
> >   persistent memory?  What is the problem scenario?
> 
> It seems to me that letting /dev/mem do arbitrary access to any region
> of memory is a dangerous capability for a production environment.

So basically the original motivation years ago was to disable /dev/mem altogether: 
it used to be a wide open roothole if anything with write access to it (such as 
old Xorg) is exploited, plus it's a favorite and convenient tool for stealth 
access to system areas of memory in cases where the attacker already has root. 

(this is relevant as even being root might not give easy access to system mmio 
areas if things like being able to load modules is restricted even for room, and 
if the system has readonly storage and a few other things configured.)

But we couldn't disable /dev/mem completely due to Xorg and dosemu legacies - so 
we came up with this restriction feature that limits its scope.

Any additional steps that limit the scope of access under the STRICT_DEVMEM option 
(which is really a misnomer: it should be RESTRICT_DEVMEM instead) are welcome 
from a generic Linux distro POV.

Thanks,

	Ingo
diff mbox

Patch

diff --git a/kernel/resource.c b/kernel/resource.c
index f150dbbe6f62..09c0597840b0 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1498,8 +1498,15 @@  int iomem_is_exclusive(u64 addr)
 			break;
 		if (p->end < addr)
 			continue;
-		if (p->flags & IORESOURCE_BUSY &&
-		     p->flags & IORESOURCE_EXCLUSIVE) {
+		/*
+		 * A resource is exclusive if IORESOURCE_EXCLUSIVE is set
+		 * or CONFIG_IO_STRICT_DEVMEM is enabled and the
+		 * resource is busy.
+		 */
+		if ((p->flags & IORESOURCE_BUSY) == 0)
+			continue;
+		if (IS_ENABLED(CONFIG_IO_STRICT_DEVMEM)
+				|| p->flags & IORESOURCE_EXCLUSIVE) {
 			err = 1;
 			break;
 		}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 289dfcbc14eb..073496dea848 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1869,9 +1869,26 @@  config STRICT_DEVMEM
 	  enabled, even in this case there are restrictions on /dev/mem
 	  use due to the cache aliasing requirements.
 
+	  If this option is switched on, and IO_STRICT_DEVMEM=n, the /dev/mem
+	  file only allows userspace access to PCI space and the BIOS code and
+	  data regions.  This is sufficient for dosemu and X and all common
+	  users of /dev/mem.
+
+	  If in doubt, say Y.
+
+config IO_STRICT_DEVMEM
+	bool "Filter I/O access to /dev/mem"
+	depends on STRICT_DEVMEM
+	default STRICT_DEVMEM
+	---help---
+	  If this option is disabled, you allow userspace (root) access to all
+	  io-memory regardless of whether a driver is actively using that
+	  range.  Accidental access to this is obviously disastrous, but
+	  specific access can be used by people debugging kernel drivers.
+
 	  If this option is switched on, the /dev/mem file only allows
-	  userspace access to PCI space and the BIOS code and data regions.
-	  This is sufficient for dosemu and X and all common users of
-	  /dev/mem.
+	  userspace access to *idle* io-memory ranges (see /proc/iomem) This
+	  may break traditional users of /dev/mem (dosemu, legacy X, etc...)
+	  if the driver using a given range cannot be disabled.
 
 	  If in doubt, say Y.