Message ID | 20151124000604.26360.56945.stgit@dwillia2-desk3.jf.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 23 Nov 2015 16:06:04 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE > semantics by default. If userspace really believes it is safe to access > the memory region it can also perform the extra step of disabling an > active driver. This protects device address ranges with read side > effects and otherwise directs userspace to use the driver. I don't think I'm sufficiently understanding what this is all needed for, sorry. A better changelog would help: what's wrong with the current code, how you propose it be changed, how the kernel's externally-visible behaviour is altered, etc. Please pay particular attention to the back-compatibility issues which will be encountered when people enable these options. Perhaps when all that material is described, I'll understand why the heck we're doing this with a build-time switch rather than a runtime one... > Persistent memory presents a large "mistake surface" to /dev/mem as now > accidental writes can corrupt a filesystem. Is that the motivation? root can come in and accidentally alter persistent memory contents? If so, - why do we care? There are all sorts of ways in which root can muck up the persistent memory, starting with dd(1). What's special about /dev/mem? - why is the patch mucking with access to PCI and BIOS space? Is the persistent memory even mappable in those regions? Or is the concern that userspace can access control registers associated with the persistent memory? What is the problem scenario? IOW, a very good description of the problem-being-solved would help out a lot here...
On Tue, Nov 24, 2015 at 2:25 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Mon, 23 Nov 2015 16:06:04 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > >> This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE >> semantics by default. If userspace really believes it is safe to access >> the memory region it can also perform the extra step of disabling an >> active driver. This protects device address ranges with read side >> effects and otherwise directs userspace to use the driver. > > I don't think I'm sufficiently understanding what this is all needed > for, sorry. A better changelog would help: what's wrong with the > current code, how you propose it be changed, how the kernel's > externally-visible behaviour is altered, etc. > I should have duplicated the Kconfig description for IO_STRICT_DEVMEM in the changelog, but the justification is simply that if the kernel has a driver busily using a memory range, userspace needs to assert it knows it is safe to access that range by disabling the driver. This makes the kernel safer by default. > Please pay particular attention to the back-compatibility issues which > will be encountered when people enable these options. It certainly diminishes debug capabilities, mmap of sysfs pci resources will also fail while a driver is active. The only general purpose application I know that uses /dev/mem is dosemu. It should continue to work fine as x86 "devmem_is_allowed()" permits access from 0-to-1MB by default. The other stated user of /dev/mem legacy X drivers. With the prevalence of kernel modesetting in graphics drivers I don't know how much of a concern this is anymore. > Perhaps when all that material is described, I'll understand why the > heck we're doing this with a build-time switch rather than a runtime > one... We have the "iomem=" kernel parameter. I think it makes sense to have that setting be configurable at runtime to augment this build time decision. >> Persistent memory presents a large "mistake surface" to /dev/mem as now >> accidental writes can corrupt a filesystem. > > Is that the motivation? root can come in and accidentally alter > persistent memory contents? If so, > > - why do we care? There are all sorts of ways in which root can muck > up the persistent memory, starting with dd(1). What's special about > /dev/mem? dd through /dev/pmem and the driver will do all the proper flushing and syncing to make the writes durable on media. /dev/mem knows none of those semantics. /dev/pmem as a block device responds to O_EXCL and prevents other attempts to open the device. > - why is the patch mucking with access to PCI and BIOS space? Is the > persistent memory even mappable in those regions? Or is the concern > that userspace can access control registers associated with the > persistent memory? What is the problem scenario? It seems to me that letting /dev/mem do arbitrary access to any region of memory is a dangerous capability for a production environment. Drivers assume that request_mem_region() tells other parts of the kernel to not touch their memory. Having the option to extend that protection to /dev/mem by default seemed a reasonable idea. Of course, all of this assumes that you think it is worthwhile to have some protections and safety measures even for root. > IOW, a very good description of the problem-being-solved would help out > a lot here... I'll fold the eventual result of this discussion into the changelog if I can convince you it's worth moving forward. I also have the option of just tagging the pmem regions as IORESOURCE_EXCLUSIVE, but I decided against that because I think our current definition of STRICT_DEVMEM leaves a big hole if the goal is "/dev/mem access is safe by default".
On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > > IOW, a very good description of the problem-being-solved would help out > > a lot here... > > I'll fold the eventual result of this discussion into the changelog if > I can convince you it's worth moving forward. I'm easily convinced ;) Please let's get all the info into the right place, make sure it answers the thus-far-asked questions (at least) and we'll take it from there. And please do have a think about switching as much as possible over to runtime-configurability. Because "please echo foo > /proc" is a heck of a lot nicer than "please reboot with iomem=" which is a heck of a lot nicer than "please ask vendor for a new kernel".
On Tue, Nov 24, 2015 at 4:47 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > >> > IOW, a very good description of the problem-being-solved would help out >> > a lot here... >> >> I'll fold the eventual result of this discussion into the changelog if >> I can convince you it's worth moving forward. > > I'm easily convinced ;) Please let's get all the info into the right > place, make sure it answers the thus-far-asked questions (at least) and > we'll take it from there. > > And please do have a think about switching as much as possible over to > runtime-configurability. Because "please echo foo > /proc" is a heck > of a lot nicer than "please reboot with iomem=" which is a heck of a lot > nicer than "please ask vendor for a new kernel". I think run-time config for this should be an as-needed case. Nothing should be fiddling with this memory from userspace anyway -- a driver covering it should be unloaded first. And, with my dosemu maintainer hat on, If you're using dosemu in a mode where this will cause a problem, you are already running a custom kernel. :) And that said, if someone can actually produce a case where we need this runtime configurable, I'm all for it. I just don't think we need to design it in right now. -Kees
On Tue, Nov 24, 2015 at 4:47 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > >> > IOW, a very good description of the problem-being-solved would help out >> > a lot here... >> >> I'll fold the eventual result of this discussion into the changelog if >> I can convince you it's worth moving forward. > > I'm easily convinced ;) Please let's get all the info into the right > place, make sure it answers the thus-far-asked questions (at least) and > we'll take it from there. > > And please do have a think about switching as much as possible over to > runtime-configurability. Because "please echo foo > /proc" is a heck > of a lot nicer than "please reboot with iomem=" which is a heck of a lot > nicer than "please ask vendor for a new kernel". Actually, we already have runtime configuration. For example, if you want to muck with pmem via /dev/mem, just do this first: echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind
On Tue, Nov 24, 2015 at 5:28 PM, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, Nov 24, 2015 at 4:47 PM, Andrew Morton > <akpm@linux-foundation.org> wrote: >> On Tue, 24 Nov 2015 16:34:19 -0800 Dan Williams <dan.j.williams@intel.com> wrote: >> >>> > IOW, a very good description of the problem-being-solved would help out >>> > a lot here... >>> >>> I'll fold the eventual result of this discussion into the changelog if >>> I can convince you it's worth moving forward. >> >> I'm easily convinced ;) Please let's get all the info into the right >> place, make sure it answers the thus-far-asked questions (at least) and >> we'll take it from there. >> >> And please do have a think about switching as much as possible over to >> runtime-configurability. Because "please echo foo > /proc" is a heck >> of a lot nicer than "please reboot with iomem=" which is a heck of a lot >> nicer than "please ask vendor for a new kernel". > > Actually, we already have runtime configuration. For example, if you > want to muck with pmem via /dev/mem, just do this first: > > echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind Here's the summary of the thread that I will add to the changelog: --- In general if a device driver is busily using a memory region it already informs other parts of the kernel to not touch it via request_mem_region(). /dev/mem should honor the same safety restriction by default. Debugging a device driver from userspace becomes more difficult with this enabled. Any application using /dev/mem or mmap of sysfs pci resources will now need to perform the extra step of either: 1/ Disabling the driver, for example: echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind 2/ Rebooting with "iomem=relaxed" on the command line 3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n Traditional users of /dev/mem like dosemu are unaffected because the first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction. Legacy X configurations use /dev/mem to talk to graphics hardware, but that functionality has since moved to kernel graphics drivers.
* Dan Williams <dan.j.williams@intel.com> wrote: > > - why is the patch mucking with access to PCI and BIOS space? Is the > > persistent memory even mappable in those regions? Or is the concern > > that userspace can access control registers associated with the > > persistent memory? What is the problem scenario? > > It seems to me that letting /dev/mem do arbitrary access to any region > of memory is a dangerous capability for a production environment. So basically the original motivation years ago was to disable /dev/mem altogether: it used to be a wide open roothole if anything with write access to it (such as old Xorg) is exploited, plus it's a favorite and convenient tool for stealth access to system areas of memory in cases where the attacker already has root. (this is relevant as even being root might not give easy access to system mmio areas if things like being able to load modules is restricted even for room, and if the system has readonly storage and a few other things configured.) But we couldn't disable /dev/mem completely due to Xorg and dosemu legacies - so we came up with this restriction feature that limits its scope. Any additional steps that limit the scope of access under the STRICT_DEVMEM option (which is really a misnomer: it should be RESTRICT_DEVMEM instead) are welcome from a generic Linux distro POV. Thanks, Ingo
diff --git a/kernel/resource.c b/kernel/resource.c index f150dbbe6f62..09c0597840b0 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1498,8 +1498,15 @@ int iomem_is_exclusive(u64 addr) break; if (p->end < addr) continue; - if (p->flags & IORESOURCE_BUSY && - p->flags & IORESOURCE_EXCLUSIVE) { + /* + * A resource is exclusive if IORESOURCE_EXCLUSIVE is set + * or CONFIG_IO_STRICT_DEVMEM is enabled and the + * resource is busy. + */ + if ((p->flags & IORESOURCE_BUSY) == 0) + continue; + if (IS_ENABLED(CONFIG_IO_STRICT_DEVMEM) + || p->flags & IORESOURCE_EXCLUSIVE) { err = 1; break; } diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 289dfcbc14eb..073496dea848 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1869,9 +1869,26 @@ config STRICT_DEVMEM enabled, even in this case there are restrictions on /dev/mem use due to the cache aliasing requirements. + If this option is switched on, and IO_STRICT_DEVMEM=n, the /dev/mem + file only allows userspace access to PCI space and the BIOS code and + data regions. This is sufficient for dosemu and X and all common + users of /dev/mem. + + If in doubt, say Y. + +config IO_STRICT_DEVMEM + bool "Filter I/O access to /dev/mem" + depends on STRICT_DEVMEM + default STRICT_DEVMEM + ---help--- + If this option is disabled, you allow userspace (root) access to all + io-memory regardless of whether a driver is actively using that + range. Accidental access to this is obviously disastrous, but + specific access can be used by people debugging kernel drivers. + If this option is switched on, the /dev/mem file only allows - userspace access to PCI space and the BIOS code and data regions. - This is sufficient for dosemu and X and all common users of - /dev/mem. + userspace access to *idle* io-memory ranges (see /proc/iomem) This + may break traditional users of /dev/mem (dosemu, legacy X, etc...) + if the driver using a given range cannot be disabled. If in doubt, say Y.