diff mbox series

[EFI,PCI] Allow disabling PCI busmastering on bridges during boot

Message ID 20191203004043.174977-1-matthewgarrett@google.com (mailing list archive)
State Not Applicable, archived
Headers show
Series [EFI,PCI] Allow disabling PCI busmastering on bridges during boot | expand

Commit Message

Matthew Garrett Dec. 3, 2019, 12:40 a.m. UTC
Add an option to disable the busmaster bit in the control register on
all PCI bridges before calling ExitBootServices() and passing control to
the runtime kernel. System firmware may configure the IOMMU to prevent
malicious PCI devices from being able to attack the OS via DMA. However,
since firmware can't guarantee that the OS is IOMMU-aware, it will tear
down IOMMU configuration when ExitBootServices() is called. This leaves
a window between where a hostile device could still cause damage before
Linux configures the IOMMU again.

If CONFIG_EFI_NO_BUSMASTER is enabled or the "disable_busmaster=1"
commandline argument is passed, the EFI stub will clear the busmaster
bit on all PCI bridges before ExitBootServices() is called. This will
prevent any malicious PCI devices from being able to perform DMA until
the kernel reenables busmastering after configuring the IOMMU.

This option will cause failures with some poorly behaved hardware and
should not be enabled without testing. The kernel commandline option
"disable_busmaster=0" or "disable_busmaster=1" may be used to override
the default.

Signed-off-by: Matthew Garrett <mjg59@google.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 ++
 arch/x86/boot/compressed/eboot.c              |  2 +
 drivers/firmware/efi/Kconfig                  | 23 +++++
 drivers/firmware/efi/libstub/Makefile         |  2 +-
 .../firmware/efi/libstub/efi-stub-helper.c    | 19 ++++
 drivers/firmware/efi/libstub/efistub.h        |  1 +
 drivers/firmware/efi/libstub/pci.c            | 88 +++++++++++++++++++
 include/linux/efi.h                           |  2 +
 8 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 drivers/firmware/efi/libstub/pci.c

Comments

Matthew Garrett Dec. 3, 2019, 12:42 a.m. UTC | #1
On Mon, Dec 2, 2019 at 4:40 PM Matthew Garrett
<matthewgarrett@google.com> wrote:
>
> Add an option to disable the busmaster bit in the control register on
> all PCI bridges before calling ExitBootServices() and passing control to
> the runtime kernel. System firmware may configure the IOMMU to prevent
> malicious PCI devices from being able to attack the OS via DMA. However,
> since firmware can't guarantee that the OS is IOMMU-aware, it will tear
> down IOMMU configuration when ExitBootServices() is called. This leaves
> a window between where a hostile device could still cause damage before
> Linux configures the IOMMU again.

I don't know enough about ARM to know if this makes sense there as well. Anyone?
Ard Biesheuvel Dec. 3, 2019, 11:54 a.m. UTC | #2
(+ Laszlo)

On Tue, 3 Dec 2019 at 00:43, Matthew Garrett <mjg59@google.com> wrote:
>
> On Mon, Dec 2, 2019 at 4:40 PM Matthew Garrett
> <matthewgarrett@google.com> wrote:
> >
> > Add an option to disable the busmaster bit in the control register on
> > all PCI bridges before calling ExitBootServices() and passing control to
> > the runtime kernel. System firmware may configure the IOMMU to prevent
> > malicious PCI devices from being able to attack the OS via DMA. However,
> > since firmware can't guarantee that the OS is IOMMU-aware, it will tear
> > down IOMMU configuration when ExitBootServices() is called. This leaves
> > a window between where a hostile device could still cause damage before
> > Linux configures the IOMMU again.
>
> I don't know enough about ARM to know if this makes sense there as well. Anyone?

There is no reason this shouldn't apply to ARM, but disabling bus
mastering like that before the drivers themselves get a chance to do
so is likely to cause trouble. Network devices or storage controllers
that are still running and have live descriptor rings in DMA memory
shouldn't get the rug pulled from under their feet like that by
blindly disabling the BM attribute on all root ports before their
drivers have had the opportunity to do this cleanly.

One trick we implemented in EDK2 for memory encryption was to do the
following (Laszlo, mind correcting me here if I am remembering this
wrong?)
- create an event X
- register an AtExitBootServices event that signals event X in its handler
- in the handler of event X, iterate over all PPBs to clear the bus
master attribute
- for bonus points, do the same for the PCIe devices themselves,
because root ports are known to exist that entirely ignore the BM
attribute

This way, event X should get handled after all the drivers' EBS event
handlers have been called.
Laszlo Ersek Dec. 3, 2019, 1:38 p.m. UTC | #3
On 12/03/19 12:54, Ard Biesheuvel wrote:
> (+ Laszlo)
> 
> On Tue, 3 Dec 2019 at 00:43, Matthew Garrett <mjg59@google.com> wrote:
>>
>> On Mon, Dec 2, 2019 at 4:40 PM Matthew Garrett
>> <matthewgarrett@google.com> wrote:
>>>
>>> Add an option to disable the busmaster bit in the control register on
>>> all PCI bridges before calling ExitBootServices() and passing control to
>>> the runtime kernel. System firmware may configure the IOMMU to prevent
>>> malicious PCI devices from being able to attack the OS via DMA. However,
>>> since firmware can't guarantee that the OS is IOMMU-aware, it will tear
>>> down IOMMU configuration when ExitBootServices() is called. This leaves
>>> a window between where a hostile device could still cause damage before
>>> Linux configures the IOMMU again.

(1) This vaguely reminds me of
<https://bugzilla.tianocore.org/show_bug.cgi?id=675>.


(2) I'm not 100% convinced this threat model -- I hope I'm using the
right term -- is useful. A PCI device will likely not "itself" set up
DMA (maliciously or not) without a matching driver. The driver will
likely come from the device too (option ROM). The driver will program
the device to do DMA. So, whatever the system firwmare does wrt. the
IOMMU for OS protection purposes, the device driver from the option ROM
can undo.

Is this a scenario where we trust the device driver that comes from the
device's ROM BAR (let's say after the driver passes Secure Boot
verification and after we measure it into the TPM), but don't trust the
silicon jammed in the motherboard that presents the driver?


(3) I never understood why the default behavior (or rather, "only"
behavior) for system firmware wrt. the IOMMU at EBS was "whitelist
everything". Why not "blacklist everything"?

I understand the compat perspective, but the OS should at least be able
to request such a full blackout through OsIndications or whatever. (With
the SEV IOMMU driver in OVMF, that's what we do -- we set everything to
encrypted.)

>> I don't know enough about ARM to know if this makes sense there as well. Anyone?
> 
> There is no reason this shouldn't apply to ARM, but disabling bus
> mastering like that before the drivers themselves get a chance to do
> so is likely to cause trouble. Network devices or storage controllers
> that are still running and have live descriptor rings in DMA memory
> shouldn't get the rug pulled from under their feet like that by
> blindly disabling the BM attribute on all root ports before their
> drivers have had the opportunity to do this cleanly.

I agree.

> 
> One trick we implemented in EDK2 for memory encryption was to do the
> following (Laszlo, mind correcting me here if I am remembering this
> wrong?)
> - create an event X
> - register an AtExitBootServices event that signals event X in its handler
> - in the handler of event X, iterate over all PPBs to clear the bus
> master attribute
> - for bonus points, do the same for the PCIe devices themselves,
> because root ports are known to exist that entirely ignore the BM
> attribute
> 
> This way, event X should get handled after all the drivers' EBS event
> handlers have been called.

Yes. Please see the commit message and the code comments in the
following edk2 commit:

https://github.com/tianocore/edk2/commit/7aee391fa3d0

I'm unsure how portable it is to platforms that are not derived from edk2.

Thanks!
Laszlo
Andy Lutomirski Dec. 3, 2019, 3:30 p.m. UTC | #4
On Mon, Dec 2, 2019 at 4:41 PM Matthew Garrett
<matthewgarrett@google.com> wrote:
>
> Add an option to disable the busmaster bit in the control register on
> all PCI bridges before calling ExitBootServices() and passing control to
> the runtime kernel. System firmware may configure the IOMMU to prevent
> malicious PCI devices from being able to attack the OS via DMA. However,
> since firmware can't guarantee that the OS is IOMMU-aware, it will tear
> down IOMMU configuration when ExitBootServices() is called. This leaves
> a window between where a hostile device could still cause damage before
> Linux configures the IOMMU again.
>
> If CONFIG_EFI_NO_BUSMASTER is enabled or the "disable_busmaster=1"
> commandline argument is passed, the EFI stub will clear the busmaster
> bit on all PCI bridges before ExitBootServices() is called. This will
> prevent any malicious PCI devices from being able to perform DMA until
> the kernel reenables busmastering after configuring the IOMMU.

I hate to be an obnoxious bikeshedder, but I really dislike the
"disable_busmaster" name.  I read this and $SUBJECT as "for some
reason, the admin wants to operate the system with busmastering off".
What you really want is something more like "disable busmastering
before IOMMU initialization".  Maybe
"iommu.disable_busmaster_before_init"?

Similarly, EFI_NO_BUSMASTER sounds like a permanent state of affairs.

Would a similar patch apply to non-EFI boot?  That is, in a BIOS boot,
is busmastering on when the kernel is loaded?

--Andy
Ard Biesheuvel Dec. 3, 2019, 4:33 p.m. UTC | #5
On Tue, 3 Dec 2019 at 15:30, Andy Lutomirski <luto@amacapital.net> wrote:
>
> On Mon, Dec 2, 2019 at 4:41 PM Matthew Garrett
> <matthewgarrett@google.com> wrote:
> >
> > Add an option to disable the busmaster bit in the control register on
> > all PCI bridges before calling ExitBootServices() and passing control to
> > the runtime kernel. System firmware may configure the IOMMU to prevent
> > malicious PCI devices from being able to attack the OS via DMA. However,
> > since firmware can't guarantee that the OS is IOMMU-aware, it will tear
> > down IOMMU configuration when ExitBootServices() is called. This leaves
> > a window between where a hostile device could still cause damage before
> > Linux configures the IOMMU again.
> >
> > If CONFIG_EFI_NO_BUSMASTER is enabled or the "disable_busmaster=1"
> > commandline argument is passed, the EFI stub will clear the busmaster
> > bit on all PCI bridges before ExitBootServices() is called. This will
> > prevent any malicious PCI devices from being able to perform DMA until
> > the kernel reenables busmastering after configuring the IOMMU.
>
> I hate to be an obnoxious bikeshedder, but I really dislike the
> "disable_busmaster" name.  I read this and $SUBJECT as "for some
> reason, the admin wants to operate the system with busmastering off".
> What you really want is something more like "disable busmastering
> before IOMMU initialization".  Maybe
> "iommu.disable_busmaster_before_init"?
>
> Similarly, EFI_NO_BUSMASTER sounds like a permanent state of affairs.
>
> Would a similar patch apply to non-EFI boot?  That is, in a BIOS boot,
> is busmastering on when the kernel is loaded?
>

Yes, bus mastering is on, but since legacy BIOS may implement things
like PS/2 emulation or other compatibility hacks where the PCI masters
(devices or bridges) may need to be left enabled across the transition
from firmware into the OS, I don't think it is wise to try and
implement this feature for it.

So the EFI stub is a reasonable place to put a feature like this,
except for the fact that [on x86], it does not get invoked unless GRUB
boots your kernel with 'linuxefi' rather than 'linux', and so in the
majority of cases, I guess we are essentially doing legacy BIOS boot,
even on UEFI systems.
kernel test robot Dec. 3, 2019, 6:23 p.m. UTC | #6
Hi Matthew,

I love your patch! Yet something to improve:

[auto build test ERROR on v5.4-rc8]
[cannot apply to efi/next linus/master linux/master v5.4 v5.4-rc7 next-20191202 next-20191203]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Matthew-Garrett/Allow-disabling-PCI-busmastering-on-bridges-during-boot/20191203-084747
base:    af42d3466bdc8f39806b26f593604fdc54140bcb
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from drivers/firmware//efi/libstub/pci.c:12:0:
   drivers/firmware//efi/libstub/pci.c: In function 'efi_pci_disable_bridge_busmaster':
>> arch/arm64/include/asm/efi.h:96:33: error: 'sys_table_arg' undeclared (first use in this function); did you mean 'sys_table'?
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
>> drivers/firmware//efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
   arch/arm64/include/asm/efi.h:96:33: note: each undeclared identifier is reported only once for each function it appears in
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
>> drivers/firmware//efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/efi.h:105:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/firmware//efi/libstub/pci.c:62:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/efi.h:105:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware//efi/libstub/pci.c:70:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/efi.h:105:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware//efi/libstub/pci.c:78:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.write, pci,
               ^~~~~~~~~~~~~~
--
   In file included from drivers/firmware/efi/libstub/pci.c:12:0:
   drivers/firmware/efi/libstub/pci.c: In function 'efi_pci_disable_bridge_busmaster':
>> arch/arm64/include/asm/efi.h:96:33: error: 'sys_table_arg' undeclared (first use in this function); did you mean 'sys_table'?
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
   drivers/firmware/efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
   arch/arm64/include/asm/efi.h:96:33: note: each undeclared identifier is reported only once for each function it appears in
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
   drivers/firmware/efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/efi.h:105:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware/efi/libstub/pci.c:62:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/efi.h:105:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware/efi/libstub/pci.c:70:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/efi.h:105:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware/efi/libstub/pci.c:78:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.write, pci,
               ^~~~~~~~~~~~~~

vim +96 arch/arm64/include/asm/efi.h

a13b00778e89c4 Ard Biesheuvel 2014-07-02   95  
a13b00778e89c4 Ard Biesheuvel 2014-07-02  @96  #define efi_call_early(f, ...)		sys_table_arg->boottime->f(__VA_ARGS__)
fc37206427ce38 Ard Biesheuvel 2016-04-25   97  #define __efi_call_early(f, ...)	f(__VA_ARGS__)
6d0ca4a47bf8cb David Howells  2017-02-06   98  #define efi_call_runtime(f, ...)	sys_table_arg->runtime->f(__VA_ARGS__)
fc37206427ce38 Ard Biesheuvel 2016-04-25   99  #define efi_is_64bit()			(true)
a13b00778e89c4 Ard Biesheuvel 2014-07-02  100  
c4db9c1e8c70bc Lukas Wunner   2018-07-20  101  #define efi_table_attr(table, attr, instance)				\
c4db9c1e8c70bc Lukas Wunner   2018-07-20  102  	((table##_t *)instance)->attr
c4db9c1e8c70bc Lukas Wunner   2018-07-20  103  
3552fdf29f01e5 Lukas Wunner   2016-11-12  104  #define efi_call_proto(protocol, f, instance, ...)			\
3552fdf29f01e5 Lukas Wunner   2016-11-12 @105  	((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
3552fdf29f01e5 Lukas Wunner   2016-11-12  106  

:::::: The code at line 96 was first introduced by commit
:::::: a13b00778e89c405cb224ef0926be6d71682d2a2 efi/arm64: efistub: Move shared dependencies to <asm/efi.h>

:::::: TO: Ard Biesheuvel <ard.biesheuvel@linaro.org>
:::::: CC: Matt Fleming <matt.fleming@intel.com>

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
Matthew Garrett Dec. 3, 2019, 7:36 p.m. UTC | #7
On Tue, Dec 3, 2019 at 5:38 AM Laszlo Ersek <lersek@redhat.com> wrote:

> (2) I'm not 100% convinced this threat model -- I hope I'm using the
> right term -- is useful. A PCI device will likely not "itself" set up
> DMA (maliciously or not) without a matching driver.

A malicious PCI device can absolutely set up DMA itself without a
matching driver. There's a couple of cases:

1) A device that's entirely under the control of an attacker. Using
external Thunderbolt devices to overwrite OS data has been
demonstrated on multiple occasions.

2) A device that's been compromised in some way. The UEFI driver is a
long way from the only software that's related to the device -
discrete GPUs boot themselves even in the absence of a driver, and if
that on-board code can be compromised in any persistent way then they
can be used to attack the OS.

> Is this a scenario where we trust the device driver that comes from the
> device's ROM BAR (let's say after the driver passes Secure Boot
> verification and after we measure it into the TPM), but don't trust the
> silicon jammed in the motherboard that presents the driver?

Yes, though it's not just internal devices that we need to worry about.

> (3) I never understood why the default behavior (or rather, "only"
> behavior) for system firmware wrt. the IOMMU at EBS was "whitelist
> everything". Why not "blacklist everything"?
>
> I understand the compat perspective, but the OS should at least be able
> to request such a full blackout through OsIndications or whatever. (With
> the SEV IOMMU driver in OVMF, that's what we do -- we set everything to
> encrypted.)

I'm working on that, but it would be nice to have an approach for
existing systems.
Matthew Garrett Dec. 3, 2019, 7:40 p.m. UTC | #8
On Tue, Dec 3, 2019 at 3:54 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> There is no reason this shouldn't apply to ARM, but disabling bus
> mastering like that before the drivers themselves get a chance to do
> so is likely to cause trouble. Network devices or storage controllers
> that are still running and have live descriptor rings in DMA memory
> shouldn't get the rug pulled from under their feet like that by
> blindly disabling the BM attribute on all root ports before their
> drivers have had the opportunity to do this cleanly.

Yes, whether this causes problems is going to be influenced by the
behaviour of the hardware on the system. That's why I'm not defaulting
it to being enabled :)

> One trick we implemented in EDK2 for memory encryption was to do the
> following (Laszlo, mind correcting me here if I am remembering this
> wrong?)
> - create an event X
> - register an AtExitBootServices event that signals event X in its handler
> - in the handler of event X, iterate over all PPBs to clear the bus
> master attribute
> - for bonus points, do the same for the PCIe devices themselves,
> because root ports are known to exist that entirely ignore the BM
> attribute
>
> This way, event X should get handled after all the drivers' EBS event
> handlers have been called.

Can we guarantee that this happens before IOMMU state teardown? I
don't think there's a benefit to clearing the bit on endpoint devices,
if they're malicious they're just going to turn it back on anyway.
Matthew Garrett Dec. 3, 2019, 7:41 p.m. UTC | #9
On Tue, Dec 3, 2019 at 7:30 AM Andy Lutomirski <luto@amacapital.net> wrote:

> Would a similar patch apply to non-EFI boot?  That is, in a BIOS boot,
> is busmastering on when the kernel is loaded?

It's only relevant where firmware configures the IOMMU but then
removes that configuration before handing control to the OS. I'm not
aware of that happening anywhere other than EFI.
Laszlo Ersek Dec. 4, 2019, 7:11 a.m. UTC | #10
On 12/03/19 20:40, Matthew Garrett wrote:
> On Tue, Dec 3, 2019 at 3:54 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> 
>> There is no reason this shouldn't apply to ARM, but disabling bus
>> mastering like that before the drivers themselves get a chance to do
>> so is likely to cause trouble. Network devices or storage controllers
>> that are still running and have live descriptor rings in DMA memory
>> shouldn't get the rug pulled from under their feet like that by
>> blindly disabling the BM attribute on all root ports before their
>> drivers have had the opportunity to do this cleanly.
> 
> Yes, whether this causes problems is going to be influenced by the
> behaviour of the hardware on the system. That's why I'm not defaulting
> it to being enabled :)
> 
>> One trick we implemented in EDK2 for memory encryption was to do the
>> following (Laszlo, mind correcting me here if I am remembering this
>> wrong?)
>> - create an event X
>> - register an AtExitBootServices event that signals event X in its handler
>> - in the handler of event X, iterate over all PPBs to clear the bus
>> master attribute
>> - for bonus points, do the same for the PCIe devices themselves,
>> because root ports are known to exist that entirely ignore the BM
>> attribute
>>
>> This way, event X should get handled after all the drivers' EBS event
>> handlers have been called.
> 
> Can we guarantee that this happens before IOMMU state teardown?

In OVMF, the handler of "event X" is in the IOMMU driver itself, so it's
the IOMMU driver that takes care of blacklisting everything *after*
other drivers had a chance to clean up.

But in this case, we'd have to insert the PPB clearing *before* the
(platform's) IOMMU driver's EBS handler (because the latter is going to
deny, not permit, everything); and we can't modify the IOMMU driver.

I guess we could install an EBS handler with TPL_NOTIFY (PciIo usage
appears permitted at TPL_NOTIFY, from "Table 27. TPL Restrictions"). But:
- if the IOMMU driver's EBS handler is also to be enqueued at
TPL_NOTIFY, then the order will be unspecified
- if a PCI driver sets up an EBS handler at TPL_CALLBACK, then in our
handler we could shut down a PPB in front of a device bound by that
driver too early.

Handling dependencies between event notification functions is a
never-ending struggle in UEFI, AFAICT.

> I don't think there's a benefit to clearing the bit on endpoint devices,
> if they're malicious they're just going to turn it back on anyway.
> 

Thanks
Laszlo
Matthew Garrett Dec. 4, 2019, 7:29 p.m. UTC | #11
On Tue, Dec 3, 2019 at 11:11 PM Laszlo Ersek <lersek@redhat.com> wrote:
> But in this case, we'd have to insert the PPB clearing *before* the
> (platform's) IOMMU driver's EBS handler (because the latter is going to
> deny, not permit, everything); and we can't modify the IOMMU driver.
>
> I guess we could install an EBS handler with TPL_NOTIFY (PciIo usage
> appears permitted at TPL_NOTIFY, from "Table 27. TPL Restrictions"). But:
> - if the IOMMU driver's EBS handler is also to be enqueued at
> TPL_NOTIFY, then the order will be unspecified
> - if a PCI driver sets up an EBS handler at TPL_CALLBACK, then in our
> handler we could shut down a PPB in front of a device bound by that
> driver too early.

Yeah, that's my concern - doing this more correctly seems to leave us
in a situation where we're no longer able to make guarantees about the
security properties of the feature. I think I prefer going with
something that's guaranteed to give us the properties we want, even at
the expense of some compatibility - users who want this can validate
it against their platform.
Andy Lutomirski Dec. 4, 2019, 7:50 p.m. UTC | #12
On Tue, Dec 3, 2019 at 11:41 AM Matthew Garrett <mjg59@google.com> wrote:
>
> On Tue, Dec 3, 2019 at 7:30 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
> > Would a similar patch apply to non-EFI boot?  That is, in a BIOS boot,
> > is busmastering on when the kernel is loaded?
>
> It's only relevant where firmware configures the IOMMU but then
> removes that configuration before handing control to the OS. I'm not
> aware of that happening anywhere other than EFI.

Wouldn't it also be applicable in the much simpler case where the
firmware hands over control with no IOMMU configured but also with the
busmastering bit cleared.  Does firmware do this?  Does the kernel
currently configure the iOMMU before enabling busmastering?

--Andy
Matthew Garrett Dec. 4, 2019, 7:56 p.m. UTC | #13
On Wed, Dec 4, 2019 at 11:50 AM Andy Lutomirski <luto@amacapital.net> wrote:

> Wouldn't it also be applicable in the much simpler case where the
> firmware hands over control with no IOMMU configured but also with the
> busmastering bit cleared.  Does firmware do this?  Does the kernel
> currently configure the iOMMU before enabling busmastering?

We already handle this case - the kernel doesn't activate busmastering
until after it does IOMMU setup.
kernel test robot Dec. 5, 2019, 1:04 p.m. UTC | #14
Hi Matthew,

I love your patch! Yet something to improve:

[auto build test ERROR on v5.4-rc8]
[also build test ERROR on linux/master]
[cannot apply to efi/next linus/master next-20191203]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Matthew-Garrett/Allow-disabling-PCI-busmastering-on-bridges-during-boot/20191203-084747
base:    af42d3466bdc8f39806b26f593604fdc54140bcb
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from drivers/firmware/efi/libstub/pci.c:12:0:
   drivers/firmware/efi/libstub/pci.c: In function 'efi_pci_disable_bridge_busmaster':
>> arch/arm/include/asm/efi.h:53:33: error: 'sys_table_arg' undeclared (first use in this function); did you mean 'sys_table'?
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
   drivers/firmware/efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
   arch/arm/include/asm/efi.h:53:33: note: each undeclared identifier is reported only once for each function it appears in
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
   drivers/firmware/efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
>> arch/arm/include/asm/efi.h:62:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware/efi/libstub/pci.c:62:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm/include/asm/efi.h:62:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware/efi/libstub/pci.c:70:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm/include/asm/efi.h:62:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware/efi/libstub/pci.c:78:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.write, pci,
               ^~~~~~~~~~~~~~
--
   In file included from drivers/firmware//efi/libstub/pci.c:12:0:
   drivers/firmware//efi/libstub/pci.c: In function 'efi_pci_disable_bridge_busmaster':
>> arch/arm/include/asm/efi.h:53:33: error: 'sys_table_arg' undeclared (first use in this function); did you mean 'sys_table'?
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
   drivers/firmware//efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
   arch/arm/include/asm/efi.h:53:33: note: each undeclared identifier is reported only once for each function it appears in
    #define efi_call_early(f, ...)  sys_table_arg->boottime->f(__VA_ARGS__)
                                    ^
   drivers/firmware//efi/libstub/pci.c:29:11: note: in expansion of macro 'efi_call_early'
     status = efi_call_early(locate_handle,
              ^~~~~~~~~~~~~~
>> arch/arm/include/asm/efi.h:62:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware//efi/libstub/pci.c:62:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm/include/asm/efi.h:62:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware//efi/libstub/pci.c:70:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
               ^~~~~~~~~~~~~~
>> arch/arm/include/asm/efi.h:62:3: error: called object is not a function or function pointer
     ((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
     ~^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/firmware//efi/libstub/pci.c:78:12: note: in expansion of macro 'efi_call_proto'
      status = efi_call_proto(efi_pci_io_protocol, pci.write, pci,
               ^~~~~~~~~~~~~~

vim +53 arch/arm/include/asm/efi.h

81a0bc39ea1960 Roy Franz      2015-09-23  52  
81a0bc39ea1960 Roy Franz      2015-09-23 @53  #define efi_call_early(f, ...)		sys_table_arg->boottime->f(__VA_ARGS__)
fc37206427ce38 Ard Biesheuvel 2016-04-25  54  #define __efi_call_early(f, ...)	f(__VA_ARGS__)
6d0ca4a47bf8cb David Howells  2017-02-06  55  #define efi_call_runtime(f, ...)	sys_table_arg->runtime->f(__VA_ARGS__)
fc37206427ce38 Ard Biesheuvel 2016-04-25  56  #define efi_is_64bit()			(false)
81a0bc39ea1960 Roy Franz      2015-09-23  57  
c4db9c1e8c70bc Lukas Wunner   2018-07-20  58  #define efi_table_attr(table, attr, instance)				\
c4db9c1e8c70bc Lukas Wunner   2018-07-20  59  	((table##_t *)instance)->attr
c4db9c1e8c70bc Lukas Wunner   2018-07-20  60  
3552fdf29f01e5 Lukas Wunner   2016-11-12  61  #define efi_call_proto(protocol, f, instance, ...)			\
3552fdf29f01e5 Lukas Wunner   2016-11-12 @62  	((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
3552fdf29f01e5 Lukas Wunner   2016-11-12  63  

:::::: The code at line 53 was first introduced by commit
:::::: 81a0bc39ea1960bbf8ece6a895d7cfd2d9efa28a ARM: add UEFI stub support

:::::: TO: Roy Franz <roy.franz@linaro.org>
:::::: CC: ard <ard.biesheuvel@linaro.org>

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
Ard Biesheuvel Dec. 12, 2019, 3:46 p.m. UTC | #15
On Wed, 4 Dec 2019 at 20:56, Matthew Garrett <mjg59@google.com> wrote:
>
> On Wed, Dec 4, 2019 at 11:50 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
> > Wouldn't it also be applicable in the much simpler case where the
> > firmware hands over control with no IOMMU configured but also with the
> > busmastering bit cleared.  Does firmware do this?  Does the kernel
> > currently configure the iOMMU before enabling busmastering?
>
> We already handle this case - the kernel doesn't activate busmastering
> until after it does IOMMU setup.

Build issues aside (which we already handled off list), I think we
should consider the following concerns I have about this patch:
- make it work on ARM (already done)
- make the cmdline option an efi=xxx one, this makes it obvious which
context this is active in
- I would prefer it if we could make it more obvious that this affects
PCI DMA only, other masters are unaffected by any of this.
- I don't think the presence of the IOMMU is entirely relevant - even
in the absence of an IOMMU, I would prefer bus mastering to be
disabled until the OS driver takes control. This is already part of
the EFI<->handover contract, but it makes sense to have this on top
just in case.
- What about integrated masters? On the systems I have access to,
there are a lot of DMA capable endpoints that sit on bus 0 without any
root port or PCI bridge in between
- Should we treat GOP producers differently? Or perhaps only if the
efifb address is known to be carved out of system memory?

If we come up with a good story here in terms of policy, we may be
able to enable this by default, which would be a win imo.
Matthew Garrett Dec. 13, 2019, 9:24 p.m. UTC | #16
On Thu, Dec 12, 2019 at 7:46 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Wed, 4 Dec 2019 at 20:56, Matthew Garrett <mjg59@google.com> wrote:
> > We already handle this case - the kernel doesn't activate busmastering
> > until after it does IOMMU setup.
>
> Build issues aside (which we already handled off list), I think we
> should consider the following concerns I have about this patch:
> - make it work on ARM (already done)
> - make the cmdline option an efi=xxx one, this makes it obvious which
> context this is active in

Ok.

> - I would prefer it if we could make it more obvious that this affects
> PCI DMA only, other masters are unaffected by any of this.

Ok - in terms of naming, or in terms of documentation?

> - What about integrated masters? On the systems I have access to,
> there are a lot of DMA capable endpoints that sit on bus 0 without any
> root port or PCI bridge in between

There's not really anything we can do about those. My gut feeling is
that if you're in a situation where you can't trust your integrated
chipset then you're going to have trouble building any real trust in
the platform.

> - Should we treat GOP producers differently? Or perhaps only if the
> efifb address is known to be carved out of system memory?

Hm, good question. Video cards are one of the most complicated devices
on the system, so I'd prefer not to leave us vulnerable to them. Maybe
try this as an opt-in thing for a while and see whether people find
graphics-related breakage?

> If we come up with a good story here in terms of policy, we may be
> able to enable this by default, which would be a win imo.

I'm pretty sure we're going to have some hardware that this just
breaks on, unfortunately - Apple's EFI driver for Broadcom wifi used
to continue DMAing over ExitBootServices(), and the "easy" fix of
disabling BME on it beforehand resulted in the card wedging on driver
load, so I think we'll see other devices that have similar behaviour.

(We "fixed" the Apple case by putting the card into S3)
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dee8f68fe15..2d538f27f861 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -881,6 +881,11 @@ 
 			The feature only exists starting from
 			Arch Perfmon v4 (Skylake and newer).
 
+	disable_busmaster= [X86, EFI]
+			Format: <int>
+			Disable the busmaster bit on all PCI bridges in the
+			EFI boot stub if 1, leave in the default state if 0.
+
 	disable_ddw	[PPC/PSERIES]
 			Disable Dynamic DMA Window support. Use this if
 			to workaround buggy firmware.
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 82bc60c8acb2..2f5226b32a6e 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -137,6 +137,8 @@  static void setup_efi_pci(struct boot_params *params)
 	struct setup_data *data;
 	int i;
 
+	efi_pci_disable_bridge_busmaster(sys_table);
+
 	status = efi_call_early(locate_handle,
 				EFI_LOCATE_BY_PROTOCOL,
 				&pci_proto, NULL, &size, pci_handle);
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index b248870a9806..b1651b40424e 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -194,6 +194,29 @@  config EFI_RCI2_TABLE
 
 	  Say Y here for Dell EMC PowerEdge systems.
 
+config EFI_NO_BUSMASTER
+       bool "Clear Busmaster bit on PCI bridges before ExitBootServices()"
+       depends on X86 || COMPILE_TEST
+       help
+	  Disable the busmaster bit in the control register on all PCI bridges
+	  before calling ExitBootServices() and passing control to the runtime
+	  kernel. System firmware may configure the IOMMU to prevent malicious
+	  PCI devices from being able to attack the OS via DMA. However, since
+	  firmware can't guarantee that the OS is IOMMU-aware, it will tear
+	  down IOMMU configuration when ExitBootServices() is called. This
+	  leaves a window between where a hostile device could still cause
+	  damage before Linux configures the IOMMU again.
+
+	  If you say Y here, the EFI stub will clear the busmaster bit on all
+	  PCI bridges before ExitBootServices() is called. This will prevent
+	  any malicious PCI devices from being able to perform DMA until the
+	  kernel reenables busmastering after configuring the IOMMU.
+
+	  This option will cause failures with some poorly behaved hardware
+	  and should not be enabled without testing. The kernel commandline
+	  option "disable_busmaster=0" or "disable_busmaster=1" may be used
+	  to override this option.
+
 endmenu
 
 config UEFI_CPER
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index ee0661ddb25b..6536ee851334 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -38,7 +38,7 @@  OBJECT_FILES_NON_STANDARD	:= y
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT			:= n
 
-lib-y				:= efi-stub-helper.o gop.o secureboot.o tpm.o
+lib-y				:= efi-stub-helper.o gop.o pci.o secureboot.o tpm.o
 
 # include the stub's generic dependencies from lib/ when building for ARM/arm64
 arm-deps-y := fdt_rw.c fdt_ro.c fdt_wip.c fdt.c fdt_empty_tree.c fdt_sw.c
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 35dbc2791c97..69faeeba1cdb 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -32,6 +32,8 @@  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
 static int __section(.data) __nokaslr;
 static int __section(.data) __quiet;
 static int __section(.data) __novamap;
+static int __section(.data) __disable_busmaster =
+	IS_ENABLED(CONFIG_EFI_NO_BUSMASTER);
 
 int __pure nokaslr(void)
 {
@@ -46,6 +48,11 @@  int __pure novamap(void)
 	return __novamap;
 }
 
+int __pure nobusmaster(void)
+{
+	return __disable_busmaster;
+}
+
 #define EFI_MMAP_NR_SLACK_SLOTS	8
 
 struct file_info {
@@ -458,6 +465,18 @@  efi_status_t efi_parse_options(char const *cmdline)
 	if (str == cmdline || (str && str > cmdline && *(str - 1) == ' '))
 		__quiet = 1;
 
+	str = strstr(cmdline, "disable_busmaster=");
+	if (str) {
+		switch (*(str + strlen("disable_busmaster="))) {
+		case '0':
+			__disable_busmaster = 0;
+			break;
+		case '1':
+			__disable_busmaster = 1;
+			break;
+		}
+	}
+
 	/*
 	 * If no EFI parameters were specified on the cmdline we've got
 	 * nothing to do.
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 7f1556fd867d..12f5f80f6851 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -28,6 +28,7 @@ 
 extern int __pure nokaslr(void);
 extern int __pure is_quiet(void);
 extern int __pure novamap(void);
+extern int __pure nobusmaster(void);
 
 #define pr_efi(sys_table, msg)		do {				\
 	if (!is_quiet()) efi_printk(sys_table, "EFI stub: "msg);	\
diff --git a/drivers/firmware/efi/libstub/pci.c b/drivers/firmware/efi/libstub/pci.c
new file mode 100644
index 000000000000..dc0db8a1e248
--- /dev/null
+++ b/drivers/firmware/efi/libstub/pci.c
@@ -0,0 +1,88 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI-related functions used by the EFI stub on multiple
+ * architectures.
+ *
+ * Copyright 2019 Google, LLC
+ */
+
+#include <linux/efi.h>
+#include <linux/pci.h>
+
+#include <asm/efi.h>
+
+#include "efistub.h"
+
+void efi_pci_disable_bridge_busmaster(efi_system_table_t *sys_table)
+{
+	efi_status_t status;
+	void **pci_handle = NULL;
+	efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
+	unsigned long size = 0;
+	unsigned long nr_pci;
+	u16 class, command;
+	int i;
+
+	if (!nobusmaster())
+		return;
+
+	status = efi_call_early(locate_handle,
+				EFI_LOCATE_BY_PROTOCOL,
+				&pci_proto, NULL, &size, pci_handle);
+
+	if (status == EFI_BUFFER_TOO_SMALL) {
+		status = efi_call_early(allocate_pool,
+					EFI_LOADER_DATA,
+					size, (void **)&pci_handle);
+
+		if (status != EFI_SUCCESS) {
+			efi_printk(sys_table, "Failed to allocate memory for 'pci_handle'\n");
+			return;
+		}
+
+		status = efi_call_early(locate_handle,
+					EFI_LOCATE_BY_PROTOCOL, &pci_proto,
+					NULL, &size, pci_handle);
+	}
+
+	if (status != EFI_SUCCESS)
+		goto free_handle;
+
+	nr_pci = size / (efi_is_64bit() ? sizeof(u64) : sizeof(u32));
+	for (i = 0; i < nr_pci; i++) {
+		efi_pci_io_protocol_t *pci = NULL;
+
+		status = efi_call_early(handle_protocol,
+					efi_is_64bit() ? ((u64 *)pci_handle)[i]
+						       : ((u32 *)pci_handle)[i],
+					&pci_proto, (void **)&pci);
+		if (status != EFI_SUCCESS || !pci)
+			continue;
+
+		status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
+					EfiPciIoWidthUint16, PCI_CLASS_DEVICE,
+					1, &class);
+
+		if (status != EFI_SUCCESS || class != PCI_CLASS_BRIDGE_PCI)
+			continue;
+
+		/* Disable busmastering */
+		status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
+					EfiPciIoWidthUint16, PCI_COMMAND, 1,
+					&command);
+		if (status != EFI_SUCCESS ||
+		    !(command & PCI_COMMAND_MASTER))
+			continue;
+
+		command &= ~PCI_COMMAND_MASTER;
+		status = efi_call_proto(efi_pci_io_protocol, pci.write, pci,
+					EfiPciIoWidthUint16, PCI_COMMAND, 1,
+					&command);
+		if (status != EFI_SUCCESS)
+			efi_printk(sys_table,
+				   "Failed to disable PCI busmastering\n");
+	}
+
+free_handle:
+	efi_call_early(free_pool, pci_handle);
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index d87acf62958e..b0de9b4a0ad1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1647,6 +1647,8 @@  efi_enable_reset_attack_mitigation(efi_system_table_t *sys_table_arg) { }
 
 void efi_retrieve_tpm2_eventlog(efi_system_table_t *sys_table);
 
+void efi_pci_disable_bridge_busmaster(efi_system_table_t *sys_table);
+
 /*
  * Arch code can implement the following three template macros, avoiding
  * reptition for the void/non-void return cases of {__,}efi_call_virt():