diff mbox series

[XEN,v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation

Message ID bca361efe8061c470a4a27470dd247ee8d53af59.1586813622.git.havanur@amazon.com (mailing list archive)
State Superseded
Headers show
Series [XEN,v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation | expand

Commit Message

Shamsundara Havanur, Harsha April 13, 2020, 9:33 p.m. UTC
It was observed that PCI MMIO and/or IO BARs were programmed with
BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
register) enabled, during PCI setup phase. This resulted in
incorrect memory mapping as soon as the lower half of the 64 bit bar
is programmed, which displaced any RAM mappings under 4G. After the
upper half is programmed PCI memory mapping is restored to its
intended mapping but the RAM displaced is not restored. The OS then
continues to boot and function until it tries to access the displaced
RAM at which point it suffers a page fault and crashes.

This patch address the issue by deferring enablement of memory and
I/O decode in command register until all the resources, like interrupts
I/O and/or MMIO BARs for all the PCI device functions are programmed.

Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/firmware/hvmloader/pci.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

Comments

Jan Beulich April 14, 2020, 7:42 a.m. UTC | #1
On 13.04.2020 23:33, Harsha Shamsundara Havanur wrote:
> It was observed that PCI MMIO and/or IO BARs were programmed with
> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> register) enabled, during PCI setup phase. This resulted in
> incorrect memory mapping as soon as the lower half of the 64 bit bar
> is programmed, which displaced any RAM mappings under 4G. After the
> upper half is programmed PCI memory mapping is restored to its
> intended mapping but the RAM displaced is not restored. The OS then
> continues to boot and function until it tries to access the displaced
> RAM at which point it suffers a page fault and crashes.
> 
> This patch address the issue by deferring enablement of memory and
> I/O decode in command register until all the resources, like interrupts
> I/O and/or MMIO BARs for all the PCI device functions are programmed.
> 
> Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/firmware/hvmloader/pci.c | 35 +++++++++++++++++++++++++++--------
>  1 file changed, 27 insertions(+), 8 deletions(-)

There not being any description of what has changed in v2, I also
can't easily judge whether keeping the two tags above was
legitimate. In any event you don't seem to have taken care of all
review feedback (whether by making changes to the patch or by
replying verbally).

> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -84,6 +84,7 @@ void pci_setup(void)
>      uint32_t vga_devfn = 256;
>      uint16_t class, vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
> +    uint8_t pci_devfn_decode_type[256] = {};
>  
>      /* Resources assignable to PCI devices via BARs. */
>      struct resource {
> @@ -120,6 +121,9 @@ void pci_setup(void)
>       */
>      bool allow_memory_relocate = 1;
>  
> +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_MEMORY != PCI_COMMAND_MEMORY);
> +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_IO);
> +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_MASTER);

This looks like a copy-and-paste mistake - are you sure you've
build-tested this? (This alone likely invalidates the tags, as
per above.)

> @@ -289,9 +293,22 @@ void pci_setup(void)
>                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
>          }
>  
> -        /* Enable bus mastering. */
> +        /*
> +         * Disable bus mastering, memory and I/O space, which is typical device
> +         * reset state. It is recommended that BAR programming be done whilst
> +         * decode bits are cleared to avoid incorrect mappings being created,
> +         * when 64-bit memory BAR is programmed first by writing the lower half
> +         * and then the upper half, which first maps to an address under 4G
> +         * replacing any RAM mapped in that address, which is not restored
> +         * back after the upper half is written and PCI memory is correctly
> +         * mapped to its intended high mem address.
> +         *
> +         * Capture the state of bus master to restore it back once BAR
> +         * programming is completed.
> +         */
>          cmd = pci_readw(devfn, PCI_COMMAND);
> -        cmd |= PCI_COMMAND_MASTER;
> +        pci_devfn_decode_type[devfn] = cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
> +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);

The disabling of MASTER was put under question in v1 already.

> @@ -526,10 +542,13 @@ void pci_setup(void)
>           * has IO enabled, even if there is no I/O BAR on that
>           * particular device.
>           */
> -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
> -        cmd |= PCI_COMMAND_IO;
> -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
> +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
>      }
> +
> +    /* Enable memory and I/O space. Restore saved BUS MASTER state */
> +    for ( devfn = 0; devfn < 256; devfn++ )
> +        if ( pci_devfn_decode_type[devfn] )
> +            pci_writew(devfn, PCI_COMMAND, pci_devfn_decode_type[devfn]);

You effectively clear the upper 8 bits here, rather than retaining
them.

Jan
Roger Pau Monné April 14, 2020, 8:10 a.m. UTC | #2
On Mon, Apr 13, 2020 at 09:33:42PM +0000, Harsha Shamsundara Havanur wrote:
> It was observed that PCI MMIO and/or IO BARs were programmed with
> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> register) enabled, during PCI setup phase. This resulted in
> incorrect memory mapping as soon as the lower half of the 64 bit bar
> is programmed, which displaced any RAM mappings under 4G. After the
> upper half is programmed PCI memory mapping is restored to its
> intended mapping but the RAM displaced is not restored. The OS then
> continues to boot and function until it tries to access the displaced
> RAM at which point it suffers a page fault and crashes.
> 
> This patch address the issue by deferring enablement of memory and
> I/O decode in command register until all the resources, like interrupts
> I/O and/or MMIO BARs for all the PCI device functions are programmed.
> 
> Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/firmware/hvmloader/pci.c | 35 +++++++++++++++++++++++++++--------
>  1 file changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 0b708bf578..f74471b255 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -84,6 +84,7 @@ void pci_setup(void)
>      uint32_t vga_devfn = 256;
>      uint16_t class, vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
> +    uint8_t pci_devfn_decode_type[256] = {};
>  
>      /* Resources assignable to PCI devices via BARs. */
>      struct resource {
> @@ -120,6 +121,9 @@ void pci_setup(void)
>       */
>      bool allow_memory_relocate = 1;
>  
> +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_MEMORY != PCI_COMMAND_MEMORY);
> +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_IO);
> +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_MASTER);
>      s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
>      if ( s )
>          allow_memory_relocate = strtoll(s, NULL, 0);
> @@ -289,9 +293,22 @@ void pci_setup(void)
>                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
>          }
>  
> -        /* Enable bus mastering. */
> +        /*
> +         * Disable bus mastering, memory and I/O space, which is typical device
> +         * reset state. It is recommended that BAR programming be done whilst
> +         * decode bits are cleared to avoid incorrect mappings being created,
> +         * when 64-bit memory BAR is programmed first by writing the lower half
> +         * and then the upper half, which first maps to an address under 4G
> +         * replacing any RAM mapped in that address, which is not restored
> +         * back after the upper half is written and PCI memory is correctly
> +         * mapped to its intended high mem address.
> +         *
> +         * Capture the state of bus master to restore it back once BAR
> +         * programming is completed.
> +         */
>          cmd = pci_readw(devfn, PCI_COMMAND);
> -        cmd |= PCI_COMMAND_MASTER;
> +        pci_devfn_decode_type[devfn] = cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
> +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
>          pci_writew(devfn, PCI_COMMAND, cmd);
>      }
>  
> @@ -503,10 +520,9 @@ void pci_setup(void)
>          if ( (bar_reg == PCI_ROM_ADDRESS) ||
>               ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>                PCI_BASE_ADDRESS_SPACE_MEMORY) )
> -            cmd |= PCI_COMMAND_MEMORY;
> +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
>          else
> -            cmd |= PCI_COMMAND_IO;
> -        pci_writew(devfn, PCI_COMMAND, cmd);
> +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
>      }
>  
>      if ( pci_hi_mem_start )
> @@ -526,10 +542,13 @@ void pci_setup(void)
>           * has IO enabled, even if there is no I/O BAR on that
>           * particular device.
>           */
> -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
> -        cmd |= PCI_COMMAND_IO;
> -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
> +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
>      }
> +
> +    /* Enable memory and I/O space. Restore saved BUS MASTER state */
> +    for ( devfn = 0; devfn < 256; devfn++ )
> +        if ( pci_devfn_decode_type[devfn] )
> +            pci_writew(devfn, PCI_COMMAND, pci_devfn_decode_type[devfn]);

Why don't you enable the decoding after done with programming all the
BARs on the device in the loop above? Is there any reason to defer
this until all devices have been programmed?

If so, I think you would also need to introduce a pre-loop that
disables all of this for all devices before programming the BARs, or
else you are still programming BARs while some devices might have the
bus mastering or decoding bits enabled.

Thanks, Roger.
Roger Pau Monné April 14, 2020, 8:20 a.m. UTC | #3
On Tue, Apr 14, 2020 at 10:10:09AM +0200, Roger Pau Monné wrote:
> On Mon, Apr 13, 2020 at 09:33:42PM +0000, Harsha Shamsundara Havanur wrote:
> > It was observed that PCI MMIO and/or IO BARs were programmed with
> > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > register) enabled, during PCI setup phase. This resulted in
> > incorrect memory mapping as soon as the lower half of the 64 bit bar
> > is programmed, which displaced any RAM mappings under 4G. After the
> > upper half is programmed PCI memory mapping is restored to its
> > intended mapping but the RAM displaced is not restored. The OS then
> > continues to boot and function until it tries to access the displaced
> > RAM at which point it suffers a page fault and crashes.
> > 
> > This patch address the issue by deferring enablement of memory and
> > I/O decode in command register until all the resources, like interrupts
> > I/O and/or MMIO BARs for all the PCI device functions are programmed.
> > 
> > Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> > Reviewed-by: Paul Durrant <pdurrant@amazon.com>
> > Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  tools/firmware/hvmloader/pci.c | 35 +++++++++++++++++++++++++++--------
> >  1 file changed, 27 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> > index 0b708bf578..f74471b255 100644
> > --- a/tools/firmware/hvmloader/pci.c
> > +++ b/tools/firmware/hvmloader/pci.c
> > @@ -84,6 +84,7 @@ void pci_setup(void)
> >      uint32_t vga_devfn = 256;
> >      uint16_t class, vendor_id, device_id;
> >      unsigned int bar, pin, link, isa_irq;
> > +    uint8_t pci_devfn_decode_type[256] = {};
> >  
> >      /* Resources assignable to PCI devices via BARs. */
> >      struct resource {
> > @@ -120,6 +121,9 @@ void pci_setup(void)
> >       */
> >      bool allow_memory_relocate = 1;
> >  
> > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_MEMORY != PCI_COMMAND_MEMORY);
> > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_IO);
> > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_MASTER);
> >      s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
> >      if ( s )
> >          allow_memory_relocate = strtoll(s, NULL, 0);
> > @@ -289,9 +293,22 @@ void pci_setup(void)
> >                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
> >          }
> >  
> > -        /* Enable bus mastering. */
> > +        /*
> > +         * Disable bus mastering, memory and I/O space, which is typical device
> > +         * reset state. It is recommended that BAR programming be done whilst
> > +         * decode bits are cleared to avoid incorrect mappings being created,
> > +         * when 64-bit memory BAR is programmed first by writing the lower half
> > +         * and then the upper half, which first maps to an address under 4G
> > +         * replacing any RAM mapped in that address, which is not restored
> > +         * back after the upper half is written and PCI memory is correctly
> > +         * mapped to its intended high mem address.
> > +         *
> > +         * Capture the state of bus master to restore it back once BAR
> > +         * programming is completed.
> > +         */
> >          cmd = pci_readw(devfn, PCI_COMMAND);
> > -        cmd |= PCI_COMMAND_MASTER;
> > +        pci_devfn_decode_type[devfn] = cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
> > +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
> >          pci_writew(devfn, PCI_COMMAND, cmd);
> >      }
> >  
> > @@ -503,10 +520,9 @@ void pci_setup(void)
> >          if ( (bar_reg == PCI_ROM_ADDRESS) ||
> >               ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> >                PCI_BASE_ADDRESS_SPACE_MEMORY) )
> > -            cmd |= PCI_COMMAND_MEMORY;
> > +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
> >          else
> > -            cmd |= PCI_COMMAND_IO;
> > -        pci_writew(devfn, PCI_COMMAND, cmd);
> > +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
> >      }
> >  
> >      if ( pci_hi_mem_start )
> > @@ -526,10 +542,13 @@ void pci_setup(void)
> >           * has IO enabled, even if there is no I/O BAR on that
> >           * particular device.
> >           */
> > -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
> > -        cmd |= PCI_COMMAND_IO;
> > -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
> > +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
> >      }
> > +
> > +    /* Enable memory and I/O space. Restore saved BUS MASTER state */
> > +    for ( devfn = 0; devfn < 256; devfn++ )
> > +        if ( pci_devfn_decode_type[devfn] )
> > +            pci_writew(devfn, PCI_COMMAND, pci_devfn_decode_type[devfn]);
> 
> Why don't you enable the decoding after done with programming all the
> BARs on the device in the loop above? Is there any reason to defer
> this until all devices have been programmed?
> 
> If so, I think you would also need to introduce a pre-loop that
> disables all of this for all devices before programming the BARs, or
> else you are still programming BARs while some devices might have the
> bus mastering or decoding bits enabled.

Oh, forget that last paragraph, I see that decoding is indeed disabled
before programming any devices BARs. I still think that it might be
feasible to enable it once all BARs on the device have been
programmed, which would allow to get rid of the extra loop and the
pci_devfn_decode_type local variable.

Thanks, Roger.
Shamsundara Havanur, Harsha April 14, 2020, 8:54 a.m. UTC | #4
On Tue, 2020-04-14 at 10:20 +0200, Roger Pau Monné wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> 
> On Tue, Apr 14, 2020 at 10:10:09AM +0200, Roger Pau Monné wrote:
> > On Mon, Apr 13, 2020 at 09:33:42PM +0000, Harsha Shamsundara
> > Havanur wrote:
> > > It was observed that PCI MMIO and/or IO BARs were programmed with
> > > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > > register) enabled, during PCI setup phase. This resulted in
> > > incorrect memory mapping as soon as the lower half of the 64 bit
> > > bar
> > > is programmed, which displaced any RAM mappings under 4G. After
> > > the
> > > upper half is programmed PCI memory mapping is restored to its
> > > intended mapping but the RAM displaced is not restored. The OS
> > > then
> > > continues to boot and function until it tries to access the
> > > displaced
> > > RAM at which point it suffers a page fault and crashes.
> > > 
> > > This patch address the issue by deferring enablement of memory
> > > and
> > > I/O decode in command register until all the resources, like
> > > interrupts
> > > I/O and/or MMIO BARs for all the PCI device functions are
> > > programmed.
> > > 
> > > Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> > > Reviewed-by: Paul Durrant <pdurrant@amazon.com>
> > > Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > ---
> > >  tools/firmware/hvmloader/pci.c | 35 +++++++++++++++++++++++++++-
> > > -------
> > >  1 file changed, 27 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/tools/firmware/hvmloader/pci.c
> > > b/tools/firmware/hvmloader/pci.c
> > > index 0b708bf578..f74471b255 100644
> > > --- a/tools/firmware/hvmloader/pci.c
> > > +++ b/tools/firmware/hvmloader/pci.c
> > > @@ -84,6 +84,7 @@ void pci_setup(void)
> > >      uint32_t vga_devfn = 256;
> > >      uint16_t class, vendor_id, device_id;
> > >      unsigned int bar, pin, link, isa_irq;
> > > +    uint8_t pci_devfn_decode_type[256] = {};
> > > 
> > >      /* Resources assignable to PCI devices via BARs. */
> > >      struct resource {
> > > @@ -120,6 +121,9 @@ void pci_setup(void)
> > >       */
> > >      bool allow_memory_relocate = 1;
> > > 
> > > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_MEM
> > > ORY != PCI_COMMAND_MEMORY);
> > > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO
> > > != PCI_COMMAND_IO);
> > > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO
> > > != PCI_COMMAND_MASTER);
> > >      s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
> > >      if ( s )
> > >          allow_memory_relocate = strtoll(s, NULL, 0);
> > > @@ -289,9 +293,22 @@ void pci_setup(void)
> > >                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
> > >          }
> > > 
> > > -        /* Enable bus mastering. */
> > > +        /*
> > > +         * Disable bus mastering, memory and I/O space, which is
> > > typical device
> > > +         * reset state. It is recommended that BAR programming
> > > be done whilst
> > > +         * decode bits are cleared to avoid incorrect mappings
> > > being created,
> > > +         * when 64-bit memory BAR is programmed first by writing
> > > the lower half
> > > +         * and then the upper half, which first maps to an
> > > address under 4G
> > > +         * replacing any RAM mapped in that address, which is
> > > not restored
> > > +         * back after the upper half is written and PCI memory
> > > is correctly
> > > +         * mapped to its intended high mem address.
> > > +         *
> > > +         * Capture the state of bus master to restore it back
> > > once BAR
> > > +         * programming is completed.
> > > +         */
> > >          cmd = pci_readw(devfn, PCI_COMMAND);
> > > -        cmd |= PCI_COMMAND_MASTER;
> > > +        pci_devfn_decode_type[devfn] = cmd &
> > > ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
> > > +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY |
> > > PCI_COMMAND_IO);
> > >          pci_writew(devfn, PCI_COMMAND, cmd);
> > >      }
> > > 
> > > @@ -503,10 +520,9 @@ void pci_setup(void)
> > >          if ( (bar_reg == PCI_ROM_ADDRESS) ||
> > >               ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> > >                PCI_BASE_ADDRESS_SPACE_MEMORY) )
> > > -            cmd |= PCI_COMMAND_MEMORY;
> > > +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
> > >          else
> > > -            cmd |= PCI_COMMAND_IO;
> > > -        pci_writew(devfn, PCI_COMMAND, cmd);
> > > +            pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
> > >      }
> > > 
> > >      if ( pci_hi_mem_start )
> > > @@ -526,10 +542,13 @@ void pci_setup(void)
> > >           * has IO enabled, even if there is no I/O BAR on that
> > >           * particular device.
> > >           */
> > > -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
> > > -        cmd |= PCI_COMMAND_IO;
> > > -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
> > > +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
> > >      }
> > > +
> > > +    /* Enable memory and I/O space. Restore saved BUS MASTER
> > > state */
> > > +    for ( devfn = 0; devfn < 256; devfn++ )
> > > +        if ( pci_devfn_decode_type[devfn] )
> > > +            pci_writew(devfn, PCI_COMMAND,
> > > pci_devfn_decode_type[devfn]);
> > 
> > Why don't you enable the decoding after done with programming all
> > the
> > BARs on the device in the loop above? Is there any reason to defer
> > this until all devices have been programmed?
> > 
> > If so, I think you would also need to introduce a pre-loop that
> > disables all of this for all devices before programming the BARs,
> > or
> > else you are still programming BARs while some devices might have
> > the
> > bus mastering or decoding bits enabled.
> 
> Oh, forget that last paragraph, I see that decoding is indeed
> disabled
> before programming any devices BARs. I still think that it might be
> feasible to enable it once all BARs on the device have been
> programmed, which would allow to get rid of the extra loop and the
> pci_devfn_decode_type local variable.

BARs are programmed sorted by their memory requirement and thus same
pci function could be programmed multiple times in this loop
422     /* Assign iomem and ioport resources in descending order of
size. */
423     for ( i = 0; i < nr_bars; i++ )

Hence I am waiting for this loop to be completed to enable decode bits
in a saparate loop later.
> 
> Thanks, Roger.
Shamsundara Havanur, Harsha April 14, 2020, 9 a.m. UTC | #5
On Tue, 2020-04-14 at 09:42 +0200, Jan Beulich wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> 
> On 13.04.2020 23:33, Harsha Shamsundara Havanur wrote:
> > It was observed that PCI MMIO and/or IO BARs were programmed with
> > BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> > register) enabled, during PCI setup phase. This resulted in
> > incorrect memory mapping as soon as the lower half of the 64 bit
> > bar
> > is programmed, which displaced any RAM mappings under 4G. After the
> > upper half is programmed PCI memory mapping is restored to its
> > intended mapping but the RAM displaced is not restored. The OS then
> > continues to boot and function until it tries to access the
> > displaced
> > RAM at which point it suffers a page fault and crashes.
> > 
> > This patch address the issue by deferring enablement of memory and
> > I/O decode in command register until all the resources, like
> > interrupts
> > I/O and/or MMIO BARs for all the PCI device functions are
> > programmed.
> > 
> > Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
> > Reviewed-by: Paul Durrant <pdurrant@amazon.com>
> > Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  tools/firmware/hvmloader/pci.c | 35 +++++++++++++++++++++++++++---
> > -----
> >  1 file changed, 27 insertions(+), 8 deletions(-)
> 
> There not being any description of what has changed in v2, I also
> can't easily judge whether keeping the two tags above was
> legitimate. In any event you don't seem to have taken care of all
> review feedback (whether by making changes to the patch or by
> replying verbally).
> 
> > --- a/tools/firmware/hvmloader/pci.c
> > +++ b/tools/firmware/hvmloader/pci.c
> > @@ -84,6 +84,7 @@ void pci_setup(void)
> >      uint32_t vga_devfn = 256;
> >      uint16_t class, vendor_id, device_id;
> >      unsigned int bar, pin, link, isa_irq;
> > +    uint8_t pci_devfn_decode_type[256] = {};
> > 
> >      /* Resources assignable to PCI devices via BARs. */
> >      struct resource {
> > @@ -120,6 +121,9 @@ void pci_setup(void)
> >       */
> >      bool allow_memory_relocate = 1;
> > 
> > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_MEMOR
> > Y != PCI_COMMAND_MEMORY);
> > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO !=
> > PCI_COMMAND_IO);
> > +    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO !=
> > PCI_COMMAND_MASTER);
> 
This is a mistake, In fact it should be compared against unit8 as I am
storing whole command value later in the loop.

> This looks like a copy-and-paste mistake - are you sure you've
> build-tested this? (This alone likely invalidates the tags, as
> per above.)
> 
> > @@ -289,9 +293,22 @@ void pci_setup(void)
> >                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
> >          }
> > 
> > -        /* Enable bus mastering. */
> > +        /*
> > +         * Disable bus mastering, memory and I/O space, which is
> > typical device
> > +         * reset state. It is recommended that BAR programming be
> > done whilst
> > +         * decode bits are cleared to avoid incorrect mappings
> > being created,
> > +         * when 64-bit memory BAR is programmed first by writing
> > the lower half
> > +         * and then the upper half, which first maps to an address
> > under 4G
> > +         * replacing any RAM mapped in that address, which is not
> > restored
> > +         * back after the upper half is written and PCI memory is
> > correctly
> > +         * mapped to its intended high mem address.
> > +         *
> > +         * Capture the state of bus master to restore it back once
> > BAR
> > +         * programming is completed.
> > +         */
> >          cmd = pci_readw(devfn, PCI_COMMAND);
> > -        cmd |= PCI_COMMAND_MASTER;
> > +        pci_devfn_decode_type[devfn] = cmd & ~(PCI_COMMAND_MEMORY
> > | PCI_COMMAND_IO);
> > +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY |
> > PCI_COMMAND_IO);
> 
> The disabling of MASTER was put under question in v1 already.

Disabling of MASTER is done whilst programming BARs and it is restored
back to its previous value in the loop at the end of pci_setup
function.

> 
> > @@ -526,10 +542,13 @@ void pci_setup(void)
> >           * has IO enabled, even if there is no I/O BAR on that
> >           * particular device.
> >           */
> > -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
> > -        cmd |= PCI_COMMAND_IO;
> > -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
> > +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
> >      }
> > +
> > +    /* Enable memory and I/O space. Restore saved BUS MASTER state
> > */
> > +    for ( devfn = 0; devfn < 256; devfn++ )
> > +        if ( pci_devfn_decode_type[devfn] )
> > +            pci_writew(devfn, PCI_COMMAND,
> > pci_devfn_decode_type[devfn]);
> 
> You effectively clear the upper 8 bits here, rather than retaining
> them.
> 
In fact cmd is a 32 bit value and I am retaining only lower 8 bits of
it. This will be corrected in v3.
> Jan
Jan Beulich April 14, 2020, 9:14 a.m. UTC | #6
On 14.04.2020 11:00, Shamsundara Havanur, Harsha wrote:
> On Tue, 2020-04-14 at 09:42 +0200, Jan Beulich wrote:
>> On 13.04.2020 23:33, Harsha Shamsundara Havanur wrote:
>>> @@ -289,9 +293,22 @@ void pci_setup(void)
>>>                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
>>>          }
>>>
>>> -        /* Enable bus mastering. */
>>> +        /*
>>> +         * Disable bus mastering, memory and I/O space, which is
>>> typical device
>>> +         * reset state. It is recommended that BAR programming be
>>> done whilst
>>> +         * decode bits are cleared to avoid incorrect mappings
>>> being created,
>>> +         * when 64-bit memory BAR is programmed first by writing
>>> the lower half
>>> +         * and then the upper half, which first maps to an address
>>> under 4G
>>> +         * replacing any RAM mapped in that address, which is not
>>> restored
>>> +         * back after the upper half is written and PCI memory is
>>> correctly
>>> +         * mapped to its intended high mem address.
>>> +         *
>>> +         * Capture the state of bus master to restore it back once
>>> BAR
>>> +         * programming is completed.
>>> +         */
>>>          cmd = pci_readw(devfn, PCI_COMMAND);
>>> -        cmd |= PCI_COMMAND_MASTER;
>>> +        pci_devfn_decode_type[devfn] = cmd & ~(PCI_COMMAND_MEMORY
>>> | PCI_COMMAND_IO);
>>> +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY |
>>> PCI_COMMAND_IO);
>>
>> The disabling of MASTER was put under question in v1 already.
> 
> Disabling of MASTER is done whilst programming BARs and it is restored
> back to its previous value in the loop at the end of pci_setup
> function.

Yet didn't Andrew indicate he knows of devices which get upset if
MASTER _ever_ gets cleared?

>>> @@ -526,10 +542,13 @@ void pci_setup(void)
>>>           * has IO enabled, even if there is no I/O BAR on that
>>>           * particular device.
>>>           */
>>> -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
>>> -        cmd |= PCI_COMMAND_IO;
>>> -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
>>> +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
>>>      }
>>> +
>>> +    /* Enable memory and I/O space. Restore saved BUS MASTER state
>>> */
>>> +    for ( devfn = 0; devfn < 256; devfn++ )
>>> +        if ( pci_devfn_decode_type[devfn] )
>>> +            pci_writew(devfn, PCI_COMMAND,
>>> pci_devfn_decode_type[devfn]);
>>
>> You effectively clear the upper 8 bits here, rather than retaining
>> them.
>>
> In fact cmd is a 32 bit value and I am retaining only lower 8 bits of
> it. This will be corrected in v3.

PCI_COMMAND is a 16-bit field. The adjacent 16 bits in the same 32 bit
word are PCI_STATUS.

Jan
Shamsundara Havanur, Harsha April 14, 2020, 9:22 a.m. UTC | #7
On Tue, 2020-04-14 at 11:14 +0200, Jan Beulich wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> 
> On 14.04.2020 11:00, Shamsundara Havanur, Harsha wrote:
> > On Tue, 2020-04-14 at 09:42 +0200, Jan Beulich wrote:
> > > On 13.04.2020 23:33, Harsha Shamsundara Havanur wrote:
> > > > @@ -289,9 +293,22 @@ void pci_setup(void)
> > > >                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
> > > >          }
> > > > 
> > > > -        /* Enable bus mastering. */
> > > > +        /*
> > > > +         * Disable bus mastering, memory and I/O space, which
> > > > is
> > > > typical device
> > > > +         * reset state. It is recommended that BAR programming
> > > > be
> > > > done whilst
> > > > +         * decode bits are cleared to avoid incorrect mappings
> > > > being created,
> > > > +         * when 64-bit memory BAR is programmed first by
> > > > writing
> > > > the lower half
> > > > +         * and then the upper half, which first maps to an
> > > > address
> > > > under 4G
> > > > +         * replacing any RAM mapped in that address, which is
> > > > not
> > > > restored
> > > > +         * back after the upper half is written and PCI memory
> > > > is
> > > > correctly
> > > > +         * mapped to its intended high mem address.
> > > > +         *
> > > > +         * Capture the state of bus master to restore it back
> > > > once
> > > > BAR
> > > > +         * programming is completed.
> > > > +         */
> > > >          cmd = pci_readw(devfn, PCI_COMMAND);
> > > > -        cmd |= PCI_COMMAND_MASTER;
> > > > +        pci_devfn_decode_type[devfn] = cmd &
> > > > ~(PCI_COMMAND_MEMORY
> > > > > PCI_COMMAND_IO);
> > > > 
> > > > +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY |
> > > > PCI_COMMAND_IO);
> > > 
> > > The disabling of MASTER was put under question in v1 already.
> > 
> > Disabling of MASTER is done whilst programming BARs and it is
> > restored
> > back to its previous value in the loop at the end of pci_setup
> > function.
> 
> Yet didn't Andrew indicate he knows of devices which get upset if
> MASTER _ever_ gets cleared?

Previous commit enabled MASTER for all functions. I am bit confused
here on the consensus on enabling/disabling/retaining BME.
Should we even care about MASTER?

> 
> > > > @@ -526,10 +542,13 @@ void pci_setup(void)
> > > >           * has IO enabled, even if there is no I/O BAR on that
> > > >           * particular device.
> > > >           */
> > > > -        cmd = pci_readw(vga_devfn, PCI_COMMAND);
> > > > -        cmd |= PCI_COMMAND_IO;
> > > > -        pci_writew(vga_devfn, PCI_COMMAND, cmd);
> > > > +        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
> > > >      }
> > > > +
> > > > +    /* Enable memory and I/O space. Restore saved BUS MASTER
> > > > state
> > > > */
> > > > +    for ( devfn = 0; devfn < 256; devfn++ )
> > > > +        if ( pci_devfn_decode_type[devfn] )
> > > > +            pci_writew(devfn, PCI_COMMAND,
> > > > pci_devfn_decode_type[devfn]);
> > > 
> > > You effectively clear the upper 8 bits here, rather than
> > > retaining
> > > them.
> > > 
> > 
> > In fact cmd is a 32 bit value and I am retaining only lower 8 bits
> > of
> > it. This will be corrected in v3.
> 
> PCI_COMMAND is a 16-bit field. The adjacent 16 bits in the same 32
> bit
> word are PCI_STATUS.
> 
> Jan
Jan Beulich April 14, 2020, 9:29 a.m. UTC | #8
On 14.04.2020 11:22, Shamsundara Havanur, Harsha wrote:
> On Tue, 2020-04-14 at 11:14 +0200, Jan Beulich wrote:
>> CAUTION: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender
>> and know the content is safe.
>>
>>
>>
>> On 14.04.2020 11:00, Shamsundara Havanur, Harsha wrote:
>>> On Tue, 2020-04-14 at 09:42 +0200, Jan Beulich wrote:
>>>> On 13.04.2020 23:33, Harsha Shamsundara Havanur wrote:
>>>>> @@ -289,9 +293,22 @@ void pci_setup(void)
>>>>>                     devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
>>>>>          }
>>>>>
>>>>> -        /* Enable bus mastering. */
>>>>> +        /*
>>>>> +         * Disable bus mastering, memory and I/O space, which
>>>>> is
>>>>> typical device
>>>>> +         * reset state. It is recommended that BAR programming
>>>>> be
>>>>> done whilst
>>>>> +         * decode bits are cleared to avoid incorrect mappings
>>>>> being created,
>>>>> +         * when 64-bit memory BAR is programmed first by
>>>>> writing
>>>>> the lower half
>>>>> +         * and then the upper half, which first maps to an
>>>>> address
>>>>> under 4G
>>>>> +         * replacing any RAM mapped in that address, which is
>>>>> not
>>>>> restored
>>>>> +         * back after the upper half is written and PCI memory
>>>>> is
>>>>> correctly
>>>>> +         * mapped to its intended high mem address.
>>>>> +         *
>>>>> +         * Capture the state of bus master to restore it back
>>>>> once
>>>>> BAR
>>>>> +         * programming is completed.
>>>>> +         */
>>>>>          cmd = pci_readw(devfn, PCI_COMMAND);
>>>>> -        cmd |= PCI_COMMAND_MASTER;
>>>>> +        pci_devfn_decode_type[devfn] = cmd &
>>>>> ~(PCI_COMMAND_MEMORY
>>>>>> PCI_COMMAND_IO);
>>>>>
>>>>> +        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY |
>>>>> PCI_COMMAND_IO);
>>>>
>>>> The disabling of MASTER was put under question in v1 already.
>>>
>>> Disabling of MASTER is done whilst programming BARs and it is
>>> restored
>>> back to its previous value in the loop at the end of pci_setup
>>> function.
>>
>> Yet didn't Andrew indicate he knows of devices which get upset if
>> MASTER _ever_ gets cleared?
> 
> Previous commit enabled MASTER for all functions. I am bit confused
> here on the consensus on enabling/disabling/retaining BME.
> Should we even care about MASTER?

With the commit introducing its universal setting, I'm afraid to
avoid regressions we can't sensibly alter the behavior unless it
can be explained clearly why the original change must have been
outright wrong.

Jan
Paul Durrant April 14, 2020, 11:01 a.m. UTC | #9
> -----Original Message-----
> >
> > Previous commit enabled MASTER for all functions. I am bit confused
> > here on the consensus on enabling/disabling/retaining BME.
> > Should we even care about MASTER?
> 
> With the commit introducing its universal setting, I'm afraid to
> avoid regressions we can't sensibly alter the behavior unless it
> can be explained clearly why the original change must have been
> outright wrong.
> 

Well the original code IIRC had no justification for setting BME and doing it unconditionally does seem dangerous. Could we at least make it configurable?

  Paul
Jan Beulich April 14, 2020, 11:39 a.m. UTC | #10
On 14.04.2020 13:01, Paul Durrant wrote:
>> -----Original Message-----
>>>
>>> Previous commit enabled MASTER for all functions. I am bit confused
>>> here on the consensus on enabling/disabling/retaining BME.
>>> Should we even care about MASTER?
>>
>> With the commit introducing its universal setting, I'm afraid to
>> avoid regressions we can't sensibly alter the behavior unless it
>> can be explained clearly why the original change must have been
>> outright wrong.
>>
> 
> Well the original code IIRC had no justification for setting BME
> and doing it unconditionally does seem dangerous.

I'm not viewing this as dangerous, merely as (typically) pointless.
A well behaved device won't start issuing DMA requests merely
because it had its bus mastering capability enabled. (And in the
context of some IOMMU work of yours you actually stated there are
devices where clearing of this bit won't stop them from doing so.)

> Could we at least make it configurable?
Well, the main question then would be - configurable by which
mechanism?

Jan
Paul Durrant April 14, 2020, 11:58 a.m. UTC | #11
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 14 April 2020 12:40
> To: paul@xen.org
> Cc: 'Shamsundara Havanur, Harsha' <havanur@amazon.com>; xen-devel@lists.xenproject.org;
> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; wl@xen.org; roger.pau@citrix.com
> Subject: Re: [XEN PATCH v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> 
> On 14.04.2020 13:01, Paul Durrant wrote:
> >> -----Original Message-----
> >>>
> >>> Previous commit enabled MASTER for all functions. I am bit confused
> >>> here on the consensus on enabling/disabling/retaining BME.
> >>> Should we even care about MASTER?
> >>
> >> With the commit introducing its universal setting, I'm afraid to
> >> avoid regressions we can't sensibly alter the behavior unless it
> >> can be explained clearly why the original change must have been
> >> outright wrong.
> >>
> >
> > Well the original code IIRC had no justification for setting BME
> > and doing it unconditionally does seem dangerous.
> 
> I'm not viewing this as dangerous, merely as (typically) pointless.
> A well behaved device won't start issuing DMA requests merely
> because it had its bus mastering capability enabled. (And in the
> context of some IOMMU work of yours you actually stated there are
> devices where clearing of this bit won't stop them from doing so.)
> 

It's a line of defence against some devices at least, but others may choke. I still think it should be cleared by default and turned on with quirks if that is necessary.

> > Could we at least make it configurable?
> Well, the main question then would be - configurable by which
> mechanism?
> 

The usual for hvmloader... a xenstore platform key.

  Paul
Jan Beulich April 14, 2020, 1:42 p.m. UTC | #12
On 14.04.2020 13:58, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 14 April 2020 12:40
>> To: paul@xen.org
>> Cc: 'Shamsundara Havanur, Harsha' <havanur@amazon.com>; xen-devel@lists.xenproject.org;
>> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; wl@xen.org; roger.pau@citrix.com
>> Subject: Re: [XEN PATCH v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation
>>
>> On 14.04.2020 13:01, Paul Durrant wrote:
>>>> -----Original Message-----
>>>>>
>>>>> Previous commit enabled MASTER for all functions. I am bit confused
>>>>> here on the consensus on enabling/disabling/retaining BME.
>>>>> Should we even care about MASTER?
>>>>
>>>> With the commit introducing its universal setting, I'm afraid to
>>>> avoid regressions we can't sensibly alter the behavior unless it
>>>> can be explained clearly why the original change must have been
>>>> outright wrong.
>>>>
>>>
>>> Well the original code IIRC had no justification for setting BME
>>> and doing it unconditionally does seem dangerous.
>>
>> I'm not viewing this as dangerous, merely as (typically) pointless.
>> A well behaved device won't start issuing DMA requests merely
>> because it had its bus mastering capability enabled. (And in the
>> context of some IOMMU work of yours you actually stated there are
>> devices where clearing of this bit won't stop them from doing so.)
>>
> 
> It's a line of defence against some devices at least,

What defence? Once we're past hvmloader, the guest can do whatever it
wants anyway.

Jan
Paul Durrant April 14, 2020, 1:51 p.m. UTC | #13
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 14 April 2020 14:43
> To: paul@xen.org
> Cc: 'Shamsundara Havanur, Harsha' <havanur@amazon.com>; xen-devel@lists.xenproject.org;
> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; wl@xen.org; roger.pau@citrix.com
> Subject: Re: [XEN PATCH v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> 
> On 14.04.2020 13:58, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 14 April 2020 12:40
> >> To: paul@xen.org
> >> Cc: 'Shamsundara Havanur, Harsha' <havanur@amazon.com>; xen-devel@lists.xenproject.org;
> >> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; wl@xen.org; roger.pau@citrix.com
> >> Subject: Re: [XEN PATCH v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation
> >>
> >> On 14.04.2020 13:01, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>>>
> >>>>> Previous commit enabled MASTER for all functions. I am bit confused
> >>>>> here on the consensus on enabling/disabling/retaining BME.
> >>>>> Should we even care about MASTER?
> >>>>
> >>>> With the commit introducing its universal setting, I'm afraid to
> >>>> avoid regressions we can't sensibly alter the behavior unless it
> >>>> can be explained clearly why the original change must have been
> >>>> outright wrong.
> >>>>
> >>>
> >>> Well the original code IIRC had no justification for setting BME
> >>> and doing it unconditionally does seem dangerous.
> >>
> >> I'm not viewing this as dangerous, merely as (typically) pointless.
> >> A well behaved device won't start issuing DMA requests merely
> >> because it had its bus mastering capability enabled. (And in the
> >> context of some IOMMU work of yours you actually stated there are
> >> devices where clearing of this bit won't stop them from doing so.)
> >>
> >
> > It's a line of defence against some devices at least,
> 
> What defence? Once we're past hvmloader, the guest can do whatever it
> wants anyway.
> 

My observation is that it is typically the device function driver that will enable BME, and that may come after a device-specific reset has been done. So, the chances of the VM surviving in the face of buggy h/w is higher if we don't blindly enable BME early on.

  Paul
Jan Beulich April 14, 2020, 2:13 p.m. UTC | #14
On 14.04.2020 15:51, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 14 April 2020 14:43
>> To: paul@xen.org
>> Cc: 'Shamsundara Havanur, Harsha' <havanur@amazon.com>; xen-devel@lists.xenproject.org;
>> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; wl@xen.org; roger.pau@citrix.com
>> Subject: Re: [XEN PATCH v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation
>>
>> On 14.04.2020 13:58, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 14 April 2020 12:40
>>>> To: paul@xen.org
>>>> Cc: 'Shamsundara Havanur, Harsha' <havanur@amazon.com>; xen-devel@lists.xenproject.org;
>>>> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; wl@xen.org; roger.pau@citrix.com
>>>> Subject: Re: [XEN PATCH v2] hvmloader: Enable MMIO and I/O decode, after all resource allocation
>>>>
>>>> On 14.04.2020 13:01, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>>>
>>>>>>> Previous commit enabled MASTER for all functions. I am bit confused
>>>>>>> here on the consensus on enabling/disabling/retaining BME.
>>>>>>> Should we even care about MASTER?
>>>>>>
>>>>>> With the commit introducing its universal setting, I'm afraid to
>>>>>> avoid regressions we can't sensibly alter the behavior unless it
>>>>>> can be explained clearly why the original change must have been
>>>>>> outright wrong.
>>>>>>
>>>>>
>>>>> Well the original code IIRC had no justification for setting BME
>>>>> and doing it unconditionally does seem dangerous.
>>>>
>>>> I'm not viewing this as dangerous, merely as (typically) pointless.
>>>> A well behaved device won't start issuing DMA requests merely
>>>> because it had its bus mastering capability enabled. (And in the
>>>> context of some IOMMU work of yours you actually stated there are
>>>> devices where clearing of this bit won't stop them from doing so.)
>>>>
>>>
>>> It's a line of defence against some devices at least,
>>
>> What defence? Once we're past hvmloader, the guest can do whatever it
>> wants anyway.
>>
> 
> My observation is that it is typically the device function driver
> that will enable BME, and that may come after a device-specific
> reset has been done. So, the chances of the VM surviving in the
> face of buggy h/w is higher if we don't blindly enable BME early
> on.

Well, I'd agree if only we knew the reason for the introduction of
this, many years ago.

Jan
diff mbox series

Patch

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 0b708bf578..f74471b255 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -84,6 +84,7 @@  void pci_setup(void)
     uint32_t vga_devfn = 256;
     uint16_t class, vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
+    uint8_t pci_devfn_decode_type[256] = {};
 
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
@@ -120,6 +121,9 @@  void pci_setup(void)
      */
     bool allow_memory_relocate = 1;
 
+    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_MEMORY != PCI_COMMAND_MEMORY);
+    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_IO);
+    BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO != PCI_COMMAND_MASTER);
     s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
     if ( s )
         allow_memory_relocate = strtoll(s, NULL, 0);
@@ -289,9 +293,22 @@  void pci_setup(void)
                    devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
         }
 
-        /* Enable bus mastering. */
+        /*
+         * Disable bus mastering, memory and I/O space, which is typical device
+         * reset state. It is recommended that BAR programming be done whilst
+         * decode bits are cleared to avoid incorrect mappings being created,
+         * when 64-bit memory BAR is programmed first by writing the lower half
+         * and then the upper half, which first maps to an address under 4G
+         * replacing any RAM mapped in that address, which is not restored
+         * back after the upper half is written and PCI memory is correctly
+         * mapped to its intended high mem address.
+         *
+         * Capture the state of bus master to restore it back once BAR
+         * programming is completed.
+         */
         cmd = pci_readw(devfn, PCI_COMMAND);
-        cmd |= PCI_COMMAND_MASTER;
+        pci_devfn_decode_type[devfn] = cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
+        cmd &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
         pci_writew(devfn, PCI_COMMAND, cmd);
     }
 
@@ -503,10 +520,9 @@  void pci_setup(void)
         if ( (bar_reg == PCI_ROM_ADDRESS) ||
              ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
               PCI_BASE_ADDRESS_SPACE_MEMORY) )
-            cmd |= PCI_COMMAND_MEMORY;
+            pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
         else
-            cmd |= PCI_COMMAND_IO;
-        pci_writew(devfn, PCI_COMMAND, cmd);
+            pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
     }
 
     if ( pci_hi_mem_start )
@@ -526,10 +542,13 @@  void pci_setup(void)
          * has IO enabled, even if there is no I/O BAR on that
          * particular device.
          */
-        cmd = pci_readw(vga_devfn, PCI_COMMAND);
-        cmd |= PCI_COMMAND_IO;
-        pci_writew(vga_devfn, PCI_COMMAND, cmd);
+        pci_devfn_decode_type[vga_devfn] |= PCI_COMMAND_IO;
     }
+
+    /* Enable memory and I/O space. Restore saved BUS MASTER state */
+    for ( devfn = 0; devfn < 256; devfn++ )
+        if ( pci_devfn_decode_type[devfn] )
+            pci_writew(devfn, PCI_COMMAND, pci_devfn_decode_type[devfn]);
 }
 
 /*