Message ID | 20160114172645.23429.9938.stgit@gimli.home (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote: > We've done a pretty good job of abstracting EA from drivers, but there > are some properties of BAR Equivalent resources that don't really jive > with traditional PCI BARs. In particular, natural alignment is only > encouraged, not required. > > Why does this matter? There are drivers like vfio-pci that will > happily gobble up the EA abstraction that's been implemented and > expose a device using EA to userspace as if those resources are > traditional BARs. Pretty cool. The vfio API is bus agnostic, so it > doesn't care about alignment. The problem comes with PCI config space > emulation where we don't let userspace manipulate the BAR value, but > we do emulate BAR sizing. The abstraction kind of falls apart if > userspace gets garbage when they try to size what appears to be a > traditional BAR, but is actually a BAR equivalent. > > We could simply round up the size in vfio to make it naturally > aligned, but then we're imposing artificial sizes to the user and we > have the discontinuity that BAR size emulation and vfio region size > reporting don't agree on the size. I think what we want to do is > expose EA to the user, reporting traditional BARs with BEIs as > zero-sized and providing additional regions for the user to access > each EA region, whether it has a BEI or not. > > To facilitate that, a flag indicating whether a PCI resource is a > traditional BAR or BAR equivalent seems much nicer than attempting > to size the BAR ourselves or deducing it through the EA capability. If vfio does size the resource, EA entries that are aligned could still be emulated as BARs, correct? I would think that emulating a BAR would be preferred when possible, for backwards-compatibility. > Thoughts? I like the idea of adding an EA flag. There were some cases in the kernel where it would be nice to know if a resource was fixed because it was EA or if something else was fixing it. Adding that flag was discussed during the code review of the EA code, but it was decided that we could get by without it. IIRC, most of the cases that required the flag had to do with EA entries for bridges. Since bridge support wasn't added, we didn't need the flag. -Sean > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > drivers/pci/pci.c | 2 +- > include/linux/ioport.h | 2 ++ > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 314db8c..174c734 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > { > - unsigned long flags = IORESOURCE_PCI_FIXED; > + unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; > > switch (prop) { > case PCI_EA_P_MEM: > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > index 24bea08..5acc194 100644 > --- a/include/linux/ioport.h > +++ b/include/linux/ioport.h > @@ -105,6 +105,8 @@ struct resource { > /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ > #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ > > +/* PCI Enhanced Allocation defined BAR equivalent resource */ > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > /* helpers to define resources */ > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/14/2016 09:26 AM, Alex Williamson wrote: > We've done a pretty good job of abstracting EA from drivers, but there > are some properties of BAR Equivalent resources that don't really jive > with traditional PCI BARs. In particular, natural alignment is only > encouraged, not required. > > Why does this matter? There are drivers like vfio-pci that will > happily gobble up the EA abstraction that's been implemented and > expose a device using EA to userspace as if those resources are > traditional BARs. Pretty cool. The vfio API is bus agnostic, so it > doesn't care about alignment. The problem comes with PCI config space > emulation where we don't let userspace manipulate the BAR value, but > we do emulate BAR sizing. The abstraction kind of falls apart if > userspace gets garbage when they try to size what appears to be a > traditional BAR, but is actually a BAR equivalent. > > We could simply round up the size in vfio to make it naturally > aligned, but then we're imposing artificial sizes to the user and we > have the discontinuity that BAR size emulation and vfio region size > reporting don't agree on the size. I think what we want to do is > expose EA to the user, reporting traditional BARs with BEIs as > zero-sized and providing additional regions for the user to access > each EA region, whether it has a BEI or not. > > To facilitate that, a flag indicating whether a PCI resource is a > traditional BAR or BAR equivalent seems much nicer than attempting > to size the BAR ourselves or deducing it through the EA capability. > > Thoughts? Is the flag exposed to userspace in any way? I haven't dug into what uses the flags. One problem we have seen is with the lspci utility which cannot distinguish between SROIV BARs and EA provisioned BARs. Would, or could, this be used there? David Daney > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > drivers/pci/pci.c | 2 +- > include/linux/ioport.h | 2 ++ > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 314db8c..174c734 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > { > - unsigned long flags = IORESOURCE_PCI_FIXED; > + unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; > > switch (prop) { > case PCI_EA_P_MEM: > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > index 24bea08..5acc194 100644 > --- a/include/linux/ioport.h > +++ b/include/linux/ioport.h > @@ -105,6 +105,8 @@ struct resource { > /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ > #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ > > +/* PCI Enhanced Allocation defined BAR equivalent resource */ > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > /* helpers to define resources */ > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote: > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote: > > We've done a pretty good job of abstracting EA from drivers, but > > there > > are some properties of BAR Equivalent resources that don't really > > jive > > with traditional PCI BARs. In particular, natural alignment is > > only > > encouraged, not required. > > > > Why does this matter? There are drivers like vfio-pci that will > > happily gobble up the EA abstraction that's been implemented and > > expose a device using EA to userspace as if those resources are > > traditional BARs. Pretty cool. The vfio API is bus agnostic, so > > it > > doesn't care about alignment. The problem comes with PCI config > > space > > emulation where we don't let userspace manipulate the BAR value, > > but > > we do emulate BAR sizing. The abstraction kind of falls apart if > > userspace gets garbage when they try to size what appears to be a > > traditional BAR, but is actually a BAR equivalent. > > > > We could simply round up the size in vfio to make it naturally > > aligned, but then we're imposing artificial sizes to the user and > > we > > have the discontinuity that BAR size emulation and vfio region size > > reporting don't agree on the size. I think what we want to do is > > expose EA to the user, reporting traditional BARs with BEIs as > > zero-sized and providing additional regions for the user to access > > each EA region, whether it has a BEI or not. > > > > To facilitate that, a flag indicating whether a PCI resource is a > > traditional BAR or BAR equivalent seems much nicer than attempting > > to size the BAR ourselves or deducing it through the EA capability. > > If vfio does size the resource, EA entries that are aligned could > still be emulated as BARs, correct? > > I would think that emulating a BAR would be preferred when possible, > for backwards-compatibility. If a BEI is naturally aligned, I can't think of any problems with exposing it as a traditional BAR to userspace. I agree that there may be some compatibility benefits there, so it may be useful to offer both options. I don't think we can combine them though, it would violate the EA spec to expose the traditional BAR and and the matching BEI. We'd either need to hide the fake BAR or hide the EA entry defining that BEI. A module option could define which is preferred or maybe an ioctl. > > Thoughts? > > I like the idea of adding an EA flag. > > There were some cases in the kernel where it would be nice to know if a > resource was fixed because it was EA or if something else was fixing it. > Adding that flag was discussed during the code review of the EA code, > but it was decided that we could get by without it. > > IIRC, most of the cases that required the flag had to do with EA entries > for bridges. Since bridge support wasn't added, we didn't need the flag. By my reading of the spec, not all BEIs need to be fixed, is this just a simplification to avoid sizing and mapping a BAR that doesn't exist in the traditional sense? A flag on the resource seems like it would be useful for that as well if we ever wanted to add the case where an AE BAR equivalent could be remapped. Thanks, Alex > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > drivers/pci/pci.c | 2 +- > > include/linux/ioport.h | 2 ++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 314db8c..174c734 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > > { > > - unsigned long flags = IORESOURCE_PCI_FIXED; > > + unsigned long flags = IORESOURCE_PCI_FIXED | > > IORESOURCE_PCI_EA_BEI; > > > > switch (prop) { > > case PCI_EA_P_MEM: > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > > index 24bea08..5acc194 100644 > > --- a/include/linux/ioport.h > > +++ b/include/linux/ioport.h > > @@ -105,6 +105,8 @@ struct resource { > > /* PCI control bits. Shares IORESOURCE_BITS with above PCI > > ROM. */ > > #define IORESOURCE_PCI_FIXED (1<<4) /* Do > > not move resource */ > > > > +/* PCI Enhanced Allocation defined BAR equivalent resource */ > > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > > > /* helpers to define resources */ > > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) > > \ > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-01-14 at 10:54 -0800, David Daney wrote: > On 01/14/2016 09:26 AM, Alex Williamson wrote: > > We've done a pretty good job of abstracting EA from drivers, but > > there > > are some properties of BAR Equivalent resources that don't really > > jive > > with traditional PCI BARs. In particular, natural alignment is > > only > > encouraged, not required. > > > > Why does this matter? There are drivers like vfio-pci that will > > happily gobble up the EA abstraction that's been implemented and > > expose a device using EA to userspace as if those resources are > > traditional BARs. Pretty cool. The vfio API is bus agnostic, so > > it > > doesn't care about alignment. The problem comes with PCI config > > space > > emulation where we don't let userspace manipulate the BAR value, > > but > > we do emulate BAR sizing. The abstraction kind of falls apart if > > userspace gets garbage when they try to size what appears to be a > > traditional BAR, but is actually a BAR equivalent. > > > > We could simply round up the size in vfio to make it naturally > > aligned, but then we're imposing artificial sizes to the user and > > we > > have the discontinuity that BAR size emulation and vfio region size > > reporting don't agree on the size. I think what we want to do is > > expose EA to the user, reporting traditional BARs with BEIs as > > zero-sized and providing additional regions for the user to access > > each EA region, whether it has a BEI or not. > > > > To facilitate that, a flag indicating whether a PCI resource is a > > traditional BAR or BAR equivalent seems much nicer than attempting > > to size the BAR ourselves or deducing it through the EA capability. > > > > Thoughts? > > Is the flag exposed to userspace in any way? > > I haven't dug into what uses the flags. > > One problem we have seen is with the lspci utility which cannot > distinguish between SROIV BARs and EA provisioned BARs. > > Would, or could, this be used there? Perhaps so, the flags would be exposed in sysfs in /sys/bus/pci/devices/<device>/resource. Three fields are printed for each resource: start, end, and flags. That's definitely something lspci could consume. Thanks, Alex > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > drivers/pci/pci.c | 2 +- > > include/linux/ioport.h | 2 ++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 314db8c..174c734 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > > { > > - unsigned long flags = IORESOURCE_PCI_FIXED; > > + unsigned long flags = IORESOURCE_PCI_FIXED | > > IORESOURCE_PCI_EA_BEI; > > > > switch (prop) { > > case PCI_EA_P_MEM: > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > > index 24bea08..5acc194 100644 > > --- a/include/linux/ioport.h > > +++ b/include/linux/ioport.h > > @@ -105,6 +105,8 @@ struct resource { > > /* PCI control bits. Shares IORESOURCE_BITS with above PCI > > ROM. */ > > #define IORESOURCE_PCI_FIXED (1<<4) /* Do > > not move resource */ > > > > +/* PCI Enhanced Allocation defined BAR equivalent resource */ > > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > > > /* helpers to define resources */ > > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) > > \ > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2016 at 10:54:48AM -0800, David Daney wrote: > On 01/14/2016 09:26 AM, Alex Williamson wrote: > >We've done a pretty good job of abstracting EA from drivers, but there > >are some properties of BAR Equivalent resources that don't really jive > >with traditional PCI BARs. In particular, natural alignment is only > >encouraged, not required. > > > >Why does this matter? There are drivers like vfio-pci that will > >happily gobble up the EA abstraction that's been implemented and > >expose a device using EA to userspace as if those resources are > >traditional BARs. Pretty cool. The vfio API is bus agnostic, so it > >doesn't care about alignment. The problem comes with PCI config space > >emulation where we don't let userspace manipulate the BAR value, but > >we do emulate BAR sizing. The abstraction kind of falls apart if > >userspace gets garbage when they try to size what appears to be a > >traditional BAR, but is actually a BAR equivalent. > > > >We could simply round up the size in vfio to make it naturally > >aligned, but then we're imposing artificial sizes to the user and we > >have the discontinuity that BAR size emulation and vfio region size > >reporting don't agree on the size. I think what we want to do is > >expose EA to the user, reporting traditional BARs with BEIs as > >zero-sized and providing additional regions for the user to access > >each EA region, whether it has a BEI or not. > > > >To facilitate that, a flag indicating whether a PCI resource is a > >traditional BAR or BAR equivalent seems much nicer than attempting > >to size the BAR ourselves or deducing it through the EA capability. > > > >Thoughts? > > Is the flag exposed to userspace in any way? Yes. Resources are exposed through a sysfs file. (/sys/bus/pci/devices/[Device Number]/resource) This file contains the resource range, as well as the flags. > I haven't dug into what uses the flags. > > One problem we have seen is with the lspci utility which cannot > distinguish between SROIV BARs and EA provisioned BARs. > > Would, or could, this be used there? This flag would let us distinguish between EA BARs & SRIOV BARs in lspci. When lspci sees a resource in the sysfs file without a matching BAR in configspace, it assumes that the BAR comes from an SRIOV entry. This was a good assumption until EA showed up. The only case in lspci that this flag wouldn't handle is when the resource is provisioned through EA & SRIOV (ie: BEI 9-14). lspci would only flag the resource as EA. -Sean > > David Daney > > > > > >Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > >--- > > drivers/pci/pci.c | 2 +- > > include/linux/ioport.h | 2 ++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > >diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > >index 314db8c..174c734 100644 > >--- a/drivers/pci/pci.c > >+++ b/drivers/pci/pci.c > >@@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > > { > >- unsigned long flags = IORESOURCE_PCI_FIXED; > >+ unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; > > > > switch (prop) { > > case PCI_EA_P_MEM: > >diff --git a/include/linux/ioport.h b/include/linux/ioport.h > >index 24bea08..5acc194 100644 > >--- a/include/linux/ioport.h > >+++ b/include/linux/ioport.h > >@@ -105,6 +105,8 @@ struct resource { > > /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ > > #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ > > > >+/* PCI Enhanced Allocation defined BAR equivalent resource */ > >+#define IORESOURCE_PCI_EA_BEI (1<<5) > > > > /* helpers to define resources */ > > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2016 at 12:16:02PM -0700, Alex Williamson wrote: > On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote: > > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote: > > > We've done a pretty good job of abstracting EA from drivers, but > > > there > > > are some properties of BAR Equivalent resources that don't really > > > jive > > > with traditional PCI BARs. In particular, natural alignment is > > > only > > > encouraged, not required. > > > > > > Why does this matter? There are drivers like vfio-pci that will > > > happily gobble up the EA abstraction that's been implemented and > > > expose a device using EA to userspace as if those resources are > > > traditional BARs. Pretty cool. The vfio API is bus agnostic, so > > > it > > > doesn't care about alignment. The problem comes with PCI config > > > space > > > emulation where we don't let userspace manipulate the BAR value, > > > but > > > we do emulate BAR sizing. The abstraction kind of falls apart if > > > userspace gets garbage when they try to size what appears to be a > > > traditional BAR, but is actually a BAR equivalent. > > > > > > We could simply round up the size in vfio to make it naturally > > > aligned, but then we're imposing artificial sizes to the user and > > > we > > > have the discontinuity that BAR size emulation and vfio region size > > > reporting don't agree on the size. I think what we want to do is > > > expose EA to the user, reporting traditional BARs with BEIs as > > > zero-sized and providing additional regions for the user to access > > > each EA region, whether it has a BEI or not. > > > > > > To facilitate that, a flag indicating whether a PCI resource is a > > > traditional BAR or BAR equivalent seems much nicer than attempting > > > to size the BAR ourselves or deducing it through the EA capability. > > > > If vfio does size the resource, EA entries that are aligned could > > still be emulated as BARs, correct? > > > > I would think that emulating a BAR would be preferred when possible, > > for backwards-compatibility. > > If a BEI is naturally aligned, I can't think of any problems with > exposing it as a traditional BAR to userspace. I agree that there may > be some compatibility benefits there, so it may be useful to offer both > options. I don't think we can combine them though, it would violate > the EA spec to expose the traditional BAR and and the matching BEI. > We'd either need to hide the fake BAR or hide the EA entry defining > that BEI. A module option could define which is preferred or maybe an > ioctl. Would any functionality be lost if vfio: - emulates BARs & hide EA entry when EA resources are aligned. - exposes EA entries when the resources aren't aligned (no BAR emulation). ? I'm just wondering if giving userspace the option to pick is necessary, or if there is a setting that is always ideal. > > > Thoughts? > > > > I like the idea of adding an EA flag. > > > > There were some cases in the kernel where it would be nice to know if a > > resource was fixed because it was EA or if something else was fixing it. > > Adding that flag was discussed during the code review of the EA code, > > but it was decided that we could get by without it. > > > > IIRC, most of the cases that required the flag had to do with EA entries > > for bridges. Since bridge support wasn't added, we didn't need the flag. > > By my reading of the spec, not all BEIs need to be fixed, is this just > a simplification to avoid sizing and mapping a BAR that doesn't exist > in the traditional sense? A flag on the resource seems like it would > be useful for that as well if we ever wanted to add the case where an > AE BAR equivalent could be remapped. Thanks, All of the usable BEIs have a HwInit Base & MaxOffset, and therefore a fixed range. The "unavailable for use" resources aren't explicitly HwInit, but the spec doesn't define how/when you can move them. The spec does define a writeable bit for resources, but doesn't define how to use it either. I think the intention was to be able to expand EA in the future to cover movable resources. Anyway, I think having an explicit flag that says "This Resource is from EA" that is independent of "This resource is fixed" is a good idea. Acked-by: Sean O. Stalley <sean.stalley@intel.com> Thanks, Sean > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > --- > > > drivers/pci/pci.c | 2 +- > > > include/linux/ioport.h | 2 ++ > > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index 314db8c..174c734 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > > > > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > > > { > > > - unsigned long flags = IORESOURCE_PCI_FIXED; > > > + unsigned long flags = IORESOURCE_PCI_FIXED | > > > IORESOURCE_PCI_EA_BEI; > > > > > > switch (prop) { > > > case PCI_EA_P_MEM: > > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > > > index 24bea08..5acc194 100644 > > > --- a/include/linux/ioport.h > > > +++ b/include/linux/ioport.h > > > @@ -105,6 +105,8 @@ struct resource { > > > /* PCI control bits. Shares IORESOURCE_BITS with above PCI > > > ROM. */ > > > #define IORESOURCE_PCI_FIXED (1<<4) /* Do > > > not move resource */ > > > > > > +/* PCI Enhanced Allocation defined BAR equivalent resource * > > > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > > > > > /* helpers to define resources */ > > > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) > > > \ > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-01-14 at 12:23 -0800, Sean O. Stalley wrote: > On Thu, Jan 14, 2016 at 12:16:02PM -0700, Alex Williamson wrote: > > On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote: > > > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote: > > > > We've done a pretty good job of abstracting EA from drivers, but > > > > there > > > > are some properties of BAR Equivalent resources that don't really > > > > jive > > > > with traditional PCI BARs. In particular, natural alignment is > > > > only > > > > encouraged, not required. > > > > > > > > Why does this matter? There are drivers like vfio-pci that will > > > > happily gobble up the EA abstraction that's been implemented and > > > > expose a device using EA to userspace as if those resources are > > > > traditional BARs. Pretty cool. The vfio API is bus agnostic, so > > > > it > > > > doesn't care about alignment. The problem comes with PCI config > > > > space > > > > emulation where we don't let userspace manipulate the BAR value, > > > > but > > > > we do emulate BAR sizing. The abstraction kind of falls apart if > > > > userspace gets garbage when they try to size what appears to be a > > > > traditional BAR, but is actually a BAR equivalent. > > > > > > > > We could simply round up the size in vfio to make it naturally > > > > aligned, but then we're imposing artificial sizes to the user and > > > > we > > > > have the discontinuity that BAR size emulation and vfio region size > > > > reporting don't agree on the size. I think what we want to do is > > > > expose EA to the user, reporting traditional BARs with BEIs as > > > > zero-sized and providing additional regions for the user to access > > > > each EA region, whether it has a BEI or not. > > > > > > > > To facilitate that, a flag indicating whether a PCI resource is a > > > > traditional BAR or BAR equivalent seems much nicer than attempting > > > > to size the BAR ourselves or deducing it through the EA capability. > > > > > > If vfio does size the resource, EA entries that are aligned could > > > still be emulated as BARs, correct? > > > > > > I would think that emulating a BAR would be preferred when possible, > > > for backwards-compatibility. > > > > If a BEI is naturally aligned, I can't think of any problems with > > exposing it as a traditional BAR to userspace. I agree that there may > > be some compatibility benefits there, so it may be useful to offer both > > options. I don't think we can combine them though, it would violate > > the EA spec to expose the traditional BAR and and the matching BEI. > > We'd either need to hide the fake BAR or hide the EA entry defining > > that BEI. A module option could define which is preferred or maybe an > > ioctl. > > Would any functionality be lost if vfio: > - emulates BARs & hide EA entry when EA resources are aligned. > - exposes EA entries when the resources aren't aligned (no BAR emulation). > ? > > I'm just wondering if giving userspace the option to pick is necessary, > or if there is a setting that is always ideal. That certainly might be a good default and the only use case I can think where it wouldn't be ideal is if we want to expose EA to a VM for the purpose of doing EA testing and development in a guest. A module option would make more sense than defining a user interface for that case though. > > > > Thoughts? > > > > > > I like the idea of adding an EA flag. > > > > > > There were some cases in the kernel where it would be nice to know if a > > > resource was fixed because it was EA or if something else was fixing it. > > > Adding that flag was discussed during the code review of the EA code, > > > but it was decided that we could get by without it. > > > > > > IIRC, most of the cases that required the flag had to do with EA entries > > > for bridges. Since bridge support wasn't added, we didn't need the flag. > > > > By my reading of the spec, not all BEIs need to be fixed, is this just > > a simplification to avoid sizing and mapping a BAR that doesn't exist > > in the traditional sense? A flag on the resource seems like it would > > be useful for that as well if we ever wanted to add the case where an > > AE BAR equivalent could be remapped. Thanks, > > All of the usable BEIs have a HwInit Base & MaxOffset, and therefore a > fixed range. The "unavailable for use" resources aren't explicitly HwInit, > but the spec doesn't define how/when you can move them. > > The spec does define a writeable bit for resources, > but doesn't define how to use it either. I think the intention was to > be able to expand EA in the future to cover movable resources. Wow, that's really confusing to have a writable bit per entry and then define properties which forbid its use. We'd need to go through the SIG and define a whole new set of properties just to get movable versions, crazy. Thanks for the explanation. > > Anyway, I think having an explicit flag that says "This Resource is from EA" > that is independent of "This resource is fixed" is a good idea. > > > Acked-by: Sean O. Stalley <sean.stalley@intel.com> Cool, thanks Alex > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > --- > > > > drivers/pci/pci.c | 2 +- > > > > include/linux/ioport.h | 2 ++ > > > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > > index 314db8c..174c734 100644 > > > > --- a/drivers/pci/pci.c > > > > +++ b/drivers/pci/pci.c > > > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > > > > > > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 > > > > prop) > > > > { > > > > - unsigned long flags = IORESOURCE_PCI_FIXED; > > > > + unsigned long flags = IORESOURCE_PCI_FIXED | > > > > IORESOURCE_PCI_EA_BEI; > > > > > > > > switch (prop) { > > > > case PCI_EA_P_MEM: > > > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > > > > index 24bea08..5acc194 100644 > > > > --- a/include/linux/ioport.h > > > > +++ b/include/linux/ioport.h > > > > @@ -105,6 +105,8 @@ struct resource { > > > > /* PCI control bits. Shares IORESOURCE_BITS with above PCI > > > > ROM. */ > > > > #define IORESOURCE_PCI_FIXED (1<<4) /* > > > > Do > > > > not move resource */ > > > > > > > > +/* PCI Enhanced Allocation defined BAR equivalent resource * > > > > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > > > > > > > /* helpers to define resources */ > > > > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) > > > > \ > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2016 at 02:14:52PM -0700, Alex Williamson wrote: > On Thu, 2016-01-14 at 12:23 -0800, Sean O. Stalley wrote: > > On Thu, Jan 14, 2016 at 12:16:02PM -0700, Alex Williamson wrote: > > > On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote: > > > > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote: > > > > > We've done a pretty good job of abstracting EA from drivers, but > > > > > there > > > > > are some properties of BAR Equivalent resources that don't really > > > > > jive > > > > > with traditional PCI BARs. In particular, natural alignment is > > > > > only > > > > > encouraged, not required. > > > > > > > > > > Why does this matter? There are drivers like vfio-pci that will > > > > > happily gobble up the EA abstraction that's been implemented and > > > > > expose a device using EA to userspace as if those resources are > > > > > traditional BARs. Pretty cool. The vfio API is bus agnostic, so > > > > > it > > > > > doesn't care about alignment. The problem comes with PCI config > > > > > space > > > > > emulation where we don't let userspace manipulate the BAR value, > > > > > but > > > > > we do emulate BAR sizing. The abstraction kind of falls apart if > > > > > userspace gets garbage when they try to size what appears to be a > > > > > traditional BAR, but is actually a BAR equivalent. > > > > > > > > > > We could simply round up the size in vfio to make it naturally > > > > > aligned, but then we're imposing artificial sizes to the user and > > > > > we > > > > > have the discontinuity that BAR size emulation and vfio region size > > > > > reporting don't agree on the size. I think what we want to do is > > > > > expose EA to the user, reporting traditional BARs with BEIs as > > > > > zero-sized and providing additional regions for the user to access > > > > > each EA region, whether it has a BEI or not. > > > > > > > > > > To facilitate that, a flag indicating whether a PCI resource is a > > > > > traditional BAR or BAR equivalent seems much nicer than attempting > > > > > to size the BAR ourselves or deducing it through the EA capability. > > > > > > > > If vfio does size the resource, EA entries that are aligned could > > > > still be emulated as BARs, correct? > > > > > > > > I would think that emulating a BAR would be preferred when possible, > > > > for backwards-compatibility. > > > > > > If a BEI is naturally aligned, I can't think of any problems with > > > exposing it as a traditional BAR to userspace. I agree that there may > > > be some compatibility benefits there, so it may be useful to offer both > > > options. I don't think we can combine them though, it would violate > > > the EA spec to expose the traditional BAR and and the matching BEI. > > > We'd either need to hide the fake BAR or hide the EA entry defining > > > that BEI. A module option could define which is preferred or maybe an > > > ioctl. > > > > Would any functionality be lost if vfio: > > - emulates BARs & hide EA entry when EA resources are aligned. > > - exposes EA entries when the resources aren't aligned (no BAR emulation). > > ? > > > > I'm just wondering if giving userspace the option to pick is necessary, > > or if there is a setting that is always ideal. > > That certainly might be a good default and the only use case I can > think where it wouldn't be ideal is if we want to expose EA to a VM for > the purpose of doing EA testing and development in a guest. A module > option would make more sense than defining a user interface for that > case though. The testing I have done for EA has been in a VM. I sent out some patches a while back for QEMU that allow the user to add an EA entry to the existing PCI models. [https://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg00348.html] I don't know if they will be useful for what you are doing, but they were very useful to me when doing the initial EA development. -Sean -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-01-14 at 10:26 -0700, Alex Williamson wrote: > We've done a pretty good job of abstracting EA from drivers, but there > are some properties of BAR Equivalent resources that don't really jive > with traditional PCI BARs. In particular, natural alignment is only > encouraged, not required. > > Why does this matter? There are drivers like vfio-pci that will > happily gobble up the EA abstraction that's been implemented and > expose a device using EA to userspace as if those resources are > traditional BARs. Pretty cool. The vfio API is bus agnostic, so it > doesn't care about alignment. The problem comes with PCI config space > emulation where we don't let userspace manipulate the BAR value, but > we do emulate BAR sizing. The abstraction kind of falls apart if > userspace gets garbage when they try to size what appears to be a > traditional BAR, but is actually a BAR equivalent. > > We could simply round up the size in vfio to make it naturally > aligned, but then we're imposing artificial sizes to the user and we > have the discontinuity that BAR size emulation and vfio region size > reporting don't agree on the size. I think what we want to do is > expose EA to the user, reporting traditional BARs with BEIs as > zero-sized and providing additional regions for the user to access > each EA region, whether it has a BEI or not. > > To facilitate that, a flag indicating whether a PCI resource is a > traditional BAR or BAR equivalent seems much nicer than attempting > to size the BAR ourselves or deducing it through the EA capability. > > Thoughts? > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- Just to loop back on this, it seems like we do have some support and use cases beyond what I proposed. Thanks for the discussion of that. However, I'm reluctant to post this formally because the change is user visible, it consumes a limited resource, and I don't know how quickly vfio-pci is going to be able to make use of this flag. The vfio-pci work may not happen until a device appears with poorly sized resources that has some use case with vfio-pci. Even then, we may be able to infer the BEI association without this flag. So, while I'm not opposed to this flag, I don't see a need to drive it right now and those that do have a more immediate need are welcome to take over. Thanks, Alex > drivers/pci/pci.c | 2 +- > include/linux/ioport.h | 2 ++ > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 314db8c..174c734 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > { > - unsigned long flags = IORESOURCE_PCI_FIXED; > + unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; > > switch (prop) { > case PCI_EA_P_MEM: > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > index 24bea08..5acc194 100644 > --- a/include/linux/ioport.h > +++ b/include/linux/ioport.h > @@ -105,6 +105,8 @@ struct resource { > /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ > #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ > > +/* PCI Enhanced Allocation defined BAR equivalent resource */ > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > /* helpers to define resources */ > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 20, 2016 at 01:20:27PM -0700, Alex Williamson wrote: > On Thu, 2016-01-14 at 10:26 -0700, Alex Williamson wrote: > > We've done a pretty good job of abstracting EA from drivers, but there > > are some properties of BAR Equivalent resources that don't really jive > > with traditional PCI BARs. In particular, natural alignment is only > > encouraged, not required. > > > > Why does this matter? There are drivers like vfio-pci that will > > happily gobble up the EA abstraction that's been implemented and > > expose a device using EA to userspace as if those resources are > > traditional BARs. Pretty cool. The vfio API is bus agnostic, so it > > doesn't care about alignment. The problem comes with PCI config space > > emulation where we don't let userspace manipulate the BAR value, but > > we do emulate BAR sizing. The abstraction kind of falls apart if > > userspace gets garbage when they try to size what appears to be a > > traditional BAR, but is actually a BAR equivalent. > > > > We could simply round up the size in vfio to make it naturally > > aligned, but then we're imposing artificial sizes to the user and we > > have the discontinuity that BAR size emulation and vfio region size > > reporting don't agree on the size. I think what we want to do is > > expose EA to the user, reporting traditional BARs with BEIs as > > zero-sized and providing additional regions for the user to access > > each EA region, whether it has a BEI or not. > > > > To facilitate that, a flag indicating whether a PCI resource is a > > traditional BAR or BAR equivalent seems much nicer than attempting > > to size the BAR ourselves or deducing it through the EA capability. > > > > Thoughts? > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > > Just to loop back on this, it seems like we do have some support and > use cases beyond what I proposed. Thanks for the discussion of that. > However, I'm reluctant to post this formally because the change is user > visible, it consumes a limited resource, and I don't know how quickly > vfio-pci is going to be able to make use of this flag. The vfio-pci > work may not happen until a device appears with poorly sized resources > that has some use case with vfio-pci. Even then, we may be able to > infer the BEI association without this flag. So, while I'm not opposed > to this flag, I don't see a need to drive it right now and those that > do have a more immediate need are welcome to take over. Thanks, > > Alex > > Thanks Alex, I'll pick it up and fix the lspci case. -Sean > > drivers/pci/pci.c | 2 +- > > include/linux/ioport.h | 2 ++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 314db8c..174c734 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) > > > > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > > { > > - unsigned long flags = IORESOURCE_PCI_FIXED; > > + unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; > > > > switch (prop) { > > case PCI_EA_P_MEM: > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > > index 24bea08..5acc194 100644 > > --- a/include/linux/ioport.h > > +++ b/include/linux/ioport.h > > @@ -105,6 +105,8 @@ struct resource { > > /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ > > #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ > > > > +/* PCI Enhanced Allocation defined BAR equivalent resource */ > > +#define IORESOURCE_PCI_EA_BEI (1<<5) > > > > /* helpers to define resources */ > > #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 314db8c..174c734 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev) static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) { - unsigned long flags = IORESOURCE_PCI_FIXED; + unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; switch (prop) { case PCI_EA_P_MEM: diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 24bea08..5acc194 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -105,6 +105,8 @@ struct resource { /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ +/* PCI Enhanced Allocation defined BAR equivalent resource */ +#define IORESOURCE_PCI_EA_BEI (1<<5) /* helpers to define resources */ #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \
We've done a pretty good job of abstracting EA from drivers, but there are some properties of BAR Equivalent resources that don't really jive with traditional PCI BARs. In particular, natural alignment is only encouraged, not required. Why does this matter? There are drivers like vfio-pci that will happily gobble up the EA abstraction that's been implemented and expose a device using EA to userspace as if those resources are traditional BARs. Pretty cool. The vfio API is bus agnostic, so it doesn't care about alignment. The problem comes with PCI config space emulation where we don't let userspace manipulate the BAR value, but we do emulate BAR sizing. The abstraction kind of falls apart if userspace gets garbage when they try to size what appears to be a traditional BAR, but is actually a BAR equivalent. We could simply round up the size in vfio to make it naturally aligned, but then we're imposing artificial sizes to the user and we have the discontinuity that BAR size emulation and vfio region size reporting don't agree on the size. I think what we want to do is expose EA to the user, reporting traditional BARs with BEIs as zero-sized and providing additional regions for the user to access each EA region, whether it has a BEI or not. To facilitate that, a flag indicating whether a PCI resource is a traditional BAR or BAR equivalent seems much nicer than attempting to size the BAR ourselves or deducing it through the EA capability. Thoughts? Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- drivers/pci/pci.c | 2 +- include/linux/ioport.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html