Message ID | 1430343372-687-2-git-send-email-mcgrof@do-not-panic.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
[+cc linux-pci] Hi Luis, On Wed, Apr 29, 2015 at 02:36:08PM -0700, Luis R. Rodriguez wrote: > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > This allows drivers to take advantage of write-combining > when possible. Ideally we'd have pci_read_bases() just > peg an IORESOURCE_WC flag for us This makes it sound like pci_read_bases() could do a better job if we just tried harder, but I don't think that's the case. All pci_read_bases() can do is look at the bits in the BAR. For memory BARs, there's a "prefetchable" bit and a "64-bit" bit. If you just want to complain that the PCI spec didn't define a way for software to discover whether a BAR can be mapped with WC, that's fine, but it's misleading to suggest that pci_read_bases() could figure out WC without some help from the spec. > but where exactly > video devices memory lie varies *largely* and at times things > are mixed with MMIO registers, sometimes we can address > the changes in drivers, other times the change requires > intrusive changes. > > Although there is also arch_phys_wc_add() that makes use of > architecture specific write-combining alternatives (MTRR on > x86 when a system does not have PAT) we void polluting > pci_iomap() space with it and force drivers and subsystems > that want to use it to be explicit. I'm not quite sure I understand the point you're making here about not polluting pci_iomap_wc() with arch_phys_wc_add(). I think the choice is for a driver to do either this: info->screen_base = pci_iomap_wc(dev, 0, 0); or this: info->screen_base = pci_iomap_wc(dev, 0, 0); par->wc_cookie = arch_phys_wc_add(pci_resource_start(dev, 0), pci_resource_len(dev, 0)); The driver is *already* being explicit because it calls pci_iomap_wc() instead of pci_iomap(). It seems like it would be ideal if ioremap_wc() could call arch_phys_wc_add() internally. Doesn't any caller of arch_phys_wc_add() have to also do some sort of ioremap() beforehand? I assume there's some reason for separating them, and I see that the current arch_phys_wc_add() requires the caller to store a handle, but doing both seems confusing. > There are a few motivations for this: > > a) Take advantage of PAT when available > > b) Help bury MTRR code away, MTRR is architecture specific and on > x86 its replaced by PAT > > c) Help with the goal of eventually using _PAGE_CACHE_UC over > _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit > de33c442e titled "x86 PAT: fix performance drop for glx, > use UC minus for ioremap(), ioremap_nocache() and > pci_mmap_page_range()") I think these are now _PAGE_CACHE_MODE_UC and _PAGE_CACHE_MODE_UC_MINUS, right? > ... > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev, > + int bar, > + unsigned long offset, > + unsigned long maxlen) > +{ > + resource_size_t start = pci_resource_start(dev, bar); > + resource_size_t len = pci_resource_len(dev, bar); > + unsigned long flags = pci_resource_flags(dev, bar); > + > + if (len <= offset || !start) > + return NULL; > + len -= offset; > + start += offset; > + if (maxlen && len > maxlen) > + len = maxlen; > + if (flags & IORESOURCE_IO) > + return __pci_ioport_map(dev, start, len); Is there any point in checking for IORESOURCE_IO? If a driver calls pci_iomap_wc_range(), I assume it already knows this is an IORESOURCE_MEM BAR, so if we see IORESOURCE_IO here we should just return an error, i.e., NULL. > + if (flags & IORESOURCE_MEM) > + return ioremap_wc(start, len); > + /* What? */ > + return NULL; > +} > +EXPORT_SYMBOL_GPL(pci_iomap_wc_range); Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 30, 2015 at 10:59:17AM -0500, Bjorn Helgaas wrote: > [+cc linux-pci] > > Hi Luis, > > On Wed, Apr 29, 2015 at 02:36:08PM -0700, Luis R. Rodriguez wrote: > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > > > This allows drivers to take advantage of write-combining > > when possible. Ideally we'd have pci_read_bases() just > > peg an IORESOURCE_WC flag for us > > This makes it sound like pci_read_bases() could do a better job > if we just tried harder, but I don't think that's the case. All > pci_read_bases() can do is look at the bits in the BAR. For > memory BARs, there's a "prefetchable" bit and a "64-bit" bit. > > If you just want to complain that the PCI spec didn't define a > way for software to discover whether a BAR can be mapped with WC, > that's fine, but it's misleading to suggest that pci_read_bases() > could figure out WC without some help from the spec. You're right sorry about that, in my original patch this was more of a question and I did not have a full answer for but mst had clarified before the spec doesn't allow for this [0] and you are confirming this now as well. [0] https://lkml.org/lkml/2015/4/21/714 I'll update the patch and at least document we did think about this and that its a shortcoming of the spec. > > but where exactly > > video devices memory lie varies *largely* and at times things > > are mixed with MMIO registers, sometimes we can address > > the changes in drivers, other times the change requires > > intrusive changes. > > > > Although there is also arch_phys_wc_add() that makes use of > > architecture specific write-combining alternatives (MTRR on > > x86 when a system does not have PAT) we void polluting > > pci_iomap() space with it and force drivers and subsystems > > that want to use it to be explicit. > > I'm not quite sure I understand the point you're making here > about not polluting pci_iomap_wc() with arch_phys_wc_add(). I > think the choice is for a driver to do either this: > > info->screen_base = pci_iomap_wc(dev, 0, 0); > > or this: > > info->screen_base = pci_iomap_wc(dev, 0, 0); > par->wc_cookie = arch_phys_wc_add(pci_resource_start(dev, 0), > pci_resource_len(dev, 0)); > > The driver is *already* being explicit because it calls > pci_iomap_wc() instead of pci_iomap(). > > It seems like it would be ideal if ioremap_wc() could call > arch_phys_wc_add() internally. Indeed, that's what I was alluding to. > Doesn't any caller of > arch_phys_wc_add() have to also do some sort of ioremap() > beforehand? This is not a requirement as the physical address is used, not the virtual address. > I assume there's some reason for separating them, Well a full sweep to change to arch_phys_wc_add() was never done, consider this part of the last effort to do so. In retrospect now that I've covered all other drivers in 12 different series of patches I think its perhaps best to not mesh them together as we're phasing out MTRR and the only reason to have arch_phys_wc_add() is for MTRR which is legacy. I'll update the commit log to mention that. > and I see that the current arch_phys_wc_add() requires the caller > to store a handle, but doing both seems confusing. That's just a cookie so that later when we undo the driver we can tell the backend to remove it. > > There are a few motivations for this: > > > > a) Take advantage of PAT when available > > > > b) Help bury MTRR code away, MTRR is architecture specific and on > > x86 its replaced by PAT > > > > c) Help with the goal of eventually using _PAGE_CACHE_UC over > > _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit > > de33c442e titled "x86 PAT: fix performance drop for glx, > > use UC minus for ioremap(), ioremap_nocache() and > > pci_mmap_page_range()") > > I think these are now _PAGE_CACHE_MODE_UC and > _PAGE_CACHE_MODE_UC_MINUS, right? Indeed, thanks, I'll fix that in the commit log. > > ... > > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev, > > + int bar, > > + unsigned long offset, > > + unsigned long maxlen) > > +{ > > + resource_size_t start = pci_resource_start(dev, bar); > > + resource_size_t len = pci_resource_len(dev, bar); > > + unsigned long flags = pci_resource_flags(dev, bar); > > + > > + if (len <= offset || !start) > > + return NULL; > > + len -= offset; > > + start += offset; > > + if (maxlen && len > maxlen) > > + len = maxlen; > > + if (flags & IORESOURCE_IO) > > + return __pci_ioport_map(dev, start, len); > > Is there any point in checking for IORESOURCE_IO? If a driver > calls pci_iomap_wc_range(), I assume it already knows this is an > IORESOURCE_MEM BAR, so if we see IORESOURCE_IO here we should > just return an error, i.e., NULL. Agreed, will fix with all the other changes on the commit log and repost. I won't repost the full series but just this one patch as a v5. Thanks for the review. Luis -- To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 30, 2015 at 9:52 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote: > On Thu, Apr 30, 2015 at 10:59:17AM -0500, Bjorn Helgaas wrote: >> [+cc linux-pci] >> >> Hi Luis, >> >> On Wed, Apr 29, 2015 at 02:36:08PM -0700, Luis R. Rodriguez wrote: >> > From: "Luis R. Rodriguez" <mcgrof@suse.com> >> > >> > This allows drivers to take advantage of write-combining >> > when possible. Ideally we'd have pci_read_bases() just >> > peg an IORESOURCE_WC flag for us >> >> This makes it sound like pci_read_bases() could do a better job >> if we just tried harder, but I don't think that's the case. All >> pci_read_bases() can do is look at the bits in the BAR. For >> memory BARs, there's a "prefetchable" bit and a "64-bit" bit. >> >> If you just want to complain that the PCI spec didn't define a >> way for software to discover whether a BAR can be mapped with WC, >> that's fine, but it's misleading to suggest that pci_read_bases() >> could figure out WC without some help from the spec. > > You're right sorry about that, in my original patch this was more > of a question and I did not have a full answer for but mst had > clarified before the spec doesn't allow for this [0] and you are > confirming this now as well. > > [0] https://lkml.org/lkml/2015/4/21/714 > > I'll update the patch and at least document we did think about > this and that its a shortcoming of the spec. > >> > but where exactly >> > video devices memory lie varies *largely* and at times things >> > are mixed with MMIO registers, sometimes we can address >> > the changes in drivers, other times the change requires >> > intrusive changes. >> > >> > Although there is also arch_phys_wc_add() that makes use of >> > architecture specific write-combining alternatives (MTRR on >> > x86 when a system does not have PAT) we void polluting >> > pci_iomap() space with it and force drivers and subsystems >> > that want to use it to be explicit. >> >> I'm not quite sure I understand the point you're making here >> about not polluting pci_iomap_wc() with arch_phys_wc_add(). I >> think the choice is for a driver to do either this: >> >> info->screen_base = pci_iomap_wc(dev, 0, 0); >> >> or this: >> >> info->screen_base = pci_iomap_wc(dev, 0, 0); >> par->wc_cookie = arch_phys_wc_add(pci_resource_start(dev, 0), >> pci_resource_len(dev, 0)); >> >> The driver is *already* being explicit because it calls >> pci_iomap_wc() instead of pci_iomap(). >> >> It seems like it would be ideal if ioremap_wc() could call >> arch_phys_wc_add() internally. > > Indeed, that's what I was alluding to. > >> Doesn't any caller of >> arch_phys_wc_add() have to also do some sort of ioremap() >> beforehand? > > This is not a requirement as the physical address is used, > not the virtual address. > >> I assume there's some reason for separating them, > > Well a full sweep to change to arch_phys_wc_add() was never done, > consider this part of the last effort to do so. In retrospect now > that I've covered all other drivers in 12 different series of patches > I think its perhaps best to not mesh them together as we're phasing > out MTRR and the only reason to have arch_phys_wc_add() is for MTRR > which is legacy. I would say it much more strongly. Drivers for new hardware SHOULD NOT call arch_phys_wc_add, directly or otherwise. MTRRs are crap. They have nasty alignment requirements, they are a very limited and unpredictable resource, and the interact poorly with BIOS. They should really only be used for old video framebuffers and such. Anything new should use PAT (it's been available for a long time) and possibly streaming memory writes. Even fancy server gear (myri10ge, for example) should stay far away from MTRRs and such: it's very easy to put enough devices in a server board that you simply run out of MTRRs and arch_phys_wc_add will stop working. If we make ioremap_wc and similar start automatically adding MTRRs, then performance will vary wildly with the order of driver loading, because we'll run out of MTRRs part-way through bootup. ioremap_wc via PAT, on the other hand, is 100% reliable on newer hardware. Maybe I should have called it arch_phys_wc_add_awful_legacy_hack. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 30, 2015 at 10:03:18AM -0700, Andy Lutomirski wrote: > On Thu, Apr 30, 2015 at 9:52 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote: > > On Thu, Apr 30, 2015 at 10:59:17AM -0500, Bjorn Helgaas wrote: > >> [+cc linux-pci] > >> > >> Hi Luis, > >> > >> On Wed, Apr 29, 2015 at 02:36:08PM -0700, Luis R. Rodriguez wrote: > >> > From: "Luis R. Rodriguez" <mcgrof@suse.com> > >> > > >> > This allows drivers to take advantage of write-combining > >> > when possible. Ideally we'd have pci_read_bases() just > >> > peg an IORESOURCE_WC flag for us > >> > >> This makes it sound like pci_read_bases() could do a better job > >> if we just tried harder, but I don't think that's the case. All > >> pci_read_bases() can do is look at the bits in the BAR. For > >> memory BARs, there's a "prefetchable" bit and a "64-bit" bit. > >> > >> If you just want to complain that the PCI spec didn't define a > >> way for software to discover whether a BAR can be mapped with WC, > >> that's fine, but it's misleading to suggest that pci_read_bases() > >> could figure out WC without some help from the spec. > > > > You're right sorry about that, in my original patch this was more > > of a question and I did not have a full answer for but mst had > > clarified before the spec doesn't allow for this [0] and you are > > confirming this now as well. > > > > [0] https://lkml.org/lkml/2015/4/21/714 > > > > I'll update the patch and at least document we did think about > > this and that its a shortcoming of the spec. > > > >> > but where exactly > >> > video devices memory lie varies *largely* and at times things > >> > are mixed with MMIO registers, sometimes we can address > >> > the changes in drivers, other times the change requires > >> > intrusive changes. > >> > > >> > Although there is also arch_phys_wc_add() that makes use of > >> > architecture specific write-combining alternatives (MTRR on > >> > x86 when a system does not have PAT) we void polluting > >> > pci_iomap() space with it and force drivers and subsystems > >> > that want to use it to be explicit. > >> > >> I'm not quite sure I understand the point you're making here > >> about not polluting pci_iomap_wc() with arch_phys_wc_add(). I > >> think the choice is for a driver to do either this: > >> > >> info->screen_base = pci_iomap_wc(dev, 0, 0); > >> > >> or this: > >> > >> info->screen_base = pci_iomap_wc(dev, 0, 0); > >> par->wc_cookie = arch_phys_wc_add(pci_resource_start(dev, 0), > >> pci_resource_len(dev, 0)); > >> > >> The driver is *already* being explicit because it calls > >> pci_iomap_wc() instead of pci_iomap(). > >> > >> It seems like it would be ideal if ioremap_wc() could call > >> arch_phys_wc_add() internally. > > > > Indeed, that's what I was alluding to. > > > >> Doesn't any caller of > >> arch_phys_wc_add() have to also do some sort of ioremap() > >> beforehand? > > > > This is not a requirement as the physical address is used, > > not the virtual address. > > > >> I assume there's some reason for separating them, > > > > Well a full sweep to change to arch_phys_wc_add() was never done, > > consider this part of the last effort to do so. In retrospect now > > that I've covered all other drivers in 12 different series of patches > > I think its perhaps best to not mesh them together as we're phasing > > out MTRR and the only reason to have arch_phys_wc_add() is for MTRR > > which is legacy. > > I would say it much more strongly. > > Drivers for new hardware SHOULD NOT call arch_phys_wc_add, directly or > otherwise. MTRRs are crap. They have nasty alignment requirements, > they are a very limited and unpredictable resource, and the interact > poorly with BIOS. They should really only be used for old video > framebuffers and such. > > Anything new should use PAT (it's been available for a long time) and > possibly streaming memory writes. Even fancy server gear (myri10ge, > for example) should stay far away from MTRRs and such: it's very easy > to put enough devices in a server board that you simply run out of > MTRRs and arch_phys_wc_add will stop working. > > If we make ioremap_wc and similar start automatically adding MTRRs, > then performance will vary wildly with the order of driver loading, > because we'll run out of MTRRs part-way through bootup. > > ioremap_wc via PAT, on the other hand, is 100% reliable on newer hardware. > > Maybe I should have called it arch_phys_wc_add_awful_legacy_hack. Thanks, I'll document such technicalities as well ;) Luis -- To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h index 7389c87..b1e17fc 100644 --- a/include/asm-generic/pci_iomap.h +++ b/include/asm-generic/pci_iomap.h @@ -15,9 +15,13 @@ struct pci_dev; #ifdef CONFIG_PCI /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */ extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); +extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max); extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, unsigned long offset, unsigned long maxlen); +extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, + unsigned long offset, + unsigned long maxlen); /* Create a virtual mapping cookie for a port on a given PCI device. * Do not call this directly, it exists to make it easier for architectures * to override */ @@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon return NULL; } +static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max) +{ + return NULL; +} static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, unsigned long offset, unsigned long maxlen) { return NULL; } +static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, + unsigned long offset, + unsigned long maxlen) +{ + return NULL; +} #endif #endif /* __ASM_GENERIC_IO_H */ diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index bcce5f1..30b65ae 100644 --- a/lib/pci_iomap.c +++ b/lib/pci_iomap.c @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev, EXPORT_SYMBOL(pci_iomap_range); /** + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR + * @dev: PCI device that owns the BAR + * @bar: BAR number + * @offset: map memory at the given offset in BAR + * @maxlen: max length of the memory to map + * + * Using this function you will get a __iomem address to your device BAR. + * You can access it using ioread*() and iowrite*(). These functions hide + * the details if this is a MMIO or PIO address space and will just do what + * you expect from them in the correct way. When possible write combining + * is used. + * + * @maxlen specifies the maximum length to map. If you want to get access to + * the complete BAR from offset to the end, pass %0 here. + * */ +void __iomem *pci_iomap_wc_range(struct pci_dev *dev, + int bar, + unsigned long offset, + unsigned long maxlen) +{ + resource_size_t start = pci_resource_start(dev, bar); + resource_size_t len = pci_resource_len(dev, bar); + unsigned long flags = pci_resource_flags(dev, bar); + + if (len <= offset || !start) + return NULL; + len -= offset; + start += offset; + if (maxlen && len > maxlen) + len = maxlen; + if (flags & IORESOURCE_IO) + return __pci_ioport_map(dev, start, len); + if (flags & IORESOURCE_MEM) + return ioremap_wc(start, len); + /* What? */ + return NULL; +} +EXPORT_SYMBOL_GPL(pci_iomap_wc_range); + +/** * pci_iomap - create a virtual mapping cookie for a PCI BAR * @dev: PCI device that owns the BAR * @bar: BAR number @@ -70,4 +110,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen) return pci_iomap_range(dev, bar, 0, maxlen); } EXPORT_SYMBOL(pci_iomap); + +/** + * pci_iomap_wc - create a virtual WC mapping cookie for a PCI BAR + * @dev: PCI device that owns the BAR + * @bar: BAR number + * @maxlen: length of the memory to map + * + * Using this function you will get a __iomem address to your device BAR. + * You can access it using ioread*() and iowrite*(). These functions hide + * the details if this is a MMIO or PIO address space and will just do what + * you expect from them in the correct way. When possible write combining + * is used. + * + * @maxlen specifies the maximum length to map. If you want to get access to + * the complete BAR without checking for its length first, pass %0 here. + * */ +void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen) +{ + return pci_iomap_wc_range(dev, bar, 0, maxlen); +} +EXPORT_SYMBOL_GPL(pci_iomap_wc); #endif /* CONFIG_PCI */