Message ID | 20190924181244.7159-1-nsaenzjulienne@suse.de (mailing list archive) |
---|---|
Series | of: Fix DMA configuration for non-DT masters |
On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote:
> On Tue, Sep 24, 2019 at 1:12 PM Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
> > Hi All,
> > this series tries to address one of the issues blocking us from
> > upstreaming Broadcom's STB PCIe controller[1]. Namely, the fact that
> > devices not represented in DT which sit behind a PCI bus fail to get the
> > bus' DMA addressing constraints.
> >
> > This is due to the fact that of_dma_configure() assumes it's receiving a
> > DT node representing the device being configured, as opposed to the PCIe
> > bridge node we currently pass. This causes the code to directly jump
> > into PCI's parent node when checking for 'dma-ranges' and misses
> > whatever was set there.
> >
> > To address this I create a new API in OF - inspired from Robin Murphy's
> > original proposal[2] - which accepts a bus DT node as its input in
> > order to configure a device's DMA constraints. The changes go deep into
> > of/address.c's implementation, as the assumption that a device has a DT
> > node was pretty strong.
> >
> > On top of this work, I also cleaned up of_dma_configure() removing its
> > redundant arguments and creating an alternative function for the special
> > cases not applicable to either the above case or the default usage.
> >
> > IMO the resulting functions are more explicit. They will probably
> > surface some hacky usages that can be properly fixed as I show with the
> > DT fixes on the Layerscape platform.
> >
> > This was also tested on a Raspberry Pi 4 with a custom PCIe driver and
> > on a Seattle AMD board.
>
> Humm, I've been working on this issue too. Looks similar though yours
> has a lot more churn and there's some other bugs I've found.

That's good news, and yes, now that I see it, some stuff in my series is overly
complicated. Especially around of_translate_*().

On top of that, you removed in of_dma_get_range():

-	/*
-	 * At least empty ranges has to be defined for parent node if
-	 * DMA is supported
-	 */
-	if (!ranges)
-		break;

Which I assumed was bound to the standard and makes things easier.

> Can you test out this branch[1]. I don't have any h/w needing this,
> but wrote a unittest and tested with modified QEMU.

I reviewed everything and did find a minor issue; see the patch attached.

I also tested your branch both on an RPi4, with a PCI device that depends on
these changes, and by comparing the OF debug logs on a Layerscape board which
uses dma-ranges, dma-coherent and IOMMU. All works as expected.

Will you send this series for v5.5? Please keep me in the loop, I'll review and
test the final version.

Regards,
Nicolas
On 25/09/2019 15:52, Nicolas Saenz Julienne wrote:
> On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote:
[...]
>> Can you test out this branch[1]. I don't have any h/w needing this,
>> but wrote a unittest and tested with modified QEMU.
>
> I reviewed everything, I did find a minor issue, see the patch attached.

WRT that patch, the original intent of "force_dma" was purely to
consider a device DMA-capable regardless of the presence of
"dma-ranges". Expecting of_dma_configure() to do anything for a non-OF
device has always been bogus - magic paravirt devices which appear out
of nowhere and expect to be treated as genuine DMA masters are a
separate problem that we haven't really approached yet.

Robin.
On Wed, 2019-09-25 at 16:09 +0100, Robin Murphy wrote:
> On 25/09/2019 15:52, Nicolas Saenz Julienne wrote:
[...]
> > I reviewed everything, I did find a minor issue, see the patch attached.
>
> WRT that patch, the original intent of "force_dma" was purely to
> consider a device DMA-capable regardless of the presence of
> "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF
> device has always been bogus - magic paravirt devices which appear out
> of nowhere and expect to be treated as genuine DMA masters are a
> separate problem that we haven't really approached yet.

I agree it's clearly abusing the function. I have no problem with the behaviour
change if it's OK with you.

Robin, have you looked into supporting multiple dma-ranges? It's the next thing
we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the
works already.

Regards,
Nicolas
On Wed, Sep 25, 2019 at 9:53 AM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>
> On Tue, 2019-09-24 at 16:59 -0500, Rob Herring wrote:
[...]
> > Humm, I've been working on this issue too. Looks similar though yours
> > has a lot more churn and there's some other bugs I've found.
>
> That's good news, and yes now that I see it, some stuff on my series is overly
> complicated. Specially around of_translate_*().
>
> On top of that, you removed in of_dma_get_range():
>
> -	/*
> -	 * At least empty ranges has to be defined for parent node if
> -	 * DMA is supported
> -	 */
> -	if (!ranges)
> -		break;
>
> Which I assumed was bound to the standard and makes things easier.

The standard is whatever we say it is and what exists in the wild...

Probably better for me to get the series posted for context, but the
above is removed because we could be passing in the bus device/child
node and checking for 'dma-ranges' rather than only the bus node. While
this does mean 'dma-ranges' could be in a child node, which is wrong, it
simplifies the only caller, of_dma_configure(). And really, there's no
way to detect that error. Someone could call
of_dma_configure(NULL, child, ...). Perhaps we could assert that
'ranges' is present whenever 'dma-ranges' is.

Back to the standard, I think it can be summarized as: a device's
immediate parent (a bus node) must contain 'dma-ranges'. All the parent
nodes of the bus node should also have 'dma-ranges', but a missing
property is treated as empty (1:1 translation). 'dma-ranges' missing in
all the parent nodes is also treated as 1:1 translation and no
addressing restrictions.

> > Can you test out this branch[1]. I don't have any h/w needing this,
> > but wrote a unittest and tested with modified QEMU.
>
> I reviewed everything, I did find a minor issue, see the patch attached.
>
> Also I tested your branch both on an RPi4, with a PCI device that depends on
> these changes and by comparing the OF debugs logs on a Layerscape board which
> uses dma-ranges, dma-coherent and IOMMU. All works as expected.
>
> Will you send this series for v5.5? Please keep me in the loop, I'll review and
> test the final version.

Yes, sending it out soon.

Rob
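The parent walk summarized above can be illustrated with a small, self-contained sketch. It is not the kernel's of_dma_get_range(); struct toy_bus_node and toy_dma_to_root() are hypothetical names, each node carries at most one window, and a missing or empty 'dma-ranges' is treated as 1:1 at every level, which is a simplification of the rule for the immediate parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a DT bus node (not struct device_node). */
struct toy_bus_node {
	const struct toy_bus_node *parent;
	bool has_dma_ranges;	/* property present at all */
	bool ranges_empty;	/* present but empty, i.e. 1:1 */
	uint64_t child_base;	/* window start as seen by children (DMA side) */
	uint64_t parent_base;	/* same window in the parent's address space */
	uint64_t size;		/* window length in bytes */
};

/*
 * Translate a DMA address seen below 'bus' into the root address space.
 * Missing or empty 'dma-ranges' means 1:1 translation at that level.
 * Returns false if the address falls outside a non-empty window.
 */
bool toy_dma_to_root(const struct toy_bus_node *bus, uint64_t dma_addr,
		     uint64_t *out)
{
	assert(out != NULL);

	for (; bus; bus = bus->parent) {
		if (!bus->has_dma_ranges || bus->ranges_empty)
			continue;	/* 1:1, nothing to apply */
		if (dma_addr < bus->child_base ||
		    dma_addr - bus->child_base >= bus->size)
			return false;	/* not reachable through this bus */
		dma_addr = dma_addr - bus->child_base + bus->parent_base;
	}
	*out = dma_addr;
	return true;
}
```

With a PCIe bridge node whose single window maps bus address 0x0 to parent address 0x8000_0000, and parents that have empty or no 'dma-ranges', only the bridge's window changes the address.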
On Wed, Sep 25, 2019 at 10:30 AM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>
> On Wed, 2019-09-25 at 16:09 +0100, Robin Murphy wrote:
[...]
> > WRT that patch, the original intent of "force_dma" was purely to
> > consider a device DMA-capable regardless of the presence of
> > "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF
> > device has always been bogus - magic paravirt devices which appear out
> > of nowhere and expect to be treated as genuine DMA masters are a
> > separate problem that we haven't really approached yet.
>
> I agree it's clearly abusing the function. I have no problem with the behaviour
> change if it's OK with you.
>
> Robin, have you looked into supporting multiple dma-ranges? It's the next thing
> we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the
> works already.

Multiple dma-ranges as far as configuring inbound windows should work
already other than the bug when there's any parent translation. But if
you mean supporting multiple DMA offsets and masks per device in the
DMA API, there's nothing in the works yet.

Rob
On 25/09/2019 17:16, Rob Herring wrote:
> On Wed, Sep 25, 2019 at 10:30 AM Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
>>
>> On Wed, 2019-09-25 at 16:09 +0100, Robin Murphy wrote:
[...]
>>> WRT that patch, the original intent of "force_dma" was purely to
>>> consider a device DMA-capable regardless of the presence of
>>> "dma-ranges". Expecting of_dma_configure() to do anything for a non-OF
>>> device has always been bogus - magic paravirt devices which appear out
>>> of nowhere and expect to be treated as genuine DMA masters are a
>>> separate problem that we haven't really approached yet.
>>
>> I agree it's clearly abusing the function. I have no problem with the behaviour
>> change if it's OK with you.

Thinking about it, you could probably just remove that call from the Xen
DRM driver now anyway - since the dma-direct rework, we lost the ability
to set dma_dummy_ops by default, and NULL ops now represent what it
(presumably) wants.

>> Robin, have you looked into supporting multiple dma-ranges? It's the next thing
>> we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the
>> works already.
>
> Multiple dma-ranges as far as configuring inbound windows should work
> already other than the bug when there's any parent translation. But if
> you mean supporting multiple DMA offsets and masks per device in the
> DMA API, there's nothing in the works yet.

There's also the in-between step of making of_dma_get_range() return a
size based on all the dma-ranges entries rather than only the first one
- otherwise, something like [1] can lead to pretty unworkable default
masks. We implemented that when doing acpi_dma_get_range(), it's just
that the OF counterpart never caught up.

Robin.

[1] http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a2814af56b3486c2985a95540a88d8f9fa3a699f
On Wed, Sep 25, 2019 at 11:52 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 25/09/2019 17:16, Rob Herring wrote:
[...]
> >> I agree it's clearly abusing the function. I have no problem with the behaviour
> >> change if it's OK with you.
>
> Thinking about it, you could probably just remove that call from the Xen
> DRM driver now anyway - since the dma-direct rework, we lost the ability
> to set dma_dummy_ops by default, and NULL ops now represent what it
> (presumably) wants.

Not xen_dma_ops? In any case, I'll send out a patch for the Xen folks
to comment on.

> >> Robin, have you looked into supporting multiple dma-ranges? It's the next thing
> >> we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in the
> >> works already.
> >
> > Multiple dma-ranges as far as configuring inbound windows should work
> > already other than the bug when there's any parent translation. But if
> > you mean supporting multiple DMA offsets and masks per device in the
> > DMA API, there's nothing in the works yet.
>
> There's also the in-between step of making of_dma_get_range() return a
> size based on all the dma-ranges entries rather than only the first one
> - otherwise, something like [1] can lead to pretty unworkable default
> masks. We implemented that when doing acpi_dma_get_range(), it's just
> that the OF counterpart never caught up.

Right. I suppose we assume any holes in the ranges are addressable by
the device but won't get used for other reasons (such as no memory
there). However, to be correct, the range of the dma offset plus mask
would need to be within the min start and max end addresses. IOW,
while we need to round up (0xa_8000_0000 - 0x2c1c_0000) to the next
power of 2, the 'correct' thing to do is round down.

Rob

> [1]
> http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a2814af56b3486c2985a95540a88d8f9fa3a699f
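A rough sketch of the arithmetic discussed above: take the lowest start and highest end over all 'dma-ranges' entries, then compare rounding the resulting size up and down to a power of two. The struct and function names are hypothetical, and the two windows in the usage note are invented so that the span matches the 0x2c1c_0000 and 0xa_8000_0000 bounds quoted in the thread; this is not the acpi_dma_get_range() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one parsed 'dma-ranges' entry. */
struct toy_dma_window {
	uint64_t dma_start;
	uint64_t size;
};

/* Size of the span from the lowest start to the highest end over all entries. */
uint64_t toy_span_size(const struct toy_dma_window *w, size_t n)
{
	uint64_t lo = UINT64_MAX, hi = 0;
	size_t i;

	assert(w != NULL && n > 0);
	for (i = 0; i < n; i++) {
		uint64_t end = w[i].dma_start + w[i].size;

		if (w[i].dma_start < lo)
			lo = w[i].dma_start;
		if (end > hi)
			hi = end;
	}
	return hi - lo;
}

/* Largest power of two that is <= v. */
uint64_t toy_rounddown_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v != 0);
	while (v >>= 1)
		p <<= 1;
	return p;
}

/* Smallest power of two that is >= v. */
uint64_t toy_roundup_pow2(uint64_t v)
{
	uint64_t p;

	assert(v != 0 && v <= (UINT64_C(1) << 63));
	p = toy_rounddown_pow2(v);
	return p == v ? p : p << 1;
}
```

For windows spanning 0x2c1c_0000 to 0xa_8000_0000 the span is 0xa_53e4_0000. Rounding up gives 0x10_0000_0000 (2^36), which reaches beyond the described windows, while rounding down gives 0x8_0000_0000 (2^35), which stays inside the span but leaves part of it unused.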
> > > > Robin, have you looked into supporting multiple dma-ranges? It's the
> > > > next thing we need for BCM STB's PCIe. I'll have a go at it myself if
> > > > nothing is in the works already.
> > >
> > > Multiple dma-ranges as far as configuring inbound windows should work
> > > already other than the bug when there's any parent translation. But if
> > > you mean supporting multiple DMA offsets and masks per device in the
> > > DMA API, there's nothing in the works yet.

Sorry, I meant supporting multiple DMA offsets[1]. I think I could still make
it work with a single DMA mask though.

> > There's also the in-between step of making of_dma_get_range() return a
> > size based on all the dma-ranges entries rather than only the first one
> > - otherwise, something like [1] can lead to pretty unworkable default
> > masks. We implemented that when doing acpi_dma_get_range(), it's just
> > that the OF counterpart never caught up.
>
> Right. I suppose we assume any holes in the ranges are addressable by
> the device but won't get used for other reasons (such as no memory
> there). However, to be correct, the range of the dma offset plus mask
> would need to be within the min start and max end addresses. IOW,
> while we need to round up (0xa_8000_0000 - 0x2c1c_0000) to the next
> power of 2, the 'correct' thing to do is round down.

IIUC I also have this issue on my list. The RPi4 PCIe block has an integration
bug that only allows DMA to the lower 3GB. With dma-ranges of size 0xc000_0000
you get a 32-bit DMA mask, which is not what you need. So far I faked it in the
device-tree, but I guess it would be better to add an extra check in
of_dma_configure(), decrease the mask and print some kind of warning stating
that DMA addressing is suboptimal.

Regards,
Nicolas

[1] https://lkml.org/lkml/2018/9/19/641
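The 3GB case can be worked through numerically. The sketch below assumes the mask is derived as the smallest all-ones mask covering the last byte of the window, which is how the 32-bit result described above comes about; the helper names are hypothetical, and toy_contained_mask() shows the "decrease the mask" alternative for a window that starts at address 0.

```c
#include <assert.h>
#include <stdint.h>

/* Number of address bits needed to express last_addr (ilog2 + 1). */
unsigned int toy_addr_bits(uint64_t last_addr)
{
	unsigned int bits = 0;

	while (last_addr) {
		bits++;
		last_addr >>= 1;
	}
	return bits;
}

/* All-ones mask of the given width, in the spirit of DMA_BIT_MASK(). */
uint64_t toy_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Smallest mask that covers every address in [start, start + size). */
uint64_t toy_covering_mask(uint64_t start, uint64_t size)
{
	assert(size != 0 && start + size > 1);
	return toy_bit_mask(toy_addr_bits(start + size - 1));
}

/* Largest mask that never reaches past a window starting at address 0. */
uint64_t toy_contained_mask(uint64_t size)
{
	assert(size >= 2);
	return toy_bit_mask(toy_addr_bits(size) - 1);
}
```

For a window of size 0xc000_0000 at address 0, the covering mask is 0xffff_ffff, which also admits the unreachable fourth gigabyte. The contained mask is 0x7fff_ffff, which is safe but gives up the third gigabyte, hence "suboptimal".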
On 2019-09-26 11:44 am, Nicolas Saenz Julienne wrote:
>>>>> Robin, have you looked into supporting multiple dma-ranges? It's the
>>>>> next thing
>>>>> we need for BCM STB's PCIe. I'll have a go at it myself if nothing is in
>>>>> the
>>>>> works already.
>>>>
>>>> Multiple dma-ranges as far as configuring inbound windows should work
>>>> already other than the bug when there's any parent translation. But if
>>>> you mean supporting multiple DMA offsets and masks per device in the
>>>> DMA API, there's nothing in the works yet.
>
> Sorry, I meant supporting multiple DMA offsets[1]. I think I could still make
> it with a single DMA mask though.

The main problem for supporting that case in general is the disgusting
carving up of the physical memory map you may have to do to guarantee
that a single buffer allocation cannot ever span two windows with
different offsets. I don't think we ever reached a conclusion on whether
that was even achievable in practice.

>>> There's also the in-between step of making of_dma_get_range() return a
>>> size based on all the dma-ranges entries rather than only the first one
>>> - otherwise, something like [1] can lead to pretty unworkable default
>>> masks. We implemented that when doing acpi_dma_get_range(), it's just
>>> that the OF counterpart never caught up.
>>
>> Right. I suppose we assume any holes in the ranges are addressable by
>> the device but won't get used for other reasons (such as no memory
>> there). However, to be correct, the range of the dma offset plus mask
>> would need to be within the min start and max end addresses. IOW,
>> while we need to round up (0xa_8000_0000 - 0x2c1c_0000) to the next
>> power of 2, the 'correct' thing to do is round down.
>
> IIUC I also have this issue on my list. The RPi4 PCIe block has an integration
> bug that only allows DMA to the lower 3GB. With dma-ranges of size 0xc000_0000
> you get a 32bit DMA mask wich is not what you need. So far I faked it in the
> device-tree but I guess it be better to add an extra check in
> of_dma_configure(), decrease the mask and print some kind of warning stating
> that DMA addressing is suboptimal.

Yeah, there's just no way for masks to describe that the device can
drive all the individual bits, just not in certain combinations :(

The plan I have sketched out there is to merge dma_pfn_offset and
bus_dma_mask into a "DMA range" descriptor, so we can then hang one or
more of those off a device to properly cope with all these weird
interconnects. Conceptually it feels pretty straightforward; I think
most of the challenge is in implementing it efficiently. Plus there's
the question of whether it could also subsume the dma_mask as well.

Robin.
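One way to picture such a "DMA range" descriptor is a per-device table of windows, each carrying its own offset and limit. The sketch below is purely illustrative: struct toy_dma_range and toy_phys_to_dma() are hypothetical names, not an existing kernel interface, and the linear lookup ignores the efficiency concern raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical descriptor: one window combines what an offset
 * (cpu_start vs dma_start) and a limit (size) would express.
 */
struct toy_dma_range {
	uint64_t cpu_start;	/* start of the window in CPU physical space */
	uint64_t dma_start;	/* start of the same window as the device sees it */
	uint64_t size;		/* window length in bytes */
};

/*
 * Translate a buffer [phys, phys + len) to a device address.
 * Fails unless the whole buffer sits inside a single window, so a buffer
 * can never straddle two windows with different offsets.
 */
bool toy_phys_to_dma(const struct toy_dma_range *map, size_t n,
		     uint64_t phys, uint64_t len, uint64_t *dma)
{
	size_t i;

	assert(map != NULL && dma != NULL && len != 0);
	for (i = 0; i < n; i++) {
		const struct toy_dma_range *r = &map[i];

		if (phys < r->cpu_start || len > r->size)
			continue;
		if (phys - r->cpu_start > r->size - len)
			continue;	/* starts outside or runs past the end */
		*dma = phys - r->cpu_start + r->dma_start;
		return true;
	}
	return false;
}
```

Because the lookup rejects any buffer that is not wholly inside one window, the same table covers both the addressing limit and the per-window offset.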
On 9/26/2019 4:20 AM, Robin Murphy wrote:
> On 2019-09-26 11:44 am, Nicolas Saenz Julienne wrote:
>>>>>> Robin, have you looked into supporting multiple dma-ranges? It's the
>>>>>> next thing
>>>>>> we need for BCM STB's PCIe. I'll have a go at it myself if nothing
>>>>>> is in
>>>>>> the
>>>>>> works already.
>>>>>
>>>>> Multiple dma-ranges as far as configuring inbound windows should work
>>>>> already other than the bug when there's any parent translation. But if
>>>>> you mean supporting multiple DMA offsets and masks per device in the
>>>>> DMA API, there's nothing in the works yet.
>>
>> Sorry, I meant supporting multiple DMA offsets[1]. I think I could
>> still make
>> it with a single DMA mask though.
>
> The main problem for supporting that case in general is the disgusting
> carving up of the physical memory map you may have to do to guarantee
> that a single buffer allocation cannot ever span two windows with
> different offsets. I don't think we ever reached a conclusion on whether
> that was even achievable in practice.

It is with the Broadcom STB SoCs, which have between 1 and 3 memory
controllers depending on the SoC, and multiple dma-ranges cells for PCIe
as a consequence.

Each memory controller has a different physical address aperture in the
CPU's physical address map (e.g.: MEMC0 is 0x0 - 0x3fff_ffff, MEMC1
0x4000_0000 - 0x7fff_ffff and MEMC2 0x8000_0000 - 0xbfff_ffff, not
counting the extension regions above 4GB). While the CPU is scheduled
and arbitrated the same way across all memory controllers (making it
almost UMA), having a buffer span two memory controllers would be
problematic, because the memory controllers do not know how to guarantee
transaction ordering and buffer data consistency, both in DRAM itself
and for other memory controller clients such as PCIe.

We historically had to reserve the last 4KB of each memory controller to
prevent problematic controllers like EHCI from prefetching beyond the
end of a memory controller's populated memory, and that also
incidentally takes care of never having a buffer cross a controller
boundary. Either you can allocate the entire buffer on a given memory
controller, or you cannot allocate memory at all in that zone/region and
another one must be found (or there is no more memory and there is a
genuine OOM).

The way we reserve memory right now is based on the first patch
submitted by Jim:

https://lore.kernel.org/patchwork/patch/988469/

whereby we read the memory node's "reg" property, map the physical
addresses to the memory controller configuration read from the specific
registers in the CPU's Bus Interface Unit (where the memory controller
apertures are architecturally defined), and then use that to call
memblock_reserve() (not part of that patch, though it should be).
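The effect of the 4KB guard can be sketched with the example apertures given above. This is only an illustration, assuming fully populated 1GB apertures; struct toy_memc, toy_guard_start() and toy_buf_ok() are hypothetical names and do not reflect how the BIU registers or memblock_reserve() are actually used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GUARD_SIZE UINT64_C(0x1000)	/* last 4KB of each controller */

/* Hypothetical description of one memory controller aperture. */
struct toy_memc {
	uint64_t base;
	uint64_t size;
};

/* First byte of the reserved guard at the end of a controller. */
uint64_t toy_guard_start(const struct toy_memc *m)
{
	assert(m != NULL && m->size > TOY_GUARD_SIZE);
	return m->base + m->size - TOY_GUARD_SIZE;
}

/*
 * True if [addr, addr + len) fits inside the usable part of exactly one
 * controller. With the guard reserved, no usable buffer can be contiguous
 * across a controller boundary.
 */
bool toy_buf_ok(const struct toy_memc *m, size_t n, uint64_t addr, uint64_t len)
{
	size_t i;

	assert(m != NULL && len != 0);
	for (i = 0; i < n; i++) {
		uint64_t usable = m[i].size - TOY_GUARD_SIZE;

		if (addr < m[i].base || len > usable)
			continue;
		if (addr - m[i].base <= usable - len)
			return true;
	}
	return false;
}
```

With MEMC0 at 0x0 - 0x3fff_ffff, the guard starts at 0x3fff_f000, so a buffer ending exactly there is fine while one that continues into MEMC1 at 0x4000_0000 is rejected.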