Message ID | 20190809175741.7066-2-marek.vasut@gmail.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | [V3,1/3] PCI: rcar: Move the inbound index check |
On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>
> In case the "dma-ranges" DT property contains either too many ranges
> or the range start address is unaligned in such a way that populating
> the range into the controller requires multiple entries, a situation
> may occur where all ranges cannot be loaded into the controller.
>
> Currently, the driver refuses to probe in such a situation. Relax this
> behavior, load as many ranges as possible and warn if some ranges do
> not fit anymore.

What is the motivation for relaxing this?

> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: linux-renesas-soc@vger.kernel.org
> To: linux-pci@vger.kernel.org
> ---
> V2: Update on top of 1/3
> V3: No change
> ---
>  drivers/pci/controller/pcie-rcar.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
> index 56a6433eb70b..e2735005ffd3 100644
> --- a/drivers/pci/controller/pcie-rcar.c
> +++ b/drivers/pci/controller/pcie-rcar.c
> @@ -1049,8 +1049,9 @@ static int rcar_pcie_inbound_ranges(struct rcar_pcie *pcie,
>
>  	while (cpu_addr < cpu_end) {
>  		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
> -			dev_err(pcie->dev, "Failed to map inbound regions!\n");
> -			return -EINVAL;
> +			dev_warn(pcie->dev,
> +				 "Too many inbound regions, not all are mapped.\n");
> +			break;
>  		}
>  		/*
>  		 * Set up 64-bit inbound regions as the range parser doesn't
> --
> 2.20.1
>
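For context on the hunk above: it sits inside a loop that splits each dma-ranges entry into naturally aligned, power-of-two sized inbound windows, which is how a single unaligned or oddly sized range can consume several of the controller's windows. The following is a minimal sketch of that pattern, not the actual pcie-rcar.c code; the MAX_NR_INBOUND_MAPS value, the register-pair assumption and the commented-out program_inbound_window() helper are illustrative assumptions.

#include <linux/bitops.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/sizes.h>

/*
 * Illustrative sketch only, not the real driver: split one CPU range
 * into naturally aligned, power-of-two sized inbound windows.  An
 * unaligned or oddly sized range therefore needs several windows, which
 * is how the idx check in the hunk above can trigger even with a short
 * dma-ranges list.
 */
#define MAX_NR_INBOUND_MAPS	6	/* assumed value for this sketch */

static int map_inbound_range(struct device *dev, u64 cpu_addr, u64 size,
			     int idx)
{
	u64 cpu_end = cpu_addr + size;

	while (cpu_addr < cpu_end) {
		u64 win_size;

		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
			dev_warn(dev,
				 "Too many inbound regions, not all are mapped.\n");
			break;
		}

		/* Largest window that is aligned on its own size... */
		win_size = cpu_addr ? BIT_ULL(__ffs64(cpu_addr)) : SZ_4G;
		/* ...and that still fits into the remaining range. */
		while (win_size > cpu_end - cpu_addr)
			win_size >>= 1;

		/* program_inbound_window(dev, cpu_addr, win_size, idx); */
		cpu_addr += win_size;
		idx += 2;	/* assume each window uses a register pair */
	}

	return idx;
}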
On 8/16/19 3:23 PM, Simon Horman wrote:
> On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>
>> In case the "dma-ranges" DT property contains either too many ranges
>> or the range start address is unaligned in such a way that populating
>> the range into the controller requires multiple entries, a situation
>> may occur where all ranges cannot be loaded into the controller.
>>
>> Currently, the driver refuses to probe in such a situation. Relax this
>> behavior, load as many ranges as possible and warn if some ranges do
>> not fit anymore.
>
> What is the motivation for relaxing this?

U-Boot can fill the ranges in properly now, the list would be longer in
such a case and the driver would fail to probe (because the list is
longer than what the hardware can support).
On Fri, Aug 16, 2019 at 03:28:04PM +0200, Marek Vasut wrote:
> On 8/16/19 3:23 PM, Simon Horman wrote:
> > On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> >> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> >>
> >> In case the "dma-ranges" DT property contains either too many ranges
> >> or the range start address is unaligned in such a way that populating
> >> the range into the controller requires multiple entries, a situation
> >> may occur where all ranges cannot be loaded into the controller.
> >>
> >> Currently, the driver refuses to probe in such a situation. Relax this
> >> behavior, load as many ranges as possible and warn if some ranges do
> >> not fit anymore.
> >
> > What is the motivation for relaxing this?
>
> U-Boot can fill the ranges in properly now, the list would be longer in
> such a case and the driver would fail to probe (because the list is
> longer than what the hardware can support).

Thanks, I think that would be worth adding to the changelog.

Regardless,

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
On 8/16/19 3:38 PM, Simon Horman wrote:
> On Fri, Aug 16, 2019 at 03:28:04PM +0200, Marek Vasut wrote:
>> On 8/16/19 3:23 PM, Simon Horman wrote:
>>> On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
>>>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>>>
>>>> In case the "dma-ranges" DT property contains either too many ranges
>>>> or the range start address is unaligned in such a way that populating
>>>> the range into the controller requires multiple entries, a situation
>>>> may occur where all ranges cannot be loaded into the controller.
>>>>
>>>> Currently, the driver refuses to probe in such a situation. Relax this
>>>> behavior, load as many ranges as possible and warn if some ranges do
>>>> not fit anymore.
>>>
>>> What is the motivation for relaxing this?
>>
>> U-Boot can fill the ranges in properly now, the list would be longer in
>> such a case and the driver would fail to probe (because the list is
>> longer than what the hardware can support).
>
> Thanks, I think that would be worth adding to the changelog.

It does describe exactly what I just said -- if there are too many
ranges or they start in a way that cannot be easily fully programmed
into the HW, this patch applies.

> Regardless,
>
> Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
>
On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>
> In case the "dma-ranges" DT property contains either too many ranges
> or the range start address is unaligned in such a way that populating
> the range into the controller requires multiple entries, a situation
> may occur where all ranges cannot be loaded into the controller.
>
> Currently, the driver refuses to probe in such a situation. Relax this
> behavior, load as many ranges as possible and warn if some ranges do
> not fit anymore.

Patches (1) and (3) are fine but I do not think this one is.

Firmware (DT) should provide dma-ranges according to what HW can handle,
more so given that other subsystems (eg IOMMU) rely on the dma-ranges
value to set-up eg DMA - if there is a mismatch between PCI host inbound
regions and software structures describing DMA'able ranges all bets are
off.

I would not merge this specific patch but let me know what you think.

Thanks,
Lorenzo

> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: linux-renesas-soc@vger.kernel.org
> To: linux-pci@vger.kernel.org
> ---
> V2: Update on top of 1/3
> V3: No change
> ---
>  drivers/pci/controller/pcie-rcar.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
> index 56a6433eb70b..e2735005ffd3 100644
> --- a/drivers/pci/controller/pcie-rcar.c
> +++ b/drivers/pci/controller/pcie-rcar.c
> @@ -1049,8 +1049,9 @@ static int rcar_pcie_inbound_ranges(struct rcar_pcie *pcie,
>
>  	while (cpu_addr < cpu_end) {
>  		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
> -			dev_err(pcie->dev, "Failed to map inbound regions!\n");
> -			return -EINVAL;
> +			dev_warn(pcie->dev,
> +				 "Too many inbound regions, not all are mapped.\n");
> +			break;
>  		}
>  		/*
>  		 * Set up 64-bit inbound regions as the range parser doesn't
> --
> 2.20.1
>
On 10/16/19 5:00 PM, Lorenzo Pieralisi wrote:
> On Fri, Aug 09, 2019 at 07:57:40PM +0200, Marek Vasut wrote:
>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>
>> In case the "dma-ranges" DT property contains either too many ranges
>> or the range start address is unaligned in such a way that populating
>> the range into the controller requires multiple entries, a situation
>> may occur where all ranges cannot be loaded into the controller.
>>
>> Currently, the driver refuses to probe in such a situation. Relax this
>> behavior, load as many ranges as possible and warn if some ranges do
>> not fit anymore.
>
> Patches (1) and (3) are fine but I do not think this one is.
>
> Firmware (DT) should provide dma-ranges according to what HW can handle,
> more so given that other subsystems (eg IOMMU) rely on the dma-ranges
> value to set-up eg DMA - if there is a mismatch between PCI host inbound
> regions and software structures describing DMA'able ranges all bets are
> off.

The firmware provides all the ranges which are available and usable,
that's the hardware description and that should be in the DT.

The firmware cannot decide the policy for the next stage (Linux in
this case) on which ranges are better to use for Linux and which are
less good. Linux can then decide which ranges are best suited for it
and ignore the other ones.
On Wed, Oct 16, 2019 at 05:10:02PM +0200, Marek Vasut wrote: > On 10/16/19 5:00 PM, Lorenzo Pieralisi wrote: > > On Fri, Aug 09, 2019 at 07:57:40PM +0200, Marek Vasut wrote: > >> From: Marek Vasut <marek.vasut+renesas@gmail.com> > >> > >> In case the "dma-ranges" DT property contains either too many ranges > >> or the range start address is unaligned in such a way that populating > >> the range into the controller requires multiple entries, a situation > >> may occur where all ranges cannot be loaded into the controller. > >> > >> Currently, the driver refuses to probe in such a situation. Relax this > >> behavior, load as many ranges as possible and warn if some ranges do > >> not fit anymore. > > > > Patches (1) and (3) are fine but I do not think this one is. > > > > Firmware (DT) should provide dma-ranges according to what HW can handle, > > more so given that other subsystems (eg IOMMU) rely on the dma-ranges > > value to set-up eg DMA - if there is a mismatch between PCI host inbound > > regions and software structures describing DMA'able ranges all bets are > > off. > > The firmware provides all the ranges which are available and usable, > that's the hardware description and that should be in the DT. If the HW (given that those dma-ranges are declared for the PCI host controller) can't be programmed to enable those DMA ranges - those ranges are neither available nor usable, ergo DT is broken. > The firmware cannot decide the policy for the next stage (Linux in > this case) on which ranges are better to use for Linux and which are > less good. Linux can then decide which ranges are best suited for it > and ignore the other ones. dma-ranges is a property that is used by other kernel subsystems eg IOMMU other than the RCAR host controller driver. The policy, provided there is one should be shared across them. You can't leave a PCI host controller half-programmed and expect other subsystems (that *expect* those ranges to be DMA'ble) to work. I reiterate my point: if firmware is broken it is better to fail the probe rather than limp on hoping that things will keep on working. Lorenzo
On 10/16/19 5:26 PM, Lorenzo Pieralisi wrote: > On Wed, Oct 16, 2019 at 05:10:02PM +0200, Marek Vasut wrote: >> On 10/16/19 5:00 PM, Lorenzo Pieralisi wrote: >>> On Fri, Aug 09, 2019 at 07:57:40PM +0200, Marek Vasut wrote: >>>> From: Marek Vasut <marek.vasut+renesas@gmail.com> >>>> >>>> In case the "dma-ranges" DT property contains either too many ranges >>>> or the range start address is unaligned in such a way that populating >>>> the range into the controller requires multiple entries, a situation >>>> may occur where all ranges cannot be loaded into the controller. >>>> >>>> Currently, the driver refuses to probe in such a situation. Relax this >>>> behavior, load as many ranges as possible and warn if some ranges do >>>> not fit anymore. >>> >>> Patches (1) and (3) are fine but I do not think this one is. >>> >>> Firmware (DT) should provide dma-ranges according to what HW can handle, >>> more so given that other subsystems (eg IOMMU) rely on the dma-ranges >>> value to set-up eg DMA - if there is a mismatch between PCI host inbound >>> regions and software structures describing DMA'able ranges all bets are >>> off. >> >> The firmware provides all the ranges which are available and usable, >> that's the hardware description and that should be in the DT. > > If the HW (given that those dma-ranges are declared for the PCI host > controller) can't be programmed to enable those DMA ranges - those > ranges are neither available nor usable, ergo DT is broken. The hardware can be programmed to enable those DMA ranges, just not all of them at the same time. It's not the job of the bootloader to guess which ranges might the next stage like best. >> The firmware cannot decide the policy for the next stage (Linux in >> this case) on which ranges are better to use for Linux and which are >> less good. Linux can then decide which ranges are best suited for it >> and ignore the other ones. > > dma-ranges is a property that is used by other kernel subsystems eg > IOMMU other than the RCAR host controller driver. The policy, provided > there is one should be shared across them. You can't leave a PCI > host controller half-programmed and expect other subsystems (that > *expect* those ranges to be DMA'ble) to work. > > I reiterate my point: if firmware is broken it is better to fail > the probe rather than limp on hoping that things will keep on > working. But the firmware is not broken ?
[+RobH, Robin] On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote: [...] > >> The firmware provides all the ranges which are available and usable, > >> that's the hardware description and that should be in the DT. > > > > If the HW (given that those dma-ranges are declared for the PCI host > > controller) can't be programmed to enable those DMA ranges - those > > ranges are neither available nor usable, ergo DT is broken. > > The hardware can be programmed to enable those DMA ranges, just not all > of them at the same time. Ok, we are down to DT bindings interpretation then. > It's not the job of the bootloader to guess which ranges might the next > stage like best. By the time this series: https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419 is merged, your policy will require the host controller driver to remove the DMA ranges that could not be programmed in the inbound address decoders from the dma_ranges list, otherwise things will fall apart. > >> The firmware cannot decide the policy for the next stage (Linux in > >> this case) on which ranges are better to use for Linux and which are > >> less good. Linux can then decide which ranges are best suited for it > >> and ignore the other ones. > > > > dma-ranges is a property that is used by other kernel subsystems eg > > IOMMU other than the RCAR host controller driver. The policy, provided > > there is one should be shared across them. You can't leave a PCI > > host controller half-programmed and expect other subsystems (that > > *expect* those ranges to be DMA'ble) to work. > > > > I reiterate my point: if firmware is broken it is better to fail > > the probe rather than limp on hoping that things will keep on > > working. > > But the firmware is not broken ? See above, it depends on how the dma-ranges property is interpreted, hopefully we can reach consensus in this thread, I won't merge a patch that can backfire later unless we all agree that what it does is correct. Lorenzo
On Wed, Oct 16, 2019 at 11:18 AM Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > > [+RobH, Robin] > > On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote: > > [...] > > > >> The firmware provides all the ranges which are available and usable, > > >> that's the hardware description and that should be in the DT. > > > > > > If the HW (given that those dma-ranges are declared for the PCI host > > > controller) can't be programmed to enable those DMA ranges - those > > > ranges are neither available nor usable, ergo DT is broken. > > > > The hardware can be programmed to enable those DMA ranges, just not all > > of them at the same time. > > Ok, we are down to DT bindings interpretation then. > > > It's not the job of the bootloader to guess which ranges might the next > > stage like best. > > By the time this series: > > https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419 > > is merged, your policy will require the host controller driver to > remove the DMA ranges that could not be programmed in the inbound > address decoders from the dma_ranges list, otherwise things will > fall apart. I don't think the above series has too much impact on this. It's my other series dealing with dma masks that's relevant because for dma masks we only ever look at the first dma-ranges entry. We either have to support multiple addresses and sizes per device (the only way to really support any possible dma-ranges), merge entries to single offset/mask or have some way to select which range entry to use. So things are broken to some extent regardless unless MAX_NR_INBOUND_MAPS == 1. > > >> The firmware cannot decide the policy for the next stage (Linux in > > >> this case) on which ranges are better to use for Linux and which are > > >> less good. Linux can then decide which ranges are best suited for it > > >> and ignore the other ones. > > > > > > dma-ranges is a property that is used by other kernel subsystems eg > > > IOMMU other than the RCAR host controller driver. The policy, provided > > > there is one should be shared across them. You can't leave a PCI > > > host controller half-programmed and expect other subsystems (that > > > *expect* those ranges to be DMA'ble) to work. > > > > > > I reiterate my point: if firmware is broken it is better to fail > > > the probe rather than limp on hoping that things will keep on > > > working. > > > > But the firmware is not broken ? > > See above, it depends on how the dma-ranges property is interpreted, > hopefully we can reach consensus in this thread, I won't merge a patch > that can backfire later unless we all agree that what it does is > correct. Defining more dma-ranges entries than the h/w has inbound windows for sounds like a broken DT to me. What exactly does dma-ranges contain in this case? I'm not really visualizing how different clients would pick different dma-ranges entries. Rob
On 10/16/19 8:12 PM, Rob Herring wrote: > On Wed, Oct 16, 2019 at 11:18 AM Lorenzo Pieralisi > <lorenzo.pieralisi@arm.com> wrote: >> >> [+RobH, Robin] >> >> On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote: >> >> [...] >> >>>>> The firmware provides all the ranges which are available and usable, >>>>> that's the hardware description and that should be in the DT. >>>> >>>> If the HW (given that those dma-ranges are declared for the PCI host >>>> controller) can't be programmed to enable those DMA ranges - those >>>> ranges are neither available nor usable, ergo DT is broken. >>> >>> The hardware can be programmed to enable those DMA ranges, just not all >>> of them at the same time. >> >> Ok, we are down to DT bindings interpretation then. >> >>> It's not the job of the bootloader to guess which ranges might the next >>> stage like best. >> >> By the time this series: >> >> https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419 >> >> is merged, your policy will require the host controller driver to >> remove the DMA ranges that could not be programmed in the inbound >> address decoders from the dma_ranges list, otherwise things will >> fall apart. > > I don't think the above series has too much impact on this. It's my > other series dealing with dma masks that's relevant because for dma > masks we only ever look at the first dma-ranges entry. We either have > to support multiple addresses and sizes per device (the only way to > really support any possible dma-ranges), merge entries to single > offset/mask or have some way to select which range entry to use. > > So things are broken to some extent regardless unless MAX_NR_INBOUND_MAPS == 1. > >>>>> The firmware cannot decide the policy for the next stage (Linux in >>>>> this case) on which ranges are better to use for Linux and which are >>>>> less good. Linux can then decide which ranges are best suited for it >>>>> and ignore the other ones. >>>> >>>> dma-ranges is a property that is used by other kernel subsystems eg >>>> IOMMU other than the RCAR host controller driver. The policy, provided >>>> there is one should be shared across them. You can't leave a PCI >>>> host controller half-programmed and expect other subsystems (that >>>> *expect* those ranges to be DMA'ble) to work. >>>> >>>> I reiterate my point: if firmware is broken it is better to fail >>>> the probe rather than limp on hoping that things will keep on >>>> working. >>> >>> But the firmware is not broken ? >> >> See above, it depends on how the dma-ranges property is interpreted, >> hopefully we can reach consensus in this thread, I won't merge a patch >> that can backfire later unless we all agree that what it does is >> correct. > > Defining more dma-ranges entries than the h/w has inbound windows for > sounds like a broken DT to me. > > What exactly does dma-ranges contain in this case? I'm not really > visualizing how different clients would pick different dma-ranges > entries. You can have multiple non-continuous DRAM banks for example. And an entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a separate dma-ranges entry, right ?
On Wed, Oct 16, 2019 at 1:18 PM Marek Vasut <marek.vasut@gmail.com> wrote: > > On 10/16/19 8:12 PM, Rob Herring wrote: > > On Wed, Oct 16, 2019 at 11:18 AM Lorenzo Pieralisi > > <lorenzo.pieralisi@arm.com> wrote: > >> > >> [+RobH, Robin] > >> > >> On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote: > >> > >> [...] > >> > >>>>> The firmware provides all the ranges which are available and usable, > >>>>> that's the hardware description and that should be in the DT. > >>>> > >>>> If the HW (given that those dma-ranges are declared for the PCI host > >>>> controller) can't be programmed to enable those DMA ranges - those > >>>> ranges are neither available nor usable, ergo DT is broken. > >>> > >>> The hardware can be programmed to enable those DMA ranges, just not all > >>> of them at the same time. > >> > >> Ok, we are down to DT bindings interpretation then. > >> > >>> It's not the job of the bootloader to guess which ranges might the next > >>> stage like best. > >> > >> By the time this series: > >> > >> https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419 > >> > >> is merged, your policy will require the host controller driver to > >> remove the DMA ranges that could not be programmed in the inbound > >> address decoders from the dma_ranges list, otherwise things will > >> fall apart. > > > > I don't think the above series has too much impact on this. It's my > > other series dealing with dma masks that's relevant because for dma > > masks we only ever look at the first dma-ranges entry. We either have > > to support multiple addresses and sizes per device (the only way to > > really support any possible dma-ranges), merge entries to single > > offset/mask or have some way to select which range entry to use. > > > > So things are broken to some extent regardless unless MAX_NR_INBOUND_MAPS == 1. > > > >>>>> The firmware cannot decide the policy for the next stage (Linux in > >>>>> this case) on which ranges are better to use for Linux and which are > >>>>> less good. Linux can then decide which ranges are best suited for it > >>>>> and ignore the other ones. > >>>> > >>>> dma-ranges is a property that is used by other kernel subsystems eg > >>>> IOMMU other than the RCAR host controller driver. The policy, provided > >>>> there is one should be shared across them. You can't leave a PCI > >>>> host controller half-programmed and expect other subsystems (that > >>>> *expect* those ranges to be DMA'ble) to work. > >>>> > >>>> I reiterate my point: if firmware is broken it is better to fail > >>>> the probe rather than limp on hoping that things will keep on > >>>> working. > >>> > >>> But the firmware is not broken ? > >> > >> See above, it depends on how the dma-ranges property is interpreted, > >> hopefully we can reach consensus in this thread, I won't merge a patch > >> that can backfire later unless we all agree that what it does is > >> correct. > > > > Defining more dma-ranges entries than the h/w has inbound windows for > > sounds like a broken DT to me. > > > > What exactly does dma-ranges contain in this case? I'm not really > > visualizing how different clients would pick different dma-ranges > > entries. > > You can have multiple non-continuous DRAM banks for example. And an > entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a > separate dma-ranges entry, right ? Not necessarily. We really only want to define the minimum we have to. The ideal system is no dma-ranges. 
Is each bank at a different relative position compared to the CPU's view of the system. That would seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. I suppose if your intent is to use inbound windows as a poor man's IOMMU to prevent accesses to the holes, then yes you would list them out. But I think that's wrong and difficult to maintain. You'd also need to deal with reserved-memory regions too. Rob
On 10/16/19 10:25 PM, Rob Herring wrote: [...] >>>>>>> The firmware cannot decide the policy for the next stage (Linux in >>>>>>> this case) on which ranges are better to use for Linux and which are >>>>>>> less good. Linux can then decide which ranges are best suited for it >>>>>>> and ignore the other ones. >>>>>> >>>>>> dma-ranges is a property that is used by other kernel subsystems eg >>>>>> IOMMU other than the RCAR host controller driver. The policy, provided >>>>>> there is one should be shared across them. You can't leave a PCI >>>>>> host controller half-programmed and expect other subsystems (that >>>>>> *expect* those ranges to be DMA'ble) to work. >>>>>> >>>>>> I reiterate my point: if firmware is broken it is better to fail >>>>>> the probe rather than limp on hoping that things will keep on >>>>>> working. >>>>> >>>>> But the firmware is not broken ? >>>> >>>> See above, it depends on how the dma-ranges property is interpreted, >>>> hopefully we can reach consensus in this thread, I won't merge a patch >>>> that can backfire later unless we all agree that what it does is >>>> correct. >>> >>> Defining more dma-ranges entries than the h/w has inbound windows for >>> sounds like a broken DT to me. >>> >>> What exactly does dma-ranges contain in this case? I'm not really >>> visualizing how different clients would pick different dma-ranges >>> entries. >> >> You can have multiple non-continuous DRAM banks for example. And an >> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a >> separate dma-ranges entry, right ? > > Not necessarily. We really only want to define the minimum we have to. > The ideal system is no dma-ranges. Is each bank at a different > relative position compared to the CPU's view of the system. That would > seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit boundary and some more above the 32bit boundary. These two banks don't need to be continuous. And then you could add the SRAM into the mix. > I suppose if your intent is to use inbound windows as a poor man's > IOMMU to prevent accesses to the holes, then yes you would list them > out. But I think that's wrong and difficult to maintain. You'd also > need to deal with reserved-memory regions too. What's the problem with that? The bootloader has all that information and can patch the DT correctly. In fact, in my specific case, I have platform which can be populated with differently sized DRAM, so the holes are also dynamically calculated ; there is no one DT then, the bootloader is responsible to generate the dma-ranges accordingly.
On Wed, Oct 16, 2019 at 4:16 PM Marek Vasut <marek.vasut@gmail.com> wrote: > > On 10/16/19 10:25 PM, Rob Herring wrote: > [...] > >>>>>>> The firmware cannot decide the policy for the next stage (Linux in > >>>>>>> this case) on which ranges are better to use for Linux and which are > >>>>>>> less good. Linux can then decide which ranges are best suited for it > >>>>>>> and ignore the other ones. > >>>>>> > >>>>>> dma-ranges is a property that is used by other kernel subsystems eg > >>>>>> IOMMU other than the RCAR host controller driver. The policy, provided > >>>>>> there is one should be shared across them. You can't leave a PCI > >>>>>> host controller half-programmed and expect other subsystems (that > >>>>>> *expect* those ranges to be DMA'ble) to work. > >>>>>> > >>>>>> I reiterate my point: if firmware is broken it is better to fail > >>>>>> the probe rather than limp on hoping that things will keep on > >>>>>> working. > >>>>> > >>>>> But the firmware is not broken ? > >>>> > >>>> See above, it depends on how the dma-ranges property is interpreted, > >>>> hopefully we can reach consensus in this thread, I won't merge a patch > >>>> that can backfire later unless we all agree that what it does is > >>>> correct. > >>> > >>> Defining more dma-ranges entries than the h/w has inbound windows for > >>> sounds like a broken DT to me. > >>> > >>> What exactly does dma-ranges contain in this case? I'm not really > >>> visualizing how different clients would pick different dma-ranges > >>> entries. > >> > >> You can have multiple non-continuous DRAM banks for example. And an > >> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a > >> separate dma-ranges entry, right ? > > > > Not necessarily. We really only want to define the minimum we have to. > > The ideal system is no dma-ranges. Is each bank at a different > > relative position compared to the CPU's view of the system. That would > > seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. > > Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit > boundary and some more above the 32bit boundary. These two banks don't > need to be continuous. And then you could add the SRAM into the mix. Continuous is irrelevant. My question was in more specific terms is (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs. PCI host view (i.e. bus addr)? If not, then that is 1 translation and 1 dma-ranges entry. > > I suppose if your intent is to use inbound windows as a poor man's > > IOMMU to prevent accesses to the holes, then yes you would list them > > out. But I think that's wrong and difficult to maintain. You'd also > > need to deal with reserved-memory regions too. > > What's the problem with that? The bootloader has all that information > and can patch the DT correctly. In fact, in my specific case, I have > platform which can be populated with differently sized DRAM, so the > holes are also dynamically calculated ; there is no one DT then, the > bootloader is responsible to generate the dma-ranges accordingly. The problems are it doesn't work: Your dma-mask and offset are not going to be correct. You are running out of inbound windows. Your patch does nothing to solve that. The solution would be merging multiple dma-ranges entries to a single inbound window. We'd have to do that both for dma-mask and inbound windows. The former would also have to figure out which entries apply to setting up dma-mask. 
I'm simply suggesting just do that up front and avoid any pointless splits. You are setting up random inbound windows. The bootloader can't assume what order the OS parses dma-ranges, and the OS can't assume what order the bootloader writes the entries. Rob
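To make the "merge entries to a single offset/mask" idea concrete, here is a rough sketch. It assumes the parsed dma-ranges are available as a resource_entry list (as the host-bridge series linked earlier in the thread would provide), it is only a valid simplification when every entry shares the same CPU-to-bus offset, and the helper name is hypothetical.

#include <linux/kernel.h>
#include <linux/limits.h>
#include <linux/resource_ext.h>

/*
 * Hypothetical sketch: collapse several dma-ranges entries into one
 * covering CPU range so a single inbound window (and a single DMA
 * offset/mask) suffices.  Only meaningful if all entries share the same
 * CPU-to-bus offset; holes between the entries become reachable as a
 * side effect, which is exactly the trade-off debated in this thread.
 */
static void merge_dma_ranges(struct list_head *dma_ranges,
			     u64 *cpu_start, u64 *cpu_end)
{
	struct resource_entry *entry;

	*cpu_start = U64_MAX;
	*cpu_end = 0;

	resource_list_for_each_entry(entry, dma_ranges) {
		*cpu_start = min_t(u64, *cpu_start, entry->res->start);
		*cpu_end = max_t(u64, *cpu_end, entry->res->end);
	}
}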
On 10/17/19 12:26 AM, Rob Herring wrote: [...] >>>> You can have multiple non-continuous DRAM banks for example. And an >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a >>>> separate dma-ranges entry, right ? >>> >>> Not necessarily. We really only want to define the minimum we have to. >>> The ideal system is no dma-ranges. Is each bank at a different >>> relative position compared to the CPU's view of the system. That would >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. >> >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit >> boundary and some more above the 32bit boundary. These two banks don't >> need to be continuous. And then you could add the SRAM into the mix. > > Continuous is irrelevant. My question was in more specific terms is > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs. > PCI host view (i.e. bus addr)? If not, then that is 1 translation and > 1 dma-ranges entry. I don't think it's different in that aspect. Except the bus has this 32bit limitation, where it only sees subset of the DRAM. Why should the DMA ranges incorrectly cover also the DRAM which is not present ? >>> I suppose if your intent is to use inbound windows as a poor man's >>> IOMMU to prevent accesses to the holes, then yes you would list them >>> out. But I think that's wrong and difficult to maintain. You'd also >>> need to deal with reserved-memory regions too. >> >> What's the problem with that? The bootloader has all that information >> and can patch the DT correctly. In fact, in my specific case, I have >> platform which can be populated with differently sized DRAM, so the >> holes are also dynamically calculated ; there is no one DT then, the >> bootloader is responsible to generate the dma-ranges accordingly. > > The problems are it doesn't work: > > Your dma-mask and offset are not going to be correct. > > You are running out of inbound windows. Your patch does nothing to > solve that. The solution would be merging multiple dma-ranges entries > to a single inbound window. We'd have to do that both for dma-mask and > inbound windows. The former would also have to figure out which > entries apply to setting up dma-mask. I'm simply suggesting just do > that up front and avoid any pointless splits. But then the PCI device can trigger a transaction to non-existent DRAM and cause undefined behavior. Surely we do not want that ? > You are setting up random inbound windows. The bootloader can't assume > what order the OS parses dma-ranges, and the OS can't assume what > order the bootloader writes the entries. But the OS can assume the ranges are correct and cover only valid memory, right ? That is, memory into which the PCI controller can safely access.
Hi Marek, On Thu, Oct 17, 2019 at 12:33 AM Marek Vasut <marek.vasut@gmail.com> wrote: > On 10/17/19 12:26 AM, Rob Herring wrote: > [...] > >>>> You can have multiple non-continuous DRAM banks for example. And an > >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a > >>>> separate dma-ranges entry, right ? > >>> > >>> Not necessarily. We really only want to define the minimum we have to. > >>> The ideal system is no dma-ranges. Is each bank at a different > >>> relative position compared to the CPU's view of the system. That would > >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. > >> > >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit > >> boundary and some more above the 32bit boundary. These two banks don't > >> need to be continuous. And then you could add the SRAM into the mix. > > > > Continuous is irrelevant. My question was in more specific terms is > > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs. > > PCI host view (i.e. bus addr)? If not, then that is 1 translation and > > 1 dma-ranges entry. > > I don't think it's different in that aspect. Except the bus has this > 32bit limitation, where it only sees subset of the DRAM. > > Why should the DMA ranges incorrectly cover also the DRAM which is not > present ? > > >>> I suppose if your intent is to use inbound windows as a poor man's > >>> IOMMU to prevent accesses to the holes, then yes you would list them > >>> out. But I think that's wrong and difficult to maintain. You'd also > >>> need to deal with reserved-memory regions too. > >> > >> What's the problem with that? The bootloader has all that information > >> and can patch the DT correctly. In fact, in my specific case, I have > >> platform which can be populated with differently sized DRAM, so the > >> holes are also dynamically calculated ; there is no one DT then, the > >> bootloader is responsible to generate the dma-ranges accordingly. > > > > The problems are it doesn't work: > > > > Your dma-mask and offset are not going to be correct. > > > > You are running out of inbound windows. Your patch does nothing to > > solve that. The solution would be merging multiple dma-ranges entries > > to a single inbound window. We'd have to do that both for dma-mask and > > inbound windows. The former would also have to figure out which > > entries apply to setting up dma-mask. I'm simply suggesting just do > > that up front and avoid any pointless splits. > > But then the PCI device can trigger a transaction to non-existent DRAM > and cause undefined behavior. Surely we do not want that ? The PCI device will trigger transactions to memory only when instructed to do so by Linux, right? Hence if Linux takes into account chosen/memory and dma-ranges, there is no problem? > > You are setting up random inbound windows. The bootloader can't assume > > what order the OS parses dma-ranges, and the OS can't assume what > > order the bootloader writes the entries. > > But the OS can assume the ranges are correct and cover only valid > memory, right ? That is, memory into which the PCI controller can safely > access. Gr{oetje,eeting}s, Geert
On 10/17/19 9:06 AM, Geert Uytterhoeven wrote: [...] >>>>> I suppose if your intent is to use inbound windows as a poor man's >>>>> IOMMU to prevent accesses to the holes, then yes you would list them >>>>> out. But I think that's wrong and difficult to maintain. You'd also >>>>> need to deal with reserved-memory regions too. >>>> >>>> What's the problem with that? The bootloader has all that information >>>> and can patch the DT correctly. In fact, in my specific case, I have >>>> platform which can be populated with differently sized DRAM, so the >>>> holes are also dynamically calculated ; there is no one DT then, the >>>> bootloader is responsible to generate the dma-ranges accordingly. >>> >>> The problems are it doesn't work: >>> >>> Your dma-mask and offset are not going to be correct. >>> >>> You are running out of inbound windows. Your patch does nothing to >>> solve that. The solution would be merging multiple dma-ranges entries >>> to a single inbound window. We'd have to do that both for dma-mask and >>> inbound windows. The former would also have to figure out which >>> entries apply to setting up dma-mask. I'm simply suggesting just do >>> that up front and avoid any pointless splits. >> >> But then the PCI device can trigger a transaction to non-existent DRAM >> and cause undefined behavior. Surely we do not want that ? > > The PCI device will trigger transactions to memory only when instructed > to do so by Linux, right? Hence if Linux takes into account chosen/memory > and dma-ranges, there is no problem? Unless of course the remote device initiates a transfer. And if the controller is programmed such that accesses to the missing DRAM in the holes are not filtered out by the controller, then the controller will gladly let the transaction through. Do we really want to let this happen ? [...]
On 17/10/2019 11:55, Marek Vasut wrote: > On 10/17/19 9:06 AM, Geert Uytterhoeven wrote: > > [...] > >>>>>> I suppose if your intent is to use inbound windows as a poor man's >>>>>> IOMMU to prevent accesses to the holes, then yes you would list them >>>>>> out. But I think that's wrong and difficult to maintain. You'd also >>>>>> need to deal with reserved-memory regions too. >>>>> >>>>> What's the problem with that? The bootloader has all that information >>>>> and can patch the DT correctly. In fact, in my specific case, I have >>>>> platform which can be populated with differently sized DRAM, so the >>>>> holes are also dynamically calculated ; there is no one DT then, the >>>>> bootloader is responsible to generate the dma-ranges accordingly. >>>> >>>> The problems are it doesn't work: >>>> >>>> Your dma-mask and offset are not going to be correct. >>>> >>>> You are running out of inbound windows. Your patch does nothing to >>>> solve that. The solution would be merging multiple dma-ranges entries >>>> to a single inbound window. We'd have to do that both for dma-mask and >>>> inbound windows. The former would also have to figure out which >>>> entries apply to setting up dma-mask. I'm simply suggesting just do >>>> that up front and avoid any pointless splits. >>> >>> But then the PCI device can trigger a transaction to non-existent DRAM >>> and cause undefined behavior. Surely we do not want that ? >> >> The PCI device will trigger transactions to memory only when instructed >> to do so by Linux, right? Hence if Linux takes into account chosen/memory >> and dma-ranges, there is no problem? > > Unless of course the remote device initiates a transfer. And if the > controller is programmed such that accesses to the missing DRAM in the > holes are not filtered out by the controller, then the controller will > gladly let the transaction through. Do we really want to let this happen ? If you've got devices making random unsolicited accesses then who's to say they wouldn't also hit valid windows and corrupt memory? If it's happening at all you've already lost. And realistically, if the address isn't valid then it's not going to make much difference anyway - in probably 99% of cases, either the transaction doesn't hit a window and the host bridge returns a completer abort, or it does hit a window, the AXI side returns DECERR or SLVERR, and the host bridge translates that into a completer abort. Consider also that many PCI IPs don't have discrete windows and just map the entirety of PCI mem space directly to the system PA space. I don't believe this is a valid argument for anything whichever way round. Robin.
On 10/17/19 3:06 PM, Robin Murphy wrote: > On 17/10/2019 11:55, Marek Vasut wrote: >> On 10/17/19 9:06 AM, Geert Uytterhoeven wrote: >> >> [...] >> >>>>>>> I suppose if your intent is to use inbound windows as a poor man's >>>>>>> IOMMU to prevent accesses to the holes, then yes you would list them >>>>>>> out. But I think that's wrong and difficult to maintain. You'd also >>>>>>> need to deal with reserved-memory regions too. >>>>>> >>>>>> What's the problem with that? The bootloader has all that information >>>>>> and can patch the DT correctly. In fact, in my specific case, I have >>>>>> platform which can be populated with differently sized DRAM, so the >>>>>> holes are also dynamically calculated ; there is no one DT then, the >>>>>> bootloader is responsible to generate the dma-ranges accordingly. >>>>> >>>>> The problems are it doesn't work: >>>>> >>>>> Your dma-mask and offset are not going to be correct. >>>>> >>>>> You are running out of inbound windows. Your patch does nothing to >>>>> solve that. The solution would be merging multiple dma-ranges entries >>>>> to a single inbound window. We'd have to do that both for dma-mask and >>>>> inbound windows. The former would also have to figure out which >>>>> entries apply to setting up dma-mask. I'm simply suggesting just do >>>>> that up front and avoid any pointless splits. >>>> >>>> But then the PCI device can trigger a transaction to non-existent DRAM >>>> and cause undefined behavior. Surely we do not want that ? >>> >>> The PCI device will trigger transactions to memory only when instructed >>> to do so by Linux, right? Hence if Linux takes into account >>> chosen/memory >>> and dma-ranges, there is no problem? >> >> Unless of course the remote device initiates a transfer. And if the >> controller is programmed such that accesses to the missing DRAM in the >> holes are not filtered out by the controller, then the controller will >> gladly let the transaction through. Do we really want to let this >> happen ? > > If you've got devices making random unsolicited accesses then who's to > say they wouldn't also hit valid windows and corrupt memory? If it's > happening at all you've already lost. Not necessarily. If your controller is programmed correctly with just the ranges that are valid, then it will filter out at least the accesses outside of valid memory. If it is programmed incorrectly, as you suggest, then the accesses will go through, causing undefined behavior. And note that there is such weird buggy PCI hardware. A slightly unrelated example are some of the ath9k, which are generating spurious MSIs even if they are in legacy PCI IRQ mode. If the controller is configured correctly, even those buggy cards work, because it can filter the spurious MSIs out. If not, they do not. That's why I would prefer to configure the controller correctly, not just hope that nothing bad will come out of misconfiguring it slightly. > And realistically, if the address > isn't valid then it's not going to make much difference anyway - in > probably 99% of cases, either the transaction doesn't hit a window and > the host bridge returns a completer abort, or it does hit a window, the > AXI side returns DECERR or SLVERR, and the host bridge translates that > into a completer abort. Consider also that many PCI IPs don't have > discrete windows and just map the entirety of PCI mem space directly to > the system PA space. 
And in that 1% of cases, we are OK with failure which could have been easily prevented if the controller was programmed correctly ? That does not look like a good thing. > I don't believe this is a valid argument for anything whichever way round.
On Thu, Oct 17, 2019 at 9:00 AM Marek Vasut <marek.vasut@gmail.com> wrote: > > On 10/17/19 3:06 PM, Robin Murphy wrote: > > On 17/10/2019 11:55, Marek Vasut wrote: > >> On 10/17/19 9:06 AM, Geert Uytterhoeven wrote: > >> > >> [...] > >> > >>>>>>> I suppose if your intent is to use inbound windows as a poor man's > >>>>>>> IOMMU to prevent accesses to the holes, then yes you would list them > >>>>>>> out. But I think that's wrong and difficult to maintain. You'd also > >>>>>>> need to deal with reserved-memory regions too. > >>>>>> > >>>>>> What's the problem with that? The bootloader has all that information > >>>>>> and can patch the DT correctly. In fact, in my specific case, I have > >>>>>> platform which can be populated with differently sized DRAM, so the > >>>>>> holes are also dynamically calculated ; there is no one DT then, the > >>>>>> bootloader is responsible to generate the dma-ranges accordingly. > >>>>> > >>>>> The problems are it doesn't work: > >>>>> > >>>>> Your dma-mask and offset are not going to be correct. > >>>>> > >>>>> You are running out of inbound windows. Your patch does nothing to > >>>>> solve that. The solution would be merging multiple dma-ranges entries > >>>>> to a single inbound window. We'd have to do that both for dma-mask and > >>>>> inbound windows. The former would also have to figure out which > >>>>> entries apply to setting up dma-mask. I'm simply suggesting just do > >>>>> that up front and avoid any pointless splits. > >>>> > >>>> But then the PCI device can trigger a transaction to non-existent DRAM > >>>> and cause undefined behavior. Surely we do not want that ? > >>> > >>> The PCI device will trigger transactions to memory only when instructed > >>> to do so by Linux, right? Hence if Linux takes into account > >>> chosen/memory > >>> and dma-ranges, there is no problem? > >> > >> Unless of course the remote device initiates a transfer. And if the > >> controller is programmed such that accesses to the missing DRAM in the > >> holes are not filtered out by the controller, then the controller will > >> gladly let the transaction through. Do we really want to let this > >> happen ? > > > > If you've got devices making random unsolicited accesses then who's to > > say they wouldn't also hit valid windows and corrupt memory? If it's > > happening at all you've already lost. > > Not necessarily. If your controller is programmed correctly with just > the ranges that are valid, then it will filter out at least the accesses > outside of valid memory. If it is programmed incorrectly, as you > suggest, then the accesses will go through, causing undefined behavior. > > And note that there is such weird buggy PCI hardware. A slightly > unrelated example are some of the ath9k, which are generating spurious > MSIs even if they are in legacy PCI IRQ mode. If the controller is > configured correctly, even those buggy cards work, because it can filter > the spurious MSIs out. If not, they do not. How do those devices work on h/w without inbound window configuration or they don't? How do the spurious MSIs only go to invalid addresses and not valid addresses? > That's why I would prefer to configure the controller correctly, not > just hope that nothing bad will come out of misconfiguring it slightly. Again, just handling the first N dma-ranges entries and ignoring the rest is not 'configure the controller correctly'. 
> > And realistically, if the address > > isn't valid then it's not going to make much difference anyway - in > > probably 99% of cases, either the transaction doesn't hit a window and > > the host bridge returns a completer abort, or it does hit a window, the > > AXI side returns DECERR or SLVERR, and the host bridge translates that > > into a completer abort. Consider also that many PCI IPs don't have > > discrete windows and just map the entirety of PCI mem space directly to > > the system PA space. > > And in that 1% of cases, we are OK with failure which could have been > easily prevented if the controller was programmed correctly ? That does > not look like a good thing. You don't need dma-ranges if you want to handle holes in DRAM. Use memblock to get this information. Then it will work if you boot using UEFI too. dma-ranges at the PCI bridge should be restrictions in the PCI bridge, not ones somewhere else in the system. Rob
On 10/17/19 4:36 PM, Rob Herring wrote: [...]>>>>> The PCI device will trigger transactions to memory only when instructed >>>>> to do so by Linux, right? Hence if Linux takes into account >>>>> chosen/memory >>>>> and dma-ranges, there is no problem? >>>> >>>> Unless of course the remote device initiates a transfer. And if the >>>> controller is programmed such that accesses to the missing DRAM in the >>>> holes are not filtered out by the controller, then the controller will >>>> gladly let the transaction through. Do we really want to let this >>>> happen ? >>> >>> If you've got devices making random unsolicited accesses then who's to >>> say they wouldn't also hit valid windows and corrupt memory? If it's >>> happening at all you've already lost. >> >> Not necessarily. If your controller is programmed correctly with just >> the ranges that are valid, then it will filter out at least the accesses >> outside of valid memory. If it is programmed incorrectly, as you >> suggest, then the accesses will go through, causing undefined behavior. >> >> And note that there is such weird buggy PCI hardware. A slightly >> unrelated example are some of the ath9k, which are generating spurious >> MSIs even if they are in legacy PCI IRQ mode. If the controller is >> configured correctly, even those buggy cards work, because it can filter >> the spurious MSIs out. If not, they do not. > > How do those devices work on h/w without inbound window configuration > or they don't? With legacy IRQs. > How do the spurious MSIs only go to invalid addresses and not valid addresses? They do not, the point was, there is such broken hardware so the controller should be programmed correctly. >> That's why I would prefer to configure the controller correctly, not >> just hope that nothing bad will come out of misconfiguring it slightly. > > Again, just handling the first N dma-ranges entries and ignoring the > rest is not 'configure the controller correctly'. It's the best effort thing to do. It's well possible the next generation of the controller will have more windows, so could accommodate the whole list of ranges. Thinking about this further, this patch should be OK either way, if there is a DT which defines more DMA ranges than the controller can handle, handling some is better than failing outright -- a PCI which works with a subset of memory is better than PCI that does not work at all. >>> And realistically, if the address >>> isn't valid then it's not going to make much difference anyway - in >>> probably 99% of cases, either the transaction doesn't hit a window and >>> the host bridge returns a completer abort, or it does hit a window, the >>> AXI side returns DECERR or SLVERR, and the host bridge translates that >>> into a completer abort. Consider also that many PCI IPs don't have >>> discrete windows and just map the entirety of PCI mem space directly to >>> the system PA space. >> >> And in that 1% of cases, we are OK with failure which could have been >> easily prevented if the controller was programmed correctly ? That does >> not look like a good thing. > > You don't need dma-ranges if you want to handle holes in DRAM. Use > memblock to get this information. Then it will work if you boot using > UEFI too. Do you have any further details about this ? > dma-ranges at the PCI bridge should be restrictions in the PCI bridge, > not ones somewhere else in the system.
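As for the memblock suggestion, one possible shape is sketched below: walk the DRAM banks the kernel actually knows about and program one inbound window per bank, so holes between banks are never mapped regardless of what the bootloader put in the DT. The iterator signature matches current kernels; the window count and the commented-out programming helper are assumptions.

#include <linux/device.h>
#include <linux/memblock.h>

#define MAX_NR_INBOUND_MAPS	6	/* assumed, as in the sketch above */

/*
 * Sketch of deriving inbound windows from the actual DRAM layout via
 * memblock instead of dma-ranges.  Illustrative only.
 */
static int map_dram_banks(struct device *dev)
{
	phys_addr_t start, end;
	u64 i;
	int idx = 0;

	for_each_mem_range(i, &start, &end) {
		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
			dev_warn(dev, "Too many DRAM banks, not all are mapped.\n");
			break;
		}
		/* program_inbound_window(dev, start, end - start, idx); */
		idx += 2;
	}

	return 0;
}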
On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: [...] > > Again, just handling the first N dma-ranges entries and ignoring the > > rest is not 'configure the controller correctly'. > > It's the best effort thing to do. It's well possible the next generation > of the controller will have more windows, so could accommodate the whole > list of ranges. > > Thinking about this further, this patch should be OK either way, if > there is a DT which defines more DMA ranges than the controller can > handle, handling some is better than failing outright -- a PCI which > works with a subset of memory is better than PCI that does not work at all. OK to sum it up, this patch is there to deal with u-boot adding multiple dma-ranges to DT. I still do not understand the benefit given that for DMA masks they are useless as Rob pointed out and ditto for inbound windows programming (given that AFAICS the PCI controller filters out any transaction that does not fall within its inbound windows by default so adding dma-ranges has the net effect of widening the DMA'able address space rather than limiting it). In short, what's the benefit of adding more dma-ranges regions to the DT (and consequently handling them in the kernel) ? Thanks, Lorenzo > >>> And realistically, if the address > >>> isn't valid then it's not going to make much difference anyway - in > >>> probably 99% of cases, either the transaction doesn't hit a window and > >>> the host bridge returns a completer abort, or it does hit a window, the > >>> AXI side returns DECERR or SLVERR, and the host bridge translates that > >>> into a completer abort. Consider also that many PCI IPs don't have > >>> discrete windows and just map the entirety of PCI mem space directly to > >>> the system PA space. > >> > >> And in that 1% of cases, we are OK with failure which could have been > >> easily prevented if the controller was programmed correctly ? That does > >> not look like a good thing. > > > > You don't need dma-ranges if you want to handle holes in DRAM. Use > > memblock to get this information. Then it will work if you boot using > > UEFI too. > > Do you have any further details about this ? > > > dma-ranges at the PCI bridge should be restrictions in the PCI bridge, > > not ones somewhere else in the system. > > -- > Best regards, > Marek Vasut
On Thu, Oct 17, 2019 at 12:33:24AM +0200, Marek Vasut wrote: > On 10/17/19 12:26 AM, Rob Herring wrote: > [...] > >>>> You can have multiple non-continuous DRAM banks for example. And an > >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a > >>>> separate dma-ranges entry, right ? > >>> > >>> Not necessarily. We really only want to define the minimum we have to. > >>> The ideal system is no dma-ranges. Is each bank at a different > >>> relative position compared to the CPU's view of the system. That would > >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. > >> > >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit > >> boundary and some more above the 32bit boundary. These two banks don't > >> need to be continuous. And then you could add the SRAM into the mix. > > > > Continuous is irrelevant. My question was in more specific terms is > > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs. > > PCI host view (i.e. bus addr)? If not, then that is 1 translation and > > 1 dma-ranges entry. > > I don't think it's different in that aspect. Except the bus has this > 32bit limitation, where it only sees subset of the DRAM. > > Why should the DMA ranges incorrectly cover also the DRAM which is not > present ? I think this is where there is a difference in understanding. If I understand correctly, the job of the dma-ranges property isn't to describe *what* ranges the PCI device can access - it's there to describe *how*, i.e. the mapping between PCI and CPU-visible memory. The dma-ranges property is a side-effect of how the busses are wired up between the CPU and PCI controller - and so it doesn't matter what is or isn't on those buses. It's the job of other parts of the system to ensure that PCI devices are told the correct addresses to write to, e.g. the enumerating software referring to a valid CPU visible address correctly translated for the view of the PCI device, ATS etc. And any IOMMU to enforce that. It sounds like there is a 1:1 mapping between CPU and PCI - in which case there isn't a reason for a dma-ranges. Thanks, Andrew Murray > > >>> I suppose if your intent is to use inbound windows as a poor man's > >>> IOMMU to prevent accesses to the holes, then yes you would list them > >>> out. But I think that's wrong and difficult to maintain. You'd also > >>> need to deal with reserved-memory regions too. > >> > >> What's the problem with that? The bootloader has all that information > >> and can patch the DT correctly. In fact, in my specific case, I have > >> platform which can be populated with differently sized DRAM, so the > >> holes are also dynamically calculated ; there is no one DT then, the > >> bootloader is responsible to generate the dma-ranges accordingly. > > > > The problems are it doesn't work: > > > > Your dma-mask and offset are not going to be correct. > > > > You are running out of inbound windows. Your patch does nothing to > > solve that. The solution would be merging multiple dma-ranges entries > > to a single inbound window. We'd have to do that both for dma-mask and > > inbound windows. The former would also have to figure out which > > entries apply to setting up dma-mask. I'm simply suggesting just do > > that up front and avoid any pointless splits. > > But then the PCI device can trigger a transaction to non-existent DRAM > and cause undefined behavior. Surely we do not want that ? > > > You are setting up random inbound windows. 
The bootloader can't assume > > what order the OS parses dma-ranges, and the OS can't assume what > > order the bootloader writes the entries. > > But the OS can assume the ranges are correct and cover only valid > memory, right ? That is, memory into which the PCI controller can safely > access. > > -- > Best regards, > Marek Vasut
Hi Andrew, On Fri, Oct 18, 2019 at 12:07 PM Andrew Murray <andrew.murray@arm.com> wrote: > On Thu, Oct 17, 2019 at 12:33:24AM +0200, Marek Vasut wrote: > > On 10/17/19 12:26 AM, Rob Herring wrote: > > [...] > > >>>> You can have multiple non-continuous DRAM banks for example. And an > > >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a > > >>>> separate dma-ranges entry, right ? > > >>> > > >>> Not necessarily. We really only want to define the minimum we have to. > > >>> The ideal system is no dma-ranges. Is each bank at a different > > >>> relative position compared to the CPU's view of the system. That would > > >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. > > >> > > >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit > > >> boundary and some more above the 32bit boundary. These two banks don't > > >> need to be continuous. And then you could add the SRAM into the mix. > > > > > > Continuous is irrelevant. My question was in more specific terms is > > > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs. > > > PCI host view (i.e. bus addr)? If not, then that is 1 translation and > > > 1 dma-ranges entry. > > > > I don't think it's different in that aspect. Except the bus has this > > 32bit limitation, where it only sees subset of the DRAM. > > > > Why should the DMA ranges incorrectly cover also the DRAM which is not > > present ? > > I think this is where there is a difference in understanding. > > If I understand correctly, the job of the dma-ranges property isn't to > describe *what* ranges the PCI device can access - it's there to describe > *how*, i.e. the mapping between PCI and CPU-visible memory. > > The dma-ranges property is a side-effect of how the busses are wired up > between the CPU and PCI controller - and so it doesn't matter what is or > isn't on those buses. > > It's the job of other parts of the system to ensure that PCI devices are > told the correct addresses to write to, e.g. the enumerating software > referring to a valid CPU visible address correctly translated for the view > of the PCI device, ATS etc. And any IOMMU to enforce that. Yep, that's what I thought, too. > It sounds like there is a 1:1 mapping between CPU and PCI - in which case > there isn't a reason for a dma-ranges. There's still the 32-bit limitation: PCI devices can access low 32-bit addresses only. Gr{oetje,eeting}s, Geert
On Fri, Oct 18, 2019 at 12:17:38PM +0200, Geert Uytterhoeven wrote: > Hi Andrew, > > On Fri, Oct 18, 2019 at 12:07 PM Andrew Murray <andrew.murray@arm.com> wrote: > > On Thu, Oct 17, 2019 at 12:33:24AM +0200, Marek Vasut wrote: > > > On 10/17/19 12:26 AM, Rob Herring wrote: > > > [...] > > > >>>> You can have multiple non-continuous DRAM banks for example. And an > > > >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a > > > >>>> separate dma-ranges entry, right ? > > > >>> > > > >>> Not necessarily. We really only want to define the minimum we have to. > > > >>> The ideal system is no dma-ranges. Is each bank at a different > > > >>> relative position compared to the CPU's view of the system. That would > > > >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change. > > > >> > > > >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit > > > >> boundary and some more above the 32bit boundary. These two banks don't > > > >> need to be continuous. And then you could add the SRAM into the mix. > > > > > > > > Continuous is irrelevant. My question was in more specific terms is > > > > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs. > > > > PCI host view (i.e. bus addr)? If not, then that is 1 translation and > > > > 1 dma-ranges entry. > > > > > > I don't think it's different in that aspect. Except the bus has this > > > 32bit limitation, where it only sees subset of the DRAM. > > > > > > Why should the DMA ranges incorrectly cover also the DRAM which is not > > > present ? > > > > I think this is where there is a difference in understanding. > > > > If I understand correctly, the job of the dma-ranges property isn't to > > describe *what* ranges the PCI device can access - it's there to describe > > *how*, i.e. the mapping between PCI and CPU-visible memory. > > > > The dma-ranges property is a side-effect of how the busses are wired up > > between the CPU and PCI controller - and so it doesn't matter what is or > > isn't on those buses. > > > > It's the job of other parts of the system to ensure that PCI devices are > > told the correct addresses to write to, e.g. the enumerating software > > referring to a valid CPU visible address correctly translated for the view > > of the PCI device, ATS etc. And any IOMMU to enforce that. > > Yep, that's what I thought, too. > > > It sounds like there is a 1:1 mapping between CPU and PCI - in which case > > there isn't a reason for a dma-ranges. > > There's still the 32-bit limitation: PCI devices can access low 32-bit addresses > only. I guess a single dma-range that is limited to 32bits would work here? Thanks, Andrew Murray > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds
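For illustration, the single 32-bit-limited entry suggested above could look roughly like the following sketch. This is not taken from an actual R-Car device tree; the flags cell (0x42000000, 32-bit prefetchable memory space here), the 3/2/2 cell layout and the assumption that the window starts at zero all depend on the binding and SoC in question:

	dma-ranges = <0x42000000 0x0 0x00000000
		      0x0 0x00000000
		      0x1 0x00000000>;

This describes a 1:1 translation for the entire low 4 GiB of bus addresses (0x0-0xFFFFFFFF) without making any claim about how much of that space is backed by RAM.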
On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: > On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: > > [...] > >>> Again, just handling the first N dma-ranges entries and ignoring the >>> rest is not 'configure the controller correctly'. >> >> It's the best effort thing to do. It's well possible the next generation >> of the controller will have more windows, so could accommodate the whole >> list of ranges. >> >> Thinking about this further, this patch should be OK either way, if >> there is a DT which defines more DMA ranges than the controller can >> handle, handling some is better than failing outright -- a PCI which >> works with a subset of memory is better than PCI that does not work at all. > > OK to sum it up, this patch is there to deal with u-boot adding multiple > dma-ranges to DT. Yes, this patch was posted over two months ago, about the same time this functionality was posted for inclusion in U-Boot. It made it into a recent U-Boot release, but there was no feedback on the Linux patch until recently. U-Boot can be changed for the next release, assuming we agree on how it should behave. > I still do not understand the benefit given that for > DMA masks they are useless as Rob pointed out and ditto for inbound > windows programming (given that AFAICS the PCI controller filters out > any transaction that does not fall within its inbound windows by default > so adding dma-ranges has the net effect of widening the DMA'able address > space rather than limiting it). > > In short, what's the benefit of adding more dma-ranges regions to the > DT (and consequently handling them in the kernel) ? The benefit is programming the controller inbound windows correctly. But if there is a better way to do that, I am open to implementing that. Are there any suggestions or examples of that? However, I think the discussion has strayed quite far from the real goal of this patch. This patch only handles the case where there are too many dma-ranges in the DT which cannot all be programmed into the controller. Instead of failing, the patch allows the controller to work with a smaller range and reports that in the log, which I think is better than failing outright. [...]
On 18/10/2019 13:22, Marek Vasut wrote: > On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: >> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: >> >> [...] >> >>>> Again, just handling the first N dma-ranges entries and ignoring the >>>> rest is not 'configure the controller correctly'. >>> >>> It's the best effort thing to do. It's well possible the next generation >>> of the controller will have more windows, so could accommodate the whole >>> list of ranges. In the context of DT describing the platform that doesn't make any sense. It's like saying it's fine for U-Boot to also describe a bunch of non-existent CPUs just because future SoCs might have them. Just because the system would probably still boot doesn't mean it's right. >>> Thinking about this further, this patch should be OK either way, if >>> there is a DT which defines more DMA ranges than the controller can >>> handle, handling some is better than failing outright -- a PCI which >>> works with a subset of memory is better than PCI that does not work at all. >> >> OK to sum it up, this patch is there to deal with u-boot adding multiple >> dma-ranges to DT. > > Yes, this patch was posted over two months ago, about the same time this > functionality was posted for inclusion in U-Boot. It made it into recent > U-Boot release, but there was no feedback on the Linux patch until recently. > > U-Boot can be changed for the next release, assuming we agree on how it > should behave. > >> I still do not understand the benefit given that for >> DMA masks they are useless as Rob pointed out and ditto for inbound >> windows programming (given that AFAICS the PCI controller filters out >> any transaction that does not fall within its inbound windows by default >> so adding dma-ranges has the net effect of widening the DMA'able address >> space rather than limiting it). >> >> In short, what's the benefit of adding more dma-ranges regions to the >> DT (and consequently handling them in the kernel) ? > > The benefit is programming the controller inbound windows correctly. > But if there is a better way to do that, I am open to implement that. > Are there any suggestions / examples of that ? The crucial thing is that once we improve the existing "dma-ranges" handling in the DMA layer such that it *does* consider multiple entries properly, platforms presenting ranges which don't actually exist will almost certainly start going wrong, and are either going to have to fix their broken bootloaders or try to make a case for platform-specific workarounds in core code. Robin. > However, I think the discussion strayed quite far away from the real > goal of this patch. This patch only handles the case where there are too > many dma-ranges in the DT which cannot all be programmed into the > controller. Instead of failing, the patch allows the controller to work > with smaller range and reports that in the log, which I think is better > than outright failing. > > [...] >
On 10/18/19 2:53 PM, Robin Murphy wrote: > On 18/10/2019 13:22, Marek Vasut wrote: >> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: >>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: >>> >>> [...] >>> >>>>> Again, just handling the first N dma-ranges entries and ignoring the >>>>> rest is not 'configure the controller correctly'. >>>> >>>> It's the best effort thing to do. It's well possible the next >>>> generation >>>> of the controller will have more windows, so could accommodate the >>>> whole >>>> list of ranges. > > In the context of DT describing the platform that doesn't make any > sense. It's like saying it's fine for U-Boot to also describe a bunch of > non-existent CPUs just because future SoCs might have them. Just because > the system would probably still boot doesn't mean it's right. It's the exact opposite of what you just described -- the last release of U-Boot currently populates a subset of the DMA ranges, not a superset. The dma-ranges in the Linux DT currently are a superset of available DRAM on the platform. >>>> Thinking about this further, this patch should be OK either way, if >>>> there is a DT which defines more DMA ranges than the controller can >>>> handle, handling some is better than failing outright -- a PCI which >>>> works with a subset of memory is better than PCI that does not work >>>> at all. >>> >>> OK to sum it up, this patch is there to deal with u-boot adding multiple >>> dma-ranges to DT. >> >> Yes, this patch was posted over two months ago, about the same time this >> functionality was posted for inclusion in U-Boot. It made it into recent >> U-Boot release, but there was no feedback on the Linux patch until >> recently. >> >> U-Boot can be changed for the next release, assuming we agree on how it >> should behave. >> >>> I still do not understand the benefit given that for >>> DMA masks they are useless as Rob pointed out and ditto for inbound >>> windows programming (given that AFAICS the PCI controller filters out >>> any transaction that does not fall within its inbound windows by default >>> so adding dma-ranges has the net effect of widening the DMA'able address >>> space rather than limiting it). >>> >>> In short, what's the benefit of adding more dma-ranges regions to the >>> DT (and consequently handling them in the kernel) ? >> >> The benefit is programming the controller inbound windows correctly. >> But if there is a better way to do that, I am open to implement that. >> Are there any suggestions / examples of that ? > > The crucial thing is that once we improve the existing "dma-ranges" > handling in the DMA layer such that it *does* consider multiple entries > properly, platforms presenting ranges which don't actually exist will > almost certainly start going wrong, and are either going to have to fix > their broken bootloaders or try to make a case for platform-specific > workarounds in core code. Again, this is exactly the other way around, the dma-ranges populated by U-Boot cover only existing DRAM. The single dma-range in Linux DT covers even the holes without existing DRAM. So even if the Linux dma-ranges handling changes, there should be no problem. [...]
On 18/10/2019 15:26, Marek Vasut wrote: > On 10/18/19 2:53 PM, Robin Murphy wrote: >> On 18/10/2019 13:22, Marek Vasut wrote: >>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: >>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: >>>> >>>> [...] >>>> >>>>>> Again, just handling the first N dma-ranges entries and ignoring the >>>>>> rest is not 'configure the controller correctly'. >>>>> >>>>> It's the best effort thing to do. It's well possible the next >>>>> generation >>>>> of the controller will have more windows, so could accommodate the >>>>> whole >>>>> list of ranges. >> >> In the context of DT describing the platform that doesn't make any >> sense. It's like saying it's fine for U-Boot to also describe a bunch of >> non-existent CPUs just because future SoCs might have them. Just because >> the system would probably still boot doesn't mean it's right. > > It's the exact opposite of what you just described -- the last release > of U-Boot currently populates a subset of the DMA ranges, not a > superset. The dma-ranges in the Linux DT currently are a superset of > available DRAM on the platform. I'm not talking about the overall coverage of addresses - I've already made clear what I think about that - I'm talking about the *number* of individual entries. If the DT binding defines that dma-ranges entries directly represent bridge windows, then the bootloader for a given platform should never generate more entries than that platform has actual windows, because to do otherwise would be bogus. >>>>> Thinking about this further, this patch should be OK either way, if >>>>> there is a DT which defines more DMA ranges than the controller can >>>>> handle, handling some is better than failing outright -- a PCI which >>>>> works with a subset of memory is better than PCI that does not work >>>>> at all. >>>> >>>> OK to sum it up, this patch is there to deal with u-boot adding multiple >>>> dma-ranges to DT. >>> >>> Yes, this patch was posted over two months ago, about the same time this >>> functionality was posted for inclusion in U-Boot. It made it into recent >>> U-Boot release, but there was no feedback on the Linux patch until >>> recently. >>> >>> U-Boot can be changed for the next release, assuming we agree on how it >>> should behave. >>> >>>> I still do not understand the benefit given that for >>>> DMA masks they are useless as Rob pointed out and ditto for inbound >>>> windows programming (given that AFAICS the PCI controller filters out >>>> any transaction that does not fall within its inbound windows by default >>>> so adding dma-ranges has the net effect of widening the DMA'able address >>>> space rather than limiting it). >>>> >>>> In short, what's the benefit of adding more dma-ranges regions to the >>>> DT (and consequently handling them in the kernel) ? >>> >>> The benefit is programming the controller inbound windows correctly. >>> But if there is a better way to do that, I am open to implement that. >>> Are there any suggestions / examples of that ? >> >> The crucial thing is that once we improve the existing "dma-ranges" >> handling in the DMA layer such that it *does* consider multiple entries >> properly, platforms presenting ranges which don't actually exist will >> almost certainly start going wrong, and are either going to have to fix >> their broken bootloaders or try to make a case for platform-specific >> workarounds in core code. > Again, this is exactly the other way around, the dma-ranges populated by > U-Boot cover only existing DRAM. 
The single dma-range in Linux DT covers > even the holes without existing DRAM. > > So even if the Linux dma-ranges handling changes, there should be no > problem. Say you have a single hardware window, and this DT property (1-cell numbers for simplicity: dma-ranges = <0x00000000 0x00000000 0x80000000>; Driver reads one entry and programs the window to 2GB@0, DMA setup parses the first entry and sets device masks to 0x7fffffff, and everything's fine. Now say we describe the exact same address range this way instead: dma-ranges = <0x00000000 0x00000000 0x40000000, 0x40000000 0x40000000 0x40000000>; Driver reads one entry and programs the window to 1GB@0, DMA setup parses the first entry and sets device masks to 0x3fffffff, and *today*, things are suboptimal but happen to work. Now say we finally get round to fixing the of_dma code to properly generate DMA masks that actually include all usable address bits, a user upgrades their kernel package, and reboots with that same DT... Driver reads one entry and programs the window to 1GB@0, DMA setup parses all entries and sets device masks to 0x7fffffff, devices start randomly failing or throwing DMA errors half the time, angry user looks at the changelog to find that somebody decided their now-corrupted filesystem is less important than the fact that hey, at least the machine didn't refuse to boot because the DT was obviously wrong. Are you sure that shouldn't be a problem? Now, if you want to read the DT binding as less strict and let it just describe some arbitrarily-complex set of address ranges that should be valid for DMA, that's not insurmountable; you just need more complex logic in your driver capable of calculating how best to cover *all* those ranges using the available number of windows. Robin.
On 10/18/19 5:44 PM, Robin Murphy wrote: > On 18/10/2019 15:26, Marek Vasut wrote: >> On 10/18/19 2:53 PM, Robin Murphy wrote: >>> On 18/10/2019 13:22, Marek Vasut wrote: >>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: >>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: >>>>> >>>>> [...] >>>>> >>>>>>> Again, just handling the first N dma-ranges entries and ignoring the >>>>>>> rest is not 'configure the controller correctly'. >>>>>> >>>>>> It's the best effort thing to do. It's well possible the next >>>>>> generation >>>>>> of the controller will have more windows, so could accommodate the >>>>>> whole >>>>>> list of ranges. >>> >>> In the context of DT describing the platform that doesn't make any >>> sense. It's like saying it's fine for U-Boot to also describe a bunch of >>> non-existent CPUs just because future SoCs might have them. Just because >>> the system would probably still boot doesn't mean it's right. >> >> It's the exact opposite of what you just described -- the last release >> of U-Boot currently populates a subset of the DMA ranges, not a >> superset. The dma-ranges in the Linux DT currently are a superset of >> available DRAM on the platform. > > I'm not talking about the overall coverage of addresses - I've already > made clear what I think about that - I'm talking about the *number* of > individual entries. If the DT binding defines that dma-ranges entries > directly represent bridge windows, then the bootloader for a given > platform should never generate more entries than that platform has > actual windows, because to do otherwise would be bogus. I have a feeling that's not how Rob defined the dma-ranges in this discussion though. >>>>>> Thinking about this further, this patch should be OK either way, if >>>>>> there is a DT which defines more DMA ranges than the controller can >>>>>> handle, handling some is better than failing outright -- a PCI which >>>>>> works with a subset of memory is better than PCI that does not work >>>>>> at all. >>>>> >>>>> OK to sum it up, this patch is there to deal with u-boot adding >>>>> multiple >>>>> dma-ranges to DT. >>>> >>>> Yes, this patch was posted over two months ago, about the same time >>>> this >>>> functionality was posted for inclusion in U-Boot. It made it into >>>> recent >>>> U-Boot release, but there was no feedback on the Linux patch until >>>> recently. >>>> >>>> U-Boot can be changed for the next release, assuming we agree on how it >>>> should behave. >>>> >>>>> I still do not understand the benefit given that for >>>>> DMA masks they are useless as Rob pointed out and ditto for inbound >>>>> windows programming (given that AFAICS the PCI controller filters out >>>>> any transaction that does not fall within its inbound windows by >>>>> default >>>>> so adding dma-ranges has the net effect of widening the DMA'able >>>>> address >>>>> space rather than limiting it). >>>>> >>>>> In short, what's the benefit of adding more dma-ranges regions to the >>>>> DT (and consequently handling them in the kernel) ? >>>> >>>> The benefit is programming the controller inbound windows correctly. >>>> But if there is a better way to do that, I am open to implement that. >>>> Are there any suggestions / examples of that ? 
>>> >>> The crucial thing is that once we improve the existing "dma-ranges" >>> handling in the DMA layer such that it *does* consider multiple entries >>> properly, platforms presenting ranges which don't actually exist will >>> almost certainly start going wrong, and are either going to have to fix >>> their broken bootloaders or try to make a case for platform-specific >>> workarounds in core code. >> Again, this is exactly the other way around, the dma-ranges populated by >> U-Boot cover only existing DRAM. The single dma-range in Linux DT covers >> even the holes without existing DRAM. >> >> So even if the Linux dma-ranges handling changes, there should be no >> problem. > > Say you have a single hardware window, and this DT property (1-cell > numbers for simplicity: > > dma-ranges = <0x00000000 0x00000000 0x80000000>; > > Driver reads one entry and programs the window to 2GB@0, DMA setup > parses the first entry and sets device masks to 0x7fffffff, and > everything's fine. > > Now say we describe the exact same address range this way instead: > > dma-ranges = <0x00000000 0x00000000 0x40000000, > 0x40000000 0x40000000 0x40000000>; > > Driver reads one entry and programs the window to 1GB@0, DMA setup > parses the first entry and sets device masks to 0x3fffffff, and *today*, > things are suboptimal but happen to work. > > Now say we finally get round to fixing the of_dma code to properly > generate DMA masks that actually include all usable address bits, a user > upgrades their kernel package, and reboots with that same DT... > > Driver reads one entry and programs the window to 1GB@0, DMA setup > parses all entries and sets device masks to 0x7fffffff, devices start > randomly failing or throwing DMA errors half the time, angry user looks > at the changelog to find that somebody decided their now-corrupted > filesystem is less important than the fact that hey, at least the > machine didn't refuse to boot because the DT was obviously wrong. Are > you sure that shouldn't be a problem? I think you picked a rather special case here and arrived as a DMA mask which just fails in this special case. Such special case doesn't happen here, and even if it did, I would expect Linux to merge those two ranges or do something sane ? If the DMA mask is set incorrectly, that's a bug of the DMA code I would think. What DMA mask would you get if those two entries had a gap inbetween them ? E.g.: dma-ranges = <0x00000000 0x00000000 0x20000000, 0x40000000 0x40000000 0x20000000>; > Now, if you want to read the DT binding as less strict and let it just > describe some arbitrarily-complex set of address ranges that should be > valid for DMA, that's not insurmountable; you just need more complex > logic in your driver capable of calculating how best to cover *all* > those ranges using the available number of windows. That's what the driver does with this patchset, except it's not possible to cover all those ranges. It covers them as well as it can.
On 18/10/2019 17:44, Marek Vasut wrote: > On 10/18/19 5:44 PM, Robin Murphy wrote: >> On 18/10/2019 15:26, Marek Vasut wrote: >>> On 10/18/19 2:53 PM, Robin Murphy wrote: >>>> On 18/10/2019 13:22, Marek Vasut wrote: >>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: >>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: >>>>>> >>>>>> [...] >>>>>> >>>>>>>> Again, just handling the first N dma-ranges entries and ignoring the >>>>>>>> rest is not 'configure the controller correctly'. >>>>>>> >>>>>>> It's the best effort thing to do. It's well possible the next >>>>>>> generation >>>>>>> of the controller will have more windows, so could accommodate the >>>>>>> whole >>>>>>> list of ranges. >>>> >>>> In the context of DT describing the platform that doesn't make any >>>> sense. It's like saying it's fine for U-Boot to also describe a bunch of >>>> non-existent CPUs just because future SoCs might have them. Just because >>>> the system would probably still boot doesn't mean it's right. >>> >>> It's the exact opposite of what you just described -- the last release >>> of U-Boot currently populates a subset of the DMA ranges, not a >>> superset. The dma-ranges in the Linux DT currently are a superset of >>> available DRAM on the platform. >> >> I'm not talking about the overall coverage of addresses - I've already >> made clear what I think about that - I'm talking about the *number* of >> individual entries. If the DT binding defines that dma-ranges entries >> directly represent bridge windows, then the bootloader for a given >> platform should never generate more entries than that platform has >> actual windows, because to do otherwise would be bogus. > > I have a feeling that's not how Rob defined the dma-ranges in this > discussion though. > >>>>>>> Thinking about this further, this patch should be OK either way, if >>>>>>> there is a DT which defines more DMA ranges than the controller can >>>>>>> handle, handling some is better than failing outright -- a PCI which >>>>>>> works with a subset of memory is better than PCI that does not work >>>>>>> at all. >>>>>> >>>>>> OK to sum it up, this patch is there to deal with u-boot adding >>>>>> multiple >>>>>> dma-ranges to DT. >>>>> >>>>> Yes, this patch was posted over two months ago, about the same time >>>>> this >>>>> functionality was posted for inclusion in U-Boot. It made it into >>>>> recent >>>>> U-Boot release, but there was no feedback on the Linux patch until >>>>> recently. >>>>> >>>>> U-Boot can be changed for the next release, assuming we agree on how it >>>>> should behave. >>>>> >>>>>> I still do not understand the benefit given that for >>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound >>>>>> windows programming (given that AFAICS the PCI controller filters out >>>>>> any transaction that does not fall within its inbound windows by >>>>>> default >>>>>> so adding dma-ranges has the net effect of widening the DMA'able >>>>>> address >>>>>> space rather than limiting it). >>>>>> >>>>>> In short, what's the benefit of adding more dma-ranges regions to the >>>>>> DT (and consequently handling them in the kernel) ? >>>>> >>>>> The benefit is programming the controller inbound windows correctly. >>>>> But if there is a better way to do that, I am open to implement that. >>>>> Are there any suggestions / examples of that ? 
>>>> >>>> The crucial thing is that once we improve the existing "dma-ranges" >>>> handling in the DMA layer such that it *does* consider multiple entries >>>> properly, platforms presenting ranges which don't actually exist will >>>> almost certainly start going wrong, and are either going to have to fix >>>> their broken bootloaders or try to make a case for platform-specific >>>> workarounds in core code. >>> Again, this is exactly the other way around, the dma-ranges populated by >>> U-Boot cover only existing DRAM. The single dma-range in Linux DT covers >>> even the holes without existing DRAM. >>> >>> So even if the Linux dma-ranges handling changes, there should be no >>> problem. >> >> Say you have a single hardware window, and this DT property (1-cell >> numbers for simplicity: >> >> dma-ranges = <0x00000000 0x00000000 0x80000000>; >> >> Driver reads one entry and programs the window to 2GB@0, DMA setup >> parses the first entry and sets device masks to 0x7fffffff, and >> everything's fine. >> >> Now say we describe the exact same address range this way instead: >> >> dma-ranges = <0x00000000 0x00000000 0x40000000, >> 0x40000000 0x40000000 0x40000000>; >> >> Driver reads one entry and programs the window to 1GB@0, DMA setup >> parses the first entry and sets device masks to 0x3fffffff, and *today*, >> things are suboptimal but happen to work. >> >> Now say we finally get round to fixing the of_dma code to properly >> generate DMA masks that actually include all usable address bits, a user >> upgrades their kernel package, and reboots with that same DT... >> >> Driver reads one entry and programs the window to 1GB@0, DMA setup >> parses all entries and sets device masks to 0x7fffffff, devices start >> randomly failing or throwing DMA errors half the time, angry user looks >> at the changelog to find that somebody decided their now-corrupted >> filesystem is less important than the fact that hey, at least the >> machine didn't refuse to boot because the DT was obviously wrong. Are >> you sure that shouldn't be a problem? > > I think you picked a rather special case here and arrived as a DMA mask > which just fails in this special case. Such special case doesn't happen > here, and even if it did, I would expect Linux to merge those two ranges > or do something sane ? If the DMA mask is set incorrectly, that's a bug > of the DMA code I would think. The mask is not set incorrectly - DMA masks represent the number of address bits the device (or intermediate interconnect in the case of the bus mask) is capable of driving. Thus when DMA is limited to a specific address range, the masks should be wide enough to cover the topmost address of that range (unless the device's own capability is inherently narrower). > What DMA mask would you get if those two entries had a gap inbetween > them ? E.g.: > > dma-ranges = <0x00000000 0x00000000 0x20000000, > 0x40000000 0x40000000 0x20000000>; OK, here's an real non-simplified example (note that these windows are fixed and not programmed by Linux): dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>, <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>, <0x43000000 0x8 0x80000000 0x8 0x80000000 0x2 0x00000000>; The DMA masks for the devices behind this bridge *should* be 35 bits, because that's the size of the largest usable address. Currently, however, because of the of_dma code's deficiency they would end up being an utterly useless 30 bits, which isn't even enough to reach the start of RAM. 
Thus I can't actually have this property in my DT, and as a result I can't enable the IOMMU, because *that* also needs to know the ranges in order to reserve the unusable gaps between the windows once address translation is in play. >> Now, if you want to read the DT binding as less strict and let it just >> describe some arbitrarily-complex set of address ranges that should be >> valid for DMA, that's not insurmountable; you just need more complex >> logic in your driver capable of calculating how best to cover *all* >> those ranges using the available number of windows. > > That's what the driver does with this patchset, except it's not possible > to cover all those ranges. It covers them as well as it can. Which means by definition it *doesn't* do what I suggested there... Robin.
On 10/18/19 7:35 PM, Robin Murphy wrote: > On 18/10/2019 17:44, Marek Vasut wrote: >> On 10/18/19 5:44 PM, Robin Murphy wrote: >>> On 18/10/2019 15:26, Marek Vasut wrote: >>>> On 10/18/19 2:53 PM, Robin Murphy wrote: >>>>> On 18/10/2019 13:22, Marek Vasut wrote: >>>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: >>>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>>> Again, just handling the first N dma-ranges entries and >>>>>>>>> ignoring the >>>>>>>>> rest is not 'configure the controller correctly'. >>>>>>>> >>>>>>>> It's the best effort thing to do. It's well possible the next >>>>>>>> generation >>>>>>>> of the controller will have more windows, so could accommodate the >>>>>>>> whole >>>>>>>> list of ranges. >>>>> >>>>> In the context of DT describing the platform that doesn't make any >>>>> sense. It's like saying it's fine for U-Boot to also describe a >>>>> bunch of >>>>> non-existent CPUs just because future SoCs might have them. Just >>>>> because >>>>> the system would probably still boot doesn't mean it's right. >>>> >>>> It's the exact opposite of what you just described -- the last release >>>> of U-Boot currently populates a subset of the DMA ranges, not a >>>> superset. The dma-ranges in the Linux DT currently are a superset of >>>> available DRAM on the platform. >>> >>> I'm not talking about the overall coverage of addresses - I've already >>> made clear what I think about that - I'm talking about the *number* of >>> individual entries. If the DT binding defines that dma-ranges entries >>> directly represent bridge windows, then the bootloader for a given >>> platform should never generate more entries than that platform has >>> actual windows, because to do otherwise would be bogus. >> >> I have a feeling that's not how Rob defined the dma-ranges in this >> discussion though. >> >>>>>>>> Thinking about this further, this patch should be OK either way, if >>>>>>>> there is a DT which defines more DMA ranges than the controller can >>>>>>>> handle, handling some is better than failing outright -- a PCI >>>>>>>> which >>>>>>>> works with a subset of memory is better than PCI that does not work >>>>>>>> at all. >>>>>>> >>>>>>> OK to sum it up, this patch is there to deal with u-boot adding >>>>>>> multiple >>>>>>> dma-ranges to DT. >>>>>> >>>>>> Yes, this patch was posted over two months ago, about the same time >>>>>> this >>>>>> functionality was posted for inclusion in U-Boot. It made it into >>>>>> recent >>>>>> U-Boot release, but there was no feedback on the Linux patch until >>>>>> recently. >>>>>> >>>>>> U-Boot can be changed for the next release, assuming we agree on >>>>>> how it >>>>>> should behave. >>>>>> >>>>>>> I still do not understand the benefit given that for >>>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound >>>>>>> windows programming (given that AFAICS the PCI controller filters >>>>>>> out >>>>>>> any transaction that does not fall within its inbound windows by >>>>>>> default >>>>>>> so adding dma-ranges has the net effect of widening the DMA'able >>>>>>> address >>>>>>> space rather than limiting it). >>>>>>> >>>>>>> In short, what's the benefit of adding more dma-ranges regions to >>>>>>> the >>>>>>> DT (and consequently handling them in the kernel) ? >>>>>> >>>>>> The benefit is programming the controller inbound windows correctly. >>>>>> But if there is a better way to do that, I am open to implement that. >>>>>> Are there any suggestions / examples of that ? 
>>>>> >>>>> The crucial thing is that once we improve the existing "dma-ranges" >>>>> handling in the DMA layer such that it *does* consider multiple >>>>> entries >>>>> properly, platforms presenting ranges which don't actually exist will >>>>> almost certainly start going wrong, and are either going to have to >>>>> fix >>>>> their broken bootloaders or try to make a case for platform-specific >>>>> workarounds in core code. >>>> Again, this is exactly the other way around, the dma-ranges >>>> populated by >>>> U-Boot cover only existing DRAM. The single dma-range in Linux DT >>>> covers >>>> even the holes without existing DRAM. >>>> >>>> So even if the Linux dma-ranges handling changes, there should be no >>>> problem. >>> >>> Say you have a single hardware window, and this DT property (1-cell >>> numbers for simplicity: >>> >>> dma-ranges = <0x00000000 0x00000000 0x80000000>; >>> >>> Driver reads one entry and programs the window to 2GB@0, DMA setup >>> parses the first entry and sets device masks to 0x7fffffff, and >>> everything's fine. >>> >>> Now say we describe the exact same address range this way instead: >>> >>> dma-ranges = <0x00000000 0x00000000 0x40000000, >>> 0x40000000 0x40000000 0x40000000>; >>> >>> Driver reads one entry and programs the window to 1GB@0, DMA setup >>> parses the first entry and sets device masks to 0x3fffffff, and *today*, >>> things are suboptimal but happen to work. >>> >>> Now say we finally get round to fixing the of_dma code to properly >>> generate DMA masks that actually include all usable address bits, a user >>> upgrades their kernel package, and reboots with that same DT... >>> >>> Driver reads one entry and programs the window to 1GB@0, DMA setup >>> parses all entries and sets device masks to 0x7fffffff, devices start >>> randomly failing or throwing DMA errors half the time, angry user looks >>> at the changelog to find that somebody decided their now-corrupted >>> filesystem is less important than the fact that hey, at least the >>> machine didn't refuse to boot because the DT was obviously wrong. Are >>> you sure that shouldn't be a problem? >> >> I think you picked a rather special case here and arrived as a DMA mask >> which just fails in this special case. Such special case doesn't happen >> here, and even if it did, I would expect Linux to merge those two ranges >> or do something sane ? If the DMA mask is set incorrectly, that's a bug >> of the DMA code I would think. > > The mask is not set incorrectly - DMA masks represent the number of > address bits the device (or intermediate interconnect in the case of the > bus mask) is capable of driving. Thus when DMA is limited to a specific > address range, the masks should be wide enough to cover the topmost > address of that range (unless the device's own capability is inherently > narrower). Then the mask should be 0x7fffffff in both cases I'd say. >> What DMA mask would you get if those two entries had a gap inbetween >> them ? E.g.: >> >> dma-ranges = <0x00000000 0x00000000 0x20000000, >> 0x40000000 0x40000000 0x20000000>; > > OK, here's an real non-simplified example I would really like an answer to the simple example above before we start inventing convoluted ones. 
> (note that these windows are fixed and not programmed by Linux): > > dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>, > <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>, > <0x43000000 0x8 0x80000000 0x8 0x80000000 0x2 0x00000000>; > > The DMA masks for the devices behind this bridge *should* be 35 bits, > because that's the size of the largest usable address. Currently, > however, because of the of_dma code's deficiency they would end up being > an utterly useless 30 bits, which isn't even enough to reach the start > of RAM. Thus I can't actually have this property in my DT, and as a > result I can't enable the IOMMU, because *that* also needs to know the > ranges in order to reserve the unusable gaps between the windows once > address translation is in play. How is this related to this particular patch ? This looks more like some DMA internal topic. >>> Now, if you want to read the DT binding as less strict and let it just >>> describe some arbitrarily-complex set of address ranges that should be >>> valid for DMA, that's not insurmountable; you just need more complex >>> logic in your driver capable of calculating how best to cover *all* >>> those ranges using the available number of windows. >> >> That's what the driver does with this patchset, except it's not possible >> to cover all those ranges. It covers them as well as it can. > > Which means by definition it *doesn't* do what I suggested there... But then it didn't do this before either, or ... ?
Hi Marek, On Fri, Oct 18, 2019 at 9:03 PM Marek Vasut <marek.vasut@gmail.com> wrote: > On 10/18/19 7:35 PM, Robin Murphy wrote: > > On 18/10/2019 17:44, Marek Vasut wrote: > >> On 10/18/19 5:44 PM, Robin Murphy wrote: > >>> On 18/10/2019 15:26, Marek Vasut wrote: > >>>> On 10/18/19 2:53 PM, Robin Murphy wrote: > >>>>> On 18/10/2019 13:22, Marek Vasut wrote: > >>>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote: > >>>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote: > >>>>>>> > >>>>>>> [...] > >>>>>>> > >>>>>>>>> Again, just handling the first N dma-ranges entries and > >>>>>>>>> ignoring the > >>>>>>>>> rest is not 'configure the controller correctly'. > >>>>>>>> > >>>>>>>> It's the best effort thing to do. It's well possible the next > >>>>>>>> generation > >>>>>>>> of the controller will have more windows, so could accommodate the > >>>>>>>> whole > >>>>>>>> list of ranges. > >>>>> > >>>>> In the context of DT describing the platform that doesn't make any > >>>>> sense. It's like saying it's fine for U-Boot to also describe a > >>>>> bunch of > >>>>> non-existent CPUs just because future SoCs might have them. Just > >>>>> because > >>>>> the system would probably still boot doesn't mean it's right. > >>>> > >>>> It's the exact opposite of what you just described -- the last release > >>>> of U-Boot currently populates a subset of the DMA ranges, not a > >>>> superset. The dma-ranges in the Linux DT currently are a superset of > >>>> available DRAM on the platform. > >>> > >>> I'm not talking about the overall coverage of addresses - I've already > >>> made clear what I think about that - I'm talking about the *number* of > >>> individual entries. If the DT binding defines that dma-ranges entries > >>> directly represent bridge windows, then the bootloader for a given > >>> platform should never generate more entries than that platform has > >>> actual windows, because to do otherwise would be bogus. > >> > >> I have a feeling that's not how Rob defined the dma-ranges in this > >> discussion though. > >> > >>>>>>>> Thinking about this further, this patch should be OK either way, if > >>>>>>>> there is a DT which defines more DMA ranges than the controller can > >>>>>>>> handle, handling some is better than failing outright -- a PCI > >>>>>>>> which > >>>>>>>> works with a subset of memory is better than PCI that does not work > >>>>>>>> at all. > >>>>>>> > >>>>>>> OK to sum it up, this patch is there to deal with u-boot adding > >>>>>>> multiple > >>>>>>> dma-ranges to DT. > >>>>>> > >>>>>> Yes, this patch was posted over two months ago, about the same time > >>>>>> this > >>>>>> functionality was posted for inclusion in U-Boot. It made it into > >>>>>> recent > >>>>>> U-Boot release, but there was no feedback on the Linux patch until > >>>>>> recently. > >>>>>> > >>>>>> U-Boot can be changed for the next release, assuming we agree on > >>>>>> how it > >>>>>> should behave. > >>>>>> > >>>>>>> I still do not understand the benefit given that for > >>>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound > >>>>>>> windows programming (given that AFAICS the PCI controller filters > >>>>>>> out > >>>>>>> any transaction that does not fall within its inbound windows by > >>>>>>> default > >>>>>>> so adding dma-ranges has the net effect of widening the DMA'able > >>>>>>> address > >>>>>>> space rather than limiting it). 
> >>>>>>> > >>>>>>> In short, what's the benefit of adding more dma-ranges regions to > >>>>>>> the > >>>>>>> DT (and consequently handling them in the kernel) ? > >>>>>> > >>>>>> The benefit is programming the controller inbound windows correctly. > >>>>>> But if there is a better way to do that, I am open to implement that. > >>>>>> Are there any suggestions / examples of that ? > >>>>> > >>>>> The crucial thing is that once we improve the existing "dma-ranges" > >>>>> handling in the DMA layer such that it *does* consider multiple > >>>>> entries > >>>>> properly, platforms presenting ranges which don't actually exist will > >>>>> almost certainly start going wrong, and are either going to have to > >>>>> fix > >>>>> their broken bootloaders or try to make a case for platform-specific > >>>>> workarounds in core code. > >>>> Again, this is exactly the other way around, the dma-ranges > >>>> populated by > >>>> U-Boot cover only existing DRAM. The single dma-range in Linux DT > >>>> covers > >>>> even the holes without existing DRAM. > >>>> > >>>> So even if the Linux dma-ranges handling changes, there should be no > >>>> problem. > >>> > >>> Say you have a single hardware window, and this DT property (1-cell > >>> numbers for simplicity: > >>> > >>> dma-ranges = <0x00000000 0x00000000 0x80000000>; > >>> > >>> Driver reads one entry and programs the window to 2GB@0, DMA setup > >>> parses the first entry and sets device masks to 0x7fffffff, and > >>> everything's fine. > >>> > >>> Now say we describe the exact same address range this way instead: > >>> > >>> dma-ranges = <0x00000000 0x00000000 0x40000000, > >>> 0x40000000 0x40000000 0x40000000>; > >>> > >>> Driver reads one entry and programs the window to 1GB@0, DMA setup > >>> parses the first entry and sets device masks to 0x3fffffff, and *today*, > >>> things are suboptimal but happen to work. > >>> > >>> Now say we finally get round to fixing the of_dma code to properly > >>> generate DMA masks that actually include all usable address bits, a user > >>> upgrades their kernel package, and reboots with that same DT... > >>> > >>> Driver reads one entry and programs the window to 1GB@0, DMA setup > >>> parses all entries and sets device masks to 0x7fffffff, devices start > >>> randomly failing or throwing DMA errors half the time, angry user looks > >>> at the changelog to find that somebody decided their now-corrupted > >>> filesystem is less important than the fact that hey, at least the > >>> machine didn't refuse to boot because the DT was obviously wrong. Are > >>> you sure that shouldn't be a problem? > >> > >> I think you picked a rather special case here and arrived as a DMA mask > >> which just fails in this special case. Such special case doesn't happen > >> here, and even if it did, I would expect Linux to merge those two ranges > >> or do something sane ? If the DMA mask is set incorrectly, that's a bug > >> of the DMA code I would think. > > > > The mask is not set incorrectly - DMA masks represent the number of > > address bits the device (or intermediate interconnect in the case of the > > bus mask) is capable of driving. Thus when DMA is limited to a specific > > address range, the masks should be wide enough to cover the topmost > > address of that range (unless the device's own capability is inherently > > narrower). > > Then the mask should be 0x7fffffff in both cases I'd say. > > >> What DMA mask would you get if those two entries had a gap inbetween > >> them ? 
E.g.: > >> > >> dma-ranges = <0x00000000 0x00000000 0x20000000, > >> 0x40000000 0x40000000 0x20000000>; > > > > OK, here's an real non-simplified example > > I would really like an answer to the simple example above before we > start inventing convoluted ones. I'd say that depends on the actual memory configuration, i.e. does memory exist in the gap? Does memory exist above 0x5fffffff? BTW, reading Devicetree Specification, Release v0.2: dma-ranges describes bus address space translation; it does not describe if the full address space is backed by physical RAM (see the /memory node). Hence if there's no memory in the gap, I see no reason to have 2 entries in dma-ranges, as the translation offsets are the same, so they could be covered by a single larger entry. Gr{oetje,eeting}s, Geert
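In the simplified 1-cell notation used above, the merge Geert describes would look like this (a sketch only, assuming nothing above 0x5fffffff needs to be reachable):

	/* two entries with the same (zero) translation offset ... */
	dma-ranges = <0x00000000 0x00000000 0x20000000,
		      0x40000000 0x40000000 0x20000000>;

	/* ... can be collapsed into one larger entry spanning the gap */
	dma-ranges = <0x00000000 0x00000000 0x60000000>;

The second form needs only a single inbound window and still describes the same translation, which is the point being made: dma-ranges encodes how bus addresses map to CPU addresses, not which parts of that space are populated with RAM.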
On Fri, Aug 16, 2019 at 03:28:04PM +0200, Marek Vasut wrote: > On 8/16/19 3:23 PM, Simon Horman wrote: > > On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote: > >> From: Marek Vasut <marek.vasut+renesas@gmail.com> > >> > >> In case the "dma-ranges" DT property contains either too many ranges > >> or the range start address is unaligned in such a way that populating > >> the range into the controller requires multiple entries, a situation > >> may occur where all ranges cannot be loaded into the controller. > >> > >> Currently, the driver refuses to probe in such a situation. Relax this > >> behavior, load as many ranges as possible and warn if some ranges do > >> not fit anymore. > > > > What is the motivation for relaxing this? > > U-Boot can fill the ranges in properly now, the list would be longer in > such a case and the driver would fail to probe (because the list is > longer than what the hardware can support). Is this the U-Boot patch you refer to: https://patchwork.ozlabs.org/patch/1129436/ As pci_set_region is called with the same address for PCI and CPU memory this implies there is a 1:1 mapping - therefore I don't see a need for multiple mappings for each DRAM bank. (Also if this controller has a 32 bit limitation, shouldn't this code limit the addresses before calling pci_set_region?). Thanks, Andrew Murray > > -- > Best regards, > Marek Vasut
On 10/21/19 12:18 PM, Andrew Murray wrote: [...] >>>> In case the "dma-ranges" DT property contains either too many ranges >>>> or the range start address is unaligned in such a way that populating >>>> the range into the controller requires multiple entries, a situation >>>> may occur where all ranges cannot be loaded into the controller. >>>> >>>> Currently, the driver refuses to probe in such a situation. Relax this >>>> behavior, load as many ranges as possible and warn if some ranges do >>>> not fit anymore. >>> >>> What is the motivation for relaxing this? >> >> U-Boot can fill the ranges in properly now, the list would be longer in >> such a case and the driver would fail to probe (because the list is >> longer than what the hardware can support). > > Is this the U-Boot patch you refer to: > > https://patchwork.ozlabs.org/patch/1129436/ Yes. > As pci_set_region is called with the same address for PCI and CPU memory > this implies there is a 1:1 mapping - therefore I don't see a need for > multiple mappings for each DRAM bank. (Also if this controller has a > 32 bit limitation, shouldn't this code limit the addresses before calling > pci_set_region?). It would certainly have been helpful to know about this dma-ranges detail earlier; this whole thing could have been avoided. Now all I can do is get that patch reverted for the next U-Boot release. But this still leaves me with one open question -- how do I figure out what to program into the PCI controller inbound windows, so that the controller correctly filters inbound transfers which target nonexistent memory?
On Sat, Oct 26, 2019 at 08:03:12PM +0200, Marek Vasut wrote: > On 10/21/19 12:18 PM, Andrew Murray wrote: > [...] > >>>> In case the "dma-ranges" DT property contains either too many ranges > >>>> or the range start address is unaligned in such a way that populating > >>>> the range into the controller requires multiple entries, a situation > >>>> may occur where all ranges cannot be loaded into the controller. > >>>> > >>>> Currently, the driver refuses to probe in such a situation. Relax this > >>>> behavior, load as many ranges as possible and warn if some ranges do > >>>> not fit anymore. > >>> > >>> What is the motivation for relaxing this? > >> > >> U-Boot can fill the ranges in properly now, the list would be longer in > >> such a case and the driver would fail to probe (because the list is > >> longer than what the hardware can support). > > > > Is this the U-Boot patch you refer to: > > > > https://patchwork.ozlabs.org/patch/1129436/ > > Yes. > > > As pci_set_region is called with the same address for PCI and CPU memory > > this implies there is a 1:1 mapping - therefore I don't see a need for > > multiple mappings for each DRAM bank. (Also if this controller has a > > 32 bit limitation, shouldn't this code limit the addresses before calling > > pci_set_region?). > It would certainly be helpful to know about this dma-ranges detail > earlier, this whole thing could've been avoided. Now all I can do is get > that patch reverted for the next U-Boot release. Yes, I can appreciate the frustration this delay has caused. Though as there are now more reviewers for PCI controllers on this list, future patches ought to get feedback sooner. > > But this still leaves me with one open question -- how do I figure out > what to program into the PCI controller inbound windows, so that the > controller correctly filters inbound transfers which are targetting > nonexisting memory ? Your driver should program into the RC->CPU windows, the exact ranges described in the dma-ranges. Whilst also respecting the alignment and max-size rules your controller has (e.g. the existing upstream logic and also the new logic that recalculates the alignment per entry). As far as I can tell from looking at your U-Boot patch, I think I'd expect a single dma-range to be presented in the DT, that describes 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your controller is limited to 32 bits. And 2) there is a linear mapping between PCI and CPU addresses (given that the second and third arguments on pci_set_region are both the same). As you point out, this range includes lots of things that you don't want the RC to touch - such as non-existent memory. This is OK, when Linux programs addresses into the various EP's for them to DMA to host memory, it uses its own logic to select addresses that are in RAM, the purpose of the dma-range is to describe what the CPU RAM address looks like from the perspective of the RC (for example if the RC was wired with an offset such that made memory writes from the RC made to 0x00000000 end up on the system map at 0x80000000, we need to tell Linux about this offset. Otherwise when a EP device driver programs a DMA address of a RAM buffer at 0x90000000, it'll end up targetting 0x110000000. Thankfully our dma-range will tell Linux to apply an offset such that the actual address written to the EP is 0x10000000.). In your case the dma-range also serves to describe a limit to the range of addresses we can reach. Thanks, Andrew Murray > > -- > Best regards, > Marek Vasut
On Sat, Oct 26, 2019 at 09:36:28PM +0100, Andrew Murray wrote: > On Sat, Oct 26, 2019 at 08:03:12PM +0200, Marek Vasut wrote: > > On 10/21/19 12:18 PM, Andrew Murray wrote: > > [...] > > >>>> In case the "dma-ranges" DT property contains either too many ranges > > >>>> or the range start address is unaligned in such a way that populating > > >>>> the range into the controller requires multiple entries, a situation > > >>>> may occur where all ranges cannot be loaded into the controller. > > >>>> > > >>>> Currently, the driver refuses to probe in such a situation. Relax this > > >>>> behavior, load as many ranges as possible and warn if some ranges do > > >>>> not fit anymore. > > >>> > > >>> What is the motivation for relaxing this? > > >> > > >> U-Boot can fill the ranges in properly now, the list would be longer in > > >> such a case and the driver would fail to probe (because the list is > > >> longer than what the hardware can support). > > > > > > Is this the U-Boot patch you refer to: > > > > > > https://patchwork.ozlabs.org/patch/1129436/ > > > > Yes. > > > > > As pci_set_region is called with the same address for PCI and CPU memory > > > this implies there is a 1:1 mapping - therefore I don't see a need for > > > multiple mappings for each DRAM bank. (Also if this controller has a > > > 32 bit limitation, shouldn't this code limit the addresses before calling > > > pci_set_region?). > > It would certainly be helpful to know about this dma-ranges detail > > earlier, this whole thing could've been avoided. Now all I can do is get > > that patch reverted for the next U-Boot release. > > Yes, I can appreciate the frustration this delay has caused. Though as there > are now more reviewers for PCI controllers on this list, future patches ought > to get feedback sooner. > > > > > But this still leaves me with one open question -- how do I figure out > > what to program into the PCI controller inbound windows, so that the > > controller correctly filters inbound transfers which are targetting > > nonexisting memory ? > > Your driver should program into the RC->CPU windows, the exact ranges > described in the dma-ranges. Whilst also respecting the alignment and > max-size rules your controller has (e.g. the existing upstream logic > and also the new logic that recalculates the alignment per entry). > > As far as I can tell from looking at your U-Boot patch, I think I'd expect > a single dma-range to be presented in the DT, that describes > 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your > controller is limited to 32 bits. And 2) there is a linear mapping between > PCI and CPU addresses (given that the second and third arguments on > pci_set_region are both the same). > > As you point out, this range includes lots of things that you don't > want the RC to touch - such as non-existent memory. This is OK, when > Linux programs addresses into the various EP's for them to DMA to host > memory, it uses its own logic to select addresses that are in RAM, the > purpose of the dma-range is to describe what the CPU RAM address looks > like from the perspective of the RC (for example if the RC was wired > with an offset such that made memory writes from the RC made to > 0x00000000 end up on the system map at 0x80000000, we need to tell Linux > about this offset. Otherwise when a EP device driver programs a DMA > address of a RAM buffer at 0x90000000, it'll end up targetting > 0x110000000. 
> Thankfully our dma-range will tell Linux to apply an offset
> such that the actual address written to the EP is 0x10000000.).

That last sentence should have read "Thankfully our dma-range will tell
the RC to use its address translation such that the actual address
written on the bus by the RC is 0x10000000.)."

Thanks,

Andrew Murray

>
> In your case the dma-range also serves to describe a limit to the range
> of addresses we can reach.
>
> Thanks,
>
> Andrew Murray
>
> >
> > --
> > Best regards,
> > Marek Vasut
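To make the translation that a dma-ranges entry describes concrete, here is a
minimal C sketch of the offset arithmetic discussed (and corrected) above. The
structure and helper names are invented for illustration and are not kernel
API; the numbers follow the hypothetical wiring in the example, where bus
address 0x00000000 reaches CPU address 0x80000000.

#include <stdint.h>
#include <stdio.h>

/* One parsed "dma-ranges" entry: <pci_addr cpu_addr size> (illustrative only). */
struct dma_range {
	uint64_t pci_addr;	/* address as seen on the PCI bus by RC/EP */
	uint64_t cpu_addr;	/* the same location as seen by the CPU */
	uint64_t size;
};

/* The bus address an EP must be handed for a buffer at a given CPU address. */
static uint64_t cpu_to_bus(const struct dma_range *r, uint64_t cpu)
{
	return cpu - r->cpu_addr + r->pci_addr;
}

int main(void)
{
	/* Hypothetical wiring from the example: bus 0x0 reaches CPU 0x80000000. */
	const struct dma_range r = { 0x00000000, 0x80000000, 0x80000000 };

	/*
	 * A RAM buffer at CPU address 0x90000000 must be programmed into the
	 * EP as 0x10000000; the RC's address translation, set up from this
	 * same range, then steers the resulting writes back to 0x90000000.
	 */
	printf("bus address = 0x%llx\n",
	       (unsigned long long)cpu_to_bus(&r, 0x90000000ULL));
	return 0;
}

With an identity mapping (pci_addr == cpu_addr), as in the single
0:0xFFFFFFFF range suggested above, the offset is simply zero.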
On 10/26/19 10:36 PM, Andrew Murray wrote: [...]>> But this still leaves me with one open question -- how do I figure out >> what to program into the PCI controller inbound windows, so that the >> controller correctly filters inbound transfers which are targetting >> nonexisting memory ? > > Your driver should program into the RC->CPU windows, the exact ranges > described in the dma-ranges. Whilst also respecting the alignment and > max-size rules your controller has (e.g. the existing upstream logic > and also the new logic that recalculates the alignment per entry). > > As far as I can tell from looking at your U-Boot patch, I think I'd expect > a single dma-range to be presented in the DT, that describes > 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your > controller is limited to 32 bits. And 2) there is a linear mapping between > PCI and CPU addresses (given that the second and third arguments on > pci_set_region are both the same). > > As you point out, this range includes lots of things that you don't > want the RC to touch - such as non-existent memory. This is OK, when > Linux programs addresses into the various EP's for them to DMA to host > memory, it uses its own logic to select addresses that are in RAM, the > purpose of the dma-range is to describe what the CPU RAM address looks > like from the perspective of the RC (for example if the RC was wired > with an offset such that made memory writes from the RC made to > 0x00000000 end up on the system map at 0x80000000, we need to tell Linux > about this offset. Otherwise when a EP device driver programs a DMA > address of a RAM buffer at 0x90000000, it'll end up targetting > 0x110000000. Thankfully our dma-range will tell Linux to apply an offset > such that the actual address written to the EP is 0x10000000.). I understand that Linux programs the endpoints correctly. However this still doesn't prevent the endpoint from being broken and from sending a transaction to that non-existent memory. The PCI controller can prevent that and in an automotive SoC, I would very much like the PCI controller to do just that, rather than hope that the endpoint would always work. > In your case the dma-range also serves to describe a limit to the range > of addresses we can reach. [...]
On Thu, Nov 07, 2019 at 12:37:44AM +0100, Marek Vasut wrote: > On 10/26/19 10:36 PM, Andrew Murray wrote: > [...]>> But this still leaves me with one open question -- how do I > figure out > >> what to program into the PCI controller inbound windows, so that the > >> controller correctly filters inbound transfers which are targetting > >> nonexisting memory ? > > > > Your driver should program into the RC->CPU windows, the exact ranges > > described in the dma-ranges. Whilst also respecting the alignment and > > max-size rules your controller has (e.g. the existing upstream logic > > and also the new logic that recalculates the alignment per entry). > > > > As far as I can tell from looking at your U-Boot patch, I think I'd expect > > a single dma-range to be presented in the DT, that describes > > 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your > > controller is limited to 32 bits. And 2) there is a linear mapping between > > PCI and CPU addresses (given that the second and third arguments on > > pci_set_region are both the same). > > > > As you point out, this range includes lots of things that you don't > > want the RC to touch - such as non-existent memory. This is OK, when > > Linux programs addresses into the various EP's for them to DMA to host > > memory, it uses its own logic to select addresses that are in RAM, the > > purpose of the dma-range is to describe what the CPU RAM address looks > > like from the perspective of the RC (for example if the RC was wired > > with an offset such that made memory writes from the RC made to > > 0x00000000 end up on the system map at 0x80000000, we need to tell Linux > > about this offset. Otherwise when a EP device driver programs a DMA > > address of a RAM buffer at 0x90000000, it'll end up targetting > > 0x110000000. Thankfully our dma-range will tell Linux to apply an offset > > such that the actual address written to the EP is 0x10000000.). > > I understand that Linux programs the endpoints correctly. However this > still doesn't prevent the endpoint from being broken and from sending a > transaction to that non-existent memory. Correct. > The PCI controller can prevent > that and in an automotive SoC, I would very much like the PCI controller > to do just that, rather than hope that the endpoint would always work. OK I understand - At least when working on the assumption that your RC will block RC->CPU transactions that are not described in any of it's windows. Thus you want to use the dma-ranges as a means to configure your controller to do this. What actually happens if you have a broken endpoint that reads/writes to non-existent memory on this hardware? Ideally the RC would generate a CA or UR back to the endpoint - does something else happen? Lockup, dead RC, performance issues? Using built-in features of the RC to prevent it from sending transactions to non-existent addresses is clearly helpful. But of course it doesn't stop a broken EP from writing to existent addresses, so only provides limited protection. Despite the good intentions here, it doesn't seem like dma-ranges is designed for this purpose and as the hardware has limited ranges it will only be best-effort. Thanks, Andrew Murray > > > In your case the dma-range also serves to describe a limit to the range > > of addresses we can reach. > > [...] > > -- > Best regards, > Marek Vasut
On 11/7/19 3:19 PM, Andrew Murray wrote: > On Thu, Nov 07, 2019 at 12:37:44AM +0100, Marek Vasut wrote: >> On 10/26/19 10:36 PM, Andrew Murray wrote: >> [...]>> But this still leaves me with one open question -- how do I >> figure out >>>> what to program into the PCI controller inbound windows, so that the >>>> controller correctly filters inbound transfers which are targetting >>>> nonexisting memory ? >>> >>> Your driver should program into the RC->CPU windows, the exact ranges >>> described in the dma-ranges. Whilst also respecting the alignment and >>> max-size rules your controller has (e.g. the existing upstream logic >>> and also the new logic that recalculates the alignment per entry). >>> >>> As far as I can tell from looking at your U-Boot patch, I think I'd expect >>> a single dma-range to be presented in the DT, that describes >>> 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your >>> controller is limited to 32 bits. And 2) there is a linear mapping between >>> PCI and CPU addresses (given that the second and third arguments on >>> pci_set_region are both the same). >>> >>> As you point out, this range includes lots of things that you don't >>> want the RC to touch - such as non-existent memory. This is OK, when >>> Linux programs addresses into the various EP's for them to DMA to host >>> memory, it uses its own logic to select addresses that are in RAM, the >>> purpose of the dma-range is to describe what the CPU RAM address looks >>> like from the perspective of the RC (for example if the RC was wired >>> with an offset such that made memory writes from the RC made to >>> 0x00000000 end up on the system map at 0x80000000, we need to tell Linux >>> about this offset. Otherwise when a EP device driver programs a DMA >>> address of a RAM buffer at 0x90000000, it'll end up targetting >>> 0x110000000. Thankfully our dma-range will tell Linux to apply an offset >>> such that the actual address written to the EP is 0x10000000.). >> >> I understand that Linux programs the endpoints correctly. However this >> still doesn't prevent the endpoint from being broken and from sending a >> transaction to that non-existent memory. > > Correct. > >> The PCI controller can prevent >> that and in an automotive SoC, I would very much like the PCI controller >> to do just that, rather than hope that the endpoint would always work. > > OK I understand - At least when working on the assumption that your RC will > block RC->CPU transactions that are not described in any of it's windows. > Thus you want to use the dma-ranges as a means to configure your controller > to do this. Yes > What actually happens if you have a broken endpoint that reads/writes to > non-existent memory on this hardware? Ideally the RC would generate a > CA or UR back to the endpoint - does something else happen? Lockup, dead RC, > performance issues? The behavior is undefined. > Using built-in features of the RC to prevent it from sending transactions > to non-existent addresses is clearly helpful. But of course it doesn't stop > a broken EP from writing to existent addresses, so only provides limited > protection. Correct. > Despite the good intentions here, it doesn't seem like dma-ranges is > designed for this purpose and as the hardware has limited ranges it will > only be best-effort. So what other options do we have ?
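Stated as code, the protection being asked of the controller is a simple
predicate: an inbound access is forwarded only if it lands inside one of the
programmed windows. The sketch below is purely conceptual - the structure,
the window count and the function are assumptions for illustration, not the
R-Car register model or driver code.

#include <stdbool.h>
#include <stdint.h>

#define NR_INBOUND_WINDOWS	8	/* assumed limit, for illustration */

struct inbound_window {
	uint64_t base;
	uint64_t size;	/* 0 means the window is unused */
};

/*
 * Conceptual model of inbound filtering: a transaction from an endpoint is
 * accepted only if it falls entirely within a programmed window.  A broken
 * EP targeting non-existent memory would then be rejected in hardware
 * instead of producing undefined behaviour.
 */
static bool inbound_access_allowed(const struct inbound_window *win,
				   uint64_t bus_addr, uint64_t len)
{
	for (int i = 0; i < NR_INBOUND_WINDOWS; i++) {
		if (!win[i].size)
			continue;
		if (bus_addr >= win[i].base &&
		    bus_addr + len <= win[i].base + win[i].size)
			return true;
	}
	return false;
}

int main(void)
{
	const struct inbound_window win[NR_INBOUND_WINDOWS] = {
		{ 0x40000000, 0x20000000 },	/* made-up RAM window */
	};
	return inbound_access_allowed(win, 0x48000000, 0x1000) ? 0 : 1;
}

As Andrew notes above, this only helps against accesses that fall outside
the windows; a broken EP can still scribble over valid RAM.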
On 16/11/2019 3:48 pm, Marek Vasut wrote: > On 11/7/19 3:19 PM, Andrew Murray wrote: >> On Thu, Nov 07, 2019 at 12:37:44AM +0100, Marek Vasut wrote: >>> On 10/26/19 10:36 PM, Andrew Murray wrote: >>> [...]>> But this still leaves me with one open question -- how do I >>> figure out >>>>> what to program into the PCI controller inbound windows, so that the >>>>> controller correctly filters inbound transfers which are targetting >>>>> nonexisting memory ? >>>> >>>> Your driver should program into the RC->CPU windows, the exact ranges >>>> described in the dma-ranges. Whilst also respecting the alignment and >>>> max-size rules your controller has (e.g. the existing upstream logic >>>> and also the new logic that recalculates the alignment per entry). >>>> >>>> As far as I can tell from looking at your U-Boot patch, I think I'd expect >>>> a single dma-range to be presented in the DT, that describes >>>> 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your >>>> controller is limited to 32 bits. And 2) there is a linear mapping between >>>> PCI and CPU addresses (given that the second and third arguments on >>>> pci_set_region are both the same). >>>> >>>> As you point out, this range includes lots of things that you don't >>>> want the RC to touch - such as non-existent memory. This is OK, when >>>> Linux programs addresses into the various EP's for them to DMA to host >>>> memory, it uses its own logic to select addresses that are in RAM, the >>>> purpose of the dma-range is to describe what the CPU RAM address looks >>>> like from the perspective of the RC (for example if the RC was wired >>>> with an offset such that made memory writes from the RC made to >>>> 0x00000000 end up on the system map at 0x80000000, we need to tell Linux >>>> about this offset. Otherwise when a EP device driver programs a DMA >>>> address of a RAM buffer at 0x90000000, it'll end up targetting >>>> 0x110000000. Thankfully our dma-range will tell Linux to apply an offset >>>> such that the actual address written to the EP is 0x10000000.). >>> >>> I understand that Linux programs the endpoints correctly. However this >>> still doesn't prevent the endpoint from being broken and from sending a >>> transaction to that non-existent memory. >> >> Correct. >> >>> The PCI controller can prevent >>> that and in an automotive SoC, I would very much like the PCI controller >>> to do just that, rather than hope that the endpoint would always work. >> >> OK I understand - At least when working on the assumption that your RC will >> block RC->CPU transactions that are not described in any of it's windows. >> Thus you want to use the dma-ranges as a means to configure your controller >> to do this. > > Yes > >> What actually happens if you have a broken endpoint that reads/writes to >> non-existent memory on this hardware? Ideally the RC would generate a >> CA or UR back to the endpoint - does something else happen? Lockup, dead RC, >> performance issues? > > The behavior is undefined. > >> Using built-in features of the RC to prevent it from sending transactions >> to non-existent addresses is clearly helpful. But of course it doesn't stop >> a broken EP from writing to existent addresses, so only provides limited >> protection. > > Correct. > >> Despite the good intentions here, it doesn't seem like dma-ranges is >> designed for this purpose and as the hardware has limited ranges it will >> only be best-effort. > So what other options do we have ? 
If you really want to sacrifice DMA efficiency for a perceived increase
in theoretical robustness by setting very conservative windows, then
ultimately it's your choice, go ahead. It's just that you *need* to make
that choice in the bootloader, not in Linux. If Linux gets passed
dma-ranges that aren't actually reflected by the hardware, such that
this patch is needed, then it *will* go wrong eventually, and you'll
only get an "I told you so" from me.

The bootloader knows what platform it's running on, so it has no excuse
for emitting more ranges than there are available windows on that
platform.

Robin.
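Robin's point is that the range count is a bootloader-side decision. As a
rough sketch (hypothetical helper, not actual U-Boot code, and ignoring the
extra splitting the kernel driver may still do for alignment), a bootloader
could refuse to emit more 1:1 entries than the controller has windows and
fall back to something coarser instead:

#include <stdint.h>
#include <stdio.h>

#define NR_HW_INBOUND_WINDOWS	8	/* assumed per-SoC limit, illustrative */

struct bank {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical bootloader-side policy: only describe what the PCIe
 * controller can actually map.  If the (merged) bank list is too long,
 * fall back to a single conservative 1:1 range rather than handing the
 * OS a "dma-ranges" the hardware cannot honour.
 */
static int emit_dma_ranges(const struct bank *banks, int nr_banks)
{
	if (nr_banks > NR_HW_INBOUND_WINDOWS) {
		fprintf(stderr, "too many banks (%d) for %d inbound windows\n",
			nr_banks, NR_HW_INBOUND_WINDOWS);
		return -1;
	}

	for (int i = 0; i < nr_banks; i++)
		printf("dma-ranges entry: pci 0x%llx cpu 0x%llx size 0x%llx\n",
		       (unsigned long long)banks[i].base,	/* 1:1 mapping */
		       (unsigned long long)banks[i].base,
		       (unsigned long long)banks[i].size);
	return 0;
}

int main(void)
{
	/* Made-up DRAM banks purely for the demonstration. */
	const struct bank banks[] = {
		{ 0x40000000, 0x20000000 },
		{ 0x70000000, 0x10000000 },
	};
	return emit_dma_ranges(banks, 2) ? 1 : 0;
}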
On 18/10/2019 7:44 pm, Marek Vasut wrote: [...] >>>> Say you have a single hardware window, and this DT property (1-cell >>>> numbers for simplicity: >>>> >>>> dma-ranges = <0x00000000 0x00000000 0x80000000>; >>>> >>>> Driver reads one entry and programs the window to 2GB@0, DMA setup >>>> parses the first entry and sets device masks to 0x7fffffff, and >>>> everything's fine. >>>> >>>> Now say we describe the exact same address range this way instead: >>>> >>>> dma-ranges = <0x00000000 0x00000000 0x40000000, >>>> 0x40000000 0x40000000 0x40000000>; >>>> >>>> Driver reads one entry and programs the window to 1GB@0, DMA setup >>>> parses the first entry and sets device masks to 0x3fffffff, and *today*, >>>> things are suboptimal but happen to work. >>>> >>>> Now say we finally get round to fixing the of_dma code to properly >>>> generate DMA masks that actually include all usable address bits, a user >>>> upgrades their kernel package, and reboots with that same DT... >>>> >>>> Driver reads one entry and programs the window to 1GB@0, DMA setup >>>> parses all entries and sets device masks to 0x7fffffff, devices start >>>> randomly failing or throwing DMA errors half the time, angry user looks >>>> at the changelog to find that somebody decided their now-corrupted >>>> filesystem is less important than the fact that hey, at least the >>>> machine didn't refuse to boot because the DT was obviously wrong. Are >>>> you sure that shouldn't be a problem? >>> >>> I think you picked a rather special case here and arrived as a DMA mask >>> which just fails in this special case. Such special case doesn't happen >>> here, and even if it did, I would expect Linux to merge those two ranges >>> or do something sane ? If the DMA mask is set incorrectly, that's a bug >>> of the DMA code I would think. >> >> The mask is not set incorrectly - DMA masks represent the number of >> address bits the device (or intermediate interconnect in the case of the >> bus mask) is capable of driving. Thus when DMA is limited to a specific >> address range, the masks should be wide enough to cover the topmost >> address of that range (unless the device's own capability is inherently >> narrower). > > Then the mask should be 0x7fffffff in both cases I'd say. Yes, *that's my whole point*. It *should* be that for both cases, but today it works out not to be because the current code is a bit crap. >>> What DMA mask would you get if those two entries had a gap inbetween >>> them ? E.g.: >>> >>> dma-ranges = <0x00000000 0x00000000 0x20000000, >>> 0x40000000 0x40000000 0x20000000>; The gap itself is immaterial - the highest address described by those ranges is 0x5fffffff, which requires 31 bits to drive, thus the appropriate mask would be 0x7fffffff. >> OK, here's an real non-simplified example > > I would really like an answer to the simple example above before we > start inventing convoluted ones. Sigh... I did say "real". However convoluted it may look to you, this window configuration was "invented" ~6 years ago when the Arm Juno SoCs were first brought up and has been shipping ever since. >> (note that these windows are fixed and not programmed by Linux): >> >> dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>, >> <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>, >> <0x43000000 0x8 0x80000000 0x8 0x80000000 0x2 0x00000000>; >> >> The DMA masks for the devices behind this bridge *should* be 35 bits, >> because that's the size of the largest usable address. 
>> Currently,
>> however, because of the of_dma code's deficiency they would end up being
>> an utterly useless 30 bits, which isn't even enough to reach the start
>> of RAM. Thus I can't actually have this property in my DT, and as a
>> result I can't enable the IOMMU, because *that* also needs to know the
>> ranges in order to reserve the unusable gaps between the windows once
>> address translation is in play.
>
> How is this related to this particular patch ? This looks more like some
> DMA internal topic.

It's trying to get across the point that "dma-ranges" is consumed by
multiple separate parts of the kernel, and implicitly relying on the way
they currently interact is a mistake which will come back to bite you.
Changing one consumer (your driver) to allow devices to exist despite
describing unusable DMA ranges might happen to seem OK today due to
shortcomings in another consumer (the DMA API layer), but once the DMA
API layer gets improved (necessarily, as above), your changes will cause
subtle DMA breakage on R-Car systems with more populated DRAM banks than
available PCI windows, which is arguably even worse than having PCI
simply refuse to probe in that situation.

Robin.
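The mask arithmetic Robin describes can be written down directly: the DMA
mask has to cover the highest bus address reachable through any entry,
regardless of how many entries there are or whether they have gaps. The
sketch below is only an illustration of that rule, not the kernel's of_dma
code; it reproduces the numbers from the two-entry example with a gap
(highest address 0x5fffffff, hence a 0x7fffffff mask).

#include <stdint.h>
#include <stdio.h>

struct dma_range {
	uint64_t pci_addr;
	uint64_t cpu_addr;
	uint64_t size;
};

/*
 * Sketch of the mask derivation argued for above: the mask must cover the
 * highest bus address reachable through any of the ranges, not just the
 * size of the first entry.  (Illustrative only; not the kernel's code.)
 */
static uint64_t dma_mask_from_ranges(const struct dma_range *r, int n)
{
	uint64_t max_addr = 0;

	for (int i = 0; i < n; i++) {
		uint64_t end = r[i].pci_addr + r[i].size - 1;

		if (end > max_addr)
			max_addr = end;
	}

	/* Round up to a full bit mask: highest address 0x5fffffff -> 0x7fffffff. */
	uint64_t mask = max_addr;
	mask |= mask >> 1;  mask |= mask >> 2;  mask |= mask >> 4;
	mask |= mask >> 8;  mask |= mask >> 16; mask |= mask >> 32;
	return mask;
}

int main(void)
{
	/* The two-entry example with a gap discussed above. */
	const struct dma_range ex[] = {
		{ 0x00000000, 0x00000000, 0x20000000 },
		{ 0x40000000, 0x40000000, 0x20000000 },
	};
	printf("mask = 0x%llx\n",	/* prints 0x7fffffff */
	       (unsigned long long)dma_mask_from_ranges(ex, 2));
	return 0;
}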
On 11/18/19 7:42 PM, Robin Murphy wrote:
[...]
>>> Despite the good intentions here, it doesn't seem like dma-ranges is
>>> designed for this purpose and as the hardware has limited ranges it will
>>> only be best-effort.
>> So what other options do we have ?
>
> If you really want to sacrifice DMA efficiency for a perceived increase
> in theoretical robustness by setting very conservative windows,

That really means configuring the hardware correctly.

> then
> ultimately it's your choice, go ahead. It's just that you *need* to make
> that choice in the bootloader, not in Linux. If Linux gets passed
> dma-ranges that aren't actually reflected by the hardware, such that
> this patch is needed, then it *will* go wrong eventually, and you'll
> only get an "I told you so" from me.
>
> The bootloader knows what platform it's running on, so it has no excuse
> for emitting more ranges than there are available windows on that platform.

So basically the conclusion is to limit the number of DMA ranges added
into the DT and be done with it?
diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
index 56a6433eb70b..e2735005ffd3 100644
--- a/drivers/pci/controller/pcie-rcar.c
+++ b/drivers/pci/controller/pcie-rcar.c
@@ -1049,8 +1049,9 @@ static int rcar_pcie_inbound_ranges(struct rcar_pcie *pcie,
 
 	while (cpu_addr < cpu_end) {
 		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
-			dev_err(pcie->dev, "Failed to map inbound regions!\n");
-			return -EINVAL;
+			dev_warn(pcie->dev,
+				 "Too many inbound regions, not all are mapped.\n");
+			break;
 		}
 		/*
 		 * Set up 64-bit inbound regions as the range parser doesn't
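For context, here is a toy rework of the loop the hunk above lives in,
showing how an unaligned start address gets chopped into naturally aligned
power-of-two windows and how the (now non-fatal) limit check fires when the
window budget runs out. All constants and helpers are made up for the
demonstration, and the slot accounting is simplified compared with the real
rcar_pcie_inbound_ranges().

#include <stdint.h>
#include <stdio.h>

#define NR_DEMO_WINDOWS	3	/* deliberately tiny; stands in for MAX_NR_INBOUND_MAPS */

/* Largest power of two not exceeding x. */
static uint64_t largest_pow2_le(uint64_t x)
{
	uint64_t p = 1;

	while ((p << 1) && (p << 1) <= x)
		p <<= 1;
	return p;
}

/* Largest window that is power-of-two sized and naturally aligned at cpu_addr. */
static uint64_t next_window(uint64_t cpu_addr, uint64_t remaining)
{
	uint64_t align = cpu_addr ? (cpu_addr & -cpu_addr) : remaining;
	uint64_t size = largest_pow2_le(remaining);

	return size < align ? size : align;
}

int main(void)
{
	/* A made-up, badly aligned range: 0x48000000 .. 0x7fffffff. */
	uint64_t cpu_addr = 0x48000000, cpu_end = 0x80000000;
	int idx = 0;

	while (cpu_addr < cpu_end) {
		if (idx >= NR_DEMO_WINDOWS - 1) {
			/* The spot the patch turns from a probe failure
			 * into a warning plus a partial mapping. */
			printf("too many inbound regions, rest unmapped\n");
			break;
		}
		uint64_t size = next_window(cpu_addr, cpu_end - cpu_addr);

		printf("window %d: 0x%llx size 0x%llx\n", idx,
		       (unsigned long long)cpu_addr, (unsigned long long)size);
		cpu_addr += size;
		idx++;	/* simplified: the real driver's slot bookkeeping differs */
	}
	return 0;
}

With the original dev_err()/-EINVAL behaviour the same situation would have
failed the whole probe instead of mapping what fits.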