[V3,2/3] PCI: rcar: Do not abort on too many inbound dma-ranges

Message ID 20190809175741.7066-2-marek.vasut@gmail.com (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven
Series [V3,1/3] PCI: rcar: Move the inbound index check

Commit Message

Marek Vasut Aug. 9, 2019, 5:57 p.m. UTC
From: Marek Vasut <marek.vasut+renesas@gmail.com>

In case the "dma-ranges" DT property contains either too many ranges
or the range start address is unaligned in such a way that populating
the range into the controller requires multiple entries, a situation
may occur where all ranges cannot be loaded into the controller.

Currently, the driver refuses to probe in such a situation. Relax this
behavior, load as many ranges as possible and warn if some ranges do
not fit anymore.

Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
To: linux-pci@vger.kernel.org
---
V2: Update on top of 1/3
V3: No change
---
 drivers/pci/controller/pcie-rcar.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
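
For context on why even a short "dma-ranges" list can exhaust the
controller: the inbound windows on this IP appear to be naturally
aligned address/mask pairs (hence the commit message's point about
unaligned start addresses), so each entry has to be carved into
power-of-two sized chunks. A standalone sketch of that splitting,
illustration only, with made-up helper names that do not come from the
driver:

/*
 * Illustration only; not driver code.  Each inbound window covers a
 * naturally aligned, power-of-two sized block, so a single unaligned or
 * odd-sized range is carved into several chunks and can consume several
 * windows on its own.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t chunk_size(uint64_t addr, uint64_t remaining)
{
	/* Largest block that is naturally aligned at 'addr'. */
	uint64_t align = addr ? (addr & -addr) : remaining;
	uint64_t size = align < remaining ? align : remaining;

	/* Round down to a power of two so it fits a mask register. */
	while (size & (size - 1))
		size &= size - 1;
	return size;
}

int main(void)
{
	/* Example: a 1.875 GiB bank starting at 0x48000000. */
	uint64_t cpu = 0x48000000, end = 0x48000000 + 0x78000000;
	int chunks = 0;

	while (cpu < end) {
		uint64_t sz = chunk_size(cpu, end - cpu);

		printf("chunk %d: 0x%llx + 0x%llx\n", chunks,
		       (unsigned long long)cpu, (unsigned long long)sz);
		cpu += sz;
		chunks++;
	}
	printf("chunks needed: %d\n", chunks);
	return 0;
}

This prints four chunks for a single range, and judging by the
"idx >= MAX_NR_INBOUND_MAPS - 1" check in the patch, each chunk appears
to consume two window slots for the 64-bit setup, so one awkwardly
placed bank can already use up most of the inbound windows.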

Comments

Simon Horman Aug. 16, 2019, 1:23 p.m. UTC | #1
On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> In case the "dma-ranges" DT property contains either too many ranges
> or the range start address is unaligned in such a way that populating
> the range into the controller requires multiple entries, a situation
> may occur where all ranges cannot be loaded into the controller.
> 
> Currently, the driver refuses to probe in such a situation. Relax this
> behavior, load as many ranges as possible and warn if some ranges do
> not fit anymore.

What is the motivation for relaxing this?

> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: linux-renesas-soc@vger.kernel.org
> To: linux-pci@vger.kernel.org
> ---
> V2: Update on top of 1/3
> V3: No change
> ---
>  drivers/pci/controller/pcie-rcar.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
> index 56a6433eb70b..e2735005ffd3 100644
> --- a/drivers/pci/controller/pcie-rcar.c
> +++ b/drivers/pci/controller/pcie-rcar.c
> @@ -1049,8 +1049,9 @@ static int rcar_pcie_inbound_ranges(struct rcar_pcie *pcie,
>  
>  	while (cpu_addr < cpu_end) {
>  		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
> -			dev_err(pcie->dev, "Failed to map inbound regions!\n");
> -			return -EINVAL;
> +			dev_warn(pcie->dev,
> +				 "Too many inbound regions, not all are mapped.\n");
> +			break;
>  		}
>  		/*
>  		 * Set up 64-bit inbound regions as the range parser doesn't
> -- 
> 2.20.1
>
Marek Vasut Aug. 16, 2019, 1:28 p.m. UTC | #2
On 8/16/19 3:23 PM, Simon Horman wrote:
> On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>
>> In case the "dma-ranges" DT property contains either too many ranges
>> or the range start address is unaligned in such a way that populating
>> the range into the controller requires multiple entries, a situation
>> may occur where all ranges cannot be loaded into the controller.
>>
>> Currently, the driver refuses to probe in such a situation. Relax this
>> behavior, load as many ranges as possible and warn if some ranges do
>> not fit anymore.
> 
> What is the motivation for relaxing this?

U-Boot can fill the ranges in properly now, the list would be longer in
such a case and the driver would fail to probe (because the list is
longer than what the hardware can support).
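
For reference, the probe failure being described comes from the loop
that walks "dma-ranges" at probe time. A rough reconstruction of the
caller is below (not copied from the tree, so treat everything except
rcar_pcie_inbound_ranges() and the propagated error as approximate):

#include <linux/errno.h>
#include <linux/of.h>
#include <linux/of_address.h>

static int rcar_pcie_parse_map_dma_ranges(struct rcar_pcie *pcie,
					  struct device_node *np)
{
	struct of_pci_range_parser parser;
	struct of_pci_range range;
	int index = 0;
	int err;

	if (of_pci_dma_range_parser_init(&parser, np))
		return -EINVAL;

	/*
	 * One call per "dma-ranges" entry; each entry may need several
	 * inbound windows once it is split on alignment boundaries.
	 */
	for_each_of_pci_range(&parser, &range) {
		err = rcar_pcie_inbound_ranges(pcie, &range, &index);
		if (err)
			return err;	/* before this patch: probe fails here */
	}

	return 0;
}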
Simon Horman Aug. 16, 2019, 1:38 p.m. UTC | #3
On Fri, Aug 16, 2019 at 03:28:04PM +0200, Marek Vasut wrote:
> On 8/16/19 3:23 PM, Simon Horman wrote:
> > On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> >> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> >>
> >> In case the "dma-ranges" DT property contains either too many ranges
> >> or the range start address is unaligned in such a way that populating
> >> the range into the controller requires multiple entries, a situation
> >> may occur where all ranges cannot be loaded into the controller.
> >>
> >> Currently, the driver refuses to probe in such a situation. Relax this
> >> behavior, load as many ranges as possible and warn if some ranges do
> >> not fit anymore.
> > 
> > What is the motivation for relaxing this?
> 
> U-Boot can fill the ranges in properly now, the list would be longer in
> such a case and the driver would fail to probe (because the list is
> longer than what the hardware can support).

Thanks, I think that would be worth adding to the changelog.

Regardless,

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Marek Vasut Aug. 16, 2019, 5:41 p.m. UTC | #4
On 8/16/19 3:38 PM, Simon Horman wrote:
> On Fri, Aug 16, 2019 at 03:28:04PM +0200, Marek Vasut wrote:
>> On 8/16/19 3:23 PM, Simon Horman wrote:
>>> On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
>>>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>>>
>>>> In case the "dma-ranges" DT property contains either too many ranges
>>>> or the range start address is unaligned in such a way that populating
>>>> the range into the controller requires multiple entries, a situation
>>>> may occur where all ranges cannot be loaded into the controller.
>>>>
>>>> Currently, the driver refuses to probe in such a situation. Relax this
>>>> behavior, load as many ranges as possible and warn if some ranges do
>>>> not fit anymore.
>>>
>>> What is the motivation for relaxing this?
>>
>> U-Boot can fill the ranges in properly now, the list would be longer in
>> such a case and the driver would fail to probe (because the list is
>> longer than what the hardware can support).
> 
> Thanks, I think that would be worth adding to the changelog.

The changelog does describe exactly what I just said -- if there are too
many ranges, or if they start in a way that cannot easily be fully
programmed into the HW, this patch applies.

> Regardless,
> 
> Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
>
Lorenzo Pieralisi Oct. 16, 2019, 3 p.m. UTC | #5
On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> In case the "dma-ranges" DT property contains either too many ranges
> or the range start address is unaligned in such a way that populating
> the range into the controller requires multiple entries, a situation
> may occur where all ranges cannot be loaded into the controller.
> 
> Currently, the driver refuses to probe in such a situation. Relax this
> behavior, load as many ranges as possible and warn if some ranges do
> not fit anymore.

Patches (1) and (3) are fine but I do not think this one is.

Firmware (DT) should provide dma-ranges according to what HW can handle,
more so given that other subsystems (eg IOMMU) rely on the dma-ranges
value to set-up eg DMA - if there is a mismatch between PCI host inbound
regions and software structures describing DMA'able ranges all bets are
off.

I would not merge this specific patch but let me know what you think.

Thanks,
Lorenzo

> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: linux-renesas-soc@vger.kernel.org
> To: linux-pci@vger.kernel.org
> ---
> V2: Update on top of 1/3
> V3: No change
> ---
>  drivers/pci/controller/pcie-rcar.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
> index 56a6433eb70b..e2735005ffd3 100644
> --- a/drivers/pci/controller/pcie-rcar.c
> +++ b/drivers/pci/controller/pcie-rcar.c
> @@ -1049,8 +1049,9 @@ static int rcar_pcie_inbound_ranges(struct rcar_pcie *pcie,
>  
>  	while (cpu_addr < cpu_end) {
>  		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
> -			dev_err(pcie->dev, "Failed to map inbound regions!\n");
> -			return -EINVAL;
> +			dev_warn(pcie->dev,
> +				 "Too many inbound regions, not all are mapped.\n");
> +			break;
>  		}
>  		/*
>  		 * Set up 64-bit inbound regions as the range parser doesn't
> -- 
> 2.20.1
>
Marek Vasut Oct. 16, 2019, 3:10 p.m. UTC | #6
On 10/16/19 5:00 PM, Lorenzo Pieralisi wrote:
> On Fri, Aug 09, 2019 at 07:57:40PM +0200, Marek Vasut wrote:
>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>
>> In case the "dma-ranges" DT property contains either too many ranges
>> or the range start address is unaligned in such a way that populating
>> the range into the controller requires multiple entries, a situation
>> may occur where all ranges cannot be loaded into the controller.
>>
>> Currently, the driver refuses to probe in such a situation. Relax this
>> behavior, load as many ranges as possible and warn if some ranges do
>> not fit anymore.
> 
> Patches (1) and (3) are fine but I do not think this one is.
> 
> Firmware (DT) should provide dma-ranges according to what HW can handle,
> more so given that other subsystems (eg IOMMU) rely on the dma-ranges
> value to set-up eg DMA - if there is a mismatch between PCI host inbound
> regions and software structures describing DMA'able ranges all bets are
> off.

The firmware provides all the ranges which are available and usable,
that's the hardware description and that should be in the DT. The
firmware cannot decide the policy for the next stage (Linux in this
case) on which ranges are better to use for Linux and which are less
good. Linux can then decide which ranges are best suited for it and
ignore the other ones.
Lorenzo Pieralisi Oct. 16, 2019, 3:26 p.m. UTC | #7
On Wed, Oct 16, 2019 at 05:10:02PM +0200, Marek Vasut wrote:
> On 10/16/19 5:00 PM, Lorenzo Pieralisi wrote:
> > On Fri, Aug 09, 2019 at 07:57:40PM +0200, Marek Vasut wrote:
> >> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> >>
> >> In case the "dma-ranges" DT property contains either too many ranges
> >> or the range start address is unaligned in such a way that populating
> >> the range into the controller requires multiple entries, a situation
> >> may occur where all ranges cannot be loaded into the controller.
> >>
> >> Currently, the driver refuses to probe in such a situation. Relax this
> >> behavior, load as many ranges as possible and warn if some ranges do
> >> not fit anymore.
> > 
> > Patches (1) and (3) are fine but I do not think this one is.
> > 
> > Firmware (DT) should provide dma-ranges according to what HW can handle,
> > more so given that other subsystems (eg IOMMU) rely on the dma-ranges
> > value to set-up eg DMA - if there is a mismatch between PCI host inbound
> > regions and software structures describing DMA'able ranges all bets are
> > off.
> 
> The firmware provides all the ranges which are available and usable,
> that's the hardware description and that should be in the DT.

If the HW (given that those dma-ranges are declared for the PCI host
controller) can't be programmed to enable those DMA ranges - those
ranges are neither available nor usable, ergo DT is broken.

> The firmware cannot decide the policy for the next stage (Linux in
> this case) on which ranges are better to use for Linux and which are
> less good. Linux can then decide which ranges are best suited for it
> and ignore the other ones.

dma-ranges is a property that is used by other kernel subsystems eg
IOMMU other than the RCAR host controller driver. The policy, provided
there is one should be shared across them. You can't leave a PCI
host controller half-programmed and expect other subsystems (that
*expect* those ranges to be DMA'ble) to work.

I reiterate my point: if firmware is broken it is better to fail
the probe rather than limp on hoping that things will keep on
working.

Lorenzo
Marek Vasut Oct. 16, 2019, 3:29 p.m. UTC | #8
On 10/16/19 5:26 PM, Lorenzo Pieralisi wrote:
> On Wed, Oct 16, 2019 at 05:10:02PM +0200, Marek Vasut wrote:
>> On 10/16/19 5:00 PM, Lorenzo Pieralisi wrote:
>>> On Fri, Aug 09, 2019 at 07:57:40PM +0200, Marek Vasut wrote:
>>>> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>>>>
>>>> In case the "dma-ranges" DT property contains either too many ranges
>>>> or the range start address is unaligned in such a way that populating
>>>> the range into the controller requires multiple entries, a situation
>>>> may occur where all ranges cannot be loaded into the controller.
>>>>
>>>> Currently, the driver refuses to probe in such a situation. Relax this
>>>> behavior, load as many ranges as possible and warn if some ranges do
>>>> not fit anymore.
>>>
>>> Patches (1) and (3) are fine but I do not think this one is.
>>>
>>> Firmware (DT) should provide dma-ranges according to what HW can handle,
>>> more so given that other subsystems (eg IOMMU) rely on the dma-ranges
>>> value to set-up eg DMA - if there is a mismatch between PCI host inbound
>>> regions and software structures describing DMA'able ranges all bets are
>>> off.
>>
>> The firmware provides all the ranges which are available and usable,
>> that's the hardware description and that should be in the DT.
> 
> If the HW (given that those dma-ranges are declared for the PCI host
> controller) can't be programmed to enable those DMA ranges - those
> ranges are neither available nor usable, ergo DT is broken.

The hardware can be programmed to enable those DMA ranges, just not all
of them at the same time.

It's not the job of the bootloader to guess which ranges might the next
stage like best.

>> The firmware cannot decide the policy for the next stage (Linux in
>> this case) on which ranges are better to use for Linux and which are
>> less good. Linux can then decide which ranges are best suited for it
>> and ignore the other ones.
> 
> dma-ranges is a property that is used by other kernel subsystems eg
> IOMMU other than the RCAR host controller driver. The policy, provided
> there is one should be shared across them. You can't leave a PCI
> host controller half-programmed and expect other subsystems (that
> *expect* those ranges to be DMA'ble) to work.
> 
> I reiterate my point: if firmware is broken it is better to fail
> the probe rather than limp on hoping that things will keep on
> working.

But the firmware is not broken ?
Lorenzo Pieralisi Oct. 16, 2019, 4:18 p.m. UTC | #9
[+RobH, Robin]

On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote:

[...]

> >> The firmware provides all the ranges which are available and usable,
> >> that's the hardware description and that should be in the DT.
> > 
> > If the HW (given that those dma-ranges are declared for the PCI host
> > controller) can't be programmed to enable those DMA ranges - those
> > ranges are neither available nor usable, ergo DT is broken.
> 
> The hardware can be programmed to enable those DMA ranges, just not all
> of them at the same time.

Ok, we are down to DT bindings interpretation then.

> It's not the job of the bootloader to guess which ranges might the next
> stage like best.

By the time this series:

https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419

is merged, your policy will require the host controller driver to
remove the DMA ranges that could not be programmed in the inbound
address decoders from the dma_ranges list, otherwise things will
fall apart.

> >> The firmware cannot decide the policy for the next stage (Linux in
> >> this case) on which ranges are better to use for Linux and which are
> >> less good. Linux can then decide which ranges are best suited for it
> >> and ignore the other ones.
> > 
> > dma-ranges is a property that is used by other kernel subsystems eg
> > IOMMU other than the RCAR host controller driver. The policy, provided
> > there is one should be shared across them. You can't leave a PCI
> > host controller half-programmed and expect other subsystems (that
> > *expect* those ranges to be DMA'ble) to work.
> > 
> > I reiterate my point: if firmware is broken it is better to fail
> > the probe rather than limp on hoping that things will keep on
> > working.
> 
> But the firmware is not broken ?

See above, it depends on how the dma-ranges property is interpreted,
hopefully we can reach consensus in this thread, I won't merge a patch
that can backfire later unless we all agree that what it does is
correct.

Lorenzo
Rob Herring Oct. 16, 2019, 6:12 p.m. UTC | #10
On Wed, Oct 16, 2019 at 11:18 AM Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
>
> [+RobH, Robin]
>
> On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote:
>
> [...]
>
> > >> The firmware provides all the ranges which are available and usable,
> > >> that's the hardware description and that should be in the DT.
> > >
> > > If the HW (given that those dma-ranges are declared for the PCI host
> > > controller) can't be programmed to enable those DMA ranges - those
> > > ranges are neither available nor usable, ergo DT is broken.
> >
> > The hardware can be programmed to enable those DMA ranges, just not all
> > of them at the same time.
>
> Ok, we are down to DT bindings interpretation then.
>
> > It's not the job of the bootloader to guess which ranges might the next
> > stage like best.
>
> By the time this series:
>
> https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419
>
> is merged, your policy will require the host controller driver to
> remove the DMA ranges that could not be programmed in the inbound
> address decoders from the dma_ranges list, otherwise things will
> fall apart.

I don't think the above series has too much impact on this. It's my
other series dealing with dma masks that's relevant because for dma
masks we only ever look at the first dma-ranges entry. We either have
to support multiple addresses and sizes per device (the only way to
really support any possible dma-ranges), merge entries to single
offset/mask or have some way to select which range entry to use.

So things are broken to some extent regardless unless MAX_NR_INBOUND_MAPS == 1.
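
To make the "merge entries to single offset/mask" option concrete, here
is a rough sketch (a hypothetical helper, not existing kernel code) that
folds all parsed dma-ranges entries into one covering bus window plus a
single CPU-to-bus offset, the shape the DMA-mask code can consume; it
only works when every entry shares the same offset:

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/of_address.h>
#include <linux/types.h>

/* Hypothetical illustration only. */
static int merge_dma_ranges(struct of_pci_range_parser *parser,
			    u64 *bus_start, u64 *bus_end, u64 *offset)
{
	struct of_pci_range range;
	bool first = true;

	for_each_of_pci_range(parser, &range) {
		u64 off = range.cpu_addr - range.pci_addr;

		if (first) {
			*offset = off;
			*bus_start = range.pci_addr;
			*bus_end = range.pci_addr + range.size;
			first = false;
		} else if (off != *offset) {
			/* No single offset/mask can describe these entries. */
			return -EINVAL;
		} else {
			*bus_start = min(*bus_start, range.pci_addr);
			*bus_end = max(*bus_end, range.pci_addr + range.size);
		}
	}

	return first ? -ENOENT : 0;
}

The resulting [*bus_start, *bus_end) span plus offset could feed both
the DMA mask and a single inbound window, at the cost of also covering
any holes in between, which is exactly the trade-off being debated here.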

> > >> The firmware cannot decide the policy for the next stage (Linux in
> > >> this case) on which ranges are better to use for Linux and which are
> > >> less good. Linux can then decide which ranges are best suited for it
> > >> and ignore the other ones.
> > >
> > > dma-ranges is a property that is used by other kernel subsystems eg
> > > IOMMU other than the RCAR host controller driver. The policy, provided
> > > there is one should be shared across them. You can't leave a PCI
> > > host controller half-programmed and expect other subsystems (that
> > > *expect* those ranges to be DMA'ble) to work.
> > >
> > > I reiterate my point: if firmware is broken it is better to fail
> > > the probe rather than limp on hoping that things will keep on
> > > working.
> >
> > But the firmware is not broken ?
>
> See above, it depends on how the dma-ranges property is interpreted,
> hopefully we can reach consensus in this thread, I won't merge a patch
> that can backfire later unless we all agree that what it does is
> correct.

Defining more dma-ranges entries than the h/w has inbound windows for
sounds like a broken DT to me.

What exactly does dma-ranges contain in this case? I'm not really
visualizing how different clients would pick different dma-ranges
entries.

Rob
Marek Vasut Oct. 16, 2019, 6:17 p.m. UTC | #11
On 10/16/19 8:12 PM, Rob Herring wrote:
> On Wed, Oct 16, 2019 at 11:18 AM Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
>>
>> [+RobH, Robin]
>>
>> On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote:
>>
>> [...]
>>
>>>>> The firmware provides all the ranges which are available and usable,
>>>>> that's the hardware description and that should be in the DT.
>>>>
>>>> If the HW (given that those dma-ranges are declared for the PCI host
>>>> controller) can't be programmed to enable those DMA ranges - those
>>>> ranges are neither available nor usable, ergo DT is broken.
>>>
>>> The hardware can be programmed to enable those DMA ranges, just not all
>>> of them at the same time.
>>
>> Ok, we are down to DT bindings interpretation then.
>>
>>> It's not the job of the bootloader to guess which ranges might the next
>>> stage like best.
>>
>> By the time this series:
>>
>> https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419
>>
>> is merged, your policy will require the host controller driver to
>> remove the DMA ranges that could not be programmed in the inbound
>> address decoders from the dma_ranges list, otherwise things will
>> fall apart.
> 
> I don't think the above series has too much impact on this. It's my
> other series dealing with dma masks that's relevant because for dma
> masks we only ever look at the first dma-ranges entry. We either have
> to support multiple addresses and sizes per device (the only way to
> really support any possible dma-ranges), merge entries to single
> offset/mask or have some way to select which range entry to use.
> 
> So things are broken to some extent regardless unless MAX_NR_INBOUND_MAPS == 1.
> 
>>>>> The firmware cannot decide the policy for the next stage (Linux in
>>>>> this case) on which ranges are better to use for Linux and which are
>>>>> less good. Linux can then decide which ranges are best suited for it
>>>>> and ignore the other ones.
>>>>
>>>> dma-ranges is a property that is used by other kernel subsystems eg
>>>> IOMMU other than the RCAR host controller driver. The policy, provided
>>>> there is one should be shared across them. You can't leave a PCI
>>>> host controller half-programmed and expect other subsystems (that
>>>> *expect* those ranges to be DMA'ble) to work.
>>>>
>>>> I reiterate my point: if firmware is broken it is better to fail
>>>> the probe rather than limp on hoping that things will keep on
>>>> working.
>>>
>>> But the firmware is not broken ?
>>
>> See above, it depends on how the dma-ranges property is interpreted,
>> hopefully we can reach consensus in this thread, I won't merge a patch
>> that can backfire later unless we all agree that what it does is
>> correct.
> 
> Defining more dma-ranges entries than the h/w has inbound windows for
> sounds like a broken DT to me.
> 
> What exactly does dma-ranges contain in this case? I'm not really
> visualizing how different clients would pick different dma-ranges
> entries.

You can have multiple non-continuous DRAM banks for example. And an
entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
separate dma-ranges entry, right ?
Rob Herring Oct. 16, 2019, 8:25 p.m. UTC | #12
On Wed, Oct 16, 2019 at 1:18 PM Marek Vasut <marek.vasut@gmail.com> wrote:
>
> On 10/16/19 8:12 PM, Rob Herring wrote:
> > On Wed, Oct 16, 2019 at 11:18 AM Lorenzo Pieralisi
> > <lorenzo.pieralisi@arm.com> wrote:
> >>
> >> [+RobH, Robin]
> >>
> >> On Wed, Oct 16, 2019 at 05:29:50PM +0200, Marek Vasut wrote:
> >>
> >> [...]
> >>
> >>>>> The firmware provides all the ranges which are available and usable,
> >>>>> that's the hardware description and that should be in the DT.
> >>>>
> >>>> If the HW (given that those dma-ranges are declared for the PCI host
> >>>> controller) can't be programmed to enable those DMA ranges - those
> >>>> ranges are neither available nor usable, ergo DT is broken.
> >>>
> >>> The hardware can be programmed to enable those DMA ranges, just not all
> >>> of them at the same time.
> >>
> >> Ok, we are down to DT bindings interpretation then.
> >>
> >>> It's not the job of the bootloader to guess which ranges might the next
> >>> stage like best.
> >>
> >> By the time this series:
> >>
> >> https://patchwork.ozlabs.org/user/todo/linux-pci/?series=132419
> >>
> >> is merged, your policy will require the host controller driver to
> >> remove the DMA ranges that could not be programmed in the inbound
> >> address decoders from the dma_ranges list, otherwise things will
> >> fall apart.
> >
> > I don't think the above series has too much impact on this. It's my
> > other series dealing with dma masks that's relevant because for dma
> > masks we only ever look at the first dma-ranges entry. We either have
> > to support multiple addresses and sizes per device (the only way to
> > really support any possible dma-ranges), merge entries to single
> > offset/mask or have some way to select which range entry to use.
> >
> > So things are broken to some extent regardless unless MAX_NR_INBOUND_MAPS == 1.
> >
> >>>>> The firmware cannot decide the policy for the next stage (Linux in
> >>>>> this case) on which ranges are better to use for Linux and which are
> >>>>> less good. Linux can then decide which ranges are best suited for it
> >>>>> and ignore the other ones.
> >>>>
> >>>> dma-ranges is a property that is used by other kernel subsystems eg
> >>>> IOMMU other than the RCAR host controller driver. The policy, provided
> >>>> there is one should be shared across them. You can't leave a PCI
> >>>> host controller half-programmed and expect other subsystems (that
> >>>> *expect* those ranges to be DMA'ble) to work.
> >>>>
> >>>> I reiterate my point: if firmware is broken it is better to fail
> >>>> the probe rather than limp on hoping that things will keep on
> >>>> working.
> >>>
> >>> But the firmware is not broken ?
> >>
> >> See above, it depends on how the dma-ranges property is interpreted,
> >> hopefully we can reach consensus in this thread, I won't merge a patch
> >> that can backfire later unless we all agree that what it does is
> >> correct.
> >
> > Defining more dma-ranges entries than the h/w has inbound windows for
> > sounds like a broken DT to me.
> >
> > What exactly does dma-ranges contain in this case? I'm not really
> > visualizing how different clients would pick different dma-ranges
> > entries.
>
> You can have multiple non-continuous DRAM banks for example. And an
> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
> separate dma-ranges entry, right ?

Not necessarily. We really only want to define the minimum we have to.
The ideal system is no dma-ranges. Is each bank at a different
relative position compared to the CPU's view of the system. That would
seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.

I suppose if your intent is to use inbound windows as a poor man's
IOMMU to prevent accesses to the holes, then yes you would list them
out. But I think that's wrong and difficult to maintain. You'd also
need to deal with reserved-memory regions too.

Rob
Marek Vasut Oct. 16, 2019, 9:15 p.m. UTC | #13
On 10/16/19 10:25 PM, Rob Herring wrote:
[...]
>>>>>>> The firmware cannot decide the policy for the next stage (Linux in
>>>>>>> this case) on which ranges are better to use for Linux and which are
>>>>>>> less good. Linux can then decide which ranges are best suited for it
>>>>>>> and ignore the other ones.
>>>>>>
>>>>>> dma-ranges is a property that is used by other kernel subsystems eg
>>>>>> IOMMU other than the RCAR host controller driver. The policy, provided
>>>>>> there is one should be shared across them. You can't leave a PCI
>>>>>> host controller half-programmed and expect other subsystems (that
>>>>>> *expect* those ranges to be DMA'ble) to work.
>>>>>>
>>>>>> I reiterate my point: if firmware is broken it is better to fail
>>>>>> the probe rather than limp on hoping that things will keep on
>>>>>> working.
>>>>>
>>>>> But the firmware is not broken ?
>>>>
>>>> See above, it depends on how the dma-ranges property is interpreted,
>>>> hopefully we can reach consensus in this thread, I won't merge a patch
>>>> that can backfire later unless we all agree that what it does is
>>>> correct.
>>>
>>> Defining more dma-ranges entries than the h/w has inbound windows for
>>> sounds like a broken DT to me.
>>>
>>> What exactly does dma-ranges contain in this case? I'm not really
>>> visualizing how different clients would pick different dma-ranges
>>> entries.
>>
>> You can have multiple non-continuous DRAM banks for example. And an
>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
>> separate dma-ranges entry, right ?
> 
> Not necessarily. We really only want to define the minimum we have to.
> The ideal system is no dma-ranges. Is each bank at a different
> relative position compared to the CPU's view of the system. That would
> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.

Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
boundary and some more above the 32bit boundary. These two banks don't
need to be continuous. And then you could add the SRAM into the mix.

> I suppose if your intent is to use inbound windows as a poor man's
> IOMMU to prevent accesses to the holes, then yes you would list them
> out. But I think that's wrong and difficult to maintain. You'd also
> need to deal with reserved-memory regions too.

What's the problem with that? The bootloader has all that information
and can patch the DT correctly. In fact, in my specific case, I have
platform which can be populated with differently sized DRAM, so the
holes are also dynamically calculated ; there is no one DT then, the
bootloader is responsible to generate the dma-ranges accordingly.
Rob Herring Oct. 16, 2019, 10:26 p.m. UTC | #14
On Wed, Oct 16, 2019 at 4:16 PM Marek Vasut <marek.vasut@gmail.com> wrote:
>
> On 10/16/19 10:25 PM, Rob Herring wrote:
> [...]
> >>>>>>> The firmware cannot decide the policy for the next stage (Linux in
> >>>>>>> this case) on which ranges are better to use for Linux and which are
> >>>>>>> less good. Linux can then decide which ranges are best suited for it
> >>>>>>> and ignore the other ones.
> >>>>>>
> >>>>>> dma-ranges is a property that is used by other kernel subsystems eg
> >>>>>> IOMMU other than the RCAR host controller driver. The policy, provided
> >>>>>> there is one should be shared across them. You can't leave a PCI
> >>>>>> host controller half-programmed and expect other subsystems (that
> >>>>>> *expect* those ranges to be DMA'ble) to work.
> >>>>>>
> >>>>>> I reiterate my point: if firmware is broken it is better to fail
> >>>>>> the probe rather than limp on hoping that things will keep on
> >>>>>> working.
> >>>>>
> >>>>> But the firmware is not broken ?
> >>>>
> >>>> See above, it depends on how the dma-ranges property is interpreted,
> >>>> hopefully we can reach consensus in this thread, I won't merge a patch
> >>>> that can backfire later unless we all agree that what it does is
> >>>> correct.
> >>>
> >>> Defining more dma-ranges entries than the h/w has inbound windows for
> >>> sounds like a broken DT to me.
> >>>
> >>> What exactly does dma-ranges contain in this case? I'm not really
> >>> visualizing how different clients would pick different dma-ranges
> >>> entries.
> >>
> >> You can have multiple non-continuous DRAM banks for example. And an
> >> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
> >> separate dma-ranges entry, right ?
> >
> > Not necessarily. We really only want to define the minimum we have to.
> > The ideal system is no dma-ranges. Is each bank at a different
> > relative position compared to the CPU's view of the system. That would
> > seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.
>
> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
> boundary and some more above the 32bit boundary. These two banks don't
> need to be continuous. And then you could add the SRAM into the mix.

Continuous is irrelevant. My question was in more specific terms is
(bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs.
PCI host view (i.e. bus addr)? If not, then that is 1 translation and
1 dma-ranges entry.

> > I suppose if your intent is to use inbound windows as a poor man's
> > IOMMU to prevent accesses to the holes, then yes you would list them
> > out. But I think that's wrong and difficult to maintain. You'd also
> > need to deal with reserved-memory regions too.
>
> What's the problem with that? The bootloader has all that information
> and can patch the DT correctly. In fact, in my specific case, I have
> platform which can be populated with differently sized DRAM, so the
> holes are also dynamically calculated ; there is no one DT then, the
> bootloader is responsible to generate the dma-ranges accordingly.

The problems are it doesn't work:

Your dma-mask and offset are not going to be correct.

You are running out of inbound windows. Your patch does nothing to
solve that. The solution would be merging multiple dma-ranges entries
to a single inbound window. We'd have to do that both for dma-mask and
inbound windows. The former would also have to figure out which
entries apply to setting up dma-mask. I'm simply suggesting just do
that up front and avoid any pointless splits.

You are setting up random inbound windows. The bootloader can't assume
what order the OS parses dma-ranges, and the OS can't assume what
order the bootloader writes the entries.

Rob
Marek Vasut Oct. 16, 2019, 10:33 p.m. UTC | #15
On 10/17/19 12:26 AM, Rob Herring wrote:
[...]
>>>> You can have multiple non-continuous DRAM banks for example. And an
>>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
>>>> separate dma-ranges entry, right ?
>>>
>>> Not necessarily. We really only want to define the minimum we have to.
>>> The ideal system is no dma-ranges. Is each bank at a different
>>> relative position compared to the CPU's view of the system. That would
>>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.
>>
>> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
>> boundary and some more above the 32bit boundary. These two banks don't
>> need to be continuous. And then you could add the SRAM into the mix.
> 
> Continuous is irrelevant. My question was in more specific terms is
> (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs.
> PCI host view (i.e. bus addr)? If not, then that is 1 translation and
> 1 dma-ranges entry.

I don't think it's different in that aspect. Except the bus has this
32bit limitation, where it only sees subset of the DRAM.

Why should the DMA ranges incorrectly cover also the DRAM which is not
present ?

>>> I suppose if your intent is to use inbound windows as a poor man's
>>> IOMMU to prevent accesses to the holes, then yes you would list them
>>> out. But I think that's wrong and difficult to maintain. You'd also
>>> need to deal with reserved-memory regions too.
>>
>> What's the problem with that? The bootloader has all that information
>> and can patch the DT correctly. In fact, in my specific case, I have
>> platform which can be populated with differently sized DRAM, so the
>> holes are also dynamically calculated ; there is no one DT then, the
>> bootloader is responsible to generate the dma-ranges accordingly.
> 
> The problems are it doesn't work:
> 
> Your dma-mask and offset are not going to be correct.
> 
> You are running out of inbound windows. Your patch does nothing to
> solve that. The solution would be merging multiple dma-ranges entries
> to a single inbound window. We'd have to do that both for dma-mask and
> inbound windows. The former would also have to figure out which
> entries apply to setting up dma-mask. I'm simply suggesting just do
> that up front and avoid any pointless splits.

But then the PCI device can trigger a transaction to non-existent DRAM
and cause undefined behavior. Surely we do not want that ?

> You are setting up random inbound windows. The bootloader can't assume
> what order the OS parses dma-ranges, and the OS can't assume what
> order the bootloader writes the entries.

But the OS can assume the ranges are correct and cover only valid
memory, right ? That is, memory into which the PCI controller can safely
access.
Geert Uytterhoeven Oct. 17, 2019, 7:06 a.m. UTC | #16
Hi Marek,

On Thu, Oct 17, 2019 at 12:33 AM Marek Vasut <marek.vasut@gmail.com> wrote:
> On 10/17/19 12:26 AM, Rob Herring wrote:
> [...]
> >>>> You can have multiple non-continuous DRAM banks for example. And an
> >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
> >>>> separate dma-ranges entry, right ?
> >>>
> >>> Not necessarily. We really only want to define the minimum we have to.
> >>> The ideal system is no dma-ranges. Is each bank at a different
> >>> relative position compared to the CPU's view of the system. That would
> >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.
> >>
> >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
> >> boundary and some more above the 32bit boundary. These two banks don't
> >> need to be continuous. And then you could add the SRAM into the mix.
> >
> > Continuous is irrelevant. My question was in more specific terms is
> > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs.
> > PCI host view (i.e. bus addr)? If not, then that is 1 translation and
> > 1 dma-ranges entry.
>
> I don't think it's different in that aspect. Except the bus has this
> 32bit limitation, where it only sees subset of the DRAM.
>
> Why should the DMA ranges incorrectly cover also the DRAM which is not
> present ?
>
> >>> I suppose if your intent is to use inbound windows as a poor man's
> >>> IOMMU to prevent accesses to the holes, then yes you would list them
> >>> out. But I think that's wrong and difficult to maintain. You'd also
> >>> need to deal with reserved-memory regions too.
> >>
> >> What's the problem with that? The bootloader has all that information
> >> and can patch the DT correctly. In fact, in my specific case, I have
> >> platform which can be populated with differently sized DRAM, so the
> >> holes are also dynamically calculated ; there is no one DT then, the
> >> bootloader is responsible to generate the dma-ranges accordingly.
> >
> > The problems are it doesn't work:
> >
> > Your dma-mask and offset are not going to be correct.
> >
> > You are running out of inbound windows. Your patch does nothing to
> > solve that. The solution would be merging multiple dma-ranges entries
> > to a single inbound window. We'd have to do that both for dma-mask and
> > inbound windows. The former would also have to figure out which
> > entries apply to setting up dma-mask. I'm simply suggesting just do
> > that up front and avoid any pointless splits.
>
> But then the PCI device can trigger a transaction to non-existent DRAM
> and cause undefined behavior. Surely we do not want that ?

The PCI device will trigger transactions to memory only when instructed
to do so by Linux, right?  Hence if Linux takes into account chosen/memory
and dma-ranges, there is no problem?

> > You are setting up random inbound windows. The bootloader can't assume
> > what order the OS parses dma-ranges, and the OS can't assume what
> > order the bootloader writes the entries.
>
> But the OS can assume the ranges are correct and cover only valid
> memory, right ? That is, memory into which the PCI controller can safely
> access.

Gr{oetje,eeting}s,

                        Geert
Marek Vasut Oct. 17, 2019, 10:55 a.m. UTC | #17
On 10/17/19 9:06 AM, Geert Uytterhoeven wrote:

[...]

>>>>> I suppose if your intent is to use inbound windows as a poor man's
>>>>> IOMMU to prevent accesses to the holes, then yes you would list them
>>>>> out. But I think that's wrong and difficult to maintain. You'd also
>>>>> need to deal with reserved-memory regions too.
>>>>
>>>> What's the problem with that? The bootloader has all that information
>>>> and can patch the DT correctly. In fact, in my specific case, I have
>>>> platform which can be populated with differently sized DRAM, so the
>>>> holes are also dynamically calculated ; there is no one DT then, the
>>>> bootloader is responsible to generate the dma-ranges accordingly.
>>>
>>> The problems are it doesn't work:
>>>
>>> Your dma-mask and offset are not going to be correct.
>>>
>>> You are running out of inbound windows. Your patch does nothing to
>>> solve that. The solution would be merging multiple dma-ranges entries
>>> to a single inbound window. We'd have to do that both for dma-mask and
>>> inbound windows. The former would also have to figure out which
>>> entries apply to setting up dma-mask. I'm simply suggesting just do
>>> that up front and avoid any pointless splits.
>>
>> But then the PCI device can trigger a transaction to non-existent DRAM
>> and cause undefined behavior. Surely we do not want that ?
> 
> The PCI device will trigger transactions to memory only when instructed
> to do so by Linux, right?  Hence if Linux takes into account chosen/memory
> and dma-ranges, there is no problem?

Unless of course the remote device initiates a transfer. And if the
controller is programmed such that accesses to the missing DRAM in the
holes are not filtered out by the controller, then the controller will
gladly let the transaction through. Do we really want to let this happen ?

[...]
Robin Murphy Oct. 17, 2019, 1:06 p.m. UTC | #18
On 17/10/2019 11:55, Marek Vasut wrote:
> On 10/17/19 9:06 AM, Geert Uytterhoeven wrote:
> 
> [...]
> 
>>>>>> I suppose if your intent is to use inbound windows as a poor man's
>>>>>> IOMMU to prevent accesses to the holes, then yes you would list them
>>>>>> out. But I think that's wrong and difficult to maintain. You'd also
>>>>>> need to deal with reserved-memory regions too.
>>>>>
>>>>> What's the problem with that? The bootloader has all that information
>>>>> and can patch the DT correctly. In fact, in my specific case, I have
>>>>> platform which can be populated with differently sized DRAM, so the
>>>>> holes are also dynamically calculated ; there is no one DT then, the
>>>>> bootloader is responsible to generate the dma-ranges accordingly.
>>>>
>>>> The problems are it doesn't work:
>>>>
>>>> Your dma-mask and offset are not going to be correct.
>>>>
>>>> You are running out of inbound windows. Your patch does nothing to
>>>> solve that. The solution would be merging multiple dma-ranges entries
>>>> to a single inbound window. We'd have to do that both for dma-mask and
>>>> inbound windows. The former would also have to figure out which
>>>> entries apply to setting up dma-mask. I'm simply suggesting just do
>>>> that up front and avoid any pointless splits.
>>>
>>> But then the PCI device can trigger a transaction to non-existent DRAM
>>> and cause undefined behavior. Surely we do not want that ?
>>
>> The PCI device will trigger transactions to memory only when instructed
>> to do so by Linux, right?  Hence if Linux takes into account chosen/memory
>> and dma-ranges, there is no problem?
> 
> Unless of course the remote device initiates a transfer. And if the
> controller is programmed such that accesses to the missing DRAM in the
> holes are not filtered out by the controller, then the controller will
> gladly let the transaction through. Do we really want to let this happen ?

If you've got devices making random unsolicited accesses then who's to 
say they wouldn't also hit valid windows and corrupt memory? If it's 
happening at all you've already lost. And realistically, if the address 
isn't valid then it's not going to make much difference anyway - in 
probably 99% of cases, either the transaction doesn't hit a window and 
the host bridge returns a completer abort, or it does hit a window, the 
AXI side returns DECERR or SLVERR, and the host bridge translates that 
into a completer abort. Consider also that many PCI IPs don't have 
discrete windows and just map the entirety of PCI mem space directly to 
the system PA space.

I don't believe this is a valid argument for anything whichever way round.

Robin.
Marek Vasut Oct. 17, 2019, 2 p.m. UTC | #19
On 10/17/19 3:06 PM, Robin Murphy wrote:
> On 17/10/2019 11:55, Marek Vasut wrote:
>> On 10/17/19 9:06 AM, Geert Uytterhoeven wrote:
>>
>> [...]
>>
>>>>>>> I suppose if your intent is to use inbound windows as a poor man's
>>>>>>> IOMMU to prevent accesses to the holes, then yes you would list them
>>>>>>> out. But I think that's wrong and difficult to maintain. You'd also
>>>>>>> need to deal with reserved-memory regions too.
>>>>>>
>>>>>> What's the problem with that? The bootloader has all that information
>>>>>> and can patch the DT correctly. In fact, in my specific case, I have
>>>>>> platform which can be populated with differently sized DRAM, so the
>>>>>> holes are also dynamically calculated ; there is no one DT then, the
>>>>>> bootloader is responsible to generate the dma-ranges accordingly.
>>>>>
>>>>> The problems are it doesn't work:
>>>>>
>>>>> Your dma-mask and offset are not going to be correct.
>>>>>
>>>>> You are running out of inbound windows. Your patch does nothing to
>>>>> solve that. The solution would be merging multiple dma-ranges entries
>>>>> to a single inbound window. We'd have to do that both for dma-mask and
>>>>> inbound windows. The former would also have to figure out which
>>>>> entries apply to setting up dma-mask. I'm simply suggesting just do
>>>>> that up front and avoid any pointless splits.
>>>>
>>>> But then the PCI device can trigger a transaction to non-existent DRAM
>>>> and cause undefined behavior. Surely we do not want that ?
>>>
>>> The PCI device will trigger transactions to memory only when instructed
>>> to do so by Linux, right?  Hence if Linux takes into account
>>> chosen/memory
>>> and dma-ranges, there is no problem?
>>
>> Unless of course the remote device initiates a transfer. And if the
>> controller is programmed such that accesses to the missing DRAM in the
>> holes are not filtered out by the controller, then the controller will
>> gladly let the transaction through. Do we really want to let this
>> happen ?
> 
> If you've got devices making random unsolicited accesses then who's to
> say they wouldn't also hit valid windows and corrupt memory? If it's
> happening at all you've already lost.

Not necessarily. If your controller is programmed correctly with just
the ranges that are valid, then it will filter out at least the accesses
outside of valid memory. If it is programmed incorrectly, as you
suggest, then the accesses will go through, causing undefined behavior.

And note that there is such weird buggy PCI hardware. A slightly
unrelated example are some of the ath9k, which are generating spurious
MSIs even if they are in legacy PCI IRQ mode. If the controller is
configured correctly, even those buggy cards work, because it can filter
the spurious MSIs out. If not, they do not.

That's why I would prefer to configure the controller correctly, not
just hope that nothing bad will come out of misconfiguring it slightly.

> And realistically, if the address
> isn't valid then it's not going to make much difference anyway - in
> probably 99% of cases, either the transaction doesn't hit a window and
> the host bridge returns a completer abort, or it does hit a window, the
> AXI side returns DECERR or SLVERR, and the host bridge translates that
> into a completer abort. Consider also that many PCI IPs don't have
> discrete windows and just map the entirety of PCI mem space directly to
> the system PA space.

And in that 1% of cases, we are OK with failure which could have been
easily prevented if the controller was programmed correctly ? That does
not look like a good thing.

> I don't believe this is a valid argument for anything whichever way round.
Rob Herring Oct. 17, 2019, 2:36 p.m. UTC | #20
On Thu, Oct 17, 2019 at 9:00 AM Marek Vasut <marek.vasut@gmail.com> wrote:
>
> On 10/17/19 3:06 PM, Robin Murphy wrote:
> > On 17/10/2019 11:55, Marek Vasut wrote:
> >> On 10/17/19 9:06 AM, Geert Uytterhoeven wrote:
> >>
> >> [...]
> >>
> >>>>>>> I suppose if your intent is to use inbound windows as a poor man's
> >>>>>>> IOMMU to prevent accesses to the holes, then yes you would list them
> >>>>>>> out. But I think that's wrong and difficult to maintain. You'd also
> >>>>>>> need to deal with reserved-memory regions too.
> >>>>>>
> >>>>>> What's the problem with that? The bootloader has all that information
> >>>>>> and can patch the DT correctly. In fact, in my specific case, I have
> >>>>>> platform which can be populated with differently sized DRAM, so the
> >>>>>> holes are also dynamically calculated ; there is no one DT then, the
> >>>>>> bootloader is responsible to generate the dma-ranges accordingly.
> >>>>>
> >>>>> The problems are it doesn't work:
> >>>>>
> >>>>> Your dma-mask and offset are not going to be correct.
> >>>>>
> >>>>> You are running out of inbound windows. Your patch does nothing to
> >>>>> solve that. The solution would be merging multiple dma-ranges entries
> >>>>> to a single inbound window. We'd have to do that both for dma-mask and
> >>>>> inbound windows. The former would also have to figure out which
> >>>>> entries apply to setting up dma-mask. I'm simply suggesting just do
> >>>>> that up front and avoid any pointless splits.
> >>>>
> >>>> But then the PCI device can trigger a transaction to non-existent DRAM
> >>>> and cause undefined behavior. Surely we do not want that ?
> >>>
> >>> The PCI device will trigger transactions to memory only when instructed
> >>> to do so by Linux, right?  Hence if Linux takes into account
> >>> chosen/memory
> >>> and dma-ranges, there is no problem?
> >>
> >> Unless of course the remote device initiates a transfer. And if the
> >> controller is programmed such that accesses to the missing DRAM in the
> >> holes are not filtered out by the controller, then the controller will
> >> gladly let the transaction through. Do we really want to let this
> >> happen ?
> >
> > If you've got devices making random unsolicited accesses then who's to
> > say they wouldn't also hit valid windows and corrupt memory? If it's
> > happening at all you've already lost.
>
> Not necessarily. If your controller is programmed correctly with just
> the ranges that are valid, then it will filter out at least the accesses
> outside of valid memory. If it is programmed incorrectly, as you
> suggest, then the accesses will go through, causing undefined behavior.
>
> And note that there is such weird buggy PCI hardware. A slightly
> unrelated example are some of the ath9k, which are generating spurious
> MSIs even if they are in legacy PCI IRQ mode. If the controller is
> configured correctly, even those buggy cards work, because it can filter
> the spurious MSIs out. If not, they do not.

How do those devices work on h/w without inbound window configuration
or they don't?

How do the spurious MSIs only go to invalid addresses and not valid addresses?

> That's why I would prefer to configure the controller correctly, not
> just hope that nothing bad will come out of misconfiguring it slightly.

Again, just handling the first N dma-ranges entries and ignoring the
rest is not 'configure the controller correctly'.

> > And realistically, if the address
> > isn't valid then it's not going to make much difference anyway - in
> > probably 99% of cases, either the transaction doesn't hit a window and
> > the host bridge returns a completer abort, or it does hit a window, the
> > AXI side returns DECERR or SLVERR, and the host bridge translates that
> > into a completer abort. Consider also that many PCI IPs don't have
> > discrete windows and just map the entirety of PCI mem space directly to
> > the system PA space.
>
> And in that 1% of cases, we are OK with failure which could have been
> easily prevented if the controller was programmed correctly ? That does
> not look like a good thing.

You don't need dma-ranges if you want to handle holes in DRAM. Use
memblock to get this information. Then it will work if you boot using
UEFI too.

dma-ranges at the PCI bridge should be restrictions in the PCI bridge,
not ones somewhere else in the system.
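
As a sketch of what "use memblock" could look like on the R-Car side
(illustrative only: the helper is hypothetical, the actual register
programming is elided, and each bank would still need the same
alignment-driven splitting as the dma-ranges path):

#include <linux/device.h>
#include <linux/memblock.h>

/*
 * Hypothetical: derive inbound windows from the memory banks the kernel
 * actually knows about, instead of from "dma-ranges".
 */
static int rcar_pcie_map_memblock(struct rcar_pcie *pcie)
{
	struct memblock_region *reg;
	int idx = 0;

	for_each_memblock(memory, reg) {
		phys_addr_t start = reg->base;
		phys_addr_t size = reg->size;

		if (idx >= MAX_NR_INBOUND_MAPS - 1)
			break;

		dev_dbg(pcie->dev, "inbound window %d: %pa + %pa\n",
			idx, &start, &size);
		/* ... program the inbound window registers for this bank ... */
		idx += 2;	/* assuming two register sets per 64-bit window */
	}

	return 0;
}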

Rob
Marek Vasut Oct. 17, 2019, 3:01 p.m. UTC | #21
On 10/17/19 4:36 PM, Rob Herring wrote:
[...]

>>>>> The PCI device will trigger transactions to memory only when instructed
>>>>> to do so by Linux, right?  Hence if Linux takes into account
>>>>> chosen/memory
>>>>> and dma-ranges, there is no problem?
>>>>
>>>> Unless of course the remote device initiates a transfer. And if the
>>>> controller is programmed such that accesses to the missing DRAM in the
>>>> holes are not filtered out by the controller, then the controller will
>>>> gladly let the transaction through. Do we really want to let this
>>>> happen ?
>>>
>>> If you've got devices making random unsolicited accesses then who's to
>>> say they wouldn't also hit valid windows and corrupt memory? If it's
>>> happening at all you've already lost.
>>
>> Not necessarily. If your controller is programmed correctly with just
>> the ranges that are valid, then it will filter out at least the accesses
>> outside of valid memory. If it is programmed incorrectly, as you
>> suggest, then the accesses will go through, causing undefined behavior.
>>
>> And note that there is such weird buggy PCI hardware. A slightly
>> unrelated example are some of the ath9k, which are generating spurious
>> MSIs even if they are in legacy PCI IRQ mode. If the controller is
>> configured correctly, even those buggy cards work, because it can filter
>> the spurious MSIs out. If not, they do not.
> 
> How do those devices work on h/w without inbound window configuration
> or they don't?

With legacy IRQs.

> How do the spurious MSIs only go to invalid addresses and not valid addresses?

They do not; the point was that such broken hardware exists, so the
controller should be programmed correctly.

>> That's why I would prefer to configure the controller correctly, not
>> just hope that nothing bad will come out of misconfiguring it slightly.
> 
> Again, just handling the first N dma-ranges entries and ignoring the
> rest is not 'configure the controller correctly'.

It's the best effort thing to do. It's well possible the next generation
of the controller will have more windows, so could accommodate the whole
list of ranges.

Thinking about this further, this patch should be OK either way, if
there is a DT which defines more DMA ranges than the controller can
handle, handling some is better than failing outright -- a PCI which
works with a subset of memory is better than PCI that does not work at all.

>>> And realistically, if the address
>>> isn't valid then it's not going to make much difference anyway - in
>>> probably 99% of cases, either the transaction doesn't hit a window and
>>> the host bridge returns a completer abort, or it does hit a window, the
>>> AXI side returns DECERR or SLVERR, and the host bridge translates that
>>> into a completer abort. Consider also that many PCI IPs don't have
>>> discrete windows and just map the entirety of PCI mem space directly to
>>> the system PA space.
>>
>> And in that 1% of cases, we are OK with failure which could have been
>> easily prevented if the controller was programmed correctly ? That does
>> not look like a good thing.
> 
> You don't need dma-ranges if you want to handle holes in DRAM. Use
> memblock to get this information. Then it will work if you boot using
> UEFI too.

Do you have any further details about this ?

> dma-ranges at the PCI bridge should be restrictions in the PCI bridge,
> not ones somewhere else in the system.
Lorenzo Pieralisi Oct. 18, 2019, 9:53 a.m. UTC | #22
On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:

[...]

> > Again, just handling the first N dma-ranges entries and ignoring the
> > rest is not 'configure the controller correctly'.
> 
> It's the best effort thing to do. It's well possible the next generation
> of the controller will have more windows, so could accommodate the whole
> list of ranges.
> 
> Thinking about this further, this patch should be OK either way, if
> there is a DT which defines more DMA ranges than the controller can
> handle, handling some is better than failing outright -- a PCI which
> works with a subset of memory is better than PCI that does not work at all.

OK, to sum it up, this patch is there to deal with U-Boot adding multiple
dma-ranges to the DT. I still do not understand the benefit, given that
for DMA masks they are useless, as Rob pointed out, and ditto for inbound
windows programming (given that, AFAICS, the PCI controller by default
filters out any transaction that does not fall within its inbound
windows, so adding dma-ranges has the net effect of widening the DMA'able
address space rather than limiting it).

In short, what's the benefit of adding more dma-ranges regions to the
DT (and consequently handling them in the kernel) ?

Thanks,
Lorenzo

> >>> And realistically, if the address
> >>> isn't valid then it's not going to make much difference anyway - in
> >>> probably 99% of cases, either the transaction doesn't hit a window and
> >>> the host bridge returns a completer abort, or it does hit a window, the
> >>> AXI side returns DECERR or SLVERR, and the host bridge translates that
> >>> into a completer abort. Consider also that many PCI IPs don't have
> >>> discrete windows and just map the entirety of PCI mem space directly to
> >>> the system PA space.
> >>
> >> And in that 1% of cases, we are OK with failure which could have been
> >> easily prevented if the controller was programmed correctly ? That does
> >> not look like a good thing.
> > 
> > You don't need dma-ranges if you want to handle holes in DRAM. Use
> > memblock to get this information. Then it will work if you boot using
> > UEFI too.
> 
> Do you have any further details about this ?
> 
> > dma-ranges at the PCI bridge should be restrictions in the PCI bridge,
> > not ones somewhere else in the system.
> 
> -- 
> Best regards,
> Marek Vasut
Andrew Murray Oct. 18, 2019, 10:06 a.m. UTC | #23
On Thu, Oct 17, 2019 at 12:33:24AM +0200, Marek Vasut wrote:
> On 10/17/19 12:26 AM, Rob Herring wrote:
> [...]
> >>>> You can have multiple non-continuous DRAM banks for example. And an
> >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
> >>>> separate dma-ranges entry, right ?
> >>>
> >>> Not necessarily. We really only want to define the minimum we have to.
> >>> The ideal system is no dma-ranges. Is each bank at a different
> >>> relative position compared to the CPU's view of the system. That would
> >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.
> >>
> >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
> >> boundary and some more above the 32bit boundary. These two banks don't
> >> need to be continuous. And then you could add the SRAM into the mix.
> > 
> > Continuous is irrelevant. My question was in more specific terms is
> > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs.
> > PCI host view (i.e. bus addr)? If not, then that is 1 translation and
> > 1 dma-ranges entry.
> 
> I don't think it's different in that aspect. Except the bus has this
> 32bit limitation, where it only sees subset of the DRAM.
> 
> Why should the DMA ranges incorrectly cover also the DRAM which is not
> present ?

I think this is where there is a difference in understanding.

If I understand correctly, the job of the dma-ranges property isn't to
describe *what* ranges the PCI device can access - it's there to describe
*how*, i.e. the mapping between PCI and CPU-visible memory.

The dma-ranges property is a side-effect of how the buses are wired up
between the CPU and PCI controller - and so it doesn't matter what is or
isn't on those buses.

It's the job of other parts of the system to ensure that PCI devices are
told the correct addresses to write to, e.g. the enumerating software
referring to a valid CPU visible address correctly translated for the view
of the PCI device, ATS etc. And any IOMMU to enforce that.

It sounds like there is a 1:1 mapping between CPU and PCI - in which case
there isn't a reason for a dma-ranges.

Thanks,

Andrew Murray

> 
> >>> I suppose if your intent is to use inbound windows as a poor man's
> >>> IOMMU to prevent accesses to the holes, then yes you would list them
> >>> out. But I think that's wrong and difficult to maintain. You'd also
> >>> need to deal with reserved-memory regions too.
> >>
> >> What's the problem with that? The bootloader has all that information
> >> and can patch the DT correctly. In fact, in my specific case, I have
> >> platform which can be populated with differently sized DRAM, so the
> >> holes are also dynamically calculated ; there is no one DT then, the
> >> bootloader is responsible to generate the dma-ranges accordingly.
> > 
> > The problems are it doesn't work:
> > 
> > Your dma-mask and offset are not going to be correct.
> > 
> > You are running out of inbound windows. Your patch does nothing to
> > solve that. The solution would be merging multiple dma-ranges entries
> > to a single inbound window. We'd have to do that both for dma-mask and
> > inbound windows. The former would also have to figure out which
> > entries apply to setting up dma-mask. I'm simply suggesting just do
> > that up front and avoid any pointless splits.
> 
> But then the PCI device can trigger a transaction to non-existent DRAM
> and cause undefined behavior. Surely we do not want that ?
> 
> > You are setting up random inbound windows. The bootloader can't assume
> > what order the OS parses dma-ranges, and the OS can't assume what
> > order the bootloader writes the entries.
> 
> But the OS can assume the ranges are correct and cover only valid
> memory, right ? That is, memory into which the PCI controller can safely
> access.
> 
> -- 
> Best regards,
> Marek Vasut
Geert Uytterhoeven Oct. 18, 2019, 10:17 a.m. UTC | #24
Hi Andrew,

On Fri, Oct 18, 2019 at 12:07 PM Andrew Murray <andrew.murray@arm.com> wrote:
> On Thu, Oct 17, 2019 at 12:33:24AM +0200, Marek Vasut wrote:
> > On 10/17/19 12:26 AM, Rob Herring wrote:
> > [...]
> > >>>> You can have multiple non-continuous DRAM banks for example. And an
> > >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
> > >>>> separate dma-ranges entry, right ?
> > >>>
> > >>> Not necessarily. We really only want to define the minimum we have to.
> > >>> The ideal system is no dma-ranges. Is each bank at a different
> > >>> relative position compared to the CPU's view of the system. That would
> > >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.
> > >>
> > >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
> > >> boundary and some more above the 32bit boundary. These two banks don't
> > >> need to be continuous. And then you could add the SRAM into the mix.
> > >
> > > Continuous is irrelevant. My question was in more specific terms is
> > > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs.
> > > PCI host view (i.e. bus addr)? If not, then that is 1 translation and
> > > 1 dma-ranges entry.
> >
> > I don't think it's different in that aspect. Except the bus has this
> > 32bit limitation, where it only sees subset of the DRAM.
> >
> > Why should the DMA ranges incorrectly cover also the DRAM which is not
> > present ?
>
> I think this is where there is a difference in understanding.
>
> If I understand correctly, the job of the dma-ranges property isn't to
> describe *what* ranges the PCI device can access - it's there to describe
> *how*, i.e. the mapping between PCI and CPU-visible memory.
>
> The dma-ranges property is a side-effect of how the busses are wired up
> between the CPU and PCI controller - and so it doesn't matter what is or
> isn't on those buses.
>
> It's the job of other parts of the system to ensure that PCI devices are
> told the correct addresses to write to, e.g. the enumerating software
> referring to a valid CPU visible address correctly translated for the view
> of the PCI device, ATS etc. And any IOMMU to enforce that.

Yep, that's what I thought, too.

> It sounds like there is a 1:1 mapping between CPU and PCI - in which case
> there isn't a reason for a dma-ranges.

There's still the 32-bit limitation: PCI devices can access low 32-bit addresses
only.

Gr{oetje,eeting}s,

                        Geert
Andrew Murray Oct. 18, 2019, 11:40 a.m. UTC | #25
On Fri, Oct 18, 2019 at 12:17:38PM +0200, Geert Uytterhoeven wrote:
> Hi Andrew,
> 
> On Fri, Oct 18, 2019 at 12:07 PM Andrew Murray <andrew.murray@arm.com> wrote:
> > On Thu, Oct 17, 2019 at 12:33:24AM +0200, Marek Vasut wrote:
> > > On 10/17/19 12:26 AM, Rob Herring wrote:
> > > [...]
> > > >>>> You can have multiple non-continuous DRAM banks for example. And an
> > > >>>> entry for SRAM optionally. Each DRAM bank and/or the SRAM should have a
> > > >>>> separate dma-ranges entry, right ?
> > > >>>
> > > >>> Not necessarily. We really only want to define the minimum we have to.
> > > >>> The ideal system is no dma-ranges. Is each bank at a different
> > > >>> relative position compared to the CPU's view of the system. That would
> > > >>> seem doubtful for just DRAM banks. Perhaps DRAM and SRAM could change.
> > > >>
> > > >> Is that a question ? Anyway, yes, there is a bit of DRAM below the 32bit
> > > >> boundary and some more above the 32bit boundary. These two banks don't
> > > >> need to be continuous. And then you could add the SRAM into the mix.
> > > >
> > > > Continuous is irrelevant. My question was in more specific terms is
> > > > (bank1 addr - bank0 addr) different for CPU's view (i.e phys addr) vs.
> > > > PCI host view (i.e. bus addr)? If not, then that is 1 translation and
> > > > 1 dma-ranges entry.
> > >
> > > I don't think it's different in that aspect. Except the bus has this
> > > 32bit limitation, where it only sees subset of the DRAM.
> > >
> > > Why should the DMA ranges incorrectly cover also the DRAM which is not
> > > present ?
> >
> > I think this is where there is a difference in understanding.
> >
> > If I understand correctly, the job of the dma-ranges property isn't to
> > describe *what* ranges the PCI device can access - it's there to describe
> > *how*, i.e. the mapping between PCI and CPU-visible memory.
> >
> > The dma-ranges property is a side-effect of how the busses are wired up
> > between the CPU and PCI controller - and so it doesn't matter what is or
> > isn't on those buses.
> >
> > It's the job of other parts of the system to ensure that PCI devices are
> > told the correct addresses to write to, e.g. the enumerating software
> > referring to a valid CPU visible address correctly translated for the view
> > of the PCI device, ATS etc. And any IOMMU to enforce that.
> 
> Yep, that's what I thought, too.
> 
> > It sounds like there is a 1:1 mapping between CPU and PCI - in which case
> > there isn't a reason for a dma-ranges.
> 
> There's still the 32-bit limitation: PCI devices can access low 32-bit addresses
> only.

I guess a single dma-ranges entry that is limited to 32 bits would work here?
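
Something along these lines, perhaps (a sketch only, with illustrative
numbers, assuming a 1:1 translation and the same 7-cell entry format as
elsewhere in this thread):

	dma-ranges = <0x42000000 0x0 0x00000000  0x0 0x00000000  0x1 0x00000000>;

i.e. a single 4 GiB window at PCI bus address 0, mapped 1:1 to CPU
addresses, which expresses the 32-bit limitation without enumerating
individual DRAM banks.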

Thanks,

Andrew Murray

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Marek Vasut Oct. 18, 2019, 12:22 p.m. UTC | #26
On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
> 
> [...]
> 
>>> Again, just handling the first N dma-ranges entries and ignoring the
>>> rest is not 'configure the controller correctly'.
>>
>> It's the best effort thing to do. It's well possible the next generation
>> of the controller will have more windows, so could accommodate the whole
>> list of ranges.
>>
>> Thinking about this further, this patch should be OK either way, if
>> there is a DT which defines more DMA ranges than the controller can
>> handle, handling some is better than failing outright -- a PCI which
>> works with a subset of memory is better than PCI that does not work at all.
> 
> OK to sum it up, this patch is there to deal with u-boot adding multiple
> dma-ranges to DT.

Yes, this patch was posted over two months ago, at about the same time this
functionality was posted for inclusion in U-Boot. It made it into the recent
U-Boot release, but there was no feedback on the Linux patch until recently.

U-Boot can be changed for the next release, assuming we agree on how it
should behave.

> I still do not understand the benefit given that for
> DMA masks they are useless as Rob pointed out and ditto for inbound
> windows programming (given that AFAICS the PCI controller filters out
> any transaction that does not fall within its inbound windows by default
> so adding dma-ranges has the net effect of widening the DMA'able address
> space rather than limiting it).
> 
> In short, what's the benefit of adding more dma-ranges regions to the
> DT (and consequently handling them in the kernel) ?

The benefit is programming the controller inbound windows correctly.
But if there is a better way to do that, I am open to implementing it.
Are there any suggestions / examples of that ?

However, I think the discussion has strayed quite far from the real
goal of this patch. This patch only handles the case where there are
more dma-ranges in the DT than can all be programmed into the
controller. Instead of failing, the patch allows the controller to work
with a smaller range and reports that in the log, which I think is
better than failing outright.

[...]
Robin Murphy Oct. 18, 2019, 12:53 p.m. UTC | #27
On 18/10/2019 13:22, Marek Vasut wrote:
> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
>>
>> [...]
>>
>>>> Again, just handling the first N dma-ranges entries and ignoring the
>>>> rest is not 'configure the controller correctly'.
>>>
>>> It's the best effort thing to do. It's well possible the next generation
>>> of the controller will have more windows, so could accommodate the whole
>>> list of ranges.

In the context of DT describing the platform that doesn't make any 
sense. It's like saying it's fine for U-Boot to also describe a bunch of 
non-existent CPUs just because future SoCs might have them. Just because 
the system would probably still boot doesn't mean it's right.

>>> Thinking about this further, this patch should be OK either way, if
>>> there is a DT which defines more DMA ranges than the controller can
>>> handle, handling some is better than failing outright -- a PCI which
>>> works with a subset of memory is better than PCI that does not work at all.
>>
>> OK to sum it up, this patch is there to deal with u-boot adding multiple
>> dma-ranges to DT.
> 
> Yes, this patch was posted over two months ago, about the same time this
> functionality was posted for inclusion in U-Boot. It made it into recent
> U-Boot release, but there was no feedback on the Linux patch until recently.
> 
> U-Boot can be changed for the next release, assuming we agree on how it
> should behave.
> 
>> I still do not understand the benefit given that for
>> DMA masks they are useless as Rob pointed out and ditto for inbound
>> windows programming (given that AFAICS the PCI controller filters out
>> any transaction that does not fall within its inbound windows by default
>> so adding dma-ranges has the net effect of widening the DMA'able address
>> space rather than limiting it).
>>
>> In short, what's the benefit of adding more dma-ranges regions to the
>> DT (and consequently handling them in the kernel) ?
> 
> The benefit is programming the controller inbound windows correctly.
> But if there is a better way to do that, I am open to implement that.
> Are there any suggestions / examples of that ?

The crucial thing is that once we improve the existing "dma-ranges" 
handling in the DMA layer such that it *does* consider multiple entries 
properly, platforms presenting ranges which don't actually exist will 
almost certainly start going wrong, and are either going to have to fix 
their broken bootloaders or try to make a case for platform-specific 
workarounds in core code.

Robin.

> However, I think the discussion strayed quite far away from the real
> goal of this patch. This patch only handles the case where there are too
> many dma-ranges in the DT which cannot all be programmed into the
> controller. Instead of failing, the patch allows the controller to work
> with smaller range and reports that in the log, which I think is better
> than outright failing.
> 
> [...]
>
Marek Vasut Oct. 18, 2019, 2:26 p.m. UTC | #28
On 10/18/19 2:53 PM, Robin Murphy wrote:
> On 18/10/2019 13:22, Marek Vasut wrote:
>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
>>>
>>> [...]
>>>
>>>>> Again, just handling the first N dma-ranges entries and ignoring the
>>>>> rest is not 'configure the controller correctly'.
>>>>
>>>> It's the best effort thing to do. It's well possible the next
>>>> generation
>>>> of the controller will have more windows, so could accommodate the
>>>> whole
>>>> list of ranges.
> 
> In the context of DT describing the platform that doesn't make any
> sense. It's like saying it's fine for U-Boot to also describe a bunch of
> non-existent CPUs just because future SoCs might have them. Just because
> the system would probably still boot doesn't mean it's right.

It's the exact opposite of what you just described -- the last release
of U-Boot currently populates a subset of the DMA ranges, not a
superset. The dma-ranges in the Linux DT currently are a superset of
available DRAM on the platform.

>>>> Thinking about this further, this patch should be OK either way, if
>>>> there is a DT which defines more DMA ranges than the controller can
>>>> handle, handling some is better than failing outright -- a PCI which
>>>> works with a subset of memory is better than PCI that does not work
>>>> at all.
>>>
>>> OK to sum it up, this patch is there to deal with u-boot adding multiple
>>> dma-ranges to DT.
>>
>> Yes, this patch was posted over two months ago, about the same time this
>> functionality was posted for inclusion in U-Boot. It made it into recent
>> U-Boot release, but there was no feedback on the Linux patch until
>> recently.
>>
>> U-Boot can be changed for the next release, assuming we agree on how it
>> should behave.
>>
>>> I still do not understand the benefit given that for
>>> DMA masks they are useless as Rob pointed out and ditto for inbound
>>> windows programming (given that AFAICS the PCI controller filters out
>>> any transaction that does not fall within its inbound windows by default
>>> so adding dma-ranges has the net effect of widening the DMA'able address
>>> space rather than limiting it).
>>>
>>> In short, what's the benefit of adding more dma-ranges regions to the
>>> DT (and consequently handling them in the kernel) ?
>>
>> The benefit is programming the controller inbound windows correctly.
>> But if there is a better way to do that, I am open to implement that.
>> Are there any suggestions / examples of that ?
> 
> The crucial thing is that once we improve the existing "dma-ranges"
> handling in the DMA layer such that it *does* consider multiple entries
> properly, platforms presenting ranges which don't actually exist will
> almost certainly start going wrong, and are either going to have to fix
> their broken bootloaders or try to make a case for platform-specific
> workarounds in core code.

Again, this is exactly the other way around: the dma-ranges populated by
U-Boot cover only the existing DRAM. The single dma-range in the Linux DT
covers even the holes without existing DRAM.

So even if the Linux dma-ranges handling changes, there should be no
problem.

[...]
Robin Murphy Oct. 18, 2019, 3:44 p.m. UTC | #29
On 18/10/2019 15:26, Marek Vasut wrote:
> On 10/18/19 2:53 PM, Robin Murphy wrote:
>> On 18/10/2019 13:22, Marek Vasut wrote:
>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
>>>>
>>>> [...]
>>>>
>>>>>> Again, just handling the first N dma-ranges entries and ignoring the
>>>>>> rest is not 'configure the controller correctly'.
>>>>>
>>>>> It's the best effort thing to do. It's well possible the next
>>>>> generation
>>>>> of the controller will have more windows, so could accommodate the
>>>>> whole
>>>>> list of ranges.
>>
>> In the context of DT describing the platform that doesn't make any
>> sense. It's like saying it's fine for U-Boot to also describe a bunch of
>> non-existent CPUs just because future SoCs might have them. Just because
>> the system would probably still boot doesn't mean it's right.
> 
> It's the exact opposite of what you just described -- the last release
> of U-Boot currently populates a subset of the DMA ranges, not a
> superset. The dma-ranges in the Linux DT currently are a superset of
> available DRAM on the platform.

I'm not talking about the overall coverage of addresses - I've already 
made clear what I think about that - I'm talking about the *number* of 
individual entries. If the DT binding defines that dma-ranges entries 
directly represent bridge windows, then the bootloader for a given 
platform should never generate more entries than that platform has 
actual windows, because to do otherwise would be bogus.

>>>>> Thinking about this further, this patch should be OK either way, if
>>>>> there is a DT which defines more DMA ranges than the controller can
>>>>> handle, handling some is better than failing outright -- a PCI which
>>>>> works with a subset of memory is better than PCI that does not work
>>>>> at all.
>>>>
>>>> OK to sum it up, this patch is there to deal with u-boot adding multiple
>>>> dma-ranges to DT.
>>>
>>> Yes, this patch was posted over two months ago, about the same time this
>>> functionality was posted for inclusion in U-Boot. It made it into recent
>>> U-Boot release, but there was no feedback on the Linux patch until
>>> recently.
>>>
>>> U-Boot can be changed for the next release, assuming we agree on how it
>>> should behave.
>>>
>>>> I still do not understand the benefit given that for
>>>> DMA masks they are useless as Rob pointed out and ditto for inbound
>>>> windows programming (given that AFAICS the PCI controller filters out
>>>> any transaction that does not fall within its inbound windows by default
>>>> so adding dma-ranges has the net effect of widening the DMA'able address
>>>> space rather than limiting it).
>>>>
>>>> In short, what's the benefit of adding more dma-ranges regions to the
>>>> DT (and consequently handling them in the kernel) ?
>>>
>>> The benefit is programming the controller inbound windows correctly.
>>> But if there is a better way to do that, I am open to implement that.
>>> Are there any suggestions / examples of that ?
>>
>> The crucial thing is that once we improve the existing "dma-ranges"
>> handling in the DMA layer such that it *does* consider multiple entries
>> properly, platforms presenting ranges which don't actually exist will
>> almost certainly start going wrong, and are either going to have to fix
>> their broken bootloaders or try to make a case for platform-specific
>> workarounds in core code.
> Again, this is exactly the other way around, the dma-ranges populated by
> U-Boot cover only existing DRAM. The single dma-range in Linux DT covers
> even the holes without existing DRAM.
> 
> So even if the Linux dma-ranges handling changes, there should be no
> problem.

Say you have a single hardware window, and this DT property (1-cell 
numbers for simplicity):

	dma-ranges = <0x00000000 0x00000000 0x80000000>;

Driver reads one entry and programs the window to 2GB@0, DMA setup 
parses the first entry and sets device masks to 0x7fffffff, and 
everything's fine.

Now say we describe the exact same address range this way instead:

	dma-ranges = <0x00000000 0x00000000 0x40000000,
		      0x40000000 0x40000000 0x40000000>;

Driver reads one entry and programs the window to 1GB@0, DMA setup 
parses the first entry and sets device masks to 0x3fffffff, and *today*, 
things are suboptimal but happen to work.

Now say we finally get round to fixing the of_dma code to properly 
generate DMA masks that actually include all usable address bits, a user 
upgrades their kernel package, and reboots with that same DT...

Driver reads one entry and programs the window to 1GB@0, DMA setup 
parses all entries and sets device masks to 0x7fffffff, devices start 
randomly failing or throwing DMA errors half the time, angry user looks 
at the changelog to find that somebody decided their now-corrupted 
filesystem is less important than the fact that hey, at least the 
machine didn't refuse to boot because the DT was obviously wrong. Are 
you sure that shouldn't be a problem?


Now, if you want to read the DT binding as less strict and let it just 
describe some arbitrarily-complex set of address ranges that should be 
valid for DMA, that's not insurmountable; you just need more complex 
logic in your driver capable of calculating how best to cover *all* 
those ranges using the available number of windows.

Robin.
Marek Vasut Oct. 18, 2019, 4:44 p.m. UTC | #30
On 10/18/19 5:44 PM, Robin Murphy wrote:
> On 18/10/2019 15:26, Marek Vasut wrote:
>> On 10/18/19 2:53 PM, Robin Murphy wrote:
>>> On 18/10/2019 13:22, Marek Vasut wrote:
>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>> Again, just handling the first N dma-ranges entries and ignoring the
>>>>>>> rest is not 'configure the controller correctly'.
>>>>>>
>>>>>> It's the best effort thing to do. It's well possible the next
>>>>>> generation
>>>>>> of the controller will have more windows, so could accommodate the
>>>>>> whole
>>>>>> list of ranges.
>>>
>>> In the context of DT describing the platform that doesn't make any
>>> sense. It's like saying it's fine for U-Boot to also describe a bunch of
>>> non-existent CPUs just because future SoCs might have them. Just because
>>> the system would probably still boot doesn't mean it's right.
>>
>> It's the exact opposite of what you just described -- the last release
>> of U-Boot currently populates a subset of the DMA ranges, not a
>> superset. The dma-ranges in the Linux DT currently are a superset of
>> available DRAM on the platform.
> 
> I'm not talking about the overall coverage of addresses - I've already
> made clear what I think about that - I'm talking about the *number* of
> individual entries. If the DT binding defines that dma-ranges entries
> directly represent bridge windows, then the bootloader for a given
> platform should never generate more entries than that platform has
> actual windows, because to do otherwise would be bogus.

I have a feeling that's not how Rob defined the dma-ranges in this
discussion though.

>>>>>> Thinking about this further, this patch should be OK either way, if
>>>>>> there is a DT which defines more DMA ranges than the controller can
>>>>>> handle, handling some is better than failing outright -- a PCI which
>>>>>> works with a subset of memory is better than PCI that does not work
>>>>>> at all.
>>>>>
>>>>> OK to sum it up, this patch is there to deal with u-boot adding
>>>>> multiple
>>>>> dma-ranges to DT.
>>>>
>>>> Yes, this patch was posted over two months ago, about the same time
>>>> this
>>>> functionality was posted for inclusion in U-Boot. It made it into
>>>> recent
>>>> U-Boot release, but there was no feedback on the Linux patch until
>>>> recently.
>>>>
>>>> U-Boot can be changed for the next release, assuming we agree on how it
>>>> should behave.
>>>>
>>>>> I still do not understand the benefit given that for
>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound
>>>>> windows programming (given that AFAICS the PCI controller filters out
>>>>> any transaction that does not fall within its inbound windows by
>>>>> default
>>>>> so adding dma-ranges has the net effect of widening the DMA'able
>>>>> address
>>>>> space rather than limiting it).
>>>>>
>>>>> In short, what's the benefit of adding more dma-ranges regions to the
>>>>> DT (and consequently handling them in the kernel) ?
>>>>
>>>> The benefit is programming the controller inbound windows correctly.
>>>> But if there is a better way to do that, I am open to implement that.
>>>> Are there any suggestions / examples of that ?
>>>
>>> The crucial thing is that once we improve the existing "dma-ranges"
>>> handling in the DMA layer such that it *does* consider multiple entries
>>> properly, platforms presenting ranges which don't actually exist will
>>> almost certainly start going wrong, and are either going to have to fix
>>> their broken bootloaders or try to make a case for platform-specific
>>> workarounds in core code.
>> Again, this is exactly the other way around, the dma-ranges populated by
>> U-Boot cover only existing DRAM. The single dma-range in Linux DT covers
>> even the holes without existing DRAM.
>>
>> So even if the Linux dma-ranges handling changes, there should be no
>> problem.
> 
> Say you have a single hardware window, and this DT property (1-cell
> numbers for simplicity:
> 
>     dma-ranges = <0x00000000 0x00000000 0x80000000>;
> 
> Driver reads one entry and programs the window to 2GB@0, DMA setup
> parses the first entry and sets device masks to 0x7fffffff, and
> everything's fine.
> 
> Now say we describe the exact same address range this way instead:
> 
>     dma-ranges = <0x00000000 0x00000000 0x40000000,
>               0x40000000 0x40000000 0x40000000>;
> 
> Driver reads one entry and programs the window to 1GB@0, DMA setup
> parses the first entry and sets device masks to 0x3fffffff, and *today*,
> things are suboptimal but happen to work.
> 
> Now say we finally get round to fixing the of_dma code to properly
> generate DMA masks that actually include all usable address bits, a user
> upgrades their kernel package, and reboots with that same DT...
> 
> Driver reads one entry and programs the window to 1GB@0, DMA setup
> parses all entries and sets device masks to 0x7fffffff, devices start
> randomly failing or throwing DMA errors half the time, angry user looks
> at the changelog to find that somebody decided their now-corrupted
> filesystem is less important than the fact that hey, at least the
> machine didn't refuse to boot because the DT was obviously wrong. Are
> you sure that shouldn't be a problem?

I think you picked a rather special case here and arrived at a DMA mask
which just fails in this special case. Such a special case doesn't happen
here, and even if it did, I would expect Linux to merge those two ranges
or do something sane ? If the DMA mask is set incorrectly, that's a bug
in the DMA code, I would think.

What DMA mask would you get if those two entries had a gap inbetween
them ? E.g.:

 dma-ranges = <0x00000000 0x00000000 0x20000000,
               0x40000000 0x40000000 0x20000000>;

> Now, if you want to read the DT binding as less strict and let it just
> describe some arbitrarily-complex set of address ranges that should be
> valid for DMA, that's not insurmountable; you just need more complex
> logic in your driver capable of calculating how best to cover *all*
> those ranges using the available number of windows.

That's what the driver does with this patchset, except it's not possible
to cover all those ranges. It covers them as well as it can.
Robin Murphy Oct. 18, 2019, 5:35 p.m. UTC | #31
On 18/10/2019 17:44, Marek Vasut wrote:
> On 10/18/19 5:44 PM, Robin Murphy wrote:
>> On 18/10/2019 15:26, Marek Vasut wrote:
>>> On 10/18/19 2:53 PM, Robin Murphy wrote:
>>>> On 18/10/2019 13:22, Marek Vasut wrote:
>>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
>>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>> Again, just handling the first N dma-ranges entries and ignoring the
>>>>>>>> rest is not 'configure the controller correctly'.
>>>>>>>
>>>>>>> It's the best effort thing to do. It's well possible the next
>>>>>>> generation
>>>>>>> of the controller will have more windows, so could accommodate the
>>>>>>> whole
>>>>>>> list of ranges.
>>>>
>>>> In the context of DT describing the platform that doesn't make any
>>>> sense. It's like saying it's fine for U-Boot to also describe a bunch of
>>>> non-existent CPUs just because future SoCs might have them. Just because
>>>> the system would probably still boot doesn't mean it's right.
>>>
>>> It's the exact opposite of what you just described -- the last release
>>> of U-Boot currently populates a subset of the DMA ranges, not a
>>> superset. The dma-ranges in the Linux DT currently are a superset of
>>> available DRAM on the platform.
>>
>> I'm not talking about the overall coverage of addresses - I've already
>> made clear what I think about that - I'm talking about the *number* of
>> individual entries. If the DT binding defines that dma-ranges entries
>> directly represent bridge windows, then the bootloader for a given
>> platform should never generate more entries than that platform has
>> actual windows, because to do otherwise would be bogus.
> 
> I have a feeling that's not how Rob defined the dma-ranges in this
> discussion though.
> 
>>>>>>> Thinking about this further, this patch should be OK either way, if
>>>>>>> there is a DT which defines more DMA ranges than the controller can
>>>>>>> handle, handling some is better than failing outright -- a PCI which
>>>>>>> works with a subset of memory is better than PCI that does not work
>>>>>>> at all.
>>>>>>
>>>>>> OK to sum it up, this patch is there to deal with u-boot adding
>>>>>> multiple
>>>>>> dma-ranges to DT.
>>>>>
>>>>> Yes, this patch was posted over two months ago, about the same time
>>>>> this
>>>>> functionality was posted for inclusion in U-Boot. It made it into
>>>>> recent
>>>>> U-Boot release, but there was no feedback on the Linux patch until
>>>>> recently.
>>>>>
>>>>> U-Boot can be changed for the next release, assuming we agree on how it
>>>>> should behave.
>>>>>
>>>>>> I still do not understand the benefit given that for
>>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound
>>>>>> windows programming (given that AFAICS the PCI controller filters out
>>>>>> any transaction that does not fall within its inbound windows by
>>>>>> default
>>>>>> so adding dma-ranges has the net effect of widening the DMA'able
>>>>>> address
>>>>>> space rather than limiting it).
>>>>>>
>>>>>> In short, what's the benefit of adding more dma-ranges regions to the
>>>>>> DT (and consequently handling them in the kernel) ?
>>>>>
>>>>> The benefit is programming the controller inbound windows correctly.
>>>>> But if there is a better way to do that, I am open to implement that.
>>>>> Are there any suggestions / examples of that ?
>>>>
>>>> The crucial thing is that once we improve the existing "dma-ranges"
>>>> handling in the DMA layer such that it *does* consider multiple entries
>>>> properly, platforms presenting ranges which don't actually exist will
>>>> almost certainly start going wrong, and are either going to have to fix
>>>> their broken bootloaders or try to make a case for platform-specific
>>>> workarounds in core code.
>>> Again, this is exactly the other way around, the dma-ranges populated by
>>> U-Boot cover only existing DRAM. The single dma-range in Linux DT covers
>>> even the holes without existing DRAM.
>>>
>>> So even if the Linux dma-ranges handling changes, there should be no
>>> problem.
>>
>> Say you have a single hardware window, and this DT property (1-cell
>> numbers for simplicity:
>>
>>      dma-ranges = <0x00000000 0x00000000 0x80000000>;
>>
>> Driver reads one entry and programs the window to 2GB@0, DMA setup
>> parses the first entry and sets device masks to 0x7fffffff, and
>> everything's fine.
>>
>> Now say we describe the exact same address range this way instead:
>>
>>      dma-ranges = <0x00000000 0x00000000 0x40000000,
>>                0x40000000 0x40000000 0x40000000>;
>>
>> Driver reads one entry and programs the window to 1GB@0, DMA setup
>> parses the first entry and sets device masks to 0x3fffffff, and *today*,
>> things are suboptimal but happen to work.
>>
>> Now say we finally get round to fixing the of_dma code to properly
>> generate DMA masks that actually include all usable address bits, a user
>> upgrades their kernel package, and reboots with that same DT...
>>
>> Driver reads one entry and programs the window to 1GB@0, DMA setup
>> parses all entries and sets device masks to 0x7fffffff, devices start
>> randomly failing or throwing DMA errors half the time, angry user looks
>> at the changelog to find that somebody decided their now-corrupted
>> filesystem is less important than the fact that hey, at least the
>> machine didn't refuse to boot because the DT was obviously wrong. Are
>> you sure that shouldn't be a problem?
> 
> I think you picked a rather special case here and arrived as a DMA mask
> which just fails in this special case. Such special case doesn't happen
> here, and even if it did, I would expect Linux to merge those two ranges
> or do something sane ? If the DMA mask is set incorrectly, that's a bug
> of the DMA code I would think.

The mask is not set incorrectly - DMA masks represent the number of 
address bits the device (or intermediate interconnect in the case of the 
bus mask) is capable of driving. Thus when DMA is limited to a specific 
address range, the masks should be wide enough to cover the topmost 
address of that range (unless the device's own capability is inherently 
narrower).

> What DMA mask would you get if those two entries had a gap inbetween
> them ? E.g.:
> 
>   dma-ranges = <0x00000000 0x00000000 0x20000000,
>                 0x40000000 0x40000000 0x20000000>;

OK, here's a real non-simplified example (note that these windows are 
fixed and not programmed by Linux):

	dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>,
			     <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
			     <0x43000000 0x8 0x80000000 0x8 0x80000000 0x2 0x00000000>;

The DMA masks for the devices behind this bridge *should* be 35 bits, 
because that's the size of the largest usable address. Currently, 
however, because of the of_dma code's deficiency they would end up being 
an utterly useless 30 bits, which isn't even enough to reach the start 
of RAM. Thus I can't actually have this property in my DT, and as a 
result I can't enable the IOMMU, because *that* also needs to know the 
ranges in order to reserve the unusable gaps between the windows once 
address translation is in play.

>> Now, if you want to read the DT binding as less strict and let it just
>> describe some arbitrarily-complex set of address ranges that should be
>> valid for DMA, that's not insurmountable; you just need more complex
>> logic in your driver capable of calculating how best to cover *all*
>> those ranges using the available number of windows.
> 
> That's what the driver does with this patchset, except it's not possible
> to cover all those ranges. It covers them as well as it can.

Which means by definition it *doesn't* do what I suggested there...

Robin.
Marek Vasut Oct. 18, 2019, 6:44 p.m. UTC | #32
On 10/18/19 7:35 PM, Robin Murphy wrote:
> On 18/10/2019 17:44, Marek Vasut wrote:
>> On 10/18/19 5:44 PM, Robin Murphy wrote:
>>> On 18/10/2019 15:26, Marek Vasut wrote:
>>>> On 10/18/19 2:53 PM, Robin Murphy wrote:
>>>>> On 18/10/2019 13:22, Marek Vasut wrote:
>>>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
>>>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>> Again, just handling the first N dma-ranges entries and
>>>>>>>>> ignoring the
>>>>>>>>> rest is not 'configure the controller correctly'.
>>>>>>>>
>>>>>>>> It's the best effort thing to do. It's well possible the next
>>>>>>>> generation
>>>>>>>> of the controller will have more windows, so could accommodate the
>>>>>>>> whole
>>>>>>>> list of ranges.
>>>>>
>>>>> In the context of DT describing the platform that doesn't make any
>>>>> sense. It's like saying it's fine for U-Boot to also describe a
>>>>> bunch of
>>>>> non-existent CPUs just because future SoCs might have them. Just
>>>>> because
>>>>> the system would probably still boot doesn't mean it's right.
>>>>
>>>> It's the exact opposite of what you just described -- the last release
>>>> of U-Boot currently populates a subset of the DMA ranges, not a
>>>> superset. The dma-ranges in the Linux DT currently are a superset of
>>>> available DRAM on the platform.
>>>
>>> I'm not talking about the overall coverage of addresses - I've already
>>> made clear what I think about that - I'm talking about the *number* of
>>> individual entries. If the DT binding defines that dma-ranges entries
>>> directly represent bridge windows, then the bootloader for a given
>>> platform should never generate more entries than that platform has
>>> actual windows, because to do otherwise would be bogus.
>>
>> I have a feeling that's not how Rob defined the dma-ranges in this
>> discussion though.
>>
>>>>>>>> Thinking about this further, this patch should be OK either way, if
>>>>>>>> there is a DT which defines more DMA ranges than the controller can
>>>>>>>> handle, handling some is better than failing outright -- a PCI
>>>>>>>> which
>>>>>>>> works with a subset of memory is better than PCI that does not work
>>>>>>>> at all.
>>>>>>>
>>>>>>> OK to sum it up, this patch is there to deal with u-boot adding
>>>>>>> multiple
>>>>>>> dma-ranges to DT.
>>>>>>
>>>>>> Yes, this patch was posted over two months ago, about the same time
>>>>>> this
>>>>>> functionality was posted for inclusion in U-Boot. It made it into
>>>>>> recent
>>>>>> U-Boot release, but there was no feedback on the Linux patch until
>>>>>> recently.
>>>>>>
>>>>>> U-Boot can be changed for the next release, assuming we agree on
>>>>>> how it
>>>>>> should behave.
>>>>>>
>>>>>>> I still do not understand the benefit given that for
>>>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound
>>>>>>> windows programming (given that AFAICS the PCI controller filters
>>>>>>> out
>>>>>>> any transaction that does not fall within its inbound windows by
>>>>>>> default
>>>>>>> so adding dma-ranges has the net effect of widening the DMA'able
>>>>>>> address
>>>>>>> space rather than limiting it).
>>>>>>>
>>>>>>> In short, what's the benefit of adding more dma-ranges regions to
>>>>>>> the
>>>>>>> DT (and consequently handling them in the kernel) ?
>>>>>>
>>>>>> The benefit is programming the controller inbound windows correctly.
>>>>>> But if there is a better way to do that, I am open to implement that.
>>>>>> Are there any suggestions / examples of that ?
>>>>>
>>>>> The crucial thing is that once we improve the existing "dma-ranges"
>>>>> handling in the DMA layer such that it *does* consider multiple
>>>>> entries
>>>>> properly, platforms presenting ranges which don't actually exist will
>>>>> almost certainly start going wrong, and are either going to have to
>>>>> fix
>>>>> their broken bootloaders or try to make a case for platform-specific
>>>>> workarounds in core code.
>>>> Again, this is exactly the other way around, the dma-ranges
>>>> populated by
>>>> U-Boot cover only existing DRAM. The single dma-range in Linux DT
>>>> covers
>>>> even the holes without existing DRAM.
>>>>
>>>> So even if the Linux dma-ranges handling changes, there should be no
>>>> problem.
>>>
>>> Say you have a single hardware window, and this DT property (1-cell
>>> numbers for simplicity:
>>>
>>>      dma-ranges = <0x00000000 0x00000000 0x80000000>;
>>>
>>> Driver reads one entry and programs the window to 2GB@0, DMA setup
>>> parses the first entry and sets device masks to 0x7fffffff, and
>>> everything's fine.
>>>
>>> Now say we describe the exact same address range this way instead:
>>>
>>>      dma-ranges = <0x00000000 0x00000000 0x40000000,
>>>                0x40000000 0x40000000 0x40000000>;
>>>
>>> Driver reads one entry and programs the window to 1GB@0, DMA setup
>>> parses the first entry and sets device masks to 0x3fffffff, and *today*,
>>> things are suboptimal but happen to work.
>>>
>>> Now say we finally get round to fixing the of_dma code to properly
>>> generate DMA masks that actually include all usable address bits, a user
>>> upgrades their kernel package, and reboots with that same DT...
>>>
>>> Driver reads one entry and programs the window to 1GB@0, DMA setup
>>> parses all entries and sets device masks to 0x7fffffff, devices start
>>> randomly failing or throwing DMA errors half the time, angry user looks
>>> at the changelog to find that somebody decided their now-corrupted
>>> filesystem is less important than the fact that hey, at least the
>>> machine didn't refuse to boot because the DT was obviously wrong. Are
>>> you sure that shouldn't be a problem?
>>
>> I think you picked a rather special case here and arrived as a DMA mask
>> which just fails in this special case. Such special case doesn't happen
>> here, and even if it did, I would expect Linux to merge those two ranges
>> or do something sane ? If the DMA mask is set incorrectly, that's a bug
>> of the DMA code I would think.
> 
> The mask is not set incorrectly - DMA masks represent the number of
> address bits the device (or intermediate interconnect in the case of the
> bus mask) is capable of driving. Thus when DMA is limited to a specific
> address range, the masks should be wide enough to cover the topmost
> address of that range (unless the device's own capability is inherently
> narrower).

Then the mask should be 0x7fffffff in both cases I'd say.

>> What DMA mask would you get if those two entries had a gap inbetween
>> them ? E.g.:
>>
>>   dma-ranges = <0x00000000 0x00000000 0x20000000,
>>                 0x40000000 0x40000000 0x20000000>;
> 
> OK, here's an real non-simplified example

I would really like an answer to the simple example above before we
start inventing convoluted ones.

> (note that these windows are fixed and not programmed by Linux):
> 
>     dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>,
>                  <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
>                  <0x43000000 0x8 0x80000000 0x8 0x80000000 0x2 0x00000000>;
> 
> The DMA masks for the devices behind this bridge *should* be 35 bits,
> because that's the size of the largest usable address. Currently,
> however, because of the of_dma code's deficiency they would end up being
> an utterly useless 30 bits, which isn't even enough to reach the start
> of RAM. Thus I can't actually have this property in my DT, and as a
> result I can't enable the IOMMU, because *that* also needs to know the
> ranges in order to reserve the unusable gaps between the windows once
> address translation is in play.

How is this related to this particular patch ? This looks more like an
internal DMA topic.

>>> Now, if you want to read the DT binding as less strict and let it just
>>> describe some arbitrarily-complex set of address ranges that should be
>>> valid for DMA, that's not insurmountable; you just need more complex
>>> logic in your driver capable of calculating how best to cover *all*
>>> those ranges using the available number of windows.
>>
>> That's what the driver does with this patchset, except it's not possible
>> to cover all those ranges. It covers them as well as it can.
> 
> Which means by definition it *doesn't* do what I suggested there...

But then it didn't do this before either, or ... ?
Geert Uytterhoeven Oct. 21, 2019, 8:32 a.m. UTC | #33
Hi Marek,

On Fri, Oct 18, 2019 at 9:03 PM Marek Vasut <marek.vasut@gmail.com> wrote:
> On 10/18/19 7:35 PM, Robin Murphy wrote:
> > On 18/10/2019 17:44, Marek Vasut wrote:
> >> On 10/18/19 5:44 PM, Robin Murphy wrote:
> >>> On 18/10/2019 15:26, Marek Vasut wrote:
> >>>> On 10/18/19 2:53 PM, Robin Murphy wrote:
> >>>>> On 18/10/2019 13:22, Marek Vasut wrote:
> >>>>>> On 10/18/19 11:53 AM, Lorenzo Pieralisi wrote:
> >>>>>>> On Thu, Oct 17, 2019 at 05:01:26PM +0200, Marek Vasut wrote:
> >>>>>>>
> >>>>>>> [...]
> >>>>>>>
> >>>>>>>>> Again, just handling the first N dma-ranges entries and
> >>>>>>>>> ignoring the
> >>>>>>>>> rest is not 'configure the controller correctly'.
> >>>>>>>>
> >>>>>>>> It's the best effort thing to do. It's well possible the next
> >>>>>>>> generation
> >>>>>>>> of the controller will have more windows, so could accommodate the
> >>>>>>>> whole
> >>>>>>>> list of ranges.
> >>>>>
> >>>>> In the context of DT describing the platform that doesn't make any
> >>>>> sense. It's like saying it's fine for U-Boot to also describe a
> >>>>> bunch of
> >>>>> non-existent CPUs just because future SoCs might have them. Just
> >>>>> because
> >>>>> the system would probably still boot doesn't mean it's right.
> >>>>
> >>>> It's the exact opposite of what you just described -- the last release
> >>>> of U-Boot currently populates a subset of the DMA ranges, not a
> >>>> superset. The dma-ranges in the Linux DT currently are a superset of
> >>>> available DRAM on the platform.
> >>>
> >>> I'm not talking about the overall coverage of addresses - I've already
> >>> made clear what I think about that - I'm talking about the *number* of
> >>> individual entries. If the DT binding defines that dma-ranges entries
> >>> directly represent bridge windows, then the bootloader for a given
> >>> platform should never generate more entries than that platform has
> >>> actual windows, because to do otherwise would be bogus.
> >>
> >> I have a feeling that's not how Rob defined the dma-ranges in this
> >> discussion though.
> >>
> >>>>>>>> Thinking about this further, this patch should be OK either way, if
> >>>>>>>> there is a DT which defines more DMA ranges than the controller can
> >>>>>>>> handle, handling some is better than failing outright -- a PCI
> >>>>>>>> which
> >>>>>>>> works with a subset of memory is better than PCI that does not work
> >>>>>>>> at all.
> >>>>>>>
> >>>>>>> OK to sum it up, this patch is there to deal with u-boot adding
> >>>>>>> multiple
> >>>>>>> dma-ranges to DT.
> >>>>>>
> >>>>>> Yes, this patch was posted over two months ago, about the same time
> >>>>>> this
> >>>>>> functionality was posted for inclusion in U-Boot. It made it into
> >>>>>> recent
> >>>>>> U-Boot release, but there was no feedback on the Linux patch until
> >>>>>> recently.
> >>>>>>
> >>>>>> U-Boot can be changed for the next release, assuming we agree on
> >>>>>> how it
> >>>>>> should behave.
> >>>>>>
> >>>>>>> I still do not understand the benefit given that for
> >>>>>>> DMA masks they are useless as Rob pointed out and ditto for inbound
> >>>>>>> windows programming (given that AFAICS the PCI controller filters
> >>>>>>> out
> >>>>>>> any transaction that does not fall within its inbound windows by
> >>>>>>> default
> >>>>>>> so adding dma-ranges has the net effect of widening the DMA'able
> >>>>>>> address
> >>>>>>> space rather than limiting it).
> >>>>>>>
> >>>>>>> In short, what's the benefit of adding more dma-ranges regions to
> >>>>>>> the
> >>>>>>> DT (and consequently handling them in the kernel) ?
> >>>>>>
> >>>>>> The benefit is programming the controller inbound windows correctly.
> >>>>>> But if there is a better way to do that, I am open to implement that.
> >>>>>> Are there any suggestions / examples of that ?
> >>>>>
> >>>>> The crucial thing is that once we improve the existing "dma-ranges"
> >>>>> handling in the DMA layer such that it *does* consider multiple
> >>>>> entries
> >>>>> properly, platforms presenting ranges which don't actually exist will
> >>>>> almost certainly start going wrong, and are either going to have to
> >>>>> fix
> >>>>> their broken bootloaders or try to make a case for platform-specific
> >>>>> workarounds in core code.
> >>>> Again, this is exactly the other way around, the dma-ranges
> >>>> populated by
> >>>> U-Boot cover only existing DRAM. The single dma-range in Linux DT
> >>>> covers
> >>>> even the holes without existing DRAM.
> >>>>
> >>>> So even if the Linux dma-ranges handling changes, there should be no
> >>>> problem.
> >>>
> >>> Say you have a single hardware window, and this DT property (1-cell
> >>> numbers for simplicity:
> >>>
> >>>      dma-ranges = <0x00000000 0x00000000 0x80000000>;
> >>>
> >>> Driver reads one entry and programs the window to 2GB@0, DMA setup
> >>> parses the first entry and sets device masks to 0x7fffffff, and
> >>> everything's fine.
> >>>
> >>> Now say we describe the exact same address range this way instead:
> >>>
> >>>      dma-ranges = <0x00000000 0x00000000 0x40000000,
> >>>                0x40000000 0x40000000 0x40000000>;
> >>>
> >>> Driver reads one entry and programs the window to 1GB@0, DMA setup
> >>> parses the first entry and sets device masks to 0x3fffffff, and *today*,
> >>> things are suboptimal but happen to work.
> >>>
> >>> Now say we finally get round to fixing the of_dma code to properly
> >>> generate DMA masks that actually include all usable address bits, a user
> >>> upgrades their kernel package, and reboots with that same DT...
> >>>
> >>> Driver reads one entry and programs the window to 1GB@0, DMA setup
> >>> parses all entries and sets device masks to 0x7fffffff, devices start
> >>> randomly failing or throwing DMA errors half the time, angry user looks
> >>> at the changelog to find that somebody decided their now-corrupted
> >>> filesystem is less important than the fact that hey, at least the
> >>> machine didn't refuse to boot because the DT was obviously wrong. Are
> >>> you sure that shouldn't be a problem?
> >>
> >> I think you picked a rather special case here and arrived as a DMA mask
> >> which just fails in this special case. Such special case doesn't happen
> >> here, and even if it did, I would expect Linux to merge those two ranges
> >> or do something sane ? If the DMA mask is set incorrectly, that's a bug
> >> of the DMA code I would think.
> >
> > The mask is not set incorrectly - DMA masks represent the number of
> > address bits the device (or intermediate interconnect in the case of the
> > bus mask) is capable of driving. Thus when DMA is limited to a specific
> > address range, the masks should be wide enough to cover the topmost
> > address of that range (unless the device's own capability is inherently
> > narrower).
>
> Then the mask should be 0x7fffffff in both cases I'd say.
>
> >> What DMA mask would you get if those two entries had a gap inbetween
> >> them ? E.g.:
> >>
> >>   dma-ranges = <0x00000000 0x00000000 0x20000000,
> >>                 0x40000000 0x40000000 0x20000000>;
> >
> > OK, here's an real non-simplified example
>
> I would really like an answer to the simple example above before we
> start inventing convoluted ones.

I'd say that depends on the actual memory configuration, i.e. does memory
exist in the gap? Does memory exist above 0x5fffffff?

BTW, reading Devicetree Specification, Release v0.2: dma-ranges
describes bus address space translation; it does not describe if the
full address space is backed by physical RAM (see the /memory node).
Hence if there's no memory in the gap, I see no reason to have 2 entries
in dma-ranges, as the translation offsets are the same, so they could be
covered by a single larger entry.
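
In the simplified 1-cell notation used in the examples above, that single
larger entry would be something like (a sketch, assuming there is indeed
no RAM in the gap and none above 0x5fffffff):

	dma-ranges = <0x00000000 0x00000000 0x60000000>;

i.e. one entry with the same (zero) translation offset covering
0x00000000-0x5fffffff as a single range.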

Gr{oetje,eeting}s,

                        Geert
Andrew Murray Oct. 21, 2019, 10:18 a.m. UTC | #34
On Fri, Aug 16, 2019 at 03:28:04PM +0200, Marek Vasut wrote:
> On 8/16/19 3:23 PM, Simon Horman wrote:
> > On Fri, Aug 09, 2019 at 07:57:40PM +0200, marek.vasut@gmail.com wrote:
> >> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> >>
> >> In case the "dma-ranges" DT property contains either too many ranges
> >> or the range start address is unaligned in such a way that populating
> >> the range into the controller requires multiple entries, a situation
> >> may occur where all ranges cannot be loaded into the controller.
> >>
> >> Currently, the driver refuses to probe in such a situation. Relax this
> >> behavior, load as many ranges as possible and warn if some ranges do
> >> not fit anymore.
> > 
> > What is the motivation for relaxing this?
> 
> U-Boot can fill the ranges in properly now, the list would be longer in
> such a case and the driver would fail to probe (because the list is
> longer than what the hardware can support).

Is this the U-Boot patch you refer to:

https://patchwork.ozlabs.org/patch/1129436/

As pci_set_region is called with the same address for PCI and CPU memory
this implies there is a 1:1 mapping - therefore I don't see a need for
multiple mappings for each DRAM bank. (Also if this controller has a
32 bit limitation, shouldn't this code limit the addresses before calling
pci_set_region?).

Thanks,

Andrew Murray

> 
> -- 
> Best regards,
> Marek Vasut
Marek Vasut Oct. 26, 2019, 6:03 p.m. UTC | #35
On 10/21/19 12:18 PM, Andrew Murray wrote:
[...]
>>>> In case the "dma-ranges" DT property contains either too many ranges
>>>> or the range start address is unaligned in such a way that populating
>>>> the range into the controller requires multiple entries, a situation
>>>> may occur where all ranges cannot be loaded into the controller.
>>>>
>>>> Currently, the driver refuses to probe in such a situation. Relax this
>>>> behavior, load as many ranges as possible and warn if some ranges do
>>>> not fit anymore.
>>>
>>> What is the motivation for relaxing this?
>>
>> U-Boot can fill the ranges in properly now, the list would be longer in
>> such a case and the driver would fail to probe (because the list is
>> longer than what the hardware can support).
> 
> Is this the U-Boot patch you refer to:
> 
> https://patchwork.ozlabs.org/patch/1129436/

Yes.

> As pci_set_region is called with the same address for PCI and CPU memory
> this implies there is a 1:1 mapping - therefore I don't see a need for
> multiple mappings for each DRAM bank. (Also if this controller has a
> 32 bit limitation, shouldn't this code limit the addresses before calling
> pci_set_region?).

It would certainly be helpful to know about this dma-ranges detail
earlier; this whole thing could've been avoided. Now all I can do is get
that patch reverted for the next U-Boot release.

But this still leaves me with one open question -- how do I figure out
what to program into the PCI controller inbound windows, so that the
controller correctly filters inbound transfers which are targeting
non-existent memory?
Andrew Murray Oct. 26, 2019, 8:36 p.m. UTC | #36
On Sat, Oct 26, 2019 at 08:03:12PM +0200, Marek Vasut wrote:
> On 10/21/19 12:18 PM, Andrew Murray wrote:
> [...]
> >>>> In case the "dma-ranges" DT property contains either too many ranges
> >>>> or the range start address is unaligned in such a way that populating
> >>>> the range into the controller requires multiple entries, a situation
> >>>> may occur where all ranges cannot be loaded into the controller.
> >>>>
> >>>> Currently, the driver refuses to probe in such a situation. Relax this
> >>>> behavior, load as many ranges as possible and warn if some ranges do
> >>>> not fit anymore.
> >>>
> >>> What is the motivation for relaxing this?
> >>
> >> U-Boot can fill the ranges in properly now, the list would be longer in
> >> such a case and the driver would fail to probe (because the list is
> >> longer than what the hardware can support).
> > 
> > Is this the U-Boot patch you refer to:
> > 
> > https://patchwork.ozlabs.org/patch/1129436/
> 
> Yes.
> 
> > As pci_set_region is called with the same address for PCI and CPU memory
> > this implies there is a 1:1 mapping - therefore I don't see a need for
> > multiple mappings for each DRAM bank. (Also if this controller has a
> > 32 bit limitation, shouldn't this code limit the addresses before calling
> > pci_set_region?).
> It would certainly be helpful to know about this dma-ranges detail
> earlier, this whole thing could've been avoided. Now all I can do is get
> that patch reverted for the next U-Boot release.

Yes, I can appreciate the frustration this delay has caused. Though as there
are now more reviewers for PCI controllers on this list, future patches ought
to get feedback sooner.

> 
> But this still leaves me with one open question -- how do I figure out
> what to program into the PCI controller inbound windows, so that the
> controller correctly filters inbound transfers which are targetting
> nonexisting memory ?

Your driver should program into the RC->CPU windows the exact ranges
described in the dma-ranges, whilst also respecting the alignment and
max-size rules your controller has (e.g. the existing upstream logic
and also the new logic that recalculates the alignment per entry).
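
For illustration, a standalone sketch of that splitting (plain C, with a
made-up window count and example addresses - not the driver code): each range
is carved into naturally aligned power-of-two windows, and mapping stops once
the hardware runs out of entries, which is exactly the case the patch
downgrades from an error to a warning.

#include <stdint.h>
#include <stdio.h>

#define MAX_WINDOWS 8	/* assumed number of inbound map registers */

/* Largest power of two that fits in 'left' and keeps 'addr' naturally aligned. */
static uint64_t window_size(uint64_t addr, uint64_t left)
{
	uint64_t size = 1ULL << 63;

	while (size > left || (addr & (size - 1)))
		size >>= 1;
	return size;
}

int main(void)
{
	/* hypothetical 1:1 dma-range entry with an unaligned start address */
	uint64_t cpu = 0x48000000, pci = 0x48000000, end = 0x80000000;
	int idx = 0;

	while (cpu < end && idx < MAX_WINDOWS) {
		uint64_t size = window_size(cpu, end - cpu);

		printf("window %d: cpu %#llx -> pci %#llx, size %#llx\n",
		       idx++, (unsigned long long)cpu,
		       (unsigned long long)pci, (unsigned long long)size);
		cpu += size;
		pci += size;
	}
	if (cpu < end)
		printf("too many inbound regions, not all are mapped\n");
	return 0;
}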

As far as I can tell from looking at your U-Boot patch, I think I'd expect
a single dma-range to be presented in the DT that describes
0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your
controller is limited to 32 bits, and 2) there is a linear mapping between
PCI and CPU addresses (given that the second and third arguments to
pci_set_region are both the same).

As you point out, this range includes lots of things that you don't
want the RC to touch - such as non-existent memory. This is OK: when
Linux programs addresses into the various EPs for them to DMA to host
memory, it uses its own logic to select addresses that are in RAM. The
purpose of the dma-range is to describe what the CPU RAM address looks
like from the perspective of the RC (for example, if the RC was wired
with an offset such that memory writes from the RC made to
0x00000000 end up on the system map at 0x80000000, we need to tell Linux
about this offset. Otherwise, when an EP device driver programs a DMA
address of a RAM buffer at 0x90000000, it'll end up targeting
0x110000000. Thankfully our dma-range will tell Linux to apply an offset
such that the actual address written to the EP is 0x10000000.).
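
The offset arithmetic in that example, as a minimal sketch (standalone C; the
range and the buffer address are the hypothetical values from the paragraph
above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* one dma-range entry: RC bus address 0x00000000 maps to CPU address 0x80000000 */
	uint64_t pci_base = 0x00000000, cpu_base = 0x80000000, size = 0x80000000;
	uint64_t buf_cpu = 0x90000000;	/* RAM buffer the EP is asked to DMA to */

	if (buf_cpu >= cpu_base && buf_cpu - cpu_base < size) {
		/* address actually programmed into the EP: 0x10000000 */
		uint64_t buf_bus = buf_cpu - cpu_base + pci_base;

		printf("EP is programmed with %#llx\n", (unsigned long long)buf_bus);
	}
	return 0;
}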

In your case the dma-range also serves to describe a limit to the range
of addresses we can reach.

Thanks,

Andrew Murray

> 
> -- 
> Best regards,
> Marek Vasut
Andrew Murray Oct. 26, 2019, 9:06 p.m. UTC | #37
On Sat, Oct 26, 2019 at 09:36:28PM +0100, Andrew Murray wrote:
> On Sat, Oct 26, 2019 at 08:03:12PM +0200, Marek Vasut wrote:
> > On 10/21/19 12:18 PM, Andrew Murray wrote:
> > [...]
> > >>>> In case the "dma-ranges" DT property contains either too many ranges
> > >>>> or the range start address is unaligned in such a way that populating
> > >>>> the range into the controller requires multiple entries, a situation
> > >>>> may occur where all ranges cannot be loaded into the controller.
> > >>>>
> > >>>> Currently, the driver refuses to probe in such a situation. Relax this
> > >>>> behavior, load as many ranges as possible and warn if some ranges do
> > >>>> not fit anymore.
> > >>>
> > >>> What is the motivation for relaxing this?
> > >>
> > >> U-Boot can fill the ranges in properly now, the list would be longer in
> > >> such a case and the driver would fail to probe (because the list is
> > >> longer than what the hardware can support).
> > > 
> > > Is this the U-Boot patch you refer to:
> > > 
> > > https://patchwork.ozlabs.org/patch/1129436/
> > 
> > Yes.
> > 
> > > As pci_set_region is called with the same address for PCI and CPU memory
> > > this implies there is a 1:1 mapping - therefore I don't see a need for
> > > multiple mappings for each DRAM bank. (Also if this controller has a
> > > 32 bit limitation, shouldn't this code limit the addresses before calling
> > > pci_set_region?).
> > It would certainly be helpful to know about this dma-ranges detail
> > earlier, this whole thing could've been avoided. Now all I can do is get
> > that patch reverted for the next U-Boot release.
> 
> Yes, I can appreciate the frustration this delay has caused. Though as there
> are now more reviewers for PCI controllers on this list, future patches ought
> to get feedback sooner.
> 
> > 
> > But this still leaves me with one open question -- how do I figure out
> > what to program into the PCI controller inbound windows, so that the
> > controller correctly filters inbound transfers which are targetting
> > nonexisting memory ?
> 
> Your driver should program into the RC->CPU windows, the exact ranges
> described in the dma-ranges. Whilst also respecting the alignment and
> max-size rules your controller has (e.g. the existing upstream logic
> and also the new logic that recalculates the alignment per entry).
> 
> As far as I can tell from looking at your U-Boot patch, I think I'd expect
> a single dma-range to be presented in the DT, that describes
> 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your
> controller is limited to 32 bits. And 2) there is a linear mapping between
> PCI and CPU addresses (given that the second and third arguments on
> pci_set_region are both the same).
> 
> As you point out, this range includes lots of things that you don't
> want the RC to touch - such as non-existent memory. This is OK, when
> Linux programs addresses into the various EP's for them to DMA to host
> memory, it uses its own logic to select addresses that are in RAM, the
> purpose of the dma-range is to describe what the CPU RAM address looks
> like from the perspective of the RC (for example if the RC was wired
> with an offset such that made memory writes from the RC made to
> 0x00000000 end up on the system map at 0x80000000, we need to tell Linux
> about this offset. Otherwise when a EP device driver programs a DMA
> address of a RAM buffer at 0x90000000, it'll end up targetting
> 0x110000000. Thankfully our dma-range will tell Linux to apply an offset
> such that the actual address written to the EP is 0x10000000.).

That last sentence should have read "Thankfully our dma-range will tell the
RC to use its address translation such that the actual address written on the
bus by the RC is 0x10000000.)."

Thanks,

Andrew Murray

> 
> In your case the dma-range also serves to describe a limit to the range
> of addresses we can reach.
> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > -- 
> > Best regards,
> > Marek Vasut
Marek Vasut Nov. 6, 2019, 11:37 p.m. UTC | #38
On 10/26/19 10:36 PM, Andrew Murray wrote:
[...]
>> But this still leaves me with one open question -- how do I figure out
>> what to program into the PCI controller inbound windows, so that the
>> controller correctly filters inbound transfers which are targetting
>> nonexisting memory ?
> 
> Your driver should program into the RC->CPU windows, the exact ranges
> described in the dma-ranges. Whilst also respecting the alignment and
> max-size rules your controller has (e.g. the existing upstream logic
> and also the new logic that recalculates the alignment per entry).
> 
> As far as I can tell from looking at your U-Boot patch, I think I'd expect
> a single dma-range to be presented in the DT, that describes
> 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your
> controller is limited to 32 bits. And 2) there is a linear mapping between
> PCI and CPU addresses (given that the second and third arguments on
> pci_set_region are both the same).
> 
> As you point out, this range includes lots of things that you don't
> want the RC to touch - such as non-existent memory. This is OK, when
> Linux programs addresses into the various EP's for them to DMA to host
> memory, it uses its own logic to select addresses that are in RAM, the
> purpose of the dma-range is to describe what the CPU RAM address looks
> like from the perspective of the RC (for example if the RC was wired
> with an offset such that made memory writes from the RC made to
> 0x00000000 end up on the system map at 0x80000000, we need to tell Linux
> about this offset. Otherwise when a EP device driver programs a DMA
> address of a RAM buffer at 0x90000000, it'll end up targetting
> 0x110000000. Thankfully our dma-range will tell Linux to apply an offset
> such that the actual address written to the EP is 0x10000000.).

I understand that Linux programs the endpoints correctly. However this
still doesn't prevent the endpoint from being broken and from sending a
transaction to that non-existent memory. The PCI controller can prevent
that and in an automotive SoC, I would very much like the PCI controller
to do just that, rather than hope that the endpoint would always work.

> In your case the dma-range also serves to describe a limit to the range
> of addresses we can reach.

[...]
Andrew Murray Nov. 7, 2019, 2:19 p.m. UTC | #39
On Thu, Nov 07, 2019 at 12:37:44AM +0100, Marek Vasut wrote:
> On 10/26/19 10:36 PM, Andrew Murray wrote:
> [...]
> >> But this still leaves me with one open question -- how do I figure out
> >> what to program into the PCI controller inbound windows, so that the
> >> controller correctly filters inbound transfers which are targetting
> >> nonexisting memory ?
> > 
> > Your driver should program into the RC->CPU windows, the exact ranges
> > described in the dma-ranges. Whilst also respecting the alignment and
> > max-size rules your controller has (e.g. the existing upstream logic
> > and also the new logic that recalculates the alignment per entry).
> > 
> > As far as I can tell from looking at your U-Boot patch, I think I'd expect
> > a single dma-range to be presented in the DT, that describes
> > 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your
> > controller is limited to 32 bits. And 2) there is a linear mapping between
> > PCI and CPU addresses (given that the second and third arguments on
> > pci_set_region are both the same).
> > 
> > As you point out, this range includes lots of things that you don't
> > want the RC to touch - such as non-existent memory. This is OK, when
> > Linux programs addresses into the various EP's for them to DMA to host
> > memory, it uses its own logic to select addresses that are in RAM, the
> > purpose of the dma-range is to describe what the CPU RAM address looks
> > like from the perspective of the RC (for example if the RC was wired
> > with an offset such that made memory writes from the RC made to
> > 0x00000000 end up on the system map at 0x80000000, we need to tell Linux
> > about this offset. Otherwise when a EP device driver programs a DMA
> > address of a RAM buffer at 0x90000000, it'll end up targetting
> > 0x110000000. Thankfully our dma-range will tell Linux to apply an offset
> > such that the actual address written to the EP is 0x10000000.).
> 
> I understand that Linux programs the endpoints correctly. However this
> still doesn't prevent the endpoint from being broken and from sending a
> transaction to that non-existent memory.

Correct.

> The PCI controller can prevent
> that and in an automotive SoC, I would very much like the PCI controller
> to do just that, rather than hope that the endpoint would always work.

OK, I understand - at least when working on the assumption that your RC will
block RC->CPU transactions that are not described in any of its windows.
Thus you want to use the dma-ranges as a means to configure your controller
to do this.

What actually happens if you have a broken endpoint that reads/writes to
non-existent memory on this hardware? Ideally the RC would generate a
CA or UR back to the endpoint - does something else happen? Lockup, dead RC,
performance issues?

Using built-in features of the RC to prevent it from sending transactions
to non-existent addresses is clearly helpful. But of course it doesn't stop
a broken EP from writing to existent addresses, so only provides limited
protection.

Despite the good intentions here, it doesn't seem like dma-ranges is
designed for this purpose and as the hardware has limited ranges it will
only be best-effort.

Thanks,

Andrew Murray

> 
> > In your case the dma-range also serves to describe a limit to the range
> > of addresses we can reach.
> 
> [...]
> 
> -- 
> Best regards,
> Marek Vasut
Marek Vasut Nov. 16, 2019, 3:48 p.m. UTC | #40
On 11/7/19 3:19 PM, Andrew Murray wrote:
> On Thu, Nov 07, 2019 at 12:37:44AM +0100, Marek Vasut wrote:
>> On 10/26/19 10:36 PM, Andrew Murray wrote:
>> [...]
>>>> But this still leaves me with one open question -- how do I figure out
>>>> what to program into the PCI controller inbound windows, so that the
>>>> controller correctly filters inbound transfers which are targetting
>>>> nonexisting memory ?
>>>
>>> Your driver should program into the RC->CPU windows, the exact ranges
>>> described in the dma-ranges. Whilst also respecting the alignment and
>>> max-size rules your controller has (e.g. the existing upstream logic
>>> and also the new logic that recalculates the alignment per entry).
>>>
>>> As far as I can tell from looking at your U-Boot patch, I think I'd expect
>>> a single dma-range to be presented in the DT, that describes
>>> 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your
>>> controller is limited to 32 bits. And 2) there is a linear mapping between
>>> PCI and CPU addresses (given that the second and third arguments on
>>> pci_set_region are both the same).
>>>
>>> As you point out, this range includes lots of things that you don't
>>> want the RC to touch - such as non-existent memory. This is OK, when
>>> Linux programs addresses into the various EP's for them to DMA to host
>>> memory, it uses its own logic to select addresses that are in RAM, the
>>> purpose of the dma-range is to describe what the CPU RAM address looks
>>> like from the perspective of the RC (for example if the RC was wired
>>> with an offset such that made memory writes from the RC made to
>>> 0x00000000 end up on the system map at 0x80000000, we need to tell Linux
>>> about this offset. Otherwise when a EP device driver programs a DMA
>>> address of a RAM buffer at 0x90000000, it'll end up targetting
>>> 0x110000000. Thankfully our dma-range will tell Linux to apply an offset
>>> such that the actual address written to the EP is 0x10000000.).
>>
>> I understand that Linux programs the endpoints correctly. However this
>> still doesn't prevent the endpoint from being broken and from sending a
>> transaction to that non-existent memory.
> 
> Correct.
> 
>> The PCI controller can prevent
>> that and in an automotive SoC, I would very much like the PCI controller
>> to do just that, rather than hope that the endpoint would always work.
> 
> OK I understand - At least when working on the assumption that your RC will
> block RC->CPU transactions that are not described in any of it's windows.
> Thus you want to use the dma-ranges as a means to configure your controller
> to do this.

Yes

> What actually happens if you have a broken endpoint that reads/writes to
> non-existent memory on this hardware? Ideally the RC would generate a
> CA or UR back to the endpoint - does something else happen? Lockup, dead RC,
> performance issues?

The behavior is undefined.

> Using built-in features of the RC to prevent it from sending transactions
> to non-existent addresses is clearly helpful. But of course it doesn't stop
> a broken EP from writing to existent addresses, so only provides limited
> protection.

Correct.

> Despite the good intentions here, it doesn't seem like dma-ranges is
> designed for this purpose and as the hardware has limited ranges it will
> only be best-effort.

So what other options do we have?
Robin Murphy Nov. 18, 2019, 6:42 p.m. UTC | #41
On 16/11/2019 3:48 pm, Marek Vasut wrote:
> On 11/7/19 3:19 PM, Andrew Murray wrote:
>> On Thu, Nov 07, 2019 at 12:37:44AM +0100, Marek Vasut wrote:
>>> On 10/26/19 10:36 PM, Andrew Murray wrote:
>>> [...]
>>>>> But this still leaves me with one open question -- how do I figure out
>>>>> what to program into the PCI controller inbound windows, so that the
>>>>> controller correctly filters inbound transfers which are targetting
>>>>> nonexisting memory ?
>>>>
>>>> Your driver should program into the RC->CPU windows, the exact ranges
>>>> described in the dma-ranges. Whilst also respecting the alignment and
>>>> max-size rules your controller has (e.g. the existing upstream logic
>>>> and also the new logic that recalculates the alignment per entry).
>>>>
>>>> As far as I can tell from looking at your U-Boot patch, I think I'd expect
>>>> a single dma-range to be presented in the DT, that describes
>>>> 0:0xFFFFFFFF => 0:0xFFFFFFFF. This is because 1) I understand your
>>>> controller is limited to 32 bits. And 2) there is a linear mapping between
>>>> PCI and CPU addresses (given that the second and third arguments on
>>>> pci_set_region are both the same).
>>>>
>>>> As you point out, this range includes lots of things that you don't
>>>> want the RC to touch - such as non-existent memory. This is OK, when
>>>> Linux programs addresses into the various EP's for them to DMA to host
>>>> memory, it uses its own logic to select addresses that are in RAM, the
>>>> purpose of the dma-range is to describe what the CPU RAM address looks
>>>> like from the perspective of the RC (for example if the RC was wired
>>>> with an offset such that made memory writes from the RC made to
>>>> 0x00000000 end up on the system map at 0x80000000, we need to tell Linux
>>>> about this offset. Otherwise when a EP device driver programs a DMA
>>>> address of a RAM buffer at 0x90000000, it'll end up targetting
>>>> 0x110000000. Thankfully our dma-range will tell Linux to apply an offset
>>>> such that the actual address written to the EP is 0x10000000.).
>>>
>>> I understand that Linux programs the endpoints correctly. However this
>>> still doesn't prevent the endpoint from being broken and from sending a
>>> transaction to that non-existent memory.
>>
>> Correct.
>>
>>> The PCI controller can prevent
>>> that and in an automotive SoC, I would very much like the PCI controller
>>> to do just that, rather than hope that the endpoint would always work.
>>
>> OK I understand - At least when working on the assumption that your RC will
>> block RC->CPU transactions that are not described in any of it's windows.
>> Thus you want to use the dma-ranges as a means to configure your controller
>> to do this.
> 
> Yes
> 
>> What actually happens if you have a broken endpoint that reads/writes to
>> non-existent memory on this hardware? Ideally the RC would generate a
>> CA or UR back to the endpoint - does something else happen? Lockup, dead RC,
>> performance issues?
> 
> The behavior is undefined.
> 
>> Using built-in features of the RC to prevent it from sending transactions
>> to non-existent addresses is clearly helpful. But of course it doesn't stop
>> a broken EP from writing to existent addresses, so only provides limited
>> protection.
> 
> Correct.
> 
>> Despite the good intentions here, it doesn't seem like dma-ranges is
>> designed for this purpose and as the hardware has limited ranges it will
>> only be best-effort.
> So what other options do we have ?

If you really want to sacrifice DMA efficiency for a perceived increase
in theoretical robustness by setting very conservative windows, then
ultimately it's your choice; go ahead. It's just that you *need* to make
that choice in the bootloader, not in Linux. If Linux gets passed 
dma-ranges that aren't actually reflected by the hardware, such that 
this patch is needed, then it *will* go wrong eventually, and you'll 
only get an "I told you so" from me.

The bootloader knows what platform it's running on, so it has no excuse 
for emitting more ranges than there are available windows on that platform.
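
A rough sketch of what that bootloader-side discipline could look like (plain
C; the bank layout, the 1:1 mapping and the window count are assumptions, not
R-Car specifics): contiguous banks with the same translation offset are merged
first, and anything beyond the number of hardware windows is simply not
described.

#include <stdint.h>
#include <stdio.h>

#define NR_HW_WINDOWS 4	/* assumed number of inbound windows */

struct bank { uint64_t base, size; };

int main(void)
{
	/* hypothetical DRAM banks, 1:1 mapped (identical CPU and PCI addresses) */
	struct bank banks[] = {
		{ 0x048000000ULL, 0x038000000ULL },
		{ 0x080000000ULL, 0x080000000ULL },	/* contiguous with the first */
		{ 0x600000000ULL, 0x100000000ULL },
	};
	struct bank out[NR_HW_WINDOWS];
	int n = 0;

	for (unsigned int i = 0; i < sizeof(banks) / sizeof(banks[0]); i++) {
		/* same offset and contiguous: extend the previous entry */
		if (n && out[n - 1].base + out[n - 1].size == banks[i].base) {
			out[n - 1].size += banks[i].size;
			continue;
		}
		if (n == NR_HW_WINDOWS) {
			fprintf(stderr, "more banks than windows, not describing the rest\n");
			break;
		}
		out[n++] = banks[i];
	}

	for (int i = 0; i < n; i++)
		printf("dma-ranges entry: base %#llx size %#llx\n",
		       (unsigned long long)out[i].base,
		       (unsigned long long)out[i].size);
	return 0;
}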

Robin.
Robin Murphy Nov. 19, 2019, 12:10 p.m. UTC | #42
On 18/10/2019 7:44 pm, Marek Vasut wrote:
[...]
>>>> Say you have a single hardware window, and this DT property (1-cell
>>>> numbers for simplicity:
>>>>
>>>>       dma-ranges = <0x00000000 0x00000000 0x80000000>;
>>>>
>>>> Driver reads one entry and programs the window to 2GB@0, DMA setup
>>>> parses the first entry and sets device masks to 0x7fffffff, and
>>>> everything's fine.
>>>>
>>>> Now say we describe the exact same address range this way instead:
>>>>
>>>>       dma-ranges = <0x00000000 0x00000000 0x40000000,
>>>>                 0x40000000 0x40000000 0x40000000>;
>>>>
>>>> Driver reads one entry and programs the window to 1GB@0, DMA setup
>>>> parses the first entry and sets device masks to 0x3fffffff, and *today*,
>>>> things are suboptimal but happen to work.
>>>>
>>>> Now say we finally get round to fixing the of_dma code to properly
>>>> generate DMA masks that actually include all usable address bits, a user
>>>> upgrades their kernel package, and reboots with that same DT...
>>>>
>>>> Driver reads one entry and programs the window to 1GB@0, DMA setup
>>>> parses all entries and sets device masks to 0x7fffffff, devices start
>>>> randomly failing or throwing DMA errors half the time, angry user looks
>>>> at the changelog to find that somebody decided their now-corrupted
>>>> filesystem is less important than the fact that hey, at least the
>>>> machine didn't refuse to boot because the DT was obviously wrong. Are
>>>> you sure that shouldn't be a problem?
>>>
>>> I think you picked a rather special case here and arrived as a DMA mask
>>> which just fails in this special case. Such special case doesn't happen
>>> here, and even if it did, I would expect Linux to merge those two ranges
>>> or do something sane ? If the DMA mask is set incorrectly, that's a bug
>>> of the DMA code I would think.
>>
>> The mask is not set incorrectly - DMA masks represent the number of
>> address bits the device (or intermediate interconnect in the case of the
>> bus mask) is capable of driving. Thus when DMA is limited to a specific
>> address range, the masks should be wide enough to cover the topmost
>> address of that range (unless the device's own capability is inherently
>> narrower).
> 
> Then the mask should be 0x7fffffff in both cases I'd say.

Yes, *that's my whole point*. It *should* be that for both cases, but 
today it works out not to be because the current code is a bit crap.

>>> What DMA mask would you get if those two entries had a gap inbetween
>>> them ? E.g.:
>>>
>>>    dma-ranges = <0x00000000 0x00000000 0x20000000,
>>>                  0x40000000 0x40000000 0x20000000>;

The gap itself is immaterial - the highest address described by those 
ranges is 0x5fffffff, which requires 31 bits to drive, thus the 
appropriate mask would be 0x7fffffff.
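
The same computation as a tiny standalone sketch (in kernel terms this is
roughly what DMA_BIT_MASK(fls64(top)) over all parsed ranges would give):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* the two example ranges above, CPU-side base and size */
	struct { uint64_t base, size; } ranges[] = {
		{ 0x00000000, 0x20000000 },
		{ 0x40000000, 0x20000000 },
	};
	uint64_t top = 0, mask;
	int bits = 0;

	for (unsigned int i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (ranges[i].base + ranges[i].size - 1 > top)
			top = ranges[i].base + ranges[i].size - 1;

	while (bits < 64 && (top >> bits))
		bits++;			/* bits needed to drive the topmost address */
	mask = (bits < 64) ? (1ULL << bits) - 1 : ~0ULL;

	/* prints: top 0x5fffffff needs 31 bits, mask 0x7fffffff */
	printf("top %#llx needs %d bits, mask %#llx\n",
	       (unsigned long long)top, bits, (unsigned long long)mask);
	return 0;
}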

>> OK, here's an real non-simplified example
> 
> I would really like an answer to the simple example above before we
> start inventing convoluted ones.

Sigh... I did say "real". However convoluted it may look to you, this 
window configuration was "invented" ~6 years ago when the Arm Juno SoCs 
were first brought up and has been shipping ever since.

>> (note that these windows are fixed and not programmed by Linux):
>>
>>      dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>,
>>                   <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
>>                   <0x43000000 0x8 0x80000000 0x8 0x80000000 0x2 0x00000000>;
>>
>> The DMA masks for the devices behind this bridge *should* be 35 bits,
>> because that's the size of the largest usable address. Currently,
>> however, because of the of_dma code's deficiency they would end up being
>> an utterly useless 30 bits, which isn't even enough to reach the start
>> of RAM. Thus I can't actually have this property in my DT, and as a
>> result I can't enable the IOMMU, because *that* also needs to know the
>> ranges in order to reserve the unusable gaps between the windows once
>> address translation is in play.
> 
> How is this related to this particular patch ? This looks more like some
> DMA internal topic.

It's trying to get across the point that "dma-ranges" is consumed by 
multiple separate parts of the kernel, and implicitly relying on the way 
they currently interact is a mistake which will come back to bite you. 
Changing one consumer (your driver) to allow devices to exist despite 
describing unusable DMA ranges might happen to seem OK today due to 
shortcomings in another consumer (the DMA API layer), but once the DMA 
API layer gets improved (necessarily, as above), your changes will cause 
subtle DMA breakage on R-Car systems with more populated DRAM banks than 
available PCI windows, which is arguably even worse than having PCI 
simply refuse to probe in that situation.

Robin.
Marek Vasut Dec. 22, 2019, 7:46 a.m. UTC | #43
On 11/18/19 7:42 PM, Robin Murphy wrote:
[...]

>>> Despite the good intentions here, it doesn't seem like dma-ranges is
>>> designed for this purpose and as the hardware has limited ranges it will
>>> only be best-effort.
>> So what other options do we have ?
> 
> If you really want to sacrifice DMA efficiency for a perceived increase
> in theoretical robustness by setting very conservative windows,

That really means configuring the hardware correctly.

> then
> ultimately it's your choice, go ahead. It's just that you *need* to make
> that choice in the bootloader, not in Linux. If Linux gets passed
> dma-ranges that aren't actually reflected by the hardware, such that
> this patch is needed, then it *will* go wrong eventually, and you'll
> only get an "I told you so" from me.
> 
> The bootloader knows what platform it's running on, so it has no excuse
> for emitting more ranges than there are available windows on that platform.


So basically the conclusion is to limit the number of DMA ranges added
into the DT and be done with it?
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
index 56a6433eb70b..e2735005ffd3 100644
--- a/drivers/pci/controller/pcie-rcar.c
+++ b/drivers/pci/controller/pcie-rcar.c
@@ -1049,8 +1049,9 @@  static int rcar_pcie_inbound_ranges(struct rcar_pcie *pcie,
 
 	while (cpu_addr < cpu_end) {
 		if (idx >= MAX_NR_INBOUND_MAPS - 1) {
-			dev_err(pcie->dev, "Failed to map inbound regions!\n");
-			return -EINVAL;
+			dev_warn(pcie->dev,
+				 "Too many inbound regions, not all are mapped.\n");
+			break;
 		}
 		/*
 		 * Set up 64-bit inbound regions as the range parser doesn't