diff mbox series

[v6,5/7] xen/arm: do not map IRQs and memory for disabled devices

Message ID 20211105063326.939843-6-andr2000@gmail.com (mailing list archive)
State Superseded
Headers show
Series PCI devices passthrough on Arm, part 2 | expand

Commit Message

Oleksandr Andrushchenko Nov. 5, 2021, 6:33 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Currently Xen maps all IRQs and memory ranges for all devices except
those marked for passthrough, e.g. it doesn't pay attention to the
"status" property of the node.

According to the device tree specification [1]:
 - "okay"     Indicates the device is operational.
 - "disabled" Indicates that the device is not presently operational,
              but it might become operational in the future (for example,
	      something is not plugged in, or switched off).
	      Refer to the device binding for details on what disabled means
	      for a given device.

So, "disabled" status is device dependent and mapping should be taken by
case-by-case approach with that respect. Although in general Xen should map
IRQs and memory ranges as the disabled devices might become operational it
makes it impossible for the other devices, which are not operational in
any case, to skip the mappings.

This patch disables mapping for the devices with status = "disabled".

[1] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
New in v6
---
 xen/arch/arm/domain_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Julien Grall Nov. 16, 2021, 7:22 p.m. UTC | #1
Hi Oleksandr,

On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Currently Xen maps all IRQs and memory ranges for all devices except
> those marked for passthrough, e.g. it doesn't pay attention to the
> "status" property of the node.
> 
> According to the device tree specification [1]:
>   - "okay"     Indicates the device is operational.
>   - "disabled" Indicates that the device is not presently operational,
>                but it might become operational in the future (for example,
> 	      something is not plugged in, or switched off).
> 	      Refer to the device binding for details on what disabled means
> 	      for a given device.
> 
> So, "disabled" status is device dependent and mapping should be taken by
> case-by-case approach with that respect. Although in general Xen should map
> IRQs and memory ranges as the disabled devices might become operational

Right, this change effectively prevent dom0 to use such device if it 
becomes operational in the future. So this sounds like a big regression.

How do you plan to handle it?

> it
> makes it impossible for the other devices, which are not operational in
> any case, to skip the mappings.

You wrote "makes it impossible for the other devices", but it is not 
clear to me what's would go wrong when we map a disabled device (Dom0 
should not touch it). Do you have an example?

Cheers,
Oleksandr Andrushchenko Nov. 18, 2021, 6:59 a.m. UTC | #2
Hi, Julien!

On 16.11.21 21:22, Julien Grall wrote:
> Hi Oleksandr,
>
> On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Currently Xen maps all IRQs and memory ranges for all devices except
>> those marked for passthrough, e.g. it doesn't pay attention to the
>> "status" property of the node.
>>
>> According to the device tree specification [1]:
>>   - "okay"     Indicates the device is operational.
>>   - "disabled" Indicates that the device is not presently operational,
>>                but it might become operational in the future (for example,
>>           something is not plugged in, or switched off).
>>           Refer to the device binding for details on what disabled means
>>           for a given device.
>>
>> So, "disabled" status is device dependent and mapping should be taken by
>> case-by-case approach with that respect. Although in general Xen should map
>> IRQs and memory ranges as the disabled devices might become operational
>
> Right, this change effectively prevent dom0 to use such device if it becomes operational in the future.
Or doesn't, see below
> So this sounds like a big regression.
>
> How do you plan to handle it?
It depends if this is really a regression or is ok and "by design"
>
>> it
>> makes it impossible for the other devices, which are not operational in
>> any case, to skip the mappings.
>
> You wrote "makes it impossible for the other devices", but it is not clear to me what's would go wrong when we map a disabled device (Dom0 should not touch it). Do you have an example?
Ok, here we go. In the SoC I am working with the PCIe controller may run in two modes:
Root or Endpoint. Not configurable at run-time, so you configure it in the device tree.
The are two device tree entries for that: for the root [1] and for the endpoint [2].
So, when you want the controller to be a Root Complex then you put status = "disabled"
for the Endpoint entry and vise versa.

If you take a look at the nodes they both define the same "reg" and "interrupts",
effectively making it impossible for me in the patch [3] to actually trap MMIOs:
we skip the mappings for [1] and then, because we assume "disabled" is
still valid for mappings, we add those for [2].

So, this is probably why device tree documentation says about the disabled status
to be device specific.

Hope this describes a very valid use-case. At the same time you may argue that
I just need to remove the offending entry from the device tree which may also be
valid.
>
> Cheers,
>
Thank you,
Oleksandr

[1] https://github.com/renesas-rcar/linux-bsp/blob/v5.10.41/rcar-5.1.3.rc6/arch/arm64/boot/dts/renesas/r8a779f0.dtsi#L843
[2] https://github.com/renesas-rcar/linux-bsp/blob/v5.10.41/rcar-5.1.3.rc6/arch/arm64/boot/dts/renesas/r8a779f0.dtsi#L896
[3] https://patchwork.kernel.org/project/xen-devel/patch/20211105063326.939843-5-andr2000@gmail.com/
Julien Grall Nov. 22, 2021, 7:31 p.m. UTC | #3
Hi Oleksandr,

On 18/11/2021 06:59, Oleksandr Andrushchenko wrote:
> Hi, Julien!
> 
> On 16.11.21 21:22, Julien Grall wrote:
>> Hi Oleksandr,
>>
>> On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Currently Xen maps all IRQs and memory ranges for all devices except
>>> those marked for passthrough, e.g. it doesn't pay attention to the
>>> "status" property of the node.
>>>
>>> According to the device tree specification [1]:
>>>    - "okay"     Indicates the device is operational.
>>>    - "disabled" Indicates that the device is not presently operational,
>>>                 but it might become operational in the future (for example,
>>>            something is not plugged in, or switched off).
>>>            Refer to the device binding for details on what disabled means
>>>            for a given device.
>>>
>>> So, "disabled" status is device dependent and mapping should be taken by
>>> case-by-case approach with that respect. Although in general Xen should map
>>> IRQs and memory ranges as the disabled devices might become operational
>>
>> Right, this change effectively prevent dom0 to use such device if it becomes operational in the future.
> Or doesn't, see below
>> So this sounds like a big regression.
>>
>> How do you plan to handle it?
> It depends if this is really a regression or is ok and "by design"

Quoting what you wrote in the commit message:

"Indicates that the device is not presently operational, but it might 
become operational in the future."

I read that as it might be possible to have a device turn on after boot.
In fact, looking at Linux, I can see some code allowing to change the 
state of a device after boot (see [4]). So I think we want to keep 
mapping the regions in dom0 at least for some devices.

Note, that other bits of the DT overlay would not work. This should be 
covered by the new series from Xilinx [5].

>>
>>> it
>>> makes it impossible for the other devices, which are not operational in
>>> any case, to skip the mappings.
>>
>> You wrote "makes it impossible for the other devices", but it is not clear to me what's would go wrong when we map a disabled device (Dom0 should not touch it). Do you have an example?
> Ok, here we go. In the SoC I am working with the PCIe controller may run in two modes:
> Root or Endpoint. Not configurable at run-time, so you configure it in the device tree.
> The are two device tree entries for that: for the root [1] and for the endpoint [2].
> So, when you want the controller to be a Root Complex then you put status = "disabled"
> for the Endpoint entry and vise versa.

Thank for the example. I think this is something that should be 
explained in the commit message.

> 
> If you take a look at the nodes they both define the same "reg" and "interrupts",
> effectively making it impossible for me in the patch [3] to actually trap MMIOs: > we skip the mappings for [1] and then, because we assume "disabled" is
> still valid for mappings, we add those for [2].

Technically, you would have the same issues if your device is sharing 
the interrupt or the MMIO page.

The former is fairly common, but IIUC you are not emulated the 
interrupts. Right?

For the latter, I have seen multiple UARTs in the same page (e.g. 
pine64) but not multiple PCI hostbridges. However, this is only with 4KB 
page granularity. We may have the issue arising with 16KB/64KB ones. So 
I would say we could ignore it for now.

> 
> So, this is probably why device tree documentation says about the disabled status
> to be device specific.

Correct. That said, until now, all the devices would have their 
MMIOsregions mapped to dom0. So the interpretation of "disabled" didn't 
matter too much.

> 
> Hope this describes a very valid use-case. At the same time you may argue that
> I just need to remove the offending entry from the device tree which may also be
> valid.

We need to be able to accept any *valid* Device-Tree without any 
modification. At the same time, we want to avoid break potential 
existing use-cases.

I think, you can handle the two gracefully by adding a else in 
pci_host_bridge_mappings() that would remove the P2M mapping if 
bridge->ops->need_p2m_hwdom_mapping() returns false.

This is not pretty but it should do the job for now. Long run, I think 
we should consider to create fake entries in the P2M for emulated regions.

The advantage is we could easily find clash and use them to ignore 
mapping when the region is emulated.

Cheers,

>>
>> Cheers,
>>
> Thank you,
> Oleksandr
> 
> [1] https://github.com/renesas-rcar/linux-bsp/blob/v5.10.41/rcar-5.1.3.rc6/arch/arm64/boot/dts/renesas/r8a779f0.dtsi#L843
> [2] https://github.com/renesas-rcar/linux-bsp/blob/v5.10.41/rcar-5.1.3.rc6/arch/arm64/boot/dts/renesas/r8a779f0.dtsi#L896
> [3] https://patchwork.kernel.org/project/xen-devel/patch/20211105063326.939843-5-andr2000@gmail.com/

[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/dynamic.c#n111
[5] 
https://lore.kernel.org/xen-devel/1636441347-133850-1-git-send-email-fnu.vikram@xilinx.com/


>
Oleksandr Andrushchenko Nov. 23, 2021, 7:23 a.m. UTC | #4
Hi, Julien!

On 22.11.21 21:31, Julien Grall wrote:
> Hi Oleksandr,
>
> On 18/11/2021 06:59, Oleksandr Andrushchenko wrote:
>> Hi, Julien!
>>
>> On 16.11.21 21:22, Julien Grall wrote:
>>> Hi Oleksandr,
>>>
>>> On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Currently Xen maps all IRQs and memory ranges for all devices except
>>>> those marked for passthrough, e.g. it doesn't pay attention to the
>>>> "status" property of the node.
>>>>
>>>> According to the device tree specification [1]:
>>>>    - "okay"     Indicates the device is operational.
>>>>    - "disabled" Indicates that the device is not presently operational,
>>>>                 but it might become operational in the future (for example,
>>>>            something is not plugged in, or switched off).
>>>>            Refer to the device binding for details on what disabled means
>>>>            for a given device.
>>>>
>>>> So, "disabled" status is device dependent and mapping should be taken by
>>>> case-by-case approach with that respect. Although in general Xen should map
>>>> IRQs and memory ranges as the disabled devices might become operational
>>>
>>> Right, this change effectively prevent dom0 to use such device if it becomes operational in the future.
>> Or doesn't, see below
>>> So this sounds like a big regression.
>>>
>>> How do you plan to handle it?
>> It depends if this is really a regression or is ok and "by design"
>
> Quoting what you wrote in the commit message:
>
> "Indicates that the device is not presently operational, but it might become operational in the future."
>
> I read that as it might be possible to have a device turn on after boot.
> In fact, looking at Linux, I can see some code allowing to change the state of a device after boot (see [4]). So I think we want to keep mapping the regions in dom0 at least for some devices.
>
> Note, that other bits of the DT overlay would not work. This should be covered by the new series from Xilinx [5].
>
>>>
>>>> it
>>>> makes it impossible for the other devices, which are not operational in
>>>> any case, to skip the mappings.
>>>
>>> You wrote "makes it impossible for the other devices", but it is not clear to me what's would go wrong when we map a disabled device (Dom0 should not touch it). Do you have an example?
>> Ok, here we go. In the SoC I am working with the PCIe controller may run in two modes:
>> Root or Endpoint. Not configurable at run-time, so you configure it in the device tree.
>> The are two device tree entries for that: for the root [1] and for the endpoint [2].
>> So, when you want the controller to be a Root Complex then you put status = "disabled"
>> for the Endpoint entry and vise versa.
>
> Thank for the example. I think this is something that should be explained in the commit message.
>
>>
>> If you take a look at the nodes they both define the same "reg" and "interrupts",
>> effectively making it impossible for me in the patch [3] to actually trap MMIOs: > we skip the mappings for [1] and then, because we assume "disabled" is
>> still valid for mappings, we add those for [2].
>
> Technically, you would have the same issues if your device is sharing the interrupt or the MMIO page.
>
> The former is fairly common, but IIUC you are not emulated the interrupts. Right?
>
> For the latter, I have seen multiple UARTs in the same page (e.g. pine64) but not multiple PCI hostbridges. However, this is only with 4KB page granularity. We may have the issue arising with 16KB/64KB ones. So I would say we could ignore it for now.
>
>>
>> So, this is probably why device tree documentation says about the disabled status
>> to be device specific.
>
> Correct. That said, until now, all the devices would have their MMIOsregions mapped to dom0. So the interpretation of "disabled" didn't matter too much.
>
>>
>> Hope this describes a very valid use-case. At the same time you may argue that
>> I just need to remove the offending entry from the device tree which may also be
>> valid.
>
> We need to be able to accept any *valid* Device-Tree without any modification. At the same time, we want to avoid break potential existing use-cases.
>
> I think, you can handle the two gracefully by adding a else in pci_host_bridge_mappings() that would remove the P2M mapping if bridge->ops->need_p2m_hwdom_mapping() returns false.
>
> This is not pretty but it should do the job for now. Long run, I think we should consider to create fake entries in the P2M for emulated regions.
>
> The advantage is we could easily find clash and use them to ignore mapping when the region is emulated.
I think I will drop this patch from the series for now: although it is good to have the device tree
unmodified I still see it way easier to do so. I will delete the node from the device tree as in my
case it is anyways decided at compile time which function the PCI host bridge is about to be.
If need be I'll re-introduce this patch in some other form.

Thank you,
Oleksandr
>
> Cheers,
>
>>>
>>> Cheers,
>>>
>> Thank you,
>> Oleksandr
>>
>> [1] https://urldefense.com/v3/__https://github.com/renesas-rcar/linux-bsp/blob/v5.10.41/rcar-5.1.3.rc6/arch/arm64/boot/dts/renesas/r8a779f0.dtsi*L843__;Iw!!GF_29dbcQIUBPA!ledLSDAXi6ggPloos-Wz0fWRM3IlSkJVX-H4QrTBSM9rlWvCItQ4D8hzLKHe7fBtoUOnJDmMoQ$ [github[.]com]
>> [2] https://urldefense.com/v3/__https://github.com/renesas-rcar/linux-bsp/blob/v5.10.41/rcar-5.1.3.rc6/arch/arm64/boot/dts/renesas/r8a779f0.dtsi*L896__;Iw!!GF_29dbcQIUBPA!ledLSDAXi6ggPloos-Wz0fWRM3IlSkJVX-H4QrTBSM9rlWvCItQ4D8hzLKHe7fBtoUPG8u2ngQ$ [github[.]com]
>> [3] https://urldefense.com/v3/__https://patchwork.kernel.org/project/xen-devel/patch/20211105063326.939843-5-andr2000@gmail.com/__;!!GF_29dbcQIUBPA!ledLSDAXi6ggPloos-Wz0fWRM3IlSkJVX-H4QrTBSM9rlWvCItQ4D8hzLKHe7fBtoUPU79EtaA$ [patchwork[.]kernel[.]org]
>
> [4] https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/dynamic.c*n111__;Iw!!GF_29dbcQIUBPA!ledLSDAXi6ggPloos-Wz0fWRM3IlSkJVX-H4QrTBSM9rlWvCItQ4D8hzLKHe7fBtoUPRg1zUUw$ [git[.]kernel[.]org]
> [5] https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/1636441347-133850-1-git-send-email-fnu.vikram@xilinx.com/__;!!GF_29dbcQIUBPA!ledLSDAXi6ggPloos-Wz0fWRM3IlSkJVX-H4QrTBSM9rlWvCItQ4D8hzLKHe7fBtoUMfgs0sdA$ [lore[.]kernel[.]org]
>
>
>>
>
diff mbox series

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index c7d992456ca7..d3a4c0a173b8 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1837,7 +1837,8 @@  static int __init handle_device(struct domain *d, struct dt_device_node *dev,
     unsigned int i;
     int res;
     u64 addr, size;
-    bool own_device = !dt_device_for_passthrough(dev);
+    bool own_device = !dt_device_for_passthrough(dev) &&
+                      dt_device_is_available(dev);
     /*
      * For PCI passthrough we only need to remap to Dom0 the interrupts
      * and memory ranges from "reg" property which cover controller's