| Message ID | 20240111052814.713016-1-vidyas@nvidia.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | [V3] PCI/MSI: Fix MSI hwirq truncation |
On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
> While calculating the hwirq number for an MSI interrupt, the higher
> bits (i.e. from bit-5 onwards, a.k.a. domain_nr >= 32) of the PCI domain
> number get truncated because the shift is evaluated in the 32-bit width
> of pci_domain_nr()'s 'int' return type. This, for example, results in
> the same hwirq number for devices 0019:00:00.0 and 0039:00:00.0.
>
> So, cast the PCI domain number to 'irq_hw_number_t' before left-shifting
> it to calculate the hwirq number.

This still does not explain that this fixes it only on 64-bit platforms
and why we don't care for 32-bit systems.
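To make the truncation concrete, here is a minimal user-space sketch of the problem. The helper names (domain_term_buggy(), domain_term_fixed()) are hypothetical stand-ins modelling only the domain term of pci_msi_domain_calc_hwirq(), not the kernel code itself:

```c
#include <stdio.h>

/* Stand-in for the kernel typedef; unsigned long is 64-bit here (LP64). */
typedef unsigned long irq_hw_number_t;

/*
 * Hypothetical helpers modelling only the domain term of
 * pci_msi_domain_calc_hwirq(); msi_index and pci_dev_id() are omitted
 * because only the domain shift is affected by the bug.
 */
static irq_hw_number_t domain_term_buggy(int domain_nr)
{
	/*
	 * (domain_nr & 0xFFFFFFFF) has type unsigned int, so the << 27
	 * is evaluated in 32 bits: domain bits 5 and above fall off the
	 * top before the result is widened to irq_hw_number_t.
	 */
	return (domain_nr & 0xFFFFFFFF) << 27;
}

static irq_hw_number_t domain_term_fixed(int domain_nr)
{
	/* Widen to irq_hw_number_t first, then shift: no bits are lost. */
	return ((irq_hw_number_t)(domain_nr & 0xFFFFFFFF)) << 27;
}

int main(void)
{
	/* The changelog's example domains, 0x19 and 0x39, differ only in bit 5. */
	printf("buggy: 0x19 -> %#lx, 0x39 -> %#lx\n",
	       domain_term_buggy(0x19), domain_term_buggy(0x39));
	printf("fixed: 0x19 -> %#lx, 0x39 -> %#lx\n",
	       domain_term_fixed(0x19), domain_term_fixed(0x39));
	return 0;
}
```

On a 64-bit machine the buggy variant prints 0xc8000000 for both domains, because bit 5 of the domain is shifted up to bit 32 of a 32-bit intermediate and lost; the fixed variant prints the distinct values 0xc8000000 and 0x1c8000000.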
On 1/12/2024 9:23 PM, Thomas Gleixner wrote:
> On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
>> While calculating the hwirq number for an MSI interrupt, the higher
>> bits (i.e. from bit-5 onwards, a.k.a. domain_nr >= 32) of the PCI domain
>> number get truncated because the shift is evaluated in the 32-bit width
>> of pci_domain_nr()'s 'int' return type. This, for example, results in
>> the same hwirq number for devices 0019:00:00.0 and 0039:00:00.0.
>>
>> So, cast the PCI domain number to 'irq_hw_number_t' before left-shifting
>> it to calculate the hwirq number.
>
> This still does not explain that this fixes it only on 64-bit platforms
> and why we don't care for 32-bit systems.

Agree that this fixes the issue only on 64-bit platforms. It doesn't
change the behavior on 32-bit platforms. My understanding is that the
issue surfaces only if there are many PCIe controllers in the system,
which is usually the case in modern server systems, and it is arguable
whether server systems really run 32-bit kernels.

One way to fix it for both 32-bit and 64-bit systems is to change the
type of 'hwirq' to u64. That may cause two memory reads on 32-bit
systems whenever 'hwirq' is accessed, which may in turn have some
performance impact. Is this the way you think I should handle it?
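For context on the 32-bit question: irq_hw_number_t is a typedef for unsigned long (in include/linux/types.h), so on a 32-bit kernel the hwirq value itself is only 32 bits wide and the cast cannot widen the result there:

```c
/* include/linux/types.h */
typedef unsigned long irq_hw_number_t;	/* 32 bits on 32-bit kernels, 64 on 64-bit */
```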
On Fri, Jan 12 2024 at 23:03, Vidya Sagar wrote:
> On 1/12/2024 9:23 PM, Thomas Gleixner wrote:
>> On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
>>> So, cast the PCI domain number to 'irq_hw_number_t' before left-shifting
>>> it to calculate the hwirq number.
>>
>> This still does not explain that this fixes it only on 64-bit platforms
>> and why we don't care for 32-bit systems.
> Agree that this fixes the issue only on 64-bit platforms. It doesn't
> change the behavior on 32-bit platforms. My understanding is that the
> issue surfaces only if there are many PCIe controllers in the system,
> which is usually the case in modern server systems, and it is arguable
> whether server systems really run 32-bit kernels.

Arguably, people who do that can keep the pieces.

> One way to fix it for both 32-bit and 64-bit systems is to change the
> type of 'hwirq' to u64. That may cause two memory reads on 32-bit
> systems whenever 'hwirq' is accessed, which may in turn have some
> performance impact. Is this the way you think I should handle it?

No. Leave it as is. What I'm asking for is that it's properly documented
in the changelog.

Thanks,

        tglx
On 1/15/2024 3:31 PM, Thomas Gleixner wrote:
> On Fri, Jan 12 2024 at 23:03, Vidya Sagar wrote:
>> On 1/12/2024 9:23 PM, Thomas Gleixner wrote:
>>> On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
>>>> So, cast the PCI domain number to 'irq_hw_number_t' before left-shifting
>>>> it to calculate the hwirq number.
>>>
>>> This still does not explain that this fixes it only on 64-bit platforms
>>> and why we don't care for 32-bit systems.
>> Agree that this fixes the issue only on 64-bit platforms. It doesn't
>> change the behavior on 32-bit platforms. My understanding is that the
>> issue surfaces only if there are many PCIe controllers in the system,
>> which is usually the case in modern server systems, and it is arguable
>> whether server systems really run 32-bit kernels.
>
> Arguably, people who do that can keep the pieces.
>
>> One way to fix it for both 32-bit and 64-bit systems is to change the
>> type of 'hwirq' to u64. That may cause two memory reads on 32-bit
>> systems whenever 'hwirq' is accessed, which may in turn have some
>> performance impact. Is this the way you think I should handle it?
>
> No. Leave it as is. What I'm asking for is that it's properly documented
> in the changelog.

Sure. I'll add this extra information in the changelog.
diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index c8be056c248d..cfd84a899c82 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 
 	return (irq_hw_number_t)desc->msi_index |
 		pci_dev_id(dev) << 11 |
-		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+		((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
 }
 
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
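For reference, a sketch of the hwirq bit layout implied by the shifts in the patched expression (derived from the code above, not from separate kernel documentation):

```c
/*
 * hwirq layout implied by pci_msi_domain_calc_hwirq() after the fix:
 *
 *   bits  0..10   desc->msi_index
 *   bits 11..26   pci_dev_id(dev)          (bus << 8 | devfn, 16 bits)
 *   bits 27..58   pci_domain_nr(dev->bus)  (masked to 32 bits)
 *
 * Domain bits 32..58 of the result only exist when irq_hw_number_t is
 * 64 bits wide, which is why the cast fixes the truncation on 64-bit
 * platforms only.
 */
```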