Message ID | 20201024213535.443185-18-dwmw2@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields | expand |
From: David Woodhouse > Sent: 24 October 2020 22:35 > > From: Thomas Gleixner <tglx@linutronix.de> > > Use the msi_msg shadow structs and compose the message with named bitfields > instead of the unreadable macro maze. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > --- > arch/x86/pci/xen.c | 26 +++++++++++--------------- > 1 file changed, 11 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c > index c552cd2d0632..3d41a09c2c14 100644 > --- a/arch/x86/pci/xen.c > +++ b/arch/x86/pci/xen.c > @@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi, > > #if defined(CONFIG_PCI_MSI) > #include <linux/msi.h> > -#include <asm/msidef.h> > > struct xen_pci_frontend_ops *xen_pci_frontend; > EXPORT_SYMBOL_GPL(xen_pci_frontend); > @@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) > return ret; > } > > -#define XEN_PIRQ_MSI_DATA (MSI_DATA_TRIGGER_EDGE | \ > - MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0)) > - > static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq, > struct msi_msg *msg) > { > - /* We set vector == 0 to tell the hypervisor we don't care about it, > - * but we want a pirq setup instead. > - * We use the dest_id field to pass the pirq that we want. */ > - msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq); > - msg->address_lo = > - MSI_ADDR_BASE_LO | > - MSI_ADDR_DEST_MODE_PHYSICAL | > - MSI_ADDR_REDIRECTION_CPU | > - MSI_ADDR_DEST_ID(pirq); > - > - msg->data = XEN_PIRQ_MSI_DATA; > + /* > + * We set vector == 0 to tell the hypervisor we don't care about > + * it, but we want a pirq setup instead. We use the dest_id fields > + * to pass the pirq that we want. > + */ > + memset(msg, 0, sizeof(*msg)); > + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; > + msg->arch_addr_hi.destid_8_31 = pirq >> 8; > + msg->arch_addr_lo.destid_0_7 = pirq & 0xFF; > + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; > + msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT; > } Just looking at a random one of these patches... Does the compiler manage to optimise that reasonably? Or does it generate a lot of shifts and masks as each bitfield is set? The code generation for bitfields is often a lot worse that that for |= setting bits in a word. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Sun, 2020-10-25 at 09:49 +0000, David Laight wrote: > Just looking at a random one of these patches... > > Does the compiler manage to optimise that reasonably? > Or does it generate a lot of shifts and masks as each > bitfield is set? > > The code generation for bitfields is often a lot worse > that that for |= setting bits in a word. Indeed, it appears to be utterly appalling. That was one of my motivations for doing it with masks and shifts in the first place. Because in ioapic_setup_msg_from_msi(), for example, these two are consecutive in both source and destination: entry->vector = msg.arch_data.vector; entry->delivery_mode = msg.arch_data.delivery_mode; And so are those: entry->ir_format = msg.arch_addr_lo.dmar_format; entry->ir_index_0_14 = msg.arch_addr_lo.dmar_index_0_14; But GCC 7.5.0 here is doing them all separately... 00000000000011e0 <ioapic_setup_msg_from_msi>: { 11e0: 53 push %rbx 11e1: 48 89 f3 mov %rsi,%rbx 11e4: 48 83 ec 18 sub $0x18,%rsp irq_chip_compose_msi_msg(irq_data, &msg); 11e8: 48 89 e6 mov %rsp,%rsi { 11eb: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax 11f2: 00 00 11f4: 48 89 44 24 10 mov %rax,0x10(%rsp) 11f9: 31 c0 xor %eax,%eax irq_chip_compose_msi_msg(irq_data, &msg); 11fb: e8 00 00 00 00 callq 1200 <ioapic_setup_msg_from_msi+0x20> entry->vector = msg.arch_data.vector; 1200: 0f b6 44 24 08 movzbl 0x8(%rsp),%eax entry->delivery_mode = msg.arch_data.delivery_mode; 1205: 0f b6 53 01 movzbl 0x1(%rbx),%edx 1209: 0f b6 74 24 09 movzbl 0x9(%rsp),%esi entry->vector = msg.arch_data.vector; 120e: 88 03 mov %al,(%rbx) entry->dest_mode_logical = msg.arch_addr_lo.dest_mode_logical; 1210: 0f b6 04 24 movzbl (%rsp),%eax entry->delivery_mode = msg.arch_data.delivery_mode; 1214: 83 e2 f0 and $0xfffffff0,%edx 1217: 83 e6 07 and $0x7,%esi entry->dest_mode_logical = msg.arch_addr_lo.dest_mode_logical; 121a: 09 f2 or %esi,%edx 121c: 8d 0c 00 lea (%rax,%rax,1),%ecx entry->ir_format = msg.arch_addr_lo.dmar_format; 121f: c0 e8 04 shr $0x4,%al entry->dest_mode_logical = msg.arch_addr_lo.dest_mode_logical; 1222: 83 e1 08 and $0x8,%ecx 1225: 09 ca or %ecx,%edx 1227: 88 53 01 mov %dl,0x1(%rbx) entry->ir_format = msg.arch_addr_lo.dmar_format; 122a: 89 c2 mov %eax,%edx entry->ir_index_0_14 = msg.arch_addr_lo.dmar_index_0_14; 122c: 8b 04 24 mov (%rsp),%eax 122f: 83 e2 01 and $0x1,%edx 1232: c1 e8 05 shr $0x5,%eax 1235: 66 25 ff 7f and $0x7fff,%ax 1239: 8d 0c 00 lea (%rax,%rax,1),%ecx 123c: 66 c1 e8 07 shr $0x7,%ax 1240: 88 43 07 mov %al,0x7(%rbx) 1243: 09 ca or %ecx,%edx } 1245: 48 8b 44 24 10 mov 0x10(%rsp),%rax 124a: 65 48 33 04 25 28 00 xor %gs:0x28,%rax 1251: 00 00 entry->ir_index_0_14 = msg.arch_addr_lo.dmar_index_0_14; 1253: 88 53 06 mov %dl,0x6(%rbx) } 1256: 75 06 jne 125e <ioapic_setup_msg_from_msi+0x7e> 1258: 48 83 c4 18 add $0x18,%rsp 125c: 5b pop %rbx 125d: c3 retq 125e: e8 00 00 00 00 callq 1263 <ioapic_setup_msg_from_msi+0x83> 1263: 0f 1f 00 nopl (%rax) 1266: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 126d: 00 00 00 Still, this isn't really a fast path so I'm content to file the GCC PR for the really *obvious* misses and let it take its course.
From: David Woodhouse > Sent: 25 October 2020 10:26 > To: David Laight <David.Laight@ACULAB.COM>; x86@kernel.org > > On Sun, 2020-10-25 at 09:49 +0000, David Laight wrote: > > Just looking at a random one of these patches... > > > > Does the compiler manage to optimise that reasonably? > > Or does it generate a lot of shifts and masks as each > > bitfield is set? > > > > The code generation for bitfields is often a lot worse > > that that for |= setting bits in a word. > > Indeed, it appears to be utterly appalling. That was one of my > motivations for doing it with masks and shifts in the first place. I thought it would be. I'm not even sure using bitfields to map hardware registers makes the code any more readable (or fool proof). I suspect using #define constants and explicit |= and &= ~ is actually best - provided the names are descriptive enough. If you set all the fields the compiler will merge the values and do a single write - provided nothing you read might alias the target. The only way to get strongly typed integers is to cast them to a structure pointer type. Together with a static inline function to remove the casts and | the values together it might make things fool proof. But I've never tried it. ISTR once doing something like that with error codes, but it didn't ever make source control. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index c552cd2d0632..3d41a09c2c14 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi, #if defined(CONFIG_PCI_MSI) #include <linux/msi.h> -#include <asm/msidef.h> struct xen_pci_frontend_ops *xen_pci_frontend; EXPORT_SYMBOL_GPL(xen_pci_frontend); @@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) return ret; } -#define XEN_PIRQ_MSI_DATA (MSI_DATA_TRIGGER_EDGE | \ - MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0)) - static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq, struct msi_msg *msg) { - /* We set vector == 0 to tell the hypervisor we don't care about it, - * but we want a pirq setup instead. - * We use the dest_id field to pass the pirq that we want. */ - msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq); - msg->address_lo = - MSI_ADDR_BASE_LO | - MSI_ADDR_DEST_MODE_PHYSICAL | - MSI_ADDR_REDIRECTION_CPU | - MSI_ADDR_DEST_ID(pirq); - - msg->data = XEN_PIRQ_MSI_DATA; + /* + * We set vector == 0 to tell the hypervisor we don't care about + * it, but we want a pirq setup instead. We use the dest_id fields + * to pass the pirq that we want. + */ + memset(msg, 0, sizeof(*msg)); + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; + msg->arch_addr_hi.destid_8_31 = pirq >> 8; + msg->arch_addr_lo.destid_0_7 = pirq & 0xFF; + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; + msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT; } static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)