Message ID | 20201009104616.1314746-9-dwmw2@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix x2apic enablement and allow up to 32768 CPUs without IR where supported | expand |
On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote: @@ -45,12 +45,11 @@ enum irq_alloc_type { }; > +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry) > +{ > + struct msi_msg msg; > + u32 *entry = _entry; Why is this a void * argument and then converting it to a u32 *? Just to make that function completely unreadable? > + > + irq_chip_compose_msi_msg(irq_data, &msg); Lacks a comment. Also mp_swizzle... is a misnomer as this invokes the msi compose function which is not what the function name suggests. > + /* > + * They're in a bit of a random order for historical reasons, but > + * the IO/APIC is just a device for turning interrupt lines into > + * MSIs, and various bits of the MSI addr/data are just swizzled > + * into/from the bits of Redirection Table Entry. > + */ > + entry[0] &= 0xfffff000; > + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK | > + MSI_DATA_VECTOR_MASK)); > + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9; > + > + entry[1] &= 0xffff; > + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12; Sorry, but this is unreviewable gunk. The whole msi_msg setup sucks with this unholy macro maze. I have a half finished series which allows architectures to provide shadow members for data, address_* so this can be done proper with bitfields. Aside of that it works magically because polarity,trigger and mask bit have been set up before. But of course a comment about this is completely overrated. Thanks, tglx
On Thu, Oct 22 2020 at 23:43, Thomas Gleixner wrote: > On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote: > Aside of that it works magically because polarity,trigger and mask bit > have been set up before. But of course a comment about this is > completely overrated. Also this part: > -static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, > - struct IO_APIC_route_entry *entry) > +static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data) > { > + struct IO_APIC_route_entry *entry = &data->entry; > + > memset(entry, 0, sizeof(*entry)); > - entry->delivery_mode = apic->irq_delivery_mode; > - entry->dest_mode = apic->irq_dest_mode; > - entry->dest = cfg->dest_apicid & 0xff; > - entry->virt_ext_dest = cfg->dest_apicid >> 8; > - entry->vector = cfg->vector; > + > + mp_swizzle_msi_dest_bits(irq_data, entry); > + > entry->trigger = data->trigger; > entry->polarity = data->polarity; > /* does not make sense. It did not make sense before either, but now it does even make less sense. During allocation this only needs to setup the I/O-APIC specific bits (trigger, polarity, mask). The rest is filled in when the actual activation happens. Nothing writes that entry _before_ activation. /me goes to mop up more
On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@linutronix.de> wrote: >On Fri, Oct 09 2020 at 11:46, David Woodhouse wrote: > >@@ -45,12 +45,11 @@ enum irq_alloc_type { > }; > >> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void >*_entry) >> +{ >> + struct msi_msg msg; >> + u32 *entry = _entry; > >Why is this a void * argument and then converting it to a u32 *? Just >to >make that function completely unreadable? It makes the callers slightly more readable, not having to cast to uint32_t* from the struct. I did ponder defining a new struct with bitfields named along the lines of 'msi_addr_bits_19_to_4', but that seemed like overkill. >> + >> + irq_chip_compose_msi_msg(irq_data, &msg); > >Lacks a comment. Also mp_swizzle... is a misnomer as this invokes the >msi compose function which is not what the function name suggests. It's got a four-line comment right there after it as we use the bits we got from it. >> + /* >> + * They're in a bit of a random order for historical reasons, but >> + * the IO/APIC is just a device for turning interrupt lines into >> + * MSIs, and various bits of the MSI addr/data are just swizzled >> + * into/from the bits of Redirection Table Entry. >> + */ >> + entry[0] &= 0xfffff000; >> + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK | >> + MSI_DATA_VECTOR_MASK)); >> + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9; >> + >> + entry[1] &= 0xffff; >> + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12; > >Sorry, but this is unreviewable gunk. Crap. Sure, want to look at the I/OAPIC and MSI documentation and observe that it's just shifting bits around and "these bits go there, those bits go here..." but there's no magic which will make that go away. At some point you end up swizzling bits around with seemingly random bitmasks and shifts which you have to have worked out from the docs. Now that I killed off the various IOMMU bits which also horrifically touch the RTE directly, perhaps there is more scope for faffing around with it differently, but it won't really change much. It's just urinating into the atmospheric disturbance. >Aside of that it works magically because polarity,trigger and mask bit >have been set up before. But of course a comment about this is >completely overrated. Well yes, there's a lot about the I/OAPIC code which is a bit horrid but this patch is doing just one thing: making the bits get from e.g. cfg->dest_apicid and cfg->vector into the RTE in a generic fashion which does the right thing for IR too. Other cleanups are somewhat orthogonal, but yes it's a good spot that one of these was somewhat redundant in the first place. We could fix that up in a separate patch which comes first perhaps. If we're starting to clean up I/OAPIC code I'd like to take a long hard look at the level ack code paths, and the fact that we have separate ack_apic_irq and apic_ack_irq functions (or something like that; on phone now) which do different things. Perhaps the order (or the capitals on APIC which one of them has) makes it sane and meaningful for them both to exist and do different things? I also note the Hyper-V "remapping" takes the IR code path and I'm not sure that's OK. Because of the complete lack of overall documentation on what it's all doing and why.
On Fri, 2020-10-23 at 00:10 +0200, Thomas Gleixner wrote: > > -static void mp_setup_entry(struct irq_cfg *cfg, struct > > mp_chip_data *data, > > - struct IO_APIC_route_entry *entry) > > +static void mp_setup_entry(struct irq_data *irq_data, struct > > mp_chip_data *data) > > { > > + struct IO_APIC_route_entry *entry = &data->entry; > > + > > memset(entry, 0, sizeof(*entry)); > > - entry->delivery_mode = apic->irq_delivery_mode; > > - entry->dest_mode = apic->irq_dest_mode; > > - entry->dest = cfg->dest_apicid & 0xff; > > - entry->virt_ext_dest = cfg->dest_apicid >> 8; > > - entry->vector = cfg->vector; > > + > > + mp_swizzle_msi_dest_bits(irq_data, entry); > > + > > entry->trigger = data->trigger; > > entry->polarity = data->polarity; > > /* > > does not make sense. It did not make sense before either, but now it > does even make less sense. > > During allocation this only needs to setup the I/O-APIC specific bits > (trigger, polarity, mask). The rest is filled in when the actual > activation happens. Nothing writes that entry _before_ activation. > > /me goes to mop up more Yeah... that code was indeed a pile of crap before I looked at it, wasn't it? And I indeed failed to spot it and mop it up as I touched it. Here's the version I've just pushed to my tree, which I'll test properly both with and without IR over the weekend before posting v3. There is no way that bit swizzling is every going to be anything short of fugly. That's just the reality of the hardware. Even doing it with bitfields is just going to be masking the issue by having structure definitions that don't actually match the I/OAPIC documentation. Better to be up front about it. I've added more words though; more words always help... https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id From f912b52996d381fd8a631dd10c713772c2ade478 Mon Sep 17 00:00:00 2001 From: David Woodhouse <dwmw@amazon.co.uk> Date: Thu, 8 Oct 2020 15:44:42 +0100 Subject: [PATCH 10/19] x86/ioapic: Generate RTE directly from parent irqchip's MSI message The I/OAPIC generates an MSI cycle with address/data bits taken from its Redirection Table Entry in some combination which used to make sense, but now is just a bunch of bits which get passed through in some seemingly arbitrary order. Instead of making IRQ remapping drivers directly frob the I/OAPIC RTE, let them just do their job and generate an MSI message. The bit swizzling to turn that MSI message into the IOAPIC's RTE is the same in all cases, since it's a function of the I/OAPIC hardware. The IRQ remappers have no real need to get involved with that. The only slight caveat is that the I/OAPIC is interpreting some of those fields too, and it does want the 'vector' field to be unique to make EOI work. The AMD IOMMU happens to put its IRTE index in the bits that the I/OAPIC thinks are the vector field, and accommodates this requirement by reserving the first 32 indices for the I/OAPIC. The Intel IOMMU doesn't actually use the bits that the I/OAPIC thinks are the vector field, so it fills in the 'pin' value there instead. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> --- arch/x86/include/asm/hw_irq.h | 11 ++-- arch/x86/include/asm/msidef.h | 2 + arch/x86/kernel/apic/io_apic.c | 81 ++++++++++++++++++++++------- drivers/iommu/amd/iommu.c | 14 ----- drivers/iommu/hyperv-iommu.c | 31 ----------- drivers/iommu/intel/irq_remapping.c | 19 ++----- 6 files changed, 74 insertions(+), 84 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index a4aeeaace040..aabd8f1b6bb0 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -45,12 +45,11 @@ enum irq_alloc_type { }; struct ioapic_alloc_info { - int pin; - int node; - u32 trigger : 1; - u32 polarity : 1; - u32 valid : 1; - struct IO_APIC_route_entry *entry; + int pin; + int node; + u32 trigger : 1; + u32 polarity : 1; + u32 valid : 1; }; struct uv_alloc_info { diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h index ee2f8ccc32d0..37c3d2d492c9 100644 --- a/arch/x86/include/asm/msidef.h +++ b/arch/x86/include/asm/msidef.h @@ -18,6 +18,7 @@ #define MSI_DATA_DELIVERY_MODE_SHIFT 8 #define MSI_DATA_DELIVERY_FIXED (0 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_DATA_DELIVERY_LOWPRI (1 << MSI_DATA_DELIVERY_MODE_SHIFT) +#define MSI_DATA_DELIVERY_MODE_MASK (3 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_DATA_LEVEL_SHIFT 14 #define MSI_DATA_LEVEL_DEASSERT (0 << MSI_DATA_LEVEL_SHIFT) @@ -37,6 +38,7 @@ #define MSI_ADDR_DEST_MODE_SHIFT 2 #define MSI_ADDR_DEST_MODE_PHYSICAL (0 << MSI_ADDR_DEST_MODE_SHIFT) #define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT) +#define MSI_ADDR_DEST_MODE_MASK (1 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_ADDR_REDIRECTION_SHIFT 3 #define MSI_ADDR_REDIRECTION_CPU (0 << MSI_ADDR_REDIRECTION_SHIFT) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 54f6a029b1d1..b9e6236af833 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -48,6 +48,7 @@ #include <linux/jiffies.h> /* time_after() */ #include <linux/slab.h> #include <linux/memblock.h> +#include <linux/msi.h> #include <asm/irqdomain.h> #include <asm/io.h> @@ -63,6 +64,7 @@ #include <asm/setup.h> #include <asm/irq_remapping.h> #include <asm/hw_irq.h> +#include <asm/msidef.h> #include <asm/apic.h> @@ -1851,22 +1853,64 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data) eoi_ioapic_pin(data->entry.vector, data); } +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, + struct IO_APIC_route_entry *rte) +{ + struct msi_msg msg; + u32 *entry = (u32 *)rte; + + /* + * They're in a bit of a random order for historical reasons, but + * the I/OAPIC is just a device for turning interrupt lines into + * MSIs, and various bits of the MSI addr/data are just swizzled + * into/from the bits of Redirection Table Entry. So let the + * upstream irqdomain (be it interrupt remapping or otherwise) + * compose the MSI message, and we'll shift the bits into the + * appropriate place in the RTE. + */ + irq_chip_compose_msi_msg(irq_data, &msg); + + /* + * The low 12 bits of the RTE were historically the vector, + * delivery_mode and destination mode. Which come from the + * low 8 bits of the MSI data, the *next* 3 bits of the MSI + * data, and bit 2 of the MSI address (which thus has to be + * shifted up by 9 to land in the right place in bit 11 of + * the RTE). + * + * With Interrupt Remapping of course many bits in the MSI + * have different meanings but the bit-swizzling of the + * I/OAPIC hardware remains the same. + */ + entry[0] &= 0xfffff000; + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK | + MSI_DATA_VECTOR_MASK)); + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9; + + /* + * Top 16 bits of the RTE are the destination and extended + * destination ID fields, which come from bits 19-4 of the + * MSI address. + */ + entry[1] &= 0xffff; + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12; +} + + static void ioapic_configure_entry(struct irq_data *irqd) { struct mp_chip_data *mpd = irqd->chip_data; - struct irq_cfg *cfg = irqd_cfg(irqd); struct irq_pin_list *entry; /* - * Only update when the parent is the vector domain, don't touch it - * if the parent is the remapping domain. Check the installed - * ioapic chip to verify that. + * The polarity, trigger and mask bits have already been + * set up at allocation time by mp_setup_entry(). What + * remains on activation and set_affinity is to set up + * the various destination bits which are obtained from + * the upstream irq domain's generated MSI message. */ - if (irqd->chip == &ioapic_chip) { - mpd->entry.dest = cfg->dest_apicid & 0xff; - mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8; - mpd->entry.vector = cfg->vector; - } + mp_swizzle_msi_dest_bits(irqd, &mpd->entry); + for_each_irq_pin(entry, mpd->irq_2_pin) __ioapic_write_entry(entry->apic, entry->pin, mpd->entry); } @@ -2949,15 +2993,16 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data, } } -static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, - struct IO_APIC_route_entry *entry) +static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data) { + struct IO_APIC_route_entry *entry = &data->entry; + + /* + * The destination bits get set up by ioapic_configure_entry() + * when the IRQ is activated. For now just set up the I/OAPIC + * specific fields. + */ memset(entry, 0, sizeof(*entry)); - entry->delivery_mode = apic->irq_delivery_mode; - entry->dest_mode = apic->irq_dest_mode; - entry->dest = cfg->dest_apicid & 0xff; - entry->virt_ext_dest = cfg->dest_apicid >> 8; - entry->vector = cfg->vector; entry->trigger = data->trigger; entry->polarity = data->polarity; /* @@ -2995,7 +3040,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, if (!data) return -ENOMEM; - info->ioapic.entry = &data->entry; ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info); if (ret < 0) { kfree(data); @@ -3013,8 +3057,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin); local_irq_save(flags); - if (info->ioapic.entry) - mp_setup_entry(cfg, data, info->ioapic.entry); + mp_setup_entry(irq_data, data); mp_register_handler(virq, data->trigger); if (virq < nr_legacy_irqs()) legacy_pic->mask(virq); diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index ef64e01f66d7..13d0a8f42d56 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, { struct irq_2_irte *irte_info = &data->irq_2_irte; struct msi_msg *msg = &data->msi_entry; - struct IO_APIC_route_entry *entry; struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; if (!iommu) @@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC: - /* Setup IOAPIC entry */ - entry = info->ioapic.entry; - info->ioapic.entry = NULL; - memset(entry, 0, sizeof(*entry)); - entry->vector = index; - entry->mask = 0; - entry->trigger = info->ioapic.trigger; - entry->polarity = info->ioapic.polarity; - /* Mask level triggered irqs. */ - if (info->ioapic.trigger) - entry->mask = 1; - break; - case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index 12ec31534995..3a674262cc91 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data, { struct irq_data *parent = data->parent_data; struct irq_cfg *cfg = irqd_cfg(data); - struct IO_APIC_route_entry *entry; int ret; /* Return error If new irq affinity is out of ioapic_max_cpumask. */ @@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data, if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE) return ret; - entry = data->chip_data; - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; send_cleanup_vector(cfg); return 0; @@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain, irq_data->chip = &hyperv_ir_chip; - /* - * If there is interrupt remapping function of IOMMU, setting irq - * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't - * support interrupt remapping function, setting irq affinity of IO-APIC - * interrupts still needs to change IO-APIC registers. But ioapic_ - * configure_entry() will ignore value of cfg->vector and cfg-> - * dest_apicid when IO-APIC's parent irq domain is not the vector - * domain.(See ioapic_configure_entry()) In order to setting vector - * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved - * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_ - * affinity() set vector and dest_apicid directly into IO-APIC entry. - */ - irq_data->chip_data = info->ioapic.entry; - /* * Hypver-V IO APIC irq affinity should be in the scope of * ioapic_max_cpumask because no irq remapping support. @@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain, irq_domain_free_irqs_common(domain, virq, nr_irqs); } -static int hyperv_irq_remapping_activate(struct irq_domain *domain, - struct irq_data *irq_data, bool reserve) -{ - struct irq_cfg *cfg = irqd_cfg(irq_data); - struct IO_APIC_route_entry *entry = irq_data->chip_data; - - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; - - return 0; -} - static const struct irq_domain_ops hyperv_ir_domain_ops = { .alloc = hyperv_irq_remapping_alloc, .free = hyperv_irq_remapping_free, - .activate = hyperv_irq_remapping_activate, }; static int __init hyperv_prepare_irq_remapping(void) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 0cfce1d3b7bb..511dfb4884bc 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, struct irq_alloc_info *info, int index, int sub_handle) { - struct IR_IO_APIC_route_entry *entry; struct irte *irte = &data->irte_entry; struct msi_msg *msg = &data->msi_entry; @@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, irte->avail, irte->vector, irte->dest_id, irte->sid, irte->sq, irte->svt); - entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry; - info->ioapic.entry = NULL; - memset(entry, 0, sizeof(*entry)); - entry->index2 = (index >> 15) & 0x1; - entry->zero = 0; - entry->format = 1; - entry->index = (index & 0x7fff); /* * IO-APIC RTE will be configured with virtual vector. * irq handler will do the explicit EOI to the io-apic. */ - entry->vector = info->ioapic.pin; - entry->mask = 0; /* enable IRQ */ - entry->trigger = info->ioapic.trigger; - entry->polarity = info->ioapic.polarity; - if (info->ioapic.trigger) - entry->mask = 1; /* Mask level triggered irqs. */ + msg->data = info->ioapic.pin; + msg->address_hi = MSI_ADDR_BASE_HI; + msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT | + MSI_ADDR_IR_INDEX1(index) | + MSI_ADDR_IR_INDEX2(index); break; case X86_IRQ_ALLOC_TYPE_HPET:
On Fri, Oct 23 2020 at 11:10, David Woodhouse wrote: > On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@linutronix.de> wrote: > It makes the callers slightly more readable, not having to cast to uint32_t* from the struct. > > I did ponder defining a new struct with bitfields named along the > lines of 'msi_addr_bits_19_to_4', but that seemed like overkill. I did something like this in the meantime, because all of this just sucks. git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic Hot of the press and completely untested. >>> + /* >>> + * They're in a bit of a random order for historical reasons, but >>> + * the IO/APIC is just a device for turning interrupt lines into >>> + * MSIs, and various bits of the MSI addr/data are just swizzled >>> + * into/from the bits of Redirection Table Entry. >>> + */ >>> + entry[0] &= 0xfffff000; >>> + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK | >>> + MSI_DATA_VECTOR_MASK)); >>> + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9; >>> + >>> + entry[1] &= 0xffff; >>> + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12; >> >>Sorry, but this is unreviewable gunk. > > Crap. Sure, want to look at the I/OAPIC and MSI documentation and > observe that it's just shifting bits around and "these bits go there, > those bits go here..." but there's no magic which will make that go > away. At some point you end up swizzling bits around with seemingly > random bitmasks and shifts which you have to have worked out from the > docs. Yes, we can't avoid the bit swizzling at all. But it can be made more readable. > Now that I killed off the various IOMMU bits which also horrifically > touch the RTE directly, perhaps there is more scope for faffing around > with it differently, but it won't really change much. It's just > urinating into the atmospheric disturbance. Well, you can look at it this way, but I surely have better things to do than taking pencil and paper and drawing up mappings when reading through that code or a patch against it. >> Aside of that it works magically because polarity,trigger and mask bit >> have been set up before. But of course a comment about this is >> completely overrated. > > Well yes, there's a lot about the I/OAPIC code which is a bit horrid > but this patch is doing just one thing: making the bits get from > e.g. cfg->dest_apicid and cfg->vector into the RTE in a generic > fashion which does the right thing for IR too. Other cleanups are > somewhat orthogonal, but yes it's a good spot that one of these was > somewhat redundant in the first place. We could fix that up in a > separate patch which comes first perhaps. Yes, that code is horrid, but adding a comment to that effect when changing it is not asked too much, right? > If we're starting to clean up I/OAPIC code I'd like to take a long > hard look at the level ack code paths, and the fact that we have > separate ack_apic_irq and apic_ack_irq functions (or something like > that; on phone now) which do different things. Perhaps the order (or > the capitals on APIC which one of them has) makes it sane and > meaningful for them both to exist and do different things? The naming conventions are definitely not sane. > I also note the Hyper-V "remapping" takes the IR code path and I'm not > sure that's OK. Because of the complete lack of overall documentation > on what it's all doing and why. I stared into that today as well. There is a fundamental difference between real hardware remapping and this. Especially the affinity setting mode changes with remapping and hyper-v goes into the remap path, but that hyperv driver does not issue "irq_set_status_flags(virq, IRQ_MOVE_PCNTXT)", which means that you can't change the affinity of the I/O-APIC interrupts in that hyperv mode at all. If that flag is not set then affinity changes are done in actual interrupt context in ioapic_ack_level() which is only used for the non-IR chip ... I'm still wrapping my head around getting rid of this thing completely because now it's just a subset of your KVM case with the only restriction that I/O-APIC cannot be affined to any CPU with a APIC id greater than 255. Thanks, tglx
On Fri, 2020-10-23 at 23:28 +0200, Thomas Gleixner wrote: > On Fri, Oct 23 2020 at 11:10, David Woodhouse wrote: > > On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@linutronix.de> wrote: > > It makes the callers slightly more readable, not having to cast to uint32_t* from the struct. > > > > I did ponder defining a new struct with bitfields named along the > > lines of 'msi_addr_bits_19_to_4', but that seemed like overkill. > > I did something like this in the meantime, because all of this just > sucks. > > git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic > > Hot of the press and completely untested. Hm, your struct IO_APIC_route_entry isn't actually a union; you've defined a 128-bit structure with the IR fields *following* the non-IR fields. But there *is* a union in io_apic.c, of that 128-bit structure and two uint32_ts. Suspect that wasn't quite what you intended. I'll prod at it this morning and turn it into a single union of the three, and give it some testing. Also, my "Move MSI support into hpet.c" patch¹ got updated to s/CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ/ at about line 53 for the MSI- related variable declarations, which was going to be in the next version I posted. I was also hoping Paolo was going to take the patch which just defines the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a second patch³ that *just* wires it up to x86_init.msi_ext_dest_id() for KVM. ¹ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4 ² https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749 ³ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5 > Yes, we can't avoid the bit swizzling at all. But it can be made more > readable. Hm, I was about to concede that your version is a bit more readable. But then I got to your new __ipi_msi_compose_msg() and realised that it isn't working because it's setting the 0xFEE base address in the _low_ bits, somehow... msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; printk("1 Compose MSI message %x/%x\n", msg->address_lo, msg->data); msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical; printk("2 Compose MSI message %x/%x\n", msg->address_lo, msg->data); msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF; printk("3 Compose MSI message %x/%x\n", msg->address_lo, msg->data); [ 1.793874] 1 Compose MSI message fee/0 [ 1.794310] 2 Compose MSI message fee/0 [ 1.794768] 3 Compose MSI message f02/0 And now I wish it was just a simple shift instead of an unholy maze of overlapping unions of bitfields. But I'll make more coffee and stare at it harder... > Yes, that code is horrid, but adding a comment to that effect when > changing it is not asked too much, right? Sure. I just actually hadn't noticed that setting the dest/vector bits right there was entirely redundant in the first place. > I'm still wrapping my head around getting rid of this thing completely > because now it's just a subset of your KVM case with the only > restriction that I/O-APIC cannot be affined to any CPU with a APIC id > greater than 255. It was only ever that restriction anyway, wasn't it? Hyper-V PCI has its own MSI handling, and there's no HPET so it was only ever the I/OAPIC which was problematic there. There are Hyper-V VM sizes with 416 vCPUs which depend on this today, and which don't have the 15-bit MSI extension. Removing hyperv-iommu would prevent us from using all the vCPUs on those. You *could* make hyperv-iommu decline to initialise if x86_init.msi_ext_dest_id() returns true though⁴. ⁴ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/633ccf0d42
On Sat, 2020-10-24 at 09:26 +0100, David Woodhouse wrote: > > And now I wish it was just a simple shift instead of an unholy maze of > overlapping unions of bitfields. But I'll make more coffee and stare at > it harder... Hah, it really *was* unions of the bitfields. This boots... diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index c65e89bedc93..3e14adba34ba 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -65,6 +65,8 @@ union IO_APIC_reg_03 { }; struct IO_APIC_route_entry { + union { + struct { u64 vector : 8, delivery_mode : 3, dest_mode_logical : 1, @@ -77,7 +79,8 @@ struct IO_APIC_route_entry { reserved_1 : 17, virt_destid_8_14 : 7, destid_0_7 : 8; - + }; + struct { u64 ir_shared_0 : 8, ir_zero : 3, ir_index_15 : 1, @@ -85,6 +88,8 @@ struct IO_APIC_route_entry { ir_reserved_0 : 31, ir_format : 1, ir_index_0_14 : 15; + }; + }; } __attribute__ ((packed)); struct irq_alloc_info; diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h index 8cf82d134b78..ef9dfaa4603f 100644 --- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -25,6 +25,7 @@ typedef struct x86_msi_data { typedef struct x86_msi_addr_lo { union { + struct { u32 reserved_0 : 2, dest_mode_logical : 1, redirect_hint : 1, @@ -32,13 +33,15 @@ typedef struct x86_msi_addr_lo { virt_destid_8_14 : 7, destid_0_7 : 8, base_address : 12; - + }; + struct { u32 dmar_reserved_0 : 2, dmar_index_15 : 1, dmar_subhandle_valid : 1, dmar_format : 1, dmar_index_0_14 : 15, dmar_base_address : 12; + }; }; } __attribute__ ((packed)) arch_msi_msg_addr_lo_t; #define arch_msi_msg_addr_lo x86_msi_addr_lo
On 24/10/20 10:26, David Woodhouse wrote: > I was also hoping Paolo was going to take the patch which just defines > the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a > second patch³ that *just* wires it up to x86_init.msi_ext_dest_id() for > KVM. > > ¹ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4 > ² https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749 > ³ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5 Yes, I am going to take it. I was already sort of playing with fire with the 5.10 pull request (and with me being lousy in general during the 5.10 development period, to be honest), so I left it for rc2 or rc3. It's just docs and it happened to conflict with another documentation patch that had gone in through Jon Corbet's tree. Paolo
On 24 October 2020 10:13:36 BST, Paolo Bonzini <pbonzini@redhat.com> wrote: >On 24/10/20 10:26, David Woodhouse wrote: >> I was also hoping Paolo was going to take the patch which just >defines >> the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a >> second patch³ that *just* wires it up to x86_init.msi_ext_dest_id() >for >> KVM. >> >> ¹ >https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4 >> ² >https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749 >> ³ >https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5 > >Yes, I am going to take it. > >I was already sort of playing with fire with the 5.10 pull request (and >with me being lousy in general during the 5.10 development period, to >be >honest), so I left it for rc2 or rc3. It's just docs and it happened >to >conflict with another documentation patch that had gone in through Jon >Corbet's tree. OK, thanks. I'll rework Thomas's tree with that first and the other changes I'd mentioned in my parts, as well as fixing up that unholy chimæra of struct/union in which we set some bitfields from each side of the union, test and push it out later today.
On Sat, 2020-10-24 at 11:13 +0100, David Woodhouse wrote: > OK, thanks. I'll rework Thomas's tree with that first and the other > changes I'd mentioned in my parts, as well as fixing up that unholy > chimæra of struct/union in which we set some bitfields from each side > of the union, test and push it out later today. OK, pushed out to https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/x86/apic] It's Thomas's tree plus the struct/union fixes and other things I mentioned earlier, a few comment fixes in the 'Generate RTE directly from parent irqchip' patch, but the most interesting part is finishing the job of the 'Cleanup IO/APIC route entry structs' patch... From 54b623fc2b03eadb76485b4ca0ade3e79acf6c27 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner <tglx@linutronix.de> Date: Thu, 22 Oct 2020 14:48:18 +0200 Subject: [PATCH 124/137] x86/ioapic: Cleanup IO/APIC route entry structs Having two seperate structs for the I/O-APIC RTE entries (non-remapped and DMAR remapped) requires type casts and makes it hard to map. Combine them in IO_APIC_routing_entry by defining a union of two 64bit bitfields. Use naming which reflects which bits are shared and which bits are actually different for the operating modes. [dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for register access into the same union and eliminating a few more places where bits were accessed through masks and shifts.] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> --- arch/x86/include/asm/io_apic.h | 78 ++++++--------- arch/x86/kernel/apic/io_apic.c | 144 +++++++++++++--------------- drivers/iommu/amd/iommu.c | 8 +- drivers/iommu/hyperv-iommu.c | 4 +- drivers/iommu/intel/irq_remapping.c | 19 ++-- 5 files changed, 108 insertions(+), 145 deletions(-) diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index a1a26f6d3aa4..73da644b2f0d 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -13,15 +13,6 @@ * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar */ -/* I/O Unit Redirection Table */ -#define IO_APIC_REDIR_VECTOR_MASK 0x000FF -#define IO_APIC_REDIR_DEST_LOGICAL 0x00800 -#define IO_APIC_REDIR_DEST_PHYSICAL 0x00000 -#define IO_APIC_REDIR_SEND_PENDING (1 << 12) -#define IO_APIC_REDIR_REMOTE_IRR (1 << 14) -#define IO_APIC_REDIR_LEVEL_TRIGGER (1 << 15) -#define IO_APIC_REDIR_MASKED (1 << 16) - /* * The structure of the IO-APIC: */ @@ -65,52 +56,39 @@ union IO_APIC_reg_03 { }; struct IO_APIC_route_entry { - __u32 vector : 8, - delivery_mode : 3, /* 000: FIXED - * 001: lowest prio - * 111: ExtINT - */ - dest_mode : 1, /* 0: physical, 1: logical */ - delivery_status : 1, - polarity : 1, - irr : 1, - trigger : 1, /* 0: edge, 1: level */ - mask : 1, /* 0: enabled, 1: disabled */ - __reserved_2 : 15; - - __u32 __reserved_3 : 24, - dest : 8; -} __attribute__ ((packed)); - -struct IR_IO_APIC_route_entry { - __u64 vector : 8, - zero : 3, - index2 : 1, - delivery_status : 1, - polarity : 1, - irr : 1, - trigger : 1, - mask : 1, - reserved : 31, - format : 1, - index : 15; + union { + struct { + u64 vector : 8, + delivery_mode : 3, + dest_mode_logical : 1, + delivery_status : 1, + active_low : 1, + irr : 1, + is_level : 1, + masked : 1, + reserved_0 : 15, + reserved_1 : 24, + destid_0_7 : 8; + }; + struct { + u64 ir_shared_0 : 8, + ir_zero : 3, + ir_index_15 : 1, + ir_shared_1 : 5, + ir_reserved_0 : 31, + ir_format : 1, + ir_index_0_14 : 15; + }; + struct { + u64 w1 : 32, + w2 : 32; + }; + }; } __attribute__ ((packed)); struct irq_alloc_info; struct ioapic_domain_cfg; -#define IOAPIC_EDGE 0 -#define IOAPIC_LEVEL 1 - -#define IOAPIC_MASKED 1 -#define IOAPIC_UNMASKED 0 - -#define IOAPIC_POL_HIGH 0 -#define IOAPIC_POL_LOW 1 - -#define IOAPIC_DEST_MODE_PHYSICAL 0 -#define IOAPIC_DEST_MODE_LOGICAL 1 - #define IOAPIC_MAP_ALLOC 0x1 #define IOAPIC_MAP_CHECK 0x2 diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 24a7bba7cbf4..07e754131854 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -286,31 +286,26 @@ static void io_apic_write(unsigned int apic, unsigned int reg, writel(value, &io_apic->data); } -union entry_union { - struct { u32 w1, w2; }; - struct IO_APIC_route_entry entry; -}; - static struct IO_APIC_route_entry __ioapic_read_entry(int apic, int pin) { - union entry_union eu; + struct IO_APIC_route_entry entry; - eu.w1 = io_apic_read(apic, 0x10 + 2 * pin); - eu.w2 = io_apic_read(apic, 0x11 + 2 * pin); + entry.w1 = io_apic_read(apic, 0x10 + 2 * pin); + entry.w2 = io_apic_read(apic, 0x11 + 2 * pin); - return eu.entry; + return entry; } static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin) { - union entry_union eu; + struct IO_APIC_route_entry entry; unsigned long flags; raw_spin_lock_irqsave(&ioapic_lock, flags); - eu.entry = __ioapic_read_entry(apic, pin); + entry = __ioapic_read_entry(apic, pin); raw_spin_unlock_irqrestore(&ioapic_lock, flags); - return eu.entry; + return entry; } /* @@ -321,11 +316,8 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin) */ static void __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) { - union entry_union eu = {{0, 0}}; - - eu.entry = e; - io_apic_write(apic, 0x11 + 2*pin, eu.w2); - io_apic_write(apic, 0x10 + 2*pin, eu.w1); + io_apic_write(apic, 0x11 + 2*pin, e.w2); + io_apic_write(apic, 0x10 + 2*pin, e.w1); } static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) @@ -344,12 +336,12 @@ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) */ static void ioapic_mask_entry(int apic, int pin) { + struct IO_APIC_route_entry e = { .masked = true }; unsigned long flags; - union entry_union eu = { .entry.mask = IOAPIC_MASKED }; raw_spin_lock_irqsave(&ioapic_lock, flags); - io_apic_write(apic, 0x10 + 2*pin, eu.w1); - io_apic_write(apic, 0x11 + 2*pin, eu.w2); + io_apic_write(apic, 0x10 + 2*pin, e.w1); + io_apic_write(apic, 0x11 + 2*pin, e.w2); raw_spin_unlock_irqrestore(&ioapic_lock, flags); } @@ -422,20 +414,15 @@ static void __init replace_pin_at_irq_node(struct mp_chip_data *data, int node, add_pin_to_irq_node(data, node, newapic, newpin); } -static void io_apic_modify_irq(struct mp_chip_data *data, - int mask_and, int mask_or, +static void io_apic_modify_irq(struct mp_chip_data *data, bool masked, void (*final)(struct irq_pin_list *entry)) { - union entry_union eu; struct irq_pin_list *entry; - eu.entry = data->entry; - eu.w1 &= mask_and; - eu.w1 |= mask_or; - data->entry = eu.entry; + data->entry.masked = masked; for_each_irq_pin(entry, data->irq_2_pin) { - io_apic_write(entry->apic, 0x10 + 2 * entry->pin, eu.w1); + io_apic_write(entry->apic, 0x10 + 2 * entry->pin, data->entry.w1); if (final) final(entry); } @@ -459,13 +446,13 @@ static void mask_ioapic_irq(struct irq_data *irq_data) unsigned long flags; raw_spin_lock_irqsave(&ioapic_lock, flags); - io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync); + io_apic_modify_irq(data, true, &io_apic_sync); raw_spin_unlock_irqrestore(&ioapic_lock, flags); } static void __unmask_ioapic(struct mp_chip_data *data) { - io_apic_modify_irq(data, ~IO_APIC_REDIR_MASKED, 0, NULL); + io_apic_modify_irq(data, false, NULL); } static void unmask_ioapic_irq(struct irq_data *irq_data) @@ -506,8 +493,8 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector) /* * Mask the entry and change the trigger mode to edge. */ - entry1.mask = IOAPIC_MASKED; - entry1.trigger = IOAPIC_EDGE; + entry1.masked = true; + entry1.is_level = false; __ioapic_write_entry(apic, pin, entry1); @@ -542,8 +529,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) * Make sure the entry is masked and re-read the contents to check * if it is a level triggered pin and if the remote-IRR is set. */ - if (entry.mask == IOAPIC_UNMASKED) { - entry.mask = IOAPIC_MASKED; + if (!entry.masked) { + entry.masked = true; ioapic_write_entry(apic, pin, entry); entry = ioapic_read_entry(apic, pin); } @@ -556,8 +543,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) * doesn't clear the remote-IRR if the trigger mode is not * set to level. */ - if (entry.trigger == IOAPIC_EDGE) { - entry.trigger = IOAPIC_LEVEL; + if (!entry.is_level) { + entry.is_level = true; ioapic_write_entry(apic, pin, entry); } raw_spin_lock_irqsave(&ioapic_lock, flags); @@ -659,8 +646,8 @@ void mask_ioapic_entries(void) struct IO_APIC_route_entry entry; entry = ioapics[apic].saved_registers[pin]; - if (entry.mask == IOAPIC_UNMASKED) { - entry.mask = IOAPIC_MASKED; + if (!entry.masked) { + entry.masked = true; ioapic_write_entry(apic, pin, entry); } } @@ -947,8 +934,8 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info) if (irq < nr_legacy_irqs() && data->count == 1) { if (info->ioapic.is_level != data->is_level) mp_register_handler(irq, info->ioapic.is_level); - data->entry.trigger = data->is_level = info->ioapic.is_level; - data->entry.polarity = data->active_low = info->ioapic.active_low; + data->entry.is_level = data->is_level = info->ioapic.is_level; + data->entry.active_low = data->active_low = info->ioapic.active_low; } return data->is_level == info->ioapic.is_level && @@ -1231,10 +1218,9 @@ void ioapic_zap_locks(void) static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries) { - int i; - char buf[256]; struct IO_APIC_route_entry entry; - struct IR_IO_APIC_route_entry *ir_entry = (void *)&entry; + char buf[256]; + int i; printk(KERN_DEBUG "IOAPIC %d:\n", apic); for (i = 0; i <= nr_entries; i++) { @@ -1242,20 +1228,20 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries) snprintf(buf, sizeof(buf), " pin%02x, %s, %s, %s, V(%02X), IRR(%1d), S(%1d)", i, - entry.mask == IOAPIC_MASKED ? "disabled" : "enabled ", - entry.trigger == IOAPIC_LEVEL ? "level" : "edge ", - entry.polarity == IOAPIC_POL_LOW ? "low " : "high", + entry.masked ? "disabled" : "enabled ", + entry.is_level ? "level" : "edge ", + entry.active_low ? "low " : "high", entry.vector, entry.irr, entry.delivery_status); - if (ir_entry->format) + if (entry.ir_format) { printk(KERN_DEBUG "%s, remapped, I(%04X), Z(%X)\n", - buf, (ir_entry->index2 << 15) | ir_entry->index, - ir_entry->zero); - else - printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf, - entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ? - "logical " : "physical", - entry.dest, entry.delivery_mode); + (entry.ir_index_15 << 15) | entry.ir_index_0_14, + entry.ir_zero); + } else { + printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf, + entry.dest_mode_logical ? "logical " : "physical", + entry.destid_0_7, entry.delivery_mode); + } } } @@ -1380,8 +1366,8 @@ void __init enable_IO_APIC(void) /* If the interrupt line is enabled and in ExtInt mode * I have found the pin where the i8259 is connected. */ - if ((entry.mask == 0) && - (entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT)) { + if (!entry.masked && + entry.delivery_mode == APIC_DELIVERY_MODE_EXTINT) { ioapic_i8259.apic = apic; ioapic_i8259.pin = pin; goto found_i8259; @@ -1425,12 +1411,12 @@ void native_restore_boot_irq_mode(void) struct IO_APIC_route_entry entry; memset(&entry, 0, sizeof(entry)); - entry.mask = IOAPIC_UNMASKED; - entry.trigger = IOAPIC_EDGE; - entry.polarity = IOAPIC_POL_HIGH; - entry.dest_mode = IOAPIC_DEST_MODE_PHYSICAL; + entry.masked = false; + entry.is_level = false; + entry.active_low = false; + entry.dest_mode_logical = false; entry.delivery_mode = APIC_DELIVERY_MODE_EXTINT; - entry.dest = read_apic_id(); + entry.destid_0_7 = read_apic_id(); /* * Add it to the IO-APIC irq-routing table: @@ -1709,13 +1695,13 @@ static bool io_apic_level_ack_pending(struct mp_chip_data *data) raw_spin_lock_irqsave(&ioapic_lock, flags); for_each_irq_pin(entry, data->irq_2_pin) { - unsigned int reg; + struct IO_APIC_route_entry e; int pin; pin = entry->pin; - reg = io_apic_read(entry->apic, 0x10 + pin*2); + e.w1 = io_apic_read(entry->apic, 0x10 + pin*2); /* Is the remote IRR bit set? */ - if (reg & IO_APIC_REDIR_REMOTE_IRR) { + if (e.irr) { raw_spin_unlock_irqrestore(&ioapic_lock, flags); return true; } @@ -1874,7 +1860,7 @@ static void ioapic_configure_entry(struct irq_data *irqd) * ioapic chip to verify that. */ if (irqd->chip == &ioapic_chip) { - mpd->entry.dest = cfg->dest_apicid; + mpd->entry.destid_0_7 = cfg->dest_apicid; mpd->entry.vector = cfg->vector; } for_each_irq_pin(entry, mpd->irq_2_pin) @@ -1932,7 +1918,7 @@ static int ioapic_irq_get_chip_state(struct irq_data *irqd, * irrelevant because the IO-APIC treats them as fire and * forget. */ - if (rentry.irr && rentry.trigger) { + if (rentry.irr && rentry.is_level) { *state = true; break; } @@ -2057,12 +2043,12 @@ static inline void __init unlock_ExtINT_logic(void) memset(&entry1, 0, sizeof(entry1)); - entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL; - entry1.mask = IOAPIC_UNMASKED; - entry1.dest = hard_smp_processor_id(); - entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT; - entry1.polarity = entry0.polarity; - entry1.trigger = IOAPIC_EDGE; + entry1.dest_mode_logical = true; + entry1.masked = false; + entry1.destid_0_7 = hard_smp_processor_id(); + entry1.delivery_mode = APIC_DELIVERY_MODE_EXTINT; + entry1.active_low = entry0.active_low; + entry1.is_level = false; entry1.vector = 0; ioapic_write_entry(apic, pin, entry1); @@ -2937,17 +2923,17 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, struct IO_APIC_route_entry *entry) { memset(entry, 0, sizeof(*entry)); - entry->delivery_mode = apic->delivery_mode; - entry->dest_mode = apic->dest_mode_logical; - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; - entry->trigger = data->is_level; - entry->polarity = data->active_low; + entry->delivery_mode = apic->delivery_mode; + entry->dest_mode_logical = apic->dest_mode_logical; + entry->destid_0_7 = cfg->dest_apicid; + entry->vector = cfg->vector; + entry->is_level = data->is_level; + entry->active_low = data->active_low; /* * Mask level triggered irqs. Edge triggered irqs are masked * by the irq core code in case they fire. */ - entry->mask = data->is_level; + entry->masked = data->is_level; } int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index b0e5210e53b2..3d72ec7bbbf8 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3687,11 +3687,11 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, entry = info->ioapic.entry; info->ioapic.entry = NULL; memset(entry, 0, sizeof(*entry)); - entry->vector = index; - entry->trigger = info->ioapic.is_level; - entry->polarity = info->ioapic.active_low; + entry->vector = index; + entry->is_level = info->ioapic.is_level; + entry->active_low = info->ioapic.active_low; /* Mask level triggered irqs. */ - entry->mask = info->ioapic.is_level; + entry->masked = info->ioapic.is_level; break; case X86_IRQ_ALLOC_TYPE_HPET: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e09e2d734c57..1ab7eb918a5c 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -52,7 +52,7 @@ static int hyperv_ir_set_affinity(struct irq_data *data, return ret; entry = data->chip_data; - entry->dest = cfg->dest_apicid; + entry->destid_0_7 = cfg->dest_apicid; entry->vector = cfg->vector; send_cleanup_vector(cfg); @@ -125,7 +125,7 @@ static int hyperv_irq_remapping_activate(struct irq_domain *domain, struct irq_cfg *cfg = irqd_cfg(irq_data); struct IO_APIC_route_entry *entry = irq_data->chip_data; - entry->dest = cfg->dest_apicid; + entry->destid_0_7 = cfg->dest_apicid; entry->vector = cfg->vector; return 0; diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 54ca69333445..625bdb9f1627 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1279,8 +1279,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, struct irq_alloc_info *info, int index, int sub_handle) { - struct IR_IO_APIC_route_entry *entry; struct irte *irte = &data->irte_entry; + struct IO_APIC_route_entry *entry; prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid); switch (info->type) { @@ -1294,22 +1294,21 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, irte->avail, irte->vector, irte->dest_id, irte->sid, irte->sq, irte->svt); - entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry; + entry = info->ioapic.entry; info->ioapic.entry = NULL; memset(entry, 0, sizeof(*entry)); - entry->index2 = (index >> 15) & 0x1; - entry->zero = 0; - entry->format = 1; - entry->index = (index & 0x7fff); + entry->ir_index_15 = !!(index & 0x8000); + entry->ir_format = true; + entry->ir_index_0_14 = index & 0x7fff; /* * IO-APIC RTE will be configured with virtual vector. * irq handler will do the explicit EOI to the io-apic. */ - entry->vector = info->ioapic.pin; - entry->trigger = info->ioapic.is_level; - entry->polarity = info->ioapic.active_low; + entry->vector = info->ioapic.pin; + entry->is_level = info->ioapic.is_level; + entry->active_low = info->ioapic.active_low; /* Mask level triggered irqs. */ - entry->mask = info->ioapic.is_level; + entry->masked = info->ioapic.is_level; break; case X86_IRQ_ALLOC_TYPE_HPET:
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index a4aeeaace040..aabd8f1b6bb0 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -45,12 +45,11 @@ enum irq_alloc_type { }; struct ioapic_alloc_info { - int pin; - int node; - u32 trigger : 1; - u32 polarity : 1; - u32 valid : 1; - struct IO_APIC_route_entry *entry; + int pin; + int node; + u32 trigger : 1; + u32 polarity : 1; + u32 valid : 1; }; struct uv_alloc_info { diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h index ee2f8ccc32d0..37c3d2d492c9 100644 --- a/arch/x86/include/asm/msidef.h +++ b/arch/x86/include/asm/msidef.h @@ -18,6 +18,7 @@ #define MSI_DATA_DELIVERY_MODE_SHIFT 8 #define MSI_DATA_DELIVERY_FIXED (0 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_DATA_DELIVERY_LOWPRI (1 << MSI_DATA_DELIVERY_MODE_SHIFT) +#define MSI_DATA_DELIVERY_MODE_MASK (3 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_DATA_LEVEL_SHIFT 14 #define MSI_DATA_LEVEL_DEASSERT (0 << MSI_DATA_LEVEL_SHIFT) @@ -37,6 +38,7 @@ #define MSI_ADDR_DEST_MODE_SHIFT 2 #define MSI_ADDR_DEST_MODE_PHYSICAL (0 << MSI_ADDR_DEST_MODE_SHIFT) #define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT) +#define MSI_ADDR_DEST_MODE_MASK (1 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_ADDR_REDIRECTION_SHIFT 3 #define MSI_ADDR_REDIRECTION_CPU (0 << MSI_ADDR_REDIRECTION_SHIFT) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 54f6a029b1d1..ca2da19d5c55 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -48,6 +48,7 @@ #include <linux/jiffies.h> /* time_after() */ #include <linux/slab.h> #include <linux/memblock.h> +#include <linux/msi.h> #include <asm/irqdomain.h> #include <asm/io.h> @@ -63,6 +64,7 @@ #include <asm/setup.h> #include <asm/irq_remapping.h> #include <asm/hw_irq.h> +#include <asm/msidef.h> #include <asm/apic.h> @@ -1851,22 +1853,36 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data) eoi_ioapic_pin(data->entry.vector, data); } +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry) +{ + struct msi_msg msg; + u32 *entry = _entry; + + irq_chip_compose_msi_msg(irq_data, &msg); + + /* + * They're in a bit of a random order for historical reasons, but + * the IO/APIC is just a device for turning interrupt lines into + * MSIs, and various bits of the MSI addr/data are just swizzled + * into/from the bits of Redirection Table Entry. + */ + entry[0] &= 0xfffff000; + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK | + MSI_DATA_VECTOR_MASK)); + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9; + + entry[1] &= 0xffff; + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12; +} + + static void ioapic_configure_entry(struct irq_data *irqd) { struct mp_chip_data *mpd = irqd->chip_data; - struct irq_cfg *cfg = irqd_cfg(irqd); struct irq_pin_list *entry; - /* - * Only update when the parent is the vector domain, don't touch it - * if the parent is the remapping domain. Check the installed - * ioapic chip to verify that. - */ - if (irqd->chip == &ioapic_chip) { - mpd->entry.dest = cfg->dest_apicid & 0xff; - mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8; - mpd->entry.vector = cfg->vector; - } + mp_swizzle_msi_dest_bits(irqd, &mpd->entry); + for_each_irq_pin(entry, mpd->irq_2_pin) __ioapic_write_entry(entry->apic, entry->pin, mpd->entry); } @@ -2949,15 +2965,14 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data, } } -static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, - struct IO_APIC_route_entry *entry) +static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data) { + struct IO_APIC_route_entry *entry = &data->entry; + memset(entry, 0, sizeof(*entry)); - entry->delivery_mode = apic->irq_delivery_mode; - entry->dest_mode = apic->irq_dest_mode; - entry->dest = cfg->dest_apicid & 0xff; - entry->virt_ext_dest = cfg->dest_apicid >> 8; - entry->vector = cfg->vector; + + mp_swizzle_msi_dest_bits(irq_data, entry); + entry->trigger = data->trigger; entry->polarity = data->polarity; /* @@ -2995,7 +3010,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, if (!data) return -ENOMEM; - info->ioapic.entry = &data->entry; ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info); if (ret < 0) { kfree(data); @@ -3013,8 +3027,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin); local_irq_save(flags); - if (info->ioapic.entry) - mp_setup_entry(cfg, data, info->ioapic.entry); + mp_setup_entry(irq_data, data); mp_register_handler(virq, data->trigger); if (virq < nr_legacy_irqs()) legacy_pic->mask(virq); diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index ef64e01f66d7..13d0a8f42d56 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, { struct irq_2_irte *irte_info = &data->irq_2_irte; struct msi_msg *msg = &data->msi_entry; - struct IO_APIC_route_entry *entry; struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; if (!iommu) @@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC: - /* Setup IOAPIC entry */ - entry = info->ioapic.entry; - info->ioapic.entry = NULL; - memset(entry, 0, sizeof(*entry)); - entry->vector = index; - entry->mask = 0; - entry->trigger = info->ioapic.trigger; - entry->polarity = info->ioapic.polarity; - /* Mask level triggered irqs. */ - if (info->ioapic.trigger) - entry->mask = 1; - break; - case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e09e2d734c57..37dd485a5640 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data, { struct irq_data *parent = data->parent_data; struct irq_cfg *cfg = irqd_cfg(data); - struct IO_APIC_route_entry *entry; int ret; /* Return error If new irq affinity is out of ioapic_max_cpumask. */ @@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data, if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE) return ret; - entry = data->chip_data; - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; send_cleanup_vector(cfg); return 0; @@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain, irq_data->chip = &hyperv_ir_chip; - /* - * If there is interrupt remapping function of IOMMU, setting irq - * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't - * support interrupt remapping function, setting irq affinity of IO-APIC - * interrupts still needs to change IO-APIC registers. But ioapic_ - * configure_entry() will ignore value of cfg->vector and cfg-> - * dest_apicid when IO-APIC's parent irq domain is not the vector - * domain.(See ioapic_configure_entry()) In order to setting vector - * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved - * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_ - * affinity() set vector and dest_apicid directly into IO-APIC entry. - */ - irq_data->chip_data = info->ioapic.entry; - /* * Hypver-V IO APIC irq affinity should be in the scope of * ioapic_max_cpumask because no irq remapping support. @@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain, irq_domain_free_irqs_common(domain, virq, nr_irqs); } -static int hyperv_irq_remapping_activate(struct irq_domain *domain, - struct irq_data *irq_data, bool reserve) -{ - struct irq_cfg *cfg = irqd_cfg(irq_data); - struct IO_APIC_route_entry *entry = irq_data->chip_data; - - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; - - return 0; -} - static const struct irq_domain_ops hyperv_ir_domain_ops = { .alloc = hyperv_irq_remapping_alloc, .free = hyperv_irq_remapping_free, - .activate = hyperv_irq_remapping_activate, }; static int __init hyperv_prepare_irq_remapping(void) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 0cfce1d3b7bb..511dfb4884bc 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, struct irq_alloc_info *info, int index, int sub_handle) { - struct IR_IO_APIC_route_entry *entry; struct irte *irte = &data->irte_entry; struct msi_msg *msg = &data->msi_entry; @@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, irte->avail, irte->vector, irte->dest_id, irte->sid, irte->sq, irte->svt); - entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry; - info->ioapic.entry = NULL; - memset(entry, 0, sizeof(*entry)); - entry->index2 = (index >> 15) & 0x1; - entry->zero = 0; - entry->format = 1; - entry->index = (index & 0x7fff); /* * IO-APIC RTE will be configured with virtual vector. * irq handler will do the explicit EOI to the io-apic. */ - entry->vector = info->ioapic.pin; - entry->mask = 0; /* enable IRQ */ - entry->trigger = info->ioapic.trigger; - entry->polarity = info->ioapic.polarity; - if (info->ioapic.trigger) - entry->mask = 1; /* Mask level triggered irqs. */ + msg->data = info->ioapic.pin; + msg->address_hi = MSI_ADDR_BASE_HI; + msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT | + MSI_ADDR_IR_INDEX1(index) | + MSI_ADDR_IR_INDEX2(index); break; case X86_IRQ_ALLOC_TYPE_HPET: