Message ID | 20240701062042.4128863-1-tangnianyao@huawei.com |
---|---|
State | New, archived |
Series | [RESPIN] irqchip/gic-v4.1: Use local 4_1 ITS to generate VSGI |
Please don't use "RESPIN" as a subject tag. This means nothing, and
confuses the tooling such as b4, which expects a version number. If
the code has changed in any way, increment the version number. This
really should have been a v2.

On Mon, 01 Jul 2024 07:20:42 +0100,
Nianyao Tang <tangnianyao@huawei.com> wrote:
>
> On multi-node GICv4.1 system, VSGI senders always use one certain 4_1 ITS,
> because find_4_1_its return the first its_node in list, regardless of
> which node the VSGI sender is on. This brings guest vsgi performance drop
> when VM is not deployed on the same node as this returned ITS.

s/deployed/running/

>
> On a 2-socket environment, each with one ITS and 32 cpu, GICv4.1 enabled,
> 4U8G guest, 4 vcpu is deployed on same socket.

s/deployed/running/

> When VM on socket0, kvm-unit-tests ipi_hw result is 850ns.
> When VM on socket1, it is 750ns. The reason is VSGI sender always
> use lasted reported ITS(that on socket1) to inject VSGI. The access

s/lasted/the last/

> from cpu to other-socket ITS will cost 100ns more compared to cpu to
> local ITS.
>
> To use local ITS, we can get 12% reduction in IPI latency.

s/To use/By using a/

>
> The patch modify find_4_1_its to firstly return per-cpu local_4_1_its,

Drop "the patch". s/firstly/first/

> which is init when inherit the VPE table from the ITS on secondary CPUs.

or from another CPU.

> If fail to find local 4_1 ITS, return any 4_1 ITS like before.
>
> Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
> Reviewed-by: Marc Zyngier <maz@kernel.org>

No. I never gave this tag. You can (and probably should) add a
"Suggested-by:" tag, but not a "Reviewed-by:", until I explicitly
reply to the patch with that tag.

Please resend it as a v3 with all of the above fixed.

Thanks,

M.
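As a quick check of the latency figure quoted in the commit message, assuming the 12% is measured against the remote-ITS result (850 ns, VM on socket0 using socket1's ITS) and that the whole 100 ns gap comes from the cross-socket ITS access:

$$
\frac{850\,\text{ns} - 750\,\text{ns}}{850\,\text{ns}} = \frac{100}{850} \approx 0.118 \approx 12\%
$$

So "12%" is a rounded 11.8%. Measured against the 750 ns local figure instead, the same 100 ns gap would be about 13.3%, so the baseline is worth stating explicitly in the commit message.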
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e..f99c0a86320b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -119,6 +119,8 @@ struct its_node {
 	int vlpi_redist_offset;
 };
 
+static DEFINE_PER_CPU(struct its_node *, local_4_1_its);
+
 #define is_v4(its)		(!!((its)->typer & GITS_TYPER_VLPIS))
 #define is_v4_1(its)		(!!((its)->typer & GITS_TYPER_VMAPP))
 #define device_ids(its)		(FIELD_GET(GITS_TYPER_DEVBITS, (its)->typer) + 1)
@@ -2709,6 +2711,8 @@ static u64 inherit_vpe_l1_table_from_its(void)
 		}
 		val |= FIELD_PREP(GICR_VPROPBASER_4_1_SIZE, GITS_BASER_NR_PAGES(baser) - 1);
 
+		*this_cpu_ptr(&local_4_1_its) = its;
+
 		return val;
 	}
 
@@ -2746,6 +2750,8 @@ static u64 inherit_vpe_l1_table_from_rd(cpumask_t **mask)
 		gic_data_rdist()->vpe_l1_base = gic_data_rdist_cpu(cpu)->vpe_l1_base;
 		*mask = gic_data_rdist_cpu(cpu)->vpe_table_mask;
 
+		*this_cpu_ptr(&local_4_1_its) = *per_cpu_ptr(&local_4_1_its, cpu);
+
 		return val;
 	}
 
@@ -4058,8 +4064,9 @@ static struct irq_chip its_vpe_irq_chip = {
 
 static struct its_node *find_4_1_its(void)
 {
-	static struct its_node *its = NULL;
+	struct its_node *its;
 
+	its = *this_cpu_ptr(&local_4_1_its);
 	if (!its) {
 		list_for_each_entry(its, &its_nodes, entry) {
 			if (is_v4_1(its))