Message ID | 1468294608-30619-1-git-send-email-shankerd@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Hi Shanker,

On 12/07/16 04:36, Shanker Donthineni wrote:
> Read-allocation hints are not enabled for both the GIC-ITS and GICR
> tables. This forces the hardware to always read the table contents
> from an external memory (DDR) which is slow compared to cache memory.
> Most of the tables are often read by hardware. So, it's better to
> enable Read-allocate hints in addition to Write-allocate hints in
> order to improve the GICR_PEND, GICR_PROP, Collection, Device, and
> vCPU tables lookup time.

While I'm not opposed to such a change, I'd like to see some evidence
that this actually makes a difference. Have you measured an improvement
on a particular implementation? If so, could you share your benchmarking
method so that it could be measured on others as well?

Thanks,

M.
Hi Marc,

On 07/12/2016 03:09 AM, Marc Zyngier wrote:
> Hi Shanker,
>
> On 12/07/16 04:36, Shanker Donthineni wrote:
>> Read-allocation hints are not enabled for both the GIC-ITS and GICR
>> tables. This forces the hardware to always read the table contents
>> from an external memory (DDR) which is slow compared to cache memory.
>> Most of the tables are often read by hardware. So, it's better to
>> enable Read-allocate hints in addition to Write-allocate hints in
>> order to improve the GICR_PEND, GICR_PROP, Collection, Device, and
>> vCPU tables lookup time.
> While I'm not opposed to such a change, I'd like to see some evidence
> that this actually makes a difference. Have you measured an improvement
> on a particular implementation? If so, could you share your benchmarking
> method so that it could be measured on others as well?

I have seen at least 5% performance gain when I was testing direct VLPI
feature on Qualcomm emulation platforms. On Silicon, this gain is not
noticeable.

> Thanks,
>
> M.
Marc,

Are you planning to push this change? I talked to the Qualcomm ITS hw team
and they told me it would be nice to have this change even though we see
a small gain.

Shanker

On 07/12/2016 08:32 AM, Shanker Donthineni wrote:
> Hi Marc,
>
> On 07/12/2016 03:09 AM, Marc Zyngier wrote:
>> Hi Shanker,
>>
>> On 12/07/16 04:36, Shanker Donthineni wrote:
>>> Read-allocation hints are not enabled for both the GIC-ITS and GICR
>>> tables. This forces the hardware to always read the table contents
>>> from an external memory (DDR) which is slow compared to cache memory.
>>> Most of the tables are often read by hardware. So, it's better to
>>> enable Read-allocate hints in addition to Write-allocate hints in
>>> order to improve the GICR_PEND, GICR_PROP, Collection, Device, and
>>> vCPU tables lookup time.
>> While I'm not opposed to such a change, I'd like to see some evidence
>> that this actually makes a difference. Have you measured an improvement
>> on a particular implementation? If so, could you share your benchmarking
>> method so that it could be measured on others as well?
> I have seen at least 5% performance gain when I was testing direct
> VLPI feature on Qualcomm emulation platforms. On Silicon, this gain is
> not noticeable.
>
>> Thanks,
>>
>> M.
On 29/08/16 16:35, Shanker Donthineni wrote:
> Marc,
>
> Are you planning to push this change? I talked to the Qualcomm ITS hw team
> and they told me it would be nice to have this change even though we see
> a small gain.

Hi Shanker,

As I asked before, I'd like to know what the actual gain is on real HW,
and how you measured it, so that I can try and make sure this doesn't
introduce regressions on other implementations. If it does, then we'll
probably have to quirk it.

Thanks,

M.
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7ceaba8..6fc92a8 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -954,7 +954,7 @@ static bool its_parse_baser_device(struct its_node *its, struct its_baser *baser
 				   u32 psz, u32 *order)
 {
 	u64 esz = GITS_BASER_ENTRY_SIZE(its_read_baser(its, baser));
-	u64 val = GITS_BASER_InnerShareable | GITS_BASER_WaWb;
+	u64 val = GITS_BASER_InnerShareable | GITS_BASER_RaWaWb;
 	u32 ids = its->device_ids;
 	u32 new_order = *order;
 	bool indirect = false;
@@ -1019,7 +1019,7 @@ static int its_alloc_tables(struct its_node *its)
 	u64 typer = readq_relaxed(its->base + GITS_TYPER);
 	u32 ids = GITS_TYPER_DEVBITS(typer);
 	u64 shr = GITS_BASER_InnerShareable;
-	u64 cache = GITS_BASER_WaWb;
+	u64 cache = GITS_BASER_RaWaWb;
 	u32 psz = SZ_64K;
 	int err, i;
 
@@ -1116,7 +1116,7 @@ static void its_cpu_init_lpis(void)
 	/* set PROPBASE */
 	val = (page_to_phys(gic_rdists->prop_page) |
 	       GICR_PROPBASER_InnerShareable |
-	       GICR_PROPBASER_WaWb |
+	       GICR_PROPBASER_RaWaWb |
 	       ((LPI_NRBITS - 1) & GICR_PROPBASER_IDBITS_MASK));
 
 	writeq_relaxed(val, rbase + GICR_PROPBASER);
@@ -1141,7 +1141,7 @@ static void its_cpu_init_lpis(void)
 	/* set PENDBASE */
 	val = (page_to_phys(pend_page) |
 	       GICR_PENDBASER_InnerShareable |
-	       GICR_PENDBASER_WaWb);
+	       GICR_PENDBASER_RaWaWb);
 
 	writeq_relaxed(val, rbase + GICR_PENDBASER);
 	tmp = readq_relaxed(rbase + GICR_PENDBASER);
Read-allocation hints are not enabled for both the GIC-ITS and GICR
tables. This forces the hardware to always read the table contents
from an external memory (DDR) which is slow compared to cache memory.
Most of the tables are often read by hardware. So, it's better to
enable Read-allocate hints in addition to Write-allocate hints in
order to improve the GICR_PEND, GICR_PROP, Collection, Device, and
vCPU tables lookup time.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
---
 drivers/irqchip/irq-gic-v3-its.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
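
For readers following the thread, here is a rough, standalone sketch of what switching from WaWb to RaWaWb means at the register level. It is not the driver code. The field positions (InnerCache at bits [61:59] of GITS_BASER<n> and bits [9:7] of GICR_PROPBASER/GICR_PENDBASER) and the encodings 0b101 and 0b111 are taken from my reading of the GICv3 register layout. All identifiers below (the SKETCH_ macros, set_inner_cache(), get_inner_cache(), hint_accepted()) are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Inner cacheability encodings used by the patch (illustrative names). */
enum gic_inner_cache {
	CACHE_WAWB   = 5, /* 0b101: write-allocate, write-back */
	CACHE_RAWAWB = 7, /* 0b111: read-allocate, write-allocate, write-back */
};

/* Assumed field positions; names are hypothetical, not kernel macros. */
#define SKETCH_GITS_BASER_CACHE_SHIFT 59 /* GITS_BASER<n>, bits [61:59] */
#define SKETCH_GICR_BASER_CACHE_SHIFT 7  /* GICR_PROPBASER/PENDBASER, bits [9:7] */
#define SKETCH_CACHE_MASK             UINT64_C(0x7)

/* Replace the 3-bit inner cacheability field of a register image. */
static inline uint64_t set_inner_cache(uint64_t reg, unsigned int shift,
				       enum gic_inner_cache attr)
{
	reg &= ~(SKETCH_CACHE_MASK << shift);
	return reg | ((uint64_t)attr << shift);
}

/* Extract the 3-bit inner cacheability field of a register image. */
static inline unsigned int get_inner_cache(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & SKETCH_CACHE_MASK);
}

/*
 * Hypothetical check: compare the cacheability field that was written
 * with the one read back, to see whether the requested hint stuck.
 */
static inline int hint_accepted(uint64_t written, uint64_t readback,
				unsigned int shift)
{
	return get_inner_cache(written, shift) == get_inner_cache(readback, shift);
}

/* The two encodings differ in a single bit of the 3-bit field. */
static_assert((CACHE_WAWB ^ CACHE_RAWAWB) == 2,
	      "WaWb and RaWaWb differ in one bit");
```

Comparing a written value against what is read back, as hint_accepted() does, is one conceivable way to notice an implementation that does not keep the requested attribute. Whether that is the right basis for a quirk is a separate question and is not settled in this thread.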