Message ID | 20210818080452.2079-3-thunder.leizhen@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() | expand |
On Wed, Aug 18, 2021 at 04:04:52PM +0800, Zhen Lei wrote: > Although the parameter 'cmd' is always passed by a local array variable, > and only this function modifies it, the compiler does not know this. The > compiler almost always reads the value of cmd[i] from memory rather than > directly using the value cached in the register. This generates many > useless instruction operations and affects the performance to some extent. > > To guide the compiler for proper optimization, 'cmd' is defined as a local > array variable, marked as register, and copied to the output parameter at > a time when the function is returned. > > The optimization effect can be viewed by running the "size arm-smmu-v3.o" > command. > > Before: > text data bss dec hex > 26954 1348 56 28358 6ec6 > > After: > text data bss dec hex > 26762 1348 56 28166 6e06 > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 01e95b56ffa07d1..7cec0c967f71d86 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent) > } > > /* High-level queue accessors */ > -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) > +static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent) > { > - memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT); > - cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode); > + register u64 cmd[CMDQ_ENT_DWORDS]; 'register' is pretty badly specified outside of a handful of limited uses in conjunction with inline asm, so I really don't think we should be using it here. > + cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode); This generates a compiler warning about taking the address of a 'register' variable. Will
On 2021/10/4 20:07, Will Deacon wrote: > On Wed, Aug 18, 2021 at 04:04:52PM +0800, Zhen Lei wrote: >> Although the parameter 'cmd' is always passed by a local array variable, >> and only this function modifies it, the compiler does not know this. The >> compiler almost always reads the value of cmd[i] from memory rather than >> directly using the value cached in the register. This generates many >> useless instruction operations and affects the performance to some extent. >> >> To guide the compiler for proper optimization, 'cmd' is defined as a local >> array variable, marked as register, and copied to the output parameter at >> a time when the function is returned. >> >> The optimization effect can be viewed by running the "size arm-smmu-v3.o" >> command. >> >> Before: >> text data bss dec hex >> 26954 1348 56 28358 6ec6 >> >> After: >> text data bss dec hex >> 26762 1348 56 28166 6e06 >> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >> --- >> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++--- >> 1 file changed, 8 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> index 01e95b56ffa07d1..7cec0c967f71d86 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> @@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent) >> } >> >> /* High-level queue accessors */ >> -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) >> +static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent) >> { >> - memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT); >> - cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode); >> + register u64 cmd[CMDQ_ENT_DWORDS]; > > 'register' is pretty badly specified outside of a handful of limited uses in > conjunction with inline asm, so I really don't think we should be using it > here. OK, I think I was overly aggressive in the beginning, and I just tried to remove the register modifier and get the same optimization. > >> + cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode); > > This generates a compiler warning about taking the address of a 'register' > variable. > > Will > . >
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 01e95b56ffa07d1..7cec0c967f71d86 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent) } /* High-level queue accessors */ -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) +static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent) { - memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT); - cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode); + register u64 cmd[CMDQ_ENT_DWORDS]; + + cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode); + cmd[1] = 0; switch (ent->opcode) { case CMDQ_OP_TLBI_EL2_ALL: @@ -332,6 +334,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) return -ENOENT; } + out_cmd[0] = cmd[0]; + out_cmd[1] = cmd[1]; + return 0; }
Although the parameter 'cmd' is always passed by a local array variable, and only this function modifies it, the compiler does not know this. The compiler almost always reads the value of cmd[i] from memory rather than directly using the value cached in the register. This generates many useless instruction operations and affects the performance to some extent. To guide the compiler for proper optimization, 'cmd' is defined as a local array variable, marked as register, and copied to the output parameter at a time when the function is returned. The optimization effect can be viewed by running the "size arm-smmu-v3.o" command. Before: text data bss dec hex 26954 1348 56 28358 6ec6 After: text data bss dec hex 26762 1348 56 28166 6e06 Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)