[RFC,0/2] iommu/arm-smmu-v3: Improve cmdq lock efficiency


Message ID

1591012248-37956-1-git-send-email-john.garry@huawei.com (mailing list archive)

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 44D9820738
From: John Garry <john.garry@huawei.com>
To: <will@kernel.org>, <robin.murphy@arm.com>
Subject: [PATCH RFC 0/2] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Mon, 1 Jun 2020 19:50:46 +0800
Message-ID: <1591012248-37956-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
Precedence: list
Cc: song.bao.hua@hisilicon.com, maz@kernel.org, joro@8bytes.org,
 John Garry <john.garry@huawei.com>, iommu@lists.linux-foundation.org,
 linux-arm-kernel@lists.infradead.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: 
 linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Series

iommu/arm-smmu-v3: Improve cmdq lock efficiency

Message

John Garry June 1, 2020, 11:50 a.m. UTC

As mentioned in [0], the CPU may consume many cycles processing
arm_smmu_cmdq_issue_cmdlist(). One issue we find is that the cmpxchg() loop
to get space on the queue takes approx 25% of the cycles for this function.

The cmpxchg() is removed as follows:
- We assume that the cmdq can never fill, by limiting the batch size
  (where necessary) and always issuing a CMD_SYNC for a batch.
  We need to do this since we no longer maintain the cons value in
  software, and so cannot properly handle the case of no available space.
- Replace cmpxchg() with an atomic inc operation, to maintain the prod
  and owner values.
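
As a rough illustration of the second point, here is a minimal userspace
sketch in C11 atomics contrasting the two ways of reserving queue space.
It is not the driver code: struct toy_cmdq, Q_ENTRIES and both function
names are hypothetical, the owner bits are left out, and memory ordering
is simplified to relaxed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical queue size for illustration; not taken from the driver. */
#define Q_ENTRIES 256u

struct toy_cmdq {
	_Atomic uint32_t prod;	/* free-running producer counter */
};

/*
 * Reserve n slots with a compare-and-swap retry loop. Under contention
 * a CPU may go around the loop several times before it wins.
 * Returns the index of the first reserved slot.
 */
static uint32_t reserve_cmpxchg(struct toy_cmdq *q, uint32_t n)
{
	uint32_t old = atomic_load_explicit(&q->prod, memory_order_relaxed);

	assert(n > 0 && n <= Q_ENTRIES);
	/* On failure, 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak_explicit(&q->prod, &old, old + n,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
	return old % Q_ENTRIES;
}

/*
 * Reserve n slots with a single atomic add. There is no retry loop, but
 * also no chance to check for free space before committing, so the
 * caller must guarantee by other means that the queue cannot fill.
 */
static uint32_t reserve_fetch_add(struct toy_cmdq *q, uint32_t n)
{
	assert(n > 0 && n <= Q_ENTRIES);
	return atomic_fetch_add_explicit(&q->prod, n, memory_order_relaxed)
	       % Q_ENTRIES;
}
```

The add always succeeds in one step, which is where the cycles would be
saved; the cost is that the space check has to be replaced by the
"never fill" assumption in the first point.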

Early experiments have shown that we may see a 25% boost in throughput
(IOPS) for my NVMe test with these changes. And some CPUs, which were
loaded at ~55%, now see a ~45% load.

So, even though the changes are incomplete and other parts of the driver
will need fixing up (and it looks like it may be broken for !MSI support),
the performance boost seen would seem to be worth the effort of exploring
this.

Comments requested please.

Thanks

[0] https://lore.kernel.org/linux-iommu/B926444035E5E2439431908E3842AFD24B86DB@DGGEMI525-MBS.china.huawei.com/T/#ma02e301c38c3e94b7725e685757c27e39c7cbde3

John Garry (2):
  iommu/arm-smmu-v3: Calculate bits for prod and owner
  iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

 drivers/iommu/arm-smmu-v3.c | 92 +++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 45 deletions(-)