From: John Garry
Subject: [PATCH v2 0/2] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Fri, 21 Aug 2020 21:54:20 +0800
Message-ID: <1598018062-175608-1-git-send-email-john.garry@huawei.com>
Cc: maz@kernel.org, joro@8bytes.org, John Garry, linux-kernel@vger.kernel.org, linuxarm@huawei.com, iommu@lists.linux-foundation.org, linux-arm-kernel@lists.infradead.org
X-Patchwork-Submitter: John Garry
X-Patchwork-Id: 11729527

As mentioned in [0], the CPU may consume many cycles processing
arm_smmu_cmdq_issue_cmdlist(). One issue we find is that the cmpxchg() loop
to get space on the queue takes a lot of time once many CPUs start
contending - from experiment, with 64 CPUs contending for the cmdq, the
success rate is ~1 in 12, which is poor, but not totally awful.

This series removes that cmpxchg() and replaces it with an atomic_add, the
same as how the actual cmdq deals with maintaining the prod pointer.

For my NVMe test with 3x NVMe SSDs, I'm getting a ~24% throughput
increase:
Before: 1250K IOPs
After:  1550K IOPs

I also have a test harness to check the rate of DMA map+unmaps we can
achieve:

CPU count	8	16	32	64
Before:		282K	115K	36K	11K
After:		302K	193K	80K	30K

(unit is map+unmaps per CPU per second)

[0] https://lore.kernel.org/linux-iommu/B926444035E5E2439431908E3842AFD24B86DB@DGGEMI525-MBS.china.huawei.com/T/#ma02e301c38c3e94b7725e685757c27e39c7cbde3

Differences to v1:
- Simplify by dropping patch to always issue a CMD_SYNC
- Use 64b atomic add, keeping prod in a separate 32b field

John Garry (2):
  iommu/arm-smmu-v3: Calculate max commands per batch
  iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 166 ++++++++++++++------
 1 file changed, 114 insertions(+), 52 deletions(-)
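
To illustrate the difference between the cmpxchg() loop and the atomic add described above, here is a minimal userspace sketch in C11. It is not the driver code: struct demo_queue, reserve_cmpxchg() and reserve_add() are hypothetical names, and the sketch deliberately ignores queue-full checks, wrap bits and memory ordering, which the real arm_smmu_cmdq_issue_cmdlist() has to deal with.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue's producer index (not the driver struct). */
struct demo_queue {
	_Atomic uint32_t prod;
};

/*
 * cmpxchg-style reservation: read prod, try to publish prod + n, and retry
 * whenever another CPU changed prod in between. Under contention only one
 * CPU wins each round; the rest loop again. *retries counts failed attempts.
 */
static uint32_t reserve_cmpxchg(struct demo_queue *q, uint32_t n,
				unsigned int *retries)
{
	uint32_t old = atomic_load_explicit(&q->prod, memory_order_relaxed);

	assert(n > 0);
	/* On failure, 'old' is refreshed with the current value of prod. */
	while (!atomic_compare_exchange_strong_explicit(&q->prod, &old, old + n,
							memory_order_relaxed,
							memory_order_relaxed))
		(*retries)++;

	return old;	/* first slot owned by the caller */
}

/*
 * atomic-add reservation: a single fetch-and-add hands every caller a
 * distinct range of slots, so there is no retry loop to lose.
 */
static uint32_t reserve_add(struct demo_queue *q, uint32_t n)
{
	assert(n > 0);
	return atomic_fetch_add_explicit(&q->prod, n, memory_order_relaxed);
}
```

Both helpers return the first reserved slot index. The trade-off, under the assumptions above, is that the add-based variant cannot check for available space before claiming it, so a real implementation needs a separate way to handle a full queue.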