From patchwork Sat Jul 10 07:01:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leo Yan
X-Patchwork-Id: 12368357
From: Leo Yan
To: Mathieu Poirier, Suzuki K Poulose, Mike Leach, Alexander Shishkin,
 coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
Cc: Leo Yan
Subject: [PATCH v2] coresight: tmc-etr: Speed up for bounce buffer in flat mode
Date: Sat, 10 Jul 2021 15:01:15 +0800
Message-Id: <20210710070115.462674-1-leo.yan@linaro.org>
X-Mailer: git-send-email 2.25.1

The AUX bounce buffer is allocated with dma_alloc_coherent(). In the
low-level architecture code, e.g. for Arm64, this maps the memory with
the attribute "Normal non-cacheable"; this can be seen from the
definition of pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.

Later accesses to the AUX bounce buffer are inefficient: because the
mapping is non-cacheable, every load instruction must go out to DRAM.

This patch allocates the pages with alloc_pages_node() instead, so the
driver can access the memory through the cacheable mapping in the kernel
linear virtual address space. Because load instructions can then fetch
data from cache lines rather than always reading from DRAM, memory
copying performance improves.

With the cacheable mapping, the driver calls dma_sync_single_for_cpu()
to invalidate the cache lines before reading the bounce buffer, which
avoids reading stale trace data.

Measuring the duration of tmc_update_etr_buffer() with the ftrace
function_graph tracer shows a significant performance improvement when
copying 4MiB of data from the bounce buffer:

  # echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise
  # echo tmc_update_etr_buffer > set_graph_function
  # echo function_graph > current_tracer

  before:

  # CPU  DURATION                  FUNCTION CALLS
  # |     |   |                     |   |   |   |
   2)               |    tmc_update_etr_buffer() {
   ...
   2) # 8148.320 us |    }

  after:

  # CPU  DURATION                  FUNCTION CALLS
  # |     |   |                     |   |   |   |
   2)               |    tmc_update_etr_buffer() {
   ...
   2) # 2463.980 us |    }

Signed-off-by: Leo Yan
Reviewed-by: Suzuki K Poulose
---

Changes from v1:
Set "flat_buf->daddr" to 0 when mapping the DMA region fails, and
dropped the unintended change to the if condition in
tmc_etr_free_flat_buf().

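Note for reviewers: the wrap-around handling that this patch adds to
tmc_etr_sync_flat_buf() can be sketched in user space as below. This is
only an illustration under stated assumptions: struct sync_range and
split_sync_ranges() are hypothetical names that do not exist in the
driver, and the sketch only computes the (start, len) pairs that would
be handed to dma_sync_single_for_cpu(); it performs no cache
maintenance itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, not part of the driver: one linear region to sync. */
struct sync_range {
	size_t start;	/* byte offset from the start of the buffer */
	size_t len;	/* number of bytes to sync */
};

/*
 * Split the valid trace data of a circular buffer of "size" bytes,
 * starting at "offset" and "len" bytes long, into at most two linear
 * ranges. Returns the number of ranges written to "out" (1 or 2).
 *
 * Assumption for this sketch: offset < size and len <= size.
 */
static int split_sync_ranges(size_t size, size_t offset, size_t len,
			     struct sync_range out[2])
{
	assert(offset < size && len <= size);

	if (offset + len > size) {
		/* "len1": data at the tail of the buffer. */
		size_t len1 = size - offset;
		/* "len2": data at the start of the buffer after wrapping. */
		size_t len2 = len - len1;

		out[0] = (struct sync_range){ .start = offset, .len = len1 };
		out[1] = (struct sync_range){ .start = 0, .len = len2 };
		return 2;
	}

	/* No wrap-around: a single contiguous range covers the data. */
	out[0] = (struct sync_range){ .start = offset, .len = len };
	return 1;
}
```

For example, with size = 4096, offset = 3072 and len = 2048, the helper
returns two ranges: (3072, 1024) for the tail and (0, 1024) for the
wrapped part. With offset = 0 and len = 4096 it returns the single range
(0, 4096).
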
 .../hwtracing/coresight/coresight-tmc-etr.c   | 56 ++++++++++++++++---
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index acdb59e0e661..888b0f929d33 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -21,6 +21,7 @@
 
 struct etr_flat_buf {
 	struct device	*dev;
+	struct page	*pages;
 	dma_addr_t	daddr;
 	void		*vaddr;
 	size_t		size;
@@ -600,6 +601,7 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
 {
 	struct etr_flat_buf *flat_buf;
 	struct device *real_dev = drvdata->csdev->dev.parent;
+	ssize_t	aligned_size;
 
 	/* We cannot reuse existing pages for flat buf */
 	if (pages)
@@ -609,11 +611,18 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
 	if (!flat_buf)
 		return -ENOMEM;
 
-	flat_buf->vaddr = dma_alloc_coherent(real_dev, etr_buf->size,
-					     &flat_buf->daddr, GFP_KERNEL);
-	if (!flat_buf->vaddr) {
-		kfree(flat_buf);
-		return -ENOMEM;
+	aligned_size = PAGE_ALIGN(etr_buf->size);
+	flat_buf->pages = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
+					   get_order(aligned_size));
+	if (!flat_buf->pages)
+		goto fail_alloc_pages;
+
+	flat_buf->vaddr = page_address(flat_buf->pages);
+	flat_buf->daddr = dma_map_page(real_dev, flat_buf->pages, 0,
+				       aligned_size, DMA_FROM_DEVICE);
+	if (dma_mapping_error(real_dev, flat_buf->daddr)) {
+		flat_buf->daddr = 0;
+		goto fail_dma_map_page;
 	}
 
 	flat_buf->size = etr_buf->size;
@@ -622,6 +631,12 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
 	etr_buf->mode = ETR_MODE_FLAT;
 	etr_buf->private = flat_buf;
 	return 0;
+
+fail_dma_map_page:
+	__free_pages(flat_buf->pages, get_order(aligned_size));
+fail_alloc_pages:
+	kfree(flat_buf);
+	return -ENOMEM;
 }
 
 static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
@@ -630,15 +645,20 @@ static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
 
 	if (flat_buf && flat_buf->daddr) {
 		struct device *real_dev = flat_buf->dev->parent;
+		ssize_t aligned_size = PAGE_ALIGN(etr_buf->size);
 
-		dma_free_coherent(real_dev, flat_buf->size,
-				  flat_buf->vaddr, flat_buf->daddr);
+		dma_unmap_page(real_dev, flat_buf->daddr, aligned_size,
+			       DMA_FROM_DEVICE);
+		__free_pages(flat_buf->pages, get_order(aligned_size));
 	}
 	kfree(flat_buf);
 }
 
 static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
 {
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+	struct device *real_dev = flat_buf->dev->parent;
+
 	/*
 	 * Adjust the buffer to point to the beginning of the trace data
 	 * and update the available trace data.
@@ -648,6 +668,28 @@ static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
 		etr_buf->len = etr_buf->size;
 	else
 		etr_buf->len = rwp - rrp;
+
+	if (etr_buf->offset + etr_buf->len > etr_buf->size) {
+		int len1, len2;
+
+		/*
+		 * If trace data is wrapped around, sync AUX bounce buffer
+		 * for two chunks: "len1" is for the trace data length at
+		 * the tail of bounce buffer, and "len2" is the length from
+		 * the start of the buffer after wrapping around.
+		 */
+		len1 = etr_buf->size - etr_buf->offset;
+		len2 = etr_buf->len - len1;
+		dma_sync_single_for_cpu(real_dev,
+					flat_buf->daddr + etr_buf->offset,
+					len1, DMA_FROM_DEVICE);
+		dma_sync_single_for_cpu(real_dev, flat_buf->daddr,
+					len2, DMA_FROM_DEVICE);
+	} else {
+		dma_sync_single_for_cpu(real_dev,
+					flat_buf->daddr + etr_buf->offset,
+					etr_buf->len, DMA_FROM_DEVICE);
+	}
 }
 
 static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,