From patchwork Thu Nov 1 14:46:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sharat Masetty X-Patchwork-Id: 10664043 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8047D17D5 for ; Thu, 1 Nov 2018 14:46:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6BBEB27968 for ; Thu, 1 Nov 2018 14:46:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5FBC828760; Thu, 1 Nov 2018 14:46:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BF9B927968 for ; Thu, 1 Nov 2018 14:46:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728298AbeKAXuL (ORCPT ); Thu, 1 Nov 2018 19:50:11 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:39912 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728264AbeKAXuL (ORCPT ); Thu, 1 Nov 2018 19:50:11 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id AD68C6053D; Thu, 1 Nov 2018 14:46:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1541083613; bh=5hi6Nur1Rz4izJy5MELwBJ8oflhF7Lna535yfQBne58=; h=From:To:Cc:Subject:Date:From; b=door/DGId826lOArAwqybNfnJKy2tH+uR+clnq39V8LGj6GZE3MrvVyjRUF7ELzqj EORTJ23nnXGYC35Y+UvofKlL3Wx4euiK1SWu+72Trv+lBsTfXAMABqKxQ/eC6uQNBq po6aq2VfS8/2LLgAo9Kvrou4wmsm5vrHub11FnI0= Received: from smasetty-linux.qualcomm.com (blr-c-bdr-fw-01_globalnat_allzones-outside.qualcomm.com [103.229.19.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: smasetty@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 63C796053D; Thu, 1 Nov 2018 14:46:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1541083612; bh=5hi6Nur1Rz4izJy5MELwBJ8oflhF7Lna535yfQBne58=; h=From:To:Cc:Subject:Date:From; b=PVb388pyBXWdNgpPXK2KZvd3JO1ovH5QFmhGZMXYYvfoEKhOpf82y1ieYINJESEes 5flmeGofWQZwIj47COexbj+TD0Y8BDUhNXdJdZmSEAfAlikBkpxShb2pepp0WA+IeO HpLObo0sNKwXRPt9b9fUAnZS5OwV9lZxt7HLiakc= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 63C796053D Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=smasetty@codeaurora.org From: Sharat Masetty To: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org, jcrouse@codeaurora.org, Sharat Masetty Subject: [PATCH] drm/msm: Optimize adreno_show_object() Date: Thu, 1 Nov 2018 20:16:45 +0530 Message-Id: <1541083605-16950-1-git-send-email-smasetty@codeaurora.org> X-Mailer: git-send-email 1.9.1 Sender: linux-arm-msm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-arm-msm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When the userspace tries to read the crashstate dump, the read side implementation in the driver currently ascii85 encodes all the binary buffers and it does this each time the read system call is called. A userspace tool like cat typically does a page by page read and the number of read calls depends on the size of the data captured by the driver. This is certainly not desirable and does not scale well with large captures. This patch encodes the buffer only once in the read path. With this there is an immediate >10X speed improvement in crashstate save time. Signed-off-by: Sharat Masetty --- This is a different and a simpler version of [1]. We move the encode back to read time and make sure each buffer is encoded just once. This is showing similar performance numbers as [1] for similar capture sizes. 1) https://patchwork.freedesktop.org/patch/259627/ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 82 ++++++++++++++++++++++++--------- drivers/gpu/drm/msm/msm_gpu.h | 2 + 2 files changed, 63 insertions(+), 21 deletions(-) -- 1.9.1 diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 141062f..6251e76 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -406,7 +406,7 @@ int adreno_gpu_state_get(struct msm_gpu *gpu, struct msm_gpu_state *state) size = j + 1; if (size) { - state->ring[i].data = kmalloc(size << 2, GFP_KERNEL); + state->ring[i].data = kvmalloc(size << 2, GFP_KERNEL); if (state->ring[i].data) { memcpy(state->ring[i].data, gpu->rb[i]->start, size << 2); state->ring[i].data_size = size << 2; @@ -445,7 +445,7 @@ void adreno_gpu_state_destroy(struct msm_gpu_state *state) int i; for (i = 0; i < ARRAY_SIZE(state->ring); i++) - kfree(state->ring[i].data); + kvfree(state->ring[i].data); for (i = 0; state->bos && i < state->nr_bos; i++) kvfree(state->bos[i].data); @@ -475,34 +475,74 @@ int adreno_gpu_state_put(struct msm_gpu_state *state) #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) -static void adreno_show_object(struct drm_printer *p, u32 *ptr, int len) +static char *adreno_gpu_ascii85_encode(u32 *src, size_t len) { + void *buf; + size_t buf_itr = 0, buffer_size; char out[ASCII85_BUFSZ]; - long l, datalen, i; + long l; + int i; - if (!ptr || !len) - return; + if (!src || !len) + return NULL; + + l = ascii85_encode_len(len); /* - * Only dump the non-zero part of the buffer - rarely will any data - * completely fill the entire allocated size of the buffer + * Ascii85 outputs either a 5 byte string or a 1 byte string. So we + * account for the worst case of 5 bytes per dword plus the 1 for '\0' */ - for (datalen = 0, i = 0; i < len >> 2; i++) { - if (ptr[i]) - datalen = (i << 2) + 1; - } + buffer_size = (l * 5) + 1; + + buf = kvmalloc(buffer_size, GFP_KERNEL); + if (!buf) + return NULL; - /* Skip printing the object if it is empty */ - if (datalen == 0) + for (i = 0; i < l; i++) + buf_itr += snprintf(buf + buf_itr, buffer_size - buf_itr, "%s", + ascii85_encode(src[i], out)); + + return buf; +} + +/* len is expected to be in bytes */ +static void adreno_show_object(struct drm_printer *p, void **ptr, int len, + bool *encoded) +{ + if (!*ptr || !len) return; - l = ascii85_encode_len(datalen); + if (!*encoded) { + long datalen, i; + u32 *buf = *ptr; + + /* + * Only dump the non-zero part of the buffer - rarely will + * any data completely fill the entire allocated size of + * the buffer. + */ + for (datalen = 0, i = 0; i < len >> 2; i++) + if (buf[i]) + datalen = ((i + 1) << 2); + + /* + * If we reach here, then the originally captured binary buffer + * will be replaced with the ascii85 encoded string + */ + *ptr = adreno_gpu_ascii85_encode(buf, datalen); + + kvfree(buf); + + *encoded = true; + } + + if (!*ptr) + return; drm_puts(p, " data: !!ascii85 |\n"); drm_puts(p, " "); - for (i = 0; i < l; i++) - drm_puts(p, ascii85_encode(ptr[i], out)); + drm_puts(p, *ptr); drm_puts(p, "\n"); } @@ -534,8 +574,8 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state, drm_printf(p, " wptr: %d\n", state->ring[i].wptr); drm_printf(p, " size: %d\n", MSM_GPU_RINGBUFFER_SZ); - adreno_show_object(p, state->ring[i].data, - state->ring[i].data_size); + adreno_show_object(p, &state->ring[i].data, + state->ring[i].data_size, &state->ring[i].encoded); } if (state->bos) { @@ -546,8 +586,8 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state, state->bos[i].iova); drm_printf(p, " size: %zd\n", state->bos[i].size); - adreno_show_object(p, state->bos[i].data, - state->bos[i].size); + adreno_show_object(p, &state->bos[i].data, + state->bos[i].size, &state->bos[i].encoded); } } diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h index f82bac0..efb49bb 100644 --- a/drivers/gpu/drm/msm/msm_gpu.h +++ b/drivers/gpu/drm/msm/msm_gpu.h @@ -187,6 +187,7 @@ struct msm_gpu_state_bo { u64 iova; size_t size; void *data; + bool encoded; }; struct msm_gpu_state { @@ -201,6 +202,7 @@ struct msm_gpu_state { u32 wptr; void *data; int data_size; + bool encoded; } ring[MSM_GPU_MAX_RINGS]; int nr_registers;