From patchwork Fri Oct 20 20:22:36 2017
X-Patchwork-Submitter: Mark Salyzyn
X-Patchwork-Id: 10020945
From: Mark Salyzyn
To: linux-kernel@vger.kernel.org
Cc: Tony Luck, Kees Cook, Catalin Marinas, Anton Vorontsov, Will Deacon,
    Mark Salyzyn, Colin Cross, linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: optimize __memcpy_fromio, __memcpy_toio and __memset_io
Date: Fri, 20 Oct 2017 13:22:36 -0700
Message-Id: <20171020202327.2592-1-salyzyn@android.com>

The __memcpy_fromio and __memcpy_toio functions do not deal well with
mutually unaligned source and destination addresses: unless the two
pointers share the same alignment and can ultimately be copied as quads
(u64), they fall back to byte operations over the entire buffer.

Add optional paths that perform reads and writes at the best alignment
possible for both source and destination, giving priority to quad
(8-byte) transfers on the IO side. Also remove the volatile qualifier on
the source of __memcpy_toio, as it is unnecessary.

This change was motivated by performance issues in the pstore driver.
On a test platform, the pstore probe time with a 1/4 MB console buffer
and a 1/2 MB pmsg buffer was in the 90-107 ms range; with this change it
drops to a worst case of 15 ms, a measurable boot-time improvement.

Adjust __memset_io to use the same access pattern. It has no pair of
pointers whose relative alignment matters, so the benefit there is
consistency rather than a dramatic speedup.
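To make the access pattern concrete, here is a minimal standalone sketch
of the idea described above (this is not the code the patch adds, which
also narrows the head and tail steps to halfword and word accesses; the
function name fromio_sketch is made up for illustration, and the snippet
assumes the same headers io.c already includes). It aligns the IO
pointer first, moves the bulk of the data as 64-bit reads, and lets
memcpy() absorb any remaining misalignment on the normal-memory side:

	static void fromio_sketch(void *to, const volatile void __iomem *from,
				  size_t count)
	{
		/* head: byte reads until the IO pointer is 8-byte aligned */
		while (count && !IS_ALIGNED((unsigned long)from, sizeof(u64))) {
			*(u8 *)to = __raw_readb(from);
			from++;
			to++;
			count--;
		}

		/* body: quad IO reads; memcpy() tolerates an unaligned 'to' */
		while (count >= sizeof(u64)) {
			u64 hold = __raw_readq(from);

			memcpy(to, &hold, sizeof(u64));
			from += sizeof(u64);
			to += sizeof(u64);
			count -= sizeof(u64);
		}

		/* tail: whatever is left, one byte at a time */
		while (count) {
			*(u8 *)to = __raw_readb(from);
			from++;
			to++;
			count--;
		}
	}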
Signed-off-by: Mark Salyzyn
Cc: Kees Cook
Cc: Anton Vorontsov
Cc: Tony Luck
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm64/kernel/io.c | 199 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 156 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
index 354be2a872ae..14ef7c8f20ea 100644
--- a/arch/arm64/kernel/io.c
+++ b/arch/arm64/kernel/io.c
@@ -20,61 +20,147 @@
 #include <linux/types.h>
 #include <asm/io.h>
 
+/* if/while helpers assume from, to and count vars accessible in caller */
+
+/* arguments to helpers to ensure proper combinations */
+#define byte		b, u8
+#define word		w, u16
+#define longword	l, u32
+#define quad		q, u64
+
+/* read helper for unaligned transfers needing intermediate hold and memcpy */
+#define _do_unaligned_read(op, align_type, width, type) do {	\
+	op(count >= sizeof(type)) {				\
+		type hold = __raw_read##width(from);		\
+								\
+		memcpy((align_type *)to, &hold, sizeof(type));	\
+		to += sizeof(type);				\
+		from += sizeof(type);				\
+		count -= sizeof(type);				\
+	}							\
+} while (0)
+#define if_unaligned_read(type, x)	_do_unaligned_read(if, type, x)
+#define while_unaligned_read(type, x)	_do_unaligned_read(while, type, x)
+
+/* read helper for aligned transfers */
+#define _do_aligned_read(op, width, type) \
+	_do_unaligned_read(op, type, width, type)
+#define if_aligned_read(x)	_do_aligned_read(if, x)
+#define while_aligned_read(x)	_do_aligned_read(while, x)
+
 /*
  * Copy data from IO memory space to "real" memory space.
  */
+
 void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
 {
-	while (count && (!IS_ALIGNED((unsigned long)from, 8) ||
-			 !IS_ALIGNED((unsigned long)to, 8))) {
-		*(u8 *)to = __raw_readb(from);
-		from++;
-		to++;
-		count--;
-	}
+	if (!IS_ALIGNED((unsigned long)from, sizeof(u16)))
+		if_aligned_read(byte);
 
-	while (count >= 8) {
-		*(u64 *)to = __raw_readq(from);
-		from += 8;
-		to += 8;
-		count -= 8;
+	if (!IS_ALIGNED((unsigned long)to, sizeof(u16))) {
+		if (!IS_ALIGNED((unsigned long)from, sizeof(u32)))
+			if_unaligned_read(u8, word);
+		if (!IS_ALIGNED((unsigned long)from, sizeof(u64)))
+			if_unaligned_read(u8, longword);
+		while_unaligned_read(u8, quad);
+		if_unaligned_read(u8, longword);
+		if_unaligned_read(u8, word);
+		if_aligned_read(byte);
+		return;
 	}
 
-	while (count) {
-		*(u8 *)to = __raw_readb(from);
-		from++;
-		to++;
-		count--;
+	if (!IS_ALIGNED((unsigned long)from, sizeof(u32)))
+		if_aligned_read(word);
+
+	if (!IS_ALIGNED((unsigned long)to, sizeof(u32))) {
+		if (!IS_ALIGNED((unsigned long)from, sizeof(u64)))
+			if_unaligned_read(u16, longword);
+		while_unaligned_read(u16, quad);
+		if_unaligned_read(u16, longword);
+		if_aligned_read(word);
+		if_aligned_read(byte);
+		return;
 	}
+
+	if (!IS_ALIGNED((unsigned long)from, sizeof(u64)))
+		if_aligned_read(longword);
+
+	if (!IS_ALIGNED((unsigned long)to, sizeof(u64)))
+		while_unaligned_read(u32, quad);
+	else
+		while_aligned_read(quad);
+
+	if_aligned_read(longword);
+	if_aligned_read(word);
+	if_aligned_read(byte);
 }
 EXPORT_SYMBOL(__memcpy_fromio);
 
+/* write helper for unaligned transfers needing intermediate hold and memcpy */
+#define _do_unaligned_write(op, align_type, width, type) do {	\
+	op(count >= sizeof(type)) {				\
+		type hold;					\
+								\
+		memcpy(&hold, (align_type *)from, sizeof(type));\
+		__raw_write##width(hold, to);			\
+		to += sizeof(type);				\
+		from += sizeof(type);				\
+		count -= sizeof(type);				\
+	}							\
+} while (0)
+#define if_unaligned_write(type, x)	_do_unaligned_write(if, type, x)
+#define while_unaligned_write(type, x)	_do_unaligned_write(while, type, x)
+
+/* write helper for aligned transfers */
+#define _do_aligned_write(op, width, type) \
+	_do_unaligned_write(op, type, width, type)
+#define if_aligned_write(x)	_do_aligned_write(if, x)
+#define while_aligned_write(x)	_do_aligned_write(while, x)
+
 /*
  * Copy data from "real" memory space to IO memory space.
  */
 void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
 {
-	while (count && (!IS_ALIGNED((unsigned long)to, 8) ||
-			 !IS_ALIGNED((unsigned long)from, 8))) {
-		__raw_writeb(*(volatile u8 *)from, to);
-		from++;
-		to++;
-		count--;
-	}
+	if (!IS_ALIGNED((unsigned long)to, sizeof(u16)))
+		if_aligned_write(byte);
 
-	while (count >= 8) {
-		__raw_writeq(*(volatile u64 *)from, to);
-		from += 8;
-		to += 8;
-		count -= 8;
+	if (!IS_ALIGNED((unsigned long)from, sizeof(u16))) {
+		if (!IS_ALIGNED((unsigned long)to, sizeof(u32)))
+			if_unaligned_write(u8, word);
+		if (!IS_ALIGNED((unsigned long)to, sizeof(u64)))
+			if_unaligned_write(u8, longword);
+		while_unaligned_write(u8, quad);
+		if_unaligned_write(u8, longword);
+		if_unaligned_write(u8, word);
+		if_aligned_write(byte);
+		return;
 	}
 
-	while (count) {
-		__raw_writeb(*(volatile u8 *)from, to);
-		from++;
-		to++;
-		count--;
+	if (!IS_ALIGNED((unsigned long)to, sizeof(u32)))
+		if_aligned_write(word);
+
+	if (!IS_ALIGNED((unsigned long)from, sizeof(u32))) {
+		if (!IS_ALIGNED((unsigned long)to, sizeof(u64)))
+			if_unaligned_write(u16, longword);
+		while_unaligned_write(u16, quad);
+		if_unaligned_write(u16, longword);
+		if_aligned_write(word);
+		if_aligned_write(byte);
+		return;
 	}
+
+	if (!IS_ALIGNED((unsigned long)to, sizeof(u64)))
+		if_aligned_write(longword);
+
+	if (!IS_ALIGNED((unsigned long)from, sizeof(u64)))
+		while_unaligned_write(u32, quad);
+	else
+		while_aligned_write(quad);
+
+	if_aligned_write(longword);
+	if_aligned_write(word);
+	if_aligned_write(byte);
 }
 EXPORT_SYMBOL(__memcpy_toio);
 
@@ -89,22 +175,49 @@ void __memset_io(volatile void __iomem *dst, int c, size_t count)
 	qc |= qc << 16;
 	qc |= qc << 32;
 
-	while (count && !IS_ALIGNED((unsigned long)dst, 8)) {
+	if ((count >= sizeof(u8)) &&
+	    !IS_ALIGNED((unsigned long)dst, sizeof(u16))) {
 		__raw_writeb(c, dst);
-		dst++;
-		count--;
+		dst += sizeof(u8);
+		count -= sizeof(u8);
+	}
+
+	if ((count >= sizeof(u16)) &&
+	    !IS_ALIGNED((unsigned long)dst, sizeof(u32))) {
+		__raw_writew((u16)qc, dst);
+		dst += sizeof(u16);
+		count -= sizeof(u16);
 	}
 
-	while (count >= 8) {
+	if ((count >= sizeof(u32)) &&
+	    !IS_ALIGNED((unsigned long)dst, sizeof(u64))) {
+		__raw_writel((u32)qc, dst);
+		dst += sizeof(u32);
+		count -= sizeof(u32);
+	}
+
+	while (count >= sizeof(u64)) {
 		__raw_writeq(qc, dst);
-		dst += 8;
-		count -= 8;
+		dst += sizeof(u64);
+		count -= sizeof(u64);
+	}
+
+	if (count >= sizeof(u32)) {
+		__raw_writel((u32)qc, dst);
+		dst += sizeof(u32);
+		count -= sizeof(u32);
+	}
+
+	if (count >= sizeof(u16)) {
+		__raw_writew((u16)qc, dst);
+		dst += sizeof(u16);
+		count -= sizeof(u16);
 	}
 
-	while (count) {
+	if (count) {
 		__raw_writeb(c, dst);
-		dst++;
-		count--;
+		dst += sizeof(u8);
+		count -= sizeof(u8);
 	}
 }
 EXPORT_SYMBOL(__memset_io);
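For readers following the helper macros, it may help to see what one of
them expands to. With the argument packs above (quad is defined as
"q, u64"), while_aligned_read(quad) becomes
_do_unaligned_read(while, u64, q, u64), which, ignoring the enclosing
do { ... } while (0) wrapper, is roughly:

	while (count >= sizeof(u64)) {
		u64 hold = __raw_readq(from);

		memcpy((u64 *)to, &hold, sizeof(u64));
		to += sizeof(u64);
		from += sizeof(u64);
		count -= sizeof(u64);
	}

The unaligned variants differ only in the cast: while_unaligned_read(u16, quad)
produces the same loop with memcpy((u16 *)to, ...), recording that the
destination is only known to be halfword aligned while the IO side is
still read as quads; memcpy() takes care of the misaligned store.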