From patchwork Sat Jul 9 10:57:24 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tixy X-Patchwork-Id: 959832 Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p69FiJXO005460 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sat, 9 Jul 2011 15:44:47 GMT Received: from canuck.infradead.org ([2001:4978:20e::1]) by merlin.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1QfVbf-0001XM-0s; Sat, 09 Jul 2011 11:21:55 +0000 Received: from localhost ([127.0.0.1] helo=canuck.infradead.org) by canuck.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1QfVbe-0001Rt-CG; Sat, 09 Jul 2011 11:21:54 +0000 Received: from queue02.mail.zen.net.uk ([212.23.3.27]) by canuck.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1QfVZp-0000RA-PR for linux-arm-kernel@lists.infradead.org; Sat, 09 Jul 2011 11:20:02 +0000 Received: from [212.23.3.140] (helo=smarthost01.mail.zen.net.uk) by queue02.mail.zen.net.uk with esmtp (Exim 4.72) (envelope-from ) id 1QfVER-0008FH-74 for linux-arm-kernel@lists.infradead.org; Sat, 09 Jul 2011 10:57:55 +0000 Received: from [82.69.122.217] (helo=plug1) by smarthost01.mail.zen.net.uk with esmtpsa (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1QfVEQ-0005bo-7h for linux-arm-kernel@lists.infradead.org; Sat, 09 Jul 2011 10:57:55 +0000 Received: from [192.168.2.20] (helo=computer2) by plug1 with esmtp (Exim 4.72) (envelope-from ) id 1QfVEM-0003he-RM for linux-arm-kernel@lists.infradead.org; Sat, 09 Jul 2011 11:57:51 +0100 Received: from tixy by computer2 with local (Exim 4.72) (envelope-from ) id 1QfVEM-0005Us-M0 for linux-arm-kernel@lists.infradead.org; Sat, 09 Jul 2011 11:57:50 +0100 From: Tixy To: linux-arm-kernel@lists.infradead.org Subject: [PATCH 37/51] ARM: kprobes: Optimise emulation of LDM and STM Date: Sat, 9 Jul 2011 11:57:24 +0100 Message-Id: <1310209058-20980-38-git-send-email-tixy@yxit.co.uk> X-Mailer: git-send-email 1.7.2.5 In-Reply-To: <1310209058-20980-1-git-send-email-tixy@yxit.co.uk> References: <1310209058-20980-1-git-send-email-tixy@yxit.co.uk> X-Originating-Smarthost01-IP: [82.69.122.217] X-CRM114-Version: 20090807-BlameThorstenAndJenny ( TRE 0.7.6 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20110709_072002_049035_530968DE X-CRM114-Status: GOOD ( 19.63 ) X-Spam-Score: 0.0 (/) X-Spam-Report: SpamAssassin version 3.3.1 on canuck.infradead.org summary: Content analysis details: (0.0 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [212.23.3.27 listed in list.dnswl.org] X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: linux-arm-kernel-bounces@lists.infradead.org Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Sat, 09 Jul 2011 15:44:47 +0000 (UTC) From: Jon Medhurst This patch improves the performance of LDM and STM instruction emulation. This is desirable because. - jprobes and kretprobes probe the first instruction in a function and, when the frame pointer is omitted, this instruction is often a STM used to push registers onto the stack. - The STM and LDM instructions are common in the body and tail of functions. - At the same time as being a common instruction form, they also have one of the slowest and most complicated simulation routines. The approach taken to optimisation is to use simulation rather than emulation, that is, a modified form of the instruction is run with an appropriate register context. Benchmarking on an OMAP3530 shows the optimised emulation is between 2 and 3 times faster than the simulation routines. On a Kirkwood based device the relative performance was very significantly better than this. Signed-off-by: Jon Medhurst Acked-by: Nicolas Pitre --- arch/arm/kernel/kprobes-common.c | 68 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 68 insertions(+), 0 deletions(-) diff --git a/arch/arm/kernel/kprobes-common.c b/arch/arm/kernel/kprobes-common.c index 9ac1427..765c682 100644 --- a/arch/arm/kernel/kprobes-common.c +++ b/arch/arm/kernel/kprobes-common.c @@ -220,13 +220,81 @@ static void __kprobes simulate_ldm1_pc(struct kprobe *p, struct pt_regs *regs) load_write_pc(regs->ARM_pc, regs); } +static void __kprobes +emulate_generic_r0_12_noflags(struct kprobe *p, struct pt_regs *regs) +{ + register void *rregs asm("r1") = regs; + register void *rfn asm("lr") = p->ainsn.insn_fn; + + __asm__ __volatile__ ( + "stmdb sp!, {%[regs], r11} \n\t" + "ldmia %[regs], {r0-r12} \n\t" +#if __LINUX_ARM_ARCH__ >= 6 + "blx %[fn] \n\t" +#else + "str %[fn], [sp, #-4]! \n\t" + "adr lr, 1f \n\t" + "ldr pc, [sp], #4 \n\t" + "1: \n\t" +#endif + "ldr lr, [sp], #4 \n\t" /* lr = regs */ + "stmia lr, {r0-r12} \n\t" + "ldr r11, [sp], #4 \n\t" + : [regs] "=r" (rregs), [fn] "=r" (rfn) + : "0" (rregs), "1" (rfn) + : "r0", "r2", "r3", "r4", "r5", "r6", "r7", + "r8", "r9", "r10", "r12", "memory", "cc" + ); +} + +static void __kprobes +emulate_generic_r2_14_noflags(struct kprobe *p, struct pt_regs *regs) +{ + emulate_generic_r0_12_noflags(p, (struct pt_regs *)(regs->uregs+2)); +} + +static void __kprobes +emulate_ldm_r3_15(struct kprobe *p, struct pt_regs *regs) +{ + emulate_generic_r0_12_noflags(p, (struct pt_regs *)(regs->uregs+3)); + load_write_pc(regs->ARM_pc, regs); +} + enum kprobe_insn __kprobes kprobe_decode_ldmstm(kprobe_opcode_t insn, struct arch_specific_insn *asi) { kprobe_insn_handler_t *handler = 0; unsigned reglist = insn & 0xffff; int is_ldm = insn & 0x100000; + int rn = (insn >> 16) & 0xf; + + if (rn <= 12 && (reglist & 0xe000) == 0) { + /* Instruction only uses registers in the range R0..R12 */ + handler = emulate_generic_r0_12_noflags; + + } else if (rn >= 2 && (reglist & 0x8003) == 0) { + /* Instruction only uses registers in the range R2..R14 */ + rn -= 2; + reglist >>= 2; + handler = emulate_generic_r2_14_noflags; + + } else if (rn >= 3 && (reglist & 0x0007) == 0) { + /* Instruction only uses registers in the range R3..R15 */ + if (is_ldm && (reglist & 0x8000)) { + rn -= 3; + reglist >>= 3; + handler = emulate_ldm_r3_15; + } + } + + if (handler) { + /* We can emulate the instruction in (possibly) modified form */ + asi->insn[0] = (insn & 0xfff00000) | (rn << 16) | reglist; + asi->insn_handler = handler; + return INSN_GOOD; + } + /* Fallback to slower simulation... */ if (reglist & 0x8000) handler = is_ldm ? simulate_ldm1_pc : simulate_stm1_pc; else