diff mbox

[v12,7/7] ARM: kprobes: enable OPTPROBES for ARM 32

Message ID 548189D4.3010907@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wang Nan Dec. 5, 2014, 10:32 a.m. UTC
On 2014/12/5 18:10, Jon Medhurst (Tixy) wrote:
> On Fri, 2014-12-05 at 11:38 +0800, Wang Nan wrote:
>> On 2014/12/5 0:21, Jon Medhurst (Tixy) wrote:
>>> On Thu, 2014-12-04 at 13:36 +0800, Wang Nan wrote:
>>>
>>
>> [trim some text]
>>
>>>
>>> I have retested this patch and on one of the arm test cases I get an
>>> undefined instruction exception in kprobe_arm_test_cases. When this
>>> happens PC points to the second nop below. 
>>>
>>>
>>> 80028a38:	e320f000 	nop	{0}
>>> 80028a3c:	e11000b2 	ldrh	r0, [r0, -r2]
>>> 80028a40:	e320f000 	nop	{0}
>>>
>>> As all three instructions will have probes on them during testing, and
>>> un-optimised probes are implemented by using an undefined instruction to
>>> act as a breakpoint, my first thought was that we have a race condition
>>> somewhere with adding, removing or optimizing probes. Though a reboot a
>>> retest failed in the same way on the same instruction, so I'm not 100%
>>> convinced about strictly timing related bugs.
>>>  
>>
>> Does the problem appear in your platform in each time?
> 
> Three times out of three tries yes. Though the third try was built
> differently and the problem occurred on a different test case.
> 
> 
>>  Currently I have only
>> QEMU machine for testing and haven't seen problem like this before.
> 
> I don't know much about QEMU and have never used it, but I'm assuming
> QEMU doesn't make any attempt to simulate caches like the data cache,
> instruction cache, TLBs, branch predictor? Does it even emulate multiple
> CPUs with multiple host CPU threads? Basically, I very much doubt QEMU
> is a very good test of kernel code in general, and especially code that
> modifies code and has multiple cpus running in parallel.
> 
> Do you not have access to any kind of ARM board to try some testing on?
> 
> 
>>  Could
>> you please provide a detail steps for me to reproduce it? Or do you just
>> enable kprobe test code when booting and this exception simply appear twice?
> 
> I applied the patches on top of Linux 3.18-rc5 and set VERBOSE in
> arm/probes/kprobes/test-core.h to 1. Then built a kernel configured
> using vexpress_defconfig and enabled
> 
> CONFIG_KPROBES=y
> CONFIG_ARM_KPROBES_TEST=y
> CONFIG_DEBUG_INFO=y
> 
> then booted on a Versatile Express board with a TC2 CoreTile (A15/A7
> big.LITTLE CPU).
> 
> The Oops I described happened on two consecutive boots of the board. I
> then tried again setting VERBOSE to 0 and I got a similar OOPs but on a
> different test case.
> 


Before your reply I also did my testing on real hardware platform. I tried 3
times and hit one similar failure. dmesg is pasted at the end of this mail.
After the Oops arises, I wrap the failed testcase with a loop and run it
100 times, they all passed.

I use your test code for testing, with following modification:

and with the kernel config options you mentioned before selected.

The hardware platform I use doesn't have stable BSP for 3.18, so
I have to backport all kprobe related code to 3.10.61.

Both your and mine failure are related to ldrd/h instruction. What
about your second failed testcase?


> I'm worried because this whole optimised kprobes has some rather
> complicated interactions, e.g. can the background thread that changes
> breakpoints to jumps (or back again?) could occur at the same time
> another CPU is processing a kprobe that's been hit, or is in the process
> of removing a probe.
> 

I think x86 should also has to deal with these problems, so if the fault
is caused by these race, they may not ARM specific.

Thank you!

------------ dmesg --------------
...
[ 1398.496592] .long ((0xe089c0df) & 0xFFFFFFFF)
[ 1398.496592] 		@ ldrd r12, [r9], pc
[ 1398.504338] strd	r0, [r1, #-8]	@ e14100f8
[ 1400.241108] strvsd	r8, [r13, #8]	@ 61cd80f8
[ 1402.031108] strd	r4, [r2, #16]!	@ e1e241f0
[ 1403.821107] strvcd	r12, [r11, #-16]!	@ 716bc1f0
[ 1405.641104] strd	r2, [r4], #48	@ e0c423f0
[ 1407.441108] strd	r10, [r9], #-48	@ e049a3f0
[ 1409.261105] strd	r6, [r13, #-64]!	@ e16d64f0
[ 1411.051105] strd r6, [r13, #-64-8]!	@ e16d64f8
[ 1411.101114] strd	r4, [r12, #-64-8]!	@ e16c44f8
[ 1412.871105] .long ((0xe1efc3f0) & 0xFFFFFFFF)
[ 1412.871105] 		@ strd r12, [pc, #48]!
[ 1412.921113] ldrd	r0, [r0, #-8]	@ e14000d8
[ 1413.031116] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[ 1413.037809] Modules linked in:
[ 1413.040876] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.53-HULK2+ #31
[ 1413.047569] task: ef238000 ti: ef226000 task.ti: ef226000
[ 1413.052971] PC is at kprobe_arm_test_cases+0xa1ec/0xfeec
[ 1413.058276] LR is at 0x21522f52
[ 1413.061416] pc : [<c00286d4>]    lr : [<21522f52>]    psr: 18010113
[ 1413.061416] sp : ef227dd8  ip : 21522d52  fp : 21522a52
[ 1413.072876] r10: 21522b52  r9 : 21522852  r8 : 21522952
[ 1413.078095] r7 : 21522652  r6 : 21522752  r5 : 21522452  r4 : 21522552
[ 1413.084613] r3 : 21522252  r2 : 21522352  r1 : 45678eab  r0 : 45678dab
[ 1413.091132] Flags: nzcV  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 1413.098430] Control: 18c53c7d  Table: 8000404a  DAC: 00000015
[ 1413.104167] Process swapper/0 (pid: 1, stack limit = 0xef226238)
[ 1413.110166] Stack: (0xef227dd8 to 0xef228000)
[ 1413.114518] 7dc0:                                                       456789ab 45678aab
[ 1413.122686] 7de0: 45678bab 45678cab 45678dab 45678eab 45678fab 456790ab 456791ab 456792ab
[ 1413.130855] 7e00: 456793ab 456794ab 456795ab ef227e50 456797ab 456798ab 456799ab 45679aab
[ 1413.139022] 7e20: 45679bab 45679cab 45679dab 45679eab 45679fab 4567a0ab 4567a1ab 4567a2ab
[ 1413.147191] 7e40: 4567a3ab 4567a4ab 4567a5ab 4567a6ab 4567a7ab 4567a8ab 4567a9ab 4567aaab
[ 1413.155360] 7e60: 4567abab 4567acab 4567adab 4567aeab 4567afab 4567b0ab 4567b1ab 4567b2ab
[ 1413.163527] 7e80: 4567b3ab 4567b4ab 4567b5ab 4567b6ab 4567b7ab 4567b8ab 4567b9ab 4567baab
[ 1413.171694] 7ea0: 4567bbab 4567bcab 4567bdab 4567beab 4567bfab 4567c0ab 4567c1ab 4567c2ab
[ 1413.179863] 7ec0: 4567c3ab 4567c4ab 4567c5ab 4567c6ab 4567c7ab 4567c8ab c03bd920 c03bdf5e
[ 1413.188031] 7ee0: c02f7708 00000000 c001dd58 00000000 c046526c 00000000 0000000c 00000000
[ 1413.196199] 7f00: c04c1a4c c04427f0 00000001 c02f7708 00000000 00000000 c04c1440 00000007
[ 1413.204367] 7f20: c0465264 c0476520 c04c1440 c044266c c046526c c000868c ef227f64 c0418fac
[ 1413.212534] 7f40: c0418a2c 00000058 00000007 00000007 00000001 00000007 c0465264 c0476520
[ 1413.220702] 7f60: c04c1440 00000058 c043a478 c046526c 00000000 c043ac50 00000007 00000007
[ 1413.228869] 7f80: c043a478 ef226000 c04c1440 c02d8e9c 00000000 00000000 00000000 00000000
[ 1413.237036] 7fa0: 00000000 c02d8ea8 00000000 c000e138 00000000 00000000 00000000 00000000
[ 1413.245203] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1413.253371] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 1413.261545] [<c00286d4>] (kprobe_arm_test_cases+0xa1ec/0xfeec) from [<45678cab>] (0x45678cab)
[ 1413.270060] Code: 00002000 000c000c e7f001f8 e14000d8 (eabf69b9)
[ 1413.276164] ---[ end trace 81706e7e45d860af ]---
[ 1413.280776] Kernel panic - not syncing: Fatal exception
[ 1413.285996] CPU1: stopping
[ 1413.288704] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D      3.10.53-HULK2+ #31
[ 1413.296372] [<c0014b84>] (unwind_backtrace+0x0/0x120) from [<c00118a8>] (show_stack+0x10/0x14)
[ 1413.304977] [<c00118a8>] (show_stack+0x10/0x14) from [<c00137b4>] (handle_IPI+0xc0/0x124)
[ 1413.313148] [<c00137b4>] (handle_IPI+0xc0/0x124) from [<c0008530>] (gic_handle_irq+0x58/0x60)
[ 1413.321675] [<c0008530>] (gic_handle_irq+0x58/0x60) from [<c02e6900>] (__irq_svc+0x40/0x50)
[ 1413.330013] Exception stack(0xef259fa0 to 0xef259fe8)

Comments

Jon Medhurst (Tixy) Dec. 5, 2014, 10:48 a.m. UTC | #1
On Fri, 2014-12-05 at 18:32 +0800, Wang Nan wrote:
> Both your and mine failure are related to ldrd/h instruction. What
> about your second failed testcase?

Can't remember exactly, but it was some instruction that didn't access
memory I remember that much.

I've also just run the tests on an A9 CPU and get the Oops on the nop
after the 'wfi' instruction test. So looks like the problem isn't
related to particular test cases.
diff mbox

Patch

--- ../temp/arch/arm/probes/kprobes/test-core.c	2014-12-05 15:42:28.000000000 +0800
+++ ./arch/arm/probes/kprobes/test-core.c	2014-12-05 16:06:18.000000000 +0800
@@ -311,6 +311,7 @@ 
 	pre_handler_called = test_func_instance;
 	if (regs->ARM_r0 == FUNC_ARG1 && regs->ARM_r1 == FUNC_ARG2)
 		test_regs_ok = true;
+	post_handler_called = test_func_instance + 1;
 	return 0;
 }

@@ -325,7 +326,7 @@ 
 static struct kprobe the_kprobe = {
 	.addr		= 0,
 	.pre_handler	= pre_handler,
-	.post_handler	= post_handler
+	.post_handler	= NULL
 };

 static int test_kprobe(long (*func)(long, long))
@@ -346,6 +347,7 @@ 

 	if (!ret)
 		return -EINVAL;
+#if 0
 	if (pre_handler_called != test_func_instance) {
 		pr_err("FAIL: kprobe pre_handler not called\n");
 		return -EINVAL;
@@ -361,7 +363,7 @@ 
 		pr_err("FAIL: probe called after unregistering\n");
 		return -EINVAL;
 	}
-
+#endif
 	return 0;
 }