diff mbox

[RFC,2/3] arm64: refactor save_stack_trace()

Message ID 20150716102405.2cc8c406@gandalf.local.home (mailing list archive)
State New, archived
Headers show

Commit Message

Steven Rostedt July 16, 2015, 2:24 p.m. UTC
On Thu, 16 Jul 2015 22:29:05 +0900
Jungseok Lee <jungseoklee85@gmail.com> wrote:


> First of all, let's look at the following data.
> 
> 1) stack_trace data
>         Depth    Size   Location    (55 entries)
>         -----    ----   --------
>   0)     4808      16   notifier_call_chain+0x2c/0x94
>   1)     4792      64   raw_notifier_call_chain+0x34/0x44
>   2)     4728      48   timekeeping_update.constprop.9+0xb8/0x114
>   3)     4680      48   update_wall_time+0x408/0x6dc
>   4)     4632     128   tick_do_update_jiffies64+0xd8/0x154
>   5)     4504      80   tick_sched_do_timer+0x50/0x60
>   6)     4424      32   tick_sched_timer+0x34/0x90
>   7)     4392      48   __run_hrtimer+0x60/0x258
>   8)     4344      64   hrtimer_interrupt+0xe8/0x260
>   9)     4280     128   arch_timer_handler_virt+0x38/0x48
>  10)     4152      32   handle_percpu_devid_irq+0x84/0x188
>  11)     4120      64   generic_handle_irq+0x38/0x54
>  12)     4056      32   __handle_domain_irq+0x68/0xbc
>  13)     4024      64   gic_handle_irq+0x38/0x88
>  14)     3960     336   el1_irq+0x64/0xd8
>  15)     3624      16   netif_rx_internal+0x14/0x198
>  16)     3608      64   netif_rx+0x20/0xa4
>  17)     3544      32   loopback_xmit+0x64/0xf4
>  18)     3512      48   dev_hard_start_xmit+0x25c/0x3f8
>  19)     3464     160   __dev_queue_xmit+0x440/0x4dc
>  20)     3304      96   dev_queue_xmit_sk+0x20/0x30
>  21)     3208      32   ip_finish_output+0x1e0/0xabc
>  22)     3176      96   ip_output+0xf0/0x120
>  23)     3080      64   ip_local_out_sk+0x44/0x54
>  24)     3016      32   ip_send_skb+0x24/0xbc
>  25)     2984      48   ip_push_pending_frames+0x40/0x60
>  26)     2936      64   icmp_push_reply+0x104/0x144
>  27)     2872      96   icmp_send+0x3c0/0x3c8
>  28)     2776     192   __udp4_lib_rcv+0x5b8/0x684
>  29)     2584      96   udp_rcv+0x2c/0x3c
>  30)     2488      32   ip_local_deliver+0xa0/0x224
>  31)     2456      48   ip_rcv+0x360/0x57c
>  32)     2408      64   __netif_receive_skb_core+0x4d0/0x80c
>  33)     2344     128   __netif_receive_skb+0x24/0x84
>  34)     2216      32   process_backlog+0x9c/0x15c
>  35)     2184      80   net_rx_action+0x1ec/0x32c
>  36)     2104     160   __do_softirq+0x114/0x2f0
>  37)     1944     128   do_softirq+0x60/0x68
>  38)     1816      32   __local_bh_enable_ip+0xb0/0xd4
>  39)     1784      32   ip_finish_output+0x1f4/0xabc
>  40)     1752      96   ip_output+0xf0/0x120
>  41)     1656      64   ip_local_out_sk+0x44/0x54
>  42)     1592      32   ip_send_skb+0x24/0xbc
>  43)     1560      48   udp_send_skb+0x1b4/0x2f4
>  44)     1512      80   udp_sendmsg+0x2a8/0x7a0
>  45)     1432     272   inet_sendmsg+0xa0/0xd0
>  46)     1160      48   sock_sendmsg+0x30/0x78
>  47)     1112      32   ___sys_sendmsg+0x15c/0x26c
>  48)     1080     400   __sys_sendmmsg+0x94/0x180
>  49)      680     320   SyS_sendmmsg+0x38/0x54
>  50)      360     360   el0_svc_naked+0x20/0x28
> 
> 2) stack_max_size data
> 4888
> 
> The data looks odd in two points.
> 1) the number of entry
> There is a mismatch between start token and real data

Yep, good catch. As soon as I read that, I realized exactly what the
issue was ;-)

> 
> 2) 80-byte gap
> stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.
> 
> IMHO, two items are not considered in this series as digging them out.
> 
> 1) skipped entries
> As x variable is introduced in Steve's patch, it is needed to track down
> how many entries are recorded in both stack_dump_trace and stack_dump_index.

Yep.

> 
> 2) max_stack_trace.skip value
> max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
> observed unless the value is not considered in arch code. In the above example,
> 80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.
> 
> As applying the following fix, stack_trace and stack_max_size are okay.
> However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
> The latter one is responsible for it under current implementation, not Steve's change.
> 
> Please correct me if I am wrong.

No, it's a bug in my patch. I'll make an update.

Does this new patch fix it for you?

-- Steve

---
 kernel/trace/trace_stack.c |   58 +++++++++++++--------------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

Comments

Jungseok Lee July 16, 2015, 3:01 p.m. UTC | #1
On Jul 16, 2015, at 11:24 PM, Steven Rostedt wrote:

Hi, Steve

> On Thu, 16 Jul 2015 22:29:05 +0900
> Jungseok Lee <jungseoklee85@gmail.com> wrote:

[ snip ]

>> The data looks odd in two points.
>> 1) the number of entry
>> There is a mismatch between start token and real data
> 
> Yep, good catch. As soon as I read that, I realized exactly what the
> issue was ;-)
> 
>> 
>> 2) 80-byte gap
>> stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.
>> 
>> IMHO, two items are not considered in this series as digging them out.
>> 
>> 1) skipped entries
>> As x variable is introduced in Steve's patch, it is needed to track down
>> how many entries are recorded in both stack_dump_trace and stack_dump_index.
> 
> Yep.
> 
>> 
>> 2) max_stack_trace.skip value
>> max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
>> observed unless the value is not considered in arch code. In the above example,
>> 80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.
>> 
>> As applying the following fix, stack_trace and stack_max_size are okay.
>> However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
>> The latter one is responsible for it under current implementation, not Steve's change.
>> 
>> Please correct me if I am wrong.
> 
> No, it's a bug in my patch. I'll make an update.
> 
> Does this new patch fix it for you?

I've gathered stack tracer data with your update.

1) stack_trace
        Depth    Size   Location    (35 entries)
        -----    ----   --------
  0)     4424      16   put_cpu_partial+0x28/0x1d0
  1)     4408      80   get_partial_node.isra.64+0x13c/0x344
  2)     4328     256   __slab_alloc.isra.65.constprop.67+0xd8/0x37c
  3)     4072      32   kmem_cache_alloc+0x258/0x294
  4)     4040     304   __alloc_skb+0x48/0x180
  5)     3736      96   alloc_skb_with_frags+0x74/0x234
  6)     3640     112   sock_alloc_send_pskb+0x1d0/0x294
  7)     3528     160   sock_alloc_send_skb+0x44/0x54
  8)     3368      64   __ip_append_data.isra.40+0x78c/0xb48
  9)     3304     224   ip_append_data.part.42+0x98/0xe8
 10)     3080     112   ip_append_data+0x68/0x7c
 11)     2968      96   icmp_push_reply+0x7c/0x144
 12)     2872      96   icmp_send+0x3c0/0x3c8
 13)     2776     192   __udp4_lib_rcv+0x5b8/0x684
 14)     2584      96   udp_rcv+0x2c/0x3c
 15)     2488      32   ip_local_deliver+0xa0/0x224
 16)     2456      48   ip_rcv+0x360/0x57c
 17)     2408      64   __netif_receive_skb_core+0x4d0/0x80c
 18)     2344     128   __netif_receive_skb+0x24/0x84
 19)     2216      32   process_backlog+0x9c/0x15c
 20)     2184      80   net_rx_action+0x1ec/0x32c
 21)     2104     160   __do_softirq+0x114/0x2f0
 22)     1944     128   do_softirq+0x60/0x68
 23)     1816      32   __local_bh_enable_ip+0xb0/0xd4
 24)     1784      32   ip_finish_output+0x1f4/0xabc
 25)     1752      96   ip_output+0xf0/0x120
 26)     1656      64   ip_local_out_sk+0x44/0x54
 27)     1592      32   ip_send_skb+0x24/0xbc
 28)     1560      48   udp_send_skb+0x1b4/0x2f4
 29)     1512      80   udp_sendmsg+0x2a8/0x7a0
 30)     1432     272   inet_sendmsg+0xa0/0xd0
 31)     1160      48   sock_sendmsg+0x30/0x78
 32)     1112      32   ___sys_sendmsg+0x15c/0x26c
 33)     1080     400   __sys_sendmmsg+0x94/0x180
 34)      680     320   SyS_sendmmsg+0x38/0x54
 35)      360     360   el0_svc_naked+0x20/0x28

2) stack_max_size
4504

In case of the number of entries, the following diff might be needed
as I suggested in the previous reply. ;)

----8<----

@@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
 		seq_printf(m, "        Depth    Size   Location"
 			   "    (%d entries)\n"
 			   "        -----    ----   --------\n",
-			   max_stack_trace.nr_entries - 1);
+			   max_stack_trace.nr_entries);
 
 		if (!stack_tracer_enabled && !max_stack_size)
 			print_disabled(m);

----8<----

However, 80-byte gap still appears.

Since max_stack_trace.skip is 3 in your update, save_stack_trace in arm64
should be refactored to align with this value. 

max_stack_trace.skip should be set to 4 if AKASHI's [RFC 2/3] patch is merged.
However, arch code is supposed to follow generic framework's rule in this case.
Isn't it?

Best Regards
Jungseok Lee
Steven Rostedt July 16, 2015, 3:31 p.m. UTC | #2
On Fri, 17 Jul 2015 00:01:25 +0900
Jungseok Lee <jungseoklee85@gmail.com> wrote:

> I've gathered stack tracer data with your update.
> 
> 1) stack_trace
>         Depth    Size   Location    (35 entries)
>         -----    ----   --------
>   0)     4424      16   put_cpu_partial+0x28/0x1d0
>   1)     4408      80   get_partial_node.isra.64+0x13c/0x344
>   2)     4328     256   __slab_alloc.isra.65.constprop.67+0xd8/0x37c
>   3)     4072      32   kmem_cache_alloc+0x258/0x294
>   4)     4040     304   __alloc_skb+0x48/0x180
>   5)     3736      96   alloc_skb_with_frags+0x74/0x234
>   6)     3640     112   sock_alloc_send_pskb+0x1d0/0x294
>   7)     3528     160   sock_alloc_send_skb+0x44/0x54
>   8)     3368      64   __ip_append_data.isra.40+0x78c/0xb48
>   9)     3304     224   ip_append_data.part.42+0x98/0xe8
>  10)     3080     112   ip_append_data+0x68/0x7c
>  11)     2968      96   icmp_push_reply+0x7c/0x144
>  12)     2872      96   icmp_send+0x3c0/0x3c8
>  13)     2776     192   __udp4_lib_rcv+0x5b8/0x684
>  14)     2584      96   udp_rcv+0x2c/0x3c
>  15)     2488      32   ip_local_deliver+0xa0/0x224
>  16)     2456      48   ip_rcv+0x360/0x57c
>  17)     2408      64   __netif_receive_skb_core+0x4d0/0x80c
>  18)     2344     128   __netif_receive_skb+0x24/0x84
>  19)     2216      32   process_backlog+0x9c/0x15c
>  20)     2184      80   net_rx_action+0x1ec/0x32c
>  21)     2104     160   __do_softirq+0x114/0x2f0
>  22)     1944     128   do_softirq+0x60/0x68
>  23)     1816      32   __local_bh_enable_ip+0xb0/0xd4
>  24)     1784      32   ip_finish_output+0x1f4/0xabc
>  25)     1752      96   ip_output+0xf0/0x120
>  26)     1656      64   ip_local_out_sk+0x44/0x54
>  27)     1592      32   ip_send_skb+0x24/0xbc
>  28)     1560      48   udp_send_skb+0x1b4/0x2f4
>  29)     1512      80   udp_sendmsg+0x2a8/0x7a0
>  30)     1432     272   inet_sendmsg+0xa0/0xd0
>  31)     1160      48   sock_sendmsg+0x30/0x78
>  32)     1112      32   ___sys_sendmsg+0x15c/0x26c
>  33)     1080     400   __sys_sendmmsg+0x94/0x180
>  34)      680     320   SyS_sendmmsg+0x38/0x54
>  35)      360     360   el0_svc_naked+0x20/0x28
> 
> 2) stack_max_size
> 4504

Strange, on x86 I have this (with my patch applied):

        Depth    Size   Location    (39 entries)
        -----    ----   --------
  0)     3704      64   _raw_spin_lock+0x5/0x30
  1)     3640     200   get_partial_node.isra.80+0x54/0x1da
  2)     3440     208   __slab_alloc.isra.82+0x199/0x3f7
  3)     3232      80   kmem_cache_alloc+0x151/0x160
  4)     3152      16   mempool_alloc_slab+0x15/0x20
  5)     3136     128   mempool_alloc+0x58/0x150
  6)     3008      16   scsi_sg_alloc+0x42/0x50
  7)     2992     112   __sg_alloc_table+0x10b/0x150
  8)     2880      48   scsi_alloc_sgtable+0x43/0x80
  9)     2832      32   scsi_init_sgtable+0x2b/0x70
 10)     2800      80   scsi_init_io+0x59/0x1e0
 11)     2720     128   sd_init_command+0x66/0xd80
 12)     2592      24   scsi_setup_cmnd+0xa9/0x160
 13)     2568      88   scsi_prep_fn+0x7d/0x160
 14)     2480      48   blk_peek_request+0x168/0x2a0
 15)     2432     112   scsi_request_fn+0x3f/0x610
 16)     2320       8   __blk_run_queue+0x37/0x50
 17)     2312     104   queue_unplugged+0x41/0xe0
 18)     2208     112   blk_flush_plug_list+0x1b7/0x1e0
 19)     2096      80   blk_queue_bio+0x257/0x340
 20)     2016      48   generic_make_request+0xb1/0xf0
 21)     1968      96   submit_bio+0x76/0x130
 22)     1872      48   submit_bh_wbc.isra.35+0x10b/0x140
 23)     1824     112   __block_write_full_page.constprop.40+0x188/0x310
 24)     1712      64   block_write_full_page+0xdd/0x130
 25)     1648      16   blkdev_writepage+0x18/0x20
 26)     1632       8   __writepage+0x17/0x40
 27)     1624     312   write_cache_pages+0x21e/0x480
 28)     1312      96   generic_writepages+0x4a/0x70
 29)     1216      16   do_writepages+0x20/0x30
 30)     1200      96   __writeback_single_inode+0x45/0x350
 31)     1104     176   writeback_sb_inodes+0x218/0x3d0
 32)      928      80   __writeback_inodes_wb+0x8c/0xc0
 33)      848     128   wb_writeback+0x239/0x2c0
 34)      720     192   wb_workfn+0x24b/0x460
 35)      528      80   process_one_work+0x14b/0x430
 36)      448     128   worker_thread+0x117/0x460
 37)      320     144   kthread+0xc9/0xe0
 38)      176     176   ret_from_fork+0x3f/0x70

 # cat /debug/tracing/stack_max_size 
3704


> 
> In case of the number of entries, the following diff might be needed
> as I suggested in the previous reply. ;)
> 
> ----8<----
> 
> @@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
>  		seq_printf(m, "        Depth    Size   Location"
>  			   "    (%d entries)\n"
>  			   "        -----    ----   --------\n",
> -			   max_stack_trace.nr_entries - 1);
> +			   max_stack_trace.nr_entries);

This would break x86.

>  
>  		if (!stack_tracer_enabled && !max_stack_size)
>  			print_disabled(m);
> 
> ----8<----
> 
> However, 80-byte gap still appears.

This seems to be specific to your arch.

> 
> Since max_stack_trace.skip is 3 in your update, save_stack_trace in arm64
> should be refactored to align with this value. 
> 
> max_stack_trace.skip should be set to 4 if AKASHI's [RFC 2/3] patch is merged.
> However, arch code is supposed to follow generic framework's rule in this case.
> Isn't it?
>

yeah, you don't want to update the skip level. It should just work.

I'll run this on my powerpc box and see if it shows something
different. If I have to, I'll even boot up my arm (not 64) board and
try it there.


-- Steve
Jungseok Lee July 16, 2015, 3:52 p.m. UTC | #3
On Jul 17, 2015, at 12:31 AM, Steven Rostedt wrote:
> On Fri, 17 Jul 2015 00:01:25 +0900
> Jungseok Lee <jungseoklee85@gmail.com> wrote:
> 
>> I've gathered stack tracer data with your update.
>> 
>> 1) stack_trace
>>        Depth    Size   Location    (35 entries)
>>        -----    ----   --------
>>  0)     4424      16   put_cpu_partial+0x28/0x1d0
>>  1)     4408      80   get_partial_node.isra.64+0x13c/0x344
>>  2)     4328     256   __slab_alloc.isra.65.constprop.67+0xd8/0x37c
>>  3)     4072      32   kmem_cache_alloc+0x258/0x294
>>  4)     4040     304   __alloc_skb+0x48/0x180
>>  5)     3736      96   alloc_skb_with_frags+0x74/0x234
>>  6)     3640     112   sock_alloc_send_pskb+0x1d0/0x294
>>  7)     3528     160   sock_alloc_send_skb+0x44/0x54
>>  8)     3368      64   __ip_append_data.isra.40+0x78c/0xb48
>>  9)     3304     224   ip_append_data.part.42+0x98/0xe8
>> 10)     3080     112   ip_append_data+0x68/0x7c
>> 11)     2968      96   icmp_push_reply+0x7c/0x144
>> 12)     2872      96   icmp_send+0x3c0/0x3c8
>> 13)     2776     192   __udp4_lib_rcv+0x5b8/0x684
>> 14)     2584      96   udp_rcv+0x2c/0x3c
>> 15)     2488      32   ip_local_deliver+0xa0/0x224
>> 16)     2456      48   ip_rcv+0x360/0x57c
>> 17)     2408      64   __netif_receive_skb_core+0x4d0/0x80c
>> 18)     2344     128   __netif_receive_skb+0x24/0x84
>> 19)     2216      32   process_backlog+0x9c/0x15c
>> 20)     2184      80   net_rx_action+0x1ec/0x32c
>> 21)     2104     160   __do_softirq+0x114/0x2f0
>> 22)     1944     128   do_softirq+0x60/0x68
>> 23)     1816      32   __local_bh_enable_ip+0xb0/0xd4
>> 24)     1784      32   ip_finish_output+0x1f4/0xabc
>> 25)     1752      96   ip_output+0xf0/0x120
>> 26)     1656      64   ip_local_out_sk+0x44/0x54
>> 27)     1592      32   ip_send_skb+0x24/0xbc
>> 28)     1560      48   udp_send_skb+0x1b4/0x2f4
>> 29)     1512      80   udp_sendmsg+0x2a8/0x7a0
>> 30)     1432     272   inet_sendmsg+0xa0/0xd0
>> 31)     1160      48   sock_sendmsg+0x30/0x78
>> 32)     1112      32   ___sys_sendmsg+0x15c/0x26c
>> 33)     1080     400   __sys_sendmmsg+0x94/0x180
>> 34)      680     320   SyS_sendmmsg+0x38/0x54
>> 35)      360     360   el0_svc_naked+0x20/0x28
>> 
>> 2) stack_max_size
>> 4504
> 
> Strange, on x86 I have this (with my patch applied):
> 
>        Depth    Size   Location    (39 entries)
>        -----    ----   --------
>  0)     3704      64   _raw_spin_lock+0x5/0x30
>  1)     3640     200   get_partial_node.isra.80+0x54/0x1da
>  2)     3440     208   __slab_alloc.isra.82+0x199/0x3f7
>  3)     3232      80   kmem_cache_alloc+0x151/0x160
>  4)     3152      16   mempool_alloc_slab+0x15/0x20
>  5)     3136     128   mempool_alloc+0x58/0x150
>  6)     3008      16   scsi_sg_alloc+0x42/0x50
>  7)     2992     112   __sg_alloc_table+0x10b/0x150
>  8)     2880      48   scsi_alloc_sgtable+0x43/0x80
>  9)     2832      32   scsi_init_sgtable+0x2b/0x70
> 10)     2800      80   scsi_init_io+0x59/0x1e0
> 11)     2720     128   sd_init_command+0x66/0xd80
> 12)     2592      24   scsi_setup_cmnd+0xa9/0x160
> 13)     2568      88   scsi_prep_fn+0x7d/0x160
> 14)     2480      48   blk_peek_request+0x168/0x2a0
> 15)     2432     112   scsi_request_fn+0x3f/0x610
> 16)     2320       8   __blk_run_queue+0x37/0x50
> 17)     2312     104   queue_unplugged+0x41/0xe0
> 18)     2208     112   blk_flush_plug_list+0x1b7/0x1e0
> 19)     2096      80   blk_queue_bio+0x257/0x340
> 20)     2016      48   generic_make_request+0xb1/0xf0
> 21)     1968      96   submit_bio+0x76/0x130
> 22)     1872      48   submit_bh_wbc.isra.35+0x10b/0x140
> 23)     1824     112   __block_write_full_page.constprop.40+0x188/0x310
> 24)     1712      64   block_write_full_page+0xdd/0x130
> 25)     1648      16   blkdev_writepage+0x18/0x20
> 26)     1632       8   __writepage+0x17/0x40
> 27)     1624     312   write_cache_pages+0x21e/0x480
> 28)     1312      96   generic_writepages+0x4a/0x70
> 29)     1216      16   do_writepages+0x20/0x30
> 30)     1200      96   __writeback_single_inode+0x45/0x350
> 31)     1104     176   writeback_sb_inodes+0x218/0x3d0
> 32)      928      80   __writeback_inodes_wb+0x8c/0xc0
> 33)      848     128   wb_writeback+0x239/0x2c0
> 34)      720     192   wb_workfn+0x24b/0x460
> 35)      528      80   process_one_work+0x14b/0x430
> 36)      448     128   worker_thread+0x117/0x460
> 37)      320     144   kthread+0xc9/0xe0
> 38)      176     176   ret_from_fork+0x3f/0x70
> 
> # cat /debug/tracing/stack_max_size 
> 3704
> 
> 
>> 
>> In case of the number of entries, the following diff might be needed
>> as I suggested in the previous reply. ;)
>> 
>> ----8<----
>> 
>> @@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
>> 		seq_printf(m, "        Depth    Size   Location"
>> 			   "    (%d entries)\n"
>> 			   "        -----    ----   --------\n",
>> -			   max_stack_trace.nr_entries - 1);
>> +			   max_stack_trace.nr_entries);
> 
> This would break x86.

Thanks for x86 data. It's really helpful!

> 
>> 
>> 		if (!stack_tracer_enabled && !max_stack_size)
>> 			print_disabled(m);
>> 
>> ----8<----
>> 
>> However, 80-byte gap still appears.
> 
> This seems to be specific to your arch.

Totally agree.

Best Regards
Jungseok Lee
diff mbox

Patch

Index: linux-trace.git/kernel/trace/trace_stack.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_stack.c	2015-07-15 12:49:59.290063597 -0400
+++ linux-trace.git/kernel/trace/trace_stack.c	2015-07-16 09:55:37.936188197 -0400
@@ -18,12 +18,6 @@ 
 
 #define STACK_TRACE_ENTRIES 500
 
-#ifdef CC_USING_FENTRY
-# define fentry		1
-#else
-# define fentry		0
-#endif
-
 static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
 	 { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
 static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
@@ -35,7 +29,7 @@  static unsigned stack_dump_index[STACK_T
  */
 static struct stack_trace max_stack_trace = {
 	.max_entries		= STACK_TRACE_ENTRIES - 1,
-	.entries		= &stack_dump_trace[1],
+	.entries		= &stack_dump_trace[0],
 };
 
 static unsigned long max_stack_size;
@@ -77,7 +71,7 @@  check_stack(unsigned long ip, unsigned l
 	unsigned long this_size, flags; unsigned long *p, *top, *start;
 	static int tracer_frame;
 	int frame_size = ACCESS_ONCE(tracer_frame);
-	int i;
+	int i, x;
 
 	this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
 	this_size = THREAD_SIZE - this_size;
@@ -105,26 +99,20 @@  check_stack(unsigned long ip, unsigned l
 	max_stack_size = this_size;
 
 	max_stack_trace.nr_entries = 0;
-
-	if (using_ftrace_ops_list_func())
-		max_stack_trace.skip = 4;
-	else
-		max_stack_trace.skip = 3;
+	max_stack_trace.skip = 3;
 
 	save_stack_trace(&max_stack_trace);
 
-	/*
-	 * Add the passed in ip from the function tracer.
-	 * Searching for this on the stack will skip over
-	 * most of the overhead from the stack tracer itself.
-	 */
-	stack_dump_trace[0] = ip;
-	max_stack_trace.nr_entries++;
+	/* Skip over the overhead of the stack tracer itself */
+	for (i = 0; i < max_stack_trace.nr_entries; i++) {
+		if (stack_dump_trace[i] == ip)
+			break;
+	}
 
 	/*
 	 * Now find where in the stack these are.
 	 */
-	i = 0;
+	x = 0;
 	start = stack;
 	top = (unsigned long *)
 		(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
@@ -139,12 +127,13 @@  check_stack(unsigned long ip, unsigned l
 	while (i < max_stack_trace.nr_entries) {
 		int found = 0;
 
-		stack_dump_index[i] = this_size;
+		stack_dump_index[x] = this_size;
 		p = start;
 
 		for (; p < top && i < max_stack_trace.nr_entries; p++) {
 			if (*p == stack_dump_trace[i]) {
-				this_size = stack_dump_index[i++] =
+				stack_dump_trace[x] = stack_dump_trace[i++];
+				this_size = stack_dump_index[x++] =
 					(top - p) * sizeof(unsigned long);
 				found = 1;
 				/* Start the search from here */
@@ -168,6 +157,10 @@  check_stack(unsigned long ip, unsigned l
 			i++;
 	}
 
+	max_stack_trace.nr_entries = x;
+	for (; x < i; x++)
+		stack_dump_trace[x] = ULONG_MAX;
+
 	if (task_stack_end_corrupted(current)) {
 		print_max_stack();
 		BUG();
@@ -192,24 +185,7 @@  stack_trace_call(unsigned long ip, unsig
 	if (per_cpu(trace_active, cpu)++ != 0)
 		goto out;
 
-	/*
-	 * When fentry is used, the traced function does not get
-	 * its stack frame set up, and we lose the parent.
-	 * The ip is pretty useless because the function tracer
-	 * was called before that function set up its stack frame.
-	 * In this case, we use the parent ip.
-	 *
-	 * By adding the return address of either the parent ip
-	 * or the current ip we can disregard most of the stack usage
-	 * caused by the stack tracer itself.
-	 *
-	 * The function tracer always reports the address of where the
-	 * mcount call was, but the stack will hold the return address.
-	 */
-	if (fentry)
-		ip = parent_ip;
-	else
-		ip += MCOUNT_INSN_SIZE;
+	ip += MCOUNT_INSN_SIZE;
 
 	check_stack(ip, &stack);