@@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
struct cs_etm_packet *tmp;
int ret;
u8 trace_chan_id = tidq->trace_chan_id;
- u64 instrs_executed = tidq->packet->instr_count;
+ u64 instrs_executed = tidq->prev_packet->instr_count;
tidq->period_instructions += instrs_executed;
@@ -1445,7 +1445,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
*/
u64 offset = (instrs_executed - instrs_over - 1);
u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
- tidq->packet, offset);
+ tidq->prev_packet, offset);
ret = cs_etm__synth_instruction_sample(
etmq, tidq, addr, etm->instructions_sample_period);
cs_etm__sample() uses 'tidq->packet' rather than 'tidq->prev_packet' to
generate the instruction sample, whereas the thread stack and the branch
samples use 'tidq->prev_packet'. The instruction sample therefore uses a
packet that is one ahead of the thread stack and the branch samples. As
a result, the instruction's call chain can be displayed wrongly, as
below:

  main  1579        100      instructions:
      ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
      ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
      ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
      ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
      ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
      ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
      ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

In this log, every two consecutive lines show two functions: the upper
line contains the child function and the lower line contains its parent
function, and so forth. But looking at the first two lines:

  perf_event_update_userpage+0x4c => the sampled instruction
  perf_event_update_userpage+0x48 => the parent function's call site

The child function and the parent function are the same function,
perf_event_update_userpage(). Since perf_event_update_userpage() is not
a recursive function, this calling sequence should never happen.

This is caused by the instruction sample using 'tidq->packet', which has
not been handled by the thread stack yet: the thread stack lags by one
packet and has not yet processed the return packet that pops the stack.

To fix this issue, simply use 'tidq->prev_packet' to generate the
instruction sample; this allows the thread stack to push/pop properly
before the instruction sample is generated.
With this fix, the result is as below:

  main  1579        100      instructions:
      ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
      ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
      ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
      ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
      ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
      ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)