Message ID | 20200203015203.27882-4-leo.yan@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | perf cs-etm: Fix synthesizing instruction samples | expand |
Hi Leo, There are a couple of typos in the comments below, but I also believe that the sample loop could be considerably simplified On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote: > > When 'etm->instructions_sample_period' is less than > 'tidq->period_instructions', the function cs_etm__sample() cannot handle > this case properly with its logic. > > Let's see below flow as an example: > > - If we set itrace option '--itrace=i4', then function cs_etm__sample() > has variables with initialized values: > > tidq->period_instructions = 0 > etm->instructions_sample_period = 4 > > - When the first packet is coming: > > packet->instr_count = 10; the number of instructions executed in this > packet is 10, thus update period_instructions as below: > > tidq->period_instructions = 0 + 10 = 10 > instrs_over = 10 - 4 = 6 > offset = 10 - 6 - 1 = 3 > tidq->period_instructions = instrs_over = 6 > > - When the second packet is coming: > > packet->instr_count = 10; in the second pass, assume 10 instructions > in the trace sample again: > > tidq->period_instructions = 6 + 10 = 16 > instrs_over = 16 - 4 = 12 > offset = 10 - 12 - 1 = -3 -> the negative value > tidq->period_instructions = instrs_over = 12 > > So after handle these two packets, there have below issues: > > The first issue is that cs_etm__instr_addr() returns the address within > the current trace sample of the instruction related to offset, so the > offset is supposed to be always unsigned value. But in fact, function > cs_etm__sample() might calculate a negative offset value (in handling > the second packet, the offset is -3) and pass to cs_etm__instr_addr() > with u64 type with a big positive integer. > > The second issue is it only synthesizes 2 samples for sample period = 4. > In theory, every packet has 10 instructions so the two packets have > total 20 instructions, 20 instructions should generate 5 samples > (4 x 5 = 20). This is because cs_etm__sample() only calls once > cs_etm__synth_instruction_sample() to generate instruction sample per > range packet. > > This patch fixes the logic in function cs_etm__sample(); the basic > idea is to divide into three parts for handling coming packet: > > - The first part is for synthesizing the first instruction sample, it > combines the instructions from the tail of previous packet and the > instructions from the head of the new packet; > - The second part is to simply generate samples with sample period > aligned; > - The third part is the tail of new packet, the rest instructions will > be left for the sequential sample handling. > > Suggested-by: Mike Leach <mike.leach@linaro.org> > Signed-off-by: Leo Yan <leo.yan@linaro.org> > --- > tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++----- > 1 file changed, 92 insertions(+), 13 deletions(-) > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > index 3e28462609e7..c5a05f728eac 100644 > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c > @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, > * TODO: allow period to be defined in cycles and clock time > */ > > - /* Get number of instructions executed after the sample point */ > - u64 instrs_over = tidq->period_instructions - > - etm->instructions_sample_period; > + /* > + * Below diagram demonstrates the instruction samples > + * generation flows: > + * > + * Instrs Instrs Instrs Instrs > + * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3) > + * | | | | > + * V V V V > + * -------------------------------------------------- > + * ^ ^ > + * | | > + * Period Period > + * instructions(Pi) instructions(Pi') > + * > + * | | > + * \---------------- -----------------/ > + * V > + * instrs_executed > + * > + * Period instructions (Pi) contains the the number of > + * instructions executed after the sample point(n). When a new > + * instruction packet is coming and generate for the next sample > + * (n+1), it combines with two parts instructions, one is the > + * tail of the old packet and another is the head of the new > + * coming packet. So 'head' variable is used to cauclate the typo : s/cauclate/calculate > + * instruction numbers in the new packet for sample(n+1). > + * > + * Sample(n+2) and sample(n+3) consume the instructions with > + * sample period, so directly generate samples based on the > + * sampe period. > + * typo: s/sampe/sample > + * After sample(n+3), the rest instructions will be used by > + * later packet; so use 'instrs_over' to track the rest > + * instruction number and it is assigned to > + * 'tidq->period_instructions' for next round calculation. > + */ > + u64 head, offset = 0; > + u64 addr; > > /* > - * Calculate the address of the sampled instruction (-1 as > - * sample is reported as though instruction has just been > - * executed, but PC has not advanced to next instruction) > + * 'instrs_over' is the number of instructions executed after > + * sample points, initialise it to 'instrs_executed' and will > + * decrease it for consumed instructions in every synthesized > + * instruction sample. > */ > - u64 offset = (instrs_executed - instrs_over - 1); > - u64 addr = cs_etm__instr_addr(etmq, trace_chan_id, > - tidq->packet, offset); > + u64 instrs_over = instrs_executed; > > - ret = cs_etm__synth_instruction_sample( > - etmq, tidq, addr, etm->instructions_sample_period); > - if (ret) > - return ret; > + /* > + * 'head' is the instructions number of the head in the new > + * packet, it combines with the tail of previous packet to > + * generate a sample. So 'head' uses the sample period to > + * decrease the instruction number introduced by the previous > + * packet. > + */ > + head = etm->instructions_sample_period - > + (tidq->period_instructions - instrs_executed); > + > + if (head) { > + offset = head; > + > + /* > + * Calculate the address of the sampled instruction (-1 > + * as sample is reported as though instruction has just > + * been executed, but PC has not advanced to next > + * instruction) > + */ > + addr = cs_etm__instr_addr(etmq, trace_chan_id, > + tidq->packet, offset - 1); > + ret = cs_etm__synth_instruction_sample( > + etmq, tidq, addr, > + etm->instructions_sample_period); > + if (ret) > + return ret; > + > + instrs_over -= head; > + } > + > + while (instrs_over >= etm->instructions_sample_period) { > + offset += etm->instructions_sample_period; > + > + /* > + * Calculate the address of the sampled instruction (-1 > + * as sample is reported as though instruction has just > + * been executed, but PC has not advanced to next > + * instruction) > + */ > + addr = cs_etm__instr_addr(etmq, trace_chan_id, > + tidq->packet, offset - 1); > + ret = cs_etm__synth_instruction_sample( > + etmq, tidq, addr, > + etm->instructions_sample_period); > + if (ret) > + return ret; > + > + instrs_over -= etm->instructions_sample_period; > + } > > /* Carry remaining instructions into next sample period */ > tidq->period_instructions = instrs_over; > -- > 2.17.1 > I believe the following change would work and make for easier reading... .... at the start of the function remove instrs_executed and replace .... /* get instructions remainder from previous packet */ u64 instrs_prev = tidq->period_instructions; /* set available instructions to previous packet remainder + the current packet count */ tidq->period_instructions += tidq->packet->instr_count; .... within the if(etm->sample_instructions && ...) statement I would be more explicit what the elements of the diagram are .... /* * Below diagram demonstrates the instruction samples * generation flows: * * Instrs Instrs Instrs Instrs * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3) * | | | | * V V V V * -------------------------------------------------- * ^ ^ * | | * Period Period * instructions(Pi) instructions(Pi') * * | | * \---------------- -----------------/ * V * tidq->packet->instr_count; * * Instrs Sample(n...) are the synthesised samples occuring every etm->instructions_sample_period * instructions - as defined on the perf command line. Sample(n) being the last sample before the * current etm packet, n+1 to n+3 samples generated from the current etm packet. * * tidq->packet->instr_count represents the number of instructions in the current etm packet. * * Period instructions (Pi) contains the the number of instructions executed after the sample point(n) * from the previous etm packet. This will always be less than etm->instructions_sample_period. * .... continue with explanation here .... .... then we can simplify the loop code removing some of the temporary variables .... /* get the initial offset into the current packet instructions (entry conditions ensure that instrs_prev < etm->instructions_sample_period) */ u64 offset = etm->instructions_sample_period - instrs_prev; u64 addr; /* Prepare last branches for instruction sample */ if (etm->synth_opts.last_branch) cs_etm__copy_last_branch_rb(etmq, tidq); while (tidq->period_instructions >= etm->instructions_sample_period) { /* * Calculate the address of the sampled instruction (-1 * as sample is reported as though instruction has just * been executed, but PC has not advanced to next * instruction) */ addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet, offset - 1); ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, etm->instructions_sample_period); if (ret) return ret; offset += etm->instructions_sample_period; tidq->period_instructions -= etm->instructions_sample_period; } ..... I believe the above should work, but cannot claim to have tried it out. What do you think? Regards Mike
On Wed, Feb 05, 2020 at 04:09:01PM +0000, Mike Leach wrote: > Hi Leo, > > There are a couple of typos in the comments below, but I also believe > that the sample loop could be considerably simplified > > On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote: > > > > When 'etm->instructions_sample_period' is less than > > 'tidq->period_instructions', the function cs_etm__sample() cannot handle > > this case properly with its logic. > > > > Let's see below flow as an example: > > > > - If we set itrace option '--itrace=i4', then function cs_etm__sample() > > has variables with initialized values: > > > > tidq->period_instructions = 0 > > etm->instructions_sample_period = 4 > > > > - When the first packet is coming: > > > > packet->instr_count = 10; the number of instructions executed in this > > packet is 10, thus update period_instructions as below: > > > > tidq->period_instructions = 0 + 10 = 10 > > instrs_over = 10 - 4 = 6 > > offset = 10 - 6 - 1 = 3 > > tidq->period_instructions = instrs_over = 6 > > > > - When the second packet is coming: > > > > packet->instr_count = 10; in the second pass, assume 10 instructions > > in the trace sample again: > > > > tidq->period_instructions = 6 + 10 = 16 > > instrs_over = 16 - 4 = 12 > > offset = 10 - 12 - 1 = -3 -> the negative value > > tidq->period_instructions = instrs_over = 12 > > > > So after handle these two packets, there have below issues: > > > > The first issue is that cs_etm__instr_addr() returns the address within > > the current trace sample of the instruction related to offset, so the > > offset is supposed to be always unsigned value. But in fact, function > > cs_etm__sample() might calculate a negative offset value (in handling > > the second packet, the offset is -3) and pass to cs_etm__instr_addr() > > with u64 type with a big positive integer. > > > > The second issue is it only synthesizes 2 samples for sample period = 4. > > In theory, every packet has 10 instructions so the two packets have > > total 20 instructions, 20 instructions should generate 5 samples > > (4 x 5 = 20). This is because cs_etm__sample() only calls once > > cs_etm__synth_instruction_sample() to generate instruction sample per > > range packet. > > > > This patch fixes the logic in function cs_etm__sample(); the basic > > idea is to divide into three parts for handling coming packet: > > > > - The first part is for synthesizing the first instruction sample, it > > combines the instructions from the tail of previous packet and the > > instructions from the head of the new packet; > > - The second part is to simply generate samples with sample period > > aligned; > > - The third part is the tail of new packet, the rest instructions will > > be left for the sequential sample handling. > > > > Suggested-by: Mike Leach <mike.leach@linaro.org> > > Signed-off-by: Leo Yan <leo.yan@linaro.org> > > --- > > tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++----- > > 1 file changed, 92 insertions(+), 13 deletions(-) > > > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > > index 3e28462609e7..c5a05f728eac 100644 > > --- a/tools/perf/util/cs-etm.c > > +++ b/tools/perf/util/cs-etm.c > > @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, > > * TODO: allow period to be defined in cycles and clock time > > */ > > > > - /* Get number of instructions executed after the sample point */ > > - u64 instrs_over = tidq->period_instructions - > > - etm->instructions_sample_period; > > + /* > > + * Below diagram demonstrates the instruction samples > > + * generation flows: > > + * > > + * Instrs Instrs Instrs Instrs > > + * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3) > > + * | | | | > > + * V V V V > > + * -------------------------------------------------- > > + * ^ ^ > > + * | | > > + * Period Period > > + * instructions(Pi) instructions(Pi') > > + * > > + * | | > > + * \---------------- -----------------/ > > + * V > > + * instrs_executed > > + * > > + * Period instructions (Pi) contains the the number of > > + * instructions executed after the sample point(n). When a new > > + * instruction packet is coming and generate for the next sample > > + * (n+1), it combines with two parts instructions, one is the > > + * tail of the old packet and another is the head of the new > > + * coming packet. So 'head' variable is used to cauclate the > typo : s/cauclate/calculate Used checkpatch.pl but didn't see any complaints for this. Thanks for pointing out and will fix it. > > + * instruction numbers in the new packet for sample(n+1). > > + * > > + * Sample(n+2) and sample(n+3) consume the instructions with > > + * sample period, so directly generate samples based on the > > + * sampe period. > > + * > typo: s/sampe/sample Will fix. > > + * After sample(n+3), the rest instructions will be used by > > + * later packet; so use 'instrs_over' to track the rest > > + * instruction number and it is assigned to > > + * 'tidq->period_instructions' for next round calculation. > > + */ > > + u64 head, offset = 0; > > + u64 addr; > > > > /* > > - * Calculate the address of the sampled instruction (-1 as > > - * sample is reported as though instruction has just been > > - * executed, but PC has not advanced to next instruction) > > + * 'instrs_over' is the number of instructions executed after > > + * sample points, initialise it to 'instrs_executed' and will > > + * decrease it for consumed instructions in every synthesized > > + * instruction sample. > > */ > > - u64 offset = (instrs_executed - instrs_over - 1); > > - u64 addr = cs_etm__instr_addr(etmq, trace_chan_id, > > - tidq->packet, offset); > > + u64 instrs_over = instrs_executed; > > > > - ret = cs_etm__synth_instruction_sample( > > - etmq, tidq, addr, etm->instructions_sample_period); > > - if (ret) > > - return ret; > > + /* > > + * 'head' is the instructions number of the head in the new > > + * packet, it combines with the tail of previous packet to > > + * generate a sample. So 'head' uses the sample period to > > + * decrease the instruction number introduced by the previous > > + * packet. > > + */ > > + head = etm->instructions_sample_period - > > + (tidq->period_instructions - instrs_executed); > > + > > + if (head) { > > + offset = head; > > + > > + /* > > + * Calculate the address of the sampled instruction (-1 > > + * as sample is reported as though instruction has just > > + * been executed, but PC has not advanced to next > > + * instruction) > > + */ > > + addr = cs_etm__instr_addr(etmq, trace_chan_id, > > + tidq->packet, offset - 1); > > + ret = cs_etm__synth_instruction_sample( > > + etmq, tidq, addr, > > + etm->instructions_sample_period); > > + if (ret) > > + return ret; > > + > > + instrs_over -= head; > > + } > > + > > + while (instrs_over >= etm->instructions_sample_period) { > > + offset += etm->instructions_sample_period; > > + > > + /* > > + * Calculate the address of the sampled instruction (-1 > > + * as sample is reported as though instruction has just > > + * been executed, but PC has not advanced to next > > + * instruction) > > + */ > > + addr = cs_etm__instr_addr(etmq, trace_chan_id, > > + tidq->packet, offset - 1); > > + ret = cs_etm__synth_instruction_sample( > > + etmq, tidq, addr, > > + etm->instructions_sample_period); > > + if (ret) > > + return ret; > > + > > + instrs_over -= etm->instructions_sample_period; > > + } > > > > /* Carry remaining instructions into next sample period */ > > tidq->period_instructions = instrs_over; > > -- > > 2.17.1 > > > > I believe the following change would work and make for easier reading... > > .... at the start of the function remove instrs_executed and replace .... > /* get instructions remainder from previous packet */ > u64 instrs_prev = tidq->period_instructions; > > /* set available instructions to previous packet remainder + the > current packet count */ > tidq->period_instructions += tidq->packet->instr_count; > > > .... within the if(etm->sample_instructions && ...) statement I would > be more explicit what the elements of the diagram are .... > > /* > * Below diagram demonstrates the instruction samples > * generation flows: > * > * Instrs Instrs Instrs Instrs > * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3) > * | | | | > * V V V V > * -------------------------------------------------- > * ^ ^ > * | | > * Period Period > * instructions(Pi) instructions(Pi') > * > * | | > * \---------------- -----------------/ > * V > * tidq->packet->instr_count; > * > * Instrs Sample(n...) are the synthesised samples occuring every > etm->instructions_sample_period > * instructions - as defined on the perf command line. Sample(n) being > the last sample before the > * current etm packet, n+1 to n+3 samples generated from the current etm packet. > * > * tidq->packet->instr_count represents the number of instructions in > the current etm packet. > * > * Period instructions (Pi) contains the the number of instructions > executed after the sample point(n) > * from the previous etm packet. This will always be less than > etm->instructions_sample_period. > * > > .... continue with explanation here .... > > > .... then we can simplify the loop code removing some of the temporary > variables .... > > /* get the initial offset into the current packet instructions > (entry conditions ensure that instrs_prev < etm->instructions_sample_period) > */ > u64 offset = etm->instructions_sample_period - instrs_prev; > u64 addr; > > /* Prepare last branches for instruction sample */ > if (etm->synth_opts.last_branch) > cs_etm__copy_last_branch_rb(etmq, tidq); > > while (tidq->period_instructions >= etm->instructions_sample_period) { > > /* > * Calculate the address of the sampled instruction (-1 > * as sample is reported as though instruction has just > * been executed, but PC has not advanced to next > * instruction) > */ > addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet, offset - 1); > ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, > etm->instructions_sample_period); > if (ret) > return ret; > > offset += etm->instructions_sample_period; > tidq->period_instructions -= etm->instructions_sample_period; > } > > ..... > I believe the above should work, but cannot claim to have tried it > out. What do you think? Agree. To be honest, I considered to use your suggested way, but I worried about the boundary conditions for 'offset', so went back to use explict method with two code segments (head and sequential samples). After review the suggested code, I don't find any issue. Will refine code as this way and give testing for it. Very appreciate the suggestions :) Leo
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 3e28462609e7..c5a05f728eac 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, * TODO: allow period to be defined in cycles and clock time */ - /* Get number of instructions executed after the sample point */ - u64 instrs_over = tidq->period_instructions - - etm->instructions_sample_period; + /* + * Below diagram demonstrates the instruction samples + * generation flows: + * + * Instrs Instrs Instrs Instrs + * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3) + * | | | | + * V V V V + * -------------------------------------------------- + * ^ ^ + * | | + * Period Period + * instructions(Pi) instructions(Pi') + * + * | | + * \---------------- -----------------/ + * V + * instrs_executed + * + * Period instructions (Pi) contains the the number of + * instructions executed after the sample point(n). When a new + * instruction packet is coming and generate for the next sample + * (n+1), it combines with two parts instructions, one is the + * tail of the old packet and another is the head of the new + * coming packet. So 'head' variable is used to cauclate the + * instruction numbers in the new packet for sample(n+1). + * + * Sample(n+2) and sample(n+3) consume the instructions with + * sample period, so directly generate samples based on the + * sampe period. + * + * After sample(n+3), the rest instructions will be used by + * later packet; so use 'instrs_over' to track the rest + * instruction number and it is assigned to + * 'tidq->period_instructions' for next round calculation. + */ + u64 head, offset = 0; + u64 addr; /* - * Calculate the address of the sampled instruction (-1 as - * sample is reported as though instruction has just been - * executed, but PC has not advanced to next instruction) + * 'instrs_over' is the number of instructions executed after + * sample points, initialise it to 'instrs_executed' and will + * decrease it for consumed instructions in every synthesized + * instruction sample. */ - u64 offset = (instrs_executed - instrs_over - 1); - u64 addr = cs_etm__instr_addr(etmq, trace_chan_id, - tidq->packet, offset); + u64 instrs_over = instrs_executed; - ret = cs_etm__synth_instruction_sample( - etmq, tidq, addr, etm->instructions_sample_period); - if (ret) - return ret; + /* + * 'head' is the instructions number of the head in the new + * packet, it combines with the tail of previous packet to + * generate a sample. So 'head' uses the sample period to + * decrease the instruction number introduced by the previous + * packet. + */ + head = etm->instructions_sample_period - + (tidq->period_instructions - instrs_executed); + + if (head) { + offset = head; + + /* + * Calculate the address of the sampled instruction (-1 + * as sample is reported as though instruction has just + * been executed, but PC has not advanced to next + * instruction) + */ + addr = cs_etm__instr_addr(etmq, trace_chan_id, + tidq->packet, offset - 1); + ret = cs_etm__synth_instruction_sample( + etmq, tidq, addr, + etm->instructions_sample_period); + if (ret) + return ret; + + instrs_over -= head; + } + + while (instrs_over >= etm->instructions_sample_period) { + offset += etm->instructions_sample_period; + + /* + * Calculate the address of the sampled instruction (-1 + * as sample is reported as though instruction has just + * been executed, but PC has not advanced to next + * instruction) + */ + addr = cs_etm__instr_addr(etmq, trace_chan_id, + tidq->packet, offset - 1); + ret = cs_etm__synth_instruction_sample( + etmq, tidq, addr, + etm->instructions_sample_period); + if (ret) + return ret; + + instrs_over -= etm->instructions_sample_period; + } /* Carry remaining instructions into next sample period */ tidq->period_instructions = instrs_over;
When 'etm->instructions_sample_period' is less than 'tidq->period_instructions', the function cs_etm__sample() cannot handle this case properly with its logic. Let's see below flow as an example: - If we set itrace option '--itrace=i4', then function cs_etm__sample() has variables with initialized values: tidq->period_instructions = 0 etm->instructions_sample_period = 4 - When the first packet is coming: packet->instr_count = 10; the number of instructions executed in this packet is 10, thus update period_instructions as below: tidq->period_instructions = 0 + 10 = 10 instrs_over = 10 - 4 = 6 offset = 10 - 6 - 1 = 3 tidq->period_instructions = instrs_over = 6 - When the second packet is coming: packet->instr_count = 10; in the second pass, assume 10 instructions in the trace sample again: tidq->period_instructions = 6 + 10 = 16 instrs_over = 16 - 4 = 12 offset = 10 - 12 - 1 = -3 -> the negative value tidq->period_instructions = instrs_over = 12 So after handle these two packets, there have below issues: The first issue is that cs_etm__instr_addr() returns the address within the current trace sample of the instruction related to offset, so the offset is supposed to be always unsigned value. But in fact, function cs_etm__sample() might calculate a negative offset value (in handling the second packet, the offset is -3) and pass to cs_etm__instr_addr() with u64 type with a big positive integer. The second issue is it only synthesizes 2 samples for sample period = 4. In theory, every packet has 10 instructions so the two packets have total 20 instructions, 20 instructions should generate 5 samples (4 x 5 = 20). This is because cs_etm__sample() only calls once cs_etm__synth_instruction_sample() to generate instruction sample per range packet. This patch fixes the logic in function cs_etm__sample(); the basic idea is to divide into three parts for handling coming packet: - The first part is for synthesizing the first instruction sample, it combines the instructions from the tail of previous packet and the instructions from the head of the new packet; - The second part is to simply generate samples with sample period aligned; - The third part is the tail of new packet, the rest instructions will be left for the sequential sample handling. Suggested-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> --- tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++----- 1 file changed, 92 insertions(+), 13 deletions(-)