Message ID | 7714b0659e3210e34d0904b3347473427546d15c.1536850601.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | contrib: Add script to show uncovered "new" lines | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <dstolee@microsoft.com> > > We have coverage targets in our Makefile for using gcov to display line > coverage based on our test suite. The way I like to do it is to run: > > make coverage-test > make coverage-report > > This leaves the repo in a state where every X.c file that was covered has > an X.c.gcov file containing the coverage counts for every line, and "#####" > at every uncovered line. > > There have been a few bugs in recent patches what would have been caught > if the test suite covered those blocks (including a few of mine). I want > to work towards a "sensible" amount of coverage on new topics. In my opinion, > this means that any logic should be covered, but the 'die()' blocks in error > cases do not need to be covered. > > It is important to not measure the coverage of the codebase by what old code > is not covered. To help, I created the 'contrib/coverage-diff.sh' script. > After creating the coverage statistics at a version (say, 'topic') you can > then run > > contrib/coverage-diff.sh base topic > > to see the lines added between 'base' and 'topic' that are not covered by the > test suite. The output uses 'git blame -c' format so you can find the commits > responsible and view the line numbers for quick access to the context. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > contrib/coverage-diff.sh | 63 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 63 insertions(+) > create mode 100755 contrib/coverage-diff.sh > > diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh > new file mode 100755 > index 0000000000..0f226f038c > --- /dev/null > +++ b/contrib/coverage-diff.sh > @@ -0,0 +1,63 @@ > +#!/bin/sh > + > +# Usage: 'contrib/coverage-diff.sh <version1> <version2> > +# Outputs a list of new lines in version2 compared to version1 that are > +# not covered by the test suite. Assumes you ran > +# 'make coverage-test coverage-report' from root first, so .gcov files exist. I presume that it is "from root first while you have a checkout of version2, so .gcov files for version2 exist in the working tree"? > +V1=$1 > +V2=$2 > + > +diff_lines () { > + while read line > + do > + if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*" As you know you are reading from diff output, you do not have to be so strict in this if condition. It's not like you try to carefully reject a line that began with "@@" but did not match this pattern in the corresponding else block. "read line" does funny things to backslashes and whitespaces in the payload ("read -r line" sometimes helps), and echoing $line without quoting will destroy its contents here and in the line below (but you do not care because all you care about is if it begins with a dash). > + then > + line_num=$(echo $line \ > + | awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }') > + else > + echo "$line_num:$line" > + if ! echo $line | grep -q -e "^-" If a patch removes a line with a solitary 'n' on it, possibly followed by a SP and something else, such a line would say "-n something else", and some implementation of "echo -n something else" would do what this line does not expect. A safer way to do this, as you only care if it is a deletion line, is to do case "$line" in -*) ;; *) line_num=$(( $line_num + 1 ));; esac Also you can make the echoing of "$line_num:$line" above conditional, which will allow you to lose "grep ':+'" in the pipeline that consumes output from this thing, e.g. case "$line" in +*) echo "$line_num:$line";; esac iff you must write this in shell (but see below). > + then > + line_num=$(($line_num + 1)) > + fi > + fi > + done I have a feeling that a single Perl script instead of a shell loop that runs many grep and awk as subprocesses performs better even on Windows, and it would be more readable and maintainable. perl -e ' my $line_num; while (<>) { # Hunk header? Grab the beginning in postimage. if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) { $line_num = $1; next; } # Have we seen a hunk? Ignore "diff --git" etc. next unless defined $line_num; # Deleted line? Ignore. if (/^-/) { next; } # Show only the line number of added lines. if (/^\+/) { print "$line_num\n"; } # Either common context or added line appear in # the postimage. Count it. $line_num++; } ' or something like that, given that you seem to only need line numbers in new_lines.txt anyway? > +} > + > +files=$(git diff --raw $V1 $V2 \ > + | grep \.c$ \ > + | awk 'NF>1{print $NF}') Do we need these other commands in the pipe? How is this different from, say, git diff --name-only "$V1" "$V2" -- \*.c ? > +for file in $files > +do > + git diff $V1 $V2 -- $file \ > + | diff_lines \ > + | grep ":+" \ I think you meant to match "^<linenum>:+<added contents>" you are echoing out in the above helper function, but this will find a removed line that happens to have colon followed by a plus (which is not all that uncommon in a patch to a shell script). > + | sed 's/:/ /g' \ Turning "<linenum>:+<added contents>" to "<linenum> +<added contents>" with this, so that the next "awk" can show <linenum> only? Then you do not need 'g' at the end. I see you use 'g' in many sed scripts for uncovered_lines.txt, and I suspect most of 'g's you do not mean. > + | awk '{print $1}' \ Then we get a list of $line_num here? > + | sort \ This is sorted textually, which goes against intuition because these lines are all line numbers, but it is done so that you can run "comm" on it later and "comm" requires us to feed lines sorted textually. > + >new_lines.txt If I were writing this part, I'd probably write a small Perl script that takes output from 'git diff "$V1" "$V2" -- "$file"' and then reduces it to a list of line numbers for newly introduced lines. Then pipe it to "sort >new_lines.txt" to match what we see below. > + hash_file=$(echo $file | sed "s/\//\#/") > + cat "$hash_file.gcov" \ > + | grep \#\#\#\#\# \ > + | sed 's/ #####: //g' \ > + | sed 's/\:/ /g' \ > + | awk '{print $1}' \ > + | sort \ > + >uncovered_lines.txt Do not cat a single regular file into a pipe. Write pipe at the end of the line, not the beginning of next line. Do not grep into sed. Do not feed sed into sed unless needed. Without knowing what the above is trying to do, but just mechanically translating, I'd get something like this: sed -ne '/#####/{ s/ #####: //g s/:.*// p }' "$hash_file.gcov" | sort >uncovered_lines.txt > + comm -12 uncovered_lines.txt new_lines.txt \ > + | sed -e 's/$/\)/' \ > + | sed -e 's/^/\t/' \ > + >uncovered_new_lines.txt ;-) This creates something like 1) 2) 3) out of a list of numbers. Cute. > + grep -q '[^[:space:]]' < uncovered_new_lines.txt && \ > + echo $file && \ > + git blame -c $file \ > + | grep -f uncovered_new_lines.txt > + > + rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt > +done > +
On Thu, Sep 13, 2018 at 10:56 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > There have been a few bugs in recent patches what would have been caught > if the test suite covered those blocks (including a few of mine). I want > to work towards a "sensible" amount of coverage on new topics. In my opinion, > this means that any logic should be covered, but the 'die()' blocks in error > cases do not need to be covered. The bit about die() blocks is perhaps a bit too general. While it's true that some die()'s signal very unlikely (or near-impossible) conditions, others are merely reporting invalid user or other input to the program. The latter category is often very much worth testing, as the number of test_must_fail() invocations in the test suite shows. 68a6b3a1bd (worktree: teach 'move' to override lock when --force given twice, 2018-08-28), which was highlighted in your cover letter, provides a good example of legitimately testing that a die() is covered. So, perhaps the above can be toned-down a bit by saying something like: ...but die() blocks covering very unlikely (or near-impossible) situations may not warrant coverage. > It is important to not measure the coverage of the codebase by what old code > is not covered.
On 9/13/2018 1:40 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> + then >> + line_num=$(($line_num + 1)) >> + fi >> + fi >> + done > I have a feeling that a single Perl script instead of a shell loop > that runs many grep and awk as subprocesses performs better even on > Windows, and it would be more readable and maintainable. > > perl -e ' > my $line_num; > while (<>) { > # Hunk header? Grab the beginning in postimage. > if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) { > $line_num = $1; > next; > } > > # Have we seen a hunk? Ignore "diff --git" etc. > next unless defined $line_num; > > # Deleted line? Ignore. > if (/^-/) { > next; > } > > # Show only the line number of added lines. > if (/^\+/) { > print "$line_num\n"; > } > # Either common context or added line appear in > # the postimage. Count it. > $line_num++; > } > ' > > or something like that, given that you seem to only need line > numbers in new_lines.txt anyway? Thanks for the deep dive here, especially with the perl assistance. I've never written any perl, but it seems like the right tool here. I'll have time to revisit this next week. Thanks, -Stolee
diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh new file mode 100755 index 0000000000..0f226f038c --- /dev/null +++ b/contrib/coverage-diff.sh @@ -0,0 +1,63 @@ +#!/bin/sh + +# Usage: 'contrib/coverage-diff.sh <version1> <version2> +# Outputs a list of new lines in version2 compared to version1 that are +# not covered by the test suite. Assumes you ran +# 'make coverage-test coverage-report' from root first, so .gcov files exist. + +V1=$1 +V2=$2 + +diff_lines () { + while read line + do + if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*" + then + line_num=$(echo $line \ + | awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }') + else + echo "$line_num:$line" + if ! echo $line | grep -q -e "^-" + then + line_num=$(($line_num + 1)) + fi + fi + done +} + +files=$(git diff --raw $V1 $V2 \ + | grep \.c$ \ + | awk 'NF>1{print $NF}') + +for file in $files +do + git diff $V1 $V2 -- $file \ + | diff_lines \ + | grep ":+" \ + | sed 's/:/ /g' \ + | awk '{print $1}' \ + | sort \ + >new_lines.txt + + hash_file=$(echo $file | sed "s/\//\#/") + cat "$hash_file.gcov" \ + | grep \#\#\#\#\# \ + | sed 's/ #####: //g' \ + | sed 's/\:/ /g' \ + | awk '{print $1}' \ + | sort \ + >uncovered_lines.txt + + comm -12 uncovered_lines.txt new_lines.txt \ + | sed -e 's/$/\)/' \ + | sed -e 's/^/\t/' \ + >uncovered_new_lines.txt + + grep -q '[^[:space:]]' < uncovered_new_lines.txt && \ + echo $file && \ + git blame -c $file \ + | grep -f uncovered_new_lines.txt + + rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt +done +