[v2,1/1] contrib: add coverage-diff script

Message ID	7714b0659e3210e34d0904b3347473427546d15c.1536850601.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Thu, 13 Sep 2018 07:56:44 -0700 (PDT) Message-Id: <7714b0659e3210e34d0904b3347473427546d15c.1536850601.git.gitgitgadget@gmail.com> In-Reply-To: <pull.40.v2.git.gitgitgadget@gmail.com> References: <pull.40.git.gitgitgadget@gmail.com> <pull.40.v2.git.gitgitgadget@gmail.com> From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> Subject: [PATCH v2 1/1] contrib: add coverage-diff script Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: peff@peff.net, Junio C Hamano <gitster@pobox.com>, Derrick Stolee <dstolee@microsoft.com> Sender: git-owner@vger.kernel.org Precedence: bulk
Series	contrib: Add script to show uncovered "new" lines \| expand [v2,0/1] contrib: Add script to show uncovered "new" lines [v2,1/1] contrib: add coverage-diff script

Message ID

7714b0659e3210e34d0904b3347473427546d15c.1536850601.git.gitgitgadget@gmail.com (mailing list archive)

State

New, archived

Headers

Date: Thu, 13 Sep 2018 07:56:44 -0700 (PDT)
Message-Id: 
 <7714b0659e3210e34d0904b3347473427546d15c.1536850601.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.40.v2.git.gitgitgadget@gmail.com>
References: <pull.40.git.gitgitgadget@gmail.com>
        <pull.40.v2.git.gitgitgadget@gmail.com>
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Subject: [PATCH v2 1/1] contrib: add coverage-diff script
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: peff@peff.net, Junio C Hamano <gitster@pobox.com>,
        Derrick Stolee <dstolee@microsoft.com>
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

contrib: Add script to show uncovered "new" lines | expand

Commit Message

Philippe Blain via GitGitGadget Sept. 13, 2018, 2:56 p.m. UTC

From: Derrick Stolee <dstolee@microsoft.com>

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

    make coverage-test
    make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#####"
at every uncovered line.

There have been a few bugs in recent patches what would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

    contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -c' format so you can find the commits
responsible and view the line numbers for quick access to the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 contrib/coverage-diff.sh | 63 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

Comments

Junio C Hamano Sept. 13, 2018, 5:40 p.m. UTC | #1

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> We have coverage targets in our Makefile for using gcov to display line
> coverage based on our test suite. The way I like to do it is to run:
>
>     make coverage-test
>     make coverage-report
>
> This leaves the repo in a state where every X.c file that was covered has
> an X.c.gcov file containing the coverage counts for every line, and "#####"
> at every uncovered line.
>
> There have been a few bugs in recent patches what would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.
>
> It is important to not measure the coverage of the codebase by what old code
> is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
> After creating the coverage statistics at a version (say, 'topic') you can
> then run
>
>     contrib/coverage-diff.sh base topic
>
> to see the lines added between 'base' and 'topic' that are not covered by the
> test suite. The output uses 'git blame -c' format so you can find the commits
> responsible and view the line numbers for quick access to the context.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  contrib/coverage-diff.sh | 63 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>  create mode 100755 contrib/coverage-diff.sh
>
> diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
> new file mode 100755
> index 0000000000..0f226f038c
> --- /dev/null
> +++ b/contrib/coverage-diff.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +
> +# Usage: 'contrib/coverage-diff.sh <version1> <version2>
> +# Outputs a list of new lines in version2 compared to version1 that are
> +# not covered by the test suite. Assumes you ran
> +# 'make coverage-test coverage-report' from root first, so .gcov files exist.

I presume that it is "from root first while you have a checkout of
version2, so .gcov files for version2 exist in the working tree"?

> +V1=$1
> +V2=$2
> +
> +diff_lines () {
> +	while read line
> +	do
> +		if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*"

As you know you are reading from diff output, you do not have to be
so strict in this if condition.  It's not like you try to carefully
reject a line that began with "@@" but did not match this pattern in
the corresponding else block.

"read line" does funny things to backslashes and whitespaces in the
payload ("read -r line" sometimes helps), and echoing $line without
quoting will destroy its contents here and in the line below (but
you do not care because all you care about is if it begins with a
dash).

> +		then
> +			line_num=$(echo $line \
> +				| awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
> +		else
> +			echo "$line_num:$line"
> +			if ! echo $line | grep -q -e "^-"

If a patch removes a line with a solitary 'n' on it, possibly
followed by a SP and something else, such a line would say "-n
something else", and some implementation of "echo -n something else"
would do what this line does not expect.  A safer way to do this,
as you only care if it is a deletion line, is to do

	case "$line" in -*) ;; *) line_num=$(( $line_num + 1 ));; esac

Also you can make the echoing of "$line_num:$line" above
conditional, which will allow you to lose "grep ':+'" in the
pipeline that consumes output from this thing, e.g.

	case "$line" in +*) echo "$line_num:$line";; esac

iff you must write this in shell (but see below).

> +			then
> +				line_num=$(($line_num + 1))
> +			fi
> +		fi
> +	done

I have a feeling that a single Perl script instead of a shell loop
that runs many grep and awk as subprocesses performs better even on
Windows, and it would be more readable and maintainable.

perl -e '
	my $line_num;
	while (<>) {
		# Hunk header?  Grab the beginning in postimage.
		if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
			$line_num = $1;
			next;
		}

		# Have we seen a hunk?  Ignore "diff --git" etc.
		next unless defined $line_num;

		# Deleted line? Ignore.
		if (/^-/) {
			next;
		}

		# Show only the line number of added lines.
		if (/^\+/) {
			print "$line_num\n";
		}
		# Either common context or added line appear in
		# the postimage.  Count it.
		$line_num++;
	}
'

or something like that, given that you seem to only need line
numbers in new_lines.txt anyway?

> +}
> +
> +files=$(git diff --raw $V1 $V2 \
> +	| grep \.c$ \
> +	| awk 'NF>1{print $NF}')

Do we need these other commands in the pipe?  How is this different
from, say,

	git diff --name-only "$V1" "$V2" -- \*.c

?

> +for file in $files
> +do
> +	git diff $V1 $V2 -- $file \
> +		| diff_lines \
> +		| grep ":+" \

I think you meant to match "^<linenum>:+<added contents>" you are
echoing out in the above helper function, but this will find a
removed line that happens to have colon followed by a plus (which is
not all that uncommon in a patch to a shell script).

> +		| sed 's/:/ /g' \

Turning "<linenum>:+<added contents>" to "<linenum> +<added contents>"
with this, so that the next "awk" can show <linenum> only?  Then you
do not need 'g' at the end.  I see you use 'g' in many sed scripts
for uncovered_lines.txt, and I suspect most of 'g's you do not mean.

> +		| awk '{print $1}' \

Then we get a list of $line_num here?

> +		| sort \

This is sorted textually, which goes against intuition because these
lines are all line numbers, but it is done so that you can run
"comm" on it later and "comm" requires us to feed lines sorted
textually.

> +		>new_lines.txt

If I were writing this part, I'd probably write a small Perl script
that takes output from 'git diff "$V1" "$V2" -- "$file"' and then
reduces it to a list of line numbers for newly introduced lines.
Then pipe it to "sort >new_lines.txt" to match what we see below.

> +	hash_file=$(echo $file | sed "s/\//\#/")
> +	cat "$hash_file.gcov" \
> +		| grep \#\#\#\#\# \
> +		| sed 's/    #####: //g' \
> +		| sed 's/\:/ /g' \
> +		| awk '{print $1}' \
> +		| sort \
> +		>uncovered_lines.txt

Do not cat a single regular file into a pipe.  Write pipe at the end
of the line, not the beginning of next line.  Do not grep into sed.
Do not feed sed into sed unless needed.

Without knowing what the above is trying to do, but just
mechanically translating, I'd get something like this:

	sed -ne '/#####/{
		s/    #####: //g
		s/:.*//
                p
	}' "$hash_file.gcov" |
	sort >uncovered_lines.txt

> +	comm -12 uncovered_lines.txt new_lines.txt \
> +		| sed -e 's/$/\)/' \
> +		| sed -e 's/^/\t/' \
> +		>uncovered_new_lines.txt

;-)  This creates something like

	1)
	2)
	3)

out of a list of numbers.  Cute.

> +	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
> +		echo $file && \
> +		git blame -c $file \
> +			| grep -f uncovered_new_lines.txt
> +
> +	rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
> +done
> +

Eric Sunshine Sept. 13, 2018, 5:54 p.m. UTC | #2

On Thu, Sep 13, 2018 at 10:56 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> There have been a few bugs in recent patches what would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.

The bit about die() blocks is perhaps a bit too general. While it's
true that some die()'s signal very unlikely (or near-impossible)
conditions, others are merely reporting invalid user or other input to
the program. The latter category is often very much worth testing, as
the number of test_must_fail() invocations in the test suite shows.
68a6b3a1bd (worktree: teach 'move' to override lock when --force given
twice, 2018-08-28), which was highlighted in your cover letter,
provides a good example of legitimately testing that a die() is
covered. So, perhaps the above can be toned-down a bit by saying
something like:

    ...but die() blocks covering very unlikely (or near-impossible)
    situations may not warrant coverage.

> It is important to not measure the coverage of the codebase by what old code
> is not covered.

Derrick Stolee Sept. 13, 2018, 6:15 p.m. UTC | #3

On 9/13/2018 1:40 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +			then
>> +				line_num=$(($line_num + 1))
>> +			fi
>> +		fi
>> +	done
> I have a feeling that a single Perl script instead of a shell loop
> that runs many grep and awk as subprocesses performs better even on
> Windows, and it would be more readable and maintainable.
>
> perl -e '
> 	my $line_num;
> 	while (<>) {
> 		# Hunk header?  Grab the beginning in postimage.
> 		if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
> 			$line_num = $1;
> 			next;
> 		}
>
> 		# Have we seen a hunk?  Ignore "diff --git" etc.
> 		next unless defined $line_num;
>
> 		# Deleted line? Ignore.
> 		if (/^-/) {
> 			next;
> 		}
>
> 		# Show only the line number of added lines.
> 		if (/^\+/) {
> 			print "$line_num\n";
> 		}
> 		# Either common context or added line appear in
> 		# the postimage.  Count it.
> 		$line_num++;
> 	}
> '
>
> or something like that, given that you seem to only need line
> numbers in new_lines.txt anyway?

Thanks for the deep dive here, especially with the perl assistance. I've 
never written any perl, but it seems like the right tool here. I'll have 
time to revisit this next week.

Thanks,

-Stolee

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 0000000000..0f226f038c
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,63 @@ 
+#!/bin/sh
+
+# Usage: 'contrib/coverage-diff.sh <version1> <version2>
+# Outputs a list of new lines in version2 compared to version1 that are
+# not covered by the test suite. Assumes you ran
+# 'make coverage-test coverage-report' from root first, so .gcov files exist.
+
+V1=$1
+V2=$2
+
+diff_lines () {
+	while read line
+	do
+		if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*"
+		then
+			line_num=$(echo $line \
+				| awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
+		else
+			echo "$line_num:$line"
+			if ! echo $line | grep -q -e "^-"
+			then
+				line_num=$(($line_num + 1))
+			fi
+		fi
+	done
+}
+
+files=$(git diff --raw $V1 $V2 \
+	| grep \.c$ \
+	| awk 'NF>1{print $NF}')
+
+for file in $files
+do
+	git diff $V1 $V2 -- $file \
+		| diff_lines \
+		| grep ":+" \
+		| sed 's/:/ /g' \
+		| awk '{print $1}' \
+		| sort \
+		>new_lines.txt
+
+	hash_file=$(echo $file | sed "s/\//\#/")
+	cat "$hash_file.gcov" \
+		| grep \#\#\#\#\# \
+		| sed 's/    #####: //g' \
+		| sed 's/\:/ /g' \
+		| awk '{print $1}' \
+		| sort \
+		>uncovered_lines.txt
+
+	comm -12 uncovered_lines.txt new_lines.txt \
+		| sed -e 's/$/\)/' \
+		| sed -e 's/^/\t/' \
+		>uncovered_new_lines.txt
+
+	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
+		echo $file && \
+		git blame -c $file \
+			| grep -f uncovered_new_lines.txt
+
+	rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
+done
+

[v2,1/1] contrib: add coverage-diff script

Commit Message

Comments

Patch