[v3,1/1] contrib: add coverage-diff script

Message ID 21214cc321f80cf2e9eb0cdb1ec3ebb869ea496d.1537542952.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Series contrib: Add script to show uncovered "new" lines

Commit Message

Derrick Stolee via GitGitGadget Sept. 21, 2018, 3:15 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

    make coverage-test
    make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#####"
at every uncovered line.
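
As an illustration of the file format described above, here is a minimal
sketch that pulls the uncovered line numbers out of a .gcov file. The
sample content and the name example.c are made up for the demonstration;
the sed program is the one the script itself uses.

```shell
# Hypothetical .gcov excerpt: execution count, line number, source text.
sample=$(mktemp)
cat >"$sample" <<\EOF
        -:    0:Source:example.c
        3:   10:	if (argc > 1)
    #####:   11:		die("unlikely");
        3:   12:	return 0;
    #####:   27:	cleanup();
EOF

# Keep only lines marked "#####" and reduce each to its line number.
uncovered=$(sed -ne '/#####:/{
		s/    #####://
		s/:.*//
		s/ //g
		p
	}' "$sample")
rm -f "$sample"

echo "$uncovered"
```

With this sample the two lines marked "#####" (11 and 27) are reported.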

There have been a few bugs in recent patches that would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks covering
very unlikely (or near-impossible) situations may not warrant coverage.

It is important not to judge the coverage of a new topic by the old code
that is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

    contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -c' format so you can find the commits
responsible and view the line numbers for quick access to the context.
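
The core of the approach is an intersection of two sorted lists: the line
numbers added by the diff and the line numbers gcov marks as uncovered. A
minimal sketch, with made-up line numbers standing in for real diff and
gcov output:

```shell
work=$(mktemp -d)

# Hypothetical inputs; comm needs both lists in the same (lexical) sort order.
printf '%s\n' 11 27 103 | sort >"$work/uncovered_lines.txt"
printf '%s\n' 10 11 12 103 | sort >"$work/new_lines.txt"

# -12 suppresses lines unique to either file, leaving only the common ones.
uncovered_new=$(comm -12 "$work/uncovered_lines.txt" "$work/new_lines.txt")
rm -rf "$work"

echo "$uncovered_new"
```

Because plain sort orders the numbers as strings, the result comes out in
lexical rather than numeric order (103 before 11 here), which is harmless
when the list is only used as a set of patterns.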

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 contrib/coverage-diff.sh | 79 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

Comments

Junio C Hamano Sept. 25, 2018, 6:36 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +files=$(git diff --name-only $V1 $V2 -- *.c)

You'd want to quote that *.c from the shell, i.e. either one of
these

	files=$(git diff --name-only $V1 $V2 -- \*.c)
	files=$(git diff --name-only $V1 $V2 -- '*.c')

otherwise you'd lose things like "builtin/am.c", I'd think.
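
The difference can be seen without git at all. This is only a sketch; the
scratch directory and the file name top.c are invented for the
demonstration, and echo stands in for the command receiving the pattern.

```shell
# Scratch tree with one top-level and one nested .c file.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/builtin"
: >"$demo_dir/top.c"
: >"$demo_dir/builtin/am.c"

# Unquoted: the shell expands the glob in the current directory first.
unquoted=$(cd "$demo_dir" && echo *.c)
# Quoted: the literal pattern reaches the command untouched.
quoted=$(cd "$demo_dir" && echo '*.c')
rm -rf "$demo_dir"

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

In the unquoted case the command only ever sees "top.c", so anything under
builtin/ is never asked about. In the quoted case git receives '*.c' as a
pathspec and does the matching itself, across directories.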

> +
> +for file in $files
> +do

I know this is only for running in _our_ source tree, and we do not
have a source with $IFS in it, so I'd declare that this is OK.  It
would be good to document that assumption in red capital letters at
the beginning of this loop, though ;-)

	# Note: this script is only for our codebase and we rely on
	# the fact that the pathnames of our source files do not
	# have any funny characters---letting the shell split $files
	# list at $IFS boundary is very much intentional, and not
	# quoting "$file" in the code below also is.  Don't imitate
	# this in scripts that are meant to handle random end-user
	# repositories!
	for file in $files
	do
		...

> +	git diff $V1 $V2 -- $file \
> +		| diff_lines \
> +		| sort >new_lines.txt

I do not see a strong reason why we would want to limit $V1 and $V2
to branch names and raw commit object names, and quoting them in dq
pair is a cheap fix to allow things like

	$ contrib/coverage-diff.sh master 'pu^{/^### match next}'

so let's do so.
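
To see why the quotes matter for such a revision, a small sketch that
counts how many arguments a command receives (count_args is a helper made
up for this illustration):

```shell
# Hypothetical helper: print the number of arguments it was given.
count_args () {
	echo $#
}

V2='pu^{/^### match next}'

# Unquoted, the value is split at the spaces into separate words.
unquoted_count=$(count_args $V2)
# Quoted, it stays a single argument, as git expects.
quoted_count=$(count_args "$V2")

echo "unquoted: $unquoted_count, quoted: $quoted_count"
```

Unquoted, git would be handed three separate words instead of one
revision expression.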

Could you cut lines _after_ typing a pipe and omit backslashes, i.e.

	git diff "$V1" "$V2" -- $file |
	diff_lines |
	sort >new_lines.txt

It seems to be personal taste whether to indent the second and
subsequent lines; I do not care if you indent or if you align too
much either way (but I have moderate preference to align).

But I do not want to see people type unnecessary backslashes.  This
is not limited to just this pipeline but elsewhere in the script.

> +	if ! test -s new_lines.txt
> +	then
> +		continue
> +	fi
> +
> +	hash_file=$(echo $file | sed "s/\//\#/")
> +	sed -ne '/#####:/{
> +			s/    #####://
> +			s/:.*//
> +			s/ //g
> +			p
> +		}' "$hash_file.gcov" \
> +		| sort >uncovered_lines.txt
> +
> +	comm -12 uncovered_lines.txt new_lines.txt \
> +		| sed -e 's/$/\)/' \
> +		| sed -e 's/^/\t/' \

Do you need two sed invocations for this, or would

	sed -e 's/$/\)/' -e 's/^/	/'

work as well?  By the way """The meaning of an unescaped <backslash>
immediately followed by any character other than '&', <backslash>, a
digit, <newline>, or the delimiter character used for this command,
is unspecified."""[*1*] so '\t' on the replacement side is a no-no
in the portability department.

	Reference. *1*
	http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
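
One way to stay within what the specification defines is to put a real
tab character into the replacement via a shell variable. This is a sketch
under that assumption, with made-up line numbers as input. It also writes
the closing parenthesis without a backslash, since by the rule quoted
above '\)' on the replacement side would seem to be unspecified as well.

```shell
# Capture a literal tab; command substitution only strips trailing newlines.
tab=$(printf '\t')

# One sed invocation: append ")" and prepend a tab to every line.
marked=$(printf '%s\n' 12 40 |
	sed -e 's/$/)/' -e "s/^/$tab/")

printf '%s\n' "$marked"
```

Each line number ends up as "<TAB>12)", which is the shape the later
grep -f step relies on to pick lines out of 'git blame -c' output.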


> +		>uncovered_new_lines.txt
> +
> +	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \

Lose the backslash at the end.  The shell knows that you haven't
finished your sentence when it sees a line that ends with &&, || or
a pipe |, so there is no need to tell it redundantly that you have
more things to say with the backslash.
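
A short sketch of the same point, using made-up numbers rather than the
script's data: every line below ends in |, && or ||, so none of them
needs a trailing backslash.

```shell
# The pipeline continues across lines because each one ends with a pipe.
smallest=$(
	printf '%s\n' 30 4 12 |
	sort -n |
	head -n 1
)

# Likewise for && and ||: the shell waits for the rest of the list.
test "$smallest" = 4 &&
	verdict=ok ||
	verdict=unexpected

echo "$smallest $verdict"
```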

> +		echo $file && \
> +		git blame -c $file \
> +			| grep -f uncovered_new_lines.txt
> +
> +	rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
> +done

Near the beginning (like just before the "for file in $files" loop),
you can probably have a trap to make sure these are removed upon
exit, e.g.

    trap 'rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt' 0
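
The effect of such a trap can be checked in isolation. The sketch below
runs in a scratch directory (an assumption made to keep the demonstration
tidy) and simulates an early failure inside a subshell.

```shell
scratch=$(mktemp -d)
status=0

(
	# Register the cleanup first so every exit path below triggers it.
	trap 'rm -f "$scratch/new_lines.txt"' 0
	echo 42 >"$scratch/new_lines.txt"
	exit 1	# simulate an early failure before any explicit cleanup
) || status=$?

if test -e "$scratch/new_lines.txt"
then
	leftover=yes
else
	leftover=no
fi
rmdir "$scratch"

echo "exit status: $status, leftover file: $leftover"
```

Even though the subshell bails out early, the temporary file is gone by
the time the parent looks for it, so the explicit rm at the end of each
loop iteration is no longer the only thing keeping the tree clean.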

Patch

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 0000000000..48b9a3ae96
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,79 @@ 
+#!/bin/sh
+
+# Usage: Run 'contrib/coverage-diff.sh <version1> <version2>' from source-root
+# after running
+#
+#     make coverage-test
+#     make coverage-report
+#
+# while checked out at <version2>. This script combines the *.gcov files
+# generated by the 'make' commands above with 'git diff <version1> <version2>'
+# to report new lines that are not covered by the test suite.
+
+V1=$1
+V2=$2
+
+diff_lines () {
+	perl -e '
+		my $line_num;
+		while (<>) {
+			# Hunk header?  Grab the beginning in postimage.
+			if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
+				$line_num = $1;
+				next;
+			}
+
+			# Have we seen a hunk?  Ignore "diff --git" etc.
+			next unless defined $line_num;
+
+			# Deleted line? Ignore.
+			if (/^-/) {
+				next;
+			}
+
+			# Show only the line number of added lines.
+			if (/^\+/) {
+				print "$line_num\n";
+			}
+			# Either common context or added line appear in
+			# the postimage.  Count it.
+			$line_num++;
+		}
+	'
+}
+
+files=$(git diff --name-only $V1 $V2 -- *.c)
+
+for file in $files
+do
+	git diff $V1 $V2 -- $file \
+		| diff_lines \
+		| sort >new_lines.txt
+
+	if ! test -s new_lines.txt
+	then
+		continue
+	fi
+
+	hash_file=$(echo $file | sed "s/\//\#/")
+	sed -ne '/#####:/{
+			s/    #####://
+			s/:.*//
+			s/ //g
+			p
+		}' "$hash_file.gcov" \
+		| sort >uncovered_lines.txt
+
+	comm -12 uncovered_lines.txt new_lines.txt \
+		| sed -e 's/$/\)/' \
+		| sed -e 's/^/\t/' \
+		>uncovered_new_lines.txt
+
+	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
+		echo $file && \
+		git blame -c $file \
+			| grep -f uncovered_new_lines.txt
+
+	rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
+done
+