
[v4] setlocalversion: work around "git describe" performance

Message ID 20241122150037.1085800-1-linux@rasmusvillemoes.dk (mailing list archive)
State New
Series [v4] setlocalversion: work around "git describe" performance

Commit Message

Rasmus Villemoes Nov. 22, 2024, 3 p.m. UTC
Contrary to expectations, passing a single candidate tag to "git
describe" is slower than not passing any --match options.

  $ time git describe --debug
  ...
  traversed 10619 commits
  ...
  v6.12-rc5-63-g0fc810ae3ae1

  real    0m0.169s

  $ time git describe --match=v6.12-rc5 --debug
  ...
  traversed 1310024 commits
  v6.12-rc5-63-g0fc810ae3ae1

  real    0m1.281s

In fact, the --debug output shows that git traverses all or most of
history. For some repositories and/or git versions, those 1.3 seconds
become 10-15 seconds.

This has been acknowledged as a performance bug in git [1], and a fix
is on its way [2]. However, no fix has landed in git.git yet, and even
once one does, it will take quite a while to reach a release, and
longer still for $random_kernel_developer to pick that up.

So rewrite the logic to use plumbing commands. For each of the
candidate values of $tag, we ask: (1) Is $tag an annotated tag at
all? (2) Is it eligible to describe HEAD, i.e. is it an ancestor of
HEAD? (3) If so, how many commits are in $tag..HEAD?
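Questions (2) and (3) can both be answered by one `git rev-list --count --left-right "$tag"...HEAD` call: it prints the number of commits reachable only from the tag (left) and only from HEAD (right), and the left count is 0 exactly when the tag is an ancestor of HEAD. The sketch below covers only the parsing step. `parse_left_right` is a hypothetical helper, not part of setlocalversion; it takes the command's output as a string so it can be tried without a repository.

```shell
#!/bin/sh
# parse_left_right: hypothetical helper, not part of setlocalversion.
# $1 is the output of: git rev-list --count --left-right "$tag"...HEAD
#   left  = commits reachable from $tag but not from HEAD
#   right = commits reachable from HEAD but not from $tag
# Prints the right count and returns 0 iff $tag is an ancestor of HEAD.
parse_left_right() {
	# Deliberately unquoted: split "0<TAB>44" into two words.
	# shellcheck disable=SC2086
	set -- $1
	# Empty output (the git command failed) or a nonzero left
	# count means the tag cannot be used to describe HEAD.
	[ "$1" = 0 ] || return 1
	printf '%s\n' "$2"
}

parse_left_right "$(printf '0\t44')"                    # prints 44
parse_left_right "$(printf '3\t44')" || echo "not an ancestor"
parse_left_right "" || echo "no output"
```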

I have tested that this produces the same output as the current script
for ~700 random commits in the range v6.9..v6.10. For those 700 commits,
and in my git repo, 'make -s kernelrelease' is on average ~4 times
faster with this patch applied (geometric mean of ratios).
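The geometric mean of ratios is the exponential of the mean of log(old/new) over all samples; one convenient property is that swapping old and new simply inverts the result. A minimal sketch, assuming the timings are available as one `<old_seconds> <new_seconds>` line per commit (the commit message does not say how they were recorded, and `geomean_ratio` is a hypothetical name):

```shell
#!/bin/sh
# geomean_ratio: hypothetical helper. Reads lines of the form
#   <old_seconds> <new_seconds>
# on stdin and prints the geometric mean of old/new.
geomean_ratio() {
	awk '
		$2 > 0 { sum += log($1 / $2); n++ }   # accumulate log-ratios
		END {
			if (n == 0) exit 1            # no usable samples
			printf "%.2f\n", exp(sum / n) # back from log space
		}'
}

# Illustrative numbers only, not the measurements behind this patch.
printf '%s\n' "2.0 1.0" "8.0 1.0" | geomean_ratio   # prints 4.00
```

For ratios of 2 and 8 this gives 4, whereas the arithmetic mean of the ratios would be 5.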

For the commit mentioned in Josh's original report [3], the
time-consuming part of setlocalversion goes from

$ time git describe --match=v6.12-rc5 c1e939a21eb1
v6.12-rc5-44-gc1e939a21eb1

real    0m1.210s

to

$ time git rev-list --count --left-right v6.12-rc5..c1e939a21eb1
0       44

real    0m0.037s
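The right-hand count is what the patched scm_version() pretty-prints. A minimal sketch of the two printf calls from the patch: `%05d` zero-pads the count to five digits and `%.12s` keeps the first 12 characters of `$head`, replacing the earlier awk and cut pipelines. `format_suffix` is a hypothetical wrapper, and the 40-hex-digit value below is made up, not the real object name of c1e939a21eb1.

```shell
#!/bin/sh
# format_suffix: hypothetical wrapper around the two printf calls in
# the patched scm_version(). $1 = commit count (may be empty),
# $2 = full object name of HEAD.
format_suffix() {
	count=$1
	head=$2
	if [ -n "${count}" ]; then
		printf "%s%05d" "-" "${count}"   # e.g. 44 -> -00044
	fi
	printf '%s%.12s' -g "$head"          # -g plus first 12 hex chars
}

# Made-up object name, for illustration only.
format_suffix 44 0123456789abcdef0123456789abcdef01234567; echo
# prints -00044-g0123456789ab
```

The sketch leaves out the surrounding check in scm_version() that skips the suffix entirely when the count is 0.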

[1] https://lore.kernel.org/git/20241101113910.GA2301440@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20241106192236.GC880133@coredump.intra.peff.net/
[3] https://lore.kernel.org/lkml/309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org/

Reported-by: Sean Christopherson <seanjc@google.com>
Closes: https://lore.kernel.org/lkml/ZPtlxmdIJXOe0sEy@google.com/
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Closes: https://lore.kernel.org/lkml/309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org/
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
v4:

- Switch the logic to make use of the return values from try_tag,
  instead of asking whether $count has been set.

- Update a comment.

- Drop T-b tag from Josh as I think this changes the flow sufficiently
  from the version he tested. I have repeated my tests, with the same
  functional and performance result as indicated in the commit log.

v3: https://lore.kernel.org/lkml/20241118110154.3711777-1-linux@rasmusvillemoes.dk/

 scripts/setlocalversion | 59 +++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 20 deletions(-)

Comments

Josh Poimboeuf Nov. 22, 2024, 5:08 p.m. UTC
On Fri, Nov 22, 2024 at 04:00:37PM +0100, Rasmus Villemoes wrote:
> v4:
> 
> - Switch the logic to make use of the return values from try_tag,
>   instead of asking whether $count has been set.
> 
> - Update a comment.
> 
> - Drop T-b tag from Josh as I think this changes the flow sufficiently
>   from the version he tested. I have repeated my tests, with the same
>   functional and performance result as indicated in the commit log.

Tested-by: Josh Poimboeuf <jpoimboe@kernel.org>

Masahiro Yamada Nov. 23, 2024, 8:06 a.m. UTC
On Sat, Nov 23, 2024 at 12:01 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
> ---
> v4:
>
> - Switch the logic to make use of the return values from try_tag,
>   instead of asking whether $count has been set.


No, please do not do this.

As I replied in v3, my plan is to use 'set -e', because otherwise
the shell script is fragile.

With this version, 'set -e' will not take effect in try_tag(),
because try_tag() is called in an if condition.
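The point about `set -e` can be checked with a small stand-alone script. In POSIX shells errexit is ignored while the condition of an `if` runs, and in shells such as bash and dash that suspension also covers the body of any function called from the condition. A minimal sketch; the function names are hypothetical and not kernel code:

```shell
#!/bin/sh
# Hypothetical functions illustrating 'set -e' semantics.
check() {
	false                        # fails; errexit would normally stop here
	echo "reached after false"
}

# errexit enabled, but check() runs as an if condition.
demo_in_condition() (
	set -e
	if check; then
		echo "check returned success"
	fi
)

# errexit enabled, check() called as a plain command.
demo_plain_call() (
	set -e
	check
	echo "not reached"
)

demo_in_condition   # prints both lines from check and the if body
demo_plain_call     # prints nothing: the subshell exits at 'false'
echo "plain call exited with status $?"
```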


> +try_tag() {
> +       tag="$1"
> +
> +       # Is $tag an annotated tag?
> +       [ "$(git cat-file -t "$tag" 2> /dev/null)" = tag ] || return 1
> +
> +       # Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD?
> +       # shellcheck disable=SC2046 # word splitting is the point here
> +       set -- $(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null)
> +
> +       # $1 is 0 if and only if $tag is an ancestor of HEAD. Use
> +       # string comparison, because $1 is empty if the 'git rev-list'
> +       # command somehow failed.
> +       [ "$1" = 0 ] || return 1
> +
> +       # $2 is the number of commits in the range $tag..HEAD, possibly 0.
> +       count="$2"

Redundant double-quotes.
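The "redundant double-quotes" remark rests on a shell rule: the right-hand side of a plain variable assignment undergoes neither field splitting nor pathname expansion, so count=$2 and count="$2" store the same string. A minimal sketch with arbitrary values:

```shell
#!/bin/sh
# Positional parameters with embedded whitespace and a glob character.
set -- "0" "4 4  *"

unquoted=$2      # no field splitting or globbing in an assignment
quoted="$2"

[ "$unquoted" = "$quoted" ] && echo "identical: [$unquoted]"
```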



--
Best Regards
Masahiro Yamada

Patch

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 38b96c6797f4..cde45d92cc0b 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -30,6 +30,27 @@  if test $# -gt 0 -o ! -d "$srctree"; then
 	usage
 fi
 
+try_tag() {
+	tag="$1"
+
+	# Is $tag an annotated tag?
+	[ "$(git cat-file -t "$tag" 2> /dev/null)" = tag ] || return 1
+
+	# Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD?
+	# shellcheck disable=SC2046 # word splitting is the point here
+	set -- $(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null)
+
+	# $1 is 0 if and only if $tag is an ancestor of HEAD. Use
+	# string comparison, because $1 is empty if the 'git rev-list'
+	# command somehow failed.
+	[ "$1" = 0 ] || return 1
+
+	# $2 is the number of commits in the range $tag..HEAD, possibly 0.
+	count="$2"
+
+	return 0
+}
+
 scm_version()
 {
 	local short=false
@@ -64,30 +85,27 @@  scm_version()
 	# If a localversion* file exists, and the corresponding
 	# annotated tag exists and is an ancestor of HEAD, use
 	# it. This is the case in linux-next.
-	tag=${file_localversion#-}
-	desc=
-	if [ -n "${tag}" ]; then
-		desc=$(git describe --match=$tag 2>/dev/null)
-	fi
-
+	if [ -n "${file_localversion#-}" ] && try_tag "${file_localversion#-}" ; then
+		:
 	# Otherwise, if a localversion* file exists, and the tag
 	# obtained by appending it to the tag derived from
 	# KERNELVERSION exists and is an ancestor of HEAD, use
 	# it. This is e.g. the case in linux-rt.
-	if [ -z "${desc}" ] && [ -n "${file_localversion}" ]; then
-		tag="${version_tag}${file_localversion}"
-		desc=$(git describe --match=$tag 2>/dev/null)
-	fi
-
+	elif [ -n "${file_localversion}" ] && try_tag "${version_tag}${file_localversion}" ; then
+		:
 	# Otherwise, default to the annotated tag derived from KERNELVERSION.
-	if [ -z "${desc}" ]; then
-		tag="${version_tag}"
-		desc=$(git describe --match=$tag 2>/dev/null)
+	elif try_tag "${version_tag}" ; then
+		:
+	else
+		count=
 	fi
 
-	# If we are at the tagged commit, we ignore it because the version is
-	# well-defined.
-	if [ "${tag}" != "${desc}" ]; then
+	# If we are at the tagged commit, we ignore it because the
+	# version is well-defined. If none of the attempted tags exist
+	# or were usable, $count is empty, so there is no count to
+	# pretty-print, but we can and should still append the -g plus
+	# the abbreviated sha1.
+	if [ "${count}" != 0 ]; then
 
 		# If only the short version is requested, don't bother
 		# running further git commands
@@ -95,14 +113,15 @@  scm_version()
 			echo "+"
 			return
 		fi
+
 		# If we are past the tagged commit, we pretty print it.
 		# (like 6.1.0-14595-g292a089d78d3)
-		if [ -n "${desc}" ]; then
-			echo "${desc}" | awk -F- '{printf("-%05d", $(NF-1))}'
+		if [ -n "${count}" ]; then
+			printf "%s%05d" "-" "${count}"
 		fi
 
 		# Add -g and exactly 12 hex chars.
-		printf '%s%s' -g "$(echo $head | cut -c1-12)"
+		printf '%s%.12s' -g "$head"
 	fi
 
 	if ${no_dirty}; then