Message ID | 20241122150037.1085800-1-linux@rasmusvillemoes.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v4] setlocalversion: work around "git describe" performance | expand |
On Fri, Nov 22, 2024 at 04:00:37PM +0100, Rasmus Villemoes wrote: > v4: > > - Switch the logic to make use of the return values from try_tag, > instead of asking whether $count has been set. > > - Update a comment. > > - Drop T-b tag from Josh as I think this changes the flow sufficiently > from the version he tested. I have repeated my tests, with the same > functional and performance result as indicated in the commit log. Tested-by: Josh Poimboeuf <jpoimboe@kernel.org>
On Sat, Nov 23, 2024 at 12:01 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > Contrary to expectations, passing a single candidate tag to "git > describe" is slower than not passing any --match options. > > $ time git describe --debug > ... > traversed 10619 commits > ... > v6.12-rc5-63-g0fc810ae3ae1 > > real 0m0.169s > > $ time git describe --match=v6.12-rc5 --debug > ... > traversed 1310024 commits > v6.12-rc5-63-g0fc810ae3ae1 > > real 0m1.281s > > In fact, the --debug output shows that git traverses all or most of > history. For some repositories and/or git versions, those 1.3s are > actually 10-15 seconds. > > This has been acknowledged as a performance bug in git [1], and a fix > is on its way [2]. However, no solution is yet in git.git, and even > when one lands, it will take quite a while before it finds its way to > a release and for $random_kernel_developer to pick that up. > > So rewrite the logic to use plumbing commands. For each of the > candidate values of $tag, we ask: (1) is $tag even an annotated > tag? (2) Is it eligible to describe HEAD, i.e. an ancestor of > HEAD? (3) If so, how many commits are in $tag..HEAD? > > I have tested that this produces the same output as the current script > for ~700 random commits between v6.9..v6.10. For those 700 commits, > and in my git repo, the 'make -s kernelrelease' command is on average > ~4 times faster with this patch applied (geometric mean of ratios). > > For the commit mentioned in Josh's original report [3], the > time-consuming part of setlocalversion goes from > > $ time git describe --match=v6.12-rc5 c1e939a21eb1 > v6.12-rc5-44-gc1e939a21eb1 > > real 0m1.210s > > to > > $ time git rev-list --count --left-right v6.12-rc5..c1e939a21eb1 > 0 44 > > real 0m0.037s > > [1] https://lore.kernel.org/git/20241101113910.GA2301440@coredump.intra.peff.net/ > [2] https://lore.kernel.org/git/20241106192236.GC880133@coredump.intra.peff.net/ > [3] https://lore.kernel.org/lkml/309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org/ > > Reported-by: Sean Christopherson <seanjc@google.com> > Closes: https://lore.kernel.org/lkml/ZPtlxmdIJXOe0sEy@google.com/ > Reported-by: Josh Poimboeuf <jpoimboe@kernel.org> > Closes: https://lore.kernel.org/lkml/309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org/ > Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> > --- > v4: > > - Switch the logic to make use of the return values from try_tag, > instead of asking whether $count has been set. No, please do not do this. As I replied in v3, my plan is to set -e, because otherwise the shell script is fragile. With this version, -e will not work in try_tag() because it is used in the if condition. > +try_tag() { > + tag="$1" > + > + # Is $tag an annotated tag? > + [ "$(git cat-file -t "$tag" 2> /dev/null)" = tag ] || return 1 > + > + # Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD? > + # shellcheck disable=SC2046 # word splitting is the point here > + set -- $(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null) > + > + # $1 is 0 if and only if $tag is an ancestor of HEAD. Use > + # string comparison, because $1 is empty if the 'git rev-list' > + # command somehow failed. > + [ "$1" = 0 ] || return 1 > + > + # $2 is the number of commits in the range $tag..HEAD, possibly 0. > + count="$2" Redundant double-quotes. -- Best Regards Masahiro Yamada
On Sat, Nov 23 2024, Masahiro Yamada <masahiroy@kernel.org> wrote: > On Sat, Nov 23, 2024 at 12:01 AM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> v4: >> >> - Switch the logic to make use of the return values from try_tag, >> instead of asking whether $count has been set. > > > No, please do not do this. > > As I replied in v3, my plan is to set -e, because otherwise > the shell script is fragile. > > With this version, -e will not work in try_tag() > because it is used in the if condition. I'm confused. Previously you said that "either style is fine with me". Now you've invented some necessity to add "set -e", which of course yes, is then suppressed inside try_tag. But there is not a single statement within that that would cause "set -e" to exit anyway: The only one that is not a simple assignment or is itself a test is the "set -- $()", and git rev-list failing there does not cause set -e to trigger. Aside from that, I'm skeptical to introducing set -e anyway, it's simply too hard to reason about what it will actually do. http://mywiki.wooledge.org/BashFAQ/105 . But you're the maintainer. >> +try_tag() { >> + tag="$1" >> + >> + # Is $tag an annotated tag? >> + [ "$(git cat-file -t "$tag" 2> /dev/null)" = tag ] || return 1 >> + >> + # Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD? >> + # shellcheck disable=SC2046 # word splitting is the point here >> + set -- $(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null) >> + >> + # $1 is 0 if and only if $tag is an ancestor of HEAD. Use >> + # string comparison, because $1 is empty if the 'git rev-list' >> + # command somehow failed. >> + [ "$1" = 0 ] || return 1 >> + >> + # $2 is the number of commits in the range $tag..HEAD, possibly 0. >> + count="$2" > > Redundant double-quotes. Perhaps, but sorry, I'm not code-golfing, and trying to remember when quotes can be elided when variables are referenced is simply not something I spend my very limited brain capacity on. Feel free to make any adjustments you want and commit that, or drop this, I'm not sending a v5 as that seems to be a waste of everybody's time. Rasmus
On Sun, Nov 24, 2024 at 7:12 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > On Sat, Nov 23 2024, Masahiro Yamada <masahiroy@kernel.org> wrote: > > > On Sat, Nov 23, 2024 at 12:01 AM Rasmus Villemoes > > <linux@rasmusvillemoes.dk> wrote: > > >> v4: > >> > >> - Switch the logic to make use of the return values from try_tag, > >> instead of asking whether $count has been set. > > > > > > No, please do not do this. > > > > As I replied in v3, my plan is to set -e, because otherwise > > the shell script is fragile. > > > > With this version, -e will not work in try_tag() > > because it is used in the if condition. > > I'm confused. Previously you said that "either style is fine with > me". That was my comment in v2. At that time, I was not aware that the -e option was missing in this script. Sorry, I changed my mind. In v3, I commented what I like this script to look like when turning on the -e option. Then, you came back with a different approach. > Now you've invented some necessity to add "set -e", which of course > yes, is then suppressed inside try_tag. But there is not a single > statement within that that would cause "set -e" to exit anyway: The only > one that is not a simple assignment or is itself a test is the "set -- > $()", and git rev-list failing there does not cause set -e to trigger. This is correct, but think about the resiliency when someone adds more code to try_tag() in the future. > Aside from that, I'm skeptical to introducing set -e anyway, it's simply > too hard to reason about what it will actually > do. http://mywiki.wooledge.org/BashFAQ/105 . But you're the maintainer. First, set -e is almost always useful because it is not realistic to add '|| exit 1' for every command. For example, see commit bc53d3d777f81385c1bb08b07bd1c06450ecc2c1 how the -e catches an error. Second, the link you reference is exaggerating. For example, the article mentioned the quirks in the Bash mode. This argument is not applicable in our case because Bash runs in the POSIX mode when it is invoked as 'sh'. Since this script is invoked by #!/bin/sh, 'set -e' follows the POSIX behavior whether /bin/sh is a symlink to bash or dash. The -e option is propagated to subshell executions. (It would not if the shebang had been #!/bin/bash, but Bash still provides "set -o posix" to cater to this case). In the referred link, I do not find a good reason to avoid using the -e option. When a function returns a value (yes/no question just like try_tag()), you need to be careful about the possible third case. 1) yes 2) no 3) error I hope the following provides even more clarification. Let's say, is_ancestor_tag() checks if the given tag is an ancestor or not. [Bad Code] set -e if is_ancestor_tag "${tag}"; then # Any error in is_ancestor_tag() is ignored. # "Yes, ... " may be printed even if an error occurs. echo "Yes, ${tag} is an ancestor" else echo "No, ${tag} is not an ancestor" fi [Good Code 1] set -e ret=$(is_ancestor_tag "${tag}") # If any error occurs in is_ancestor_tag() # the script is terminated here. if [ "${ret}" = yes ]; then echo "Yes, ${tag} is an ancestor" else echo "No, ${tag} is not an ancestor" fi [Good Code 2] set -e # is_ancestor_tag() sets 'ret' in the function body is_ancestor_tag "${tag}" if [ "${ret}" = yes ]; then echo "Yes, ${tag} is an ancestor" else echo "No, ${tag} is not an ancestor" fi V3 is [Good Code 2], as the return value, 'count' is assigned within the try_tag() function, and the caller checks it. > >> +try_tag() { > >> + tag="$1" > >> + > >> + # Is $tag an annotated tag? > >> + [ "$(git cat-file -t "$tag" 2> /dev/null)" = tag ] || return 1 > >> + > >> + # Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD? > >> + # shellcheck disable=SC2046 # word splitting is the point here > >> + set -- $(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null) > >> + > >> + # $1 is 0 if and only if $tag is an ancestor of HEAD. Use > >> + # string comparison, because $1 is empty if the 'git rev-list' > >> + # command somehow failed. > >> + [ "$1" = 0 ] || return 1 > >> + > >> + # $2 is the number of commits in the range $tag..HEAD, possibly 0. > >> + count="$2" > > > > Redundant double-quotes. > > Perhaps, but sorry, I'm not code-golfing, and trying to remember when > quotes can be elided when variables are referenced is simply not > something I spend my very limited brain capacity on. > > Feel free to make any adjustments you want and commit that, or drop > this, I'm not sending a v5 as that seems to be a waste of everybody's > time. > > Rasmus -- Best Regards Masahiro Yamada
diff --git a/scripts/setlocalversion b/scripts/setlocalversion index 38b96c6797f4..cde45d92cc0b 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -30,6 +30,27 @@ if test $# -gt 0 -o ! -d "$srctree"; then usage fi +try_tag() { + tag="$1" + + # Is $tag an annotated tag? + [ "$(git cat-file -t "$tag" 2> /dev/null)" = tag ] || return 1 + + # Is it an ancestor of HEAD, and if so, how many commits are in $tag..HEAD? + # shellcheck disable=SC2046 # word splitting is the point here + set -- $(git rev-list --count --left-right "$tag"...HEAD 2> /dev/null) + + # $1 is 0 if and only if $tag is an ancestor of HEAD. Use + # string comparison, because $1 is empty if the 'git rev-list' + # command somehow failed. + [ "$1" = 0 ] || return 1 + + # $2 is the number of commits in the range $tag..HEAD, possibly 0. + count="$2" + + return 0 +} + scm_version() { local short=false @@ -64,30 +85,27 @@ scm_version() # If a localversion* file exists, and the corresponding # annotated tag exists and is an ancestor of HEAD, use # it. This is the case in linux-next. - tag=${file_localversion#-} - desc= - if [ -n "${tag}" ]; then - desc=$(git describe --match=$tag 2>/dev/null) - fi - + if [ -n "${file_localversion#-}" ] && try_tag "${file_localversion#-}" ; then + : # Otherwise, if a localversion* file exists, and the tag # obtained by appending it to the tag derived from # KERNELVERSION exists and is an ancestor of HEAD, use # it. This is e.g. the case in linux-rt. - if [ -z "${desc}" ] && [ -n "${file_localversion}" ]; then - tag="${version_tag}${file_localversion}" - desc=$(git describe --match=$tag 2>/dev/null) - fi - + elif [ -n "${file_localversion}" ] && try_tag "${version_tag}${file_localversion}" ; then + : # Otherwise, default to the annotated tag derived from KERNELVERSION. - if [ -z "${desc}" ]; then - tag="${version_tag}" - desc=$(git describe --match=$tag 2>/dev/null) + elif try_tag "${version_tag}" ; then + : + else + count= fi - # If we are at the tagged commit, we ignore it because the version is - # well-defined. - if [ "${tag}" != "${desc}" ]; then + # If we are at the tagged commit, we ignore it because the + # version is well-defined. If none of the attempted tags exist + # or were usable, $count is empty, so there is no count to + # pretty-print, but we can and should still append the -g plus + # the abbreviated sha1. + if [ "${count}" != 0 ]; then # If only the short version is requested, don't bother # running further git commands @@ -95,14 +113,15 @@ scm_version() echo "+" return fi + # If we are past the tagged commit, we pretty print it. # (like 6.1.0-14595-g292a089d78d3) - if [ -n "${desc}" ]; then - echo "${desc}" | awk -F- '{printf("-%05d", $(NF-1))}' + if [ -n "${count}" ]; then + printf "%s%05d" "-" "${count}" fi # Add -g and exactly 12 hex chars. - printf '%s%s' -g "$(echo $head | cut -c1-12)" + printf '%s%.12s' -g "$head" fi if ${no_dirty}; then
Contrary to expectations, passing a single candidate tag to "git describe" is slower than not passing any --match options. $ time git describe --debug ... traversed 10619 commits ... v6.12-rc5-63-g0fc810ae3ae1 real 0m0.169s $ time git describe --match=v6.12-rc5 --debug ... traversed 1310024 commits v6.12-rc5-63-g0fc810ae3ae1 real 0m1.281s In fact, the --debug output shows that git traverses all or most of history. For some repositories and/or git versions, those 1.3s are actually 10-15 seconds. This has been acknowledged as a performance bug in git [1], and a fix is on its way [2]. However, no solution is yet in git.git, and even when one lands, it will take quite a while before it finds its way to a release and for $random_kernel_developer to pick that up. So rewrite the logic to use plumbing commands. For each of the candidate values of $tag, we ask: (1) is $tag even an annotated tag? (2) Is it eligible to describe HEAD, i.e. an ancestor of HEAD? (3) If so, how many commits are in $tag..HEAD? I have tested that this produces the same output as the current script for ~700 random commits between v6.9..v6.10. For those 700 commits, and in my git repo, the 'make -s kernelrelease' command is on average ~4 times faster with this patch applied (geometric mean of ratios). For the commit mentioned in Josh's original report [3], the time-consuming part of setlocalversion goes from $ time git describe --match=v6.12-rc5 c1e939a21eb1 v6.12-rc5-44-gc1e939a21eb1 real 0m1.210s to $ time git rev-list --count --left-right v6.12-rc5..c1e939a21eb1 0 44 real 0m0.037s [1] https://lore.kernel.org/git/20241101113910.GA2301440@coredump.intra.peff.net/ [2] https://lore.kernel.org/git/20241106192236.GC880133@coredump.intra.peff.net/ [3] https://lore.kernel.org/lkml/309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org/ Reported-by: Sean Christopherson <seanjc@google.com> Closes: https://lore.kernel.org/lkml/ZPtlxmdIJXOe0sEy@google.com/ Reported-by: Josh Poimboeuf <jpoimboe@kernel.org> Closes: https://lore.kernel.org/lkml/309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org/ Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> --- v4: - Switch the logic to make use of the return values from try_tag, instead of asking whether $count has been set. - Update a comment. - Drop T-b tag from Josh as I think this changes the flow sufficiently from the version he tested. I have repeated my tests, with the same functional and performance result as indicated in the commit log. v3: https://lore.kernel.org/lkml/20241118110154.3711777-1-linux@rasmusvillemoes.dk/ scripts/setlocalversion | 59 +++++++++++++++++++++++++++-------------- 1 file changed, 39 insertions(+), 20 deletions(-)