Message ID | 6fad1fc7fdad98c3dda1ec334a10a6a9e311fef8.1602139448.git.liu.denton@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | e37eae0c1e13cb418947383f7c803d9463bfa3d7 |
Headers | show |
Series | contrib/git-resurrect.sh: make it hash-agnostic | expand |
On Wed, Oct 07, 2020 at 11:44:40PM -0700, Denton Liu wrote: > diff --git a/contrib/git-resurrect.sh b/contrib/git-resurrect.sh > index 57a77c03f9..d843df3afd 100755 > --- a/contrib/git-resurrect.sh > +++ b/contrib/git-resurrect.sh > @@ -37,19 +37,18 @@ search_reflog_merges () { > ) > } > > -_x40="[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]" > -_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40" > +oid_pattern=$(git hash-object --stdin </dev/null | sed -e 's/./[0-9a-f]/g') This looks correct, although... > search_merges () { > git rev-list --all --grep="Merge branch '$1'" \ > --pretty=tformat:"%P %s" | > - sed -ne "/^$_x40 \($_x40\) Merge .*/ {s//\1/p;$early_exit}" > + sed -ne "/^$oid_pattern \($oid_pattern\) Merge .*/ {s//\1/p;$early_exit}" > } > > search_merge_targets () { > git rev-list --all --grep="Merge branch '[^']*' into $branch\$" \ > --pretty=tformat:"%H %s" --all | > - sed -ne "/^\($_x40\) Merge .*/ {s//\1/p;$early_exit} " > + sed -ne "/^\($oid_pattern\) Merge .*/ {s//\1/p;$early_exit} " > } in both cases we are matching output we asked for, so we really matching [0-9a-f]\+ would be correct and sufficient. That's a little simpler. I don't feel too strongly either way, though. Side note: It's a shame that there is no way to convince rev-list not to print the "commit ..." header, which is really what we're avoiding here. We probably should have suppressed it with user-formats when they were introduced, but it's too late to make that change. I wonder if it would be worth adding a command-line option, though. I've often had to hack around this when parsing rev-list output (and sometimes even resort to using git-log if it's a one-off). -Peff
On Thu, Oct 08, 2020 at 11:29:34AM -0700, Junio C Hamano wrote: > > Side note: It's a shame that there is no way to convince rev-list not > > to print the "commit ..." header, which is really what we're avoiding > > here. We probably should have suppressed it with user-formats when > > they were introduced, but it's too late to make that change. I wonder > > if it would be worth adding a command-line option, though. I've often > > had to hack around this when parsing rev-list output (and sometimes > > even resort to using git-log if it's a one-off). > > Or make "git log" without frills as fast as rev-list, perhaps? > > What extra things do we do that makes "log" inherently slower than > "rev-list"? It's not the speed of log that is a problem, but just that I usually try to use plumbing when scripting. So I often reach for rev-list first. I do think for just listing commit hashes that log is slower, though. One reason is that when there's a commit-graph, it's not as good at avoiding reading the commit objects. E.g.: $ time git rev-list HEAD >/dev/null real 0m0.031s user 0m0.027s sys 0m0.004s $ time git -c core.commitgraph=false rev-list HEAD >/dev/null real 0m0.362s user 0m0.345s sys 0m0.016s $ time git log --format=%H HEAD >/dev/null real 0m0.371s user 0m0.355s sys 0m0.016s So running "git log" takes about the same time as rev-list if we disable the commit-graph. Which makes sense. The pretty-print code is aggressive about loading the object contents, even if we end up not needing it (because really, everything _except_ userformat does need it, and even userformat usually needs it). I think it would be nice to make the formatting code smarter about reporting exactly which parts it needs. > I do not mind a new option (e.g. --no-header) to "rev-list", though. I took a brief look at this earlier today and it was more awkward than I expected. The "commit <oid>" header might also have other stuff attached to it (revision marks, parents, and who even knew we had a "--timestamp" option?). It's not clear where those things should go if we suppress the header (for oneline, they just get stuck in front of the oneline; would that be OK for userformats, too?). -Peff
On 2020-10-08 at 16:13:45, Jeff King wrote: > On Wed, Oct 07, 2020 at 11:44:40PM -0700, Denton Liu wrote: > > > diff --git a/contrib/git-resurrect.sh b/contrib/git-resurrect.sh > > index 57a77c03f9..d843df3afd 100755 > > --- a/contrib/git-resurrect.sh > > +++ b/contrib/git-resurrect.sh > > @@ -37,19 +37,18 @@ search_reflog_merges () { > > ) > > } > > > > -_x40="[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]" > > -_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40" > > +oid_pattern=$(git hash-object --stdin </dev/null | sed -e 's/./[0-9a-f]/g') > > This looks correct, although... We could write this more simply as this: oid_pattern=$(git hash-object /dev/null | sed -e 's/./[0-9a-f]/g') I'm almost certain that works just fine, even on Windows, and I think the hashing code may be able to optimize better if we avoid --stdin (since it knows the size ahead of time). The performance benefit, if any, won't be that significant, though, since we're only hashing 7 bytes. This is, of course, not worth a reroll in itself. > > search_merges () { > > git rev-list --all --grep="Merge branch '$1'" \ > > --pretty=tformat:"%P %s" | > > - sed -ne "/^$_x40 \($_x40\) Merge .*/ {s//\1/p;$early_exit}" > > + sed -ne "/^$oid_pattern \($oid_pattern\) Merge .*/ {s//\1/p;$early_exit}" > > } > > > > search_merge_targets () { > > git rev-list --all --grep="Merge branch '[^']*' into $branch\$" \ > > --pretty=tformat:"%H %s" --all | > > - sed -ne "/^\($_x40\) Merge .*/ {s//\1/p;$early_exit} " > > + sed -ne "/^\($oid_pattern\) Merge .*/ {s//\1/p;$early_exit} " > > } > > in both cases we are matching output we asked for, so we really matching > [0-9a-f]\+ would be correct and sufficient. That's a little simpler. I > don't feel too strongly either way, though. The problem here is that we'd need to write [0-9a-f][0-9a-f]* because this is a BRE and a backslashed + here is a GNU extension. I'm fine with this patch (and patch 1) as it stands or with that change. I don't think this can be ambiguous, although if we think it can be (even with silly user behavior), then we should adopt this solution. > Side note: It's a shame that there is no way to convince rev-list not > to print the "commit ..." header, which is really what we're avoiding > here. We probably should have suppressed it with user-formats when > they were introduced, but it's too late to make that change. I wonder > if it would be worth adding a command-line option, though. I've often > had to hack around this when parsing rev-list output (and sometimes > even resort to using git-log if it's a one-off). Yeah, I use git log for scripting more often than I'd like because rev-list can't remove the header. That would indeed be a welcome feature.
On Okt 08 2020, brian m. carlson wrote: > On 2020-10-08 at 16:13:45, Jeff King wrote: >> On Wed, Oct 07, 2020 at 11:44:40PM -0700, Denton Liu wrote: >> > search_merges () { >> > git rev-list --all --grep="Merge branch '$1'" \ >> > --pretty=tformat:"%P %s" | >> > - sed -ne "/^$_x40 \($_x40\) Merge .*/ {s//\1/p;$early_exit}" >> > + sed -ne "/^$oid_pattern \($oid_pattern\) Merge .*/ {s//\1/p;$early_exit}" >> > } >> > >> > search_merge_targets () { >> > git rev-list --all --grep="Merge branch '[^']*' into $branch\$" \ >> > --pretty=tformat:"%H %s" --all | >> > - sed -ne "/^\($_x40\) Merge .*/ {s//\1/p;$early_exit} " >> > + sed -ne "/^\($oid_pattern\) Merge .*/ {s//\1/p;$early_exit} " >> > } >> >> in both cases we are matching output we asked for, so we really matching >> [0-9a-f]\+ would be correct and sufficient. That's a little simpler. I >> don't feel too strongly either way, though. > > The problem here is that we'd need to write [0-9a-f][0-9a-f]* because > this is a BRE and a backslashed + here is a GNU extension. I wonder why --pretty uses %s when it is filtered out again anyway? (There is also a duplicate --all.) Andreas.
On Fri, Oct 09, 2020 at 09:55:53AM +0200, Andreas Schwab wrote: > >> > search_merges () { > >> > git rev-list --all --grep="Merge branch '$1'" \ > >> > --pretty=tformat:"%P %s" | > >> > - sed -ne "/^$_x40 \($_x40\) Merge .*/ {s//\1/p;$early_exit}" > >> > + sed -ne "/^$oid_pattern \($oid_pattern\) Merge .*/ {s//\1/p;$early_exit}" > [...] > I wonder why --pretty uses %s when it is filtered out again anyway? > (There is also a duplicate --all.) It does confirm that the commit in question is a (likely) merge commit, and not one that happens to have "Merge branch 'foo'" in the body, which the earlier --grep would have hit. But it doesn't actually check for 'foo' in the sed match, so it would be fooled by a commit message like: Merge branch 'bar' * bar: Merge branch 'foo' If we wanted to tighten that up, then sed should match the branch name. If we're willing to accept the looseness, the whole thing could probably be: git rev-list --all --grep="Merge branch '$1'" \ --min-parents 2 --parents ${early_exit:-"-1"} | awk '{print $3}' I'm happy either way, but I'm not sure anybody overly cares. Let's not derail Denton's actual fix to make the script work in a sha256 world. -Peff
diff --git a/contrib/git-resurrect.sh b/contrib/git-resurrect.sh index 57a77c03f9..d843df3afd 100755 --- a/contrib/git-resurrect.sh +++ b/contrib/git-resurrect.sh @@ -37,19 +37,18 @@ search_reflog_merges () { ) } -_x40="[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]" -_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40" +oid_pattern=$(git hash-object --stdin </dev/null | sed -e 's/./[0-9a-f]/g') search_merges () { git rev-list --all --grep="Merge branch '$1'" \ --pretty=tformat:"%P %s" | - sed -ne "/^$_x40 \($_x40\) Merge .*/ {s//\1/p;$early_exit}" + sed -ne "/^$oid_pattern \($oid_pattern\) Merge .*/ {s//\1/p;$early_exit}" } search_merge_targets () { git rev-list --all --grep="Merge branch '[^']*' into $branch\$" \ --pretty=tformat:"%H %s" --all | - sed -ne "/^\($_x40\) Merge .*/ {s//\1/p;$early_exit} " + sed -ne "/^\($oid_pattern\) Merge .*/ {s//\1/p;$early_exit} " } dry_run=
Since Git now supports hashes other than SHA-1, the hash length isn't guaranteed to be 40 characters. Replace $_x40 with a hash-agnostic OID pattern. Signed-off-by: Denton Liu <liu.denton@gmail.com> --- contrib/git-resurrect.sh | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)