diff mbox series

[5/7] diffcore-rename: reduce jumpiness in progress counters

Message ID e8257516c1f983d590b8a81fb32ecd5cb91dc737.1607223276.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series diffcore-rename improvements | expand

Commit Message

Elijah Newren Dec. 6, 2020, 2:54 a.m. UTC
From: Elijah Newren <newren@gmail.com>

Inexact rename detection works by comparing all sources to all
destinations, computing similarities, and then finding the best matches
among those that are sufficiently similar.

However, it is preceded by exact rename detection that works by
checking if there are files with identical hashes.  If exact renames are
found, we can exclude some files from inexact rename detection.

The inexact rename detection loops over the full set of files, but
immediately skips those for which rename_dst[i].is_rename is true and
thus doesn't compare any sources to that destination.  As such, these
paths shouldn't be included in the progress counter.

For the eagle eyed, this change hints at an actual optimization -- the
first one I presented at Git Merge 2020.  I'll be submitting that
optimization later, once the basic merge-ort algorithm has merged.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Taylor Blau Dec. 9, 2020, 10:24 p.m. UTC | #1
On Sun, Dec 06, 2020 at 02:54:34AM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Inexact rename detection works by comparing all sources to all
> destinations, computing similarities, and then finding the best matches
> among those that are sufficiently similar.
>
> However, it is preceded by exact rename detection that works by
> checking if there are files with identical hashes.  If exact renames are
> found, we can exclude some files from inexact rename detection.
>
> The inexact rename detection loops over the full set of files, but
> immediately skips those for which rename_dst[i].is_rename is true and
> thus doesn't compare any sources to that destination.  As such, these
> paths shouldn't be included in the progress counter.
>
> For the eagle eyed, this change hints at an actual optimization -- the
> first one I presented at Git Merge 2020.  I'll be submitting that
> optimization later, once the basic merge-ort algorithm has merged.

;-) I was hoping that that was the case.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  diffcore-rename.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 7270eb6af48..3d637ba4645 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -587,7 +587,7 @@ void diffcore_rename(struct diff_options *options)
>  	if (options->show_rename_progress) {
>  		progress = start_delayed_progress(
>  				_("Performing inexact rename detection"),
> -				(uint64_t)rename_dst_nr * (uint64_t)rename_src_nr);
> +				(uint64_t)num_targets * (uint64_t)rename_src_nr);
>  	}
>
>  	mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_targets), sizeof(*mx));
> @@ -626,7 +626,8 @@ void diffcore_rename(struct diff_options *options)
>  			diff_free_filespec_blob(two);
>  		}
>  		dst_cnt++;
> -		display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr);
> +		display_progress(progress,
> +				 (uint64_t)dst_cnt * (uint64_t)rename_src_nr);

Both of these spots need only one cast, but the patch itself looks
correct (with or without dropping a cast on one of the operands in each
expression).

Thanks,
Taylor
Junio C Hamano Dec. 10, 2020, 2:36 a.m. UTC | #2
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> Inexact rename detection works by comparing all sources to all
> destinations, computing similarities, and then finding the best matches
> among those that are sufficiently similar.

Here, you are using <sources, destinations> as contrasting pair of
words, which supports my comment on 1/7 and 3/7 ;-)

> However, it is preceded by exact rename detection that works by
> checking if there are files with identical hashes.  If exact renames are
> found, we can exclude some files from inexact rename detection.
>
> The inexact rename detection loops over the full set of files, but
> immediately skips those for which rename_dst[i].is_rename is true and
> thus doesn't compare any sources to that destination.  As such, these
> paths shouldn't be included in the progress counter.
> ...
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 7270eb6af48..3d637ba4645 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -587,7 +587,7 @@ void diffcore_rename(struct diff_options *options)
>  	if (options->show_rename_progress) {
>  		progress = start_delayed_progress(
>  				_("Performing inexact rename detection"),
> -				(uint64_t)rename_dst_nr * (uint64_t)rename_src_nr);
> +				(uint64_t)num_targets * (uint64_t)rename_src_nr);
>  	}

The num_targets (number of destinations) holds the "remaining"
candidates after exact renames are taken care of, so this reduces
the size of the matrix used to count the progress meter.  OK.

>  	mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_targets), sizeof(*mx));
> @@ -626,7 +626,8 @@ void diffcore_rename(struct diff_options *options)
>  			diff_free_filespec_blob(two);
>  		}
>  		dst_cnt++;
> -		display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr);
> +		display_progress(progress,
> +				 (uint64_t)dst_cnt * (uint64_t)rename_src_nr);

And this fills the progress meter by using the number of
destinations that are actually considered.  Between the two hunks,
there is a "if the source of this destination is already known, move
to next 'i'" continue, that does not update the progress meter.
Changing (i+1) to dst_cnt here compensates for the reduction of the
matrix size we see above.

Makes sense.  This looks good.

>  	}
>  	stop_progress(&progress);
diff mbox series

Patch

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7270eb6af48..3d637ba4645 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -587,7 +587,7 @@  void diffcore_rename(struct diff_options *options)
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
-				(uint64_t)rename_dst_nr * (uint64_t)rename_src_nr);
+				(uint64_t)num_targets * (uint64_t)rename_src_nr);
 	}
 
 	mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_targets), sizeof(*mx));
@@ -626,7 +626,8 @@  void diffcore_rename(struct diff_options *options)
 			diff_free_filespec_blob(two);
 		}
 		dst_cnt++;
-		display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr);
+		display_progress(progress,
+				 (uint64_t)dst_cnt * (uint64_t)rename_src_nr);
 	}
 	stop_progress(&progress);