Message ID | 20250211194334.20710-1-meetsoni3017@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GSoC] merge-recursive: optimize string_list construction | expand |
On Tue, Feb 11, 2025 at 11:43 AM Meet Soni <meetsoni3017@gmail.com> wrote: > > Avoid O(n^2) complexity when building a sorted `string_list` by > constructing it unsorted and sorting it afterward, reducing the > complexity to O(n log n). I'm tempted to say merge-recursive.[ch] is nearly dead and planned for removal, so there's not much value in messing with it, but...it's not dead yet, so I guess this is worthwhile. > Signed-off-by: Meet Soni <meetsoni3017@gmail.com> > --- > merge-recursive.c | 14 ++++---------- > 1 file changed, 4 insertions(+), 10 deletions(-) > > diff --git a/merge-recursive.c b/merge-recursive.c > index 5dfaf32b2c..c43b79e6ef 100644 > --- a/merge-recursive.c > +++ b/merge-recursive.c > @@ -2757,24 +2757,18 @@ static int process_renames(struct merge_options *opt, > struct string_list b_by_dst = STRING_LIST_INIT_NODUP; > const struct rename *sre; > > - /* > - * FIXME: As string-list.h notes, it's O(n^2) to build a sorted > - * string_list one-by-one, but O(n log n) to build it unsorted and > - * then sort it. Note that as we build the list, we do not need to > - * check if the existing destination path is already in the list, > - * because the structure of diffcore_rename guarantees we won't > - * have duplicates. > - */ > for (i = 0; i < a_renames->nr; i++) { > sre = a_renames->items[i].util; > - string_list_insert(&a_by_dst, sre->pair->two->path)->util > + string_list_append(&a_by_dst, sre->pair->two->path)->util > = (void *)sre; > } > for (i = 0; i < b_renames->nr; i++) { > sre = b_renames->items[i].util; > - string_list_insert(&b_by_dst, sre->pair->two->path)->util > + string_list_append(&b_by_dst, sre->pair->two->path)->util > = (void *)sre; > } > + string_list_sort(&a_by_dst); > + string_list_sort(&b_by_dst); If the original source had duplicates, this would change behavior (the insert function checks for duplicates while append does not). Granted, the comment above the block points out why there aren't duplicates, but will that be obvious to future readers now that you've removed the whole comment? Also, are the sources already sorted? If so, we can avoid the manual sort calls at the end, and drop this from O(n log n) to O(n). Digging through the code...it appears these are setup in get_renames() and are sorted but by pair->one->path rather than pair->two->path, so we do need the sorts here. Of course, get_renames() itself utilizes string_list_insert() rather than string_list_append. and there are a number of other string_list_insert calls in the code (though some of the others might be hard to restructure) -- perhaps the first line of your commit message should have a "in process_renames" qualifier, since it's only addressing one case? Anyway, other than perhaps tweaking the first line of the commit message, and not removing the whole comment, the patch looks good to me.
diff --git a/merge-recursive.c b/merge-recursive.c index 5dfaf32b2c..c43b79e6ef 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -2757,24 +2757,18 @@ static int process_renames(struct merge_options *opt, struct string_list b_by_dst = STRING_LIST_INIT_NODUP; const struct rename *sre; - /* - * FIXME: As string-list.h notes, it's O(n^2) to build a sorted - * string_list one-by-one, but O(n log n) to build it unsorted and - * then sort it. Note that as we build the list, we do not need to - * check if the existing destination path is already in the list, - * because the structure of diffcore_rename guarantees we won't - * have duplicates. - */ for (i = 0; i < a_renames->nr; i++) { sre = a_renames->items[i].util; - string_list_insert(&a_by_dst, sre->pair->two->path)->util + string_list_append(&a_by_dst, sre->pair->two->path)->util = (void *)sre; } for (i = 0; i < b_renames->nr; i++) { sre = b_renames->items[i].util; - string_list_insert(&b_by_dst, sre->pair->two->path)->util + string_list_append(&b_by_dst, sre->pair->two->path)->util = (void *)sre; } + string_list_sort(&a_by_dst); + string_list_sort(&b_by_dst); for (i = 0, j = 0; i < a_renames->nr || j < b_renames->nr;) { struct string_list *renames1, *renames2Dst;
Avoid O(n^2) complexity when building a sorted `string_list` by constructing it unsorted and sorting it afterward, reducing the complexity to O(n log n). Signed-off-by: Meet Soni <meetsoni3017@gmail.com> --- merge-recursive.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) base-commit: 9520f7d9985d8879bddd157309928fc0679c8e92