Message ID | 20180911234951.14129-5-sbeller@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fetch: make sure submodule oids are fetched | expand |
Stefan Beller <sbeller@google.com> writes: > We can string_list_insert() to maintain sorted-ness of the > list as we find new items, or we can string_list_append() to > build an unsorted list and sort it at the end just once. > > To pick which one is more appropriate, we notice the fact > that we discover new items more or less in the already > sorted order. That makes "append then sort" more > appropriate. Sorry, but I still do not get the math you are implying in the second paragraph. Are you saying that append-then-sort is efficient when items being appended is already sorted? That depends on the sorting algorithm used, so the logic is incomplete unless you say "given that we use X for sorting,...", I think. Do we really discover new items in sorted order, by the way? In a single diff invocation made inside collect_changed_submodules() for one commit in the superproject's history, we will grab changed paths in the pathname order (i.e. sorted); if the superproject's tip commit touches the submodules at paths A and Z, we will discover these two paths in sorted order. But because we are walking the superproject's history to collect all paths that have been affected in that function, and repeatedly calling diff as we discover commit in the superproject's history, I am not sure how well the resulting set of paths would be sorted. The tip commit in superproject's history may have modified the submodule at path X, the parent of that commit may have touched the submodule at path M, and its parent may have touched the submodule at path A. Don't we end up grabbing these paths in that discoverd order, i.e. X, M and A? I still think changing it from "insert as we find an item, keeping the list sorted" to "append all and then sort before we start looking things up from the result" makes sense, but I do not think the "we find things in sorted order" is either true, or it would affect the choice between the two. A justification to choose the latter I can think of that makes sense is that we don't have to pay cost to keep the list sorted while building it because we do not do any look-up while building the list. > Signed-off-by: Stefan Beller <sbeller@google.com> > --- > submodule.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/submodule.c b/submodule.c > index d29dfa3d1f5..c6eff7699f3 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp, > case RECURSE_SUBMODULES_DEFAULT: > case RECURSE_SUBMODULES_ON_DEMAND: > if (!submodule || > - !unsorted_string_list_lookup( > + !string_list_lookup( > &changed_submodule_names, > submodule->name)) > continue; > @@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r, > /* default value, "--submodule-prefix" and its value are added later */ > > calculate_changed_submodule_paths(); > + string_list_sort(&changed_submodule_names); > run_processes_parallel(max_parallel_jobs, > get_next_submodule, > fetch_start_failure,
On Wed, Sep 12, 2018 at 11:18 AM Junio C Hamano <gitster@pobox.com> wrote: > > Stefan Beller <sbeller@google.com> writes: > > > We can string_list_insert() to maintain sorted-ness of the > > list as we find new items, or we can string_list_append() to > > build an unsorted list and sort it at the end just once. > > > > To pick which one is more appropriate, we notice the fact > > that we discover new items more or less in the already > > sorted order. That makes "append then sort" more > > appropriate. > > Sorry, but I still do not get the math you are implying in the > second paragraph. Are you saying that append-then-sort is efficient > when items being appended is already sorted? That depends on the > sorting algorithm used, so the logic is incomplete unless you say > "given that we use X for sorting,...", I think. > > Do we really discover new items in sorted order, by the way? In a > single diff invocation made inside collect_changed_submodules() for > one commit in the superproject's history, we will grab changed paths > in the pathname order (i.e. sorted); if the superproject's tip commit > touches the submodules at paths A and Z, we will discover these two > paths in sorted order. > > But because we are walking the superproject's history to collect all > paths that have been affected in that function, and repeatedly > calling diff as we discover commit in the superproject's history, I > am not sure how well the resulting set of paths would be sorted. > > The tip commit in superproject's history may have modified the > submodule at path X, the parent of that commit may have touched the > submodule at path M, and its parent may have touched the submodule > at path A. Don't we end up grabbing these paths in that discoverd > order, i.e. X, M and A? That is true. > > I still think changing it from "insert as we find an item, keeping > the list sorted" to "append all and then sort before we start > looking things up from the result" makes sense, but I do not think > the "we find things in sorted order" is either true, or it would > affect the choice between the two. A justification to choose the > latter I can think of that makes sense is that we don't have to pay > cost to keep the list sorted while building it because we do not do > any look-up while building the list. ok. Thanks, Stefan
diff --git a/submodule.c b/submodule.c index d29dfa3d1f5..c6eff7699f3 100644 --- a/submodule.c +++ b/submodule.c @@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp, case RECURSE_SUBMODULES_DEFAULT: case RECURSE_SUBMODULES_ON_DEMAND: if (!submodule || - !unsorted_string_list_lookup( + !string_list_lookup( &changed_submodule_names, submodule->name)) continue; @@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r, /* default value, "--submodule-prefix" and its value are added later */ calculate_changed_submodule_paths(); + string_list_sort(&changed_submodule_names); run_processes_parallel(max_parallel_jobs, get_next_submodule, fetch_start_failure,
We can string_list_insert() to maintain sorted-ness of the list as we find new items, or we can string_list_append() to build an unsorted list and sort it at the end just once. To pick which one is more appropriate, we notice the fact that we discover new items more or less in the already sorted order. That makes "append then sort" more appropriate. Signed-off-by: Stefan Beller <sbeller@google.com> --- submodule.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)