Message ID | 9007249b9488c23f00c2d498ffd520e4af8b37a4.1673037405.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Bundle URIs V: creationToken heuristic for incremental fetches | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static const char *heuristics[] = { > + [BUNDLE_HEURISTIC_NONE] = "", > + [BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken", > +}; Ideally it would require the least amount of maintenance if we could define BUNDLE_HEURISTIC__COUNT as ARRAY_SIZE() of this thing, but it being a file scope static, it might not be easy to arrange that. As a lessor altenative, would it make it safer to size this array more explicitly using BUNDLE_HEURISTIC__COUNT macro? static const char *heuristics[BUNDLE_HEURISTIC__COUNT] = { ... }; or is it more-or-less moot point to aim for safety because nobody enforces that these [indices] used to define the contents of this array are dense? That is ... > @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value, > return 0; > } > > + if (!strcmp(subkey, "heuristic")) { > + int i; > + for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) { > + if (!strcmp(value, heuristics[i])) { > + list->heuristic = i; > + return 0; > + } > + } ... this strcmp() will segfault if heuristics[] array is sparse, or BUNDLE_HEURISTIC__COUNT is larger than the array (i.e. you add a new heuristic in "enum bundle_heuristic" before the __COUNT sentinel, but forget to add it to the heuristics[] array). "You are worrying too much. Our developers would notice a segfault and the current code, which may look risky to you, is something they can live with", is a perfectly acceptable response, but somehow I have this nagging feeling that we should be able to make it easier to maintain without incurring extra runtime cost. > diff --git a/bundle-uri.h b/bundle-uri.h > index d5e89f1671c..ad82174112d 100644 > --- a/bundle-uri.h > +++ b/bundle-uri.h > @@ -52,6 +52,14 @@ enum bundle_list_mode { > BUNDLE_MODE_ANY > }; > > +enum bundle_list_heuristic { > + BUNDLE_HEURISTIC_NONE = 0, > + BUNDLE_HEURISTIC_CREATIONTOKEN, > + > + /* Must be last. */ > + BUNDLE_HEURISTIC__COUNT, > +}; The only reason to leave a trailing comma is to make it easy to append new values at the end. By omitting the trailing comma, you can doubly signal "Must be last" here (not suggesting to remove the comment; suggesting to remove the trailing comma). Thanks.
On 1/8/2023 9:38 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> +static const char *heuristics[] = { >> + [BUNDLE_HEURISTIC_NONE] = "", >> + [BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken", >> +}; > > Ideally it would require the least amount of maintenance if we could > define BUNDLE_HEURISTIC__COUNT as ARRAY_SIZE() of this thing, but it > being a file scope static, it might not be easy to arrange that. As > a lessor altenative, would it make it safer to size this array more > explicitly using BUNDLE_HEURISTIC__COUNT macro? > > static const char *heuristics[BUNDLE_HEURISTIC__COUNT] = { > ... > }; Yes, I should have used this size indicator. > or is it more-or-less moot point to aim for safety because nobody > enforces that these [indices] used to define the contents of this > array are dense? > > That is ... > >> @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value, >> return 0; >> } >> >> + if (!strcmp(subkey, "heuristic")) { >> + int i; >> + for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) { >> + if (!strcmp(value, heuristics[i])) { >> + list->heuristic = i; >> + return 0; >> + } >> + } > > ... this strcmp() will segfault if heuristics[] array is sparse, or > BUNDLE_HEURISTIC__COUNT is larger than the array (i.e. you add a new > heuristic in "enum bundle_heuristic" before the __COUNT sentinel, > but forget to add it to the heuristics[] array). > > "You are worrying too much. Our developers would notice a segfault > and the current code, which may look risky to you, is something they > can live with", is a perfectly acceptable response, but somehow I > have this nagging feeling that we should be able to make it easier > to maintain without incurring extra runtime cost. You're right. I was following an established pattern of linking enums to values, but I'm not sure that those other examples will loop over the array like this looking for a value. A safer approach would be to have an array of (enum, string) pairs that could either be iterated in a loop (fast enough for a small number of enum values, such as this case) or used to populate a hashmap at runtime if needed for a large number of queries. >> diff --git a/bundle-uri.h b/bundle-uri.h >> index d5e89f1671c..ad82174112d 100644 >> --- a/bundle-uri.h >> +++ b/bundle-uri.h >> @@ -52,6 +52,14 @@ enum bundle_list_mode { >> BUNDLE_MODE_ANY >> }; >> >> +enum bundle_list_heuristic { >> + BUNDLE_HEURISTIC_NONE = 0, >> + BUNDLE_HEURISTIC_CREATIONTOKEN, >> + >> + /* Must be last. */ >> + BUNDLE_HEURISTIC__COUNT, >> +}; > > The only reason to leave a trailing comma is to make it easy to > append new values at the end. By omitting the trailing comma, you > can doubly signal "Must be last" here (not suggesting to remove the > comment; suggesting to remove the trailing comma). This is a great example of "doing the typically right thing" but without thinking of _why_ we do that thing. Thanks for pointing this out. Thanks, -Stolee
Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > The bundle.heuristic value communicates that the bundle list is > organized to make use of the bundle.<id>.creationToken values that may > be provided in the bundle list. Those values will create a total order > on the bundles, allowing the Git client to download them in a specific > order and even remember previously-downloaded bundles by storing the > maximum creation token value. > > Before implementing any logic that parses or uses the > bundle.<id>.creationToken values, teach Git to parse the > bundle.heuristic value from a bundle list. We can use 'test-tool > bundle-uri' to print the heuristic value and verify that the parsing > works correctly. > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > Documentation/config/bundle.txt | 7 +++++++ > bundle-uri.c | 21 +++++++++++++++++++++ > bundle-uri.h | 14 ++++++++++++++ > t/t5750-bundle-uri-parse.sh | 19 +++++++++++++++++++ > 4 files changed, 61 insertions(+) > > diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt > index daa21eb674a..3faae386853 100644 > --- a/Documentation/config/bundle.txt > +++ b/Documentation/config/bundle.txt > @@ -15,6 +15,13 @@ bundle.mode:: > complete understanding of the bundled information (`all`) or if any one > of the listed bundle URIs is sufficient (`any`). > > +bundle.heuristic:: > + If this string-valued key exists, then the bundle list is designed to > + work well with incremental `git fetch` commands. The heuristic signals > + that there are additional keys available for each bundle that help > + determine which subset of bundles the client should download. The > + only value currently understood is `creationToken`. This description clearly describes the 'heuristic' key and what it does. > + > bundle.<id>.*:: > The `bundle.<id>.*` keys are used to describe a single item in the > bundle list, grouped under `<id>` for identification purposes. > diff --git a/bundle-uri.c b/bundle-uri.c > index 36268dda172..56c94595c2a 100644 > --- a/bundle-uri.c > +++ b/bundle-uri.c > @@ -9,6 +9,11 @@ > #include "config.h" > #include "remote.h" > > +static const char *heuristics[] = { > + [BUNDLE_HEURISTIC_NONE] = "", > + [BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken", > +}; > + > static int compare_bundles(const void *hashmap_cmp_fn_data, > const struct hashmap_entry *he1, > const struct hashmap_entry *he2, > @@ -100,6 +105,9 @@ void print_bundle_list(FILE *fp, struct bundle_list *list) > fprintf(fp, "\tversion = %d\n", list->version); > fprintf(fp, "\tmode = %s\n", mode); > > + if (list->heuristic) > + printf("\theuristic = %s\n", heuristics[list->heuristic]); Given this condition, the 'heuristic' key should not be sent if it's 'BUNDLE_HEURISTIC_NONE'. But, as a fallback... > + > for_all_bundles_in_list(list, summarize_bundle, fp); > } > > @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value, > return 0; > } > > + if (!strcmp(subkey, "heuristic")) { > + int i; > + for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) { > + if (!strcmp(value, heuristics[i])) { > + list->heuristic = i; > + return 0; > + } > + } ...this condition seems to handle 'BUNDLE_HEURISTIC_NONE' anyway. There's no harm in this, since 'BUNDLE_HEURISTIC_NONE' is the default value of 'list->heuristic' anyway. > void init_bundle_list(struct bundle_list *list); > diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh > index 7b4f930e532..6fc92a9c0d4 100755 > --- a/t/t5750-bundle-uri-parse.sh > +++ b/t/t5750-bundle-uri-parse.sh > @@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' ' > test_cmp_config_output expect actual > ' > > +test_expect_success 'parse config format: creationToken heuristic' ' > + cat >expect <<-\EOF && > + [bundle] > + version = 1 > + mode = all > + heuristic = creationToken > + [bundle "one"] > + uri = http://example.com/bundle.bdl > + [bundle "two"] > + uri = https://example.com/bundle.bdl > + [bundle "three"] > + uri = file:///usr/share/git/bundle.bdl > + EOF > + > + test-tool bundle-uri parse-config expect >actual 2>err && > + test_must_be_empty err && > + test_cmp_config_output expect actual > +' And this test verifies that 'heuristic' is no longer being ignored. Looks good! > + > test_done
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt index daa21eb674a..3faae386853 100644 --- a/Documentation/config/bundle.txt +++ b/Documentation/config/bundle.txt @@ -15,6 +15,13 @@ bundle.mode:: complete understanding of the bundled information (`all`) or if any one of the listed bundle URIs is sufficient (`any`). +bundle.heuristic:: + If this string-valued key exists, then the bundle list is designed to + work well with incremental `git fetch` commands. The heuristic signals + that there are additional keys available for each bundle that help + determine which subset of bundles the client should download. The + only value currently understood is `creationToken`. + bundle.<id>.*:: The `bundle.<id>.*` keys are used to describe a single item in the bundle list, grouped under `<id>` for identification purposes. diff --git a/bundle-uri.c b/bundle-uri.c index 36268dda172..56c94595c2a 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -9,6 +9,11 @@ #include "config.h" #include "remote.h" +static const char *heuristics[] = { + [BUNDLE_HEURISTIC_NONE] = "", + [BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken", +}; + static int compare_bundles(const void *hashmap_cmp_fn_data, const struct hashmap_entry *he1, const struct hashmap_entry *he2, @@ -100,6 +105,9 @@ void print_bundle_list(FILE *fp, struct bundle_list *list) fprintf(fp, "\tversion = %d\n", list->version); fprintf(fp, "\tmode = %s\n", mode); + if (list->heuristic) + printf("\theuristic = %s\n", heuristics[list->heuristic]); + for_all_bundles_in_list(list, summarize_bundle, fp); } @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value, return 0; } + if (!strcmp(subkey, "heuristic")) { + int i; + for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) { + if (!strcmp(value, heuristics[i])) { + list->heuristic = i; + return 0; + } + } + + /* Ignore unknown heuristics. */ + return 0; + } + /* Ignore other unknown global keys. */ return 0; } diff --git a/bundle-uri.h b/bundle-uri.h index d5e89f1671c..ad82174112d 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -52,6 +52,14 @@ enum bundle_list_mode { BUNDLE_MODE_ANY }; +enum bundle_list_heuristic { + BUNDLE_HEURISTIC_NONE = 0, + BUNDLE_HEURISTIC_CREATIONTOKEN, + + /* Must be last. */ + BUNDLE_HEURISTIC__COUNT, +}; + /** * A bundle_list contains an unordered set of remote_bundle_info structs, * as well as information about the bundle listing, such as version and @@ -75,6 +83,12 @@ struct bundle_list { * advertised by the bundle list at that location. */ char *baseURI; + + /** + * A list can have a heuristic, which helps reduce the number of + * downloaded bundles. + */ + enum bundle_list_heuristic heuristic; }; void init_bundle_list(struct bundle_list *list); diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh index 7b4f930e532..6fc92a9c0d4 100755 --- a/t/t5750-bundle-uri-parse.sh +++ b/t/t5750-bundle-uri-parse.sh @@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' ' test_cmp_config_output expect actual ' +test_expect_success 'parse config format: creationToken heuristic' ' + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-config expect >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + test_done