Message ID | pull.1221.v3.git.git.1646777327043.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v3] cat-file: skip expanding default format | expand |
On Tue, Mar 08, 2022 at 10:08:46PM +0000, John Cai via GitGitGadget wrote: > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index 7b3f42950ec..e2edba70b41 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -351,6 +351,13 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > } > } > > +static int print_default_format(char *buf, int len, struct expand_data *data) > +{ > + return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), > + type_name(data->type), > + (uintmax_t)*data->info.sizep); > +} Two small nits here. It looks like the indentation on the second and third lines is off a little bit, since we'd typically expect those to be indented to the same margin as the first argument to xsnprintf(). The other is that you're reading data->info.sizep by dereferencing it, but we know that it points to data->size. So I think there it makes sense to just read the value directly out of data->size, though note that you'll still need the cast to uintmax_t since you're formatting it with PRIuMAX. > + > /* > * If "pack" is non-NULL, then "offset" is the byte offset within the pack from > * which the object may be accessed (though note that we may also rely on > @@ -381,10 +388,16 @@ static void batch_object_write(const char *obj_name, > } > } > > - strbuf_reset(scratch); > - strbuf_expand(scratch, opt->format, expand_format, data); > - strbuf_addch(scratch, '\n'); > - batch_write(opt, scratch->buf, scratch->len); > + if (!opt->format) { > + char buf[1024]; > + int len = print_default_format(buf, 1024, data); > + batch_write(opt, buf, len); Just curious (and apologies if this was discussed earlier and I missed it), but: is there a reason that we have to use a scratch buffer here that is separate from the strbuf we already have allocated? That would avoid a large-ish stack variable, but it means that the two paths are a little more similar, and can share the batch_write call outside of the if/else statement. The rest of the changes in this file all look good to me. > diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh > new file mode 100755 > index 00000000000..e463623f5a3 > --- /dev/null > +++ b/t/perf/p1006-cat-file.sh > @@ -0,0 +1,16 @@ > +#!/bin/sh > + > +test_description='Basic sort performance tests' Is this description a hold-over from p0071? If so, it may be worth updating here. > +test_expect_success 'setup' ' > + git rev-list --all >rla > +' > + > +test_perf 'cat-file --batch-check' ' > + git cat-file --batch-check <rla > +' We could probably get away with dropping the setup test and using `--batch-all-objects` here. Note that right now you're only printing commit objects, so there would be a slight behavior change from the way the patch is currently written, but it should demonstrate the same performance improvement. Thanks, Taylor
Hi Taylor, On 8 Mar 2022, at 17:30, Taylor Blau wrote: > On Tue, Mar 08, 2022 at 10:08:46PM +0000, John Cai via GitGitGadget wrote: >> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >> index 7b3f42950ec..e2edba70b41 100644 >> --- a/builtin/cat-file.c >> +++ b/builtin/cat-file.c >> @@ -351,6 +351,13 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >> } >> } >> >> +static int print_default_format(char *buf, int len, struct expand_data *data) >> +{ >> + return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), >> + type_name(data->type), >> + (uintmax_t)*data->info.sizep); >> +} > > Two small nits here. It looks like the indentation on the second and > third lines is off a little bit, since we'd typically expect those to be > indented to the same margin as the first argument to xsnprintf(). Thanks for bringing this up. I did have a question about indentation in this case. for the second line, I did try to indent it to align with buf. I attempted to do the same with the third line, but it's the ( that lines up with buf so optically it looks a little off. > > The other is that you're reading data->info.sizep by dereferencing it, > but we know that it points to data->size. So I think there it makes > sense to just read the value directly out of data->size, though note > that you'll still need the cast to uintmax_t since you're formatting it > with PRIuMAX. good point, I'll adjust this in the next version. > >> + >> /* >> * If "pack" is non-NULL, then "offset" is the byte offset within the pack from >> * which the object may be accessed (though note that we may also rely on >> @@ -381,10 +388,16 @@ static void batch_object_write(const char *obj_name, >> } >> } >> >> - strbuf_reset(scratch); >> - strbuf_expand(scratch, opt->format, expand_format, data); >> - strbuf_addch(scratch, '\n'); >> - batch_write(opt, scratch->buf, scratch->len); >> + if (!opt->format) { >> + char buf[1024]; >> + int len = print_default_format(buf, 1024, data); >> + batch_write(opt, buf, len); > > Just curious (and apologies if this was discussed earlier and I missed > it), but: is there a reason that we have to use a scratch buffer here > that is separate from the strbuf we already have allocated? > > That would avoid a large-ish stack variable, but it means that the two > paths are a little more similar, and can share the batch_write call > outside of the if/else statement. This was holdover code from before. Looks like the scratch buffer gets passed in. Do you mean we don't need to allocate char buf[1024] and instead we can just use scratch and pass it into print_default_format? > > The rest of the changes in this file all look good to me. > >> diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh >> new file mode 100755 >> index 00000000000..e463623f5a3 >> --- /dev/null >> +++ b/t/perf/p1006-cat-file.sh >> @@ -0,0 +1,16 @@ >> +#!/bin/sh >> + >> +test_description='Basic sort performance tests' > > Is this description a hold-over from p0071? If so, it may be worth > updating here. > >> +test_expect_success 'setup' ' >> + git rev-list --all >rla >> +' >> + >> +test_perf 'cat-file --batch-check' ' >> + git cat-file --batch-check <rla >> +' > > We could probably get away with dropping the setup test and using > `--batch-all-objects` here. Note that right now you're only printing > commit objects, so there would be a slight behavior change from the way > the patch is currently written, but it should demonstrate the same > performance improvement. This sounds good to me! > > Thanks, > Taylor
On 8 Mar 2022, at 18:09, John Cai wrote: > Hi Taylor, > > On 8 Mar 2022, at 17:30, Taylor Blau wrote: > >> On Tue, Mar 08, 2022 at 10:08:46PM +0000, John Cai via GitGitGadget wrote: >>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >>> index 7b3f42950ec..e2edba70b41 100644 >>> --- a/builtin/cat-file.c >>> +++ b/builtin/cat-file.c >>> @@ -351,6 +351,13 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >>> } >>> } >>> >>> +static int print_default_format(char *buf, int len, struct expand_data *data) >>> +{ >>> + return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), >>> + type_name(data->type), >>> + (uintmax_t)*data->info.sizep); >>> +} >> >> Two small nits here. It looks like the indentation on the second and >> third lines is off a little bit, since we'd typically expect those to be >> indented to the same margin as the first argument to xsnprintf(). > > Thanks for bringing this up. I did have a question about indentation in this > case. for the second line, I did try to indent it to align with buf. I attempted > to do the same with the third line, but it's the ( that lines up with buf so > optically it looks a little off. > >> >> The other is that you're reading data->info.sizep by dereferencing it, >> but we know that it points to data->size. So I think there it makes >> sense to just read the value directly out of data->size, though note >> that you'll still need the cast to uintmax_t since you're formatting it >> with PRIuMAX. > > good point, I'll adjust this in the next version. > >> >>> + >>> /* >>> * If "pack" is non-NULL, then "offset" is the byte offset within the pack from >>> * which the object may be accessed (though note that we may also rely on >>> @@ -381,10 +388,16 @@ static void batch_object_write(const char *obj_name, >>> } >>> } >>> >>> - strbuf_reset(scratch); >>> - strbuf_expand(scratch, opt->format, expand_format, data); >>> - strbuf_addch(scratch, '\n'); >>> - batch_write(opt, scratch->buf, scratch->len); >>> + if (!opt->format) { >>> + char buf[1024]; >>> + int len = print_default_format(buf, 1024, data); >>> + batch_write(opt, buf, len); >> >> Just curious (and apologies if this was discussed earlier and I missed >> it), but: is there a reason that we have to use a scratch buffer here >> that is separate from the strbuf we already have allocated? >> >> That would avoid a large-ish stack variable, but it means that the two >> paths are a little more similar, and can share the batch_write call >> outside of the if/else statement. > > This was holdover code from before. Looks like the scratch buffer gets passed > in. Do you mean we don't need to allocate char buf[1024] and instead we can just > use scratch and pass it into print_default_format? something like this? diff --git a/builtin/cat-file.c b/builtin/cat-file.c index e2edba70b418..2336bcc80850 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -351,11 +351,11 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } -static int print_default_format(char *buf, int len, struct expand_data *data) +static void print_default_format(struct strbuf *scratch, struct expand_data *data) { - return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), - type_name(data->type), - (uintmax_t)*data->info.sizep); + strbuf_addf(scratch, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), + type_name(data->type), + (uintmax_t)data->size); } /* @@ -388,17 +388,17 @@ static void batch_object_write(const char *obj_name, } } + strbuf_reset(scratch); + if (!opt->format) { - char buf[1024]; - int len = print_default_format(buf, 1024, data); - batch_write(opt, buf, len); + print_default_format(scratch, data); } else { - strbuf_reset(scratch); strbuf_expand(scratch, opt->format, expand_format, data); strbuf_addch(scratch, '\n'); - batch_write(opt, scratch->buf, scratch->len); } + batch_write(opt, scratch->buf, scratch->len); + if (opt->print_contents) { print_object_or_die(opt, data); batch_write(opt, "\n", 1); > >> >> The rest of the changes in this file all look good to me. >> >>> diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh >>> new file mode 100755 >>> index 00000000000..e463623f5a3 >>> --- /dev/null >>> +++ b/t/perf/p1006-cat-file.sh >>> @@ -0,0 +1,16 @@ >>> +#!/bin/sh >>> + >>> +test_description='Basic sort performance tests' >> >> Is this description a hold-over from p0071? If so, it may be worth >> updating here. >> >>> +test_expect_success 'setup' ' >>> + git rev-list --all >rla >>> +' >>> + >>> +test_perf 'cat-file --batch-check' ' >>> + git cat-file --batch-check <rla >>> +' >> >> We could probably get away with dropping the setup test and using >> `--batch-all-objects` here. Note that right now you're only printing >> commit objects, so there would be a slight behavior change from the way >> the patch is currently written, but it should demonstrate the same >> performance improvement. > > This sounds good to me! > >> >> Thanks, >> Taylor
diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 7b3f42950ec..e2edba70b41 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -351,6 +351,13 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } +static int print_default_format(char *buf, int len, struct expand_data *data) +{ + return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), + type_name(data->type), + (uintmax_t)*data->info.sizep); +} + /* * If "pack" is non-NULL, then "offset" is the byte offset within the pack from * which the object may be accessed (though note that we may also rely on @@ -381,10 +388,16 @@ static void batch_object_write(const char *obj_name, } } - strbuf_reset(scratch); - strbuf_expand(scratch, opt->format, expand_format, data); - strbuf_addch(scratch, '\n'); - batch_write(opt, scratch->buf, scratch->len); + if (!opt->format) { + char buf[1024]; + int len = print_default_format(buf, 1024, data); + batch_write(opt, buf, len); + } else { + strbuf_reset(scratch); + strbuf_expand(scratch, opt->format, expand_format, data); + strbuf_addch(scratch, '\n'); + batch_write(opt, scratch->buf, scratch->len); + } if (opt->print_contents) { print_object_or_die(opt, data); @@ -508,6 +521,9 @@ static int batch_unordered_packed(const struct object_id *oid, data); } + +#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)" + static int batch_objects(struct batch_options *opt) { struct strbuf input = STRBUF_INIT; @@ -516,9 +532,6 @@ static int batch_objects(struct batch_options *opt) int save_warning; int retval = 0; - if (!opt->format) - opt->format = "%(objectname) %(objecttype) %(objectsize)"; - /* * Expand once with our special mark_query flag, which will prime the * object_info to be handed to oid_object_info_extended for each @@ -526,12 +539,17 @@ static int batch_objects(struct batch_options *opt) */ memset(&data, 0, sizeof(data)); data.mark_query = 1; - strbuf_expand(&output, opt->format, expand_format, &data); + strbuf_expand(&output, + opt->format ? opt->format : DEFAULT_FORMAT, + expand_format, + &data); data.mark_query = 0; strbuf_release(&output); if (opt->cmdmode) data.split_on_whitespace = 1; + if (opt->format && !strcmp(opt->format, DEFAULT_FORMAT)) + opt->format = NULL; /* * If we are printing out the object, then always fill in the type, * since we will want to decide whether or not to stream. diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh new file mode 100755 index 00000000000..e463623f5a3 --- /dev/null +++ b/t/perf/p1006-cat-file.sh @@ -0,0 +1,16 @@ +#!/bin/sh + +test_description='Basic sort performance tests' +. ./perf-lib.sh + +test_perf_large_repo + +test_expect_success 'setup' ' + git rev-list --all >rla +' + +test_perf 'cat-file --batch-check' ' + git cat-file --batch-check <rla +' + +test_done