Message ID | 20200124033436.81097-14-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add git-bugreport tool | expand |
emilyshaffer@google.com writes: > From: Emily Shaffer <emilyshaffer@google.com> > > Alongside the loose object counts, it can be useful to show the number > of packs and packed objects. This way we can check whether the repo has > an appropriate ratio of packed to loose objects to help determine > whether it's behaving correctly. This lets the enumeration machinery to enumerate at the individual object level, relies on the enumeration machinery to show your callback all objects from the same pack before an object from a different pack, to count both objects and packs. But given that packfiles record how many objects there are in the pack, and that the packfile.c layer must know what are the pack files we have available, I wonder if we have an API already to allow us to enumerate each packfile, and while counting the number of times our callback was called, add the objects contained in each of them. If there isn't, it does not sound like a bad API function to add (i.e. given a repository r, iterate over r->objects->packed_git and let the API caller see each of them; the API caller can grab num_objects and add them up).
emilyshaffer@google.com writes: > From: Emily Shaffer <emilyshaffer@google.com> > > Alongside the loose object counts, it can be useful to show the number > of packs and packed objects. This way we can check whether the repo has > an appropriate ratio of packed to loose objects to help determine > whether it's behaving correctly. This step makes me wonder if we want to see the midx as well. Didn't we have a bug that manifests only when midx is (or is not) in use? Similarly, I think use (or non-use) of the commit graph may also be a useful information for diagnosing.
On Tue, Feb 04, 2020 at 11:03:57AM -0800, Junio C Hamano wrote: > emilyshaffer@google.com writes: > > > From: Emily Shaffer <emilyshaffer@google.com> > > > > Alongside the loose object counts, it can be useful to show the number > > of packs and packed objects. This way we can check whether the repo has > > an appropriate ratio of packed to loose objects to help determine > > whether it's behaving correctly. > > This step makes me wonder if we want to see the midx as well. > Didn't we have a bug that manifests only when midx is (or is not) in > use? > > Similarly, I think use (or non-use) of the commit graph may also be > a useful information for diagnosing. > Sure. I'll look into both of these. Would you rather see them as add-ons later, or do you want me to hold off on a v6 of this set until they're ready? - Emily
On Tue, Feb 04, 2020 at 11:00:41AM -0800, Junio C Hamano wrote: > emilyshaffer@google.com writes: > > > From: Emily Shaffer <emilyshaffer@google.com> > > > > Alongside the loose object counts, it can be useful to show the number > > of packs and packed objects. This way we can check whether the repo has > > an appropriate ratio of packed to loose objects to help determine > > whether it's behaving correctly. > > This lets the enumeration machinery to enumerate at the individual > object level, relies on the enumeration machinery to show your > callback all objects from the same pack before an object from a > different pack, to count both objects and packs. > > But given that packfiles record how many objects there are in the > pack, and that the packfile.c layer must know what are the pack > files we have available, I wonder if we have an API already to allow > us to enumerate each packfile, and while counting the number of > times our callback was called, add the objects contained in each of > them. If there isn't, it does not sound like a bad API function to > add (i.e. given a repository r, iterate over r->objects->packed_git > and let the API caller see each of them; the API caller can grab > num_objects and add them up). Yeah, I don't see one. I'll go ahead and add it - that sounds straightforward enough. - Emily
diff --git a/bugreport.c b/bugreport.c index bf10857183..45cc1764e0 100644 --- a/bugreport.c +++ b/bugreport.c @@ -214,6 +214,41 @@ static void get_loose_object_summary(struct strbuf *obj_info) { total_count_questionable ? " (problem during count)" : ""); } +struct packed_object_cb_data { + struct packed_git *last_pack; + int pack_count; + int object_count; +}; + +static int packed_object_cb(const struct object_id *oid, + struct packed_git *pack, + uint32_t pos, + void *data) { + struct packed_object_cb_data *cb_data = data; + + if (!cb_data) + return 1; + + if (pack && pack != cb_data->last_pack) { + cb_data->last_pack = pack; + cb_data->pack_count++; + } + + cb_data->object_count++; + + return 0; +} + +static void get_packed_object_summary(struct strbuf *obj_info) +{ + struct packed_object_cb_data cb_data = {NULL, 0, 0}; + + for_each_packed_object(packed_object_cb, &cb_data, 0); + + strbuf_addf(obj_info, "%d total packs (%d objects)\n", + cb_data.pack_count, cb_data.object_count); +} + static const char * const bugreport_usage[] = { N_("git bugreport [-o|--output <file>]"), NULL @@ -292,6 +327,9 @@ int cmd_main(int argc, const char **argv) get_header(&buffer, "Loose Object Counts"); get_loose_object_summary(&buffer); + get_header(&buffer, "Packed Object Summary"); + get_packed_object_summary(&buffer); + report = fopen_for_writing(report_path.buf); strbuf_write(&buffer, report); fclose(report);