Message ID | 20200606002241.1578150-1-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fast-import: fix incomplete conversion with multiple mark files | expand |
Hi, On Sat, 6 Jun 2020, brian m. carlson wrote: > When ddddf8d7e2 ("fast-import: permit reading multiple marks files", > 2020-02-22) converted fast-import to handle multiple marks files in > preparation for submodule support, the conversion was incomplete. With > a large number of marks, we would actually modify the marks variable > even though we had passed in a different variable to operate on. In > addition, we didn't consider the fact that the code can replace the mark > set passed in, so when we did so we happened to leak quite a bit of > memory, since we never reused the structure we created, instead > reallocating a new one each time. > > It doesn't appear from some testing that we actually produce incorrect > results in this case, only that we leak a substantial amount of memory. > To make things work properly and avoid leaking, pass a pointer to > pointer to struct mark_set, which allows us to modify the set of marks > when the number of marks is large. > > With this patch, importing a dump of git.git with a set of exported > marks goes from taking in excess of 15 GiB of memory (and being killed > by the Linux OOM killer) to using a maximum of 1.4 GiB of memory. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Thanks for the quickly patching it! I tested the patch and I can confirm this solves the memory leak for me. Thanks, Tibor Billes
Tibor Billes <tbilles@gmx.com> writes: > On Sat, 6 Jun 2020, brian m. carlson wrote: > >> When ddddf8d7e2 ("fast-import: permit reading multiple marks files", >> 2020-02-22) converted fast-import to handle multiple marks files in >> preparation for submodule support, the conversion was incomplete. >... > > Thanks for the quickly patching it! I tested the patch and I can confirm this > solves the memory leak for me. Tibor, thanks for testing. Brian, I notice that the singleton global "marks_set_count", even though the number could be counted per instance of the mark_set structure, is still singleton. I didn't bother thinking about it when I wrote my "perhaps along this line" patch, but now I read it again, I think it is OK to leave it (and other stats counters) as is, primarily because we don't have a need (yet) to show stats per mark_set. Did you leave it as a global for the same reason? Just sanity-checking. Thanks.
On 2020-06-08 at 16:47:58, Junio C Hamano wrote: > Brian, I notice that the singleton global "marks_set_count", even > though the number could be counted per instance of the mark_set > structure, is still singleton. I didn't bother thinking about it > when I wrote my "perhaps along this line" patch, but now I read it > again, I think it is OK to leave it (and other stats counters) as > is, primarily because we don't have a need (yet) to show stats per > mark_set. Did you leave it as a global for the same reason? Just > sanity-checking. Yes, I did; sorry for not mentioning that in the commit message. I think it's fine to count the total number of marks processed as a statistic, and that would be independent of how we processed them. It's an interesting metric, but not so interesting that folks will have a need to see a detailed breakdown, I think.
diff --git a/fast-import.c b/fast-import.c index 0dfa14dc8c..ed87d6e380 100644 --- a/fast-import.c +++ b/fast-import.c @@ -150,7 +150,7 @@ struct recent_command { char *buf; }; -typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark); +typedef void (*mark_set_inserter_t)(struct mark_set **s, struct object_id *oid, uintmax_t mark); typedef void (*each_mark_fn_t)(uintmax_t mark, void *obj, void *cbp); /* Configured limits on output */ @@ -534,13 +534,15 @@ static char *pool_strdup(const char *s) return r; } -static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe) +static void insert_mark(struct mark_set **sp, uintmax_t idnum, struct object_entry *oe) { + struct mark_set *s = *sp; + while ((idnum >> s->shift) >= 1024) { s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set)); - s->shift = marks->shift + 10; - s->data.sets[0] = marks; - marks = s; + s->shift = (*sp)->shift + 10; + s->data.sets[0] = (*sp); + (*sp) = s; } while (s->shift) { uintmax_t i = idnum >> s->shift; @@ -958,7 +960,7 @@ static int store_object( e = insert_object(&oid); if (mark) - insert_mark(marks, mark, e); + insert_mark(&marks, mark, e); if (e->idx.offset) { duplicate_count_by_type[type]++; return 1; @@ -1156,7 +1158,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) e = insert_object(&oid); if (mark) - insert_mark(marks, mark, e); + insert_mark(&marks, mark, e); if (e->idx.offset) { duplicate_count_by_type[OBJ_BLOB]++; @@ -1731,7 +1733,7 @@ static void dump_marks(void) } } -static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark) +static void insert_object_entry(struct mark_set **s, struct object_id *oid, uintmax_t mark) { struct object_entry *e; e = find_object(oid); @@ -1748,12 +1750,12 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm insert_mark(s, mark, e); } -static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark) +static void insert_oid_entry(struct mark_set **s, struct object_id *oid, uintmax_t mark) { insert_mark(s, mark, xmemdupz(oid, sizeof(*oid))); } -static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter) +static void read_mark_file(struct mark_set **s, FILE *f, mark_set_inserter_t inserter) { char line[512]; while (fgets(line, sizeof(line), f)) { @@ -1786,7 +1788,7 @@ static void read_marks(void) goto done; /* Marks file does not exist */ else die_errno("cannot read '%s'", import_marks_file); - read_mark_file(marks, f, insert_object_entry); + read_mark_file(&marks, f, insert_object_entry); fclose(f); done: import_marks_file_done = 1; @@ -3242,7 +3244,7 @@ static void parse_alias(void) die(_("Expected 'to' command, got %s"), command_buf.buf); e = find_object(&b.oid); assert(e); - insert_mark(marks, next_mark, e); + insert_mark(&marks, next_mark, e); } static char* make_fast_import_path(const char *path) @@ -3340,7 +3342,7 @@ static void option_rewrite_submodules(const char *arg, struct string_list *list) fp = fopen(f, "r"); if (!fp) die_errno("cannot read '%s'", f); - read_mark_file(ms, fp, insert_oid_entry); + read_mark_file(&ms, fp, insert_oid_entry); fclose(fp); }