Message ID | 20240530064638.GE1949704@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 64f8502b40affcb4c956c511fef7e704a88b0e25 |
Headers | show |
Series | add-ons for ps/leakfixes | expand |
Jeff King <peff@peff.net> writes: > We manually manage the src_dir array with ALLOC_GROW. Using a strvec is > a little more ergonomic, and makes the memory ownership more clear. It > does mean that we copy the strings (which were otherwise just pointers > into the "sources" strvec), but using the same rationale as 9fcd9e4e72 > (builtin/mv duplicate string list memory, 2024-05-27), it's just not > enough to be worth worrying about here. Hmph, the rationale given by 9fcd9e4e (builtin/mv duplicate string list memory, 2024-05-27) essentially is "the number of elements are the same as the number of command line parameters", but I do not think that is quite correct. When you do "mv srcA srcB ... dst", you'd inspect the command line arguments from left to right, notice that srcA is a directory, find the cache entries for paths that are inside srcA, append the paths in that directory to source[] and destination[] array, and extend argc. "for (i = 0; i < argc; i++)" loop that appends one element to src_for_dst per iteration ends up running the number of paths being moved, which can be order of magnitude more than the command line parameters. Of course, if we needed to make copies for correctness reasons (or to clarify memory ownership semantics), that alone may be a good justification and we do not need an excuse "it's just a handful of elements anyway" to begin with. Anyway, that is about somebody else's patch, not this one ;-). The rationale *does* apply to this change; src_dir is a list of directories we found on the command line, so the number of elements in it is reasonably bounded. > As a bonus, this gets rid of some "int"s used for allocation management > (though in practice these were limited to command-line sizes and thus > not overflowable). Again, correct. > Signed-off-by: Jeff King <peff@peff.net> > --- > builtin/mv.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/builtin/mv.c b/builtin/mv.c > index 01725e4a20..6c69033c5f 100644 > --- a/builtin/mv.c > +++ b/builtin/mv.c > @@ -197,8 +197,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > struct strvec submodule_gitfiles_to_free = STRVEC_INIT; > const char **submodule_gitfiles; > char *dst_w_slash = NULL; > - const char **src_dir = NULL; > - int src_dir_nr = 0, src_dir_alloc = 0; > + struct strvec src_dir = STRVEC_INIT; > enum update_mode *modes, dst_mode = 0; > struct stat st, dest_st; > struct string_list src_for_dst = STRING_LIST_INIT_DUP; > @@ -344,8 +343,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > /* last - first >= 1 */ > modes[i] |= WORKING_DIRECTORY; > > - ALLOC_GROW(src_dir, src_dir_nr + 1, src_dir_alloc); > - src_dir[src_dir_nr++] = src; > + strvec_push(&src_dir, src); > > n = argc + last - first; > REALLOC_ARRAY(modes, n); > @@ -559,7 +557,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > } > } > > - remove_empty_src_dirs(src_dir, src_dir_nr); > + remove_empty_src_dirs(src_dir.v, src_dir.nr); > > if (dirty_paths.nr) > advise_on_moving_dirty_path(&dirty_paths); > @@ -574,7 +572,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > ret = 0; > > out: > - free(src_dir); > + strvec_clear(&src_dir); > free(dst_w_slash); > string_list_clear(&src_for_dst, 0); > string_list_clear(&dirty_paths, 0);
On Thu, May 30, 2024 at 08:36:25AM -0700, Junio C Hamano wrote: > Hmph, the rationale given by 9fcd9e4e (builtin/mv duplicate string > list memory, 2024-05-27) essentially is "the number of elements are > the same as the number of command line parameters", but I do not > think that is quite correct. > > When you do "mv srcA srcB ... dst", you'd inspect the command line > arguments from left to right, notice that srcA is a directory, find > the cache entries for paths that are inside srcA, append the paths > in that directory to source[] and destination[] array, and extend > argc. "for (i = 0; i < argc; i++)" loop that appends one element to > src_for_dst per iteration ends up running the number of paths being > moved, which can be order of magnitude more than the command line > parameters. > > Of course, if we needed to make copies for correctness reasons (or > to clarify memory ownership semantics), that alone may be a good > justification and we do not need an excuse "it's just a handful of > elements anyway" to begin with. > > Anyway, that is about somebody else's patch, not this one ;-). Heh, good digging. I actually wondered if I was making the same mistake while writing mine, but double-checked that src_dir is not expanded in that way. But I didn't think to check Patrick's original. ;) IMHO it is probably still OK. We are bounded by the number of entries in the index (and we already use proportional memory for other parts of the operation). -Peff
Jeff King <peff@peff.net> writes: >> Anyway, that is about somebody else's patch, not this one ;-). > > Heh, good digging. I actually wondered if I was making the same mistake > while writing mine, but double-checked that src_dir is not expanded in > that way. But I didn't think to check Patrick's original. ;) > > IMHO it is probably still OK. We are bounded by the number of entries in > the index (and we already use proportional memory for other parts of the > operation). Yes, I agree it is OK. We have two arrays whose strings have to be allocated, source[] and destination[], and used to have another whose elements borrows from one of these, but with the change instead of borrowing we now duplicate, but as long as the other behefit outweigh the cost of additional memory usage, it is an acceptable trade-off. Thanks.
diff --git a/builtin/mv.c b/builtin/mv.c index 01725e4a20..6c69033c5f 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -197,8 +197,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) struct strvec submodule_gitfiles_to_free = STRVEC_INIT; const char **submodule_gitfiles; char *dst_w_slash = NULL; - const char **src_dir = NULL; - int src_dir_nr = 0, src_dir_alloc = 0; + struct strvec src_dir = STRVEC_INIT; enum update_mode *modes, dst_mode = 0; struct stat st, dest_st; struct string_list src_for_dst = STRING_LIST_INIT_DUP; @@ -344,8 +343,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) /* last - first >= 1 */ modes[i] |= WORKING_DIRECTORY; - ALLOC_GROW(src_dir, src_dir_nr + 1, src_dir_alloc); - src_dir[src_dir_nr++] = src; + strvec_push(&src_dir, src); n = argc + last - first; REALLOC_ARRAY(modes, n); @@ -559,7 +557,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) } } - remove_empty_src_dirs(src_dir, src_dir_nr); + remove_empty_src_dirs(src_dir.v, src_dir.nr); if (dirty_paths.nr) advise_on_moving_dirty_path(&dirty_paths); @@ -574,7 +572,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) ret = 0; out: - free(src_dir); + strvec_clear(&src_dir); free(dst_w_slash); string_list_clear(&src_for_dst, 0); string_list_clear(&dirty_paths, 0);
We manually manage the src_dir array with ALLOC_GROW. Using a strvec is a little more ergonomic, and makes the memory ownership more clear. It does mean that we copy the strings (which were otherwise just pointers into the "sources" strvec), but using the same rationale as 9fcd9e4e72 (builtin/mv duplicate string list memory, 2024-05-27), it's just not enough to be worth worrying about here. As a bonus, this gets rid of some "int"s used for allocation management (though in practice these were limited to command-line sizes and thus not overflowable). Signed-off-by: Jeff King <peff@peff.net> --- builtin/mv.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)