diff mbox series

[5/5] mv: replace src_dir with a strvec

Message ID 20240530064638.GE1949704@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 64f8502b40affcb4c956c511fef7e704a88b0e25
Headers show
Series add-ons for ps/leakfixes | expand

Commit Message

Jeff King May 30, 2024, 6:46 a.m. UTC
We manually manage the src_dir array with ALLOC_GROW. Using a strvec is
a little more ergonomic, and makes the memory ownership more clear. It
does mean that we copy the strings (which were otherwise just pointers
into the "sources" strvec), but using the same rationale as 9fcd9e4e72
(builtin/mv duplicate string list memory, 2024-05-27), it's just not
enough to be worth worrying about here.

As a bonus, this gets rid of some "int"s used for allocation management
(though in practice these were limited to command-line sizes and thus
not overflowable).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/mv.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Junio C Hamano May 30, 2024, 3:36 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> We manually manage the src_dir array with ALLOC_GROW. Using a strvec is
> a little more ergonomic, and makes the memory ownership more clear. It
> does mean that we copy the strings (which were otherwise just pointers
> into the "sources" strvec), but using the same rationale as 9fcd9e4e72
> (builtin/mv duplicate string list memory, 2024-05-27), it's just not
> enough to be worth worrying about here.

Hmph, the rationale given by 9fcd9e4e (builtin/mv duplicate string
list memory, 2024-05-27) essentially is "the number of elements are
the same as the number of command line parameters", but I do not
think that is quite correct.

When you do "mv srcA srcB ... dst", you'd inspect the command line
arguments from left to right, notice that srcA is a directory, find
the cache entries for paths that are inside srcA, append the paths
in that directory to source[] and destination[] array, and extend
argc.  "for (i = 0; i < argc; i++)" loop that appends one element to
src_for_dst per iteration ends up running the number of paths being
moved, which can be order of magnitude more than the command line
parameters.

Of course, if we needed to make copies for correctness reasons (or
to clarify memory ownership semantics), that alone may be a good
justification and we do not need an excuse "it's just a handful of
elements anyway" to begin with.

Anyway, that is about somebody else's patch, not this one ;-).

The rationale *does* apply to this change; src_dir is a list of
directories we found on the command line, so the number of elements
in it is reasonably bounded.

> As a bonus, this gets rid of some "int"s used for allocation management
> (though in practice these were limited to command-line sizes and thus
> not overflowable).

Again, correct.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/mv.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/mv.c b/builtin/mv.c
> index 01725e4a20..6c69033c5f 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -197,8 +197,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  	struct strvec submodule_gitfiles_to_free = STRVEC_INIT;
>  	const char **submodule_gitfiles;
>  	char *dst_w_slash = NULL;
> -	const char **src_dir = NULL;
> -	int src_dir_nr = 0, src_dir_alloc = 0;
> +	struct strvec src_dir = STRVEC_INIT;
>  	enum update_mode *modes, dst_mode = 0;
>  	struct stat st, dest_st;
>  	struct string_list src_for_dst = STRING_LIST_INIT_DUP;
> @@ -344,8 +343,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  			/* last - first >= 1 */
>  			modes[i] |= WORKING_DIRECTORY;
>  
> -			ALLOC_GROW(src_dir, src_dir_nr + 1, src_dir_alloc);
> -			src_dir[src_dir_nr++] = src;
> +			strvec_push(&src_dir, src);
>  
>  			n = argc + last - first;
>  			REALLOC_ARRAY(modes, n);
> @@ -559,7 +557,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  
> -	remove_empty_src_dirs(src_dir, src_dir_nr);
> +	remove_empty_src_dirs(src_dir.v, src_dir.nr);
>  
>  	if (dirty_paths.nr)
>  		advise_on_moving_dirty_path(&dirty_paths);
> @@ -574,7 +572,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  	ret = 0;
>  
>  out:
> -	free(src_dir);
> +	strvec_clear(&src_dir);
>  	free(dst_w_slash);
>  	string_list_clear(&src_for_dst, 0);
>  	string_list_clear(&dirty_paths, 0);
Jeff King May 31, 2024, 11:12 a.m. UTC | #2
On Thu, May 30, 2024 at 08:36:25AM -0700, Junio C Hamano wrote:

> Hmph, the rationale given by 9fcd9e4e (builtin/mv duplicate string
> list memory, 2024-05-27) essentially is "the number of elements are
> the same as the number of command line parameters", but I do not
> think that is quite correct.
> 
> When you do "mv srcA srcB ... dst", you'd inspect the command line
> arguments from left to right, notice that srcA is a directory, find
> the cache entries for paths that are inside srcA, append the paths
> in that directory to source[] and destination[] array, and extend
> argc.  "for (i = 0; i < argc; i++)" loop that appends one element to
> src_for_dst per iteration ends up running the number of paths being
> moved, which can be order of magnitude more than the command line
> parameters.
> 
> Of course, if we needed to make copies for correctness reasons (or
> to clarify memory ownership semantics), that alone may be a good
> justification and we do not need an excuse "it's just a handful of
> elements anyway" to begin with.
> 
> Anyway, that is about somebody else's patch, not this one ;-).

Heh, good digging. I actually wondered if I was making the same mistake
while writing mine, but double-checked that src_dir is not expanded in
that way. But I didn't think to check Patrick's original. ;)

IMHO it is probably still OK. We are bounded by the number of entries in
the index (and we already use proportional memory for other parts of the
operation).

-Peff
Junio C Hamano May 31, 2024, 2:56 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

>> Anyway, that is about somebody else's patch, not this one ;-).
>
> Heh, good digging. I actually wondered if I was making the same mistake
> while writing mine, but double-checked that src_dir is not expanded in
> that way. But I didn't think to check Patrick's original. ;)
>
> IMHO it is probably still OK. We are bounded by the number of entries in
> the index (and we already use proportional memory for other parts of the
> operation).

Yes, I agree it is OK.  We have two arrays whose strings have to be
allocated, source[] and destination[], and used to have another
whose elements borrows from one of these, but with the change
instead of borrowing we now duplicate, but as long as the other
behefit outweigh the cost of additional memory usage, it is an
acceptable trade-off.

Thanks.
diff mbox series

Patch

diff --git a/builtin/mv.c b/builtin/mv.c
index 01725e4a20..6c69033c5f 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -197,8 +197,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 	struct strvec submodule_gitfiles_to_free = STRVEC_INIT;
 	const char **submodule_gitfiles;
 	char *dst_w_slash = NULL;
-	const char **src_dir = NULL;
-	int src_dir_nr = 0, src_dir_alloc = 0;
+	struct strvec src_dir = STRVEC_INIT;
 	enum update_mode *modes, dst_mode = 0;
 	struct stat st, dest_st;
 	struct string_list src_for_dst = STRING_LIST_INIT_DUP;
@@ -344,8 +343,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 			/* last - first >= 1 */
 			modes[i] |= WORKING_DIRECTORY;
 
-			ALLOC_GROW(src_dir, src_dir_nr + 1, src_dir_alloc);
-			src_dir[src_dir_nr++] = src;
+			strvec_push(&src_dir, src);
 
 			n = argc + last - first;
 			REALLOC_ARRAY(modes, n);
@@ -559,7 +557,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	remove_empty_src_dirs(src_dir, src_dir_nr);
+	remove_empty_src_dirs(src_dir.v, src_dir.nr);
 
 	if (dirty_paths.nr)
 		advise_on_moving_dirty_path(&dirty_paths);
@@ -574,7 +572,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 	ret = 0;
 
 out:
-	free(src_dir);
+	strvec_clear(&src_dir);
 	free(dst_w_slash);
 	string_list_clear(&src_for_dst, 0);
 	string_list_clear(&dirty_paths, 0);