From patchwork Mon May 27 11:47:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13675105 Received: from wfout5-smtp.messagingengine.com (wfout5-smtp.messagingengine.com [64.147.123.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0982415E5B9 for ; Mon, 27 May 2024 11:47:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=64.147.123.148 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716810444; cv=none; b=DVSl8jyRb/ZEZw3hW2Tn4HDC/28uH2M5vonRWcZV12GBGPd7KMkD8gW2uFolYOkgi+RteS90hMPKppIDeZ7oeAZb+w2DAWPcRPwCsbZeOnmffbcUVPvnkRA/dRTQ2QfIytCvufQfgbdLar2WfIpTx4PrJ7iTSu53DsB3S16GDIk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716810444; c=relaxed/simple; bh=vBDW+mldnNjFq3BVYrh3M7uQiCeIISx85cTtmjH6Q0w=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=BjnaT0c2ccAgxLYv0iL2iaGp2sBkeEQHbqk37PXwAg045/nmOOYi2AW9GfwIh3L4v4h0SDIEQrydz84Hx5vmCVHKG9roehMskB1d4oAOVp3+EGiyf3ahgLMuxMAx2q88w3JP3qWImRzEtQgu4L1rdv4u7OoL61lPjd3Nl5fOudw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=pks.im; spf=pass smtp.mailfrom=pks.im; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b=dG585De2; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=H0BcR5cQ; arc=none smtp.client-ip=64.147.123.148 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="dG585De2"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="H0BcR5cQ" Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailfout.west.internal (Postfix) with ESMTP id 35A2B1C000CD; Mon, 27 May 2024 07:47:22 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Mon, 27 May 2024 07:47:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1716810441; x=1716896841; bh=tIZnt1NB54 EKQzEBE3VWMpIl9rOX0OA4XzwMgH2XCWE=; b=dG585De2cb+iyxyFscqe5x5eXZ yiDn48S7ogJppUEfQ9X2NnlM18yEJcemQWEoVYwxAlWD6cuiJ6esb82ouDTytr+J DaGqjtNgB/XjL+w3CWcmZJDPd6SmmIMYl2M7copTndoavF2D5Obk210+9ghVxN8M pQEZEdXAFBWIpy/ihoV4JFJ6O8cGfs1svGsBlDjqdGZXAEqn1uqof+EocCXwlExW CjzQklDdPDI1Dxf0Hdh7ViuE5D0G/vq+71bKDkbCl1qIm0/2hvQdtsGAnMYySlV9 idHW4TTCrXu4e/OyRmfRjs+ZFkSC+Tq9fsTnmYOsWl39ztj8148EYRt8pL5g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; t=1716810441; x=1716896841; bh=tIZnt1NB54EKQzEBE3VWMpIl9rOX 0OA4XzwMgH2XCWE=; b=H0BcR5cQz9NBQf9z8RZJdwnphAg/Cl5qjAJXFCol76sU cwmfXX03rjA3caTTCZ3BxHV73BCq907houRm/SZAOOd8BV7liX04s9VY6PPw+eCK AAVeL7tzGidh1VKIcoPVLNJJoQDGuzoX/4KX6N6ZqpfJgpxv+Phwz8h2JnIGYT4x UHzPF+jphV78z4txvk50ZgV7kXVzJSjB5xDRTMQ03LHnYcQxE59U0Ip7ur0x0cUw l5oOIGp5ja729/k/Mf/0Vwpi6o2ZFv2u+Xyz71pkTPoOTqtDb04HLYUps+7dhcId eOza29hgjWjX7LVMs7479pUh5X0ZcGEpSE+Lx7W9iw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvdejgedggeefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvfevuffkfhggtggujgesghdtreertddtvdenucfhrhhomheprfgrthhr ihgtkhcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvg hrnhepueektdevtdffveeljeetgfehheeigeekleduvdeffeeghefgledttdehjeelffet necuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehmrghilhhfrhhomhepphhsse hpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 27 May 2024 07:47:20 -0400 (EDT) Received: by localhost (OpenSMTPD) with ESMTPSA id b1a13543 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 27 May 2024 11:47:10 +0000 (UTC) Date: Mon, 27 May 2024 13:47:18 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Eric Sunshine , Junio C Hamano , Karthik Nayak , Jeff King Subject: [PATCH v3 20/21] builtin/mv: refactor to use `struct strvec` Message-ID: <48b9d3e3436aff539e5c76c945f95067110254ff.1716810168.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Memory allocation patterns in git-mv(1) are extremely hard to follow: We copy around string pointers into manually-managed arrays, some of which alias each other, but only sometimes, while we also drop some of those strings at other times without ever daring to free them. While this may be my own subjective feeling, it seems like others have given up as the code has multiple calls to `UNLEAK()`. These are not sufficient though, and git-mv(1) is still leaking all over the place even with them. Refactor the code to instead track strings in `struct strvec`. While this has the effect of effectively duplicating some of the strings without an actual need, it is way easier to reason about and fixes all of the aliasing of memory that has been going on. It allows us to get rid of the `UNLEAK()` calls and also fixes leaks that those calls did not paper over. Mark tests which are now leak-free accordingly. Signed-off-by: Patrick Steinhardt --- builtin/mv.c | 125 +++++++++++------------ t/t4001-diff-rename.sh | 4 +- t/t4043-diff-rename-binary.sh | 1 + t/t4120-apply-popt.sh | 1 + t/t6400-merge-df.sh | 1 + t/t6412-merge-large-rename.sh | 1 + t/t6426-merge-skip-unneeded-updates.sh | 1 + t/t6429-merge-sequence-rename-caching.sh | 1 + 8 files changed, 68 insertions(+), 67 deletions(-) diff --git a/builtin/mv.c b/builtin/mv.c index 12dcc0b13c..e461d29ca1 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -20,6 +20,7 @@ #include "read-cache-ll.h" #include "repository.h" #include "setup.h" +#include "strvec.h" #include "submodule.h" #include "entry.h" @@ -38,42 +39,32 @@ enum update_mode { #define DUP_BASENAME 1 #define KEEP_TRAILING_SLASH 2 -static const char **internal_prefix_pathspec(const char *prefix, - const char **pathspec, - int count, unsigned flags) +static void internal_prefix_pathspec(struct strvec *out, + const char *prefix, + const char **pathspec, + int count, unsigned flags) { - int i; - const char **result; int prefixlen = prefix ? strlen(prefix) : 0; - ALLOC_ARRAY(result, count + 1); /* Create an intermediate copy of the pathspec based on the flags */ - for (i = 0; i < count; i++) { - int length = strlen(pathspec[i]); - int to_copy = length; - char *it; + for (int i = 0; i < count; i++) { + size_t length = strlen(pathspec[i]); + size_t to_copy = length; + const char *maybe_basename; + char *trimmed, *prefixed_path; + while (!(flags & KEEP_TRAILING_SLASH) && to_copy > 0 && is_dir_sep(pathspec[i][to_copy - 1])) to_copy--; - it = xmemdupz(pathspec[i], to_copy); - if (flags & DUP_BASENAME) { - result[i] = xstrdup(basename(it)); - free(it); - } else { - result[i] = it; - } - } - result[count] = NULL; + trimmed = xmemdupz(pathspec[i], to_copy); + maybe_basename = (flags & DUP_BASENAME) ? basename(trimmed) : trimmed; + prefixed_path = prefix_path(prefix, prefixlen, maybe_basename); + strvec_push(out, prefixed_path); - /* Prefix the pathspec and free the old intermediate strings */ - for (i = 0; i < count; i++) { - const char *match = prefix_path(prefix, prefixlen, result[i]); - free((char *) result[i]); - result[i] = match; + free(prefixed_path); + free(trimmed); } - - return result; } static char *add_slash(const char *path) @@ -176,7 +167,10 @@ int cmd_mv(int argc, const char **argv, const char *prefix) OPT_BOOL(0, "sparse", &ignore_sparse, N_("allow updating entries outside of the sparse-checkout cone")), OPT_END(), }; - const char **source, **destination, **dest_path, **submodule_gitfile; + struct strvec sources = STRVEC_INIT; + struct strvec dest_paths = STRVEC_INIT; + struct strvec destinations = STRVEC_INIT; + const char **submodule_gitfile; char *dst_w_slash = NULL; const char **src_dir = NULL; int src_dir_nr = 0, src_dir_alloc = 0; @@ -201,7 +195,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) if (repo_read_index(the_repository) < 0) die(_("index file corrupt")); - source = internal_prefix_pathspec(prefix, argv, argc, 0); + internal_prefix_pathspec(&sources, prefix, argv, argc, 0); CALLOC_ARRAY(modes, argc); /* @@ -212,41 +206,39 @@ int cmd_mv(int argc, const char **argv, const char *prefix) flags = KEEP_TRAILING_SLASH; if (argc == 1 && is_directory(argv[0]) && !is_directory(argv[1])) flags = 0; - dest_path = internal_prefix_pathspec(prefix, argv + argc, 1, flags); - dst_w_slash = add_slash(dest_path[0]); + internal_prefix_pathspec(&dest_paths, prefix, argv + argc, 1, flags); + dst_w_slash = add_slash(dest_paths.v[0]); submodule_gitfile = xcalloc(argc, sizeof(char *)); - if (dest_path[0][0] == '\0') + if (dest_paths.v[0][0] == '\0') /* special case: "." was normalized to "" */ - destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME); - else if (!lstat(dest_path[0], &st) && - S_ISDIR(st.st_mode)) { - destination = internal_prefix_pathspec(dst_w_slash, argv, argc, DUP_BASENAME); + internal_prefix_pathspec(&destinations, dest_paths.v[0], argv, argc, DUP_BASENAME); + else if (!lstat(dest_paths.v[0], &st) && S_ISDIR(st.st_mode)) { + internal_prefix_pathspec(&destinations, dst_w_slash, argv, argc, DUP_BASENAME); + } else if (!path_in_sparse_checkout(dst_w_slash, the_repository->index) && + empty_dir_has_sparse_contents(dst_w_slash)) { + internal_prefix_pathspec(&destinations, dst_w_slash, argv, argc, DUP_BASENAME); + dst_mode = SKIP_WORKTREE_DIR; + } else if (argc != 1) { + die(_("destination '%s' is not a directory"), dest_paths.v[0]); } else { - if (!path_in_sparse_checkout(dst_w_slash, the_repository->index) && - empty_dir_has_sparse_contents(dst_w_slash)) { - destination = internal_prefix_pathspec(dst_w_slash, argv, argc, DUP_BASENAME); - dst_mode = SKIP_WORKTREE_DIR; - } else if (argc != 1) { - die(_("destination '%s' is not a directory"), dest_path[0]); - } else { - destination = dest_path; - /* - * is a file outside of sparse-checkout - * cone. Insist on cone mode here for backward - * compatibility. We don't want dst_mode to be assigned - * for a file when the repo is using no-cone mode (which - * is deprecated at this point) sparse-checkout. As - * SPARSE here is only considering cone-mode situation. - */ - if (!path_in_cone_mode_sparse_checkout(destination[0], the_repository->index)) - dst_mode = SPARSE; - } + strvec_pushv(&destinations, dest_paths.v); + + /* + * is a file outside of sparse-checkout + * cone. Insist on cone mode here for backward + * compatibility. We don't want dst_mode to be assigned + * for a file when the repo is using no-cone mode (which + * is deprecated at this point) sparse-checkout. As + * SPARSE here is only considering cone-mode situation. + */ + if (!path_in_cone_mode_sparse_checkout(destinations.v[0], the_repository->index)) + dst_mode = SPARSE; } /* Checking */ for (i = 0; i < argc; i++) { - const char *src = source[i], *dst = destination[i]; + const char *src = sources.v[i], *dst = destinations.v[i]; int length; const char *bad = NULL; int skip_sparse = 0; @@ -330,8 +322,6 @@ int cmd_mv(int argc, const char **argv, const char *prefix) src_dir[src_dir_nr++] = src; n = argc + last - first; - REALLOC_ARRAY(source, n); - REALLOC_ARRAY(destination, n); REALLOC_ARRAY(modes, n); REALLOC_ARRAY(submodule_gitfile, n); @@ -341,12 +331,16 @@ int cmd_mv(int argc, const char **argv, const char *prefix) for (j = 0; j < last - first; j++) { const struct cache_entry *ce = the_repository->index->cache[first + j]; const char *path = ce->name; - source[argc + j] = path; - destination[argc + j] = - prefix_path(dst_with_slash, dst_with_slash_len, path + length + 1); + char *prefixed_path = prefix_path(dst_with_slash, dst_with_slash_len, path + length + 1); + + strvec_push(&sources, path); + strvec_push(&destinations, prefixed_path); + memset(modes + argc + j, 0, sizeof(enum update_mode)); modes[argc + j] |= ce_skip_worktree(ce) ? SPARSE : INDEX; submodule_gitfile[argc + j] = NULL; + + free(prefixed_path); } free(dst_with_slash); @@ -430,8 +424,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) remove_entry: if (--argc > 0) { int n = argc - i; - MOVE_ARRAY(source + i, source + i + 1, n); - MOVE_ARRAY(destination + i, destination + i + 1, n); + strvec_remove(&sources, i); + strvec_remove(&destinations, i); MOVE_ARRAY(modes + i, modes + i + 1, n); MOVE_ARRAY(submodule_gitfile + i, submodule_gitfile + i + 1, n); @@ -448,7 +442,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) } for (i = 0; i < argc; i++) { - const char *src = source[i], *dst = destination[i]; + const char *src = sources.v[i], *dst = destinations.v[i]; enum update_mode mode = modes[i]; int pos; int sparse_and_dirty = 0; @@ -576,8 +570,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) string_list_clear(&src_for_dst, 0); string_list_clear(&dirty_paths, 0); string_list_clear(&only_match_skip_worktree, 0); - UNLEAK(source); - UNLEAK(dest_path); + strvec_clear(&sources); + strvec_clear(&dest_paths); + strvec_clear(&destinations); free(submodule_gitfile); free(modes); return ret; diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh index 49c042a38a..cd1931dd55 100755 --- a/t/t4001-diff-rename.sh +++ b/t/t4001-diff-rename.sh @@ -3,9 +3,9 @@ # Copyright (c) 2005 Junio C Hamano # -test_description='Test rename detection in diff engine. +test_description='Test rename detection in diff engine.' -' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh . "$TEST_DIRECTORY"/lib-diff.sh diff --git a/t/t4043-diff-rename-binary.sh b/t/t4043-diff-rename-binary.sh index 2a2cf91352..e486493908 100755 --- a/t/t4043-diff-rename-binary.sh +++ b/t/t4043-diff-rename-binary.sh @@ -5,6 +5,7 @@ test_description='Move a binary file' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh diff --git a/t/t4120-apply-popt.sh b/t/t4120-apply-popt.sh index 697e86c0ff..f788428540 100755 --- a/t/t4120-apply-popt.sh +++ b/t/t4120-apply-popt.sh @@ -5,6 +5,7 @@ test_description='git apply -p handling.' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success setup ' diff --git a/t/t6400-merge-df.sh b/t/t6400-merge-df.sh index 3de4ef6bd9..27d6efdc9a 100755 --- a/t/t6400-merge-df.sh +++ b/t/t6400-merge-df.sh @@ -7,6 +7,7 @@ test_description='Test merge with directory/file conflicts' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success 'prepare repository' ' diff --git a/t/t6412-merge-large-rename.sh b/t/t6412-merge-large-rename.sh index ca018d11f5..d0863a8fb5 100755 --- a/t/t6412-merge-large-rename.sh +++ b/t/t6412-merge-large-rename.sh @@ -4,6 +4,7 @@ test_description='merging with large rename matrix' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh count() { diff --git a/t/t6426-merge-skip-unneeded-updates.sh b/t/t6426-merge-skip-unneeded-updates.sh index b059475ed0..62f0180325 100755 --- a/t/t6426-merge-skip-unneeded-updates.sh +++ b/t/t6426-merge-skip-unneeded-updates.sh @@ -22,6 +22,7 @@ test_description="merge cases" # underscore notation is to differentiate different # files that might be renamed into each other's paths.) +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh . "$TEST_DIRECTORY"/lib-merge.sh diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh index 0f39ed0d08..cb1c4ceef7 100755 --- a/t/t6429-merge-sequence-rename-caching.sh +++ b/t/t6429-merge-sequence-rename-caching.sh @@ -2,6 +2,7 @@ test_description="remember regular & dir renames in sequence of merges" +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh #