diff mbox series

[2/2] cat-file: configurable "best effort mode" for symlink resolution

Message ID 5de72c45e767c4d704503c8cd5c8e6dce4ea04d6.1718615028.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Symlink resolutions: limits and return modes | expand

Commit Message

Miguel Ángel Pastor Olivar June 17, 2024, 9:03 a.m. UTC
From: =?UTF-8?q?Miguel=20=C3=81ngel=20Pastor=20Olivar?= <migue@github.com>

This patch introduces a new "best effort mode" where the object found at
resolution step N is returned. If we've reached the end of the chain, the
returned object will be the file at the end of the chain, however, if, after
n resolutions we haven't reached the end of the chain, the returned object
will represent a symlink

The goal is to extend the symlink resolution process and provide the ability
to return the object found at the designated depth instead of returning an
error.

The current code already provides a limit to the maximum number of
resolutions that can be performed and something similar to this is returned
back to the caller:

loop SP <size> LF <object> LF

With the new config setting we are looking to return the actual information
of the object where the resolution stopped. Something similar to:

<oid> blob <size>\ndata\n

Signed-off-by: Miguel Ángel Pastor Olivar <migue@github.com>
---
 Documentation/config/core.txt | 16 +++++++++++++++-
 config.c                      | 13 +++++++++++++
 environment.c                 |  2 ++
 environment.h                 |  7 +++++++
 t/t1006-cat-file.sh           | 29 +++++++++++++++++++++++++++++
 tree-walk.c                   | 11 +++++++++++
 6 files changed, 77 insertions(+), 1 deletion(-)

Comments

Junio C Hamano June 17, 2024, 7:33 p.m. UTC | #1
"Miguel Ángel Pastor Olivar via GitGitGadget"
<gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Miguel=20=C3=81ngel=20Pastor=20Olivar?= <migue@github.com>
>
> This patch introduces a new "best effort mode" where the object found at
> resolution step N is returned. If we've reached the end of the chain, the
> returned object will be the file at the end of the chain, however, if, after
> n resolutions we haven't reached the end of the chain, the returned object
> will represent a symlink
>
> The goal is to extend the symlink resolution process and provide the ability
> to return the object found at the designated depth instead of returning an
> error.
>
> The current code already provides a limit to the maximum number of
> resolutions that can be performed and something similar to this is returned
> back to the caller:
>
> loop SP <size> LF <object> LF
>
> With the new config setting we are looking to return the actual information
> of the object where the resolution stopped. Something similar to:
>
> <oid> blob <size>\ndata\n

I do not think this should be a configuration variable at all.
Either a command line option, or even better yet would be an
in-stream instruction ("flip into the 'tell me the last symlink
you saw before you gave up' mode"), is understandable though, given
that this is strictly for the "batch" mode.

For that matter, it is dubious that the previous one that added a
configuration variable to lower the symlink recursion limit is a
good idea.  It does not affect anything but "cat-file --batch" and
an in-stream instruction, e.g. "in this session, do not resolve more
than 3 levels", sounds like a much better fit to what this wants to
do.  That way, it will be a lot better isolated from unrelated code
paths.  It might even make sense not to introduce the new
max_symlink_depth global variable, but pass it through as a new
member in "struct object_context" given to get_oid_with_context(),
which in turn is passed as a new parameter to
get_tree_entry_follow_symlinks() function.

So, I am supportive to solving the problem this series attempts to
solve, but I am not on board with the design this series took.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index ca2d1eede52..706f316c89e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -761,4 +761,18 @@  core.maxTreeDepth::
 core.maxSymlinkDepth::
 	The maximum number of symlinks Git is willing to resolve while
 	looking for a tree entry.
-	The default is GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS.
\ No newline at end of file
+	The default is GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS.
+
+core.symlinkResolutionMode::
+	The result returned by the symlink resolution process when
+	core.maxSymlinkDepth is reached. When set to "error"
+	`
+	loop SP <size> LF
+	<object> LF
+	` is returned.
+	If `best-effort` is set, the resolution process will return
+	something like:
+	`
+	<oid> blob <size> 120000\nname\n
+	`
+	The default is "error".
\ No newline at end of file
diff --git a/config.c b/config.c
index d69e9a3ae6b..fa753565e68 100644
--- a/config.c
+++ b/config.c
@@ -1687,6 +1687,19 @@  static int git_default_core_config(const char *var, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(var, "core.symlinkresolutionmode")) {
+		if (!value)
+			symlink_resolution_mode = SYMLINK_RESOLUTION_MODE_ERROR;
+		if (!strcmp(value, "error"))
+			symlink_resolution_mode = SYMLINK_RESOLUTION_MODE_ERROR;
+		else if (!strcmp(value, "best-effort"))
+			symlink_resolution_mode =
+				SYMLINK_RESOLUTION_MODE_BEST_EFFORT;
+		else
+			warning(_("ignoring unknown core.symlinkresolutionmode value '%s'"),
+				value);
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return platform_core_config(var, value, ctx, cb);
 }
diff --git a/environment.c b/environment.c
index 6d7a5001eb1..a497331f2bc 100644
--- a/environment.c
+++ b/environment.c
@@ -96,6 +96,8 @@  int max_allowed_tree_depth =
 	2048;
 #endif
 int max_symlink_depth = -1;
+enum symlink_resolution_mode symlink_resolution_mode =
+	SYMLINK_RESOLUTION_MODE_ERROR;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/environment.h b/environment.h
index ea39c2887b1..5a6eebb061b 100644
--- a/environment.h
+++ b/environment.h
@@ -143,6 +143,13 @@  extern unsigned long pack_size_limit_cfg;
 extern int max_allowed_tree_depth;
 extern int max_symlink_depth;
 
+enum symlink_resolution_mode {
+	SYMLINK_RESOLUTION_MODE_ERROR = 0,
+	SYMLINK_RESOLUTION_MODE_BEST_EFFORT
+};
+
+extern enum symlink_resolution_mode symlink_resolution_mode;
+
 /*
  * Accessors for the core.sharedrepository config which lazy-load the value
  * from the config (if not already set). The "reset" function can be
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index fd7ab1d1eff..c1d807a0d7f 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -1113,6 +1113,35 @@  test_expect_success 'git cat-file --batch --follow-symlink stop resolving symlin
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch --follow-symlink stop resolving symlink at designated depth with error mode config' '
+	printf "loop 22\nHEAD:link-to-symlink-3\n">expect &&
+	printf 'HEAD:link-to-symlink-3' | git -c core.maxsymlinkdepth=1 -c core.symlinkresolutionmode=error cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks > actual &&
+	test_cmp expect actual &&
+	printf 'HEAD:link-to-symlink-3' | git -c core.maxsymlinkdepth=2 -c core.symlinkresolutionmode=error cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks > actual &&
+	test_cmp expect actual &&
+	printf 'HEAD:link-to-symlink-3' | git -c core.maxsymlinkdepth=3 -c core.symlinkresolutionmode=error cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks > actual &&
+	test_cmp expect actual &&
+	oid=$(git rev-parse HEAD:morx) &&
+	printf "${oid} blob 11\nHello World\n" >expect &&
+	printf 'HEAD:link-to-symlink-3' | git -c core.maxsymlinkdepth=4 cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git cat-file --batch --follow-symlink return object info at designated depth' '
+	oid=$(git rev-parse HEAD:link-to-symlink-1) &&
+	printf "${oid} blob 13\nsame-dir-link\n" >expect &&
+	printf 'HEAD:link-to-symlink-1' | git -c core.maxsymlinkdepth=1  -c core.symlinkresolutionmode=best-effort cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks >actual &&
+	test_cmp expect actual &&
+	oid=$(git rev-parse HEAD:same-dir-link) &&
+	printf "${oid} blob 4\nmorx\n" > expect &&
+	printf 'HEAD:link-to-symlink-1' | git -c core.maxsymlinkdepth=2  -c core.symlinkresolutionmode=best-effort cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks >actual &&
+	test_cmp expect actual &&
+	oid=$(git rev-parse HEAD:morx) &&
+	printf "${oid} blob 11\nHello World\n" > expect &&
+	printf 'HEAD:link-to-symlink-1' | git -c core.maxsymlinkdepth=3  -c core.symlinkresolutionmode=best-effort cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'cat-file --batch-all-objects shows all objects' '
 	# make new repos so we know the full set of objects; we will
 	# also make sure that there are some packed and some loose
diff --git a/tree-walk.c b/tree-walk.c
index 3ec2302309e..ee861fd6351 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -821,6 +821,17 @@  enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			contents_start = contents;
 
 			parent = &parents[parents_nr - 1];
+
+			if (follows_remaining == 0 &&
+			    symlink_resolution_mode ==
+				    SYMLINK_RESOLUTION_MODE_BEST_EFFORT) {
+				strbuf_addstr(result_path, contents);
+				oidcpy(result, &current_tree_oid);
+				free(contents);
+				retval = FOUND;
+				goto done;
+			}
+
 			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_splice(&namebuf, 0, len,
 				      contents_start, link_len);