[1/3] Add support for new %w wildcard in checkout filter

Message ID	20210906181002.625647-1-calumlikesapplepie@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: Calum McConnell <calumlikesapplepie@gmail.com> To: git@vger.kernel.org Cc: Calum McConnell <calumlikesapplepie@gmail.com> Subject: [PATCH 1/3] Add support for new %w wildcard in checkout filter Date: Mon, 6 Sep 2021 14:10:00 -0400 Message-Id: <20210906181002.625647-1-calumlikesapplepie@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[1/3] Add support for new %w wildcard in checkout filter \| expand [1/3] Add support for new %w wildcard in checkout filter [2/3] Die if filter is attempted without a worktree [3/3] Document the new gitattributes change

Message ID

20210906181002.625647-1-calumlikesapplepie@gmail.com (mailing list archive)

State

New, archived

Headers

From: Calum McConnell <calumlikesapplepie@gmail.com>
To: git@vger.kernel.org
Cc: Calum McConnell <calumlikesapplepie@gmail.com>
Subject: [PATCH 1/3] Add support for new %w wildcard in checkout filter
Date: Mon,  6 Sep 2021 14:10:00 -0400
Message-Id: <20210906181002.625647-1-calumlikesapplepie@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

[1/3] Add support for new %w wildcard in checkout filter | expand

Commit Message

Calum McConnell Sept. 6, 2021, 6:10 p.m. UTC

When building content filters with gitattributes, for instance to ensure
git stores the plain-text rather than the binary form of data for certain
formats, it is often advantageous to separate the filters into separate,
potentially complex scripts.  However, as the $PWD where content filters
are executed is unspecified the path to scripts needs to be specified as
an absolute path.  That means that the guide for setting up a repository
which uses scripts to filter content cannot simply consist of "include
the following lines in your .git/config file", and it means that the
otherwise safe operation of moving a git repository from one folder to
another is decidedly unsafe.

This %w (short for 'work tree') will allow such scripts to exist and
be executed on each checkout, without needing to be added to the PATH
or be dependent upon the $PWD of the checkout call.

Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com>
---
 convert.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

Comments

Jeff King Sept. 7, 2021, 4:33 p.m. UTC | #1

On Mon, Sep 06, 2021 at 02:10:00PM -0400, Calum McConnell wrote:

> When building content filters with gitattributes, for instance to ensure
> git stores the plain-text rather than the binary form of data for certain
> formats, it is often advantageous to separate the filters into separate,
> potentially complex scripts.  However, as the $PWD where content filters
> are executed is unspecified the path to scripts needs to be specified as
> an absolute path.  That means that the guide for setting up a repository
> which uses scripts to filter content cannot simply consist of "include
> the following lines in your .git/config file", and it means that the
> otherwise safe operation of moving a git repository from one folder to
> another is decidedly unsafe.
> 
> This %w (short for 'work tree') will allow such scripts to exist and
> be executed on each checkout, without needing to be added to the PATH
> or be dependent upon the $PWD of the checkout call.

I sympathize with your goal here, but I have two high-level thoughts.

One is regarding security.

I was worried a bit that this might provide an opportunity for untrusted
repositories to trigger scripts from within their working trees without
the user having a chance to OK them. But the "%w" here would be used in
the config that the user must manually specify. So that gives them an
opportunity to investigate it as much as they want.

But it does mean we're pointing to code in the working tree that could
change after a git-pull, etc. You _can_ be careful about that (e.g., by
fetching and looking at the results, and then merging only if OK), but
the "use from working tree by default" model makes it easy to screw up
(especially if you put it into your ~/.gitconfig, and then it's used
with every repository, trusted or not).

When discussing similar features (like, say, taking project-level config
after making sure it's OK), our general thinking has been that we should
make sure Git can access some non-worktree location, and make it easy for
users to copy the data to it.

So in its most awkward state, that's asking people to copy the script
somewhere in their PATH (and then update it as needed, after verifying
it each time). That's not _too_ bad as instructions go, but gets hard if
your audience doesn't have a consistent PATH set up.

I wonder if we could do better with one or both of:

  - some way of specifying the repository directory in a config option.
    I think you can do something like that now with:

      mkdir -p .git/scripts
      cp myfilter .git/scripts/foo-filter
      git config filter.foo.clean '$(git rev-parse --git-dir)/scripts/foo-filter'

    which is admittedly a bit awkward. Some kind of syntactic candy
    could help there (having Git recognize "$GIT_SCRIPTS/foo-filter" or
    something).

  - some way of specifying an in-repo blob via config. I think right now
    you could do something like:

      git config filter.foo.clean '
        tmp=$(mktemp -t foo-filter) &&
	trap "rm -f \"$tmp\"" 0 &&
	git cat-file 1234abcd >"$tmp" &&
	"$tmp"
      '

    That's obviously even more horrible, but if we had some kind of
    syntactic sugar, you could set the config to "$oid:1234abcd" or
    something.

    That saves you from the awkward "mkdir/cp" in the first example, but
    gets around the security implications because of the immutability of
    Git objects.

My second thought was that this is a more general problem. It's one for
config itself, and for other script programs you might point to via
config. So it seems like we should be able to come up with a solution
that is more general than just clean/smudge filters.

-Peff

Junio C Hamano Sept. 7, 2021, 8:28 p.m. UTC | #2

> Subject: [PATCH 1/3] Add support for new %w wildcard in checkout filter

Please write your title to help readers of "git shortlog --no-merges".
With the above, readers would not know where the %w would newly be
allowed to appear, what it does, and how it helps what use case.

> When building content filters with gitattributes, for instance to ensure
> git stores the plain-text rather than the binary form of data for certain
> formats, it is often advantageous to separate the filters into separate,
> potentially complex scripts.  However, as the $PWD where content filters
> are executed is unspecified the path to scripts needs to be specified as
> an absolute path.

Isn't $PWD at least stable in a repository, relative to the top of
worktree or something?  IOW, isn't the above raise a separate
documentation issue that is better solved without any new code?

> That means that the guide for setting up a repository
> which uses scripts to filter content cannot simply consist of "include
> the following lines in your .git/config file", and it means that the
> otherwise safe operation of moving a git repository from one folder to
> another is decidedly unsafe.

If these paths are given as absolute paths (e.g. ~/filter-scripts/),
the repositories that refer to these can be moved freely, as long as
the location of the the directory that holds these auxiliary scripts
is kept stable, no?

diff --git a/convert.c b/convert.c
index 0d6fb3410a..5d64ccce57 100644
--- a/convert.c
+++ b/convert.c
@@ -9,6 +9,7 @@ 
 #include "sub-process.h"
 #include "utf8.h"
 #include "ll-merge.h"
+#include "repository.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -630,19 +631,26 @@  static int filter_buffer_or_fd(int in, int out, void *data)
 
 	/* apply % substitution to cmd */
 	struct strbuf cmd = STRBUF_INIT;
-	struct strbuf path = STRBUF_INIT;
+	struct strbuf filePath = STRBUF_INIT;
+	struct strbuf worktreePath = STRBUF_INIT;
 	struct strbuf_expand_dict_entry dict[] = {
 		{ "f", NULL, },
+		{ "w", NULL, }, 
 		{ NULL, NULL, },
 	};
 
-	/* quote the path to preserve spaces, etc. */
-	sq_quote_buf(&path, params->path);
-	dict[0].value = path.buf;
+	/* quote the paths to preserve spaces, etc. */
+	sq_quote_buf(&filePath, params->path);
+	dict[0].value = filePath.buf;
+	
+	sq_quote_buf(&worktreePath, the_repository->worktree);
+	dict[1].value = worktreePath.buf;
 
-	/* expand all %f with the quoted path */
+	/* expand all %f or %w with the quoted path */
 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
-	strbuf_release(&path);
+	strbuf_release(&filePath);
+  	strbuf_release(&worktreePath);
+
 
 	strvec_push(&child_process.args, cmd.buf);
 	child_process.use_shell = 1;

[1/3] Add support for new %w wildcard in checkout filter

Commit Message

Comments

Patch