Message ID | 20210906181002.625647-1-calumlikesapplepie@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] Add support for new %w wildcard in checkout filter | expand |
On Mon, Sep 06, 2021 at 02:10:00PM -0400, Calum McConnell wrote: > When building content filters with gitattributes, for instance to ensure > git stores the plain-text rather than the binary form of data for certain > formats, it is often advantageous to separate the filters into separate, > potentially complex scripts. However, as the $PWD where content filters > are executed is unspecified the path to scripts needs to be specified as > an absolute path. That means that the guide for setting up a repository > which uses scripts to filter content cannot simply consist of "include > the following lines in your .git/config file", and it means that the > otherwise safe operation of moving a git repository from one folder to > another is decidedly unsafe. > > This %w (short for 'work tree') will allow such scripts to exist and > be executed on each checkout, without needing to be added to the PATH > or be dependent upon the $PWD of the checkout call. I sympathize with your goal here, but I have two high-level thoughts. One is regarding security. I was worried a bit that this might provide an opportunity for untrusted repositories to trigger scripts from within their working trees without the user having a chance to OK them. But the "%w" here would be used in the config that the user must manually specify. So that gives them an opportunity to investigate it as much as they want. But it does mean we're pointing to code in the working tree that could change after a git-pull, etc. You _can_ be careful about that (e.g., by fetching and looking at the results, and then merging only if OK), but the "use from working tree by default" model makes it easy to screw up (especially if you put it into your ~/.gitconfig, and then it's used with every repository, trusted or not). When discussing similar features (like, say, taking project-level config after making sure it's OK), our general thinking has been that we should make sure Git can access some non-worktree location, and make it easy for users to copy the data to it. So in its most awkward state, that's asking people to copy the script somewhere in their PATH (and then update it as needed, after verifying it each time). That's not _too_ bad as instructions go, but gets hard if your audience doesn't have a consistent PATH set up. I wonder if we could do better with one or both of: - some way of specifying the repository directory in a config option. I think you can do something like that now with: mkdir -p .git/scripts cp myfilter .git/scripts/foo-filter git config filter.foo.clean '$(git rev-parse --git-dir)/scripts/foo-filter' which is admittedly a bit awkward. Some kind of syntactic candy could help there (having Git recognize "$GIT_SCRIPTS/foo-filter" or something). - some way of specifying an in-repo blob via config. I think right now you could do something like: git config filter.foo.clean ' tmp=$(mktemp -t foo-filter) && trap "rm -f \"$tmp\"" 0 && git cat-file 1234abcd >"$tmp" && "$tmp" ' That's obviously even more horrible, but if we had some kind of syntactic sugar, you could set the config to "$oid:1234abcd" or something. That saves you from the awkward "mkdir/cp" in the first example, but gets around the security implications because of the immutability of Git objects. My second thought was that this is a more general problem. It's one for config itself, and for other script programs you might point to via config. So it seems like we should be able to come up with a solution that is more general than just clean/smudge filters. -Peff
> Subject: [PATCH 1/3] Add support for new %w wildcard in checkout filter Please write your title to help readers of "git shortlog --no-merges". With the above, readers would not know where the %w would newly be allowed to appear, what it does, and how it helps what use case. > When building content filters with gitattributes, for instance to ensure > git stores the plain-text rather than the binary form of data for certain > formats, it is often advantageous to separate the filters into separate, > potentially complex scripts. However, as the $PWD where content filters > are executed is unspecified the path to scripts needs to be specified as > an absolute path. Isn't $PWD at least stable in a repository, relative to the top of worktree or something? IOW, isn't the above raise a separate documentation issue that is better solved without any new code? > That means that the guide for setting up a repository > which uses scripts to filter content cannot simply consist of "include > the following lines in your .git/config file", and it means that the > otherwise safe operation of moving a git repository from one folder to > another is decidedly unsafe. If these paths are given as absolute paths (e.g. ~/filter-scripts/), the repositories that refer to these can be moved freely, as long as the location of the the directory that holds these auxiliary scripts is kept stable, no?
diff --git a/convert.c b/convert.c index 0d6fb3410a..5d64ccce57 100644 --- a/convert.c +++ b/convert.c @@ -9,6 +9,7 @@ #include "sub-process.h" #include "utf8.h" #include "ll-merge.h" +#include "repository.h" /* * convert.c - convert a file when checking it out and checking it in. @@ -630,19 +631,26 @@ static int filter_buffer_or_fd(int in, int out, void *data) /* apply % substitution to cmd */ struct strbuf cmd = STRBUF_INIT; - struct strbuf path = STRBUF_INIT; + struct strbuf filePath = STRBUF_INIT; + struct strbuf worktreePath = STRBUF_INIT; struct strbuf_expand_dict_entry dict[] = { { "f", NULL, }, + { "w", NULL, }, { NULL, NULL, }, }; - /* quote the path to preserve spaces, etc. */ - sq_quote_buf(&path, params->path); - dict[0].value = path.buf; + /* quote the paths to preserve spaces, etc. */ + sq_quote_buf(&filePath, params->path); + dict[0].value = filePath.buf; + + sq_quote_buf(&worktreePath, the_repository->worktree); + dict[1].value = worktreePath.buf; - /* expand all %f with the quoted path */ + /* expand all %f or %w with the quoted path */ strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict); - strbuf_release(&path); + strbuf_release(&filePath); + strbuf_release(&worktreePath); + strvec_push(&child_process.args, cmd.buf); child_process.use_shell = 1;
When building content filters with gitattributes, for instance to ensure git stores the plain-text rather than the binary form of data for certain formats, it is often advantageous to separate the filters into separate, potentially complex scripts. However, as the $PWD where content filters are executed is unspecified the path to scripts needs to be specified as an absolute path. That means that the guide for setting up a repository which uses scripts to filter content cannot simply consist of "include the following lines in your .git/config file", and it means that the otherwise safe operation of moving a git repository from one folder to another is decidedly unsafe. This %w (short for 'work tree') will allow such scripts to exist and be executed on each checkout, without needing to be added to the PATH or be dependent upon the $PWD of the checkout call. Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com> --- convert.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)