Message ID | 20210124151306.23185-1-tboegi@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PATCH/RFC,v1,1/1] git restore -p . and precomposed unicode | expand |
tboegi@web.de writes: > The solution is to read the config variable "core.precomposeunicode" early. For a single command like "restore", we need to fail if it is run outside a repository anyway, so it is OK, but code in "git.c" in general can be called outside a repository, we may not know where our configuration files are, though. So I'd prefer to avoid adding too many "do this thing early for every single command". Where do we normally read that variable? I see calls to precompose_argv() on only a few codepaths builtin/diff-files.c:38: precompose_argv(argc, argv); builtin/diff-index.c:28: precompose_argv(argc, argv); builtin/diff-tree.c:129: precompose_argv(argc, argv); builtin/diff.c:455: precompose_argv(argc, argv); builtin/submodule--helper.c:1260: precompose_argv(diff_args.nr, diff_args.v); parse-options.c:872: precompose_argv(argc, argv); I guess the reason we can get away with it is because most of the newer commands use the parse_options() API, and the call to precompose_argv() is used by the codepaths implementing these commands. And as a rule, these commands read config first before calling parse_options(), so by the time the control reaches there, the value of the variable may be already known. The question is why "restore -p" is so special? Or does this patch mean everybody, even the ones that use parse_options() is broken? I guess "prefix" needs to be munged for everybody, so "restore -p" on the title of the patch is a red herring, and the problem applies to all "git" commands---in which case, inside "git.c" would be the right place to do so. I wonder if everybody who calls precompoase_argv() has access to the prefix, though. If so, would it make more sense to extend the API to precompose_argv(int argc, char **argv, char **prefix) so that the callers only need to call a single function, without any additional code like this patch does? Also, as the current precompose_argv() begins like this: void precompose_argv(int argc, const char **argv) { int i = 0; const char *oldarg; char *newarg; iconv_t ic_precompose; if (precomposed_unicode != 1) return; and environment.c initializes the global to (-1), I wonder if we can get away with an approach not to read the "config" anywhere outside precompose_argv() function. Instead, can't the above snippet become more like: if (precomposed_unicode < 0) precomposed_unicode = read from config; if (precomposed_unicode != 1) return; That way, you do not even have to touch anything outside compat/precompose_utf8.c, other than adjusting the callers to pass the pointer to their prefix to be munged. Namely > +int precompose_read_config_gently(void) This can become file-scope static. > +{ > + git_config_get_bool("core.precomposeunicode", &precomposed_unicode); > + return precomposed_unicode == 1; > +} > > void probe_utf8_pathname_composition(void) > { > @@ -60,6 +65,25 @@ void probe_utf8_pathname_composition(void) > strbuf_release(&path); > } > > +char *precompose_string_if_needed(const char *in) > +{ This too. > diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h > index 6f843d3e1a..ce82857d73 100644 > --- a/compat/precompose_utf8.h > +++ b/compat/precompose_utf8.h > @@ -28,6 +28,8 @@ typedef struct { > struct dirent_prec_psx *dirent_nfc; > } PREC_DIR; > > +int precompose_read_config_gently(void); > +char *precompose_string_if_needed(const char *in); > void precompose_argv(int argc, const char **argv); > void probe_utf8_pathname_composition(void); And this can go away. > diff --git a/git-compat-util.h b/git-compat-util.h > index 104993b975..f34854b66f 100644 > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -252,6 +252,14 @@ typedef unsigned long uintptr_t; > #ifdef PRECOMPOSE_UNICODE > #include "compat/precompose_utf8.h" > #else > +static inline int precompose_read_config_gently(void) > +{ > + return 0; > +} > +static inline char *precompose_string_if_needed(const char *in) > +{ > + return NULL; /* no need to precompose a string */ > +} So do these. > diff --git a/git.c b/git.c > index a00a0a4d94..f09e14f733 100644 > --- a/git.c > +++ b/git.c > @@ -421,6 +421,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) > prefix = setup_git_directory_gently(&nongit_ok); > } > > + if (precompose_read_config_gently()) { > + precompose_argv(argc, argv); > + if (prefix) { > + const char *prec_pfx; > + prec_pfx = precompose_string_if_needed(prefix); > + if (prec_pfx) > + prefix = prec_pfx; /* memory lost */ > + } > + } And this would move to the beginning of precompose_argv() implementation. Also the code we currently use to read the core.precomposeunicode configuration variable (presumably somewhere in git_default_config() callchain; I didn't check) can go away. Which would be much nicer outcome, if doable (again, I didn't check). > diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh > index 54ce19e353..bbbc50da93 100755 > --- a/t/t3910-mac-os-precompose.sh > +++ b/t/t3910-mac-os-precompose.sh > @@ -191,6 +191,21 @@ test_expect_failure 'handle existing decomposed filenames' ' > test_must_be_empty untracked > ' > > +test_expect_success "unicode decomposed: git restore -p . " ' > + DIRNAMEPWD=dir.Odiarnfc && > + DIRNAMEINREPO=dir.$Adiarnfc && > + export DIRNAMEPWD DIRNAMEINREPO && > + git init $DIRNAMEPWD && > + ( cd $DIRNAMEPWD && > + mkdir $DIRNAMEINREPO && Style: ( cd $DIRNAMEPWD && mkdir $DIRNAMEINREPO && Is "restore" the only thing that has this issue? I would imagine that anything that takes '.' pathspec to limit its operation to the current subdirectory (e.g. "cd sub && git diff .") would be affected (clarification: I am *not* hinting that other commands need to be tested---I am however hinting to update the proposed log message to explain either (1) this applies in general to commands that does X, or (2) this affects only "restore -p" which does this unusual thing Y that no other commands do). Thanks. > + cd $DIRNAMEINREPO && > + echo "Initial" >file && > + git add file && > + echo "More stuff" >>file && > + echo y | git restore -p . > + ) > +' > + > # Test if the global core.precomposeunicode stops autosensing > # Must be the last test case > test_expect_success "respect git config --global core.precomposeunicode" ' > -- > 2.30.0.155.g66e871b664
On Sun, Jan 24, 2021 at 11:51:33AM -0800, Junio C Hamano wrote: Thanks for the fast review, much appreciated. The short answer: There is more to be done, V2 is coming somewhen. For the longer story, please see inline. > tboegi@web.de writes: > > > The solution is to read the config variable "core.precomposeunicode" early. > > For a single command like "restore", we need to fail if it is run > outside a repository anyway, so it is OK, but code in "git.c" in > general can be called outside a repository, we may not know where > our configuration files are, though. So I'd prefer to avoid > adding too many "do this thing early for every single command". > > Where do we normally read that variable? I see calls to > precompose_argv() on only a few codepaths > > builtin/diff-files.c:38: precompose_argv(argc, argv); > builtin/diff-index.c:28: precompose_argv(argc, argv); > builtin/diff-tree.c:129: precompose_argv(argc, argv); > builtin/diff.c:455: precompose_argv(argc, argv); > builtin/submodule--helper.c:1260: precompose_argv(diff_args.nr, diff_args.v); > parse-options.c:872: precompose_argv(argc, argv); > > I guess the reason we can get away with it is because most of the > newer commands use the parse_options() API, and the call to > precompose_argv() is used by the codepaths implementing these > commands. And as a rule, these commands read config first before > calling parse_options(), so by the time the control reaches there, > the value of the variable may be already known. Yes. > > The question is why "restore -p" is so special? Or does this patch > mean everybody, even the ones that use parse_options() is broken? One special thing is that it uses pathspec, `git restore -p .` and $CWD is inside a decomposed directory. There is more work to be done, as it seems. Running `git status . ` in an ASCII based repo (ignore the debug prints) user@mac:/tmp/210125-precomp/A.dir> git status . git.c/handle_builtin:699 cmd="status" len=6 git.c:428 prefix="A.dir/" (6) git.c:434 prec_pfx="(null)" (4294967295) builtin/commit.c:1401 prefix="A.dir/" (6) On branch master Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git restore <file>..." to discard changes in working directory) modified: O.file ------------------------- But doing the same test within a decomosed directory: user@mac:/tmp/210125-decomp/Ä.dir> git status . git.c/handle_builtin:699 cmd="status" len=6 git.c:428 prefix="Ä.dir/" (8) git.c:434 prec_pfx="Ä.dir/" (7) builtin/commit.c:1401 prefix="Ä.dir/" (7) On branch master nothing to commit, working tree clean --------- But using ".." as the pathspec works: user@mac:/tmp/210125-decomp/Ä.dir> git status .. git.c/handle_builtin:699 cmd="status" len=6 git.c:428 prefix="Ä.dir/" (8) git.c:434 prec_pfx="Ä.dir/" (7) builtin/commit.c:1401 prefix="Ä.dir/" (7) On branch master Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git restore <file>..." to discard changes in working directory) modified: "../A\314\210.dir/\303\226.file" no changes added to commit (use "git add" and/or "git commit -a") -------------- So in that sense, it seems as if `git restore` is special because the patch helps. More patches are needed asseen above. And the rest of the suggestions makes all sense.
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c index 136250fbf6..06e371660f 100644 --- a/compat/precompose_utf8.c +++ b/compat/precompose_utf8.c @@ -36,6 +36,11 @@ static size_t has_non_ascii(const char *s, size_t maxlen, size_t *strlen_c) return ret; } +int precompose_read_config_gently(void) +{ + git_config_get_bool("core.precomposeunicode", &precomposed_unicode); + return precomposed_unicode == 1; +} void probe_utf8_pathname_composition(void) { @@ -60,6 +65,25 @@ void probe_utf8_pathname_composition(void) strbuf_release(&path); } +char *precompose_string_if_needed(const char *in) +{ + size_t inlen = strlen(in); + size_t outlen; + char *out = NULL; + if ((has_non_ascii(in, inlen, NULL)) && (precomposed_unicode == 1)) { + int saved_errno = errno; + out = reencode_string_len(in, inlen, + repo_encoding, path_encoding, + &outlen); + if (out && outlen == inlen && !memcmp(in, out, outlen)) { + /* strings are identical: no need to return a new one */ + free(out); + out = NULL; + } + errno = saved_errno; + } + return out; +} void precompose_argv(int argc, const char **argv) { diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h index 6f843d3e1a..ce82857d73 100644 --- a/compat/precompose_utf8.h +++ b/compat/precompose_utf8.h @@ -28,6 +28,8 @@ typedef struct { struct dirent_prec_psx *dirent_nfc; } PREC_DIR; +int precompose_read_config_gently(void); +char *precompose_string_if_needed(const char *in); void precompose_argv(int argc, const char **argv); void probe_utf8_pathname_composition(void); diff --git a/git-compat-util.h b/git-compat-util.h index 104993b975..f34854b66f 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -252,6 +252,14 @@ typedef unsigned long uintptr_t; #ifdef PRECOMPOSE_UNICODE #include "compat/precompose_utf8.h" #else +static inline int precompose_read_config_gently(void) +{ + return 0; +} +static inline char *precompose_string_if_needed(const char *in) +{ + return NULL; /* no need to precompose a string */ +} static inline void precompose_argv(int argc, const char **argv) { ; /* nothing */ diff --git a/git.c b/git.c index a00a0a4d94..f09e14f733 100644 --- a/git.c +++ b/git.c @@ -421,6 +421,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) prefix = setup_git_directory_gently(&nongit_ok); } + if (precompose_read_config_gently()) { + precompose_argv(argc, argv); + if (prefix) { + const char *prec_pfx; + prec_pfx = precompose_string_if_needed(prefix); + if (prec_pfx) + prefix = prec_pfx; /* memory lost */ + } + } if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) && !(p->option & DELAY_PAGER_CONFIG)) use_pager = check_pager_config(p->cmd); diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh index 54ce19e353..bbbc50da93 100755 --- a/t/t3910-mac-os-precompose.sh +++ b/t/t3910-mac-os-precompose.sh @@ -191,6 +191,21 @@ test_expect_failure 'handle existing decomposed filenames' ' test_must_be_empty untracked ' +test_expect_success "unicode decomposed: git restore -p . " ' + DIRNAMEPWD=dir.Odiarnfc && + DIRNAMEINREPO=dir.$Adiarnfc && + export DIRNAMEPWD DIRNAMEINREPO && + git init $DIRNAMEPWD && + ( cd $DIRNAMEPWD && + mkdir $DIRNAMEINREPO && + cd $DIRNAMEINREPO && + echo "Initial" >file && + git add file && + echo "More stuff" >>file && + echo y | git restore -p . + ) +' + # Test if the global core.precomposeunicode stops autosensing # Must be the last test case test_expect_success "respect git config --global core.precomposeunicode" '