Message ID | 20250412174051.780148-1-jayatheerthkulkarni2005@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | t3706: Add test for wildcard vs literal pathspec | expand |
Hi! > +test_expect_success 'setup: create files and initial commit' ' > + mkdir testdir && > + >testdir/f\* && > + >testdir/f\*\* && > + >testdir/foo && > + git add testdir && > + git commit -m "Initial setup with literal wildcard files" > +' > + > +test_expect_success 'clean slate before testing wildcard behavior' ' > + git rm -rf testdir && > + git commit -m "Clean state" > +' > > +test_expect_success 'recreate files to test add behavior' ' > + mkdir testdir && > + >testdir/f\* && > + >testdir/f\*\* && > + >testdir/foo > +' Two questions: 1. Does this need to be inside a test_expect_success? It seems to me that those two tests cases are actually setup code for the next two. 2. If so, does it need to have all that setup? I could reproduce the bug by only running: ``` git reset touch foo 'f*' 'f**' git add 'f*' git ls-files ``` btw, this works with your code, congrats! Other idea: `?` is another wildcard for matching only one character. Have you tested if the same bug happens with it? PS: while I was writing this review I pushed this to my GitHub just to make the CI run the entire test suite since pathspecs are a sensible part of Git. Take look at this, it seems that your tests aren't passing on Windows: https://github.com/lucasoshiro/git/actions/runs/14450183624/job/40521015897. Perhaps you'll need to change something there. It seems to be related to how Windows handle paths (specially the \ character, which means the same as / in Unix). Personally, I'm not a Windows guy and can't help you further with this. A quick reference on how paths on Windows work is this (and yeah, they are far more complex than in Unix): https://www.fileside.app/blog/2023-03-17_windows-file-paths/
On Mon, Apr 14, 2025 at 10:21 PM Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> wrote: > > Hi! > Hi Lucas, > > +test_expect_success 'setup: create files and initial commit' ' > > + mkdir testdir && > > + >testdir/f\* && > > + >testdir/f\*\* && > > + >testdir/foo && > > + git add testdir && > > + git commit -m "Initial setup with literal wildcard files" > > +' > > + > > +test_expect_success 'clean slate before testing wildcard behavior' ' > > + git rm -rf testdir && > > + git commit -m "Clean state" > > +' > > > > +test_expect_success 'recreate files to test add behavior' ' > > + mkdir testdir && > > + >testdir/f\* && > > + >testdir/f\*\* && > > + >testdir/foo > > +' > > > Two questions: > > 1. Does this need to be inside a test_expect_success? It seems to me > that those two tests cases are actually setup code for the next > two. > > 2. If so, does it need to have all that setup? I could reproduce the > bug by only running: > > ``` > git reset > touch foo 'f*' 'f**' > git add 'f*' > git ls-files > ``` > Umm, I think the setup should just be a single block. I will send a patch on this I'm parallely working on a second patch. > btw, this works with your code, congrats! > Thank you, credit goes to Peff(Jeff King) I almost lost track. > Other idea: `?` is another wildcard for matching only one character. > Have you tested if the same bug happens with it? > Yup I think that's also a great suggestion, I think consolidating setup and adding at least a few different wildcards will be good. > PS: while I was writing this review I pushed this to my GitHub just > to make the CI run the entire test suite since pathspecs are a > sensible part of Git. > > Take look at this, it seems that your tests aren't passing on Windows: > https://github.com/lucasoshiro/git/actions/runs/14450183624/job/40521015897. Oh damn!! That's a silly mistake. I almost forgot windows exist!! Thanks for letting me know > Perhaps you'll need to change something there. It seems to be > related to how Windows handle paths (specially the \ character, which > means the same as / in Unix). Personally, I'm not a Windows guy and Same not a windows guy, but I will have to read some things out. Will figure it out. > can't help you further with this. A quick reference on how paths on > Windows work is this (and yeah, they are far more complex than in > Unix): > https://www.fileside.app/blog/2023-03-17_windows-file-paths/ > > Thanks again Lucas, these help. -Jayatheerth
K Jayatheerth <jayatheerthkulkarni2005@gmail.com> writes: > When 'git add <pattern>' is run, and a file exists whose literal > name matches <pattern> (e.g., a file named "f*" when running > 'git add f*'), Git should treat the pattern as a wildcard > matching multiple files, rather than just adding the literal file. This pretends to be a separate and independent patch, but it in reality depends on the other change to dir.c, doesn't it? Either make them into two-patch series, or better yet make them into a single patch, perhaps? > Add a test case that: > 1. Creates files 'f*', 'f**', and 'foo'. > 2. Verifies that 'git add "f\*\*"' > correctly adds only the literal file 'f**'. > 3. Verifies that 'git add 'f*'' (quoted to prevent shell expansion) > correctly adds 'f*', 'f**', and 'foo' by treating 'f*' as a > wildcard. That is something easily can be read from the script itself. Don't repeat what the code does, UNLESS the explanation helps readers to understand what the code does NOT tell them (like, "why does the code does what it does?"). > Covering these the test adds 5 cases. > > Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> > --- > t/meson.build | 1 + > t/t3706-add-wildcard-literal.sh | 44 +++++++++++++++++++++++++++++++++ > 2 files changed, 45 insertions(+) > create mode 100755 t/t3706-add-wildcard-literal.sh I am not sure if this should be done as a new test script and covers only "git add". Is "git add" the only code paths that triggers do_match_pathspec()? If there are many, it may make sense to have a new script (like this patch does) and cover all of these commands that are corrected by the previous patch. If not, and if it truly affects only "git add", then it would make more sense to add to an existing test script that covers "git add". > +test_expect_success 'setup: create files and initial commit' ' > + mkdir testdir && > + >testdir/f\* && > + >testdir/f\*\* && > + >testdir/foo && > + git add testdir && > + git commit -m "Initial setup with literal wildcard files" > +' > + > +test_expect_success 'clean slate before testing wildcard behavior' ' > + git rm -rf testdir && > + git commit -m "Clean state" > +' What's the point of the above two test pieces? The initial commit is not used in any way after it gets created, the wildcard pathspec, which is the topic of this test, is not involved in its creation in any way, and everything gets removed before the real test after this step starts. Not complaining, but showing puzzlement, as I truly do not see the point of these two steps. > +test_expect_success 'recreate files to test add behavior' ' > + mkdir testdir && > + >testdir/f\* && > + >testdir/f\*\* && > + >testdir/foo > +' > + > +test_expect_success 'quoted literal: git add "f\\*\\*" adds only f**' ' > + git reset && What is the point of this "reset"? Since the last commit, the tree of HEAD matches the index, and nobody touched the index after that. > + git add "testdir/f\\*\\*" && OK. We are giving the pathspec with asterisks quoted in it, to make sure the asterisks are not interpreted as glob. So there is only one path that gets added in the end, and that is what is tested. That's a good test (except that I do not understand what the "reset" at the beginning is trying to do). > + git ls-files >actual && > + echo "testdir/f**" >expected && > + test_cmp expected actual > +' > + > +test_expect_success 'wildcard: git add f* adds f*, f** and foo' ' > + git reset && > + git add '\''testdir/f*'\'' && Why not just git add "testdir/f*" && to match the previous one? > + git ls-files | sort >actual && "git" command on the left hand side of a pipe means even if it fails, its exit status is hidden. Also, the order ls-files emits paths is well defined (sorted in the byte-value order regardless of locale), so piping its output through sort loses information unnecessarily (you wouldn't notice if ls-files started emitting its paths in an incorrect order). Since you use these three pathnames in this order quote often in this script, why not do this once upfront: cat >list-of-files <<-\EOF && testdir/f* testdir/f** testdir/foo EOF and then use the list of files in later steps like so: while read path do >"$path" done <list-of-files && git ls-files >actual && test_cmp list-of-files actual > + printf "%s\n" "testdir/f*" "testdir/f**" "testdir/foo" | sort >expected && > + test_cmp expected actual > +' > + > +test_done > \ No newline at end of file Please make sure your files are properly formatted. Perhaps we should add whitespace error class 'incomplete-lines' that catches this, and enable it globally, or something?
On 2025-04-12 at 17:40:51, K Jayatheerth wrote: > +test_expect_success 'recreate files to test add behavior' ' > + mkdir testdir && > + >testdir/f\* && > + >testdir/f\*\* && I just want to point out that creating files with asterisks may not be possible on Windows due to limitations in the file system. I'm not a Windows expert, so unfortunately I can't provide more details than that, but you may end up needing to add a prerequisite here to skip this on our Windows platforms if necessary. Hopefully CI and a suitable search can help you figure it out. > +test_done > \ No newline at end of file We do want to keep newlines at the end of a file. POSIX mandates one on text files and some systems are less tolerant of missing newlines than others. Usually Linux and the BSDs handle this just fine, but some proprietary Unix systems, which unfortunately we don't have CI for, tend to be the ones that are less happy about this. I haven't given this a full review, since others have done that instead, but just pointed out one or two things that got my attention.
On Wed, Apr 16, 2025 at 4:02 AM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2025-04-12 at 17:40:51, K Jayatheerth wrote: > > +test_expect_success 'recreate files to test add behavior' ' > > + mkdir testdir && > > + >testdir/f\* && > > + >testdir/f\*\* && > > I just want to point out that creating files with asterisks may not be > possible on Windows due to limitations in the file system. I'm not a > Windows expert, so unfortunately I can't provide more details than that, > but you may end up needing to add a prerequisite here to skip this on > our Windows platforms if necessary. Hopefully CI and a suitable search > can help you figure it out. > Ok I will look into it, thank you for letting me know > > +test_done > > \ No newline at end of file > > We do want to keep newlines at the end of a file. POSIX mandates one on > text files and some systems are less tolerant of missing newlines than > others. Usually Linux and the BSDs handle this just fine, but some > proprietary Unix systems, which unfortunately we don't have CI for, tend > to be the ones that are less happy about this. > Ok I will make sure of that > I haven't given this a full review, since others have done that instead, > but just pointed out one or two things that got my attention. > -- > brian m. carlson (they/them) > Toronto, Ontario, CA While I was looking into the reviews I was creating various test cases with these files '*' '**' '?' '\*' '[abc]' commit_files 'f*' 'f**' 'file[1-3]' 'foo*bar' 'f?z' 'hello?world' Everything went correct But when I checked \* and which is used for getting * as specific but there is also a literal \* in the above files So it still adds both, I'm unsure if that is the intended behaviour. but when I say git add "\*" it adds both the files * and \* But rest of the other wildcards and literals work as intended which is why I incorporated the \* literal I also think I will still divide the test file because git add isn't the only one that looks into wildcards and pathspec I think something like git commit "*" -m "Test" also would be a great test or even git rm command. About the windows question, I think I will see if there is any common ground I could find But until then I think prereq is a great option. For reference my test file looks something like this, --- /dev/null +++ b/t/t6137-pathspec-wildcard-literal.sh @@ -0,0 +1,139 @@ +#!/bin/sh + +test_description='test wildcards and literals with various git commands' + +. ./test-lib.sh + +reset_git_repo () { + rm -rf .git && + git init +} + +test_expect_success 'setup' ' + mkdir testdir && + cd testdir && + touch "*" "?" "[abc]" "f*" "f?z" "a" && + touch "**" "foo*bar" "hello?world" "f**" "hello_world" && + git init +' + +test_expect_success 'check * wildcard in git add' ' + git init && + git add "*" && + cat >expected_files <<EOF && +* +** +? +[abc] +a +f* +f** +f?z +foo*bar +hello?world +hello_world +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check \* literal in git add' ' + reset_git_repo && + git add "\*" && + cat >expected_files <<EOF && +* +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check f* wildcard in git add' ' + reset_git_repo && + git add "f*" && + cat >expected_files <<EOF && +f* +f** +f?z +foo*bar +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check f\* literal in git add' ' + reset_git_repo && + git add "f\*" && + cat >expected_files <<EOF && +f* +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check f** wildcard in git add' ' + reset_git_repo && + git add "f**" && + cat >expected_files <<EOF && +f* +f** +f?z +foo*bar +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check f\*\* literal in git add' ' + reset_git_repo && + git add "f\*\*" && + cat >expected_files <<EOF && +f** +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check ? wildcard in git add' ' + reset_git_repo && + git add "?" && + cat >expected_files <<EOF && +* +? +a +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check \? literal in git add' ' + reset_git_repo && + git add "\?" && + cat >expected_files <<EOF && +? +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check hello?world wildcard in git add' ' + reset_git_repo && + git add "hello?world" && + cat >expected_files <<EOF && +hello?world +hello_world +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_expect_success 'check hello\?world literal in git add' ' + reset_git_repo && + git add "hello\?world" && + cat >expected_files <<EOF && +hello?world +EOF + git ls-files >actual_files && + test_cmp expected_files actual_files +' + +test_done
JAYATHEERTH K <jayatheerthkulkarni2005@gmail.com> writes: > I think something like git commit "*" -m "Test" also would be a great test > or even git rm command. Yes, there are things other than "git add" that take pathspec (even "git ls-files" does so), and demonstrating the blast radious of the existing "bug" you fixed, with how they behave differently and better with your fix, would be a good thing to do. But make sure you follow "dashed options first, then other args" convention. I do not offhand know (and *MORE* *IMPORTANTLY*, I do not want anybody to depend on) what the current command line parser happens to do to git commit "*" -m "Test" If you meant to say that a pathspec with glob, always write it the right way: git commit -m "Test" -- "*" > About the windows question, I think I will see if there is any common > ground I could find > But until then I think prereq is a great option. The FUNNYNAMES prereq was invented to mark tests that rely on filesystem being able to handle certain letters, so that may be a good thing to use.
On Wed, Apr 16, 2025 at 6:42 PM Junio C Hamano <gitster@pobox.com> wrote: > > JAYATHEERTH K <jayatheerthkulkarni2005@gmail.com> writes: > > > I think something like git commit "*" -m "Test" also would be a great test > > or even git rm command. > > Yes, there are things other than "git add" that take pathspec (even > "git ls-files" does so), and demonstrating the blast radious of the > existing "bug" you fixed, with how they behave differently and > better with your fix, would be a good thing to do. > Yes ls-files is also a great example, I will add them in the test. I think for the pathspec and glob specific commands almost all the commands share the same code, so it should work the same for all. > But make sure you follow "dashed options first, then other args" > convention. I do not offhand know (and *MORE* *IMPORTANTLY*, I do > not want anybody to depend on) what the current command line parser > happens to do to > > git commit "*" -m "Test" > > If you meant to say that a pathspec with glob, always write it the > right way: > > git commit -m "Test" -- "*" > Ok, it makes sense, I will follow this format. > > About the windows question, I think I will see if there is any common > > ground I could find > > But until then I think prereq is a great option. > > The FUNNYNAMES prereq was invented to mark tests that rely on > filesystem being able to handle certain letters, so that may be a > good thing to use. Noted will add this too, I'm still finding all the possible files and how they could matter in this test. Once I do that and check if the CI works or not and if I cannot make it work with both platforms (Which I think most probably will happen) I will add the prereq FUNNYNAMES as intended. -Jayatheerth
> Yes ls-files is also a great example, I will add them in the test. I was going to suggest you to use `git ls-files -o 'f**'` in your test, which would eliminate the need of `git add` and `git reset`. However, I just found that the bug doesn't happen here: ``` git init touch foo 'f*' 'f**' f bar git ls-files -o 'f*' ``` Here (I'm using the current `next`, currently at `fd585f7`), `git ls-files -o 'f*'` list the files correctly: ``` f f* f** foo ``` I also tried with `git grep`: ``` git init touch foo 'f*' 'f**' f bar for f in *; do echo 123 > "$f"; done git add -A git grep 123 -- 'f*' and somehow it worked: ``` f:1:123 f*:1:123 f**:1:123 foo:1:123 ``` So, if I'm not doing anything wrong, it looks that it is not solely related to pathspecs, but related to pathspecs when used with some other commands. hmmm... > I think for the pathspec and glob specific commands almost all the commands > share the same code, so it should work the same for all. I also though the same, but somehow it behaves differently at least with `ls-files` and `grep`. Perhaps it will need further investigation on how some commands behave correctly and some don't. I would start by inspecting other commands that uses pathspecs (some that I remember: checkout, log, show, stash, status, ls-files, grep) and see if they work correctly or not, then compare the two groups and see what differs between them under the hook.
Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> writes: > So, if I'm not doing anything wrong, it looks that it is not solely > related to pathspecs, but related to pathspecs when used with some other > commands. hmmm... I haven't looked into the code, but if my recollection is correct, "add" is a bit curious in that it has to deal with paths that are not yet in the index, unlike "ls-files" and "grep". It could be one half of the code paths use the "grab everything that matches the glob" while the other half uses "stop when there is an exact match", perhaps? In the very early days of Git, I do recall making a very conscious decision to stop when there is an exact match to help those who add funny pathsnames with glob characters in it, but that is a long time ago, so I wouldn't be surprised if we gained multiple code paths to do the same thing, some of which have been corrected while the original ones haven't.
On Wed, Apr 16, 2025 at 9:20 PM Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> wrote: > > > > Yes ls-files is also a great example, I will add them in the test. > > > I was going to suggest you to use `git ls-files -o 'f**'` in your test, > which would eliminate the need of `git add` and `git reset`. However, I > just found that the bug doesn't happen here: > > ``` > git init > touch foo 'f*' 'f**' f bar > git ls-files -o 'f*' > ``` > > Here (I'm using the current `next`, currently at `fd585f7`), > `git ls-files -o 'f*'` list the files correctly: > > ``` > f > f* > f** > foo > ``` > > I also tried with `git grep`: > > ``` > git init > touch foo 'f*' 'f**' f bar > for f in *; do echo 123 > "$f"; done > git add -A > git grep 123 -- 'f*' > > and somehow it worked: > > ``` > f:1:123 > f*:1:123 > f**:1:123 > foo:1:123 > ``` > > So, if I'm not doing anything wrong, it looks that it is not solely > related to pathspecs, but related to pathspecs when used with some other > commands. hmmm... > > > I think for the pathspec and glob specific commands almost all the commands > > share the same code, so it should work the same for all. > > > I also though the same, but somehow it behaves differently at least with > `ls-files` and `grep`. Perhaps it will need further investigation on how > some commands behave correctly and some don't. I would start by > inspecting other commands that uses pathspecs (some that I remember: > checkout, log, show, stash, status, ls-files, grep) and see if they work > correctly or not, then compare the two groups and see what differs > between them under the hook. That is interesting given that this part of code <prune function in add.c> while (--i >= 0) { struct dir_entry *entry = *src++; if (dir_path_match(repo->index, entry, pathspec, prefix, seen)) *dst++ = entry; } takes part in dir_path_match Which traces to *Exact Match* problem that was found and from ls files if (!index_name_is_other(istate, ent->name, ent->len)) continue; show_dir_entry(istate, tag_other, ent); Where show_dir_entry calls static void show_dir_entry(struct index_state *istate, const char *tag, struct dir_entry *ent) { int len = max_prefix_len; if (len > ent->len) die("git ls-files: internal error - directory entry not superset of prefix"); /* If ps_matches is non-NULL, figure out which pathspec(s) match. */ if (ps_matched) dir_path_match(istate, ent, &pathspec, len, ps_matched); Something like this So I think the argument values are making the difference but I'm still unsure.
> Something like this > So I think the argument values are making the difference but I'm still unsure. I think from ls-files it's how &pathspec values are received, it contains has_wildcard kinds of variables in the structure -Jayatheerth
test_expect_success 'commit wildcard pathspec limits commit' ' reset_git_repo && git add . && git commit --allow-empty -m "Test commit with * wildcard" -- "*" && cat >expected_files <<-\EOF && * ** ? [abc] a f* f** f?z foo*bar hello?world hello_world EOF git ls-tree -r --name-only HEAD > actual_files && test_cmp expected_files actual_files ' I get issues with these test cases, Specially at git ls-tree -r --name-only HEAD > actual_files For some reason the test_cmp doesn't get resolved Is it a bad way to do? or is there a better way?
diff --git a/t/meson.build b/t/meson.build index 8b3aed14ea..e9cd9da8a2 100644 --- a/t/meson.build +++ b/t/meson.build @@ -417,6 +417,7 @@ integration_tests = [ 't3703-add-magic-pathspec.sh', 't3704-add-pathspec-file.sh', 't3705-add-sparse-checkout.sh', + 't3706-add-wildcard-literal.sh', 't3800-mktag.sh', 't3900-i18n-commit.sh', 't3901-i18n-patch.sh', diff --git a/t/t3706-add-wildcard-literal.sh b/t/t3706-add-wildcard-literal.sh new file mode 100755 index 0000000000..0fc27b28ac --- /dev/null +++ b/t/t3706-add-wildcard-literal.sh @@ -0,0 +1,44 @@ +#!/bin/sh + +test_description='git add: wildcard must not be shadowed by literal filename' + +. ./test-lib.sh + +test_expect_success 'setup: create files and initial commit' ' + mkdir testdir && + >testdir/f\* && + >testdir/f\*\* && + >testdir/foo && + git add testdir && + git commit -m "Initial setup with literal wildcard files" +' + +test_expect_success 'clean slate before testing wildcard behavior' ' + git rm -rf testdir && + git commit -m "Clean state" +' + +test_expect_success 'recreate files to test add behavior' ' + mkdir testdir && + >testdir/f\* && + >testdir/f\*\* && + >testdir/foo +' + +test_expect_success 'quoted literal: git add "f\\*\\*" adds only f**' ' + git reset && + git add "testdir/f\\*\\*" && + git ls-files >actual && + echo "testdir/f**" >expected && + test_cmp expected actual +' + +test_expect_success 'wildcard: git add f* adds f*, f** and foo' ' + git reset && + git add '\''testdir/f*'\'' && + git ls-files | sort >actual && + printf "%s\n" "testdir/f*" "testdir/f**" "testdir/foo" | sort >expected && + test_cmp expected actual +' + +test_done \ No newline at end of file
When 'git add <pattern>' is run, and a file exists whose literal name matches <pattern> (e.g., a file named "f*" when running 'git add f*'), Git should treat the pattern as a wildcard matching multiple files, rather than just adding the literal file. Add a test case that: 1. Creates files 'f*', 'f**', and 'foo'. 2. Verifies that 'git add "f\*\*"' correctly adds only the literal file 'f**'. 3. Verifies that 'git add 'f*'' (quoted to prevent shell expansion) correctly adds 'f*', 'f**', and 'foo' by treating 'f*' as a wildcard. Covering these the test adds 5 cases. Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> --- t/meson.build | 1 + t/t3706-add-wildcard-literal.sh | 44 +++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+) create mode 100755 t/t3706-add-wildcard-literal.sh