Message ID | pull.1175.git.1647193162570.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | partial-clone: add a partial-clone test case | expand |
"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > > In a blobless-cloned repo, `git log --follow -- <path>` (`<path>` have > an exact OID rename) doesn't download blob of the file from where the > new file is renamed. Is this "doesn't" (documenting current behaviour, without saying if it is wrong or is desired) or "shouldn't" (documenting the desired behaviour, which the current implementation may or may not satisfy)? > +test_expect_success 'git log --follow does not download blobs if an exact OID rename found (blobless clone)' ' That's mouthful. > + rm -rf repo partial.git && > + test_create_repo repo && > + content="some dummy content" && > + test_commit -C repo create-a-file file.txt "$content" && > + git -C repo mv file.txt new-file.txt && > + git -C repo commit -m rename-the-file && > + test_config -C repo uploadpack.allowfilter 1 && > + test_config -C repo uploadpack.allowanysha1inwant 1 && > + > + git clone --filter=blob:none "file://$(pwd)/repo" partial.git && > + GIT_TRACE2_EVENT="$(pwd)/trace.txt" \ > + git -C partial.git log --follow -- new-file.txt > "$(pwd)/trace.txt" && Lose SP after '>'. git -C partial.git log --follow -- new-file.txt >"$(pwd)/trace.txt" && > + ! test_subcommand_inexact fetch <trace.txt Looking at the implementation of the helper, it seems to be prepared to handle negation itself. Shouldn't this be test_subcommand_inexact ! fetch <trace.txt instead? > +' > + > test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > rm -rf full partial.git && > test_create_repo full && > diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh > index 0f439c99d61..07a2b60c103 100644 > --- a/t/test-lib-functions.sh > +++ b/t/test-lib-functions.sh > @@ -1811,7 +1811,7 @@ test_subcommand_inexact () { > shift > fi > > - local expr=$(printf '"%s".*' "$@") > + local expr=$(printf '.*"%s".*' "$@") The original wanted to make sure that the arguments to the helper are initial items of a comma separated list, and an existing caller, for example, i.e. test_subcommand_inexact git pack-objects --honor-pack-keep <trace is relying on the behaviour to make sure 'git', 'pack-objects', ... appear at the beginning of "[...]" enclosed list. This change breaks its ability to notice that an insertion of unrelated token before 'git' as an error. In other words, it looks like an uncalled-for selfish change. Why can't you specify what should NOT come before "fetch" in your use of this helper? > expr="${expr%,}" The preimage already has this problem, but the stripping of trailing comma here is a result of mistaken copy-and-paste from the exact variant, I think. test_subcommand uses local expr=$(printf '"%s",' "$@") to concatenate "$@" into a single comma-separated string, so it perfectly makes sense to drop the last one here, but with or without your change here, neither is adding a comma that need to be stripped. It is not _your_ theme, but I think this helper is poorly designed, especially compared to the original "exact" variant. test_subcommand_inexact () { local negate= if test "$1" = "!" then negate=t shift fi local expr=$(printf '"%s".*' "$@") expr="${expr%,}" if test -n "$negate" then ! grep "\"event\":\"child_start\".*\[$expr\]" else grep "\"event\":\"child_start\".*\[$expr\]" fi } I've already touched that "${expr%,}" there is a totally useful statement that will always be a no-op. When "test_subcommand_inexact git pack-objects" is run, the printf assigns to $expr: expr='"git".*"pack-objects".*' and the actual grep command invoked becomes grep '"event":"child_start".*\["git".*"pack-objects".*\]' I am not sure if that is what we really want. I wonder if it was more like this that the original wanted to grep for: grep '"event":"child_start".*\["git","pack-objects",.*\]' in which case the two lines there should be more like local expr=$(printf '"%s",' "$@") expr="${expr%,}.*" I would think. This comes from Derrick's e4d0c11c (repack: respect kept objects with '--write-midx -b', 2021-12-20).
Junio C Hamano <gitster@pobox.com> wrote: > Is this "doesn't" (documenting current behaviour, without saying if > it is wrong or is desired) or "shouldn't" (documenting the desired > behaviour, which the current implementation may or may not satisfy)? The current behaviour is okay and this commit adds the test case for it. So, in that sense, I think "shouldn't" is better word. > That's mouthful. Sorry if the test name is very long. But, I couldn't think of shorter test name than this - to explain what the test case is. > Lose SP after '>'. > > git -C partial.git log --follow -- new-file.txt >"$(pwd)/trace.txt" && Okay. > Looking at the implementation of the helper, it seems to be prepared > to handle negation itself. Shouldn't this be > > test_subcommand_inexact ! fetch <trace.txt > > instead? Oops, completely missed it. Correcting it :) > Why can't you specify what should NOT come before "fetch" in your > use of this helper? Below is the event triggered for non-exact OID rename - git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin Derrick told me to not depend on other flags like `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted and as it makes sense to me also. That's why I didn't specify those things. > I wonder if it was more like this that the original wanted to grep for: > > grep '"event":"child_start".*\["git","pack-objects",.*\]' I don't know about other cases, but in my case, atleast I really wanted it. So, In this scenerio, should I stick with `test_subcommand_inexact` or I have to see other helper functions (or make my own) for it?
On 3/13/2022 3:41 PM, Junio C Hamano wrote: > "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com> > writes: >> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh >> index 0f439c99d61..07a2b60c103 100644 >> --- a/t/test-lib-functions.sh >> +++ b/t/test-lib-functions.sh >> @@ -1811,7 +1811,7 @@ test_subcommand_inexact () { >> shift >> fi >> >> - local expr=$(printf '"%s".*' "$@") >> + local expr=$(printf '.*"%s".*' "$@") > > The original wanted to make sure that the arguments to the helper > are initial items of a comma separated list, and an existing caller, > for example, i.e. > > test_subcommand_inexact git pack-objects --honor-pack-keep <trace > > is relying on the behaviour to make sure 'git', 'pack-objects', ... > appear at the beginning of "[...]" enclosed list. This change > breaks its ability to notice that an insertion of unrelated token > before 'git' as an error. > > In other words, it looks like an uncalled-for selfish change. This change was my recommendation, but I see that it is probably not ideal. > Why can't you specify what should NOT come before "fetch" in your > use of this helper? > >> expr="${expr%,}" > > The preimage already has this problem, but the stripping of trailing > comma here is a result of mistaken copy-and-paste from the exact > variant, I think. test_subcommand uses > > local expr=$(printf '"%s",' "$@") > > to concatenate "$@" into a single comma-separated string, so it > perfectly makes sense to drop the last one here, but with or without > your change here, neither is adding a comma that need to be > stripped. > > > It is not _your_ theme, but I think this helper is poorly designed, > especially compared to the original "exact" variant. > > test_subcommand_inexact () { > local negate= > if test "$1" = "!" > then > negate=t > shift > fi > > local expr=$(printf '"%s".*' "$@") > expr="${expr%,}" > > if test -n "$negate" > then > ! grep "\"event\":\"child_start\".*\[$expr\]" > else > grep "\"event\":\"child_start\".*\[$expr\]" > fi > } > > > I've already touched that "${expr%,}" there is a totally useful > statement that will always be a no-op. > > When "test_subcommand_inexact git pack-objects" is run, the printf > assigns to $expr: > > expr='"git".*"pack-objects".*' > > and the actual grep command invoked becomes > > grep '"event":"child_start".*\["git".*"pack-objects".*\]' > > I am not sure if that is what we really want. Ah, yes this certainly seems to not be the expected plan. It does allow for more flexibility than intended: the intention was to add flexibility at the end of the command, but instead adds flexibility throughout, only caring that a certain list of options is present as a subsequence (except that the first item is the first item, namely "git" in most cases). That unintended flexibility would allow the current needs to use test_subcommand_inexact ! git fetch as desired, but there is the additional worries about whether it is too flexible for the existing uses. > I wonder if it was more like this that the original wanted to grep for: > > grep '"event":"child_start".*\["git","pack-objects",.*\]' > > in which case the two lines there should be more like > > local expr=$(printf '"%s",' "$@") > expr="${expr%,}.*" > > I would think. This comes from Derrick's e4d0c11c (repack: respect > kept objects with '--write-midx -b', 2021-12-20). Yep, that was definitely the intention, but I wrote it wrong. I'm torn between "let's fix it to work as intended and do something different for this test case" and "this flexibility is unexpected, but still gives us enough information to trust the tests." If you think that we should fix the helper to work differently, then I can work on a patch to do so, so Abhradeep doesn't get too sidetracked on that. Thanks, -Stolee
On 3/14/2022 11:46 AM, Abhradeep Chakraborty wrote: > Junio C Hamano <gitster@pobox.com> wrote: >> Why can't you specify what should NOT come before "fetch" in your >> use of this helper? > > Below is the event triggered for non-exact OID rename - > > git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin > > Derrick told me to not depend on other flags like > `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted > and as it makes sense to me also. That's why I didn't specify those things. This reason is something that could be mentioned in the commit message to motivate the change to the helper. >> I wonder if it was more like this that the original wanted to grep for: >> >> grep '"event":"child_start".*\["git","pack-objects",.*\]' > > I don't know about other cases, but in my case, atleast I really wanted > it. > > So, In this scenerio, should I stick with `test_subcommand_inexact` or I > have to see other helper functions (or make my own) for it? As I mentioned earlier, it seems that test_subcommand_inexact ! git fetch would actually work for your needs without changing the helper. We will see whether or not the helper needs to be updated in a way that that line would not work anymore. Thanks, -Stolee
Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> writes: > Junio C Hamano <gitster@pobox.com> wrote: > >> Is this "doesn't" (documenting current behaviour, without saying if >> it is wrong or is desired) or "shouldn't" (documenting the desired >> behaviour, which the current implementation may or may not satisfy)? > > The current behaviour is okay and this commit adds the test case for it. > So, in that sense, I think "shouldn't" is better word. > >> That's mouthful. > > Sorry if the test name is very long. But, I couldn't think of shorter > test name than this - to explain what the test case is. "exact rename does not need to fetch the blob lazily" >> Lose SP after '>'. >> >> git -C partial.git log --follow -- new-file.txt >"$(pwd)/trace.txt" && > > Okay. > >> Looking at the implementation of the helper, it seems to be prepared >> to handle negation itself. Shouldn't this be >> >> test_subcommand_inexact ! fetch <trace.txt >> >> instead? > > Oops, completely missed it. Correcting it :) > >> Why can't you specify what should NOT come before "fetch" in your >> use of this helper? > > Below is the event triggered for non-exact OID rename - > > git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin > > Derrick told me to not depend on other flags like > `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted > and as it makes sense to me also. That's why I didn't specify those things. > >> I wonder if it was more like this that the original wanted to grep for: >> >> grep '"event":"child_start".*\["git","pack-objects",.*\]' > > I don't know about other cases, but in my case, atleast I really wanted > it. > > So, In this scenerio, should I stick with `test_subcommand_inexact` or I > have to see other helper functions (or make my own) for it? If you are doing just a single grep, I am not sure why grepping for "fetch" alone without any helper is insufficient. In any case, butchering the "inexcat" helper to loosen it for other existing users of the same helper does not sound like a good direction to go. Thanks.
Derrick Stolee <derrickstolee@github.com> writes: >> git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin >> >> Derrick told me to not depend on other flags like >> `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted >> and as it makes sense to me also. That's why I didn't specify those things. > > This reason is something that could be mentioned in the commit message > to motivate the change to the helper. > >>> I wonder if it was more like this that the original wanted to grep for: >>> >>> grep '"event":"child_start".*\["git","pack-objects",.*\]' >> >> I don't know about other cases, but in my case, atleast I really wanted >> it. >> >> So, In this scenerio, should I stick with `test_subcommand_inexact` or I >> have to see other helper functions (or make my own) for it? > > As I mentioned earlier, it seems that > > test_subcommand_inexact ! git fetch > > would actually work for your needs without changing the helper. We will see > whether or not the helper needs to be updated in a way that that line would > not work anymore. Ah, that is because the current implementation of the helper sprinkles ".*" in between each and every pair of parameters, so the resulting pattern \["git".*"fetch".*\] would not be prevented from matching by the presence of "-c var.iable=value"? In any case, the _inexact helper may need to be rethought, or at least documented better what inexact-ness it wants to allow. With no concrete goals written down, my guess was that it wanted to ensure only the earlier command names and options, while ignoring the later arguments. And that was where my "adding .* in between looks wrong, and stripping trailing comma with ${expt%,} is nonsense" comment came from. Thanks.
Derrick Stolee <derrickstolee@github.com> writes: >> When "test_subcommand_inexact git pack-objects" is run, the printf >> assigns to $expr: >> >> expr='"git".*"pack-objects".*' >> >> and the actual grep command invoked becomes >> >> grep '"event":"child_start".*\["git".*"pack-objects".*\]' >> >> I am not sure if that is what we really want. > > Ah, yes this certainly seems to not be the expected plan. It does > allow for more flexibility than intended: the intention was to > add flexibility at the end of the command, but instead adds > flexibility throughout, only caring that a certain list of options > is present as a subsequence (except that the first item is the > first item, namely "git" in most cases). I guess I sent a response before reading this message from you. > That unintended flexibility would allow the current needs to use > > test_subcommand_inexact ! git fetch > > as desired, but there is the additional worries about whether it > is too flexible for the existing uses. Yeah, it looked a bit too loose. > If you think that we should fix the helper to work differently, then > I can work on a patch to do so, so Abhradeep doesn't get too > sidetracked on that. I agree that comparing what _inexact does and what its inventor wanted it to do and reconciling the differences would be outside the scope of this topic, which means the test in this patch should refrain from using the _inexact helper at all. I found it quite a roundabout way to look into trace to see if a "fetch" was run to determine if we are doing the right thing. Regardless of whatever mechanism is used to lazily fetch objects that have become necessary from the promisor remotes, what we want to ensure is that the blob object HEAD:new-file.txt is still missing in our object store after running "log --follow", isn't it? In a future version of "git", our on-demand lazy fetch mechanism may not even invoke "git fetch" under the hood, after all. Don't we have a more direct way to ask "does this object exist in our object store, or is it merely left as a promise?" without triggering a lazy fetching that we can use in this test? I think such a direct approach is what we want to use in this test.
Derrick Stolee <derrickstolee@github.com> wrote: > This reason is something that could be mentioned in the commit message > to motivate the change to the helper. Sure, definitely. Will add it. > As I mentioned earlier, it seems that > > test_subcommand_inexact ! git fetch > > would actually work for your needs without changing the helper. We will see > whether or not the helper needs to be updated in a way that that line would > not work anymore. Okay. got it! Thanks :)
Junio C Hamano <gitster@pobox.com> wrote: > I found it quite a roundabout way to look into trace to see if > a "fetch" was run to determine if we are doing the right thing. > > Regardless of whatever mechanism is used to lazily fetch objects > that have become necessary from the promisor remotes, what we want > to ensure is that the blob object HEAD:new-file.txt is still missing > in our object store after running "log --follow", isn't it? In a > future version of "git", our on-demand lazy fetch mechanism may not > even invoke "git fetch" under the hood, after all. > > Don't we have a more direct way to ask "does this object exist in > our object store, or is it merely left as a promise?" without > triggering a lazy fetching that we can use in this test? I think > such a direct approach is what we want to use in this test. I did think of other ways to detect the downloading before. Initially I thought of using grep functionality to see if any download related messages are printed or not. But I found that `git log` doesn't print any download related messages (e.g. like "enumerating ...."). So, I dropped that idea. The next idea that came into my mind was to detect if the previous file (the file from where the new file is renamed) is still missing ( you're suggesting the same approach). But the problem I faced with this apprach is I didn't find a way to detect if the file is missing. I tried to use `git rev-list --objects --missing=print` with `HEAD` and first commit hash. But in both cases, I didn't found a missing `[?]` sign before ` <blob-hash-ID> file.txt`. That means, both blob objects ( or I think the same blob object) exists in the local repo. Most probably, this is because the content of these two files are same. Blob object is never modified. So, as far as I think, both base trees ( one which is pointed by the initial commit and other which is pointed by the latest commit) is pointing to the same blob object. As a result, the file is not missing. ( I am not sure if git really does it as I said; sorry if I am wrong here) That's why I asked Derrick to help me detecting the download and he suggested me this. It is true that this is not ideal (use of `fetch` may change later). But this was the only option that I can go with. If you find another/better way or if you think I tried to use `git rev-list ...` in the wrong way - please tell me. I would be very happy. Thanks :)
On 3/15/2022 7:30 AM, Abhradeep Chakraborty wrote: > Junio C Hamano <gitster@pobox.com> wrote: > >> I found it quite a roundabout way to look into trace to see if >> a "fetch" was run to determine if we are doing the right thing. >> >> Regardless of whatever mechanism is used to lazily fetch objects >> that have become necessary from the promisor remotes, what we want >> to ensure is that the blob object HEAD:new-file.txt is still missing >> in our object store after running "log --follow", isn't it? In a >> future version of "git", our on-demand lazy fetch mechanism may not >> even invoke "git fetch" under the hood, after all. >> >> Don't we have a more direct way to ask "does this object exist in >> our object store, or is it merely left as a promise?" without >> triggering a lazy fetching that we can use in this test? I think >> such a direct approach is what we want to use in this test. > > I did think of other ways to detect the downloading before. Initially I > thought of using grep functionality to see if any download related > messages are printed or not. But I found that `git log` doesn't print > any download related messages (e.g. like "enumerating ...."). So, I > dropped that idea. > > The next idea that came into my mind was to detect if the previous file > (the file from where the new file is renamed) is still missing ( you're > suggesting the same approach). But the problem I faced with this apprach > is I didn't find a way to detect if the file is missing. > > I tried to use `git rev-list --objects --missing=print` with `HEAD` and > first commit hash. But in both cases, I didn't found a missing `[?]` sign > before ` <blob-hash-ID> file.txt`. That means, both blob objects ( or I > think the same blob object) exists in the local repo. Ah! This method really should have worked. And the test as written is not testing the right thing, because we also skip downloading the blobs if we already have them. I think the key issue is that your clone says this: + git clone --filter=blob:none "file://$(pwd)/repo" partial.git && which will do a checkout and download the blobs at tip. If you add a "--bare" to this clone command, then no blobs should be downloaded, and your rev-list command should show the missing objects. > Most probably, this is because the content of these two files are same. > Blob object is never modified. So, as far as I think, both base trees ( one > which is pointed by the initial commit and other which is pointed by the latest > commit) is pointing to the same blob object. As a result, the file is not > missing. ( I am not sure if git really does it as I said; sorry if I am > wrong here) > > That's why I asked Derrick to help me detecting the download and he > suggested me this. It is true that this is not ideal (use of `fetch` may > change later). But this was the only option that I can go with. > > If you find another/better way or if you think I tried to use `git rev-list > ...` in the wrong way - please tell me. I would be very happy. Hopefully my suggestion to use --bare will help. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> wrote: > Ah! This method really should have worked. And the test as written > is not testing the right thing, because we also skip downloading the > blobs if we already have them. > > I think the key issue is that your clone says this: > > + git clone --filter=blob:none "file://$(pwd)/repo" partial.git && > > which will do a checkout and download the blobs at tip. > > If you add a "--bare" to this clone command, then no blobs should be > downloaded, and your rev-list command should show the missing objects. Thanks Derrick! Let my try it. Thanks :)
Derrick Stolee <derrickstolee@github.com> writes: >> I tried to use `git rev-list --objects --missing=print` with `HEAD` and >> first commit hash. But in both cases, I didn't found a missing `[?]` sign >> before ` <blob-hash-ID> file.txt`. That means, both blob objects ( or I >> think the same blob object) exists in the local repo. > ... Yup. I was about to suggest --missing=allow-promisor to catch other unexpected missing objects, but in this toy history for testing, what is missing is all the objects expected from the promisor, so it should be sufficient to use --missing=print. > I think the key issue is that your clone says this: > > + git clone --filter=blob:none "file://$(pwd)/repo" partial.git && > > which will do a checkout and download the blobs at tip. > > If you add a "--bare" to this clone command, then no blobs should be > downloaded, and your rev-list command should show the missing objects. That sounds like pointing at a different issue. If the test repository downloads the blobs at the tip, then the fact that the trace output did not have "fetch" in it does not mean much, does it? It could be that we refrained from lazily download the blob because we did not need its contents for the purpose of following through an exact rename, but it could also be that we did not need to lazily download it because we already had it. > Hopefully my suggestion to use --bare will help. Yup, thanks. So regardless of "--missing=print" vs "grep in trace", there was a bug in the test set-up, and we caught it in this discussion, which is excellent.
Junio C Hamano <gitster@pobox.com> wrote: > That sounds like pointing at a different issue. If the test > repository downloads the blobs at the tip, then the fact that the > trace output did not have "fetch" in it does not mean much, does it? > It could be that we refrained from lazily download the blob because > we did not need its contents for the purpose of following through an > exact rename, but it could also be that we did not need to lazily > download it because we already had it. hmm, you're right. > So regardless of "--missing=print" vs "grep in trace", there was a > bug in the test set-up, and we caught it in this discussion, which > is excellent. :)
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index f17abd298c8..1e54f4844fd 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -618,6 +618,22 @@ test_expect_success 'do not fetch when checking existence of tree we construct o git -C repo cherry-pick side1 ' +test_expect_success 'git log --follow does not download blobs if an exact OID rename found (blobless clone)' ' + rm -rf repo partial.git && + test_create_repo repo && + content="some dummy content" && + test_commit -C repo create-a-file file.txt "$content" && + git -C repo mv file.txt new-file.txt && + git -C repo commit -m rename-the-file && + test_config -C repo uploadpack.allowfilter 1 && + test_config -C repo uploadpack.allowanysha1inwant 1 && + + git clone --filter=blob:none "file://$(pwd)/repo" partial.git && + GIT_TRACE2_EVENT="$(pwd)/trace.txt" \ + git -C partial.git log --follow -- new-file.txt > "$(pwd)/trace.txt" && + ! test_subcommand_inexact fetch <trace.txt +' + test_expect_success 'lazy-fetch when accessing object not in the_repository' ' rm -rf full partial.git && test_create_repo full && diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 0f439c99d61..07a2b60c103 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1811,7 +1811,7 @@ test_subcommand_inexact () { shift fi - local expr=$(printf '"%s".*' "$@") + local expr=$(printf '.*"%s".*' "$@") expr="${expr%,}" if test -n "$negate"