Message ID | pull.1730.git.1715742069966.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | bundle-uri: refresh packed_git if unbundle succeed | expand |
On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote: > From: Xing Xin <xingxin.xx@bytedance.com> Long time no see :) > When using the bundle-uri mechanism with a bundle list containing > multiple interrelated bundles, we encountered a bug where tips from > downloaded bundles were not being discovered, resulting in rather slow > clones. This was particularly problematic when employing the heuristic > `creationTokens`. > > And this is easy to reproduce. Suppose we have a repository with a > single branch `main` pointing to commit `A`, firstly we create a base > bundle with > > git bundle create base.bundle main > > Then let's add a new commit `B` on top of `A`, so that an incremental > bundle for `main` can be created with > > git bundle create incr.bundle A..main > > Now we can generate a bundle list with the following content: > > [bundle] > version = 1 > mode = all > heuristic = creationToken > > [bundle "base"] > uri = base.bundle > creationToken = 1 > > [bundle "incr"] > uri = incr.bundle > creationToken = 2 > > A fresh clone with the bundle list above would give the expected > `refs/bundles/main` pointing at `B` in new repository, in other words we > already had everything locally from the bundles, but git would still > download everything from server as if we got nothing. > > So why the `refs/bundles/main` is not discovered? After some digging I > found that: > > 1. when unbundling a downloaded bundle, a `verify_bundle` is called to > check its prerequisites if any. The verify procedure would find oids > so `packed_git` is initialized. > > 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, > during which `mark_complete_and_common_ref` and `mark_tips` would > find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be > enlisted if `packed_git` has already initialized in 1. And I assume we do not want it to not use `OBJECT_INFO_QUICK`? > Back to the example above, when unbunding `incr.bundle`, `base.pack` is > enlisted to `packed_git` bacause of the prerequisites to verify. Then we > can not find `B` for negotiation at a latter time bacause `B` exists in > `incr.pack` which is not enlisted in `packed_git`. Okay, the explanation feels sensible. > This commit fixes this by adding a `reprepare_packed_git` call for every > successfully unbundled bundle, which ensures to enlist all generated > packs from bundle uri. And a set of negotiation related tests are added. This makes me wonder though. Do we really need to call `reprepare_packed_git()` once for every bundle, or can't we instead call it once at the end once we have fetched all bundles? Reading on. > Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> > --- > bundle-uri: refresh packed_git if unbundle succeed > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1730 > > bundle-uri.c | 3 + > t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 132 insertions(+) > > diff --git a/bundle-uri.c b/bundle-uri.c > index ca32050a78f..2b9d36cfd8e 100644 > --- a/bundle-uri.c > +++ b/bundle-uri.c > @@ -7,6 +7,7 @@ > #include "refs.h" > #include "run-command.h" > #include "hashmap.h" > +#include "packfile.h" > #include "pkt-line.h" > #include "config.h" > #include "remote.h" > @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) > VERIFY_BUNDLE_QUIET))) > return 1; > > + reprepare_packed_git(r); > + So what's hidden here is that `unbundle_from_file()` will also try to access the bundle's refs right away. Surprisingly, we do so by calling `refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the effect that we basically accept arbitrary object IDs here even if we do not know those. That's why we didn't have to `reprepare_packed_git()` before this change. Now there are two conflicting thoughts here: - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs should now be accessible. - Or we can avoid calling `reprepare_packed_git()` inside the loop and instead call it once after we have fetched all bundles. The second one feels a bit like premature optimization to me. But the first item does feel like it could help us to catch broken bundles because we wouldn't end up creating refs for objects that neither we nor the bundle have. Patrick
"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Xing Xin <xingxin.xx@bytedance.com>> > When using the bundle-uri mechanism with a bundle list containing > multiple interrelated bundles, we encountered a bug where tips from > downloaded bundles were not being discovered, resulting in rather slow > clones. This was particularly problematic when employing the heuristic > `creationTokens`. > > And this is easy to reproduce. Suppose we have a repository with a > single branch `main` pointing to commit `A`, firstly we create a base > bundle with > > git bundle create base.bundle main > > Then let's add a new commit `B` on top of `A`, so that an incremental > bundle for `main` can be created with > > git bundle create incr.bundle A..main > > Now we can generate a bundle list with the following content: > > [bundle] > version = 1 > mode = all > heuristic = creationToken > > [bundle "base"] > uri = base.bundle > creationToken = 1 > > [bundle "incr"] > uri = incr.bundle > creationToken = 2 > > A fresh clone with the bundle list above would give the expected > `refs/bundles/main` pointing at `B` in new repository, in other words we > already had everything locally from the bundles, but git would still > download everything from server as if we got nothing. > > So why the `refs/bundles/main` is not discovered? After some digging I > found that: > > 1. when unbundling a downloaded bundle, a `verify_bundle` is called to s/a// > check its prerequisites if any. The verify procedure would find oids > so `packed_git` is initialized. > So the flow is: 1. `fetch_bundle_list` fetches all the bundles advertised via `download_bundle_list` to local files. 2. It then calls `unbundle_all_bundles` to unbundle all the bundles. 3. Each bundle is then unbundled using `unbundle_from_file`. 4. Here, we first read the bundle header to get all the prerequisites for the bundle, this is done in `read_bundle_header`. 5. Then we call `unbundle`, which calls `verify_bundle` to ensure that the repository does indeed contain the prerequisites mentioned in the bundle. Then it creates the index from the bundle file. So because the objects are being checked, the `prepare_packed_git` function is eventually called, which means that the `raw_object_store->packed_git` data gets filled in and `packed_git_initialized` is set. This means consecutive calls to `prepare_packed_git` doesn't re-initiate `raw_object_store->packed_git` since `packed_git_initialized` already is set. So your explanation makes sense, as a _nit_ I would perhaps add the part about why consecutive calls to `prepare_packed_git` are ineffective. > 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, s/unbundled/unbundling > during which `mark_complete_and_common_ref` and `mark_tips` would > find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be > enlisted if `packed_git` has already initialized in 1. > Back to the example above, when unbunding `incr.bundle`, `base.pack` is > enlisted to `packed_git` bacause of the prerequisites to verify. Then we > can not find `B` for negotiation at a latter time bacause `B` exists in > `incr.pack` which is not enlisted in `packed_git`. > > This commit fixes this by adding a `reprepare_packed_git` call for every > successfully unbundled bundle, which ensures to enlist all generated > packs from bundle uri. And a set of negotiation related tests are added. > The solution makes sense. > Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> > --- > bundle-uri: refresh packed_git if unbundle succeed > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1730 > > bundle-uri.c | 3 + > t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 132 insertions(+) > > diff --git a/bundle-uri.c b/bundle-uri.c > index ca32050a78f..2b9d36cfd8e 100644 > --- a/bundle-uri.c > +++ b/bundle-uri.c > @@ -7,6 +7,7 @@ > #include "refs.h" > #include "run-command.h" > #include "hashmap.h" > +#include "packfile.h" > #include "pkt-line.h" > #include "config.h" > #include "remote.h" > @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) > VERIFY_BUNDLE_QUIET))) > return 1; > > + reprepare_packed_git(r); > + > Would it make sense to move this to `bundle.c:unbundle()`, since that is also where the idx is created? [snip]
Patrick Steinhardt <ps@pks.im> writes: > Now there are two conflicting thoughts here: > > - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs > should now be accessible. > > - Or we can avoid calling `reprepare_packed_git()` inside the loop and > instead call it once after we have fetched all bundles. > > The second one feels a bit like premature optimization to me. But the > first item does feel like it could help us to catch broken bundles > because we wouldn't end up creating refs for objects that neither we nor > the bundle have. I like the way your thoughts are structured around here. I do agree that the latter is a wrong approach---we shouldn't be trusting what came from elsewhere over the network without first checking. We should probably be running the "index-pack --fix-thin" the unbundling process runs with also the "--fsck-objects" option if we are not doing so already, and even then, we should make sure that the object we are making our ref point at have everything behind it.
At 2024-05-17 13:00:49, "Patrick Steinhardt" <ps@pks.im> wrote: >On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote: >> From: Xing Xin <xingxin.xx@bytedance.com> > >Long time no see :) Glad to see you again here :) >> >> So why the `refs/bundles/main` is not discovered? After some digging I >> found that: >> >> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to >> check its prerequisites if any. The verify procedure would find oids >> so `packed_git` is initialized. >> >> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, >> during which `mark_complete_and_common_ref` and `mark_tips` would >> find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be >> enlisted if `packed_git` has already initialized in 1. > >And I assume we do not want it to not use `OBJECT_INFO_QUICK`? I think so. For clones or fetches without using bundle-uri, we can hardly encounter the case that new packs are added during the negotiation process. So using `OBJECT_INFO_QUICK` can get some performance gain. > >> Back to the example above, when unbunding `incr.bundle`, `base.pack` is >> enlisted to `packed_git` bacause of the prerequisites to verify. Then we >> can not find `B` for negotiation at a latter time bacause `B` exists in >> `incr.pack` which is not enlisted in `packed_git`. > >Okay, the explanation feels sensible. > >> This commit fixes this by adding a `reprepare_packed_git` call for every >> successfully unbundled bundle, which ensures to enlist all generated >> packs from bundle uri. And a set of negotiation related tests are added. > >This makes me wonder though. Do we really need to call >`reprepare_packed_git()` once for every bundle, or can't we instead call >it once at the end once we have fetched all bundles? Reading on. > >> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> >> --- >> bundle-uri: refresh packed_git if unbundle succeed >> >> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 >> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 >> Pull-Request: https://github.com/gitgitgadget/git/pull/1730 >> >> bundle-uri.c | 3 + >> t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ >> 2 files changed, 132 insertions(+) >> >> diff --git a/bundle-uri.c b/bundle-uri.c >> index ca32050a78f..2b9d36cfd8e 100644 >> --- a/bundle-uri.c >> +++ b/bundle-uri.c >> @@ -7,6 +7,7 @@ >> #include "refs.h" >> #include "run-command.h" >> #include "hashmap.h" >> +#include "packfile.h" >> #include "pkt-line.h" >> #include "config.h" >> #include "remote.h" >> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) >> VERIFY_BUNDLE_QUIET))) >> return 1; >> >> + reprepare_packed_git(r); >> + > >So what's hidden here is that `unbundle_from_file()` will also try to >access the bundle's refs right away. Surprisingly, we do so by calling >`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the >effect that we basically accept arbitrary object IDs here even if we do >not know those. That's why we didn't have to `reprepare_packed_git()` >before this change. You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and the negotiation works as expected. After some further digging I find that without `REF_SKIP_OID_VERIFICATION`, both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare` for reftable backend would call `parse_object` to check the oid. `parse_object` can help refresh `packed_git` via `reprepare_packed_git`. > >Now there are two conflicting thoughts here: > > - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs > should now be accessible. > > - Or we can avoid calling `reprepare_packed_git()` inside the loop and > instead call it once after we have fetched all bundles. > >The second one feels a bit like premature optimization to me. But the >first item does feel like it could help us to catch broken bundles >because we wouldn't end up creating refs for objects that neither we nor >the bundle have. I favor the first approach because a validation on the object IDs we are writing is a safe guard . And the flag itself was designed to be used in testing scenarios. /* * Blindly write an object_id. This is useful for testing data corruption * scenarios. */ #define REF_SKIP_OID_VERIFICATION (1 << 10)
At 2024-05-17 15:36:53, "Karthik Nayak" <karthik.188@gmail.com> wrote: >"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Xing Xin <xingxin.xx@bytedance.com>> >> When using the bundle-uri mechanism with a bundle list containing >> multiple interrelated bundles, we encountered a bug where tips from >> downloaded bundles were not being discovered, resulting in rather slow >> clones. This was particularly problematic when employing the heuristic >> `creationTokens`. >> >> And this is easy to reproduce. Suppose we have a repository with a >> single branch `main` pointing to commit `A`, firstly we create a base >> bundle with >> >> git bundle create base.bundle main >> >> Then let's add a new commit `B` on top of `A`, so that an incremental >> bundle for `main` can be created with >> >> git bundle create incr.bundle A..main >> >> Now we can generate a bundle list with the following content: >> >> [bundle] >> version = 1 >> mode = all >> heuristic = creationToken >> >> [bundle "base"] >> uri = base.bundle >> creationToken = 1 >> >> [bundle "incr"] >> uri = incr.bundle >> creationToken = 2 >> >> A fresh clone with the bundle list above would give the expected >> `refs/bundles/main` pointing at `B` in new repository, in other words we >> already had everything locally from the bundles, but git would still >> download everything from server as if we got nothing. >> >> So why the `refs/bundles/main` is not discovered? After some digging I >> found that: >> >> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to > >s/a// Thanks! > >> check its prerequisites if any. The verify procedure would find oids >> so `packed_git` is initialized. >> > >So the flow is: >1. `fetch_bundle_list` fetches all the bundles advertised via >`download_bundle_list` to local files. >2. It then calls `unbundle_all_bundles` to unbundle all the bundles. >3. Each bundle is then unbundled using `unbundle_from_file`. >4. Here, we first read the bundle header to get all the prerequisites >for the bundle, this is done in `read_bundle_header`. >5. Then we call `unbundle`, which calls `verify_bundle` to ensure that >the repository does indeed contain the prerequisites mentioned in the >bundle. Then it creates the index from the bundle file. > >So because the objects are being checked, the `prepare_packed_git` >function is eventually called, which means that the >`raw_object_store->packed_git` data gets filled in and >`packed_git_initialized` is set. > >This means consecutive calls to `prepare_packed_git` doesn't >re-initiate `raw_object_store->packed_git` since >`packed_git_initialized` already is set. > >So your explanation makes sense, as a _nit_ I would perhaps add the part >about why consecutive calls to `prepare_packed_git` are ineffective. Thanks my friend, you have expressed this issue more clearly. I will post a new description based on your explanation with the creationToken case covered. > >> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, > >s/unbundled/unbundling Copy that. > >> +#include "packfile.h" >> #include "pkt-line.h" >> #include "config.h" >> #include "remote.h" >> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) >> VERIFY_BUNDLE_QUIET))) >> return 1; >> >> + reprepare_packed_git(r); >> + >> > >Would it make sense to move this to `bundle.c:unbundle()`, since that is >also where the idx is created? > I wonder if we need a mental model that we should `reprepare_packed_git` that when a new pack and its corresponding idx is generated? Currently whether to call `reprepare_packed_git` is determined by the caller. But within the scope of this bug, I tend to remove the `REF_SKIP_OID_VERIFICATION` flag when writing refs as Patrick suggested. Xing Xin
At 2024-05-18 00:14:47, "Junio C Hamano" <gitster@pobox.com> wrote: >Patrick Steinhardt <ps@pks.im> writes: > >> Now there are two conflicting thoughts here: >> >> - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs >> should now be accessible. >> >> - Or we can avoid calling `reprepare_packed_git()` inside the loop and >> instead call it once after we have fetched all bundles. >> >> The second one feels a bit like premature optimization to me. But the >> first item does feel like it could help us to catch broken bundles >> because we wouldn't end up creating refs for objects that neither we nor >> the bundle have. > >I like the way your thoughts are structured around here. > >I do agree that the latter is a wrong approach---we shouldn't be >trusting what came from elsewhere over the network without first >checking. We should probably be running the "index-pack --fix-thin" >the unbundling process runs with also the "--fsck-objects" option if >we are not doing so already, and even then, we should make sure that >the object we are making our ref point at have everything behind it. Currently `unbundle` and all its callers are not adding "--fsck-objects". There is a param `extra_index_pack_args` for `unbundle` but it is mainly used for progress related options. Personally I think data from bundles and data received via network should be treated equally. For "fetch-pack" we now have some configs such as "fetch.fsckobjects" and "transfer.fsckobjects" to decide the behavior, these configs are invisible when we are fetching bundles. So for bundles I think we have some different ways to go: - Acknowledge the configs mentioned above and behave like "fetch-pack". - Add "--fsck-objects" as a default in `unbundle`. - In `unbundle_from_file`, pass in "--fsck-objects" as `extra_index_pack_args` for `unbundle` so this only affect the bundle-uri related process. What do you think? Xing Xin
"Xing Xin" <bupt_xingxin@163.com> writes: > Personally I think data from bundles and data received via network > should be treated equally. Yup, that is not personal ;-) but is universally accepted as a good discipline. In the case of bundle-uri, the bundle came over the network so it is even more true that they should be treated the same. > For "fetch-pack" we now have some configs > such as "fetch.fsckobjects" and "transfer.fsckobjects" to decide the > behavior, these configs are invisible when we are fetching bundles. When fetching over network, transport.c:fetch_refs_via_pack() calls fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the configuration variables are honored there. It appears that the transport layer is unaware of the .fsckobjects configuration knobs. When fetching from a bundle, transport.c:fetch_refs_from_bundle() calls bundle.c:unbundle(). This function has three callers, i.e. "git bundle unbundle", normal fetching from a bundle, and more recently added bundle-uri codepaths. I think one reasonable approach to take is to add an extra parameter that takes one of three values: (never, use-config, always), and conditionally add "--fsck-objects" to the command line of the index-pack. Teach "git bundle unbundle" the "--fsck-objects" option so that it can pass 'never' or 'always' from the command line, and pass 'use-config' from the code paths for normal fetching from a budnle and bundle-uri. To implement use-config, you'd probably need to refactor a small part of fetch-pack.c:get_pack() if (fetch_fsck_objects >= 0 ? fetch_fsck_objects : transfer_fsck_objects >= 0 ? transfer_fsck_objects : 0) fsck_objects = 1; into a public function (to support a caller like unbundle() that comes from sideways, the new function may also need to call fetch_pack_setup() to prime them). A patch series may take a structure like so: * define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in bundle.h, have bundle.c:unbundle() accept a new parameter of that type, and conditionally add "--fsck-objects" to its call to "index-pack". "git bundle unbundle" can pass 'never' to its invocation to unbundle() as an easy way to test it. For the other two callers, we can start by passing 'always'. * (optional) teach "git bundle unbundle" a new "--fsck-objects" option to allow passing 'always' to its call to unbundle(). With that, add tests to feed it a bundle with questionable objects in it and make sure that unbundling notices. * refactor fetch-pack.c:get_pack() to make the fetch-then-transfer configuration logic available to external callers. * Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to react to the value by calling the helper function you introduced in the previous step. Thanks.
At 2024-05-21 01:19:02, "Junio C Hamano" <gitster@pobox.com> wrote: >"Xing Xin" <bupt_xingxin@163.com> writes: > >> Personally I think data from bundles and data received via network >> should be treated equally. > >Yup, that is not personal ;-) but is universally accepted as a good >discipline. In the case of bundle-uri, the bundle came over the >network so it is even more true that they should be treated the >same. > >> For "fetch-pack" we now have some configs >> such as "fetch.fsckobjects" and "transfer.fsckobjects" to decide the >> behavior, these configs are invisible when we are fetching bundles. > >When fetching over network, transport.c:fetch_refs_via_pack() calls >fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the >configuration variables are honored there. It appears that the >transport layer is unaware of the .fsckobjects configuration knobs. > >When fetching from a bundle, transport.c:fetch_refs_from_bundle() >calls bundle.c:unbundle(). This function has three callers, i.e. >"git bundle unbundle", normal fetching from a bundle, and more >recently added bundle-uri codepaths. > >I think one reasonable approach to take is to add an extra parameter >that takes one of three values: (never, use-config, always), and >conditionally add "--fsck-objects" to the command line of the >index-pack. Teach "git bundle unbundle" the "--fsck-objects" option >so that it can pass 'never' or 'always' from the command line, and >pass 'use-config' from the code paths for normal fetching from a >budnle and bundle-uri. > >To implement use-config, you'd probably need to refactor a small >part of fetch-pack.c:get_pack() > > if (fetch_fsck_objects >= 0 > ? fetch_fsck_objects > : transfer_fsck_objects >= 0 > ? transfer_fsck_objects > : 0) > fsck_objects = 1; > >into a public function (to support a caller like unbundle() that >comes from sideways, the new function may also need to call >fetch_pack_setup() to prime them). > >A patch series may take a structure like so: > > * define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in > bundle.h, have bundle.c:unbundle() accept a new parameter of that > type, and conditionally add "--fsck-objects" to its call to > "index-pack". "git bundle unbundle" can pass 'never' to its > invocation to unbundle() as an easy way to test it. For the > other two callers, we can start by passing 'always'. > > * (optional) teach "git bundle unbundle" a new "--fsck-objects" > option to allow passing 'always' to its call to unbundle(). With > that, add tests to feed it a bundle with questionable objects in > it and make sure that unbundling notices. I just submitted a new series mainly focusing on the unbundle handling during fetches. I would like to submit a new one for teaching "git bundle unbundle" a "--fsck-objects" option after this to make changes more targeted. > * refactor fetch-pack.c:get_pack() to make the fetch-then-transfer > configuration logic available to external callers. > > * Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to I tend to use `UNBUNDLE_FSCK_FOLLOW_FETCH` because this option is only used in fetches, though the current implementation is indeed reading configs. > react to the value by calling the helper function you introduced > in the previous step. The new patch series is constructed right as you suggested, thanks a lot for your help. Xing Xin
diff --git a/bundle-uri.c b/bundle-uri.c index ca32050a78f..2b9d36cfd8e 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -7,6 +7,7 @@ #include "refs.h" #include "run-command.h" #include "hashmap.h" +#include "packfile.h" #include "pkt-line.h" #include "config.h" #include "remote.h" @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) VERIFY_BUNDLE_QUIET))) return 1; + reprepare_packed_git(r); + /* * Convert all refs/heads/ from the bundle into refs/bundles/ * in the local repository. diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 1ca5f745e73..a5b04d6f187 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' ' test_expect_success 'create bundle' ' git init clone-from && git -C clone-from checkout -b topic && + test_commit -C clone-from A && + git -C clone-from bundle create A.bundle topic && + test_commit -C clone-from B && git -C clone-from bundle create B.bundle topic ' @@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' ' ! grep "refs/bundles/" refs ' +######################################################################### +# Clone negotiation related tests begin here + +test_expect_success 'negotiation: bundle with part of wanted commits' ' + test_when_finished rm -rf trace*.txt && + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \ + git clone --no-local --bundle-uri="clone-from/A.bundle" \ + clone-from nego-bundle-part && + git -C nego-bundle-part for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/topic + EOF + test_cmp expect actual && + # Ensure that refs/bundles/topic are sent as "have". + grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt +' + +test_expect_success 'negotiation: bundle with all wanted commits' ' + test_when_finished rm -rf trace*.txt && + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \ + git clone --no-local --single-branch --branch=topic --no-tags \ + --bundle-uri="clone-from/B.bundle" \ + clone-from nego-bundle-all && + git -C nego-bundle-all for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/topic + EOF + test_cmp expect actual && + # We already have all needed commits so no "want" needed. + ! grep "clone> want " trace-packet.txt +' + +test_expect_success 'negotiation: bundle list (no heuristic)' ' + test_when_finished rm -f trace*.txt && + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + EOF + + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \ + git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from nego-bundle-list-no-heuristic && + + git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect actual && + grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt +' + +test_expect_success 'negotiation: bundle list (creationToken)' ' + test_when_finished rm -f trace*.txt && + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + creationToken = 2 + EOF + + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \ + git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from nego-bundle-list-heuristic && + + git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect actual && + grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt +' + +test_expect_success 'negotiation: bundle list with all wanted commits' ' + test_when_finished rm -f trace*.txt && + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + creationToken = 2 + EOF + + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \ + git clone --no-local --single-branch --branch=left --no-tags \ + --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from nego-bundle-list-all && + + git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect actual && + # We already have all needed commits so no "want" needed. + ! grep "clone> want " trace-packet.txt +' + ######################################################################### # HTTP tests begin here