Message ID | 20220302003613.15567-1-chooglen@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 8d2eaf649abd28baa083a723d1e041b26d2be83e |
Headers | show |
Series | checkout, clone: die if tree cannot be parsed | expand |
Glen Choo <chooglen@google.com> writes: > Note that there are many other callsites that don't check for NULLs from > parse_tree_indirect(), and some of which are fairly subtle. I wasn't > confident in changing those, so I stayed on the conservative side and > only changed the ones that I could get to segfault. Thanks. > - tree = parse_tree_indirect(old_branch_info->commit ? > - &old_branch_info->commit->object.oid : > - the_hash_algo->empty_tree); > + > + old_commit_oid = old_branch_info->commit ? > + &old_branch_info->commit->object.oid : > + the_hash_algo->empty_tree; I guess this is done only so that you can use the object name in the error message, which is fine. > + tree = parse_tree_indirect(old_commit_oid); > + if (!tree) > + die(_("unable to parse commit %s"), > + oid_to_hex(old_commit_oid)); "unable to parse commit" is a bit of a white lie. In fact, there is nothing that makes oid_commit_oid the name of a commit object. "unable to parse object '%s' as a tree" would be more technically correct, but one random-looking hexadecimal string is almost indistinguishable from another, and neither would be a very useful message from the end user's point of view. I am wondering if we can use old_branch_info to formulate something easier to understand for the user. update_refs_for_switch() seems to compute old_desc as a human readable name by using old_branch_info->name if available before falling back to old_branch_info->commit object name, which might be a source of inspiration. > init_tree_desc(&trees[0], tree->buffer, tree->size); > parse_tree(new_tree); > tree = new_tree; > diff --git a/builtin/clone.c b/builtin/clone.c > index a572cda503..0aea177660 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -700,6 +700,8 @@ > init_checkout_metadata(&opts.meta, head, &oid, NULL); > > tree = parse_tree_indirect(&oid); > + if (!tree) > + die(_("unable to parse commit %s"), oid_to_hex(&oid)); If we follow the code path, we can positively tell that oid here has always came from reading HEAD, so "unable to parse HEAD as a tree" would be an easier version of the message, I think. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >> - tree = parse_tree_indirect(old_branch_info->commit ? >> - &old_branch_info->commit->object.oid : >> - the_hash_algo->empty_tree); >> + >> + old_commit_oid = old_branch_info->commit ? >> + &old_branch_info->commit->object.oid : >> + the_hash_algo->empty_tree; > > I guess this is done only so that you can use the object name in the > error message, which is fine. That's correct. >> + tree = parse_tree_indirect(old_commit_oid); >> + if (!tree) >> + die(_("unable to parse commit %s"), >> + oid_to_hex(old_commit_oid)); > > "unable to parse commit" is a bit of a white lie. In fact, there is > nothing that makes oid_commit_oid the name of a commit object. > > "unable to parse object '%s' as a tree" would be more technically > correct, Hm, yes. With regards to parse_tree_indirect(), "unable to parse object '%s' as a tree" is a more accurate description of the failure. But since we know that the oid is a commit in this context, I'm not sure if we need to offload that much information to the user - if we failed to parse the given object id in the appropriate manner, the user would still be directed to figure out what's wrong with the object. > but one random-looking hexadecimal string is almost > indistinguishable from another, and neither would be a very useful > message from the end user's point of view. I am wondering if we can > use old_branch_info to formulate something easier to understand for > the user. update_refs_for_switch() seems to compute old_desc as a > human readable name by using old_branch_info->name if available > before falling back to old_branch_info->commit object name, which > might be a source of inspiration. I think it's actually more helpful to have the oid instead of a human-readable description like old_branch_info->name. For an advanced user/repo admin, seeing the oid makes it very obvious that there's a particular problem with the given object, and this would direct them to hunt down the object locally (without partial clone) or on the remote (with partial clone, as in the original motivation). From there, it's easy to figure out which branch points to the offending object. The branch name might be misleading - the user would presumably start with hunting down the ref, then explore several possibilities before realizing the problem is actually with the _object_. For a novice user, neither the branch name nor the object id is actionable because they probably wouldn't be able to fix the issue anyway. The advantage of the opaque hex string is that by being intimidating and unrecognizable, it indicates to the user that they shouldn't try to debug the issue and so they might give up sooner and ask for help from someone who might be able to fix it.
Hi Glen, On Wed, 2 Mar 2022, Glen Choo wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > but one random-looking hexadecimal string is almost > > indistinguishable from another, and neither would be a very useful > > message from the end user's point of view. I am wondering if we can > > use old_branch_info to formulate something easier to understand for > > the user. update_refs_for_switch() seems to compute old_desc as a > > human readable name by using old_branch_info->name if available > > before falling back to old_branch_info->commit object name, which > > might be a source of inspiration. > > I think it's actually more helpful to have the oid instead of a > human-readable description like old_branch_info->name. The most helpful would be to have both. That way, it would at least be potentially possible to figure out from a ref how to fetch a non-corrupt version from elsewhere. Ciao, Dscho
diff --git a/builtin/checkout.c b/builtin/checkout.c index d9b31bbb6d..c1035304a5 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -738,6 +738,7 @@ struct tree_desc trees[2]; struct tree *tree; struct unpack_trees_options topts; + const struct object_id *old_commit_oid; memset(&topts, 0, sizeof(topts)); topts.head_idx = -1; @@ -765,9 +766,15 @@ &new_branch_info->commit->object.oid : &new_branch_info->oid, NULL); topts.preserve_ignored = !opts->overwrite_ignore; - tree = parse_tree_indirect(old_branch_info->commit ? - &old_branch_info->commit->object.oid : - the_hash_algo->empty_tree); + + old_commit_oid = old_branch_info->commit ? + &old_branch_info->commit->object.oid : + the_hash_algo->empty_tree; + tree = parse_tree_indirect(old_commit_oid); + if (!tree) + die(_("unable to parse commit %s"), + oid_to_hex(old_commit_oid)); + init_tree_desc(&trees[0], tree->buffer, tree->size); parse_tree(new_tree); tree = new_tree; diff --git a/builtin/clone.c b/builtin/clone.c index a572cda503..0aea177660 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -700,6 +700,8 @@ init_checkout_metadata(&opts.meta, head, &oid, NULL); tree = parse_tree_indirect(&oid); + if (!tree) + die(_("unable to parse commit %s"), oid_to_hex(&oid)); parse_tree(tree); init_tree_desc(&t, tree->buffer, tree->size); if (unpack_trees(1, &t, &opts) < 0)
When a tree oid is invalid, parse_tree_indirect() can return NULL. Check for NULL instead of proceeding as though it were a valid pointer and segfaulting. Signed-off-by: Glen Choo <chooglen@google.com> --- At $DAYJOB, this bug was discovered due to some interactions between "git clone --filter=tree:0" and a buggy server that failed to transfer certain commits. In the 'checkout' step of "git clone --filter=tree:0", the repo tries to get the HEAD commit from the server (since it's not present locally), but this fails due to an unrelated bug in the server. Since the commit tree is invalid, parse_tree_indirect() returns NULL, causing parse_tree(NULL) to segfault. I tried to write a test for this segfault, but I couldn't quite figure out how: - Invalid trees are typically caught pretty early, so I suspect that any reproduction scenario would need to replicate the partial clone + buggy server setup. - I couldn't figure out how to replicate the aforementioned buggy setup I'd appreciate any suggestions on how to test this though :) Note that there are many other callsites that don't check for NULLs from parse_tree_indirect(), and some of which are fairly subtle. I wasn't confident in changing those, so I stayed on the conservative side and only changed the ones that I could get to segfault. builtin/checkout.c | 13 ++++++++++--- builtin/clone.c | 2 ++ 2 files changed, 12 insertions(+), 3 deletions(-) base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7