Message ID | 9fbd965e3892307bb5bb78952f017624fcc0b73a.1567720960.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | commit-graph: harden against various corruptions | expand |
On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote: > @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, > if (parse_commit_no_graph(*list)) > die(_("unable to parse commit %s"), > oid_to_hex(&(*list)->object.oid)); > - hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len); > + tree = get_commit_tree_oid(*list); > + if (!tree) > + die(_("unable to get tree for %s"), > + oid_to_hex(&(*list)->object.oid)); > + hashwrite(f, tree->hash, hash_len); Yeah, I think this is a good stop-gap to protect ourselves, until a time when parse_commit() and friends consistently warn us about the breakage. > diff --git a/commit.c b/commit.c > index a98de16e3d..fab22cb740 100644 > --- a/commit.c > +++ b/commit.c > @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r, > > struct object_id *get_commit_tree_oid(const struct commit *commit) > { > - return &get_commit_tree(commit)->object.oid; > + struct tree *tree = get_commit_tree(commit); > + return tree ? &tree->object.oid : NULL; > } This one in theory benefits lots of other callsites, too, since it means we'll actually return NULL instead of nonsense like "8". But grepping around for calls to this function, I found literally zero of them actually bother checking for a NULL result. So there are probably dozens of similar segfaults waiting to happen in other code paths. Discouraging. This is sort-of attributable to my 834876630b (get_commit_tree(): return NULL for broken tree, 2019-04-09). Before then it was a BUG(). However, that state was relatively short-lived. Before 7b8a21dba1 (commit-graph: lazy-load trees for commits, 2018-04-06), we'd have similarly returned NULL (and anyway, BUG() is clearly wrong since it's a data error). None of which argues against your patches, but it's kind of sad that the issue is present in so many code paths. I wonder if we could be handling this in a more central way, but I don't see how short of dying. -Peff
On Fri, Sep 06, 2019 at 02:19:20AM -0400, Jeff King wrote: > On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote: > > > @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, > > if (parse_commit_no_graph(*list)) > > die(_("unable to parse commit %s"), > > oid_to_hex(&(*list)->object.oid)); > > - hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len); > > + tree = get_commit_tree_oid(*list); > > + if (!tree) > > + die(_("unable to get tree for %s"), > > + oid_to_hex(&(*list)->object.oid)); > > + hashwrite(f, tree->hash, hash_len); > > Yeah, I think this is a good stop-gap to protect ourselves, until a time > when parse_commit() and friends consistently warn us about the breakage. > > > diff --git a/commit.c b/commit.c > > index a98de16e3d..fab22cb740 100644 > > --- a/commit.c > > +++ b/commit.c > > @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r, > > > > struct object_id *get_commit_tree_oid(const struct commit *commit) > > { > > - return &get_commit_tree(commit)->object.oid; > > + struct tree *tree = get_commit_tree(commit); > > + return tree ? &tree->object.oid : NULL; > > } You mentioned in the version of this series that is rebased on GitHub's fork that it may be worth putting this hunk in a separate commit entirely. I don't disagree, so if there are other comments that merit a reroll of this, I'm happy to pull this change out as 3/4. > This one in theory benefits lots of other callsites, too, since it means > we'll actually return NULL instead of nonsense like "8". But grepping > around for calls to this function, I found literally zero of them > actually bother checking for a NULL result. So there are probably dozens > of similar segfaults waiting to happen in other code paths. > Discouraging. Discouraging indeed. I think that you suggest it below, but perhaps the right thing to do here is implement 'get_commit_tree_oid()' as follows: struct object_id *get_commit_tree_oid(const struct commit *commit) { struct tree *tree = get_commit_tree(commit); if (!tree) die(_("unable to get tree from commit %s"), oid_to_hex(&commit->object.oid)); return &tree->object.oid; } Which then puts the onus on the *caller* to check their commit pointer to make sure that it has a legit tree in it, unless they're OK with dying. In the commit-graph change that this whole thread is in response to, that's exactly what we want: I don't want to have to check the return value of two function calls myself. I'm perfectly happy to die() in the middle of things if there is an object corruption, but the library code should take care of that for me, and not allow for dozens of checks, each with their own unique 'die()'-ing message. All of that said, I don't know if I think it's worth holding this series up on the above in the meantime. I do think that it (or something like it) is generally worth doing, but I'm not sure that now is the time to do it. > This is sort-of attributable to my 834876630b (get_commit_tree(): return > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However, > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph: > lazy-load trees for commits, 2018-04-06), we'd have similarly returned > NULL (and anyway, BUG() is clearly wrong since it's a data error). Ha, I was wondering why that commit message looked familiar... it turns out that I'm the culprit, too, via the 'Co-authored-by' trailer. Oops. > None of which argues against your patches, but it's kind of sad that the > issue is present in so many code paths. I wonder if we could be handling > this in a more central way, but I don't see how short of dying. > > -Peff Thanks, Taylor
On 9/6/2019 2:19 AM, Jeff King wrote: > On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote: > >> @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, >> if (parse_commit_no_graph(*list)) >> die(_("unable to parse commit %s"), >> oid_to_hex(&(*list)->object.oid)); >> - hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len); >> + tree = get_commit_tree_oid(*list); >> + if (!tree) >> + die(_("unable to get tree for %s"), >> + oid_to_hex(&(*list)->object.oid)); >> + hashwrite(f, tree->hash, hash_len); > > Yeah, I think this is a good stop-gap to protect ourselves, until a time > when parse_commit() and friends consistently warn us about the breakage. > >> diff --git a/commit.c b/commit.c >> index a98de16e3d..fab22cb740 100644 >> --- a/commit.c >> +++ b/commit.c >> @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r, >> >> struct object_id *get_commit_tree_oid(const struct commit *commit) >> { >> - return &get_commit_tree(commit)->object.oid; >> + struct tree *tree = get_commit_tree(commit); >> + return tree ? &tree->object.oid : NULL; >> } > > This one in theory benefits lots of other callsites, too, since it means > we'll actually return NULL instead of nonsense like "8". But grepping > around for calls to this function, I found literally zero of them > actually bother checking for a NULL result. So there are probably dozens > of similar segfaults waiting to happen in other code paths. > Discouraging. > > This is sort-of attributable to my 834876630b (get_commit_tree(): return > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However, > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph: > lazy-load trees for commits, 2018-04-06), we'd have similarly returned > NULL (and anyway, BUG() is clearly wrong since it's a data error). > > None of which argues against your patches, but it's kind of sad that the > issue is present in so many code paths. I wonder if we could be handling > this in a more central way, but I don't see how short of dying. This is due to the mechanical conversion from using commit->tree->oid to get_commit_tree_oid(commit). Those consumers were not checking if the tree pointer was NULL, either, but they probably assumed that the parse_commit() call would have failed earlier. Now that we are using this method (for performance reasons to avoid creating too many 'struct tree's) it makes sense to convert some of them to checking the return value more carefully. If we wanted the more invasive change of modifying every caller to check NULL and respond appropriately, that would be _best_, but is probably not worth the effort. Thanks, -Stolee
Jeff King <peff@peff.net> writes: > This is sort-of attributable to my 834876630b (get_commit_tree(): return > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However, > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph: > lazy-load trees for commits, 2018-04-06), we'd have similarly returned > NULL (and anyway, BUG() is clearly wrong since it's a data error). > > None of which argues against your patches, but it's kind of sad that the > issue is present in so many code paths. I wonder if we could be handling > this in a more central way, but I don't see how short of dying. Well, either we explicitly die in here, or let the caller segfault. Is there even a single caller that is prepared to react to NULL? Answer. There is a single hit inside fsck.c that wants to report an error without killing ourselves in fsck_commit_buffer(). I however doubt its use of get_commit_tree() is correct in the first place. The function is about validating the commit object payload manually, without trusting the result of parse_commit(), and it does read the object name of the tree object; the call to get_commit_tree() used for reporting the error there should probably become has_object() on the tree_oid. So, after fixing the above, we may safely be able to die inside get_commit_tree() instead of returning NULL. By the way, I think get_commit_tree() and parse_commit() in fsck should always use the value obtained from the underlying object and bypass any caches like commit graph---if they pay attention to the caches, they should be fixed. Secondary caches like commit graph should of course be validated against what are recorded in the underlying object, but that should be done separately.
Junio C Hamano <gitster@pobox.com> writes: > ... > Is there even a single caller that is prepared to react to NULL? > > Answer. There is a single hit inside fsck.c that wants to report > an error without killing ourselves in fsck_commit_buffer(). I > however doubt its use of get_commit_tree() is correct in the > first place. The function is about validating the commit object > payload manually, without trusting the result of parse_commit(), > and it does read the object name of the tree object; the call to > get_commit_tree() used for reporting the error there should > probably become has_object() on the tree_oid. At least we need to ensure, not just has_object(), but the object indeed claims to be a tree object. It is OK if it is a corrupt tree object---we'll catch its brokenness separately anyway. Hmm, the should we also tolerate the pointed object to be broken in a way that it is not even able to claim to be a tree object? That would mean that has_object() is sufficient to check here. OK, so... > So, after fixing the above, we may safely be able to die inside > get_commit_tree() instead of returning NULL. > > By the way, I think get_commit_tree() and parse_commit() in fsck > should always use the value obtained from the underlying object and > bypass any caches like commit graph---if they pay attention to the > caches, they should be fixed. Secondary caches like commit graph > should of course be validated against what are recorded in the > underlying object, but that should be done separately.
On Fri, Sep 06, 2019 at 09:57:29AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > This is sort-of attributable to my 834876630b (get_commit_tree(): return > > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However, > > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph: > > lazy-load trees for commits, 2018-04-06), we'd have similarly returned > > NULL (and anyway, BUG() is clearly wrong since it's a data error). > > > > None of which argues against your patches, but it's kind of sad that the > > issue is present in so many code paths. I wonder if we could be handling > > this in a more central way, but I don't see how short of dying. > > Well, either we explicitly die in here, or let the caller segfault. > Is there even a single caller that is prepared to react to NULL? > [...] > So, after fixing the above, we may safely be able to die inside > get_commit_tree() instead of returning NULL. I think the one alternative is catching this more reliably during the parse phase. And then callers have the option of handling the error _then_, without forcing every downstream user of the struct to re-validate it. We _could_ add the die() there to catch any stragglers. But that does make it harder for somebody to try to examine the error situation, or gracefully return an error up the stack. Maybe that use case really doesn't have any value. I dunno. This case did BUG() until recently, and we did run into it in the real world. But the problem wasn't that the operation didn't succeed, but rather the BUG(). I don't know of any code path where the caller doesn't simply die(). > Answer. There is a single hit inside fsck.c that wants to report > an error without killing ourselves in fsck_commit_buffer(). I > however doubt its use of get_commit_tree() is correct in the > first place. The function is about validating the commit object > payload manually, without trusting the result of parse_commit(), > and it does read the object name of the tree object; the call to > get_commit_tree() used for reporting the error there should > probably become has_object() on the tree_oid. I actually think that check should be removed entirely. That function is about checking the syntactic validity of the object itself, not about connectivity (which is handled separately). We already check that we have a valid "tree" pointer earlier in the function. The current get_commit_tree() check is doing essentially nothing. parse_commit() would have parsed the same thing we already checked, and the lookup_tree() call it uses to fill in the pointer is not reliable (it would only fail if we happened to have seen the same oid already as a non-tree in the same process). The history is interesting here. In the early days fsck-cache actually did parse the commit object itself. Then ff5ebe39b0 ([PATCH] Port fsck-cache to use parsing functions, 2005-04-18) converted it to use parse_commit(). Then de2eb7f694 (git-fsck-cache.c: check commit objects more carefully, 2005-07-27) went back to parsing it ourselves, but left the struct checks in place. We also look at commit->parents, but seemingly only to compare them to grafts (and that's weird itself, because grafts aren't a property of the object at all, and it seems like at best this is just verifying that we correctly loaded the grafts). > By the way, I think get_commit_tree() and parse_commit() in fsck > should always use the value obtained from the underlying object and > bypass any caches like commit graph---if they pay attention to the > caches, they should be fixed. Secondary caches like commit graph > should of course be validated against what are recorded in the > underlying object, but that should be done separately. Agreed. Probably fsck should just be disabling the commit graph for the whole process (it looks like there's an env variable for this, but no internal global, which is what fsck would want). -Peff
On Fri, Sep 06, 2019 at 10:11:09AM -0700, Junio C Hamano wrote: > > Is there even a single caller that is prepared to react to NULL? > > > > Answer. There is a single hit inside fsck.c that wants to report > > an error without killing ourselves in fsck_commit_buffer(). I > > however doubt its use of get_commit_tree() is correct in the > > first place. The function is about validating the commit object > > payload manually, without trusting the result of parse_commit(), > > and it does read the object name of the tree object; the call to > > get_commit_tree() used for reporting the error there should > > probably become has_object() on the tree_oid. > > At least we need to ensure, not just has_object(), but the object > indeed claims to be a tree object. It is OK if it is a corrupt > tree object---we'll catch its brokenness separately anyway. > > Hmm, the should we also tolerate the pointed object to be broken > in a way that it is not even able to claim to be a tree object? > That would mean that has_object() is sufficient to check here. > > OK, so... We'd do that check later, during fsck_walk_commit(). Which ironically calls get_commit_tree() without checking for NULL, and would likely segfault. ;) -Peff
On Fri, Sep 06, 2019 at 11:42:14AM -0400, Taylor Blau wrote: > > > struct object_id *get_commit_tree_oid(const struct commit *commit) > > > { > > > - return &get_commit_tree(commit)->object.oid; > > > + struct tree *tree = get_commit_tree(commit); > > > + return tree ? &tree->object.oid : NULL; > > > } > > You mentioned in the version of this series that is rebased on GitHub's > fork that it may be worth putting this hunk in a separate commit > entirely. I don't disagree, so if there are other comments that merit a > reroll of this, I'm happy to pull this change out as 3/4. Yeah, I could go either way on that, I think. I was thinking it might be fixing other callsites, but it seems that nobody else bothers to check for NULL anyway. But being in its own commit, we could explain that. > > This one in theory benefits lots of other callsites, too, since it means > > we'll actually return NULL instead of nonsense like "8". But grepping > > around for calls to this function, I found literally zero of them > > actually bother checking for a NULL result. So there are probably dozens > > of similar segfaults waiting to happen in other code paths. > > Discouraging. > > Discouraging indeed. I think that you suggest it below, but perhaps the > right thing to do here is implement 'get_commit_tree_oid()' as follows: > > struct object_id *get_commit_tree_oid(const struct commit *commit) > { > struct tree *tree = get_commit_tree(commit); > if (!tree) > die(_("unable to get tree from commit %s"), > oid_to_hex(&commit->object.oid)); > return &tree->object.oid; > } > > Which then puts the onus on the *caller* to check their commit pointer > to make sure that it has a legit tree in it, unless they're OK with > dying. Yeah, I agree that would prevent segfaults (and is similar to what René proposed for tags with a similar situation). It does feel like a step backwards in terms of lib-ification. But maybe it's a belt-and-suspenders on top of trying to catch this case at the parsing stage, too. > All of that said, I don't know if I think it's worth holding this series > up on the above in the meantime. I do think that it (or something like > it) is generally worth doing, but I'm not sure that now is the time to > do it. I'd agree with that, and I think it's sensible to take your patches with the extra tree check. We can rip it out later if it becomes redundant. -Peff
On Fri, Sep 06, 2019 at 12:51:56PM -0400, Derrick Stolee wrote: > > This one in theory benefits lots of other callsites, too, since it means > > we'll actually return NULL instead of nonsense like "8". But grepping > > around for calls to this function, I found literally zero of them > > actually bother checking for a NULL result. So there are probably dozens > > of similar segfaults waiting to happen in other code paths. > > Discouraging. > > > > This is sort-of attributable to my 834876630b (get_commit_tree(): return > > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However, > > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph: > > lazy-load trees for commits, 2018-04-06), we'd have similarly returned > > NULL (and anyway, BUG() is clearly wrong since it's a data error). > > > > None of which argues against your patches, but it's kind of sad that the > > issue is present in so many code paths. I wonder if we could be handling > > this in a more central way, but I don't see how short of dying. > > This is due to the mechanical conversion from using commit->tree->oid to > get_commit_tree_oid(commit). Those consumers were not checking if the > tree pointer was NULL, either, but they probably assumed that the > parse_commit() call would have failed earlier. Now that we are using this > method (for performance reasons to avoid creating too many 'struct tree's) > it makes sense to convert some of them to checking the return value more > carefully. Right, none of this is new at all. We have historically been very loose about assuming that things like commit->tree were valid. And they _usually_ are. Even if we're missing the object on disk, lookup_tree() is happy to assign it a struct (unless the object was already seen as another type!). I think turning that case into an error from parse_commit() would cover a lot of cases easily, without forcing each caller to check for NULL. -Peff
Jeff King <peff@peff.net> writes: >> Answer. There is a single hit inside fsck.c that wants to report >> an error without killing ourselves in fsck_commit_buffer(). I >> however doubt its use of get_commit_tree() is correct in the >> first place. The function is about validating the commit object >> payload manually, without trusting the result of parse_commit(), >> and it does read the object name of the tree object; the call to >> get_commit_tree() used for reporting the error there should >> probably become has_object() on the tree_oid. > > I actually think that check should be removed entirely. That function is > about checking the syntactic validity of the object itself, not about > connectivity (which is handled separately). We already check that we > have a valid "tree" pointer earlier in the function. Of course, you're right.
diff --git a/commit-graph.c b/commit-graph.c index 6aa6998ecd..cea1b37493 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -839,6 +839,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, while (list < last) { struct commit_list *parent; + struct object_id *tree; int edge_value; uint32_t packedDate[2]; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, if (parse_commit_no_graph(*list)) die(_("unable to parse commit %s"), oid_to_hex(&(*list)->object.oid)); - hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len); + tree = get_commit_tree_oid(*list); + if (!tree) + die(_("unable to get tree for %s"), + oid_to_hex(&(*list)->object.oid)); + hashwrite(f, tree->hash, hash_len); parent = (*list)->parents; diff --git a/commit.c b/commit.c index a98de16e3d..fab22cb740 100644 --- a/commit.c +++ b/commit.c @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r, struct object_id *get_commit_tree_oid(const struct commit *commit) { - return &get_commit_tree(commit)->object.oid; + struct tree *tree = get_commit_tree(commit); + return tree ? &tree->object.oid : NULL; } void release_commit_memory(struct parsed_object_pool *pool, struct commit *c) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index abde8d4e90..5d2d88b100 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -607,7 +607,7 @@ test_expect_success 'corrupt commit-graph write (broken parent)' ' ) ' -test_expect_failure 'corrupt commit-graph write (missing tree)' ' +test_expect_success 'corrupt commit-graph write (missing tree)' ' rm -rf repo && git init repo && (
Apply similar treatment as in the previous commit to handle an unchecked call to 'get_commit_tree_oid()'. Previously, a NULL return value from this function would be immediately dereferenced with '->hash', and then cause a segfault. Before dereferencing to access the 'hash' member, check the return value of 'get_commit_tree_oid()' to make sure that it is not NULL. To make this check correct, a related change is also needed in 'commit.c', which is to check the return value of 'get_commit_tree' before taking its address. If 'get_commit_tree' returns NULL, we encounter an undefined behavior when taking the address of the return value of 'get_commit_tree' and then taking '->object.oid'. (On my system, this is memory address 0x8, which is obviously wrong). Fix this by making sure that 'get_commit_tree' returns something non-NULL before digging through a structure that is not there, thus preventing a segfault down the line in the commit graph code. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- commit-graph.c | 7 ++++++- commit.c | 3 ++- t/t5318-commit-graph.sh | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-)