Message ID | cover.1567563244.git.me@ttaylorr.com (mailing list archive) |
---|---|
Series | commit-graph.c: handle corrupt commit trees |

On 9/3/2019 10:22 PM, Taylor Blau wrote:
> Hi,
>
> I was running some of the new 'git commit-graph' commands, and noticed
> that I could consistently get 'git commit-graph write --reachable' to
> segfault when a commit's root tree is corrupt.
>
> I have an extremely-unfinished fix attached as an RFC PATCH below, but I
> wanted to get a few thoughts on this before sending it out as a non-RFC.
>
> In my patch, I simply 'die()' when a commit isn't able to be parsed
> (i.e., when 'parse_commit_no_graph' returns a non-zero code), but I
> wanted to see if others thought that this was an OK approach. Some
> thoughts:

I like the idea of completely bailing if the commit can't be parsed too.
Only question: Is there a reason you chose to die() instead of BUG() like
the other two places in that function? What is the criteria of choosing one
over the other?

>
> * It seems like we could write a commit-graph by placing a "filler"
>   entry where the broken commit would have gone. I don't see any place
>   where this is implemented currently, but this seems like a viable
>   alternative to not writing _any_ commits into the commit-graph.

I would rather we didn't do this cause it will probably kick open the can of
always watching for that filler when we are working with the commit-graph.
Or do we already do that today? Maybe @stolee can chime in on what we do in
cases of shallow clones and other potential gaps in the walk

-Garima

Hi Garima,

On Wed, Sep 04, 2019 at 02:25:55PM -0400, Garima Singh wrote:
>
> On 9/3/2019 10:22 PM, Taylor Blau wrote:
> > Hi,
> >
> > I was running some of the new 'git commit-graph' commands, and noticed
> > that I could consistently get 'git commit-graph write --reachable' to
> > segfault when a commit's root tree is corrupt.
> >
> > I have an extremely-unfinished fix attached as an RFC PATCH below, but I
> > wanted to get a few thoughts on this before sending it out as a non-RFC.
> >
> > In my patch, I simply 'die()' when a commit isn't able to be parsed
> > (i.e., when 'parse_commit_no_graph' returns a non-zero code), but I
> > wanted to see if others thought that this was an OK approach. Some
> > thoughts:
>
> I like the idea of completely bailing if the commit can't be parsed too.
> Only question: Is there a reason you chose to die() instead of BUG() like
> the other two places in that function? What is the criteria of choosing one
> over the other?

I did not call 'BUG' here because 'BUG' is traditionally used to
indicate an internal bug, e.g., an unexpected state or some such. On the
other side of that coin, 'BUG' is _not_ used to indicate repository
corruption, since that is not an issue in the Git codebase, rather in
the user's repository.

Though, to be honest, I've never seen that rule written out explicitly
(maybe if it were to be written somewhere, it could be stored in
Documentation/CodingGuidelines?). I think that this is some good
#leftoverbits material.

> >
> > * It seems like we could write a commit-graph by placing a "filler"
> >   entry where the broken commit would have gone. I don't see any place
> >   where this is implemented currently, but this seems like a viable
> >   alternative to not writing _any_ commits into the commit-graph.
>
> I would rather we didn't do this cause it will probably kick open the can of
> always watching for that filler when we are working with the commit-graph.
> Or do we already do that today? Maybe @stolee can chime in on what we do in
> cases of shallow clones and other potential gaps in the walk

Yeah, I think that the consensus is that it makes sense to just die
here, which is fine by me.

> -Garima

Thanks,
Taylor

On Wed, Sep 04, 2019 at 05:21:21PM -0400, Taylor Blau wrote:

> > I like the idea of completely bailing if the commit can't be parsed too.
> > Only question: Is there a reason you chose to die() instead of BUG() like
> > the other two places in that function? What is the criteria of choosing one
> > over the other?
>
> I did not call 'BUG' here because 'BUG' is traditionally used to
> indicate an internal bug, e.g., an unexpected state or some such. On the
> other side of that coin, 'BUG' is _not_ used to indicate repository
> corruption, since that is not an issue in the Git codebase, rather in
> the user's repository.
>
> Though, to be honest, I've never seen that rule written out explicitly
> (maybe if it were to be written somewhere, it could be stored in
> Documentation/CodingGuidelines?). I think that this is some good
> #leftoverbits material.

That rule matches my understanding. A BUG() should be about asserting
invariants or catching should-not-happen cases, etc. Any time a BUG()
triggers, that is truly a bug in Git, no matter what input got thrown at
it, what syscalls failed, etc, and is worth fixing (even if the only
sensible thing is to die()).

As a side note, we've generally treated segfaults the same way. It
doesn't matter if the files on disk or the program input is garbage, we
should say so and abort the operation cleanly.

-Peff

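
The die()-versus-BUG() rule described in this message can be illustrated with a small standalone sketch. Nothing below is Git's actual code: `struct toy_commit`, `classify_commit()` and `report_failure()` are hypothetical names invented for illustration, and the two exit paths only imitate the behaviour of Git's die() (exit with status 128) and BUG() (abort). The assumption is that "commit could not be parsed" stands for repository corruption, while "commit was never queued by the caller" stands for a broken internal invariant.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified commit record; this is not Git's struct commit. */
struct toy_commit {
	const char *oid;
	int parsed_ok; /* 0 if the object on disk could not be parsed */
	int in_list;   /* 1 if the caller queued this commit for writing */
};

enum failure_kind {
	FAIL_NONE,
	FAIL_CORRUPT_REPO, /* bad data in the user's repository: die()-style */
	FAIL_INTERNAL      /* broken invariant in our own code: BUG()-style */
};

/* Decide which kind of failure, if any, a commit represents. */
enum failure_kind classify_commit(const struct toy_commit *c)
{
	if (!c || !c->in_list)
		return FAIL_INTERNAL; /* the caller should never pass this */
	if (!c->parsed_ok)
		return FAIL_CORRUPT_REPO; /* input is garbage, code is fine */
	return FAIL_NONE;
}

/* Report the failure the way the thread suggests each kind be handled. */
void report_failure(enum failure_kind kind, const struct toy_commit *c)
{
	switch (kind) {
	case FAIL_NONE:
		return;
	case FAIL_CORRUPT_REPO:
		assert(c != NULL); /* classify_commit() guarantees this */
		fprintf(stderr, "fatal: unable to parse commit %s\n", c->oid);
		exit(128); /* clean, user-facing exit, like die() */
	case FAIL_INTERNAL:
		fprintf(stderr, "BUG: commit was not queued for writing\n");
		abort(); /* should-not-happen case, like BUG() */
	}
}
```

A caller would run `report_failure(classify_commit(c), c)` for each commit. Either way the process stops instead of segfaulting; the difference is only in who is told to fix the problem, the user (corrupt repository) or the developers (internal bug).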
On 9/4/2019 5:21 PM, Taylor Blau wrote:
> Hi Garima,
>
> On Wed, Sep 04, 2019 at 02:25:55PM -0400, Garima Singh wrote:
>>
>> On 9/3/2019 10:22 PM, Taylor Blau wrote:
>>> Hi,
>>>
>>> I was running some of the new 'git commit-graph' commands, and noticed
>>> that I could consistently get 'git commit-graph write --reachable' to
>>> segfault when a commit's root tree is corrupt.
>>>
>>> I have an extremely-unfinished fix attached as an RFC PATCH below, but I
>>> wanted to get a few thoughts on this before sending it out as a non-RFC.
>>>
>>> In my patch, I simply 'die()' when a commit isn't able to be parsed
>>> (i.e., when 'parse_commit_no_graph' returns a non-zero code), but I
>>> wanted to see if others thought that this was an OK approach. Some
>>> thoughts:
>>
>> I like the idea of completely bailing if the commit can't be parsed too.
>> Only question: Is there a reason you chose to die() instead of BUG() like
>> the other two places in that function? What is the criteria of choosing one
>> over the other?
>
> I did not call 'BUG' here because 'BUG' is traditionally used to
> indicate an internal bug, e.g., an unexpected state or some such. On the
> other side of that coin, 'BUG' is _not_ used to indicate repository
> corruption, since that is not an issue in the Git codebase, rather in
> the user's repository.
>
> Though, to be honest, I've never seen that rule written out explicitly
> (maybe if it were to be written somewhere, it could be stored in
> Documentation/CodingGuidelines?). I think that this is some good
> #leftoverbits material.
>
>>>
>>> * It seems like we could write a commit-graph by placing a "filler"
>>>   entry where the broken commit would have gone. I don't see any place
>>>   where this is implemented currently, but this seems like a viable
>>>   alternative to not writing _any_ commits into the commit-graph.
>>
>> I would rather we didn't do this cause it will probably kick open the can of
>> always watching for that filler when we are working with the commit-graph.
>> Or do we already do that today? Maybe @stolee can chime in on what we do in
>> cases of shallow clones and other potential gaps in the walk
>
> Yeah, I think that the consensus is that it makes sense to just die
> here, which is fine by me.

I agree the die() is the best thing to do for now.

If we wanted to salvage as much as possible, then we could use these
corrupt marks and then use the "reverse walk" in
compute_generation_numbers() to mark all commits that can reach the
corrupt commit as corrupt. We would then need to remove all corrupt
commits from the list we are planning to write. However, that is just
hiding a corrupt object in the object database, which is not a situation
we want to leave unnoticed.

Thanks,
-Stolee
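
The "salvage" alternative described in this message (mark every commit that can reach a corrupt commit, then drop the marked commits from the write list) can be sketched on a toy graph. This is only an illustration under stated assumptions: it does not use Git's data structures or the walk in compute_generation_numbers(), and it swaps in a memoized parent-first recursion rather than a true reverse walk, which yields the same set of marked commits on an acyclic graph. `struct toy_node`, `NO_PARENT`, `visit()` and `filter_writable()` are hypothetical names, and each commit is assumed to have at most two parents.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PARENT (-1)

/* Hypothetical toy commit: up to two parents, given as array indices. */
struct toy_node {
	int parents[2];
	int corrupt; /* 1 if this commit itself failed to parse */
};

enum mark { UNKNOWN = 0, CLEAN, TAINTED };

/*
 * A commit is TAINTED if it is corrupt or if any of its ancestors is.
 * Results are memoized in marks[] so each commit is resolved only once.
 * The graph is assumed to be acyclic, as a commit history is.
 */
static enum mark visit(const struct toy_node *g, enum mark *marks, int i)
{
	if (marks[i] != UNKNOWN)
		return marks[i];

	enum mark m = g[i].corrupt ? TAINTED : CLEAN;
	for (int p = 0; p < 2 && m == CLEAN; p++) {
		int parent = g[i].parents[p];
		if (parent != NO_PARENT && visit(g, marks, parent) == TAINTED)
			m = TAINTED;
	}
	marks[i] = m;
	return m;
}

/*
 * Fill out[] with the indices of commits that cannot reach a corrupt
 * commit and return how many there are. marks[] and out[] must each
 * have room for n entries.
 */
size_t filter_writable(const struct toy_node *g, size_t n,
		       enum mark *marks, int *out)
{
	size_t kept = 0;

	assert(g != NULL && marks != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		marks[i] = UNKNOWN;
	for (size_t i = 0; i < n; i++)
		if (visit(g, marks, (int)i) == CLEAN)
			out[kept++] = (int)i;
	return kept;
}
```

As the message notes, filtering like this would quietly hide a corrupt object in the object database, which is the argument for stopping with die() instead.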