Message ID | pull.848.git.1611676886.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Refactor chunk-format into an API | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > This is a restart on the topic previously submitted [1] but dropped because > ak/corrected-commit-date was still in progress. This version is based on > that branch. Nice to see that we have an endorsement on ak/corrected-commit-date topic ;-) I've scanned this round of the topic and they were pleasant read. I may have other comments after a more careful reading, but so far, I am happy with what I see here. Thanks.
On Tue, Jan 26, 2021 at 04:01:09PM +0000, Derrick Stolee via GitGitGadget wrote: > This version also changes the approach to use a more dynamic interaction > with a struct chunkfile pointer. This idea is credited to Taylor Blau [2], > but I started again from scratch. I also go further to make struct chunkfile > anonymous to API consumers. It is defined only in chunk-format.c, which > should hopefully deter future users from interacting with that data > directly. > > [2] https://lore.kernel.org/git/X8%2FI%2FRzXZksio+ri@nand.local/ Great; I am very happy that you found my patch to be useful. I'm glad that you decided to start from scratch, too, since as I recall there were some unresolved test issues that I punted on in case you decided to abandon the topic altogether. > This combined API is beneficial to reduce duplicated logic. Or rather, to > ensure that similar file formats have similar protections against bad data. > The multi-pack-index code did not have as many guards as the commit-graph > code did, but now they both share a common base that checks for things like > duplicate chunks or offsets outside the size of the file. Definitely good. > Here are some stats for the end-to-end change: > > * 638 insertions(+), 456 deletions(-). > * commit-graph.c: 171 insertions(+), 192 deletions(-) > * midx.c: 196 insertions(+), 260 deletions(-) > > While there is an overall increase to the code size, the consumers do get a > bit smaller. Boilerplate things like abstracting method to match > chunk_write_fn and chunk_read_fn make up a lot of these insertions. The > "interesting" code gets a lot smaller and cleaner. Like I said in [1], I don't think a net +182 line diff is reason alone not to pursue this topic. I don't think that an chunked index v3 will come as part of my work on the on-disk revindex format, but I do think that it's something brian may be interested in. So, I'm feeling rather certain that we'll eventually have new callers, at which point this will reduce duplication overall. [1]: https://lore.kernel.org/git/X8%2FK1dUgUmwp8ZOv@nand.local/ Thanks, Taylor