[v2,0/3] batch blob diff generation

Message ID 20250212041825.2455031-1-jltobler@gmail.com (mailing list archive)

Justin Tobler Feb. 12, 2025, 4:18 a.m. UTC
Through git-diff(1) it is possible to generate a diff directly between
two blobs. This is particularly useful when the pre-image and post-image
blobs are known and we only care about the diff between them.
Unfortunately, if a user has a batch of known blob pairs to compute
diffs for, there is currently no way to do so via a single Git
process.

To enable support for batch diffs of multiple blob pairs, this
series introduces a new diff plumbing command git-diff-pairs(1) based on
a previous patch series submitted by Peff[1]. This command uses
NUL-delimited raw diffs as its source of input to control exactly which
filepairs are diffed. The advantage of using the raw diff format is that
each line already embeds the diff status and object context, which makes
generating diffs more efficient because we can avoid peeling revisions
to get some of the same information.

For example:

    git diff-tree -r -z -M $old $new |
    git diff-pairs -p

Here the output of git-diff-tree(1) is fed to git-diff-pairs(1) to
generate the same output that would be expected from `git diff-tree -p
-M`. While by itself not particularly useful, this means it is possible
to split git-diff-tree(1) output across multiple git-diff-pairs(1)
processes. Such a feature is useful on the server side, where generating
diffs for a large set of changes all at once may not be feasible due to
timeout concerns.
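
As a rough illustration of the splitting step, here is a minimal Python
sketch that cuts `git diff-tree -r -z` output into batches of whole
records. It is not part of this series. The helper names
`split_raw_records` and `batch_records` are hypothetical, and the sketch
assumes ordinary two-tree raw output (no combined-diff records), where a
record is a `:`-prefixed metadata field followed by one path, or by two
paths when the status is a rename (R) or copy (C).

```python
from typing import Iterator, List


def split_raw_records(data: bytes) -> Iterator[bytes]:
    """Yield one complete NUL-terminated raw diff record at a time."""
    fields = data.split(b"\0")
    # A well-formed stream ends with NUL, so the last split element is empty.
    if fields and fields[-1] == b"":
        fields.pop()
    i = 0
    while i < len(fields):
        meta = fields[i]
        if not meta.startswith(b":"):
            raise ValueError(f"unexpected field at index {i}: {meta!r}")
        # meta looks like b":100644 100644 <old-oid> <new-oid> M"
        status = meta.split(b" ")[-1]
        # Renames (R) and copies (C) carry two paths; other statuses carry one.
        npaths = 2 if status[:1] in (b"R", b"C") else 1
        if i + 1 + npaths > len(fields):
            raise ValueError("truncated record at end of input")
        yield b"\0".join(fields[i : i + 1 + npaths]) + b"\0"
        i += 1 + npaths


def batch_records(data: bytes, batch_size: int) -> List[bytes]:
    """Group records into byte strings of at most batch_size records each."""
    batches: List[bytes] = []
    current: List[bytes] = []
    for record in split_raw_records(data):
        current.append(record)
        if len(current) == batch_size:
            batches.append(b"".join(current))
            current = []
    if current:
        batches.append(b"".join(current))
    return batches
```

Each returned batch could then be written to the stdin of its own
`git diff-pairs -p` process, assuming a Git build that includes this
series. Keeping records whole matters because a rename or copy spans
three NUL-terminated fields rather than two.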

This series is structured as follows:

    - Patch 1 adds some new helper functions to get access to the queued
      `diff_filepair` after `diff_queue()` is invoked.

    - Patch 2 introduces the new git-diff-pairs(1) plumbing command.

    - Patch 3 teaches git-diff-pairs(1) a way to perform explicit diff
      queue flushes instead of waiting until stdin EOF to flush.

In 1f010d6bdf (doc: use .adoc extension for AsciiDoc files, 2025-01-20),
the extension for documentation was changed from .txt to .adoc. This
series builds on top of that change so as to avoid conflicts in next.

Changes since V1:

    - Changed from git-diff-blob(1) to git-diff-pairs(1) based on a
      previously submitted series.

    - Instead of each line containing a pair of blob revisions, the raw
      diff format is used as input, which already has diff status and
      object context embedded.

-Justin

[1]: <20161201204042.6yslbyrg7l6ghhww@sigill.intra.peff.net>

Justin Tobler (3):
  diff: return diff_filepair from diff queue helpers
  builtin: introduce diff-pairs command
  builtin/diff-pairs: allow explicit diff queue flush

 .gitignore                        |   1 +
 Documentation/git-diff-pairs.adoc |  66 +++++++++++
 Documentation/meson.build         |   1 +
 Makefile                          |   1 +
 builtin.h                         |   1 +
 builtin/diff-pairs.c              | 189 ++++++++++++++++++++++++++++++
 command-list.txt                  |   1 +
 diff.c                            |  66 ++++++++---
 diff.h                            |  15 +++
 git.c                             |   1 +
 meson.build                       |   1 +
 t/meson.build                     |   1 +
 t/t4070-diff-pairs.sh             | 102 ++++++++++++++++
 13 files changed, 427 insertions(+), 19 deletions(-)
 create mode 100644 Documentation/git-diff-pairs.adoc
 create mode 100644 builtin/diff-pairs.c
 create mode 100755 t/t4070-diff-pairs.sh