diff mbox series

[RFC] diff: introduce scope option

Message ID pull.1398.git.1667189512579.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] diff: introduce scope option | expand

Commit Message

ZheNing Hu Oct. 31, 2022, 4:11 a.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

When we use sparse-checkout, we often want the set of files
that some commands operate on to be restricted to the
sparse-checkout specification.

So introduce the `--scope` option to git diff, which have two
value: "sparse" and "all". "sparse" mean that diff is performed
restrict to paths which matching sparse-checkout specification,
"all" mean that diff is performed regardless of whether the path
meets the sparse-checkout specification. `--no-scope` is the default
option for now.

Add `diff.scope={sparse, all}` config, which can also have the same
capabilities as `--scope`, and it will be covered by `--scope` option.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [RFC] diff: introduce scope option
    
    In [1], we discovered that users working on different sparse-checkout
    specification may download unnecessary blobs from each other's
    specification in collaboration. In [2] Junio suggested that maybe we can
    restrict some git command's filespec in sparse-checkout specification to
    elegantly solve this problem above. In [3]: Newren and Derrick Stolee
    prefer to name the option --scope={sparse, all}.
    
    So this patch is attempt to do this thing on git diff:
    
    v1:
    
     1. add --restrict option to git diff, which restrict diff filespec in
        sparse-checkout specification. [4] v2.
     2. rename --restrict to --scope={sparse, all}, support --no-scope.
     3. add config: diff.scope={sparse,all}.
    
    Unresolved work:
    
     1. how to properly pass this --scope={sparse, all} to other commands
        like git log, git format-patch, etc.
     2. how to set the default value of scope for different diff commands.
    
    [1]:
    https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
    [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
    https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
    [4]:
    https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1398

 Documentation/config/diff.txt         |  12 +
 Documentation/diff-options.txt        |  18 +
 builtin/diff.c                        |   4 +
 diff-lib.c                            |  36 +-
 diff-no-index.c                       |   4 +
 diff.c                                |  39 +++
 diff.h                                |  11 +
 t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
 tree-diff.c                           |   5 +
 9 files changed, 597 insertions(+), 1 deletion(-)
 create mode 100644 t/t4070-diff-sparse-checkout-scope.sh


base-commit: 63bba4fdd86d80ef061c449daa97a981a9be0792

Comments

Taylor Blau Nov. 1, 2022, 1:34 a.m. UTC | #1
On Mon, Oct 31, 2022 at 04:11:52AM +0000, ZheNing Hu via GitGitGadget wrote:
>  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++

It looks like this test is non-executable, leading to the following
error:

    ~/s/git [nand] (zh/diff--scope) $ make test
        SUBDIR templates
    make -C t/ all
    make[1]: Entering directory '/home/ttaylorr/src/git/t'
    rm -f -r 'test-results'
    non-executable tests: t4070-diff-sparse-checkout-scope.sh
    make[1]: *** [Makefile:120: test-lint-executable] Error 1
    make[1]: *** Waiting for unfinished jobs....
    make[1]: Leaving directory '/home/ttaylorr/src/git/t'
    make: *** [Makefile:3083: test] Error 2

Thanks,
Taylor
ZheNing Hu Nov. 1, 2022, 2:13 a.m. UTC | #2
Taylor Blau <me@ttaylorr.com> 于2022年11月1日周二 09:34写道:
>
> On Mon, Oct 31, 2022 at 04:11:52AM +0000, ZheNing Hu via GitGitGadget wrote:
> >  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
>
> It looks like this test is non-executable, leading to the following
> error:
>
>     ~/s/git [nand] (zh/diff--scope) $ make test
>         SUBDIR templates
>     make -C t/ all
>     make[1]: Entering directory '/home/ttaylorr/src/git/t'
>     rm -f -r 'test-results'
>     non-executable tests: t4070-diff-sparse-checkout-scope.sh
>     make[1]: *** [Makefile:120: test-lint-executable] Error 1
>     make[1]: *** Waiting for unfinished jobs....
>     make[1]: Leaving directory '/home/ttaylorr/src/git/t'
>     make: *** [Makefile:3083: test] Error 2
>

Thanks for pointing out that I had overlooked this problem.

> Thanks,
> Taylor

ZheNing Hu
Elijah Newren Nov. 1, 2022, 5:18 a.m. UTC | #3
Hi,

On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> When we use sparse-checkout, we often want the set of files
> that some commands operate on to be restricted to the
> sparse-checkout specification.

It seems a bit premature to send this, when the guideline document[*]
detailing how these options should work is still in the "Needs Review"
state.  I know, it's been waiting for a while, but it's a _long_
document.

[*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/

> So introduce the `--scope` option to git diff, which have two
> value: "sparse" and "all". "sparse" mean that diff is performed
> restrict to paths which matching sparse-checkout specification,
> "all" mean that diff is performed regardless of whether the path
> meets the sparse-checkout specification.

The wording probably needs some care to reflect the fact that it only
affects cases where either --cached or revisions are passed.  In
particular, your wording for "all" suggests behavior very different
from today, whereas "all" is probably best thought of as today's
current behavior.  For example, a plain `git diff` without --cached or
revisions, should be unaffected by either of these flags.

> `--no-scope` is the default
> option for now.

What does --no-scope even mean?  You didn't explain it, and I don't
see how it could make sense.  We explicitly avoided a `--no-` prefix
by allowing the --scope option to take a value.  I don't think there
should be a --no-scope option.

> Add `diff.scope={sparse, all}` config, which can also have the same
> capabilities as `--scope`, and it will be covered by `--scope` option.

This is not what we want.  The high level usecases should not need to
be configured per-command.  There should be a config option which
reflects the high level use cases (e.g. sparse.scope) and then all
relevant commands (diff, log, grep, etc.) can key off it.

> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [RFC] diff: introduce scope option
>
>     In [1], we discovered that users working on different sparse-checkout
>     specification may download unnecessary blobs from each other's
>     specification in collaboration. In [2] Junio suggested that maybe we can
>     restrict some git command's filespec in sparse-checkout specification to
>     elegantly solve this problem above. In [3]: Newren and Derrick Stolee
>     prefer to name the option --scope={sparse, all}.
>
>     So this patch is attempt to do this thing on git diff:
>
>     v1:
>
>      1. add --restrict option to git diff, which restrict diff filespec in
>         sparse-checkout specification. [4] v2.
>      2. rename --restrict to --scope={sparse, all}, support --no-scope.
>      3. add config: diff.scope={sparse,all}.
>
>     Unresolved work:
>
>      1. how to properly pass this --scope={sparse, all} to other commands
>         like git log, git format-patch, etc.

log & grep should accept a similar flag.  format-patch should not, and
should ignore any config in this area.

>      2. how to set the default value of scope for different diff commands.

I don't understand this.

>     [1]:
>     https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
>     [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
>     https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
>     [4]:
>     https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1398
>
>  Documentation/config/diff.txt         |  12 +
>  Documentation/diff-options.txt        |  18 +
>  builtin/diff.c                        |   4 +
>  diff-lib.c                            |  36 +-
>  diff-no-index.c                       |   4 +
>  diff.c                                |  39 +++
>  diff.h                                |  11 +
>  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
>  tree-diff.c                           |   5 +
>  9 files changed, 597 insertions(+), 1 deletion(-)
>  create mode 100644 t/t4070-diff-sparse-checkout-scope.sh
>
> diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
> index 35a7bf86d77..52707e1b2d6 100644
> --- a/Documentation/config/diff.txt
> +++ b/Documentation/config/diff.txt
> @@ -201,6 +201,18 @@ diff.algorithm::
>  --
>  +
>
> +diff.scope::
> +       Choose diff scope. The variants are as follows:
> ++
> +--
> +`sparse`;;
> +       Restrict diff paths to those matching sparse-checkout specification.
> +`all`;;
> +       Without restriction, diff is performed regardless of whether the path
> +       meets the sparse-checkout specification.

As noted above, this is the wrong level to specify things.  The
description for "all" is misleading as well and suggests something
other than "behavior B" from the direction document.

>  diff.wsErrorHighlight::
>         Highlight whitespace errors in the `context`, `old` or `new`
>         lines of the diff.  Multiple values are separated by comma,
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 3674ac48e92..04bf83e8be1 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
>  non-default value and want to use the default one, then you
>  have to use `--diff-algorithm=default` option.
>
> +ifndef::git-format-patch[]
> +ifndef::git-log[]
> +
> +--scope={sparse|all}::
> +       Choose diff scope. The variants are as follows:
> ++
> +--
> +`--sparse`;;
> +       Restrict diff paths to those matching sparse-checkout specification.
> +`--all`;;
> +       Without restriction, diff is performed regardless of whether the path
> +       meets the sparse-checkout specification.
> +--
> ++
> +
> +endif::git-log[]
> +endif::git-format-patch[]

What about diff-files, diff-index, diff-tree, and show?

> +
>  --stat[=<width>[,<name-width>[,<count>]]]::
>         Generate a diffstat. By default, as much space as necessary
>         will be used for the filename part, and the rest for the graph
> diff --git a/builtin/diff.c b/builtin/diff.c
> index 854d2c5a5c4..6b450f7184c 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -54,6 +54,10 @@ static void stuff_change(struct diff_options *opt,
>             oideq(old_oid, new_oid) && (old_mode == new_mode))
>                 return;
>
> +       if (opt->scope == DIFF_SCOPE_SPARSE &&
> +           !diff_paths_in_sparse_checkout(old_path, new_path))
> +               return;

This can't be right.
   git diff c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e
72d42bd856228c15f702fa3c353432f4f1defe03
(to directly diff two known blobs) will go through this function, with
old_path == c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e and new_path ==
72d42bd856228c15f702fa3c353432f4f1defe03.  But those aren't real
paths, and sparse-checkout should not restrict what is shown in those
cases.

> +
>         if (opt->flags.reverse_diff) {
>                 SWAP(old_mode, new_mode);
>                 SWAP(old_oid, new_oid);
> diff --git a/diff-lib.c b/diff-lib.c
> index 2edea41a234..a3381f2e0ff 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -88,6 +88,22 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
>         return changed;
>  }
>
> +int diff_path_in_sparse_checkout(const char *path) {
> +       if (core_sparse_checkout_cone)
> +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> +       else
> +               return path_in_sparse_checkout(path, the_repository->index);
> +}

This says we are including the path if it matches the sparsity
patterns.  Thus, we have to be careful when we use this function,
because the relevant paths are ones that match the sparsity
specification.  The sparsity specification will always match the
sparsity patterns when diffing two commits, but when either the index
or the working tree is part of the diff, the sparsity specification
*might* be temporarily expanded.

> +int diff_paths_in_sparse_checkout(const char *one, const char*two) {
> +       if (one == two || !strcmp(one, two))
> +               return diff_path_in_sparse_checkout(one);
> +       else
> +               return diff_path_in_sparse_checkout(one) &&
> +                      diff_path_in_sparse_checkout(two);

Why && rather than || ?

> +}
> +
> +
>  int run_diff_files(struct rev_info *revs, unsigned int option)
>  {
>         int entries, i;
> @@ -113,6 +129,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>
>                 if (diff_can_quit_early(&revs->diffopt))
>                         break;
> +               if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> +                   !diff_path_in_sparse_checkout(ce->name))
> +                       continue;

Here you've cut off the possibility of showing diffs for anything
outside the sparsity patterns, which is a mistake.  We need to handle
a temporarily expanded sparse specification too.

>                 if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
>                         continue;
> @@ -202,7 +221,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>                                 continue;
>                 }
>
> -               if (ce_uptodate(ce) || ce_skip_worktree(ce))
> +               if (ce_uptodate(ce) ||
> +                   (revs->diffopt.scope != DIFF_SCOPE_ALL && ce_skip_worktree(ce)))
>                         continue;

Here you make --scope=all show files even if they are skip-worktree,
making them appear to have been deleted.  I called out your
description earlier as potentially misleading, because it could imply
this behavior.  It looks like you were consistent with the description
and implementation, it just doesn't match what we want.

>                 /*
> @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,

do_oneway_diff is for cases where we are diffing against the index...

>                         return; /* nothing to diff.. */
>         }
>
> +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> +               if (idx && tree) {
> +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> +                               return;
> +               } else if (idx) {
> +                       if (!diff_path_in_sparse_checkout(idx->name))
> +                               return;
> +               } else if (tree) {
> +                       if (!diff_path_in_sparse_checkout(tree->name))
> +                               return;
> +               } else
> +                       return;
> +       }

...and you again mistakenly only compare to the sparsity patterns
instead of the sparse specification.

> +
>         /* if the entry is not checked out, don't examine work tree */
>         cached = o->index_only ||
>                 (idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));



> diff --git a/diff-no-index.c b/diff-no-index.c
> index 18edbdf4b59..ea94a104ea4 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -281,6 +281,10 @@ int diff_no_index(struct rev_info *revs,
>
>         fixup_paths(paths, &replacement);
>
> +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> +           !diff_paths_in_sparse_checkout(paths[0], paths[1]))
> +               goto out;

--no-index means we're diffing two files that are not tracked, or at
least treating them as not tracked.  sparse-checkout should not affect
such files.

> +
>         revs->diffopt.skip_stat_unmatch = 1;
>         if (!revs->diffopt.output_format)
>                 revs->diffopt.output_format = DIFF_FORMAT_PATCH;
> diff --git a/diff.c b/diff.c
> index 285d6e2d575..9de4044ae05 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -48,6 +48,7 @@ static int diff_interhunk_context_default;
>  static const char *diff_word_regex_cfg;
>  static const char *external_diff_cmd_cfg;
>  static const char *diff_order_file_cfg;
> +static const char *external_diff_scope_cfg;
>  int diff_auto_refresh_index = 1;
>  static int diff_mnemonic_prefix;
>  static int diff_no_prefix;
> @@ -57,6 +58,7 @@ static int diff_dirstat_permille_default = 30;
>  static struct diff_options default_diff_options;
>  static long diff_algorithm;
>  static unsigned ws_error_highlight_default = WSEH_NEW;
> +static enum diff_scope external_diff_scope;

Why is this called "external"?

>  static char diff_colors[][COLOR_MAXLEN] = {
>         GIT_COLOR_RESET,
> @@ -423,6 +425,16 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>                 return 0;
>         }
>
> +       if (!strcmp(var, "diff.scope")) {
> +               git_config_string(&external_diff_scope_cfg, var, value);
> +               if (!strcmp(value, "all"))
> +                       external_diff_scope = DIFF_SCOPE_ALL;
> +               else if (!strcmp(value, "sparse"))
> +                       external_diff_scope = DIFF_SCOPE_SPARSE;
> +               else
> +                       return -1;
> +       }
> +
>         if (git_color_config(var, value, cb) < 0)
>                 return -1;
>
> @@ -4663,6 +4675,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
>
>         options->color_moved = diff_color_moved_default;
>         options->color_moved_ws_handling = diff_color_moved_ws_default;
> +       options->scope = external_diff_scope;
>
>         prep_parse_options(options);
>  }
> @@ -4914,6 +4927,29 @@ static int parse_dirstat_opt(struct diff_options *options, const char *params)
>         return 1;
>  }
>
> +static int diff_opt_diff_scope(const struct option *option,
> +                               const char *optarg, int unset)
> +{
> +       struct diff_options *opt = option->value;
> +
> +       if (unset) {
> +               opt->scope = DIFF_SCOPE_NONE;

I think we should instead have a
    BUG_ON_OPT_NEG(unset)
or, even better, a
    BUG_ON_OPT_NEG_NOARG(unset, optarg)
at the beginning of this function...

> +       } else if (optarg) {

...which would also allow you to drop this if and dedent the rest of
the function.

> +               if (!strcmp(optarg, "all")) {
> +                       if (core_apply_sparse_checkout) {
> +                               opt->scope = DIFF_SCOPE_ALL;
> +                       }
> +               } else if (!strcmp(optarg, "sparse")) {
> +                       if (core_apply_sparse_checkout) {
> +                               opt->scope = DIFF_SCOPE_SPARSE;
> +                       }

If core_apply_sparse_checkout is false, should we perhaps throw an
error instead of just silently ignoring the option the user passed?

> +               } else
> +                       return error(_("invalid --scope value: %s"), optarg);
> +       }

As written with no follow-on else clause here, wouldn't this silently
accept "--scope" without an "=<something>" argument without an error
and without properly initializing opt->scope?

> +
> +       return 0;
> +}
> +
>  static int diff_opt_diff_filter(const struct option *option,
>                                 const char *optarg, int unset)
>  {
> @@ -5683,6 +5719,9 @@ static void prep_parse_options(struct diff_options *options)
>                 OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
>                                N_("select files by diff type"),
>                                PARSE_OPT_NONEG, diff_opt_diff_filter),
> +               OPT_CALLBACK_F(0, "scope", options, N_("[sparse|all]"),
> +                              N_("choose diff scope"),

maybe "choose diff scope in sparse checkouts"?

> +                              PARSE_OPT_OPTARG, diff_opt_diff_scope),
>                 { OPTION_CALLBACK, 0, "output", options, N_("<file>"),
>                   N_("output to a specific file"),
>                   PARSE_OPT_NONEG, NULL, 0, diff_opt_output },
> diff --git a/diff.h b/diff.h
> index 8ae18e5ab1e..90f7512034c 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -230,6 +230,12 @@ enum diff_submodule_format {
>         DIFF_SUBMODULE_INLINE_DIFF
>  };
>
> +enum diff_scope {
> +       DIFF_SCOPE_NONE = 0,
> +       DIFF_SCOPE_ALL,
> +       DIFF_SCOPE_SPARSE,
> +};
> +
>  /**
>   * the set of options the calling program wants to affect the operation of
>   * diffcore library with.
> @@ -285,6 +291,9 @@ struct diff_options {
>         /* diff-filter bits */
>         unsigned int filter, filter_not;
>
> +       /* diff sparse-checkout scope */
> +       enum diff_scope scope;
> +
>         int use_color;
>
>         /* Number of context lines to generate in patch output. */
> @@ -696,4 +705,6 @@ void print_stat_summary(FILE *fp, int files,
>                         int insertions, int deletions);
>  void setup_diff_pager(struct diff_options *);
>
> +int diff_path_in_sparse_checkout(const char *path);
> +int diff_paths_in_sparse_checkout(const char *one, const char *two);
>  #endif /* DIFF_H */
> diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> new file mode 100644

This needs to be fixed.

> index 00000000000..dca75a3308b
> --- /dev/null
> +++ b/t/t4070-diff-sparse-checkout-scope.sh
> @@ -0,0 +1,469 @@
> +#!/bin/sh
> +
> +test_description='diff sparse-checkout scope'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +
> +test_expect_success 'setup' '
> +       git init temp &&
> +       (
> +               cd temp &&
> +               mkdir sub1 &&
> +               mkdir sub2 &&
> +               echo sub1/file1 >sub1/file1 &&
> +               echo sub2/file2 >sub2/file2 &&
> +               echo file1 >file1 &&
> +               echo file2 >file2 &&
> +               git add --all &&
> +               git commit -m init &&
> +               echo sub1/file1 >>sub1/file1 &&
> +               echo sub1/file2 >>sub1/file2 &&
> +               echo sub2/file1 >>sub2/file1 &&
> +               echo sub2/file2 >>sub2/file2 &&
> +               echo file1 >>file1 &&
> +               echo file2 >>file2 &&
> +               git add --all &&
> +               git commit -m change1 &&
> +               echo sub1/file1 >>sub1/file1 &&
> +               echo sub1/file2 >>sub1/file2 &&
> +               echo sub2/file1 >>sub2/file1 &&
> +               echo sub2/file2 >>sub2/file2 &&
> +               echo file1 >>file1 &&
> +               echo file2 >>file2 &&
> +               git add --all &&
> +               git commit -m change2
> +       )
> +'
> +
> +reset_repo () {
> +       rm -rf repo &&
> +       git clone --no-checkout temp repo

Why --no-checkout rather than say --sparse?

> +}
> +
> +reset_with_sparse_checkout() {
> +       reset_repo &&
> +       git -C repo sparse-checkout set $1 sub1 &&
> +       git -C repo checkout

Fixing the above would let us get rid of this really weird extra
checkout command too.

> +}
> +
> +change_worktree_and_index() {
> +       (
> +               cd repo &&
> +               mkdir sub2 sub3 &&
> +               echo sub1/file3 >sub1/file3 &&
> +               echo sub2/file3 >sub2/file3 &&
> +               echo sub3/file3 >sub3/file3 &&
> +               echo file3 >file3 &&
> +               git add --all --sparse &&
> +               echo sub1/file3 >>sub1/file3 &&
> +               echo sub2/file3 >>sub2/file3 &&
> +               echo sub3/file3 >>sub3/file3 &&
> +               echo file3 >>file3
> +       )
> +}

It would be nice to modify different paths in the working tree and
index, to see if we can handle cases where the sparse specification
does not match the sparsity patterns better.  (So, modify files inside
and outside the sparsity patterns, stage those changes, and then do a
`git sparse-checkout reapply` to make the files outside the sparsity
patterns disappear from the working tree...then modify different files
in the working tree both inside and outside the sparsity patterns.
And also remove some file that matches the sparsity patterns and
manually mark it as SKIP_WORKTREE.)  That'd give us much better
coverage.

> +
> +diff_scope() {
> +       title=$1
> +       need_change_worktree_and_index=$2
> +       sparse_checkout_option=$3
> +       scope_option=$4
> +       expect=$5
> +       shift 5
> +       args=("$@")
> +
> +       test_expect_success "$title $sparse_checkout_option $scope_option" "
> +               reset_with_sparse_checkout $sparse_checkout_option &&
> +               if test \"$need_change_worktree_and_index\" = \"true\" ; then
> +                       change_worktree_and_index
> +               fi &&
> +               git -C repo diff $scope_option ${args[*]} >actual &&
> +               if test -z \"$expect\" ; then
> +                       >expect
> +               else
> +                       cat > expect <<-EOF
> +$expect
> +                       EOF
> +               fi &&
> +               test_cmp expect actual
> +       "
> +}
> +
> +args=("--name-only" "HEAD" "HEAD~")
> +diff_scope builtin_diff_tree false "--no-cone" "--scope=sparse" \
> +"sub1/file1
> +sub1/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--no-cone" "--scope=all" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--no-cone" "--no-scope" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--cone" "--scope=sparse" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--cone" "--scope=all" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--cone" "--no-scope" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +args=("--name-only" "HEAD~")
> +diff_scope builtin_diff_index true "--no-cone" "--scope=sparse" \
> +"sub1/file1
> +sub1/file2
> +sub1/file3" "${args[@]}"

Here's a good example where the testcase is doing the wrong thing.
The expected answer here would also include file3, sub2/file3, and
sub3/file3.

> +
> +diff_scope builtin_diff_index true "--no-cone" "--scope=all" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +diff_scope builtin_diff_index true "--no-cone" "--no-scope" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +diff_scope builtin_diff_index true "--cone" "--scope=sparse" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3" "${args[@]}"

This is also wrong; it's missing sub2/file3 and sub3/file3.

> +
> +diff_scope builtin_diff_index true "--cone" "--scope=all" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +diff_scope builtin_diff_index true "--cone" "--no-scope" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +args=("--name-only" "file3" "sub1/" "sub2/")
> +
> +diff_scope builtin_diff_files true "--no-cone" "--scope=sparse" \
> +"sub1/file3" "${args[@]}"

This should also include file3, sub2/file3, and sub3/file3.
`--scope=` should not affect diff output at all if neither --cached
nor revision arguments are supplied.

> +
> +diff_scope builtin_diff_files true "--no-cone" "--scope=all" \
> +"file3
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3" "${args[@]}"

This is wrong due to including too much; it should not include
sub2/file1 or sub2/file2 (it is only including those because it is
showing them as deleted, when they are not deleted but are
SKIP_WORKTREE).

I think I'm going to stop reviewing here.  I'm probably just going to
keep repeating the same issues I identified earlier if I continue.
ZheNing Hu Nov. 6, 2022, 2:11 a.m. UTC | #4
inHi,

Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
>
> Hi,
>
> On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use sparse-checkout, we often want the set of files
> > that some commands operate on to be restricted to the
> > sparse-checkout specification.
>
> It seems a bit premature to send this, when the guideline document[*]
> detailing how these options should work is still in the "Needs Review"
> state.  I know, it's been waiting for a while, but it's a _long_
> document.
>
> [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
>

Fine, I just want to start trying to experiment with this feature in
git-diff earlier,
and I can wait for the sparse-checkout.txt documentation to finish
first if needed :)

> > So introduce the `--scope` option to git diff, which have two
> > value: "sparse" and "all". "sparse" mean that diff is performed
> > restrict to paths which matching sparse-checkout specification,
> > "all" mean that diff is performed regardless of whether the path
> > meets the sparse-checkout specification.
>
> The wording probably needs some care to reflect the fact that it only
> affects cases where either --cached or revisions are passed.  In
> particular, your wording for "all" suggests behavior very different
> from today, whereas "all" is probably best thought of as today's
> current behavior.  For example, a plain `git diff` without --cached or
> revisions, should be unaffected by either of these flags.
>

Yes, after re-reading your sparse-checkout.txt patch, I realized that I
misinterpreted "--scope=sparse" as sparse patterns instead of sparse
specification, and misinterpreted "-scope=all" as diff on all files.

> > `--no-scope` is the default
> > option for now.
>
> What does --no-scope even mean?  You didn't explain it, and I don't
> see how it could make sense.  We explicitly avoided a `--no-` prefix
> by allowing the --scope option to take a value.  I don't think there
> should be a --no-scope option.
>

I think the “--no-scope” here does nothing, as if it were unaffected by scope
(just like correctly "--scope=all", right?)

> > Add `diff.scope={sparse, all}` config, which can also have the same
> > capabilities as `--scope`, and it will be covered by `--scope` option.
>
> This is not what we want.  The high level usecases should not need to
> be configured per-command.  There should be a config option which
> reflects the high level use cases (e.g. sparse.scope) and then all
> relevant commands (diff, log, grep, etc.) can key off it.
>

Ok, using a global config should indeed be more useful.

> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [RFC] diff: introduce scope option
> >
> >     In [1], we discovered that users working on different sparse-checkout
> >     specification may download unnecessary blobs from each other's
> >     specification in collaboration. In [2] Junio suggested that maybe we can
> >     restrict some git command's filespec in sparse-checkout specification to
> >     elegantly solve this problem above. In [3]: Newren and Derrick Stolee
> >     prefer to name the option --scope={sparse, all}.
> >
> >     So this patch is attempt to do this thing on git diff:
> >
> >     v1:
> >
> >      1. add --restrict option to git diff, which restrict diff filespec in
> >         sparse-checkout specification. [4] v2.
> >      2. rename --restrict to --scope={sparse, all}, support --no-scope.
> >      3. add config: diff.scope={sparse,all}.
> >
> >     Unresolved work:
> >
> >      1. how to properly pass this --scope={sparse, all} to other commands
> >         like git log, git format-patch, etc.
>
> log & grep should accept a similar flag.  format-patch should not, and
> should ignore any config in this area.
>
> >      2. how to set the default value of scope for different diff commands.
>
> I don't understand this.
>

Oh, I was just curious if the config defaults for scope needed to be configured
separately for the different diff commands  git diff-files, git diff-index,
git diff-no-index, git diff-tree, since sparse-checkout.txt mentions
the different
behavior of scope for them. Now I think this just needs to be handled in command
code logic and no additional command level configuration is needed.

> >     [1]:
> >     https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
> >     [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
> >     https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
> >     [4]:
> >     https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/1398
> >
> >  Documentation/config/diff.txt         |  12 +
> >  Documentation/diff-options.txt        |  18 +
> >  builtin/diff.c                        |   4 +
> >  diff-lib.c                            |  36 +-
> >  diff-no-index.c                       |   4 +
> >  diff.c                                |  39 +++
> >  diff.h                                |  11 +
> >  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
> >  tree-diff.c                           |   5 +
> >  9 files changed, 597 insertions(+), 1 deletion(-)
> >  create mode 100644 t/t4070-diff-sparse-checkout-scope.sh
> >
> > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
> > index 35a7bf86d77..52707e1b2d6 100644
> > --- a/Documentation/config/diff.txt
> > +++ b/Documentation/config/diff.txt
> > @@ -201,6 +201,18 @@ diff.algorithm::
> >  --
> >  +
> >
> > +diff.scope::
> > +       Choose diff scope. The variants are as follows:
> > ++
> > +--
> > +`sparse`;;
> > +       Restrict diff paths to those matching sparse-checkout specification.
> > +`all`;;
> > +       Without restriction, diff is performed regardless of whether the path
> > +       meets the sparse-checkout specification.
>
> As noted above, this is the wrong level to specify things.  The
> description for "all" is misleading as well and suggests something
> other than "behavior B" from the direction document.
>

So do we need to make "--scope=all" correspond to the "behavior B",
The correct description of it should be: "worktree-sparse-history-dense"...

> >  diff.wsErrorHighlight::
> >         Highlight whitespace errors in the `context`, `old` or `new`
> >         lines of the diff.  Multiple values are separated by comma,
> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > index 3674ac48e92..04bf83e8be1 100644
> > --- a/Documentation/diff-options.txt
> > +++ b/Documentation/diff-options.txt
> > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> >  non-default value and want to use the default one, then you
> >  have to use `--diff-algorithm=default` option.
> >
> > +ifndef::git-format-patch[]
> > +ifndef::git-log[]
> > +
> > +--scope={sparse|all}::
> > +       Choose diff scope. The variants are as follows:
> > ++
> > +--
> > +`--sparse`;;
> > +       Restrict diff paths to those matching sparse-checkout specification.
> > +`--all`;;
> > +       Without restriction, diff is performed regardless of whether the path
> > +       meets the sparse-checkout specification.
> > +--
> > ++
> > +
> > +endif::git-log[]
> > +endif::git-format-patch[]
>
> What about diff-files, diff-index, diff-tree, and show?
>

diff-options.txt included in their documents, and git-format-patch.txt,
git-log.txt, but should not in git-format-patch.txt. I don't know if it
should be included in git-diff-files.txt, because git diff-files compare
the files in the working tree and the index (so it's the same as
"simple" git-diff, which should not be affected by scope?)

> > +
> >  --stat[=<width>[,<name-width>[,<count>]]]::
> >         Generate a diffstat. By default, as much space as necessary
> >         will be used for the filename part, and the rest for the graph
> > diff --git a/builtin/diff.c b/builtin/diff.c
> > index 854d2c5a5c4..6b450f7184c 100644
> > --- a/builtin/diff.c
> > +++ b/builtin/diff.c
> > @@ -54,6 +54,10 @@ static void stuff_change(struct diff_options *opt,
> >             oideq(old_oid, new_oid) && (old_mode == new_mode))
> >                 return;
> >
> > +       if (opt->scope == DIFF_SCOPE_SPARSE &&
> > +           !diff_paths_in_sparse_checkout(old_path, new_path))
> > +               return;
>
> This can't be right.
>    git diff c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e
> 72d42bd856228c15f702fa3c353432f4f1defe03
> (to directly diff two known blobs) will go through this function, with
> old_path == c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e and new_path ==
> 72d42bd856228c15f702fa3c353432f4f1defe03.  But those aren't real
> paths, and sparse-checkout should not restrict what is shown in those
> cases.
>

Yeah, it's buggy that I forget to check two blobs without paths.

> > +
> >         if (opt->flags.reverse_diff) {
> >                 SWAP(old_mode, new_mode);
> >                 SWAP(old_oid, new_oid);
> > diff --git a/diff-lib.c b/diff-lib.c
> > index 2edea41a234..a3381f2e0ff 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
> > @@ -88,6 +88,22 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
> >         return changed;
> >  }
> >
> > +int diff_path_in_sparse_checkout(const char *path) {
> > +       if (core_sparse_checkout_cone)
> > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > +       else
> > +               return path_in_sparse_checkout(path, the_repository->index);
> > +}
>
> This says we are including the path if it matches the sparsity
> patterns.  Thus, we have to be careful when we use this function,
> because the relevant paths are ones that match the sparsity
> specification.  The sparsity specification will always match the
> sparsity patterns when diffing two commits, but when either the index
> or the working tree is part of the diff, the sparsity specification
> *might* be temporarily expanded.
>

Yes, I may have to look at more code to better understand how and when the
"sparsity specification" is extended. Any recommendations for places to read?

> > +int diff_paths_in_sparse_checkout(const char *one, const char*two) {
> > +       if (one == two || !strcmp(one, two))
> > +               return diff_path_in_sparse_checkout(one);
> > +       else
> > +               return diff_path_in_sparse_checkout(one) &&
> > +                      diff_path_in_sparse_checkout(two);
>
> Why && rather than || ?
>

Agree, we do need to use || here, because the one diff side in the sparse
specification, we should diff the two files.

> > +}
> > +
> > +
> >  int run_diff_files(struct rev_info *revs, unsigned int option)
> >  {
> >         int entries, i;
> > @@ -113,6 +129,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >
> >                 if (diff_can_quit_early(&revs->diffopt))
> >                         break;
> > +               if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> > +                   !diff_path_in_sparse_checkout(ce->name))
> > +                       continue;
>
> Here you've cut off the possibility of showing diffs for anything
> outside the sparsity patterns, which is a mistake.  We need to handle
> a temporarily expanded sparse specification too.
>

Agree.

> >                 if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
> >                         continue;
> > @@ -202,7 +221,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >                                 continue;
> >                 }
> >
> > -               if (ce_uptodate(ce) || ce_skip_worktree(ce))
> > +               if (ce_uptodate(ce) ||
> > +                   (revs->diffopt.scope != DIFF_SCOPE_ALL && ce_skip_worktree(ce)))
> >                         continue;
>
> Here you make --scope=all show files even if they are skip-worktree,
> making them appear to have been deleted.  I called out your
> description earlier as potentially misleading, because it could imply
> this behavior.  It looks like you were consistent with the description
> and implementation, it just doesn't match what we want.
>

Agree too. So we should not do anything in run_diff_files.

> >                 /*
> > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
>
> do_oneway_diff is for cases where we are diffing against the index...
>
> >                         return; /* nothing to diff.. */
> >         }
> >
> > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > +               if (idx && tree) {
> > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > +                               return;
> > +               } else if (idx) {
> > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > +                               return;
> > +               } else if (tree) {
> > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > +                               return;
> > +               } else
> > +                       return;
> > +       }
>
> ...and you again mistakenly only compare to the sparsity patterns
> instead of the sparse specification.
>

So here we should use ce_skip_worktree(idx) to check if this index entry matches
sparse specification.

> > +
> >         /* if the entry is not checked out, don't examine work tree */
> >         cached = o->index_only ||
> >                 (idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
>
>
>
> > diff --git a/diff-no-index.c b/diff-no-index.c
> > index 18edbdf4b59..ea94a104ea4 100644
> > --- a/diff-no-index.c
> > +++ b/diff-no-index.c
> > @@ -281,6 +281,10 @@ int diff_no_index(struct rev_info *revs,
> >
> >         fixup_paths(paths, &replacement);
> >
> > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> > +           !diff_paths_in_sparse_checkout(paths[0], paths[1]))
> > +               goto out;
>
> --no-index means we're diffing two files that are not tracked, or at
> least treating them as not tracked.  sparse-checkout should not affect
> such files.
>

Yeah, we should care about untracked files sparse-checkout only when
we are using git add/rm/update-index...

> > +
> >         revs->diffopt.skip_stat_unmatch = 1;
> >         if (!revs->diffopt.output_format)
> >                 revs->diffopt.output_format = DIFF_FORMAT_PATCH;
> > diff --git a/diff.c b/diff.c
> > index 285d6e2d575..9de4044ae05 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -48,6 +48,7 @@ static int diff_interhunk_context_default;
> >  static const char *diff_word_regex_cfg;
> >  static const char *external_diff_cmd_cfg;
> >  static const char *diff_order_file_cfg;
> > +static const char *external_diff_scope_cfg;
> >  int diff_auto_refresh_index = 1;
> >  static int diff_mnemonic_prefix;
> >  static int diff_no_prefix;
> > @@ -57,6 +58,7 @@ static int diff_dirstat_permille_default = 30;
> >  static struct diff_options default_diff_options;
> >  static long diff_algorithm;
> >  static unsigned ws_error_highlight_default = WSEH_NEW;
> > +static enum diff_scope external_diff_scope;
>
> Why is this called "external"?
>

Learn from external_diff_cmd_cfg, I should remove it.

> >  static char diff_colors[][COLOR_MAXLEN] = {
> >         GIT_COLOR_RESET,
> > @@ -423,6 +425,16 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
> >                 return 0;
> >         }
> >
> > +       if (!strcmp(var, "diff.scope")) {
> > +               git_config_string(&external_diff_scope_cfg, var, value);
> > +               if (!strcmp(value, "all"))
> > +                       external_diff_scope = DIFF_SCOPE_ALL;
> > +               else if (!strcmp(value, "sparse"))
> > +                       external_diff_scope = DIFF_SCOPE_SPARSE;
> > +               else
> > +                       return -1;
> > +       }
> > +
> >         if (git_color_config(var, value, cb) < 0)
> >                 return -1;
> >
> > @@ -4663,6 +4675,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
> >
> >         options->color_moved = diff_color_moved_default;
> >         options->color_moved_ws_handling = diff_color_moved_ws_default;
> > +       options->scope = external_diff_scope;
> >
> >         prep_parse_options(options);
> >  }
> > @@ -4914,6 +4927,29 @@ static int parse_dirstat_opt(struct diff_options *options, const char *params)
> >         return 1;
> >  }
> >
> > +static int diff_opt_diff_scope(const struct option *option,
> > +                               const char *optarg, int unset)
> > +{
> > +       struct diff_options *opt = option->value;
> > +
> > +       if (unset) {
> > +               opt->scope = DIFF_SCOPE_NONE;
>
> I think we should instead have a
>     BUG_ON_OPT_NEG(unset)
> or, even better, a
>     BUG_ON_OPT_NEG_NOARG(unset, optarg)
> at the beginning of this function...
>
> > +       } else if (optarg) {
>
> ...which would also allow you to drop this if and dedent the rest of
> the function.
>

Agree.

> > +               if (!strcmp(optarg, "all")) {
> > +                       if (core_apply_sparse_checkout) {
> > +                               opt->scope = DIFF_SCOPE_ALL;
> > +                       }
> > +               } else if (!strcmp(optarg, "sparse")) {
> > +                       if (core_apply_sparse_checkout) {
> > +                               opt->scope = DIFF_SCOPE_SPARSE;
> > +                       }
>
> If core_apply_sparse_checkout is false, should we perhaps throw an
> error instead of just silently ignoring the option the user passed?
>

Agree.

> > +               } else
> > +                       return error(_("invalid --scope value: %s"), optarg);
> > +       }
>
> As written with no follow-on else clause here, wouldn't this silently
> accept "--scope" without an "=<something>" argument without an error
> and without properly initializing opt->scope?
>

Because opt will be initializing with default_diff_options in repo_diff_setup(),
and opt->scope should respect config value first. So I don't think there's a
mistake here.

> > +
> > +       return 0;
> > +}
> > +
> >  static int diff_opt_diff_filter(const struct option *option,
> >                                 const char *optarg, int unset)
> >  {
> > @@ -5683,6 +5719,9 @@ static void prep_parse_options(struct diff_options *options)
> >                 OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
> >                                N_("select files by diff type"),
> >                                PARSE_OPT_NONEG, diff_opt_diff_filter),
> > +               OPT_CALLBACK_F(0, "scope", options, N_("[sparse|all]"),
> > +                              N_("choose diff scope"),
>
> maybe "choose diff scope in sparse checkouts"?
>

OK.

> > +                              PARSE_OPT_OPTARG, diff_opt_diff_scope),
> >                 { OPTION_CALLBACK, 0, "output", options, N_("<file>"),
> >                   N_("output to a specific file"),
> >                   PARSE_OPT_NONEG, NULL, 0, diff_opt_output },
> > diff --git a/diff.h b/diff.h
> > index 8ae18e5ab1e..90f7512034c 100644
> > --- a/diff.h
> > +++ b/diff.h
> > @@ -230,6 +230,12 @@ enum diff_submodule_format {
> >         DIFF_SUBMODULE_INLINE_DIFF
> >  };
> >
> > +enum diff_scope {
> > +       DIFF_SCOPE_NONE = 0,
> > +       DIFF_SCOPE_ALL,
> > +       DIFF_SCOPE_SPARSE,
> > +};
> > +
> >  /**
> >   * the set of options the calling program wants to affect the operation of
> >   * diffcore library with.
> > @@ -285,6 +291,9 @@ struct diff_options {
> >         /* diff-filter bits */
> >         unsigned int filter, filter_not;
> >
> > +       /* diff sparse-checkout scope */
> > +       enum diff_scope scope;
> > +
> >         int use_color;
> >
> >         /* Number of context lines to generate in patch output. */
> > @@ -696,4 +705,6 @@ void print_stat_summary(FILE *fp, int files,
> >                         int insertions, int deletions);
> >  void setup_diff_pager(struct diff_options *);
> >
> > +int diff_path_in_sparse_checkout(const char *path);
> > +int diff_paths_in_sparse_checkout(const char *one, const char *two);
> >  #endif /* DIFF_H */
> > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > new file mode 100644
>
> This needs to be fixed.
>
> > index 00000000000..dca75a3308b
> > --- /dev/null
> > +++ b/t/t4070-diff-sparse-checkout-scope.sh
> > @@ -0,0 +1,469 @@
> > +#!/bin/sh
> > +
> > +test_description='diff sparse-checkout scope'
> > +
> > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> > +
> > +. ./test-lib.sh
> > +
> > +
> > +test_expect_success 'setup' '
> > +       git init temp &&
> > +       (
> > +               cd temp &&
> > +               mkdir sub1 &&
> > +               mkdir sub2 &&
> > +               echo sub1/file1 >sub1/file1 &&
> > +               echo sub2/file2 >sub2/file2 &&
> > +               echo file1 >file1 &&
> > +               echo file2 >file2 &&
> > +               git add --all &&
> > +               git commit -m init &&
> > +               echo sub1/file1 >>sub1/file1 &&
> > +               echo sub1/file2 >>sub1/file2 &&
> > +               echo sub2/file1 >>sub2/file1 &&
> > +               echo sub2/file2 >>sub2/file2 &&
> > +               echo file1 >>file1 &&
> > +               echo file2 >>file2 &&
> > +               git add --all &&
> > +               git commit -m change1 &&
> > +               echo sub1/file1 >>sub1/file1 &&
> > +               echo sub1/file2 >>sub1/file2 &&
> > +               echo sub2/file1 >>sub2/file1 &&
> > +               echo sub2/file2 >>sub2/file2 &&
> > +               echo file1 >>file1 &&
> > +               echo file2 >>file2 &&
> > +               git add --all &&
> > +               git commit -m change2
> > +       )
> > +'
> > +
> > +reset_repo () {
> > +       rm -rf repo &&
> > +       git clone --no-checkout temp repo
>
> Why --no-checkout rather than say --sparse?
>

This is because I accidentally associated it with a
partial clone. I often use "git clone -filter=blob:none -no-checkout"
to do a partial clone, then "git sparse- checkout set <pattern>"
to set sparse-checkout patterns, and "git checkout" to download
the missing blobs and checkout to a branch. But in this
test file, we only need sparse-checkout, so it's true that I should
not do such strange no-checkout thing.

> > +}
> > +
> > +reset_with_sparse_checkout() {
> > +       reset_repo &&
> > +       git -C repo sparse-checkout set $1 sub1 &&
> > +       git -C repo checkout
>
> Fixing the above would let us get rid of this really weird extra
> checkout command too.
>
> > +}
> > +
> > +change_worktree_and_index() {
> > +       (
> > +               cd repo &&
> > +               mkdir sub2 sub3 &&
> > +               echo sub1/file3 >sub1/file3 &&
> > +               echo sub2/file3 >sub2/file3 &&
> > +               echo sub3/file3 >sub3/file3 &&
> > +               echo file3 >file3 &&
> > +               git add --all --sparse &&
> > +               echo sub1/file3 >>sub1/file3 &&
> > +               echo sub2/file3 >>sub2/file3 &&
> > +               echo sub3/file3 >>sub3/file3 &&
> > +               echo file3 >>file3
> > +       )
> > +}
>
> It would be nice to modify different paths in the working tree and
> index, to see if we can handle cases where the sparse specification
> does not match the sparsity patterns better.  (So, modify files inside
> and outside the sparsity patterns, stage those changes, and then do a
> `git sparse-checkout reapply` to make the files outside the sparsity
> patterns disappear from the working tree...then modify different files
> in the working tree both inside and outside the sparsity patterns.
> And also remove some file that matches the sparsity patterns and
> manually mark it as SKIP_WORKTREE.)  That'd give us much better
> coverage.
>

Nice addition. So I should use git update-index --skip-worktree to set
skip_worktree bit for index entries.

> > +
> > +diff_scope() {
> > +       title=$1
> > +       need_change_worktree_and_index=$2
> > +       sparse_checkout_option=$3
> > +       scope_option=$4
> > +       expect=$5
> > +       shift 5
> > +       args=("$@")
> > +
> > +       test_expect_success "$title $sparse_checkout_option $scope_option" "
> > +               reset_with_sparse_checkout $sparse_checkout_option &&
> > +               if test \"$need_change_worktree_and_index\" = \"true\" ; then
> > +                       change_worktree_and_index
> > +               fi &&
> > +               git -C repo diff $scope_option ${args[*]} >actual &&
> > +               if test -z \"$expect\" ; then
> > +                       >expect
> > +               else
> > +                       cat > expect <<-EOF
> > +$expect
> > +                       EOF
> > +               fi &&
> > +               test_cmp expect actual
> > +       "
> > +}
> > +
> > +args=("--name-only" "HEAD" "HEAD~")
> > +diff_scope builtin_diff_tree false "--no-cone" "--scope=sparse" \
> > +"sub1/file1
> > +sub1/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--no-cone" "--scope=all" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--no-cone" "--no-scope" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--cone" "--scope=sparse" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--cone" "--scope=all" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--cone" "--no-scope" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +args=("--name-only" "HEAD~")
> > +diff_scope builtin_diff_index true "--no-cone" "--scope=sparse" \
> > +"sub1/file1
> > +sub1/file2
> > +sub1/file3" "${args[@]}"
>
> Here's a good example where the testcase is doing the wrong thing.
> The expected answer here would also include file3, sub2/file3, and
> sub3/file3.
>

Yeah. Files that are not part of the sparse-checkout patterns are temporarily
extended into the sparse specification.

> > +
> > +diff_scope builtin_diff_index true "--no-cone" "--scope=all" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +diff_scope builtin_diff_index true "--no-cone" "--no-scope" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +diff_scope builtin_diff_index true "--cone" "--scope=sparse" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3" "${args[@]}"
>
> This is also wrong; it's missing sub2/file3 and sub3/file3.
>
> > +
> > +diff_scope builtin_diff_index true "--cone" "--scope=all" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +diff_scope builtin_diff_index true "--cone" "--no-scope" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +args=("--name-only" "file3" "sub1/" "sub2/")
> > +
> > +diff_scope builtin_diff_files true "--no-cone" "--scope=sparse" \
> > +"sub1/file3" "${args[@]}"
>
> This should also include file3, sub2/file3, and sub3/file3.
> `--scope=` should not affect diff output at all if neither --cached
> nor revision arguments are supplied.
>

Agree.

> > +
> > +diff_scope builtin_diff_files true "--no-cone" "--scope=all" \
> > +"file3
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3" "${args[@]}"
>
> This is wrong due to including too much; it should not include
> sub2/file1 or sub2/file2 (it is only including those because it is
> showing them as deleted, when they are not deleted but are
> SKIP_WORKTREE).
>

Agree.

> I think I'm going to stop reviewing here.  I'm probably just going to
> keep repeating the same issues I identified earlier if I continue.

Thank you very much for your review, you have pointed out very many
problems with this patch :)

--
ZheNing Hu
Elijah Newren Nov. 6, 2022, 6:58 a.m. UTC | #5
Hi,

On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: ZheNing Hu <adlternative@gmail.com>
> > >
> > > When we use sparse-checkout, we often want the set of files
> > > that some commands operate on to be restricted to the
> > > sparse-checkout specification.
> >
> > It seems a bit premature to send this, when the guideline document[*]
> > detailing how these options should work is still in the "Needs Review"
> > state.  I know, it's been waiting for a while, but it's a _long_
> > document.
> >
> > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> >
>
> Fine, I just want to start trying to experiment with this feature in
> git-diff earlier,
> and I can wait for the sparse-checkout.txt documentation to finish
> first if needed :)

Note that you may be able to help reduce the wait by reviewing the
document.  (You commented on v1 -- thanks! -- but no one has commented
on any of the newer versions, despite being out for over a month and
still being marked as "Needs Review" in the "What's cooking" reports.)

> > > `--no-scope` is the default
> > > option for now.
> >
> > What does --no-scope even mean?  You didn't explain it, and I don't
> > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > by allowing the --scope option to take a value.  I don't think there
> > should be a --no-scope option.
>
> I think the “--no-scope” here does nothing, as if it were unaffected by scope
> (just like correctly "--scope=all", right?)

Reading your patch, I was unable to determine where you made
--no-scope behave differently than how you implemented --scope=all.
Maybe there was a difference and I missed it.

Anyway, I don't think there should be a --no-scope option.

> > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > index 3674ac48e92..04bf83e8be1 100644
> > > --- a/Documentation/diff-options.txt
> > > +++ b/Documentation/diff-options.txt
> > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > >  non-default value and want to use the default one, then you
> > >  have to use `--diff-algorithm=default` option.
> > >
> > > +ifndef::git-format-patch[]
> > > +ifndef::git-log[]
> > > +
> > > +--scope={sparse|all}::
> > > +       Choose diff scope. The variants are as follows:
> > > ++
> > > +--
> > > +`--sparse`;;
> > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > +`--all`;;
> > > +       Without restriction, diff is performed regardless of whether the path
> > > +       meets the sparse-checkout specification.
> > > +--
> > > ++
> > > +
> > > +endif::git-log[]
> > > +endif::git-format-patch[]
> >
> > What about diff-files, diff-index, diff-tree, and show?
> >
>
> diff-options.txt included in their documents, and git-format-patch.txt,
> git-log.txt, but should not in git-format-patch.txt. I don't know if it
> should be included in git-diff-files.txt, because git diff-files compare
> the files in the working tree and the index (so it's the same as
> "simple" git-diff, which should not be affected by scope?)

Oh, good point.  Yeah, git-diff-files.txt should not get this option
as it won't have any affect.  git-diff-index and git-diff-tree and
git-show and git-log should get it, though.

> > > +int diff_path_in_sparse_checkout(const char *path) {
> > > +       if (core_sparse_checkout_cone)
> > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > +       else
> > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > +}
> >
> > This says we are including the path if it matches the sparsity
> > patterns.  Thus, we have to be careful when we use this function,
> > because the relevant paths are ones that match the sparsity
> > specification.  The sparsity specification will always match the
> > sparsity patterns when diffing two commits, but when either the index
> > or the working tree is part of the diff, the sparsity specification
> > *might* be temporarily expanded.
>
> Yes, I may have to look at more code to better understand how and when the
> "sparsity specification" is extended. Any recommendations for places to read?

Yes, please review the newer version of my sparse-checkout.txt
directions file; it covers this in more detail.

> > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> >
> > do_oneway_diff is for cases where we are diffing against the index...
> >
> > >                         return; /* nothing to diff.. */
> > >         }
> > >
> > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > +               if (idx && tree) {
> > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > +                               return;
> > > +               } else if (idx) {
> > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > +                               return;
> > > +               } else if (tree) {
> > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > +                               return;
> > > +               } else
> > > +                       return;
> > > +       }
> >
> > ...and you again mistakenly only compare to the sparsity patterns
> > instead of the sparse specification.
> >
>
> So here we should use ce_skip_worktree(idx) to check if this index entry matches
> sparse specification.

ce_skip_worktree(idx) only checks for expansions to the sparse
specification in the working tree.  If dealing with index files, the
sparse specification is also expanded to handle all files in the index
that differ from HEAD.

> > > +               } else
> > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > +       }
> >
> > As written with no follow-on else clause here, wouldn't this silently
> > accept "--scope" without an "=<something>" argument without an error
> > and without properly initializing opt->scope?
> >
>
> Because opt will be initializing with default_diff_options in repo_diff_setup(),
> and opt->scope should respect config value first. So I don't think there's a
> mistake here.

Okay, it's good that you've got the variables initialized somehow, but
that's only half the point here.  The main point is that the user can
specify something that makes no sense and if they do, we should throw
an error informing them of their mistake.  --scope should not be
passed without an argument (either "all" or "sparse", currently), but
this code allowed it.

> > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > new file mode 100644
> >
> > This needs to be fixed.

Since you didn't comment on this but usually do comment on thing, I'll
just re-ping it to make sure you don't miss the comment about the
incorrect file mode here.

> > > +reset_repo () {
> > > +       rm -rf repo &&
> > > +       git clone --no-checkout temp repo
> >
> > Why --no-checkout rather than say --sparse?
> >
>
> This is because I accidentally associated it with a
> partial clone. I often use "git clone -filter=blob:none -no-checkout"
> to do a partial clone, then "git sparse- checkout set <pattern>"
> to set sparse-checkout patterns, and "git checkout" to download
> the missing blobs and checkout to a branch. But in this
> test file, we only need sparse-checkout, so it's true that I should
> not do such strange no-checkout thing.

Yeah, no need to involve partial clones here.  (As an aside, though, I
think even with a partial clone that --sparse makes more sense than
--no-checkout.)

> > > +change_worktree_and_index() {
> > > +       (
> > > +               cd repo &&
> > > +               mkdir sub2 sub3 &&
> > > +               echo sub1/file3 >sub1/file3 &&
> > > +               echo sub2/file3 >sub2/file3 &&
> > > +               echo sub3/file3 >sub3/file3 &&
> > > +               echo file3 >file3 &&
> > > +               git add --all --sparse &&
> > > +               echo sub1/file3 >>sub1/file3 &&
> > > +               echo sub2/file3 >>sub2/file3 &&
> > > +               echo sub3/file3 >>sub3/file3 &&
> > > +               echo file3 >>file3
> > > +       )
> > > +}
> >
> > It would be nice to modify different paths in the working tree and
> > index, to see if we can handle cases where the sparse specification
> > does not match the sparsity patterns better.  (So, modify files inside
> > and outside the sparsity patterns, stage those changes, and then do a
> > `git sparse-checkout reapply` to make the files outside the sparsity
> > patterns disappear from the working tree...then modify different files
> > in the working tree both inside and outside the sparsity patterns.
> > And also remove some file that matches the sparsity patterns and
> > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > coverage.
> >
>
> Nice addition. So I should use git update-index --skip-worktree to set
> skip_worktree bit for index entries.

Well, I'd say use normal "sparse-checkout" commands to set most of
them.  However, adding one or two extra SKIP_WORKTREE paths via
running `git update-index --skip-worktree $PATH` (and removing the
corresponding local file) would make the testcases more interesting.

> > I think I'm going to stop reviewing here.  I'm probably just going to
> > keep repeating the same issues I identified earlier if I continue.
>
> Thank you very much for your review, you have pointed out very many
> problems with this patch :)

I'm glad it was helpful.  :)
ZheNing Hu Nov. 14, 2022, 9:08 a.m. UTC | #6
Elijah Newren <newren@gmail.com> 于2022年11月6日周日 14:58写道:
>
> Hi,
>
> On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > > >
> > > > From: ZheNing Hu <adlternative@gmail.com>
> > > >
> > > > When we use sparse-checkout, we often want the set of files
> > > > that some commands operate on to be restricted to the
> > > > sparse-checkout specification.
> > >
> > > It seems a bit premature to send this, when the guideline document[*]
> > > detailing how these options should work is still in the "Needs Review"
> > > state.  I know, it's been waiting for a while, but it's a _long_
> > > document.
> > >
> > > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> > >
> >
> > Fine, I just want to start trying to experiment with this feature in
> > git-diff earlier,
> > and I can wait for the sparse-checkout.txt documentation to finish
> > first if needed :)
>
> Note that you may be able to help reduce the wait by reviewing the
> document.  (You commented on v1 -- thanks! -- but no one has commented
> on any of the newer versions, despite being out for over a month and
> still being marked as "Needs Review" in the "What's cooking" reports.)
>

I've skimmed v3 in general, but haven't looked at v4 yet, I'll take a closer
look at it.

> > > > `--no-scope` is the default
> > > > option for now.
> > >
> > > What does --no-scope even mean?  You didn't explain it, and I don't
> > > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > > by allowing the --scope option to take a value.  I don't think there
> > > should be a --no-scope option.
> >
> > I think the “--no-scope” here does nothing, as if it were unaffected by scope
> > (just like correctly "--scope=all", right?)
>
> Reading your patch, I was unable to determine where you made
> --no-scope behave differently than how you implemented --scope=all.
> Maybe there was a difference and I missed it.
>
> Anyway, I don't think there should be a --no-scope option.
>

Fine, I'll get rid of it.

> > > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > > index 3674ac48e92..04bf83e8be1 100644
> > > > --- a/Documentation/diff-options.txt
> > > > +++ b/Documentation/diff-options.txt
> > > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > > >  non-default value and want to use the default one, then you
> > > >  have to use `--diff-algorithm=default` option.
> > > >
> > > > +ifndef::git-format-patch[]
> > > > +ifndef::git-log[]
> > > > +
> > > > +--scope={sparse|all}::
> > > > +       Choose diff scope. The variants are as follows:
> > > > ++
> > > > +--
> > > > +`--sparse`;;
> > > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > > +`--all`;;
> > > > +       Without restriction, diff is performed regardless of whether the path
> > > > +       meets the sparse-checkout specification.
> > > > +--
> > > > ++
> > > > +
> > > > +endif::git-log[]
> > > > +endif::git-format-patch[]
> > >
> > > What about diff-files, diff-index, diff-tree, and show?
> > >
> >
> > diff-options.txt included in their documents, and git-format-patch.txt,
> > git-log.txt, but should not in git-format-patch.txt. I don't know if it
> > should be included in git-diff-files.txt, because git diff-files compare
> > the files in the working tree and the index (so it's the same as
> > "simple" git-diff, which should not be affected by scope?)
>
> Oh, good point.  Yeah, git-diff-files.txt should not get this option
> as it won't have any affect.  git-diff-index and git-diff-tree and
> git-show and git-log should get it, though.
>
> > > > +int diff_path_in_sparse_checkout(const char *path) {
> > > > +       if (core_sparse_checkout_cone)
> > > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > > +       else
> > > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > > +}
> > >
> > > This says we are including the path if it matches the sparsity
> > > patterns.  Thus, we have to be careful when we use this function,
> > > because the relevant paths are ones that match the sparsity
> > > specification.  The sparsity specification will always match the
> > > sparsity patterns when diffing two commits, but when either the index
> > > or the working tree is part of the diff, the sparsity specification
> > > *might* be temporarily expanded.
> >
> > Yes, I may have to look at more code to better understand how and when the
> > "sparsity specification" is extended. Any recommendations for places to read?
>
> Yes, please review the newer version of my sparse-checkout.txt
> directions file; it covers this in more detail.
>

Ok, thanks.

> > > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> > >
> > > do_oneway_diff is for cases where we are diffing against the index...
> > >
> > > >                         return; /* nothing to diff.. */
> > > >         }
> > > >
> > > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > > +               if (idx && tree) {
> > > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > > +                               return;
> > > > +               } else if (idx) {
> > > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > > +                               return;
> > > > +               } else if (tree) {
> > > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > > +                               return;
> > > > +               } else
> > > > +                       return;
> > > > +       }
> > >
> > > ...and you again mistakenly only compare to the sparsity patterns
> > > instead of the sparse specification.
> > >
> >
> > So here we should use ce_skip_worktree(idx) to check if this index entry matches
> > sparse specification.
>
> ce_skip_worktree(idx) only checks for expansions to the sparse
> specification in the working tree.  If dealing with index files, the
> sparse specification is also expanded to handle all files in the index
> that differ from HEAD.
>

I will read more code to understand this detail.

> > > > +               } else
> > > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > > +       }
> > >
> > > As written with no follow-on else clause here, wouldn't this silently
> > > accept "--scope" without an "=<something>" argument without an error
> > > and without properly initializing opt->scope?
> > >
> >
> > Because opt will be initializing with default_diff_options in repo_diff_setup(),
> > and opt->scope should respect config value first. So I don't think there's a
> > mistake here.
>
> Okay, it's good that you've got the variables initialized somehow, but
> that's only half the point here.  The main point is that the user can
> specify something that makes no sense and if they do, we should throw
> an error informing them of their mistake.  --scope should not be
> passed without an argument (either "all" or "sparse", currently), but
> this code allowed it.
>

Ah, it makes sense, I will fix it.

> > > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > > new file mode 100644
> > >
> > > This needs to be fixed.
>
> Since you didn't comment on this but usually do comment on thing, I'll
> just re-ping it to make sure you don't miss the comment about the
> incorrect file mode here.
>

Yes, I've noticed the file permissions issue here, but there may be some other
issues with this test file: it fails when running CI tests, presumably
because of
some shell syntax incompatibility.

> > > > +reset_repo () {
> > > > +       rm -rf repo &&
> > > > +       git clone --no-checkout temp repo
> > >
> > > Why --no-checkout rather than say --sparse?
> > >
> >
> > This is because I accidentally associated it with a
> > partial clone. I often use "git clone -filter=blob:none -no-checkout"
> > to do a partial clone, then "git sparse- checkout set <pattern>"
> > to set sparse-checkout patterns, and "git checkout" to download
> > the missing blobs and checkout to a branch. But in this
> > test file, we only need sparse-checkout, so it's true that I should
> > not do such strange no-checkout thing.
>
> Yeah, no need to involve partial clones here.  (As an aside, though, I
> think even with a partial clone that --sparse makes more sense than
> --no-checkout.)
>

Using --sparse may indeed be better than -no-checkout, as there is no
need to perform an additional checkout after using clone --no-checkout
and sparse-checkout. But it uses sparse-checkout cone mode by default.

A little curious, why can't we specify --no-cone mode when doing git clone
and why can't we specify sparse-checkout patterns here? If such a feature
is available, git clone and git sparse-checkout will be combined in one step.

> > > > +change_worktree_and_index() {
> > > > +       (
> > > > +               cd repo &&
> > > > +               mkdir sub2 sub3 &&
> > > > +               echo sub1/file3 >sub1/file3 &&
> > > > +               echo sub2/file3 >sub2/file3 &&
> > > > +               echo sub3/file3 >sub3/file3 &&
> > > > +               echo file3 >file3 &&
> > > > +               git add --all --sparse &&
> > > > +               echo sub1/file3 >>sub1/file3 &&
> > > > +               echo sub2/file3 >>sub2/file3 &&
> > > > +               echo sub3/file3 >>sub3/file3 &&
> > > > +               echo file3 >>file3
> > > > +       )
> > > > +}
> > >
> > > It would be nice to modify different paths in the working tree and
> > > index, to see if we can handle cases where the sparse specification
> > > does not match the sparsity patterns better.  (So, modify files inside
> > > and outside the sparsity patterns, stage those changes, and then do a
> > > `git sparse-checkout reapply` to make the files outside the sparsity
> > > patterns disappear from the working tree...then modify different files
> > > in the working tree both inside and outside the sparsity patterns.
> > > And also remove some file that matches the sparsity patterns and
> > > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > > coverage.
> > >
> >
> > Nice addition. So I should use git update-index --skip-worktree to set
> > skip_worktree bit for index entries.
>
> Well, I'd say use normal "sparse-checkout" commands to set most of
> them.  However, adding one or two extra SKIP_WORKTREE paths via
> running `git update-index --skip-worktree $PATH` (and removing the
> corresponding local file) would make the testcases more interesting.
>

Get it.

> > > I think I'm going to stop reviewing here.  I'm probably just going to
> > > keep repeating the same issues I identified earlier if I continue.
> >
> > Thank you very much for your review, you have pointed out very many
> > problems with this patch :)
>
> I'm glad it was helpful.  :)

Thanks.

ZheNing Hu
diff mbox series

Patch

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index 35a7bf86d77..52707e1b2d6 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -201,6 +201,18 @@  diff.algorithm::
 --
 +
 
+diff.scope::
+	Choose diff scope. The variants are as follows:
++
+--
+`sparse`;;
+	Restrict diff paths to those matching sparse-checkout specification.
+`all`;;
+	Without restriction, diff is performed regardless of whether the path
+	meets the sparse-checkout specification.
+--
++
+
 diff.wsErrorHighlight::
 	Highlight whitespace errors in the `context`, `old` or `new`
 	lines of the diff.  Multiple values are separated by comma,
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 3674ac48e92..04bf83e8be1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -195,6 +195,24 @@  For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
 
+ifndef::git-format-patch[]
+ifndef::git-log[]
+
+--scope={sparse|all}::
+	Choose diff scope. The variants are as follows:
++
+--
+`--sparse`;;
+	Restrict diff paths to those matching sparse-checkout specification.
+`--all`;;
+	Without restriction, diff is performed regardless of whether the path
+	meets the sparse-checkout specification.
+--
++
+
+endif::git-log[]
+endif::git-format-patch[]
+
 --stat[=<width>[,<name-width>[,<count>]]]::
 	Generate a diffstat. By default, as much space as necessary
 	will be used for the filename part, and the rest for the graph
diff --git a/builtin/diff.c b/builtin/diff.c
index 854d2c5a5c4..6b450f7184c 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -54,6 +54,10 @@  static void stuff_change(struct diff_options *opt,
 	    oideq(old_oid, new_oid) && (old_mode == new_mode))
 		return;
 
+	if (opt->scope == DIFF_SCOPE_SPARSE &&
+	    !diff_paths_in_sparse_checkout(old_path, new_path))
+		return;
+
 	if (opt->flags.reverse_diff) {
 		SWAP(old_mode, new_mode);
 		SWAP(old_oid, new_oid);
diff --git a/diff-lib.c b/diff-lib.c
index 2edea41a234..a3381f2e0ff 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -88,6 +88,22 @@  static int match_stat_with_submodule(struct diff_options *diffopt,
 	return changed;
 }
 
+int diff_path_in_sparse_checkout(const char *path) {
+	if (core_sparse_checkout_cone)
+		return path_in_cone_mode_sparse_checkout(path, the_repository->index);
+	else
+		return path_in_sparse_checkout(path, the_repository->index);
+}
+
+int diff_paths_in_sparse_checkout(const char *one, const char*two) {
+	if (one == two || !strcmp(one, two))
+		return diff_path_in_sparse_checkout(one);
+	else
+		return diff_path_in_sparse_checkout(one) &&
+		       diff_path_in_sparse_checkout(two);
+}
+
+
 int run_diff_files(struct rev_info *revs, unsigned int option)
 {
 	int entries, i;
@@ -113,6 +129,9 @@  int run_diff_files(struct rev_info *revs, unsigned int option)
 
 		if (diff_can_quit_early(&revs->diffopt))
 			break;
+		if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
+		    !diff_path_in_sparse_checkout(ce->name))
+			continue;
 
 		if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
 			continue;
@@ -202,7 +221,8 @@  int run_diff_files(struct rev_info *revs, unsigned int option)
 				continue;
 		}
 
-		if (ce_uptodate(ce) || ce_skip_worktree(ce))
+		if (ce_uptodate(ce) ||
+		    (revs->diffopt.scope != DIFF_SCOPE_ALL && ce_skip_worktree(ce)))
 			continue;
 
 		/*
@@ -439,6 +459,20 @@  static void do_oneway_diff(struct unpack_trees_options *o,
 			return;	/* nothing to diff.. */
 	}
 
+	if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
+		if (idx && tree) {
+			if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
+				return;
+		} else if (idx) {
+			if (!diff_path_in_sparse_checkout(idx->name))
+				return;
+		} else if (tree) {
+			if (!diff_path_in_sparse_checkout(tree->name))
+				return;
+		} else
+			return;
+	}
+
 	/* if the entry is not checked out, don't examine work tree */
 	cached = o->index_only ||
 		(idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
diff --git a/diff-no-index.c b/diff-no-index.c
index 18edbdf4b59..ea94a104ea4 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -281,6 +281,10 @@  int diff_no_index(struct rev_info *revs,
 
 	fixup_paths(paths, &replacement);
 
+	if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
+	    !diff_paths_in_sparse_checkout(paths[0], paths[1]))
+		goto out;
+
 	revs->diffopt.skip_stat_unmatch = 1;
 	if (!revs->diffopt.output_format)
 		revs->diffopt.output_format = DIFF_FORMAT_PATCH;
diff --git a/diff.c b/diff.c
index 285d6e2d575..9de4044ae05 100644
--- a/diff.c
+++ b/diff.c
@@ -48,6 +48,7 @@  static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
 static const char *external_diff_cmd_cfg;
 static const char *diff_order_file_cfg;
+static const char *external_diff_scope_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
@@ -57,6 +58,7 @@  static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;
 static long diff_algorithm;
 static unsigned ws_error_highlight_default = WSEH_NEW;
+static enum diff_scope external_diff_scope;
 
 static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_RESET,
@@ -423,6 +425,16 @@  int git_diff_ui_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.scope")) {
+		git_config_string(&external_diff_scope_cfg, var, value);
+		if (!strcmp(value, "all"))
+			external_diff_scope = DIFF_SCOPE_ALL;
+		else if (!strcmp(value, "sparse"))
+			external_diff_scope = DIFF_SCOPE_SPARSE;
+		else
+			return -1;
+	}
+
 	if (git_color_config(var, value, cb) < 0)
 		return -1;
 
@@ -4663,6 +4675,7 @@  void repo_diff_setup(struct repository *r, struct diff_options *options)
 
 	options->color_moved = diff_color_moved_default;
 	options->color_moved_ws_handling = diff_color_moved_ws_default;
+	options->scope = external_diff_scope;
 
 	prep_parse_options(options);
 }
@@ -4914,6 +4927,29 @@  static int parse_dirstat_opt(struct diff_options *options, const char *params)
 	return 1;
 }
 
+static int diff_opt_diff_scope(const struct option *option,
+				const char *optarg, int unset)
+{
+	struct diff_options *opt = option->value;
+
+	if (unset) {
+		opt->scope = DIFF_SCOPE_NONE;
+	} else if (optarg) {
+		if (!strcmp(optarg, "all")) {
+			if (core_apply_sparse_checkout) {
+				opt->scope = DIFF_SCOPE_ALL;
+			}
+		} else if (!strcmp(optarg, "sparse")) {
+			if (core_apply_sparse_checkout) {
+				opt->scope = DIFF_SCOPE_SPARSE;
+			}
+		} else
+			return error(_("invalid --scope value: %s"), optarg);
+	}
+
+	return 0;
+}
+
 static int diff_opt_diff_filter(const struct option *option,
 				const char *optarg, int unset)
 {
@@ -5683,6 +5719,9 @@  static void prep_parse_options(struct diff_options *options)
 		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
 			       N_("select files by diff type"),
 			       PARSE_OPT_NONEG, diff_opt_diff_filter),
+		OPT_CALLBACK_F(0, "scope", options, N_("[sparse|all]"),
+			       N_("choose diff scope"),
+			       PARSE_OPT_OPTARG, diff_opt_diff_scope),
 		{ OPTION_CALLBACK, 0, "output", options, N_("<file>"),
 		  N_("output to a specific file"),
 		  PARSE_OPT_NONEG, NULL, 0, diff_opt_output },
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..90f7512034c 100644
--- a/diff.h
+++ b/diff.h
@@ -230,6 +230,12 @@  enum diff_submodule_format {
 	DIFF_SUBMODULE_INLINE_DIFF
 };
 
+enum diff_scope {
+	DIFF_SCOPE_NONE = 0,
+	DIFF_SCOPE_ALL,
+	DIFF_SCOPE_SPARSE,
+};
+
 /**
  * the set of options the calling program wants to affect the operation of
  * diffcore library with.
@@ -285,6 +291,9 @@  struct diff_options {
 	/* diff-filter bits */
 	unsigned int filter, filter_not;
 
+	/* diff sparse-checkout scope */
+	enum diff_scope scope;
+
 	int use_color;
 
 	/* Number of context lines to generate in patch output. */
@@ -696,4 +705,6 @@  void print_stat_summary(FILE *fp, int files,
 			int insertions, int deletions);
 void setup_diff_pager(struct diff_options *);
 
+int diff_path_in_sparse_checkout(const char *path);
+int diff_paths_in_sparse_checkout(const char *one, const char *two);
 #endif /* DIFF_H */
diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
new file mode 100644
index 00000000000..dca75a3308b
--- /dev/null
+++ b/t/t4070-diff-sparse-checkout-scope.sh
@@ -0,0 +1,469 @@ 
+#!/bin/sh
+
+test_description='diff sparse-checkout scope'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+
+test_expect_success 'setup' '
+	git init temp &&
+	(
+		cd temp &&
+		mkdir sub1 &&
+		mkdir sub2 &&
+		echo sub1/file1 >sub1/file1 &&
+		echo sub2/file2 >sub2/file2 &&
+		echo file1 >file1 &&
+		echo file2 >file2 &&
+		git add --all &&
+		git commit -m init &&
+		echo sub1/file1 >>sub1/file1 &&
+		echo sub1/file2 >>sub1/file2 &&
+		echo sub2/file1 >>sub2/file1 &&
+		echo sub2/file2 >>sub2/file2 &&
+		echo file1 >>file1 &&
+		echo file2 >>file2 &&
+		git add --all &&
+		git commit -m change1 &&
+		echo sub1/file1 >>sub1/file1 &&
+		echo sub1/file2 >>sub1/file2 &&
+		echo sub2/file1 >>sub2/file1 &&
+		echo sub2/file2 >>sub2/file2 &&
+		echo file1 >>file1 &&
+		echo file2 >>file2 &&
+		git add --all &&
+		git commit -m change2
+	)
+'
+
+reset_repo () {
+	rm -rf repo &&
+	git clone --no-checkout temp repo
+}
+
+reset_with_sparse_checkout() {
+	reset_repo &&
+	git -C repo sparse-checkout set $1 sub1 &&
+	git -C repo checkout
+}
+
+change_worktree_and_index() {
+	(
+		cd repo &&
+		mkdir sub2 sub3 &&
+		echo sub1/file3 >sub1/file3 &&
+		echo sub2/file3 >sub2/file3 &&
+		echo sub3/file3 >sub3/file3 &&
+		echo file3 >file3 &&
+		git add --all --sparse &&
+		echo sub1/file3 >>sub1/file3 &&
+		echo sub2/file3 >>sub2/file3 &&
+		echo sub3/file3 >>sub3/file3 &&
+		echo file3 >>file3
+	)
+}
+
+diff_scope() {
+	title=$1
+	need_change_worktree_and_index=$2
+	sparse_checkout_option=$3
+	scope_option=$4
+	expect=$5
+	shift 5
+	args=("$@")
+
+	test_expect_success "$title $sparse_checkout_option $scope_option" "
+		reset_with_sparse_checkout $sparse_checkout_option &&
+		if test \"$need_change_worktree_and_index\" = \"true\" ; then
+			change_worktree_and_index
+		fi &&
+		git -C repo diff $scope_option ${args[*]} >actual &&
+		if test -z \"$expect\" ; then
+			>expect
+		else
+			cat > expect <<-EOF
+$expect
+			EOF
+		fi &&
+		test_cmp expect actual
+	"
+}
+
+args=("--name-only" "HEAD" "HEAD~")
+diff_scope builtin_diff_tree false "--no-cone" "--scope=sparse" \
+"sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--no-cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--no-cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--cone" "--scope=sparse" \
+"file1
+file2
+sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+args=("--name-only" "HEAD~")
+diff_scope builtin_diff_index true "--no-cone" "--scope=sparse" \
+"sub1/file1
+sub1/file2
+sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--no-cone" "--scope=all" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--no-cone" "--no-scope" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--cone" "--scope=sparse" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--cone" "--scope=all" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--cone" "--no-scope" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+args=("--name-only" "file3" "sub1/" "sub2/")
+
+diff_scope builtin_diff_files true "--no-cone" "--scope=sparse" \
+"sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--no-cone" "--scope=all" \
+"file3
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--no-cone" "--no-scope" \
+"file3
+sub1/file3
+sub2/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--cone" "--scope=sparse" \
+"file3
+sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--cone" "--scope=all" \
+"file3
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--cone" "--no-scope" \
+"file3
+sub1/file3
+sub2/file3" "${args[@]}"
+
+
+args=("--name-only" "HEAD~:sub2/file2" "sub1/file2")
+
+diff_scope builtin_diff_b_f true "--no-cone" "--scope=sparse" \
+"" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--no-cone" "--scope=all" \
+"sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--no-cone" "--no-scope" \
+"sub1/file2" "${args[@]}"
+
+args=("--name-only" "HEAD~:sub1/file1" "file3")
+
+diff_scope builtin_diff_b_f true "--cone" "--scope=sparse" \
+"file3" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--cone" "--scope=all" \
+"file3" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--cone" "--no-scope" \
+"file3" "${args[@]}"
+
+args=("--name-only" HEAD~:sub2/file2 HEAD:sub1/file2)
+
+diff_scope builtin_diff_blobs true "--no-cone" "--scope=sparse" \
+"" "${args[@]}"
+
+diff_scope builtin_diff_blobs true "--no-cone" "--scope=all" \
+"sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_blobs true "--no-cone" "--no-scope" \
+"sub1/file2" "${args[@]}"
+
+args=("--name-only" HEAD~:sub1/file1 HEAD:file2)
+
+diff_scope builtin_diff_blobs false "--cone" "--scope=sparse" \
+"file2" "${args[@]}"
+
+diff_scope builtin_diff_blobs false "--cone" "--scope=all" \
+"file2" "${args[@]}"
+
+diff_scope builtin_diff_blobs false "--cone" "--no-scope" \
+"file2" "${args[@]}"
+
+args=("--name-only" HEAD~2 HEAD~ HEAD)
+
+diff_scope builtin_diff_combined false "--no-cone" "--scope=sparse" \
+"sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--no-cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--no-cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--cone" "--scope=sparse" \
+"file1
+file2
+sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+test_expect_success 'diff_no_index --no-cone, --scope=sparse' '
+	reset_with_sparse_checkout --no-cone &&
+	(
+		cd repo &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 sub1/file2 >actual &&
+	cat > expect <<-EOF &&
+sub1/file2
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 sub3/file3 >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --no-cone, --scope=all' '
+	reset_with_sparse_checkout --no-cone &&
+	(
+		cd repo &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 sub1/file2 >actual &&
+	cat > expect <<-EOF &&
+sub1/file2
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --no-cone, --no-scope' '
+	reset_with_sparse_checkout --no-cone &&
+	(
+		cd repo &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 sub1/file2 >actual &&
+	cat > expect <<-EOF &&
+sub1/file2
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --cone, --scope=sparse' '
+	reset_with_sparse_checkout --cone &&
+	(
+		cd repo &&
+		echo file3 >file3 &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 file3 >actual &&
+	cat > expect <<-EOF &&
+file3
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 sub3/file3 >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --cone, --scope=all' '
+	reset_with_sparse_checkout --cone &&
+	(
+		cd repo &&
+		echo file3 >file3 &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 file3 >actual &&
+	cat > expect <<-EOF &&
+file3
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --cone, --no-scope' '
+	reset_with_sparse_checkout --cone &&
+	(
+		cd repo &&
+		echo file3 >file3 &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 file3 >actual &&
+	cat > expect <<-EOF &&
+file3
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff scope config sparse' '
+	reset_with_sparse_checkout --cone &&
+	git -C repo -c diff.scope=sparse diff --name-only HEAD~ >actual &&
+	cat > expect <<-EOF &&
+file1
+file2
+sub1/file1
+sub1/file2
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff scope config all' '
+	reset_with_sparse_checkout --cone &&
+	git -C repo -c diff.scope=all diff --name-only HEAD~ >actual &&
+	cat > expect <<-EOF &&
+file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff scope config override by option' '
+	reset_with_sparse_checkout --cone &&
+	git -C repo -c diff.scope=sparse diff --name-only --scope=all HEAD~ >actual &&
+	cat > expect <<-EOF &&
+file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2
+	EOF
+	test_cmp expect actual
+'
+
+test_done
diff --git a/tree-diff.c b/tree-diff.c
index 69031d7cbae..67f99c8e4df 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -76,6 +76,11 @@  static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
 static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p)
 {
 	struct combine_diff_parent *p0 = &p->parent[0];
+
+	if (opt->scope == DIFF_SCOPE_SPARSE &&
+	    !diff_path_in_sparse_checkout(p->path))
+		return 0;
+
 	if (p->mode && p0->mode) {
 		opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid,
 			1, 1, p->path, 0, 0);