diff mbox series

advice: warn when sparse index expands

Message ID pull.1756.git.1720019679517.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series advice: warn when sparse index expands | expand

Commit Message

Derrick Stolee July 3, 2024, 3:14 p.m. UTC
From: Derrick Stolee <stolee@gmail.com>

Typically, forcing a sparse index to expand to a full index means that
Git could not determine the status of a file outside of the
sparse-checkout and needed to expand sparse trees into the full list of
sparse blobs. This operation can be very slow when the sparse-checkout
is much smaller than the full tree at HEAD.

When users are in this state, it is common that 'git status' will report
the problem. Usually there is a modified or untracked file outside of
the sparse-checkout mentioned by the 'git status' output. There are a
number of reasons why this is insufficient:

 1. Users may not have a full understanding of which files are inside or
    outside of their sparse-checkout. This is more common in monorepos
    that manage the sparse-checkout using custom tools that map build
    dependencies into sparse-checkout definitions.

 2. In some cases, an empty directory could exist outside the
    sparse-checkout and these empty directories are not reported by 'git
    status' and friends.

 3. If the user has '.gitignore' or 'exclude' files, then 'git status'
    will squelch the warnings and not demonstrate any problems.

In order to help users who are in this state, add a new advice message
to indicate that a sparse index is expanded to a full index. This
message should be written at most once per process, so add a static
global 'give_advice_on_expansion' to sparse-index.c. Further, there is a
case in 'git sparse-checkout set' that uses the sparse index as an
in-memory data structure (even when writing a full index) so we need to
disable the message in that kind of case.

The t1092-sparse-checkout-compatibility.sh test script compares the
behavior of several Git commands across full and sparse repositories,
including sparse repositories with and without a sparse index. We need
to disable the advice in the sparse-index repo to avoid differences in
stderr. By leaving the advice on in the sparse-checkout repo (without
the sparse index), we can test the behavior of disabling the advice in
convert_to_sparse(). (Indeed, these tests are how that necessity was
discovered.) Add a test that reenables the advice and demonstrates that
the message is output.

The advice message is defined outside of expand_index() to avoid super-
wide lines. It is also defined as a macro to avoid compile issues with
-Werror=format-security.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
    advice: warn when sparse index expands
    
    This is motivated by some cases I noticed while supporting users who
    were in this bad state and hitting the performance issue noticed in [1].
    These users had some of the example situations where 'git status' did
    not provide helpful output. This idea has always been in the back of my
    mind since the sparse index was created, but it didn't make sense
    initially when only a few builtins could operate without immediately
    expanding a sparse index to a full one.
    
    [1]
    https://lore.kernel.org/git/pull.1754.v3.git.1719578605.gitgitgadget@gmail.com/
    
    Thanks, Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1756%2Fderrickstolee%2Fadvice-sparse-index-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1756/derrickstolee/advice-sparse-index-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1756

 Documentation/config/advice.txt          |  4 ++++
 advice.c                                 |  1 +
 advice.h                                 |  1 +
 sparse-index.c                           | 27 ++++++++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 16 +++++++++++++-
 5 files changed, 48 insertions(+), 1 deletion(-)


base-commit: 06e570c0dfb2a2deb64d217db78e2ec21672f558

Comments

Junio C Hamano July 3, 2024, 6:16 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> Typically, forcing a sparse index to expand to a full index means that
> Git could not determine the status of a file outside of the
> sparse-checkout and needed to expand sparse trees into the full list of
> sparse blobs. This operation can be very slow when the sparse-checkout
> is much smaller than the full tree at HEAD.
>
> When users are in this state, it is common that 'git status' will report
> the problem. Usually there is a modified or untracked file outside of
> the sparse-checkout mentioned by the 'git status' output. There are a
> number of reasons why this is insufficient:

Nicely written to explain why giving an advice message is a good
idea to cover this situation.

Making it possible to squelch comes with no cost (once the code to
do so is written), so I do not have a huge problem with the use of
advise_if_enabled(), but I offhand do not know if the users would
ever want to squelch it.  Is this something that users would choose
to say "yes, I know what I am doing is making my sparse working tree
unusuably slow and I've heard how to whip my sparse working tree
into a better shape already---please do not tell it to me ever
again; because I need to leave these crufts outside the sparse cone
anyway, I am willing to accept the unusually slow response,
overhead, and wasted cycles and power" to?

Other than that, nicely done.
Derrick Stolee July 3, 2024, 7:18 p.m. UTC | #2
On 7/3/24 2:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> Typically, forcing a sparse index to expand to a full index means that
>> Git could not determine the status of a file outside of the
>> sparse-checkout and needed to expand sparse trees into the full list of
>> sparse blobs. This operation can be very slow when the sparse-checkout
>> is much smaller than the full tree at HEAD.
>>
>> When users are in this state, it is common that 'git status' will report
>> the problem. Usually there is a modified or untracked file outside of
>> the sparse-checkout mentioned by the 'git status' output. There are a
>> number of reasons why this is insufficient:
> 
> Nicely written to explain why giving an advice message is a good
> idea to cover this situation.
> 
> Making it possible to squelch comes with no cost (once the code to
> do so is written), so I do not have a huge problem with the use of
> advise_if_enabled(), but I offhand do not know if the users would
> ever want to squelch it.  Is this something that users would choose
> to say "yes, I know what I am doing is making my sparse working tree
> unusuably slow and I've heard how to whip my sparse working tree
> into a better shape already---please do not tell it to me ever
> again; because I need to leave these crufts outside the sparse cone
> anyway, I am willing to accept the unusually slow response,
> overhead, and wasted cycles and power" to?

I currently can't imagine a case where a user would want to disable
this advice, but I defaulted to allowing it. I suppose it is more
difficult to remove that option later, so I should have defaulted
to not having it removable via config.

I can send a v2 without the config option present. (I'll wait a
day or two to see if others have strong opinions.)

Thanks,
-Stolee
Randall S. Becker July 3, 2024, 7:28 p.m. UTC | #3
On Wednesday, July 3, 2024 3:18 PM, Derrick Stolee wrote:
>On 7/3/24 2:16 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <stolee@gmail.com>
>>>
>>> Typically, forcing a sparse index to expand to a full index means
>>> that Git could not determine the status of a file outside of the
>>> sparse-checkout and needed to expand sparse trees into the full list
>>> of sparse blobs. This operation can be very slow when the
>>> sparse-checkout is much smaller than the full tree at HEAD.
>>>
>>> When users are in this state, it is common that 'git status' will
>>> report the problem. Usually there is a modified or untracked file
>>> outside of the sparse-checkout mentioned by the 'git status' output.
>>> There are a number of reasons why this is insufficient:
>>
>> Nicely written to explain why giving an advice message is a good idea
>> to cover this situation.
>>
>> Making it possible to squelch comes with no cost (once the code to do
>> so is written), so I do not have a huge problem with the use of
>> advise_if_enabled(), but I offhand do not know if the users would ever
>> want to squelch it.  Is this something that users would choose to say
>> "yes, I know what I am doing is making my sparse working tree
>> unusuably slow and I've heard how to whip my sparse working tree into
>> a better shape already---please do not tell it to me ever again;
>> because I need to leave these crufts outside the sparse cone anyway, I
>> am willing to accept the unusually slow response, overhead, and wasted
>> cycles and power" to?
>
>I currently can't imagine a case where a user would want to disable this advice, but I
>defaulted to allowing it. I suppose it is more difficult to remove that option later, so I
>should have defaulted to not having it removable via config.
>
>I can send a v2 without the config option present. (I'll wait a day or two to see if
>others have strong opinions.)

One possible use case is during a CI test of user code where they need to do something that causes the advice, but don't want to see the message in their logs.
--Randall
Junio C Hamano July 3, 2024, 7:54 p.m. UTC | #4
Derrick Stolee <stolee@gmail.com> writes:

> I can send a v2 without the config option present. (I'll wait a
> day or two to see if others have strong opinions.)

FWIW, that wasn't the direction I meant to lead the topic into.  It
is perfectly acceptable if some are much more often used than others
among various advice.* configuration variables for squelching these
messages.

I was merely making an observation that this is likely to be a less
used ones.

Thanks.
Rubén Justo July 3, 2024, 8:36 p.m. UTC | #5
On Wed, Jul 03, 2024 at 03:14:39PM +0000, Derrick Stolee via GitGitGadget wrote:

>  	[ADVICE_SET_UPSTREAM_FAILURE]			= { "setUpstreamFailure" },

> +	[ADVICE_SPARSE_INDEX_EXPANDED]			= { "sparseIndexExpanded" },
>  	[ADVICE_SKIPPED_CHERRY_PICKS]			= { "skippedCherryPicks" },

Maybe this change will disappear in a possible v2, but if it remains
you'll probably want to swap these two lines. 

>  	ADVICE_SET_UPSTREAM_FAILURE,

> +	ADVICE_SPARSE_INDEX_EXPANDED,
>  	ADVICE_SKIPPED_CHERRY_PICKS,

Ditto.
Elijah Newren July 5, 2024, 8:29 p.m. UTC | #6
Hi,

On Wed, Jul 3, 2024 at 8:14 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <stolee@gmail.com>
>
> Typically, forcing a sparse index to expand to a full index means that
> Git could not determine the status of a file outside of the
> sparse-checkout and needed to expand sparse trees into the full list of
> sparse blobs. This operation can be very slow when the sparse-checkout
> is much smaller than the full tree at HEAD.

Yep, I'm with you here.

> When users are in this state, it is common that 'git status' will report
> the problem.

I struggled to understand this sentence in combination with your later
statements, though that may only be because I had some difficulty with
later parts of the commit message.  Perhaps addressing the later parts
will make this sentence fine as-is, but it's possible this sentence
could do with a bit more detail.

> Usually there is a modified or untracked file outside of
> the sparse-checkout mentioned by the 'git status' output. There are a
> number of reasons why this is insufficient:

Fair enough; let's focus on why the output of git status is insufficient...

>  1. Users may not have a full understanding of which files are inside or
>     outside of their sparse-checkout. This is more common in monorepos
>     that manage the sparse-checkout using custom tools that map build
>     dependencies into sparse-checkout definitions.

Having sparse-checkout patterns managed by custom tools is a really
good point, but doesn't this statement of yours about needing to know
particular files or directories suggest that...

> diff --git a/sparse-index.c b/sparse-index.c
> index e48e40cae71..1e517f696dd 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -12,6 +12,21 @@
>  #include "config.h"
>  #include "dir.h"
>  #include "fsmonitor-ll.h"
> +#include "advice.h"
> +
> +/**
> + * This global is used by expand_index() to determine if we should give the
> + * advice for advice.sparseIndexExpanded when expanding a sparse index to a full
> + * one. However, this is sometimes done on purpose, such as in the sparse-checkout
> + * builtin, even when index.sparse=false. This may be disabled in
> + * convert_to_sparse().
> + */
> +static int give_advice_on_expansion = 1;
> +#define ADVICE_MSG \
> +       "The sparse index is expanding to a full index, a slow operation.\n" \
> +       "This likely means that you have files in your working directory\n"  \
> +       "that are outside of your sparse-checkout patterns. Remove them\n"   \
> +       "to recover performance expectations, such as with 'git clean'."

...this is an insufficient solution?

I was a bit surprised you'd list your first reason for git status
being insufficient, that users need to know which files/directories
are the problem and then provide a solution that doesn't attempt to
identify any files or directories.

>  2. In some cases, an empty directory could exist outside the
>     sparse-checkout and these empty directories are not reported by 'git
>     status' and friends.

This is a really good point too...but given this point, shouldn't your
added advice message also mention "directories" instead of just
mentioning "files" so that users are aware they need to look for those
too?

>  3. If the user has '.gitignore' or 'exclude' files, then 'git status'
>     will squelch the warnings and not demonstrate any problems.

Your solution does help the user to know that there is a problem (even
if they don't know which files -- or directories -- are the problem),
so this patch is making things better.

Two other small comments...

> diff --git a/advice.c b/advice.c
> index 558a46fc0b3..7845e427c89 100644
> --- a/advice.c
> +++ b/advice.c
> @@ -77,6 +77,7 @@ static struct {
>         [ADVICE_RM_HINTS]                               = { "rmHints" },
>         [ADVICE_SEQUENCER_IN_USE]                       = { "sequencerInUse" },
>         [ADVICE_SET_UPSTREAM_FAILURE]                   = { "setUpstreamFailure" },
> +       [ADVICE_SPARSE_INDEX_EXPANDED]                  = { "sparseIndexExpanded" },
>         [ADVICE_SKIPPED_CHERRY_PICKS]                   = { "skippedCherryPicks" },

The rest of the list is in alphabetical order, so the new entry
probably should be too.

>         [ADVICE_STATUS_AHEAD_BEHIND_WARNING]            = { "statusAheadBehindWarning" },
>         [ADVICE_STATUS_HINTS]                           = { "statusHints" },
> diff --git a/advice.h b/advice.h
> index 5105d90129d..572272fa0da 100644
> --- a/advice.h
> +++ b/advice.h
> @@ -44,6 +44,7 @@ enum advice_type {
>         ADVICE_RM_HINTS,
>         ADVICE_SEQUENCER_IN_USE,
>         ADVICE_SET_UPSTREAM_FAILURE,
> +       ADVICE_SPARSE_INDEX_EXPANDED,
>         ADVICE_SKIPPED_CHERRY_PICKS,

Again, the rest of the entries are in alphabetical order.


Anyway, I've nit-picked on the little things I saw, but overall this
is a good patch that moves things in a good direction; with a few
small touch-ups it'll probably be good to go.
Derrick Stolee July 8, 2024, 12:57 p.m. UTC | #7
On 7/5/24 4:29 PM, Elijah Newren wrote:
> On Wed, Jul 3, 2024 at 8:14 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> Typically, forcing a sparse index to expand to a full index means that
>> Git could not determine the status of a file outside of the
>> sparse-checkout and needed to expand sparse trees into the full list of
>> sparse blobs. This operation can be very slow when the sparse-checkout
>> is much smaller than the full tree at HEAD.
> 
> Yep, I'm with you here.
> 
>> When users are in this state, it is common that 'git status' will report
>> the problem.
> 
> I struggled to understand this sentence in combination with your later
> statements, though that may only be because I had some difficulty with
> later parts of the commit message.  Perhaps addressing the later parts
> will make this sentence fine as-is, but it's possible this sentence
> could do with a bit more detail.

I think also the issue with this sentence is that its thought isn't
complete until also reading the next one. These can be combined to
get to the point faster.

>> Usually there is a modified or untracked file outside of
>> the sparse-checkout mentioned by the 'git status' output. There are a
>> number of reasons why this is insufficient:
> 
> Fair enough; let's focus on why the output of git status is insufficient...
> 
>>   1. Users may not have a full understanding of which files are inside or
>>      outside of their sparse-checkout. This is more common in monorepos
>>      that manage the sparse-checkout using custom tools that map build
>>      dependencies into sparse-checkout definitions.
> 
> Having sparse-checkout patterns managed by custom tools is a really
> good point, but doesn't this statement of yours about needing to know
> particular files or directories suggest that...
>
>> +static int give_advice_on_expansion = 1;
>> +#define ADVICE_MSG \
>> +       "The sparse index is expanding to a full index, a slow operation.\n" \
>> +       "This likely means that you have files in your working directory\n"  \
>> +       "that are outside of your sparse-checkout patterns. Remove them\n"   \
>> +       "to recover performance expectations, such as with 'git clean'."
> 
> ...this is an insufficient solution?
> 
> I was a bit surprised you'd list your first reason for git status
> being insufficient, that users need to know which files/directories
> are the problem and then provide a solution that doesn't attempt to
> identify any files or directories.
> 
>>   2. In some cases, an empty directory could exist outside the
>>      sparse-checkout and these empty directories are not reported by 'git
>>      status' and friends.
> 
> This is a really good point too...but given this point, shouldn't your
> added advice message also mention "directories" instead of just
> mentioning "files" so that users are aware they need to look for those
> too?
> 
>>   3. If the user has '.gitignore' or 'exclude' files, then 'git status'
>>      will squelch the warnings and not demonstrate any problems.
> 
> Your solution does help the user to know that there is a problem (even
> if they don't know which files -- or directories -- are the problem),
> so this patch is making things better.

Thanks for pushing for a more helpful message. I'm currently thinking
that the core issue is that the working directory has contents (files
or directories) that disagree with the sparse-checkout definition. I
will update the language in v2 to point the user in the direction of
comparing the two.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
index fa612417568..0ba89898207 100644
--- a/Documentation/config/advice.txt
+++ b/Documentation/config/advice.txt
@@ -116,6 +116,10 @@  advice.*::
 	skippedCherryPicks::
 		Shown when linkgit:git-rebase[1] skips a commit that has already
 		been cherry-picked onto the upstream branch.
+	sparseIndexExpanded::
+		Shown when a sparse index is expanded to a full index, which is likely
+		due to an unexpected set of files existing outside of the
+		sparse-checkout.
 	statusAheadBehind::
 		Shown when linkgit:git-status[1] computes the ahead/behind
 		counts for a local ref compared to its remote tracking ref,
diff --git a/advice.c b/advice.c
index 558a46fc0b3..7845e427c89 100644
--- a/advice.c
+++ b/advice.c
@@ -77,6 +77,7 @@  static struct {
 	[ADVICE_RM_HINTS]				= { "rmHints" },
 	[ADVICE_SEQUENCER_IN_USE]			= { "sequencerInUse" },
 	[ADVICE_SET_UPSTREAM_FAILURE]			= { "setUpstreamFailure" },
+	[ADVICE_SPARSE_INDEX_EXPANDED]			= { "sparseIndexExpanded" },
 	[ADVICE_SKIPPED_CHERRY_PICKS]			= { "skippedCherryPicks" },
 	[ADVICE_STATUS_AHEAD_BEHIND_WARNING]		= { "statusAheadBehindWarning" },
 	[ADVICE_STATUS_HINTS]				= { "statusHints" },
diff --git a/advice.h b/advice.h
index 5105d90129d..572272fa0da 100644
--- a/advice.h
+++ b/advice.h
@@ -44,6 +44,7 @@  enum advice_type {
 	ADVICE_RM_HINTS,
 	ADVICE_SEQUENCER_IN_USE,
 	ADVICE_SET_UPSTREAM_FAILURE,
+	ADVICE_SPARSE_INDEX_EXPANDED,
 	ADVICE_SKIPPED_CHERRY_PICKS,
 	ADVICE_STATUS_AHEAD_BEHIND_WARNING,
 	ADVICE_STATUS_HINTS,
diff --git a/sparse-index.c b/sparse-index.c
index e48e40cae71..1e517f696dd 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -12,6 +12,21 @@ 
 #include "config.h"
 #include "dir.h"
 #include "fsmonitor-ll.h"
+#include "advice.h"
+
+/**
+ * This global is used by expand_index() to determine if we should give the
+ * advice for advice.sparseIndexExpanded when expanding a sparse index to a full
+ * one. However, this is sometimes done on purpose, such as in the sparse-checkout
+ * builtin, even when index.sparse=false. This may be disabled in
+ * convert_to_sparse().
+ */
+static int give_advice_on_expansion = 1;
+#define ADVICE_MSG \
+	"The sparse index is expanding to a full index, a slow operation.\n" \
+	"This likely means that you have files in your working directory\n"  \
+	"that are outside of your sparse-checkout patterns. Remove them\n"   \
+	"to recover performance expectations, such as with 'git clean'."
 
 struct modify_index_context {
 	struct index_state *write;
@@ -183,6 +198,12 @@  int convert_to_sparse(struct index_state *istate, int flags)
 	    !is_sparse_index_allowed(istate, flags))
 		return 0;
 
+	/*
+	 * If we are purposefully collapsing a full index, then don't give
+	 * advice when it is expanded later.
+	 */
+	give_advice_on_expansion = 0;
+
 	/*
 	 * NEEDSWORK: If we have unmerged entries, then stay full.
 	 * Unmerged entries prevent the cache-tree extension from working.
@@ -328,6 +349,12 @@  void expand_index(struct index_state *istate, struct pattern_list *pl)
 			pl = NULL;
 	}
 
+	if (!pl && give_advice_on_expansion) {
+		give_advice_on_expansion = 0;
+		advise_if_enabled(ADVICE_SPARSE_INDEX_EXPANDED,
+				  _(ADVICE_MSG));
+	}
+
 	/*
 	 * A NULL pattern set indicates we are expanding a full index, so
 	 * we use a special region name that indicates the full expansion.
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2f1ae5fd3bc..a2c0e1b4dcc 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -159,7 +159,10 @@  init_repos () {
 	git -C sparse-checkout sparse-checkout set deep &&
 	git -C sparse-index sparse-checkout init --cone --sparse-index &&
 	test_cmp_config -C sparse-index true index.sparse &&
-	git -C sparse-index sparse-checkout set deep
+	git -C sparse-index sparse-checkout set deep &&
+
+	# Disable this message to keep stderr the same.
+	git -C sparse-index config advice.sparseIndexExpanded false
 }
 
 init_repos_as_submodules () {
@@ -2331,4 +2334,15 @@  test_expect_success 'sparse-index is not expanded: check-attr' '
 	ensure_not_expanded check-attr -a --cached -- folder1/a
 '
 
+test_expect_success 'advice.sparseIndexExpanded' '
+	init_repos &&
+
+	git -C sparse-index config --unset advice.sparseIndexExpanded &&
+	git -C sparse-index sparse-checkout set deep/deeper1 &&
+	mkdir -p sparse-index/deep/deeper2/deepest &&
+	touch sparse-index/deep/deeper2/deepest/bogus &&
+	git -C sparse-index status 2>err &&
+	grep "The sparse index is expanding to a full index" err
+'
+
 test_done