Message ID | pull.1756.git.1720019679517.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | advice: warn when sparse index expands | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <stolee@gmail.com> > > Typically, forcing a sparse index to expand to a full index means that > Git could not determine the status of a file outside of the > sparse-checkout and needed to expand sparse trees into the full list of > sparse blobs. This operation can be very slow when the sparse-checkout > is much smaller than the full tree at HEAD. > > When users are in this state, it is common that 'git status' will report > the problem. Usually there is a modified or untracked file outside of > the sparse-checkout mentioned by the 'git status' output. There are a > number of reasons why this is insufficient: Nicely written to explain why giving an advice message is a good idea to cover this situation. Making it possible to squelch comes with no cost (once the code to do so is written), so I do not have a huge problem with the use of advise_if_enabled(), but I offhand do not know if the users would ever want to squelch it. Is this something that users would choose to say "yes, I know what I am doing is making my sparse working tree unusuably slow and I've heard how to whip my sparse working tree into a better shape already---please do not tell it to me ever again; because I need to leave these crufts outside the sparse cone anyway, I am willing to accept the unusually slow response, overhead, and wasted cycles and power" to? Other than that, nicely done.
On 7/3/24 2:16 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <stolee@gmail.com> >> >> Typically, forcing a sparse index to expand to a full index means that >> Git could not determine the status of a file outside of the >> sparse-checkout and needed to expand sparse trees into the full list of >> sparse blobs. This operation can be very slow when the sparse-checkout >> is much smaller than the full tree at HEAD. >> >> When users are in this state, it is common that 'git status' will report >> the problem. Usually there is a modified or untracked file outside of >> the sparse-checkout mentioned by the 'git status' output. There are a >> number of reasons why this is insufficient: > > Nicely written to explain why giving an advice message is a good > idea to cover this situation. > > Making it possible to squelch comes with no cost (once the code to > do so is written), so I do not have a huge problem with the use of > advise_if_enabled(), but I offhand do not know if the users would > ever want to squelch it. Is this something that users would choose > to say "yes, I know what I am doing is making my sparse working tree > unusuably slow and I've heard how to whip my sparse working tree > into a better shape already---please do not tell it to me ever > again; because I need to leave these crufts outside the sparse cone > anyway, I am willing to accept the unusually slow response, > overhead, and wasted cycles and power" to? I currently can't imagine a case where a user would want to disable this advice, but I defaulted to allowing it. I suppose it is more difficult to remove that option later, so I should have defaulted to not having it removable via config. I can send a v2 without the config option present. (I'll wait a day or two to see if others have strong opinions.) Thanks, -Stolee
On Wednesday, July 3, 2024 3:18 PM, Derrick Stolee wrote: >On 7/3/24 2:16 PM, Junio C Hamano wrote: >> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> >>> From: Derrick Stolee <stolee@gmail.com> >>> >>> Typically, forcing a sparse index to expand to a full index means >>> that Git could not determine the status of a file outside of the >>> sparse-checkout and needed to expand sparse trees into the full list >>> of sparse blobs. This operation can be very slow when the >>> sparse-checkout is much smaller than the full tree at HEAD. >>> >>> When users are in this state, it is common that 'git status' will >>> report the problem. Usually there is a modified or untracked file >>> outside of the sparse-checkout mentioned by the 'git status' output. >>> There are a number of reasons why this is insufficient: >> >> Nicely written to explain why giving an advice message is a good idea >> to cover this situation. >> >> Making it possible to squelch comes with no cost (once the code to do >> so is written), so I do not have a huge problem with the use of >> advise_if_enabled(), but I offhand do not know if the users would ever >> want to squelch it. Is this something that users would choose to say >> "yes, I know what I am doing is making my sparse working tree >> unusuably slow and I've heard how to whip my sparse working tree into >> a better shape already---please do not tell it to me ever again; >> because I need to leave these crufts outside the sparse cone anyway, I >> am willing to accept the unusually slow response, overhead, and wasted >> cycles and power" to? > >I currently can't imagine a case where a user would want to disable this advice, but I >defaulted to allowing it. I suppose it is more difficult to remove that option later, so I >should have defaulted to not having it removable via config. > >I can send a v2 without the config option present. (I'll wait a day or two to see if >others have strong opinions.) One possible use case is during a CI test of user code where they need to do something that causes the advice, but don't want to see the message in their logs. --Randall
Derrick Stolee <stolee@gmail.com> writes: > I can send a v2 without the config option present. (I'll wait a > day or two to see if others have strong opinions.) FWIW, that wasn't the direction I meant to lead the topic into. It is perfectly acceptable if some are much more often used than others among various advice.* configuration variables for squelching these messages. I was merely making an observation that this is likely to be a less used ones. Thanks.
On Wed, Jul 03, 2024 at 03:14:39PM +0000, Derrick Stolee via GitGitGadget wrote: > [ADVICE_SET_UPSTREAM_FAILURE] = { "setUpstreamFailure" }, > + [ADVICE_SPARSE_INDEX_EXPANDED] = { "sparseIndexExpanded" }, > [ADVICE_SKIPPED_CHERRY_PICKS] = { "skippedCherryPicks" }, Maybe this change will disappear in a possible v2, but if it remains you'll probably want to swap these two lines. > ADVICE_SET_UPSTREAM_FAILURE, > + ADVICE_SPARSE_INDEX_EXPANDED, > ADVICE_SKIPPED_CHERRY_PICKS, Ditto.
Hi, On Wed, Jul 3, 2024 at 8:14 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <stolee@gmail.com> > > Typically, forcing a sparse index to expand to a full index means that > Git could not determine the status of a file outside of the > sparse-checkout and needed to expand sparse trees into the full list of > sparse blobs. This operation can be very slow when the sparse-checkout > is much smaller than the full tree at HEAD. Yep, I'm with you here. > When users are in this state, it is common that 'git status' will report > the problem. I struggled to understand this sentence in combination with your later statements, though that may only be because I had some difficulty with later parts of the commit message. Perhaps addressing the later parts will make this sentence fine as-is, but it's possible this sentence could do with a bit more detail. > Usually there is a modified or untracked file outside of > the sparse-checkout mentioned by the 'git status' output. There are a > number of reasons why this is insufficient: Fair enough; let's focus on why the output of git status is insufficient... > 1. Users may not have a full understanding of which files are inside or > outside of their sparse-checkout. This is more common in monorepos > that manage the sparse-checkout using custom tools that map build > dependencies into sparse-checkout definitions. Having sparse-checkout patterns managed by custom tools is a really good point, but doesn't this statement of yours about needing to know particular files or directories suggest that... > diff --git a/sparse-index.c b/sparse-index.c > index e48e40cae71..1e517f696dd 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -12,6 +12,21 @@ > #include "config.h" > #include "dir.h" > #include "fsmonitor-ll.h" > +#include "advice.h" > + > +/** > + * This global is used by expand_index() to determine if we should give the > + * advice for advice.sparseIndexExpanded when expanding a sparse index to a full > + * one. However, this is sometimes done on purpose, such as in the sparse-checkout > + * builtin, even when index.sparse=false. This may be disabled in > + * convert_to_sparse(). > + */ > +static int give_advice_on_expansion = 1; > +#define ADVICE_MSG \ > + "The sparse index is expanding to a full index, a slow operation.\n" \ > + "This likely means that you have files in your working directory\n" \ > + "that are outside of your sparse-checkout patterns. Remove them\n" \ > + "to recover performance expectations, such as with 'git clean'." ...this is an insufficient solution? I was a bit surprised you'd list your first reason for git status being insufficient, that users need to know which files/directories are the problem and then provide a solution that doesn't attempt to identify any files or directories. > 2. In some cases, an empty directory could exist outside the > sparse-checkout and these empty directories are not reported by 'git > status' and friends. This is a really good point too...but given this point, shouldn't your added advice message also mention "directories" instead of just mentioning "files" so that users are aware they need to look for those too? > 3. If the user has '.gitignore' or 'exclude' files, then 'git status' > will squelch the warnings and not demonstrate any problems. Your solution does help the user to know that there is a problem (even if they don't know which files -- or directories -- are the problem), so this patch is making things better. Two other small comments... > diff --git a/advice.c b/advice.c > index 558a46fc0b3..7845e427c89 100644 > --- a/advice.c > +++ b/advice.c > @@ -77,6 +77,7 @@ static struct { > [ADVICE_RM_HINTS] = { "rmHints" }, > [ADVICE_SEQUENCER_IN_USE] = { "sequencerInUse" }, > [ADVICE_SET_UPSTREAM_FAILURE] = { "setUpstreamFailure" }, > + [ADVICE_SPARSE_INDEX_EXPANDED] = { "sparseIndexExpanded" }, > [ADVICE_SKIPPED_CHERRY_PICKS] = { "skippedCherryPicks" }, The rest of the list is in alphabetical order, so the new entry probably should be too. > [ADVICE_STATUS_AHEAD_BEHIND_WARNING] = { "statusAheadBehindWarning" }, > [ADVICE_STATUS_HINTS] = { "statusHints" }, > diff --git a/advice.h b/advice.h > index 5105d90129d..572272fa0da 100644 > --- a/advice.h > +++ b/advice.h > @@ -44,6 +44,7 @@ enum advice_type { > ADVICE_RM_HINTS, > ADVICE_SEQUENCER_IN_USE, > ADVICE_SET_UPSTREAM_FAILURE, > + ADVICE_SPARSE_INDEX_EXPANDED, > ADVICE_SKIPPED_CHERRY_PICKS, Again, the rest of the entries are in alphabetical order. Anyway, I've nit-picked on the little things I saw, but overall this is a good patch that moves things in a good direction; with a few small touch-ups it'll probably be good to go.
On 7/5/24 4:29 PM, Elijah Newren wrote: > On Wed, Jul 3, 2024 at 8:14 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <stolee@gmail.com> >> >> Typically, forcing a sparse index to expand to a full index means that >> Git could not determine the status of a file outside of the >> sparse-checkout and needed to expand sparse trees into the full list of >> sparse blobs. This operation can be very slow when the sparse-checkout >> is much smaller than the full tree at HEAD. > > Yep, I'm with you here. > >> When users are in this state, it is common that 'git status' will report >> the problem. > > I struggled to understand this sentence in combination with your later > statements, though that may only be because I had some difficulty with > later parts of the commit message. Perhaps addressing the later parts > will make this sentence fine as-is, but it's possible this sentence > could do with a bit more detail. I think also the issue with this sentence is that its thought isn't complete until also reading the next one. These can be combined to get to the point faster. >> Usually there is a modified or untracked file outside of >> the sparse-checkout mentioned by the 'git status' output. There are a >> number of reasons why this is insufficient: > > Fair enough; let's focus on why the output of git status is insufficient... > >> 1. Users may not have a full understanding of which files are inside or >> outside of their sparse-checkout. This is more common in monorepos >> that manage the sparse-checkout using custom tools that map build >> dependencies into sparse-checkout definitions. > > Having sparse-checkout patterns managed by custom tools is a really > good point, but doesn't this statement of yours about needing to know > particular files or directories suggest that... > >> +static int give_advice_on_expansion = 1; >> +#define ADVICE_MSG \ >> + "The sparse index is expanding to a full index, a slow operation.\n" \ >> + "This likely means that you have files in your working directory\n" \ >> + "that are outside of your sparse-checkout patterns. Remove them\n" \ >> + "to recover performance expectations, such as with 'git clean'." > > ...this is an insufficient solution? > > I was a bit surprised you'd list your first reason for git status > being insufficient, that users need to know which files/directories > are the problem and then provide a solution that doesn't attempt to > identify any files or directories. > >> 2. In some cases, an empty directory could exist outside the >> sparse-checkout and these empty directories are not reported by 'git >> status' and friends. > > This is a really good point too...but given this point, shouldn't your > added advice message also mention "directories" instead of just > mentioning "files" so that users are aware they need to look for those > too? > >> 3. If the user has '.gitignore' or 'exclude' files, then 'git status' >> will squelch the warnings and not demonstrate any problems. > > Your solution does help the user to know that there is a problem (even > if they don't know which files -- or directories -- are the problem), > so this patch is making things better. Thanks for pushing for a more helpful message. I'm currently thinking that the core issue is that the working directory has contents (files or directories) that disagree with the sparse-checkout definition. I will update the language in v2 to point the user in the direction of comparing the two. Thanks, -Stolee
diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt index fa612417568..0ba89898207 100644 --- a/Documentation/config/advice.txt +++ b/Documentation/config/advice.txt @@ -116,6 +116,10 @@ advice.*:: skippedCherryPicks:: Shown when linkgit:git-rebase[1] skips a commit that has already been cherry-picked onto the upstream branch. + sparseIndexExpanded:: + Shown when a sparse index is expanded to a full index, which is likely + due to an unexpected set of files existing outside of the + sparse-checkout. statusAheadBehind:: Shown when linkgit:git-status[1] computes the ahead/behind counts for a local ref compared to its remote tracking ref, diff --git a/advice.c b/advice.c index 558a46fc0b3..7845e427c89 100644 --- a/advice.c +++ b/advice.c @@ -77,6 +77,7 @@ static struct { [ADVICE_RM_HINTS] = { "rmHints" }, [ADVICE_SEQUENCER_IN_USE] = { "sequencerInUse" }, [ADVICE_SET_UPSTREAM_FAILURE] = { "setUpstreamFailure" }, + [ADVICE_SPARSE_INDEX_EXPANDED] = { "sparseIndexExpanded" }, [ADVICE_SKIPPED_CHERRY_PICKS] = { "skippedCherryPicks" }, [ADVICE_STATUS_AHEAD_BEHIND_WARNING] = { "statusAheadBehindWarning" }, [ADVICE_STATUS_HINTS] = { "statusHints" }, diff --git a/advice.h b/advice.h index 5105d90129d..572272fa0da 100644 --- a/advice.h +++ b/advice.h @@ -44,6 +44,7 @@ enum advice_type { ADVICE_RM_HINTS, ADVICE_SEQUENCER_IN_USE, ADVICE_SET_UPSTREAM_FAILURE, + ADVICE_SPARSE_INDEX_EXPANDED, ADVICE_SKIPPED_CHERRY_PICKS, ADVICE_STATUS_AHEAD_BEHIND_WARNING, ADVICE_STATUS_HINTS, diff --git a/sparse-index.c b/sparse-index.c index e48e40cae71..1e517f696dd 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -12,6 +12,21 @@ #include "config.h" #include "dir.h" #include "fsmonitor-ll.h" +#include "advice.h" + +/** + * This global is used by expand_index() to determine if we should give the + * advice for advice.sparseIndexExpanded when expanding a sparse index to a full + * one. However, this is sometimes done on purpose, such as in the sparse-checkout + * builtin, even when index.sparse=false. This may be disabled in + * convert_to_sparse(). + */ +static int give_advice_on_expansion = 1; +#define ADVICE_MSG \ + "The sparse index is expanding to a full index, a slow operation.\n" \ + "This likely means that you have files in your working directory\n" \ + "that are outside of your sparse-checkout patterns. Remove them\n" \ + "to recover performance expectations, such as with 'git clean'." struct modify_index_context { struct index_state *write; @@ -183,6 +198,12 @@ int convert_to_sparse(struct index_state *istate, int flags) !is_sparse_index_allowed(istate, flags)) return 0; + /* + * If we are purposefully collapsing a full index, then don't give + * advice when it is expanded later. + */ + give_advice_on_expansion = 0; + /* * NEEDSWORK: If we have unmerged entries, then stay full. * Unmerged entries prevent the cache-tree extension from working. @@ -328,6 +349,12 @@ void expand_index(struct index_state *istate, struct pattern_list *pl) pl = NULL; } + if (!pl && give_advice_on_expansion) { + give_advice_on_expansion = 0; + advise_if_enabled(ADVICE_SPARSE_INDEX_EXPANDED, + _(ADVICE_MSG)); + } + /* * A NULL pattern set indicates we are expanding a full index, so * we use a special region name that indicates the full expansion. diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 2f1ae5fd3bc..a2c0e1b4dcc 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -159,7 +159,10 @@ init_repos () { git -C sparse-checkout sparse-checkout set deep && git -C sparse-index sparse-checkout init --cone --sparse-index && test_cmp_config -C sparse-index true index.sparse && - git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout set deep && + + # Disable this message to keep stderr the same. + git -C sparse-index config advice.sparseIndexExpanded false } init_repos_as_submodules () { @@ -2331,4 +2334,15 @@ test_expect_success 'sparse-index is not expanded: check-attr' ' ensure_not_expanded check-attr -a --cached -- folder1/a ' +test_expect_success 'advice.sparseIndexExpanded' ' + init_repos && + + git -C sparse-index config --unset advice.sparseIndexExpanded && + git -C sparse-index sparse-checkout set deep/deeper1 && + mkdir -p sparse-index/deep/deeper2/deepest && + touch sparse-index/deep/deeper2/deepest/bogus && + git -C sparse-index status 2>err && + grep "The sparse index is expanding to a full index" err +' + test_done