Message ID | 9f50f11d394e46ffbe348a579792c2b683096452.1629220124.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse Index: Integrate with merge, cherry-pick, rebase, and revert | expand |
On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > The diff_populate_filespec() method is used to describe the diff after a > merge operation is complete, especially when a conflict appears. In > order to avoid expanding a sparse index, the reuse_worktree_file() needs > to be adapted to ignore files that are outside of the sparse-checkout > cone. The file names and OIDs used for this check come from the merged > tree in the case of the ORT strategy, not the index, hence the ability > to look into these paths without having already expanded the index. I'm confused; I thought the diffstat was only shown if the merge was successful, in which case there would be no conflicts appearing. Also, I'm not that familiar with the general diff machinery (just the rename detection parts), but...if the diffstat only shows when the merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to HEAD)? Why would we make use of the working tree at all in such a case? And, wouldn't using the working tree be dangerous...what if there was a merge performed with a dirty working tree? On a bit of a tangent, I know diffcore-rename.c calls into diff_populate_filespec() as well, and I have some code doing merges in a bare repository (where there obviously is no index). It seemed to be working, but given this commit message, now I'm wondering if I've missed something fundamental either in that implementation or there's something amiss in this patch. Or both. Maybe I need to dig into diff_populate_filespec() more, but it seems you already have. Any pointers to orient me on why your changes are right here (and, if you know, why diffcore-rename.c should or shouldn't be using diff_populate_filespec() in a bare repo)? > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > diff.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/diff.c b/diff.c > index a8113f17070..c457cfa0e59 100644 > --- a/diff.c > +++ b/diff.c > @@ -26,6 +26,7 @@ > #include "parse-options.h" > #include "help.h" > #include "promisor-remote.h" > +#include "dir.h" > > #ifdef NO_FAST_WORKING_DIRECTORY > #define FAST_WORKING_DIRECTORY 0 > @@ -3900,6 +3901,13 @@ static int reuse_worktree_file(struct index_state *istate, > if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) > return 0; > > + /* > + * If this path does not match our sparse-checkout definition, > + * then the file will not be in the working directory. > + */ > + if (!path_in_sparse_checkout(name, istate)) > + return 0; > + > /* > * Similarly, if we'd have to convert the file contents anyway, that > * makes the optimization not worthwhile. > -- > gitgitgadget >
On 8/20/2021 5:32 PM, Elijah Newren wrote: > On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> The diff_populate_filespec() method is used to describe the diff after a >> merge operation is complete, especially when a conflict appears. In >> order to avoid expanding a sparse index, the reuse_worktree_file() needs >> to be adapted to ignore files that are outside of the sparse-checkout >> cone. The file names and OIDs used for this check come from the merged >> tree in the case of the ORT strategy, not the index, hence the ability >> to look into these paths without having already expanded the index. > > I'm confused; I thought the diffstat was only shown if the merge was > successful, in which case there would be no conflicts appearing. That's my mistake. I'll edit the message accordingly. > Also, I'm not that familiar with the general diff machinery (just the > rename detection parts), but...if the diffstat only shows when the > merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to > HEAD)? Why would we make use of the working tree at all in such a > case? And, wouldn't using the working tree be dangerous...what if > there was a merge performed with a dirty working tree? > > On a bit of a tangent, I know diffcore-rename.c calls into > diff_populate_filespec() as well, and I have some code doing merges in > a bare repository (where there obviously is no index). It seemed to > be working, but given this commit message, now I'm wondering if I've > missed something fundamental either in that implementation or there's > something amiss in this patch. Or both. Maybe I need to dig into > diff_populate_filespec() more, but it seems you already have. Any > pointers to orient me on why your changes are right here (and, if you > know, why diffcore-rename.c should or shouldn't be using > diff_populate_filespec() in a bare repo)? I think the cases you are thinking about are covered by this condition before the one I'm inserting: /* We want to avoid the working directory if our caller * doesn't need the data in a normal file, this system * is rather slow with its stat/open/mmap/close syscalls, * and the object is contained in a pack file. The pack * is probably already open and will be faster to obtain * the data through than the working directory. Loose * objects however would tend to be slower as they need * to be individually opened and inflated. */ if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) return 0; or after: /* * Similarly, if we'd have to convert the file contents anyway, that * makes the optimization not worthwhile. */ if (!want_file && would_convert_to_git(istate, name)) return 0; (This makes me think that I should move my new condition further down so these two can be linked by context.) Sounds like this is just an optimization, so it is fine to opt out of it if we think the optimization isn't necessary. Outside of the sparse-checkout cone qualifies. Thanks, -Stolee
On Tue, Aug 24, 2021 at 11:30 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 8/20/2021 5:32 PM, Elijah Newren wrote: > > On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> > >> From: Derrick Stolee <dstolee@microsoft.com> > >> > >> The diff_populate_filespec() method is used to describe the diff after a > >> merge operation is complete, especially when a conflict appears. In > >> order to avoid expanding a sparse index, the reuse_worktree_file() needs > >> to be adapted to ignore files that are outside of the sparse-checkout > >> cone. The file names and OIDs used for this check come from the merged > >> tree in the case of the ORT strategy, not the index, hence the ability > >> to look into these paths without having already expanded the index. > > > > I'm confused; I thought the diffstat was only shown if the merge was > > successful, in which case there would be no conflicts appearing. > > That's my mistake. I'll edit the message accordingly. > > > Also, I'm not that familiar with the general diff machinery (just the > > rename detection parts), but...if the diffstat only shows when the > > merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to > > HEAD)? Why would we make use of the working tree at all in such a > > case? And, wouldn't using the working tree be dangerous...what if > > there was a merge performed with a dirty working tree? > > > > On a bit of a tangent, I know diffcore-rename.c calls into > > diff_populate_filespec() as well, and I have some code doing merges in > > a bare repository (where there obviously is no index). It seemed to > > be working, but given this commit message, now I'm wondering if I've > > missed something fundamental either in that implementation or there's > > something amiss in this patch. Or both. Maybe I need to dig into > > diff_populate_filespec() more, but it seems you already have. Any > > pointers to orient me on why your changes are right here (and, if you > > know, why diffcore-rename.c should or shouldn't be using > > diff_populate_filespec() in a bare repo)? > > I think the cases you are thinking about are covered by this > condition before the one I'm inserting: > > /* We want to avoid the working directory if our caller > * doesn't need the data in a normal file, this system > * is rather slow with its stat/open/mmap/close syscalls, > * and the object is contained in a pack file. The pack > * is probably already open and will be faster to obtain > * the data through than the working directory. Loose > * objects however would tend to be slower as they need > * to be individually opened and inflated. > */ > if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) > return 0; > > or after: > > /* > * Similarly, if we'd have to convert the file contents anyway, that > * makes the optimization not worthwhile. > */ > if (!want_file && would_convert_to_git(istate, name)) > return 0; > > (This makes me think that I should move my new condition further down > so these two can be linked by context.) > > Sounds like this is just an optimization, so it is fine to opt out of it > if we think the optimization isn't necessary. Outside of the sparse-checkout > cone qualifies. Ah, this is very helpful. Thanks for digging up these details.
diff --git a/diff.c b/diff.c index a8113f17070..c457cfa0e59 100644 --- a/diff.c +++ b/diff.c @@ -26,6 +26,7 @@ #include "parse-options.h" #include "help.h" #include "promisor-remote.h" +#include "dir.h" #ifdef NO_FAST_WORKING_DIRECTORY #define FAST_WORKING_DIRECTORY 0 @@ -3900,6 +3901,13 @@ static int reuse_worktree_file(struct index_state *istate, if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) return 0; + /* + * If this path does not match our sparse-checkout definition, + * then the file will not be in the working directory. + */ + if (!path_in_sparse_checkout(name, istate)) + return 0; + /* * Similarly, if we'd have to convert the file contents anyway, that * makes the optimization not worthwhile.