diff mbox series

[v2,05/20] sparse-index: implement ensure_full_index()

Message ID 399ddb0bad56c69ff9d9591f5e8eacf52cf50a15.1615404665.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse Index: Design, Format, Tests | expand

Commit Message

Derrick Stolee March 10, 2021, 7:30 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

We will mark an in-memory index_state as having sparse directory entries
with the sparse_index bit. These currently cannot exist, but we will add
a mechanism for collapsing a full index to a sparse one in a later
change. That will happen at write time, so we must first allow parsing
the format before writing it.

Commands or methods that require a full index in order to operate can
call ensure_full_index() to expand that index in-memory. This requires
parsing trees using that index's repository.

Sparse directory entries have a specific 'ce_mode' value. The macro
S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type.
This ce_mode is not possible with the existing index formats, so we don't
also verify all properties of a sparse-directory entry, which are:

 1. ce->ce_mode == 0040000
 2. ce->flags & CE_SKIP_WORKTREE is true
 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
 4. ce->oid references a tree object.

These are all semi-enforced in ensure_full_index() to some extent. Any
deviation will cause a warning at minimum or a failure in the worst
case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h        | 13 ++++++-
 read-cache.c   |  9 +++++
 sparse-index.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 115 insertions(+), 2 deletions(-)

Comments

Junio C Hamano March 12, 2021, 6:50 a.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  void ensure_full_index(struct index_state *istate)
>  {
> ...
> +	int i;
> +		tree = lookup_tree(istate->repo, &ce->oid);
> +
> +		memset(&ps, 0, sizeof(ps));
> +		ps.recursive = 1;
> +		ps.has_wildcard = 1;
> +		ps.max_depth = -1;
> +
> +		read_tree_recursive(istate->repo, tree,
> +				    ce->name, strlen(ce->name),
> +				    0, &ps,
> +				    add_path_to_index, full);

Ævar, the assumption that led to your e68237bb (tree.h API: remove
support for starting at prefix != "", 2021-03-08) closes the door
for this code rather badly.  Please work with Derrick to figure out
what the best course of action would be.

Thanks.

> +		/* free directory entries. full entries are re-used */
> +		discard_cache_entry(ce);
> +	}
> +
> +	/* Copy back into original index. */
> +	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
> +	istate->sparse_index = 0;
> +	free(istate->cache);
> +	istate->cache = full->cache;
> +	istate->cache_nr = full->cache_nr;
> +	istate->cache_alloc = full->cache_alloc;
> +
> +	free(full);
> +
> +	trace2_region_leave("index", "ensure_full_index", istate->repo);
>  }
Derrick Stolee March 12, 2021, 1:56 p.m. UTC | #2
On 3/12/2021 1:50 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  void ensure_full_index(struct index_state *istate)
>>  {
>> ...
>> +	int i;
>> +		tree = lookup_tree(istate->repo, &ce->oid);
>> +
>> +		memset(&ps, 0, sizeof(ps));
>> +		ps.recursive = 1;
>> +		ps.has_wildcard = 1;
>> +		ps.max_depth = -1;
>> +
>> +		read_tree_recursive(istate->repo, tree,
>> +				    ce->name, strlen(ce->name),
>> +				    0, &ps,
>> +				    add_path_to_index, full);
> 
> Ævar, the assumption that led to your e68237bb (tree.h API: remove
> support for starting at prefix != "", 2021-03-08) closes the door
> for this code rather badly.  Please work with Derrick to figure out
> what the best course of action would be.

Thanks for pointing this out, Junio.

My preference would be to drop "tree.h API: remove support for
starting at prefix != """, but it should be OK to keep "tree.h API:
remove "stage" parameter from read_tree_recursive()" (currently
b3a078863f6), even though it introduces a semantic conflict here.

Since I haven't seen my sparse-index topic get picked up by a
tracking branch, I'd be happy to rebase on top of Ævar's topic if
I can still set a non-root prefix.

Thanks,
-Stolee
Junio C Hamano March 12, 2021, 8:08 p.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

>> Ævar, the assumption that led to your e68237bb (tree.h API: remove
>> support for starting at prefix != "", 2021-03-08) closes the door
>> for this code rather badly.  Please work with Derrick to figure out
>> what the best course of action would be.
>
> Thanks for pointing this out, Junio.
>
> My preference would be to drop "tree.h API: remove support for
> starting at prefix != """, but it should be OK to keep "tree.h API:
> remove "stage" parameter from read_tree_recursive()" (currently
> b3a078863f6), even though it introduces a semantic conflict here.
>
> Since I haven't seen my sparse-index topic get picked up by a
> tracking branch, I'd be happy to rebase on top of Ævar's topic if
> I can still set a non-root prefix.

I did try to have both in 'seen' (after all, that is the primary way
I find out these conflicts early---no one can keep all the details
of all the topics in flight in one's head), and saw that we now have
a need for non-empty prefix that we thought we no longer have in the
other topic --- I think we should probably keep support of non-empty
prefix (as the primary reason why that patch exists is because we
saw no in-tree users---now if your 05/20 proves to be a good use of
the feature, there is one fewer reasons to remove the support) in
some form, so discarding e68237bb certainly is an option.


If we were to base the sparse-index topic on top of ab/read-tree, we
may be able to gain further simplification and clean-up of the API.

I think all the clean-up value e68237bb has are on the calling side
(they no longer have to pass constant ("", 0) to the function), and
we could rewrite e68237bb by

 - renaming "read_tree_recursive()" to "read_tree_at()", with the
   non-empty prefix support.

 - creating a new function "read_tree()", which lacks the support
   for prefix, as a thin-wrapper around "read_tree_at()".

 - modifying the callers of "read_tree_recursive()" changed by
   e68237bb to instead call "read_tree()" (without prefix).

to simplify majority of calling sites without losing functionality.

Then your [05/20] can use the read_tree_at() to read with a prefix.


But that kind of details, I'd want to see you two figure out
yourselves.

Thanks.
Derrick Stolee March 12, 2021, 8:11 p.m. UTC | #4
On 3/12/2021 3:08 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>>> Ævar, the assumption that led to your e68237bb (tree.h API: remove
>>> support for starting at prefix != "", 2021-03-08) closes the door
>>> for this code rather badly.  Please work with Derrick to figure out
>>> what the best course of action would be.
>>
>> Thanks for pointing this out, Junio.
>>
>> My preference would be to drop "tree.h API: remove support for
>> starting at prefix != """, but it should be OK to keep "tree.h API:
>> remove "stage" parameter from read_tree_recursive()" (currently
>> b3a078863f6), even though it introduces a semantic conflict here.
>>
>> Since I haven't seen my sparse-index topic get picked up by a
>> tracking branch, I'd be happy to rebase on top of Ævar's topic if
>> I can still set a non-root prefix.
> I think all the clean-up value e68237bb has are on the calling side
> (they no longer have to pass constant ("", 0) to the function), and
> we could rewrite e68237bb by
> 
>  - renaming "read_tree_recursive()" to "read_tree_at()", with the
>    non-empty prefix support.
> 
>  - creating a new function "read_tree()", which lacks the support
>    for prefix, as a thin-wrapper around "read_tree_at()".
> 
>  - modifying the callers of "read_tree_recursive()" changed by
>    e68237bb to instead call "read_tree()" (without prefix).
> 
> to simplify majority of calling sites without losing functionality.
> 
> Then your [05/20] can use the read_tree_at() to read with a prefix.
> 
> 
> But that kind of details, I'd want to see you two figure out
> yourselves.

You've given us a great proposal. I'll wait for Ævar to chime in
(and probably update his topic) before I submit a new version.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason March 15, 2021, 11:52 p.m. UTC | #5
On Fri, Mar 12 2021, Derrick Stolee wrote:

> On 3/12/2021 3:08 PM, Junio C Hamano wrote:
>> Derrick Stolee <stolee@gmail.com> writes:
>> 
>>>> Ævar, the assumption that led to your e68237bb (tree.h API: remove
>>>> support for starting at prefix != "", 2021-03-08) closes the door
>>>> for this code rather badly.  Please work with Derrick to figure out
>>>> what the best course of action would be.
>>>
>>> Thanks for pointing this out, Junio.
>>>
>>> My preference would be to drop "tree.h API: remove support for
>>> starting at prefix != """, but it should be OK to keep "tree.h API:
>>> remove "stage" parameter from read_tree_recursive()" (currently
>>> b3a078863f6), even though it introduces a semantic conflict here.
>>>
>>> Since I haven't seen my sparse-index topic get picked up by a
>>> tracking branch, I'd be happy to rebase on top of Ævar's topic if
>>> I can still set a non-root prefix.
>> I think all the clean-up value e68237bb has are on the calling side
>> (they no longer have to pass constant ("", 0) to the function), and
>> we could rewrite e68237bb by
>> 
>>  - renaming "read_tree_recursive()" to "read_tree_at()", with the
>>    non-empty prefix support.
>> 
>>  - creating a new function "read_tree()", which lacks the support
>>    for prefix, as a thin-wrapper around "read_tree_at()".
>> 
>>  - modifying the callers of "read_tree_recursive()" changed by
>>    e68237bb to instead call "read_tree()" (without prefix).
>> 
>> to simplify majority of calling sites without losing functionality.
>> 
>> Then your [05/20] can use the read_tree_at() to read with a prefix.
>> 
>> 
>> But that kind of details, I'd want to see you two figure out
>> yourselves.
>
> You've given us a great proposal. I'll wait for Ævar to chime in
> (and probably update his topic) before I submit a new version.

I've re-rolled my series just now at
https://lore.kernel.org/git/20210315234344.28427-1-avarab@gmail.com/
sorry for the delay.

You should be able to rebase easily on top of it, although note that the
new read_tree_at() uses a strbuf, but is otherwise the same as the old
read_tree_recursive().

Note that the pathspec can also be used to get to where
read_tree_recursive() would have brought you. I haven't looked at
whether there's reasons to convert in-tree (or this) code to pathspec
use, or vice-versa convert some things that use pathspecs now
(e.g. ls-tree with a path) to providing a prefix via the strbuf.
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index d92814961405..1f0b42264606 100644
--- a/cache.h
+++ b/cache.h
@@ -204,6 +204,8 @@  struct cache_entry {
 #error "CE_EXTENDED_FLAGS out of range"
 #endif
 
+#define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
+
 /* Forward structure decls */
 struct pathspec;
 struct child_process;
@@ -319,7 +321,14 @@  struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1;
+		 fsmonitor_has_run_once : 1,
+
+		 /*
+		  * sparse_index == 1 when sparse-directory
+		  * entries exist. Requires sparse-checkout
+		  * in cone mode.
+		  */
+		 sparse_index : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
@@ -722,6 +731,8 @@  int read_index_from(struct index_state *, const char *path,
 		    const char *gitdir);
 int is_index_unborn(struct index_state *);
 
+void ensure_full_index(struct index_state *istate);
+
 /* For use with `write_locked_index()`. */
 #define COMMIT_LOCK		(1 << 0)
 #define SKIP_IF_UNCHANGED	(1 << 1)
diff --git a/read-cache.c b/read-cache.c
index 29144cf879e7..97dbf2434f30 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -101,6 +101,9 @@  static const char *alternate_index_output;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		istate->sparse_index = 1;
+
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
 }
@@ -2255,6 +2258,12 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	trace2_data_intmax("index", the_repository, "read/cache_nr",
 			   istate->cache_nr);
 
+	if (!istate->repo)
+		istate->repo = the_repository;
+	prepare_repo_settings(istate->repo);
+	if (istate->repo->settings.command_requires_full_index)
+		ensure_full_index(istate);
+
 	return istate->cache_nr;
 
 unmap:
diff --git a/sparse-index.c b/sparse-index.c
index 82183ead563b..316cb949b74b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -1,8 +1,101 @@ 
 #include "cache.h"
 #include "repository.h"
 #include "sparse-index.h"
+#include "tree.h"
+#include "pathspec.h"
+#include "trace2.h"
+
+static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
+{
+	ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc);
+
+	istate->cache[nr] = ce;
+	add_name_hash(istate, ce);
+}
+
+static int add_path_to_index(const struct object_id *oid,
+				struct strbuf *base, const char *path,
+				unsigned int mode, int stage, void *context)
+{
+	struct index_state *istate = (struct index_state *)context;
+	struct cache_entry *ce;
+	size_t len = base->len;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_addstr(base, path);
+
+	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
+	ce->ce_flags |= CE_SKIP_WORKTREE;
+	set_index_entry(istate, istate->cache_nr++, ce);
+
+	strbuf_setlen(base, len);
+	return 0;
+}
 
 void ensure_full_index(struct index_state *istate)
 {
-	/* intentionally left blank */
+	int i;
+	struct index_state *full;
+
+	if (!istate || !istate->sparse_index)
+		return;
+
+	if (!istate->repo)
+		istate->repo = the_repository;
+
+	trace2_region_enter("index", "ensure_full_index", istate->repo);
+
+	/* initialize basics of new index */
+	full = xcalloc(1, sizeof(struct index_state));
+	memcpy(full, istate, sizeof(struct index_state));
+
+	/* then change the necessary things */
+	full->sparse_index = 0;
+	full->cache_alloc = (3 * istate->cache_alloc) / 2;
+	full->cache_nr = 0;
+	ALLOC_ARRAY(full->cache, full->cache_alloc);
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		struct tree *tree;
+		struct pathspec ps;
+
+		if (!S_ISSPARSEDIR(ce->ce_mode)) {
+			set_index_entry(full, full->cache_nr++, ce);
+			continue;
+		}
+		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
+			warning(_("index entry is a directory, but not sparse (%08x)"),
+				ce->ce_flags);
+
+		/* recursively walk into cd->name */
+		tree = lookup_tree(istate->repo, &ce->oid);
+
+		memset(&ps, 0, sizeof(ps));
+		ps.recursive = 1;
+		ps.has_wildcard = 1;
+		ps.max_depth = -1;
+
+		read_tree_recursive(istate->repo, tree,
+				    ce->name, strlen(ce->name),
+				    0, &ps,
+				    add_path_to_index, full);
+
+		/* free directory entries. full entries are re-used */
+		discard_cache_entry(ce);
+	}
+
+	/* Copy back into original index. */
+	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
+	istate->sparse_index = 0;
+	free(istate->cache);
+	istate->cache = full->cache;
+	istate->cache_nr = full->cache_nr;
+	istate->cache_alloc = full->cache_alloc;
+
+	free(full);
+
+	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }