diff mbox series

[v2,1/3] packfile-uris: support for excluding commit object

Message ID 73e64147b17cb382d34357c913616095b6169650.1621327467.git.dyroneteng@gmail.com (mailing list archive)
State New, archived
Headers show
Series packfile-uris: commit objects exclusion | expand

Commit Message

Teng Long May 18, 2021, 8:49 a.m. UTC
On the server, more sophisticated means of excluding objects should be
supported, such as commit object. This commit introduces a new
configuration `uploadpack.excludeobject` for this.

The old configuration `uploadpack.blobpackfileuri` is only support to
exclude blobs and the name has no abstract meaning, so the configruation
name changes, to support more object types. Compatibility issues will
not be considered because packfile-uris now is an experimental feature.

In addition to the configuration name, the format of the configuration
value has also been expanded. When excluding the commits (or trees in
the future) objects, the old format `<object-hash> <pack-hash> <uri>`
can not express the meaning of recursion. So, the format is expanded,
the new format `<object-hash> <recursively> <pack-hash> <uri>` should
deal with this scenario (When processing commit objects, whether they
are absolutely recursively excluded).

Signed-off-by: Teng Long <dyroneteng@gmail.com>
---
 builtin/pack-objects.c | 53 ++++++++++++++++++++++++++++++------------
 fetch-pack.c           |  5 ++++
 upload-pack.c          |  5 ++--
 3 files changed, 45 insertions(+), 18 deletions(-)

Comments

Junio C Hamano May 19, 2021, 4:28 a.m. UTC | #1
Teng Long <dyroneteng@gmail.com> writes:

> On the server, more sophisticated means of excluding objects should be
> supported, such as commit object. This commit introduces a new
> configuration `uploadpack.excludeobject` for this.

This "should" is not justfied at all, it seems?  What is lacking in
what we already have?  What new things does it all us to do by
adding a new configuration variable?

> The old configuration `uploadpack.blobpackfileuri` is only support to
> exclude blobs and the name has no abstract meaning, so the configruation
> name changes, to support more object types. Compatibility issues will
> not be considered because packfile-uris now is an experimental feature.

I'll let Jonathan speak up, but even for an experimental feature,
whatever new and incompatible way to do things should have a clear
advantage compared to the old way.  Sell the backward incomptibility
along that line---"it is an experimental so I'll trash it" is not,
but "by doing this it gets this much better, and migrating existing
users won't be too taxing (it is just this simple thing)" is an
acceptable way to justify such a change.

Note that I am not opposed to the proposed change (and I am not
supporting it, either).  I do have a problem with the way the change
is sold, though.

>  builtin/pack-objects.c | 53 ++++++++++++++++++++++++++++++------------
>  fetch-pack.c           |  5 ++++
>  upload-pack.c          |  5 ++--
>  3 files changed, 45 insertions(+), 18 deletions(-)

Even though the name of the configuration variable changed, and the
semantics of the value of it changed, there is no documentation
change, because...?

Because the original didn't even document the variable properly?  It
may be another reason why changing it may not impact the existing
users too much.

> @@ -132,6 +134,7 @@ struct configured_exclusion {
>  	struct oidmap_entry e;
>  	char *pack_hash_hex;
>  	char *uri;
> +	int recursively:1;
>  };
>  static struct oidmap configured_exclusions;
>  
> @@ -1291,10 +1294,16 @@ static int want_object_in_pack_one(struct packed_git *p,
>   * and its offset in these variables.
>   */
>  static int want_object_in_pack(const struct object_id *oid,
> +			       enum object_type type,
>  			       int exclude,
>  			       struct packed_git **found_pack,
>  			       off_t *found_offset)
>  {
> +	if (exclude_until_next_commit && type != OBJ_COMMIT)
> +		return 0;
> +	if (type == OBJ_COMMIT)
> +		exclude_until_next_commit = 0 ;

Lose SP before the semicolon.

Our codebase does not allow statements before declarations.  Move
all of the above down to be below the block of decls at the
beginning of the function.

>  	int want;
>  	struct list_head *pos;
>  	struct multi_pack_index *m;

> @@ -1345,6 +1354,8 @@ static int want_object_in_pack(const struct object_id *oid,
>  						&p) &&
>  				    *p == ':') {
>  					oidset_insert(&excluded_by_config, oid);
> +					if(ex->recursively && type == OBJ_COMMIT)
> +						exclude_until_next_commit = 1;

This depends on a new file-scope global variable, which means two
things.

 * if two or more threads are deciding which object to pack and not
   to pack, this code will horribly break, as they are traversing
   totally different parts of the object DAG to find out which
   objects to pack, but one thread hitting a commit to be excluded
   and setting this flag will cause other thread skip unrelated
   blobs and trees that it discovers, doesn't it?

 * even if we assume there is no concurrency and reentrancy issues
   (e.g. by forcing single-threaded operation when this feature is
   in use), the code _assumes_ a concrete order in which this helper
   function gets called, namely, non-commit objects fed to this
   helper after the helper gets a single commit object *all* belong
   to that commit.  With the current code that feeds objects as they
   are discovered during depth first traversal of the top-level tree
   starting at each commit, that assumption might hold, but it feels
   that the assumption is too much to be healty.  For example, would
   it be possible for the bitmap code to cause this helper to be
   called in different order (i.e. it might find it more convenent
   to feed a tree, a blob or a tag that is unrelated to the commit
   that was last fed to the helper)?  If so, the logic in this code
   will constrain the caller too much.

I'll stop reading for now at this place; review of the remainder may
come at a later time, but not now.

Thanks.
Junio C Hamano May 20, 2021, 4:46 a.m. UTC | #2
(continuing from yesterday's review)

> @@ -3023,7 +3040,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  	struct rev_info *revs = _data;
>  	struct object_info oi = OBJECT_INFO_INIT;
>  	off_t ofs;
> -	enum object_type type;
> +	static enum object_type type;
>  
>  	display_progress(progress_state, ++nr_seen);
>  
> @@ -3031,7 +3048,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  		return 0;
>  
>  	ofs = nth_packed_object_offset(p, pos);
> -	if (!want_object_in_pack(oid, 0, &p, &ofs))
> +	if (!want_object_in_pack(oid, type, 0, &p, &ofs))
>  		return 0;
>  
>  	oi.typep = &type;

This change is puzzling.

The first call to this helper will use BSS initialized type to call
want_object_in_pack(), and then after that call says "yes, we want
that object", packed_object_info() gets called to learn what type it
is.  And the second call would use the type of the object we
previously handled, because type is a function scope static.  Or if
we are not that lucky and multiple threads are involved, the type we
pass is from a random and unrelated object some other thread has
called this helper with.
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6d13cd3e1a..e687061420 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -99,6 +99,8 @@  static struct bitmap *reuse_packfile_bitmap;
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
 static int allow_pack_reuse = 1;
+static int in_commit_order;
+static int exclude_until_next_commit;
 static enum {
 	WRITE_BITMAP_FALSE = 0,
 	WRITE_BITMAP_QUIET,
@@ -132,6 +134,7 @@  struct configured_exclusion {
 	struct oidmap_entry e;
 	char *pack_hash_hex;
 	char *uri;
+	int recursively:1;
 };
 static struct oidmap configured_exclusions;
 
@@ -1291,10 +1294,16 @@  static int want_object_in_pack_one(struct packed_git *p,
  * and its offset in these variables.
  */
 static int want_object_in_pack(const struct object_id *oid,
+			       enum object_type type,
 			       int exclude,
 			       struct packed_git **found_pack,
 			       off_t *found_offset)
 {
+	if (exclude_until_next_commit && type != OBJ_COMMIT)
+		return 0;
+	if (type == OBJ_COMMIT)
+		exclude_until_next_commit = 0 ;
+
 	int want;
 	struct list_head *pos;
 	struct multi_pack_index *m;
@@ -1345,6 +1354,8 @@  static int want_object_in_pack(const struct object_id *oid,
 						&p) &&
 				    *p == ':') {
 					oidset_insert(&excluded_by_config, oid);
+					if(ex->recursively && type == OBJ_COMMIT)
+						exclude_until_next_commit = 1;
 					return 0;
 				}
 			}
@@ -1394,7 +1405,7 @@  static int add_object_entry(const struct object_id *oid, enum object_type type,
 	if (have_duplicate_entry(oid, exclude))
 		return 0;
 
-	if (!want_object_in_pack(oid, exclude, &found_pack, &found_offset)) {
+	if (!want_object_in_pack(oid, type, exclude, &found_pack, &found_offset)) {
 		/* The pack is missing an object, so it will not have closure */
 		if (write_bitmap_index) {
 			if (write_bitmap_index != WRITE_BITMAP_QUIET)
@@ -1420,7 +1431,7 @@  static int add_object_entry_from_bitmap(const struct object_id *oid,
 	if (have_duplicate_entry(oid, 0))
 		return 0;
 
-	if (!want_object_in_pack(oid, 0, &pack, &offset))
+	if (!want_object_in_pack(oid, type, 0, &pack, &offset))
 		return 0;
 
 	create_object_entry(oid, type, name_hash, 0, 0, pack, offset);
@@ -2985,27 +2996,33 @@  static int git_pack_config(const char *k, const char *v, void *cb)
 			pack_idx_opts.flags &= ~WRITE_REV;
 		return 0;
 	}
-	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
+	if (!strcmp(k, "uploadpack.excludeobject")) {
 		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
-		const char *oid_end, *pack_end;
+		const char *oid_end, *pack_end, *recursively_end;
 		/*
 		 * Stores the pack hash. This is not a true object ID, but is
 		 * of the same form.
 		 */
 		struct object_id pack_hash;
-
+		char recursively[2];
 		if (parse_oid_hex(v, &ex->e.oid, &oid_end) ||
 		    *oid_end != ' ' ||
-		    parse_oid_hex(oid_end + 1, &pack_hash, &pack_end) ||
+		    !strlcpy(recursively, oid_end + 1, sizeof(recursively)) ||
+		    parse_oid_hex(oid_end + 3, &pack_hash, &pack_end) ||
 		    *pack_end != ' ')
-			die(_("value of uploadpack.blobpackfileuri must be "
-			      "of the form '<object-hash> <pack-hash> <uri>' (got '%s')"), v);
+                        die(_("value of uploadpack.excludeobject must be "
+                              "of the form '<object-hash> <recursively> <pack-hash> <uri>' (got '%s')"), v);
 		if (oidmap_get(&configured_exclusions, &ex->e.oid))
-			die(_("object already configured in another "
-			      "uploadpack.blobpackfileuri (got '%s')"), v);
-		ex->pack_hash_hex = xcalloc(1, pack_end - oid_end);
-		memcpy(ex->pack_hash_hex, oid_end + 1, pack_end - oid_end - 1);
+                        die(_("object already configured by an earlier "
+                              "uploadpack.excludeobject (got '%s')"), v);
+		recursively_end = oid_end + 2;
+		ex->pack_hash_hex = xcalloc(1, pack_end - recursively_end);
+		memcpy(ex->pack_hash_hex, recursively_end + 1, pack_end - recursively_end - 1);
 		ex->uri = xstrdup(pack_end + 1);
+		if (atoi(recursively)) {
+			ex->recursively = 1;
+			in_commit_order = 1;
+                }
 		oidmap_put(&configured_exclusions, ex);
 	}
 	return git_default_config(k, v, cb);
@@ -3023,7 +3040,7 @@  static int add_object_entry_from_pack(const struct object_id *oid,
 	struct rev_info *revs = _data;
 	struct object_info oi = OBJECT_INFO_INIT;
 	off_t ofs;
-	enum object_type type;
+	static enum object_type type;
 
 	display_progress(progress_state, ++nr_seen);
 
@@ -3031,7 +3048,7 @@  static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	ofs = nth_packed_object_offset(p, pos);
-	if (!want_object_in_pack(oid, 0, &p, &ofs))
+	if (!want_object_in_pack(oid, type, 0, &p, &ofs))
 		return 0;
 
 	oi.typep = &type;
@@ -3831,7 +3848,7 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("respect islands during delta compression")),
 		OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
 				N_("protocol"),
-				N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
+				N_("exclude any configured uploadpack.excludeobject with this protocol")),
 		OPT_END(),
 	};
 
@@ -3903,6 +3920,12 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		fetch_if_missing = 0;
 		strvec_push(&rp, "--exclude-promisor-objects");
 	}
+
+	if (in_commit_order){
+		use_internal_rev_list = 1;
+		strvec_push(&rp, "--in-commit-order");
+	}
+
 	if (unpack_unreachable || keep_unreachable || pack_loose_unreachable)
 		use_internal_rev_list = 1;
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 2318ebe680..cdf8777907 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -23,6 +23,7 @@ 
 #include "fetch-negotiator.h"
 #include "fsck.h"
 #include "shallow.h"
+#include "strmap.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -1677,6 +1678,8 @@  static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		}
 	}
 
+	struct strset uris;
+	strset_init(&uris);
 	for (i = 0; i < packfile_uris.nr; i++) {
 		int j;
 		struct child_process cmd = CHILD_PROCESS_INIT;
@@ -1684,6 +1687,8 @@  static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		const char *uri = packfile_uris.items[i].string +
 			the_hash_algo->hexsz + 1;
 
+		if (!strset_add(&uris, uri))
+			continue;
 		strvec_push(&cmd.args, "http-fetch");
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
diff --git a/upload-pack.c b/upload-pack.c
index 5c1cd19612..4d994658d2 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1745,9 +1745,8 @@  int upload_pack_advertise(struct repository *r,
 			strbuf_addstr(value, " sideband-all");
 
 		if (!repo_config_get_string(the_repository,
-					    "uploadpack.blobpackfileuri",
-					    &str) &&
-		    str) {
+					 "uploadpack.excludeobject",
+					 &str) && str) {
 			strbuf_addstr(value, " packfile-uris");
 			free(str);
 		}