diff mbox series

[3/5] match-trees: use hashcpy to splice trees

Message ID 20190110042551.915769-4-sandals@crustytoothpaste.net (mailing list archive)
State New, archived
Headers show
Series tree-walk object_id refactor | expand

Commit Message

brian m. carlson Jan. 10, 2019, 4:25 a.m. UTC
When we're splicing trees, we're writing directly from one location into
a buffer that is exactly the same size as a tree object. If the current
hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes
available to write, nor would we want to write that many bytes even if
we did. In a future commit, we'll split out hashcpy to respect
the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to
a hashcpy so we copy the right number of bytes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 match-trees.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeff King Jan. 10, 2019, 6:45 a.m. UTC | #1
On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote:

> When we're splicing trees, we're writing directly from one location into
> a buffer that is exactly the same size as a tree object. If the current
> hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes
> available to write, nor would we want to write that many bytes even if
> we did. In a future commit, we'll split out hashcpy to respect
> the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to
> a hashcpy so we copy the right number of bytes.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  match-trees.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/match-trees.c b/match-trees.c
> index feca48a5fd..b1fbd022d1 100644
> --- a/match-trees.c
> +++ b/match-trees.c
> @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
>  	} else {
>  		rewrite_with = oid2;
>  	}
> -	oidcpy(rewrite_here, rewrite_with);
> +	hashcpy(rewrite_here->hash, rewrite_with->hash);

Hrm. Our coccinelle patches will want to convert this back to oidcpy(),
though I see you drop them in the final patch.

However, I wonder if it points to another mismatch. Isn't the point that
we _don't_ actually have "struct object_id"s here? I.e., rewrite_here
and rewrite_with should actually be "const unsigned char *" that we
happen to know are the_hash_algo->raw_sz?

I think the only reason they are "struct object_id" is because that's
what tree_entry_extract() returns. Should we be providing another
function to allow more byte-oriented access?

-Peff
brian m. carlson Jan. 10, 2019, 11:55 p.m. UTC | #2
On Thu, Jan 10, 2019 at 01:45:20AM -0500, Jeff King wrote:
> On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote:
> > diff --git a/match-trees.c b/match-trees.c
> > index feca48a5fd..b1fbd022d1 100644
> > --- a/match-trees.c
> > +++ b/match-trees.c
> > @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
> >  	} else {
> >  		rewrite_with = oid2;
> >  	}
> > -	oidcpy(rewrite_here, rewrite_with);
> > +	hashcpy(rewrite_here->hash, rewrite_with->hash);
> 
> Hrm. Our coccinelle patches will want to convert this back to oidcpy(),
> though I see you drop them in the final patch.
> 
> However, I wonder if it points to another mismatch. Isn't the point that
> we _don't_ actually have "struct object_id"s here? I.e., rewrite_here
> and rewrite_with should actually be "const unsigned char *" that we
> happen to know are the_hash_algo->raw_sz?

Correct.

> I think the only reason they are "struct object_id" is because that's
> what tree_entry_extract() returns. Should we be providing another
> function to allow more byte-oriented access?

The reason is we recursively call splice_tree, passing rewrite_here as
the first argument. That argument is then used for read_object_file,
which requires a struct object_id * (because it is, logically, an object
ID).

I *think* we could fix this by copying an unsigned char *rewrite_here
into a temporary struct object_id before we recurse, but it's not
obvious to me if that's safe to do.
Jeff King Jan. 11, 2019, 2:51 p.m. UTC | #3
On Thu, Jan 10, 2019 at 11:55:51PM +0000, brian m. carlson wrote:

> > I think the only reason they are "struct object_id" is because that's
> > what tree_entry_extract() returns. Should we be providing another
> > function to allow more byte-oriented access?
> 
> The reason is we recursively call splice_tree, passing rewrite_here as
> the first argument. That argument is then used for read_object_file,
> which requires a struct object_id * (because it is, logically, an object
> ID).
> 
> I *think* we could fix this by copying an unsigned char *rewrite_here
> into a temporary struct object_id before we recurse, but it's not
> obvious to me if that's safe to do.

I think rewrite_here needs to be a direct pointer into the buffer, since
we plan to modify it.

I think rewrite_with is correct to be an object_id. It's either the oid
passed in from the caller, or the subtree we generated (in which case
it's the result from write_object_file).

So the "most correct" thing is probably something like this:

diff --git a/match-trees.c b/match-trees.c
index 03e81b56e1..129b13a970 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	char *buf;
 	unsigned long sz;
 	struct tree_desc desc;
-	struct object_id *rewrite_here;
+	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
 	struct object_id subtree;
 	enum object_type type;
@@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)(desc.entry.path +
-							    strlen(desc.entry.path) +
-							    1);
+			/*
+			 * We cast here for two reasons:
+			 *
+			 *   - to flip the "char *" (for the path) to "unsigned
+			 *     char *" (for the hash stored after it)
+			 *
+			 *   - to discard the "const"; this is OK because we
+			 *     know it points into our non-const "buf"
+			 */
+			rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1;
 			break;
 		}
 		update_tree_entry(&desc);
@@ -224,7 +231,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	} else {
 		rewrite_with = oid2;
 	}
-	hashcpy(rewrite_here->hash, rewrite_with->hash);
+	hashcpy(rewrite_here, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;

I think if I were trying to write this in a less-subtle way, I'd
probably stop trying to do it in-place, and have a copy loop more like:

  for entry in src_tree
    if match_entry(entry, prefix)
      entry = rewrite_entry(entry) /* either oid2 or subtree */
    push_entry(dst_tree)

We may even have to go that way eventually if we might ever be rewriting
to a tree with a different hash size (i.e., there is a hidden assumption
here that rewrite_here points to exactly the_hash_algo->rawsz bytes of
hash). But I think for now it's not necessary, and it's way outside the
scope of what you're trying to do here.

-Peff
Jeff King Jan. 11, 2019, 2:54 p.m. UTC | #4
On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote:

> I think rewrite_here needs to be a direct pointer into the buffer, since
> we plan to modify it.
> 
> I think rewrite_with is correct to be an object_id. It's either the oid
> passed in from the caller, or the subtree we generated (in which case
> it's the result from write_object_file).
> 
> So the "most correct" thing is probably something like this:

Of course, it would be nice if what I sent compiled. ;)

rewrite_here does double duty: it's the pointer in the tree that we need
to rewrite, and it's also the oid we pass down recursively. So we have
to handle both cases, like so:

diff --git a/match-trees.c b/match-trees.c
index 03e81b56e1..3e0ed889b4 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	char *buf;
 	unsigned long sz;
 	struct tree_desc desc;
-	struct object_id *rewrite_here;
+	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
 	struct object_id subtree;
 	enum object_type type;
@@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)(desc.entry.path +
-							    strlen(desc.entry.path) +
-							    1);
+			/*
+			 * We cast here for two reasons:
+			 *
+			 *   - to flip the "char *" (for the path) to "unsigned
+			 *     char *" (for the hash stored after it)
+			 *
+			 *   - to discard the "const"; this is OK because we
+			 *     know it points into our non-const "buf"
+			 */
+			rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1;
 			break;
 		}
 		update_tree_entry(&desc);
@@ -217,14 +224,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 		die("entry %.*s not found in tree %s", toplen, prefix,
 		    oid_to_hex(oid1));
 	if (*subpath) {
-		status = splice_tree(rewrite_here, subpath, oid2, &subtree);
+		struct object_id tree_oid;
+		hashcpy(tree_oid.hash, rewrite_here);
+		status = splice_tree(&tree_oid, subpath, oid2, &subtree);
 		if (status)
 			return status;
 		rewrite_with = &subtree;
 	} else {
 		rewrite_with = oid2;
 	}
-	hashcpy(rewrite_here->hash, rewrite_with->hash);
+	hashcpy(rewrite_here, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;
brian m. carlson Jan. 14, 2019, 1:30 a.m. UTC | #5
On Fri, Jan 11, 2019 at 09:54:46AM -0500, Jeff King wrote:
> On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote:
> 
> > I think rewrite_here needs to be a direct pointer into the buffer, since
> > we plan to modify it.
> > 
> > I think rewrite_with is correct to be an object_id. It's either the oid
> > passed in from the caller, or the subtree we generated (in which case
> > it's the result from write_object_file).
> > 
> > So the "most correct" thing is probably something like this:
> 
> Of course, it would be nice if what I sent compiled. ;)
> 
> rewrite_here does double duty: it's the pointer in the tree that we need
> to rewrite, and it's also the oid we pass down recursively. So we have
> to handle both cases, like so:

Since I took most of the patch you wrote, may I apply your sign-off to
the resulting patch I send out?
Jeff King Jan. 14, 2019, 3:40 p.m. UTC | #6
On Mon, Jan 14, 2019 at 01:30:17AM +0000, brian m. carlson wrote:

> > > So the "most correct" thing is probably something like this:
> > 
> > Of course, it would be nice if what I sent compiled. ;)
> > 
> > rewrite_here does double duty: it's the pointer in the tree that we need
> > to rewrite, and it's also the oid we pass down recursively. So we have
> > to handle both cases, like so:
> 
> Since I took most of the patch you wrote, may I apply your sign-off to
> the resulting patch I send out?

Yes, definitely.

-Peff
diff mbox series

Patch

diff --git a/match-trees.c b/match-trees.c
index feca48a5fd..b1fbd022d1 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -224,7 +224,7 @@  static int splice_tree(const struct object_id *oid1, const char *prefix,
 	} else {
 		rewrite_with = oid2;
 	}
-	oidcpy(rewrite_here, rewrite_with);
+	hashcpy(rewrite_here->hash, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;