Message ID | 20190110042551.915769-4-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tree-walk object_id refactor | expand |
On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote: > When we're splicing trees, we're writing directly from one location into > a buffer that is exactly the same size as a tree object. If the current > hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes > available to write, nor would we want to write that many bytes even if > we did. In a future commit, we'll split out hashcpy to respect > the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to > a hashcpy so we copy the right number of bytes. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > match-trees.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/match-trees.c b/match-trees.c > index feca48a5fd..b1fbd022d1 100644 > --- a/match-trees.c > +++ b/match-trees.c > @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, > } else { > rewrite_with = oid2; > } > - oidcpy(rewrite_here, rewrite_with); > + hashcpy(rewrite_here->hash, rewrite_with->hash); Hrm. Our coccinelle patches will want to convert this back to oidcpy(), though I see you drop them in the final patch. However, I wonder if it points to another mismatch. Isn't the point that we _don't_ actually have "struct object_id"s here? I.e., rewrite_here and rewrite_with should actually be "const unsigned char *" that we happen to know are the_hash_algo->raw_sz? I think the only reason they are "struct object_id" is because that's what tree_entry_extract() returns. Should we be providing another function to allow more byte-oriented access? -Peff
On Thu, Jan 10, 2019 at 01:45:20AM -0500, Jeff King wrote: > On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote: > > diff --git a/match-trees.c b/match-trees.c > > index feca48a5fd..b1fbd022d1 100644 > > --- a/match-trees.c > > +++ b/match-trees.c > > @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, > > } else { > > rewrite_with = oid2; > > } > > - oidcpy(rewrite_here, rewrite_with); > > + hashcpy(rewrite_here->hash, rewrite_with->hash); > > Hrm. Our coccinelle patches will want to convert this back to oidcpy(), > though I see you drop them in the final patch. > > However, I wonder if it points to another mismatch. Isn't the point that > we _don't_ actually have "struct object_id"s here? I.e., rewrite_here > and rewrite_with should actually be "const unsigned char *" that we > happen to know are the_hash_algo->raw_sz? Correct. > I think the only reason they are "struct object_id" is because that's > what tree_entry_extract() returns. Should we be providing another > function to allow more byte-oriented access? The reason is we recursively call splice_tree, passing rewrite_here as the first argument. That argument is then used for read_object_file, which requires a struct object_id * (because it is, logically, an object ID). I *think* we could fix this by copying an unsigned char *rewrite_here into a temporary struct object_id before we recurse, but it's not obvious to me if that's safe to do.
On Thu, Jan 10, 2019 at 11:55:51PM +0000, brian m. carlson wrote: > > I think the only reason they are "struct object_id" is because that's > > what tree_entry_extract() returns. Should we be providing another > > function to allow more byte-oriented access? > > The reason is we recursively call splice_tree, passing rewrite_here as > the first argument. That argument is then used for read_object_file, > which requires a struct object_id * (because it is, logically, an object > ID). > > I *think* we could fix this by copying an unsigned char *rewrite_here > into a temporary struct object_id before we recurse, but it's not > obvious to me if that's safe to do. I think rewrite_here needs to be a direct pointer into the buffer, since we plan to modify it. I think rewrite_with is correct to be an object_id. It's either the oid passed in from the caller, or the subtree we generated (in which case it's the result from write_object_file). So the "most correct" thing is probably something like this: diff --git a/match-trees.c b/match-trees.c index 03e81b56e1..129b13a970 100644 --- a/match-trees.c +++ b/match-trees.c @@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, char *buf; unsigned long sz; struct tree_desc desc; - struct object_id *rewrite_here; + unsigned char *rewrite_here; const struct object_id *rewrite_with; struct object_id subtree; enum object_type type; @@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)(desc.entry.path + - strlen(desc.entry.path) + - 1); + /* + * We cast here for two reasons: + * + * - to flip the "char *" (for the path) to "unsigned + * char *" (for the hash stored after it) + * + * - to discard the "const"; this is OK because we + * know it points into our non-const "buf" + */ + rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1; break; } update_tree_entry(&desc); @@ -224,7 +231,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, } else { rewrite_with = oid2; } - hashcpy(rewrite_here->hash, rewrite_with->hash); + hashcpy(rewrite_here, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status; I think if I were trying to write this in a less-subtle way, I'd probably stop trying to do it in-place, and have a copy loop more like: for entry in src_tree if match_entry(entry, prefix) entry = rewrite_entry(entry) /* either oid2 or subtree */ push_entry(dst_tree) We may even have to go that way eventually if we might ever be rewriting to a tree with a different hash size (i.e., there is a hidden assumption here that rewrite_here points to exactly the_hash_algo->rawsz bytes of hash). But I think for now it's not necessary, and it's way outside the scope of what you're trying to do here. -Peff
On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote: > I think rewrite_here needs to be a direct pointer into the buffer, since > we plan to modify it. > > I think rewrite_with is correct to be an object_id. It's either the oid > passed in from the caller, or the subtree we generated (in which case > it's the result from write_object_file). > > So the "most correct" thing is probably something like this: Of course, it would be nice if what I sent compiled. ;) rewrite_here does double duty: it's the pointer in the tree that we need to rewrite, and it's also the oid we pass down recursively. So we have to handle both cases, like so: diff --git a/match-trees.c b/match-trees.c index 03e81b56e1..3e0ed889b4 100644 --- a/match-trees.c +++ b/match-trees.c @@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, char *buf; unsigned long sz; struct tree_desc desc; - struct object_id *rewrite_here; + unsigned char *rewrite_here; const struct object_id *rewrite_with; struct object_id subtree; enum object_type type; @@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)(desc.entry.path + - strlen(desc.entry.path) + - 1); + /* + * We cast here for two reasons: + * + * - to flip the "char *" (for the path) to "unsigned + * char *" (for the hash stored after it) + * + * - to discard the "const"; this is OK because we + * know it points into our non-const "buf" + */ + rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1; break; } update_tree_entry(&desc); @@ -217,14 +224,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, die("entry %.*s not found in tree %s", toplen, prefix, oid_to_hex(oid1)); if (*subpath) { - status = splice_tree(rewrite_here, subpath, oid2, &subtree); + struct object_id tree_oid; + hashcpy(tree_oid.hash, rewrite_here); + status = splice_tree(&tree_oid, subpath, oid2, &subtree); if (status) return status; rewrite_with = &subtree; } else { rewrite_with = oid2; } - hashcpy(rewrite_here->hash, rewrite_with->hash); + hashcpy(rewrite_here, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status;
On Fri, Jan 11, 2019 at 09:54:46AM -0500, Jeff King wrote: > On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote: > > > I think rewrite_here needs to be a direct pointer into the buffer, since > > we plan to modify it. > > > > I think rewrite_with is correct to be an object_id. It's either the oid > > passed in from the caller, or the subtree we generated (in which case > > it's the result from write_object_file). > > > > So the "most correct" thing is probably something like this: > > Of course, it would be nice if what I sent compiled. ;) > > rewrite_here does double duty: it's the pointer in the tree that we need > to rewrite, and it's also the oid we pass down recursively. So we have > to handle both cases, like so: Since I took most of the patch you wrote, may I apply your sign-off to the resulting patch I send out?
On Mon, Jan 14, 2019 at 01:30:17AM +0000, brian m. carlson wrote: > > > So the "most correct" thing is probably something like this: > > > > Of course, it would be nice if what I sent compiled. ;) > > > > rewrite_here does double duty: it's the pointer in the tree that we need > > to rewrite, and it's also the oid we pass down recursively. So we have > > to handle both cases, like so: > > Since I took most of the patch you wrote, may I apply your sign-off to > the resulting patch I send out? Yes, definitely. -Peff
diff --git a/match-trees.c b/match-trees.c index feca48a5fd..b1fbd022d1 100644 --- a/match-trees.c +++ b/match-trees.c @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, } else { rewrite_with = oid2; } - oidcpy(rewrite_here, rewrite_with); + hashcpy(rewrite_here->hash, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status;
When we're splicing trees, we're writing directly from one location into a buffer that is exactly the same size as a tree object. If the current hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes available to write, nor would we want to write that many bytes even if we did. In a future commit, we'll split out hashcpy to respect the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to a hashcpy so we copy the right number of bytes. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- match-trees.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)