Message ID | 20181015021900.1030041-14-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Base SHA-256 implementation | expand |
On 10/14/2018 10:19 PM, brian m. carlson wrote: > Since the commit-graph code wants to serialize the hash algorithm into > the data store, specify a version number for each supported algorithm. > Note that we don't use the values of the constants themselves, as they > are internal and could change in the future. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > commit-graph.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 7a28fbb03f..e587c21bb6 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > static uint8_t oid_version(void) > { > - return 1; > + switch (hash_algo_by_ptr(the_hash_algo)) { > + case GIT_HASH_SHA1: > + return 1; > + case GIT_HASH_SHA256: > + return 2; > + default: > + BUG("unknown hash algorithm"); > + } > } > > static struct commit_graph *alloc_commit_graph(void) Simple and easy! Thanks, -Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > Since the commit-graph code wants to serialize the hash algorithm into > the data store, specify a version number for each supported algorithm. > Note that we don't use the values of the constants themselves, as they > are internal and could change in the future. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > commit-graph.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 7a28fbb03f..e587c21bb6 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > static uint8_t oid_version(void) > { > - return 1; > + switch (hash_algo_by_ptr(the_hash_algo)) { > + case GIT_HASH_SHA1: > + return 1; > + case GIT_HASH_SHA256: > + return 2; > + default: > + BUG("unknown hash algorithm"); > + } Style: align switch/case. Shouldn't this be just using GIT_HASH_* constants instead? Are we expecting that it would be benefitial to allow more than one "oid version" even within a same GIT_HASH_*? > } > > static struct commit_graph *alloc_commit_graph(void)
On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > Since the commit-graph code wants to serialize the hash algorithm into > the data store, specify a version number for each supported algorithm. > Note that we don't use the values of the constants themselves, as they > are internal and could change in the future. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > commit-graph.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 7a28fbb03f..e587c21bb6 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > static uint8_t oid_version(void) > { > - return 1; > + switch (hash_algo_by_ptr(the_hash_algo)) { > + case GIT_HASH_SHA1: > + return 1; > + case GIT_HASH_SHA256: > + return 2; Should we just increase this field to uint32_t and store format_id instead? That will keep oid version unique in all data formats. > + default: > + BUG("unknown hash algorithm"); > + } > } > > static struct commit_graph *alloc_commit_graph(void)
On 10/16/2018 11:35 AM, Duy Nguyen wrote: > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson > <sandals@crustytoothpaste.net> wrote: >> Since the commit-graph code wants to serialize the hash algorithm into >> the data store, specify a version number for each supported algorithm. >> Note that we don't use the values of the constants themselves, as they >> are internal and could change in the future. >> >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> >> --- >> commit-graph.c | 9 ++++++++- >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/commit-graph.c b/commit-graph.c >> index 7a28fbb03f..e587c21bb6 100644 >> --- a/commit-graph.c >> +++ b/commit-graph.c >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) >> >> static uint8_t oid_version(void) >> { >> - return 1; >> + switch (hash_algo_by_ptr(the_hash_algo)) { >> + case GIT_HASH_SHA1: >> + return 1; >> + case GIT_HASH_SHA256: >> + return 2; > Should we just increase this field to uint32_t and store format_id > instead? That will keep oid version unique in all data formats. Both the commit-graph and multi-pack-index store a single byte for the hash version, so that ship has sailed (without incrementing the full file version number in each format). It may be good to make this method accessible to both formats. I'm not sure if Brian's branch is built on top of the multi-pack-index code. Probably best to see if ds/multi-pack-verify is in the history. Thanks, -Stolee
On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 10/16/2018 11:35 AM, Duy Nguyen wrote: > > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson > > <sandals@crustytoothpaste.net> wrote: > >> Since the commit-graph code wants to serialize the hash algorithm into > >> the data store, specify a version number for each supported algorithm. > >> Note that we don't use the values of the constants themselves, as they > >> are internal and could change in the future. > >> > >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > >> --- > >> commit-graph.c | 9 ++++++++- > >> 1 file changed, 8 insertions(+), 1 deletion(-) > >> > >> diff --git a/commit-graph.c b/commit-graph.c > >> index 7a28fbb03f..e587c21bb6 100644 > >> --- a/commit-graph.c > >> +++ b/commit-graph.c > >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > >> > >> static uint8_t oid_version(void) > >> { > >> - return 1; > >> + switch (hash_algo_by_ptr(the_hash_algo)) { > >> + case GIT_HASH_SHA1: > >> + return 1; > >> + case GIT_HASH_SHA256: > >> + return 2; > > Should we just increase this field to uint32_t and store format_id > > instead? That will keep oid version unique in all data formats. > Both the commit-graph and multi-pack-index store a single byte for the > hash version, so that ship has sailed (without incrementing the full > file version number in each format). And it's probably premature to add the oid version field when multiple hash support has not been fully realized. Now we have different ways of storing hash id and need separate mappings. I would go for incrementing file version. Otherwise maybe we just update format_id to be one byte instead, and the way of storing hash version in commit-graph will be used everywhere. > It may be good to make this method accessible to both formats. I'm not > sure if Brian's branch is built on top of the multi-pack-index code. > Probably best to see if ds/multi-pack-verify is in the history. > > Thanks, > -Stolee
On Tue, Oct 16, 2018 at 11:00:19AM +0900, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > Since the commit-graph code wants to serialize the hash algorithm into > > the data store, specify a version number for each supported algorithm. > > Note that we don't use the values of the constants themselves, as they > > are internal and could change in the future. > > > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > --- > > commit-graph.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/commit-graph.c b/commit-graph.c > > index 7a28fbb03f..e587c21bb6 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > > > static uint8_t oid_version(void) > > { > > - return 1; > > + switch (hash_algo_by_ptr(the_hash_algo)) { > > + case GIT_HASH_SHA1: > > + return 1; > > + case GIT_HASH_SHA256: > > + return 2; > > + default: > > + BUG("unknown hash algorithm"); > > + } > > Style: align switch/case. Will fix. > Shouldn't this be just using GIT_HASH_* constants instead? Are we > expecting that it would be benefitial to allow more than one "oid > version" even within a same GIT_HASH_*? I really would like to have us rely not rely explicitly on those values. We don't serialize them anywhere else, and they're explicitly documented as not being suitable for serialization. If we were writing this fresh today, we'd probably use the format_version field, which is designed for this purpose, or simply omit the field altogether.
On Tue, Oct 16, 2018 at 06:09:41PM +0200, Duy Nguyen wrote: > On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee <stolee@gmail.com> wrote: > > > > On 10/16/2018 11:35 AM, Duy Nguyen wrote: > > > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson > > > <sandals@crustytoothpaste.net> wrote: > > >> Since the commit-graph code wants to serialize the hash algorithm into > > >> the data store, specify a version number for each supported algorithm. > > >> Note that we don't use the values of the constants themselves, as they > > >> are internal and could change in the future. > > >> > > >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > >> --- > > >> commit-graph.c | 9 ++++++++- > > >> 1 file changed, 8 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/commit-graph.c b/commit-graph.c > > >> index 7a28fbb03f..e587c21bb6 100644 > > >> --- a/commit-graph.c > > >> +++ b/commit-graph.c > > >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > >> > > >> static uint8_t oid_version(void) > > >> { > > >> - return 1; > > >> + switch (hash_algo_by_ptr(the_hash_algo)) { > > >> + case GIT_HASH_SHA1: > > >> + return 1; > > >> + case GIT_HASH_SHA256: > > >> + return 2; > > > Should we just increase this field to uint32_t and store format_id > > > instead? That will keep oid version unique in all data formats. > > Both the commit-graph and multi-pack-index store a single byte for the > > hash version, so that ship has sailed (without incrementing the full > > file version number in each format). > > And it's probably premature to add the oid version field when multiple > hash support has not been fully realized. Now we have different ways > of storing hash id and need separate mappings. Honestly, anything in the .git directory that is not the v3 pack indexes or the loose object file should be in exactly one hash algorithm. We could simply just leave this value at 1 all the time and ignore the field, since we already know what algorithm it will use. > I would go for incrementing file version. Otherwise maybe we just > update format_id to be one byte instead, and the way of storing hash > version in commit-graph will be used everywhere. It needs to be four bytes for pack files so that it's four-byte aligned. Otherwise accessing pack index fields will cause alignment issues if we don't massively rewrite the pack handling code.
On 10/14/2018 10:19 PM, brian m. carlson wrote: > Since the commit-graph code wants to serialize the hash algorithm into > the data store, specify a version number for each supported algorithm. > Note that we don't use the values of the constants themselves, as they > are internal and could change in the future. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > commit-graph.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 7a28fbb03f..e587c21bb6 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > static uint8_t oid_version(void) > { > - return 1; > + switch (hash_algo_by_ptr(the_hash_algo)) { hash_algo_by_ptr is specified as an inline function in hash.h, so this leads to a linking error in jch and pu right now. I think the fix is simply to add '#include "hash.h"' to the list of includes. Thanks, -Stolee
On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > > >> static uint8_t oid_version(void) > > > >> { > > > >> - return 1; > > > >> + switch (hash_algo_by_ptr(the_hash_algo)) { > > > >> + case GIT_HASH_SHA1: > > > >> + return 1; > > > >> + case GIT_HASH_SHA256: > > > >> + return 2; > > > > Should we just increase this field to uint32_t and store format_id > > > > instead? That will keep oid version unique in all data formats. > > > Both the commit-graph and multi-pack-index store a single byte for the > > > hash version, so that ship has sailed (without incrementing the full > > > file version number in each format). > > > > And it's probably premature to add the oid version field when multiple > > hash support has not been fully realized. Now we have different ways > > of storing hash id and need separate mappings. > > Honestly, anything in the .git directory that is not the v3 pack indexes > or the loose object file should be in exactly one hash algorithm. We > could simply just leave this value at 1 all the time and ignore the > field, since we already know what algorithm it will use. In this particular case, I agree, but not as a general principle. It's nice to have independence for fsck-like tools. I don't know if we have a tool that simply validates commit-graph file format (and not trying to access any real object). But for such a tool, I guess we can just pass the hash algorithm from command line. The user would have to guess a bit.
On Wed, Oct 17, 2018 at 08:21:42AM -0400, Derrick Stolee wrote: > On 10/14/2018 10:19 PM, brian m. carlson wrote: > > Since the commit-graph code wants to serialize the hash algorithm into > > the data store, specify a version number for each supported algorithm. > > Note that we don't use the values of the constants themselves, as they > > are internal and could change in the future. > > > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > --- > > commit-graph.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/commit-graph.c b/commit-graph.c > > index 7a28fbb03f..e587c21bb6 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) > > static uint8_t oid_version(void) > > { > > - return 1; > > + switch (hash_algo_by_ptr(the_hash_algo)) { > hash_algo_by_ptr is specified as an inline function in hash.h, so this leads > to a linking error in jch and pu right now. > > I think the fix is simply to add '#include "hash.h"' to the list of > includes. hash.h is already included by cache.h, so it should be present. We probably need to make it static. I'll make that change; can you see if it fixes the problem for you?
On Wed, Oct 17, 2018 at 04:31:19PM +0200, Duy Nguyen wrote: > On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > Honestly, anything in the .git directory that is not the v3 pack indexes > > or the loose object file should be in exactly one hash algorithm. We > > could simply just leave this value at 1 all the time and ignore the > > field, since we already know what algorithm it will use. > > In this particular case, I agree, but not as a general principle. It's > nice to have independence for fsck-like tools. I don't know if we have > a tool that simply validates commit-graph file format (and not trying > to access any real object). But for such a tool, I guess we can just > pass the hash algorithm from command line. The user would have to > guess a bit. I'm going to drop this patch for now. I'll send a follow-up series later which bumps the format version for this and the multi-pack index and serializes them with the four-byte value. I probably should have caught this earlier, but unfortunately I don't always have the time to look at every series that hits the list.
On 10/17/2018 8:06 PM, brian m. carlson wrote: > On Wed, Oct 17, 2018 at 04:31:19PM +0200, Duy Nguyen wrote: >> On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson >> <sandals@crustytoothpaste.net> wrote: >>> Honestly, anything in the .git directory that is not the v3 pack indexes >>> or the loose object file should be in exactly one hash algorithm. We >>> could simply just leave this value at 1 all the time and ignore the >>> field, since we already know what algorithm it will use. >> In this particular case, I agree, but not as a general principle. It's >> nice to have independence for fsck-like tools. I don't know if we have >> a tool that simply validates commit-graph file format (and not trying >> to access any real object). But for such a tool, I guess we can just >> pass the hash algorithm from command line. The user would have to >> guess a bit. > I'm going to drop this patch for now. I'll send a follow-up series > later which bumps the format version for this and the multi-pack index > and serializes them with the four-byte value. I probably should have > caught this earlier, but unfortunately I don't always have the time to > look at every series that hits the list. We should coordinate before incrementing the version number. I was working on making the file formats incremental (think split-index) but couldn't come up with a way to do it without incrementing the file format. It would be best to combine these features into v2 of each format. Thanks, -Stolee
On Thu, Oct 18, 2018 at 09:03:22AM -0400, Derrick Stolee wrote: > We should coordinate before incrementing the version number. I was working > on making the file formats incremental (think split-index) but couldn't come > up with a way to do it without incrementing the file format. It would be > best to combine these features into v2 of each format. For the moment, I'm happy to leave things as they are, and I'll interpret version 1 of the hash version field as whatever hash is in use elsewhere in .git. If you're going to bump the version in the future, feel free to CC me on the patches, and we'll make sure that we serialize the format_version field in the file then.
diff --git a/commit-graph.c b/commit-graph.c index 7a28fbb03f..e587c21bb6 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir) static uint8_t oid_version(void) { - return 1; + switch (hash_algo_by_ptr(the_hash_algo)) { + case GIT_HASH_SHA1: + return 1; + case GIT_HASH_SHA256: + return 2; + default: + BUG("unknown hash algorithm"); + } } static struct commit_graph *alloc_commit_graph(void)
Since the commit-graph code wants to serialize the hash algorithm into the data store, specify a version number for each supported algorithm. Note that we don't use the values of the constants themselves, as they are internal and could change in the future. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- commit-graph.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)