Message ID | 20210423164118.693197-4-lukeshu@lukeshu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fast-export, fast-import: implement signed-commits | expand |
Luke Shumaker <lukeshu@lukeshu.com> writes: > From: Luke Shumaker <lukeshu@datawire.io> > > fast-export has an existing --signed-tags= flag that controls how to Don't call a command line option "a flag", especially when it is not a boolean. "has an existing" feels redundantly repeticious. > handle tag signatures. However, there is no equivalent for commit > signatures; it just silently strips the signature out of the commit > (analogously to --signed-tags=strip). > > While signatures are generally problematic for fast-export/fast-import > (because hashes are likely to change), if they're going to support tag > signatures, there's no reason to not also support commit signatures. > > So, implement signed-commits. That's misleading. You are not inventing "git commit --signed" here. So implement `--signed-commits=<disposition>` that mirrors the `--signed-tags=<disposition>` option. > +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort):: > + Specify how to handle signed commits. Behaves exactly as > + --signed-tags (but for commits), except that the default is > + 'warn-strip' rather than 'abort'. Why deliberate inconsistency? I am not sure "historically we did a wrong thing" is a good reason (if we view that silently stripping was a disservice to the users, aborting would be a bugfix). > diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt > index 458af0a2d6..4955c94305 100644 > --- a/Documentation/git-fast-import.txt > +++ b/Documentation/git-fast-import.txt > diff --git a/builtin/fast-export.c b/builtin/fast-export.c > index d121dd2ee6..2b1101d104 100644 > --- a/builtin/fast-export.c > +++ b/builtin/fast-export.c > @@ -30,8 +30,11 @@ static const char *fast_export_usage[] = { > NULL > }; > > +enum sign_mode { SIGN_ABORT, SIGN_VERBATIM, SIGN_STRIP, SIGN_VERBATIM_WARN, SIGN_STRIP_WARN }; > + > static int progress; > -static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT; Giving the enum values consistent prefix "SIGN_" is a great improvement. On the other hand, swapping the word order, e.g. WARN_STRIP to SIGN_STRIP_WARN, is unwarranted. > +static enum sign_mode signed_tag_mode = SIGN_ABORT; > +static enum sign_mode signed_commit_mode = SIGN_STRIP_WARN; I think it is safer to abort for both and sell it as a bugfix ("silently stripping commit signatures was wrong. we should abort the same way by default when encountering a signed tag"). > -static int parse_opt_signed_tag_mode(const struct option *opt, > +static int parse_opt_sign_mode(const struct option *opt, > const char *arg, int unset) > { > - if (unset || !strcmp(arg, "abort")) > - signed_tag_mode = SIGNED_TAG_ABORT; > + enum sign_mode *valptr = opt->value; > + if (unset) > + return 0; > + else if (!strcmp(arg, "abort")) > + *valptr = SIGN_ABORT; > else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore")) > - signed_tag_mode = VERBATIM; > + *valptr = SIGN_VERBATIM; Interesting and not a new issue at all, but "ignore" is a confusing symonym to "verbatim"---I would have expected "ignore", if accepted as a choice, would strip the signature. Not documenting it is probably good, but perhaps we would eventually remove it? > @@ -499,6 +505,60 @@ static void show_filemodify(struct diff_queue_struct *q, > } > } > > +static const char *find_signature(const char *begin, const char *end, const char *key) This is only for in-header signature used in commit objects, and not for the traditional "attached to the end" signature used in tag objects, right? The name of this function should be designed to answer the above question, but find_signature() that does not say either commit or tag implies it can accept both (which would be a horrible interface, though). If this is only for in-header signature, rename it to make sure that the fact is readable out of its name? > +{ > + static struct strbuf needle = STRBUF_INIT; > + char *bod, *eod, *eol; > + > + strbuf_reset(&needle); > + strbuf_addch(&needle, '\n'); > + strbuf_addstr(&needle, key); > + strbuf_addch(&needle, ' '); strbuf_addf(), perhaps? > + bod = memmem(begin, end ? end - begin : strlen(begin), > + needle.buf, needle.len); > + if (!bod) > + return NULL; > + bod += needle.len; > + > + /* > + * In the commit object, multi-line header values are stored > + * by prefixing continuation lines begin with a space. So "by prefixig continuation lines with a space" > + * within the commit object, it looks like > + * > + * "gpgsig -----BEGIN PGP SIGNATURE-----\n" > + * " Version: GnuPG v1.4.5 (GNU/Linux)\n" > + * " \n" > + * " base64_pem_here\n" > + * " -----END PGP SIGNATURE-----\n" > + * > + * So we need to look for the first '\n' that *isn't* followed > + * by a ' ' (or the first '\0', if no such '\n' exists). > + */ > + eod = strchrnul(bod, '\n'); > + while (eod[0] == '\n' && eod[1] == ' ') { > + eod = strchrnul(eod+1, '\n'); > + } SP on both sides of '+'; no {} around a block that consists of a single statement. > + *eod = '\0'; The begin and end pointers pointed to a piece of memory that is supposed to be read-only, but this pointer points into that region of memory and then updates a byte? The function signature is misleading---if you intend to muck with the string, accept them as mutable pointers. Better yet, don't butcher the region of memory pointed by the "message" variable the caller uses to keep reading from the remainder of the commit object buffer with this and memmove() below. Perhaps have the caller pass a strbuf to fill in the signature found by this helper as another parameter, and then return a bool "Yes, I found a sig" as its return value? > + > + /* > + * We now have the value as it's stored in the commit object. > + * However, we want the raw value; we want to return > + * > + * "-----BEGIN PGP SIGNATURE-----\n" > + * "Version: GnuPG v1.4.5 (GNU/Linux)\n" > + * "\n" > + * "base64_pem_here\n" > + * "-----END PGP SIGNATURE-----\n" > + * > + * So now we need to strip out all of those extra spaces. > + */ > + while ((eol = strstr(bod, "\n "))) > + memmove(eol+1, eol+2, strlen(eol+1)); Besides, this is O(n^2), isn't it, as it always starts scanning at bod while there are lines in the signature block to be processed, it needs to skip over the lines that the loop already has processed. I'd stop here for now, as there should be enough to polish. Thanks.
On Tue, 27 Apr 2021 22:02:47 -0600, Junio C Hamano wrote: > > Luke Shumaker <lukeshu@lukeshu.com> writes: > > > fast-export has an existing --signed-tags= flag that controls how to > > Don't call a command line option "a flag", especially when it is not > a boolean. Good to know, perhaps this should be mentioned in CodingGuidelines or SubmittingPatches.txt? I see lots of instances in the docs of "flag" being used. > "has an existing" feels redundantly repeticious. I guess I did this to make it clearer that that paragraph is describing the state of things before the patch, rather than after the patch. This is of course the required way of writing messages for git.git, but I worded it that way to make it clearer to reviewers that I'm following that requirement (especially since I haven't gotten a commit landed in git.git before). > > So, implement signed-commits. > > That's misleading. You are not inventing "git commit --signed" > here. > > So implement `--signed-commits=<disposition>` that mirrors the > `--signed-tags=<disposition>` option. Ack. > > +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort):: > > + Specify how to handle signed commits. Behaves exactly as > > + --signed-tags (but for commits), except that the default is > > + 'warn-strip' rather than 'abort'. > > Why deliberate inconsistency? I am not sure "historically we did a > wrong thing" is a good reason (if we view that silently stripping > was a disservice to the users, aborting would be a bugfix). I *almost* agree. I agree in principle, but disagree in practice because I know that it would break a bunch of existing tooling, including git-filter-repo. > > > > +enum sign_mode { SIGN_ABORT, SIGN_VERBATIM, SIGN_STRIP, SIGN_VERBATIM_WARN, SIGN_STRIP_WARN }; > > + > > static int progress; > > -static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT; > > Giving the enum values consistent prefix "SIGN_" is a great > improvement. On the other hand, swapping the word order, > e.g. WARN_STRIP to SIGN_STRIP_WARN, is unwarranted. Flipping it around made the switch statements read better, I thought. But I can change it back. > > +static enum sign_mode signed_tag_mode = SIGN_ABORT; > > +static enum sign_mode signed_commit_mode = SIGN_STRIP_WARN; > > I think it is safer to abort for both and sell it as a bugfix > ("silently stripping commit signatures was wrong. we should abort > the same way by default when encountering a signed tag"). > > > -static int parse_opt_signed_tag_mode(const struct option *opt, > > +static int parse_opt_sign_mode(const struct option *opt, > > const char *arg, int unset) > > { > > - if (unset || !strcmp(arg, "abort")) > > - signed_tag_mode = SIGNED_TAG_ABORT; > > + enum sign_mode *valptr = opt->value; > > + if (unset) > > + return 0; > > + else if (!strcmp(arg, "abort")) > > + *valptr = SIGN_ABORT; > > else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore")) > > - signed_tag_mode = VERBATIM; > > + *valptr = SIGN_VERBATIM; > > Interesting and not a new issue at all, but "ignore" is a confusing > symonym to "verbatim"---I would have expected "ignore", if accepted > as a choice, would strip the signature. Not documenting it is > probably good, but perhaps we would eventually remove it? Indeed, it was renamed from "ignore" to "verbatim" because "ignore" was such a confusing name. It was renamed (in ee4bc3715f (fast-export: rename the signed tag mode 'ignore' to 'verbatim', 2007-12-03)) by the original fast-export author, pretty much immediately after fast-export was originally introduced. There was never a released version of Git that had fast-export but didn't have the 'ignore'->'verbatim' rename (fast-export was first released in v1.5.4, and the rename was already present then). > > @@ -499,6 +505,60 @@ static void show_filemodify(struct diff_queue_struct *q, > > } > > } > > > > +static const char *find_signature(const char *begin, const char *end, const char *key) > > This is only for in-header signature used in commit objects, and not > for the traditional "attached to the end" signature used in tag > objects, right? > > The name of this function should be designed to answer the above > question, but find_signature() that does not say either commit or > tag implies it can accept both (which would be a horrible interface, > though). If this is only for in-header signature, rename it to make > sure that the fact is readable out of its name? > > > +{ > > + static struct strbuf needle = STRBUF_INIT; > > + char *bod, *eod, *eol; > > + > > + strbuf_reset(&needle); > > + strbuf_addch(&needle, '\n'); > > + strbuf_addstr(&needle, key); > > + strbuf_addch(&needle, ' '); > > strbuf_addf(), perhaps? Currently, strbuf_addf is only used by fast-export.c in the "anonymize_" functions. I took that to mean "avoid strbuf_addf if you can", figuring that fast-export and fast-import seem to go reasonably far out of their way to avoid dynamic things (like printf) in the happy-path. But I can change if it I'm mis-reading fast-export's paranoia. > > + bod = memmem(begin, end ? end - begin : strlen(begin), > > + needle.buf, needle.len); > > + if (!bod) > > + return NULL; > > + bod += needle.len; > > + > > + /* > > + * In the commit object, multi-line header values are stored > > + * by prefixing continuation lines begin with a space. So > > "by prefixig continuation lines with a space" Oops. > > + * within the commit object, it looks like > > + * > > + * "gpgsig -----BEGIN PGP SIGNATURE-----\n" > > + * " Version: GnuPG v1.4.5 (GNU/Linux)\n" > > + * " \n" > > + * " base64_pem_here\n" > > + * " -----END PGP SIGNATURE-----\n" > > + * > > + * So we need to look for the first '\n' that *isn't* followed > > + * by a ' ' (or the first '\0', if no such '\n' exists). > > + */ > > > + eod = strchrnul(bod, '\n'); > > + while (eod[0] == '\n' && eod[1] == ' ') { > > + eod = strchrnul(eod+1, '\n'); > > + } > > SP on both sides of '+'; no {} around a block that consists of a > single statement. Ack. > > + *eod = '\0'; > > The begin and end pointers pointed to a piece of memory that is > supposed to be read-only, but this pointer points into that region > of memory and then updates a byte? The function signature is > misleading---if you intend to muck with the string, accept them as > mutable pointers. > > Better yet, don't butcher the region of memory pointed by the > "message" variable the caller uses to keep reading from the > remainder of the commit object buffer with this and memmove() > below. Perhaps have the caller pass a strbuf to fill in the > signature found by this helper as another parameter, and then return > a bool "Yes, I found a sig" as its return value? That all sounds very sane, but I was mimicking the existing `find_encoding`. You aren't supposed to modify the memory from get_commit_buffer, but fast-export does anyway. I assume that Johannes knew what he was doing when he wrote it (that it's safe because fast-export never traverses the same object twice?) and that he did it as an allocation-avoiding optimization. Part of me thinks that it would be better to just use the standard functions for this, like read_commit_extra_headers or find_commit_header? > > + > > + /* > > + * We now have the value as it's stored in the commit object. > > + * However, we want the raw value; we want to return > > + * > > + * "-----BEGIN PGP SIGNATURE-----\n" > > + * "Version: GnuPG v1.4.5 (GNU/Linux)\n" > > + * "\n" > > + * "base64_pem_here\n" > > + * "-----END PGP SIGNATURE-----\n" > > + * > > + * So now we need to strip out all of those extra spaces. > > + */ > > + while ((eol = strstr(bod, "\n "))) > > + memmove(eol+1, eol+2, strlen(eol+1)); > > Besides, this is O(n^2), isn't it, as it always starts scanning at > bod while there are lines in the signature block to be processed, it > needs to skip over the lines that the loop already has processed. Indeed, I'm embarrassed that made it in to something I submitted.
On Thu, Apr 29, 2021 at 1:06 PM Luke Shumaker <lukeshu@lukeshu.com> wrote: > > On Tue, 27 Apr 2021 22:02:47 -0600, > Junio C Hamano wrote: > > > > Luke Shumaker <lukeshu@lukeshu.com> writes: > > > > > +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort):: > > > + Specify how to handle signed commits. Behaves exactly as > > > + --signed-tags (but for commits), except that the default is > > > + 'warn-strip' rather than 'abort'. > > > > Why deliberate inconsistency? I am not sure "historically we did a > > wrong thing" is a good reason (if we view that silently stripping > > was a disservice to the users, aborting would be a bugfix). > > I *almost* agree. I agree in principle, but disagree in practice > because I know that it would break a bunch of existing tooling, > including git-filter-repo. I understand that fast-export's behavior in the past matched what --signed-commits=warn-strip would now do, and thus you wanted to select it for backward compatibility. But throwing an error and making the user choose when they are potentially losing data seems like a safer choice to me. I do get that we might have to use warn-strip as the default anyway just because some existing tools might rely on it, but do you have any examples outside of git-filter-repo? Given the filter-repo bug reports I've gotten with users being surprised at commit signatures being stripped (despite the fact that this is documented -- users don't always read the documentation), I'd argue that changing to --signed-commits=abort as the default is probably a good bugfix for both fast-export and for filter-repo. Clearly, it'd probably make sense for filter-repo to also add an option for the user to select to: (0) abort if commit signatures are found, (1) strip commit signatures, (2) retain commit signatures even if they are invalid, or (3) only retain commit signatures if they are valid. In the past, we could only reasonably do (1). Your series makes (0) and (2) possible. More work in fast-import would be needed to make (3) a possibility, so I wouldn't be able to add it to filter-repo yet, but I could add the other options.
Elijah Newren <newren@gmail.com> writes: > I do get that we might have to use warn-strip as the default anyway > just because some existing tools might rely on it, but do you have any > examples outside of git-filter-repo? Given the filter-repo bug > reports I've gotten with users being surprised at commit signatures > being stripped (despite the fact that this is documented -- users > don't always read the documentation), I'd argue that changing to > --signed-commits=abort as the default is probably a good bugfix for > both fast-export and for filter-repo. Thanks. The "filter-repo already gets bug reports from the users" is a valuable input when deciding if it is reasonable to sell the behaviour change as a bugfix to our users. Perhaps teaching fast-export to pay attention to two environment variables that say "when no --signed-{tag,commit}=<disposition>" command line option is given, use this behaviour" would be a good enough escape hatch for existing tools and their users, while they are waiting for their tools to get updated with the new option you are planning to add? Also, I am glad that you brought up another possible behaviour that Luke's patch did not add. Exporting existing signatures that may become invalid and deciding what to do with them on the receiving end would be a good option to have. And that would most likely have to be done at "fast-import" end, as a commit that "fast-export" expected to retain its object name if its export stream were applied as-is may not retain the object name when the export stream gets preprocessed before being fed to "fast-import".
On Thu, Apr 29, 2021 at 4:42 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > I do get that we might have to use warn-strip as the default anyway > > just because some existing tools might rely on it, but do you have any > > examples outside of git-filter-repo? Given the filter-repo bug > > reports I've gotten with users being surprised at commit signatures > > being stripped (despite the fact that this is documented -- users > > don't always read the documentation), I'd argue that changing to > > --signed-commits=abort as the default is probably a good bugfix for > > both fast-export and for filter-repo. > > Thanks. The "filter-repo already gets bug reports from the users" > is a valuable input when deciding if it is reasonable to sell the > behaviour change as a bugfix to our users. > > Perhaps teaching fast-export to pay attention to two environment > variables that say "when no --signed-{tag,commit}=<disposition>" > command line option is given, use this behaviour" would be a good > enough escape hatch for existing tools and their users, while they > are waiting for their tools to get updated with the new option you > are planning to add? As far as git filter-repo is concerned, you can immediately introduce --signed-commit and give it a default value of abort with no deprecation period. filter-repo already has to check git versions for available command line options, so one more wouldn't hurt. And a default of "abort" seems more user friendly, as it gives users a chance to be aware of and handle their data appropriately. Of course, there are a few factors that make filter-repo more tolerant of upstream changes: I don't expect people to user filter-repo often (it's a once-in-a-blue-moon rewrite), I don't expect them to use it in automated processes, I tend to make releases that coincide in timing with git releases (so I'll just release a git-filter-repo 2.32.0 the day you release git 2.32, and it'll come with an option to handle this new default), and filter-repo includes the following disclaimer in its documentation: """ I assume that people use filter-repo for one-shot conversions, not ongoing data transfers. I explicitly reserve the right to change any API in filter-repo based on this presumption (and a comment to this effect is found in multiple places in the code and examples). You have been warned. """ So, if it's just for filter-repo, then I'd say just change fast-export's default now. If you're concerned with --signed-commit=abort being a changed default being too drastic for other users or tools, then the environment variable escape hatch sounds reasonable to me. Personally, I'm worried users are seeing "lost" data (though they don't notice it until weeks or months later) and are being surprised by it, which feels like a bigger issue to me than "my automated script isn't running anymore on this one repo, now I have to figure out what flag to use in order to choose whether I care about that data from that special repo being tossed or not". So I would bias towards throwing an error so users get a chance to handle it. > Also, I am glad that you brought up another possible behaviour that > Luke's patch did not add. Exporting existing signatures that may > become invalid and deciding what to do with them on the receiving > end would be a good option to have. And that would most likely have > to be done at "fast-import" end, as a commit that "fast-export" > expected to retain its object name if its export stream were applied > as-is may not retain the object name when the export stream gets > preprocessed before being fed to "fast-import". Right, but I'd go a step further: Even if the fast-export stream is not pre-processed before feeding to fast-import, you still cannot always expect to get the same object names when importing the stream. To see why, note that the fast-export stream has no way to encode tree information. So if trees in the original history deviated from "normal" in some fashion, such as not-quite-sorted entries, or non standard modes, then sending those objects through fast-export and fast-import will necessarily result in different object names. fast-export also may have modified other objects to normalize them, either because of default re-encoding of commit messages into UTF-8, because of stripping any unrecognized commit headers, or perhaps even because it'd truncate commit messages with an embedded NUL character. Combine all these "normalizations" that fast-export/fast-import do with the ability for users to process the stream from fast-export to fast-import and it becomes clear that the only stage in the pipeline that can check the validity of the gpg signatures for the imported history is the fast-import step.
Elijah Newren <newren@gmail.com> writes: > So, if it's just for filter-repo, then I'd say just change > fast-export's default now. If you're concerned with > --signed-commit=abort being a changed default being too drastic for > other users or tools, then the environment variable escape hatch > sounds reasonable to me. I wasn't specifically worried about any single tool. It is largely third-party's business, and my job is to make sure it won't be too hard for them to adjust to our changes. Even existing users of filter-repo would probably need such an escape hatch, as it may not necessarily be possible to update filter-repo at the same time they update Git. Unless filter-repo refuses to work with a version of Git that is newer than what it knows about (which is not quite how I would prepare a tool for external change, though), that is. > Combine all these "normalizations" that fast-export/fast-import do > with the ability for users to process the stream from fast-export to > fast-import and it becomes clear that the only stage in the pipeline > that can check the validity of the gpg signatures for the imported > history is the fast-import step. Yup. So I guess we two are in agreement wrt the "ideal world".
On Thu, 29 Apr 2021 17:42:24 -0600, Junio C Hamano wrote: > > Elijah Newren <newren@gmail.com> writes: > > > I do get that we might have to use warn-strip as the default anyway > > just because some existing tools might rely on it, but do you have any > > examples outside of git-filter-repo? Given the filter-repo bug > > reports I've gotten with users being surprised at commit signatures > > being stripped (despite the fact that this is documented -- users > > don't always read the documentation), I'd argue that changing to > > --signed-commits=abort as the default is probably a good bugfix for > > both fast-export and for filter-repo. > > Thanks. The "filter-repo already gets bug reports from the users" > is a valuable input when deciding if it is reasonable to sell the > behaviour change as a bugfix to our users. > > Perhaps teaching fast-export to pay attention to two environment > variables that say "when no --signed-{tag,commit}=<disposition>" > command line option is given, use this behaviour" would be a good > enough escape hatch for existing tools and their users, while they > are waiting for their tools to get updated with the new option you > are planning to add? Between Elijah being on-board with changing the default, and the suggested env-var escape hatch, you've won me over. I'll change the default to 'abort' and implement an env-var escape hatch. Any suggestions on how to name it? `FAST_EXPORT_SIGNED_COMMITS`? Should I give it a `GIT_` prefix? `FILTER_BRANCH_SQUELCH_WARNING` doesn't have a `GIT_` prefix... > Also, I am glad that you brought up another possible behaviour that > Luke's patch did not add. Exporting existing signatures that may > become invalid and deciding what to do with them on the receiving > end would be a good option to have. And that would most likely have > to be done at "fast-import" end, as a commit that "fast-export" > expected to retain its object name if its export stream were applied > as-is may not retain the object name when the export stream gets > preprocessed before being fed to "fast-import". Elijah suggested that on an earlier version of the patchset too. I agree that it's a splendid idea, but I'm not willing to be the one to do the work of implementing it... at least not in the next few months.
On Tue, 27 Apr 2021 22:02:47 -0600, Junio C Hamano wrote: > Better yet, don't butcher the region of memory pointed by the > "message" variable the caller uses to keep reading from the > remainder of the commit object buffer with this and memmove() > below. Perhaps have the caller pass a strbuf to fill in the > signature found by this helper as another parameter, and then return > a bool "Yes, I found a sig" as its return value? Stupid question: is there a better way to append a region of bytes to a strbuf than strbuf_addf(&buf, "%.*s", (int)(str_end - str_beg), str); ? It seems weird to me to invoke the printf machinery for something so simple, but I don't see anything alternatives in strbuf.h. Am I missing something?
On Fri, Apr 30, 2021 at 12:34 PM Luke Shumaker <lukeshu@lukeshu.com> wrote: > > On Tue, 27 Apr 2021 22:02:47 -0600, > Junio C Hamano wrote: > > Better yet, don't butcher the region of memory pointed by the > > "message" variable the caller uses to keep reading from the > > remainder of the commit object buffer with this and memmove() > > below. Perhaps have the caller pass a strbuf to fill in the > > signature found by this helper as another parameter, and then return > > a bool "Yes, I found a sig" as its return value? > > Stupid question: is there a better way to append a region of bytes to > a strbuf than > > strbuf_addf(&buf, "%.*s", (int)(str_end - str_beg), str); > > ? > > It seems weird to me to invoke the printf machinery for something so > simple, but I don't see anything alternatives in strbuf.h. Am I > missing something? I struggled to find it some time ago as well; I wonder if some reorganization of strbuf.[ch] might make it more clear. Anyway, strbuf_add() if you have the number of bytes already handy, strbuf_addstr() if you don't have the number of bytes handy but the string is NUL-delimited.
On Fri, 30 Apr 2021 13:59:43 -0600, Elijah Newren wrote: > > On Fri, Apr 30, 2021 at 12:34 PM Luke Shumaker <lukeshu@lukeshu.com> wrote: > > > > On Tue, 27 Apr 2021 22:02:47 -0600, > > Junio C Hamano wrote: > > > Better yet, don't butcher the region of memory pointed by the > > > "message" variable the caller uses to keep reading from the > > > remainder of the commit object buffer with this and memmove() > > > below. Perhaps have the caller pass a strbuf to fill in the > > > signature found by this helper as another parameter, and then return > > > a bool "Yes, I found a sig" as its return value? > > > > Stupid question: is there a better way to append a region of bytes to > > a strbuf than > > > > strbuf_addf(&buf, "%.*s", (int)(str_end - str_beg), str); > > > > ? > > > > It seems weird to me to invoke the printf machinery for something so > > simple, but I don't see anything alternatives in strbuf.h. Am I > > missing something? > > I struggled to find it some time ago as well; I wonder if some > reorganization of strbuf.[ch] might make it more clear. > > Anyway, strbuf_add() if you have the number of bytes already handy, > strbuf_addstr() if you don't have the number of bytes handy but the > string is NUL-delimited. Ah! I was looking for `char *`, but strbuf_add takes a `void *`, that's why I didn't find it. Thank you!
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index c307839e81..b651fa993b 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -41,6 +41,11 @@ see a warning. + `warn` is a deprecated synonym of `warn-verbatim`. +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort):: + Specify how to handle signed commits. Behaves exactly as + --signed-tags (but for commits), except that the default is + 'warn-strip' rather than 'abort'. + --tag-of-filtered-object=(abort|drop|rewrite):: Specify how to handle tags whose tagged object is filtered out. Since revisions and files to export can be limited by path, diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 458af0a2d6..4955c94305 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -431,12 +431,21 @@ and control the current import process. More detailed discussion Create or update a branch with a new commit, recording one logical change to the project. +//// +Yes, it's intentional that the 'gpgsig' line doesn't have a trailing +`LF`; the the definition of `data` has a byte-count prefix, so it +doesn't need an `LF` to act as a terminator (and `data` also already +includes an optional trailing `LF?` just in case you want to include +one). +//// + .... 'commit' SP <ref> LF mark? original-oid? ('author' (SP <name>)? SP LT <email> GT SP <when> LF)? 'committer' (SP <name>)? SP LT <email> GT SP <when> LF + ('gpgsig' SP <alg> LF data)? ('encoding' SP <encoding> LF)? data ('from' SP <commit-ish> LF)? @@ -505,6 +514,15 @@ that was selected by the --date-format=<fmt> command-line option. See ``Date Formats'' above for the set of supported formats, and their syntax. +`gpgsig` +^^^^^^^^ + +The optional `gpgsig` command is used to include a PGP/GPG signature +that signs the commit data. + +Here <alg> specifies which hashing algorithm is used for this +signature, either `sha1` or `sha256`. + `encoding` ^^^^^^^^^^ The optional `encoding` command indicates the encoding of the commit diff --git a/builtin/fast-export.c b/builtin/fast-export.c index d121dd2ee6..2b1101d104 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -30,8 +30,11 @@ static const char *fast_export_usage[] = { NULL }; +enum sign_mode { SIGN_ABORT, SIGN_VERBATIM, SIGN_STRIP, SIGN_VERBATIM_WARN, SIGN_STRIP_WARN }; + static int progress; -static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT; +static enum sign_mode signed_tag_mode = SIGN_ABORT; +static enum sign_mode signed_commit_mode = SIGN_STRIP_WARN; static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT; static enum { REENCODE_ABORT, REENCODE_YES, REENCODE_NO } reencode_mode = REENCODE_ABORT; static int fake_missing_tagger; @@ -48,21 +51,24 @@ static int anonymize; static struct hashmap anonymized_seeds; static struct revision_sources revision_sources; -static int parse_opt_signed_tag_mode(const struct option *opt, +static int parse_opt_sign_mode(const struct option *opt, const char *arg, int unset) { - if (unset || !strcmp(arg, "abort")) - signed_tag_mode = SIGNED_TAG_ABORT; + enum sign_mode *valptr = opt->value; + if (unset) + return 0; + else if (!strcmp(arg, "abort")) + *valptr = SIGN_ABORT; else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore")) - signed_tag_mode = VERBATIM; + *valptr = SIGN_VERBATIM; else if (!strcmp(arg, "warn-verbatim") || !strcmp(arg, "warn")) - signed_tag_mode = WARN; + *valptr = SIGN_VERBATIM_WARN; else if (!strcmp(arg, "warn-strip")) - signed_tag_mode = WARN_STRIP; + *valptr = SIGN_STRIP_WARN; else if (!strcmp(arg, "strip")) - signed_tag_mode = STRIP; + *valptr = SIGN_STRIP; else - return error("Unknown signed-tags mode: %s", arg); + return error("Unknown %s mode: %s", opt->long_name, arg); return 0; } @@ -499,6 +505,60 @@ static void show_filemodify(struct diff_queue_struct *q, } } +static const char *find_signature(const char *begin, const char *end, const char *key) +{ + static struct strbuf needle = STRBUF_INIT; + char *bod, *eod, *eol; + + strbuf_reset(&needle); + strbuf_addch(&needle, '\n'); + strbuf_addstr(&needle, key); + strbuf_addch(&needle, ' '); + + bod = memmem(begin, end ? end - begin : strlen(begin), + needle.buf, needle.len); + if (!bod) + return NULL; + bod += needle.len; + + /* + * In the commit object, multi-line header values are stored + * by prefixing continuation lines begin with a space. So + * within the commit object, it looks like + * + * "gpgsig -----BEGIN PGP SIGNATURE-----\n" + * " Version: GnuPG v1.4.5 (GNU/Linux)\n" + * " \n" + * " base64_pem_here\n" + * " -----END PGP SIGNATURE-----\n" + * + * So we need to look for the first '\n' that *isn't* followed + * by a ' ' (or the first '\0', if no such '\n' exists). + */ + eod = strchrnul(bod, '\n'); + while (eod[0] == '\n' && eod[1] == ' ') { + eod = strchrnul(eod+1, '\n'); + } + *eod = '\0'; + + /* + * We now have the value as it's stored in the commit object. + * However, we want the raw value; we want to return + * + * "-----BEGIN PGP SIGNATURE-----\n" + * "Version: GnuPG v1.4.5 (GNU/Linux)\n" + * "\n" + * "base64_pem_here\n" + * "-----END PGP SIGNATURE-----\n" + * + * So now we need to strip out all of those extra spaces. + */ + while ((eol = strstr(bod, "\n "))) + memmove(eol+1, eol+2, strlen(eol+1)); + + return bod; +} + static const char *find_encoding(const char *begin, const char *end) { const char *needle = "\nencoding "; @@ -621,6 +681,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, int saved_output_format = rev->diffopt.output_format; const char *commit_buffer; const char *author, *author_end, *committer, *committer_end; + const char *signature_alg = NULL, *signature; const char *encoding, *message; char *reencoded = NULL; struct commit_list *p; @@ -644,6 +705,10 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, committer++; committer_end = strchrnul(committer, '\n'); message = strstr(committer_end, "\n\n"); + if ((signature = find_signature(committer_end, message, "gpgsig"))) + signature_alg = "sha1"; + else if ((signature = find_signature(committer_end, message, "gpgsig-sha256"))) + signature_alg = "sha256"; encoding = find_encoding(committer_end, message); if (message) message += 2; @@ -703,6 +768,29 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, printf("%.*s\n%.*s\n", (int)(author_end - author), author, (int)(committer_end - committer), committer); + if (signature) + switch(signed_commit_mode) { + case SIGN_ABORT: + die("encountered signed commit %s", + oid_to_hex(&commit->object.oid)); + case SIGN_VERBATIM_WARN: + warning("exporting signed commit %s", + oid_to_hex(&commit->object.oid)); + /* fallthru */ + case SIGN_VERBATIM: + printf("gpgsig %s\ndata %u\n%s", + signature_alg, + (unsigned)strlen(signature), + signature); + break; + case SIGN_STRIP_WARN: + warning("stripping signature from commit %s; use" + "--signed-commits=<mode> to handle it differently", + oid_to_hex(&commit->object.oid)); + /* fallthru */ + case SIGN_STRIP: + break; + } if (!reencoded && encoding) printf("encoding %s\n", encoding); printf("data %u\n%s", @@ -830,21 +918,21 @@ static void handle_tag(const char *name, struct tag *tag) "\n-----BEGIN PGP SIGNATURE-----\n"); if (signature) switch(signed_tag_mode) { - case SIGNED_TAG_ABORT: + case SIGN_ABORT: die("encountered signed tag %s; use " "--signed-tags=<mode> to handle it", oid_to_hex(&tag->object.oid)); - case WARN: + case SIGN_VERBATIM_WARN: warning("exporting signed tag %s", oid_to_hex(&tag->object.oid)); /* fallthru */ - case VERBATIM: + case SIGN_VERBATIM: break; - case WARN_STRIP: + case SIGN_STRIP_WARN: warning("stripping signature from tag %s", oid_to_hex(&tag->object.oid)); /* fallthru */ - case STRIP: + case SIGN_STRIP: message_size = signature + 1 - message; break; } @@ -1197,7 +1285,10 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) N_("show progress after <n> objects")), OPT_CALLBACK(0, "signed-tags", &signed_tag_mode, N_("mode"), N_("select handling of signed tags"), - parse_opt_signed_tag_mode), + parse_opt_sign_mode), + OPT_CALLBACK(0, "signed-commits", &signed_commit_mode, N_("mode"), + N_("select handling of signed commits"), + parse_opt_sign_mode), OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"), N_("select handling of tags that tag filtered objects"), parse_opt_tag_of_filtered_mode), diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 3afa81cf9a..ee7516dd38 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -2669,10 +2669,13 @@ static struct hash_list *parse_merge(unsigned int *count) static void parse_new_commit(const char *arg) { + static struct strbuf sig = STRBUF_INIT; static struct strbuf msg = STRBUF_INIT; + struct string_list siglines = STRING_LIST_INIT_NODUP; struct branch *b; char *author = NULL; char *committer = NULL; + char *sig_alg = NULL; char *encoding = NULL; struct hash_list *merge_list = NULL; unsigned int merge_count; @@ -2696,6 +2699,13 @@ static void parse_new_commit(const char *arg) } if (!committer) die("Expected committer but didn't get one"); + if (skip_prefix(command_buf.buf, "gpgsig ", &v)) { + sig_alg = xstrdup(v); + read_next_command(); + parse_data(&sig, 0, NULL); + read_next_command(); + } else + strbuf_setlen(&sig, 0); if (skip_prefix(command_buf.buf, "encoding ", &v)) { encoding = xstrdup(v); read_next_command(); @@ -2769,10 +2779,23 @@ static void parse_new_commit(const char *arg) strbuf_addf(&new_data, "encoding %s\n", encoding); + if (sig_alg) { + if (!strcmp(sig_alg, "sha1")) + strbuf_addstr(&new_data, "gpgsig "); + else if (!strcmp(sig_alg, "sha256")) + strbuf_addstr(&new_data, "gpgsig-sha256 "); + else + die("Expected gpgsig algorithm sha1 or sha256, got %s", sig_alg); + string_list_split_in_place(&siglines, sig.buf, '\n', -1); + strbuf_add_separated_string_list(&new_data, "\n ", &siglines); + strbuf_addch(&new_data, '\n'); + } strbuf_addch(&new_data, '\n'); strbuf_addbuf(&new_data, &msg); + string_list_clear(&siglines, 1); free(author); free(committer); + free(sig_alg); free(encoding); if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark)) diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 892737439b..e686c3b894 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -8,6 +8,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. "$TEST_DIRECTORY/lib-gpg.sh" test_expect_success 'setup' ' @@ -284,9 +285,78 @@ test_expect_success 'signed-tags=warn-strip' ' test -s err ' +test_expect_success GPG 'set up signed commit' ' + + # Generate a commit with both "gpgsig" and "encoding" set, so + # that we can test that fast-import gets the ordering correct + # between the two. + test_config i18n.commitEncoding ISO-8859-1 && + git checkout -f -b commit-signing main && + echo Sign your name > file-sign && + git add file-sign && + git commit -S -m "signed commit" && + COMMIT_SIGNING=$(git rev-parse --verify commit-signing) + +' + +test_expect_success GPG 'signed-commits=abort' ' + + test_must_fail git fast-export --signed-commits=abort commit-signing + +' + +test_expect_success GPG 'signed-commits=verbatim' ' + + git fast-export --signed-commits=verbatim --reencode=no commit-signing >output && + grep "^gpgsig sha" output && + grep "encoding ISO-8859-1" output && + (cd new && + git fast-import && + test $COMMIT_SIGNING = $(git rev-parse --verify refs/heads/commit-signing)) <output + +' + +test_expect_success GPG 'signed-commits=warn-verbatim' ' + + git fast-export --signed-commits=warn-verbatim --reencode=no commit-signing >output 2>err && + grep "^gpgsig sha" output && + grep "encoding ISO-8859-1" output && + test -s err && + (cd new && + git fast-import && + test $COMMIT_SIGNING = $(git rev-parse --verify refs/heads/commit-signing)) <output + +' + +test_expect_success GPG 'signed-commits=strip' ' + + git fast-export --signed-commits=strip --reencode=no commit-signing >output && + ! grep ^gpgsig output && + grep "^encoding ISO-8859-1" output && + sed "s/commit-signing/commit-strip-signing/" output | + (cd new && + git fast-import && + test $COMMIT_SIGNING != $(git rev-parse --verify refs/heads/commit-strip-signing)) + +' + +test_expect_success GPG 'signed-commits=warn-strip' ' + + git fast-export --signed-commits=warn-strip --reencode=no commit-signing >output 2>err && + ! grep ^gpgsig output && + grep "^encoding ISO-8859-1" output && + test -s err && + sed "s/commit-signing/commit-strip-signing/" output | + (cd new && + git fast-import && + test $COMMIT_SIGNING != $(git rev-parse --verify refs/heads/commit-strip-signing)) + +' + test_expect_success 'setup submodule' ' git checkout -f main && + { git update-ref -d refs/heads/commit-signing || true; } && mkdir sub && ( cd sub &&