diff mbox series

[v3,3/3] fast-export, fast-import: implement signed-commits

Message ID 20210423164118.693197-4-lukeshu@lukeshu.com (mailing list archive)
State New, archived
Headers show
Series fast-export, fast-import: implement signed-commits | expand

Commit Message

Luke Shumaker April 23, 2021, 4:41 p.m. UTC
From: Luke Shumaker <lukeshu@datawire.io>

fast-export has an existing --signed-tags= flag that controls how to
handle tag signatures.  However, there is no equivalent for commit
signatures; it just silently strips the signature out of the commit
(analogously to --signed-tags=strip).

While signatures are generally problematic for fast-export/fast-import
(because hashes are likely to change), if they're going to support tag
signatures, there's no reason to not also support commit signatures.

So, implement signed-commits.

On the fast-export side, try to be as much like signed-tags as possible,
in both implementation and in user-interface; with the exception that
the default should be `--signed-commits=warn-strip` (compared to the
default `--signed-tags=abort`), in order to avoid breaking the
historical behavior (it will now print a warning while doing that
behavior, though).

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
---
v2:
 - Remove erroneous remark about ordering from the commit message.
 - Adjust the stream syntax to include the hash algorithm, as
   suggested by brian.
 - Add support for sha256 (based on lots of useful information from
   brian).  It does not support multiply-signed commits.
 - Shorten the documentation, based on feedback from Taylor.
 - Add comments, based on feedback from Taylor.
 - Change the default from `--signed-commits=strip` to
   `--signed-commits=warn-strip`.  This shouldn't break anyone, and
   means that users get useful feedback by default.
v3: no changes

 Documentation/git-fast-export.txt |   5 ++
 Documentation/git-fast-import.txt |  18 +++++
 builtin/fast-export.c             | 121 ++++++++++++++++++++++++++----
 builtin/fast-import.c             |  23 ++++++
 t/t9350-fast-export.sh            |  70 +++++++++++++++++
 5 files changed, 222 insertions(+), 15 deletions(-)

Comments

Junio C Hamano April 28, 2021, 4:02 a.m. UTC | #1
Luke Shumaker <lukeshu@lukeshu.com> writes:

> From: Luke Shumaker <lukeshu@datawire.io>
>
> fast-export has an existing --signed-tags= flag that controls how to

Don't call a command line option "a flag", especially when it is not
a boolean.

"has an existing" feels redundantly repeticious.

> handle tag signatures.  However, there is no equivalent for commit
> signatures; it just silently strips the signature out of the commit
> (analogously to --signed-tags=strip).
>
> While signatures are generally problematic for fast-export/fast-import
> (because hashes are likely to change), if they're going to support tag
> signatures, there's no reason to not also support commit signatures.
>
> So, implement signed-commits.

That's misleading.  You are not inventing "git commit --signed"
here.

    So implement `--signed-commits=<disposition>` that mirrors the
    `--signed-tags=<disposition>` option.

> +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort)::
> +	Specify how to handle signed commits.  Behaves exactly as
> +	--signed-tags (but for commits), except that the default is
> +	'warn-strip' rather than 'abort'.

Why deliberate inconsistency?  I am not sure "historically we did a
wrong thing" is a good reason (if we view that silently stripping
was a disservice to the users, aborting would be a bugfix).

> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 458af0a2d6..4955c94305 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index d121dd2ee6..2b1101d104 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -30,8 +30,11 @@ static const char *fast_export_usage[] = {
>  	NULL
>  };
>  
> +enum sign_mode { SIGN_ABORT, SIGN_VERBATIM, SIGN_STRIP, SIGN_VERBATIM_WARN, SIGN_STRIP_WARN };
> +
>  static int progress;
> -static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;

Giving the enum values consistent prefix "SIGN_" is a great
improvement.  On the other hand, swapping the word order,
e.g. WARN_STRIP to SIGN_STRIP_WARN, is unwarranted.

> +static enum sign_mode signed_tag_mode = SIGN_ABORT;
> +static enum sign_mode signed_commit_mode = SIGN_STRIP_WARN;

I think it is safer to abort for both and sell it as a bugfix
("silently stripping commit signatures was wrong. we should abort
the same way by default when encountering a signed tag").

> -static int parse_opt_signed_tag_mode(const struct option *opt,
> +static int parse_opt_sign_mode(const struct option *opt,
>  				     const char *arg, int unset)
>  {
> -	if (unset || !strcmp(arg, "abort"))
> -		signed_tag_mode = SIGNED_TAG_ABORT;
> +	enum sign_mode *valptr = opt->value;
> +	if (unset)
> +		return 0;
> +	else if (!strcmp(arg, "abort"))
> +		*valptr = SIGN_ABORT;
>  	else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore"))
> -		signed_tag_mode = VERBATIM;
> +		*valptr = SIGN_VERBATIM;

Interesting and not a new issue at all, but "ignore" is a confusing
symonym to "verbatim"---I would have expected "ignore", if accepted
as a choice, would strip the signature.  Not documenting it is
probably good, but perhaps we would eventually remove it?

> @@ -499,6 +505,60 @@ static void show_filemodify(struct diff_queue_struct *q,
>  	}
>  }
>  
> +static const char *find_signature(const char *begin, const char *end, const char *key)

This is only for in-header signature used in commit objects, and not
for the traditional "attached to the end" signature used in tag
objects, right?

The name of this function should be designed to answer the above
question, but find_signature() that does not say either commit or
tag implies it can accept both (which would be a horrible interface,
though).  If this is only for in-header signature, rename it to make
sure that the fact is readable out of its name?

> +{
> +	static struct strbuf needle = STRBUF_INIT;
> +	char *bod, *eod, *eol;
> +
> +	strbuf_reset(&needle);
> +	strbuf_addch(&needle, '\n');
> +	strbuf_addstr(&needle, key);
> +	strbuf_addch(&needle, ' ');

strbuf_addf(), perhaps?

> +	bod = memmem(begin, end ? end - begin : strlen(begin),
> +		     needle.buf, needle.len);
> +	if (!bod)
> +		return NULL;
> +	bod += needle.len;
> +
> +	/*
> +	 * In the commit object, multi-line header values are stored
> +	 * by prefixing continuation lines begin with a space.  So

"by prefixig continuation lines with a space"

> +	 * within the commit object, it looks like
> +	 *
> +	 *     "gpgsig -----BEGIN PGP SIGNATURE-----\n"
> +	 *     " Version: GnuPG v1.4.5 (GNU/Linux)\n"
> +	 *     " \n"
> +	 *     " base64_pem_here\n"
> +	 *     " -----END PGP SIGNATURE-----\n"
> +	 *
> +	 * So we need to look for the first '\n' that *isn't* followed
> +	 * by a ' ' (or the first '\0', if no such '\n' exists).
> +	 */

> +	eod = strchrnul(bod, '\n');
> +	while (eod[0] == '\n' && eod[1] == ' ') {
> +		eod = strchrnul(eod+1, '\n');
> +	}

SP on both sides of '+'; no {} around a block that consists of a
single statement.

> +	*eod = '\0';

The begin and end pointers pointed to a piece of memory that is
supposed to be read-only, but this pointer points into that region
of memory and then updates a byte?  The function signature is
misleading---if you intend to muck with the string, accept them as
mutable pointers.

Better yet, don't butcher the region of memory pointed by the
"message" variable the caller uses to keep reading from the
remainder of the commit object buffer with this and memmove()
below.  Perhaps have the caller pass a strbuf to fill in the
signature found by this helper as another parameter, and then return
a bool "Yes, I found a sig" as its return value?

> +
> +	/*
> +	 * We now have the value as it's stored in the commit object.
> +	 * However, we want the raw value; we want to return
> +	 *
> +	 *     "-----BEGIN PGP SIGNATURE-----\n"
> +	 *     "Version: GnuPG v1.4.5 (GNU/Linux)\n"
> +	 *     "\n"
> +	 *     "base64_pem_here\n"
> +	 *     "-----END PGP SIGNATURE-----\n"
> +	 *
> +	 * So now we need to strip out all of those extra spaces.
> +	 */
> +	while ((eol = strstr(bod, "\n ")))
> +		memmove(eol+1, eol+2, strlen(eol+1));

Besides, this is O(n^2), isn't it, as it always starts scanning at
bod while there are lines in the signature block to be processed, it
needs to skip over the lines that the loop already has processed.

I'd stop here for now, as there should be enough to polish.

Thanks.
Luke Shumaker April 29, 2021, 8:06 p.m. UTC | #2
On Tue, 27 Apr 2021 22:02:47 -0600,
Junio C Hamano wrote:
> 
> Luke Shumaker <lukeshu@lukeshu.com> writes:
> 
> > fast-export has an existing --signed-tags= flag that controls how to
> 
> Don't call a command line option "a flag", especially when it is not
> a boolean.

Good to know, perhaps this should be mentioned in CodingGuidelines or
SubmittingPatches.txt?  I see lots of instances in the docs of "flag"
being used.

> "has an existing" feels redundantly repeticious.

I guess I did this to make it clearer that that paragraph is
describing the state of things before the patch, rather than after the
patch.  This is of course the required way of writing messages for
git.git, but I worded it that way to make it clearer to reviewers that
I'm following that requirement (especially since I haven't gotten a
commit landed in git.git before).

> > So, implement signed-commits.
> 
> That's misleading.  You are not inventing "git commit --signed"
> here.
> 
>     So implement `--signed-commits=<disposition>` that mirrors the
>     `--signed-tags=<disposition>` option.

Ack.

> > +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort)::
> > +	Specify how to handle signed commits.  Behaves exactly as
> > +	--signed-tags (but for commits), except that the default is
> > +	'warn-strip' rather than 'abort'.
> 
> Why deliberate inconsistency?  I am not sure "historically we did a
> wrong thing" is a good reason (if we view that silently stripping
> was a disservice to the users, aborting would be a bugfix).

I *almost* agree.  I agree in principle, but disagree in practice
because I know that it would break a bunch of existing tooling,
including git-filter-repo.

> >  
> > +enum sign_mode { SIGN_ABORT, SIGN_VERBATIM, SIGN_STRIP, SIGN_VERBATIM_WARN, SIGN_STRIP_WARN };
> > +
> >  static int progress;
> > -static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
> 
> Giving the enum values consistent prefix "SIGN_" is a great
> improvement.  On the other hand, swapping the word order,
> e.g. WARN_STRIP to SIGN_STRIP_WARN, is unwarranted.

Flipping it around made the switch statements read better, I thought.
But I can change it back.

> > +static enum sign_mode signed_tag_mode = SIGN_ABORT;
> > +static enum sign_mode signed_commit_mode = SIGN_STRIP_WARN;
> 
> I think it is safer to abort for both and sell it as a bugfix
> ("silently stripping commit signatures was wrong. we should abort
> the same way by default when encountering a signed tag").
> 
> > -static int parse_opt_signed_tag_mode(const struct option *opt,
> > +static int parse_opt_sign_mode(const struct option *opt,
> >  				     const char *arg, int unset)
> >  {
> > -	if (unset || !strcmp(arg, "abort"))
> > -		signed_tag_mode = SIGNED_TAG_ABORT;
> > +	enum sign_mode *valptr = opt->value;
> > +	if (unset)
> > +		return 0;
> > +	else if (!strcmp(arg, "abort"))
> > +		*valptr = SIGN_ABORT;
> >  	else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore"))
> > -		signed_tag_mode = VERBATIM;
> > +		*valptr = SIGN_VERBATIM;
> 
> Interesting and not a new issue at all, but "ignore" is a confusing
> symonym to "verbatim"---I would have expected "ignore", if accepted
> as a choice, would strip the signature.  Not documenting it is
> probably good, but perhaps we would eventually remove it?

Indeed, it was renamed from "ignore" to "verbatim" because "ignore"
was such a confusing name.  It was renamed (in ee4bc3715f
(fast-export: rename the signed tag mode 'ignore' to 'verbatim',
2007-12-03)) by the original fast-export author, pretty much
immediately after fast-export was originally introduced.  There was
never a released version of Git that had fast-export but didn't have
the 'ignore'->'verbatim' rename (fast-export was first released in
v1.5.4, and the rename was already present then).

> > @@ -499,6 +505,60 @@ static void show_filemodify(struct diff_queue_struct *q,
> >  	}
> >  }
> >  
> > +static const char *find_signature(const char *begin, const char *end, const char *key)
> 
> This is only for in-header signature used in commit objects, and not
> for the traditional "attached to the end" signature used in tag
> objects, right?
> 
> The name of this function should be designed to answer the above
> question, but find_signature() that does not say either commit or
> tag implies it can accept both (which would be a horrible interface,
> though).  If this is only for in-header signature, rename it to make
> sure that the fact is readable out of its name?
> 
> > +{
> > +	static struct strbuf needle = STRBUF_INIT;
> > +	char *bod, *eod, *eol;
> > +
> > +	strbuf_reset(&needle);
> > +	strbuf_addch(&needle, '\n');
> > +	strbuf_addstr(&needle, key);
> > +	strbuf_addch(&needle, ' ');
> 
> strbuf_addf(), perhaps?

Currently, strbuf_addf is only used by fast-export.c in the
"anonymize_" functions.  I took that to mean "avoid strbuf_addf if you
can", figuring that fast-export and fast-import seem to go reasonably
far out of their way to avoid dynamic things (like printf) in the
happy-path.

But I can change if it I'm mis-reading fast-export's paranoia.

> > +	bod = memmem(begin, end ? end - begin : strlen(begin),
> > +		     needle.buf, needle.len);
> > +	if (!bod)
> > +		return NULL;
> > +	bod += needle.len;
> > +
> > +	/*
> > +	 * In the commit object, multi-line header values are stored
> > +	 * by prefixing continuation lines begin with a space.  So
> 
> "by prefixig continuation lines with a space"

Oops.

> > +	 * within the commit object, it looks like
> > +	 *
> > +	 *     "gpgsig -----BEGIN PGP SIGNATURE-----\n"
> > +	 *     " Version: GnuPG v1.4.5 (GNU/Linux)\n"
> > +	 *     " \n"
> > +	 *     " base64_pem_here\n"
> > +	 *     " -----END PGP SIGNATURE-----\n"
> > +	 *
> > +	 * So we need to look for the first '\n' that *isn't* followed
> > +	 * by a ' ' (or the first '\0', if no such '\n' exists).
> > +	 */
> 
> > +	eod = strchrnul(bod, '\n');
> > +	while (eod[0] == '\n' && eod[1] == ' ') {
> > +		eod = strchrnul(eod+1, '\n');
> > +	}
> 
> SP on both sides of '+'; no {} around a block that consists of a
> single statement.

Ack.

> > +	*eod = '\0';
> 
> The begin and end pointers pointed to a piece of memory that is
> supposed to be read-only, but this pointer points into that region
> of memory and then updates a byte?  The function signature is
> misleading---if you intend to muck with the string, accept them as
> mutable pointers.
> 
> Better yet, don't butcher the region of memory pointed by the
> "message" variable the caller uses to keep reading from the
> remainder of the commit object buffer with this and memmove()
> below.  Perhaps have the caller pass a strbuf to fill in the
> signature found by this helper as another parameter, and then return
> a bool "Yes, I found a sig" as its return value?

That all sounds very sane, but I was mimicking the existing
`find_encoding`.

You aren't supposed to modify the memory from get_commit_buffer, but
fast-export does anyway.  I assume that Johannes knew what he was
doing when he wrote it (that it's safe because fast-export never
traverses the same object twice?) and that he did it as an
allocation-avoiding optimization.

Part of me thinks that it would be better to just use the standard
functions for this, like read_commit_extra_headers or
find_commit_header?

> > +
> > +	/*
> > +	 * We now have the value as it's stored in the commit object.
> > +	 * However, we want the raw value; we want to return
> > +	 *
> > +	 *     "-----BEGIN PGP SIGNATURE-----\n"
> > +	 *     "Version: GnuPG v1.4.5 (GNU/Linux)\n"
> > +	 *     "\n"
> > +	 *     "base64_pem_here\n"
> > +	 *     "-----END PGP SIGNATURE-----\n"
> > +	 *
> > +	 * So now we need to strip out all of those extra spaces.
> > +	 */
> > +	while ((eol = strstr(bod, "\n ")))
> > +		memmove(eol+1, eol+2, strlen(eol+1));
> 
> Besides, this is O(n^2), isn't it, as it always starts scanning at
> bod while there are lines in the signature block to be processed, it
> needs to skip over the lines that the loop already has processed.

Indeed, I'm embarrassed that made it in to something I submitted.
Elijah Newren April 29, 2021, 10:38 p.m. UTC | #3
On Thu, Apr 29, 2021 at 1:06 PM Luke Shumaker <lukeshu@lukeshu.com> wrote:
>
> On Tue, 27 Apr 2021 22:02:47 -0600,
> Junio C Hamano wrote:
> >
> > Luke Shumaker <lukeshu@lukeshu.com> writes:
> >

> > > +--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort)::
> > > +   Specify how to handle signed commits.  Behaves exactly as
> > > +   --signed-tags (but for commits), except that the default is
> > > +   'warn-strip' rather than 'abort'.
> >
> > Why deliberate inconsistency?  I am not sure "historically we did a
> > wrong thing" is a good reason (if we view that silently stripping
> > was a disservice to the users, aborting would be a bugfix).
>
> I *almost* agree.  I agree in principle, but disagree in practice
> because I know that it would break a bunch of existing tooling,
> including git-filter-repo.

I understand that fast-export's behavior in the past matched what
--signed-commits=warn-strip would now do, and thus you wanted to
select it for backward compatibility.  But throwing an error and
making the user choose when they are potentially losing data seems
like a safer choice to me.

I do get that we might have to use warn-strip as the default anyway
just because some existing tools might rely on it, but do you have any
examples outside of git-filter-repo?  Given the filter-repo bug
reports I've gotten with users being surprised at commit signatures
being stripped (despite the fact that this is documented -- users
don't always read the documentation), I'd argue that changing to
--signed-commits=abort as the default is probably a good bugfix for
both fast-export and for filter-repo.

Clearly, it'd probably make sense for filter-repo to also add an
option for the user to select to: (0) abort if commit signatures are
found, (1) strip commit signatures, (2) retain commit signatures even
if they are invalid, or (3) only retain commit signatures if they are
valid.  In the past, we could only reasonably do (1).  Your series
makes (0) and (2) possible.  More work in fast-import would be needed
to make (3) a possibility, so I wouldn't be able to add it to
filter-repo yet, but I could add the other options.
Junio C Hamano April 29, 2021, 11:42 p.m. UTC | #4
Elijah Newren <newren@gmail.com> writes:

> I do get that we might have to use warn-strip as the default anyway
> just because some existing tools might rely on it, but do you have any
> examples outside of git-filter-repo?  Given the filter-repo bug
> reports I've gotten with users being surprised at commit signatures
> being stripped (despite the fact that this is documented -- users
> don't always read the documentation), I'd argue that changing to
> --signed-commits=abort as the default is probably a good bugfix for
> both fast-export and for filter-repo.

Thanks.  The "filter-repo already gets bug reports from the users"
is a valuable input when deciding if it is reasonable to sell the
behaviour change as a bugfix to our users.

Perhaps teaching fast-export to pay attention to two environment
variables that say "when no --signed-{tag,commit}=<disposition>"
command line option is given, use this behaviour" would be a good
enough escape hatch for existing tools and their users, while they
are waiting for their tools to get updated with the new option you
are planning to add?

Also, I am glad that you brought up another possible behaviour that
Luke's patch did not add.  Exporting existing signatures that may
become invalid and deciding what to do with them on the receiving
end would be a good option to have.  And that would most likely have
to be done at "fast-import" end, as a commit that "fast-export"
expected to retain its object name if its export stream were applied
as-is may not retain the object name when the export stream gets
preprocessed before being fed to "fast-import".
Elijah Newren April 30, 2021, 2:23 a.m. UTC | #5
On Thu, Apr 29, 2021 at 4:42 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > I do get that we might have to use warn-strip as the default anyway
> > just because some existing tools might rely on it, but do you have any
> > examples outside of git-filter-repo?  Given the filter-repo bug
> > reports I've gotten with users being surprised at commit signatures
> > being stripped (despite the fact that this is documented -- users
> > don't always read the documentation), I'd argue that changing to
> > --signed-commits=abort as the default is probably a good bugfix for
> > both fast-export and for filter-repo.
>
> Thanks.  The "filter-repo already gets bug reports from the users"
> is a valuable input when deciding if it is reasonable to sell the
> behaviour change as a bugfix to our users.
>
> Perhaps teaching fast-export to pay attention to two environment
> variables that say "when no --signed-{tag,commit}=<disposition>"
> command line option is given, use this behaviour" would be a good
> enough escape hatch for existing tools and their users, while they
> are waiting for their tools to get updated with the new option you
> are planning to add?

As far as git filter-repo is concerned, you can immediately introduce
--signed-commit and give it a default value of abort with no
deprecation period.  filter-repo already has to check git versions for
available command line options, so one more wouldn't hurt.  And a
default of "abort" seems more user friendly, as it gives users a
chance to be aware of and handle their data appropriately.

Of course, there are a few factors that make filter-repo more tolerant
of upstream changes: I don't expect people to user filter-repo often
(it's a once-in-a-blue-moon rewrite), I don't expect them to use it in
automated processes, I tend to make releases that coincide in timing
with git releases (so I'll just release a git-filter-repo 2.32.0 the
day you release git 2.32, and it'll come with an option to handle this
new default), and filter-repo includes the following disclaimer in its
documentation:

"""
I assume that people use filter-repo for one-shot conversions, not
ongoing data transfers. I explicitly reserve the right to change any
API in filter-repo based on this presumption (and a comment to this
effect is found in multiple places in the code and examples). You have
been warned.
"""

So, if it's just for filter-repo, then I'd say just change
fast-export's default now.  If you're concerned with
--signed-commit=abort being a changed default being too drastic for
other users or tools, then the environment variable escape hatch
sounds reasonable to me.

Personally, I'm worried users are seeing "lost" data (though they
don't notice it until weeks or months later) and are being surprised
by it, which feels like a bigger issue to me than "my automated script
isn't running anymore on this one repo, now I have to figure out what
flag to use in order to choose whether I care about that data from
that special repo being tossed or not".  So I would bias towards
throwing an error so users get a chance to handle it.

> Also, I am glad that you brought up another possible behaviour that
> Luke's patch did not add.  Exporting existing signatures that may
> become invalid and deciding what to do with them on the receiving
> end would be a good option to have.  And that would most likely have
> to be done at "fast-import" end, as a commit that "fast-export"
> expected to retain its object name if its export stream were applied
> as-is may not retain the object name when the export stream gets
> preprocessed before being fed to "fast-import".

Right, but I'd go a step further: Even if the fast-export stream is
not pre-processed before feeding to fast-import, you still cannot
always expect to get the same object names when importing the stream.

To see why, note that the fast-export stream has no way to encode tree
information.  So if trees in the original history deviated from
"normal" in some fashion, such as not-quite-sorted entries, or non
standard modes, then sending those objects through fast-export and
fast-import will necessarily result in different object names.
fast-export also may have modified other objects to normalize them,
either because of default re-encoding of commit messages into UTF-8,
because of stripping any unrecognized commit headers, or perhaps even
because it'd truncate commit messages with an embedded NUL character.

Combine all these "normalizations" that fast-export/fast-import do
with the ability for users to process the stream from fast-export to
fast-import and it becomes clear that the only stage in the pipeline
that can check the validity of the gpg signatures for the imported
history is the fast-import step.
Junio C Hamano April 30, 2021, 3:20 a.m. UTC | #6
Elijah Newren <newren@gmail.com> writes:

> So, if it's just for filter-repo, then I'd say just change
> fast-export's default now.  If you're concerned with
> --signed-commit=abort being a changed default being too drastic for
> other users or tools, then the environment variable escape hatch
> sounds reasonable to me.

I wasn't specifically worried about any single tool.  It is largely
third-party's business, and my job is to make sure it won't be too
hard for them to adjust to our changes.

Even existing users of filter-repo would probably need such an
escape hatch, as it may not necessarily be possible to update
filter-repo at the same time they update Git.

Unless filter-repo refuses to work with a version of Git that is
newer than what it knows about (which is not quite how I would
prepare a tool for external change, though), that is.

> Combine all these "normalizations" that fast-export/fast-import do
> with the ability for users to process the stream from fast-export to
> fast-import and it becomes clear that the only stage in the pipeline
> that can check the validity of the gpg signatures for the imported
> history is the fast-import step.

Yup.  So I guess we two are in agreement wrt the "ideal world".
Luke Shumaker April 30, 2021, 5:07 p.m. UTC | #7
On Thu, 29 Apr 2021 17:42:24 -0600,
Junio C Hamano wrote:
> 
> Elijah Newren <newren@gmail.com> writes:
> 
> > I do get that we might have to use warn-strip as the default anyway
> > just because some existing tools might rely on it, but do you have any
> > examples outside of git-filter-repo?  Given the filter-repo bug
> > reports I've gotten with users being surprised at commit signatures
> > being stripped (despite the fact that this is documented -- users
> > don't always read the documentation), I'd argue that changing to
> > --signed-commits=abort as the default is probably a good bugfix for
> > both fast-export and for filter-repo.
> 
> Thanks.  The "filter-repo already gets bug reports from the users"
> is a valuable input when deciding if it is reasonable to sell the
> behaviour change as a bugfix to our users.
> 
> Perhaps teaching fast-export to pay attention to two environment
> variables that say "when no --signed-{tag,commit}=<disposition>"
> command line option is given, use this behaviour" would be a good
> enough escape hatch for existing tools and their users, while they
> are waiting for their tools to get updated with the new option you
> are planning to add?

Between Elijah being on-board with changing the default, and the
suggested env-var escape hatch, you've won me over.

I'll change the default to 'abort' and implement an env-var escape
hatch.  Any suggestions on how to name it?
`FAST_EXPORT_SIGNED_COMMITS`?  Should I give it a `GIT_` prefix?
`FILTER_BRANCH_SQUELCH_WARNING` doesn't have a `GIT_` prefix...

> Also, I am glad that you brought up another possible behaviour that
> Luke's patch did not add.  Exporting existing signatures that may
> become invalid and deciding what to do with them on the receiving
> end would be a good option to have.  And that would most likely have
> to be done at "fast-import" end, as a commit that "fast-export"
> expected to retain its object name if its export stream were applied
> as-is may not retain the object name when the export stream gets
> preprocessed before being fed to "fast-import".

Elijah suggested that on an earlier version of the patchset too.  I
agree that it's a splendid idea, but I'm not willing to be the one to
do the work of implementing it... at least not in the next few months.
Luke Shumaker April 30, 2021, 7:34 p.m. UTC | #8
On Tue, 27 Apr 2021 22:02:47 -0600,
Junio C Hamano wrote:
> Better yet, don't butcher the region of memory pointed by the
> "message" variable the caller uses to keep reading from the
> remainder of the commit object buffer with this and memmove()
> below.  Perhaps have the caller pass a strbuf to fill in the
> signature found by this helper as another parameter, and then return
> a bool "Yes, I found a sig" as its return value?

Stupid question: is there a better way to append a region of bytes to
a strbuf than

    strbuf_addf(&buf, "%.*s", (int)(str_end - str_beg), str);

?

It seems weird to me to invoke the printf machinery for something so
simple, but I don't see anything alternatives in strbuf.h.  Am I
missing something?
Elijah Newren April 30, 2021, 7:59 p.m. UTC | #9
On Fri, Apr 30, 2021 at 12:34 PM Luke Shumaker <lukeshu@lukeshu.com> wrote:
>
> On Tue, 27 Apr 2021 22:02:47 -0600,
> Junio C Hamano wrote:
> > Better yet, don't butcher the region of memory pointed by the
> > "message" variable the caller uses to keep reading from the
> > remainder of the commit object buffer with this and memmove()
> > below.  Perhaps have the caller pass a strbuf to fill in the
> > signature found by this helper as another parameter, and then return
> > a bool "Yes, I found a sig" as its return value?
>
> Stupid question: is there a better way to append a region of bytes to
> a strbuf than
>
>     strbuf_addf(&buf, "%.*s", (int)(str_end - str_beg), str);
>
> ?
>
> It seems weird to me to invoke the printf machinery for something so
> simple, but I don't see anything alternatives in strbuf.h.  Am I
> missing something?

I struggled to find it some time ago as well; I wonder if some
reorganization of strbuf.[ch] might make it more clear.

Anyway, strbuf_add() if you have the number of bytes already handy,
strbuf_addstr() if you don't have the number of bytes handy but the
string is NUL-delimited.
Luke Shumaker April 30, 2021, 10:21 p.m. UTC | #10
On Fri, 30 Apr 2021 13:59:43 -0600,
Elijah Newren wrote:
> 
> On Fri, Apr 30, 2021 at 12:34 PM Luke Shumaker <lukeshu@lukeshu.com> wrote:
> >
> > On Tue, 27 Apr 2021 22:02:47 -0600,
> > Junio C Hamano wrote:
> > > Better yet, don't butcher the region of memory pointed by the
> > > "message" variable the caller uses to keep reading from the
> > > remainder of the commit object buffer with this and memmove()
> > > below.  Perhaps have the caller pass a strbuf to fill in the
> > > signature found by this helper as another parameter, and then return
> > > a bool "Yes, I found a sig" as its return value?
> >
> > Stupid question: is there a better way to append a region of bytes to
> > a strbuf than
> >
> >     strbuf_addf(&buf, "%.*s", (int)(str_end - str_beg), str);
> >
> > ?
> >
> > It seems weird to me to invoke the printf machinery for something so
> > simple, but I don't see anything alternatives in strbuf.h.  Am I
> > missing something?
> 
> I struggled to find it some time ago as well; I wonder if some
> reorganization of strbuf.[ch] might make it more clear.
> 
> Anyway, strbuf_add() if you have the number of bytes already handy,
> strbuf_addstr() if you don't have the number of bytes handy but the
> string is NUL-delimited.

Ah!  I was looking for `char *`, but strbuf_add takes a `void *`,
that's why I didn't find it.

Thank you!
diff mbox series

Patch

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index c307839e81..b651fa993b 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -41,6 +41,11 @@  see a warning.
 +
 `warn` is a deprecated synonym of `warn-verbatim`.
 
+--signed-commits=(verbatim|warn-verbatim|warn-strip|strip|abort)::
+	Specify how to handle signed commits.  Behaves exactly as
+	--signed-tags (but for commits), except that the default is
+	'warn-strip' rather than 'abort'.
+
 --tag-of-filtered-object=(abort|drop|rewrite)::
 	Specify how to handle tags whose tagged object is filtered out.
 	Since revisions and files to export can be limited by path,
diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 458af0a2d6..4955c94305 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -431,12 +431,21 @@  and control the current import process.  More detailed discussion
 Create or update a branch with a new commit, recording one logical
 change to the project.
 
+////
+Yes, it's intentional that the 'gpgsig' line doesn't have a trailing
+`LF`; the the definition of `data` has a byte-count prefix, so it
+doesn't need an `LF` to act as a terminator (and `data` also already
+includes an optional trailing `LF?` just in case you want to include
+one).
+////
+
 ....
 	'commit' SP <ref> LF
 	mark?
 	original-oid?
 	('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
 	'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	('gpgsig' SP <alg> LF data)?
 	('encoding' SP <encoding> LF)?
 	data
 	('from' SP <commit-ish> LF)?
@@ -505,6 +514,15 @@  that was selected by the --date-format=<fmt> command-line option.
 See ``Date Formats'' above for the set of supported formats, and
 their syntax.
 
+`gpgsig`
+^^^^^^^^
+
+The optional `gpgsig` command is used to include a PGP/GPG signature
+that signs the commit data.
+
+Here <alg> specifies which hashing algorithm is used for this
+signature, either `sha1` or `sha256`.
+
 `encoding`
 ^^^^^^^^^^
 The optional `encoding` command indicates the encoding of the commit
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d121dd2ee6..2b1101d104 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -30,8 +30,11 @@  static const char *fast_export_usage[] = {
 	NULL
 };
 
+enum sign_mode { SIGN_ABORT, SIGN_VERBATIM, SIGN_STRIP, SIGN_VERBATIM_WARN, SIGN_STRIP_WARN };
+
 static int progress;
-static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
+static enum sign_mode signed_tag_mode = SIGN_ABORT;
+static enum sign_mode signed_commit_mode = SIGN_STRIP_WARN;
 static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT;
 static enum { REENCODE_ABORT, REENCODE_YES, REENCODE_NO } reencode_mode = REENCODE_ABORT;
 static int fake_missing_tagger;
@@ -48,21 +51,24 @@  static int anonymize;
 static struct hashmap anonymized_seeds;
 static struct revision_sources revision_sources;
 
-static int parse_opt_signed_tag_mode(const struct option *opt,
+static int parse_opt_sign_mode(const struct option *opt,
 				     const char *arg, int unset)
 {
-	if (unset || !strcmp(arg, "abort"))
-		signed_tag_mode = SIGNED_TAG_ABORT;
+	enum sign_mode *valptr = opt->value;
+	if (unset)
+		return 0;
+	else if (!strcmp(arg, "abort"))
+		*valptr = SIGN_ABORT;
 	else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore"))
-		signed_tag_mode = VERBATIM;
+		*valptr = SIGN_VERBATIM;
 	else if (!strcmp(arg, "warn-verbatim") || !strcmp(arg, "warn"))
-		signed_tag_mode = WARN;
+		*valptr = SIGN_VERBATIM_WARN;
 	else if (!strcmp(arg, "warn-strip"))
-		signed_tag_mode = WARN_STRIP;
+		*valptr = SIGN_STRIP_WARN;
 	else if (!strcmp(arg, "strip"))
-		signed_tag_mode = STRIP;
+		*valptr = SIGN_STRIP;
 	else
-		return error("Unknown signed-tags mode: %s", arg);
+		return error("Unknown %s mode: %s", opt->long_name, arg);
 	return 0;
 }
 
@@ -499,6 +505,60 @@  static void show_filemodify(struct diff_queue_struct *q,
 	}
 }
 
+static const char *find_signature(const char *begin, const char *end, const char *key)
+{
+	static struct strbuf needle = STRBUF_INIT;
+	char *bod, *eod, *eol;
+
+	strbuf_reset(&needle);
+	strbuf_addch(&needle, '\n');
+	strbuf_addstr(&needle, key);
+	strbuf_addch(&needle, ' ');
+
+	bod = memmem(begin, end ? end - begin : strlen(begin),
+		     needle.buf, needle.len);
+	if (!bod)
+		return NULL;
+	bod += needle.len;
+
+	/*
+	 * In the commit object, multi-line header values are stored
+	 * by prefixing continuation lines begin with a space.  So
+	 * within the commit object, it looks like
+	 *
+	 *     "gpgsig -----BEGIN PGP SIGNATURE-----\n"
+	 *     " Version: GnuPG v1.4.5 (GNU/Linux)\n"
+	 *     " \n"
+	 *     " base64_pem_here\n"
+	 *     " -----END PGP SIGNATURE-----\n"
+	 *
+	 * So we need to look for the first '\n' that *isn't* followed
+	 * by a ' ' (or the first '\0', if no such '\n' exists).
+	 */
+	eod = strchrnul(bod, '\n');
+	while (eod[0] == '\n' && eod[1] == ' ') {
+		eod = strchrnul(eod+1, '\n');
+	}
+	*eod = '\0';
+
+	/*
+	 * We now have the value as it's stored in the commit object.
+	 * However, we want the raw value; we want to return
+	 *
+	 *     "-----BEGIN PGP SIGNATURE-----\n"
+	 *     "Version: GnuPG v1.4.5 (GNU/Linux)\n"
+	 *     "\n"
+	 *     "base64_pem_here\n"
+	 *     "-----END PGP SIGNATURE-----\n"
+	 *
+	 * So now we need to strip out all of those extra spaces.
+	 */
+	while ((eol = strstr(bod, "\n ")))
+		memmove(eol+1, eol+2, strlen(eol+1));
+
+	return bod;
+}
+
 static const char *find_encoding(const char *begin, const char *end)
 {
 	const char *needle = "\nencoding ";
@@ -621,6 +681,7 @@  static void handle_commit(struct commit *commit, struct rev_info *rev,
 	int saved_output_format = rev->diffopt.output_format;
 	const char *commit_buffer;
 	const char *author, *author_end, *committer, *committer_end;
+	const char *signature_alg = NULL, *signature;
 	const char *encoding, *message;
 	char *reencoded = NULL;
 	struct commit_list *p;
@@ -644,6 +705,10 @@  static void handle_commit(struct commit *commit, struct rev_info *rev,
 	committer++;
 	committer_end = strchrnul(committer, '\n');
 	message = strstr(committer_end, "\n\n");
+	if ((signature = find_signature(committer_end, message, "gpgsig")))
+		signature_alg = "sha1";
+	else if ((signature = find_signature(committer_end, message, "gpgsig-sha256")))
+		signature_alg = "sha256";
 	encoding = find_encoding(committer_end, message);
 	if (message)
 		message += 2;
@@ -703,6 +768,29 @@  static void handle_commit(struct commit *commit, struct rev_info *rev,
 	printf("%.*s\n%.*s\n",
 	       (int)(author_end - author), author,
 	       (int)(committer_end - committer), committer);
+	if (signature)
+		switch(signed_commit_mode) {
+		case SIGN_ABORT:
+			die("encountered signed commit %s",
+			    oid_to_hex(&commit->object.oid));
+		case SIGN_VERBATIM_WARN:
+			warning("exporting signed commit %s",
+				oid_to_hex(&commit->object.oid));
+			/* fallthru */
+		case SIGN_VERBATIM:
+			printf("gpgsig %s\ndata %u\n%s",
+			       signature_alg,
+			       (unsigned)strlen(signature),
+			       signature);
+			break;
+		case SIGN_STRIP_WARN:
+			warning("stripping signature from commit %s; use"
+				"--signed-commits=<mode> to handle it differently",
+				oid_to_hex(&commit->object.oid));
+			/* fallthru */
+		case SIGN_STRIP:
+			break;
+		}
 	if (!reencoded && encoding)
 		printf("encoding %s\n", encoding);
 	printf("data %u\n%s",
@@ -830,21 +918,21 @@  static void handle_tag(const char *name, struct tag *tag)
 					       "\n-----BEGIN PGP SIGNATURE-----\n");
 		if (signature)
 			switch(signed_tag_mode) {
-			case SIGNED_TAG_ABORT:
+			case SIGN_ABORT:
 				die("encountered signed tag %s; use "
 				    "--signed-tags=<mode> to handle it",
 				    oid_to_hex(&tag->object.oid));
-			case WARN:
+			case SIGN_VERBATIM_WARN:
 				warning("exporting signed tag %s",
 					oid_to_hex(&tag->object.oid));
 				/* fallthru */
-			case VERBATIM:
+			case SIGN_VERBATIM:
 				break;
-			case WARN_STRIP:
+			case SIGN_STRIP_WARN:
 				warning("stripping signature from tag %s",
 					oid_to_hex(&tag->object.oid));
 				/* fallthru */
-			case STRIP:
+			case SIGN_STRIP:
 				message_size = signature + 1 - message;
 				break;
 			}
@@ -1197,7 +1285,10 @@  int cmd_fast_export(int argc, const char **argv, const char *prefix)
 			    N_("show progress after <n> objects")),
 		OPT_CALLBACK(0, "signed-tags", &signed_tag_mode, N_("mode"),
 			     N_("select handling of signed tags"),
-			     parse_opt_signed_tag_mode),
+			     parse_opt_sign_mode),
+		OPT_CALLBACK(0, "signed-commits", &signed_commit_mode, N_("mode"),
+			     N_("select handling of signed commits"),
+			     parse_opt_sign_mode),
 		OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"),
 			     N_("select handling of tags that tag filtered objects"),
 			     parse_opt_tag_of_filtered_mode),
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 3afa81cf9a..ee7516dd38 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2669,10 +2669,13 @@  static struct hash_list *parse_merge(unsigned int *count)
 
 static void parse_new_commit(const char *arg)
 {
+	static struct strbuf sig = STRBUF_INIT;
 	static struct strbuf msg = STRBUF_INIT;
+	struct string_list siglines = STRING_LIST_INIT_NODUP;
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
+	char *sig_alg = NULL;
 	char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
@@ -2696,6 +2699,13 @@  static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
+	if (skip_prefix(command_buf.buf, "gpgsig ", &v)) {
+		sig_alg = xstrdup(v);
+		read_next_command();
+		parse_data(&sig, 0, NULL);
+		read_next_command();
+	} else
+		strbuf_setlen(&sig, 0);
 	if (skip_prefix(command_buf.buf, "encoding ", &v)) {
 		encoding = xstrdup(v);
 		read_next_command();
@@ -2769,10 +2779,23 @@  static void parse_new_commit(const char *arg)
 		strbuf_addf(&new_data,
 			"encoding %s\n",
 			encoding);
+	if (sig_alg) {
+		if (!strcmp(sig_alg, "sha1"))
+			strbuf_addstr(&new_data, "gpgsig ");
+		else if (!strcmp(sig_alg, "sha256"))
+			strbuf_addstr(&new_data, "gpgsig-sha256 ");
+		else
+			die("Expected gpgsig algorithm sha1 or sha256, got %s", sig_alg);
+		string_list_split_in_place(&siglines, sig.buf, '\n', -1);
+		strbuf_add_separated_string_list(&new_data, "\n ", &siglines);
+		strbuf_addch(&new_data, '\n');
+	}
 	strbuf_addch(&new_data, '\n');
 	strbuf_addbuf(&new_data, &msg);
+	string_list_clear(&siglines, 1);
 	free(author);
 	free(committer);
+	free(sig_alg);
 	free(encoding);
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 892737439b..e686c3b894 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -8,6 +8,7 @@  GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY/lib-gpg.sh"
 
 test_expect_success 'setup' '
 
@@ -284,9 +285,78 @@  test_expect_success 'signed-tags=warn-strip' '
 	test -s err
 '
 
+test_expect_success GPG 'set up signed commit' '
+
+	# Generate a commit with both "gpgsig" and "encoding" set, so
+	# that we can test that fast-import gets the ordering correct
+	# between the two.
+	test_config i18n.commitEncoding ISO-8859-1 &&
+	git checkout -f -b commit-signing main &&
+	echo Sign your name > file-sign &&
+	git add file-sign &&
+	git commit -S -m "signed commit" &&
+	COMMIT_SIGNING=$(git rev-parse --verify commit-signing)
+
+'
+
+test_expect_success GPG 'signed-commits=abort' '
+
+	test_must_fail git fast-export --signed-commits=abort commit-signing
+
+'
+
+test_expect_success GPG 'signed-commits=verbatim' '
+
+	git fast-export --signed-commits=verbatim --reencode=no commit-signing >output &&
+	grep "^gpgsig sha" output &&
+	grep "encoding ISO-8859-1" output &&
+	(cd new &&
+	 git fast-import &&
+	 test $COMMIT_SIGNING = $(git rev-parse --verify refs/heads/commit-signing)) <output
+
+'
+
+test_expect_success GPG 'signed-commits=warn-verbatim' '
+
+	git fast-export --signed-commits=warn-verbatim --reencode=no commit-signing >output 2>err &&
+	grep "^gpgsig sha" output &&
+	grep "encoding ISO-8859-1" output &&
+	test -s err &&
+	(cd new &&
+	 git fast-import &&
+	 test $COMMIT_SIGNING = $(git rev-parse --verify refs/heads/commit-signing)) <output
+
+'
+
+test_expect_success GPG 'signed-commits=strip' '
+
+	git fast-export --signed-commits=strip --reencode=no commit-signing >output &&
+	! grep ^gpgsig output &&
+	grep "^encoding ISO-8859-1" output &&
+	sed "s/commit-signing/commit-strip-signing/" output |
+		(cd new &&
+		 git fast-import &&
+		 test $COMMIT_SIGNING != $(git rev-parse --verify refs/heads/commit-strip-signing))
+
+'
+
+test_expect_success GPG 'signed-commits=warn-strip' '
+
+	git fast-export --signed-commits=warn-strip --reencode=no commit-signing >output 2>err &&
+	! grep ^gpgsig output &&
+	grep "^encoding ISO-8859-1" output &&
+	test -s err &&
+	sed "s/commit-signing/commit-strip-signing/" output |
+		(cd new &&
+		 git fast-import &&
+		 test $COMMIT_SIGNING != $(git rev-parse --verify refs/heads/commit-strip-signing))
+
+'
+
 test_expect_success 'setup submodule' '
 
 	git checkout -f main &&
+	{ git update-ref -d refs/heads/commit-signing || true; } &&
 	mkdir sub &&
 	(
 		cd sub &&