diff mbox series

t/lib-git.sh: fix ACL-related permissions failure

Message ID 20211104192533.2520-1-adam@dinwoodie.org (mailing list archive)
State Superseded
Headers show
Series t/lib-git.sh: fix ACL-related permissions failure | expand

Commit Message

Adam Dinwoodie Nov. 4, 2021, 7:25 p.m. UTC
SSH keys are expected to be created with very restrictive permissions,
and SSH commands will fail if the permissions are not appropriate.  When
creating a directory for SSH keys in test scripts, attempt to clear any
ACLs that might otherwise cause the private key to inherit less
restrictive permissions than it requires.

This change is required in particular to avoid tests relating to SSH
signing failing in Cygwin.

Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Helped-by: Fabian Stelzer <fs@gigacodes.de>
---
 t/lib-gpg.sh | 1 +
 1 file changed, 1 insertion(+)

Comments

Junio C Hamano Nov. 4, 2021, 7:49 p.m. UTC | #1
Adam Dinwoodie <adam@dinwoodie.org> writes:

> SSH keys are expected to be created with very restrictive permissions,
> and SSH commands will fail if the permissions are not appropriate.  When
> creating a directory for SSH keys in test scripts, attempt to clear any
> ACLs that might otherwise cause the private key to inherit less
> restrictive permissions than it requires.

All of the above makes sense as an explanation as to why the
ssh-keygen command may be unhappy with the $GNUPGHOME directory that
is prepared here, but ...

> This change is required in particular to avoid tests relating to SSH
> signing failing in Cygwin.

... I am not quite sure how this explains "tests relating to ssh
signing failing on Cygwin".  After all, this piece of code is
lazy_prereq, which means that ssh-keygen in this block that fails
(due to a less restrictive permissions) would merely mean that tests
that are protected with GPGSSH prerequisite will be skipped without
causing test failures.  After all that is the whole point of
computing prereq on the fly.

> Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
> Helped-by: Fabian Stelzer <fs@gigacodes.de>

Please order these chronologically, i.e. Fabian helped and the patch
was finished, and finally you sent with your sign off.

> ---
>  t/lib-gpg.sh | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
> index f99ef3e859..1d8e5b5b7e 100644
> --- a/t/lib-gpg.sh
> +++ b/t/lib-gpg.sh
> @@ -106,6 +106,7 @@ test_lazy_prereq GPGSSH '
>  	test $? = 0 || exit 1;
>  	mkdir -p "${GNUPGHOME}" &&
>  	chmod 0700 "${GNUPGHOME}" &&
> +	(setfacl -k "${GNUPGHOME}" 2>/dev/null || true) &&
>  	ssh-keygen -t ed25519 -N "" -C "git ed25519 key" -f "${GPGSSH_KEY_PRIMARY}" >/dev/null &&
>  	echo "\"principal with number 1\" $(cat "${GPGSSH_KEY_PRIMARY}.pub")" >> "${GPGSSH_ALLOWED_SIGNERS}" &&
>  	ssh-keygen -t rsa -b 2048 -N "" -C "git rsa2048 key" -f "${GPGSSH_KEY_SECONDARY}" >/dev/null &&

There are other uses of ssh-keygen in the real tests but presumably
they just use the GNUPGHOME directory prepared with this lazy_prereq
block, and "setfacl -k" here would have wiped any possible loosening
of permission, and that is why this is the only place that needs a
change, right?  That fact might deserve recording in the proposed
log message.

Thanks.
Junio C Hamano Nov. 4, 2021, 8:03 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> This change is required in particular to avoid tests relating to SSH
>> signing failing in Cygwin.
>
> ... I am not quite sure how this explains "tests relating to ssh
> signing failing on Cygwin".  After all, this piece of code is
> lazy_prereq, which means that ssh-keygen in this block that fails
> (due to a less restrictive permissions) would merely mean that tests
> that are protected with GPGSSH prerequisite will be skipped without
> causing test failures.  After all that is the whole point of
> computing prereq on the fly.

The reason why I wondered about the above is that it can be an
indication of another breakage, namely, that we may have tests that
require a working ssh-keygen but are by mistake not protected with
GPGSSH prerequisite.

The test_lazy_prereq block you touched may refrain from setting the
prerequisite on your system (due to the faulty test here that you
touched), but if we had such unprotected tests, we still will run
ssh signing tests and they would fail, due to the lack of the
prerequisite.

And fixing the prereq block alone will hide that other breakage, at
least on your system.  Hence my question.

Thanks.
Ramsay Jones Nov. 4, 2021, 8:09 p.m. UTC | #3
Hi Adam,

On 04/11/2021 19:25, Adam Dinwoodie wrote:
> SSH keys are expected to be created with very restrictive permissions,
> and SSH commands will fail if the permissions are not appropriate.  When
> creating a directory for SSH keys in test scripts, attempt to clear any
> ACLs that might otherwise cause the private key to inherit less
> restrictive permissions than it requires.

I was somewhat surprised to see your report, since all these tests
passed without issue for me on '-rc0'! :D (64-bit cygwin only).

So, the difference seems to be down to FS ACLs, Hmmm ...

(BTW, I am on windows 10 21H1)

ATB,
Ramsay Jones
Fabian Stelzer Nov. 4, 2021, 10:36 p.m. UTC | #4
On 04.11.2021 13:03, Junio C Hamano wrote:
>Junio C Hamano <gitster@pobox.com> writes:
>
>>> This change is required in particular to avoid tests relating to SSH
>>> signing failing in Cygwin.
>>
>> ... I am not quite sure how this explains "tests relating to ssh
>> signing failing on Cygwin".  After all, this piece of code is
>> lazy_prereq, which means that ssh-keygen in this block that fails
>> (due to a less restrictive permissions) would merely mean that tests
>> that are protected with GPGSSH prerequisite will be skipped without
>> causing test failures.  After all that is the whole point of
>> computing prereq on the fly.
>
>The reason why I wondered about the above is that it can be an
>indication of another breakage, namely, that we may have tests that
>require a working ssh-keygen but are by mistake not protected with
>GPGSSH prerequisite.
>
>The test_lazy_prereq block you touched may refrain from setting the
>prerequisite on your system (due to the faulty test here that you
>touched), but if we had such unprotected tests, we still will run
>ssh signing tests and they would fail, due to the lack of the
>prerequisite.
>
>And fixing the prereq block alone will hide that other breakage, at
>least on your system.  Hence my question.
>
>Thanks.

The problem is that the ssh-keygen in the layz_prereq will succeed but
might create a private key with world readable permissions. Only the
remaining tests using this key will then fail with a "your private key
permissions are too restrictive" like error. If we would like to make
sure in the prereq that the keys actually work fine we would need to do
a signing operation with them in it.

Something like the following call would be enough:
echo "test" | ssh-keygen -Y sign -f $GPGSSHKEY_PRIMARY -n "git" 

Not sure if we want to go that far though. The setfacl seems fine to me
otherwise.
Junio C Hamano Nov. 5, 2021, 7:30 a.m. UTC | #5
Fabian Stelzer <fs@gigacodes.de> writes:

> The problem is that the ssh-keygen in the layz_prereq will succeed but
> might create a private key with world readable permissions. Only the
> remaining tests using this key will then fail with a "your private key
> permissions are too restrictive" like error. If we would like to make
> sure in the prereq that the keys actually work fine we would need to do
> a signing operation with them in it.

That sounds like a right thing to do, with or without the setfacl fix.

>
> Something like the following call would be enough:
> echo "test" | ssh-keygen -Y sign -f $GPGSSHKEY_PRIMARY -n "git" 
> Not sure if we want to go that far though. The setfacl seems fine to me
> otherwise.
Adam Dinwoodie Nov. 5, 2021, 11:25 a.m. UTC | #6
On Thursday 04 November 2021 at 12:49 pm -0700, Junio C Hamano wrote:
> Adam Dinwoodie <adam@dinwoodie.org> writes:
> 
> > SSH keys are expected to be created with very restrictive permissions,
> > and SSH commands will fail if the permissions are not appropriate.  When
> > creating a directory for SSH keys in test scripts, attempt to clear any
> > ACLs that might otherwise cause the private key to inherit less
> > restrictive permissions than it requires.
> 
> All of the above makes sense as an explanation as to why the
> ssh-keygen command may be unhappy with the $GNUPGHOME directory that
> is prepared here, but ...
> 
> > This change is required in particular to avoid tests relating to SSH
> > signing failing in Cygwin.
> 
> ... I am not quite sure how this explains "tests relating to ssh
> signing failing on Cygwin".  After all, this piece of code is
> lazy_prereq, which means that ssh-keygen in this block that fails
> (due to a less restrictive permissions) would merely mean that tests
> that are protected with GPGSSH prerequisite will be skipped without
> causing test failures.  After all that is the whole point of
> computing prereq on the fly.

The issue is that the prerequisite check isn't _just_ checking a
prerequisite: it's also creating an SSH key that's used without further
modification by the tests.

There are three cases to consider:

- On systems where this prerequisite check fails, a key may or may not
  be created, but the tests that rely on the key won't be run, so it
  doesn't matter either way.

- On (clearly the mainline) systems where this check passes and there
  are no ACL problems, the key that's generated is stored with
  sufficiently restrictive permissions that the tests that rely on the
  key can pass.

- On my system, where ACLs are a problem, the prerequisite check passes,
  and a key is created, but it has permissions that are too permissive.
  As a result, when a test calls OpenSSH to use that key, OpenSSH
  refuses due to the permissions, and the test fails.

> > Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
> > Helped-by: Fabian Stelzer <fs@gigacodes.de>
> 
> Please order these chronologically, i.e. Fabian helped and the patch
> was finished, and finally you sent with your sign off.

Shall do!  I'll resubmit a corrected version as soon as all the other
discussions about this patch seem tidied up.

> > ---
> >  t/lib-gpg.sh | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
> > index f99ef3e859..1d8e5b5b7e 100644
> > --- a/t/lib-gpg.sh
> > +++ b/t/lib-gpg.sh
> > @@ -106,6 +106,7 @@ test_lazy_prereq GPGSSH '
> >  	test $? = 0 || exit 1;
> >  	mkdir -p "${GNUPGHOME}" &&
> >  	chmod 0700 "${GNUPGHOME}" &&
> > +	(setfacl -k "${GNUPGHOME}" 2>/dev/null || true) &&
> >  	ssh-keygen -t ed25519 -N "" -C "git ed25519 key" -f "${GPGSSH_KEY_PRIMARY}" >/dev/null &&
> >  	echo "\"principal with number 1\" $(cat "${GPGSSH_KEY_PRIMARY}.pub")" >> "${GPGSSH_ALLOWED_SIGNERS}" &&
> >  	ssh-keygen -t rsa -b 2048 -N "" -C "git rsa2048 key" -f "${GPGSSH_KEY_SECONDARY}" >/dev/null &&
> 
> There are other uses of ssh-keygen in the real tests but presumably
> they just use the GNUPGHOME directory prepared with this lazy_prereq
> block, and "setfacl -k" here would have wiped any possible loosening
> of permission, and that is why this is the only place that needs a
> change, right?  That fact might deserve recording in the proposed
> log message.

More than that: the uses of ssh-keygen elsewhere are calling `ssh-keygen
-l`.  That command doesn't generate a key at all, it merely calculates
the fingerprint from the keys that are generated with the ssh-keygen
commands in this prerequisite check.

I think it's unusual that this test_lazy_prereq check isn't just
checking the state of the system but is creating the keys that will be
used in later tests, but I didn't think to document that in my log
because that was clearly a decision that had been made earlier.  But I'm
very happy to make that logic clearer in my commit message!
Adam Dinwoodie Nov. 5, 2021, 11:47 a.m. UTC | #7
On Thursday 04 November 2021 at 08:09 pm +0000, Ramsay Jones wrote:
> Hi Adam,
> 
> On 04/11/2021 19:25, Adam Dinwoodie wrote:
> > SSH keys are expected to be created with very restrictive permissions,
> > and SSH commands will fail if the permissions are not appropriate.  When
> > creating a directory for SSH keys in test scripts, attempt to clear any
> > ACLs that might otherwise cause the private key to inherit less
> > restrictive permissions than it requires.
> 
> I was somewhat surprised to see your report, since all these tests
> passed without issue for me on '-rc0'! :D (64-bit cygwin only).
> 
> So, the difference seems to be down to FS ACLs, Hmmm ...
> 
> (BTW, I am on windows 10 21H1)

I'm running these tests in subdirectories in the temporary drive on
Dv4-size Windows 11 Pro Gen2 Azure VMs.  I'm spinning up fresh VMs and
using new Cygwin installations regularly, in the name of build
reproducibility; I'm vaguely working on automating more and more of the
Cygwin Git test and release processes.

(At some point now they're becoming available, I'll probably shift to
Ddv5 Azure VMs for this work; I very much doubt that'll make a
difference, but I note it for the sake of completeness.  Longer-term,
I'm hoping to swap to using GitHub Actions to do most of the heavy
lifting.)

This isn't the first time I've seen similar problems in this environment
that haven't been spotted elsewhere: see a1e03535db (t4129: fix
setfacl-related permissions failure, 2020-12-23).

The `getfacl` output for the temporary drive, from Cygwin's perspective,
is as below; I'm `cd`ing into that directory and getting the Git
repositories by running `git clone https://github.com/git/git` from
there.

```
# file: /cygdrive/d
# owner: NETWORK SERVICE
# group: NETWORK SERVICE
user::r-x
group::r-x
group:SYSTEM:rwx        #effective:r-x
group:Administrators:rwx        #effective:r-x
group:Users:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::---
default:group:SYSTEM:rwx
default:group:Administrators:rwx
default:group:Users:rwx
default:mask::rwx
default:other::r-x
```

I'm honestly not sure what it is that means I keep hitting these
problems with this setup.  I've managed to avoid needing anything but
the most cursory knowledge of extended permissions handling,
particularly for Cygwin where one has to contend with both the
underlying OS's interpretation of file permissions and with the Cygwin
layer's reinterpretations.  I can't say I'm keen to get a deep working
knowledge of how all these pieces interact!
Jeff King Nov. 5, 2021, 12:06 p.m. UTC | #8
On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote:

> > ... I am not quite sure how this explains "tests relating to ssh
> > signing failing on Cygwin".  After all, this piece of code is
> > lazy_prereq, which means that ssh-keygen in this block that fails
> > (due to a less restrictive permissions) would merely mean that tests
> > that are protected with GPGSSH prerequisite will be skipped without
> > causing test failures.  After all that is the whole point of
> > computing prereq on the fly.
> 
> The issue is that the prerequisite check isn't _just_ checking a
> prerequisite: it's also creating an SSH key that's used without further
> modification by the tests.

This is sort of a side note to your main issue, but I think that relying
on a lazy_prereq for side effects is an anti-pattern. We make no
promises about when or how often the prereqs might be run, and we try to
insulate them from the main tests (by putting them in a subshell and
switching their cwd).

It does happen to work here because the prereq script writes directly to
$GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I
don't think it's really in any danger of breaking, but it is definitely
not using the feature as it was intended. :)

I think the more usual way would be to have an actual
test_expect_success block that creates the keys as a setup step
(possibly triggered by a function, since it's included via lib-gpg.sh).
If we don't want to decide whether we have the GPGSSH prereq until then,
then that test can call test_set_prereq. See the LONG_REF case in t1401
for an example.

Again, that's mostly a tangent to your issue, and maybe not worth
futzing with at this point in the release cycle. I'm mostly just
registering my surprise. ;)

> There are three cases to consider:
> 
> - On systems where this prerequisite check fails, a key may or may not
>   be created, but the tests that rely on the key won't be run, so it
>   doesn't matter either way.
> 
> - On (clearly the mainline) systems where this check passes and there
>   are no ACL problems, the key that's generated is stored with
>   sufficiently restrictive permissions that the tests that rely on the
>   key can pass.
> 
> - On my system, where ACLs are a problem, the prerequisite check passes,
>   and a key is created, but it has permissions that are too permissive.
>   As a result, when a test calls OpenSSH to use that key, OpenSSH
>   refuses due to the permissions, and the test fails.

FWIW, that explanation makes perfect sense to me (and your patch seems
like the right thing to do).

-Peff
Fabian Stelzer Nov. 5, 2021, 12:13 p.m. UTC | #9
On 05.11.2021 08:06, Jeff King wrote:
>On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote:
>
>> > ... I am not quite sure how this explains "tests relating to ssh
>> > signing failing on Cygwin".  After all, this piece of code is
>> > lazy_prereq, which means that ssh-keygen in this block that fails
>> > (due to a less restrictive permissions) would merely mean that tests
>> > that are protected with GPGSSH prerequisite will be skipped without
>> > causing test failures.  After all that is the whole point of
>> > computing prereq on the fly.
>>
>> The issue is that the prerequisite check isn't _just_ checking a
>> prerequisite: it's also creating an SSH key that's used without further
>> modification by the tests.
>
>This is sort of a side note to your main issue, but I think that relying
>on a lazy_prereq for side effects is an anti-pattern. We make no
>promises about when or how often the prereqs might be run, and we try to
>insulate them from the main tests (by putting them in a subshell and
>switching their cwd).
>
>It does happen to work here because the prereq script writes directly to
>$GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I
>don't think it's really in any danger of breaking, but it is definitely
>not using the feature as it was intended. :)
>
>I think the more usual way would be to have an actual
>test_expect_success block that creates the keys as a setup step
>(possibly triggered by a function, since it's included via lib-gpg.sh).
>If we don't want to decide whether we have the GPGSSH prereq until then,
>then that test can call test_set_prereq. See the LONG_REF case in t1401
>for an example.

I was not aware of this. I assumed prereq not just meant checking for
pre requisites but also setting them up. Since i still have a follow up
patch series in progress i will keep that in mind and move the actual
setup code into a function in lib-gpg. There are gpg ssh tests in
multiple different test files so i didn't want to create keys for each
of them repeatedly. And i'm not a fan of checking in those file (like
the gpg keyring for testing) since it also needs documentation on how it
was generated that is not quaranteed to match how it was done.

We still could add the actual sign test into the prereq for now.
A `echo "test" | ssh-keygen -Y sign -f $GPGSSHKEY_PRIMARY -n "git"`
will make sure that the keys actually work.

>
>Again, that's mostly a tangent to your issue, and maybe not worth
>futzing with at this point in the release cycle. I'm mostly just
>registering my surprise. ;)
>
>> There are three cases to consider:
>>
>> - On systems where this prerequisite check fails, a key may or may not
>>   be created, but the tests that rely on the key won't be run, so it
>>   doesn't matter either way.
>>
>> - On (clearly the mainline) systems where this check passes and there
>>   are no ACL problems, the key that's generated is stored with
>>   sufficiently restrictive permissions that the tests that rely on the
>>   key can pass.
>>
>> - On my system, where ACLs are a problem, the prerequisite check passes,
>>   and a key is created, but it has permissions that are too permissive.
>>   As a result, when a test calls OpenSSH to use that key, OpenSSH
>>   refuses due to the permissions, and the test fails.
>
>FWIW, that explanation makes perfect sense to me (and your patch seems
>like the right thing to do).
>


>-Peff
Junio C Hamano Nov. 5, 2021, 6:04 p.m. UTC | #10
Jeff King <peff@peff.net> writes:

> On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote:
>
>> > ... I am not quite sure how this explains "tests relating to ssh
>> > signing failing on Cygwin".  After all, this piece of code is
>> > lazy_prereq, which means that ssh-keygen in this block that fails
>> > (due to a less restrictive permissions) would merely mean that tests
>> > that are protected with GPGSSH prerequisite will be skipped without
>> > causing test failures.  After all that is the whole point of
>> > computing prereq on the fly.
>> 
>> The issue is that the prerequisite check isn't _just_ checking a
>> prerequisite: it's also creating an SSH key that's used without further
>> modification by the tests.
>
> This is sort of a side note to your main issue, but I think that relying
> on a lazy_prereq for side effects is an anti-pattern. We make no
> promises about when or how often the prereqs might be run, and we try to
> insulate them from the main tests (by putting them in a subshell and
> switching their cwd).
>
> It does happen to work here because the prereq script writes directly to
> $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I
> don't think it's really in any danger of breaking, but it is definitely
> not using the feature as it was intended. :)

This merely imitates what GPG lazy-prerequisite started and imitated
by other existing signature backends.

I'd expect that you need some "initialization" for a feature X as
part of asking "is feature X usable in this environment?".  Reusing
the result of the initialization for true tests is probably an
optimization worth making.  As long as the question is answered for
the true tests, that is [*].

    side note: so being able to create a key alone, without
    verifying the resulting key is usable, is a no-no.  That is why
    I said it is a good idea to check if the resulting key is usable
    inside the lazy-prereq.

> Again, that's mostly a tangent to your issue, and maybe not worth
> futzing with at this point in the release cycle. I'm mostly just
> registering my surprise. ;)

My purist side is with you and share the surprise.  But my practical
side says this is probably an optimization worth taking.  If prereq
only checks "if we initialize the keys right way, we can use ssh
signing" and then removes the key and the equivalent to .ssh/
directory, and a real test does "Ok, prereq passes so we know ssh
signing is to be tested.  Now initialize the .ssh/ equivalent and
create key", a fix like Adam came up with must be duplicated in two
(or more) places, one for the prereq that initializes the keys
"right way", and one for each test script that prepares the key used
for it.
Junio C Hamano Nov. 5, 2021, 6:14 p.m. UTC | #11
Adam Dinwoodie <adam@dinwoodie.org> writes:

> On Thursday 04 November 2021 at 12:49 pm -0700, Junio C Hamano wrote:
>> Adam Dinwoodie <adam@dinwoodie.org> writes:
>> 
>> > SSH keys are expected to be created with very restrictive permissions,
>> > and SSH commands will fail if the permissions are not appropriate.  When
>> > creating a directory for SSH keys in test scripts, attempt to clear any
>> > ACLs that might otherwise cause the private key to inherit less
>> > restrictive permissions than it requires.
>> 
>> All of the above makes sense as an explanation as to why the
>> ssh-keygen command may be unhappy with the $GNUPGHOME directory that
>> is prepared here, but ...
>> 
>> > This change is required in particular to avoid tests relating to SSH
>> > signing failing in Cygwin.
>> 
>> ... I am not quite sure how this explains "tests relating to ssh
>> signing failing on Cygwin".  After all, this piece of code is
>> lazy_prereq, which means that ssh-keygen in this block that fails
>> (due to a less restrictive permissions) would merely mean that tests
>> that are protected with GPGSSH prerequisite will be skipped without
>> causing test failures.  After all that is the whole point of
>> computing prereq on the fly.
>
> The issue is that the prerequisite check isn't _just_ checking a
> prerequisite: it's also creating an SSH key that's used without further
> modification by the tests.
>
> There are three cases to consider:
>
> - On systems where this prerequisite check fails, a key may or may not
>   be created, but the tests that rely on the key won't be run, so it
>   doesn't matter either way.
>
> - On (clearly the mainline) systems where this check passes and there
>   are no ACL problems, the key that's generated is stored with
>   sufficiently restrictive permissions that the tests that rely on the
>   key can pass.
>
> - On my system, where ACLs are a problem, the prerequisite check passes,
>   and a key is created, but it has permissions that are too permissive.
>   As a result, when a test calls OpenSSH to use that key, OpenSSH
>   refuses due to the permissions, and the test fails.

Makes sense.  If we can update the commit log message so that the
above three points are clear to readers without asking (all three
may not necessarily need to be spelled out in the bulletted list
form), that would be great.
Adam Dinwoodie Nov. 5, 2021, 6:49 p.m. UTC | #12
On Fri, 5 Nov 2021 at 18:04, Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote:
> >
> >> > ... I am not quite sure how this explains "tests relating to ssh
> >> > signing failing on Cygwin".  After all, this piece of code is
> >> > lazy_prereq, which means that ssh-keygen in this block that fails
> >> > (due to a less restrictive permissions) would merely mean that tests
> >> > that are protected with GPGSSH prerequisite will be skipped without
> >> > causing test failures.  After all that is the whole point of
> >> > computing prereq on the fly.
> >>
> >> The issue is that the prerequisite check isn't _just_ checking a
> >> prerequisite: it's also creating an SSH key that's used without further
> >> modification by the tests.
> >
> > This is sort of a side note to your main issue, but I think that relying
> > on a lazy_prereq for side effects is an anti-pattern. We make no
> > promises about when or how often the prereqs might be run, and we try to
> > insulate them from the main tests (by putting them in a subshell and
> > switching their cwd).
> >
> > It does happen to work here because the prereq script writes directly to
> > $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I
> > don't think it's really in any danger of breaking, but it is definitely
> > not using the feature as it was intended. :)
>
> This merely imitates what GPG lazy-prerequisite started and imitated
> by other existing signature backends.
>
> I'd expect that you need some "initialization" for a feature X as
> part of asking "is feature X usable in this environment?".  Reusing
> the result of the initialization for true tests is probably an
> optimization worth making.  As long as the question is answered for
> the true tests, that is [*].
>
>     side note: so being able to create a key alone, without
>     verifying the resulting key is usable, is a no-no.  That is why
>     I said it is a good idea to check if the resulting key is usable
>     inside the lazy-prereq.

I'm not convinced by this. Or at least, I'm convinced by the
principle, but wary of the implications.

Take this case, for example: the function being tested by the
GPGSSH-gated tests is function that should work on Cygwin. If there
were a regression, running the tests on Cygwin ought to catch it, and
in this instance the tests failing meant that we caught a bug. On this
occasion it was a bug in the test library rather than the function
that most Git users care about, but I don't think there's anything
inherent about this situation that means it couldn't have been a
functional bug.

However, if the prerequisite checks had not only created the key but
also verified it could be used, in this scenario these tests would
have been skipped. The function the tests are exercising would still
work, and users would therefore expect it to continue working, but the
only chance we'd have to spot any future regressions is if they're hit
in some other environment or someone spots the tests being skipped by
trawling through the reams of test output to check what tests are
being skipped.

This is probably a much broader conversation. I remember when I first
started packaging Git for Cygwin, I produced a release that didn't
have support for HTTPS URLs due to a missing dependency in my build
environment. The build and test suite all passed -- it assumed I just
wanted to build a release that didn't have HTTPS support -- so some
relatively critical function was silently skipped. I don't know how to
avoid that sort of issue other than relying on (a) user bug (or at
least missing function) reports and (b) folk building Git for
themselves/others periodically going through the output of the
configure scripts and the skipped subtests to make sure only expected
things get missed; neither of those options seem great to me.
Junio C Hamano Nov. 5, 2021, 7:11 p.m. UTC | #13
Adam Dinwoodie <adam@dinwoodie.org> writes:

> This is probably a much broader conversation. I remember when I first
> started packaging Git for Cygwin, I produced a release that didn't
> have support for HTTPS URLs due to a missing dependency in my build
> environment. The build and test suite all passed -- it assumed I just
> wanted to build a release that didn't have HTTPS support -- so some
> relatively critical function was silently skipped. I don't know how to
> avoid that sort of issue other than relying on (a) user bug (or at
> least missing function) reports and (b) folk building Git for
> themselves/others periodically going through the output of the
> configure scripts and the skipped subtests to make sure only expected
> things get missed; neither of those options seem great to me.

I agree with you that there needs a good way to enumerate what the
unsatisfied prerequisites for a particular build are.  That would
have helped in your HTTPS situation.

But that is a separate issue how we should determine a lazy
prerequisite for any feature is satisified.

"We have this feature that our code utilizes. If it is not working
correctly, then we can expect our code that depends on it would not
work, and it is no use testing" is what the test prerequisite system
tries to achieve.  That is quite different from "the frotz feature
could work here as we see a binary /usr/bin/frotz installed, so
let's go test our code that depends on it---we'll find out if the
installed frotz is not what we expect, or way too old to help our
code, as the test will break and let us notice."
Adam Dinwoodie Nov. 5, 2021, 7:24 p.m. UTC | #14
On Fri, 5 Nov 2021 at 19:11, Junio C Hamano <gitster@pobox.com> wrote:
>
> Adam Dinwoodie <adam@dinwoodie.org> writes:
>
> > This is probably a much broader conversation. I remember when I first
> > started packaging Git for Cygwin, I produced a release that didn't
> > have support for HTTPS URLs due to a missing dependency in my build
> > environment. The build and test suite all passed -- it assumed I just
> > wanted to build a release that didn't have HTTPS support -- so some
> > relatively critical function was silently skipped. I don't know how to
> > avoid that sort of issue other than relying on (a) user bug (or at
> > least missing function) reports and (b) folk building Git for
> > themselves/others periodically going through the output of the
> > configure scripts and the skipped subtests to make sure only expected
> > things get missed; neither of those options seem great to me.
>
> I agree with you that there needs a good way to enumerate what the
> unsatisfied prerequisites for a particular build are.  That would
> have helped in your HTTPS situation.
>
> But that is a separate issue how we should determine a lazy
> prerequisite for any feature is satisified.
>
> "We have this feature that our code utilizes. If it is not working
> correctly, then we can expect our code that depends on it would not
> work, and it is no use testing" is what the test prerequisite system
> tries to achieve.  That is quite different from "the frotz feature
> could work here as we see a binary /usr/bin/frotz installed, so
> let's go test our code that depends on it---we'll find out if the
> installed frotz is not what we expect, or way too old to help our
> code, as the test will break and let us notice."

I can see how they're separate problems, but they seem related to me.
If OpenSSH were not installed on my system, Git would be compiled
without this function and the tests would be skipped. If OpenSSH is
installed but the prerequisite check fails, Git will be compiled with
the function, but the tests will be skipped. In the first case,
function some users might depend on will be missing; in the second,
the function will be nominally present but we won't be sure it's
actually working as expected. Both issues would be avoided if the
tests were always run, because suddenly both sorts of silent failure
become noisy.

I'm not actually advocating that -- running all tests all the time
would clearly cause far more problems than it would solve! -- but
that's why I'm seeing these as two sides of the same coin, and
problems that might have a single shared solution.
Carlo Marcelo Arenas Belón Nov. 5, 2021, 9 p.m. UTC | #15
On Fri, Nov 5, 2021 at 1:16 PM Adam Dinwoodie <adam@dinwoodie.org> wrote:

> If OpenSSH were not installed on my system, Git would be compiled
> without this function and the tests would be skipped.

that is correct for the http dependency (because it is a library that
gets linked in), but not for the OpenSSH dependency, which is just
invoking the binary at runtime.

Regardless of what you have in your build environment the code will be
compiled in (and tested or not), and will fail instead at runtime if
OpenSSH is not installed.

Carlo
Ramsay Jones Nov. 5, 2021, 9:44 p.m. UTC | #16
On 05/11/2021 11:47, Adam Dinwoodie wrote:
> On Thursday 04 November 2021 at 08:09 pm +0000, Ramsay Jones wrote:
>> Hi Adam,
>>
>> On 04/11/2021 19:25, Adam Dinwoodie wrote:
>>> SSH keys are expected to be created with very restrictive permissions,
>>> and SSH commands will fail if the permissions are not appropriate.  When
>>> creating a directory for SSH keys in test scripts, attempt to clear any
>>> ACLs that might otherwise cause the private key to inherit less
>>> restrictive permissions than it requires.
>>
>> I was somewhat surprised to see your report, since all these tests
>> passed without issue for me on '-rc0'! :D (64-bit cygwin only).
>>
>> So, the difference seems to be down to FS ACLs, Hmmm ...
>>
>> (BTW, I am on windows 10 21H1)

Just FYI, tests t4202, t5534 and t6200 all pass for me without issue
on both of the -rc0 and -rc1 builds.

> I'm running these tests in subdirectories in the temporary drive on
> Dv4-size Windows 11 Pro Gen2 Azure VMs.  I'm spinning up fresh VMs and
> using new Cygwin installations regularly, in the name of build
> reproducibility; I'm vaguely working on automating more and more of the
> Cygwin Git test and release processes.
> 
> (At some point now they're becoming available, I'll probably shift to
> Ddv5 Azure VMs for this work; I very much doubt that'll make a
> difference, but I note it for the sake of completeness.  Longer-term,
> I'm hoping to swap to using GitHub Actions to do most of the heavy
> lifting.)
> 
> This isn't the first time I've seen similar problems in this environment
> that haven't been spotted elsewhere: see a1e03535db (t4129: fix
> setfacl-related permissions failure, 2020-12-23).
> 
> The `getfacl` output for the temporary drive, from Cygwin's perspective,
> is as below; I'm `cd`ing into that directory and getting the Git
> repositories by running `git clone https://github.com/git/git` from
> there.

Heh, yeah, given the setup above, I'm not exactly shocked that you
are running into permission problems ... ;-)

> ```
> # file: /cygdrive/d
> # owner: NETWORK SERVICE
> # group: NETWORK SERVICE
> user::r-x
> group::r-x
> group:SYSTEM:rwx        #effective:r-x
> group:Administrators:rwx        #effective:r-x
> group:Users:r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:group::---
> default:group:SYSTEM:rwx
> default:group:Administrators:rwx
> default:group:Users:rwx
> default:mask::rwx
> default:other::r-x
> ```

I have been using cygwin since the 'beta-8' days (windows NT 3.51, so about
1997 or so) and have run into several permission problems over the years.
So, in order to finesse these issues, I find it best to keep it simple.
I do not move outside of my cygwin installation (at C:\cygwin64), which
even includes my home directory and all git repos.

So, for me:
  
  $ echo $HOME
  /home/ramsay
  $ cygpath -w /home/ramsay
  C:\cygwin64\home\ramsay
  $ 
  
  $ getfacl /cygdrive/c/cygwin64
  # file: /cygdrive/c/cygwin64
  # owner: ramsay
  # group: None
  user::rwx
  group::r-x
  other::r-x
  default:user::rwx
  default:group::r-x
  default:other::r-x
  
  $ id
  uid=1001(ramsay) gid=513(None) groups=513(None),545(Users),4(INTERACTIVE),66049(CONSOLE LOGON),11(Authenticated Users),15(This Organization),113(Local account),4095(CurrentSession),66048(LOCAL),262154(NTLM Authentication),401408(Medium Mandatory Level)
  $

> I'm honestly not sure what it is that means I keep hitting these
> problems with this setup.  I've managed to avoid needing anything but
> the most cursory knowledge of extended permissions handling,
> particularly for Cygwin where one has to contend with both the
> underlying OS's interpretation of file permissions and with the Cygwin
> layer's reinterpretations.  I can't say I'm keen to get a deep working
> knowledge of how all these pieces interact!

I'm definitely no expert, but even with my current setup, I have had
permission problems. I used to 'ssh' into cygwin from Linux so that
I could build/test git on Linux/cygwin at the same time - that worked
fine for many many years, until a test was added that failed when I
was remotely logged-in to cygwin, but passed when I was actually directly
logged-in on the windows laptop. I don't remember the details, but ever
since I have been having to run the tests locally.

[When remotely logged in:

  $ id
  uid=1001(ramsay) gid=513(None) groups=513(None),114(Local account and member of Administrators group),0(root),545(Users),2(NETWORK),11(Authenticated Users),15(This Organization),113(Local account),4095(CurrentSession),262154(NTLM Authentication),405504(High Mandatory Level)
  $ 

Yes, I am still using the 'privileged user' account for the 'sshd' service.
I suppose I should re-configure it to use the LOCAL ACCOUNT and test again,
but, well, if it ain't broke ... ;-)
]

ATB,
Ramsay Jones
Jeff King Nov. 5, 2021, 11:39 p.m. UTC | #17
On Fri, Nov 05, 2021 at 11:04:15AM -0700, Junio C Hamano wrote:

> > This is sort of a side note to your main issue, but I think that relying
> > on a lazy_prereq for side effects is an anti-pattern. We make no
> > promises about when or how often the prereqs might be run, and we try to
> > insulate them from the main tests (by putting them in a subshell and
> > switching their cwd).
> >
> > It does happen to work here because the prereq script writes directly to
> > $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I
> > don't think it's really in any danger of breaking, but it is definitely
> > not using the feature as it was intended. :)
> 
> This merely imitates what GPG lazy-prerequisite started and imitated
> by other existing signature backends.

Ah, you're right. I should have checked the other GPG ones. It looks
like that happened recently-ish in b417ec5f22 (tests: turn GPG, GPGSM
and RFC1991 into lazy prereqs, 2020-03-26).

Before that the code was run outside of any test block at all, which I
think is even worse.

> I'd expect that you need some "initialization" for a feature X as
> part of asking "is feature X usable in this environment?".  Reusing
> the result of the initialization for true tests is probably an
> optimization worth making.  As long as the question is answered for
> the true tests, that is [*].

Yes, though if it's possible, I think doing less work in the prereq
check might be a good approach (like say, just checking gpg or openssh
version if we can). It results in flakier prereqs that may say "yes, we
have feature X" when we don't. But it gets a human's attention when
it fails, rather than quietly skipping tests (which is the same point
Adam is making downthread).

It definitely is not something to fiddle with at this point in the -rc
cycle, though.

> > Again, that's mostly a tangent to your issue, and maybe not worth
> > futzing with at this point in the release cycle. I'm mostly just
> > registering my surprise. ;)
> 
> My purist side is with you and share the surprise.  But my practical
> side says this is probably an optimization worth taking.  If prereq
> only checks "if we initialize the keys right way, we can use ssh
> signing" and then removes the key and the equivalent to .ssh/
> directory, and a real test does "Ok, prereq passes so we know ssh
> signing is to be tested.  Now initialize the .ssh/ equivalent and
> create key", a fix like Adam came up with must be duplicated in two
> (or more) places, one for the prereq that initializes the keys
> "right way", and one for each test script that prepares the key used
> for it.

To be clear, I wasn't suggesting doing the key setup twice. I was just
suggesting moving it out of a lazy prereq into a real test_expect block
that sets the prereq flag as a side effect. That just makes the timing
and the fact of running it more deliberate on the part of the test
scripts.

It's probably not worth the effort, though. I think my line of thinking
is coming from the "purist" side, and doesn't have any practical
benefit.

-Peff
Jeff King Nov. 5, 2021, 11:53 p.m. UTC | #18
On Fri, Nov 05, 2021 at 06:49:14PM +0000, Adam Dinwoodie wrote:

> This is probably a much broader conversation. I remember when I first
> started packaging Git for Cygwin, I produced a release that didn't
> have support for HTTPS URLs due to a missing dependency in my build
> environment. The build and test suite all passed -- it assumed I just
> wanted to build a release that didn't have HTTPS support -- so some
> relatively critical function was silently skipped. I don't know how to
> avoid that sort of issue other than relying on (a) user bug (or at
> least missing function) reports and (b) folk building Git for
> themselves/others periodically going through the output of the
> configure scripts and the skipped subtests to make sure only expected
> things get missed; neither of those options seem great to me.

The HTTP tests in particular have a knob for this, as I was worried
about this kind of situation when we introduced auto-enabling of network
tests back in 83d842dc8c (tests: turn on network daemon tests by
default, 2014-02-10). The solution there was to make the knob a
tri-state: the default is "auto", which will try to probe whether we
have a working apache setup, but setting it to "true" will complain if
that setup fails.

Now that's not a perfect solution:

  - you have to know to flip the switch to "true". For an old switch
    like HTTP, that's easy. But somebody packaging Git might not even
    realize GPGSSH was a new thing.

  - The "true" knob only covers probing of the environment. If you
    accidentally build with NO_CURL, we'd still quietly skip the tests.
    It might be reasonable to change this.

  - In your particular case, it probably would not have helped anyway
    because we don't have any specific HTTPS tests (there is an option
    to set up the default server with SSL, but I didn't even realize
    that until just now; I wonder if it actually works).

So I dunno. I guess because of point 1, having an allow-known-skips list
would be more helpful. That gives you the opportunity to examine new
prereqs and decide if they ought to be skipped or not in your setup.

-Peff
diff mbox series

Patch

diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
index f99ef3e859..1d8e5b5b7e 100644
--- a/t/lib-gpg.sh
+++ b/t/lib-gpg.sh
@@ -106,6 +106,7 @@  test_lazy_prereq GPGSSH '
 	test $? = 0 || exit 1;
 	mkdir -p "${GNUPGHOME}" &&
 	chmod 0700 "${GNUPGHOME}" &&
+	(setfacl -k "${GNUPGHOME}" 2>/dev/null || true) &&
 	ssh-keygen -t ed25519 -N "" -C "git ed25519 key" -f "${GPGSSH_KEY_PRIMARY}" >/dev/null &&
 	echo "\"principal with number 1\" $(cat "${GPGSSH_KEY_PRIMARY}.pub")" >> "${GPGSSH_ALLOWED_SIGNERS}" &&
 	ssh-keygen -t rsa -b 2048 -N "" -C "git rsa2048 key" -f "${GPGSSH_KEY_SECONDARY}" >/dev/null &&