Message ID | 20211104192533.2520-1-adam@dinwoodie.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | t/lib-git.sh: fix ACL-related permissions failure | expand |
Adam Dinwoodie <adam@dinwoodie.org> writes: > SSH keys are expected to be created with very restrictive permissions, > and SSH commands will fail if the permissions are not appropriate. When > creating a directory for SSH keys in test scripts, attempt to clear any > ACLs that might otherwise cause the private key to inherit less > restrictive permissions than it requires. All of the above makes sense as an explanation as to why the ssh-keygen command may be unhappy with the $GNUPGHOME directory that is prepared here, but ... > This change is required in particular to avoid tests relating to SSH > signing failing in Cygwin. ... I am not quite sure how this explains "tests relating to ssh signing failing on Cygwin". After all, this piece of code is lazy_prereq, which means that ssh-keygen in this block that fails (due to a less restrictive permissions) would merely mean that tests that are protected with GPGSSH prerequisite will be skipped without causing test failures. After all that is the whole point of computing prereq on the fly. > Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> > Helped-by: Fabian Stelzer <fs@gigacodes.de> Please order these chronologically, i.e. Fabian helped and the patch was finished, and finally you sent with your sign off. > --- > t/lib-gpg.sh | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh > index f99ef3e859..1d8e5b5b7e 100644 > --- a/t/lib-gpg.sh > +++ b/t/lib-gpg.sh > @@ -106,6 +106,7 @@ test_lazy_prereq GPGSSH ' > test $? = 0 || exit 1; > mkdir -p "${GNUPGHOME}" && > chmod 0700 "${GNUPGHOME}" && > + (setfacl -k "${GNUPGHOME}" 2>/dev/null || true) && > ssh-keygen -t ed25519 -N "" -C "git ed25519 key" -f "${GPGSSH_KEY_PRIMARY}" >/dev/null && > echo "\"principal with number 1\" $(cat "${GPGSSH_KEY_PRIMARY}.pub")" >> "${GPGSSH_ALLOWED_SIGNERS}" && > ssh-keygen -t rsa -b 2048 -N "" -C "git rsa2048 key" -f "${GPGSSH_KEY_SECONDARY}" >/dev/null && There are other uses of ssh-keygen in the real tests but presumably they just use the GNUPGHOME directory prepared with this lazy_prereq block, and "setfacl -k" here would have wiped any possible loosening of permission, and that is why this is the only place that needs a change, right? That fact might deserve recording in the proposed log message. Thanks.
Junio C Hamano <gitster@pobox.com> writes: >> This change is required in particular to avoid tests relating to SSH >> signing failing in Cygwin. > > ... I am not quite sure how this explains "tests relating to ssh > signing failing on Cygwin". After all, this piece of code is > lazy_prereq, which means that ssh-keygen in this block that fails > (due to a less restrictive permissions) would merely mean that tests > that are protected with GPGSSH prerequisite will be skipped without > causing test failures. After all that is the whole point of > computing prereq on the fly. The reason why I wondered about the above is that it can be an indication of another breakage, namely, that we may have tests that require a working ssh-keygen but are by mistake not protected with GPGSSH prerequisite. The test_lazy_prereq block you touched may refrain from setting the prerequisite on your system (due to the faulty test here that you touched), but if we had such unprotected tests, we still will run ssh signing tests and they would fail, due to the lack of the prerequisite. And fixing the prereq block alone will hide that other breakage, at least on your system. Hence my question. Thanks.
Hi Adam, On 04/11/2021 19:25, Adam Dinwoodie wrote: > SSH keys are expected to be created with very restrictive permissions, > and SSH commands will fail if the permissions are not appropriate. When > creating a directory for SSH keys in test scripts, attempt to clear any > ACLs that might otherwise cause the private key to inherit less > restrictive permissions than it requires. I was somewhat surprised to see your report, since all these tests passed without issue for me on '-rc0'! :D (64-bit cygwin only). So, the difference seems to be down to FS ACLs, Hmmm ... (BTW, I am on windows 10 21H1) ATB, Ramsay Jones
On 04.11.2021 13:03, Junio C Hamano wrote: >Junio C Hamano <gitster@pobox.com> writes: > >>> This change is required in particular to avoid tests relating to SSH >>> signing failing in Cygwin. >> >> ... I am not quite sure how this explains "tests relating to ssh >> signing failing on Cygwin". After all, this piece of code is >> lazy_prereq, which means that ssh-keygen in this block that fails >> (due to a less restrictive permissions) would merely mean that tests >> that are protected with GPGSSH prerequisite will be skipped without >> causing test failures. After all that is the whole point of >> computing prereq on the fly. > >The reason why I wondered about the above is that it can be an >indication of another breakage, namely, that we may have tests that >require a working ssh-keygen but are by mistake not protected with >GPGSSH prerequisite. > >The test_lazy_prereq block you touched may refrain from setting the >prerequisite on your system (due to the faulty test here that you >touched), but if we had such unprotected tests, we still will run >ssh signing tests and they would fail, due to the lack of the >prerequisite. > >And fixing the prereq block alone will hide that other breakage, at >least on your system. Hence my question. > >Thanks. The problem is that the ssh-keygen in the layz_prereq will succeed but might create a private key with world readable permissions. Only the remaining tests using this key will then fail with a "your private key permissions are too restrictive" like error. If we would like to make sure in the prereq that the keys actually work fine we would need to do a signing operation with them in it. Something like the following call would be enough: echo "test" | ssh-keygen -Y sign -f $GPGSSHKEY_PRIMARY -n "git" Not sure if we want to go that far though. The setfacl seems fine to me otherwise.
Fabian Stelzer <fs@gigacodes.de> writes: > The problem is that the ssh-keygen in the layz_prereq will succeed but > might create a private key with world readable permissions. Only the > remaining tests using this key will then fail with a "your private key > permissions are too restrictive" like error. If we would like to make > sure in the prereq that the keys actually work fine we would need to do > a signing operation with them in it. That sounds like a right thing to do, with or without the setfacl fix. > > Something like the following call would be enough: > echo "test" | ssh-keygen -Y sign -f $GPGSSHKEY_PRIMARY -n "git" > Not sure if we want to go that far though. The setfacl seems fine to me > otherwise.
On Thursday 04 November 2021 at 12:49 pm -0700, Junio C Hamano wrote: > Adam Dinwoodie <adam@dinwoodie.org> writes: > > > SSH keys are expected to be created with very restrictive permissions, > > and SSH commands will fail if the permissions are not appropriate. When > > creating a directory for SSH keys in test scripts, attempt to clear any > > ACLs that might otherwise cause the private key to inherit less > > restrictive permissions than it requires. > > All of the above makes sense as an explanation as to why the > ssh-keygen command may be unhappy with the $GNUPGHOME directory that > is prepared here, but ... > > > This change is required in particular to avoid tests relating to SSH > > signing failing in Cygwin. > > ... I am not quite sure how this explains "tests relating to ssh > signing failing on Cygwin". After all, this piece of code is > lazy_prereq, which means that ssh-keygen in this block that fails > (due to a less restrictive permissions) would merely mean that tests > that are protected with GPGSSH prerequisite will be skipped without > causing test failures. After all that is the whole point of > computing prereq on the fly. The issue is that the prerequisite check isn't _just_ checking a prerequisite: it's also creating an SSH key that's used without further modification by the tests. There are three cases to consider: - On systems where this prerequisite check fails, a key may or may not be created, but the tests that rely on the key won't be run, so it doesn't matter either way. - On (clearly the mainline) systems where this check passes and there are no ACL problems, the key that's generated is stored with sufficiently restrictive permissions that the tests that rely on the key can pass. - On my system, where ACLs are a problem, the prerequisite check passes, and a key is created, but it has permissions that are too permissive. As a result, when a test calls OpenSSH to use that key, OpenSSH refuses due to the permissions, and the test fails. > > Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> > > Helped-by: Fabian Stelzer <fs@gigacodes.de> > > Please order these chronologically, i.e. Fabian helped and the patch > was finished, and finally you sent with your sign off. Shall do! I'll resubmit a corrected version as soon as all the other discussions about this patch seem tidied up. > > --- > > t/lib-gpg.sh | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh > > index f99ef3e859..1d8e5b5b7e 100644 > > --- a/t/lib-gpg.sh > > +++ b/t/lib-gpg.sh > > @@ -106,6 +106,7 @@ test_lazy_prereq GPGSSH ' > > test $? = 0 || exit 1; > > mkdir -p "${GNUPGHOME}" && > > chmod 0700 "${GNUPGHOME}" && > > + (setfacl -k "${GNUPGHOME}" 2>/dev/null || true) && > > ssh-keygen -t ed25519 -N "" -C "git ed25519 key" -f "${GPGSSH_KEY_PRIMARY}" >/dev/null && > > echo "\"principal with number 1\" $(cat "${GPGSSH_KEY_PRIMARY}.pub")" >> "${GPGSSH_ALLOWED_SIGNERS}" && > > ssh-keygen -t rsa -b 2048 -N "" -C "git rsa2048 key" -f "${GPGSSH_KEY_SECONDARY}" >/dev/null && > > There are other uses of ssh-keygen in the real tests but presumably > they just use the GNUPGHOME directory prepared with this lazy_prereq > block, and "setfacl -k" here would have wiped any possible loosening > of permission, and that is why this is the only place that needs a > change, right? That fact might deserve recording in the proposed > log message. More than that: the uses of ssh-keygen elsewhere are calling `ssh-keygen -l`. That command doesn't generate a key at all, it merely calculates the fingerprint from the keys that are generated with the ssh-keygen commands in this prerequisite check. I think it's unusual that this test_lazy_prereq check isn't just checking the state of the system but is creating the keys that will be used in later tests, but I didn't think to document that in my log because that was clearly a decision that had been made earlier. But I'm very happy to make that logic clearer in my commit message!
On Thursday 04 November 2021 at 08:09 pm +0000, Ramsay Jones wrote: > Hi Adam, > > On 04/11/2021 19:25, Adam Dinwoodie wrote: > > SSH keys are expected to be created with very restrictive permissions, > > and SSH commands will fail if the permissions are not appropriate. When > > creating a directory for SSH keys in test scripts, attempt to clear any > > ACLs that might otherwise cause the private key to inherit less > > restrictive permissions than it requires. > > I was somewhat surprised to see your report, since all these tests > passed without issue for me on '-rc0'! :D (64-bit cygwin only). > > So, the difference seems to be down to FS ACLs, Hmmm ... > > (BTW, I am on windows 10 21H1) I'm running these tests in subdirectories in the temporary drive on Dv4-size Windows 11 Pro Gen2 Azure VMs. I'm spinning up fresh VMs and using new Cygwin installations regularly, in the name of build reproducibility; I'm vaguely working on automating more and more of the Cygwin Git test and release processes. (At some point now they're becoming available, I'll probably shift to Ddv5 Azure VMs for this work; I very much doubt that'll make a difference, but I note it for the sake of completeness. Longer-term, I'm hoping to swap to using GitHub Actions to do most of the heavy lifting.) This isn't the first time I've seen similar problems in this environment that haven't been spotted elsewhere: see a1e03535db (t4129: fix setfacl-related permissions failure, 2020-12-23). The `getfacl` output for the temporary drive, from Cygwin's perspective, is as below; I'm `cd`ing into that directory and getting the Git repositories by running `git clone https://github.com/git/git` from there. ``` # file: /cygdrive/d # owner: NETWORK SERVICE # group: NETWORK SERVICE user::r-x group::r-x group:SYSTEM:rwx #effective:r-x group:Administrators:rwx #effective:r-x group:Users:r-x mask::r-x other::r-x default:user::rwx default:group::--- default:group:SYSTEM:rwx default:group:Administrators:rwx default:group:Users:rwx default:mask::rwx default:other::r-x ``` I'm honestly not sure what it is that means I keep hitting these problems with this setup. I've managed to avoid needing anything but the most cursory knowledge of extended permissions handling, particularly for Cygwin where one has to contend with both the underlying OS's interpretation of file permissions and with the Cygwin layer's reinterpretations. I can't say I'm keen to get a deep working knowledge of how all these pieces interact!
On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote: > > ... I am not quite sure how this explains "tests relating to ssh > > signing failing on Cygwin". After all, this piece of code is > > lazy_prereq, which means that ssh-keygen in this block that fails > > (due to a less restrictive permissions) would merely mean that tests > > that are protected with GPGSSH prerequisite will be skipped without > > causing test failures. After all that is the whole point of > > computing prereq on the fly. > > The issue is that the prerequisite check isn't _just_ checking a > prerequisite: it's also creating an SSH key that's used without further > modification by the tests. This is sort of a side note to your main issue, but I think that relying on a lazy_prereq for side effects is an anti-pattern. We make no promises about when or how often the prereqs might be run, and we try to insulate them from the main tests (by putting them in a subshell and switching their cwd). It does happen to work here because the prereq script writes directly to $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I don't think it's really in any danger of breaking, but it is definitely not using the feature as it was intended. :) I think the more usual way would be to have an actual test_expect_success block that creates the keys as a setup step (possibly triggered by a function, since it's included via lib-gpg.sh). If we don't want to decide whether we have the GPGSSH prereq until then, then that test can call test_set_prereq. See the LONG_REF case in t1401 for an example. Again, that's mostly a tangent to your issue, and maybe not worth futzing with at this point in the release cycle. I'm mostly just registering my surprise. ;) > There are three cases to consider: > > - On systems where this prerequisite check fails, a key may or may not > be created, but the tests that rely on the key won't be run, so it > doesn't matter either way. > > - On (clearly the mainline) systems where this check passes and there > are no ACL problems, the key that's generated is stored with > sufficiently restrictive permissions that the tests that rely on the > key can pass. > > - On my system, where ACLs are a problem, the prerequisite check passes, > and a key is created, but it has permissions that are too permissive. > As a result, when a test calls OpenSSH to use that key, OpenSSH > refuses due to the permissions, and the test fails. FWIW, that explanation makes perfect sense to me (and your patch seems like the right thing to do). -Peff
On 05.11.2021 08:06, Jeff King wrote: >On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote: > >> > ... I am not quite sure how this explains "tests relating to ssh >> > signing failing on Cygwin". After all, this piece of code is >> > lazy_prereq, which means that ssh-keygen in this block that fails >> > (due to a less restrictive permissions) would merely mean that tests >> > that are protected with GPGSSH prerequisite will be skipped without >> > causing test failures. After all that is the whole point of >> > computing prereq on the fly. >> >> The issue is that the prerequisite check isn't _just_ checking a >> prerequisite: it's also creating an SSH key that's used without further >> modification by the tests. > >This is sort of a side note to your main issue, but I think that relying >on a lazy_prereq for side effects is an anti-pattern. We make no >promises about when or how often the prereqs might be run, and we try to >insulate them from the main tests (by putting them in a subshell and >switching their cwd). > >It does happen to work here because the prereq script writes directly to >$GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I >don't think it's really in any danger of breaking, but it is definitely >not using the feature as it was intended. :) > >I think the more usual way would be to have an actual >test_expect_success block that creates the keys as a setup step >(possibly triggered by a function, since it's included via lib-gpg.sh). >If we don't want to decide whether we have the GPGSSH prereq until then, >then that test can call test_set_prereq. See the LONG_REF case in t1401 >for an example. I was not aware of this. I assumed prereq not just meant checking for pre requisites but also setting them up. Since i still have a follow up patch series in progress i will keep that in mind and move the actual setup code into a function in lib-gpg. There are gpg ssh tests in multiple different test files so i didn't want to create keys for each of them repeatedly. And i'm not a fan of checking in those file (like the gpg keyring for testing) since it also needs documentation on how it was generated that is not quaranteed to match how it was done. We still could add the actual sign test into the prereq for now. A `echo "test" | ssh-keygen -Y sign -f $GPGSSHKEY_PRIMARY -n "git"` will make sure that the keys actually work. > >Again, that's mostly a tangent to your issue, and maybe not worth >futzing with at this point in the release cycle. I'm mostly just >registering my surprise. ;) > >> There are three cases to consider: >> >> - On systems where this prerequisite check fails, a key may or may not >> be created, but the tests that rely on the key won't be run, so it >> doesn't matter either way. >> >> - On (clearly the mainline) systems where this check passes and there >> are no ACL problems, the key that's generated is stored with >> sufficiently restrictive permissions that the tests that rely on the >> key can pass. >> >> - On my system, where ACLs are a problem, the prerequisite check passes, >> and a key is created, but it has permissions that are too permissive. >> As a result, when a test calls OpenSSH to use that key, OpenSSH >> refuses due to the permissions, and the test fails. > >FWIW, that explanation makes perfect sense to me (and your patch seems >like the right thing to do). > >-Peff
Jeff King <peff@peff.net> writes: > On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote: > >> > ... I am not quite sure how this explains "tests relating to ssh >> > signing failing on Cygwin". After all, this piece of code is >> > lazy_prereq, which means that ssh-keygen in this block that fails >> > (due to a less restrictive permissions) would merely mean that tests >> > that are protected with GPGSSH prerequisite will be skipped without >> > causing test failures. After all that is the whole point of >> > computing prereq on the fly. >> >> The issue is that the prerequisite check isn't _just_ checking a >> prerequisite: it's also creating an SSH key that's used without further >> modification by the tests. > > This is sort of a side note to your main issue, but I think that relying > on a lazy_prereq for side effects is an anti-pattern. We make no > promises about when or how often the prereqs might be run, and we try to > insulate them from the main tests (by putting them in a subshell and > switching their cwd). > > It does happen to work here because the prereq script writes directly to > $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I > don't think it's really in any danger of breaking, but it is definitely > not using the feature as it was intended. :) This merely imitates what GPG lazy-prerequisite started and imitated by other existing signature backends. I'd expect that you need some "initialization" for a feature X as part of asking "is feature X usable in this environment?". Reusing the result of the initialization for true tests is probably an optimization worth making. As long as the question is answered for the true tests, that is [*]. side note: so being able to create a key alone, without verifying the resulting key is usable, is a no-no. That is why I said it is a good idea to check if the resulting key is usable inside the lazy-prereq. > Again, that's mostly a tangent to your issue, and maybe not worth > futzing with at this point in the release cycle. I'm mostly just > registering my surprise. ;) My purist side is with you and share the surprise. But my practical side says this is probably an optimization worth taking. If prereq only checks "if we initialize the keys right way, we can use ssh signing" and then removes the key and the equivalent to .ssh/ directory, and a real test does "Ok, prereq passes so we know ssh signing is to be tested. Now initialize the .ssh/ equivalent and create key", a fix like Adam came up with must be duplicated in two (or more) places, one for the prereq that initializes the keys "right way", and one for each test script that prepares the key used for it.
Adam Dinwoodie <adam@dinwoodie.org> writes: > On Thursday 04 November 2021 at 12:49 pm -0700, Junio C Hamano wrote: >> Adam Dinwoodie <adam@dinwoodie.org> writes: >> >> > SSH keys are expected to be created with very restrictive permissions, >> > and SSH commands will fail if the permissions are not appropriate. When >> > creating a directory for SSH keys in test scripts, attempt to clear any >> > ACLs that might otherwise cause the private key to inherit less >> > restrictive permissions than it requires. >> >> All of the above makes sense as an explanation as to why the >> ssh-keygen command may be unhappy with the $GNUPGHOME directory that >> is prepared here, but ... >> >> > This change is required in particular to avoid tests relating to SSH >> > signing failing in Cygwin. >> >> ... I am not quite sure how this explains "tests relating to ssh >> signing failing on Cygwin". After all, this piece of code is >> lazy_prereq, which means that ssh-keygen in this block that fails >> (due to a less restrictive permissions) would merely mean that tests >> that are protected with GPGSSH prerequisite will be skipped without >> causing test failures. After all that is the whole point of >> computing prereq on the fly. > > The issue is that the prerequisite check isn't _just_ checking a > prerequisite: it's also creating an SSH key that's used without further > modification by the tests. > > There are three cases to consider: > > - On systems where this prerequisite check fails, a key may or may not > be created, but the tests that rely on the key won't be run, so it > doesn't matter either way. > > - On (clearly the mainline) systems where this check passes and there > are no ACL problems, the key that's generated is stored with > sufficiently restrictive permissions that the tests that rely on the > key can pass. > > - On my system, where ACLs are a problem, the prerequisite check passes, > and a key is created, but it has permissions that are too permissive. > As a result, when a test calls OpenSSH to use that key, OpenSSH > refuses due to the permissions, and the test fails. Makes sense. If we can update the commit log message so that the above three points are clear to readers without asking (all three may not necessarily need to be spelled out in the bulletted list form), that would be great.
On Fri, 5 Nov 2021 at 18:04, Junio C Hamano <gitster@pobox.com> wrote: > > Jeff King <peff@peff.net> writes: > > > On Fri, Nov 05, 2021 at 11:25:25AM +0000, Adam Dinwoodie wrote: > > > >> > ... I am not quite sure how this explains "tests relating to ssh > >> > signing failing on Cygwin". After all, this piece of code is > >> > lazy_prereq, which means that ssh-keygen in this block that fails > >> > (due to a less restrictive permissions) would merely mean that tests > >> > that are protected with GPGSSH prerequisite will be skipped without > >> > causing test failures. After all that is the whole point of > >> > computing prereq on the fly. > >> > >> The issue is that the prerequisite check isn't _just_ checking a > >> prerequisite: it's also creating an SSH key that's used without further > >> modification by the tests. > > > > This is sort of a side note to your main issue, but I think that relying > > on a lazy_prereq for side effects is an anti-pattern. We make no > > promises about when or how often the prereqs might be run, and we try to > > insulate them from the main tests (by putting them in a subshell and > > switching their cwd). > > > > It does happen to work here because the prereq script writes directly to > > $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I > > don't think it's really in any danger of breaking, but it is definitely > > not using the feature as it was intended. :) > > This merely imitates what GPG lazy-prerequisite started and imitated > by other existing signature backends. > > I'd expect that you need some "initialization" for a feature X as > part of asking "is feature X usable in this environment?". Reusing > the result of the initialization for true tests is probably an > optimization worth making. As long as the question is answered for > the true tests, that is [*]. > > side note: so being able to create a key alone, without > verifying the resulting key is usable, is a no-no. That is why > I said it is a good idea to check if the resulting key is usable > inside the lazy-prereq. I'm not convinced by this. Or at least, I'm convinced by the principle, but wary of the implications. Take this case, for example: the function being tested by the GPGSSH-gated tests is function that should work on Cygwin. If there were a regression, running the tests on Cygwin ought to catch it, and in this instance the tests failing meant that we caught a bug. On this occasion it was a bug in the test library rather than the function that most Git users care about, but I don't think there's anything inherent about this situation that means it couldn't have been a functional bug. However, if the prerequisite checks had not only created the key but also verified it could be used, in this scenario these tests would have been skipped. The function the tests are exercising would still work, and users would therefore expect it to continue working, but the only chance we'd have to spot any future regressions is if they're hit in some other environment or someone spots the tests being skipped by trawling through the reams of test output to check what tests are being skipped. This is probably a much broader conversation. I remember when I first started packaging Git for Cygwin, I produced a release that didn't have support for HTTPS URLs due to a missing dependency in my build environment. The build and test suite all passed -- it assumed I just wanted to build a release that didn't have HTTPS support -- so some relatively critical function was silently skipped. I don't know how to avoid that sort of issue other than relying on (a) user bug (or at least missing function) reports and (b) folk building Git for themselves/others periodically going through the output of the configure scripts and the skipped subtests to make sure only expected things get missed; neither of those options seem great to me.
Adam Dinwoodie <adam@dinwoodie.org> writes: > This is probably a much broader conversation. I remember when I first > started packaging Git for Cygwin, I produced a release that didn't > have support for HTTPS URLs due to a missing dependency in my build > environment. The build and test suite all passed -- it assumed I just > wanted to build a release that didn't have HTTPS support -- so some > relatively critical function was silently skipped. I don't know how to > avoid that sort of issue other than relying on (a) user bug (or at > least missing function) reports and (b) folk building Git for > themselves/others periodically going through the output of the > configure scripts and the skipped subtests to make sure only expected > things get missed; neither of those options seem great to me. I agree with you that there needs a good way to enumerate what the unsatisfied prerequisites for a particular build are. That would have helped in your HTTPS situation. But that is a separate issue how we should determine a lazy prerequisite for any feature is satisified. "We have this feature that our code utilizes. If it is not working correctly, then we can expect our code that depends on it would not work, and it is no use testing" is what the test prerequisite system tries to achieve. That is quite different from "the frotz feature could work here as we see a binary /usr/bin/frotz installed, so let's go test our code that depends on it---we'll find out if the installed frotz is not what we expect, or way too old to help our code, as the test will break and let us notice."
On Fri, 5 Nov 2021 at 19:11, Junio C Hamano <gitster@pobox.com> wrote: > > Adam Dinwoodie <adam@dinwoodie.org> writes: > > > This is probably a much broader conversation. I remember when I first > > started packaging Git for Cygwin, I produced a release that didn't > > have support for HTTPS URLs due to a missing dependency in my build > > environment. The build and test suite all passed -- it assumed I just > > wanted to build a release that didn't have HTTPS support -- so some > > relatively critical function was silently skipped. I don't know how to > > avoid that sort of issue other than relying on (a) user bug (or at > > least missing function) reports and (b) folk building Git for > > themselves/others periodically going through the output of the > > configure scripts and the skipped subtests to make sure only expected > > things get missed; neither of those options seem great to me. > > I agree with you that there needs a good way to enumerate what the > unsatisfied prerequisites for a particular build are. That would > have helped in your HTTPS situation. > > But that is a separate issue how we should determine a lazy > prerequisite for any feature is satisified. > > "We have this feature that our code utilizes. If it is not working > correctly, then we can expect our code that depends on it would not > work, and it is no use testing" is what the test prerequisite system > tries to achieve. That is quite different from "the frotz feature > could work here as we see a binary /usr/bin/frotz installed, so > let's go test our code that depends on it---we'll find out if the > installed frotz is not what we expect, or way too old to help our > code, as the test will break and let us notice." I can see how they're separate problems, but they seem related to me. If OpenSSH were not installed on my system, Git would be compiled without this function and the tests would be skipped. If OpenSSH is installed but the prerequisite check fails, Git will be compiled with the function, but the tests will be skipped. In the first case, function some users might depend on will be missing; in the second, the function will be nominally present but we won't be sure it's actually working as expected. Both issues would be avoided if the tests were always run, because suddenly both sorts of silent failure become noisy. I'm not actually advocating that -- running all tests all the time would clearly cause far more problems than it would solve! -- but that's why I'm seeing these as two sides of the same coin, and problems that might have a single shared solution.
On Fri, Nov 5, 2021 at 1:16 PM Adam Dinwoodie <adam@dinwoodie.org> wrote: > If OpenSSH were not installed on my system, Git would be compiled > without this function and the tests would be skipped. that is correct for the http dependency (because it is a library that gets linked in), but not for the OpenSSH dependency, which is just invoking the binary at runtime. Regardless of what you have in your build environment the code will be compiled in (and tested or not), and will fail instead at runtime if OpenSSH is not installed. Carlo
On 05/11/2021 11:47, Adam Dinwoodie wrote: > On Thursday 04 November 2021 at 08:09 pm +0000, Ramsay Jones wrote: >> Hi Adam, >> >> On 04/11/2021 19:25, Adam Dinwoodie wrote: >>> SSH keys are expected to be created with very restrictive permissions, >>> and SSH commands will fail if the permissions are not appropriate. When >>> creating a directory for SSH keys in test scripts, attempt to clear any >>> ACLs that might otherwise cause the private key to inherit less >>> restrictive permissions than it requires. >> >> I was somewhat surprised to see your report, since all these tests >> passed without issue for me on '-rc0'! :D (64-bit cygwin only). >> >> So, the difference seems to be down to FS ACLs, Hmmm ... >> >> (BTW, I am on windows 10 21H1) Just FYI, tests t4202, t5534 and t6200 all pass for me without issue on both of the -rc0 and -rc1 builds. > I'm running these tests in subdirectories in the temporary drive on > Dv4-size Windows 11 Pro Gen2 Azure VMs. I'm spinning up fresh VMs and > using new Cygwin installations regularly, in the name of build > reproducibility; I'm vaguely working on automating more and more of the > Cygwin Git test and release processes. > > (At some point now they're becoming available, I'll probably shift to > Ddv5 Azure VMs for this work; I very much doubt that'll make a > difference, but I note it for the sake of completeness. Longer-term, > I'm hoping to swap to using GitHub Actions to do most of the heavy > lifting.) > > This isn't the first time I've seen similar problems in this environment > that haven't been spotted elsewhere: see a1e03535db (t4129: fix > setfacl-related permissions failure, 2020-12-23). > > The `getfacl` output for the temporary drive, from Cygwin's perspective, > is as below; I'm `cd`ing into that directory and getting the Git > repositories by running `git clone https://github.com/git/git` from > there. Heh, yeah, given the setup above, I'm not exactly shocked that you are running into permission problems ... ;-) > ``` > # file: /cygdrive/d > # owner: NETWORK SERVICE > # group: NETWORK SERVICE > user::r-x > group::r-x > group:SYSTEM:rwx #effective:r-x > group:Administrators:rwx #effective:r-x > group:Users:r-x > mask::r-x > other::r-x > default:user::rwx > default:group::--- > default:group:SYSTEM:rwx > default:group:Administrators:rwx > default:group:Users:rwx > default:mask::rwx > default:other::r-x > ``` I have been using cygwin since the 'beta-8' days (windows NT 3.51, so about 1997 or so) and have run into several permission problems over the years. So, in order to finesse these issues, I find it best to keep it simple. I do not move outside of my cygwin installation (at C:\cygwin64), which even includes my home directory and all git repos. So, for me: $ echo $HOME /home/ramsay $ cygpath -w /home/ramsay C:\cygwin64\home\ramsay $ $ getfacl /cygdrive/c/cygwin64 # file: /cygdrive/c/cygwin64 # owner: ramsay # group: None user::rwx group::r-x other::r-x default:user::rwx default:group::r-x default:other::r-x $ id uid=1001(ramsay) gid=513(None) groups=513(None),545(Users),4(INTERACTIVE),66049(CONSOLE LOGON),11(Authenticated Users),15(This Organization),113(Local account),4095(CurrentSession),66048(LOCAL),262154(NTLM Authentication),401408(Medium Mandatory Level) $ > I'm honestly not sure what it is that means I keep hitting these > problems with this setup. I've managed to avoid needing anything but > the most cursory knowledge of extended permissions handling, > particularly for Cygwin where one has to contend with both the > underlying OS's interpretation of file permissions and with the Cygwin > layer's reinterpretations. I can't say I'm keen to get a deep working > knowledge of how all these pieces interact! I'm definitely no expert, but even with my current setup, I have had permission problems. I used to 'ssh' into cygwin from Linux so that I could build/test git on Linux/cygwin at the same time - that worked fine for many many years, until a test was added that failed when I was remotely logged-in to cygwin, but passed when I was actually directly logged-in on the windows laptop. I don't remember the details, but ever since I have been having to run the tests locally. [When remotely logged in: $ id uid=1001(ramsay) gid=513(None) groups=513(None),114(Local account and member of Administrators group),0(root),545(Users),2(NETWORK),11(Authenticated Users),15(This Organization),113(Local account),4095(CurrentSession),262154(NTLM Authentication),405504(High Mandatory Level) $ Yes, I am still using the 'privileged user' account for the 'sshd' service. I suppose I should re-configure it to use the LOCAL ACCOUNT and test again, but, well, if it ain't broke ... ;-) ] ATB, Ramsay Jones
On Fri, Nov 05, 2021 at 11:04:15AM -0700, Junio C Hamano wrote: > > This is sort of a side note to your main issue, but I think that relying > > on a lazy_prereq for side effects is an anti-pattern. We make no > > promises about when or how often the prereqs might be run, and we try to > > insulate them from the main tests (by putting them in a subshell and > > switching their cwd). > > > > It does happen to work here because the prereq script writes directly to > > $GNUPGHOME, and we run the lazy prereqs about when you'd expect. So I > > don't think it's really in any danger of breaking, but it is definitely > > not using the feature as it was intended. :) > > This merely imitates what GPG lazy-prerequisite started and imitated > by other existing signature backends. Ah, you're right. I should have checked the other GPG ones. It looks like that happened recently-ish in b417ec5f22 (tests: turn GPG, GPGSM and RFC1991 into lazy prereqs, 2020-03-26). Before that the code was run outside of any test block at all, which I think is even worse. > I'd expect that you need some "initialization" for a feature X as > part of asking "is feature X usable in this environment?". Reusing > the result of the initialization for true tests is probably an > optimization worth making. As long as the question is answered for > the true tests, that is [*]. Yes, though if it's possible, I think doing less work in the prereq check might be a good approach (like say, just checking gpg or openssh version if we can). It results in flakier prereqs that may say "yes, we have feature X" when we don't. But it gets a human's attention when it fails, rather than quietly skipping tests (which is the same point Adam is making downthread). It definitely is not something to fiddle with at this point in the -rc cycle, though. > > Again, that's mostly a tangent to your issue, and maybe not worth > > futzing with at this point in the release cycle. I'm mostly just > > registering my surprise. ;) > > My purist side is with you and share the surprise. But my practical > side says this is probably an optimization worth taking. If prereq > only checks "if we initialize the keys right way, we can use ssh > signing" and then removes the key and the equivalent to .ssh/ > directory, and a real test does "Ok, prereq passes so we know ssh > signing is to be tested. Now initialize the .ssh/ equivalent and > create key", a fix like Adam came up with must be duplicated in two > (or more) places, one for the prereq that initializes the keys > "right way", and one for each test script that prepares the key used > for it. To be clear, I wasn't suggesting doing the key setup twice. I was just suggesting moving it out of a lazy prereq into a real test_expect block that sets the prereq flag as a side effect. That just makes the timing and the fact of running it more deliberate on the part of the test scripts. It's probably not worth the effort, though. I think my line of thinking is coming from the "purist" side, and doesn't have any practical benefit. -Peff
On Fri, Nov 05, 2021 at 06:49:14PM +0000, Adam Dinwoodie wrote: > This is probably a much broader conversation. I remember when I first > started packaging Git for Cygwin, I produced a release that didn't > have support for HTTPS URLs due to a missing dependency in my build > environment. The build and test suite all passed -- it assumed I just > wanted to build a release that didn't have HTTPS support -- so some > relatively critical function was silently skipped. I don't know how to > avoid that sort of issue other than relying on (a) user bug (or at > least missing function) reports and (b) folk building Git for > themselves/others periodically going through the output of the > configure scripts and the skipped subtests to make sure only expected > things get missed; neither of those options seem great to me. The HTTP tests in particular have a knob for this, as I was worried about this kind of situation when we introduced auto-enabling of network tests back in 83d842dc8c (tests: turn on network daemon tests by default, 2014-02-10). The solution there was to make the knob a tri-state: the default is "auto", which will try to probe whether we have a working apache setup, but setting it to "true" will complain if that setup fails. Now that's not a perfect solution: - you have to know to flip the switch to "true". For an old switch like HTTP, that's easy. But somebody packaging Git might not even realize GPGSSH was a new thing. - The "true" knob only covers probing of the environment. If you accidentally build with NO_CURL, we'd still quietly skip the tests. It might be reasonable to change this. - In your particular case, it probably would not have helped anyway because we don't have any specific HTTPS tests (there is an option to set up the default server with SSL, but I didn't even realize that until just now; I wonder if it actually works). So I dunno. I guess because of point 1, having an allow-known-skips list would be more helpful. That gives you the opportunity to examine new prereqs and decide if they ought to be skipped or not in your setup. -Peff
diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh index f99ef3e859..1d8e5b5b7e 100644 --- a/t/lib-gpg.sh +++ b/t/lib-gpg.sh @@ -106,6 +106,7 @@ test_lazy_prereq GPGSSH ' test $? = 0 || exit 1; mkdir -p "${GNUPGHOME}" && chmod 0700 "${GNUPGHOME}" && + (setfacl -k "${GNUPGHOME}" 2>/dev/null || true) && ssh-keygen -t ed25519 -N "" -C "git ed25519 key" -f "${GPGSSH_KEY_PRIMARY}" >/dev/null && echo "\"principal with number 1\" $(cat "${GPGSSH_KEY_PRIMARY}.pub")" >> "${GPGSSH_ALLOWED_SIGNERS}" && ssh-keygen -t rsa -b 2048 -N "" -C "git rsa2048 key" -f "${GPGSSH_KEY_SECONDARY}" >/dev/null &&
SSH keys are expected to be created with very restrictive permissions, and SSH commands will fail if the permissions are not appropriate. When creating a directory for SSH keys in test scripts, attempt to clear any ACLs that might otherwise cause the private key to inherit less restrictive permissions than it requires. This change is required in particular to avoid tests relating to SSH signing failing in Cygwin. Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Helped-by: Fabian Stelzer <fs@gigacodes.de> --- t/lib-gpg.sh | 1 + 1 file changed, 1 insertion(+)