diff mbox series

ci: github action - add check for whitespace errors

Message ID pull.709.git.1600759684548.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 159e0f0277410c44b7cb3df5d1870c71796372f6
Headers show
Series ci: github action - add check for whitespace errors | expand

Commit Message

Philippe Blain via GitGitGadget Sept. 22, 2020, 7:28 a.m. UTC
From: "Chris. Webster" <chris@webstech.net>

Not all developers are aware of `git diff --check` to warn
about whitespace issues.  Running a check when a pull request is
opened or updated can save time for reviewers and the submitter.

A GitHub workflow will run when a pull request is created or the
contents are updated to check the patch series.  A pull request
provides the necessary information (number of commits) to only
check the patch series.

To ensure the developer is aware of any issues, a comment will be
added to the pull request with the check errors.

Signed-off-by: Chris. Webster <chris@webstech.net>
---
    ci: GitHub Action - add check for whitespace errors
    
    Not all developers are aware of git diff --check to warn about
    whitespace issues. Running a check when a pull request is opened or
    updated can save time for reviewers and the submitter.
    
    A GitHub workflow will run when a pull request is created or the
    contents are updated to check the patch series. A pull request provides
    the necessary information (number of commits) to only check the patch
    series.
    
    To ensure the developer is aware of any issues, a comment will be added
    to the pull request with the check errors.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-709%2Fwebstech%2Fcw%2Fdiffcheck-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-709/webstech/cw/diffcheck-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/709

 .github/workflows/check-whitespace.yml | 69 ++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 .github/workflows/check-whitespace.yml


base-commit: 675a4aaf3b226c0089108221b96559e0baae5de9

Comments

Jeff King Sept. 22, 2020, 5:07 p.m. UTC | #1
On Tue, Sep 22, 2020 at 07:28:04AM +0000, Chris. Webster via GitGitGadget wrote:

> From: "Chris. Webster" <chris@webstech.net>
> 
> Not all developers are aware of `git diff --check` to warn
> about whitespace issues.  Running a check when a pull request is
> opened or updated can save time for reviewers and the submitter.

Sounds like a useful thing to have.

> A GitHub workflow will run when a pull request is created or the
> contents are updated to check the patch series.  A pull request
> provides the necessary information (number of commits) to only
> check the patch series.

I think this will work OK in practice, but a few thoughts:

 - for a linear branch on top of master, using the commit count will
   work reliably. But I suspect it would run into problems if there were
   ever a merge on a PR (e.g., back-merging from master), where we'd be
   subject to how `git log` linearizes the commits. That's not really a
   workflow I'd expect people to use with git.git, but it would probably
   be easy to make it more robust. Does the PR object provide the "base"
   oid, so we could do "git log $base..$head"?

 - this will run only on PRs. That's helpful for people using
   GitGitGadget, but it might also be useful for people just running the
   CI by pushing branches, or looking at CI builds of Junio's next or
   seen branches. Could we make it work there? Obviously we wouldn't be
   able to rely on having PR data, but I wonder if "git log
   HEAD..$branch" would be sufficient.

-Peff
Junio C Hamano Sept. 22, 2020, 5:55 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

>  - this will run only on PRs. That's helpful for people using
>    GitGitGadget, but it might also be useful for people just running the
>    CI by pushing branches, or looking at CI builds of Junio's next or
>    seen branches. Could we make it work there? Obviously we wouldn't be
>    able to rely on having PR data, but I wonder if "git log
>    HEAD..$branch" would be sufficient.

Yes, I like that very much.  If a push triggers a CI run to notice
whitespace breakage and other mechanically detectable errors, that
would prevent embarrassment before even a pull request is opened.

For me, 'next' is way too late to catch mechanically detectable
errors, but an extra set of eyes, even mechanical ones, on the tip
of 'seen' is always appreciated.

Thanks.
Chris Webster Sept. 22, 2020, 10:17 p.m. UTC | #3
On Tue, Sep 22, 2020 at 10:07 AM Jeff King <peff@peff.net> wrote:
>  - for a linear branch on top of master, using the commit count will
>    work reliably. But I suspect it would run into problems if there were
>    ever a merge on a PR (e.g., back-merging from master), where we'd be
>    subject to how `git log` linearizes the commits. That's not really a
>    workflow I'd expect people to use with git.git, but it would probably
>    be easy to make it more robust. Does the PR object provide the "base"
>    oid, so we could do "git log $base..$head"?

GitGitGadget PR linting is going to flag merges in the PR and request
a rebase.  If I understand correctly, that means back-merging is not
part of the workflow.  The checkout is limited to improve performance
and reduce resources.  In the PR object, the base is the branch.  The
github api would need to be used to get more detailed information.
The "base" is not really part of the checkout so it can not be
referenced in the git log command (without doing a larger checkout).

...chris.
Chris Webster Sept. 22, 2020, 10:41 p.m. UTC | #4
On Tue, Sep 22, 2020 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> >  - this will run only on PRs. That's helpful for people using
> >    GitGitGadget, but it might also be useful for people just running the
> >    CI by pushing branches, or looking at CI builds of Junio's next or
> >    seen branches. Could we make it work there? Obviously we wouldn't be
> >    able to rely on having PR data, but I wonder if "git log
> >    HEAD..$branch" would be sufficient.
>
> Yes, I like that very much.  If a push triggers a CI run to notice
> whitespace breakage and other mechanically detectable errors, that
> would prevent embarrassment before even a pull request is opened.

This was originally started as a push action which was expected to be
in the GitGitGadget workflow.  It was changed to run on PRs using the
more helpful PR data.  The original GitGitGadget issue suggested a
build target to run the check, which could have been part of the CI
build (or a local build).  Doing the check later as part of the PR
process is consistent with GitGitGadget performing linting on the PR
request, with similar opportunities for embarrassment.

...chris.
Jeff King Sept. 24, 2020, 6:51 a.m. UTC | #5
On Tue, Sep 22, 2020 at 03:17:54PM -0700, Chris Webster wrote:

> On Tue, Sep 22, 2020 at 10:07 AM Jeff King <peff@peff.net> wrote:
> >  - for a linear branch on top of master, using the commit count will
> >    work reliably. But I suspect it would run into problems if there were
> >    ever a merge on a PR (e.g., back-merging from master), where we'd be
> >    subject to how `git log` linearizes the commits. That's not really a
> >    workflow I'd expect people to use with git.git, but it would probably
> >    be easy to make it more robust. Does the PR object provide the "base"
> >    oid, so we could do "git log $base..$head"?
> 
> GitGitGadget PR linting is going to flag merges in the PR and request
> a rebase.  If I understand correctly, that means back-merging is not
> part of the workflow.

Yeah, I would definitely be surprised to see it used with a git PR, but
I didn't realize there was other linting that would actually complain
about it.

> The checkout is limited to improve performance
> and reduce resources.  In the PR object, the base is the branch.  The
> github api would need to be used to get more detailed information.
> The "base" is not really part of the checkout so it can not be
> referenced in the git log command (without doing a larger checkout).

Hmm.

  git clone --shallow-exclude=HEAD --single-branch -b $branch
  git log --check

_almost_ works. The problem is that the shallow graft means that the
bottom commit looks like it introduces every file. We really want to
graft at HEAD^, but the server side only accepts exact refnames. You
could work around it with a followup:

  git fetch --deepen 1

which is getting a bit convoluted. I suspect you may also have to
abandon the "checkout" action and do this manually. Definitely not worth
it compared to your solution for a PR, but maybe worth it if it lets us
do the same thing for arbitrary branches.

-Peff
Chris Webster Sept. 25, 2020, 5:10 a.m. UTC | #6
On Wed, Sep 23, 2020 at 11:51 PM Jeff King <peff@peff.net> wrote:
>   git clone --shallow-exclude=HEAD --single-branch -b $branch
>   git log --check
>
> _almost_ works. The problem is that the shallow graft means that the
> bottom commit looks like it introduces every file. We really want to
> graft at HEAD^, but the server side only accepts exact refnames. You
> could work around it with a followup:
>
>   git fetch --deepen 1

Thanks for the other possibilities (I have much to learn).  The first
step to increase the commit count is addressing this very problem.

> Definitely not worth
> it compared to your solution for a PR, but maybe worth it if it lets us
> do the same thing for arbitrary branches.

The PR solution works because fixed values are available from GitHub
(both repos are present and accounted for).  A push action for
branches could have issues with the state of the GitHub repo versus
the local repo.  What happens if the base branch is not current on
GitHub?  Is HEAD reliable? What if the branch has been re-used with a
back-merge? How do you limit the check in this case?  Based on my
demonstrated lack of knowledge these concerns may be addressable.

The original push solution pulled an arbitrary depth knowing
GitGitGadget had a limit of 30 commits and then limited the check to
post merge commits, again knowing merges were flagged.  Not pretty but
workable in the confines of the GitGitGadget workflow.

A generic push solution (with an opt out?) could be a separate file or
replace this (yea).

I appreciate the feedback,
...chris.
Jeff King Sept. 25, 2020, 6:44 a.m. UTC | #7
On Thu, Sep 24, 2020 at 10:10:16PM -0700, Chris Webster wrote:

> > Definitely not worth
> > it compared to your solution for a PR, but maybe worth it if it lets us
> > do the same thing for arbitrary branches.
> 
> The PR solution works because fixed values are available from GitHub
> (both repos are present and accounted for).  A push action for
> branches could have issues with the state of the GitHub repo versus
> the local repo.  What happens if the base branch is not current on
> GitHub?  Is HEAD reliable? What if the branch has been re-used with a
> back-merge? How do you limit the check in this case?  Based on my
> demonstrated lack of knowledge these concerns may be addressable.

Hmm, good points. The case I was most worried about was branches based
on older points in history, but as long as master keeps moving forward,
we'd be OK there (at least in the local case where we have all of the
commits; not sure about the shallow-exclude I mentioned above).

And in the case of git.git, I think we're pretty safe. "master" gets
pushed along with "seen". But not necessarily so in other repositories.
If I base a new topic on Junio's "master" and then push it up, it may be
far ahead of my "master" (and in fact, I don't even have a "master" in
my personal repo).

GGG PRs figure this out because that repo is a fork of git/git, and it
looks at the master of the parent repo as the base for the PR. So
probably we could do something similar, but this is starting to get
rather tricky.

I think you've convinced me that it's not easy to just adapt this to
handle any branch. Let's punt on that idea for now (unless somebody
feels like digging further on it, of course) and move forward with doing
this for the PR case as your patch does.

-Peff
Chris Webster Oct. 9, 2020, 5 a.m. UTC | #8
Is this waiting for some action on my part?  I thought the question of
running on push vs pull had been resolved (in favour of pull).

thanks,
...chris.
Johannes Schindelin Oct. 9, 2020, 1:20 p.m. UTC | #9
Hi Chris,

On Thu, 8 Oct 2020, Chris Webster wrote:

> Is this waiting for some action on my part?  I thought the question of
> running on push vs pull had been resolved (in favour of pull).

FWIW I agree that the current shape is the best we can do for now (and of
course, full disclosure: I was the one suggesting to restrict this to Pull
Requests because we know exactly the commit range to check in that case).

Thanks,
Dscho
Junio C Hamano Oct. 9, 2020, 4:23 p.m. UTC | #10
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Chris,
>
> On Thu, 8 Oct 2020, Chris Webster wrote:
>
>> Is this waiting for some action on my part?  I thought the question of
>> running on push vs pull had been resolved (in favour of pull).
>
> FWIW I agree that the current shape is the best we can do for now (and of
> course, full disclosure: I was the one suggesting to restrict this to Pull
> Requests because we know exactly the commit range to check in that case).

I think this is exactly the use case that

    After the list reached a consensus that it is a good idea to apply the
    patch, re-send it with "To:" set to the maintainer{current-maintainer}
    and "cc:" the list{git-ml} for inclusion.

in Documentation/SubmittingPatches was written to address.

I usually pay attention to majority of topics and have them on my
radar by getting involved in _some_ way in the discussion thread, so
I often know when the patch(es) matured enough to be picked up
without such a "this is the version after our discussion and it is
as close to perfect as we can possibly make" resend.

But for some topics, I have no strong opinion on the exact shape of
the final patch(es), and/or I have no expertise to offer to help the
discussion to reach the final product.  In such a case, I'd be just
waiting, without getting involved in the discussion, for trusted
others to bring the posted patch to a completed form.  I think this
is such a case.

Thanks.
Jeff King Oct. 9, 2020, 5:59 p.m. UTC | #11
On Fri, Oct 09, 2020 at 09:23:28AM -0700, Junio C Hamano wrote:

> I think this is exactly the use case that
> 
>     After the list reached a consensus that it is a good idea to apply the
>     patch, re-send it with "To:" set to the maintainer{current-maintainer}
>     and "cc:" the list{git-ml} for inclusion.
> 
> in Documentation/SubmittingPatches was written to address.
> 
> I usually pay attention to majority of topics and have them on my
> radar by getting involved in _some_ way in the discussion thread, so
> I often know when the patch(es) matured enough to be picked up
> without such a "this is the version after our discussion and it is
> as close to perfect as we can possibly make" resend.
> 
> But for some topics, I have no strong opinion on the exact shape of
> the final patch(es), and/or I have no expertise to offer to help the
> discussion to reach the final product.  In such a case, I'd be just
> waiting, without getting involved in the discussion, for trusted
> others to bring the posted patch to a completed form.  I think this
> is such a case.

As the other person in the discussion, I'm sufficiently convinced that
doing this just for PRs is a good step for now. I.e., I think the
"completed form" is just what was posted already (though I agree it is
often convenient to the maintainer to re-post the patch as part of the
ping).

-Peff
Junio C Hamano Oct. 9, 2020, 6:13 p.m. UTC | #12
Jeff King <peff@peff.net> writes:

> On Fri, Oct 09, 2020 at 09:23:28AM -0700, Junio C Hamano wrote:
>
>> I think this is exactly the use case that
>> 
>>     After the list reached a consensus that it is a good idea to apply the
>>     patch, re-send it with "To:" set to the maintainer{current-maintainer}
>>     and "cc:" the list{git-ml} for inclusion.
>> 
>> in Documentation/SubmittingPatches was written to address.
>> 
>> I usually pay attention to majority of topics and have them on my
>> radar by getting involved in _some_ way in the discussion thread, so
>> I often know when the patch(es) matured enough to be picked up
>> without such a "this is the version after our discussion and it is
>> as close to perfect as we can possibly make" resend.
>> 
>> But for some topics, I have no strong opinion on the exact shape of
>> the final patch(es), and/or I have no expertise to offer to help the
>> discussion to reach the final product.  In such a case, I'd be just
>> waiting, without getting involved in the discussion, for trusted
>> others to bring the posted patch to a completed form.  I think this
>> is such a case.
>
> As the other person in the discussion, I'm sufficiently convinced that
> doing this just for PRs is a good step for now. I.e., I think the
> "completed form" is just what was posted already (though I agree it is
> often convenient to the maintainer to re-post the patch as part of the
> ping).

Yes, and CC'ing those who were involved in the review would give
them the last chance to say "oh, no, that extra change you added
for this final submission was not something I meant to suggest!",
etc.

So, is <pull.709.git.1600759684548.gitgitgadget@gmail.com> as-is the
one we should take?

Thanks.
Jeff King Oct. 9, 2020, 6:18 p.m. UTC | #13
On Fri, Oct 09, 2020 at 11:13:43AM -0700, Junio C Hamano wrote:

> > As the other person in the discussion, I'm sufficiently convinced that
> > doing this just for PRs is a good step for now. I.e., I think the
> > "completed form" is just what was posted already (though I agree it is
> > often convenient to the maintainer to re-post the patch as part of the
> > ping).
> 
> Yes, and CC'ing those who were involved in the review would give
> them the last chance to say "oh, no, that extra change you added
> for this final submission was not something I meant to suggest!",
> etc.
> 
> So, is <pull.709.git.1600759684548.gitgitgadget@gmail.com> as-is the
> one we should take?

AFAIK it's the only one on the list. :) So yes, that one is fine with
me.

-Peff
Junio C Hamano Oct. 9, 2020, 6:56 p.m. UTC | #14
Jeff King <peff@peff.net> writes:

> On Fri, Oct 09, 2020 at 11:13:43AM -0700, Junio C Hamano wrote:
>
>> > As the other person in the discussion, I'm sufficiently convinced that
>> > doing this just for PRs is a good step for now. I.e., I think the
>> > "completed form" is just what was posted already (though I agree it is
>> > often convenient to the maintainer to re-post the patch as part of the
>> > ping).
>> 
>> Yes, and CC'ing those who were involved in the review would give
>> them the last chance to say "oh, no, that extra change you added
>> for this final submission was not something I meant to suggest!",
>> etc.
>> 
>> So, is <pull.709.git.1600759684548.gitgitgadget@gmail.com> as-is the
>> one we should take?
>
> AFAIK it's the only one on the list. :) So yes, that one is fine with
> me.

Thanks.

Another thing the resending does is that it can credit who helped
the patch into the final shape with Reviewed-by/Helped-by etc.  If
the maintainer must hunt for the names of those who had input to the
discussion and judge the degree of contribution for a topic whose
review has been delegated to trusted others, that defeats the whole
point of delegation (I think the attached clarification may help).

For this particular patch, I added Reviewed-by: naming you before
applying.

Thanks.

 Documentation/SubmittingPatches | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
index 291b61e262..87089654ae 100644
--- c/Documentation/SubmittingPatches
+++ w/Documentation/SubmittingPatches
@@ -290,12 +290,14 @@ identify them), to solicit comments and reviews.
 :git-ml: footnote:[The mailing list: git@vger.kernel.org]
 
 After the list reached a consensus that it is a good idea to apply the
-patch, re-send it with "To:" set to the maintainer{current-maintainer} and "cc:" the
-list{git-ml} for inclusion.
+patch, re-send it with "To:" set to the maintainer{current-maintainer}
+and "cc:" the list{git-ml} for inclusion.  This is especially relevant
+when the maintainer did not heavily participate in the discussion and
+instead left the review to trusted others.
 
 Do not forget to add trailers such as `Acked-by:`, `Reviewed-by:` and
 `Tested-by:` lines as necessary to credit people who helped your
-patch.
+patch, and "cc:" them when sending such a final version for inclusion.
 
 [[sign-off]]
 === Certify your work by adding your "Signed-off-by: " line
Chris Webster Oct. 10, 2020, 5:26 a.m. UTC | #15
On Fri, Oct 9, 2020 at 11:57 AM Junio C Hamano <gitster@pobox.com> wrote:

> Thanks.
>
> Another thing the resending does is that it can credit who helped
> the patch into the final shape with Reviewed-by/Helped-by etc.  If
> the maintainer must hunt for the names of those who had input to the
> discussion and judge the degree of contribution for a topic whose
> review has been delegated to trusted others, that defeats the whole
> point of delegation (I think the attached clarification may help).
>
> For this particular patch, I added Reviewed-by: naming you before
> applying.
>
> Thanks.
>

Thank you for moving forward with this.  I apologize for not
re-reviewing the SubmittingPatches doc.  I should have done that.

Thanks to Johannes for the input on this before it was submitted.
Will work on improving the commit messages with more credits.

Is there an opportunity here for a gitgitgadget command to send the
'consensus reached' email?  The value may be in deciding who is
getting the email and trying to select content from the PR comments.

...chris.
Junio C Hamano Oct. 10, 2020, 6:29 a.m. UTC | #16
Chris Webster <chris@webstech.net> writes:

> Thank you for moving forward with this.  I apologize for not
> re-reviewing the SubmittingPatches doc.  I should have done that.
>
> Thanks to Johannes for the input on this before it was submitted.
> Will work on improving the commit messages with more credits.

Thank you to all, and especially to you Chris for pinging to revive
the thread.  Applied and pushed out.

> Is there an opportunity here for a gitgitgadget command to send the
> 'consensus reached' email?  The value may be in deciding who is
> getting the email and trying to select content from the PR comments.
diff mbox series

Patch

diff --git a/.github/workflows/check-whitespace.yml b/.github/workflows/check-whitespace.yml
new file mode 100644
index 0000000000..9d070b9cdf
--- /dev/null
+++ b/.github/workflows/check-whitespace.yml
@@ -0,0 +1,69 @@ 
+name: check-whitespace
+
+# Get the repo with the commits(+1) in the series.
+# Process `git log --check` output to extract just the check errors.
+# Add a comment to the pull request with the check errors.
+
+on:
+  pull_request:
+    types: [opened, synchronize]
+
+jobs:
+  check-whitespace:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Set commit count
+      shell: bash
+      run: echo "::set-env name=COMMIT_DEPTH::$((1+$COMMITS))"
+      env:
+        COMMITS: ${{ github.event.pull_request.commits }}
+
+    - uses: actions/checkout@v2
+      with:
+        fetch-depth: ${{ env.COMMIT_DEPTH }}
+
+    - name: git log --check
+      id: check_out
+      run: |
+        log=
+        commit=
+        while read dash etc
+        do
+          case "${dash}" in
+          "---")
+            commit="${etc}"
+            ;;
+          "")
+            ;;
+          *)
+            if test -n "${commit}"
+            then
+              log="${log}\n${commit}"
+              echo ""
+              echo "--- ${commit}"
+            fi
+            commit=
+            log="${log}\n${dash} ${etc}"
+            echo "${dash} ${etc}"
+            ;;
+          esac
+        done <<< $(git log --check --pretty=format:"---% h% s" -${{github.event.pull_request.commits}})
+
+        if test -n "${log}"
+        then
+          echo "::set-output name=checkout::"${log}""
+          exit 2
+        fi
+
+    - name: Add Check Output as Comment
+      uses: actions/github-script@v3
+      id: add-comment
+      with:
+        script: |
+            github.issues.createComment({
+              issue_number: context.issue.number,
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              body: "Whitespace errors found in workflow ${{ github.workflow }}:\n\n${{ steps.check_out.outputs.checkout }}"
+            })
+      if: ${{ failure() }}