diff mbox series

[1/2] patchwork-bot: Ignore empty context lines

Message ID 20211202233836.2024510-2-keescook@chromium.org (mailing list archive)
State New, archived
Headers show
Series patchwork-bot: Ignore empty context lines | expand

Commit Message

Kees Cook Dec. 2, 2021, 11:38 p.m. UTC
It seems that there is some mismatch in diff output (or storage?) between
git and patchwork under conditions I haven't figured out (MUA or MTA
whitespace stripping?): patchwork appears to ignore empty context lines
(i.e. a line that is only a single space). As a result, the patchwork
hash will not match, and commits will not be identified in patchwork.

As an example of a commit in upstream that patchwork-bot cannot locate
in patchwork due to a different hash, due to the differing empty context
lines:

$ MSGID=20210911102818.3804-1-len.baker@gmx.com
$ wget -qO- https://patchwork.kernel.org/project/linux-hardening/patch/$MSGID/raw/ | \
	grep '^ $' | wc -l
0

$ SHA=f11ee2ad25b22c2ee587045dd6999434375532f7
$ git diff ${SHA}~..${SHA} | \
	grep '^ $' | wc -l
2

Teaching patchwork-bot to ignore empty context lines allows pwhashes to
match in these cases, which lets those patches get located again.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 git-patchwork-bot.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Kees Cook Dec. 16, 2021, 10:11 p.m. UTC | #1
On Thu, Dec 02, 2021 at 03:38:35PM -0800, Kees Cook wrote:
> It seems that there is some mismatch in diff output (or storage?) between
> git and patchwork under conditions I haven't figured out (MUA or MTA
> whitespace stripping?): patchwork appears to ignore empty context lines
> (i.e. a line that is only a single space). As a result, the patchwork
> hash will not match, and commits will not be identified in patchwork.
> 
> As an example of a commit in upstream that patchwork-bot cannot locate
> in patchwork due to a different hash, due to the differing empty context
> lines:
> 
> $ MSGID=20210911102818.3804-1-len.baker@gmx.com
> $ wget -qO- https://patchwork.kernel.org/project/linux-hardening/patch/$MSGID/raw/ | \
> 	grep '^ $' | wc -l
> 0
> 
> $ SHA=f11ee2ad25b22c2ee587045dd6999434375532f7
> $ git diff ${SHA}~..${SHA} | \
> 	grep '^ $' | wc -l
> 2
> 
> Teaching patchwork-bot to ignore empty context lines allows pwhashes to
> match in these cases, which lets those patches get located again.

With regard to the revert, do you not see the above problem with your
repos? I'm working against the same patchwork. I guess there is some
other root cause?
Konstantin Ryabitsev Dec. 17, 2021, 4:17 p.m. UTC | #2
On Thu, Dec 16, 2021 at 02:11:08PM -0800, Kees Cook wrote:
> > $ MSGID=20210911102818.3804-1-len.baker@gmx.com
> > $ wget -qO- https://patchwork.kernel.org/project/linux-hardening/patch/$MSGID/raw/ | \
> > 	grep '^ $' | wc -l
> > 0
> > 
> > $ SHA=f11ee2ad25b22c2ee587045dd6999434375532f7
> > $ git diff ${SHA}~..${SHA} | \
> > 	grep '^ $' | wc -l
> > 2
> > 
> > Teaching patchwork-bot to ignore empty context lines allows pwhashes to
> > match in these cases, which lets those patches get located again.
> 
> With regard to the revert, do you not see the above problem with your
> repos? I'm working against the same patchwork. I guess there is some
> other root cause?

I'm curious if you have local git configs that affect your "git diff" output.
If you "export GIT_CONFIG_GLOBAL=/dev/null", does this make any difference to
your experience?

-K
Kees Cook Dec. 17, 2021, 4:41 p.m. UTC | #3
On Fri, Dec 17, 2021 at 11:17:44AM -0500, Konstantin Ryabitsev wrote:
> On Thu, Dec 16, 2021 at 02:11:08PM -0800, Kees Cook wrote:
> > > $ MSGID=20210911102818.3804-1-len.baker@gmx.com
> > > $ wget -qO- https://patchwork.kernel.org/project/linux-hardening/patch/$MSGID/raw/ | \
> > > 	grep '^ $' | wc -l
> > > 0
> > > 
> > > $ SHA=f11ee2ad25b22c2ee587045dd6999434375532f7
> > > $ git diff ${SHA}~..${SHA} | \
> > > 	grep '^ $' | wc -l
> > > 2
> > > 
> > > Teaching patchwork-bot to ignore empty context lines allows pwhashes to
> > > match in these cases, which lets those patches get located again.
> > 
> > With regard to the revert, do you not see the above problem with your
> > repos? I'm working against the same patchwork. I guess there is some
> > other root cause?
> 
> I'm curious if you have local git configs that affect your "git diff" output.
> If you "export GIT_CONFIG_GLOBAL=/dev/null", does this make any difference to
> your experience?

$ export SHA=f11ee2ad25b22c2ee587045dd6999434375532f7
$ GIT_CONFIG_GLOBAL=/dev/null git diff $SHA~..$SHA | grep '^ $' | wc -l
2
$ git --version
git version 2.30.2

Does this report "0" for you?
Konstantin Ryabitsev Dec. 17, 2021, 5:21 p.m. UTC | #4
On Fri, Dec 17, 2021 at 08:41:41AM -0800, Kees Cook wrote:
> > I'm curious if you have local git configs that affect your "git diff" output.
> > If you "export GIT_CONFIG_GLOBAL=/dev/null", does this make any difference to
> > your experience?
> 
> $ export SHA=f11ee2ad25b22c2ee587045dd6999434375532f7
> $ GIT_CONFIG_GLOBAL=/dev/null git diff $SHA~..$SHA | grep '^ $' | wc -l
> 2
> $ git --version
> git version 2.30.2
> 
> Does this report "0" for you?

No, I get 2 as well. However, what I'm driving at is that the hashing function
has to be identical to what patchwork uses, which is why it's a verbatim lift:

https://github.com/getpatchwork/patchwork/blob/master/patchwork/hasher.py#L18
vs.
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/git-patchwork-bot.py#n479

If it generates a different hash from an email message than it does from the
"git diff" output, then something else is happening.

-K
diff mbox series

Patch

diff --git a/git-patchwork-bot.py b/git-patchwork-bot.py
index 2ebe5b4e88e4..4e10385de629 100755
--- a/git-patchwork-bot.py
+++ b/git-patchwork-bot.py
@@ -528,7 +528,9 @@  def get_patchwork_hash(diff):
             line_nos = list(map(fn, hunk_match.groups()))
             line = '@@ -%d +%d @@' % tuple(line_nos)
         elif line[0] in prefixes:
-            # if we have a +, - or context line, leave as-is
+            # if we have a +, - or context line, leave as-is, except blank lines
+            if line == ' ':
+                continue
             pass
         else:
             # other lines are ignored