Message ID | 20210810190937.305765-1-tsdh@gnu.org (mailing list archive) |
---|---|
State | Superseded |
Series | [v4] userdiff: improve java hunk header regex |
On 10.08.21 at 21:09, Tassilo Horn wrote:
> Currently, the git diff hunk headers show the wrong method signature if the
> method has a qualified return type, an array return type, or a generic return
> type because the regex doesn't allow dots (.), [], or < and > in the return
> type. Also, type parameter declarations couldn't be matched.
>
> Add several t4018 tests asserting the right hunk headers for increasingly
> complex method signatures:
>
>   public String[] myMethod(String[] RIGHT)
>   public List<String> myMethod(String[] RIGHT)
>   public <T> List<T> myMethod(T[] RIGHT)
>   public <AType, B> Map<AType, B> myMethod(String[] RIGHT)
>   public <AType, B> java.util.Map<AType, Map<B, B[]>> myMethod(String[] RIGHT)
>   public List<? extends Comparable> myMethod(String[] RIGHT)
>   public <T extends Serializable & Comparable<T>> List<T> myMethod(String[] RIGHT)
>
> Signed-off-by: Tassilo Horn <tsdh@gnu.org>
> ---
>  t/t4018/java-constructor             |  6 ++++++
>  t/t4018/java-enum-constant           |  6 ++++++
>  t/t4018/java-nested-field            |  6 ++++++
>  t/t4018/java-return-array            |  6 ++++++
>  t/t4018/java-return-generic          |  6 ++++++
>  t/t4018/java-return-generic-bounded  |  6 ++++++
>  t/t4018/java-return-generic-wildcart |  6 ++++++
>  t/t4018/java-return-generic2         |  6 ++++++
>  t/t4018/java-return-generic3         |  6 ++++++
>  t/t4018/java-return-generic4         |  6 ++++++
>  userdiff.c                           | 23 ++++++++++++++++++++++-
>  11 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 t/t4018/java-constructor
>  create mode 100644 t/t4018/java-enum-constant
>  create mode 100644 t/t4018/java-nested-field
>  create mode 100644 t/t4018/java-return-array
>  create mode 100644 t/t4018/java-return-generic
>  create mode 100644 t/t4018/java-return-generic-bounded
>  create mode 100644 t/t4018/java-return-generic-wildcart
>  create mode 100644 t/t4018/java-return-generic2
>  create mode 100644 t/t4018/java-return-generic3
>  create mode 100644 t/t4018/java-return-generic4
>

These new tests are very much appreciated. You do not have to go wild
with that many return type tests; IMO, the simple one and the most
complicated one should do it. (And btw, s/cart/card/)

> diff --git a/t/t4018/java-return-array b/t/t4018/java-return-array
> new file mode 100644
> index 0000000000..747638b9a8
> --- /dev/null
> +++ b/t/t4018/java-return-array
> @@ -0,0 +1,6 @@
> +class MyExample {
> +    public String[] myMethod(String[] RIGHT) {
> +        // Whatever...
> +        return new; // ChangeMe
> +    }
> +}
> diff --git a/userdiff.c b/userdiff.c
> index 3c3bbe38b0..9bd751b7d2 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -142,7 +142,28 @@ PATTERNS("html",
>           "[^<>= \t]+"),
>  PATTERNS("java",
>           "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
> -         "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
> +         "^[ \t]*("
> +         /* Class, enum, and interface declarations: */
> +         /* optional modifiers: public */
> +         "(([a-z]+[ \t]+)*"
> +         /* the kind of declaration */
> +         "(class|enum|interface)[ \t]+"
> +         /* the name */
> +         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
> +         /* Method & constructor signatures: */
> +         /* optional modifiers: public static */
> +         "|(([a-z]+[ \t]+)*"
> +         /* type params and return types for methods but not constructors */
> +         "("
> +         /* optional type parameters: <A, B extends Comparable<B>> */
> +         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
> +         /* return type: java.util.Map<A, B[]> or List<?> */
> +         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
> +         /* end of type params and return type */
> +         ")?"
> +         /* the method name followed by the parameter list: myMethod(...) */
> +         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
> +         ")$",

I don't see the point in this complicated regex. Please recall that it
will be applied only to syntactically correct Java text. Therefore, you
do not have to implement all syntactical corner cases, just be
sufficiently permissive.

What is wrong with

    "^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",

i.e. take every "token" until an identifier followed by an opening
parenthesis is found.

Can types in Java contain parentheses? That would make my suggested
simplified regex too permissive, but otherwise it would do its job, I
would think.

-- Hannes
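For readers who want to try the two patterns from this message outside of git, here is a minimal sketch. It assumes POSIX `regcomp`/`regexec` with `REG_EXTENDED`, which is how I understand git compiles these patterns; `line_matches` is a hypothetical helper for illustration, not a function from git, and the two constants are copied from the "-" line of the diff and from the suggestion above.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The method pattern before the patch (the "-" line of the diff). */
static const char OLD_PATTERN[] =
    "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/* The simplified, more permissive pattern suggested in the review. */
static const char PERMISSIVE_PATTERN[] =
    "^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/*
 * Hypothetical helper (not part of git): returns 1 if `line` matches the
 * POSIX extended regex `pattern`, 0 otherwise. The patterns here are
 * compile-time constants, so a compile failure is treated as a bug.
 */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc, hit;

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);
    (void)rc;

    hit = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}
```

Called from a small `main` of your own, `line_matches(OLD_PATTERN, "\tpublic String[] myMethod(String[] RIGHT) {")` should return 0, because `[` cannot be part of a token in the old pattern, while the same call with `PERMISSIVE_PATTERN` should return 1. Both patterns accept a plain signature such as `public int size() {`.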
Johannes Sixt <j6t@kdbg.org> writes:

Hi Hannes & Junio,

> These new tests are very much appreciated. You do not have to go wild
> with that many return type tests; IMO, the simple one and the most
> complicated one should do it. (And btw, s/cart/card/)

Well, they appeared naturally as a result during development and made it
easier to spot errors when you know up to which level of complexity it
still worked. Is there a stronger reason to remove tests which might
not be needed, e.g., runtime cost on some CI machines?

>> -         "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
>> +         "^[ \t]*("
>> +         /* Class, enum, and interface declarations: */
>> +         /* optional modifiers: public */
>> +         "(([a-z]+[ \t]+)*"
>> +         /* the kind of declaration */
>> +         "(class|enum|interface)[ \t]+"
>> +         /* the name */
>> +         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>> +         /* Method & constructor signatures: */
>> +         /* optional modifiers: public static */
>> +         "|(([a-z]+[ \t]+)*"
>> +         /* type params and return types for methods but not constructors */
>> +         "("
>> +         /* optional type parameters: <A, B extends Comparable<B>> */
>> +         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
>> +         /* return type: java.util.Map<A, B[]> or List<?> */
>> +         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
>> +         /* end of type params and return type */
>> +         ")?"
>> +         /* the method name followed by the parameter list: myMethod(...) */
>> +         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>> +         ")$",
>
> I don't see the point in this complicated regex. Please recall that it
> will be applied only to syntactically correct Java text. Therefore,
> you do not have to implement all syntactical corner cases, just be
> sufficiently permissive.

I actually find it easier to understand if it is broken up into more
concrete alternatives and parts which are commented instead of one
opaque "permissively match everything in one alternative" regex. It
shows the intent of what you want to match.

But YMMV and since Junio agrees with you, I'm fine with that approach.

> What is wrong with
>
> "^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",

That doesn't work for

  <T> List<T> foo()

or

  <T extends Foo & Bar> T foo()

so at least it needs to include &<> in the first group, too.

Also, it doesn't match class/enum/interface declarations anymore, so

  class Foo {
      String x = "ChangeMe";
  }

will have an empty hunk header.

Another thing I've noticed (with my suggested patch) is that I should
not try to match constructor signatures. I think that's impossible
because they are indistinguishable from method calls, e.g., in

  public class MyClass {
      MyClass(String RIGHT) {
          someMethodCall();
          someOtherMethod(17)
              .doThat();
          // Whatever
          // ChangeMe
      }
  }

there is no regex way to prefer MyClass(String RIGHT) over
someOtherMethod().

So all in all, I'd propose this version in the next patch version:

--8<---------------cut here---------------start------------->8---
PATTERNS("java",
         "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
         "^[ \t]*("
         /* Class, enum, and interface declarations */
         "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
         /* Method definitions; note that constructor signatures are not */
         /* matched because they are indistinguishable from method calls. */
         "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
         ")$",
         /* -- */
         "[a-zA-Z_][a-zA-Z0-9_]*"
         "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
         "|[-+*/<>%&^|=!]="
         "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
--8<---------------cut here---------------end--------------->8---

That works for all my test cases (which I have also altered to include
the method calls from above before the ChangeMe) except for
java-constructor where it shows

  public class MyClass {

instead of

  MyClass(String RIGHT) {

in the hunk header which is expected as explained earlier and in the
comment.

Does that seem like a good middle ground?

Bye,
Tassilo
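The constructor problem described in this message can be reproduced with a small sketch. It rests on two assumptions that are mine rather than the thread's: git takes the hunk header from the nearest line above the change that matches a funcname pattern (modelled here, in simplified form, by `nearest_match`), and `CTOR_PATTERN` is a hypothetical variant of the proposed method pattern in which the return-type tokens are optional (`+` turned into `*`). `line_matches` and `nearest_match` are illustrative helpers, not git functions.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Method alternative of the proposed pattern: tokens may start with <, > or &. */
static const char METHOD_PATTERN[] =
    "^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/* Hypothetical variant: zero tokens allowed, so constructors would match too. */
static const char CTOR_PATTERN[] =
    "^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)*[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/* The example class from the message, one array element per source line. */
static const char *const MY_CLASS[] = {
    "public class MyClass {",        /* 0 */
    "    MyClass(String RIGHT) {",   /* 1 */
    "        someMethodCall();",     /* 2 */
    "        someOtherMethod(17)",   /* 3 */
    "            .doThat();",        /* 4 */
    "        // Whatever",           /* 5 */
    "        // ChangeMe",           /* 6 */
    "    }",                         /* 7 */
    "}",                             /* 8 */
};

/* Returns 1 if `line` matches the POSIX extended regex `pattern`. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc, hit;

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);
    (void)rc;

    hit = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}

/*
 * Simplified model of hunk header selection: walk upwards from the changed
 * line and return the index of the first matching line, or -1 if none.
 */
int nearest_match(const char *pattern, const char *const lines[], int changed)
{
    for (int i = changed - 1; i >= 0; i--)
        if (line_matches(pattern, lines[i]))
            return i;
    return -1;
}
```

With the change on line 6, `nearest_match(CTOR_PATTERN, MY_CLASS, 6)` should stop at line 3, the `someOtherMethod(17)` call, before it ever reaches the constructor on line 1. `METHOD_PATTERN` requires at least one token before the name, so it matches neither the call nor the constructor, and in this model the header has to come from the separate class/enum/interface alternative. The same pattern does accept `<T> List<T> foo() {` and `<T extends Foo & Bar> T foo() {`.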
On 11.08.21 at 07:22, Tassilo Horn wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
>> These new tests are very much appreciated. You do not have to go wild
>> with that many return type tests; IMO, the simple one and the most
>> complicated one should do it. (And btw, s/cart/card/)
>
> Well, they appeared naturally as a result during development and made it
> easier to spot errors when you know up to which level of complexity it
> still worked. Is there a stronger reason to remove tests which might
> not be needed, e.g., runtime cost on some CI machines?

I totally understand how the test cases evolved. Having many of them is
not a big deal. It's just the disproportion of tests of this new feature
vs. the existing tests that your patch creates, in particular, when
earlier of the new tests are subsumed by later new tests.

> Another thing I've noticed (with my suggested patch) is that I should
> not try to match constructor signatures. I think that's impossible
> because they are indistinguishable from method calls, e.g., in
>
>   public class MyClass {
>       MyClass(String RIGHT) {
>           someMethodCall();
>           someOtherMethod(17)
>               .doThat();
>           // Whatever
>           // ChangeMe
>       }
>   }
>
> there is no regex way to prefer MyClass(String RIGHT) over
> someOtherMethod().

Good find.

> So all in all, I'd propose this version in the next patch version:
>
> --8<---------------cut here---------------start------------->8---
> PATTERNS("java",
>          "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
>          "^[ \t]*("
>          /* Class, enum, and interface declarations */
>          "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>          /* Method definitions; note that constructor signatures are not */
>          /* matched because they are indistinguishable from method calls. */
>          "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>          ")$",
>          /* -- */
>          "[a-zA-Z_][a-zA-Z0-9_]*"
>          "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>          "|[-+*/<>%&^|=!]="
>          "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> --8<---------------cut here---------------end--------------->8---

That looks fine.

One suggestion, though. You do not have to have all positive patterns
("class, enum, interface" and "method definitions") in a single pattern
separated by "|". You can place them on different "lines" (note the "\n"
at the end of the first pattern):

    /* Class, enum, and interface declarations */
    "^[ \t]*(...(class|enum|interface)...)$\n"
    /*
     * Method definitions; note that constructor signatures are not
     * matched because they are indistinguishable from method calls.
     */
    "^[ \t]*(...[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*))$",

I don't think there is a technical difference, but I find this form
easier to understand because fewer open parentheses have to be tracked.

-- Hannes
Johannes Sixt <j6t@kdbg.org> writes:

Hi Hannes,

>>> These new tests are very much appreciated. You do not have to go
>>> wild with that many return type tests; IMO, the simple one and the
>>> most complicated one should do it. (And btw, s/cart/card/)
>>
>> Well, they appeared naturally as a result during development and made
>> it easier to spot errors when you know up to which level of
>> complexity it still worked. Is there a stronger reason to remove
>> tests which might not be needed, e.g., runtime cost on some CI
>> machines?
>
> I totally understand how the test cases evolved. Having many of them
> is not a big deal. It's just the disproportion of tests of this new
> feature vs. the existing tests that your patch creates, in particular,
> when earlier of the new tests are subsumed by later new tests.

Sure thing, I'll see if I can remove some tests.

>> Another thing I've noticed (with my suggested patch) is that I should
>> not try to match constructor signatures. I think that's impossible
>> because they are indistinguishable from method calls, e.g., in
>>
>>   public class MyClass {
>>       MyClass(String RIGHT) {
>>           someMethodCall();
>>           someOtherMethod(17)
>>               .doThat();
>>           // Whatever
>>           // ChangeMe
>>       }
>>   }
>>
>> there is no regex way to prefer MyClass(String RIGHT) over
>> someOtherMethod().
>
> Good find.

The longer you play with it, the more you find out.

>> So all in all, I'd propose this version in the next patch version:
>>
>> --8<---------------cut here---------------start------------->8---
>> PATTERNS("java",
>>          "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
>>          "^[ \t]*("
>>          /* Class, enum, and interface declarations */
>>          "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>>          /* Method definitions; note that constructor signatures are not */
>>          /* matched because they are indistinguishable from method calls. */
>>          "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>>          ")$",
>>          /* -- */
>>          "[a-zA-Z_][a-zA-Z0-9_]*"
>>          "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>>          "|[-+*/<>%&^|=!]="
>>          "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
>> --8<---------------cut here---------------end--------------->8---
>
> That looks fine.
>
> One suggestion, though. You do not have to have all positive patterns
> ("class, enum, interface" and "method definitions") in a single
> pattern separated by "|". You can place them on different "lines"
> (note the "\n" at the end of the first pattern):
>
>     /* Class, enum, and interface declarations */
>     "^[ \t]*(...(class|enum|interface)...)$\n"
>     /*
>      * Method definitions; note that constructor signatures are not
>      * matched because they are indistinguishable from method calls.
>      */
>     "^[ \t]*(...[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*))$",
>
> I don't think there is a technical difference, but I find this form
> easier to understand because fewer open parentheses have to be
> tracked.

Yes, indeed. Because of that reason I've put the first ( and the last )
on separate lines but your approach is even better.

Patch version v5 will come anytime soon.

Thanks!
Tassilo
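The "one pattern per line" form agreed on above can be sketched as follows. This is a sketch under my own assumptions about how git treats such a string: it is split at each "\n", a line starting with "!" is a negated pattern, and the first pattern that matches decides the outcome. `is_funcname_line` is a hypothetical helper, and the `JAVA_FUNCNAME` string is only my guess at what the split version might look like, with the elided "..." parts filled in from the combined pattern quoted earlier in the thread; it is not taken from a later patch version.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Assumed split form: one negated line, then two positive lines. */
static const char JAVA_FUNCNAME[] =
    "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
    /* Class, enum, and interface declarations */
    "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)$\n"
    /* Method definitions (constructors are deliberately not matched) */
    "^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/*
 * Hypothetical helper: tries each "\n"-separated pattern in order and lets
 * the first one that matches decide. Returns 1 if `line` would serve as a
 * hunk header, 0 if a negated pattern rejects it or nothing matches.
 */
int is_funcname_line(const char *patterns, const char *line)
{
    const char *p = patterns;

    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        int negate = (len > 0 && p[0] == '!');
        size_t relen = len - (size_t)negate;
        char *one = malloc(relen + 1);
        regex_t re;
        int rc, hit;

        /* Copy this pattern (without the leading '!') into its own string. */
        assert(one != NULL);
        memcpy(one, p + negate, relen);
        one[relen] = '\0';

        rc = regcomp(&re, one, REG_EXTENDED | REG_NOSUB);
        assert(rc == 0);
        (void)rc;

        hit = (regexec(&re, line, 0, NULL, 0) == 0);
        regfree(&re);
        free(one);

        if (hit)
            return !negate;
        p = nl ? nl + 1 : p + len;
    }
    return 0;
}
```

In this model a line such as `return compute(a,` is rejected by the negated first pattern even though the method pattern alone would accept it, while `private static class RIGHT {` and `public <T> List<T> myMethod(T[] RIGHT) {` are accepted by the second and third pattern respectively. Splitting the positive patterns this way leaves the set of accepted lines the same as with a single "|" alternation; only the nesting of parentheses changes.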
diff --git a/t/t4018/java-constructor b/t/t4018/java-constructor
new file mode 100644
index 0000000000..9daf7c5430
--- /dev/null
+++ b/t/t4018/java-constructor
@@ -0,0 +1,6 @@
+public class MyClass {
+    MyClass(String RIGHT) {
+        // Whatever
+        // ChangeMe
+    }
+}
diff --git a/t/t4018/java-enum-constant b/t/t4018/java-enum-constant
new file mode 100644
index 0000000000..a1931c8379
--- /dev/null
+++ b/t/t4018/java-enum-constant
@@ -0,0 +1,6 @@
+private enum RIGHT {
+    ONE,
+    TWO,
+    THREE,
+    ChangeMe
+}
diff --git a/t/t4018/java-nested-field b/t/t4018/java-nested-field
new file mode 100644
index 0000000000..d92d3ec688
--- /dev/null
+++ b/t/t4018/java-nested-field
@@ -0,0 +1,6 @@
+class MyExample {
+    private static class RIGHT {
+        // change an inner class field
+        String inner = "ChangeMe";
+    }
+}
diff --git a/t/t4018/java-return-array b/t/t4018/java-return-array
new file mode 100644
index 0000000000..747638b9a8
--- /dev/null
+++ b/t/t4018/java-return-array
@@ -0,0 +1,6 @@
+class MyExample {
+    public String[] myMethod(String[] RIGHT) {
+        // Whatever...
+        return new; // ChangeMe
+    }
+}
diff --git a/t/t4018/java-return-generic b/t/t4018/java-return-generic
new file mode 100644
index 0000000000..161dd8338f
--- /dev/null
+++ b/t/t4018/java-return-generic
@@ -0,0 +1,6 @@
+class MyExample {
+    public List<String> myMethod(String[] RIGHT) {
+        // Whatever...
+        return Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic-bounded b/t/t4018/java-return-generic-bounded
new file mode 100644
index 0000000000..440115a788
--- /dev/null
+++ b/t/t4018/java-return-generic-bounded
@@ -0,0 +1,6 @@
+class MyExample {
+    public <T extends Serializable & Comparable<T>> List<T> myMethod(String[] RIGHT) {
+        // Whatever...
+        return (List<T>) Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic-wildcart b/t/t4018/java-return-generic-wildcart
new file mode 100644
index 0000000000..2d682e1e2b
--- /dev/null
+++ b/t/t4018/java-return-generic-wildcart
@@ -0,0 +1,6 @@
+class MyExample {
+    public List<? extends Comparable> myMethod(String[] RIGHT) {
+        // Whatever...
+        return Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic2 b/t/t4018/java-return-generic2
new file mode 100644
index 0000000000..7109c27456
--- /dev/null
+++ b/t/t4018/java-return-generic2
@@ -0,0 +1,6 @@
+class MyExample {
+    public <T> List<T> myMethod(T[] RIGHT) {
+        // Whatever...
+        return (List<T>) Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic3 b/t/t4018/java-return-generic3
new file mode 100644
index 0000000000..849f116f50
--- /dev/null
+++ b/t/t4018/java-return-generic3
@@ -0,0 +1,6 @@
+class MyExample {
+    public <AType, B> Map<AType, B> myMethod(String[] RIGHT) {
+        // Whatever...
+        return new java.util.HashMap<>(); // ChangeMe
+    }
+}
diff --git a/t/t4018/java-return-generic4 b/t/t4018/java-return-generic4
new file mode 100644
index 0000000000..1b22c8c037
--- /dev/null
+++ b/t/t4018/java-return-generic4
@@ -0,0 +1,6 @@
+class MyExample {
+    public <AType, B> java.util.Map<AType, Map<B, B[]>> myMethod(String[] RIGHT) {
+        // Whatever...
+        return new java.util.HashMap<>(); // ChangeMe
+    }
+}
diff --git a/userdiff.c b/userdiff.c
index 3c3bbe38b0..9bd751b7d2 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -142,7 +142,28 @@ PATTERNS("html",
          "[^<>= \t]+"),
 PATTERNS("java",
          "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
-         "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
+         "^[ \t]*("
+         /* Class, enum, and interface declarations: */
+         /* optional modifiers: public */
+         "(([a-z]+[ \t]+)*"
+         /* the kind of declaration */
+         "(class|enum|interface)[ \t]+"
+         /* the name */
+         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
+         /* Method & constructor signatures: */
+         /* optional modifiers: public static */
+         "|(([a-z]+[ \t]+)*"
+         /* type params and return types for methods but not constructors */
+         "("
+         /* optional type parameters: <A, B extends Comparable<B>> */
+         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
+         /* return type: java.util.Map<A, B[]> or List<?> */
+         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
+         /* end of type params and return type */
+         ")?"
+         /* the method name followed by the parameter list: myMethod(...) */
+         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
+         ")$",
          /* -- */
          "[a-zA-Z_][a-zA-Z0-9_]*"
          "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
Currently, the git diff hunk headers show the wrong method signature if the
method has a qualified return type, an array return type, or a generic return
type because the regex doesn't allow dots (.), [], or < and > in the return
type. Also, type parameter declarations couldn't be matched.

Add several t4018 tests asserting the right hunk headers for increasingly
complex method signatures:

  public String[] myMethod(String[] RIGHT)
  public List<String> myMethod(String[] RIGHT)
  public <T> List<T> myMethod(T[] RIGHT)
  public <AType, B> Map<AType, B> myMethod(String[] RIGHT)
  public <AType, B> java.util.Map<AType, Map<B, B[]>> myMethod(String[] RIGHT)
  public List<? extends Comparable> myMethod(String[] RIGHT)
  public <T extends Serializable & Comparable<T>> List<T> myMethod(String[] RIGHT)

Signed-off-by: Tassilo Horn <tsdh@gnu.org>
---
 t/t4018/java-constructor             |  6 ++++++
 t/t4018/java-enum-constant           |  6 ++++++
 t/t4018/java-nested-field            |  6 ++++++
 t/t4018/java-return-array            |  6 ++++++
 t/t4018/java-return-generic          |  6 ++++++
 t/t4018/java-return-generic-bounded  |  6 ++++++
 t/t4018/java-return-generic-wildcart |  6 ++++++
 t/t4018/java-return-generic2         |  6 ++++++
 t/t4018/java-return-generic3         |  6 ++++++
 t/t4018/java-return-generic4         |  6 ++++++
 userdiff.c                           | 23 ++++++++++++++++++++++-
 11 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 t/t4018/java-constructor
 create mode 100644 t/t4018/java-enum-constant
 create mode 100644 t/t4018/java-nested-field
 create mode 100644 t/t4018/java-return-array
 create mode 100644 t/t4018/java-return-generic
 create mode 100644 t/t4018/java-return-generic-bounded
 create mode 100644 t/t4018/java-return-generic-wildcart
 create mode 100644 t/t4018/java-return-generic2
 create mode 100644 t/t4018/java-return-generic3
 create mode 100644 t/t4018/java-return-generic4