mbox series

[v3,0/3] userdiff: Java updates

Message ID 20230207234259.452141-1-rybak.a.v@gmail.com (mailing list archive)
Headers show
Series userdiff: Java updates | expand

Message

Andrei Rybak Feb. 7, 2023, 11:42 p.m. UTC
On 2023-02-05T22:33 Johannes Sixt wrote:
> Having seen all these examples, I think the following truncated
> expression might do the right thing for all cases that are valid Java:
> 
> "^[ \t]*(([a-z-]+[ \t]+)*(class|enum|interface|record)[ \t].*)$"

Only the '\n' is missing at the end, but otherwise I concur, so here's a v3.

> i.e., we recognize a whitespace in order to identify the keyword, and
> then capture anything that follows without being specific. My reasoning
> is that "class", "enum", "interface", and "record" cannot occur in any
> other context than the beginning of a class definition. (But please do
> correct me; I know next to nothing about Java syntax.)

The word "class" can also occur as part of a class literal, for example:

    Class<String> c = String.class;

but valid uses of class literals won't interfere with our regex, unless some
wild formatting is applied.  This is technically valid Java:

    Class<String> c = String.
    class 
    ;

and with a space after lowercase "class", the v3 regex will trip.  Class
literals are described in the JLS here:
https://docs.oracle.com/javase/specs/jls/se17/html/jls-15.html#jls-15.8.2

> As always,
> userdiff regular expressions can assume that only valid constructs are
> inspected.

Changes since v2:

  - simplified regex that doesn't match class names at all and supports more
    code styles
  - updated the comment just above the regex in PATCH 2/3 to mention records
  - more tests to cover the cases mentioned during review of v2
  - reworded commit messages to reflect the above items

Range diff since v2:

1:  c300745a58 ! 1:  9e859e3b79 userdiff: support Java type parameters
    @@ Metadata
      ## Commit message ##
         userdiff: support Java type parameters
     
    -    A class or interface in Java [1] can have type parameters immediately
    -    following the name in the declaration, surrounded by angle brackets
    -    (paired less than and greater than signs).[2]  Example of a class with
    -    type parameters "A" and "N":
    -
    -        public class ParameterizedClass<A, N> {
    -            private A field1;
    -            private N field2;
    +    A class or interface in Java can have type parameters following the name
    +    in the declared type, surrounded by angle brackets (paired less than and
    +    greater than signs).[2]   The type parameters -- `A` and `B` in the
    +    examples -- may follow the class name immediately:
    +
    +        public class ParameterizedClass<A, B> {
             }
     
    -    Support matching a parameterized class or interface declaration with
    -    type parameters immediately following the name of the type in the
    -    builtin userdiff pattern for Java.  Do so by just allowing matching the
    -    first character after the name of the type to "<".
    +    or may be separated by whitespace:
    +
    +        public class SpaceBeforeTypeParameters <A, B> {
    +        }
     
    -    An alternative approach could be to match both the opening and the
    -    closing angle brackets and matching the content between them in various
    -    ways.  Just use the simpler regex for now.
    +    A part of the builtin userdiff pattern for Java matches declarations of
    +    classes, enums, and interfaces.  The regular expression requires at
    +    least one whitespace character after the name of the declared type.
    +    This disallows matching for opening angle bracket of type parameters
    +    immediately after the name of the type.  Mandatory whitespace after the
    +    name of the type also disallows using the pattern in repositories with a
    +    fairly common code style that puts braces for the body of a class on
    +    separate lines:
    +
    +        class WithLineBreakBeforeOpeningBrace
    +        {
    +        }
    +
    +    Support matching Java code in more diverse code styles and declarations
    +    of classes and interfaces with type parameters immediately following the
    +    name of the type in the builtin userdiff pattern for Java.  Do so by
    +    just matching anything until the end of the line after the keywords for
    +    the kind of type being declared.
     
         [1] Since Java 5 released in 2004.
         [2] Detailed description is available in the Java Language
    @@ Commit message
     
         Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
     
    + ## t/t4018/java-class-brace-on-separate-line (new) ##
    +@@
    ++class RIGHT
    ++{
    ++    static int ONE;
    ++    static int TWO;
    ++    static int ChangeMe;
    ++}
    +
    + ## t/t4018/java-class-space-before-type-parameters (new) ##
    +@@
    ++class RIGHT <TYPE, PARAMS, AFTER, SPACE> {
    ++    static int ONE;
    ++    static int TWO;
    ++    static int THREE;
    ++    private A ChangeMe;
    ++}
    +
      ## t/t4018/java-class-type-parameters (new) ##
     @@
     +class RIGHT<A, B> {
    @@ userdiff.c: PATTERNS("html",
      	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
      	 /* Class, enum, and interface declarations */
     -	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)$\n"
    -+	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*([ \t]+|<).*)$\n"
    ++	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+.*)$\n"
      	 /* Method definitions; note that constructor signatures are not */
      	 /* matched because they are indistinguishable from method calls. */
      	 "^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
2:  a0e622a0f8 ! 2:  4f7be5f642 userdiff: support Java record types
    @@ Commit message
         the name of the record class is followed by a mandatory list of
         components.  The list is enclosed in parentheses, it may be empty, and
         it may immediately follow the name of the class or type parameters, if
    -    any, without separating whitespace.
    -
    -    Code examples:
    +    any, with or without separating whitespace.  For example:
     
             public record Example(int i, String s) {
             }
    @@ Commit message
             public record WithTypeParameters<A, B>(A a, B b, String s) {
             }
     
    +        record SpaceBeforeComponents (String comp1, int comp2) {
    +        }
    +
         Support records in the builtin userdiff pattern for Java.  Add "record"
    -    to the alternatives of keywords for kinds of class, and match an opening
    -    parenthesis as the first character right after the type name.
    +    to the alternatives of keywords for kinds of class.
     
    -    An alternative approach could be to have an optional group that would
    -    match both the opening and the closing parentheses with some way of
    -    matching the declarations of the components.  Just use the simpler
    -    regular expression for now.
    +    Allowing matching various possibilities for the type parameters and/or
    +    list of the components of a record has already been covered by the
    +    preceding patch.
     
         [1] detailed description is available in "JEP 395: Records"
             https://openjdk.org/jeps/395
    @@ t/t4018/java-record (new)
     +    static int TWO;
     +    static int THREE;
     +    static int ChangeMe;
    ++}
    +
    + ## t/t4018/java-record-space-before-components (new) ##
    +@@
    ++public record RIGHT (String components, String after, String space) {
    ++    static int ONE;
    ++    static int TWO;
    ++    static int THREE;
    ++    static int ChangeMe;
     +}
     
      ## t/t4018/java-record-type-parameters (new) ##
    @@ t/t4018/java-record-type-parameters (new)
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("html",
    + 	 "[^<>= \t]+"),
      PATTERNS("java",
      	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
    - 	 /* Class, enum, and interface declarations */
    --	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*([ \t]+|<).*)$\n"
    -+	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface|record)[ \t]+[A-Za-z][A-Za-z0-9_$]*([ \t]+|[<(]).*)$\n"
    +-	 /* Class, enum, and interface declarations */
    +-	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+.*)$\n"
    ++	 /* Class, enum, interface, and record declarations */
    ++	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface|record)[ \t]+.*)$\n"
      	 /* Method definitions; note that constructor signatures are not */
      	 /* matched because they are indistinguishable from method calls. */
      	 "^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
3:  b9c6a5dffd ! 3:  ea6ce671ef userdiff: support Java sealed classes
    @@ userdiff.c
     @@ userdiff.c: PATTERNS("html",
      PATTERNS("java",
      	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
    - 	 /* Class, enum, and interface declarations */
    --	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface|record)[ \t]+[A-Za-z][A-Za-z0-9_$]*([ \t]+|[<(]).*)$\n"
    -+	 "^[ \t]*(([a-z-]+[ \t]+)*(class|enum|interface|record)[ \t]+[A-Za-z][A-Za-z0-9_$]*([ \t]+|[<(]).*)$\n"
    + 	 /* Class, enum, interface, and record declarations */
    +-	 "^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface|record)[ \t]+.*)$\n"
    ++	 "^[ \t]*(([a-z-]+[ \t]+)*(class|enum|interface|record)[ \t]+.*)$\n"
      	 /* Method definitions; note that constructor signatures are not */
      	 /* matched because they are indistinguishable from method calls. */
      	 "^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",

Andrei Rybak (3):
  userdiff: support Java type parameters
  userdiff: support Java record types
  userdiff: support Java sealed classes

 t/t4018/java-class-brace-on-separate-line              | 6 ++++++
 t/t4018/java-class-space-before-type-parameters        | 6 ++++++
 t/t4018/java-class-type-parameters                     | 6 ++++++
 t/t4018/java-class-type-parameters-implements          | 6 ++++++
 t/t4018/java-interface-type-parameters                 | 6 ++++++
 t/t4018/java-interface-type-parameters-extends         | 6 ++++++
 t/t4018/java-non-sealed                                | 8 ++++++++
 t/t4018/java-record                                    | 6 ++++++
 t/t4018/java-record-space-before-components            | 6 ++++++
 t/t4018/java-record-type-parameters                    | 6 ++++++
 t/t4018/java-sealed                                    | 7 +++++++
 t/t4018/java-sealed-permits                            | 6 ++++++
 t/t4018/java-sealed-type-parameters                    | 6 ++++++
 t/t4018/java-sealed-type-parameters-implements-permits | 6 ++++++
 t/t4018/java-sealed-type-parameters-permits            | 6 ++++++
 userdiff.c                                             | 4 ++--
 16 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 t/t4018/java-class-brace-on-separate-line
 create mode 100644 t/t4018/java-class-space-before-type-parameters
 create mode 100644 t/t4018/java-class-type-parameters
 create mode 100644 t/t4018/java-class-type-parameters-implements
 create mode 100644 t/t4018/java-interface-type-parameters
 create mode 100644 t/t4018/java-interface-type-parameters-extends
 create mode 100644 t/t4018/java-non-sealed
 create mode 100644 t/t4018/java-record
 create mode 100644 t/t4018/java-record-space-before-components
 create mode 100644 t/t4018/java-record-type-parameters
 create mode 100644 t/t4018/java-sealed
 create mode 100644 t/t4018/java-sealed-permits
 create mode 100644 t/t4018/java-sealed-type-parameters
 create mode 100644 t/t4018/java-sealed-type-parameters-implements-permits
 create mode 100644 t/t4018/java-sealed-type-parameters-permits

Comments

Johannes Sixt Feb. 8, 2023, 8:51 p.m. UTC | #1
Am 08.02.23 um 00:42 schrieb Andrei Rybak:
> On 2023-02-05T22:33 Johannes Sixt wrote:
>> Having seen all these examples, I think the following truncated
>> expression might do the right thing for all cases that are valid Java:
>>
>> "^[ \t]*(([a-z-]+[ \t]+)*(class|enum|interface|record)[ \t].*)$"
> 
> Only the '\n' is missing at the end, but otherwise I concur, so here's a v3.
> 
>> i.e., we recognize a whitespace in order to identify the keyword, and
>> then capture anything that follows without being specific. My reasoning
>> is that "class", "enum", "interface", and "record" cannot occur in any
>> other context than the beginning of a class definition. (But please do
>> correct me; I know next to nothing about Java syntax.)
> 
> The word "class" can also occur as part of a class literal, for example:
> 
>     Class<String> c = String.class;
> 
> but valid uses of class literals won't interfere with our regex, unless some
> wild formatting is applied.  This is technically valid Java:
> 
>     Class<String> c = String.
>     class 
>     ;
> 
> and with a space after lowercase "class", the v3 regex will trip.

Yeah, let's assume that nobody writes code like this.

This iteration is all good!

Reviewed-by: Johannes Sixt <j6t@kdbg.org>

-- Hannes
Junio C Hamano Feb. 8, 2023, 8:55 p.m. UTC | #2
Johannes Sixt <j6t@kdbg.org> writes:

> Yeah, let's assume that nobody writes code like this.
>
> This iteration is all good!
>
> Reviewed-by: Johannes Sixt <j6t@kdbg.org>

Thanks.  I think I've queued this round already, but let me amend
your Reviewed-by into them.