diff mbox series

userdiff: add builtin diff driver for TypeScript language

Message ID pull.1746.git.git.1721061218993.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series userdiff: add builtin diff driver for TypeScript language | expand

Commit Message

Matthew Hughes July 15, 2024, 4:33 p.m. UTC
From: Matthew Hughes <matthewhughes934@gmail.com>

TypeScript[1] is an open-source programming language that builds on
JavaScript. This patch adds builtin diff driver support for this
language. As far as I can tell there is no official syntax specification
for the language (see[2] for some discussion) so this patch is based off
some existing work[3]. The docs[4] probably provide the best
reference as to what this driver should satisfy. See[5] for
discussion/motivation for this change from the TypeScript language team.

This is my first time developing a diff driver, so as such the
implementation borrows quite a bit from existing drivers. The funcname
attribute matches function and class definitions, the list of keywords
used to define functions was take from[3], I could not find an
exhaustive list for these. The word-regex borrows much from other
existing diff engines, with the addition of the rather unique
right-shifting operators (>>> and >>>=) available in JavaScript (and
hence Typescript)[6]

[1] https://www.typescriptlang.org/
[2] https://github.com/Microsoft/TypeScript/issues/15711
[3] https://github.com/git/git/pull/859
[4] https://www.typescriptlang.org/docs/
[5] https://github.com/microsoft/TypeScript/issues/36185
[6] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Unsigned_right_shift

Signed-off-by: Matthew Hughes <matthewhughes934@gmail.com>
---
    userdiff: add builtin diff driver for TypeScript language

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1746%2Fmatthewhughes934%2Fadd-typescript-diff-driver-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1746/matthewhughes934/add-typescript-diff-driver-v1
Pull-Request: https://github.com/git/git/pull/1746

 Documentation/gitattributes.txt      |  2 ++
 t/t4018/typescript-arrow-func        |  7 +++++
 t/t4018/typescript-class             |  8 ++++++
 t/t4018/typescript-func              |  7 +++++
 t/t4018/typescript-indented-func     |  7 +++++
 t/t4018/typescript-interface         |  5 ++++
 t/t4018/typescript-nested-arrow-func |  7 +++++
 t/t4034-diff-words.sh                |  1 +
 t/t4034/typescript/expect            | 42 ++++++++++++++++++++++++++++
 t/t4034/typescript/post              | 24 ++++++++++++++++
 t/t4034/typescript/pre               | 24 ++++++++++++++++
 userdiff.c                           | 11 ++++++++
 12 files changed, 145 insertions(+)
 create mode 100644 t/t4018/typescript-arrow-func
 create mode 100644 t/t4018/typescript-class
 create mode 100644 t/t4018/typescript-func
 create mode 100644 t/t4018/typescript-indented-func
 create mode 100644 t/t4018/typescript-interface
 create mode 100644 t/t4018/typescript-nested-arrow-func
 create mode 100644 t/t4034/typescript/expect
 create mode 100644 t/t4034/typescript/post
 create mode 100644 t/t4034/typescript/pre


base-commit: a7dae3bdc8b516d36f630b12bb01e853a667e0d9

Comments

Matthew Hughes July 16, 2024, 12:21 p.m. UTC | #1
On Mon, Jul 15, 2024 at 04:33:38PM +0000, Matthew Hughes via GitGitGadget wrote:
> diff --git a/userdiff.c b/userdiff.c
> index c4ebb9ff734..7247d351cde 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -333,6 +333,17 @@ PATTERNS("scheme",
>  	 "|([^][)(}{[ \t])+"),
>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
>  	 "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"),
> +PATTERNS("typescript",
> +	"^[\t ]*((class|constructor|public|private|protected|function|interface)[ \t].*)$\n"
> +	// arrow funcs
> +	"^[\t ]*((const|let|var)?[^()]*)=[\t ]*\\([^()]*\\)[\t ]*.*=>.*$",
> +	/* -- */
> +	"[a-zA-Z_][a-zA-Z0-9_]*"
> +	// numeric constants
> +	"|[-+0-9.e]+|0[xX]?[0-9a-fA-F]"
> +	// operators
> +	"|[-+*/<>%&^|=!]"
> +	"|--|\\+\\+|//=?|<<=?|>>?=?"),
>  { .name = "default", .binary = -1 },
>  };
>  #undef PATTERNS
> 
> base-commit: a7dae3bdc8b516d36f630b12bb01e853a667e0d9
> -- 
> gitgitgadget

This needs some updates. For the arrow function, definitions can cover multiple
lines e.g.:

    const bar = (
        name: string
    ) => console.log(name)

The funcname pattern should also consider the `export` keyword, since both of
the following are valid:

    export const bar = (
        name: string
    ) => console.log(name)

    export function foo() {}

Some docs: https://www.typescriptlang.org/docs/handbook/modules/reference.html#module-syntax
Junio C Hamano July 16, 2024, 3:45 p.m. UTC | #2
Matthew Hughes <matthewhughes934@gmail.com> writes:

> This needs some updates.

What does it mean?

The patterns that were posted were so broken that they are unusable
and harm the users by giving misleading information?

Or do the patterns work just fine in basic or tutorial cases, but
with more advanced or realistic uses of the language construct, they
highlight wrong lines as the function header and/or split at wrong
word boundaries that are obviously much less optimal than ideal that
any human users would find questionable?

In the latter case, how far from the ideal are the decisions done by
the current patterns, and what's the rough percentage of usual code
we see in the real world, for which the current patterns do not work
well?

What I am trying to gauge is if it is so broken that it should not
exist (in other words, you regret sending the patch to the list
before doing these updates), or is "already serviceable, but not
perfect yet".  Waiting for perfection takes forever.  If the latter,
letting the general public to use it to gather feedbacks by waiting
for the dust to settle before making such updates is often better.
Matthew Hughes July 16, 2024, 7:33 p.m. UTC | #3
Oe Tue, Jul 16, 2024 at 08:45:08AM -0700, Junio C Hamano wrote:
> What does it mean?
> 
> The patterns that were posted were so broken that they are unusable
> and harm the users by giving misleading information?

I think this would be a good summary. It's sufficient for some simpler cases
considered, and does even give some benefits e.g. for function headers for
nested functions. However, the cases where it fails can be significant, e.g.
hundreds of lines away from the correct function header for files with multiple
consecutive multi-line arrow functions.

> In the latter case, how far from the ideal are the decisions done by
> the current patterns, and what's the rough percentage of usual code
> we see in the real world, for which the current patterns do not work
> well?

I think just the missing `export` keyword handling would be equivalent to
missing all public functions in other programming languages, so that alone
would be a decent percentage.

> What I am trying to gauge is if it is so broken that it should not
> exist (in other words, you regret sending the patch to the list
> before doing these updates), or is "already serviceable, but not
> perfect yet".  Waiting for perfection takes forever.  If the latter,
> letting the general public to use it to gather feedbacks by waiting
> for the dust to settle before making such updates is often better.

I'm leaning towards the former case: that this patch was premature. I think
it's far enough from perfect that it would greatly benefit from me more
actively reaching out to the TypeScript language team and asking some devs
there try out the changes and gather some more input (and identify some more
missing cases, of which I now expect that are many) before getting
something out to general users.
Johannes Sixt July 16, 2024, 8:53 p.m. UTC | #4
We have had a submission for typescript just recently:

https://lore.kernel.org/git/20240404163827.5855-1-utsavp0213@gmail.com/

And two for Javascript

https://lore.kernel.org/git/20240301074048.188835-1-sergiusnyah@gmail.com/
https://lore.kernel.org/git/20220403132508.28196-1-a97410985new@gmail.com/

Please review these earlier submissions. If you think you can improve on
them, you are very welcome to do so. But you can also just resend one of
these series with a note that they are sufficiently mature and that you
support the submitted version.

I may very well mistaken, but I think that Typescript is a syntactical
superset of Javascript, so we need just one language driver and mention
in the documentation that it can be used for both languages.

Thanks,
-- Hannes
Junio C Hamano July 16, 2024, 9:10 p.m. UTC | #5
Matthew Hughes <matthewhughes934@gmail.com> writes:

> I'm leaning towards the former case: that this patch was premature.

OK.  Then let's take sufficient time.  After all, we are never in a
hurry ;-)

Thanks for giving an honest assessment.  Will keep the topic in
'seen' without marking it for 'next' (at least until it gets
replaced with a version that is more suitable to the public).
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e6150595af8..88ebfb2e6c9 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -907,6 +907,8 @@  patterns are available:
 
 - `tex` suitable for source code for LaTeX documents.
 
+- `typescript` suitable for source code in the TypeScript language.
+
 
 Customizing word diff
 ^^^^^^^^^^^^^^^^^^^^^
diff --git a/t/t4018/typescript-arrow-func b/t/t4018/typescript-arrow-func
new file mode 100644
index 00000000000..c884c2e6bdc
--- /dev/null
+++ b/t/t4018/typescript-arrow-func
@@ -0,0 +1,7 @@ 
+const RIGHT = (a: number) : number => {
+    const res = a + 1
+
+    // some comment
+    console.log("ChangeMe")
+    return res
+}
diff --git a/t/t4018/typescript-class b/t/t4018/typescript-class
new file mode 100644
index 00000000000..a373f7af1dd
--- /dev/null
+++ b/t/t4018/typescript-class
@@ -0,0 +1,8 @@ 
+class RIGHT {
+    // some
+    // comments
+    // ChangeMe
+    constructor() {
+        console.log("constructing...")
+    }
+}
diff --git a/t/t4018/typescript-func b/t/t4018/typescript-func
new file mode 100644
index 00000000000..94cc659a452
--- /dev/null
+++ b/t/t4018/typescript-func
@@ -0,0 +1,7 @@ 
+function RIGHT(a: number) : number {
+    const res = a + 1
+
+    // some comment
+    console.log("ChangeMe")
+    return res
+}
diff --git a/t/t4018/typescript-indented-func b/t/t4018/typescript-indented-func
new file mode 100644
index 00000000000..fee0373126e
--- /dev/null
+++ b/t/t4018/typescript-indented-func
@@ -0,0 +1,7 @@ 
+class Foo {
+    public RIGHT() {
+        // some
+        // comments
+        console.log("ChangeMe")
+    }
+}
diff --git a/t/t4018/typescript-interface b/t/t4018/typescript-interface
new file mode 100644
index 00000000000..78d14523446
--- /dev/null
+++ b/t/t4018/typescript-interface
@@ -0,0 +1,5 @@ 
+interface RIGHT {
+    name: string;
+    // some comment 
+    ChangeMe: number;
+}
diff --git a/t/t4018/typescript-nested-arrow-func b/t/t4018/typescript-nested-arrow-func
new file mode 100644
index 00000000000..52dbc6a92e5
--- /dev/null
+++ b/t/t4018/typescript-nested-arrow-func
@@ -0,0 +1,7 @@ 
+class Foo {
+    RIGHT = () => {
+        // some
+        // comment
+        console.log("ChangeMe")
+    }
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 74586f3813c..4e3cf415c23 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -338,6 +338,7 @@  test_language_driver python
 test_language_driver ruby
 test_language_driver scheme
 test_language_driver tex
+test_language_driver typescript
 
 test_expect_success 'word-diff with diff.sbe' '
 	cat >pre <<-\EOF &&
diff --git a/t/t4034/typescript/expect b/t/t4034/typescript/expect
new file mode 100644
index 00000000000..fde27423339
--- /dev/null
+++ b/t/t4034/typescript/expect
@@ -0,0 +1,42 @@ 
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 6a5ba06c60..db4788b1a6 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,24 +1,24 @@<RESET>
+console.log("Hello world<RED>!<RESET><GREEN>?<RESET>")
+foo===<RED>bar<RESET><GREEN>buz<RESET>
+foo!==<RED>bar<RESET><GREEN>buz<RESET>
+function fn() : <RED>number<RESET><GREEN>string<RESET> { return <RED>1<RESET><GREEN>"ok"<RESET> }
+<RED>a<RESET><GREEN>x<RESET>.someFunc()
+const <RED>a<RESET><GREEN>x<RESET> = () => {}
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+"<RED>x<RESET><GREEN>y<RESET>"
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>>>>=<RED>b a<RESET><GREEN>y x<RESET>>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/typescript/post b/t/t4034/typescript/post
new file mode 100644
index 00000000000..4aca03f8ce6
--- /dev/null
+++ b/t/t4034/typescript/post
@@ -0,0 +1,24 @@ 
+console.log("Hello world?")
+foo===buz
+foo!==buz
+function fn() : string { return "ok" }
+x.someFunc()
+const x = () => {}
+(1) (-1e10) (0xabcdef) 'y'
+"y"
+[x] x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x>>>=y x>>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/typescript/pre b/t/t4034/typescript/pre
new file mode 100644
index 00000000000..ef654a92c61
--- /dev/null
+++ b/t/t4034/typescript/pre
@@ -0,0 +1,24 @@ 
+console.log("Hello world!")
+foo===bar
+foo!==bar
+function fn() : number { return 1 }
+a.someFunc()
+const a = () => {}
+1 -1e10 0xabcdef 'x'
+"x"
+[a] a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a>>>=b a>>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/userdiff.c b/userdiff.c
index c4ebb9ff734..7247d351cde 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -333,6 +333,17 @@  PATTERNS("scheme",
 	 "|([^][)(}{[ \t])+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 	 "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"),
+PATTERNS("typescript",
+	"^[\t ]*((class|constructor|public|private|protected|function|interface)[ \t].*)$\n"
+	// arrow funcs
+	"^[\t ]*((const|let|var)?[^()]*)=[\t ]*\\([^()]*\\)[\t ]*.*=>.*$",
+	/* -- */
+	"[a-zA-Z_][a-zA-Z0-9_]*"
+	// numeric constants
+	"|[-+0-9.e]+|0[xX]?[0-9a-fA-F]"
+	// operators
+	"|[-+*/<>%&^|=!]"
+	"|--|\\+\\+|//=?|<<=?|>>?=?"),
 { .name = "default", .binary = -1 },
 };
 #undef PATTERNS