From patchwork Thu Oct 7 06:50:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12540989
Date: Thu, 07 Oct 2021 06:50:58 +0000
Subject: [PATCH 1/3] userdiff: tighten cpp word regex
To: git@vger.kernel.org
Cc: Johannes Sixt
From: Johannes Sixt

From: Johannes Sixt

Generally, word regexes can be written such that they match tokens
liberally and need not model the actual syntax, because it can be
assumed that the regex will only be applied to syntactically correct
text.

The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:

   1+2
   1.5-e+2+f

and the following amalgams as one token:

   .l      as in str.length
   .f      as in str.find

Tighten the regex in the following way:

- Accept + and - only in one position in the exponent. + and - are no
  longer regarded as the sign of a number and are treated by the
  catch-all that is not visible in the driver's regex.

- Accept a leading decimal point only when it is followed by a digit.

For readability, factor hexadecimal and binary numbers into a term of
their own.

As a drive-by, this fixes that floating-point numbers such as 12E5
(with upper-case E) were split into two tokens.

Signed-off-by: Johannes Sixt
---
 userdiff.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/userdiff.c b/userdiff.c
index d9b2ba752f0..ce2a9230703 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -54,8 +54,14 @@ PATTERNS("cpp",
 	 /* functions/methods, variables, and compounds at top level */
 	 "^((::[[:space:]]*)?[A-Za-z_].*)$",
 	 /* -- */
+	 /* identifiers and keywords */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
-	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*"
+	 /* decimal and octal integers as well as floatingpoint numbers */
+	 "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
+	 /* hexadecimal and binary integers */
+	 "|0[xXbB][0-9a-fA-F]+[lLuU]*"
+	 /* floatingpoint numbers that begin with a decimal point */
+	 "|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 	 /* Keywords */

From patchwork Thu Oct 7 06:50:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12540991
Message-Id: <9d1c05f5f414b77a4edd741b738f47d4b9e3c56a.1633589461.git.gitgitgadget@gmail.com>
Date: Thu, 07 Oct 2021 06:50:59 +0000
Subject: [PATCH 2/3] userdiff: permit the digit-separating single-quote in numbers
To: git@vger.kernel.org
Cc: Johannes Sixt
From: Johannes Sixt

From: Johannes Sixt

Since C++14, the single-quote can be used as a digit separator:

   3.141'592'654
   1'000'000
   0xdead'beaf

Make it known to the word regex of the cpp driver, so that numbers are
not split into separate tokens at the single-quotes.

Signed-off-by: Johannes Sixt
---
 userdiff.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index ce2a9230703..1b640c7df79 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -57,11 +57,11 @@ PATTERNS("cpp",
 	 /* identifiers and keywords */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 /* decimal and octal integers as well as floatingpoint numbers */
-	 "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
+	 "|[0-9][0-9.']*([Ee][-+]?[0-9]+)?[fFlLuU]*"
 	 /* hexadecimal and binary integers */
-	 "|0[xXbB][0-9a-fA-F]+[lLuU]*"
+	 "|0[xXbB][0-9a-fA-F']+[lLuU]*"
 	 /* floatingpoint numbers that begin with a decimal point */
-	 "|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
+	 "|\\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 	 /* Keywords */

From patchwork Thu Oct 7 06:51:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12540993
Date: Thu, 07 Oct 2021 06:51:00 +0000
Subject: [PATCH 3/3] userdiff: learn the C++ spaceship operator
To: git@vger.kernel.org
Cc: Johannes Sixt
From: Johannes Sixt

From: Johannes Sixt

Since C++20, the language has a generalized (three-way) comparison
operator, <=>. Teach the cpp driver not to separate it into <= and >
tokens.

Signed-off-by: Johannes Sixt
---
 userdiff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/userdiff.c b/userdiff.c
index 1b640c7df79..13cec0b48db 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -62,7 +62,7 @@ PATTERNS("cpp",
 	 "|0[xXbB][0-9a-fA-F']+[lLuU]*"
 	 /* floatingpoint numbers that begin with a decimal point */
 	 "|\\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
 PATTERNS("csharp",
 	 /* Keywords */
 	 "!^[ \t]*(do|while|for|if|else|instanceof|new|return|switch|case|throw|catch|using)\n"
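
To see how the cpp word regex tokenizes text once all three patches are
applied, the final pattern can be exercised with POSIX regcomp() and
regexec(). The following is only a minimal sketch and not code from git:
count_tokens() is a hypothetical helper, and the trailing "|[^[:space:]]"
alternative is an assumption standing in for the catch-all that the
first patch describes as not visible in the driver's regex. The sketch
relies on the POSIX rule that regexec() reports the longest match at the
leftmost position.

```c
#include <assert.h>
#include <regex.h>

/*
 * Word regex of the cpp driver as it reads after patch 3/3.
 * The last alternative is an assumed stand-in for the catch-all
 * that is appended outside the driver's visible regex.
 */
static const char cpp_word_regex[] =
	"[a-zA-Z_][a-zA-Z0-9_]*"
	"|[0-9][0-9.']*([Ee][-+]?[0-9]+)?[fFlLuU]*"
	"|0[xXbB][0-9a-fA-F']+[lLuU]*"
	"|\\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
	"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"
	"|[^[:space:]]";

/*
 * Hypothetical helper: count how many tokens the regex finds in
 * `text` by repeatedly taking the leftmost match and continuing
 * right after it.
 */
int count_tokens(const char *text)
{
	regex_t re;
	regmatch_t m;
	int n = 0;
	int rc = regcomp(&re, cpp_word_regex, REG_EXTENDED);

	assert(rc == 0); /* the pattern is a constant and should compile */

	/* rm_eo is relative to the current start of `text` */
	while (*text && regexec(&re, text, 1, &m, 0) == 0 && m.rm_eo > 0) {
		n++;
		text += m.rm_eo;
	}
	regfree(&re);
	return n;
}
```

With a POSIX-conforming regexec(), count_tokens("1+2") should come out
as 3 (the tokens 1, + and 2), while 12E5, 1'000'000 and 0xdead'beaf
should each count as a single token, and "a <=> b" as three.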