diff mbox series

scripts/coccinelle: New range.cocci

Message ID 20240725055447.14512-1-yaoxt.fnst@fujitsu.com (mailing list archive)
State New, archived
Headers show
Series scripts/coccinelle: New range.cocci | expand

Commit Message

Xingtao Yao (Fujitsu) July 25, 2024, 5:54 a.m. UTC
This is the semantic patch from commit 7b3e371526 "cxl/mailbox: make
range overlap check more readable"

Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
---
 scripts/coccinelle/range.cocci | 49 ++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 scripts/coccinelle/range.cocci

Comments

Zhijian Li (Fujitsu)" via Aug. 20, 2024, 3:13 a.m. UTC | #1
ping.

> -----Original Message-----
> From: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> Sent: Thursday, July 25, 2024 1:55 PM
> To: qemu-devel@nongnu.org
> Cc: Yao, Xingtao/姚 幸涛 <yaoxt.fnst@fujitsu.com>
> Subject: [PATCH] scripts/coccinelle: New range.cocci
> 
> This is the semantic patch from commit 7b3e371526 "cxl/mailbox: make
> range overlap check more readable"
> 
> Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> ---
>  scripts/coccinelle/range.cocci | 49
> ++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
>  create mode 100644 scripts/coccinelle/range.cocci
> 
> diff --git a/scripts/coccinelle/range.cocci b/scripts/coccinelle/range.cocci
> new file mode 100644
> index 000000000000..21b07945ccb2
> --- /dev/null
> +++ b/scripts/coccinelle/range.cocci
> @@ -0,0 +1,49 @@
> +/*
> +  Usage:
> +
> +    spatch \
> +           --macro-file scripts/cocci-macro-file.h \
> +           --sp-file scripts/coccinelle/range.cocci \
> +           --keep-comments \
> +           --in-place \
> +           --dir .
> +
> +  Description:
> +    Find out the range overlap check and use ranges_overlap() instead.
> +
> +  Note:
> +    This pattern cannot accurately match the region overlap check, and you
> +    need to manually delete the use cases that do not meet the conditions.
> +
> +    In addition, the parameters of ranges_overlap() may be filled incorrectly,
> +    and some use cases may be better to use range_overlaps_range().
> +*/
> +
> +@@
> +expression E1, E2, E3, E4;
> +@@
> +(
> +- E2 <= E3 || E1 >= E4
> ++ !ranges_overlap(E1, E2, E3, E4)
> +|
> +
> +- (E2 <= E3) || (E1 >= E4)
> ++ !ranges_overlap(E1, E2, E3, E4)
> +|
> +
> +- E1 < E4 && E2 > E3
> ++ ranges_overlap(E1, E2, E3, E4)
> +|
> +
> +- (E1 < E4) && (E2 > E3)
> ++ ranges_overlap(E1, E2, E3, E4)
> +|
> +
> +- (E1 >= E3 && E1 < E4) || (E2 > E3 && E2 <= E4)
> ++ ranges_overlap(E1, E2, E3, E4)
> +
> +|
> +- ((E1 >= E3) && (E1 < E4)) || ((E2 > E3) && (E2 <= E4))
> ++ ranges_overlap(E1, E2, E3, E4)
> +)
> +
> --
> 2.41.0
Peter Maydell Aug. 20, 2024, 8:41 a.m. UTC | #2
On Thu, 25 Jul 2024 at 06:55, Yao Xingtao via <qemu-devel@nongnu.org> wrote:
>
> This is the semantic patch from commit 7b3e371526 "cxl/mailbox: make
> range overlap check more readable"
>
> Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> ---
>  scripts/coccinelle/range.cocci | 49 ++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
>  create mode 100644 scripts/coccinelle/range.cocci
>
> diff --git a/scripts/coccinelle/range.cocci b/scripts/coccinelle/range.cocci
> new file mode 100644
> index 000000000000..21b07945ccb2
> --- /dev/null
> +++ b/scripts/coccinelle/range.cocci
> @@ -0,0 +1,49 @@
> +/*
> +  Usage:
> +
> +    spatch \
> +           --macro-file scripts/cocci-macro-file.h \
> +           --sp-file scripts/coccinelle/range.cocci \
> +           --keep-comments \
> +           --in-place \
> +           --dir .
> +
> +  Description:
> +    Find out the range overlap check and use ranges_overlap() instead.
> +
> +  Note:
> +    This pattern cannot accurately match the region overlap check, and you
> +    need to manually delete the use cases that do not meet the conditions.
> +
> +    In addition, the parameters of ranges_overlap() may be filled incorrectly,
> +    and some use cases may be better to use range_overlaps_range().

I think these restrictions mean that we should do one
of two things:
 (1) rewrite this as a Coccinelle script that just prints
     out the places where it found matches (i.e. a "grep"
     type operation, not a search-and-replace), so the
     user can then go and investigate them and do the
     range_overlap they want to do
 (2) fix the problems so that you really can apply it to
     the source tree and get correct fixes

The ideal would be (2) -- the ideal flow for coccinelle
driven patches is that you can review the coccinelle
script to check for things like off-by-one errors, and
then can trust all the changes it makes. Otherwise
reviewers need to carefully scrutinize each source
change individually.

It's the off-by-one issue that really makes me think
that (2) would be preferable -- it would otherwise
be I think quite easy to accidentally rewrite a correct
range check into one that's off-by-one and not notice.

thanks
-- PMM
Zhijian Li (Fujitsu)" via Aug. 21, 2024, 12:21 a.m. UTC | #3
> -----Original Message-----
> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Tuesday, August 20, 2024 4:41 PM
> To: Yao, Xingtao/姚 幸涛 <yaoxt.fnst@fujitsu.com>
> Cc: qemu-devel@nongnu.org
> Subject: Re: [PATCH] scripts/coccinelle: New range.cocci
> 
> On Thu, 25 Jul 2024 at 06:55, Yao Xingtao via <qemu-devel@nongnu.org> wrote:
> >
> > This is the semantic patch from commit 7b3e371526 "cxl/mailbox: make
> > range overlap check more readable"
> >
> > Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> > ---
> >  scripts/coccinelle/range.cocci | 49
> ++++++++++++++++++++++++++++++++++
> >  1 file changed, 49 insertions(+)
> >  create mode 100644 scripts/coccinelle/range.cocci
> >
> > diff --git a/scripts/coccinelle/range.cocci b/scripts/coccinelle/range.cocci
> > new file mode 100644
> > index 000000000000..21b07945ccb2
> > --- /dev/null
> > +++ b/scripts/coccinelle/range.cocci
> > @@ -0,0 +1,49 @@
> > +/*
> > +  Usage:
> > +
> > +    spatch \
> > +           --macro-file scripts/cocci-macro-file.h \
> > +           --sp-file scripts/coccinelle/range.cocci \
> > +           --keep-comments \
> > +           --in-place \
> > +           --dir .
> > +
> > +  Description:
> > +    Find out the range overlap check and use ranges_overlap() instead.
> > +
> > +  Note:
> > +    This pattern cannot accurately match the region overlap check, and you
> > +    need to manually delete the use cases that do not meet the conditions.
> > +
> > +    In addition, the parameters of ranges_overlap() may be filled incorrectly,
> > +    and some use cases may be better to use range_overlaps_range().
> 
> I think these restrictions mean that we should do one
> of two things:
>  (1) rewrite this as a Coccinelle script that just prints
>      out the places where it found matches (i.e. a "grep"
>      type operation, not a search-and-replace), so the
>      user can then go and investigate them and do the
>      range_overlap they want to do
>  (2) fix the problems so that you really can apply it to
>      the source tree and get correct fixes
> 
> The ideal would be (2) -- the ideal flow for coccinelle
> driven patches is that you can review the coccinelle
> script to check for things like off-by-one errors, and
> then can trust all the changes it makes. Otherwise
> reviewers need to carefully scrutinize each source
> change individually.
> 
> It's the off-by-one issue that really makes me think
> that (2) would be preferable -- it would otherwise
> be I think quite easy to accidentally rewrite a correct
> range check into one that's off-by-one and not notice.
thanks for your reply!
I tried my best to match all the patterns of the region overlap check,
but it is difficult, I am not good at cocci, could you give me some advice?
> 
> thanks
> -- PMM
Peter Maydell Aug. 22, 2024, 3:31 p.m. UTC | #4
On Wed, 21 Aug 2024 at 01:21, Xingtao Yao (Fujitsu)
<yaoxt.fnst@fujitsu.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Peter Maydell <peter.maydell@linaro.org>
> > Sent: Tuesday, August 20, 2024 4:41 PM
> > To: Yao, Xingtao/姚 幸涛 <yaoxt.fnst@fujitsu.com>
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: [PATCH] scripts/coccinelle: New range.cocci
> >
> > On Thu, 25 Jul 2024 at 06:55, Yao Xingtao via <qemu-devel@nongnu.org> wrote:
> > >
> > > This is the semantic patch from commit 7b3e371526 "cxl/mailbox: make
> > > range overlap check more readable"
> > >
> > > Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> > > ---
> > >  scripts/coccinelle/range.cocci | 49
> > ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 49 insertions(+)
> > >  create mode 100644 scripts/coccinelle/range.cocci
> > >
> > > diff --git a/scripts/coccinelle/range.cocci b/scripts/coccinelle/range.cocci
> > > new file mode 100644
> > > index 000000000000..21b07945ccb2
> > > --- /dev/null
> > > +++ b/scripts/coccinelle/range.cocci
> > > @@ -0,0 +1,49 @@
> > > +/*
> > > +  Usage:
> > > +
> > > +    spatch \
> > > +           --macro-file scripts/cocci-macro-file.h \
> > > +           --sp-file scripts/coccinelle/range.cocci \
> > > +           --keep-comments \
> > > +           --in-place \
> > > +           --dir .
> > > +
> > > +  Description:
> > > +    Find out the range overlap check and use ranges_overlap() instead.
> > > +
> > > +  Note:
> > > +    This pattern cannot accurately match the region overlap check, and you
> > > +    need to manually delete the use cases that do not meet the conditions.
> > > +
> > > +    In addition, the parameters of ranges_overlap() may be filled incorrectly,
> > > +    and some use cases may be better to use range_overlaps_range().
> >
> > I think these restrictions mean that we should do one
> > of two things:
> >  (1) rewrite this as a Coccinelle script that just prints
> >      out the places where it found matches (i.e. a "grep"
> >      type operation, not a search-and-replace), so the
> >      user can then go and investigate them and do the
> >      range_overlap they want to do
> >  (2) fix the problems so that you really can apply it to
> >      the source tree and get correct fixes
> >
> > The ideal would be (2) -- the ideal flow for coccinelle
> > driven patches is that you can review the coccinelle
> > script to check for things like off-by-one errors, and
> > then can trust all the changes it makes. Otherwise
> > reviewers need to carefully scrutinize each source
> > change individually.
> >
> > It's the off-by-one issue that really makes me think
> > that (2) would be preferable -- it would otherwise
> > be I think quite easy to accidentally rewrite a correct
> > range check into one that's off-by-one and not notice.
> thanks for your reply!
> I tried my best to match all the patterns of the region overlap check,
> but it is difficult, I am not good at cocci, could you give me some advice?


Something like this seems to me to work and mostly makes the
same substitutions as your series suggests:

===begin===
//  Usage:
//    spatch \
//          --macro-file scripts/cocci-macro-file.h \
//           --sp-file scripts/coccinelle/range.cocci \
//           --keep-comments \
//           --in-place \
//           --dir .
//
//  Description:
//   Find out the range overlap check and use ranges_overlap() instead.
//
// Usage notes:
// * Any change made here shouldn't be a behaviour change, but you'll
//   need to check that the suggested changes really are range checks
//   semantically, and not something else that happened to match the pattern.
//   In particular the patterns for (start1, end1) vs (start2, end2)
//   can match very widely.
// * This won't detect cases where somebody intended to write a range overlap
//   but made an off-by-one error in the bounds checks.
// * You may need to add a #include "qemu/range.h" to the file.
//
// The arguments to ranges_overlap() are start1, len1, start2, len2.
// This means that the last value included in each range is (start + len - 1).
//
// Note that Coccinelle is smart enough to match unbracketed
// versions of expressions if we provide it with patterns with brackets,
// but not vice-versa, so our match expressions are generous with bracketing.

@@
// Where the code expresses things in terms of start and length:
expression S1, L1, S2, L2;
@@
(
- (((S1 + L1) <= S2) || ((S2 + L2) <= S1))
+ !ranges_overlap(S1, L1, S2, L2)
|
- ((S1 < (S2 + L2)) && (S2 < (S1 + L1)))
+ ranges_overlap(S1, L1, S2, L2)
)
@@
// Where the code expresses things with (start, length) and (start, end)
// with the 'end' byte not being inside the second range.
expression S1, E1, S2, L2;
@@
(
- ((S1 >= (S2 + L2)) || (E1 <= S2))
+ !ranges_overlap(S1, E1 - S1, S2, L2)
|
- ((S1 < (S2 + L2)) && (E1 > S2))
+ ranges_overlap(S1, E1 - S1, S2, L2)
)
@@
// Where the code expresses things with (start, end), (start, end)
// with the 'end' bytes not being inside the second range.
expression S1, E1, S2, E2;
@@
(
- ((S1 >= E2) || (E1 <= S2))
+ !ranges_overlap(S1, E1 - S1, S2, E2 - S2)
|
- ((S1 < E2) && (E1 > S2))
+ ranges_overlap(S1, E1 - S1, S2, E2 - S2)
)
===endit===

The awkward part here is that (as the notes suggest) those last
two patterns will match an awful lot of things that aren't
ranges. (I think we could in theory improve on that a bit by
for instance insisting that none of S1 S2 E1 E2 were constants,
but that seems tricky at minimum in Coccinelle.) But at least
the substitution they suggest will be an arithmetically correct one.

A couple of other review notes:
 * our Coccinelle comment format style is '//', not C-style comments
 * all new files in the tree need the usual copyright-and-license
   information at the top

thanks
-- PMM
Zhijian Li (Fujitsu)" via Aug. 23, 2024, 12:20 a.m. UTC | #5
> -----Original Message-----
> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Thursday, August 22, 2024 11:31 PM
> To: Yao, Xingtao/姚 幸涛 <yaoxt.fnst@fujitsu.com>
> Cc: qemu-devel@nongnu.org
> Subject: Re: [PATCH] scripts/coccinelle: New range.cocci
> 
> On Wed, 21 Aug 2024 at 01:21, Xingtao Yao (Fujitsu)
> <yaoxt.fnst@fujitsu.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Peter Maydell <peter.maydell@linaro.org>
> > > Sent: Tuesday, August 20, 2024 4:41 PM
> > > To: Yao, Xingtao/姚 幸涛 <yaoxt.fnst@fujitsu.com>
> > > Cc: qemu-devel@nongnu.org
> > > Subject: Re: [PATCH] scripts/coccinelle: New range.cocci
> > >
> > > On Thu, 25 Jul 2024 at 06:55, Yao Xingtao via <qemu-devel@nongnu.org>
> wrote:
> > > >
> > > > This is the semantic patch from commit 7b3e371526 "cxl/mailbox: make
> > > > range overlap check more readable"
> > > >
> > > > Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> > > > ---
> > > >  scripts/coccinelle/range.cocci | 49
> > > ++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 49 insertions(+)
> > > >  create mode 100644 scripts/coccinelle/range.cocci
> > > >
> > > > diff --git a/scripts/coccinelle/range.cocci b/scripts/coccinelle/range.cocci
> > > > new file mode 100644
> > > > index 000000000000..21b07945ccb2
> > > > --- /dev/null
> > > > +++ b/scripts/coccinelle/range.cocci
> > > > @@ -0,0 +1,49 @@
> > > > +/*
> > > > +  Usage:
> > > > +
> > > > +    spatch \
> > > > +           --macro-file scripts/cocci-macro-file.h \
> > > > +           --sp-file scripts/coccinelle/range.cocci \
> > > > +           --keep-comments \
> > > > +           --in-place \
> > > > +           --dir .
> > > > +
> > > > +  Description:
> > > > +    Find out the range overlap check and use ranges_overlap() instead.
> > > > +
> > > > +  Note:
> > > > +    This pattern cannot accurately match the region overlap check, and you
> > > > +    need to manually delete the use cases that do not meet the conditions.
> > > > +
> > > > +    In addition, the parameters of ranges_overlap() may be filled
> incorrectly,
> > > > +    and some use cases may be better to use range_overlaps_range().
> > >
> > > I think these restrictions mean that we should do one
> > > of two things:
> > >  (1) rewrite this as a Coccinelle script that just prints
> > >      out the places where it found matches (i.e. a "grep"
> > >      type operation, not a search-and-replace), so the
> > >      user can then go and investigate them and do the
> > >      range_overlap they want to do
> > >  (2) fix the problems so that you really can apply it to
> > >      the source tree and get correct fixes
> > >
> > > The ideal would be (2) -- the ideal flow for coccinelle
> > > driven patches is that you can review the coccinelle
> > > script to check for things like off-by-one errors, and
> > > then can trust all the changes it makes. Otherwise
> > > reviewers need to carefully scrutinize each source
> > > change individually.
> > >
> > > It's the off-by-one issue that really makes me think
> > > that (2) would be preferable -- it would otherwise
> > > be I think quite easy to accidentally rewrite a correct
> > > range check into one that's off-by-one and not notice.
> > thanks for your reply!
> > I tried my best to match all the patterns of the region overlap check,
> > but it is difficult, I am not good at cocci, could you give me some advice?
> 
> 
> Something like this seems to me to work and mostly makes the
> same substitutions as your series suggests:
> 
> ===begin===
> //  Usage:
> //    spatch \
> //          --macro-file scripts/cocci-macro-file.h \
> //           --sp-file scripts/coccinelle/range.cocci \
> //           --keep-comments \
> //           --in-place \
> //           --dir .
> //
> //  Description:
> //   Find out the range overlap check and use ranges_overlap() instead.
> //
> // Usage notes:
> // * Any change made here shouldn't be a behaviour change, but you'll
> //   need to check that the suggested changes really are range checks
> //   semantically, and not something else that happened to match the pattern.
> //   In particular the patterns for (start1, end1) vs (start2, end2)
> //   can match very widely.
> // * This won't detect cases where somebody intended to write a range overlap
> //   but made an off-by-one error in the bounds checks.
> // * You may need to add a #include "qemu/range.h" to the file.
> //
> // The arguments to ranges_overlap() are start1, len1, start2, len2.
> // This means that the last value included in each range is (start + len - 1).
> //
> // Note that Coccinelle is smart enough to match unbracketed
> // versions of expressions if we provide it with patterns with brackets,
> // but not vice-versa, so our match expressions are generous with bracketing.
> 
> @@
> // Where the code expresses things in terms of start and length:
> expression S1, L1, S2, L2;
> @@
> (
> - (((S1 + L1) <= S2) || ((S2 + L2) <= S1))
> + !ranges_overlap(S1, L1, S2, L2)
> |
> - ((S1 < (S2 + L2)) && (S2 < (S1 + L1)))
> + ranges_overlap(S1, L1, S2, L2)
> )
> @@
> // Where the code expresses things with (start, length) and (start, end)
> // with the 'end' byte not being inside the second range.
> expression S1, E1, S2, L2;
> @@
> (
> - ((S1 >= (S2 + L2)) || (E1 <= S2))
> + !ranges_overlap(S1, E1 - S1, S2, L2)
> |
> - ((S1 < (S2 + L2)) && (E1 > S2))
> + ranges_overlap(S1, E1 - S1, S2, L2)
> )
> @@
> // Where the code expresses things with (start, end), (start, end)
> // with the 'end' bytes not being inside the second range.
> expression S1, E1, S2, E2;
> @@
> (
> - ((S1 >= E2) || (E1 <= S2))
> + !ranges_overlap(S1, E1 - S1, S2, E2 - S2)
> |
> - ((S1 < E2) && (E1 > S2))
> + ranges_overlap(S1, E1 - S1, S2, E2 - S2)
> )
> ===endit===
> 
> The awkward part here is that (as the notes suggest) those last
> two patterns will match an awful lot of things that aren't
> ranges. (I think we could in theory improve on that a bit by
> for instance insisting that none of S1 S2 E1 E2 were constants,
> but that seems tricky at minimum in Coccinelle.) But at least
> the substitution they suggest will be an arithmetically correct one.
thanks for your advice, will fix in next revision.
> 
> A couple of other review notes:
>  * our Coccinelle comment format style is '//', not C-style comments
>  * all new files in the tree need the usual copyright-and-license
>    information at the top
got it, thanks.
> 
> thanks
> -- PMM
diff mbox series

Patch

diff --git a/scripts/coccinelle/range.cocci b/scripts/coccinelle/range.cocci
new file mode 100644
index 000000000000..21b07945ccb2
--- /dev/null
+++ b/scripts/coccinelle/range.cocci
@@ -0,0 +1,49 @@ 
+/*
+  Usage:
+
+    spatch \
+           --macro-file scripts/cocci-macro-file.h \
+           --sp-file scripts/coccinelle/range.cocci \
+           --keep-comments \
+           --in-place \
+           --dir .
+
+  Description:
+    Find out the range overlap check and use ranges_overlap() instead.
+
+  Note:
+    This pattern cannot accurately match the region overlap check, and you
+    need to manually delete the use cases that do not meet the conditions.
+
+    In addition, the parameters of ranges_overlap() may be filled incorrectly,
+    and some use cases may be better to use range_overlaps_range().
+*/
+
+@@
+expression E1, E2, E3, E4;
+@@
+(
+- E2 <= E3 || E1 >= E4
++ !ranges_overlap(E1, E2, E3, E4)
+|
+
+- (E2 <= E3) || (E1 >= E4)
++ !ranges_overlap(E1, E2, E3, E4)
+|
+
+- E1 < E4 && E2 > E3
++ ranges_overlap(E1, E2, E3, E4)
+|
+
+- (E1 < E4) && (E2 > E3)
++ ranges_overlap(E1, E2, E3, E4)
+|
+
+- (E1 >= E3 && E1 < E4) || (E2 > E3 && E2 <= E4)
++ ranges_overlap(E1, E2, E3, E4)
+
+|
+- ((E1 >= E3) && (E1 < E4)) || ((E2 > E3) && (E2 <= E4))
++ ranges_overlap(E1, E2, E3, E4)
+)
+