Message ID | ded153d4-4aea-d4da-11cb-ec66d181e4c9@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cocci: avoid normalization rules for memcpy | expand |
On Sun, Jul 10 2022, René Scharfe wrote: > Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are > intended to reduce the number of sizeof variants to deal with. They can > have unintended side effects if only they match, but not the one for the > COPY_ARRAY conversion at the end. Since ab/cocci-unused is marked for "next" it would be really nice to have this based on top so we can add tests for these transformations (the topic adds a way to do that). But if you don't feel like doing that this is fine too. > diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci > index 9a4f00cb1b..aa75937950 100644 > --- a/contrib/coccinelle/array.cocci > +++ b/contrib/coccinelle/array.cocci > @@ -1,60 +1,58 @@ > -@@ > -expression dst, src, n, E; > -@@ > - memcpy(dst, src, n * sizeof( > -- E[...] > -+ *(E) > - )) > - > -@@ > -type T; > -T *ptr; > -T[] arr; > -expression E, n; > -@@ > -( > - memcpy(ptr, E, > -- n * sizeof(*(ptr)) > -+ n * sizeof(T) > - ) > -| > - memcpy(arr, E, > -- n * sizeof(*(arr)) > -+ n * sizeof(T) > - ) > -| > - memcpy(E, ptr, > -- n * sizeof(*(ptr)) > -+ n * sizeof(T) > - ) > -| > - memcpy(E, arr, > -- n * sizeof(*(arr)) > -+ n * sizeof(T) > - ) > -) > - > @@ > type T; > T *dst_ptr; > T *src_ptr; > -T[] dst_arr; > -T[] src_arr; > expression n; > @@ > -( > -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T)) > +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T) > +- \| sizeof(*(dst_ptr)) > +- \| sizeof(*(src_ptr)) > +- \| sizeof(dst_ptr[...]) > +- \| sizeof(src_ptr[...]) > +- \) ) > + COPY_ARRAY(dst_ptr, src_ptr, n) > -| > -- memcpy(dst_ptr, src_arr, (n) * sizeof(T)) > + > +@@ > +type T; > +T *dst_ptr; > +T[] src_arr; > +expression n; > +@@ > +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T) > +- \| sizeof(*(dst_ptr)) > +- \| sizeof(*(src_arr)) > +- \| sizeof(dst_ptr[...]) > +- \| sizeof(src_arr[...]) > +- \) ) > + COPY_ARRAY(dst_ptr, src_arr, n) > -| > -- memcpy(dst_arr, src_ptr, (n) * sizeof(T)) > + > +@@ > +type T; > +T[] dst_arr; > +T *src_ptr; > +expression n; > +@@ > +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T) > +- \| sizeof(*(dst_arr)) > +- \| sizeof(*(src_ptr)) > +- \| sizeof(dst_arr[...]) > +- \| sizeof(src_ptr[...]) > +- \) ) > + COPY_ARRAY(dst_arr, src_ptr, n) > -| > -- memcpy(dst_arr, src_arr, (n) * sizeof(T)) > + > +@@ > +type T; > +T[] dst_arr; > +T[] src_arr; > +expression n; > +@@ > +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T) > +- \| sizeof(*(dst_arr)) > +- \| sizeof(*(src_arr)) > +- \| sizeof(dst_arr[...]) > +- \| sizeof(src_arr[...]) > +- \) ) > + COPY_ARRAY(dst_arr, src_arr, n) > -) > > @@ > type T; Hrm, this seems like a lot of repetition, it's here in the rules you're editing already, but these repeated "sizeof" make it a lot more verbose. Isn't there a way to avoid this by simply wrapping this across lines, I didn't test, but I think you can do this sort of thing in the cocci grammar: - memcpy( - COPY_ARRAY( ( dst_arr | dst_ptr ) , ( src_arr | src_ptr ) , (n) * - [your big sizeof alternate here] ) I.e. you want to preserve whatever we match in the 1st and 2nd arguments, but only want to munge part of the 3rd argument. The cocci grammar can "reach into" lines like that, it doesn't need to be limited to line-based diffs. But I didn't try it in this caes, and maybe there's a good reason for why it can't happen in this case... I also wonder if that won't be a lot faster, i.e. if you can condense this all into one rule it won't need to match this N times, but maybe the overall complexity of the rules makes it come out to the same thing in the end...
On Sun, Jul 10 2022, Ævar Arnfjörð Bjarmason wrote: > On Sun, Jul 10 2022, René Scharfe wrote: > Isn't there a way to avoid this by simply wrapping this across lines, I > didn't test, but I think you can do this sort of thing in the cocci > grammar: > > - memcpy( > - COPY_ARRAY( Oops, typo! I just manually typed this out, so I meant "+ COPY_ARRAY(" here, in case it wasn't clear. To elaborate: Here's an example in the cocci repository: https://github.com/coccinelle/coccinelle/blob/master/tests/pcim.cocci You can also use "..." elides, but I didn't check if that works for the start of an argument list (but it should, I think), here's an example with it at the end: https://github.com/coccinelle/coccinelle/blob/master/scripts/coccicheck/cocci/kc.cocci So I think rules like: - memcpy( + COPY_ARRAY( ..., - A + B ); Are possible, and something you can use in this case (I have "5.1 Basic Transformations" from cocci_spatch_grammar.pdf in front of me).
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Since ab/cocci-unused is marked for "next" it would be really nice to > have this based on top so we can add tests for these transformations > (the topic adds a way to do that). Please do not follow this mantra in general. What is already in 'next' cannot be avoided. What is not yet in 'next' can yield just fine if needed.
Am 10.07.22 um 16:45 schrieb Ævar Arnfjörð Bjarmason: > > On Sun, Jul 10 2022, René Scharfe wrote: > >> Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are >> intended to reduce the number of sizeof variants to deal with. They can >> have unintended side effects if only they match, but not the one for the >> COPY_ARRAY conversion at the end. > > Since ab/cocci-unused is marked for "next" it would be really nice to > have this based on top so we can add tests for these transformations > (the topic adds a way to do that). Testing semantic patches sounds like a good idea. We can add tests later, once that feature landed. > > But if you don't feel like doing that this is fine too. > >> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci >> index 9a4f00cb1b..aa75937950 100644 >> --- a/contrib/coccinelle/array.cocci >> +++ b/contrib/coccinelle/array.cocci >> @@ -1,60 +1,58 @@ >> -@@ >> -expression dst, src, n, E; >> -@@ >> - memcpy(dst, src, n * sizeof( >> -- E[...] >> -+ *(E) >> - )) >> - >> -@@ >> -type T; >> -T *ptr; >> -T[] arr; >> -expression E, n; >> -@@ >> -( >> - memcpy(ptr, E, >> -- n * sizeof(*(ptr)) >> -+ n * sizeof(T) >> - ) >> -| >> - memcpy(arr, E, >> -- n * sizeof(*(arr)) >> -+ n * sizeof(T) >> - ) >> -| >> - memcpy(E, ptr, >> -- n * sizeof(*(ptr)) >> -+ n * sizeof(T) >> - ) >> -| >> - memcpy(E, arr, >> -- n * sizeof(*(arr)) >> -+ n * sizeof(T) >> - ) >> -) >> - >> @@ >> type T; >> T *dst_ptr; >> T *src_ptr; >> -T[] dst_arr; >> -T[] src_arr; >> expression n; >> @@ >> -( >> -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T)) >> +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T) >> +- \| sizeof(*(dst_ptr)) >> +- \| sizeof(*(src_ptr)) >> +- \| sizeof(dst_ptr[...]) >> +- \| sizeof(src_ptr[...]) >> +- \) ) >> + COPY_ARRAY(dst_ptr, src_ptr, n) >> -| >> -- memcpy(dst_ptr, src_arr, (n) * sizeof(T)) >> + >> +@@ >> +type T; >> +T *dst_ptr; >> +T[] src_arr; >> +expression n; >> +@@ >> +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T) >> +- \| sizeof(*(dst_ptr)) >> +- \| sizeof(*(src_arr)) >> +- \| sizeof(dst_ptr[...]) >> +- \| sizeof(src_arr[...]) >> +- \) ) >> + COPY_ARRAY(dst_ptr, src_arr, n) >> -| >> -- memcpy(dst_arr, src_ptr, (n) * sizeof(T)) >> + >> +@@ >> +type T; >> +T[] dst_arr; >> +T *src_ptr; >> +expression n; >> +@@ >> +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T) >> +- \| sizeof(*(dst_arr)) >> +- \| sizeof(*(src_ptr)) >> +- \| sizeof(dst_arr[...]) >> +- \| sizeof(src_ptr[...]) >> +- \) ) >> + COPY_ARRAY(dst_arr, src_ptr, n) >> -| >> -- memcpy(dst_arr, src_arr, (n) * sizeof(T)) >> + >> +@@ >> +type T; >> +T[] dst_arr; >> +T[] src_arr; >> +expression n; >> +@@ >> +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T) >> +- \| sizeof(*(dst_arr)) >> +- \| sizeof(*(src_arr)) >> +- \| sizeof(dst_arr[...]) >> +- \| sizeof(src_arr[...]) >> +- \) ) >> + COPY_ARRAY(dst_arr, src_arr, n) >> -) >> >> @@ >> type T; > > Hrm, this seems like a lot of repetition, it's here in the rules you're > editing already, but these repeated "sizeof" make it a lot more verbose. > > Isn't there a way to avoid this by simply wrapping this across lines, I > didn't test, but I think you can do this sort of thing in the cocci > grammar: > > - memcpy( > - COPY_ARRAY( > ( > dst_arr > | > dst_ptr > ) > , > ( > src_arr > | > src_ptr > ) > , > (n) * > - [your big sizeof alternate here] > ) Hmm, that would match many more combinations, e.g. this one: void f(int *a, int *b, long n, int c[1]) { memcpy(a, b, n * sizeof(*c)); } The elements of a, b and c have the same type, so replacing c with a (which a conversion to COPY_ARRAY would do) would produce the same object code. I can't come up with a plausible scenario like above and where a type change of c down the line would cause problems, but I also can't convince myself that no such thing can exist. Tricky. And I can't get it to format the whitespace around the third argument of COPY_ARRAY nicely in all cases. And it takes 37% longer on my machine. But it sure is more compact. :) @@ type T; T *dst_ptr; T *src_ptr; T[] dst_arr; T[] src_arr; expression n; @@ - memcpy + COPY_ARRAY ( \( dst_ptr \| dst_arr \) , \( src_ptr \| src_arr \) , - (n) * \( sizeof(T) - \| sizeof(*(dst_ptr)) - \| sizeof(*(dst_arr)) - \| sizeof(*(src_ptr)) - \| sizeof(*(src_arr)) - \| sizeof(dst_ptr[...]) - \| sizeof(dst_arr[...]) - \| sizeof(src_ptr[...]) - \| sizeof(src_arr[...]) - \) + n ) > > I.e. you want to preserve whatever we match in the 1st and 2nd > arguments, but only want to munge part of the 3rd argument. The cocci > grammar can "reach into" lines like that, it doesn't need to be limited > to line-based diffs. > > But I didn't try it in this caes, and maybe there's a good reason for > why it can't happen in this case... > > I also wonder if that won't be a lot faster, i.e. if you can condense > this all into one rule it won't need to match this N times, but maybe > the overall complexity of the rules makes it come out to the same thing > in the end...
On Mon, Jul 11 2022, René Scharfe wrote: > Am 10.07.22 um 16:45 schrieb Ævar Arnfjörð Bjarmason: >> >> On Sun, Jul 10 2022, René Scharfe wrote: >> >>> Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are >>> intended to reduce the number of sizeof variants to deal with. They can >>> have unintended side effects if only they match, but not the one for the >>> COPY_ARRAY conversion at the end. >> >> Since ab/cocci-unused is marked for "next" it would be really nice to >> have this based on top so we can add tests for these transformations >> (the topic adds a way to do that). > > Testing semantic patches sounds like a good idea. We can add tests later, > once that feature landed. Okey, I might look at this then... >> >> But if you don't feel like doing that this is fine too. >> >>> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci >>> index 9a4f00cb1b..aa75937950 100644 >>> --- a/contrib/coccinelle/array.cocci >>> +++ b/contrib/coccinelle/array.cocci >>> @@ -1,60 +1,58 @@ >>> -@@ >>> -expression dst, src, n, E; >>> -@@ >>> - memcpy(dst, src, n * sizeof( >>> -- E[...] >>> -+ *(E) >>> - )) >>> - >>> -@@ >>> -type T; >>> -T *ptr; >>> -T[] arr; >>> -expression E, n; >>> -@@ >>> -( >>> - memcpy(ptr, E, >>> -- n * sizeof(*(ptr)) >>> -+ n * sizeof(T) >>> - ) >>> -| >>> - memcpy(arr, E, >>> -- n * sizeof(*(arr)) >>> -+ n * sizeof(T) >>> - ) >>> -| >>> - memcpy(E, ptr, >>> -- n * sizeof(*(ptr)) >>> -+ n * sizeof(T) >>> - ) >>> -| >>> - memcpy(E, arr, >>> -- n * sizeof(*(arr)) >>> -+ n * sizeof(T) >>> - ) >>> -) >>> - >>> @@ >>> type T; >>> T *dst_ptr; >>> T *src_ptr; >>> -T[] dst_arr; >>> -T[] src_arr; >>> expression n; >>> @@ >>> -( >>> -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T)) >>> +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T) >>> +- \| sizeof(*(dst_ptr)) >>> +- \| sizeof(*(src_ptr)) >>> +- \| sizeof(dst_ptr[...]) >>> +- \| sizeof(src_ptr[...]) >>> +- \) ) >>> + COPY_ARRAY(dst_ptr, src_ptr, n) >>> -| >>> -- memcpy(dst_ptr, src_arr, (n) * sizeof(T)) >>> + >>> +@@ >>> +type T; >>> +T *dst_ptr; >>> +T[] src_arr; >>> +expression n; >>> +@@ >>> +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T) >>> +- \| sizeof(*(dst_ptr)) >>> +- \| sizeof(*(src_arr)) >>> +- \| sizeof(dst_ptr[...]) >>> +- \| sizeof(src_arr[...]) >>> +- \) ) >>> + COPY_ARRAY(dst_ptr, src_arr, n) >>> -| >>> -- memcpy(dst_arr, src_ptr, (n) * sizeof(T)) >>> + >>> +@@ >>> +type T; >>> +T[] dst_arr; >>> +T *src_ptr; >>> +expression n; >>> +@@ >>> +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T) >>> +- \| sizeof(*(dst_arr)) >>> +- \| sizeof(*(src_ptr)) >>> +- \| sizeof(dst_arr[...]) >>> +- \| sizeof(src_ptr[...]) >>> +- \) ) >>> + COPY_ARRAY(dst_arr, src_ptr, n) >>> -| >>> -- memcpy(dst_arr, src_arr, (n) * sizeof(T)) >>> + >>> +@@ >>> +type T; >>> +T[] dst_arr; >>> +T[] src_arr; >>> +expression n; >>> +@@ >>> +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T) >>> +- \| sizeof(*(dst_arr)) >>> +- \| sizeof(*(src_arr)) >>> +- \| sizeof(dst_arr[...]) >>> +- \| sizeof(src_arr[...]) >>> +- \) ) >>> + COPY_ARRAY(dst_arr, src_arr, n) >>> -) >>> >>> @@ >>> type T; >> >> Hrm, this seems like a lot of repetition, it's here in the rules you're >> editing already, but these repeated "sizeof" make it a lot more verbose. >> >> Isn't there a way to avoid this by simply wrapping this across lines, I >> didn't test, but I think you can do this sort of thing in the cocci >> grammar: >> >> - memcpy( >> - COPY_ARRAY( >> ( >> dst_arr >> | >> dst_ptr >> ) >> , >> ( >> src_arr >> | >> src_ptr >> ) >> , >> (n) * >> - [your big sizeof alternate here] >> ) > > Hmm, that would match many more combinations, e.g. this one: > > void f(int *a, int *b, long n, int c[1]) { > memcpy(a, b, n * sizeof(*c)); > } > > The elements of a, b and c have the same type, so replacing c with a > (which a conversion to COPY_ARRAY would do) would produce the same > object code. I can't come up with a plausible scenario like above and > where a type change of c down the line would cause problems, but I > also can't convince myself that no such thing can exist. Tricky. > > And I can't get it to format the whitespace around the third argument > of COPY_ARRAY nicely in all cases. > > And it takes 37% longer on my machine. > > But it sure is more compact. :) > > @@ > type T; > T *dst_ptr; > T *src_ptr; > T[] dst_arr; > T[] src_arr; > expression n; > @@ > - memcpy > + COPY_ARRAY > ( > \( dst_ptr \| dst_arr \) , > \( src_ptr \| src_arr \) , > - (n) * \( sizeof(T) > - \| sizeof(*(dst_ptr)) > - \| sizeof(*(dst_arr)) > - \| sizeof(*(src_ptr)) > - \| sizeof(*(src_arr)) > - \| sizeof(dst_ptr[...]) > - \| sizeof(dst_arr[...]) > - \| sizeof(src_ptr[...]) > - \| sizeof(src_arr[...]) > - \) > + n > ) ..once those land, because cocci debug output is a lot more useful with target test data :)
diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci index 9a4f00cb1b..aa75937950 100644 --- a/contrib/coccinelle/array.cocci +++ b/contrib/coccinelle/array.cocci @@ -1,60 +1,58 @@ -@@ -expression dst, src, n, E; -@@ - memcpy(dst, src, n * sizeof( -- E[...] -+ *(E) - )) - -@@ -type T; -T *ptr; -T[] arr; -expression E, n; -@@ -( - memcpy(ptr, E, -- n * sizeof(*(ptr)) -+ n * sizeof(T) - ) -| - memcpy(arr, E, -- n * sizeof(*(arr)) -+ n * sizeof(T) - ) -| - memcpy(E, ptr, -- n * sizeof(*(ptr)) -+ n * sizeof(T) - ) -| - memcpy(E, arr, -- n * sizeof(*(arr)) -+ n * sizeof(T) - ) -) - @@ type T; T *dst_ptr; T *src_ptr; -T[] dst_arr; -T[] src_arr; expression n; @@ -( -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T)) +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T) +- \| sizeof(*(dst_ptr)) +- \| sizeof(*(src_ptr)) +- \| sizeof(dst_ptr[...]) +- \| sizeof(src_ptr[...]) +- \) ) + COPY_ARRAY(dst_ptr, src_ptr, n) -| -- memcpy(dst_ptr, src_arr, (n) * sizeof(T)) + +@@ +type T; +T *dst_ptr; +T[] src_arr; +expression n; +@@ +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T) +- \| sizeof(*(dst_ptr)) +- \| sizeof(*(src_arr)) +- \| sizeof(dst_ptr[...]) +- \| sizeof(src_arr[...]) +- \) ) + COPY_ARRAY(dst_ptr, src_arr, n) -| -- memcpy(dst_arr, src_ptr, (n) * sizeof(T)) + +@@ +type T; +T[] dst_arr; +T *src_ptr; +expression n; +@@ +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T) +- \| sizeof(*(dst_arr)) +- \| sizeof(*(src_ptr)) +- \| sizeof(dst_arr[...]) +- \| sizeof(src_ptr[...]) +- \) ) + COPY_ARRAY(dst_arr, src_ptr, n) -| -- memcpy(dst_arr, src_arr, (n) * sizeof(T)) + +@@ +type T; +T[] dst_arr; +T[] src_arr; +expression n; +@@ +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T) +- \| sizeof(*(dst_arr)) +- \| sizeof(*(src_arr)) +- \| sizeof(dst_arr[...]) +- \| sizeof(src_arr[...]) +- \) ) + COPY_ARRAY(dst_arr, src_arr, n) -) @@ type T;
Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are intended to reduce the number of sizeof variants to deal with. They can have unintended side effects if only they match, but not the one for the COPY_ARRAY conversion at the end. Avoid these side effects by instead using a self-contained rule for each combination of array and pointer for source and destination which lists all sizeof variants inline. This lets "make contrib/coccinelle/array.cocci.patch" take 15% longer on my machine, but gives peace of mind that no incomplete transformation will be generated. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- Patch generated with --patience for easier reading. contrib/coccinelle/array.cocci | 92 +++++++++++++++++----------------- 1 file changed, 45 insertions(+), 47 deletions(-) -- 2.37.0