diff mbox series

Makefile: add pending semantic patches

Message ID 20181108205255.193402-1-sbeller@google.com (mailing list archive)
State New, archived
Headers show
Series Makefile: add pending semantic patches | expand

Commit Message

Stefan Beller Nov. 8, 2018, 8:52 p.m. UTC
From: SZEDER Gábor <szeder.dev@gmail.com>

Add a description and place on how to use coccinelle for large refactorings
that happen only once.

Based-on-work-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---

I consider including this patch in a resend instead.
It outlays the basics of such a new workflow, which we can refine later.

Thanks,
Stefan

 Makefile                  |  7 +++--
 contrib/coccinelle/README | 60 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 2 deletions(-)

Comments

Martin Ågren Nov. 9, 2018, 4:55 a.m. UTC | #1
On Thu, 8 Nov 2018 at 21:53, Stefan Beller <sbeller@google.com> wrote:
>
> From: SZEDER Gábor <szeder.dev@gmail.com>
>

I haven't followed the original discussion too carefully, so I'll read
this like someone new to the topic probably would.

A nit, perhaps, but I was genuinely confused at first. The subject is
"Makefile: add pending semantic patches", but this patch doesn't add
any. It adds Makefile-support for such patches though, and it defines
the entire concept of pending semantic patches. How about "coccicheck:
introduce 'pending' semantic patches"?

> Add a description and place on how to use coccinelle for large refactorings
> that happen only once.

A bit confused about "and place". Based on my understanding from reading
the remainder of this patch, maybe:

  Teach `make coccicheck` to avoid patches named "*.pending.cocci" and
  handle them separately in a new `make coccicheck-pending` instead.
  This means that we can separate "critical" patches from "FYI" patches.
  The former target can continue causing Travis to fail its static
  analysis job, while the latter can let us keep an eye on ongoing
  (pending) transitions without them causing too much fallout.

  Document the intended use-cases and processes around these two
  targets.

>  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
>  semantic patches that might be useful to developers.
> +
> +There are two types of semantic patches:
> +
> + * Using the semantic transformation to check for bad patterns in the code;
> +   This is what the original target 'make coccicheck' is designed to do and

Is it relevant that this was the "original" target? (Genuine question.)

> +   it is expected that any resulting patch indicates a regression.
> +   The patches resulting from 'make coccicheck' are small and infrequent,
> +   so once they are found, they can be sent to the mailing list as per usual.
> +
> +   Example for introducing new patterns:
> +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> +
> +   Example of fixes using this approach:
> +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> +               a strbuf, 2018-03-25)
> +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> +
> +   These types of semantic patches are usually part of testing, c.f.
> +   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
> +               to transform, 2018-07-23)

Very nicely described, nice with the examples to quickly give a feeling
about how/when to use this.

> + * Using semantic transformations in large scale refactorings throughout
> +   the code base.
> +
> +   When applying the semantic patch into a real patch, sending it to the
> +   mailing list in the usual way, such a patch would be expected to have a
> +   lot of textual and semantic conflicts as such large scale refactorings
> +   change function signatures that are used widely in the code base.
> +   A textual conflict would arise if surrounding code near any call of such
> +   function changes. A semantic conflict arises when other patch series in
> +   flight introduce calls to such functions.

OK, I'm with you.

> +   So to aid these large scale refactorings, semantic patches can be used,
> +   using the process as follows:
> +
> +   1) Figure out what kind of large scale refactoring we need
> +      -> This is usually done by another unrelated series.

"This"? The figuring out, or the refactoring? Also, "unrelated"?

> +   2) Create the sematic patch needed for the large scale refactoring

s/sematic/semantic/

> +      and store it in contrib/coccinelle/*.pending.cocci
> +      -> The suffix containing 'pending' is important to differentiate
> +      this case from the other use case of checking for bad patterns.

Good.

> +   3) Apply the semantic patch only partially, where needed for the patch series
> +      that motivates the large scale refactoring and then build that series
> +      on top of it.
> +      By applying the semantic patch only partially where needed, the series
> +      is less likely to conflict with other series in flight.
> +      To make it possible to apply the semantic patch partially, there needs
> +      to be mechanism for backwards compatibility to keep those places working
> +      where the semantic patch is not applied. This can be done via a
> +      wrapper function that has the exact name and signature as the function
> +      to be changed.
> +
> +   4) Send the series as usual, including only the needed parts of the
> +      large scale refactoring

Trailing period.

OK, I think I get it, but I wonder if this might not work equally well
or better under certain circumstances:

  - introduce new API
  - add pending semantic patch
  - convert quiet areas to use the new API

On the other hand, listing all possible flows might be needlessly
limiting. I guess it boils down to this:

"Create a pending semantic patch. Make sure the old way of doing things
still works. Apply the semantic patch to the quieter areas of the
codebase. If in doubt, convert fewer places in the original series --
remaining spots can always be converted at a later time."

Martin
Junio C Hamano Nov. 9, 2018, 5:18 a.m. UTC | #2
Stefan Beller <sbeller@google.com> writes:

> From: SZEDER Gábor <szeder.dev@gmail.com>
>
> Add a description and place on how to use coccinelle for large refactorings
> that happen only once.
>
> Based-on-work-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> I consider including this patch in a resend instead.
> It outlays the basics of such a new workflow, which we can refine later.

Thanks for tying loose ends.

> diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README
> index 9c2f8879c2..fa09d1abcc 100644
> --- a/contrib/coccinelle/README
> +++ b/contrib/coccinelle/README
> @@ -1,2 +1,62 @@
>  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
>  semantic patches that might be useful to developers.
> +
> +There are two types of semantic patches:
> +
> + * Using the semantic transformation to check for bad patterns in the code;
> +   This is what the original target 'make coccicheck' is designed to do and
> +   it is expected that any resulting patch indicates a regression.
> +   The patches resulting from 'make coccicheck' are small and infrequent,
> +   so once they are found, they can be sent to the mailing list as per usual.
> +
> +   Example for introducing new patterns:
> +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> +
> +   Example of fixes using this approach:
> +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> +               a strbuf, 2018-03-25)
> +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> +
> +   These types of semantic patches are usually part of testing, c.f.
> +   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
> +               to transform, 2018-07-23)

Yup, and I think what we have in 'pu' (including your the_repository
stuff) falls into this category.

I've been paying attention to "make coccicheck" produced patches for
the past few weeks, and so far, I found it _much_ _much_ more
pleasant, compared to having to worry about merge conflicts with the
topics in flight that changes day to day (not just because we add
new topics or update existing topics, but also the order of the
merge changes as topics mature at different rates and jumps over
each other in master..pu history), that "make coccicheck" after
topics are merged to integration branches fixes these issues up as
needed.

> +   3) Apply the semantic patch only partially, where needed for the patch series
> +      that motivates the large scale refactoring and then build that series
> +      on top of it.

It is not quite clear what "needed for the patch series" really
means in the context of this paragraph.  What are the changes that
are not needed, that is still produced if we ran "make coccicheck"?

Are they wrong changes (e.g. a carelessly written read_cache() to
read_index(&the_index) conversion may munge the implementation of
read_cache(...) { return read_index(&the_index, ...); } and make
inifinite recursion)?  Or are they "correct but not immediately
necessary" (e.g. because calling read_cache() does not hurt until
that function gets removed, so rewriting the callers to call
read_index() with &the_index may be correct but not immediately
necessary)?

> +      By applying the semantic patch only partially where needed, the series
> +      is less likely to conflict with other series in flight.

That is correct.

> +      To make it possible to apply the semantic patch partially, there needs
> +      to be mechanism for backwards compatibility to keep those places working
> +      where the semantic patch is not applied. This can be done via a
> +      wrapper function that has the exact name and signature as the function
> +      to be changed.

OK, so this argues for leaving read_cache()-like things to help
other in-flight topics, while a change to encourage the use of
read_index() takes place.  That also makes sense, and this directly
relates to "less likely to conflict" benefit you mentioned above.

But it is still unclear to me then what are "necessary".

... goes and thinks ...

OK, so a series that allows a codepath to work on an arbitrary
in-core istate, for example, may need to update a function to take
istate and use it to call read_index(istate...), and the old code in
such a call chain must have used read_cache(), always operating on
&the_index.  For the purpose of that series, it does not matter if
other codepaths that are not involved in the callchain the series
wants to update are still only working on &the_index, so a change to
turn read_cache() into read_index(&the_index) is *not* necessary
(but still would be correct) and should be left out of the series.
But any change the series makes to the callchain in question that
turns read_cache() into read_index() with something call-specific
(not &the_index) is a necesary one.  Is that a reasonable example
of what these paragraphs wanted to convey with the distinction
between "needed for the patch series" and other changes?
Stefan Beller Nov. 9, 2018, 8:50 p.m. UTC | #3
On Thu, Nov 8, 2018 at 8:56 PM Martin Ågren <martin.agren@gmail.com> wrote:
> I haven't followed the original discussion too carefully, so I'll read
> this like someone new to the topic probably would.

Thanks!

> A nit, perhaps, but I was genuinely confused at first. The subject is
> "Makefile: add pending semantic patches", but this patch doesn't add
> any. It adds Makefile-support for such patches though, and it defines
> the entire concept of pending semantic patches. How about "coccicheck:
> introduce 'pending' semantic patches"?
>
> > Add a description and place on how to use coccinelle for large refactorings
> > that happen only once.
>
> A bit confused about "and place". Based on my understanding from reading
> the remainder of this patch, maybe:
>
>   Teach `make coccicheck` to avoid patches named "*.pending.cocci" and
>   handle them separately in a new `make coccicheck-pending` instead.
>   This means that we can separate "critical" patches from "FYI" patches.
>   The former target can continue causing Travis to fail its static
>   analysis job, while the latter can let us keep an eye on ongoing
>   (pending) transitions without them causing too much fallout.
>
>   Document the intended use-cases and processes around these two
>   targets.

Both suggested title and new commit message make sense.

>
> >  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
> >  semantic patches that might be useful to developers.
> > +
> > +There are two types of semantic patches:
> > +
> > + * Using the semantic transformation to check for bad patterns in the code;
> > +   This is what the original target 'make coccicheck' is designed to do and
>
> Is it relevant that this was the "original" target? (Genuine question.)

No. I can omit that part.

>
> > +   it is expected that any resulting patch indicates a regression.
> > +   The patches resulting from 'make coccicheck' are small and infrequent,
> > +   so once they are found, they can be sent to the mailing list as per usual.
> > +
> > +   Example for introducing new patterns:
> > +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> > +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> > +
> > +   Example of fixes using this approach:
> > +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> > +               a strbuf, 2018-03-25)
> > +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> > +
> > +   These types of semantic patches are usually part of testing, c.f.
> > +   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
> > +               to transform, 2018-07-23)
>
> Very nicely described, nice with the examples to quickly give a feeling
> about how/when to use this.

Thanks.


>
> > + * Using semantic transformations in large scale refactorings throughout
> > +   the code base.
> > +
> > +   When applying the semantic patch into a real patch, sending it to the
> > +   mailing list in the usual way, such a patch would be expected to have a
> > +   lot of textual and semantic conflicts as such large scale refactorings
> > +   change function signatures that are used widely in the code base.
> > +   A textual conflict would arise if surrounding code near any call of such
> > +   function changes. A semantic conflict arises when other patch series in
> > +   flight introduce calls to such functions.
>
> OK, I'm with you.
>
> > +   So to aid these large scale refactorings, semantic patches can be used,
> > +   using the process as follows:
> > +
> > +   1) Figure out what kind of large scale refactoring we need
> > +      -> This is usually done by another unrelated series.
>
> "This"? The figuring out, or the refactoring? Also, "unrelated"?

The need and type of what kind of large scale refactoring are
usually determined by a patch series, that is not refactoring for the
sake of refactoring, but it wants to achieve a specific goal, unrelated
to large refactorings per se.

The large refactoring is just another tool that a developer can use
to make their original series happen much faster.

So "unrelated" == "not the large scale refactoring, as that may
come as an preparatory series, but to have a preparatory series
it may be good to showcase why we need the preparatory series"

>
> > +   2) Create the sematic patch needed for the large scale refactoring
>
> s/sematic/semantic/

yup.

>
> > +      and store it in contrib/coccinelle/*.pending.cocci
> > +      -> The suffix containing 'pending' is important to differentiate
> > +      this case from the other use case of checking for bad patterns.
>
> Good.
>
> > +   3) Apply the semantic patch only partially, where needed for the patch series
> > +      that motivates the large scale refactoring and then build that series
> > +      on top of it.
> > +      By applying the semantic patch only partially where needed, the series
> > +      is less likely to conflict with other series in flight.
> > +      To make it possible to apply the semantic patch partially, there needs
> > +      to be mechanism for backwards compatibility to keep those places working
> > +      where the semantic patch is not applied. This can be done via a
> > +      wrapper function that has the exact name and signature as the function
> > +      to be changed.
> > +
> > +   4) Send the series as usual, including only the needed parts of the
> > +      large scale refactoring
>
> Trailing period.

ok.

> OK, I think I get it, but I wonder if this might not work equally well
> or better under certain circumstances:
>
>   - introduce new API

The  "new API" is similar enough to the old API and may even be
a superset.

>   - add pending semantic patch
>   - convert quiet areas to use the new API
>
> On the other hand, listing all possible flows might be needlessly
> limiting. I guess it boils down to this:
>
> "Create a pending semantic patch. Make sure the old way of doing things
> still works. Apply the semantic patch to the quieter areas of the
> codebase. If in doubt, convert fewer places in the original series --
> remaining spots can always be converted at a later time."

That's the gist of it. :)

Thanks for the review,
Stefan
Stefan Beller Nov. 9, 2018, 9:58 p.m. UTC | #4
On Thu, Nov 8, 2018 at 9:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > From: SZEDER Gábor <szeder.dev@gmail.com>
> >
> > Add a description and place on how to use coccinelle for large refactorings
> > that happen only once.
> >
> > Based-on-work-by: SZEDER Gábor <szeder.dev@gmail.com>
> > Signed-off-by: Stefan Beller <sbeller@google.com>
> > ---
> >
> > I consider including this patch in a resend instead.
> > It outlays the basics of such a new workflow, which we can refine later.
>
> Thanks for tying loose ends.
>
> > diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README
> > index 9c2f8879c2..fa09d1abcc 100644
> > --- a/contrib/coccinelle/README
> > +++ b/contrib/coccinelle/README
> > @@ -1,2 +1,62 @@
> >  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
> >  semantic patches that might be useful to developers.
> > +
> > +There are two types of semantic patches:
> > +
> > + * Using the semantic transformation to check for bad patterns in the code;
> > +   This is what the original target 'make coccicheck' is designed to do and
> > +   it is expected that any resulting patch indicates a regression.
> > +   The patches resulting from 'make coccicheck' are small and infrequent,
> > +   so once they are found, they can be sent to the mailing list as per usual.
> > +
> > +   Example for introducing new patterns:
> > +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> > +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> > +
> > +   Example of fixes using this approach:
> > +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> > +               a strbuf, 2018-03-25)
> > +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> > +
> > +   These types of semantic patches are usually part of testing, c.f.
> > +   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
> > +               to transform, 2018-07-23)
>
> Yup, and I think what we have in 'pu' (including your the_repository
> stuff) falls into this category.

My impression was that the_repository is strongly second category
as the_repository.cocci doesn't fix bad smells of code, but proposes
a refactoring that we agreed on doing.

> I've been paying attention to "make coccicheck" produced patches for
> the past few weeks, and so far, I found it _much_ _much_ more
> pleasant, compared to having to worry about merge conflicts with the
> topics in flight that changes day to day (not just because we add
> new topics or update existing topics, but also the order of the
> merge changes as topics mature at different rates and jumps over
> each other in master..pu history), that "make coccicheck" after
> topics are merged to integration branches fixes these issues up as
> needed.

So from your end we would not need the "pending" category as long
as the follow ups come in a timely manner?

>
> > +   3) Apply the semantic patch only partially, where needed for the patch series
> > +      that motivates the large scale refactoring and then build that series
> > +      on top of it.
>
> It is not quite clear what "needed for the patch series" really
> means in the context of this paragraph.  What are the changes that
> are not needed, that is still produced if we ran "make coccicheck"?

An example for "needed" would be
3f21279f50..bd8737176b
whereas "not needed" is what is in
"treewide: apply cocci patch".

The treewide conversion of e.g. unuse_commit_buffer to
repo_unuse_commit_buffer is nice, but "needed" only in
its followup patch that converts logmsg_reencode as that
calls into the unuse_commit_buffer.

> Are they wrong changes (e.g. a carelessly written read_cache() to
> read_index(&the_index) conversion may munge the implementation of
> read_cache(...) { return read_index(&the_index, ...); } and make
> inifinite recursion)?  Or are they "correct but not immediately
> necessary" (e.g. because calling read_cache() does not hurt until
> that function gets removed, so rewriting the callers to call
> read_index() with &the_index may be correct but not immediately
> necessary)?

the latter. I assume correctness (of the semantic patch) to be a given,
but this is all about timing, i.e. how can I send the series without breaking
others in flight.

>
> > +      By applying the semantic patch only partially where needed, the series
> > +      is less likely to conflict with other series in flight.
>
> That is correct.
>
> > +      To make it possible to apply the semantic patch partially, there needs
> > +      to be mechanism for backwards compatibility to keep those places working
> > +      where the semantic patch is not applied. This can be done via a
> > +      wrapper function that has the exact name and signature as the function
> > +      to be changed.
>
> OK, so this argues for leaving read_cache()-like things to help
> other in-flight topics, while a change to encourage the use of
> read_index() takes place.  That also makes sense, and this directly
> relates to "less likely to conflict" benefit you mentioned above.

ok.

>
> But it is still unclear to me then what are "necessary".
>
> ... goes and thinks ...
>
> OK, so a series that allows a codepath to work on an arbitrary
> in-core istate, for example, may need to update a function to take
> istate and use it to call read_index(istate...), and the old code in
> such a call chain must have used read_cache(), always operating on
> &the_index.  For the purpose of that series, it does not matter if
> other codepaths that are not involved in the callchain the series
> wants to update are still only working on &the_index, so a change to
> turn read_cache() into read_index(&the_index) is *not* necessary
> (but still would be correct) and should be left out of the series.
> But any change the series makes to the callchain in question that
> turns read_cache() into read_index() with something call-specific
> (not &the_index) is a necesary one.  Is that a reasonable example
> of what these paragraphs wanted to convey with the distinction
> between "needed for the patch series" and other changes?

Exactly.

Maybe another way to phrase it is to explain the two series
independently of each other:

 1) Create the semantic patch series containing
 1a) - a *.pending.cocci semantic patch
 1b) - forward/backward compatibility providers
         (wrapper/defines)
 1c) then send the semantic patch series to the list

 2a) Write the other series as if (1) doesn't exist
    This means there will be some upgrades of
    the call chain from read_cache() to read_index()
 2b) Coincidentally these upgrades are the same
   as (1) would have produced. That's the whole
   trick.
 2c) send of this series independently of (1)

This can be done for read_cache / read_index as
they both exist already, but when read_index is
new to the code base, we'd need (2) to rely on (1b).

And that is why this patch sounded complicated.
SZEDER Gábor Nov. 13, 2018, 3:48 p.m. UTC | #5
On Fri, Nov 09, 2018 at 01:58:01PM -0800, Stefan Beller wrote:
> On Thu, Nov 8, 2018 at 9:18 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Are they wrong changes (e.g. a carelessly written read_cache() to
> > read_index(&the_index) conversion may munge the implementation of
> > read_cache(...) { return read_index(&the_index, ...); } and make
> > inifinite recursion)?  Or are they "correct but not immediately
> > necessary" (e.g. because calling read_cache() does not hurt until
> > that function gets removed, so rewriting the callers to call
> > read_index() with &the_index may be correct but not immediately
> > necessary)?
> 
> the latter. I assume correctness (of the semantic patch) to be a given,

I'm afraid we can't assume that.  As far as correctness is concerned,
I think semantic patches are not different from any other code we
write, i.e. they are potentially buggy.  Perhaps even more so than the
"usual" Git code, because we have long experience writing C and shell
code (with all their quirks and error-proneness), but none of us is
really an expert in writing semantic patches.

Cases in point:

  - 6afedba8c9 (object_id.cocci: match only expressions of type
    'struct object_id', 2018-10-15)
  - 279ffad17d (coccinelle: avoid wrong transformation suggestions
    from commit.cocci, 2018-04-30)
  - cd9a4b6d93 (cocci: use format keyword instead of a literal string,
    2018-01-19); though this one is probably a bug in Coccinelle
    itself
  - c2bb0c1d1e (cocci: avoid self-references in object_id
    transformations, 2016-11-01)
Junio C Hamano Nov. 14, 2018, 4:15 a.m. UTC | #6
SZEDER Gábor <szeder.dev@gmail.com> writes:

>> > inifinite recursion)?  Or are they "correct but not immediately
>> > necessary" (e.g. because calling read_cache() does not hurt until
>> > that function gets removed, so rewriting the callers to call
>> > read_index() with &the_index may be correct but not immediately
>> > necessary)?
>> 
>> the latter. I assume correctness (of the semantic patch) to be a given,
>
> I'm afraid we can't assume that.  As far as correctness is concerned,
> I think semantic patches are not different from any other code we
> write, i.e. they are potentially buggy.  Perhaps even more so than the
> "usual" Git code, because we have long experience writing C and shell
> code (with all their quirks and error-proneness), but none of us is
> really an expert in writing semantic patches.

All correct.

And applying semantic patches generated from buggy sources can
produce buggy code, just like merging a buggy topic branch does.

These days, I ran coccicheck at the tip of 'pu' (even though this
cost me quite a lot a few times every day) and feed its findings
back to authors of topics that introduce new ones, so that their
topics next time do not need the fix-up at the tip of 'pu' in the
next integration cycle.  That way, the changes mechanically
suggested by coccicheck can still be reviewed in small bite-sized
chunks and hopefully possible problems caused by buggy sources can
either be found in the review process, or discovered at the tip of
'pu'.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index b08d5ea258..e5abfe4cef 100644
--- a/Makefile
+++ b/Makefile
@@ -2739,9 +2739,12 @@  endif
 	then \
 		echo '    ' SPATCH result: $@; \
 	fi
-coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
+coccicheck: $(addsuffix .patch,$(filter-out %.pending.cocci,$(wildcard contrib/coccinelle/*.cocci)))
 
-.PHONY: coccicheck
+# See contrib/coccinelle/README
+coccicheck-pending: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.pending.cocci))
+
+.PHONY: coccicheck coccicheck-pending
 
 ### Installation rules
 
diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README
index 9c2f8879c2..fa09d1abcc 100644
--- a/contrib/coccinelle/README
+++ b/contrib/coccinelle/README
@@ -1,2 +1,62 @@ 
 This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
 semantic patches that might be useful to developers.
+
+There are two types of semantic patches:
+
+ * Using the semantic transformation to check for bad patterns in the code;
+   This is what the original target 'make coccicheck' is designed to do and
+   it is expected that any resulting patch indicates a regression.
+   The patches resulting from 'make coccicheck' are small and infrequent,
+   so once they are found, they can be sent to the mailing list as per usual.
+
+   Example for introducing new patterns:
+   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
+   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
+
+   Example of fixes using this approach:
+   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
+               a strbuf, 2018-03-25)
+   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
+
+   These types of semantic patches are usually part of testing, c.f.
+   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
+               to transform, 2018-07-23)
+
+ * Using semantic transformations in large scale refactorings throughout
+   the code base.
+
+   When applying the semantic patch into a real patch, sending it to the
+   mailing list in the usual way, such a patch would be expected to have a
+   lot of textual and semantic conflicts as such large scale refactorings
+   change function signatures that are used widely in the code base.
+   A textual conflict would arise if surrounding code near any call of such
+   function changes. A semantic conflict arises when other patch series in
+   flight introduce calls to such functions.
+
+   So to aid these large scale refactorings, semantic patches can be used,
+   using the process as follows:
+
+   1) Figure out what kind of large scale refactoring we need
+      -> This is usually done by another unrelated series.
+   2) Create the sematic patch needed for the large scale refactoring
+      and store it in contrib/coccinelle/*.pending.cocci
+      -> The suffix containing 'pending' is important to differentiate
+      this case from the other use case of checking for bad patterns.
+   3) Apply the semantic patch only partially, where needed for the patch series
+      that motivates the large scale refactoring and then build that series
+      on top of it.
+      By applying the semantic patch only partially where needed, the series
+      is less likely to conflict with other series in flight.
+      To make it possible to apply the semantic patch partially, there needs
+      to be mechanism for backwards compatibility to keep those places working
+      where the semantic patch is not applied. This can be done via a
+      wrapper function that has the exact name and signature as the function
+      to be changed.
+
+   4) Send the series as usual, including only the needed parts of the
+      large scale refactoring
+
+   Later steps (not necessarily by the original author) are to apply the
+   semantic patch in a way that do not produce lots of conflicts, for example
+   by consulting `git diff --numstat origin/master..origin/pu` and the changes
+   of the semantic patch.