[GSoC,0/3] Move generation, graph_pos to a slab

Message ID: 20200604072759.19142-1-abhishekkumar8222@gmail.com
From: Abhishek Kumar <abhishekkumar8222@gmail.com>
To: git@vger.kernel.org
Cc: stolee@gmail.com, jnareb@gmail.com
Series: Move generation, graph_pos to a slab
  [GSoC,0/3] Move generation, graph_pos to a slab
  [GSoC,1/3] commit: introduce helpers for generation slab
  [GSoC,2/3] commit: convert commit->generation to a slab
  [GSoC,3/3] commit: convert commit->graph_pos to a slab

Abhishek Kumar June 4, 2020, 7:27 a.m. UTC

The struct commit is used in many contexts. However, members generation
and graph_pos are only used for commit-graph related operations and
otherwise waste memory.

This waste would become more pronounced with the transition to
generation number v2, which uses a 64-bit generation number instead of
the current 32-bit one.
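To make the per-commit cost concrete, here is a minimal C sketch. It uses hypothetical, heavily simplified stand-ins for 'struct commit' (the member lists are illustrative, not git's actual layout) and measures how many bytes every in-memory commit pays for the two commit-graph-only members, whether or not a commit-graph is loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified stand-ins for 'struct commit'.
 * 'maybe_tree' and 'date' represent members that every commit needs.
 */
struct commit_lean {
	void *maybe_tree;
	uint64_t date;
};

struct commit_with_v1_fields {
	void *maybe_tree;
	uint64_t date;
	uint32_t graph_pos;   /* only meaningful when a commit-graph is loaded */
	uint32_t generation;  /* 32-bit generation number (v1) */
};

struct commit_with_v2_fields {
	void *maybe_tree;
	uint64_t date;
	uint32_t graph_pos;
	uint64_t generation;  /* 64-bit value, as planned for generation number v2 */
};

/* Bytes each in-memory commit pays for the commit-graph-only members. */
static size_t graph_field_overhead(size_t size_with_fields)
{
	assert(size_with_fields >= sizeof(struct commit_lean));
	return size_with_fields - sizeof(struct commit_lean);
}
```

With these toy structs on x86-64 Linux, the overhead is 8 bytes for the v1 layout and 16 bytes for the v2 layout, paid once for every commit a command parses.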

The third patch ("commit: convert commit->graph_pos to a slab",
2020-06-04) is currently failing diff-submodule related tests (t4041,
t4059 and t4060) for gcc [1]. I am going to send a second version soon,
fixing that.

[1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189

Abhishek Kumar (3):
  commit: introduce helpers for generation slab
  commit: convert commit->generation to a slab
  commit: convert commit->graph_pos to a slab

 alloc.c                             |   2 -
 blame.c                             |   2 +-
 bloom.c                             |   6 +-
 commit-graph.c                      | 116 +++++++++++++++++++++-------
 commit-graph.h                      |   8 ++
 commit-reach.c                      |  50 ++++++------
 commit.c                            |   6 +-
 commit.h                            |   6 --
 contrib/coccinelle/generation.cocci |  12 +++
 contrib/coccinelle/graph_pos.cocci  |  12 +++
 revision.c                          |  16 ++--
 11 files changed, 158 insertions(+), 78 deletions(-)
 create mode 100644 contrib/coccinelle/generation.cocci
 create mode 100644 contrib/coccinelle/graph_pos.cocci

Derrick Stolee June 4, 2020, 2:22 p.m. UTC | #1

On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> The struct commit is used in many contexts. However, members generation
> and graph_pos are only used for commit-graph related operations and
> otherwise waste memory.
> 
> This waste would become more pronounced with the transition to
> generation number v2, which uses a 64-bit generation number instead of
> the current 32-bit one.

Thanks! This is an important step, and will already improve
performance in subtle ways.

> The third patch ("commit: convert commit->graph_pos to a slab",
> 2020-06-04) is currently failing diff-submodule related tests (t4041,
> t4059 and t4060) for gcc [1]. I am going to send a second version soon,
> fixing that.
> 
> [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189
> 
> Abhishek Kumar (3):
>   commit: introduce helpers for generation slab
>   commit: convert commit->generation to a slab
>   commit: convert commit->graph_pos to a slab

If we have a commit-graph file, then we have graph_pos
and generation both coming from that file. Perhaps it
would be better to combine the data into a single slab
that stores a "struct commit_graph_data" or something?

This would change only the slab definitions, since you
already do a good job of wrapping the slab access in
methods.
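A minimal sketch of such a combined slab, in plain C so that it compiles on its own. The names 'struct commit_graph_data', commit_graph_data_at(), commit_graph_generation() and commit_graph_position() are illustrative; the side table is a simplified chunked array indexed by a per-commit 'index', loosely modeled on (not copied from) git's commit-slab.h, and the sentinel values are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinels for "not in the commit-graph" / "unknown generation". */
#define NOT_FROM_GRAPH       UINT32_MAX
#define GENERATION_INFINITY  UINT32_MAX
#define SLAB_CHUNK           1024   /* entries per chunk (illustrative) */

/* Simplified commit: only the per-object index that a slab needs. */
struct commit {
	unsigned int index;
};

/* Both values come from the same commit-graph file, so keep them together. */
struct commit_graph_data {
	uint32_t graph_pos;
	uint32_t generation;
};

/* A minimal chunked side table: chunks[index / SLAB_CHUNK][index % SLAB_CHUNK]. */
struct graph_data_slab {
	size_t nr_chunks;
	struct commit_graph_data **chunks;
};

static struct graph_data_slab slab;

/* Returns the entry for 'c', allocating and initializing it on first use. */
static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
{
	size_t chunk = c->index / SLAB_CHUNK;
	size_t off = c->index % SLAB_CHUNK;

	if (chunk >= slab.nr_chunks) {
		size_t n = chunk + 1;
		struct commit_graph_data **grown = realloc(slab.chunks, n * sizeof(*grown));

		assert(grown);
		memset(grown + slab.nr_chunks, 0, (n - slab.nr_chunks) * sizeof(*grown));
		slab.chunks = grown;
		slab.nr_chunks = n;
	}
	if (!slab.chunks[chunk]) {
		slab.chunks[chunk] = malloc(SLAB_CHUNK * sizeof(struct commit_graph_data));
		assert(slab.chunks[chunk]);
		for (size_t i = 0; i < SLAB_CHUNK; i++) {
			slab.chunks[chunk][i].graph_pos = NOT_FROM_GRAPH;
			slab.chunks[chunk][i].generation = GENERATION_INFINITY;
		}
	}
	return &slab.chunks[chunk][off];
}

/* Thin accessors: call sites never touch the slab layout directly. */
static uint32_t commit_graph_generation(const struct commit *c)
{
	return commit_graph_data_at(c)->generation;
}

static uint32_t commit_graph_position(const struct commit *c)
{
	return commit_graph_data_at(c)->graph_pos;
}
```

Because callers only go through the accessors, merging two slabs into one changes these definitions and not the call sites.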

>  alloc.c                             |   2 -
>  blame.c                             |   2 +-
>  bloom.c                             |   6 +-
>  commit-graph.c                      | 116 +++++++++++++++++++++-------
>  commit-graph.h                      |   8 ++
>  commit-reach.c                      |  50 ++++++------
>  commit.c                            |   6 +-
>  commit.h                            |   6 --
>  contrib/coccinelle/generation.cocci |  12 +++
>  contrib/coccinelle/graph_pos.cocci  |  12 +++
>  revision.c                          |  16 ++--
>  11 files changed, 158 insertions(+), 78 deletions(-)
>  create mode 100644 contrib/coccinelle/generation.cocci
>  create mode 100644 contrib/coccinelle/graph_pos.cocci

I appreciate the Coccinelle scripts to help identify
automatic fixes for other topics in-flight. However,
I wonder if they would be better placed inside the
existing commit.cocci file?

Thanks,
-Stolee

Junio C Hamano June 4, 2020, 5:55 p.m. UTC | #2

Derrick Stolee <stolee@gmail.com> writes:

> If we have a commit-graph file, then we have graph_pos
> and generation both coming from that file. Perhaps it
> would be better to combine the data into a single slab
> that stores a "struct commit_graph_data" or something?

Excellent.

Jakub Narębski June 5, 2020, 7 p.m. UTC | #3

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> The struct commit is used in many contexts. However, members generation
> and graph_pos are only used for commit-graph related operations and
> otherwise waste memory.

Very minor nitpick: this sentence would read better if the names of the
`generation` and `graph_pos` fields (especially 'generation') were
quoted.

>
> This waste would become more pronounced with the transition to
> generation number v2, which uses a 64-bit generation number instead of
> the current 32-bit one.

Good.  Moving the reachability index value into a commit slab was one of
the prerequisites for switching to generation number v2; see [2].

[2]: https://public-inbox.org/git/cfa2c367-5cd7-add5-0293-caa75b103f34@gmail.com/t/#u

The other prerequisite was proper handling of the commit-graph format
change, either by using a "metadata chunk" as a more flexible replacement
for the mishandled format version field in the commit-graph file header,
or, as proposed in [3] (and subsequent posts), by removing the "CDAT"
chunk and replacing it with a "CDA2" chunk.

[3]: https://public-inbox.org/git/xmqq369z7i1b.fsf@gitster.c.googlers.com/t/#u

Also, we should probably stop mishandling the format version field: that
is, not error out [4] when the version of the commit-graph file does not
match the version supported by the git binary running the command, but
simply not use the commit-graph (as is done for Bloom filter chunks).

[4]: https://github.com/git/git/blob/master/commit-graph.c#L253
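A hypothetical sketch of the "ignore, don't error out" behaviour for the version byte. It assumes the commit-graph header layout of a 4-byte "CGPH" signature followed by one byte each for the format version, hash version, number of chunks and number of base graphs; the function, enum and constant names are made up for illustration and are not git's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRAPH_SIGNATURE          "CGPH"
#define GRAPH_HEADER_SIZE        8
#define GRAPH_SUPPORTED_VERSION  1   /* the only version this toy reader understands */

enum graph_load_result {
	GRAPH_OK,        /* header understood: use the commit-graph */
	GRAPH_IGNORED,   /* valid signature, unknown version: skip the file */
	GRAPH_INVALID    /* not a commit-graph file at all */
};

struct graph_header {
	uint8_t version;
	uint8_t hash_version;
	uint8_t num_chunks;
	uint8_t num_base_graphs;
};

static enum graph_load_result parse_graph_header(const unsigned char *buf, size_t len,
						 struct graph_header *out)
{
	assert(buf && out);
	if (len < GRAPH_HEADER_SIZE || memcmp(buf, GRAPH_SIGNATURE, 4))
		return GRAPH_INVALID;

	out->version = buf[4];
	out->hash_version = buf[5];
	out->num_chunks = buf[6];
	out->num_base_graphs = buf[7];

	/*
	 * Rather than failing on a version we do not understand, report it so
	 * the caller can fall back to walking commits without the graph.
	 */
	if (out->version != GRAPH_SUPPORTED_VERSION)
		return GRAPH_IGNORED;
	return GRAPH_OK;
}
```

A caller would treat GRAPH_IGNORED exactly like "no commit-graph present", so a newer file written by a newer git only costs performance, not correctness.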

>
> The third patch ("commit: convert commit->graph_pos to a slab",
> 2020-06-04) is currently failing diff-submodule related tests (t4041,
> t4059 and t4060) for gcc [1]. I am going to send a second version soon,
> fixing that.
>
> [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189
>
> Abhishek Kumar (3):
>   commit: introduce helpers for generation slab
>   commit: convert commit->generation to a slab
>   commit: convert commit->graph_pos to a slab
>
>  alloc.c                             |   2 -
>  blame.c                             |   2 +-
>  bloom.c                             |   6 +-
>  commit-graph.c                      | 116 +++++++++++++++++++++-------
>  commit-graph.h                      |   8 ++
>  commit-reach.c                      |  50 ++++++------
>  commit.c                            |   6 +-
>  commit.h                            |   6 --
>  contrib/coccinelle/generation.cocci |  12 +++
>  contrib/coccinelle/graph_pos.cocci  |  12 +++

It is nice to see the use of Coccinelle scripts.

>  revision.c                          |  16 ++--
>  11 files changed, 158 insertions(+), 78 deletions(-)
>  create mode 100644 contrib/coccinelle/generation.cocci
>  create mode 100644 contrib/coccinelle/graph_pos.cocci

Best,

SZEDER Gábor June 7, 2020, 7:53 p.m. UTC | #4

On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> > The struct commit is used in many contexts. However, members generation
> > and graph_pos are only used for commit-graph related operations and
> > otherwise waste memory.
> > 
> > This waste would become more pronounced with the transition to
> > generation number v2, which uses a 64-bit generation number instead of
> > the current 32-bit one.
> 
> Thanks! This is an important step, and will already improve
> performance in subtle ways.

While the reduced memory footprint of each commit object might improve
performance, accessing graph position and generation numbers in a
commit-slab is more expensive than direct field accesses in 'struct
commit' instances.  Consequently, these patches increase the runtime
of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
repository from 0.630s to 0.940s.

> >  create mode 100644 contrib/coccinelle/generation.cocci
> >  create mode 100644 contrib/coccinelle/graph_pos.cocci
> 
> I appreciate the Coccinelle scripts to help identify
> automatic fixes for other topics in-flight. However,
> I wonder if they would be better placed inside the
> existing commit.cocci file?

We add Coccinelle scripts to avoid undesirable code patterns entering
our code base.  That, however, is not the case here: this is a
one-time conversion, and at the end of this series 'struct commit'
won't have a 'generation' field anymore, so once it's merged the
compiler will catch any new 'commit->generation' accesses.  Therefore
I don't think that these Coccinelle scripts should be added at all.

Abhishek Kumar June 8, 2020, 5:48 a.m. UTC | #5

On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> > On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> > > The struct commit is used in many contexts. However, members generation
> > > and graph_pos are only used for commit-graph related operations and
> > > otherwise waste memory.
> > > 
> > > This waste would become more pronounced with the transition to
> > > generation number v2, which uses a 64-bit generation number instead of
> > > the current 32-bit one.
> > 
> > Thanks! This is an important step, and will already improve
> > performance in subtle ways.
> 
> While the reduced memory footprint of each commit object might improve
> performance, accessing graph position and generation numbers in a
> commit-slab is more expensive than direct field accesses in 'struct
> commit' instances.  Consequently, these patches increase the runtime
> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> repository from 0.630s to 0.940s.
> 

Thank you for checking the performance. The performance penalty was
something we had discussed in [1].

Caching the commit slab results in local variables helped wonderfully in v2 [2].
For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
in the linux repository increased from 0.762s to 0.767s. Since this is a
change of <1%, it is *no longer* a performance regression in my opinion.

[1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/
[2]: https://lore.kernel.org/git/20200607193237.699335-5-abhishekkumar8222@gmail.com/
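The idea behind the v2 speed-up can be shown with a toy C example: when a slab value is loop-invariant, read it once into a local variable instead of going through the accessor on every iteration. The flat array, the generation_of() accessor and the lookup counter are hypothetical stand-ins, not the v2 patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct commit {
	unsigned int index;
};

/* Flat stand-in for a commit slab: generation numbers indexed by commit->index. */
static uint32_t generation_slab[16];
static size_t slab_lookups;   /* counts accessor calls to make the cost visible */

static uint32_t generation_of(const struct commit *c)
{
	assert(c->index < sizeof(generation_slab) / sizeof(generation_slab[0]));
	slab_lookups++;
	return generation_slab[c->index];
}

/* Naive: re-reads the target's generation on every iteration. */
static size_t count_below_naive(struct commit **list, size_t nr, const struct commit *target)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++)
		if (generation_of(list[i]) < generation_of(target))
			count++;
	return count;
}

/* Cached: look up the loop-invariant value once and keep it in a local. */
static size_t count_below_cached(struct commit **list, size_t nr, const struct commit *target)
{
	uint32_t target_gen = generation_of(target);
	size_t count = 0;

	for (size_t i = 0; i < nr; i++)
		if (generation_of(list[i]) < target_gen)
			count++;
	return count;
}
```

Both functions return the same answer; the cached one performs nr + 1 slab lookups instead of 2 * nr.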

> 
> > >  create mode 100644 contrib/coccinelle/generation.cocci
> > >  create mode 100644 contrib/coccinelle/graph_pos.cocci
> > 
> > I appreciate the Coccinelle scripts to help identify
> > automatic fixes for other topics in-flight. However,
> > I wonder if they would be better placed inside the
> > existing commit.cocci file?
> 
> We add Coccinelle scripts to avoid undesirable code patterns entering
> our code base.  That, however, is not the case here: this is a
> one-time conversion, and at the end of this series 'struct commit'
> won't have a 'generation' field anymore, so once it's merged the
> compiler will catch any new 'commit->generation' accesses.  Therefore
> I don't think that these Coccinelle scripts should be added at all.
> 

Alright, that makes sense to me. Will remove in a subsequent version.

Thanks
Abhishek

SZEDER Gábor June 8, 2020, 8:36 a.m. UTC | #6

On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> > On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> > > On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> > > > The struct commit is used in many contexts. However, members generation
> > > > and graph_pos are only used for commit-graph related operations and
> > > > otherwise waste memory.
> > > > 
> > > > This waste would become more pronounced with the transition to
> > > > generation number v2, which uses a 64-bit generation number instead of
> > > > the current 32-bit one.
> > > 
> > > Thanks! This is an important step, and will already improve
> > > performance in subtle ways.
> > 
> > While the reduced memory footprint of each commit object might improve
> > performance, accessing graph position and generation numbers in a
> > commit-slab is more expensive than direct field accesses in 'struct
> > commit' instances.  Consequently, these patches increase the runtime
> > of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> > repository from 0.630s to 0.940s.
> > 
> 
> Thank you for checking performance. Performance penalty was something we
> had discussed here [1]. 
> 
> Caching the commit slab results in local variables helped wonderfully in v2 [2].
> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
> in the linux repository increased from 0.762 to 0.767s. Since this is a
> change of <1%, it is *no longer* a performance regression in my opinion.

Interesting, I measured 0.870s with v2, still a notable increase from
0.630s.

Derrick Stolee June 8, 2020, 1:45 p.m. UTC | #7

On 6/8/2020 4:36 AM, SZEDER Gábor wrote:
> On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
>> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
>>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
>>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
>>>>> The struct commit is used in many contexts. However, members generation
>>>>> and graph_pos are only used for commit-graph related operations and
>>>>> otherwise waste memory.
>>>>>
>>>>> This waste would become more pronounced with the transition to
>>>>> generation number v2, which uses a 64-bit generation number instead of
>>>>> the current 32-bit one.
>>>>
>>>> Thanks! This is an important step, and will already improve
>>>> performance in subtle ways.
>>>
>>> While the reduced memory footprint of each commit object might improve
>>> performance, accessing graph position and generation numbers in a
>>> commit-slab is more expensive than direct field accesses in 'struct
>>> commit' instances.  Consequently, these patches increase the runtime
>>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
>>> repository from 0.630s to 0.940s.
>>>
>>
>> Thank you for checking performance. Performance penalty was something we
>> had discussed here [1]. 
>>
>> Caching the commit slab results in local variables helped wonderfully in v2 [2].
>> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
>> in the linux repository increased from 0.762 to 0.767s. Since this is a
>> change of <1%, it is *no longer* a performance regression in my opinion.
> 
> Interesting, I measured 0.870s with v2, still a notable increase from
> 0.630s.

This is an interesting point. The --is-ancestor option is critical to
the performance issue (as measured on my machine).

For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get

v2.27.0:
real    0m0.515s
user    0m0.467s
sys     0m0.048s

v2 series:
real    0m0.534s
user    0m0.481s
sys     0m0.053s

With "--is-ancestor" I see the following:

v2.27.0:
real    0m0.591s
user    0m0.539s
sys     0m0.052s

v2 series:
real    0m0.773s
user    0m0.733s
sys     0m0.040s

The --is-ancestor option [1] says

    Check if the first <commit> is an ancestor of the second
    <commit>, and exit with status 0 if true, or with status
    1 if not. Errors are signaled by a non-zero status that
    is not 1.

[1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor
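For scripting around these documented exit codes, a small POSIX C sketch: classify_is_ancestor_exit() maps the three documented outcomes, and is_ancestor() runs the command through system(). Both helper names are hypothetical, and the command line assumes the revision names contain no single quotes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

enum ancestry { IS_ANCESTOR, NOT_ANCESTOR, ANCESTRY_ERROR };

/* Map the exit codes documented for 'git merge-base --is-ancestor'. */
static enum ancestry classify_is_ancestor_exit(int exit_code)
{
	assert(exit_code >= 0);
	if (exit_code == 0)
		return IS_ANCESTOR;
	if (exit_code == 1)
		return NOT_ANCESTOR;
	return ANCESTRY_ERROR;   /* any other non-zero status signals an error */
}

/* Hypothetical helper; assumes 'a' and 'b' contain no single quotes. */
static enum ancestry is_ancestor(const char *a, const char *b)
{
	char cmd[512];
	int n = snprintf(cmd, sizeof(cmd), "git merge-base --is-ancestor '%s' '%s'", a, b);
	int status;

	if (n < 0 || (size_t)n >= sizeof(cmd))
		return ANCESTRY_ERROR;
	status = system(cmd);
	if (status == -1 || !WIFEXITED(status))
		return ANCESTRY_ERROR;
	return classify_is_ancestor_exit(WEXITSTATUS(status));
}
```

Treating every status other than 0 and 1 as an error matters here: a shell that cannot find git, for example, exits with 127, which must not be read as "not an ancestor".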

This _should_ be faster than "git branch --contains HEAD~50000",
but it is much much slower:

$ time git branch --contains HEAD~50000
real    0m0.068s
user    0m0.061s
sys     0m0.008s

So, there is definitely something going on that slows the
"--is-ancestor" path in this case. But the solution is not
to halt the current patch (which likely has memory footprint
benefits when dealing with a lot of tree and blob objects),
but instead to fix the underlying algorithm.

Let's add that to the list of things to do.
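One ingredient that can make reachability queries fast is the generation-number cutoff: with generation = 1 + the maximum generation of the parents, a commit A can reach B only if gen(A) >= gen(B), so a walk never needs to descend below the target's generation. Below is a simplified sketch of such a pruned walk over a toy commit struct; it illustrates the technique and is not git's commit-reach.c or ref-filter code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2
#define MAX_COMMITS 64

struct commit {
	uint32_t generation;   /* 1 + max(parent generations); roots have 1 */
	size_t nr_parents;
	struct commit *parents[MAX_PARENTS];
	bool seen;
};

/* Returns true if 'target' is reachable from 'tip'; counts visited commits. */
static bool can_reach(struct commit *tip, const struct commit *target, size_t *visited)
{
	struct commit *stack[MAX_COMMITS];
	size_t nr = 0;

	*visited = 0;
	/* A commit cannot reach anything with a larger generation than its own. */
	if (tip->generation < target->generation)
		return false;

	tip->seen = true;
	stack[nr++] = tip;
	while (nr) {
		struct commit *c = stack[--nr];

		(*visited)++;
		if (c == target)
			return true;
		for (size_t i = 0; i < c->nr_parents; i++) {
			struct commit *p = c->parents[i];

			/* Prune: neither p nor its ancestors can be or reach 'target'. */
			if (p->seen || p->generation < target->generation)
				continue;
			assert(nr < MAX_COMMITS);
			p->seen = true;
			stack[nr++] = p;
		}
	}
	return false;
}
```

This toy version leaves 'seen' set after a walk, so the flags must be cleared before the same commits are walked again.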

>>>  create mode 100644 contrib/coccinelle/generation.cocci
>>>  create mode 100644 contrib/coccinelle/graph_pos.cocci
>>
>> I appreciate the Coccinelle scripts to help identify
>> automatic fixes for other topics in-flight. However,
>> I wonder if they would be better placed inside the
>> existing commit.cocci file?
>
> We add Coccinelle scripts to avoid undesirable code patterns entering
> our code base.  That, however, is not the case here: this is a
> one-time conversion, and at the end of this series 'struct commit'
> won't have a 'generation' field anymore, so once it's merged the
> compiler will catch any new 'commit->generation' accesses.  Therefore
> I don't think that these Coccinelle scripts should be added at all.

I disagree. We _also_ add Coccinelle scripts when doing one-time
refactors to avoid logical merge conflicts with other topics in
flight. If someone else is working on a parallel topic that adds
references to the graph_pos or generation members, then the scripts provide
an easy way for the maintainer to update those references in the merge
commit. Alternatively, the contributor could rebase on top of this
series and run the scripts themselves to fix their patches before
submission.

For example, this was done carefully in the sha->object_id
conversion using contrib/coccinelle/object_id.cocci.

Thanks,
-Stolee

Jakub Narębski June 8, 2020, 3:21 p.m. UTC | #8

SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
>> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
>>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
>>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:

>>>>> The struct commit is used in many contexts. However, members generation
>>>>> and graph_pos are only used for commit-graph related operations and
>>>>> otherwise waste memory.
>>>>> 
>>>>> This waste would become more pronounced with the transition to
>>>>> generation number v2, which uses a 64-bit generation number instead of
>>>>> the current 32-bit one.
>>>> 
>>>> Thanks! This is an important step, and will already improve
>>>> performance in subtle ways.
>>> 
>>> While the reduced memory footprint of each commit object might improve
>>> performance, accessing graph position and generation numbers in a
>>> commit-slab is more expensive than direct field accesses in 'struct
>>> commit' instances.  Consequently, these patches increase the runtime
>>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
>>> repository from 0.630s to 0.940s. 
>> 
>> Thank you for checking performance. Performance penalty was something we
>> had discussed here [1]. 
>> 
>> Caching the commit slab results in local variables helped wonderfully in v2 [2].
>> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
>> in the linux repository increased from 0.762 to 0.767s. Since this is a
>> change of <1%, it is *no longer* a performance regression in my opinion.
>>
>> [1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/
>> [2]: https://lore.kernel.org/git/20200607193237.699335-5-abhishekkumar8222@gmail.com/
>
> Interesting, I measured 0.870s with v2, still a notable increase from
> 0.630s [a change of +38%].

I wonder what might be the cause of this difference.  Is it a difference
in hardware (faster memory, larger CPU cache?), a difference in operating
system, or a difference in the position of HEAD?

On one hand it is a large relative difference.  On the other hand it is
an almost unnoticeable absolute difference of 0.24s.
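For reference, the two ways of reading these measurements work out as follows (numbers taken from SZEDER's and Abhishek's messages above):

```latex
\Delta_{\text{abs}} = 0.870\,\mathrm{s} - 0.630\,\mathrm{s} = 0.240\,\mathrm{s},
\qquad
\Delta_{\text{rel}} = \frac{0.870 - 0.630}{0.630} \approx 0.381 \;(+38\%)

\Delta_{\text{rel}}^{\text{Abhishek}} = \frac{0.767 - 0.762}{0.762} \approx 0.0066 \;(+0.66\%)
```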


I also wonder how the performance changes (with moving commit-graph data
to the slab) for commands that do not use this data, like e.g.:

  $ git -c core.commitGraph=false merge-base --is-ancestor HEAD~50000 HEAD

or

  $ git gc


Sidenote: I think the performance changes should be mentioned at least
in the cover letter for the series, if not in commit message(s).

Best,

SZEDER Gábor June 8, 2020, 4:46 p.m. UTC | #9

On Mon, Jun 08, 2020 at 09:45:12AM -0400, Derrick Stolee wrote:
> On 6/8/2020 4:36 AM, SZEDER Gábor wrote:
> > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
> >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> >>>>> The struct commit is used in many contexts. However, members generation
> >>>>> and graph_pos are only used for commit-graph related operations and
> >>>>> otherwise waste memory.
> >>>>>
> >>>>> This waste would become more pronounced with the transition to
> >>>>> generation number v2, which uses a 64-bit generation number instead of
> >>>>> the current 32-bit one.
> >>>>
> >>>> Thanks! This is an important step, and will already improve
> >>>> performance in subtle ways.
> >>>
> >>> While the reduced memory footprint of each commit object might improve
> >>> performance, accessing graph position and generation numbers in a
> >>> commit-slab is more expensive than direct field accesses in 'struct
> >>> commit' instances.  Consequently, these patches increase the runtime
> >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> >>> repository from 0.630s to 0.940s.
> >>>
> >>
> >> Thank you for checking performance. Performance penalty was something we
> >> had discussed here [1]. 
> >>
> >> Caching the commit slab results in local variables helped wonderfully in v2 [2].
> >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
> >> in the linux repository increased from 0.762 to 0.767s. Since this is a
> >> change of <1%, it is *no longer* a performance regression in my opinion.
> > 
> > Interesting, I measured 0.870s with v2, still a notable increase from
> > 0.630s.
> 
> This is an interesting point. The --is-ancestor is critical to the
> performance issue (as measured on my machine).
> 
> For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get
> 
> v2.27.0:
> real    0m0.515s
> user    0m0.467s
> sys     0m0.048s
> 
> v2 series:
> real    0m0.534s
> user    0m0.481s
> sys     0m0.053s

I, too, see similarly small differences in this case.

> With "--is-ancestor" I see the following:
> 
> v2.27.0:
> real    0m0.591s
> user    0m0.539s
> sys     0m0.052s
> 
> v2 series:
> real    0m0.773s
> user    0m0.733s
> sys     0m0.040s
> 
> The --is-ancestor option [1] says
> 
>     Check if the first <commit> is an ancestor of the second
>     <commit>, and exit with status 0 if true, or with status
>     1 if not. Errors are signaled by a non-zero status that
>     is not 1.
> 
> [1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor
> 
> This _should_ be faster than "git branch --contains HEAD~50000",
> but it is much much slower:
> 
> $ time git branch --contains HEAD~50000
> real    0m0.068s
> user    0m0.061s
> sys     0m0.008s
> 
> So, there is definitely something going on that slows the
> "--is-ancestor" path in this case. But, the solution is not
> to halt the current patch (which likely has memory footprint
> benefits when dealing with a lot of tree and blob objects)
> and instead fix the underlying algorithm.

Other, more common cases are affected as well, notably the simple 'git
rev-list --topo-order':

  performance: 1.226479734 s: git command: /home/szeder/src/git/BUILDS/v2.27.0/bin/git rev-list --topo-order HEAD
  max RSS: 162400k
  
  performance: 1.741309536 s: git command: /home/szeder/src/git/git rev-list --topo-order HEAD
  max RSS: 169556k
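Relative to v2.27.0, the 'rev-list --topo-order' figures above work out as follows (arithmetic on the reported numbers only):

```latex
\frac{1.741309536 - 1.226479734}{1.226479734} \approx 0.420 \;(+42\%\ \text{runtime})
\qquad
\frac{169556 - 162400}{162400} \approx 0.044 \;(+4.4\%\ \text{max RSS})
```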

Is the supposed memory footprint reduction large enough to justify this
runtime increase?

> Let's add that to the list of things to do.

And to the commit messages.

> >>>  create mode 100644 contrib/coccinelle/generation.cocci
> >>>  create mode 100644 contrib/coccinelle/graph_pos.cocci
> >>
> >> I appreciate the Coccinelle scripts to help identify
> >> automatic fixes for other topics in-flight. However,
> >> I wonder if they would be better placed inside the
> >> existing commit.cocci file?
> >
> > We add Coccinelle scripts to avoid undesirable code patterns entering
> > our code base.  That, however, is not the case here: this is a
> > one-time conversion, and at the end of this series 'struct commit'
> > won't have a 'generation' field anymore, so once it's merged the
> > compiler will catch any new 'commit->generation' accesses.  Therefore
> > I don't think that these Coccinelle scripts should be added at all.
> 
> I disagree. We _also_ add Coccinelle scripts when doing one-time
> refactors to avoid logical merge conflicts with other topics in
> flight. If someone else is working on a parallel topic that adds
> references to graph_pos or generation member, then the scripts provide
> an easy way for the maintainer to update those references in the merge
> commit. Alternatively, the contributor could rebase on top of this
> series and run the scripts themselves to fix their patches before
> submission.
> 
> For example, this was done carefully in the sha->object_id
> conversion using contrib/coccinelle/object_id.cocci.

'object_id.cocci' is not about sha->object_id conversions, but about
avoiding undesirable code patterns, e.g. we prefer oideq() over
!oidcmp(), and the compiler, of course, can't help to catch that.
Coccinelle scripts used for actual sha->object_id transformations were
not added to 'object_id.cocci', but were recorded only in the commit
messages for reference, see e.g.  9b56149996 (merge-recursive: convert
struct merge_file_info to object_id, 2016-06-24) and a couple of its
ancestors.
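To illustrate the kind of pattern object_id.cocci targets, here is a simplified C sketch of the two helpers. The struct holds only a 20-byte SHA-1 hash (git's real 'struct object_id' differs), and these definitions are stand-ins, not git's implementation: oidcmp() gives a three-way ordering like memcmp(), while oideq() answers only "equal or not", which is what !oidcmp() call sites actually mean.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define OID_RAWSZ 20   /* simplification: SHA-1 only */

struct object_id {
	unsigned char hash[OID_RAWSZ];
};

/* Three-way comparison, memcmp()-style: <0, 0 or >0. */
static int oidcmp(const struct object_id *a, const struct object_id *b)
{
	assert(a && b);
	return memcmp(a->hash, b->hash, OID_RAWSZ);
}

/* Equality only: states what '!oidcmp(a, b)' call sites mean. */
static bool oideq(const struct object_id *a, const struct object_id *b)
{
	assert(a && b);
	return memcmp(a->hash, b->hash, OID_RAWSZ) == 0;
}
```

Both `if (!oidcmp(&a, &b))` and `if (oideq(&a, &b))` compile and behave the same, which is exactly why the compiler cannot flag the former; a semantic patch can rewrite it mechanically.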
