diff mbox series

[v2,18/24] pack-bitmap-write: build fewer intermediate bitmaps

Message ID 5262daa3300114fbaccdbc7393882c5435f95f4f.1605649533.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: bitmap generation improvements | expand

Commit Message

Taylor Blau Nov. 17, 2020, 9:48 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The bitmap_writer_build() method calls bitmap_builder_init() to
construct a list of commits reachable from the selected commits along
with a "reverse graph". This reverse graph has edges pointing from a
commit to other commits that can reach that commit. After computing a
reachability bitmap for a commit, the values in that bitmap are then
copied to the reachability bitmaps across the edges in the reverse
graph.

We can now relax the role of the reverse graph to greatly reduce the
number of intermediate reachability bitmaps we compute during this
reverse walk. The end result is that we walk objects the same number of
times as before when constructing the reachability bitmaps, but we also
spend much less time copying bits between bitmaps and have much lower
memory pressure in the process.

The core idea is to select a set of "important" commits based on
interactions among the sets of commits reachable from each selected commit.

The first technical concept is to create a new 'commit_mask' member in the
bb_commit struct. Note that the selected commits are provided in an
ordered array. The first thing to do is to mark the ith bit in the
commit_mask for the ith selected commit. As we walk the commit-graph, we
copy the bits in a commit's commit_mask to its parents. At the end of
the walk, the ith bit in the commit_mask for a commit C stores a boolean
representing "The ith selected commit can reach C."

As we walk, we will discover non-selected commits that are important. We
will get into this later, but those important commits must also receive
bit positions, growing the width of the bitmasks as we walk. At the true
end of the walk, the ith bit means "the ith _important_ commit can reach
C."

MAXIMAL COMMITS
---------------

We use a new 'maximal' bit in the bb_commit struct to represent whether
a commit is important or not. The term "maximal" comes from the
partially-ordered set of commits in the commit-graph where C >= P if P
is a parent of C, and then extending the relationship transitively.
Instead of taking the maximal commits across the entire commit-graph, we
instead focus on selecting each commit that is maximal among commits
with the same bits on in their commit_mask. This definition is
important, so let's consider an example.

Suppose we have three selected commits A, B, and C. These are assigned
bitmasks 100, 010, and 001 to start. Each of these can be marked as
maximal immediately because they each will be the uniquely maximal
commit that contains their own bit. Keep in mind that that these commits
may have different bitmasks after the walk; for example, if B can reach
C but A cannot, then the final bitmask for C is 011. Even in these
cases, C would still be a maximal commit among all commits with the
third bit on in their masks.

Now define sets X, Y, and Z to be the sets of commits reachable from A,
B, and C, respectively. The intersections of these sets correspond to
different bitmasks:

 * 100: X - (Y union Z)
 * 010: Y - (X union Z)
 * 001: Z - (X union Y)
 * 110: (X intersect Y) - Z
 * 101: (X intersect Z) - Y
 * 011: (Y intersect Z) - X
 * 111: X intersect Y intersect Z

This can be visualized with the following Hasse diagram:

	100    010    001
         | \  /   \  / |
         |  \/     \/  |
         |  /\     /\  |
         | /  \   /  \ |
        110    101    011
          \___  |  ___/
              \ | /
               111

Some of these bitmasks may not be represented, depending on the topology
of the commit-graph. In fact, we are counting on it, since the number of
possible bitmasks is exponential in the number of selected commits, but
is also limited by the total number of commits. In practice, very few
bitmasks are possible because most commits converge on a common "trunk"
in the commit history.

With this three-bit example, we wish to find commits that are maximal
for each bitmask. How can we identify this as we are walking?

As we walk, we visit a commit C. Since we are walking the commits in
topo-order, we know that C is visited after all of its children are
visited. Thus, when we get C from the revision walk we inspect the
'maximal' property of its bb_data and use that to determine if C is truly
important. Its commit_mask is also nearly final. If C is not one of the
originally-selected commits, then assign a bit position to C (by
incrementing num_maximal) and set that bit on in commit_mask. See
"MULTIPLE MAXIMAL COMMITS" below for more detail on this.

Now that the commit C is known to be maximal or not, consider each
parent P of C. Compute two new values:

 * c_not_p : true if and only if the commit_mask for C contains a bit
             that is not contained in the commit_mask for P.

 * p_not_c : true if and only if the commit_mask for P contains a bit
             that is not contained in the commit_mask for P.

If c_not_p is false, then P already has all of the bits that C would
provide to its commit_mask. In this case, move on to other parents as C
has nothing to contribute to P's state that was not already provided by
other children of P.

We continue with the case that c_not_p is true. This means there are
bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or()
to add those bits.

If p_not_c is also true, then set the maximal bit for P to one. This means
that if no other commit has P as a parent, then P is definitely maximal.
This is because no child had the same bitmask. It is important to think
about the maximal bit for P at this point as a temporary state: "P is
maximal based on current information."

In contrast, if p_not_c is false, then set the maximal bit for P to
zero. Further, clear all reverse_edges for P since any edges that were
previously assigned to P are no longer important. P will gain all
reverse edges based on C.

The final thing we need to do is to update the reverse edges for P.
These reverse edges respresent "which closest maximal commits
contributed bits to my commit_mask?" Since C contributed bits to P's
commit_mask in this case, C must add to the reverse edges of P.

If C is maximal, then C is a 'closest' maximal commit that contributed
bits to P. Add C to P's reverse_edges list.

Otherwise, C has a list of maximal commits that contributed bits to its
bitmask (and this list is exactly one element). Add all of these items
to P's reverse_edges list. Be careful to ignore duplicates here.

After inspecting all parents P for a commit C, we can clear the
commit_mask for C. This reduces the memory load to be limited to the
"width" of the commit graph.

Consider our ABC/XYZ example from earlier and let's inspect the state of
the commits for an interesting bitmask, say 011. Suppose that D is the
only maximal commit with this bitmask (in the first three bits). All
other commits with bitmask 011 have D as the only entry in their
reverse_edges list. D's reverse_edges list contains B and C.

COMPUTING REACHABILITY BITMAPS
------------------------------

Now that we have our definition, let's zoom out and consider what
happens with our new reverse graph when computing reachability bitmaps.
We walk the reverse graph in reverse-topo-order, so we visit commits
with largest commit_masks first. After we compute the reachability
bitmap for a commit C, we push the bits in that bitmap to each commit D
in the reverse edge list for C. Then, when we finally visit D we already
have the bits for everything reachable from maximal commits that D can
reach and we only need to walk the objects in the set-difference.

In our ABC/XYZ example, when we finally walk for the commit A we only
need to walk commits with bitmask equal to A's bitmask. If that bitmask
is 100, then we are only walking commits in X - (Y union Z) because the
bitmap already contains the bits for objects reachable from (X intersect
Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps
for the maximal commits with bitmasks 110 and 101).

The behavior is intended to walk each commit (and the trees that commit
introduces) at most once while allocating and copying fewer reachability
bitmaps. There is one caveat: what happens when there are multiple
maximal commits with the same bitmask, with respect to the initial set
of selected commits?

MULTIPLE MAXIMAL COMMITS
------------------------

Earlier, we mentioned that when we discover a new maximal commit, we
assign a new bit position to that commit and set that bit position to
one for that commit. This is absolutely important for interesting
commit-graphs such as git/git and torvalds/linux. The reason is due to
the existence of "butterflies" in the commit-graph partial order.

Here is an example of four commits forming a butterfly:

   I    J
   |\  /|
   | \/ |
   | /\ |
   |/  \|
   M    N
    \  /
     |/
     Q

Here, I and J both have parents M and N. In general, these do not need
to be exact parent relationships, but reachability relationships. The
most important part is that M and N cannot reach each other, so they are
independent in the partial order. If I had commit_mask 10 and J had
commit_mask 01, then M and N would both be assigned commit_mask 11 and
be maximal commits with the bitmask 11. Then, what happens when M and N
can both reach a commit Q? If Q is also assigned the bitmask 11, then it
is not maximal but is reachable from both M and N.

While this is not necessarily a deal-breaker for our abstract definition
of finding maximal commits according to a given bitmask, we have a few
issues that can come up in our larger picture of constructing
reachability bitmaps.

In particular, if we do not also consider Q to be a "maximal" commit,
then we will walk commits reachable from Q twice: once when computing
the reachability bitmap for M and another time when computing the
reachability bitmap for N. This becomes much worse if the topology
continues this pattern with multiple butterflies.

The solution has already been mentioned: each of M and N are assigned
their own bits to the bitmask and hence they become uniquely maximal for
their bitmasks. Finally, Q also becomes maximal and thus we do not need
to walk its commits multiple times. The final bitmasks for these commits
are as follows:

  I:10       J:01
   |\        /|
   | \ _____/ |
   | /\____   |
   |/      \  |
   M:111    N:1101
        \  /
       Q:1111

Further, Q's reverse edge list is { M, N }, while M and N both have
reverse edge list { I, J }.

PERFORMANCE MEASUREMENTS
------------------------

Now that we've spent a LOT of time on the theory of this algorithm,
let's show that this is actually worth all that effort.

To test the performance, use GIT_TRACE2_PERF=1 when running
'git repack -abd' in a repository with no existing reachability bitmaps.
This avoids any issues with keeping existing bitmaps to skew the
numbers.

Inspect the "building_bitmaps_total" region in the trace2 output to
focus on the portion of work that is affected by this change. Here are
the performance comparisons for a few repositories. The timings are for
the following versions of Git: "multi" is the timing from before any
reverse graph is constructed, where we might perform multiple
traversals. "reverse" is for the previous change where the reverse graph
has every reachable commit.  Finally "maximal" is the version introduced
here where the reverse graph only contains the maximal commits.

      Repository: git/git
           multi: 2.628 sec
         reverse: 2.344 sec
         maximal: 2.047 sec

      Repository: torvalds/linux
           multi: 64.7 sec
         reverse: 205.3 sec
         maximal: 44.7 sec

So in all cases we've not only recovered any time lost to switching to
the reverse-edge algorithm, but we come out ahead of "multi" in all
cases. Likewise, peak heap has gone back to something reasonable:

      Repository: torvalds/linux
           multi: 2.087 GB
         reverse: 3.141 GB
         maximal: 2.288 GB

While I do not have access to full fork networks on GitHub, Peff has run
this algorithm on the chromium/chromium fork network and reported a
change from 3 hours to ~233 seconds. That network is particularly
beneficial for this approach because it has a long, linear history along
with many tags. The "multi" approach was obviously quadratic and the new
approach is linear.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c     | 72 +++++++++++++++++++++++++++++++---
 t/t5310-pack-bitmaps.sh | 87 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 149 insertions(+), 10 deletions(-)

Comments

Jonathan Tan Nov. 24, 2020, 6:07 a.m. UTC | #1
I think this is the "big patch" mentioned in IRC [1]? I'll review just
the commit message of this one first, then go back and review from patch
10 ("pack-bitmap-write: reimplement bitmap writing") up to this
one inclusive.

[1] https://colabti.org/irclogger/irclogger_log/git-devel?date=2020-11-23

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The bitmap_writer_build() method calls bitmap_builder_init() to
> construct a list of commits reachable from the selected commits along
> with a "reverse graph". This reverse graph has edges pointing from a
> commit to other commits that can reach that commit. After computing a
> reachability bitmap for a commit, the values in that bitmap are then
> copied to the reachability bitmaps across the edges in the reverse
> graph.
> 
> We can now relax the role of the reverse graph to greatly reduce the
> number of intermediate reachability bitmaps we compute during this
> reverse walk. The end result is that we walk objects the same number of
> times as before when constructing the reachability bitmaps, but we also
> spend much less time copying bits between bitmaps and have much lower
> memory pressure in the process.

OK - as I have seen in the previous patches, and as said here, the edges
of the graph were previously parent to immediate child, but I believe
that patch 12 ("pack-bitmap-write: fill bitmap with commit history")
makes it so that the edges don't need to be direct parent-child
relationships. That patch does not, however, decide what the vertices
and edges should thus be, so that ability could not be used. This patch
seems to do so, and thus makes use of that ability.

> The core idea is to select a set of "important" commits based on
> interactions among the sets of commits reachable from each selected commit.

Makes sense.

> The first technical concept is to create a new 'commit_mask' member in the
> bb_commit struct. Note that the selected commits are provided in an
> ordered array. The first thing to do is to mark the ith bit in the
> commit_mask for the ith selected commit. 

OK - so this commit_mask is like the bitmaps in Git. There is an array,
initially populated with the list of selected commits (which are
selected using another algorithm, which is a separate concern from this
patch set), and each bit in commit_mask corresponds to the corresponding
entry in that array.

From this, I assume that the commit_mask values in the selected
bb_commit structs will start with a nonzero value (written in binary,
having 1 bit set), and the other commit_mask values will start with
zero.

> As we walk the commit-graph, we
> copy the bits in a commit's commit_mask to its parents. At the end of
> the walk, the ith bit in the commit_mask for a commit C stores a boolean
> representing "The ith selected commit can reach C."

The walk is done in topological order - visiting children before
parents. Copying makes sense - if a commit can reach me, it can reach my
parents as well.

> As we walk, we will discover non-selected commits that are important. We
> will get into this later, but those important commits must also receive
> bit positions, growing the width of the bitmasks as we walk. At the true
> end of the walk, the ith bit means "the ith _important_ commit can reach
> C."

OK - so the initial array, initially populated with the list of selected
commits, can be grown and will include other important commits as well.
This is similar to the bitmap revwalk algorithm - the bitmaps in that
algorithm can be grown to include other objects as well.

> MAXIMAL COMMITS
> ---------------
> 
> We use a new 'maximal' bit in the bb_commit struct to represent whether
> a commit is important or not. The term "maximal" comes from the
> partially-ordered set of commits in the commit-graph where C >= P if P
> is a parent of C, and then extending the relationship transitively.

I had to look up what "maximal" means in a partial order. :-P An element
of a partially ordered set is "maximal" if there is no other element
that is "greater" than it. Here, all descendants are "greater" than
their ancestors.

> Instead of taking the maximal commits across the entire commit-graph, 

I was wondering about this :-)

> we
> instead focus on selecting each commit that is maximal among commits
> with the same bits on in their commit_mask. 

Ah, OK. Two commits will have the same commit_mask if the exact same set
of important commits can reach them.

> This definition is
> important, so let's consider an example.
> 
> Suppose we have three selected commits A, B, and C. These are assigned
> bitmasks 100, 010, and 001 to start. Each of these can be marked as
> maximal immediately because they each will be the uniquely maximal
> commit that contains their own bit. 

That is correct. To further elaborate on this explanation, let's say we
have a selected commit C (and since it is selected, the commit_mask in
each commit will have a bit corresponding to whether C can reach it).
Each other commit is either an ancestor, a descendant, or unrelated.

 - C cannot reach descendants.
 - C cannot reach unrelated commits.
 - C can reach all ancestors, but in the partial order, C compares
   "greater" to them anyway.

So every other commit cannot affect C's maximal status.

> Keep in mind that that these commits
> may have different bitmasks after the walk; for example, if B can reach
> C but A cannot, then the final bitmask for C is 011. Even in these
> cases, C would still be a maximal commit among all commits with the
> third bit on in their masks.

Yes.

> Now define sets X, Y, and Z to be the sets of commits reachable from A,
> B, and C, respectively. The intersections of these sets correspond to
> different bitmasks:
> 
>  * 100: X - (Y union Z)
>  * 010: Y - (X union Z)
>  * 001: Z - (X union Y)
>  * 110: (X intersect Y) - Z
>  * 101: (X intersect Z) - Y
>  * 011: (Y intersect Z) - X
>  * 111: X intersect Y intersect Z
> 
> This can be visualized with the following Hasse diagram:
> 
> 	100    010    001
>          | \  /   \  / |
>          |  \/     \/  |
>          |  /\     /\  |
>          | /  \   /  \ |
>         110    101    011
>           \___  |  ___/
>               \ | /
>                111
> 
> Some of these bitmasks may not be represented, depending on the topology
> of the commit-graph. In fact, we are counting on it, since the number of
> possible bitmasks is exponential in the number of selected commits, but
> is also limited by the total number of commits. In practice, very few
> bitmasks are possible because most commits converge on a common "trunk"
> in the commit history.

This section wasn't very useful to me - but I would appreciate it if
others chimed in to say it was useful to them.

> With this three-bit example, we wish to find commits that are maximal
> for each bitmask. How can we identify this as we are walking?

OK - now we come to the algorithm. I presume the algorithm doesn't only
find commits that are maximal for each bitmask, but also updates the
list of important commits (and thus increasing the size of the bitmask)?
Reading below, I see that the answer to my question is yes. Ah...it
wasn't clear to me that the purpose of finding the maximal commits is
also to add to the list of important commits, but perhaps it will be
obvious to other reasons.

I'll work through the algorithm using the butterfly example below,
reproduced here:

>    I    J
>    |\  /|
>    | \/ |
>    | /\ |
>    |/  \|
>    M    N
>     \  /
>      |/
>      Q

I was going to suggest that we suppose that there are no selected
commits, but it looks like the algorithm would optimize itself out
(meaning that it won't make any commit maximal - which makes sense, I
guess). The example below had I and J as selected commits (which I know
because the commit_mask values for I and J are "0b10" and "0b01"
respectively), so let's go with that.

> As we walk, we visit a commit C. Since we are walking the commits in
> topo-order, we know that C is visited after all of its children are
> visited. Thus, when we get C from the revision walk we inspect the
> 'maximal' property of its bb_data and use that to determine if C is truly
> important. Its commit_mask is also nearly final. 

OK - when a commit is visited, we would already know its "maximal"
status because when we had visited its parents, we already modified
"maximal" (because we update a commit's children when we visit it -
details about this are to follow, presumably).

> If C is not one of the
> originally-selected commits, then assign a bit position to C (by
> incrementing num_maximal) and set that bit on in commit_mask. See
> "MULTIPLE MAXIMAL COMMITS" below for more detail on this.

I presume we only assign a bit position to C if it is "maximal"?

> Now that the commit C is known to be maximal or not, consider each
> parent P of C. Compute two new values:
> 
>  * c_not_p : true if and only if the commit_mask for C contains a bit
>              that is not contained in the commit_mask for P.
> 
>  * p_not_c : true if and only if the commit_mask for P contains a bit
>              that is not contained in the commit_mask for P.

OK, let's try this with I. I'll use the same <commit
letter>:<commit_mask in little-endian order> notation as the one the
commit author uses below to indicate the commit_mask of a commit. We
have I:10 with 2 parents M:00 and N:00, so for both parents, c_not_p is
true and p_not_c is false.

> If c_not_p is false, then P already has all of the bits that C would
> provide to its commit_mask. In this case, move on to other parents as C
> has nothing to contribute to P's state that was not already provided by
> other children of P.

To emphasize, we "move on" regardless of what p_not_c is. In our
example, this is not true in I's case, so let's read on.

After the analysis below, I see why we can "move on".

> We continue with the case that c_not_p is true. This means there are
> bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or()
> to add those bits.

OK. So we have I:10 (unchanged), M:10, N:10. Which as I said above,
makes sense, since if a commit can reach I, it can reach M and N.

> If p_not_c is also true, then set the maximal bit for P to one. This means
> that if no other commit has P as a parent, then P is definitely maximal.
> This is because no child had the same bitmask. It is important to think
> about the maximal bit for P at this point as a temporary state: "P is
> maximal based on current information."
> 
> In contrast, if p_not_c is false, then set the maximal bit for P to
> zero.

At first it was confusing that (1) the last maximal bit is used when
there is no guarantee that the children are iterated in any particular
order, and (2) the maximal bit is never updated when c_not_p is false.
So I was thinking of a counterexample, but couldn't think of one. So
this algorithm looks correct so far.

For (2), I think of it this way:

    C1 C2 C3
      \ |/
        P

Let's say that C1 has a commit_mask, and C2 and C3 have the exact same
commit_mask. P's maximal bit will be the exact same in the following
situation:

    C1 C2
      \ |
        P

So skipping over C3 is correct. And C3 must be skipped because the
algorithm as written is not idempotent (during the C2 iteration,
commit_mask of P is updated).

For (1), I think of it as follows. The only one that counts is the very
last calculation (which you can see from the algorithm - the maximal bit
is constantly being overridden). During the very last iteration, does P
have any information that Cx does not? If yes, P has a commit_mask that
is unique w.r.t. all its children (since it combines unique information
from its other children that Cx does not, plus some other unique
information that Cx has and its other children do not) and is
therefore maximal. If not, all the information P got is also known by
Cx, so it is definitely not maximal (Cx shares the commit_mask with P,
and is "greater" than P).

> Further, clear all reverse_edges for P since any edges that were
> previously assigned to P are no longer important. P will gain all
> reverse edges based on C.

(I read ahead to see what the reverse edges are.) Indeed, C has all the
information.

> The final thing we need to do is to update the reverse edges for P.
> These reverse edges respresent "which closest maximal commits
> contributed bits to my commit_mask?" Since C contributed bits to P's
> commit_mask in this case, C must add to the reverse edges of P.
> 
> If C is maximal, then C is a 'closest' maximal commit that contributed
> bits to P. Add C to P's reverse_edges list.
> 
> Otherwise, C has a list of maximal commits that contributed bits to its
> bitmask (and this list is exactly one element). Add all of these items
> to P's reverse_edges list. Be careful to ignore duplicates here.

OK - the other end of reverse edges are always to maximal commits.
Propagation in this way (if C is not maximal) makes sense.

> After inspecting all parents P for a commit C, we can clear the
> commit_mask for C. This reduces the memory load to be limited to the
> "width" of the commit graph.

Optimization - OK.

I might as well finish working through the example. Let's start from the
beginning.

    I    J I:10()M J:01()M
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N
     \  /
      |/
      Q

(Brackets are the destinations of reverse edges. M is maximal, ~M is not
maximal.)

Iteration starts with I. I is maximal, but it is also one of the
selected commits, so we need not do anything else. Look at its parents,
starting with M. c_of_p is true, so copy the bits over. p_of_c is false,
so the maximal bit of M is zero, clear all its reverse edges (no-op in
this case, since it has none), and update its reverse edges: C is
maximal so it will just be C. The procedure for N is exactly the same.
So we have:

    I    J I:10()M J:01()M
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N M:10(I)~M N:10(I)~M
     \  /
      |/
      Q

Now onto J. J is maximal, but it is also one of the selected commits, so
we need not do anything else. Look at its parent M. c_of_p is true, so
copy the bits over. p_of_c is true this time. So set the maximal bit to
true. (Indeed, M has information that is independent of J - the stuff
that it got from I.) We need not clear any reverse edges, but must still
update them: J is maximal so we add J. The procedure for N is exactly
the same. So we have:

    I    J I:10()M J:01()M
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N M:11(I,J)M N:11(I,J)M
     \  /
      |/
      Q

Let's go to M. M is maximal, and it is not one of the selected commits,
so widen the commit_mask and set the corresponding bit on M. Look at its
only parent Q. c_of_p is true, so copy the bits over. p_of_c is false,
so the maximal bit of Q is zero, and update its reverse edges as usual.

    I    J I:10()M J:01()M
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N M:111(I,J)M N:11(I,J)M
     \  /
      |/
      Q Q:111(M)~M

Now, N. N is maximal, and it is not one of the selected commits, so
widen the commit_mask and set the corresponding bit on N. Look at its
only parent Q. c_of_p is true; copy bits; but this time p_of_c is true,
so set the maximal bit to true. Don't clear reverse edges; N is maximal
so add it.

    I    J I:10()M J:01()M
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N M:111(I,J)M N:1101(I,J)M
     \  /
      |/
      Q Q:1111(M,N)M

Checking the answer below, it looks like the commit author and I agree.

> Consider our ABC/XYZ example from earlier and let's inspect the state of
> the commits for an interesting bitmask, say 011. Suppose that D is the
> only maximal commit with this bitmask (in the first three bits). All
> other commits with bitmask 011 have D as the only entry in their
> reverse_edges list. D's reverse_edges list contains B and C.

Yes, this makes sense.

Let me write about "D's reverse_edges list contains B and C" first: The
fact that D has a bitmask of 011 shows no important commits can reach it
other than B or C. Any intermediate commits between B and D would not be
maximal (because B is "greater" than such an intermediate commit, and we
already established that no other important commit can reach D, and
therefore no other important commit can reach any of the commits on the
path between B and D). Same analysis applies for C instead of B. So the
propagated reverse edges would just be B and C.

Now "all other commits with bitmask 011 have D as the only entry in
their reverse_edges list". Since D is the only maximal commit with 011,
any other commit that has 011 (1) must have got it from D, and (2)
cannot be reached by any other important commit. A similar analysis as
in the previous paragraph shows why the reverse_edges list for all these
commits would only contain D.

> COMPUTING REACHABILITY BITMAPS
> ------------------------------
> 
> Now that we have our definition, let's zoom out and consider what
> happens with our new reverse graph when computing reachability bitmaps.
> We walk the reverse graph in reverse-topo-order, so we visit commits
> with largest commit_masks first. After we compute the reachability
> bitmap for a commit C, we push the bits in that bitmap to each commit D
> in the reverse edge list for C. 

That makes sense - the reachability bitmap for D is the reachability
bitmap for C + the reachability bitmap of (D-C). Here we have the first
operand.

> Then, when we finally visit D we already
> have the bits for everything reachable from maximal commits that D can
> reach and we only need to walk the objects in the set-difference.

Walking the objects in the set-difference gives us the second operand.
Makes sense.

> In our ABC/XYZ example, when we finally walk for the commit A we only
> need to walk commits with bitmask equal to A's bitmask. If that bitmask
> is 100, then we are only walking commits in X - (Y union Z) because the
> bitmap already contains the bits for objects reachable from (X intersect
> Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps
> for the maximal commits with bitmasks 110 and 101).

This is probably correct, but I've lost track of what X, Y, and Z are so
I'll just skip this paragraph.

> The behavior is intended to walk each commit (and the trees that commit
> introduces) at most once while allocating and copying fewer reachability
> bitmaps. 

Yes, I can see how this algorithm causes this behavior.

> There is one caveat: what happens when there are multiple
> maximal commits with the same bitmask, with respect to the initial set
> of selected commits?

I'm going to be lazy here and ask, is this possible? As described
below in "MULTIPLE MAXIMAL COMMITS", if a non-selected commit turns out
to be maximal, it will have its very own bit, and thus become the
"progenitor" of all commits with that bit set. (This is not a true
progenitor, because this bit propagates from the children to the
parents and not the other way round - unlike a gene.)

> MULTIPLE MAXIMAL COMMITS
> ------------------------

I think I discussed everything here earlier, so [skip].

> PERFORMANCE MEASUREMENTS
> ------------------------

Numbers look good. [skip]

As discussed above, I'll not review the code this round. [skip code]

Phew...this took longer than expected. I'll see if I can review the rest
of the patches tomorrow.
Jonathan Tan Nov. 25, 2020, 1:46 a.m. UTC | #2
I've reviewed the commit message already [1]; now let's look at the code.

[1] https://lore.kernel.org/git/20201124060738.762751-1-jonathantanmy@google.com/

>  struct bb_commit {
>  	struct commit_list *reverse_edges;
> +	struct bitmap *commit_mask;
>  	struct bitmap *bitmap;
> -	unsigned selected:1;
> +	unsigned selected:1,
> +		 maximal:1;

The code itself probably should contain comments about the algorithm,
but I'm not sure of the best way to do it. (E.g. I would say that
"maximal" should be "When iteration in bitmap_builder_init() reaches
this bb_commit, this is true iff none of its descendants has or will
ever have the exact same commit_mask" - but then when do we explain why
the commit_mask matters?)

>  	unsigned idx; /* within selected array */
>  };
>  
> @@ -198,7 +200,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
>  {
>  	struct rev_info revs;
>  	struct commit *commit;
> -	unsigned int i;
> +	unsigned int i, num_maximal;
>  
>  	memset(bb, 0, sizeof(*bb));
>  	init_bb_data(&bb->data);
> @@ -210,27 +212,85 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
>  	for (i = 0; i < writer->selected_nr; i++) {
>  		struct commit *c = writer->selected[i].commit;
>  		struct bb_commit *ent = bb_data_at(&bb->data, c);
> +
>  		ent->selected = 1;
> +		ent->maximal = 1;
>  		ent->idx = i;
> +
> +		ent->commit_mask = bitmap_new();
> +		bitmap_set(ent->commit_mask, i);
> +
>  		add_pending_object(&revs, &c->object, "");
>  	}
> +	num_maximal = writer->selected_nr;
>  
>  	if (prepare_revision_walk(&revs))
>  		die("revision walk setup failed");
>  
>  	while ((commit = get_revision(&revs))) {
>  		struct commit_list *p;
> +		struct bb_commit *c_ent;
>  
>  		parse_commit_or_die(commit);
>  
> -		ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
> -		bb->commits[bb->commits_nr++] = commit;
> +		c_ent = bb_data_at(&bb->data, commit);
> +
> +		if (c_ent->maximal) {
> +			if (!c_ent->selected) {
> +				bitmap_set(c_ent->commit_mask, num_maximal);
> +				num_maximal++;
> +			}
> +
> +			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
> +			bb->commits[bb->commits_nr++] = commit;

So the order of bit assignments in the commit_mask and the order of
commits in bb->commits are not the same. In the commit_mask, bits are
first assigned for selected commits and then the rest for commits we
discover to be maximal. But in bb->commits, the order follows the
topologically-sorted iteration. This is fine, but might be worth a
comment (to add to the already big comment burden...)

> +		}
>  
>  		for (p = commit->parents; p; p = p->next) {
> -			struct bb_commit *ent = bb_data_at(&bb->data, p->item);
> -			commit_list_insert(commit, &ent->reverse_edges);
> +			struct bb_commit *p_ent = bb_data_at(&bb->data, p->item);
> +			int c_not_p, p_not_c;
> +
> +			if (!p_ent->commit_mask) {
> +				p_ent->commit_mask = bitmap_new();
> +				c_not_p = 1;
> +				p_not_c = 0;
> +			} else {
> +				c_not_p = bitmap_diff_nonzero(c_ent->commit_mask, p_ent->commit_mask);
> +				p_not_c = bitmap_diff_nonzero(p_ent->commit_mask, c_ent->commit_mask);
> +			}
> +
> +			if (!c_not_p)
> +				continue;
> +
> +			bitmap_or(p_ent->commit_mask, c_ent->commit_mask);
> +
> +			if (p_not_c)
> +				p_ent->maximal = 1;
> +			else {
> +				p_ent->maximal = 0;
> +				free_commit_list(p_ent->reverse_edges);
> +				p_ent->reverse_edges = NULL;
> +			}
> +
> +			if (c_ent->maximal) {
> +				commit_list_insert(commit, &p_ent->reverse_edges);
> +			} else {
> +				struct commit_list *cc = c_ent->reverse_edges;
> +
> +				for (; cc; cc = cc->next) {
> +					if (!commit_list_contains(cc->item, p_ent->reverse_edges))
> +						commit_list_insert(cc->item, &p_ent->reverse_edges);
> +				}
> +			}
>  		}
> +
> +		bitmap_free(c_ent->commit_mask);
> +		c_ent->commit_mask = NULL;
>  	}
> +
> +	trace2_data_intmax("pack-bitmap-write", the_repository,
> +			   "num_selected_commits", writer->selected_nr);
> +	trace2_data_intmax("pack-bitmap-write", the_repository,
> +			   "num_maximal_commits", num_maximal);
>  }

The rest looks like a faithful implementation of the algorithm.

Now let's look at the tests.

> +# To ensure the logic for "maximal commits" is exercised, make
> +# the repository a bit more complicated.
> +#
> +#    other                         master
> +#      *                             *
> +# (99 commits)                  (99 commits)
> +#      *                             *
> +#      |\                           /|
> +#      | * octo-other  octo-master * |
> +#      |/|\_________  ____________/|\|
> +#      | \          \/  __________/  |
> +#      |  | ________/\ /             |
> +#      *  |/          * merge-right  *
> +#      | _|__________/ \____________ |
> +#      |/ |                         \|
> +# (l1) *  * merge-left               * (r1)
> +#      | / \________________________ |
> +#      |/                           \|
> +# (l2) *                             * (r2)
> +#       \____________...____________ |

What does the ... represent? If a certain number of commits, it would be
clearer to write that there.

> +#                                   \|
> +#                                    * (base)

OK - some of the crosses are unclear, but from the bitmask given below,
I know where the lines should go.

> +#
> +# The important part for the maximal commit algorithm is how
> +# the bitmasks are extended. Assuming starting bit positions
> +# for master (bit 0) and other (bit 1), and some flexibility
> +# in the order that merge bases are visited, the bitmasks at
> +# the end should be:
> +#
> +#      master: 1       (maximal, selected)
> +#       other: 01      (maximal, selected)
> +# octo-master: 1
> +#  octo-other: 01
> +# merge-right: 111     (maximal)
> +#        (l1): 111
> +#        (r1): 111
> +#  merge-left: 1101    (maximal)
> +#        (l2): 11111   (maximal)
> +#        (r2): 111101  (maximal)
> +#      (base): 1111111 (maximal)

This makes sense. (l1) and (r1) are not maximal because everything that
can reach merge-right can also reach them.

[snip]

>  test_expect_success 'full repack creates bitmaps' '
> -	git repack -ad &&
> +	GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \
> +		git repack -ad &&
>  	ls .git/objects/pack/ | grep bitmap >output &&
> -	test_line_count = 1 output
> +	test_line_count = 1 output &&
> +	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
> +	grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace

From the diagram and bit masks, I see that the important number for
"maximal" is 7. Could this test be run twice - one without the crosses
and one with, and we can verify that the difference between the maximal
commits is 7? As it is, this 111 number is hard to verify.
Derrick Stolee Nov. 30, 2020, 6:41 p.m. UTC | #3
On 11/24/2020 8:46 PM, Jonathan Tan wrote:
> I've reviewed the commit message already [1]; now let's look at the code.
> 
> [1] https://lore.kernel.org/git/20201124060738.762751-1-jonathantanmy@google.com/
> 
>>  struct bb_commit {
>>  	struct commit_list *reverse_edges;
>> +	struct bitmap *commit_mask;
>>  	struct bitmap *bitmap;
>> -	unsigned selected:1;
>> +	unsigned selected:1,
>> +		 maximal:1;
> 
> The code itself probably should contain comments about the algorithm,
> but I'm not sure of the best way to do it. (E.g. I would say that
> "maximal" should be "When iteration in bitmap_builder_init() reaches
> this bb_commit, this is true iff none of its descendants has or will
> ever have the exact same commit_mask" - but then when do we explain why
> the commit_mask matters?)

Comments are tricky, as they are likely to go stale. In fact,
this algorithm changes dramatically later in this very series!

How much can we expect a reader to inspect the commit history to
discover the lengthy commit message? The message explains the
algorithm and its many subtleties versus reading comments that may
be too specific to this initial version.

At this point, "maximal" is a property that doesn't mean much
without inspecting the places where we set or check that bit.

>> +		if (c_ent->maximal) {
>> +			if (!c_ent->selected) {
>> +				bitmap_set(c_ent->commit_mask, num_maximal);
>> +				num_maximal++;
>> +			}
>> +
>> +			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
>> +			bb->commits[bb->commits_nr++] = commit;
> 
> So the order of bit assignments in the commit_mask and the order of
> commits in bb->commits are not the same. In the commit_mask, bits are
> first assigned for selected commits and then the rest for commits we
> discover to be maximal. But in bb->commits, the order follows the
> topologically-sorted iteration. This is fine, but might be worth a
> comment (to add to the already big comment burden...)

This one I waffled on a bit. There _is_ a difference between the
bitmask order and this list order. This isn't a problem as long as
no one attempts to use the bitmasks to navigate into this list.

Further, it is entirely possible that the tests to never demonstrate
that difference, so pointing out may help a future developer from
making that mistake. Could be good to add this comment over the
bb->commits definition:

	/*
	 * 'commits' stores the list of maximal commits, in visited
	 * order. This can be different than the bitmask order for
	 * the selected commits.
	 */

>> +# To ensure the logic for "maximal commits" is exercised, make
>> +# the repository a bit more complicated.
>> +#
>> +#    other                         master
>> +#      *                             *
>> +# (99 commits)                  (99 commits)
>> +#      *                             *
>> +#      |\                           /|
>> +#      | * octo-other  octo-master * |
>> +#      |/|\_________  ____________/|\|
>> +#      | \          \/  __________/  |
>> +#      |  | ________/\ /             |
>> +#      *  |/          * merge-right  *
>> +#      | _|__________/ \____________ |
>> +#      |/ |                         \|
>> +# (l1) *  * merge-left               * (r1)
>> +#      | / \________________________ |
>> +#      |/                           \|
>> +# (l2) *                             * (r2)
>> +#       \____________...____________ |
> 
> What does the ... represent? If a certain number of commits, it would be
> clearer to write that there.

The ... are unnecessary and should be ___ instead. Thanks.
 
>> +#                                   \|
>> +#                                    * (base)
> 
> OK - some of the crosses are unclear, but from the bitmask given below,
> I know where the lines should go.
> 
>> +#
>> +# The important part for the maximal commit algorithm is how
>> +# the bitmasks are extended. Assuming starting bit positions
>> +# for master (bit 0) and other (bit 1), and some flexibility
>> +# in the order that merge bases are visited, the bitmasks at
>> +# the end should be:
>> +#
>> +#      master: 1       (maximal, selected)
>> +#       other: 01      (maximal, selected)
>> +# octo-master: 1
>> +#  octo-other: 01
>> +# merge-right: 111     (maximal)
>> +#        (l1): 111
>> +#        (r1): 111
>> +#  merge-left: 1101    (maximal)
>> +#        (l2): 11111   (maximal)
>> +#        (r2): 111101  (maximal)
>> +#      (base): 1111111 (maximal)
> 
> This makes sense. (l1) and (r1) are not maximal because everything that
> can reach merge-right can also reach them.
> 
> [snip]
> 
>>  test_expect_success 'full repack creates bitmaps' '
>> -	git repack -ad &&
>> +	GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \
>> +		git repack -ad &&
>>  	ls .git/objects/pack/ | grep bitmap >output &&
>> -	test_line_count = 1 output
>> +	test_line_count = 1 output &&
>> +	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
>> +	grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace
> 
> From the diagram and bit masks, I see that the important number for
> "maximal" is 7. Could this test be run twice - one without the crosses
> and one with, and we can verify that the difference between the maximal
> commits is 7? As it is, this 111 number is hard to verify.

This number _is_ hard to verify. It is fragile to many behaviors inside
the code of Git, including the selection algorithm and some hard-coded
limits (there's a reason we insert 100 commits on top of each side).
Further, this number changes as the algorithm is modified.

Perhaps the best way to recognize this number is that it changes by adding
5 to the previous number (these are the 5 "newly maximal" commits, since
two are already selected as tips).

This number changes again later, and the difference is justified by
the number of maximal commits dropping by 4.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 369c76a87c..7b4fc0f304 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -180,8 +180,10 @@  static void compute_xor_offsets(void)
 
 struct bb_commit {
 	struct commit_list *reverse_edges;
+	struct bitmap *commit_mask;
 	struct bitmap *bitmap;
-	unsigned selected:1;
+	unsigned selected:1,
+		 maximal:1;
 	unsigned idx; /* within selected array */
 };
 
@@ -198,7 +200,7 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 {
 	struct rev_info revs;
 	struct commit *commit;
-	unsigned int i;
+	unsigned int i, num_maximal;
 
 	memset(bb, 0, sizeof(*bb));
 	init_bb_data(&bb->data);
@@ -210,27 +212,85 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 	for (i = 0; i < writer->selected_nr; i++) {
 		struct commit *c = writer->selected[i].commit;
 		struct bb_commit *ent = bb_data_at(&bb->data, c);
+
 		ent->selected = 1;
+		ent->maximal = 1;
 		ent->idx = i;
+
+		ent->commit_mask = bitmap_new();
+		bitmap_set(ent->commit_mask, i);
+
 		add_pending_object(&revs, &c->object, "");
 	}
+	num_maximal = writer->selected_nr;
 
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 
 	while ((commit = get_revision(&revs))) {
 		struct commit_list *p;
+		struct bb_commit *c_ent;
 
 		parse_commit_or_die(commit);
 
-		ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
-		bb->commits[bb->commits_nr++] = commit;
+		c_ent = bb_data_at(&bb->data, commit);
+
+		if (c_ent->maximal) {
+			if (!c_ent->selected) {
+				bitmap_set(c_ent->commit_mask, num_maximal);
+				num_maximal++;
+			}
+
+			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
+			bb->commits[bb->commits_nr++] = commit;
+		}
 
 		for (p = commit->parents; p; p = p->next) {
-			struct bb_commit *ent = bb_data_at(&bb->data, p->item);
-			commit_list_insert(commit, &ent->reverse_edges);
+			struct bb_commit *p_ent = bb_data_at(&bb->data, p->item);
+			int c_not_p, p_not_c;
+
+			if (!p_ent->commit_mask) {
+				p_ent->commit_mask = bitmap_new();
+				c_not_p = 1;
+				p_not_c = 0;
+			} else {
+				c_not_p = bitmap_diff_nonzero(c_ent->commit_mask, p_ent->commit_mask);
+				p_not_c = bitmap_diff_nonzero(p_ent->commit_mask, c_ent->commit_mask);
+			}
+
+			if (!c_not_p)
+				continue;
+
+			bitmap_or(p_ent->commit_mask, c_ent->commit_mask);
+
+			if (p_not_c)
+				p_ent->maximal = 1;
+			else {
+				p_ent->maximal = 0;
+				free_commit_list(p_ent->reverse_edges);
+				p_ent->reverse_edges = NULL;
+			}
+
+			if (c_ent->maximal) {
+				commit_list_insert(commit, &p_ent->reverse_edges);
+			} else {
+				struct commit_list *cc = c_ent->reverse_edges;
+
+				for (; cc; cc = cc->next) {
+					if (!commit_list_contains(cc->item, p_ent->reverse_edges))
+						commit_list_insert(cc->item, &p_ent->reverse_edges);
+				}
+			}
 		}
+
+		bitmap_free(c_ent->commit_mask);
+		c_ent->commit_mask = NULL;
 	}
+
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "num_selected_commits", writer->selected_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "num_maximal_commits", num_maximal);
 }
 
 static void bitmap_builder_clear(struct bitmap_builder *bb)
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 6bf68fee85..1691710ec1 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -20,11 +20,87 @@  has_any () {
 	grep -Ff "$1" "$2"
 }
 
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         master
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-master * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \____________...____________ |
+#                                   \|
+#                                    * (base)
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for master (bit 0) and other (bit 1), and some flexibility
+# in the order that merge bases are visited, the bitmasks at
+# the end should be:
+#
+#      master: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+# octo-master: 1
+#  octo-other: 01
+# merge-right: 111     (maximal)
+#        (l1): 111
+#        (r1): 111
+#  merge-left: 1101    (maximal)
+#        (l2): 11111   (maximal)
+#        (r2): 111101  (maximal)
+#      (base): 1111111 (maximal)
+
 test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 100 &&
+	test_commit_bulk --id=file 10 &&
 	git checkout -b other HEAD~5 &&
 	test_commit_bulk --id=side 10 &&
+
+	# add complicated history setup, including merges and
+	# ambiguous merge-bases
+
+	git checkout -b merge-left other~2 &&
+	git merge master~2 -m "merge-left" &&
+
+	git checkout -b merge-right master~1 &&
+	git merge other~1 -m "merge-right" &&
+
+	git checkout -b octo-master master &&
+	git merge merge-left merge-right -m "octopus-master" &&
+
+	git checkout -b octo-other other &&
+	git merge merge-left merge-right -m "octopus-other" &&
+
+	git checkout other &&
+	git merge octo-other -m "pull octopus" &&
+
 	git checkout master &&
+	git merge octo-master -m "pull octopus" &&
+
+	# Remove these branches so they are not selected
+	# as bitmap tips
+	git branch -D merge-left &&
+	git branch -D merge-right &&
+	git branch -D octo-other &&
+	git branch -D octo-master &&
+
+	# add padding to make these merges less interesting
+	# and avoid having them selected for bitmaps
+	test_commit_bulk --id=file 100 &&
+	git checkout other &&
+	test_commit_bulk --id=side 100 &&
+	git checkout master &&
+
 	bitmaptip=$(git rev-parse master) &&
 	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
 	git tag tagged-blob $blob &&
@@ -32,9 +108,12 @@  test_expect_success 'setup repo with moderate-sized history' '
 '
 
 test_expect_success 'full repack creates bitmaps' '
-	git repack -ad &&
+	GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \
+		git repack -ad &&
 	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output
+	test_line_count = 1 output &&
+	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
+	grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace
 '
 
 test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
@@ -356,7 +435,7 @@  test_expect_success 'truncated bitmap fails gracefully (ewah)' '
 	git rev-list --use-bitmap-index --count --all >expect &&
 	bitmap=$(ls .git/objects/pack/*.bitmap) &&
 	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
+	test_copy_bytes 270 <$bitmap >$bitmap.tmp &&
 	mv -f $bitmap.tmp $bitmap &&
 	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
 	test_cmp expect actual &&