diff mbox series

[1/1] read-cache: update index format default to v4

Message ID 56374128136fe9377503d446daf98e67847194aa.1537823728.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series read-cache: update index format default to v4 | expand

Commit Message

Philippe Blain via GitGitGadget Sept. 24, 2018, 9:15 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The index v4 format has been available since 2012 with 9d22778
"reach-cache.c: write prefix-compressed names in the index". Since
the format has been stable for so long, almost all versions of Git
in use today understand version 4, removing one barrier to upgrade
-- that someone may want to downgrade and needs a working repo.

Despite being stable for a long time, this index version was never
adopted as the default. This prefix-compressed version of the format
can get significant space savings on repos with large working
directories (which naturally tend to have deep nesting). This version
is set as the default for some external tools, such as VFS for Git.
Because of this external use, the format has had a lot of "testing in
production" and also is subject to continuous integration in these
environments.

Previously, to test version 4 indexes, we needed to run the test
suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).

One potential, but short-term, downside is that we lose coverage of
the version 3 indexes. The trade-off is that we may want to cover
that version using GIT_TEST_INDEX_VERSION=3.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

SZEDER Gábor Sept. 24, 2018, 9:32 p.m. UTC | #1
On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The index v4 format has been available since 2012 with 9d22778
> "reach-cache.c: write prefix-compressed names in the index". Since
> the format has been stable for so long, almost all versions of Git
> in use today understand version 4, removing one barrier to upgrade
> -- that someone may want to downgrade and needs a working repo.

What about alternative implementations, like JGit, libgit2, etc.?

> Despite being stable for a long time, this index version was never
> adopted as the default. This prefix-compressed version of the format
> can get significant space savings on repos with large working
> directories (which naturally tend to have deep nesting). This version
> is set as the default for some external tools, such as VFS for Git.
> Because of this external use, the format has had a lot of "testing in
> production" and also is subject to continuous integration in these
> environments.
> 
> Previously, to test version 4 indexes, we needed to run the test
> suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> 
> One potential, but short-term, downside is that we lose coverage of
> the version 3 indexes. The trade-off is that we may want to cover
> that version using GIT_TEST_INDEX_VERSION=3.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  read-cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 372588260e..af6c8f2a67 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
>   * Index File I/O
>   *****************************************************************/
>  
> -#define INDEX_FORMAT_DEFAULT 3
> +#define INDEX_FORMAT_DEFAULT 4
>  
>  static unsigned int get_index_format_default(void)
>  {
> -- 
> gitgitgadget
Junio C Hamano Sept. 24, 2018, 10:29 p.m. UTC | #2
SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> 
>> The index v4 format has been available since 2012 with 9d22778
>> "reach-cache.c: write prefix-compressed names in the index". Since
>> the format has been stable for so long, almost all versions of Git
>> in use today understand version 4, removing one barrier to upgrade
>> -- that someone may want to downgrade and needs a working repo.
>
> What about alternative implementations, like JGit, libgit2, etc.?

Good question.

Because the index-version of an index file is designed to be sticky,
repos that need to be accessed by other implementations can keep
whatever current version.  A new repo that need to be accessed by
them can be (forcibly) written in v2 and keep its v2ness, I would
think.

And that would serve as an incentive for the implementations to
catch up ;-)
Patrick Steinhardt Sept. 25, 2018, 7:06 a.m. UTC | #3
On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> > From: Derrick Stolee <dstolee@microsoft.com>
> > 
> > The index v4 format has been available since 2012 with 9d22778
> > "reach-cache.c: write prefix-compressed names in the index". Since
> > the format has been stable for so long, almost all versions of Git
> > in use today understand version 4, removing one barrier to upgrade
> > -- that someone may want to downgrade and needs a working repo.
> 
> What about alternative implementations, like JGit, libgit2, etc.?

Speaking of libgit2, we are able to read and write index v4 since
commit c1b370e93 (Merge pull request #3837 from
novalis/dturner/indexv4, 2016-08-17), released with v0.25 in
December 2016. Due to insufficient tests, our read support was
initially broken, which got fixed with commit 3bc95cfe3 (Merge
pull request #4236 from pks-t/pks/index-v4-fixes, 2017-06-07) and
released with v0.26 in June 2017.

Right now I'm not aware of additional bugs in our index v4
support, so we'd be fine with changing the default.

Patrick

> > Despite being stable for a long time, this index version was never
> > adopted as the default. This prefix-compressed version of the format
> > can get significant space savings on repos with large working
> > directories (which naturally tend to have deep nesting). This version
> > is set as the default for some external tools, such as VFS for Git.
> > Because of this external use, the format has had a lot of "testing in
> > production" and also is subject to continuous integration in these
> > environments.
> > 
> > Previously, to test version 4 indexes, we needed to run the test
> > suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> > 
> > One potential, but short-term, downside is that we lose coverage of
> > the version 3 indexes. The trade-off is that we may want to cover
> > that version using GIT_TEST_INDEX_VERSION=3.
> > 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  read-cache.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/read-cache.c b/read-cache.c
> > index 372588260e..af6c8f2a67 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
> >   * Index File I/O
> >   *****************************************************************/
> >  
> > -#define INDEX_FORMAT_DEFAULT 3
> > +#define INDEX_FORMAT_DEFAULT 4
> >  
> >  static unsigned int get_index_format_default(void)
> >  {
> > -- 
> > gitgitgadget
Derrick Stolee Sept. 25, 2018, 2:29 p.m. UTC | #4
On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> The index v4 format has been available since 2012 with 9d22778
>>> "reach-cache.c: write prefix-compressed names in the index". Since
>>> the format has been stable for so long, almost all versions of Git
>>> in use today understand version 4, removing one barrier to upgrade
>>> -- that someone may want to downgrade and needs a working repo.
>> What about alternative implementations, like JGit, libgit2, etc.?
> Speaking of libgit2, we are able to read and write index v4 since
> commit c1b370e93

This is a good point, Szeder.

Patrick: I'm glad LibGit2 is up-to-date with index formats.

Unfortunately, taking a look (for the first time) at the JGit code 
reveals that they don't appear to have v4 support. In 
org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the 
DirCache.readFrom() method: lines 488-494, I see the following snippet:

                 final int ver = NB.decodeInt32(hdr, 4);
                 boolean extended = false;
                 if (ver == 3)
                         extended = true;
                 else if (ver != 2)
                         throw new 
CorruptObjectException(MessageFormat.format(
JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));

It looks like this will immediately throw with versions other than 2 or 3.

I'm adding Jonathan Nieder to CC so he can check with JGit people about 
the impact of this change.

Thanks,

-Stolee
Stefan Beller Sept. 25, 2018, 6:01 p.m. UTC | #5
On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> >>> From: Derrick Stolee <dstolee@microsoft.com>
> >>>
> >>> The index v4 format has been available since 2012 with 9d22778
> >>> "reach-cache.c: write prefix-compressed names in the index". Since
> >>> the format has been stable for so long, almost all versions of Git
> >>> in use today understand version 4, removing one barrier to upgrade
> >>> -- that someone may want to downgrade and needs a working repo.
> >> What about alternative implementations, like JGit, libgit2, etc.?
> > Speaking of libgit2, we are able to read and write index v4 since
> > commit c1b370e93
>
> This is a good point, Szeder.
>
> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>
> Unfortunately, taking a look (for the first time) at the JGit code
> reveals that they don't appear to have v4 support. In
> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>
>                  final int ver = NB.decodeInt32(hdr, 4);
>                  boolean extended = false;
>                  if (ver == 3)
>                          extended = true;
>                  else if (ver != 2)
>                          throw new
> CorruptObjectException(MessageFormat.format(
> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>
> It looks like this will immediately throw with versions other than 2 or 3.
>
> I'm adding Jonathan Nieder to CC so he can check with JGit people about
> the impact of this change.

JGit is used both on the server (which doesn't use index/staging area)
as well as client side as e.g. an Eclipse integration, which would
very much like to use the index.

Adding Matthias Sohn as well, who is active in JGit and cares
more about the client side than Googlers who only make use
of the server side part of JGit.

Stefan
Ben Peart Sept. 25, 2018, 9:22 p.m. UTC | #6
On 9/25/2018 2:01 PM, Stefan Beller wrote:
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
>>> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
>>>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>>>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>>>
>>>>> The index v4 format has been available since 2012 with 9d22778
>>>>> "reach-cache.c: write prefix-compressed names in the index". Since
>>>>> the format has been stable for so long, almost all versions of Git
>>>>> in use today understand version 4, removing one barrier to upgrade
>>>>> -- that someone may want to downgrade and needs a working repo.
>>>> What about alternative implementations, like JGit, libgit2, etc.?
>>> Speaking of libgit2, we are able to read and write index v4 since
>>> commit c1b370e93
>>
>> This is a good point, Szeder.
>>
>> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>>
>> Unfortunately, taking a look (for the first time) at the JGit code
>> reveals that they don't appear to have v4 support. In
>> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
>> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>>
>>                   final int ver = NB.decodeInt32(hdr, 4);
>>                   boolean extended = false;
>>                   if (ver == 3)
>>                           extended = true;
>>                   else if (ver != 2)
>>                           throw new
>> CorruptObjectException(MessageFormat.format(
>> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>>
>> It looks like this will immediately throw with versions other than 2 or 3.
>>
>> I'm adding Jonathan Nieder to CC so he can check with JGit people about
>> the impact of this change.
> 
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
> 
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.
> 
> Stefan
> 

In addition to the compatibility with other implementations of git 
question, I think we should include Duy's patch [1] to "optimize reading 
index format v4" before flipping the default to V4.

In my testing, this reduced the index load time of V4 by 10%-15% making 
the CPU cost only ~2% more expensive than V2 or V3 indexes.  This 
increase in CPU cost is more than offset by the significant reduction in 
file IO due to the reduced index file size.

[1] https://public-inbox.org/git/20180825064458.28484-1-pclouds@gmail.com/
Matthias Sohn Sept. 26, 2018, 9:03 p.m. UTC | #7
On Tue, Sep 25, 2018 at 8:01 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> > >>> From: Derrick Stolee <dstolee@microsoft.com>
> > >>>
> > >>> The index v4 format has been available since 2012 with 9d22778
> > >>> "reach-cache.c: write prefix-compressed names in the index". Since
> > >>> the format has been stable for so long, almost all versions of Git
> > >>> in use today understand version 4, removing one barrier to upgrade
> > >>> -- that someone may want to downgrade and needs a working repo.
> > >> What about alternative implementations, like JGit, libgit2, etc.?
> > > Speaking of libgit2, we are able to read and write index v4 since
> > > commit c1b370e93
> >
> > This is a good point, Szeder.
> >
> > Patrick: I'm glad LibGit2 is up-to-date with index formats.
> >
> > Unfortunately, taking a look (for the first time) at the JGit code
> > reveals that they don't appear to have v4 support. In
> > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> > DirCache.readFrom() method: lines 488-494, I see the following snippet:
> >
> >                  final int ver = NB.decodeInt32(hdr, 4);
> >                  boolean extended = false;
> >                  if (ver == 3)
> >                          extended = true;
> >                  else if (ver != 2)
> >                          throw new
> > CorruptObjectException(MessageFormat.format(
> > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
> >
> > It looks like this will immediately throw with versions other than 2 or 3.
> >
> > I'm adding Jonathan Nieder to CC so he can check with JGit people about
> > the impact of this change.
>
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
>
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.

thanks for the heads up, in fact JGit does not yet support index version 4.
I will look into this.

-Matthias
diff mbox series

Patch

diff --git a/read-cache.c b/read-cache.c
index 372588260e..af6c8f2a67 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1484,7 +1484,7 @@  struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
  * Index File I/O
  *****************************************************************/
 
-#define INDEX_FORMAT_DEFAULT 3
+#define INDEX_FORMAT_DEFAULT 4
 
 static unsigned int get_index_format_default(void)
 {