Message ID | 56374128136fe9377503d446daf98e67847194aa.1537823728.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | read-cache: update index format default to v4 | expand |
On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > The index v4 format has been available since 2012 with 9d22778 > "reach-cache.c: write prefix-compressed names in the index". Since > the format has been stable for so long, almost all versions of Git > in use today understand version 4, removing one barrier to upgrade > -- that someone may want to downgrade and needs a working repo. What about alternative implementations, like JGit, libgit2, etc.? > Despite being stable for a long time, this index version was never > adopted as the default. This prefix-compressed version of the format > can get significant space savings on repos with large working > directories (which naturally tend to have deep nesting). This version > is set as the default for some external tools, such as VFS for Git. > Because of this external use, the format has had a lot of "testing in > production" and also is subject to continuous integration in these > environments. > > Previously, to test version 4 indexes, we needed to run the test > suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4). > > One potential, but short-term, downside is that we lose coverage of > the version 3 indexes. The trade-off is that we may want to cover > that version using GIT_TEST_INDEX_VERSION=3. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > read-cache.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/read-cache.c b/read-cache.c > index 372588260e..af6c8f2a67 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce, > * Index File I/O > *****************************************************************/ > > -#define INDEX_FORMAT_DEFAULT 3 > +#define INDEX_FORMAT_DEFAULT 4 > > static unsigned int get_index_format_default(void) > { > -- > gitgitgadget
SZEDER Gábor <szeder.dev@gmail.com> writes: > On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <dstolee@microsoft.com> >> >> The index v4 format has been available since 2012 with 9d22778 >> "reach-cache.c: write prefix-compressed names in the index". Since >> the format has been stable for so long, almost all versions of Git >> in use today understand version 4, removing one barrier to upgrade >> -- that someone may want to downgrade and needs a working repo. > > What about alternative implementations, like JGit, libgit2, etc.? Good question. Because the index-version of an index file is designed to be sticky, repos that need to be accessed by other implementations can keep whatever current version. A new repo that need to be accessed by them can be (forcibly) written in v2 and keep its v2ness, I would think. And that would serve as an incentive for the implementations to catch up ;-)
On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote: > On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > > > The index v4 format has been available since 2012 with 9d22778 > > "reach-cache.c: write prefix-compressed names in the index". Since > > the format has been stable for so long, almost all versions of Git > > in use today understand version 4, removing one barrier to upgrade > > -- that someone may want to downgrade and needs a working repo. > > What about alternative implementations, like JGit, libgit2, etc.? Speaking of libgit2, we are able to read and write index v4 since commit c1b370e93 (Merge pull request #3837 from novalis/dturner/indexv4, 2016-08-17), released with v0.25 in December 2016. Due to insufficient tests, our read support was initially broken, which got fixed with commit 3bc95cfe3 (Merge pull request #4236 from pks-t/pks/index-v4-fixes, 2017-06-07) and released with v0.26 in June 2017. Right now I'm not aware of additional bugs in our index v4 support, so we'd be fine with changing the default. Patrick > > Despite being stable for a long time, this index version was never > > adopted as the default. This prefix-compressed version of the format > > can get significant space savings on repos with large working > > directories (which naturally tend to have deep nesting). This version > > is set as the default for some external tools, such as VFS for Git. > > Because of this external use, the format has had a lot of "testing in > > production" and also is subject to continuous integration in these > > environments. > > > > Previously, to test version 4 indexes, we needed to run the test > > suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4). > > > > One potential, but short-term, downside is that we lose coverage of > > the version 3 indexes. The trade-off is that we may want to cover > > that version using GIT_TEST_INDEX_VERSION=3. > > > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > --- > > read-cache.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/read-cache.c b/read-cache.c > > index 372588260e..af6c8f2a67 100644 > > --- a/read-cache.c > > +++ b/read-cache.c > > @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce, > > * Index File I/O > > *****************************************************************/ > > > > -#define INDEX_FORMAT_DEFAULT 3 > > +#define INDEX_FORMAT_DEFAULT 4 > > > > static unsigned int get_index_format_default(void) > > { > > -- > > gitgitgadget
On 9/25/2018 3:06 AM, Patrick Steinhardt wrote: > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote: >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: >>> From: Derrick Stolee <dstolee@microsoft.com> >>> >>> The index v4 format has been available since 2012 with 9d22778 >>> "reach-cache.c: write prefix-compressed names in the index". Since >>> the format has been stable for so long, almost all versions of Git >>> in use today understand version 4, removing one barrier to upgrade >>> -- that someone may want to downgrade and needs a working repo. >> What about alternative implementations, like JGit, libgit2, etc.? > Speaking of libgit2, we are able to read and write index v4 since > commit c1b370e93 This is a good point, Szeder. Patrick: I'm glad LibGit2 is up-to-date with index formats. Unfortunately, taking a look (for the first time) at the JGit code reveals that they don't appear to have v4 support. In org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the DirCache.readFrom() method: lines 488-494, I see the following snippet: final int ver = NB.decodeInt32(hdr, 4); boolean extended = false; if (ver == 3) extended = true; else if (ver != 2) throw new CorruptObjectException(MessageFormat.format( JGitText.get().unknownDIRCVersion, Integer.valueOf(ver))); It looks like this will immediately throw with versions other than 2 or 3. I'm adding Jonathan Nieder to CC so he can check with JGit people about the impact of this change. Thanks, -Stolee
On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote: > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote: > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: > >>> From: Derrick Stolee <dstolee@microsoft.com> > >>> > >>> The index v4 format has been available since 2012 with 9d22778 > >>> "reach-cache.c: write prefix-compressed names in the index". Since > >>> the format has been stable for so long, almost all versions of Git > >>> in use today understand version 4, removing one barrier to upgrade > >>> -- that someone may want to downgrade and needs a working repo. > >> What about alternative implementations, like JGit, libgit2, etc.? > > Speaking of libgit2, we are able to read and write index v4 since > > commit c1b370e93 > > This is a good point, Szeder. > > Patrick: I'm glad LibGit2 is up-to-date with index formats. > > Unfortunately, taking a look (for the first time) at the JGit code > reveals that they don't appear to have v4 support. In > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the > DirCache.readFrom() method: lines 488-494, I see the following snippet: > > final int ver = NB.decodeInt32(hdr, 4); > boolean extended = false; > if (ver == 3) > extended = true; > else if (ver != 2) > throw new > CorruptObjectException(MessageFormat.format( > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver))); > > It looks like this will immediately throw with versions other than 2 or 3. > > I'm adding Jonathan Nieder to CC so he can check with JGit people about > the impact of this change. JGit is used both on the server (which doesn't use index/staging area) as well as client side as e.g. an Eclipse integration, which would very much like to use the index. Adding Matthias Sohn as well, who is active in JGit and cares more about the client side than Googlers who only make use of the server side part of JGit. Stefan
On 9/25/2018 2:01 PM, Stefan Beller wrote: > On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote: >>> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote: >>>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: >>>>> From: Derrick Stolee <dstolee@microsoft.com> >>>>> >>>>> The index v4 format has been available since 2012 with 9d22778 >>>>> "reach-cache.c: write prefix-compressed names in the index". Since >>>>> the format has been stable for so long, almost all versions of Git >>>>> in use today understand version 4, removing one barrier to upgrade >>>>> -- that someone may want to downgrade and needs a working repo. >>>> What about alternative implementations, like JGit, libgit2, etc.? >>> Speaking of libgit2, we are able to read and write index v4 since >>> commit c1b370e93 >> >> This is a good point, Szeder. >> >> Patrick: I'm glad LibGit2 is up-to-date with index formats. >> >> Unfortunately, taking a look (for the first time) at the JGit code >> reveals that they don't appear to have v4 support. In >> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the >> DirCache.readFrom() method: lines 488-494, I see the following snippet: >> >> final int ver = NB.decodeInt32(hdr, 4); >> boolean extended = false; >> if (ver == 3) >> extended = true; >> else if (ver != 2) >> throw new >> CorruptObjectException(MessageFormat.format( >> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver))); >> >> It looks like this will immediately throw with versions other than 2 or 3. >> >> I'm adding Jonathan Nieder to CC so he can check with JGit people about >> the impact of this change. > > JGit is used both on the server (which doesn't use index/staging area) > as well as client side as e.g. an Eclipse integration, which would > very much like to use the index. > > Adding Matthias Sohn as well, who is active in JGit and cares > more about the client side than Googlers who only make use > of the server side part of JGit. > > Stefan > In addition to the compatibility with other implementations of git question, I think we should include Duy's patch [1] to "optimize reading index format v4" before flipping the default to V4. In my testing, this reduced the index load time of V4 by 10%-15% making the CPU cost only ~2% more expensive than V2 or V3 indexes. This increase in CPU cost is more than offset by the significant reduction in file IO due to the reduced index file size. [1] https://public-inbox.org/git/20180825064458.28484-1-pclouds@gmail.com/
On Tue, Sep 25, 2018 at 8:01 PM Stefan Beller <sbeller@google.com> wrote: > > On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote: > > > > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote: > > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote: > > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote: > > >>> From: Derrick Stolee <dstolee@microsoft.com> > > >>> > > >>> The index v4 format has been available since 2012 with 9d22778 > > >>> "reach-cache.c: write prefix-compressed names in the index". Since > > >>> the format has been stable for so long, almost all versions of Git > > >>> in use today understand version 4, removing one barrier to upgrade > > >>> -- that someone may want to downgrade and needs a working repo. > > >> What about alternative implementations, like JGit, libgit2, etc.? > > > Speaking of libgit2, we are able to read and write index v4 since > > > commit c1b370e93 > > > > This is a good point, Szeder. > > > > Patrick: I'm glad LibGit2 is up-to-date with index formats. > > > > Unfortunately, taking a look (for the first time) at the JGit code > > reveals that they don't appear to have v4 support. In > > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the > > DirCache.readFrom() method: lines 488-494, I see the following snippet: > > > > final int ver = NB.decodeInt32(hdr, 4); > > boolean extended = false; > > if (ver == 3) > > extended = true; > > else if (ver != 2) > > throw new > > CorruptObjectException(MessageFormat.format( > > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver))); > > > > It looks like this will immediately throw with versions other than 2 or 3. > > > > I'm adding Jonathan Nieder to CC so he can check with JGit people about > > the impact of this change. > > JGit is used both on the server (which doesn't use index/staging area) > as well as client side as e.g. an Eclipse integration, which would > very much like to use the index. > > Adding Matthias Sohn as well, who is active in JGit and cares > more about the client side than Googlers who only make use > of the server side part of JGit. thanks for the heads up, in fact JGit does not yet support index version 4. I will look into this. -Matthias
diff --git a/read-cache.c b/read-cache.c index 372588260e..af6c8f2a67 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce, * Index File I/O *****************************************************************/ -#define INDEX_FORMAT_DEFAULT 3 +#define INDEX_FORMAT_DEFAULT 4 static unsigned int get_index_format_default(void) {