From patchwork Wed Sep 26 19:54:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ben Peart X-Patchwork-Id: 10616729 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 032CD175A for ; Wed, 26 Sep 2018 19:54:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E44222B2D9 for ; Wed, 26 Sep 2018 19:54:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D6AF22B2F2; Wed, 26 Sep 2018 19:54:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 279A52B2D9 for ; Wed, 26 Sep 2018 19:54:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726766AbeI0CJa (ORCPT ); Wed, 26 Sep 2018 22:09:30 -0400 Received: from mail-qt1-f193.google.com ([209.85.160.193]:37299 "EHLO mail-qt1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726385AbeI0CJa (ORCPT ); Wed, 26 Sep 2018 22:09:30 -0400 Received: by mail-qt1-f193.google.com with SMTP id n6-v6so259285qtl.4 for ; Wed, 26 Sep 2018 12:54:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=c0cdpW/ZL38zqaKZW8jRs6AIQhN4Hl6ioweMWbhKqyQ=; b=uiSmH1Pm7gahnWmweCaBCvho16BdRAlXcW2DPHXmqcXHBMv1PQDMWl/b3sUJE9ZEkg UJqEBlbMeRAysaYtFUOMrKO2eQV/4E928v1OXtwcYsRMO73esAOfKiez7iN0naqf3H9h 
9f58ElD4cmkgHZwAWWWeAycFzG8290pzF+drbpYJ1y4KVGhLC4Q1HISEp/vNNj3hTIXq zKHW4e6HmLO+TA7sMHd3uyin+QLJa3z2rZgQdJtQAiTemdifeSGTqJHvAqoCwfV0rpZp qHI1+DWNFu9HfYkN+yPItjaHVBXxXIzlFE07Y2WiawMMqxzEGcIP6DtjFV6to5bUv/Ie tatg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=c0cdpW/ZL38zqaKZW8jRs6AIQhN4Hl6ioweMWbhKqyQ=; b=EL1KTC29tZyRjPMqNug1SrpS7HPVDoHe6bEDs2rhEZZTZfQKC60Ubhf2r6ZVoqr2Eg K3gem2vbZJpB+pkeiW61VE1GK2wOTQidGIKbwUFO/oH/qtzmm53GUyzHbZedLZLAo6NH YIrJpxXdf2uURML4sNhDn3yr2Jm9CQzWDP/TSiDDikFGmQNMMu8soDGYyJcn/P5CGC41 CfvmwESKDtWQnXtbQCIXWukkAVfvd/80NQAfAPyYzz41K7jjrMrsuMLkUdTyjj4qIGGu kbD6eFGc6VG9psUf4niR9B4wgK7xwIwYo5PCEYKH787lKZcao+joEjff4iHTuqELxkLF Trvw== X-Gm-Message-State: ABuFfoj9UMusgzHzu3fbrKZk9/pfepPU635rgRozERn3cDmyrs3TO9uF 7RFa5pRoSDiramZ56rD9IQZ5J6rq5E4= X-Google-Smtp-Source: ACcGV63msdylh6bzCM7OTIeKDKAvwWpB60l+0A0PXm8sIlPtwtKa5buGN0bIse8CDkwOXts4ZWyd6Q== X-Received: by 2002:a0c:e2cd:: with SMTP id t13-v6mr5599399qvl.79.1537991694575; Wed, 26 Sep 2018 12:54:54 -0700 (PDT) Received: from localhost.localdomain (70-33-148-227.unassigned.ntelos.net. 
[70.33.148.227]) by smtp.gmail.com with ESMTPSA id 23-v6sm3314922qkn.11.2018.09.26.12.54.53 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 26 Sep 2018 12:54:53 -0700 (PDT)
From: Ben Peart
X-Google-Original-From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com
Subject: [PATCH v6 1/7] read-cache.c: optimize reading index format v4
Date: Wed, 26 Sep 2018 15:54:36 -0400
Message-Id: <20180926195442.1380-2-benpeart@microsoft.com>
X-Mailer: git-send-email 2.18.0.windows.1
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>
MIME-Version: 1.0
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Nguyễn Thái Ngọc Duy

Index format v4 requires some more computation to assemble a path
based on a previous one. The current code is not very efficient
because

 - it doubles the memory copy: we assemble the final path in a
   temporary buffer first, before copying it into a cache_entry

 - strbuf_remove() in expand_name_field() is not exactly a good fit
   for stripping a part at the end; _setlen() would do the same job
   and is much cheaper.

 - the open-coded loop to find the end of the string in
   expand_name_field() can't beat an optimized strlen()

This patch avoids the temporary buffer and writes directly to the new
cache_entry, which addresses the first two points. The last point
could also be avoided if the total string length fits in the first 12
bits of ce_flags; if not, we fall back to strlen().

Running "test-tool read-cache 100" on webkit.git (275k files), reading
v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more
time. The patch reduces read time on v4 to 4.319 seconds.
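
For illustration only (this is not part of the patch), the direct-write
idea can be sketched as a standalone helper. The name
expand_name_sketch() and its parameters are hypothetical, and the
varint decoding of the strip length that the real code performs is
assumed to have happened already:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: rebuild a v4-style path
 * directly in `dst` without going through a temporary buffer.
 *
 * `prev`/`prev_len` is the previous entry's name (prev may be NULL
 * when prev_len is 0), `strip_len` is the number of trailing bytes
 * to drop from it, and `suffix` is the NUL-terminated tail stored
 * on disk.  Returns the new name length, or (size_t)-1 if the input
 * is malformed or `dst` (of `dst_size` bytes) is too small.
 */
static size_t expand_name_sketch(char *dst, size_t dst_size,
				 const char *prev, size_t prev_len,
				 size_t strip_len, const char *suffix)
{
	size_t copy_len, suffix_len;

	assert(dst && suffix);

	/* malformed: cannot strip more bytes than the previous name has */
	if (prev_len < strip_len)
		return (size_t)-1;
	copy_len = prev_len - strip_len;
	suffix_len = strlen(suffix);
	if (copy_len + suffix_len + 1 > dst_size)
		return (size_t)-1;

	/* shared leading part, taken from the previous name */
	if (copy_len)
		memcpy(dst, prev, copy_len);
	/* new tail, including its terminating NUL */
	memcpy(dst + copy_len, suffix, suffix_len + 1);
	return copy_len + suffix_len;
}
```

With a previous name of "src/read-cache.c", a strip length of 12 and a
stored suffix of "strbuf.c", the sketch yields "src/strbuf.c" with a
single copy of each part.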
Signed-off-by: Nguyễn Thái Ngọc Duy
---
 read-cache.c | 128 ++++++++++++++++++++++++---------------------------
 1 file changed, 60 insertions(+), 68 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 8d04d78a58..583a4fb1f8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1713,63 +1713,24 @@ int read_index(struct index_state *istate)
 	return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool,
-						   struct ondisk_cache_entry *ondisk,
-						   unsigned int flags,
-						   const char *name,
-						   size_t len)
-{
-	struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len);
-
-	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
-	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
-	ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
-	ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
-	ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev);
-	ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino);
-	ce->ce_mode = get_be32(&ondisk->mode);
-	ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid);
-	ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid);
-	ce->ce_stat_data.sd_size = get_be32(&ondisk->size);
-	ce->ce_flags = flags & ~CE_NAMEMASK;
-	ce->ce_namelen = len;
-	ce->index = 0;
-	hashcpy(ce->oid.hash, ondisk->sha1);
-	memcpy(ce->name, name, len);
-	ce->name[len] = '\0';
-	return ce;
-}
-
-/*
- * Adjacent cache entries tend to share the leading paths, so it makes
- * sense to only store the differences in later entries.  In the v4
- * on-disk format of the index, each on-disk cache entry stores the
- * number of bytes to be stripped from the end of the previous name,
- * and the bytes to append to the result, to come up with its name.
- */
-static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
-{
-	const unsigned char *ep, *cp = (const unsigned char *)cp_;
-	size_t len = decode_varint(&cp);
-
-	if (name->len < len)
-		die("malformed name field in the index");
-	strbuf_remove(name, name->len - len, len);
-	for (ep = cp; *ep; ep++)
-		; /* find the end */
-	strbuf_add(name, cp, ep - cp);
-	return (const char *)ep + 1 - cp_;
-}
-
-static struct cache_entry *create_from_disk(struct mem_pool *mem_pool,
+static struct cache_entry *create_from_disk(struct index_state *istate,
 					    struct ondisk_cache_entry *ondisk,
 					    unsigned long *ent_size,
-					    struct strbuf *previous_name)
+					    const struct cache_entry *previous_ce)
 {
 	struct cache_entry *ce;
 	size_t len;
 	const char *name;
 	unsigned int flags;
+	size_t copy_len;
+	/*
+	 * Adjacent cache entries tend to share the leading paths, so it makes
+	 * sense to only store the differences in later entries.  In the v4
+	 * on-disk format of the index, each on-disk cache entry stores the
+	 * number of bytes to be stripped from the end of the previous name,
+	 * and the bytes to append to the result, to come up with its name.
+	 */
+	int expand_name_field = istate->version == 4;
 
 	/* On-disk flags are just 16 bits */
 	flags = get_be16(&ondisk->flags);
@@ -1789,21 +1750,54 @@ static struct cache_entry *create_from_disk(struct mem_pool *mem_pool,
 	else
 		name = ondisk->name;
 
-	if (!previous_name) {
-		/* v3 and earlier */
-		if (len == CE_NAMEMASK)
-			len = strlen(name);
-		ce = cache_entry_from_ondisk(mem_pool, ondisk, flags, name, len);
+	if (expand_name_field) {
+		const unsigned char *cp = (const unsigned char *)name;
+		size_t strip_len, previous_len;
 
-		*ent_size = ondisk_ce_size(ce);
-	} else {
-		unsigned long consumed;
-		consumed = expand_name_field(previous_name, name);
-		ce = cache_entry_from_ondisk(mem_pool, ondisk, flags,
-					     previous_name->buf,
-					     previous_name->len);
+		previous_len = previous_ce ? previous_ce->ce_namelen : 0;
+		strip_len = decode_varint(&cp);
+		if (previous_len < strip_len) {
+			if (previous_ce)
+				die(_("malformed name field in the index, near path '%s'"),
+				    previous_ce->name);
+			else
+				die(_("malformed name field in the index in the first path"));
+		}
+		copy_len = previous_len - strip_len;
+		name = (const char *)cp;
+	}
+
+	if (len == CE_NAMEMASK) {
+		len = strlen(name);
+		if (expand_name_field)
+			len += copy_len;
+	}
+
+	ce = mem_pool__ce_alloc(istate->ce_mem_pool, len);
+
+	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
+	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
+	ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
+	ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
+	ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev);
+	ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino);
+	ce->ce_mode = get_be32(&ondisk->mode);
+	ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid);
+	ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid);
+	ce->ce_stat_data.sd_size = get_be32(&ondisk->size);
+	ce->ce_flags = flags & ~CE_NAMEMASK;
+	ce->ce_namelen = len;
+	ce->index = 0;
+	hashcpy(ce->oid.hash, ondisk->sha1);
 
-		*ent_size = (name - ((char *)ondisk)) + consumed;
+	if (expand_name_field) {
+		if (copy_len)
+			memcpy(ce->name, previous_ce->name, copy_len);
+		memcpy(ce->name + copy_len, name, len + 1 - copy_len);
+		*ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len;
+	} else {
+		memcpy(ce->name, name, len + 1);
+		*ent_size = ondisk_ce_size(ce);
 	}
 	return ce;
 }
@@ -1898,7 +1892,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	struct cache_header *hdr;
 	void *mmap;
 	size_t mmap_size;
-	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	const struct cache_entry *previous_ce = NULL;
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1936,11 +1930,9 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->initialized = 1;
 
 	if (istate->version == 4) {
-		previous_name = &previous_name_buf;
 		mem_pool_init(&istate->ce_mem_pool,
 			      estimate_cache_size_from_compressed(istate->cache_nr));
 	} else {
-		previous_name = NULL;
 		mem_pool_init(&istate->ce_mem_pool,
 			      estimate_cache_size(mmap_size, istate->cache_nr));
 	}
@@ -1952,12 +1944,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		unsigned long consumed;
 
 		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
-		ce = create_from_disk(istate->ce_mem_pool, disk_ce, &consumed, previous_name);
+		ce = create_from_disk(istate, disk_ce, &consumed, previous_ce);
 		set_index_entry(istate, i, ce);
 
 		src_offset += consumed;
+		previous_ce = ce;
 	}
-	strbuf_release(&previous_name_buf);
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 

From patchwork Wed Sep 26 19:54:37 2018
X-Patchwork-Submitter: Ben Peart
X-Patchwork-Id: 10616731
From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart, Ben Peart
Subject: [PATCH v6 2/7] read-cache: clean up casting and byte decoding
Date: Wed, 26 Sep 2018 15:54:37 -0400
Message-Id: <20180926195442.1380-3-benpeart@microsoft.com>
X-Mailer: git-send-email 2.18.0.windows.1
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>

This patch does a clean up pass to minimize the casting required to work
with the memory mapped index (mmap).

It also makes the decoding of network byte order more consistent by using
get_be32() where possible.

Signed-off-by: Ben Peart
---
 read-cache.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 583a4fb1f8..6ba99e2c96 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1650,7 +1650,7 @@ int verify_index_checksum;
 /* Allow fsck to force verification of the cache entry order. */
 int verify_ce_order;
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
@@ -1674,7 +1674,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 }
 
 static int read_index_extension(struct index_state *istate,
-				const char *ext, void *data, unsigned long sz)
+				const char *ext, const char *data, unsigned long sz)
 {
 	switch (CACHE_EXT(ext)) {
 	case CACHE_EXT_TREE:
@@ -1889,8 +1889,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	int fd, i;
 	struct stat st;
 	unsigned long src_offset;
-	struct cache_header *hdr;
-	void *mmap;
+	const struct cache_header *hdr;
+	const char *mmap;
 	size_t mmap_size;
 	const struct cache_entry *previous_ce = NULL;
 
@@ -1918,7 +1918,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		die_errno("unable to map index file");
 	close(fd);
 
-	hdr = mmap;
+	hdr = (const struct cache_header *)mmap;
 	if (verify_hdr(hdr, mmap_size) < 0)
 		goto unmap;
 
@@ -1943,7 +1943,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		struct cache_entry *ce;
 		unsigned long consumed;
 
-		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+		disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
 		ce = create_from_disk(istate, disk_ce, &consumed, previous_ce);
 		set_index_entry(istate, i, ce);
 
@@ -1961,21 +1961,20 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		 * in 4-byte network byte order.
 		 */
 		uint32_t extsize;
-		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
-		extsize = ntohl(extsize);
+		extsize = get_be32(mmap + src_offset + 4);
 		if (read_index_extension(istate,
-					 (const char *) mmap + src_offset,
-					 (char *) mmap + src_offset + 8,
+					 mmap + src_offset,
+					 mmap + src_offset + 8,
 					 extsize) < 0)
 			goto unmap;
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	munmap((void *)mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
-	munmap(mmap, mmap_size);
+	munmap((void *)mmap, mmap_size);
 	die("index file corrupt");
 }
 

From patchwork Wed Sep 26 19:54:38 2018
X-Patchwork-Submitter: Ben Peart
X-Patchwork-Id: 10616735
From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart, Ben Peart
Subject: [PATCH v6 3/7] eoie: add End of Index Entry (EOIE) extension
Date: Wed, 26 Sep 2018 15:54:38 -0400
Message-Id: <20180926195442.1380-4-benpeart@microsoft.com>
X-Mailer: git-send-email 2.18.0.windows.1
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>

The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions.  Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.

Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.

The extension consists of:

- 32-bit offset to the end of the index entries

- 160-bit SHA-1 over the extension types and their sizes (but not
  their contents).  E.g.
  if we have "TREE" extension that is N-bytes long, "REUC" extension
  that is M-bytes long, followed by "EOIE", then the hash would be:

  SHA-1("TREE" + <binary representation of N> +
	"REUC" + <binary representation of M>)

Signed-off-by: Ben Peart
---
 Documentation/technical/index-format.txt |  23 ++++
 read-cache.c                             | 151 +++++++++++++++++++++--
 t/README                                 |   5 +
 t/t1700-split-index.sh                   |   1 +
 4 files changed, 172 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index db3572626b..6bc2d90f7f 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -314,3 +314,26 @@ The remaining data of each directory block is grouped by type:
 
   - An ewah bitmap, the n-th bit indicates whether the n-th index entry
     is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+  The End of Index Entry (EOIE) is used to locate the end of the variable
+  length index entries and the beginning of the extensions.  Code can take
+  advantage of this to quickly locate the index extensions without having
+  to parse through all of the index entries.
+
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+  The extension consists of:
+
+  - 32-bit offset to the end of the index entries
+
+  - 160-bit SHA-1 over the extension types and their sizes (but not
+	their contents).  E.g. if we have "TREE" extension that is N-bytes
+	long, "REUC" extension that is M-bytes long, followed by "EOIE",
+	then the hash would be:
+
+	SHA-1("TREE" + <binary representation of N> +
+		"REUC" + <binary representation of M>)
diff --git a/read-cache.c b/read-cache.c
index 6ba99e2c96..80255d3088 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -43,6 +43,7 @@
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E	  /* "FSMN" */
+#define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945	/* "EOIE" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1693,6 +1694,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_FSMONITOR:
 		read_fsmonitor_extension(istate, data, sz);
 		break;
+	case CACHE_EXT_ENDOFINDEXENTRIES:
+		/* already handled in do_read_index() */
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1883,6 +1887,9 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
 	return ondisk_size + entries * per_entry;
 }
 
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset);
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2190,11 +2197,15 @@ static int ce_write(git_hash_ctx *context, int fd, void *data, unsigned int len)
 	return 0;
 }
 
-static int write_index_ext_header(git_hash_ctx *context, int fd,
-				  unsigned int ext, unsigned int sz)
+static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx *eoie_context,
+				  int fd, unsigned int ext, unsigned int sz)
 {
 	ext = htonl(ext);
 	sz = htonl(sz);
+	if (eoie_context) {
+		the_hash_algo->update_fn(eoie_context, &ext, 4);
+		the_hash_algo->update_fn(eoie_context, &sz, 4);
+	}
 	return ((ce_write(context, fd, &ext, 4) < 0) ||
 		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
@@ -2437,7 +2448,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 {
 	uint64_t start = getnanotime();
 	int newfd = tempfile->fd;
-	git_hash_ctx c;
+	git_hash_ctx c, eoie_c;
 	struct cache_header hdr;
 	int i, err = 0, removed, extended, hdr_version;
 	struct cache_entry **cache = istate->cache;
@@ -2446,6 +2457,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct ondisk_cache_entry_extended ondisk;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int drop_cache_tree = istate->drop_cache_tree;
+	off_t offset;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2479,6 +2491,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
+	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
 	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
 
 	for (i = 0; i < entries; i++) {
@@ -2512,11 +2525,14 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		return err;
 
 	/* Write extension data here */
+	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
+	the_hash_algo->init_fn(&eoie_c);
+
 	if (!strip_extensions && istate->split_index) {
 		struct strbuf sb = STRBUF_INIT;
 
 		err = write_link_extension(&sb, istate) < 0 ||
-			write_index_ext_header(&c, newfd, CACHE_EXT_LINK,
+			write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_LINK,
 					       sb.len) < 0 ||
 			ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
@@ -2527,7 +2543,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_TREE, sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
@@ -2537,7 +2553,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_RESOLVE_UNDO,
 					     sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
@@ -2548,7 +2564,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		write_untracked_extension(&sb, istate->untracked);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_UNTRACKED,
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_UNTRACKED,
 					     sb.len) < 0 ||
 			ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
@@ -2559,7 +2575,23 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		struct strbuf sb = STRBUF_INIT;
 
 		write_fsmonitor_extension(&sb, istate);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+
+	/*
+	 * CACHE_EXT_ENDOFINDEXENTRIES must be written as the last entry before the SHA1
+	 * so that it can be found and processed before all the index entries are
+	 * read.
+	 */
+	if (!strip_extensions && offset && !git_env_bool("GIT_TEST_DISABLE_EOIE", 0)) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_eoie_extension(&sb, &eoie_c, offset);
+		err = write_index_ext_header(&c, NULL, newfd, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
@@ -2975,3 +3007,106 @@ int should_validate_cache_entries(void)
 
 	return validate_index_cache_entries;
 }
+
+#define EOIE_SIZE (4 + GIT_SHA1_RAWSZ) /* <4-byte offset> + <20-byte hash> */
+#define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + <4-byte length> + EOIE_SIZE */
+
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
+{
+	/*
+	 * The end of index entries (EOIE) extension is guaranteed to be last
+	 * so that it can be found by scanning backwards from the EOF.
+	 *
+	 * "EOIE"
+	 * <4-byte length>
+	 * <4-byte offset>
+	 * <20-byte hash>
+	 */
+	const char *index, *eoie;
+	uint32_t extsize;
+	size_t offset, src_offset;
+	unsigned char hash[GIT_MAX_RAWSZ];
+	git_hash_ctx c;
+
+	/* ensure we have an index big enough to contain an EOIE extension */
+	if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + the_hash_algo->rawsz)
+		return 0;
+
+	/* validate the extension signature */
+	index = eoie = mmap + mmap_size - EOIE_SIZE_WITH_HEADER - the_hash_algo->rawsz;
+	if (CACHE_EXT(index) != CACHE_EXT_ENDOFINDEXENTRIES)
+		return 0;
+	index += sizeof(uint32_t);
+
+	/* validate the extension size */
+	extsize = get_be32(index);
+	if (extsize != EOIE_SIZE)
+		return 0;
+	index += sizeof(uint32_t);
+
+	/*
+	 * Validate the offset we're going to look for the first extension
+	 * signature is after the index header and before the eoie extension.
+	 */
+	offset = get_be32(index);
+	if (mmap + offset < mmap + sizeof(struct cache_header))
+		return 0;
+	if (mmap + offset >= eoie)
+		return 0;
+	index += sizeof(uint32_t);
+
+	/*
+	 * The hash is computed over extension types and their sizes (but not
+	 * their contents).  E.g. if we have "TREE" extension that is N-bytes
+	 * long, "REUC" extension that is M-bytes long, followed by "EOIE",
+	 * then the hash would be:
+	 *
+	 * SHA-1("TREE" + <binary representation of N> +
+	 *	 "REUC" + <binary representation of M>)
+	 */
+	src_offset = offset;
+	the_hash_algo->init_fn(&c);
+	while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
+		/* After an array of active_nr index entries,
+		 * there can be arbitrary number of extended
+		 * sections, each of which is prefixed with
+		 * extension name (4-byte) and section length
+		 * in 4-byte network byte order.
+		 */
+		uint32_t extsize;
+		memcpy(&extsize, mmap + src_offset + 4, 4);
+		extsize = ntohl(extsize);
+
+		/* verify the extension size isn't so large it will wrap around */
+		if (src_offset + 8 + extsize < src_offset)
+			return 0;
+
+		the_hash_algo->update_fn(&c, mmap + src_offset, 8);
+
+		src_offset += 8;
+		src_offset += extsize;
+	}
+	the_hash_algo->final_fn(hash, &c);
+	if (hashcmp(hash, (const unsigned char *)index))
+		return 0;
+
+	/* Validate that the extension offsets returned us back to the eoie extension. */
+	if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
+		return 0;
+
+	return offset;
+}
+
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset)
+{
+	uint32_t buffer;
+	unsigned char hash[GIT_MAX_RAWSZ];
+
+	/* offset */
+	put_be32(&buffer, offset);
+	strbuf_add(sb, &buffer, sizeof(uint32_t));
+
+	/* hash */
+	the_hash_algo->final_fn(hash, eoie_context);
+	strbuf_add(sb, hash, the_hash_algo->rawsz);
+}
diff --git a/t/README b/t/README
index 3ea6c85460..aa33ac4f26 100644
--- a/t/README
+++ b/t/README
@@ -327,6 +327,11 @@ GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
 be written after every 'git commit' command, and overrides the
 'core.commitGraph' setting to true.
 
+GIT_TEST_DISABLE_EOIE=<boolean> disables writing the EOIE extension.
+This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
+as they currently hard code SHA values for the index which are no longer
+valid due to the addition of the EOIE extension.
+
 Naming Tests
 ------------
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index be22398a85..1f168378c8 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -7,6 +7,7 @@ test_description='split index mode tests'
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
 sane_unset GIT_FSMONITOR_TEST
+GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
 
 test_expect_success 'enable split index' '
 	git config splitIndex.maxPercentChange 100 &&

From patchwork Wed Sep 26 19:54:39 2018
X-Patchwork-Submitter: Ben Peart
X-Patchwork-Id: 10616733
From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart, Ben Peart
Subject: [PATCH v6 4/7] config: add new index.threads config setting
Date: Wed, 26 Sep 2018 15:54:39 -0400
Message-Id: <20180926195442.1380-5-benpeart@microsoft.com>
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>

Add support for a new index.threads config setting which will be used to
control the threading code in do_read_index().

A value of 0 will tell the index code to automatically determine the
correct number of threads to use. A value of 1 will make the code single
threaded. A value greater than 1 will set the maximum number of threads
to use.

For testing purposes, this setting can be overridden by setting the
GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.

Signed-off-by: Ben Peart
---
 Documentation/config.txt |  7 +++++++
 config.c                 | 18 ++++++++++++++++++
 config.h                 |  1 +
 t/README                 |  5 +++++
 t/t1700-split-index.sh   |  1 +
 5 files changed, 32 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ad0f4510c3..8fd973b76b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2413,6 +2413,13 @@ imap::
 	The configuration variables in the 'imap' section are described
 	in linkgit:git-imap-send[1].
 
+index.threads::
+	Specifies the number of threads to spawn when loading the index.
+	This is meant to reduce index load time on multiprocessor machines.
+	Specifying 0 or 'true' will cause Git to auto-detect the number of
+	CPUs and set the number of threads accordingly. Specifying 1 or
+	'false' will disable multithreading. Defaults to 'true'.
+
 index.version::
 	Specify the version with which new index files should be
 	initialized. This does not affect existing repositories.
diff --git a/config.c b/config.c
index 3461993f0a..2ee29f6f86 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,24 @@ int git_config_get_fsmonitor(void)
 	return 0;
 }
 
+int git_config_get_index_threads(void)
+{
+	int is_bool, val = 0;
+
+	val = git_env_ulong("GIT_TEST_INDEX_THREADS", 0);
+	if (val)
+		return val;
+
+	if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
+		if (is_bool)
+			return val ? 0 : 1;
+		else
+			return val;
+	}
+
+	return 0; /* auto */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_index_threads(void);
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
 
diff --git a/t/README b/t/README
index aa33ac4f26..0fcecf4500 100644
--- a/t/README
+++ b/t/README
@@ -332,6 +332,11 @@ This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
 as they currently hard code SHA values for the index which are no longer
 valid due to the addition of the EOIE extension.
 
+GIT_TEST_INDEX_THREADS=<n> enables exercising the multi-threaded loading
+of the index for the whole test suite by bypassing the default number of
+cache entries and thread minimums. Setting this to 1 will make the
+index loading single threaded.
+
 Naming Tests
 ------------
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 1f168378c8..ab205954cf 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -8,6 +8,7 @@ test_description='split index mode tests'
 sane_unset GIT_TEST_SPLIT_INDEX
 sane_unset GIT_FSMONITOR_TEST
 GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
+GIT_TEST_INDEX_THREADS=1; export GIT_TEST_INDEX_THREADS
 
 test_expect_success 'enable split index' '
 	git config splitIndex.maxPercentChange 100 &&

From patchwork Wed Sep 26 19:54:40 2018
X-Patchwork-Submitter: Ben Peart
X-Patchwork-Id: 10616737
From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart, Ben Peart
Subject: [PATCH v6 5/7] read-cache: load cache extensions on a worker thread
Date: Wed, 26 Sep 2018 15:54:40 -0400
Message-Id: <20180926195442.1380-6-benpeart@microsoft.com>
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>

This patch helps address the CPU cost of loading the index by loading
the cache extensions on a worker thread in parallel with loading the
cache entries.

In some cases, loading the extensions takes longer than loading the
cache entries, so this patch utilizes the new EOIE to start the thread
that loads the extensions before loading all the cache entries in
parallel.

This is possible because the current extensions don't access the cache
entries in the index_state structure, so it does not matter that the
entries don't all exist yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index, so they don't have
access to the cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.
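
To illustrate why the extension walk can be handed to a worker, here is a
minimal standalone sketch (not the code in this patch) of iterating over
extension sections given only a buffer and a starting offset. The names
read_be32() and walk_extensions() are hypothetical; the layout assumed is
the one used by the index extensions: a 4-byte signature, a 4-byte
big-endian length, then the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value from a possibly unaligned buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: walk the sections stored in buf[start..end).
 * Each section is a 4-byte signature, a 4-byte big-endian length and
 * that many bytes of payload. Returns the number of sections seen, or
 * -1 if a header or payload would run past 'end'.
 */
static int walk_extensions(const unsigned char *buf, size_t start, size_t end)
{
	size_t pos = start;
	int count = 0;

	assert(start <= end);
	while (pos < end) {
		uint32_t len;

		if (end - pos < 8)
			return -1;	/* truncated header */
		len = read_be32(buf + pos + 4);
		if (len > end - pos - 8)
			return -1;	/* payload overruns the region */
		pos += 8 + (size_t)len;
		count++;
	}
	return count;
}
```

Because the walk reads nothing but the mapped bytes, it does not depend
on how far the cache entry loading has progressed.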

I used p0002-read-cache.sh to generate some performance data:

	Test w/100,000 files reduced the time by 0.53%
	Test w/1,000,000 files reduced the time by 27.78%

Signed-off-by: Ben Peart
---
 read-cache.c | 97 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 81 insertions(+), 16 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 80255d3088..8da21c9273 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -23,6 +23,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#include "thread-utils.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1890,6 +1891,46 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset);
 
+struct load_index_extensions
+{
+#ifndef NO_PTHREADS
+	pthread_t pthread;
+#endif
+	struct index_state *istate;
+	const char *mmap;
+	size_t mmap_size;
+	unsigned long src_offset;
+};
+
+static void *load_index_extensions(void *_data)
+{
+	struct load_index_extensions *p = _data;
+	unsigned long src_offset = p->src_offset;
+
+	while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
+		/* After an array of active_nr index entries,
+		 * there can be arbitrary number of extended
+		 * sections, each of which is prefixed with
+		 * extension name (4-byte) and section length
+		 * in 4-byte network byte order.
+		 */
+		uint32_t extsize;
+		memcpy(&extsize, p->mmap + src_offset + 4, 4);
+		extsize = ntohl(extsize);
+		if (read_index_extension(p->istate,
+			p->mmap + src_offset,
+			p->mmap + src_offset + 8,
+			extsize) < 0) {
+			munmap((void *)p->mmap, p->mmap_size);
+			die(_("index file corrupt"));
+		}
+		src_offset += 8;
+		src_offset += extsize;
+	}
+
+	return NULL;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1900,6 +1941,11 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	const char *mmap;
 	size_t mmap_size;
 	const struct cache_entry *previous_ce = NULL;
+	struct load_index_extensions p;
+	size_t extension_offset = 0;
+#ifndef NO_PTHREADS
+	int nr_threads;
+#endif
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1936,6 +1982,30 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
 	istate->initialized = 1;
 
+	p.istate = istate;
+	p.mmap = mmap;
+	p.mmap_size = mmap_size;
+
+#ifndef NO_PTHREADS
+	nr_threads = git_config_get_index_threads();
+	if (!nr_threads)
+		nr_threads = online_cpus();
+
+	if (nr_threads > 1) {
+		extension_offset = read_eoie_extension(mmap, mmap_size);
+		if (extension_offset) {
+			int err;
+
+			p.src_offset = extension_offset;
+			err = pthread_create(&p.pthread, NULL, load_index_extensions, &p);
+			if (err)
+				die(_("unable to create load_index_extensions thread: %s"), strerror(err));
+
+			nr_threads--;
+		}
+	}
+#endif
+
 	if (istate->version == 4) {
 		mem_pool_init(&istate->ce_mem_pool,
 			estimate_cache_size_from_compressed(istate->cache_nr));
@@ -1960,22 +2030,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
-	while (src_offset <= mmap_size - the_hash_algo->rawsz - 8) {
-		/* After an array of active_nr index entries,
-		 * there can be arbitrary number of extended
-		 * sections, each of which is prefixed with
-		 * extension name (4-byte) and section length
-		 * in 4-byte network byte order.
-		 */
-		uint32_t extsize;
-		extsize = get_be32(mmap + src_offset + 4);
-		if (read_index_extension(istate,
-			mmap + src_offset,
-			mmap + src_offset + 8,
-			extsize) < 0)
-			goto unmap;
-		src_offset += 8;
-		src_offset += extsize;
+	/* if we created a thread, join it otherwise load the extensions on the primary thread */
+#ifndef NO_PTHREADS
+	if (extension_offset) {
+		int ret = pthread_join(p.pthread, NULL);
+		if (ret)
+			die(_("unable to join load_index_extensions thread: %s"), strerror(ret));
+	}
+#endif
+	if (!extension_offset) {
+		p.src_offset = src_offset;
+		load_index_extensions(&p);
 	}
 	munmap((void *)mmap, mmap_size);
 	return istate->cache_nr;

From patchwork Wed Sep 26 19:54:41 2018
X-Patchwork-Submitter: Ben Peart
X-Patchwork-Id: 10616739
From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart
Subject: [PATCH v6 6/7] ieot: add Index Entry Offset Table (IEOT) extension
Date: Wed, 26 Sep 2018 15:54:41 -0400
Message-Id: <20180926195442.1380-7-benpeart@microsoft.com>
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>

This patch enables addressing the CPU cost of loading the index by adding
additional data to the index that will allow us to efficiently
multi-thread the loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file. To make
this work for V4 indexes, when writing the cache entries, it periodically
"resets" the prefix-compression by encoding the current entry as if the
path name for the previous entry is completely different and saves the
offset of that entry in the IEOT. Basically, with V4 indexes, it
generates offsets into blocks of prefix-compressed entries.
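
As a rough illustration of the block bookkeeping described above (a
standalone sketch under stated assumptions, not the code in this patch),
the following records one (offset, count) row every 'per_block' entries.
The names block_entry and build_offset_table() are hypothetical, and the
entry sizes are passed in directly rather than computed from real cache
entries.

```c
#include <assert.h>
#include <stddef.h>

/* One table row: where a block starts and how many entries it holds. */
struct block_entry {
	size_t offset;
	int nr;
};

/*
 * Hypothetical sketch: entry_size[i] is the on-disk size of entry i and
 * first_offset is where entry 0 starts. A new block begins every
 * per_block entries. Returns the number of rows written to 'out', or
 * -1 if 'out' (capacity max_blocks) is too small.
 */
static int build_offset_table(const size_t *entry_size, int entries,
			      size_t first_offset, int per_block,
			      struct block_entry *out, int max_blocks)
{
	size_t pos = first_offset;		/* current write position */
	size_t block_start = first_offset;	/* start of the open block */
	int i, nr = 0, blocks = 0;

	assert(per_block > 0);
	for (i = 0; i < entries; i++) {
		if (i && i % per_block == 0) {
			/* close the open block and start a new one */
			if (blocks == max_blocks)
				return -1;
			out[blocks].offset = block_start;
			out[blocks].nr = nr;
			blocks++;
			/* a V4 writer would also reset prefix compression here */
			block_start = pos;
			nr = 0;
		}
		pos += entry_size[i];
		nr++;
	}
	if (nr) {
		/* record the final, possibly short, block */
		if (blocks == max_blocks)
			return -1;
		out[blocks].offset = block_start;
		out[blocks].nr = nr;
		blocks++;
	}
	return blocks;
}
```

A reader that trusts such a table can start decoding at any row's offset
without having seen the entries before it, which is what makes the
per-block reset necessary for V4 indexes.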

Signed-off-by: Ben Peart
---
 Documentation/technical/index-format.txt |  18 +++
 read-cache.c                             | 166 +++++++++++++++++++++++
 2 files changed, 184 insertions(+)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index 6bc2d90f7f..7c4d67aa6a 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -337,3 +337,21 @@ The remaining data of each directory block is grouped by type:
 
 	SHA-1("TREE" + <binary representation of N> +
 		"REUC" + <binary representation of M>)
+
+== Index Entry Offset Table
+
+  The Index Entry Offset Table (IEOT) is used to help address the CPU
+  cost of loading the index by enabling multi-threading the process of
+  converting cache entries from the on-disk format to the in-memory format.
+  The signature for this extension is { 'I', 'E', 'O', 'T' }.
+
+  The extension consists of:
+
+  - 32-bit version (currently 1)
+
+  - A number of index offset entries each consisting of:
+
+    - 32-bit offset from the beginning of the file to the first cache entry
+	in this block of entries.
+
+    - 32-bit count of cache entries in this block
diff --git a/read-cache.c b/read-cache.c
index 8da21c9273..9b0554d4e6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -45,6 +45,7 @@
 #define CACHE_EXT_UNTRACKED 0x554E5452 /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E /* "FSMN" */
 #define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */
+#define CACHE_EXT_INDEXENTRYOFFSETTABLE 0x49454F54 /* "IEOT" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1696,6 +1697,7 @@ static int read_index_extension(struct index_state *istate,
 		read_fsmonitor_extension(istate, data, sz);
 		break;
 	case CACHE_EXT_ENDOFINDEXENTRIES:
+	case CACHE_EXT_INDEXENTRYOFFSETTABLE:
 		/* already handled in do_read_index() */
 		break;
 	default:
@@ -1888,6 +1890,23 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
 	return ondisk_size + entries * per_entry;
 }
 
+struct index_entry_offset
+{
+	/* starting byte offset into index file, count of index entries in this block */
+	int offset, nr;
+};
+
+struct index_entry_offset_table
+{
+	int nr;
+	struct index_entry_offset entries[0];
+};
+
+#ifndef NO_PTHREADS
+static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset);
+static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot);
+#endif
+
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset);
 
@@ -1931,6 +1950,15 @@ static void *load_index_extensions(void *_data)
 	return NULL;
 }
 
+/*
+ * Mostly randomly chosen maximum thread counts: we
+ * cap the parallelism to online_cpus() threads, and we want
+ * to have at least 10000 cache entries per thread for it to
+ * be worth starting a thread.
+ */
+
+#define THREAD_COST (10000)
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2523,6 +2551,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int drop_cache_tree = istate->drop_cache_tree;
 	off_t offset;
+	int ieot_work = 1;
+	struct index_entry_offset_table *ieot = NULL;
+	int nr;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2556,7 +2587,33 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
+#ifndef NO_PTHREADS
+	if (!strip_extensions && (nr = git_config_get_index_threads()) != 1) {
+		int ieot_blocks, cpus;
+
+		/*
+		 * ensure default number of ieot blocks maps evenly to the
+		 * default number of threads that will process them
+		 */
+		if (!nr) {
+			ieot_blocks = istate->cache_nr / THREAD_COST;
+			if (ieot_blocks < 1)
+				ieot_blocks = 1;
+			cpus = online_cpus();
+			if (ieot_blocks > cpus - 1)
+				ieot_blocks = cpus - 1;
+		} else {
+			ieot_blocks = nr;
+		}
+		ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
+			+ (ieot_blocks * sizeof(struct index_entry_offset)));
+		ieot->nr = 0;
+		ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
+	}
+#endif
+
 	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
+	nr = 0;
 	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
 
 	for (i = 0; i < entries; i++) {
@@ -2578,11 +2635,31 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 			drop_cache_tree = 1;
 		}
+		if (ieot && i && (i % ieot_work == 0)) {
+			ieot->entries[ieot->nr].nr = nr;
+			ieot->entries[ieot->nr].offset = offset;
+			ieot->nr++;
+			/*
+			 * If we have a V4 index, set the first byte to an invalid
+			 * character to ensure there is nothing common with the previous
+			 * entry
+			 */
+			if (previous_name)
+				previous_name->buf[0] = 0;
+			nr = 0;
+			offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
+		}
 		if (ce_write_entry(&c, newfd, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0)
 			err = -1;
 
 		if (err)
 			break;
+		nr++;
+	}
+	if (ieot && nr) {
+		ieot->entries[ieot->nr].nr = nr;
+		ieot->entries[ieot->nr].offset = offset;
+		ieot->nr++;
 	}
 	strbuf_release(&previous_name_buf);
 
@@ -2593,6 +2670,24 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
 	the_hash_algo->init_fn(&eoie_c);
 
+	/*
+	 * Let's write out CACHE_EXT_INDEXENTRYOFFSETTABLE first so that we
+	 * can minimize the number of extensions we have to scan through to
+	 * find it during load.
+	 */
+#ifndef NO_PTHREADS
+	if (!strip_extensions && ieot) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_ieot_extension(&sb, ieot);
+		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_INDEXENTRYOFFSETTABLE, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+#endif
+
 	if (!strip_extensions && istate->split_index) {
 		struct strbuf sb = STRBUF_INIT;
 
@@ -3175,3 +3270,74 @@ static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context,
 	the_hash_algo->final_fn(hash, eoie_context);
 	strbuf_add(sb, hash, the_hash_algo->rawsz);
 }
+
+#ifndef NO_PTHREADS
+#define IEOT_VERSION (1)
+
+static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset)
+{
+	const char *index = NULL;
+	uint32_t extsize, ext_version;
+	struct index_entry_offset_table *ieot;
+	int i, nr;
+
+	/* find the IEOT extension */
+	if (!offset)
+		return NULL;
+	while (offset <= mmap_size - the_hash_algo->rawsz - 8) {
+		extsize = get_be32(mmap + offset + 4);
+		if (CACHE_EXT((mmap + offset)) == CACHE_EXT_INDEXENTRYOFFSETTABLE) {
+			index = mmap + offset + 4 + 4;
+			break;
+		}
+		offset += 8;
+		offset += extsize;
+	}
+	if (!index)
+		return NULL;
+
+	/* validate the version is IEOT_VERSION */
+	ext_version = get_be32(index);
+	if (ext_version != IEOT_VERSION)
+		return NULL;
+	index += sizeof(uint32_t);
+
+	/* extension size - version bytes / bytes per entry */
+	nr = (extsize - sizeof(uint32_t)) / (sizeof(uint32_t) + sizeof(uint32_t));
+	if (!nr)
+		return NULL;
+	ieot = xmalloc(sizeof(struct index_entry_offset_table)
+		+ (nr * sizeof(struct index_entry_offset)));
+	ieot->nr = nr;
+	for (i = 0; i < nr; i++) {
+		ieot->entries[i].offset = get_be32(index);
+		index += sizeof(uint32_t);
+		ieot->entries[i].nr = get_be32(index);
+		index += sizeof(uint32_t);
+	}
+
+	return ieot;
+}
+
+static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot)
+{
+	uint32_t buffer;
+	int i;
+
+	/* version */
+	put_be32(&buffer, IEOT_VERSION);
+	strbuf_add(sb, &buffer, sizeof(uint32_t));
+
+	/* ieot */
+	for (i = 0; i < ieot->nr; i++) {
+
+		/* offset */
+		put_be32(&buffer, ieot->entries[i].offset);
+		strbuf_add(sb, &buffer, sizeof(uint32_t));
+
+		/* count */
+		put_be32(&buffer, ieot->entries[i].nr);
+		strbuf_add(sb, &buffer, sizeof(uint32_t));
+	}
+}
+#endif

From patchwork Wed Sep 26 19:54:42 2018
X-Patchwork-Submitter: Ben Peart
X-Patchwork-Id: 10616741
From: Ben Peart
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart, Ben Peart
Subject: [PATCH v6 7/7] read-cache: load cache entries on worker threads
Date: Wed, 26 Sep 2018 15:54:42 -0400
Message-Id: <20180926195442.1380-8-benpeart@microsoft.com>
In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com>
References: <20180823154053.20212-1-benpeart@microsoft.com>
 <20180926195442.1380-1-benpeart@microsoft.com>

This patch helps address the CPU cost of loading the index by utilizing
the Index Entry Offset Table (IEOT) to divide loading and conversion of
the cache entries across multiple threads in parallel.

I used p0002-read-cache.sh to generate some performance data:

	Test w/100,000 files reduced the time by 32.24%
	Test w/1,000,000 files increased the time by 4.77%

Note that in the 1,000,000 files case, multi-threading the cache entry
parsing does not yield a performance win. This is because the cost to
parse the index extensions in this repo far outweighs the cost of
loading the cache entries.

The high cost of parsing the index extensions is driven by the cache tree
and the untracked cache extensions. As this is currently the longest
pole, any reduction in this time will reduce the overall index load time,
so it is worth further investigation in another patch series.

Signed-off-by: Ben Peart
---
 read-cache.c | 224 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 189 insertions(+), 35 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 9b0554d4e6..f5d766088d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1720,7 +1720,8 @@ int read_index(struct index_state *istate)
 	return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *create_from_disk(struct index_state *istate,
+static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
+					    unsigned int version,
 					    struct ondisk_cache_entry *ondisk,
 					    unsigned long *ent_size,
 					    const struct cache_entry *previous_ce)
@@ -1737,7 +1738,7 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 	 * number of bytes to be stripped from the end of the previous name,
 	 * and the bytes to append to the result, to come up with its name.
 	 */
-	int expand_name_field = istate->version == 4;
+	int expand_name_field = version == 4;
 
 	/* On-disk flags are just 16 bits */
 	flags = get_be16(&ondisk->flags);
@@ -1761,16 +1762,17 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 		const unsigned char *cp = (const unsigned char *)name;
 		size_t strip_len, previous_len;
 
-		previous_len = previous_ce ? previous_ce->ce_namelen : 0;
+		/* If we're at the beginning of a block, ignore the previous name */
 		strip_len = decode_varint(&cp);
-		if (previous_len < strip_len) {
-			if (previous_ce)
+		if (previous_ce) {
+			previous_len = previous_ce->ce_namelen;
+			if (previous_len < strip_len)
 				die(_("malformed name field in the index, near path '%s'"),
-				    previous_ce->name);
-			else
-				die(_("malformed name field in the index in the first path"));
+					previous_ce->name);
+			copy_len = previous_len - strip_len;
+		} else {
+			copy_len = 0;
 		}
-		copy_len = previous_len - strip_len;
 		name = (const char *)cp;
 	}
 
@@ -1780,7 +1782,7 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 		len += copy_len;
 	}
 
-	ce = mem_pool__ce_alloc(istate->ce_mem_pool, len);
+	ce = mem_pool__ce_alloc(ce_mem_pool, len);
 
 	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
 	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
@@ -1950,6 +1952,52 @@ static void *load_index_extensions(void *_data)
 	return NULL;
 }
 
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+			struct mem_pool *ce_mem_pool, int offset, int nr, const char *mmap,
+			unsigned long start_offset, const struct cache_entry *previous_ce)
+{
+	int i;
+	unsigned long src_offset = start_offset;
+
+	for (i = offset; i < offset + nr; i++) {
+		struct ondisk_cache_entry *disk_ce;
+		struct cache_entry *ce;
+		unsigned long consumed;
+
+		disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
+		ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
+		set_index_entry(istate, i, ce);
+
+		src_offset += consumed;
+		previous_ce = ce;
+	}
+	return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state *istate,
+			const char *mmap, size_t mmap_size, unsigned long src_offset)
+{
+	unsigned long consumed;
+
+	if (istate->version == 4) {
+		mem_pool_init(&istate->ce_mem_pool,
+				estimate_cache_size_from_compressed(istate->cache_nr));
+	} else {
+		mem_pool_init(&istate->ce_mem_pool,
+				estimate_cache_size(mmap_size, istate->cache_nr));
+	}
+
+	consumed = load_cache_entry_block(istate, istate->ce_mem_pool,
+					0, istate->cache_nr, mmap, src_offset, NULL);
+	return consumed;
+}
+
+#ifndef NO_PTHREADS
+
 /*
  * Mostly randomly chosen maximum thread counts: we
  * cap the parallelism to online_cpus() threads, and we want
@@ -1959,20 +2007,125 @@ static void *load_index_extensions(void *_data)
 
 #define THREAD_COST		(10000)
 
+struct load_cache_entries_thread_data
+{
+	pthread_t pthread;
+	struct index_state *istate;
+	struct mem_pool *ce_mem_pool;
+	int offset;
+	const char *mmap;
+	struct index_entry_offset_table *ieot;
+	int ieot_offset;	/* starting index into the ieot array */
+	int ieot_work;		/* count of ieot entries to process */
+	unsigned long consumed;	/* return # of bytes in index file processed */
+};
+
+/*
+ * A thread proc to run the load_cache_entries() computation
+ * across multiple background threads.
+ */
+static void *load_cache_entries_thread(void *_data)
+{
+	struct load_cache_entries_thread_data *p = _data;
+	int i;
+
+	/* iterate across all ieot blocks assigned to this thread */
+	for (i = p->ieot_offset; i < p->ieot_offset + p->ieot_work; i++) {
+		p->consumed += load_cache_entry_block(p->istate, p->ce_mem_pool, p->offset, p->ieot->entries[i].nr, p->mmap, p->ieot->entries[i].offset, NULL);
+		p->offset += p->ieot->entries[i].nr;
+	}
+	return NULL;
+}
+
+static unsigned long load_cache_entries_threaded(struct index_state *istate, const char *mmap, size_t mmap_size,
+			unsigned long src_offset, int nr_threads, struct index_entry_offset_table *ieot)
+{
+	int i, offset, ieot_work, ieot_offset, err;
+	struct load_cache_entries_thread_data *data;
+	unsigned long consumed = 0;
+	int nr;
+
+	/* a little sanity checking */
+	if (istate->name_hash_initialized)
+		BUG("the name hash isn't thread safe");
+
+	mem_pool_init(&istate->ce_mem_pool, 0);
+	data = xcalloc(nr_threads, sizeof(struct load_cache_entries_thread_data));
+
+	/* ensure we have no more threads than we have blocks to process */
+	if (nr_threads > ieot->nr)
+		nr_threads = ieot->nr;
+	data = xcalloc(nr_threads, sizeof(struct load_cache_entries_thread_data));
+
+	offset = ieot_offset = 0;
+	ieot_work = DIV_ROUND_UP(ieot->nr, nr_threads);
+	for (i = 0; i < nr_threads; i++) {
+		struct load_cache_entries_thread_data *p = &data[i];
+		int j;
+
+		if (ieot_offset + ieot_work > ieot->nr)
+			ieot_work = ieot->nr - ieot_offset;
+
+		p->istate = istate;
+		p->offset = offset;
+		p->mmap = mmap;
+		p->ieot = ieot;
+		p->ieot_offset = ieot_offset;
+		p->ieot_work = ieot_work;
+
+		/* create a mem_pool for each thread */
+		nr = 0;
+		for (j = p->ieot_offset; j < p->ieot_offset + p->ieot_work; j++)
+			nr += p->ieot->entries[j].nr;
+		if (istate->version == 4) {
+			mem_pool_init(&p->ce_mem_pool,
+				estimate_cache_size_from_compressed(nr));
+		}
+		else {
+			mem_pool_init(&p->ce_mem_pool,
+				estimate_cache_size(mmap_size, nr));
+		}
+
+		err = pthread_create(&p->pthread, NULL, load_cache_entries_thread, p);
+		if (err)
+			die(_("unable to create load_cache_entries thread: %s"), strerror(err));
+
+		/* increment by the number of cache entries in the ieot block being processed */
+		for (j = 0; j < ieot_work; j++)
+			offset += ieot->entries[ieot_offset + j].nr;
+		ieot_offset += ieot_work;
+	}
+
+	for (i = 0; i < nr_threads; i++) {
+		struct load_cache_entries_thread_data *p = &data[i];
+
+		err = pthread_join(p->pthread, NULL);
+		if (err)
+			die(_("unable to join load_cache_entries thread: %s"), strerror(err));
+		mem_pool_combine(istate->ce_mem_pool, p->ce_mem_pool);
+		consumed += p->consumed;
+	}
+
+	free(data);
+
+	return consumed;
+}
+#endif
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
-	int fd, i;
+	int fd;
 	struct stat st;
 	unsigned long src_offset;
 	const struct cache_header *hdr;
 	const char *mmap;
 	size_t mmap_size;
-	const struct cache_entry *previous_ce = NULL;
 	struct load_index_extensions p;
 	size_t extension_offset = 0;
 #ifndef NO_PTHREADS
-	int nr_threads;
+	int nr_threads, cpus;
+	struct index_entry_offset_table *ieot = 0;
 #endif
 
 	if (istate->initialized)
@@ -2014,10 +2167,18 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	p.mmap = mmap;
 	p.mmap_size = mmap_size;
 
+	src_offset = sizeof(*hdr);
+
 #ifndef NO_PTHREADS
 	nr_threads = git_config_get_index_threads();
-	if (!nr_threads)
-		nr_threads = online_cpus();
+
+	/* TODO: does creating more threads than cores help? */
+	if (!nr_threads) {
+		nr_threads = istate->cache_nr / THREAD_COST;
+		cpus = online_cpus();
+		if (nr_threads > cpus)
+			nr_threads = cpus;
+	}
 
 	if (nr_threads > 1) {
 		extension_offset = read_eoie_extension(mmap, mmap_size);
@@ -2032,29 +2193,22 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 			nr_threads--;
 		}
 	}
-#endif
-
-	if (istate->version == 4) {
-		mem_pool_init(&istate->ce_mem_pool,
-			      estimate_cache_size_from_compressed(istate->cache_nr));
-	} else {
-		mem_pool_init(&istate->ce_mem_pool,
-			      estimate_cache_size(mmap_size, istate->cache_nr));
-	}
 
-	src_offset = sizeof(*hdr);
-	for (i = 0; i < istate->cache_nr; i++) {
-		struct ondisk_cache_entry *disk_ce;
-		struct cache_entry *ce;
-		unsigned long consumed;
+	/*
+	 * Locate and read the index entry offset table so that we can use it
+	 * to multi-thread the reading of the cache entries.
+	 */
+	if (extension_offset && nr_threads > 1)
+		ieot = read_ieot_extension(mmap, mmap_size, extension_offset);
 
-		disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
-		ce = create_from_disk(istate, disk_ce, &consumed, previous_ce);
-		set_index_entry(istate, i, ce);
+	if (ieot)
+		src_offset += load_cache_entries_threaded(istate, mmap, mmap_size, src_offset, nr_threads, ieot);
+	else
+		src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset);
+#else
+	src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset);
+#endif
 
-		src_offset += consumed;
-		previous_ce = ce;
-	}
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
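
For readers following how load_cache_entries_threaded() splits the IEOT blocks among workers, the arithmetic can be tried out in isolation. The following is only a minimal sketch and is not part of the patch: `struct block`, `struct assignment` and `partition_blocks()` are hypothetical names invented for illustration, and `DIV_ROUND_UP` is redefined locally on the assumption that it is the usual round-up integer division. It mirrors the loop above (clamp the thread count to the block count, give each worker a round-up share of blocks, and advance the running cache-entry offset), without any threads, memory pools or index parsing.

```c
#include <assert.h>

/* Assumed to match the usual round-up division helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for one IEOT entry; only the entry count matters here. */
struct block {
	int nr;	/* number of cache entries in this block */
};

/* Hypothetical per-worker assignment, mirroring the fields the patch fills in. */
struct assignment {
	int ieot_offset;	/* first block handled by this worker */
	int ieot_work;		/* number of blocks handled by this worker */
	int offset;		/* index of the first cache entry this worker writes */
};

/*
 * Split nr_blocks blocks across at most nr_threads workers.
 * Returns the number of workers used (never more than nr_blocks).
 * 'out' must have room for nr_threads assignments.
 */
static int partition_blocks(const struct block *blocks, int nr_blocks,
			    int nr_threads, struct assignment *out)
{
	int i, j, offset = 0, ieot_offset = 0, ieot_work;

	assert(nr_blocks > 0 && nr_threads > 0);

	/* no more workers than blocks to process */
	if (nr_threads > nr_blocks)
		nr_threads = nr_blocks;

	ieot_work = DIV_ROUND_UP(nr_blocks, nr_threads);
	for (i = 0; i < nr_threads; i++) {
		/* the tail workers may get fewer blocks, possibly none */
		if (ieot_offset + ieot_work > nr_blocks)
			ieot_work = nr_blocks - ieot_offset;

		out[i].ieot_offset = ieot_offset;
		out[i].ieot_work = ieot_work;
		out[i].offset = offset;

		/* advance by the cache entries contained in the assigned blocks */
		for (j = 0; j < ieot_work; j++)
			offset += blocks[ieot_offset + j].nr;
		ieot_offset += ieot_work;
	}
	return nr_threads;
}
```

With five blocks holding 10, 20, 30, 40 and 50 entries and four workers, this sketch gives two blocks each to the first two workers, one block to the third, and none to the fourth, which starts at entry offset 150 and has nothing to load.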