From patchwork Mon Dec 28 11:15:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 11991075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5FE9C433DB for ; Mon, 28 Dec 2020 11:17:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A18C4229C4 for ; Mon, 28 Dec 2020 11:17:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727307AbgL1LQy (ORCPT ); Mon, 28 Dec 2020 06:16:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59942 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727019AbgL1LQy (ORCPT ); Mon, 28 Dec 2020 06:16:54 -0500 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5C58C061794 for ; Mon, 28 Dec 2020 03:16:13 -0800 (PST) Received: by mail-wr1-x42f.google.com with SMTP id d13so10983207wrc.13 for ; Mon, 28 Dec 2020 03:16:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=/BNtyX8PkcS7Xyds8I9R8qvbFAHFtzF4geBA1+bG1i4=; b=PplEAYQNRj3R5TOmlrnfGyP8Xsem9qQfc0L+Cg6MnhTyEJZjcPg0HpgPSYR6gLzLLM LBC4L2p1HxpjsvQaTOn2uKySndUws2Lowax2HZAmhiuvizq719eOEvnR/uvcIQweo55p SrnE1DxVA4bZMXsi6TaUyek3yA248EdmMUFbhhKwTIr8XEXXl2nW/qtWdVyOwVWRia4w Ipm0VdIFicu3ZPjqN8MgnI3ijcBM0v4vzBszyKuKlHALZ1dxNaEp7bXpiBk+4rxKPo5J nPkvmGILmW4pkj9N3Dy8GmjNARejPOBElGkRc3KdFwOySGJKaKhZP5SRYaTPtD6u/Hcp gjOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=/BNtyX8PkcS7Xyds8I9R8qvbFAHFtzF4geBA1+bG1i4=; b=YTEWH5uKZFqPVziSGZZ60vLaEo/VHvIj6Ns2P61CRtpx7l0WiRhdNEn1GMyJS700aA 9ql/IFHhK+w4dANJEukLvC3IfMB3ssn3881t1JboBYYgNAdFkKWI+cR1qHidsKChjzC8 E38RTAt8baldn2NDAoOLF2zkbkpJs1Ix5k225lcmkoTxIj1tzeZp+9M+jS83yu+ctbKY 5nZnvffmeD9q1P1u3p2akTz9HVCKkiybXyib+vjR44zHaBMsIgHvHaSSKuDwAG9XprXY FuJZtEevMUQdjn/Ix3dtEk1V0DIkSv/JVJSxXYvDsix0bS4L/udjwmfvqI/liAex5WFt HM5Q== X-Gm-Message-State: AOAM533MRIN6e5lW56WjjbJHpYfEgVQcGZvY2NePbV8AUr7ux+Px8fbK 9xXQYilr5K95/j70Xu5q4finlqMcH78= X-Google-Smtp-Source: ABdhPJzXpiaJ07JYUffXYWyo0L1fGSjXIkZBFKvd7LY3cRN9HSQuS/660z7z+1xOTsm/2L89f5ZhoA== X-Received: by 2002:adf:fe05:: with SMTP id n5mr51414352wrr.9.1609154171717; Mon, 28 Dec 2020 03:16:11 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id x17sm55338130wro.40.2020.12.28.03.16.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Dec 2020 03:16:11 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 28 Dec 2020 11:15:58 +0000 Subject: [PATCH v5 01/11] commit-graph: fix regression when computing Bloom filters Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Before computing Bloom fitlers, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since c49c82aa4c (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for a 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_graph_generation()' falls back to comparing commits by their date when they have equal generation number, and so since c49c82aa4c is purely a date comparision function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. Applying this patch (compared with v2.29.1) speeds up computing Bloom filters by around ~4 seconds. So, avoid the useless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. Signed-off-by: Abhishek Kumar --- commit-graph.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 06f8dc1d896..caf823295f4 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -144,8 +144,8 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + uint32_t generation_a = commit_graph_data_at(a)->generation; + uint32_t generation_b = commit_graph_data_at(b)->generation; /* lower generation commits first */ if (generation_a < generation_b) return -1; From patchwork Wed Oct 7 14:09:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820717 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7EB6917CF for ; Wed, 7 Oct 2020 14:09:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 592382076C for ; Wed, 7 Oct 2020 14:09:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="a0A3d4V0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728563AbgJGOJv (ORCPT ); Wed, 7 Oct 2020 10:09:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728177AbgJGOJv (ORCPT ); Wed, 7 Oct 2020 10:09:51 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E069DC0613D2 for ; Wed, 7 Oct 2020 07:09:50 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id 13so2490130wmf.0 for ; Wed, 07 Oct 2020 07:09:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=gcKi6qkxI9VO9dEsNVzoKfjgNi2BwArPJy/u5DDUbw8=; b=a0A3d4V0XNoJjIXKnnjIUrf8xH36CwzeII+bYGNpUN6ySpNltyPCTXGAwBQa8tfPoa xEw9uEbOgJHynuKHo49nbx/CbXXdKkeRNkNVcy9ivui1n8ac1ikpFuzSYa3IHFHJ72rx 0C+TEUaVFZ2hVXwR9cMber0TJ/3v4BpL1emBgQGgZDQT+vnL1EPoQJEgHB66RTVyCcxu 5d6f/yTdrBiEs/xOEjqgXnZUsNGCVSSBtpzf87Q4UwJlmIoUI5Ji1Igbcmi55hAJT2lr AjbELOriB9vQjI9RvBZsSrFLJ3kjNpv4BjJACsU+oVlxvz//1D/5oiUAvgMCeyMhQYrs a7MQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=gcKi6qkxI9VO9dEsNVzoKfjgNi2BwArPJy/u5DDUbw8=; b=d3bQQjLKxmr/Mg9MVJJtsutrPWLFEdq2FSHgBGs7FVJxyZrStRhXi7aSzrepsbWM6V EqN+YMwB9uVhK5V0ANloS/v8xAv8AMryOrZoTvY8ssVOa2t4WfB8wcToTOfU6KIbwoo9 L/GH0y+L/3EdOyCCx2UvC3FKync35IdQa9D5dM//r4VDpVt7Kv6uqn4aH+B/n3GEG5wl ++u1RhtxQDltMO+clEB88U2IiC3PHgdSRFg7lt9MaRtcOMYP49TwMN3/jXJBHMrXtx1n CcE+bamw+w5rri4rqt4lniMhnB2iS4DGeEa5nxnyfaaND/sF8rybM9/FIWMR48fDiqPi +ynw== X-Gm-Message-State: AOAM532IPA5ia2D/S0/YPltYPE+TxOJyjm/pmhaSx0Iis0/yo6TZEt+i UyhA+4LLL1WKSX6O3NCVinLv6xHGN0A= X-Google-Smtp-Source: ABdhPJxyEjI+yjL5QEqA8/V49/EJxLHeey3NwQYmofekV92QFR5HGOk0ie6P+v/4g3/RimtqgREiZg== X-Received: by 2002:a7b:c01a:: with SMTP id c26mr2537759wmb.35.1602079789500; Wed, 07 Oct 2020 07:09:49 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e7sm3248490wrm.6.2020.10.07.07.09.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:49 -0700 (PDT) Message-Id: <4470d916428a28bb8277dfc4c3da84e08110e88e.1602079786.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:37 +0000 Subject: [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar In indegree_walk_step(), we add unvisited parents to the indegree queue. However, parents are not guaranteed to be parsed. As the indegree queue sorts by generation number, let's parse parents before inserting them to ensure the correct priority order. Signed-off-by: Abhishek Kumar --- revision.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/revision.c b/revision.c index aa62212040..c97abcdde1 100644 --- a/revision.c +++ b/revision.c @@ -3381,6 +3381,9 @@ static void indegree_walk_step(struct rev_info *revs) struct commit *parent = p->item; int *pi = indegree_slab_at(&info->indegree, parent); + if (repo_parse_commit_gently(revs->repo, parent, 1) < 0) + return; + if (*pi) (*pi)++; else From patchwork Wed Oct 7 14:09:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820721 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B9E117D2 for ; Wed, 7 Oct 2020 14:09:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 64CBB20870 for ; Wed, 7 Oct 2020 14:09:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NigiRWGp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728576AbgJGOJy (ORCPT ); Wed, 7 Oct 2020 10:09:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728569AbgJGOJw (ORCPT ); Wed, 7 Oct 2020 10:09:52 -0400 Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE50BC0613D2 for ; Wed, 7 Oct 2020 07:09:51 -0700 (PDT) Received: by mail-wr1-x442.google.com with SMTP id t9so2318652wrq.11 for ; Wed, 07 Oct 2020 07:09:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=gEEemcudnq8KGHyGjlB6DzfDcvsvsJmzl9yYZeOG8QQ=; b=NigiRWGp+AL1Bw5p3/FEu0CDb5Oet06Xp/ANe+1X8aRDP5ypxIJ7Qo7XxS2GQ+8KBJ sq84oVAN1FtJrHyLTIEnwREdU/chN8Gq0NjgDZ7WbzL8uMDzkw77J8DhU6zdloCRcSNv 5kgAEa7kf87MKaraMigipHmzNXtqjPA8ZgR8cSthraCbByH2AtPSB+C7VDtKLzPpJCC+ oe5QJcN1DCiJdKw+s7KN1/uIAqcRlyiA2xQd5fJQc9z68YFsyRSPa+EQohsRMmvpG6pR CkUMuGW73wUvHx2a3pklEHHMWQAi3sQvadtFgCHwvs5oYf1STo6t+yAg6LW6BbCVN+GK gH5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=gEEemcudnq8KGHyGjlB6DzfDcvsvsJmzl9yYZeOG8QQ=; b=C3GXvhPET1WiULYM1C/F09qrDV6JaK4m7Gf7HW5xRu2qQ4f3VyHvfFdZsbBiTuCTUM 8oKeBlv+0P5o+ilg/mnw0WM4b1hquL6l8uH9wGyhy29D//gqYBeMZoWX4AwM24CEwRSi bpLYWfIvIh7U2Iky3QXfmEkpOJMbXn8YvbBeyIa7HrkBcBkUis+bOLfm0oagKA+h4wiS fPYy3dxuKC3dvtxeqXnJTywsM7kh2vHc9pVlWaJrEwpjqH0jzRxnGqBVSsghC6NL+T8n PU3baPaeUDkhZrTIHIK/g5regr4ITjA7u6Ub5ngGvUgmq93HFK82GJW0HxNjlE3cfr0k S2hA== X-Gm-Message-State: AOAM532PTNAyAe/FqF0vIWqAwiTw6tvp2Z7X4ygEFe20fiAuSUhJjreO znkCyOzO1A2Bw+70b8nKsPbnQG8QxL4= X-Google-Smtp-Source: ABdhPJxEVl3tWXWesdCGGgUPe4obnf7FjWKUG8luO9yz8MA0m1ASVvpIDBKg84EacxVf+YYVxfR+Bw== X-Received: by 2002:adf:ec50:: with SMTP id w16mr3786831wrn.265.1602079790451; Wed, 07 Oct 2020 07:09:50 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p21sm2928833wmc.28.2020.10.07.07.09.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:49 -0700 (PDT) Message-Id: <18bb3318a12c859c21c8e95285d551c48d31b54b.1602079786.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:38 +0000 Subject: [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar Both fill_commit_graph_info() and fill_commit_in_graph() parse information present in commit data chunk. Let's simplify the implementation by calling fill_commit_graph_info() within fill_commit_in_graph(). fill_commit_graph_info() used to not load committer data from commit data chunk. However, with the corrected committer date, we have to load committer date to calculate generation number value. e51217e15 (t5000: test tar files that overflow ustar headers, 30-06-2016) introduced a test 'generate tar with future mtime' that creates a commit with committer date of (2 ^ 36 + 1) seconds since EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus committer time overflows into generation number (within CDAT chunk) and has undefined behavior. The test used to pass as fill_commit_graph_info() would not set struct member `date` of struct commit and loads committer date from the object database, generating a tar file with the expected mtime. However, with corrected commit date, we will load the committer date from CDAT chunk (truncated to lower 34-bits to populate the generation number. Thus, Git sets date and generates tar file with the truncated mtime. The ustar format (the header format used by most modern tar programs) only has room for 11 (or 12, depending om some implementations) octal digits for the size and mtime of each files. Thus, setting a timestamp of 2 ^ 33 + 1 would overflow the 11-octal digit implementations while still fitting into commit data chunk. Since we want to test 12-octal digit implementations of ustar as well, let's modify the existing test to no longer use commit-graph file. Signed-off-by: Abhishek Kumar --- commit-graph.c | 27 ++++++++++----------------- t/t5000-tar-tree.sh | 20 +++++++++++++++++++- 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 94503e584b..e8362e144e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -749,15 +749,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, const unsigned char *commit_data; struct commit_graph_data *graph_data; uint32_t lex_index; + uint64_t date_high, date_low; while (pos < g->num_commits_in_base) g = g->base_graph; + if (pos >= g->num_commits + g->num_commits_in_base) + die(_("invalid commit position. commit-graph is likely corrupt")); + lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; graph_data = commit_graph_data_at(item); graph_data->graph_pos = pos; + + date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; + date_low = get_be32(commit_data + g->hash_len + 12); + item->date = (timestamp_t)((date_high << 32) | date_low); + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } @@ -772,38 +781,22 @@ static int fill_commit_in_graph(struct repository *r, { uint32_t edge_value; uint32_t *parent_data_ptr; - uint64_t date_low, date_high; struct commit_list **pptr; - struct commit_graph_data *graph_data; const unsigned char *commit_data; uint32_t lex_index; while (pos < g->num_commits_in_base) g = g->base_graph; - if (pos >= g->num_commits + g->num_commits_in_base) - die(_("invalid commit position. commit-graph is likely corrupt")); + fill_commit_graph_info(item, g, pos); - /* - * Store the "full" position, but then use the - * "local" position for the rest of the calculation. - */ - graph_data = commit_graph_data_at(item); - graph_data->graph_pos = pos; lex_index = pos - g->num_commits_in_base; - commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; item->object.parsed = 1; set_commit_tree(item, NULL); - date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; - date_low = get_be32(commit_data + g->hash_len + 12); - item->date = (timestamp_t)((date_high << 32) | date_low); - - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; - pptr = &item->parents; edge_value = get_be32(commit_data + g->hash_len); diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 3ebb0d3b65..8f41cdc509 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -431,11 +431,29 @@ test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' ' test_cmp expect actual ' +test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' ' + rm -f .git/index && + echo foo >file && + git add file && + GIT_COMMITTER_DATE="@17179869183 +0000" \ + git commit -m "tempori parendum" +' + +test_expect_success TIME_IS_64BIT 'generate tar with future mtime' ' + git archive HEAD >future.tar +' + +test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' ' + echo 2514 >expect && + tar_info future.tar | cut -d" " -f2 >actual && + test_cmp expect actual +' + test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' ' rm -f .git/index && echo content >file && git add file && - GIT_COMMITTER_DATE="@68719476737 +0000" \ + GIT_TEST_COMMIT_GRAPH=0 GIT_COMMITTER_DATE="@68719476737 +0000" \ git commit -m "tempori parendum" ' From patchwork Wed Oct 7 14:09:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C61C017CF for ; Wed, 7 Oct 2020 14:10:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94895206BE for ; Wed, 7 Oct 2020 14:10:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TpDJOAjl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728594AbgJGOKD (ORCPT ); Wed, 7 Oct 2020 10:10:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56866 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728569AbgJGOJz (ORCPT ); Wed, 7 Oct 2020 10:09:55 -0400 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 523DEC061755 for ; Wed, 7 Oct 2020 07:09:53 -0700 (PDT) Received: by mail-wm1-x344.google.com with SMTP id q5so2542961wmq.0 for ; Wed, 07 Oct 2020 07:09:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=2OLV0rTilDjO+uJilIqgF6qGDlsngAxIeLV3TYH5nY4=; b=TpDJOAjl+8Mty5UdBrXQoO9LeM5B/DAwVVggLPjlDzHnlxxSOEtDdRfmeGGELTlfA6 9Altr3DJjo3SyuecJk0zn8GVyLlY80WxRGCoTtM61FP+z3RCxku+gjc4c5P7jQlS/U+A m63mFmfv+5W08iSb/RlfE4+ZEEscik0+rCZc6TlB6nIq17iWqGrbFZy+Xz6o6kwRpAaE R0EoIj+x5FUbyOKYv8SfODzJ/jOEUtIABjurgxq+flxDl9/JWsPCf+eQa42b5uuM3psL OdRqlbMeL64NsH2ASbWPe0DIuIe3/Z3on05o5O8J0vm56UbnwNX2D5HpmeWM3xG6Nsji 2H4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=2OLV0rTilDjO+uJilIqgF6qGDlsngAxIeLV3TYH5nY4=; b=fF/blTTh7HQ5WTVUcOHK+2C+OH29lkKFADX7GwJbrvjNW0yOjouZ62pmY2ookWbA75 vPe5GItP6N6b+t7xwcNZ2VAp2vsNX9AOR7HjOecGBErVn0cpaBhICX8Se8F5RFjje8hN KOY4bSLAdfj6rnER5yPYgYEJVS9M9t+25prMrEZcsV3zF+8fNh5/UmGFtVu2UmYTV+ID A5aJUcU7vV4CLzPelTlGXxFvoupsQLzq+LCgGRSeOVSwL481d9zqTIY/Tp0E+MBMjwHh PE8DLE7dccgpOMc7Alz/Nsdmt98q5sVtfXQtJiwEOVJFXGFzGgEWpQTPovQw1S0tQX9p 6KiA== X-Gm-Message-State: AOAM530ay64Ljsn/9XgHaf/dEvDWV4FpxE1MDAgXtpD3Rj6kBlCRoaUz 5kEONZI3T6uYqjjQAfCkh4NTQi1Sh+4= X-Google-Smtp-Source: ABdhPJwUWeJmbsA7EI1UvOclKDOv2X02VuhCpjRnZcxxCaYRqsiqqJ8lpz0Vsfj0lKlvjGhLRdicHw== X-Received: by 2002:a1c:96cf:: with SMTP id y198mr3719068wmd.104.1602079791459; Wed, 07 Oct 2020 07:09:51 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i15sm3029974wrb.91.2020.10.07.07.09.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:50 -0700 (PDT) Message-Id: <011b0aa497d1352bf54ac6a9e2e22ed92d409e64.1602079786.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:39 +0000 Subject: [PATCH v4 04/10] commit-graph: return 64-bit generation number Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar In a preparatory step, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar --- commit-graph.c | 22 +++++++++++----------- commit-graph.h | 4 ++-- commit-reach.c | 36 ++++++++++++++++++------------------ commit-reach.h | 2 +- commit.c | 4 ++-- commit.h | 4 ++-- revision.c | 10 +++++----- upload-pack.c | 2 +- 8 files changed, 42 insertions(+), 42 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index e8362e144e..bfc532de6f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -99,7 +99,7 @@ uint32_t commit_graph_position(const struct commit *c) return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH; } -uint32_t commit_graph_generation(const struct commit *c) +timestamp_t commit_graph_generation(const struct commit *c) { struct commit_graph_data *data = commit_graph_data_slab_peek(&commit_graph_data_slab, c); @@ -144,8 +144,8 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_data_at(a)->generation; - uint32_t generation_b = commit_graph_data_at(b)->generation; + const timestamp_t generation_a = commit_graph_data_at(a)->generation; + const timestamp_t generation_b = commit_graph_data_at(b)->generation; /* lower generation commits first */ if (generation_a < generation_b) return -1; @@ -1350,7 +1350,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { - uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + timestamp_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; display_progress(ctx->progress, i + 1); if (generation != GENERATION_NUMBER_INFINITY && @@ -1383,8 +1383,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) data->generation = max_generation + 1; pop_commit(&list); - if (data->generation > GENERATION_NUMBER_MAX) - data->generation = GENERATION_NUMBER_MAX; + if (data->generation > GENERATION_NUMBER_V1_MAX) + data->generation = GENERATION_NUMBER_V1_MAX; } } } @@ -2404,8 +2404,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit, *odb_commit; struct commit_list *graph_parents, *odb_parents; - uint32_t max_generation = 0; - uint32_t generation; + timestamp_t max_generation = 0; + timestamp_t generation; display_progress(progress, i + 1); hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -2469,11 +2469,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) continue; /* - * If one of our parents has generation GENERATION_NUMBER_MAX, then - * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid + * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then + * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid * extra logic in the following condition. */ - if (max_generation == GENERATION_NUMBER_MAX) + if (max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); diff --git a/commit-graph.h b/commit-graph.h index f8e92500c6..8be247fa35 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -144,12 +144,12 @@ void disable_commit_graph(struct repository *r); struct commit_graph_data { uint32_t graph_pos; - uint32_t generation; + timestamp_t generation; }; /* * Commits should be parsed before accessing generation, graph positions. */ -uint32_t commit_graph_generation(const struct commit *); +timestamp_t commit_graph_generation(const struct commit *); uint32_t commit_graph_position(const struct commit *); #endif diff --git a/commit-reach.c b/commit-reach.c index 50175b159e..20b48b872b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -32,12 +32,12 @@ static int queue_has_nonstale(struct prio_queue *queue) static struct commit_list *paint_down_to_common(struct repository *r, struct commit *one, int n, struct commit **twos, - int min_generation) + timestamp_t min_generation) { struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; struct commit_list *result = NULL; int i; - uint32_t last_gen = GENERATION_NUMBER_INFINITY; + timestamp_t last_gen = GENERATION_NUMBER_INFINITY; if (!min_generation) queue.compare = compare_commits_by_commit_date; @@ -58,10 +58,10 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit *commit = prio_queue_get(&queue); struct commit_list *parents; int flags; - uint32_t generation = commit_graph_generation(commit); + timestamp_t generation = commit_graph_generation(commit); if (min_generation && generation > last_gen) - BUG("bad generation skip %8x > %8x at %s", + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", generation, last_gen, oid_to_hex(&commit->object.oid)); last_gen = generation; @@ -177,12 +177,12 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; - uint32_t min_generation = commit_graph_generation(array[i]); + timestamp_t min_generation = commit_graph_generation(array[i]); if (redundant[i]) continue; for (j = filled = 0; j < cnt; j++) { - uint32_t curr_generation; + timestamp_t curr_generation; if (i == j || redundant[j]) continue; filled_index[filled] = j; @@ -321,7 +321,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, { struct commit_list *bases; int ret = 0, i; - uint32_t generation, max_generation = GENERATION_NUMBER_ZERO; + timestamp_t generation, max_generation = GENERATION_NUMBER_INFINITY; if (repo_parse_commit(r, commit)) return ret; @@ -470,7 +470,7 @@ static int in_commit_list(const struct commit_list *want, struct commit *c) static enum contains_result contains_test(struct commit *candidate, const struct commit_list *want, struct contains_cache *cache, - uint32_t cutoff) + timestamp_t cutoff) { enum contains_result *cached = contains_cache_at(cache, candidate); @@ -506,11 +506,11 @@ static enum contains_result contains_tag_algo(struct commit *candidate, { struct contains_stack contains_stack = { 0, 0, NULL }; enum contains_result result; - uint32_t cutoff = GENERATION_NUMBER_INFINITY; + timestamp_t cutoff = GENERATION_NUMBER_INFINITY; const struct commit_list *p; for (p = want; p; p = p->next) { - uint32_t generation; + timestamp_t generation; struct commit *c = p->item; load_commit_graph_info(the_repository, c); generation = commit_graph_generation(c); @@ -566,8 +566,8 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + timestamp_t generation_a = commit_graph_generation(a); + timestamp_t generation_b = commit_graph_generation(b); if (generation_a < generation_b) return -1; @@ -580,7 +580,7 @@ int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, time_t min_commit_date, - uint32_t min_generation) + timestamp_t min_generation) { struct commit **list = NULL; int i; @@ -681,13 +681,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0; struct commit_list *from_iter = from, *to_iter = to; int result; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; while (from_iter) { add_object_array(&from_iter->item->object, NULL, &from_objs); if (!parse_commit(from_iter->item)) { - uint32_t generation; + timestamp_t generation; if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; @@ -701,7 +701,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, while (to_iter) { if (!parse_commit(to_iter->item)) { - uint32_t generation; + timestamp_t generation; if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; @@ -741,13 +741,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit_list *found_commits = NULL; struct commit **to_last = to + nr_to; struct commit **from_last = from + nr_from; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; int num_to_find = 0; struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; for (item = to; item < to_last; item++) { - uint32_t generation; + timestamp_t generation; struct commit *c = *item; parse_commit(c); diff --git a/commit-reach.h b/commit-reach.h index b49ad71a31..148b56fea5 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -87,7 +87,7 @@ int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, time_t min_commit_date, - uint32_t min_generation); + timestamp_t min_generation); int can_all_from_reach(struct commit_list *from, struct commit_list *to, int commit_date_cutoff); diff --git a/commit.c b/commit.c index f53429c0ac..3b488381d5 100644 --- a/commit.c +++ b/commit.c @@ -731,8 +731,8 @@ int compare_commits_by_author_date(const void *a_, const void *b_, int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused) { const struct commit *a = a_, *b = b_; - const uint32_t generation_a = commit_graph_generation(a), - generation_b = commit_graph_generation(b); + const timestamp_t generation_a = commit_graph_generation(a), + generation_b = commit_graph_generation(b); /* newer commits first */ if (generation_a < generation_b) diff --git a/commit.h b/commit.h index 5467786c7b..33c66b2177 100644 --- a/commit.h +++ b/commit.h @@ -11,8 +11,8 @@ #include "commit-slab.h" #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF -#define GENERATION_NUMBER_MAX 0x3FFFFFFF +#define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1) +#define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 struct commit_list { diff --git a/revision.c b/revision.c index c97abcdde1..2861f1c45c 100644 --- a/revision.c +++ b/revision.c @@ -3308,7 +3308,7 @@ define_commit_slab(indegree_slab, int); define_commit_slab(author_date_slab, timestamp_t); struct topo_walk_info { - uint32_t min_generation; + timestamp_t min_generation; struct prio_queue explore_queue; struct prio_queue indegree_queue; struct prio_queue topo_queue; @@ -3354,7 +3354,7 @@ static void explore_walk_step(struct rev_info *revs) } static void explore_to_depth(struct rev_info *revs, - uint32_t gen_cutoff) + timestamp_t gen_cutoff) { struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; @@ -3397,7 +3397,7 @@ static void indegree_walk_step(struct rev_info *revs) } static void compute_indegrees_to_depth(struct rev_info *revs, - uint32_t gen_cutoff) + timestamp_t gen_cutoff) { struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; @@ -3455,7 +3455,7 @@ static void init_topo_walk(struct rev_info *revs) info->min_generation = GENERATION_NUMBER_INFINITY; for (list = revs->commits; list; list = list->next) { struct commit *c = list->item; - uint32_t generation; + timestamp_t generation; if (repo_parse_commit_gently(revs->repo, c, 1)) continue; @@ -3516,7 +3516,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) for (p = commit->parents; p; p = p->next) { struct commit *parent = p->item; int *pi; - uint32_t generation; + timestamp_t generation; if (parent->object.flags & UNINTERESTING) continue; diff --git a/upload-pack.c b/upload-pack.c index 3b858eb457..fdb82885b6 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -497,7 +497,7 @@ static int got_oid(struct upload_pack_data *data, static int ok_to_give_up(struct upload_pack_data *data) { - uint32_t min_generation = GENERATION_NUMBER_ZERO; + timestamp_t min_generation = GENERATION_NUMBER_ZERO; if (!data->have_obj.nr) return 0; From patchwork Wed Oct 7 14:09:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820723 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02928139F for ; Wed, 7 Oct 2020 14:09:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CC1B2208C7 for ; Wed, 7 Oct 2020 14:09:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bmAGzSvD" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728585AbgJGOJ4 (ORCPT ); Wed, 7 Oct 2020 10:09:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728577AbgJGOJy (ORCPT ); Wed, 7 Oct 2020 10:09:54 -0400 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E74A6C0613D2 for ; Wed, 7 Oct 2020 07:09:53 -0700 (PDT) Received: by mail-wm1-x344.google.com with SMTP id p15so2467868wmi.4 for ; Wed, 07 Oct 2020 07:09:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=y8g9rCoMUT6HMs0B9h7lOOzFLzETFkz2Vc3uvyYBhP8=; b=bmAGzSvD6LKWJnPy4m2vVi7VDmcO/gZ9y+gseDIba3XjCRmd92u/ipJzEUB7Jrihjv P6EAN9vM5XpOi3bHSe+GsYmoss/iqIjSShSCsgRB/oMkqxai1nRn201cooRycbtRUZJr ooLMNr3hnpogHs9bIoy2iXEX2+7xy5bgDZmWtGoDLvZkxXhfmh+pqFquw+QBajHcXizL LZ3pmpwWnuv27bTVOyF3DbuCDKX21p4wojgdXRZQ2PVLdgO9WoTdzMCGh8+t95WpyaVY UUUCr3NZuNu0oAGIzHxwRM1TU+gMFFOzZU1Wg+9j6riIQgW+HNQQvxCz5OfdtGzv/Z1I 8sWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=y8g9rCoMUT6HMs0B9h7lOOzFLzETFkz2Vc3uvyYBhP8=; b=bzTgqjNrp4WznzSW5n2XjRcPS2nDM11/1awEl28ufbLC/zqNEWPgLRNyQUNQAUe/uG OgewI3jiuBbPj0XYiHWZiiqzGEgNbtPg4qX2HE06qDpEjV/CNEqdkBls56gS89VhA2+B XwG9FG/Vl5aVeGlgYgGk3ClkpUcpcvsWGvGmxTHZXucJfTgb6J5C0IcTYzHiKhXBPvoS CNhwcSIneM8pQTx34pf/burjBl2OsiPA0v5PSCCCTL4n3P0P78Ofe0moljOFnmaD3K30 u49cGOMMSXU0dxKZI7LdMZnlnUl5X3VuZcBKkm+jDO2QWmUaMljWHVwSujZg+p78FtKn ubkA== X-Gm-Message-State: AOAM530URtmecmy/OriEdCLtk5v6Ih7NV8mDs252OCIgU2wfHAn8IBE1 4uNYf+iP95scmm8CMshWet34IPOY59w= X-Google-Smtp-Source: ABdhPJztHDYBnFUrDY+8gSLpMS269B5scxowurV/sWQHFnuWpTooEmabK1ISsdgwq7RKu4A2UDRfSw== X-Received: by 2002:a7b:ce86:: with SMTP id q6mr3663996wmj.163.1602079792268; Wed, 07 Oct 2020 07:09:52 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d30sm3302195wrc.19.2020.10.07.07.09.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:51 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:40 +0000 Subject: [PATCH v4 05/10] commit-graph: add a slab to store topological levels Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar In a later commit we will introduce corrected commit date as the generation number v2. This value will be stored in the new seperate Generation Data chunk. However, to ensure backwards compatibility with "Old" Git we need to continue to write generation number v1, which is topological level, to the commit data chunk. This means that we need to compute both versions of generation numbers when writing the commit-graph file. Therefore, let's introduce a commit-slab to store topological levels; corrected commit date will be stored in the member `generation` of struct commit_graph_data. When Git creates a split commit-graph, it takes advantage of the generation values that have been computed already and present in existing commit-graph files. So, let's add a pointer to struct commit_graph as well as struct write_commit_graph_context to the topological level commit-slab and populate it with topological levels while writing a commit-graph file. Signed-off-by: Abhishek Kumar --- commit-graph.c | 47 ++++++++++++++++++++++++++++++++--------------- commit-graph.h | 1 + 2 files changed, 33 insertions(+), 15 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index bfc532de6f..cedd311024 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -64,6 +64,8 @@ void git_test_write_commit_graph_or_die(void) /* Remember to update object flag allocation in object.h */ #define REACHABLE (1u<<15) +define_commit_slab(topo_level_slab, uint32_t); + /* Keep track of the order in which commits are added to our list. */ define_commit_slab(commit_pos, int); static struct commit_pos commit_pos = COMMIT_SLAB_INIT(1, commit_pos); @@ -768,6 +770,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, item->date = (timestamp_t)((date_high << 32) | date_low); graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + + if (g->topo_levels) + *topo_level_slab_at(g->topo_levels, item) = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -962,6 +967,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; + struct topo_level_slab *topo_levels; const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1108,7 +1114,7 @@ static int write_graph_chunk_data(struct hashfile *f, else packedDate[0] = 0; - packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); + packedDate[0] |= htonl(*topo_level_slab_at(ctx->topo_levels, *list) << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1350,11 +1356,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { - timestamp_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + timestamp_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); display_progress(ctx->progress, i + 1); - if (generation != GENERATION_NUMBER_INFINITY && - generation != GENERATION_NUMBER_ZERO) + if (level != GENERATION_NUMBER_INFINITY && + level != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1362,29 +1368,27 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) struct commit *current = list->item; struct commit_list *parent; int all_parents_computed = 1; - uint32_t max_generation = 0; + uint32_t max_level = 0; for (parent = current->parents; parent; parent = parent->next) { - generation = commit_graph_data_at(parent->item)->generation; + level = *topo_level_slab_at(ctx->topo_levels, parent->item); - if (generation == GENERATION_NUMBER_INFINITY || - generation == GENERATION_NUMBER_ZERO) { + if (level == GENERATION_NUMBER_INFINITY || + level == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (generation > max_generation) { - max_generation = generation; + } else if (level > max_level) { + max_level = level; } } if (all_parents_computed) { - struct commit_graph_data *data = commit_graph_data_at(current); - - data->generation = max_generation + 1; pop_commit(&list); - if (data->generation > GENERATION_NUMBER_V1_MAX) - data->generation = GENERATION_NUMBER_V1_MAX; + if (max_level > GENERATION_NUMBER_V1_MAX - 1) + max_level = GENERATION_NUMBER_V1_MAX - 1; + *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; } } } @@ -2142,6 +2146,7 @@ int write_commit_graph(struct object_directory *odb, int res = 0; int replace = 0; struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; + struct topo_level_slab topo_levels; if (!commit_graph_compatible(the_repository)) return 0; @@ -2163,6 +2168,18 @@ int write_commit_graph(struct object_directory *odb, bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; + init_topo_level_slab(&topo_levels); + ctx->topo_levels = &topo_levels; + + if (ctx->r->objects->commit_graph) { + struct commit_graph *g = ctx->r->objects->commit_graph; + + while (g) { + g->topo_levels = &topo_levels; + g = g->base_graph; + } + } + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/commit-graph.h b/commit-graph.h index 8be247fa35..2e9aa7824e 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -73,6 +73,7 @@ struct commit_graph { const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + struct topo_level_slab *topo_levels; struct bloom_filter_settings *bloom_filter_settings; }; From patchwork Wed Oct 7 14:09:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820737 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5EA9717CF for ; Wed, 7 Oct 2020 14:10:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 388FD20870 for ; Wed, 7 Oct 2020 14:10:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="AA/5JsT+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728591AbgJGOKC (ORCPT ); Wed, 7 Oct 2020 10:10:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728584AbgJGOJ4 (ORCPT ); Wed, 7 Oct 2020 10:09:56 -0400 Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7BC5C0613D2 for ; Wed, 7 Oct 2020 07:09:55 -0700 (PDT) Received: by mail-wr1-x442.google.com with SMTP id t9so2318900wrq.11 for ; Wed, 07 Oct 2020 07:09:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=MqpZbVRT0AAkrfqM5/ytby9slzWuY5bfW7NJH6O8wUE=; b=AA/5JsT+naD5Ok/hQeuxXZ2t3c5bQrTdIMfeqdbebR+SDk9rw0e5fsPZc1VvZ0PQs/ Ks5F1A9SL0LFfXbfCN/EuslW04LS2bI0350K82s/RMXsyBSAYlamkjj5I1f31d7lANbZ 3arNJXayccav3GwfHD1w1jkVL5IRjyvVEcjgNBKzRg3tuxLPP+lTUftXSLLEgZS6V83B YmvNloqNuOsY0Uj3P+nnVDpNjIlMbbRaBA/HINxCbOl5VcKuRNIQt8Nn4zePgA4lOmWF fYanBe3+uIWAwKkrs6GI6HMHREbfgBIrMXNPwNibkZqU206N0ZzLS01Bi5s/TGSYlAyU SikA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=MqpZbVRT0AAkrfqM5/ytby9slzWuY5bfW7NJH6O8wUE=; b=MLyQSXIE70Pziu3V7EC3L1RC+tfBvhPJNoMgCVZAOsJ81Eo2RPSLzcmwX3YLYD7vi1 ak7RL7l1p/E1rhzWNZd1wth2iBY2ZD1Qi3zQKH3+SJsgv83YtioHOxUZ4JFeycMJJQpn WzEd81ixd3hYiZTNl/YECIIsuu0abrIIUH/EM74sGD27XLsdZ8OtzynRyVhL+qyhBGBg UF6km01XgWwMk5JuaoX/vpXz/HGmT1IEHlAhciSfy0tGwDME2wU8jio6pqKA/aJIMH75 VFjbpI8umeiWCyQ2dHQT81amdqsUp3jrHWA/NMWd2mtNBlOmSu3/R/90++mXZy6EoIHG OO9w== X-Gm-Message-State: AOAM533Gbs0wD3n1ok5djY77UW3qVYqLhmJKOtPfjHIUeuwEXGi3xJ+C FAw3ykFjpoBeBrmNJEgR/N/uxKWR5H8= X-Google-Smtp-Source: ABdhPJz3l/kKdlrX1j57Vrht3XBKLRX7m9PV4f47YtRrOmPSFVFJu8rxl5D7ecLMA0v4Ot6Var1l4g== X-Received: by 2002:adf:9541:: with SMTP id 59mr3795473wrs.396.1602079793955; Wed, 07 Oct 2020 07:09:53 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a199sm2960371wmd.8.2020.10.07.07.09.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:52 -0700 (PDT) Message-Id: <694ef1ec08d9dc96a74a2631b2710ad206397dbc.1602079786.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:41 +0000 Subject: [PATCH v4 06/10] commit-graph: implement corrected commit date Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar With most of preparations done, let's implement corrected commit date. The corrected commit date for a commit is defined as: * A commit with no parents (a root commit) has corrected commit date equal to its committer date. * A commit with at least one parent has corrected commit date equal to the maximum of its commit date and one more than the largest corrected commit date among its parents. As a special case, a root commit with timestamp of zero (01.01.1970 00:00:00Z) has corrected commit date of one, to be able to distinguish from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit date). To minimize the space required to store corrected commit date, Git stores corrected commit date offsets into the commit-graph file. The corrected commit date offset for a commit is defined as the difference between its corrected commit date and actual commit date. Storing corrected commit date requires sizeof(timestamp_t) bytes, which in most cases is 64 bits (uintmax_t). However, corrected commit date offsets can be safely stored using only 32-bits. This halves the size of GDAT chunk, which is a reduction of around 6% in the size of commit-graph file. However, using offsets be problematic if one of commits is malformed but valid and has committerdate of 0 Unix time, as the offset would be the same as corrected commit date and thus require 64-bits to be stored properly. While Git does not write out offsets at this stage, Git stores the corrected commit dates in member generation of struct commit_graph_data. It will begin writing commit date offsets with the introduction of generation data chunk. Signed-off-by: Abhishek Kumar --- commit-graph.c | 43 +++++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index cedd311024..03948adfce 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -154,11 +154,6 @@ static int commit_gen_cmp(const void *va, const void *vb) else if (generation_a > generation_b) return 1; - /* use date as a heuristic when generations are equal */ - if (a->date < b->date) - return -1; - else if (a->date > b->date) - return 1; return 0; } @@ -1357,10 +1352,14 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { timestamp_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); + timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation; display_progress(ctx->progress, i + 1); if (level != GENERATION_NUMBER_INFINITY && - level != GENERATION_NUMBER_ZERO) + level != GENERATION_NUMBER_ZERO && + corrected_commit_date != GENERATION_NUMBER_INFINITY && + corrected_commit_date != GENERATION_NUMBER_ZERO + ) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1369,17 +1368,25 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) struct commit_list *parent; int all_parents_computed = 1; uint32_t max_level = 0; + timestamp_t max_corrected_commit_date = 0; for (parent = current->parents; parent; parent = parent->next) { level = *topo_level_slab_at(ctx->topo_levels, parent->item); - + corrected_commit_date = commit_graph_data_at(parent->item)->generation; if (level == GENERATION_NUMBER_INFINITY || - level == GENERATION_NUMBER_ZERO) { + level == GENERATION_NUMBER_ZERO || + corrected_commit_date == GENERATION_NUMBER_INFINITY || + corrected_commit_date == GENERATION_NUMBER_ZERO + ) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (level > max_level) { - max_level = level; + } else { + if (level > max_level) + max_level = level; + + if (corrected_commit_date > max_corrected_commit_date) + max_corrected_commit_date = corrected_commit_date; } } @@ -1389,6 +1396,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (max_level > GENERATION_NUMBER_V1_MAX - 1) max_level = GENERATION_NUMBER_V1_MAX - 1; *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; + + if (current->date && current->date > max_corrected_commit_date) + max_corrected_commit_date = current->date - 1; + commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; } } } @@ -2485,17 +2496,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (generation_zero == GENERATION_ZERO_EXISTS) continue; - /* - * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then - * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid - * extra logic in the following condition. - */ - if (max_generation == GENERATION_NUMBER_V1_MAX) - max_generation--; - generation = commit_graph_generation(graph_commit); - if (generation != max_generation + 1) - graph_report(_("commit-graph generation for commit %s is %u != %u"), + if (generation < max_generation + 1) + graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime), oid_to_hex(&cur_oid), generation, max_generation + 1); From patchwork Wed Oct 7 14:09:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820731 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 82CB9139F for ; Wed, 7 Oct 2020 14:10:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3F6192076C for ; Wed, 7 Oct 2020 14:10:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Cbu5qWZm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728599AbgJGOKE (ORCPT ); Wed, 7 Oct 2020 10:10:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728577AbgJGOJ6 (ORCPT ); Wed, 7 Oct 2020 10:09:58 -0400 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A1DD6C0613D3 for ; Wed, 7 Oct 2020 07:09:57 -0700 (PDT) Received: by mail-wr1-x444.google.com with SMTP id n18so2355566wrs.5 for ; Wed, 07 Oct 2020 07:09:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=RocCDtpoyRW0SLXFpraKO5FlpTJOQ4Sv19gqDP/ZfVQ=; b=Cbu5qWZmw33msUL/7stztqoR0p/M7O+e3qFkhPkfNrShws0iKF8JMYzDSYm4qVHe/s U9FfwEq7U1bU0PtEM+B1gLJQ7B2FmuhPPR7RvteR0ki2A3iObSTlT03ttQ21R1b2SFRx d66f19rHAlr+nZbt5NLKe2ylt1TCxrm0rQJl0N0OeNBfuHXADP6eVdxX+VoG9AJ2om8k ORew51BquT9rmWn5Jc7wBIcRQ96tG0HWGWAdiHVXkCfccTq6YNQJSJaV0PmqUYcXedLq yl88N2wEdYASX4jgVCp5pKvTtGSdCqRqC1veTT2JWwEqfQoAzIIpSqj32SCAOF6k7osg 28tA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=RocCDtpoyRW0SLXFpraKO5FlpTJOQ4Sv19gqDP/ZfVQ=; b=YFYci1z8PeRMR/4Ty4Ct0D/MfHV30e59g1XRmsgIYJ0jozoUW7YLTe7e3PwQyJpLGb +HBmEUm2g0e8yNSsmwI9NXBBmfdGI+0Tbs5yAgZ6d813I9cN7DQ+H5+3FM1erOxwOq2/ mGiq8elgx+jwZ79quQBJLQPXiSG5tKmvzLRS8EvTOhMJZF1y1VgPPqgLhyngCKRBG//A Oh2+GGzQPUAJ6JQqMhFtKcvYoguC0WH/uxpN6HWUnFSvozl6P6O2YskjeVIdXpIg5oB/ cJ5cd3eCX7d5yuc9R/T/ZQ2NbWOyfEZUU0fGTnBQarrXQ1JlAjuEFzQXQZa0jjJpiZcR 9T3w== X-Gm-Message-State: AOAM530lXdUxGNQ5xNDzhOOMiOAray9ZcPp0tbwMgiimyPIDfHIaEa6I Xq1Yy2KiwMYYEECNpwI7CYdY3WSm+3U= X-Google-Smtp-Source: ABdhPJzv5rewMCf9KI3u3QMD9TWQDma/Nu06ZHYhVt7SkA2Q2M+ImAmVh6JohnavqMWWkqbiDFXjsQ== X-Received: by 2002:adf:fac8:: with SMTP id a8mr3651207wrs.186.1602079795644; Wed, 07 Oct 2020 07:09:55 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d18sm2979511wrm.10.2020.10.07.07.09.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:54 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:42 +0000 Subject: [PATCH v4 07/10] commit-graph: implement generation data chunk MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation Data chunk (or GDAT). GDAT stores corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. While storing corrected commit date offset instead of the corrected commit date saves us 4 bytes per commit, it's possible for the offsets to overflow the 4-bytes allocated. As such overflows are exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, GENERATION_DATA_OVERFLOW ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar --- commit-graph.c | 98 +++++++++++++++++++++++++++++++++-- commit-graph.h | 3 ++ commit.h | 1 + t/README | 3 ++ t/helper/test-read-graph.c | 4 ++ t/t4216-log-bloom.sh | 4 +- t/t5318-commit-graph.sh | 70 ++++++++++++++++++++----- t/t5324-split-commit-graph.sh | 12 ++--- t/t6600-test-reach.sh | 68 +++++++++++++----------- 9 files changed, 206 insertions(+), 57 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 03948adfce..71d0b243db 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -38,11 +38,13 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */ +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 9 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -61,6 +63,8 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) +#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31) + /* Remember to update object flag allocation in object.h */ #define REACHABLE (1u<<15) @@ -385,6 +389,20 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->chunk_commit_data = data + chunk_offset; break; + case GRAPH_CHUNKID_GENERATION_DATA: + if (graph->chunk_generation_data) + chunk_repeated = 1; + else + graph->chunk_generation_data = data + chunk_offset; + break; + + case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW: + if (graph->chunk_generation_data_overflow) + chunk_repeated = 1; + else + graph->chunk_generation_data_overflow = data + chunk_offset; + break; + case GRAPH_CHUNKID_EXTRAEDGES: if (graph->chunk_extra_edges) chunk_repeated = 1; @@ -745,8 +763,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, { const unsigned char *commit_data; struct commit_graph_data *graph_data; - uint32_t lex_index; - uint64_t date_high, date_low; + uint32_t lex_index, offset_pos; + uint64_t date_high, date_low, offset; while (pos < g->num_commits_in_base) g = g->base_graph; @@ -764,7 +782,16 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + if (g->chunk_generation_data) { + offset = (timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); + + if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { + offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW; + graph_data->generation = get_be64(g->chunk_generation_data_overflow + 8 * offset_pos); + } else + graph_data->generation = item->date + offset; + } else + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; if (g->topo_levels) *topo_level_slab_at(g->topo_levels, item) = get_be32(commit_data + g->hash_len + 8) >> 2; @@ -942,6 +969,7 @@ struct write_commit_graph_context { struct packed_oid_list oids; struct packed_commit_list commits; int num_extra_edges; + int num_generation_data_overflows; unsigned long approx_nr_objects; struct progress *progress; int progress_done; @@ -960,7 +988,8 @@ struct write_commit_graph_context { report_progress:1, split:1, changed_paths:1, - order_by_pack:1; + order_by_pack:1, + write_generation_data:1; struct topo_level_slab *topo_levels; const struct commit_graph_opts *opts; @@ -1120,6 +1149,44 @@ static int write_graph_chunk_data(struct hashfile *f, return 0; } +static int write_graph_chunk_generation_data(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + int i, num_generation_data_overflows = 0; + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + timestamp_t offset = commit_graph_data_at(c)->generation - c->date; + display_progress(ctx->progress, ++ctx->progress_cnt); + + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { + offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; + num_generation_data_overflows++; + } + + hashwrite_be32(f, offset); + } + + return 0; +} + +static int write_graph_chunk_generation_data_overflow(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + int i; + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + timestamp_t offset = commit_graph_data_at(c)->generation - c->date; + display_progress(ctx->progress, ++ctx->progress_cnt); + + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { + hashwrite_be32(f, offset >> 32); + hashwrite_be32(f, (uint32_t) offset); + } + } + + return 0; +} + static int write_graph_chunk_extra_edges(struct hashfile *f, struct write_commit_graph_context *ctx) { @@ -1399,7 +1466,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (current->date && current->date > max_corrected_commit_date) max_corrected_commit_date = current->date - 1; + commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; + + if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) + ctx->num_generation_data_overflows++; } } } @@ -1765,6 +1836,21 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunks[2].id = GRAPH_CHUNKID_DATA; chunks[2].size = (hashsz + 16) * ctx->commits.nr; chunks[2].write_fn = write_graph_chunk_data; + + if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0)) + ctx->write_generation_data = 0; + if (ctx->write_generation_data) { + chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA; + chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; + chunks[num_chunks].write_fn = write_graph_chunk_generation_data; + num_chunks++; + } + if (ctx->num_generation_data_overflows) { + chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW; + chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows; + chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow; + num_chunks++; + } if (ctx->num_extra_edges) { chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; chunks[num_chunks].size = 4 * ctx->num_extra_edges; @@ -2170,6 +2256,8 @@ int write_commit_graph(struct object_directory *odb, ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; + ctx->write_generation_data = 1; + ctx->num_generation_data_overflows = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", bloom_settings.bits_per_entry); diff --git a/commit-graph.h b/commit-graph.h index 2e9aa7824e..19a02001fd 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -6,6 +6,7 @@ #include "oidset.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" +#define GIT_TEST_COMMIT_GRAPH_NO_GDAT "GIT_TEST_COMMIT_GRAPH_NO_GDAT" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" @@ -68,6 +69,8 @@ struct commit_graph { const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_commit_data; + const unsigned char *chunk_generation_data; + const unsigned char *chunk_generation_data_overflow; const unsigned char *chunk_extra_edges; const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; diff --git a/commit.h b/commit.h index 33c66b2177..251d877fcf 100644 --- a/commit.h +++ b/commit.h @@ -14,6 +14,7 @@ #define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1) #define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 +#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1) struct commit_list { struct commit *item; diff --git a/t/README b/t/README index 2adaf7c2d2..975c054bc9 100644 --- a/t/README +++ b/t/README @@ -379,6 +379,9 @@ GIT_TEST_COMMIT_GRAPH=, when true, forces the commit-graph to be written after every 'git commit' command, and overrides the 'core.commitGraph' setting to true. +GIT_TEST_COMMIT_GRAPH_NO_GDAT=, when true, forces the +commit-graph to be written without generation data chunk. + GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=, when true, forces commit-graph write to compute and write changed path Bloom filters for every 'git commit-graph write', as if the `--changed-paths` option was diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 5f585a1725..75927b2c81 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -33,6 +33,10 @@ int cmd__read_graph(int argc, const char **argv) printf(" oid_lookup"); if (graph->chunk_commit_data) printf(" commit_metadata"); + if (graph->chunk_generation_data) + printf(" generation_data"); + if (graph->chunk_generation_data_overflow) + printf(" generation_data_overflow"); if (graph->chunk_extra_edges) printf(" extra_edges"); if (graph->chunk_bloom_indexes) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index d11040ce41..dbde016188 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -40,11 +40,11 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0 num_commits: $1 - chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data + chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data EOF test-tool read-graph >actual && test_cmp expect actual diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2ed0c1544d..0328e98564 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -76,7 +76,7 @@ graph_git_behavior 'no graph' full commits/3 commits/1 graph_read_expect() { OPTIONAL="" NUM_CHUNKS=3 - if test ! -z $2 + if test ! -z "$2" then OPTIONAL=" $2" NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) @@ -103,14 +103,14 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' ' # valid commit and tree OID git rev-parse HEAD HEAD^{tree} >in && git commit-graph write --stdin-commits >commits-in && cat commits-in | git commit-graph write --stdin-commits && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "6" + graph_read_expect "6" "generation_data" ' graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1 @@ -297,7 +297,7 @@ test_expect_success 'build graph from commits with append' ' cd "$TRASH_DIRECTORY/full" && git rev-parse merge/3 | git commit-graph write --stdin-commits --append && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "10" "extra_edges" + graph_read_expect "10" "generation_data extra_edges" ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 @@ -307,7 +307,7 @@ test_expect_success 'build graph using --reachable' ' cd "$TRASH_DIRECTORY/full" && git commit-graph write --reachable && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "11" "extra_edges" + graph_read_expect "11" "generation_data extra_edges" ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 @@ -328,7 +328,7 @@ test_expect_success 'write graph in bare repo' ' cd "$TRASH_DIRECTORY/bare" && git commit-graph write && test_path_is_file $baredir/info/commit-graph && - graph_read_expect "11" "extra_edges" + graph_read_expect "11" "generation_data extra_edges" ' graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1 @@ -454,8 +454,9 @@ test_expect_success 'warn on improper hash version' ' test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && - git rev-parse commits/8 | git commit-graph write --stdin-commits && - git commit-graph verify >output + git rev-parse commits/8 | GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --stdin-commits && + git commit-graph verify >output && + graph_read_expect 9 extra_edges ' NUM_COMMITS=9 @@ -741,4 +742,47 @@ test_expect_success 'corrupt commit-graph write (missing tree)' ' ) ' +test_commit_with_date() { + file="$1.t" && + echo "$1" >"$file" && + git add "$file" && + GIT_COMMITTER_DATE="$2" GIT_AUTHOR_DATE="$2" git commit -m "$1" + git tag "$1" +} + +test_expect_success 'overflow corrected commit date offset' ' + objdir=".git/objects" && + UNIX_EPOCH_ZERO="1970-01-01 00:00 +0000" && + FUTURE_DATE="@2147483646 +0000" && + test_oid_cache <<-EOF && + oid_version sha1:1 + oid_version sha256:2 + EOF + cd "$TRASH_DIRECTORY" && + mkdir repo && + cd repo && + git init && + test_commit_with_date 1 "$UNIX_EPOCH_ZERO" && + test_commit 2 && + test_commit_with_date 3 "$UNIX_EPOCH_ZERO" && + git commit-graph write --reachable && + graph_read_expect 3 generation_data && + test_commit_with_date 4 "$FUTURE_DATE" && + test_commit 5 && + test_commit_with_date 6 "$UNIX_EPOCH_ZERO" && + git branch left && + git reset --hard 3 && + test_commit 7 && + test_commit_with_date 8 "$FUTURE_DATE" && + test_commit 9 && + git branch right && + git reset --hard 3 && + git merge left right && + git commit-graph write --reachable && + graph_read_expect 10 "generation_data generation_data_overflow" && + git commit-graph verify +' + +graph_git_behavior 'overflow corrected commit date offset' repo left right + test_done diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index c334ee9155..651df89ab2 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -13,11 +13,11 @@ test_expect_success 'setup repo' ' infodir=".git/objects/info" && graphdir="$infodir/commit-graphs" && test_oid_cache <<-EOM - shallow sha1:1760 - shallow sha256:2064 + shallow sha1:2132 + shallow sha256:2436 - base sha1:1376 - base sha256:1496 + base sha1:1408 + base sha256:1528 oid_version sha1:1 oid_version sha256:2 @@ -31,9 +31,9 @@ graph_read_expect() { NUM_BASE=$2 fi cat >expect <<- EOF - header: 43475048 1 $(test_oid oid_version) 3 $NUM_BASE + header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE num_commits: $1 - chunks: oid_fanout oid_lookup commit_metadata + chunks: oid_fanout oid_lookup commit_metadata generation_data EOF test-tool read-graph >output && test_cmp expect output diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index f807276337..e2d33a8a4c 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -55,10 +55,13 @@ test_expect_success 'setup' ' git show-ref -s commit-5-5 | git commit-graph write --stdin-commits && mv .git/objects/info/commit-graph commit-graph-half && chmod u+w commit-graph-half && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable && + mv .git/objects/info/commit-graph commit-graph-no-gdat && + chmod u+w commit-graph-no-gdat && git config core.commitGraph true ' -run_three_modes () { +run_all_modes () { test_when_finished rm -rf .git/objects/info/commit-graph && "$@" actual && test_cmp expect actual && @@ -67,11 +70,14 @@ run_three_modes () { test_cmp expect actual && cp commit-graph-half .git/objects/info/commit-graph && "$@" actual && + test_cmp expect actual && + cp commit-graph-no-gdat .git/objects/info/commit-graph && + "$@" actual && test_cmp expect actual } -test_three_modes () { - run_three_modes test-tool reach "$@" +test_all_modes () { + run_all_modes test-tool reach "$@" } test_expect_success 'ref_newer:miss' ' @@ -80,7 +86,7 @@ test_expect_success 'ref_newer:miss' ' B:commit-4-9 EOF echo "ref_newer(A,B):0" >expect && - test_three_modes ref_newer + test_all_modes ref_newer ' test_expect_success 'ref_newer:hit' ' @@ -89,7 +95,7 @@ test_expect_success 'ref_newer:hit' ' B:commit-2-3 EOF echo "ref_newer(A,B):1" >expect && - test_three_modes ref_newer + test_all_modes ref_newer ' test_expect_success 'in_merge_bases:hit' ' @@ -98,7 +104,7 @@ test_expect_success 'in_merge_bases:hit' ' B:commit-8-8 EOF echo "in_merge_bases(A,B):1" >expect && - test_three_modes in_merge_bases + test_all_modes in_merge_bases ' test_expect_success 'in_merge_bases:miss' ' @@ -107,7 +113,7 @@ test_expect_success 'in_merge_bases:miss' ' B:commit-5-9 EOF echo "in_merge_bases(A,B):0" >expect && - test_three_modes in_merge_bases + test_all_modes in_merge_bases ' test_expect_success 'in_merge_bases_many:hit' ' @@ -117,7 +123,7 @@ test_expect_success 'in_merge_bases_many:hit' ' X:commit-5-7 EOF echo "in_merge_bases_many(A,X):1" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'in_merge_bases_many:miss' ' @@ -127,7 +133,7 @@ test_expect_success 'in_merge_bases_many:miss' ' X:commit-8-6 EOF echo "in_merge_bases_many(A,X):0" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'in_merge_bases_many:miss-heuristic' ' @@ -137,7 +143,7 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' X:commit-6-6 EOF echo "in_merge_bases_many(A,X):0" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'is_descendant_of:hit' ' @@ -148,7 +154,7 @@ test_expect_success 'is_descendant_of:hit' ' X:commit-1-1 EOF echo "is_descendant_of(A,X):1" >expect && - test_three_modes is_descendant_of + test_all_modes is_descendant_of ' test_expect_success 'is_descendant_of:miss' ' @@ -159,7 +165,7 @@ test_expect_success 'is_descendant_of:miss' ' X:commit-7-6 EOF echo "is_descendant_of(A,X):0" >expect && - test_three_modes is_descendant_of + test_all_modes is_descendant_of ' test_expect_success 'get_merge_bases_many' ' @@ -174,7 +180,7 @@ test_expect_success 'get_merge_bases_many' ' git rev-parse commit-5-6 \ commit-4-7 | sort } >expect && - test_three_modes get_merge_bases_many + test_all_modes get_merge_bases_many ' test_expect_success 'reduce_heads' ' @@ -196,7 +202,7 @@ test_expect_success 'reduce_heads' ' commit-2-8 \ commit-1-10 | sort } >expect && - test_three_modes reduce_heads + test_all_modes reduce_heads ' test_expect_success 'can_all_from_reach:hit' ' @@ -219,7 +225,7 @@ test_expect_success 'can_all_from_reach:hit' ' Y:commit-8-1 EOF echo "can_all_from_reach(X,Y):1" >expect && - test_three_modes can_all_from_reach + test_all_modes can_all_from_reach ' test_expect_success 'can_all_from_reach:miss' ' @@ -241,7 +247,7 @@ test_expect_success 'can_all_from_reach:miss' ' Y:commit-8-5 EOF echo "can_all_from_reach(X,Y):0" >expect && - test_three_modes can_all_from_reach + test_all_modes can_all_from_reach ' test_expect_success 'can_all_from_reach_with_flag: tags case' ' @@ -264,7 +270,7 @@ test_expect_success 'can_all_from_reach_with_flag: tags case' ' Y:commit-8-1 EOF echo "can_all_from_reach_with_flag(X,_,_,0,0):1" >expect && - test_three_modes can_all_from_reach_with_flag + test_all_modes can_all_from_reach_with_flag ' test_expect_success 'commit_contains:hit' ' @@ -280,8 +286,8 @@ test_expect_success 'commit_contains:hit' ' X:commit-9-3 EOF echo "commit_contains(_,A,X,_):1" >expect && - test_three_modes commit_contains && - test_three_modes commit_contains --tag + test_all_modes commit_contains && + test_all_modes commit_contains --tag ' test_expect_success 'commit_contains:miss' ' @@ -297,8 +303,8 @@ test_expect_success 'commit_contains:miss' ' X:commit-9-3 EOF echo "commit_contains(_,A,X,_):0" >expect && - test_three_modes commit_contains && - test_three_modes commit_contains --tag + test_all_modes commit_contains && + test_all_modes commit_contains --tag ' test_expect_success 'rev-list: basic topo-order' ' @@ -310,7 +316,7 @@ test_expect_success 'rev-list: basic topo-order' ' commit-6-2 commit-5-2 commit-4-2 commit-3-2 commit-2-2 commit-1-2 \ commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \ >expect && - run_three_modes git rev-list --topo-order commit-6-6 + run_all_modes git rev-list --topo-order commit-6-6 ' test_expect_success 'rev-list: first-parent topo-order' ' @@ -322,7 +328,7 @@ test_expect_success 'rev-list: first-parent topo-order' ' commit-6-2 \ commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \ >expect && - run_three_modes git rev-list --first-parent --topo-order commit-6-6 + run_all_modes git rev-list --first-parent --topo-order commit-6-6 ' test_expect_success 'rev-list: range topo-order' ' @@ -334,7 +340,7 @@ test_expect_success 'rev-list: range topo-order' ' commit-6-2 commit-5-2 commit-4-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --topo-order commit-3-3..commit-6-6 + run_all_modes git rev-list --topo-order commit-3-3..commit-6-6 ' test_expect_success 'rev-list: range topo-order' ' @@ -346,7 +352,7 @@ test_expect_success 'rev-list: range topo-order' ' commit-6-2 commit-5-2 commit-4-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --topo-order commit-3-8..commit-6-6 + run_all_modes git rev-list --topo-order commit-3-8..commit-6-6 ' test_expect_success 'rev-list: first-parent range topo-order' ' @@ -358,7 +364,7 @@ test_expect_success 'rev-list: first-parent range topo-order' ' commit-6-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6 + run_all_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6 ' test_expect_success 'rev-list: ancestry-path topo-order' ' @@ -368,7 +374,7 @@ test_expect_success 'rev-list: ancestry-path topo-order' ' commit-6-4 commit-5-4 commit-4-4 commit-3-4 \ commit-6-3 commit-5-3 commit-4-3 \ >expect && - run_three_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6 + run_all_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6 ' test_expect_success 'rev-list: symmetric difference topo-order' ' @@ -382,7 +388,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' ' commit-3-8 commit-2-8 commit-1-8 \ commit-3-7 commit-2-7 commit-1-7 \ >expect && - run_three_modes git rev-list --topo-order commit-3-8...commit-6-6 + run_all_modes git rev-list --topo-order commit-3-8...commit-6-6 ' test_expect_success 'get_reachable_subset:all' ' @@ -402,7 +408,7 @@ test_expect_success 'get_reachable_subset:all' ' commit-1-7 \ commit-5-6 | sort ) >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_expect_success 'get_reachable_subset:some' ' @@ -420,7 +426,7 @@ test_expect_success 'get_reachable_subset:some' ' git rev-parse commit-3-3 \ commit-1-7 | sort ) >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_expect_success 'get_reachable_subset:none' ' @@ -434,7 +440,7 @@ test_expect_success 'get_reachable_subset:none' ' Y:commit-2-8 EOF echo "get_reachable_subset(X,Y)" >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_done From patchwork Wed Oct 7 14:09:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820735 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0D40517CF for ; Wed, 7 Oct 2020 14:10:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DEC5020870 for ; Wed, 7 Oct 2020 14:10:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="frsJx7Gf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728596AbgJGOKD (ORCPT ); Wed, 7 Oct 2020 10:10:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728586AbgJGOJ7 (ORCPT ); Wed, 7 Oct 2020 10:09:59 -0400 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1BEBDC0613D4 for ; Wed, 7 Oct 2020 07:09:59 -0700 (PDT) Received: by mail-wr1-x444.google.com with SMTP id w5so2334482wrp.8 for ; Wed, 07 Oct 2020 07:09:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=wuBwySB8Ix03KXrrpRqQJSaf3en/EhrTFRYzWh2SViU=; b=frsJx7Gf/YB1u67lvMJ6yIyFa44GJMDesv7xSJZLfLfXhiBzG3PrES9eo22Idq+nSZ NBwxU51HLXbk/nfH7RhAF3d7UBIpJrnYQ9SbEZrQcDqnqXGwYjvBgmGoX/xdesRw7+Bl VnVzBWmdBeHbHvrpgY+o7NkO5L3tpkG0XHr5y3ZB34X30Eg8SIKCXOZOgRTAlpCiElD5 k7LQyWcEHH4dpVYcF5ev1Dz5yjz5Yw2MbsQdJDuZ1aowpO7e+ZVBHwLKbWTlnDkDixls m54plAzv6D8l60+CFzjdYPbXHtwL/Aqj2u7HX44ONqgPRYS0Win+4tP2nW7AKtGoXCmV RhMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=wuBwySB8Ix03KXrrpRqQJSaf3en/EhrTFRYzWh2SViU=; b=h/l+4YJAF0ncvC08WSs6ZAGlbHwDgM3LLl/7673wuqmyqrKkb5nUl3iqz7SEXXnztM YsFpd7QKDDHKLAdEMONmSYPQMZNsB4Ye7sHCtzvtdpV5GePPnWtDu6jUbwCJTgMLtU3w JNSapFaugNXF7jjCA/FR3hyF4PB7Zw1WO8jVY0lSmFpz3RR9PWxjubHyVbZHUfLCERq5 L6pnXp35qIFJ2XY23Zns/O0JxCKQrZ6huCfDScKwDrS9EiBBX5QfKEznodPkmZUahSjY 8JmRZOKawZcYjblzhngw1sVLOSX7tutzIgXCaDjmFeXRSxqgM9OnrK8tN11maqBXS3+p XABQ== X-Gm-Message-State: AOAM532XrDFIDdr2Qoa1AGFmv9URlo2p1OBKQAFbjjsCMvYV54Vrw1wp hyfvtMIKODyA3boLkkasxZw3hZYucRE= X-Google-Smtp-Source: ABdhPJybuXvnK9wJpG/WdNlvTbkkPO6hfjtV8JokJL61h9ia7OIurt5WY93f0ywpGN2z0bRNDQkH2Q== X-Received: by 2002:adf:f70d:: with SMTP id r13mr3828381wrp.317.1602079797452; Wed, 07 Oct 2020 07:09:57 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id g144sm2972999wmg.30.2020.10.07.07.09.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:56 -0700 (PDT) Message-Id: <8ec119edc66814ad4d63908c79437a7f9dd3c08c.1602079786.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:43 +0000 Subject: [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar Since there are released versions of Git that understand generation numbers in the commit-graph's CDAT chunk but do not understand the GDAT chunk, the following scenario is possible: 1. "New" Git writes a commit-graph with the GDAT chunk. 2. "Old" Git writes a split commit-graph on top without a GDAT chunk. Because of the current use of inspecting the current layer for a chunk_generation_data pointer, the commits in the lower layer will be interpreted as having very large generation values (commit date plus offset) compared to the generation numbers in the top layer (topological level). This violates the expectation that the generation of a parent is strictly smaller than the generation of a child. It is difficult to expose this issue in a test. Since we _start_ with artificially low generation numbers, any commit walk that prioritizes generation numbers will walk all of the commits with high generation number before walking the commits with low generation number. In all the cases I tried, the commit-graph layers themselves "protect" any incorrect behavior since none of the commits in the lower layer can reach the commits in the upper layer. This issue would manifest itself as a performance problem in this case, especially with something like "git log --graph" since the low generation numbers would cause the in-degree queue to walk all of the commits in the lower layer before allowing the topo-order queue to write anything to output (depending on the size of the upper layer). When writing the new layer in split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees that if a layer has GDAT chunk, all lower layers must have a GDAT chunk as well. Rewriting layers follows similar approach: if the topmost layer below the set of layers being rewritten (in the split commit-graph chain) exists, and it does not contain GDAT chunk, then the result of rewrite does not have GDAT chunks either. Signed-off-by: Derrick Stolee Signed-off-by: Abhishek Kumar --- commit-graph.c | 29 +++++++++++- commit-graph.h | 1 + t/t5324-split-commit-graph.sh | 86 +++++++++++++++++++++++++++++++++++ 3 files changed, 115 insertions(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index 71d0b243db..5d15a1399b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -605,6 +605,21 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static void validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data; + + if (!g) + return; + + read_generation_data = !!g->chunk_generation_data; + + while (g) { + g->read_generation_data = read_generation_data; + g = g->base_graph; + } +} + struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -613,6 +628,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); + validate_mixed_generation_chain(g); + return g; } @@ -782,7 +799,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - if (g->chunk_generation_data) { + if (g->chunk_generation_data && g->read_generation_data) { offset = (timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { @@ -2030,6 +2047,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) } } + if (!ctx->write_generation_data && g->chunk_generation_data) + ctx->write_generation_data = 1; + if (flags != COMMIT_GRAPH_SPLIT_REPLACE) ctx->new_base_graph = g; else if (ctx->num_commit_graphs_after != 1) @@ -2274,6 +2294,7 @@ int write_commit_graph(struct object_directory *odb, struct commit_graph *g = ctx->r->objects->commit_graph; while (g) { + g->read_generation_data = 1; g->topo_levels = &topo_levels; g = g->base_graph; } @@ -2300,6 +2321,9 @@ int write_commit_graph(struct object_directory *odb, g = ctx->r->objects->commit_graph; + if (g && !g->chunk_generation_data) + ctx->write_generation_data = 0; + while (g) { ctx->num_commit_graphs_before++; g = g->base_graph; @@ -2318,6 +2342,9 @@ int write_commit_graph(struct object_directory *odb, if (ctx->opts) replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; + + if (replace) + ctx->write_generation_data = 1; } ctx->approx_nr_objects = approximate_object_count(); diff --git a/commit-graph.h b/commit-graph.h index 19a02001fd..ad52130883 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -64,6 +64,7 @@ struct commit_graph { struct object_directory *odb; uint32_t num_commits_in_base; + unsigned int read_generation_data; struct commit_graph *base_graph; const uint32_t *chunk_oid_fanout; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 651df89ab2..d0949a9eb8 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -440,4 +440,90 @@ test_expect_success '--split=replace with partial Bloom data' ' verify_chain_files_exist $graphdir ' +test_expect_success 'setup repo for mixed generation commit-graph-chain' ' + mkdir mixed && + graphdir=".git/objects/info/commit-graphs" && + test_oid_cache <<-EOM && + oid_version sha1:1 + oid_version sha256:2 + EOM + cd "$TRASH_DIRECTORY/mixed" && + git init && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + for i in $(test_seq 3) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/1 && + for i in $(test_seq 4 5) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/2 && + for i in $(test_seq 6 10) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split && + git reset --hard commits/2 && + git merge commits/4 && + git branch merge/1 && + git reset --hard commits/4 && + git merge commits/6 && + git branch merge/2 && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 1 + num_commits: 2 + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify +' + +test_expect_success 'does not write generation data chunk if not present on existing tip' ' + cd "$TRASH_DIRECTORY/mixed" && + git reset --hard commits/3 && + git merge merge/1 && + git merge commits/5 && + git merge merge/2 && + git branch merge/3 && + git commit-graph write --reachable --split=no-merge && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: 3 + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify +' + +test_expect_success 'writes generation data chunk when commit-graph chain is replaced' ' + cd "$TRASH_DIRECTORY/mixed" && + git commit-graph write --reachable --split=replace && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir && + graph_read_expect 15 && + git commit-graph verify +' + +test_expect_success 'add one commit, write a tip graph' ' + cd "$TRASH_DIRECTORY/mixed" && + test_commit 11 && + git branch commits/11 && + git commit-graph write --reachable --split && + test_path_is_missing $infodir/commit-graph && + test_path_is_file $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 2 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Wed Oct 7 14:09:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820733 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CBC8517D2 for ; Wed, 7 Oct 2020 14:10:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9ED0421531 for ; Wed, 7 Oct 2020 14:10:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="a60paE88" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728598AbgJGOKE (ORCPT ); Wed, 7 Oct 2020 10:10:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728532AbgJGOKB (ORCPT ); Wed, 7 Oct 2020 10:10:01 -0400 Received: from mail-wr1-x441.google.com (mail-wr1-x441.google.com [IPv6:2a00:1450:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3848C0613D5 for ; Wed, 7 Oct 2020 07:10:00 -0700 (PDT) Received: by mail-wr1-x441.google.com with SMTP id n18so2355795wrs.5 for ; Wed, 07 Oct 2020 07:10:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=aEiKPs+ct87ku14xOM1B4f4rXmdc7luCwwUm3MWBZCQ=; b=a60paE88GDJFMmUDggZbzjzU0tsa8gAeB7KVnwA+9pUYUtq0lgsGuP2f+B58rPVL+3 MENQEdks19Pq1UZnKk/6P/9sy6kiU2oYKn2M68Neid2PPzLgMb/nWOJJ5qkNXChK8IIf H6nG+QQUPnN8wsbjZ+h9mHTlrmXNmy0AyKVjmXpQK9L4uvECYY3bWeM/XbpqCJRlAxlt /v9djHgPklT0ZzwlaKZ5bM9Gd2j9UmIlZp9F2nkRf7JEVpjRqIZ0Kcnz1mTSVeHvPUw7 ygLEEbJsPrrFNng5OSPBJOVUYoztKjceVMKl5ER1H3cEiMZgCdUAyqAkuoWY275InGpA NzfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=aEiKPs+ct87ku14xOM1B4f4rXmdc7luCwwUm3MWBZCQ=; b=BDUxYD/dwplAIEGCUT3Dgytv36ZngJYCh90QQjZ8ZAwuAKklDkDv4M3vFeo60o08Bj M9HfRQwY2YzlSwyUZpq+rzf8ucPiDNSUMf8MKmqdxCVuBrwgZt7frnAVIvj+y3WPR9Iy DsHrRHGCo62w7yVnvnb/Oea4yjHQJ4UCI4KxTH0dFIhnl4cX7z1KrUw5/Oa3LXxVMDbp Aygk986lZ3dKnp4SFF0vhsR3FAUKKudRJECUWiFySOuwBrZOdkq2847QZGoS1n3GxyiU pk3PTZj36A+tXa9mUJWujFyJPxgqtTU4dxuirgEWKz4CA0Sy4xI8XqS33DPUkMQChKT1 Yl7w== X-Gm-Message-State: AOAM533EUjw+KQiEqD02niLCxJZPXMlkw7smoiD9DxyZPfHPvJ40Pr1i eQbEX+Wi04CUPoUIlg5qqz1rzxXib50= X-Google-Smtp-Source: ABdhPJxNKBgzI1E0AWZa2/OpVRp9QdWyD+B0GkJfsh2tSTLJveLneqJ/X6EKSbpO1gWnxYC30W53XQ== X-Received: by 2002:adf:f784:: with SMTP id q4mr3754224wrp.126.1602079799283; Wed, 07 Oct 2020 07:09:59 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m11sm2853161wmf.10.2020.10.07.07.09.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:58 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:44 +0000 Subject: [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar With corrected commit dates implemented, we no longer have to rely on commit date as a heuristic in paint_down_to_common(). While using corrected commit dates Git walks nearly the same number of commits as commit date, the process is slower as for each comparision we have to access a commit-slab (for corrected committer date) instead of accessing struct member (for committer date). For example, the command `git merge-base v4.8 v4.9` on the linux repository walks 167468 commits, taking 0.135s for committer date and 167496 commits, taking 0.157s for corrected committer date respectively. t6404-recursive-merge setups a unique repository where all commits have the same committer date without well-defined merge-base. While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer date as a heuristic in paint_down_to_common(). 6404.1 'combined merge conflicts' merges commits in the order: - Merge C with B to form a intermediate commit. - Merge the intermediate commit with A. With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently use the corrected committer date, which changes the order in which commits are merged: - Merge A with B to form a intermediate commit. - Merge the intermediate commit with C. While resulting repositories are equivalent, 6404.4 'virtual trees were processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting different merge-bases and thus have different object ids for the intermediate commits. As this has already causes problems (as noted in 859fdc0 (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph within t6404-recursive-merge. Signed-off-by: Abhishek Kumar --- commit-graph.c | 14 ++++++++++++++ commit-graph.h | 8 +++++++- commit-reach.c | 2 +- t/t6404-recursive-merge.sh | 5 ++++- 4 files changed, 26 insertions(+), 3 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 5d15a1399b..3de1933ede 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -705,6 +705,20 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +int corrected_commit_dates_enabled(struct repository *r) +{ + struct commit_graph *g; + if (!prepare_commit_graph(r)) + return 0; + + g = r->objects->commit_graph; + + if (!g->num_commits) + return 0; + + return g->read_generation_data; +} + struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) { struct commit_graph *g = r->objects->commit_graph; diff --git a/commit-graph.h b/commit-graph.h index ad52130883..d2c048dc64 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -89,13 +89,19 @@ struct commit_graph *read_commit_graph_one(struct repository *r, struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + /* * Return 1 if and only if the repository has a commit-graph * file and generation numbers are computed in that file. */ int generation_numbers_enabled(struct repository *r); -struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); +/* + * Return 1 if and only if the repository has a commit-graph + * file and generation data chunk has been written for the file. + */ +int corrected_commit_dates_enabled(struct repository *r); enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), diff --git a/commit-reach.c b/commit-reach.c index 20b48b872b..46f5a9e638 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -39,7 +39,7 @@ static struct commit_list *paint_down_to_common(struct repository *r, int i; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; - if (!min_generation) + if (!min_generation && !corrected_commit_dates_enabled(r)) queue.compare = compare_commits_by_commit_date; one->object.flags |= PARENT1; diff --git a/t/t6404-recursive-merge.sh b/t/t6404-recursive-merge.sh index 332cfc53fd..7055771b62 100755 --- a/t/t6404-recursive-merge.sh +++ b/t/t6404-recursive-merge.sh @@ -15,6 +15,8 @@ GIT_COMMITTER_DATE="2006-12-12 23:28:00 +0100" export GIT_COMMITTER_DATE test_expect_success 'setup tests' ' + GIT_TEST_COMMIT_GRAPH=0 && + export GIT_TEST_COMMIT_GRAPH && echo 1 >a1 && git add a1 && GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 && @@ -66,7 +68,7 @@ test_expect_success 'setup tests' ' ' test_expect_success 'combined merge conflicts' ' - test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git merge -m final G + test_must_fail git merge -m final G ' test_expect_success 'result contains a conflict' ' @@ -82,6 +84,7 @@ test_expect_success 'result contains a conflict' ' ' test_expect_success 'virtual trees were processed' ' + # TODO: fragile test, relies on ambigious merge-base resolution git ls-files --stage >out && cat >expect <<-EOF && From patchwork Wed Oct 7 14:09:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Blain via GitGitGadget X-Patchwork-Id: 11820729 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 32C57139F for ; Wed, 7 Oct 2020 14:10:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 08AB221531 for ; Wed, 7 Oct 2020 14:10:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="kHuw332S" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728601AbgJGOKF (ORCPT ); Wed, 7 Oct 2020 10:10:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728535AbgJGOKC (ORCPT ); Wed, 7 Oct 2020 10:10:02 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24912C0613D6 for ; Wed, 7 Oct 2020 07:10:02 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id j2so2343420wrx.7 for ; Wed, 07 Oct 2020 07:10:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=hiVEWQSG+HZfs7BMxvnv6ua0Bsg8vshDXec12N3xsX4=; b=kHuw332SAnQYW3Cy/0oOzt9OYX0OMU7siKDbvuz9XUQJhObVt9uYOTmxDTBhFGxnFD yIvpO46jUaERGhiBIDyb1XyGZ5JtrKKy3ZbzOOIP3AsFG8dBHgtVZcT/vFEm5xtmK/vm gN7yEcZw/eePkNQKkDtvGaeP2qmRjC2n4UGvtDWYAH0pCUNmUItYjwjtMAbtIr4OQJSw 67NFZJ+99MCprEL54YHqcNAO3wFnZAvL7+KvyFL/2L/OVE2bjStAdx/+VbidP7/KMAxH /Se0ZsfPdDOS77+lOs9Ppe+1SXHirK3q7x6/JxH5Z/uohW11LUKysQhUDmmkImn5PPAe rJ3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=hiVEWQSG+HZfs7BMxvnv6ua0Bsg8vshDXec12N3xsX4=; b=oIlyjb5Nocm27lOzr6zNNmI97fZcg9RBOhWzQ2SV4MFDH0uzTu7cMf+psaaWJbcVHJ MgPPIJERDNsAqjPYCp1/9mi+AHoSssY3WNzQHobXFuIepOyBBbHVdJGriPCUXuw2Di5h NkcLqdeO0ALzWdRrSQpApGoliM6dtuDsivPWqqHGvyIn4HRhFRPo4Xt2Z2FQq47eAYtd FruQwrySv3i3rmHuBPGdhbAsodYD+3eI34MyfwbX75HoxU6oFlPkml2lElvmsQ5xtK20 WSfEin9I5d1XTK3cwXqzg0DTUoH0X9Y1wtWQ/4Zpghgcs69J7CEDVAIeIJIWLyNSXtM5 6HAw== X-Gm-Message-State: AOAM530WK4shrHKCCHT+5bzkyWzmJ9kU+JpL7lvEpIVhHTOjgbdaaJ6J y/nMXUjyjLKQNLiw1yWFSkl166RkDOw= X-Google-Smtp-Source: ABdhPJyQniRIACftN8MzjrgqFBr5FSveE9kzMZBNoGwN17dhvU8YbhQfx1Z/1VspqciWwv+AKNvmYQ== X-Received: by 2002:adf:e304:: with SMTP id b4mr3612169wrj.141.1602079800402; Wed, 07 Oct 2020 07:10:00 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u17sm3265845wri.45.2020.10.07.07.09.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Oct 2020 07:09:59 -0700 (PDT) Message-Id: <9ada43967d29a3ec717b6a8db0de5b09e6d916b1.1602079786.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Abhishek Kumar via GitGitGadget" Date: Wed, 07 Oct 2020 14:09:45 +0000 Subject: [PATCH v4 10/10] doc: add corrected commit date info Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar With generation data chunk and corrected commit dates implemented, let's update the technical documentation for commit-graph. Signed-off-by: Abhishek Kumar --- .../technical/commit-graph-format.txt | 21 +++++-- Documentation/technical/commit-graph.txt | 62 ++++++++++++++++--- 2 files changed, 69 insertions(+), 14 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b3b58880b9..08d9026ad4 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -4,11 +4,7 @@ Git commit graph format The Git commit graph stores a list of commit OIDs and some associated metadata, including: -- The generation number of the commit. Commits with no parents have - generation number 1; commits with parents have generation number - one more than the maximum generation number of its parents. We - reserve zero as special, and can be used to mark a generation - number invalid or as "not computed". +- The generation number of the commit. - The root tree OID. @@ -86,13 +82,26 @@ CHUNK DATA: position. If there are more than two parents, the second value has its most-significant bit on and the other bits store an array position into the Extra Edge List chunk. - * The next 8 bytes store the generation number of the commit and + * The next 8 bytes store the topological level (generation number v1) + of the commit and the commit time in seconds since EPOCH. The generation number uses the higher 30 bits of the first 4 bytes, while the commit time uses the 32 bits of the second 4 bytes, along with the lowest 2 bits of the lowest byte, storing the 33rd and 34th bit of the commit time. + Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) + * This list of 4-byte values store corrected commit date offsets for the + commits, arranged in the same order as commit data chunk. + * If the corrected commit date offset cannot be stored within 31 bits, + the value has its most-significant bit on and the other bits store + the position of corrected commit date into the Generation Data Overflow + chunk. + + Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional] + * This list of 8-byte values stores the corrected commit dates for commits + with corrected commit date offsets that cannot be stored within 31 bits. + Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] This list of 4-byte values store the second through nth parents for all octopus merges. The second parent value in the commit data stores diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index f14a7659aa..75f71c4c7b 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph: Values 1-4 satisfy the requirements of parse_commit_gently(). -Define the "generation number" of a commit recursively as follows: +There are two definitions of generation number: +1. Corrected committer dates (generation number v2) +2. Topological levels (generation nummber v1) - * A commit with no parents (a root commit) has generation number one. +Define "corrected committer date" of a commit recursively as follows: - * A commit with at least one parent has generation number one more than - the largest generation number among its parents. + * A commit with no parents (a root commit) has corrected committer date + equal to its committer date. -Equivalently, the generation number of a commit A is one more than the + * A commit with at least one parent has corrected committer date equal to + the maximum of its commiter date and one more than the largest corrected + committer date among its parents. + + * As a special case, a root commit with timestamp zero has corrected commit + date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO + (that is, an uncomputed corrected commit date). + +Define the "topological level" of a commit recursively as follows: + + * A commit with no parents (a root commit) has topological level of one. + + * A commit with at least one parent has topological level one more than + the largest topological level among its parents. + +Equivalently, the topological level of a commit A is one more than the length of a longest path from A to a root commit. The recursive definition is easier to use for computation and observing the following property: @@ -60,6 +77,9 @@ is easier to use for computation and observing the following property: generation numbers, then we always expand the boundary commit with highest generation number and can easily detect the stopping condition. +The properties applies to both versions of generation number, that is both +corrected committer dates and topological levels. + This property can be used to significantly reduce the time it takes to walk commits and determine topological relationships. Without generation numbers, the general heuristic is the following: @@ -67,7 +87,9 @@ numbers, the general heuristic is the following: If A and B are commits with commit time X and Y, respectively, and X < Y, then A _probably_ cannot reach B. -This heuristic is currently used whenever the computation is allowed to +In absence of corrected commit dates (for example, old versions of Git or +mixed generation graph chains), +this heuristic is currently used whenever the computation is allowed to violate topological relationships due to clock skew (such as "git log" with default order), but is not used when the topological order is required (such as merge base calculations, "git log --graph"). @@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite" generation number and walk until reaching commits with known generation number. -We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not +We use the macro GENERATION_NUMBER_INFINITY to mark commits not in the commit-graph file. If a commit-graph file was written by a version of Git that did not compute generation numbers, then those commits will have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. @@ -93,7 +115,7 @@ fully-computed generation numbers. Using strict inequality may result in walking a few extra commits, but the simplicity in dealing with commits with generation number *_INFINITY or *_ZERO is valuable. -We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose +We use the macro GENERATION_NUMBER_MAX for commits whose generation numbers are computed to be at least this value. We limit at this value since it is the largest value that can be stored in the commit-graph file using the 30 bits available to generation numbers. This @@ -267,6 +289,30 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Handling Mixed Generation Number Chains + +With the introduction of generation number v2 and generation data chunk, the +following scenario is possible: + +1. "New" Git writes a commit-graph with the corrected commit dates. +2. "Old" Git writes a split commit-graph on top without corrected commit dates. + +A naive approach of using the newest available generation number from +each layer would lead to violated expectations: the lower layer would +use corrected commit dates which are much larger than the topological +levels of the higher layer. For this reason, Git inspects each layer to +see if any layer is missing corrected commit dates. In such a case, Git +only uses topological level + +When writing a new layer in split commit-graph, we write corrected commit +dates if the topmost layer has corrected commit dates written. This +guarantees that if a layer has corrected commit dates, all lower layers +must have corrected commit dates as well. + +When merging layers, we do not consider whether the merged layers had corrected +commit dates. Instead, the new layer will have corrected commit dates if and +only if all existing layers below the new layer have corrected commit dates. + ## Deleting graph-{hash} files After a new tip file is written, some `graph-{hash}` files may no longer