From patchwork Wed Sep 16 18:06:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780343 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 926CA6CA for ; Wed, 16 Sep 2020 18:08:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 71EE12137B for ; Wed, 16 Sep 2020 18:08:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="lFOmhV+X" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727732AbgIPSH6 (ORCPT ); Wed, 16 Sep 2020 14:07:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727711AbgIPSHA (ORCPT ); Wed, 16 Sep 2020 14:07:00 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 654C7C06174A for ; Wed, 16 Sep 2020 11:06:58 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id b2so2150296qtp.8 for ; Wed, 16 Sep 2020 11:06:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=jAM80tVqXQyi4vtnlYH1l5eYEy92OHVb3Eyr7kkfQTQ=; b=lFOmhV+XUpFGVwb+yY13p4KU98TlGU8Wz5MHG3/2VM/1mmhVPaqggdlBwbiYQKypAj yweLWWvkrxAVwlJes8PSwPC5L+nIfhbTnfGcXW+mrWefBIrwJRjjUfAwWA+iDrxj0G+1 TB3+azHpCyyD9ZymmGc2B12PUQS3usffeoMWTPslS/a1StJy1TwpPwAKnfTJ+mXRCHZ3 zNA73GX7yQsGFkpXzjNrJxkBtdtFIbb9mFsDeqtDQabgatL8kYDDibYFZMzygmpP7N9Q LAUmRmJckbNNUNOU8raliLACSsrlFNXVZpgic+tmC2CWx7avpjpx4av2xVesG1VRKhcB /wkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=jAM80tVqXQyi4vtnlYH1l5eYEy92OHVb3Eyr7kkfQTQ=; b=FDpcP5E47EnMaax6JQ4wl/unOxfrj/126b4vdLrvJ+nwrVdU+Qz2ZyL5xoutJqSRxO Tvg6mo1Uf9n6HXnVPvnJUv55BYnfoVnEf2mL0aRviADdF55iAT16dVuc+hCG8v29jaBQ uqXD7KcGVOeRcw6J2NAxL+a4PEktyoXnNXtNyMM1dc3Ur1RuO1Itmxf2XJjyia7FuDSP fZJ3Z7j+dVXlh6UUpZ48apJY7qMTjmOF5VzwP/j60LonGotQNpXuFPqWUxsAllDl5CBa /3mTUueepqAg2KUEn+PZm5EiGGqCltuOLZXjjyJNMF3SrbdMsFAgoik1RtzF4+iRq739 vlSg== X-Gm-Message-State: AOAM531mzf8cO+2X2m6+mW7WnuWgRpWNFcvep49s9s+rKKHSPYD+PcFs dTOIiokQQjPOXgMieLp+1qUdkTfPqCybX/7M X-Google-Smtp-Source: ABdhPJycGp6BN0rjFPvE/gTf3oM+TKYao5QY8HpnEMkonyDyJDNxIs1JxQ7MYGwhRzTEAcuqlpd5qQ== X-Received: by 2002:ac8:3175:: with SMTP id h50mr23748302qtb.185.1600279616472; Wed, 16 Sep 2020 11:06:56 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id m24sm11209523qtn.59.2020.09.16.11.06.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:06:55 -0700 (PDT) Date: Wed, 16 Sep 2020 14:06:53 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 01/13] commit-graph: introduce 'get_bloom_filter_settings()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed in practice to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings *' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter *'. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau --- blame.c | 6 ++++-- bloom.c | 6 +++--- commit-graph.c | 11 +++++++++++ commit-graph.h | 2 ++ revision.c | 5 +---- t/t4216-log-bloom.sh | 9 ++++++--- t/t5324-split-commit-graph.sh | 13 +++++++++++++ 7 files changed, 40 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 1be1cd82a2..903e23af23 100644 --- a/blame.c +++ b/blame.c @@ -2892,16 +2892,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb, const char *path) { struct blame_bloom_data *bd; + struct bloom_filter_settings *bs; if (!sb->repo->objects->commit_graph) return; - if (!sb->repo->objects->commit_graph->bloom_filter_settings) + bs = get_bloom_filter_settings(sb->repo); + if (!bs) return; bd = xmalloc(sizeof(struct blame_bloom_data)); - bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings; + bd->settings = bs; bd->alloc = 4; bd->nr = 0; diff --git a/bloom.c b/bloom.c index 1a573226e7..cd9380ac62 100644 --- a/bloom.c +++ b/bloom.c @@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, while (graph_pos < g->num_commits_in_base) g = g->base_graph; - /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ + /* The commit graph commit 'c' lives in doesn't carry Bloom filters. */ if (!g->chunk_bloom_indexes) return 0; @@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - r->objects->commit_graph->chunk_bloom_indexes) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) + return filter; } if (filter->data) diff --git a/commit-graph.c b/commit-graph.c index 0ed003e218..6a36ed0b06 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -667,6 +667,17 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) +{ + struct commit_graph *g = r->objects->commit_graph; + while (g) { + if (g->bloom_filter_settings) + return g->bloom_filter_settings; + g = g->base_graph; + } + return NULL; +} + static void close_commit_graph_one(struct commit_graph *g) { if (!g) diff --git a/commit-graph.h b/commit-graph.h index 09a97030dc..0677dd1031 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); */ int generation_numbers_enabled(struct repository *r); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), diff --git a/revision.c b/revision.c index 1239023f93..6aeb764821 100644 --- a/revision.c +++ b/revision.c @@ -681,10 +681,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) repo_parse_commit(revs->repo, revs->commits->item); - if (!revs->repo->objects->commit_graph) - return; - - revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings; + revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo); if (!revs->bloom_filter_settings) return; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 4bb9e9dbe2..715912ad0f 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -65,7 +65,7 @@ setup () { test_bloom_filters_used () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\"" + bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\"" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom && @@ -139,8 +139,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' -test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' ' - test_bloom_filters_not_used "-- A/B" +test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' ' + # Ensure that the number of empty filters is equal to the number of + # filters in the latest graph layer to prove that they are loaded (and + # ignored). + test_bloom_filters_used "-- A/B" 3 ' test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 18216463c7..c334ee9155 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -427,4 +427,17 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success '--split=replace with partial Bloom data' ' + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/3 && + git rev-list -1 HEAD~2 >a && + git rev-list -1 HEAD~1 >b && + git commit-graph write --split=no-merge --stdin-commits --changed-paths graph-files && + test_line_count = 1 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Wed Sep 16 18:07:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780341 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5D40139F for ; Wed, 16 Sep 2020 18:07:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BBB702083B for ; Wed, 16 Sep 2020 18:07:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="aYAHFVSK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727727AbgIPSHz (ORCPT ); Wed, 16 Sep 2020 14:07:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38708 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727750AbgIPSHF (ORCPT ); Wed, 16 Sep 2020 14:07:05 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D2A7C061756 for ; Wed, 16 Sep 2020 11:07:04 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id v54so6844232qtj.7 for ; Wed, 16 Sep 2020 11:07:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=KY5DLSiAztKca4b5UTZ+e54C1v4rWfLhOel86owC76U=; b=aYAHFVSKs7GyaxIIptbYUQl5i9cwwW3LQwLyrVC5viZZYYwwLpagoaA9jJZtxRK3Wm 95BJcIFvS9QYEQsB+W9SMAZRXCIJhG8HRBE1wA2x3KzSxyj1mjemwuYhr+KZLhzauc9c DVAe8ewicsYunzhLC4g2Uw/BXaEmsSc6d0LAEzH2NyXeUyJSi27i2hYVdnH+KKG3WU2X IGLVTRKfOI11HVl6XNybig2nQjvaSi32Xp8c/g/puW41lt5lFhj0q86il78D4lrzkDKx 7DQgfZjUZG+JhV1S1QMmEA1frvRKSZqVZ53QOLw8OjdXYYcvItX0zy2IUues0Cm+t8/F o3Rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=KY5DLSiAztKca4b5UTZ+e54C1v4rWfLhOel86owC76U=; b=qO8rPHa/edTdEKB7e1G8LPIziXeOWghnPAyZuA6HUToKvuuqutfULpTHr+UuDgWP0Q pIIPObsf821lUjGPnUv2DXVRRMJa2I6VL+r588KVtwUu41LvqNVNIjbVQE05BEZkh+FV dyBR/ZaCcddJKlDy2mPzgXYG2JbixeCX7F5VO69KX+UlkMvdOWxc7pmOQd4qoN6vz0pr nirpRRuXWxPtvFYhs0lg5Z8G+e6yz/qIvYTS2S3Mbsw1he8hRGNswg5B+mYqYiU+wd7z oUEokDAJaCDxagJUsGK8woPan1uNPftld1V3dz0VPAheu/mOf6EKhTChM/mnkqYGutZl 87Yw== X-Gm-Message-State: AOAM532iJhrr/x6IaSCC4SbKGYhx9b1ird/Nct5he1hOpSSpD2zl0G+Y cn9HEItwh5m99OJTBCMcass2USUmpWvhnDXe X-Google-Smtp-Source: ABdhPJx0HCAghc4i1KGZh1nUZurkDteO8aFvPi4vHl+UnTXIfPg1ft0eJjUGbGYn44OtOBEc/Pe0YQ== X-Received: by 2002:ac8:568f:: with SMTP id h15mr12082071qta.232.1600279623141; Wed, 16 Sep 2020 11:07:03 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id s47sm20431310qtb.13.2020.09.16.11.07.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:02 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:00 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 02/13] t4216: use an '&&'-chain Message-ID: <677b2f1692b3b6dca8695e415fdb11b320b0a432.1600279373.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine Helped-by: Jeff King Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 715912ad0f..cd89c75002 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -58,7 +58,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } From patchwork Wed Sep 16 18:07:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780335 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1BF2E139F for ; Wed, 16 Sep 2020 18:07:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EB22C21582 for ; Wed, 16 Sep 2020 18:07:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="b+b/MDLv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727348AbgIPSHj (ORCPT ); Wed, 16 Sep 2020 14:07:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727696AbgIPSHK (ORCPT ); Wed, 16 Sep 2020 14:07:10 -0400 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0278C061788 for ; Wed, 16 Sep 2020 11:07:09 -0700 (PDT) Received: by mail-qv1-xf41.google.com with SMTP id cr8so4000126qvb.10 for ; Wed, 16 Sep 2020 11:07:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=jGDpqN2Zui4Hmu5NwotOKRW01cVi6UT23SkbgECw41Q=; b=b+b/MDLvqxPlk7zXt/oeLvt5vS4li8iF+ZCNnq80OHqtB4+KFQ2xpldJeVsRXYrkFy tHKMaYlDUagRwC1yA6f+O2vTmTNXwZJhWRISabxePAz/JySjZ5WapmufBvyOF0QV4BK9 RcVJlw7x3iaTqgh8pfxvcCU495y/Zw3liRbmI03mA4ZlZUC7g1iH0mjeQMx8/KNszhkM 2C8uiGa3fQYjFx3XB7h7iQ6ZoNMRznuLv2fhMLw3deNd/SqxGnnCiIf6wHZrJSTG+l9f Glsg07ZKyYUrFgxFLtyjlPfxgtac+4J49jWmsUwfZt7GLgcz2KxTEnzvk0rv8sJqpPFW W3kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=jGDpqN2Zui4Hmu5NwotOKRW01cVi6UT23SkbgECw41Q=; b=k1gPb2gJr8Lq3y000EqxIux40GxvokqeTWhzQmhyR4lmViF5pSxK9WBNyEWRxYoLGn rxXF4/OmWCJZn/ecdc00P1MTNYf980by1KzGQKc2OA1LtGq2bnTha0pX+DbIz/kR1SKL Anbb5GVv0dGahiAGfZ7jsU+OVO1Q2NuSyuUb69Z+2Xplplqoji0cGuFvp7WRNYb3aHGY +H+dxei4YoKasYSerUr1I7FgUGISwsKPrc1nm0akYHAFI7GSIy2zUawOIXbKmr1tNqu8 13xJvEPkJHWTitFPU2bvOthQajgI5pP9eTG9HoZRXK6v1R3Sushzvk3XjP+jG2JzL1C/ 7GrA== X-Gm-Message-State: AOAM533EWr2AZEo1LKfzj+i4nFqx7nCk505sV3554a1FsYHpBhuWzugg zTvu726beTk9oHSzUYqAxN6l66muuiCB6Hxz X-Google-Smtp-Source: ABdhPJzvw7k+gki0J1ZfBGxjR2mFREQ9HTNLBqL8phUBut9+uNDVBnLqmTHC4uOifQG7Jkx+dFwlIA== X-Received: by 2002:ad4:4b61:: with SMTP id m1mr25023292qvx.11.1600279628581; Wed, 16 Sep 2020 11:07:08 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id e1sm19885407qtb.0.2020.09.16.11.07.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:07 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:05 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 03/13] commit-graph: pass a 'struct repository *' in more places Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 2 +- commit-graph.c | 17 ++++++++++------- commit-graph.h | 6 ++++-- fuzz-commit-graph.c | 5 +++-- 4 files changed, 18 insertions(+), 12 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 523501f217..ba5584463f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index 6a36ed0b06..72a838bd00 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -231,7 +231,8 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) return 1; } -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb) { void *graph_map; @@ -247,7 +248,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -287,7 +288,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -452,7 +454,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) return NULL; } -static struct commit_graph *load_commit_graph_one(const char *graph_file, +static struct commit_graph *load_commit_graph_one(struct repository *r, + const char *graph_file, struct object_directory *odb) { @@ -464,7 +467,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(r, fd, &st, odb); if (g) g->filename = xstrdup(graph_file); @@ -476,7 +479,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); return g; @@ -557,7 +560,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 0677dd1031..d9acb22bac 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,11 +75,13 @@ struct commit_graph { struct bloom_filter_settings *bloom_filter_settings; }; -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); From patchwork Wed Sep 16 18:07:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780331 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC43F6CA for ; Wed, 16 Sep 2020 18:07:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8866E20872 for ; Wed, 16 Sep 2020 18:07:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="NiI5YiAp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727291AbgIPSHg (ORCPT ); Wed, 16 Sep 2020 14:07:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38746 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727703AbgIPSHS (ORCPT ); Wed, 16 Sep 2020 14:07:18 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 141D6C06178A for ; Wed, 16 Sep 2020 11:07:17 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id p65so6883269qtd.2 for ; Wed, 16 Sep 2020 11:07:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=hRMElt2uq9/5C4eKkJRsIRdQBvymmD1QqYtUGgVACEY=; b=NiI5YiApI1hr3W2iJRZJTxObXJreTTiYpARfbK8gX3smmxGL2buV1nXm2/VDHzQ993 g4xwrgQsHAGi9IcQh6aA6N/nwuRQzrims20+qtSed0nTrAigvNc3X3QgRL3hBR5o+a1c dUE4ZQG1zyjudasLiepoCMQL2gMxL1k3q43uOaHs7edswv5JJqno3X/+kLf6Pe8dv28R D1Yi0UtVEPiNx0g8ubX/keyDFSgISkTAuMCBYTOJVYWfvkTdH0hqmkho7pGpF7JwAW41 UO5cTZQMJNOhsa1MufXoR3sbbcgvoEy3mocyf5PjERoSLpIJvQEKV1lMauKbvga6CF+c oZTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=hRMElt2uq9/5C4eKkJRsIRdQBvymmD1QqYtUGgVACEY=; b=WC1iXaxoVgbQP20P6Df0QXtIlP8RkceOyNaVDKVkjedBoS8xwJwE9GxxDo1G2pW63U cemF0r+UaWInPewmaDrwrNAfdFmUD3tWQAWs8hBlpcix6ewTo3z3eChHMkUxiyON3esX MRjKXMsAk5A4FEeNoLAjdGrT3Ek8u479ivDPZoR/X5JPU7+roJoKyzZdZnofCNYOnlQL tZViPfAP84SLp/q8StIlqb0owSPBvSN7ATFPkClzXzGkJdgmdAdrvW0Chrijz1xxKFX/ 9cUkYCwNPkYRtxO36YhjMwZ11OrOC1drWb4JDIcGh+fkUiyAODsHpmF6UGVJajb5y5Go v31A== X-Gm-Message-State: AOAM53099ppx1rGkuSRxGJfOttGjLjIlQ8QRqro9E+iLbpRjumQmVstm zztXQAUziPgJr2VxvT/tNCHisTc6+ZdQBPFF X-Google-Smtp-Source: ABdhPJxgHbCghgx8++62/2j31ERr+XDxLXoPV0HTPVvISAAXOXhDQ/bcMEiKWWSpHuYhVAnQlwiuVA== X-Received: by 2002:ac8:c4e:: with SMTP id l14mr24686189qti.149.1600279635919; Wed, 16 Sep 2020 11:07:15 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id w59sm19281059qtd.1.2020.09.16.11.07.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:14 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:12 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 04/13] t/helper/test-read-graph.c: prepare repo settings Message-ID: <9416c467394e46e0a04c2c32737229a88998dcf7.1600279373.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau --- t/helper/test-read-graph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), From patchwork Wed Sep 16 18:07:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780337 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7759E6CA for ; Wed, 16 Sep 2020 18:07:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 557DF21582 for ; Wed, 16 Sep 2020 18:07:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="LwCMF8Kr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727701AbgIPSHu (ORCPT ); Wed, 16 Sep 2020 14:07:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727502AbgIPSHY (ORCPT ); Wed, 16 Sep 2020 14:07:24 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7249FC06174A for ; Wed, 16 Sep 2020 11:07:23 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id h6so6860914qtd.6 for ; Wed, 16 Sep 2020 11:07:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=fHeBI0pYi99YmdtyHfymBlwTOfC71IZPo55pMaYLjxs=; b=LwCMF8KriEvdkjysa7G+YYbcEKpkfQkbk1MujJP26HX9+hvOdS+dqwI3yh7QID9SBT efQOUhFuDwx+XqL8xF4Gs0duhkSmrPiHFwhUiGiwHOrpf6Q+eWv7GxjaPbAp4sCop74s 5LQIY4JEjkxfhucIQgyRdRK2an4YrPQjqqo6Lv7x7i9/3BrnxgTJnyhgNOCu3B5sG1dW Ok0nHkxqbnIx6MZU2y+TeN9PQWSg9DO6GVvfWjf2UHO5+TRAg7WFn/ms9s+oi3+thWJL BSA/vwDZYpi/0xg8hoafaw8+UyqoVZRWCAdf6l3VfQSYcTA3U5HeP4H2iugieYfI4Rp+ wLdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=fHeBI0pYi99YmdtyHfymBlwTOfC71IZPo55pMaYLjxs=; b=XU/W6iu6ELUZeqky2FYZ8F3k7IHH5bZQ5KfFlMXB/4NpyuphS4pLCicoqkhtR7umq6 ivvOHxgsi+U/SubWjO85BGOijfsYRaCOzr89g19mUwgbIVHYaVlYN6lD2b4vdHKILv5b NceYHUfeunR5K72Qlyz++rS3Yihcge8gRk7kZd3imnAiHmG/4mwrL4F+7ATBEB+syzgR f14/If7SNy8CT/0K6cVY1ldR99vd4UUtLGCZOD1ArV+cKbtkkUHGcUo8UGe4+OpH81Bl yhdkHqIa+FeKK+8i4m5Ch+DiPVJk7gU4T9ZEwKTOTIXqktEY8cP5kqzNfPfTbEyePd25 doaA== X-Gm-Message-State: AOAM532PbQ3uOUnEKWSn78+fpCx/kUNTz2Uu1p14idBwABIY1lsPsuZp P2X4ZrRvziUBkLNdr8BMe/fG4Ac0Ne0UqZsL X-Google-Smtp-Source: ABdhPJyiNhaSWIWZXRXU6Buecd9zRFkRgkGGuQLSnV3eh0sTFvCU9mDWxuzQeTCyTOlzicFBPlYV3w== X-Received: by 2002:ac8:43d3:: with SMTP id w19mr25230612qtn.129.1600279642344; Wed, 16 Sep 2020 11:07:22 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id h68sm20375585qkf.30.2020.09.16.11.07.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:21 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:19 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Message-ID: <29bc0d07c9422d58ae19750390bb08b8c739a03e.1600279373.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 ++ Documentation/config/commitgraph.txt | 4 ++++ commit-graph.c | 6 ++++-- repo-settings.c | 3 +++ repository.h | 1 + t/t4216-log-bloom.sh | 4 +++- 6 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 Documentation/config/commitgraph.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 3042d80978..770ae79b82 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -340,6 +340,8 @@ include::config/column.txt[] include::config/commit.txt[] +include::config/commitgraph.txt[] + include::config/credential.txt[] include::config/completion.txt[] diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt new file mode 100644 index 0000000000..cff0797b54 --- /dev/null +++ b/Documentation/config/commitgraph.txt @@ -0,0 +1,4 @@ +commitGraph.readChangedPaths:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 72a838bd00..ea54d108b9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -327,6 +327,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } + prepare_repo_settings(r); + graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -403,14 +405,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.commit_graph_read_changed_paths) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.commit_graph_read_changed_paths) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index aa61a35338..88ccce2036 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "commitgraph.readchangedpaths", &value)) + r->settings.commit_graph_read_changed_paths = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.commit_graph_read_changed_paths, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 628c834367..bacf843d46 100644 --- a/repository.h +++ b/repository.h @@ -30,6 +30,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index cd89c75002..fc7693806c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -95,7 +95,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config commitgraph.readChangedPaths false && + test_bloom_filters_not_used "$option -- $path" ' done done From patchwork Wed Sep 16 18:07:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780339 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 87F2214F6 for ; Wed, 16 Sep 2020 18:07:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C4042083B for ; Wed, 16 Sep 2020 18:07:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="LqvmniE5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727710AbgIPSHw (ORCPT ); Wed, 16 Sep 2020 14:07:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727646AbgIPSHc (ORCPT ); Wed, 16 Sep 2020 14:07:32 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58AD8C061756 for ; Wed, 16 Sep 2020 11:07:31 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id h6so6861269qtd.6 for ; Wed, 16 Sep 2020 11:07:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=kJmFu6amOf+j9yAsUT2d9jTiADgXE3/HBBFXOzZpDx4=; b=LqvmniE54GCrqph+GfZSJlyJ676Mreq4bGFrhHhNEkfRrtvtPGqc1Pa3sRjIxd9jHt AiZ0QdNNesHKK4aYz/V/bv2K+UWdaPk6U6O4hLzoGmV1w1BNA4rHMAwKlziMCMUN8HAA MIImWlgIB9yzvc5KphbXVaZyaUMsY5cDyF9sm2lkrZ1YASr0ZrXd1dCom5al7GW/omjb Kxe+SlHdMwcD/LHDyp8xO306dJZKeyvukC+uy0ABLOSSvoPvJ277S+bxaOxaJJR/LyHu UTjycQ7VZf96bwldclcrrP8rlAA+GYaNIycfvBSENhZEVty1A/psawao+XudGl+2jll3 3oFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=kJmFu6amOf+j9yAsUT2d9jTiADgXE3/HBBFXOzZpDx4=; b=YYi9Qmv5LpORI1MvTaxL2P/XW6YEv9BU6ebc1YWeHdcwJBpEb1eeH3jzy7kYv5zP4W 2DQX6W3MHv9fiWiB0iZooganmbVomKC13r6Gtv8HSkJ5aE27FzuO+uyhcvqctq9jamsH cq7W0DnnjXD6PxeH0ytIOlkLucqoNjJttm8qow1REWz5YBSO9zdCyIjPpKq0cZMly1um P1g20tnyhl1ysS43HJHXhjEZRUIkNQzbWQ9Tj3JFlDgu/aXynl3rmQ0j0idK8MQzfMnU JbUgBFfELMH0wxyUOn1LJo/w2e421wfDCsPbiNARHLQLANS6UDDlXPGoGaAyrqcYuslX RyGg== X-Gm-Message-State: AOAM533ig4Ro7cUjl66YfBRdiKysml6XKKRlTXLfbPDxn5XbDx+Bl0LO pMxStfYPFZ7+dApvmB+PYa+57HrhGTBIrXaa X-Google-Smtp-Source: ABdhPJxdE136vdRoqqlN+TMrs+jct+myU+NMb/QAd2NWc4oNQ+QA2ePRw97oVLZGvhtlhquphJjKhg== X-Received: by 2002:ac8:7219:: with SMTP id a25mr25057194qtp.4.1600279650048; Wed, 16 Sep 2020 11:07:30 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id x124sm20164972qkd.72.2020.09.16.11.07.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:29 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:26 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 06/13] commit-graph.c: store maximum changed paths Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau --- bloom.h | 11 ++++++++++- commit-graph.c | 3 +++ t/t4216-log-bloom.sh | 4 ++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/bloom.h b/bloom.h index d8fbb0fbf1..0b9b59a6fe 100644 --- a/bloom.h +++ b/bloom.h @@ -28,9 +28,18 @@ struct bloom_filter_settings { * that contain n*b bits. */ uint32_t bits_per_entry; + + /* + * The maximum number of changed paths per commit + * before declaring a Bloom filter to be too-large. + * + * Not written to the commit-graph file. + */ + uint32_t max_changed_paths; }; -#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 } +#define DEFAULT_BLOOM_MAX_CHANGES 512 +#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10, DEFAULT_BLOOM_MAX_CHANGES } #define BITS_PER_WORD 8 #define BLOOMDATA_CHUNK_HEADER_SIZE 3 * sizeof(uint32_t) diff --git a/commit-graph.c b/commit-graph.c index ea54d108b9..55af498aa0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1201,6 +1201,7 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version); jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes); jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry); + jw_object_intmax(&jw, "max_changed_paths", ctx->bloom_settings->max_changed_paths); jw_end(&jw); trace2_data_json("bloom", ctx->r, "settings", &jw); @@ -1669,6 +1670,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; } diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fc7693806c..47ddf2641f 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -174,11 +174,11 @@ test_expect_success 'persist filter settings' ' GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \ GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2.txt && GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \ GIT_TRACE2_EVENT_NESTING=5 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt ' test_expect_success 'correctly report changes over limit' ' From patchwork Wed Sep 16 18:07:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780345 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AE97A139F for ; Wed, 16 Sep 2020 18:09:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7C34B20872 for ; Wed, 16 Sep 2020 18:09:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ZHRAwD3R" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727758AbgIPSIw (ORCPT ); Wed, 16 Sep 2020 14:08:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727676AbgIPSHv (ORCPT ); Wed, 16 Sep 2020 14:07:51 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67A47C06174A for ; Wed, 16 Sep 2020 11:07:44 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id h6so6861942qtd.6 for ; Wed, 16 Sep 2020 11:07:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=6Lh1rTmJypFjZMoBWdiU5LBa9Psj2qZiA1TVG4tv1vw=; b=ZHRAwD3RSszLfENiDu0rOzMcyxD3fwehjCgpn9YJeUR1qSawQWhAc3pa5/Rp6PfzlB RZzaAHgFjSC+0K4kO4fB6V8BfG2jMueyYuE6Mz1U7woX+25LlUZBdd+nxFJ25h0haGZm GGxvOD5UVqQdIdr7w+OUt/eeteAZaAmDPs/vKLOv+VBUKlcJ5XdpyAJgvCXMbl/jz1xH wcPFL1u5awxUFq08pfYBmarEA9JWyr1kzcRpNGzHShAOTLRtrPutkscDM/HvnwenCZkF /dxUVH8ESVaHon2mJBn8+9/H/SOQ1WrTcaCi2Ylq13dZpMA7UrUYl0Lpb35VlqVecdtI 3Usw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=6Lh1rTmJypFjZMoBWdiU5LBa9Psj2qZiA1TVG4tv1vw=; b=ItHhSLoiYslRsaf9oMgxb/T4emrN4sGz1FOF+yclcW1PiL2H69X9CRpC4S7OdQxURe UmkpRe90mBuaL6gaYBQNp0Tym/MkVXAKCmGo6t7878r716Dt2MT8QP8cMmSIQ9M39Td+ 1bgf/PBWpiFBRquzx/YK/wSPXuO+QpEtLwKfiKQrbWzCM8g1ps7V7AC/7B/LCgNU+Kw8 Bqi4RdVdiAEXwBK25L2U7SejyDMivc1fhabEBHDpDH+za7do1r6r+aaA/S8su+gJMjE/ QL+rel0v3Z33a6B6rn43hY1QBET99qMDvFfmBP7CZeIPhU9fPfTNmjTl6W/St3/09uhO o9aQ== X-Gm-Message-State: AOAM531khKa4+/AkkzlOfxWoyD+kTbk1NixZHCZ/8YMWnM7+B5As3wkD oOyBXnT3+4dh2xC4XsWlt4a2LD8dv/TyUzJz X-Google-Smtp-Source: ABdhPJy/gng48dYGjrrdoKo/ygcPLYrqJVusS8WcP4RFb1pkGylSPqosaax9iB1lbvWvYwwDV5ff8Q== X-Received: by 2002:ac8:7b2c:: with SMTP id l12mr23562802qtu.98.1600279662018; Wed, 16 Sep 2020 11:07:42 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id r42sm20221684qtk.29.2020.09.16.11.07.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:35 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 07/13] bloom: split 'get_bloom_filter()' in two Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau --- blame.c | 2 +- bloom.c | 16 +++++++++++++--- bloom.h | 16 +++++++++++++--- commit-graph.c | 34 +++++++++++++++++++++++++++++++--- line-log.c | 2 +- revision.c | 2 +- t/helper/test-bloom.c | 3 ++- 7 files changed, 62 insertions(+), 13 deletions(-) diff --git a/blame.c b/blame.c index 903e23af23..e5ba35dbd1 100644 --- a/blame.c +++ b/blame.c @@ -1276,7 +1276,7 @@ static int maybe_changed_path(struct repository *r, if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; - filter = get_bloom_filter(r, origin->commit, 0); + filter = get_bloom_filter(r, origin->commit); if (!filter) return 1; diff --git a/bloom.c b/bloom.c index cd9380ac62..393c61b4bc 100644 --- a/bloom.c +++ b/bloom.c @@ -177,9 +177,10 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present) +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + enum bloom_filter_computed *computed) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -187,6 +188,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct diff_options diffopt; int max_changes = 512; + if (computed) + *computed = BLOOM_NOT_COMPUTED; + if (!bloom_filters.slab_size) return NULL; @@ -271,8 +275,14 @@ struct bloom_filter *get_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); filter->data = NULL; filter->len = 0; + + if (computed) + *computed |= BLOOM_TRUNC_LARGE; } + if (computed) + *computed |= BLOOM_COMPUTED; + free(diff_queued_diff.queue); DIFF_QUEUE_CLEAR(&diff_queued_diff); diff --git a/bloom.h b/bloom.h index 0b9b59a6fe..e2e035ad14 100644 --- a/bloom.h +++ b/bloom.h @@ -89,9 +89,19 @@ void add_key_to_filter(const struct bloom_key *key, void init_bloom_filters(void); -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present); +enum bloom_filter_computed { + BLOOM_NOT_COMPUTED = (1 << 0), + BLOOM_COMPUTED = (1 << 1), + BLOOM_TRUNC_LARGE = (1 << 2), +}; + +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + enum bloom_filter_computed *computed); + +#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ + (r), (c), 0, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index 55af498aa0..45fcd62596 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -971,6 +971,10 @@ struct write_commit_graph_context { const struct split_commit_graph_opts *split_opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + + int count_bloom_filter_computed; + int count_bloom_filter_not_computed; + int count_bloom_filter_trunc_large; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1182,7 +1186,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1222,7 +1226,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1392,6 +1396,16 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) +{ + trace2_data_intmax("commit-graph", ctx->r, "filter-computed", + ctx->count_bloom_filter_computed); + trace2_data_intmax("commit-graph", ctx->r, "filter-not-computed", + ctx->count_bloom_filter_not_computed); + trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large", + ctx->count_bloom_filter_trunc_large); +} + static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; @@ -1414,12 +1428,26 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); for (i = 0; i < ctx->commits.nr; i++) { + enum bloom_filter_computed computed = 0; struct commit *c = sorted_commits[i]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + &computed); + if (computed & BLOOM_COMPUTED) { + ctx->count_bloom_filter_computed++; + if (computed & BLOOM_TRUNC_LARGE) + ctx->count_bloom_filter_trunc_large++; + } else if (computed & BLOOM_NOT_COMPUTED) + ctx->count_bloom_filter_not_computed++; ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } + if (trace2_is_enabled()) + trace2_bloom_filter_write_statistics(ctx); + free(sorted_commits); stop_progress(&progress); } diff --git a/line-log.c b/line-log.c index bf73ea95ac..68eeb425f8 100644 --- a/line-log.c +++ b/line-log.c @@ -1159,7 +1159,7 @@ static int bloom_filter_check(struct rev_info *rev, return 1; if (!rev->bloom_filter_settings || - !(filter = get_bloom_filter(rev->repo, commit, 0))) + !(filter = get_bloom_filter(rev->repo, commit))) return 1; if (!range) diff --git a/revision.c b/revision.c index 6aeb764821..c53ab99e59 100644 --- a/revision.c +++ b/revision.c @@ -752,7 +752,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; - filter = get_bloom_filter(revs->repo, commit, 0); + filter = get_bloom_filter(revs->repo, commit); if (!filter) { count_bloom_filter_not_present++; diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 5e77d56f59..9f7bb729fc 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -39,7 +39,8 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) struct bloom_filter *filter; setup_git_directory(); c = lookup_commit(the_repository, commit_oid); - filter = get_bloom_filter(the_repository, c, 1); + filter = get_or_compute_bloom_filter(the_repository, c, 1, + NULL); print_bloom_filter(filter); } From patchwork Wed Sep 16 18:07:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780347 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1FBCA6CA for ; Wed, 16 Sep 2020 18:09:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EA67B21582 for ; Wed, 16 Sep 2020 18:09:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="BypM4iCI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727707AbgIPSJE (ORCPT ); Wed, 16 Sep 2020 14:09:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727703AbgIPSHv (ORCPT ); Wed, 16 Sep 2020 14:07:51 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 536B7C061756 for ; Wed, 16 Sep 2020 11:07:51 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id o5so9091494qke.12 for ; Wed, 16 Sep 2020 11:07:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=6gVVOBGTmn8w/w2wgbowEBipPA6rBGV/NBds/GeyL/g=; b=BypM4iCIRgJvjtq4o8z+tyew4nNP47zAMKcaOKnIPnOUzmUQUy7aS79dTSgH6ZAixB VIsJAAonDqQ8vyQ/2Vatx5Y6nD2RrfCYwIM/bo0xmqX3T649lKqjcfcXtsqBblVyObhv Y2Tx/1iJUG+QpDX1qEXPRAy+qNdZ1kfCcaGJZJI2UwTOUeXGvlNFEO08WtIBYDioou99 9IZ2cLch/6LRivM/9VNBytU1LfjBiL6HdXXBr1uWBUhu+SywzxBQ/WXSJKoAhiatwiCV CP1eaR/KPYDtpKAjyVayID3wo9EIv4Aa/jlGhrIu+bBtAerKck1s37bxBM5cX+o2Zt9d herQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=6gVVOBGTmn8w/w2wgbowEBipPA6rBGV/NBds/GeyL/g=; b=FQBaWWVvwyjvuzUXU8ddruuXMmFcPRHP9sjyzvM2HS3co5lAIgVYX6/JSBQDjIsYRJ su6M2zD+++b9jzAElKqLMuMctfB1dSdjX+tfKeJ71xbItScYoreTBhHXh8Wm49zgIMB4 WXl6SKLRH3vRscgxDV3yn3mP8XzJRwhZR8IiJYnGhXXQ+uVjzWO4iX1tVNpZ0UpHzESS Nf8YEs/gz55MVvIg/MzPCowoyi2XDZxfHlBmQupiDmXUXuQQADt65QCzWv7K5T8V/ZDk rXggQafMcN+bYlVCnkZmuKX1ObBjBhDJuN+UrNndDBFCI28GiNHy211ubtXsNQ+HF8Tr B+ng== X-Gm-Message-State: AOAM532xD3MWeaf60XXiE9JO7lzaKS82GzBmp2zDWSoATygnrTavv+BU E98YH/e+c6VpLFg0r3QDMmA6hUS7NP1wnJ0Q X-Google-Smtp-Source: ABdhPJwFd6iveCdBeZLNRbfjoVbzNoo/PKM6eQoiBrNxa4OAyEv3wkslX2GzLhkfTVAp46ncjWEh+Q== X-Received: by 2002:a37:7fc3:: with SMTP id a186mr19592527qkd.381.1600279669984; Wed, 16 Sep 2020 11:07:49 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id o4sm19337326qkk.75.2020.09.16.11.07.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:48 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:46 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 08/13] bloom: use provided 'struct bloom_filter_settings' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau --- bloom.c | 13 ++++++------- bloom.h | 3 ++- commit-graph.c | 21 ++++++++++----------- t/helper/test-bloom.c | 1 + 4 files changed, 19 insertions(+), 19 deletions(-) diff --git a/bloom.c b/bloom.c index 393c61b4bc..2d6aef9098 100644 --- a/bloom.c +++ b/bloom.c @@ -180,13 +180,12 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, enum bloom_filter_computed *computed) { struct bloom_filter *filter; - struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; int i; struct diff_options diffopt; - int max_changes = 512; if (computed) *computed = BLOOM_NOT_COMPUTED; @@ -211,7 +210,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; diffopt.detect_rename = 0; - diffopt.max_changes = max_changes; + diffopt.max_changes = settings->max_changed_paths; diff_setup_done(&diffopt); /* ensure commit is parsed so we have parent information */ @@ -223,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= max_changes) { + if (diffopt.num_changes <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -260,13 +259,13 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } - filter->len = (hashmap_get_size(&pathmap) * settings.bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; - fill_bloom_key(e->path, strlen(e->path), &key, &settings); - add_key_to_filter(&key, filter, &settings); + fill_bloom_key(e->path, strlen(e->path), &key, settings); + add_key_to_filter(&key, filter, settings); } hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); diff --git a/bloom.h b/bloom.h index e2e035ad14..c6d77e8393 100644 --- a/bloom.h +++ b/bloom.h @@ -98,10 +98,11 @@ enum bloom_filter_computed { struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, enum bloom_filter_computed *computed); #define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL) + (r), (c), 0, NULL, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index 45fcd62596..1ca754f19c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1434,6 +1434,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->r, c, 1, + ctx->bloom_settings, &computed); if (computed & BLOOM_COMPUTED) { ctx->count_bloom_filter_computed++; @@ -1691,17 +1692,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int num_chunks = 3; uint64_t chunk_offset; struct object_id file_hash; - struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; - - if (!ctx->bloom_settings) { - bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", - bloom_settings.bits_per_entry); - bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", - bloom_settings.num_hashes); - bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", - bloom_settings.max_changed_paths); - ctx->bloom_settings = &bloom_settings; - } if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -2147,6 +2137,7 @@ int write_commit_graph(struct object_directory *odb, uint32_t i, count_distinct = 0; int res = 0; int replace = 0; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; if (!commit_graph_compatible(the_repository)) return 0; @@ -2160,6 +2151,14 @@ int write_commit_graph(struct object_directory *odb, ctx->split_opts = split_opts; ctx->total_bloom_filter_data_size = 0; + bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", + bloom_settings.bits_per_entry); + bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", + bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); + ctx->bloom_settings = &bloom_settings; + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 9f7bb729fc..46e97b04eb 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -40,6 +40,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) setup_git_directory(); c = lookup_commit(the_repository, commit_oid); filter = get_or_compute_bloom_filter(the_repository, c, 1, + &settings, NULL); print_bloom_filter(filter); } From patchwork Wed Sep 16 18:07:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780355 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F5EE6CA for ; Wed, 16 Sep 2020 18:11:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 376A22083B for ; Wed, 16 Sep 2020 18:11:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="TLotjE0Q" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727703AbgIPSLE (ORCPT ); Wed, 16 Sep 2020 14:11:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39090 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727783AbgIPSJT (ORCPT ); Wed, 16 Sep 2020 14:09:19 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF6A3C061788 for ; Wed, 16 Sep 2020 11:07:57 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id e7so6837855qtj.11 for ; Wed, 16 Sep 2020 11:07:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=wjiHXF1qjLf93piJYX/E8ScTlX22Y2OZBx0J5zlVtlM=; b=TLotjE0Ql8vbzSCGxHRAXNCgTO1VjJJoK+Nm5mpM2jNk4SyYNHtv2ydcnhfcDm6Qad 1a0o65Ke/e32I9Z7I8giR+0JNkZlSdxTwQV27+ZoU4eBflqPbx4Us7f6DJPauXhMrEzg 1mNOG8zqTXZHIE6Fbad9KqdBKUTtSMED+jcOzu2ESgzAuLQBntoii9GZ6WABGfzkkoZB GTZkJ2jKiywFFhdcY3XLtwiVPwnUlZMRgI0us9agYBFdSPYeuyzbcn08vnxQqKi7htsw t7cw7BvhGnbscKWhVSDGv/qeylOClg0w0RTGI/1B2mp+skPVd6vDh4MWu/ON73MpIv0Z lRCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=wjiHXF1qjLf93piJYX/E8ScTlX22Y2OZBx0J5zlVtlM=; b=jEz4qu+6ih5m5Y78YL5oBLZ/QvHbHXBLGSQOODGeaUnszKT9TlQtOApgviK3PpljMs z7XqzPxyY3zgJJirX6MU14aG+n9W1WV94DOErryk+7r4Dg3IQNp+FpR7uYZoncYuXGSN oT8gB0ftlpw5awWON7xuxT3QkRBWrJL4g0vGbt3kQWF15MI6B6BaB7a3PFzUzJLuJ2jl 894InxmBB09M3RHXPeK6Z14Wgu6MTRMB/rG9X1SHDAaxG8cRNAnGs/pW6U7aQPHK2K7v g9R1fBE2DmciVWNfGC/DvXU44dgPTh+PMyktVXfnKpFaZIsee9psWQfblBsfz06otFPy poOg== X-Gm-Message-State: AOAM530wfbmAAit5SWmgi3FAYaJRT5egcXqP68G09IPQdRw0O5jKpl/H DzIzZ5LNA1wHc8/8+YP/0pQrWspa0575vYhI X-Google-Smtp-Source: ABdhPJzI2UeTz6UaixiJ+uo6puT3vHhiCA4iWGxvr5TmR+eB0bYaMTkCQ7SHyggvj1QtQezKnHYm9Q== X-Received: by 2002:ac8:4e49:: with SMTP id e9mr11098874qtw.334.1600279676362; Wed, 16 Sep 2020 11:07:56 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id u18sm20256252qtk.61.2020.09.16.11.07.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:07:55 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:52 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 09/13] bloom/diff: properly short-circuit on max_changes Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee Commit e3696980 (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are | | N=1 | N=10 | N=512 | |--------|----------------|----------------|----------------| | HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB | | HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB | For Linux, the results are | | N=1 | N=20 | N=512 | |--------|----------------|---------------|---------------| | HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB | | HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB | Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- bloom.c | 9 +++- diff.h | 2 - t/t4216-log-bloom.sh | 100 +++++++++++++++++++++++++++++++++++++++---- tree-diff.c | 5 +-- 4 files changed, 100 insertions(+), 16 deletions(-) diff --git a/bloom.c b/bloom.c index 2d6aef9098..db9fb82437 100644 --- a/bloom.c +++ b/bloom.c @@ -222,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= settings->max_changed_paths) { + if (diff_queued_diff.nr <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -259,6 +259,12 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } + if (hashmap_get_size(&pathmap) > settings->max_changed_paths) { + if (computed) + *computed |= BLOOM_TRUNC_LARGE; + goto cleanup; + } + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); @@ -268,6 +274,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, add_key_to_filter(&key, filter, settings); } + cleanup: hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); } else { for (i = 0; i < diff_queued_diff.nr; i++) diff --git a/diff.h b/diff.h index e0c0af6286..1d32b71885 100644 --- a/diff.h +++ b/diff.h @@ -287,8 +287,6 @@ struct diff_options { /* If non-zero, then stop computing after this many changes. */ int max_changes; - /* For internal use only. */ - int num_changes; int ita_invisible_in_index; /* white-space error highlighting */ diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 47ddf2641f..1ac8f4c4eb 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -181,21 +181,103 @@ test_expect_success 'persist filter settings' ' grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt ' +test_max_changed_paths () { + grep "\"max_changed_paths\":$1" $2 +} + +test_filter_computed () { + grep "\"key\":\"filter-computed\",\"value\":\"$1\"" $2 +} + +test_filter_trunc_large () { + grep "\"key\":\"filter-trunc-large\",\"value\":\"$1\"" $2 +} + test_expect_success 'correctly report changes over limit' ' - git init 513changes && + git init limits && ( - cd 513changes && - for i in $(test_seq 1 513) + cd limits && + mkdir d && + mkdir d/e && + + for i in $(test_seq 1 2) do - echo $i >file$i.txt || return 1 + printf $i >d/file$i.txt && + printf $i >d/e/file$i.txt || return 1 done && - git add . && + + mkdir mode && + printf bash >mode/script.sh && + + mkdir foo && + touch foo/bar && + touch foo.txt && + + git add d foo foo.txt mode && git commit -m "files" && - git commit-graph write --reachable --changed-paths && - for i in $(test_seq 1 513) + + # Commit has 7 file and 4 directory adds + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git commit-graph write --reachable --changed-paths && + test_max_changed_paths 10 trace && + test_filter_computed 1 trace && + test_filter_trunc_large 1 trace && + + for path in $(git ls-tree -r --name-only HEAD) do - git -c core.commitGraph=false log -- file$i.txt >expect && - git log -- file$i.txt >actual && + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # Make a variety of path changes + printf new1 >d/e/file1.txt && + printf new2 >d/file2.txt && + rm d/e/file2.txt && + rm -r foo && + printf text >foo && + mkdir f && + printf new1 >f/file1.txt && + + # including a mode-only change (counts as modified) + git update-index --chmod=+x mode/script.sh && + + git add foo d f && + git commit -m "complicated" && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace-edit" \ + git commit-graph write --reachable --changed-paths && + test_max_changed_paths 10 trace-edit && + test_filter_computed 2 trace-edit && + test_filter_trunc_large 2 trace-edit && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=11 \ + GIT_TRACE2_EVENT="$(pwd)/trace-update" \ + git commit-graph write --reachable --changed-paths && + test_max_changed_paths 11 trace-update && + test_filter_computed 2 trace-update && + test_filter_trunc_large 0 trace-update && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && test_cmp expect actual || return 1 done ) diff --git a/tree-diff.c b/tree-diff.c index 6ebad1a46f..7cebbb327e 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -434,7 +434,7 @@ static struct combine_diff_path *ll_diff_tree_paths( if (diff_can_quit_early(opt)) break; - if (opt->max_changes && opt->num_changes > opt->max_changes) + if (opt->max_changes && diff_queued_diff.nr > opt->max_changes) break; if (opt->pathspec.nr) { @@ -521,7 +521,6 @@ static struct combine_diff_path *ll_diff_tree_paths( /* t↓ */ update_tree_entry(&t); - opt->num_changes++; } /* t > p[imin] */ @@ -539,7 +538,6 @@ static struct combine_diff_path *ll_diff_tree_paths( skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ update_tp_entries(tp, nparent); - opt->num_changes++; } } @@ -557,7 +555,6 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { - opt->num_changes = 0; p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); /* From patchwork Wed Sep 16 18:07:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780349 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EB676CA for ; Wed, 16 Sep 2020 18:09:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 69C8C2083B for ; Wed, 16 Sep 2020 18:09:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="piyIiAKs" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727801AbgIPSJj (ORCPT ); Wed, 16 Sep 2020 14:09:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39092 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727785AbgIPSJT (ORCPT ); Wed, 16 Sep 2020 14:09:19 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F33DC06178A for ; Wed, 16 Sep 2020 11:08:04 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id g72so9147032qke.8 for ; Wed, 16 Sep 2020 11:08:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=Z+Z0ZgoCx8GzfG/7B2Il5ctzxmP3dZx1lu4Bc5/+onQ=; b=piyIiAKsKRgnfNO6pNWnFUowedJrq/O4203HgAFJ59B0hPM9AdbgWTqVLeXD57OPPg uAQ4xtps2ywl6dtPUQjlBYcNELeii9RoFHgm0mS/gtaYvRpITVIyuRuqu6MBZwCS3ut0 x9tcIpFuPcTpvbKHiy1MdmQOUI6hfSsRzFCK9ySSqsrQfcpoeAiTzHJxx84I1nRbhK0K +7ATG/WUx+GsoYw3h3r89HNvBQp4XRsSL+6SmAMeW2DFRaZfXEyPTRBOk6nNC4V+vNNC Tb3ppnYH5WggXw7Qjbu2GkilSA8rNkswLhpAy5un0USEIOApnjVW4wNUouQo1K8N3C6L 56pA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=Z+Z0ZgoCx8GzfG/7B2Il5ctzxmP3dZx1lu4Bc5/+onQ=; b=Uowbu2yjtawBaKmvOoz4+H6IaWUrwukol5SvxG/8BIGGvmHpjPZLdcp7ly8s7IJq6q MgdvBoeSsCT4Eu49ZDWbeAKM7p6yqjpI8amtYGumQEkCyrrmBQ5/zbzMM3g4/szgu0NE Tgjm5OEDfEMFtkaJcgOgplmY2+Fc39p1yOizhC0Cou0Luna0mMVu9/hAy31UhLwY1pk1 HTDc9NlUzacOiU1zhIAFFMf6I+f5hMCqWtqXuRW2Nt8Xgd5o0EHf4QTCKGNXHWlgDn97 y+UGvYxSifEwjorHyAaxYmHaXBnD5S+J0kU0CY9GgLzjE4jJIQWGD5hLCjydcPElBIy7 cAow== X-Gm-Message-State: AOAM533TXVbTgfTas2wxVOeAWkMYp5XkzF/8D/bVMBUaR4KS3mX8eWlO g8rgsPuBgJsMs1BoxXjXH0lKE5LpaRUG6kgk X-Google-Smtp-Source: ABdhPJwpFkD9FuC7kT9WDGSmO3QQlnHw9QA/g7XOj3PpjEigAXNDpAe20u7Yzs2gufqIl9x59Lys7A== X-Received: by 2002:ae9:f301:: with SMTP id p1mr24297625qkg.216.1600279683068; Wed, 16 Sep 2020 11:08:03 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id d10sm19419141qkk.1.2020.09.16.11.08.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:08:02 -0700 (PDT) Date: Wed, 16 Sep 2020 14:07:59 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty Message-ID: <4653b5b4bcd254a3791797214b46722b4062dc18.1600279373.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a changed-path Bloom filter has either zero, or more than a certain number (commonly 512) of entries, the commit-graph machinery encodes it as "missing". More specifically, it sets the indices adjacent in the BIDX chunk as equal to each other to indicate a "length 0" filter; that is, that the filter occupies zero bytes on disk. This has heretofore been fine, since the commit-graph machinery has no need to care about these filters with too few or too many changed paths. Both cases act like no filter has been generated at all, and so there is no need to store them. In a subsequent commit, however, the commit-graph machinery will learn to only compute Bloom filters for some commits in the current commit-graph layer. This is a change from the current implementation which computes Bloom filters for all commits that are in the layer being written. Critically for this patch, only computing some of the Bloom filters means adding a third state for length 0 Bloom filters: zero entries, too many entries, or "hasn't been computed". It will be important for that future patch to distinguish between "not representable" (i.e., zero or too-many changed paths), and "hasn't been computed". In particular, we don't want to waste time recomputing filters that have already been computed. To that end, change how we store Bloom filters in the "computed but not representable" category: - Bloom filters with no entries are stored as a single byte with all bits low (i.e., all queries to that Bloom filter will return "definitely not") - Bloom filters with too many entries are stored as a single byte with all bits set high (i.e., all queries to that Bloom filter will return "maybe"). These rules are sufficient to not incur a behavior change by changing the on-disk representation of these two classes. Likewise, no specification changes are necessary for the commit-graph format, either: - Filters that were previously empty will be recomputed and stored according to the new rules, and - old clients reading filters generated by new clients will interpret the filters correctly and be none the wiser to how they were generated. Clients will invoke the Bloom machinery in more cases than before, but this can be addressed by returning a NULL filter when all bits are set high. This can be addressed in a future patch. Finally, note that this does increase the size of on-disk commit-graphs, but far less than other proposals. In particular, this is generally more efficient than storing a bitmap for which commits haven't computed their Bloom filters. Storing a bitmap incurs a penalty of one bit per commit, whereas storing explicit filters as above incurs a penalty of one byte per too-large or too-small commit. In practice, these boundary commits likely occupy a small proportion of the overall number of commits, and so the size penalty is likely smaller than storing a bitmap for all commits. A test to exercise filters which contain too many changed path entries will be introduced in a subsequent patch. Suggested-by: SZEDER Gábor Suggested-by: Jakub Narębski Helped-by: Derrick Stolee Signed-off-by: Taylor Blau --- .../technical/commit-graph-format.txt | 2 +- bloom.c | 16 ++++++++-- bloom.h | 1 + commit-graph.c | 5 ++++ t/t0095-bloom.sh | 8 ++--- t/t4216-log-bloom.sh | 30 +++++++++++++++++-- 6 files changed, 53 insertions(+), 9 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 6ddbceba15..6585f1948a 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -125,7 +125,7 @@ CHUNK DATA: * The rest of the chunk is the concatenation of all the computed Bloom filters for the commits in lexicographic order. * Note: Commits with no changes or more than 512 changes have Bloom filters - of length zero. + of length one, with either all bits set to zero or one respectively. * The BDAT chunk is present if and only if BIDX is present. Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] diff --git a/bloom.c b/bloom.c index db9fb82437..d24747a1d5 100644 --- a/bloom.c +++ b/bloom.c @@ -177,6 +177,13 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } +static void init_truncated_large_filter(struct bloom_filter *filter) +{ + filter->data = xmalloc(1); + filter->data[0] = 0xFF; + filter->len = 1; +} + struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, @@ -260,12 +267,18 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } if (hashmap_get_size(&pathmap) > settings->max_changed_paths) { + init_truncated_large_filter(filter); if (computed) *computed |= BLOOM_TRUNC_LARGE; goto cleanup; } filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + if (!filter->len) { + if (computed) + *computed |= BLOOM_TRUNC_SMALL; + filter->len = 1; + } filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { @@ -279,8 +292,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } else { for (i = 0; i < diff_queued_diff.nr; i++) diff_free_filepair(diff_queued_diff.queue[i]); - filter->data = NULL; - filter->len = 0; + init_truncated_large_filter(filter); if (computed) *computed |= BLOOM_TRUNC_LARGE; diff --git a/bloom.h b/bloom.h index c6d77e8393..70a8840896 100644 --- a/bloom.h +++ b/bloom.h @@ -93,6 +93,7 @@ enum bloom_filter_computed { BLOOM_NOT_COMPUTED = (1 << 0), BLOOM_COMPUTED = (1 << 1), BLOOM_TRUNC_LARGE = (1 << 2), + BLOOM_TRUNC_SMALL = (1 << 3), }; struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff --git a/commit-graph.c b/commit-graph.c index 1ca754f19c..bd4247bca5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -974,6 +974,7 @@ struct write_commit_graph_context { int count_bloom_filter_computed; int count_bloom_filter_not_computed; + int count_bloom_filter_trunc_small; int count_bloom_filter_trunc_large; }; @@ -1402,6 +1403,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte ctx->count_bloom_filter_computed); trace2_data_intmax("commit-graph", ctx->r, "filter-not-computed", ctx->count_bloom_filter_not_computed); + trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-small", + ctx->count_bloom_filter_trunc_small); trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large", ctx->count_bloom_filter_trunc_large); } @@ -1438,6 +1441,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) &computed); if (computed & BLOOM_COMPUTED) { ctx->count_bloom_filter_computed++; + if (computed & BLOOM_TRUNC_SMALL) + ctx->count_bloom_filter_trunc_small++; if (computed & BLOOM_TRUNC_LARGE) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh index 232ba2c485..7e4ab1795f 100755 --- a/t/t0095-bloom.sh +++ b/t/t0095-bloom.sh @@ -71,8 +71,8 @@ test_expect_success 'get bloom filters for commit with no changes' ' git init && git commit --allow-empty -m "c0" && cat >expect <<-\EOF && - Filter_Length:0 - Filter_Data: + Filter_Length:1 + Filter_Data:00| EOF test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual && test_cmp expect actual @@ -107,8 +107,8 @@ test_expect_success EXPENSIVE 'get bloom filter for commit with 513 changes' ' git add bigDir && git commit -m "commit with 513 changes" && cat >expect <<-\EOF && - Filter_Length:0 - Filter_Data: + Filter_Length:1 + Filter_Data:ff| EOF test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual && test_cmp expect actual diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 1ac8f4c4eb..a0c9c9ea23 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -30,6 +30,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' rm file_to_be_deleted && git add . && git commit -m "file removed" && + git commit --allow-empty -m "empty" && git commit-graph write --reachable --changed-paths && test_oid_cache <<-EOF @@ -49,7 +50,7 @@ graph_read_expect () { } test_expect_success 'commit-graph write wrote out the bloom chunks' ' - graph_read_expect 15 + graph_read_expect 16 ' # Turn off any inherited trace2 settings for this test. @@ -156,7 +157,7 @@ test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' test_bloom_filters_used_when_some_filters_are_missing () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":8" + bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":9" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom @@ -185,10 +186,18 @@ test_max_changed_paths () { grep "\"max_changed_paths\":$1" $2 } +test_filter_not_computed () { + grep "\"key\":\"filter-not-computed\",\"value\":\"$1\"" $2 +} + test_filter_computed () { grep "\"key\":\"filter-computed\",\"value\":\"$1\"" $2 } +test_filter_trunc_small () { + grep "\"key\":\"filter-trunc-small\",\"value\":\"$1\"" $2 +} + test_filter_trunc_large () { grep "\"key\":\"filter-trunc-large\",\"value\":\"$1\"" $2 } @@ -283,4 +292,21 @@ test_expect_success 'correctly report changes over limit' ' ) ' +test_expect_success 'correctly report commits with no changed paths' ' + git init small && + test_when_finished "rm -fr small" && + ( + cd small && + + git commit --allow-empty -m "initial commit" && + + GIT_TRACE2_EVENT="$(pwd)/trace.event" \ + git commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace.event && + test_filter_not_computed 0 trace.event && + test_filter_trunc_small 1 trace.event && + test_filter_trunc_large 0 trace.event + ) +' + test_done From patchwork Wed Sep 16 18:08:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780351 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E710F6CA for ; Wed, 16 Sep 2020 18:09:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C0AC72137B for ; Wed, 16 Sep 2020 18:09:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="KxNF54xq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727802AbgIPSJq (ORCPT ); Wed, 16 Sep 2020 14:09:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727787AbgIPSJT (ORCPT ); Wed, 16 Sep 2020 14:09:19 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CD5AC06178B for ; Wed, 16 Sep 2020 11:08:10 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id o16so9126834qkj.10 for ; Wed, 16 Sep 2020 11:08:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=3ScqAQ2ax55WMk5nkE3TJzK0kIlB9+E6bKbRDV+G1RY=; b=KxNF54xqajfZZQ7D9rccPe2A7u77jaNXHaAXF83s06Y8huy2oL9ebGq9f6wZzLAtKY G9zqQBx0nmrmGEk3mfkNPV+rrUSDrI+5j0/jRHynmAS0z31CAGwDU7yImcbghZ/humrQ sRNZzgqJZjHTWQ2mwJKPknLqo8fs0TjkxI/70HANTShJbJ4fWP7q9Z/VUCIITesvvB+v kKOEOtwUPPhMy701yEpVOgjUR1eEGiZItsbdOB3PDNlp7KwnZmWhEqzrca77kJ4blSLE AGZDSXA46KKErZJL1gzZLlfuY5HBkybeztR1oaBhtLbPB1VlAJDMK0jb0qiKsrLoktJd f9Cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=3ScqAQ2ax55WMk5nkE3TJzK0kIlB9+E6bKbRDV+G1RY=; b=rGwttY7L/sNd8LG0UuToO2pMqBAU+Gafc85WX+fOvSCx+ENnrL7UOKrCI+xyuAqilt csiLwuaCmrm7ltAgsQVZ9rR7gAyIgWal35YvLXRNgDKWElxvUEZV/Ggdb9kT6hq2whXB OwvYPzb8a0c03hm+se8L8zxAoRKA/XY9kJLZwJZXuThkIiIP5U9tpGaB3DPSfTwUQmyF /edKkI8RgV8w/FRdGOgsCQtNfDTqJ3lk/1MkOG+Po+wkX3xonOBaA00JWjXJ/JDT5zaE iyqF30+hzVdqVfVQDm6Rjc1L48PnlH1apyeCjzAcqKWoE236dUMToYl6eGcjDYPk9SkU PrOA== X-Gm-Message-State: AOAM532WY1Vv2pDpDRbmU18Y8uZipJeuqVSEVzXHAFX/S1JI/p+c5jIU xNrcTE01ikm2BIsfqDgJ1fAuqtvrmjTg1bad X-Google-Smtp-Source: ABdhPJwebm3YMLR+YIVC8h9t+a8gO+06a/dNfM3A95v8WLq8dNkY33i9wNdbYxjJL7VTed+livzkzQ== X-Received: by 2002:a37:a1cd:: with SMTP id k196mr23577981qke.409.1600279689024; Wed, 16 Sep 2020 11:08:09 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id j11sm20628287qko.111.2020.09.16.11.08.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:08:08 -0700 (PDT) Date: Wed, 16 Sep 2020 14:08:06 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 11/13] commit-graph: rename 'split_commit_graph_opts' Message-ID: <51f9a398fc5f09d19bafbb48bd4218efbfa826b6.1600279373.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the subsequent commit, additional options will be added to the commit-graph API which have nothing to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Likewise, rename the 'flags' member to instead be 'split_flags' to clarify that it only has to do with the behavior implied by '--split'. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 20 ++++++++++---------- commit-graph.c | 40 ++++++++++++++++++++-------------------- commit-graph.h | 8 ++++---- 3 files changed, 34 insertions(+), 34 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ba5584463f..f3243bd982 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,7 +119,7 @@ static int graph_verify(int argc, const char **argv) } extern int read_replace_refs; -static struct split_commit_graph_opts split_opts; +static struct commit_graph_opts write_opts; static int write_option_parse_split(const struct option *opt, const char *arg, int unset) @@ -187,24 +187,24 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + OPT_CALLBACK_F(0, "split", &write_opts.split_flags, NULL, N_("allow writing an incremental commit-graph file"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, write_option_parse_split), - OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + OPT_INTEGER(0, "max-commits", &write_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), - OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + OPT_INTEGER(0, "size-multiple", &write_opts.size_multiple, N_("maximum ratio between two levels of a split commit-graph")), - OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), OPT_END(), }; opts.progress = isatty(2); opts.enable_changed_paths = -1; - split_opts.size_multiple = 2; - split_opts.max_commits = 0; - split_opts.expire_time = 0; + write_opts.size_multiple = 2; + write_opts.max_commits = 0; + write_opts.expire_time = 0; trace2_cmd_mode("write"); @@ -232,7 +232,7 @@ static int graph_write(int argc, const char **argv) odb = find_odb(the_repository, opts.obj_dir); if (opts.reachable) { - if (write_commit_graph_reachable(odb, flags, &split_opts)) + if (write_commit_graph_reachable(odb, flags, &write_opts)) return 1; return 0; } @@ -261,7 +261,7 @@ static int graph_write(int argc, const char **argv) opts.stdin_packs ? &pack_indexes : NULL, opts.stdin_commits ? &commits : NULL, flags, - &split_opts)) + &write_opts)) result = 1; cleanup: diff --git a/commit-graph.c b/commit-graph.c index bd4247bca5..913f78a9a1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -968,7 +968,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; - const struct split_commit_graph_opts *split_opts; + const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1292,8 +1292,8 @@ static void close_reachable(struct write_commit_graph_context *ctx) { int i; struct commit *commit; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->split_flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; if (ctx->report_progress) ctx->progress = start_delayed_progress( @@ -1482,7 +1482,7 @@ static int add_ref_to_set(const char *refname, int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct oidset commits = OIDSET_INIT; struct refs_cb_data data; @@ -1499,7 +1499,7 @@ int write_commit_graph_reachable(struct object_directory *odb, stop_progress(&data.progress); result = write_commit_graph(odb, NULL, &commits, - flags, split_opts); + flags, opts); oidset_clear(&commits); return result; @@ -1614,8 +1614,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) static void copy_oids_to_commits(struct write_commit_graph_context *ctx) { uint32_t i; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->split_flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; ctx->num_extra_edges = 0; if (ctx->report_progress) @@ -1900,13 +1900,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) int max_commits = 0; int size_mult = 2; - if (ctx->split_opts) { - max_commits = ctx->split_opts->max_commits; + if (ctx->opts) { + max_commits = ctx->opts->max_commits; - if (ctx->split_opts->size_multiple) - size_mult = ctx->split_opts->size_multiple; + if (ctx->opts->size_multiple) + size_mult = ctx->opts->size_multiple; - flags = ctx->split_opts->flags; + flags = ctx->opts->split_flags; } g = ctx->r->objects->commit_graph; @@ -2084,8 +2084,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) size_t dirnamelen; timestamp_t expire_time = time(NULL); - if (ctx->split_opts && ctx->split_opts->expire_time) - expire_time = ctx->split_opts->expire_time; + if (ctx->opts && ctx->opts->expire_time) + expire_time = ctx->opts->expire_time; if (!ctx->split) { char *chain_file_name = get_chain_filename(ctx->odb); unlink(chain_file_name); @@ -2136,7 +2136,7 @@ int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -2153,7 +2153,7 @@ int write_commit_graph(struct object_directory *odb, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->split_opts = split_opts; + ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", @@ -2201,15 +2201,15 @@ int write_commit_graph(struct object_directory *odb, } } - if (ctx->split_opts) - replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx->opts) + replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; } ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) - ctx->oids.alloc = split_opts->max_commits; + if (ctx->split && opts && ctx->oids.alloc > opts->max_commits) + ctx->oids.alloc = opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->odb); diff --git a/commit-graph.h b/commit-graph.h index d9acb22bac..b7914b0a7a 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -105,11 +105,11 @@ enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_REPLACE = 2 }; -struct split_commit_graph_opts { +struct commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; - enum commit_graph_split_flags flags; + enum commit_graph_split_flags split_flags; }; /* @@ -120,12 +120,12 @@ struct split_commit_graph_opts { */ int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); #define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) From patchwork Wed Sep 16 18:08:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780353 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD7986CA for ; Wed, 16 Sep 2020 18:09:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A34632137B for ; Wed, 16 Sep 2020 18:09:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="B2UnH2vx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727809AbgIPSJv (ORCPT ); Wed, 16 Sep 2020 14:09:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727762AbgIPSJT (ORCPT ); Wed, 16 Sep 2020 14:09:19 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43896C061797 for ; Wed, 16 Sep 2020 11:08:15 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id w186so9200698qkd.1 for ; Wed, 16 Sep 2020 11:08:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=nkm2hW01F+x4POn5IQ5Zq/VdAiiMrlutWoaJBtfCkuk=; b=B2UnH2vxSAtSMDBCpYbF81H4aASVuIetykHOcLlk9POPBwDABh1goFtvhXKXKidjyZ puOBEVZqPEz9z6bUQH9K0Rd1LnwaxLYu2WAdd7GyzFdSVDRY3rh2QWGT+B+WvFkUqgyc TT1M2rR8M+YaRE3C3RJkk7CF8+d5vSmlsE9BykoSZ9L6COdmCz/ALHw/WSj0yn5S4dZF h9cHnX8RvDrbEFIoKr/F9SGDIqpzJXUqFjcuYVPriDEnQNRXL1XpMuAZdbe3YQhpgo2W pmUzR7HqIj/uDLAblf4iI3oRR5AJnDIMSUz8YFyriFGBnJbAXEzgZ0Cy42TBa2nOOLEM blGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=nkm2hW01F+x4POn5IQ5Zq/VdAiiMrlutWoaJBtfCkuk=; b=MFJQU07LhYJ12r0cirDSL9fBCuAyhFS7ssyV5i4p+3RdbPy9M5eTz+1FOjgT+O700Y bcicJtB4bLOlUYGeRfrSbuzEV4i7LpUZmGVa7H29DuFQmT/SSvxUEjgt2FWEESdoNeWH oJmEZ1lwQLjnG6DR67v+n0bm4c8QDlLBIU+sBjM7pIbLSod0M0LrpXnRAaUyN7N8sNoh zB1X42twpq4iA1Xzug7w2lSgOqm23URs28jzHp0pJO025gsnLgOrtMqRZie4EPCrSmHx IF9twrbQPO4S1Y8O5Ij6cZ14ClndkVN069ju73S6oncCiSfPbfcD66KB9XsTj3qXnOBW l81A== X-Gm-Message-State: AOAM531l8n9hAjnQBhSgI5VsPCYlYMczJONQmI3XrGxIaIJimlz3RCHQ +ZofjUgUtj/ElbSblzAvptW8y/Q550OYdqrk X-Google-Smtp-Source: ABdhPJzcy8fQ7ccLSyYXDbFBVmm6ED6vqVX2ehyidJVi8BAV8+MTMlLiC4gXKVFPvf5pKa0uVICBCg== X-Received: by 2002:a37:9046:: with SMTP id s67mr24244387qkd.212.1600279693916; Wed, 16 Sep 2020 11:08:13 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id 145sm20381718qkf.18.2020.09.16.11.08.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:08:13 -0700 (PDT) Date: Wed, 16 Sep 2020 14:08:10 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 12/13] builtin/commit-graph.c: introduce '--max-new-filters=' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a command-line flag to specify the maximum number of new Bloom filters that a 'git commit-graph write' is willing to compute from scratch. Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed). This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. To make this command behave more predictably, introduce '--max-new-filters=' to allow computing at most '' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-commit-graph.txt | 5 +++ bloom.c | 7 ++- builtin/commit-graph.c | 27 +++++++++++- commit-graph.c | 9 +++- commit-graph.h | 1 + t/t4216-log-bloom.sh | 70 ++++++++++++++++++++++++++++++ 6 files changed, 111 insertions(+), 8 deletions(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..8357846d30 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,11 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Commits whose filters are not calculated are stored as a +length zero Bloom filter. ++ With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index d24747a1d5..230a515831 100644 --- a/bloom.c +++ b/bloom.c @@ -204,12 +204,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) - return filter; + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); } - if (filter->data) + if (filter->data && filter->len) return filter; if (!compute_if_not_present) return NULL; diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index f3243bd982..5df9b2ef80 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); diff --git a/commit-graph.c b/commit-graph.c index 913f78a9a1..33af6c2430 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1414,6 +1414,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; struct commit **sorted_commits; + int max_new_filters; init_bloom_filters(); @@ -1430,13 +1431,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) else QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + max_new_filters = ctx->opts && ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { enum bloom_filter_computed computed = 0; struct commit *c = sorted_commits[i]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed & BLOOM_COMPUTED) { @@ -1447,7 +1451,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) ctx->count_bloom_filter_not_computed++; - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + ctx->total_bloom_filter_data_size += filter + ? sizeof(unsigned char) * filter->len : 0; display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index b7914b0a7a..a22bd86701 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -110,6 +110,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags split_flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index a0c9c9ea23..9ce0c318e9 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -309,4 +309,74 @@ test_expect_success 'correctly report commits with no changed paths' ' ) ' +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + + rm -f trace.event && + GIT_TRACE2_EVENT="$(pwd)/trace.event" \ + git commit-graph write --reachable --split=replace \ + --changed-paths --max-new-filters=2 && + + test_filter_computed 2 trace.event && + test_filter_not_computed 3 trace.event && + test_filter_trunc_small 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd limits && + + rm -f trace.event && + GIT_TRACE2_EVENT="$(pwd)/trace.event" \ + git commit-graph write --reachable --changed-paths \ + --split=replace --max-new-filters=1 && + test_filter_computed 1 trace.event && + test_filter_not_computed 4 trace.event && + test_filter_trunc_small 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' + +test_expect_success 'Bloom generation backfills empty commits' ' + git init empty && + test_when_finished "rm -fr empty" && + ( + cd empty && + for i in $(test_seq 1 6) + do + git commit --allow-empty -m "$i" + done && + + # Generate Bloom filters for empty commits 1-6, two at a time. + for i in $(test_seq 1 3) + do + rm -f trace.event && + GIT_TRACE2_EVENT="$(pwd)/trace.event" \ + git commit-graph write --reachable \ + --changed-paths --max-new-filters=2 && + test_filter_computed 2 trace.event && + test_filter_not_computed 4 trace.event && + test_filter_trunc_small 2 trace.event && + test_filter_trunc_large 0 trace.event + done && + + # Finally, make sure that once all commits have filters, that + # none are subsequently recomputed. + rm -f trace.event && + GIT_TRACE2_EVENT="$(pwd)/trace.event" \ + git commit-graph write --reachable \ + --changed-paths --max-new-filters=2 && + test_filter_computed 0 trace.event && + test_filter_not_computed 6 trace.event && + test_filter_trunc_small 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' + test_done From patchwork Wed Sep 16 18:08:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11780367 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DF451139F for ; Wed, 16 Sep 2020 18:12:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BB23121655 for ; Wed, 16 Sep 2020 18:12:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="hDuqiQJ0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727731AbgIPSKy (ORCPT ); Wed, 16 Sep 2020 14:10:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727788AbgIPSJU (ORCPT ); Wed, 16 Sep 2020 14:09:20 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D8BDC061351 for ; Wed, 16 Sep 2020 11:08:20 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id k25so6890175qtu.4 for ; Wed, 16 Sep 2020 11:08:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=YSX5z86CwPGzYPidQYbMN3Nk/ZLpPYpBuu0bNpMg9yE=; b=hDuqiQJ0b/Yx1xmHfKL/Zllwf9ZJDZ8IfbotxNrQ9L32hr7I4G5Bu2DkaXv9LsNc7Z 1KFiJd+7d2I0dxKQgjPz5kr9vi0ZyD3r9yk8kRLhoUNQpVgE2MfURxZfKtbJPEOns0qz FdtpDVPiMETDlFM49MrwVfQMznK8htjNCm8psJykstguasBi4P3AO2vLCKq2YoZ1ftHV rK+Pb+vkZwCYzwpuvgx1wLR789UJTDrmN/y3/LdKiydhR60cidFThddI4b9i3V7jdCMl z2oUufwygPp0H39luj8gDK7F2FlBTCZU2mE6b64tVmQHBmMaGSiE4Bbu10u9FkKBX5hX nCig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=YSX5z86CwPGzYPidQYbMN3Nk/ZLpPYpBuu0bNpMg9yE=; b=nltJ5nBwyUiPeBOtjIlbaH9Su8oLUd62I+8FsGWCId6n9CfbUA+XlxqsE5pcCBh1FA SMDoID9Y1DDESfao3uiNbXQ7j1+yX6WvBL+iNErsyGcV8TvdHlJ16ewbFcjOMcB5JOx9 75tA+KAg8empk1H/okSxDu5PKR70itwi1j72oam7VsZ1T5rsLl2SI0dJWnkB8cq1o1/k eUfkCwLN3WbB6CEb7mG8SVKE7JVT7dli+pwCJRKGmZkQWv+0I9dNhd3B7MAe1eJh71Mz mh0+re0isrfTOcC5mZaKHOvPvB5qPsoFljKUVZcQVG0hal3pA1Vr0V92C/NDMo/a5Xkt 5OWQ== X-Gm-Message-State: AOAM533YQ/Y8JsB40/wIbQaHxpAN3PX4L/irWNy6jcDJRLlJCt0qYvle 9WcpNPgwxVr1DK/GifsKwpVu0qaKl0RSg+N9 X-Google-Smtp-Source: ABdhPJxdX6gYU4krGSBuKrRMrYo1qV+JLeDL2lZ0jjkbD+Zhj1WuIPquS6IhMOAXa1EZdHqnUNf+Sg== X-Received: by 2002:ac8:34f2:: with SMTP id x47mr11582214qtb.282.1600279699426; Wed, 16 Sep 2020 11:08:19 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:e86e:6adf:7537:ae24]) by smtp.gmail.com with ESMTPSA id m18sm20454530qkg.77.2020.09.16.11.08.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:08:18 -0700 (PDT) Date: Wed, 16 Sep 2020 14:08:15 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v2 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Message-ID: <86fdc5f89a1637e889e686d60f75a904c0f4485c.1600279373.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a configuration variable to specify a default value for the recently-introduce '--max-new-filters' option of 'git commit-graph write'. Signed-off-by: Taylor Blau --- Documentation/config/commitgraph.txt | 4 ++++ Documentation/git-commit-graph.txt | 3 ++- builtin/commit-graph.c | 14 ++++++++++++++ t/t4216-log-bloom.sh | 24 +++++++++++++++++++++++- 4 files changed, 43 insertions(+), 2 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 8357846d30..60df4e4bfa 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -70,7 +70,8 @@ data. With the `--max-new-filters=` option, generate at most `n` new Bloom filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is enforced. Commits whose filters are not calculated are stored as a -length zero Bloom filter. +length zero Bloom filter. Overrides the `commitGraph.maxNewFilters` +configuration. + With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 5df9b2ef80..988445abdf 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -181,6 +181,18 @@ static int write_option_max_new_filters(const struct option *opt, return 0; } +static int git_commit_graph_write_config(const char *var, const char *value, + void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) + write_opts.max_new_filters = git_config_int(var, value); + /* + * No need to fall-back to 'git_default_config', since this was already + * called in 'cmd_commit_graph()'. + */ + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -231,6 +243,8 @@ static int graph_write(int argc, const char **argv) trace2_cmd_mode("write"); + git_config(git_commit_graph_write_config, &opts); + argc = parse_options(argc, argv, NULL, builtin_commit_graph_write_options, builtin_commit_graph_write_usage, 0); diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 9ce0c318e9..dc7d62c778 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -329,13 +329,15 @@ test_expect_success 'Bloom generation is limited by --max-new-filters' ' ' test_expect_success 'Bloom generation backfills previously-skipped filters' ' + # Check specifying commitGraph.maxNewFilters over "git config" works. + test_config -C limits commitGraph.maxNewFilters 1 && ( cd limits && rm -f trace.event && GIT_TRACE2_EVENT="$(pwd)/trace.event" \ git commit-graph write --reachable --changed-paths \ - --split=replace --max-new-filters=1 && + --split=replace && test_filter_computed 1 trace.event && test_filter_not_computed 4 trace.event && test_filter_trunc_small 0 trace.event && @@ -343,6 +345,26 @@ test_expect_success 'Bloom generation backfills previously-skipped filters' ' ) ' +test_expect_success '--max-new-filters overrides configuration' ' + git init override && + test_when_finished "rm -fr override" && + test_config -C override commitGraph.maxNewFilters 2 && + ( + cd override && + test_commit one && + test_commit two && + + rm -f trace.event && + GIT_TRACE2_EVENT="$(pwd)/trace.event" \ + git commit-graph write --reachable --changed-paths \ + --max-new-filters=1 && + test_filter_computed 1 trace.event && + test_filter_not_computed 1 trace.event && + test_filter_trunc_small 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' + test_expect_success 'Bloom generation backfills empty commits' ' git init empty && test_when_finished "rm -fr empty" &&