From patchwork Mon May 16 18:11:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851316
Date: Mon, 16 May 2022 18:11:26 +0000
Subject: [PATCH 1/8] sparse-index: create expand_to_pattern_list()
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com,
 Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

This is the first change in a series to allow modifying the
sparse-checkout pattern set without expanding a sparse index to a full
one in the process. Here, we focus on the problem of expanding the
pattern set through a command like 'git sparse-checkout add '
which needs to create new index entries for the paths now being written
to the worktree.

To achieve this, we need to be able to replace sparse directory entries
with their contained files and subdirectories. Once this is complete,
other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

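As a rough illustration of what "replacing a sparse directory entry with
its contained files" means, here is a toy sketch. It is not code from
this series: 'struct toy_entry', 'struct toy_tree' and toy_expand() are
hypothetical names invented for the example, and a plain array stands in
for reading a tree object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy entry: a path ending in '/' stands for a sparse directory. */
struct toy_entry {
	const char *path;
};

/* Hypothetical stand-in for a tree object: the full paths below one directory. */
struct toy_tree {
	const char *dir;      /* for example "docs/" */
	const char *files[4]; /* NULL-terminated list of full paths */
};

static int toy_is_sparse_dir(const struct toy_entry *e)
{
	size_t len = strlen(e->path);

	return len > 0 && e->path[len - 1] == '/';
}

/*
 * Copy 'in' to 'out', replacing every sparse directory entry that has a
 * matching toy_tree with the files listed in that tree. Returns the
 * number of entries written, or -1 if 'out' is too small.
 */
static int toy_expand(const struct toy_entry *in, size_t in_nr,
		      const struct toy_tree *trees, size_t tree_nr,
		      struct toy_entry *out, size_t out_alloc)
{
	size_t i, j, k, nr = 0;

	for (i = 0; i < in_nr; i++) {
		const struct toy_tree *t = NULL;

		if (toy_is_sparse_dir(&in[i])) {
			for (j = 0; j < tree_nr && !t; j++)
				if (!strcmp(trees[j].dir, in[i].path))
					t = &trees[j];
		}

		if (!t) {
			/* Regular file (or unknown directory): keep as-is. */
			if (nr >= out_alloc)
				return -1;
			out[nr++] = in[i];
			continue;
		}

		/* Sparse directory: emit its contained files instead. */
		for (k = 0; k < 4 && t->files[k]; k++) {
			if (nr >= out_alloc)
				return -1;
			out[nr++].path = t->files[k];
		}
	}
	return (int)nr;
}
```

With the input entries "README", "docs/" and "src/main.c", and a toy
tree listing "docs/a.txt" and "docs/b.txt" under "docs/", toy_expand()
writes four file entries and no directory entry. The real code walks
tree objects with read_tree_at() rather than a fixed array.
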
We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create a new method,
expand_to_pattern_list(), which takes a pattern list, but for now mostly
ignores it. The current implementation is only correct when the pattern
list is NULL, since that case behaves exactly like ensure_full_index().
In fact, ensure_full_index() is converted to a shim over
expand_to_pattern_list().

A future update will actually implement expand_to_pattern_list() to its
full capabilities. For now, it is created and documented.

Signed-off-by: Derrick Stolee
---
 sparse-index.c | 35 ++++++++++++++++++++++++++++++++---
 sparse-index.h | 14 ++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 8636af72de5..37c7df877a6 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -248,19 +248,41 @@ static int add_path_to_index(const struct object_id *oid,
 	return 0;
 }
 
-void ensure_full_index(struct index_state *istate)
+void expand_to_pattern_list(struct index_state *istate,
+			    struct pattern_list *pl)
 {
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
 
+	/*
+	 * If the index is already full, then keep it full. We will convert
+	 * it to a sparse index on write, if possible.
+	 */
 	if (!istate || !istate->sparse_index)
 		return;
 
+	/*
+	 * If our index is sparse, but our new pattern set does not use
+	 * cone mode patterns, then we need to expand the index before we
+	 * continue. A NULL pattern set indicates a full expansion to a
+	 * full index.
+	 */
+	if (pl && !pl->use_cone_patterns)
+		pl = NULL;
+
 	if (!istate->repo)
 		istate->repo = the_repository;
 
-	trace2_region_enter("index", "ensure_full_index", istate->repo);
+	/*
+	 * A NULL pattern set indicates we are expanding to a full index, so
+	 * we use a special region name that indicates the full expansion.
+	 * This is used by test cases, but also helps to differentiate the
+	 * two cases.
+	 */
+	trace2_region_enter("index",
+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
+			    istate->repo);
 
 	/* initialize basics of new index */
 	full = xcalloc(1, sizeof(struct index_state));
@@ -322,7 +344,14 @@ void ensure_full_index(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
-	trace2_region_leave("index", "ensure_full_index", istate->repo);
+	trace2_region_leave("index",
+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
+			    istate->repo);
+}
+
+void ensure_full_index(struct index_state *istate)
+{
+	expand_to_pattern_list(istate, NULL);
 }
 
 void ensure_correct_sparsity(struct index_state *istate)
diff --git a/sparse-index.h b/sparse-index.h
index 633d4fb7e31..037b541f49d 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -23,4 +23,18 @@ void expand_to_path(struct index_state *istate,
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
+struct pattern_list;
+
+/**
+ * Scan the given index and compare its entries to the given pattern list.
+ * If the index is sparse and the pattern list uses cone mode patterns,
+ * then modify the index to contain all of the file entries within that
+ * new pattern list. This expands sparse directories only as far as needed.
+ *
+ * If the pattern list is NULL or does not use cone mode patterns, then the
+ * index is expanded to a full index.
+ */
+void expand_to_pattern_list(struct index_state *istate,
+			    struct pattern_list *pl);
+
 #endif

From patchwork Mon May 16 18:11:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851318
Date: Mon, 16 May 2022 18:11:27 +0000
Subject: [PATCH 2/8] sparse-index: introduce partially-sparse indexes
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com,
 Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

A future change will present a temporary, in-memory mode where the index
can contain sparse directory entries without being completely collapsed
to the smallest possible set of sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be an 'enum sparse_index_mode' with three modes:

* COMPLETELY_FULL (0): No sparse directories exist.
* COLLAPSED (1): Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.
* PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
  outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
COLLAPSED mode but to actually do the necessary work when in
PARTIALLY_SPARSE mode.

The PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee
---
 builtin/sparse-checkout.c |  2 +-
 cache.h                   | 32 ++++++++++++++++++++++++--------
 read-cache.c              |  6 +++---
 sparse-index.c            |  6 +++---
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 0217d44c5b1..88eea069ad4 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -128,7 +128,7 @@ static void clean_tracked_sparse_directories(struct repository *r)
 	 * sparse index will not delete directories that contain
 	 * conflicted entries or submodules.
 	 */
-	if (!r->index->sparse_index) {
+	if (r->index->sparse_index == COMPLETELY_FULL) {
 		/*
 		 * If something, such as a merge conflict or other concern,
 		 * prevents us from converting to a sparse index, then do
diff --git a/cache.h b/cache.h
index 6226f6a8a53..2d067aca2fd 100644
--- a/cache.h
+++ b/cache.h
@@ -310,6 +310,28 @@ struct untracked_cache;
 struct progress;
 struct pattern_list;
 
+enum sparse_index_mode {
+	/*
+	 * COMPLETELY_FULL: there are no sparse directories
+	 * in the index at all.
+	 */
+	COMPLETELY_FULL = 0,
+
+	/*
+	 * COLLAPSED: the index has already been collapsed to sparse
+	 * directories wherever possible.
+	 */
+	COLLAPSED = 1,
+
+	/*
+	 * PARTIALLY_SPARSE: the sparse directories that exist are
+	 * outside the sparse-checkout boundary, but it is possible
+	 * that some file entries could collapse to sparse directory
+	 * entries.
+	 */
+	PARTIALLY_SPARSE = 2,
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -323,14 +345,8 @@ struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1,
-
-		 /*
-		  * sparse_index == 1 when sparse-directory
-		  * entries exist. Requires sparse-checkout
-		  * in cone mode.
-		  */
-		 sparse_index : 1;
+		 fsmonitor_has_run_once : 1;
+	enum sparse_index_mode sparse_index;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
diff --git a/read-cache.c b/read-cache.c
index 4df97e185e9..cb9b33169fd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -112,7 +112,7 @@ static const char *alternate_index_output;
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	if (S_ISSPARSEDIR(ce->ce_mode))
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -1856,7 +1856,7 @@ static int read_index_extension(struct index_state *istate,
 		break;
 	case CACHE_EXT_SPARSE_DIRECTORIES:
 		/* no content, only an indicator */
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
@@ -3149,7 +3149,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 				 unsigned flags)
 {
 	int ret;
-	int was_full = !istate->sparse_index;
+	int was_full = istate->sparse_index == COMPLETELY_FULL;
 
 	ret = convert_to_sparse(istate, 0);
 
diff --git a/sparse-index.c b/sparse-index.c
index 37c7df877a6..79e8ff087bc 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -173,7 +173,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	 * If the index is already sparse, empty, or otherwise
 	 * cannot be converted to sparse, do not convert.
 	 */
-	if (istate->sparse_index || !istate->cache_nr ||
+	if (istate->sparse_index == COLLAPSED || !istate->cache_nr ||
 	    !is_sparse_index_allowed(istate, flags))
 		return 0;
 
@@ -214,7 +214,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	FREE_AND_NULL(istate->fsmonitor_dirty);
 	FREE_AND_NULL(istate->fsmonitor_last_update);
 
-	istate->sparse_index = 1;
+	istate->sparse_index = COLLAPSED;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
 }
@@ -259,7 +259,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	 * If the index is already full, then keep it full. We will convert
 	 * it to a sparse index on write, if possible.
 	 */
-	if (!istate || !istate->sparse_index)
+	if (!istate || istate->sparse_index == COMPLETELY_FULL)
 		return;
 
 	/*

From patchwork Mon May 16 18:11:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851322
Date: Mon, 16 May 2022 18:11:28 +0000
Subject: [PATCH 3/8] cache-tree: implement cache_tree_find_path()
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com,
 Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

Given a 'struct cache_tree', it may be beneficial to navigate directly
to the node within it that corresponds to a given path name. Create
cache_tree_find_path() for this purpose. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way. This
new method is not currently used, but will be in an upcoming change.

Signed-off-by: Derrick Stolee
---
 cache-tree.c | 24 ++++++++++++++++++++++++
 cache-tree.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 6752f69d515..23893a7b113 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -100,6 +100,30 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *it, const char *path)
 	return find_subtree(it, path, pathlen, 1);
 }
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path)
+{
+	const char *slash;
+	int namelen;
+	struct cache_tree_sub *down;
+
+	if (!it)
+		return NULL;
+	slash = strchrnul(path, '/');
+	namelen = slash - path;
+	it->entry_count = -1;
+	if (!*slash) {
+		int pos;
+		pos = cache_tree_subtree_pos(it, path, namelen);
+		if (0 <= pos)
+			return it->down[pos]->cache_tree;
+		return NULL;
+	}
+	down = find_subtree(it, path, namelen, 0);
+	if (down)
+		return cache_tree_find_path(down->cache_tree, slash + 1);
+	return NULL;
+}
+
 static int do_invalidate_path(struct cache_tree *it, const char *path)
 {
 	/* a/b/c
diff --git a/cache-tree.h b/cache-tree.h
index 8efeccebfc9..f75f8e74dcd 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -29,6 +29,8 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 

From patchwork Mon May 16 18:11:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851317
Date: Mon, 16 May 2022 18:11:29 +0000
Subject: [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com,
 Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee
---
 builtin/sparse-checkout.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 88eea069ad4..cbff6ad00b0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -413,6 +413,9 @@ static int update_modes(int *cone_mode, int *sparse_index)
 		/* force an index rewrite */
 		repo_read_index(the_repository);
 		the_repository->index->updated_workdir = 1;
+
+		if (!*sparse_index)
+			ensure_full_index(the_repository->index);
 	}
 
 	return 0;

From patchwork Mon May 16 18:11:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851321
Message-Id: <5c7546ab07080b43972b265eb2eee3de0c5396a2.1652724693.git.gitgitgadget@gmail.com>
Date: Mon, 16 May 2022 18:11:30 +0000
Subject: [PATCH 5/8] sparse-index: partially expand directories
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com,
 Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The expand_to_pattern_list() method expands sparse directory entries to their list of contained files when either the pattern list is NULL or the directory is contained in the new pattern list's cone mode patterns. It is possible that the pattern list has a recursive match with a directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need to be expanded. If there exists a directory 'A/B/D/', then that directory should not be expanded and instead we can create a sparse directory. To implement this, we plug into the add_path_to_index() callback for the call to read_tree_at(). Since we now need access to both the index we are writing and the pattern list we are comparing, create a 'struct modify_index_context' to use as a data transfer object. It is important that we use the given pattern list since we will use this pattern list to change the sparse-checkout patterns and cannot use istate->sparse_checkout_patterns. Signed-off-by: Derrick Stolee --- sparse-index.c | 46 +++++++++++++++++++++++++++++++++++++++------- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 79e8ff087bc..3d8eed585b5 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -9,6 +9,11 @@ #include "dir.h" #include "fsmonitor.h" +struct modify_index_context { + struct index_state *write; + struct pattern_list *pl; +}; + static struct cache_entry *construct_sparse_dir_entry( struct index_state *istate, const char *sparse_dir, @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid, struct strbuf *base, const char *path, unsigned int mode, void *context) { - struct index_state *istate = (struct index_state *)context; + struct modify_index_context *ctx = (struct modify_index_context *)context; struct cache_entry *ce; size_t len = base->len; - if (S_ISDIR(mode)) - return READ_TREE_RECURSIVE; + if (S_ISDIR(mode)) { + int dtype; + size_t baselen = base->len; + if (!ctx->pl) + return READ_TREE_RECURSIVE; - strbuf_addstr(base, 
path); + /* + * Have we expanded to a point outside of the sparse-checkout? + */ + strbuf_addstr(base, path); + strbuf_add(base, "/-", 2); + + if (path_matches_pattern_list(base->buf, base->len, + NULL, &dtype, + ctx->pl, ctx->write)) { + strbuf_setlen(base, baselen); + return READ_TREE_RECURSIVE; + } - ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); + /* + * The path "{base}{path}/" is a sparse directory. Create the correct + * name for inserting the entry into the idnex. + */ + strbuf_setlen(base, base->len - 1); + } else { + strbuf_addstr(base, path); + } + + ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0); ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED; - set_index_entry(istate, istate->cache_nr++, ce); + set_index_entry(ctx->write, ctx->write->cache_nr++, ce); strbuf_setlen(base, len); return 0; @@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate, int i; struct index_state *full; struct strbuf base = STRBUF_INIT; + struct modify_index_context ctx; /* * If the index is already full, then keep it full. We will convert @@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate, full->cache_nr = 0; ALLOC_ARRAY(full->cache, full->cache_alloc); + ctx.write = full; + ctx.pl = pl; + for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; struct tree *tree; @@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate, strbuf_add(&base, ce->name, strlen(ce->name)); read_tree_at(istate->repo, tree, &base, &ps, - add_path_to_index, full); + add_path_to_index, &ctx); /* free directory entries. 
full entries are re-used */ discard_cache_entry(ce);

From patchwork Mon May 16 18:11:31 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851320
Date: Mon, 16 May 2022 18:11:31 +0000
Subject: [PATCH 6/8] sparse-index: complete partial expansion
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

To complete the implementation of expand_to_pattern_list(), we need to detect when a sparse directory entry should remain sparse. This avoids a full expansion, so we now need to use the PARTIALLY_SPARSE mode to indicate this state. There still are no callers to this method, but we will add one in the next change.
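
As a rough illustration of the decision described above (this is not the code in this patch), the following sketch models a cone-mode pattern list as a plain array of recursive directory names. The names sketch_should_expand(), sketch_result_mode() and the SKETCH_* enum values are hypothetical stand-ins; the real implementation relies on path_matches_pattern_list() and the index's own sparse_index modes. The assumption is that a sparse directory must be expanded when it lies inside a recursive cone directory or is an ancestor of one, and may stay sparse otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the index modes named in the commit message. */
enum sketch_index_mode {
	SKETCH_COMPLETELY_FULL,
	SKETCH_PARTIALLY_SPARSE
};

/* Return nonzero if 's' begins with 'prefix'. */
int sketch_has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Decide whether the sparse directory 'dir' (with trailing '/') needs to
 * be expanded for the given list of recursive cone directories (each with
 * trailing '/'). A NULL list models "expand everything".
 */
int sketch_should_expand(const char *dir, const char *const *cone, size_t nr)
{
	size_t i;

	assert(dir != NULL);
	if (!cone)
		return 1;

	for (i = 0; i < nr; i++) {
		/* 'dir' is inside a cone directory, or is an ancestor of one. */
		if (sketch_has_prefix(dir, cone[i]) ||
		    sketch_has_prefix(cone[i], dir))
			return 1;
	}
	return 0; /* outside the cone: keep the sparse directory entry */
}

/* A non-NULL pattern list means the result may still hold sparse entries. */
enum sketch_index_mode sketch_result_mode(const char *const *cone)
{
	return cone ? SKETCH_PARTIALLY_SPARSE : SKETCH_COMPLETELY_FULL;
}
```

With a cone containing only 'A/B/C/', this model expands 'A/B/' and leaves 'A/B/D/' as a sparse directory, and the resulting mode is the partially sparse one because the pattern list is non-NULL.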
Signed-off-by: Derrick Stolee --- sparse-index.c | 41 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 3d8eed585b5..0bad5503304 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -297,8 +297,24 @@ void expand_to_pattern_list(struct index_state *istate, * continue. A NULL pattern set indicates a full expansion to a * full index. */ - if (pl && !pl->use_cone_patterns) + if (pl && !pl->use_cone_patterns) { pl = NULL; + } else { + /* + * We might contract file entries into sparse-directory + * entries, and for that we will need the cache tree to + * be recomputed. + */ + cache_tree_free(&istate->cache_tree); + + /* + * If there is a problem creating the cache tree, then we + * need to expand to a full index since we cannot satisfy + * the current request as a sparse index. + */ + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) + pl = NULL; + } if (!istate->repo) istate->repo = the_repository; @@ -317,8 +333,14 @@ void expand_to_pattern_list(struct index_state *istate, full = xcalloc(1, sizeof(struct index_state)); memcpy(full, istate, sizeof(struct index_state)); + /* + * This slightly-misnamed 'full' index might still be sparse if we + * are only modifying the list of sparse directories. This hinges + * on whether we have a non-NULL pattern list. + */ + full->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL; + /* then change the necessary things */ - full->sparse_index = 0; full->cache_alloc = (3 * istate->cache_alloc) / 2; full->cache_nr = 0; ALLOC_ARRAY(full->cache, full->cache_alloc); @@ -330,11 +352,22 @@ void expand_to_pattern_list(struct index_state *istate, struct cache_entry *ce = istate->cache[i]; struct tree *tree; struct pathspec ps; + int dtype; if (!S_ISSPARSEDIR(ce->ce_mode)) { set_index_entry(full, full->cache_nr++, ce); continue; } + + /* We now have a sparse directory entry. Should we expand? 
*/ + if (pl && + path_matches_pattern_list(ce->name, ce->ce_namelen, + NULL, &dtype, + pl, istate) <= 0) { + set_index_entry(full, full->cache_nr++, ce); + continue; + } + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) warning(_("index entry is a directory, but not sparse (%08x)"), ce->ce_flags); @@ -360,7 +393,7 @@ void expand_to_pattern_list(struct index_state *istate, /* Copy back into original index. */ memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash)); - istate->sparse_index = 0; + istate->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL; free(istate->cache); istate->cache = full->cache; istate->cache_nr = full->cache_nr; @@ -374,7 +407,7 @@ void expand_to_pattern_list(struct index_state *istate, /* Clear and recompute the cache-tree */ cache_tree_free(&istate->cache_tree); - cache_tree_update(istate, 0); + cache_tree_update(istate, WRITE_TREE_MISSING_OK); trace2_region_leave("index", pl ? "expand_to_pattern_list" : "ensure_full_index",

From patchwork Mon May 16 18:11:32 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851319
Message-Id: <2804326c8bb0f70ca43e68c03789f32ad628cfaa.1652724693.git.gitgitgadget@gmail.com>
Date: Mon, 16 May 2022 18:11:32 +0000
Subject: [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]'
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The sparse-checkout builtin is almost completely integrated with the sparse index, allowing the sparse-checkout boundary to be modified without expanding a sparse index to a full one. Add a test to p2000-sparse-operations.sh that adds a directory to the sparse-checkout definition, then removes it. Using both operations is important to ensure that the operation is doing the same work in each repetition as well as leaving the test repo in a good state for later tests. Signed-off-by: Derrick Stolee --- t/perf/p2000-sparse-operations.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 382716cfca9..ce5cfac5714 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -110,6 +110,7 @@ test_perf_on_all git add -A test_perf_on_all git add .
test_perf_on_all git commit -a -m A test_perf_on_all git checkout -f - +test_perf_on_all "git sparse-checkout add f2/f3/f1 && git sparse-checkout set $SPARSE_CONE" test_perf_on_all git reset test_perf_on_all git reset --hard test_perf_on_all git reset -- does-not-exist

From patchwork Mon May 16 18:11:33 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12851323
Date: Mon, 16 May 2022 18:11:33 +0000
Subject: [PATCH 8/8] sparse-checkout: integrate with sparse index
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, shaoxuan.yuan02@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

When modifying the sparse-checkout definition, the sparse-checkout builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all cache entries in the index. Before, we needed the index to be fully expanded in order to ensure we had the full list of files necessary that match the new patterns.

Insert a call to expand_to_pattern_list() that expands sparse directories that are within the new pattern list, but only far enough that every necessary file path now exists as a cache entry. The remaining logic within update_sparsity() will modify the SKIP_WORKTREE bits appropriately.

This allows us to disable command_requires_full_index within the sparse-checkout builtin. Add tests that demonstrate that we are not expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the time is dominated by writing a new copy of the base repository to the worktree and then deleting it again. The fact that the previous index expansion was such a large portion of the time shows how important it is to complete this sparse index integration.

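
For readers checking the table, the percentages appear to be the relative change in the first (elapsed) time of each cell from HEAD~1 to HEAD. The helper below is a small sketch of that arithmetic only; relative_change_percent() is a hypothetical name and not part of the perf test suite.

```c
#include <assert.h>

/*
 * Relative change, in percent, from 'before' to 'after'.
 * A negative result means 'after' took less time than 'before'.
 */
double relative_change_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Plugging in the elapsed times above, (1.57 - 2.14) / 2.14 is roughly -26.6% and (1.58 - 2.20) / 2.20 is roughly -28.2%, which agrees with the last column.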
Signed-off-by: Derrick Stolee --- builtin/sparse-checkout.c | 3 +++ t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++++ unpack-trees.c | 4 ++++ 3 files changed, 32 insertions(+) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index cbff6ad00b0..0157b292b36 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -937,6 +937,9 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + if (argc > 0) { if (!strcmp(argv[0], "list")) return sparse_checkout_list(argc, argv); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 93bcfd20bbc..614357fc48c 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -1552,6 +1552,31 @@ test_expect_success 'ls-files' ' ensure_not_expanded ls-files --sparse ' +test_expect_success 'sparse index is not expanded: sparse-checkout' ' + init_repos && + + ensure_not_expanded sparse-checkout set deep/deeper2 && + ensure_not_expanded sparse-checkout set deep/deeper1 && + ensure_not_expanded sparse-checkout set deep && + ensure_not_expanded sparse-checkout add folder1 && + ensure_not_expanded sparse-checkout set deep/deeper1 && + ensure_not_expanded sparse-checkout set folder2 && + + # Demonstrate that the checks that "folder1/a" is a file + # do not cause a sparse-index expansion (since it is in the + # sparse-checkout cone). + echo >>sparse-index/folder2/a && + git -C sparse-index add folder2/a && + + ensure_not_expanded sparse-checkout add folder1 && + + # Skip checks here, since deep/deeper1 is inside a sparse directory + # that must be expanded to check whether `deep/deeper1` is a file + # or not. 
+ ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 && + ensure_not_expanded sparse-checkout set +' + # NEEDSWORK: a sparse-checkout behaves differently from a full checkout # in this scenario, but it shouldn't. test_expect_success 'reset mixed and checkout orphan' ' diff --git a/unpack-trees.c b/unpack-trees.c index 7f528d35cc2..9745e0dfc34 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -18,6 +18,7 @@ #include "promisor-remote.h" #include "entry.h" #include "parallel-checkout.h" +#include "sparse-index.h" /* * Error messages expected by scripts out of plumbing commands such as @@ -2018,6 +2019,9 @@ enum update_sparsity_result update_sparsity(struct unpack_trees_options *o) goto skip_sparse_checkout; } + /* Expand sparse directories as needed */ + expand_to_pattern_list(o->src_index, o->pl); + /* Set NEW_SKIP_WORKTREE on existing entries. */ mark_all_ce_unused(o->src_index); mark_new_skip_worktree(o->pl, o->src_index, 0,