From: SZEDER Gábor
To: git@vger.kernel.org
Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer, Ævar Arnfjörð Bjarmason, Paul-Sebastian Ungureanu, SZEDER Gábor
Subject: [PATCH v2 1/5] split-index: add tests to demonstrate the racy split index problem
Date: Thu, 27 Sep 2018 14:44:30 +0200
Message-Id: <20180927124434.30835-2-szeder.dev@gmail.com>
In-Reply-To: <20180927124434.30835-1-szeder.dev@gmail.com>

Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.

There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in an &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating
the racy split index problem:

  1. Split the index while adding a racily clean file:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file    # size stays the same

     This case already works properly. Even though the cache entry's
     stat data matches the modified file in the worktree, subsequent
     git commands will notice that the (split) index and the file have
     the same mtime, and then will go on to check the file's content
     and notice its dirtiness.

  2. Add a racily clean file to an already split index:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file

     This case already works properly. After the second 'git
     update-index' writes the newly added file's cache entry to the
     new split index, it basically works in the same way as case #1.

  3. Split the index when it (i.e. the not yet split index) contains a
     racily clean cache entry, i.e. an entry whose cached stat data
     matches the corresponding file in the worktree and whose cached
     mtime matches that of the index:

       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --split-index --add other-file

     This case already works properly. The shared index is written by
     do_write_index(), i.e. the same function that is responsible for
     writing "regular" and split indexes as well. This function
     cleverly notices the racily clean cache entry, and writes the
     entry to the new shared index with smudged stat data, i.e. file
     size set to 0. When subsequent git commands read the index, they
     will notice that the smudged stat data doesn't match the file in
     the worktree, and then go on to check the file's content and
     notice its dirtiness.

  4. Update the split index when it contains a racily clean cache
     entry:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case already works properly.
     After the second 'git update-index' the newly added file's cache
     entry is only stored in the split index. If a cache entry is
     present in the split index (even if it is a replacement of an
     outdated entry in the shared index), then it will always be
     included in the new split index on subsequent split index updates
     (until the file is removed or a new shared index is written),
     regardless of whether the entry is racily clean or not. When
     do_write_index() writes the new split index, it notices the racily
     clean cache entry, and smudges its stat data. Subsequent git
     commands reading the index will notice the smudged stat data and
     then go on to check the file's content and notice its dirtiness.

  5. Update the split index when a racily clean cache entry is stored
     only in the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case fails due to the racy split index problem. In the
     second 'git update-index' prepare_to_write_split_index() decides,
     among other things, which cache entries stored only in the shared
     index should be replaced in the new split index. Alas, this
     function never looks out for racily clean cache entries, and since
     the file's stat data in the worktree hasn't changed since the
     shared index was written, the entry won't be replaced in the new
     split index. Consequently, do_write_index() doesn't even get this
     racily clean cache entry, and can't smudge its stat data.
     Subsequent git commands will then see that the index has a more
     recent mtime than the file and that the (not smudged) cached stat
     data still matches the file in the worktree, and, ultimately,
     will erroneously consider the file clean.

  6. Update the split index after unpack_trees() copied a racily clean
     cache entry from the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git read-tree -m HEAD

     This case fails due to the racy split index problem. This
     basically fails for the same reason as case #5 above, but there is
     one important difference, which warrants the dedicated test.
     While that second 'git update-index' in case #5 updates
     index_state in place, in this case 'git read-tree -m' calls
     unpack_trees(), which throws out the entire index, and constructs
     a new one from the (potentially updated) copies of the original's
     cache entries. Consequently, when prepare_to_write_split_index()
     gets to work on this reconstructed index, it takes a different
     code path than in case #5 when deciding which cache entries in the
     shared index should be replaced. The result is the same, though:
     the racily clean cache entry goes unnoticed, it isn't added to the
     split index with smudged stat data, and subsequent git commands
     will then erroneously consider the file clean.

Note that in the last two 'test_expect_failure' cases I omitted the
'#' (as in trial number) from the tests' names on purpose for now, as
it confuses 'prove' into thinking that those tests failed
unexpectedly.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

Signed-off-by: SZEDER Gábor
---
 t/t1701-racy-split-index.sh | 218 ++++++++++++++++++++++++++++++++++++
 1 file changed, 218 insertions(+)
 create mode 100755 t/t1701-racy-split-index.sh

diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
new file mode 100755
index 0000000000..ebde418d7e
--- /dev/null
+++ b/t/t1701-racy-split-index.sh
@@ -0,0 +1,218 @@
+#!/bin/sh
+
+# This test can give false success if your machine is sufficiently
+# slow or all trials happened to happen on second boundaries.
+
+test_description='racy split index'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	# Only split the index when the test explicitly says so.
+	sane_unset GIT_TEST_SPLIT_INDEX GIT_FSMONITOR_TEST &&
+	git config splitIndex.maxPercentChange 100 &&
+
+	echo "cached content" >racy-file &&
+	git add racy-file &&
+	git commit -m initial &&
+
+	echo something >other-file &&
+	# No raciness with this file.
+	test-tool chmtime =-20 other-file &&
+
+	echo "+cached content" >expect
+'
+
+check_cached_diff () {
+	git diff-index --patch --cached $EMPTY_TREE racy-file >diff &&
+	tail -1 diff >actual &&
+	test_cmp expect actual
+}
+
+trials="0 1 2 3 4"
+for trial in $trials
+do
+	test_expect_success "split the index while adding a racily clean file #$trial" '
+		rm -f .git/index .git/sharedindex.* &&
+
+		# The next three commands must be run within the same
+		# second (so both writes to racy-file result in the same
+		# mtime) to create the interesting racy situation.
+		echo "cached content" >racy-file &&
+
+		# Update and split the index. The cache entry of
+		# racy-file will be stored only in the shared index.
+		git update-index --split-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Subsequent git commands should notice that racy-file
+		# and the split index have the same mtime, and check
+		# the content of the file to see if it is actually
+		# clean.
+		check_cached_diff
+	'
+done
+
+for trial in $trials
+do
+	test_expect_success "add a racily clean file to an already split index #$trial" '
+		rm -f .git/index .git/sharedindex.* &&
+
+		git update-index --split-index &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update the split index. The cache entry of racy-file
+		# will be stored only in the split index.
+		git update-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Subsequent git commands should notice that racy-file
+		# and the split index have the same mtime, and check
+		# the content of the file to see if it is actually
+		# clean.
+		check_cached_diff
+	'
+done
+
+for trial in $trials
+do
+	test_expect_success "split the index when the index contains a racily clean cache entry #$trial" '
+		rm -f .git/index .git/sharedindex.* &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		git update-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file.
+		sleep 1 &&
+
+		# Update and split the index when the index contains
+		# the racily clean cache entry of racy-file.
+		# A corresponding replacement cache entry with smudged
+		# stat data should be added to the new split index.
+		git update-index --split-index --add other-file &&
+
+		# Subsequent git commands should notice the smudged
+		# stat data in the replacement cache entry and that it
+		# does not match the file in the worktree.
+		check_cached_diff
+	'
+done
+
+for trial in $trials
+do
+	test_expect_success "update the split index when it contains a new racily clean cache entry #$trial" '
+		rm -f .git/index .git/sharedindex.* &&
+
+		git update-index --split-index &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update the split index. The cache entry of racy-file
+		# will be stored only in the split index.
+		git update-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file.
+		sleep 1 &&
+
+		# Update the split index when the racily clean cache
+		# entry of racy-file is only stored in the split index.
+		# An updated cache entry with smudged stat data should
+		# be added to the new split index.
+		git update-index --add other-file &&
+
+		# Subsequent git commands should notice the smudged
+		# stat data.
+		check_cached_diff
+	'
+done
+
+for trial in $trials
+do
+	test_expect_failure "update the split index when a racily clean cache entry is stored only in the shared index $trial" '
+		rm -f .git/index .git/sharedindex.* &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update and split the index. The cache entry of
+		# racy-file will be stored only in the shared index.
+		git update-index --split-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file.
+		sleep 1 &&
+
+		# Update the split index when the racily clean cache
+		# entry of racy-file is only stored in the shared index.
+		# A corresponding replacement cache entry with smudged
+		# stat data should be added to the new split index.
+		#
+		# Alas, such a smudged replacement entry is not added!
+		git update-index --add other-file &&
+
+		# Subsequent git commands should notice the smudged
+		# stat data.
+		check_cached_diff
+	'
+done
+
+for trial in $trials
+do
+	test_expect_failure "update the split index after unpack trees() copied a racily clean cache entry from the shared index $trial" '
+		rm -f .git/index .git/sharedindex.* &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update and split the index. The cache entry of
+		# racy-file will be stored only in the shared index.
+		git update-index --split-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file.
+		sleep 1 &&
+
+		# Update the split index after unpack_trees() copied the
+		# racily clean cache entry of racy-file from the shared
+		# index. A corresponding replacement cache entry
+		# with smudged stat data should be added to the new
+		# split index.
+		#
+		# Alas, such a smudged replacement entry is not added!
+		git read-tree -m HEAD &&
+
+		# Subsequent git commands should notice the smudged
+		# stat data.
+		check_cached_diff
+	'
+done
+
+test_done
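Cases 1 and 5 differ in two ways. The first is which index mtime a cached entry's mtime is compared against. The second is whether do_write_index() ever sees the entry. The following POSIX sh snippet is a toy model of that comparison, not git's implementation. The helpers classify and smudge are hypothetical and timestamps are whole seconds. Only size and mtime stand in for the full cached stat data. Sizes count the trailing newline, so "cached content" and "dirty worktree" are both 15 bytes.

```shell
#!/bin/sh
# Toy model of the racy-git stat check; NOT git's code.
# classify and smudge are hypothetical helpers, timestamps are whole
# seconds, and only size and mtime stand in for the cached stat data.

# classify CACHED_SIZE CACHED_MTIME FILE_SIZE FILE_MTIME INDEX_MTIME
#   changed: cached stat data differs, so the content must be checked
#   racy:    stat data matches, but the entry is not older than the
#            index file, so the content must be checked as well
#   clean:   stat data matches and the entry predates the index, so
#            the file is trusted without reading it
classify () {
	if [ "$1" -ne "$3" ] || [ "$2" -ne "$4" ]; then
		echo changed
	elif [ "$5" -le "$2" ]; then
		echo racy
	else
		echo clean
	fi
}

# smudge CACHED_SIZE CACHED_MTIME FILE_SIZE FILE_MTIME INDEX_MTIME CONTENT_DIFFERS
# Models an entry that is handed to the index writer: a racily clean
# entry whose content really differs gets its cached size set to 0.
smudge () {
	if [ "$(classify "$1" "$2" "$3" "$4" "$5")" = racy ] &&
	   [ "$6" = yes ]; then
		echo 0
	else
		echo "$1"
	fi
}

# Case 5: the shared index and both writes to the file happen at t=100;
# the new split index is written at t=101 without racy-file's entry.
classify 15 100 15 100 100    # racy: before "sleep 1", same mtime
classify 15 100 15 100 101    # clean: the bug, the dirty file is trusted
# Had the entry reached the writer and been smudged first:
classify "$(smudge 15 100 15 100 100 yes)" 100 15 100 101    # changed
```

The last line illustrates what the commit message expects. If the racily clean entry from the shared index reached do_write_index() and were smudged, the new split index being newer than the file would no longer matter. The size mismatch alone would send readers to the file's content.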